Merged
34 changes: 8 additions & 26 deletions sdk/ai/azure-ai-voicelive/CHANGELOG.md
@@ -1,34 +1,16 @@
# Release History

## 1.2.0 (2025-12-10)
## 1.2.0b3 (Unreleased)

### Features Added

- **MCP (Model Context Protocol) Support**: Added comprehensive support for Model Context Protocol integration:
- `MCPServer` tool type for defining MCP server configurations with authorization, headers, and approval requirements
- `MCPTool` model for representing MCP tool definitions with input schemas and annotations
- `MCPApprovalType` enum for controlling approval workflows (`never`, `always`, or tool-specific)
- New item types: `MCPApprovalResponseRequestItem`, `ResponseMCPApprovalRequestItem`, `ResponseMCPApprovalResponseItem`, `ResponseMCPCallItem`, and `ResponseMCPListToolItem`
- New server events: `ServerEventMcpListToolsInProgress`, `ServerEventMcpListToolsCompleted`, `ServerEventMcpListToolsFailed`, `ServerEventResponseMcpCallArgumentsDelta`, `ServerEventResponseMcpCallArgumentsDone`, `ServerEventResponseMcpCallInProgress`, `ServerEventResponseMcpCallCompleted`, and `ServerEventResponseMcpCallFailed`
- Client event `MCP_APPROVAL_RESPONSE` for responding to approval requests
- Enhanced `ItemType` enum with MCP-related types: `mcp_list_tools`, `mcp_call`, `mcp_approval_request`, and `mcp_approval_response`
- **Enhanced Avatar Configuration**: Expanded avatar functionality with new configuration options:
- Added `AvatarConfigTypes` enum with support for `video-avatar` and `photo-avatar` types
- Added `PhotoAvatarBaseModes` enum for photo avatar base models (e.g., `vasa-1`)
- Added `AvatarOutputProtocol` enum for avatar streaming protocols (`webrtc`, `websocket`)
- Enhanced `AvatarConfig` model with new properties: `type`, `model`, and `output_protocol`
- **Image Content Support**: Added support for image inputs in conversations:
- New `RequestImageContentPart` model for including images in requests
- New `RequestImageContentPartDetail` enum for controlling image detail levels (`auto`, `low`, `high`)
- Added `INPUT_IMAGE` to `ContentPartType` enum
- Enhanced token details models (`InputTokenDetails`, `CachedTokenDetails`) with `image_tokens` tracking
- **Enhanced OpenAI Voices**: Added new OpenAI voice options:
- Added `marin` and `cedar` voices to `OpenAIVoiceName` enum
- **Extended Azure Personal Voice Configuration**: Enhanced `AzurePersonalVoice` with additional customization options:
- Added support for custom lexicon via `custom_lexicon_url`
- Added `prefer_locales` for locale preferences
- Added `locale`, `style`, `pitch`, `rate`, and `volume` properties for fine-tuned voice control
- **Pre-generated Assistant Messages**: Added support for pre-generated assistant messages in `ResponseCreateParams` via the `pre_generated_assistant_message` property
- **Support for Explicit Null Values**: Enhanced `RequestSession` to properly serialize explicitly set `None` values (e.g., `turn_detection=None` now correctly sends `"turn_detection": null` in the WebSocket message)
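The explicit-null feature above addresses a common serialization gap: most model serializers drop unset and `None`-valued fields, so a caller cannot send an explicit JSON `null` to disable a setting. A minimal self-contained sketch of the pattern (using a hypothetical stand-in class, not the actual SDK model):

```python
# Sketch of the explicit-None pattern, with a stand-in model class
# (the real SDK applies this to its generated RequestSession).
import json


class SessionSketch:
    def __init__(self, **kwargs):
        # Remember which fields the caller explicitly passed as None.
        self._explicit_none = {k for k, v in kwargs.items() if v is None}
        self._fields = {k: v for k, v in kwargs.items() if v is not None}

    def as_dict(self):
        # A typical serializer emits only the set, non-None fields; add
        # the explicit Nones back so they reach the wire as JSON null.
        result = dict(self._fields)
        for field in self._explicit_none:
            result[field] = None
        return result


session = SessionSketch(model="gpt-realtime", turn_detection=None)
print(json.dumps(session.as_dict(), sort_keys=True))
# → {"model": "gpt-realtime", "turn_detection": null}
```

The key design choice is capturing the explicit `None`s in `__init__`, before the base class discards them, rather than trying to recover them afterward.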

### Other Changes

- **Dependency Update**: Updated minimum `azure-core` version from 1.35.0 to 1.36.0

### Bug Fixes

## 1.2.0b2 (2025-11-20)

2 changes: 1 addition & 1 deletion sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_version.py
@@ -6,4 +6,4 @@
# Changes may cause incorrect behavior and will be lost if the code is regenerated.
# --------------------------------------------------------------------------

VERSION = "1.2.0"
VERSION = "1.2.0b3"
31 changes: 30 additions & 1 deletion sdk/ai/azure-ai-voicelive/azure/ai/voicelive/models/_patch.py
@@ -7,9 +7,38 @@

Follow our quickstart for examples: https://aka.ms/azsdk/python/dpcodegen/python/customize
"""
from typing import Any
from ._models import RequestSession as GeneratedRequestSession


__all__: list[str] = [] # Add all objects you want publicly available to users at this package level
class RequestSession(GeneratedRequestSession):
"""Extended RequestSession that tracks explicitly set None values."""

def __init__(self, *args: Any, **kwargs: Any) -> None:
# Track which kwargs were explicitly passed as None
self._explicit_none_fields = {k for k, v in kwargs.items() if v is None}
super().__init__(*args, **kwargs)

def as_dict(self, **kwargs: Any) -> dict[str, Any]:
"""Convert to dict, including explicitly set None values.

:return: A dictionary representation of the RequestSession, including fields that were
explicitly set to None.
:rtype: dict[str, Any]
"""
result = super().as_dict(**kwargs)
# Add back any fields that were explicitly set to None
for field in self._explicit_none_fields:
# Convert attribute name to rest field name if needed
rest_name = self._attr_to_rest_field.get(field)
if rest_name:
result[rest_name._rest_name] = None # pylint: disable=protected-access
else:
result[field] = None
return result


__all__: list[str] = ["RequestSession"]


def patch_sdk():
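The `as_dict` override in the patch above has one subtlety: the Python attribute name may differ from the wire (REST) name, so explicit `None`s are written back under the generated `_attr_to_rest_field` mapping. A self-contained sketch of that step, using hypothetical stand-ins for the generated base class and field names:

```python
# Hypothetical stand-in for a generated model whose attribute names
# differ from the wire names emitted by as_dict.
class RestField:
    def __init__(self, rest_name):
        self._rest_name = rest_name


class GeneratedBase:
    # Assumed mapping; the camelCase wire name is illustrative only.
    _attr_to_rest_field = {
        "turn_detection": RestField("turn_detection"),
        "input_audio_format": RestField("inputAudioFormat"),
    }

    def __init__(self, **kwargs):
        # Generated serializers typically keep only set, non-None fields.
        self._data = {k: v for k, v in kwargs.items() if v is not None}

    def as_dict(self):
        return {
            self._attr_to_rest_field[k]._rest_name: v
            for k, v in self._data.items()
        }


class PatchedSession(GeneratedBase):
    def __init__(self, **kwargs):
        self._explicit_none_fields = {k for k, v in kwargs.items() if v is None}
        super().__init__(**kwargs)

    def as_dict(self):
        result = super().as_dict()
        for field in self._explicit_none_fields:
            # Emit under the wire name when a mapping exists, else as-is.
            rest_field = self._attr_to_rest_field.get(field)
            result[rest_field._rest_name if rest_field else field] = None
        return result


print(PatchedSession(input_audio_format="pcm16", turn_detection=None).as_dict())
# → {'inputAudioFormat': 'pcm16', 'turn_detection': None}
```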
4 changes: 2 additions & 2 deletions sdk/ai/azure-ai-voicelive/pyproject.toml
@@ -17,7 +17,7 @@ authors = [
description = "Microsoft Corporation Azure Ai Voicelive Client Library for Python"
license = "MIT"
classifiers = [
"Development Status :: 5 - Production/Stable",
"Development Status :: 4 - Beta",
"Programming Language :: Python",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3",
@@ -32,7 +32,7 @@ keywords = ["azure", "azure sdk"]

dependencies = [
"isodate>=0.6.1",
"azure-core>=1.35.0",
"azure-core>=1.36.0",
"typing-extensions>=4.6.0",
]
dynamic = [
@@ -867,7 +867,7 @@ async def test_realtime_service_wo_turn_detection(self, test_data_dir: Path, mod
async with connect(
endpoint=voicelive_openai_endpoint, credential=AzureKeyCredential(voicelive_openai_api_key), model=model
) as conn:
session = RequestSession(turn_detection={"type": "none"})
session = RequestSession(turn_detection=None)

await conn.session.update(session=session)
await conn.input_audio_buffer.append(audio=_load_audio_b64(file))