[CONTRIBUTION]: Add TTS Support with vllm omni (V1) #7664

@hatemfaheem

Description

Type of Change

New feature

Problem Statement

Dynamo supports multimodal generation through its vLLM Omni integration — image generation (/v1/images/generations) and video generation (/v1/videos) already ship. TTS is a natural next modality: vLLM Omni already supports TTS models (e.g. Qwen3-TTS), and the Dynamo codebase contains partial scaffolding for audio (protocol stubs, endpoint type enum, router placeholders), but no working end-to-end implementation. Users deploying TTS models through Dynamo today have no supported path.

Proposed Solution

Add a new Text-to-Speech (TTS) audio generation endpoint (POST /v1/audio/speech) to Dynamo, powered by vLLM Omni. SGLang support is out of scope here. The endpoint accepts text input and returns a complete WAV or PCM audio file, following the OpenAI TTS API contract and the same architectural patterns established by the image and video generation modalities.
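
For illustration, a client call against the proposed endpoint might look like the sketch below. The field names (`model`, `input`, `voice`, `response_format`) come from the OpenAI TTS API that this proposal says it follows; which of them Dynamo V1 actually accepts is an assumption here. The base URL, the model identifier, and the helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# V1 scope per this proposal: only WAV and PCM output are supported.
SUPPORTED_FORMATS = {"wav", "pcm"}


def build_speech_request(base_url, text, model, voice=None, response_format="wav"):
    """Build (but do not send) a POST /v1/audio/speech request.

    Hypothetical helper: field names follow the OpenAI TTS contract and are
    assumed, not confirmed, to match what Dynamo accepts.
    """
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    payload = {"model": model, "input": text, "response_format": response_format}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, model, **kwargs):
    """Send the request and return the full audio body as bytes (non-streaming)."""
    request = build_speech_request(base_url, text, model, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a frontend running locally, `synthesize("http://localhost:8000", "Hello", "<tts-model-id>")` would return the whole audio file in one response body, matching the non-streaming behaviour described for V1.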

V1 scope is a working end-to-end pipeline: HTTP handler, protocol types, model discovery, metrics, and a single supported model (Qwen3-TTS) producing non-streaming audio responses.

Explicitly out of scope for V1:

  • Support for additional TTS models (e.g., Voxtral TTS)
  • Audio streaming to clients
  • Voice cloning (ref_audio / ref_text)
  • Additional codecs beyond WAV and PCM (MP3, FLAC, Opus, AAC, OGG)
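
Since V1 returns either WAV or raw PCM, a client that requests PCM can wrap the samples in a WAV container itself. The sketch below uses Python's standard `wave` module. The defaults (24 kHz, mono, 16-bit samples) are assumptions borrowed from the OpenAI `pcm` format; this proposal does not state the sample rate or sample width that Qwen3-TTS output will use. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV (RIFF) container and return the bytes.

    The default audio parameters are assumptions, not values from the proposal.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()
```

The result differs from the input only by the header that `wave` prepends, which is why supporting both formats is cheap compared with the compressed codecs listed as out of scope.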

More details in this DEP:
ai-dynamo/enhancements#78

Estimated PR Size

XXL (1000+ lines)

Files/Components Affected

Draft PR: #7661

components/src/dynamo/vllm/omni/omni_handler.py
components/src/dynamo/common/protocols/audio_protocol.py
components/src/dynamo/common/utils/output_modalities.py
lib/bindings/python/rust/lib.rs
lib/llm/src/discovery/model.rs
lib/llm/src/discovery/model_manager.rs
lib/llm/src/discovery/watcher.rs
lib/llm/src/discovery/worker_set.rs
lib/llm/src/http/service/metrics.rs
lib/llm/src/http/service/openai.rs
lib/llm/src/http/service/service_v2.rs
lib/llm/src/protocols/openai/audios/aggregator.rs
lib/llm/src/protocols/openai/audios/nvext.rs
lib/llm/src/protocols/openai/audios.rs
lib/llm/src/protocols/openai.rs

Metadata

Labels: contribution-request (External contributor proposing to implement a change)
