refactor(vLLM): Move video support from example to backend #7663
Draft
Overview:

- Replace model-name allowlists with capability-driven vision loading and multimodal handling.
- Add native `video_url` loading in the standard `TokensPrompt` `multi_modal_data` flow; see the request sketch below.
- Move the video agg/disagg launch scripts under `examples/backends/vllm` and update docs/tests.
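As a rough end-to-end illustration of the native `video_url` path, a standard OpenAI-style chat request with a `video_url` content part should exercise it (a sketch, not code from this PR: the port, clip URL, and payload values are placeholder assumptions):

```bash
# Hypothetical request against a worker started via the launch scripts
# (assumes the frontend listens on localhost:8000; the clip URL is a placeholder).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2-VL-2B-Instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
        {"type": "text", "text": "Describe what happens in this video."}
      ]
    }],
    "max_tokens": 128
  }'
```

Because vision support is now detected from model capabilities rather than a name allowlist, any multimodal model the backend loads should accept this shape of request without further configuration.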
Details:
Quick Benchmark: Dynamo vs `vllm serve` for Video Inference

I ran a quick apples-to-apples comparison between Dynamo aggregate mode (`examples/backends/vllm/launch/video_agg.sh`) and plain `vllm serve`, both serving `Qwen/Qwen2-VL-2B-Instruct` on the same machine and GPU configuration.

Benchmark command:
```
aiperf profile \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --url localhost:8000 \
  --video-width 640 \
  --video-height 480 \
  --video-fps 4 \
  --video-duration 5.0 \
  --request-count 20 \
  --osl 1200 \
  --osl-stddev 0 \
  --extra-inputs '{"ignore_eos": true, "min_tokens": 1200}' \
  --use-server-token-count \
  --ui none \
  --no-server-metrics \
  --no-gpu-telemetry
```
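For context, the two servers under comparison were launched roughly as follows (a sketch: the Dynamo script path is the one moved by this PR, while the baseline invocation and flags are assumptions):

```bash
# Dynamo aggregated serving (script relocated by this PR):
bash examples/backends/vllm/launch/video_agg.sh

# Plain vLLM baseline on the same port (assumed invocation):
vllm serve Qwen/Qwen2-VL-2B-Instruct --port 8000
```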
Both runs completed successfully with identical prompt/completion lengths (962 prompt tokens, 1200 completion tokens, 20/20 requests completed):

| Metric | Dynamo (`video_agg.sh`) | `vllm serve` | Delta |
| --- | --- | --- | --- |
| Request throughput | 0.1711 req/s | 0.1715 req/s | vLLM +0.23% |
| Request latency (avg) | 5842.53 ms | 5829.02 ms | vLLM -0.23% |
| Request latency (p50) | 5648.22 ms | 5631.58 ms | vLLM -0.30% |
| Request latency (p90) | 5688.17 ms | 5665.62 ms | vLLM -0.40% |
| Request latency (p99) | 8735.89 ms | 8833.25 ms | vLLM +1.11% |
| Output token throughput | 205.32 tok/s | 205.79 tok/s | vLLM +0.23% |
| Total token throughput | 369.92 tok/s | 370.77 tok/s | vLLM +0.23% |
| Benchmark duration | 116.89 s | 116.62 s | vLLM -0.23% |

Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)