
Add ray data option for video benchmarks #2002

Open
oyilmaz-nvidia wants to merge 1 commit into main from onur/add-ray-data-for-video-benchmarks

@oyilmaz-nvidia
Contributor

Add ray_data variants for video nightly benchmarks

Summary

Adds ray_data executor coverage for the four video pipelines that currently run only on Xenna in nightly benchmarks, so both backends are tracked side by side every night. This matches the dual-executor pattern already used for audio_readspeech_*, image_curation_*, domain_classification_*, and others.

This is a config-only change to benchmarking/nightly-benchmark.yaml. No Python edits are needed because video_pipeline_benchmark.py already accepts --executor={xenna,ray_data} and routes through setup_executor().
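To make the routing concrete, here is a minimal sketch of how a CLI can restrict --executor to the two backends. It assumes argparse and models only argument parsing: the real video_pipeline_benchmark.py and its setup_executor() are not shown in this PR, so parse_benchmark_args and SUPPORTED_EXECUTORS are hypothetical names.

```python
import argparse

# Hypothetical constant: the two backend names accepted by --executor={xenna,ray_data}.
SUPPORTED_EXECUTORS = ("xenna", "ray_data")


def parse_benchmark_args(argv: list[str]) -> argparse.Namespace:
    """Parse only the executor flag; other pipeline flags are left untouched.

    Hypothetical helper. The real script passes the chosen executor on to
    setup_executor(), whose signature is not shown in this PR.
    """
    parser = argparse.ArgumentParser(description="video benchmark executor selection (sketch)")
    # argparse rejects any value outside `choices` and exits with status 2.
    parser.add_argument("--executor", choices=SUPPORTED_EXECUTORS, required=True)
    # parse_known_args returns (namespace, leftover_args); leftovers are ignored here.
    args, _other_flags = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    # The _raydata entries differ from their _xenna siblings only in this flag.
    print(parse_benchmark_args(["--executor=ray_data"]).executor)
```

Because the choice is validated at parse time, a typo in the nightly config (for example --executor=raydata) fails immediately instead of silently falling back to another backend.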

Changes

Renamed each existing video benchmark to add an explicit _xenna suffix and added a sibling _raydata entry (identical args except --executor=ray_data):

| Pipeline | Xenna entry | Ray Data entry | Timeout | num_clips (exact) | Min throughput |
|---|---|---|---|---|---|
| Embedding | video_embedding_xenna | video_embedding_raydata | 400 s | 1400 | 4.0 clips/s |
| Transcoding | video_transcoding_xenna | video_transcoding_raydata | 400 s | 1400 | 5.0 clips/s |
| Captioning | video_captioning_xenna | video_captioning_raydata | 1800 s | 377 | 0.25 clips/s |
| TransNetV2 + filters | video_transnetv2_motion_aesthetic_filter_embeddings_xenna | video_transnetv2_motion_aesthetic_filter_embeddings_raydata | 800 s | 113 | 0.25 clips/s |
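The renaming above is mechanical, so it can be sketched as a small helper that turns one base entry into an _xenna/_raydata pair. This assumes a simplified, hypothetical entry shape (a dict with name and args keys, where args is a list of flag strings); the actual schema of nightly-benchmark.yaml is not shown here. Every other field, such as timeouts and requirements, is deep-copied unchanged, mirroring "identical args except --executor=ray_data".

```python
import copy

# Name suffix used for each backend in the nightly config.
EXECUTOR_SUFFIXES = {"xenna": "_xenna", "ray_data": "_raydata"}


def with_executor(args: list[str], executor: str) -> list[str]:
    """Drop any existing --executor=... flag and append the requested one.

    Only the `--executor=value` spelling used in this PR is handled.
    """
    kept = [arg for arg in args if not arg.startswith("--executor=")]
    return kept + [f"--executor={executor}"]


def make_executor_pair(base: dict) -> list[dict]:
    """Return [xenna_entry, raydata_entry] built from a hypothetical base entry."""
    pair = []
    for executor, suffix in EXECUTOR_SUFFIXES.items():
        entry = copy.deepcopy(base)  # timeouts, requirements, etc. stay identical
        entry["name"] = base["name"] + suffix
        entry["args"] = with_executor(base["args"], executor)
        pair.append(entry)
    return pair


if __name__ == "__main__":
    base = {"name": "video_embedding", "args": ["--executor=xenna"], "timeout": 400}
    for entry in make_executor_pair(base):
        print(entry["name"], entry["args"], entry["timeout"])
```

The sketch always writes an explicit --executor flag, even for Xenna; whether the real _xenna entries do so is not stated in the PR.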

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

@oyilmaz-nvidia oyilmaz-nvidia marked this pull request as ready for review May 19, 2026 23:30
@oyilmaz-nvidia oyilmaz-nvidia requested a review from a team as a code owner May 19, 2026 23:30
@oyilmaz-nvidia oyilmaz-nvidia requested review from ayushdg and removed request for a team May 19, 2026 23:30
@greptile-apps
Contributor

greptile-apps Bot commented May 19, 2026

Greptile Summary

This PR adds ray_data executor variants for the four existing video pipeline benchmarks (embedding, transcoding, captioning, transnetv2_motion_aesthetic_filter_embeddings), matching the dual-executor pattern used for audio, image, and domain classification pipelines. It is a config-only change with no Python edits.

  • Each existing video_* benchmark is renamed with an _xenna suffix, and a sibling _raydata entry is added with --executor=ray_data as the only argument difference, keeping all other flags, timeouts, and requirements identical.
  • All four raydata entries correctly inherit the same exact_value clip counts, min_value throughput requirements, and sink_data Slack notifications as their xenna counterparts.
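The claim that each pair differs only in the executor is easy to check mechanically. Below is a minimal sketch using the same kind of hypothetical entry shape (a dict with name and args, where args is a list of flags); strip_executor and differing_fields are illustrative names, not part of the repository.

```python
def strip_executor(entry: dict, suffix: str) -> dict:
    """Remove the backend suffix from the name and any --executor=... flag from args."""
    if not entry["name"].endswith(suffix):
        raise ValueError(f"{entry['name']!r} does not end with {suffix!r}")
    normalized = dict(entry)
    normalized["name"] = entry["name"][: -len(suffix)]
    normalized["args"] = [a for a in entry["args"] if not a.startswith("--executor=")]
    return normalized


def differing_fields(xenna_entry: dict, raydata_entry: dict) -> set[str]:
    """Return the keys that still differ once executor-specific parts are removed."""
    a = strip_executor(xenna_entry, "_xenna")
    b = strip_executor(raydata_entry, "_raydata")
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


if __name__ == "__main__":
    xenna = {"name": "video_captioning_xenna", "args": ["--executor=xenna"], "timeout": 1800}
    raydata = {"name": "video_captioning_raydata", "args": ["--executor=ray_data"], "timeout": 1800}
    print(differing_fields(xenna, raydata))  # set()
```

An empty set for every pair corresponds to the "exact copies with only --executor=ray_data substituted" property described in the review.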

Confidence Score: 5/5

Config-only addition that faithfully mirrors existing xenna entries; no logic changes.

All four raydata entries are verified to be exact copies of their xenna counterparts with only --executor=ray_data substituted. GPU resource flags, argument sets, timeouts, clip count requirements, and throughput thresholds are all consistent across every pair.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| benchmarking/nightly-benchmark.yaml | Adds four ray_data benchmark entries mirroring existing xenna entries; renames four existing entries with an _xenna suffix. All args, timeouts, requirements, and GPU resource flags are consistent between pairs. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[nightly-benchmark.yaml] --> B[video_pipeline_benchmark.py]
    B --> C{--executor}
    C -->|xenna| D[video_embedding_xenna]
    C -->|ray_data| E[video_embedding_raydata]
    C -->|xenna| F[video_transcoding_xenna]
    C -->|ray_data| G[video_transcoding_raydata]
    C -->|xenna| H[video_captioning_xenna]
    C -->|ray_data| I[video_captioning_raydata]
    C -->|xenna| J[video_transnetv2_..._xenna]
    C -->|ray_data| K[video_transnetv2_..._raydata]

Reviews (1): Last reviewed commit: "Add ray data option for video benchmarks"

Comment on lines +866 to +870
```yaml
- metric: num_clips_generated
  exact_value: 1400
- metric: throughput_clips_per_sec
  min_value: 4.0
```

Contributor

The throughput might differ under Ray Data. Do we want to run and verify these first and then add them, or add them first and fix the thresholds afterwards?
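The reviewer's concern can be made concrete with a small checker for the requirements fragment quoted above (num_clips_generated exactly 1400, throughput_clips_per_sec at least 4.0). It assumes, without confirmation from the PR, that exact_value means the metric must equal the value and min_value means it must be greater than or equal to it; failed_requirements is a hypothetical name. Under that reading, a copied min_value fails any Ray Data run that produces the right clips but runs slower than the Xenna threshold.

```python
from typing import Mapping


def failed_requirements(metrics: Mapping[str, float], requirements: list[dict]) -> list[str]:
    """Check benchmark metrics against requirement entries.

    Assumed semantics: `exact_value` requires equality, `min_value` requires
    the metric to be greater than or equal to the threshold.
    """
    failures = []
    for req in requirements:
        name = req["metric"]
        if name not in metrics:
            failures.append(f"{name}: metric missing")
            continue
        value = metrics[name]
        if "exact_value" in req and value != req["exact_value"]:
            failures.append(f"{name}: expected {req['exact_value']}, got {value}")
        if "min_value" in req and value < req["min_value"]:
            failures.append(f"{name}: expected >= {req['min_value']}, got {value}")
    return failures


if __name__ == "__main__":
    # Requirements from the fragment under review.
    requirements = [
        {"metric": "num_clips_generated", "exact_value": 1400},
        {"metric": "throughput_clips_per_sec", "min_value": 4.0},
    ]
    # Illustrative numbers only; no Ray Data measurements are given in this PR.
    print(failed_requirements({"num_clips_generated": 1400, "throughput_clips_per_sec": 3.5},
                              requirements))
```

The clip-count check is backend-independent, while the throughput check is the one that depends on which executor ran, which is why the question is about thresholds rather than correctness.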
