
feat(recipes): add GAIE recipe for llama-3-70b vLLM disagg-multi-node #7651

Open
AsadShahid04 wants to merge 3 commits into ai-dynamo:main from AsadShahid04:fix/inf-issue-7392
Conversation

@AsadShahid04

@AsadShahid04 AsadShahid04 commented Mar 26, 2026

Add a GAIE (Gateway API Inference Extension) recipe for llama-3-70b with vLLM in a disaggregated multi-node configuration. This completes GAIE support across all llama-3-70b vLLM recipe shapes (agg, disagg-single-node, and disagg-multi-node).

Includes:

  • deploy.yaml with the Epp (Endpoint Picker) service and a dual-worker (prefill and decode) setup
  • http-route.yaml with the HTTPRoute that routes gateway traffic to the deployment

Fixes #7392

Summary by CodeRabbit

  • New Features
    • Deployed Llama-3-70b model with optimized distributed inference configuration
    • Configured HTTP routing and endpoint access for the model deployment

Add a GAIE (Gateway API Inference Extension) recipe for llama-3-70b with
vLLM in a disaggregated multi-node configuration. This completes GAIE
support across all llama-3-70b vLLM recipe shapes (agg,
disagg-single-node, and disagg-multi-node).

Includes:
- deploy.yaml with the Epp (Endpoint Picker) service and a dual-worker (prefill and decode) setup
- http-route.yaml with the HTTPRoute that routes gateway traffic to the deployment

Fixes ai-dynamo#7392

Signed-off-by: Asad Shahid <[email protected]>
@AsadShahid04 AsadShahid04 requested review from a team as code owners March 26, 2026 04:39
@copy-pr-bot

copy-pr-bot bot commented Mar 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

👋 Hi AsadShahid04! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the feat and external-contribution labels Mar 26, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

Walkthrough

Added two new Kubernetes manifest files for a GAIE recipe supporting disaggregated multi-node deployment of Llama 3 70B with vLLM. The deploy.yaml defines a DynamoGraphDeployment with an endpoint service and separate prefill/decode worker services configured with GPU resources, shared memory, and disaggregation scheduling. The http-route.yaml defines an HTTPRoute for external traffic routing to the deployment.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GAIE Deployment Configuration: recipes/llama-3-70b/vllm/disagg-multi-node/gaie/deploy.yaml | New DynamoGraphDeployment manifest defining three services (Epp, VllmPrefillWorker, VllmDecodeWorker) with GPU allocation (8 per worker), shared memory configuration (80Gi), model cache mounting, disaggregation scheduling profiles, and sidecar frontend components. |
| GAIE Network Routing: recipes/llama-3-70b/vllm/disagg-multi-node/gaie/http-route.yaml | New HTTPRoute manifest routing traffic to the llama3-70b-disagg-pool InferencePool backend with a 300s request timeout on path /. |
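
For orientation, the routing summary above could correspond to a manifest shaped roughly like the sketch below. This is not the file from the PR. The route name, the parent Gateway name, the `PathPrefix` match type, and the InferencePool API group are assumptions made for illustration. Only the backend name `llama3-70b-disagg-pool`, the path `/`, and the 300s request timeout come from the walkthrough.

```yaml
# Illustrative sketch only; the actual http-route.yaml in the PR may differ.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llama3-70b-disagg-route        # hypothetical route name
spec:
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: inference-gateway          # hypothetical Gateway name
  rules:
    - matches:
        - path:
            type: PathPrefix           # assumption: the summary only says "path /"
            value: /
      backendRefs:
        # Assumption: GAIE v1 API group; older releases use
        # inference.networking.x-k8s.io instead.
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: llama3-70b-disagg-pool # backend name from the walkthrough
      timeouts:
        request: 300s                  # request timeout from the walkthrough
```

A long request timeout like this is a common choice for LLM serving, where a single generation can run far longer than typical HTTP defaults.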

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | The PR adds deploy.yaml and http-route.yaml files as required by issue #7392, but misses documentation updates and recipe validation tasks. | Complete the linked issue requirements by adding updates to docs/kubernetes/inference-gateway.md, recipes/README.md, recipes/llama-3-70b/README.md, and GAIE recipe validation/tests. |
| Description check | ❓ Inconclusive | The PR description covers the overview and main changes but lacks complete coverage of all required template sections. | Add a 'Where should the reviewer start?' section identifying key files (deploy.yaml and http-route.yaml) to review, and clarify which issues are fixed. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a GAIE recipe for llama-3-70b vLLM in disaggregated multi-node configuration. |
| Out of Scope Changes check | ✅ Passed | All changes in the PR are directly related to adding the GAIE recipe for disagg-multi-node configuration as specified in issue #7392. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@recipes/llama-3-70b/vllm/disagg-multi-node/gaie/deploy.yaml`:
- Around line 1-177: The PR is missing the required documentation and
validation/tests referenced in issue `#7392`; add updates to the Kubernetes
inference gateway doc, the top-level recipes README, and the llama-3-70b recipe
README to document the new GAIE deploy and its parameters, and add GAIE recipe
validation/tests that exercise the new DynamoGraphDeployment named
"llama3-70b-disagg-mn-gaie" and its components (Epp, VllmPrefillWorker,
VllmDecodeWorker) including env/config expectations (e.g., DYN_MODEL_NAME,
DYN_ENFORCE_DISAGG, SERVED_MODEL_NAME, MODEL_PATH, kv-transfer-config and
resource/gpu counts) and ensure test matrix entries cover the new path.

In `@recipes/llama-3-70b/vllm/disagg-multi-node/gaie/http-route.yaml`:
- Around line 16-17: The note and backendRefs usage are inconsistent:
backendRefs.namespace must either be explicitly set where backendRefs are
defined (the backendRefs entries referenced in the file) to the namespace
containing the InferencePool, or the top-level metadata.namespace note must be
updated to explicitly state "this route must be created in the same namespace as
the InferencePool" so omitting backendRefs.namespace is safe; update either the
backendRefs entries to include backendRefs.namespace = <InferencePool-namespace>
(matching your InferencePool resource name/namespace) or change the comment
about metadata.namespace to clearly require same-namespace application for the
route and InferencePool.
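
The two alternatives in the http-route.yaml comment can be sketched as follows. These are fragments for illustration, not content from the PR. The namespaces `route-ns` and `pool-ns`, the ReferenceGrant name, and the InferencePool API group are hypothetical. Under the Gateway API specification a cross-namespace `backendRefs` entry is only honored when a ReferenceGrant in the target namespace allows it, and whether a given gateway implementation supports that for InferencePool backends should be verified.

```yaml
# Option A (fragment of an HTTPRoute rule): same-namespace reference.
# backendRefs.namespace is omitted, so the InferencePool is looked up in the
# namespace where the HTTPRoute itself is created.
backendRefs:
  - group: inference.networking.k8s.io   # assumption: GAIE v1 API group
    kind: InferencePool
    name: llama3-70b-disagg-pool
---
# Option B (fragment of an HTTPRoute rule): explicit cross-namespace reference.
backendRefs:
  - group: inference.networking.k8s.io   # assumption: GAIE v1 API group
    kind: InferencePool
    name: llama3-70b-disagg-pool
    namespace: pool-ns                   # hypothetical InferencePool namespace
---
# Option B also needs a ReferenceGrant in the InferencePool's namespace that
# permits HTTPRoutes from the route's namespace to reference InferencePools.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-to-inference-pool   # hypothetical name
  namespace: pool-ns                     # hypothetical InferencePool namespace
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: route-ns                # hypothetical HTTPRoute namespace
  to:
    - group: inference.networking.k8s.io # assumption: GAIE v1 API group
      kind: InferencePool
```

Option A keeps the recipe to a single manifest, which is why updating the comment to require same-namespace application is the lighter fix.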

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d37f30d-bb25-4d5d-aa69-b92854074c53

📥 Commits

Reviewing files that changed from the base of the PR and between a58bcc3 and 60eeafc.

📒 Files selected for processing (2)
  • recipes/llama-3-70b/vllm/disagg-multi-node/gaie/deploy.yaml
  • recipes/llama-3-70b/vllm/disagg-multi-node/gaie/http-route.yaml

Update inference-gateway.md, recipes/README.md, and llama-3-70b/README.md
to include the new disagg-multi-node GAIE recipe path. Marks GAIE support
as available for the multi-node disaggregated configuration in the recipes
compatibility matrix.

Fixes ai-dynamo#7392

Signed-off-by: Asad Shahid <[email protected]>
@github-actions github-actions bot added the documentation label Mar 27, 2026
@AsadShahid04
Author

Added documentation updates in d6b5574:

  • Updated docs/kubernetes/inference-gateway.md to include disagg-multi-node/gaie/ path alongside existing agg and disagg-single-node references
  • Updated recipes/README.md to mark GAIE support as ✅ for Llama-3-70B disagg-multi-node
  • Updated recipes/llama-3-70b/README.md to list all three GAIE-supported configurations


Labels

documentation, external-contribution, feat, size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add GAIE recipe for llama-3-70b vLLM disagg-multi-node

1 participant