feat: add XPU container run script #7669
ZhengHongming888 wants to merge 1 commit into ai-dynamo:main
Conversation
Add run_xpu.sh script for launching Dynamo containers with Intel XPU support. The script handles XPU-specific device access, DRI group management, and provides flexible container configuration options. Co-Authored-By: Claude Sonnet 4.5 <[email protected]> Signed-off-by: Hongming Zheng <[email protected]>
👋 Hi ZhengHongming888! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough: A new executable Bash script is added at container/run_xpu.sh.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (2)
container/run_xpu.sh (2)
229-239: Minor inconsistency with run.sh mount path.

The /mnt mount on line 232 lacks the trailing slash that run.sh uses (/mnt/:/mnt). While functionally equivalent for directories, consider aligning for consistency across container run scripts.

  VOLUME_MOUNTS+=" -v ${SOURCE_DIR}/..:/workspace "
  VOLUME_MOUNTS+=" -v /tmp:/tmp "
- VOLUME_MOUNTS+=" -v /mnt:/mnt "
+ VOLUME_MOUNTS+=" -v /mnt/:/mnt "

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@container/run_xpu.sh` around lines 229 - 239, In the MOUNT_WORKSPACE block where VOLUME_MOUNTS is assembled (conditioned by MOUNT_WORKSPACE and using SOURCE_DIR), change the mount entry currently added as "-v /mnt:/mnt" to include the trailing slash so it becomes "-v /mnt/:/mnt" to match the style used in run.sh; update the VOLUME_MOUNTS concatenation that references /mnt so the variable and surrounding concatenations (VOLUME_MOUNTS, MOUNT_WORKSPACE, SOURCE_DIR) remain consistent.
281-288: Consider multi-XPU environments.

The script only checks /dev/dri/renderD128, but systems with multiple Intel XPUs will have additional render nodes (renderD129, renderD130, etc.). While the GID is typically the same for all render devices (the render group), consider:

- Verifying /dev/dri exists rather than a specific device
- Documenting this assumption in the help text

  # XPU specific: Add DRI device group for GPU access
- if [ -e /dev/dri/renderD128 ]; then
-     DRI_GROUP=$(stat -c '%g' /dev/dri/renderD128)
+ # Get GID from any render device (all share the same group)
+ RENDER_DEVICE=$(ls /dev/dri/renderD* 2>/dev/null | head -n1)
+ if [ -n "$RENDER_DEVICE" ]; then
+     DRI_GROUP=$(stat -c '%g' "$RENDER_DEVICE")
      GROUP_ADD_STRING="--group-add ${DRI_GROUP}"
  else
      GROUP_ADD_STRING=""
-     echo "Warning: /dev/dri/renderD128 not found. XPU access may not work properly."
+     echo "Warning: No /dev/dri/renderD* devices found. XPU access may not work properly."
  fi

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@container/run_xpu.sh` around lines 281 - 288, The check for a single device node /dev/dri/renderD128 is fragile in multi-XPU setups; update the logic around DRI_GROUP and GROUP_ADD_STRING to first verify /dev/dri exists and then derive the group ID from any renderD* entry (e.g., find the first /dev/dri/renderD*), and update the help/warning text to document the assumption that all render devices share the same GID; keep using DRI_GROUP and GROUP_ADD_STRING but replace the hardcoded path check with a /dev/dri presence check and a probe for renderD* to obtain the GID.
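The renderD* probe suggested in this comment can be sketched as a small helper. Note this is an illustrative sketch, not the code in run_xpu.sh as submitted: the function name detect_dri_group and its directory argument are hypothetical (the argument only exists to make the sketch testable against a scratch directory).

```shell
#!/usr/bin/env bash
# Sketch of the suggested fix: derive the DRI group ID from the first
# available render node instead of hard-coding renderD128.
# detect_dri_group and its dri_dir argument are illustrative only.
detect_dri_group() {
    local dri_dir=${1:-/dev/dri}
    local render_device
    # Pick any render node; all of them share the "render" group GID.
    render_device=$(ls "${dri_dir}"/renderD* 2>/dev/null | head -n1)
    if [ -n "$render_device" ]; then
        stat -c '%g' "$render_device"
    fi
}

DRI_GROUP=$(detect_dri_group)
if [ -n "$DRI_GROUP" ]; then
    GROUP_ADD_STRING="--group-add ${DRI_GROUP}"
else
    GROUP_ADD_STRING=""
    echo "Warning: no /dev/dri/renderD* devices found; XPU access may not work." >&2
fi
```

On a host without any render nodes the helper prints nothing and the script falls through to the warning branch, which is the behavior the review asks for.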
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@container/run_xpu.sh`:
- Around line 352-372: The docker run command invoked by ${RUN_PREFIX} needs
explicit XPU device passthrough when running non-privileged (i.e. when
${PRIVILEGED_STRING} indicates FALSE); update the script to add device flags
(for example --device /dev/dri --device /dev/dri/renderD128 or a new
${DEVICE_STRING} variable) into the docker run invocation path that is used when
PRIVILEGED_STRING is not set, so the container retains access to /dev/dri
devices; alternatively, if non-privileged XPU is unsupported, remove the
--privileged option or update the help text accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bf5a77e0-1ae4-4a79-aeee-ac53125f0bd2
📒 Files selected for processing (1)
container/run_xpu.sh
${RUN_PREFIX} docker run \
    ${INTERACTIVE} \
    ${RM_STRING} \
    ${PRIVILEGED_STRING} \
    ${GROUP_ADD_STRING} \
    --network "$NETWORK" \
    --shm-size=10G \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    ${ENVIRONMENT_VARIABLES} \
    ${VOLUME_MOUNTS} \
    ${PORT_MAPPINGS} \
    -w "$WORKDIR" \
    --cap-add CAP_SYS_PTRACE \
    --ipc host \
    ${USER_STRING} \
    ${NAME_STRING} \
    ${ENTRYPOINT_STRING} \
    ${IMAGE} \
    "${REMAINING_ARGS[@]}"
Missing XPU device passthrough for non-privileged mode.
When --privileged FALSE is specified, the container loses access to XPU devices because there's no --device flag to explicitly pass /dev/dri. The --group-add provides permission but not device visibility.
Consider adding device passthrough when not running privileged:
+ # XPU device passthrough (only needed when not privileged)
+ if [[ ${PRIVILEGED^^} == "FALSE" ]] && [ -d /dev/dri ]; then
+ DEVICE_STRING="--device=/dev/dri"
+ else
+ DEVICE_STRING=""
+ fi
+
${RUN_PREFIX} docker run \
+ ${DEVICE_STRING} \
${INTERACTIVE} \
      ${RM_STRING} \

Alternatively, if non-privileged XPU mode isn't intended to be supported, consider either removing the --privileged option or documenting this limitation in the help text.
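The passthrough logic suggested above can be sketched as a small helper. This is an illustrative sketch, not run_xpu.sh as submitted: build_device_string is a hypothetical name, and its arguments exist only to make the behavior testable in isolation.

```shell
#!/usr/bin/env bash
# Sketch of the suggested passthrough: emit a --device flag only when the
# container is NOT privileged and /dev/dri actually exists. Privileged
# containers already see all host devices, so no flag is needed there.
# build_device_string and its arguments are illustrative only.
build_device_string() {
    local privileged=$1
    local dri_dir=${2:-/dev/dri}
    if [ "${privileged^^}" = "FALSE" ] && [ -d "$dri_dir" ]; then
        echo "--device=${dri_dir}"
    fi
}

# The resulting string would be spliced into the docker run invocation,
# alongside ${GROUP_ADD_STRING}, which grants permission but not visibility.
DEVICE_STRING=$(build_device_string "${PRIVILEGED:-TRUE}")
```

The case-folding with `${privileged^^}` mirrors how the script's other flags accept TRUE/FALSE in any case; on hosts without /dev/dri the helper emits nothing and docker run proceeds without a device flag.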
Overview:
This PR is to add run_xpu.sh script for launching Dynamo containers with Intel XPU support. The script handles XPU-specific device access, DRI group management, and provides flexible container configuration options.
Details:
Generate the Dockerfile for XPU:

python container/render.py --framework vllm --target local-dev --device xpu --output-short-filename

Build the image:

docker build --progress=plain --target local-dev -t dynamo:latest-vllm17.1-local-dev6 -f container/rendered.Dockerfile --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) --build-arg http_proxy=... --build-arg https_proxy=... --build-arg no_proxy=localhost,127.0.0.1,0.0.0.0 .

Create the container:

container/run_xpu.sh --mount-workspace --image dynamo:latest-vllm17.1-local-dev -it

For a local-dev container you still need to install Dynamo inside the container:

cargo build --features dynamo-llm/block-manager && cd /workspace/lib/bindings/python && maturin develop --uv && cd /workspace && uv pip install --no-deps -e /workspace

With run_xpu.sh, the steps for creating the XPU container are simplified and aligned with the Dynamo templating style.

Tested with run_xpu.sh it works fine.
Thanks.