Describe the bug
Starting with UCX 1.20.0, the ucx-cuda DEB package declares Recommends: libnvidia-compute | libnvidia-ml1. Since apt installs Recommends by default, this causes NVIDIA driver userspace libraries to be pulled in automatically when installing UCX — even in environments that already have a working GPU driver.
When the version of the recommended libnvidia-compute package (resolved from the apt repository) does not match the kernel driver already installed on the host, this results in a driver/library version mismatch that breaks GPU functionality:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 595.45
In UCX 1.19.x and earlier, the ucx-cuda package had no Recommends field, so installing UCX was harmless to the system's existing driver setup.
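To confirm the mismatch on an affected host, one option is to compare the version of the loaded kernel module with the NVML version that nvidia-smi reports. The following is a minimal sketch, not part of UCX or the NVIDIA tooling: the helper name first_version is hypothetical, and it assumes the NVIDIA kernel module is loaded so that /proc/driver/nvidia/version exists.

```shell
# Extract the first dotted version number (e.g. 590.44.01) from text on stdin.
first_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Kernel module version, as reported by the loaded driver:
#   grep NVRM /proc/driver/nvidia/version | first_version
# Userspace NVML version, taken from the nvidia-smi error text:
#   nvidia-smi 2>&1 | grep 'NVML library version' | first_version
```

If the two values differ in their leading components (590.x on the kernel side versus 595.x on the library side in this report), the userspace libraries no longer match the kernel driver.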
Steps to Reproduce
- Start with a system or container that has a working NVIDIA GPU driver (e.g., kernel driver 590.44.01)
- Install UCX 1.20.0 DEB packages:
wget https://github.com/openucx/ucx/releases/download/v1.20.0/ucx-1.20.0-ubuntu22.04-mofed5-cuda12-x86_64.tar.bz2
tar -xvf ucx-1.20.0-ubuntu22.04-mofed5-cuda12-x86_64.tar.bz2
apt install -y ./*.deb
- Observe that apt automatically installs additional NVIDIA packages as recommended dependencies:
The following additional packages will be installed:
libnvidia-cfg1 libnvidia-common libnvidia-compute libnvidia-decode
libnvidia-gpucomp nvidia-persistenced
- Run nvidia-smi; it fails with Driver/library version mismatch
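The steps above can be checked without modifying the system by simulating the install and listing which NVIDIA packages apt would add. This is a sketch under stated assumptions: the helper name nvidia_pkgs_to_install is hypothetical, and it relies on apt-get --simulate marking each planned installation with a line that starts with "Inst". Passing --no-install-recommends is a possible workaround on the user side, not a fix for the package metadata.

```shell
# List NVIDIA-related packages that an apt transaction would install.
# Reads `apt-get --simulate` output on stdin; "Inst <pkg> ..." lines mark installs.
nvidia_pkgs_to_install() {
    awk '$1 == "Inst" && $2 ~ /nvidia/ { print $2 }'
}

# Default behaviour (Recommends are followed), run in the directory with the .deb files:
#   apt-get install --simulate ./*.deb | nvidia_pkgs_to_install
# Possible workaround (Recommends are skipped); the list should then be empty:
#   apt-get install --simulate --no-install-recommends ./*.deb | nvidia_pkgs_to_install
```

Once the simulated run shows no NVIDIA packages, the same --no-install-recommends flag can be passed to the real install command.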
Expected behavior
UCX should not pull in driver packages, even as soft dependencies. UCX uses the CUDA Driver API via forward-compatible libcuda.so, which is designed to work across driver versions. The driver is a system-level component managed independently of UCX.
Setup and versions
- UCX version: 1.20.0
- OS: Ubuntu 22.04
- Package: ucx-1.20.0-ubuntu22.04-mofed5-cuda12-x86_64.tar.bz2
- Host driver: 590.44.01
- Pulled driver: 595.45.04 (from NVIDIA CUDA apt repository)
Additional information