[BUG]: No module named 'lmcache' in vLLM `1.0.1-cuda13` image #7568

### Describe the Bug

When I attempted to deploy a Qwen3 model using the `nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13` image with LMCache, the vLLM prefill worker failed with the error `No module named 'lmcache'`.

### Steps to Reproduce

1. Create a `DynamoGraphDeployment` that specifies disaggregated serving:
   1. using the `nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13` image
   1. using LMCache as the KV cache offloading backend

### Expected Behavior

Both the prefill and decode workers should start up successfully.

### Actual Behavior

The decode worker started, but the prefill worker failed with the following logs:

```console
(Worker_TP1 pid=1431) INFO 03-23 02:31:45 [factory.py:64] Creating v1 connector with name: PdConnector and engine_id: b37d8061-db07-416c-9c5f-f5dd04dfc762
(Worker_TP1 pid=1431) WARNING 03-23 02:31:45 [base.py:166] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(Worker_TP1 pid=1431) WARNING 03-23 02:31:45 [base.py:166] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(Worker_TP1 pid=1431) INFO 03-23 02:31:45 [lmcache_connector.py:95] Initializing latest dev LMCache connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] WorkerProc hit an exception.
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] Traceback (most recent call last):
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     output = func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 316, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self.worker.initialize_from_config(kv_cache_config)  # type: ignore
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 421, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 82, in create_connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return connector_cls(config, role, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/kvbm/vllm_integration/connector/pd_connector.py", line 62, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     super().__init__(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 130, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self._connectors.append(connector_cls(temp_config, role, kv_cache_config))
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 97, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     from lmcache.integration.vllm.vllm_v1_adapter import (
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ModuleNotFoundError: No module named 'lmcache'
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] Traceback (most recent call last):
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     output = func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 316, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self.worker.initialize_from_config(kv_cache_config)  # type: ignore
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 421, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 82, in create_connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return connector_cls(config, role, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/kvbm/vllm_integration/connector/pd_connector.py", line 62, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     super().__init__(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 130, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self._connectors.append(connector_cls(temp_config, role, kv_cache_config))
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 97, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     from lmcache.integration.vllm.vllm_v1_adapter import (
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ModuleNotFoundError: No module named 'lmcache'
```
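
The traceback ends at `from lmcache.integration.vllm.vllm_v1_adapter import (`, so one quick way to confirm the failure independently of vLLM is to check whether the interpreter inside the container can locate the package at all. Below is a minimal sketch, assuming it is run with the same Python environment the worker uses (the traceback paths suggest `/opt/dynamo/venv`). `describe_module` is a hypothetical helper name for illustration, not part of Dynamo, vLLM, or LMCache:

```python
import importlib.metadata
import importlib.util


def describe_module(name: str) -> str:
    """Report whether a top-level module can be found and, if so, its version."""
    # find_spec returns None when the top-level module is not importable.
    if importlib.util.find_spec(name) is None:
        return f"{name}: NOT installed"
    try:
        # Assumption: the distribution name matches the module name.
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{name}: installed ({version})"


if __name__ == "__main__":
    for module in ("vllm", "lmcache"):
        print(describe_module(module))
```

If this reports `lmcache: NOT installed` in the `1.0.1-cuda13` image but not in the other tags, that would be consistent with the package simply being absent from that image rather than a misconfiguration in the deployment.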

### Environment

1. Dynamo platform deployed using `v1.0.1` helm chart.

### Additional Context

Both the prefill and decode workers start successfully if I change only the image, to either of the following:
1. `nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0`
2. `nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1`
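
To compare the three tags outside of Kubernetes, the same import can be attempted directly in each image. The sketch below only builds and prints the `docker run` commands. The interpreter path `/opt/dynamo/venv/bin/python` is an assumption inferred from the `site-packages` path in the traceback, and `probe_command` is a hypothetical helper name. I have not verified whether importing `lmcache` in the working images requires a GPU to be attached.

```python
import shlex

IMAGES = [
    "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0",
    "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1",
    "nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13",
]

# Assumption: interpreter location inferred from the traceback's venv path.
PYTHON = "/opt/dynamo/venv/bin/python"


def probe_command(image: str) -> list[str]:
    """Build a `docker run` command that tries to import lmcache in `image`."""
    return [
        "docker", "run", "--rm",
        "--entrypoint", PYTHON,
        image,
        "-c", "import lmcache",
    ]


if __name__ == "__main__":
    # Print shell-safe commands; run them manually and compare exit codes.
    for image in IMAGES:
        print(shlex.join(probe_command(image)))
```

A non-zero exit status with `ModuleNotFoundError` for only the `1.0.1-cuda13` tag would match the behavior described above.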

### Screenshots

_No response_
