[BUG]: No module named 'lmcache' in vLLM 1.0.1-cuda13 image #7568

@george-kuanli-peng

Describe the Bug

When I tried to deploy a Qwen3 model with LMCache using the nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13 image, the vLLM prefill worker failed with the error: No module named 'lmcache'.

Steps to Reproduce

  1. Create a DynamoGraphDeployment that specifies disaggregated serving:
    1. using the nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13 image
    2. using LMCache as KV cache offloading backend

Expected Behavior

Both the prefill and decode workers should start up successfully.

Actual Behavior

The decode worker started, but the prefill worker failed with the following logs:

(Worker_TP1 pid=1431) INFO 03-23 02:31:45 [factory.py:64] Creating v1 connector with name: PdConnector and engine_id: b37d8061-db07-416c-9c5f-f5dd04dfc762
(Worker_TP1 pid=1431) WARNING 03-23 02:31:45 [base.py:166] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(Worker_TP1 pid=1431) WARNING 03-23 02:31:45 [base.py:166] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(Worker_TP1 pid=1431) INFO 03-23 02:31:45 [lmcache_connector.py:95] Initializing latest dev LMCache connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] WorkerProc hit an exception.
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] Traceback (most recent call last):
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     output = func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 316, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self.worker.initialize_from_config(kv_cache_config)  # type: ignore
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 421, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 82, in create_connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return connector_cls(config, role, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/kvbm/vllm_integration/connector/pd_connector.py", line 62, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     super().__init__(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 130, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self._connectors.append(connector_cls(temp_config, role, kv_cache_config))
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 97, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     from lmcache.integration.vllm.vllm_v1_adapter import (
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ModuleNotFoundError: No module named 'lmcache'
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] Traceback (most recent call last):
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     output = func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 316, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self.worker.initialize_from_config(kv_cache_config)  # type: ignore
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 421, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 82, in create_connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     return connector_cls(config, role, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/kvbm/vllm_integration/connector/pd_connector.py", line 62, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     super().__init__(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 130, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     self._connectors.append(connector_cls(temp_config, role, kv_cache_config))
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]   File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 97, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863]     from lmcache.integration.vllm.vllm_v1_adapter import (
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ModuleNotFoundError: No module named 'lmcache'

Environment

  1. Dynamo platform deployed using the v1.0.1 Helm chart.

Additional Context

Both the prefill and decode workers start successfully if I change only the image to either of the following:

  1. nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
  2. nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1
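
To confirm whether the difference between the images is simply the presence of the lmcache package, a check along these lines could be run inside each container. This is a minimal sketch, not something taken from the Dynamo or vLLM tooling: the helper name describe_package is hypothetical, and it assumes the script is run with the same Python environment the workers use (the traceback above points at /opt/dynamo/venv).

```python
import importlib.util
from importlib import metadata


def describe_package(name: str) -> str:
    """Report whether a top-level module can be found and, if known, its installed version.

    describe_package is a hypothetical helper written for this report.
    """
    # find_spec returns None when a top-level module cannot be located on sys.path.
    if importlib.util.find_spec(name) is None:
        return f"{name}: not importable"
    try:
        # Assumes the distribution name matches the module name, which may not always hold.
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{name}: importable ({version})"


if __name__ == "__main__":
    # vllm is checked as a baseline; lmcache is the module the traceback reports as missing.
    for pkg in ("vllm", "lmcache"):
        print(describe_package(pkg))
```

Running it in the 1.0.1-cuda13 image and in one of the images listed above would show whether lmcache is absent only from the cuda13 variant, or present but installed somewhere the worker's interpreter does not see.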

Screenshots

No response
