Describe the Bug
When I attempted to deploy a Qwen3 model using the nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13 image with LMCache, the vLLM prefill worker failed with the error: No module named 'lmcache'.
Steps to Reproduce
- Create a DynamoGraphDeployment that specifies disaggregated serving:
  - using the nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1-cuda13 image
  - using LMCache as the KV cache offloading backend
Expected Behavior
Both the prefill and decode workers should start up successfully.
Actual Behavior
The decode worker could start, but the prefill worker failed with the following logs:
(Worker_TP1 pid=1431) INFO 03-23 02:31:45 [factory.py:64] Creating v1 connector with name: PdConnector and engine_id: b37d8061-db07-416c-9c5f-f5dd04dfc762
(Worker_TP1 pid=1431) WARNING 03-23 02:31:45 [base.py:166] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(Worker_TP1 pid=1431) WARNING 03-23 02:31:45 [base.py:166] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
(Worker_TP1 pid=1431) INFO 03-23 02:31:45 [lmcache_connector.py:95] Initializing latest dev LMCache connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] WorkerProc hit an exception.
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] Traceback (most recent call last):
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] output = func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 316, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] return func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 421, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 82, in create_connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] return connector_cls(config, role, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/kvbm/vllm_integration/connector/pd_connector.py", line 62, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] super().__init__(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 130, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] self._connectors.append(connector_cls(temp_config, role, kv_cache_config))
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 97, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] from lmcache.integration.vllm.vllm_v1_adapter import (
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ModuleNotFoundError: No module named 'lmcache'
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] Traceback (most recent call last):
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 858, in worker_busy_loop
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] output = func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 316, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] return func(*args, **kwargs)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 421, in initialize_from_config
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 82, in create_connector
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] return connector_cls(config, role, kv_cache_config)
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/kvbm/vllm_integration/connector/pd_connector.py", line 62, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] super().__init__(
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 130, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] self._connectors.append(connector_cls(temp_config, role, kv_cache_config))
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] File "/opt/dynamo/venv/lib/python3.12/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py", line 97, in __init__
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] from lmcache.integration.vllm.vllm_v1_adapter import (
(Worker_TP1 pid=1431) ERROR 03-23 02:31:45 [multiproc_executor.py:863] ModuleNotFoundError: No module named 'lmcache'
Environment
- Dynamo platform deployed using the v1.0.1 helm chart.
Additional Context
The prefill and decode workers could both start if I merely changed the image as follows:
- nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.0
- nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.0.1
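For anyone triaging this, a quick way to compare images might be to check whether the lmcache package is importable inside each container. The following is only a minimal sketch: the helper name module_available is made up for illustration, and it assumes the script is run with the same Python interpreter the worker uses (the traceback suggests a venv under /opt/dynamo/venv).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on this interpreter's import path.

    Hypothetical helper for illustration; it only locates the top-level
    module and does not import it.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent,
        # or for modules whose __spec__ is unset.
        return False


if __name__ == "__main__":
    # 'lmcache' is the module the traceback reports as missing;
    # 'vllm' is included as a sanity check that the right venv is in use.
    for name in ("vllm", "lmcache"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If the 1.0.1-cuda13 image reports lmcache as MISSING while the 1.0.0 and 1.0.1 images report it as found, that would point to the package being absent from the cuda13 build rather than to a configuration problem in the deployment.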
Screenshots
No response