Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.3
Libc version: glibc-2.35
Python version: 3.10.17 (main, May 8 2025, 07:18:04) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.6.0-72.0.0.76.oe2403sp1.aarch64-aarch64-with-glibc2.35
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
Model name: Kunpeng-920
BIOS Model name: HUAWEI Kunpeng 920 5250
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 4
Stepping: 0x1
Frequency boost: disabled
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.0.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.post1.dev20250619
[pip3] torchvision==0.20.1
[pip3] transformers==4.52.4
[conda] Could not collect
vLLM Version: 0.9.2
vLLM Ascend Version: 0.9.2rc1
ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ASCEND_RT_VISIBLE_DEVICES=6,7
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 25.3.rc1 Version: 25.3.rc1 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B4-1 | OK | 124.4 41 0 / 0 |
| 0 | 0000:C1:00.0 | 16 0 / 0 61553/ 65536 |
+===========================+===============+====================================================+
| 1 910B4-1 | OK | 127.8 38 0 / 0 |
| 0 | 0000:C2:00.0 | 15 0 / 0 61555/ 65536 |
+===========================+===============+====================================================+
| 2 910B4-1 | OK | 86.0 34 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3402 / 65536 |
+===========================+===============+====================================================+
| 3 910B4-1 | OK | 90.5 36 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 54701/ 65536 |
+===========================+===============+====================================================+
| 4 910B4-1 | OK | 92.9 40 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 54851/ 65536 |
+===========================+===============+====================================================+
| 5 910B4-1 | OK | 88.1 41 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 3403 / 65536 |
+===========================+===============+====================================================+
| 6 910B4-1 | OK | 96.5 39 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 3403 / 65536 |
+===========================+===============+====================================================+
| 7 910B4-1 | OK | 88.8 39 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 3403 / 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| 0 0 | 1271050 | | 58026 |
+===========================+===============+====================================================+
| 1 0 | 1271052 | | 58026 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| 3 0 | 3533510 | | 51312 |
+===========================+===============+====================================================+
| 4 0 | 3533512 | | 51312 |
+===========================+===============+====================================================+
| No running processes found in NPU 5 |
+===========================+===============+====================================================+
| No running processes found in NPU 6 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+
CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux
🐛 Describe the bug
tail -n 4 ~/.bashrc
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
export PLATFORM=ascend
export ENABLE_SPARSE=TRUE
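Running `python examples/offline_inference_esa.py` fails with an error. The relevant configuration has already been modified as described in the documentation: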
ktc = KVTransferConfig(
    kv_connector=name,
    kv_connector_module_path=module_path,
    kv_role="kv_both",
    kv_connector_extra_config={
        "ucm_connectors": [
            {
                "ucm_connector_name": "UcmNfsStore",
                "ucm_connector_config": {
                    "storage_backends": data_dir,
                    "kv_block_size": 33554432,
                    # "transferStreamNumber":16,
                    # "use_direct": False,
                },
            }
        ],
        "ucm_sparse_config": {
            "ESA": {
                "init_window_sz": 1,
                "local_window_sz": 2,
                "min_blocks": 4,
                "sparse_ratio": 0.2,
                "retrieval_stride": 10,
            }
            # "GSA":{}
        },
    },
)
The error is as follows:
(VllmWorker rank=0 pid=14516) [2026-01-28 15:52:35] - ucm.sparse.factory - INFO [factory.py:43] Creating sparse method with name: ESA
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] WorkerProc failed to start.
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 461, in worker_main
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 357, in __init__
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] self.worker.init_device()
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/vllm-workspace/vllm/vllm/worker/worker_base.py", line 606, in init_device
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] self.worker.init_device() # type: ignore
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 140, in init_device
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] self._init_worker_distributed_environment()
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 340, in _init_worker_distributed_environment
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] ensure_ucm_sparse_initialized(self.vllm_config)
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/workspace/unified-cache-management/ucm/sparse/state.py", line 52, in ensure_ucm_sparse_initialized
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] _UCM_SPARSE_AGENT = UcmSparseFactory.create_sparse_method(
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/workspace/unified-cache-management/ucm/sparse/factory.py", line 44, in create_sparse_method
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] return sparse_method_cls(config, role)
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] File "/workspace/unified-cache-management/ucm/sparse/esa/esa.py", line 456, in __init__
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] self.connector = get_kv_transfer_group().connector.store
(VllmWorker rank=0 pid=14516) ERROR 01-28 15:52:35 [multiproc_executor.py:487] AttributeError: 'UCMDirectConnector' object has no attribute 'store'
ERROR 01-28 15:52:35 [core.py:586] EngineCore failed to start.
ERROR 01-28 15:52:35 [core.py:586] Traceback (most recent call last):
ERROR 01-28 15:52:35 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
ERROR 01-28 15:52:35 [core.py:586] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 01-28 15:52:35 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
ERROR 01-28 15:52:35 [core.py:586] super().__init__(vllm_config, executor_class, log_stats,
ERROR 01-28 15:52:35 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__
ERROR 01-28 15:52:35 [core.py:586] self.model_executor = executor_class(vllm_config)
ERROR 01-28 15:52:35 [core.py:586] File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__
ERROR 01-28 15:52:35 [core.py:586] self._init_executor()
ERROR 01-28 15:52:35 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 93, in _init_executor
ERROR 01-28 15:52:35 [core.py:586] self.workers = WorkerProc.wait_for_ready(unready_workers)
ERROR 01-28 15:52:35 [core.py:586] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 422, in wait_for_ready
ERROR 01-28 15:52:35 [core.py:586] raise e from None
ERROR 01-28 15:52:35 [core.py:586] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Process EngineCore_0:
Traceback (most recent call last):
File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/python3.10.17/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 590, in run_engine_core
raise e
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 577, in run_engine_core
engine_core = EngineCoreProc(*args, **kwargs)
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 404, in __init__
super().__init__(vllm_config, executor_class, log_stats,
File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__
self.model_executor = executor_class(vllm_config)
File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__
self._init_executor()
File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 93, in _init_executor
self.workers = WorkerProc.wait_for_ready(unready_workers)
File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 422, in wait_for_ready
raise e from None
Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Traceback (most recent call last):
File "/workspace/unified-cache-management/examples/offline_inference_esa.py", line 176, in <module>
main()
File "/workspace/unified-cache-management/examples/offline_inference_esa.py", line 154, in main
with build_llm_with_uc(module_path, name, model) as llm:
File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/workspace/unified-cache-management/examples/offline_inference_esa.py", line 107, in build_llm_with_uc
llm = LLM(**asdict(llm_args))
File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__
self.llm_engine = LLMEngine.from_engine_args(
File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 501, in from_engine_args
return engine_cls.from_vllm_config(
File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 124, in from_vllm_config
return cls(vllm_config=vllm_config,
File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 101, in __init__
self.engine_core = EngineCoreClient.make_client(
File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 75, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 503, in __init__
/usr/local/python3.10.17/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
super().__init__(
File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 403, in __init__
with launch_core_engines(vllm_config, executor_class,
File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 434, in launch_core_engines
wait_for_engine_startup(
File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 484, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': 1}
[ERROR] 2026-01-28-15:52:35 (PID:14045, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
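The root cause in the worker traceback appears to be the line `self.connector = get_kv_transfer_group().connector.store` in `ucm/sparse/esa/esa.py`: the connector object is a `UCMDirectConnector`, which has no `store` attribute. Below is a minimal sketch of that failure mode and of a check that would report it more clearly. All class and function names here (`StandInDirectConnector`, `StandInStoreConnector`, `StandInGroup`, `resolve_store`) are hypothetical stand-ins for illustration; they are not the real UCM or vLLM classes, and this is not a proposed patch.

```python
class StandInDirectConnector:
    """Hypothetical stand-in for a connector that does NOT expose `store`."""


class StandInStoreConnector:
    """Hypothetical stand-in for a connector that DOES expose `store`."""

    def __init__(self):
        self.store = object()  # placeholder for a storage backend handle


class StandInGroup:
    """Hypothetical stand-in for the object whose `.connector` is accessed."""

    def __init__(self, connector):
        self.connector = connector


def resolve_store(group):
    """Return group.connector.store, or raise a descriptive error if missing."""
    connector = group.connector
    store = getattr(connector, "store", None)
    if store is None:
        raise TypeError(
            f"{type(connector).__name__} does not expose a 'store' attribute; "
            "the sparse method expects a connector that provides one"
        )
    return store


if __name__ == "__main__":
    # Accessing `.store` directly reproduces the kind of error in the log.
    try:
        StandInGroup(StandInDirectConnector()).connector.store
    except AttributeError as exc:
        print("direct access:", exc)

    # The guarded lookup names the offending connector class instead.
    try:
        resolve_store(StandInGroup(StandInDirectConnector()))
    except TypeError as exc:
        print("guarded access:", exc)
```

If this reading is right, the question for the maintainers is whether the ESA example is expected to work with the connector class that gets instantiated here, or whether the example configuration should select a different connector.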