-
Notifications
You must be signed in to change notification settings - Fork 145
Open
Description
Description
I’m serving Kimi-K2.5 on a single machine with 8× NVIDIA H200 using vLLM (OpenAI-compatible server). The service runs normally at first, but after running for a while the model sometimes starts returning garbled / nonsensical text (looks like random multilingual fragments, broken tokens, and junk characters). This happens in the reasoning field (and the response becomes unreadable / meaningless).
The deployment generally follows the Kimi-K2.5 recommended inference engines (vLLM is listed as recommended in the repo README). 
Env
- GPUs: 8× NVIDIA H200
- Serving: vLLM OpenAI server
- Container image: vllm-openai:nightly-8fae54faff485e446dc8d1a700417f07659ef89e
- CUDA libs mounted via LD_LIBRARY_PATH=/usr/local/cuda-12.9/compat:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
- Model: moonshotai/Kimi-K2.5 (local volume mount)
docker-compose
version: "3.9"
services:
kimi_k25_int4:
image: vllm-openai:nightly-8fae54faff485e446dc8d1a700417f07659ef89e
container_name: kimi-k25
ipc: host
ports:
- "40000:8000"
environment:
- LD_LIBRARY_PATH=/usr/local/cuda-12.9/compat:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
volumes:
- /data3/models/Kimi-K2.5:/model:ro
command: >
--host 0.0.0.0
--port 8000
--model /model
--served-model-name kimi-k2.5
--tensor-parallel-size 8
--tool-call-parser kimi_k2
--reasoning-parser kimi_k2
--mm-encoder-tp-mode data
--trust-remote-code
--enable-auto-tool-choice
Steps to reproduce
- Start the server with the configuration above.
- Send chat completion requests normally (with reasoning enabled / returned by the server).
- After the server has been running for some time (and under ongoing requests), responses occasionally become garbled.
Expected behavior
Responses (including reasoning) remain coherent and readable.
Actual behavior
The reasoning content becomes unreadable / looks like corrupted tokens. Example:
灵性土。稍地
-elect. InoJC. After。鸽0 Bloodh o199h18wmm4 o @ has been.3.
A more.
|dc I. .ah
00AACY undning0000
GThe fluoride
B在邓·王e要求:Orcle rock whiskeyTheaypal solar. pick by barDear user通过短流为。这个gg
3lit coni.
Example: Digu1.stice oil comesk7 aerobic i-s.控件J4rab2 When office:λ
D radiation
00h8 a� blog 005723O.
003007NH) is wedding. Thermal equipment virus serum
December患失_APPROXry3388那那狗 .
Sel"
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels