
Garbled / gibberish output after serving Kimi-K2.5 with vLLM on 8×H200 (INT4) for some time #23

@momaek

Description


I’m serving Kimi-K2.5 on a single machine with 8× NVIDIA H200 GPUs using vLLM (OpenAI-compatible server). The service runs normally at first, but after running for a while the model sometimes starts returning garbled / nonsensical text: random multilingual fragments, broken tokens, and junk characters. The corruption shows up in the reasoning field, and the response becomes unreadable / meaningless.

The deployment follows the Kimi-K2.5 recommended inference setup (vLLM is listed as a recommended engine in the repo README).

Env

  • GPUs: 8× NVIDIA H200
  • Serving: vLLM OpenAI server
  • Container image: vllm-openai:nightly-8fae54faff485e446dc8d1a700417f07659ef89e
  • CUDA libs mounted via LD_LIBRARY_PATH=/usr/local/cuda-12.9/compat:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
  • Model: moonshotai/Kimi-K2.5 (local volume mount)

docker-compose

version: "3.9"

services:
  kimi_k25_int4:
    image: vllm-openai:nightly-8fae54faff485e446dc8d1a700417f07659ef89e
    container_name: kimi-k25
    ipc: host
    ports:
      - "40000:8000"
    environment:
      - LD_LIBRARY_PATH=/usr/local/cuda-12.9/compat:/usr/local/nvidia/lib64:/usr/local/cuda/lib64
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - /data3/models/Kimi-K2.5:/model:ro
    command: >
      --host 0.0.0.0
      --port 8000
      --model /model
      --served-model-name kimi-k2.5
      --tensor-parallel-size 8
      --tool-call-parser kimi_k2
      --reasoning-parser kimi_k2
      --mm-encoder-tp-mode data
      --trust-remote-code
      --enable-auto-tool-choice

Steps to reproduce

  1. Start the server with the configuration above.
  2. Send chat completion requests normally (with reasoning enabled / returned by the server).
  3. After the server has been running for some time (and under ongoing requests), responses occasionally become garbled.
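For reference, step 2 sends ordinary chat-completion requests. A minimal sketch of the payload, assuming the compose file above (host port 40000, served model name kimi-k2.5, vLLM's standard OpenAI-compatible /v1/chat/completions route) — the prompt text here is just an example, not the exact traffic that triggered the bug:

```python
import json

def build_chat_request(prompt: str) -> dict:
    """Build a chat-completions payload matching the deployment above."""
    return {
        "model": "kimi-k2.5",  # from --served-model-name in the compose command
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

if __name__ == "__main__":
    body = build_chat_request("Summarize the trade-offs of INT4 quantization.")
    # POST this body to http://<host>:40000/v1/chat/completions
    print(json.dumps(body, ensure_ascii=False, indent=2))
```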

Expected behavior

Responses (including reasoning) remain coherent and readable.

Actual behavior

The reasoning content becomes unreadable / looks like corrupted tokens. Example:

灵性土。稍地

-elect. In‍oJC. After。鸽0 Bloodh o199h18wmm4 o @ has been.3.
A more.

|dc I. .ah
00AACY undning0000

GThe fluoride
B在邓·王e要求:Orcle rock whiskeyTheaypal solar.  pick by barDear user通过短流为。这个gg
3lit coni.

Example: Digu1.stice oil comesk7 aerobic i-s.控件J4rab2 When office:λ

D radiation

00h8 a&#81488618 blog 005723O.
003007NH) is wedding. Thermal equipment virus serum
  December患失_APPROXry3388那那狗 .

Sel"
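To detect when the server has entered this state, I've been using a rough heuristic (my own helper, not part of vLLM or the Kimi repo): coherent prose rarely alternates rapidly between Latin and CJK scripts, but the corrupted output above does so constantly.

```python
def looks_garbled(text: str, max_transitions: int = 3) -> bool:
    """Return True if `text` switches between Latin and CJK scripts more
    than `max_transitions` times -- a pattern typical of the corrupted
    sampling output shown above, but rare in coherent prose."""
    scripts = []
    for ch in text:
        if "a" <= ch.lower() <= "z":
            scripts.append("latin")
        elif "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs
            scripts.append("cjk")
    transitions = sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)
    return transitions > max_transitions
```

A line like `B在邓·王e要求:Orcle rock whiskey...` trips the check, while normal monolingual output (English or Chinese) does not; a proper fix obviously still needs to happen server-side.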
