Walkthrough

This PR introduces a design document, runtime log, and reproduction script.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
🧹 Nitpick comments (1)
vllm_native_kvbm.sh (1)
25-33: Harden startup timeout and cleanup semantics.

If readiness never succeeds, the script continues without a clear timeout failure path, and cleanup is not guaranteed on early exits. Add an explicit timeout failure plus trap-based cleanup so the server process is always terminated.

Proposed script hardening
```diff
 set -euo pipefail

 MODEL="Qwen/Qwen3-VL-30B-A3B-Instruct-FP8"
-PORT=8000
+PORT="${PORT:-8000}"
+VLLM_PID=""
+
+cleanup() {
+  if [[ -n "${VLLM_PID}" ]]; then
+    kill "${VLLM_PID}" 2>/dev/null || true
+    wait "${VLLM_PID}" 2>/dev/null || true
+  fi
+}
+trap cleanup EXIT INT TERM
@@
 for i in $(seq 1 120); do
   if curl -s http://localhost:$PORT/v1/models > /dev/null 2>&1; then
     echo "Ready after ${i}s"
     break
   fi
   sleep 1
 done
+
+if ! curl -s http://localhost:$PORT/v1/models > /dev/null 2>&1; then
+  echo "Timed out waiting for vLLM on port ${PORT}" >&2
+  exit 1
+fi
@@
-# ── Cleanup ──
-echo ""
-echo "=== Cleanup ==="
-kill $VLLM_PID 2>/dev/null || true
-wait $VLLM_PID 2>/dev/null || true
-echo "Done."
+echo ""
+echo "=== Cleanup ==="
+echo "Done."
```

Also applies to: 69-74
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vllm_native_kvbm.sh` around lines 25 - 33, The readiness loop that polls http://localhost:$PORT/v1/models (the for loop using seq 1 120 and curl) lacks a failure path and no guaranteed cleanup; modify it to track elapsed seconds (or use a timeout counter) and if the loop exceeds the max (120s) exit with a non-zero status and error message, and add a trap that captures EXIT (and INT/TERM) to kill the background server PID variable (e.g., the PID you get when launching vLLM) so the server is always terminated on early exit; apply the same timeout+trap pattern to the second readiness block referenced around lines 69-74 to ensure symmetric behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plan.md`:
- Line 85: Tighten the contract for extra_block_hashes used by new() and
split_tokens(): ensure the vector has one entry per normalized block (use None
for text-only blocks) to avoid indexed-access hazards; implement a validation in
new() that either rejects mismatched lengths (return Err/panic) or normalizes
the input by expanding an empty or shorter extra_block_hashes into a
Vec<Option<u64>> of length num_blocks filled with None, and update
split_tokens() to rely on that validated/normalized vector instead of indexing
blindly (references: new(), split_tokens(), extra_block_hashes).
- Line 161: The fenced code block containing the example tokens and "Chained
seq_hash" lines is missing a language tag (MD040); update that fence to include
a language identifier (e.g., change ``` to ```text) so the block becomes ```text
... ```; locate the block that contains "Tokens: [sys×32, "What is", img_pad×48,
"?"]" and "Chained seq_hash: S0→S1 shared prefix, S2 diverges." and add the
language tag to the opening fence.
- Around line 49-50: The plan references the wrong Rust module path for the
tokens implementation; update any mentions that point to the nonexistent
llm-level tokens file to reference the actual tokens crate/module (the tokens
module used by the project, e.g., the tokens crate or tokens:: module and its
lib entry) and the block/token implementation symbol (the tokens module and its
block implementation) so guidance and TODOs target the real code; make the same
correction for the other occurrence noted in the plan.
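The `extra_block_hashes` normalization the first item above asks for could look roughly like this; `normalize_extra_hashes` and `num_blocks` are illustrative names rather than the plan's actual API, and real code would fold this check into `new()`:

```rust
// Hypothetical sketch of the normalization the review proposes:
// expand a short or empty `extra_block_hashes` to one entry per block,
// and reject vectors longer than the block count, so that per-block
// indexed access in split_tokens() can never go out of bounds.
fn normalize_extra_hashes(
    mut extra: Vec<Option<u64>>,
    num_blocks: usize,
) -> Result<Vec<Option<u64>>, String> {
    if extra.len() > num_blocks {
        return Err(format!(
            "extra_block_hashes has {} entries for {} blocks",
            extra.len(),
            num_blocks
        ));
    }
    // Pad text-only blocks with None.
    extra.resize(num_blocks, None);
    Ok(extra)
}

fn main() {
    // Empty input expands to all-None.
    assert_eq!(
        normalize_extra_hashes(vec![], 3).unwrap(),
        vec![None, None, None]
    );
    // Existing entries are preserved; the tail is padded.
    assert_eq!(
        normalize_extra_hashes(vec![Some(7)], 3).unwrap(),
        vec![Some(7), None, None]
    );
    // Over-long input is rejected.
    assert!(normalize_extra_hashes(vec![None, None], 1).is_err());
}
```

With this in place, `split_tokens()` can index the vector per block without a fallback path.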
In `@vllm_native_kvbm.sh`:
- Line 1: Add the required repository file header to the top of
vllm_native_kvbm.sh: insert the project's copyright/license/header boilerplate
as the first lines (above the shebang or replace the current first line if
policy expects header before shebang) so the file passes the repository
copyright/header check; ensure the header format matches other shell files in
the repo (same SPDX, copyright notices, and any required contributor lines).
- Line 9: The script currently hard-codes PORT=8000 which breaks reproducibility
on shared hosts; change the script to allocate a free port dynamically (or call
the repo’s port allocator utility) and assign it to the PORT variable, then use
that PORT when launching the server and in any curl calls; update all
occurrences of the literal 8000 in this script (the PORT variable and any
server/curl invocations) to reference the new PORT variable so the same
allocated port is consistently passed through.
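One common way to allocate a free port dynamically is to bind port 0 and let the OS pick; sketched here in Rust for illustration (the script itself would do the equivalent in shell or call the repo's port allocator utility, which is not shown here). Note the small race window between releasing the probe socket and the server binding the port:

```rust
use std::net::TcpListener;

// Illustrative only: ask the OS for a free port by binding port 0,
// read the chosen port back, then release the listener so the
// server can bind the same port.
fn allocate_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    drop(listener); // release so the server can bind it
    Ok(port)
}

fn main() {
    let port = allocate_free_port().expect("no free port available");
    // The script would export this as PORT and use it in every curl call.
    println!("PORT={port}");
}
```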
In `@vllm.txt`:
- Line 1: The file vllm.txt is missing the standard repository file header which
is causing CI to fail; add the project's standard header block (the same header
used across other source files—license, copyright, and any required metadata
lines) to the very top of vllm.txt so the file matches the repository header
format expected by CI.
- Line 7: The log in utils.py (around the non-default args logging at
utils.py:261) is emitting environment-specific identifiers (e.g., engine_id,
kv_ip, kv_port and any other connector-specific fields) — update the code that
formats or emits the non-default args to redact or replace those fields with
stable placeholders (e.g., "<ENGINE_ID_REDACTED>", "<KV_IP_REDACTED>",
"<KV_PORT_REDACTED>") before logging; implement this by filtering or mapping
keys in the dict used for the log (the non-default args object) or by calling a
small sanitizer function (e.g., redact_env_identifiers) prior to the logging
call so engine_id, kv_connector_module_path, network endpoints and similar
infra-specific values are not committed in plaintext.
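A minimal sketch of the proposed sanitizer, written in Rust for illustration (the actual target, `utils.py`, is Python; `redact_env_identifiers` is the hypothetical helper name the review suggests, and the key names mirror the ones flagged above):

```rust
use std::collections::BTreeMap;

// Replace environment-specific identifiers with stable placeholders
// before the args map is formatted into a log line.
fn redact_env_identifiers(args: &mut BTreeMap<String, String>) {
    const SENSITIVE: &[(&str, &str)] = &[
        ("engine_id", "<ENGINE_ID_REDACTED>"),
        ("kv_ip", "<KV_IP_REDACTED>"),
        ("kv_port", "<KV_PORT_REDACTED>"),
    ];
    for (key, placeholder) in SENSITIVE {
        if let Some(value) = args.get_mut(*key) {
            *value = placeholder.to_string();
        }
    }
}

fn main() {
    let mut args = BTreeMap::from([
        ("engine_id".to_string(), "a1b2c3".to_string()),
        ("model".to_string(), "Qwen/Qwen3-VL-30B-A3B-Instruct-FP8".to_string()),
    ]);
    redact_env_identifiers(&mut args);
    // Sensitive keys are masked; everything else passes through untouched.
    assert_eq!(args["engine_id"], "<ENGINE_ID_REDACTED>");
    assert_eq!(args["model"], "Qwen/Qwen3-VL-30B-A3B-Instruct-FP8");
}
```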
---
Nitpick comments:
In `@vllm_native_kvbm.sh`:
- Around line 25-33: The readiness loop that polls
http://localhost:$PORT/v1/models (the for loop using seq 1 120 and curl) lacks a
failure path and no guaranteed cleanup; modify it to track elapsed seconds (or
use a timeout counter) and if the loop exceeds the max (120s) exit with a
non-zero status and error message, and add a trap that captures EXIT (and
INT/TERM) to kill the background server PID variable (e.g., the PID you get when
launching vLLM) so the server is always terminated on early exit; apply the same
timeout+trap pattern to the second readiness block referenced around lines 69-74
to ensure symmetric behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7d05730a-1438-4817-94e4-26065d300c0e
📒 Files selected for processing (3)
- plan.md
- vllm.txt
- vllm_native_kvbm.sh
> **File**: `lib/llm/src/tokens.rs`
Fix the Rust path references to the actual token module.
The plan points to lib/llm/src/tokens.rs, but the current token/block implementation is in lib/tokens/src/lib.rs. This mismatch can send implementation work to the wrong location.
Also applies to: 179-179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plan.md` around lines 49 - 50, The plan references the wrong Rust module path
for the tokens implementation; update any mentions that point to the nonexistent
llm-level tokens file to reference the actual tokens crate/module (the tokens
module used by the project, e.g., the tokens crate or tokens:: module and its
lib entry) and the block/token implementation symbol (the tokens module and its
block implementation) so guidance and TODOs target the real code; make the same
correction for the other occurrence noted in the plan.
> `new()` and `split_tokens()` accept `extra_block_hashes: Vec<Option<u64>>`. In `split_tokens()`, Rayon parallel map passes `extra_block_hashes[i]` to each chunk.
Tighten extra_block_hashes length contract to avoid index hazards.
The plan currently permits an empty extra_block_hashes, while also describing per-block indexed access. Please require one normalized entry per block (None for text-only blocks) or explicitly define a safe fallback when vector is empty.
Also applies to: 104-104, 119-129
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plan.md` at line 85, Tighten the contract for extra_block_hashes used by
new() and split_tokens(): ensure the vector has one entry per normalized block
(use None for text-only blocks) to avoid indexed-access hazards; implement a
validation in new() that either rejects mismatched lengths (return Err/panic) or
normalizes the input by expanding an empty or shorter extra_block_hashes into a
Vec<Option<u64>> of length num_blocks filled with None, and update
split_tokens() to rely on that validated/normalized vector instead of indexing
blindly (references: new(), split_tokens(), extra_block_hashes).
> ## Block-Level Example
Add a language tag to the fenced code block (MD040).
This warning is valid from markdownlint; specify a fence language (for example, text).
Proposed doc fix
````diff
-```
+```text
 Tokens: [sys×32, "What is", img_pad×48, "?"]
 block_size = 16
 ...
-Chained seq_hash: S0→S1 shared prefix, S2 diverges.
+Chained seq_hash: S0→S1 shared prefix, S2 diverges.
````
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.21.0)</summary>
[warning] 161-161: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @plan.md at line 161, The fenced code block containing the example tokens and
"Chained seq_hash" lines is missing a language tag (MD040); update that fence to
include a language identifier (e.g., change ``` to ```text) so the block becomes
```text ... ```; locate the block that contains "Tokens: [sys×32, "What is",
img_pad×48, "?"]" and "Chained seq_hash: S0→S1 shared prefix, S2 diverges." and
add the language tag to the opening fence.
</details>
as title
Summary by CodeRabbit
Release Notes
Documentation