Pronounced "zep-em"
Rust implementation of middleware for Zed's Edit Prediction feature. The server sits between Zed and a llama.cpp server running a FIM-compatible model, and expects the interface defined by the zeta `InlineCompletion` struct.
- `POST /predict_edits/v2` - Main endpoint for text completion
- `GET /health` - Health check endpoint
- `GET /test-completion` - Test endpoint with a hardcoded completion
- `POST /diagnostics` - Advanced diagnostics endpoint for troubleshooting
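Once the server is running (default port 3000), the health endpoint gives a quick smoke test:

```sh
curl http://localhost:3000/health
```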
The server recognizes special markers in the text:
- `<|user_cursor_is_here|>` - Cursor position marker
- `<|editable_region_start|>` - Start of editable region
- `<|editable_region_end|>` - End of editable region
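Conceptually, these markers divide the input into a fill-in-the-middle (FIM) prefix and suffix. The sketch below shows one way such parsing can work; it is illustrative, not the server's actual code, and `split_at_cursor` is a hypothetical name:

```rust
const CURSOR: &str = "<|user_cursor_is_here|>";
const REGION_START: &str = "<|editable_region_start|>";
const REGION_END: &str = "<|editable_region_end|>";

/// Split the input into a FIM (prefix, suffix) pair around the cursor marker,
/// narrowed to the editable region when those markers are present.
fn split_at_cursor(input: &str) -> Option<(String, String)> {
    let region = match (input.find(REGION_START), input.find(REGION_END)) {
        (Some(s), Some(e)) if s + REGION_START.len() <= e => &input[s + REGION_START.len()..e],
        _ => input,
    };
    let cursor = region.find(CURSOR)?;
    // Everything before the marker becomes the prompt prefix; the rest is the
    // suffix, e.g. the `input_prefix`/`input_suffix` fields of a llama.cpp
    // /infill request.
    Some((
        region[..cursor].to_string(),
        region[cursor + CURSOR.len()..].to_string(),
    ))
}

fn main() {
    let text = "fn add(a: i32, b: i32) -> i32 { <|user_cursor_is_here|> }";
    let (prefix, suffix) = split_at_cursor(text).expect("no cursor marker");
    println!("prefix: {prefix:?}");
    println!("suffix: {suffix:?}");
}
```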
- Start the server with logging:

  ```sh
  RUST_LOG=info cargo run
  ```

- Connect with the Zed editor:

  ```sh
  ZED_PREDICT_EDITS_URL=http://localhost:3000/predict_edits/v2 zed
  ```

- For testing with a hardcoded completion:

  ```sh
  ZED_PREDICT_EDITS_URL=http://localhost:3000/test-completion zed
  ```

The server is configured through environment variables (a sketch of reading them follows the list):

- `PORT` - Server port (default: 3000)
- `LLAMA_SERVER_URL` - URL of the llama.cpp server (default: `http://localhost:8080`)
- `MAX_TOKENS` - Maximum tokens to generate (default: 128)
- `TEMPERATURE` - Temperature for generation (default: 0)
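A sketch of how this configuration might be loaded at startup; the `Config` struct and `load_config` function are illustrative rather than the server's actual code, but the variable names and defaults are the ones listed above:

```rust
use std::env;

struct Config {
    port: u16,
    llama_server_url: String,
    max_tokens: u32,
    temperature: f32,
}

/// Read each variable from the environment, falling back to the documented
/// default when it is unset (or fails to parse, for the numeric ones).
fn load_config() -> Config {
    Config {
        port: env::var("PORT").ok().and_then(|v| v.parse().ok()).unwrap_or(3000),
        llama_server_url: env::var("LLAMA_SERVER_URL")
            .unwrap_or_else(|_| "http://localhost:8080".to_string()),
        max_tokens: env::var("MAX_TOKENS").ok().and_then(|v| v.parse().ok()).unwrap_or(128),
        temperature: env::var("TEMPERATURE").ok().and_then(|v| v.parse().ok()).unwrap_or(0.0),
    }
}

fn main() {
    let cfg = load_config();
    println!(
        "listening on port {}, forwarding to {} (max_tokens={}, temperature={})",
        cfg.port, cfg.llama_server_url, cfg.max_tokens, cfg.temperature
    );
}
```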
Requirements:

- Rust 1.58+
- A running llama.cpp server with infill capabilities
Common issues:

- **LLM returns only cursor markers** - Sometimes the LLM returns `<|cursor|>` instead of an actual completion. The server now adds fallback text if this happens (see the sketch after this list).
- **LLM connection errors** - Ensure your llama.cpp server is running and accessible at the URL specified in `LLAMA_SERVER_URL`.
- **Empty or insufficient completions** - If completions are empty or not useful:
  - Try increasing `MAX_TOKENS` (default: 128)
  - Adjust `TEMPERATURE` (lower values usually give more deterministic completions)
  - Ensure your cursor is placed in a location where a completion makes sense
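For the first item, here is a minimal sketch of what such fallback post-processing can look like (the function name and fallback string are hypothetical, not the server's actual values):

```rust
/// Strip any echoed `<|cursor|>` markers; if nothing usable remains,
/// substitute a placeholder so the editor still receives a completion.
fn sanitize_completion(raw: &str) -> String {
    let cleaned = raw.replace("<|cursor|>", "");
    if cleaned.trim().is_empty() {
        // Hypothetical fallback text; the real server may use something else.
        "// no completion".to_string()
    } else {
        cleaned
    }
}

fn main() {
    assert_eq!(sanitize_completion("<|cursor|>"), "// no completion");
    assert_eq!(sanitize_completion("a + b"), "a + b");
}
```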
Use the `/diagnostics` endpoint to test the LLM server integration directly:

```sh
curl -X POST http://localhost:3000/diagnostics -H "Content-Type: application/json" \
  -d '{"input_text": "your text with <|user_cursor_is_here|> marker"}'
```

This returns detailed information about:
- The context extracted from your input
- The exact request sent to the LLM
- The raw response from the LLM
- The processed output
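The exact response schema is not documented here; a hypothetical payload covering those four items might look like this (all field names are illustrative):

```jsonc
// Field names below are placeholders, not the server's actual schema.
{
  "extracted_context": "fn add(a: i32, b: i32) -> i32 { ... }",
  "llm_request": { "input_prefix": "...", "input_suffix": "...", "n_predict": 128 },
  "llm_raw_response": "a + b",
  "processed_output": "a + b"
}
```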
Check server logs for detailed information:
- `DEBUG` messages show exact request/response data
- `INFO` messages show high-level operation details
- `WARN` messages indicate potential issues
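To surface the `DEBUG` messages, raise the log filter when starting the server:

```sh
RUST_LOG=debug cargo run
```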
If a log entry contains `LLM returned '<|cursor|>' marker, removing it`, the LLM is not generating proper completions.