Description
Describe the bug
There appears to be a state leak where EXO "remembers" the result of a function call from a previous request.
In a stateless /v1/chat/completions setup, each request should be independent. However, after successfully providing a tool result to the model, subsequent identical requests (without the tool history) bypass the tool-calling mechanism and directly return the final answer. This behavior persists until the instance is deleted and re-launched.
To Reproduce
Model: mlx-community/gpt-oss-120b-MXFP4-Q8
Sharding: Pipeline
Instance Type: MLX Ring
I have attached two scripts, p1.sh and p2.sh, to facilitate testing.
1. Initial Request (using p1.sh)
Execute p1.sh to request 2+2 using the calculator tool.
Expected & Actual Result:
The model correctly identifies the need for a tool and returns a tool_calls response:
{
"id": "b34fb07b-5176-4ce6-9703-1e36200748ee",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<think>We need to use calculator tool. Use function.</think>",
"tool_calls": [
{
"id": "36415a46-9a79-440d-b2e0-4ec57ed917d8",
"type": "function",
"function": {
"name": "calculator",
"arguments": "{\n \"expression\": \"2+2\"\n}"
}
}
]
},
"finish_reason": "tool_calls"
}
]
}
2. Provide Tool Result (using p2.sh)
Execute p2.sh, which includes the full history (User message, Assistant tool call, and Tool output).
Actual Result:
The model correctly returns the final answer:
{
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The result of 2 + 2 is **4**."
},
"finish_reason": "stop"
}
]
}
3. Re-run Initial Request (using p1.sh again)
Run p1.sh for the second time (no tool history provided).
Expected behavior
The model should trigger tool_calls again, identical to Step 1.
Actual behavior
The model bypasses the tool call and returns the final answer directly, as if it "remembered" the state from Step 2:
{
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The result of 2 + 2 is **4**.",
"tool_calls": []
},
"finish_reason": "stop"
}
]
}
Environment
- macOS Version: 26.2
- EXO Version: 1.0.67
- Hardware: M3 Ultra Mac Studio (512GB RAM)
- Interconnection: Single Host
Additional context
- Persistence: Repeatedly executing p1.sh continues to return the direct answer without triggering the tool.
- Workaround: Deleting the instance and re-launching it "clears" this memory. After a fresh launch, p1.sh will correctly trigger a tool call again (but only until a tool result is provided once).
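To check the persistence claim over many runs, the finish_reason of each response can be tallied. The following is only a minimal sketch: tally_finish_reasons is a hypothetical helper (not part of EXO) that reads one finish_reason value per line on stdin and prints a count for each distinct value.

```shell
#!/bin/sh
# Hypothetical helper (not part of EXO): count how often each
# finish_reason value appears. Input: one value per line on stdin.
tally_finish_reasons() {
  # awk builds a counter per distinct line; sort makes the order stable.
  awk '{ n[$0]++ } END { for (k in n) print k, n[k] }' | sort
}
```

Assuming p1.sh is executable in the current directory, it could be used as `for i in 1 2 3 4 5; do ./p1.sh 2>/dev/null | jq -r '.choices[0].finish_reason'; done | tally_finish_reasons`. On a healthy instance every run should be counted under tool_calls; on an affected instance the runs would show up under stop instead.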
Appendix: Test Scripts
p1.sh: This script sends a stateless initial request. It asks "What is 2+2?" and provides the calculator tool definition. Under normal conditions, the model should always return a tool_calls object to request the calculation.
p1.sh
curl -N -X POST http://localhost:52415/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "mlx-community/gpt-oss-120b-MXFP4-Q8",
"messages": [
{
"role": "user",
"content": "What is 2+2? Use the calculator tool."
}
],
"stream": false,
"tools": [
{
"type": "function",
"function": {
"name": "calculator",
"description": "Perform basic arithmetic",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Math expression"
}
},
"required": [
"expression"
]
}
}
}
]
}' | jq
p2.sh: This script sends a request with conversation history. It includes the user's question, the assistant's previous tool call, and the tool's response (2+2 = 4). This is used to complete the function-calling cycle.
p2.sh
curl -N -X POST http://localhost:52415/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "mlx-community/gpt-oss-120b-MXFP4-Q8",
"messages": [
{
"role": "user",
"content": "What is 2+2? Use the calculator tool."
},
{
"role": "assistant",
"content": "<think>User asks: \"What is 2+2? Use the calculator tool.\" So we need to call calculator function with expression \"2+2\".</think>",
"tool_calls": [
{
"id": "be3f751a-ff6f-4b60-8e84-d60cffabe589",
"index": 0,
"type": "function",
"function": {
"name": "calculator",
"arguments": "{\n \"expression\": \"2+2\"\n}"
}
}
]
},
{
"role": "tool",
"content": "2+2 = 4",
"tool_name": "calculator",
"tool_call_id": "be3f751a-ff6f-4b60-8e84-d60cffabe589"
}
],
"stream": false,
"tools": [
{
"type": "function",
"function": {
"name": "calculator",
"description": "Perform basic arithmetic",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Math expression"
}
},
"required": [
"expression"
]
}
}
}
]
}' | jq
- Baseline: Run p1.sh. You will receive a tool_calls response (Correct).
- State Injection: Run p2.sh. You will receive the final answer "4" (Correct).
- The Leak: Run p1.sh again.
  - Expected: A tool_calls response (identical to step 1).
  - Actual: The model directly returns the final answer "4", bypassing the tool call (Bug).
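The three steps above could be automated roughly as follows. This is a sketch under assumptions, not a definitive test: leak_verdict and run_leak_check are hypothetical names, p1.sh and p2.sh are assumed to be executable in the current directory, and the instance is assumed to be freshly launched before the check starts.

```shell
#!/bin/sh
# Hypothetical helper: compare the finish_reason of the first p1.sh run ($1)
# with the finish_reason of the p1.sh run made after p2.sh ($2).
leak_verdict() {
  case "$1:$2" in
    tool_calls:tool_calls) echo "no leak" ;;       # both runs requested the tool
    tool_calls:stop)       echo "leak" ;;          # second run skipped the tool
    *)                     echo "inconclusive" ;;  # baseline itself was not a tool call
  esac
}

# Hypothetical driver: Baseline, State Injection, then The Leak.
# Assumes ./p1.sh and ./p2.sh exist and that jq is installed.
run_leak_check() {
  first=$(./p1.sh 2>/dev/null | jq -r '.choices[0].finish_reason')
  ./p2.sh >/dev/null 2>&1
  third=$(./p1.sh 2>/dev/null | jq -r '.choices[0].finish_reason')
  leak_verdict "$first" "$third"
}
```

Calling `run_leak_check` against a freshly launched instance would be expected to print "leak" if the behavior described in this report is present, and "no leak" otherwise.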