
[BUG] Tool Call Results Persist Across Independent Stateless Requests (v1.0.67 / gpt-oss-120b-MXFP4-Q8) #1362

@andrewwutw

Description


Describe the bug

There appears to be a state leak where EXO "remembers" the result of a function call from a previous request.

With the stateless /v1/chat/completions endpoint, each request should be handled independently. However, once a tool result has been sent to the model, later requests identical to the original one (sent without any tool history) skip the tool-calling mechanism and return the final answer directly. This behavior persists until the instance is deleted and re-launched.

To Reproduce

Model: mlx-community/gpt-oss-120b-MXFP4-Q8
Sharding: Pipeline
Instance Type: MLX Ring

I have attached two scripts, p1.sh and p2.sh, to facilitate testing.

1. Initial Request (using p1.sh)

Execute p1.sh to request 2+2 using the calculator tool.

Expected & Actual Result:
The model correctly identifies the need for a tool and returns a tool_calls response:

{
  "id": "b34fb07b-5176-4ce6-9703-1e36200748ee",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>We need to use calculator tool. Use function.</think>",
        "tool_calls": [
          {
            "id": "36415a46-9a79-440d-b2e0-4ec57ed917d8",
            "type": "function",
            "function": {
              "name": "calculator",
              "arguments": "{\n  \"expression\": \"2+2\"\n}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

2. Provide Tool Result (using p2.sh)

Execute p2.sh, which sends the full history (the user message, the assistant's tool call, and the tool output).

Actual Result:
The model correctly returns the final answer:

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The result of 2 + 2 is **4**."
      },
      "finish_reason": "stop"
    }
  ]
}

3. Re-run Initial Request (using p1.sh again)

Run p1.sh a second time (no tool history is provided).

Expected behavior

The model should trigger tool_calls again, identical to Step 1.

Actual behavior

The model bypasses the tool call and returns the final answer directly, as if it "remembered" the state from Step 2:

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The result of 2 + 2 is **4**.",
        "tool_calls": []
      },
      "finish_reason": "stop"
    }
  ]
}

Environment

  • macOS Version: 26.2
  • EXO Version: 1.0.67
  • Hardware: M3 Ultra Mac Studio (512GB RAM)
  • Interconnection: Single Host

Additional context

  • Persistence: Repeatedly executing p1.sh continues to return the direct answer without triggering the tool.
  • Workaround: Deleting the instance and re-launching it "clears" this memory. After a fresh launch, p1.sh correctly triggers a tool call again, but only until a tool result is provided once more.

Appendix: Test Scripts

p1.sh: This script sends a stateless initial request. It asks "What is 2+2?" and provides the calculator tool definition. Under normal conditions, the model should always return a tool_calls object to request the calculation.

p1.sh

curl -N -X POST http://localhost:52415/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mlx-community/gpt-oss-120b-MXFP4-Q8",
  "messages": [
    {
      "role": "user",
      "content": "What is 2+2? Use the calculator tool."
    }
  ],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform basic arithmetic",
        "parameters": {
          "type": "object",
          "properties": {
            "expression": {
              "type": "string",
              "description": "Math expression"
            }
          },
          "required": [
            "expression"
          ]
        }
      }
    }
  ]
}' | jq
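To check the p1.sh response mechanically rather than by eye, it can be filtered with jq, which p1.sh already depends on. The helper below is a hypothetical sketch (the name `tool_expression` is not part of EXO): it prints the `expression` argument of the first tool call, or nothing when the response contains no tool call, as in the step 3 output. Because `arguments` is a JSON-encoded string, it has to be decoded with `fromjson` before the field can be read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "expression" argument of the first tool call
# in a chat completion response read from stdin. Prints nothing when the
# response has no tool call (e.g. "tool_calls": [] or no tool_calls field).
# Assumes jq is installed.
tool_expression() {
  # "arguments" is a JSON string, so decode it with fromjson before reading it.
  # "// empty" suppresses output when there is no first tool call.
  jq -r '.choices[0].message.tool_calls[0].function.arguments // empty
         | fromjson
         | .expression'
}
```

For example, `sh p1.sh | tool_expression` should print `2+2` when the model requests the tool, and print nothing when the tool call is skipped.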

p2.sh: This script sends a request with conversation history. It includes the user's question, the assistant's previous tool call, and the tool's response (2+2 = 4). This is used to complete the function-calling cycle.

p2.sh

curl -N -X POST http://localhost:52415/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mlx-community/gpt-oss-120b-MXFP4-Q8",
  "messages": [
    {
      "role": "user",
      "content": "What is 2+2? Use the calculator tool."
    },
    {
      "role": "assistant",
      "content": "<think>User asks: \"What is 2+2? Use the calculator tool.\" So we need to call calculator function with expression \"2+2\".</think>",
      "tool_calls": [
        {
          "id": "be3f751a-ff6f-4b60-8e84-d60cffabe589",
          "index": 0,
          "type": "function",
          "function": {
            "name": "calculator",
            "arguments": "{\n  \"expression\": \"2+2\"\n}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "2+2 = 4",
      "tool_name": "calculator",
      "tool_call_id": "be3f751a-ff6f-4b60-8e84-d60cffabe589"
    }
  ],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform basic arithmetic",
        "parameters": {
          "type": "object",
          "properties": {
            "expression": {
              "type": "string",
              "description": "Math expression"
            }
          },
          "required": [
            "expression"
          ]
        }
      }
    }
  ]
}' | jq
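p2.sh hard-codes an assistant turn and tool call ID (be3f751a-ff6f-4b60-8e84-d60cffabe589) captured from an earlier run, while the step 1 response above carries a different ID (36415a46-9a79-440d-b2e0-4ec57ed917d8). If it helps to rule this mismatch out, the follow-up messages can be built from the actual step 1 response instead. This is a minimal sketch, assuming jq is installed; the function name `build_followup_messages` is hypothetical, and the message layout simply mirrors p2.sh (including its `tool_name` field).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the step 2 "messages" array from the step 1
# response, so the assistant turn and tool_call_id match what the model
# actually returned. Mirrors the message layout used in p2.sh.
# Usage: build_followup_messages RESPONSE_JSON TOOL_OUTPUT
# Assumes jq is installed.
build_followup_messages() {
  local response="$1" tool_output="$2"
  printf '%s' "$response" | jq -c --arg out "$tool_output" '
    .choices[0].message as $m
    | [
        {role: "user", content: "What is 2+2? Use the calculator tool."},
        {role: "assistant", content: $m.content, tool_calls: $m.tool_calls},
        {role: "tool", content: $out,
         tool_name: $m.tool_calls[0].function.name,
         tool_call_id: $m.tool_calls[0].id}
      ]'
}
```

For example, `build_followup_messages "$(sh p1.sh)" "2+2 = 4"` prints an array that could stand in for the hard-coded "messages" array in p2.sh.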


Summary

  1. Baseline: Run p1.sh. You will receive a tool_calls response (correct).
  2. State injection: Run p2.sh. You will receive the final answer "4" (correct).
  3. The leak: Run p1.sh again.
      • Expected: a tool_calls response (identical to step 1).
      • Actual: the model returns the final answer "4" directly, bypassing the tool call (bug).
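The three steps can also be run as one script that compares the finish_reason of step 1 with that of step 3. This is a sketch, assuming p1.sh and p2.sh are in the current directory, the EXO API is reachable at the address they use, and jq is installed; `finish_reason`, `verdict`, and `run_repro` are hypothetical names.

```shell
#!/usr/bin/env bash
# repro.sh (hypothetical): run p1.sh -> p2.sh -> p1.sh and compare results.
# Assumes p1.sh and p2.sh are in the current directory and jq is installed.

# Print the finish_reason of the first choice in a response read from stdin.
finish_reason() {
  jq -r '.choices[0].finish_reason'
}

# Classify the outcome from the step 1 ($1) and step 3 ($2) finish_reason:
#   inconclusive - step 1 did not request a tool, so there is no baseline
#   ok           - step 3 requested the tool again, as expected
#   leak         - step 1 requested the tool but step 3 did not
verdict() {
  if [ "$1" != "tool_calls" ]; then
    echo "inconclusive"
  elif [ "$2" = "tool_calls" ]; then
    echo "ok"
  else
    echo "leak"
  fi
}

run_repro() {
  local first third
  first=$(sh p1.sh | finish_reason)   # step 1: expect tool_calls
  sh p2.sh > /dev/null                # step 2: send the tool result
  third=$(sh p1.sh | finish_reason)   # step 3: should match step 1
  echo "step 1: $first, step 3: $third -> $(verdict "$first" "$third")"
}
```

Source the file and call `run_repro`. Based on the responses reported above, an affected instance would print `step 1: tool_calls, step 3: stop -> leak`, while a freshly launched, unaffected instance should report `ok`.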



Labels: bug (Something isn't working)
