[BUG] Tool Call Results Persist Across Independent Stateless Requests (v1.0.67 / gpt-oss-120b-MXFP4-Q8)

## Describe the bug

There appears to be a state leak: EXO "remembers" the result of a tool call from a previous request.

In a stateless `/v1/chat/completions` setup, each request should be independent. However, after successfully providing a tool result to the model, subsequent identical requests (without the tool history) bypass the tool-calling mechanism and directly return the final answer. This behavior persists until the instance is deleted and re-launched.

## To Reproduce

- Model: `mlx-community/gpt-oss-120b-MXFP4-Q8`
- Sharding: Pipeline
- Instance Type: MLX Ring

I have attached two scripts, **`p1.sh`** and **`p2.sh`**, to facilitate testing.

### 1. Initial Request (using `p1.sh`)
Execute `p1.sh` to request `2+2` using the `calculator` tool.

**Expected & Actual Result:**
The model correctly identifies the need for a tool and returns a `tool_calls` response:
```json
{
  "id": "b34fb07b-5176-4ce6-9703-1e36200748ee",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>We need to use calculator tool. Use function.</think>",
        "tool_calls": [
          {
            "id": "36415a46-9a79-440d-b2e0-4ec57ed917d8",
            "type": "function",
            "function": {
              "name": "calculator",
              "arguments": "{\n  \"expression\": \"2+2\"\n}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```

### 2. Provide Tool Result (using `p2.sh`)
Execute `p2.sh`, which includes the full history (User message, Assistant tool call, and Tool output).

**Actual Result:**
The model correctly returns the final answer:
```json
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The result of 2 + 2 is **4**."
      },
      "finish_reason": "stop"
    }
  ]
}
```

### 3. Re-run Initial Request (using `p1.sh` again)
Run `p1.sh` for the second time (no tool history provided).

## Expected behavior

The model should trigger `tool_calls` again, identical to Step 1.

## Actual behavior

The model bypasses the tool call and returns the final answer directly, as if it "remembered" the state from Step 2:
```json
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The result of 2 + 2 is **4**.",
        "tool_calls": []
      },
      "finish_reason": "stop"
    }
  ]
}
```
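The only field that needs checking to tell Step 1 from Step 3 is `finish_reason`. As a minimal sketch (the helper names `finish_reason_of` and `classify_response` are hypothetical, and the `grep`/`sed` parsing assumes the field holds a plain string with no escaped quotes), a response can be classified like this. Where `jq` is available, `jq -r '.choices[0].finish_reason'` is the more robust way to read the same field.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first finish_reason found in a
# chat-completion response read from stdin (compact or pretty-printed).
finish_reason_of() {
  grep -o '"finish_reason"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"/\1/'
}

# Classify a response read from stdin:
#   tool_call     -> the model asked for the tool (finish_reason "tool_calls")
#   direct_answer -> the model answered without the tool (finish_reason "stop")
#   unknown       -> anything else, including a missing field
classify_response() {
  case "$(finish_reason_of)" in
    tool_calls) echo "tool_call" ;;
    stop)       echo "direct_answer" ;;
    *)          echo "unknown" ;;
  esac
}
```

For example, `./p1.sh 2>/dev/null | classify_response` should print `tool_call` on a freshly launched instance, and prints `direct_answer` once the behavior described above has been triggered.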

## Environment

- macOS Version: 26.2
- EXO Version: 1.0.67
- Hardware: M3 Ultra Mac Studio (512GB RAM)
- Interconnection: Single Host

## Additional context

* **Persistence**: Repeatedly executing `p1.sh` continues to return the direct answer without triggering the tool.
* **Workaround**: Deleting the instance and re-launching it "clears" this memory. After a fresh launch, `p1.sh` will correctly trigger a tool call again (but only until a tool result is provided once).

---

## Appendix: Test Scripts

### p1.sh

This script sends a **stateless initial request**. It asks "What is 2+2?" and provides the `calculator` tool definition. Under normal conditions, the model should always return a `tool_calls` object to request the calculation.
```bash
curl -N -X POST http://localhost:52415/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mlx-community/gpt-oss-120b-MXFP4-Q8",
  "messages": [
    {
      "role": "user",
      "content": "What is 2+2? Use the calculator tool."
    }
  ],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform basic arithmetic",
        "parameters": {
          "type": "object",
          "properties": {
            "expression": {
              "type": "string",
              "description": "Math expression"
            }
          },
          "required": [
            "expression"
          ]
        }
      }
    }
  ]
}' | jq
```

### p2.sh

This script sends a **request with conversation history**. It includes the user's question, the assistant's previous tool call, and the tool's response (`2+2 = 4`). This is used to complete the function-calling cycle.
```bash
curl -N -X POST http://localhost:52415/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mlx-community/gpt-oss-120b-MXFP4-Q8",
  "messages": [
    {
      "role": "user",
      "content": "What is 2+2? Use the calculator tool."
    },
    {
      "role": "assistant",
      "content": "<think>User asks: \"What is 2+2? Use the calculator tool.\" So we need to call calculator function with expression \"2+2\".</think>",
      "tool_calls": [
        {
          "id": "be3f751a-ff6f-4b60-8e84-d60cffabe589",
          "index": 0,
          "type": "function",
          "function": {
            "name": "calculator",
            "arguments": "{\n  \"expression\": \"2+2\"\n}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "2+2 = 4",
      "tool_name": "calculator",
      "tool_call_id": "be3f751a-ff6f-4b60-8e84-d60cffabe589"
    }
  ],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Perform basic arithmetic",
        "parameters": {
          "type": "object",
          "properties": {
            "expression": {
              "type": "string",
              "description": "Math expression"
            }
          },
          "required": [
            "expression"
          ]
        }
      }
    }
  ]
}' | jq
```

[p1.sh](https://github.com/user-attachments/files/25047238/p1.sh)

[p2.sh](https://github.com/user-attachments/files/25047252/p2.sh)

Reproduction summary:

1. **Baseline**: Run `p1.sh`. You will receive a `tool_calls` response (Correct).
2. **State Injection**: Run `p2.sh`. You will receive the final answer "4" (Correct).
3. **The Leak**: Run `p1.sh` again.
   * *Expected*: A `tool_calls` response (identical to step 1).
   * *Actual*: The model directly returns the final answer "4", bypassing the tool call (Bug).
## Screenshot

<img width="1652" height="900" alt="Image" src="https://github.com/user-attachments/assets/3d414b72-3f4a-429d-9969-9107e1656b7c" />
