Describe the bug
When you configure a root agent so that it can call the same AgentTool (configured with an MCP toolset) in parallel, and one of the calls finishes first, the rest of them fail.
To Reproduce
Before we go...
- Any stateful toolset (i.e. one whose `close()` method actually does something) may cause this error.
- Since I encountered this when using `McpToolset`, I will explain the steps to reproduce using `McpToolset`.
- It is more likely to reproduce the issue if:
  - The execution time of the AgentTool is fairly long (roughly >10s).
  - The execution time of the AgentTool is random.
  - These make the AgentTool invocations finish at different times, so one call always finishes first.
You may reproduce this issue in two ways.
First case: call a `Runner` which shares the same `Agent` concurrently.
- A root agent, configured with an `McpToolset`.
- Execute two runners simultaneously (a rough sketch follows this list).
- When any runner finishes first, the remaining runner fails, raising an exception (related to resource exhaustion).
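For reference, here is a minimal sketch of this setup. It is only an approximation: the import paths, the `InMemoryRunner`/`create_session` signatures, and the example MCP server are assumptions and may differ across ADK versions.

```python
import asyncio

from mcp import StdioServerParameters

from google.adk.agents import LlmAgent
from google.adk.runners import InMemoryRunner
from google.adk.tools.mcp_tool.mcp_toolset import McpToolset
from google.genai import types

# Any MCP server works; the slower its tools, the easier the race is to hit.
toolset = McpToolset(
    connection_params=StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-everything"],
    )
)

# A single agent (and therefore a single toolset) shared by both runners.
agent = LlmAgent(
    name="mcp_agent",
    model="gemini-2.5-flash",
    instruction="Answer by calling the MCP tools.",
    tools=[toolset],
)

async def run_once(tag: str) -> None:
    # Each runner wraps the *same* agent object, so they share the same toolset.
    runner = InMemoryRunner(agent=agent, app_name=f"repro_{tag}")
    session = await runner.session_service.create_session(
        app_name=f"repro_{tag}", user_id="user"
    )
    message = types.Content(role="user", parts=[types.Part(text="Use a tool, please.")])
    async for _ in runner.run_async(
        user_id="user", session_id=session.id, new_message=message
    ):
        pass

async def main() -> None:
    # When the faster run finishes and cleans up, the slower run's in-flight
    # MCP calls start failing with resource-exhaustion-style errors.
    await asyncio.gather(run_once("a"), run_once("b"))

asyncio.run(main())
```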
Second case: call `AgentTool` with `McpToolset` in parallel.
- A root agent, configured with an `AgentTool`.
- This `AgentTool` wraps an agent which has an MCP toolset.
- Write an instruction that triggers the root agent to call the AgentTool in parallel, e.g. `You must call the 'search_item' tool 10 times in parallel.` (a rough sketch follows this list).
- Execute the root agent.
- When any AgentTool call finishes first, the remaining calls fail, raising an exception (related to resource exhaustion).
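A rough sketch of this second setup, under the same assumptions about import paths and the MCP server as above:

```python
from mcp import StdioServerParameters

from google.adk.agents import LlmAgent
from google.adk.tools.agent_tool import AgentTool
from google.adk.tools.mcp_tool.mcp_toolset import McpToolset

# The sub-agent owns the stateful MCP toolset.
search_agent = LlmAgent(
    name="search_item",
    model="gemini-2.5-flash",
    instruction="Call the MCP search tool and summarize what it returns.",
    tools=[
        McpToolset(
            connection_params=StdioServerParameters(
                command="npx",
                args=["-y", "@modelcontextprotocol/server-everything"],
            )
        )
    ],
)

# The root agent is nudged into issuing parallel calls to the same AgentTool.
root_agent = LlmAgent(
    name="root_agent",
    model="gemini-2.5-flash",
    instruction="You must call the 'search_item' tool 10 times in parallel.",
    tools=[AgentTool(agent=search_agent)],
)
```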
Expected behavior
Even if one of the runners finishes first, the rest of the runners should not fail.
Screenshots
None, but I could reproduce both cases by adding two sample test agent setups. You may look into them in PR #4046.
Desktop (please complete the following information):
- OS: macOS
- Python version (`python -V`): 3.13
- ADK version (`pip show google-adk`): 1.9.0
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.5-flash
Additional context
This happens due to the early release of a shared resource in the runner, especially when the toolset is stateful.
First case: call a `Runner` which shares the same `Agent` concurrently
Starting with the first case, I described the initial state in the diagram below.
Note that the two runners are connected to a single toolset; this shows that each runner will close the toolset after it finishes, since both runners are based on the same agent.
Assume that Runner 1 finishes first and cleans up the toolset by calling `self.close()`.
Now the toolset has been closed, but Runner 2 is not finished yet. Therefore, its tool call throws an exception, making the runner fail.
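Stripped of ADK specifics, the failure mode reduces to a plain asyncio race. Everything in the snippet below is illustrative (no ADK classes involved); the shared object stands in for a stateful toolset such as `McpToolset`.

```python
import asyncio

class SharedToolset:
    """Stand-in for a stateful toolset shared by several concurrent runs."""

    def __init__(self) -> None:
        self._open = True

    async def call_tool(self, delay: float) -> str:
        await asyncio.sleep(delay)  # simulate tool latency
        if not self._open:
            raise RuntimeError("toolset already closed")  # what the slow run sees
        return "ok"

    async def close(self) -> None:
        self._open = False

async def run(toolset: SharedToolset, delay: float) -> str:
    try:
        return await toolset.call_tool(delay)
    finally:
        # Each "runner" closes the shared toolset when it is done -- this is the
        # early release: the first finisher closes what the slower one still needs.
        await toolset.close()

async def main() -> None:
    toolset = SharedToolset()
    results = await asyncio.gather(
        run(toolset, 0.1), run(toolset, 0.5), return_exceptions=True
    )
    print(results)  # ['ok', RuntimeError('toolset already closed')]

asyncio.run(main())
```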
Second case: call `AgentTool` with `McpToolset` in parallel.
The same thing happens in the second case. The initial state is more complicated, but the configuration is basically the same: two runners are executed concurrently.
And, as you would expect, the same problem occurs.
IMO, since users usually do not use the runner explicitly, I think the first case would not occur frequently.
The second case will occur quite frequently, since calling the same AgentTool in parallel is common (and useful!), e.g.
- RAG search in parallel (this is my case)
- running a PR review bot in parallel
- fetch data chunks in parallel
Also, whether tools are called in parallel is decided by the LLM, and from what I know, LLMs usually do call tools in parallel. We could only control this behaviour by instructing the LLM to call tools sequentially, which is not deterministic (99% maybe, but not 100%).
I made a PR to address the second issue, adding an `AgentToolManager` which functions as a reference count for the agent when using `AgentTool`.
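Conceptually, the fix amounts to reference counting: the shared resource is only closed when its last concurrent user releases it. A minimal sketch of that general idea (my own illustration of the approach, not the PR's actual code):

```python
import asyncio

class RefCounted:
    """Close a shared resource only when its last user releases it."""

    def __init__(self, resource) -> None:
        self._resource = resource
        self._users = 0
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            self._users += 1
            return self._resource

    async def release(self) -> None:
        async with self._lock:
            self._users -= 1
            if self._users == 0:
                # Safe to close: no concurrent caller still depends on it.
                await self._resource.close()
```

Each parallel `AgentTool` invocation would call `acquire()` before running and `release()` in a `finally` block, so `close()` runs exactly once, after the slowest call has finished.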