feat(loop-detection): add escape hatches for legitimate batch tool calls #2591

Open
mvanhorn wants to merge 1 commit into bytedance:main from mvanhorn:fix/2511-loop-detection-batch-friendly

Closes #2511

Scientific workflows (RNA-seq differential expression, parameter sweeps, multi-file analysis) legitimately call the same tool many times in succession. The current LoopDetectionMiddleware hash + frequency layers fire on these workflows and force-stop the run with no way to opt out.

Change

Three orthogonal escape hatches, plus a clearer error message:

| Mechanism | Scope | Set by |
| --- | --- | --- |
| batch_friendly_tools={"bash", ...} | Per-tool, all calls | Middleware constructor |
| _loop_detection_skip: true in tool args | Single call | Tool author |
| loop_detection_disabled: true in runtime context | Whole run | CLI flag / agent config |

Calls excluded by any mechanism are neither appended to _history nor counted in _tool_freq, so they are effectively invisible to detection. Other calls in the same response are still tracked normally.
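As a rough illustration of how the three checks could combine, the sketch below filters a batch of tool calls before any tracking happens. It is not the code in this PR: the helper names `is_exempt` and `trackable_calls`, the tool-call shape (a dict with `name` and `args`), and the plain-dict runtime context are assumptions made for the example. Only `batch_friendly_tools`, `_loop_detection_skip`, and `loop_detection_disabled` come from the description above.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def is_exempt(
    tool_call: Mapping[str, Any],
    batch_friendly_tools: frozenset[str] = frozenset(),
    runtime_context: Mapping[str, Any] | None = None,
) -> bool:
    """Return True if this call should be invisible to loop detection.

    Hypothetical helper: the call shape and context type are assumptions.
    """
    # Whole-run switch: the runtime context disables detection entirely.
    if runtime_context and runtime_context.get("loop_detection_disabled") is True:
        return True
    # Per-tool switch: every call to a listed tool is exempt.
    if tool_call.get("name") in batch_friendly_tools:
        return True
    # Per-call switch: the tool author marks a single call in its args.
    args = tool_call.get("args") or {}
    return args.get("_loop_detection_skip") is True


def trackable_calls(
    tool_calls: Sequence[Mapping[str, Any]],
    batch_friendly_tools: frozenset[str] = frozenset(),
    runtime_context: Mapping[str, Any] | None = None,
) -> list[Mapping[str, Any]]:
    """Keep only the calls that the hash and frequency layers should see."""
    return [
        call
        for call in tool_calls
        if not is_exempt(call, batch_friendly_tools, runtime_context)
    ]
```

In this sketch an exempt call is dropped before tracking, which matches the "invisible to detection" behavior described above, while non-exempt calls in the same response pass through unchanged.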

When the middleware does fire a hard stop, the message now mentions all three escape hatches so the user has a discoverable path out:

[FORCED STOP] Repeated tool calls exceeded the safety limit. Producing
final answer with results collected so far. If this is an intentional
batch workflow, use runtime context loop_detection_disabled, configure
batch_friendly_tools, or set _loop_detection_skip=true in a tool call's args.

Tests

backend/tests/test_loop_detection_middleware.py adds a TestEscapeHatches class with 6 cases:

  • test_batch_friendly_tools_skip_hash_layer — 6 identical bash calls with bash batch-friendly: no hard stop.
  • test_batch_friendly_tools_skip_frequency_layer — 60 varying-args bash calls: no warning, no hard stop.
  • test_loop_detection_skip_marker_in_args — 6 identical calls with _loop_detection_skip=true: no hard stop.
  • test_runtime_loop_detection_disabled — 6 identical calls with runtime override: no hard stop, no state recorded.
  • test_hard_stop_message_includes_escape_hatch_hint — default config triggers hard stop, asserts the hint text contains all three mechanism names.
  • test_no_regression_when_no_escape_hatches_configured — default behavior unchanged.
============================== 52 passed in 0.15s ==============================
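For readers who want a feel for the hint-text check, here is a minimal sketch of the kind of assertion test_hard_stop_message_includes_escape_hatch_hint is described as making. The message text is copied from this description. The constant names and the `missing_hints` helper are hypothetical, not taken from the test file.

```python
# Hard-stop message as quoted in this PR description.
HARD_STOP_MSG = (
    "[FORCED STOP] Repeated tool calls exceeded the safety limit. Producing "
    "final answer with results collected so far. If this is an intentional "
    "batch workflow, use runtime context loop_detection_disabled, configure "
    "batch_friendly_tools, or set _loop_detection_skip=true in a tool call's args."
)

# The three mechanism names a user should be able to discover from the message.
ESCAPE_HATCHES = (
    "loop_detection_disabled",
    "batch_friendly_tools",
    "_loop_detection_skip",
)


def missing_hints(message: str) -> list[str]:
    """Return the escape-hatch names that the message fails to mention."""
    return [name for name in ESCAPE_HATCHES if name not in message]


# An empty list means every escape hatch is named in the message.
assert missing_hints(HARD_STOP_MSG) == []
```

Checking for the identifier names rather than the full sentence keeps such a test stable if the surrounding wording is later reworded.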

Verification

  • cd backend && PYTHONPATH=. uv run pytest tests/test_loop_detection_middleware.py -v — 52 passed
  • cd backend && uvx ruff check packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py tests/test_loop_detection_middleware.py — clean
  • cd backend && uvx ruff format --check ... — clean
  • backend/CLAUDE.md middleware-chain bullet updated to mention the new escape hatches.

Coordination

Coordinates with #2569's alternating-pattern detector (separate PR): that PR tightens detection for cases the current logic misses, while this one loosens it for cases where the current logic over-fires. Together they make detection more accurate without regressing existing behavior.

🤖 Built with assistance from Claude Code and Codex.

Closes bytedance#2511.

The loop-detection middleware aborts scientific workflows that legitimately
batch hundreds of tool calls (e.g. RNA-seq differential expression). Add
three orthogonal escape hatches plus a clearer error message:

- batch_friendly_tools constructor arg: tool names listed here are
  excluded from both the hash layer and the per-tool-frequency layer.
- _loop_detection_skip: true in tool args: per-call opt-out, useful for
  one-off batch operations without changing the middleware config.
- runtime context loop_detection_disabled: true: workflow-level kill
  switch when the entire run is known to be batch-safe.

The hard-stop messages now mention all three so users have a discoverable
path out of a forced stop.

6 new tests cover each escape hatch plus the unchanged default-config
regression. 52 tests total pass.

Development

Successfully merging this pull request may close these issues.

Loop Detection Middleware Incorrectly Interrupts Scientific Workflows with Repeated Tool Calls