Skip to content

fix(loop-detection): detect alternating tool-name cycles#2590

Open
mvanhorn wants to merge 1 commit into
bytedance:mainfrom
mvanhorn:fix/2569-loop-detection-alternating-pattern
Open

fix(loop-detection): detect alternating tool-name cycles#2590
mvanhorn wants to merge 1 commit into
bytedance:mainfrom
mvanhorn:fix/2569-loop-detection-alternating-pattern

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

Summary

Adds a third detection layer to LoopDetectionMiddleware that catches short alternating tool-name patterns. The two existing layers miss web_search → web_fetch → web_search → web_fetch → ... loops because:

  • Layer 1 (hash) compares full tool-call sets, so two calls with different args (different search terms / URLs) hash differently and never match.
  • Layer 2 (per-tool frequency) only triggers when one tool name passes the threshold; alternation splits the count across two names so neither hits the limit fast enough.

Reported in #2569 with a trace showing 200+ alternating calls before the existing safeguards stopped the run.

Why this matters

Cited evidence:

  • #2569 shows the literal failure mode: 200+ tool calls of web_search → web_fetch → web_search → web_fetch → ... exhausting recursion_limit.
  • The middleware's own docstring (lines 144-159 of loop_detection_middleware.py) describes the two existing layers and the gap they leave: alternation passes the hash check (different args) and the per-tool-frequency check (split count).
  • The fix lands at the same module so all three layers stay co-located and configurable.

Changes

backend/packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py:

  • New _detect_cycle(names, min_len, max_len, min_repeats) static helper that scans the tail of names for a length-L cycle repeating min_repeats times for any L in [min_len, max_len]. Skips single-name repeats via len(set(cycle)) < 2 so it never overlaps with Layer 2.
  • New per-thread _name_history: OrderedDict[str, list[str]] and _cycle_warned: dict[str, set[str]].
  • New constructor args (with defaults): cycle_min_len=2, cycle_max_len=4, cycle_repeats_warn=3, cycle_repeats_hard=4.
  • Layer 3 runs after Layers 1 & 2 in _track_and_check. Hard limit returns _HARD_STOP_MSG, True. Warn returns _WARNING_MSG, False once per (thread, cycle) pair.
  • LRU eviction in _evict_if_needed and the thread-reset path both clear the new dicts.

backend/tests/test_loop_detection_middleware.py:

  • New TestCycleDetection class with 6 scenario tests:
    • 2-tool alternation warns at 3 cycles (6 calls).
    • 2-tool alternation hard-stops at 4 cycles (8 calls).
    • 3-tool cycle (a, b, c) hard-stops at 4 cycles (12 calls).
    • Mixed pattern (a, b, a, c, a, b) does NOT trigger.
    • Hash-based detection still works (regression).
    • Per-tool frequency still works (regression).
  • Static unit tests of _detect_cycle for empty list, below-min length, exact hit, and near-miss.

How to test

cd backend
uvx ruff check packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py tests/test_loop_detection_middleware.py
PYTHONPATH=. uv run pytest tests/test_loop_detection_middleware.py -v

Both pass locally: 53/53 tests pass, ruff clean.

A reviewer can also reproduce the original failure mode by feeding the middleware an alternating sequence and observing that warn fires by call 6 and hard stop by call 8 — vs the 200+ calls reported in #2569.

AI assistance disclosure

Developed with Claude Code orchestrating Codex CLI (gpt-5.5 high). Adversarial review focused on layer ordering, single-tool overlap with Layer 2 (covered by len(set(cycle)) < 2 guard), and eviction parity (verified the new dicts are cleared in both _evict_if_needed and the reset path).

Closes #2569

Adds a third detection layer to LoopDetectionMiddleware that catches
short alternating tool-name patterns (e.g. web_search → web_fetch →
web_search → web_fetch ...). The two existing layers miss these:

- Layer 1 (hash) compares full tool-call sets, so two calls with
  different args hash differently.
- Layer 2 (per-tool frequency) only triggers when one tool name passes
  the threshold; alternation splits the count across two names so
  neither hits the limit fast enough.

The new layer tracks the per-thread sequence of tool names, scans the
tail for length-L cycles (L in [cycle_min_len, cycle_max_len]), and
fires a warning at cycle_repeats_warn (default 3) and a hard stop at
cycle_repeats_hard (default 4). A `len(set(cycle)) < 2` guard skips
single-name repeats so this never overlaps with Layer 2.

Adds 53 tests total: cycle warn / hard stop on 2- and 3-tool patterns,
mixed sequences that should NOT trigger, regression for hash-based
detection, regression for per-tool frequency, and unit coverage of the
_detect_cycle helper at the boundaries (empty list, below min, exact
hit, near-miss).

Also extends LRU eviction and the thread-reset path to clear the new
_name_history and _cycle_warned dicts.

Closes bytedance#2569
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a third “Layer 3” to LoopDetectionMiddleware to detect short repeating tool-name cycles (e.g., alternating web_search/web_fetch) that evade the existing hash-based and per-tool-frequency safeguards, along with new unit tests covering cycle detection and regressions.

Changes:

  • Implement cycle-based loop detection with new configurable thresholds and per-thread name history tracking.
  • Add warning/hard-stop behavior for detected cycles and ensure new tracking state is evicted/reset correctly.
  • Extend test suite with new cycle-detection scenarios and helper for generating named tool calls.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
backend/packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py Adds Layer 3 cycle detection, new config defaults, per-thread name tracking, and reset/eviction updates.
backend/tests/test_loop_detection_middleware.py Adds TestCycleDetection coverage and updates LRU eviction assertions for new tracking structures.

Comment on lines +361 to +362
name_history = self._name_history[thread_id]
name_history.extend(name for name in tool_names if name)
Comment on lines +393 to +405
if cycle:
warned = self._cycle_warned[thread_id]
if cycle not in warned:
warned.add(cycle)
logger.warning(
"Tool-name cycle detected — injecting warning",
extra={
"thread_id": thread_id,
"cycle": cycle,
"tools": tool_names,
},
)
return _WARNING_MSG, False
Comment on lines +39 to +40
_DEFAULT_CYCLE_REPEATS_WARN = 3 # warn when a cycle repeats 3 times
_DEFAULT_CYCLE_REPEATS_HARD = 4 # hard-stop at 4 repeats
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants