fix(loop-detection): detect alternating tool-name cycles #2590
Open
mvanhorn wants to merge 1 commit into
Adds a third detection layer to `LoopDetectionMiddleware` that catches short alternating tool-name patterns (e.g. `web_search → web_fetch → web_search → web_fetch ...`). The two existing layers miss these:

- Layer 1 (hash) compares full tool-call sets, so two calls with different args hash differently.
- Layer 2 (per-tool frequency) only triggers when one tool name passes the threshold; alternation splits the count across two names so neither hits the limit fast enough.

The new layer tracks the per-thread sequence of tool names, scans the tail for length-L cycles (L in `[cycle_min_len, cycle_max_len]`), and fires a warning at `cycle_repeats_warn` (default 3) and a hard stop at `cycle_repeats_hard` (default 4). A `len(set(cycle)) < 2` guard skips single-name repeats so this never overlaps with Layer 2.

Adds 53 tests total: cycle warn / hard stop on 2- and 3-tool patterns, mixed sequences that should NOT trigger, regression for hash-based detection, regression for per-tool frequency, and unit coverage of the `_detect_cycle` helper at the boundaries (empty list, below min, exact hit, near-miss).

Also extends LRU eviction and the thread-reset path to clear the new `_name_history` and `_cycle_warned` dicts.

Closes bytedance#2569
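The tail scan described in the commit message can be sketched roughly as follows. This is a minimal illustration written from the description only, not the PR's actual `_detect_cycle`; the function name `detect_cycle` and its return type (a tuple, or `None`) are assumptions.

```python
from typing import Optional, Sequence, Tuple


def detect_cycle(
    names: Sequence[str], min_len: int, max_len: int, min_repeats: int
) -> Optional[Tuple[str, ...]]:
    """Return the cycle repeated at the tail of `names`, or None (hypothetical sketch)."""
    for length in range(min_len, max_len + 1):
        span = length * min_repeats
        if span > len(names):
            break  # longer cycles would need even more history
        tail = tuple(names[-span:])
        cycle = tail[:length]
        if len(set(cycle)) < 2:
            continue  # single-name repeat: left to the per-tool frequency layer
        if tail == cycle * min_repeats:
            return cycle
    return None


print(detect_cycle(["web_search", "web_fetch"] * 3, 2, 4, 3))  # ('web_search', 'web_fetch')
print(detect_cycle(["a", "b", "a", "c", "a", "b"], 2, 4, 3))   # None
```

Checking shorter cycle lengths first means an alternating pair is reported as a length-2 cycle rather than as a length-4 one.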
Pull request overview
Adds a third “Layer 3” to LoopDetectionMiddleware to detect short repeating tool-name cycles (e.g., alternating web_search/web_fetch) that evade the existing hash-based and per-tool-frequency safeguards, along with new unit tests covering cycle detection and regressions.
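To see why alternation slips past the two existing safeguards, consider this small stand-in. The hashing scheme (SHA-256 over a JSON dump) and the call shape are assumptions made for illustration; the middleware's real hash and counters may differ.

```python
import hashlib
import json
from collections import Counter

# Eight alternating calls, each with different arguments (hypothetical data).
calls = [
    {"name": "web_search" if i % 2 == 0 else "web_fetch", "args": {"q": f"query-{i}"}}
    for i in range(8)
]


def call_hash(call: dict) -> str:
    """Stand-in for a hash over the full tool call (name plus args)."""
    payload = json.dumps(call, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


unique_hashes = len({call_hash(c) for c in calls})
per_tool = Counter(c["name"] for c in calls)

print(unique_hashes)   # 8: no hash ever repeats, so a hash check sees nothing
print(dict(per_tool))  # {'web_search': 4, 'web_fetch': 4}: the count is split in half
```

With the count split across two names, a per-tool threshold is reached only after roughly twice as many total calls.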
Changes:
- Implement cycle-based loop detection with new configurable thresholds and per-thread name history tracking.
- Add warning/hard-stop behavior for detected cycles and ensure new tracking state is evicted/reset correctly.
- Extend test suite with new cycle-detection scenarios and helper for generating named tool calls.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `backend/packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py` | Adds Layer 3 cycle detection, new config defaults, per-thread name tracking, and reset/eviction updates. |
| `backend/tests/test_loop_detection_middleware.py` | Adds `TestCycleDetection` coverage and updates LRU eviction assertions for new tracking structures. |
Comment on lines +361 to +362

    name_history = self._name_history[thread_id]
    name_history.extend(name for name in tool_names if name)
Comment on lines +393 to +405

    if cycle:
        warned = self._cycle_warned[thread_id]
        if cycle not in warned:
            warned.add(cycle)
            logger.warning(
                "Tool-name cycle detected — injecting warning",
                extra={
                    "thread_id": thread_id,
                    "cycle": cycle,
                    "tools": tool_names,
                },
            )
            return _WARNING_MSG, False
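
The warn-once behaviour in the snippet above can be isolated as a small sketch. The names `cycle_warned` and `should_warn` are hypothetical, and the cycle is assumed to be a hashable value such as a tuple.

```python
from collections import defaultdict

# Per-thread set of cycles that have already produced a warning (assumed shape).
cycle_warned = defaultdict(set)


def should_warn(thread_id: str, cycle: tuple) -> bool:
    """Return True only the first time this (thread, cycle) pair is seen."""
    warned = cycle_warned[thread_id]
    if cycle in warned:
        return False
    warned.add(cycle)
    return True


print(should_warn("t1", ("web_search", "web_fetch")))  # True
print(should_warn("t1", ("web_search", "web_fetch")))  # False
print(should_warn("t2", ("web_search", "web_fetch")))  # True
```

Note that under this scheme a rotated cycle such as `("web_fetch", "web_search")` counts as a distinct key.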
Comment on lines +39 to +40

    _DEFAULT_CYCLE_REPEATS_WARN = 3  # warn when a cycle repeats 3 times
    _DEFAULT_CYCLE_REPEATS_HARD = 4  # hard-stop at 4 repeats
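
As a quick check on what these defaults imply, the number of tool calls needed before each threshold can fire is the cycle length times the repeat count. The constant names below are local to this sketch; the cycle-length bounds of 2 and 4 are the defaults stated in the PR description.

```python
CYCLE_MIN_LEN, CYCLE_MAX_LEN = 2, 4
CYCLE_REPEATS_WARN, CYCLE_REPEATS_HARD = 3, 4

# cycle length -> (calls before warn can fire, calls before hard stop can fire)
calls_needed = {
    length: (length * CYCLE_REPEATS_WARN, length * CYCLE_REPEATS_HARD)
    for length in range(CYCLE_MIN_LEN, CYCLE_MAX_LEN + 1)
}

print(calls_needed)  # {2: (6, 8), 3: (9, 12), 4: (12, 16)}
```

The length-2 and length-3 rows match the figures quoted in the PR description (warn by call 6 and hard stop by call 8 for an alternating pair; 12 calls for a 3-tool hard stop).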
Summary
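Adds a third detection layer to `LoopDetectionMiddleware` that catches short alternating tool-name patterns. The two existing layers miss `web_search → web_fetch → web_search → web_fetch → ...` loops because:

- Layer 1 (hash) compares full tool-call sets, so calls with different args hash differently.
- Layer 2 (per-tool frequency) only triggers when one tool name passes the threshold, and alternation splits the count across two names.

Reported in #2569 with a trace showing 200+ alternating calls before the existing safeguards stopped the run.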
Why this matters
Cited evidence:
- `web_search → web_fetch → web_search → web_fetch → ...` exhausting `recursion_limit`.
- (`loop_detection_middleware.py`) describes the two existing layers and the gap they leave: alternation passes the hash check (different args) and the per-tool-frequency check (split count).

Changes

`backend/packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py`:
- `_detect_cycle(names, min_len, max_len, min_repeats)` static helper that scans the tail of `names` for a length-L cycle repeating `min_repeats` times for any L in `[min_len, max_len]`. Skips single-name repeats via `len(set(cycle)) < 2` so it never overlaps with Layer 2.
- `_name_history: OrderedDict[str, list[str]]` and `_cycle_warned: dict[str, set[str]]`.
- `cycle_min_len=2`, `cycle_max_len=4`, `cycle_repeats_warn=3`, `cycle_repeats_hard=4`.
- `_track_and_check`. Hard limit returns `_HARD_STOP_MSG, True`. Warn returns `_WARNING_MSG, False` once per (thread, cycle) pair.
- `_evict_if_needed` and the thread-reset path both clear the new dicts.

`backend/tests/test_loop_detection_middleware.py`:
- `TestCycleDetection` class with 6 scenario tests:
  - (`a, b, c`) hard-stops at 4 cycles (12 calls).
  - (`a, b, a, c, a, b`) does NOT trigger.
- `_detect_cycle` for empty list, below-min length, exact hit, and near-miss.

How to test
Both pass locally: 53/53 tests pass, ruff clean.
A reviewer can also reproduce the original failure mode by feeding the middleware an alternating sequence and observing that the warning fires by call 6 and the hard stop by call 8, versus the 200+ calls reported in #2569.
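The reproduction can be approximated outside the middleware with a small simulation. Everything here is a stand-in under stated assumptions: `tail_cycle` is a hypothetical re-implementation of the described helper, `first_triggers` is not part of the PR, and the hard-stop check is assumed to run before the warn check.

```python
from itertools import cycle as repeat_forever, islice


def tail_cycle(names, min_len, max_len, repeats):
    """Hypothetical stand-in for the PR's cycle helper."""
    for length in range(min_len, max_len + 1):
        span = length * repeats
        if span > len(names):
            break
        unit = names[-span:][:length]
        if len(set(unit)) >= 2 and names[-span:] == unit * repeats:
            return tuple(unit)
    return None


def first_triggers(tool_names, warn=3, hard=4, min_len=2, max_len=4):
    """Return (call number of first warning, call number of hard stop)."""
    history, warn_at = [], None
    for call_no, name in enumerate(tool_names, start=1):
        history.append(name)
        if tail_cycle(history, min_len, max_len, hard):
            return warn_at, call_no
        if warn_at is None and tail_cycle(history, min_len, max_len, warn):
            warn_at = call_no
    return warn_at, None


alternating = list(islice(repeat_forever(["web_search", "web_fetch"]), 200))
print(first_triggers(alternating))                     # (6, 8)
print(first_triggers(["a", "b", "a", "c", "a", "b"]))  # (None, None)
```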
AI assistance disclosure
Developed with Claude Code orchestrating Codex CLI (gpt-5.5 high). Adversarial review focused on layer ordering, single-tool overlap with Layer 2 (covered by the `len(set(cycle)) < 2` guard), and eviction parity (verified the new dicts are cleared in both `_evict_if_needed` and the reset path).

Closes #2569
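
Eviction parity, as described above, means every per-thread structure is dropped together. A minimal sketch of that idea follows; the class `ThreadState`, its `touch` method, and the `max_threads` parameter are hypothetical and only mirror the dict shapes named in the description.

```python
from collections import OrderedDict


class ThreadState:
    """Hypothetical holder for per-thread cycle-tracking state."""

    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.name_history = OrderedDict()  # thread_id -> list of tool names
        self.cycle_warned = {}             # thread_id -> set of warned cycles

    def touch(self, thread_id):
        """Mark a thread as most recently used, creating state if needed."""
        self.name_history.setdefault(thread_id, [])
        self.name_history.move_to_end(thread_id)
        self.cycle_warned.setdefault(thread_id, set())
        self._evict_if_needed()

    def _evict_if_needed(self):
        # Drop least recently used threads from BOTH structures.
        while len(self.name_history) > self.max_threads:
            evicted, _ = self.name_history.popitem(last=False)
            self.cycle_warned.pop(evicted, None)

    def reset(self, thread_id):
        # The reset path must clear the same structures as eviction.
        self.name_history.pop(thread_id, None)
        self.cycle_warned.pop(thread_id, None)


state = ThreadState(max_threads=2)
for tid in ("t1", "t2", "t3"):
    state.touch(tid)
print(list(state.name_history), sorted(state.cycle_warned))  # ['t2', 't3'] ['t2', 't3']
```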