fix(loop-detection): detect alternating tool-name cycles by mvanhorn · Pull Request #2590 · bytedance/deer-flow

mvanhorn · 2026-04-26T23:05:54Z

Summary

Adds a third detection layer to LoopDetectionMiddleware that catches short alternating tool-name patterns. The two existing layers miss web_search → web_fetch → web_search → web_fetch → ... loops because:

Layer 1 (hash) compares full tool-call sets, so two calls with different args (different search terms / URLs) hash differently and never match.
Layer 2 (per-tool frequency) only triggers when one tool name passes the threshold; alternation splits the count across two names so neither hits the limit fast enough.

Reported in #2569 with a trace showing 200+ alternating calls before the existing safeguards stopped the run.

Why this matters

Cited evidence:

#2569 shows the literal failure mode: 200+ tool calls of web_search → web_fetch → web_search → web_fetch → ... exhausting recursion_limit.
The middleware's own docstring (lines 144-159 of loop_detection_middleware.py) describes the two existing layers and the gap they leave: alternation passes the hash check (different args) and the per-tool-frequency check (split count).
The fix lands at the same module so all three layers stay co-located and configurable.

Changes

backend/packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py:

New _detect_cycle(names, min_len, max_len, min_repeats) static helper that scans the tail of names for a length-L cycle repeating min_repeats times for any L in [min_len, max_len]. Skips single-name repeats via len(set(cycle)) < 2 so it never overlaps with Layer 2.
New per-thread _name_history: OrderedDict[str, list[str]] and _cycle_warned: dict[str, set[str]].
New constructor args (with defaults): cycle_min_len=2, cycle_max_len=4, cycle_repeats_warn=3, cycle_repeats_hard=4.
Layer 3 runs after Layers 1 & 2 in _track_and_check. Hard limit returns _HARD_STOP_MSG, True. Warn returns _WARNING_MSG, False once per (thread, cycle) pair.
LRU eviction in _evict_if_needed and the thread-reset path both clear the new dicts.

backend/tests/test_loop_detection_middleware.py:

New TestCycleDetection class with 6 scenario tests:
- 2-tool alternation warns at 3 cycles (6 calls).
- 2-tool alternation hard-stops at 4 cycles (8 calls).
- 3-tool cycle (a, b, c) hard-stops at 4 cycles (12 calls).
- Mixed pattern (a, b, a, c, a, b) does NOT trigger.
- Hash-based detection still works (regression).
- Per-tool frequency still works (regression).
Static unit tests of _detect_cycle for empty list, below-min length, exact hit, and near-miss.

How to test

cd backend
uvx ruff check packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py tests/test_loop_detection_middleware.py
PYTHONPATH=. uv run pytest tests/test_loop_detection_middleware.py -v

Both pass locally: 53/53 tests pass, ruff clean.

A reviewer can also reproduce the original failure mode by feeding the middleware an alternating sequence and observing that warn fires by call 6 and hard stop by call 8 — vs the 200+ calls reported in #2569.

AI assistance disclosure

Developed with Claude Code orchestrating Codex CLI (gpt-5.5 high). Adversarial review focused on layer ordering, single-tool overlap with Layer 2 (covered by len(set(cycle)) < 2 guard), and eviction parity (verified the new dicts are cleared in both _evict_if_needed and the reset path).

Closes #2569

Adds a third detection layer to LoopDetectionMiddleware that catches short alternating tool-name patterns (e.g. web_search → web_fetch → web_search → web_fetch ...). The two existing layers miss these: - Layer 1 (hash) compares full tool-call sets, so two calls with different args hash differently. - Layer 2 (per-tool frequency) only triggers when one tool name passes the threshold; alternation splits the count across two names so neither hits the limit fast enough. The new layer tracks the per-thread sequence of tool names, scans the tail for length-L cycles (L in [cycle_min_len, cycle_max_len]), and fires a warning at cycle_repeats_warn (default 3) and a hard stop at cycle_repeats_hard (default 4). A `len(set(cycle)) < 2` guard skips single-name repeats so this never overlaps with Layer 2. Adds 53 tests total: cycle warn / hard stop on 2- and 3-tool patterns, mixed sequences that should NOT trigger, regression for hash-based detection, regression for per-tool frequency, and unit coverage of the _detect_cycle helper at the boundaries (empty list, below min, exact hit, near-miss). Also extends LRU eviction and the thread-reset path to clear the new _name_history and _cycle_warned dicts. Closes bytedance#2569

Copilot

Pull request overview

Adds a third “Layer 3” to LoopDetectionMiddleware to detect short repeating tool-name cycles (e.g., alternating web_search/web_fetch) that evade the existing hash-based and per-tool-frequency safeguards, along with new unit tests covering cycle detection and regressions.

Changes:

Implement cycle-based loop detection with new configurable thresholds and per-thread name history tracking.
Add warning/hard-stop behavior for detected cycles and ensure new tracking state is evicted/reset correctly.
Extend test suite with new cycle-detection scenarios and helper for generating named tool calls.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`backend/packages/harness/deerflow/agents/middlewares/loop_detection_middleware.py`	Adds Layer 3 cycle detection, new config defaults, per-thread name tracking, and reset/eviction updates.
`backend/tests/test_loop_detection_middleware.py`	Adds `TestCycleDetection` coverage and updates LRU eviction assertions for new tracking structures.

+            name_history = self._name_history[thread_id]
+            name_history.extend(name for name in tool_names if name)


+            if cycle:
+                warned = self._cycle_warned[thread_id]
+                if cycle not in warned:
+                    warned.add(cycle)
+                    logger.warning(
+                        "Tool-name cycle detected — injecting warning",
+                        extra={
+                            "thread_id": thread_id,
+                            "cycle": cycle,
+                            "tools": tool_names,
+                        },
+                    )
+                    return _WARNING_MSG, False


+_DEFAULT_CYCLE_REPEATS_WARN = 3  # warn when a cycle repeats 3 times
+_DEFAULT_CYCLE_REPEATS_HARD = 4  # hard-stop at 4 repeats


WillemJiang requested a review from Copilot May 1, 2026 08:30

Copilot started reviewing on behalf of WillemJiang May 1, 2026 08:31 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

NewAmorend mentioned this pull request May 9, 2026

[RFC] SWE-bench Recursion Limits and Long-Horizon Benchmark Suggestions #2820

Open

4 tasks

MazBenOscar mentioned this pull request May 10, 2026

docs: add benchmark recursion reproducibility kit #2848

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(loop-detection): detect alternating tool-name cycles#2590

fix(loop-detection): detect alternating tool-name cycles#2590
mvanhorn wants to merge 1 commit into
bytedance:mainfrom
mvanhorn:fix/2569-loop-detection-alternating-pattern

mvanhorn commented Apr 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		name_history = self._name_history[thread_id]
		name_history.extend(name for name in tool_names if name)

		_DEFAULT_CYCLE_REPEATS_WARN = 3 # warn when a cycle repeats 3 times
		_DEFAULT_CYCLE_REPEATS_HARD = 4 # hard-stop at 4 repeats

Conversation

mvanhorn commented Apr 26, 2026

Summary

Why this matters

Changes

How to test

AI assistance disclosure

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants