diff --git a/python/packages/core/AGENTS.md b/python/packages/core/AGENTS.md index f5fc15a3d7..ad1f215f51 100644 --- a/python/packages/core/AGENTS.md +++ b/python/packages/core/AGENTS.md @@ -128,7 +128,10 @@ agent_framework/ - **`AgentLoopMiddleware`** - `AgentMiddleware` that re-runs an agent in a loop by calling `call_next()` repeatedly (the pipeline re-reads `context.messages` each time). One configurable class covers two patterns: a required user `should_continue` predicate (sync or async, the first positional/keyword arg), and a chat-client judge built via the `.with_judge(...)` factory (a second chat client decides whether the original request was answered; loops while it is *not*, using a `JudgeVerdict` structured-output response — internally just an async `should_continue` predicate). The constructor covers the predicate pattern directly; only the judge has a convenience classmethod factory (`.with_judge(judge_client, ...)`) that forwards to `__init__`. Supports both streaming and non-streaming runs. By default a non-streaming run returns an aggregated `AgentResponse` containing every iteration's messages plus the injected `next_message` "nudge" messages (as `user` messages); set `return_final_only=True` to return only the last iteration's response. Streaming runs always yield each iteration's updates and emit the injected nudge messages as `user` updates between iterations (the `return_final_only` flag has no effect on streaming, and the final response reflects the last iteration; `MiddlewareTermination` is handled cleanly). `should_continue` is required; other constructor args are optional: `max_iterations` (safety cap; defaults to `DEFAULT_MAX_ITERATIONS`=10, explicit `None`→unbounded, positive int caps; `.with_judge` uses `DEFAULT_JUDGE_MAX_ITERATIONS`=5 as its default), `next_message` (defaults to a short "continue" nudge), `return_final_only`, and `additional_instructions` (an extra `system` message injected ahead of the input before the agent runs — becomes part of the original messages so it survives `fresh_context` resets and persists via a session). The judge is configured only through `.with_judge` (`judge_client`/`instructions`/`criteria`), not the constructor, and its `reasoning` is fed back to the agent as the next iteration's input; the judge forwards the original request messages and the agent's latest response messages verbatim so multi-modal content is preserved. `criteria` (a `list[str]`) is both injected as the agent's `additional_instructions` and rendered into the judge instructions wherever the `{{criteria}}` placeholder (`CRITERIA_PLACEHOLDER`) appears (`DEFAULT_JUDGE_INSTRUCTIONS` ends with it; custom `instructions` may include it, and it is stripped when no criteria are given). The `should_continue`/`next_message` callables are invoked with keyword args (`iteration`, `last_result`, `messages`, `original_messages`, `session`, `agent`, `progress`, `feedback`) and may be sync or async; declare only what you need plus `**kwargs`. `should_continue` may return a plain `bool` or a `(bool, str | None)` tuple whose second item is feedback surfaced to `next_message`/`record_feedback` via the `feedback` kwarg (the judge uses this to relay its `reasoning`). Stop precedence per iteration is `max_iterations` → `should_continue`, evaluated before `record_feedback` so the feedback is available to it. - **Feedback tracking** - `record_feedback` captures a per-iteration progress entry (called with the loop kwargs; if it returns a truthy string the entry is appended, otherwise the agent's response text is used as the fallback entry). The accumulated log is exposed to every callback via the `progress` keyword (a per-iteration copy of prior entries) and, when `inject_progress=True` (default), injected into the next iteration's input as a `user` message (the full log without a session, only the latest entry with a session to avoid duplicating history). `fresh_context=True` restarts each iteration from the original task plus the progress log; when a session is attached it is snapshotted (`to_dict()`) before the loop and restored (`from_dict` + field copy) between iterations so the local transcript and any service-side conversation id reset too (in-loop working-state is discarded, pre-loop state preserved, continuity carried only by the progress log). -- **`todos_remaining(provider)`** / **`background_tasks_running(provider)`** - Helper factories returning `should_continue` predicates that loop while a `TodoProvider` has open items, or while a `BackgroundAgentsProvider`'s persisted state shows running tasks. +- **`todos_remaining(*, looping_modes=None)`** / **`todos_remaining_message`** - Helper factories for todo-driven loops (the Python counterpart of .NET's `TodoCompletionLoopEvaluator`), designed for `create_harness_agent` but usable with any agent that registers a `TodoProvider` via `context_providers`. They resolve the `TodoProvider`/`AgentModeProvider` from the *running agent* (`agent.context_providers`, via `_resolve_context_provider`) rather than taking the provider as an argument, so they can be wired directly into `loop_should_continue`/`loop_next_message`. `todos_remaining` returns a `should_continue` predicate that loops while any todo is open; pass `looping_modes=[...]` to gate looping to specific operating modes (case-insensitive; honors the `AgentModeProvider`'s `source_id`/`available_modes`), `looping_modes=None` (default) applies in every mode, and an empty sequence raises `ValueError`. `todos_remaining_message` is a `next_message` callable that lists the still-open todo titles and tells the agent to finish them, returning `None` when the session/agent/provider is unavailable or nothing is open (in which case the middleware's default `None` handling applies: reuse the previous iteration's messages verbatim under the default `fresh_context=False`, or `DEFAULT_NEXT_MESSAGE` only when `fresh_context=True`). +- **`background_tasks_running(provider)`** - Helper factory returning a `should_continue` predicate that loops while a `BackgroundAgentsProvider`'s persisted state shows running tasks (takes the provider explicitly, unlike `todos_remaining`). + - **Approval escape hatch** - `_has_pending_approval_request(result)` checks whether an iteration's response carries a pending tool-approval request (any content with `type == "function_approval_request"`). Both the streaming and non-streaming loops stop and return that response to the caller *before* evaluating `should_continue`/`max_iterations` or injecting `next_message`, so the loop is HITL-safe even when wrapped outermost around a `ToolApprovalMiddleware` (mirrors the C# `LoopAgent`'s `HasPendingApprovalRequests`). + - **Harness integration** - `create_harness_agent` enables the loop when a `loop_should_continue` callable is passed; it prepends `AgentLoopMiddleware(loop_should_continue, max_iterations=loop_max_iterations, next_message=loop_next_message)` ahead of `ToolApprovalMiddleware` so the loop is the outermost middleware (each iteration is a full agent run including tool approval, and the escape hatch hands pending approvals back to the caller). `loop_next_message` and `loop_max_iterations` only take effect together with `loop_should_continue` (with no `loop_should_continue` there is no loop, so they are ignored); `loop_max_iterations` defaults to the loop's default cap (`None` → unbounded). ### Workflows (`_workflows/`) diff --git a/python/packages/core/agent_framework/__init__.py b/python/packages/core/agent_framework/__init__.py index 07516ad36b..1d60836598 100644 --- a/python/packages/core/agent_framework/__init__.py +++ b/python/packages/core/agent_framework/__init__.py @@ -112,6 +112,7 @@ JudgeVerdict, background_tasks_running, todos_remaining, + todos_remaining_message, ) from ._harness._memory import ( DEFAULT_MEMORY_SOURCE_ID, @@ -606,6 +607,7 @@ "set_agent_mode", "step", "todos_remaining", + "todos_remaining_message", "tool", "tool_call_args_match", "tool_called_check", diff --git a/python/packages/core/agent_framework/_harness/_agent.py b/python/packages/core/agent_framework/_harness/_agent.py index 3acad1b03e..87280a9fcc 100644 --- a/python/packages/core/agent_framework/_harness/_agent.py +++ b/python/packages/core/agent_framework/_harness/_agent.py @@ -24,6 +24,7 @@ from ._background_agents import BackgroundAgentsProvider from ._file_access import AgentFileStore, FileAccessProvider, FileSystemAgentFileStore from ._file_memory import FileMemoryProvider +from ._loop import DEFAULT_MAX_ITERATIONS, AgentLoopMiddleware from ._mode import AgentModeProvider from ._todo import TodoProvider from ._tool_approval import ToolApprovalMiddleware @@ -37,6 +38,7 @@ from .._compaction import CompactionStrategy, TokenizerProtocol from .._middleware import MiddlewareTypes from .._tools import ToolTypes + from ._loop import NextMessageCallable, ShouldContinueCallable from ._tool_approval import ToolApprovalRuleCallback logger = logging.getLogger(__name__) @@ -269,6 +271,9 @@ def create_harness_agent( disable_web_search: bool = False, disable_tool_auto_approval: bool = False, auto_approval_rules: Sequence[ToolApprovalRuleCallback] | None = None, + loop_should_continue: ShouldContinueCallable | None = None, + loop_next_message: NextMessageCallable | None = None, + loop_max_iterations: int | None = DEFAULT_MAX_ITERATIONS, otel_provider_name: str | None = None, context_providers: Sequence[ContextProvider] | None = None, middleware: Sequence[MiddlewareTypes] | None = None, @@ -289,6 +294,7 @@ def create_harness_agent( - **BackgroundAgentsProvider** — delegate work to background sub-agents - **Tool approval** — "don't ask again" standing approval rules plus heuristic auto-approval callbacks + - **Looping** — re-run the agent until a ``should_continue`` predicate is satisfied - **OpenTelemetry** — observability via ``AgentTelemetryLayer`` Each feature can be disabled or customized via keyword arguments. @@ -403,6 +409,19 @@ def create_harness_agent( content and returns ``True`` to approve it. Rules are evaluated after standing rules (derived from prior user approvals) but before prompting the user. Only used when ``disable_tool_auto_approval`` is False. + loop_should_continue: Optional predicate that enables the looping middleware. When provided, the + agent is re-run in a loop (via :class:`~agent_framework.AgentLoopMiddleware`, wired as + the outermost middleware so each iteration is a full agent run including tool approval) + for as long as the predicate returns ``True``, up to ``loop_max_iterations``. If an + iteration returns a pending tool-approval request, the loop stops and returns it so the + caller can approve before continuing. When None (default), no loop is added. + loop_next_message: Optional callable controlling the input for the next loop iteration. + Only takes effect when ``loop_should_continue`` is set (otherwise no loop is added and + this is ignored). + loop_max_iterations: Safety cap on the number of loop iterations. ``None`` means unbounded; + a positive integer caps the loop (defaults to the loop middleware's default cap). Only + takes effect when ``loop_should_continue`` is set (otherwise no loop is added and this + is ignored). otel_provider_name: Custom OpenTelemetry provider/source name for telemetry. context_providers: Additional context providers to include after the built-in ones. middleware: Additional middleware to include. @@ -500,9 +519,21 @@ def create_harness_agent( # placed first so it sits outermost: it intercepts inbound "always approve" responses and # outbound approval requests at the caller boundary, and its re-invocation loop re-runs any # user-supplied middleware. ToolApprovalMiddleware requires an AgentSession at run time. + # When should_continue is supplied, the loop is prepended ahead of tool approval so it sits + # outermost of all: each loop iteration is a full agent run (including tool approval), and the + # loop's approval escape hatch returns any pending approval request to the caller. assembled_middleware: list[MiddlewareTypes] = [] if not disable_tool_auto_approval: assembled_middleware.append(ToolApprovalMiddleware(auto_approval_rules=auto_approval_rules)) + if loop_should_continue is not None: + assembled_middleware.insert( + 0, + AgentLoopMiddleware( + loop_should_continue, + max_iterations=loop_max_iterations, + next_message=loop_next_message, + ), + ) if middleware: assembled_middleware.extend(middleware) diff --git a/python/packages/core/agent_framework/_harness/_loop.py b/python/packages/core/agent_framework/_harness/_loop.py index 05bb624d00..e61cdb4740 100644 --- a/python/packages/core/agent_framework/_harness/_loop.py +++ b/python/packages/core/agent_framework/_harness/_loop.py @@ -54,6 +54,7 @@ "JudgeVerdict", "background_tasks_running", "todos_remaining", + "todos_remaining_message", ] DEFAULT_NEXT_MESSAGE = "Continue working on the task. If it is complete, say so." @@ -420,6 +421,25 @@ async def process( else: await self._process_non_streaming(context, call_next, original_messages, snapshot) + @staticmethod + def _has_pending_approval_request(result: AgentResponse | None) -> bool: + """Return ``True`` if ``result`` carries a pending tool-approval request. + + When the loop sits outermost (e.g. around a tool-approval middleware), an iteration may + return a response that asks the caller to approve a tool call rather than a completed turn. + In that case the loop must stop and hand the response back so a human can approve, instead + of continuing or injecting the next message. This mirrors the C# ``LoopAgent`` escape hatch + (``HasPendingApprovalRequests``). A pending request is any content whose ``type`` is + ``"function_approval_request"``. + """ + if result is None: + return False + return any( + getattr(content, "type", None) == "function_approval_request" + for message in result.messages + for content in message.contents + ) + @staticmethod def _restore_session(session: Any, snapshot: dict[str, Any]) -> None: """Restore a session in place to a previously captured ``to_dict()`` snapshot. @@ -467,6 +487,11 @@ async def _process_non_streaming( if result.usage_details is not None: aggregated_usage = add_usage_details(aggregated_usage, result.usage_details) + # Escape hatch: if this iteration is asking for tool approval, stop and return the + # response so the caller can approve, instead of continuing or injecting next_message. + if self._has_pending_approval_request(result): + break + messages_used = context.messages loop_kwargs = self._build_loop_kwargs( context=context, @@ -555,6 +580,10 @@ async def _generator() -> Any: messages_used = context.messages final = holder["final"] + # Escape hatch: if this iteration is asking for tool approval, stop the loop and + # let the caller approve, instead of continuing or injecting next_message. + if self._has_pending_approval_request(final): + return loop_kwargs = self._build_loop_kwargs( context=context, iteration=iteration, @@ -748,25 +777,6 @@ async def _resolve_next_message( return list(next_msgs) -def todos_remaining(provider: Any) -> ShouldContinueCallable: - """Build a ``should_continue`` predicate that loops while a ``TodoProvider`` has open items. - - Args: - provider: A :class:`~agent_framework.TodoProvider` attached to the same session as the loop. - - Returns: - A predicate suitable for :class:`AgentLoopMiddleware`'s ``should_continue`` argument. - """ - - async def _should_continue(*, session: Any = None, **kwargs: Any) -> bool: - if session is None: - return False - items = await provider.store.load_items(session, source_id=provider.source_id) - return any(not item.is_complete for item in items) - - return _should_continue - - def background_tasks_running(provider: Any) -> ShouldContinueCallable: """Build a ``should_continue`` predicate that loops while a ``BackgroundAgentsProvider`` is busy. @@ -794,3 +804,115 @@ def _should_continue(*, session: Any = None, **kwargs: Any) -> bool: ) return _should_continue + + +def _resolve_context_provider(agent: Any, provider_type: type) -> Any: + """Return the first ``provider_type`` instance on ``agent.context_providers`` (or ``None``). + + The harness exposes its built-in context providers (``TodoProvider``, ``AgentModeProvider``, + ...) on ``agent.context_providers``, so loop callbacks can reuse the same instances that + :func:`~agent_framework.create_harness_agent` wired up instead of constructing their own. + """ + return next( + (provider for provider in getattr(agent, "context_providers", []) if isinstance(provider, provider_type)), + None, + ) + + +def todos_remaining(*, looping_modes: Sequence[str] | None = None) -> ShouldContinueCallable: + """Build a ``should_continue`` predicate that loops while the Agent's ``TodoProvider`` has open items. + + This resolves the :class:`~agent_framework.TodoProvider` from the running agent + (``agent.context_providers``) rather than taking it as an argument, so it can be used directly + with :func:`~agent_framework.create_harness_agent` (whose providers are built internally) as well + as with any agent that registers a ``TodoProvider`` via ``context_providers``. It is the Python + counterpart of the .NET ``TodoCompletionLoopEvaluator``. + + Args: + looping_modes: When provided, the loop only continues while the agent's current operating + mode (read from its :class:`~agent_framework.AgentModeProvider`) is one of these modes; + in any other mode the predicate returns ``False`` so the agent stays interactive. Mode + matching is case-insensitive. When ``None`` (default), the loop applies in every mode. An + empty sequence is rejected (there would be no mode in which the loop could ever run). + Restricting looping to certain modes is useful when, for example, the agent has a planning + and execution mode, and you only want to loop on the execution mode until all todos are + complete. Looping until completion in planning is usually undesirable since the agent is + still building the list of todos to complete. + + Returns: + A predicate suitable for :class:`AgentLoopMiddleware`'s ``should_continue`` argument (and for + ``create_harness_agent``'s ``loop_should_continue``). + + Raises: + ValueError: ``looping_modes`` is an empty sequence. + """ + if looping_modes is not None: + allowed_modes: set[str] | None = {mode.strip().lower() for mode in looping_modes} + if not allowed_modes: + raise ValueError("looping_modes must be None or a non-empty sequence of mode names.") + else: + allowed_modes = None + + async def _should_continue(*, session: Any = None, agent: Any = None, **kwargs: Any) -> bool: + from ._mode import AgentModeProvider, get_agent_mode + from ._todo import TodoProvider + + if session is None or agent is None: + return False + + if allowed_modes is not None: + mode_provider = _resolve_context_provider(agent, AgentModeProvider) + if mode_provider is not None: + current_mode = get_agent_mode( + session, + source_id=mode_provider.source_id, + default_mode=mode_provider.default_mode, + available_modes=mode_provider.available_modes, + ) + else: + current_mode = get_agent_mode(session) + if current_mode.strip().lower() not in allowed_modes: + return False + + todo_provider = _resolve_context_provider(agent, TodoProvider) + if todo_provider is None: + return False + items = await todo_provider.store.load_items(session, source_id=todo_provider.source_id) + return any(not item.is_complete for item in items) + + return _should_continue + + +async def todos_remaining_message(*, session: Any = None, agent: Any = None, **kwargs: Any) -> str | None: + """``next_message`` callable that reminds the agent which todos are still open. + + Designed to pair with :func:`todos_remaining` as a loop's ``next_message`` (e.g. + ``create_harness_agent``'s ``loop_next_message``): between iterations it resolves the harness + :class:`~agent_framework.TodoProvider` from the agent, lists the still-open todo items, and + instructs the agent to complete them all before finishing. + + Returns ``None`` when the session/agent/provider is unavailable or no todos are open. In that + case the loop's default ``next_message`` handling applies: with ``fresh_context=False`` (the + default, used by ``create_harness_agent``) it reuses the previous iteration's messages verbatim + (skipping progress injection); only with ``fresh_context=True`` does it fall back to + ``DEFAULT_NEXT_MESSAGE``. In normal looping a ``None`` here is rare, since "no open todos" also + makes :func:`todos_remaining` stop the loop before the next message is consulted. + """ + from ._todo import TodoProvider + + if session is None or agent is None: + return None + todo_provider = _resolve_context_provider(agent, TodoProvider) + if todo_provider is None: + return None + items = await todo_provider.store.load_items(session, source_id=todo_provider.source_id) + open_items = [item for item in items if not item.is_complete] + if not open_items: + return None + todo_lines = "\n".join(f"- {item.title}" for item in open_items) + return ( + f"You still have {len(open_items)} open todo item(s) that must be addressed before you can " + f"finish:\n{todo_lines}\n\n" + "Continue working through them now. Mark each todo complete as you finish it, and only stop " + "once every todo item is complete." + ) diff --git a/python/packages/core/tests/core/test_harness_agent.py b/python/packages/core/tests/core/test_harness_agent.py index 0737e14e05..8213df5888 100644 --- a/python/packages/core/tests/core/test_harness_agent.py +++ b/python/packages/core/tests/core/test_harness_agent.py @@ -835,3 +835,153 @@ def test_create_harness_agent_no_middleware_when_tool_approval_disabled_and_none disable_tool_auto_approval=True, ) assert agent.middleware is None + + +# --- Loop Wiring Tests --- + + +def _find_loop_middleware(agent: Any) -> Any: + from agent_framework import AgentLoopMiddleware + + for mw in agent.middleware or []: + if isinstance(mw, AgentLoopMiddleware): + return mw + return None + + +def test_create_harness_agent_no_loop_by_default() -> None: + """No loop middleware should be wired when loop_should_continue is not provided.""" + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + ) + assert _find_loop_middleware(agent) is None + + +def test_create_harness_agent_wires_loop_when_should_continue_given() -> None: + """Passing loop_should_continue should add an AgentLoopMiddleware as the outermost middleware.""" + from agent_framework import AgentLoopMiddleware + + def _should_continue(**kwargs: Any) -> bool: + return False + + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_should_continue=_should_continue, + ) + assert agent.middleware is not None + assert isinstance(agent.middleware[0], AgentLoopMiddleware) + assert agent.middleware[0].should_continue is _should_continue + + +def test_create_harness_agent_loop_outermost_of_tool_approval_and_user_middleware() -> None: + """The loop should sit outermost: loop, then tool approval, then user middleware.""" + from agent_framework import AgentLoopMiddleware, AgentMiddleware, ToolApprovalMiddleware + + class _CustomMiddleware(AgentMiddleware): + async def process(self, context: Any, call_next: Any) -> None: + await call_next() + + custom = _CustomMiddleware() + + def _should_continue(**kwargs: Any) -> bool: + return False + + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_should_continue=_should_continue, + middleware=[custom], + ) + assert agent.middleware is not None + assert isinstance(agent.middleware[0], AgentLoopMiddleware) + assert isinstance(agent.middleware[1], ToolApprovalMiddleware) + assert agent.middleware.index(custom) > agent.middleware.index(agent.middleware[1]) + + +def test_create_harness_agent_forwards_next_message_to_loop() -> None: + """loop_next_message should be forwarded to the loop middleware.""" + + def _should_continue(**kwargs: Any) -> bool: + return False + + def _next_message(**kwargs: Any) -> Any: + return "keep going" + + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_should_continue=_should_continue, + loop_next_message=_next_message, + ) + loop = _find_loop_middleware(agent) + assert loop is not None + assert loop.next_message is _next_message + + +def test_create_harness_agent_uses_default_max_iterations_when_omitted() -> None: + """When loop_max_iterations is omitted, the loop keeps the middleware's default cap.""" + from agent_framework._harness._loop import DEFAULT_MAX_ITERATIONS + + def _should_continue(**kwargs: Any) -> bool: + return False + + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_should_continue=_should_continue, + ) + loop = _find_loop_middleware(agent) + assert loop is not None + assert loop.max_iterations == DEFAULT_MAX_ITERATIONS + + +def test_create_harness_agent_forwards_max_iterations_to_loop() -> None: + """loop_max_iterations should be forwarded to the loop middleware, including None (unbounded).""" + + def _should_continue(**kwargs: Any) -> bool: + return False + + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_should_continue=_should_continue, + loop_max_iterations=3, + ) + loop = _find_loop_middleware(agent) + assert loop is not None + assert loop.max_iterations == 3 + + unbounded = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_should_continue=_should_continue, + loop_max_iterations=None, + ) + unbounded_loop = _find_loop_middleware(unbounded) + assert unbounded_loop is not None + assert unbounded_loop.max_iterations is None + + +def test_create_harness_agent_next_message_and_max_iterations_ignored_without_should_continue() -> None: + """Without loop_should_continue no loop is added, so loop params are simply ignored.""" + + def _next_message(**kwargs: Any) -> Any: + return "keep going" + + agent = create_harness_agent( + client=_FakeChatClient(), # type: ignore[arg-type] + max_context_window_tokens=128_000, + max_output_tokens=16_384, + loop_next_message=_next_message, + loop_max_iterations=5, + ) + assert _find_loop_middleware(agent) is None diff --git a/python/packages/core/tests/core/test_harness_loop.py b/python/packages/core/tests/core/test_harness_loop.py index f1154523cc..ba2c876928 100644 --- a/python/packages/core/tests/core/test_harness_loop.py +++ b/python/packages/core/tests/core/test_harness_loop.py @@ -12,6 +12,7 @@ Agent, AgentContext, AgentMiddleware, + AgentModeProvider, AgentResponse, AgentSession, BackgroundTaskInfo, @@ -28,7 +29,9 @@ TodoItem, TodoProvider, background_tasks_running, + set_agent_mode, todos_remaining, + todos_remaining_message, ) from agent_framework._harness._loop import ( DEFAULT_JUDGE_MAX_ITERATIONS, @@ -958,36 +961,6 @@ async def test_additional_instructions_injected_as_system_message() -> None: # region provider helpers -async def test_todos_remaining_helper_reflects_store_state() -> None: - provider = TodoProvider() - session = AgentSession() - predicate = todos_remaining(provider) - - # No items yet -> nothing to continue for. - assert await _resolve_should_continue_result(predicate(session=session)) is False - - await provider.store.save_state( - session, - [TodoItem(id=1, title="open item", is_complete=False)], - next_id=2, - source_id=provider.source_id, - ) - assert await _resolve_should_continue_result(predicate(session=session)) is True - - await provider.store.save_state( - session, - [TodoItem(id=1, title="open item", is_complete=True)], - next_id=2, - source_id=provider.source_id, - ) - assert await _resolve_should_continue_result(predicate(session=session)) is False - - -async def test_todos_remaining_helper_without_session() -> None: - predicate = todos_remaining(TodoProvider()) - assert await _resolve_should_continue_result(predicate(session=None)) is False - - def test_background_tasks_running_helper_reflects_state() -> None: from agent_framework import BackgroundAgentsProvider @@ -1039,6 +1012,128 @@ def run(self, *args: Any, **kwargs: Any) -> Any: ... assert predicate(session=None) is False +# region todos_remaining / todos_remaining_message helpers + + +class _FakeHarnessAgent: + """Minimal stand-in for a harness agent exposing built-in context providers.""" + + def __init__(self, *providers: Any) -> None: + self.context_providers = list(providers) + + +async def _save_todos(provider: TodoProvider, session: AgentSession, items: list[TodoItem]) -> None: + await provider.store.save_state( + session, + items, + next_id=len(items) + 1, + source_id=provider.source_id, + ) + + +async def test_todos_remaining_reflects_store_state() -> None: + provider = TodoProvider() + session = AgentSession() + agent = _FakeHarnessAgent(provider) + predicate = todos_remaining() + + # No items yet -> nothing to continue for. + assert await _resolve_should_continue_result(predicate(session=session, agent=agent)) is False + + await _save_todos(provider, session, [TodoItem(id=1, title="open", is_complete=False)]) + assert await _resolve_should_continue_result(predicate(session=session, agent=agent)) is True + + await _save_todos(provider, session, [TodoItem(id=1, title="open", is_complete=True)]) + assert await _resolve_should_continue_result(predicate(session=session, agent=agent)) is False + + +async def test_todos_remaining_requires_session_agent_and_provider() -> None: + provider = TodoProvider() + session = AgentSession() + await _save_todos(provider, session, [TodoItem(id=1, title="open", is_complete=False)]) + + predicate = todos_remaining() + # Missing session or agent -> False. + assert await _resolve_should_continue_result(predicate(session=None, agent=_FakeHarnessAgent(provider))) is False + assert await _resolve_should_continue_result(predicate(session=session, agent=None)) is False + # Agent without a TodoProvider -> False. + assert await _resolve_should_continue_result(predicate(session=session, agent=_FakeHarnessAgent())) is False + + +async def test_todos_remaining_mode_gating() -> None: + provider = TodoProvider() + mode_provider = AgentModeProvider() + session = AgentSession() + agent = _FakeHarnessAgent(provider, mode_provider) + await _save_todos(provider, session, [TodoItem(id=1, title="open", is_complete=False)]) + + predicate = todos_remaining(looping_modes=["execute"]) + + # Default mode is "plan" -> not in allowed modes -> False even with open todos. + assert await _resolve_should_continue_result(predicate(session=session, agent=agent)) is False + + set_agent_mode(session, "execute") + assert await _resolve_should_continue_result(predicate(session=session, agent=agent)) is True + + # Case-insensitive matching. + predicate_upper = todos_remaining(looping_modes=["EXECUTE"]) + assert await _resolve_should_continue_result(predicate_upper(session=session, agent=agent)) is True + + +async def test_todos_remaining_modes_none_ignores_mode() -> None: + provider = TodoProvider() + mode_provider = AgentModeProvider() + session = AgentSession() + agent = _FakeHarnessAgent(provider, mode_provider) + await _save_todos(provider, session, [TodoItem(id=1, title="open", is_complete=False)]) + + predicate = todos_remaining(looping_modes=None) + # "plan" mode still loops because no mode gating is applied. + assert await _resolve_should_continue_result(predicate(session=session, agent=agent)) is True + + +def test_todos_remaining_rejects_empty_modes() -> None: + with pytest.raises(ValueError): + todos_remaining(looping_modes=[]) + + +async def test_todos_remaining_message_lists_open_items() -> None: + provider = TodoProvider() + session = AgentSession() + agent = _FakeHarnessAgent(provider) + await _save_todos( + provider, + session, + [ + TodoItem(id=1, title="first", is_complete=False), + TodoItem(id=2, title="second", is_complete=True), + TodoItem(id=3, title="third", is_complete=False), + ], + ) + + message = await todos_remaining_message(session=session, agent=agent) + assert message is not None + assert "2 open todo item(s)" in message + assert "- first" in message + assert "- third" in message + assert "second" not in message + + +async def test_todos_remaining_message_returns_none_when_unavailable() -> None: + provider = TodoProvider() + session = AgentSession() + agent = _FakeHarnessAgent(provider) + + # No session / agent. + assert await todos_remaining_message(session=None, agent=agent) is None + assert await todos_remaining_message(session=session, agent=None) is None + # Agent without a TodoProvider. + assert await todos_remaining_message(session=session, agent=_FakeHarnessAgent()) is None + # All todos complete -> nothing to remind about. + await _save_todos(provider, session, [TodoItem(id=1, title="done", is_complete=True)]) + assert await todos_remaining_message(session=session, agent=agent) is None + + # region streaming behavior @@ -1136,3 +1231,95 @@ async def process(self, context: AgentContext, call_next: Any) -> None: assert terminator.calls == 2 assert "only" in final.text assert any("only" in (u.text or "") for u in updates) + + +# region approval escape hatch + + +def _approval_request_content() -> Content: + """Build a pending tool-approval request content (as a downstream approval middleware would).""" + return Content.from_function_approval_request( + id="call-1", + function_call=Content.from_function_call(call_id="call-1", name="write_file"), + ) + + +class _ApprovalChatClient(BaseChatClient[ChatOptions[None]]): + """A minimal client that returns a pending approval request on its first call, text thereafter.""" + + def __init__(self) -> None: + super().__init__() + self.call_count = 0 + + def _inner_get_response( + self, + *, + messages: Sequence[Message], + stream: bool = False, + options: Mapping[str, Any], + **kwargs: Any, + ) -> Awaitable[ChatResponse] | ResponseStream[ChatResponseUpdate, ChatResponse]: + first = self.call_count == 0 + contents: list[Any] = [_approval_request_content()] if first else [Content.from_text("done")] + if stream: + + async def _gen() -> AsyncIterable[ChatResponseUpdate]: + self.call_count += 1 + yield ChatResponseUpdate(contents=contents, role="assistant", finish_reason="stop") + + return ResponseStream(_gen(), finalizer=lambda updates: ChatResponse.from_updates(updates)) + + async def _get() -> ChatResponse: + self.call_count += 1 + return ChatResponse(messages=Message(role="assistant", contents=contents)) + + return _get() + + +def test_has_pending_approval_request_detects_request() -> None: + response = AgentResponse(messages=[Message(role="assistant", contents=[_approval_request_content()])]) + assert AgentLoopMiddleware._has_pending_approval_request(response) is True + + +def test_has_pending_approval_request_false_for_plain_response() -> None: + response = AgentResponse(messages=[Message(role="assistant", contents=[Content.from_text("hi")])]) + assert AgentLoopMiddleware._has_pending_approval_request(response) is False + assert AgentLoopMiddleware._has_pending_approval_request(None) is False + + +async def test_non_streaming_escape_hatch_stops_on_pending_approval() -> None: + client = _ApprovalChatClient() + calls: list[int] = [] + + def should_continue(*, iteration: int, **kwargs: Any) -> bool: + calls.append(iteration) + return True + + agent = Agent(client=client, middleware=[AgentLoopMiddleware(should_continue, max_iterations=5)]) + + response = await agent.run("write a file") + + # The loop stops after the first iteration because it carries a pending approval request, + # before should_continue is evaluated and without injecting next_message. + assert client.call_count == 1 + assert calls == [] + assert AgentLoopMiddleware._has_pending_approval_request(response) is True + + +async def test_streaming_escape_hatch_stops_on_pending_approval() -> None: + client = _ApprovalChatClient() + calls: list[int] = [] + + def should_continue(*, iteration: int, **kwargs: Any) -> bool: + calls.append(iteration) + return True + + agent = Agent(client=client, middleware=[AgentLoopMiddleware(should_continue, max_iterations=5)]) + + stream = agent.run("write a file", stream=True) + _ = [update async for update in stream] + final = await stream.get_final_response() + + assert client.call_count == 1 + assert calls == [] + assert AgentLoopMiddleware._has_pending_approval_request(final) is True diff --git a/python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py b/python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py index 02e67752ee..4fa79348f3 100644 --- a/python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py +++ b/python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py @@ -16,6 +16,7 @@ from collections.abc import Awaitable, Callable, Coroutine, Mapping, Sequence from dataclasses import dataclass from pathlib import Path +from tempfile import TemporaryDirectory from typing import Any, cast import pytest @@ -665,7 +666,7 @@ def test_parse_output_files_skips_symlink_to_host_file(tmp_path: Path, monkeypat contents = execute_code_module._parse_output_files( sandbox=object(), - output_dir=_OutputDirShim(output_root), + output_dir=cast("TemporaryDirectory[str]", _OutputDirShim(output_root)), expect_output_files=False, ) @@ -691,7 +692,7 @@ def test_parse_output_files_rejects_intermediate_dir_symlink_from_listing( contents = execute_code_module._parse_output_files( sandbox=_SandboxWithListing(["output/sub/leak.txt"]), - output_dir=_OutputDirShim(output_root), + output_dir=cast("TemporaryDirectory[str]", _OutputDirShim(output_root)), expect_output_files=False, ) @@ -720,7 +721,7 @@ def test_parse_output_files_collects_real_output_file(tmp_path: Path) -> None: contents = execute_code_module._parse_output_files( sandbox=object(), - output_dir=_OutputDirShim(output_root), + output_dir=cast("TemporaryDirectory[str]", _OutputDirShim(output_root)), expect_output_files=True, ) diff --git a/python/samples/02-agents/harness/README.md b/python/samples/02-agents/harness/README.md index 13fa2aa7f5..bd2af5b7ed 100644 --- a/python/samples/02-agents/harness/README.md +++ b/python/samples/02-agents/harness/README.md @@ -19,6 +19,7 @@ from a chat client. | SkillsProvider | File-based skill discovery and progressive loading | | Shell tool | Shell command execution + environment probing (when `shell_executor` provided) | | Tool approval | "Don't ask again" standing rules + heuristic auto-approval (enabled by default) | +| Looping | Re-invoke the agent until a `loop_should_continue` predicate is satisfied (when provided) | | OpenTelemetry | Built-in observability | Each feature can be disabled or customized via keyword arguments. @@ -27,7 +28,7 @@ Each feature can be disabled or customized via keyword arguments. | File | Description | |------|-------------| -| `harness_research.py` | Interactive research assistant with web search and planning workflow | +| `harness_research.py` | Interactive research assistant with web search, a plan/execute workflow, and an execute-mode loop that re-invokes the agent until every todo is complete | ## Running diff --git a/python/samples/02-agents/harness/harness_research.py b/python/samples/02-agents/harness/harness_research.py index 48e7fd7e0c..1979d914fe 100644 --- a/python/samples/02-agents/harness/harness_research.py +++ b/python/samples/02-agents/harness/harness_research.py @@ -33,6 +33,15 @@ using todos, switch between plan and execute modes, search the web for current information, and track its progress. +It also demonstrates harness **looping**: a ``loop_should_continue`` predicate +keeps re-invoking the agent automatically while it is in ``"execute"`` mode and +the ``TodoProvider`` still has open items, so the agent works through the whole +plan autonomously once execution begins. A ``loop_next_message`` callable injects +a reminder between iterations listing the todos that are still open, and the loop +is scoped to ``"execute"`` mode so ``"plan"`` mode stays interactive. +``loop_max_iterations`` caps the number of autonomous passes per turn as a safety +net. + Environment variables: FOUNDRY_PROJECT_ENDPOINT — Azure AI Foundry project endpoint URL FOUNDRY_MODEL — Model deployment name @@ -43,7 +52,11 @@ import asyncio -from agent_framework import create_harness_agent +from agent_framework import ( + create_harness_agent, + todos_remaining, + todos_remaining_message, +) from agent_framework.foundry import FoundryChatClient from azure.identity import AzureCliCredential from console import build_observers_with_planning, run_agent_async @@ -96,6 +109,13 @@ async def main() -> None: name="ResearchAgent", description="A research assistant that plans and executes research tasks.", agent_instructions=RESEARCH_INSTRUCTIONS, + # Enable harness looping: while the agent is in "execute" mode and still has open todos, + # keep re-invoking it automatically so it works through the whole plan without manual + # prompting. loop_next_message reminds the agent which todos are still open each pass, and + # loop_max_iterations caps the autonomous passes per turn as a safety net. + loop_should_continue=todos_remaining(looping_modes=["execute"]), + loop_next_message=todos_remaining_message, + loop_max_iterations=10, ) # Run the harness console with the research agent. diff --git a/python/samples/02-agents/middleware/agent_loop_middleware_report.py b/python/samples/02-agents/middleware/agent_loop_middleware_report.py index 0bec473b43..12ad6666e4 100644 --- a/python/samples/02-agents/middleware/agent_loop_middleware_report.py +++ b/python/samples/02-agents/middleware/agent_loop_middleware_report.py @@ -86,7 +86,7 @@ async def report_loop(client: FoundryChatClient, editor_client: FoundryChatClien # builds the ``should_continue`` predicate; ``max_iterations`` caps planning + one-todo-per-turn # drafting + the final assembly turn. todo_loop = AgentLoopMiddleware( - todos_remaining(todo_provider), + todos_remaining(), max_iterations=8, ) diff --git a/python/samples/02-agents/middleware/agent_loop_middleware_todos.py b/python/samples/02-agents/middleware/agent_loop_middleware_todos.py index 9bf291615c..d3a3236a38 100644 --- a/python/samples/02-agents/middleware/agent_loop_middleware_todos.py +++ b/python/samples/02-agents/middleware/agent_loop_middleware_todos.py @@ -41,7 +41,7 @@ async def todo_loop(client: FoundryChatClient) -> None: # 2. ``todos_remaining`` builds a ``should_continue`` predicate that returns True while any todo # item is still open. ``max_iterations`` guarantees the loop stops even if the agent stalls. loop = AgentLoopMiddleware( - should_continue=todos_remaining(todo_provider), + should_continue=todos_remaining(), max_iterations=6, ) diff --git a/python/uv.lock b/python/uv.lock index 910f4b253d..1f55c8d534 100644 --- a/python/uv.lock +++ b/python/uv.lock @@ -230,7 +230,7 @@ dependencies = [ [package.metadata] requires-dist = [ { name = "agent-framework-core", editable = "packages/core" }, - { name = "anthropic", specifier = ">=0.80.0,<0.107.2" }, + { name = "anthropic", specifier = ">=0.80.0,<0.80.1" }, ] [[package]] @@ -1108,7 +1108,7 @@ wheels = [ [[package]] name = "anthropic" -version = "0.107.1" +version = "0.80.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, @@ -1120,9 +1120,9 @@ dependencies = [ { name = "sniffio", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, { name = "typing-extensions", marker = "sys_platform == 'darwin' or sys_platform == 'linux' or sys_platform == 'win32'" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/b1/f1/c6076a92e0bf6b0dfa126e213b3f9e8a510acd73567953210713aae6c256/anthropic-0.107.1.tar.gz", hash = "sha256:8e7169a6ab57fb806b778d9af018c867bad688144efec8969cdb4c5ccecd6670", size = 856312, upload-time = "2026-06-07T17:18:57.358Z" } +sdist = { url = "https://files.pythonhosted.org/packages/7f/63/791e14ef5a8ecb485cef5b5d058c7ca3ad6c50a2f94cf4cea5231c6b7c16/anthropic-0.80.0.tar.gz", hash = "sha256:ef042586673fdcab2a6ffd381aa5f9a1bcce38ffe73c07fe70bd56d12b8124ba", size = 533291, upload-time = "2026-02-17T19:26:26.717Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/86/0e/71432f0777a263701955a23ebcc6650485c2753be9afbce2a6a8d72526e3/anthropic-0.107.1-py3-none-any.whl", hash = "sha256:b74338d08000ba105dfc8adae29af3713ece845a4bffec9986a20697e087c7b3", size = 838729, upload-time = "2026-06-07T17:18:58.729Z" }, + { url = "https://files.pythonhosted.org/packages/b2/4b/665f29338f51d0c2f9e04b276ea54cc1e957ae5c521a0ad868aa80abc608/anthropic-0.80.0-py3-none-any.whl", hash = "sha256:dad0e40ec371ee686e9ffb2e0cb461a0ed51447fa100927fb5d39b174c286d6f", size = 453667, upload-time = "2026-02-17T19:26:29.96Z" }, ] [[package]]