Skip to content

Fix LocalPythonExecutor timeout waiting for worker completion#2199

Open
shaun0927 wants to merge 8 commits intohuggingface:mainfrom
shaun0927:fix/local-timeout-prompt-return
Open

Fix LocalPythonExecutor timeout waiting for worker completion#2199
shaun0927 wants to merge 8 commits intohuggingface:mainfrom
shaun0927:fix/local-timeout-prompt-return

Conversation

@shaun0927
Copy link
Copy Markdown

@shaun0927 shaun0927 commented Apr 16, 2026

Fixes #2197

Summary

Make LocalPythonExecutor timeouts return control promptly without letting timed-out workers mutate later executions' live state, including nested mutable values.

Problem / Root Cause

PR #1910 introduced a thread-based timeout mechanism for local Python execution. The original implementation wrapped future.result(timeout=...) in a with ThreadPoolExecutor(...) block.

When the timeout is exceeded, future.result() raises as expected, but exiting the with block shuts the executor down in a way that waits for the worker thread to finish. In practice, callers do not regain control near the configured timeout boundary.

Minimal repro before this patch:

import time
from smolagents.local_python_executor import evaluate_python_code, ExecutionTimeoutError

code = """
import time
time.sleep(3)
result = 42
"""

start = time.monotonic()
try:
    evaluate_python_code(code, authorized_imports=["time"], timeout_seconds=1)
except ExecutionTimeoutError:
    print(f"elapsed={time.monotonic() - start:.2f}s")

Observed before patch: elapsed is about 3.00s, not about 1.00s.

After switching to prompt-return shutdown semantics, there are two stale-worker races to account for:

  1. top-level execution dict updates could arrive late from the timed-out worker,
  2. nested mutable values already present in state (for example lists/dicts/objects) could still be mutated in place after the timeout if only a shallow copy was used.

In particular, a reused LocalPythonExecutor could otherwise see late variable/log overwrites or nested list/dict mutations from the stale worker after a later call has already completed.

This PR does not treat LocalPythonExecutor as a security boundary. It fixes timeout semantics / caller behavior and isolates the live per-call execution dictionaries from stale timed-out workers as far as Python in-process threading allows.

Solution

Switch from context-manager-based executor lifetime to explicit shutdown behavior:

  • keep wait=True on normal completion,
  • use wait=False and cancel_futures=True after a timeout,
  • preserve the documented Python limitation that a running worker thread may continue in the background until completion.

Then isolate execution context per call by deep-copying state / custom_tools for timed executions and only committing them back after non-timeout completion. That keeps prompt timeout behavior while preventing stale timed-out workers from mutating the dictionaries and nested mutable values reused by later executions.

Regression tests

Added tests that verify:

  • prompt return semantics for evaluate_python_code(..., timeout_seconds=1)
  • prompt return semantics for LocalPythonExecutor(..., timeout_seconds=1)
  • timed-out evaluate_python_code does not overwrite a reused state dict later
  • timed-out LocalPythonExecutor does not overwrite later variables/logs after a subsequent successful call
  • timed-out evaluate_python_code does not mutate nested reused state values later
  • timed-out LocalPythonExecutor does not mutate nested reused state values later

The evaluate_python_code timing assertion runs in a fresh interpreter subprocess to avoid interference from background timed-out worker threads created by other tests.

Validation

PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'not test_can_import_sklearn_if_explicitly_authorized'

Passed locally. The only skipped full-file coverage in this environment is the existing sklearn-specific test because sklearn is not installed locally.

The current timeout wrapper raises ExecutionTimeoutError only after the
underlying worker finishes because executor shutdown waits for the thread
when leaving the context manager. This change uses explicit shutdown
behavior so callers regain control at the configured timeout while keeping
Python's documented limitation that timed-out worker threads may continue
in the background until completion.

Constraint: Python threads cannot be forcefully terminated safely
Rejected: Process-based execution for LocalPythonExecutor | much broader behavioral change than a bugfix
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep timeout semantics prompt for callers even though background worker cleanup remains best-effort
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
Not-tested: Full test suite
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cda9f1e676

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py
@shaun0927
Copy link
Copy Markdown
Author

I checked the current CI state for this fork PR: both Python tests and Quality Check are in action_required with the message This workflow is awaiting approval from a maintainer in #2199.

I also confirmed that maintainerCanModify is enabled on this PR, so there should not be any extra contributor-side setting needed from my side. Once a maintainer approves the workflow runs, the normal checks should be able to execute.

The prompt-return timeout fix exposed a follow-up race: once a timed-out\nworker keeps running in the background, it can still mutate shared\nexecution dictionaries after the caller has moved on. That made later\nLocalPythonExecutor calls vulnerable to late variable/log overwrites and\nleft reused evaluate_python_code state dicts open to the same top-level\nleakage.\n\nThis snapshots state/custom tool dictionaries for each execution and only\ncommits them back after non-timeout completion. Timeouts still return\npromptly, but stale workers no longer write into the live dictionaries\nused by subsequent calls.\n\nConstraint: Python threads still cannot be force-killed safely\nRejected: Revert prompt-return timeout behavior | restores the original availability bug\nRejected: Process-based isolation | too broad for this targeted bugfix\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: Keep timeout fixes paired with reuse-after-timeout regressions because stale workers can outlive the caller\nTested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'\nTested: targeted local repro confirming timed-out workers no longer overwrite reused state/logs\nNot-tested: Full pytest suite / CI workflows still require maintainer approval on fork PRs
@shaun0927
Copy link
Copy Markdown
Author

Follow-up pushed in d9028cb to address the stale-worker state race from the review thread.

What changed:

  • isolate state / custom_tools per execution
  • only commit them back after non-timeout completion
  • add regression tests for reused evaluate_python_code state and reused LocalPythonExecutor state/logs after timeout

Validation:

  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'

CI note: the fork PR workflows still appear to require maintainer approval before GitHub Actions can run.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d9028cb818

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py Outdated
The previous timeout follow-up only copied the top-level execution
mapping, so timed-out workers could still mutate nested lists, dicts, or
objects shared with later calls. That left the P1 race open for common
in-place mutations like list append even after prompt-return semantics
were fixed.

This switches timed executions to deep-copied state/custom tool
snapshots and adds regressions covering nested state reuse for both
 and .

Constraint: Python threads still cannot be force-killed safely
Rejected: Keep shallow copies | nested mutable values still leak after timeout
Rejected: Process isolation for local executor | much broader behavior change than this bugfix
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future timeout-isolation change must cover nested mutable state, not just top-level dict writes
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'not test_can_import_sklearn_if_explicitly_authorized'
Tested: targeted nested-state timeout repro for evaluate_python_code and LocalPythonExecutor
Not-tested: tests/test_local_python_executor.py::test_can_import_sklearn_if_explicitly_authorized (local sklearn missing)
Not-tested: Full repo test suite / GitHub Actions still require maintainer approval on fork PRs
@shaun0927
Copy link
Copy Markdown
Author

Fresh follow-up pushed in aa42324 for the new nested-mutable-state review finding.

What changed:

  • deep-copy timed execution state / custom_tools
  • keep commit-back only on non-timeout completion
  • add regressions for nested mutable state reuse after timeout

Validation:

  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'not test_can_import_sklearn_if_explicitly_authorized'

Note: the only excluded test in this environment is the existing sklearn-specific import test because sklearn is not installed locally. GitHub Actions on the fork PR may still require maintainer approval before CI can run.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa42324abc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +1632 to +1633
else:
state = copy.deepcopy(state)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop deep-copying persisted state unconditionally

evaluate_python_code now deep-copies state on every call, which requires all previously stored values to be deepcopyable; that breaks common stateful flows where prior code imported modules, because evaluate_import stores module objects in state and copy.deepcopy on a ModuleType raises TypeError before execution starts. In practice, a successful call like import time can make the next call with the same LocalPythonExecutor/state fail immediately, which is a functional regression introduced by this change.

Useful? React with 👍 / 👎.

@shaun0927
Copy link
Copy Markdown
Author

Fresh follow-up pushed in cdc8139 for the non-deepcopyable module regression from the latest review thread.

What changed:

  • replace unconditional copy.deepcopy(...) on state / custom_tools with a snapshot helper that still recursively copies mutable containers but preserves runtime objects like modules/functions/classes
  • keep the timeout isolation behavior for nested mutable state, without breaking reused executor state after imports like import time
  • add regressions for reused evaluate_python_code(..., state=...) and reused LocalPythonExecutor(...) state when imported modules are present

Validation:

  • ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix"

CI note: fork PR workflows may still require maintainer approval before Actions can run.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdc8139c3f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +333 to +335
if isinstance(value, dict):
return {key: snapshot_execution_context(val) for key, val in value.items()}
if isinstance(value, list):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve aliasing when cloning execution context

snapshot_execution_context rebuilds containers with fresh comprehensions, which drops shared-reference identity across keys/variables. If a prior run leaves state with aliases (e.g., b = a), the next evaluate_python_code call clones them into different objects, so mutating a no longer updates b and execution results become silently incorrect even without any timeout path. This is a behavior regression from the previous in-place state model and can corrupt multi-step executor sessions.

Useful? React with 👍 / 👎.

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +333 to +334
if isinstance(value, dict):
return {key: snapshot_execution_context(val) for key, val in value.items()}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle cycles while snapshotting execution state

The new recursive snapshot logic has no memo/visited tracking, so cyclic inputs like s = {}; s['self'] = s recurse indefinitely and fail before user code runs. Previously, such states could be reused because no deep snapshot occurred; now they can trigger RecursionError on entry to evaluate_python_code, making valid Python object graphs unusable.

Useful? React with 👍 / 👎.

The timeout isolation follow-up fixed stale worker mutations, but the
snapshot helper still broke shared-reference semantics and cyclic state
graphs. This update keeps runtime objects like modules intact while
preserving aliasing and cycles for reused executor state so subsequent
calls keep behaving like normal Python stateful sessions.

Constraint: Timed-out worker threads still cannot be forcefully terminated safely in-process
Rejected: Keep unconditional deepcopy semantics | breaks imported modules, aliases, and cyclic state reuse
Rejected: Drop per-call state snapshots | reintroduces stale-worker mutation races after timeout
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future state snapshot change must preserve aliasing/cycles as well as timeout isolation
Tested: ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles"
Not-tested: Full pytest suite / GitHub Actions still require maintainer approval on fork PRs
@shaun0927
Copy link
Copy Markdown
Author

Fresh follow-up pushed in f5dddf4 for the latest snapshotting review findings.

What changed:

  • preserve aliasing when snapshotting timed execution state / custom_tools
  • handle cyclic container graphs without blowing up on recursive state reuse
  • still preserve runtime objects like imported modules/functions/classes instead of trying to deepcopy them
  • add regressions for reused aliased state and reused cyclic state

Validation:

  • ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles"
  • result: 20 passed

The only remaining blocker I can see from my side is still maintainer approval for the fork PR workflows so CI can run on the latest commits.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5dddf4604

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +369 to +370
except Exception:
copied_value = value
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid sharing uncopyable context objects after timeout

When copy.deepcopy fails, snapshot_execution_context silently falls back to copied_value = value, which keeps the original object shared between the caller and the timed worker thread. This reintroduces late-mutation races for common non-deepcopyable mutables (for example queue.Queue, which fails deepcopy because of its internal lock): after ExecutionTimeoutError returns, the background thread can still mutate caller-visible state/custom tools and corrupt later executions. Because this change was intended to isolate timed-out workers, silently aliasing here undermines the fix.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 774565e.

What changed:

  • snapshot queue.Queue-style state objects into independent queue instances instead of letting them fall back to shared references
  • when timeouts are disabled, skip snapshotting entirely so ordinary reused state keeps the old in-place behavior
  • if timeout isolation still cannot safely snapshot a remaining state/custom tool object after deepcopy + shallow-copy fallback, raise a clear error instead of silently aliasing it
  • add regressions for reused queue state after timeout and for rejecting truly unsnapshotable timeout state objects

Validation:

  • ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects"
  • result: 23 passed

The latest review pointed out that silently aliasing state when deepcopy fails still lets timed-out workers mutate caller-visible objects. This change snapshots queue.Queue-based objects explicitly, falls back to shallow copy before giving up, and raises a clear error when timeout isolation cannot safely snapshot a remaining state object. It also avoids snapshot overhead when timeouts are disabled and locks the new behavior with queue and uncopyable-object regressions.\n\nConstraint: Timeouted worker threads still run in the background, so shared state objects must not leak back into later executions\nRejected: Keep aliasing deepcopy failures | still lets timed-out workers mutate caller-visible queue state after timeout\nConfidence: medium\nScope-risk: narrow\nReversibility: clean\nDirective: Any future timeout snapshot helper must preserve aliasing/cycles for supported objects and fail loudly instead of silently sharing unsupported mutable state\nTested: ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py; ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py; PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects"\nNot-tested: Full test suite; GitHub Actions on fork PR (still maintainer approval gated)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 774565e4c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +352 to +353
if isinstance(value, dict):
copied_dict = {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve mapping subclasses when cloning execution context

The snapshot logic converts every dict subclass into a plain {} and then evaluate_python_code commits that cloned state back on successful timed runs, so subclass behavior is silently lost even when no timeout occurs. A common case is collections.defaultdict: after one execution with default timeout settings, missing-key access starts raising KeyError because the stored object has been downgraded to dict, which is a functional regression from the previous in-place state behavior.

Useful? React with 👍 / 👎.

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +346 to +349
with value.mutex:
items = list(value.queue)
for item in items:
copied_queue.put_nowait(snapshot_execution_context(item, memo))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve Queue task counters when snapshotting state

The queue.Queue snapshot only copies buffered items and reconstructs a new queue via put_nowait, but it does not preserve bookkeeping like unfinished_tasks. When this cloned queue is committed back after a timed execution, join()/task_done() semantics can become incorrect (for example, outstanding work can be dropped and join() may return early), which can corrupt queue-based workflows that persist queue objects in executor state.

Useful? React with 👍 / 👎.

The latest open PR huggingface#2199 review pointed out a real regression: the timeout-state
snapshot helper special-cased plain dict construction in a way that downgraded
dict subclasses like collections.defaultdict into regular dicts on the next
successful timed execution. This preserves dict subclass identity by snapshotting
plain dicts and dict subclasses separately, and adds focused regressions for
reused defaultdict state through both evaluate_python_code and
LocalPythonExecutor.

Constraint: Keep timeout isolation intact without regressing normal reusable Python state semantics
Rejected: Widen this follow-up to queue unfinished-task bookkeeping changes | lower-priority behavioral surface beyond the current P1 mapping-subclass regression
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future snapshot helper changes must preserve container subtype behavior in addition to aliasing/cycle isolation
Tested: ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects or defaultdict"
Not-tested: Full pytest suite; GitHub Actions still require maintainer approval on fork PRs
Related: PR huggingface#2199
@shaun0927
Copy link
Copy Markdown
Author

Fresh follow-up pushed in 4e7e534 for the latest defaultdict / mapping-subclass review finding.

What changed:

  • preserve plain dict and dict-subclass snapshotting separately so timed execution snapshots no longer downgrade collections.defaultdict into a plain dict
  • add focused regressions for reused evaluate_python_code(..., state=...) and reused LocalPythonExecutor(...) state when defaultdict behavior must survive across timed runs

Validation:

  • ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
  • PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects or defaultdict"
  • result: 25 passed

I did not widen this follow-up to the lower-priority queue bookkeeping comment yet. That one touches Queue.join() / unfinished_tasks semantics and deserves its own validation pass instead of being folded into the current P1 mapping-subclass fix.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e7e534145

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/smolagents/local_python_executor.py Outdated
Comment on lines +339 to +341
if isinstance(value, (ModuleType, FunctionType, BuiltinFunctionType, MethodType, type)):
memo[obj_id] = value
return value
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve bound-method aliasing in state snapshots

snapshot_execution_context currently returns BuiltinFunctionType/MethodType objects as-is, even when their owner objects are being cloned in the same snapshot. With timeout-enabled executions (the default), a state like {"items": [...], "append_fn": items.append} is cloned into a new items list but keeps append_fn bound to the old list, so append_fn(...) mutates stale data while items appears unchanged. This silently breaks normal Python aliasing semantics across calls and can produce incorrect results without any timeout actually firing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@VANDRANKI VANDRANKI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The executor shutdown fix is correct and addresses a real usability problem. The old with ThreadPoolExecutor(...) as executor block calls executor.shutdown(wait=True) on exit. On a timeout, this means the ExecutionTimeoutError is raised but the caller blocks waiting for the timed-out thread to finish - defeating the purpose of the timeout. Switching to executor.shutdown(wait=not timed_out, cancel_futures=timed_out) returns control immediately on timeout. The docstring correctly notes that the worker thread may still run to completion in the background - Python cannot forcibly kill threads.

snapshot_execution_context: the function is well-designed for its stated purpose - it handles the standard copy hazards (cycles via memo, modules/functions passed by reference, Queue items copied under mutex). The MethodType addition to the reference-passthrough list is correct - bound methods should not be deep-copied.

Where is snapshot_execution_context called? I can see the function definition in the diff but not its call sites. If it is not yet called from evaluate_python_code or the executor main loop, the snapshot machinery is unreachable dead code. Could you point to where it gets invoked?

Interaction with PR #2187: PR #2187 adds ctx = contextvars.copy_context() and ctx.run(func, ...) to the same timeout function. Both PRs are valid independently, but they modify the same lines and will conflict. The merged solution should include both the cancel_futures shutdown fix from this PR and the copy_context() propagation from #2187.

The executor shutdown fix is LGTM. The snapshot utility needs call-site evidence before I can assess its impact.

@shaun0927
Copy link
Copy Markdown
Author

@VANDRANKI thanks for the review. After dropping the snapshot machinery the call-site question no longer applies.

On the #2187 overlap: both PRs only touch the timeout() wrapper, so whichever lands first is a ~5 line rebase on the other, no deeper conflict.

Keeping state isolation for timed-out workers out of scope here. The docstring already notes the worker thread may keep running in the background, so reusing state across a timeout is a caller concern and not something we can solve inside a pure-Python threading model.

@shaun0927
Copy link
Copy Markdown
Author

Pushed 9914fdb to reduce scope back to the original shutdown fix. The snapshot_execution_context helper and the added state-handling tests are gone, so the earlier call-site question is moot.

What remains is the timeout() decorator change plus two prompt-return regression tests (evaluate_python_code and LocalPythonExecutor).

Local check: PYTHONPATH=src python3 -m pytest tests/test_local_python_executor.py -k timeout -q -> 12 passed.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9914fdbf86

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

f"Code execution exceeded the maximum execution time of {timeout_seconds} seconds"
)
finally:
executor.shutdown(wait=not timed_out, cancel_futures=timed_out)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Snapshot custom tools before allowing timed worker to continue

Switching to executor.shutdown(wait=not timed_out, ...) allows a timed-out worker to keep running, but evaluate_python_code still executes against the caller’s custom_tools mapping; for LocalPythonExecutor.__call__, that is the shared self.custom_tools reused across calls. If timed-out code reaches a late def/class statement (for example after sleep), evaluate_function_def/evaluate_class_def can still mutate that shared tool context after ExecutionTimeoutError has been raised, making subsequent executions observe nondeterministic tool availability.

Useful? React with 👍 / 👎.

@VANDRANKI
Copy link
Copy Markdown

The reduced scope is the right call. Dropping the snapshot machinery makes the PR much easier to reason about and review.

What remains is clean:

  • executor.shutdown(wait=not timed_out, cancel_futures=timed_out) correctly avoids the deadlock that with ThreadPoolExecutor caused when the future timed out but the context manager blocked waiting for the thread to finish
  • The two regression tests (subprocess-isolated evaluate_python_code + LocalPythonExecutor) directly verify the promptness guarantee

On the state-isolation concern that Codex also flagged: your reasoning is sound. A pure-Python threading model cannot forcibly terminate the background thread, so snapshotting state before the call and restoring it after a timeout would need to happen at the caller level anyway. The docstring note that the worker may keep running is the right place to document this.

LGTM from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: LocalPythonExecutor timeout waits for worker completion before returning

2 participants