Fix LocalPythonExecutor timeout waiting for worker completion #2199
shaun0927 wants to merge 8 commits into huggingface:main from
Conversation
The current timeout wrapper raises ExecutionTimeoutError only after the underlying worker finishes, because executor shutdown waits for the thread when leaving the context manager. This change uses explicit shutdown behavior so callers regain control at the configured timeout, while keeping Python's documented limitation that timed-out worker threads may continue in the background until completion.

Constraint: Python threads cannot be forcefully terminated safely
Rejected: Process-based execution for LocalPythonExecutor | much broader behavioral change than a bugfix
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep timeout semantics prompt for callers even though background worker cleanup remains best-effort
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
Not-tested: Full test suite
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cda9f1e676
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
I checked the current CI state for this fork PR: both I also confirmed that
The prompt-return timeout fix exposed a follow-up race: once a timed-out worker keeps running in the background, it can still mutate shared execution dictionaries after the caller has moved on. That made later LocalPythonExecutor calls vulnerable to late variable/log overwrites and left reused evaluate_python_code state dicts open to the same top-level leakage.

This snapshots state/custom tool dictionaries for each execution and only commits them back after non-timeout completion. Timeouts still return promptly, but stale workers no longer write into the live dictionaries used by subsequent calls.

Constraint: Python threads still cannot be force-killed safely
Rejected: Revert prompt-return timeout behavior | restores the original availability bug
Rejected: Process-based isolation | too broad for this targeted bugfix
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep timeout fixes paired with reuse-after-timeout regressions because stale workers can outlive the caller
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
Tested: targeted local repro confirming timed-out workers no longer overwrite reused state/logs
Not-tested: Full pytest suite / CI workflows still require maintainer approval on fork PRs
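The commit-on-success pattern described above can be sketched in isolation. This is a hypothetical helper (the name `run_with_snapshot` and the exception used are stand-ins, not smolagents API): run against a deep copy, and only write the copy back when the call finishes cleanly.

```python
import copy


def run_with_snapshot(state, execute):
    """Hypothetical sketch: run `execute` against a deep copy of `state` and
    commit the copy back only when the call finishes without raising."""
    snapshot = copy.deepcopy(state)
    result = execute(snapshot)  # a timed-out worker keeps mutating `snapshot`
    state.clear()               # commit happens only on non-timeout completion
    state.update(snapshot)
    return result


state = {"logs": []}
try:
    # ZeroDivisionError stands in for ExecutionTimeoutError here
    run_with_snapshot(state, lambda s: (s["logs"].append("late write"), 1 / 0))
except ZeroDivisionError:
    pass
print(state)  # {'logs': []} -- the live dict never saw the stale write
```

On the happy path the snapshot is committed back, so callers still observe normal stateful behavior across calls.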
Follow-up pushed in

What changed:
Validation:
CI note: the fork PR workflows still appear to require maintainer approval before GitHub Actions can run.
💡 Codex Review
Reviewed commit: d9028cb818
The previous timeout follow-up only copied the top-level execution mapping, so timed-out workers could still mutate nested lists, dicts, or objects shared with later calls. That left the P1 race open for common in-place mutations like list append even after prompt-return semantics were fixed. This switches timed executions to deep-copied state/custom tool snapshots and adds regressions covering nested state reuse for both and .

Constraint: Python threads still cannot be force-killed safely
Rejected: Keep shallow copies | nested mutable values still leak after timeout
Rejected: Process isolation for local executor | much broader behavior change than this bugfix
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future timeout-isolation change must cover nested mutable state, not just top-level dict writes
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'timeout'
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k 'not test_can_import_sklearn_if_explicitly_authorized'
Tested: targeted nested-state timeout repro for evaluate_python_code and LocalPythonExecutor
Not-tested: tests/test_local_python_executor.py::test_can_import_sklearn_if_explicitly_authorized (local sklearn missing)
Not-tested: Full repo test suite / GitHub Actions still require maintainer approval on fork PRs
Fresh follow-up pushed in

What changed:
Validation:
Note: the only excluded test in this environment is the existing sklearn-specific import test because
💡 Codex Review
Reviewed commit: aa42324abc
```python
else:
    state = copy.deepcopy(state)
```
Stop deep-copying persisted state unconditionally
evaluate_python_code now deep-copies state on every call, which requires all previously stored values to be deepcopyable; that breaks common stateful flows where prior code imported modules, because evaluate_import stores module objects in state and copy.deepcopy on a ModuleType raises TypeError before execution starts. In practice, a successful call like import time can make the next call with the same LocalPythonExecutor/state fail immediately, which is a functional regression introduced by this change.
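The failure mode is reproducible with the standard library alone, no smolagents needed; storing any module object in the state dict makes an unconditional deepcopy fail up front:

```python
import copy
import time  # any module object stored in state will do

# what a successful `import time` leaves behind in the executor state
state = {"time": time}

try:
    copy.deepcopy(state)
except TypeError as exc:
    # module objects have no deepcopy support and cannot be pickled,
    # so the copy fails before any user code runs
    print(f"deepcopy failed: {exc}")
```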
Fresh follow-up pushed in

What changed:
Validation:
CI note: fork PR workflows may still require maintainer approval before Actions can run.
💡 Codex Review
Reviewed commit: cdc8139c3f
```python
if isinstance(value, dict):
    return {key: snapshot_execution_context(val) for key, val in value.items()}
if isinstance(value, list):
```
Preserve aliasing when cloning execution context
snapshot_execution_context rebuilds containers with fresh comprehensions, which drops shared-reference identity across keys/variables. If a prior run leaves state with aliases (e.g., b = a), the next evaluate_python_code call clones them into different objects, so mutating a no longer updates b and execution results become silently incorrect even without any timeout path. This is a behavior regression from the previous in-place state model and can corrupt multi-step executor sessions.
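The aliasing loss is easy to demonstrate with plain standard-library copies; `copy.deepcopy` avoids it because its memo dict maps each original object to exactly one clone:

```python
import copy

a = [1, 2]
state = {"a": a, "b": a}  # b aliases a, as after `b = a` in user code

# rebuilding containers with fresh comprehensions clones a and b separately
rebuilt = {key: list(value) for key, value in state.items()}
rebuilt["a"].append(3)
print(rebuilt["b"])  # [1, 2] -- mutating a no longer updates b

# deepcopy's memo keeps the shared reference intact
deep = copy.deepcopy(state)
assert deep["a"] is deep["b"]
deep["a"].append(3)
print(deep["b"])  # [1, 2, 3]
```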
```python
if isinstance(value, dict):
    return {key: snapshot_execution_context(val) for key, val in value.items()}
```
Handle cycles while snapshotting execution state
The new recursive snapshot logic has no memo/visited tracking, so cyclic inputs like s = {}; s['self'] = s recurse indefinitely and fail before user code runs. Previously, such states could be reused because no deep snapshot occurred; now they can trigger RecursionError on entry to evaluate_python_code, making valid Python object graphs unusable.
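For the cyclic case, `copy.deepcopy`'s memo argument is exactly the visited-tracking this comment asks for. A memo-less recursive clone (sketched here as a hypothetical `naive_snapshot`) blows the stack, while `deepcopy` preserves the self-reference:

```python
import copy

s = {}
s["self"] = s  # cyclic state such as `s = {}; s['self'] = s`


def naive_snapshot(value):
    # no memo/visited set: recursing into the cycle never terminates
    if isinstance(value, dict):
        return {key: naive_snapshot(val) for key, val in value.items()}
    return value


try:
    naive_snapshot(s)
except RecursionError:
    print("naive clone hit RecursionError")

# deepcopy records each visited object in its memo, so the cycle survives
c = copy.deepcopy(s)
assert c["self"] is c
```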
The timeout isolation follow-up fixed stale worker mutations, but the snapshot helper still broke shared-reference semantics and cyclic state graphs. This update keeps runtime objects like modules intact while preserving aliasing and cycles for reused executor state, so subsequent calls keep behaving like normal Python stateful sessions.

Constraint: Timed-out worker threads still cannot be forcefully terminated safely in-process
Rejected: Keep unconditional deepcopy semantics | breaks imported modules, aliases, and cyclic state reuse
Rejected: Drop per-call state snapshots | reintroduces stale-worker mutation races after timeout
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future state snapshot change must preserve aliasing/cycles as well as timeout isolation
Tested: ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles"
Not-tested: Full pytest suite / GitHub Actions still require maintainer approval on fork PRs
Fresh follow-up pushed in

What changed:
Validation:
The only remaining blocker I can see from my side is still maintainer approval for the fork PR workflows so CI can run on the latest commits.
💡 Codex Review
Reviewed commit: f5dddf4604
```python
except Exception:
    copied_value = value
```
Avoid sharing uncopyable context objects after timeout
When copy.deepcopy fails, snapshot_execution_context silently falls back to copied_value = value, which keeps the original object shared between the caller and the timed worker thread. This reintroduces late-mutation races for common non-deepcopyable mutables (for example queue.Queue, which fails deepcopy because of its internal lock): after ExecutionTimeoutError returns, the background thread can still mutate caller-visible state/custom tools and corrupt later executions. Because this change was intended to isolate timed-out workers, silently aliasing here undermines the fix.
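`queue.Queue` is a concrete standard-library example of a non-deepcopyable mutable: its internal lock cannot be pickled, so `deepcopy` raises before copying anything, and a silent fallback would leave the very same queue object shared with the timed-out worker:

```python
import copy
import queue

q = queue.Queue()
q.put("pending work")

try:
    copy.deepcopy(q)
except TypeError as exc:
    # the queue's internal threading primitives cannot be pickled/deep-copied
    print(f"deepcopy failed: {exc}")
```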
Addressed in 774565e.
What changed:
- snapshot queue.Queue-style state objects into independent queue instances instead of letting them fall back to shared references
- when timeouts are disabled, skip snapshotting entirely so ordinary reused state keeps the old in-place behavior
- if timeout isolation still cannot safely snapshot a remaining state/custom tool object after deepcopy + shallow-copy fallback, raise a clear error instead of silently aliasing it
- add regressions for reused queue state after timeout and for rejecting truly unsnapshotable timeout state objects

Validation:
- ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
- ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
- PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects"
- result: 23 passed
The latest review pointed out that silently aliasing state when deepcopy fails still lets timed-out workers mutate caller-visible objects. This change snapshots queue.Queue-based objects explicitly, falls back to shallow copy before giving up, and raises a clear error when timeout isolation cannot safely snapshot a remaining state object. It also avoids snapshot overhead when timeouts are disabled and locks the new behavior with queue and uncopyable-object regressions.

Constraint: Timed-out worker threads still run in the background, so shared state objects must not leak back into later executions
Rejected: Keep aliasing deepcopy failures | still lets timed-out workers mutate caller-visible queue state after timeout
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Any future timeout snapshot helper must preserve aliasing/cycles for supported objects and fail loudly instead of silently sharing unsupported mutable state
Tested: ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py; ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py; PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects"
Not-tested: Full test suite; GitHub Actions on fork PR (still maintainer approval gated)
💡 Codex Review
Reviewed commit: 774565e4c8
```python
if isinstance(value, dict):
    copied_dict = {}
```
Preserve mapping subclasses when cloning execution context
The snapshot logic converts every dict subclass into a plain {} and then evaluate_python_code commits that cloned state back on successful timed runs, so subclass behavior is silently lost even when no timeout occurs. A common case is collections.defaultdict: after one execution with default timeout settings, missing-key access starts raising KeyError because the stored object has been downgraded to dict, which is a functional regression from the previous in-place state behavior.
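The defaultdict downgrade can be shown directly: rebuilding into a plain `{}` drops the subclass and its `default_factory`, while `copy.deepcopy` reconstructs the subclass through its `__reduce__` protocol:

```python
import copy
from collections import defaultdict

counts = defaultdict(int)
counts["a"] += 1

# rebuilding with a plain dict comprehension silently drops the subclass
plain = {key: value for key, value in counts.items()}
assert type(plain) is dict  # missing-key access now raises KeyError

# deepcopy preserves both the subclass and its default_factory
kept = copy.deepcopy(counts)
assert isinstance(kept, defaultdict)
assert kept["b"] == 0  # missing keys still default to 0
```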
```python
with value.mutex:
    items = list(value.queue)
for item in items:
    copied_queue.put_nowait(snapshot_execution_context(item, memo))
```
Preserve Queue task counters when snapshotting state
The queue.Queue snapshot only copies buffered items and reconstructs a new queue via put_nowait, but it does not preserve bookkeeping like unfinished_tasks. When this cloned queue is committed back after a timed execution, join()/task_done() semantics can become incorrect (for example, outstanding work can be dropped and join() may return early), which can corrupt queue-based workflows that persist queue objects in executor state.
The latest open PR huggingface#2199 review pointed out a real regression: the timeout-state snapshot helper special-cased plain dict construction in a way that downgraded dict subclasses like collections.defaultdict into regular dicts on the next successful timed execution. This preserves dict subclass identity by snapshotting plain dicts and dict subclasses separately, and adds focused regressions for reused defaultdict state through both evaluate_python_code and LocalPythonExecutor.

Constraint: Keep timeout isolation intact without regressing normal reusable Python state semantics
Rejected: Widen this follow-up to queue unfinished-task bookkeeping changes | lower-priority behavioral surface beyond the current P1 mapping-subclass regression
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future snapshot helper changes must preserve container subtype behavior in addition to aliasing/cycle isolation
Tested: ruff check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: ruff format --check src/smolagents/local_python_executor.py tests/test_local_python_executor.py
Tested: PYTHONPATH=src pytest -q tests/test_local_python_executor.py -k "timeout or imported_modules_after_timeout_fix or preserves_aliasing or preserves_cycles or queue_objects or uncopyable_state_objects or defaultdict"
Not-tested: Full pytest suite; GitHub Actions still require maintainer approval on fork PRs
Related: PR huggingface#2199
Fresh follow-up pushed in

What changed:
Validation:
I did not widen this follow-up to the lower-priority queue bookkeeping comment yet. That one touches
💡 Codex Review
Reviewed commit: 4e7e534145
```python
if isinstance(value, (ModuleType, FunctionType, BuiltinFunctionType, MethodType, type)):
    memo[obj_id] = value
    return value
```
Preserve bound-method aliasing in state snapshots
snapshot_execution_context currently returns BuiltinFunctionType/MethodType objects as-is, even when their owner objects are being cloned in the same snapshot. With timeout-enabled executions (the default), a state like {"items": [...], "append_fn": items.append} is cloned into a new items list but keeps append_fn bound to the old list, so append_fn(...) mutates stale data while items appears unchanged. This silently breaks normal Python aliasing semantics across calls and can produce incorrect results without any timeout actually firing.
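The split-alias hazard can be sketched with a small hypothetical `Bag` class. Passing a bound method through unchanged while cloning its owner leaves the method targeting the stale object; `copy.deepcopy` avoids this for Python-level bound methods by rebinding them through its memo (note that builtin bound methods like `list.append` are treated as atomic by `deepcopy` and really are returned as-is):

```python
import copy


class Bag:
    def __init__(self):
        self.items = [1]

    def add(self, x):
        self.items.append(x)


bag = Bag()
state = {"bag": bag, "add_fn": bag.add}

# cloning the owner but passing the bound method through splits the alias
cloned = {"bag": copy.deepcopy(state["bag"]), "add_fn": state["add_fn"]}
cloned["add_fn"](2)         # mutates the *original* bag
print(cloned["bag"].items)  # [1] -- the clone looks untouched
print(bag.items)            # [1, 2] -- the stale object was mutated instead

# deepcopy of the whole mapping rebinds the method to the copied owner
deep = copy.deepcopy(state)
assert deep["add_fn"].__self__ is deep["bag"]
```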
VANDRANKI left a comment
The executor shutdown fix is correct and addresses a real usability problem. The old with ThreadPoolExecutor(...) as executor block calls executor.shutdown(wait=True) on exit. On a timeout, this means the ExecutionTimeoutError is raised but the caller blocks waiting for the timed-out thread to finish - defeating the purpose of the timeout. Switching to executor.shutdown(wait=not timed_out, cancel_futures=timed_out) returns control immediately on timeout. The docstring correctly notes that the worker thread may still run to completion in the background - Python cannot forcibly kill threads.
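A minimal standalone sketch of the shutdown pattern described above (the helper name and exception are stand-ins for smolagents' internals, not its actual API):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout


class ExecutionTimeoutError(Exception):
    """Stand-in for smolagents' timeout exception."""


def run_with_timeout(func, timeout_seconds, *args, **kwargs):
    executor = ThreadPoolExecutor(max_workers=1)
    timed_out = False
    try:
        future = executor.submit(func, *args, **kwargs)
        try:
            return future.result(timeout=timeout_seconds)
        except FuturesTimeout:
            timed_out = True
            raise ExecutionTimeoutError(
                f"Code execution exceeded the maximum execution time of {timeout_seconds} seconds"
            ) from None
    finally:
        # normal completion: wait for the worker; timeout: return immediately,
        # cancel queued work, and let the running thread finish in the background
        executor.shutdown(wait=not timed_out, cancel_futures=timed_out)
```

With this shape, a call like `run_with_timeout(time.sleep, 0.2, 1.0)` raises near the 0.2s mark instead of blocking for the full second (`cancel_futures` requires Python 3.9+).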
snapshot_execution_context: the function is well-designed for its stated purpose - it handles the standard copy hazards (cycles via memo, modules/functions passed by reference, Queue items copied under mutex). The MethodType addition to the reference-passthrough list is correct - bound methods should not be deep-copied.
Where is snapshot_execution_context called? I can see the function definition in the diff but not its call sites. If it is not yet called from evaluate_python_code or the executor main loop, the snapshot machinery is unreachable dead code. Could you point to where it gets invoked?
Interaction with PR #2187: PR #2187 adds ctx = contextvars.copy_context() and ctx.run(func, ...) to the same timeout function. Both PRs are valid independently, but they modify the same lines and will conflict. The merged solution should include both the cancel_futures shutdown fix from this PR and the copy_context() propagation from #2187.
The executor shutdown fix is LGTM. The snapshot utility needs call-site evidence before I can assess its impact.
@VANDRANKI thanks for the review. After dropping the snapshot machinery the call-site question no longer applies. On the #2187 overlap: both PRs only touch the

Keeping state isolation for timed-out workers out of scope here. The docstring already notes the worker thread may keep running in the background, so reusing state across a timeout is a caller concern and not something we can solve inside a pure-Python threading model.
Pushed 9914fdb to reduce scope back to the original shutdown fix. The snapshot_execution_context helper and the added state-handling tests are gone, so the earlier call-site question is moot. What remains is the timeout() decorator change plus two prompt-return regression tests (evaluate_python_code and LocalPythonExecutor). Local check: PYTHONPATH=src python3 -m pytest tests/test_local_python_executor.py -k timeout -q -> 12 passed.
💡 Codex Review
Reviewed commit: 9914fdbf86
```python
            f"Code execution exceeded the maximum execution time of {timeout_seconds} seconds"
        )
    finally:
        executor.shutdown(wait=not timed_out, cancel_futures=timed_out)
```
Snapshot custom tools before allowing timed worker to continue
Switching to executor.shutdown(wait=not timed_out, ...) allows a timed-out worker to keep running, but evaluate_python_code still executes against the caller’s custom_tools mapping; for LocalPythonExecutor.__call__, that is the shared self.custom_tools reused across calls. If timed-out code reaches a late def/class statement (for example after sleep), evaluate_function_def/evaluate_class_def can still mutate that shared tool context after ExecutionTimeoutError has been raised, making subsequent executions observe nondeterministic tool availability.
The reduced scope is the right call. Dropping the snapshot machinery makes the PR much easier to reason about and review. What remains is clean:
On the state-isolation concern that Codex also flagged: your reasoning is sound. A pure-Python threading model cannot forcibly terminate the background thread, so snapshotting state before the call and restoring it after a timeout would need to happen at the caller level anyway. The docstring note that the worker may keep running is the right place to document this. LGTM from my side.
Fixes #2197
Summary
Make LocalPythonExecutor timeouts return control promptly without letting timed-out workers mutate later executions' live state, including nested mutable values.

Problem / Root Cause
PR #1910 introduced a thread-based timeout mechanism for local Python execution. The original implementation wrapped future.result(timeout=...) in a with ThreadPoolExecutor(...) block. When the timeout is exceeded, future.result() raises as expected, but exiting the with block shuts the executor down in a way that waits for the worker thread to finish. In practice, callers do not regain control near the configured timeout boundary.

Minimal repro before this patch:
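A standalone equivalent using only the standard library (the pre-patch smolagents wrapper used the same `with ThreadPoolExecutor(...)` plus `future.result(timeout=...)` shape):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

start = time.monotonic()
try:
    # pre-patch shape: the `with` block's __exit__ calls shutdown(wait=True)
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(time.sleep, 3)
        future.result(timeout=1)  # raises after ~1s ...
except FuturesTimeout:
    pass
elapsed = time.monotonic() - start  # ... but ~3s elapse before control returns
print(f"elapsed: {elapsed:.2f}s")
```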
Observed before patch: elapsed is about 3.00s, not about 1.00s.

After switching to prompt-return shutdown semantics, there are two stale-worker races to account for:
state (for example lists/dicts/objects) could still be mutated in place after the timeout if only a shallow copy was used.

In particular, a reused LocalPythonExecutor could otherwise see late variable/log overwrites or nested list/dict mutations from the stale worker after a later call has already completed.

This PR does not treat LocalPythonExecutor as a security boundary. It fixes timeout semantics / caller behavior and isolates the live per-call execution dictionaries from stale timed-out workers as far as Python in-process threading allows.

Solution
Switch from context-manager-based executor lifetime to explicit shutdown behavior:

- wait=True on normal completion,
- wait=False and cancel_futures=True after a timeout.

Then isolate execution context per call by deep-copying state/custom_tools for timed executions and only committing them back after non-timeout completion. That keeps prompt timeout behavior while preventing stale timed-out workers from mutating the dictionaries and nested mutable values reused by later executions.

Regression tests
Added tests that verify:

- evaluate_python_code(..., timeout_seconds=1)
- LocalPythonExecutor(..., timeout_seconds=1)
- evaluate_python_code does not overwrite a reused state dict later
- LocalPythonExecutor does not overwrite later variables/logs after a subsequent successful call
- evaluate_python_code does not mutate nested reused state values later
- LocalPythonExecutor does not mutate nested reused state values later

The evaluate_python_code timing assertion runs in a fresh interpreter subprocess to avoid interference from background timed-out worker threads created by other tests.

Validation
Passed locally. The only skipped full-file coverage in this environment is the existing sklearn-specific test because sklearn is not installed locally.