feat: add E2B cloud sandbox environment by JunYeopLee · Pull Request #792 · SWE-agent/mini-swe-agent

JunYeopLee · 2026-03-24T10:40:52Z

Summary

This PR adds E2BEnvironment, a new environment backend that executes commands inside E2B cloud sandboxes.

Key design decisions

Automatic template management
The first time a Docker image is used, E2BTemplateManager converts it into a persistent E2B template via Template.build(). Subsequent runs reuse the cached template, so the build cost is paid only once per unique image.
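A minimal sketch of the reuse logic, assuming the manager can ask whether a template with a given name already exists and build it otherwise. `template_exists` and `build_template` are hypothetical callables standing in for the real E2B SDK calls (the PR only names `Template.build()`). The `skip_cache` flag mirrors the config field of that name that a later commit in this PR describes as "force-rebuild even if it already exists".

```python
from typing import Callable


def get_or_build(
    image: str,
    template_name: str,
    template_exists: Callable[[str], bool],
    build_template: Callable[[str, str], None],
    skip_cache: bool = False,
) -> str:
    """Return template_name, building it from `image` only when needed.

    Hypothetical sketch: template_exists/build_template stand in for E2B SDK calls.
    """
    # skip_cache must be checked first; otherwise an existing template would
    # always short-circuit the call and a forced rebuild could never happen.
    if skip_cache or not template_exists(template_name):
        build_template(image, template_name)
    return template_name
```

Note that this sketch does not guard against two parallel workers both seeing a missing template and building it at the same time; whether the PR handles that case is not stated.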

Deterministic template naming
_image_to_template_name() maps Docker image names to stable, collision-resistant E2B template names using a sha256 8-character suffix. The result always stays within E2B's 63-character, alphanumeric-plus-hyphen limit.
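One way to meet those constraints, shown as a hypothetical sketch rather than the PR's actual `_image_to_template_name()`: sanitize the image reference into a readable prefix, truncate it to leave room for the suffix, and append the first 8 hex digits of the SHA-256 of the original, unmodified string. Lowercasing is an assumption here; the PR text only states the alphanumeric-plus-hyphen rule and the 63-character limit.

```python
import hashlib
import re

MAX_TEMPLATE_NAME_LEN = 63  # E2B limit stated in the PR
SUFFIX_LEN = 8


def image_to_template_name(image: str) -> str:
    """Hypothetical sketch: map a Docker image reference to an E2B-safe template name."""
    # Hash the full, unmodified reference so distinct images get distinct suffixes.
    suffix = hashlib.sha256(image.encode("utf-8")).hexdigest()[:SUFFIX_LEN]
    # Collapse every run of characters outside [a-z0-9] into one hyphen
    # (lowercasing is an assumption of this sketch).
    base = re.sub(r"[^a-z0-9]+", "-", image.lower()).strip("-")
    # Leave room for "-" plus the suffix, and don't let the prefix end on a hyphen.
    base = base[: MAX_TEMPLATE_NAME_LEN - SUFFIX_LEN - 1].rstrip("-")
    return f"{base}-{suffix}" if base else suffix
```

Hashing the unmodified string is what keeps the scheme collision-resistant: two images that sanitize or truncate to the same prefix still receive different suffixes, barring an unlikely 32-bit hash collision.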

Thread-safe build timeout
Template builds are wrapped in a ThreadPoolExecutor (rather than signal.alarm) so that the timeout works correctly when invoked from worker threads — e.g., during parallel SWE-bench evaluation runs.
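A minimal sketch of this pattern; `run_with_timeout` is a hypothetical name, not the PR's code. The relevant limitation is that `signal.signal()` can only install handlers from the main thread (it raises `ValueError` elsewhere), whereas `Future.result(timeout=...)` can be awaited from any thread.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_timeout(fn: Callable[[], T], timeout: float) -> T:
    """Run fn() in a helper thread and wait at most `timeout` seconds for it.

    Raises concurrent.futures.TimeoutError if fn() has not finished in time.
    Works from any thread, unlike signal-based timeouts.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout)
    finally:
        # Return immediately instead of blocking on a call that is still running.
        # The helper thread is NOT killed; it keeps running in the background.
        executor.shutdown(wait=False)


# Usage: the timeout still fires when called from a non-main worker thread,
# e.g. one of several parallel evaluation workers.
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as workers:
    job = workers.submit(run_with_timeout, lambda: time.sleep(0.5), 0.1)
    try:
        job.result()
    except concurrent.futures.TimeoutError:
        print("build timed out")
```

One trade-off of this approach: a timed-out call is abandoned, not cancelled. The helper thread runs to completion, and the interpreter waits for it at exit.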

SWE-bench integration
get_sb_environment() in swebench.py now injects the per-instance image for e2b the same way it already does for docker and swerex_modal.
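The PR text does not show `get_sb_environment()` itself, so the following is only a hypothetical sketch of the idea: every environment class that consumes a Docker image directly receives the per-instance SWE-bench image. The helper name `inject_instance_image` and the `image` config key are assumptions made for illustration.

```python
# Environment classes that take the per-instance Docker image as-is
# (per the PR: docker, swerex_modal, and now e2b).
IMAGE_ENV_CLASSES = {"docker", "swerex_modal", "e2b"}


def inject_instance_image(config: dict, image_name: str) -> dict:
    """Hypothetical helper: set environment.image for image-based backends."""
    env_config = config.setdefault("environment", {})
    if env_config.get("environment_class") in IMAGE_ENV_CLASSES:
        # "image" is an assumed config key for this sketch
        env_config["image"] = image_name
    return config
```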

Changes

| File | Change |
| --- | --- |
| `src/minisweagent/environments/extra/e2b.py` | New environment class (`E2BEnvironmentConfig`, `E2BTemplateManager`, `E2BEnvironment`) |
| `src/minisweagent/environments/__init__.py` | Register `"e2b"` key in the environment mapping |
| `src/minisweagent/run/benchmarks/swebench.py` | Inject Docker image for `e2b` environment class |
| `pyproject.toml` | Add `e2b` optional dependency (`e2b>=1.0.0`) |
| `tests/environments/extra/test_e2b.py` | 18 unit tests (all passing) |
| `docs/advanced/environments.md` | Add `e2b` entry to the environment list |
| `docs/reference/environments/e2b.md` | New reference page for `E2BEnvironment` |
| `README.md` | Mention E2B in the deployable environments list |

Usage

Install the extra and set the API key:

```bash
pip install "mini-swe-agent[e2b]"
export E2B_API_KEY="your-e2b-api-key"
```

Run SWE-bench evaluation via E2B:

```bash
mini-extra swebench \
    --subset verified \
    --split test \
    --workers 50 \
    --environment-class e2b
```

Or in a YAML config:

```yaml
environment:
  environment_class: e2b
  sandbox_timeout: 3600
  cpu_count: 2
  memory_mb: 2048
```

Test plan

- 18 unit tests covering E2BEnvironmentConfig, _image_to_template_name, execute() (dict/string action, non-zero exit, exception, Submitted detection), serialize() (structure, credential exclusion), and stop() (normal, missing sandbox, exception-tolerant)
- Full test suite (485 passed, 33 skipped) passes without regressions

Add `E2BEnvironment`, a new environment backend that runs commands inside [E2B](https://e2b.dev) cloud sandboxes. Unlike the Docker and Modal backends, it requires no local Docker daemon: the sandbox runs entirely in the cloud.

Key design decisions:

- **Automatic template management**: The first time a Docker image is used, `E2BTemplateManager` converts it into a persistent E2B template via `Template.build()`. Subsequent runs reuse the cached template, so the build cost is paid only once per unique image.
- **Deterministic template naming**: `_image_to_template_name()` produces a stable, collision-resistant name (sha256 8-char suffix) that stays within E2B's 63-character, alphanumeric-plus-hyphen limit.
- **Thread-safe build timeout**: Template builds run in a `ThreadPoolExecutor` (not `signal.alarm`) so that the timeout works correctly when called from worker threads (e.g., parallel SWE-bench runs).
- **SWE-bench integration**: `get_sb_environment()` in `swebench.py` now injects the instance image for `e2b` the same way it does for `docker` and `swerex_modal`.

Changes:

- `src/minisweagent/environments/extra/e2b.py`: new environment class
- `src/minisweagent/environments/__init__.py`: register `"e2b"` key
- `src/minisweagent/run/benchmarks/swebench.py`: inject image for e2b
- `pyproject.toml`: add `e2b` optional dependency (`e2b>=1.0.0`)
- `tests/environments/extra/test_e2b.py`: 18 unit tests (all passing)
- `docs/`: update environments reference and README

for more information, see https://pre-commit.ci

Add a module-level `_active_sandboxes` set and an `atexit` handler (`_cleanup_all_sandboxes`) that kills all live sandboxes when the interpreter exits. This ensures sandboxes are cleaned up on Ctrl+C or unhandled exceptions, where `__del__` may not be reliably called.

- `__init__` adds `self` to `_active_sandboxes` after sandbox creation
- `stop()` removes `self` from `_active_sandboxes` before calling `sandbox.kill()`
- the atexit handler iterates over a snapshot of the set to avoid mutation issues

Two additional tests cover the registry and cleanup behaviour.

E2B_ACCESS_TOKEN / access_token is not recognised by the E2B SDK. Remove the config field, all call-site usages (Template.build, Sandbox.create), the serialize exclusion, and the corresponding tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

get_or_build() was short-circuiting to the else branch without consulting skip_cache, making force-rebuild impossible despite the field documenting "force-rebuild even if it already exists". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

klieret · 2026-03-24T17:04:54Z

Great! Happy to include that. I'm gonna look through things in detail soon.
Have you done some real test runs to confirm everything works (on a few SWE-bench instances with swebench.py for example)?

JunYeopLee · 2026-03-24T18:41:49Z

Hello @klieret, nice to meet you.

I’ve run some test executions using the setup below:

```bash
source .env
# OPENAI_API_KEY=
# E2B_API_KEY=

uv run mini-extra swebench \
    --model openai/gpt-5-nano \
    --split test \
    --workers 4 --environment-class e2b --output ./results
```

As shown in the screenshot, the execution is working as expected. I’ve also attached the result file for reference.

Please let me know if there are any additional checks or scenarios you’d like me to validate.

scikit-learn__scikit-learn-12471.traj.json

SWE-bench Docker images have /testbed owned by root, but E2B sandboxes run commands as the unprivileged "user" account (UID 1000) by default, which causes permission-denied errors. Add user="root" to commands.run() to match Docker behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

klieret · 2026-03-26T16:04:51Z

Awesome, let me review in detail today!

klieret · 2026-03-26T16:06:55Z

(Also I sometimes announce new features/tag new contributors on twitter/linkedin, do you have an account that I should mention?)

JunYeopLee · 2026-03-26T21:43:51Z

@klieret Sounds great!

Here are my LinkedIn and X profiles:

https://www.linkedin.com/in/leejunyeop/
https://x.com/junyeoplee2

Thanks :)

JunYeopLee and others added 5 commits March 24, 2026 10:40

[pre-commit.ci] auto fixes from pre-commit.com hooks

db34ac7

for more information, see https://pre-commit.ci

JunYeopLee force-pushed the feat/e2b-environment branch from 9159fb4 to 32d1f91 on March 24, 2026 19:55

JunYeopLee wants to merge 6 commits into SWE-agent:main from JunYeopLee:feat/e2b-environment
