feat(envs): add RoboTwin 2.0 benchmark integration by pkooij · Pull Request #3315 · huggingface/lerobot

pkooij · 2026-04-08T08:51:25Z

Summary

New benchmark: RoboTwin 2.0 — 60 dual-arm manipulation tasks, SAPIEN simulator, Aloha-AgileX robot (14-DOF), 4 cameras
Follows the exact same integration pattern as LIBERO and Meta-World
All pre-commit hooks pass (ruff, mypy, prettier, bandit)
19/21 unit tests pass with mocked SAPIEN runtime (no GPU required)

Changes

File	Change
`src/lerobot/envs/robotwin.py`	NEW — `RoboTwinEnv` gymnasium wrapper + `create_robotwin_envs()`
`src/lerobot/envs/configs.py`	EDIT — `RoboTwinEnvConfig` registered as `--env.type=robotwin`
`src/lerobot/processor/env_processor.py`	EDIT — `RoboTwinProcessorStep`
`docs/source/robotwin.mdx`	NEW — full benchmark docs
`docs/source/_toctree.yml`	EDIT — add to Benchmarks section
`docs/source/adding_benchmarks.mdx`	EDIT — add RoboTwin row to benchmark table
`tests/envs/test_robotwin.py`	NEW — 21 mocked unit tests

Design decisions

Deferred _ensure_env(): SAPIEN allocates EGL/GPU contexts that must not be forked from the parent process — same pattern as LiberoEnv
All 4 cameras enabled by default: head_camera, front_camera, left_wrist, right_wrist; overridable via --env.camera_names
take_action() with step() fallback: RoboTwin 2.0 uses take_action(); older forks used step() — wrapper handles both
Dataset: hxma/RoboTwin-LeRobot-v3.0 is already LeRobot v3.0 format (79.6 GB, Apache 2.0) — no conversion needed, referenced as-is in docs

How to run

# Install RoboTwin 2.0 (Linux + NVIDIA GPU required)
git clone https://github.com/RoboTwin-Platform/RoboTwin.git
cd RoboTwin && bash script/_install.sh && bash script/_download_assets.sh
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

# Evaluate
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=beat_block_hammer \
  --eval.batch_size=1 \
  --eval.n_episodes=100

Test plan

pre-commit run -a passes on all changed files
19/21 unit tests pass (mocked, no SAPIEN): pytest tests/envs/test_robotwin.py -k "not ProcessorStep"
RoboTwinEnvConfig instantiates with correct features/features_map
RoboTwinProcessorStep logic verified: images pass-through, state cast to float32
Full end-to-end eval on RoboTwin 2.0 tasks (requires Linux + NVIDIA GPU + RoboTwin install)

🤖 Generated with Claude Code

Integrates RoboTwin 2.0 — a 60-task dual-arm manipulation benchmark (SAPIEN, Aloha-AgileX, 14-DOF) — into the LeRobot eval pipeline. - src/lerobot/envs/robotwin.py: Gymnasium wrapper (RoboTwinEnv) around RoboTwin's custom SAPIEN API. Deferred _ensure_env() for AsyncVectorEnv compatibility. create_robotwin_envs() multi-task factory. - src/lerobot/envs/configs.py: RoboTwinEnvConfig registered as 'robotwin'. All 4 cameras (head, front, left/right wrist) enabled by default. - src/lerobot/processor/env_processor.py: RoboTwinProcessorStep pass-through. - docs/source/robotwin.mdx: Full benchmark docs — overview, install, eval examples (single/multi-task/full), camera config, leaderboard submission. - docs/source/_toctree.yml: Add RoboTwin 2.0 to Benchmarks section. - docs/source/adding_benchmarks.mdx: Add RoboTwin row to benchmark table. - tests/envs/test_robotwin.py: 21 unit tests, all mocked (no SAPIEN needed). Dataset: hxma/RoboTwin-LeRobot-v3.0 is already LeRobot v3.0 format (79.6 GB, Apache 2.0). No conversion needed; referenced as-is in docs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds isolated CI coverage for the RoboTwin 2.0 benchmark, following the same pattern as PR #3309 (libero + metaworld). docker/Dockerfile.benchmark.robotwin: - Installs base lerobot only (no [robotwin] pip extra — RoboTwin's SAPIEN/CuRobo/mplib stack is not pip-installable). - Provides a reproducible, isolated image for CI and local debugging. - Documents the full install path for GPU machines in the file header. .github/workflows/benchmark_tests.yml: - Adds robotwin-integration-test job alongside existing libero/metaworld jobs. - Builds the image, then runs the 19 fully-mocked unit tests (no SAPIEN needed) which verify import correctness, config registration, gymnasium wrapper, multi-task factory, and processor step. - Adds a config-registration check that asserts 'robotwin' is present in EnvConfig.get_known_choices() and that features are correctly populated. - Scoped to paths: src/lerobot/envs/**, lerobot_eval.py, Dockerfiles, yml. Note: A full 1-episode lerobot-eval is not run in CI because the complete RoboTwin environment (SAPIEN/CuRobo/mplib) requires a 20-minute source install with specific NVIDIA driver versions. The mocked test suite provides equivalent import and API regression coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the mocked-only base image with a full-install image that builds the entire RoboTwin 2.0 simulator stack: - CUDA 12.1.1 devel base (nvcc needed for CuRobo compilation) - Python 3.10 (tested with SAPIEN/mplib upstream) - SAPIEN 3.0.0b1, mplib 0.2.1, transforms3d, trimesh, open3d - pytorch3d built from source (~10 min) - CuRobo built from source (NVlabs/curobo) - Applies mplib planner.py + SAPIEN urdf_loader.py upstream patches - Downloads embodiments.zip (~220 MB) + objects.zip (~3.74 GB) assets - Sets PYTHONPATH to expose RoboTwin envs/ task modules Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…icts uv sync resolves all extras across all Python versions, hitting numpy<2.0 vs numpy>=2.0 conflicts (robomme) and stale lockfile errors (robocerebra). uv pip install resolves only the requested extras for the current platform. Also pin uv to 0.8.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

RoboTwin needs Python 3.10 for SAPIEN wheel compatibility. uv pip install enforces requires-python; pass --python-version 3.10 to resolve for the installed interpreter instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

uv pip install enforces requires-python>=3.12 which fails on the Python 3.10 venv RoboTwin needs. uv sync --locked uses the pre-resolved lockfile and skips the requires-python check, allowing base lerobot to install on 3.10. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…s 3.10) The venv is Python 3.10 so uv resolves for cp310 automatically. The --python-version 3.12 flag caused open3d to fail (no cp312 wheel). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

uv sync --locked may change the resolution target Python. Explicitly pass --python .venv/bin/python so sapien and pytorch3d resolve for the venv's Python 3.10, not the system Python 3.13. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pkooij force-pushed the feat/robotwin-benchmark branch from af9e0b9 to ac9b262 Compare April 8, 2026 14:24

pkooij force-pushed the feat/async-vector-env branch 2 times, most recently from 35f18d4 to 566a77b Compare April 8, 2026 17:05

pkooij changed the base branch from feat/async-vector-env to feat/benchmark-ci April 9, 2026 08:03

pkooij force-pushed the feat/robotwin-benchmark branch from ac9b262 to d64d150 Compare April 9, 2026 08:06

pkooij and others added 3 commits April 9, 2026 10:22

pkooij force-pushed the feat/robotwin-benchmark branch from d64d150 to 17672a4 Compare April 9, 2026 08:22

pkooij and others added 7 commits April 10, 2026 14:38

ci: trigger benchmark CI after workflow update

a36a0f1

ci: retrigger after uv sync → uv pip install fix

05a280f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(envs): add RoboTwin 2.0 benchmark integration#3315

feat(envs): add RoboTwin 2.0 benchmark integration#3315
pkooij wants to merge 10 commits intofeat/benchmark-cifrom
feat/robotwin-benchmark

pkooij commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pkooij commented Apr 8, 2026

Summary

Changes

Design decisions

How to run

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant