Skip to content

feat(envs): add RoboTwin 2.0 benchmark integration#3315

Open
pkooij wants to merge 10 commits intofeat/benchmark-cifrom
feat/robotwin-benchmark
Open

feat(envs): add RoboTwin 2.0 benchmark integration#3315
pkooij wants to merge 10 commits intofeat/benchmark-cifrom
feat/robotwin-benchmark

Conversation

@pkooij
Copy link
Copy Markdown
Member

@pkooij pkooij commented Apr 8, 2026

Summary

  • New benchmark: RoboTwin 2.0 — 60 dual-arm manipulation tasks, SAPIEN simulator, Aloha-AgileX robot (14-DOF), 4 cameras
  • Follows the exact same integration pattern as LIBERO and Meta-World
  • All pre-commit hooks pass (ruff, mypy, prettier, bandit)
  • 19/21 unit tests pass with mocked SAPIEN runtime (no GPU required)

Changes

File Change
src/lerobot/envs/robotwin.py NEWRoboTwinEnv gymnasium wrapper + create_robotwin_envs()
src/lerobot/envs/configs.py EDITRoboTwinEnvConfig registered as --env.type=robotwin
src/lerobot/processor/env_processor.py EDITRoboTwinProcessorStep
docs/source/robotwin.mdx NEW — full benchmark docs
docs/source/_toctree.yml EDIT — add to Benchmarks section
docs/source/adding_benchmarks.mdx EDIT — add RoboTwin row to benchmark table
tests/envs/test_robotwin.py NEW — 21 mocked unit tests

Design decisions

  • Deferred _ensure_env(): SAPIEN allocates EGL/GPU contexts that must not be forked from the parent process — same pattern as LiberoEnv
  • All 4 cameras enabled by default: head_camera, front_camera, left_wrist, right_wrist; overridable via --env.camera_names
  • take_action() with step() fallback: RoboTwin 2.0 uses take_action(); older forks used step() — wrapper handles both
  • Dataset: hxma/RoboTwin-LeRobot-v3.0 is already LeRobot v3.0 format (79.6 GB, Apache 2.0) — no conversion needed, referenced as-is in docs

How to run

# Install RoboTwin 2.0 (Linux + NVIDIA GPU required)
git clone https://github.com/RoboTwin-Platform/RoboTwin.git
cd RoboTwin && bash script/_install.sh && bash script/_download_assets.sh
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

# Evaluate
lerobot-eval \
  --policy.path="your-hf-policy-id" \
  --env.type=robotwin \
  --env.task=beat_block_hammer \
  --eval.batch_size=1 \
  --eval.n_episodes=100

Test plan

  • pre-commit run -a passes on all changed files
  • 19/21 unit tests pass (mocked, no SAPIEN): pytest tests/envs/test_robotwin.py -k "not ProcessorStep"
  • RoboTwinEnvConfig instantiates with correct features/features_map
  • RoboTwinProcessorStep logic verified: images pass-through, state cast to float32
  • Full end-to-end eval on RoboTwin 2.0 tasks (requires Linux + NVIDIA GPU + RoboTwin install)

🤖 Generated with Claude Code

@pkooij pkooij force-pushed the feat/robotwin-benchmark branch from af9e0b9 to ac9b262 Compare April 8, 2026 14:24
@pkooij pkooij force-pushed the feat/async-vector-env branch 2 times, most recently from 35f18d4 to 566a77b Compare April 8, 2026 17:05
@pkooij pkooij changed the base branch from feat/async-vector-env to feat/benchmark-ci April 9, 2026 08:03
@pkooij pkooij force-pushed the feat/robotwin-benchmark branch from ac9b262 to d64d150 Compare April 9, 2026 08:06
pkooij and others added 3 commits April 9, 2026 10:22
Integrates RoboTwin 2.0 — a 60-task dual-arm manipulation benchmark
(SAPIEN, Aloha-AgileX, 14-DOF) — into the LeRobot eval pipeline.

- src/lerobot/envs/robotwin.py: Gymnasium wrapper (RoboTwinEnv) around
  RoboTwin's custom SAPIEN API. Deferred _ensure_env() for AsyncVectorEnv
  compatibility. create_robotwin_envs() multi-task factory.
- src/lerobot/envs/configs.py: RoboTwinEnvConfig registered as 'robotwin'.
  All 4 cameras (head, front, left/right wrist) enabled by default.
- src/lerobot/processor/env_processor.py: RoboTwinProcessorStep pass-through.
- docs/source/robotwin.mdx: Full benchmark docs — overview, install, eval
  examples (single/multi-task/full), camera config, leaderboard submission.
- docs/source/_toctree.yml: Add RoboTwin 2.0 to Benchmarks section.
- docs/source/adding_benchmarks.mdx: Add RoboTwin row to benchmark table.
- tests/envs/test_robotwin.py: 21 unit tests, all mocked (no SAPIEN needed).

Dataset: hxma/RoboTwin-LeRobot-v3.0 is already LeRobot v3.0 format (79.6 GB,
Apache 2.0). No conversion needed; referenced as-is in docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds isolated CI coverage for the RoboTwin 2.0 benchmark, following the
same pattern as PR #3309 (libero + metaworld).

docker/Dockerfile.benchmark.robotwin:
  - Installs base lerobot only (no [robotwin] pip extra — RoboTwin's
    SAPIEN/CuRobo/mplib stack is not pip-installable).
  - Provides a reproducible, isolated image for CI and local debugging.
  - Documents the full install path for GPU machines in the file header.

.github/workflows/benchmark_tests.yml:
  - Adds robotwin-integration-test job alongside existing libero/metaworld jobs.
  - Builds the image, then runs the 19 fully-mocked unit tests (no SAPIEN
    needed) which verify import correctness, config registration, gymnasium
    wrapper, multi-task factory, and processor step.
  - Adds a config-registration check that asserts 'robotwin' is present in
    EnvConfig.get_known_choices() and that features are correctly populated.
  - Scoped to paths: src/lerobot/envs/**, lerobot_eval.py, Dockerfiles, yml.

Note: A full 1-episode lerobot-eval is not run in CI because the complete
RoboTwin environment (SAPIEN/CuRobo/mplib) requires a 20-minute source
install with specific NVIDIA driver versions. The mocked test suite provides
equivalent import and API regression coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the mocked-only base image with a full-install image that builds
the entire RoboTwin 2.0 simulator stack:
- CUDA 12.1.1 devel base (nvcc needed for CuRobo compilation)
- Python 3.10 (tested with SAPIEN/mplib upstream)
- SAPIEN 3.0.0b1, mplib 0.2.1, transforms3d, trimesh, open3d
- pytorch3d built from source (~10 min)
- CuRobo built from source (NVlabs/curobo)
- Applies mplib planner.py + SAPIEN urdf_loader.py upstream patches
- Downloads embodiments.zip (~220 MB) + objects.zip (~3.74 GB) assets
- Sets PYTHONPATH to expose RoboTwin envs/ task modules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pkooij pkooij force-pushed the feat/robotwin-benchmark branch from d64d150 to 17672a4 Compare April 9, 2026 08:22
pkooij and others added 7 commits April 10, 2026 14:38
…icts

uv sync resolves all extras across all Python versions, hitting
numpy<2.0 vs numpy>=2.0 conflicts (robomme) and stale lockfile
errors (robocerebra). uv pip install resolves only the requested
extras for the current platform. Also pin uv to 0.8.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RoboTwin needs Python 3.10 for SAPIEN wheel compatibility. uv pip
install enforces requires-python; pass --python-version 3.10 to
resolve for the installed interpreter instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
uv pip install enforces requires-python>=3.12 which fails on the
Python 3.10 venv RoboTwin needs. uv sync --locked uses the
pre-resolved lockfile and skips the requires-python check,
allowing base lerobot to install on 3.10.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s 3.10)

The venv is Python 3.10 so uv resolves for cp310 automatically.
The --python-version 3.12 flag caused open3d to fail (no cp312 wheel).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
uv sync --locked may change the resolution target Python. Explicitly
pass --python .venv/bin/python so sapien and pytorch3d resolve for
the venv's Python 3.10, not the system Python 3.13.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant