Merge main into develop by alexmillane · Pull Request #615 · isaac-sim/IsaacLab-Arena

alexmillane · 2026-04-17T14:40:59Z

Summary

Merge main into develop

## Summary Explain the resume flag

…572) ## Summary This is a doc change requested by QA (https://nvbugspro.nvidia.com/bug/6063011). It clarifies that a Newton-trained model evaluated with PhysX is expected to fail the DexSuite task completely.

## Summary

Subprocess-spawning tests hang indefinitely on CI.

## Causes & Fixes

### Problems From Lab

1. Lab reports "AppLauncher doesnt quit properly after app.close(), app.quit() doesn't help either."
2. Cold startup times for tests using Isaac Sim (IS) can be upwards of 10 min on Lab CI machines.

The issues above apply to us: tests hang in the subprocess test section, between the end of one test and the beginning of the next. See detailed logs and analysis from reproducing locally [here](#568).

### Fixes

1. `SimulationApp` force exit: skips `app.close()` (which can hang indefinitely in Kit's shutdown path) when the env var `ISAACLAB_ARENA_FORCE_EXIT_ON_COMPLETE=1` is set. Calls a new `_kill_child_processes()` helper that walks `/proc` to `SIGKILL` all direct children before calling `os._exit(0)`, preventing orphaned Kit processes from holding GPU resources.
2. `run_subprocess` has a configurable wall-clock timeout and process isolation, so that it can trigger the force-exit path above when needed.
3. Add wall-clock timing and logging inside the `SimulationApp` start method to keep track of how long startup takes on CI.

## Minor fixes

1. Add timing stats to the pytest commands so that they report the slowest test functions at the end of each section.
2. Parametrize multi-config tests: convert nested for-loops in `test_zero_action_policy_kitchen_pick_and_place` (6 configs) and `test_zero_action_policy_gr1_open_microwave` (3 configs) into `@pytest.mark.parametrize`. Each config gets its own timeout, pass/fail, and timing.
3. Reduce `num_envs` in the gr00t eval_runner test to speed it up.

### Local validation

With the repro script #568, I do not see local stalling. See the log for more details: [repro_20260410_041313.log](https://github.com/user-attachments/files/26620524/repro_20260410_041313.log)

### CI Before -- timeout

<img width="1219" height="170" alt="image" src="https://github.com/user-attachments/assets/2f9eabb2-403d-4257-bd84-4da508de7d00" />

### CI After

<img width="1219" height="170" alt="image" src="https://github.com/user-attachments/assets/dbaf2a7d-e3a4-4ad2-85a4-389eae962c1d" />
<img width="1198" height="472" alt="image" src="https://github.com/user-attachments/assets/8a24f1aa-4bcb-4030-b075-09f3885673c2" />

## TODOs

- `test_camera_observations` takes 10 min to start the app due to Kit cold start. Experimenting with a warm start before the tests process here: #565
- Kit itself intermittently deadlocks during startup, not because of orphans, but because Kit's internal thread synchronization fails on low-CPU runners. Experimenting with retry here: #570
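The wall-clock timeout with process isolation described in the fixes could look roughly like the sketch below. It is not the repository's `run_subprocess()`: the helper name `run_with_timeout` and its signature are hypothetical, and it assumes a POSIX system where the child can be placed in its own session and process group.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in its own session and kill its process group on timeout.

    Hypothetical helper for illustration only (POSIX-only).
    Returns the exit code, or re-raises subprocess.TimeoutExpired.
    """
    # start_new_session=True makes the child the leader of a new process
    # group, so its group id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # Signal the whole group so grandchildren are killed as well.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so it does not linger as a zombie
        raise


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('done')"], 30))
```

Killing the process group rather than only the direct child is what keeps a hung grandchild from holding resources after the test runner has moved on.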

## Summary Install missing arena package into NGC docker. ## Detailed description - We forgot to install our new package `isaaclab_arena_examples` into the docker image. - This was masked in CI due to mounting a branch and correctly installing there. Co-authored-by: Xinjie Yao <[email protected]>

## Summary - Fix the `eval_config.json` example in the DexSuite Kuka Allegro Lift evaluation docs to match the actual `eval_runner.py` schema (`jobs` array with `name`, `arena_env_args`, `policy_type`, `policy_config_dict`). Signed-off-by: Clemens Volk <[email protected]> Co-authored-by: Xinjie Yao <[email protected]>
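As a rough illustration of the schema named above, the sketch below builds a config with a `jobs` array and checks that each job carries the four listed keys. Only the key names come from the PR description; every value is a placeholder, the value types are assumptions, and `missing_job_keys` is a hypothetical helper rather than part of `eval_runner.py`.

```python
import json

# Key names are from the PR description; all values are placeholders and
# the value types (dict vs. list vs. string) are assumptions.
eval_config = {
    "jobs": [
        {
            "name": "example_job",
            "arena_env_args": {},
            "policy_type": "example_policy",
            "policy_config_dict": {},
        }
    ]
}

REQUIRED_JOB_KEYS = {"name", "arena_env_args", "policy_type", "policy_config_dict"}


def missing_job_keys(config):
    """Map job index to the sorted list of required keys that job lacks."""
    problems = {}
    for index, job in enumerate(config.get("jobs", [])):
        missing = sorted(REQUIRED_JOB_KEYS - set(job))
        if missing:
            problems[index] = missing
    return problems


if __name__ == "__main__":
    print(json.dumps(eval_config, indent=2))
    print(missing_job_keys(eval_config))
```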

## Summary

Doc fix for https://nvbugspro.nvidia.com/bug/6062848, plus README updates.

## Detailed description

- Policy training docs: Added a "Compute Requirements" section (GPU VRAM + system RAM guidance) to all three workflow tutorials (static_manipulation, sequential_static_manipulation, locomanipulation) and fixed the "an an" typo.
- Arena-in-your-repo docs: Created an index.rst landing page for the section and updated docs/index.rst to use it instead of listing the three sub-pages individually.
- README: Added a link to the "Installing IsaacLab-Arena in Your Repository" guide in the "Publishing Your Own Benchmark" section.

## Summary As CI seems to run smoothly again, bring back previously disabled tests.

## Problem

IsaacLab-Arena needs a tabletop manipulation task where the G1 robot uses the WBC-AGILE locomotion policy to pick up an apple and place it on a plate, while balancing in place.

Ref: ISAAC-12630

## Solution

Add a new `G1AgileTabletopAppleToPlateEnvironment` that wires the `G1WBCAgileJointEmbodiment` (from PR #489) with the existing `PickAndPlaceTask`, a Seattle Lab table scene, and appropriate object assets.

## Changes

- **`isaaclab_arena_environments/g1_agile_tabletop_apple_to_plate_environment.py`**: New environment class: G1 robot at (-0.4, 0, 0) facing a table with an apple (pick object) and a clay plate (target). Uses `G1WBCAgileJointEmbodiment` for balance + upper body control. 30-second episodes. Supports `--object`, `--embodiment`, `--teleop_device` CLI args.
- **`isaaclab_arena_environments/cli.py`**: Register the new environment in the `ExampleEnvironments` dict.
- **`isaaclab_arena/tests/test_g1_agile_tabletop_apple_to_plate.py`**: Two tests: (1) initial state is not terminated (apple starts away from plate), (2) teleporting apple onto plate triggers success termination. Uses correct base-height command (0.75) to keep the robot stable.

## Testing

- [x] New unit tests added (2 tests)
- [x] Linters pass locally (black, flake8, isort, pyupgrade, codespell, license headers)
- [ ] CI pipeline (tests require Isaac Sim Docker with GPU)

## Notes

- Object positions (apple, plate, robot) are based on Seattle Lab table geometry and G1 arm reach. May need visual tuning in simulator.
- No new task class needed: the existing `PickAndPlaceTask` handles contact-sensor success detection, object-dropped termination, and metrics.
- Self-review caught and fixed a test issue: the initial-state test was sending zero base-height commands, which would cause the robot to squat. Fixed to use 0.75 (matching established pattern from `test_g1_wbc_embodiment.py`).

---

*Generated by [autodev](https://github.com/anthropics/claude-code), Claude Code*

---------

Signed-off-by: Lionel Gulich <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

## Summary Rework the concepts documentation to eliminate AI slop. --------- Signed-off-by: Clemens Volk <[email protected]> Co-authored-by: Clemens Volk <[email protected]> Co-authored-by: isaaclab-review-bot[bot] <270793704+isaaclab-review-bot[bot]@users.noreply.github.com> Co-authored-by: Xinjie Yao <[email protected]>

## Motivation

When building tasks, users often need to restrict object placement to a sub-region of a surface -- for example, only within the robot's reachable workspace. `On(table)` allows placement anywhere on the table, and `AtPosition` pins to a single point. There was no way to constrain to a region or set bounds on individual axes.

## Summary

- New `PositionLimits` unary relation that constrains object position in world coordinates. Supports full ranges (box), single bounds (half-plane), or a mix per axis.
- New `UnaryRelation` base class so `get_spatial_relations()` automatically includes any new unary relation without updating isinstance checks.
- `PositionLimitsLossStrategy` using `linear_band_loss` (both bounds) and `single_boundary_linear_loss` (single bound).
- Registered in solver strategies with slope=100.0 (matching `AtPosition`).
- Fixed `_print_unary_relation_debug` to work with any unary relation type.

## Usage

```python
# Full box constraint (reachable region)
apple.add_relation(On(table))
apple.add_relation(PositionLimits(x_min=-0.3, x_max=0.3, y_min=-0.2, y_max=0.2))

# Single bound (half-plane)
apple.add_relation(PositionLimits(x_min=0.5))

# Mix
apple.add_relation(PositionLimits(x_min=-0.3, x_max=0.3, y_min=-0.2))
```

## Test plan

- [x] 12 PositionLimits tests pass (11 strategy-level, 1 solver integration)
- [x] All relation/placer tests pass
- [x] Pre-commit checks pass

---------

Signed-off-by: Clemens Volk <[email protected]>
Co-authored-by: Xinjie Yao <[email protected]>
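To make the loss idea concrete, here is a minimal sketch of a per-axis penalty that is zero inside the allowed range and grows linearly outside it. This is an assumption about what the names `linear_band_loss` and `single_boundary_linear_loss` suggest, not the repository's implementation; the function `band_loss` and its signature are hypothetical, and only the slope of 100.0 is taken from the description.

```python
def band_loss(value, lower=None, upper=None, slope=100.0):
    """Linear penalty for leaving [lower, upper]; either bound may be None.

    Hypothetical sketch: zero inside the band, slope * distance outside.
    With only one bound set, it behaves as a single-boundary loss.
    """
    loss = 0.0
    if lower is not None and value < lower:
        loss += slope * (lower - value)
    if upper is not None and value > upper:
        loss += slope * (value - upper)
    return loss


if __name__ == "__main__":
    print(band_loss(0.0, lower=-0.3, upper=0.3))  # inside the band
    print(band_loss(1.0, lower=-0.3, upper=0.5))  # above the upper bound
    print(band_loss(0.0, lower=0.5))              # single lower bound violated
```

A loss of this shape lets a solver treat a full box, a half-plane, or a mixed constraint uniformly, since each axis contributes independently.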

## Summary Test code owners by only adding myself. ## Detailed description - We had an incident where a couple of bots with organization access collaborated to push an unreviewed change to main. - This is an attempt to prevent this in the future.

## Summary Complete the list of code owners. ## Detailed description - Follows successful test #584

## Summary Fix CODEOWNERS specification. ## Detailed description - Multiple lines indicate that the last line overrides the previous ones. - This is not what was intended. - Fix.

## Summary Bring IsaacLab issue templates into Isaac Lab - Arena ## Detailed description - Gives users a structure for bug reports and feature requests.

## Summary This fixes teleop crashing (https://nvbugspro.nvidia.com/bug/6066640). The root cause is a regression in the latest isaacteleop 1.2.xxx. 1.1.x should be the latest stable version to use, and the Teleop team will push patches to fix the issues on the Teleop side. Teleop on the Arena side was verified to work after rebuilding the Arena docker with this change. Co-authored-by: Xinjie Yao <[email protected]>

## Summary

Remove server client from v0.2 release docs.

## Detailed description

- We plan on reworking the server client to fully support it in v0.3.
- The current implementation of the server-client, and its documentation, are only half supported.
- Remove the documentation references to the server-client and aim for full support in `v0.3`.
- Address [6072205](https://nvbugspro.nvidia.com/bug/6072205)

Addressing SQA https://nvbugspro.nvidia.com/bug/6077281 https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/6077909

## Summary Clean up type annotations in the environment files ## Detailed description - Type annotations were not done properly at the start of the project, and that propagated over time to all environment files. - This cleans that up.

## Summary The config in the doc is mistakenly set for AVP instead of for Quest/Pico handtracking. Correcting the doc. This fixes the issue reported in https://nvbugspro.nvidia.com/bug/6076546

…596)

## Summary

CI subprocess tests are slow and hit the timeout without stalling.

## Detailed description

- Skipped `test_eval_runner_enable_cameras`, as cold-start camera rendering takes ~1165s, causing CI to exceed the timeout.
- Replaced raw `subprocess.run()` with the shared `run_subprocess()` helper, which enforces `ISAACLAB_ARENA_SUBPROCESS_TIMEOUT` (900s in CI).
- Removed the redundant stdout-regex failure check; the eval_runner already exits non-zero on job failure (no `--continue_on_error`).

## Summary Address https://nvbugspro.nvidia.com/bug/6062848 ## Detailed description In a multi-GPU setup, the standard output (stdout) buffer gets flooded with logs from secondary GPUs. As a result, the wandb prompt requesting user input gets buried in the output. Because the prompt goes unanswered, the data loading process stalls, eventually leading to a timeout.

## Summary Set GR1 XrCfg to anchor on the robot pelvis, similar to G1. Teleop initial view aligns with the robot head. This fixes bug https://nvbugspro.nvidia.com/bug/6076070 --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

## Summary

Sanitize numpy types in metrics for readable logging and JSON export. Fix for https://nvbugspro.nvidia.com/bug/6077892

## Detailed description

- **Why:** Metric values from environment rollouts can be `np.float32`, `np.int64`, or `np.ndarray`. These print as `np.float32(0.85)` instead of `0.85` and are not JSON-serializable by default.
- **What changed:** Added a `sanitize_metrics()` utility in `metrics_logger.py` that converts numpy scalars to `float` and numpy arrays to Python lists. Used it in `policy_runner.py` (metrics print after rollout) and `MetricsLogger.append_job_metrics()` (sanitizes on ingestion so both `print_metrics()` and `save_metrics_to_file()` get clean types).
- **Impact:** Metrics are now human-readable when printed and safely serializable to JSON. No behavioral change.
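A conversion of this kind can be sketched without importing numpy at all, by relying on the fact that numpy scalars and arrays both expose a `tolist()` method that returns built-in Python values. The function below is a hypothetical illustration, not the repository's utility; its name `to_plain_python` is an assumption, and it also recurses into nested containers.

```python
import json


def to_plain_python(value):
    """Recursively convert numpy-like values into JSON-friendly types.

    Hypothetical sketch. Anything exposing tolist() (numpy scalars and
    arrays do) is converted through it; dicts, lists and tuples are
    walked recursively; everything else is returned unchanged.
    """
    if isinstance(value, dict):
        return {key: to_plain_python(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain_python(item) for item in value]
    if hasattr(value, "tolist"):
        return value.tolist()
    return value


if __name__ == "__main__":
    metrics = {"success_rate": 0.85, "per_env": {"steps": [10, 12]}}
    print(json.dumps(to_plain_python(metrics)))
```

Converting on ingestion, as the description says, means every later consumer (printing, saving to file) sees the same plain types.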

## Summary Teleop bug https://nvbugspro.nvidia.com/bug/6081144 might be related to missing the env sourcing step. Update the doc so it is in a separate box and more noticeable.

Reverts #562 Co-authored-by: Alex Millane <[email protected]>

isaaclab-review-bot

🤖 Isaac Lab Review Bot

Summary

Merge PR syncing main → develop. The changes span five areas: (1) isaacteleop version bump to ~=1.1.0, (2) documentation restructuring of teleop workflows to numbered-step format with an important cloudxr.env ordering note, (3) XR anchor refactor from world-space offset composition to pelvis-relative prim anchoring on GR1T2 (aligning it with G1's existing pattern), (4) a metrics_to_plain_python_types() utility to fix JSON serialization of numpy types, and (5) widespread adoption of TYPE_CHECKING for deferred imports across all environment files.

The changes are well-structured and internally consistent. The deleted common.py has no remaining references, and the base class get_xr_cfg() correctly returns self.xr which is now set directly. Two minor suggestions below.

Design Assessment

Design is sound. The XR anchor refactor is a good simplification — pelvis-relative anchoring removes the need for world-space pose composition and makes the behavior consistent between GR1T2 and G1. The TYPE_CHECKING pattern cleanly resolves the long-standing annotation issue noted in multiple TODO comments.
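For readers unfamiliar with the `TYPE_CHECKING` pattern mentioned here, a minimal sketch follows. The module `heavy_sim_package` and the class `EnvCfg` are hypothetical stand-ins, not names from this repository; the point is that the guarded import is only seen by static type checkers and never executes at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only. At runtime TYPE_CHECKING is False,
    # so this (hypothetical) heavy or circular import is never executed.
    from heavy_sim_package import EnvCfg


def describe_env(cfg: "EnvCfg") -> str:
    """The quoted annotation is not evaluated at runtime."""
    return f"env config: {cfg!r}"


if __name__ == "__main__":
    # Runs even though heavy_sim_package is not installed.
    print(describe_env({"num_envs": 4}))
```

The quoted annotation keeps the type visible to tooling while the module can still be imported before the heavy dependency is available.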

Findings

(Detailed findings posted as inline comments on the relevant lines.)

🔵 Suggestion: metrics_to_plain_python_types() return type annotation is narrower than actual — see inline comment.

🔵 Suggestion: metrics_to_plain_python_types() doesn't handle nested dicts — noted inline as a minor enhancement.

Test Coverage

XR refactor: Tests properly updated — the old world-space composition tests are replaced by unified pelvis-relative assertions covering both gr1_pink and g1_wbc_pink. The test correctly verifies that set_initial_pose() does not alter the anchor config (the key behavioral change).
Metrics utility: No dedicated unit test for metrics_to_plain_python_types(), but the function is straightforward and exercised via append_job_metrics() in integration tests.
Eval runner: Test simplification from stdout parsing to exit-code checking is an improvement in robustness. The @pytest.mark.skip additions for camera cold-start are pragmatic CI fixes.

CI Status

Pre-commit check is in progress.

Verdict

Ship it — Clean merge with no blocking issues. The two inline suggestions are optional improvements.

## Summary

Reduce to a minimum number of policy runner tests.

## Detailed description

- These tests are slow and flaky.
- Reduce to a minimal number to try to speed up CI and decrease the probability it stalls.

## Not done

- This reduces our test coverage on our environments (which was already very low).
- I will try, in a follow-up MR, to add complete coverage of the environments **in process**, so that they're fast to run.

## Summary

Address https://nvbugspro.nvidia.com/bug/6084606. Update training doc system requirements and defaults.

## Detailed description

Across all three workflow docs (locomanipulation, static_manipulation, sequential_static_manipulation):

- Increased recommended system RAM from 256 GB to 512 GB
- Changed `--dataloader_num_workers` from 8 to 16 in all training commands
- Added a `.. note::` explaining that `global_batch_size` and `dataloader_num_workers` can be reduced on less powerful hardware at the cost of longer training time

greptile-apps · 2026-04-20T20:08:19Z

Greptile Summary

This PR merges main into develop, bringing in new environments (GR1PutAndCloseDoor, GalileoG1LocomanipPickAndPlace), a new MetricsLogger, XR anchor pose tests, GR00T closed-loop policy tests, and various doc/Docker updates.

Clarification on previous P1: "put_item_in_fridge_and_close_door" is a valid registered environment name (gr1_put_and_close_door_environment.py, line 35); the prior comment flagging it as unregistered was incorrect.
Unresolved P1: The quaternion noise addition in gr1t2.py lines 511–512 still lacks renormalization after the perturbation, breaking the unit-norm constraint and potentially corrupting downstream matrix_from_quat results.
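The renormalization step the unresolved P1 asks for can be sketched as follows. This is a generic illustration in plain Python, not the code in gr1t2.py; the function name `perturb_quaternion` and the Gaussian noise model are assumptions.

```python
import math
import random


def perturb_quaternion(quat, noise_std, rng=random):
    """Add Gaussian noise to a quaternion, then rescale it to unit norm.

    Hypothetical sketch. quat is any sequence of four floats; the
    component order does not matter for the normalization itself.
    """
    noisy = [component + rng.gauss(0.0, noise_std) for component in quat]
    norm = math.sqrt(sum(component * component for component in noisy))
    if norm < 1e-12:
        # Degenerate perturbation: fall back to the unperturbed input.
        return list(quat)
    return [component / norm for component in noisy]


if __name__ == "__main__":
    q = perturb_quaternion([1.0, 0.0, 0.0, 0.0], 0.05, random.Random(0))
    print(q, sum(c * c for c in q))
```

Without the final division, the perturbed vector is generally not unit length and no longer represents a pure rotation, which is the concern raised for downstream rotation-matrix conversion.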

Confidence Score: 4/5

Safe to merge pending resolution of the quaternion denormalization P1 in gr1t2.py.

One P1 from a prior review (additive quaternion noise without renormalization in gr1t2.py) remains unaddressed; all other prior concerns are P2 or confirmed false-positive. No new critical issues were found in this merge.

isaaclab_arena/embodiments/gr1t2/gr1t2.py (quaternion noise renormalization)

Important Files Changed

| Filename | Overview |
| --- | --- |
| isaaclab_arena/embodiments/gr1t2/gr1t2.py | Quaternion noise applied additively without renormalization (lines 511-512), breaking unit-norm constraint on rotation quaternions; P1 from previous review still unresolved. |
| isaaclab_arena/evaluation/policy_runner.py | Rollout logic and distributed-mode handling look correct; minor: RuntimeError re-raise at line 118 drops original traceback (raise without `from e`). |
| isaaclab_arena/metrics/metrics_logger.py | New MetricsLogger class; numpy-to-Python type conversion and JSON serialization look correct. |
| isaaclab_arena/tests/test_eval_runner.py | "put_item_in_fridge_and_close_door" environment is confirmed registered in gr1_put_and_close_door_environment.py; previously raised P1 was a false positive. |
| isaaclab_arena_gr00t/tests/test_gr00t_closedloop_policy.py | Redundant inner `import sys` at line 104 inside _run_gr00t_closedloop_policy (already imported at module scope on line 7); otherwise test logic looks correct. |
| docker/Dockerfile.isaaclab_arena | Build structure looks correct; INSTALL_GROOT guard correctly gates CUDA 12.8 and GR00T dependency installs. |
| isaaclab_arena_environments/gr1_put_and_close_door_environment.py | New GR1 put-and-close-door environment; registers under name "put_item_in_fridge_and_close_door", consistent with its use in test_eval_runner.py. |
| isaaclab_arena_environments/galileo_g1_locomanip_pick_and_place_environment.py | Locomanipulation pick-and-place environment for G1 robot; structure follows the project pattern correctly. |
| isaaclab_arena/tests/test_xr_anchor_pose.py | New XR anchor pose tests for GR1T2 and G1 WBC embodiments; tests are well-structured and numerically precise. |
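The `raise ... from e` remark in the policy_runner.py row refers to explicit exception chaining. A minimal sketch, with a hypothetical function name and error message rather than the actual code at line 118:

```python
def run_rollout(step_fn):
    """Wrap failures in RuntimeError while keeping the original as the cause."""
    try:
        return step_fn()
    except Exception as e:
        # "from e" stores the original exception on __cause__, marking it
        # as the direct cause when the traceback is printed.
        raise RuntimeError("Rollout failed") from e


def failing_step():
    raise ValueError("bad action shape")


if __name__ == "__main__":
    try:
        run_rollout(failing_step)
    except RuntimeError as err:
        print(repr(err), "caused by", repr(err.__cause__))
```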

Sequence Diagram

sequenceDiagram
    participant CLI
    participant PolicyRunner
    participant ArenaEnvBuilder
    participant Env
    participant Policy
    participant MetricsLogger

    CLI->>PolicyRunner: main()
    PolicyRunner->>ArenaEnvBuilder: get_arena_builder_from_cli(args_cli)
    ArenaEnvBuilder->>Env: make_registered_and_return_cfg()
    PolicyRunner->>Policy: policy_cls.from_args(args_cli)
    PolicyRunner->>PolicyRunner: rollout_policy(env, policy, num_steps, num_episodes)
    loop Each step/episode
        PolicyRunner->>Policy: get_action(env, obs)
        Policy-->>PolicyRunner: actions
        PolicyRunner->>Env: env.step(actions)
        Env-->>PolicyRunner: obs, terminated, truncated
        alt terminated or truncated
            PolicyRunner->>Policy: reset(env_ids)
        end
    end
    PolicyRunner->>Env: compute_metrics(env.unwrapped)
    Env-->>PolicyRunner: metrics
    PolicyRunner->>MetricsLogger: metrics_to_plain_python_types(metrics)
    PolicyRunner->>CLI: print metrics

Reviews (3): Last reviewed commit: "Fix docs."

viiik-inside and others added 26 commits April 10, 2026 02:41

Doc/explain resume flag (#456)

0178929

## Summary Explain the resume flag

Update newton example doc to clearly state failure using physx eval (#…

e3f1283

…572) ## Summary This is a doc change requested by QA (https://nvbugspro.nvidia.com/bug/6063011). It clarifies that a Newton-trained model evaluated with PhysX is expected to fail the DexSuite task completely.

[CI] Revert eval_runner tests (#559)

9a382b5

## Summary As CI seems to run smoothly again, bring back previously disabled tests.

Add test code owners. (#584)

30c68ae

## Summary Test code owners by only adding myself. ## Detailed description - We had an incident where a couple of bots with organization access collaborated to push an unreviewed change to main. - This is an attempt to prevent this in the future.

Complete the list of codeowners. (#585)

03d5938

## Summary Complete the list of code owners. ## Detailed description - Follows successful test #584

Fix code owners. (#586)

26c1504

## Summary Fix CODEOWNERS specification. ## Detailed description - Multiple lines indicate that the last line overrides the previous ones. - This is not what was intended. - Fix.

Bring over issue templates from IsaacLab. (#583)

ff8db9c

## Summary Bring IsaacLab issue templates into Isaac Lab - Arena ## Detailed description - Gives users a structure for bug reports and feature requests.

Dox fix on closedloop eval for example workflows (#600)

7761419

Addressing SQA https://nvbugspro.nvidia.com/bug/6077281 https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/6077909

Correct the CloudXR config for Quest handtracking (#605)

87ca851

## Summary The config in the doc is mistakenly set for AVP instead of for Quest/Pico handtracking. Correcting the doc. This fixes the issue reported in https://nvbugspro.nvidia.com/bug/6076546

Update doc to make teleop env setting step clear (#609)

da3f9c2

## Summary Teleop bug https://nvbugspro.nvidia.com/bug/6081144 might be related to missing the env sourcing step. Update the doc so it is in a separate box and more noticeable.

Revert "Add G1 AGILE tabletop apple-to-plate environment" (#597)

b2f1b6e

Reverts #562 Co-authored-by: Alex Millane <[email protected]>

Merge branch 'main' into alex/merge_main_into_develop

845443c

isaaclab-review-bot Bot reviewed Apr 17, 2026


Comment thread isaaclab_arena/metrics/metrics_logger.py

Comment thread isaaclab_arena/metrics/metrics_logger.py

xyao-nv approved these changes Apr 17, 2026


alexmillane and others added 2 commits April 17, 2026 19:02

alexmillane added 3 commits April 20, 2026 21:51

Fix bug.

3a7f81b

Merge branch 'main' into alex/merge_main_into_develop

eff0559

Merge branch 'develop' into alex/merge_main_into_develop

5c73e31

alexmillane marked this pull request as ready for review April 20, 2026 20:05

alexmillane requested review from cvolkcvolk, peterd-NV, qianl-nv, viiik-inside and zhx06 as code owners April 20, 2026 20:05

alexmillane added 2 commits April 21, 2026 09:44

Merge branch 'develop' into alex/merge_main_into_develop

f7447bf

Fix docs.

77bcb6f

alexmillane merged commit 5986888 into develop Apr 21, 2026
6 checks passed

alexmillane merged 33 commits into develop from alex/merge_main_into_develop

6 participants
