First experiments: add GR00T closed-loop docs and language instruction support#519
cvolk wants to merge 19 commits into main
Conversation
docs/pages/quickstart/first_experiments/running_a_real_policy.rst
alexmillane
left a comment
Looks great! Another great improvement!
Bridges from zero_action to a real policy: shows the container prerequisite (-g flag), the two argument changes vs zero_action (policy_type + enable_cameras), and batch evaluation via the GR00T jobs config. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Convert first_experiments.rst into a directory with an index and two pages: 'Exploring Environment Variations' (zero-action experiments) and 'Running a Real Policy' (GR00T N1.6 closed-loop). Matches the structure used by the Example Workflows section. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
first_arena_env.rst was linking to the old flat first_experiments path; update both occurrences to first_experiments/index. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Shows a 5x5 grid of closed-loop GR00T N1.6 runs varying background, lighting, and destination object. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Add closing sentence to Exploring Environment Variations pointing forward to Running a Real Policy
- Explain the switch from --num_steps to --num_episodes in the GR00T command
- Fix stale job count: seven -> six (billiard_hall_wooden_bowl removed)
- Fix GIF caption: destination object -> pick-up object

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
'Batch' implies parallel execution; eval_runner.py runs jobs sequentially. 'Multi-job' is accurate and maps directly to the jobs config concept. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
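A minimal Python sketch of the distinction, under stated assumptions: `run_jobs_sequentially` and the dict layout are illustrative, not the real `eval_runner.py` API. The point is that jobs from the config run one after another, never in parallel.

```python
from typing import Any


def run_jobs_sequentially(jobs_config: dict[str, dict[str, Any]]) -> list[str]:
    """Run each named job in config order, one after another (no parallelism)."""
    completed: list[str] = []
    for job_name, job_spec in jobs_config.items():
        # Placeholder for the actual policy rollout and metric collection.
        completed.append(f"{job_name}: evaluated {job_spec.get('object', 'unknown')}")
    return completed


jobs = {
    "droid_pnp_srl_gr00t_blue_block": {"object": "blue_block"},
    "droid_pnp_srl_gr00t_orange": {"object": "orange"},
}
for line in run_jobs_sequentially(jobs):
    print(line)
```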
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Add mustard_bottle, sugar_box, and mug jobs. Distribute wooden_bowl as destination across 5 of the 9 jobs (blue_block, orange, tomato_sauce_can, mustard_bottle, mug) and bowl_ycb across the remaining 4. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Force-pushed from 55b9269 to 02df632
Allows callers to pass a natural-language instruction directly on the command line. The value takes precedence over the task's own get_task_description(), which in turn takes precedence over the policy config YAML fallback. Remove the hardcoded language_instruction from droid_manip_gr00t_closedloop_config.yaml so the instruction is always supplied explicitly rather than silently falling back to a stale string that doesn't match the object being evaluated. Update the GR00T closed-loop docs command to pass the instruction explicitly. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Wire language_instruction through Job, eval_runner, and rollout_policy so per-job instructions in the jobs config take precedence over the task's own get_task_description(). Add explicit language instructions to all jobs in droid_pnp_srl_gr00t_jobs_config.json. Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Raise ValueError in Gr00tClosedloopPolicy and Gr00tRemotePolicy set_task_description when no instruction is provided, preventing silent evaluation with an empty prompt
- Rename droid_pick_and_place_srl -> pick_and_place_maple_table in droid_pnp_srl_gr00t_jobs_config.json and drop the duplicate billiard_hall_wooden_bowl job, aligning with the intended state
- Fix "Two things change" -> "Three things change" in docs now that --language_instruction is a third notable CLI change

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
> zero-action experiments. This functionality can be used to test how the policy adapts to each new
> object and lighting condition, as we shall see in the next section.
>
> **Multi-job evaluation across object variations**
@xyao-nv I think you had a comment on that, right?

Can we call it "Sequential batch evaluation across object variations"?
https://isaac-sim.github.io/IsaacLab-Arena/main/pages/policy_evaluation/evaluation_types.html
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
> def set_task_description(self, task_description: str | None) -> dict[str, Any]:
>     """Set the language instruction of the task being evaluated."""
>     if task_description is None:
>         task_description = self.policy_config.language_instruction
>     if not task_description:
> Running your First Experiments
> ==============================
>
> The following pages walk you through your first Arena experiments — first verifying that
"exploring"?
To be consistent with the wording below.
Heading renamed: "Batch Evaluation" -> "Multi-Job Evaluation"

Can we rename it to "Sequential batch evaluation"?
> - ``--enable_cameras`` turns on the robot's cameras, which GR00T requires for observations
> - ``--language_instruction`` sets the natural-language instruction sent to the model
>
> GR00T also requires absolute joint positions, so use ``--embodiment droid_abs_joint_pos``

The default modality config that ships with this checkpoint is set to use absolute joint positions.
> droid_pnp_srl_gr00t_blue_block:
>     num_episodes 3
>     object_moved_rate 0.0000
>     success_rate 0.0000

Maybe we can acknowledge these bad results: add a note stating that this checkpoint is not post-trained on these object setups, so a 0% success rate is reasonable to observe.
It would also remind users that we are an evaluation platform, not responsible for policy success rate, given there are no known bugs in our evaluation.
Although, to me, the claim is not reasonable given it is branded as a true foundation model.
> To go beyond the pre-trained GR00T N1.6 foundation model — for example, fine-tuning on your own
> teleoperation data — see :doc:`../../../pages/example_workflows/imitation_learning/index` for
> end-to-end imitation learning workflows.

How about adding a pointer to the RL example too? That way we are not tightly coupled to GR00T in policy evaluation.
xyao-nv
left a comment
Thx for adding them!
Let's put the refactorings (aka rm -rf) into our v0.3 todos!
Summary
- Add `language_instruction` as an optional per-job field in the jobs config (for `eval_runner.py`) and as a `--language_instruction` CLI argument in `policy_runner`. In both cases the value takes precedence over the task's own description.
- Remove the hardcoded `language_instruction` from `droid_manip_gr00t_closedloop_config.yaml` and add explicit per-object instructions to `droid_pnp_srl_gr00t_jobs_config.json`.