First experiments: add GR00T closed-loop docs and language instruction support #519

Open
cvolkcvolk wants to merge 19 commits into main from cvolk/first-experiments-gr00t-docs
Conversation

Collaborator

@cvolkcvolk commented Mar 31, 2026

Summary

  • Split "Running your First Experiments" into two sub-pages:
    • Exploring Environment Variations — the existing zero-action experiments (swap objects, HDR, scale up, batch eval)
    • Running a Real Policy — new page walking through GR00T N1.6 closed-loop evaluation on the DROID embodiment
  • Add language_instruction as an optional per-job field in the jobs config (for eval_runner.py) and as a --language_instruction CLI argument in policy_runner. In both cases the value takes precedence over the task's own description.
  • Remove the hardcoded instruction from droid_manip_gr00t_closedloop_config.yaml and add explicit per-object instructions to droid_pnp_srl_gr00t_jobs_config.json
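
A minimal sketch of how an optional per-job instruction might be read from a jobs config. Only the `language_instruction` field name comes from this PR; the `jobs`, `name`, and `object` keys, the instruction text, and the `instruction_for` helper are hypothetical and do not reflect the real schema of droid_pnp_srl_gr00t_jobs_config.json.

```python
from __future__ import annotations

import json

# Hypothetical jobs config; only "language_instruction" is named in this PR.
JOBS_CONFIG = """
{
  "jobs": [
    {"name": "blue_block", "object": "blue_block",
     "language_instruction": "Pick up the blue block and place it in the bowl."},
    {"name": "orange", "object": "orange"}
  ]
}
"""


def instruction_for(job: dict) -> str | None:
    """Return the per-job instruction, or None when the optional field is absent."""
    return job.get("language_instruction")


jobs = json.loads(JOBS_CONFIG)["jobs"]
instructions = {job["name"]: instruction_for(job) for job in jobs}
print(instructions)
```

A job without the field yields `None`, which leaves the caller free to fall back to the task's own description.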

@cvolkcvolk changed the base branch from main to cvolk/language-instruction-in-jobs-config March 31, 2026 15:24
Collaborator

@alexmillane left a comment

Looks great! Another great improvement!

cvolkcvolk and others added 15 commits April 1, 2026 16:21
Bridges from zero_action to a real policy: shows the container
prerequisite (-g flag), the two argument changes vs zero_action
(policy_type + enable_cameras), and batch evaluation via the
GR00T jobs config.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Convert first_experiments.rst into a directory with an index and two
pages: 'Exploring Environment Variations' (zero-action experiments)
and 'Running a Real Policy' (GR00T N1.6 closed-loop). Matches the
structure used by the Example Workflows section.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
first_arena_env.rst was linking to the old flat first_experiments path;
update both occurrences to first_experiments/index.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Shows a 5x5 grid of closed-loop GR00T N1.6 runs varying background,
lighting, and destination object.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
- Add closing sentence to Exploring Environment Variations pointing
  forward to Running a Real Policy
- Explain the switch from --num_steps to --num_episodes in the GR00T
  command
- Fix stale job count: seven -> six (billiard_hall_wooden_bowl removed)
- Fix GIF caption: destination object -> pick-up object

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
'Batch' implies parallel execution; eval_runner.py runs jobs
sequentially. 'Multi-job' is accurate and maps directly to the jobs
config concept.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Add mustard_bottle, sugar_box, and mug jobs. Distribute wooden_bowl
as destination across 5 of the 9 jobs (blue_block, orange,
tomato_sauce_can, mustard_bottle, mug) and bowl_ycb across the
remaining 4.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk force-pushed the cvolk/first-experiments-gr00t-docs branch from 55b9269 to 02df632 Compare April 1, 2026 14:22
@cvolkcvolk changed the base branch from cvolk/language-instruction-in-jobs-config to main April 1, 2026 14:24
Allows callers to pass a natural-language instruction directly on the
command line. The value takes precedence over the task's own
get_task_description(), which in turn takes precedence over the
policy config YAML fallback.

Remove the hardcoded language_instruction from
droid_manip_gr00t_closedloop_config.yaml so the instruction is always
supplied explicitly rather than silently falling back to a stale string
that doesn't match the object being evaluated.

Update the GR00T closed-loop docs command to pass the instruction
explicitly.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
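
The precedence described in the commit message above (command-line value, then the task's `get_task_description()`, then the policy config YAML fallback) can be sketched as follows. The `resolve_instruction` helper and its parameter names are hypothetical, and treating an empty string as "not provided" is an assumption made for this sketch, not a statement about the actual policy_runner code.

```python
from __future__ import annotations


def resolve_instruction(
    cli_value: str | None,
    task_description: str | None,
    config_fallback: str | None,
) -> str | None:
    """Return the first non-empty candidate, checked in precedence order."""
    for candidate in (cli_value, task_description, config_fallback):
        if candidate:  # assumption: None and "" both count as missing
            return candidate
    return None


# The command-line value wins whenever it is supplied.
chosen = resolve_instruction("pick up the mug", "task default", "yaml default")
print(chosen)
```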
Wire language_instruction through Job, eval_runner, and rollout_policy
so per-job instructions in the jobs config take precedence over the
task's own get_task_description(). Add explicit language instructions
to all jobs in droid_pnp_srl_gr00t_jobs_config.json.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
@cvolkcvolk changed the title from "Add GR00T N1.6 closed-loop section to first experiments docs" to "First experiments: add GR00T closed-loop docs and language instruction support" Apr 1, 2026
- Raise ValueError in Gr00tClosedloopPolicy and Gr00tRemotePolicy
  set_task_description when no instruction is provided, preventing
  silent evaluation with an empty prompt
- Rename droid_pick_and_place_srl -> pick_and_place_maple_table in
  droid_pnp_srl_gr00t_jobs_config.json and drop the duplicate
  billiard_hall_wooden_bowl job, aligning with the intended state
- Fix "Two things change" -> "Three things change" in docs now that
  --language_instruction is a third notable CLI change

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
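
A rough sketch of the guard described in the first bullet of the commit message above: fall back to the config value, then fail loudly if the prompt is still empty. `PolicyConfig` and `SketchPolicy` are hypothetical stand-ins for the real config and for Gr00tClosedloopPolicy / Gr00tRemotePolicy, the error message is invented, and the real method returns a dict rather than storing the value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PolicyConfig:
    """Hypothetical stand-in; only language_instruction is modelled."""

    language_instruction: str | None = None


class SketchPolicy:
    """Hypothetical stand-in for the GR00T policy classes named in the PR."""

    def __init__(self, policy_config: PolicyConfig) -> None:
        self.policy_config = policy_config
        self.task_description: str | None = None

    def set_task_description(self, task_description: str | None) -> None:
        # Fall back to the config value when the caller passes nothing.
        if task_description is None:
            task_description = self.policy_config.language_instruction
        # Refuse to evaluate with a missing or empty prompt.
        if not task_description:
            raise ValueError("No language instruction was provided.")
        self.task_description = task_description


policy = SketchPolicy(PolicyConfig())
policy.set_task_description("pick up the mug")
print(policy.task_description)
```

Calling `set_task_description(None)` on a policy whose config also lacks an instruction raises `ValueError` instead of silently running with an empty prompt.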
zero-action experiments. This functionality can be used to test how the policy adapts to each new
object and lighting condition, as we shall see in the next section.

**Multi-job evaluation across object variations**
Collaborator Author

@xyao-nv I think you had a comment on that, right?

Collaborator

Can we call it "Sequential batch evaluation across object variations"?
https://isaac-sim.github.io/IsaacLab-Arena/main/pages/policy_evaluation/evaluation_types.html

Collaborator Author

Done

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
def set_task_description(self, task_description: str | None) -> dict[str, Any]:
    if task_description is None:
        task_description = self.policy_config.language_instruction
    if not task_description:
Collaborator

Thx!

"""Set the language instruction of the task being evaluated."""
if task_description is None:
    task_description = self.policy_config.language_instruction
if not task_description:
Collaborator

Thx!

Running your First Experiments
==============================

The following pages walk you through your first Arena experiments — first verifying that
Collaborator

"exploring"? That would be consistent with the wording below.


Batch Evaluation
-----------------
Multi-Job Evaluation
Collaborator

Can we rename this to "Sequential batch evaluation"?

- ``--enable_cameras`` turns on the robot's cameras, which GR00T requires for observations
- ``--language_instruction`` sets the natural-language instruction sent to the model

GR00T also requires absolute joint positions, so use ``--embodiment droid_abs_joint_pos``
Collaborator

The default modality config that came with this checkpoint is set to use absolute joint positions.

droid_pnp_srl_gr00t_blue_block:
  num_episodes       3
  object_moved_rate  0.0000
  success_rate       0.0000
Collaborator

@xyao-nv Apr 1, 2026

Maybe we can acknowledge these poor results. We could add a note stating that this checkpoint was not post-trained on this object setup, so a success rate of 0 is a reasonable outcome.

It could also remind users that we are an evaluation platform and are not responsible for a policy's success rate, provided there are no known bugs in our evaluation.

That said, to me the claim is not entirely reasonable, given that the model is branded as a true foundation model.


To go beyond the pre-trained GR00T N1.6 foundation model — for example, fine-tuning on your own
teleoperation data — see :doc:`../../../pages/example_workflows/imitation_learning/index` for
end-to-end imitation learning workflows.
Collaborator

How about adding a pointer to the RL example too, so that policy eval is not tightly coupled to GR00T?

Collaborator

@xyao-nv left a comment

Thx for adding them!
Let's put the refactorings (aka rm -rf) into our v0.3 todos!

3 participants