Commit ef0fa28

qianl-nv and xyao-nv authored
Improve lift rl training example (#529)
## Summary

Update the lift object RL example to produce a high-success-rate model.

## Detailed description

- What was the reason for the change?
  - The existing lift object RL model has a low success rate (~30%), and the arm motion is unnatural.
- What has been changed?
  - Add a Franka joint-control embodiment for RL training to avoid the unnatural arm motion of the IK version.
  - Update the observation to include only joint and target poses.
  - Fix a bug in the base RSL-RL policy so that the target pose (task_obs, in addition to the policy obs) is passed to the actor/critic model.
  - Fix a bug causing a ~0 success rate in parallel evaluation due to an incorrect object/target frame in the success term.
  - Update the RL docs with the latest models and commands.
- What is the impact of this change?
  - The RL model now reaches a 70-80% success rate within 1.5 hours of training.

---------

Co-authored-by: Xinjie Yao <xyao@nvidia.com>
1 parent ad17473 commit ef0fa28
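One of the fixes above is passing the target pose (task_obs) through to the actor/critic input. As a rough, hypothetical sketch of what that amounts to (names and shapes are illustrative, not the actual isaaclab_arena or RSL-RL code), the observation passed to the model must concatenate the task observation onto the per-environment policy observation:

```python
# Hypothetical sketch of the task_obs fix described in the commit message:
# if the target pose is not concatenated into the input fed to the
# actor/critic, the policy cannot condition on the commanded lift target.
# All names and shapes here are illustrative, not the real API.

def build_actor_input(policy_obs, task_obs):
    """Concatenate per-env policy observations with task (target pose) observations."""
    return [p + t for p, t in zip(policy_obs, task_obs)]

num_envs = 4
joint_obs = [[0.0] * 18 for _ in range(num_envs)]   # e.g. joint positions + velocities
target_pose = [[0.0] * 7 for _ in range(num_envs)]  # e.g. target position + quaternion
obs = build_actor_input(joint_obs, target_pose)
print(len(obs), len(obs[0]))  # 4 25
```

Without the concatenation, the actor/critic sees only the joint state, which matches the symptom the commit describes (a policy that cannot reliably reach the commanded target).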

File tree

51 files changed: +285 / -160 lines


AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ docker exec isaaclab_arena-latest bash -c "cd /workspaces/isaaclab_arena && \
  --num_steps 10 \
  kitchen_pick_and_place \
  --object cracker_box \
- --embodiment franka"
+ --embodiment franka_ik"
  ```
 
  ## Common Commands

README.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ asset_registry = AssetRegistry()
 
  # Select building blocks
  background = asset_registry.get_asset_by_name("kitchen")()
- embodiment = asset_registry.get_asset_by_name("franka")()
+ embodiment = asset_registry.get_asset_by_name("franka_ik")()
  cracker_box = asset_registry.get_asset_by_name("cracker_box")()
  tomato_soup_can = asset_registry.get_asset_by_name("tomato_soup_can")()

Lines changed: 3 additions & 0 deletions
Lines changed: 2 additions & 2 deletions

docs/index.rst

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ The following code snippet shows a simple example(pick up a tomato soup can and
 
  .. code-block:: python
 
- embodiment = asset_registry.get_asset_by_name("franka")(enable_cameras=True)
+ embodiment = asset_registry.get_asset_by_name("franka_ik")(enable_cameras=True)
  background = asset_registry.get_asset_by_name("kitchen")()
  tomato_soup_can = asset_registry.get_asset_by_name("tomato_soup_can")()
  destination_location = ObjectReference(

docs/pages/concepts/concept_embodiment_design.rst

Lines changed: 2 additions & 2 deletions
@@ -119,7 +119,7 @@ Environment Integration
  .. code-block:: python
 
  # Embodiment creation with camera support
- embodiment = asset_registry.get_asset_by_name("franka")(
+ embodiment = asset_registry.get_asset_by_name("franka_ik")(
  enable_cameras=True
  )
 

@@ -144,7 +144,7 @@ Usage Examples
 
  .. code-block:: python
 
- franka = asset_registry.get_asset_by_name("franka")(enable_cameras=True)
+ franka = asset_registry.get_asset_by_name("franka_ik")(enable_cameras=True)
  task = PickAndPlaceTask(pick_object, destination, background)
 
  **Humanoid Control Modes**

docs/pages/concepts/concept_environment_design.rst

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ Creating an Environment Example
  .. code-block:: python
 
  # Component creation
- embodiment = asset_registry.get_asset_by_name("franka")()
+ embodiment = asset_registry.get_asset_by_name("franka_ik")()
  background = asset_registry.get_asset_by_name("kitchen")()
  pick_object = asset_registry.get_asset_by_name("cracker_box")()
  pick_object.set_initial_pose(Pose(position_xyz=(0.4, 0.0, 0.1)))

docs/pages/example_workflows/reinforcement_learning/step_2_policy_training.rst

Lines changed: 7 additions & 7 deletions
@@ -18,8 +18,8 @@ builds the environment, and registers it with gym so IsaacLab's script can find
  --external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
  --task lift_object \
  --rl_training_mode \
- --num_envs 512 \
- --max_iterations 12000
+ --num_envs 4096 \
+ --max_iterations 2000
 
  .. tip::
 

@@ -55,8 +55,8 @@ For example, to train with relu activation and a higher learning rate:
  --external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
  --task lift_object \
  --rl_training_mode \
- --num_envs 512 \
- --max_iterations 12000 \
+ --num_envs 4096 \
+ --max_iterations 2000 \
  agent.policy.activation=relu \
  agent.algorithm.learning_rate=0.001
 

@@ -112,15 +112,15 @@ Add ``--distributed`` to spread environments across all available GPUs:
  --external_callback isaaclab_arena.environments.isaaclab_interop.environment_registration_callback \
  --task lift_object \
  --rl_training_mode \
- --num_envs 512 \
- --max_iterations 12000 \
+ --num_envs 4096 \
+ --max_iterations 2000 \
  --distributed
 
 
  Expected Results
  ^^^^^^^^^^^^^^^^
 
- After 12,000 iterations (~6 hours on a single GPU with 512 environments), the trained
+ After 2,000 iterations (~40 minutes on a single GPU with 4096 environments), the trained
  policy should reliably grasp and lift objects to commanded target positions.
 
  .. image:: ../../../images/lift_object_rl_task.gif

docs/pages/example_workflows/reinforcement_learning/step_3_evaluation.rst

Lines changed: 49 additions & 31 deletions
@@ -21,13 +21,12 @@ or you can download a pre-trained one as described below.
  .. code-block:: bash
 
  hf download \
- nvidia/IsaacLab-Arena-Lift-Object-RL \
- model_11999.pt \
+ nvidia/Arena-Franka-Lift-Object-RL-Task \
  --local-dir $MODELS_DIR/lift_object_checkpoint
 
  After downloading, the checkpoint is at:
 
- ``$MODELS_DIR/lift_object_checkpoint/model_11999.pt``
+ ``$MODELS_DIR/lift_object_checkpoint/model_1999.pt``
 
  Replace checkpoint paths in the examples below with this path.
 

@@ -50,14 +49,14 @@ Method 1: Single Environment Evaluation
  python isaaclab_arena/evaluation/policy_runner.py \
  --visualizer kit \
  --policy_type rsl_rl \
- --num_steps 1000 \
- --checkpoint_path logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt \
+ --num_episodes 20 \
+ --checkpoint_path $MODELS_DIR/lift_object_checkpoint/model_1999.pt \
  lift_object
 
  .. note::
 
- If you downloaded the pre-trained model from Hugging Face, replace the checkpoint path with:
- ``$MODELS_DIR/lift_object_checkpoint/model_11999.pt``
+ If you train the model yourself, the checkpoint path is typically in the ``logs/rsl_rl/generic_experiment/`` directory.
+ Replace the checkpoint path with the path to your own checkpoint.
 
  Policy-specific arguments (``--policy_type``, ``--checkpoint_path``, etc.) must come **before** the
  environment name. Environment-specific arguments (``--object``, ``--embodiment``, etc.) must come

@@ -67,7 +66,7 @@ At the end of the run, metrics are printed to the console:
 
  .. code-block:: text
 
- Metrics: {'success_rate': 0.85, 'num_episodes': 12}
+ Metrics: {'success_rate': 0.81, 'num_episodes': 12}
 
 
  Method 2: Parallel Environment Evaluation

@@ -79,21 +78,28 @@ For more statistically significant results, run across many environments in parallel.
 
  python isaaclab_arena/evaluation/policy_runner.py \
  --policy_type rsl_rl \
- --num_steps 5000 \
+ --num_episodes 1024 \
  --num_envs 64 \
- --checkpoint_path logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt \
- --headless \
+ --env_spacing 2.5 \
+ --visualizer kit \
+ --checkpoint_path $MODELS_DIR/lift_object_checkpoint/model_1999.pt \
  lift_object
 
  .. code-block:: text
 
- Metrics: {'success_rate': 0.83, 'num_episodes': 156}
+ Metrics: {'success_rate': 0.72, 'num_episodes': 1024}
+ 
+ .. image:: ../../../images/lift_object_rl_parallel.gif
+    :align: center
+    :height: 400px
 
 
  Method 3: Batch Evaluation
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
  To evaluate multiple checkpoints in sequence, use ``eval_runner.py`` with a JSON config.
+ Here we evaluate the models you trained yourself.
+ The checkpoint path should be replaced with the timestamp of your training run in the ``logs/rsl_rl/generic_experiment/`` directory.
 
  **1. Create an evaluation config**
 

@@ -102,20 +108,30 @@ Create a file ``eval_config.json``:
  .. code-block:: json
 
  {
-   "policy_runner_args": {
-     "policy_type": "rsl_rl",
-     "num_steps": 5000,
-     "num_envs": 64,
-     "headless": true
-   },
-   "evaluations": [
+   "jobs": [
      {
-       "checkpoint_path": "logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_5999.pt",
-       "environment": "lift_object"
+       "name": "lift_object_model_1000",
+       "policy_type": "rsl_rl",
+       "num_episodes": 1024,
+       "arena_env_args": {
+         "environment": "lift_object",
+         "num_envs": 64
+       },
+       "policy_config_dict": {
+         "checkpoint_path": "logs/rsl_rl/generic_experiment/<timestamp>/model_1000.pt"
+       }
      },
      {
-       "checkpoint_path": "logs/rsl_rl/generic_experiment/2026-01-28_17-26-10/model_11999.pt",
-       "environment": "lift_object"
+       "name": "lift_object_model_1999",
+       "policy_type": "rsl_rl",
+       "num_episodes": 1024,
+       "arena_env_args": {
+         "environment": "lift_object",
+         "num_envs": 64
+       },
+       "policy_config_dict": {
+         "checkpoint_path": "logs/rsl_rl/generic_experiment/<timestamp>/model_1999.pt"
+       }
      }
    ]
  }

@@ -128,16 +144,18 @@ Create a file ``eval_config.json``:
 
  .. code-block:: text
 
- Evaluating checkpoint 1/2: model_5999.pt
- Metrics: {'success_rate': 0.72, 'num_episodes': 152}
+ ======================================================================
+ METRICS SUMMARY
+ ======================================================================
 
- Evaluating checkpoint 2/2: model_11999.pt
- Metrics: {'success_rate': 0.85, 'num_episodes': 156}
+ lift_object_model_1000:
+   num_episodes    1024
+   success_rate    0.6526
 
- Summary:
- ========================================
- model_5999.pt | Success: 72% | Episodes: 152
- model_11999.pt | Success: 85% | Episodes: 156
+ lift_object_model_1999:
+   num_episodes    1024
+   success_rate    0.7408
+ ======================================================================
 
 
  Understanding the Metrics
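The updated evaluation docs favor 1024-episode parallel runs over a dozen episodes "for more statistically significant results". A quick back-of-the-envelope sketch shows why: the confidence interval on a measured success rate shrinks with the square root of the episode count. The episode counts below come from the docs in this diff; the success counts are illustrative assumptions, chosen only to roughly match the reported rates.

```python
import math

def success_rate_ci(successes: int, episodes: int, z: float = 1.96):
    """Success rate with a normal-approximation 95% confidence half-width."""
    p = successes / episodes
    half_width = z * math.sqrt(p * (1.0 - p) / episodes)
    return p, half_width

# 12 episodes (single-env eval) vs 1024 episodes (parallel eval).
# Success counts here are hypothetical, not measured values.
rate_small, hw_small = success_rate_ci(10, 12)
rate_large, hw_large = success_rate_ci(758, 1024)
print(f"12 episodes:   {rate_small:.2f} +/- {hw_small:.2f}")
print(f"1024 episodes: {rate_large:.2f} +/- {hw_large:.2f}")
```

With 12 episodes the uncertainty is on the order of +/-0.2, so a "0.81" and a "0.72" are indistinguishable; with 1024 episodes it drops to roughly +/-0.03, which is why the batch summary can meaningfully rank the two checkpoints.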

isaaclab_arena/assets/retargeter_library.py

Lines changed: 2 additions & 2 deletions
@@ -75,7 +75,7 @@ def get_pipeline_builder(self, embodiment: object) -> Callable:
  @register_retargeter
  class FrankaKeyboardRetargeter(RetargetterBase):
      device = "keyboard"
-     embodiment = "franka"
+     embodiment = "franka_ik"
 
      def __init__(self):
          pass

@@ -87,7 +87,7 @@ def get_pipeline_builder(self, embodiment: object) -> Callable | None:
  @register_retargeter
  class FrankaSpaceMouseRetargeter(RetargetterBase):
      device = "spacemouse"
-     embodiment = "franka"
+     embodiment = "franka_ik"
 
      def __init__(self):
          pass
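The retargeter change above swaps the ``embodiment`` class attribute that ``@register_retargeter`` keys teleop devices on. A minimal, self-contained sketch of this decorator-registry pattern (an illustration of the idea only, not the actual isaaclab_arena implementation):

```python
# Minimal sketch of an attribute-keyed decorator registry, mirroring how
# @register_retargeter appears to associate (device, embodiment) pairs with
# retargeter classes. Illustrative only; not the real implementation.
RETARGETERS: dict[tuple[str, str], type] = {}

def register_retargeter(cls):
    # Key each class by its (device, embodiment) class attributes so callers
    # can look up the right retargeter for an input-device + robot pairing.
    RETARGETERS[(cls.device, cls.embodiment)] = cls
    return cls

@register_retargeter
class FrankaKeyboardRetargeter:
    device = "keyboard"
    embodiment = "franka_ik"

@register_retargeter
class FrankaSpaceMouseRetargeter:
    device = "spacemouse"
    embodiment = "franka_ik"

print(RETARGETERS[("keyboard", "franka_ik")].__name__)  # FrankaKeyboardRetargeter
```

Under this pattern, renaming the ``embodiment`` attribute (as this commit does) re-keys the lookup, so teleop for the IK-controlled Franka now resolves under "franka_ik".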
