Skip redundant GPU rendering for action-chunking policies #521
Description
Summary
Action-chunking policies (e.g., GR00T) output a chunk of 16-32 actions per inference call. The policy replays this chunk over subsequent env.step() calls, but only the first step of each chunk needs a fresh camera frame -- all intermediate steps discard the rendered result. Today, Isaac Lab renders on every env.step() regardless, wasting significant GPU time.
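The replay pattern can be sketched in a few lines (a hypothetical minimal policy, not the actual ActionChunkingClientSidePolicy; only the `needs_obs_next_step()` hook mirrors this PR):

```python
from collections import deque

class ChunkReplayPolicy:
    """Minimal sketch of an action-chunking policy: one expensive inference
    call yields chunk_length actions, which are replayed over the following
    env.step() calls without consuming new observations."""

    def __init__(self, chunk_length=16):
        self.chunk_length = chunk_length
        self._buffer = deque()

    def get_action(self, obs):
        if not self._buffer:
            # Only this step actually consumes a fresh observation.
            self._buffer.extend(self._infer_chunk(obs))
        return self._buffer.popleft()

    def needs_obs_next_step(self):
        # A fresh camera frame is only needed when the buffer will be
        # empty on the next call, i.e. right before the next inference.
        return len(self._buffer) == 0

    def _infer_chunk(self, obs):
        # Placeholder for the real model call (e.g. a remote GR00T server).
        return [f"action_{i}" for i in range(self.chunk_length)]
```

With `chunk_length=16`, `needs_obs_next_step()` is `True` on only 1 of every 16 steps, which is exactly the signal the render-skip logic below exploits.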
This PR is a proof-of-concept demonstrating two optimizations that eliminate unnecessary rendering and significantly improve throughput. The implementation works but relies on a workaround (mutating cfg.sim.render_interval at runtime). We propose a cleaner long-term API change to Isaac Lab's env.step().
Branch
hkang/render-opt (based on latest main)
Benchmark Results
Hardware: NVIDIA L20 (48 GB), single GPU; remote policy setup (GR00T server + Isaac Sim client on the same node).
Configuration: ActionChunkingClientSidePolicy with a remote GR00T server, gr1_open_microwave task (chunk_length=16), 8 envs, 100 steps.
- With render (inference step): ~4-6 step/s -- this is the step where `needs_obs_next_step()` returns `True`, so `render_interval` is restored to normal and `env.step()` renders a camera frame for the next inference
- Without render (chunk-replay step): ~9.5 step/s -- this is the step where the policy is consuming buffered actions from the chunk, `needs_obs_next_step()` returns `False`, so `render_interval` is set to a huge value and `env.step()` skips rendering entirely
The ~2x speed difference confirms that render-skipping is working. For a chunk_length of 16, only 1 out of every 16 steps needs to render, so the majority of steps run at the faster rate.
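A quick back-of-the-envelope check of the amortized throughput implied by these numbers (using the midpoint of the measured rates; actual numbers will vary with hardware and scene):

```python
# Amortized throughput for chunk_length = 16:
# 1 slow (rendering) step + 15 fast (render-skipped) steps per chunk.
chunk_length = 16
slow_rate = 5.0   # step/s with render (midpoint of the measured ~4-6)
fast_rate = 9.5   # step/s without render (measured)

time_per_chunk = 1 / slow_rate + (chunk_length - 1) / fast_rate
amortized = chunk_length / time_per_chunk
print(f"{amortized:.1f} step/s amortized")  # ~9.0 step/s
```

So the amortized rate sits close to the fast rate, since the slow rendering step is paid only once per chunk.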
How to Reproduce
Prerequisites: Two Docker containers on the same GPU node -- one for the GR00T policy server, one for the Isaac Sim client.
1. Start GR00T policy server
```bash
# Build & run the server container (from repo root)
bash docker/run_gr00t_server.sh \
  -m /path/to/models \
  --port 5555 \
  --policy_type isaaclab_arena_gr00t.policy.gr00t_remote_policy.Gr00tRemoteServerSidePolicy \
  --policy_config_yaml_path isaaclab_arena_gr00t/policy/config/gr1_manip_gr00t_closedloop_config.yaml
```

Wait until you see: `[PolicyServer] listening on tcp://0.0.0.0:5555`
2. Run the client (Isaac Sim container)
```bash
# Inside the isaaclab_arena Docker container
/isaac-sim/python.sh -m isaaclab_arena.evaluation.policy_runner \
  --headless \
  --enable_cameras \
  --policy_type isaaclab_arena.policy.action_chunking_client.ActionChunkingClientSidePolicy \
  --remote_host $(hostname) \
  --remote_port 5555 \
  --num_envs 8 \
  --num_steps 100 \
  gr1_open_microwave
```

You should see a repeating throughput pattern every 16 steps (= chunk_length): the first step is slow (~4-6 step/s) because `needs_obs_next_step()` returned `True` on the previous step, so this `env.step()` renders a camera frame. The remaining 15 steps are fast (~9-10 step/s) because the policy is replaying buffered actions and `needs_obs_next_step()` returns `False`, so `env.step()` skips rendering.
3. Compare with baseline
To see the baseline (without render optimization), revert the render_interval change in isaaclab_arena_manager_based_env.py and remove the render-skip logic in policy_runner.py.
What Changed and Why It's a Workaround
This PR modifies 5 files (42 lines added):
Optimization 1: Render once per env.step() instead of twice
In isaaclab_arena_manager_based_env.py, we set render_interval = decimation in __post_init__(). With default settings (decimation=4, render_interval=2), Isaac Lab renders twice per env.step(), but only the final frame is consumed by observation_manager.compute(). This change reduces it to 1 render per step. This is clean and correct.
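To make the accounting concrete, here is a simplified model of the render count per `env.step()` (the real check inside Isaac Lab's physics loop is an implementation detail; this only illustrates the arithmetic):

```python
def renders_per_env_step(decimation: int, render_interval: int) -> int:
    """Simplified model: the physics loop runs `decimation` substeps per
    env.step() and renders on every substep divisible by render_interval."""
    return sum(1 for substep in range(1, decimation + 1)
               if substep % render_interval == 0)

# Default config: two renders per env.step(), only the last is consumed.
print(renders_per_env_step(decimation=4, render_interval=2))  # 2
# Optimization 1: render_interval = decimation -> exactly one render.
print(renders_per_env_step(decimation=4, render_interval=4))  # 1
```

Only the final render feeds `observation_manager.compute()`, so the intermediate one is pure waste.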
Optimization 2: Skip rendering when the policy doesn't need observations
This is the workaround part. We add PolicyBase.needs_obs_next_step() -> bool so action-chunking policies can signal "I'm replaying buffered actions, don't need a fresh camera frame." In policy_runner.py, we toggle cfg.sim.render_interval at runtime:
```python
# Current workaround: mutate config before each env.step()
_NO_RENDER = 2**31 - 1  # sentinel: effectively "never render"
if not policy.needs_obs_next_step():
    unwrapped.cfg.sim.render_interval = _NO_RENDER  # skip render
else:
    unwrapped.cfg.sim.render_interval = _render_interval  # restore original
obs, _, terminated, truncated, _ = env.step(actions)
```

Why this is ugly:
- We mutate `cfg.sim.render_interval` (a config value) at runtime as a side-channel to control rendering behavior
- This only works because `render_interval` happens to be read inside the physics loop as a modulo condition -- it's an implementation detail, not an API contract
- If Isaac Lab changes how `render_interval` is used internally, this breaks silently
Proposed Clean Solution (Requires Isaac Lab Change)
The right fix is for Isaac Lab's ManagerBasedRLEnv.step() to accept a render parameter:
```python
# Proposed Isaac Lab API
def step(self, action, render: bool = True):
    is_rendering = render and (self.sim.has_gui() or self.sim.has_rtx_sensors())
    ...
```

Then the rollout loop becomes clean and explicit:
```python
actions = policy.get_action(env, obs)
obs, _, terminated, truncated, _ = env.step(
    actions,
    render=policy.needs_obs_next_step(),
)
```

This would:
- Eliminate runtime config mutation
- Make the semantics explicit: the caller decides whether to render
- Leave existing reset re-rendering (`num_rerenders_on_reset`) unchanged
Headless vs. Non-Headless Consideration
Developers should consider whether render-skipping should only apply in headless mode. In non-headless (GUI) mode, skipping renders would freeze the viewport on most steps, making the simulation appear broken. A possible guard:
```python
# Only skip renders when running headless
if not policy.needs_obs_next_step() and not env.sim.has_gui():
    ...  # skip rendering
```

This ensures:
- Headless evaluation/benchmarking gets the full performance benefit
- Interactive/GUI sessions always render for visual feedback
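The decision can be isolated into a small pure helper (`should_render` is a hypothetical function, not part of this PR or Isaac Lab; shown only to make the truth table explicit):

```python
def should_render(policy_needs_obs: bool, has_gui: bool) -> bool:
    """Render whenever the policy needs a fresh observation, and always
    render in GUI mode so the viewport stays live for the user."""
    return policy_needs_obs or has_gui

# Headless chunk-replay step is the only case that skips rendering.
print(should_render(policy_needs_obs=False, has_gui=False))  # False
print(should_render(policy_needs_obs=True, has_gui=False))   # True
print(should_render(policy_needs_obs=False, has_gui=True))   # True
```

Keeping the predicate in one place also makes it easy to unit-test the headless/GUI behavior independently of the simulator.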