Tune Shadow-Hand Vision default iters from 50k to 5k #5837
AntoineRichard wants to merge 1 commit into
The Shadow-Hand Vision (in-hand cube reposing with camera obs) PPO agents ship with a default training schedule of 50000 iterations, which is a 10-30 hour wall-clock job on a current GPU. Empirical training curves show convergence well before 5k iterations, so the 50k default amounts to a long no-op tail that scares operators away from running the example as shipped. Drop the default to 5000 for both training frameworks (ShadowHandVisionFFPPORunnerCfg for rsl_rl, max_epochs in rl_games_ppo_vision_cfg.yaml). Users who still want the long schedule can pass --max_iterations 50000 on the train.py CLI; both scripts already plumb that flag through to the agent config.
🤖 Automated Code Review
Summary
This PR appropriately reduces the default training iterations for the Shadow-Hand Vision task from 50k to 5k, providing a much better out-of-box experience for users exploring this example.
Review
✅ Configuration Consistency
Both RL frameworks (rsl_rl and rl_games) are updated in lockstep, and the author correctly identified that skrl has no vision config requiring changes.
✅ Backward Compatibility
The existing --max_iterations CLI flag allows users who need longer training schedules to easily restore the previous behavior. No API changes or breaking modifications.
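The override behavior described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual train.py code: `RunnerCfg`, `parse_args`, and `apply_overrides` are hypothetical stand-ins, and only the `--max_iterations` flag name and the 5000 default come from this PR.

```python
import argparse
from dataclasses import dataclass


@dataclass
class RunnerCfg:
    """Hypothetical stand-in for an agent config; not the real Isaac Lab class."""

    max_iterations: int = 5000  # new default introduced by this PR


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # default=None means "flag not given", so the config default is kept.
    parser.add_argument("--max_iterations", type=int, default=None)
    return parser.parse_args(argv)


def apply_overrides(cfg, args):
    # Only overwrite the config value when the user passed the flag.
    if args.max_iterations is not None:
        cfg.max_iterations = args.max_iterations
    return cfg


default_cfg = apply_overrides(RunnerCfg(), parse_args([]))
long_cfg = apply_overrides(RunnerCfg(), parse_args(["--max_iterations", "50000"]))
print(default_cfg.max_iterations, long_cfg.max_iterations)
```

Using `None` as the argparse default is what lets the config file remain the single source of truth for the default value.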
✅ Documentation
The changelog fragment is well-written and follows the project's established format. It clearly explains:
- What changed
- Why it changed (10-30h wall-clock to reasonable duration)
- How to opt into the old behavior
✅ Code Quality
- Clean diff affecting only the intended constants
- No unrelated changes
- Pre-commit checks pass
Notes
The decision not to add tests is reasonable here—verifying convergence claims would require running the actual multi-hour training job, which defeats the purpose of this PR. The change is to default values only; the underlying training logic remains untested by this PR but is unaffected.
Verdict: LGTM 👍
This is a well-scoped quality-of-life improvement. The 10× reduction in default iterations will make the Shadow-Hand Vision example much more accessible for new users while preserving full flexibility for those who need extended training.
Greptile Summary
Reduces the default training budget for the Shadow-Hand Vision task from 50 000 to 5 000 iterations across both
Confidence Score: 5/5
Safe to merge: both changes are isolated single-value edits to default config constants with no effect on code paths, APIs, or other tasks. The PR touches exactly two numeric literals across two config files and adds a changelog entry. All surrounding hyperparameters (save cadence, learning rate, entropy coefficient, etc.) remain consistent with the new budget. The existing CLI escape hatch covers users who want the old 50k schedule. No logic, no tests, no APIs are affected. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Shadow-Hand Vision Training] --> B{Framework?}
B -->|rsl_rl| C[ShadowHandVisionFFPPORunnerCfg\nmax_iterations: 50000 → 5000\nsave_interval: 250]
B -->|rl_games| D[rl_games_ppo_vision_cfg.yaml\nmax_epochs: 50000 → 5000\nsave_frequency: 200]
B -->|skrl| E[No vision config — unchanged]
C --> F[~20 checkpoints saved]
D --> G[~25 checkpoints saved]
F --> H[Override via --max_iterations 50000]
G --> H
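The checkpoint counts in the flowchart follow from dividing the iteration budget by the save cadence. A minimal sketch of that arithmetic, assuming one checkpoint per full save interval and ignoring any extra initial or final checkpoint a framework may write (`checkpoints_saved` is a hypothetical helper, not part of either library):

```python
def checkpoints_saved(total_iterations: int, save_every: int) -> int:
    """Number of full save intervals that fit in the training budget."""
    return total_iterations // save_every


# Values taken from the flowchart above.
rsl_rl_checkpoints = checkpoints_saved(5000, 250)    # save_interval: 250
rl_games_checkpoints = checkpoints_saved(5000, 200)  # save_frequency: 200
print(rsl_rl_checkpoints, rl_games_checkpoints)
```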
Description
The Shadow-Hand Vision example (in-hand cube reposing with camera observations) currently ships with a default training schedule of 50000 iterations, which is a 10-30 hour wall-clock job on a current GPU. Empirical training curves show convergence well before 5k iterations, so the 50k default is a long no-op tail that discourages operators from running the example as shipped.
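As a rough guide to what the shorter schedule costs, the quoted 10-30 hour range can be rescaled to the new iteration count. This sketch assumes wall-clock time grows linearly with iterations, which the PR does not state; `scaled_hours` is a hypothetical helper.

```python
def scaled_hours(hours_at_old: float, old_iters: int, new_iters: int) -> float:
    """Rescale a wall-clock estimate, assuming constant time per iteration."""
    return hours_at_old * new_iters / old_iters


# 10-30 hours at 50000 iterations, rescaled to 5000 iterations.
low_hours = scaled_hours(10, 50000, 5000)
high_hours = scaled_hours(30, 50000, 5000)
print(low_hours, high_hours)
```

Under that assumption the new default lands at roughly 1-3 hours; actual timings depend on hardware and environment count.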
This PR drops the default to 5000 for both training frameworks that have a vision config:
- rsl_rl_ppo_cfg.py: ShadowHandVisionFFPPORunnerCfg.max_iterations 50000 → 5000
- rl_games_ppo_vision_cfg.yaml: max_epochs 50000 → 5000
- skrl has no shadow_hand vision config so no change there.

Users who want the prior long schedule can still pass --max_iterations 50000 on the CLI; both scripts/reinforcement_learning/rsl_rl/train.py and scripts/reinforcement_learning/rl_games/train.py already plumb that flag through to the agent config (rsl_rl:78, rl_games:73).

Reproducing the example with the new default:

    ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
        --task Isaac-Repose-Cube-Shadow-Vision-Direct-v0 \
        --num_envs 4096 --headless

To restore the old long schedule:

    ./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py \
        --task Isaac-Repose-Cube-Shadow-Vision-Direct-v0 \
        --num_envs 4096 --headless --max_iterations 50000

Fixes # (no issue)
Type of change
--max_iterations CLI flag)
Screenshots
N/A — config-default change only.
Checklist
- pre-commit checks with ./isaaclab.sh --format
- source/<pkg>/changelog.d/ for every touched package (do not edit CHANGELOG.rst or bump extension.toml; CI handles that)
- CONTRIBUTORS.md or my name already exists there

Tests not added: this PR only changes default values of two config constants. The convergence claim that motivates the change is an empirical observation; landing a test for it would mean running the full training, which is exactly the multi-hour cost this PR exists to avoid. Happy to add a smoke-level configclass loadability test if a reviewer wants one.
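A smoke-level loadability check of the kind offered above might look like the sketch below. It runs against a hypothetical stand-in dataclass (`VisionRunnerCfgStandIn`) rather than the real ShadowHandVisionFFPPORunnerCfg, whose import path is not shown in this PR; `smoke_check` is likewise an illustrative helper, and the field values mirror those quoted in the review flowchart.

```python
from dataclasses import dataclass


@dataclass
class VisionRunnerCfgStandIn:
    """Hypothetical stand-in for the rsl_rl vision runner config."""

    max_iterations: int = 5000
    save_interval: int = 250


def smoke_check(cfg_cls):
    # Loadability: constructing the config with defaults must not raise.
    cfg = cfg_cls()
    # Sanity: a positive iteration budget and a save cadence that fits inside it.
    assert isinstance(cfg.max_iterations, int) and cfg.max_iterations > 0
    assert 0 < cfg.save_interval <= cfg.max_iterations
    return cfg


cfg = smoke_check(VisionRunnerCfgStandIn)
print(cfg)
```

A check like this costs milliseconds and says nothing about convergence; it only guards against a config that fails to construct or has an inconsistent save cadence.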