Adds public v1.0 benchmark schema for Odin (and others) by AntoineRichard · Pull Request #5840 · isaac-sim/IsaacLab

AntoineRichard · 2026-05-28T15:32:55Z

Description

Promote the JSON bundle schema produced by the standalone benchmark scripts under scripts/benchmarks/ into a real public-API module, isaaclab.benchmark.schema. Until now there was no single place in lab that defined the shape of training.json / startup.json, even though three lab scripts emit it and downstream tooling (e.g. the in-tree Odin evaluation harness, but also any future regression-comparison or CI-perf tool) consumes it.

This PR is the schema foundation for a follow-up that rewrites benchmark_rsl_rl.py, benchmark_skrl.py, and benchmark_startup.py to emit schema-v1 bundles via this module. Kept separate so the type surface lands first and the script rewrite reads as a pure switch-over.

What's in here

New public package isaaclab.benchmark:

isaaclab/benchmark/schema.py — frozen dataclasses for TrainingBundle, StartupBundle, and their building blocks (Versions, Hardware, Runtime, Resources, Learning, GpuDeviceInfo, MeanStd, MeanStdPeak, StartupPhase*, CProfileFunction, RunIdentity, StartupRunIdentity). Plus SCHEMA_VERSION = "1.0" and write_bundle_file().
isaaclab/benchmark/__init__.py — re-exports the public surface, so callers can write from isaaclab.benchmark import TrainingBundle.

Recorder peak fields:

GPUInfoRecorder now reports per-device peak memory and peak GPU utilisation alongside the existing mean/std rows ("GPU Memory Used peak", "GPU Utilization peak").
MemoryInfoRecorder reports peak RSS / VMS / USS the same way.
Existing mean/std rows are unchanged.
Peak rows are always emitted (initialised to 0.0) so consumers see a consistent key set regardless of whether any sample was recorded.

Tests:

source/isaaclab/test/benchmark/test_schema.py — round-trip tests for TrainingBundle and StartupBundle, plus a sanity check that the package re-exports stay in sync with schema.py.
source/isaaclab/test/benchmark/test_recorders.py — extended with peak-tracking and "peak is zero before any record" cases for both GPU and memory recorders.

Changelog fragment: source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst.

Compatibility

Strictly additive. No public API removed, no existing recorder rows renamed or removed.
The legacy benchmark scripts (scripts/benchmarks/benchmark_*.py) are untouched in this PR — they keep emitting their existing per-backend output format until the follow-up rewires them.

Fixes # (no issue)

Type of change

New feature (non-breaking change which adds functionality)

Screenshots

N/A — types + recorder fields, no UI surface.

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation (benchmarking.rst covering the public schema ships with the script-rewrite follow-up where it's user-facing; this PR is the type-only foundation)
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
I have added my name to the CONTRIBUTORS.md or my name already exists there

Promote the JSON bundle schema produced by the standalone benchmark scripts under scripts/benchmarks/ into a real public-API module, isaaclab.benchmark.schema. Until now there was no single place in lab that defined the shape of training.json / startup.json, even though three lab scripts emit it and downstream tooling (e.g. the in-tree Odin evaluation harness) is starting to consume it. The module ships frozen dataclasses for TrainingBundle, StartupBundle, and all their building blocks, plus a small write_bundle_file helper that serialises any dataclass tree as schema-v1 JSON. The package __init__ re-exports the public surface so callers can write `from isaaclab.benchmark import TrainingBundle`. This commit also extends GPUInfoRecorder and MemoryInfoRecorder to report per-device peak alongside the existing mean/std rows. The peak rows are always emitted (initialised to 0.0) so dashboards see a consistent key set regardless of whether any sample was recorded. Existing rows are unchanged. The benchmark scripts themselves continue to use the legacy output format on develop today; a follow-up PR rewrites them to emit schema-v1 bundles directly via this module.

isaaclab-review-bot

Code Review: Add public v1.0 benchmark schema

Summary

This PR introduces a well-designed public schema module (isaaclab.benchmark.schema) for benchmark bundles, along with recorder enhancements to track peak memory/utilization values. The implementation is clean, well-documented, and follows Python best practices.

Strengths

Excellent API design: Frozen dataclasses provide immutability and clear contracts for benchmark data structures
Proper versioning: SCHEMA_VERSION = "1.0" enables forward compatibility
Good separation of concerns: Schema types are independent from benchmark scripts, enabling clean evolution
Comprehensive type hints: Use of Literal types for Framework, Backend, RunStatus provides type safety
Thorough test coverage: Round-trip tests, peak tracking tests, and re-export consistency checks

Suggestions (non-blocking)

write_bundle_file robustness (schema.py:251):
```
os.makedirs(os.path.dirname(os.path.abspath(path)) or ".", exist_ok=True)
```
The or "." fallback for single-component paths works but is subtle. A brief comment explaining this edge case would help future maintainers.
Consider atomic writes: If a process is interrupted during write_bundle_file, the output file could be corrupted. Writing to a temporary file and using os.rename() would provide atomicity.
Optional: Dataclass validation: A __post_init__ could validate invariants like peak >= mean for MeanStdPeak or duration_s >= 0, catching data errors early.
Test count assertion fragility (test_recorders.py:246):
```
assert len(measurement_data.measurements) in (4 * num_gpus, 8 * num_gpus)
```
This will need updates if more fields are added. Consider asserting on specific required field names instead.

Verdict

Approve — This is a solid foundation PR. The code quality is high, the design is extensible, and the changes are strictly additive with no breaking changes to existing APIs. Ready to merge.

Automated review by Isaac Lab Review Bot

greptile-apps · 2026-05-28T15:38:44Z

Greptile Summary

This PR introduces isaaclab.benchmark.schema — a new public module of frozen dataclasses (TrainingBundle, StartupBundle, and their building blocks) plus write_bundle_file() — as a schema foundation for future rewriting of the standalone benchmark scripts. It also extends GPUInfoRecorder and MemoryInfoRecorder with per-device running-max peak fields alongside the existing Welford mean/std rows.

New isaaclab.benchmark package: schema.py defines the full v1.0 JSON bundle shape; __init__.py re-exports the same surface so callers can import directly from the package root.
Recorder peak additions: MemoryInfoRecorder always emits RSS/VMS/USS peak rows; GPUInfoRecorder always emits GPU memory peak but emits GPU utilization peak only when pynvml/nvidia-smi is available — inconsistent with the stated "always emitted" contract.
Resources.gpu_util_pct uses MeanStd (no peak field) while gpu_mem_gb uses MeanStdPeak, leaving the recorder's collected peak utilization data with nowhere to go in the schema.

Confidence Score: 3/5

Safe to merge for the schema types, but the GPU utilization peak emission gap should be fixed before downstream consumers are built against this contract.

The recorder emits GPU memory peak unconditionally but gates GPU utilization peak on nvml/smi availability, directly contradicting the stated behavioral promise of a consistent key set. Environments without nvml/smi produce structurally different output, which will silently break any consumer that reads the utilization peak row unconditionally.

source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py — the utilization peak emission block needs to move outside the nvml/smi conditional, mirroring the memory peak pattern.

Important Files Changed

Filename	Overview
source/isaaclab/isaaclab/benchmark/schema.py	New public schema module with frozen dataclasses for TrainingBundle/StartupBundle and write_bundle_file(); Resources.gpu_util_pct uses MeanStd instead of MeanStdPeak and a dead-code `or "."` branch exists in makedirs.
source/isaaclab/isaaclab/benchmark/init.py	Re-exports the full public surface from schema.py; all matches imports correctly.
source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py	Adds per-device peak memory and utilization tracking; memory peak is always emitted but utilization peak is conditionally emitted only when nvml/smi is available, breaking the stated consistent-key-set contract.
source/isaaclab/isaaclab/test/benchmark/recorders/record_memory_info.py	Adds RSS/VMS/USS peak tracking via running max; all three peaks are always emitted, consistent with the memory peak pattern.
source/isaaclab/test/benchmark/test_schema.py	Round-trip tests for TrainingBundle and StartupBundle plus a re-export consistency check; coverage is adequate for the schema surface.
source/isaaclab/test/benchmark/test_recorders.py	Extends recorder tests with peak-tracking and zero-before-record cases; GPU utilization peak is not tested for the always-emitted behavior, mirroring the emission gap in the recorder.
source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst	Changelog fragment documenting new schema module and recorder peak fields; lists bare measurement names that omit the mandatory "GPU " prefix.

Comments Outside Diff (3)

source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py, line 298-327 (link)

GPU utilization peak is not always emitted — breaks the stated consistent-key-set contract

The PR description and the in-code comment on line 289 both promise "Peak rows are always emitted (initialised to 0.0) so consumers see a consistent key set regardless of whether any sample was recorded." That holds for Memory Used peak, which is appended unconditionally outside the if "memory_used_mean_bytes" in runtime block. However, Utilization peak is appended inside if "utilization_mean_percent" in runtime, which is only entered when pynvml or nvidia-smi is available. In an environment without either (e.g., a CPU-only or sandboxed CI machine), memory peak is emitted but utilization peak is entirely absent from the output. Downstream consumers that assume a fixed key set and read "GPU Utilization peak" unconditionally will either KeyError or silently produce wrong numbers.

To match the memory-peak pattern, the Utilization peak SingleMeasurement should be moved outside the conditional and read from self._util_peak[i] directly — in the same way Memory Used peak reads from self._mem_peak[i] — so the row is always present with value 0.0 when no utilization data was recorded.
source/isaaclab/isaaclab/benchmark/schema.py, line 364 (link)

Dead-code or "." branch in os.makedirs call

os.path.abspath(path) always returns an absolute path, so os.path.dirname() applied to it will always return a non-empty string (at minimum / on POSIX). The or "." fallback is therefore unreachable. Removing it makes the intent clearer and avoids misleading future readers.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst, line 27 (link)

Changelog lists bare "Memory Used peak" and "Utilization peak" — actual measurement names include the "GPU " prefix

For a single GPU the recorder emits f"{prefix}Memory Used peak" where prefix = "GPU ", so the real name is "GPU Memory Used peak". For multi-GPU setups it is "GPU {i} Memory Used peak". The changelog's bare names don't match what consumers would search for in the output.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "Add public v1.0 benchmark schema and rec..." | Re-trigger Greptile}

greptile-apps · 2026-05-28T15:38:52Z

+            :class:`TrainingBundle` or :class:`StartupBundle`; any frozen
+            dataclass tree composed of primitives, lists, and dicts works.


Resources.gpu_util_pct has no peak field while gpu_mem_gb does — asymmetry with the recorder

The GPUInfoRecorder now tracks _util_peak per device and the Resources dataclass was the natural place to surface it. gpu_mem_gb uses MeanStdPeak, but gpu_util_pct still uses MeanStd, so the peak utilisation data collected by the recorder cannot be round-tripped through the schema. Consumers who want the peak GPU utilisation will have no field to read from TrainingBundle.

Suggested change

:class:`TrainingBundle` or :class:`StartupBundle`; any frozen

dataclass tree composed of primitives, lists, and dicts works.

gpu_util_pct: MeanStdPeak

gpu_mem_gb: MeanStdPeak

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

github-actions Bot added the isaac-lab Related to Isaac Lab team label May 28, 2026

AntoineRichard requested a review from kellyguo11 May 28, 2026 15:33

isaaclab-review-bot Bot reviewed May 28, 2026

View reviewed changes

AntoineRichard changed the title ~~Add public v1.0 benchmark schema (isaaclab.benchmark.schema)~~ Adds public v1.0 benchmark schema for Odin (and others) May 28, 2026

greptile-apps Bot reviewed May 28, 2026

View reviewed changes

AntoineRichard mentioned this pull request May 28, 2026

Emit v1.0 schema bundles from benchmark scripts #5842

Draft

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds public v1.0 benchmark schema for Odin (and others)#5840

Adds public v1.0 benchmark schema for Odin (and others)#5840
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/feat/benchmark-schema

AntoineRichard commented May 28, 2026

Uh oh!

isaaclab-review-bot Bot left a comment

Uh oh!

greptile-apps Bot commented May 28, 2026 •

edited

Loading

Comments Outside Diff (3)

Uh oh!

greptile-apps Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		:class:`TrainingBundle` or :class:`StartupBundle`; any frozen
		dataclass tree composed of primitives, lists, and dicts works.

Conversation

AntoineRichard commented May 28, 2026

Description

What's in here

Compatibility

Type of change

Screenshots

Checklist

Uh oh!

isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

Code Review: Add public v1.0 benchmark schema

Summary

Strengths

Suggestions (non-blocking)

Verdict

Uh oh!

greptile-apps Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Comments Outside Diff (3)

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented May 28, 2026 •

edited

Loading