Skip to content

Adds public v1.0 benchmark schema for Odin (and others)#5840

Open
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/feat/benchmark-schema
Open

Adds public v1.0 benchmark schema for Odin (and others)#5840
AntoineRichard wants to merge 1 commit into
isaac-sim:developfrom
AntoineRichard:antoiner/feat/benchmark-schema

Conversation

@AntoineRichard
Copy link
Copy Markdown
Collaborator

Description

Promote the JSON bundle schema produced by the standalone benchmark scripts under scripts/benchmarks/ into a real public-API module, isaaclab.benchmark.schema. Until now there was no single place in lab that defined the shape of training.json / startup.json, even though three lab scripts emit it and downstream tooling (e.g. the in-tree Odin evaluation harness, but also any future regression-comparison or CI-perf tool) consumes it.

This PR is the schema foundation for a follow-up that rewrites benchmark_rsl_rl.py, benchmark_skrl.py, and benchmark_startup.py to emit schema-v1 bundles via this module. Kept separate so the type surface lands first and the script rewrite reads as a pure switch-over.

What's in here

New public package isaaclab.benchmark:

  • isaaclab/benchmark/schema.py — frozen dataclasses for TrainingBundle, StartupBundle, and their building blocks (Versions, Hardware, Runtime, Resources, Learning, GpuDeviceInfo, MeanStd, MeanStdPeak, StartupPhase*, CProfileFunction, RunIdentity, StartupRunIdentity). Plus SCHEMA_VERSION = "1.0" and write_bundle_file().
  • isaaclab/benchmark/__init__.py — re-exports the public surface, so callers can write from isaaclab.benchmark import TrainingBundle.

Recorder peak fields:

  • GPUInfoRecorder now reports per-device peak memory and peak GPU utilisation alongside the existing mean/std rows ("GPU Memory Used peak", "GPU Utilization peak").
  • MemoryInfoRecorder reports peak RSS / VMS / USS the same way.
  • Existing mean/std rows are unchanged.
  • Peak rows are always emitted (initialised to 0.0) so consumers see a consistent key set regardless of whether any sample was recorded.

Tests:

  • source/isaaclab/test/benchmark/test_schema.py — round-trip tests for TrainingBundle and StartupBundle, plus a sanity check that the package re-exports stay in sync with schema.py.
  • source/isaaclab/test/benchmark/test_recorders.py — extended with peak-tracking and "peak is zero before any record" cases for both GPU and memory recorders.

Changelog fragment: source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst.

Compatibility

  • Strictly additive. No public API removed, no existing recorder rows renamed or removed.
  • The legacy benchmark scripts (scripts/benchmarks/benchmark_*.py) are untouched in this PR — they keep emitting their existing per-backend output format until the follow-up rewires them.

Fixes # (no issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

N/A — types + recorder fields, no UI surface.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation (benchmarking.rst covering the public schema ships with the script-rewrite follow-up where it's user-facing; this PR is the type-only foundation)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Promote the JSON bundle schema produced by the standalone benchmark
scripts under scripts/benchmarks/ into a real public-API module,
isaaclab.benchmark.schema. Until now there was no single place in
lab that defined the shape of training.json / startup.json, even
though three lab scripts emit it and downstream tooling (e.g. the
in-tree Odin evaluation harness) is starting to consume it.

The module ships frozen dataclasses for TrainingBundle, StartupBundle,
and all their building blocks, plus a small write_bundle_file helper
that serialises any dataclass tree as schema-v1 JSON. The package
__init__ re-exports the public surface so callers can write
`from isaaclab.benchmark import TrainingBundle`.

This commit also extends GPUInfoRecorder and MemoryInfoRecorder to
report per-device peak alongside the existing mean/std rows. The
peak rows are always emitted (initialised to 0.0) so dashboards see
a consistent key set regardless of whether any sample was recorded.
Existing rows are unchanged.

The benchmark scripts themselves continue to use the legacy output
format on develop today; a follow-up PR rewrites them to emit
schema-v1 bundles directly via this module.
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label May 28, 2026
@AntoineRichard AntoineRichard requested a review from kellyguo11 May 28, 2026 15:33
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Add public v1.0 benchmark schema

Summary

This PR introduces a well-designed public schema module (isaaclab.benchmark.schema) for benchmark bundles, along with recorder enhancements to track peak memory/utilization values. The implementation is clean, well-documented, and follows Python best practices.

Strengths

  • Excellent API design: Frozen dataclasses provide immutability and clear contracts for benchmark data structures
  • Proper versioning: SCHEMA_VERSION = "1.0" enables forward compatibility
  • Good separation of concerns: Schema types are independent from benchmark scripts, enabling clean evolution
  • Comprehensive type hints: Use of Literal types for Framework, Backend, RunStatus provides type safety
  • Thorough test coverage: Round-trip tests, peak tracking tests, and re-export consistency checks

Suggestions (non-blocking)

  1. write_bundle_file robustness (schema.py:251):

    os.makedirs(os.path.dirname(os.path.abspath(path)) or ".", exist_ok=True)

    The or "." fallback for single-component paths works but is subtle. A brief comment explaining this edge case would help future maintainers.

  2. Consider atomic writes: If a process is interrupted during write_bundle_file, the output file could be corrupted. Writing to a temporary file and using os.rename() would provide atomicity.

  3. Optional: Dataclass validation: A __post_init__ could validate invariants like peak >= mean for MeanStdPeak or duration_s >= 0, catching data errors early.

  4. Test count assertion fragility (test_recorders.py:246):

    assert len(measurement_data.measurements) in (4 * num_gpus, 8 * num_gpus)

    This will need updates if more fields are added. Consider asserting on specific required field names instead.

Verdict

Approve — This is a solid foundation PR. The code quality is high, the design is extensible, and the changes are strictly additive with no breaking changes to existing APIs. Ready to merge.


Automated review by Isaac Lab Review Bot

@AntoineRichard AntoineRichard changed the title Add public v1.0 benchmark schema (isaaclab.benchmark.schema) Adds public v1.0 benchmark schema for Odin (and others) May 28, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR introduces isaaclab.benchmark.schema — a new public module of frozen dataclasses (TrainingBundle, StartupBundle, and their building blocks) plus write_bundle_file() — as a schema foundation for future rewriting of the standalone benchmark scripts. It also extends GPUInfoRecorder and MemoryInfoRecorder with per-device running-max peak fields alongside the existing Welford mean/std rows.

  • New isaaclab.benchmark package: schema.py defines the full v1.0 JSON bundle shape; __init__.py re-exports the same surface so callers can import directly from the package root.
  • Recorder peak additions: MemoryInfoRecorder always emits RSS/VMS/USS peak rows; GPUInfoRecorder always emits GPU memory peak but emits GPU utilization peak only when pynvml/nvidia-smi is available — inconsistent with the stated "always emitted" contract.
  • Resources.gpu_util_pct uses MeanStd (no peak field) while gpu_mem_gb uses MeanStdPeak, leaving the recorder's collected peak utilization data with nowhere to go in the schema.

Confidence Score: 3/5

Safe to merge for the schema types, but the GPU utilization peak emission gap should be fixed before downstream consumers are built against this contract.

The recorder emits GPU memory peak unconditionally but gates GPU utilization peak on nvml/smi availability, directly contradicting the stated behavioral promise of a consistent key set. Environments without nvml/smi produce structurally different output, which will silently break any consumer that reads the utilization peak row unconditionally.

source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py — the utilization peak emission block needs to move outside the nvml/smi conditional, mirroring the memory peak pattern.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/benchmark/schema.py New public schema module with frozen dataclasses for TrainingBundle/StartupBundle and write_bundle_file(); Resources.gpu_util_pct uses MeanStd instead of MeanStdPeak and a dead-code or "." branch exists in makedirs.
source/isaaclab/isaaclab/benchmark/init.py Re-exports the full public surface from schema.py; all matches imports correctly.
source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py Adds per-device peak memory and utilization tracking; memory peak is always emitted but utilization peak is conditionally emitted only when nvml/smi is available, breaking the stated consistent-key-set contract.
source/isaaclab/isaaclab/test/benchmark/recorders/record_memory_info.py Adds RSS/VMS/USS peak tracking via running max; all three peaks are always emitted, consistent with the memory peak pattern.
source/isaaclab/test/benchmark/test_schema.py Round-trip tests for TrainingBundle and StartupBundle plus a re-export consistency check; coverage is adequate for the schema surface.
source/isaaclab/test/benchmark/test_recorders.py Extends recorder tests with peak-tracking and zero-before-record cases; GPU utilization peak is not tested for the always-emitted behavior, mirroring the emission gap in the recorder.
source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst Changelog fragment documenting new schema module and recorder peak fields; lists bare measurement names that omit the mandatory "GPU " prefix.

Comments Outside Diff (3)

  1. source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py, line 298-327 (link)

    P1 GPU utilization peak is not always emitted — breaks the stated consistent-key-set contract

    The PR description and the in-code comment on line 289 both promise "Peak rows are always emitted (initialised to 0.0) so consumers see a consistent key set regardless of whether any sample was recorded." That holds for Memory Used peak, which is appended unconditionally outside the if "memory_used_mean_bytes" in runtime block. However, Utilization peak is appended inside if "utilization_mean_percent" in runtime, which is only entered when pynvml or nvidia-smi is available. In an environment without either (e.g., a CPU-only or sandboxed CI machine), memory peak is emitted but utilization peak is entirely absent from the output. Downstream consumers that assume a fixed key set and read "GPU Utilization peak" unconditionally will either KeyError or silently produce wrong numbers.

    To match the memory-peak pattern, the Utilization peak SingleMeasurement should be moved outside the conditional and read from self._util_peak[i] directly — in the same way Memory Used peak reads from self._mem_peak[i] — so the row is always present with value 0.0 when no utilization data was recorded.

  2. source/isaaclab/isaaclab/benchmark/schema.py, line 364 (link)

    P2 Dead-code or "." branch in os.makedirs call

    os.path.abspath(path) always returns an absolute path, so os.path.dirname() applied to it will always return a non-empty string (at minimum / on POSIX). The or "." fallback is therefore unreachable. Removing it makes the intent clearer and avoids misleading future readers.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  3. source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst, line 27 (link)

    P2 Changelog lists bare "Memory Used peak" and "Utilization peak" — actual measurement names include the "GPU " prefix

    For a single GPU the recorder emits f"{prefix}Memory Used peak" where prefix = "GPU ", so the real name is "GPU Memory Used peak". For multi-GPU setups it is "GPU {i} Memory Used peak". The changelog's bare names don't match what consumers would search for in the output.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "Add public v1.0 benchmark schema and rec..." | Re-trigger Greptile

Comment on lines +251 to +252
:class:`TrainingBundle` or :class:`StartupBundle`; any frozen
dataclass tree composed of primitives, lists, and dicts works.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Resources.gpu_util_pct has no peak field while gpu_mem_gb does — asymmetry with the recorder

The GPUInfoRecorder now tracks _util_peak per device and the Resources dataclass was the natural place to surface it. gpu_mem_gb uses MeanStdPeak, but gpu_util_pct still uses MeanStd, so the peak utilisation data collected by the recorder cannot be round-tripped through the schema. Consumers who want the peak GPU utilisation will have no field to read from TrainingBundle.

Suggested change
:class:`TrainingBundle` or :class:`StartupBundle`; any frozen
dataclass tree composed of primitives, lists, and dicts works.
gpu_util_pct: MeanStdPeak
gpu_mem_gb: MeanStdPeak

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant