Adds public v1.0 benchmark schema for Odin (and others)#5840
Adds public v1.0 benchmark schema for Odin (and others)#5840AntoineRichard wants to merge 1 commit into
Conversation
Promote the JSON bundle schema produced by the standalone benchmark scripts under scripts/benchmarks/ into a real public-API module, isaaclab.benchmark.schema. Until now there was no single place in lab that defined the shape of training.json / startup.json, even though three lab scripts emit it and downstream tooling (e.g. the in-tree Odin evaluation harness) is starting to consume it. The module ships frozen dataclasses for TrainingBundle, StartupBundle, and all their building blocks, plus a small write_bundle_file helper that serialises any dataclass tree as schema-v1 JSON. The package __init__ re-exports the public surface so callers can write `from isaaclab.benchmark import TrainingBundle`. This commit also extends GPUInfoRecorder and MemoryInfoRecorder to report per-device peak alongside the existing mean/std rows. The peak rows are always emitted (initialised to 0.0) so dashboards see a consistent key set regardless of whether any sample was recorded. Existing rows are unchanged. The benchmark scripts themselves continue to use the legacy output format on develop today; a follow-up PR rewrites them to emit schema-v1 bundles directly via this module.
There was a problem hiding this comment.
Code Review: Add public v1.0 benchmark schema
Summary
This PR introduces a well-designed public schema module (isaaclab.benchmark.schema) for benchmark bundles, along with recorder enhancements to track peak memory/utilization values. The implementation is clean, well-documented, and follows Python best practices.
Strengths
- Excellent API design: Frozen dataclasses provide immutability and clear contracts for benchmark data structures
- Proper versioning:
SCHEMA_VERSION = "1.0"enables forward compatibility - Good separation of concerns: Schema types are independent from benchmark scripts, enabling clean evolution
- Comprehensive type hints: Use of
Literaltypes forFramework,Backend,RunStatusprovides type safety - Thorough test coverage: Round-trip tests, peak tracking tests, and re-export consistency checks
Suggestions (non-blocking)
-
write_bundle_filerobustness (schema.py:251):os.makedirs(os.path.dirname(os.path.abspath(path)) or ".", exist_ok=True)
The
or "."fallback for single-component paths works but is subtle. A brief comment explaining this edge case would help future maintainers. -
Consider atomic writes: If a process is interrupted during
write_bundle_file, the output file could be corrupted. Writing to a temporary file and usingos.rename()would provide atomicity. -
Optional: Dataclass validation: A
__post_init__could validate invariants likepeak >= meanforMeanStdPeakorduration_s >= 0, catching data errors early. -
Test count assertion fragility (test_recorders.py:246):
assert len(measurement_data.measurements) in (4 * num_gpus, 8 * num_gpus)
This will need updates if more fields are added. Consider asserting on specific required field names instead.
Verdict
Approve — This is a solid foundation PR. The code quality is high, the design is extensible, and the changes are strictly additive with no breaking changes to existing APIs. Ready to merge.
Automated review by Isaac Lab Review Bot
Greptile SummaryThis PR introduces
Confidence Score: 3/5Safe to merge for the schema types, but the GPU utilization peak emission gap should be fixed before downstream consumers are built against this contract. The recorder emits GPU memory peak unconditionally but gates GPU utilization peak on nvml/smi availability, directly contradicting the stated behavioral promise of a consistent key set. Environments without nvml/smi produce structurally different output, which will silently break any consumer that reads the utilization peak row unconditionally. source/isaaclab/isaaclab/test/benchmark/recorders/record_gpu_info.py — the utilization peak emission block needs to move outside the nvml/smi conditional, mirroring the memory peak pattern. Important Files Changed
|
| :class:`TrainingBundle` or :class:`StartupBundle`; any frozen | ||
| dataclass tree composed of primitives, lists, and dicts works. |
There was a problem hiding this comment.
Resources.gpu_util_pct has no peak field while gpu_mem_gb does — asymmetry with the recorder
The GPUInfoRecorder now tracks _util_peak per device and the Resources dataclass was the natural place to surface it. gpu_mem_gb uses MeanStdPeak, but gpu_util_pct still uses MeanStd, so the peak utilisation data collected by the recorder cannot be round-tripped through the schema. Consumers who want the peak GPU utilisation will have no field to read from TrainingBundle.
| :class:`TrainingBundle` or :class:`StartupBundle`; any frozen | |
| dataclass tree composed of primitives, lists, and dicts works. | |
| gpu_util_pct: MeanStdPeak | |
| gpu_mem_gb: MeanStdPeak |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Description
Promote the JSON bundle schema produced by the standalone benchmark scripts under
scripts/benchmarks/into a real public-API module,isaaclab.benchmark.schema. Until now there was no single place in lab that defined the shape oftraining.json/startup.json, even though three lab scripts emit it and downstream tooling (e.g. the in-tree Odin evaluation harness, but also any future regression-comparison or CI-perf tool) consumes it.This PR is the schema foundation for a follow-up that rewrites
benchmark_rsl_rl.py,benchmark_skrl.py, andbenchmark_startup.pyto emit schema-v1 bundles via this module. Kept separate so the type surface lands first and the script rewrite reads as a pure switch-over.What's in here
New public package
isaaclab.benchmark:isaaclab/benchmark/schema.py— frozen dataclasses forTrainingBundle,StartupBundle, and their building blocks (Versions,Hardware,Runtime,Resources,Learning,GpuDeviceInfo,MeanStd,MeanStdPeak,StartupPhase*,CProfileFunction,RunIdentity,StartupRunIdentity). PlusSCHEMA_VERSION = "1.0"andwrite_bundle_file().isaaclab/benchmark/__init__.py— re-exports the public surface, so callers can writefrom isaaclab.benchmark import TrainingBundle.Recorder peak fields:
GPUInfoRecordernow reports per-device peak memory and peak GPU utilisation alongside the existing mean/std rows ("GPU Memory Used peak","GPU Utilization peak").MemoryInfoRecorderreports peak RSS / VMS / USS the same way.0.0) so consumers see a consistent key set regardless of whether any sample was recorded.Tests:
source/isaaclab/test/benchmark/test_schema.py— round-trip tests forTrainingBundleandStartupBundle, plus a sanity check that the package re-exports stay in sync withschema.py.source/isaaclab/test/benchmark/test_recorders.py— extended with peak-tracking and "peak is zero before any record" cases for both GPU and memory recorders.Changelog fragment:
source/isaaclab/changelog.d/antoiner-feat-benchmark-schema.rst.Compatibility
scripts/benchmarks/benchmark_*.py) are untouched in this PR — they keep emitting their existing per-backend output format until the follow-up rewires them.Fixes # (no issue)
Type of change
Screenshots
N/A — types + recorder fields, no UI surface.
Checklist
pre-commitchecks with./isaaclab.sh --formatbenchmarking.rstcovering the public schema ships with the script-rewrite follow-up where it's user-facing; this PR is the type-only foundation)source/<pkg>/changelog.d/for every touched packageCONTRIBUTORS.mdor my name already exists there