Logging per test case status#5847
Conversation
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Thanks for adding per-test-case logging! This is a nice enhancement for test result visibility.
✅ Architecture & Design
- Appropriate location: Changes to
tools/conftest.pyare correct for test infrastructure modifications - Clean integration: The
caseslist is added to the existingtest_statusdictionary structure, maintaining backwards compatibility - Good import addition: Adding
FailureandSkippedfromjunitparseralongside existingErrorimport is correct
✅ Implementation Correctness
- Result status logic is sound: The priority order (FAILED > ERROR > skipped > passed) handles overlapping statuses correctly
- Empty cases list: All error paths (
STARTUP_HANG,TIMEOUT,CRASHED, report parse failure) now correctly include"cases": []- nice attention to detail - Safe defaults: Using
case.name or "",case.classname or "", andfloat(case.time or 0.0)preventsNonepropagation
🔍 Minor Observations
-
Line 518-520: The
list(case.result)conversion assumescase.resultis iterable. While this works with junitparser's API, a brief comment noting this behavior might help future maintainers. -
Output formatting: The
↳prefix with 4-space indent creates a nice visual hierarchy in the output table. The empty strings for Wall time and # Tests columns on case rows make sense since those are file-level metrics.
📋 Pre-commit Check
The pre-commit check is failing - please run ./isaaclab.sh --format to address any formatting issues.
✅ Test Coverage
This is test infrastructure code. The changes are self-testing in that any issues would surface immediately when running tests.
This is an automated review. Please address any concerns and reach out if you have questions.
Update (f6d72b4): Nice cleanup! The case status logic has been extracted into a _case_result_status() helper function. This improves code readability and maintainability. ✅
Update (5780763): This is a substantial and excellent refactor of run_individual_tests() 🎉
What changed:
- The original 250+ line function has been decomposed into ~15 focused helper functions
- Each helper has a clear single responsibility (e.g.,
_build_test_command,_run_test_with_retries,_handle_startup_hang,_parse_test_report) - The main
run_individual_tests()is now a clean 15-line loop
Quality highlights:
- ✅
_status_summary()builder ensures consistent status dict structure across all paths - ✅
_record_infrastructure_status()cleanly separates infrastructure failures from test failures - ✅
_run_test_with_retries()encapsulates the retry logic with proper state management - ✅
_print_retry_context()DRYs up the diagnostic output formatting - ✅ Per-case results (
caseslist) are now properly populated via_parse_test_report()
Minor note: The summary table output now includes per-case rows with the ↳ prefix - this is a nice visual improvement for identifying which specific test cases passed/failed.
This refactor significantly improves maintainability. Future changes (new failure modes, additional metrics, etc.) can be isolated to individual helpers. Well done! 👏
Update (bbd0270): The helper function refactoring from the previous commit has been reverted. The code is now back to a single run_individual_tests() function with inline logic (~270 lines).
What changed:
- Removed:
_build_test_command,_read_test_content,_print_retry_context,_run_test_with_retries,_status_summary,_write_synthetic_failure,_handle_startup_hang,_handle_timeout,_handle_missing_report,_parse_test_report,_record_infrastructure_status,_record_report_status,_test_timeout_config,_run_individual_test - The retry loops, error handling, and report parsing are now inline within the main function
Observation: While the inlined version is more self-contained, it reverts the maintainability gains from the previous refactor. If the team prefers the explicit inline approach over the decomposed helper style, that's a valid choice - but it may be worth discussing the tradeoffs with the maintainers.
Functionality preserved: ✅ All the per-case logging and status tracking features remain intact. The _case_result_status() helper is still present.
Update (d4d6010): The refactoring is back! 🎉 This commit re-introduces helper functions with some nice improvements:
New/restored helper functions:
_print_retry_diagnostics()- consolidated retry diagnostic output_run_test_with_retries()- encapsulates the retry loop logic_append_output_tail()- DRY helper for truncating output tails_write_error_result()- creates synthetic JUnit reports for process errors_process_startup_hang()/_process_timeout_without_report()/_process_missing_report()- handle specific failure modes_process_error_status()- builds consistent error status dicts_parse_test_report()- extracts report parsing with per-case collection_summary_from_report()- calculates final pass/fail from parsed report_test_file_has_cameras()- isolated cold-cache check_build_test_command()- constructs pytest command
Improvements over 5780763:
- Cleaner function names (e.g.,
_process_startup_hangvs_handle_startup_hang) _append_output_tail()eliminates duplicated output formatting- Explicit return values from
_run_test_with_retries()make data flow clearer _summary_from_report()cleanly separates report parsing from pass/fail determination
Result: The main loop in run_individual_tests() is now ~60 lines of clear orchestration logic. This is a solid balance between decomposition and readability. ✅
Update (6d1f645): Second revert - the helper function refactoring has been reverted again.
Current state:
- Back to the simpler structure: one
_case_result_status()helper + inline logic inrun_individual_tests() - All the per-case logging functionality is preserved
- The
caseslist correctly captures test case name, classname, time, and result - Error paths (
STARTUP_HANG,TIMEOUT,CRASHED, parse failure) all set"cases": []
Final assessment: The core feature (per-test-case status logging) is complete and working correctly. The team appears to be iterating on whether to extract helpers vs keep inline logic - both approaches are valid for this infrastructure code. The current implementation is straightforward and achieves the goal. 👍
Greptile SummaryThis PR enhances the test runner's summary table to show per-test-case status as indented sub-rows (prefixed with
Confidence Score: 4/5Safe to merge; the change is additive and only affects the summary table printed at the end of a test run. The case-collection logic is correct and all early-exit paths properly initialise No files require special attention beyond the two style nits in Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Parse JUnitXml report] --> B[Iterate suites]
B --> C[Iterate cases in suite]
C --> D{case.result empty?}
D -- Yes --> E[case_status = 'passed']
D -- No --> F{any Failure?}
F -- Yes --> G[case_status = 'FAILED']
F -- No --> H{any Error?}
H -- Yes --> I[case_status = 'ERROR']
H -- No --> J{any Skipped?}
J -- Yes --> K[case_status = 'skipped']
J -- No --> L[case_status = 'passed']
E & G & I & K & L --> M[Append to cases list]
M --> N[Store in test_status with cases key]
N --> O[Render sub-row in PrettyTable with ↳ prefix]
Reviews (1): Last reviewed commit: "Logging per test case status" | Re-trigger Greptile |
| if not case_results: | ||
| case_status = "passed" | ||
| elif any(isinstance(r, Failure) for r in case_results): | ||
| case_status = "FAILED" | ||
| elif any(isinstance(r, Error) for r in case_results): | ||
| case_status = "ERROR" | ||
| elif any(isinstance(r, Skipped) for r in case_results): | ||
| case_status = "skipped" | ||
| else: | ||
| case_status = "passed" |
There was a problem hiding this comment.
The result strings for case-level status mix lowercase and uppercase:
"passed" and "skipped" are lowercase while "FAILED" and "ERROR" are uppercase. This is inconsistent within the same data structure — if any downstream code ever does a case-sensitive comparison against these values (e.g. filtering, aggregating, or serialising), results with "passed" and "FAILED" would need to be treated differently. Using a uniform casing convention (all-uppercase or all-lowercase) avoids that ambiguity.
| if not case_results: | |
| case_status = "passed" | |
| elif any(isinstance(r, Failure) for r in case_results): | |
| case_status = "FAILED" | |
| elif any(isinstance(r, Error) for r in case_results): | |
| case_status = "ERROR" | |
| elif any(isinstance(r, Skipped) for r in case_results): | |
| case_status = "skipped" | |
| else: | |
| case_status = "passed" | |
| if not case_results: | |
| case_status = "passed" | |
| elif any(isinstance(r, Failure) for r in case_results): | |
| case_status = "failed" | |
| elif any(isinstance(r, Error) for r in case_results): | |
| case_status = "error" | |
| elif any(isinstance(r, Skipped) for r in case_results): | |
| case_status = "skipped" | |
| else: | |
| case_status = "passed" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
seems good consistent change
There was a problem hiding this comment.
Thanks for confirming! 👍
There was a problem hiding this comment.
I think Piotr wanted to make ERROR and FAILED stand out in uppercase, this was intentional
There was a problem hiding this comment.
That's a fair point — using uppercase for FAILED and ERROR as a visual signal for actionable/attention-requiring states, while keeping passed and skipped lowercase, is a reasonable intentional design choice. The visual distinction helps them stand out in the table output, which is exactly what you'd want when scanning test results.
I'll withdraw my concern — the mixed casing is purposeful here, not accidental.
Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.
| if suite.name == "pytest": | ||
| suite.name = os.path.splitext(file_name)[0] | ||
| for case in suite: | ||
| case_results = list(case.result) if case.result else [] |
There was a problem hiding this comment.
In junitparser 2.x
case.result already returns a list; wrapping it in list(...) creates an unnecessary shallow copy. The idiom case.result or [] expresses the intent more directly — fall back to an empty list when case.result is falsy (either None or []).
| case_results = list(case.result) if case.result else [] | |
| case_results = case.result or [] |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
f6d72b4 to
23c079c
Compare
Description
Logging status per case as follows:
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there