Skip to content

Logging per test case status#5847

Open
pbarejko wants to merge 6 commits into
isaac-sim:developfrom
pbarejko:pbarejko/per-case-status
Open

Logging per test case status#5847
pbarejko wants to merge 6 commits into
isaac-sim:developfrom
pbarejko:pbarejko/per-case-status

Conversation

@pbarejko
Copy link
Copy Markdown
Collaborator

Description

Logging status per case as follows:

+------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------+----------+---------+
| Test Path                                                                                                                                            | Result | Test (s) | Wall (s) | # Tests |
+------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------+----------+---------+
| /home/pbarejko/git/IsaacLab/source/isaaclab_physx/test/renderers/test_isaac_rtx_renderer_utils.py                                                    | passed |     0.19 |     0.73 |  13/13  |
|     ↳ source.isaaclab_physx.test.renderers.test_isaac_rtx_renderer_utils.TestGetStageStreamingBusy::test_returns_true_when_busy                      | passed |     0.00 |          |         |
|     ↳ source.isaaclab_physx.test.renderers.test_isaac_rtx_renderer_utils.TestGetStageStreamingBusy::test_returns_false_when_idle                     | passed |     0.00 |          |         |
|     ↳ source.isaaclab_physx.test.renderers.test_isaac_rtx_renderer_utils.TestGetStageStreamingBusy::test_returns_false_when_no_context               | passed |     0.00 |          |         |
|     ↳ source.isaaclab_physx.test.renderers.test_isaac_rtx_renderer_utils.TestWaitForStreamingComplete::test_returns_immediately_when_not_busy        | passed |     0.00 |          |         |

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Thanks for adding per-test-case logging! This is a nice enhancement for test result visibility.

✅ Architecture & Design

  • Appropriate location: Changes to tools/conftest.py are correct for test infrastructure modifications
  • Clean integration: The cases list is added to the existing test_status dictionary structure, maintaining backwards compatibility
  • Good import addition: Adding Failure and Skipped from junitparser alongside existing Error import is correct

✅ Implementation Correctness

  • Result status logic is sound: The priority order (FAILED > ERROR > skipped > passed) handles overlapping statuses correctly
  • Empty cases list: All error paths (STARTUP_HANG, TIMEOUT, CRASHED, report parse failure) now correctly include "cases": [] - nice attention to detail
  • Safe defaults: Using case.name or "", case.classname or "", and float(case.time or 0.0) prevents None propagation

🔍 Minor Observations

  1. Line 518-520: The list(case.result) conversion assumes case.result is iterable. While this works with junitparser's API, a brief comment noting this behavior might help future maintainers.

  2. Output formatting: The prefix with 4-space indent creates a nice visual hierarchy in the output table. The empty strings for Wall time and # Tests columns on case rows make sense since those are file-level metrics.

📋 Pre-commit Check

The pre-commit check is failing - please run ./isaaclab.sh --format to address any formatting issues.

✅ Test Coverage

This is test infrastructure code. The changes are self-testing in that any issues would surface immediately when running tests.


This is an automated review. Please address any concerns and reach out if you have questions.


Update (f6d72b4): Nice cleanup! The case status logic has been extracted into a _case_result_status() helper function. This improves code readability and maintainability. ✅


Update (5780763): This is a substantial and excellent refactor of run_individual_tests() 🎉

What changed:

  • The original 250+ line function has been decomposed into ~15 focused helper functions
  • Each helper has a clear single responsibility (e.g., _build_test_command, _run_test_with_retries, _handle_startup_hang, _parse_test_report)
  • The main run_individual_tests() is now a clean 15-line loop

Quality highlights:

  • _status_summary() builder ensures consistent status dict structure across all paths
  • _record_infrastructure_status() cleanly separates infrastructure failures from test failures
  • _run_test_with_retries() encapsulates the retry logic with proper state management
  • _print_retry_context() DRYs up the diagnostic output formatting
  • ✅ Per-case results (cases list) are now properly populated via _parse_test_report()

Minor note: The summary table output now includes per-case rows with the prefix - this is a nice visual improvement for identifying which specific test cases passed/failed.

This refactor significantly improves maintainability. Future changes (new failure modes, additional metrics, etc.) can be isolated to individual helpers. Well done! 👏


Update (bbd0270): The helper function refactoring from the previous commit has been reverted. The code is now back to a single run_individual_tests() function with inline logic (~270 lines).

What changed:

  • Removed: _build_test_command, _read_test_content, _print_retry_context, _run_test_with_retries, _status_summary, _write_synthetic_failure, _handle_startup_hang, _handle_timeout, _handle_missing_report, _parse_test_report, _record_infrastructure_status, _record_report_status, _test_timeout_config, _run_individual_test
  • The retry loops, error handling, and report parsing are now inline within the main function

Observation: While the inlined version is more self-contained, it reverts the maintainability gains from the previous refactor. If the team prefers the explicit inline approach over the decomposed helper style, that's a valid choice - but it may be worth discussing the tradeoffs with the maintainers.

Functionality preserved: ✅ All the per-case logging and status tracking features remain intact. The _case_result_status() helper is still present.


Update (d4d6010): The refactoring is back! 🎉 This commit re-introduces helper functions with some nice improvements:

New/restored helper functions:

  • _print_retry_diagnostics() - consolidated retry diagnostic output
  • _run_test_with_retries() - encapsulates the retry loop logic
  • _append_output_tail() - DRY helper for truncating output tails
  • _write_error_result() - creates synthetic JUnit reports for process errors
  • _process_startup_hang() / _process_timeout_without_report() / _process_missing_report() - handle specific failure modes
  • _process_error_status() - builds consistent error status dicts
  • _parse_test_report() - extracts report parsing with per-case collection
  • _summary_from_report() - calculates final pass/fail from parsed report
  • _test_file_has_cameras() - isolated cold-cache check
  • _build_test_command() - constructs pytest command

Improvements over 5780763:

  • Cleaner function names (e.g., _process_startup_hang vs _handle_startup_hang)
  • _append_output_tail() eliminates duplicated output formatting
  • Explicit return values from _run_test_with_retries() make data flow clearer
  • _summary_from_report() cleanly separates report parsing from pass/fail determination

Result: The main loop in run_individual_tests() is now ~60 lines of clear orchestration logic. This is a solid balance between decomposition and readability. ✅


Update (6d1f645): Second revert - the helper function refactoring has been reverted again.

Current state:

  • Back to the simpler structure: one _case_result_status() helper + inline logic in run_individual_tests()
  • All the per-case logging functionality is preserved
  • The cases list correctly captures test case name, classname, time, and result
  • Error paths (STARTUP_HANG, TIMEOUT, CRASHED, parse failure) all set "cases": []

Final assessment: The core feature (per-test-case status logging) is complete and working correctly. The team appears to be iterating on whether to extract helpers vs keep inline logic - both approaches are valid for this infrastructure code. The current implementation is straightforward and achieves the goal. 👍

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR enhances the test runner's summary table to show per-test-case status as indented sub-rows (prefixed with ) beneath each test file row, parsing Failure, Error, and Skipped result types from the JUnit XML report.

  • run_individual_tests now collects a cases list from the parsed JUnit XML for each test file and stores it in test_status; early-exit paths (startup hang, timeout, crash, parse error) all correctly initialise cases to [].
  • The pytest_sessionstart summary loop iterates status["cases"] to append one additional PrettyTable row per test case, leaving the Wall (s) and # Tests columns empty for those rows.

Confidence Score: 4/5

Safe to merge; the change is additive and only affects the summary table printed at the end of a test run.

The case-collection logic is correct and all early-exit paths properly initialise cases: []. The two findings are purely stylistic — inconsistent casing of status strings and a redundant list() conversion — neither affects runtime behaviour.

No files require special attention beyond the two style nits in tools/conftest.py.

Important Files Changed

Filename Overview
tools/conftest.py Adds per-test-case status rows to the summary table; cases accumulation logic is correct, but result-status casing is inconsistent and the case.result list-copy idiom is slightly noisy.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Parse JUnitXml report] --> B[Iterate suites]
    B --> C[Iterate cases in suite]
    C --> D{case.result empty?}
    D -- Yes --> E[case_status = 'passed']
    D -- No --> F{any Failure?}
    F -- Yes --> G[case_status = 'FAILED']
    F -- No --> H{any Error?}
    H -- Yes --> I[case_status = 'ERROR']
    H -- No --> J{any Skipped?}
    J -- Yes --> K[case_status = 'skipped']
    J -- No --> L[case_status = 'passed']
    E & G & I & K & L --> M[Append to cases list]
    M --> N[Store in test_status with cases key]
    N --> O[Render sub-row in PrettyTable with ↳ prefix]
Loading

Reviews (1): Last reviewed commit: "Logging per test case status" | Re-trigger Greptile

Comment thread tools/conftest.py Outdated
Comment on lines +519 to +528
if not case_results:
case_status = "passed"
elif any(isinstance(r, Failure) for r in case_results):
case_status = "FAILED"
elif any(isinstance(r, Error) for r in case_results):
case_status = "ERROR"
elif any(isinstance(r, Skipped) for r in case_results):
case_status = "skipped"
else:
case_status = "passed"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The result strings for case-level status mix lowercase and uppercase: "passed" and "skipped" are lowercase while "FAILED" and "ERROR" are uppercase. This is inconsistent within the same data structure — if any downstream code ever does a case-sensitive comparison against these values (e.g. filtering, aggregating, or serialising), results with "passed" and "FAILED" would need to be treated differently. Using a uniform casing convention (all-uppercase or all-lowercase) avoids that ambiguity.

Suggested change
if not case_results:
case_status = "passed"
elif any(isinstance(r, Failure) for r in case_results):
case_status = "FAILED"
elif any(isinstance(r, Error) for r in case_results):
case_status = "ERROR"
elif any(isinstance(r, Skipped) for r in case_results):
case_status = "skipped"
else:
case_status = "passed"
if not case_results:
case_status = "passed"
elif any(isinstance(r, Failure) for r in case_results):
case_status = "failed"
elif any(isinstance(r, Error) for r in case_results):
case_status = "error"
elif any(isinstance(r, Skipped) for r in case_results):
case_status = "skipped"
else:
case_status = "passed"

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems good consistent change

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 👍

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming! 👍

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Piotr wanted to make ERROR and FAILED stand out in uppercase, this was intentional

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point — using uppercase for FAILED and ERROR as a visual signal for actionable/attention-requiring states, while keeping passed and skipped lowercase, is a reasonable intentional design choice. The visual distinction helps them stand out in the table output, which is exactly what you'd want when scanning test results.

I'll withdraw my concern — the mixed casing is purposeful here, not accidental.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

Comment thread tools/conftest.py Outdated
if suite.name == "pytest":
suite.name = os.path.splitext(file_name)[0]
for case in suite:
case_results = list(case.result) if case.result else []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 In junitparser 2.x case.result already returns a list; wrapping it in list(...) creates an unnecessary shallow copy. The idiom case.result or [] expresses the intent more directly — fall back to an empty list when case.result is falsy (either None or []).

Suggested change
case_results = list(case.result) if case.result else []
case_results = case.result or []

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@pv-nvidia pv-nvidia force-pushed the pbarejko/per-case-status branch from f6d72b4 to 23c079c Compare May 29, 2026 12:37
Copy link
Copy Markdown
Contributor

@pv-nvidia pv-nvidia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me.

The tiny greptile suggestion

case_results = case.result or []

seems valid?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants