Skip to content

[Changelog] Move nightly commit/push lifecycle into cli.py auto-bump subcommand#5867

Draft
hujc7 wants to merge 1 commit into
isaac-sim:developfrom
hujc7:jichuanh/changelog-auto-bump
Draft

[Changelog] Move nightly commit/push lifecycle into cli.py auto-bump subcommand#5867
hujc7 wants to merge 1 commit into
isaac-sim:developfrom
hujc7:jichuanh/changelog-auto-bump

Conversation

@hujc7
Copy link
Copy Markdown
Collaborator

@hujc7 hujc7 commented May 29, 2026

1. Summary

  • Lifts the entire post-compile lifecycle (stage, commit, rebase-on-conflict, push) out of inline shell in .github/workflows/nightly-changelog.yml and into a new cli.py auto-bump subcommand.
  • Eliminates the producer/consumer drift class of bug that bricked the nightly cron on 2026-05-29 (workflow's git add glob missed pyproject.toml after Clean up sub module packaging and remove setup.py #5785 added the pyproject write site). AutoBumpRun stages whatever each Package.compile actually wrote — the staging set is an in-process list, not a duplicated glob in YAML.
  • 2 files changed, 663 insertions, 14 deletions. New tests cover the lifecycle end-to-end against a tempdir repo + bare-repo remote with no GitHub interaction.

2. Background

The 2026-05-29 incident (https://github.com/isaac-sim/IsaacLab/actions/runs/26621920244) traced back to the workflow's git add carrying duplicated knowledge of which files cli.py modifies. #5785 added a write site in cli.py without updating the YAML's glob; the cron noticed only after #5831 cleared an upstream wedge that had been masking the issue.

#5859 patches the immediate symptom on main. This PR removes the duplication so future producer changes (new file-write sites in cli.py) don't need paired YAML edits to keep the cron healthy. Discussed at length in the iteration thread following #5859.

3. Design

3.1 Package.compile returns the touched-paths manifest

write_changelog_entry and write_version now return the list of paths they wrote (empty in dry-run). Package.compile returns (compiled, touched)compiled preserves the existing any_compiled signal for cmd_compile; touched is the in-process manifest the orchestrator stages.

No callable wrappers, no external manifest file, no subprocess hand-off. The list lives in memory for the duration of the run.

3.2 GitRepo — thin subprocess wrapper

Single responsibility: shell out to git and raise typed exceptions on the failure modes that need policy. One nested exception (GitRepo.NonFastForward) for the recoverable case the orchestrator retries. Stateless beyond cwd. Easy to test by pointing at a tempdir git repo.

3.3 AutoBumpRun — per-run orchestrator

Owns the per-run state (branch, remote, dry-run flag, touched list, failure list, any_compiled flag). The run() method reads top-down:

  • _compile_all walks Package.discover() with per-package try/except (partial success).
  • _stage_and_commit configures the bot author identity (deterministic public values, hardcoded as class constants per [[feedback_oop_when_possible]] for non-trivial code), stages self.touched, commits with the structured message.
  • _push_with_retry is a three-attempt fetch+rebase+push loop. Rebase only fires when the race actually fires, and always against a clean (committed) working tree — eliminates the unstaged-residue trap that bricked the cron today.
  • _exit_code returns 1 if any package raised during compile, 0 otherwise.

The bot identity values are public, not secret (they appear on every nightly commit). Hardcoding them as class constants keeps cli.py self-contained without leaking secrets — the App token itself stays in repo secrets and is consumed by the workflow before cli.py runs.

3.4 Test coverage in tools/changelog/test/test_auto_bump.py

Seven scenarios, all against a tempdir working tree + bare-repo remote with no GitHub interaction:

  1. test_auto_bump_happy_path_commits_and_pushes — single package with a fragment compiles and lands on the bare remote with the bot's author identity.
  2. test_auto_bump_stages_and_commits_pyproject — regression for today's bug: package with [project] version in pyproject.toml produces a commit that carries both extension.toml AND pyproject.toml bumps.
  3. test_auto_bump_rebases_on_non_fast_forward — sidecar commit pre-pushed to bare remote; first push fails non-fast-forward; rebase loop fetches the new tip, replays the auto-commit, second push succeeds; bot's commit lands on top with human's commit one below.
  4. test_auto_bump_raises_after_exhausting_retries — every retry races against another sidecar commit; orchestrator raises GitError after PUSH_RETRIES attempts rather than spinning.
  5. test_auto_bump_ships_healthy_packages_when_one_fails — two packages, one with a broken CHANGELOG.rst header; healthy commit ships to remote, exit code 1.
  6. test_auto_bump_dry_run_does_not_commit — no commits, no push, fragments preserved on disk.
  7. test_auto_bump_with_no_fragments_is_a_noop — exit 0, no commit, no push.

All 7 pass. Full suite goes from 90 → 97.

4. Follow-up

This PR is develop-scoped: pure cli.py + tests. The workflow YAML edit on main that consumes the new subcommand is a one-line follow-up:

- name: Compile and push
  id: compile
  continue-on-error: true
  env:
    TARGET_BRANCH: \${{ matrix.branch }}
  run: |
    python3 tools/changelog/cli.py auto-bump \
      --branch "\$TARGET_BRANCH" --remote origin \
      --event-name "\${{ github.event_name }}" \
      \${{ inputs.dry_run && '--dry-run' || '' }}

- if: \${{ steps.compile.outcome == 'failure' }}
  run: exit 1

Will land on main after this PR is in develop AND cherry-picked to release/3.0.0-beta2, so every cron-target branch carries auto-bump before main's workflow tries to invoke it.

5. What stays in cli.py vs the workflow

After both PRs land:

  • cli.py owns: file writes, staging set, commit shape, bot identity, retry/rebase policy, partial-success semantics, all behavior unit-tested in pytest.
  • workflow owns: actions/checkout, actions/setup-python, App-token mint, matrix shape, CRON_BRANCHES, GitHub-context env vars. None of it changes when cli.py grows a new file-write site.

The PR gate (.github/workflows/changelog-check.yml) already runs the changelog test suite on PRs touching tools/changelog/**, so future producer changes get caught at PR time by the new test_auto_bump.py without any workflow YAML edits.

6. Test plan

  • pytest tools/changelog/test/ — 97/97 pass (was 90, +7 new).
  • ./isaaclab.sh -f clean.
  • Authorship: single commit authored by jichuanh.
  • Reviewer sanity check on the GitRepo subprocess surface (especially the non-fast-forward detection regex matching [rejected] for both git-side and ruleset-side rejections).
  • Follow-up PR against main with the workflow YAML simplification, gated on this PR landing on develop AND release/3.0.0-beta2.

The nightly changelog cron's wedge mode on 2026-05-29 (workflow's
git-add glob missing pyproject.toml after isaac-sim#5785 added the pyproject
write path) was a producer/consumer drift bug between cli.py and
nightly-changelog.yml. The two surfaces carried duplicated knowledge of
which files cli.py touches, and the YAML side never got the memo when
the producer grew a new write site.

Lift the entire post-compile lifecycle (stage, commit, rebase-on-conflict
loop, push) out of inline shell in the workflow and into a new
``cli.py auto-bump`` subcommand. The orchestrator derives its staging
set from each ``Package.compile`` return value, so the glob is an
in-process list of files cli.py just wrote — drift is impossible by
construction. The workflow shrinks to one ``python3 cli.py auto-bump``
call (follow-up on main once this lands on develop and release).

Changes:

- ``Package.write_changelog_entry`` and ``Package.write_version`` now
  return the list of paths they wrote (empty in dry-run).
- ``Package.compile`` returns ``(compiled, touched)`` — the existing
  truthy/falsy semantics carry through ``cmd_compile``'s ``any_compiled``
  tracking.
- New ``GitRepo`` class: thin subprocess wrapper around the ``git`` CLI
  with one nested exception (``NonFastForward``) for the policy decision
  the orchestrator cares about.
- New ``AutoBumpRun`` class: end-to-end lifecycle owner. Hardcoded bot
  identity (deterministic public values, not secrets), three-retry push
  loop with fetch+rebase on conflict, partial-success across packages.
- New ``cmd_auto_bump`` handler + subparser.

Tests in tools/changelog/test/test_auto_bump.py cover seven scenarios
against a tempdir working tree + bare-repo remote with no GitHub
interaction:

- Happy path commits and pushes.
- pyproject.toml staging regression (today's bug class).
- Non-fast-forward push → rebase loop → second push succeeds.
- Exhausted retries raise GitError.
- Partial success: one bad package, one good; healthy commit ships,
  exit 1.
- Dry-run skips commit/push entirely.
- No fragments is a clean no-op.

Workflow YAML on main consumes the new subcommand in a follow-up PR.
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Verdict: ✅ LGTM

This PR correctly addresses the producer/consumer drift class of bug that caused the 2026-05-29 nightly failure.

What I Checked

Area Status
CLI subcommand implementation ✅ Correct
Error handling ✅ Robust
Replaces workflow steps properly ✅ Yes
Test coverage ✅ Excellent (7 scenarios)

Strengths

  1. Drift-proof by constructionPackage.compile() returning (compiled, touched) means the staging set is derived from what cli.py actually wrote, not a duplicated glob in YAML.

  2. Clean architectureGitRepo (subprocess wrapper with typed exceptions) and AutoBumpRun (orchestrator) have clear single responsibilities.

  3. Robust race handling — The 3-retry push loop with fetch+rebase on NonFastForward handles concurrent commits gracefully without spinning forever.

  4. Partial success semantics — One broken package does not block healthy packages from shipping; exit code reflects failures appropriately.

  5. Comprehensive tests — All 7 scenarios exercise real tempdir repos with bare-remote pushes, no mocking subprocess. This catches integration issues the PR gate would miss with unit-only tests.

Minor Notes (not blocking)

  • _extract_version_from_diff gracefully returns "?" on parse failure — appropriate for informational commit messages.
  • Bot identity hardcoded as class constants is intentional (public values, per the design doc).

No issues found. Ready for human reviewer sign-off.


Reviewed at HEAD: 15e9d206bfeb8fe53f865510399398ad92835f36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant