Add automated GHA release workflow (WT-1042) #17139
Codecov Report
✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff           @@
    ##             main   #17139   +/-   ##
    =======================================
      Coverage   81.74%   81.74%
    =======================================
      Files         174      174
      Lines        9347     9347
    =======================================
      Hits         7641     7641
      Misses       1706     1706

Replaces the current manual 12-step release process with a single
workflow_dispatch-triggered GitHub Actions pipeline that handles the
full deployment cycle from preflight checks through to production.
## What the workflow does
The workflow (`.github/workflows/release.yml`) runs five jobs in sequence:
1. **preflight-checks** — Resolves the HEAD SHA of main, posts a
"Starting release" message to #www and #www-notify with a count of
pending commits, then gates on four CI signals for that exact SHA:
unit tests, pre-commit standards, Docker image build (dev), and at
least one integration test run for main/dev. All checks poll the GHA
API rather than re-running tests themselves.
2. **deploy-to-stage** — Verifies HEAD still matches the release SHA
(guards against a concurrent push to main), posts "Pushing to Bedrock
stage" to #www, then pushes the SHA to the `stage` branch. Waits for
the `build-and-push` workflow to complete for that commit on stage,
then waits for both integration test runs dispatched by the deployment
infrastructure to succeed. If `pause_on_staging=true` was requested,
also posts an approval prompt to #www with a link to the workflow run.
3. **prod-approval-gate** — Only runs when `pause_on_staging=true`.
References the `prod` GitHub environment, which should have required
reviewers configured; the workflow pauses here until a reviewer
approves in the GHA UI.
4. **deploy-to-prod** — Verifies HEAD again, then runs
`bin/tag-release.sh --ci --push` to create a `YYYY-MM-DD[.X]` tag
and push it along with HEAD to the `prod` branch. Posts "Pushing
Bedrock Prod, tagged X" to #www, then waits for the prod Docker build
and both prod integration test runs to succeed.
5. **notify-completion** — Always runs regardless of earlier job
outcomes; posts a final success or failure summary to both #www and
#www-notify. Distinguishes the case where the tag was pushed but
post-deployment checks failed (partial completion).
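The two patterns that recur across these jobs, checking that main has not moved and polling until a downstream run finishes, can be sketched in shell. This is a minimal illustration under assumptions, not the code in `release.yml`: `poll_until` and `verify_head_matches` are hypothetical helper names, and the attempt count and delay are placeholders.

```bash
# Hypothetical helper: re-run a check command until it succeeds
# or the attempt budget is used up. Not taken from release.yml.
poll_until() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical guard: fail if the current HEAD of main differs from
# the SHA that was resolved at the start of the release.
verify_head_matches() {
  local release_sha="$1" current_sha="$2"
  if [ "$current_sha" != "$release_sha" ]; then
    echo "main moved: expected $release_sha, found $current_sha" >&2
    return 1
  fi
}
```

A deploy step might call `verify_head_matches "$RELEASE_SHA" "$(git rev-parse origin/main)"` before pushing, then wrap its run lookup in `poll_until`; the variable name `RELEASE_SHA` is likewise an assumption.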
## Changes to bin/tag-release.sh
Adds a `--ci` / `-c` flag for non-interactive use:
- If main and stage branches do not match, exits non-zero with a clear
error message instead of prompting for "Override".
- Skips the "Did tests pass on staging?" confirmation prompt (the
workflow has already verified this via the GHA API).
- Writes `tag=YYYY-MM-DD[.X]` to `$GITHUB_OUTPUT` when that env var is
present, so the workflow can capture and report the tag name.
Interactive mode (default, no `--ci`) is unchanged.
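As an illustration of the tag naming and the `$GITHUB_OUTPUT` handling described above, here is a minimal sketch. It is not the code in `bin/tag-release.sh`: `next_release_tag` and `emit_tag_output` are hypothetical names, and passing existing tags as arguments is an assumption made to keep the example self-contained.

```bash
# Hypothetical helper: pick the first free tag of the form YYYY-MM-DD[.X].
# $1 is the date; any further arguments are tag names that already exist.
next_release_tag() {
  local base="$1"
  shift
  local existing=" $* "
  local candidate="$base"
  local n=0
  while :; do
    case "$existing" in
      # Candidate is taken: try the next numeric suffix.
      *" $candidate "*) n=$((n + 1)); candidate="$base.$n" ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$candidate"
}

# Hypothetical helper: expose the tag to later workflow steps, but only
# when GITHUB_OUTPUT is set (i.e. when running inside GitHub Actions).
emit_tag_output() {
  local tag="$1"
  if [ -n "${GITHUB_OUTPUT:-}" ]; then
    printf 'tag=%s\n' "$tag" >> "$GITHUB_OUTPUT"
  fi
}
```

With one release already tagged on a given day, `next_release_tag 2026-04-17 2026-04-17` yields `2026-04-17.1`, and outside GitHub Actions `emit_tag_output` does nothing.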
## Security
- Workflow-level permissions are `contents: read` + `actions: read`;
only the two deployment jobs that push branches/tags elevate to
`contents: write`.
- All `${{ }}` expressions used in `run:` steps are passed via `env:`
to avoid shell injection.
- `actions/checkout` is pinned to a specific commit SHA.
- `persist-credentials: false` is set on read-only checkouts;
deployment checkouts retain credentials (needed for git push) with
Zizmor ignore annotations explaining why.
- A workflow-level `concurrency: release` group prevents overlapping
release runs.
- Checked with `zizmor --pedantic`: 0 findings (2 intentional ignores
for push checkouts that require credential persistence).
## Environment configuration required
The `prod` GitHub environment should have required reviewers added
(teams: bedrock-codeowners-backend and bedrock-codeowners-frontend)
to activate the `pause_on_staging` approval gate.
647178b to c4347e2

    # Prevent concurrent release runs to avoid overlapping deployments.
    concurrency:
      group: release
      cancel-in-progress: false

Note to self: maybe this should be `cancel-in-progress: true`, in case we want to stop the release bus and let a newer version of main reach production ASAP.
Answer to self: nope
The main risk is that the release workflow performs irreversible side effects at multiple points — and cancellation can happen after some but not all of them have completed.
The dangerous scenarios:
- Cancelled after pushing to stage but before prod — Stage gets deployed with Run 1's SHA, then Run 2 pushes a different SHA to stage. Two builds/test cycles run simultaneously on stage, and Run 2's polling could match Run 1's results.
- Cancelled after tag-release.sh runs — This is the worst case. A prod tag is pushed, triggering Docker builds and actual production deployment, but Run 1 never verifies the integration tests passed. Production is running unverified code. Then Run 2 creates a second tag (e.g. 2026-04-17.1), so you get two prod releases for the same day.
- Cancelled mid-tag-release.sh (between tag push and prod branch push) — The tag exists but the prod branch wasn't updated, leaving an inconsistent state.
Key detail: The downstream workflows (build-and-push, integration_tests, etc.) are triggered by the git pushes/tags themselves — they keep running even after the release workflow is cancelled.
Bottom line: cancel-in-progress: false (current) is the safe choice. Queuing ensures each release completes or fails cleanly before the next starts. Changing to true risks unverified prod deployments and orphaned tags.
Pull request overview
This PR introduces a GitHub Actions–driven release pipeline intended to automate the Bedrock release process end-to-end (preflight → stage → optional approval gate → prod) and updates the existing tagging script to support non-interactive CI usage.
Changes:
- Added a new `workflow_dispatch` release workflow that posts Slack notifications, polls CI status, deploys to stage/prod, and performs post-deploy verification.
- Updated `bin/tag-release.sh` with a `--ci` mode to skip prompts and to emit the created tag via `$GITHUB_OUTPUT`.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `.github/workflows/release.yml` | New automated release workflow with preflight gates, stage/prod deploy, approval gate, and Slack notifications. |
| `bin/tag-release.sh` | Adds CI mode behavior and exports the release tag for GitHub Actions consumption. |

    RUNS=$(gh run list \
      --repo mozilla/bedrock \
      --workflow integration_tests.yml \
      --json displayTitle,conclusion,status,createdAt \
      | jq --arg since "$STAGE_BUILD_DONE_TIME" \
        '[.[] | select(.displayTitle | test("Integration tests for stage")) | select(.createdAt >= $since)]')
    SUCCESS_COUNT=$(echo "$RUNS" | jq '[.[] | select(.conclusion == "success")] | length')

- Fix preflight-checks checkout to explicitly use `ref: main`, so the release SHA is always resolved from main regardless of which ref triggered the workflow_dispatch
- Add `actions: read` to deploy-to-stage and deploy-to-prod job permissions; job-level permissions replace (not merge) workflow-level defaults, so polling via `gh run list` would have failed without it
- Update integration_tests.yml run-name to include the dispatched git_sha: "Integration tests for <branch> @ <sha>". Update all three integration test polling filters in release.yml to match on the SHA in the displayTitle, preventing false positives from unrelated runs on the same branch
- Improve origin/prod unavailable handling in commit count step: detect missing ref explicitly instead of silently reporting 0 commits pending
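To show why matching on the SHA in the run title avoids false positives, here is a minimal sketch of the counting step. It is an illustration under assumptions, not the filter in `release.yml`: `count_successful_runs` is a hypothetical helper, and it assumes the run list has already been flattened to tab-separated `displayTitle<TAB>conclusion` lines.

```bash
# Hypothetical helper: count successful runs whose title names both the
# branch and the exact SHA, using the run-name format
# "Integration tests for <branch> @ <sha>".
# Reads tab-separated "displayTitle<TAB>conclusion" lines on stdin, e.g.
# (assumed invocation) from:
#   gh run list --workflow integration_tests.yml \
#     --json displayTitle,conclusion \
#     --jq '.[] | [.displayTitle, .conclusion] | @tsv'
count_successful_runs() {
  local branch="$1" sha="$2"
  local expected="Integration tests for $branch @ $sha"
  local tab
  tab=$(printf '\t')
  local count=0
  local title conclusion
  while IFS="$tab" read -r title conclusion; do
    # A run for another SHA on the same branch never matches the title.
    if [ "$title" = "$expected" ] && [ "$conclusion" = "success" ]; then
      count=$((count + 1))
    fi
  done
  printf '%s\n' "$count"
}
```

A caller waiting for both integration test runs could then compare the printed count against 2. The sketch assumes the SHA passed in has the same form (full or abbreviated) as the one in the run title.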

    pause_on_staging:
      description: "Pause after stage deployment for manual QA before pushing to prod"
      type: boolean
      default: false

@stevejalim can we set `default: true` here for a human manual QA review prior to prod?
We could do, sure. The real-world human step usually just looks through the results of our integration-tests GHA, rather than clicking around the site, though, so I went with what's closer to our reality right now.
Pull request overview
This PR automates Bedrock’s previously manual release process by adding a GitHub Actions workflow_dispatch pipeline that performs preflight verification, stage deploy + verification, optional approval gating, production deploy/tagging, and final Slack notifications.
Changes:
- Added a new `.github/workflows/release.yml` workflow to drive the end-to-end release flow (including polling existing CI runs instead of re-running them).
- Updated `bin/tag-release.sh` with a `--ci` mode for non-interactive usage and to emit the created tag via `$GITHUB_OUTPUT`.
- Updated the integration test workflow run name to include both branch and deployed SHA, enabling the release workflow’s polling/lookup logic.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `bin/tag-release.sh` | Adds CI/non-interactive behavior and outputs the created tag for downstream workflow steps. |
| `.github/workflows/release.yml` | Introduces the orchestrated release workflow with gating, deploy, optional approval, and Slack notifications. |
| `.github/workflows/integration_tests.yml` | Adjusts run-name to include SHA so release workflow polling can match the correct runs. |
Ensures deploy-to-stage, deploy-to-prod, and notify-completion only run in the canonical mozilla/bedrock repo, preventing failures in forks due to missing secrets or insufficient permissions.
stevejalim left a comment
Merging this to give it a try
This changeset takes our manual release workflow and makes it possible to run the steps via a GHA, with appropriate bail-outs and pauses.
## Summary
- Adds `.github/workflows/release.yml`: a `workflow_dispatch`-triggered pipeline that automates the full Bedrock release process (preflight → stage → optional QA pause → prod), replacing 12+ manual steps
- Updates `bin/tag-release.sh` to add a `--ci` flag for non-interactive use in the workflow; interactive mode is unchanged
- Checked with `zizmor --pedantic` with 0 findings
## How it works
The workflow runs five jobs in order:
1. **preflight-checks**: resolves the HEAD SHA of `main`, posts to `#www` and `#www-notify`, then gates on unit tests, pre-commit standards, Docker build, and integration tests all passing for that exact SHA on `main`/`dev`. Uses polling rather than re-running tests.
2. **deploy-to-stage**: verifies HEAD still matches the release SHA, posts to `#www`, pushes to `stage`, waits for build + both integration test runs. If `pause_on_staging=true`, posts an approval prompt linking to the run.
3. **prod-approval-gate**: only runs when `pause_on_staging=true`; uses the `prod` GitHub environment's required reviewers to pause until manually approved.
4. **deploy-to-prod**: verifies HEAD again, then runs `bin/tag-release.sh --ci --push` to create the `YYYY-MM-DD[.X]` tag and push to `prod`, then waits for the prod build and both prod integration test runs.
5. **notify-completion**: always runs; posts a final success or failure summary to `#www` and `#www-notify`, with a distinct message if the tag was pushed but post-deployment verification failed.
## Test plan
- Confirm messages are posted to `#www` and `#www-notify` at each step
- With `pause_on_staging=true`: confirm workflow pauses at the prod gate and the `#www` approval message includes a working link
- With `pause_on_staging=false`: confirm workflow proceeds automatically after stage passes
- Confirm `bin/tag-release.sh --ci --push` produces the correct tag and that the interactive mode still works unchanged
- Add required reviewers to the `prod` GitHub environment (teams: `bedrock-codeowners-backend`, `bedrock-codeowners-frontend`)