Add automated GHA release workflow (WT-1042) #17139

Merged
stevejalim merged 4 commits into main from WT-1042--gha-release-trigger
Apr 17, 2026

stevejalim (Contributor) commented Apr 16, 2026

This changeset takes our manual release workflow and makes it possible to run the steps via a GHA, with appropriate bail-outs and pauses.

Summary

  • Adds .github/workflows/release.yml: a workflow_dispatch-triggered pipeline that automates the full Bedrock release process (preflight → stage → optional QA pause → prod), replacing 12+ manual steps
  • Modifies bin/tag-release.sh to add a --ci flag for non-interactive use in the workflow; interactive mode is unchanged
  • Passes zizmor --pedantic with 0 findings

How it works

The workflow runs five jobs in order:

  1. preflight-checks — Resolves the HEAD SHA of main, posts to #www and #www-notify, then gates on unit tests, pre-commit standards, Docker build, and integration tests all passing for that exact SHA on main/dev. Uses polling rather than re-running tests.
  2. deploy-to-stage — Verifies SHA hasn't changed, posts to #www, pushes to stage, waits for build + both integration test runs. If pause_on_staging=true, posts an approval prompt linking to the run.
  3. prod-approval-gate — Only when pause_on_staging=true; uses the prod GitHub environment's required reviewers to pause until manually approved.
  4. deploy-to-prod — Verifies SHA again, runs bin/tag-release.sh --ci --push to create the YYYY-MM-DD[.X] tag and push to prod, then waits for the prod build and both prod integration test runs.
  5. notify-completion — Always runs; posts final success/failure summary to both #www and #www-notify, with a distinct message if the tag was pushed but post-deployment verification failed.

Test plan

  • Trigger workflow on a branch and verify it gates correctly on preflight checks
  • Verify Slack messages appear in #www and #www-notify at each step
  • Test pause_on_staging=true: confirm workflow pauses at the prod gate and the #www approval message includes a working link
  • Test pause_on_staging=false: confirm workflow proceeds automatically after stage passes
  • Test a failure scenario (e.g. failing unit tests) and verify the workflow aborts with a Slack failure notification
  • Confirm bin/tag-release.sh --ci --push produces the correct tag and that the interactive mode still works unchanged
  • Add required reviewers to the prod GitHub environment (teams: bedrock-codeowners-backend, bedrock-codeowners-frontend)

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.74%. Comparing base (43a679c) to head (774e811).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #17139   +/-   ##
=======================================
  Coverage   81.74%   81.74%           
=======================================
  Files         174      174           
  Lines        9347     9347           
=======================================
  Hits         7641     7641           
  Misses       1706     1706           

@stevejalim stevejalim requested a review from Copilot April 16, 2026 16:14
Replaces the current manual 12-step release process with a single
workflow_dispatch-triggered GitHub Actions pipeline that handles the
full deployment cycle from preflight checks through to production.

## What the workflow does

The workflow (`.github/workflows/release.yml`) runs five jobs in sequence:

1. **preflight-checks** — Resolves the HEAD SHA of main, posts a
   "Starting release" message to #www and #www-notify with a count of
   pending commits, then gates on four CI signals for that exact SHA:
   unit tests, pre-commit standards, Docker image build (dev), and at
   least one integration test run for main/dev. All checks poll the GHA
   API rather than re-running tests themselves.

2. **deploy-to-stage** — Verifies HEAD still matches the release SHA
   (guards against a concurrent push to main), posts "Pushing to Bedrock
   stage" to #www, then pushes the SHA to the `stage` branch. Waits for
   the `build-and-push` workflow to complete for that commit on stage,
   then waits for both integration test runs dispatched by the deployment
   infrastructure to succeed. If `pause_on_staging=true` was requested,
   also posts an approval prompt to #www with a link to the workflow run.

3. **prod-approval-gate** — Only runs when `pause_on_staging=true`.
   References the `prod` GitHub environment, which should have required
   reviewers configured; the workflow pauses here until a reviewer
   approves in the GHA UI.

4. **deploy-to-prod** — Verifies HEAD again, then runs
   `bin/tag-release.sh --ci --push` to create a `YYYY-MM-DD[.X]` tag
   and push it along with HEAD to the `prod` branch. Posts "Pushing
   Bedrock Prod, tagged X" to #www, then waits for the prod Docker build
   and both prod integration test runs to succeed.

5. **notify-completion** — Always runs regardless of earlier job
   outcomes; posts a final success or failure summary to both #www and
   #www-notify. Distinguishes the case where the tag was pushed but
   post-deployment checks failed (partial completion).
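
The "poll rather than re-run" gating described above can be sketched as a small retry helper. This is only an illustration, not the code in `release.yml`: the function name `poll_until`, its arguments, and the retry limits are all assumptions made for this example.

```shell
# Hypothetical helper (not taken from release.yml): retry a check command
# until it succeeds or the attempt budget is used up.
# Usage: poll_until <max_attempts> <sleep_seconds> <command> [args...]
poll_until() {
  local max_attempts sleep_seconds attempt
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the check; any zero exit status counts as "signal is green".
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$sleep_seconds"
    fi
    attempt=$((attempt + 1))
  done
  # The check never succeeded within the budget.
  return 1
}
```

A gate would then look something like `poll_until 60 30 unit_tests_passed "$RELEASE_SHA"`, where `unit_tests_passed` stands in for a function that queries the GHA API for that SHA. Keeping the check command separate from the retry loop means the same helper could serve all four preflight signals.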

## Changes to bin/tag-release.sh

Adds a `--ci` / `-c` flag for non-interactive use:
- If main and stage branches do not match, exits non-zero with a clear
  error message instead of prompting for "Override".
- Skips the "Did tests pass on staging?" confirmation prompt (the
  workflow has already verified this via the GHA API).
- Writes `tag=YYYY-MM-DD[.X]` to `$GITHUB_OUTPUT` when that env var is
  present, so the workflow can capture and report the tag name.

Interactive mode (default, no `--ci`) is unchanged.
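
As a rough sketch of the tag naming and the `$GITHUB_OUTPUT` handoff described above: the function names below are hypothetical, and the suffix rule (bare date first, then `.1`, `.2`, and so on) is an assumption based on the `YYYY-MM-DD[.X]` format, not a copy of what `bin/tag-release.sh` does.

```shell
# Hypothetical sketch: pick the next free YYYY-MM-DD[.X] tag for a given day.
# Usage: next_release_tag <day> [existing tags...]
next_release_tag() {
  local day suffix candidate tag taken
  day=$1
  shift
  suffix=0
  candidate=$day
  while :; do
    taken=0
    # Check whether the candidate is already among the existing tags.
    for tag in "$@"; do
      if [ "$tag" = "$candidate" ]; then
        taken=1
        break
      fi
    done
    if [ "$taken" -eq 0 ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
    # Assumed convention: first tag is the bare date, later ones add .1, .2, ...
    suffix=$((suffix + 1))
    candidate="$day.$suffix"
  done
}

# Record the tag for later workflow steps, but only when GITHUB_OUTPUT is set.
emit_tag_output() {
  if [ -n "${GITHUB_OUTPUT:-}" ]; then
    printf 'tag=%s\n' "$1" >> "$GITHUB_OUTPUT"
  fi
}
```

For example, `next_release_tag 2026-04-17 2026-04-17` prints `2026-04-17.1`. Outside GitHub Actions, `emit_tag_output` does nothing, so interactive use is unaffected.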

## Security

- Workflow-level permissions are `contents: read` + `actions: read`;
  only the two deployment jobs that push branches/tags elevate to
  `contents: write`.
- All `${{ }}` expressions used in `run:` steps are passed via `env:`
  to avoid shell injection.
- `actions/checkout` is pinned to a specific commit SHA.
- `persist-credentials: false` is set on read-only checkouts;
  deployment checkouts retain credentials (needed for git push) with
  zizmor ignore annotations explaining why.
- A workflow-level `concurrency: release` group prevents overlapping
  release runs.
- Checked with `zizmor --pedantic`: 0 findings (2 intentional ignores
  for push checkouts that require credential persistence).

## Environment configuration required

The `prod` GitHub environment should have required reviewers added
(teams: bedrock-codeowners-backend and bedrock-codeowners-frontend)
to activate the `pause_on_staging` approval gate.
# Prevent concurrent release runs to avoid overlapping deployments.
concurrency:
  group: release
  cancel-in-progress: false
Contributor Author

Note to self: maybe this should be cancel-in-progress: true in case we want to stop the release bus and let a newer version of main reach production ASAP

Contributor Author

Answer to self: nope

The main risk is that the release workflow performs irreversible side effects at multiple points — and cancellation can happen after some but not all of them have completed.

The dangerous scenarios:

  1. Cancelled after pushing to stage but before prod — Stage gets deployed with Run 1's SHA, then Run 2 pushes a different SHA to stage. Two builds/test cycles run simultaneously on stage, and Run 2's polling could match Run 1's results.
  2. Cancelled after tag-release.sh runs — This is the worst case. A prod tag is pushed, triggering Docker builds and actual production deployment, but Run 1 never verifies the integration tests passed. Production is running unverified code. Then Run 2 creates a second tag (e.g. 2026-04-17.1), so you get two prod releases for the same day.
  3. Cancelled mid-tag-release.sh (between tag push and prod branch push) — The tag exists but the prod branch wasn't updated, leaving an inconsistent state.

Key detail: The downstream workflows (build-and-push, integration_tests, etc.) are triggered by the git pushes/tags themselves — they keep running even after the release workflow is cancelled.

Bottom line: cancel-in-progress: false (current) is the safe choice. Queuing ensures each release completes or fails cleanly before the next starts. Changing to true risks unverified prod deployments and orphaned tags.

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a GitHub Actions–driven release pipeline intended to automate the Bedrock release process end-to-end (preflight → stage → optional approval gate → prod) and updates the existing tagging script to support non-interactive CI usage.

Changes:

  • Added a new workflow_dispatch release workflow that posts Slack notifications, polls CI status, deploys to stage/prod, and performs post-deploy verification.
  • Updated bin/tag-release.sh with a --ci mode to skip prompts and to emit the created tag via $GITHUB_OUTPUT.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| .github/workflows/release.yml | New automated release workflow with preflight gates, stage/prod deploy, approval gate, and Slack notifications. |
| bin/tag-release.sh | Adds CI mode behavior and exports the release tag for GitHub Actions consumption. |

Comment thread .github/workflows/release.yml
Comment on lines +346 to +352
RUNS=$(gh run list \
  --repo mozilla/bedrock \
  --workflow integration_tests.yml \
  --json displayTitle,conclusion,status,createdAt \
  | jq --arg since "$STAGE_BUILD_DONE_TIME" \
  '[.[] | select(.displayTitle | test("Integration tests for stage")) | select(.createdAt >= $since)]')
SUCCESS_COUNT=$(echo "$RUNS" | jq '[.[] | select(.conclusion == "success")] | length')
- Fix preflight-checks checkout to explicitly use ref: main, so the
  release SHA is always resolved from main regardless of which ref
  triggered the workflow_dispatch
- Add actions: read to deploy-to-stage and deploy-to-prod job
  permissions; job-level permissions replace (not merge) workflow-level
  defaults, so polling via gh run list would have failed without it
- Update integration_tests.yml run-name to include the dispatched
  git_sha: "Integration tests for <branch> @ <sha>". Update all three
  integration test polling filters in release.yml to match on the SHA
  in the displayTitle, preventing false positives from unrelated runs
  on the same branch
- Improve origin/prod unavailable handling in commit count step: detect
  missing ref explicitly instead of silently reporting 0 commits pending
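
To illustrate the SHA-based title matching from the third item, here is a minimal plain-shell sketch. The workflow itself filters `gh run list` output with `jq`, so this is not the actual filter; the function name `title_matches_release` is hypothetical, and allowing trailing text after the SHA is an assumption.

```shell
# Hypothetical sketch: does a run's displayTitle belong to this branch and SHA?
# Usage: title_matches_release <display_title> <branch> <sha>
title_matches_release() {
  case "$1" in
    # The quoted part is matched literally; the trailing * allows extra text
    # after the SHA (for example when only a SHA prefix is passed in).
    "Integration tests for $2 @ $3"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A branch-only pattern such as "Integration tests for stage" would accept any stage run, including one for an older commit. Requiring the SHA in the title as well rules those runs out.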
Comment thread .github/workflows/release.yml Outdated
pause_on_staging:
  description: "Pause after stage deployment for manual QA before pushing to prod"
  type: boolean
  default: false
@stevejalim can we set default: true here for a human manual QA review prior to prod?

Contributor Author

We could do, sure. The real-world human step usually just looks through the results of our integration-tests GHA, rather than clicking around the site, though, so I went with what's closer to our reality right now.

@stevejalim stevejalim requested a review from Copilot April 17, 2026 11:10
Contributor

Copilot AI left a comment

Pull request overview

This PR automates Bedrock’s previously manual release process by adding a GitHub Actions workflow_dispatch pipeline that performs preflight verification, stage deploy + verification, optional approval gating, production deploy/tagging, and final Slack notifications.

Changes:

  • Added a new .github/workflows/release.yml workflow to drive the end-to-end release flow (including polling existing CI runs instead of re-running them).
  • Updated bin/tag-release.sh with a --ci mode for non-interactive usage and to emit the created tag via $GITHUB_OUTPUT.
  • Updated the integration test workflow run name to include both branch and deployed SHA, enabling the release workflow’s polling/lookup logic.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| bin/tag-release.sh | Adds CI/non-interactive behavior and outputs the created tag for downstream workflow steps. |
| .github/workflows/release.yml | Introduces the orchestrated release workflow with gating, deploy, optional approval, and Slack notifications. |
| .github/workflows/integration_tests.yml | Adjusts run-name to include SHA so release workflow polling can match the correct runs. |

Ensures deploy-to-stage, deploy-to-prod, and notify-completion only
run in the canonical mozilla/bedrock repo, preventing failures in forks
due to missing secrets or insufficient permissions.
Contributor Author

@stevejalim stevejalim left a comment

Merging this to give it a try

@stevejalim stevejalim merged commit 559d296 into main Apr 17, 2026
5 checks passed
@stevejalim stevejalim deleted the WT-1042--gha-release-trigger branch April 17, 2026 11:50