
feat: Single workflow and actions optimizations #2140

Merged
taddes merged 24 commits into master from feat/gh-single-workflow-optimizations-STOR-512
Mar 27, 2026

@taddes (Collaborator) commented Mar 17, 2026

Description

Consolidates the GitHub Actions for Sync into a single workflow and optimizes how the jobs run.

Cuts down the run times considerably, increases cache usage and gets us faster feedback across the board.

Runtimes

  • Total workflow: ~22 min (the longest run is the Spanner unit tests).
  • build-and-unit-test-*: can run as fast as 6 min with cache.
  • Spanner: build and test runs used to take ~45 min. Now the build takes 19 s with cache, e2e tests take 5 min, and unit tests take 6 to 19 min (depending on cache).
  • Postgres: build and test runs used to take ~25 min. Now the build takes 18 s with cache, e2e tests take 1 min 40 s, and unit tests take 6 to 11 min (depending on cache).

Workflow Consolidation

The four separate workflow files (checks.yml, postgres.yml, mysql.yml, spanner.yml) have been merged into a single main-workflow.yml. Previously each database backend ran as an independent workflow with its own triggers, duplicate setup steps, and no shared structure. All three backend pipelines now run as concurrent jobs within one workflow.

Triggers remain unchanged: push to master/main, tags, and pull requests. A workflow_dispatch input was added to allow manual runs with a custom python-version or rust-version.
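
For reviewers who want to picture the trigger block, here is a minimal sketch of what it might look like. The branch names come from this PR; the tag pattern and the input descriptions are assumptions, not copied from main-workflow.yml.

```yaml
# Trigger sketch for main-workflow.yml.
# Assumptions: the "*" tag pattern and the input descriptions.
on:
  push:
    branches:
      - master
      - main
    tags:
      - "*"
  pull_request:
  workflow_dispatch:
    inputs:
      python-version:
        description: "Python version to use for this run"
        required: false
        type: string
      rust-version:
        description: "Rust toolchain version to use for this run"
        required: false
        type: string
```

Leaving both inputs optional means a manual run with no values behaves like a normal push or pull request run.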


New: Automatic Cache Cleanup (cleanup-branch-cache.yml)

A new workflow deletes all Actions cache entries associated with a branch when its pull request is closed, whether merged or abandoned. This prevents stale cache entries from accumulating across long-lived feature branches. It uses gh cache list scoped to the branch ref and deletes each entry by ID, with || true guards so that individual deletion failures don't fail the job.
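
A rough sketch of such a cleanup workflow, assuming the caches are scoped to the pull request merge ref. The ref format, the --limit value, and the job and step names are assumptions; only the gh cache list / gh cache delete approach and the || true guard are taken from the description above.

```yaml
# cleanup-branch-cache.yml (sketch; ref format, limit and names are assumptions)
name: Cleanup branch cache
on:
  pull_request:
    types:
      - closed          # fires for merged and abandoned pull requests alike

permissions: {}

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      actions: write    # required to delete cache entries
    steps:
      - name: Delete caches for the closed pull request
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
          REF: refs/pull/${{ github.event.pull_request.number }}/merge
        run: |
          # List cache IDs scoped to this ref, then delete them one by one.
          for id in $(gh cache list --ref "$REF" --limit 100 --json id --jq '.[].id'); do
            # Ignore individual failures so one bad entry does not fail the job.
            gh cache delete "$id" || true
          done
```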


Job Structure

The consolidated workflow runs the following jobs:

| Stage | Jobs |
|---|---|
| Setup | python-env, rust-env |
| Checks | python-checks, rust-checks, clippy (matrix: postgres/mysql/spanner) |
| Build & Test | build-and-unit-test-postgres, build-and-unit-test-mysql, build-and-unit-test-spanner |
| Image Build | build-postgres-image, build-mysql-image, build-spanner-image |
| E2E | postgres-e2e-tests, mysql-e2e-tests, spanner-e2e-tests |

Clippy runs as a matrix across all three backends in parallel. All three database pipelines (build, test, image, e2e) run fully concurrently with each other.
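
As an illustration of the clippy matrix, a sketch along these lines would fan out one job per backend. The cargo feature names, the fail-fast setting, and the exact clippy flags are assumptions; the real job may call a Makefile target instead.

```yaml
jobs:
  clippy:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # assumption: let each backend report independently
      matrix:
        backend: [postgres, mysql, spanner]
    steps:
      - uses: actions/checkout@v4
      - name: Run clippy for ${{ matrix.backend }}
        # Hypothetical invocation: the feature name per backend is assumed.
        run: >
          cargo clippy --workspace --all-targets
          --no-default-features --features "${{ matrix.backend }}"
          -- -D warnings
```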


Caching

Significant caching was added to reduce redundant work across runs:

  • Rust toolchain: cached by RUST_VERSION; downstream jobs restore without reinstalling
  • Python/Poetry virtualenv: ~/.cache/pip and ~/.cache/pypoetry/virtualenvs cached by lockfile hash; pip3 install poetry remains explicit in jobs that need the binary
  • Cargo build artifacts: ~/.cargo/registry and target/ cached per-backend (cargo-postgres-, cargo-mysql-, cargo-spanner-) with restore-keys fallback for partial hits on Cargo.lock changes
  • Docker image tars: each backend's built image is cached as a .tar file keyed on Dockerfile, Cargo.lock, all .rs/Cargo.toml files, tools/**, and scripts/**; image build and Buildx setup steps are skipped on cache hit
  • cargo-audit: cached; only reinstalled when main-workflow.yml changes
  • mdbook + mdbook-mermaid: cached; only reinstalled when Makefile changes
  • cargo-llvm-cov: cached per Cargo.lock hash in all three unit test jobs; previously compiled from source every run
  • cargo-nextest: cached per Cargo.lock hash in all three unit test jobs; previously downloaded on every run
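
To make two of these entries concrete, here is a sketch for the Postgres backend: the per-backend Cargo cache with a restore-keys fallback, and the image tar cache that skips Buildx setup and the build on a hit. The key prefix cargo-postgres- and the key inputs come from the list above; the step ids, the tar path, the image tag, and the image- key prefix are assumptions.

```yaml
steps:
  - uses: actions/checkout@v4

  # Cargo registry and build output, keyed per backend on Cargo.lock.
  - name: Cache Cargo artifacts (postgres)
    uses: actions/cache@v4
    with:
      path: |
        ~/.cargo/registry
        target/
      key: cargo-postgres-${{ hashFiles('**/Cargo.lock') }}
      # Partial hit: reuse the newest postgres cache when Cargo.lock changes.
      restore-keys: |
        cargo-postgres-

  # Built image stored as a tar file (path and key prefix are hypothetical).
  - name: Cache Docker image tar
    id: image-cache
    uses: actions/cache@v4
    with:
      path: /tmp/postgres-image.tar
      key: image-postgres-${{ hashFiles('Dockerfile', 'Cargo.lock', '**/*.rs', '**/Cargo.toml', 'tools/**', 'scripts/**') }}

  # Both steps below are skipped when the tar was restored from cache.
  - name: Set up Buildx
    if: steps.image-cache.outputs.cache-hit != 'true'
    uses: docker/setup-buildx-action@v3

  - name: Build image and save as tar
    if: steps.image-cache.outputs.cache-hit != 'true'
    run: |
      # Hypothetical image tag; --load makes the image visible to docker save.
      docker buildx build --load -t syncstorage-postgres:ci .
      docker save syncstorage-postgres:ci -o /tmp/postgres-image.tar
```

Without restore-keys, any change to Cargo.lock would be a full miss. With it, the job starts from the previous target/ directory and only rebuilds what changed.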

Security & Permissions

Workflow-level permissions: {} denies all token permissions by default. Each job grants only what it needs (contents: read, checks: write, actions: write where applicable). The rust-env setup job no longer checks out the repository, as it only installs the Rust toolchain and needs no repo contents.
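
A sketch of the deny-by-default layout. The permission names come from the paragraph above; which job receives which permission is an assumption made for illustration.

```yaml
# Workflow level: deny every token permission by default.
permissions: {}

jobs:
  rust-checks:
    runs-on: ubuntu-latest
    permissions:
      contents: read    # needed by actions/checkout
    steps:
      - uses: actions/checkout@v4

  build-and-unit-test-postgres:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      checks: write     # assumed: used to publish test results as check runs
    steps:
      - uses: actions/checkout@v4
```

A job that declares no permissions block inherits the empty workflow-level set, so a newly added job gets no token access until someone grants it explicitly.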


Dependabot

Dependabot is configured for all three ecosystems with weekly updates and open-pull-requests-limit: 1 to avoid PR floods:

  • Cargo — grouped into dev-deps and prod-deps to reduce noise from patch bumps
  • pip/Poetry — all Python directories grouped under a single python-deps group
  • GitHub Actions — all action pin updates grouped under actions-deps
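
For reference, a dependabot.yml along these lines would match the description. The group names, weekly interval, and pull request limit come from this PR; the directory values and the rules that assign dependencies to each group are assumptions.

```yaml
# .github/dependabot.yml (sketch; directories and group rules are assumptions)
version: 2
updates:
  - package-ecosystem: "cargo"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 1
    groups:
      prod-deps:
        dependency-type: "production"
      dev-deps:
        dependency-type: "development"

  - package-ecosystem: "pip"
    directory: "/"          # the real config covers each Python directory
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 1
    groups:
      python-deps:
        patterns:
          - "*"

  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 1
    groups:
      actions-deps:
        patterns:
          - "*"
```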

In deciding which build and test jobs should be held back when an earlier job fails, I settled on a middle ground: I only gate the image builds (expensive, long-running) behind the checks, but let the unit tests and e2e tests run concurrently. This gives us broader feedback when there is a problem with the code, without a long wait for it.
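
The gating can be sketched with needs for one backend. This is a fragment: the setup and check jobs it refers to are omitted, the placeholder steps are not the real ones, and the exact needs lists are assumptions based on the description here and the job table above.

```yaml
jobs:
  build-and-unit-test-postgres:
    # Not gated on the check jobs, so unit tests start alongside them.
    needs: [python-env, rust-env]
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and run unit tests"    # placeholder

  build-postgres-image:
    # Expensive job: only runs once all checks have passed.
    needs: [python-checks, rust-checks, clippy]
    runs-on: ubuntu-latest
    steps:
      - run: echo "build image"                 # placeholder

  postgres-e2e-tests:
    # The e2e tests need the built image.
    needs: [build-postgres-image]
    runs-on: ubuntu-latest
    steps:
      - run: echo "run e2e tests"               # placeholder
```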

Issue(s)

Closes STOR-512.

@taddes taddes self-assigned this Mar 17, 2026
@taddes taddes force-pushed the feat/gh-single-workflow-optimizations-STOR-512 branch 6 times, most recently from d197773 to 22d33ac Compare March 23, 2026 20:32
@taddes taddes force-pushed the feat/gh-single-workflow-optimizations-STOR-512 branch 2 times, most recently from 63fb7b4 to c3066dc Compare March 26, 2026 17:35
@taddes taddes changed the title [WIP] feat: Single workflow and actions optimizations feat: Single workflow and actions optimizations Mar 26, 2026
@taddes taddes force-pushed the feat/gh-single-workflow-optimizations-STOR-512 branch from c3066dc to 2e21c38 Compare March 26, 2026 17:40
@taddes taddes marked this pull request as ready for review March 26, 2026 17:51
@taddes taddes requested review from chenba and pjenvey March 26, 2026 18:53
@taddes taddes force-pushed the feat/gh-single-workflow-optimizations-STOR-512 branch from b593772 to 9c5e107 Compare March 26, 2026 20:17
path: |
  ~/.rustup/toolchains
  ~/.rustup/update-hashes
key: ${{ runner.os }}-rust-toolchain-${{ env.RUST_VERSION }}
Collaborator

Maybe this could use a cache-hit != 'true' fallback like the other actions/cache actions below?

Collaborator Author

Glad you pointed this out, since it's more useful for the cache and install steps in the rust-env job; that's the expensive operation that downloads components. Skipping it on a cache hit saves real time.

The "Restore Rust toolchain" steps in downstream jobs don't install anything — they just restore cached files. The step that follows, rustup default, only switches which toolchain is active (no download, so instant). There's nothing expensive to skip, so no guard is needed there, but I added it for the Install Rust toolchain 👍
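
A sketch of the two cases discussed here. The cache paths and key come from the snippet under review; the step ids, the rustup install flags, and the use of actions/cache/restore in downstream jobs are assumptions.

```yaml
# rust-env job: the install is the expensive step, so guard it on cache-hit.
- name: Cache Rust toolchain
  id: rust-toolchain-cache
  uses: actions/cache@v4
  with:
    path: |
      ~/.rustup/toolchains
      ~/.rustup/update-hashes
    key: ${{ runner.os }}-rust-toolchain-${{ env.RUST_VERSION }}

- name: Install Rust toolchain
  if: steps.rust-toolchain-cache.outputs.cache-hit != 'true'
  run: rustup toolchain install "$RUST_VERSION" --profile minimal --component clippy

# Downstream jobs: restore only, then switch the active toolchain.
# Nothing is downloaded here, so no guard is needed.
- name: Restore Rust toolchain
  uses: actions/cache/restore@v4
  with:
    path: |
      ~/.rustup/toolchains
      ~/.rustup/update-hashes
    key: ${{ runner.os }}-rust-toolchain-${{ env.RUST_VERSION }}

- name: Select toolchain
  run: rustup default "$RUST_VERSION"
```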

push:
  branches:
    - master
    - main
Collaborator

Future-proofing?

Collaborator Author

Indeed. We keep knocking around the idea to finally convert and never do, so perhaps this will remind us someday 😆

@taddes taddes force-pushed the feat/gh-single-workflow-optimizations-STOR-512 branch from 9c5e107 to ef9ec10 Compare March 27, 2026 00:50
destination: ecosystem-test-eng-metrics/syncstorage-rs/junit
glob: "*.xml"
parent: false
process_gcloudignore: false
Collaborator

Is it possible to break the file up into smaller ones?

Collaborator Author

I can look into this, perhaps composite actions or reusable actions that can be pulled into a central workflow? If it seems like something doable, I can file a quick ticket because yeah, that's the main disadvantage: this big file (just like our .circleci.yaml was)
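
One possible shape for that follow-up, purely as a sketch and not something this PR implements: a reusable workflow (workflow_call) holding the per-backend pipeline, called once per backend from the main workflow. The file name backend-pipeline.yml, the backend input, and the placeholder step are all hypothetical.

```yaml
# File 1 (hypothetical): .github/workflows/backend-pipeline.yml
on:
  workflow_call:
    inputs:
      backend:
        required: true
        type: string

jobs:
  build-and-unit-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "Build and test ${{ inputs.backend }}"   # placeholder step
---
# File 2: main-workflow.yml calls the reusable workflow once per backend.
on:
  pull_request:

jobs:
  postgres:
    uses: ./.github/workflows/backend-pipeline.yml
    with:
      backend: postgres
  mysql:
    uses: ./.github/workflows/backend-pipeline.yml
    with:
      backend: mysql
  spanner:
    uses: ./.github/workflows/backend-pipeline.yml
    with:
      backend: spanner
```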

@chenba (Collaborator) left a comment

👍 Thanks. Looks great as a single workflow.

taddes (Collaborator, Author) commented Mar 27, 2026

> 👍 Thanks. Looks great as a single workflow.

Thanks! And it runs faster, thank goodness.

@taddes taddes merged commit cf1d30f into master Mar 27, 2026
23 checks passed
@taddes taddes deleted the feat/gh-single-workflow-optimizations-STOR-512 branch March 27, 2026 14:17
chenba (Collaborator) commented Mar 27, 2026

> > 👍 Thanks. Looks great as a single workflow.
>
> Thanks! And it runs faster, thank goodness.

The gain is from running what were the first two serial steps concurrently now, correct? E.g. "build-and-test-spanner" and "build-spanner-image" now run concurrently, whereas in the past "build-spanner-image" only ran once "build-and-test-spanner" succeeded.

taddes (Collaborator, Author) commented Mar 27, 2026

> > > 👍 Thanks. Looks great as a single workflow.
> >
> > Thanks! And it runs faster, thank goodness.
>
> The gain is from running what were the first two serial steps concurrently now, correct? E.g. "build-and-test-spanner" and "build-spanner-image" now run concurrently, whereas in the past "build-spanner-image" only ran once "build-and-test-spanner" succeeded.

Yes, and the end-to-end testing waits for the image to be built. Additionally, the testing begins while the clippy and Python checks run, saving us some time. Much of the gain is also from more caching, since there were a lot of build artifacts (especially for Rust) that were not cached. This saves more time while developing features than at the final merge, and the gain depends on the changes being made to the code and dependencies.
