feat: Single workflow and actions optimizations #2140
path: |
  ~/.rustup/toolchains
  ~/.rustup/update-hashes
key: ${{ runner.os }}-rust-toolchain-${{ env.RUST_VERSION }}
Maybe this could use a cache-hit != 'true' fallback like the other actions/cache actions below?
Glad you pointed this out, since it's more useful for the cache and install steps in the rust-env job; that's the expensive operation that downloads components. Skipping it on a cache hit saves real time.
The "Restore Rust toolchain" steps in downstream jobs don't install anything — they just restore cached files. The step that follows, rustup default, only switches which toolchain is active (no download, so instant). There's nothing expensive to skip, so no guard is needed there, but I added it for the Install Rust toolchain 👍
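For reference, the guard discussed in this thread could look roughly like the sketch below. The cache path and key come from the snippet above; the step names, the step id, the actions/cache version, and the rustup command are assumptions for illustration, not copied from the workflow.

```yaml
# Sketch only: step names, the step id, and the install command are assumed.
- name: Cache Rust toolchain
  id: rust-toolchain-cache          # hypothetical id
  uses: actions/cache@v4
  with:
    path: |
      ~/.rustup/toolchains
      ~/.rustup/update-hashes
    key: ${{ runner.os }}-rust-toolchain-${{ env.RUST_VERSION }}

- name: Install Rust toolchain
  # Skip the download when the cache already restored the toolchain.
  if: steps.rust-toolchain-cache.outputs.cache-hit != 'true'
  run: rustup toolchain install "$RUST_VERSION" --profile minimal
```

Downstream jobs that only restore the cache and run rustup default would not need the if: guard, since nothing is downloaded there.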
push:
  branches:
    - master
    - main
Indeed. We keep knocking around the idea to finally convert and never do, so perhaps this will remind us someday 😆
… warning, add llvm to rust toolchain
destination: ecosystem-test-eng-metrics/syncstorage-rs/junit
glob: "*.xml"
parent: false
process_gcloudignore: false
Is it possible to break the file up into smaller ones?
I can look into this, perhaps composite actions or reusable actions that can be pulled into a central workflow? If it seems like something doable, I can file a quick ticket because yeah, that's the main disadvantage: this big file (just like our .circleci.yaml was)
chenba
left a comment
👍 Thanks. Looks great as a single workflow.
Thanks! And it runs faster, thank goodness.
The gain is from running what were the first two serial steps concurrently now, correct? E.g. "build-and-test-spanner" and "build-spanner-image" now run concurrently, whereas in the past "build-spanner-image" only ran once "build-and-test-spanner" succeeded.
Yes, and the end-to-end testing waits for the image to be built. Additionally, the testing begins while the clippy and Python checks run, saving us some time. Much of the gain is also from more caching, since there were a lot of build artifacts (especially for Rust) that were not cached. This saves more time while developing features than at the final merge, and the savings will depend on the changes being made to the code and dependencies.
Description
Updates the GitHub Actions for Sync to a single workflow and optimizes how the jobs run.
Cuts down the run times considerably, increases cache usage and gets us faster feedback across the board.

Runtimes
Total workflow (longest run is Spanner unit tests) is ~22 min


build-and-unit-test-**: can run as fast as 6 min with cache.
Spanner: build and test runs would take ~45 min. Now build takes 19 s with cache, 5 min for e2e tests and 6 to 19 min for unit tests (depending on cache).
Postgres: build and test runs would take ~25 min. Now build takes 18 s with cache, 1 min 40 s for e2e tests and 6 to 11 min for unit tests (depending on cache).
Workflow Consolidation
The four separate workflow files (checks.yml, postgres.yml, mysql.yml, spanner.yml) have been merged into a single main-workflow.yml. Previously each database backend ran as an independent workflow with its own triggers, duplicate setup steps, and no shared structure. All three backend pipelines now run as concurrent jobs within one workflow.
Triggers remain unchanged: push to master/main, tags, and pull requests. A workflow_dispatch input was added to allow manual runs with a custom python-version or rust-version.
New: Automatic Cache Cleanup (cleanup-branch-cache.yml)
A new workflow deletes all Actions cache entries associated with a branch when its pull request is closed, whether merged or abandoned. This prevents stale cache accumulation across long-lived feature branches. It uses gh cache list scoped to the branch ref and deletes each entry by ID, with || true guards so individual deletion failures don't fail the job.
Job Structure
The consolidated workflow runs the following jobs:
- python-env, rust-env
- python-checks, rust-checks, clippy (matrix: postgres/mysql/spanner)
- build-and-unit-test-postgres, build-and-unit-test-mysql, build-and-unit-test-spanner
- build-postgres-image, build-mysql-image, build-spanner-image
- postgres-e2e-tests, mysql-e2e-tests, spanner-e2e-tests
Clippy runs as a matrix across all three backends in parallel. All three database pipelines (build, test, image, e2e) run fully concurrently with each other.
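As an illustration of the clippy matrix described above, a job of this shape fans out into one run per backend. This is a minimal sketch: the runner, the dependency on rust-env, the fail-fast setting, and the placeholder command are assumptions, not taken from main-workflow.yml.

```yaml
# Sketch only: runner, needs, fail-fast, and the run command are assumed.
jobs:
  clippy:
    needs: rust-env
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # assumed: let each backend report independently
      matrix:
        backend: [postgres, mysql, spanner]
    steps:
      - uses: actions/checkout@v4
      - name: Clippy (${{ matrix.backend }})
        # Placeholder for the real clippy invocation for this backend.
        run: echo "run clippy for the ${{ matrix.backend }} backend here"
```

Each matrix entry becomes its own job run, so the three backends are linted in parallel and a job that lists clippy in needs waits for all three.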
Caching
Significant caching was added to reduce redundant work across runs:
- Rust toolchain keyed on RUST_VERSION; downstream jobs restore without reinstalling
- ~/.cache/pip and ~/.cache/pypoetry/virtualenvs cached by lockfile hash; pip3 install poetry remains explicit in jobs that need the binary
- ~/.cargo/registry and target/ cached per-backend (cargo-postgres-, cargo-mysql-, cargo-spanner-) with restore-keys fallback for partial hits on Cargo.lock changes
- .tar file keyed on Dockerfile, Cargo.lock, all .rs/Cargo.toml files, tools/**, and scripts/**; image build and Buildx setup steps are skipped on cache hit
- cargo-audit: cached; only reinstalled when main-workflow.yml changes
- mdbook + mdbook-mermaid: cached; only reinstalled when Makefile changes
- cargo-llvm-cov: cached per Cargo.lock hash in all three unit test jobs; previously compiled from source every run
- cargo-nextest: cached per Cargo.lock hash in all three unit test jobs; previously downloaded on every run
Security & Permissions
Workflow-level permissions: {} denies all token permissions by default. Each job grants only what it needs (contents: read, checks: write, actions: write where applicable). The rust-env setup job no longer checks out the repository, as it only installs the Rust toolchain and needs no repo contents.
Dependabot
Dependabot is configured for all three ecosystems with weekly updates and open-pull-requests-limit: 1 to avoid PR floods:
- dev-deps and prod-deps groups, to reduce noise from patch bumps
- python-deps group
- actions-deps
In deciding which build and test jobs should be held back when checks fail, I settled on a middle ground: I only gate the image builds (expensive, long-running) behind checks, but let unit tests / e2e tests run concurrently. This means we get more feedback across the board should there be a problem with the code, and we don't have to wait as long for it.
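The gating described above can be expressed with needs. The fragment below is a sketch under assumptions: the job names are the ones listed in this description, but the exact needs lists, the runner, and the placeholder steps are guesses, and the referenced check and setup jobs are assumed to be defined elsewhere in the same workflow.

```yaml
# Sketch only: needs lists, runner, and step contents are assumed.
jobs:
  build-postgres-image:
    # Gated: waits for the checks (including every clippy matrix leg) to pass.
    needs: [python-checks, rust-checks, clippy]
    runs-on: ubuntu-latest
    steps:
      - run: echo "build the image here"

  build-and-unit-test-postgres:
    # Not gated on the checks: assumed to depend only on the toolchain setup job.
    needs: rust-env
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and run unit tests here"
```

With this shape a failing check stops the expensive image build from starting, while the unit test job still runs and reports its own result.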
Issue(s)
Closes STOR-512.