ci: Connector Registry 2.0 Production Launch (replaces legacy metadata_service)#75224
…ta_service
- Remove legacy Poetry-based registry entry generation (OSS + Cloud) from publish_connectors.yml
- Promote ops CLI generate + publish steps as the primary registry pipeline
- Remove continue-on-error: true from ops CLI steps (now required, not soft-launch)
- Change REGISTRY_STORE from coral:dev/soft-launch-trial to coral:prod
- Add dry-run skip steps for registry artifact generation and publish
- Remove Poetry and metadata_service install from publish_connector_registry_entries job
- Replace legacy generate-cloud-registry, generate-oss-registry, and post-registry-generation jobs with single ops CLI compile job in generate-connector-registries.yml
- Remove soft-launch naming prefixes from all step names

Co-Authored-By: AJ Steers <aj@airbyte.io>
metadata_service)
Artifact generation is local-only, so there's no reason to skip it during dry-run. Only the publish step (which writes to GCS) needs the dry-run guard. Also removes the now-unnecessary [DRY-RUN] skip step for generation. Co-Authored-By: AJ Steers <aj@airbyte.io>
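The dry-run behaviour described in this commit can be sketched in a few lines of Python. This is only an illustration of the guard placement, not the workflow's actual implementation: `run_registry_steps`, `generate`, and `publish` are hypothetical names, and the real steps are GitHub Actions steps that invoke the ops CLI.

```python
from typing import Callable, List


def run_registry_steps(
    dry_run: bool,
    generate: Callable[[], str],
    publish: Callable[[str], None],
) -> List[str]:
    """Run generation unconditionally; guard only the publish side effect."""
    log: List[str] = []
    artifact = generate()  # local-only, so it is safe to run during a dry-run
    log.append("generated")
    if dry_run:
        # Publishing writes to remote storage, so it is the only guarded step.
        log.append("publish skipped (dry-run)")
    else:
        publish(artifact)
        log.append("published")
    return log


# Hypothetical stand-ins for the real generate/publish steps.
published: List[str] = []
dry_log = run_registry_steps(True, lambda: "entry.json", published.append)
```

With `dry_run=True` the artifact is still generated, which keeps validation coverage, while nothing reaches the `published` list.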
- name: "Promote or Rollback RC: ${{ env.ACTION }} ${{ env.CONNECTOR_NAME }}"
  id: finalize-release-candidate
  shell: bash
  env:
    GCS_CREDENTIALS: ${{ secrets.METADATA_SERVICE_PROD_GCS_CREDENTIALS }}
    REGISTRY_STORE: "coral:prod"
  run: >
    airbyte-ops registry rc "${ACTION}"
    --name "${CONNECTOR_NAME}"
    --store "${REGISTRY_STORE}"
    --with-pr
    --with-store-cleanup
🔴 Missing GitHub App authentication for PR creation in finalize_rollout.yml
The old workflow explicitly authenticated as the OCTAVIA_BOT GitHub App with the comment: "Authenticate as the GitHub App to ensure CI can run. This is necessary because commits created with the built-in GitHub token will not trigger workflows." The token was passed via github_token: ${{ steps.get-app-token.outputs.token }}. The new workflow uses --with-pr (which creates a PR) but removes all GitHub App authentication entirely — no GITHUB_TOKEN or GH_TOKEN env var is passed to the ops CLI step, only GCS_CREDENTIALS and REGISTRY_STORE. This means PRs created by the workflow will either fail (if the CLI requires explicit auth) or use the default GITHUB_TOKEN, which GitHub explicitly does not allow to trigger downstream workflows. This same pattern is confirmed in bump-progressive-rollout-version-command.yml:73-74 which comments: "Important that token is a PAT so that CI checks are triggered again. Without this we would be forever waiting on required checks to pass."
Prompt for agents
In .github/workflows/finalize_rollout.yml, the GitHub App authentication step that existed in the old workflow was removed. The old workflow used actions/create-github-app-token with OCTAVIA_BOT_APP_ID and OCTAVIA_BOT_PRIVATE_KEY secrets to create a token, specifically because PRs/commits created with the default GITHUB_TOKEN do not trigger CI workflows.
To fix this:
1. Add back the GitHub App authentication step before the 'Promote or Rollback RC' step (around line 36, after the 'Install Ops CLI' step):
   - name: Authenticate as GitHub App
     uses: actions/create-github-app-token@d72941d797fd3113feb6b93fd0dec494b13a2547
     id: get-app-token
     with:
       owner: airbytehq
       repositories: airbyte
       app-id: ${{ secrets.OCTAVIA_BOT_APP_ID }}
       private-key: ${{ secrets.OCTAVIA_BOT_PRIVATE_KEY }}
2. Then pass the token to the ops CLI step as an env var (e.g., GITHUB_TOKEN or GH_TOKEN, depending on what the airbyte-ops CLI expects):
   env:
     GCS_CREDENTIALS: ${{ secrets.METADATA_SERVICE_PROD_GCS_CREDENTIALS }}
     REGISTRY_STORE: coral:prod
     GITHUB_TOKEN: ${{ steps.get-app-token.outputs.token }}
This ensures PRs created by --with-pr will trigger CI workflows.
…ttps://git-manager.devin.ai/proxy/github.com/airbytehq/airbyte into devin/1773952030-productionalize-ops-cli-registry
What
Promotes the ops CLI registry pipeline from soft-launch (parallel validation) to production, fully replacing the legacy Poetry-based metadata_service registry generation and the legacy run-airbyte-ci RC promote/rollback steps.

This is the cutover step following successful soft-launch validation where the ops CLI ran alongside the legacy pipeline with continue-on-error: true targeting coral:dev/soft-launch-trial.

Tracking: https://github.com/airbytehq/airbyte-ops-mcp/issues/504
Post-launch test plan: https://github.com/airbytehq/airbyte-ops-mcp/issues/560
Operations docs: https://github.com/airbytehq/airbyte-ops-mcp/issues/561
How
publish_connectors.yml (per-connector artifact generation + publish):
- Remove legacy poetry run metadata_service generate-registry-entry steps (both OSS and Cloud)
- Remove Poetry and metadata_service install from the publish_connector_registry_entries job
- Promote ops CLI generate + publish steps: remove continue-on-error: true, remove soft-launch naming
- Change REGISTRY_STORE from coral:dev/soft-launch-trial to coral:prod
- if: always() so artifacts are available for debugging even on validation failure

generate-connector-registries.yml (full registry compilation):
- Replace legacy jobs generate-cloud-registry, generate-oss-registry, post-registry-generation (4 jobs → 1 job)
- compile job: remove continue-on-error: true, remove soft-launch naming
- Change REGISTRY_STORE from coral:dev/soft-launch-trial to coral:prod
- --with-legacy-migration v1 to clean up legacy cloud.json/oss.json for disabled connectors

finalize_rollout.yml (RC promote/rollback):
- Replace legacy run-airbyte-ci promote + rollback steps with a single ops CLI registry rc command
- (actions/checkout@v4 with fetch-depth: 1)
- GCS_CREDENTIALS
- connector-publish-large → ubuntu-24.04 (ops CLI is lightweight)
- CONNECTOR_NAME job-level env var to handle both workflow_dispatch and repository_dispatch event types
- Add --with-pr and --with-store-cleanup flags to the RC command
- "finalizeRollout" that calls this workflow

Review guide
- publish_connectors.yml: Focus on the removal of legacy steps and the promotion of ops CLI steps. Verify that the publish step's dry-run guard is correct, and that artifact generation intentionally runs unconditionally (it's local-only).
- generate-connector-registries.yml: Verify the single ops CLI compile job covers what the 4 legacy jobs did (cloud registry, oss registry, secrets mask, registry report).
- finalize_rollout.yml: This workflow was NOT soft-launched (parallel execution was skipped due to state mutation concerns). Verify the ops CLI registry rc command correctly handles both promote and rollback actions via the ACTION env var, and that the --with-pr and --with-store-cleanup flags are correct for production use.

- finalize_rollout.yml has no soft-launch history: unlike the other two workflows, this is going directly from legacy to ops CLI. The registry rc command has been tested manually but not via this workflow path in production.
- --with-pr and --with-store-cleanup flags: These were added to the RC command. Confirm these flags are production-ready and correctly implement the workflow's documented behavior (creating an auto-merge PR and cleaning up the release_candidate directory).
- CONNECTOR_NAME expression (${{ github.event_name == 'workflow_dispatch' && github.event.inputs.connector_name || github.event.client_payload.connector_name }}) handles both event types: confirm this is correct for repository_dispatch payloads from the platform.
- --with-pr to create the version-bump PR. Confirm fetch-depth: 1 is sufficient.
- generate-registry-entry accepted a --pre-release/--main-release flag. The ops CLI artifacts generate does not receive this flag; please confirm it handles RC/preview releases correctly without it.
- post-registry-generation job ran generate-registry-report; this has no ops CLI equivalent in this PR. Confirm this report is either handled by the ops CLI compile or is acceptable to drop.
- SLACK_TOKEN, SENTRY_DSN, etc. for error reporting. The ops CLI steps do not; confirm error observability is acceptable.
- airbyte-enterprise) calls publish_connectors.yml@master: once this merges, enterprise publishes immediately use the new pipeline.

Human review checklist
- finalize_rollout.yml CONNECTOR_NAME expression works for both workflow_dispatch and repository_dispatch
- registry rc command handles both promote and rollback actions correctly
- --with-pr and --with-store-cleanup flags are production-ready and implement the documented workflow behavior
- fetch-depth: 1 is sufficient for the --with-pr flag to create version-bump PRs
- ubuntu-24.04 is acceptable for the ops CLI workload
- REGISTRY_STORE is set to coral:prod in all three workflows

User Impact
No direct user-facing impact. Connector publishing, registry compilation, and RC promote/rollback will now use the ops CLI as the sole path instead of the legacy metadata_service and run-airbyte-ci. If the ops CLI encounters issues, these operations will fail (previously the ops CLI failures were silently ignored via continue-on-error).

Can this PR be safely reverted and rolled back?
Reverting this PR restores the legacy metadata_service and run-airbyte-ci pipelines. However, if connectors have been published to coral:prod via the ops CLI after this merges, reverting may cause registry inconsistencies unless the legacy pipeline can handle the ops CLI-generated artifacts.

Link to Devin run: https://app.devin.ai/sessions/f900274cb0884bf99e399b1c40c48067
Requested by: AJ Steers (Aaron ("AJ") Steers (@aaronsteers))
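
For reviewers checking the CONNECTOR_NAME expression flagged in the review guide, the following Python sketch models how the `cond && a || b` idiom resolves. It relies on the fact that GitHub Actions' `&&` and `||` return one of their operands and treat an empty string as falsy, which Python's `and`/`or` mirror. The function name `resolve_connector_name`, the dictionary shapes, and the connector name used as sample data are assumptions made for illustration; they are not taken from the workflow itself.

```python
from typing import Any, Mapping


def resolve_connector_name(event_name: str, event: Mapping[str, Any]) -> str:
    """Model the `cond && a || b` shape of the CONNECTOR_NAME expression."""
    inputs = event.get("inputs") or {}
    payload = event.get("client_payload") or {}
    from_inputs = inputs.get("connector_name", "")
    from_payload = payload.get("connector_name", "")
    # `and` / `or` return an operand, like `&&` / `||` in GitHub expressions.
    return (event_name == "workflow_dispatch" and from_inputs) or from_payload


# Sample events; "source-faker" is only an example value.
manual = resolve_connector_name(
    "workflow_dispatch", {"inputs": {"connector_name": "source-faker"}}
)
dispatched = resolve_connector_name(
    "repository_dispatch", {"client_payload": {"connector_name": "source-faker"}}
)
# An empty manual input is falsy, so evaluation falls through to the payload side.
empty_input = resolve_connector_name(
    "workflow_dispatch", {"inputs": {"connector_name": ""}}
)
```

The last case is the one worth a second look: when a workflow_dispatch run supplies an empty connector name, the expression does not stop at the inputs branch but falls through to the (absent) client_payload value, so the result is empty rather than an error.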