
fix(bulk-cdk): handle null generationId/syncId in DestinationStreamFactory (AI-Triage PR)#75211

Draft
devin-ai-integration[bot] wants to merge 3 commits into master from
devin/1773922562-fix-npe-null-generation-id


@devin-ai-integration devin-ai-integration bot commented Mar 19, 2026

What

Fixes a NullPointerException in DestinationStreamFactory.make() that crashes all bulk-CDK destinations during initialization when generationId, minimumGenerationId, or syncId are null on the incoming ConfiguredAirbyteStream.

This affects multiple destinations (S3, ClickHouse, HubSpot, and others) on certain OSS platform versions or when using custom connector definitions where the platform does not populate these fields.

Resolves https://github.com/airbytehq/oncall/issues/11702

Related issues:

How

The ConfiguredAirbyteStream (from io.airbyte.protocol.models.v0) is a Java class whose generationId, minimumGenerationId, and syncId fields are boxed Long values, so they can be null at runtime. The DestinationStream data class expects non-nullable Long. When the platform doesn't populate these fields, passing them straight through makes Kotlin unbox a null reference, which throws an NPE.
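The failure mode can be reproduced outside the CDK. The sketch below is illustrative only: `AtomicReference<Long>` stands in for the Java bean (its `get()` is a plain Java method, so Kotlin treats the result as a platform type), and `StreamIds` and `buildUnsafely` are hypothetical names, not CDK classes.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for an unpopulated boxed Long field on a Java class.
// The no-arg constructor leaves the reference holding null.
val unsetGenerationId = AtomicReference<Long>()

// Simplified stand-in for DestinationStream: the property is a non-nullable Long.
data class StreamIds(val generationId: Long)

// Compiles cleanly because get() returns the platform type Long!,
// but throws NullPointerException at runtime when the value is null.
fun buildUnsafely(): StreamIds = StreamIds(generationId = unsetGenerationId.get())

fun main() {
    val outcome = runCatching { buildUnsafely() }
    // Report whether the construction failed with an NPE.
    println("NPE thrown: ${outcome.exceptionOrNull() is NullPointerException}")
}
```

The compiler gives no warning here, which is why the problem only surfaces at connector initialization.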

The fix adds a ?: 0L fallback (Kotlin's Elvis operator) in both the modern and legacy-task-loader DestinationStreamFactory.make() methods. The default of 0L is consistent with CatalogGenerationSetter in the platform, which also defaults generation IDs to 0.
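A minimal sketch of the pattern, assuming simplified stand-in types: `IncomingStream`, `StreamIds`, and this `make` function are hypothetical and only mirror the shape of the change, not the actual CDK code.

```kotlin
// Hypothetical stand-in for the Java ConfiguredAirbyteStream:
// nullable properties model the boxed Long fields.
class IncomingStream(
    val generationId: Long?,
    val minimumGenerationId: Long?,
    val syncId: Long?,
)

// Simplified stand-in for DestinationStream, which requires non-null values.
data class StreamIds(
    val generationId: Long,
    val minimumGenerationId: Long,
    val syncId: Long,
)

// Elvis operator: keep the incoming value when present, otherwise fall back to 0L.
fun make(stream: IncomingStream): StreamIds =
    StreamIds(
        generationId = stream.generationId ?: 0L,
        minimumGenerationId = stream.minimumGenerationId ?: 0L,
        syncId = stream.syncId ?: 0L,
    )

fun main() {
    println(make(IncomingStream(null, null, null)))
    println(make(IncomingStream(3L, 2L, 42L)))
}
```

Populated values pass through unchanged, so the fallback only affects streams where the platform left the fields unset.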

Note: This is a defensive CDK-side fix. The underlying platform issue (not populating these fields in certain OSS/custom-connector scenarios) may warrant a separate investigation.

Review guide

  1. airbyte-cdk/bulk/core/load/src/main/kotlin/io/airbyte/cdk/load/command/DestinationStreamFactory.kt — modern path (lines 56-58)
  2. airbyte-cdk/bulk/toolkits/legacy-task-loader/src/main/kotlin/io/airbyte/cdk/load/command/DestinationStream.kt — legacy path (lines 204-206)
  3. airbyte-cdk/bulk/core/load/version.properties — version bump 1.0.6 → 1.0.7
  4. airbyte-cdk/bulk/core/load/changelog.md — changelog entry for 1.0.7

Key question for reviewers: Is 0L the correct default? With generationId=0 and minimumGenerationId=0, shouldBeTruncatedAtEndOfSync() returns false (safe — no accidental data deletion). Should a warning be logged when falling back to defaults?
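If a warning on fallback is wanted, one possible shape is sketched below. `orZeroWithWarning` is a hypothetical helper that does not exist in the CDK, and the `warn` callback is an assumption standing in for whatever logger the factory would use.

```kotlin
// Hypothetical helper: returns the value when present, otherwise emits a
// warning through the supplied callback and returns 0L.
fun orZeroWithWarning(value: Long?, fieldName: String, warn: (String) -> Unit): Long {
    if (value != null) return value
    warn("$fieldName was null on the configured stream; defaulting to 0")
    return 0L
}

fun main() {
    // Collect warnings in a list here; a real caller would pass a logger instead.
    val warnings = mutableListOf<String>()
    val generationId = orZeroWithWarning(null, "generationId") { warnings += it }
    val syncId = orZeroWithWarning(42L, "syncId") { warnings += it }
    println("generationId=$generationId syncId=$syncId warnings=$warnings")
}
```

Passing the logger as a callback keeps the helper testable without binding it to a specific logging framework.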

Human review checklist

  • Verify 0L is safe across all code paths that read generationId/syncId/minimumGenerationId — not just shouldBeTruncatedAtEndOfSync(). Grep for usages in the bulk CDK and destination connectors.
  • No unit test was added for the null-input case. Assess whether one is needed given the blast radius (all bulk-CDK destinations).
  • Confirm the legacy-task-loader change at lines 204-206 doesn't interact unexpectedly with callers that may already handle nulls upstream.

Updates since last revision

  • Bumped bulk-cdk-core-load version from 1.0.6 → 1.0.7 (required by CI checkLoadVersion gate).
  • Added changelog entry for 1.0.7.

User Impact

Destinations using the bulk CDK will no longer crash with "Failed to initialize connector operation" / NPE when the platform does not provide generationId/syncId. Syncs will proceed normally with default values.

Can this PR be safely reverted and rolled back?

  • YES 💚

Link to Devin session: https://app.devin.ai/sessions/7268e4116a494cd3a87597f78b4d8b70

…ctory to prevent NPE

When ConfiguredAirbyteStream has null generationId, minimumGenerationId,
or syncId (which occurs on certain OSS platform versions or when using
custom connector definitions), the DestinationStreamFactory crashes with
a NullPointerException during Kotlin auto-unboxing from Long? to Long.

This adds null-safe defaults (0L) in both the modern and legacy-task-loader
versions of DestinationStreamFactory, matching the default behavior in
CatalogGenerationSetter which also defaults to 0L.

Resolves airbytehq/oncall#11702
Related: #75210
Related: #69351
Related: #70197
Related: #69219

Co-Authored-By: bot_apk <apk@cognition.ai>


devin-ai-integration bot and others added 2 commits March 19, 2026 12:21
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>

github-actions bot commented Mar 19, 2026

Deploy preview for airbyte-kotlin-cdk ready!

✅ Preview
https://airbyte-kotlin-kw7bv6ud7-airbyte-growth.vercel.app

Built with commit afcba6a.
This pull request is being automatically deployed with vercel-action

@devin-ai-integration

↪️ Triggering /ai-prove-fix per Hands-Free AI Triage Project triage next step.

Reason: Draft PR with Green 4/5 triage score and CI mostly passing (47 passed, 2 failed). Fixes NPE in DestinationStreamFactory for null generationId/syncId affecting all bulk-CDK destinations.
https://github.com/airbytehq/oncall/issues/11702

Devin session


octavia-bot bot commented Mar 20, 2026

🔍 AI Prove Fix session starting... Running readiness checks and testing against customer connections. View playbook

Devin AI session created successfully!


devin-ai-integration bot commented Mar 20, 2026

Fix Validation Evidence

Outcome: Could not Run Tests

Evidence Summary

Pre-release publish of destination-s3 failed due to unrelated compilation errors on the PR branch (S3V2Specification.kt — unresolved jakarta/Singleton/JsonSchemaInject references). The PR branch appears out of date with master, causing dependency resolution failures for destination-s3 that are unrelated to the CDK fix.

Additionally, this bug only manifests on self-managed/OSS deployments where the platform does not populate generationId/syncId. On Airbyte Cloud, CatalogGenerationSetter always populates these fields, so the NPE would not reproduce in Cloud testing — only regression testing would be possible.

Code analysis confirms the fix is correct and minimal: adding ?: 0L null-coalescing to 3 fields in both the modern and legacy DestinationStreamFactory paths, consistent with how the Cloud platform defaults these values.

Next Steps
  1. Rebase the PR branch onto current master to resolve the destination-s3 compilation errors
  2. Re-run /ai-prove-fix after rebasing to attempt pre-release publish and regression testing
  3. For true fix validation (not just regression), test on a self-managed OSS deployment where generationId/syncId are null
  4. Alternatively, publish a different bulk-CDK destination (e.g., destination-bigquery) that may compile on the current PR branch

Connector & PR Details

Connector: destination-s3 (and all bulk-CDK destinations via shared CDK)
CDK Version: bulk-cdk-core-load 1.0.6 → 1.0.7
PR: #75211
Pre-release Version Tested: 1.9.7-preview.afcba6a (failed to build)
Failed Workflow: https://github.com/airbytehq/airbyte/actions/runs/23341486617
Detailed Results: https://github.com/airbytehq/oncall/issues/11702#issuecomment-4097510691

Evidence Plan

Proving Criteria

A sync that previously failed with NPE in DestinationStreamFactory.make() (when generationId/syncId are null) should complete successfully after applying the fix.

Disproving Criteria

The same NPE still occurs after applying the fix, or new errors appear that were not present before.

Cases Attempted

  1. Pre-release publish of destination-s3 — ❌ Build failed (unrelated compilation errors on PR branch)
  2. Cloud regression testing — ❌ Blocked by missing pre-release image. 10 internal connections were identified but could not be tested.

Limitation

This NPE only occurs on OSS/self-managed deployments where the platform does not populate generationId/syncId. Cloud always populates these via CatalogGenerationSetter, so Cloud testing can only validate regression safety, not prove the fix itself.

Pre-flight Checks
  • Viability: Fix adds ?: 0L null-coalescing — consistent with platform defaults
  • Safety: No malicious code or dangerous patterns
  • Breaking Change: No breaking changes detected (patch version bump only, no schema/spec/stream changes)
  • Reversibility: Can be safely downgraded/reverted (no state or config format changes)
  • Design Intent: Aligns with CatalogGenerationSetter which defaults these fields to 0

Detailed Evidence Log

| Timestamp (UTC) | Action | Result |
|---|---|---|
| 11:42 | Posted initial status comment | |
| 11:43 | Read PR diff | Identified CDK-level fix in 4 files |
| 11:44 | Completed pre-flight checks | All passed |
| 11:48 | Triggered pre-release publish for destination-s3 | |
| 11:52 | Pre-release workflow completed | Failure |
| 11:53 | Confirmed Docker image does not exist | |
| 11:53 | Queried prod for failed destination-s3 syncs | 10 found |
| 11:53 | Queried internal org for destination-s3 connections | 10 found |
| 11:55 | Posted detailed results to oncall issue | |

Note: Connection IDs and detailed logs are recorded in the linked private issue.


Validation by Clumsy Tucker Devin · Triggered by /ai-prove-fix

@github-actions

Pre-release Connector Publish Started

Publishing pre-release build for connector destination-s3.
PR: #75211

Pre-release versions will be tagged as {version}-preview.afcba6a
and are available for version pinning via the scoped_configuration API.

View workflow run
