docs(source-gcs): Improve documentation accuracy and completeness #75212

Draft
devin-ai-integration[bot] wants to merge 1 commit into master from devin/1773922847-docs-source-gcs-improvements


@devin-ai-integration bot (Contributor) commented Mar 19, 2026

Documentation Confidence Assessment

Overall Confidence: 3/5

| Dimension | Score | Rationale |
|---|---|---|
| Code Comprehension | 3/5 | Python CDK file-based connector; ZIP/stream_reader logic is clear, but compression support is inferred from CDK base classes. |
| API Documentation Quality | 4/5 | GCS has comprehensive IAM and auth docs; minor gap in rate-limit specifics. |
| Change Scope & Risk | 3/5 | 160 lines changed, including setup section consolidation and new file format documentation. |
| Existing Doc Maturity | 4/5 | 329-line doc with good structure but identifiable gaps (missing ZIP/compression docs, incorrect glob description). |
| Connector Sensitivity | 3/5 | Certified connector with moderate usage (ql: 200, sl: 300). |
| Triggering Context | 5/5 | Triggered by a small merged bug-fix PR for ZIP file detection. |

What I Verified vs. What I Inferred

  • Verified from code: glob patterns are used (not regex); service account and OAuth auth types; ZIP extraction via zip_helper.py; blob.name.endswith(".zip") detection logic; changelog PR numbers match their URLs
  • Verified from API docs: roles/storage.objectViewer grants storage.objects.get and storage.objects.list; service account JSON key auth flow
  • Inferred: gzip/bzip2 compression support (inherited from CDK FileBasedStreamReader, not GCS-specific code), and that the consolidated setup instructions are equivalent for Cloud and OSS users
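To make the verified ZIP detection concrete, here is a minimal Python sketch of a suffix-based check followed by in-memory extraction. It assumes only what the list above states (detection via blob.name.endswith(".zip")); the helper names is_zip_blob and extract_zip_members are hypothetical and are not the contents of zip_helper.py, which may stream members or handle paths differently.

```python
import io
import zipfile
from typing import Dict


def is_zip_blob(blob_name: str) -> bool:
    # Mirrors the suffix check cited in this PR: blob.name.endswith(".zip").
    # str.endswith is case-sensitive, so "DATA.ZIP" is NOT treated as an archive.
    return blob_name.endswith(".zip")


def extract_zip_members(blob_name: str, payload: bytes) -> Dict[str, bytes]:
    """Hypothetical helper: return {member_name: contents} for a ZIP blob,
    or {blob_name: payload} unchanged for any other object."""
    if not is_zip_blob(blob_name):
        return {blob_name: payload}
    members: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no file data
            members[info.filename] = archive.read(info)
    return members
```

If the connector uses a plain endswith check as written, one consequence worth documenting is case sensitivity: an object named DATA.ZIP would be read as a regular file rather than extracted.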

Areas of Concern

  • Consolidated setup section: Removed the separate Cloud/OSS subsections and the Cloud-specific login link (https://cloud.airbyte.com/workspaces). Reviewers should verify that the unified instructions are adequate for both platforms.
  • IAM role recommendation: Added roles/storage.objectViewer as the minimum required role. Confirm this matches actual connector behavior: the connector calls bucket.list_blobs() and blob.download_to_filename().
  • Compression/ZIP paragraph: Claims gzip and bzip2 support. This support is inherited from the CDK's file-based framework and has not been verified against a running GCS connector instance.
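One way a reviewer could check the IAM concern is to ask GCS which of the two permissions a given service account key actually holds. The sketch below is an illustration, not connector code: it assumes the google-cloud-storage library and uses its Client.from_service_account_json and Bucket.test_iam_permissions calls; the helper names, key path, and bucket name are hypothetical placeholders. The pure helper missing_permissions can be exercised without credentials.

```python
from typing import Iterable, List

# Permissions the PR says roles/storage.objectViewer grants:
# list_blobs() needs storage.objects.list, download_to_filename() needs storage.objects.get.
REQUIRED_PERMISSIONS = ("storage.objects.get", "storage.objects.list")


def missing_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions absent from `granted`, in a stable order."""
    granted_set = set(granted)
    return [perm for perm in REQUIRED_PERMISSIONS if perm not in granted_set]


def check_bucket_access(key_path: str, bucket_name: str) -> List[str]:
    """Hypothetical helper: ask GCS which required permissions the key holds."""
    # Imported lazily so missing_permissions works without the library installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_path)
    bucket = client.bucket(bucket_name)
    # Returns the subset of the requested permissions the caller actually has.
    granted = bucket.test_iam_permissions(list(REQUIRED_PERMISSIONS))
    return missing_permissions(granted)


# Example (placeholder path and bucket):
# print(check_bucket_access("service-account.json", "my-bucket"))  # [] means access is sufficient
```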

What

Improves the source-gcs connector documentation, prompted by recent ZIP-related fixes in PRs #74779 and #74781. Fixes several factual inaccuracies, fills documentation gaps, and removes redundant content.

How

Key changes to docs/integrations/sources/gcs.md:

Correctness fixes (high priority):

  • Fixed the Globs description from "regular expression" to "glob-style pattern matching", since globs are not regular expressions
  • Fixed "service user" → "service account" terminology
  • Removed a duplicate step 6 in the setup instructions (the "paste service account key" step was listed twice)
  • Fixed broken heading hierarchy (## Set up Google Cloud Storage was an h2 sibling of ## Setup guide instead of a child)
  • Corrected the changelog date for v0.10.8 from 2026-03-18 to 2026-03-19 to match the actual merge date
  • Replaced escaped HTML entities (\(tl;dr ->...) with a proper Docusaurus :::tip admonition
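The glob-versus-regex correction is easy to demonstrate with Python's standard library. This sketch uses fnmatch for shell-style globs and re for regular expressions; it illustrates the syntax difference only and is not the connector's matcher (for example, fnmatch lets * match across / separators, whereas the connector's glob engine may treat / and ** differently).

```python
import fnmatch
import re
from typing import Optional


def glob_matches(pattern: str, name: str) -> bool:
    # Shell-style wildcards: * = any run of characters, ? = exactly one character,
    # [abc] = one character from the set. fnmatchcase never folds case.
    return fnmatch.fnmatchcase(name, pattern)


def regex_matches(pattern: str, name: str) -> Optional[bool]:
    # The same string interpreted as a regular expression over the whole name.
    # Returns None when the string does not even compile as a regex.
    try:
        return re.fullmatch(pattern, name) is not None
    except re.error:
        return None
```

With these helpers, glob_matches("*.csv", "sales.csv") is True, while regex_matches("*.csv", "sales.csv") is None because a leading * does not compile as a regex. Conversely, regex_matches("sales.csv", "salesXcsv") is True because an unescaped . matches any character, whereas the same string used as a glob does not match.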

Completeness fixes (medium priority):

  • Added specific IAM role requirement (Storage Object Viewer / roles/storage.objectViewer) to prerequisites
  • Documented ZIP archive support (added in v0.7.0) and gzip/bzip2 compression support (added in v0.4.0) — these were completely undocumented
  • Renamed "Supported Streams" → "Supported file formats" since the section lists file formats, not streams (GCS streams are user-defined)
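The gzip/bzip2 support documented here is inferred from the CDK's FileBasedStreamReader rather than GCS-specific code. As a rough illustration of extension-based decompression, here is a Python sketch using the standard gzip and bz2 modules; decompress_by_suffix is a hypothetical helper, and the CDK may detect compression differently (for example, by inspecting file contents rather than names).

```python
import bz2
import gzip


def decompress_by_suffix(name: str, payload: bytes) -> bytes:
    """Hypothetical helper: choose a decompressor from the object name's extension."""
    if name.endswith(".gz"):
        return gzip.decompress(payload)
    if name.endswith(".bz2"):
        return bz2.decompress(payload)
    # No recognized compression suffix: assume the bytes are already plain.
    return payload
```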

Readability/style fixes:

  • Consolidated nearly-identical Cloud and OSS setup sections into a single unified section
  • Cleaned up unnecessary markdown escapes throughout
  • Added a text language tag to a fenced code block
  • Capitalized file format names consistently (csv → CSV, parquet → Parquet, etc.)

Review guide

  1. docs/integrations/sources/gcs.md — single file, documentation-only change

Items most important to review:

  • Consolidated setup section (lines 36–57): The original doc had separate Cloud/OSS subsections that were ~95% identical. I merged them into one. Verify the unified instructions are adequate for both platforms — notably, the Cloud-specific login link (https://cloud.airbyte.com/workspaces) was removed.
  • IAM role recommendation (line 16): Added roles/storage.objectViewer as the minimum required role. Confirm this matches the connector's actual GCS API usage (storage.objects.get + storage.objects.list).
  • ZIP/compression paragraph (line 211): New documentation for an existing feature. Verify the description ("automatically extracted during sync") accurately reflects the connector's behavior.

User Impact

Users reading the source-gcs docs will see more accurate prerequisites (specific IAM roles instead of vague "access to GCS"), correct description of glob patterns, and documentation of ZIP/compression support that was previously missing.

Can this PR be safely reverted and rolled back?

  • YES 💚

This is a documentation-only change with no code modifications.


Note: I am an AI assistant (Devin) and have proposed these documentation updates based on a review of the connector source code and GCS API documentation. Reviewers may merge, modify, or close this PR as they see fit.

Link to Devin session: https://app.devin.ai/sessions/cd485e66445d44cba43b9c43386f9677

Co-Authored-By: bot_apk <apk@cognition.ai>

@devin-ai-integration bot (Contributor, Author)

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions bot (Contributor)

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      • Example: /update-connector-cdk-version connector=destination-bigquery
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      • Example: /force-merge reason="CI is flaky, tests pass locally"

@github-actions bot (Contributor)

Deploy preview for airbyte-docs ready!

✅ Preview
https://airbyte-docs-8wta785q3-airbyte-growth.vercel.app

Built with commit af62c31.
This pull request is being automatically deployed with vercel-action

Labels: area/documentation (Improvements or additions to documentation), team/documentation
