@samikshya-db

Summary

This PR implements the remaining core infrastructure for PECOBLR-1143 (Phases 4-5), completing the foundational telemetry system for the Databricks SQL Go driver.

Phase 4: Export Infrastructure ✅

  • Implemented telemetryExporter with HTTP POST to /api/2.0/telemetry-ext
  • Added retry logic with exponential backoff (100ms base, 3 retries)
  • Integrated with circuit breaker for endpoint protection
  • Implemented tag filtering via shouldExportToDatabricks()
  • Swallowed all export errors so that telemetry failures never impact the driver
  • Added support for both http:// and https:// URLs (testing and production)

Phase 5: Opt-In Configuration Integration ✅

  • Implemented isTelemetryEnabled() with 5-level priority logic:
    1. forceEnableTelemetry=true - bypasses all server checks (testing/internal)
    2. enableTelemetry=false - explicit opt-out (always disabled)
    3. enableTelemetry=true + server flag - user opt-in with server control
    4. Server flag only - default Databricks-controlled behavior
    5. Default disabled - fail-safe default
  • Integrated with existing featureFlagCache for server flag checks
  • Added error handling that falls back to disabled when the server flag check fails

Changes

New Files

  • telemetry/exporter.go (192 lines) - Export infrastructure
  • telemetry/exporter_test.go (448 lines) - Comprehensive exporter tests

Modified Files

  • telemetry/config.go (+48 lines) - Added isTelemetryEnabled() function
  • telemetry/config_test.go (+230 lines) - Added opt-in priority tests
  • telemetry/DESIGN.md - Updated checklist (Phases 3-5 marked complete)

Test Coverage

All 70+ tests passing ✅

  • ✅ 17 new exporter tests (success, retries, circuit breaker, tag filtering, error swallowing, exponential backoff, context cancellation)
  • ✅ 8 new opt-in priority tests (all 5 priority levels, error handling, server scenarios)
  • ✅ All existing tests continue to pass

Test run time: 2.017s

Testing Done

Unit Tests

  • Export success scenarios with mock HTTP server
  • Retry logic on 5xx errors with exponential backoff
  • Non-retryable 4xx errors (no retry)
  • 429 rate limiting (retryable)
  • Circuit breaker integration (drops when open)
  • Tag filtering (exports only allowed tags)
  • Error swallowing (no panics)
  • Context cancellation handling
  • All 5 opt-in priority levels
  • Server error scenarios
  • Unreachable server handling
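
The retry classification exercised by these tests (5xx and 429 retried, other 4xx not) can be captured in a small predicate. The sketch below is illustrative only; `isRetryableStatus` is a hypothetical name and the exporter's real check may be structured differently.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryableStatus reports whether a response status should trigger a retry:
// 429 (rate limited) and any 5xx are retried; every other status is final.
// Hypothetical helper mirroring the behaviour listed in the unit tests.
func isRetryableStatus(code int) bool {
	if code == http.StatusTooManyRequests {
		return true
	}
	return code >= 500 && code <= 599
}

func main() {
	for _, code := range []int{
		http.StatusOK,
		http.StatusBadRequest,
		http.StatusTooManyRequests,
		http.StatusServiceUnavailable,
	} {
		fmt.Println(code, isRetryableStatus(code))
	}
}
```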

Integration Tests

  • HTTP mock server tests for all scenarios
  • Circuit breaker state transitions
  • Feature flag cache integration

Design Alignment

This implementation follows the design document (telemetry/DESIGN.md) specifications:

  • ✅ Section 3.6 - telemetryExporter
  • ✅ Section 5 - Export Mechanism
  • ✅ Section 6.4 - Opt-In Control & Priority

Related Issues

  • Implements: PECOBLR-1143 (Phases 4-5)
  • Depends on: PECOBLR-1143 Phases 1-3 (already merged)
  • Enables: PECOBLR-1381 (Phase 6 - Collection & Aggregation)
  • Enables: PECOBLR-1382 (Phase 7 - Driver Integration)

Checklist

  • Implementation follows design document
  • Comprehensive unit tests added
  • All tests passing
  • DESIGN.md checklist updated
  • Code follows Go best practices
  • Telemetry errors are swallowed and never surface to the driver
  • Thread-safe implementation
  • No breaking changes

Next Steps

After this PR merges:

  1. Phase 6 (PECOBLR-1381): Implement metric collection and aggregation
  2. Phase 7 (PECOBLR-1382): Integrate with driver (connection.go, statement.go)

🤖 Generated with Claude Code

…nd opt-in configuration

This commit implements the remaining components for PECOBLR-1143 (Phases 4-5):

Phase 4: Export Infrastructure
- Implement telemetryExporter with HTTP POST to /api/2.0/telemetry-ext
- Add retry logic with exponential backoff (100ms base, 3 retries)
- Integrate with circuit breaker for endpoint protection
- Implement tag filtering via shouldExportToDatabricks()
- Add error swallowing to ensure telemetry never impacts driver
- Support both http:// and https:// URLs for testing

Phase 5: Opt-In Configuration Integration
- Implement isTelemetryEnabled() with 5-level priority logic:
  1. forceEnableTelemetry=true - bypasses all server checks
  2. enableTelemetry=false - explicit opt-out
  3. enableTelemetry=true + server flag - user opt-in with server control
  4. Server flag only - default Databricks-controlled behavior
  5. Default disabled - fail-safe default
- Wire up with existing featureFlagCache for server flag checks
- Handle errors gracefully (default to disabled on failures)

Testing:
- Add 17 comprehensive unit tests for exporter (success, retries, circuit breaker, tag filtering, error swallowing, exponential backoff, context cancellation)
- Add 8 unit tests for isTelemetryEnabled (all 5 priority levels, error handling, server scenarios)
- All 70+ telemetry tests passing

Documentation:
- Update DESIGN.md checklist to mark Phases 3-5 as completed

This completes the core telemetry infrastructure for PECOBLR-1143.
Next phases (6-7) will add metric collection and driver integration.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@samikshya-db

Recreating with git stack for proper stacked PR management
