
Fix: Exclude all non-OK HTTP status codes from lag calculation #183

Draft
Copilot wants to merge 4 commits into master from copilot/fix-host-404-lag-calculation

Copilot AI commented Feb 13, 2026

Created Beginner-Friendly Documentation 📚

Documentation Created

Created three guides that explain the bug fix to a day 1 intern:

  1. intern_explanation.md - Complete explanation with analogies

    • What is Freno (traffic light for databases)
    • School/homework analogy for the bug
    • Technical details with code examples
    • Why Go maps return 0 for missing keys
    • FAQ section
  2. visual_explanation.txt - ASCII diagrams

    • Visual representation of server states
    • Step-by-step breakdown of the bug
    • Timeline showing the race condition
    • Side-by-side comparison of buggy vs fixed code
  3. code_walkthrough.md - Line-by-line code explanation

    • Actual code before and after
    • Detailed breakdown of each line
    • Test explanation
    • Key programming concepts learned

Key Teaching Points

  • Race conditions and timing problems
  • Go map behavior (zero values for missing keys)
  • Defensive programming practices
  • Importance of test coverage
  • Real-world production bug fixing
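The Go map behavior listed above can be shown with a minimal sketch. The map contents, host names, and the `lookupStatus` helper are hypothetical and only illustrate the language rule: indexing a missing key returns the value type's zero value (0 for `int`), and the comma-ok form is how you tell "missing" apart from "present".

```go
package main

import (
	"fmt"
	"net/http"
)

// lookupStatus is a hypothetical helper: it returns the stored HTTP status for
// host plus a flag reporting whether the host was present in the map at all.
func lookupStatus(results map[string]int, host string) (int, bool) {
	status, ok := results[host] // comma-ok idiom
	return status, ok
}

func main() {
	// Hypothetical host -> last HTTP check status map.
	results := map[string]int{
		"replica-1": http.StatusOK,
		"replica-2": http.StatusNotFound,
	}

	// Plain indexing of a missing key does not panic; it yields int's zero value.
	fmt.Println(results["replica-3"]) // 0

	// The comma-ok form distinguishes "missing" from "present".
	status, ok := lookupStatus(results, "replica-3")
	fmt.Println(status, ok) // 0 false

	status, ok = lookupStatus(results, "replica-2")
	fmt.Println(status, ok) // 404 true
}
```

Code that only checks the value, never `ok`, treats a host with no recorded result as if it had status 0.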

Checklist

  • Created beginner-friendly overview
  • Created visual diagrams
  • Created detailed code walkthrough
  • Explained key programming concepts
  • Made content accessible to day 1 interns


Previously, only hosts returning 404 were excluded from lag calculations.
As a result, hosts returning other error codes (500, 503, etc.) were still
used in lag calculations, even though they were unhealthy.

Changed the condition to exclude any host with a non-OK (non-200) HTTP status.

- Updated pkg/throttle/mysql.go to check for != StatusOK instead of == StatusNotFound
- Added test case to verify 500 and 503 status codes are properly excluded

Co-authored-by: alexisjcarr <[email protected]>
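The condition change in this commit can be compared side by side in a small sketch. The function names `excludeNotFoundOnly` and `excludeNonOK` are hypothetical; they reproduce only the two predicates named in the commit message (`== StatusNotFound` vs `!= StatusOK`), not the actual code in `pkg/throttle/mysql.go`.

```go
package main

import (
	"fmt"
	"net/http"
)

// excludeNotFoundOnly mirrors the condition before this commit:
// only a 404 removes the host from the lag calculation.
func excludeNotFoundOnly(status int) bool {
	return status == http.StatusNotFound
}

// excludeNonOK mirrors the condition introduced by this commit:
// anything other than 200 removes the host.
func excludeNonOK(status int) bool {
	return status != http.StatusOK
}

func main() {
	for _, status := range []int{
		http.StatusOK,                  // 200
		http.StatusNotFound,            // 404
		http.StatusInternalServerError, // 500
		http.StatusServiceUnavailable,  // 503
	} {
		fmt.Printf("%d: before=%t after=%t\n",
			status, excludeNotFoundOnly(status), excludeNonOK(status))
	}
}
```

Under `excludeNonOK`, a status of 0 (what a Go map returns for a host with no recorded check) is also excluded, because 0 != 200.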
Copilot AI changed the title from "[WIP] Fix lag calculation using hosts that return 404" to "Fix: Exclude all non-OK HTTP status codes from lag calculation" on Feb 13, 2026
Copilot AI requested a review from alexisjcarr February 13, 2026 16:38
Copilot AI and others added 2 commits February 13, 2026 16:44
The original code, which excluded only the 404 status, was correct.
The bug report was initially misunderstood: a host returning 404
was being INCLUDED when it should have been EXCLUDED.

The original behavior is correct:
- 404 (Not Found): Exclude from lag calculation (health check endpoint doesn't exist)
- 500, 503, etc.: INCLUDE in lag calculation (these indicate replication/health issues we want to measure)

Reverted the incorrect change that excluded all non-200 status codes.

Co-authored-by: alexisjcarr <[email protected]>
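The restored policy in this commit can be written as a single predicate. This is a sketch: `includeInLag` is a hypothetical name, and the notes attached to each status paraphrase the commit message rather than quote the real code.

```go
package main

import (
	"fmt"
	"net/http"
)

// includeInLag sketches the restored policy: only 404 (health check endpoint
// does not exist) removes a host; every other status keeps the host in the
// lag calculation so that replication/health problems are still measured.
func includeInLag(status int) bool {
	return status != http.StatusNotFound
}

func main() {
	cases := []struct {
		status int
		note   string
	}{
		{http.StatusOK, "healthy"},
		{http.StatusNotFound, "health check endpoint missing"},
		{http.StatusInternalServerError, "health issue we want to measure"},
		{http.StatusServiceUnavailable, "health issue we want to measure"},
		{0, "no HTTP check result recorded (map zero value)"},
	}
	for _, c := range cases {
		fmt.Printf("%3d include=%t (%s)\n", c.status, includeInLag(c.status), c.note)
	}
}
```

The last case shows that this predicate on its own still includes a host whose status is 0, meaning one with no recorded HTTP check result.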
The bug was that when HTTP check results were missing from the map (the lookup
returns the zero value, 0), hosts were incorrectly included in lag calculations.

Fixed by excluding a host when its status is either StatusNotFound (404) or zero (missing map entry).

- Updated aggregateMySQLProbes to check: httpStatus == StatusNotFound || httpStatus == 0
- Added test case TestAggregateMySQLProbesWithMissingHttpChecks to validate fix
- All existing tests pass

Co-authored-by: alexisjcarr <[email protected]>
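A minimal sketch of how the fixed condition filters hosts before aggregation follows. `aggregateLag` is a hypothetical stand-in for `aggregateMySQLProbes`. It assumes the aggregate is the maximum lag in seconds over the included hosts and returns 0 when every host is excluded; the real function's signature, types, and aggregation may differ. Only the condition `httpStatus == StatusNotFound || httpStatus == 0` comes from the commit.

```go
package main

import (
	"fmt"
	"net/http"
)

// excludeFromLag implements the condition described in this commit:
// skip hosts whose health check returned 404, and hosts with no recorded
// result (a missing map key reads as 0).
func excludeFromLag(httpStatus int) bool {
	return httpStatus == http.StatusNotFound || httpStatus == 0
}

// aggregateLag is a hypothetical stand-in for aggregateMySQLProbes. It
// assumes the aggregate is the maximum lag (in seconds) over included hosts,
// and it returns 0 if every host is excluded.
func aggregateLag(lagSeconds map[string]float64, httpStatus map[string]int) float64 {
	maxLag := 0.0
	for host, lag := range lagSeconds {
		if excludeFromLag(httpStatus[host]) { // missing host -> 0 -> excluded
			continue
		}
		if lag > maxLag {
			maxLag = lag
		}
	}
	return maxLag
}

func main() {
	lagSeconds := map[string]float64{
		"replica-1": 0.4,
		"replica-2": 0.7,
		"replica-3": 30.0, // no HTTP check result recorded yet
		"replica-4": 12.0, // health check endpoint returns 404
	}
	httpStatus := map[string]int{
		"replica-1": http.StatusOK,
		"replica-2": http.StatusServiceUnavailable,
		"replica-4": http.StatusNotFound,
	}
	fmt.Println(aggregateLag(lagSeconds, httpStatus)) // 0.7
}
```

Here `replica-2` (503) still counts, in line with the previous commit's policy. `replica-3` and `replica-4` are skipped, so their large lag values no longer drive the result.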