cmd/alertmanager: add --config.auto-reload-interval flag #5222

Open
mihir-dixit2k27 wants to merge 1 commit into prometheus:main from mihir-dixit2k27:feat/config-auto-reload-interval
@mihir-dixit2k27 mihir-dixit2k27 commented May 1, 2026

Description

Adds a --config.auto-reload-interval flag. When set to a non-zero duration, a background goroutine polls the SHA256 checksum of the config file at the given interval and writes to the existing webReload channel on a detected change — the same path taken by SIGHUP and POST /-/reload — so no new reload logic is introduced.

Why polling instead of fsnotify?
Kubernetes kubelet uses AtomicWriter to update ConfigMap and Secret volume mounts via symlink swaps, which causes fsnotify to stop receiving events after the first update since the original inode is replaced. SHA256 polling works correctly regardless of how the underlying file is updated, which is also why Prometheus uses polling for its equivalent feature.

Behaviour

  • Default is 0s (disabled). Existing deployments are completely unaffected.
  • On a detected change, the reload follows the same code path as SIGHUP/POST /-/reload: invalid configs are rejected and logged without disrupting the running instance.
  • If a reload fails, lastChecksum is not updated, so the watcher retries on every subsequent tick until the configuration is valid again.
  • The watcher goroutine exits cleanly on SIGTERM.

Changes

| File | Description |
| --- | --- |
| cmd/alertmanager/main.go | New flag, configFileChecksum() helper, runConfigWatcher() goroutine, watcher start block in run() |
| cmd/alertmanager/main_test.go | 8 unit tests covering all edge cases |
| docs/configuration.md | New auto-reload section |
| docs/management_api.md | Cross-reference added under the reload section |

fixes #5197

Summary by CodeRabbit

  • New Features

    • Optional polling-based configuration auto-reload via a new interval flag; detects on-disk config changes and automatically applies them using existing reload logic, with retries on failure.
  • Documentation

    • Updated configuration and management API docs with auto-reload details, behavior, defaults, and examples.
  • Tests

    • Added comprehensive tests for checksum computation, change detection, reload triggering, retry behavior, unreadable files, and graceful shutdown.

@mihir-dixit2k27 mihir-dixit2k27 requested a review from a team as a code owner May 1, 2026 19:28

coderabbitai Bot commented May 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3f3543fc-de2d-4a76-8081-7882c7a95215

📥 Commits

Reviewing files that changed from the base of the PR and between a2ceeb7 and f7ba5da.

📒 Files selected for processing (4)
  • cmd/alertmanager/main.go
  • cmd/alertmanager/main_test.go
  • docs/configuration.md
  • docs/management_api.md
✅ Files skipped from review due to trivial changes (2)
  • docs/management_api.md
  • docs/configuration.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/alertmanager/main_test.go

📝 Walkthrough

Adds a polling-based auto-reload controlled by a new --config.auto-reload-interval duration flag. When non-zero, a background watcher computes SHA256 checksums of the config file, polls it at the interval, and triggers reloads via the existing reload coordination channel when changes are detected.

Changes

Auto-reload watcher

| Layer | File(s) | Summary |
| --- | --- | --- |
| Flag / Invocation | cmd/alertmanager/main.go | Adds --config.auto-reload-interval (configAutoReloadInterval) flag (default 0s). |
| Data Shape / Utility | cmd/alertmanager/main.go | Adds configFileChecksum(path string) (string, error) to compute SHA256 hex digest of the config file. |
| Core Implementation | cmd/alertmanager/main.go | Adds runConfigWatcher(ctx, interval, path, webReload, logger) which polls checksum on a ticker, compares to lastChecksum, and sends a chan error into webReload when changed; updates lastChecksum only after successful reload; logs failures and retains last checksum on reload failure; exits on context cancellation. |
| Wiring | cmd/alertmanager/main.go | In run(), conditionally starts the watcher goroutine when *configAutoReloadInterval > 0, using a cancellable context and passing webReload. |
| Tests | cmd/alertmanager/main_test.go | Adds unit tests for configFileChecksum and runConfigWatcher covering consistent hashing, differing hashes, missing files, no-reload on unchanged content, reload signaling on change, suppression after successful reload, retry after failed reload, unreadable file handling, and watcher exit on context cancellation. |
| Docs | docs/configuration.md, docs/management_api.md | Documents the --config.auto-reload-interval flag, explains polling behavior and that reload follows same validation/coordination as SIGHUP/POST /-/reload, and adds example usage (e.g., 30s). |

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Watcher as Config Watcher
    participant FS as File System
    participant Coordinator as Reload Coordinator
    participant Logger as Logger

    App->>Watcher: start(ctx, interval, path, webReload)
    Watcher->>FS: read config file
    FS-->>Watcher: file contents
    Watcher->>Watcher: compute SHA256, set lastChecksum

    loop every interval
        Watcher->>FS: read config file
        FS-->>Watcher: file contents
        Watcher->>Watcher: compute SHA256
        alt checksum changed
            Watcher->>Coordinator: send reload request (chan error) via webReload
            Coordinator-->>Watcher: error or nil via chan
            alt reload successful
                Watcher->>Watcher: update lastChecksum
            else reload failed
                Watcher->>Logger: log error (keep lastChecksum)
            end
        else unchanged
            Note over Watcher: no action
        end
    end

    App->>Watcher: cancel ctx
    Watcher->>Logger: cleanup
    Watcher-->>App: exit

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'cmd/alertmanager: add --config.auto-reload-interval flag' is specific, concise, and clearly describes the main change: adding a new CLI flag for automatic configuration reloading. |
| Description check | ✅ Passed | The PR description provides comprehensive information about the feature, design rationale, behavior, implementation details, and a clear reference to the linked issue #5197. |
| Linked Issues check | ✅ Passed | The PR successfully addresses all coding objectives from #5197: implements optional config auto-reload with checksum polling (resilient to Kubernetes atomic writes), reuses existing reload semantics, and is disabled by default. |
| Out of Scope Changes check | ✅ Passed | All changes are tightly scoped to the auto-reload feature: implementation in main.go, comprehensive tests in main_test.go, and documentation in configuration and management API docs, with no unrelated modifications. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
cmd/alertmanager/main.go (1)

746-752: 💤 Low value

Minor: potential goroutine leak on shutdown edge case.

If context cancellation occurs while the watcher is blocked sending to reloadCh (line 747), the goroutine won't exit cleanly because there's no select around the channel send. This is an edge case that only matters during shutdown when a config change is detected simultaneously with SIGTERM.

Since the process is exiting anyway and this matches the pattern used by the HTTP reload handler, this is acceptable. Consider wrapping in a select with ctx.Done() if clean shutdown becomes a requirement.

♻️ Optional: Add context-aware channel send
 			// Trigger reload via the same channel that SIGHUP and POST /-/reload use.
 			errCh := make(chan error)
-			reloadCh <- errCh
+			select {
+			case reloadCh <- errCh:
+			case <-ctx.Done():
+				logger.Info("Auto-reload: watcher stopped during reload attempt", "file", configFile)
+				return
+			}
 			if err := <-errCh; err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/alertmanager/main.go` around lines 746 - 752, The send to reloadCh in the
watcher can block and leak a goroutine if context is cancelled while sending; to
fix, make the send context-aware by replacing the direct send of errCh (reloadCh
<- errCh) with a select that attempts to send errCh or returns/continues on
ctx.Done(), e.g., in the watcher goroutine surrounding the reloadCh send use
select { case reloadCh <- errCh: ... case <-ctx.Done(): cleanup/continue },
ensuring you reference reloadCh, errCh and ctx.Done() so the goroutine exits
cleanly on shutdown.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1a804a2c-a6d9-4d5b-ad2a-7d1f7ad8c88d

📥 Commits

Reviewing files that changed from the base of the PR and between a5290a7 and 0501c55.

📒 Files selected for processing (4)
  • cmd/alertmanager/main.go
  • cmd/alertmanager/main_test.go
  • docs/configuration.md
  • docs/management_api.md

@mihir-dixit2k27 mihir-dixit2k27 force-pushed the feat/config-auto-reload-interval branch 2 times, most recently from 161829c to a2ceeb7 Compare May 2, 2026 17:18

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/alertmanager/main.go`:
- Around line 717-722: The initial config checksum read can fail leaving
lastChecksum empty and causing the first subsequent successful read to always
look like a change; modify the auto-reload logic that calls configFileChecksum
so that when lastChecksum is empty (uninitialized) and a subsequent
configFileChecksum succeeds, you set lastChecksum = newChecksum and do NOT
trigger a reload; apply the same "seed baseline if uninitialized" check where
you compare newChecksum vs lastChecksum (the places using lastChecksum,
newChecksum, and configFileChecksum and the reload invocation) so only genuine
checksum changes trigger reloadConfig.
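
The "seed baseline if uninitialized" rule described in this comment can be sketched as a small pure function. `nextChecksumState` is a hypothetical name, and treating the empty string as "uninitialized" is an assumption taken from the wording of the comment.

```go
package main

import "fmt"

// nextChecksumState (hypothetical) decides what to do with a freshly computed
// checksum. It returns the baseline to keep and whether a reload should be requested.
func nextChecksumState(last, current string) (baseline string, reload bool) {
	if last == "" {
		// The baseline was never established (for example, the initial read
		// failed): adopt the current checksum without triggering a reload.
		return current, false
	}
	if current == last {
		return last, false // unchanged
	}
	// Genuine change: request a reload, but keep the old baseline so that a
	// failed reload is retried on the next tick.
	return last, true
}

func main() {
	baseline, reload := nextChecksumState("", "abc")
	fmt.Println(baseline, reload) // prints "abc false"

	baseline, reload = nextChecksumState(baseline, "abc")
	fmt.Println(baseline, reload) // prints "abc false"

	baseline, reload = nextChecksumState(baseline, "def")
	fmt.Println(baseline, reload) // prints "abc true"
}
```

In this shape the caller would overwrite the baseline with the new checksum only after the reload reports success.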

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8dcd956a-d8ff-4066-bc98-4bc4095dc3ce

📥 Commits

Reviewing files that changed from the base of the PR and between 161829c and a2ceeb7.

📒 Files selected for processing (4)
  • cmd/alertmanager/main.go
  • cmd/alertmanager/main_test.go
  • docs/configuration.md
  • docs/management_api.md
✅ Files skipped from review due to trivial changes (3)
  • docs/management_api.md
  • cmd/alertmanager/main_test.go
  • docs/configuration.md

Comment thread cmd/alertmanager/main.go
Add a --config.auto-reload-interval flag that starts a background
goroutine polling the SHA256 checksum of the config file at the given
interval. When a change is detected the goroutine writes to the existing
webReload channel, the same path used by SIGHUP and POST /-/reload, so
no new reload logic is required.

The flag defaults to 0s (disabled). Any non-zero duration enables the
watcher. When the process shuts down the watcher exits cleanly via
context cancellation.

SHA256 polling is used instead of fsnotify because Kubernetes kubelet
uses AtomicWriter to update ConfigMap and Secret mounts via symlink
swaps, which causes fsnotify to miss updates after the first one.
Polling is the same approach Prometheus uses for --web.config.file.

Fixes prometheus#5195

Signed-off-by: Mihir Dixit <dixitmihir1@gmail.com>
@mihir-dixit2k27 mihir-dixit2k27 force-pushed the feat/config-auto-reload-interval branch from a2ceeb7 to f7ba5da Compare May 2, 2026 17:27

Development

Successfully merging this pull request may close these issues.

Feature request: native config auto-reload support, similar to Prometheus auto-reload-config
