cmd/alertmanager: add --config.auto-reload-interval flag#5222
cmd/alertmanager: add --config.auto-reload-interval flag#5222mihir-dixit2k27 wants to merge 1 commit intoprometheus:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a polling-based auto-reload controlled by a new ChangesAuto-reload watcher
Sequence DiagramsequenceDiagram
participant App as Application
participant Watcher as Config Watcher
participant FS as File System
participant Coordinator as Reload Coordinator
participant Logger as Logger
App->>Watcher: start(ctx, interval, path, webReload)
Watcher->>FS: read config file
FS-->>Watcher: file contents
Watcher->>Watcher: compute SHA256, set lastChecksum
loop every interval
Watcher->>FS: read config file
FS-->>Watcher: file contents
Watcher->>Watcher: compute SHA256
alt checksum changed
Watcher->>Coordinator: send reload request (chan error) via webReload
Coordinator-->>Watcher: error or nil via chan
alt reload successful
Watcher->>Watcher: update lastChecksum
else reload failed
Watcher->>Logger: log error (keep lastChecksum)
end
else unchanged
Note over Watcher: no action
end
end
App->>Watcher: cancel ctx
Watcher->>Logger: cleanup
Watcher-->>App: exit
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cmd/alertmanager/main.go (1)
746-752: 💤 Low valueMinor: potential goroutine leak on shutdown edge case.
If context cancellation occurs while the watcher is blocked sending to
reloadCh(line 747), the goroutine won't exit cleanly because there's no select around the channel send. This is an edge case that only matters during shutdown when a config change is detected simultaneously with SIGTERM.Since the process is exiting anyway and this matches the pattern used by the HTTP reload handler, this is acceptable. Consider wrapping in a select with
ctx.Done()if clean shutdown becomes a requirement.♻️ Optional: Add context-aware channel send
// Trigger reload via the same channel that SIGHUP and POST /-/reload use. errCh := make(chan error) - reloadCh <- errCh + select { + case reloadCh <- errCh: + case <-ctx.Done(): + logger.Info("Auto-reload: watcher stopped during reload attempt", "file", configFile) + return + } if err := <-errCh; err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/alertmanager/main.go` around lines 746 - 752, The send to reloadCh in the watcher can block and leak a goroutine if context is cancelled while sending; to fix, make the send context-aware by replacing the direct send of errCh (reloadCh <- errCh) with a select that attempts to send errCh or returns/continues on ctx.Done(), e.g., in the watcher goroutine surrounding the reloadCh send use select { case reloadCh <- errCh: ... case <-ctx.Done(): cleanup/continue }, ensuring you reference reloadCh, errCh and ctx.Done() so the goroutine exits cleanly on shutdown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/alertmanager/main.go`:
- Around line 746-752: The send to reloadCh in the watcher can block and leak a
goroutine if context is cancelled while sending; to fix, make the send
context-aware by replacing the direct send of errCh (reloadCh <- errCh) with a
select that attempts to send errCh or returns/continues on ctx.Done(), e.g., in
the watcher goroutine surrounding the reloadCh send use select { case reloadCh
<- errCh: ... case <-ctx.Done(): cleanup/continue }, ensuring you reference
reloadCh, errCh and ctx.Done() so the goroutine exits cleanly on shutdown.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1a804a2c-a6d9-4d5b-ad2a-7d1f7ad8c88d
📒 Files selected for processing (4)
cmd/alertmanager/main.gocmd/alertmanager/main_test.godocs/configuration.mddocs/management_api.md
161829c to
a2ceeb7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/alertmanager/main.go`:
- Around line 717-722: The initial config checksum read can fail leaving
lastChecksum empty and causing the first subsequent successful read to always
look like a change; modify the auto-reload logic that calls configFileChecksum
so that when lastChecksum is empty (uninitialized) and a subsequent
configFileChecksum succeeds, you set lastChecksum = newChecksum and do NOT
trigger a reload; apply the same "seed baseline if uninitialized" check where
you compare newChecksum vs lastChecksum (the places using lastChecksum,
newChecksum, and configFileChecksum and the reload invocation) so only genuine
checksum changes trigger reloadConfig.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8dcd956a-d8ff-4066-bc98-4bc4095dc3ce
📒 Files selected for processing (4)
cmd/alertmanager/main.gocmd/alertmanager/main_test.godocs/configuration.mddocs/management_api.md
✅ Files skipped from review due to trivial changes (3)
- docs/management_api.md
- cmd/alertmanager/main_test.go
- docs/configuration.md
Add a --config.auto-reload-interval flag that starts a background goroutine polling the SHA256 checksum of the config file at the given interval. When a change is detected the goroutine writes to the existing webReload channel, the same path used by SIGHUP and POST /-/reload, so no new reload logic is required. The flag defaults to 0s (disabled). Any non-zero duration enables the watcher. When the process shuts down the watcher exits cleanly via context cancellation. SHA256 polling is used instead of fsnotify because Kubernetes kubelet uses AtomicWriter to update ConfigMap and Secret mounts via symlink swaps, which causes fsnotify to miss updates after the first one. Polling is the same approach Prometheus uses for --web.config.file. Fixes prometheus#5195 Signed-off-by: Mihir Dixit <dixitmihir1@gmail.com>
a2ceeb7 to
f7ba5da
Compare
Description
Adds a
--config.auto-reload-intervalflag. When set to a non-zero duration, a background goroutine polls the SHA256 checksum of the config file at the given interval and writes to the existingwebReloadchannel on a detected change — the same path taken bySIGHUPandPOST /-/reload— so no new reload logic is introduced.Why polling instead of fsnotify?
Kubernetes
kubeletusesAtomicWriterto update ConfigMap and Secret volume mounts via symlink swaps, which causes fsnotify to stop receiving events after the first update since the original inode is replaced. SHA256 polling works correctly regardless of how the underlying file is updated, which is also why Prometheus uses polling for its equivalent feature.Behaviour
0s(disabled). Existing deployments are completely unaffected.SIGHUP/POST /-/reload: invalid configs are rejected and logged without disrupting the running instance.lastChecksumis not updated, so the watcher retries on every subsequent tick until the configuration is valid again.SIGTERM.Changes
cmd/alertmanager/main.goconfigFileChecksum()helper,runConfigWatcher()goroutine, watcher start block inrun()cmd/alertmanager/main_test.godocs/configuration.mddocs/management_api.mdfixes #5197
Summary by CodeRabbit
New Features
Documentation
Tests