Problem
When a NATS connection is lost, the nats input becomes silently non-functional with no log output to indicate what happened or whether recovery is in progress.
The current get() in connection.go only sets nats.ErrorHandler, which covers async subscription-level errors.
It does not set:
- nats.DisconnectErrHandler: fired when the connection to the server drops
- nats.ReconnectHandler: fired when reconnection succeeds
- nats.ClosedHandler: fired when the connection is closed for good, most importantly when the client exhausts all reconnect attempts and gives up permanently
This means:
- A TCP disconnect produces zero log output
- A successful reconnect produces zero log output
- If MaxReconnects (default: 60) is exhausted and the connection is permanently closed, there is zero log output: the input simply stops receiving messages forever, with no indication of why
Impact
In production, this makes it nearly impossible to diagnose why an input stopped receiving messages without external tooling.
The symptom is that the input_received metric drops to 0 and stays there until the pod is manually restarted.
Without logs from these handlers, operators cannot distinguish between:
- A transient network blip that self-recovered
- A permanent connection loss requiring intervention
- A slow consumer that caused the server to forcibly close the subscription
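To illustrate how the three connection-level events would let an operator tell the first two cases apart, here is a minimal, standard-library-only sketch. The connEvent type and the diagnose function are hypothetical names invented for this illustration; they are not part of nats.go or of this codebase. The slow-consumer case is not modelled here.

```go
package main

import "fmt"

// connEvent is a hypothetical stand-in for the three callbacks
// discussed in this issue: disconnect, reconnect, and closed.
type connEvent int

const (
	evDisconnected connEvent = iota
	evReconnected
	evClosed
)

// diagnose walks an ordered list of events seen for one connection
// and returns a short verdict an operator could read from the logs.
func diagnose(events []connEvent) string {
	state := "healthy"
	for _, ev := range events {
		switch ev {
		case evDisconnected:
			state = "disconnected, recovery in progress"
		case evReconnected:
			state = "transient blip, self-recovered"
		case evClosed:
			// Closed is terminal: nothing after it can change the verdict.
			return "permanently closed, restart required"
		}
	}
	return state
}

func main() {
	fmt.Println(diagnose([]connEvent{evDisconnected, evReconnected}))
	fmt.Println(diagnose([]connEvent{evDisconnected, evClosed}))
}
```

With no handlers registered, none of these events is visible, so both sequences look identical from the outside until the metric flatlines.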
Proposed Fix
Add the three missing handlers in get(), immediately after errorHandlerOption:
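    // Log connection loss. Note: ConnectedUrl() may return an empty string here,
    // because the client is no longer connected when this handler runs.
    opts = append(opts, nats.DisconnectErrHandler(func(nc *nats.Conn, err error) {
        if err != nil {
            c.logger.Errorf("NATS disconnected from %s: %v", nc.ConnectedUrl(), err)
        } else {
            c.logger.Warnf("NATS disconnected from %s (no error)", nc.ConnectedUrl())
        }
    }))
    // Log successful recovery, including the server the client reconnected to.
    opts = append(opts, nats.ReconnectHandler(func(nc *nats.Conn) {
        c.logger.Infof("NATS reconnected to %s", nc.ConnectedUrl())
    }))
    // Log terminal closure; the client makes no further reconnect attempts after this.
    opts = append(opts, nats.ClosedHandler(func(nc *nats.Conn) {
        c.logger.Errorf("NATS connection permanently closed (exhausted reconnect attempts), manual restart required")
    }))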
The ClosedHandler log in particular is critical: it is the only way an operator can know that the input will never recover without a restart, since the NATS client silently stops reconnecting once MaxReconnects is exhausted (unless MaxReconnects is set to -1, which makes the client retry forever).
Note: ClosedHandler is also a natural place to surface a fatal error or trigger a component restart in the future, but logging is a minimal and non-breaking first step.
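As a rough idea of what that future step could look like, the sketch below shows one way a closed callback might hand a fatal condition to the owning component. It uses only the standard library; closedSignal, newClosedSignal, and errConnClosed are hypothetical names for this illustration and do not exist in nats.go or in the current code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConnClosed is a hypothetical sentinel error for this sketch.
var errConnClosed = errors.New("nats connection permanently closed")

// closedSignal lets a closed callback notify the owning component once.
type closedSignal struct {
	once sync.Once
	ch   chan struct{}
}

func newClosedSignal() *closedSignal {
	return &closedSignal{ch: make(chan struct{})}
}

// Fire is what the ClosedHandler callback would call.
// sync.Once makes repeated calls harmless.
func (s *closedSignal) Fire() {
	s.once.Do(func() { close(s.ch) })
}

// Err returns errConnClosed after Fire has been called and nil before,
// without blocking, so a read loop can poll it between messages.
func (s *closedSignal) Err() error {
	select {
	case <-s.ch:
		return errConnClosed
	default:
		return nil
	}
}

func main() {
	sig := newClosedSignal()
	fmt.Println(sig.Err()) // no closure seen yet
	sig.Fire()             // simulate the closed callback firing
	fmt.Println(sig.Err())
}
```

Closing a channel rather than sending on it means any number of goroutines can observe the closure, and the callback never blocks if nobody is listening.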