nats input: silent connection loss with no observability into disconnect/reconnect/close lifecycle

### Problem
When a NATS connection is lost, the nats input becomes silently non-functional with no log output to indicate what happened or whether recovery is in progress.                                                                                                                                           
The current get() in connection.go only sets nats.ErrorHandler, which covers async subscription-level errors. 

It does not set:
  - nats.DisconnectErrHandler: fired when the TCP connection drops
  - nats.ReconnectHandler: fired when reconnection succeeds
  - nats.ClosedHandler: fired when the connection is closed for good, including when the client exhausts all reconnect attempts and gives up permanently

This means:
  - A TCP disconnect produces zero log output
  - Reconnect attempts produce zero log output, even when one succeeds
  - If MaxReconnects (default: 60) is exhausted and the connection is permanently closed, there is zero log output. The input simply stops receiving messages forever, with no indication of why
 
### Impact
In production, this makes it nearly impossible to diagnose why an input stopped receiving messages without external tooling.
The symptom is that the input_received metric drops to 0 and never recovers until the pod is manually restarted.
Without logs from these handlers, operators cannot distinguish between:
  - A transient network blip that self-recovered
  - A permanent connection loss requiring intervention                                                                                            
  - A slow consumer that caused the server to forcibly close the subscription
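
To make the distinction concrete, here is a rough sketch of how the three cases could be told apart once each lifecycle event leaves a log line. The `Event` type, its constants, and `diagnose` are hypothetical names used only for illustration; they are not part of nats.go or of this input.

```go
package main

import "fmt"

// Event is a hypothetical label for one lifecycle log line.
type Event int

const (
	Disconnected Event = iota // would come from DisconnectErrHandler
	Reconnected               // would come from ReconnectHandler
	Closed                    // would come from ClosedHandler
	SlowConsumer              // would come from the existing ErrorHandler
)

// diagnose gives a rough reading of events in the order they were logged.
func diagnose(events []Event) string {
	connected, sawDisconnect, slow := true, false, false
	for _, e := range events {
		switch e {
		case Disconnected:
			connected, sawDisconnect = false, true
		case Reconnected:
			connected = true
		case Closed:
			return "permanent connection loss, restart required"
		case SlowConsumer:
			slow = true
		}
	}
	switch {
	case !connected:
		return "disconnected, reconnect still in progress"
	case slow:
		return "slow consumer reported"
	case sawDisconnect:
		return "transient disconnect, recovered"
	default:
		return "no connection events"
	}
}

func main() {
	fmt.Println(diagnose([]Event{Disconnected, Reconnected}))
	fmt.Println(diagnose([]Event{Disconnected, Closed}))
}
```

Today only the slow consumer case can leave a trace, because it is the only one that reaches the existing error handler.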

### Proposed Fix
Add the three missing handlers in get(), immediately after errorHandlerOption:

```go
	// ConnectedUrl() returns "" once the client is no longer connected,
	// so the disconnect log reports only the error.
	opts = append(opts, nats.DisconnectErrHandler(func(_ *nats.Conn, err error) {
		if err != nil {
			c.logger.Errorf("NATS disconnected: %v", err)
		} else {
			c.logger.Warnf("NATS disconnected (no error)")
		}
	}))

	opts = append(opts, nats.ReconnectHandler(func(nc *nats.Conn) {
		c.logger.Infof("NATS reconnected to %s", nc.ConnectedUrl())
	}))

	// ClosedHandler also fires on a deliberate Close(), so log the last
	// error rather than assuming reconnect attempts were exhausted.
	opts = append(opts, nats.ClosedHandler(func(nc *nats.Conn) {
		c.logger.Errorf("NATS connection closed, no further reconnect attempts will be made (last error: %v); restart required if this was not a shutdown", nc.LastError())
	}))
```
  
The ClosedHandler log in particular is critical: it is the only way an operator can know that the input will never recover without a restart, since the NATS client silently stops once MaxReconnects is exceeded (unless users configure MaxReconnect=-1 to reconnect forever).

**Note**: ClosedHandler is also a natural place to surface a fatal error or trigger a component restart in the future, but logging is a minimal and non-breaking first step.



                                                                                                                                 
 
