feat(ping): add core PingRequest/Response implementation#2939
feat(ping): add core PingRequest/Response implementation#2939roganlynch wants to merge 2 commits intojuanfont:mainfrom
Conversation
…very Add SendDirectUpdate method to LockFreeBatcher to enable sending MapResponse messages directly to specific nodes without going through the change broadcast system. This is useful for targeted updates like PingRequests that need to be delivered immediately to a specific node. The method: - Validates the MapResponse and node existence - Checks for active connections - Sends directly via the node's send method - Logs delivery status for debugging This lays the groundwork for implementing ping functionality that requires direct node communication. Related to juanfont#2902
Add core ping functionality for node health checks and connectivity testing.
This PR implements:
- PingManager: manages ping request lifecycle, tracking, and response handling
- PingRequest types: support for disco, TSMP, and c2n ping types
- Noise protocol integration: authenticated ping response handler
- HTTP endpoint: /machine/ping-response/{request_id} for receiving responses
- Batcher interface extension: SendDirectUpdate method for direct message delivery
The PingManager creates unique ping requests with callback URLs, tracks them
with timeouts, and coordinates responses. The Noise protocol ensures only
authenticated nodes can submit ping responses.
This builds on the SendDirectUpdate capability added in the previous commit
and provides the foundation for gRPC ping endpoints and the scheduler.
Related to juanfont#2902
| } | ||
| } | ||
|
|
||
| // SendDirectUpdate sends a MapResponse directly to a specific node without going through |
There was a problem hiding this comment.
I think my original suggestion was to batch these PR a bit bigger than one for each, so we avoid this where we have the same commits in multiple PRs.
There was a problem hiding this comment.
Apologies - NooB - Went the opposite extreme after the monolithic PR
| @@ -0,0 +1,280 @@ | |||
| package hscontrol | |||
There was a problem hiding this comment.
I will have to think more about this, I am not sure how we want to structure this yet.
| type PingManager struct { | ||
| mu sync.RWMutex | ||
| requests map[string]*PingRequest | ||
| baseURL string |
There was a problem hiding this comment.
I do not think we actually need a base URL, I think the client ignores the part: https://github.com/tailscale/tailscale/blob/8eda947530cebbe3dc7882ace4d9f2829b0448da/ipn/ipnlocal/web_client.go#L183-L185
Would be good to test, because it would simplify more.
There was a problem hiding this comment.
I can test this and get back to you.
| idBytes := make([]byte, 16) | ||
| if _, err := rand.Read(idBytes); err != nil { | ||
| return nil, fmt.Errorf("generating request ID: %w", err) | ||
| } |
There was a problem hiding this comment.
we already have a few random ID generators implemented, use one of those.
| } | ||
|
|
||
| // CreateKeepAlivePing creates a simple keep-alive ping request | ||
| func (pm *PingManager) CreateKeepAlivePing(nodeID types.NodeID) (*tailcfg.PingRequest, error) { |
There was a problem hiding this comment.
Good Question. Probably vestigial/superfluous. Will investigate and remove as needs be.
|
|
||
| // CheckNodeOnline checks if a node is online by sending a health check ping. | ||
| // It creates a ping request, sends it via the MapBatcher, and waits for a response. | ||
| func (h *Headscale) CheckNodeOnline(nodeID types.NodeID, targetIP string) (*tailcfg.PingResponse, error) { |
There was a problem hiding this comment.
I feel like this manager in general has too much "exposed api", so a user have access to too much internal channels and other stuff from the PingResponse structure.
This function is more of a sensible wrapper, where the thing you can access is the "what do I want, I want to know if a node is healthy", without having to do a large multistep building block thing. Of course we need them available, but I suspect we want to have them internally in a package, and only expose specific functions, like "Is it still here" (health sounds misguided, because the node being available doesnt have to mean healthy).
I would think that the only exposed things should be pinging (are you there) and various c2n implementations we add over time. The rest are internals.
There was a problem hiding this comment.
Roger. So we're starting small - I'll see what can be dropped/simplified.
Adds core ping functionality for node health checks and connectivity testing.
Changes
/machine/ping-response/{request_id}for receiving responsesImplementation Details
The PingManager creates unique ping requests with callback URLs, tracks them with timeouts, and coordinates responses. The Noise protocol ensures only authenticated nodes can submit ping responses.
This builds on the SendDirectUpdate capability from the previous commit and provides the foundation for gRPC ping endpoints and the scheduler.
Context
Part of the work discussed in #2902 to add ping functionality to headscale.
Dependencies
Note: This code was generated with assistance from Claude Sonnet 4.5 via cline.bot