Skip to content

Comments

feat(ping): add core PingRequest/Response implementation#2939

Open
roganlynch wants to merge 2 commits intojuanfont:mainfrom
roganlynch:pr2b-ping-core
Open

feat(ping): add core PingRequest/Response implementation#2939
roganlynch wants to merge 2 commits intojuanfont:mainfrom
roganlynch:pr2b-ping-core

Conversation

@roganlynch
Copy link
Contributor

Adds core ping functionality for node health checks and connectivity testing.

Changes

  • PingManager: Manages ping request lifecycle, tracking, and response handling
  • PingRequest types: Support for disco, TSMP, and c2n ping types
  • Noise protocol integration: Authenticated ping response handler
  • HTTP endpoint: /machine/ping-response/{request_id} for receiving responses
  • Interface extension: Uses SendDirectUpdate method for direct message delivery

Implementation Details

The PingManager creates unique ping requests with callback URLs, tracks them with timeouts, and coordinates responses. The Noise protocol ensures only authenticated nodes can submit ping responses.

This builds on the SendDirectUpdate capability from the previous commit and provides the foundation for gRPC ping endpoints and the scheduler.

Context

Part of the work discussed in #2902 to add ping functionality to headscale.

Dependencies


Note: This code was generated with assistance from Claude Sonnet 4.5 via cline.bot

…very

Add SendDirectUpdate method to LockFreeBatcher to enable sending MapResponse
messages directly to specific nodes without going through the change broadcast
system. This is useful for targeted updates like PingRequests that need to be
delivered immediately to a specific node.

The method:
- Validates the MapResponse and node existence
- Checks for active connections
- Sends directly via the node's send method
- Logs delivery status for debugging

This lays the groundwork for implementing ping functionality that requires
direct node communication.

Related to juanfont#2902
Add core ping functionality for node health checks and connectivity testing.

This PR implements:
- PingManager: manages ping request lifecycle, tracking, and response handling
- PingRequest types: support for disco, TSMP, and c2n ping types
- Noise protocol integration: authenticated ping response handler
- HTTP endpoint: /machine/ping-response/{request_id} for receiving responses
- Batcher interface extension: SendDirectUpdate method for direct message delivery

The PingManager creates unique ping requests with callback URLs, tracks them
with timeouts, and coordinates responses. The Noise protocol ensures only
authenticated nodes can submit ping responses.

This builds on the SendDirectUpdate capability added in the previous commit
and provides the foundation for gRPC ping endpoints and the scheduler.

Related to juanfont#2902
}
}

// SendDirectUpdate sends a MapResponse directly to a specific node without going through
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my original suggestion was to batch these PR a bit bigger than one for each, so we avoid this where we have the same commits in multiple PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies - NooB - Went the opposite extreme after the monolithic PR

@@ -0,0 +1,280 @@
package hscontrol
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have to think more about this, I am not sure how we want to structure this yet.

type PingManager struct {
mu sync.RWMutex
requests map[string]*PingRequest
baseURL string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we actually need a base URL, I think the client ignores the part: https://github.com/tailscale/tailscale/blob/8eda947530cebbe3dc7882ace4d9f2829b0448da/ipn/ipnlocal/web_client.go#L183-L185

Would be good to test, because it would simplify more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can test this and get back to you.

Comment on lines +52 to +55
idBytes := make([]byte, 16)
if _, err := rand.Read(idBytes); err != nil {
return nil, fmt.Errorf("generating request ID: %w", err)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have a few random ID generators implemented, use one of those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger.

}

// CreateKeepAlivePing creates a simple keep-alive ping request
func (pm *PingManager) CreateKeepAlivePing(nodeID types.NodeID) (*tailcfg.PingRequest, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Question. Probably vestigial/superfluous. Will investigate and remove as needs be.


// CheckNodeOnline checks if a node is online by sending a health check ping.
// It creates a ping request, sends it via the MapBatcher, and waits for a response.
func (h *Headscale) CheckNodeOnline(nodeID types.NodeID, targetIP string) (*tailcfg.PingResponse, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this manager in general has too much "exposed api", so a user have access to too much internal channels and other stuff from the PingResponse structure.

This function is more of a sensible wrapper, where the thing you can access is the "what do I want, I want to know if a node is healthy", without having to do a large multistep building block thing. Of course we need them available, but I suspect we want to have them internally in a package, and only expose specific functions, like "Is it still here" (health sounds misguided, because the node being available doesnt have to mean healthy).

I would think that the only exposed things should be pinging (are you there) and various c2n implementations we add over time. The rest are internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger. So we're starting small - I'll see what can be dropped/simplified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants