Add OPC UA Reverse Connect support #1755

Open
kevinherron wants to merge 41 commits into main from feature/reverse-connect-2


Contributor

@kevinherron commented May 22, 2026

Motivation

Add symmetric support for OPC UA Reverse Connect (Part 6 §7.1.2.6), which inverts the initial TCP handshake so the server dials the client. This enables deployments where network policy prevents clients from reaching servers directly — typically servers behind NAT or restrictive firewalls.

Description

Implements Reverse Connect end-to-end across the stack and SDK layers on both sides.

  • Stack: new ReverseHelloMessage and MessageType.ReverseHello on the RHE/F frame; OpcTcpServerReverseConnector dials the client and hands off to the standard server pipeline; the client channel initializer is split so the same UASC pipeline can be installed on either an outbound or already-connected channel.
  • Client SDK: ReverseConnectManager binds listener sockets, runs a verifier, and matches one-shot selectors against incoming candidates. OpcUaClient.createReverseConnect(...) covers the hint-based case; DiscoveryFirstReverseConnectClient handles the case where the endpoint is not known up front; ReverseConnectAcceptor supports a single listener serving many unknown servers.
  • Server SDK: OpcUaServer exposes a target management API (add / update / remove / listen) backed by ReverseConnectTargetManager, with per-target retry policy and a runtime handle for pause, resume, trigger, and remove.
  • Docs and examples: feature and architecture references under docs/, plus reverse-connect client examples (including a Prosys interop variant).

ReverseHello is treated strictly as a pre-SecureChannel routing hint; all server identity validation remains in the normal certificate, endpoint, SecureChannel, and Session paths.

Introduce stack-core support for encoding, decoding, and identifying the
UA-TCP RHE message so later reverse-connect transport work can reuse the
same bounded message value and simple-message framing path.
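
To make the RHE framing concrete, here is a minimal sketch of a ReverseHello frame: an 8-byte UA-TCP header (`RHE`, chunk type `F`, little-endian UInt32 message size) followed by the ServerUri and EndpointUrl strings, each length-prefixed with an Int32. The class name `ReverseHelloSketch`, the use of `IllegalArgumentException` in place of `UaException`, and treating 4096 bytes as an inclusive cap are assumptions for illustration; this is not the `ReverseHelloMessage` added on this branch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Illustrative stand-in for a ReverseHello codec; not Milo's actual class. */
final class ReverseHelloSketch {

    static final int HEADER_SIZE = 8;         // "RHE" + chunk type 'F' + UInt32 message size
    static final int MAX_STRING_BYTES = 4096; // cap cited in this PR (assumed inclusive here)

    /** Encodes a complete RHE/F frame; a null string is written as length -1. */
    static byte[] encode(String serverUri, String endpointUrl) {
        byte[] uri = utf8(serverUri);
        byte[] url = utf8(endpointUrl);
        int size = HEADER_SIZE + 4 + length(uri) + 4 + length(url);

        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(new byte[] {'R', 'H', 'E', 'F'});
        buf.putInt(size);
        writeString(buf, uri);
        writeString(buf, url);
        return buf.array();
    }

    /** Decodes a frame into {serverUri, endpointUrl}; rejects truncated or oversized input. */
    static String[] decode(byte[] frame) {
        if (frame.length < HEADER_SIZE) {
            throw new IllegalArgumentException("truncated header: " + frame.length + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN);
        byte[] type = new byte[4];
        buf.get(type);
        if (type[0] != 'R' || type[1] != 'H' || type[2] != 'E' || type[3] != 'F') {
            throw new IllegalArgumentException("not an RHE/F frame");
        }
        if (buf.getInt() != frame.length) {
            throw new IllegalArgumentException("declared size does not match frame length");
        }
        return new String[] {readString(buf), readString(buf)};
    }

    private static byte[] utf8(String s) {
        if (s == null) {
            return null;
        }
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > MAX_STRING_BYTES) {
            throw new IllegalArgumentException("string exceeds " + MAX_STRING_BYTES + " bytes");
        }
        return bytes;
    }

    private static int length(byte[] bytes) {
        return bytes == null ? 0 : bytes.length;
    }

    private static void writeString(ByteBuffer buf, byte[] bytes) {
        if (bytes == null) {
            buf.putInt(-1);
        } else {
            buf.putInt(bytes.length).put(bytes);
        }
    }

    private static String readString(ByteBuffer buf) {
        if (buf.remaining() < 4) {
            throw new IllegalArgumentException("truncated string length");
        }
        int length = buf.getInt();
        if (length == -1) {
            return null; // null string
        }
        if (length < 0 || length > MAX_STRING_BYTES || length > buf.remaining()) {
            throw new IllegalArgumentException("invalid string length: " + length);
        }
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Checking lengths before allocating the string buffer keeps the decoder bounded even when a peer advertises a large or negative length.
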

Share client and server UASC pipeline initialization between existing TCP transports and future reverse-connected channels. This keeps outbound and passive listener behavior on the current path while letting already-active channels supply the Hello endpoint URL and still enforce the server Hello deadline.

Establish the client-side pre-UASC control plane for reverse-connect sockets so later transport work can hand claimed channels into the normal Milo client pipeline.

Introduce the internal UA-TCP primitive that opens a server-owned socket to a client reverse listener, sends ReverseHello, and hands the channel into the normal server UASC Hello path.

Keep target registries and retry scheduling out of this layer so later phases can build on attempt futures, state transitions, and connector-owned cleanup.
Wire reverse-opened TCP channels into the normal client Session path while
keeping ReverseConnectManager responsible for listener ownership, ReverseHello
validation, selector matching, and channel handoff.

Use optional transport capabilities for channel-state observation and current
channel access so SessionFsm remains transport-neutral and non-TCP transports
do not inherit lifecycle hooks they cannot support.
Support dynamic client-side claiming of pending reverse connections and one-shot
client creation from pre-claimed channels. Add server-side target configuration,
runtime control, scheduling, retry, update, and observability APIs on top of the
low-level reverse connector.
Make the dynamic Reverse Connect path first-class for clients that
start from ReverseHello metadata instead of a preselected endpoint.
The helpers separate the consumed discovery connection from the later
production Session connection and cover shared-listener routing,
endpoint selection, and pending-candidate acceptor startup behavior.
Show discovery-first Reverse Connect as the primary example flow for
clients that do not already have an endpoint description. Group
vendor-specific examples by server family and keep the local Milo
examples focused on the normal client/server workflow with TCP
initiation reversed.
Make handoff terminal before the channel future is exposed so pause and
update cannot close a channel that has entered the server UASC path. Track
pending handoff attempts until the target manager records their channels.
First-message timeout handling must not reject a candidate after
a ReverseHello has already been decoded but before selector
matching completes.

Gate timeout rejection on both the waiting state and the absence
of a recorded ReverseHello so stale scheduler tasks cannot tear
down valid candidates.
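
The two-condition gate described above can be sketched as follows. `CandidateTimeoutSketch`, its `State` enum, and its method names are hypothetical; the sketch only shows why a stale timeout task must check both the waiting state and whether a ReverseHello has already been recorded.

```java
/** Hypothetical candidate bookkeeping; names are illustrative, not the PR's API. */
final class CandidateTimeoutSketch {

    enum State { WAITING_FOR_FIRST_MESSAGE, CLAIMED, REJECTED }

    private State state = State.WAITING_FOR_FIRST_MESSAGE;
    private String reverseHelloEndpointUrl; // set once a ReverseHello has been decoded

    /** Records the decoded ReverseHello; selector matching may still be in progress. */
    synchronized void onReverseHelloDecoded(String endpointUrl) {
        reverseHelloEndpointUrl = endpointUrl;
    }

    /** Called when a selector claims the candidate. */
    synchronized void onClaimed() {
        state = State.CLAIMED;
    }

    /**
     * Called by the first-message timeout task. Rejects only if the candidate is
     * still waiting AND no ReverseHello has been recorded, so a task that fires
     * late cannot tear down a candidate that is merely waiting on selector matching.
     *
     * @return true if the candidate was rejected by this call.
     */
    synchronized boolean onFirstMessageTimeout() {
        if (state == State.WAITING_FOR_FIRST_MESSAGE && reverseHelloEndpointUrl == null) {
            state = State.REJECTED;
            return true;
        }
        return false;
    }

    synchronized State state() {
        return state;
    }
}
```
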
Release discovery-first acceptor keys when the production reverse
transport disconnects so later candidates for the same server can start
a fresh discovery and production connection. Emit a disconnected
transition during explicit reverse transport disconnects to make
application-initiated shutdowns observable.
Direct reverse connections are one-shot, so close, disconnect, and reuse
paths must fail the transport channel future terminally instead of leaving
session recovery or public connect calls waiting for another channel.

Avoid scheduling trigger or resume attempts while a target already owns a handed-off reverse channel. This keeps lifecycle operations aligned with the one-active-channel invariant and covers both paths with regression tests.

Default discovery-first routing previously treated missing ServerUri and
EndpointUrl hints as an exact match. Fail the default production path when
both hints are absent so shared listeners cannot claim unrelated hintless
candidates while still allowing custom selectors to opt into alternate
routing.
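
A rough sketch of the hintless-candidate problem: if the default selector compares hints with plain equality, two absent hints compare equal and any hintless candidate matches. The class `ProductionSelectorSketch`, the `BiPredicate` shape, the exact-match rule, and the `IllegalStateException` are assumptions for illustration, not the selector API on this branch.

```java
import java.util.Objects;
import java.util.function.BiPredicate;

/** Hypothetical default selector factory; names and matching rule are illustrative. */
final class ProductionSelectorSketch {

    /**
     * Builds a selector over (candidateServerUri, candidateEndpointUrl) from the
     * hints seen on the discovery connection. Fails when both hints are absent,
     * because null-equals-null would otherwise match any hintless candidate on a
     * shared listener. Callers that want different routing supply their own predicate.
     */
    static BiPredicate<String, String> defaultSelector(
            String discoveredServerUri, String discoveredEndpointUrl) {

        if (isAbsent(discoveredServerUri) && isAbsent(discoveredEndpointUrl)) {
            throw new IllegalStateException(
                "no ServerUri or EndpointUrl hint to route on; supply a custom selector");
        }
        return (candidateServerUri, candidateEndpointUrl) ->
            Objects.equals(discoveredServerUri, candidateServerUri)
                && Objects.equals(discoveredEndpointUrl, candidateEndpointUrl);
    }

    private static boolean isAbsent(String s) {
        return s == null || s.isBlank();
    }
}
```
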
Make the discovery-first example spell out how it prefers the
ReverseHello URL while still selecting from the GetEndpoints response.
Harden reverse-channel initialization, server transport unbind handling,
post-ReverseHello Hello size checks, and malformed ReverseHello decode
errors so wire-level failures surface deterministically.
Keep manager listener events balanced during partial startup failure,
preserve transition listener ordering, re-arm manager-mode claims before
failing invalid endpoints, and ensure acceptor stop/listener failures
clean up clients.
Make target handle failures asynchronous, validate resume and trigger
scheduling, preserve terminal attempt events for stale generations, and
apply retry policy to active-channel reconnects.
Keep ReverseHello string limits tied to the Part 6 4096-byte cap and make connected client channel initialization fail fast when called off the Netty event loop.
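
As an illustration of the string limit, the sketch below checks the UTF-8 encoded length (not the character count) in a record's compact constructor, so oversized values fail at construction time. The record name `ReverseHelloLimitsSketch`, nullable fields, `IllegalArgumentException`, and treating 4096 bytes as inclusive are assumptions; the branch's own parameter types may differ.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical parameter record; illustrative only. */
record ReverseHelloLimitsSketch(String serverUri, String endpointUrl) {

    static final int MAX_ENCODED_BYTES = 4096; // cap cited in this PR (assumed inclusive)

    // Compact constructor: runs before the fields are assigned.
    ReverseHelloLimitsSketch {
        requireWithinLimit("serverUri", serverUri);
        requireWithinLimit("endpointUrl", endpointUrl);
    }

    private static void requireWithinLimit(String name, String value) {
        if (value == null) {
            return; // absent hints are allowed in this sketch
        }
        // Measure encoded bytes: multi-byte characters count more than once.
        int encoded = value.getBytes(StandardCharsets.UTF_8).length;
        if (encoded > MAX_ENCODED_BYTES) {
            throw new IllegalArgumentException(
                name + " is " + encoded + " bytes; limit is " + MAX_ENCODED_BYTES);
        }
    }
}
```

A string of 2049 two-byte characters is rejected here even though it is well under 4096 characters, which is the case a `String.length()` check would miss.
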

Make reverse-listener cleanup and discovery failure handling deterministic under candidate rejection, pre-Hello close, explicit disconnect, synchronous setup failure, and empty endpoint discovery responses.

Keep target generations to one increment per logical transition and defensively copy retained attempt errors so snapshots can be exposed without letting callers mutate manager diagnostics.

Clarify client, server, and transport reverse-connect state machines and add short embedded API examples so users can follow common setup, selection, and cleanup flows without jumping to examples.

Clean up IDEA inspection findings in the unshipped Reverse Connect implementation and tests while leaving API-unused and duplicate-code warnings alone.

Ensure terminal attempt states cannot be overwritten by late async
callbacks, avoid invoking observers while holding the attempt state lock,
reject invalid first-response headers before buffering advertised bodies,
and fail partial server pipeline setup cleanly.
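
The first two points above (terminal states are sticky, observers run outside the lock) follow a common pattern that can be sketched like this. `AttemptStateSketch` and its states are hypothetical and much simpler than the attempt type on this branch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical attempt state holder; illustrative only. */
final class AttemptStateSketch {

    enum State {
        CONNECTING(false), HANDED_OFF(true), FAILED(true), CANCELLED(true);

        final boolean terminal;

        State(boolean terminal) {
            this.terminal = terminal;
        }
    }

    private final Object lock = new Object();
    private final List<Consumer<State>> observers = new CopyOnWriteArrayList<>();
    private State state = State.CONNECTING;

    void addObserver(Consumer<State> observer) {
        observers.add(observer);
    }

    /**
     * Moves to {@code next} unless a terminal state was already recorded, so a
     * late async callback cannot overwrite the outcome.
     *
     * @return true if the transition was applied.
     */
    boolean transition(State next) {
        synchronized (lock) {
            if (state.terminal) {
                return false;
            }
            state = next;
        }
        // Notify after releasing the lock so an observer that calls back into
        // this object, or blocks, cannot deadlock or stall other transitions.
        for (Consumer<State> observer : observers) {
            observer.accept(next);
        }
        return true;
    }

    State state() {
        synchronized (lock) {
            return state;
        }
    }
}
```
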
Propagate discovery-first cancellation into hidden discovery and
production work, keep production reconnect candidates available to the
owning transport before acceptor scans run, and release acceptor active
keys when pre-delivery failures or already-closed transports are observed.
Validate resumed targets even when an active channel is present, and emit
the synthetic closed attempt event on active-channel retry paths so
listeners observe the same state used by retry policy.
SessionFsm shelves the CloseSession event while in Creating/Activating
with a pending request. A reverse-connect transport whose server has
vanished without notifying the client can leave the FSM there
indefinitely, so disconnectAsync never returns. Bound the wait so
transport.disconnect runs after the timeout, failing the pending
channelFuture and unblocking the FSM.
sendRequestMessage previously awaited getChannel() with no bound, so a
reverse-connect transport whose server is offline would park requests
forever. Schedule the request timeout up front so the future completes
when the timeout hint elapses even if no channel arrives.
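
The idea of arming the timeout before waiting on the channel can be sketched with plain `CompletableFuture`s. `RequestTimeoutSketch`, the `String` stand-ins for the channel and response, and the method signature are assumptions for illustration; they are not Milo's `sendRequestMessage`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical request path; Strings stand in for the channel and the response. */
final class RequestTimeoutSketch {

    static CompletableFuture<String> sendRequest(
            CompletableFuture<String> channelFuture,
            String request,
            long timeoutMillis,
            ScheduledExecutorService scheduler) {

        CompletableFuture<String> response = new CompletableFuture<>();

        // Arm the timeout first, so the caller is released even if the channel
        // future never completes (for example, the reverse-connecting server is offline).
        ScheduledFuture<?> timeout = scheduler.schedule(
            () -> {
                response.completeExceptionally(new TimeoutException("request timed out"));
            },
            timeoutMillis,
            TimeUnit.MILLISECONDS);

        // Cancel the timer once the response settles for any reason.
        response.whenComplete((result, error) -> timeout.cancel(false));

        // Only now wait for a channel; completing an already-completed future is a no-op.
        channelFuture.whenComplete((channel, error) -> {
            if (error != null) {
                response.completeExceptionally(error);
            } else {
                response.complete(request + " sent on " + channel);
            }
        });

        return response;
    }
}
```
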
The transport transition listener is registered only on the transition
into Active. If the channel went inactive between the SecureChannel
handshake and reaching Active, the transport already emitted
connected=false before the listener was attached, so recovery would
wait for the next request to fail. Fire ConnectionLost synthetically
when the channel is already inactive at install time.
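
A minimal sketch of the install-time check, assuming a hypothetical `TransportSketch` with a boolean transition listener and a `Runnable` standing in for firing ConnectionLost. The real SessionFsm and transport types differ; the point is only the order: register the listener, then inspect the current state.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical transport exposing connected/disconnected transitions. */
final class TransportSketch {

    private final List<Consumer<Boolean>> listeners = new CopyOnWriteArrayList<>();
    private volatile boolean channelActive;

    void setChannelActive(boolean active) {
        channelActive = active;
        for (Consumer<Boolean> listener : listeners) {
            listener.accept(active);
        }
    }

    boolean isChannelActive() {
        return channelActive;
    }

    void addTransitionListener(Consumer<Boolean> listener) {
        listeners.add(listener);
    }
}

/** Hypothetical hook run when the session state machine enters Active. */
final class ActiveStateSketch {

    static void onEnterActive(TransportSketch transport, Runnable fireConnectionLost) {
        transport.addTransitionListener(connected -> {
            if (!connected) {
                fireConnectionLost.run();
            }
        });
        // The channel may have closed before the listener was attached, in which
        // case its connected=false transition was never observed. Fire the event
        // synthetically. A close racing this check can fire it twice, so the
        // handler is assumed to tolerate duplicates.
        if (!transport.isChannelActive()) {
            fireConnectionLost.run();
        }
    }
}
```
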
The reverse-connect initializer previously installed the customizer
before the AcknowledgeHandler while the outbound initializer installed
it after. Unify both paths to install the customizer after the
AcknowledgeHandler. Document the consequence: outbound writes from the
AcknowledgeHandler (such as the client Hello) do not pass through
customizer-installed handlers added with addLast.
- ReverseConnectAcceptor: release activeKeys in finally so an unexpected
  throw cannot leak the key; report inactive production transports as a
  delivery failure rather than swallowing them.
- ReverseConnectDiscovery: close the claimed connection when
  cancellation arrives after the in-flight GetEndpoints starts.
- ReverseConnectManager: skip the verifier when the candidate channel
  has already closed.
- ReverseTcpClientTransport: emit synthetic connected=false when a
  previous channel's close listener has not yet fired; complete the
  target future inside the lock to avoid orphan connected=true events.

Plus doc clarifications for the verifier contract, default production
selector matching, and session endpoint validation defaults.
- OpcUaServer: use a non-mutating transport lookup for target
  validation and run validation after binding so the bound transport
  is found without side effects.
- ReverseConnectTargetManager: clear pendingHandoffAttempts on shutdown;
  install a rescue cleanup when an attempt is cancelled but races a
  handoff; defensively copy stored lastError so callers see the same
  snapshot semantics; notify added before scheduling.
- OpcTcpServerReverseConnector: register the attempt after
  bootstrap.connect to avoid bookkeeping leaks on synchronous throws;
  always removeAttempt in finally.
- OpcTcpServerReverseConnectResponseHandler: release the Hello buffer
  on the exception path.
- TcpMessageEncoder: release the allocated buffer in a finally if the
  payload encoder throws, so encode failures cannot leak the partially
  written buffer.
- TcpMessageDecoder: replace the assert-based header guards in HEL/ACK/
  ERR with explicit UaException throws and 8-byte truncation checks,
  matching the existing RHE behavior regardless of -ea.
- OpcTcpServerReverseConnectAttempt: reject terminal nextState in
  transition() so future callers cannot bypass channelFuture completion.
- OpcTcpServerReverseConnectParameters: enforce the Part 6 4096-byte
  UTF-8 limit on serverUri/endpointUrl in the compact constructor so
  oversized values fail fast before TCP is opened.
- OpcTcpServerReverseConnector: map UnknownHostException and
  NoRouteToHostException to Bad_ConnectionRejected.
- OpcTcpServerTransport: issue connector.connect inside the transport
  lock so a concurrent unbind cannot surface the misleading
  "connector is closed" message.
- UascServerHelloHandler: capture and cancel the scheduled hello-
  deadline future on the success path so it does not retain the
  ChannelHandlerContext until the deadline fires.
- ReverseHelloMessageTest: cover truncated RHE headers, mixed-null and
  empty-string round trips, and assert Bad_EncodingLimitsExceeded on
  oversize cases.
- OpcUaServer: document that getBoundEndpoints returns an empty list
  when the server is not running, matching the unbind-clears-endpoints
  behavior introduced earlier on this branch.
- OpcUaServerConfigBuilder: require non-null targets in
  setReverseConnectTargets/addReverseConnectTarget; add a source-
  compatible OpcUaServerConfigImpl constructor that defaults the new
  reverseConnectTargets parameter to Set.of().
- ReverseConnectTargetManager: document the startup -> shutdown ->
  startup restart cycle and what is allowed between those phases.
- ReverseConnectTargetSnapshot: document the type-loss in lastError()
  so callers know to use lastStatusCode() and Throwable.toString()
  rather than instanceof checks.
- DiscoveryClient: correct the reverseGetEndpoints failure message to
  "null or blank" to match the predicate.
- ReverseConnectAcceptor: gate the production transport listener body
  on running.get() so stale transitions after stop() do not mutate
  acceptor bookkeeping.
- ReverseConnectManager: fire the bound listener event inside the
  bind lock so a concurrent accepted-channel event cannot enqueue
  before bound; set firstMessageReceived as soon as 8 bytes arrive so
  a malformed-frame throw cannot race the first-message timeout.
- ReverseTcpClientTransport: route the applicationContext-null failure
  through enterDirectTerminalStateLocked in direct mode so the original
  cause survives the next connect; fail handshakeFuture on channel
  close before/after initializer dispatch so the chain cannot remain
  pending; make enterDirectTerminalStateLocked a no-op when a prior
  terminal failure has already been recorded.
- OpcUaClientReverseConnectTest: wrap the manager in try-with-resources.
@kevinherron changed the title from "Add OPC UA Reverse Connect support (Part 6 §7.1.2.6)" to "Add OPC UA Reverse Connect support" on May 22, 2026
Fixes lifecycle races and validation gaps found during reverse-connect review,
including partial first-frame timeout handling, discovery cancellation cleanup,
listener callback ordering, initial Hello customization, target id validation,
and targeted Maven command documentation.