Description
A race condition in abstract-message-stream.ts causes an uncaught exception when a drain event fires on a stream whose underlying transport is already closing. The StreamStateError is thrown inside an event listener callback with no surrounding try/catch, so it propagates as an uncaught exception to the Node.js process level.
Environment
@libp2p/utils@7.0.13 (latest)
@chainsafe/libp2p-noise@17.0.0
@libp2p/mplex@12.0.14
- Lodestar v1.41.0 (Ethereum consensus client)
- Node.js (TCP transport)
Stack Trace
[network] error: uncaughtException: Cannot write to a stream that is closing
StreamStateError: Cannot write to a stream that is closing
at TCPSocketMultiaddrConnection.send (abstract-message-stream.js:112)
at stream.send (@libp2p/prometheus-metrics/index.js:183)
at EncryptedMessageStream.sendData (@chainsafe/libp2p-noise/utils.js:169)
at EncryptedMessageStream.processSendQueue (abstract-message-stream.js:355)
at EncryptedMessageStream.continueSendingOnDrain (abstract-message-stream.js:56)
Root Cause Analysis
The race:
- A TCP connection begins closing → the underlying stream's
writeStatus transitions to 'closing'
- Before the close completes, a
drain event fires on the underlying TCP stream (the OS/Node.js TCP socket signals buffer space is available)
- The noise layer's
noiseOnDrain handler (libp2p-noise/utils.ts) relays this as safeDispatchEvent('drain') on the encrypted stream
- The
continueSendingOnDrain listener in abstract-message-stream.ts (line 97-104) calls processSendQueue()
processSendQueue() (line 428) has guards for writableNeedsDrain, empty buffer, and sendingData — but no guard for writeStatus
- It calls
sendData() → noise's sendData() calls this.stream.send() on the underlying (now-closing) TCP stream
send() (line 169-171) throws StreamStateError because writeStatus === 'closing'
- The throw propagates through the event listener call stack with no try/catch → uncaught exception
Key observation: The continueSendingOnDrain callback at line 97 only checks this.writableNeedsDrain before calling processSendQueue(). Neither it nor processSendQueue() verifies that the stream is still writable.
Impact
- On Lodestar mainnet: 6 uncaught exceptions in 7 days (3 distinct timestamps, occurring in bursts of 2-3 when a connection tears down)
- The error is logged as
[network] error: uncaughtException — it reaches the Node.js process.on('uncaughtException') handler
- Currently non-fatal (Lodestar catches and logs it) but any
uncaughtException is a process stability risk
- Affects any js-libp2p consumer using noise over TCP when connections tear down while the send queue is draining
Proposed Fix
Add a writeStatus guard in processSendQueue(), which is the single chokepoint for all send paths:
// In abstract-message-stream.ts, processSendQueue() — add after existing bail-out checks:
protected processSendQueue (): boolean {
// bail if the underlying transport is full
if (this.writableNeedsDrain) { /* ... existing ... */ }
// bail if there is no data to send
if (this.writeBuffer.byteLength === 0) { /* ... existing ... */ }
// bail if we are already sending data
if (this.sendingData) { /* ... existing ... */ }
// bail if the stream is no longer writable — a drain event can
// arrive after writeStatus transitions to 'closing'/'closed'
if (this.writeStatus !== 'writable') {
this.log.trace('not processing send queue as stream is %s', this.writeStatus)
return false
}
// ... rest of method unchanged
}
This is safe because:
- It's a single guard at the beginning of the method, consistent with the existing bail-out pattern
- Data remaining in
writeBuffer when the stream closes will be discarded as part of the normal close/abort cleanup
- The
send() method already enforces this invariant by throwing — this just converts the throw into a graceful no-op at the right level
Alternative (less preferred): Add a try/catch around the sendData() call in processSendQueue() to swallow StreamStateError during close. This would be more defensive but masks the root cause.
Reproduction
This is a timing-dependent race that occurs during TCP connection teardown. It's most likely to manifest under load when many connections are being established and torn down simultaneously. In our case, a mainnet Ethereum beacon node with ~200 peers.
The simplest conceptual reproduction:
- Establish a noise-over-TCP connection with mplex
- Send enough data to fill the write buffer (trigger
writableNeedsDrain)
- Close/abort the connection while the buffer is still draining
- The OS delivers a drain event after the close has started → boom
Reported from Lodestar — an Ethereum consensus client built on js-libp2p.
Description
A race condition in
abstract-message-stream.tscauses an uncaught exception when adrainevent fires on a stream whose underlying transport is already closing. TheStreamStateErroris thrown inside an event listener callback with no surrounding try/catch, so it propagates as an uncaught exception to the Node.js process level.Environment
@libp2p/utils@7.0.13(latest)@chainsafe/libp2p-noise@17.0.0@libp2p/mplex@12.0.14Stack Trace
Root Cause Analysis
The race:
writeStatustransitions to'closing'drainevent fires on the underlying TCP stream (the OS/Node.js TCP socket signals buffer space is available)noiseOnDrainhandler (libp2p-noise/utils.ts) relays this assafeDispatchEvent('drain')on the encrypted streamcontinueSendingOnDrainlistener inabstract-message-stream.ts(line 97-104) callsprocessSendQueue()processSendQueue()(line 428) has guards forwritableNeedsDrain, empty buffer, andsendingData— but no guard forwriteStatussendData()→ noise'ssendData()callsthis.stream.send()on the underlying (now-closing) TCP streamsend()(line 169-171) throwsStreamStateErrorbecausewriteStatus === 'closing'Key observation: The
continueSendingOnDraincallback at line 97 only checksthis.writableNeedsDrainbefore callingprocessSendQueue(). Neither it norprocessSendQueue()verifies that the stream is still writable.Impact
[network] error: uncaughtException— it reaches the Node.jsprocess.on('uncaughtException')handleruncaughtExceptionis a process stability riskProposed Fix
Add a
writeStatusguard inprocessSendQueue(), which is the single chokepoint for all send paths:This is safe because:
writeBufferwhen the stream closes will be discarded as part of the normal close/abort cleanupsend()method already enforces this invariant by throwing — this just converts the throw into a graceful no-op at the right levelAlternative (less preferred): Add a try/catch around the
sendData()call inprocessSendQueue()to swallowStreamStateErrorduring close. This would be more defensive but masks the root cause.Reproduction
This is a timing-dependent race that occurs during TCP connection teardown. It's most likely to manifest under load when many connections are being established and torn down simultaneously. In our case, a mainnet Ethereum beacon node with ~200 peers.
The simplest conceptual reproduction:
writableNeedsDrain)Reported from Lodestar — an Ethereum consensus client built on js-libp2p.