IO.select blocks indefinitely with :persistent plugin when connections become inactive #137
Description
Summary
When using the :persistent plugin, IO.select in Selector#select_many can be called with a nil timeout, causing it to block indefinitely. This happens because inactive persistent connections return nil from Connection#timeout, and if all selectables have nil timeouts and no timers are pending, Selector#next_timeout returns nil.
We observed a Sidekiq job hang for 11 hours on IO.select before we took a thread dump and killed the process.
Environment
- httpx version: 1.7.5 (branch issue-377, commit 336b057cd5c2)
- Ruby version: 3.4.8
- OS: Linux (Docker container, Debian-based)
- Plugins: :persistent, :retries (max_retries: 2), :rate_limiter, :auth
Configuration
HTTPX.
  plugin(:retries, max_retries: 2, retry_after: :exponential_backoff).
  plugin(:rate_limiter).
  plugin(:auth).
  with(
    timeout: { connect_timeout: 15, request_timeout: 30, operation_timeout: 30 },
  ).
  plugin(:persistent, close_on_fork: true)
Usage pattern
We issue multi-request GET calls to multiple different origins:
# requests is an array of [path, { origin: ..., params: ... }] tuples
responses = session.get(*requests)
This fans out HTTP requests to ~20-30 different hosts in parallel via Session#send_requests → Session#receive_requests.
Thread dump (stuck thread)
IO.select # selector.rb:206
HTTPX::Selector#select_many # selector.rb:206
HTTPX::Selector#select # selector.rb:183
HTTPX::Selector#next_tick # selector.rb:53
HTTPX::Session#receive_requests # session.rb:337
HTTPX::Session#send_requests # session.rb:307
HTTPX::Session#request # session.rb:102
HTTPX::Chainable#get # chainable.rb:10
Root cause analysis
1. Connection#timeout returns nil for inactive connections
In connection.rb:327-335:
def timeout
  return if @state == :closed || @state == :inactive # <-- returns nil
  return @timeout if @timeout
  return @options.timeout[:connect_timeout] if @state == :idle
  @options.timeout[:operation_timeout]
end
2. Selector#next_timeout propagates nil to IO.select
In selector.rb:275-291:
def next_timeout
  timer_interval = @timers.wait_interval
  connection_interval = @selectables.filter_map(&:timeout).min
  return connection_interval unless timer_interval # <-- returns nil if both are nil
  # ...
end
If all registered selectables are :inactive or :closed, filter_map(&:timeout) produces an empty array, .min returns nil, and if there are no active timers, next_timeout returns nil.
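The nil propagation can be checked in isolation, without httpx. The following is a minimal sketch: FakeConnection and connection_interval are hypothetical stand-ins written for this illustration, not httpx classes, and they only mirror the two excerpts above for the case where no timer is pending.

```ruby
# Hypothetical stand-in for a registered selectable; only #timeout matters here.
# It mirrors the early return in the Connection#timeout excerpt.
FakeConnection = Struct.new(:state, :operation_timeout) do
  def timeout
    return if state == :closed || state == :inactive

    operation_timeout
  end
end

# Mirrors the connection half of the next_timeout excerpt (no timers pending).
def connection_interval(selectables)
  selectables.filter_map(&:timeout).min
end

parked = [FakeConnection.new(:inactive, 30), FakeConnection.new(:closed, 30)]
mixed  = parked + [FakeConnection.new(:open, 30)]

p connection_interval(parked) # every timeout is nil, so filter_map yields []
p connection_interval(mixed)  # one live connection is enough to bound the wait
```

As long as a single selectable reports a numeric timeout the interval stays bounded; the unbounded case only appears once every registered selectable is :inactive or :closed.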
3. IO.select called with nil timeout blocks forever
In selector.rb:204-206:
def select_many(r, w, interval, &block)
  readers, writers = ::IO.select(r, w, nil, interval) # interval is nil → blocks forever
4. The :persistent plugin creates the conditions for this
Without :persistent, connections close after completing their requests and are deregistered from the selector. With :persistent, completed connections transition to :inactive and remain as selectables.
The likely sequence:
- Multi-request call sends requests to N hosts
- Most responses complete successfully, and their connections transition to :inactive
- One connection enters a problematic state (e.g., remote closes the TCP connection without a proper FIN/RST, or the connection errors out and gets closed)
- All remaining selectables now return nil from timeout
- No active timers remain (the request timeout timer may have been cleaned up when the connection errored)
- next_timeout returns nil
- IO.select blocks forever waiting on the inactive connections' file descriptors
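The last step of this sequence relies only on standard IO.select behavior, which can be observed with a pipe that never becomes readable. This is a small sketch, not httpx code: select_still_waiting? is a hypothetical helper, and the 0.05 and 0.3 second values are arbitrary. The select call runs in a thread so the demonstration itself cannot hang.

```ruby
# Returns true if IO.select on an idle pipe is still waiting after `patience` seconds.
def select_still_waiting?(timeout, patience: 0.3)
  reader, writer = IO.pipe # nothing is ever written, so reader never becomes readable
  waiter = Thread.new { IO.select([reader], nil, nil, timeout) }
  waiter.join(patience).nil? # Thread#join returns nil when the limit expires first
ensure
  # Stop the waiting thread before closing the descriptors it is selecting on.
  waiter&.kill
  waiter&.join
  writer&.close
  reader&.close
end

puts select_still_waiting?(0.05) # numeric timeout: select gives up on its own
puts select_still_waiting?(nil)  # nil timeout: select has no deadline
```

With a numeric timeout the call returns by itself; with nil it only returns when a descriptor becomes ready, which an idle persistent connection may never do.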
Steps to reproduce
This is a race condition that's hard to reproduce deterministically, but the conditions are:
- Use the :persistent plugin
- Send multi-request calls (session.get(*multiple_requests)) to multiple different origins
- Have one of the target hosts drop the connection in a way that causes the connection to close/error without properly failing the associated request's timeout timer
- The remaining inactive persistent connections keep the selector non-empty, but all return nil timeouts
A more targeted reproduction might be possible by:
require "httpx"
session = HTTPX.plugin(:persistent).with(
  timeout: { connect_timeout: 5, request_timeout: 10, operation_timeout: 10 }
)
# Send requests to multiple hosts where one will hang/drop
responses = session.get(
  ["https://httpbin.org/get", {}],
  ["https://host-that-will-drop-connection.example/", {}]
)
Suggested fix
Selector#next_timeout should return a maximum timeout value (or the minimum configured timeout) instead of nil when no selectables report a timeout. This ensures IO.select always has a bounded wait. For example:
def next_timeout
  @is_timer_interval = false
  timer_interval = @timers.wait_interval
  connection_interval = @selectables.filter_map(&:timeout).min
  # Ensure we never return nil when there are selectables registered,
  # to prevent IO.select from blocking indefinitely
  if connection_interval.nil? && timer_interval.nil? && @selectables.any?
    return 0 # or a small default like 1
  end
  return connection_interval unless timer_interval
  # ...
end
Alternatively, Connection#timeout could return operation_timeout for :inactive connections that are still registered in the selector, rather than nil.
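The choice of fallback value can be explored as a pure function, separate from the selector. This is only a sketch of the idea: bounded_interval and FALLBACK_INTERVAL are hypothetical names, not httpx options, and combining the two intervals with a plain minimum is an assumption, since the rest of next_timeout is elided above.

```ruby
# Hypothetical upper bound (seconds) for one IO.select wait; not an httpx setting.
FALLBACK_INTERVAL = 1

# Picks the wait interval for one selector tick.
# Assumption: the smaller of the two known intervals wins when both exist.
def bounded_interval(connection_interval, timer_interval, any_selectables)
  interval = [connection_interval, timer_interval].compact.min
  return interval if interval

  # Nothing reports a timeout: bound the wait only if something is registered.
  any_selectables ? FALLBACK_INTERVAL : nil
end

p bounded_interval(nil, nil, true) # the case from this report, now bounded
p bounded_interval(30, 2.5, true)  # a known interval is left untouched
```

A fallback of 0 would make IO.select return immediately, which could turn the wait into a busy loop if the caller keeps polling; a small positive value trades a little wake-up overhead for a guaranteed periodic re-check.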