Skip to content

Conversation

@antoniovicente
Copy link
Contributor

The new logic matches the app-limited detection algorithm described in
the BBR RFC draft, which involves checking for the previous state of the
connection before timeout and ack processing.

This change simplifies the event loop and removes the need to bias the
select! to avoid starvation by serializing all the event loop actions
and clarifying the start and end of the work loop iteration.
@antoniovicente antoniovicente requested a review from a team as a code owner January 28, 2026 00:46
@antoniovicente antoniovicente force-pushed the antonio/app_limited branch 2 times, most recently from e0d6be3 to 7190ad4 Compare January 28, 2026 00:50
The new logic matches the app-limited detection algorithm described in
the BBR RFC draft, which involves checking for the previous state of the
connection before timeout and ack processing.
@estcarisimo
Copy link

This branch creates add enable_bbr_app_limited_fix as a flag to detect the status of the data flow coming for the app side. This is propagated to various places including the connection establishment.

The code reads well and it is generally easy to follow for a reader familiar with the app limited issue.

As this change propagate signals to various parts of QUIC mechanism, requires some lengthy comment blocks: (example)
My recommendation is to add documentation (e.g., md) along with that comment to reduce its length and perhaps add a visual element

The app_limited.rs is a very good test case. As it requires many lines of code it is a bit hard to keep all in cache while you read the code. Another minor point is what if enable_bbr_app_limited_fix=false. I am curious about it.

My last comment is that next_release_time_in_past as a function name was a bit confusing at the beginning but I then understood what it meant.

Unit test for pacer_has_pending_transmissions
@antoniovicente
Copy link
Contributor Author

This branch creates add enable_bbr_app_limited_fix as a flag to detect the status of the data flow coming for the app side. This is propagated to various places including the connection establishment.

The code reads well and it is generally easy to follow for a reader familiar with the app limited issue.

As this change propagate signals to various parts of QUIC mechanism, requires some lengthy comment blocks: (example) My recommendation is to add documentation (e.g., md) along with that comment to reduce its length and perhaps add a visual element

Hmm, maybe. I tend to prefer docs in line with code unless I'm trying to describe usage of some tool or larger scale design descriptions. I'll get back to you.

The app_limited.rs is a very good test case. As it requires many lines of code it is a bit hard to keep all in cache while you read the code. Another minor point is what if enable_bbr_app_limited_fix=false. I am curious about it.

I parametrized the test so it runs against CUBIC and BBR with enable_bbr_app_limited_fix=false

The short answer is that BBR incorrectly exits startup when this flag is not set. It would be interesting to also add some test cases for BBR existing startup due to bandwidth plateau but unfortunately I haven't managed to open source the traffic shaper utility that allows testing that sort of thing. I do hope to get to that in the coming weeks.

My last comment is that next_release_time_in_past as a function name was a bit confusing at the beginning but I then understood what it meant.

I renamed this method and added an unit test for it. See if that looks better to you.

/// connection work loop, before receiving packets from the network and
/// processing ACKs/timeouts.
///
/// The expected work loop order is:
Copy link
Member

@ghedo ghedo Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't particularly like that every application now has to change how it uses quiche to get correct behaviour... this API also seems kind of leaky and exposes internal architecture details that applications shouldn't really care about.

Could we detect the start automatically? Say, have an internal was_flushable_at_loop_stat: Option<bool> parameter that is initialized early in recv() if it's None, and then reset to None on the first send() call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very magical. If there is truly a guarantee that the event loop will involve a call to on_timeout, followed by other stuff maybe we can hide the new call inside on_timeout. But the problem is the transitions from not having flushable data to having flushable data. I need to know the value of that call before wait_for_data_or_handshake changes it.

The location of this whole event loop feels somewhat wrong. Things would be simpler if this logic could be moved to quiche; there is strict order of operations and things will go wrong if you get the order wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think that there is no guarantee that on_timeout or recv will be called in each loop iteration. Recv will only happen in cases where at least 1 packet has arrived.

I think this means that the first call into quiche could be on_timeout(), recv() or send_on_path().

Similarly there's no great signaling for the end of the loop iteration. We could try to look at differences in Instant::now, but the lack of a now argument to on_timeout(), recv() or send_on_path() makes that difficult; you can't tell based on time if two consecutive send_on_path calls happen on the same or different iterations of the event loop.

Is there something I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to gather information from this new experiment while we figure out the API issues. Could we consider marking some of the new APIs internal and proceed with this PR mostly as-is?

Personally I would like to see a now: Instant argument to on_timeout(), recv() or send_on_path(). But I would also like to see some simplifications to the quiche API by moving more of the work loop to quiche. The current API is just too low level and prone to change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants