-
Notifications
You must be signed in to change notification settings - Fork 938
BBR: Check for app-limited at the beginning of the packet recv / send loop #2328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This change simplifies the event loop and removes the need to bias the select! to avoid starvation by serializing all the event loop actions and clarifying the start and end of the work loop iteration.
e0d6be3 to
7190ad4
Compare
The new logic matches the app-limited detection algorithm described in the BBR RFC draft, which involves checking for the previous state of the connection before timeout and ack processing.
7190ad4 to
44f9c5b
Compare
|
This branch creates add The code reads well and it is generally easy to follow for a reader familiar with the app limited issue. As this change propagate signals to various parts of QUIC mechanism, requires some lengthy comment blocks: (example) The app_limited.rs is a very good test case. As it requires many lines of code it is a bit hard to keep all in cache while you read the code. Another minor point is what if My last comment is that |
Unit test for pacer_has_pending_transmissions
Hmm, maybe. I tend to prefer docs in line with code unless I'm trying to describe usage of some tool or larger scale design descriptions. I'll get back to you.
I parametrized the test so it runs against CUBIC and BBR with enable_bbr_app_limited_fix=false The short answer is that BBR incorrectly exits startup when this flag is not set. It would be interesting to also add some test cases for BBR existing startup due to bandwidth plateau but unfortunately I haven't managed to open source the traffic shaper utility that allows testing that sort of thing. I do hope to get to that in the coming weeks.
I renamed this method and added an unit test for it. See if that looks better to you. |
| /// connection work loop, before receiving packets from the network and | ||
| /// processing ACKs/timeouts. | ||
| /// | ||
| /// The expected work loop order is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I don't particularly like that every application now has to change how it uses quiche to get correct behaviour... this API also seems kind of leaky and exposes internal architecture details that applications shouldn't really care about.
Could we detect the start automatically? Say, have an internal was_flushable_at_loop_stat: Option<bool> parameter that is initialized early in recv() if it's None, and then reset to None on the first send() call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very magical. If there is truly a guarantee that the event loop will involve a call to on_timeout, followed by other stuff maybe we can hide the new call inside on_timeout. But the problem is the transitions from not having flushable data to having flushable data. I need to know the value of that call before wait_for_data_or_handshake changes it.
The location of this whole event loop feels somewhat wrong. Things would be simpler if this logic could be moved to quiche; there is strict order of operations and things will go wrong if you get the order wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think that there is no guarantee that on_timeout or recv will be called in each loop iteration. Recv will only happen in cases where at least 1 packet has arrived.
I think this means that the first call into quiche could be on_timeout(), recv() or send_on_path().
Similarly there's no great signaling for the end of the loop iteration. We could try to look at differences in Instant::now, but the lack of a now argument to on_timeout(), recv() or send_on_path() makes that difficult; you can't tell based on time if two consecutive send_on_path calls happen on the same or different iterations of the event loop.
Is there something I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to gather information from this new experiment while we figure out the API issues. Could we consider marking some of the new APIs internal and proceed with this PR mostly as-is?
Personally I would like to see a now: Instant argument to on_timeout(), recv() or send_on_path(). But I would also like to see some simplifications to the quiche API by moving more of the work loop to quiche. The current API is just too low level and prone to change.
The new logic matches the app-limited detection algorithm described in
the BBR RFC draft, which involves checking for the previous state of the
connection before timeout and ack processing.