Fix race in TestStream::StartPing causing Reject0Rtt/v6/20000 flakes#5995
Fix race in TestStream::StartPing causing Reject0Rtt/v6/20000 flakes#5995gaurav2699 wants to merge 9 commits into
Conversation
There was a problem hiding this comment.
I think we can simplify this.
If I read the help correctly, it is supposed to send a payload of N bytes, with a maximum of K send request in flight. If more than K requests are needed, the completion handler will queue more send requests.
We can just pre-compute how many requests this needs to send and how much data that will be. The member only ever needs to contain what the handler needs.
I'd also recommend to use a lock instead of atomic operation, if there are still any concurrency concerns. This is test code, simplicity and readabilty is much, much more important than an hypothetical performance gain.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5995 +/- ##
==========================================
+ Coverage 85.01% 85.77% +0.76%
==========================================
Files 60 60
Lines 18730 18730
==========================================
+ Hits 15923 16066 +143
+ Misses 2807 2664 -143 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| @@ -127,15 +127,55 @@ TestStream::StartPing( | |||
| { | |||
| BytesToSend = PayloadLength / MaxSendBuffers; | |||
There was a problem hiding this comment.
To be honest, I'm not sure I get this.
Is BytesToSend poorly named and actually not the number of bytes left to be sent?
There was a problem hiding this comment.
You're right, BytesToSend is per-buffer bytes remaining, not total. The real semantics is bytes still needed on each of the 2 buffers on the next StreamSend.
| return false; | ||
| } | ||
| if (resultingBytesLeft == 0) { | ||
| if (IsLast) { |
There was a problem hiding this comment.
Is this still needed? If we are on the last packet, we are going to exit the loop anyway.
| int64_t TotalSends; | ||
| uint32_t FinalSendBufferLength; | ||
| if (BytesToSend == 0) { | ||
| const uint64_t BytesPerBuffer = PayloadLength / MaxSendBuffers; |
There was a problem hiding this comment.
Oh, I think I finally get what the variable represents, which doesn't make a lot of sense.
The number of bytes we need to send is divided by the number of buffers we use on every send call?
This variable and BytesToSend are pretty confusing still (already from before this PR), because they don't represent a clear concept (or at least, one that doesn't click in my head...)
It is pretty similar, but I think it would be clearer if:
MaxBytesPerSendOperation = MaxSendBuffers * MaxSendLength
then:
TotalSends = PayloadLength / MaxBytesPerSendOperation;
LastSend = PayloadLength % MaxBytesPerSendOperation;
if (LastSend != 0) {
// Or we could just deal with the last send outside of the loop
TotalSends++;
}
And then, each iteration of the loop send MaxSendBuffers of MaxSendLength, except for the last iteration that sends MaxSendBuffers of LastSend / MaxSendBuffers bytes.
Or we could have the last send just send a single QUIC_BUFFER, honnestly, and keep it simple.
The callback would have to be adapted in the same way so that BytesToSend is actually the number of bytes to send.
Description
AppData/WithSend0RttArgs2.Reject0Rtt/7(v6 / Length=20000) intermittently fails withMsQuic->StreamSend failed, 0x80070057atTestStream.cpp:165. The failure is a race betweenTestStream::StartPing(main test thread) andTestStream::HandleStreamSendComplete(connection worker thread); both mutateBytesToSendnon-atomically and can each decide they're sending the final chunk.Make the "read remaining bytes + pick chunk size + commit subtraction" a single atomic claim via
InterlockedCompareExchange64in both call sites.Desired == 0after the successful CAS becomes the unambiguous "I got the last chunk" signal used for FIN (no separate post-subtract check).If a thread loses the CAS, it loops; if it sees
Current <= 0on a fresh read it exits cleanly.BytesToSendcan no longer go negative, the 4 GB allocation path is unreachable, and the FIN flag is applied by exactly one sender.Fixes: #5894
Testing
Tests Succeed
Documentation
No