Adding a yield in getTeamByServers to prevent CPU monopolization #12860
neethuhaneesha wants to merge 1 commit into apple:release-7.3 from
Conversation
8c0f52d to 393da65
Result of foundationdb-pr-clang-arm on Linux CentOS 7
Result of foundationdb-pr-clang on Linux RHEL 9
Result of foundationdb-pr on Linux RHEL 9
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
        break;
    }
    teamCount++;
    if (teamCount % 200 == 0) {
If there is important stuff to run at flow priority below the priority of whatever callback is running this, does that stuff get starved? I can't remember.
In any case, I'd be in favor of a hard cap on requests per second through this code path, given its problematic nature thus far. Say something like
if (++teamCount % 1000 == 0) {
    wait(delay(SERVER_KNOBS->DD_GET_TEAM_BY_SERVERS_DELAY_PER_1K_REQUESTS));
}
and define that knob as 0.01 to start with: 1 second / (0.01 seconds per 1000 requests) ⇒ 100K requests/second, which I'm guessing is a bit less than one core can do. But we can lower the knob if needed.
AI tells me that this should give lower-priority work a chance within the same outer run loop (I haven't actually looked at the code to verify). If you're sure this will work, I'm OK with it.
Separately, I think the yield check should go before checking any team, because otherwise, if there are a bazillion requests all handled by self->teams.begin(), we will never yield. Unlikely maybe, but why have to think about it.
    state const std::string servers = TCTeamInfo::serversToString(req.src);
    state Optional<Reference<IDataDistributionTeam>> res;
    state int teamCount = 0;
    state std::vector<Reference<TCTeamInfo>>::iterator teamIt = self->teams.begin();
since there is a wait(yield()) at the bottom now, it's possible other actors mutate self->teams in between yield points of this actor. Will that invalidate the iterator? Previously this was a blocking/sync call, so the iterator never got invalidated, since only this function could mutate self->teams while it was running.
Oh, good point. Given that the number of teams is on the order of hundreds, would it be simpler to just put a single yield() at the top of this function?
Alternatively, put the yield at the top of the loop in serverGetTeamRequests(). I feel yields at the tops of potentially tight loops are a bit more idiomatic.
Yes, assuming O(100) teams and therefore relatively low cost of a single invocation of getTeamByServers, having a single yield at the top would help. It does smell like the outer loop (serverGetTeamRequests) is not giving time to other tasks and it's invoking getTeamByServers back to back, and so an explicit yield within getTeamByServers would help break that cycle. This can help the immediate problem of txns timing out in other actors.
Perhaps we can add some counters here to validate the above assumptions?
(thinking more)
Above may solve the txn timing out problem but perhaps the next problem can be that as a result of mass storage server migration, the queue in serverGetTeamRequests is quite large (again, we need a counter here). If that were the case, then we have no option but to make the overall processing more efficient. One implementation I can think of:
- have a yield in the outer loop (in serverGetTeamRequests); maybe not add a big delay, but at least give other tasks in DD time on the run loop in between calls to getTeamByServers
- keep getTeamByServers sync/blocking
- but make getTeamByServers O(1) by using a hash map
Untested implementation of the latter approach above: #12861
Just an FYI: looks like we have ~3500 teams, and ~5400 getTeamByServers calls need to be done until all the data moves are recovered.
interesting, if I'm reading this correctly: 5400 calls × 3500 teams per call = ~18.9 million getServerIDsStr() calls. That's a lot; it can definitely consume most of the CPU.
with this PR, we'll still be doing 18.9M getServerIDsStr calls, just spread over time, so we'll slowly process the queue. Is the impact there just slow data movement, or do we think it's more than that?
given the numbers, I think making getTeamByServers O(1) and therefore doing 5400 getServerIDsStr would be ideal. Let me start simulation testing against #12861.
#12861 looks good to me. But we need to test more thoroughly.
yeah, that PR needs more testing. There's the teams vector and then the newly added hash map, they must always be consistent. The vector is only mutated in addTeam or removeTeam as far as I can tell, so it does not seem too risky.
so far the results on that PR are: 20260326-072449-praza-e8c2a8291ccb74a8 compressed=True data_size=35199917 duration=1719043 ended=38366 fail_fast=10 max_runs=500000 pass=38366 priority=100 remaining=3:27:09 runtime=0:17:13 sanity=False started=40128 submitted=20260326-072449 timeout=5400 username=praza
having said that, maybe we can try this change first (after the feedback is addressed), which feels less invasive and hopefully addresses the txn timeouts. If data movement is then slow due to large queues and slow processing, we'll have #12861 as a solution.
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
(release-branch or main if this is the youngest branch)