[opt](TabletScheduler)Make EditLog asynchronous and not block the scheduler #59774
base: master
8bf4c4a to 0a1d6fb
run buildall
TPC-H: Total hot run time: 32086 ms
TPC-DS: Total hot run time: 172712 ms

run feut

run buildall
TPC-H: Total hot run time: 31981 ms
TPC-DS: Total hot run time: 173128 ms

There is a sync edit log API; just use it.
dataroaring left a comment:
Use the async edit log API.
I noticed that there is a logEditWithQueue method, but it is actually a synchronous API that blocks the current thread, so it does not meet our needs: some updates, such as finalizeTabletCtx, must run after the edit log has been written. Besides, logEditWithQueue seems to have been introduced only in a recent version, so it cannot be used in earlier versions.
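The requirement above (post-log work such as finalizeTabletCtx must wait for the write, but the scheduler thread must not) can be met with a completion callback. The following is a minimal sketch under stated assumptions, not the code in this PR: AsyncEditLogSketch, writeJournal, finalizeCtx and scheduleRound are hypothetical names, the 20 ms sleep stands in for journal latency, and a single-threaded executor is assumed so that entries are persisted in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names are Doris FE APIs.
public class AsyncEditLogSketch {
    // A single writer thread persists journal entries in submission order.
    private final ExecutorService logWriter = Executors.newSingleThreadExecutor();
    // Thread-safe record of what happened, for illustration only.
    final List<String> events = new CopyOnWriteArrayList<>();

    // Stand-in for a blocking journal write (disk or replication latency).
    private void writeJournal(String tabletId) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        events.add("logged " + tabletId);
    }

    // Stand-in for post-log bookkeeping, analogous to finalizeTabletCtx.
    private void finalizeCtx(String tabletId) {
        events.add("finalized " + tabletId);
    }

    // Returns at once; finalizeCtx runs only after writeJournal has completed.
    CompletableFuture<Void> logAsync(String tabletId) {
        return CompletableFuture
                .runAsync(() -> writeJournal(tabletId), logWriter)
                .thenRun(() -> finalizeCtx(tabletId));
    }

    // Simulates one scheduling round: dispatch each clone task, then hand off its log write.
    static List<String> scheduleRound(List<String> tabletIds) {
        AsyncEditLogSketch s = new AsyncEditLogSketch();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String id : tabletIds) {
            s.events.add("dispatched " + id); // no waiting on the journal here
            pending.add(s.logAsync(id));
        }
        // Only the demo waits; a real scheduler loop would simply continue.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        s.logWriter.shutdown();
        return s.events;
    }

    public static void main(String[] args) {
        System.out.println(scheduleRound(List.of("t1", "t2", "t3")));
    }
}
```

Chaining with thenRun means the finalization step runs only once the journal write has completed, while the dispatch loop keeps going without waiting. In this sketch the single writer thread is what preserves journal order; a real implementation would also need to handle failed writes, which the sketch ignores.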
run p0 |
FE Regression Coverage Report: Increment line coverage
What problem does this PR solve?
This PR makes edit log operations in the TabletScheduler asynchronous, so that they no longer block the scheduler and overall system responsiveness improves.
Previously, synchronous EditLog operations in the TabletScheduler slowed down the dispatch of clone tasks. As a result, raising schedule_batch_size and schedule_slot_num_per_hdd_path/schedule_slot_num_per_ssd_path to large values could not effectively increase the replica repair rate across the cluster. In large-scale clusters in particular, the overall replica repair rate is bounded by the FE TabletScheduler and does not increase as BE nodes are added. Therefore, we implement the following improvements:
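Why adding BE nodes or raising the batch and slot settings does not help can be made concrete with a simple throughput bound. This is a rough model under assumptions, not a measurement from this PR: a single scheduler thread, a per-tablet scheduling cost t_s, a synchronous edit log latency t_log paid once per scheduled tablet, and BE i able to run c_i clone tasks per second. The numbers in the last line are hypothetical.

```latex
% Synchronous edit log: every scheduled tablet pays t_s + t_log on the scheduler thread.
R_{\text{sync}} \le \frac{1}{t_s + t_{\text{log}}}, \qquad
R_{\text{repair}} = \min\!\Big(R_{\text{sync}},\ \sum_{i=1}^{N_{\text{BE}}} c_i\Big)

% Asynchronous edit log: t_log leaves the scheduler's critical path.
R_{\text{async}} \le \frac{1}{t_s}, \qquad
R_{\text{repair}} = \min\!\Big(R_{\text{async}},\ \sum_{i=1}^{N_{\text{BE}}} c_i\Big)

% Hypothetical values: t_s = 0.1 ms, t_log = 5 ms.
R_{\text{sync}} \le \frac{1}{5.1\,\text{ms}} \approx 196\ \text{tablets/s}, \qquad
R_{\text{async}} \le \frac{1}{0.1\,\text{ms}} = 10{,}000\ \text{tablets/s}
```

Neither scheduler bound contains the batch size or slot counts, and once the BE capacity sum exceeds R_sync, adding BEs no longer changes R_repair. This matches the behavior described above.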
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)