executor: Don't run background tasks if future is ready #2298

Open
Gelbpunkt wants to merge 1 commit into hermit-os:main from Gelbpunkt:dont-poll-bg-tasks-if-future-ready

@Gelbpunkt
Member

Oftentimes, the future is ready immediately, yet we still run the background tasks every time. This adds significant overhead that is noticeable when benchmarking an HTTP server.

With these changes, the blocking httpd example gains between 20% and 50% more throughput for me, while the netbench TCP bandwidth benchmark regresses only slightly, by about 5%. The httpd example ported to axum sees a throughput increase of 500-600%:

before:

Beginning round 1...
Benchmarking 128 connections @ http://127.0.0.1:9975 for 15 second(s)
  Latencies:
    Avg      Stdev    Min      Max
    6.27ms   83.92ms  1.25ms   11608.18ms
  Requests:
    Total: 212052  Req/Sec: 14143.12
  Transfer:
    Total: 43.26 MB Transfer Rate: 2.89 MB/Sec

after:

Beginning round 1...
Benchmarking 128 connections @ http://127.0.0.1:9975 for 15 second(s)
  Latencies:
    Avg      Stdev    Min      Max
    0.57ms   0.56ms   0.12ms   209.94ms
  Requests:
    Total: 1197609 Req/Sec: 79857.55
  Transfer:
    Total: 244.29 MB Transfer Rate: 16.29 MB/Sec
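
To put the two runs above side by side, the following small sketch computes the ratios from the reported numbers. The helper names `ratio` and `percent_increase` are made up for this illustration; only the input values come from the benchmark output, and they cover a single round.

```rust
/// Ratio of the "after" measurement to the "before" measurement.
fn ratio(before: f64, after: f64) -> f64 {
    after / before
}

/// Relative increase in percent, i.e. (after / before - 1) * 100.
fn percent_increase(before: f64, after: f64) -> f64 {
    (ratio(before, after) - 1.0) * 100.0
}

fn main() {
    // Req/Sec from the "before" and "after" runs above.
    let (req_before, req_after) = (14143.12, 79857.55);
    // Average latency in milliseconds from the same runs.
    let (lat_before, lat_after) = (6.27, 0.57);

    println!("throughput ratio: {:.2}x", ratio(req_before, req_after));
    println!(
        "throughput increase: {:.0}%",
        percent_increase(req_before, req_after)
    );
    // For latency, lower is better, so the ratio is before / after.
    println!("average latency ratio: {:.1}x lower", ratio(lat_after, lat_before));
}
```

For this round the request rate is roughly 5.6 times the baseline and the average latency is about one eleventh of it.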

This was suggested by @zyuiop in #2086 (comment).
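
As a rough illustration of the idea in this PR, the sketch below polls the future first and runs background work only when the poll returns `Poll::Pending`. It is not the hermit-os executor code: `Background`, `NoopWaker`, `YieldOnce`, and this `block_on` signature are hypothetical names made up for the example, and the loop busy-polls for simplicity.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for a busy-polling sketch.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the executor's background tasks.
/// It only counts how often it was asked to run.
struct Background {
    runs: usize,
}

impl Background {
    fn run(&mut self) {
        self.runs += 1;
    }
}

/// Polls `future` first; background tasks run only while it is pending.
fn block_on<F: Future>(future: F, background: &mut Background) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            // Fast path: no background work if the future is already done.
            return output;
        }
        background.run();
    }
}

/// Test future that is pending on the first poll and ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.0 {
            Poll::Ready(7)
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

fn main() {
    let mut background = Background { runs: 0 };

    let value = block_on(std::future::ready(42), &mut background);
    println!("ready future: value = {value}, background runs = {}", background.runs);

    let value = block_on(YieldOnce(false), &mut background);
    println!("pending-once future: value = {value}, background runs = {}", background.runs);
}
```

In this sketch an immediately ready future never triggers `Background::run`, which is the overhead the PR description aims to avoid; a future that is pending once triggers it exactly once.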

Co-authored-by: Louis Vialar <louis.vialar@gmail.com>
@mkroening mkroening self-assigned this Mar 5, 2026
@mkroening mkroening requested review from mkroening and stlankes March 5, 2026 15:24
@zyuiop
Contributor

zyuiop commented Mar 5, 2026

Thanks for bringing back this patch!

I'm concerned that the executor running far less often may cause other issues. Then again, the run call here is only invoked from block_on, so if calling run less often causes issues, it should probably be invoked from a timer interrupt or something similar.

@mkroening
Member

> I'm concerned that the executor running far less often may cause other issues. Then again, the run call here is only invoked from block_on, so if calling run less often causes issues, it should probably be invoked from a timer interrupt or something similar.

Yeah, I think this is happening in CI. Of course, one of the issues with our executor is that many things are actually blocking, which is bad. Making things properly non-blocking should make polling the executor here a non-issue. On the other hand, I agree that if something breaks due to this PR, that is probably a bug and should be resolved.
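
The timer-driven alternative raised in this thread could look roughly like the sketch below. Everything in it is an assumption made for illustration: `Executor`, `run_background`, `on_timer_tick`, and the `period` parameter are hypothetical and do not correspond to existing hermit-os APIs, and a real kernel would need to consider what is safe to do in interrupt context.

```rust
/// Hypothetical executor state for illustration only.
struct Executor {
    /// Number of timer ticks seen so far.
    ticks: u64,
    /// Number of times background tasks were run.
    background_runs: u64,
}

impl Executor {
    fn new() -> Self {
        Self { ticks: 0, background_runs: 0 }
    }

    /// Stand-in for polling the background tasks once.
    fn run_background(&mut self) {
        self.background_runs += 1;
    }

    /// Hypothetical hook called on every timer tick.
    /// Runs background tasks once every `period` ticks (at least every tick).
    fn on_timer_tick(&mut self, period: u64) {
        self.ticks += 1;
        if self.ticks % period.max(1) == 0 {
            self.run_background();
        }
    }
}

fn main() {
    let mut executor = Executor::new();
    // Simulate ten timer ticks with background work every fifth tick.
    for _ in 0..10 {
        executor.on_timer_tick(5);
    }
    println!(
        "ticks = {}, background runs = {}",
        executor.ticks, executor.background_runs
    );
}
```

With such a hook, background tasks would make progress even when block_on is rarely called or returns on its fast path.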

@github-actions github-actions bot left a comment

Benchmark Results

| Benchmark | Current: e67a10f | Previous: 7c500d8 | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | 90.87 s | 91.25 s | 1.00 |
| startup_benchmark File Size | 0.81 MB | 0.79 MB | 1.02 |
| Startup Time - 1 core | 0.96 s (±0.03 s) | 0.93 s (±0.03 s) | 1.03 |
| Startup Time - 2 cores | 0.97 s (±0.03 s) | 0.94 s (±0.03 s) | 1.03 |
| Startup Time - 4 cores | 0.99 s (±0.03 s) | 0.94 s (±0.02 s) | 1.05 |
| multithreaded_benchmark Build Time | 86.72 s | 88.17 s | 0.98 |
| multithreaded_benchmark File Size | 0.91 MB | 0.89 MB | 1.02 |
| Multithreaded Pi Efficiency - 2 Threads | 88.71 % (±7.78 %) | 90.13 % (±10.09 %) | 0.98 |
| Multithreaded Pi Efficiency - 4 Threads | 43.62 % (±3.09 %) | 45.06 % (±4.71 %) | 0.97 |
| Multithreaded Pi Efficiency - 8 Threads | 25.50 % (±1.63 %) | 25.74 % (±2.31 %) | 0.99 |
| micro_benchmarks Build Time | 94.77 s | 93.18 s | 1.02 |
| micro_benchmarks File Size | 0.92 MB | 0.90 MB | 1.02 |
| Scheduling time - 1 thread | 72.96 ticks (±3.77 ticks) | 71.27 ticks (±4.26 ticks) | 1.02 |
| Scheduling time - 2 threads | 40.96 ticks (±6.09 ticks) | 39.04 ticks (±5.20 ticks) | 1.05 |
| Micro - Time for syscall (getpid) | 3.09 ticks (±0.25 ticks) | 2.97 ticks (±0.30 ticks) | 1.04 |
| Memcpy speed - (built_in) block size 4096 | 65436.08 MByte/s (±47028.44 MByte/s) | 66371.85 MByte/s (±47142.10 MByte/s) | 0.99 |
| Memcpy speed - (built_in) block size 1048576 | 29657.45 MByte/s (±24588.04 MByte/s) | 29496.16 MByte/s (±24371.57 MByte/s) | 1.01 |
| Memcpy speed - (built_in) block size 16777216 | 27575.23 MByte/s (±23021.49 MByte/s) | 27947.10 MByte/s (±23282.80 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 4096 | 66053.38 MByte/s (±47416.08 MByte/s) | 66660.22 MByte/s (±47324.52 MByte/s) | 0.99 |
| Memset speed - (built_in) block size 1048576 | 30460.31 MByte/s (±25032.45 MByte/s) | 30239.65 MByte/s (±24809.14 MByte/s) | 1.01 |
| Memset speed - (built_in) block size 16777216 | 28336.00 MByte/s (±23452.91 MByte/s) | 28701.07 MByte/s (±23711.18 MByte/s) | 0.99 |
| Memcpy speed - (rust) block size 4096 | 56917.41 MByte/s (±42026.53 MByte/s) | 59143.47 MByte/s (±43480.35 MByte/s) | 0.96 |
| Memcpy speed - (rust) block size 1048576 | 29396.25 MByte/s (±24539.08 MByte/s) | 29331.43 MByte/s (±24300.99 MByte/s) | 1.00 |
| Memcpy speed - (rust) block size 16777216 | 27420.15 MByte/s (±22866.37 MByte/s) | 28046.81 MByte/s (±23386.04 MByte/s) | 0.98 |
| Memset speed - (rust) block size 4096 | 57564.84 MByte/s (±42408.57 MByte/s) | 59676.52 MByte/s (±43806.15 MByte/s) | 0.96 |
| Memset speed - (rust) block size 1048576 | 30204.88 MByte/s (±24996.22 MByte/s) | 30122.22 MByte/s (±24741.83 MByte/s) | 1.00 |
| Memset speed - (rust) block size 16777216 | 28032.05 MByte/s (±23163.63 MByte/s) | 28769.44 MByte/s (±23781.83 MByte/s) | 0.97 |
| alloc_benchmarks Build Time | 93.51 s | 92.51 s | 1.01 |
| alloc_benchmarks File Size | 0.88 MB | 0.86 MB | 1.02 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 11197.85 Ticks (±194.00 Ticks) | 10757.47 Ticks (±159.10 Ticks) | 1.04 |
| Allocations - Average Allocation time (no fail) | 11197.85 Ticks (±194.00 Ticks) | 10757.47 Ticks (±159.10 Ticks) | 1.04 |
| Allocations - Average Deallocation time | 1082.14 Ticks (±540.08 Ticks) | 1302.69 Ticks (±790.51 Ticks) | 0.83 |
| mutex_benchmark Build Time | 92.90 s | 93.08 s | 1.00 |
| mutex_benchmark File Size | 0.91 MB | 0.90 MB | 1.02 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 13.10 ns (±0.70 ns) | 13.32 ns (±0.71 ns) | 0.98 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 23.40 ns (±14.93 ns) | 18.58 ns (±6.34 ns) | 1.26 |

This comment was automatically generated by workflow using github-action-benchmark.
