
test: add benchmarks with latency PoC #2751

Open
gilcu3 wants to merge 10 commits into main from
2243-investigate-the-possibility-of-simulating-multiparty-protocols-with-a-single-thread

Conversation

@gilcu3 (Contributor) commented Apr 8, 2026

Closes #2243

Some initial results:

DKG
❯ cargo bench -p threshold-signatures --bench simulate_dkg
    Finished `bench` profile [optimized] target(s) in 0.16s
     Running benches/simulate_dkg.rs (target/release/deps/simulate_dkg-5b4dacfa91efffed)
Protocol simulation: DKG
Participants: 7, threshold: 4, latency: 50ms, samples: 15

Warming up for 3s... done

=== secp256k1 ===
Total messages: 2856 sent, 2548 received
Avg messages/participant: 408 sent, 364 received
Total bytes: 524085 sent, 492977 received
Avg bytes/participant: 74869 sent, 70425 received
Virtual time: avg 553.912 ms, min 553.886 ms, max 553.955 ms
=== ed25519 ===
Total messages: 2856 sent, 2548 received
Avg messages/participant: 408 sent, 364 received
Total bytes: 522202 sent, 489246 received
Avg bytes/participant: 74600 sent, 69892 received
Virtual time: avg 556.938 ms, min 556.901 ms, max 557.093 ms
=== bls12381 ===
Total messages: 2856 sent, 2548 received
Avg messages/participant: 408 sent, 364 received
Total bytes: 963206 sent, 932406 received
Avg bytes/participant: 137601 sent, 133201 received
Virtual time: avg 566.310 ms, min 566.257 ms, max 566.574 ms
ECDSA
❯ cargo bench -p threshold-signatures --bench simulate_ecdsa
   Compiling threshold-signatures v0.1.0 (/home/rey/opt/near/mpc/.worktrees/simulate/crates/threshold-signatures)
    Finished `bench` profile [optimized] target(s) in 2.29s
     Running benches/simulate_ecdsa.rs (target/release/deps/simulate_ecdsa-1a53bf551ad0759c)
Protocol simulation: ECDSA (Cait-Sith vs DamgardEtAl)
Participants: 7, threshold: 4, latency: 50ms, samples: 15

Setting up (keygen)... done
Warming up for 3s... done

=== Cait-Sith: triples ===
Total messages: 6006 sent, 6006 received
Avg messages/participant: 858 sent, 858 received
Total bytes: 4991353 sent, 4991353 received
Avg bytes/participant: 713050 sent, 713050 received
Virtual time: avg 585.830 ms, min 585.718 ms, max 585.961 ms
=== Cait-Sith: presign ===
Total messages: 84 sent, 84 received
Avg messages/participant: 12 sent, 12 received
Total bytes: 9822 sent, 9822 received
Avg bytes/participant: 1403 sent, 1403 received
Virtual time: avg 100.148 ms, min 100.146 ms, max 100.152 ms
=== Cait-Sith: sign ===
Total messages: 6 sent, 6 received
Avg messages/participant: 1 sent, 1 received
Total bytes: 556 sent, 556 received
Avg bytes/participant: 79 sent, 79 received
Virtual time: avg 50.076 ms, min 50.074 ms, max 50.079 ms
=== DamgardEtAl: presign ===
Total messages: 126 sent, 126 received
Avg messages/participant: 18 sent, 18 received
Total bytes: 22221 sent, 22221 received
Avg bytes/participant: 3174 sent, 3174 received
Virtual time: avg 151.332 ms, min 151.324 ms, max 151.404 ms
=== DamgardEtAl: sign ===
Total messages: 6 sent, 6 received
Avg messages/participant: 1 sent, 1 received
Total bytes: 550 sent, 550 received
Avg bytes/participant: 79 sent, 79 received
Virtual time: avg 50.076 ms, min 50.075 ms, max 50.082 ms
FROST
❯ cargo bench -p threshold-signatures --bench simulate_frost
    Finished `bench` profile [optimized] target(s) in 0.16s
     Running benches/simulate_frost.rs (target/release/deps/simulate_frost-1e88c8a61a5f8235)
Protocol simulation: EdDSA FROST signing
Participants: 7, threshold: 4, latency: 50ms, samples: 15

Setting up (keygen)... done
Warming up for 3s... done

=== frost_v1 ===
Total messages: 18 sent, 18 received
Avg messages/participant: 3 sent, 3 received
Total bytes: 8346 sent, 8346 received
Avg bytes/participant: 1192 sent, 1192 received
Virtual time: avg 151.246 ms, min 151.242 ms, max 151.249 ms
=== frost_v2: presign ===
Total messages: 42 sent, 42 received
Avg messages/participant: 6 sent, 6 received
Total bytes: 6444 sent, 6444 received
Avg bytes/participant: 921 sent, 921 received
Virtual time: avg 50.360 ms, min 50.358 ms, max 50.363 ms
=== frost_v2: sign ===
Total messages: 6 sent, 6 received
Avg messages/participant: 1 sent, 1 received
Total bytes: 612 sent, 612 received
Avg bytes/participant: 87 sent, 87 received
Virtual time: avg 50.327 ms, min 50.325 ms, max 50.330 ms

@gilcu3 (Contributor, Author):
Added a small tweak in this file to allow reusing the function ed25519_prepare_presign.

@gilcu3 (Contributor, Author):
For simplicity I am just comparing with the same total number of participants; I don't mind modifying this to cover more special cases, either here or in a follow-up.

@gilcu3 (Contributor, Author):
The core of the PoC is this file, especially the functions run_simulation and drain_poke.

@gilcu3 gilcu3 marked this pull request as ready for review April 8, 2026 10:53
claude bot commented Apr 8, 2026

PR title type suggestion: This PR adds benchmarking infrastructure and test utilities, which are development tools rather than user-facing functionality. Consider using test: instead of feat:

Suggested title: test: add benchmarks with latency PoC

claude bot commented Apr 8, 2026

Code Review: feat: add benchmarks with latency PoC

No critical issues found. The simulation core (run_simulation / drain_poke) implements a correct discrete-event simulation with:

  • Min-heap priority queue for message ordering
  • Per-participant virtual clocks combining wall-clock computation time + configurable network latency
  • Determinism assertions across repeated samples
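The event-loop structure described above can be sketched as follows. This is a minimal, hypothetical illustration of a single-threaded discrete-event simulation (a min-heap ordering message deliveries by virtual time, with per-participant virtual clocks), not the PR's actual run_simulation/drain_poke code; the names `Event` and `simulate` and all fields are invented for the sketch, and real per-message compute costs are omitted.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Ordering key is the virtual delivery time, so deriving Ord with
// `deliver_at_ms` as the first field gives the ordering we need.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Event {
    deliver_at_ms: u64, // virtual time at which the message arrives
    from: usize,
    to: usize,
}

/// Run `rounds` of all-to-all messaging among `n` participants with a fixed
/// per-hop `latency_ms`, returning each participant's final virtual clock.
fn simulate(n: usize, rounds: usize, latency_ms: u64) -> Vec<u64> {
    let mut clocks = vec![0u64; n];
    // Wrap events in Reverse to turn std's max-heap into a min-heap.
    let mut queue: BinaryHeap<Reverse<Event>> = BinaryHeap::new();
    for _ in 0..rounds {
        // Every participant sends to every other participant at its
        // current virtual time; the message lands `latency_ms` later.
        for from in 0..n {
            for to in (0..n).filter(|&t| t != from) {
                queue.push(Reverse(Event {
                    deliver_at_ms: clocks[from] + latency_ms,
                    from,
                    to,
                }));
            }
        }
        // Drain this round in delivery order; a recipient's clock never
        // moves backwards, so it becomes max(own clock, delivery time).
        while let Some(Reverse(ev)) = queue.pop() {
            clocks[ev.to] = clocks[ev.to].max(ev.deliver_at_ms);
        }
    }
    clocks
}

fn main() {
    // Two all-to-all rounds at 50 ms latency: every clock ends at 100 ms.
    let clocks = simulate(7, 2, 50);
    assert!(clocks.iter().all(|&c| c == 100));
    println!("{clocks:?}");
}
```

Because all delays are virtual, a run like the DKG above can report ~550 ms of simulated latency while completing in far less wall-clock time, which is the point of the single-threaded design.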

A few minor observations (non-blocking):

  • Both run_protocol and run_simulation sort outputs by participant, so all zip pairings across keygen/triple/presign results are correctly aligned.
  • BenchConfig::from_env silently falls back to defaults on parse errors — fine for bench tooling, but a warn! or stderr note could save debugging time if someone passes an invalid env var.
  • sorted_triples.sort_by_key(...) in ot_run_presign is redundant since run_simulation already returns sorted output — no harm, just unnecessary.

✅ Approved — bench-only code, no production changes, simulation logic is sound.
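The `BenchConfig::from_env` observation above could be addressed along these lines: warn on a present-but-unparsable value, stay silent on an unset one. This is a hypothetical sketch — `env_or_warn` and `parse_or_warn` are invented helper names, not the PR's actual API.

```rust
use std::env;

// Parse a raw value, warning to stderr when it is present but invalid.
fn parse_or_warn(name: &str, raw: Option<&str>, default: u64) -> u64 {
    match raw {
        None => default, // unset: a silent fallback is fine
        Some(s) => s.parse().unwrap_or_else(|_| {
            eprintln!("warning: {name}={s:?} is not a valid u64, using default {default}");
            default
        }),
    }
}

/// Read a numeric bench parameter from the environment.
fn env_or_warn(name: &str, default: u64) -> u64 {
    parse_or_warn(name, env::var(name).ok().as_deref(), default)
}

fn main() {
    // NUM_PARTICIPANTS is an env var the benches already read.
    let n = env_or_warn("NUM_PARTICIPANTS", 7);
    assert!(n >= 1);
    // An invalid value falls back to the default, but leaves a trace.
    assert_eq!(parse_or_warn("THRESHOLD", Some("four"), 4), 4);
    println!("participants = {n}");
}
```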

claude bot commented Apr 8, 2026

PR title type suggestion: This PR adds benchmarks and test utilities, which is development infrastructure rather than user-facing functionality. The type prefix should probably be test: instead of feat:.

Suggested title: test: add benchmarks with latency PoC

claude bot commented Apr 8, 2026

PR title type suggestion: This PR adds benchmarks and test infrastructure only, so the type prefix should probably be test: instead of feat:.
Suggested title: test: add benchmarks with latency PoC

@gilcu3 gilcu3 changed the title from "feat: add benchmarks with latency PoC" to "test: add benchmarks with latency PoC" Apr 8, 2026
pub total_messages_received: u64,
pub bytes_sent_per_participant: HashMap<Participant, u64>,
pub bytes_received_per_participant: HashMap<Participant, u64>,
pub wall_clock_elapsed: Duration,
@gilcu3 (Contributor, Author) commented Apr 8, 2026:

This one (wall_clock_elapsed) is not displayed now, so I am fine removing it. I initially thought the benches would take a long time, but it turns out they are relatively fast.

SimonRastikian previously approved these changes Apr 8, 2026

@SimonRastikian (Contributor) left a comment:

Please add comments to these functions to improve readability.

@netrome (Collaborator) left a comment:

Really nice stuff. Interestingly, this becomes very slow for me when I increase the participant count (tried with 20 participants and a threshold of 15).

Haven't gone deep into the code yet, but while playing with the benches I've observed this bug:

NUM_PARTICIPANTS=20 THRESHOLD=15 LATENCY_MS=0 cargo bench -p threshold-signatures --bench simulate_dkg
    Finished `bench` profile [optimized] target(s) in 0.17s
     Running benches/simulate_dkg.rs (target/release/deps/simulate_dkg-5b4dacfa91efffed)
Protocol simulation: DKG
Participants: 20, threshold: 15, latency: 0ms, samples: 15

Warming up for 3s... done


thread 'main' panicked at crates/threshold-signatures/src/test_utils/simulator_bench.rs:93:9:
assertion `left == right` failed: secp256k1: total_messages_received changed
  left: 68202
 right: 67731
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I saw this with latency a couple of times, but it happened consistently when running without latency, suggesting something is wrong in the current implementation.

netrome commented Apr 8, 2026

For the slowness, Claude suspects the echo broadcast is the culprit:

There it is. do_broadcast is a Bracha reliable echo broadcast — not a simple send+receive. Each broadcast has three phases:
  Send → Echo → Ready, each involving all-to-all messaging. Let me count the messages.

  For a single do_broadcast with n participants:
  - Send phase: n broadcasts × (n-1) recipients = n(n-1) messages
  - Echo phase: Each participant echoes for each of n sessions → n × n × (n-1) messages
  - Ready phase: Same pattern → n × n × (n-1) messages

  That's O(n³) messages per broadcast round, not O(n²).

  ┌───────────────────────────┬──────┬─────────┬───────┐
  │                           │ n=7  │  n=20   │ Ratio │
  ├───────────────────────────┼──────┼─────────┼───────┤
  │ Messages per do_broadcast │ ~630 │ ~15,580 │ ~25×  │
  └───────────────────────────┴──────┴─────────┴───────┘

  Now let me check how many do_broadcast calls the DKG makes.


● Three do_broadcast calls in DKG. So for n=20: ~46,740 total messages just from broadcasts, vs ~1,890 for n=7. Plus the
  private share messages in round 5.

  Each of those ~47K messages goes through the full simulator cycle: priority queue pop → message() → drain_poke() → poke() →
  poll future → process → possibly enqueue more messages. Even at ~10μs per cycle, that's ~0.5s per sample. With 15 samples ×
  3 ciphersuites + warmup, you're looking at 25+ seconds for n=20 vs ~4 seconds for n=7.

  The echo broadcast's O(n³) message complexity is the culprit. Going from n=7 to n=20 produces ~25× more messages flowing
  through the simulator's event loop. The "no high CPU" observation makes sense because it's single-threaded on a multi-core
  machine, and each individual operation is lightweight (mutex locks, small allocations, heap operations) — it's just the
  sheer volume of them.

  The existing no-latency benchmarks are fast because they presumably use run_protocol which runs all participants
  concurrently with actual async tasks, so the message processing parallelizes across threads. Your simulator runs everything
  sequentially on one thread, so the O(n³) message count translates directly to wall-clock time.

This would explain why our DKG gets stuck forever when we run with higher participant counts.
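The per-broadcast count quoted above (Send: n(n-1), Echo and Ready: n·n·(n-1) each, i.e. n(n-1)(2n+1) total) can be sanity-checked with a few lines; `messages_per_broadcast` is just an illustrative helper, not code from the PR.

```rust
// Total messages for one Bracha-style echo broadcast among n participants:
// Send phase n(n-1), plus Echo and Ready phases of n*n*(n-1) each.
fn messages_per_broadcast(n: u64) -> u64 {
    n * (n - 1) * (2 * n + 1)
}

fn main() {
    // These reproduce the table above: ~630 for n=7, ~15,580 for n=20.
    assert_eq!(messages_per_broadcast(7), 630);
    assert_eq!(messages_per_broadcast(20), 15_580);
    // The growth from n=7 to n=20 is roughly 25x, matching the "Ratio" column.
    let ratio = messages_per_broadcast(20) as f64 / messages_per_broadcast(7) as f64;
    println!("ratio ~ {ratio:.1}");
}
```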

gilcu3 commented Apr 8, 2026

For the slowness, Claude suspects the echo broadcast is the culprit:

fixed in abb8e36

For reference, it does not get stuck; here are values for 40 participants:

Details
❯ NUM_PARTICIPANTS=40 cargo bench -p threshold-signatures --bench simulate_dkg
   Compiling threshold-signatures v0.1.0 (/home/rey/opt/near/mpc/.worktrees/simulate/crates/threshold-signatures)
    Finished `bench` profile [optimized] target(s) in 2.89s
     Running benches/simulate_dkg.rs (target/release/deps/simulate_dkg-5b4dacfa91efffed)
Protocol simulation: DKG
Participants: 40, threshold: 21, latency: 50ms, samples: 15

Warming up for 3s... done

=== secp256k1 ===
Total messages: 569400 sent, 506285 received
Avg messages/participant: 14235 sent, 12657 received
Total bytes: 273006256 sent, 266631641 received
Avg bytes/participant: 6825156 sent, 6665791 received
Virtual time: avg 1189.136 ms, min 1188.575 ms, max 1190.168 ms
=== ed25519 ===
Total messages: 569400 sent, 506324 received
Avg messages/participant: 14235 sent, 12658 received
Total bytes: 267323475 sent, 261268179 received
Avg bytes/participant: 6683087 sent, 6531704 received
Virtual time: avg 2313.319 ms, min 2313.034 ms, max 2313.651 ms
=== bls12381 ===
Total messages: 569400 sent, 506324 received
Avg messages/participant: 14235 sent, 12658 received
Total bytes: 667793025 sent, 661548501 received
Avg bytes/participant: 16694826 sent, 16538713 received
Virtual time: avg 4246.255 ms, min 4237.424 ms, max 4265.869 ms

@gilcu3 gilcu3 requested a review from netrome April 8, 2026 17:55
netrome previously approved these changes Apr 8, 2026

@netrome (Collaborator) left a comment:

A bit annoyed by all the trait objects. I haven't gone deep into the code, but since these are standalone benchmarks that seem to work well, I'm happy to do a semi-blind approval to get them in.

gilcu3 added 2 commits April 9, 2026 10:12
claude bot commented Apr 9, 2026

PR title type suggestion: This PR modifies source files like src/protocol.rs along with adding benchmarks. Since the primary intent appears to be performance measurement (latency PoC), the type prefix should probably be perf: instead of test:.

Suggested title: perf: add latency PoC benchmarks

@SimonRastikian (Contributor) left a comment:

Why is this different from the syntax of the other legacy benchmarks, namely in main? Could you make main a one-liner and move the rest into a proper function?
Also, I see quite some repetition in the printed configs; this could benefit from a helper function in bench_utils.
I also suggest unifying these with the previous legacy implementations, which deduce num_participants from maxmalicious. Alternatively, you can modify the previous implementations to accept an arbitrary num_participants. The important thing is that running the benches is uniform across the schemes.

Final nit: please mind the order of the functions, from most important to least, i.e. starting with the functions that use criterion and going down to the least important.


gilcu3 commented Apr 9, 2026

Why is this different than the syntax from other legacy benchmarks namely in main? Could you make the main a one liner and add the rest in a proper function? Also I see quite some repetition in the printed configs, this could benefit from a helper function in bench_utils. I also suggest the uniformization of the implementations (in the previous legacy examples) that deduce num_participants from maxmalicious. You can otherwise modify the previous implementations to accept arbitrary num_participants. The important being that running the benches is quite uniform across the schemes.

Final nit, please care the order of functions from most important to least i.e. starting with the functions that use criterion and down to least important.

I cannot make full sense of this, as I am not using criterion. Please point out where the requested changes need to happen.

About making the benchmarks uniform: I can do that in a follow-up, just to avoid making this too bloated. I agree that taking num_participants and threshold explicitly is better.

@gilcu3 gilcu3 requested review from SimonRastikian and netrome April 9, 2026 10:21
netrome previously approved these changes Apr 9, 2026

@netrome (Collaborator) left a comment:

Thanks for the update, some new questions.

@netrome (Collaborator) left a comment:

Thanks!



Development

Successfully merging this pull request may close these issues.

Investigate the possibility of simulating multiparty protocols with a single thread

3 participants