
ecmult_multi: Replace scratch space with malloc, use abcd cost model #1789

Open
fjahr wants to merge 2 commits into bitcoin-core:master from fjahr:2025-11-mem-multi-var

Conversation

@fjahr
Contributor

@fjahr fjahr commented Dec 15, 2025

This is an implementation of the discussed changes from an in-person meeting in October. It removes usage of scratch space in batch validation and replaces it with internal malloc usage. It also adds an ABCD cost model for algorithm selection.
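For orientation, the ABCD model described above works roughly like this: each algorithm gets a per-point memory cost A, a fixed memory cost B, a per-point time cost C, and a fixed time cost D. An algorithm is feasible when n*A + B <= mem_limit, and among feasible algorithms the one minimizing the estimated total cost C*n + D wins, with the trivial algorithm (A = B = 0) as the always-feasible fallback. Below is a self-contained sketch of that selection rule; the struct, table values, and function names are illustrative stand-ins, not the PR's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative ABCD entry: memory = a*n + b bytes, time = c*n + d units.
 * The names and numbers here are made up for this sketch. */
typedef struct {
    size_t a; /* memory per point */
    size_t b; /* fixed memory */
    size_t c; /* time per point */
    size_t d; /* fixed time */
} abcd_t;

static const abcd_t table[] = {
    {0,    0,    1000, 0},   /* trivial: no memory, slow per point */
    {1414, 0,    113,  143}, /* strauss-like */
    {760,  2048, 199,  303}, /* pippenger-w1-like */
};

/* Pick the cheapest algorithm whose memory fits in mem_limit. */
static size_t select_algo(size_t mem_limit, size_t n) {
    size_t best = 0; /* index 0 needs no memory, so it always fits */
    size_t i;
    for (i = 1; i < sizeof(table) / sizeof(table[0]); i++) {
        if (n * table[i].a + table[i].b > mem_limit) continue; /* infeasible */
        if (table[i].c * n + table[i].d < table[best].c * n + table[best].d) {
            best = i;
        }
    }
    return best;
}
```

With mem_limit = 0 every non-trivial entry is infeasible and the trivial algorithm is returned, which is the fallback behavior discussed later in this thread.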

The API and internals follow the drafted spec from the meeting very closely: https://gist.github.com/fjahr/c2a009487dffe7a1fbf17ca1821976ca. There are a few minor changes that should not affect the intended behavior. Test coverage is currently a bit lower than it was previously; I expect an adapted form of the test_ecmult_multi test should be added back, and there are some TODOs left in the test code which I plan to address after a first round of conceptual feedback.

The second commit demonstrates the calibration tooling I have been using, though it's not exactly what produced the results in this PR. I have been struggling with the calibration code and never quite get a result that works without manual tweaking, so the calibration itself, as well as the code added for it, is a work in progress. I assume some version of the calibration code should be added to the repo, but I haven't thought much about the best place for it. Putting it into the benchmark and combining it with the python script was simply a convenient way to experiment. I am very open to suggestions on how to change this.

@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from 68dd612 to 8b62524 Compare December 16, 2025 21:21
return 1;
}

int secp256k1_musig_pubkey_agg(const secp256k1_context* ctx, secp256k1_xonly_pubkey *agg_pk, secp256k1_musig_keyagg_cache *keyagg_cache, const secp256k1_pubkey * const* pubkeys, size_t n_pubkeys) {
Contributor


It might be better to pass mem_limit as an input argument rather than defining it internally

Contributor Author


Hm, I see the point that this would align with the batch validation API that has been discussed, but I am not sure that justifies changing the existing musig API. It would probably be better to do this in a separate/follow-up PR, but I will wait a bit to see what other reviewers think.

Comment thread src/modules/musig/keyagg_impl.h Outdated
Comment thread src/tests_exhaustive.c Outdated
Comment thread src/bench_ecmult.c Outdated
Comment thread src/ecmult_impl.h
Comment thread src/ecmult_impl.h Outdated
Comment thread src/ecmult_impl.h Outdated
Comment thread src/ecmult_impl.h Outdated
Comment thread src/bench_ecmult.c Outdated
* values from the output.
*/
static void run_ecmult_multi_calib(bench_data* data) {
static const size_t batch_sizes[] = {10, 20, 30, 50, 75, 100, 150, 200, 300, 500, 750, 1000, 1500, 2000, 3000, 5000, 7500, 10000, 15000, 20000, 30000};
Contributor


I think it would be nice to have more batch sizes in the crossover region of Strauss and Pippenger (~80-150 points). So, something like this:

    static const size_t batch_sizes[] = {
        /* Small (Strauss region) */
        5, 10, 15, 20, 30, 50, 70,
        /* Crossover region */
        85, 88, 90, 100, 120, 150, 175,
        /* Medium (Pippenger small windows, w=6..8) */
        200, 300, 500, 750, 1000, 1200,
        /* Large (Pippenger large windows, w=9..12) */
        1500, 2000, 3000, 5000, 7500,
        10000, 15000, 20000, 30000
    };

Contributor

@siv2r siv2r Dec 27, 2025


We should probably cap the batch size for Strauss calibration (7,500 or less?). At higher sizes, the benchmark results curve and become outliers, which will skew the linear regression model. Since ecmult_multi_select won't choose Strauss for large batches anyway, we can avoid these sizes. I've attached a visualization below.

[Figure: STRAUSS benchmark plot]

Contributor


[Figure: STRAUSS benchmark plot]

The Strauss fit improved. More info: #1789 (comment)

@siv2r
Contributor

siv2r commented Dec 27, 2025

I ran the calibration tool, and I'm getting the following results. They're very close to the existing results, up until PIPPENGER_8, after which the C value increases for me (this shouldn't happen, right?). I'll run this a few more times.

My ABCD values
static const struct secp256k1_ecmult_multi_abcd secp256k1_ecmult_multi_abcds[SECP256K1_ECMULT_MULTI_NUM_ALGOS] = {
    {0,                                     0,                                     1000,  0     },
    {SECP256K1_STRAUSS_POINT_SIZE,          0,                                     113,   143  },
    {SECP256K1_PIPPENGER_POINT_SIZE(1),  SECP256K1_PIPPENGER_FIXED_SIZE(1),  199,   303  },
    {SECP256K1_PIPPENGER_POINT_SIZE(2),  SECP256K1_PIPPENGER_FIXED_SIZE(2),  152,   460  },
    {SECP256K1_PIPPENGER_POINT_SIZE(3),  SECP256K1_PIPPENGER_FIXED_SIZE(3),  117,   782  },
    {SECP256K1_PIPPENGER_POINT_SIZE(4),  SECP256K1_PIPPENGER_FIXED_SIZE(4),  100,   1158 },
    {SECP256K1_PIPPENGER_POINT_SIZE(5),  SECP256K1_PIPPENGER_FIXED_SIZE(5),  86,    1837 },
    {SECP256K1_PIPPENGER_POINT_SIZE(6),  SECP256K1_PIPPENGER_FIXED_SIZE(6),  77,    3013 },
    {SECP256K1_PIPPENGER_POINT_SIZE(7),  SECP256K1_PIPPENGER_FIXED_SIZE(7),  72,    4845 },
    {SECP256K1_PIPPENGER_POINT_SIZE(8),  SECP256K1_PIPPENGER_FIXED_SIZE(8),  69,    8775 },
    {SECP256K1_PIPPENGER_POINT_SIZE(9),  SECP256K1_PIPPENGER_FIXED_SIZE(9),  73,    14373},
    {SECP256K1_PIPPENGER_POINT_SIZE(10), SECP256K1_PIPPENGER_FIXED_SIZE(10), 78,    26442},
    {SECP256K1_PIPPENGER_POINT_SIZE(11), SECP256K1_PIPPENGER_FIXED_SIZE(11), 80,    48783},
    {SECP256K1_PIPPENGER_POINT_SIZE(12), SECP256K1_PIPPENGER_FIXED_SIZE(12), 106,   88289},
};

I've visualized the benchmarks for all algorithms (see siv2r@3119386 for the diagrams). I'm planning to do a deeper dive into the regression model fit to ensure it aligns with these results, and will share any useful findings here.

@siv2r
Contributor

siv2r commented Dec 27, 2025

I did some analysis of the linear regression model for the CD values (see siv2r@b5985a6). I ran ./bench_ecmult calib twice: the first run to compute the linear regression model, and the second run to test it against fresh benchmark values.
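The methodology above amounts to an ordinary least-squares fit of time ≈ slope·n + intercept on the first run, then checking the fit against the second run. A minimal self-contained OLS sketch (a stand-in for the analysis scripts, not the actual tooling):

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least squares: fit y ~= slope*x + intercept and report R^2.
 * A stand-in for the calibration analysis; not the PR's actual code. */
static void ols_fit(const double *x, const double *y, size_t n,
                    double *slope, double *intercept, double *r2) {
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, syy = 0.0;
    double cov, varx, vary;
    size_t i;
    for (i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
        syy += y[i] * y[i];
    }
    cov = n * sxy - sx * sy;   /* n times the covariance */
    varx = n * sxx - sx * sx;  /* n times the variance of x */
    vary = n * syy - sy * sy;  /* n times the variance of y */
    *slope = cov / varx;
    *intercept = (sy - *slope * sx) / n;
    *r2 = (cov * cov) / (varx * vary); /* squared correlation */
}
```

Fitting the first run's (batch size, total time) samples gives the slope (per-point term) and intercept (fixed overhead); evaluating R² on the second run's samples is what catches overfitting to a single run's noise.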

ABCD Values
static const struct secp256k1_ecmult_multi_abcd secp256k1_ecmult_multi_abcds[SECP256K1_ECMULT_MULTI_NUM_ALGOS] = {
    {0,                                     0,                                     1000,  0     },
    {SECP256K1_STRAUSS_POINT_SIZE,          0,                                     112,   173  },
    {SECP256K1_PIPPENGER_POINT_SIZE(1),  SECP256K1_PIPPENGER_FIXED_SIZE(1),  197,   285  },
    {SECP256K1_PIPPENGER_POINT_SIZE(2),  SECP256K1_PIPPENGER_FIXED_SIZE(2),  152,   480  },
    {SECP256K1_PIPPENGER_POINT_SIZE(3),  SECP256K1_PIPPENGER_FIXED_SIZE(3),  117,   767  },
    {SECP256K1_PIPPENGER_POINT_SIZE(4),  SECP256K1_PIPPENGER_FIXED_SIZE(4),  100,   1167 },
    {SECP256K1_PIPPENGER_POINT_SIZE(5),  SECP256K1_PIPPENGER_FIXED_SIZE(5),  86,    1887 },
    {SECP256K1_PIPPENGER_POINT_SIZE(6),  SECP256K1_PIPPENGER_FIXED_SIZE(6),  78,    3023 },
    {SECP256K1_PIPPENGER_POINT_SIZE(7),  SECP256K1_PIPPENGER_FIXED_SIZE(7),  73,    4906 },
    {SECP256K1_PIPPENGER_POINT_SIZE(8),  SECP256K1_PIPPENGER_FIXED_SIZE(8),  70,    8889 },
    {SECP256K1_PIPPENGER_POINT_SIZE(9),  SECP256K1_PIPPENGER_FIXED_SIZE(9),  74,    14544},
    {SECP256K1_PIPPENGER_POINT_SIZE(10), SECP256K1_PIPPENGER_FIXED_SIZE(10), 79,    26764},
    {SECP256K1_PIPPENGER_POINT_SIZE(11), SECP256K1_PIPPENGER_FIXED_SIZE(11), 84,    49179},
    {SECP256K1_PIPPENGER_POINT_SIZE(12), SECP256K1_PIPPENGER_FIXED_SIZE(12), 106,   89518},
};
Statistical analysis

Metric Definitions

| Metric | Meaning | Good Value for Benchmarking |
| --- | --- | --- |
| R² | How well the linear model fits (0-1) | > 0.95 (strong linear relationship) |
| Std Error | Uncertainty in slope estimate | Low relative to slope |
| p-value | Probability slope = 0 by chance | < 0.05 (statistically significant) |
| Slope | Time increase per additional n (μs) | Positive, algorithm-dependent |
| Intercept | Fixed overhead time (μs) | Algorithm-dependent |

Per-Algorithm Metrics

| Algorithm | R² | Std Error | p-value | Slope | Intercept |
| --- | --- | --- | --- | --- | --- |
| STRAUSS | 0.983536 | 0.1660 | 1.29e-25 | 6.2956 | -1917.23 |
| PIPPENGER_1 | 0.999995 | 0.0039 | 6.29e-73 | 8.5860 | -133.20 |
| PIPPENGER_2 | 0.999992 | 0.0036 | 9.98e-71 | 6.7066 | -115.04 |
| PIPPENGER_3 | 0.999993 | 0.0026 | 8.15e-71 | 4.9746 | -34.11 |
| PIPPENGER_4 | 0.999996 | 0.0015 | 5.34e-75 | 4.1772 | -13.27 |
| PIPPENGER_5 | 0.999995 | 0.0015 | 1.16e-73 | 3.5473 | 56.56 |
| PIPPENGER_6 | 0.999991 | 0.0019 | 2.30e-69 | 3.1035 | 105.03 |
| PIPPENGER_7 | 0.999994 | 0.0013 | 4.69e-72 | 2.7083 | 219.62 |
| PIPPENGER_8 | 0.999989 | 0.0016 | 2.64e-68 | 2.4879 | 414.58 |
| PIPPENGER_9 | 0.999977 | 0.0021 | 3.76e-64 | 2.2872 | 719.46 |
| PIPPENGER_10 | 0.999953 | 0.0028 | 5.02e-60 | 2.1705 | 1287.92 |
| PIPPENGER_11 | 0.999847 | 0.0049 | 4.73e-53 | 2.0592 | 2321.53 |
| PIPPENGER_12 | 0.999583 | 0.0075 | 3.62e-47 | 1.9331 | 4153.43 |

The linear fit ($R^2$) and standard error for STRAUSS are relatively worse. We can improve this by using smaller batch sizes when calibrating Strauss. I also see negative intercepts (i.e., D values) for PIPPENGER_1..4; these should be positive. I think this could likewise be addressed by capping these algorithms to medium batch sizes during calibration. Otherwise, the calibrated values are excellent!

@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch 2 times, most recently from 72fd3e8 to c2c9bc2 Compare January 20, 2026 16:27
@fjahr
Contributor Author

fjahr commented Jan 20, 2026

Addressed the comments on the implementation code and rebased. Thanks a lot for the review @siv2r! I need a little more time to think about the calibration again, but will comment on that asap.

@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from c2c9bc2 to 4aa160c Compare January 25, 2026 16:48
@fjahr
Contributor Author

fjahr commented Jan 25, 2026

@siv2r Thanks again for the review! The hint about using narrower ranges for each of the algorithms in the calibration was awesome; I hadn't thought of this before, and it pretty much fixed all the problems I ran into with the calibration results. I took these suggestions with some smaller additional tweaks, specifically for Strauss, which still had quite a bit of variance. I now see pretty stable calibration results across multiple runs, and the measurements between the different benchmarks run more smoothly than in any of my previous iterations.

These improvements have given me a lot more confidence in these changes, so I'm taking this out of draft status too.

@fjahr fjahr marked this pull request as ready for review January 25, 2026 17:22
@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from 4aa160c to ef4754b Compare February 3, 2026 10:34
@fjahr
Contributor Author

fjahr commented Feb 3, 2026

Randomly found #638 when looking at older PRs and, while comparing the overlapping code, found a small bug in my pippenger size calculation.

Comment thread src/ecmult.h Outdated
@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from b2cc6d1 to 497e9e4 Compare March 24, 2026 10:34
@fjahr
Contributor Author

fjahr commented Mar 24, 2026

Addressed @siv2r's feedback and rebased; I had kind of lost track of that, sorry.

@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from 497e9e4 to 5050b5e Compare March 24, 2026 10:46
Comment thread src/ecmult_impl.h Outdated
Comment on lines +847 to +848
secp256k1_ecmult_multi_algo algo = secp256k1_ecmult_multi_select(mem_limit, n_points);
return secp256k1_ecmult_multi_internal(error_callback, algo, r, n_points, points, scalars, scalar_g);
Contributor


The old secp256k1_ecmult_multi_var had an internal batch loop: it split n points into ceil(n / max_points_per_batch) chunks based on scratch capacity, ran Strauss or Pippenger on each chunk, and accumulated results. The new secp256k1_ecmult_multi passes n_points directly as batch_size to secp256k1_ecmult_multi_select with no splitting. All points are processed in one shot.

This means when n_points * A + B > mem_limit for every algorithm, select falls back to TRIVIAL. On a 64-bit system, Strauss needs ~1,414 bytes/point and even the cheapest Pippenger (window 1) needs ~760 bytes/point. So for test_ecmult_multi_batching with n_points = 200:

| mem_limit | Strauss (needs 283 KB for 200 pts) | Pippenger w1 (needs 153 KB for 200 pts) | Selected |
| --- | --- | --- | --- |
| 1 KB | exceeds | exceeds | TRIVIAL |
| 4 KB | exceeds | exceeds | TRIVIAL |
| 16 KB | exceeds | exceeds | TRIVIAL |
| 64 KB | exceeds | exceeds | TRIVIAL |
| 256 KB | exceeds | fits | Pippenger |
| 1 MB+ | fits | fits | best fit |

The old test (176 points, scratch sized from 1 to 176) used TRIVIAL in only 2 of 178 calls (the two explicit edge cases); every loop iteration used Strauss or Pippenger via batching. The new test hits TRIVIAL for 4 of 7 memory limits. It still passes (TRIVIAL is correct), but it no longer exercises the efficient algorithms under constrained memory.

One fix: re-introduce the batch loop in secp256k1_ecmult_multi using the existing secp256k1_ecmult_multi_batch_size to compute chunk size, then call secp256k1_ecmult_multi_internal per chunk. This also means modules that want to manage their own batching (e.g., batch verification) could call _internal directly.

Alternatively, if callers are always expected to provide a sufficient mem_limit, the current design works. Since ecmult_multi isn't a public API, that's not unreasonable. But most callers probably won't know what "sufficient memory" means for a given n_points; they'd expect ecmult_multi to handle that for them (see also this discussion about exposing mem_limit to musig callers). I'm slightly leaning toward letting ecmult_multi do the batching like the old code, but will probably know more once caller requirements become clearer.
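For illustration, the re-introduced batch loop could look roughly like the following. The helper names echo _batch_size and _internal from the discussion, but the signatures, the per-point memory model, and the summing worker are simplified hypothetical stand-ins (a real chunk accumulates group elements, not longs):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a _batch_size helper: the largest chunk
 * whose memory (per_point bytes each) fits within mem_limit. */
static size_t batch_size_for(size_t mem_limit, size_t per_point) {
    size_t batch = per_point ? mem_limit / per_point : mem_limit;
    return batch ? batch : 1; /* always make progress */
}

/* Hypothetical stand-in for the per-chunk _internal call; summing
 * stands in for accumulating the chunk's multi-multiplication result. */
static long multi_internal(const long *scalars, size_t n) {
    long acc = 0;
    size_t i;
    for (i = 0; i < n; i++) acc += scalars[i];
    return acc;
}

/* The batch loop: split n_points into ceil(n/batch) chunks and
 * accumulate the per-chunk results, as the old code did. */
static long multi_batched(const long *scalars, size_t n_points,
                          size_t mem_limit, size_t per_point) {
    size_t batch = batch_size_for(mem_limit, per_point);
    size_t done = 0;
    long acc = 0;
    while (done < n_points) {
        size_t chunk = n_points - done < batch ? n_points - done : batch;
        acc += multi_internal(scalars + done, chunk);
        done += chunk;
    }
    return acc;
}
```

The key property is that the batched result equals the one-shot result regardless of mem_limit; only the speed changes.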

Contributor

@siv2r siv2r Mar 26, 2026


One downside of having ecmult_multi do internal batching (like the old code) is that callers lose visibility into which algorithm is actually being used. In particular, it's hard to detect a silent fallback to TRIVIAL. The current design avoids this since callers can call _select beforehand to inspect the choice.

That said, I think this isn't a problem. If ecmult_multi handled batching internally, callers who just want the result would use it directly. Callers who want more control (inspect the algorithm, force a specific one, handle batching themselves) would use _select + _multi_internal instead.

Right now, ecmult_multi is basically a thin wrapper over _select + _multi_internal, so adding batching to it would give the two functions more distinct roles. What do you think?

Contributor Author


You are making a good argument and I agree; I oversimplified things by not keeping memory-constrained environments in mind here. I have introduced the batch loop. If possible it still one-shots, but otherwise it basically matches the behavior of secp256k1_ecmult_multi_var in master.

Comment thread src/ecmult_impl.h Outdated
Comment thread src/bench_ecmult.c Outdated
Comment thread src/bench_ecmult.c Outdated
Comment thread src/ecmult.h
Comment thread src/ecmult_impl.h
Comment thread src/tests.c
Contributor


The PR description acknowledges that test coverage is currently lower than before. Here's a list that might be helpful when adding tests back:

  • Test each algorithm individually via _multi_internal (TRIVIAL, STRAUSS, PIPPENGER 1-12), verifying results against secp256k1_ecmult for small inputs (n=0, 1, 2)
  • Structured edge case tests: all-infinity points, all-zero scalars, cancelling pairs (negated scalar with same point, same scalar with negated point), sum-to-zero across multiple points
  • Boundary tests for _batch_size: mem_limit=0 should return ECMULT_MAX_POINTS_PER_BATCH (TRIVIAL fallback), mem_limit=1 similarly. For larger values, verify the returned batch_size leads to a successful _multi call
  • Property tests for _select: with large mem_limit, _select(mem_limit, 10) should pick STRAUSS (fastest for tiny batches), _select(mem_limit, 100000) should pick a PIPPENGER variant (fastest for large batches), _select(0, n) should return TRIVIAL for any n.
    • I'm wondering if we can also define a range of points where each algorithm transitions to the next, and assert those transitions

Contributor Author


Thanks, yeah, I had planned to get back to this and add more tests. I think I was able to cover all your suggestions, but I skipped coverage for the exact transitions since they seemed brittle and we might still recalibrate. Open to revisiting this if other reviewers think it would be valuable once a few more eyes are on the current calibration values.

@siv2r
Contributor

siv2r commented Mar 26, 2026

I re-ran my old analysis on the new ABCD values. The results are very positive, and there's a clear improvement in the STRAUSS values; the linear fit is better now (0.983 -> 0.998).

ABCD Values
static const struct secp256k1_ecmult_multi_abcd secp256k1_ecmult_multi_abcds[SECP256K1_ECMULT_MULTI_NUM_ALGOS] = {
    {0,                                     0,                                     1000,  0     },
    {SECP256K1_STRAUSS_POINT_SIZE,          0,                                     103,   274  },
    {SECP256K1_PIPPENGER_POINT_SIZE(1),  SECP256K1_PIPPENGER_FIXED_SIZE(1),  195,   417  },
    {SECP256K1_PIPPENGER_POINT_SIZE(2),  SECP256K1_PIPPENGER_FIXED_SIZE(2),  147,   602  },
    {SECP256K1_PIPPENGER_POINT_SIZE(3),  SECP256K1_PIPPENGER_FIXED_SIZE(3),  117,   888  },
    {SECP256K1_PIPPENGER_POINT_SIZE(4),  SECP256K1_PIPPENGER_FIXED_SIZE(4),  100,   1390 },
    {SECP256K1_PIPPENGER_POINT_SIZE(5),  SECP256K1_PIPPENGER_FIXED_SIZE(5),  86,    2227 },
    {SECP256K1_PIPPENGER_POINT_SIZE(6),  SECP256K1_PIPPENGER_FIXED_SIZE(6),  76,    3755 },
    {SECP256K1_PIPPENGER_POINT_SIZE(7),  SECP256K1_PIPPENGER_FIXED_SIZE(7),  67,    6626 },
    {SECP256K1_PIPPENGER_POINT_SIZE(8),  SECP256K1_PIPPENGER_FIXED_SIZE(8),  62,    10976},
    {SECP256K1_PIPPENGER_POINT_SIZE(9),  SECP256K1_PIPPENGER_FIXED_SIZE(9),  57,    19909},
    {SECP256K1_PIPPENGER_POINT_SIZE(10), SECP256K1_PIPPENGER_FIXED_SIZE(10), 54,    36574},
    {SECP256K1_PIPPENGER_POINT_SIZE(11), SECP256K1_PIPPENGER_FIXED_SIZE(11), 50,    72630},
    {SECP256K1_PIPPENGER_POINT_SIZE(12), SECP256K1_PIPPENGER_FIXED_SIZE(12), 46,    135404},
};
Statistical analysis

Metric Definitions

| Metric | Meaning | Good Value for Benchmarking |
| --- | --- | --- |
| R² | How well the linear model fits (0-1) | > 0.95 (strong linear relationship) |
| Std Error | Uncertainty in slope estimate | Low relative to slope |
| p-value | Probability slope = 0 by chance | < 0.05 (statistically significant) |
| Slope | Time increase per additional n (μs) | Positive, algorithm-dependent |
| Intercept | Fixed overhead time (μs) | Algorithm-dependent |

Per-Algorithm Metrics

| Algorithm | R² | Std Error | p-value | Slope | Intercept |
| --- | --- | --- | --- | --- | --- |
| STRAUSS | 0.998256 | 0.0468 | 2.77e-26 | 4.5477 | -15.70 |
| PIPPENGER_1 | 0.999981 | 0.0110 | 6.54e-25 | 7.6313 | 16.23 |
| PIPPENGER_2 | 0.999932 | 0.0131 | 1.38e-30 | 5.7699 | 21.74 |
| PIPPENGER_3 | 0.999973 | 0.0066 | 1.98e-33 | 4.6724 | 27.37 |
| PIPPENGER_4 | 0.999987 | 0.0040 | 3.30e-33 | 3.9572 | 45.39 |
| PIPPENGER_5 | 0.999988 | 0.0033 | 9.93e-36 | 3.4059 | 80.97 |
| PIPPENGER_6 | 0.999979 | 0.0044 | 1.02e-24 | 2.9971 | 123.83 |
| PIPPENGER_7 | 0.999960 | 0.0057 | 4.08e-21 | 2.6388 | 220.59 |
| PIPPENGER_8 | 0.999989 | 0.0027 | 1.18e-23 | 2.4487 | 353.68 |
| PIPPENGER_9 | 0.999995 | 0.0019 | 8.61e-20 | 2.2403 | 700.71 |
| PIPPENGER_10 | 0.999992 | 0.0028 | 7.05e-14 | 2.1095 | 1444.03 |
| PIPPENGER_11 | 0.999914 | 0.0109 | 3.41e-07 | 1.9767 | 2804.01 |
| PIPPENGER_12 | 0.999997 | 0.0033 | 1.14e-03 | 1.8235 | 5310.41 |

Comment thread src/ecmult_impl.h
@siv2r
Contributor

siv2r commented Mar 30, 2026

I was curious whether we could squeeze out better CD values by calibrating each algorithm on its most optimal batch range, instead of the broader ranges we currently use. So I ran all 14 ecmult_multi algorithms across [2, 50000] and plotted the per-point times to figure out which algorithm actually wins at each batch size (see the two toggles below for the full picture).

[Figure: speedup vs individual multiplications for all 14 algorithms (speedup_vs_individual)]
[Figure: optimal algorithm for each batch size (winners_speedup)]

Optimal Algorithm Ranges

| Optimal Range | Best Algorithm | Per-Point Time (us) | Speedup vs Individual |
| --- | --- | --- | --- |
| [2, 90] | STRAUSS | 4.40 - 6.97 | 1.40x - 2.27x |
| [95, 130] | PIPPENGER_5 | 4.12 - 4.37 | 2.30x - 2.44x |
| [150, 250] | PIPPENGER_6 | 3.67 - 4.01 | 2.51x - 2.76x |
| [300, 750] | PIPPENGER_7 | 3.02 - 3.56 | 2.84x - 3.37x |
| [1000, 1500] | PIPPENGER_8 | 2.74 - 2.88 | 3.59x - 3.77x |
| [2000, 7500] | PIPPENGER_9 | 2.34 - 2.63 | 3.91x - 4.42x |
| [10000, 12500] | PIPPENGER_11 | 2.24 - 2.29 | 4.53x - 4.62x |
| [15000, 50000] | PIPPENGER_12 | 1.96 - 2.20 | 4.71x - 5.29x |

Key Findings

  • Never optimal: PIPPENGER_1, PIPPENGER_2, PIPPENGER_3, PIPPENGER_4, PIPPENGER_10 — dominated by neighbors at every batch size.
  • Crossover points: STRAUSS -> PIPPENGER_5 at n~92; PIPPENGER_5 -> PIPPENGER_6 at n=140; PIPPENGER_6 -> PIPPENGER_7 at n=275; PIPPENGER_7 -> PIPPENGER_8 at n=875; PIPPENGER_8 -> PIPPENGER_9 at n=1750; PIPPENGER_9 -> PIPPENGER_11 at n=8750; PIPPENGER_11 -> PIPPENGER_12 at n=13750.

Winner at each batch size

| N | Best Algorithm | Per-Point Time (us) | Speedup |
| --- | --- | --- | --- |
| 2 | STRAUSS | 6.97 | 1.40x |
| 3 | STRAUSS | 6.12 | 1.60x |
| 4 | STRAUSS | 5.75 | 1.71x |
| 5 | STRAUSS | 5.46 | 1.80x |
| 7 | STRAUSS | 5.15 | 1.93x |
| 10 | STRAUSS | 4.85 | 2.06x |
| 13 | STRAUSS | 4.79 | 2.09x |
| 16 | STRAUSS | 4.61 | 2.15x |
| 20 | STRAUSS | 4.58 | 2.16x |
| 25 | STRAUSS | 4.58 | 2.18x |
| 30 | STRAUSS | 4.46 | 2.24x |
| 35 | STRAUSS | 4.46 | 2.23x |
| 40 | STRAUSS | 4.48 | 2.23x |
| 45 | STRAUSS | 4.44 | 2.25x |
| 50 | STRAUSS | 4.48 | 2.24x |
| 55 | STRAUSS | 4.40 | 2.27x |
| 60 | STRAUSS | 4.42 | 2.26x |
| 65 | STRAUSS | 4.42 | 2.27x |
| 70 | STRAUSS | 4.42 | 2.27x |
| 75 | STRAUSS | 4.41 | 2.27x |
| 80 | STRAUSS | 4.46 | 2.26x |
| 85 | STRAUSS | 4.45 | 2.26x |
| 88 | PIPPENGER_5 | 4.45 | 2.26x |
| 90 | STRAUSS | 4.43 | 2.27x |
| 95 | PIPPENGER_5 | 4.37 | 2.30x |
| 100 | PIPPENGER_5 | 4.33 | 2.33x |
| 110 | PIPPENGER_5 | 4.25 | 2.37x |
| 120 | PIPPENGER_5 | 4.18 | 2.41x |
| 130 | PIPPENGER_5 | 4.12 | 2.44x |
| 150 | PIPPENGER_6 | 4.01 | 2.51x |
| 175 | PIPPENGER_6 | 3.86 | 2.60x |
| 200 | PIPPENGER_6 | 3.77 | 2.68x |
| 250 | PIPPENGER_6 | 3.67 | 2.76x |
| 300 | PIPPENGER_7 | 3.56 | 2.84x |
| 350 | PIPPENGER_7 | 3.46 | 2.93x |
| 400 | PIPPENGER_7 | 3.38 | 2.99x |
| 500 | PIPPENGER_7 | 3.25 | 3.11x |
| 600 | PIPPENGER_7 | 3.09 | 3.29x |
| 750 | PIPPENGER_7 | 3.02 | 3.37x |
| 1000 | PIPPENGER_8 | 2.88 | 3.59x |
| 1200 | PIPPENGER_8 | 2.81 | 3.69x |
| 1500 | PIPPENGER_8 | 2.74 | 3.77x |
| 2000 | PIPPENGER_9 | 2.63 | 3.91x |
| 2500 | PIPPENGER_9 | 2.56 | 4.05x |
| 3000 | PIPPENGER_9 | 2.51 | 4.12x |
| 4000 | PIPPENGER_9 | 2.46 | 4.20x |
| 5000 | PIPPENGER_9 | 2.42 | 4.28x |
| 7500 | PIPPENGER_10 | 2.34 | 4.42x |
| 10000 | PIPPENGER_11 | 2.29 | 4.53x |
| 12500 | PIPPENGER_11 | 2.24 | 4.62x |
| 15000 | PIPPENGER_12 | 2.20 | 4.71x |
| 20000 | PIPPENGER_12 | 2.12 | 4.90x |
| 25000 | PIPPENGER_12 | 2.07 | 5.01x |
| 30000 | PIPPENGER_12 | 2.03 | 5.09x |
| 40000 | PIPPENGER_12 | 2.00 | 5.20x |
| 50000 | PIPPENGER_12 | 1.96 | 5.29x |

A few things stood out. First, pippenger w=1 through w=4 are never optimal at any batch size — after strauss, ecmult_multi jumps straight to pippenger w=5. The pippenger CD values in general are already well calibrated, which makes sense since the linear fit for those is near perfect (R^2 > 0.999).

The interesting part was strauss. Strauss flatlines after about n=90. So when we calibrate using batches from [2, 500] as we currently do, we're fitting a line to data that stops being linear after 90. Including these flatline points pulls the regression intercept C upward (104 -> 109) and the slope D downward (184 -> 120) compared to what we'd get from fitting just the linear region [2, 90]. In other words, the non-linear tail inflates the C value.

Surprisingly, that inflation is actually helpful. The ecmult_multi_select picks the method that minimizes C + D/n. With the current values (C=109, D=120 for strauss, C=86, D=2187 for pippenger w=5), the crossover n works out to (2187 - 120) / (109 - 86) = 89.9. Which lines up almost perfectly with the empirical data showing pippenger w=5 becomes faster around n=92 (in the second graph above). Also, the old code switched to pippenger at n = 88.

When I tried to "fix" the linear fit by calibrating strauss only on [2, 90] where it's truly linear, I got a much better R^2 (0.9998 vs 0.983) but the crossover moved to n=110. That means ecmult_multi would keep picking strauss all the way through n=110, even though pippenger w=5 is clearly faster by n=95. Better statistical fit, worse algorithm selection.

TL;DR the current CD values are near perfect. The slightly "bad" linear fit for strauss isn't a problem. It is exactly what pushes the crossover to pippenger at the correct batch size (at n=89).
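The crossover arithmetic above generalizes to any pair of algorithms: with per-point cost C + D/n, an algorithm with smaller C overtakes one with smaller D at n = (D2 - D1) / (C1 - C2). A tiny sketch of that check, plugging in the Strauss and Pippenger-w5 values quoted in this comment:

```c
#include <assert.h>

/* Batch size at which algorithm 2 (per-point cost c2 + d2/n) becomes
 * cheaper than algorithm 1 (c1 + d1/n), assuming c1 > c2 and d2 > d1:
 * solve c1 + d1/n = c2 + d2/n  =>  n = (d2 - d1) / (c1 - c2). */
static double crossover_n(double c1, double d1, double c2, double d2) {
    return (d2 - d1) / (c1 - c2);
}
```

With the values from this comment (Strauss C=109, D=120; Pippenger w5 C=86, D=2187), crossover_n(109, 120, 86, 2187) evaluates to 2067/23 ≈ 89.87, matching the ~89.9 crossover quoted above.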

@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from 5050b5e to 081d492 Compare April 3, 2026 21:31
@fjahr fjahr force-pushed the 2025-11-mem-multi-var branch from 081d492 to 7048d27 Compare April 3, 2026 21:38
@fjahr
Contributor Author

fjahr commented Apr 3, 2026

Huge thanks for the helpful review comments @siv2r, I think I have addressed them all now.
