Speed up packed varint decoding by validating the region up front by kkaefer · Pull Request #153 · mapbox/protozero

kkaefer · 2026-06-30T20:06:35Z

Decoding packed varint fields is now 30-150% faster, depending on varint length. The speedup comes from the new decode_varint_unchecked() / skip_varint_unchecked() helpers, which decode a varint without checking against an end pointer on every byte. A side effect is that our iterator size is now a single pointer, aligning it with the fixed iterators.

This is safe because a packed varint region is valid only if its last byte has the continuation bit clear, i.e. it ends exactly on a varint boundary. That guarantees a decode starting anywhere inside the region always terminates at or before the final byte, so no end pointer is needed. The packed varint iterators (const_varint_iterator, const_svarint_iterator) check this once in their constructor, and as a result no longer need to store the end pointer at all: they now hold a single pointer instead of two. That halves their size, but on its own it doesn't measurably change decoding speed (overlong varints are still detected via the byte-count limit).

Behavior change: a packed varint field whose trailing varint is truncated now throws end_of_buffer_exception when the iterator range is created (in get_packed_*()) instead of later during iteration. Arguably this doesn't matter because a truncated varint in a packed field is illegal anyway. The iterators' (data, end) constructor is also no longer noexcept.

Decoding packed varint fields is now 30-150% faster, depending on varint length. The speedup comes from the new `decode_varint_unchecked()` / `skip_varint_unchecked()` helpers, which decode a varint without checking against an end pointer on every byte. A side effect is that our iterator size is now a single pointer, aligning it with the fixed iterators. This is safe because a packed varint region is valid only if its last byte has the continuation bit clear, i.e. it ends exactly on a varint boundary. That guarantees a decode starting anywhere inside the region always terminates at or before the final byte, so no end pointer is needed. The packed varint iterators (`const_varint_iterator`, `const_svarint_iterator`) check this once in their constructor, and as a result no longer need to store the end pointer at all: they now hold a single pointer instead of two. That halves their size, but on its own it doesn't measurably change decoding speed (overlong varints are still detected via the byte-count limit). Behavior change: a packed varint field whose trailing varint is truncated now throws `end_of_buffer_exception` when the iterator range is created (in `get_packed_*()`) instead of later during iteration. Arguably this doesn't matter because a truncated varint in a packed field is illegal anyway. The iterators' `(data, end)` constructor is also no longer `noexcept`.

kkaefer · 2026-06-30T20:26:17Z

Here are more benchmark results:

Magnitude	Elements	master (ns)	This PR (ns)	Speedup
bytes1	10	13	8	1.65x
bytes1	100	131	70	1.87x
bytes1	1000	1258	634	1.98x
bytes1	10000	12544	6280	2.00x
bytes1	100000	125293	62685	2.00x
bytes2	10	23	8	2.74x
bytes2	100	168	85	1.98x
bytes2	1000	1577	789	2.00x
bytes2	10000	15659	7827	2.00x
bytes2	100000	157517	78690	2.00x
bytes3	10	35	12	3.06x
bytes3	100	270	115	2.34x
bytes3	1000	2594	1101	2.36x
bytes3	10000	25869	10934	2.37x
bytes3	100000	257822	109451	2.36x
bytes5	10	31	18	1.69x
bytes5	100	298	178	1.67x
bytes5	1000	3139	1730	1.81x
bytes5	10000	28505	17245	1.65x
bytes5	100000	306460	172060	1.78x
bytes10	10	45	37	1.21x
bytes10	100	456	350	1.30x
bytes10	1000	4457	3751	1.19x
bytes10	10000	44513	34380	1.29x
bytes10	100000	446080	343361	1.30x
mixed	10	44	33	1.32x
mixed	100	452	290	1.56x
mixed	1000	4057	2937	1.38x
mixed	10000	83866	42600	1.97x
mixed	100000	1091575	740620	1.47x
mixed_byte_length	10	42	29	1.48x
mixed_byte_length	100	409	303	1.35x
mixed_byte_length	1000	4161	3086	1.35x
mixed_byte_length	10000	84034	48212	1.74x
mixed_byte_length	100000	1270616	741718	1.71x

Legend:

bytesX: Varints that have exactly this many bytes.
mixed: Pick a random bit width in [0, 63], then a value of that width.
mixed_byte_length: Pick a byte length uniformly in [1, 10], then a value of that length.

The speedup varies based on the characteristics of the packed varint buffer, but we can see a clear performance increase in all cases, even if we only have a small handful of varints in that buffer, and also if those varints are really short (1-2 bytes).

kkaefer · 2026-06-30T20:50:30Z

I guess one side effect of this change is that if you accidentally call operator++ on an iterator that is at/beyond the end of the buffer it keeps incrementing, whereas previously, we would have thrown an execpetion. However, STL iterators do this as well.

kkaefer · 2026-07-01T12:17:16Z

+        // The decoders used by this iterator run without an explicit end
+        // pointer. This is only safe if the region ends exactly on a varint
+        // boundary, i.e. the last byte does not have its continuation bit set.
+        if (data != end && (static_cast<unsigned char>(*(end - 1)) & 0x80U) != 0) {


Should we do data < end here?

I think the data != end makes it clearer that this check is there to cover the case where the constructor is used to construct the "end iterator". With data < end it seems that we want to defend against the case where data > end which we don't.

But there is another change we could make here: Add a different constructor with a single const char* end parameter to construct the "end iterator". This could save one "if" check for the end iterator case (but we'd still need that check in this constructor to be backwards compatible).

Add a different constructor with a single const char* end parameter to construct the "end iterator"

I considered this as well, but decided to not do this to avoid accidental creation of an iterator without doing the last-byte check. With the current shape, it's pretty hard to misuse, and I don't think end iterator creation is a bottleneck.

Makes sense.

joto · 2026-07-01T12:22:22Z

This looks pretty good. The only thing I am concerned about is that we don't have tests for the new decode_varint_unchecked() and skip_varint_unchecked functions. They are mostly like the original functions, but different enough that they should get tests in their own right. Maybe the tests can be rewritten so that they can be applied to both the original and the _unchecked() versions?

kkaefer requested a review from a team as a code owner June 30, 2026 20:06

kkaefer requested a review from joto June 30, 2026 20:07

kkaefer commented Jul 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Speed up packed varint decoding by validating the region up front#153

Speed up packed varint decoding by validating the region up front#153
kkaefer wants to merge 1 commit into
masterfrom
kk/optimize-varint-iterators

kkaefer commented Jun 30, 2026

Uh oh!

kkaefer commented Jun 30, 2026 •

edited

Loading

Uh oh!

kkaefer commented Jun 30, 2026

Uh oh!

kkaefer Jul 1, 2026

Uh oh!

joto Jul 1, 2026

Uh oh!

kkaefer Jul 1, 2026

Uh oh!

joto Jul 1, 2026

Uh oh!

joto commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kkaefer commented Jun 30, 2026

Uh oh!

kkaefer commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kkaefer commented Jun 30, 2026

Uh oh!

kkaefer Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

joto Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

kkaefer Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

joto Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

joto commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kkaefer commented Jun 30, 2026 •

edited

Loading