Speed up packed varint decoding by validating the region up front#153
Speed up packed varint decoding by validating the region up front#153kkaefer wants to merge 1 commit into
Conversation
Decoding packed varint fields is now 30-150% faster, depending on varint length. The speedup comes from the new `decode_varint_unchecked()` / `skip_varint_unchecked()` helpers, which decode a varint without checking against an end pointer on every byte. A side effect is that our iterator size is now a single pointer, aligning it with the fixed iterators. This is safe because a packed varint region is valid only if its last byte has the continuation bit clear, i.e. it ends exactly on a varint boundary. That guarantees a decode starting anywhere inside the region always terminates at or before the final byte, so no end pointer is needed. The packed varint iterators (`const_varint_iterator`, `const_svarint_iterator`) check this once in their constructor, and as a result no longer need to store the end pointer at all: they now hold a single pointer instead of two. That halves their size, but on its own it doesn't measurably change decoding speed (overlong varints are still detected via the byte-count limit). Behavior change: a packed varint field whose trailing varint is truncated now throws `end_of_buffer_exception` when the iterator range is created (in `get_packed_*()`) instead of later during iteration. Arguably this doesn't matter because a truncated varint in a packed field is illegal anyway. The iterators' `(data, end)` constructor is also no longer `noexcept`.
|
Here are more benchmark results:
Legend:
The speedup varies based on the characteristics of the packed varint buffer, but we can see a clear performance increase in all cases, even if we only have a small handful of varints in that buffer, and also if those varints are really short (1-2 bytes). |
|
I guess one side effect of this change is that if you accidentally call |
| // The decoders used by this iterator run without an explicit end | ||
| // pointer. This is only safe if the region ends exactly on a varint | ||
| // boundary, i.e. the last byte does not have its continuation bit set. | ||
| if (data != end && (static_cast<unsigned char>(*(end - 1)) & 0x80U) != 0) { |
There was a problem hiding this comment.
Should we do data < end here?
There was a problem hiding this comment.
I think the data != end makes it clearer that this check is there to cover the case where the constructor is used to construct the "end iterator". With data < end it seems that we want to defend against the case where data > end which we don't.
But there is another change we could make here: Add a different constructor with a single const char* end parameter to construct the "end iterator". This could save one "if" check for the end iterator case (but we'd still need that check in this constructor to be backwards compatible).
There was a problem hiding this comment.
Add a different constructor with a single
const char* endparameter to construct the "end iterator"
I considered this as well, but decided to not do this to avoid accidental creation of an iterator without doing the last-byte check. With the current shape, it's pretty hard to misuse, and I don't think end iterator creation is a bottleneck.
|
This looks pretty good. The only thing I am concerned about is that we don't have tests for the new |
Decoding packed varint fields is now 30-150% faster, depending on varint length. The speedup comes from the new
decode_varint_unchecked()/skip_varint_unchecked()helpers, which decode a varint without checking against an end pointer on every byte. A side effect is that our iterator size is now a single pointer, aligning it with the fixed iterators.This is safe because a packed varint region is valid only if its last byte has the continuation bit clear, i.e. it ends exactly on a varint boundary. That guarantees a decode starting anywhere inside the region always terminates at or before the final byte, so no end pointer is needed. The packed varint iterators (
const_varint_iterator,const_svarint_iterator) check this once in their constructor, and as a result no longer need to store the end pointer at all: they now hold a single pointer instead of two. That halves their size, but on its own it doesn't measurably change decoding speed (overlong varints are still detected via the byte-count limit).Behavior change: a packed varint field whose trailing varint is truncated now throws
end_of_buffer_exceptionwhen the iterator range is created (inget_packed_*()) instead of later during iteration. Arguably this doesn't matter because a truncated varint in a packed field is illegal anyway. The iterators'(data, end)constructor is also no longernoexcept.