[EXPERIMENTAL] Improved Go tuple library#12338
Draft
Semisol wants to merge 2 commits intoapple:mainfrom
Draft
Conversation
This commit: - Requires a minimum Go version 1.20 to allow using newer features in future commits - Adds support for the length-prefixed bytes type - Fixes certain panics detected by fuzzing on Unpack - Switches to binary.BigEndian methods for decoding integers which eliminates allocations and is faster - Switches to google/uuid for UUIDs to reduce conversion boilerplate as it seems to be the de-facto library for UUIDs - Does some other refactors for a future tuple "builder" to reduce allocations and eliminate type-casting overhead
This commit adds an experimental tuple unpacker to the Go bindings
which reduces unnecessary allocations and tries to optimize decoding.
Tests for the FixedLen type are also added, alongside extending tests
to ensure the unpacked value is identical to the starting value and
adding benchmarks, and negative numbers.
- Zero-copy is used for strings and byte slices whenever possible to
eliminate allocations.
- Suprisingly, a loop was faster than using standard library functions
to locate the end of a string and check.
- A function table was used as it could allow extensibility in the
future and may be faster on certain platforms.
- A custom Boxed type is used, which is larger than an interface{}
but does not require contents to be allocated on the heap. This
is effectively a tagged union.
- A neat hack is used in the unpacking loop to only do 1 allocation
in the majority of tuple decoding cases.
11 tasks
Contributor
|
Thanks for the PR. Curious what is breaking here? Is it the API or something else? |
Contributor
Author
UnpackV2 (which will become Unpack) assumes that the input byte slice will not become deallocated (if it for example points to memory not allocated by Go) or changed (as it tries to zero-copy strings). The other breaking change is to switch from the tuple.UUID type to google/uuid, as it seems to be the most common convention. I am currently maintaining an internal fork of the Go bindings to see how it can be improved, as it has a lot of rough edges. I am intending to contribute back the results once it is more stable, ideally as one large break. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds an experimental tuple decoder that is optimized for lower allocations and higher performance, support for fixed-length byte strings, along with fixing some panics.
This should be kept experimental until a way to handle the potentially breaking changes is found (possibly combining it with other breaking binding changes in a Go bindings "v2")
Over the original decoder, the new decoder attempts to zero-copy strings, and uses a new
Boxedtype which requires zero allocations compared to theanytype which requires its contents are on the heap.A neat hack relating to on-stack slice allocation is also used to remove allocations from
appendthat arise from tuples while not having to estimate the length of the tuple for capacity.The tests are also improved to ensure unpacking works correctly, and benchmarks are added.
This PR was closed and reopened due to separating this update to a separate branch
Benchmarking
To run benchmarks:
go test -run=NONE -bench=^BenchmarkTupleUnpackTo-do list
google/uuid(we may be able to just aliastuple.UUIDto it)Unpackto not depend on the input buffer. This could be "fixed" by copying the input buffer or by making a breaking v2 release (alongside possibly other binding improvements).big.Int. The new decoder does not support this right now.Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.