@copybara-service

Avoid doing an extra pass over strings in the cases when the min/max number of bytes for the length-prefix is not bounded.

Finding the encoded length of a string is slow, so to avoid that work we can usually first determine how many bytes the length-prefix varint will need, leave that much space, encode the string to see how long it is, and then jump back to write the length in front.
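The reserve-then-backfill idea described above can be sketched roughly as follows. This is only an illustration, not the code touched by this change: the class `ReserveThenBackfill`, its helper names, and the use of `ByteBuffer` with `CharsetEncoder` are all assumptions made for the example, and the caller is assumed to already know the prefix size.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not taken from the real code.
public class ReserveThenBackfill {
    // Number of bytes needed to encode a non-negative int as a base-128 varint.
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Writes value as a varint starting at an absolute index, leaving buf's position alone.
    static void putVarintAt(ByteBuffer buf, int index, int value) {
        while ((value & ~0x7F) != 0) {
            buf.put(index++, (byte) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        buf.put(index, (byte) value);
    }

    // Assumes the caller has already established that prefixSize is correct
    // for every possible encoded length of s.
    static void writeString(ByteBuffer buf, String s, int prefixSize) {
        int prefixPos = buf.position();
        int dataPos = prefixPos + prefixSize;
        buf.position(dataPos); // leave room for the length prefix

        // Single pass over the string: encode it directly into the buffer.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CoderResult result = encoder.encode(CharBuffer.wrap(s), buf, true);
        if (result.isError() || result.isOverflow()) {
            throw new IllegalArgumentException("could not encode string: " + result);
        }
        encoder.flush(buf);

        int encodedLength = buf.position() - dataPos;
        if (varintSize(encodedLength) != prefixSize) {
            throw new IllegalStateException("reserved prefix size was wrong");
        }
        // Jump back and fill in the length in front of the bytes.
        putVarintAt(buf, prefixPos, encodedLength);
    }
}
```

The string is scanned once, and the only extra work is the small backfill write, which is why this path is preferred whenever the prefix size can be determined in advance.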

But in some cases we can't tell how many bytes the length prefix will take. For example, a string that is 100 chars long encodes to 100 bytes if it is all ASCII, but to as many as 300 bytes if every char needs three bytes in UTF-8 (CJK text, for example). The size encoded as a varint would take 1 byte in the first case and 2 bytes in the second, so the prefix size can't be known up front.
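To make the arithmetic concrete, here is a small sketch of the bounds check. The class and method names are hypothetical, and the bounds used are the general UTF-8 ones (at least 1 byte and at most 3 bytes per UTF-16 char), not necessarily the exact checks in the real code. Overflow for very large strings is ignored.

```java
// Hypothetical sketch; names are illustrative and not taken from the real code.
public class LengthPrefixBounds {
    // Number of bytes needed to encode a non-negative int as a base-128 varint.
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // True when every possible UTF-8 length of a string with charCount
    // UTF-16 chars needs the same number of length-prefix bytes.
    static boolean prefixSizeIsKnown(int charCount) {
        int minBytes = charCount;     // all ASCII: 1 byte per char
        int maxBytes = charCount * 3; // worst case: 3 bytes per char
        return varintSize(minBytes) == varintSize(maxBytes);
    }

    public static void main(String[] args) {
        System.out.println(varintSize(100));        // 1
        System.out.println(varintSize(300));        // 2
        System.out.println(prefixSizeIsKnown(100)); // false
        System.out.println(prefixSizeIsKnown(20));  // true
    }
}
```

With these bounds, a 20-char string always fits a 1-byte prefix (at most 60 bytes), while a 100-char string straddles the 128-byte boundary between 1-byte and 2-byte varints.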

Instead of doing two passes in this situation, just UTF-8 encode to a temporary byte[] first, then write the length and the bytes.
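A minimal sketch of the temporary-buffer path, assuming a plain `ByteArrayOutputStream` as the destination. The class `TempBufferStringWriter` and its methods are hypothetical, and `String.getBytes` is used here only for illustration; the real encoder may differ, notably in how it treats unpaired surrogates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are illustrative and not taken from the real code.
public class TempBufferStringWriter {
    // Writes a non-negative int as a base-128 varint.
    static void writeVarint32(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes once into a temporary array, so the exact length is known
    // before the prefix is written and nothing needs to be backfilled.
    static void writeString(ByteArrayOutputStream out, String s) {
        // Note: getBytes replaces unpaired surrogates with '?', which is an
        // assumption of this sketch rather than the behavior of the real code.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint32(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "h\u00e9llo"); // 5 chars, 6 UTF-8 bytes
        System.out.println(out.size()); // 7: one prefix byte plus six data bytes
    }
}
```

The trade-off is one short-lived allocation per string in exchange for never scanning the string a second time to compute its encoded length.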

When this code was originally written, doing two passes was better. But in 2025, java.lang.String has alternate internal representations that we cannot access, and advances in JIT technology mean the temporary allocation typically does not need GC, so this approach is better.

PiperOrigin-RevId: 848160875
@github-actions

Auto-closing Copybara pull request

@github-actions github-actions bot closed this Dec 30, 2025
@github-actions github-actions bot deleted the test_848160875 branch December 30, 2025 10:07