Fix large field number parsing by using unsigned right shift #3509

Merged
oldergod merged 4 commits into square:master from xi0yu:fix-large-field-numbers on Mar 3, 2026

xi0yu commented Feb 3, 2026

This fixes an issue where Wire fails to parse protobuf data containing very large field numbers (for example, 290,848,974).
When the field number is 2^28 or larger, the top bit of the 32-bit tagAndFieldEncoding value is set, so a signed right shift (shr) sign-extends it and produces a negative tag.
Changed shr to ushr in the nextTag() and skipGroup() methods so that large field numbers are extracted correctly.

Only two lines modified:

  • Line 184: tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)
  • Line 250: val tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> val tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)
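
The difference between the two shifts can be sketched in a few lines of Kotlin. This is only an illustration, not Wire's ProtoReader code: makeKey, tagSigned and tagUnsigned are hypothetical helpers, and the value 3 for TAG_FIELD_ENCODING_BITS is assumed from the three wire-type bits of the protobuf wire format.

```kotlin
// Assumption: the low 3 bits of a key carry the wire type.
val TAG_FIELD_ENCODING_BITS = 3

// Hypothetical helper: packs a field number and wire type into one 32-bit key.
fun makeKey(fieldNumber: Int, wireType: Int): Int =
  (fieldNumber shl TAG_FIELD_ENCODING_BITS) or wireType

// Signed shift: copies the sign bit into the vacated high bits.
fun tagSigned(key: Int): Int = key shr TAG_FIELD_ENCODING_BITS

// Unsigned shift: fills the vacated high bits with zeros.
fun tagUnsigned(key: Int): Int = key ushr TAG_FIELD_ENCODING_BITS

fun main() {
  val key = makeKey(290_848_974, 0)
  println(key < 0)          // true: bit 31 of the key is set
  println(tagSigned(key))   // -246021938
  println(tagUnsigned(key)) // 290848974
}
```

For field numbers below 2^28 the key is non-negative and both shifts give the same result, which would explain why the problem only shows up with very large field numbers.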

This fixes an issue where Wire fails to parse protobuf data with large field numbers
(2^28 and above) due to incorrect signed right shift operations that cause
sign extension. Changed shr to ushr in nextTag() and skipGroup() methods to ensure
proper handling of large field numbers.

Only two lines modified:
- Line 184: tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)
- Line 250: val tag = tagAndFieldEncoding shr TAG_FIELD_ENCODING_BITS -> val tag = (tagAndFieldEncoding ushr TAG_FIELD_ENCODING_BITS)

oldergod commented Feb 3, 2026

Thanks for the PR. Is writing a test for this gonna be difficult?

- Change signed right shift to unsigned right shift in ProtoReader
  to fix parsing of large field numbers (numbers 0x10000000 and above) :)
- Add test case and fixture for large field number validation

xi0yu commented Feb 4, 2026

Thanks! I’ve confirmed locally that the issue occurs when field numbers are >= 0x10000000. Using signed right shift (shr) causes sign extension, which makes tag/tagAndField decode incorrectly. This is exactly what the PR fixes by switching to unsigned right shift (ushr).

I’ve also added a unit test specifically covering field numbers >= 0x10000000 to verify the fix.

oldergod commented Feb 4, 2026

Comment on lines +26 to +31
  private val adapter = createRuntimeMessageAdapter(
    LargeFieldMessage::class.java,
    "square.github.io/wire/unknown",
    Syntax.PROTO_2,
    LargeFieldNumberTest::class.java.classLoader,
  )

You could access the generated adapter as well, I think?

Suggested change
-  private val adapter = createRuntimeMessageAdapter(
-    LargeFieldMessage::class.java,
-    "square.github.io/wire/unknown",
-    Syntax.PROTO_2,
-    LargeFieldNumberTest::class.java.classLoader,
-  )
+  private val adapter = LargeFieldMessage.ADAPTER

xi0yu commented Feb 4, 2026

It's done! Thanks for the opportunity. I'm really happy to contribute to this project. :P

refactor: use direct adapter reference instead of factory 

oldergod commented Feb 4, 2026

Hmm, is this actually a problem? It looks like this PR addresses a tag value which is outside of the supported bound for the wire format anyway?

https://github.com/protocolbuffers/protobuf/blob/6fcc1b6d16db029c219083042fd9e4238d32faf3/src/google/protobuf/edition_unittest.proto#L615-L618

xi0yu commented Feb 5, 2026

> Hmm, is this actually a problem? It looks like this PR addresses a tag value which is outside of the supported bound for the wire format anyway?
>
> https://github.com/protocolbuffers/protobuf/blob/6fcc1b6d16db029c219083042fd9e4238d32faf3/src/google/protobuf/edition_unittest.proto#L615-L618

Thanks for raising the question about large field number support. The limit stated in the comment you linked prompted me to look into it more closely.

Key points:

  1. Calculation error in Google's comment: The comment says "The largest possible tag number is 2^28 - 1, since the wire format uses three bits to communicate wire type". The principle is correct (three bits are reserved for the wire type), but the value is wrong. A 32-bit key with 3 bits reserved for the wire type leaves 29 bits for the field number, so the maximum is 0x1FFFFFFF (2^29 - 1 = 536870911), not 2^28 - 1 (268435455).

  2. Legitimacy of field number 290848974: This field number (hexadecimal 0x115600CE) is below the maximum of 2^29 - 1, so it is a legitimate Protobuf field number.

  3. Hard limit verification: I confirmed that field numbers above 2^29 - 1 (such as 2^29 = 536870912) are rejected by the compiler, so 2^29 - 1 (536870911) is the true hard limit. The Wire project itself defines the same limit in Util.kt#L88: MAX_TAG_VALUE = (1 shl 29) - 1 // 536,870,911

So the question was valuable: it showed that the comment in Google's own test file contains a calculation error. The actual limit is 2^29 - 1, not 2^28 - 1, so the fix is warranted.
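
The arithmetic in the points above can be checked with a short Kotlin sketch. The names here (WIRE_TYPE_BITS, MAX_FIELD_NUMBER, isWithinFieldNumberRange, keyIsNegative) are hypothetical and chosen for illustration; the range check ignores the reserved 19000 to 19999 block and is not how Wire or protoc validate field numbers.

```kotlin
// Assumption: a key is 32 bits wide and its low 3 bits hold the wire type.
val WIRE_TYPE_BITS = 3

// 29 bits remain for the field number, so the maximum is 2^29 - 1.
val MAX_FIELD_NUMBER = (1 shl (32 - WIRE_TYPE_BITS)) - 1

// Hypothetical range check on the raw field number.
fun isWithinFieldNumberRange(n: Long): Boolean = n in 1L..MAX_FIELD_NUMBER.toLong()

// True when shifting the field number into key position sets bit 31,
// which is the case where shr and ushr start to disagree.
fun keyIsNegative(fieldNumber: Int): Boolean = (fieldNumber shl WIRE_TYPE_BITS) < 0

fun main() {
  println(MAX_FIELD_NUMBER)                       // 536870911
  println(290_848_974.toString(16))               // 115600ce
  println(isWithinFieldNumberRange(290_848_974))  // true
  println(isWithinFieldNumberRange(536_870_912))  // false
  println(keyIsNegative(268_435_455))             // false (2^28 - 1)
  println(keyIsNegative(268_435_456))             // true  (2^28)
}
```

The last two lines show why 2^28 is the boundary that matters for the bug even though 2^29 - 1 is the largest valid field number.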

xi0yu commented Feb 26, 2026

@oldergod Hi! Just checking whether you had a chance to look at the latest updates.
All review comments have been addressed.

Let me know if anything else needs improvement.

oldergod commented

@xi0yu in total honesty, I felt a cold shower reading an LLM text.

We have some interop tests in wire-protoc-compatibility-tests and I would want to see a test with the biggest allowed tag number per your PR work between Wire and protoc. I gave it a quick try and tests didn't pass.

xi0yu commented Feb 26, 2026

Thanks for the feedback. To be honest, because my English isn't very strong, I used an AI tool to help polish the wording to make it clearer. I'll review and rewrite it to make sure it sounds more natural. More importantly, I'll look into the failing interop test regarding the max tag number right away and get back to you with a fix.

Add test in wire-protoc-compatibility-tests validating Wire and protoc
can encode/decode messages with field numbers at the 2^28 boundary:
- 268435455 (2^28 - 1, below boundary)
- 268435456 (2^28, at boundary)
- 536870911 (2^29 - 1, maximum allowed)

Remove the duplicate LargeFieldNumberTest from wire-tests which only
tested Wire's internal encoding. The new interop test validates actual
Wire ↔ protoc compatibility as requested by maintainers.
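
What such a boundary check exercises can be sketched as a varint round trip over the three field numbers listed above. This is a simplified stand-in, not the actual test in wire-protoc-compatibility-tests and not Wire's or protoc's code: encodeVarint32, decodeVarint32 and roundTripFieldNumber are hypothetical helpers written for this example.

```kotlin
// Encodes a 32-bit key as a protobuf varint, treating it as unsigned (hence ushr).
fun encodeVarint32(value: Int): ByteArray {
  val out = ArrayList<Byte>()
  var v = value
  while ((v and 0x7f.inv()) != 0) {
    out.add(((v and 0x7f) or 0x80).toByte()) // 7 payload bits plus continuation bit
    v = v ushr 7
  }
  out.add(v.toByte())
  return out.toByteArray()
}

// Decodes a varint of up to five bytes back into a 32-bit key.
fun decodeVarint32(bytes: ByteArray): Int {
  var result = 0
  var shift = 0
  for (b in bytes) {
    result = result or ((b.toInt() and 0x7f) shl shift)
    if ((b.toInt() and 0x80) == 0) break
    shift += 7
  }
  return result
}

// Packs the field number into a key, sends it through the varint codec,
// and extracts the field number again with an unsigned shift.
fun roundTripFieldNumber(fieldNumber: Int, wireType: Int = 0): Int {
  val key = (fieldNumber shl 3) or wireType
  return decodeVarint32(encodeVarint32(key)) ushr 3
}

fun main() {
  for (n in listOf(268_435_455, 268_435_456, 536_870_911)) {
    println("$n -> ${roundTripFieldNumber(n)}")
  }
}
```

Replacing the final ushr with shr in roundTripFieldNumber would make the last two values come back negative, which is the failure the PR addresses.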

xi0yu commented Feb 27, 2026

@oldergod I've added the interop test in wire-protoc-compatibility-tests. Tests pass on my machine.

Could you share what error you saw when you tested? Want to make sure I haven't missed anything.

xi0yu commented Mar 3, 2026

Hey, any thoughts on this when you get a chance?
Happy to tweak anything.

oldergod left a comment

Thank you 👍

oldergod merged commit a393861 into square:master on Mar 3, 2026
10 checks passed
oldergod added this to the 6.0 milestone on Mar 3, 2026