Use `int` as basis for `CvdumpTypeKey` by disinvite · Pull Request #315 · isledecomp/reccmp

disinvite · 2026-02-07T23:04:51Z

Fixes #211.

This is a huge change and it could easily be split into a few smaller ones if needed.

Intro

Type keys are used throughout the TYPES section of cvdump, along with other sections that refer to a type, such as SYMBOLS and GLOBALS. They come in three forms:

Scalar types like T_NOTYPE(0000)
"Unknown" scalar types like ???(047C)
User-defined types like 0x1234

Until now we have used a string representation for this key, but the use of upper or lower-case hex digits is not consistent. We use the normalize_type_id function to handle this, but also manually convert using str.lower() in many spots. #203 added the CvdumpTypeKey annotation but this was just an alias for str so it was not checked with mypy.
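A small sketch of why string keys are fragile. The dicts and their contents are made up for illustration and are not taken from reccmp:

```python
# Hypothetical lookup table keyed by the string form of a type key.
types_by_str = {"0x10ab": "LF_STRUCTURE"}

# The same key spelled with upper-case hex digits misses the entry...
assert "0x10AB" not in types_by_str
# ...unless every call site remembers to normalize first.
assert "0x10AB".lower() in types_by_str

# Integer keys have a single representation, so either spelling works.
types_by_int = {0x10AB: "LF_STRUCTURE"}
assert int("0x10AB", 16) in types_by_int
assert int("0x10ab", 16) in types_by_int
```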

The obvious improvement is to exploit the fact that scalar types occupy the range below 0x1000, so the key could just be an int. The obstacle there is that we depend on the string version of the T_* scalar types to decide the type's footprint and other attributes. cvinfo.h contains bit-shifting macros that do this using integers. Considering that the range of scalar types is less than 10% full (about 300 of 4096 values), it is reasonable to use const data for this instead of calculating it on the fly.
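A minimal sketch of the const-data idea, assuming a record with a name, a size, and a struct format char. `ScalarInfo`, `SCALARS`, and `unpack_scalar` are hypothetical names, and the real CvInfoType fields may differ. The numeric keys and sizes for these few entries follow cvinfo.h:

```python
import struct
from types import MappingProxyType
from typing import NamedTuple


class ScalarInfo(NamedTuple):
    """Hypothetical record; not the actual CvInfoType layout."""

    name: str  # T_* name as printed by cvdump
    size: int  # footprint in bytes
    fmt: str  # struct format char


# A handful of scalar types. INT types use i/I and LONG types use l/L.
SCALARS = MappingProxyType(
    {
        0x0010: ScalarInfo("T_CHAR", 1, "b"),
        0x0011: ScalarInfo("T_SHORT", 2, "h"),
        0x0012: ScalarInfo("T_LONG", 4, "l"),
        0x0020: ScalarInfo("T_UCHAR", 1, "B"),
        0x0021: ScalarInfo("T_USHORT", 2, "H"),
        0x0022: ScalarInfo("T_ULONG", 4, "L"),
        0x0040: ScalarInfo("T_REAL32", 4, "f"),
        0x0041: ScalarInfo("T_REAL64", 8, "d"),
        0x0074: ScalarInfo("T_INT4", 4, "i"),
        0x0075: ScalarInfo("T_UINT4", 4, "I"),
    }
)


def unpack_scalar(key: int, raw: bytes):
    """Decode one little-endian scalar value from the start of raw."""
    info = SCALARS[key]
    # "<" selects standard sizes, so "i" and "l" are both 4 bytes.
    (value,) = struct.unpack("<" + info.fmt, raw[: info.size])
    return value
```

With the "<" prefix, struct uses standard sizes, so choosing i/I over l/L changes nothing in the decoded value. Without a byte-order prefix, struct falls back to native sizes, where l/L can be 8 bytes on some platforms.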

This change

All scalar types now have CvInfoType tuples in a new file cvinfo.py, which includes the type's size and format chars for struct.unpack.
CvdumpTypeKey is now a class that extends int so mypy will type check it. We need to make sure it is an int when storing in the Entity DB and convert back to CvdumpTypeKey when reading data back. For now, this happens automatically, but we could use conversion methods too.
The parametrized tests in test_cvdump.py may not have much value now that we store the type attributes directly in cvinfo.py. One change to test data is that float and double are now considered unsigned to match cvinfo.h. The only impact was the choice of format char, but we now store this directly rather than computing it.
On the topic of struct.unpack format chars, I decided to use i/I for the INT types and l/L for LONG. We had previously used l/L for any 4-byte integer. This is more of a style thing and I don't expect any practical difference.
CvdumpTypeKey.from_str() replaces normalize_type_id and it's used everywhere we need to convert a type key. Similarly, the is_scalar() function replaces magic string checks for the "T_" prefix.
Because CvdumpTypeKey is now a class, we need to use it to initialize any int variable that will serve as a type key. This change is most prevalent in unit tests.
Similarly, whenever we need to refer to a specific type, there is an Enum based on CvdumpTypeKey. This is created dynamically with types.new_class to save space.
The Ghidra scripts use the string representation of scalar types to decide which Ghidra primitive to use, and to add a * for pointers. We can access the T_ string in the CvInfoType tuple so the conversion should be the same.
I fixed a bug with LF_ENUM where forward refs were not followed. Forward refs have an underlying type of T_NOTYPE(0000) but this slipped by unnoticed because we assumed a size of 4 for most types.
Fixed another bug: we did not correctly handle type keys larger than 0xffff because they use more characters than expected.
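The int-subclass approach described in the list above could look roughly like the sketch below. `TypeKey` is a hypothetical stand-in for CvdumpTypeKey: the regular expressions, the method bodies, and the choice to make `is_scalar` a method are assumptions, not the code from this PR.

```python
import re


class TypeKey(int):
    """Hypothetical stand-in for CvdumpTypeKey: an int that mypy can tell apart."""

    # "T_NOTYPE(0000)" or "???(047C)": hex digits inside parentheses.
    _SCALAR_RE = re.compile(r"^(?:T_\w+|\?\?\?)\(([0-9A-Fa-f]+)\)$")
    # "0x1234": user-defined type with any number of hex digits.
    _USER_RE = re.compile(r"^0x([0-9A-Fa-f]+)$", re.IGNORECASE)

    @classmethod
    def from_str(cls, text: str) -> "TypeKey":
        """Convert any of the three cvdump spellings into a key."""
        text = text.strip()
        match = cls._SCALAR_RE.match(text) or cls._USER_RE.match(text)
        if match is None:
            raise ValueError(f"Not a type key: {text!r}")
        return cls(int(match.group(1), 16))

    def is_scalar(self) -> bool:
        # Scalar types sit below 0x1000; user-defined types start there.
        return self < 0x1000

    def __repr__(self) -> str:
        return f"TypeKey(0x{int(self):04x})"


# No fixed width is assumed, so keys above 0xffff parse the same way.
big = TypeKey.from_str("0x12345")
assert big == 0x12345 and not big.is_scalar()
assert TypeKey.from_str("T_NOTYPE(0000)").is_scalar()
```

Because `TypeKey` is a distinct class, mypy rejects a plain `int` where a parameter is annotated as `TypeKey`, which is why test code has to wrap literals explicitly.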

Future

Some types might be handled incorrectly in the new system, but we won't know for sure until we see them in a PDB. I marked most scalar types with a weird attribute and the logger will display an info message if we see any.
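One way to picture the logging behaviour is the sketch below. The record layout, the `verified` flag (the inverse of "weird"), the choice of which entry is unverified, and the message text are all assumptions made for illustration:

```python
import logging
from typing import NamedTuple

logger = logging.getLogger(__name__)


class ScalarEntry(NamedTuple):
    """Hypothetical record; not the actual cvinfo.py layout."""

    name: str
    size: int
    verified: bool  # False until the type has been seen in a real PDB


SCALAR_TABLE = {
    0x0074: ScalarEntry("T_INT4", 4, True),
    # Marked unverified here purely as an example.
    0x0042: ScalarEntry("T_REAL80", 10, False),
}


def lookup(key: int) -> ScalarEntry:
    """Return the entry for a scalar key, noting any unverified type."""
    entry = SCALAR_TABLE[key]
    if not entry.verified:
        logger.info("Unverified scalar type %s(%04X)", entry.name, key)
    return entry
```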

It is probably time to separate cvdump text parsing from the general type database.

Not all data is exposed through the get() API and we need to access the keys dict directly. For example, we don't store modifiers like const or volatile, or the destination type of pointers. See #106.

test_cvdump_types.py is now over 1000 lines, so I added the pylint:disable option for now. The problem is that we need the huge blocks of text to set up the parser, but this could probably be broken out into tests for each leaf type.


jonschz

Looks very good, this is a big improvement to the code quality!



disinvite added 21 commits January 31, 2026 14:37

Use integer for type key in all cvdump parsing (WIP)

b3ba361

Fix hex display of type key

db873b6

Merge branch 'isledecomp:master' into strong-type-for-cvdump-key

5f0a742

float/double dubbed unsigned

89f2c04

Add cvdump type names

94ef73f

Remove some magic numbers

9eaaec7

Expand CVInfo listing to include type metadata

ac3343c

Remove T_NOTYPE uses

6358de9

Get scalar data from cvinfo.py.

8278cda

Fix tests to account for this (including INT types using 'i' for struct unpack)

Bug: all types were pointers

6bb34b6

Assume 32-bit enum footprint if NOTYPE specified. Alert to unusual types.

a1ec1f0

Fix the actual bug: LF_ENUM forward refs not followed.

5a6fc61

Remove magic numbers in ghidra files

88bda4e

Use MappingProxy for immutable dict

4c00031

Use kwargs instead. Move test-only functions out of cvdump/types.py

864567a

Move more pieces into cvinfo module.

a13b93f

Dynamic enum for scalar CV type keys.

495fc40

Now enable strong typing for the key.

0e08d81

Remove scalar check magic numbers. Mypy fix for datacmp

aa09c50

Move normalize fn into type class. Close type conversion gap in GLOBALS parser.

db54ff4

Split leaves properly for keys over 0xffff

d6b2e3b

disinvite requested a review from jonschz February 7, 2026 23:05

Add T_CHAR8 types seen in isledecomp#85. Log type key for ??? types.

419660e

jonschz approved these changes Feb 8, 2026

View reviewed changes

disinvite added 6 commits February 8, 2026 17:03

Bugfix: wrong pointer links for T_LONG

02ba207

Move Ghidra type name conversion to its own module and add tests.

7812397

Explicit enum for CV scalar keys. Expose mapping proxy directly (no default).

c3c1a8b

Wrap KeyError from CV primitive map

d6454a5

weird -> verified

9d719a4

Clean up cvinfo type comments

4e9cce5

disinvite added 6 commits February 12, 2026 12:06

Drop tests for type.is_signed

f3ebb84

Improve some variable names to reflect type changes

497bc13

Fix dropped parametrize var

b15e5bb

Add a few tests for reading type keys

965659e

Use existing CV primitive in ScalarType wrapper

171bfab

Merge branch 'master' into strong-type-for-cvdump-key

b44a6ab

disinvite merged commit 78fba75 into isledecomp:master Feb 13, 2026
15 checks passed

disinvite deleted the strong-type-for-cvdump-key branch February 13, 2026 02:22

disinvite merged 34 commits into isledecomp:master from disinvite:strong-type-for-cvdump-key
