[@typescript/api] Port scanner and token navigation utils to API client, include JSDoc in encoded ASTs, add more SourceFile properties #2904
I added this so the client-side token utils can be validated against the Go astnav utils for identical behavior. I'm open to ideas for how to avoid dumping 20k lines of JSON to disk if this is a concern.
You could dump it encoded as arrays to reduce the amount of text?
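One way to read the array suggestion: write each baseline entry as a positional row under a single header, so keys are not repeated for every entry. This is a minimal sketch assuming a hypothetical record shape (`fileName`, `position`, `kind`, `pos`, `end`); the actual fields produced by `tokens_test.go` may differ.

```typescript
// Hypothetical shape of one token-navigation baseline entry.
interface TokenRecord {
  fileName: string;
  position: number;
  kind: string;
  pos: number;
  end: number;
}

// Column order shared by every row; written once instead of once per entry.
const COLUMNS = ["fileName", "position", "kind", "pos", "end"] as const;

type Row = [string, number, string, number, number];

// Encode records as { columns, rows } to avoid repeating keys.
function encodeAsArrays(records: TokenRecord[]): { columns: readonly string[]; rows: Row[] } {
  return {
    columns: COLUMNS,
    rows: records.map((r): Row => [r.fileName, r.position, r.kind, r.pos, r.end]),
  };
}

// Decode rows back into objects so client-side tests can compare them.
function decodeFromArrays(data: { rows: Row[] }): TokenRecord[] {
  return data.rows.map(([fileName, position, kind, pos, end]) => ({ fileName, position, kind, pos, end }));
}
```

The savings grow with the number of entries, since the key names appear only once in the header.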
I'm not bothered by it; it's small potatoes compared to a lot of the project baselines.
This is a direct port of the Go code, not the Strada code.
Pull request overview
This PR ports scanner and token navigation utilities from the TypeScript compiler to the API client package, includes JSDoc nodes in encoded ASTs, and adds additional SourceFile properties for module metadata. The changes span both Go (internal encoder/decoder) and TypeScript (AST/API packages) codebases.
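To illustrate what the token navigation utilities do, here is a minimal sketch of a `getTokenAtPosition`-style lookup over a simplified tree. It assumes a hypothetical `SimpleNode` whose `[pos, end)` span includes leading trivia and treats leaf nodes as tokens. The ported `astnav.ts` works on real AST nodes and also has to account for tokens that are not stored as children, so this is not its implementation.

```typescript
// Hypothetical simplified node: [pos, end) is the full span including leading trivia.
interface SimpleNode {
  kind: string;
  pos: number;
  end: number;
  children: SimpleNode[];
}

// Descend into the child whose span contains `position` until reaching a leaf.
// If no child covers the position, the enclosing node is returned; a real
// implementation would need to scan for tokens in such gaps instead.
function getTokenAtPositionSketch(root: SimpleNode, position: number): SimpleNode | undefined {
  if (position < root.pos || position >= root.end) return undefined;
  let current = root;
  while (current.children.length > 0) {
    const next = current.children.find(c => c.pos <= position && position < c.end);
    if (!next) break;
    current = next;
  }
  return current;
}
```

For `let x = 1;`, position 4 (the `x`) resolves to the identifier leaf whose full span `[3, 5)` includes the preceding space.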
Changes:
- Bumped protocol version from 3 to 5 with expanded node structure (28 bytes instead of 24) to include node flags
- Added structured data section to binary format using msgpack encoding for file references and node index arrays
- Ported scanner.ts (~2500 lines) and the astnav.ts (~500 lines) token navigation utilities to the TypeScript AST package
- Enhanced SourceFile with languageVariant, scriptKind, file references, imports, module augmentations, and external module indicator
- Go encoder now visits JSDoc nodes after each node's children to include them in the AST
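
The node-size change in the first bullet amounts to a fixed-stride read with one extra field. This is a minimal sketch assuming seven little-endian `uint32` fields per node in the order kind, pos, end, next, parent, data, flags; that order, the endianness, and the field names are assumptions for illustration, and the real offsets live in `protocol.ts`.

```typescript
// Assumed layout (hypothetical): 7 x uint32 = 28 bytes per node,
// versus 24 bytes (6 fields) before node flags were added.
const NODE_FIELD_COUNT = 7;
const NODE_SIZE = NODE_FIELD_COUNT * 4; // 28

interface NodeRecord {
  kind: number;
  pos: number;
  end: number;
  next: number;
  parent: number;
  data: number;
  flags: number;
}

// Read the node at `index` from a nodes section that begins at `nodesOffset`.
function readNode(view: DataView, nodesOffset: number, index: number): NodeRecord {
  const base = nodesOffset + index * NODE_SIZE;
  const u32 = (field: number) => view.getUint32(base + field * 4, /* littleEndian */ true);
  return {
    kind: u32(0),
    pos: u32(1),
    end: u32(2),
    next: u32(3),
    parent: u32(4),
    data: u32(5),
    flags: u32(6),
  };
}
```

Because every record has the same stride, a reader built for the old 24-byte layout would misread every node after the first, which is one reason a protocol version bump is needed.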
Reviewed changes
Copilot reviewed 34 out of 37 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/api/encoder/encoder.go | Updated protocol to v5, added structured data section, JSDoc encoding, node flags, and SourceFile metadata |
| internal/api/encoder/decoder.go | Added JSDocLink variants and JSDocNameReference support |
| internal/astnav/tokens_test.go | Added JSON baseline tests for token navigation functions |
| _packages/ast/src/scanner.ts | New 2500+ line port of TypeScript scanner with full lexical analysis |
| _packages/ast/src/astnav.ts | New token navigation utilities (getTokenAtPosition, getTouchingPropertyName, etc.) |
| _packages/ast/src/nodes.ts | Added FileReference interface, SourceFile properties (jsDoc, languageVariant, scriptKind, etc.) |
| _packages/ast/src/utils.ts | Added formatSyntaxKind helper |
| _packages/ast/src/is.ts | Added type guards for tokens, keywords, JSDoc kinds, and property name literals |
| _packages/ast/src/enums/* | Added new enum files for CharacterCodes, ScriptTarget, ScriptKind, LanguageVariant, JSDocParsingMode, etc. |
| _packages/api/src/node/protocol.ts | Updated protocol version and header offsets for structured data |
| _packages/api/src/node/node.ts | Added SourceFile property getters, JSDoc support, structured data reading via msgpack |
| _packages/api/src/node/msgpack.ts | New minimal msgpack encoder/decoder for structured data section |
| _packages/api/src/node/encoder.ts | Added SourceFile extended data encoding, FileReference encoding (incomplete) |
| _packages/api/src/syncChannel.ts | Refactored msgpack constants into shared msgpack.ts module |
| _packages/api/test/async/astnav.test.ts | New tests comparing TypeScript astnav with Go baseline |
| _packages/api/scripts/generateSync.ts | Added astnav.test.ts to sync generation |
| _packages/ast/scripts/generateFactory.ts | Excluded new SourceFile properties from factory generation |
| _packages/ast/package.json | Added scanner export |
| _packages/api/package.json | Updated build scripts to include factory generation |
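
For context on the structured data section, the msgpack wire format for an array of non-negative integers (such as node index arrays) is small enough to sketch. This follows the msgpack specification's fixarray/array16/array32 headers and positive fixint/uint8/uint16/uint32 formats; it is an illustration of the format, not the contents of the new `msgpack.ts`.

```typescript
// Encode an array of non-negative integers (< 2^32) as msgpack bytes.
function encodeUintArray(values: readonly number[]): Uint8Array {
  const out: number[] = [];
  const n = values.length;
  // Array header: fixarray (0x90 | n) for n < 16, else array16 (0xdc) or array32 (0xdd).
  if (n < 16) out.push(0x90 | n);
  else if (n <= 0xffff) out.push(0xdc, n >>> 8, n & 0xff);
  else out.push(0xdd, n >>> 24, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff);

  for (const v of values) {
    if (!Number.isInteger(v) || v < 0 || v > 0xffffffff) {
      throw new RangeError(`value out of range: ${v}`);
    }
    if (v <= 0x7f) out.push(v); // positive fixint
    else if (v <= 0xff) out.push(0xcc, v); // uint8
    else if (v <= 0xffff) out.push(0xcd, v >>> 8, v & 0xff); // uint16, big-endian
    else out.push(0xce, v >>> 24, (v >>> 16) & 0xff, (v >>> 8) & 0xff, v & 0xff); // uint32, big-endian
  }
  return Uint8Array.from(out);
}
```

Small indices cost a single byte each, which keeps the section compact for typical files.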

```ts
const referencedFilesOffset = encodeFileReferences(sf.referencedFiles, structuredWriter);
const typeRefDirectivesOffset = encodeFileReferences(sf.typeReferenceDirectives, structuredWriter);
const libRefDirectivesOffset = encodeFileReferences(sf.libReferenceDirectives, structuredWriter);
extendedData.push(textIndex, fileNameIndex, pathIndex, sf.languageVariant, sf.scriptKind, referencedFilesOffset, typeRefDirectivesOffset, libRefDirectivesOffset, NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, 0);
```
The TypeScript encoder for SourceFile extended data pushes placeholder values (NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, 0) for imports, moduleAugmentations, ambientModuleNames, and externalModuleIndicator (line 237), but unlike the Go encoder, it never patches these placeholders with actual data after the tree walk.
The Go encoder builds a nodeIndexMap, tracks node indices during the walk, then encodes these arrays and patches the placeholders (lines 439-455 in encoder.go). The TypeScript encoder lacks this logic entirely, so these SourceFile properties will always be empty/undefined when decoding.
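The placeholder-then-patch pattern described in this comment can be sketched as follows. Everything here is hypothetical: the sentinel value, the single `imports` slot, and the `encodeIndices` callback are stand-ins, not the Go or TypeScript encoder's actual names or layout.

```typescript
// Hypothetical sentinel; the real NO_STRUCTURED_DATA value may differ.
const NO_STRUCTURED_DATA = 0xffffffff;

interface WalkNode {
  kind: string;
  children: WalkNode[];
}

// Push a placeholder, walk the tree to assign node indices, then patch the
// slot with the offset of an index array written after the walk.
function encodeWithPatching(
  root: WalkNode,
  importNodes: WalkNode[],
  encodeIndices: (indices: number[]) => number, // returns offset of the written array
): number[] {
  const extendedData: number[] = [];
  const importsSlot = extendedData.length;
  extendedData.push(NO_STRUCTURED_DATA); // placeholder for the imports offset

  // Pre-order walk recording each node's index (analogous to nodeIndexMap).
  const nodeIndexMap = new Map<WalkNode, number>();
  const stack: WalkNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    nodeIndexMap.set(node, nodeIndexMap.size);
    for (let i = node.children.length - 1; i >= 0; i--) stack.push(node.children[i]);
  }

  // Indices are only known after the walk, so the placeholder is patched here.
  if (importNodes.length > 0) {
    const indices = importNodes.map(n => {
      const index = nodeIndexMap.get(n);
      if (index === undefined) throw new Error("import node not found in tree");
      return index;
    });
    extendedData[importsSlot] = encodeIndices(indices);
  }
  return extendedData;
}
```

When there is nothing to patch, the sentinel stays in place, which a decoder can read as "absent".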
That's because you can't construct a SourceFile with these properties from the client.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
How should we keep this up to date going forward?
How does this manage differences in encoding?
What do you think of the latest commit for that?