[@typescript/api] Port scanner and token navigation utils to API client, include JSDoc in encoded ASTs, add more SourceFile properties #2904

Open
andrewbranch wants to merge 14 commits into microsoft:main from andrewbranch:api-scanner

Conversation

@andrewbranch
Member

No description provided.

Member Author

I added this so the client-side token utils can validate identical behavior to the Go astnav utils. Open to ideas for how to not dump 20k lines of JSON to disk if this is upsetting.

Member

You could dump it encoded in arrays, to limit how much text it is?
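To make the array suggestion concrete, here is a minimal sketch of a columnar baseline layout. The entry shape (`position`, `kind`, `pos`, `end`) and the helper names are hypothetical and are not taken from the PR's actual baseline format; the point is only that field names are written once instead of once per entry.

```typescript
// Hypothetical baseline entry; the real baseline fields in the PR may differ.
interface TokenBaselineEntry {
    position: number;
    kind: string;
    pos: number;
    end: number;
}

// One row per entry, with values in the same order as `fields`.
type BaselineRow = [number, string, number, number];

// Columnar layout: field names appear once, followed by compact rows.
interface CompactBaseline {
    fields: ["position", "kind", "pos", "end"];
    rows: BaselineRow[];
}

// Convert verbose per-entry objects into the compact array form.
function toCompact(entries: readonly TokenBaselineEntry[]): CompactBaseline {
    return {
        fields: ["position", "kind", "pos", "end"],
        rows: entries.map((e): BaselineRow => [e.position, e.kind, e.pos, e.end]),
    };
}

// Expand the compact form back into objects, e.g. for readable test diffs.
function fromCompact(compact: CompactBaseline): TokenBaselineEntry[] {
    return compact.rows.map(([position, kind, pos, end]) => ({ position, kind, pos, end }));
}
```

The trade-off is readability: rows are harder to scan in a diff than named fields, which may matter for a baseline that humans review.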

Member Author

I'm not bothered by it; it's small potatoes compared to a lot of the project baselines

Member Author

This is a direct port of the Go code, not the Strada code

andrewbranch marked this pull request as ready for review on February 25, 2026 19:05
Copilot AI review requested due to automatic review settings February 25, 2026 19:06
Contributor

Copilot AI left a comment

Pull request overview

This PR ports scanner and token navigation utilities from the TypeScript compiler to the API client package, includes JSDoc nodes in encoded ASTs, and adds additional SourceFile properties for module metadata. The changes span both Go (internal encoder/decoder) and TypeScript (AST/API packages) codebases.

Changes:

  • Bumped protocol version from 3 to 5 with expanded node structure (28 bytes instead of 24) to include node flags
  • Added structured data section to binary format using msgpack encoding for file references and node index arrays
  • Ported scanner.ts (~2500 lines) and astnav.ts (~500 lines) token navigation utilities to TypeScript AST package
  • Enhanced SourceFile with languageVariant, scriptKind, file references, imports, module augmentations, and external module indicator
  • Go encoder now visits JSDoc nodes after each node's children to include them in the AST
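As an illustration of what msgpack-encoding a node index array involves, the following sketch encodes and decodes an array of unsigned integers using the array and unsigned-integer formats from the MessagePack specification. It is not the PR's `msgpack.ts`; the function names `encodeUintArray` and `decodeUintArray` are assumptions made for this example, and the layout of the PR's structured data section is not reproduced here.

```typescript
// Encode an array of unsigned 32-bit integers as a msgpack array.
// Hypothetical helper for illustration; not the PR's msgpack.ts API.
function encodeUintArray(values: readonly number[]): Uint8Array {
    const bytes: number[] = [];
    const len = values.length;
    if (len < 16) {
        bytes.push(0x90 | len); // fixarray
    } else if (len <= 0xffff) {
        bytes.push(0xdc, len >>> 8, len & 0xff); // array 16
    } else {
        bytes.push(0xdd, len >>> 24, (len >>> 16) & 0xff, (len >>> 8) & 0xff, len & 0xff); // array 32
    }
    for (const v of values) {
        if (!Number.isInteger(v) || v < 0 || v > 0xffffffff) {
            throw new RangeError(`not an unsigned 32-bit integer: ${v}`);
        }
        if (v < 0x80) bytes.push(v); // positive fixint
        else if (v <= 0xff) bytes.push(0xcc, v); // uint 8
        else if (v <= 0xffff) bytes.push(0xcd, v >>> 8, v & 0xff); // uint 16
        else bytes.push(0xce, v >>> 24, (v >>> 16) & 0xff, (v >>> 8) & 0xff, v & 0xff); // uint 32
    }
    return Uint8Array.from(bytes);
}

// Decode a msgpack array of unsigned integers (msgpack is big-endian,
// which is also DataView's default byte order).
function decodeUintArray(bytes: Uint8Array): number[] {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    let offset = 0;
    const header = view.getUint8(offset++);
    let length: number;
    if ((header & 0xf0) === 0x90) {
        length = header & 0x0f;
    } else if (header === 0xdc) {
        length = view.getUint16(offset);
        offset += 2;
    } else if (header === 0xdd) {
        length = view.getUint32(offset);
        offset += 4;
    } else {
        throw new Error("expected a msgpack array header");
    }
    const values: number[] = [];
    for (let i = 0; i < length; i++) {
        const tag = view.getUint8(offset++);
        if (tag < 0x80) {
            values.push(tag);
        } else if (tag === 0xcc) {
            values.push(view.getUint8(offset));
            offset += 1;
        } else if (tag === 0xcd) {
            values.push(view.getUint16(offset));
            offset += 2;
        } else if (tag === 0xce) {
            values.push(view.getUint32(offset));
            offset += 4;
        } else {
            throw new Error(`unsupported msgpack tag 0x${tag.toString(16)}`);
        }
    }
    return values;
}
```

Small indices cost a single byte each in this format, which is one plausible reason to prefer it over fixed-width integers for index arrays.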

Reviewed changes

Copilot reviewed 34 out of 37 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| internal/api/encoder/encoder.go | Updated protocol to v5, added structured data section, JSDoc encoding, node flags, and SourceFile metadata |
| internal/api/encoder/decoder.go | Added JSDocLink variants and JSDocNameReference support |
| internal/astnav/tokens_test.go | Added JSON baseline tests for token navigation functions |
| _packages/ast/src/scanner.ts | New 2500+ line port of TypeScript scanner with full lexical analysis |
| _packages/ast/src/astnav.ts | New token navigation utilities (getTokenAtPosition, getTouchingPropertyName, etc.) |
| _packages/ast/src/nodes.ts | Added FileReference interface, SourceFile properties (jsDoc, languageVariant, scriptKind, etc.) |
| _packages/ast/src/utils.ts | Added formatSyntaxKind helper |
| _packages/ast/src/is.ts | Added type guards for tokens, keywords, JSDoc kinds, and property name literals |
| _packages/ast/src/enums/* | Added new enum files for CharacterCodes, ScriptTarget, ScriptKind, LanguageVariant, JSDocParsingMode, etc. |
| _packages/api/src/node/protocol.ts | Updated protocol version and header offsets for structured data |
| _packages/api/src/node/node.ts | Added SourceFile property getters, JSDoc support, structured data reading via msgpack |
| _packages/api/src/node/msgpack.ts | New minimal msgpack encoder/decoder for structured data section |
| _packages/api/src/node/encoder.ts | Added SourceFile extended data encoding, FileReference encoding (incomplete) |
| _packages/api/src/syncChannel.ts | Refactored msgpack constants into shared msgpack.ts module |
| _packages/api/test/async/astnav.test.ts | New tests comparing TypeScript astnav with Go baseline |
| _packages/api/scripts/generateSync.ts | Added astnav.test.ts to sync generation |
| _packages/ast/scripts/generateFactory.ts | Excluded new SourceFile properties from factory generation |
| _packages/ast/package.json | Added scanner export |
| _packages/api/package.json | Updated build scripts to include factory generation |

const referencedFilesOffset = encodeFileReferences(sf.referencedFiles, structuredWriter);
const typeRefDirectivesOffset = encodeFileReferences(sf.typeReferenceDirectives, structuredWriter);
const libRefDirectivesOffset = encodeFileReferences(sf.libReferenceDirectives, structuredWriter);
extendedData.push(textIndex, fileNameIndex, pathIndex, sf.languageVariant, sf.scriptKind, referencedFilesOffset, typeRefDirectivesOffset, libRefDirectivesOffset, NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, 0);

Copilot AI Feb 25, 2026

The TypeScript encoder for SourceFile extended data pushes placeholder values (NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, NO_STRUCTURED_DATA, 0) for imports, moduleAugmentations, ambientModuleNames, and externalModuleIndicator (line 237), but unlike the Go encoder, it never patches these placeholders with actual data after the tree walk.

The Go encoder builds a nodeIndexMap, tracks node indices during the walk, then encodes these arrays and patches the placeholders (lines 439-455 in encoder.go). The TypeScript encoder lacks this logic entirely, so these SourceFile properties will always be empty/undefined when decoding.
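The reserve-then-patch pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Go or TypeScript encoder from the PR: `SketchNode`, `SketchEncoding`, and `encodeSourceFileSketch` are hypothetical names, only an imports slot is modeled, the structured data is kept as plain arrays rather than msgpack bytes, and the value of `NO_STRUCTURED_DATA` is assumed.

```typescript
// Assumed sentinel value; the PR names NO_STRUCTURED_DATA but its value is not shown here.
const NO_STRUCTURED_DATA = 0xffffffff;

// Hypothetical minimal node shape for the sketch.
interface SketchNode {
    kind: string;
    children: SketchNode[];
}

interface SketchEncoding {
    extendedData: number[];     // per-SourceFile slots, including the imports slot
    structuredData: number[][]; // stand-in for the msgpack structured data section
}

function encodeSourceFileSketch(root: SketchNode, imports: readonly SketchNode[]): SketchEncoding {
    const extendedData: number[] = [];
    const structuredData: number[][] = [];

    // 1. Reserve the slot before the walk; node indices are not known yet.
    const importsSlot = extendedData.length;
    extendedData.push(NO_STRUCTURED_DATA);

    // 2. Walk the tree, recording the index assigned to each node.
    const nodeIndexMap = new Map<SketchNode, number>();
    let nextIndex = 0;
    const visit = (node: SketchNode): void => {
        nodeIndexMap.set(node, nextIndex++);
        node.children.forEach(child => visit(child));
    };
    visit(root);

    // 3. After the walk, resolve node references to indices and patch the placeholder.
    if (imports.length > 0) {
        const indices = imports.map(node => {
            const index = nodeIndexMap.get(node);
            if (index === undefined) throw new Error("import node was not visited");
            return index;
        });
        extendedData[importsSlot] = structuredData.length;
        structuredData.push(indices);
    }
    return { extendedData, structuredData };
}
```

If step 3 is skipped, the slot keeps its sentinel value and a decoder would read the property as absent, which matches the behavior the comment attributes to the TypeScript encoder.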

Member Author

That's because you can't construct a SourceFile with these properties from the client.

Member

How should we keep this up to date going forward?

How does this manage differences in encoding?

Member Author

What do you think of the latest commit for that?
