Rewrite code generator in Rust by leighmcculloch · Pull Request #490 · stellar/rs-stellar-xdr

leighmcculloch · 2026-01-09T14:21:40Z

What

Add a new XDR-to-Rust code generator written in Rust (xdr-generator-rust/) that produces identical output to the existing Ruby generator (xdr-generator/).

Uses Askama templates separating code templates fromlogic and preparation of the data.

Plan

This change includes a test that for now requires both generators to continue to output identical. The plan is to migrate to the Rust generator once it has had time to bake through changes, then deprecate the Ruby generator.

There are many things we can probably do to simplify the generator and/or make it better. But those things will only come after migrating to the Rust generator.

Why

Based off passing comments over years, devs avoid contributing to xdrgen or the generators because they are written in Ruby. The ruby generator is brittle and easy to break, and messy with logic and templates all mixed up together.

I have a long list of things I want to do to improve the Rust code generation and the Ruby generator is at the point where it is a struggle to continue adding things:

Also, the ruby generator is so slow that iterating on it is painful, especially when every second build and generate is to just find a ruby syntax error. The Rust generator in this PR builds and runs much faster than the ruby one does.

Known limitations

Most of this code was generated by Claude Code, and I am still going through the process of reviewing it myself. I'll mark it as ready for review after I've completed my own review, and figured out a larger test story.

socket-security · 2026-01-09T14:22:04Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	cargo/thiserror@2.0.17
	cargo/pretty_assertions@1.4.1
	cargo/askama@0.14.0
	cargo/clap@4.5.54
	cargo/heck@0.5.0
	cargo/logos@0.15.1

View full report

leighmcculloch · 2026-01-09T15:36:48Z

Code Review: Simplifications, Improvements, and Potential Bugfixes

1. Significant Code Duplication in lexer.rs

Location: lexer.rs:206-305 and lexer.rs:381-450

The next_token() and next_token_content() methods are nearly identical (both contain the same match arms for keywords and symbols). This duplication means bug fixes or changes need to be made in two places.

Suggestion: Refactor to have next_token() call next_token_content() after the whitespace/comment skipping loop:

pub fn next_token(&mut self) -> Result<Token, LexError> {
    loop {
        self.skip_whitespace();
        match self.peek() {
            None => return Ok(Token::Eof),
            Some(&'/') => { /* ... skip comments ... */ continue; }
            Some(&'%') => { self.skip_preprocessor(); continue; }
            _ => break,
        }
    }
    self.next_token_content()
}

2. Dead/Unused Fields in Lexer

Location: lexer.rs:76-80

The source and line/col fields are defined but never meaningfully used for error reporting:

source is only used in tokenize_with_spans() to clone and return
line and col are tracked but never included in error messages

Suggestion: Either remove these fields or add them to LexError to provide better error messages with line/column information.

3. Potential Bug: `tokenize()` Method is Unused

Location: lexer.rs:308-319

The tokenize() method exists but only tokenize_with_spans() is used by the parser. Consider removing the unused method or marking it with #[allow(dead_code)] if it's intended for tests.

4. Duplicate Enum Value Resolution Logic in parser.rs

Location: parser.rs:546-614 (enum parsing) and parser.rs:617-630 (resolve helper)

The resolve_enum_value() method searches the current members first, then global values. This is fine, but the global_values map is populated after parsing each enum, which could cause issues if an enum references a value from itself that hasn't been added yet.

However, the current logic handles this correctly by searching members first. No bug here, but the code flow is subtle.

5. Anonymous Union Extraction is Duplicated

Location: parser.rs:271-327 and parser.rs:838-897

The logic for extracting anonymous unions is duplicated between parse_member() and parse_union_arm(). Both have nearly identical code for:

Generating the union name
Extracting source text
Creating the Union definition
Fixing parent relationships

Suggestion: Extract this into a helper method:

fn extract_anonymous_union(
    &mut self,
    field_name: &str,
    discriminant: Box<Discriminant>,
    arms: Vec<UnionArm>,
    default_arm: Option<Box<UnionArm>>,
    source_start: usize,
    source_end: usize,
    extract_start_idx: usize,
) -> Type

6. Unnecessary String Allocation in generator.rs

Location: generator.rs:317-320

let member_names: String = members
    .iter()
    .map(|m| m.name.as_str())
    .collect::<Vec<_>>()
    .join(", ");

Could use itertools::join() or a simpler approach without the intermediate Vec.

7. Redundant `clone()` Calls

Location: parser.rs:77, 89, 101, etc.

Methods like expect(), expect_ident(), and expect_int_with_hex() clone the token before matching:

let token = self.advance().clone();

Since advance() returns &Token (a reference to the token in self.tokens), the clone is necessary for ownership. However, the parser could be refactored to use indexes and avoid these clones.

8. Potential Simplification in types.rs

Location: types.rs:361-391

The rust_read_call_type() function has complex string manipulation to convert type references to turbofish syntax. This could be simplified with a more structured approach:

fn rust_read_call_type(...) -> String {
    match type_ {
        Type::Array { .. } => format!("<{ref_type}>"),
        Type::Optional(inner) if is_boxed => format!("Option::<Box<{}>>", inner_ref),
        Type::Optional(inner) => format!("Option::<{}>", inner_ref),
        // etc.
    }
}

9. `base_rust_type_ref` vs `base_rust_type_ref_with_info` Duplication

Location: types.rs:256-305 and types.rs:308-358

These two functions are almost identical, differing only in how sizes are resolved (one uses size_to_rust(), the other uses type_info.size_to_literal()).

Suggestion: Merge into one function that takes an optional TypeInfo:

fn base_rust_type_ref_internal(type_: &Type, type_info: Option<&TypeInfo>) -> String {
    let size_fn = |s: &Size| match type_info {
        Some(ti) => ti.size_to_literal(s),
        None => size_to_rust(s),
    };
    // ... rest of implementation
}

10. Error Handling for Enum Value References

Location: parser.rs:617-630

The resolve_enum_value() method returns 0 as a fallback when a reference cannot be resolved:

// Return 0 as fallback
0

This silently handles errors. Consider either:

Making it return Result<i32, ParseError> and propagating the error
Adding a warning/log when a reference can't be resolved

11. Unused `_name` Parameter

Location: parser.rs:946

fn extract_definition_source(&self, _name: &str) -> String {

The _name parameter is never used. Remove it to simplify the API.

12. Simplify Template Conditionals

Location: typedef_newtype.rs.jinja:3-10

{%- if t.has_default && !t.is_var_array %}
#[cfg_attr(feature = "alloc", derive(Default))]
{%- endif %}
{%- if t.has_default && t.is_var_array %}
#[derive(Default, Clone, Hash, PartialEq, Eq, PartialOrd, Ord)]
{%- else %}
#[derive(Clone, Hash, PartialEq, Eq, PartialOrd, Ord)]
{%- endif %}

This could be cleaner. The logic is: if is_var_array, derive Default unconditionally (no cfg_attr). Consider restructuring or adding comments to clarify.

Summary of Highest-Impact Changes

Remove next_token_content() duplication - reduces ~65 lines of duplicate code
Extract anonymous union logic into a helper - reduces ~100 lines of duplicate code
Merge base_rust_type_ref variants - reduces ~50 lines of duplicate code
Remove unused _name parameter - cleanup
Remove or use line/col in error messages - either cleanup or improve UX

socket-security · 2026-01-11T21:47:42Z

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action	Severity	Alert (click "▶" to expand/collapse)
Warn		License policy violation: cargo `memchr` under MIT AND Unlicense Location: Package overview From: `?` → `cargo/[email protected]` → `cargo/[email protected]` ℹ Read more on: This package \| This alert \| What is a license policy violation? Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `[email protected]`. Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license. Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore cargo/[email protected]`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.
Warn		License policy violation: cargo `regex-syntax` Location: Package overview From: `?` → `cargo/[email protected]` → `cargo/[email protected]` ℹ Read more on: This package \| This alert \| What is a license policy violation? Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `[email protected]`. Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license. Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore cargo/[email protected]`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.
Warn		License policy violation: cargo `unicode-ident` under Unicode-3.0 License: Unicode-3.0 - the applicable license policy does not allow this license (4) (unicode-ident-1.0.22/Cargo.toml) License: Unicode-3.0 - the applicable license policy does not allow this license (4) (unicode-ident-1.0.22/Cargo.toml) License: Unicode-3.0 - the applicable license policy does not allow this license (4) (unicode-ident-1.0.22/LICENSE-UNICODE) From: `?` → `cargo/[email protected]` → `cargo/[email protected]` → `cargo/[email protected]` → `cargo/[email protected]` → `cargo/[email protected]` ℹ Read more on: This package \| This alert \| What is a license policy violation? Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at `[email protected]`. Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license. Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment `@SocketSecurity ignore cargo/[email protected]`. You can also ignore all packages with `@SocketSecurity ignore-all`. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

Remove duplicate RawToken enum by using tuple variant for IntLiteral. Lexer reduced from 373 to 283 lines.

Copilot

Pull request overview

This PR introduces a new XDR-to-Rust code generator written in Rust to replace the existing Ruby generator. The Rust generator uses Askama templates to separate code generation logic from templates, improving maintainability and performance. The new generator is designed to produce identical output to the Ruby generator during a transition period, with both generators running in parallel for validation.

Changes:

Added new Rust-based XDR generator in xdr-generator-rust/ with lexer, parser, AST, type resolution, and code generation modules
Created Askama templates for generating Rust code (structs, enums, unions, typedefs, constants, and type enums)
Updated Makefile to run both generators and verify their outputs match

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
xdr-generator-rust/Cargo.toml	Defines the new Rust generator package and dependencies
xdr-generator-rust/Cargo.lock	Dependency lock file for the new generator
xdr-generator-rust/src/main.rs	CLI entry point that parses arguments and orchestrates generation
xdr-generator-rust/src/lib.rs	Module exports for the library
xdr-generator-rust/src/lexer.rs	Tokenizer for XDR source using Logos
xdr-generator-rust/src/parser.rs	Recursive descent parser for XDR definitions
xdr-generator-rust/src/ast.rs	AST data structures for XDR definitions
xdr-generator-rust/src/types.rs	Type resolution, reference computation, and naming utilities
xdr-generator-rust/src/generator.rs	Code generation logic that prepares data for templates
xdr-generator-rust/templates/*.jinja	Askama templates for generating Rust code
Makefile	Added rules to run Rust generator and compare outputs
CONTRIBUTING.md	Documented the dual generator setup
.gitignore	Added generated-rust.rs to ignore list

Comments suppressed due to low confidence (1)

Makefile:73

The dependency rules for src/next/generated.rs and src/curr/generated.rs appear to be swapped. Line 69 shows that src/next/generated.rs depends on xdr/curr/.x files, but it should depend on xdr/next/.x files. Similarly, line 72 shows that src/curr/generated.rs depends on xdr/next/.x files, but it should depend on xdr/curr/.x files.

src/next/generated.rs: $(sort $(wildcard xdr/curr/*.x))
	> $@

src/curr/generated.rs: $(sort $(wildcard xdr/next/*.x))
	> $@

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

xdr-generator-rust/templates/generated.rs.jinja

xdr-generator-rust/templates/typedef_newtype.rs.jinja

Co-authored-by: Copilot <[email protected]>

leighmcculloch added 18 commits January 9, 2026 09:37

add xdr-generator-rust crate structure

824adb0

add handoff documentation and fix parser edge cases

260dac2

extract anonymous unions and inline structs during parsing

94060fe

remove handoff documentation

5154442

delete test_snake example file

4078cc6

add rust code generation targets to makefile

463f29e

prefix input flags to each xdr file argument

ddf003b

generate with xdr-generator-rust

fea7888

fix differences between ruby and rust generators

ed375a3

add topological sorting for xdr file dependencies

97dfb73

rename generate-rust target and add to generate deps

e8331c4

remove unused fns

180e77e

generated with rust files

af20dcf

add nested type tracking and parent relationships to ast

a58bee9

update rust generated

94edc6f

refactor struct generation to use template-driven attributes

bda5338

replace attribute generation with boolean flags in output structs

dbd4290

add xdr generator documentation and output validation

537a246

remove rust files

9fb2dfe

leighmcculloch mentioned this pull request Jan 9, 2026

[Experiment] Add sparse XDR decode types #488

Closed

remove .PHONY declaration from Makefile

09b7c4e

leighmcculloch added 2 commits January 12, 2026 16:16

remove dependencies from check-generated-match

ae06f09

remove tokenize method and use tokenize_with_spans

7db6d1b

leighmcculloch mentioned this pull request Jan 12, 2026

Deduplicate types across curr and next #467

Open

leighmcculloch added 3 commits January 12, 2026 22:31

reorganize code into logical sections

24e28eb

replace handwritten lexer with logos

2e31b28

simplify lexer to single Token enum

2e56473

Remove duplicate RawToken enum by using tuple variant for IntLiteral. Lexer reduced from 373 to 283 lines.

leighmcculloch added 2 commits January 13, 2026 16:11

use IntBase enum instead of is_hex boolean

2ad2b6f

move IntBase enum and parse functions after Token enum

44168ba

leighmcculloch marked this pull request as ready for review January 22, 2026 04:40

Copilot AI review requested due to automatic review settings January 22, 2026 04:40

Copilot started reviewing on behalf of leighmcculloch January 22, 2026 04:40 View session

Copilot AI reviewed Jan 22, 2026

View reviewed changes

xdr-generator-rust/templates/generated.rs.jinja Outdated Show resolved Hide resolved

xdr-generator-rust/templates/typedef_newtype.rs.jinja Outdated Show resolved Hide resolved

leighmcculloch and others added 4 commits February 6, 2026 11:24

fmt

807f32c

Co-authored-by: Copilot <[email protected]>

fmt

c968557

Co-authored-by: Copilot <[email protected]>

Merge branch 'main' into rewrite-code-gen

cc8982a

add namespace to generated template

2ab4018

leighmcculloch mentioned this pull request Feb 10, 2026

Experiment: Add #ifdef/#else/#endif support to XDR codegen #494

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite code generator in Rust#490

Rewrite code generator in Rust#490
leighmcculloch wants to merge 31 commits intomainfrom
rewrite-code-gen

leighmcculloch commented Jan 9, 2026 •

edited

Loading

Uh oh!

socket-security bot commented Jan 9, 2026 •

edited

Loading

Uh oh!

leighmcculloch commented Jan 9, 2026

Uh oh!

socket-security bot commented Jan 11, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

leighmcculloch commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Plan

Why

Known limitations

Uh oh!

socket-security bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leighmcculloch commented Jan 9, 2026

Code Review: Simplifications, Improvements, and Potential Bugfixes

1. Significant Code Duplication in lexer.rs

2. Dead/Unused Fields in Lexer

3. Potential Bug: tokenize() Method is Unused

4. Duplicate Enum Value Resolution Logic in parser.rs

5. Anonymous Union Extraction is Duplicated

6. Unnecessary String Allocation in generator.rs

7. Redundant clone() Calls

8. Potential Simplification in types.rs

9. base_rust_type_ref vs base_rust_type_ref_with_info Duplication

10. Error Handling for Enum Value References

11. Unused _name Parameter

12. Simplify Template Conditionals

Summary of Highest-Impact Changes

Uh oh!

socket-security bot commented Jan 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

leighmcculloch commented Jan 9, 2026 •

edited

Loading

socket-security bot commented Jan 9, 2026 •

edited

Loading

3. Potential Bug: `tokenize()` Method is Unused

7. Redundant `clone()` Calls

9. `base_rust_type_ref` vs `base_rust_type_ref_with_info` Duplication

11. Unused `_name` Parameter

socket-security bot commented Jan 11, 2026 •

edited

Loading