Skip to content

Rewrite code generator in Rust#490

Open
leighmcculloch wants to merge 31 commits intomainfrom
rewrite-code-gen
Open

Rewrite code generator in Rust#490
leighmcculloch wants to merge 31 commits intomainfrom
rewrite-code-gen

Conversation

@leighmcculloch
Copy link
Member

@leighmcculloch leighmcculloch commented Jan 9, 2026

What

Add a new XDR-to-Rust code generator written in Rust (xdr-generator-rust/) that produces identical output to the existing Ruby generator (xdr-generator/).

Uses Askama templates separating code templates fromlogic and preparation of the data.

Plan

This change includes a test that for now requires both generators to continue to output identical. The plan is to migrate to the Rust generator once it has had time to bake through changes, then deprecate the Ruby generator.

There are many things we can probably do to simplify the generator and/or make it better. But those things will only come after migrating to the Rust generator.

Why

Based off passing comments over years, devs avoid contributing to xdrgen or the generators because they are written in Ruby. The ruby generator is brittle and easy to break, and messy with logic and templates all mixed up together.

I have a long list of things I want to do to improve the Rust code generation and the Ruby generator is at the point where it is a struggle to continue adding things:

Also, the ruby generator is so slow that iterating on it is painful, especially when every second build and generate is to just find a ruby syntax error. The Rust generator in this PR builds and runs much faster than the ruby one does.

Known limitations

Most of this code was generated by Claude Code, and I am still going through the process of reviewing it myself. I'll mark it as ready for review after I've completed my own review, and figured out a larger test story.

@socket-security
Copy link

socket-security bot commented Jan 9, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​thiserror@​2.0.178010093100100
Addedcargo/​pretty_assertions@​1.4.110010090100100
Addedcargo/​askama@​0.14.010010093100100
Addedcargo/​clap@​4.5.549910093100100
Addedcargo/​heck@​0.5.010010093100100
Addedcargo/​logos@​0.15.1100100100100100

View full report

@leighmcculloch
Copy link
Member Author

Code Review: Simplifications, Improvements, and Potential Bugfixes

1. Significant Code Duplication in lexer.rs

Location: lexer.rs:206-305 and lexer.rs:381-450

The next_token() and next_token_content() methods are nearly identical (both contain the same match arms for keywords and symbols). This duplication means bug fixes or changes need to be made in two places.

Suggestion: Refactor to have next_token() call next_token_content() after the whitespace/comment skipping loop:

pub fn next_token(&mut self) -> Result<Token, LexError> {
    loop {
        self.skip_whitespace();
        match self.peek() {
            None => return Ok(Token::Eof),
            Some(&'/') => { /* ... skip comments ... */ continue; }
            Some(&'%') => { self.skip_preprocessor(); continue; }
            _ => break,
        }
    }
    self.next_token_content()
}

2. Dead/Unused Fields in Lexer

Location: lexer.rs:76-80

The source and line/col fields are defined but never meaningfully used for error reporting:

  • source is only used in tokenize_with_spans() to clone and return
  • line and col are tracked but never included in error messages

Suggestion: Either remove these fields or add them to LexError to provide better error messages with line/column information.


3. Potential Bug: tokenize() Method is Unused

Location: lexer.rs:308-319

The tokenize() method exists but only tokenize_with_spans() is used by the parser. Consider removing the unused method or marking it with #[allow(dead_code)] if it's intended for tests.


4. Duplicate Enum Value Resolution Logic in parser.rs

Location: parser.rs:546-614 (enum parsing) and parser.rs:617-630 (resolve helper)

The resolve_enum_value() method searches the current members first, then global values. This is fine, but the global_values map is populated after parsing each enum, which could cause issues if an enum references a value from itself that hasn't been added yet.

However, the current logic handles this correctly by searching members first. No bug here, but the code flow is subtle.


5. Anonymous Union Extraction is Duplicated

Location: parser.rs:271-327 and parser.rs:838-897

The logic for extracting anonymous unions is duplicated between parse_member() and parse_union_arm(). Both have nearly identical code for:

  • Generating the union name
  • Extracting source text
  • Creating the Union definition
  • Fixing parent relationships

Suggestion: Extract this into a helper method:

fn extract_anonymous_union(
    &mut self,
    field_name: &str,
    discriminant: Box<Discriminant>,
    arms: Vec<UnionArm>,
    default_arm: Option<Box<UnionArm>>,
    source_start: usize,
    source_end: usize,
    extract_start_idx: usize,
) -> Type

6. Unnecessary String Allocation in generator.rs

Location: generator.rs:317-320

let member_names: String = members
    .iter()
    .map(|m| m.name.as_str())
    .collect::<Vec<_>>()
    .join(", ");

Could use itertools::join() or a simpler approach without the intermediate Vec.


7. Redundant clone() Calls

Location: parser.rs:77, 89, 101, etc.

Methods like expect(), expect_ident(), and expect_int_with_hex() clone the token before matching:

let token = self.advance().clone();

Since advance() returns &Token (a reference to the token in self.tokens), the clone is necessary for ownership. However, the parser could be refactored to use indexes and avoid these clones.


8. Potential Simplification in types.rs

Location: types.rs:361-391

The rust_read_call_type() function has complex string manipulation to convert type references to turbofish syntax. This could be simplified with a more structured approach:

fn rust_read_call_type(...) -> String {
    match type_ {
        Type::Array { .. } => format!("<{ref_type}>"),
        Type::Optional(inner) if is_boxed => format!("Option::<Box<{}>>", inner_ref),
        Type::Optional(inner) => format!("Option::<{}>", inner_ref),
        // etc.
    }
}

9. base_rust_type_ref vs base_rust_type_ref_with_info Duplication

Location: types.rs:256-305 and types.rs:308-358

These two functions are almost identical, differing only in how sizes are resolved (one uses size_to_rust(), the other uses type_info.size_to_literal()).

Suggestion: Merge into one function that takes an optional TypeInfo:

fn base_rust_type_ref_internal(type_: &Type, type_info: Option<&TypeInfo>) -> String {
    let size_fn = |s: &Size| match type_info {
        Some(ti) => ti.size_to_literal(s),
        None => size_to_rust(s),
    };
    // ... rest of implementation
}

10. Error Handling for Enum Value References

Location: parser.rs:617-630

The resolve_enum_value() method returns 0 as a fallback when a reference cannot be resolved:

// Return 0 as fallback
0

This silently handles errors. Consider either:

  1. Making it return Result<i32, ParseError> and propagating the error
  2. Adding a warning/log when a reference can't be resolved

11. Unused _name Parameter

Location: parser.rs:946

fn extract_definition_source(&self, _name: &str) -> String {

The _name parameter is never used. Remove it to simplify the API.


12. Simplify Template Conditionals

Location: typedef_newtype.rs.jinja:3-10

{%- if t.has_default && !t.is_var_array %}
#[cfg_attr(feature = "alloc", derive(Default))]
{%- endif %}
{%- if t.has_default && t.is_var_array %}
#[derive(Default, Clone, Hash, PartialEq, Eq, PartialOrd, Ord)]
{%- else %}
#[derive(Clone, Hash, PartialEq, Eq, PartialOrd, Ord)]
{%- endif %}

This could be cleaner. The logic is: if is_var_array, derive Default unconditionally (no cfg_attr). Consider restructuring or adding comments to clarify.


Summary of Highest-Impact Changes

  1. Remove next_token_content() duplication - reduces ~65 lines of duplicate code
  2. Extract anonymous union logic into a helper - reduces ~100 lines of duplicate code
  3. Merge base_rust_type_ref variants - reduces ~50 lines of duplicate code
  4. Remove unused _name parameter - cleanup
  5. Remove or use line/col in error messages - either cleanup or improve UX

@socket-security
Copy link

socket-security bot commented Jan 11, 2026

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action Severity Alert  (click "▶" to expand/collapse)
Warn High
License policy violation: cargo memchr under MIT AND Unlicense

Location: Package overview

From: ?cargo/[email protected]cargo/[email protected]

ℹ Read more on: This package | This alert | What is a license policy violation?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at [email protected].

Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore cargo/[email protected]. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

Warn High
License policy violation: cargo regex-syntax

Location: Package overview

From: ?cargo/[email protected]cargo/[email protected]

ℹ Read more on: This package | This alert | What is a license policy violation?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at [email protected].

Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore cargo/[email protected]. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

Warn High
License policy violation: cargo unicode-ident under Unicode-3.0

License: Unicode-3.0 - the applicable license policy does not allow this license (4) (unicode-ident-1.0.22/Cargo.toml)

License: Unicode-3.0 - the applicable license policy does not allow this license (4) (unicode-ident-1.0.22/Cargo.toml)

License: Unicode-3.0 - the applicable license policy does not allow this license (4) (unicode-ident-1.0.22/LICENSE-UNICODE)

From: ?cargo/[email protected]cargo/[email protected]cargo/[email protected]cargo/[email protected]cargo/[email protected]

ℹ Read more on: This package | This alert | What is a license policy violation?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at [email protected].

Suggestion: Find a package that does not violate your license policy or adjust your policy to allow this package's license.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore cargo/[email protected]. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

Remove duplicate RawToken enum by using tuple variant for IntLiteral.
Lexer reduced from 373 to 283 lines.
@leighmcculloch leighmcculloch marked this pull request as ready for review January 22, 2026 04:40
Copilot AI review requested due to automatic review settings January 22, 2026 04:40
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new XDR-to-Rust code generator written in Rust to replace the existing Ruby generator. The Rust generator uses Askama templates to separate code generation logic from templates, improving maintainability and performance. The new generator is designed to produce identical output to the Ruby generator during a transition period, with both generators running in parallel for validation.

Changes:

  • Added new Rust-based XDR generator in xdr-generator-rust/ with lexer, parser, AST, type resolution, and code generation modules
  • Created Askama templates for generating Rust code (structs, enums, unions, typedefs, constants, and type enums)
  • Updated Makefile to run both generators and verify their outputs match

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
xdr-generator-rust/Cargo.toml Defines the new Rust generator package and dependencies
xdr-generator-rust/Cargo.lock Dependency lock file for the new generator
xdr-generator-rust/src/main.rs CLI entry point that parses arguments and orchestrates generation
xdr-generator-rust/src/lib.rs Module exports for the library
xdr-generator-rust/src/lexer.rs Tokenizer for XDR source using Logos
xdr-generator-rust/src/parser.rs Recursive descent parser for XDR definitions
xdr-generator-rust/src/ast.rs AST data structures for XDR definitions
xdr-generator-rust/src/types.rs Type resolution, reference computation, and naming utilities
xdr-generator-rust/src/generator.rs Code generation logic that prepares data for templates
xdr-generator-rust/templates/*.jinja Askama templates for generating Rust code
Makefile Added rules to run Rust generator and compare outputs
CONTRIBUTING.md Documented the dual generator setup
.gitignore Added generated-rust.rs to ignore list
Comments suppressed due to low confidence (1)

Makefile:73

  • The dependency rules for src/next/generated.rs and src/curr/generated.rs appear to be swapped. Line 69 shows that src/next/generated.rs depends on xdr/curr/.x files, but it should depend on xdr/next/.x files. Similarly, line 72 shows that src/curr/generated.rs depends on xdr/next/.x files, but it should depend on xdr/curr/.x files.
src/next/generated.rs: $(sort $(wildcard xdr/curr/*.x))
	> $@

src/curr/generated.rs: $(sort $(wildcard xdr/next/*.x))
	> $@

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant