Abhisheklearn12/semantic_analyzer_rust

🦀 Semantic Analyzer in Rust

A From-Scratch Semantic Analyzer + IR Builder in Pure Rust.

This project implements a fully working semantic analysis layer and intermediate representation (IR) builder, written from scratch in 100% Rust.

It performs the same core responsibilities as a compiler frontend: resolving symbols, performing type inference and validation, managing lexical scopes, and lowering the typed AST into a compiler-friendly Intermediate Representation (IR).

No external frameworks. No shortcuts. Just clean, idiomatic Rust with strong correctness guarantees.


Vision

Modern compilers (rustc, Clang, swiftc) rely on precise semantic analysis to ensure that programs are type-safe, scope-safe, and referentially correct before any code generation happens.

This project mirrors those principles on a smaller scale:

  • Builds a real symbol table system with nested scopes.
  • Implements type checking and error diagnostics.
  • Performs AST → IR lowering (the step between frontend and backend).
  • Demonstrates compiler-grade engineering: correctness, clarity, and extensibility.

Key Features

Type System & Inference

  • Supports primitive types (int, bool, string, unit).
  • Infers variable types when not explicitly annotated.
  • Validates return types and assignment compatibility.
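The inference described above can be pictured as a small recursive walk that assigns types to literals and propagates them through operators. A minimal sketch, assuming simplified `Type` and `Expr` shapes (illustrative stand-ins, not the project's actual definitions):

```rust
// Illustrative sketch of literal-driven type inference; `Type`, `Expr`,
// and `infer` are simplified stand-ins, not the project's actual API.
#[derive(Debug, PartialEq)]
pub enum Type { Int, Bool, Str, Unit }

pub enum Expr {
    IntLit(i64),
    BoolLit(bool),
    StrLit(String),
    Add(Box<Expr>, Box<Expr>),
}

pub fn infer(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::IntLit(_) => Ok(Type::Int),
        Expr::BoolLit(_) => Ok(Type::Bool),
        Expr::StrLit(_) => Ok(Type::Str),
        // `+` requires int operands; anything else is a diagnostic.
        Expr::Add(l, r) => match (infer(l)?, infer(r)?) {
            (Type::Int, Type::Int) => Ok(Type::Int),
            (lt, rt) => Err(format!("type mismatch: {:?} + {:?}", lt, rt)),
        },
    }
}

fn main() {
    let e = Expr::Add(Box::new(Expr::IntLit(1)), Box::new(Expr::IntLit(2)));
    println!("{:?}", infer(&e));
}
```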

Symbol Table & Scope Management

  • Tracks functions, variables, and parameters across nested scopes.
  • Detects redeclarations and undeclared references.
  • Pushes and pops environments automatically for functions & blocks.
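The push/pop discipline above can be sketched as a stack of hash maps: declarations land in the innermost scope, lookups search outward. A minimal illustration, assuming a hypothetical `EnvStack` shape (the project's actual type may differ):

```rust
use std::collections::HashMap;

// Illustrative scoped environment stack; the project's EnvStack may differ.
pub struct EnvStack {
    scopes: Vec<HashMap<String, String>>, // name -> type, innermost last
}

impl EnvStack {
    pub fn new() -> Self {
        EnvStack { scopes: vec![HashMap::new()] }
    }
    pub fn push(&mut self) { self.scopes.push(HashMap::new()); }
    pub fn pop(&mut self) { self.scopes.pop(); }

    // Reject redeclaration in the *current* scope only;
    // shadowing an outer scope is allowed.
    pub fn declare(&mut self, name: &str, ty: &str) -> bool {
        let top = self.scopes.last_mut().unwrap();
        if top.contains_key(name) {
            return false;
        }
        top.insert(name.to_string(), ty.to_string());
        true
    }

    // Search innermost-to-outermost, mirroring lexical scope chains.
    pub fn lookup(&self, name: &str) -> Option<&String> {
        self.scopes.iter().rev().find_map(|s| s.get(name))
    }
}

fn main() {
    let mut env = EnvStack::new();
    env.declare("x", "int");
    env.push();
    env.declare("x", "bool"); // shadows the outer `x`
    println!("inner: {:?}", env.lookup("x"));
    env.pop();
    println!("outer: {:?}", env.lookup("x"));
}
```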

Diagnostics System

  • Structured diagnostics with span tracking.

  • Detects:

    • Undeclared variables
    • Type mismatches
    • Duplicate declarations
    • Invalid function calls
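A span-carrying diagnostic can be as simple as a severity, a message, and the source range it points at. A minimal sketch with illustrative field and type names (not the project's exact definitions):

```rust
// Sketch of a structured, span-carrying diagnostic; names are
// illustrative, not the project's exact definitions.
#[derive(Debug, PartialEq)]
pub struct Span { pub start: usize, pub end: usize }

#[derive(Debug, PartialEq)]
pub enum Severity { Error, Warning }

#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub severity: Severity,
    pub message: String,
    pub span: Span,
}

impl Diagnostic {
    // Convenience constructor for error-level diagnostics.
    pub fn error(message: impl Into<String>, start: usize, end: usize) -> Self {
        Diagnostic {
            severity: Severity::Error,
            message: message.into(),
            span: Span { start, end },
        }
    }
}

fn main() {
    let d = Diagnostic::error("undeclared variable `y`", 4, 5);
    println!("{:?}", d);
}
```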

IR Builder

  • Lowers verified AST nodes into a linear, typed Intermediate Representation.

  • Generates pseudo-assembly–like instructions:

    LOAD a
    LOAD b
    ADD
    RETURN
    
  • Supports expressions, variables, function calls, and blocks.
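The pseudo-assembly above maps naturally onto a small instruction enum plus a lowering function that flattens an expression into a linear sequence. An illustrative sketch (the project's real IR type may differ):

```rust
// Illustrative instruction set mirroring the pseudo-assembly above;
// the project's real IR type may differ.
#[derive(Debug, PartialEq)]
pub enum Instr {
    Load(String),        // push a variable's value
    Const(i64),          // push a constant
    Add,                 // pop two values, push their sum
    Call(String, usize), // call a function with N arguments
    Store(String),       // pop the top of stack into a variable
    Return,              // return the top of stack
}

// Lower `return a + b;` into linear IR.
pub fn lower_add(a: &str, b: &str) -> Vec<Instr> {
    vec![
        Instr::Load(a.into()),
        Instr::Load(b.into()),
        Instr::Add,
        Instr::Return,
    ]
}

fn main() {
    println!("{:?}", lower_add("a", "b"));
}
```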

Test Suite

  • Includes unit tests for every major semantic case:

    • Correct function call resolution
    • Type mismatch errors
    • Undeclared identifier checks
    • Nested block scope resolution
    • Return-type enforcement
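One such case, the undeclared-identifier check, might be written roughly like this; `check_ident` is a simplified stand-in for the analyzer's real lookup, not its actual API:

```rust
use std::collections::HashSet;

// Simplified stand-in for the analyzer's identifier resolution.
pub fn check_ident(declared: &HashSet<String>, name: &str) -> Result<(), String> {
    if declared.contains(name) {
        Ok(())
    } else {
        Err(format!("undeclared identifier `{}`", name))
    }
}

#[test]
fn undeclared_identifier_is_reported() {
    let declared = HashSet::from(["x".to_string()]);
    assert!(check_ident(&declared, "x").is_ok());
    assert!(check_ident(&declared, "y").is_err());
}
```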

System Architecture

```mermaid
flowchart TD
    A[Raw Source Code] --> B[Lexer]
    B --> C[Parser]
    C -->|AST| D[Semantic Analyzer]
    D -->|Typed AST| E[IR Builder]
    E -->|IR Instructions| F[Runtime / Codegen]

    subgraph Diagnostics
    Dx[Errors, Warnings, Spans]
    end

    D --> Dx
```

Layers

| Layer | Description | Output |
| --- | --- | --- |
| Lexer | Converts text into tokens. | Token stream |
| Parser | Builds an untyped AST. | AST |
| Semantic Analyzer | Validates AST semantics (types, scopes). | Typed AST |
| IR Builder | Lowers the typed AST to linear IR. | IR vector |

⚙️ Running the Project

Build

cargo build --release

Run

cargo run

You can modify or run the predefined sample programs in src/main.rs, for example:

fn sample_program_small() -> Program {
    use Expr::*;
    use Stmt::*;
    Program {
        items: vec![
            // fn add(a:int, b:int) -> int { return a + b; }
            FnDecl {
                name: "add".into(),
                params: vec![("a".into(), TypeExpr::Int), ("b".into(), TypeExpr::Int)],
                ret: TypeExpr::Int,
                body: vec![Return {
                    expr: Some(BinaryOp(
                        Box::new(Ident("a".into(), Span::new(1,1))),
                        BinOp::Add,
                        Box::new(Ident("b".into(), Span::new(2,2))),
                        Span::new(1,2)
                    )),
                    span: Span::new(1,2)
                }],
                span: Span::new(0,10),
            },
            // fn main() { var x = add(1, 2); print("ok"); }
            FnDecl {
                name: "main".into(),
                params: vec![],
                ret: TypeExpr::Unit,
                body: vec![
                    VarDecl {
                        name: "x".into(),
                        typ: None,
                        init: Some(Call(
                            Box::new(Ident("add".into(), Span::new(10,13))),
                            vec![IntLit(1, Span::new(11,11)), IntLit(2, Span::new(12,12))],
                            Span::new(10,13)
                        )),
                        span: Span::new(10,13),
                    },
                    ExprStmt(Call(
                        Box::new(Ident("print".into(), Span::new(14,19))),
                        vec![StrLit("ok".into(), Span::new(15,17))],
                        Span::new(14,17)
                    )),
                ],
                span: Span::new(10,40),
            },
        ],
    }
}

🧪 Testing

cargo test


💡 Challenges & Solutions

| Challenge | Problem | Solution |
| --- | --- | --- |
| Managing nested scopes | Variables in inner blocks shadowing parent variables | Implemented a scoped `EnvStack` with push/pop per block |
| Type inference conflicts | Ambiguous types for untyped variables | Added default inference and propagation logic |
| Function redeclarations | Duplicate symbols during multiple passes | Collected all function declarations up front, before analysis |
| Undeclared identifiers | Identifiers referenced before declaration | Introduced a global + local environment validation phase |
| Borrow checker issues | Conflicting mutable borrows in IR lowering | Switched to interior mutability via `RefCell` |
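The last fix can be illustrated with a tiny sketch: wrapping the instruction buffer in a `RefCell` lets helper methods that take `&self` still append instructions, avoiding conflicting `&mut` borrows during lowering (names here are illustrative, not the project's actual builder):

```rust
use std::cell::RefCell;

// Illustrative IR builder using interior mutability: `emit` takes &self
// but can still mutate the buffer through the RefCell.
pub struct IrBuilder {
    instrs: RefCell<Vec<String>>,
}

impl IrBuilder {
    pub fn new() -> Self {
        IrBuilder { instrs: RefCell::new(Vec::new()) }
    }
    pub fn emit(&self, instr: &str) {
        self.instrs.borrow_mut().push(instr.to_string());
    }
    // Consume the builder and hand back the finished instruction list.
    pub fn finish(self) -> Vec<String> {
        self.instrs.into_inner()
    }
}

fn main() {
    let b = IrBuilder::new();
    b.emit("LOAD a");
    b.emit("LOAD b");
    b.emit("ADD");
    println!("{:?}", b.finish());
}
```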

Lessons Learned

  1. Semantic correctness is the soul of compilers. Syntax is surface-level; semantics define truth.

  2. Symbol tables and environments mirror real-world scope chains. From C++ to Rust, this same logic drives variable lifetime and resolution.

  3. Rust’s ownership model enforces safe compiler design. Every borrow and reference in this project is intentional.

  4. Intermediate Representation (IR) is language-agnostic power. The moment you emit IR, you’ve built a true compiler frontend.


Roadmap

  • Symbol tables & scoped environments
  • Type inference and checking
  • Function validation and call resolution
  • Error diagnostics with spans
  • AST → IR lowering
  • Unit test suite
  • Control flow analysis (if/while/return)
  • Constant folding optimization
  • IR serialization and visualization
  • Integration with codegen (LLVM or custom VM)

Example IR Output

fn add(a:int, b:int) -> int {
    LOAD a
    LOAD b
    ADD
    RETURN
}

fn main() {
    CALL add 1 2
    STORE x
    CONST "ok"
    CALL print 1
}

What I Learned

Building a semantic analyzer from scratch taught me how languages guarantee correctness before execution — from type inference to scope validation.

I also learned:

  • Compiler architectures are just well-structured trees.
  • Rust forces correctness not just in programs — but in the compiler itself.
  • IR generation is the bridge between “understanding” and “executing” code.

Summary

This is a real compiler frontend.

Written in pure Rust. Built with zero dependencies. Designed for correctness, not convenience.

Semantic analysis separates “syntax that looks right” from “code that is right.”

This project makes that distinction crystal clear.


Quote:

“What I cannot create, I cannot understand.”
