Abhisheklearn12/semantic_analyzer_rust

🦀 Semantic Analyzer in Rust

A From-Scratch Semantic Analyzer + IR Builder in Pure Rust.

This project implements a fully working semantic analysis layer and intermediate representation (IR) builder, written from scratch in 100% Rust.

It performs the same core responsibilities as a compiler frontend: resolving symbols, performing type inference and validation, managing lexical scopes, and lowering the typed AST into a compiler-friendly Intermediate Representation (IR).

No external frameworks. No shortcuts. Just clean, idiomatic Rust with strong correctness guarantees.


Vision

Modern compilers (rustc, Clang, swiftc) rely on precise semantic analysis to ensure that programs are type-safe, scope-safe, and referentially correct before any code generation happens.

This project mirrors those principles on a smaller scale:

  • Builds a real symbol table system with nested scopes.
  • Implements type checking and error diagnostics.
  • Performs AST → IR lowering (the step between frontend and backend).
  • Demonstrates compiler-grade engineering: correctness, clarity, and extensibility.

Key Features

Type System & Inference

  • Supports primitive types (int, bool, string, unit).
  • Infers variable types when not explicitly annotated.
  • Validates return types and assignment compatibility.
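The inference described above can be pictured as a small recursive walk that assigns types to literals and propagates them through operators. A minimal sketch, assuming simplified `Type` and `Expr` shapes (illustrative stand-ins, not the project's actual definitions):

```rust
// Illustrative sketch of literal-driven type inference; `Type`, `Expr`,
// and `infer` are simplified stand-ins, not the project's actual API.
#[derive(Debug, PartialEq)]
pub enum Type { Int, Bool, Str, Unit }

pub enum Expr {
    IntLit(i64),
    BoolLit(bool),
    StrLit(String),
    Add(Box<Expr>, Box<Expr>),
}

pub fn infer(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::IntLit(_) => Ok(Type::Int),
        Expr::BoolLit(_) => Ok(Type::Bool),
        Expr::StrLit(_) => Ok(Type::Str),
        // `+` requires int operands; anything else is a diagnostic.
        Expr::Add(l, r) => match (infer(l)?, infer(r)?) {
            (Type::Int, Type::Int) => Ok(Type::Int),
            (lt, rt) => Err(format!("type mismatch: {:?} + {:?}", lt, rt)),
        },
    }
}

fn main() {
    let e = Expr::Add(Box::new(Expr::IntLit(1)), Box::new(Expr::IntLit(2)));
    println!("{:?}", infer(&e));
}
```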

Symbol Table & Scope Management

  • Tracks functions, variables, and parameters across nested scopes.
  • Detects redeclarations and undeclared references.
  • Pushes and pops environments automatically for functions & blocks.
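The push/pop discipline above can be sketched as a stack of hash maps: declarations land in the innermost scope, lookups search outward. A minimal illustration, assuming a hypothetical `EnvStack` shape (the project's actual type may differ):

```rust
use std::collections::HashMap;

// Illustrative scoped environment stack; the project's EnvStack may differ.
pub struct EnvStack {
    scopes: Vec<HashMap<String, String>>, // name -> type, innermost last
}

impl EnvStack {
    pub fn new() -> Self {
        EnvStack { scopes: vec![HashMap::new()] }
    }
    pub fn push(&mut self) { self.scopes.push(HashMap::new()); }
    pub fn pop(&mut self) { self.scopes.pop(); }

    // Reject redeclaration in the *current* scope only;
    // shadowing an outer scope is allowed.
    pub fn declare(&mut self, name: &str, ty: &str) -> bool {
        let top = self.scopes.last_mut().unwrap();
        if top.contains_key(name) {
            return false;
        }
        top.insert(name.to_string(), ty.to_string());
        true
    }

    // Search innermost-to-outermost, mirroring lexical scope chains.
    pub fn lookup(&self, name: &str) -> Option<&String> {
        self.scopes.iter().rev().find_map(|s| s.get(name))
    }
}

fn main() {
    let mut env = EnvStack::new();
    env.declare("x", "int");
    env.push();
    env.declare("x", "bool"); // shadows the outer `x`
    println!("inner: {:?}", env.lookup("x"));
    env.pop();
    println!("outer: {:?}", env.lookup("x"));
}
```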

Diagnostics System

  • Structured diagnostics with span tracking.

  • Detects:

    • Undeclared variables
    • Type mismatches
    • Duplicate declarations
    • Invalid function calls
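A span-carrying diagnostic can be as simple as a severity, a message, and the source range it points at. A minimal sketch with illustrative field and type names (not the project's exact definitions):

```rust
// Sketch of a structured, span-carrying diagnostic; names are
// illustrative, not the project's exact definitions.
#[derive(Debug, PartialEq)]
pub struct Span { pub start: usize, pub end: usize }

#[derive(Debug, PartialEq)]
pub enum Severity { Error, Warning }

#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub severity: Severity,
    pub message: String,
    pub span: Span,
}

impl Diagnostic {
    // Convenience constructor for error-level diagnostics.
    pub fn error(message: impl Into<String>, start: usize, end: usize) -> Self {
        Diagnostic {
            severity: Severity::Error,
            message: message.into(),
            span: Span { start, end },
        }
    }
}

fn main() {
    let d = Diagnostic::error("undeclared variable `y`", 4, 5);
    println!("{:?}", d);
}
```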

IR Builder

  • Lowers verified AST nodes into a linear, typed Intermediate Representation.

  • Generates pseudo-assembly–like instructions:

    LOAD a
    LOAD b
    ADD
    RETURN
    
  • Supports expressions, variables, function calls, and blocks.
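The pseudo-assembly above maps naturally onto a small instruction enum plus a lowering function that flattens an expression into a linear sequence. An illustrative sketch (the project's real IR type may differ):

```rust
// Illustrative instruction set mirroring the pseudo-assembly above;
// the project's real IR type may differ.
#[derive(Debug, PartialEq)]
pub enum Instr {
    Load(String),        // push a variable's value
    Const(i64),          // push a constant
    Add,                 // pop two values, push their sum
    Call(String, usize), // call a function with N arguments
    Store(String),       // pop the top of stack into a variable
    Return,              // return the top of stack
}

// Lower `return a + b;` into linear IR.
pub fn lower_add(a: &str, b: &str) -> Vec<Instr> {
    vec![
        Instr::Load(a.into()),
        Instr::Load(b.into()),
        Instr::Add,
        Instr::Return,
    ]
}

fn main() {
    println!("{:?}", lower_add("a", "b"));
}
```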

Test Suite

  • Includes unit tests for every major semantic case:

    • Correct function call resolution
    • Type mismatch errors
    • Undeclared identifier checks
    • Nested block scope resolution
    • Return-type enforcement
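One such case, the undeclared-identifier check, might be written roughly like this; `check_ident` is a simplified stand-in for the analyzer's real lookup, not its actual API:

```rust
use std::collections::HashSet;

// Simplified stand-in for the analyzer's identifier resolution.
pub fn check_ident(declared: &HashSet<String>, name: &str) -> Result<(), String> {
    if declared.contains(name) {
        Ok(())
    } else {
        Err(format!("undeclared identifier `{}`", name))
    }
}

#[test]
fn undeclared_identifier_is_reported() {
    let declared = HashSet::from(["x".to_string()]);
    assert!(check_ident(&declared, "x").is_ok());
    assert!(check_ident(&declared, "y").is_err());
}
```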

System Architecture

```mermaid
flowchart TD
    A[Raw Source Code] --> B[Lexer]
    B --> C[Parser]
    C -->|AST| D[Semantic Analyzer]
    D -->|Typed AST| E[IR Builder]
    E -->|IR Instructions| F[Runtime / Codegen]

    subgraph Diagnostics
    Dx[Errors, Warnings, Spans]
    end

    D --> Dx
```

Layers

| Layer | Description | Output |
| --- | --- | --- |
| Lexer | Converts text into tokens. | Token stream |
| Parser | Builds an untyped AST. | AST |
| Semantic Analyzer | Validates AST semantics (types, scopes). | Typed AST |
| IR Builder | Lowers the typed AST to linear IR. | IR vector |

⚙️ Running the Project

Build

cargo build --release

Run

cargo run

You can modify or run the predefined sample programs in src/main.rs, for example:

fn sample_program_small() -> Program {
    use Expr::*;
    use Stmt::*;
    Program {
        items: vec![
            // fn add(a:int, b:int) -> int { return a + b; }
            FnDecl {
                name: "add".into(),
                params: vec![("a".into(), TypeExpr::Int), ("b".into(), TypeExpr::Int)],
                ret: TypeExpr::Int,
                body: vec![Return {
                    expr: Some(BinaryOp(
                        Box::new(Ident("a".into(), Span::new(1,1))),
                        BinOp::Add,
                        Box::new(Ident("b".into(), Span::new(2,2))),
                        Span::new(1,2)
                    )),
                    span: Span::new(1,2)
                }],
                span: Span::new(0,10),
            },
            // fn main() { var x = add(1, 2); print("ok"); }
            FnDecl {
                name: "main".into(),
                params: vec![],
                ret: TypeExpr::Unit,
                body: vec![
                    VarDecl {
                        name: "x".into(),
                        typ: None,
                        init: Some(Call(
                            Box::new(Ident("add".into(), Span::new(10,13))),
                            vec![IntLit(1, Span::new(11,11)), IntLit(2, Span::new(12,12))],
                            Span::new(10,13)
                        )),
                        span: Span::new(10,13),
                    },
                    ExprStmt(Call(
                        Box::new(Ident("print".into(), Span::new(14,19))),
                        vec![StrLit("ok".into(), Span::new(15,17))],
                        Span::new(14,17)
                    )),
                ],
                span: Span::new(10,40),
            },
        ],
    }
}

🧪 Testing

cargo test


💡 Challenges & Solutions

| Challenge | Problem | Solution |
| --- | --- | --- |
| Managing nested scopes | Variables in inner blocks shadowing parent variables | Implemented a scoped `EnvStack` with push/pop per block |
| Type inference conflicts | Ambiguous types for untyped variables | Added default inference and propagation logic |
| Function redeclarations | Duplicate symbols during multiple passes | Collected all function declarations up front, before analysis |
| Undeclared identifiers | Identifiers referenced before declaration | Introduced a global + local environment validation phase |
| Borrow checker issues | Conflicting mutable borrows in IR lowering | Switched to interior mutability via `RefCell` |
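The last fix can be illustrated with a tiny sketch: wrapping the instruction buffer in a `RefCell` lets helper methods that take `&self` still append instructions, avoiding conflicting `&mut` borrows during lowering (names here are illustrative, not the project's actual builder):

```rust
use std::cell::RefCell;

// Illustrative IR builder using interior mutability: `emit` takes &self
// but can still mutate the buffer through the RefCell.
pub struct IrBuilder {
    instrs: RefCell<Vec<String>>,
}

impl IrBuilder {
    pub fn new() -> Self {
        IrBuilder { instrs: RefCell::new(Vec::new()) }
    }
    pub fn emit(&self, instr: &str) {
        self.instrs.borrow_mut().push(instr.to_string());
    }
    // Consume the builder and hand back the finished instruction list.
    pub fn finish(self) -> Vec<String> {
        self.instrs.into_inner()
    }
}

fn main() {
    let b = IrBuilder::new();
    b.emit("LOAD a");
    b.emit("LOAD b");
    b.emit("ADD");
    println!("{:?}", b.finish());
}
```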

Lessons Learned

  1. Semantic correctness is the soul of compilers. Syntax is surface-level; semantics define truth.

  2. Symbol tables and environments mirror real-world scope chains. From C++ to Rust, this same logic drives variable lifetime and resolution.

  3. Rust’s ownership model enforces safe compiler design. Every borrow and reference in this project is intentional.

  4. Intermediate Representation (IR) is language-agnostic power. The moment you emit IR, you’ve built a true compiler frontend.


Roadmap

  • Symbol tables & scoped environments
  • Type inference and checking
  • Function validation and call resolution
  • Error diagnostics with spans
  • AST → IR lowering
  • Unit test suite
  • Control flow analysis (if/while/return)
  • Constant folding optimization
  • IR serialization and visualization
  • Integration with codegen (LLVM or custom VM)

Example IR Output

fn add(a:int, b:int) -> int {
    LOAD a
    LOAD b
    ADD
    RETURN
}

fn main() {
    CALL add 1 2
    STORE x
    CONST "ok"
    CALL print 1
}

What I Learned

Building a semantic analyzer from scratch taught me how languages guarantee correctness before execution — from type inference to scope validation.

I also learned:

  • Compiler architectures are just well-structured trees.
  • Rust forces correctness not just in programs — but in the compiler itself.
  • IR generation is the bridge between “understanding” and “executing” code.

Summary

This is a real compiler frontend.

Written in pure Rust. Built with zero dependencies. Designed for correctness, not convenience.

Semantic analysis separates “syntax that looks right” from “code that is right.”

This project makes that distinction crystal clear.


Quote:

“What I cannot create, I cannot understand.”
