This project implements a fully working semantic analysis layer and intermediate representation (IR) builder, written from scratch in 100% Rust.
It performs the same core responsibilities as a compiler frontend: resolving symbols, performing type inference and validation, managing lexical scopes, and lowering the typed AST into a compiler-friendly Intermediate Representation (IR).
No external frameworks. No shortcuts. Just clean, idiomatic Rust with strong correctness guarantees.
Modern compilers (Rustc, Clang, Swiftc) rely on precise semantic analysis to ensure that programs are type-safe, scope-safe, and referentially correct before any code generation happens.
This project mirrors those principles on a smaller scale:
- Builds a real symbol table system with nested scopes.
- Implements type checking and error diagnostics.
- Performs AST → IR lowering (the step between frontend and backend).
- Demonstrates compiler-grade engineering: correctness, clarity, and extensibility.
✅ Type System & Inference
- Supports primitive types (`int`, `bool`, `string`, `unit`).
- Infers variable types when not explicitly annotated.
- Validates return types and assignment compatibility.
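A minimal sketch of how such bottom-up inference can work (the type and expression names here are illustrative, not the project's actual definitions): literals determine their own type, and operators constrain their operands.

```rust
// Hypothetical, simplified types for illustration only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Type { Int, Bool, Str, Unit }

enum Expr {
    IntLit(i64),
    BoolLit(bool),
    StrLit(String),
    Add(Box<Expr>, Box<Expr>),
}

// Walk the expression bottom-up; `Add` requires both sides to be Int.
fn infer(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::IntLit(_) => Ok(Type::Int),
        Expr::BoolLit(_) => Ok(Type::Bool),
        Expr::StrLit(_) => Ok(Type::Str),
        Expr::Add(l, r) => {
            let (lt, rt) = (infer(l)?, infer(r)?);
            if lt == Type::Int && rt == Type::Int {
                Ok(Type::Int)
            } else {
                Err(format!("type mismatch: {:?} + {:?}", lt, rt))
            }
        }
    }
}
```

An unannotated `var x = 1 + 2;` then gets its type from `infer` on the initializer rather than from an annotation.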
✅ Symbol Table & Scope Management
- Tracks functions, variables, and parameters across nested scopes.
- Detects redeclarations and undeclared references.
- Pushes and pops environments automatically for functions & blocks.
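The scoped environment behaves like a stack of maps. A sketch of that idea (the README mentions an `EnvStack`; the exact API here is assumed): inner scopes shadow outer ones, and popping a scope discards its bindings.

```rust
use std::collections::HashMap;

// Simplified: maps variable name -> type name.
struct EnvStack {
    scopes: Vec<HashMap<String, String>>,
}

impl EnvStack {
    fn new() -> Self { Self { scopes: vec![HashMap::new()] } }
    fn push(&mut self) { self.scopes.push(HashMap::new()); }
    fn pop(&mut self) { self.scopes.pop(); }

    // Declaring twice in the *same* scope is an error;
    // shadowing an outer scope is allowed.
    fn declare(&mut self, name: &str, ty: &str) -> Result<(), String> {
        let top = self.scopes.last_mut().unwrap();
        if top.contains_key(name) {
            return Err(format!("duplicate declaration: {name}"));
        }
        top.insert(name.to_string(), ty.to_string());
        Ok(())
    }

    // Lookup walks from innermost to outermost scope.
    fn lookup(&self, name: &str) -> Option<&str> {
        self.scopes.iter().rev().find_map(|s| s.get(name).map(|t| t.as_str()))
    }
}
```

Pushing on function or block entry and popping on exit is what makes shadowing and lifetime-scoped resolution fall out naturally.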
✅ Diagnostics System
- Structured diagnostics with span tracking.
- Detects:
  - Undeclared variables
  - Type mismatches
  - Duplicate declarations
  - Invalid function calls
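A span-carrying diagnostic can be as small as a kind plus a source range. The shape below is a sketch with assumed field names, not the project's actual types:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span { start: usize, end: usize }

#[derive(Debug, PartialEq)]
enum DiagKind {
    UndeclaredVar(String),
    TypeMismatch { expected: String, found: String },
}

#[derive(Debug, PartialEq)]
struct Diagnostic { kind: DiagKind, span: Span }

impl Diagnostic {
    // Render a human-readable error message that points at the span.
    fn render(&self) -> String {
        match &self.kind {
            DiagKind::UndeclaredVar(n) =>
                format!("error[{}..{}]: undeclared variable `{}`",
                        self.span.start, self.span.end, n),
            DiagKind::TypeMismatch { expected, found } =>
                format!("error[{}..{}]: expected `{}`, found `{}`",
                        self.span.start, self.span.end, expected, found),
        }
    }
}
```

Keeping the span separate from the message means later tooling (pretty-printers, IDE integration) can reuse the same diagnostics.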
✅ IR Builder
- Lowers verified AST nodes into a linear, typed Intermediate Representation.
- Generates pseudo-assembly-like instructions:

  ```
  LOAD a
  LOAD b
  ADD
  RETURN
  ```

- Supports expressions, variables, function calls, and blocks.
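The `LOAD a / LOAD b / ADD / RETURN` sequence above comes from a post-order walk of the expression tree: emit operands first, then the operator, stack-machine style. A minimal sketch (AST and instruction types assumed for illustration):

```rust
#[derive(Debug, PartialEq)]
enum Instr { Load(String), Const(i64), Add, Return }

enum Expr { Ident(String), IntLit(i64), Add(Box<Expr>, Box<Expr>) }

// Post-order: left operand, right operand, then the operation.
fn lower_expr(e: &Expr, out: &mut Vec<Instr>) {
    match e {
        Expr::Ident(n) => out.push(Instr::Load(n.clone())),
        Expr::IntLit(v) => out.push(Instr::Const(*v)),
        Expr::Add(l, r) => {
            lower_expr(l, out);
            lower_expr(r, out);
            out.push(Instr::Add);
        }
    }
}

// `return a + b;` lowers to LOAD a, LOAD b, ADD, RETURN.
fn lower_return(e: &Expr) -> Vec<Instr> {
    let mut out = Vec::new();
    lower_expr(e, &mut out);
    out.push(Instr::Return);
    out
}
```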
✅ Test Suite
- Includes unit tests for every major semantic case:
- Correct function call resolution
- Type mismatch errors
- Undeclared identifier checks
- Nested block scope resolution
- Return-type enforcement
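As a flavor of what such tests look like, here is a sketch of a duplicate-declaration check under test (the helper name is hypothetical; the project's real tests exercise the full analyzer):

```rust
use std::collections::HashSet;

// Report each name declared more than once in a single scope.
fn find_duplicates(decls: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut dups = Vec::new();
    for d in decls {
        // `insert` returns false if the name was already present.
        if !seen.insert(*d) {
            dups.push(format!("duplicate declaration: {d}"));
        }
    }
    dups
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn duplicate_declarations_are_reported() {
        assert_eq!(
            find_duplicates(&["x", "y", "x"]),
            vec!["duplicate declaration: x".to_string()]
        );
    }
}
```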
```mermaid
flowchart TD
    A[Raw Source Code] --> B[Lexer]
    B --> C[Parser]
    C -->|AST| D[Semantic Analyzer]
    D -->|Typed AST| E[IR Builder]
    E -->|IR Instructions| F[Runtime / Codegen]
    subgraph Diagnostics
        Dx[Errors, Warnings, Spans]
    end
    D --> Dx
```
| Layer | Description | Output |
|---|---|---|
| Lexer | Converts text into tokens. | Token Stream |
| Parser | Builds an untyped AST. | AST |
| Semantic Analyzer | Validates AST semantics (types, scopes). | Typed AST |
| IR Builder | Lowers the typed AST to linear IR. | IR Vector |
```bash
cargo build --release
cargo run
```

You can modify or run pre-defined sample programs in `src/main.rs` like:
```rust
fn sample_program_small() -> Program {
use Expr::*;
use Stmt::*;
Program {
items: vec![
// fn add(a:int, b:int) -> int { return a + b; }
FnDecl {
name: "add".into(),
params: vec![("a".into(), TypeExpr::Int), ("b".into(), TypeExpr::Int)],
ret: TypeExpr::Int,
body: vec![Return {
expr: Some(BinaryOp(
Box::new(Ident("a".into(), Span::new(1,1))),
BinOp::Add,
Box::new(Ident("b".into(), Span::new(2,2))),
Span::new(1,2)
)),
span: Span::new(1,2)
}],
span: Span::new(0,10),
},
// fn main() { var x = add(1, 2); print("ok"); }
FnDecl {
name: "main".into(),
params: vec![],
ret: TypeExpr::Unit,
body: vec![
VarDecl {
name: "x".into(),
typ: None,
init: Some(Call(
Box::new(Ident("add".into(), Span::new(10,13))),
vec![IntLit(1, Span::new(11,11)), IntLit(2, Span::new(12,12))],
Span::new(10,13)
)),
span: Span::new(10,13),
},
ExprStmt(Call(
Box::new(Ident("print".into(), Span::new(14,19))),
vec![StrLit("ok".into(), Span::new(15,17))],
Span::new(14,17)
)),
],
span: Span::new(10,40),
},
],
}
}
```

Run the test suite:

```bash
cargo test
```
| Challenge | Problem | Solution |
|---|---|---|
| Managing nested scopes | Variables in inner blocks shadowing parent variables | Implemented scoped `EnvStack` with push/pop per block |
| Type inference conflicts | Ambiguous types for untyped variables | Added default inference and propagation logic |
| Function redeclarations | Duplicate symbols during multiple passes | Tracked all function declarations upfront before analysis |
| Undeclared identifiers | Referenced before declaration | Introduced global + local environment validation phase |
| Borrow checker issues | Conflicting mutable borrows in IR lowering | Switched to interior mutability via `RefCell` |
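The `RefCell` row deserves a sketch: interior mutability lets the IR buffer be mutated through a shared reference, sidestepping conflicting `&mut` borrows during lowering. The names below are illustrative, not the project's actual API:

```rust
use std::cell::RefCell;

struct IrBuilder {
    // Borrow rules enforced at runtime instead of compile time.
    instrs: RefCell<Vec<String>>,
}

impl IrBuilder {
    fn new() -> Self { Self { instrs: RefCell::new(Vec::new()) } }

    // Takes &self, not &mut self: several AST-walking helpers can
    // hold a shared reference to the builder and still emit.
    fn emit(&self, i: &str) {
        self.instrs.borrow_mut().push(i.to_string());
    }

    fn finish(self) -> Vec<String> { self.instrs.into_inner() }
}
```

The trade-off is that an overlapping `borrow_mut` panics at runtime, so emit calls must stay short-lived.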
- Semantic correctness is the soul of compilers. Syntax is surface-level; semantics define truth.
- Symbol tables and environments mirror real-world scope chains. From C++ to Rust, this same logic drives variable lifetime and resolution.
- Rust’s ownership model enforces safe compiler design. Every borrow and reference in this project is intentional.
- Intermediate Representation (IR) is language-agnostic power. The moment you emit IR, you’ve built a true compiler frontend.
- Symbol tables & scoped environments
- Type inference and checking
- Function validation and call resolution
- Error diagnostics with spans
- AST → IR lowering
- Unit test suite
- Control flow analysis (if/while/return)
- Constant folding optimization
- IR serialization and visualization
- Integration with codegen (LLVM or custom VM)
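To make the roadmap concrete, here is a sketch of what the planned constant-folding pass could look like on the stack-style IR (not yet part of the project; instruction names assumed): a `Const x, Const y, ADD` triple collapses into one `Const`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Instr { Const(i64), Add, Load(String) }

// Single peephole pass: whenever ADD follows two constants,
// replace all three instructions with the folded constant.
fn fold_constants(input: &[Instr]) -> Vec<Instr> {
    let mut out: Vec<Instr> = Vec::new();
    for i in input {
        if matches!(i, Instr::Add) && out.len() >= 2 {
            let n = out.len();
            if let (Instr::Const(a), Instr::Const(b)) =
                (out[n - 2].clone(), out[n - 1].clone())
            {
                out.truncate(n - 2);
                out.push(Instr::Const(a + b));
                continue;
            }
        }
        out.push(i.clone());
    }
    out
}
```

Running the pass to a fixed point would also fold nested constant expressions.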
Example IR output for the sample program:

```
fn add(a:int, b:int) -> int {
    LOAD a
    LOAD b
    ADD
    RETURN
}
fn main() {
    CALL add 1 2
    STORE x
    CONST "ok"
    CALL print 1
}
```
Building a semantic analyzer from scratch taught me how languages guarantee correctness before execution — from type inference to scope validation.
I also learned:
- Compiler architectures are just well-structured trees.
- Rust forces correctness not just in programs — but in the compiler itself.
- IR generation is the bridge between “understanding” and “executing” code.
This is a real compiler frontend.
Written in pure Rust. Built with zero dependencies. Designed for correctness, not convenience.
Semantic analysis separates “syntax that looks right” from “code that is right.”
This project makes that distinction crystal clear.
> “What I cannot create, I do not understand.” — Richard Feynman

