Frigatebird

Frigatebird: A columnar SQL database with push-based query execution

Named after the frigatebird, a fast seabird that can stay aloft for weeks with minimal effort

License: MIT

Demo

Overview

Frigatebird is an embedded columnar database that implements a push-based Volcano execution model with morsel-driven parallelism. Queries compile into pipelines where operators push batches downstream through channels, enabling parallel execution across multiple workers.
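The push model described above can be sketched with a bounded channel between two pipeline steps. This is a minimal illustration of the idea, not Frigatebird's actual pipeline API; `run_pipeline` and the channel size are made up for the example:

```rust
use std::sync::mpsc;
use std::thread;

// Push-based pipeline sketch: the upstream step pushes row batches into
// a bounded channel; the downstream step consumes them as they arrive.
fn run_pipeline(num_batches: i64, batch_size: i64) -> i64 {
    let (tx, rx) = mpsc::sync_channel::<Vec<i64>>(4); // buffer between steps
    let producer = thread::spawn(move || {
        for b in 0..num_batches {
            let batch: Vec<i64> = (0..batch_size).map(|i| b * batch_size + i).collect();
            tx.send(batch).unwrap(); // push downstream, blocking if the buffer is full
        }
        // tx dropped here: the channel closes and the consumer's loop ends
    });
    let total: i64 = rx.into_iter().flatten().sum(); // downstream consumes batches
    producer.join().unwrap();
    total
}

fn main() {
    assert_eq!(run_pipeline(3, 4), 66); // sum of values 0..12
}
```

The bounded channel is what lets the producer and consumer run on different workers: the upstream step never waits for a pull, it simply blocks once the buffer is full.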

Architecture

(architecture diagram)

Key Features

  • Push-based execution - Operators push data downstream instead of pulling, enabling natural parallelism
  • Morsel-driven parallelism - Data processed in 50k-row morsels that flow through pipelines independently
  • Late materialization - Columns loaded only when needed, reducing I/O for selective queries
  • Three-tier caching - Uncompressed (hot) → Compressed (warm) → Disk (cold)
  • Vectorized filtering - Bitmap operations process 64 rows per CPU instruction
  • Dictionary encoding - Automatic compression for low-cardinality string columns
  • WAL durability - Write-ahead logging with three-phase commit for crash recovery
  • io_uring + O_DIRECT - Batched async I/O bypassing OS page cache (Linux)

Quick Start

# Start the interactive CLI
make frigatebird
sql> CREATE TABLE events (id TEXT, ts TIMESTAMP, data TEXT) ORDER BY id
OK

sql> INSERT INTO events (id, ts, data) VALUES ('sensor-1', '2024-01-15 10:30:00', 'temperature=23.5')
OK

sql> SELECT * FROM events WHERE id = 'sensor-1'
id       | ts                  | data
sensor-1 | 2024-01-15 10:30:00 | temperature=23.5
(1 rows)

Note: ORDER BY is mandatory when creating a table.

How It Works

Query Pipeline

A query like SELECT name FROM users WHERE age > 25 AND city = 'NYC' compiles into a pipeline:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Step 0    │     │   Step 1    │     │   Step 2    │
│  column:age │────▶│ column:city │────▶│ column:name │────▶ Output
│ filter:>25  │     │ filter:=NYC │     │ (project)   │
│   (root)    │     │             │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
   187k rows           31k rows            8k rows

Each step loads only its column and filters, passing surviving row IDs downstream. The final step materializes projection columns only for rows that passed all filters.
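This row-ID handoff can be sketched as a function per step that touches only its own column. `filter_step` and the column layout below are illustrative, not Frigatebird's operator API:

```rust
// Late-materialization sketch: each step receives the IDs of rows that
// survived upstream filters, loads only its own column for those rows,
// and pushes the narrowed ID list downstream.
fn filter_step(row_ids: &[usize], column: &[i64], pred: impl Fn(i64) -> bool) -> Vec<usize> {
    row_ids
        .iter()
        .copied()
        .filter(|&id| pred(column[id])) // only this step's column is touched
        .collect()
}

fn main() {
    let age: Vec<i64> = vec![30, 20, 40, 50, 10];
    let city = vec!["NYC", "NYC", "LA", "NYC", "NYC"];
    let all_rows: Vec<usize> = (0..age.len()).collect();
    let after_age = filter_step(&all_rows, &age, |v| v > 25); // rows [0, 2, 3]
    let after_city: Vec<usize> =
        after_age.into_iter().filter(|&id| city[id] == "NYC").collect();
    assert_eq!(after_city, vec![0, 3]); // only these rows materialize `name`
}
```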

Late Materialization

Without late materialization:  Load ALL columns × 187k rows = 750k values
With late materialization:     age: 187k + city: 31k + name: 8k = 226k values
                               ─────────────────────────────────────────────
                               70% less data loaded

Parallel Execution

Workers grab morsels (page groups) using lock-free atomic compare-and-swap:

Time ──────────────────────────────────────────────▶

Worker 0  ║▓▓▓ M0:S0 ▓▓▓║     ║▓▓ M0:S1 ▓▓║    ║▓ M0:S2 ▓║
Worker 1  ║▓▓▓ M1:S0 ▓▓▓║   ║▓▓ M1:S1 ▓▓║   ║▓ M1:S2 ▓║
Worker 2    ║▓▓▓ M2:S0 ▓▓▓║   ║▓▓ M2:S1 ▓▓║    ║▓ M2:S2 ▓║
Worker 3      ║▓▓ M3:S0 ▓▓║   ║▓ M3:S1 ▓║    ║▓ M3:S2 ▓║

Legend: M0:S0 = Morsel 0, Step 0

Multiple morsels execute concurrently. Channels buffer batches between steps.
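The lock-free claim step can be sketched as a compare-and-swap loop on a shared counter of the next unclaimed morsel. `run_workers` is an illustrative stand-in for the scheduler, not Frigatebird's actual code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Morsel-dispatch sketch: workers claim morsel indices with a
// compare-and-swap loop on a shared counter, so no locks are needed
// and every morsel is processed exactly once.
fn run_workers(workers: usize, morsels: usize) -> usize {
    let next = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let next = Arc::clone(&next);
            thread::spawn(move || {
                let mut claimed = 0;
                loop {
                    let cur = next.load(Ordering::Relaxed);
                    if cur >= morsels {
                        break; // no morsels left
                    }
                    // The CAS succeeds for exactly one racing worker.
                    if next
                        .compare_exchange(cur, cur + 1, Ordering::Relaxed, Ordering::Relaxed)
                        .is_ok()
                    {
                        claimed += 1; // this worker owns morsel `cur`
                    }
                }
                claimed
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    assert_eq!(run_workers(4, 16), 16); // 16 morsels, each claimed exactly once
}
```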

CLI Commands

Command     Description
\d          List all tables
\d <table>  Describe table schema
\dt         List all tables
\i <file>   Execute SQL from file
\q          Quit

SQL Support

DDL

  • CREATE TABLE <name> (...columns...) ORDER BY <col[, ...]>

DML

  • INSERT INTO <table> (...) VALUES (...)
  • UPDATE <table> SET ... [WHERE ...]
  • DELETE FROM <table> [WHERE ...]

Queries

  • Single table SELECT with WHERE, ORDER BY, LIMIT, OFFSET
  • Predicates: AND, OR, comparisons, BETWEEN, LIKE, ILIKE, RLIKE, IN, IS NULL
  • Aggregates: COUNT, SUM, AVG, MIN, MAX, VARIANCE, STDDEV, PERCENTILE_CONT
  • Aggregate modifiers: FILTER (WHERE ...), DISTINCT
  • Conditional aggregates: sumIf, avgIf, countIf
  • GROUP BY with HAVING, QUALIFY, DISTINCT
  • Window functions: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM, FIRST_VALUE, LAST_VALUE
  • Time functions: TIME_BUCKET, DATE_TRUNC
  • Scalar functions: ABS, ROUND, CEIL, FLOOR, EXP, LN, LOG, POWER, WIDTH_BUCKET
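TIME_BUCKET-style functions conventionally snap a timestamp down to the start of its bucket. A minimal sketch of that truncation; the semantics are assumed from the common TIME_BUCKET convention and the helper name is made up, not verified against Frigatebird:

```rust
// TIME_BUCKET sketch: a timestamp (here, seconds since midnight) is
// snapped down to the start of its fixed-width bucket.
fn time_bucket(bucket_secs: i64, ts: i64) -> i64 {
    ts - ts.rem_euclid(bucket_secs)
}

fn main() {
    // 10:37:21 (38241 s) with a 5-minute (300 s) bucket -> 10:35:00 (38100 s)
    assert_eq!(time_bucket(300, 38_241), 38_100);
}
```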

Data Types

  • TEXT / VARCHAR / STRING
  • INT / INTEGER / BIGINT
  • FLOAT / DOUBLE / REAL
  • BOOL / BOOLEAN
  • TIMESTAMP / DATETIME
  • UUID
  • INET / IP

Storage

Data is stored in columnar format with one file per column:

storage/
├── data.00000          # Page data (4GB max per file, auto-rotates)
└── data.00001

wal_files/
├── frigatebird/        # WAL for durability
└── frigatebird-meta/   # Metadata journal

Pages are compressed with LZ4 and aligned to 4KB boundaries for O_DIRECT I/O.
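The 4KB alignment is the standard round-up-to-block-size calculation required for O_DIRECT I/O. A sketch of that arithmetic; `align_up` is an illustrative helper, not Frigatebird's actual code:

```rust
// O_DIRECT requires file offsets and buffer lengths aligned to the
// block size, so page sizes are rounded up to the next 4 KiB boundary.
const ALIGN: usize = 4096;

fn align_up(len: usize) -> usize {
    (len + ALIGN - 1) & !(ALIGN - 1) // works because ALIGN is a power of two
}

fn main() {
    assert_eq!(align_up(1), 4096);    // small page padded to one block
    assert_eq!(align_up(4096), 4096); // already aligned
    assert_eq!(align_up(4097), 8192); // spills into a second block
}
```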

Running Tests

cargo test

Documentation

See the docs/ directory for detailed documentation.

License

MIT License - see LICENSE for details.
