Support custom types by glommer · Pull Request #5254 · tursodatabase/turso

glommer · 2026-02-14T02:06:44Z

Description

This PR adds support for custom types. Custom types can only be used with STRICT tables. Users can define their own custom types using SQLite expressions, and we seed the type table with a set of built-in ones.

It also introduces a new fuzzer that checks that the expressions we generate stay valid when used in SQL constructs such as ORDER BY, indexes, etc.

I can now run this fuzzer for hours without issues.

Note that this depends on PR #5207, which is included here.

Motivation and context

We live in 2026. Types won.

Description of AI Usage

This was over a week of Claude Coding. I focused a lot on validation, and aside from the fuzzer loop, used reviewing agents extensively.

turso-bot

Please review @pereman2

glommer · 2026-02-15T00:24:24Z

Update for maintainers: I am currently going through a list of bugs found by @LeMikaelF's clanker.

Will update the PR as I make progress.

glommer · 2026-02-23T18:08:03Z

too many conflicts so I stashed everything into one commit (FYI @penberg )

Per the manual, CAST(x AS custom_type) should produce the stored (encoded) representation. Previously it applied both encode and decode, making it a no-op for symmetric encode/decode pairs like cents (value*100 / value/100 = identity). Now CAST(42 AS cents) correctly returns 4200 (the encoded form), and CAST('hello' AS reversed) returns 'olleh'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
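
To make the CAST change concrete, here is a minimal Rust sketch that models a custom type as a pair of plain integer functions. `CustomType`, `cast_old`, and `cast_new` are hypothetical names for illustration, not turso's internal API.

```rust
// Hypothetical model of a custom type as an ENCODE/DECODE pair; not turso's internal API.
struct CustomType {
    encode: fn(i64) -> i64,
    decode: fn(i64) -> i64,
}

// The `cents` type: ENCODE value * 100, DECODE value / 100.
fn cents() -> CustomType {
    CustomType {
        encode: |v| v * 100,
        decode: |v| v / 100,
    }
}

// Old behaviour: encode then decode, which is the identity for symmetric pairs.
fn cast_old(ty: &CustomType, v: i64) -> i64 {
    (ty.decode)((ty.encode)(v))
}

// New behaviour: CAST(x AS custom_type) yields the stored (encoded) form.
fn cast_new(ty: &CustomType, v: i64) -> i64 {
    (ty.encode)(v)
}
```

With `cents`, `cast_old` returns 42 for an input of 42 while `cast_new` returns 4200, matching the example in the commit message.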

Custom type names like "doubled" contain substrings ("DOUB") that SQLite's name-based rules map to REAL affinity. This caused integer values to be stored as floats and returned with wrong typeof(). Fix: when resolving custom type columns, override both the Column's Type bits and base affinity bits with values derived from the BASE type name. This is done in handle_schema_row (for ParseSchema) and resolve_all_custom_type_affinities (for schema reparse after loading types from __turso_internal_types). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
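
SQLite's name-based affinity rules are simple to model, which makes the "doubled" pitfall easy to see. In the Rust sketch below, `affinity_from_name` implements the five ordered rules from the SQLite datatype documentation. `column_affinity` and its `base_of` lookup are hypothetical stand-ins for resolving a custom type to its BASE name, not turso's actual code.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Affinity {
    Integer,
    Text,
    Blob,
    Real,
    Numeric,
}

// SQLite's declared-type affinity rules, checked in order, case-insensitively.
fn affinity_from_name(decl: &str) -> Affinity {
    let d = decl.to_ascii_uppercase();
    if d.contains("INT") {
        Affinity::Integer
    } else if d.contains("CHAR") || d.contains("CLOB") || d.contains("TEXT") {
        Affinity::Text
    } else if d.contains("BLOB") || d.is_empty() {
        Affinity::Blob
    } else if d.contains("REAL") || d.contains("FLOA") || d.contains("DOUB") {
        Affinity::Real
    } else {
        Affinity::Numeric
    }
}

// Sketch of the fix: a custom type column takes the affinity of its BASE
// type name. `base_of` is a hypothetical stand-in for the type registry.
fn column_affinity(decl: &str, base_of: impl Fn(&str) -> Option<&'static str>) -> Affinity {
    match base_of(decl) {
        Some(base) => affinity_from_name(base),
        None => affinity_from_name(decl),
    }
}
```

Applied to the raw name, "doubled" matches the "DOUB" rule and gets REAL affinity. Routed through its BASE name `integer`, it gets INTEGER.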

Introduce emit_user_facing_column_value() — a single abstraction that converts a stored column value to its user-facing form. For custom type columns this applies the DECODE function; for regular columns it is a plain copy. Both SELECT and RETURNING now go through this helper. Previously RETURNING mapped column references directly to the write registers which hold encoded storage values, so INSERT/UPDATE/DELETE RETURNING showed raw encoded values (e.g. 4200 instead of 42 for a cents type). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
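
Here is a minimal Rust sketch of a single user-facing conversion path, assuming values are plain integers or NULL and a column optionally carries a DECODE function. `Value`, `Column`, and `user_facing_value` are hypothetical names. The real helper emits VDBE instructions instead of computing values directly.

```rust
// Hypothetical, simplified value and column types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
}

struct Column {
    // DECODE function for custom type columns; None for regular columns.
    decode: Option<fn(&Value) -> Value>,
}

// One path shared by SELECT and RETURNING: stored value -> user-facing value.
fn user_facing_value(col: &Column, stored: &Value) -> Value {
    match col.decode {
        Some(decode) => decode(stored),
        None => stored.clone(), // plain copy for regular columns
    }
}

// cents DECODE: value / 100, NULL stays NULL.
fn decode_cents(v: &Value) -> Value {
    match v {
        Value::Integer(i) => Value::Integer(i / 100),
        Value::Null => Value::Null,
    }
}
```

Because RETURNING goes through the same function as SELECT, a stored 4200 in a cents column is reported as 42 in both places.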

When reopening a database, make_from_btree loads table definitions from the sqlite_schema btree but does not read the contents of __turso_internal_types. This meant the type_registry was empty after reopen, causing SELECT to return raw encoded values and PRAGMA list_types to omit user-defined types. Extract shared helpers Schema::load_type_definitions() and Connection::query_stored_type_definitions() so both the initial open path (lib.rs) and schema reparse path (connection.rs) load custom types through the same code. Add integration tests for reopen, schema change after reopen, and new connection visibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The optimizer builds ephemeral auto-indexes for the inner table in joins. These indexes store raw encoded column values. When building the seek key, encode_seek_keys_for_custom_types re-encodes the decoded seek key to match the index contents. However, for aliased tables (e.g. FROM t1 a, t1 b), this function searched for the table by identifier ("a"/"b") while the ephemeral index stored the base table name ("t1"). The lookup failed silently, skipping the encode step, so the decoded seek key (e.g. 10) could never match the encoded index value (e.g. 1000). Fix: add find_table_by_table_name() that searches by the underlying table name rather than the alias, and use it as a fallback when the identifier lookup fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
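
The lookup with a fallback can be sketched in Rust as follows. `TableRef`, `find_by_identifier`, and `resolve` are hypothetical names. Only `find_table_by_table_name` comes from the commit message, and its real signature may differ.

```rust
// Hypothetical FROM-clause entry: `t1 a` has identifier "a" and table name "t1".
struct TableRef {
    identifier: String,
    table_name: String,
}

fn find_by_identifier<'a>(tables: &'a [TableRef], id: &str) -> Option<&'a TableRef> {
    tables.iter().find(|t| t.identifier == id)
}

fn find_table_by_table_name<'a>(tables: &'a [TableRef], name: &str) -> Option<&'a TableRef> {
    tables.iter().find(|t| t.table_name == name)
}

// Ephemeral auto-indexes record the base table name, so try the alias
// first and fall back to the underlying table name.
fn resolve<'a>(tables: &'a [TableRef], key: &str) -> Option<&'a TableRef> {
    find_by_identifier(tables, key).or_else(|| find_table_by_table_name(tables, key))
}
```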

The existing fuzzer tested cross-table joins (t1 JOIN t2) but not self-joins (t1 a JOIN t1 b). This gap meant the auto-index alias lookup bug fixed in bdc308e would not have been caught by the fuzzer. Add pattern 39 that performs a self-join on t1 custom type column and verifies both the join condition (a.val == b.val) and that the result count is at least t1_rows (each row matches at least itself). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

In the UPSERT DO UPDATE path, excluded.column was reading from the already-encoded insertion registers. When the UPSERT then encoded the SET columns again, values were double-encoded (e.g. 50 became 5000 for a cents type with ENCODE value * 100). Similarly, the WHERE clause in DO UPDATE was comparing against encoded values from disk (e.g. WHERE t1.amount < 20 evaluated 1000 < 20 instead of 10 < 20). The fix:
- Create decoded copies of current_start registers for WHERE/SET expressions (current_start itself stays encoded for trigger OLD registers)
- Create decoded copies of excluded (insertion) registers so excluded.column references see user-facing values
- Decode new_start in-place (was copied from encoded current_start)
- Encode ALL columns in new_start before writing (not just SET columns), since non-SET columns are now decoded too
Add emit_custom_type_decode_columns helper (mirrors the existing encode helper) and extend rewrite_expr_to_registers to accept decoded excluded register base for proper column resolution. Fixes Bugs 7, 13, 15, 22, 27 from the custom types bug list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
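
The double-encoding arithmetic is easiest to see with the cents type. This Rust sketch reduces registers to plain `i64` values. `upsert_set_buggy` and `upsert_set_fixed` are hypothetical names that model `SET amount = excluded.amount` before and after the fix.

```rust
fn encode(v: i64) -> i64 {
    v * 100 // cents ENCODE
}

fn decode(v: i64) -> i64 {
    v / 100 // cents DECODE
}

// Before: excluded.amount reads the already-encoded insertion register,
// and the SET result is encoded again before writing.
fn upsert_set_buggy(inserted: i64) -> i64 {
    let excluded_reg = encode(inserted);
    encode(excluded_reg)
}

// After: SET expressions see a decoded copy, so encoding happens once.
fn upsert_set_fixed(inserted: i64) -> i64 {
    let excluded_reg = encode(inserted);
    let excluded_decoded = decode(excluded_reg);
    encode(excluded_decoded)
}
```

Reading the row back through DECODE shows 5000 with the buggy path and 50 with the fixed one.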

SQLite already has special treatment for boolean: it accepts true/false as literals that map to 1/0, and CAST(1 AS boolean) returns 1. Our int_to_boolean DECODE function was returning text 'true'/'false', which diverged from SQLite behavior and caused CAST(1 AS boolean) to break after the encode-only CAST fix. Change DECODE to `CASE WHEN value THEN 1 ELSE 0 END` so boolean columns display as 0/1, matching SQLite semantics. The ENCODE function (boolean_to_int) is unchanged and still validates/normalizes user input. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Disable constant hoisting for SET expressions on columns with custom type encode functions. The encode is applied in-place to the target register inside the update loop, so a hoisted constant would be encoded repeatedly on each iteration (99 → 9900 → 990000 → ...). We disable hoisting rather than working around it because:
1. Encode functions may be non-deterministic (e.g. datetime('now'))
2. Even for deterministic encodes, hoisting the pre-encode value and encoding in-place causes progressive double-encoding
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
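
Here is a Rust sketch of why in-place encoding and hoisting do not mix, assuming the cents ENCODE (`value * 100`) and a single shared register. The function names are hypothetical and model the update loop, not turso's bytecode.

```rust
// Applies the custom type ENCODE to the target register in place.
fn encode_in_place(reg: &mut i64) {
    *reg *= 100;
}

// Constant 99 hoisted out of the loop: the same register is re-encoded
// on every iteration, so the written value keeps growing.
fn update_with_hoisting(rows: usize) -> Vec<i64> {
    let mut reg = 99; // evaluated once, before the loop
    let mut written = Vec::new();
    for _ in 0..rows {
        encode_in_place(&mut reg);
        written.push(reg);
    }
    written
}

// Hoisting disabled: the SET expression is re-evaluated for each row.
fn update_without_hoisting(rows: usize) -> Vec<i64> {
    let mut written = Vec::new();
    for _ in 0..rows {
        let mut reg = 99;
        encode_in_place(&mut reg);
        written.push(reg);
    }
    written
}
```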

In STRICT tables, CHECK constraint comparisons are now type-checked at CREATE TABLE time. Every comparison operand must have a determinable, compatible type. If the type cannot be determined (e.g. function calls), the user must use an explicit CAST. This prevents Bug 10 (CHECK constraints seeing encoded custom type values) by rejecting the problematic pattern entirely: comparing a custom type column against a raw literal is a type error. The user must write CHECK(amount < CAST(50 AS cents)) instead of CHECK(amount < 50). Type compatibility rules:
- INTEGER and REAL are mutually compatible (numeric)
- TEXT only with TEXT, BLOB only with BLOB
- ANY compatible with everything, NULL compatible with everything
- Custom types only compatible with the same custom type
- Function calls require CAST (return type unknown)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
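
The compatibility rules above map naturally onto a match over operand types. This is a minimal Rust sketch: `OperandType` and `compatible` are hypothetical names, and letting the "unknown type" rule take precedence over ANY is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
enum OperandType {
    Integer,
    Real,
    Text,
    Blob,
    Any,
    Null,
    Custom(String),
    // e.g. a function call whose return type is unknown at CREATE TABLE time
    Unknown,
}

fn compatible(a: &OperandType, b: &OperandType) -> Result<(), String> {
    use OperandType::*;
    match (a, b) {
        (Unknown, _) | (_, Unknown) => Err("cannot determine operand type; use an explicit CAST".into()),
        (Any, _) | (_, Any) | (Null, _) | (_, Null) => Ok(()),
        (Integer | Real, Integer | Real) => Ok(()),
        (Text, Text) | (Blob, Blob) => Ok(()),
        (Custom(x), Custom(y)) if x == y => Ok(()),
        _ => Err(format!("incompatible types in comparison: {a:?} vs {b:?}")),
    }
}
```

Under these rules `amount < 50` (cents vs INTEGER) is rejected, while `amount < CAST(50 AS cents)` compares two cents operands and passes.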

VACUUM INTO failed with three separate errors when the source database had custom types: (a) creating __turso_internal_types in the dest was rejected by the reserved-name check, (b) INSERT INTO it was rejected by may-not-be-modified, and (c) CREATE TABLE for STRICT tables failed because custom types were not registered in the dest connection schema. Fixed by temporarily marking the dest connection as nested during prepare() for internal tables and copying the source type registry into the dest after creating the types table. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CREATE TYPE allowed names like INTEGER, TEXT, REAL, BLOB, ANY, and INT, which shadow the column type system and create undroppable types. Now rejected at CREATE TYPE time with a clear error message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
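
Here is a minimal Rust sketch of the CREATE TYPE name check. It assumes a case-insensitive comparison against exactly the six names listed above; the real reserved list may be longer, and `validate_type_name` is a hypothetical name.

```rust
// The reserved names called out in the commit message (assumed complete here).
const RESERVED_TYPE_NAMES: [&str; 6] = ["INTEGER", "TEXT", "REAL", "BLOB", "ANY", "INT"];

// Hypothetical validation run at CREATE TYPE time.
fn validate_type_name(name: &str) -> Result<(), String> {
    if RESERVED_TYPE_NAMES.iter().any(|r| r.eq_ignore_ascii_case(name)) {
        Err(format!("type name '{name}' is reserved"))
    } else {
        Ok(())
    }
}
```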

Foreign key CASCADE and SET NULL actions failed with custom types because the key registers used for parent index probes contained decoded values while the index stored encoded values. Add decode_fk_key_registers helper that decodes FK key registers before comparison, applied in delete actions, update actions, and drop table checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Custom type columns in STRICT tables bypassed TypeCheck because encoding ran before validation, silently converting wrong-type inputs (e.g. 'hello' * 100 = 0 for a cents column). Add typed parameter syntax to CREATE TYPE so types can declare the expected input type for the value parameter, e.g. CREATE TYPE cents(value integer) BASE integer. A pre-encode TypeCheck now validates user input against the declared value type before encoding runs. The existing post-encode TypeCheck remains to validate encoded output against the BASE storage type. Updated all 13 built-in types with typed params (uuid expects text, boolean expects any, numeric expects any, etc.). Backward compatible: old untyped params default to any. Also moves child-side FK checks in UPDATE to after encoding so that new values probed against parent indexes are properly encoded. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
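
The two-stage validation can be sketched in Rust for the cents type: the declared parameter type is checked before ENCODE, and the BASE type is checked after. All names are hypothetical. `encode_cents` is simplified to treat every text value as non-numeric, which matches the 'hello' * 100 = 0 example but not SQLite's handling of numeric strings.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Text(String),
}

// Hypothetical declared type of the `value` parameter, e.g. cents(value integer).
#[derive(Clone, Copy)]
enum ParamType {
    Integer,
    Text,
    Any,
}

fn type_check(v: &Value, expected: ParamType) -> Result<(), String> {
    match (expected, v) {
        (ParamType::Any, _)
        | (ParamType::Integer, Value::Integer(_))
        | (ParamType::Text, Value::Text(_)) => Ok(()),
        _ => Err(format!("type mismatch: cannot accept {v:?}")),
    }
}

// cents ENCODE (value * 100), simplified: all text is treated as non-numeric.
fn encode_cents(v: &Value) -> Value {
    match v {
        Value::Integer(i) => Value::Integer(i * 100),
        Value::Text(_) => Value::Integer(0), // like 'hello' * 100 = 0
    }
}

fn insert_cents(v: &Value) -> Result<Value, String> {
    type_check(v, ParamType::Integer)?; // pre-encode check against the value parameter
    let encoded = encode_cents(v);
    type_check(&encoded, ParamType::Integer)?; // post-encode check against BASE integer
    Ok(encoded)
}
```

Without the first check, `'hello'` would pass the post-encode check as the integer 0.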

Drop the type parameter from OPERATOR syntax (now OPERATOR '+' func_name instead of OPERATOR '+' (type) -> func_name). Old syntax still parses for backward compatibility. Operators now only fire when both operands are the same custom type, or when one is a custom type column and the other is a compatible literal (matching the type's value parameter). Literals are encoded before being passed to the operator function so both args are in the same form. Fixes three bugs: operators firing for wrong types, register clobbering when encoding literals across loop iterations, and reversed comparisons when the literal appears on the LHS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
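
The dispatch rule can be sketched as a match on operand shapes. In this hypothetical Rust model, column values are already encoded and literals are user-facing, so a literal is encoded before it is passed on. Argument order is preserved, so a literal on the LHS stays on the LHS. The check that a literal matches the type's value parameter is omitted for brevity.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    // A column of the named custom type, holding an encoded value.
    Column { ty: String, encoded: i64 },
    // A literal in user-facing form.
    Literal(i64),
}

// Returns the encoded (lhs, rhs) pair if the operator for `ty` should fire.
fn operator_args(lhs: &Operand, rhs: &Operand, ty: &str, encode: fn(i64) -> i64) -> Option<(i64, i64)> {
    match (lhs, rhs) {
        (Operand::Column { ty: a, encoded: x }, Operand::Column { ty: b, encoded: y })
            if a == ty && b == ty => Some((*x, *y)),
        (Operand::Column { ty: a, encoded: x }, Operand::Literal(l)) if a == ty => Some((*x, encode(*l))),
        (Operand::Literal(l), Operand::Column { ty: b, encoded: y }) if b == ty => Some((encode(*l), *y)),
        _ => None, // mismatched types: the operator does not fire
    }
}
```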

INSERT and UPDATE paths computed expression index values from encoded (storage) column registers, while SELECT/DELETE computed from decoded (user-facing) values. This mismatch caused index lookups to fail and DELETE to corrupt the database with "IdxDelete: no matching index entry found" errors. Add decode_custom_type_registers_in_expr() that walks rewritten expression trees, decodes custom type Expr::Register nodes into temporary registers before evaluation, ensuring all paths produce consistent index keys. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

NOT NULL was checked on the encoded (stored) value, allowing "ghost NULLs" where ENCODE produces a non-NULL value but DECODE returns NULL. The user would see NULL in a NOT NULL column, violating the constraint contract. Now emit_notnulls() decodes custom type values into a temporary register before the NULL check, ensuring the user-facing value is verified. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
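
Here is a minimal Rust sketch of the difference, assuming a hypothetical DECODE that maps the stored sentinel `0` to NULL. `Option<i64>` stands in for a nullable value, and the function names are illustrative.

```rust
// Hypothetical DECODE: the stored value 0 decodes to NULL.
fn decode(stored: Option<i64>) -> Option<i64> {
    match stored {
        Some(0) => None,
        other => other,
    }
}

// Old check: looks at the encoded value, so a "ghost NULL" slips through.
fn check_not_null_stored(stored: Option<i64>) -> Result<(), &'static str> {
    stored.map(|_| ()).ok_or("NOT NULL constraint failed")
}

// New check: decode into a temporary first, then test for NULL.
fn check_not_null_decoded(stored: Option<i64>) -> Result<(), &'static str> {
    decode(stored).map(|_| ()).ok_or("NOT NULL constraint failed")
}
```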

Non-deterministic functions like random(), changes(), and last_insert_rowid() in ENCODE expressions produce different stored values for the same user input, breaking UNIQUE constraints, equality lookups, index seeks, and JOIN matching. The existing validate_type_expr() now checks is_deterministic() on resolved built-in functions and rejects them at CREATE TYPE time. External (extension) functions are excluded from the check since they default to non-deterministic but may be deterministic (e.g. uuid_blob). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
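
Here is a Rust sketch of the check, using a hypothetical `FuncInfo` descriptor in place of turso's resolved function metadata. It rejects built-in functions that are not deterministic and skips extension functions, as described above.

```rust
// Hypothetical descriptor for a function referenced in an ENCODE expression.
struct FuncInfo {
    name: &'static str,
    builtin: bool,
    deterministic: bool,
}

fn validate_encode_functions(funcs: &[FuncInfo]) -> Result<(), String> {
    for f in funcs {
        // Extension functions are skipped: they default to non-deterministic
        // even when they are not (e.g. uuid_blob).
        if f.builtin && !f.deterministic {
            return Err(format!("non-deterministic function {}() is not allowed in ENCODE", f.name));
        }
    }
    Ok(())
}
```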

ALTER TABLE ADD COLUMN with NOT NULL rejected the operation when the column had no explicit DEFAULT, even if the custom type defined one. The type-level DEFAULT was not consulted for either the NOT NULL feasibility check or the actual column default value. Now the NOT NULL check considers the type-level DEFAULT, and for NOT NULL columns without an explicit DEFAULT, the type default is propagated to the column definition so existing rows use it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

INSERT INTO t DEFAULT VALUES was not consulting type-level defaults defined via CREATE TYPE ... DEFAULT expr. When a column had no explicit column-level DEFAULT but its custom type had a DEFAULT clause, the DEFAULT VALUES path produced NULL instead of the type default value. This was inconsistent with column-list INSERT (e.g. INSERT INTO t(id) VALUES (1)) which correctly applied type-level defaults for omitted columns. Fixed both DefaultValues code paths in bind_insert() and init_source_emission() to check type-level defaults when no column default exists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
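
The precedence the fix establishes (column DEFAULT, then type DEFAULT, then NULL) fits in one line of Rust. `ColumnDef` and `default_expr` are hypothetical names, and defaults are modeled as SQL expression strings.

```rust
// Hypothetical column metadata with optional defaults as SQL expression text.
struct ColumnDef {
    column_default: Option<&'static str>,
    type_default: Option<&'static str>,
}

// Column-level DEFAULT wins, then the custom type's DEFAULT, then NULL.
fn default_expr(col: &ColumnDef) -> &'static str {
    col.column_default.or(col.type_default).unwrap_or("NULL")
}
```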

… NULL
The built-in date, time, and timestamp types used bare date()/time()/datetime() as their ENCODE expressions. These SQLite functions return NULL for invalid input rather than raising an error, which meant invalid values like 'not-a-date' were silently converted to NULL on STRICT tables, a data integrity violation. Changed the ENCODE expressions to wrap the function calls in a CASE that checks for NULL output (when input was non-NULL) and raises an ABORT error with a descriptive message. NULL input is still passed through unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
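
The new ENCODE shape can be sketched in Rust: NULL passes through, and a NULL result from a non-NULL input aborts. `sqlite_date` is a hypothetical, deliberately strict stand-in for SQLite's date() that only accepts YYYY-MM-DD. It is not how SQLite parses dates.

```rust
// Hypothetical stand-in for date(): Some(normalized) if valid, None otherwise.
fn sqlite_date(input: &str) -> Option<String> {
    let parts: Vec<&str> = input.split('-').collect();
    let digits = |p: &str| p.chars().all(|c| c.is_ascii_digit());
    if parts.len() == 3
        && parts[0].len() == 4
        && parts[1].len() == 2
        && parts[2].len() == 2
        && parts.iter().all(|p| digits(p))
    {
        Some(input.to_string())
    } else {
        None
    }
}

// NULL in, NULL out; a non-NULL input that maps to NULL raises an error.
fn encode_date(input: Option<&str>) -> Result<Option<String>, String> {
    match input {
        None => Ok(None),
        Some(s) => sqlite_date(s)
            .map(Some)
            .ok_or_else(|| format!("invalid date value: '{s}'")),
    }
}
```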

…e DDL
.dump was emitting CREATE TABLE __turso_internal_types(...) and INSERT statements for the internal metadata table. When this output was fed back into a fresh database, it failed with 'Object name reserved for internal use' because direct creation of __turso_internal_types is blocked. Now .dump emits the original CREATE TYPE statements (extracted from the sql column of __turso_internal_types) before any table DDL. The internal metadata table itself is skipped in the table dump loop. This produces a clean, restorable SQL dump that recreates custom types via the proper CREATE TYPE syntax. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
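
Here is a minimal Rust sketch of the new dump order, assuming stored types and tables are available as simple rows. `TypeRow`, `TableRow`, and `dump_statements` are hypothetical names.

```rust
// Hypothetical rows: the `sql` column of __turso_internal_types, and schema tables.
struct TypeRow {
    sql: &'static str,
}

struct TableRow {
    name: &'static str,
    sql: &'static str,
}

fn dump_statements(types: &[TypeRow], tables: &[TableRow]) -> Vec<String> {
    // CREATE TYPE statements come first so later table DDL can reference them.
    let mut out: Vec<String> = types.iter().map(|t| format!("{};", t.sql)).collect();
    for t in tables {
        // The internal metadata table is never dumped directly.
        if t.name == "__turso_internal_types" {
            continue;
        }
        out.push(format!("{};", t.sql));
    }
    out
}
```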

In the UPSERT DO UPDATE path, BEFORE UPDATE triggers received NEW registers that were already encoded (post custom type encoding). The trigger context was created with new_encoded=false, so fire_trigger's decode_trigger_registers skipped decoding. This caused NEW.column references in trigger bodies to show raw encoded values (e.g. 2000 instead of 20 for a cents type with ENCODE value * 100). Fixed by using new_after_with_override_conflict (which sets new_encoded=true) instead of new_with_override_conflict for the BEFORE trigger context in the UPSERT path. This matches the actual state of the registers at that point and lets the existing decode logic in fire_trigger handle decoding correctly. Also updated pragma-list-types test expectations to reflect the new date/time/timestamp ENCODE expressions from the Bug 30 fix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… OPERATOR '<'
Custom types with encode/decode but no OPERATOR '<' were silently producing wrong ORDER BY results: the sorter compared encoded (on-disk) values, which may have completely different ordering than the user expects. This commit makes the behavior explicit and safe. Key changes:
1. **Naked OPERATOR '<' syntax**: Types can now declare `OPERATOR '<'` without a function name, meaning "use the base type's built-in comparison on encoded values." With a function name (`OPERATOR '<' func`), the comparator transforms encoded values before comparing.
2. **Block ORDER BY** on custom type columns without any '<' operator, with a clear error: "cannot ORDER BY column 'X' of type 'Y': type does not declare OPERATOR '<'".
3. **Block CREATE INDEX** on non-orderable custom type columns (expression indexes like `CREATE INDEX idx ON t(length(val))` remain allowed).
4. **Add OPERATOR '<' to built-in types** where base type comparison on encoded values produces correct ordering: date, time, timestamp, varchar, smallint, boolean, uuid, bytea. Types without meaningful ordering (json, jsonb, inet) are left without '<'.
5. **Sort keys are always encoded values**: Sorting operates on the encoded (on-disk) representation, never decoded values. DECODE is purely a presentation layer. For deduplicated columns (where the sort key IS the result column), DECODE is applied after sorting for display.
6. **Replace test_reverse_encode/decode with string_reverse**: The two identical test functions that just reversed strings were consolidated into a single `string_reverse` scalar function, which is genuinely useful as both a function and a sort comparator.
Parser changes (parser/src/ast.rs, parser.rs, ast/fmt.rs):
- TypeOperator.func_name changed from String to Option<String>
- Parser accepts three syntaxes: naked, named, and old (type)->func
- SQL serialization conditionally emits function name
Core changes (schema.rs, order_by.rs, index.rs, expr.rs, execute.rs):
- Built-in type definitions updated with OPERATOR '<'
- ORDER BY validation added in init_order_by
- CREATE INDEX validation added in translate_create_index
- Naked operators fall through to standard comparison in expr dispatch
- string_reverse added as both scalar function and sort comparator
- All custom type sort keys suppress decode (not just types without '<')
- Post-sort decode restored for deduplicated columns
Tests: 27 new ordering tests (custom_type_ordering.sqltest) covering error cases, naked '<' with identity/monotonic/non-monotonic encodings, custom comparators, built-in types, and index verification (each sort test duplicated with/without index to ensure identical results).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
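
The ORDER BY validation and the now-optional operator function can be sketched together in Rust. `TypeOperator` with `func_name: Option<String>` mirrors the parser change described above. `TypeDef`, `SortComparator`, and `order_by_comparator` are hypothetical names, and the error text follows the message quoted in item 2.

```rust
// Mirrors the parser change: func_name is now optional.
struct TypeOperator {
    op: String,
    func_name: Option<String>,
}

// Hypothetical custom type definition holding its declared operators.
struct TypeDef {
    name: String,
    operators: Vec<TypeOperator>,
}

// How sort keys (always encoded values) are compared.
#[derive(Debug, PartialEq)]
enum SortComparator<'a> {
    // Naked OPERATOR '<': base type comparison on encoded values.
    BaseOnEncoded,
    // OPERATOR '<' func: transform encoded values with `func`, then compare.
    Function(&'a str),
}

fn order_by_comparator<'a>(column: &str, ty: &'a TypeDef) -> Result<SortComparator<'a>, String> {
    match ty.operators.iter().find(|o| o.op == "<") {
        Some(TypeOperator { func_name: None, .. }) => Ok(SortComparator::BaseOnEncoded),
        Some(TypeOperator { func_name: Some(f), .. }) => Ok(SortComparator::Function(f)),
        None => Err(format!(
            "cannot ORDER BY column '{}' of type '{}': type does not declare OPERATOR '<'",
            column, ty.name
        )),
    }
}
```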

Custom types (CREATE TYPE, DROP TYPE, built-in types, sqlite_turso_types virtual table) were unconditionally available even without the --experimental-strict flag. Since custom types only work with STRICT tables, they should be gated behind the same flag. Changes:
- Schema::with_options(enable_strict) conditionally bootstraps built-in custom types and registers the sqlite_turso_types virtual table
- CREATE TYPE and DROP TYPE return a clear error when strict is disabled
- PRAGMA list_types shows only base types (INTEGER, REAL, TEXT, BLOB, ANY) when strict is disabled; shows all types when enabled
- Custom type loading from __turso_internal_types during database open and schema refresh is skipped when strict is disabled
- Updated documentation to note the --experimental-strict requirement
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Decouple custom types from --experimental-strict: a new --experimental-custom-types flag independently gates CREATE TYPE / DROP TYPE. Fix post-rebase compilation errors from upstream API changes. Fix affinity bug where affinity_with_strict() used name-based affinity instead of respecting base affinity override for custom type columns. Add @requires-file custom_types annotations to all custom type test files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two changes in the custom-types work affect snapshot bytecode:
1. RETURNING clause (all 7 returning snapshots): emit_returning_results() now calls emit_user_facing_column_value() which allocates fresh registers and emits Copy instructions so custom-type columns can be decoded before being returned. For regular columns this is a no-op copy but the instructions are still emitted.
2. Integrity check table order (2 multi-table snapshots): adding sqlite_turso_types to the Schema tables HashMap changes the HashMap iteration order used by integrity_check when enumerating tables, resulting in t1 being visited before t2 instead of after. The check is correct regardless of order.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

glommer requested review from jussisaurio, penberg and pereman2 as code owners February 14, 2026 02:06

turso-bot bot reviewed Feb 14, 2026


github-actions bot added core optimizer translation/planning vdbe cli docs labels Feb 14, 2026

glommer force-pushed the custom-types branch 2 times, most recently from d246ce6 to a1937b0 February 15, 2026 19:32

glommer force-pushed the custom-types branch from a1937b0 to f07ef56 February 23, 2026 18:07

github-actions bot added simulator mvcc antithesis Macros Storage Sqlite3 IO Perf/Benchmarks ci-actions rust-bindings JS-Bindings Java-Bindings Python-Bindings Json vector labels Feb 23, 2026

glommer and others added 27 commits February 24, 2026 07:17

glommer force-pushed the custom-types branch from 778f1d8 to f2ce970 February 24, 2026 13:32

penberg merged commit dc4916f into tursodatabase:main Feb 24, 2026
88 checks passed

avinassh mentioned this pull request Mar 12, 2026

Ensure VACUUM INTO works with rich / custom types #5898

Open

penberg merged 64 commits into tursodatabase:main from glommer:custom-types

2 participants
