-
Notifications
You must be signed in to change notification settings - Fork 860
Open
Labels
C-bugCategory: something isn't workingCategory: something isn't workingagent-issueAgent-created issue markerAgent-created issue marker
Description
Search before asking
- I had searched in the issues and found no similar issues.
Version
1.2.890
What's Wrong?
1. Description
Currently, Databend treats the hex literal X'...' as a PostgreSQL Hex Integer Literal (PGLiteralHex) in the lexer/parser, which is then interpreted as an Integer.
// Current problematic regex in Databend lexer
#[regex(r"[xX]'[a-fA-F0-9]*'")]
PGLiteralHex,However, this implementation deviates from the SQL Standard, confuses the behavior of PostgreSQL, and creates functional limitations for binary data.
2. Evidence of Non-Compliance
A. SQL Standard (ISO/IEC 9075)
According to the SQL:1999 (and subsequent versions like SQL:2016) specification, Section 5.3 <literal>, the syntax X'<hexit>...' is explicitly defined as a <binary string literal>.
- It is intended to represent
BINARYorBLOBdata types. - It is not defined as an integer literal.
B. Misconception of PostgreSQL Behavior
The naming PGLiteralHex suggests alignment with PostgreSQL, but PG's behavior is different:
- In PostgreSQL,
X'0A'results in a Bit String (BIT VARYING), not anInteger. SELECT X'0A' + 1;fails in PG because a Bit String cannot be implicitly used as an Integer.- PostgreSQL 16+ introduced
0x...for hex integers, butX'...'remains a Bit String.
3. Comparison of Database Implementations
The following table shows that Databend's current approach is an outlier compared to major SQL engines:
| Implementation Group | Databases | Result Type of X'0A' |
SQL Standard Compliant? |
|---|---|---|---|
| Standard-Compliant | DuckDB, Trino, Snowflake, MySQL, SQLite | Binary / BLOB (Byte Stream) | Yes |
| Dialect (Bit-string) | PostgreSQL | BIT VARYING (Bit Array) | No |
| Non-Standard | SQL Server | Syntax Error (Only supports 0x) |
No |
| Databend (Current) | Databend | Integer | No |
4. Impact & Risks
- Integer Overflow: Binary literals (like SHA-256 hashes
X'df4... ') frequently exceed 64 bits. Forcing them into anIntegertype causes overflow errors where they should have been validBINARYdata. - Migration Barrier: Users migrating from Snowflake, MySQL, or ClickHouse expect
X'...'to work for binary ingestion. - Inconsistent Logic: Databend lacks a standard-compliant way to input
BINARYliterals ifX'...'is occupied by theIntegertype.
5. Suggested Fix
- Refactor Lexer: Map
X'...'literals to aBinarytype instead ofPGLiteralHex(Integer).
Reference
- ISO/IEC 9075-2 Section 5.3
<binary string literal> - PostgreSQL Docs: Bit String Constants
How to Reproduce?
root@localhost:8000/default/default> select X'ABCD';
SELECT 43981
╭────────╮
│ 43981 │
│ UInt16 │
├────────┤
│ 43981 │
╰────────╯
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
C-bugCategory: something isn't workingCategory: something isn't workingagent-issueAgent-created issue markerAgent-created issue marker