docs/table-design/storage-format.md
Lines changed: 34 additions & 20 deletions
@@ -24,38 +24,52 @@ specific language governing permissions and limitations
24
24
under the License.
25
25
-->
26
26
27
-
Doris Storage Format V3 is a major evolution from the Segment V2 format. Through metadata decoupling and encoding strategy optimization, it specifically improves performance for wide tables, complex data types (such as Variant), and cloud-native storage-compute separation scenarios.
27
+
Storage Format V3 is the successor to Segment V2. The main change: column metadata is no longer packed inside the Segment Footer, but stored in a separate area of the file. This removes the metadata bottleneck that V2 hits when tables grow to thousands of columns — a common situation with `VARIANT` subcolumns.
28
28
29
-
## Key Optimizations
29
+
## What Changed
30
30
31
31
### External Column Meta
32
-
* **Background**: In Segment V2, metadata for all columns (`ColumnMetaPB`) is stored in the Footer of the Segment file. For wide tables with thousands of columns or auto-scaling Variant scenarios, the Footer can grow to several megabytes.
33
-
* **Optimization**: V3 decouples `ColumnMetaPB` from the Footer and stores it in a separate area within the file (External Column Meta Area).
* **On-demand Loading**: Metadata can be loaded on demand from the independent area, reducing memory usage and improving cold start query performance on object storage (like S3/OSS).
32
+
33
+
In V2, every column's `ColumnMetaPB` sits in the Segment Footer. When a table has thousands of columns (or a `VARIANT` column expands into thousands of subcolumns), the Footer alone can reach several MB. Opening a Segment means loading and deserializing all of that, even if the query only touches two columns.
34
+
35
+
V3 moves `ColumnMetaPB` out of the Footer into a dedicated area in the file. The Footer keeps only lightweight pointers.
36
+
37
+

38
+
39
+
Result: the system loads a small Footer first, then fetches metadata only for the columns the query needs. On object storage (S3, OSS), this cuts cold-start latency considerably.
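
The footer-plus-pointers idea can be illustrated with a toy file layout. This is only a sketch under stated assumptions: the JSON serialization, the 4-byte length trailer, and the helper names `write_segment`, `read_footer`, and `load_column_meta` are hypothetical and do not reflect the actual Doris on-disk format or `ColumnMetaPB` encoding.

```python
import json
import struct


def write_segment(column_metas: dict) -> bytes:
    """Build a toy segment: [column meta area][footer][footer length]."""
    meta_area = bytearray()
    pointers = {}
    for name, meta in column_metas.items():
        blob = json.dumps(meta).encode("utf-8")
        # The footer stores only (offset, length) for each column's metadata.
        pointers[name] = (len(meta_area), len(blob))
        meta_area += blob
    footer = json.dumps(pointers).encode("utf-8")
    # Hypothetical trailer: footer length as a 4-byte little-endian integer.
    return bytes(meta_area) + footer + struct.pack("<I", len(footer))


def read_footer(segment: bytes) -> dict:
    """Read only the small footer, not the per-column metadata."""
    (footer_len,) = struct.unpack("<I", segment[-4:])
    return json.loads(segment[-4 - footer_len:-4])


def load_column_meta(segment: bytes, footer: dict, name: str) -> dict:
    """Fetch metadata for a single column using its footer pointer."""
    offset, length = footer[name]
    return json.loads(segment[offset:offset + length])


metas = {f"c{i}": {"type": "INT", "id": i} for i in range(1000)}
segment = write_segment(metas)
footer = read_footer(segment)
print(load_column_meta(segment, footer, "c42"))
```

In this toy layout, a query that touches one column deserializes the pointer table and one metadata blob, not all 1,000 entries.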
37
40
38
41
### Integer Type Plain Encoding
39
-
* **Optimization**: V3 defaults to `PLAIN_ENCODING` (raw binary storage) for numerical types (such as `INT`, `BIGINT`), instead of the traditional BitShuffle.
40
-
* **Benefits**: Combined with LZ4/ZSTD compression, `PLAIN_ENCODING` provides higher read throughput and lower CPU overhead. In modern high-speed IO environments, this "trading decompression for performance" strategy offers a clear advantage when scanning large volumes of data.
42
+
43
+
V3 switches the default encoding for numeric types (`INT`, `BIGINT`, etc.) from BitShuffle to `PLAIN_ENCODING` (raw binary). With LZ4 or ZSTD compression on top, this combination reads faster and uses less CPU than BitShuffle during large scans.
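
As a rough illustration of what "raw binary" means for integers, the sketch below packs 32-bit values back to back with no transformation. The little-endian byte order, the 4-byte width, and the function names are assumptions made for this example. They are not taken from the Doris implementation, and the compression layer is omitted.

```python
import struct


def plain_encode_int32(values: list) -> bytes:
    """Store each value as 4 raw little-endian bytes, back to back."""
    return struct.pack(f"<{len(values)}i", *values)


def plain_decode_int32(buf: bytes) -> list:
    """Decode the whole page; the value count follows from the byte length."""
    count = len(buf) // 4
    return list(struct.unpack(f"<{count}i", buf))


def plain_get_int32(buf: bytes, index: int) -> int:
    """Read one value directly: fixed width means offset = index * 4."""
    return struct.unpack_from("<i", buf, index * 4)[0]


page = plain_encode_int32([1, -2, 300000])
print(len(page), plain_decode_int32(page), plain_get_int32(page, 2))
```

Because every value has a fixed width, decoding amounts to a memory copy, which is where the lower CPU cost relative to a bit-transposing scheme would come from.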
41
44
42
45
### Binary Plain Encoding V2
43
-
* **Optimization**: Introduces `BINARY_PLAIN_ENCODING_V2`, using a `[length(varuint)][raw_data]` streaming layout, replacing the old format that relied on trailing offset tables.
44
-
* **Benefits**: Eliminates large trailing offset tables, making data storage more compact and significantly reducing storage consumption for string and JSONB types.
45
46
46
-
## Design Philosophy
47
-
The design philosophy of V3 can be summarized as: **"Metadata Decoupling, Encoding Simplification, and Streaming Layout"**. By reducing metadata processing bottlenecks and leveraging the high efficiency of modern CPUs in processing simple encodings, it achieves high-performance analysis under complex schemas.
47
+
V3 introduces `BINARY_PLAIN_ENCODING_V2` for strings and JSONB. The new layout uses `[length(varuint)][raw_data]` in a streaming fashion, eliminating the trailing offset table that V2 required. This makes string storage more compact.
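
The `[length(varuint)][raw_data]` layout can be sketched as follows. This assumes the varuint is a LEB128-style encoding (7 payload bits per byte, high bit set when more bytes follow). The source does not spell out the exact varint scheme, and the function names here are hypothetical.

```python
def encode_varuint(n: int) -> bytes:
    """LEB128-style unsigned varint (assumed scheme, not confirmed by the source)."""
    if n < 0:
        raise ValueError("varuint cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_page(values: list) -> bytes:
    """Write each value as [length(varuint)][raw_data], one after another."""
    out = bytearray()
    for value in values:
        out += encode_varuint(len(value))
        out += value
    return bytes(out)


def decode_page(buf: bytes) -> list:
    """Stream through the page; no trailing offset table is needed."""
    values = []
    pos = 0
    while pos < len(buf):
        length = 0
        shift = 0
        while True:
            byte = buf[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        values.append(buf[pos:pos + length])
        pos += length
    return values


strings = [b"doris", b"", b"x" * 200]
page = encode_page(strings)
print(len(page), decode_page(page) == strings)
```

Short strings cost one length byte each in this scheme, while a 200-byte string costs two.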
48
+
49
+
## Performance
50
+
51
+
The following test was run on a VARIANT table with 10,000 Segments, each containing 7,000 JSON paths — all materialized as subcolumns.
52
+
53
+

54
+
55
+
| Metric | V2 | V3 | Improvement |
56
+
|---|---:|---:|---|
57
+
| Segment open time | 65 s | 4 s | 16× faster |
58
+
| Memory during open | 60 GB | < 1 GB | > 60× less |
59
+
60
+
With V2, the system must deserialize the entire Footer (containing all column metadata) even when the query reads only a few columns. That causes massive I/O and memory waste. V3 reads a slim Footer, then loads column metadata on demand.
61
+
62
+
## When to Use V3
63
+
64
+
- Tables with 2,000+ columns or VARIANT columns expanding into many subcolumns.
65
+
- Object storage or tiered storage where metadata loading latency matters.
66
+
- Any new `VARIANT` table — V3 is always recommended.
48
67
49
-
## Use Cases
50
-
- **Wide Tables**: Tables with more than 2000 columns or long column names.
51
-
- **Semi-structured Data**: Heavy use of `VARIANT` or `JSON` types.
52
-
- **Tiered Storage/Cloud Native**: Scenarios sensitive to object storage loading latency.
53
-
- **High-performance Scanning**: Analytical tasks with extreme requirements for scan throughput.
68
+
For tables with a moderate number of columns and no VARIANT, V2 works fine. V3 helps most when the column count is large.
54
69
55
70
## Usage
56
71
57
-
### Enable When Creating a New Table
58
-
Specify `storage_format` as `V3` in the `PROPERTIES` of the `CREATE TABLE` statement:
72
+
Specify `storage_format` as `V3` in `PROPERTIES` when creating a table: