apache
diff --git a/docs/table-design/storage-format.md b/docs/table-design/storage-format.md
Lines changed: 34 additions & 20 deletions
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md
Lines changed: 38 additions & 23 deletions
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md
Lines changed: 38 additions & 23 deletions
diff --git a/static/images/variant/storage-format-v3-benchmark.png b/static/images/variant/storage-format-v3-benchmark.png
335 KB
diff --git a/static/images/variant/storage-format-v3-layout.png b/static/images/variant/storage-format-v3-layout.png
371 KB
@@ -24,38 +24,52 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Doris Storage Format V3 is a major evolution from the Segment V2 format. Through metadata decoupling and encoding strategy optimization, it specifically improves performance for wide tables, complex data types (such as Variant), and cloud-native storage-compute separation scenarios.
+Storage Format V3 is the successor to Segment V2. The main change: column metadata is no longer packed inside the Segment Footer, but stored in a separate area of the file. This removes the metadata bottleneck that V2 hits when tables grow to thousands of columns — a common situation with `VARIANT` subcolumns.
 
-## Key Optimizations
+## What Changed
 
 ### External Column Meta
-*   **Background**: In Segment V2, metadata for all columns (`ColumnMetaPB`) is stored in the Footer of the Segment file. For wide tables with thousands of columns or auto-scaling Variant scenarios, the Footer can grow to several megabytes.
-*   **Optimization**: V3 decouples `ColumnMetaPB` from the Footer and stores it in a separate area within the file (External Column Meta Area).
-*   **Benefits**:
-    *   **Ultra-fast Metadata Loading**: Significantly reduces Segment Footer size, speeding up initial file opening.
-    *   **On-demand Loading**: Metadata can be loaded on demand from the independent area, reducing memory usage and improving cold start query performance on object storage (like S3/OSS).
+
+In V2, every column's `ColumnMetaPB` sits in the Segment Footer. When a table has thousands of columns (or a `VARIANT` column expands into thousands of subcolumns), the Footer alone can reach several MB. Opening a Segment means loading and deserializing all of that, even if the query only touches two columns.
+
+V3 moves `ColumnMetaPB` out of the Footer into a dedicated area in the file. The Footer keeps only lightweight pointers.
+
+![Storage Format V2 vs V3 — Segment File Layout](/images/variant/storage-format-v3-layout.png)
+
+Result: the system loads a small Footer first, then fetches metadata only for the columns the query needs. On object storage (S3, OSS), this cuts cold-start latency considerably.
 
 ### Integer Type Plain Encoding
-*   **Optimization**: V3 defaults to `PLAIN_ENCODING` (raw binary storage) for numerical types (such as `INT`, `BIGINT`), instead of the traditional BitShuffle.
-*   **Benefits**: Combined with LZ4/ZSTD compression, `PLAIN_ENCODING` provides higher read throughput and lower CPU overhead. In modern high-speed IO environments, this "trading decompression for performance" strategy offers a clear advantage when scanning large volumes of data.
+
+V3 switches the default encoding for numeric types (`INT`, `BIGINT`, etc.) from BitShuffle to `PLAIN_ENCODING` (raw binary). With LZ4 or ZSTD compression on top, this combination reads faster and uses less CPU than BitShuffle during large scans.
 
 ### Binary Plain Encoding V2
-*   **Optimization**: Introduces `BINARY_PLAIN_ENCODING_V2`, using a `[length(varuint)][raw_data]` streaming layout, replacing the old format that relied on trailing offset tables.
-*   **Benefits**: Eliminates large trailing offset tables, making data storage more compact and significantly reducing storage consumption for string and JSONB types.
 
-## Design Philosophy
-The design philosophy of V3 can be summarized as: **"Metadata Decoupling, Encoding Simplification, and Streaming Layout"**. By reducing metadata processing bottlenecks and leveraging the high efficiency of modern CPUs in processing simple encodings, it achieves high-performance analysis under complex schemas.
+V3 introduces `BINARY_PLAIN_ENCODING_V2` for strings and JSONB. The new layout uses `[length(varuint)][raw_data]` in a streaming fashion, eliminating the trailing offset table that V2 required. This makes string storage more compact.
+
+## Performance
+
+The following test was run on a VARIANT table with 10,000 Segments, each containing 7,000 JSON paths — all materialized as subcolumns.
+
+![Storage Format V3 — Metadata Open Efficiency](/images/variant/storage-format-v3-benchmark.png)
+
+| Metric | V2 | V3 | Improvement |
+|---|---:|---:|---|
+| Segment open time | 65 s | 4 s | 16× faster |
+| Memory during open | 60 GB | < 1 GB | 60× less |
+
+With V2, the system must deserialize the entire Footer (containing all column metadata) even when the query reads only a few columns. That causes massive I/O and memory waste. V3 reads a slim Footer, then loads column metadata on demand.
+
+## When to Use V3
+
+- Tables with 2,000+ columns or VARIANT columns expanding into many subcolumns.
+- Object storage or tiered storage where metadata loading latency matters.
+- Any new `VARIANT` table — V3 is always recommended.
 
-## Use Cases
-- **Wide Tables**: Tables with more than 2000 columns or long column names.
-- **Semi-structured Data**: Heavy use of `VARIANT` or `JSON` types.
-- **Tiered Storage/Cloud Native**: Scenarios sensitive to object storage loading latency.
-- **High-performance Scanning**: Analytical tasks with extreme requirements for scan throughput.
+For tables with a moderate number of columns and no VARIANT, V2 works fine. V3 helps most when the column count is large.
 
 ## Usage
 
-### Enable When Creating a New Table
-Specify `storage_format` as `V3` in the `PROPERTIES` of the `CREATE TABLE` statement:
+Specify `storage_format` as `V3` in `PROPERTIES` when creating a table:
 
 ```sql
 CREATE TABLE table_v3 (
 
@@ -24,38 +24,53 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Apache Doris 存储格式 V3 是在 Segment V2 格式基础上进行的重大演进。它通过元数据解耦与编码策略优化，专门针对大宽表、复杂数据类型（如 Variant）以及云原生存算分离场景提升性能。
+存储格式 V3 是 Segment V2 的继任者。核心变化：列元数据不再打包在 Segment Footer 中，而是存储到文件内的独立区域。这去掉了 V2 在列数达到几千时遇到的元数据瓶颈——在 `VARIANT` 子列自动扩展后，这是常见场景。
 
-## 核心优化点
+## 改了什么
 
-### 外部列元数据 (External Column Meta)
-*   **优化背景**：在 Segment V2 中，所有列的元数据（`ColumnMetaPB`）都存储在 Segment 文件的 Footer 中。对于拥有数千列的大宽表或自动扩容的 Variant 场景，Footer 可能会膨胀到几 MB。
-*   **优化思路**：V3 将 `ColumnMetaPB` 从 Footer 中剥离，转而存储在文件内的独立区域（External Column Meta Area）。
-*   **收益**：
-    *   **极速元数据加载**：显著减小 Segment Footer 体积，加快文件初次打开速度。
-    *   **按需加载**：元数据可以按需从独立区域加载，降低内存占用，提升对象存储（如 S3/OSS）上的冷启动查询性能。
+### 外部列元数据（External Column Meta）
 
-### 数值类型 Plain 编码模式 (Integer Type Plain Encoding)
-*   **优化思路**：V3 默认将数值类型（如 `INT`, `BIGINT`）切换为 `PLAIN_ENCODING`（原始二进制存储），而非传统的 BitShuffle。
-*   **收益**：配合 LZ4/ZSTD 压缩时，`PLAIN_ENCODING` 提供了更高的读取吞吐量和更低的 CPU 开销。在现代高速 IO 环境下，这种“解压换性能”的策略在扫描大体量数据时优势明显。
+V2 中，所有列的 `ColumnMetaPB` 都放在 Segment Footer 里。当表有几千列（或一个 `VARIANT` 列展开为几千个子列）时，Footer 可以膨胀到几 MB。打开一个 Segment 就要加载和反序列化全部元数据，即使查询只需读两个列。
 
-### 二进制 Plain 编码 V2 (Binary Plain Encoding V2)
-*   **优化思路**：引入 `BINARY_PLAIN_ENCODING_V2`，采用 `[长度(varuint)][原始数据]` 的流式布局，取代了依赖末尾偏移表（Offsets）的旧格式。
-*   **收益**：消除了末尾庞大的偏移表，数据存储更加紧凑，有效降低了字符串和 JSONB 类型的存储空间占用。
+V3 将 `ColumnMetaPB` 从 Footer 移到文件内的独立区域，Footer 只保留轻量指针。
 
-## 设计哲学
-V3 的设计哲学可以总结为：**“元数据解耦、编码简化、流式布局”**。通过减少元数据处理瓶颈和利用现代 CPU 对简单编码的高处理效率，实现在复杂模式下的高性能分析。
+![存储格式 V2 vs V3 — Segment 文件布局](/images/variant/storage-format-v3-layout.png)
 
-## 使用场景
-- **大宽表**：字段数量超过 2000 个以上，或字段名冗长。
-- **半结构化数据**：大量使用 `VARIANT`， 且物化列数超过2000列。
-- **冷热分离/云原生**：对对象存储加载延迟敏感的场景。
-- **高性能扫描**：对 Scan 吞吐量有极致要求的分析任务。
+结果：系统先加载一个很小的 Footer，再按需拉取查询所需列的元数据。在对象存储（S3、OSS）上，冷启动延迟大幅降低。
+
+### 数值类型 Plain 编码
+
+V3 将数值类型（`INT`、`BIGINT` 等）的默认编码从 BitShuffle 换成 `PLAIN_ENCODING`（原始二进制存储）。配合 LZ4 或 ZSTD 压缩，读取速度更快、CPU 开销更低，在大批量扫描时优势明显。
+
+### 二进制 Plain 编码 V2
+
+V3 为字符串和 JSONB 引入 `BINARY_PLAIN_ENCODING_V2`。新布局采用 `[长度(varuint)][原始数据]` 流式结构，去掉了 V2 需要的末尾偏移表，存储更紧凑。
+
+## 性能数据
+
+以下测试在一张 VARIANT 表上进行，共 10,000 个 Segment，每个 Segment 包含 7,000 个 JSON Path，全部物化为子列。
+
+![存储格式 V3 — 元数据打开效率](/images/variant/storage-format-v3-benchmark.png)
+
+| 指标 | V2 | V3 | 提升 |
+|---|---:|---:|---|
+| Segment 打开时间 | 65 s | 4 s | 快 16 倍 |
+| 打开时内存占用 | 60 GB | < 1 GB | 降低 60 倍 |
+
+V2 必须反序列化整个 Footer（包含全部列元数据），即使查询只读几列，也会产生大量无效 I/O 和内存浪费。V3 只读一个精简 Footer，再按需加载列元数据。
+
+## 什么时候用 V3
+
+- 表有 2,000 列以上，或 VARIANT 列展开了大量子列。
+- 使用对象存储或分层存储，元数据加载延迟敏感。
+- 新建的 `VARIANT` 表——始终建议开启 V3。
+
+列数不多、不使用 VARIANT 的普通表，V2 也够用。V3 在列数量大的场景收益最明显。
 
 ## 使用方式
 
-### 创建新表时启用
-在建表语句的 `PROPERTIES` 中指定 `storage_format` 为 `V3`：
+建表时在 `PROPERTIES` 中指定 `storage_format` 为 `V3`：
+
 ```sql
 CREATE TABLE table_v3 (
     id BIGINT,
 
@@ -24,38 +24,53 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Apache Doris 存储格式 V3 是在 Segment V2 格式基础上进行的重大演进。它通过元数据解耦与编码策略优化，专门针对大宽表、复杂数据类型（如 Variant）以及云原生存算分离场景提升性能。
+存储格式 V3 是 Segment V2 的继任者。核心变化：列元数据不再打包在 Segment Footer 中，而是存储到文件内的独立区域。这去掉了 V2 在列数达到几千时遇到的元数据瓶颈——在 `VARIANT` 子列自动扩展后，这是常见场景。
 
-## 核心优化点
+## 改了什么
 
-### 外部列元数据 (External Column Meta)
-*   **优化背景**：在 Segment V2 中，所有列的元数据（`ColumnMetaPB`）都存储在 Segment 文件的 Footer 中。对于拥有数千列的大宽表或自动扩容的 Variant 场景，Footer 可能会膨胀到几 MB。
-*   **优化思路**：V3 将 `ColumnMetaPB` 从 Footer 中剥离，转而存储在文件内的独立区域（External Column Meta Area）。
-*   **收益**：
-    *   **极速元数据加载**：显著减小 Segment Footer 体积，加快文件初次打开速度。
-    *   **按需加载**：元数据可以按需从独立区域加载，降低内存占用，提升对象存储（如 S3/OSS）上的冷启动查询性能。
+### 外部列元数据（External Column Meta）
 
-### 数值类型 Plain 编码模式 (Integer Type Plain Encoding)
-*   **优化思路**：V3 默认将数值类型（如 `INT`, `BIGINT`）切换为 `PLAIN_ENCODING`（原始二进制存储），而非传统的 BitShuffle。
-*   **收益**：配合 LZ4/ZSTD 压缩时，`PLAIN_ENCODING` 提供了更高的读取吞吐量和更低的 CPU 开销。在现代高速 IO 环境下，这种“解压换性能”的策略在扫描大体量数据时优势明显。
+V2 中，所有列的 `ColumnMetaPB` 都放在 Segment Footer 里。当表有几千列（或一个 `VARIANT` 列展开为几千个子列）时，Footer 可以膨胀到几 MB。打开一个 Segment 就要加载和反序列化全部元数据，即使查询只需读两个列。
 
-### 二进制 Plain 编码 V2 (Binary Plain Encoding V2)
-*   **优化思路**：引入 `BINARY_PLAIN_ENCODING_V2`，采用 `[长度(varuint)][原始数据]` 的流式布局，取代了依赖末尾偏移表（Offsets）的旧格式。
-*   **收益**：消除了末尾庞大的偏移表，数据存储更加紧凑，有效降低了字符串和 JSONB 类型的存储空间占用。
+V3 将 `ColumnMetaPB` 从 Footer 移到文件内的独立区域，Footer 只保留轻量指针。
 
-## 设计哲学
-V3 的设计哲学可以总结为：**“元数据解耦、编码简化、流式布局”**。通过减少元数据处理瓶颈和利用现代 CPU 对简单编码的高处理效率，实现在复杂模式下的高性能分析。
+![存储格式 V2 vs V3 — Segment 文件布局](/images/variant/storage-format-v3-layout.png)
 
-## 使用场景
-- **大宽表**：字段数量超过 2000 个以上，或字段名冗长。
-- **半结构化数据**：大量使用 `VARIANT`， 且物化列数超过2000列。
-- **冷热分离/云原生**：对对象存储加载延迟敏感的场景。
-- **高性能扫描**：对 Scan 吞吐量有极致要求的分析任务。
+结果：系统先加载一个很小的 Footer，再按需拉取查询所需列的元数据。在对象存储（S3、OSS）上，冷启动延迟大幅降低。
+
+### 数值类型 Plain 编码
+
+V3 将数值类型（`INT`、`BIGINT` 等）的默认编码从 BitShuffle 换成 `PLAIN_ENCODING`（原始二进制存储）。配合 LZ4 或 ZSTD 压缩，读取速度更快、CPU 开销更低，在大批量扫描时优势明显。
+
+### 二进制 Plain 编码 V2
+
+V3 为字符串和 JSONB 引入 `BINARY_PLAIN_ENCODING_V2`。新布局采用 `[长度(varuint)][原始数据]` 流式结构，去掉了 V2 需要的末尾偏移表，存储更紧凑。
+
+## 性能数据
+
+以下测试在一张 VARIANT 表上进行，共 10,000 个 Segment，每个 Segment 包含 7,000 个 JSON Path，全部物化为子列。
+
+![存储格式 V3 — 元数据打开效率](/images/variant/storage-format-v3-benchmark.png)
+
+| 指标 | V2 | V3 | 提升 |
+|---|---:|---:|---|
+| Segment 打开时间 | 65 s | 4 s | 快 16 倍 |
+| 打开时内存占用 | 60 GB | < 1 GB | 降低 60 倍 |
+
+V2 必须反序列化整个 Footer（包含全部列元数据），即使查询只读几列，也会产生大量无效 I/O 和内存浪费。V3 只读一个精简 Footer，再按需加载列元数据。
+
+## 什么时候用 V3
+
+- 表有 2,000 列以上，或 VARIANT 列展开了大量子列。
+- 使用对象存储或分层存储，元数据加载延迟敏感。
+- 新建的 `VARIANT` 表——始终建议开启 V3。
+
+列数不多、不使用 VARIANT 的普通表，V2 也够用。V3 在列数量大的场景收益最明显。
 
 ## 使用方式
 
-### 创建新表时启用
-在建表语句的 `PROPERTIES` 中指定 `storage_format` 为 `V3`：
+建表时在 `PROPERTIES` 中指定 `storage_format` 为 `V3`：
+
 ```sql
 CREATE TABLE table_v3 (
     id BIGINT,