Refine storage-format and add bench test #3509
Pull request overview
Refines the documentation for Doris Storage Format V3 (Segment V3) by clarifying the key architectural change (externalized column metadata) and adding benchmark/performance guidance across English and Chinese docs for v4.x and current.
Changes:
- Rewrote the V3 vs V2 explanation to focus on external column metadata and on-demand loading.
- Added a “Performance/性能数据” section with benchmark results and a chart image.
- Updated “When to Use V3/什么时候用 V3” and simplified usage wording.
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/table-design/storage-format.md | English current doc rewrite + performance section and guidance. |
| versioned_docs/version-4.x/table-design/storage-format.md | English v4.x version kept in sync with current doc updates. |
| i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md | Chinese current doc rewrite + performance section and guidance. |
| i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md | Chinese v4.x version kept in sync with current doc updates. |
/review
Code Review Summary — PR #3509: Refine storage-format and add bench test
Goal and Implementation
The goal is to rewrite the storage-format.md doc (current + version-4.x, English + Chinese) for clarity, replace the verbose bullet-point style with prose, add a benchmark/performance section with two new images, and update the ## When to Use V3 / ## Use Cases section. The implementation satisfies the stated goal. All four expected file trees are updated consistently (docs/current EN, versioned 4.x EN, i18n current ZH, i18n 4.x ZH). Two new PNG assets are added to static/images/variant/, which is the correct location for variant-related images.
Critical Checkpoint Conclusions
1. Does the implementation satisfy the goal?
Yes. The rewrite is substantially cleaner and more direct than the original. The new Performance section with benchmark numbers adds concrete value for users evaluating V3.
2. Is the change focused?
Yes. Only the four storage-format.md files and two new images are touched. No unrelated files changed.
3. Documentation information architecture
- 3.x does not have a `storage-format.md` and it is not in the 3.x sidebar. The PR checklist marks 3.x as not applicable, which is correct.
- 2.1 similarly has no `storage-format.md`. The omission is intentional and correct.
- All inbound links to `storage-format` from `column-compression.md`, `variant-workload-guide.md`, and `variant.md` in both `docs/` and `versioned_docs/version-4.x/` continue to resolve correctly; no filenames or slugs changed.
4. Path/anchor/slug/sidebar changes?
No paths or slugs changed. The only heading changes are that `## Use Cases` was renamed to `## When to Use V3` and `## Design Philosophy` was removed, which changes the anchors generated for those headings. No sidebar entries reference these heading anchors directly, so no sidebar or redirect changes are needed.
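The anchor check above can be repeated mechanically. The following is a minimal sketch, not the tooling used for this review: `heading_anchor` only approximates how Docusaurus derives anchors (lowercase, punctuation dropped, spaces to hyphens), and both function names are hypothetical.

```python
import re
from pathlib import Path


def heading_anchor(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, hyphenate spaces.

    Assumption: this mirrors the common GitHub-style slug rules closely enough
    for plain ASCII headings; it is not the exact Docusaurus implementation.
    """
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)
    return re.sub(r"\s+", "-", text)


def find_anchor_references(root: Path, page: str, anchors: list[str]) -> list[tuple[str, str]]:
    """Return (file, link) pairs for Markdown links that target `page#anchor`."""
    hits = []
    link_re = re.compile(r"\]\(([^)\s]+)\)")  # captures the URL part of [text](url)
    for path in sorted(root.rglob("*.md")):
        for link in link_re.findall(path.read_text(encoding="utf-8")):
            target, _, fragment = link.partition("#")
            if page in target and fragment in anchors:
                hits.append((str(path.relative_to(root)), link))
    return hits
```

Calling `find_anchor_references(repo_root, "storage-format", [heading_anchor("Use Cases"), heading_anchor("Design Philosophy")])` would list any Markdown link still pointing at the old anchors; an empty list is consistent with the conclusion above. Sidebar files are not Markdown and would need a separate check.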
5. Docusaurus config / React / build scripts?
Not touched.
6. Validation
Images verified present in static/images/variant/. All four docs files updated. Relative links and front matter are unchanged.
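A presence check like the one described can be scripted. This is a minimal sketch under the assumption that it runs against a local checkout; the four paths come from the file table earlier in this conversation, while `missing_paths` is a hypothetical helper name. The image file names are not stated in this review, so they are left to the caller.

```python
from pathlib import Path

# The four documentation files listed in the "Reviewed changes" table.
STORAGE_FORMAT_DOCS = [
    "docs/table-design/storage-format.md",
    "versioned_docs/version-4.x/table-design/storage-format.md",
    "i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md",
    "i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md",
]


def missing_paths(repo_root: Path, relative_paths: list[str]) -> list[str]:
    """Return the relative paths that do not exist as regular files under repo_root."""
    return [rel for rel in relative_paths if not (repo_root / rel).is_file()]
```

The same helper can be passed the image paths under `static/images/variant/` once their names are known; an empty result means every expected file is present.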
7. Other issues found
Issue 1 (Minor — English wording ambiguity): Benchmark description sentence is grammatically misleading
In docs/table-design/storage-format.md (and the identical versioned_docs/version-4.x copy), the Performance section says:
The following test was run on a wide table with 10,000 Segments, each containing 7,000 columns.
Grammatically, "each containing 7,000 columns" modifies "Segments", implying each segment independently holds 7,000 columns — but columns belong to the table, not to individual segments. The Chinese version is clearer: "一张含 7,000 列的宽表上进行,共 10,000 个 Segment" (a wide table with 7,000 columns, with 10,000 Segments total). The English should match this phrasing to avoid confusion.
Suggested fix:
The following test was run on a wide table with 7,000 columns, containing 10,000 Segments.
or
The following test used a 7,000-column table with 10,000 Segments.
Issue 2 (Minor — feature doc completeness): No guidance on how to check or convert existing tables
The ## Usage section only shows how to set V3 on a new table at creation time. A user with an existing V2 table has no guidance on: (a) how to check what storage format a table is currently using, and (b) whether and how to convert an existing table to V3. This was also missing in the original, but given the PR explicitly rewrites the Usage section, it is an opportunity to address this gap. This is consistent with AGENTS.md's Feature Documentation Standard: "Error handling and caveats: note known failure modes, behavioral edge cases".
If V3 cannot be applied to existing tables (only new ones), that constraint should be stated explicitly. If it can be changed via ALTER TABLE … SET ("storage_format" = "V3"), a note or example would help users.
EN/ZH Consistency
The English and Chinese texts are substantively consistent. Section headings, benchmark numbers, use-case bullet points, and code examples all match. No inconsistencies found.
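One part of this consistency check, matching benchmark numbers, lends itself to a quick script. The sketch below is illustrative only and was not part of the review; `numeric_tokens` and `numeric_mismatch` are hypothetical names, and the comparison covers digits only, not wording.

```python
import re


def numeric_tokens(text: str) -> set[str]:
    """Collect numbers such as 7,000 or 10000, normalised without thousands separators."""
    return {m.replace(",", "") for m in re.findall(r"\d[\d,]*(?:\.\d+)?", text)}


def numeric_mismatch(en: str, zh: str) -> tuple[set[str], set[str]]:
    """Return (numbers only in the English text, numbers only in the Chinese text)."""
    en_numbers, zh_numbers = numeric_tokens(en), numeric_tokens(zh)
    return en_numbers - zh_numbers, zh_numbers - en_numbers
```

Applied to the two benchmark sentences quoted under Issue 1, both sets come back empty, since each mentions 7,000 and 10,000. A non-empty set would point at a figure that appears in only one locale.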
Front Matter
The description SEO field is absent from the front matter in all four files — but this was already absent before this PR and is therefore a pre-existing issue, not introduced here.
Overall Assessment
The PR is a clear improvement over the original. The two issues above are minor: Issue 1 is a one-sentence fix, Issue 2 is a pre-existing gap that was not made worse. The structural changes, image additions, and locale coverage are all correct.