
refine storage-format and add bench test #3509

Merged
eldenmoon merged 4 commits into apache:master from eldenmoon:refine_storage_format
Mar 31, 2026


@eldenmoon (Member) commented Mar 31, 2026

Versions

  • dev
  • 4.x
  • 3.x
  • 2.1

Languages

  • Chinese
  • English

Docs Checklist

  • Checked by AI
  • Test Cases Built

Copilot AI review requested due to automatic review settings March 31, 2026 11:58
@eldenmoon force-pushed the refine_storage_format branch from bd701d8 to 7b1305f on March 31, 2026 at 12:02
Copilot AI (Contributor) left a comment


Pull request overview

Refines the documentation for Doris Storage Format V3 (Segment V3) by clarifying the key architectural change (externalized column metadata) and adding benchmark/performance guidance across English and Chinese docs for v4.x and current.

Changes:

  • Rewrote the V3 vs V2 explanation to focus on external column metadata and on-demand loading.
  • Added a “Performance” section (Chinese heading: “性能数据”, “Performance Data”) with benchmark results and a chart image.
  • Updated the “When to Use V3” section (Chinese heading: “什么时候用 V3”) and simplified the usage wording.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 2 comments.

Files and descriptions:

  • docs/table-design/storage-format.md: English current doc rewrite, plus performance section and guidance.
  • versioned_docs/version-4.x/table-design/storage-format.md: English v4.x version kept in sync with the current doc updates.
  • i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md: Chinese current doc rewrite, plus performance section and guidance.
  • i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md: Chinese v4.x version kept in sync with the current doc updates.


Comment thread i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md Outdated
@eldenmoon force-pushed the refine_storage_format branch from 7b1305f to 2ff072d on March 31, 2026 at 12:04
@eldenmoon (Member, Author)

/review

@github-actions (Bot) left a comment


Code Review Summary — PR #3509: Refine storage-format and add bench test

Goal and Implementation

The goal is to rewrite the storage-format.md doc (current + version-4.x, English + Chinese) for clarity, replace the verbose bullet-point style with prose, add a benchmark/performance section with two new images, and update the ## When to Use V3 / ## Use Cases section. The implementation satisfies the stated goal. All four expected file trees are updated consistently (docs/current EN, versioned 4.x EN, i18n current ZH, i18n 4.x ZH). Two new PNG assets are added to static/images/variant/, which is the correct location for variant-related images.

Critical Checkpoint Conclusions

1. Does the implementation satisfy the goal?
Yes. The rewrite is substantially cleaner and more direct than the original. The new Performance section with benchmark numbers adds concrete value for users evaluating V3.

2. Is the change focused?
Yes. Only the four storage-format.md files and two new images are touched. No unrelated files changed.

3. Documentation information architecture

  • 3.x does not have a storage-format.md and it is not in the 3.x sidebar. The PR checklist marks 3.x as not applicable, which is correct.
  • 2.1 similarly has no storage-format.md. The omission is intentional and correct.
  • All inbound links to storage-format from column-compression.md, variant-workload-guide.md, and variant.md in both docs/ and versioned_docs/version-4.x/ continue to resolve correctly — no filenames or slugs changed.

4. Path/anchor/slug/sidebar changes?
No paths, slugs, or anchors changed. The only heading restructuring is renaming ## Use Cases to ## When to Use V3 and removing ## Design Philosophy. No sidebar entries reference these heading anchors directly, so no sidebar or redirect changes are needed.

5. Docusaurus config / React / build scripts?
Not touched.

6. Validation
Images verified present in static/images/variant/. All four docs files updated. Relative links and front matter are unchanged.

7. Other issues found

Issue 1 (Minor — English wording ambiguity): Benchmark description sentence is grammatically misleading

In docs/table-design/storage-format.md (and the identical versioned_docs/version-4.x copy), the Performance section says:

The following test was run on a wide table with 10,000 Segments, each containing 7,000 columns.

Grammatically, "each containing 7,000 columns" modifies "Segments", implying each segment independently holds 7,000 columns — but columns belong to the table, not to individual segments. The Chinese version is clearer: "一张含 7,000 列的宽表上进行,共 10,000 个 Segment" (a wide table with 7,000 columns, with 10,000 Segments total). The English should match this phrasing to avoid confusion.

Suggested fix:

The following test was run on a wide table with 7,000 columns, containing 10,000 Segments.

or

The following test used a 7,000-column table with 10,000 Segments.

Issue 2 (Minor — feature doc completeness): No guidance on how to check or convert existing tables

The ## Usage section only shows how to set V3 on a new table at creation time. A user with an existing V2 table has no guidance on: (a) how to check what storage format a table is currently using, and (b) whether and how to convert an existing table to V3. This was also missing in the original, but given the PR explicitly rewrites the Usage section, it is an opportunity to address this gap. This is consistent with AGENTS.md's Feature Documentation Standard: "Error handling and caveats: note known failure modes, behavioral edge cases".

If V3 cannot be applied to existing tables (only new ones), that constraint should be stated explicitly. If it can be changed via ALTER TABLE … SET ("storage_format" = "V3"), a note or example would help users.

EN/ZH Consistency

The English and Chinese texts are substantively consistent. Section headings, benchmark numbers, use-case bullet points, and code examples all match. No inconsistencies found.

Front Matter

The description SEO field is absent from the front matter in all four files — but this was already absent before this PR and is therefore a pre-existing issue, not introduced here.

Overall Assessment

The PR is a clear improvement over the original. The two issues above are minor: Issue 1 is a one-sentence fix, Issue 2 is a pre-existing gap that was not made worse. The structural changes, image additions, and locale coverage are all correct.

Comment thread docs/table-design/storage-format.md
Comment thread versioned_docs/version-4.x/table-design/storage-format.md
Comment thread docs/table-design/storage-format.md
Comment thread versioned_docs/version-4.x/table-design/storage-format.md
@eldenmoon merged commit 04123f1 into apache:master on Mar 31, 2026
1 check passed

2 participants