Skip to content

[fix](fe) Fix nested column pruning OFFSET dedup crash on array<struct>#62631

Open
englefly wants to merge 4 commits intoapache:masterfrom
englefly:fix-length-str-offset
Open

[fix](fe) Fix nested column pruning OFFSET dedup crash on array<struct>#62631
englefly wants to merge 4 commits intoapache:masterfrom
englefly:fix-length-str-offset

Conversation

@englefly
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #62205

Problem Summary:

PR #62205 introduced the length(str) OFFSET optimization for nested column pruning. When length() is applied to a string-like nested field, the pruner marks the access path with an OFFSET suffix so BE can read only the offset array instead of full element data.

However, the OFFSET path dedup logic operates at slot granularity: if ANY non-OFFSET path exists for the same slot, ALL OFFSET paths are stripped. This is incorrect for array<struct<...>> columns where different struct fields have independent access patterns.

Bug 1 (crash): Given a query like:

SELECT array_match_all(x -> length(struct_element(x, 'str_field')) > 0, arr),
       struct_element(element_at(arr, 1), 'int_field')
FROM t

The int_field non-OFFSET path causes the str_field OFFSET path to be stripped. BE never reads str_field data → crash or wrong results.

Fix: Replace slot-level OFFSET dedup with per-field bidirectional prefix matching. An OFFSET path P+["OFFSET"] is only stripped when a non-OFFSET path Q shares a prefix relationship with P (covering the same container or ancestor), not when Q accesses a sibling struct field.

Bug 2 (wrong pruning): When isStringOffsetOnly=true AND accessPartialChild=true (e.g., cardinality(arr) + arr[*].f1), pruneDataType() returns the full type instead of pruning unused fields.

Fix: Add !accessPartialChild guard to the isStringOffsetOnly check.

Release note

Fix a BE crash caused by nested column pruning incorrectly stripping OFFSET access paths for array columns when different struct fields have independent access patterns (e.g., length() on one field + direct access on another).

Check List (For Author)

  • Test: Regression test
  • Behavior changed: No
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#62205

Problem Summary:

PR apache#62205 introduced the `length(str)` OFFSET optimization for nested column
pruning. When `length()` is applied to a string-like nested field, the pruner
marks the access path with an OFFSET suffix so BE can read only the offset
array instead of full element data.

However, the OFFSET path dedup logic operates at **slot granularity**: if ANY
non-OFFSET path exists for the same slot, ALL OFFSET paths are stripped. This
is incorrect for `array<struct<...>>` columns where different struct fields
have independent access patterns.

**Bug 1 (crash):** Given a query like:
```sql
SELECT array_match_all(x -> length(struct_element(x, 'str_field')) > 0, arr),
       struct_element(element_at(arr, 1), 'int_field')
FROM t
```
The `int_field` non-OFFSET path causes the `str_field` OFFSET path to be
stripped. BE never reads `str_field` data → crash or wrong results.

**Fix:** Replace slot-level OFFSET dedup with per-field bidirectional prefix
matching. An OFFSET path `P+["OFFSET"]` is only stripped when a non-OFFSET
path Q shares a prefix relationship with P (covering the same container or
ancestor), not when Q accesses a sibling struct field.

**Bug 2 (wrong pruning):** When `isStringOffsetOnly=true` AND
`accessPartialChild=true` (e.g., `cardinality(arr) + arr[*].f1`),
`pruneDataType()` returns the full type instead of pruning unused fields.

**Fix:** Add `!accessPartialChild` guard to the `isStringOffsetOnly` check.

### Release note

Fix a BE crash caused by nested column pruning incorrectly stripping OFFSET
access paths for array<struct> columns when different struct fields have
independent access patterns (e.g., length() on one field + direct access on
another).

### Check List (For Author)

- Test: Regression test
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@englefly
Copy link
Copy Markdown
Contributor Author

/review

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocking issue remains in the new OFFSET dedup logic.

Critical checkpoint conclusions:

  • Goal of the current task: partially achieved. The array sibling-field case from the PR description is covered, but map value accesses that are spelled as * vs VALUES are still treated as unrelated, so OFFSET stripping is not correct end-to-end.
  • Scope/focus: the patch is small and localized.
  • Concurrency: no concurrency-sensitive code is involved here.
  • Lifecycle/static init: no special lifecycle concerns.
  • Config/compatibility/persistence/FE-BE protocol: no new config, compatibility, or persistence concerns.
  • Parallel code paths: not fully handled. Map value access has semantically equivalent encodings (* and VALUES), and the new raw-prefix comparison only handles one spelling.
  • Special conditional checks: the new minLen prefix test is too weak because it compares raw path tokens instead of normalized semantic coverage.
  • Tests: the added regression covers the array crash path, but I did not find coverage for the still-broken map overlap case, e.g. length(map_col['a']) together with map_values(map_col)[1].
  • Test result changes: the new .out file looks consistent with the added SQL case.
  • Observability: no additional logging or metrics are needed for this optimizer fix.
  • Transaction/persistence/data writes: not applicable.
  • Performance: no material concern in this change.
  • Other issues: none beyond the blocking bug below.

}
int minLen = Math.min(nonOffset.size(), prefix.size());
if (minLen == 0
|| nonOffset.subList(0, minLen).equals(prefix.subList(0, minLen))) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raw-prefix comparison misses overlapping map-value accesses that are spelled with different path tokens. length(map_col['a']) is collected as [map_col, *, OFFSET], but map_values(map_col)[1] becomes [map_col, VALUES]. Those two paths still target the same value container, yet * and VALUES are not prefix-equal here, so the OFFSET path survives.

That leaves both val_col.OFFSET and val_col on the BE side. MapFileColumnIterator::set_access_paths() forwards both to the value iterator, and StringFileColumnIterator::set_access_paths() switches to OFFSET_ONLY as soon as it sees the first one, so the full-value consumer reads zero-filled strings instead of the real payload.

A query like select length(map_col['a']), map_values(map_col)[1] from ... should still suppress the OFFSET path. The dedup needs to normalize the map aliases (* vs VALUES/KEYS) or compare semantic container coverage instead of raw string prefixes.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 31.82% (7/22) 🎉
Increment coverage report
Complete coverage report

…LUES tokens

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:
When `length(map_col['a'])` and `map_values(map_col)[1]` appear in the same
query, the OFFSET-only path `[map_col, *, OFFSET]` was not suppressed by the
full-value path `[map_col, VALUES]` because the prefix comparison used strict
string equality. `*` (element-at access) and `VALUES` (map_values access) both
target the same value container but were treated as different tokens, so the
OFFSET path incorrectly survived deduplication. On BE, StringFileColumnIterator
would switch to OFFSET_ONLY mode and the full-value consumer would read
zero-filled strings instead of the real payload.

The fix introduces `prefixSemanticEquals()` which treats `*` and `VALUES` as
equivalent during the prefix comparison, since both cover the map value container.

### Release note

Fix incorrect OFFSET path survival when combining element_at and map_values
access on the same map column, which could cause zero-filled string reads on BE.

### Check List (For Author)

- Test: Regression test
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@englefly
Copy link
Copy Markdown
Contributor Author

/review

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  • Blocking: fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java
    The new * vs VALUES semantic equivalence is not safe in both directions. [map_col, *] is not just value access; it is also the FE signal that BE must read map keys so element_at(map_col, key) can locate the entry. If a query mixes length(map_col['a']) with map_values(map_col)[1], the new dedup drops [map_col, *, OFFSET] and keeps only [map_col, VALUES]. In BE, MapFileColumnIterator::set_access_paths() then marks the key iterator SKIP_READING, so the element_at(map_col, 'a') inside length() can no longer match the key and returns NULL/defaults. The new EXPLAIN-only regression test for this case therefore encodes the broken behavior instead of guarding against it.

Critical Checkpoints

  • Goal: The array dedup fix and the !accessPartialChild guard look correct, but the map */VALUES change does not accomplish the goal safely because it regresses mixed element_at + map_values queries.
  • Scope/minimality: The patch is small and focused.
  • Concurrency: Not involved.
  • Lifecycle/static init: Not involved.
  • Config: None.
  • Compatibility / FE-BE contract: Affected. The new equivalence violates the existing access-path contract in BE, where * implies key + value access and VALUES implies value-only access.
  • Parallel code paths: Not fully handled. Any path that uses element_at(map, key) still needs keys even when another expression reads only VALUES.
  • Special conditions: The new suppression condition is missing the required asymmetry between * and VALUES.
  • Test coverage: Good coverage for the array fix; insufficient for the map case because the new test only checks EXPLAIN and does not verify query results.
  • Test result files: The added .out file appears auto-generated and consistent with the executed case.
  • Observability: No new observability needs.
  • Transaction/persistence/data writes: Not applicable.
  • FE-BE variable/path passing: No new fields, but the encoded access path is wrong for the mixed map case.
  • Performance: No material concern beyond the correctness issue.
  • Other issues: None beyond the blocking regression above.

Requesting changes because the current map-path suppression can produce wrong results.

// Both cover the same value container, so treat them as equivalent for OFFSET suppression.
// Do NOT treat * as equivalent to KEYS. For example:
// select length(map_col['a']), map_keys(map_col) from t
// map_keys(map_col) only covers the key side, while length(map_col['a']) still needs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comparison is not safe in both directions. A [map_col, *] path means element_at(map, key) style access, and the BE uses it to keep the map key iterator readable as well as the value iterator (see MapFileColumnIterator::set_access_paths()). [map_col, VALUES] only preserves the value side. With this change, select length(map_col['a']), map_values(map_col)[1] ... drops [map_col, *, OFFSET] and keeps only [map_col, VALUES]; BE then sets the key iterator to SKIP_READING, so the element_at(map_col, 'a') inside length() can no longer match the key and returns NULL/defaults. The * vs VALUES relation here needs to be directional, or this case needs to be rewritten to KEYS + VALUES rather than removing the * path entirely.

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
The previous OFFSET dedup fix treated map ACCESS_ALL (*) and VALUES access as fully equivalent. That is not safe when the OFFSET path comes from element_at(map_col, key): * is also the FE signal that BE must read map keys so the runtime can locate the requested entry. If length(map_col['a']) is mixed with map_values(map_col)[1], dropping [map_col, *, OFFSET] outright loses the key lookup requirement and can break the query.

This change makes the comparison type-aware and directional. When a VALUES path covers only the value side of an ACCESS_ALL OFFSET path, FE now rewrites the OFFSET path to a KEYS-only path instead of deleting it. That keeps the BE key iterator alive while still removing the redundant value-side OFFSET path.

### Release note

Fix nested-column pruning for mixed map element_at and map_values access so key lookup remains correct while redundant OFFSET reads are removed.

### Check List (For Author)

- Test: FE Unit Test, Regression test
    - FE Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest
    - Regression test: bash run-regression-test.sh --run -d nereids_rules_p0 -s string_length_column_pruning
- Behavior changed: Yes (mixed map element_at + map_values now preserves the KEYS access path)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@englefly
Copy link
Copy Markdown
Contributor Author

/review

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because one correctness gap remains.

Critical checkpoints:

  • Goal/task: The PR aims to make OFFSET-path dedup safe for mixed nested accesses. It fixes the top-level ARRAY/MAP cases covered by the new tests, but it does not yet cover the same access pattern when the container is nested under a STRUCT slot.
  • Scope/focus: The change is localized to NestedColumnPruning, but the new guard if (slot.getDataType().isArrayType() || slot.getDataType().isMapType()) scopes the rewrite too narrowly.
  • Concurrency: Not applicable; this is FE-only rewrite logic with no new concurrent state.
  • Lifecycle/config/compatibility: No new lifecycle, config, or upgrade concerns found.
  • Parallel code paths: The BE access-path contract is the same for array/map children inside structs, so those paths need the same suppression/rewrite behavior.
  • Special conditions: The current special handling only triggers when the root slot itself is ARRAY/MAP; nested containers under STRUCT still forward conflicting OFFSET and full-data paths together.
  • Test coverage: Added FE unit/regression tests cover top-level arr_struct and map_col cases, but they do not cover STRUCT<arr/map ...> cases where the bug still reproduces.
  • Observability/persistence/transactions/FE-BE variables: No additional blocking issues found.
  • Validation: I attempted bash run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest, but this runner is missing thirdparty/installed/bin/protoc, so I could not rerun FE UT locally.

See the inline comment for the concrete failing shape.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 51.46% (53/103) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 71.84% (74/103) 🎉
Increment coverage report
Complete coverage report

@englefly
Copy link
Copy Markdown
Contributor Author

run external

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 71.84% (74/103) 🎉
Increment coverage report
Complete coverage report

@englefly
Copy link
Copy Markdown
Contributor Author

/review

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking finding:

  • regression-test/conf/regression-conf.groovy was committed with worktree-local port overrides. This file is explicitly local-only and AGENTS.md says env/run config changes must not be included in the PR.

Critical checkpoint conclusions:

  • Goal of the task: The FE pruning change appears to address the reported array<struct> sibling-field case and the mixed element_at(map) + map_values(map) case. The added FE unit tests and regression tests exercise those paths.
  • Small / focused change: Not fully. The code fix itself is focused, but the PR also includes an unrelated local regression config edit.
  • Concurrency: Not involved. I did not find shared-state or lock-order concerns in this change.
  • Lifecycle / static initialization: Not involved.
  • Configuration items: No new product config is added, but a checked-in regression harness config file was modified with local ports and should not be merged.
  • Compatibility: No FE/BE protocol, symbol, or storage-format compatibility issue found in the pruning logic.
  • Parallel code paths: I traced the collector -> NestedColumnPruning -> slot replacement -> BE map/array/struct iterator flow. The new path-rewrite logic is consistent with BE's distinction between *, KEYS, VALUES, and OFFSET.
  • Special conditionals: The new type-aware compareOffsetPrefixCoverage() logic is justified for the affected map/array cases.
  • Test coverage: Coverage for the newly fixed select-list cases is good in both FE UT and regression tests. I did not find another concrete blocker in the code path.
  • Test result files: The new .out files are consistent with the new order_qt cases.
  • Observability: Not needed for this optimizer-side fix.
  • Transaction / persistence / data write / FE-BE variable passing: Not applicable here.
  • Performance: The additional per-slot path comparison work is small and acceptable for the expected number of collected access paths.
  • Other issues: None beyond the accidental config check-in.

// add allowLoadLocalInfile so that the jdbc can execute mysql load data from client.
jdbcUrl = "jdbc:mysql://127.0.0.1:9030/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"
targetJdbcUrl = "jdbc:mysql://127.0.0.1:9030/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"
jdbcUrl = "jdbc:mysql://127.0.0.1:39131/?useLocalSessionState=true&allowLoadLocalInfile=true&zeroDateTimeBehavior=round"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like worktree-local test setup rather than PR content. regression-test/conf/regression-conf.groovy is the shared default harness config, and AGENTS.md explicitly says env/run config changes should not be committed. If this lands, CI and other developers will start targeting 39131/39121/38131 instead of the repo defaults. Please drop this file from the PR.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 60.95% (64/105) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 47.62% (80/168) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants