fix(query): handle empty LIKE ESCAPE in planner #19595

Open
sundy-li wants to merge 4 commits into databendlabs:main from sundy-li:fix/issue-19562-like-empty-escape-panic

Conversation

@sundy-li
Member

@sundy-li sundy-li commented Mar 23, 2026

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Fixes LIKE ... ESCAPE '' panics in planner type checking #19562
  • Stop the planner LIKE fast path from unwrapping an empty ESCAPE string.
  • Preserve existing operator/builtin fallback behavior by routing empty or non-single-character ESCAPE literals through the builtin like / like_any path instead of planner rewrites.
  • Add planner and sqllogictest regressions for empty-escape bindings and backslash-containing fallback cases.
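The routing rule in the summary can be sketched as a small decision function. This is only an illustration of the described behavior, not the code in the PR: `LikeRoute` and `route_like` are hypothetical names, and the real planner works on bound expressions rather than plain strings.

```rust
/// Hypothetical: where a `LIKE ... ESCAPE <literal>` expression is sent.
#[derive(Debug, PartialEq, Eq)]
enum LikeRoute {
    /// No ESCAPE clause: planner rewrites may inspect the raw pattern.
    FastPath,
    /// Single-character ESCAPE: planner rewrites may run after converting with it.
    FastPathWithEscape(char),
    /// Empty or multi-character ESCAPE: defer to the builtin like / like_any.
    BuiltinFallback,
}

/// Decide the route from the optional ESCAPE literal without ever unwrapping
/// a "first character" that may not exist.
fn route_like(escape: Option<&str>) -> LikeRoute {
    match escape {
        None => LikeRoute::FastPath,
        Some(s) => {
            let mut chars = s.chars();
            match (chars.next(), chars.next()) {
                // Exactly one character: a usable escape character.
                (Some(c), None) => LikeRoute::FastPathWithEscape(c),
                // Zero or several characters: no planner rewrite.
                _ => LikeRoute::BuiltinFallback,
            }
        }
    }
}
```

With this shape, `route_like(Some(""))` yields `BuiltinFallback` instead of panicking, and only a one-character literal such as `Some("#")` is eligible for rewrites.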

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Validation:

  • cargo test -p databend-common-sql --test it planner -- --nocapture
  • cargo test -p databend-common-sql --test it test_like_escape_preserves_existing_binding_semantics -- --nocapture
  • cargo clippy -p databend-common-sql --test it -- -D warnings
  • cargo fmt --all --check

Logic test added for CI coverage:

  • tests/sqllogictests/suites/query/issues/issue_19562.test

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Mar 23, 2026
@sundy-li sundy-li added the agent-reviewable Ready for agent review label Mar 23, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63f2779b9c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sundy-li
Member Author

Blocking issue in resolve_like(): this patch stops the panic for ESCAPE '', but it also lets empty-escape patterns fall into the const/prefix fast paths even when those rewrites are not semantics-preserving.

Example: SELECT 'ax' LIKE 'a\\x' ESCAPE ''. After this patch new_like_str is borrowed as a\\x, check_const() returns true, and the planner rewrites to equality against a\\x. The runtime LIKE matcher still treats backslash as escaping the next character, so that pattern behaves like ax and should return true, not false. The same issue applies to prefix rewrites like a\\x%.

So this changes the failure mode from a planner panic to silent wrong-answer plans for some empty-escape inputs. I think the safe options here are either to reject ESCAPE '' with a normal SQL error, or to bypass the const/prefix rewrites when the escape string is present but empty.

@sundy-li
Member Author

This fixes the planner panic, but it still leaves LIKE ... ESCAPE '' with inconsistent semantics when the pattern contains backslashes.

At planning time, src/query/sql/src/planner/semantic/type_check.rs now treats an empty escape string as "no conversion". But execution still canonicalizes LIKE matching around backslash escapes: src/query/functions/src/scalars/comparison.rs:1263-1268 returns the raw pattern for an empty escape, and src/query/expression/src/filter/like.rs:102-109 plus src/query/expression/src/filter/like.rs:189-190 still interpret \\ as an escape marker.

That means queries such as SELECT '%' LIKE '\\%' ESCAPE '' are still evaluated as if backslash were the escape character, instead of either rejecting the empty escape or honoring it as a true "no escape character" mode. The new tests only cover a pattern without backslashes, so this semantic hole is not exercised.

Please either reject empty ESCAPE strings with a normal SQL error, or plumb a real empty-escape mode through the matcher and add a regression that covers a backslash-containing pattern.

@sundy-li
Member Author

Verified locally that this is still unsafe to merge. The planner no longer panics on an empty ESCAPE string, but it still takes the constant fast path for raw backslash patterns because check_const only rejects percent and underscore. Runtime LIKE with an empty escape still feeds the raw pattern into the matcher, and the matcher still treats backslash followed by percent, underscore, or backslash as an escape sequence.

That leaves a wrong-answer path for empty-escape inputs such as a raw double-backslash pattern: the planner treats it as an ordinary constant string, while execution interprets it as an escaped single backslash. Please either reject empty ESCAPE strings with a normal SQL error, or plumb a real no-escape mode through LIKE planning and execution and add a regression that covers backslashes.
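The disagreement described above can be reproduced with a toy model. The sketch below is not Databend's code: `check_const` mirrors only the behavior stated in this comment (reject `%` and `_`), and `like_backslash` is a hypothetical, simplified matcher that treats a backslash followed by `%`, `_`, or `\` as an escape sequence.

```rust
/// Planner-side check as described in the review: a pattern counts as a
/// constant when it has no `%` or `_`. Backslashes are not inspected.
fn check_const(pattern: &str) -> bool {
    !pattern.chars().any(|c| c == '%' || c == '_')
}

/// Simplified runtime matcher over char slices (assumed semantics).
fn like_match(text: &[char], pat: &[char]) -> bool {
    match pat.split_first() {
        None => text.is_empty(),
        // `%` matches any run of characters, including none.
        Some((&'%', rest)) => (0..=text.len()).any(|i| like_match(&text[i..], rest)),
        // `_` matches exactly one character.
        Some((&'_', rest)) => !text.is_empty() && like_match(&text[1..], rest),
        // Backslash before `%`, `_`, or `\` makes that character literal.
        Some((&'\\', rest)) if matches!(rest.first().copied(), Some('%' | '_' | '\\')) => {
            !text.is_empty() && text[0] == rest[0] && like_match(&text[1..], &rest[1..])
        }
        // Any other character (including a lone backslash) matches itself.
        Some((&c, rest)) => !text.is_empty() && text[0] == c && like_match(&text[1..], rest),
    }
}

/// Convenience wrapper taking string slices.
fn like_backslash(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    like_match(&t, &p)
}
```

For a pattern of two backslashes, `check_const` returns true, so an equality rewrite would compare the input against two backslashes. The toy matcher instead accepts a single backslash and rejects two, which is the wrong-answer gap the comment points at.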

@sundy-li
Member Author

Blocking issue in src/query/sql/src/planner/semantic/type_check.rs: treating ESCAPE '' as Cow::Borrowed(like_str) now lets explicit empty-escape patterns reach the const/prefix LIKE fast paths, but those rewrites are not semantics-preserving for backslashes.

Concrete example: SELECT 'ax' LIKE 'a\\x' ESCAPE ''. The planner sees no % or _ and rewrites this to equality against the raw pattern. Execution still treats backslash as escaping the next character in LIKE patterns, so the runtime semantics of that pattern are ax, not a\\x. That changes the failure mode from a panic into a silent wrong-answer plan for some empty-escape inputs.

Please either reject empty ESCAPE with a normal SQL error, or bypass the const/prefix rewrites when escape is present but empty and add a regression that covers a backslash-containing pattern.

@sundy-li
Member Author

Blocking issue remains: this patch removes the panic for ESCAPE '', but it still lets explicit empty-escape patterns hit the LIKE fast paths even when those rewrites are not semantics-preserving. A concrete case is a raw double-backslash pattern: check_const() rewrites it to string equality on two backslashes, while the current LIKE matcher still interprets that pattern as an escaped single backslash. That changes the failure mode from panic to silent wrong-answer plans for some empty-escape inputs. Please either reject empty ESCAPE with a normal SQL error, or bypass the const/prefix rewrites (and ideally plumb an explicit empty-escape mode through the matcher) and add a regression that covers backslashes.

Member Author

@sundy-li sundy-li left a comment

Blocking issue: this still does not preserve ESCAPE '' semantics for backslash patterns.

@sundy-li
Member Author

Blocking issue: this patch removes the planner panic for ESCAPE '', but it still makes some empty-escape queries semantically unsafe.

In src/query/sql/src/planner/semantic/type_check.rs, resolve_like() now treats an empty escape string as the raw pattern and still runs check_const(). Since check_const() only rejects % and _, a pattern like \\\\ gets rewritten to =. Runtime LIKE does not agree with that: empty escape returns the raw pattern unchanged in src/query/functions/src/scalars/comparison.rs, and src/query/expression/src/filter/like.rs still treats raw \\ as an escape sequence. I verified locally that generate_like_pattern("\\\\") matches a single backslash, not a double backslash.

So this changes LIKE ... ESCAPE '' from a planner panic into silent wrong answers for backslash-containing patterns. Please either reject empty ESCAPE with a normal SQL error, or plumb a true no-escape mode through planning and execution and add a regression that covers backslashes.

@sundy-li
Member Author

Re-verified locally: this still is not safe to merge.

The panic is gone for ESCAPE '', but the planner now lets explicit empty-escape patterns hit the LIKE const/prefix fast paths. Runtime LIKE still treats backslash as an escape marker, so planning and execution disagree for backslash-containing patterns.

Concrete example: SELECT 'ax' LIKE 'a\x' ESCAPE ''. The planner can rewrite that as equality on a\x, while the matcher interprets the pattern as ax. That turns the original panic into a silent wrong-answer path.

Please either reject empty ESCAPE with a normal SQL error, or bypass the const/prefix rewrites for Some("") (and ideally add a regression with a backslash-containing pattern).

@sundy-li
Member Author

Blocking issue: this removes the panic, but it still does not validate LIKE ESCAPE literals as single-character values. Empty and multi-character escapes should stay semantic errors until the runtime can represent them correctly.
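The validation asked for here could look roughly like the following. It is a sketch under assumptions: `check_single_char_escape` is a hypothetical helper, and the error type and messages are invented for illustration rather than taken from Databend's error codes.

```rust
/// Hypothetical validator: accept an ESCAPE literal only when it is exactly
/// one character, and report a plain error otherwise instead of panicking.
fn check_single_char_escape(escape: &str) -> Result<char, String> {
    let mut chars = escape.chars();
    match (chars.next(), chars.next()) {
        // Exactly one character (counted as a Unicode scalar, not a byte).
        (Some(c), None) => Ok(c),
        // Empty literal: the case that previously hit an unwrap.
        (None, _) => Err("ESCAPE string must not be empty".to_string()),
        // Two or more characters.
        _ => Err(format!(
            "ESCAPE string must be a single character, got {escape:?}"
        )),
    }
}
```

Counting characters rather than bytes means a multi-byte escape such as `é` is still accepted, while `''` and `'ab'` become ordinary errors.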

@sundy-li
Member Author

Blocking issue in src/query/sql/src/planner/semantic/type_check.rs: the new Cow::Borrowed(like_str) path still sends ESCAPE '' patterns through check_const(). check_const() only rejects % and _, so a raw double-backslash pattern is rewritten to = even though runtime LIKE does not agree with that rewrite.

At execution time, src/query/functions/src/scalars/comparison.rs:1263-1268 leaves an empty escape string unchanged, and src/query/expression/src/filter/like.rs:234-240 plus src/query/expression/src/filter/like.rs:102-109 still treat raw \\ as an escape sequence. I re-verified this locally in the workspace: generate_like_pattern("\\\\") returns ComplexPattern, matches a single backslash, and does not match a double backslash.

So this patch changes the failure mode from planner panic to silent wrong answers for some LIKE ... ESCAPE '' inputs. Please either reject empty ESCAPE with a normal SQL error, or bypass the const/prefix rewrites when escape == Some("") and add a regression that covers a backslash-containing pattern.

Member Author

@sundy-li sundy-li left a comment

Blocking: this still changes LIKE ... ESCAPE '' from a planner panic into incorrect execution semantics. The planner now binds empty-escape patterns as if no escape character were present, but the runtime path still hard-codes \\ as an escape in src/query/functions/src/scalars/comparison.rs and src/query/expression/src/filter/like.rs.

For example, SELECT '%' LIKE '\\%' ESCAPE '' will now bind and evaluate as a literal-percent match, even though disabling the escape character should leave the backslash literal instead of escaping %. This needs either a runtime fix for the empty-escape case or a semantic error for unsupported ESCAPE '' patterns.

@sundy-li
Member Author

Blocking issue: this still changes LIKE ... ESCAPE '' from a planner panic into incorrect execution semantics. The planner now binds empty-escape patterns as raw strings, but the runtime like matcher still treats \ as an escape unconditionally in src/query/functions/src/scalars/comparison.rs:1263 and src/query/expression/src/filter/like.rs:102. For example, SELECT '%' LIKE '\%' ESCAPE '' will now bind and still match as a literal-percent pattern, even though disabling the escape should keep the backslash literal. Please either fix the runtime no-escape path or reject unsupported ESCAPE '' patterns semantically.

@sundy-li
Member Author

Blocking: this patch removes the planner panic, but it still changes LIKE ... ESCAPE '' into incorrect execution semantics for backslash patterns.

resolve_like() now treats an empty escape string the same as no escape character, but the runtime matcher still hard-codes \\ as a LIKE escape in src/query/functions/src/scalars/comparison.rs and src/query/expression/src/filter/like.rs. For example, SELECT '%' LIKE '\\\\%' ESCAPE '' will now bind and match as a literal percent, even though disabling the escape character should leave the backslash literal instead of escaping %.

This needs one of two fixes before merge:

  1. preserve no-escape semantics in the runtime LIKE path when the third argument is empty, or
  2. reject ESCAPE '' with a normal semantic error instead of routing it through the existing backslash-escape behavior.

@sundy-li sundy-li added the agent-changed Changed by agent label Mar 24, 2026
@sundy-li
Member Author

Blocking follow-up with the exact repro:

SELECT '%' LIKE '\\%' ESCAPE ''

This patch stops the planner panic, but the query still routes into runtime LIKE matching that treats \\ as an escape in src/query/functions/src/scalars/comparison.rs and src/query/expression/src/filter/like.rs.

So ESCAPE '' is not preserved consistently: the planner now allows the query, while execution still interprets the backslash as escaping %. That changes the failure mode from panic to wrong result for backslash patterns.

Before merge, this needs either:

  • a runtime no-escape path when the third LIKE argument is empty, or
  • a normal semantic error for unsupported ESCAPE '' instead of binding it as the existing backslash-escape behavior.
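The first option, a runtime no-escape path, amounts to making the escape character a parameter of the matcher instead of hard-coding backslash. The sketch below illustrates that idea only; `match_chars` and `like_with_escape` are hypothetical names and the matcher is far simpler than the one in src/query/expression/src/filter/like.rs.

```rust
/// Hypothetical matcher where the escape character is optional.
/// With `escape == None`, every backslash is an ordinary literal.
fn match_chars(text: &[char], pat: &[char], escape: Option<char>) -> bool {
    let Some((&p, rest)) = pat.split_first() else {
        return text.is_empty();
    };
    // The escape character makes the following pattern character literal.
    if Some(p) == escape {
        if let Some((&lit, after)) = rest.split_first() {
            return text.first() == Some(&lit) && match_chars(&text[1..], after, escape);
        }
    }
    match p {
        // `%` matches any run of characters, including none.
        '%' => (0..=text.len()).any(|i| match_chars(&text[i..], rest, escape)),
        // `_` matches exactly one character.
        '_' => !text.is_empty() && match_chars(&text[1..], rest, escape),
        // Everything else must match literally.
        c => text.first() == Some(&c) && match_chars(&text[1..], rest, escape),
    }
}

/// Convenience wrapper taking string slices.
fn like_with_escape(text: &str, pattern: &str, escape: Option<char>) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    match_chars(&t, &p, escape)
}
```

Under this model the repro behaves differently per mode: with `Some('\\')` the pattern `\%` matches only a literal `%`, while with `None` it means "a backslash followed by anything", so the input `%` does not match.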

@sundy-li sundy-li removed the agent-changed Changed by agent label Mar 24, 2026
@sundy-li
Member Author

Blocking concern from review:

validate_like_escape() fixes the panic by rejecting ESCAPE '' globally, but Databend's existing execution path already treats an empty escape string as "no escape" through the public 3-argument like() function in src/query/functions/src/scalars/comparison.rs.

That changes behavior for queries that previously worked when they missed the literal fast path, for example:
SELECT 'a' LIKE concat('a') ESCAPE ''

It also leaves the operator inconsistent with like(lhs, rhs, ''), which still accepts the empty third argument.

This should be fixed without changing those semantics, for example by treating an empty escape as None in the planner rewrite path.

@sundy-li
Member Author

Blocking issue: this patch fixes the panic, but it also turns previously accepted LIKE ... ESCAPE '' operator forms into semantic errors while the exposed 3-arg like() builtin still accepts an empty third argument as "no escape". Before this PR, non-fast-path cases reached resolve_like_escape() and then the builtin, so queries such as SELECT a LIKE concat(a) ESCAPE '' could bind successfully. After this change they fail at bind time, which is a behavior regression and leaves the operator inconsistent with like(a, concat(a), '') unless both paths are tightened together.

@sundy-li
Member Author

Blocking for review: this patch fixes the planner panic by rejecting ESCAPE '' everywhere in the operator path, but the public 3-arg like() builtin still accepts '' as "no escape" in src/query/functions/src/scalars/comparison.rs.

That regresses previously bindable operator forms such as SELECT 'a' LIKE concat('a') ESCAPE '' and leaves LIKE ... ESCAPE '' inconsistent with like(lhs, rhs, '') unless the runtime/function semantics are tightened in the same change.

@sundy-li
Member Author

Blocking review note: validate_like_escape() now rejects ESCAPE '' for every operator path in src/query/sql/src/planner/semantic/type_check.rs, but the underlying LIKE builtins still accept an empty third argument.

src/query/functions/src/scalars/comparison.rs registers 3-arg like / like_any, defaults missing escapes to "", and convert_escape_pattern() keeps the raw pattern when the escape string is empty. That means LIKE ... ESCAPE '' now fails at bind time while the corresponding builtin form can still be expressed and executed with ''.

Before this is safe to merge, the SQL surface needs one consistent rule: either tighten the builtin/runtime path in the same change, or preserve the existing non-panic operator behavior instead of rejecting only the operator syntax.

@sundy-li
Member Author

Blocking review result: validate_like_escape() fixes the panic, but this patch also makes operator syntax stricter than the existing builtin path. I verified locally on this branch that SELECT 'a' LIKE ANY ('a') ESCAPE '' now fails at bind time, while SELECT like_any('a', 'a', '') still binds because the builtin/runtime path in src/query/functions/src/scalars/comparison.rs still treats an empty third argument as "no escape". Please align those semantics before merge.

@sundy-li sundy-li added the agent-changed Changed by agent label Mar 25, 2026
@github-actions
Contributor

github-actions bot commented Mar 25, 2026

🤖 CI Job Analysis (Retry 1)

Workflow: 23520532107

📊 Summary

  • Total Jobs: 86
  • Failed Jobs: 8
  • Retryable: 0
  • Code Issues: 8

NO RETRY NEEDED

All failures appear to be code/test issues requiring manual fixes.

🔍 Job Details

  • linux / sqllogic / standalone_minio (query, http, native): Not retryable (Code/Test)
  • linux / sqllogic / standalone_minio (query, hybrid, parquet): Not retryable (Code/Test)
  • linux / sqllogic / standalone_minio (query, hybrid, native): Not retryable (Code/Test)
  • linux / sqllogic / standalone_minio (query, http, parquet): Not retryable (Code/Test)
  • linux / sqllogic / standalone (query, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / standalone (query, 4c, http): Not retryable (Code/Test)
  • linux / sqllogic / cluster (query, 4c, hybrid): Not retryable (Code/Test)
  • linux / sqllogic / cluster (query, 4c, http): Not retryable (Code/Test)

🤖 About

Automated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed).

@sundy-li sundy-li added agent-approved Approved by agent and removed agent-changed Changed by agent agent-reviewable Ready for agent review labels Mar 25, 2026
Labels

agent-approved Approved by agent pr-bugfix this PR patches a bug in codebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LIKE ... ESCAPE '' panics in planner type checking

1 participant