Parser: fix exponential parse time on nested function-call arguments#2378
Open
altmannmarcelo wants to merge 1 commit into
Open
Parser: fix exponential parse time on nested function-call arguments#2378altmannmarcelo wants to merge 1 commit into
altmannmarcelo wants to merge 1 commit into
Conversation
On dialects with expression-named function arguments (supports_named_fn_args_with_expr_name, e.g. PostgreSQL and MSSQL), parse_function_args speculatively parsed the full argument expression to detect named-argument syntax (name => value), then rewound and re-parsed it as a positional argument when no operator followed. For deeply nested function-call arguments such as replace(replace(replace(...))), this doubled the work at every nesting level, giving O(2^n) parse time. Measured parse time for an n-deep replace() chain (PostgreSQL) roughly doubles per level: depth 5 ~0.002s, depth 10 ~0.05s, depth 15 ~1.3s, depth 20 ~43s. A 30-level chain was not run; extrapolating the same doubling puts it near 12 hours. Parse the leading expression once, then check for a named-argument operator: when one follows it is a named argument, otherwise the same expression is the positional argument. The shared positional-argument tail (wildcard options, AS aliasing) moves into parse_unnamed_function_arg. test_reserved_keywords_for_identifiers previously asserted that expression-named dialects report 'found: interval' for SELECT MAX(interval); that message was an artifact of the second parse hitting the position-keyed failure cache. With a single parse the error matches every other dialect: 'found: )'. Adds a regression test and a benchmark.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Parser::parse_function_argsruns inO(2^n)time onn-deep nested function-call arguments (e.g.replace(replace(replace(...)))) for any dialect wheresupports_named_fn_args_with_expr_name()is true — notably PostgreSQL and MSSQL. A 20-level chain takes ~43s; a 30-level chain would take ~12 hours. The fix makes parse time linear.Root cause
To support expression-named arguments like
JSON_OBJECT(a + b : c),parse_function_argsspeculatively parses the entire argument as an expression to see whether a named-argument operator (=>,:=,:) follows:For a plain positional argument that is itself a function call, the speculative
parse_expr()consumes the whole nested subtree, finds no named-arg operator,maybe_parserewinds, andparse_wildcard_expr()parses the identical subtree again. Each nesting level parses its child twice:T(n) = 2*T(n-1)→O(2^n).Introduced in commit
92be237— "Add support for MSSQL'sJSON_ARRAY/JSON_OBJECTexpr (#1507)" — which changed named-argument detection from a single-tokenparse_identifierlookahead to this speculative full-expression parse.Fix
Parse the leading expression once, then check for a named-argument operator (an O(1) token check). If one follows, it is a named argument; otherwise the same already-parsed expression is the positional argument. The positional tail (
* EXCLUDE,expr AS alias) is factored intoparse_unnamed_function_arg, shared by both paths. No re-parse; behavior is otherwise unchanged.Evidence
Pathological input:
SELECT replace(replace(... x ..., 'a','b'), 'a','b'), PostgreSQL dialect. Parse time per nesting depth, before vs after the fix (milliseconds):Both columns are parse-only (min of 3 runs). Before: parse time doubles per level —
O(2^n)(additional measured points: depth 15 = 1,350 ms, depth 18 = 10,600 ms). After: linear, ~0.0024 ms per level. At depth 20 that is roughly an 870,000x speedup. The depth-30 case was not run before the fix; extrapolating the doubling (42,600 ms x 2^10) puts it near 12 hours.