Parser: fix exponential parse time on nested function-call arguments by altmannmarcelo · Pull Request #2378 · apache/datafusion-sqlparser-rs

altmannmarcelo · 2026-06-16T15:03:30Z

Summary

Parser::parse_function_args runs in O(2^n) time on n-deep nested function-call arguments (e.g. replace(replace(replace(...)))) for any dialect where supports_named_fn_args_with_expr_name() is true — notably PostgreSQL and MSSQL. A 20-level chain takes ~43s; a 30-level chain would take ~12 hours. The fix makes parse time linear.

Root cause

To support expression-named arguments like JSON_OBJECT(a + b : c), parse_function_args speculatively parses the entire argument as an expression to see whether a named-argument operator (=>, :=, :) follows:

let arg = if self.dialect.supports_named_fn_args_with_expr_name() {
    self.maybe_parse(|p| {
        let name = p.parse_expr()?;                         // parse the WHOLE argument
        let operator = p.parse_function_named_arg_operator()?;  // needs =>/:=/:
        let arg = p.parse_wildcard_expr()?.into();
        Ok(FunctionArg::ExprNamed { name, arg, operator })
    })?
} else { ... };
if let Some(arg) = arg { return Ok(arg); }
let wildcard_expr = self.parse_wildcard_expr()?;            // RE-PARSE the same argument

For a plain positional argument that is itself a function call, the speculative parse_expr() consumes the whole nested subtree, finds no named-arg operator, maybe_parse rewinds, and parse_wildcard_expr() parses the identical subtree again. Each nesting level parses its child twice: T(n) = 2*T(n-1) → O(2^n).

Introduced in commit 92be237 — "Add support for MSSQL's JSON_ARRAY/JSON_OBJECT expr (#1507)" — which changed named-argument detection from a single-token parse_identifier lookahead to this speculative full-expression parse.

Fix

Parse the leading expression once, then check for a named-argument operator (an O(1) token check). If one follows, it is a named argument; otherwise the same already-parsed expression is the positional argument. The positional tail (* EXCLUDE, expr AS alias) is factored into parse_unnamed_function_arg, shared by both paths. No re-parse; behavior is otherwise unchanged.

Evidence

Pathological input: SELECT replace(replace(... x ..., 'a','b'), 'a','b'), PostgreSQL dialect. Parse time per nesting depth, before vs after the fix (milliseconds):

depth	before	after
10	48 ms	0.025 ms
20	42,600 ms	0.049 ms
30	~43,000,000 ms (extrapolated, not run)	0.071 ms

Both columns are parse-only (min of 3 runs). Before: parse time doubles per level — O(2^n) (additional measured points: depth 15 = 1,350 ms, depth 18 = 10,600 ms). After: linear, ~0.0024 ms per level. At depth 20 that is roughly an 870,000x speedup. The depth-30 case was not run before the fix; extrapolating the doubling (42,600 ms x 2^10) puts it near 12 hours.

On dialects with expression-named function arguments (supports_named_fn_args_with_expr_name, e.g. PostgreSQL and MSSQL), parse_function_args speculatively parsed the full argument expression to detect named-argument syntax (name => value), then rewound and re-parsed it as a positional argument when no operator followed. For deeply nested function-call arguments such as replace(replace(replace(...))), this doubled the work at every nesting level, giving O(2^n) parse time. Measured parse time for an n-deep replace() chain (PostgreSQL) roughly doubles per level: depth 5 ~0.002s, depth 10 ~0.05s, depth 15 ~1.3s, depth 20 ~43s. A 30-level chain was not run; extrapolating the same doubling puts it near 12 hours. Parse the leading expression once, then check for a named-argument operator: when one follows it is a named argument, otherwise the same expression is the positional argument. The shared positional-argument tail (wildcard options, AS aliasing) moves into parse_unnamed_function_arg. test_reserved_keywords_for_identifiers previously asserted that expression-named dialects report 'found: interval' for SELECT MAX(interval); that message was an artifact of the second parse hitting the position-keyed failure cache. With a single parse the error matches every other dialect: 'found: )'. Adds a regression test and a benchmark.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser: fix exponential parse time on nested function-call arguments#2378

Parser: fix exponential parse time on nested function-call arguments#2378
altmannmarcelo wants to merge 1 commit into
apache:mainfrom
readysettech:parse-function-args-exponential

altmannmarcelo commented Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

altmannmarcelo commented Jun 16, 2026

Summary

Root cause

Fix

Evidence

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant