Skip to content

regcomp: Don't parse ahead in sets recursion#24299

Open
khwilliamson wants to merge 3 commits intoPerl:bleadfrom
khwilliamson:sets_bug
Open

regcomp: Don't parse ahead in sets recursion#24299
khwilliamson wants to merge 3 commits intoPerl:bleadfrom
khwilliamson:sets_bug

Conversation

@khwilliamson
Copy link
Copy Markdown
Contributor

Fixes #24238

This bug was introduced by

commit d8d1ded
Author: Karl Williamson khw@cpan.org
Date: Mon Feb 17 12:07:07 2020 -0700

 Improve handling of nested qr/(?[...])/

which changed how interpolated ("nested") sets expressions were handled. The nesting is done by recursive calls then as now, but certain details were changed by that commit

The current bug is due to getting the next character in the input at the end of a nested call, and the layer above is not expecting that. The solution is just to avoid getting that character when recursed.

However, this causes a different error message to be displayed than prior to this commit in certain cases. The old message displayed was

Unexpected ']' with no following ')'

which is accurate. The new one is

 Operand with no preceding operator

which is no less accurate. I considered making a special case to try to avoid the diagnostic wording change, and I would have done so if this were just a warning. But it is a fatal error, so I think it quite unlikely that real code is expecting to not compile and to not compile with this particular message. And I think the new message is a bit more to the point of what is wrong, so is actually better than before.

  • This set of changes does not require a perldelta entry.

@jkeenan
Copy link
Copy Markdown
Contributor

jkeenan commented Mar 19, 2026

I can confirm that when this p.r. is applied, the test program in #24238 (comment) will run to completion without warnings or exceptions. Not otherwise reviewed.

Comment thread regcomp.c Outdated
Comment thread regcomp.c Outdated
Comment thread regcomp.c
Comment on lines +9218 to +9219
/* If recursed down multiple set calls, leave the parse as-is for the next
* layer up. (GH #24238). Otherwise advance to prepare for the next
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know the regex engine, but I'm uncomfortable with the difference in expectation of the returned character position for the outermost call and any inner recursive calls (presumably from the reg() in case '(' above).

I'll look into it further.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem like the nextchar() call could be moved up to the end of the else immediately above.

I couldn't see anything that indicated to me where reg() and handle_regex_sets() were expected to set the parse position on a successful end, and this change makes it appear the expected result from non-recursive calls to reg() and calls to reg() from handle_regex_sets() differ in that expectation, which seems fragile.

But I don't think that's something to fix for 5.44.

But I still think the nextchar() could be moved up, otherwise it's fine.

@tonycoz
Copy link
Copy Markdown
Contributor

tonycoz commented Apr 16, 2026

This looks like you rebased on the wrong thing.

This extra information can be useful in debugging
This adds printing the caller line number to this debug routine.  This
enables plunking down calls to it temporarily when actually doing
debugging and being able to distinguish between them with a minimum of
fuss.
Fixes Perl#24238

This bug was introduced by

commit d8d1ded
 Author: Karl Williamson <khw@cpan.org>
 Date:   Mon Feb 17 12:07:07 2020 -0700

     Improve handling of nested qr/(?[...])/

which changed how interpolated ("nested") sets expressions were handled.
The nesting is done by recursive calls then as now, but certain details
were changed by that commit

The current bug is due to getting the next character in the input at the
end of a nested call, and the layer above is not expecting that.  The
solution is just to avoid getting that character when recursed.

However, this causes a different error message to be displayed than
prior to this commit in certain cases.  The old message displayed was

    Unexpected ']' with no following ')'

which is accurate.  The new one is

     Operand with no preceding operator

which is no less accurate.  I considered making a special case to try to
avoid the diagnostic wording change, and I would have done so if this
were just a warning.  But it is a fatal error, so I think it quite unlikely
that real code is expecting to not compile and to not compile with this
particular message.  And I think the new message is a bit more to the
point of what is wrong, so is actually better than before.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regexes using Extended Bracketed Character Classes fail to compile when interpolated and using the /x modifier

3 participants