regcomp: Don't parse ahead in sets recursion#24299
regcomp: Don't parse ahead in sets recursion#24299khwilliamson wants to merge 3 commits intoPerl:bleadfrom
Conversation
|
I can confirm that when this p.r. is applied, the test program in #24238 (comment) will run to completion without warnings or exceptions. Not otherwise reviewed. |
| /* If recursed down multiple set calls, leave the parse as-is for the next | ||
| * layer up. (GH #24238). Otherwise advance to prepare for the next |
There was a problem hiding this comment.
I don't really know the regex engine, but I'm uncomfortable with the difference in expectation of the returned character position for the outermost call and any inner recursive calls (presumably from the reg() in case '(' above).
I'll look into it further.
There was a problem hiding this comment.
It does seem like the nextchar() call could be moved up to the end of the else immediately above.
I couldn't see anything that indicated to me where reg() and handle_regex_sets() were expected to set the parse position on a successful end, and this change makes it appear the expected result from non-recursive calls to reg() and calls to reg() from handle_regex_sets() differ in that expectation, which seems fragile.
But I don't think that's something to fix for 5.44.
But I still think the nextchar() could be moved up, otherwise it's fine.
|
This looks like you rebased on the wrong thing. |
This extra information can be useful in debugging
This adds printing the caller line number to this debug routine. This enables plunking down calls to it temporarily when actually doing debugging and being able to distinguish between them with a minimum of fuss.
Fixes Perl#24238 This bug was introduced by commit d8d1ded Author: Karl Williamson <khw@cpan.org> Date: Mon Feb 17 12:07:07 2020 -0700 Improve handling of nested qr/(?[...])/ which changed how interpolated ("nested") sets expressions were handled. The nesting is done by recursive calls then as now, but certain details were changed by that commit The current bug is due to getting the next character in the input at the end of a nested call, and the layer above is not expecting that. The solution is just to avoid getting that character when recursed. However, this causes a different error message to be displayed than prior to this commit in certain cases. The old message displayed was Unexpected ']' with no following ')' which is accurate. The new one is Operand with no preceding operator which is no less accurate. I considered making a special case to try to avoid the diagnostic wording change, and I would have done so if this were just a warning. But it is a fatal error, so I think it quite unlikely that real code is expecting to not compile and to not compile with this particular message. And I think the new message is a bit more to the point of what is wrong, so is actually better than before.
Fixes #24238
This bug was introduced by
commit d8d1ded
Author: Karl Williamson khw@cpan.org
Date: Mon Feb 17 12:07:07 2020 -0700
which changed how interpolated ("nested") sets expressions were handled. The nesting is done by recursive calls then as now, but certain details were changed by that commit
The current bug is due to getting the next character in the input at the end of a nested call, and the layer above is not expecting that. The solution is just to avoid getting that character when recursed.
However, this causes a different error message to be displayed than prior to this commit in certain cases. The old message displayed was
which is accurate. The new one is
which is no less accurate. I considered making a special case to try to avoid the diagnostic wording change, and I would have done so if this were just a warning. But it is a fatal error, so I think it quite unlikely that real code is expecting to not compile and to not compile with this particular message. And I think the new message is a bit more to the point of what is wrong, so is actually better than before.