Handle leading combinator selectors (>, +, ~) in SelectorConverter#119
Handle leading combinator selectors (>, +, ~) in SelectorConverter#119eyupcanakman wants to merge 1 commit intovoku:masterfrom
Conversation
Selectors like "> span" now work when called from element->find(), fixing issue voku#102. The combinator is intercepted before reaching Symfony's CssSelector and converted to the correct XPath axis.
Summary of ChangesHello @eyupcanakman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where CSS selectors starting with combinators (e.g., "> span") failed when used with element->find() due to limitations in Symfony's CssSelector component. The solution involves intercepting these selectors, stripping the leading combinator, and then dynamically constructing the correct XPath by replacing the default descendant-or-self:: axis with the appropriate child or sibling axis, thereby extending the functionality and robustness of the SelectorConverter. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
There was a problem hiding this comment.
Code Review
The pull request aims to address the issue of leading combinators in CSS selectors by intercepting them before they reach the Symfony CssSelector component. However, the current implementation introduces a potential Denial of Service (DoS) vulnerability due to uncontrolled recursion and risks memory exhaustion from an unbounded static cache for compiled selectors. Additionally, the use of str_replace on the entire XPath string can lead to incorrect results for nested selectors and fails to handle special selectors like text or comment.
| throw new \RuntimeException('Selector cannot end with a combinator: ' . $selector); | ||
| } | ||
|
|
||
| $restXPath = self::toXPath($restSelector, $ignoreCssSelectorErrors, $isForHtml); |
There was a problem hiding this comment.
The toXPath method recursively calls itself when a CSS selector starts with a combinator (>, +, or ~). An attacker can provide a maliciously crafted selector with a large number of leading combinators (e.g., >>>>...) to trigger a stack overflow or exceed the PHP recursion limit. This will cause the PHP process to crash or terminate abruptly, leading to a Denial of Service (DoS). To remediate this, consider using a loop to strip leading combinators instead of recursion, or implement a maximum recursion depth check.
| switch ($combinator) { | ||
| case '>': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath); | ||
| break; | ||
| case '+': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::*[1]/self::', $restXPath); | ||
| break; | ||
| case '~': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::', $restXPath); | ||
| break; | ||
| default: | ||
| throw new \RuntimeException('Unexpected combinator: ' . $combinator); | ||
| } |
There was a problem hiding this comment.
The use of str_replace to modify the XPath axis is problematic for two reasons:
- Nested Selectors: It replaces all occurrences of
descendant-or-self::. If the rest of the selector contains nested descendant relationships (e.g.,> div span), Symfony might produce an XPath with multipledescendant-or-self::(or similar) axes. Replacing all of them would incorrectly turn all descendants into direct children throughout the entire chain (e.g., turning> div spaninto/*/div/*/span). - Special Selectors: It does not handle the
//prefix returned by the specialtextandcommentselectors (lines 36-43). For example,> textwould result in//text()instead of/*/text(), causing it to find all descendant text nodes instead of just direct children.
It is safer to only replace the leading axis prefix of the restXPath string.
switch ($combinator) {
case '>':
$replacement = '/*/';
break;
case '+':
$replacement = '/*/following-sibling::*[1]/self::';
break;
case '~':
$replacement = '/*/following-sibling::';
break;
default:
throw new \RuntimeException('Unexpected combinator: ' . $combinator);
}
if (\strpos($restXPath, 'descendant-or-self::') === 0) {
$xPathQuery = $replacement . \substr($restXPath, 20);
} elseif (\strpos($restXPath, '//') === 0) {
$xPathQuery = $replacement . \substr($restXPath, 2);
} else {
$xPathQuery = $replacement . $restXPath;
}| throw new \RuntimeException('Unexpected combinator: ' . $combinator); | ||
| } | ||
|
|
||
| self::$compiled[$selector] = $xPathQuery; |
There was a problem hiding this comment.
The results of XPath conversions are stored in a static array self::$compiled without any size limit or eviction policy. An attacker can send a large number of unique CSS selectors to the application, causing the cache to grow indefinitely until the server's memory is exhausted. This leads to a Denial of Service (DoS) as the PHP process will be terminated by the OOM killer or hit the memory_limit. This is particularly relevant in long-running PHP environments (e.g., Swoole, RoadRunner, or long-lived CLI workers). To remediate this, implement a maximum size for the cache with an eviction policy like LRU (Least Recently Used).
WalkthroughThe changes add support for CSS selectors beginning with combinators (>, +, ~) to SelectorConverter.php, enabling direct child, adjacent sibling, and general sibling XPath conversions. A comprehensive test suite validates this functionality across various selector patterns and integration scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee96621bdf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| switch ($combinator) { | ||
| case '>': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath); |
There was a problem hiding this comment.
Replace only leading scope axis for combinator rewrites
The combinator branch rewrites XPath with a global str_replace, which also changes descendant axes inside the remainder selector instead of only the implicit scope prefix. For selectors like > div span, Symfony emits an XPath containing an inner /descendant-or-self::*/ for the descendant combinator, and this code rewrites that too, effectively turning descendant matching into child-only traversal and missing valid results. The rewrite should be limited to the leading descendant-or-self:: segment only.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/voku/helper/SelectorConverter.php`:
- Around line 61-73: The current switch in SelectorConverter.php replaces all
occurrences of 'descendant-or-self::' in $restXPath using str_replace, which
breaks grouped selectors; change each case handling the $combinator ('>', '+',
'~') to replace only the first occurrence in $restXPath (e.g., use preg_replace
with a limit of 1 or a function that replaces the first match) when assigning
$xPathQuery so only the leading axis is transformed and grouped selectors
(pipes) remain correct; update the cases for '>', '+', and '~' that currently
operate on $restXPath/$xPathQuery and keep the default RuntimeException
unchanged.
🧹 Nitpick comments (2)
src/voku/helper/SelectorConverter.php (1)
50-51: Minor:$selectoris already used as cache key on line 31, butltrimis applied again on line 50.If the caller passes a selector with leading whitespace (e.g.,
" > span"), it will bypass the early cache check for the trimmed equivalent"> span"if that was cached previously. This is consistent with pre-existing behavior (the//check on line 45 andtext/commentchecks also use the original string), but worth noting for awareness — not blocking.tests/SelectorConverterChildCombinatorTest.php (1)
52-64: Missing integration tests for+and~combinators.The adjacent (
+) and general (~) sibling combinator tests only assert XPath string content but don't verify actual DOM querying behavior. Given the XPath correctness concern flagged in the source file, adding integration tests (similar totestIssue102Reproduction) for these combinators would help catch regressions.
| switch ($combinator) { | ||
| case '>': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath); | ||
| break; | ||
| case '+': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::*[1]/self::', $restXPath); | ||
| break; | ||
| case '~': | ||
| $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::', $restXPath); | ||
| break; | ||
| default: | ||
| throw new \RuntimeException('Unexpected combinator: ' . $combinator); | ||
| } |
There was a problem hiding this comment.
str_replace replaces all occurrences of descendant-or-self:: — breaks grouped selectors.
If restSelector is a grouped selector (e.g., > .a, .b), Symfony produces XPath with multiple descendant-or-self:: axes (one per group, joined by |). str_replace would replace all of them, incorrectly applying the combinator axis to every group member.
Consider using preg_replace with a limit of 1 to replace only the leading axis:
Proposed fix
- switch ($combinator) {
- case '>':
- $xPathQuery = \str_replace('descendant-or-self::', '/*/', $restXPath);
- break;
- case '+':
- $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::*[1]/self::', $restXPath);
- break;
- case '~':
- $xPathQuery = \str_replace('descendant-or-self::', '/*/following-sibling::', $restXPath);
- break;
- default:
- throw new \RuntimeException('Unexpected combinator: ' . $combinator);
- }
+ switch ($combinator) {
+ case '>':
+ $replacement = '/*/';
+ break;
+ case '+':
+ $replacement = '/*/following-sibling::*[1]/self::';
+ break;
+ case '~':
+ $replacement = '/*/following-sibling::';
+ break;
+ default:
+ throw new \RuntimeException('Unexpected combinator: ' . $combinator);
+ }
+ $xPathQuery = \preg_replace('/descendant-or-self::/', $replacement, $restXPath, 1);🤖 Prompt for AI Agents
In `@src/voku/helper/SelectorConverter.php` around lines 61 - 73, The current
switch in SelectorConverter.php replaces all occurrences of
'descendant-or-self::' in $restXPath using str_replace, which breaks grouped
selectors; change each case handling the $combinator ('>', '+', '~') to replace
only the first occurrence in $restXPath (e.g., use preg_replace with a limit of
1 or a function that replaces the first match) when assigning $xPathQuery so
only the leading axis is transformed and grouped selectors (pipes) remain
correct; update the cases for '>', '+', and '~' that currently operate on
$restXPath/$xPathQuery and keep the default RuntimeException unchanged.



Fixes #102
Problem
Selectors starting with a combinator like
> spanthrow an error when used withelement->find():This happens because Symfony's CssSelector doesn't support standalone combinators — they're only valid in a compound expression like
div > span.Solution
Intercept selectors starting with
>,+, or~inSelectorConverter::toXPath()before they reach Symfony. The combinator is stripped, the rest is converted normally via Symfony, then thedescendant-or-self::axis prefix is replaced with the correct XPath axis:>/*/+/*/following-sibling::*[1]/self::~/*/following-sibling::Tests
Added
SelectorConverterChildCombinatorTest.phpwith 10 tests covering:> p > span)> span.highlight)element->find()matching the original issue's reproduction case> spanonly matches direct children, not nested descendantsThis change is
Summary by CodeRabbit
Release Notes
New Features
Tests