52 changes: 23 additions & 29 deletions cpan/perlfaq/lib/perlfaq6.pod
@@ -165,12 +165,19 @@ Here's another example of using C<..>:
X<regex, XML> X<regex, HTML> X<XML> X<HTML> X<pain> X<frustration>
X<sucking out, will to live>

Do not use regexes. Use a module and forget about the
regular expressions. The L<XML::LibXML>, L<HTML::TokeParser> and
L<HTML::TreeBuilder> modules are good starts, although each namespace
has other parsing modules specialized for certain tasks and different
ways of doing it. Start at CPAN Search ( L<http://metacpan.org/> )
and wonder at all the work people have done for you already! :)
Regular expressions in Perl versions prior to 5.10.0 could not handle
recursion. Therefore the recommendation for recursive languages
like C<XML> or C<HTML> was to avoid using regexes, and use specific
parsing modules instead, like L<XML::LibXML>, L<HTML::TokeParser> or
L<HTML::TreeBuilder>. If you need a full parse tree, that recommendation
is still the best advice.

Since version 5.10.0, Perl regexes I<can> parse recursive constructs,
either through regexes crafted by hand, or through helper modules
like L<Regexp::Common> or L<Regexp::Grammars>.
Therefore, if you need to extract some specific subset of information from
an C<XML> or C<HTML> document, you I<can> construct a regexp to do so -
but this is not an easy task and it may require a fair amount of testing.
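
As a minimal sketch of what a hand-crafted recursive regex can look like
(an illustration only, not a parser for C<XML> or C<HTML>), the pattern
below matches one balanced parenthesised group using the C<(?1)> recursion
construct available since 5.10.0. The variable names and the sample string
are hypothetical.

```perl
use strict;
use warnings;

# Group 1 matches one balanced parenthesised group; (?1) recurses into it.
my $balanced = qr{
    (                   # group 1: a balanced group
        \(
        (?: [^()]++     # a run of characters that are not parentheses
          | (?1)        # or a nested balanced group
        )*
        \)
    )
}x;

my $text = 'call(a, f(b, (c)), d) and tail(';
my ($group) = $text =~ $balanced;

print "$group\n";       # prints (a, f(b, (c)), d)
```

The unbalanced C<tail(> at the end is simply not matched. Real markup has
attributes, comments, and quoting rules that a sketch this small ignores.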

=head2 I put a regular expression into $/ but it didn't work. What's wrong?
X<$/, regexes in> X<$INPUT_RECORD_SEPARATOR, regexes in>
@@ -807,31 +814,18 @@ These strings do not match /\Bam\B/
"I am Sam" # "am" surrounded by non-word chars


=head2 Why does using $&, $`, or $' slow my program down?
=head2 Why did using $&, $`, or $' slow my program down in older Perls?
X<$MATCH> X<$&> X<$POSTMATCH> X<$'> X<$PREMATCH> X<$`>

(contributed by Anno Siegel)

Once Perl sees that you need one of these variables anywhere in the
program, it provides them on each and every pattern match. That means
that on every pattern match the entire string will be copied, part of it
to $`, part to $&, and part to $'. Thus the penalty is most severe with
long strings and patterns that match often. Avoid $&, $', and $` if you
can, but if you can't, once you've used them at all, use them at will
because you've already paid the price. Remember that some algorithms
really appreciate them. As of the 5.005 release, the $& variable is no
longer "expensive" the way the other two are.

Since Perl 5.6.1 the special variables @- and @+ can functionally replace
$`, $& and $'. These arrays contain pointers to the beginning and end
of each match (see perlvar for the full story), so they give you
essentially the same information, but without the risk of excessive
string copying.

Perl 5.10 added three specials, C<${^MATCH}>, C<${^PREMATCH}>, and
C<${^POSTMATCH}> to do the same job but without the global performance
penalty. Perl 5.10 only sets these variables if you compile or execute the
regular expression with the C</p> modifier.
(contributed by Anno Siegel, revised by Laurent Dami)

In versions prior to 5.20.0,
once Perl saw that you needed one of these variables anywhere in the
program, it provided them on each and every pattern match. That meant
that on every pattern match the entire string was copied, part of it
to $`, part to $&, and part to $'. Thus the penalty was most severe with
long strings and patterns that match often. Since version 5.20.0 that
problem has been solved and is no longer a concern.
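
For code that must still run on older Perls, the offsets in C<@-> and C<@+>
can reconstruct the same three pieces with C<substr>. The following is a
minimal sketch with a hypothetical sample string; it reads both forms so
they can be compared side by side.

```perl
use strict;
use warnings;

my $str = 'abcdefghi';
$str =~ /def/ or die "no match";

# The three match variables.
my ($pre, $match, $post) = ($`, $&, $');

# The same pieces rebuilt from the match offsets:
# $-[0] is where the match starts, $+[0] is where it ends.
my $pre_off   = substr($str, 0, $-[0]);
my $match_off = substr($str, $-[0], $+[0] - $-[0]);
my $post_off  = substr($str, $+[0]);

print "$pre:$match:$post\n";               # prints abc:def:ghi
print "$pre_off:$match_off:$post_off\n";   # prints abc:def:ghi
```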

=head2 What good is C<\G> in a regular expression?
X<\G>
29 changes: 13 additions & 16 deletions pod/perlvar.pod
@@ -1027,12 +1027,14 @@ considered to be one of many good reasons to avoid C<goto LABEL>.

=head3 Performance issues

Traditionally in Perl, any use of any of the three variables C<$`>, C<$&>
In Perl prior to 5.20.0, any use of any of the three variables C<$`>, C<$&>
or C<$'> (or their C<use English> equivalents) anywhere in the code, caused
all subsequent successful pattern matches to make a copy of the matched
string, in case the code might subsequently access one of those variables.
This imposed a considerable performance penalty across the whole program,
so generally the use of these variables has been discouraged.
so generally the use of these variables was discouraged. Most Perl textbooks
and tutorials still reflect these ancient recommendations; but under recent
versions of Perl, they are no longer necessary, as explained below.

In Perl 5.6.0 the C<@-> and C<@+> dynamic arrays were introduced that
supply the indices of successful matches. So you could for example do
@@ -1068,8 +1070,9 @@ In Perl 5.20.0 a new copy-on-write system was enabled by default, which
finally fixes most of the performance issues with these three variables, and
makes them safe to use anywhere.

The C<Devel::NYTProf> and C<Devel::FindAmpersand> modules can help you
find uses of these problematic match variables in your code.
If you work with older Perl versions, when these match variables were still
problematic, then the C<Devel::NYTProf> and C<Devel::FindAmpersand> modules
can help you find uses of these variables in your code.

=over 8

@@ -1140,9 +1143,6 @@ X<$&> X<$MATCH>
The string matched by the last successful pattern match.
(See L</Scoping Rules of Regex Variables>.)

See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.

This variable is read-only, and its value is dynamically scoped.

Mnemonic: like C<&> in some editors.
@@ -1154,7 +1154,8 @@ It is only guaranteed to return a defined value when the pattern was
compiled or executed with the C</p> modifier.

This is similar to C<$&> (C<$MATCH>) except that to use it you must
use the C</p> modifier when executing the pattern, and it does not incur
use the C</p> modifier when executing the pattern, and in versions
prior to 5.20.0 it does not incur
the performance penalty associated with that variable.

See L</Performance issues> above.
@@ -1171,9 +1172,6 @@ X<$`> X<$PREMATCH>
The string preceding whatever was matched by the last successful
pattern match. (See L</Scoping Rules of Regex Variables>).

See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.

This variable is read-only, and its value is dynamically scoped.

Mnemonic: C<`> often precedes a quoted string.
@@ -1185,7 +1183,8 @@ It is only guaranteed to return a defined value when the pattern was
executed with the C</p> modifier.

This is similar to C<$`> ($PREMATCH) except that to use it you must
use the C</p> modifier when executing the pattern, and it does not incur
use the C</p> modifier when executing the pattern, and in versions
prior to 5.20.0 it does not incur
the performance penalty associated with that variable.

See L</Performance issues> above.
@@ -1207,9 +1206,6 @@ pattern match. (See L</Scoping Rules of Regex Variables>). Example:
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi

See L</Performance issues> above for the serious performance implications
of using this variable (even once) in your code.

This variable is read-only, and its value is dynamically scoped.

Mnemonic: C<'> often follows a quoted string.
@@ -1221,7 +1217,8 @@ It is only guaranteed to return a defined value when the pattern was
compiled or executed with the C</p> modifier.

This is similar to C<$'> (C<$POSTMATCH>) except that to use it you must
use the C</p> modifier when executing the pattern, and it does not incur
use the C</p> modifier when executing the pattern, and in versions
prior to 5.20.0 it does not incur
the performance penalty associated with that variable.

See L</Performance issues> above.
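
For illustration, a minimal sketch of the C</p> variables, reusing the
C<abcdefghi> string from the C<$'> example above. The variable names are
hypothetical.

```perl
use strict;
use warnings;

my $str = 'abcdefghi';

# The /p modifier asks Perl to preserve the pre-match, match, and
# post-match strings for this pattern.
$str =~ /def/p or die "no match";

my $joined = "${^PREMATCH}:${^MATCH}:${^POSTMATCH}";
print "$joined\n";    # prints abc:def:ghi
```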