<html><head><title>Pattern-Matching Operators (Programming Perl)</title><!-- STYLESHEET --><link rel="stylesheet" type="text/css" href="../style/style1.css"><!-- METADATA --><!--Dublin Core Metadata--><meta name="DC.Creator" content=""><meta name="DC.Date" content=""><meta name="DC.Format" content="text/xml" scheme="MIME"><meta name="DC.Generator" content="XSLT stylesheet, xt by James Clark"><meta name="DC.Identifier" content=""><meta name="DC.Language" content="en-US"><meta name="DC.Publisher" content="O'Reilly & Associates, Inc."><meta name="DC.Source" content="" scheme="ISBN"><meta name="DC.Subject.Keyword" content=""><meta name="DC.Title" content="Pattern-Matching Operators"><meta name="DC.Type" content="Text.Monograph"></head><body><!-- START OF BODY --><!-- TOP BANNER --><img src="gifs/smbanner.gif" usemap="#banner-map" border="0" alt="Book Home"><map name="banner-map"><AREA SHAPE="RECT" COORDS="0,0,466,71" HREF="index.htm" ALT="Programming Perl"><AREA SHAPE="RECT" COORDS="467,0,514,18" HREF="jobjects/fsearch.htm" ALT="Search this book"></map><!-- TOP NAV BAR --><div class="navbar"><table width="515" border="0"><tr><td align="left" valign="top" width="172"><a href="ch05_01.htm"><img src="../gifs/txtpreva.gif" alt="Previous" border="0"></a></td><td align="center" valign="top" width="171"><a href="ch05_01.htm">Chapter 5: Pattern Matching</a></td><td align="right" valign="top" width="172"><a href="ch05_03.htm"><img src="../gifs/txtnexta.gif" alt="Next" border="0"></a></td></tr></table></div><hr width="515" align="left"><!-- SECTION BODY --><h2 class="sect1">5.2. Pattern-Matching Operators</h2>
<p><a name="INDEX-1298"></a><a name="INDEX-1299"></a>Zoologically speaking, Perl's pattern-matching operators function as a kind of cage for regular expressions, to keep them from getting out. This is by design; if we were to let the regex beasties wander throughout the language, Perl would be a total jungle.
The world needs its jungles, of course--they're the engines of biological diversity, after all--but jungles should stay where they belong. Similarly, despite being the engines of combinatorial diversity, regular expressions should stay inside pattern match operators where they belong. It's a jungle in there.</p>
<p><a name="INDEX-1300"></a><a name="INDEX-1301"></a><a name="INDEX-1302"></a><a name="INDEX-1303"></a><a name="INDEX-1304"></a><a name="INDEX-1305"></a><a name="INDEX-1306"></a><a name="INDEX-1307"></a><a name="INDEX-1308"></a>As if regular expressions weren't powerful enough, the <tt class="literal">m//</tt> and <tt class="literal">s///</tt> operators also provide the (likewise confined) power of double-quote interpolation. Since patterns are parsed like double-quoted strings, all the normal double-quote conventions will work, including variable interpolation (unless you use single quotes as the delimiter) and special characters indicated with backslash escapes. (See "Specific Characters" later in this chapter.) These are applied before the string is interpreted as a regular expression. (This is one of the few places in the Perl language where a string undergoes more than one pass of processing.) The first pass is not quite normal double-quote interpolation, in that it knows what it should interpolate and what it should pass on to the regular expression parser. So, for instance, any <tt class="literal">$</tt> immediately followed by a vertical bar, closing parenthesis, or the end of the string will be treated not as a variable interpolation, but as the traditional regex assertion meaning end-of-line. So if you say:<blockquote><pre class="programlisting">$foo = "bar";
/$foo$/;</pre></blockquote>the double-quote interpolation pass knows that those two <tt class="literal">$</tt> signs are functioning differently.
It does the interpolation of <tt class="literal">$foo</tt>, then hands this to the regular expression parser:<blockquote><pre class="programlisting">/bar$/;</pre></blockquote><a name="INDEX-1309"></a>Another consequence of this two-pass parsing is that the ordinary Perl tokener finds the end of the regular expression first, just as if it were looking for the terminating delimiter of an ordinary string. Only after it has found the end of the string (and done any variable interpolation) is the pattern treated as a regular expression. Among other things, this means you can't "hide" the terminating delimiter of a pattern inside a regex construct (such as a character class or a regex comment, which we haven't covered yet). Perl will see the delimiter wherever it is and terminate the pattern at that point.</p>
<p><a name="INDEX-1310"></a><a name="INDEX-1311"></a>You should also know that interpolating variables into a pattern slows down the pattern matcher, because it feels it needs to check whether the variable has changed, in case it has to recompile the pattern (which will slow it down even further). See "Variable Interpolation" later in this chapter.</p>
<p><a name="INDEX-1312"></a>The <tt class="literal">tr///</tt> transliteration operator does not interpolate variables; it doesn't even use regular expressions! (In fact, it probably doesn't belong in this chapter at all, but we couldn't think of a better place to put it.)
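As a small sketch of that non-interpolation (the variable names and sample string are illustrative assumptions, not taken from the text), a <tt class="literal">$set</tt> written inside <tt class="literal">tr///</tt> is just the literal characters <tt class="literal">$</tt>, <tt class="literal">s</tt>, <tt class="literal">e</tt>, and <tt class="literal">t</tt>:<blockquote><pre class="programlisting">use strict;
use warnings;

my $set  = "abc";            # hypothetical; tr/// never looks at this value
my $text = 'aabbcc $set';    # single quotes keep the literal text "$set"

# With an empty replacement list and no modifiers, tr/// only counts
# the characters it finds and leaves the string unchanged.
my $literal  = ($text =~ tr/$set//);   # counts '$', 's', 'e', and 't'
my $intended = ($text =~ tr/abc//);    # counts 'a', 'b', and 'c'

print "literal: $literal, intended: $intended\n";</pre></blockquote>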
It does share one feature with <tt class="literal">m//</tt> and <tt class="literal">s///</tt>, however: it binds to variables using the <tt class="literal">=~</tt> and <tt class="literal">!~</tt> operators.</p>
<p><a name="INDEX-1313"></a><a name="INDEX-1314"></a><a name="INDEX-1315"></a><a name="INDEX-1316"></a>The <tt class="literal">=~</tt> and <tt class="literal">!~</tt> operators, described in <a href="ch03_01.htm">Chapter 3, "Unary and Binary Operators"</a>, bind the scalar expression on their lefthand side to one of three quote-like operators on their right: <tt class="literal">m//</tt> for matching a pattern, <tt class="literal">s///</tt> for substituting some string for a substring matched by a pattern, and <tt class="literal">tr///</tt> (or its synonym, <tt class="literal">y///</tt>) for transliterating one set of characters to another set. (You may write <tt class="literal">m//</tt> as <tt class="literal">//</tt>, without the <tt class="literal">m</tt>, if slashes are used for the delimiter.)
If the righthand side of <tt class="literal">=~</tt> or <tt class="literal">!~</tt> is none of these three, it still counts as an <tt class="literal">m//</tt> matching operation, but there'll be no place to put any trailing modifiers (see "Pattern Modifiers" later), and you'll have to handle your own quoting:<blockquote><pre class="programlisting">print "matches" if $somestring =~ $somepattern;</pre></blockquote>Really, there's little reason not to spell it out explicitly:<blockquote><pre class="programlisting">print "matches" if $somestring =~ m/$somepattern/;</pre></blockquote>When used for a matching operation, <tt class="literal">=~</tt> and <tt class="literal">!~</tt> are sometimes pronounced "matches" and "doesn't match" respectively (although "contains" and "doesn't contain" might cause less confusion).</p>
<p><a name="INDEX-1317"></a><a name="INDEX-1318"></a><a name="INDEX-1319"></a>Apart from the <tt class="literal">m//</tt> and <tt class="literal">s///</tt> operators, regular expressions show up in two other places in Perl. The first argument to the <tt class="literal">split</tt> function is a special match operator specifying what <em class="emphasis">not</em> to return when breaking a string into multiple substrings. See the description and examples for <tt class="literal">split</tt> in <a href="ch29_01.htm">Chapter 29, "Functions"</a>.
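As a rough illustration (the sample string and variable names are assumptions made for this sketch), the pattern handed to <tt class="literal">split</tt> describes the separators, which are thrown away, rather than the fields that come back:<blockquote><pre class="programlisting">use strict;
use warnings;

my $line = "apple, banana ,cherry";    # hypothetical input

# The pattern matches the separators (a comma with optional spaces
# on either side); split returns everything that is *not* matched.
my @fruits = split /\s*,\s*/, $line;

print join("|", @fruits), "\n";</pre></blockquote>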
The <tt class="literal">qr//</tt> ("quote regex") operator also specifies a pattern via a regex, but it doesn't try to match anything (unlike <tt class="literal">m//</tt>, which does). Instead, the compiled form of the regex is returned for future use. See "Variable Interpolation" for more information.</p>
<p><a name="INDEX-1320"></a><a name="INDEX-1321"></a><a name="INDEX-1322"></a>You apply one of the <tt class="literal">m//</tt>, <tt class="literal">s///</tt>, or <tt class="literal">tr///</tt> operators to a particular string with the <tt class="literal">=~</tt> binding operator (which isn't a real operator, just a kind of topicalizer, linguistically speaking). Here are some examples:<blockquote><pre class="programlisting">$haystack =~ m/needle/                # match a simple pattern
$haystack =~  /needle/                # same thing

$italiano =~ s/butter/olive oil/      # a healthy substitution

$rotate13 =~ tr/a-zA-Z/n-za-mN-ZA-M/  # easy encryption (to break)</pre></blockquote>Without a binding operator, <tt class="literal">$_</tt> is implicitly used as the "topic":<blockquote><pre class="programlisting">/new life/ and            # search in $_ and (if found)
    /new civilizations/   #    boldly search $_ again

s/sugar/aspartame/        # substitute a substitute into $_

tr/ATCG/TAGC/             # complement the DNA stranded in $_</pre></blockquote><a name="INDEX-1323"></a><a name="INDEX-1324"></a>Because <tt class="literal">s///</tt> and <tt class="literal">tr///</tt> change the scalar to which they're applied, you may only use them on valid lvalues:<blockquote><pre class="programlisting">"onshore" =~ s/on/off/;      # WRONG: compile-time error</pre></blockquote>However, <tt class="literal">m//</tt> works on the result of any scalar expression:<blockquote><pre class="programlisting">if ((lc $magic_hat->fetch_contents->as_string) =~ /rabbit/) {
    print "Nyaa, what's up doc?\n";
}
else {
    print "That trick never works!\n";
}</pre></blockquote>But you have to be a wee bit careful, since <tt class="literal">=~</tt> and <tt class="literal">!~</tt> have rather
high precedence--in our previous example the parentheses are necessary around the left term.<a href="#FOOTNOTE-3">[3]</a> The <tt class="literal">!~</tt> binding operator works like <tt class="literal">=~</tt>, but negates the logical result of the operation:<blockquote><pre class="programlisting">if ($song !~ /words/) {
    print qq/"$song" appears to be a song without words.\n/;
}</pre></blockquote><a name="INDEX-1325"></a>Since <tt class="literal">m//</tt>, <tt class="literal">s///</tt>, and <tt class="literal">tr///</tt> are quote operators, you may pick your own delimiters. These work in the same way as the quoting operators <tt class="literal">q//</tt>, <tt class="literal">qq//</tt>, <tt class="literal">qr//</tt>, and <tt class="literal">qw//</tt> (see <a href="ch02_06.htm#ch02-sect-pick">Section 2.6.3, "Pick Your Own Quotes"</a> in <a href="ch02_01.htm">Chapter 2, "Bits and Pieces"</a>).<blockquote><pre class="programlisting">$path =~ s#/tmp#/var/tmp/scratch#;

if ($dir =~ m[/bin]) {
    print "No binary directories please.\n";
}</pre></blockquote>When using paired delimiters with <tt class="literal">s///</tt> or <tt class="literal">tr///</tt>, if the first part is one of the four customary bracketing pairs (angle, round, square, or curly), you may choose different delimiters for the second part than you chose for the first:<blockquote><pre class="programlisting">s(egg)&lt;larva&gt;;
s{larva}{pupa};
s[pupa]/imago/;</pre></blockquote>Whitespace is allowed in front of the opening delimiters:<blockquote><pre class="programlisting">s (egg)   &lt;larva&gt;;
s {larva} {pupa};
s [pupa]  /imago/;</pre></blockquote><a name="INDEX-1326"></a><a name="INDEX-1327"></a><a name="INDEX-1328"></a>Each time a pattern successfully matches (including the pattern in a substitution), it sets the <tt class="literal">$`</tt>, <tt class="literal">$&</tt>, and <tt class="literal">$'</tt> variables to the text left of the match, the whole match, and the text right of the match.
This is useful for pulling apart strings into their components:<blockquote><pre class="programlisting">"hot cross buns" =~ /cross/;
print "Matched: &lt;$`&gt; $&amp; &lt;$'&gt;\n";    # Matched: &lt;hot &gt; cross &lt; buns&gt;
print "Left:    &lt;$`&gt;\n";             # Left:    &lt;hot &gt;
print "Match:   &lt;$&amp;&gt;\n";             # Match:   &lt;cross&gt;
print "Right:   &lt;$'&gt;\n";             # Right:   &lt; buns&gt;</pre></blockquote><a name="INDEX-1329"></a>For better granularity and efficiency, use parentheses to capture the particular portions that you want to keep around. Each pair of parentheses captures the substring corresponding to the <em class="emphasis">subpattern</em> in the parentheses. The pairs of parentheses are numbered from left to right by the positions of the left parentheses; the substrings corresponding to those subpatterns are available after the match in the numbered variables, <tt class="literal">$1</tt>, <tt class="literal">$2</tt>, <tt class="literal">$3</tt>, and so on:<a href="#FOOTNOTE-4">[4]</a><blockquote><pre class="programlisting">$_ = "Bilbo Baggins's birthday is September 22";
/(.*)'s birthday is (.*)/;
print "Person: $1\n";
print "Date: $2\n";</pre></blockquote><a name="INDEX-1330"></a><a name="INDEX-1331"></a><a name="INDEX-1332"></a><tt class="literal">$`</tt>, <tt class="literal">$&</tt>, <tt class="literal">$'</tt>, and the numbered variables are global variables implicitly localized to the enclosing dynamic scope. They last until the next successful pattern match or the end of the current scope, whichever comes first.
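A minimal sketch of that scoping, using made-up strings and a bare block standing in for any enclosing scope (the <tt class="literal">@seen</tt> array is a hypothetical log added only for illustration):<blockquote><pre class="programlisting">use strict;
use warnings;

my @seen;    # hypothetical record of what $1 held at each point

if ("outer text" =~ /(\w+)/) {
    push @seen, $1;                 # capture from the outer match
    {
        "inner text" =~ /(\w+)/;    # successful match in a nested block
        push @seen, $1;             # capture from the inner match
    }
    push @seen, $1;                 # outer capture is visible again
}

print "@seen\n";</pre></blockquote>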
More on this later, in a different scope.</p>
<blockquote class="footnote"><a name="FOOTNOTE-3"></a><p>[3] Without the parentheses, the lower-precedence <tt class="literal">lc</tt> would have applied to the whole pattern match instead of just the method call on the magic hat object.</p></blockquote>
<blockquote class="footnote"><a name="FOOTNOTE-4"></a><p>[4] Not <tt class="literal">$0</tt>, though, which holds the name of your program.</p></blockquote>
<p>Once Perl sees that you need one of <tt class="literal">$`</tt>, <tt class="literal">$&</tt>, or <tt class="literal">$'</tt> anywhere in the program, it provides them for every pattern match. This will slow down your program a bit. Perl uses a similar mechanism to produce <tt class="literal">$1</tt>, <tt class="literal">$2</tt>, and so on, so you also pay a price for each pattern that contains capturing parentheses. (See "Clustering" to avoid the cost of capturing while still retaining the grouping behavior.) But if you never use <tt class="literal">$`</tt>, <tt class="literal">$&</tt>, or <tt class="literal">$'</tt>, then patterns <em class="emphasis">without</em> capturing parentheses will not be penalized. So it's usually best to avoid <tt class="literal">$`</tt>, <tt class="literal">$&</tt>, and <tt class="literal">$'</tt> if you can, especially in library modules. But if you must use them once (and some algorithms really appreciate their convenience), then use them at will, because you've already paid the price. <tt class="literal">$&</tt> is not so costly as the other two in recent versions of Perl.</p>
<h3 class="sect2">5.2.1. Pattern Modifiers</h3>
<p><a name="INDEX-1333"></a><a name="INDEX-1334"></a><a name="INDEX-1335"></a>