We'll discuss the individual pattern-matching operators in a moment, but first we'd like to mention another thing they all have in common, <em class="emphasis">modifiers</em>.</p>
<p><a name="INDEX-1336"></a><a name="INDEX-1337"></a><a name="INDEX-1338"></a><a name="INDEX-1339"></a>Immediately following the final delimiter of an <tt class="literal">m//</tt>, <tt class="literal">s///</tt>, <tt class="literal">qr//</tt>, or <tt class="literal">tr///</tt> operator, you may optionally place one or more single-letter modifiers, in any order. For clarity, modifiers are usually written as "the <tt class="literal">/o</tt> modifier" and pronounced "the slash oh modifier", even though the final delimiter might be something other than a slash. (Sometimes people say "flag" or "option" to mean "modifier"; that's okay too.)<a name="INDEX-1340"></a><a name="INDEX-1341"></a></p>
<p>Some modifiers change the behavior of the individual operator, so we'll describe those in detail later. Others change how the regex is interpreted, so we'll talk about them here. 
The <tt class="literal">m//</tt>, <tt class="literal">s///</tt>, and <tt class="literal">qr//</tt> operators<a href="#FOOTNOTE-5">[5]</a> all accept the following modifiers after their final delimiter:</p>
<blockquote class="footnote"><a name="FOOTNOTE-5"></a><p>[5] The <tt class="literal">tr///</tt> operator does not take regexes, so these modifiers do not apply.</p></blockquote>
<a name="perl3-tab-patmods"></a><table border="1">
<tr><th>Modifier</th><th>Meaning</th></tr>
<tr><td><tt class="literal">/i</tt></td><td>Ignore alphabetic case distinctions (case insensitive).<a name="INDEX-1342"></a><a name="INDEX-1343"></a></td></tr>
<tr><td><tt class="literal">/s</tt></td><td>Let <tt class="literal">.</tt> match newline and ignore deprecated <tt class="literal">$*</tt> variable.<a name="INDEX-1344"></a></td></tr>
<tr><td><tt class="literal">/m</tt></td><td>Let <tt class="literal">^</tt> and <tt class="literal">$</tt> match next to embedded <tt class="literal">\n</tt>.<a name="INDEX-1345"></a></td></tr>
<tr><td><tt class="literal">/x</tt></td><td>Ignore (most) whitespace and permit comments in pattern.<a name="INDEX-1346"></a></td></tr>
<tr><td><tt class="literal">/o</tt></td><td>Compile pattern once only.<a name="INDEX-1347"></a></td></tr>
</table>
<p><a name="INDEX-1348"></a>The <tt class="literal">/i</tt> modifier says to match both upper- and lowercase (and titlecase, under Unicode). That way <tt class="literal">/perl/i</tt> would also match the strings "<tt class="literal">PROPERLY</tt>" or "<tt class="literal">Perlaceous</tt>" (amongst other things). A <tt class="literal">use locale</tt> pragma may also have some influence on what is considered to be equivalent. (This may be a negative influence on strings containing Unicode.)</p>
<p><a name="INDEX-1349"></a><a name="INDEX-1350"></a>The <tt class="literal">/s</tt> and <tt class="literal">/m</tt> modifiers don't involve anything kinky. Rather, they affect how Perl treats matches against a string that contains newlines. 
But they aren't about whether your string actually contains newlines; they're about whether Perl should <em class="emphasis">assume</em> that your string contains a single line (<tt class="literal">/s</tt>) or multiple lines (<tt class="literal">/m</tt>), because certain metacharacters work differently depending on whether they're expected to behave in a line-oriented fashion or not.</p>
<p><a name="INDEX-1351"></a>Ordinarily, the metacharacter "<tt class="literal">.</tt>" matches any one character <em class="emphasis">except</em> a newline, because its traditional meaning is to match characters within a line. With <tt class="literal">/s</tt>, however, the "<tt class="literal">.</tt>" metacharacter can also match a newline, because you've told Perl to ignore the fact that the string might contain multiple newlines. (The <tt class="literal">/s</tt> modifier also makes Perl ignore the deprecated <tt class="literal">$*</tt> variable, which we hope you too have been ignoring.) The <tt class="literal">/m</tt> modifier, on the other hand, changes the interpretation of the <tt class="literal">^</tt> and <tt class="literal">$</tt> metacharacters by letting them match next to newlines within the string instead of considering only the ends of the string. 
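To make the distinction concrete, here is a minimal sketch that applies the same two patterns with and without each modifier. The sample string and the variable names are ours, made up purely for illustration:<blockquote><pre class="programlisting">use strict;
use warnings;

# A hypothetical two-line string with an embedded newline.
my $text = "first line\nsecond line\n";

# Without /s, "." refuses to cross the newline, so this fails.
my $dot_plain = ($text =~ /first.*second/)  ? 1 : 0;
# With /s, "." may match the newline, so this succeeds.
my $dot_s     = ($text =~ /first.*second/s) ? 1 : 0;

# Without /m, "^" anchors only at the very start of the string.
my $caret_plain = ($text =~ /^second/)  ? 1 : 0;
# With /m, "^" also matches just after an embedded newline.
my $caret_m     = ($text =~ /^second/m) ? 1 : 0;

# /m combined with /g picks up the first word of every line.
my @line_starts = ($text =~ /^(\w+)/mg);

print "dot: $dot_plain $dot_s; caret: $caret_plain $caret_m; ",
      "line starts: @line_starts\n";</pre></blockquote>The two modifiers are independent of each other, so nothing stops you from writing <tt class="literal">/ms</tt> when you want both behaviors at once. 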
See the examples in <a href="ch05_06.htm#ch05-sect-posit">Section 5.6, "Positions"</a> later in this chapter.</p>
<p><a name="INDEX-1352"></a><a name="INDEX-1353"></a><a name="INDEX-1354"></a>The <tt class="literal">/o</tt> modifier controls pattern recompilation. Unless the delimiters chosen are single quotes (<tt class="literal">m'</tt><em class="replaceable">PATTERN</em><tt class="literal">'</tt>, <tt class="literal">s'</tt><em class="replaceable">PATTERN</em><tt class="literal">'</tt><em class="replaceable">REPLACEMENT</em><tt class="literal">'</tt>, or <tt class="literal">qr'</tt><em class="replaceable">PATTERN</em><tt class="literal">'</tt>), any variables in the pattern will be interpolated (and may cause the pattern to be recompiled) every time the pattern operator is evaluated. If you want such a pattern to be compiled once and only once, use the <tt class="literal">/o</tt> modifier. This prevents expensive run-time recompilations; it's useful when the value you are interpolating won't change during execution. However, mentioning <tt class="literal">/o</tt> constitutes a promise that you won't change the variables in the pattern. If you do change them, Perl won't even notice. For better control over recompilation, use the <tt class="literal">qr//</tt> regex quoting operator. 
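As a minimal sketch of what breaking that promise looks like, compare the two loops below, in which the interpolated variable does change between iterations. The loop, the strings, and the variable names are ours, chosen only for illustration:<blockquote><pre class="programlisting">use strict;
use warnings;

# With /o the pattern is compiled from the first value of $word
# ("cat") and the later change to "dog" goes unnoticed.
my @seen_by_o;
for my $word ("cat", "dog") {
    push @seen_by_o, ("hotdog" =~ /$word/o ? 1 : 0);
}

# With qr// we decide when compilation happens: here, once for
# each value of $word, so the second pattern really is /dog/.
my @seen_by_qr;
for my $word ("cat", "dog") {
    my $re = qr/$word/;
    push @seen_by_qr, ("hotdog" =~ $re ? 1 : 0);
}

print "with /o: @seen_by_o; with qr//: @seen_by_qr\n";</pre></blockquote>In this sketch the <tt class="literal">/o</tt> match fails both times because it keeps using the pattern built from the first value, whereas each <tt class="literal">qr//</tt> object reflects the value that was current when it was built. 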
See "Variable Interpolation" later in this chapter for details.</p>
<p><a name="INDEX-1355"></a><a name="INDEX-1356"></a>The <tt class="literal">/x</tt> is the <em class="emphasis">ex</em>pressive modifier: it allows you to <em class="emphasis">ex</em>ploit whitespace and <em class="emphasis">ex</em>planatory comments in order to <em class="emphasis">ex</em>pand your pattern's legibility, even <em class="emphasis">ex</em>tending the pattern across newline boundaries.</p>
<p><a name="INDEX-1357"></a><a name="INDEX-1358"></a>Er, that is to say, <tt class="literal">/x</tt> modifies the meaning of the whitespace characters (and the <tt class="literal">#</tt> character): instead of letting them do self-matching as ordinary characters do, it turns them into metacharacters that, oddly, now behave as whitespace (and comment characters) should. Hence, <tt class="literal">/x</tt> allows spaces, tabs, and newlines for formatting, just like regular Perl code. It also allows the <tt class="literal">#</tt> character, not normally special in a pattern, to introduce a comment that extends through the end of the current line within the pattern string.<a href="#FOOTNOTE-6">[6]</a> If you want to match a real whitespace character (or the <tt class="literal">#</tt> character), then you'll have to put it into a character class, or escape it with a backslash, or encode it using an octal or hex escape. (But whitespace is normally matched with a <tt class="literal">\s*</tt> or <tt class="literal">\s+</tt> sequence, so the situation doesn't arise often in practice.)</p>
<blockquote class="footnote"><a name="FOOTNOTE-6"></a><p>[6] Be careful not to include the pattern delimiter in the comment--because of its "find the end first" rule, Perl has no way of knowing you didn't intend to terminate the pattern at that point.</p></blockquote>
<p>Taken together, these features go a long way toward making traditional regular expressions a readable language. 
In the spirit of TMTOWTDI, there's now more than one way to write a given regular expression. In fact, there's more than two ways:<blockquote><pre class="programlisting">m/\w+:(\s+\w+)\s*\d+/;       # A word, colon, space, word, space, digits.

m/\w+: (\s+ \w+) \s* \d+/x;  # A word, colon, space, word, space, digits.

m{
    \w+:                     # Match a word and a colon.
    (                        # (begin group)
         \s+                 # Match one or more spaces.
         \w+                 # Match another word.
    )                        # (end group)
    \s*                      # Match zero or more spaces.
    \d+                      # Match some digits
}x;</pre></blockquote><a name="INDEX-1359"></a>We'll explain those new metasymbols later in the chapter. (This section was supposed to be about pattern modifiers, but we've let it get out of hand in our <em class="emphasis">ex</em>citement about <tt class="literal">/x</tt>. Ah well.) Here's a regular expression that finds duplicate words in paragraphs, stolen right out of the <em class="citetitle">Perl Cookbook</em>. It uses the <tt class="literal">/x</tt> and <tt class="literal">/i</tt> modifiers, as well as the <tt class="literal">/g</tt> modifier described later.<blockquote><pre class="programlisting"># Find duplicate words in paragraphs, possibly spanning line boundaries.
#   Use /x for space and comments, /i to match both `is'
#   in "Is is this ok?", and use /g to find all dups.
$/ = "";        # "paragrep" mode
while (<>) {
    while ( m{
                \b            # start at a word boundary
                (\w\S+)       # find a wordish chunk
                (
                    \s+       # separated by some whitespace
                    \1        # and that chunk again
                ) +           # repeat ad lib
                \b            # until another word boundary
             }xig
         )
    {
        print "dup word '$1' at paragraph $.\n";
    }
}</pre></blockquote>When run on this chapter, it produces warnings like this:<blockquote><pre class="programlisting">dup word 'that' at paragraph 100</pre></blockquote>As it happens, we know that that particular instance was intentional.</p>
<a name="INDEX-1360"></a><a name="INDEX-1361"></a><a name="INDEX-1362"></a><h3 class="sect2">5.2.2. 
The m// Operator (Matching)</h3>
<p><a name="INDEX-1363"></a><blockquote><pre class="programlisting"><em class="replaceable">EXPR</em> =~ m/<em class="replaceable">PATTERN</em>/cgimosx
<em class="replaceable">EXPR</em> =~ /<em class="replaceable">PATTERN</em>/cgimosx
<em class="replaceable">EXPR</em> =~ ?<em class="replaceable">PATTERN</em>?cgimosx
m/<em class="replaceable">PATTERN</em>/cgimosx
/<em class="replaceable">PATTERN</em>/cgimosx
?<em class="replaceable">PATTERN</em>?cgimosx</pre></blockquote><a name="INDEX-1364"></a>The <tt class="literal">m//</tt> operator searches the string in the scalar <em class="replaceable">EXPR</em> for <em class="replaceable">PATTERN</em>. If <tt class="literal">/</tt> or <tt class="literal">?</tt> is the delimiter, the initial <tt class="literal">m</tt> is optional. Both <tt class="literal">?</tt> and <tt class="literal">'</tt> have special meanings as delimiters: the first is a once-only match; the second suppresses variable interpolation and the six translation escapes (<tt class="literal">\U</tt> and company, described later).</p>
<p><a name="INDEX-1365"></a>If <em class="replaceable">PATTERN</em> evaluates to a null string, either because you specified it that way using <tt class="literal">//</tt> or because an interpolated variable evaluated to the empty string, the last successfully executed regular expression not hidden within an inner block (or within a <tt class="literal">split</tt>, <tt class="literal">grep</tt>, or <tt class="literal">map</tt>) is used instead.</p>
<p><a name="INDEX-1366"></a>In scalar context, the operator returns true (<tt class="literal">1</tt>) if successful, false (<tt class="literal">""</tt>) otherwise. This form is usually seen in Boolean context:<blockquote><pre class="programlisting">if ($shire =~ m/Baggins/) { ... }  # search for Baggins in $shire
if ($shire =~ /Baggins/)  { ... }  # search for Baggins in $shire

if ( m#Baggins# )         { ... }  # search right here in $_
if ( /Baggins/ )          { ... }  # search right here in $_</pre></blockquote><a name="INDEX-1367"></a><a name="INDEX-1368"></a>Used in list context, <tt class="literal">m//</tt> returns a list of substrings matched by the capturing parentheses in the pattern (that is, <tt class="literal">$1</tt>, <tt class="literal">$2</tt>, <tt class="literal">$3</tt>, and so on) as described later under "Capturing and Clustering". The numbered variables are still set even when the list is returned. If the match fails in list context, a null list is returned. If the match succeeds in list context but there were no capturing parentheses (nor <tt class="literal">/g</tt>), a list value of <tt class="literal">(1)</tt> is returned. Since it returns a null list on failure, this form of <tt class="literal">m//</tt> can also be used in Boolean context, but only when participating indirectly via a list assignment:<blockquote><pre class="programlisting">if (($key,$value) = /(\w+): (.*)/) { ... }</pre></blockquote>Valid modifiers for <tt class="literal">m//</tt> (in whatever guise) are shown in <a href="ch05_02.htm#perl3-tab-mmods">Table 5-1</a>.<a name="INDEX-1369"></a><a name="INDEX-1370"></a></p>
<a name="perl3-tab-mmods"></a><h4 class="objtitle">Table 5.1. m// Modifiers</h4><table border="1">
<tr><th>Modifier</th><th>Meaning</th></tr>
<tr><td><tt class="literal">/i</tt><a name="INDEX-1371"></a></td><td>Ignore alphabetic case.</td></tr>
<tr><td><tt class="literal">/m</tt><a name="INDEX-1372"></a></td><td>Let <tt class="literal">^</tt> and <tt class="literal">$</tt> match next to embedded <tt class="literal">\n</tt>.<a name="INDEX-1373"></a><a name="INDEX-1374"></a></td></tr>
<tr><td><tt class="literal">/s</tt></td><td>Let <tt class="literal">.</tt> match newline and ignore deprecated <tt class="literal">$*</tt>.<a name="INDEX-1375"></a></td></tr>