Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.</pre>

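　　The first example uses the greedy quantifier <code>.*</code> to find “anything”, zero or more times, followed by the letters “f” “o” “o”. Because the quantifier is greedy, the <code>.*</code> portion of the expression first eats the entire input string. At this point the overall expression cannot succeed, because the last three letters (“f” “o” “o”) have already been consumed. The matcher therefore backs off one letter at a time until the rightmost occurrence of “foo” has been given back, at which point the match succeeds and the search ends.<br/>
　　The second example, however, is reluctant, so it starts by consuming “nothing”. Because “foo” does not appear at the beginning of the string, it is forced to swallow the first letter (an “x”), which triggers the first match at 0 and 4. The test harness continues the process until the input string is exhausted, and finds another match at 4 and 13.<br/>
　　The third example fails to find a match because the quantifier is possessive. In this case the entire input string is consumed by <code>.*+</code>, leaving nothing to satisfy the “foo” at the end of the expression.<br/>
　　Use a possessive quantifier when you want to seize all of something without ever backing off. In cases where the match is not found immediately, it will outperform the equivalent greedy quantifier.<br/>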
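　　The three behaviors described above can be reproduced outside the test harness. The following is a minimal sketch; the class name <code>QuantifierKinds</code> and the helper <code>findAll</code> are illustrative assumptions, not part of the tutorial's harness.<br/>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares greedy, reluctant, and possessive quantifiers.
class QuantifierKinds {

    // Collects every match as "text@start-end", similar to the harness output.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("greedy:     " + findAll(".*foo", input));   // [xfooxxxxxxfoo@0-13]
        System.out.println("reluctant:  " + findAll(".*?foo", input));  // [xfoo@0-4, xxxxxxfoo@4-13]
        System.out.println("possessive: " + findAll(".*+foo", input));  // []
    }
}
```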

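<div id="h2"><a name="reg6"></a>6　Capturing Groups<span class="returnContents"><a href="#contents">Back to contents</a></span></div>

　　The previous section showed how quantifiers attach to one character, character class, or capturing group at a time. Until now, however, the notion of a capturing group has not been discussed in detail.<br/>
　　<em>Capturing groups</em> are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a pair of parentheses. For example, the regular expression <code>(dog)</code> creates a single group containing the letters “d” “o” and “g”. The portion of the input string that matches the capturing group is saved in memory for later recall via backreferences (discussed in <a href="#reg6_2">Section 6.2</a>).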

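<div id="h3"><a name="reg6_1"></a>6.1　Numbering<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　As described in the Pattern API, capturing groups are numbered by counting their opening parentheses from left to right. In the expression <code>((A)(B(C)))</code>, for example, there are four such groups:<br/>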
　　1. <code>((A)(B(C)))</code><br/>
　　2. <code>(A)</code><br/>
　　3. <code>(B(C))</code><br/>
　　4. <code>(C)</code><br/>
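　　To find out how many groups are present in the expression, call the groupCount method on a Matcher object. The groupCount method returns an int giving the number of capturing groups in the matcher's pattern. In this example, groupCount would return 4, showing that the pattern contains 4 capturing groups.<br/>
　　There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount. Groups beginning with <code>(?</code> are pure <em>non-capturing groups</em>: they do not capture text and do not count toward the group total. Named groups of the form <code>(?&lt;name&gt;X)</code>, available since Java 7, are the exception, since they do capture. (An example of a non-capturing group appears in <a href="#reg8">8 Methods of the Pattern Class</a>.)<br/>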
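　　The counting rules above can be checked directly. This is a small sketch; the class name <code>GroupCounting</code> and the helper <code>countGroups</code> are assumptions made for illustration.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: reports how many capturing groups a pattern defines.
class GroupCounting {

    // groupCount() depends only on the pattern, so an empty input is enough.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("((A)(B(C)))")); // 4
        System.out.println(countGroups("(?:A)(B)"));    // 1, the (?:...) group is not counted
        System.out.println(countGroups("dog"));         // 0, group 0 is never counted
    }
}
```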
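　　Several Matcher methods accept an int that specifies a particular group number, so it is important to understand how groups are numbered.<br/>
　　<label>public int start(int group)</label>: Returns the start index of the subsequence captured by the given group during the previous match operation.<br/>
　　<label>public int end(int group)</label>: Returns the index of the last character, plus one, of the subsequence captured by the given group during the previous match operation.<br/>
　　<label>public String group(int group)</label>: Returns the input subsequence captured by the given group during the previous match operation.<br/>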
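　　As a rough illustration of these three methods, the sketch below walks over every group of <code>((A)(B(C)))</code> after matching the input “ABC”. The class name <code>GroupAccess</code> and the helper <code>describe</code> are hypothetical.<br/>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: prints each group's text with its start and end index.
class GroupAccess {

    // Returns "n=text[start,end)" for group 0 through groupCount(), or "no match".
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= m.groupCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(i).append('=').append(m.group(i))
              .append('[').append(m.start(i)).append(',').append(m.end(i)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 0=ABC[0,3) 1=ABC[0,3) 2=A[0,1) 3=BC[1,3) 4=C[2,3)
        System.out.println(describe("((A)(B(C)))", "ABC"));
    }
}
```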

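<div id="h3"><a name="reg6_2"></a>6.2　Backreferences<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　The portion of the input string that matches a capturing group is saved in memory for later recall via <em>backreferences</em>. In a regular expression, a backreference is written as a backslash (<code>\</code>) followed by a digit giving the number of the group to be recalled. For example, the expression <code>(\d\d)</code> defines one capturing group that matches two digits in a row, and that group can be recalled later in the expression via the backreference <code>\1</code>.<br/>
　　To match any two digits followed by exactly the same two digits, use <code>(\d\d)\1</code> as the regular expression:<br/>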

<pre id="console">Enter your regex: (\d\d)\1
Enter input string to search: 1212
I found the text "1212" starting at index 0 and ending at index 4.</pre>

　　If you change the last two digits, the match fails:<br/>
 
<pre id="console">Enter your regex: (\d\d)\1
Enter input string to search: 1234
No match found.</pre>

　　For nested capturing groups, backreferencing works in exactly the same way: specify a backslash followed by the number of the group to be recalled.<br/>
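　　The sketch below repeats the two console runs in code and adds one nested case, where <code>\2</code> recalls the inner group of <code>((\d)\d)</code>. The class name <code>BackreferenceDemo</code>, the helper <code>matchesFully</code>, and the nested pattern are assumptions chosen for illustration.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: backreferences to plain and nested capturing groups.
class BackreferenceDemo {

    // True only if the whole input matches the regex.
    static boolean matchesFully(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // In Java source, each regex backslash is written as "\\".
        System.out.println(matchesFully("(\\d\\d)\\1", "1212"));   // true
        System.out.println(matchesFully("(\\d\\d)\\1", "1234"));   // false

        // Nested groups: group 1 is ((\d)\d), group 2 is the inner (\d).
        System.out.println(matchesFully("((\\d)\\d)\\2", "121"));  // true, \2 recalls "1"
        System.out.println(matchesFully("((\\d)\\d)\\2", "122"));  // false
    }
}
```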

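<div id="h2"><a name="reg7"></a>7　Boundary Matchers<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　Until now, we have only been interested in whether a match is found at some location within a particular input string; we have not cared about where in the string the match occurs.<br/>
　　You can make your pattern matches more precise by specifying such information with <em>boundary matchers</em>. For example, you may be interested in a particular word only when it appears at the beginning or end of a line. Or you may want to know whether the match takes place on a word boundary, or at the end of the previous match.<br/>
　　The following table lists and explains all the boundary matchers.<br/>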

<table border="0" cellpadding="0" cellspacing="0" class="regTab" align="center">
  <caption>Boundary Matchers</caption>
  <tr>
    <td class="regCenter"><code>^</code></td>
    <td>The beginning of a line</td>
  </tr>
  <tr>
    <td class="regCenter"><code>$</code></td>
    <td>The end of a line</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\b</code></td>
    <td>A word boundary</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\B</code></td>
    <td>A non-word boundary</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\A</code></td>
    <td>The beginning of the input</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\G</code></td>
    <td>The end of the previous match</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\Z</code></td>
    <td>The end of the input, except for the final line terminator, if any</td>
  </tr>
  <tr>
    <td class="regCenter"><code>\z</code></td>
    <td>The end of the input</td>
  </tr>
</table>
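　　The difference between <code>\Z</code> and <code>\z</code> in the table is easy to miss, so here is a small sketch of it, together with <code>\A</code>. The class name <code>InputBoundaries</code> and the helper <code>found</code> are hypothetical.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: \A, \Z, and \z on inputs with and without a trailing newline.
class InputBoundaries {

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("dog\\Z", "dog\n")); // true, \Z ignores the final line terminator
        System.out.println(found("dog\\z", "dog\n")); // false, \z is the very end of the input
        System.out.println(found("dog\\z", "dog"));   // true
        System.out.println(found("\\Adog", "dog"));   // true
        System.out.println(found("\\Adog", " dog"));  // false, "dog" is not at the start of the input
    }
}
```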

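　　The following examples demonstrate the boundary matchers <code>^</code> and <code>$</code>. As noted in the table above, <code>^</code> matches the beginning of a line and <code>$</code> matches the end.<br/>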

<pre id="console">Enter your regex: ^dog$
Enter input string to search: dog
I found the text "dog" starting at index 0 and ending at index 3.

Enter your regex: ^dog$
Enter input string to search:       dog
No match found.

Enter your regex: \s*dog$
Enter input string to search:             dog
I found the text "            dog" starting at index 0 and ending at index 15.

Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
I found the text "dogblahblah" starting at index 0 and ending at index 11.</pre>

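　　The first example succeeds because the pattern occupies the entire input string. The second example fails because the input string contains extra whitespace at the beginning. The third example specifies an expression that allows any amount of whitespace, followed by “dog” at the end of the line. The fourth example requires “dog” to be at the beginning of a line, followed by any number of word characters.<br/>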
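　　The same four checks can be run without the console harness. This is a sketch only; the class name <code>LineAnchors</code> and the helper <code>firstMatch</code> are assumptions, and the padded input is built with <code>String.format</code> so that the number of leading spaces (12) is explicit.<br/>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the ^ and $ anchors on single-line inputs.
class LineAnchors {

    // Returns "start-end" of the first match, or "none" if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + "-" + m.end() : "none";
    }

    public static void main(String[] args) {
        String padded = String.format("%15s", "dog"); // 12 spaces followed by "dog"

        System.out.println(firstMatch("^dog$", "dog"));            // 0-3
        System.out.println(firstMatch("^dog$", "      dog"));      // none
        System.out.println(firstMatch("\\s*dog$", padded));        // 0-15
        System.out.println(firstMatch("^dog\\w*", "dogblahblah")); // 0-11
    }
}
```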
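　　To check whether a pattern begins and ends on a word boundary (as opposed to being a substring within a longer string), use <code>\b</code> on either side, for example <code>\bdog\b</code>.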

<pre id="console">Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.</pre>

　　To match the expression on a non-word boundary, use <code>\B</code> instead:<br/>
 
<pre id="console">Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.</pre>
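　　Both word-boundary cases can be sketched in a few lines of code. The class name <code>WordBoundaries</code> and the helper <code>findAll</code> are illustrative assumptions.<br/>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: \b (word boundary) versus \B (non-word boundary).
class WordBoundaries {

    // Collects every match as "text@start-end".
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String dog = "The dog plays in the yard.";
        String doggie = "The doggie plays in the yard.";

        System.out.println(findAll("\\bdog\\b", dog));     // [dog@4-7]
        System.out.println(findAll("\\bdog\\b", doggie));  // []
        System.out.println(findAll("\\bdog\\B", dog));     // []
        System.out.println(findAll("\\bdog\\B", doggie));  // [dog@4-7]
    }
}
```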

　　To require that a match occur only at the end of the previous match, use <code>\G</code>:<br/>
 
<pre id="console">Enter your regex: dog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \Gdog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.</pre>

　　Here the second example finds only one match, because the second occurrence of “dog” does not start at the end of the previous match.<a name="note_07"></a><sup><a href="#note07">[7]</a></sup><br/>
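　　A short sketch of the same effect counts matches with and without <code>\G</code>. The class name <code>PreviousMatchEnd</code>, the helper <code>countMatches</code>, and the extra input “dogdog” are assumptions added for contrast.<br/>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: \G only allows a match to begin where the previous one ended.
class PreviousMatchEnd {

    // Counts how many times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("dog", "dog dog"));    // 2
        System.out.println(countMatches("\\Gdog", "dog dog")); // 1, the space breaks the chain
        System.out.println(countMatches("\\Gdog", "dogdog"));  // 2, the matches are adjacent
    }
}
```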

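<div id="h2"><a name="reg8"></a>8　Methods of the Pattern Class<span class="returnContents"><a href="#contents">Back to contents</a></span></div>

　　Until now, we have only used the test harness to create Pattern objects in their most basic form. This section explores more advanced techniques, such as creating patterns with flags and using embedded flag expressions. It also covers some additional useful methods that have not yet been discussed.<br/>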

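<div id="h3"><a name="reg8_1"></a>8.1　Creating a Pattern with Flags<span class="returnContents"><a href="#contents">Back to contents</a></span></div>

　　The Pattern class defines an alternate compile method that accepts a set of flags affecting the way the pattern is matched. The flags parameter is a bit mask that may include any of the following public static fields:<br/>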

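<div id="h4">Pattern.CANON_EQ</div>
　　Enables canonical equivalence. When this flag is specified, two characters are considered to match if, and only if, their full canonical decompositions match. The expression <code>a\u030A</code><a name="note_08"></a><sup><a href="#note08">[8]</a></sup>, for example, will match the string “\u00E5” (the character <span style="font-family: Courier New; font-size: 14pt;">&#229;</span>) when this flag is specified. By default, matching does not take canonical equivalence into account. Specifying this flag may impose a performance penalty.<br/>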
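　　The example from the description can be tried as follows. This is a sketch under the assumption that the runtime behaves as the flag's description states; the class name <code>CanonEqDemo</code> and the helper <code>matches</code> are hypothetical.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: canonical equivalence between "a + combining ring" and the precomposed letter.
class CanonEqDemo {

    // Compiles the regex with the given flags and tests it against the whole input.
    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // letter a followed by a combining ring above
        String precomposed = "\u00E5";  // single precomposed character

        System.out.println(matches(decomposed, precomposed, Pattern.CANON_EQ));
        System.out.println(matches(decomposed, precomposed, 0));
    }
}
```

　　With <code>Pattern.CANON_EQ</code> the first call is expected to report a match, while the second call, which passes no flags, compares the characters literally and should not.<br/>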

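<div id="h4">Pattern.CASE_INSENSITIVE</div>
　　Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag together with this flag. Case-insensitive matching can also be enabled via the embedded flag expression <code>(?i)</code>. Specifying this flag may impose a slight performance penalty.<br/>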

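<div id="h4">Pattern.COMMENTS</div>
　　Permits whitespace and comments in the pattern. In this mode, whitespace is ignored, and embedded comments starting with <code>#</code> are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression <code>(?x)</code>.<br/>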
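　　Comments mode is mostly useful for documenting a longer pattern in place. A minimal sketch follows; the class name <code>CommentsDemo</code>, the constant <code>PHONE</code>, and the phone-number pattern are assumptions made up for the example.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: a pattern annotated with whitespace and # comments.
class CommentsDemo {

    // Each line holds one piece of the pattern and an explanatory comment.
    static final String PHONE =
          "\\d{3}   # three-digit prefix\n"
        + "-        # literal hyphen\n"
        + "\\d{4}   # four-digit line number";

    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(PHONE, "555-1234", Pattern.COMMENTS)); // true
        System.out.println(matches(PHONE, "555-1234", 0));                // false, spaces and # text are literal
        System.out.println(matches("(?x)" + PHONE, "555-1234", 0));       // true, same mode via (?x)
    }
}
```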

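<div id="h4">Pattern.DOTALL</div>
　　Enables dotall mode. In dotall mode, the expression <code>.</code> matches any character, including a line terminator. By default, this expression does not match line terminators. Dotall mode can also be enabled via the embedded flag expression <code>(?s)</code>. (The s is a mnemonic for “single-line” mode, which is what this mode is called in Perl.)<br/>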
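　　A brief sketch of the difference, using an input that contains a newline. The class name <code>DotallDemo</code> and the helper <code>matches</code> are hypothetical.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: whether "." is allowed to match a line terminator.
class DotallDemo {

    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a.b", "a\nb", 0));              // false, "." skips line terminators
        System.out.println(matches("a.b", "a\nb", Pattern.DOTALL)); // true
        System.out.println(matches("(?s)a.b", "a\nb", 0));          // true, same mode via (?s)
    }
}
```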

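<div id="h4">Pattern.LITERAL</div>
　　Enables literal parsing of the pattern. When this flag is specified, the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters and escape sequences in the input sequence are given no special meaning. The CASE_INSENSITIVE and UNICODE_CASE flags retain their effect on matching when used together with this flag. The other flags become superfluous. There is no embedded flag expression for enabling literal parsing.<br/>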
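　　The following sketch shows a metacharacter losing its meaning under this flag, and the case-insensitive flag still applying. The class name <code>LiteralDemo</code> and the helper <code>matches</code> are assumptions for illustration.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: treating the pattern text as plain characters.
class LiteralDemo {

    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a.c", "abc", 0));               // true, "." matches any character
        System.out.println(matches("a.c", "abc", Pattern.LITERAL)); // false, "." is now a plain dot
        System.out.println(matches("a.c", "a.c", Pattern.LITERAL)); // true

        // CASE_INSENSITIVE keeps its effect when combined with LITERAL.
        int flags = Pattern.LITERAL | Pattern.CASE_INSENSITIVE;
        System.out.println(matches("a.c", "A.C", flags));           // true
    }
}
```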

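<div id="h4">Pattern.MULTILINE</div>
　　Enables multiline mode. In multiline mode, the expressions <code>^</code> and <code>$</code> match just after or just before, respectively, a line terminator or the end of the input sequence. By default, these expressions match only at the beginning and the end of the entire input sequence. Multiline mode can also be enabled via the embedded flag expression <code>(?m)</code>.<br/>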
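　　A small sketch with a three-line input makes the difference visible. The class name <code>MultilineDemo</code> and the helper <code>findAll</code> are hypothetical.<br/>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: ^ and $ with and without multiline mode.
class MultilineDemo {

    // Collects every match as "text@start-end" using the given flags.
    static List<String> findAll(String regex, String input, int flags) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "cat\ndog\nbird";

        // Default: no single word spans the whole input, so nothing matches.
        System.out.println(findAll("^\\w+$", input, 0));                 // []
        // Multiline: each line is anchored separately.
        System.out.println(findAll("^\\w+$", input, Pattern.MULTILINE)); // [cat@0-3, dog@4-7, bird@8-12]
    }
}
```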

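<div id="h4">Pattern.UNICODE_CASE</div>
　　Enables Unicode-aware case folding. When this flag is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case folding can also be enabled via the embedded flag expression <code>(?u)</code>. Specifying this flag may impose a performance penalty.<br/>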
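　　The sketch below compares a lowercase and an uppercase accented letter, which lie outside US-ASCII. The class name <code>UnicodeCaseDemo</code>, the helper <code>matches</code>, and the choice of characters are assumptions; the characters are written as Unicode escapes to avoid source-encoding issues.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: case-insensitive matching beyond US-ASCII.
class UnicodeCaseDemo {

    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00E9"; // lowercase e with acute accent
        String upper = "\u00C9"; // uppercase E with acute accent

        // ASCII-only folding does not relate the two characters.
        System.out.println(matches(lower, upper, Pattern.CASE_INSENSITIVE));  // false
        // Adding UNICODE_CASE folds case per the Unicode Standard.
        System.out.println(matches(lower, upper,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));            // true
        // The embedded form combines both flags as (?iu).
        System.out.println(matches("(?iu)" + lower, upper, 0));               // true
    }
}
```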

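<div id="h4">Pattern.UNIX_LINES</div>
　　Enables Unix lines mode. In this mode, only the “\n” line terminator is recognized in the behavior of <code>.</code>, <code>^</code>, and <code>$</code>. Unix lines mode can also be enabled via the embedded flag expression <code>(?d)</code>.<br/>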
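　　One way to observe this mode is through <code>.</code>, which normally refuses to match a carriage return because it counts as a line terminator. This is a sketch; the class name <code>UnixLinesDemo</code> and the helper <code>matches</code> are hypothetical.<br/>

```java
import java.util.regex.Pattern;

// Hypothetical demo class: which characters count as line terminators for ".".
class UnixLinesDemo {

    static boolean matches(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a.b", "a\rb", 0));                  // false, "\r" is a line terminator by default
        System.out.println(matches("a.b", "a\rb", Pattern.UNIX_LINES)); // true, only "\n" is a terminator now
        System.out.println(matches("a.b", "a\nb", Pattern.UNIX_LINES)); // false, "\n" is still excluded
        System.out.println(matches("(?d)a.b", "a\rb", 0));              // true, same mode via (?d)
    }
}
```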
　　Next, we will modify the test harness <a href="src/RegexTestHarness.java">RegexTestHarness.java</a> to create a pattern with case-insensitive matching.<br/>
　　First, modify the code to invoke the alternate version of compile:<br/>

<pre name="java" id="java">// Compile the regex read from the console with case-insensitive matching enabled
Pattern pattern = Pattern.compile(
        console.readLine("%nEnter your regex: "),
        Pattern.CASE_INSENSITIVE
    );</pre>

　　Compile and run the test harness to get the following results:<br/>
 
<pre id="console">Enter your regex: dog
Enter input string to search: DoGDOg
I found the text "DoG" starting at index 0 and ending at index 3.
I found the text "DOg" starting at index 3 and ending at index 6.</pre>

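　　As you can see, the string literal “dog” matches both occurrences, regardless of case. To compile a pattern with multiple flags, separate the flags with the bitwise OR operator “|”. For clarity, the following code samples hardcode the regular expression instead of reading it from the console:<br/>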
 
<pre name="java" id="java">pattern = Pattern.compile("[az]$", Pattern.MULTILINE | Pattern.UNIX_LINES);</pre>

　　You could also specify an int variable instead:<br/>

<pre name="java" id="java">final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
Pattern pattern = Pattern.compile("aa", flags);</pre>
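　　To see what a combined bit mask does in practice, the sketch below runs the <code>[az]$</code> pattern from above with and without <code>Pattern.MULTILINE | Pattern.UNIX_LINES</code>. The class name <code>CombinedFlags</code>, the helper <code>findAll</code>, and the sample input are assumptions made for illustration.<br/>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: combining flags with the bitwise OR operator.
class CombinedFlags {

    // Collects every match as "text@start-end" using the given flags.
    static List<String> findAll(String regex, String input, int flags) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        final int flags = Pattern.MULTILINE | Pattern.UNIX_LINES;
        String input = "a\nb\nz";

        System.out.println(findAll("[az]$", input, 0));     // [z@4-5], $ only at the end of the input
        System.out.println(findAll("[az]$", input, flags)); // [a@0-1, z@4-5], $ before each "\n" too

        // The flags a pattern was compiled with can be read back as a bit mask.
        Pattern pattern = Pattern.compile("[az]$", flags);
        System.out.println((pattern.flags() & Pattern.MULTILINE) != 0); // true
    }
}
```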

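<div id="h3"><a name="reg8_2"></a>8.2　Embedded Flag Expressions<span class="returnContents"><a href="#contents">Back to contents</a></span></div>
　　Flags can also be enabled with <em>embedded flag expressions</em>. Embedded flag expressions are an alternative to the two-argument version of compile, because they are specified within the regular expression itself. The following example uses the original test harness (<a href="src/RegexTestHarness.java">RegexTestHarness.java</a>) with the embedded flag expression <code>(?i)</code> to enable case-insensitive matching.<br/>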
<pre id="console">Enter your regex: (?i)foo
Enter input string to search: FOOfooFoOfoO
I found the text "FOO" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "FoO" starting at index 6 and ending at index 9.
I found the text "foO" starting at index 9 and ending at index 12.</pre>
　　Once again, all matches succeed regardless of case.<br/>
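　　In code, the embedded form and the two-argument form can be compared side by side. The sketch below also places <code>(?i)</code> in the middle of a pattern, where it takes effect only from that point onward. The class name <code>EmbeddedFlags</code> and the helper <code>countMatches</code> are hypothetical.<br/>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: (?i) inside the regex versus Pattern.CASE_INSENSITIVE.
class EmbeddedFlags {

    // Counts how many times find() succeeds for the given pattern.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "FOOfooFoOfoO";

        System.out.println(countMatches(Pattern.compile("(?i)foo"), input));                      // 4
        System.out.println(countMatches(Pattern.compile("foo", Pattern.CASE_INSENSITIVE), input)); // 4

        // An embedded flag applies from the point where it appears.
        System.out.println(Pattern.matches("f(?i)oo", "fOO")); // true
        System.out.println(Pattern.matches("f(?i)oo", "FOO")); // false, the leading "f" is still case-sensitive
    }
}
```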
　　The embedded flag expressions that correspond to Pattern's publicly accessible fields are presented in the following table:<br/>

<table border="0" c