<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<title>How I came to write this regular expression parsing library</title>
<style>
<!--
/* Font Definitions */
@font-face
{font-family:宋體;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"\@宋體";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-justify:inter-ideograph;
font-size:10.5pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:purple;
text-decoration:underline;}
/* Page Definitions */
@page Section1
{size:595.3pt 841.9pt;
margin:72.0pt 10.3pt 72.0pt 18.0pt;
layout-grid:15.6pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=ZH-CN link=blue vlink=purple style='text-justify-trim:punctuation'>
<div class=Section1 style='layout-grid:15.6pt'>
<p class=MsoNormal><span lang=EN-US><a href="#初衷"><span style='font-family:
宋體'>Original motivation</span></a></span></p>
<p class=MsoNormal><span lang=EN-US><a href="#我想說的"><span style='font-family:
宋體'>What I want to say</span></a></span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><a name=初衷><span style='font-family:宋體'>Hello everyone!</span></a></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>The regular expression libraries I know of are Boost's, GNU's, the one in ATL that ships with VC7, and Microsoft's GRETA. I have used the last three, GRETA for the shortest time (only two days).</span></p>
<p class=MsoNormal><span lang=EN-US>Here are my impressions:</span></p>
<p class=MsoNormal><span lang=EN-US>GNU's regular expression library does not support multibyte encodings at all. It does not even support UNICODE, and it crashes with an illegal operation during the parse stage. In an age of software globalization, that is a real weakness. Its strength is that its syntax support is complete.</span></p>
<p class=MsoNormal><span lang=EN-US>ATL's regular expression class does not fully support multibyte encodings, but it supports UNICODE well. Its code is very clearly written and uses nothing esoteric from the STL or from templates. (I think this is the right direction for C++. After all, clever people are a minority and most of us are ordinary programmers; something too highbrow will eventually be abandoned by most programmers, leaving a handful of experts admiring their own reflections.) As a result, full multibyte support can be added with a very small and easy change. Its drawbacks are that it does not support the {n,m} syntax and does not support recursive (nested) repetition such as "([^\\"]*(\\.)*[^\\"]*)*". The final *, which repeats a group that itself contains repetitions, is not supported.</span></p>
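<p class=MsoNormal><span lang=EN-US>Some engines accept ? but not {n,m}. For those, a bounded repeat can be rewritten by hand, because x{n,m} matches the same strings as n copies of x followed by (m - n) copies of x?. The sketch below uses a hypothetical helper, expand_bounded_repeat, which is not part of ATL or any library. It assumes the target engine supports ? and that the atom passed in is already a single regex atom.</span></p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an ATL API): rewrites atom{n,m} as an equivalent
// pattern built only from concatenation and '?', for an engine that is
// assumed to support '?' but not {n,m}.
// 'atom' must already be one regex atom, e.g. "a", "[0-9]" or "(ab)";
// wrap longer sub-patterns in parentheses before calling.
inline std::string expand_bounded_repeat(const std::string& atom,
                                         unsigned n, unsigned m) {
    assert(n <= m);  // {n,m} requires n <= m
    std::string out;
    for (unsigned i = 0; i < n; ++i) out += atom;        // n mandatory copies
    for (unsigned i = n; i < m; ++i) out += atom + "?";  // m - n optional copies
    return out;
}
```

<p class=MsoNormal><span lang=EN-US>For example, expand_bounded_repeat("[0-9]", 1, 3) yields "[0-9][0-9]?[0-9]?". An open-ended x{n,} can be written the same way, as n copies of x followed by x*.</span></p>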
<p class=MsoNormal><span lang=EN-US>GRETA fully supports single-byte and UNICODE text, its syntax support is complete, and it is reportedly fast in most cases. However, part of the implementation lives in a .cpp file. Because of that, you cannot use single-byte and UNICODE encodings in the same program, nor POSIX and Perl syntax together. The workaround is fairly simple: rename the .cpp file to .inl, #include that .inl from the .h file, and make a few other small changes. The real problem is that GRETA has no multibyte implementation. After looking at it carefully, I think this could be solved by writing my own multibyte iterator, since GRETA supports basic_string.</span></p>
<p class=MsoNormal><span lang=EN-US>The next question is how the STL supports multibyte encodings. I could not find anything about multibyte encodings in SGI-STL or STLport 4.5.3. By default, basic_string is only provided for char and wchar_t (std::string and std::wstring). I also don't know where to start if I have to write an iterator myself.</span></p>
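<p class=MsoNormal><span lang=EN-US>As a possible starting point for the iterator question, here is a minimal sketch of a forward iterator that walks a std::string one whole character at a time. It assumes the GBK (code page 936) rule that a byte from 0x81 to 0xFE starts a two-byte character. The names GbkIterator, gbk_begin and gbk_end are hypothetical and not part of any STL.</span></p>
<p class=MsoNormal><span lang=EN-US>The iterator is forward-only. GBK trail bytes can fall in the ASCII range, so you cannot step backwards by looking at the previous byte alone. As a result, engines that require bidirectional iterators, such as std::regex, cannot use it directly.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical forward iterator over GBK-encoded bytes (an assumption: a byte
// in 0x81..0xFE is a lead byte whose next byte is its trail byte). Each
// element is one whole character: a single byte's value, or (lead << 8) | trail.
class GbkIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = unsigned int;
    using difference_type = std::ptrdiff_t;
    using pointer = const unsigned int*;
    using reference = const unsigned int&;

    GbkIterator() = default;
    GbkIterator(const char* p, const char* end) : p_(p), end_(end) { decode(); }

    reference operator*() const { return value_; }
    pointer operator->() const { return &value_; }

    GbkIterator& operator++() {
        assert(p_ != end_);  // incrementing the end iterator is not allowed
        p_ += width_;        // skip the whole character, never half of it
        decode();
        return *this;
    }
    GbkIterator operator++(int) { GbkIterator tmp = *this; ++*this; return tmp; }

    friend bool operator==(const GbkIterator& a, const GbkIterator& b) { return a.p_ == b.p_; }
    friend bool operator!=(const GbkIterator& a, const GbkIterator& b) { return a.p_ != b.p_; }

    // Byte position of the current character, for mapping results back.
    const char* base() const { return p_; }

private:
    static bool is_lead(unsigned char c) { return c >= 0x81 && c <= 0xFE; }

    // Decode the character at p_ and remember how many bytes it occupies.
    void decode() {
        if (p_ == end_) { value_ = 0; width_ = 0; return; }
        unsigned char c = static_cast<unsigned char>(*p_);
        if (is_lead(c) && p_ + 1 != end_) {
            value_ = (static_cast<unsigned int>(c) << 8) |
                     static_cast<unsigned char>(p_[1]);
            width_ = 2;
        } else {
            value_ = c;  // ASCII, or a truncated lead byte at the very end
            width_ = 1;
        }
    }

    const char* p_ = nullptr;
    const char* end_ = nullptr;
    unsigned int value_ = 0;
    std::size_t width_ = 0;
};

inline GbkIterator gbk_begin(const std::string& s) {
    return GbkIterator(s.data(), s.data() + s.size());
}
inline GbkIterator gbk_end(const std::string& s) {
    const char* e = s.data() + s.size();
    return GbkIterator(e, e);
}
```

<p class=MsoNormal><span lang=EN-US>Each element is a whole character. Running std::search over these iterators can therefore only report matches that start on a character boundary, and base() gives the matching byte position.</span></p>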
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>My current requirement is as follows:</span></p>
<p class=MsoNormal><span lang=EN-US>I need the regular expression engine to support a pattern like "/漢字[　 ]+[^　， ,]+[　 ]*[，,][　 ]*[^　， ,]+", so that it matches "/漢字 蘭征鵬 ,正則表達式". Each bracket expression lists both the full-width (double-byte) space or comma and its ASCII counterpart. (漢字 means "Chinese characters", 蘭征鵬 is my name, and 正則表達式 means "regular expression".)</span></p>
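<p class=MsoNormal><span lang=EN-US>One way to meet this requirement is to convert the text to wide characters first and then run the pattern through std::wregex, which arrived later in C++11. After conversion, every Chinese character is a single element and can never be split. The sketch makes three assumptions:</span></p>
<p class=MsoNormal><span lang=EN-US>1. The conversion to std::wstring has already been done, for example with MultiByteToWideChar on Windows (not shown).</span></p>
<p class=MsoNormal><span lang=EN-US>2. The characters are written as \u escapes, so the source file's encoding does not matter.</span></p>
<p class=MsoNormal><span lang=EN-US>3. The function name matches_requirement is hypothetical.</span></p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: checks the requirement above on text that has already
// been converted to wchar_t (assumed, not shown). U+3000 is the full-width
// space and U+FF0C the full-width comma; each class also lists the ASCII form.
// U+6F22 U+5B57 spell 漢字.
inline bool matches_requirement(const std::wstring& text) {
    static const std::wregex pattern(
        L"/\u6F22\u5B57[\u3000 ]+[^\u3000\uFF0C ,]+"
        L"[\u3000 ]*[\uFF0C,][\u3000 ]*[^\u3000\uFF0C ,]+");
    return std::regex_search(text, pattern);
}
```

<p class=MsoNormal><span lang=EN-US>Because the engine sees code units rather than bytes, the negated classes such as [^　， ,] consume whole characters. No half of a double-byte character is ever compared against a space or a comma.</span></p>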
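<p class=MsoNormal><span lang=EN-US>Plain string searching with the STL has problems too. For example, searching an article for "正則" (4 bytes in a double-byte encoding) may well match the middle four bytes of three consecutive Chinese characters. Those are the trailing byte of the first, all of the second, and the leading byte of the third. It is enough to make you laugh and cry at the same time.</span></p>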
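<p class=MsoNormal><span lang=EN-US>The false match is easy to reproduce with raw bytes. The sketch below assumes a GBK-style double-byte encoding, in which a byte from 0x81 to 0xFE starts a two-byte character. It uses a hypothetical helper, gbk_find, which advances one whole character at a time, so a match can only start on a character boundary.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: GBK-style encoding, where a byte in 0x81..0xFE starts a
// two-byte character and any other byte is a one-byte character.
inline bool gbk_is_lead(unsigned char c) { return c >= 0x81 && c <= 0xFE; }

// Hypothetical helper: like std::string::find, but only tries positions that
// begin a character, so a needle cannot match the tail byte of one character
// followed by the head byte of the next.
inline std::size_t gbk_find(const std::string& hay, const std::string& needle) {
    std::size_t i = 0;
    while (i + needle.size() <= hay.size()) {
        if (hay.compare(i, needle.size(), needle) == 0) return i;
        // Step over one whole character, never into the middle of one.
        bool two_bytes = gbk_is_lead(static_cast<unsigned char>(hay[i])) &&
                         i + 1 < hay.size();
        i += two_bytes ? 2 : 1;
    }
    return std::string::npos;
}
```

<p class=MsoNormal><span lang=EN-US>Take a haystack of three double-byte characters, C0 B0, A1 B0 and A1 C0, and the four-byte needle B0 A1 B0 A1. std::string::find reports a match at byte 1, straddling all three characters. gbk_find reports no match.</span></p>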
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>If you have experience in this area or know the STL well, please don't hesitate to give me some pointers.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋體'>Best regards,</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋體'> </span><span
lang=EN-US>lanzhengpeng</span></p>
<p class=MsoNormal><span style='font-family:宋體'> </span><span
lang=EN-US>2004-06-02</span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>In C/C++, if you want a Perl-compatible regexp library, one choice is Boost and another is the PCRE library. The regex algorithm in Boost was recently improved and is now on average about 10 times faster than earlier versions, but it can be somewhat cumbersome to use. PCRE is already very mature; Apache, Postfix, PHP and Python all use it, and I think it should be your first choice. However, I have never compiled it on Windows myself, so I'm not entirely sure about that.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>See www.pcre.org</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Personally, I like Ruby's regular expressions very much: they are powerful and quite fast. Because Ruby was created by a Japanese developer, it has no trouble at all with the large East Asian character sets. Interfacing Ruby with C/C++ is easy, but bringing in Ruby just for this small feature seems like overkill. I'm not familiar with Perl. Lua has invented its own pattern-matching syntax, is designed from the ground up to be embedded in C/C++, and is much faster than Perl, Ruby or Python. Lua's pattern syntax is a bit odd. It seems sufficient for lanzhengpeng's problem, but it is completely different from standard regex syntax.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>My own feeling is that you would be better off settling down and writing an iterator, which should be easy. But I haven't done this kind of thing in a long time, so these are just general remarks.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋體'>孟巖 (Meng Yan)</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>From: kyo</span></p>
<p class=MsoNormal><span lang=EN-US>Sent: June 2, 2004, 11:19</span></p>
<p class=MsoNormal><span lang=EN-US>To: 'C++ Discuss Group'</span></p>
<p class=MsoNormal><span lang=EN-US>Subject: RE: [cpp] Regular expressions and the multibyte encoding problem</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>