<p class=MsoNormal><span lang=EN-US>Brother Lan, I know your experience well. I too once searched hard for a good regular-expression library for C++, but none of the ones I tried left me satisfied.</span></p>
<p class=MsoNormal><span lang=EN-US>Perl is widely acknowledged to have the best regular-expression support (many libraries now claim to support Perl-style regular expressions); lazy matching, for example, is very useful.</span></p>
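<p class=MsoNormal><span lang=EN-US>To make the lazy-matching point concrete, here is a minimal sketch. It assumes std::regex from a later C++ standard (C++11), which did not exist when this thread was written and is only a stand-in for whichever Perl-style library you choose; first_match is a hypothetical helper name.</span></p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the first match of `pattern` in `text`,
// or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    // The default std::regex grammar (ECMAScript) supports lazy quantifiers.
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}

// Greedy:  first_match(s, "<b>.*</b>")   lets .* take as much as it can.
// Lazy:    first_match(s, "<b>.*?</b>")  lets .*? take as little as it can.
```

<p class=MsoNormal><span lang=EN-US>On the input &lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt; the greedy pattern should match the whole string, while the lazy one should stop at the first closing tag.</span></p>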
<p class=MsoNormal><span lang=EN-US>If you are not required to do this in pure C++, you could embed the Python interpreter and use re, Python's regular-expression module.</span></p>
<p class=MsoNormal><span lang=EN-US>For your requirements you would need Python 2.4 or later, because Unicode support for Chinese only arrives in 2.4 (2.4 has not been released yet).</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>We recently ran into the same Chinese-character search problem in C++ on 大話西游, because messages in the trade channel are required to contain the character “賣” (sell). To test this precisely, you first have to convert the char* or string with MultiByteToWideChar into a WCHAR buffer or a wstring, and only then search.</span></p>
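<p class=MsoNormal><span lang=EN-US>A small sketch of why a plain byte search is not precise on multi-byte text. It assumes a GBK-style double-byte encoding (lead bytes 0x81 to 0xFE, every character one or two bytes); on_char_boundary and find_whole_char are hypothetical helper names. Note that this is a swapped-in technique that checks character boundaries on the raw bytes, not the MultiByteToWideChar conversion described above.</span></p>

```cpp
#include <cstddef>
#include <string>

// Assumption: GBK-style double-byte text, where a byte in 0x81..0xFE starts
// a two-byte character and any other byte is a one-byte character.
inline bool is_lead_byte(unsigned char b) { return b >= 0x81 && b <= 0xFE; }

// True if `offset` is the first byte of a character in `text`.
// Determined by walking from the start, one whole character at a time.
bool on_char_boundary(const std::string& text, std::size_t offset) {
    std::size_t i = 0;
    while (i < offset && i < text.size()) {
        i += is_lead_byte(static_cast<unsigned char>(text[i])) ? 2 : 1;
    }
    return i == offset;
}

// Plain byte search that rejects hits starting in the middle of a character.
// Returns the byte offset of the match, or std::string::npos.
std::size_t find_whole_char(const std::string& text, const std::string& needle) {
    std::size_t pos = text.find(needle);
    while (pos != std::string::npos && !on_char_boundary(text, pos)) {
        pos = text.find(needle, pos + 1);
    }
    return pos;
}
```

<p class=MsoNormal><span lang=EN-US>For example, with the bytes B0 C2 F4 A1 (two double-byte characters) and the needle C2 F4, std::string::find reports a hit at offset 1, which straddles the two characters; find_whole_char rejects it.</span></p>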
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Hope this helps.</span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Hello 孟巖,</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>======= 2004-06-02 13:22:29 You wrote: =======</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>>See www.pcre.org</span></p>
<p class=MsoNormal><span lang=EN-US>Many thanks!</span></p>
<p class=MsoNormal><span lang=EN-US>> My personal feeling is that you might as well sit down and write an iterator; it should be quite easy. But I have not done this kind of thing in a long time, so take it as a general remark only.</span></p>
<p class=MsoNormal><span lang=EN-US>I had a go at it. It seems that writing just an iterator is not enough; I would have to write a replacement for basic_string as well. There is also a serious problem: ++ and -- are not symmetric. On ++ I can easily tell whether the current byte starts a multi-byte character, and so advance the address by 1 or by 2. But -- is much harder (consider supporting something like Big5, where the trail byte of a Chinese character overlaps the ASCII range). If I simply subtract 1 from the address, will that go wrong? And is this iterator still a random-access iterator?</span></p>
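<p class=MsoNormal><span lang=EN-US>The asymmetry can be sketched as follows. This is only an illustration under assumptions: a double-byte encoding whose lead bytes are 0x81 to 0xFE, and the hypothetical names next_char and prev_char working on byte offsets rather than a real iterator class. Stepping forward inspects one byte; stepping back in this sketch rescans from the start of the string, because a trail byte such as 0x40 cannot be told apart from ASCII by looking at it alone.</span></p>

```cpp
#include <cstddef>
#include <string>

// Assumption: a double-byte encoding whose lead bytes lie in 0x81..0xFE.
// Trail bytes may fall in the ASCII range (as in Big5), so a byte cannot be
// classified without knowing what precedes it.
inline bool is_lead_byte(unsigned char b) { return b >= 0x81 && b <= 0xFE; }

// ++ : look at the current byte and advance by 1 or 2. O(1).
std::size_t next_char(const std::string& s, std::size_t pos) {
    return pos + (is_lead_byte(static_cast<unsigned char>(s[pos])) ? 2 : 1);
}

// -- : rescan from the beginning and remember the last character boundary
// seen before `pos`. O(n) in this sketch. Returns 0 when pos is 0.
std::size_t prev_char(const std::string& s, std::size_t pos) {
    std::size_t prev = 0;
    for (std::size_t i = 0; i < pos && i < s.size(); i = next_char(s, i)) {
        prev = i;
    }
    return prev;
}
```

<p class=MsoNormal><span lang=EN-US>With the bytes A4 40 42 (one double-byte character followed by B), prev_char from offset 2 returns 0, whereas a plain address-1 would land on the trail byte 0x40. Since the character count between two positions cannot be computed without walking the bytes, an iterator built this way would be bidirectional at best, not random-access.</span></p>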
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>= = = = = = = = = = = = = = = = = = = =</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>With best regards,</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span style='font-family:宋體'> </span><span
lang=EN-US>lanzhengpeng</span></p>
<p class=MsoNormal><span style='font-family:宋體'> </span><span
lang=EN-US>2004-06-02</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US>Sent: June 2, 2004 15:53</span></p>
<p class=MsoNormal><span lang=EN-US>To: C++ Discuss Group</span></p>
<p class=MsoNormal><span lang=EN-US>Subject: Re[2]: Re: [cpp] Regular expressions and multi-byte encodings</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Hello lanzhengpeng,</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Wednesday, June 2, 2004</span><span
lang=EN-US>, </span><span lang=EN-US>3:38:11 PM</span><span lang=EN-US>, you
wrote:</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Better to convert it after all, to wstring.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>This reminds me of another problem, one I worked on a while ago.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>English has stricmp; shouldn't Chinese have a fuzzy search as well? For example, one that ignores the difference between homophones.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Sometimes you need not ignore all homophones: high-frequency characters are generally not confused even when they sound alike, whereas less common characters are easily replaced by a wrong homophone.</span></p>
<p class=MsoNormal><span lang=EN-US>Characters with more than one reading (polyphones) also make this kind of fuzzy matching more complicated.</span></p>
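<p class=MsoNormal><span lang=EN-US>One way such a homophone-insensitive comparison might look, as a rough sketch only. The five-entry table, the toned-pinyin spelling (mai4, chang2, ...) and the names PinyinTable, demo_table, same_sound and fuzzy_equal are all made up for illustration; a real table would cover the whole character set. Polyphones are handled here by treating two characters as matching when they share at least one reading.</span></p>

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature table: code point -> toned pinyin readings.
using PinyinTable = std::map<char32_t, std::vector<std::string>>;

const PinyinTable& demo_table() {
    static const PinyinTable table = {
        {U'\u5356', {"mai4"}},              // 卖
        {U'\u9EA6', {"mai4"}},              // 麦
        {U'\u957F', {"chang2", "zhang3"}},  // 长, a polyphone
        {U'\u5E38', {"chang2"}},            // 常
        {U'\u5F20', {"zhang1"}},            // 张
    };
    return table;
}

// Two characters match if they are identical or share at least one reading.
bool same_sound(char32_t a, char32_t b, const PinyinTable& table) {
    if (a == b) return true;
    const auto ia = table.find(a);
    const auto ib = table.find(b);
    if (ia == table.end() || ib == table.end()) return false;
    for (const std::string& reading : ia->second) {
        if (std::find(ib->second.begin(), ib->second.end(), reading) != ib->second.end()) {
            return true;
        }
    }
    return false;
}

// Character-by-character comparison, in the spirit of stricmp.
bool fuzzy_equal(const std::u32string& a, const std::u32string& b, const PinyinTable& table) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (!same_sound(a[i], b[i], table)) return false;
    }
    return true;
}
```

<p class=MsoNormal><span lang=EN-US>To leave high-frequency characters out of the fuzzy matching, it would be enough to omit them from the table, since characters without an entry only match themselves.</span></p>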
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>I once spent an afternoon compiling data: a list of the Hanyu Pinyin (with tones) for most of the Chinese characters in the GBK character set, including characters with multiple readings.</span></p>
<p class=MsoNormal><span lang=EN-US>I also have a table of the 1,000-odd most common characters, ordered by frequency of use.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Is anyone interested? :)</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>Best regards,</span></p>
<p class=MsoNormal><span lang=EN-US> Cloudwu</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>[One person's judgment sees only so much, like a single glance of the eye; how could it take in the full nature of all things?]</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>_______________________________________________</span></p>
<p class=MsoNormal><span lang=EN-US>Cpp mailing list</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>> ======= 2004-06-02 11:18:58 You wrote: =======</span></p>
<p class=MsoNormal><span lang=EN-US>> </span></p>
<p class=MsoNormal><span lang=EN-US>> > We recently ran into the same Chinese-character search problem in C++ on 大話西游, because messages in the trade channel are required to contain the character “賣” (sell). To test this precisely, you first have to convert the char* or string with MultiByteToWideChar into a WCHAR buffer or a wstring, and only then search.</span></p>
<p class=MsoNormal><span lang=EN-US>> That can only tell whether the character is present or not; what I actually need is its exact position.</span></p>
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>But you can find the exact position that way.</span></p>
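<p class=MsoNormal><span lang=EN-US>A minimal sketch of what I mean, assuming the text has already been converted to a wstring (the conversion itself is left out here). find_char and byte_offset are hypothetical names, and byte_offset assumes a GBK-style original in which ASCII takes one byte and every other character takes two.</span></p>

```cpp
#include <cstddef>
#include <string>

// Character index of the first occurrence of `ch`, or std::wstring::npos.
std::size_t find_char(const std::wstring& text, wchar_t ch) {
    return text.find(ch);
}

// Map a character index in the wide string back to a byte offset in the
// original multi-byte text.
// Assumption: GBK-style source, 1 byte for ASCII and 2 bytes otherwise.
std::size_t byte_offset(const std::wstring& text, std::size_t char_index) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < char_index && i < text.size(); ++i) {
        bytes += (text[i] < 0x80) ? 1 : 2;
    }
    return bytes;
}
```

<p class=MsoNormal><span lang=EN-US>The index returned by find_char counts characters, so it is exact by construction; byte_offset is only needed if the position has to be reported against the original char* buffer.</span></p>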
<p class=MsoNormal><span lang=EN-US> </span></p>
<p class=MsoNormal><span lang=EN-US>> As for embedding something else: I don't think it is necessary. Those scripting languages are ultimately implemented in C/C++ themselves, and may well use the very libraries we already know. Besides, regular expressions are so useful that I use them everywhere, in programs large and small. Embedding a scripting engine in all of those programs just for this is not something I want to do.</span></p>
<p class=MsoNormal><span lang=EN-US>> </span></p>