Ruby's documentation states explicitly:

    \w   [A-Za-z0-9_]    Word character (+ Connector_Punctuation, Letter, Mark, and Number)
    \W   [^A-Za-z0-9_]   Any character except a word character
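A quick way to see this in practice: split a mixed string with \w and \W and compare what each one picks up. This is a minimal sketch that assumes Ruby 1.9 or later with UTF-8 source and strings; the sample string is made up for illustration.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and a UTF-8 string (illustrative sample only).
# \w covers only ASCII [A-Za-z0-9_], so CJK characters land in \W.
sample = '1f測o試2o3'

ascii_words = sample.scan(/\w+/) # runs of ASCII word characters
non_words   = sample.scan(/\W/)  # every character that \w rejects

puts ascii_words.join(' | ') # => 1f | o | 2o3
puts non_words.join(' | ')   # => 測 | 試
```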
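So \w is limited to ASCII characters. For questions about Ruby, Ruby's own official documentation is the authority. Also, Ruby 1.9 switched its regular-expression engine to Oniguruma, so differences from 1.8 are to be expected. Oniguruma is far more capable than the old engine; for example, it supports Unicode property classes, so to match Unicode word characters you can use \p{Word}:

    puts ' 1f測o試2o3 '[/\p{Word}+/] # => 1f測o試2o3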
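To make the difference concrete, here is a small side-by-side of \w and \p{Word} on the same text. It is a sketch assuming Ruby 1.9 or later and a UTF-8 string; the sample text is invented for the example.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and UTF-8 text (illustrative sample only).
text = '你好, world! 123'

ascii_only = text.scan(/\w+/)       # \w: ASCII word characters only
unicode    = text.scan(/\p{Word}+/) # \p{Word}: Unicode word characters

puts ascii_only.join(' | ') # => world | 123
puts unicode.join(' | ')    # => 你好 | world | 123
```

The Chinese word only shows up in the \p{Word} result, which is exactly the gap the property class fills.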
You can also match Chinese characters (Han) only:

    puts ' 測試 '[/\p{Han}+/] # => 測試
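Beyond grabbing the first match, \p{Han} also works with `scan` and anchors. The sketch below extracts every Han run, counts Han characters, and checks whether a whole string is Han. It assumes Ruby 1.9 or later and UTF-8 strings; `all_han?` is a hypothetical helper written for this example, not a built-in.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and UTF-8 strings (illustrative sample only).
line = 'Ruby 1.9 換成了 Oniguruma 引擎'

# Every run of consecutive Han characters.
han_runs = line.scan(/\p{Han}+/)
puts han_runs.join(' | ')      # => 換成了 | 引擎

# Number of individual Han characters.
han_count = line.scan(/\p{Han}/).size
puts han_count                 # => 5

# Hypothetical helper: true only if the entire string is Han characters.
# Uses =~ instead of String#match? so it also runs on Ruby 1.9.
def all_han?(str)
  !!(str =~ /\A\p{Han}+\z/)
end

puts all_han?('測試')    # => true
puts all_han?('測試abc') # => false
```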
\p{Word} works with every encoding, but \p{Han} is only available for UTF-* encodings.
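If your text arrives in a non-UTF encoding, one workable approach is to transcode it to UTF-8 with `String#encode` before applying \p{Han}. This is a sketch under stated assumptions: Ruby 1.9 or later, and a Shift_JIS input that is simulated here by encoding a literal; it is one way to handle the restriction, not the only one.

```ruby
# encoding: utf-8
# Assumption: input text arrives in Shift_JIS (simulated here for the example).
sjis = '日本語テキスト'.encode('Shift_JIS')

# \p{Han} is documented above as UTF-* only, so convert to UTF-8 first,
# then run the Unicode property match on the converted string.
utf8 = sjis.encode('UTF-8')
han  = utf8.scan(/\p{Han}+/)

puts han.join(' | ') # => 日本語
```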
Oniguruma's complete table of Unicode character classes:

    \p{name}    Matches a character with the named property
    \p{^name}   Matches any character except those with the named property
    \P{name}    Matches any character except those with the named property
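The two negated forms mean the same thing, which is easy to confirm on a mixed string. A minimal sketch, assuming Ruby 1.9 or later and UTF-8 text; the sample is made up.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and UTF-8 text (illustrative sample only).
mixed = '漢字abc123かな'

caret_form = mixed.scan(/\p{^Han}+/) # \p{^name}: anything that is not Han
upper_form = mixed.scan(/\P{Han}+/)  # \P{name}: the same negation

puts caret_form.join(' | ')     # => abc123かな
puts caret_form == upper_form   # => true
```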
Property names:
- All encodings: Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower, Print, Punct, Space, Upper, XDigit, Word, ASCII
- EUC and SJIS: Hiragana, Katakana
- UTF-n: Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu, M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps, S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs, Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian, Oriya, Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai
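The UTF-n names mix script properties (Han, Hiragana, Katakana, ...) with general categories (Lu, Ll, Nd, ...). The sketch below uses a few of each on one string. It assumes Ruby 1.9 or later and UTF-8 text, and the sample string is invented for illustration.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ and UTF-8 text (illustrative sample only).
sample = 'カタカナとひらがな ABC def 123'

katakana  = sample.scan(/\p{Katakana}+/) # script: Katakana
hiragana  = sample.scan(/\p{Hiragana}+/) # script: Hiragana
uppercase = sample.scan(/\p{Lu}/)        # general category: uppercase letter
lowercase = sample.scan(/\p{Ll}+/)       # general category: lowercase letter
digits    = sample.scan(/\p{Nd}+/)       # general category: decimal digit

puts katakana.join(' | ')  # => カタカナ
puts hiragana.join(' | ')  # => とひらがな
puts uppercase.join(' | ') # => A | B | C
puts lowercase.join(' | ') # => def
puts digits.join(' | ')    # => 123
```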