Introduction to RE2

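RE2 is an efficient, principled regular expression library implemented in C++ by Rob Pike and Russ Cox, two veteran Google engineers who are also leading figures behind the Go language. Go's regexp package is likewise a Go implementation of RE2.
RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines such as those used in PCRE, Perl, and Python. At the time of writing, RE2 supports Linux and most Unix platforms but not Windows (you can hack on it yourself if needed).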
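Features of RE2
Backtracking engines are typically full of features and convenient syntactic sugar, but even small inputs can force them into exponential running time. RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input. RE2 implements memory limits, so searches can be constrained to a fixed amount of memory. RE2 is engineered to use a small, fixed C++ stack footprint no matter what inputs or regular expressions it must process, which makes it useful in multithreaded environments where thread stacks cannot grow arbitrarily large.
On large inputs, RE2 is often much faster than backtracking engines: automata theory lets it apply optimizations that the other engines cannot.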
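Unlike most automata-based engines, RE2 implements almost all of the common Perl and PCRE features and syntactic sugar. It finds the leftmost-first match, the same match that Perl would find, and can return submatch information. The most notable exception is that RE2 drops support for backreferences¹ and generalized zero-width assertions, because they cannot be implemented efficiently.
For users who prefer a simpler syntax, RE2 also has a POSIX mode that accepts only the POSIX egrep operators and implements leftmost-longest overall matching.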

¹ Technical note: there's a difference between submatches and backreferences. Submatches let you find out what certain subexpressions matched after the match is over, so that you can find out, after matching dogcat against (cat|dog)(cat|dog), that \1 is dog and \2 is cat. Backreferences let you use those subexpressions during the match, so that (cat|dog)\1 matches catcat and dogdog but not catdog or dogcat.
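RE2 supports submatch extraction but not backreferences.
If you absolutely need backreferences and generalized assertions, which RE2 does not support, take a look at irregexp, the regular expression engine of Google Chrome.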
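Working with RE2
Installation
You can download a release tarball, unpack it, and install from there. Here is another way to install it.
It requires Mercurial SCM and a C++ compiler (g++ or a clone).
Fetch the code and install it: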
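hg clone http://re2.googlecode.com/hg re2
cd re2
make test
sudo make install
make testinstall
On BSD systems, use gmake instead of make.
Using the RE2 library
To develop a C++ application with RE2, include the re2/re2.h header in your code and link with -lre2, plus -lpthread when it is used in a multithreaded environment.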
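Syntax
In POSIX mode, RE2 accepts standard POSIX (egrep) syntax regular expressions. In Perl mode, RE2 accepts most Perl operators; the only exceptions are those that require backtracking (and potentially exponential run time) to implement. These include backreferences (submatching is still supported) and generalized assertions. RE2 defaults to Perl mode.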
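C++ high-level interface
There are two basic operations:
- RE2::FullMatch: requires the regexp to match the entire input text.
- RE2::PartialMatch: looks for a match of a substring of the input text, returning the leftmost-longest match in POSIX mode and the same match that Perl would have chosen in Perl mode.
For example: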
vi re2_high_interface_test.cc
#include <re2/re2.h>
#include <iostream>
#include <assert.h>

int
main(void)
{
    assert(RE2::FullMatch("hello", "h.*o"));
    assert(!RE2::FullMatch("hello", "e"));
    assert(RE2::PartialMatch("hello", "h.*o"));
    assert(RE2::PartialMatch("hello", "e"));
    std::cout << "Ok" << std::endl;
    return 0;
}
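Compile the program:
g++ -o re2_high_interface_test re2_high_interface_test.cc -lre2
Run re2_high_interface_test; it runs normally and prints Ok.
Submatch extraction
Both matching functions accept additional arguments in which submatches are stored. Each argument can be a string, an integer type, or a StringPiece. A StringPiece is a pointer into the original input text plus a length count. It behaves like a string but does not carry its own storage. As with a pointer, you must be careful when using a StringPiece: do not use it once the original text has been deleted or has gone out of scope.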
Example:
vi re2_submatch_ex_test.cc
#include <re2/re2.h>
#include <iostream>
#include <string>
#include <assert.h>

int
main(void)
{
    int i;
    std::string s;
    assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", &s, &i));
    assert(s == "ruby");
    assert(i == 1234);

    // Fails: "ruby" cannot be parsed as an integer.
    assert(!RE2::FullMatch("ruby", "(.+)", &i));

    // Success; does not extract the number.
    assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", &s));

    // Success; skips NULL argument.
    assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", (void*)NULL, &i));

    // Fails: integer overflow keeps value from being stored in i.
    assert(!RE2::FullMatch("ruby:123456789123", "(\\w+):(\\d+)", &s, &i));

    std::cout << "Ok" << std::endl;
    return 0;
}
g++ -o re2_submatch_ex_test re2_submatch_ex_test.cc -lre2
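Pre-compiled regular expressions
The examples above all compile the regular expression anew on every call. Instead, you can compile it once into an RE2 object and reuse that object on each call.
Example: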
vi re2_prec_re_test.cc
#include <re2/re2.h>
#include <iostream>
#include <string>
#include <assert.h>

int
main(void)
{
    int i;
    std::string s;
    RE2 re("(\\w+):(\\d+)");
    assert(re.ok());  // compiled; if not, see re.error();

    assert(RE2::FullMatch("ruby:1234", re, &s, &i));
    assert(RE2::FullMatch("ruby:1234", re, &s));
    assert(RE2::FullMatch("ruby:1234", re, (void*)NULL, &i));
    assert(!RE2::FullMatch("ruby:123456789123", re, &s, &i));

    std::cout << "Ok" << std::endl;
    return 0;
}
g++ -o re2_prec_re_test re2_prec_re_test.cc -lre2
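Options
The RE2 constructor takes an optional second argument that changes RE2's default options. For example, the predefined Quiet option suppresses the error message that is otherwise printed when a regular expression fails to parse: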
vi re2_options_test.cc
#include <re2/re2.h>
#include <iostream>
#include <assert.h>

int
main(void)
{
    RE2 re("(ab", RE2::Quiet);  // don't write to stderr for parser failure
    assert(!re.ok());           // can check re.error() for details
    std::cout << "Ok" << std::endl;
    return 0;
}
Compile the program:
g++ -o re2_options_test re2_options_test.cc -lre2
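Other useful predefined options are Latin1 (disables UTF-8) and POSIX (uses POSIX syntax and leftmost-longest matching).
You can also define your own RE2::Options object and configure it. All the available options are documented in re2/re2.h.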
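Unicode normalization
RE2 operates on Unicode code points and makes no attempt at normalization. For example, the regular expression /ü/ (U+00FC, u with diaeresis) does not match the input "ü" (U+0075 U+0308, u followed by a combining diaeresis). Normalization is a long and involved topic. If you need such matches, the simplest solution is to normalize both the regular expressions and the input in a preprocessing step before using RE2. For more details on the general topic, see http://www.unicode.org/reports/tr15/.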
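Additional tips and tricks
For advanced usage, such as constructing your own argument lists, using RE2 as a lexer, or parsing hexadecimal, octal, and C-radix numbers, see re2.h.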
Backtracking vs. non-backtracking
The figures for this section (not reproduced here) are taken from the talk slides "sregex: matching Perl 5 regexes on data streams".




RE2 wrappers
An Inferno wrapper is at http://code.google.com/p/inferno-re2/.
A Python wrapper is at http://github.com/facebook/pyre2/.
A Ruby wrapper is at http://github.com/axic/rre2/.
An Erlang wrapper is at http://github.com/tuncer/re2/.
A Perl wrapper is at http://search.cpan.org/~dgl/re-engine-RE2-0.05/lib/re/engine/RE2.pm.
An Eiffel wrapper is at http://sourceforge.net/projects/eiffelre2/.
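Syntax supported by RE2
The table below lists the regular expression syntax accepted by RE2, together with syntax accepted by PCRE, Perl, and Vim. Entries marked (NOT SUPPORTED) are syntax that RE2 does not accept; a trailing PCRE, PERL, or VIM marks syntax specific to that engine.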
| Syntax | Meaning |
|---|---|
| Single characters: | |
| . | any character, possibly including newline (s=true) |
| [xyz] | character class |
| [^xyz] | negated character class |
| \d | Perl character class |
| \D | negated Perl character class |
| [[:alpha:]] | ASCII character class |
| [[:^alpha:]] | negated ASCII character class |
| \pN | Unicode character class (one-letter name) |
| \p{Greek} | Unicode character class |
| \PN | negated Unicode character class (one-letter name) |
| \P{Greek} | negated Unicode character class |
| Composites: | |
| xy | x followed by y |
| x\|y | x or y (prefer x) |
| Repetitions: | |
| x* | zero or more x, prefer more |
| x+ | one or more x, prefer more |
| x? | zero or one x, prefer one |
| x{n,m} | n or n+1 or ... or m x, prefer more |
| x{n,} | n or more x, prefer more |
| x{n} | exactly n x |
| x*? | zero or more x, prefer fewer |
| x+? | one or more x, prefer fewer |
| x?? | zero or one x, prefer zero |
| x{n,m}? | n or n+1 or ... or m x, prefer fewer |
| x{n,}? | n or more x, prefer fewer |
| x{n}? | exactly n x |
| x\{} | (≡ x*) (NOT SUPPORTED) VIM |
| x\{-} | (≡ x*?) (NOT SUPPORTED) VIM |
| x\{-n} | (≡ x{n}?) (NOT SUPPORTED) VIM |
| x\= | (≡ x?) (NOT SUPPORTED) VIM |
| Possessive repetitions: | |
| x*+ | zero or more x, possessive (NOT SUPPORTED) |
| x++ | one or more x, possessive (NOT SUPPORTED) |
| x?+ | zero or one x, possessive (NOT SUPPORTED) |
| x{n,m}+ | n or ... or m x, possessive (NOT SUPPORTED) |
| x{n,}+ | n or more x, possessive (NOT SUPPORTED) |
| x{n}+ | exactly n x, possessive (NOT SUPPORTED) |
| Grouping: | |
| (re) | numbered capturing group |
| (?P<name>re) | named & numbered capturing group |
| (?<name>re) | named & numbered capturing group (NOT SUPPORTED) |
| (?'name're) | named & numbered capturing group (NOT SUPPORTED) |
| (?:re) | non-capturing group |
| (?flags) | set flags within current group; non-capturing |
| (?flags:re) | set flags during re; non-capturing |
| (?#text) | comment (NOT SUPPORTED) |
| (?\|x\|y\|z) | branch numbering reset (NOT SUPPORTED) |
| (?>re) | possessive match of re (NOT SUPPORTED) |
| re\@> | possessive match of re (NOT SUPPORTED) VIM |
| \%(re\) | non-capturing group (NOT SUPPORTED) VIM |
| Flags: | |
| i | case-insensitive (default false) |
| m | multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false) |
| s | let . match \n (default false) |
| U | ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false) |
| Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). | |
| Empty strings: | |
| ^ | at beginning of text or line (m=true) |
| $ | at end of text (like \z not \Z) or line (m=true) |
| \A | at beginning of text |
| \b | at word boundary (\w on one side and \W, \A, or \z on the other) |
| \B | not a word boundary |
| \G | at beginning of subtext being searched (NOT SUPPORTED) PCRE |
| \G | at end of last match (NOT SUPPORTED) PERL |
| \Z | at end of text, or before newline at end of text (NOT SUPPORTED) |
| \z | at end of text |
| (?=re) | before text matching re (NOT SUPPORTED) |
| (?!re) | before text not matching re (NOT SUPPORTED) |
| (?<=re) | after text matching re (NOT SUPPORTED) |
| (?<!re) | after text not matching re (NOT SUPPORTED) |
| re\& | before text matching re (NOT SUPPORTED) VIM |
| re\@= | before text matching re (NOT SUPPORTED) VIM |
| re\@! | before text not matching re (NOT SUPPORTED) VIM |
| re\@<= | after text matching re (NOT SUPPORTED) VIM |
| re\@<! | after text not matching re (NOT SUPPORTED) VIM |
| \zs | sets start of match (= \K) (NOT SUPPORTED) VIM |
| \ze | sets end of match (NOT SUPPORTED) VIM |
| \%^ | beginning of file (NOT SUPPORTED) VIM |
| \%$ | end of file (NOT SUPPORTED) VIM |
| \%V | on screen (NOT SUPPORTED) VIM |
| \%# | cursor position (NOT SUPPORTED) VIM |
| \%'m | mark m position (NOT SUPPORTED) VIM |
| \%23l | in line 23 (NOT SUPPORTED) VIM |
| \%23c | in column 23 (NOT SUPPORTED) VIM |
| \%23v | in virtual column 23 (NOT SUPPORTED) VIM |
| Escape sequences: | |
| \a | bell (≡ \007) |
| \f | form feed (≡ \014) |
| \t | horizontal tab (≡ \011) |
| \n | newline (≡ \012) |
| \r | carriage return (≡ \015) |
| \v | vertical tab character (≡ \013) |
| \* | literal *, for any punctuation character * |
| \123 | octal character code (up to three digits) |
| \x7F | hex character code (exactly two digits) |
| \x{10FFFF} | hex character code |
| \C | match a single byte even in UTF-8 mode |
| \Q...\E | literal text ... even if ... has punctuation |
| \1 | backreference (NOT SUPPORTED) |
| \b | backspace (NOT SUPPORTED) (use \010) |
| \cK | control char ^K (NOT SUPPORTED) (use \001 etc) |
| \e | escape (NOT SUPPORTED) (use \033) |
| \g1 | backreference (NOT SUPPORTED) |
| \g{1} | backreference (NOT SUPPORTED) |
| \g{+1} | backreference (NOT SUPPORTED) |
| \g{-1} | backreference (NOT SUPPORTED) |
| \g{name} | named backreference (NOT SUPPORTED) |
| \g<name> | subroutine call (NOT SUPPORTED) |
| \g'name' | subroutine call (NOT SUPPORTED) |
| \k<name> | named backreference (NOT SUPPORTED) |
| \k'name' | named backreference (NOT SUPPORTED) |
| \lX | lowercase X (NOT SUPPORTED) |
| \ux | uppercase x (NOT SUPPORTED) |
| \L...\E | lowercase text ... (NOT SUPPORTED) |
| \K | reset beginning of $0 (NOT SUPPORTED) |
| \N{name} | named Unicode character (NOT SUPPORTED) |
| \R | line break (NOT SUPPORTED) |
| \U...\E | upper case text ... (NOT SUPPORTED) |
| \X | extended Unicode sequence (NOT SUPPORTED) |
| \%d123 | decimal character 123 (NOT SUPPORTED) VIM |
| \%xFF | hex character FF (NOT SUPPORTED) VIM |
| \%o123 | octal character 123 (NOT SUPPORTED) VIM |
| \%u1234 | Unicode character 0x1234 (NOT SUPPORTED) VIM |
| \%U12345678 | Unicode character 0x12345678 (NOT SUPPORTED) VIM |
| Character class elements: | |
| x | single character |
| A-Z | character range (inclusive) |
| \d | Perl character class |
| [:foo:] | ASCII character class foo |
| \p{Foo} | Unicode character class Foo |
| \pF | Unicode character class F (one-letter name) |
| Named character classes as character class elements: | |
| [\d] | digits (≡ \d) |
| [^\d] | not digits (≡ \D) |
| [\D] | not digits (≡ \D) |
| [^\D] | not not digits (≡ \d) |
| [[:name:]] | named ASCII class inside character class (≡ [:name:]) |
| [^[:name:]] | named ASCII class inside negated character class (≡ [:^name:]) |
| [\p{Name}] | named Unicode property inside character class (≡ \p{Name}) |
| [^\p{Name}] | named Unicode property inside negated character class (≡ \P{Name}) |
| Perl character classes: | |
| \d | digits (≡ [0-9]) |
| \D | not digits (≡ [^0-9]) |
| \s | whitespace (≡ [\t\n\f\r ]) |
| \S | not whitespace (≡ [^\t\n\f\r ]) |
| \w | word characters (≡ [0-9A-Za-z_]) |
| \W | not word characters (≡ [^0-9A-Za-z_]) |
| \h | horizontal space (NOT SUPPORTED) |
| \H | not horizontal space (NOT SUPPORTED) |
| \v | vertical space (NOT SUPPORTED) |
| \V | not vertical space (NOT SUPPORTED) |
| ASCII character classes: | |
| [[:alnum:]] | alphanumeric (≡ [0-9A-Za-z]) |
| [[:alpha:]] | alphabetic (≡ [A-Za-z]) |
| [[:ascii:]] | ASCII (≡ [\x00-\x7F]) |
| [[:blank:]] | blank (≡ [\t ]) |
| [[:cntrl:]] | control (≡ [\x00-\x1F\x7F]) |
| [[:digit:]] | digits (≡ [0-9]) |
| [[:graph:]] | graphical (≡ [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{\|}~]) |
| [[:lower:]] | lower case (≡ [a-z]) |
| [[:print:]] | printable (≡ [ -~] == [ [:graph:]]) |
| [[:punct:]] | punctuation (≡ [!-/:-@[-`{-~]) |
| [[:space:]] | whitespace (≡ [\t\n\v\f\r ]) |
| [[:upper:]] | upper case (≡ [A-Z]) |
| [[:word:]] | word characters (≡ [0-9A-Za-z_]) |
| [[:xdigit:]] | hex digit (≡ [0-9A-Fa-f]) |
| Unicode character class names--general category: | |
| C | other |
| Cc | control |
| Cf | format |
| Cn | unassigned code points (NOT SUPPORTED) |
| Co | private use |
| Cs | surrogate |
| L | letter |
| LC | cased letter (NOT SUPPORTED) |
| L& | cased letter (NOT SUPPORTED) |
| Ll | lowercase letter |
| Lm | modifier letter |
| Lo | other letter |
| Lt | titlecase letter |
| Lu | uppercase letter |
| M | mark |
| Mc | spacing mark |
| Me | enclosing mark |
| Mn | non-spacing mark |
| N | number |
| Nd | decimal number |
| Nl | letter number |
| No | other number |
| P | punctuation |
| Pc | connector punctuation |
| Pd | dash punctuation |
| Pe | close punctuation |
| Pf | final punctuation |
| Pi | initial punctuation |
| Po | other punctuation |
| Ps | open punctuation |
| S | symbol |
| Sc | currency symbol |
| Sk | modifier symbol |
| Sm | math symbol |
| So | other symbol |
| Z | separator |
| Zl | line separator |
| Zp | paragraph separator |
| Zs | space separator |
| Unicode character class names--scripts: | |
| Arabic | Arabic |
| Armenian | Armenian |
| Balinese | Balinese |
| Bengali | Bengali |
| Bopomofo | Bopomofo |
| Braille | Braille |
| Buginese | Buginese |
| Buhid | Buhid |
| Canadian_Aboriginal | Canadian Aboriginal |
| Carian | Carian |
| Cham | Cham |
| Cherokee | Cherokee |
| Common | characters not specific to one script |
| Coptic | Coptic |
| Cuneiform | Cuneiform |
| Cypriot | Cypriot |
| Cyrillic | Cyrillic |
| Deseret | Deseret |
| Devanagari | Devanagari |
| Ethiopic | Ethiopic |
| Georgian | Georgian |
| Glagolitic | Glagolitic |
| Gothic | Gothic |
| Greek | Greek |
| Gujarati | Gujarati |
| Gurmukhi | Gurmukhi |
| Han | Han |
| Hangul | Hangul |
| Hanunoo | Hanunoo |
| Hebrew | Hebrew |
| Hiragana | Hiragana |
| Inherited | inherit script from previous character |
| Kannada | Kannada |
| Katakana | Katakana |
| Kayah_Li | Kayah Li |
| Kharoshthi | Kharoshthi |
| Khmer | Khmer |
| Lao | Lao |
| Latin | Latin |
| Lepcha | Lepcha |
| Limbu | Limbu |
| Linear_B | Linear B |
| Lycian | Lycian |
| Lydian | Lydian |
| Malayalam | Malayalam |
| Mongolian | Mongolian |
| Myanmar | Myanmar |
| New_Tai_Lue | New Tai Lue (aka Simplified Tai Lue) |
| Nko | Nko |
| Ogham | Ogham |
| Ol_Chiki | Ol Chiki |
| Old_Italic | Old Italic |
| Old_Persian | Old Persian |
| Oriya | Oriya |
| Osmanya | Osmanya |
| Phags_Pa | 'Phags Pa |
| Phoenician | Phoenician |
| Rejang | Rejang |
| Runic | Runic |
| Saurashtra | Saurashtra |
| Shavian | Shavian |
| Sinhala | Sinhala |
| Sundanese | Sundanese |
| Syloti_Nagri | Syloti Nagri |
| Syriac | Syriac |
| Tagalog | Tagalog |
| Tagbanwa | Tagbanwa |
| Tai_Le | Tai Le |
| Tamil | Tamil |
| Telugu | Telugu |
| Thaana | Thaana |
| Thai | Thai |
| Tibetan | Tibetan |
| Tifinagh | Tifinagh |
| Ugaritic | Ugaritic |
| Vai | Vai |
| Yi | Yi |
| Vim character classes: | |
| \i | identifier character (NOT SUPPORTED) VIM |
| \I | \i except digits (NOT SUPPORTED) VIM |
| \k | keyword character (NOT SUPPORTED) VIM |
| \K | \k except digits (NOT SUPPORTED) VIM |
| \f | file name character (NOT SUPPORTED) VIM |
| \F | \f except digits (NOT SUPPORTED) VIM |
| \p | printable character (NOT SUPPORTED) VIM |
| \P | \p except digits (NOT SUPPORTED) VIM |
| \s | whitespace character (≡ [ \t]) (NOT SUPPORTED) VIM |
| \S | non-white space character (≡ [^ \t]) (NOT SUPPORTED) VIM |
| \d | digits (≡ [0-9]) VIM |
| \D | not \d VIM |
| \x | hex digits (≡ [0-9A-Fa-f]) (NOT SUPPORTED) VIM |
| \X | not \x (NOT SUPPORTED) VIM |
| \o | octal digits (≡ [0-7]) (NOT SUPPORTED) VIM |
| \O | not \o (NOT SUPPORTED) VIM |
| \w | word character VIM |
| \W | not \w VIM |
| \h | head of word character (NOT SUPPORTED) VIM |
| \H | not \h (NOT SUPPORTED) VIM |
| \a | alphabetic (NOT SUPPORTED) VIM |
| \A | not \a (NOT SUPPORTED) VIM |
| \l | lowercase (NOT SUPPORTED) VIM |
| \L | not lowercase (NOT SUPPORTED) VIM |
| \u | uppercase (NOT SUPPORTED) VIM |
| \U | not uppercase (NOT SUPPORTED) VIM |
| \_x | \x plus newline, for any x (NOT SUPPORTED) VIM |
| Vim flags: | |
| \c | ignore case (NOT SUPPORTED) VIM |
| \C | match case (NOT SUPPORTED) VIM |
| \m | magic (NOT SUPPORTED) VIM |
| \M | nomagic (NOT SUPPORTED) VIM |
| \v | verymagic (NOT SUPPORTED) VIM |
| \V | verynomagic (NOT SUPPORTED) VIM |
| \Z | ignore differences in Unicode combining characters (NOT SUPPORTED) VIM |
| Magic: | |
| (?{code}) | arbitrary Perl code (NOT SUPPORTED) PERL |
| (??{code}) | postponed arbitrary Perl code (NOT SUPPORTED) PERL |
| (?n) | recursive call to regexp capturing group n (NOT SUPPORTED) |
| (?+n) | recursive call to relative group +n (NOT SUPPORTED) |
| (?-n) | recursive call to relative group -n (NOT SUPPORTED) |
| (?C) | PCRE callout (NOT SUPPORTED) PCRE |
| (?R) | recursive call to entire regexp (≡ (?0)) (NOT SUPPORTED) |
| (?&name) | recursive call to named group (NOT SUPPORTED) |
| (?P=name) | named backreference (NOT SUPPORTED) |
| (?P>name) | recursive call to named group (NOT SUPPORTED) |
| (?(cond)true\|false) | conditional branch (NOT SUPPORTED) |
| (?(cond)true) | conditional branch (NOT SUPPORTED) |
| (*ACCEPT) | make regexps more like Prolog (NOT SUPPORTED) |
| (*COMMIT) | (NOT SUPPORTED) |
| (*F) | (NOT SUPPORTED) |
| (*FAIL) | (NOT SUPPORTED) |
| (*MARK) | (NOT SUPPORTED) |
| (*PRUNE) | (NOT SUPPORTED) |
| (*SKIP) | (NOT SUPPORTED) |
| (*THEN) | (NOT SUPPORTED) |
| (*ANY) | set newline convention (NOT SUPPORTED) |
| (*ANYCRLF) | (NOT SUPPORTED) |
| (*CR) | (NOT SUPPORTED) |
| (*CRLF) | (NOT SUPPORTED) |
| (*LF) | (NOT SUPPORTED) |
| (*BSR_ANYCRLF) | set \R convention (NOT SUPPORTED) PCRE |
| (*BSR_UNICODE) | (NOT SUPPORTED) PCRE |
Further reading
- "perlre - Perl regular expressions": http://perldoc.perl.org/perlre.html
- "Implementing Regular Expressions": http://swtch.com/~rsc/regexp
- The re1 project: http://code.google.com/p/re1
- The re2 project: http://code.google.com/p/re2
- sregex: A non-backtracking regex engine matching on data streams
- sregex: matching Perl 5 regexes on data streams: http://agentzh.org/misc/slides/yapc-na-2013-sregex.pdf
References
- RE2 official documentation: http://code.google.com/p/re2/wiki/
- sregex: matching Perl 5 regexes on data streams: http://agentzh.org/misc/slides/yapc-na-2013-sregex.pdf















