From: esdoppio
Date: Thu, 18 Aug 2016 15:55:18 +0000 (+0800)
Subject: doc/RE: refinements
X-Git-Tag: v6.1.0~45
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=2e81a00515f41b43c0c5f2a3a79ba91ff683013a;p=onig

doc/RE: refinements
---

diff --git a/doc/RE b/doc/RE
index 5e26d21..7dd7ea4 100644
--- a/doc/RE
+++ b/doc/RE
@@ -5,7 +5,7 @@ syntax: ONIG_SYNTAX_RUBY (default)

 1. Syntax elements

-  \       escape (enable or disable meta character meaning)
+  \       escape (enable or disable meta character)
   |       alternation
   (...)   group
   [...]   character class
@@ -13,23 +13,23 @@ syntax: ONIG_SYNTAX_RUBY (default)

 2. Characters

-  \t           horizontal tab         (0x09)
-  \v           vertical tab           (0x0B)
-  \n           newline                (0x0A)
-  \r           return                 (0x0D)
-  \b           back space             (0x08)
-  \f           form feed              (0x0C)
-  \a           bell                   (0x07)
-  \e           escape                 (0x1B)
-  \nnn         octal char             (encoded byte value)
-  \xHH         hexadecimal char       (encoded byte value)
-  \x{7HHHHHHH} wide hexadecimal char  (character code point value)
-  \cx          control char           (character code point value)
-  \C-x         control char           (character code point value)
-  \M-x         meta  (x|0x80)         (character code point value)
-  \M-\C-x      meta control char      (character code point value)
-
-  (* \b is effective in character class [...] only)
+  \t           horizontal tab         (0x09)
+  \v           vertical tab           (0x0B)
+  \n           newline (line feed)    (0x0A)
+  \r           carriage return        (0x0D)
+  \b           backspace              (0x08)
+  \f           form feed              (0x0C)
+  \a           bell                   (0x07)
+  \e           escape                 (0x1B)
+  \nnn         octal char             (encoded byte value)
+  \xHH         hexadecimal char       (encoded byte value)
+  \x{7HHHHHHH} wide hexadecimal char  (character code point value)
+  \cx          control char           (character code point value)
+  \C-x         control char           (character code point value)
+  \M-x         meta  (x|0x80)         (character code point value)
+  \M-\C-x      meta control char      (character code point value)
+
+  (* \b as backspace is effective in character class only)


 3. Character types
@@ -44,7 +44,7 @@ syntax: ONIG_SYNTAX_RUBY (default)
            Unicode:
              General_Category -- (Letter|Mark|Number|Connector_Punctuation)

-  \W       non word char
+  \W       non-word char

   \s       whitespace char

@@ -57,17 +57,17 @@ syntax: ONIG_SYNTAX_RUBY (default)
              -- Paragraph_Separator
              -- Space_Separator

-  \S       non whitespace char
+  \S       non-whitespace char

   \d       decimal digit char

            Unicode: General_Category -- Decimal_Number

-  \D       non decimal digit char
+  \D       non-decimal-digit char

   \h       hexadecimal digit char   [0-9a-fA-F]

-  \H       non hexadecimal digit char
+  \H       non-hexdigit char


   Character Property
@@ -97,9 +97,9 @@ syntax: ONIG_SYNTAX_RUBY (default)
   ?       1 or 0 times
   *       0 or more times
   +       1 or more times
-  {n,m}   at least n but not more than m times
+  {n,m}   at least n but no more than m times
   {n,}    at least n times
-  {,n}    at least 0 but not more than n times ({0,n})
+  {,n}    at least 0 but no more than n times ({0,n})
   {n}     n times

   reluctant
@@ -111,7 +111,7 @@ syntax: ONIG_SYNTAX_RUBY (default)
   {n,}?   at least n times
   {,n}?   at least 0 but not more than n times (== {0,n}?)

-  possessive (greedy and does not backtrack after repeated)
+  possessive (greedy and does not backtrack once match)

   ?+      1 or 0 times
   *+      0 or more times
@@ -127,7 +127,7 @@ syntax: ONIG_SYNTAX_RUBY (default)
   ^       beginning of the line
   $       end of the line
   \b      word boundary
-  \B      not word boundary
+  \B      non-word boundary
   \A      beginning of string
   \Z      end of string, or before newline at the end
   \z      end of string
@@ -136,15 +136,15 @@ syntax: ONIG_SYNTAX_RUBY (default)

 6. Character class

-  ^...    negative class (lowest precedence operator)
+  ^...    negative class (lowest precedence)
   x-y     range from x to y
   [...]   set (character class in character class)
-  ..&&..  intersection (low precedence at the next of ^)
+  ..&&..  intersection (low precedence, only higher than ^)

     ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]

-  * If you want to use '[', '-', ']' as a normal character
-    in a character class, you should escape these characters by '\'.
+  * If you want to use '[', '-', or ']' as a normal character
+    in character class, you should escape them with '\'.


   POSIX bracket ([:xxxxx:], negate [:^xxxxx:])
@@ -196,37 +196,36 @@ syntax: ONIG_SYNTAX_RUBY (default)
   (?imx-imx)         option on/off
                          i: ignore case
-                         m: multi-line (dot(.) match newline)
+                         m: multi-line (dot (.) also matches newline)
                          x: extended form

   (?imx-imx:subexp)  option on/off for subexp

-  (?:subexp)         not captured group
-  (subexp)           captured group
+  (?:subexp)         non-capturing group
+  (subexp)           capturing group

   (?=subexp)         look-ahead
   (?!subexp)         negative look-ahead
   (?<=subexp)        look-behind
   (?>subexp)         atomic group
-                     don't backtrack in subexp.
+                     no backtracks in subexp.

   (?<name>subexp), (?'name'subexp)
                      define named group
-                     (All characters of the name must be a word character.)
+                     (Each character of the name must be a word character.)

-                     Not only a name but a number is assigned like a captured
+                     Not only a name but a number is assigned like a capturing
                      group.

-                     Assigning the same name as two or more subexps is allowed.
+                     Assigning the same name to two or more subexps is allowed.
                      In this case, a subexp call can not be performed although
                      the back reference is possible.

@@ -308,25 +307,25 @@ syntax: ONIG_SYNTAX_RUBY (default)

 10. Captured group

-  Behavior of the no-named group (...) changes with the following conditions.
+  Behavior of an unnamed group (...) changes with the following conditions.
   (But named group is not changed.)

   case 1. /.../     (named group is not used, no option)

-     (...)    is treated as a captured group.
+     (...)    is treated as a capturing group.

   case 2. /.../g    (named group is not used, 'g' option)

-     (...)    is treated as a no-captured group (?:...).
+     (...)    is treated as a non-capturing group (?:...).

   case 3. /..(?<name>..)../   (named group is used, no option)

-     (...)    is treated as a no-captured group (?:...).
+     (...)    is treated as a non-capturing group.
      numbered-backref/call is not allowed.

   case 4. /..(?<name>..)../G  (named group is used, 'G' option)

-     (...)    is treated as a captured group.
+     (...)    is treated as a capturing group.
      numbered-backref/call is allowed.

   where
@@ -338,14 +337,14 @@ syntax: ONIG_SYNTAX_RUBY (default)

 -----------------------------

-A-1. Syntax depend options
+A-1. Syntax-dependent options

    + ONIG_SYNTAX_RUBY
-     (?m): dot(.) match newline
+     (?m): dot (.) also matches newline

    + ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA
-     (?s): dot(.) match newline
-     (?m): ^ match after newline, $ match before newline
+     (?s): dot (.) also matches newline
+     (?m): ^ matches after newline, $ matches before newline


 A-2. Original extensions
@@ -356,7 +355,7 @@ A-2. Original extensions
    + subexp call      \g<name>, \g<group-num>


-A-3. Lacked features compare with perl 5.8.0
+A-3. Missing features compared with perl 5.8.0

    + \N{name}
    + \l,\u,\L,\U, \X, \C
@@ -373,7 +372,7 @@ A-4. Differences with Japanized GNU regex(version 0.12) of Ruby 1.8
    + add character property (\p{property}, \P{property})
    + add hexadecimal digit char type (\h, \H)
    + add look-behind
      (?<=fixed-char-length-pattern), (?