From dafd6d0318638269cca12d9dd1cb33b5a935b01c Mon Sep 17 00:00:00 2001
From: Ryan Bloom
Return information about the first character of any matched string, for a
non-anchored pattern. If there is a fixed first character, e.g. from a pattern
-such as (cat|cow|coyote), then it is returned in the integer pointed to by
+such as (cat|cow|coyote), it is returned in the integer pointed to by
where. Otherwise, if either
@@ -442,9 +442,9 @@ starts with "^", or
(if it were set, the pattern would be anchored),
-then -1 is returned, indicating that the pattern matches only at the
-start of a subject string or after any "\n" within the string. Otherwise -2 is
-returned. For anchored patterns, -2 is returned.
+-1 is returned, indicating that the pattern matches only at the start of a
+subject string or after any "\n" within the string. Otherwise -2 is returned.
+For anchored patterns, -2 is returned.
@@ -734,8 +734,8 @@ is a pointer to the vector of integer offsets that was passed to
were captured by the match, including the substring that matched the entire
regular expression. This is the value returned by pcre_exec if it
is greater than zero. If pcre_exec() returned zero, indicating that it
-ran out of space in ovector, then the value passed as
-stringcount should be the size of the vector divided by three.
+ran out of space in ovector, the value passed as stringcount should
+be the size of the vector divided by three.
The functions pcre_copy_substring() and pcre_get_substring() @@ -857,7 +857,7 @@ patterns using the non-Perl item (?R). with the settings of captured strings when part of a pattern is repeated. For example, matching "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if -the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) get set. +the pattern is changed to /^(aa(b(b))?)+$/ then $2 (and $3) are set.
In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the @@ -1186,10 +1186,10 @@ end of the subject in both modes, and if all branches of a pattern start with
Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing character, but not (by default) newline. -If the PCRE_DOTALL option is set, then dots match newlines as well. The -handling of dot is entirely independent of the handling of circumflex and -dollar, the only relationship being that they both involve newline characters. -Dot has no special meaning in a character class. +If the PCRE_DOTALL option is set, dots match newlines as well. The handling of +dot is entirely independent of the handling of circumflex and dollar, the only +relationship being that they both involve newline characters. Dot has no +special meaning in a character class.
@@ -1580,7 +1580,7 @@ fails, because it matches the entire string due to the greediness of the .* item.
-However, if a quantifier is followed by a question mark, then it ceases to be +However, if a quantifier is followed by a question mark, it ceases to be greedy, and instead matches the minimum number of times possible, so the pattern
@@ -1605,8 +1605,8 @@ which matches one digit by preference, but can match two if that is the only way the rest of the pattern matches.-If the PCRE_UNGREEDY option is set (an option which is not available in Perl) -then the quantifiers are not greedy by default, but individual ones can be made +If the PCRE_UNGREEDY option is set (an option which is not available in Perl), +the quantifiers are not greedy by default, but individual ones can be made greedy by following them with a question mark. In other words, it inverts the default behaviour.
@@ -1617,8 +1617,8 @@ compiled pattern, in proportion to the size of the minimum or maximum.If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent -to Perl's /s) is set, thus allowing the . to match newlines, then the pattern -is implicitly anchored, because whatever follows will be tried against every +to Perl's /s) is set, thus allowing the . to match newlines, the pattern is +implicitly anchored, because whatever follows will be tried against every character position in the subject string, so there is no point in retrying the overall match at any position after the first. PCRE treats such a pattern as though it were preceded by \A. In cases where it is known that the subject @@ -1677,7 +1677,7 @@ itself. So the pattern
matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If caseful matching is in force at the time of the -back reference, then the case of letters is relevant. For example, +back reference, the case of letters is relevant. For example,
@@ -1690,7 +1690,7 @@ capturing subpattern is matched caselessly.There may be more than one back reference to the same subpattern. If a -subpattern has not actually been used in a particular match, then any back +subpattern has not actually been used in a particular match, any back references to it always fail. For example, the pattern
@@ -1702,9 +1702,9 @@ references to it always fail. For example, the pattern always fails if it starts to match "a" rather than "bc". Because there may be up to 99 back references, all digits following the backslash are taken as part of a potential back reference number. If the pattern continues with a -digit character, then some delimiter must be used to terminate the back -reference. If the PCRE_EXTENDED option is set, this can be whitespace. -Otherwise an empty comment can be used. +digit character, some delimiter must be used to terminate the back reference. +If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty +comment can be used.
A back reference that occurs inside the parentheses to which it refers fails @@ -1836,7 +1836,7 @@ Several assertions (of any sort) may occur in succession. For example, matches "foo" preceded by three digits that are not "999". Notice that each of the assertions is applied independently at the same point in the subject string. First there is a check that the previous three characters are all -digits, then there is a check that the same three characters are not "999". +digits, and then there is a check that the same three characters are not "999". This pattern does not match "foo" preceded by six characters, the first of which are digits and the last three of which are not "999". For example, it doesn't match "123abcfoo". A pattern to do that is @@ -1957,11 +1957,11 @@ what follows matches the rest of the pattern. If the pattern is specified as
-then the initial .* matches the entire string at first, but when this fails -(because there is no following "a"), it backtracks to match all but the last -character, then all but the last two characters, and so on. Once again the -search for "a" covers the entire string, from right to left, so we are no -better off. However, if the pattern is written as +the initial .* matches the entire string at first, but when this fails (because +there is no following "a"), it backtracks to match all but the last character, +then all but the last two characters, and so on. Once again the search for "a" +covers the entire string, from right to left, so we are no better off. However, +if the pattern is written as
@@ -1969,7 +1969,7 @@ better off. However, if the pattern is written as
-then there can be no backtracking for the .* item; it can match only the entire +there can be no backtracking for the .* item; it can match only the entire string. The subsequent lookbehind assertion does a single test on the last four characters. If it fails, the match fails immediately. For long strings, this approach makes a significant difference to the processing time. @@ -2032,11 +2032,10 @@ subpattern, a compile-time error occurs.
There are two kinds of condition. If the text between the parentheses consists -of a sequence of digits, then the condition is satisfied if the capturing -subpattern of that number has previously matched. Consider the following -pattern, which contains non-significant white space to make it more readable -(assume the PCRE_EXTENDED option) and to divide it into three parts for ease -of discussion: +of a sequence of digits, the condition is satisfied if the capturing subpattern +of that number has previously matched. Consider the following pattern, which +contains non-significant white space to make it more readable (assume the +PCRE_EXTENDED option) and to divide it into three parts for ease of discussion:
@@ -2157,7 +2156,7 @@ on at the top level. If additional parentheses are added, giving ^ ^ ^ ^-then the string they capture is "ab(cd)ef", the contents of the top level +the string they capture is "ab(cd)ef", the contents of the top level parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by using pcre_malloc, freeing it via pcre_free afterwards. If no diff --git a/srclib/pcre/doc/pcre.txt b/srclib/pcre/doc/pcre.txt index f28ee99e8b..b8106e4457 100644 --- a/srclib/pcre/doc/pcre.txt +++ b/srclib/pcre/doc/pcre.txt @@ -353,8 +353,8 @@ INFORMATION ABOUT A PATTERN Return information about the first character of any matched string, for a non-anchored pattern. If there is a fixed first character, e.g. from a pattern such as - (cat|cow|coyote), then it is returned in the integer pointed - to by where. Otherwise, if either + (cat|cow|coyote), it is returned in the integer pointed to + by where. Otherwise, if either (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch starts with "^", or @@ -363,10 +363,10 @@ INFORMATION ABOUT A PATTERN PCRE_DOTALL is not set (if it were set, the pattern would be anchored), - then -1 is returned, indicating that the pattern matches - only at the start of a subject string or after any "\n" - within the string. Otherwise -2 is returned. For anchored - patterns, -2 is returned. + -1 is returned, indicating that the pattern matches only at + the start of a subject string or after any "\n" within the + string. Otherwise -2 is returned. For anchored patterns, -2 + is returned. PCRE_INFO_FIRSTTABLE @@ -622,8 +622,8 @@ EXTRACTING CAPTURED SUBSTRINGS entire regular expression. This is the value returned by pcre_exec if it is greater than zero. If pcre_exec() returned zero, indicating that it ran out of space in ovec- - tor, then the value passed as stringcount should be the size - of the vector divided by three. 
+ tor, the value passed as stringcount should be the size of + the vector divided by three. The functions pcre_copy_substring() and pcre_get_substring() extract a single substring, whose number is given as string- @@ -739,7 +739,7 @@ DIFFERENCES FROM PERL "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2 unset. However, if the pattern is changed to - /^(aa(b(b))?)+$/ then $2 (and $3) get set. + /^(aa(b(b))?)+$/ then $2 (and $3) are set. In Perl 5.004 $2 is set in both cases, and that is also true of PCRE. If in the future Perl changes to a consistent state @@ -1056,11 +1056,11 @@ FULL STOP (PERIOD, DOT) Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing char- acter, but not (by default) newline. If the PCRE_DOTALL - option is set, then dots match newlines as well. The han- - dling of dot is entirely independent of the handling of cir- - cumflex and dollar, the only relationship being that they - both involve newline characters. Dot has no special meaning - in a character class. + option is set, dots match newlines as well. The handling of + dot is entirely independent of the handling of circumflex + and dollar, the only relationship being that they both + involve newline characters. Dot has no special meaning in a + character class. @@ -1406,9 +1406,9 @@ REPETITION fails, because it matches the entire string due to the greediness of the .* item. - However, if a quantifier is followed by a question mark, - then it ceases to be greedy, and instead matches the minimum - number of times possible, so the pattern + However, if a quantifier is followed by a question mark, it + ceases to be greedy, and instead matches the minimum number + of times possible, so the pattern /\*.*?\*/ @@ -1425,7 +1425,7 @@ REPETITION that is the only way the rest of the pattern matches. 
If the PCRE_UNGREEDY option is set (an option which is not - available in Perl) then the quantifiers are not greedy by + available in Perl), the quantifiers are not greedy by default, but individual ones can be made greedy by following them with a question mark. In other words, it inverts the default behaviour. @@ -1437,7 +1437,7 @@ REPETITION If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent to Perl's /s) is set, thus allowing the . - to match newlines, then the pattern is implicitly anchored, + to match newlines, the pattern is implicitly anchored, because whatever follows will be tried against every charac- ter position in the subject string, so there is no point in retrying the overall match at any position after the first. @@ -1490,8 +1490,8 @@ BACK REFERENCES matches "sense and sensibility" and "response and responsi- bility", but not "sense and responsibility". If caseful - matching is in force at the time of the back reference, then - the case of letters is relevant. For example, + matching is in force at the time of the back reference, the + case of letters is relevant. For example, ((?i)rah)\s+\1 @@ -1501,8 +1501,8 @@ BACK REFERENCES There may be more than one back reference to the same sub- pattern. If a subpattern has not actually been used in a - particular match, then any back references to it always - fail. For example, the pattern + particular match, any back references to it always fail. For + example, the pattern (a|(bc))\2 @@ -1510,9 +1510,9 @@ BACK REFERENCES Because there may be up to 99 back references, all digits following the backslash are taken as part of a potential back reference number. If the pattern continues with a digit - character, then some delimiter must be used to terminate the - back reference. If the PCRE_EXTENDED option is set, this can - be whitespace. Otherwise an empty comment can be used. + character, some delimiter must be used to terminate the back + reference. 
If the PCRE_EXTENDED option is set, this can be + whitespace. Otherwise an empty comment can be used. A back reference that occurs inside the parentheses to which it refers fails when the subpattern is first used, so, for @@ -1612,7 +1612,7 @@ ASSERTIONS matches "foo" preceded by three digits that are not "999". Notice that each of the assertions is applied independently at the same point in the subject string. First there is a - check that the previous three characters are all digits, + check that the previous three characters are all digits, and then there is a check that the same three characters are not "999". This pattern does not match "foo" preceded by six characters, the first of which are digits and the last three @@ -1713,21 +1713,20 @@ ONCE-ONLY SUBPATTERNS ^.*abcd$ - then the initial .* matches the entire string at first, but - when this fails (because there is no following "a"), it - backtracks to match all but the last character, then all but - the last two characters, and so on. Once again the search - for "a" covers the entire string, from right to left, so we - are no better off. However, if the pattern is written as + the initial .* matches the entire string at first, but when + this fails (because there is no following "a"), it back- + tracks to match all but the last character, then all but the + last two characters, and so on. Once again the search for + "a" covers the entire string, from right to left, so we are + no better off. However, if the pattern is written as ^(?>.*)(?<=abcd) - then there can be no backtracking for the .* item; it can - match only the entire string. The subsequent lookbehind - assertion does a single test on the last four characters. If - it fails, the match fails immediately. For long strings, - this approach makes a significant difference to the process- - ing time. + there can be no backtracking for the .* item; it can match + only the entire string. 
The subsequent lookbehind assertion + does a single test on the last four characters. If it fails, + the match fails immediately. For long strings, this approach + makes a significant difference to the processing time. When a pattern contains an unlimited repeat inside a subpat- tern that can itself be repeated an unlimited number of @@ -1777,12 +1776,12 @@ CONDITIONAL SUBPATTERNS error occurs. There are two kinds of condition. If the text between the - parentheses consists of a sequence of digits, then the - condition is satisfied if the capturing subpattern of that - number has previously matched. Consider the following pat- - tern, which contains non-significant white space to make it - more readable (assume the PCRE_EXTENDED option) and to - divide it into three parts for ease of discussion: + parentheses consists of a sequence of digits, the condition + is satisfied if the capturing subpattern of that number has + previously matched. Consider the following pattern, which + contains non-significant white space to make it more read- + able (assume the PCRE_EXTENDED option) and to divide it into + three parts for ease of discussion: ( \( )? [^()]+ (?(1) \) ) @@ -1888,8 +1887,8 @@ RECURSIVE PATTERNS \( ( ( (?>[^()]+) | (?R) )* ) \) ^ ^ - ^ ^ then the string they capture - is "ab(cd)ef", the contents of the top level parentheses. If + ^ ^ the string they capture is + "ab(cd)ef", the contents of the top level parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by using pcre_malloc, freeing it diff --git a/srclib/pcre/doc/pcretest.txt b/srclib/pcre/doc/pcretest.txt index 831fdac987..0e6783af0c 100644 --- a/srclib/pcre/doc/pcretest.txt +++ b/srclib/pcre/doc/pcretest.txt @@ -63,10 +63,11 @@ to the matching process if the pattern begins with a lookbehind assertion (including \b or \B). 
If any call to pcre_exec() in a /g or /G sequence matches an empty string, the -next call is done with the PCRE_NOTEMPTY flag set so that it cannot match an -empty string again at the same point. If however, this second match fails, the -start offset is advanced by one, and the match is retried. This imitates the -way Perl handles such cases when using the /g modifier or the split() function. +next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED flags set in order +to search for another, non-empty, match at the same point. If this second match +fails, the start offset is advanced by one, and the normal match is retried. +This imitates the way Perl handles such cases when using the /g modifier or the +split() function. There are a number of other modifiers for controlling the way pcretest operates. diff --git a/srclib/pcre/internal.h b/srclib/pcre/internal.h index 91ff3018f1..b4b750f6c6 100644 --- a/srclib/pcre/internal.h +++ b/srclib/pcre/internal.h @@ -40,11 +40,27 @@ modules, but which are not relevant to the outside. */ #include "config.h" /* To cope with SunOS4 and other systems that lack memmove() but have bcopy(), -define a macro for memmove() if HAVE_MEMMOVE is false. */ +define a macro for memmove() if HAVE_MEMMOVE is false, provided that HAVE_BCOPY +is set. Otherwise, include an emulating function for those systems that have +neither (there are some non-Unix environments where this is the case). This assumes +that all calls to memmove are moving strings upwards in store, which is the +case in PCRE. */ #if ! 
HAVE_MEMMOVE #undef memmove /* some systems may have a macro */ +#if HAVE_BCOPY #define memmove(a, b, c) bcopy(b, a, c) +#else +void * +pcre_memmove(unsigned char *dest, const unsigned char *src, size_t n) +{ +dest += n; +src += n; +while (n-- > 0) *(--dest) = *(--src); /* copy downwards from the top */ +return dest; /* dest is back at its original value here */ +} +#define memmove(a, b, c) pcre_memmove(a, b, c) +#endif #endif /* Standard C headers plus the external interface definition */ diff --git a/srclib/pcre/pcre.c b/srclib/pcre/pcre.c index e45dee8d96..e3fdde9114 100644 --- a/srclib/pcre/pcre.c +++ b/srclib/pcre/pcre.c @@ -145,6 +145,21 @@ static BOOL compile_regex(int, int, int *, uschar **, const uschar **, const char **, BOOL, int, int *, int *, compile_data *); +/* Structure for building a chain of data that actually lives on the +stack, for holding the values of the subject pointer at the start of each +subpattern, so as to detect when an empty string has been matched by a +subpattern - to break infinite loops. */ + +typedef struct eptrblock { + struct eptrblock *prev; + const uschar *saved_eptr; +} eptrblock; + +/* Flag bits for the match() function */ + +#define match_condassert 0x01 /* Called to check a condition assertion */ +#define match_isgroup 0x02 /* Set if start of bracketed group */ + /************************************************* @@ -855,7 +870,9 @@ for (;; ptr++) if ((cd->ctypes[c] & ctype_space) != 0) continue; if (c == '#') { - while ((c = *(++ptr)) != 0 && c != '\n'); + /* The space before the ; is to avoid a warning on a silly compiler + on the Macintosh. */ + while ((c = *(++ptr)) != 0 && c != '\n') ; continue; } } @@ -1795,7 +1812,9 @@ for (;; ptr++) if ((cd->ctypes[c] & ctype_space) != 0) continue; if (c == '#') { - while ((c = *(++ptr)) != 0 && c != '\n'); + /* The space before the ; is to avoid a warning on a silly compiler + on the Macintosh. 
*/ + while ((c = *(++ptr)) != 0 && c != '\n') ; if (c == 0) break; continue; } @@ -2313,7 +2332,9 @@ while ((c = *(++ptr)) != 0) if ((compile_block.ctypes[c] & ctype_space) != 0) continue; if (c == '#') { - while ((c = *(++ptr)) != 0 && c != '\n'); + /* The space before the ; is to avoid a warning on a silly compiler + on the Macintosh. */ + while ((c = *(++ptr)) != 0 && c != '\n') ; continue; } } @@ -2523,8 +2544,8 @@ while ((c = *(++ptr)) != 0) else /* An assertion must follow */ { ptr++; /* Can treat like ':' as far as spacing is concerned */ - - if (ptr[2] != '?' || strchr("=!<", ptr[3]) == NULL) + if (ptr[2] != '?' || + (ptr[3] != '=' && ptr[3] != '!' && ptr[3] != '<') ) { ptr += 2; /* To get right offset in message */ *errorptr = ERR28; @@ -2737,7 +2758,9 @@ while ((c = *(++ptr)) != 0) if ((compile_block.ctypes[c] & ctype_space) != 0) continue; if (c == '#') { - while ((c = *(++ptr)) != 0 && c != '\n'); + /* The space before the ; is to avoid a warning on a silly compiler + on the Macintosh. 
*/ + while ((c = *(++ptr)) != 0 && c != '\n') ; continue; } } @@ -3195,18 +3218,36 @@ Arguments: offset_top current top pointer md pointer to "static" info for the match ims current /i, /m, and /s options - condassert TRUE if called to check a condition assertion - eptrb eptr at start of last bracket + eptrb pointer to chain of blocks containing eptr at start of + brackets - for testing for empty matches + flags can contain + match_condassert - this is an assertion condition + match_isgroup - this is the start of a bracketed group Returns: TRUE if matched */ static BOOL match(register const uschar *eptr, register const uschar *ecode, - int offset_top, match_data *md, unsigned long int ims, BOOL condassert, - const uschar *eptrb) + int offset_top, match_data *md, unsigned long int ims, eptrblock *eptrb, + int flags) { unsigned long int original_ims = ims; /* Save for resetting on ')' */ +eptrblock newptrb; + +/* At the start of a bracketed group, add the current subject pointer to the +stack of such pointers, to be re-instated at the end of the group when we hit +the closing ket. When match() is called in other circumstances, we don't add to +the stack. */ + +if ((flags & match_isgroup) != 0) + { + newptrb.prev = eptrb; + newptrb.saved_eptr = eptr; + eptrb = &newptrb; + } + +/* Now start processing the operations. 
*/ for (;;) { @@ -3252,7 +3293,8 @@ for (;;) do { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup)) + return TRUE; ecode += (ecode[1] << 8) + ecode[2]; } while (*ecode == OP_ALT); @@ -3278,7 +3320,8 @@ for (;;) DPRINTF(("start bracket 0\n")); do { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup)) + return TRUE; ecode += (ecode[1] << 8) + ecode[2]; } while (*ecode == OP_ALT); @@ -3297,7 +3340,7 @@ for (;;) return match(eptr, ecode + ((offset < offset_top && md->offset_vector[offset] >= 0)? 5 : 3 + (ecode[1] << 8) + ecode[2]), - offset_top, md, ims, FALSE, eptr); + offset_top, md, ims, eptrb, match_isgroup); } /* The condition is an assertion. Call match() to evaluate it - setting @@ -3305,13 +3348,14 @@ for (;;) else { - if (match(eptr, ecode+3, offset_top, md, ims, TRUE, NULL)) + if (match(eptr, ecode+3, offset_top, md, ims, NULL, + match_condassert | match_isgroup)) { ecode += 3 + (ecode[4] << 8) + ecode[5]; while (*ecode == OP_ALT) ecode += (ecode[1] << 8) + ecode[2]; } else ecode += (ecode[1] << 8) + ecode[2]; - return match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr); + return match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup); } /* Control never reaches here */ @@ -3348,7 +3392,7 @@ for (;;) case OP_ASSERTBACK: do { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, NULL)) break; + if (match(eptr, ecode+3, offset_top, md, ims, NULL, match_isgroup)) break; ecode += (ecode[1] << 8) + ecode[2]; } while (*ecode == OP_ALT); @@ -3356,7 +3400,7 @@ for (;;) /* If checking an assertion for a condition, return TRUE. */ - if (condassert) return TRUE; + if ((flags & match_condassert) != 0) return TRUE; /* Continue from after the assertion, updating the offsets high water mark, since extracts may have been taken during the assertion. 
*/ @@ -3372,12 +3416,14 @@ for (;;) case OP_ASSERTBACK_NOT: do { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, NULL)) return FALSE; + if (match(eptr, ecode+3, offset_top, md, ims, NULL, match_isgroup)) + return FALSE; ecode += (ecode[1] << 8) + ecode[2]; } while (*ecode == OP_ALT); - if (condassert) return TRUE; + if ((flags & match_condassert) != 0) return TRUE; + ecode += 3; continue; @@ -3423,7 +3469,8 @@ for (;;) for (i = 1; i <= c; i++) save[i] = md->offset_vector[md->offset_end - i]; - rc = match(eptr, md->start_pattern, offset_top, md, ims, FALSE, eptrb); + rc = match(eptr, md->start_pattern, offset_top, md, ims, eptrb, + match_isgroup); for (i = 1; i <= c; i++) md->offset_vector[md->offset_end - i] = save[i]; if (save != stacksave) (pcre_free)(save); @@ -3449,10 +3496,12 @@ for (;;) case OP_ONCE: { const uschar *prev = ecode; + const uschar *saved_eptr = eptr; do { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) break; + if (match(eptr, ecode+3, offset_top, md, ims, eptrb, match_isgroup)) + break; ecode += (ecode[1] << 8) + ecode[2]; } while (*ecode == OP_ALT); @@ -3475,7 +3524,7 @@ for (;;) 5.005. If there is an options reset, it will get obeyed in the normal course of events. 
*/ - if (*ecode == OP_KET || eptr == eptrb) + if (*ecode == OP_KET || eptr == saved_eptr) { ecode += 3; break; @@ -3494,13 +3543,14 @@ for (;;) if (*ecode == OP_KETRMIN) { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr) || - match(eptr, prev, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, ecode+3, offset_top, md, ims, eptrb, 0) || + match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup)) + return TRUE; } else /* OP_KETRMAX */ { - if (match(eptr, prev, offset_top, md, ims, FALSE, eptr) || - match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup) || + match(eptr, ecode+3, offset_top, md, ims, eptrb, 0)) return TRUE; } } return FALSE; @@ -3521,7 +3571,8 @@ for (;;) case OP_BRAZERO: { const uschar *next = ecode+1; - if (match(eptr, next, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, next, offset_top, md, ims, eptrb, match_isgroup)) + return TRUE; do next += (next[1] << 8) + next[2]; while (*next == OP_ALT); ecode = next + 3; } @@ -3531,7 +3582,8 @@ for (;;) { const uschar *next = ecode+1; do next += (next[1] << 8) + next[2]; while (*next == OP_ALT); - if (match(eptr, next+3, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, next+3, offset_top, md, ims, eptrb, match_isgroup)) + return TRUE; ecode++; } break; @@ -3546,6 +3598,9 @@ for (;;) case OP_KETRMAX: { const uschar *prev = ecode - (ecode[1] << 8) - ecode[2]; + const uschar *saved_eptr = eptrb->saved_eptr; + + eptrb = eptrb->prev; /* Back up the stack of bracket start pointers */ if (*prev == OP_ASSERT || *prev == OP_ASSERT_NOT || *prev == OP_ASSERTBACK || *prev == OP_ASSERTBACK_NOT || @@ -3565,7 +3620,10 @@ for (;;) int number = *prev - OP_BRA; int offset = number << 1; - DPRINTF(("end bracket %d\n", number)); +#ifdef DEBUG + printf("end bracket %d", number); + printf("\n"); +#endif if (number > 0) { @@ -3591,7 +3649,7 @@ for (;;) 5.005. 
If there is an options reset, it will get obeyed in the normal course of events. */ - if (*ecode == OP_KET || eptr == eptrb) + if (*ecode == OP_KET || eptr == saved_eptr) { ecode += 3; break; @@ -3602,13 +3660,14 @@ for (;;) if (*ecode == OP_KETRMIN) { - if (match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr) || - match(eptr, prev, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, ecode+3, offset_top, md, ims, eptrb, 0) || + match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup)) + return TRUE; } else /* OP_KETRMAX */ { - if (match(eptr, prev, offset_top, md, ims, FALSE, eptr) || - match(eptr, ecode+3, offset_top, md, ims, FALSE, eptr)) return TRUE; + if (match(eptr, prev, offset_top, md, ims, eptrb, match_isgroup) || + match(eptr, ecode+3, offset_top, md, ims, eptrb, 0)) return TRUE; } } return FALSE; @@ -3819,7 +3878,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || !match_ref(offset, eptr, length, md, ims)) return FALSE; @@ -3840,7 +3899,7 @@ for (;;) } while (eptr >= pp) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; eptr -= length; } @@ -3911,7 +3970,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || eptr >= md->end_subject) return FALSE; c = *eptr++; @@ -3935,7 +3994,7 @@ for (;;) } while (eptr >= pp) - if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; return FALSE; } @@ -4032,7 +4091,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || eptr >= md->end_subject || c != 
md->lcc[*eptr++]) @@ -4049,7 +4108,7 @@ for (;;) eptr++; } while (eptr >= pp) - if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; return FALSE; } @@ -4066,7 +4125,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || eptr >= md->end_subject || c != *eptr++) return FALSE; } @@ -4081,7 +4140,7 @@ for (;;) eptr++; } while (eptr >= pp) - if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; return FALSE; } @@ -4163,7 +4222,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || eptr >= md->end_subject || c == md->lcc[*eptr++]) @@ -4180,7 +4239,7 @@ for (;;) eptr++; } while (eptr >= pp) - if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; return FALSE; } @@ -4197,7 +4256,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || eptr >= md->end_subject || c == *eptr++) return FALSE; } @@ -4212,7 +4271,7 @@ for (;;) eptr++; } while (eptr >= pp) - if (match(eptr--, ecode, offset_top, md, ims, FALSE, eptrb)) + if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; return FALSE; } @@ -4312,7 +4371,7 @@ for (;;) { for (i = min;; i++) { - if (match(eptr, ecode, offset_top, md, ims, FALSE, eptrb)) return TRUE; + if (match(eptr, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; if (i >= max || eptr >= md->end_subject) return FALSE; c = *eptr++; @@ -4431,7 +4490,7 @@ for (;;) } while (eptr >= pp) - if (match(eptr--, ecode, offset_top, md, ims, 
FALSE, eptrb)) + if (match(eptr--, ecode, offset_top, md, ims, eptrb, 0)) return TRUE; return FALSE; } @@ -4717,7 +4776,7 @@ do if certain parts of the pattern were not used. */ match_block.start_match = start_match; - if (!match(start_match, re->code, 2, &match_block, ims, FALSE, start_match)) + if (!match(start_match, re->code, 2, &match_block, ims, NULL, match_isgroup)) continue; /* Copy the offset information from temporary store if necessary */ diff --git a/srclib/pcre/pcre.in b/srclib/pcre/pcre.in index 74b0cfc579..8a531671a9 100644 --- a/srclib/pcre/pcre.in +++ b/srclib/pcre/pcre.in @@ -7,6 +7,9 @@ #ifndef _PCRE_H #define _PCRE_H +/* The file pcre.h is built by "configure". Do not edit it; instead +make changes to pcre.in. */ + #define PCRE_MAJOR @PCRE_MAJOR@ #define PCRE_MINOR @PCRE_MINOR@ #define PCRE_DATE @PCRE_DATE@ @@ -26,7 +29,6 @@ /* Have to include stdlib.h in order to ensure that size_t is defined; it is needed here for malloc. */ -#include