From: Brian Pane Date: Wed, 20 Mar 2002 06:22:57 +0000 (+0000) Subject: PCRE 3.9 merge X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=cbe4da67d6c650a96e65c293df1c6b495850bb1f;p=apache PCRE 3.9 merge git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@94043 13f79535-47bb-0310-9956-ffa450edef68 --- diff --git a/srclib/pcre/ChangeLog b/srclib/pcre/ChangeLog index 5bedd53bc6..a93f347f45 100644 --- a/srclib/pcre/ChangeLog +++ b/srclib/pcre/ChangeLog @@ -1,6 +1,181 @@ ChangeLog for PCRE ------------------ +Version 3.0 02-Jan-02 +--------------------- + +1. A bit of extraneous text had somehow crept into the pcregrep documentation. + +2. If --disable-static was given, the building process failed when trying to +build pcretest and pcregrep. (For some reason it was using libtool to compile +them, which is not right, as they aren't part of the library.) + + +Version 3.8 18-Dec-01 +--------------------- + +1. The experimental UTF-8 code was completely screwed up. It was packing the +bytes in the wrong order. How dumb can you get? + + +Version 3.7 29-Oct-01 +--------------------- + +1. In updating pcretest to check change 1 of version 3.6, I screwed up. +This caused pcretest, when used on the test data, to segfault. Unfortunately, +this didn't happen under Solaris 8, where I normally test things. + +2. The Makefile had to be changed to make it work on BSD systems, where 'make' +doesn't seem to recognize that ./xxx and xxx are the same file. (This entry +isn't in ChangeLog distributed with 3.7 because I forgot when I hastily made +this fix an hour or so after the initial 3.7 release.) + + +Version 3.6 23-Oct-01 +--------------------- + +1. Crashed with /(sens|respons)e and \1ibility/ and "sense and sensibility" if +offsets passed as NULL with zero offset count. + +2. The config.guess and config.sub files had not been updated when I moved to +the latest autoconf. + + +Version 3.5 15-Aug-01 +--------------------- + +1. Added some missing #if !defined NOPOSIX conditionals in pcretest.c that +had been forgotten. + +2. By using declared but undefined structures, we can avoid using "void" +definitions in pcre.h while keeping the internal definitions of the structures +private. + +3. The distribution is now built using autoconf 2.50 and libtool 1.4. From a +user point of view, this means that both static and shared libraries are built +by default, but this can be individually controlled. More of the work of +handling this static/shared cases is now inside libtool instead of PCRE's make +file. + +4. The pcretest utility is now installed along with pcregrep because it is +useful for users (to test regexs) and by doing this, it automatically gets +relinked by libtool. The documentation has been turned into a man page, so +there are now .1, .txt, and .html versions in /doc. + +5. Upgrades to pcregrep: + (i) Added long-form option names like gnu grep. + (ii) Added --help to list all options with an explanatory phrase. + (iii) Added -r, --recursive to recurse into sub-directories. + (iv) Added -f, --file to read patterns from a file. + +6. pcre_exec() was referring to its "code" argument before testing that +argument for NULL (and giving an error if it was NULL). + +7. Upgraded Makefile.in to allow for compiling in a different directory from +the source directory. + +8. Tiny buglet in pcretest: when pcre_fullinfo() was called to retrieve the +options bits, the pointer it was passed was to an int instead of to an unsigned +long int. This mattered only on 64-bit systems. + +9. Fixed typo (3.4/1) in pcre.h again. Sigh. I had changed pcre.h (which is +generated) instead of pcre.in, which it its source. Also made the same change +in several of the .c files. + +10. A new release of gcc defines printf() as a macro, which broke pcretest +because it had an ifdef in the middle of a string argument for printf(). Fixed +by using separate calls to printf(). + +11. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure +script, to force use of CR or LF instead of \n in the source. On non-Unix +systems, the value can be set in config.h. + +12. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an +absolute limit. Changed the text of the error message to make this clear, and +likewise updated the man page. + +13. The limit of 99 on the number of capturing subpatterns has been removed. +The new limit is 65535, which I hope will not be a "real" limit. + + +Version 3.4 22-Aug-00 +--------------------- + +1. Fixed typo in pcre.h: unsigned const char * changed to const unsigned char *. + +2. Diagnose condition (?(0) as an error instead of crashing on matching. + + +Version 3.3 01-Aug-00 +--------------------- + +1. If an octal character was given, but the value was greater than \377, it +was not getting masked to the least significant bits, as documented. This could +lead to crashes in some systems. + +2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats +the hyphen as a literal. PCRE used to give an error; it now behaves like Perl. + +3. Added the functions pcre_free_substring() and pcre_free_substring_list(). +These just pass their arguments on to (pcre_free)(), but they are provided +because some uses of PCRE bind it to non-C systems that can call its functions, +but cannot call free() or pcre_free() directly. + +4. Add "make test" as a synonym for "make check". Corrected some comments in +the Makefile. + +5. Add $(DESTDIR)/ in front of all the paths in the "install" target in the +Makefile. + +6. Changed the name of pgrep to pcregrep, because Solaris has introduced a +command called pgrep for grepping around the active processes. + +7. Added the beginnings of support for UTF-8 character strings. + +8. Arranged for the Makefile to pass over the settings of CC, CFLAGS, and +RANLIB to ./ltconfig so that they are used by libtool. I think these are all +the relevant ones. (AR is not passed because ./ltconfig does its own figuring +out for the ar command.) + + +Version 3.2 12-May-00 +--------------------- + +This is purely a bug fixing release. + +1. If the pattern /((Z)+|A)*/ was matched agained ZABCDEFG it matched Z instead +of ZA. This was just one example of several cases that could provoke this bug, +which was introduced by change 9 of version 2.00. The code for breaking +infinite loops after an iteration that matches an empty string was't working +correctly. + +2. The pcretest program was not imitating Perl correctly for the pattern /a*/g +when matched against abbab (for example). After matching an empty string, it +wasn't forcing anchoring when setting PCRE_NOTEMPTY for the next attempt; this +caused it to match further down the string than it should. + +3. The code contained an inclusion of sys/types.h. It isn't clear why this +was there because it doesn't seem to be needed, and it causes trouble on some +systems, as it is not a Standard C header. It has been removed. + +4. Made 4 silly changes to the source to avoid stupid compiler warnings that +were reported on the Macintosh. The changes were from + + while ((c = *(++ptr)) != 0 && c != '\n'); +to + while ((c = *(++ptr)) != 0 && c != '\n') ; + +Totally extraordinary, but if that's what it takes... + +5. PCRE is being used in one environment where neither memmove() nor bcopy() is +available. Added HAVE_BCOPY and an autoconf test for it; if neither +HAVE_MEMMOVE nor HAVE_BCOPY is set, use a built-in emulation function which +assumes the way PCRE uses memmove() (always moving upwards). + +6. PCRE is being used in one environment where strchr() is not available. There +was only one use in pcre.c, and writing it out to avoid strchr() probably gives +faster code anyway. + Version 3.2 12-May-00 --------------------- diff --git a/srclib/pcre/internal.h b/srclib/pcre/internal.h index b4b750f6c6..0c8c1c9df6 100644 --- a/srclib/pcre/internal.h +++ b/srclib/pcre/internal.h @@ -9,7 +9,7 @@ the file Tech.Notes for some information on the internals. Written by: Philip Hazel - Copyright (c) 1997-2000 University of Cambridge + Copyright (c) 1997-2001 University of Cambridge ----------------------------------------------------------------------------- Permission is granted to anyone to use this software for any purpose on any @@ -105,7 +105,7 @@ time, run time or study time, respectively. */ #define PUBLIC_OPTIONS \ (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \ - PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY) + PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8) #define PUBLIC_EXEC_OPTIONS \ (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY) @@ -123,12 +123,36 @@ typedef int BOOL; #define FALSE 0 #define TRUE 1 +/* Escape items that are just an encoding of a particular data value. Note that +ESC_N is defined as yet another macro, which is set in config.h to either \n +(the default) or \r (which some people want). */ + +#ifndef ESC_E +#define ESC_E 27 +#endif + +#ifndef ESC_F +#define ESC_F '\f' +#endif + +#ifndef ESC_N +#define ESC_N NEWLINE +#endif + +#ifndef ESC_R +#define ESC_R '\r' +#endif + +#ifndef ESC_T +#define ESC_T '\t' +#endif + /* These are escaped items that aren't just an encoding of a particular data value such as \n. They must have non-zero values, as check_escape() returns their negation. Also, they must appear in the same order as in the opcode definitions below, up to ESC_z. The final one must be ESC_REF as subsequent values are used for \1, \2, \3, etc. There is a test in the code for an escape -greater than ESC_b and less than ESC_X to detect the types that may be +greater than ESC_b and less than ESC_Z to detect the types that may be repeated. If any new escapes are put in-between that don't consume a character, that code will have to change. */ @@ -224,19 +248,26 @@ enum { OP_ONCE, /* Once matched, don't back up into the subpattern */ OP_COND, /* Conditional group */ - OP_CREF, /* Used to hold an extraction string number */ + OP_CREF, /* Used to hold an extraction string number (cond ref) */ OP_BRAZERO, /* These two must remain together and in this */ OP_BRAMINZERO, /* order. */ + OP_BRANUMBER, /* Used for extracting brackets whose number is greater + than can fit into an opcode. */ + OP_BRA /* This and greater values are used for brackets that - extract substrings. */ + extract substrings up to a basic limit. After that, + use is made of OP_BRANUMBER. */ }; -/* The highest extraction number. This is limited by the number of opcodes -left after OP_BRA, i.e. 255 - OP_BRA. We actually set it somewhat lower. */ +/* The highest extraction number before we have to start using additional +bytes. (Originally PCRE didn't have support for extraction counts highter than +this number.) The value is limited by the number of opcodes left after OP_BRA, +i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional +opcodes. */ -#define EXTRACT_MAX 99 +#define EXTRACT_BASIC_MAX 150 /* The texts of compile-time error messages are defined as macros here so that they can be accessed by the POSIX wrapper and converted into error codes. Yes, @@ -255,13 +286,13 @@ just to accommodate the POSIX wrapper. */ #define ERR10 "operand of unlimited repeat could match the empty string" #define ERR11 "internal error: unexpected repeat" #define ERR12 "unrecognized character after (?" -#define ERR13 "too many capturing parenthesized sub-patterns" +#define ERR13 "unused error" #define ERR14 "missing )" #define ERR15 "back reference to non-existent subpattern" #define ERR16 "erroffset passed as NULL" #define ERR17 "unknown option bit(s) set" #define ERR18 "missing ) after comment" -#define ERR19 "too many sets of parentheses" +#define ERR19 "parentheses nested too deeply" #define ERR20 "regular expression too large" #define ERR21 "failed to get memory" #define ERR22 "unmatched parentheses" @@ -274,6 +305,10 @@ just to accommodate the POSIX wrapper. */ #define ERR29 "(?p must be followed by )" #define ERR30 "unknown POSIX class name" #define ERR31 "POSIX collating elements are not supported" +#define ERR32 "this version of PCRE is not compiled with PCRE_UTF8 support" +#define ERR33 "characters with values > 255 are not yet supported in classes" +#define ERR34 "character value in \\x{...} sequence is too large" +#define ERR35 "invalid condition (?(0)" /* All character handling must be done as unsigned characters. Otherwise there are problems with top-bit-set characters and functions such as isspace(). @@ -292,8 +327,8 @@ typedef struct real_pcre { size_t size; const unsigned char *tables; unsigned long int options; - uschar top_bracket; - uschar top_backref; + unsigned short int top_bracket; + unsigned short int top_backref; uschar first_char; uschar req_char; uschar code[1]; @@ -330,6 +365,7 @@ typedef struct match_data { BOOL offset_overflow; /* Set if too many extractions */ BOOL notbol; /* NOTBOL flag */ BOOL noteol; /* NOTEOL flag */ + BOOL utf8; /* UTF8 flag */ BOOL endonly; /* Dollar not before final \n */ BOOL notempty; /* Empty string match not wanted */ const uschar *start_pattern; /* For use when recursing */ diff --git a/srclib/pcre/pcreposix.c b/srclib/pcre/pcreposix.c index e48d93e4c6..9b2efe2098 100644 --- a/srclib/pcre/pcreposix.c +++ b/srclib/pcre/pcreposix.c @@ -12,7 +12,7 @@ functions. Written by: Philip Hazel - Copyright (c) 1997-2000 University of Cambridge + Copyright (c) 1997-2001 University of Cambridge ----------------------------------------------------------------------------- Permission is granted to anyone to use this software for any purpose on any @@ -62,13 +62,13 @@ static int eint[] = { REG_BADRPT, /* "operand of unlimited repeat could match the empty string" */ REG_ASSERT, /* "internal error: unexpected repeat" */ REG_BADPAT, /* "unrecognized character after (?" */ - REG_ESIZE, /* "too many capturing parenthesized sub-patterns" */ + REG_ASSERT, /* "unused error" */ REG_EPAREN, /* "missing )" */ REG_ESUBREG, /* "back reference to non-existent subpattern" */ REG_INVARG, /* "erroffset passed as NULL" */ REG_INVARG, /* "unknown option bit(s) set" */ REG_EPAREN, /* "missing ) after comment" */ - REG_ESIZE, /* "too many sets of parentheses" */ + REG_ESIZE, /* "parentheses nested too deeply" */ REG_ESIZE, /* "regular expression too large" */ REG_ESPACE, /* "failed to get memory" */ REG_EPAREN, /* "unmatched brackets" */ @@ -80,7 +80,11 @@ static int eint[] = { REG_BADPAT, /* "assertion expected after (?(" */ REG_BADPAT, /* "(?p must be followed by )" */ REG_ECTYPE, /* "unknown POSIX class name" */ - REG_BADPAT /* "POSIX collating elements are not supported" */ + REG_BADPAT, /* "POSIX collating elements are not supported" */ + REG_INVARG, /* "this version of PCRE is not compiled with PCRE_UTF8 support" */ + REG_BADPAT, /* "characters with values > 255 are not yet supported in classes" */ + REG_BADPAT, /* "character value in \x{...} sequence is too large" */ + REG_BADPAT /* "invalid condition (?(0)" */ }; /* Table of texts corresponding to POSIX error codes */