ChangeLog for PCRE
------------------
+Version 3.9 02-Jan-02
+---------------------
+
+1. A bit of extraneous text had somehow crept into the pcregrep documentation.
+
+2. If --disable-static was given, the building process failed when trying to
+build pcretest and pcregrep. (For some reason it was using libtool to compile
+them, which is not right, as they aren't part of the library.)
+
+
+Version 3.8 18-Dec-01
+---------------------
+
+1. The experimental UTF-8 code was completely screwed up. It was packing the
+bytes in the wrong order. How dumb can you get?
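For illustration only, here is a minimal sketch of packing a code point into UTF-8 bytes in the correct order, with the lead byte (carrying the most significant bits) first. The function name utf8_encode_sketch is hypothetical and this is standard UTF-8 encoding for values up to 0x10FFFF, not the code from the PCRE source.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE. Encodes cp (assumed to be no
   greater than 0x10FFFF) into buf, which must have room for 4 bytes.
   Returns the number of bytes written. The lead byte comes first and
   holds the most significant bits; continuation bytes follow in
   descending order of significance. */
size_t utf8_encode_sketch(unsigned long cp, unsigned char *buf)
{
  if (cp < 0x80) {
    buf[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    buf[0] = (unsigned char)(0xC0 | (cp >> 6));
    buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    buf[0] = (unsigned char)(0xE0 | (cp >> 12));
    buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  buf[0] = (unsigned char)(0xF0 | (cp >> 18));
  buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
  buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
  return 4;
}
```

With this sketch, U+20AC is written as the three bytes E2 82 AC; emitting the same bytes in the reverse order would not be valid UTF-8.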
+
+
+Version 3.7 29-Oct-01
+---------------------
+
+1. In updating pcretest to check change 1 of version 3.6, I screwed up.
+This caused pcretest, when used on the test data, to segfault. Unfortunately,
+this didn't happen under Solaris 8, where I normally test things.
+
+2. The Makefile had to be changed to make it work on BSD systems, where 'make'
+doesn't seem to recognize that ./xxx and xxx are the same file. (This entry
+isn't in ChangeLog distributed with 3.7 because I forgot when I hastily made
+this fix an hour or so after the initial 3.7 release.)
+
+
+Version 3.6 23-Oct-01
+---------------------
+
+1. Crashed with /(sens|respons)e and \1ibility/ and "sense and sensibility" if
+offsets passed as NULL with zero offset count.
+
+2. The config.guess and config.sub files had not been updated when I moved to
+the latest autoconf.
+
+
+Version 3.5 15-Aug-01
+---------------------
+
+1. Added some missing #if !defined NOPOSIX conditionals in pcretest.c that
+had been forgotten.
+
+2. By using declared but undefined structures, we can avoid using "void"
+definitions in pcre.h while keeping the internal definitions of the structures
+private.
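The technique in item 2 can be sketched as follows. All names here (widget, real_widget and the three functions) are hypothetical and chosen only for illustration; the point is that callers see a structure tag that is declared but never defined, so they can hold typed pointers without seeing the fields.

```c
#include <stdlib.h>

/* --- What a public header would contain (hypothetical names) --- */

/* The structure is declared but not defined, so user code can only
   hold pointers to it. Unlike "void *", the pointer is type-checked. */
typedef struct real_widget widget;

widget *widget_new(int value);
int widget_value(const widget *w);
void widget_free(widget *w);

/* --- What the private implementation file would contain --- */

/* The full definition is visible only to the library itself. */
struct real_widget {
  int value;
};

widget *widget_new(int value)
{
  widget *w = malloc(sizeof(*w));
  if (w != NULL) w->value = value;
  return w;
}

int widget_value(const widget *w)
{
  return w->value;
}

void widget_free(widget *w)
{
  free(w);
}
```

In a real library the two halves would live in separate files; they are shown together here only so that the sketch compiles as one unit.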
+
+3. The distribution is now built using autoconf 2.50 and libtool 1.4. From a
+user point of view, this means that both static and shared libraries are built
+by default, but this can be individually controlled. More of the work of
+handling the static/shared cases is now inside libtool instead of PCRE's make
+file.
+
+4. The pcretest utility is now installed along with pcregrep because it is
+useful for users (to test regexs) and by doing this, it automatically gets
+relinked by libtool. The documentation has been turned into a man page, so
+there are now .1, .txt, and .html versions in /doc.
+
+5. Upgrades to pcregrep:
+ (i) Added long-form option names like GNU grep.
+ (ii) Added --help to list all options with an explanatory phrase.
+ (iii) Added -r, --recursive to recurse into sub-directories.
+ (iv) Added -f, --file to read patterns from a file.
+
+6. pcre_exec() was referring to its "code" argument before testing that
+argument for NULL (and giving an error if it was NULL).
+
+7. Upgraded Makefile.in to allow for compiling in a different directory from
+the source directory.
+
+8. Tiny buglet in pcretest: when pcre_fullinfo() was called to retrieve the
+options bits, the pointer it was passed was to an int instead of to an unsigned
+long int. This mattered only on 64-bit systems.
+
+9. Fixed typo (3.4/1) in pcre.h again. Sigh. I had changed pcre.h (which is
+generated) instead of pcre.in, which is its source. Also made the same change
+in several of the .c files.
+
+10. A new release of gcc defines printf() as a macro, which broke pcretest
+because it had an ifdef in the middle of a string argument for printf(). Fixed
+by using separate calls to printf().
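As a rough sketch of the fix described in item 10: when printf() is a macro, a preprocessor directive inside its argument list has undefined behaviour in Standard C, so the conditional part is moved between separate calls. The function name print_usage_sketch, the macro SKETCH_HAVE_POSIX and the message text are invented for this example and are not taken from pcretest.

```c
#include <stdio.h>

/* Hypothetical example, not pcretest code. Each printf() call has a
   complete argument list, and the #ifdef sits between the calls
   instead of inside one of them. Returns the number of characters
   printed. */
int print_usage_sketch(void)
{
  int n = 0;
  n += printf("Usage: tool [-d]");
#ifdef SKETCH_HAVE_POSIX
  n += printf(" [-P]");
#endif
  n += printf(" <file>\n");
  return n;
}
```

The broken form would have placed the #ifdef and #endif lines between adjacent string literals inside a single printf() call.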
+
+11. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
+script, to force use of CR or LF instead of \n in the source. On non-Unix
+systems, the value can be set in config.h.
+
+12. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
+absolute limit. Changed the text of the error message to make this clear, and
+likewise updated the man page.
+
+13. The limit of 99 on the number of capturing subpatterns has been removed.
+The new limit is 65535, which I hope will not be a "real" limit.
+
+
+Version 3.4 22-Aug-00
+---------------------
+
+1. Fixed typo in pcre.h: unsigned const char * changed to const unsigned char *.
+
+2. Diagnose condition (?(0) as an error instead of crashing on matching.
+
+
+Version 3.3 01-Aug-00
+---------------------
+
+1. If an octal character was given, but the value was greater than \377, it
+was not getting masked to the least significant bits, as documented. This could
+lead to crashes in some systems.
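The masking in item 1 can be illustrated with a small sketch. The function name parse_octal_escape_sketch is hypothetical, and the digit handling is simplified (up to three octal digits, no back reference disambiguation); only the final mask to the least significant 8 bits reflects what the entry describes.

```c
/* Hypothetical helper, not PCRE code. Reads up to three octal digits
   from s (the text following a backslash) and returns the value
   masked to its least significant 8 bits, so that anything above
   \377 wraps instead of overflowing a byte-sized table index. */
int parse_octal_escape_sketch(const char *s)
{
  unsigned int c = 0;
  int i;
  for (i = 0; i < 3 && s[i] >= '0' && s[i] <= '7'; i++)
    c = c * 8 + (unsigned int)(s[i] - '0');
  return (int)(c & 255);
}
```

Under these assumptions "101" gives 65, "377" gives 255, and "400" (decimal 256) is masked down to 0.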
+
+2. Perl 5.6 (if not earlier versions) accepts classes like [a-\d] and treats
+the hyphen as a literal. PCRE used to give an error; it now behaves like Perl.
+
+3. Added the functions pcre_free_substring() and pcre_free_substring_list().
+These just pass their arguments on to (pcre_free)(), but they are provided
+because some uses of PCRE bind it to non-C systems that can call its functions,
+but cannot call free() or pcre_free() directly.
+
+4. Add "make test" as a synonym for "make check". Corrected some comments in
+the Makefile.
+
+5. Add $(DESTDIR)/ in front of all the paths in the "install" target in the
+Makefile.
+
+6. Changed the name of pgrep to pcregrep, because Solaris has introduced a
+command called pgrep for grepping around the active processes.
+
+7. Added the beginnings of support for UTF-8 character strings.
+
+8. Arranged for the Makefile to pass over the settings of CC, CFLAGS, and
+RANLIB to ./ltconfig so that they are used by libtool. I think these are all
+the relevant ones. (AR is not passed because ./ltconfig does its own figuring
+out for the ar command.)
+
+
+Version 3.2 12-May-00
+---------------------
+
+This is purely a bug fixing release.
+
+1. If the pattern /((Z)+|A)*/ was matched against ZABCDEFG it matched Z instead
+of ZA. This was just one example of several cases that could provoke this bug,
+which was introduced by change 9 of version 2.00. The code for breaking
+infinite loops after an iteration that matches an empty string wasn't working
+correctly.
+
+2. The pcretest program was not imitating Perl correctly for the pattern /a*/g
+when matched against abbab (for example). After matching an empty string, it
+wasn't forcing anchoring when setting PCRE_NOTEMPTY for the next attempt; this
+caused it to match further down the string than it should.
+
+3. The code contained an inclusion of sys/types.h. It isn't clear why this
+was there because it doesn't seem to be needed, and it causes trouble on some
+systems, as it is not a Standard C header. It has been removed.
+
+4. Made 4 silly changes to the source to avoid stupid compiler warnings that
+were reported on the Macintosh. The changes were from
+
+ while ((c = *(++ptr)) != 0 && c != '\n');
+to
+ while ((c = *(++ptr)) != 0 && c != '\n') ;
+
+Totally extraordinary, but if that's what it takes...
+
+5. PCRE is being used in one environment where neither memmove() nor bcopy() is
+available. Added HAVE_BCOPY and an autoconf test for it; if neither
+HAVE_MEMMOVE nor HAVE_BCOPY is set, use a built-in emulation function which
+assumes the way PCRE uses memmove() (always moving upwards).
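A minimal sketch of what such an emulation might look like, assuming that "moving upwards" means the destination is at a higher address than the source. The name move_up_sketch is hypothetical and this is not the function from the PCRE source. Copying starts at the last byte so that overlapping source bytes are read before they are overwritten.

```c
#include <stddef.h>

/* Hypothetical replacement for memmove(), valid only when dest is at
   or above src (data moves upwards in memory). Bytes are copied from
   the end backwards, which is safe for that overlap direction but
   not for the opposite one. */
void *move_up_sketch(void *dest, const void *src, size_t n)
{
  unsigned char *d = (unsigned char *)dest + n;
  const unsigned char *s = (const unsigned char *)src + n;
  while (n-- > 0) *(--d) = *(--s);
  return dest;
}
```

For example, moving the six bytes of "abcdef" up by two positions within the same zero-filled buffer leaves "ababcdef". A general memmove() would also have to handle a destination below the source, which this sketch deliberately does not.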
+
+6. PCRE is being used in one environment where strchr() is not available. There
+was only one use in pcre.c, and writing it out to avoid strchr() probably gives
+faster code anyway.
+
Version 3.2 12-May-00
---------------------
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
#define PUBLIC_OPTIONS \
(PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \
- PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY)
+ PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8)
#define PUBLIC_EXEC_OPTIONS \
(PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY)
#define FALSE 0
#define TRUE 1
+/* Escape items that are just an encoding of a particular data value. Note that
+ESC_N is defined as yet another macro, which is set in config.h to either \n
+(the default) or \r (which some people want). */
+
+#ifndef ESC_E
+#define ESC_E 27
+#endif
+
+#ifndef ESC_F
+#define ESC_F '\f'
+#endif
+
+#ifndef ESC_N
+#define ESC_N NEWLINE
+#endif
+
+#ifndef ESC_R
+#define ESC_R '\r'
+#endif
+
+#ifndef ESC_T
+#define ESC_T '\t'
+#endif
+
/* These are escaped items that aren't just an encoding of a particular data
value such as \n. They must have non-zero values, as check_escape() returns
their negation. Also, they must appear in the same order as in the opcode
definitions below, up to ESC_z. The final one must be ESC_REF as subsequent
values are used for \1, \2, \3, etc. There is a test in the code for an escape
-greater than ESC_b and less than ESC_X to detect the types that may be
+greater than ESC_b and less than ESC_Z to detect the types that may be
repeated. If any new escapes are put in-between that don't consume a character,
that code will have to change. */
OP_ONCE, /* Once matched, don't back up into the subpattern */
OP_COND, /* Conditional group */
- OP_CREF, /* Used to hold an extraction string number */
+ OP_CREF, /* Used to hold an extraction string number (cond ref) */
OP_BRAZERO, /* These two must remain together and in this */
OP_BRAMINZERO, /* order. */
+ OP_BRANUMBER, /* Used for extracting brackets whose number is greater
+ than can fit into an opcode. */
+
OP_BRA /* This and greater values are used for brackets that
- extract substrings. */
+ extract substrings up to a basic limit. After that,
+ use is made of OP_BRANUMBER. */
};
-/* The highest extraction number. This is limited by the number of opcodes
-left after OP_BRA, i.e. 255 - OP_BRA. We actually set it somewhat lower. */
+/* The highest extraction number before we have to start using additional
+bytes. (Originally PCRE didn't have support for extraction counts higher than
+this number.) The value is limited by the number of opcodes left after OP_BRA,
+i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional
+opcodes. */
-#define EXTRACT_MAX 99
+#define EXTRACT_BASIC_MAX 150
/* The texts of compile-time error messages are defined as macros here so that
they can be accessed by the POSIX wrapper and converted into error codes. Yes,
#define ERR10 "operand of unlimited repeat could match the empty string"
#define ERR11 "internal error: unexpected repeat"
#define ERR12 "unrecognized character after (?"
-#define ERR13 "too many capturing parenthesized sub-patterns"
+#define ERR13 "unused error"
#define ERR14 "missing )"
#define ERR15 "back reference to non-existent subpattern"
#define ERR16 "erroffset passed as NULL"
#define ERR17 "unknown option bit(s) set"
#define ERR18 "missing ) after comment"
-#define ERR19 "too many sets of parentheses"
+#define ERR19 "parentheses nested too deeply"
#define ERR20 "regular expression too large"
#define ERR21 "failed to get memory"
#define ERR22 "unmatched parentheses"
#define ERR29 "(?p must be followed by )"
#define ERR30 "unknown POSIX class name"
#define ERR31 "POSIX collating elements are not supported"
+#define ERR32 "this version of PCRE is not compiled with PCRE_UTF8 support"
+#define ERR33 "characters with values > 255 are not yet supported in classes"
+#define ERR34 "character value in \\x{...} sequence is too large"
+#define ERR35 "invalid condition (?(0)"
/* All character handling must be done as unsigned characters. Otherwise there
are problems with top-bit-set characters and functions such as isspace().
size_t size;
const unsigned char *tables;
unsigned long int options;
- uschar top_bracket;
- uschar top_backref;
+ unsigned short int top_bracket;
+ unsigned short int top_backref;
uschar first_char;
uschar req_char;
uschar code[1];
BOOL offset_overflow; /* Set if too many extractions */
BOOL notbol; /* NOTBOL flag */
BOOL noteol; /* NOTEOL flag */
+ BOOL utf8; /* UTF8 flag */
BOOL endonly; /* Dollar not before final \n */
BOOL notempty; /* Empty string match not wanted */
const uschar *start_pattern; /* For use when recursing */
Written by: Philip Hazel <ph10@cam.ac.uk>
- Copyright (c) 1997-2000 University of Cambridge
+ Copyright (c) 1997-2001 University of Cambridge
-----------------------------------------------------------------------------
Permission is granted to anyone to use this software for any purpose on any
REG_BADRPT, /* "operand of unlimited repeat could match the empty string" */
REG_ASSERT, /* "internal error: unexpected repeat" */
REG_BADPAT, /* "unrecognized character after (?" */
- REG_ESIZE, /* "too many capturing parenthesized sub-patterns" */
+ REG_ASSERT, /* "unused error" */
REG_EPAREN, /* "missing )" */
REG_ESUBREG, /* "back reference to non-existent subpattern" */
REG_INVARG, /* "erroffset passed as NULL" */
REG_INVARG, /* "unknown option bit(s) set" */
REG_EPAREN, /* "missing ) after comment" */
- REG_ESIZE, /* "too many sets of parentheses" */
+ REG_ESIZE, /* "parentheses nested too deeply" */
REG_ESIZE, /* "regular expression too large" */
REG_ESPACE, /* "failed to get memory" */
REG_EPAREN, /* "unmatched brackets" */
REG_BADPAT, /* "assertion expected after (?(" */
REG_BADPAT, /* "(?p must be followed by )" */
REG_ECTYPE, /* "unknown POSIX class name" */
- REG_BADPAT /* "POSIX collating elements are not supported" */
+ REG_BADPAT, /* "POSIX collating elements are not supported" */
+ REG_INVARG, /* "this version of PCRE is not compiled with PCRE_UTF8 support" */
+ REG_BADPAT, /* "characters with values > 255 are not yet supported in classes" */
+ REG_BADPAT, /* "character value in \x{...} sequence is too large" */
+ REG_BADPAT /* "invalid condition (?(0)" */
};
/* Table of texts corresponding to POSIX error codes */