[![Build Status](https://travis-ci.org/kkos/oniguruma.svg?branch=master)](https://travis-ci.org/kkos/oniguruma)

Oniguruma
=========

https://github.com/kkos/oniguruma

Oniguruma is a modern and flexible regular expression library. It
encompasses features from different regular expression implementations
that traditionally exist in different languages.

Character encoding can be specified per regular expression object.

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte


New features in version 6.9.0
-----------------------------

* NEW: update Unicode version 11.0.0
* NEW: add Emoji properties


New features in version 6.8.2
-----------------------------

* Fix: #80 UChar in header causes issue
* NEW API: onig_set_callout_user_data_of_match_param()  (* omitted from 6.8.0)
* add doc/CALLOUTS.API and doc/CALLOUTS.API.ja


New features in version 6.8.1
-----------------------------

* Update shared library version to 5.0.0 because of API-incompatible changes since 6.7.1


New features in version 6.8.0
-----------------------------

* Retry-limit-in-match function enabled by default
* NEW: configure option --enable-posix-api=no  (* enabled by default)
* NEW API: onig_search_with_param(), onig_match_with_param()
* NEW: Callouts of contents  (?{...contents...}) (?{...}\[tag]\[X<>]) (?{{...}})
* NEW: Callouts of name      (*name) (*name\[tag]{args...})
* NEW: Builtin callouts  (*FAIL) (*MISMATCH) (*ERROR{n}) (*COUNT) (*MAX{n}) etc.
* Examples of callout programs: [callout.c](sample/callout.c), [count.c](sample/count.c), [echo.c](sample/echo.c)


New features in version 6.7.1
-----------------------------

* NEW: Mechanism of retry-limit-in-match (* disabled by default)


New features in version 6.7.0
-----------------------------

* NEW: hexadecimal codepoint \uHHHH
* NEW: add ONIG_SYNTAX_ONIGURUMA (== ONIG_SYNTAX_DEFAULT)
* Disabled \N and \O on ONIG_SYNTAX_RUBY
* Reduced size of object file


New features in version 6.6.0
-----------------------------

* NEW: ASCII only mode options for character type/property (?WDSP)
* NEW: Extended Grapheme Cluster boundary \y, \Y (*original)
* NEW: Extended Grapheme Cluster \X
* Range-clear (Absent-clear) operator restores previous range in retractions.


New features in version 6.5.0
-----------------------------

* NEW: \K (keep)
* NEW: \R (general newline) \N (no newline)
* NEW: \O (true anychar)
* NEW: if-then-else   (?(...)...\|...)
* NEW: Backreference validity checker (?(xxx)) (*original)
* NEW: Absent repeater (?~absent)  \[is equal to (?\~\|absent|\O*)]
* NEW: Absent expression   (?~|absent|expr)  (*original)
* NEW: Absent stopper (?~|absent)     (*original)


New features in version 6.4.0
-----------------------------

* Fix fatal problem of endless repeat on Windows
* NEW: call zero (call the whole regexp) \g<0>
* NEW: relative backref/call by positive number \k<+n>, \g<+n>


New features in version 6.3.0
-----------------------------

* NEW: octal codepoint \o{.....}
* Fixed CVE-2017-9224
* Fixed CVE-2017-9225
* Fixed CVE-2017-9226
* Fixed CVE-2017-9227
* Fixed CVE-2017-9228
* Fixed CVE-2017-9229


New features in version 6.1.2
-----------------------------

* Allow word boundary, word begin and word end in look-behind.
* NEW option: ONIG_OPTION_CHECK_VALIDITY_OF_STRING

New features in version 6.1
---------------------------

* improved doc/RE
* NEW API: onig_scan()

New features in version 6.0
---------------------------

* Update Unicode 8.0 Property/Case-folding
* NEW API: onig_unicode_define_user_property()


License
-------

  BSD license.


Install
-------

### Case 1: Unix and Cygwin platform

   1. autoreconf -vfi   (* only if the configure script is not found)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



### Case 2: Windows 64/32bit platform (Visual Studio)

   Execute make_win.bat

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)

      1. cd src
      2. copy ..\windows\testc.c .
      3. nmake -f Makefile.windows ctest

   (Tested with Visual Studio Community 2015)



Regular Expressions
-------------------

  See [doc/RE](doc/RE) or [doc/RE.ja](doc/RE.ja) for Japanese.


Usage
-----

  Include oniguruma.h in your program. (Oniguruma API)
  See doc/API for Oniguruma API.

  To disable the definition of the UChar type (== unsigned char)
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION before
  including oniguruma.h.

  To disable the definition of the regex_t type in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION before including oniguruma.h.

  Example compile/link command line on Unix or Cygwin
  (assuming prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  To use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler command line.



Sample Programs
---------------

|File                  |Description                               |
|:---------------------|:-----------------------------------------|
|sample/simple.c       |minimal example (Oniguruma API)           |
|sample/names.c        |example of the named group callback.      |
|sample/encode.c       |example of some encodings.                |
|sample/listcap.c      |example of the capture history.           |
|sample/posix.c        |POSIX API sample.                         |
|sample/scan.c         |example of using onig_scan().             |
|sample/sql.c          |example of the variable meta characters.  |
|sample/user_property.c|example of user defined Unicode property. |
|sample/callout.c      |example of callouts.                      |


Test Programs
-------------

|File               |Description                            |
|:------------------|:--------------------------------------|
|sample/syntax.c    |Perl, Java and ASIS syntax test.       |
|sample/crnl.c      |--enable-crnl-as-line-terminator test  |



Source Files
------------

|File               |Description                                             |
|:------------------|:-------------------------------------------------------|
|oniguruma.h        |Oniguruma API header file (public)                      |
|onig-config.in     |configuration check program template                    |
|regenc.h           |character encodings framework header file               |
|regint.h           |internal definitions                                    |
|regparse.h         |internal definitions for regparse.c and regcomp.c       |
|regcomp.c          |compiling and optimization functions                    |
|regenc.c           |character encodings framework                           |
|regerror.c         |error message function                                  |
|regext.c           |extended API functions (deluxe version API)             |
|regexec.c          |search and match functions                              |
|regparse.c         |parsing functions.                                      |
|regsyntax.c        |pattern syntax functions and built-in syntax definitions|
|regtrav.c          |capture history tree data traverse functions            |
|regversion.c       |version info function                                   |
|st.h               |hash table functions header file                        |
|st.c               |hash table functions                                    |
|oniggnu.h          |GNU regex API header file (public)                      |
|reggnu.c           |GNU regex API functions                                 |
|onigposix.h        |POSIX API header file (public)                          |
|regposerr.c        |POSIX error message function                            |
|regposix.c         |POSIX API functions                                     |
|mktable.c          |character type table generator                          |
|ascii.c            |ASCII encoding                                          |
|euc_jp.c           |EUC-JP encoding                                         |
|euc_tw.c           |EUC-TW encoding                                         |
|euc_kr.c           |EUC-KR, EUC-CN encoding                                 |
|sjis.c             |Shift_JIS encoding                                      |
|big5.c             |Big5      encoding                                      |
|gb18030.c          |GB18030   encoding                                      |
|koi8.c             |KOI8      encoding                                      |
|koi8_r.c           |KOI8-R    encoding                                      |
|cp1251.c           |CP1251    encoding                                      |
|iso8859_1.c        |ISO-8859-1 (Latin-1)                                    |
|iso8859_2.c        |ISO-8859-2 (Latin-2)                                    |
|iso8859_3.c        |ISO-8859-3 (Latin-3)                                    |
|iso8859_4.c        |ISO-8859-4 (Latin-4)                                    |
|iso8859_5.c        |ISO-8859-5 (Cyrillic)                                   |
|iso8859_6.c        |ISO-8859-6 (Arabic)                                     |
|iso8859_7.c        |ISO-8859-7 (Greek)                                      |
|iso8859_8.c        |ISO-8859-8 (Hebrew)                                     |
|iso8859_9.c        |ISO-8859-9 (Latin-5 or Turkish)                         |
|iso8859_10.c       |ISO-8859-10 (Latin-6 or Nordic)                         |
|iso8859_11.c       |ISO-8859-11 (Thai)                                      |
|iso8859_13.c       |ISO-8859-13 (Latin-7 or Baltic Rim)                     |
|iso8859_14.c       |ISO-8859-14 (Latin-8 or Celtic)                         |
|iso8859_15.c       |ISO-8859-15 (Latin-9 or West European with Euro)        |
|iso8859_16.c       |ISO-8859-16 (Latin-10)                                  |
|utf8.c             |UTF-8    encoding                                       |
|utf16_be.c         |UTF-16BE encoding                                       |
|utf16_le.c         |UTF-16LE encoding                                       |
|utf32_be.c         |UTF-32BE encoding                                       |
|utf32_le.c         |UTF-32LE encoding                                       |
|unicode.c          |common code for Unicode encodings                       |
|unicode_fold_data.c|Unicode folding data                                    |
|windows/testc.c    |Test program for Windows (VC++)                         |