Oniguruma
=========

https://github.com/kkos/oniguruma

FIXED Security Issues:
--------------------------
  **CVE-2017-9224, CVE-2017-9225, CVE-2017-9226**
  **CVE-2017-9227, CVE-2017-9228, CVE-2017-9229**

Oniguruma is a modern and flexible regular expressions library. It
encompasses features from different regular expression implementations
that traditionally exist in different languages. It comes close to
being a complete superset of all regular expression features found
in other regular expression implementations.

Its features include:
* Character encoding can be specified per regular expression object.
* Several regular expression types are supported:
  * Oniguruma (native)
  * POSIX
  * Grep
  * GNU Regex
  * Perl
  * Java
  * Ruby
  * Emacs

Supported character encodings:

  ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE,
  EUC-JP, EUC-TW, EUC-KR, EUC-CN,
  Shift_JIS, Big5, GB18030, KOI8-R, CP1251,
  ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5,
  ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10,
  ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

* GB18030: contributed by KUBO Takehiro
* CP1251:  contributed by Byte


New feature of version 6.8.0
--------------------------

* Enabled retry-limit-in-match function
* NEW API: onig_search_with_param(), onig_match_with_param()
* NEW: Callouts of code (?{....}) (?{{....}})
* NEW: Callouts of name (*NAME) (*NAME:....)
* NEW: Builtin callout functions  (*FAIL) (*SUCCESS) (*ABORT) (*ERROR:n)


New feature of version 6.7.1
--------------------------

* NEW: Mechanism of retry-limit-in-match (* disabled by default)


New feature of version 6.7.0
--------------------------

* NEW: hexadecimal codepoint \uHHHH
* NEW: add ONIG_SYNTAX_ONIGURUMA (== ONIG_SYNTAX_DEFAULT)
* Disabled \N and \O on ONIG_SYNTAX_RUBY
* Reduced size of object file


New feature of version 6.6.0
--------------------------

* NEW: ASCII only mode options for character type/property (?WDSP)
* NEW: Extended Grapheme Cluster boundary \y, \Y (*original)
* NEW: Extended Grapheme Cluster \X
* Range-clear (Absent-clear) operator restores previous range in retractions.


New feature of version 6.5.0
--------------------------

* NEW: \K (keep)
* NEW: \R (general newline) \N (no newline)
* NEW: \O (true anychar)
* NEW: if-then-else syntax   (?(...)...\|...)
* NEW: Backreference validity checker (?(xxx)) (*original)
* NEW: Absent repeater (?~absent)  [is equal to (?~\|absent|\O*)]
* NEW: Absent expression   (?~|absent|expr)  (*original)
* NEW: Absent stopper (?~|absent)     (*original)


New feature of version 6.4.0
--------------------------

* Fix fatal problem of endless repeat on Windows
* NEW: call zero (call the total regexp) \g<0>
* NEW: relative backref/call by positive number \k<+n>, \g<+n>


New feature of version 6.3.0
--------------------------

* NEW: octal codepoint \o{.....}


New feature of version 6.1.2
--------------------------

* Allow word boundary, word begin and word end in look-behind.
* NEW option: ONIG_OPTION_CHECK_VALIDITY_OF_STRING

New feature of version 6.1
--------------------------

* Improved doc/RE
* NEW API: onig_scan()

New feature of version 6.0
--------------------------

* Update Unicode 8.0 Property/Case-folding
* NEW API: onig_unicode_define_user_property()


License
-------

  BSD license.


Install
-------

### Case 1: Unix and Cygwin platform

   1. autoreconf -vfi   (only needed if the configure script is not found)

   2. ./configure
   3. make
   4. make install

   * uninstall

     make uninstall

   * configuration check

     onig-config --cflags
     onig-config --libs
     onig-config --prefix
     onig-config --exec-prefix



### Case 2: Windows 64/32bit platform (Visual Studio)

   Execute make_win64 or make_win32.

      onig_s.lib:  static link library
      onig.dll:    dynamic link library

   * test (ASCII/Shift_JIS)

      1. cd src
      2. copy ..\windows\testc.c .
      3. nmake -f Makefile.windows ctest

   (Checked with Visual Studio Community 2015.)



Regular Expressions
-------------------

  See [doc/RE](doc/RE) or [doc/RE.ja](doc/RE.ja) for Japanese.


Usage
-----

  Include oniguruma.h in your program to use the Oniguruma API.
  See doc/API for the API reference.

  If you want to disable the UChar type (== unsigned char) definition
  in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION and then
  include oniguruma.h.

  If you want to disable the regex_t type definition in oniguruma.h,
  define ONIG_ESCAPE_REGEX_T_COLLISION and then include oniguruma.h.

  Example compile/link command line on Unix or Cygwin
  (assuming prefix == /usr/local):

    cc sample.c -L/usr/local/lib -lonig


  If you want to use the static link library (onig_s.lib) on Win32,
  add the option -DONIG_EXTERN=extern to the C compiler.



Sample Programs
---------------

|File                  |Description                               |
|:---------------------|:-----------------------------------------|
|sample/simple.c       |minimal example (Oniguruma API)           |
|sample/names.c        |example of the named group callback.      |
|sample/encode.c       |example of some encodings.                |
|sample/listcap.c      |example of the capture history.           |
|sample/posix.c        |POSIX API sample.                         |
|sample/scan.c         |example of using onig_scan().             |
|sample/sql.c          |example of the variable meta characters.  |
|sample/user_property.c|example of user defined Unicode property. |


213
214 Test Programs
215
216 |File               |Description                            |
217 |:------------------|:--------------------------------------|
218 |sample/syntax.c    |Perl, Java and ASIS syntax test.       |
219 |sample/crnl.c      |--enable-crnl-as-line-terminator test  |
220
221
222
Source Files
------------

|File               |Description                                             |
|:------------------|:-------------------------------------------------------|
|oniguruma.h        |Oniguruma API header file (public)                      |
|onig-config.in     |configuration check program template                    |
|regenc.h           |character encodings framework header file               |
|regint.h           |internal definitions                                    |
|regparse.h         |internal definitions for regparse.c and regcomp.c       |
|regcomp.c          |compiling and optimization functions                    |
|regenc.c           |character encodings framework                           |
|regerror.c         |error message function                                  |
|regext.c           |extended API functions (deluxe version API)             |
|regexec.c          |search and match functions                              |
|regparse.c         |parsing functions.                                      |
|regsyntax.c        |pattern syntax functions and built-in syntax definitions|
|regtrav.c          |capture history tree data traverse functions            |
|regversion.c       |version info function                                   |
|st.h               |hash table functions header file                        |
|st.c               |hash table functions                                    |
|oniggnu.h          |GNU regex API header file (public)                      |
|reggnu.c           |GNU regex API functions                                 |
|onigposix.h        |POSIX API header file (public)                          |
|regposerr.c        |POSIX error message function                            |
|regposix.c         |POSIX API functions                                     |
|mktable.c          |character type table generator                          |
|ascii.c            |ASCII encoding                                          |
|euc_jp.c           |EUC-JP encoding                                         |
|euc_tw.c           |EUC-TW encoding                                         |
|euc_kr.c           |EUC-KR, EUC-CN encoding                                 |
|sjis.c             |Shift_JIS encoding                                      |
|big5.c             |Big5      encoding                                      |
|gb18030.c          |GB18030   encoding                                      |
|koi8.c             |KOI8      encoding                                      |
|koi8_r.c           |KOI8-R    encoding                                      |
|cp1251.c           |CP1251    encoding                                      |
|iso8859_1.c        |ISO-8859-1 (Latin-1)                                    |
|iso8859_2.c        |ISO-8859-2 (Latin-2)                                    |
|iso8859_3.c        |ISO-8859-3 (Latin-3)                                    |
|iso8859_4.c        |ISO-8859-4 (Latin-4)                                    |
|iso8859_5.c        |ISO-8859-5 (Cyrillic)                                   |
|iso8859_6.c        |ISO-8859-6 (Arabic)                                     |
|iso8859_7.c        |ISO-8859-7 (Greek)                                      |
|iso8859_8.c        |ISO-8859-8 (Hebrew)                                     |
|iso8859_9.c        |ISO-8859-9 (Latin-5 or Turkish)                         |
|iso8859_10.c       |ISO-8859-10 (Latin-6 or Nordic)                         |
|iso8859_11.c       |ISO-8859-11 (Thai)                                      |
|iso8859_13.c       |ISO-8859-13 (Latin-7 or Baltic Rim)                     |
|iso8859_14.c       |ISO-8859-14 (Latin-8 or Celtic)                         |
|iso8859_15.c       |ISO-8859-15 (Latin-9 or West European with Euro)        |
|iso8859_16.c       |ISO-8859-16 (Latin-10)                                  |
|utf8.c             |UTF-8    encoding                                       |
|utf16_be.c         |UTF-16BE encoding                                       |
|utf16_le.c         |UTF-16LE encoding                                       |
|utf32_be.c         |UTF-32BE encoding                                       |
|utf32_le.c         |UTF-32LE encoding                                       |
|unicode.c          |common codes of Unicode encoding                        |
|unicode_fold_data.c|Unicode folding data                                    |
|win32/Makefile     |Makefile for Win32 (VC++)                               |
|win32/config.h     |config.h for Win32                                      |