1 <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.17 2001/11/19 03:58:24 tgl Exp $ -->
8 Describes the available localization features from the point of
9 view of the administrator.
14 <productname>Postgres</productname> supports localization with
20 Using the locale features of the operating system to provide
21 locale-specific collation order, number formatting, translated
22 messages, and other aspects.
28 Using explicit multiple-byte character sets defined in the
29 <productname>Postgres</productname> server to support languages
30 that require more characters than will fit into a single byte,
31 and to provide character set recoding between client and server.
32 The number of supported character sets is fixed at the time the
33 server is compiled, and internal operations such as string
34 comparisons require expansion of each character into a 32-bit
41 Single-byte character recoding provides a more lightweight
42 solution for users of multiple single-byte character sets.
50 <title>Locale Support</title>
52 <indexterm zone="locale"><primary>locale</></>
55 <firstterm>Locale</> support refers to an application respecting
56 cultural preferences regarding alphabets, sorting, number
57 formatting, etc. <productname>PostgreSQL</> uses the standard ISO
58 C and <acronym>POSIX</acronym>-like locale facilities provided by the server operating
59 system. For additional information refer to the documentation of your
67 Locale support is not built into <productname>PostgreSQL</> by
68 default; to enable it, supply the <option>--enable-locale</> option
69 to the <filename>configure</> script:
72 <prompt>$ </><userinput>./configure --enable-locale</>
75 Locale support only affects the server; all clients are compatible
76 with servers with or without locale support.
80 To enable messages translated to the user's preferred language,
81 the <option>--enable-nls</option> option must be used. This
82 option is independent of the other locale support.
86 The information about which particular cultural rules to use is
87 determined by standard environment variables. If you are getting
88 localized behavior from other programs you probably have them set
89 up already. The simplest way to set the localization information
90 is the <envar>LANG</> variable, for example:
94 This sets the locale to Swedish (<literal>sv</>) as spoken in
95 Sweden (<literal>SE</>). Other possibilities might be
96 <literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada,
97 French). If more than one character set can be useful for a locale
98 then the specifications look like this:
99 <literal>cs_CZ.ISO8859-2</>. What locales are available under what
100 names on your system depends on what was provided by the operating
101 system vendor and what was installed.
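As an illustration of how such locale names are put together, the following Python sketch splits a name into its language, territory, and optional character set parts. The function name parse_locale_name is hypothetical and is not part of PostgreSQL or of any operating system interface; modifiers such as a trailing @euro are deliberately not handled.

```python
def parse_locale_name(name):
    """Split 'language_TERRITORY.codeset' into its parts.

    Illustrative helper only; parts that are absent are returned as None.
    """
    # Everything after the first '.' is the character set, if present.
    base, _, codeset = name.partition(".")
    # The part before the first '_' is the language, the rest the territory.
    language, _, territory = base.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,
    }


print(parse_locale_name("sv_SE"))
print(parse_locale_name("cs_CZ.ISO8859-2"))
```

The special names C and POSIX have no territory or character set part, so only the language field is filled in for them.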
105 Occasionally it is useful to mix rules from several locales, e.g.,
106 use U.S. collation rules but Spanish messages. To do that, a set of
107 environment variables exists that override the default of
108 <envar>LANG</> for a particular category:
114 <entry><envar>LC_COLLATE</></>
115 <entry>String sort order</>
118 <entry><envar>LC_CTYPE</></>
119 <entry>Character classification (What is a letter? The upper-case equivalent?)</>
122 <entry><envar>LC_MESSAGES</></>
123 <entry>Language of messages</>
126 <entry><envar>LC_MONETARY</></>
127 <entry>Formatting of currency amounts</>
130 <entry><envar>LC_NUMERIC</></>
131 <entry>Formatting of numbers</>
134 <entry><envar>LC_TIME</></>
135 <entry>Formatting of dates and times</>
141 Additionally, all of these specific variables and the
142 <envar>LANG</> variable can be overridden with the
143 <envar>LC_ALL</> environment variable.
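The precedence among these variables can be sketched in a few lines of Python. This is only an illustration of the lookup order described above (LC_ALL first, then the category variable, then LANG, then the C locale as a fallback); the function name effective_locale is hypothetical, and the LANGUAGE variable used by some message libraries is not modeled.

```python
def effective_locale(category, env):
    """Return the locale that applies to one category, e.g. 'LC_COLLATE'.

    env is a mapping of environment variable names to values.
    Empty values are treated like unset variables.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    # Nothing is set: behave as if there were no locale support.
    return "C"


env = {"LANG": "es_ES", "LC_COLLATE": "en_US"}
print(effective_locale("LC_COLLATE", env))   # en_US
print(effective_locale("LC_MESSAGES", env))  # es_ES
```

With this environment, collation follows U.S. rules while messages appear in Spanish, which is the mixed setup mentioned above.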
148 Some message localization libraries also look at the environment
149 variable <envar>LANGUAGE</envar> which overrides all other locale
150 settings for the purpose of setting the language of messages. If
151 in doubt, please refer to the documentation of your operating
152 system, in particular the
153 <citerefentry><refentrytitle>gettext</><manvolnum>3</></> manual
154 page, for more information.
159 If you want the system to behave as if it had no locale support,
160 use the special locale <literal>C</> or <literal>POSIX</>, or
161 simply unset all locale-related variables.
165 Note that the locale behavior of the server is determined by the
166 environment variables seen by the server, not by the environment
167 of any client. Therefore, be careful to set these variables
168 before starting the server. A consequence of this is that if
169 client and server are set up to different locales, messages may
170 appear in different languages depending on where they originated.
174 The <envar>LC_COLLATE</> and <envar>LC_CTYPE</> variables affect the
175 sort order of indexes. Therefore, these values must be kept fixed
176 for any particular database cluster, or indexes on text columns will
177 become corrupt. <productname>Postgres</productname> enforces this
178 by recording the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
179 that are seen by <application>initdb</>. The server automatically adopts
180 those two values when it is started; only the other <envar>LC_</>
181 categories can be set from the environment at server startup.
182 In short, only one collation order can be used in a database cluster,
183 and it is chosen at <application>initdb</> time.
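Why a changed collation order damages an index can be shown with a small Python sketch. It is not how the server stores indexes: a sorted list searched with binary search stands in for an index, byte-wise comparison stands in for the C locale, and folded_key is a hypothetical stand-in for a linguistic collation that ignores case first. All names here are illustrative.

```python
import bisect


def c_key(s):
    # Byte-wise comparison, as in the C locale.
    return s.encode("utf-8")


def folded_key(s):
    # Hypothetical stand-in for a linguistic collation: compare ignoring case first.
    return (s.lower(), s)


def build_index(words, key):
    # The "index" is simply the words sorted under one collation.
    return sorted(words, key=key)


def index_lookup(index, word, key):
    # Binary search is only valid if the index was sorted with the same key.
    keys = [key(w) for w in index]
    pos = bisect.bisect_left(keys, key(word))
    return pos < len(index) and index[pos] == word


index = build_index(["apple", "Banana", "cherry", "Date"], c_key)
print(index)                                     # ['Banana', 'Date', 'apple', 'cherry']
print(index_lookup(index, "apple", c_key))       # True
print(index_lookup(index, "apple", folded_key))  # False: the entry exists but is not found
```

The last lookup fails even though the entry is present, because the search assumes an order that the stored data does not have.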
191 Locale support influences in particular the following features:
196 Sort order in <command>ORDER BY</> queries.
197 <indexterm><primary>ORDER BY</></>
203 The <function>to_char</> family of functions
209 The <literal>LIKE</> and <literal>~</> operators for pattern
217 The only severe drawback of using the locale support in
218 <productname>PostgreSQL</> is its speed. So use locale only if you
219 actually need it. It should be noted in particular that selecting
220 a non-C locale disables index optimizations for <literal>LIKE</> and
221 <literal>~</> operators, which can make a huge difference in the
222 speed of searches that use those operators.
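One common way such an optimization can work is to turn a pattern with a fixed prefix into a range condition, which is only valid when strings are ordered byte by byte as in the C locale. The Python sketch below illustrates that general technique under this assumption; it does not describe the server's actual code, and the function name like_prefix_range is hypothetical.

```python
def like_prefix_range(prefix):
    """Return (low, high) so that low <= s < high exactly when s starts with prefix.

    Works on bytes and assumes plain byte-wise ordering.
    high is None when no finite upper bound exists.
    """
    upper = bytearray(prefix)
    # Bytes of 0xFF cannot be incremented, so drop them from the end.
    while upper and upper[-1] == 0xFF:
        upper.pop()
    if not upper:
        return prefix, None
    upper[-1] += 1
    return prefix, bytes(upper)


low, high = like_prefix_range(b"abc")
print(low, high)  # b'abc' b'abd'
```

Under a linguistic collation, strings sharing a prefix need not be adjacent in sort order, so this kind of range rewrite cannot be assumed to be valid there.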
230 If locale support doesn't work in spite of the explanation above,
231 check that the locale support in your operating system is correctly configured.
232 To check whether a given locale is installed and functional you
233 can use <application>Perl</>, for example. Perl also supports
234 locales, and if a locale is broken, <command>perl -v</> will
235 complain with something like this:
237 <prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
238 <prompt>$</> <userinput>perl -v</>
240 perl: warning: Setting locale failed.
241 perl: warning: Please check that your locale settings:
243 LC_CTYPE = "not_exist",
245 are supported and installed on your system.
246 perl: warning: Falling back to the standard locale ("C").
252 Check that your locale files are in the right location. Possible
253 locations include: <filename>/usr/lib/locale</filename> (<systemitem class="osname">Linux</>,
254 <systemitem class="osname">Solaris</>), <filename>/usr/share/locale</filename> (<systemitem class="osname">Linux</>),
255 <filename>/usr/lib/nls/loc</filename> (<systemitem class="osname">DUX 4.0</>). Check the locale
256 man page of your system if you are not sure.
260 Check that <productname>PostgreSQL</> is actually using the locale that
261 you think it is. <envar>LC_COLLATE</> and <envar>LC_CTYPE</> settings are
262 determined at <application>initdb</> time and cannot be changed without
263 repeating <application>initdb</>. Other locale settings including
264 <envar>LC_MESSAGES</> and <envar>LC_MONETARY</> are determined by the
265 environment the postmaster is started in, and can be changed with a simple
266 postmaster restart. You can check the <envar>LC_COLLATE</> and
267 <envar>LC_CTYPE</> settings of
268 a database with the <filename>contrib/pg_controldata</> utility program.
272 The directory <filename>src/test/locale</> contains a test suite
273 for <productname>PostgreSQL</>'s locale support.
277 Client applications that handle server-side errors by parsing the
278 text of the error message will obviously have problems when the
279 server's messages are in a different language. If you create such
280 an application you need to devise a plan to cope with this
281 situation. The embedded SQL interface (<application>ecpg</>) is
282 also affected by this problem. It is currently recommended that
283 servers interfacing with <application>ecpg</> applications be
284 configured to send messages in English.
288 Maintaining catalogs of message translations requires the on-going
289 efforts of many volunteers who want to see
290 <productname>PostgreSQL</> speak their preferred language well.
291 If messages in your language are currently not available or not fully
292 translated, your assistance would be appreciated. If you want to
293 help, refer to the <citetitle>Developer's Guide</> or write to the
294 developers' mailing list.
300 <sect1 id="multibyte">
301 <title>Multibyte Support</title>
303 <indexterm zone="multibyte"><primary>multibyte</></>
306 <title>Author</title>
309 Tatsuo Ishii (<email>ishii@postgresql.org</email>),
310 last updated 2000-03-22.
312 url="http://www.sra.co.jp/people/t-ishii/PostgreSQL/">Tatsuo's
313 web site</ulink> for more information.
318 Multibyte (<acronym>MB</acronym>) support is intended to allow
319 <productname>Postgres</productname> to handle
320 multiple-byte character sets such as <acronym>EUC</> (Extended Unix Code), Unicode and
321 Mule internal code. With <acronym>MB</acronym> enabled you can use multibyte
322 character sets in regular expressions (regexp), LIKE, and some
323 other functions. The default
324 encoding system is selected while initializing your
325 <productname>Postgres</productname> installation using
326 <application>initdb</application>. Note that this can be
327 overridden when you create a database using
328 <application>createdb</application> or by using the SQL command
329 CREATE DATABASE. So you can have multiple databases each with
330 a different encoding system.
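The difference between counting characters and counting bytes, which is what multibyte support has to deal with, can be seen in a short Python sketch. It uses Python's own codecs rather than anything from the server, and the function name char_and_byte_length is hypothetical.

```python
def char_and_byte_length(text, encoding):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))


# Three Japanese characters, written with escapes to keep the source ASCII-only.
sample = "\u65e5\u672c\u8a9e"

print(char_and_byte_length(sample, "euc_jp"))  # (3, 6)
print(char_and_byte_length(sample, "utf-8"))   # (3, 9)
print(char_and_byte_length("abc", "euc_jp"))   # (3, 3)
```

A function that treats every byte as one character would report the wrong length for the first two cases.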
334 <acronym>MB</acronym> also fixes some problems concerning 8-bit single byte
335 character sets including ISO8859. (I would not say all problems
336 have been fixed. I just confirmed that the regression test ran fine
337 and a few French characters could be used with the patch. Please let
338 me know if you find any problem while using 8-bit characters.)
342 <title>Enabling MB</title>
345 Run configure with the multibyte option:
348 % ./configure --enable-multibyte[=<replaceable>encoding_system</replaceable>]
351 where <replaceable>encoding_system</replaceable> can be one of the
352 values in the following table:
355 <title>Character Set Encodings</title>
356 <titleabbrev>Encodings</titleabbrev>
360 <entry>Encoding</entry>
361 <entry>Description</entry>
366 <entry><literal>SQL_ASCII</literal></entry>
367 <entry><acronym>US ASCII</acronym></entry>
370 <entry><literal>EUC_JP</literal></entry>
371 <entry>Japanese <acronym>EUC</></entry>
374 <entry><literal>EUC_CN</literal></entry>
375 <entry>Chinese <acronym>EUC</></entry>
378 <entry><literal>EUC_KR</literal></entry>
379 <entry>Korean <acronym>EUC</></entry>
382 <entry><literal>EUC_TW</literal></entry>
383 <entry>Taiwan <acronym>EUC</acronym></entry>
386 <entry><literal>UNICODE</literal></entry>
387 <entry>Unicode (<acronym>UTF</acronym>-8)</entry>
390 <entry><literal>MULE_INTERNAL</literal></entry>
391 <entry>Mule internal code</entry>
394 <entry><literal>LATIN1</literal></entry>
395 <entry>ISO 8859-1 ECMA-94 Latin Alphabet No.1</entry>
398 <entry><literal>LATIN2</literal></entry>
399 <entry>ISO 8859-2 ECMA-94 Latin Alphabet No.2</entry>
402 <entry><literal>LATIN3</literal></entry>
403 <entry>ISO 8859-3 ECMA-94 Latin Alphabet No.3</entry>
406 <entry><literal>LATIN4</literal></entry>
407 <entry>ISO 8859-4 ECMA-94 Latin Alphabet No.4</entry>
410 <entry><literal>LATIN5</literal></entry>
411 <entry>ISO 8859-9 ECMA-128 Latin Alphabet No.5</entry>
414 <entry><literal>LATIN6</literal></entry>
415 <entry>ISO 8859-10 ECMA-144 Latin Alphabet No.6</entry>
418 <entry><literal>LATIN7</literal></entry>
419 <entry>ISO 8859-13 Latin Alphabet No.7</entry>
422 <entry><literal>LATIN8</literal></entry>
423 <entry>ISO 8859-14 Latin Alphabet No.8</entry>
426 <entry><literal>LATIN9</literal></entry>
427 <entry>ISO 8859-15 Latin Alphabet No.9</entry>
430 <entry><literal>LATIN10</literal></entry>
431 <entry>ISO 8859-16 ASRO SR 14111 Latin Alphabet No.10</entry>
434 <entry><literal>ISO-8859-5</literal></entry>
435 <entry>ECMA-113 Latin/Cyrillic</entry>
438 <entry><literal>ISO-8859-6</literal></entry>
439 <entry>ECMA-114 Latin/Arabic</entry>
442 <entry><literal>ISO-8859-7</literal></entry>
443 <entry>ECMA-118 Latin/Greek</entry>
446 <entry><literal>ISO-8859-8</literal></entry>
447 <entry>ECMA-121 Latin/Hebrew</entry>
450 <entry><literal>KOI8</literal></entry>
451 <entry><acronym>KOI</acronym>8-R(U)</entry>
454 <entry><literal>WIN</literal></entry>
455 <entry>Windows CP1251</entry>
458 <entry><literal>ALT</literal></entry>
459 <entry>Windows CP866</entry>
467 CAUTION1: Note that before 7.2, LATIN5 mistakenly meant ISO 8859-5. In 7.2,
468 LATIN5 means ISO 8859-9. If you have a LATIN5 database created on
469 7.1 or earlier and want to migrate to 7.2, you should be very
470 careful about this change.
474 CAUTION2: Not all APIs support all the encodings listed above. For example,
475 the PostgreSQL JDBC driver does not support MULE_INTERNAL, LATIN6,
480 Here is an example of configuring
481 <productname>Postgres</productname> to use a Japanese encoding by
485 % ./configure --enable-multibyte=EUC_JP
490 If the encoding system is omitted (./configure --enable-multibyte),
491 SQL_ASCII is assumed.
496 <title>Setting the Encoding</title>
499 <application>initdb</application> defines the default encoding
500 for a <productname>Postgres</productname> installation. For example:
506 sets the default encoding to <literal>EUC_JP</literal> (Extended Unix Code for Japanese).
507 Note that you can use <option>--encoding</option> instead of <option>-E</option> if you prefer
508 to type longer option strings.
509 If no -E or --encoding option is given, the encoding
510 specified at configure time is used.
514 You can create a database with a different encoding:
517 % createdb -E EUC_KR korean
520 will create a database named <database>korean</database> with <literal>EUC_KR</literal> encoding.
521 Another way to accomplish this is to use a SQL command:
524 CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
527 The encoding for a database is represented as an
528 <firstterm>encoding column</firstterm> in the
529 <literal>pg_database</literal> system catalog.
530 You can see that by using <option>-l</option> or <command>\l</command> of <command>psql</command>
536 Database | Owner | Encoding
537 ---------------+---------+---------------
538 euc_cn | t-ishii | EUC_CN
539 euc_jp | t-ishii | EUC_JP
540 euc_kr | t-ishii | EUC_KR
541 euc_tw | t-ishii | EUC_TW
542 mule_internal | t-ishii | MULE_INTERNAL
543 regression | t-ishii | SQL_ASCII
544 template1 | t-ishii | EUC_JP
545 test | t-ishii | EUC_JP
546 unicode | t-ishii | UNICODE
553 <title>Automatic encoding translation between backend and
557 <productname>Postgres</productname> supports an automatic
558 encoding translation between backend
559 and frontend for some encodings.
562 <title>Client/Server Character Set Encodings</title>
563 <titleabbrev>Communication Encodings</titleabbrev>
567 <entry>Server Encoding</entry>
568 <entry>Available Client Encodings</entry>
573 <entry><literal>SQL_ASCII</literal></entry>
574 <entry><literal>SQL_ASCII</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
578 <entry><literal>EUC_JP</literal></entry>
579 <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>,
580 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
584 <entry><literal>EUC_TW</literal></entry>
585 <entry><literal>EUC_TW</literal>, <literal>BIG5</literal>,
586 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
590 <entry><literal>LATIN1</literal></entry>
591 <entry><literal>LATIN1</literal>, <literal>UNICODE</literal>,
592 <literal>MULE_INTERNAL</literal>
596 <entry><literal>LATIN2</literal></entry>
597 <entry><literal>LATIN2</literal>, <literal>WIN1250</literal>,
598 <literal>UNICODE</literal>,
599 <literal>MULE_INTERNAL</literal>
603 <entry><literal>LATIN3</literal></entry>
604 <entry><literal>LATIN3</literal>, <literal>UNICODE</literal>,
605 <literal>MULE_INTERNAL</literal>
609 <entry><literal>LATIN4</literal></entry>
610 <entry><literal>LATIN4</literal>, <literal>UNICODE</literal>,
611 <literal>MULE_INTERNAL</literal>
615 <entry><literal>LATIN5</literal></entry>
616 <entry><literal>LATIN5</literal>, <literal>UNICODE</literal>,
617 <literal>MULE_INTERNAL</literal>
621 <entry><literal>LATIN6</literal></entry>
622 <entry><literal>LATIN6</literal>, <literal>UNICODE</literal>,
623 <literal>MULE_INTERNAL</literal>
627 <entry><literal>LATIN7</literal></entry>
628 <entry><literal>LATIN7</literal>, <literal>UNICODE</literal>,
629 <literal>MULE_INTERNAL</literal>
633 <entry><literal>LATIN8</literal></entry>
634 <entry><literal>LATIN8</literal>, <literal>UNICODE</literal>,
635 <literal>MULE_INTERNAL</literal>
639 <entry><literal>LATIN9</literal></entry>
640 <entry><literal>LATIN9</literal>, <literal>UNICODE</literal>,
641 <literal>MULE_INTERNAL</literal>
645 <entry><literal>LATIN10</literal></entry>
646 <entry><literal>LATIN10</literal>, <literal>UNICODE</literal>,
647 <literal>MULE_INTERNAL</literal>
651 <entry><literal>ISO_8859_5</literal></entry>
652 <entry><literal>ISO_8859_5</literal>,
653 <literal>UNICODE</literal>
657 <entry><literal>ISO_8859_6</literal></entry>
658 <entry><literal>ISO_8859_6</literal>,
659 <literal>UNICODE</literal>
663 <entry><literal>ISO_8859_7</literal></entry>
664 <entry><literal>ISO_8859_7</literal>,
665 <literal>UNICODE</literal>
669 <entry><literal>ISO_8859_8</literal></entry>
670 <entry><literal>ISO_8859_8</literal>,
671 <literal>UNICODE</literal>
675 <entry><literal>ISO_8859_9</literal></entry>
676 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
677 <literal>ALT</literal>, <literal>KOI8R</literal>,
678 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
682 <entry><literal>UNICODE</literal></entry>
684 <literal>EUC_JP</literal>, <literal>SJIS</literal>,
685 <literal>EUC_KR</literal>, <literal>EUC_CN</literal>,
686 <literal>EUC_TW</literal>, <literal>BIG5</literal>,
687 <literal>LATIN1</literal> to <literal>LATIN10</literal>,
688 <literal>ISO_8859_5</literal>,
689 <literal>ISO_8859_6</literal>,
690 <literal>ISO_8859_7</literal>,
691 <literal>ISO_8859_8</literal>,
692 <literal>WIN</literal>, <literal>ALT</literal>,
693 <literal>KOI8</literal>
697 <entry><literal>MULE_INTERNAL</literal></entry>
698 <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>EUC_KR</literal>, <literal>EUC_CN</literal>,
699 <literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>LATIN1</literal> to <literal>LATIN5</literal>,
700 <literal>WIN</literal>, <literal>ALT</literal>, <literal>WIN1250</literal></entry>
703 <entry><literal>KOI8</literal></entry>
704 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
705 <literal>ALT</literal>, <literal>KOI8</literal>,
706 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
710 <entry><literal>WIN</literal></entry>
711 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
712 <literal>ALT</literal>, <literal>KOI8</literal>,
713 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
717 <entry><literal>ALT</literal></entry>
718 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
719 <literal>ALT</literal>, <literal>KOI8</literal>,
720 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
729 To enable the automatic encoding translation, you have to tell
730 <productname>Postgres</productname> the encoding you would like
731 to use in the frontend. There are
732 several ways to accomplish this.
737 Using the <command>\encoding</command> command in
738 <application>psql</application>.
739 <command>\encoding</command> allows you to change the frontend
740 encoding on the fly. For
741 example, to change the encoding to <literal>SJIS</literal>, type:
751 Using <application>libpq</> functions.
752 <command>\encoding</command> actually calls
753 <function>PQsetClientEncoding()</function> for this purpose.
756 int PQsetClientEncoding(PGconn *<replaceable>conn</replaceable>, const char *<replaceable>encoding</replaceable>)
759 where <replaceable>conn</replaceable> is a connection to the backend,
760 and <replaceable>encoding</replaceable> is an encoding you
761 want to use. If it successfully sets the encoding, it returns 0,
762 otherwise -1. The current encoding for this connection can be shown by
766 int PQclientEncoding(const PGconn *<replaceable>conn</replaceable>)
769 Note that it returns the encoding id, not the encoding symbol string
770 such as <literal>EUC_JP</literal>. To convert an encoding id to an encoding symbol, you
774 char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>)
781 Using <command>SET CLIENT_ENCODING TO</command>.
783 Setting the frontend side encoding can be done by this SQL command:
786 SET CLIENT_ENCODING TO 'encoding';
789 You can also use the SQL92 syntax <literal>SET NAMES</literal> for this purpose:
792 SET NAMES 'encoding';
795 To query the current frontend encoding:
798 SHOW CLIENT_ENCODING;
801 To return to the default encoding:
804 RESET CLIENT_ENCODING;
811 Using <envar>PGCLIENTENCODING</envar>.
813 If the environment variable <envar>PGCLIENTENCODING</envar> is defined
814 in the client's environment, that client encoding is automatically
815 selected when a backend connection is made. (This can subsequently
816 be overridden using any of the other methods mentioned above.)
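Whichever method is used, the effect is that strings are re-encoded between the server encoding and the client encoding. The Python sketch below illustrates that idea with Python's own codecs; it is not the server's conversion code, the function name translate is hypothetical, and the error raised for an unrepresentable character is Python's behavior, not a description of what the server does.

```python
def translate(data, server_encoding, client_encoding):
    """Re-encode bytes from the server encoding into the client encoding."""
    return data.decode(server_encoding).encode(client_encoding)


# Three Japanese characters as they would be stored in an EUC_JP database.
stored = "\u65e5\u672c\u8a9e".encode("euc_jp")

# An SJIS client receives the same characters in a different byte form.
as_sjis = translate(stored, "euc_jp", "shift_jis")
print(len(stored), len(as_sjis))

# A LATIN1 client cannot represent these characters at all.
try:
    translate(stored, "euc_jp", "latin-1")
except UnicodeEncodeError as exc:
    print("not representable:", exc.reason)
```

The byte strings differ between the two Japanese encodings even though the characters are the same, which is why the client has to announce its encoding.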
824 <title>About Unicode</title>
826 <indexterm><primary>Unicode</></>
829 An automatic encoding translation between Unicode and other
830 encodings has been supported since PostgreSQL 7.1.
831 For 7.1 it's not enabled by default.
832 To enable this feature, run configure with the
833 <option>--enable-unicode-conversion</option> option. Note that this requires
834 the <option>--enable-multibyte</option> option also.
837 For 7.2, <option>--enable-unicode-conversion</option> is not necessary.
838 The Unicode conversion functionality is automatically enabled
839 if <option>--enable-multibyte</option> is specified.
844 <title>What happens if the translation is not possible?</title>
847 Suppose you choose <literal>EUC_JP</literal> for the backend and <literal>LATIN1</literal> for the frontend;
848 then some Japanese characters cannot be translated into <literal>LATIN1</literal>. In
849 this case, a letter that cannot be represented in the <literal>LATIN1</literal> character set
850 would be transformed as:
859 <title>References</title>
862 These are good sources to start learning about various kinds of encoding
867 <term><ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"></ulink></term>
871 Detailed explanations of <literal>EUC_JP</literal>,
872 <literal>EUC_CN</literal>, <literal>EUC_KR</literal>,
873 <literal>EUC_TW</literal> appear in section 3.2.
879 <term><ulink url="http://www.unicode.org/"></ulink></term>
883 The web site of the Unicode Consortium
889 <term>RFC 2044</term>
893 <acronym>UTF</acronym>-8 is defined here.
902 <title>History</title>
904 <literallayout class="monospaced">
906 * An automatic encoding translation between Unicode and other
907 encodings is implemented
908 * Changes above will appear in 7.1
911 * SJIS UDC (NEC selection IBM kanji) support contributed
913 * Changes above will appear in 7.0.1
916 * Add new libpq functions PQsetClientEncoding, PQclientEncoding
917 * ./configure --with-mb=EUC_JP
919 ./configure --enable-multibyte=EUC_JP
921 * Add SQL_ASCII regression test case
922 * Add SJIS User Defined Character (UDC) support
923 * All of above will appear in 7.0
926 * Add support for WIN1250 (Windows Czech) as a client encoding
927 (contributed by Pavel Behal)
928 * fix some compiler warnings (contributed by Tomoaki Nishiyama)
931 * Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
932 (thanks Oleg Broytmann for testing)
933 * Fix problem with MB and locale
936 * Add support for Big5 for frontend encoding
937 (you need to create a database with EUC_TW to use Big5)
938 * Add regression test case for EUC_TW
939 (contributed by Jonah Kuo <email>jonahkuo@mail.ttn.com.tw</email>)
942 * Bugs related to SQL_ASCII support fixed
945 * 6.4 release. In this version, pg_database has "encoding"
946 column that represents the database encoding
949 * determine encoding at initdb/createdb rather than compile time
950 * support for PGCLIENTENCODING when issuing COPY command
951 * support for SQL92 syntax "SET NAMES"
952 * support for LATIN2-5
953 * add UNICODE regression test case
954 * new test suite for MB
955 * clean up source files
958 * add support for the encoding translation between the backend
960 * new command SET CLIENT_ENCODING etc. added
961 * add support for LATIN1 character set
962 * enhance 8-bit cleanness
964 April 21, 1998 some enhancements/fixes
965 * character_length(), position(), substring() are now aware of
966 multi-byte characters
968 * add --with-mb option to configure
969 * new regression tests for EUC_KR
970 (contributed by Soonmyung Hong <email>hong@lunaris.hanmesoft.co.kr</email>)
971 * add some test cases to the EUC_JP regression test
972 * fix problem in regress/regress.sh in case of System V
973 * fix toupper(), tolower() to handle 8bit chars
975 Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
977 Mar 10, 1998 PL2 released
978 * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
979 * add an English document (this file)
980 * fix problems concerning 8-bit single byte characters
982 Mar 1, 1998 PL1 released
987 <title>WIN1250 on Windows/ODBC</title>
991 [Here is good documentation from Pavel Behal explaining how to use
992 WIN1250 on Windows/ODBC]
994 Version: 0.91 for PgSQL 6.5
996 Revised by: Tatsuo Ishii
997 Email: behal@opf.slu.cz
998 Licence: The Same as PostgreSQL
1000 Sorry for my English and C code, I'm not native :-)
1002 !!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
1005 The WIN1250 character set on Windows client platforms can be used
1006 with <productname>Postgres</productname> with locale support
1011 The following should be kept in mind:
1016 Success depends on proper system locales. This has been tested
1017 with <systemitem class="osname">Red Hat 6.0</> and <systemitem
1018 class="osname">Slackware 3.6</>, with <literal>cs_CZ.iso8859-2</literal> locale.
1024 Never try to set the server multibyte database encoding to WIN1250.
1025 Always use LATIN2 instead since there is not a WIN1250 locale
1032 WIN1250 encoding is usable only for Windows ODBC clients. The
1033 characters are recoded on the fly, to be displayed and stored
1041 When running, it is important to remember the following:
1046 This configuration reorders your sort order depending on your
1047 <envar>LC_<replaceable>x</replaceable></envar> settings. Don't be
1048 confused by the regression test results since they don't use
1055 A locale such as <literal>ch</literal> is correctly sorted
1057 supports that locale; older systems may not do so but new ones
1064 You have to insert money as '<literal>162,50</literal>' (note the
1065 comma within the single quotes).
1071 At the time of writing (early 1999), this configuration has
1072 not received extensive testing. Please let us know of any
1073 changes you had to make!
1080 <title>WIN1250 on Windows/ODBC</title>
1083 Compile <productname>Postgres</productname> with locale enabled
1084 and the multibyte encoding set to <literal>LATIN2</literal>.
1090 Set up your installation. Do not forget to create locale
1091 variables in your profile (environment). For example (this may
1092 not be correct for <emphasis>your</emphasis> environment):
1095 LC_ALL=cs_CZ.ISO8859-2
1096 LC_COLLATE=cs_CZ.ISO8859-2
1097 LC_CTYPE=cs_CZ.ISO8859-2
1098 LC_MONETARY=cs_CZ.ISO8859-2
1099 LC_NUMERIC=cs_CZ.ISO8859-2
1100 LC_TIME=cs_CZ.ISO8859-2
1107 You have to start the postmaster with locales set!
1113 Try it with the Czech language, and have it sort on a query.
1119 Install the ODBC driver for <productname>PostgreSQL</productname> on your Windows machine.
1125 Set up your data source properly. Include this line in your ODBC
1126 configuration dialog in the field <literal>Connect Settings</literal>:
1129 SET CLIENT_ENCODING = 'WIN1250';
1136 Now try it again, but in Windows with ODBC.
1145 <title>Single-byte character set recoding</>
1146 <!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->
1149 You can set up this feature with the <option>--enable-recode</> option
1150 to <filename>configure</>. This option was formerly described as
1151 <quote>Cyrillic recode support</> which doesn't express all its
1152 power. It can be used for <emphasis>any</> single-byte character
1157 This method uses a <filename>charset.conf</> file located in
1158 the database directory (<envar>PGDATA</>). It's a typical
1159 configuration text file where spaces and newlines separate items
1160 and records and # specifies comments. Three keywords with the
1161 following syntax are recognized here:
1163 BaseCharset <replaceable>server_charset</>
1164 RecodeTable <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
1165 HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
1170 <token>BaseCharset</> defines the encoding of the database server.
1171 All character set names are only used for mapping inside of
1172 <filename>charset.conf</> so you can freely use typing-friendly
1177 <token>RecodeTable</> records specify translation tables between
1178 server and client. The file name is relative to the
1179 <envar>PGDATA</> directory. The table file format is very
1180 simple. There are no keywords and characters are represented by a
1181 pair of decimal or hexadecimal (0x-prefixed) values on a single
1184 <replaceable>char_value</> <replaceable>translated_char_value</>
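A reader for such a table file might look like the following Python sketch. It is only an illustration of the format described above, not the server's parser. Two points are assumptions: that characters not listed in the file map to themselves, and that # starts a comment in table files as it does in charset.conf. The function names are hypothetical.

```python
def parse_value(token):
    # Hexadecimal values carry a 0x prefix; everything else is decimal.
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_recode_table(text):
    """Build a 256-entry translation table from 'char_value translated_char_value' lines."""
    table = list(range(256))  # assumption: unlisted characters stay unchanged
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        source, target = line.split()[:2]
        table[parse_value(source)] = parse_value(target)
    return bytes(table)


def recode(data, table):
    # bytes.translate maps every byte through the 256-entry table.
    return data.translate(table)


table = parse_recode_table("0x41 0x61\n66 98  # B becomes b\n")
print(recode(b"ABC", table))  # b'abC'
```

Because every character is a single byte, the whole recoding step is one table lookup per byte, which is what makes this approach lightweight.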
1189 <token>HostCharset</> records define the client character set by IP
1190 address. You can use a single IP address, an IP mask range starting
1191 from the given address or an IP interval (e.g., 127.0.0.1,
1192 192.168.1.100/24, 192.168.1.20-192.168.1.40).
1196 The <filename>charset.conf</> file is always processed up to the
1197 end, so you can easily specify exceptions from the previous
1198 rules. In the <filename>src/data/</> directory you will find an
1199 example <filename>charset.conf</> and a few recoding tables.
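How HostCharset records could be matched against a client address is sketched below in Python. The sketch rests on two assumptions that the text does not settle: an address with a mask is read as the whole network containing that address, and, since the file is processed to the end, the last matching record wins. The function names are hypothetical and the code is not taken from the server.

```python
import ipaddress


def host_matches(spec, client_ip):
    """Check a client address against a single address, address/mask, or interval."""
    addr = ipaddress.ip_address(client_ip)
    if "-" in spec:
        low, high = (ipaddress.ip_address(part.strip()) for part in spec.split("-", 1))
        return low <= addr <= high
    if "/" in spec:
        # Assumption: the mask form denotes the network containing the address.
        return addr in ipaddress.ip_network(spec, strict=False)
    return addr == ipaddress.ip_address(spec)


def client_charset(records, client_ip, default=None):
    """Return the charset of the last matching (host_spec, host_charset) record."""
    result = default
    for spec, charset in records:
        if host_matches(spec, client_ip):
            result = charset  # later records act as exceptions to earlier ones
    return result


records = [
    ("192.168.1.100/24", "koi8"),
    ("192.168.1.20-192.168.1.40", "win"),
    ("192.168.1.30", "alt"),
]
print(client_charset(records, "192.168.1.7"))   # koi8
print(client_charset(records, "192.168.1.25"))  # win
print(client_charset(records, "192.168.1.30"))  # alt
```

The character set names used in the records are arbitrary labels here, in line with the remark that names are only used for mapping inside the configuration file.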
1203 As this solution is based on the client's IP address and character
1204 set mapping there are obviously some restrictions as well. You
1205 cannot use different encodings on the same host at the same
1206 time. It is also inconvenient when you boot your client hosts into
1207 multiple operating systems. Nevertheless, when these restrictions are
1208 not limiting and you do not need multibyte characters then it is a
1209 simple and effective solution.
1215 <!-- Keep this comment at the end of the file
1220 sgml-minimize-attributes:nil
1221 sgml-always-quote-attributes:t
1224 sgml-parent-document:nil
1225 sgml-default-dtd-file:"./reference.ced"
1226 sgml-exposed-tags:nil
1227 sgml-local-catalogs:("/usr/lib/sgml/catalog")
1228 sgml-local-ecat-files:nil