1 <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.19 2001/11/28 20:49:09 petere Exp $ -->
8 Describes the available localization features from the point of
9 view of the administrator.
14 <productname>PostgreSQL</productname> supports localization with
20 Using the locale features of the operating system to provide
21 locale-specific collation order, number formatting, translated
22 messages, and other aspects.
28 Using explicit multiple-byte character sets defined in the
29 <productname>PostgreSQL</productname> server to support languages
30 that require more characters than will fit into a single byte,
31 and to provide character set recoding between client and server.
32 The number of supported character sets is fixed at the time the
33 server is compiled, and internal operations such as string
34 comparisons require expansion of each character into a 32-bit
41 Single byte character recoding provides a more light-weight
42 solution for users of multiple, yet single-byte character sets.
50 <title>Locale Support</title>
52 <indexterm zone="locale"><primary>locale</></>
55 <firstterm>Locale</> support refers to an application respecting
56 cultural preferences regarding alphabets, sorting, number
57 formatting, etc. <productname>PostgreSQL</> uses the standard ISO
58 C and <acronym>POSIX</acronym>-like locale facilities provided by the server operating
59 system. For additional information refer to the documentation of your
67 Locale support is not built into <productname>PostgreSQL</> by
68 default; to enable it, supply the <option>--enable-locale</> option
69 to the <filename>configure</> script:
72 <prompt>$ </><userinput>./configure --enable-locale</>
75 Locale support only affects the server; all clients are compatible
76 with servers with or without locale support.
80 To enable messages translated to the user's preferred language,
81 the <option>--enable-nls</option> option must be used. This
82 option is independent of the other locale support.
86 The information about which particular cultural rules to use is
87 determined by standard environment variables. If you are getting
88 localized behavior from other programs you probably have them set
89 up already. The simplest way to set the localization information
90 is the <envar>LANG</> variable, for example:
94 This sets the locale to Swedish (<literal>sv</>) as spoken in
95 Sweden (<literal>SE</>). Other possibilities might be
96 <literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada,
97 French). If more than one character set can be useful for a locale
98 then the specifications look like this:
99 <literal>cs_CZ.ISO8859-2</>. What locales are available under what
100 names on your system depends on what was provided by the operating
101 system vendor and what was installed.
Occasionally it is useful to mix rules from several locales, e.g.,
use U.S. collation rules but Spanish messages. To do that, the
following environment variables override the default of
<envar>LANG</> for a particular category:
114 <entry><envar>LC_COLLATE</></>
115 <entry>String sort order</>
118 <entry><envar>LC_CTYPE</></>
119 <entry>Character classification (What is a letter? The upper-case equivalent?)</>
122 <entry><envar>LC_MESSAGES</></>
123 <entry>Language of messages</>
126 <entry><envar>LC_MONETARY</></>
127 <entry>Formatting of currency amounts</>
130 <entry><envar>LC_NUMERIC</></>
131 <entry>Formatting of numbers</>
134 <entry><envar>LC_TIME</></>
135 <entry>Formatting of dates and times</>
141 Additionally, all of these specific variables and the
142 <envar>LANG</> variable can be overridden with the
143 <envar>LC_ALL</> environment variable.
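The lookup order described above can be sketched in a few lines of Python.
This is only an illustration of the precedence (<envar>LC_ALL</>, then the
category variable, then <envar>LANG</>, then the <literal>C</> locale); the
helper name <literal>effective_locale</> is hypothetical and is not part of
<productname>PostgreSQL</> or of the C library.

```python
def effective_locale(category, env):
    """Return the locale name in effect for one LC_* category.

    Lookup order: LC_ALL, then the category variable, then LANG.
    Unset or empty variables are skipped; with nothing set, "C" applies.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# U.S. collation rules but Spanish messages, as in the text above.
env = {"LANG": "es_ES", "LC_COLLATE": "en_US"}
print(effective_locale("LC_COLLATE", env))   # en_US
print(effective_locale("LC_MESSAGES", env))  # es_ES
```

Passing the environment as a plain dictionary keeps the sketch independent of
the settings of the machine it runs on.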
148 Some message localization libraries also look at the environment
149 variable <envar>LANGUAGE</envar> which overrides all other locale
150 settings for the purpose of setting the language of messages. If
151 in doubt, please refer to the documentation of your operating
152 system, in particular the
153 <citerefentry><refentrytitle>gettext</><manvolnum>3</></> manual
154 page, for more information.
159 If you want the system to behave as if it had no locale support,
160 use the special locale <literal>C</> or <literal>POSIX</>, or
161 simply unset all locale-related variables.
165 Note that the locale behavior of the server is determined by the
166 environment variables seen by the server, not by the environment
167 of any client. Therefore, be careful to set these variables
168 before starting the server. A consequence of this is that if
169 client and server are set up to different locales, messages may
170 appear in different languages depending on where they originated.
174 The <envar>LC_COLLATE</> and <envar>LC_CTYPE</> variables affect the
175 sort order of indexes. Therefore, these values must be kept fixed
176 for any particular database cluster, or indexes on text columns will
177 become corrupt. <productname>PostgreSQL</productname> enforces this
178 by recording the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
179 that are seen by <application>initdb</>. The server automatically adopts
180 those two values when it is started; only the other <envar>LC_</>
181 categories can be set from the environment at server startup.
182 In short, only one collation order can be used in a database cluster,
183 and it is chosen at <application>initdb</> time.
191 Locale support influences in particular the following features:
196 Sort order in <command>ORDER BY</> queries.
197 <indexterm><primary>ORDER BY</></>
203 The <function>to_char</> family of functions
209 The <literal>LIKE</> and <literal>~</> operators for pattern
The only severe drawback of using the locale support in
<productname>PostgreSQL</> is its speed, so use locales only if you
actually need them. Note in particular that selecting
220 a non-C locale disables index optimizations for <literal>LIKE</> and
221 <literal>~</> operators, which can make a huge difference in the
222 speed of searches that use those operators.
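One way to see why the <literal>C</> locale matters here: under plain
byte-wise ordering, all strings that start with a given prefix form one
contiguous range, so a pattern such as <literal>'abc%'</> can be answered by a
range scan over a sorted index. The following Python sketch illustrates that
idea only; it is not the server's planner code, the helper
<literal>prefix_range</> is hypothetical, and it assumes a non-empty prefix
whose last byte is below 0xFF.

```python
import bisect


def prefix_range(prefix: bytes):
    """Return (low, high) with low <= s < high for every s starting with prefix.

    Assumes byte-wise comparison (the C locale), a non-empty prefix,
    and a last byte below 0xFF.
    """
    upper = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, upper


# A sorted list of byte strings stands in for an index on a text column.
index = sorted([b"abacus", b"abc", b"abcd", b"abd", b"abcz", b"b"])
low, high = prefix_range(b"abc")
matches = index[bisect.bisect_left(index, low):bisect.bisect_left(index, high)]
print(matches)  # [b'abc', b'abcd', b'abcz']
```

In a non-C locale the collation order need not follow the byte values, so the
matching strings are not guaranteed to sit in such a range.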
230 If locale support doesn't work in spite of the explanation above,
231 check that the locale support in your operating system is correctly configured.
232 To check whether a given locale is installed and functional you
can use <application>Perl</>, for example. Perl also supports
locales, and if a locale is broken, <command>perl -v</> will
complain with something like this:
237 <prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
238 <prompt>$</> <userinput>perl -v</>
240 perl: warning: Setting locale failed.
241 perl: warning: Please check that your locale settings:
243 LC_CTYPE = "not_exist",
245 are supported and installed on your system.
246 perl: warning: Falling back to the standard locale ("C").
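Any program that calls <function>setlocale</> can serve for the same check.
As a sketch, the Python standard <literal>locale</> module can be used as
follows; the helper name <literal>locale_is_usable</> is hypothetical, and the
result depends on which locales are installed on the machine where it runs.

```python
import locale


def locale_is_usable(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the old setting


print(locale_is_usable("C"))          # the C locale is always available
print(locale_is_usable("not_exist"))  # expected to fail, as in the Perl example
```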
252 Check that your locale files are in the right location. Possible
253 locations include: <filename>/usr/lib/locale</filename> (<systemitem class="osname">Linux</>,
254 <systemitem class="osname">Solaris</>), <filename>/usr/share/locale</filename> (<systemitem class="osname">Linux</>),
255 <filename>/usr/lib/nls/loc</filename> (<systemitem class="osname">DUX 4.0</>). Check the locale
256 man page of your system if you are not sure.
260 Check that <productname>PostgreSQL</> is actually using the locale that
261 you think it is. <envar>LC_COLLATE</> and <envar>LC_CTYPE</> settings are
262 determined at <application>initdb</> time and cannot be changed without
263 repeating <application>initdb</>. Other locale settings including
264 <envar>LC_MESSAGES</> and <envar>LC_MONETARY</> are determined by the
265 environment the postmaster is started in, and can be changed with a simple
266 postmaster restart. You can check the <envar>LC_COLLATE</> and
267 <envar>LC_CTYPE</> settings of
268 a database with the <filename>contrib/pg_controldata</> utility program.
272 The directory <filename>src/test/locale</> contains a test suite
273 for <productname>PostgreSQL</>'s locale support.
277 Client applications that handle server-side errors by parsing the
278 text of the error message will obviously have problems when the
279 server's messages are in a different language. If you create such
280 an application you need to devise a plan to cope with this
281 situation. The embedded SQL interface (<application>ecpg</>) is
282 also affected by this problem. It is currently recommended that
283 servers interfacing with <application>ecpg</> applications be
284 configured to send messages in English.
288 Maintaining catalogs of message translations requires the on-going
efforts of many volunteers who want to see
<productname>PostgreSQL</> speak their preferred language well.
If messages in your language are currently not available or not fully
292 translated, your assistance would be appreciated. If you want to
293 help, refer to the <citetitle>Developer's Guide</> or write to the
294 developers' mailing list.
300 <sect1 id="multibyte">
301 <title>Multibyte Support</title>
303 <indexterm zone="multibyte"><primary>multibyte</></>
306 <title>Author</title>
309 Tatsuo Ishii (<email>ishii@postgresql.org</email>),
310 last updated 2000-03-22.
312 url="http://www.sra.co.jp/people/t-ishii/PostgreSQL/">Tatsuo's
313 web site</ulink> for more information.
318 Multibyte (<acronym>MB</acronym>) support is intended to allow
319 <productname>PostgreSQL</productname> to handle
320 multiple-byte character sets such as <acronym>EUC</> (Extended Unix Code), Unicode, and
321 Mule internal code. With <acronym>MB</acronym> enabled you can use multibyte
322 character sets in regular expressions (regexp), LIKE, and some
323 other functions. The default
324 encoding system is selected while initializing your
325 <productname>PostgreSQL</productname> installation using
326 <application>initdb</application>. Note that this can be
327 overridden when you create a database using
328 <application>createdb</application> or by using the SQL command
329 <command>CREATE DATABASE</>. So you can have multiple databases each with
330 a different encoding system.
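To make the term <quote>multiple-byte</> concrete, the following sketch uses
Python's built-in codecs, not <productname>PostgreSQL</>, to show how many
bytes the same three characters occupy in two encodings. Treating the Python
codec names <literal>euc_jp</> and <literal>utf-8</> as stand-ins for
<literal>EUC_JP</> and <literal>UNICODE</> is an assumption made for
illustration.

```python
text = "日本語"  # three characters

for codec in ("euc_jp", "utf-8"):
    encoded = text.encode(codec)
    print(codec, ":", len(text), "characters,", len(encoded), "bytes")
```

Functions that count or split characters therefore have to know the encoding
of the database; counting bytes would give the wrong answer.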
334 <title>Enabling Multibyte Support</title>
337 Run configure with the multibyte option:
340 ./configure --enable-multibyte<optional>=<replaceable>encoding_system</replaceable></optional>
343 where <replaceable>encoding_system</replaceable> can be one of the
344 values in the following table:
347 <title>Character Set Encodings</title>
348 <titleabbrev>Encodings</titleabbrev>
352 <entry>Encoding</entry>
353 <entry>Description</entry>
358 <entry><literal>SQL_ASCII</literal></entry>
359 <entry><acronym>ASCII</acronym></entry>
362 <entry><literal>EUC_JP</literal></entry>
363 <entry>Japanese <acronym>EUC</></entry>
366 <entry><literal>EUC_CN</literal></entry>
367 <entry>Chinese <acronym>EUC</></entry>
370 <entry><literal>EUC_KR</literal></entry>
371 <entry>Korean <acronym>EUC</></entry>
374 <entry><literal>EUC_TW</literal></entry>
375 <entry>Taiwan <acronym>EUC</acronym></entry>
378 <entry><literal>UNICODE</literal></entry>
379 <entry>Unicode (<acronym>UTF</acronym>-8)</entry>
382 <entry><literal>MULE_INTERNAL</literal></entry>
383 <entry>Mule internal code</entry>
386 <entry><literal>LATIN1</literal></entry>
387 <entry>ISO 8859-1 ECMA-94 Latin Alphabet No.1</entry>
390 <entry><literal>LATIN2</literal></entry>
391 <entry>ISO 8859-2 ECMA-94 Latin Alphabet No.2</entry>
394 <entry><literal>LATIN3</literal></entry>
395 <entry>ISO 8859-3 ECMA-94 Latin Alphabet No.3</entry>
398 <entry><literal>LATIN4</literal></entry>
399 <entry>ISO 8859-4 ECMA-94 Latin Alphabet No.4</entry>
402 <entry><literal>LATIN5</literal></entry>
403 <entry>ISO 8859-9 ECMA-128 Latin Alphabet No.5</entry>
406 <entry><literal>LATIN6</literal></entry>
407 <entry>ISO 8859-10 ECMA-144 Latin Alphabet No.6</entry>
410 <entry><literal>LATIN7</literal></entry>
411 <entry>ISO 8859-13 Latin Alphabet No.7</entry>
414 <entry><literal>LATIN8</literal></entry>
415 <entry>ISO 8859-14 Latin Alphabet No.8</entry>
418 <entry><literal>LATIN9</literal></entry>
419 <entry>ISO 8859-15 Latin Alphabet No.9</entry>
422 <entry><literal>LATIN10</literal></entry>
423 <entry>ISO 8859-16 ASRO SR 14111 Latin Alphabet No.10</entry>
426 <entry><literal>ISO-8859-5</literal></entry>
427 <entry>ECMA-113 Latin/Cyrillic</entry>
430 <entry><literal>ISO-8859-6</literal></entry>
431 <entry>ECMA-114 Latin/Arabic</entry>
434 <entry><literal>ISO-8859-7</literal></entry>
435 <entry>ECMA-118 Latin/Greek</entry>
438 <entry><literal>ISO-8859-8</literal></entry>
439 <entry>ECMA-121 Latin/Hebrew</entry>
442 <entry><literal>KOI8</literal></entry>
443 <entry><acronym>KOI</acronym>8-R(U)</entry>
446 <entry><literal>WIN</literal></entry>
447 <entry>Windows CP1251</entry>
450 <entry><literal>ALT</literal></entry>
451 <entry>Windows CP866</entry>
Before <productname>PostgreSQL</> 7.2, <literal>LATIN5</> mistakenly
461 meant ISO 8859-5. From 7.2 on,
462 <literal>LATIN5</> means ISO 8859-9. If you have a <literal>LATIN5</>
463 database created on 7.1 or earlier and want to migrate to 7.2 (or
464 later), you should be very careful about this change.
Not all APIs support all the encodings listed above. For example, the
471 <productname>PostgreSQL</>
472 JDBC driver does not support <literal>MULE_INTERNAL</>, <literal>LATIN6</>,
473 <literal>LATIN8</>, and <literal>LATIN10</>.
478 Here is an example of configuring
479 <productname>PostgreSQL</productname> to use a Japanese encoding by
483 $ <userinput>./configure --enable-multibyte=EUC_JP</userinput>
488 If the encoding system is omitted (<literal>./configure --enable-multibyte</literal>),
489 <literal>SQL_ASCII</> is assumed.
494 <title>Setting the Encoding</title>
497 <application>initdb</application> defines the default encoding
498 for a <productname>PostgreSQL</productname> installation. For example:
501 $ <userinput>initdb -E EUC_JP</>
504 sets the default encoding to <literal>EUC_JP</literal> (Extended Unix Code for Japanese).
505 Note that you can use <option>--encoding</option> instead of <option>-E</option> if you prefer
506 to type longer option strings.
507 If no <option>-E</> or <option>--encoding</option> option is given, the encoding
508 specified at configure time is used.
512 You can create a database with a different encoding:
515 $ <userinput>createdb -E EUC_KR korean</>
518 will create a database named <database>korean</database> with <literal>EUC_KR</literal> encoding.
519 Another way to accomplish this is to use a SQL command:
522 CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
The encoding for a database is stored in the
<firstterm>encoding column</firstterm> of the
<literal>pg_database</literal> system catalog.
528 You can see that by using the <option>-l</option> option or the
529 <command>\l</command> command of <command>psql</command>.
532 $ <userinput>psql -l</userinput>
   Database    |  Owner  |   Encoding
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 unicode       | t-ishii | UNICODE
551 <title>Automatic encoding translation between server and
555 <productname>PostgreSQL</productname> supports an automatic
556 encoding translation between server
557 and client for some encodings. The available combinations are
558 listed in <xref linkend="multibyte-translation-table">.
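What such a translation amounts to is that the same characters are carried in
different bytes on each side of the connection. The sketch below imitates this
with Python's built-in codecs in place of the server's conversion routines;
the function name <literal>recode</> and the choice of sample text are
hypothetical.

```python
def recode(data: bytes, server_codec: str, client_codec: str) -> bytes:
    """Re-encode bytes from the server's encoding into the client's."""
    return data.decode(server_codec).encode(client_codec)


stored = "テスト".encode("euc_jp")            # bytes as an EUC_JP server would hold them
sent = recode(stored, "euc_jp", "shift_jis")  # bytes an SJIS client would receive
print(stored.hex(), "->", sent.hex())
print(sent.decode("shift_jis"))
```

The byte sequences differ, but decoding each with its own encoding yields the
same text.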
561 <table tocentry="1" id="multibyte-translation-table">
562 <title>Client/Server Character Set Encodings</title>
563 <titleabbrev>Communication Encodings</titleabbrev>
567 <entry>Server Encoding</entry>
568 <entry>Available Client Encodings</entry>
573 <entry><literal>SQL_ASCII</literal></entry>
574 <entry><literal>SQL_ASCII</literal>, <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
578 <entry><literal>EUC_JP</literal></entry>
579 <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>,
580 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
584 <entry><literal>EUC_TW</literal></entry>
585 <entry><literal>EUC_TW</literal>, <literal>BIG5</literal>,
586 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
590 <entry><literal>LATIN1</literal></entry>
<entry><literal>LATIN1</literal>, <literal>UNICODE</literal>,
592 <literal>MULE_INTERNAL</literal>
596 <entry><literal>LATIN2</literal></entry>
597 <entry><literal>LATIN2</literal>, <literal>WIN1250</literal>,
598 <literal>UNICODE</literal>,
599 <literal>MULE_INTERNAL</literal>
603 <entry><literal>LATIN3</literal></entry>
<entry><literal>LATIN3</literal>, <literal>UNICODE</literal>,
605 <literal>MULE_INTERNAL</literal>
609 <entry><literal>LATIN4</literal></entry>
<entry><literal>LATIN4</literal>, <literal>UNICODE</literal>,
611 <literal>MULE_INTERNAL</literal>
615 <entry><literal>LATIN5</literal></entry>
<entry><literal>LATIN5</literal>, <literal>UNICODE</literal>,
617 <literal>MULE_INTERNAL</literal>
621 <entry><literal>LATIN6</literal></entry>
<entry><literal>LATIN6</literal>, <literal>UNICODE</literal>,
623 <literal>MULE_INTERNAL</literal>
627 <entry><literal>LATIN7</literal></entry>
<entry><literal>LATIN7</literal>, <literal>UNICODE</literal>,
629 <literal>MULE_INTERNAL</literal>
633 <entry><literal>LATIN8</literal></entry>
<entry><literal>LATIN8</literal>, <literal>UNICODE</literal>,
635 <literal>MULE_INTERNAL</literal>
639 <entry><literal>LATIN9</literal></entry>
<entry><literal>LATIN9</literal>, <literal>UNICODE</literal>,
641 <literal>MULE_INTERNAL</literal>
645 <entry><literal>LATIN10</literal></entry>
<entry><literal>LATIN10</literal>, <literal>UNICODE</literal>,
647 <literal>MULE_INTERNAL</literal>
651 <entry><literal>ISO_8859_5</literal></entry>
652 <entry><literal>ISO_8859_5</literal>,
653 <literal>UNICODE</literal>
657 <entry><literal>ISO_8859_6</literal></entry>
658 <entry><literal>ISO_8859_6</literal>,
659 <literal>UNICODE</literal>
663 <entry><literal>ISO_8859_7</literal></entry>
664 <entry><literal>ISO_8859_7</literal>,
665 <literal>UNICODE</literal>
669 <entry><literal>ISO_8859_8</literal></entry>
670 <entry><literal>ISO_8859_8</literal>,
671 <literal>UNICODE</literal>
675 <entry><literal>ISO_8859_9</literal></entry>
676 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
677 <literal>ALT</literal>, <literal>KOI8R</literal>,
678 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
682 <entry><literal>UNICODE</literal></entry>
684 <literal>EUC_JP</literal>, <literal>SJIS</literal>,
685 <literal>EUC_KR</literal>, <literal>EUC_CN</literal>,
686 <literal>EUC_TW</literal>, <literal>BIG5</literal>,
687 <literal>LATIN1</literal> to <literal>LATIN10</literal>,
688 <literal>ISO_8859_5</literal>,
689 <literal>ISO_8859_6</literal>,
690 <literal>ISO_8859_7</literal>,
691 <literal>ISO_8859_8</literal>,
692 <literal>WIN</literal>, <literal>ALT</literal>,
693 <literal>KOI8</literal>
697 <entry><literal>MULE_INTERNAL</literal></entry>
698 <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>EUC_KR</literal>, <literal>EUC_CN</literal>,
699 <literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>LATIN1</literal> to <literal>LATIN5</literal>,
700 <literal>WIN</literal>, <literal>ALT</literal>, <literal>WIN1250</literal></entry>
703 <entry><literal>KOI8</literal></entry>
704 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
705 <literal>ALT</literal>, <literal>KOI8</literal>,
706 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
710 <entry><literal>WIN</literal></entry>
711 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
712 <literal>ALT</literal>, <literal>KOI8</literal>,
713 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
717 <entry><literal>ALT</literal></entry>
718 <entry><literal>ISO_8859_9</literal>, <literal>WIN</literal>,
719 <literal>ALT</literal>, <literal>KOI8</literal>,
720 <literal>UNICODE</literal>, <literal>MULE_INTERNAL</literal>
728 To enable the automatic encoding translation, you have to tell
729 <productname>PostgreSQL</productname> the encoding you would like
730 to use in the client. There are
731 several ways to accomplish this.
736 Using the <command>\encoding</command> command in
737 <application>psql</application>.
738 <command>\encoding</command> allows you to change client
739 encoding on the fly. For
740 example, to change the encoding to <literal>SJIS</literal>, type:
750 Using <application>libpq</> functions.
<command>\encoding</command> actually calls
<function>PQsetClientEncoding()</function> for this purpose.
755 int PQsetClientEncoding(PGconn *<replaceable>conn</replaceable>, const char *<replaceable>encoding</replaceable>)
758 where <replaceable>conn</replaceable> is a connection to the server,
759 and <replaceable>encoding</replaceable> is an encoding you
760 want to use. If it successfully sets the encoding, it returns 0,
761 otherwise -1. The current encoding for this connection can be shown by
765 int PQclientEncoding(const PGconn *<replaceable>conn</replaceable>)
768 Note that it returns the encoding ID, not a symbolic string
769 such as <literal>EUC_JP</literal>. To convert an encoding ID to an encoding name, you
773 char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>)
780 Using <command>SET CLIENT_ENCODING TO</command>.
782 Setting the client encoding can be done with this SQL command:
785 SET CLIENT_ENCODING TO 'encoding';
788 Also you can use the SQL92 syntax <literal>SET NAMES</literal> for this purpose:
791 SET NAMES 'encoding';
794 To query the current client encoding:
797 SHOW CLIENT_ENCODING;
800 To return to the default encoding:
803 RESET CLIENT_ENCODING;
810 Using <envar>PGCLIENTENCODING</envar>.
812 If environment variable <envar>PGCLIENTENCODING</envar> is defined
813 in the client's environment, that client encoding is automatically
814 selected when a connection to the server is made. (This can subsequently
815 be overridden using any of the other methods mentioned above.)
823 <title>About Unicode</title>
825 <indexterm><primary>Unicode</></>
828 An automatic encoding translation between Unicode and other
829 encodings has been supported since <productname>PostgreSQL</> 7.1.
830 For 7.1 it was not enabled by default.
831 To enable this feature, run configure with the
832 <option>--enable-unicode-conversion</option> option. Note that this requires
833 the <option>--enable-multibyte</option> option also.
836 For 7.2, <option>--enable-unicode-conversion</option> is not necessary.
The Unicode conversion functionality is automatically enabled
838 if <option>--enable-multibyte</option> is specified.
843 <title>What happens if the translation is not possible?</title>
Suppose you choose <literal>EUC_JP</literal> for the server
and <literal>LATIN1</literal> for the client.
Then some Japanese characters cannot be translated into <literal>LATIN1</literal>. In
849 this case, a letter that cannot be represented in the <literal>LATIN1</literal> character set
850 would be transformed as:
859 <title>References</title>
862 These are good sources to start learning about various kinds of encoding
867 <term><ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf"></ulink></term>
871 Detailed explanations of <literal>EUC_JP</literal>,
872 <literal>EUC_CN</literal>, <literal>EUC_KR</literal>,
873 <literal>EUC_TW</literal> appear in section 3.2.
879 <term><ulink url="http://www.unicode.org/"></ulink></term>
883 The web site of the Unicode Consortium
889 <term>RFC 2044</term>
893 <acronym>UTF</acronym>-8 is defined here.
902 <title>History</title>
904 <literallayout class="monospaced">
906 * An automatic encoding translation between Unicode and other
encodings is implemented
908 * Changes above will appear in 7.1
911 * SJIS UDC (NEC selection IBM kanji) support contributed
913 * Changes above will appear in 7.0.1
916 * Add new libpq functions PQsetClientEncoding, PQclientEncoding
917 * ./configure --with-mb=EUC_JP
919 ./configure --enable-multibyte=EUC_JP
921 * Add SQL_ASCII regression test case
922 * Add SJIS User Defined Character (UDC) support
923 * All of above will appear in 7.0
926 * Add support for WIN1250 (Windows Czech) as a client encoding
927 (contributed by Pavel Behal)
928 * fix some compiler warnings (contributed by Tomoaki Nishiyama)
931 * Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
932 (thanks Oleg Broytmann for testing)
933 * Fix problem with MB and locale
* Add support for Big5 for frontend encoding
937 (you need to create a database with EUC_TW to use Big5)
938 * Add regression test case for EUC_TW
939 (contributed by Jonah Kuo <email>jonahkuo@mail.ttn.com.tw</email>)
942 * Bugs related to SQL_ASCII support fixed
945 * 6.4 release. In this version, pg_database has "encoding"
946 column that represents the database encoding
949 * determine encoding at initdb/createdb rather than compile time
950 * support for PGCLIENTENCODING when issuing COPY command
951 * support for SQL92 syntax "SET NAMES"
952 * support for LATIN2-5
953 * add UNICODE regression test case
954 * new test suite for MB
955 * clean up source files
958 * add support for the encoding translation between the backend
960 * new command SET CLIENT_ENCODING etc. added
961 * add support for LATIN1 character set
* enhance 8 bit cleanness
964 April 21, 1998 some enhancements/fixes
965 * character_length(), position(), substring() are now aware of
966 multi-byte characters
968 * add --with-mb option to configure
969 * new regression tests for EUC_KR
970 (contributed by Soonmyung Hong <email>hong@lunaris.hanmesoft.co.kr</email>)
971 * add some test cases to the EUC_JP regression test
972 * fix problem in regress/regress.sh in case of System V
973 * fix toupper(), tolower() to handle 8bit chars
975 Mar 25, 1998 MB PL2 is incorporated into <productname>PostgreSQL</> 6.3.1
977 Mar 10, 1998 PL2 released
978 * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
979 * add an English document (this file)
980 * fix problems concerning 8-bit single byte characters
982 Mar 1, 1998 PL1 released
987 <title>WIN1250 on Windows/ODBC</title>
[Here is good documentation explaining how to use WIN1250 on
Windows/ODBC, from Pavel Behal]
994 Version: 0.91 for PgSQL 6.5
996 Revised by: Tatsuo Ishii
997 Email: behal@opf.slu.cz
998 License: The Same as <productname>PostgreSQL</>
1000 Sorry for my Eglish and C code, I'm not native :-)
1002 !!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
1005 The WIN1250 character set on Windows client platforms can be used
1006 with <productname>PostgreSQL</productname> with locale support
1011 The following should be kept in mind:
1016 Success depends on proper system locales. This has been tested
1017 with <systemitem class="osname">Red Hat 6.0</> and <systemitem
1018 class="osname">Slackware 3.6</>, with the
1019 <literal>cs_CZ.iso8859-2</literal> locale.
1025 Never try to set the server's database encoding to WIN1250.
1026 Always use LATIN2 instead since there is no WIN1250 locale
1033 The WIN1250 encoding is usable only for Windows ODBC clients. The
1034 characters are recoded on the fly, to be displayed and stored
1042 <title>WIN1250 on Windows/ODBC</title>
1045 Compile <productname>PostgreSQL</productname> with locale enabled
1046 and the server-side encoding set to <literal>LATIN2</literal>.
1052 Set up your installation. Do not forget to create locale
1053 variables in your environment. For example (this may
1054 not be correct for <emphasis>your</emphasis> environment):
1057 LC_ALL=cs_CZ.ISO8859-2
1064 You have to start the server with locales set!
1070 Try it with the Czech language, and have it sort on a query.
1076 Install ODBC driver for <productname>PostgreSQL</productname> on your Windows machine.
1082 Set up your data source properly. Include this line in your ODBC
1083 configuration dialog in the field <guilabel>Connect Settings</guilabel>:
1086 SET CLIENT_ENCODING = 'WIN1250';
1093 Now try it again, but in Windows with ODBC.
1102 <title>Single-byte character set recoding</>
1103 <!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->
1106 You can set up this feature with the <option>--enable-recode</> option
1107 to <filename>configure</>. This option was formerly described as
1108 <quote>Cyrillic recode support</> which doesn't express all its
1109 power. It can be used for <emphasis>any</> single-byte character
This method uses a <filename>charset.conf</> file located in
1115 the database directory (<envar>PGDATA</>). It's a typical
1116 configuration text file where spaces and newlines separate items
1117 and records and # specifies comments. Three keywords with the
1118 following syntax are recognized here:
1120 BaseCharset <replaceable>server_charset</>
1121 RecodeTable <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
1122 HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
1127 <token>BaseCharset</> defines the encoding of the database server.
1128 All character set names are only used for mapping inside of
1129 <filename>charset.conf</> so you can freely use typing-friendly
1134 <token>RecodeTable</> records specify translation tables between
1135 server and client. The file name is relative to the
1136 <envar>PGDATA</> directory. The table file format is very
1137 simple. There are no keywords and characters are represented by a
1138 pair of decimal or hexadecimal (0x prefixed) values on single
1141 <replaceable>char_value</> <replaceable>translated_char_value</>
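A table file in this format maps naturally onto a 256-entry translation
table. The following Python sketch shows one way to read it. It is not the
server's parser: the function name <literal>load_recode_table</> is
hypothetical, and two points are assumptions, namely that bytes not listed in
the file are left unchanged and that <literal>#</> starts a comment in table
files as it does in <filename>charset.conf</>.

```python
def parse_value(item: str) -> int:
    """Parse a decimal or 0x-prefixed hexadecimal byte value."""
    return int(item, 16) if item.lower().startswith("0x") else int(item, 10)


def load_recode_table(text: str) -> bytes:
    """Build a 256-byte table from "char_value translated_char_value" lines."""
    table = bytearray(range(256))  # unlisted bytes map to themselves (assumption)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # comment handling is an assumption
        if not line:
            continue
        source, target = line.split()
        table[parse_value(source)] = parse_value(target)
    return bytes(table)


# A made-up two-entry table, not a real character set mapping.
sample = """
0xE1 0xC1
226  194
"""
table = load_recode_table(sample)
print(b"\xe1\xe2abc".translate(table))  # b'\xc1\xc2abc'
```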
1146 <token>HostCharset</> records define the client character set by IP
1147 address. You can use a single IP address, an IP mask range starting
1148 from the given address or an IP interval (e.g., 127.0.0.1,
1149 192.168.1.100/24, 192.168.1.20-192.168.1.40).
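The three forms of <replaceable>host_spec</> can be told apart by their
punctuation. The sketch below uses Python's standard <literal>ipaddress</>
module; the function name <literal>host_matches</> is hypothetical, and
reading the address/mask form as <quote>every address in that network</> is an
assumption about the intended meaning, not a statement about the server code.

```python
import ipaddress


def host_matches(spec: str, client: str) -> bool:
    """Return True if `client` falls under a HostCharset host_spec."""
    addr = ipaddress.ip_address(client)
    if "-" in spec:  # interval: first-last
        first, last = (ipaddress.ip_address(part) for part in spec.split("-"))
        return first <= addr <= last
    if "/" in spec:  # address/mask
        # strict=False lets the address part carry host bits, as in 192.168.1.100/24.
        return addr in ipaddress.ip_network(spec, strict=False)
    return addr == ipaddress.ip_address(spec)  # single address


print(host_matches("127.0.0.1", "127.0.0.1"))
print(host_matches("192.168.1.100/24", "192.168.1.7"))
print(host_matches("192.168.1.20-192.168.1.40", "192.168.1.41"))
```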
1153 The <filename>charset.conf</> file is always processed up to the
1154 end, so you can easily specify exceptions from the previous
1155 rules. In the <filename>src/data/</> directory you will find an
1156 example <filename>charset.conf</> and a few recoding tables.
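Because the file is processed to the end, a later <token>HostCharset</> record
can act as an exception to an earlier, broader one. The Python sketch below
illustrates that <quote>last match wins</> reading. The configuration text,
the charset labels, and the table file name <filename>koi-win.tab</> are made
up for the example, the function names are hypothetical, and the interval form
of <replaceable>host_spec</> is left out for brevity.

```python
import ipaddress

# Hypothetical configuration; charset names are arbitrary labels.
CONF = """
BaseCharset koi
RecodeTable koi win koi-win.tab
HostCharset 192.168.1.0/24 win
HostCharset 192.168.1.5    koi   # exception to the rule above
"""


def parse_records(text):
    """Split the text into records (lines) of whitespace-separated items."""
    records = []
    for line in text.splitlines():
        items = line.split("#", 1)[0].split()  # drop comments, then split
        if items:
            records.append(items)
    return records


def client_charset(records, client_ip):
    """Return the charset of the last HostCharset record matching client_ip."""
    addr = ipaddress.ip_address(client_ip)
    chosen = None
    for keyword, *args in records:
        if keyword == "HostCharset":
            spec, charset = args
            if addr in ipaddress.ip_network(spec, strict=False):
                chosen = charset  # keep scanning: a later match overrides
    return chosen


records = parse_records(CONF)
print(client_charset(records, "192.168.1.9"))  # win
print(client_charset(records, "192.168.1.5"))  # koi
```

The sketch returns <literal>None</> for a host that matches no record; what
the server does in that case is not covered here.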
1160 As this solution is based on the client's IP address and character
1161 set mapping there are obviously some restrictions as well. You
1162 cannot use different encodings on the same host at the same
1163 time. It is also inconvenient when you boot your client hosts into
1164 multiple operating systems. Nevertheless, when these restrictions are
1165 not limiting and you do not need multibyte characters then it is a
1166 simple and effective solution.
1172 <!-- Keep this comment at the end of the file
1177 sgml-minimize-attributes:nil
1178 sgml-always-quote-attributes:t
1181 sgml-parent-document:nil
1182 sgml-default-dtd-file:"./reference.ced"
1183 sgml-exposed-tags:nil
1184 sgml-local-catalogs:("/usr/lib/sgml/catalog")
1185 sgml-local-ecat-files:nil