granicus.if.org Git - postgresql/blob - doc/src/sgml/charset.sgml

   1 <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.3 2000/09/30 16:58:20 petere Exp $ -->
   2
   3 <chapter id="charset">
   4  <title>Localization</>
   5
   6  <abstract>
   7   <para>
   8    Describes the available localization features from the point of
   9    view of the administrator.
  10   </para>
  11  </abstract>
  12
  13   <para>
  14    <productname>Postgres</productname> supports localization with
  15    three approaches:
  16
  17    <itemizedlist>
  18     <listitem>
  19      <para>
  20       Using the locale features of the operating system to provide
  21       locale-specific collation order, number formatting, and other
  22       aspects.
  23      </para>
  24     </listitem>
  25
  26     <listitem>
  27      <para>
  28       Using explicit multiple-byte character sets defined in the
  29       <productname>Postgres</productname> server to support languages
  30       that require more characters than will fit into a single byte,
  31       and to provide character set recoding between client and server.
  32       The number of supported character sets is fixed at the time the
  33       server is compiled, and internal operations such as string
  34       comparisons require expansion of each character into a 32-bit
  35       word.
  36      </para>
  37     </listitem>
  38
  39     <listitem>
  40      <para>
  41       Single byte character recoding provides a more light-weight
  42       solution for users of multiple, yet single-byte character sets.
  43      </para>
  44     </listitem>
  45    </itemizedlist>
  46   </para>
  47
  48
  49  <sect1 id="locale">
  50   <title>Locale Support</title>
  51
  52   <para>
  53    <firstterm>Locale</> support refers to an application respecting
  54    cultural preferences regarding alphabets, sorting, number
  55    formatting, etc.  <productname>PostgreSQL</> uses the standard ISO
  56    C and POSIX-like locale facilities provided by the server operating
  57    system.  For additional information refer the documentation of your
  58    system.
  59   </para>
  60
  61   <sect2>
  62    <title>Overview</>
  63
  64   <para>
  65     Locale support is not build into <productname>PostgreSQL</> by
  66     default; to enable it, supply the <option>--enable-locale</> option
  67     to the <filename>configure</> script:
  68 <informalexample>
  69 <screen>
  70 <prompt>$ </><userinput>./configure --enable-locale</>
  71 </screen>
  72 </informalexample>
  73     Locale support only affects the server; all clients are compatible
  74     with servers with or without locale support.
  75    </para>
  76
  77    <para>
  78     The information about which particular cultural rules to use is
  79     determined by standard environment variables.  If you are getting
  80     localized behavior from other programs you probably have them set
  81     up already.  The simplest way to set the localization information
  82     is the <envar>LANG</> variable, for example:
  83 <programlisting>
  84 export LANG=sv_SE
  85 </programlisting>
  86     This sets the locale to Swedish (<literal>sv</>) as spoken in
  87     Sweden (<literal>SE</>).  Other possibilities might be
  88     <literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada,
  89     French).  If more than one character set can be useful for a locale
  90     then the specifications look like this:
  91     <literal>cs_CZ.ISO8859-2</>. What locales are available under what
  92     names on your system depends on what was provided by the operating
  93     system vendor and what was installed.
  94    </para>
  95
  96    <para>
  97     Occasionally it is useful to mix rules from several locales, e.g.,
  98     use U.S. rules but Spanish messages.  To do that a set of
  99     environment variables exist that override the default of
 100     <envar>LANG</> for a particular category:
 101
 102     <informaltable>
 103      <tgroup cols="2">
 104       <tbody>
 105        <row>
 106         <entry>LC_COLLATE</>
 107         <entry>String sort order</>
 108        </row>
 109        <row>
 110         <entry>LC_CTYPE</>
 111         <entry>Character classification (What is a letter? What is the upper-case equivalent of this letter?)</>
 112        </row>
 113        <row>
 114         <entry>LC_MESSAGES</>
 115         <entry>Language of messages</>
 116        </row>
 117        <row>
 118         <entry>LC_MONETARY</>
 119         <entry>Formatting of currency amounts</>
 120        </row>
 121        <row>
 122         <entry>LC_NUMERIC</>
 123         <entry>Formatting of numbers</>
 124        </row>
 125        <row>
 126         <entry>LC_TIME</>
 127         <entry>Formatting of dates and times</>
 128        </row>
 129       </tbody>
 130      </tgroup>
 131     </informaltable>
 132
 133     <envar>LC_MESSAGES</> only affects the messages that come from the
 134     operating system, not <productname>PostgreSQL</>.
 135    </para>
 136
 137    <para>
 138     If you want the system to behave as if it had no locale support,
 139     use the special locale <literal>C</> or <literal>POSIX</>, or
 140     simply unset all locale related variables.
 141    </para>
 142
 143    <para>
 144     Once you have chosen a set of localization rules this way you must
 145     keep them fixed for any particular database cluster.  That means
 146     that the locales that were active when you ran <filename>initdb</>
 147     must be kept the same when you start the postmaster.  Otherwise,
 148     the changed sort order can corrupt indexes or make your data
 149     disappear mysteriously.  It is currently not possible to change the
 150     locales after database initialization or to use more than one set
 151     of locales for a given database cluster.
 152    </para>
 153   </sect2>
 154
 155   <sect2>
 156    <title>Benefits</>
 157
 158    <para>
 159     Locale support influences in particular the following features:
 160
 161     <itemizedlist>
 162      <listitem>
 163       <para>
 164        Sort order in <command>ORDER BY</> queries.
 165       </para>
 166      </listitem>
 167
 168      <listitem>
 169       <para>
 170        The <function>to_char</> family of functions
 171       </para>
 172      </listitem>
 173
 174      <listitem>
 175       <para>
 176        The <literal>LIKE</> and <literal>~</> operators for pattern
 177        matching
 178       </para>
 179      </listitem>
 180     </itemizedlist>
 181    </para>
 182
 183    <para>
 184     The only severe drawback of using the locale support in
 185     <productname>PostgreSQL</> is its speed.  So use locale only if you
 186     actually need it.
 187    </para>
 188   </sect2>
 189
 190   <sect2>
 191    <title>Problems</>
 192
 193    <para>
 194     If locale support doesn't work in spite of the explanation above,
 195     check that the locale support in your operating system is okay.
 196     To check whether a given locale is installed and functional you
 197     can use <application>Perl</>, for example.  Perl has also support
 198     for locales and if a locale is broken <command>perl -v</> will
 199     complain something like this:
 200 <screen>
 201 <prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
 202 <prompt>$</> <userinput>perl -v</>
 203 <computeroutput>
 204 perl: warning: Setting locale failed.
 205 perl: warning: Please check that your locale settings:
 206 LC_ALL = (unset),
 207 LC_CTYPE = "not_exist",
 208 LANG = (unset)
 209 are supported and installed on your system.
 210 perl: warning: Falling back to the standard locale ("C").
 211 </computeroutput>
 212 </screen>
 213    </para>
 214
 215    <para>
 216     Check that your locale files are in the right location.  Possible
 217     locations include: <filename>/usr/lib/locale</filename> (Linux,
 218     Solaris), <filename>/usr/share/locale</filename> (Linux),
 219     <filename>/usr/lib/nls/loc</filename> (DUX 4.0).  Check the locale
 220     man page of your system if you are not sure.
 221    </para>
 222
 223    <para>
 224     The directory <filename>src/test/locale</> contains a test suite
 225     for <productname>PostgreSQL</>'s locale support.
 226    </para>
 227   </sect2>
 228  </sect1>
 229
 230
 231   <sect1 id="multibyte">
 232    <title>Multibyte Support</title>
 233
 234    <note>
 235     <title>Author</title>
 236
 237     <para>
 238      <ulink url="mailto:ishii@postgresql.org">Tatsuo Ishii</ulink>,
 239      last updated 2000-03-22.
 240      Check <ulink
 241       url="http://www.sra.co.jp/people/t-ishii/PostgreSQL/">Tatsuo's
 242       web site</ulink> for more information.
 243     </para>
 244    </note>
 245
 246    <para>
 247     Multibyte (<acronym>MB</acronym>) support is intended to allow
 248     <productname>Postgres</productname> to handle
 249     multiple-byte character sets such as EUC (Extended Unix Code), Unicode and
 250     Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte
 251     character sets in regular expressions (regexp), LIKE, and some
 252     other functions. The default
 253     encoding system is selected while initializing your
 254     <productname>Postgres</productname> installation using
 255     <application>initdb</application>. Note that this can be
 256     overridden when you create a database using
 257     <application>createdb</application> or by using the SQL command
 258     CREATE DATABASE. So you can have multiple databases each with
 259     a different encoding system.
 260    </para>
 261
 262    <para>
 263     <acronym>MB</acronym> also fixes some problems concerning 8-bit single byte
 264     character sets including ISO8859. (I would not say all of problems
 265     have been fixed. I just confirmed that the regression test ran fine
 266     and a few French characters could be used with the patch. Please let
 267     me know if you find any problem while using 8-bit characters.)
 268    </para>
 269
 270    <sect2>
 271     <title>Enabling MB</title>
 272
 273     <para>
 274      Run configure with a multibyte option:
 275
 276      <programlisting>
 277 % ./configure --enable-multibyte[=<replaceable>encoding_system</replaceable>]
 278      </programlisting>
 279
 280      where <replaceable>encoding_system</replaceable> can be one of the
 281      values in the following table:
 282
 283      <table tocentry="1">
 284       <title><productname>Postgres</productname> Character Set Encodings</title>
 285       <titleabbrev>Encodings</titleabbrev>
 286       <tgroup cols="2">
 287        <thead>
 288         <row>
 289          <entry>Encoding</entry>
 290          <entry>Description</entry>
 291         </row>
 292        </thead>
 293        <tbody>
 294         <row>
 295          <entry>SQL_ASCII</entry>
 296          <entry>ASCII</entry>
 297         </row>
 298         <row>
 299          <entry>EUC_JP</entry>
 300          <entry>Japanese EUC</entry>
 301         </row>
 302         <row>
 303          <entry>EUC_CN</entry>
 304          <entry>Chinese EUC</entry>
 305         </row>
 306         <row>
 307          <entry>EUC_KR</entry>
 308          <entry>Korean EUC</entry>
 309         </row>
 310         <row>
 311          <entry>EUC_TW</entry>
 312          <entry>Taiwan EUC</entry>
 313         </row>
 314         <row>
 315          <entry>UNICODE</entry>
 316          <entry>Unicode(UTF-8)</entry>
 317         </row>
 318         <row>
 319          <entry>MULE_INTERNAL</entry>
 320          <entry>Mule internal</entry>
 321         </row>
 322         <row>
 323          <entry>LATIN1</entry>
 324          <entry>ISO 8859-1 English and some European languages</entry>
 325         </row>
 326         <row>
 327          <entry>LATIN2</entry>
 328          <entry>ISO 8859-2 English and some European languages</entry>
 329         </row>
 330         <row>
 331          <entry>LATIN3</entry>
 332          <entry>ISO 8859-3 English and some European languages</entry>
 333         </row>
 334         <row>
 335          <entry>LATIN4</entry>
 336          <entry>ISO 8859-4 English and some European languages</entry>
 337         </row>
 338         <row>
 339          <entry>LATIN5</entry>
 340          <entry>ISO 8859-5 English and some European languages</entry>
 341         </row>
 342         <row>
 343          <entry>KOI8</entry>
 344          <entry>KOI8-R</entry>
 345         </row>
 346         <row>
 347          <entry>WIN</entry>
 348          <entry>Windows CP1251</entry>
 349         </row>
 350         <row>
 351          <entry>ALT</entry>
 352          <entry>Windows CP866</entry>
 353         </row>
 354        </tbody>
 355       </tgroup>
 356      </table>
 357     </para>
 358
 359     <para>
 360      Here is an example of configuring
 361      <productname>Postgres</productname> to use a Japanese encoding by
 362      default:
 363
 364      <programlisting>
 365 % ./configure --enable-multibyte=EUC_JP
 366      </programlisting>
 367     </para>
 368
 369     <para>
 370      If the encoding system is omitted (./configure --enable-multibyte),
 371      SQL_ASCII is assumed.
 372     </para>
 373    </sect2>
 374
 375    <sect2>
 376     <title>Setting the Encoding</title>
 377
 378     <para>
 379      <application>initdb</application> defines the default encoding
 380      for a <productname>Postgres</productname> installation. For example:
 381
 382      <programlisting>
 383 % initdb -E EUC_JP
 384      </programlisting>
 385
 386      sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
 387      Note that you can use "--encoding" instead of "-E" if you prefer
 388      to type longer option strings.
 389      If no -E or --encoding option is given, the encoding
 390      specified at the compile time is used.
 391     </para>
 392
 393     <para>
 394      You can create a database with a different encoding:
 395
 396      <programlisting>
 397 % createdb -E EUC_KR korean
 398      </programlisting>
 399
 400      will create a database named "korean" with EUC_KR encoding. The
 401      another way to accomplish this is to use a SQL command:
 402
 403      <programlisting>
 404 CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
 405      </programlisting>
 406
 407      The encoding for a database is represented as an
 408      <firstterm>encoding column</firstterm>  in the
 409      <literal>pg_database</literal> system catalog.
 410      You can see that by using -l or \l of psql
 411      command.
 412
 413      <programlisting>
 414 $ psql -l
 415             List of databases
 416    Database    |  Owner  |   Encoding
 417 ---------------+---------+---------------
 418  euc_cn        | t-ishii | EUC_CN
 419  euc_jp        | t-ishii | EUC_JP
 420  euc_kr        | t-ishii | EUC_KR
 421  euc_tw        | t-ishii | EUC_TW
 422  mule_internal | t-ishii | MULE_INTERNAL
 423  regression    | t-ishii | SQL_ASCII
 424  template1     | t-ishii | EUC_JP
 425  test          | t-ishii | EUC_JP
 426  unicode       | t-ishii | UNICODE
 427 (9 rows)
 428      </programlisting>
 429     </para>
 430    </sect2>
 431
 432    <sect2>
 433     <title>Automatic encoding translation between backend and
 434      frontend</title>
 435
 436     <para>
 437      <productname>Postgres</productname> supports an automatic
 438      encoding translation between backend
 439      and frontend for some encodings.
 440
 441      <table tocentry="1">
 442       <title><productname>Postgres</productname> Client/Server Character Set Encodings</title>
 443       <titleabbrev>Communication Encodings</titleabbrev>
 444       <tgroup cols="2">
 445        <thead>
 446         <row>
 447          <entry>Server Encoding</entry>
 448          <entry>Available Client Encodings</entry>
 449         </row>
 450        </thead>
 451        <tbody>
 452         <row>
 453          <entry>EUC_JP</entry>
 454          <entry>EUC_JP, SJIS</entry>
 455         </row>
 456         <row>
 457          <entry>EUC_TW</entry>
 458          <entry>EUC_TW, BIG5</entry>
 459         </row>
 460         <row>
 461          <entry>LATIN2</entry>
 462          <entry>LATIN2, WIN1250</entry>
 463         </row>
 464         <row>
 465          <entry>LATIN5</entry>
 466          <entry>LATIN5, WIN, ALT</entry>
 467         </row>
 468         <row>
 469          <entry>MULE_INTERNAL</entry>
 470          <entry>EUC_JP, SJIS, EUC_KR, EUC_CN,
 471           EUC_TW, BIG5, LATIN1 to LATIN5,
 472           WIN, ALT, WIN1250</entry>
 473         </row>
 474        </tbody>
 475       </tgroup>
 476      </table>
 477     </para>
 478
 479     <para>
 480      To enable the automatic encoding translation, you have to tell
 481      <productname>Postgres</productname> the encoding you would like
 482      to use in frontend. There are
 483      several ways to accomplish this.
 484
 485      <itemizedlist>
 486       <listitem>
 487        <para>
 488         Using the <command>\encoding</command> command in
 489         <application>psql</application>.
 490         <command>\encoding</command> allows you to change frontend
 491         encoding on the fly. For
 492         example, to change the encoding to SJIS, type:
 493
 494         <programlisting>
 495 \encoding SJIS
 496         </programlisting>
 497        </para>
 498       </listitem>
 499
 500       <listitem>
 501        <para>
 502         Using libpq functions.
 503         <command>\encoding</command> actually calls
 504         PQsetClientEncoding() for its purpose.
 505
 506         <programlisting>
 507 int PQsetClientEncoding(PGconn *<replaceable>conn</replaceable>, const char *<replaceable>encoding</replaceable>)
 508         </programlisting>
 509
 510         where <replaceable>conn</replaceable> is a connection to the backend,
 511         and <replaceable>encoding</replaceable> is an encoding you
 512         want to use. If it successfully sets the encoding, it returns 0,
 513         otherwise -1. The current encoding for this connection can be shown by
 514         using:
 515
 516         <programlisting>
 517 int PQclientEncoding(const PGconn *<replaceable>conn</replaceable>)
 518         </programlisting>
 519
 520         Note that it returns the "encoding id," not the encoding symbol string
 521         such as "EUC_JP." To convert an encoding id to an encoding symbol, you
 522         can use:
 523
 524         <programlisting>
 525 char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>)
 526         </programlisting>
 527        </para>
 528       </listitem>
 529
 530       <listitem>
 531        <para>
 532         Using <envar>PGCLIENTENCODING</envar>.
 533
 534         If an environment variable <envar>PGCLIENTENCODING</envar> is defined in the
 535         frontend, an automatic encoding translation is done by the backend.
 536        </para>
 537       </listitem>
 538
 539       <listitem>
 540        <para>
 541         Using <command>SET CLIENT_ENCODING TO</command>.
 542
 543         Setting the frontend side encoding can be done a SQL command:
 544
 545         <programlisting>
 546 SET CLIENT_ENCODING TO 'encoding';
 547         </programlisting>
 548
 549         Also you can use SQL92 syntax "SET NAMES" for this purpose:
 550
 551         <programlisting>
 552 SET NAMES 'encoding';
 553         </programlisting>
 554
 555         To query the current the frontend encoding:
 556
 557         <programlisting>
 558 SHOW CLIENT_ENCODING;
 559         </programlisting>
 560
 561         To return to the default encoding:
 562
 563         <programlisting>
 564 RESET CLIENT_ENCODING;
 565         </programlisting>
 566        </para>
 567       </listitem>
 568      </itemizedlist>
 569     </para>
 570    </sect2>
 571
 572    <sect2>
 573     <title>About Unicode</title>
 574
 575     <para>
 576      An automatic encoding translation between Unicode and other
 577      encodings is not yet supported.
 578     </para>
 579    </sect2>
 580
 581    <sect2>
 582     <title>What happens if the translation is not possible?</title>
 583
 584     <para>
 585      Suppose you choose EUC_JP for the backend, LATIN1 for the frontend,
 586      then some Japanese characters could not be translated into LATIN1. In
 587      this case, a letter cannot be represented in the LATIN1 character set,
 588      would be transformed as:
 589
 590      <programlisting>
 591 (HEXA DECIMAL)
 592      </programlisting>
 593     </para>
 594    </sect2>
 595
 596    <sect2>
 597     <title>References</title>
 598
 599     <para>
 600      These are good sources to start learning various kind of encoding
 601      systems.
 602
 603      <itemizedlist>
 604       <listitem>
 605        <para>
 606         <ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf">
 607          ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf</ulink>
 608         Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
 609         appear in section 3.2.
 610        </para>
 611       </listitem>
 612
 613       <listitem>
 614        <para>
 615         Unicode: <ulink url="http://www.unicode.org/">http://www.unicode.org/</ulink>
 616         The homepage of UNICODE.
 617        </para>
 618       </listitem>
 619
 620       <listitem>
 621        <para>
 622         <literal>RFC 2044</literal>
 623         UTF-8 is defined here.
 624        </para>
 625       </listitem>
 626      </itemizedlist>
 627     </para>
 628    </sect2>
 629
 630    <sect2>
 631     <title>History</title>
 632
 633     <para>
 634      <programlisting>
 635 May 20, 2000
 636         * SJIS UDC (NEC selection IBM kanji) support contributed
 637           by Eiji Tokuya
 638         * Changes above will appear in 7.0.1
 639
 640 Mar 22, 2000
 641         * Add new libpq functions PQsetClientEncoding, PQclientEncoding
 642         * ./configure --with-mb=EUC_JP
 643           now deprecated. use
 644           ./configure --enable-multibyte=EUC_JP
 645           instead
 646         * Add SQL_ASCII regression test case
 647         * Add SJIS User Defined Character (UDC) support
 648         * All of above will appear in 7.0
 649
 650 July 11, 1999
 651         * Add support for WIN1250 (Windows Czech) as a client encoding
 652           (contributed by Pavel Behal)
 653         * fix some compiler warnings (contributed by Tomoaki Nishiyama)
 654
 655 Mar 23, 1999
 656         * Add support for KOI8(KOI8-R), WIN(CP1251), ALT(CP866)
 657           (thanks Oleg Broytmann for testing)
 658         * Fix problem with MB and locale
 659
 660 Jan 26, 1999
 661         * Add support for Big5 for fronend encoding
 662           (you need to create a database with EUC_TW to use Big5)
 663         * Add regression test case for EUC_TW
 664           (contributed by <ulink url="mailto:jonahkuo@mail.ttn.com.tw">Jonah Kuo</ulink>)
 665
 666 Dec 15, 1998
 667         * Bugs related to SQL_ASCII support fixed
 668
 669 Nov 5, 1998
 670         * 6.4 release. In this version, pg_database has "encoding"
 671           column that represents the database encoding
 672
 673 Jul 22, 1998
 674         * determine encoding at initdb/createdb rather than compile time
 675         * support for PGCLIENTENCODING when issuing COPY command
 676         * support for SQL92 syntax "SET NAMES"
 677         * support for LATIN2-5
 678         * add UNICODE regression test case
 679         * new test suite for MB
 680         * clean up source files
 681
 682 Jun 5, 1998
 683         * add support for the encoding translation between the backend
 684           and the frontend
 685         * new command SET CLIENT_ENCODING etc. added
 686         * add support for LATIN1 character set
 687         * enhance 8 bit cleaness
 688
 689 April 21, 1998 some enhancements/fixes
 690         * character_length(), position(), substring() are now aware of
 691           multi-byte characters
 692         * add octet_length()
 693         * add --with-mb option to configure
 694         * new regression tests for EUC_KR
 695           (contributed by <ulink url="mailto:hong@lunaris.hanmesoft.co.kr">Soonmyung. Hong</ulink>)
 696         * add some test cases to the EUC_JP regression test
 697         * fix problem in regress/regress.sh in case of System V
 698         * fix toupper(), tolower() to handle 8bit chars
 699
 700 Mar 25, 1998 MB PL2 is incorporated into PostgreSQL 6.3.1
 701
 702 Mar 10, 1998 PL2 released
 703         * add regression test for EUC_JP, EUC_CN and MULE_INTERNAL
 704         * add an English document (this file)
 705         * fix problems concerning 8-bit single byte characters
 706
 707 Mar 1, 1998 PL1 released
 708      </programlisting>
 709     </para>
 710    </sect2>
 711
 712    <sect2>
 713     <title>WIN1250 on Windows/ODBC</title>
 714
 715     <para>
 716 <!--
 717 [Here is a good documentation explaining how to use WIN1250 on
 718 Windows/ODBC from Pavel Behal. Please note that Installation step 1)
 719 is not necceary in 6.5.1 - Tatsuo]
 720
 721 Version: 0.91 for PgSQL 6.5
 722 Author: Pavel Behal
 723 Revised by: Tatsuo Ishii
 724 Email: <ulink url="mailto:behal@opf.slu.cz">behal@opf.slu.cz</ulink>
 725 Licence: The Same as PostgreSQL
 726
 727 Sorry for my Eglish and C code, I'm not native :-)
 728
 729 !!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 730 -->
 731
 732      The WIN1250 character set on Windows client platforms can be used
 733      with <productname>Postgres</productname> with locale support
 734      enabled.
 735     </para>
 736
 737     <para>
 738      The following should be kept in mind:
 739
 740      <itemizedlist>
 741       <listitem>
 742        <para>
 743         Success depends on proper system locales. This has been tested
 744         with RH6.0 and Slackware 3.6, with cs_CZ.iso8859-2 locale.
 745        </para>
 746       </listitem>
 747
 748       <listitem>
 749        <para>
 750         Never try to set the server multibyte database encoding to WIN1250.
 751         Always use LATIN2 instead since there is not a WIN1250 locale
 752         in Unix.
 753        </para>
 754       </listitem>
 755
 756       <listitem>
 757        <para>
 758         WIN1250 encoding is useable only for M$W ODBC clients. The
 759         characters are recoded on the fly, to be displayed and stored
 760         back properly.
 761        </para>
 762       </listitem>
 763      </itemizedlist>
 764     </para>
 765
 766     <para>
 767      When running, it is important to remember the following:
 768
 769      <itemizedlist>
 770       <listitem>
 771        <para>
 772         This configuration reorders your sort order depending on your
 773         <envar>LC_<replaceable>x</replaceable></envar> settings. Don't be
 774         confused with the regression test results since they don't use
 775         locale.
 776        </para>
 777       </listitem>
 778
 779       <listitem>
 780        <para>
 781         A locale such as "<literal>ch</literal>" is correctly sorted
 782         only if your system
 783         supports that locale; older systems may not do so but new ones
 784         (e.g. RH6.0) do.
 785        </para>
 786       </listitem>
 787
 788       <listitem>
 789        <para>
 790         You have to insert money as '<literal>162,50</literal>' (note
 791         comma within the single-quotes).
 792        </para>
 793       </listitem>
 794
 795       <listitem>
 796        <para>
 797         At the time of writing (early 1999), this configuration has
 798         not received extensive testing. Please let us know of any
 799         changes you had to make!
 800        </para>
 801       </listitem>
 802      </itemizedlist>
 803     </para>
 804
 805     <procedure>
 806      <title>WIN1250 on Windows/ODBC</title>
 807      <step>
 808       <para>
 809        Change the three relevant files in the source directories.
 810       </para>
 811      </step>
 812
 813      <step>
 814       <para>
 815        Compile <productname>Postgres</productname> with local enabled
 816        and the multibyte encoding set to <literal>LATIN2</literal>.
 817       </para>
 818      </step>
 819
 820      <step>
 821       <para>
 822        Set up your instalation. Do not forget to create locale
 823        variables in your profile (environment). For example (this may
 824        not be correct for <emphasis>your</emphasis> environment):
 825
 826        <programlisting>
 827 LC_ALL=cs_CZ.ISO8859-2
 828 LC_COLLATE=cs_CZ.ISO8859-2
 829 LC_CTYPE=cs_CZ.ISO8859-2
 830 LC_MONETARY=cs_CZ.ISO8859-2
 831 LC_NUMERIC=cs_CZ.ISO8859-2
 832 LC_TIME=cs_CZ.ISO8859-2
 833        </programlisting>
 834       </para>
 835      </step>
 836
 837      <step>
 838       <para>
 839        You have to start the postmaster with locales set!
 840       </para>
 841      </step>
 842
 843      <step>
 844       <para>
 845        Try it with Czech language, and have it sort on a query.
 846       </para>
 847      </step>
 848
 849      <step>
 850       <para>
 851        Install ODBC driver for PgSQL on your M$ Windows machine.
 852       </para>
 853      </step>
 854
 855      <step>
 856       <para>
 857        Setup properly your data source. Include this line in your ODBC
 858        configuration dialog in the field <literal>Connect Settings</literal>:
 859
 860        <programlisting>
 861 SET CLIENT_ENCODING = 'WIN1250';
 862        </programlisting>
 863       </para>
 864      </step>
 865
 866      <step>
 867       <para>
 868        Now try it again, but in Windows with ODBC.
 869       </para>
 870      </step>
 871     </procedure>
 872    </sect2>
 873   </sect1>
 874
 875
 876  <sect1 id="recode">
 877   <title>Single-byte character set recoding</>
 878 <!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->
 879
 880   <para>
 881    You can set up this feature with the <option>--enable-recode</> option
 882    to <filename>configure</>. This option was formerly described as
 883    <quote>Cyrillic recode support</> which doesn't express all its
 884    power. It can be used for <emphasis>any</> single-byte character
 885    set recoding.
 886   </para>
 887
 888   <para>
 889    This method uses a file <filename>charset.conf</> file located in
 890    the database directory (<envar>PGDATA</>).  It's a typical
 891    configuration text file where spaces and newlines separate items
 892    and records and # specifies comments.  Three keywords with the
 893    following syntax are recognized here:
 894 <synopsis>
 895 BaseCharset      <replaceable>server_charset</>
 896 RecodeTable      <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
 897 HostCharset      <replaceable>host_spec</>    <replaceable>host_charset</>
 898 </synopsis>
 899   </para>
 900
 901   <para>
 902    <token>BaseCharset</> defines the encoding of the database server.
 903    All character set names are only used for mapping inside of
 904    <filename>charset.conf</> so you can freely use typing-friendly
 905    names.
 906   </para>
 907
 908   <para>
 909    <token>RecodeTable</> records specify translation tables between
 910    server and client.  The file name is relative to the
 911    <envar>PGDATA</> directory.  The table file format is very
 912    simple. There are no keywords and characters are represented by a
 913    pair of decimal or hexadecimal (0x prefixed) values on single
 914    lines:
 915 <synopsis>
 916 <replaceable>char_value</>   <replaceable>translated_char_value</>
 917 </synopsis>
 918   </para>
 919
 920   <para>
 921    <token>HostCharset</> records define the client character set by IP
 922    address. You can use a single IP address, an IP mask range starting
 923    from the given address or an IP interval (e.g., 127.0.0.1,
 924    192.168.1.100/24, 192.168.1.20-192.168.1.40).
 925   </para>
 926
 927   <para>
 928    The <filename>charset.conf</> file is always processed up to the
 929    end, so you can easily specify exceptions from the previous
 930    rules. In the src/data you will find charset.conf example and a few
 931    recoding tables.
 932   </para>
 933
 934   <para>
 935    As this solution is based on the client's IP address and character
 936    set mapping there are obviously some restrictions as well. You
 937    cannot use different encodings on the same host at the same
 938    time. It is also inconvenient when you boot your client hosts into
 939    more operating systems.  Nevertheless, when these restrictions are
 940    not limiting and you do not need multi-byte characters than it is a
 941    simple and effective solution.
 942   </para>
 943  </sect1>
 944
 945 </chapter>
 946
 947 <!-- Keep this comment at the end of the file
 948 Local variables:
 949 mode:sgml
 950 sgml-omittag:nil
 951 sgml-shorttag:t
 952 sgml-minimize-attributes:nil
 953 sgml-always-quote-attributes:t
 954 sgml-indent-step:1
 955 sgml-indent-data:t
 956 sgml-parent-document:nil
 957 sgml-default-dtd-file:"./reference.ced"
 958 sgml-exposed-tags:nil
 959 sgml-local-catalogs:("/usr/lib/sgml/catalog")
 960 sgml-local-ecat-files:nil
 961 End:
 962 -->