From: François Pinard Date: Fri, 29 Nov 2013 04:26:34 +0000 (-0500) Subject: README.org: New name and format for README. X-Git-Tag: v3.7~187 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=84e81343760e14ea41c12a1143cf3962686ebf94;p=recode README.org: New name and format for README. Also, document the planned future of Recode. --- diff --git a/README b/README deleted file mode 100644 index 4f561a2..0000000 --- a/README +++ /dev/null @@ -1,492 +0,0 @@ -.. role:: code(strong) -.. role:: file(literal) - -====================== -README file for Recode -====================== - -.. contents:: -.. sectnum:: - -Introduction -============ - -What is Recode? ---------------- - -Here is version 3.6 for the Recode program and library. Hereafter, -Recode means the whole package, :code:`recode` means the executable -program. Glance through this :file:`README` file before starting -configuration. Make sure you read files :file:`ABOUT-NLS` and -:file:`INSTALL` if you are not familiar with them already. - -The Recode library converts files between character sets and usages. -It recognises or produces over 200 different character sets (or about -300 if combined with an :code:`iconv` library) and transliterates files -between almost any pair. When exact transliteration are not possible, -it gets rid of offending characters or falls back on approximations. -The :code:`recode` program is a handy front-end to the library. - -The Recode program and library have been written by François Pinard, -yet it significantly reuses tabular works from Keld Simonsen. It is an -evolving package, and specifications might change in future releases. - -On various Unix systems, Recode is usually compiled from sources, -see the `Installation`_ section below. On Linux, it often comes -bundled. Recode had been ported to other popular systems. See both -`contrib/README`__ and the `Non-Unix ports`_ section below, to find -some more information about these. - -__ contrib.html - -Reports and collaboration -------------------------- - -Send bug reports to mailto:recode-bugs@iro.umontreal.ca' . A bug -report is an adequate description of the problem: your input, what you -expected, what you got, and why this is wrong. Diffs are welcome, but -they only describe a solution, from which the problem might be uneasy -to infer. If needed, submit actual data files with your report. Small -data files are preferred. Big files may sometimes be necessary, but do -not send them on the mailing list; rather take special arrangement with -the maintainer. - -Your feedback will help us to make a better and more portable -package. Consider documentation errors as bugs, and report them -as such. If you develop anything pertaining to Recode or have -suggestions, let us know and share your findings by writing at -mailto:recode-forum@iro.umontreal.ca . You may also choose to directly -write at mailto:pinard@iro.umontreal.ca, yet be warned that such -correspondence is often visible for a while through the Recode Web site. - -If you feel like receiving releases and pretest announcements for the -Recode package, send a message to mailto:majordomo@iro.umontreal.ca -having, in its body, a line saying:: - - subscribe recode-announce - -If you rather want to participate actively in discussions, pretesting -and development for Recode, do just as above, but this time, use:: - - subscribe recode-forum - -Visit http://recode.progiciels-bpi.ca/ for releases or pretests, and related -files. In particular, button ``Browse`` gives access to a weekly mirror -of the current unpackaged work files, while button ``Folders`` gives -access to saved or pending correspondence. - -Please *do not* widely redistribute releases having a letter after the -version numbers, as these are meant for pretesting only, and might not -be stable enough for other usages. - -Development plan ----------------- - -My plan has long been to end the 3.x series of this package, rather -aiming 4.0 as a major internal rewrite. As there is still a long -way before 4.0 gets ready, and *especially* because some of my good -collaborators insisted that I do so, there will be a Recode 3.7. That -release is meant to provide a selection of user-contributed patches. - -For prototyping what Recode will become and experimenting new concepts -more easily, I created a subsidiary and standalone project named -Recodec, meant to receive the best part of my development efforts in -this particular area. Once I'll be happy with the prototype, the plan -is to rewrite it from Python to C, somehow. Visit the Web pages for -this `Recodec project`__ for more information and details. For now at -least, new features go to Recodec only. - -__ http://recodec.progiciels-bpi.ca - -Notes for version 3.7-beta2 ---------------------------- - -Here are a few notes related to the beta2 pre-test release for the incoming -Recode 3.7. I publish it to ease later exchanges of patches with testers. - -+ The name has been changed from Free recode to Recode -- as "Free" was - a four letter word to some people :-). :code:`recode` (no capital) still - names the executable program specifically, or the distribution archive - itself. - -+ Recode does not itself include :code:`libiconv` anymore. However, - it uses an external :code:`iconv` library if one is available at - installation time, like :code:`libiconv` or the one provided within GNU - :code:`libc`. The ``-x:`` option to the program, or a new flag to the - library :code:`recode_new_outer` function, inhibits the initialisation - and usage of :code:`iconv`. - -+ The bug about loosing a few characters, here and there, when recoding - big files in :code:`iconv` context, seems to have been corrected. A - patch for this problem has been floating around for years, but it was - not solving all cases. - -+ Recode installation now uses Python. In particular, it creates - file :file:`build/src/iconvdecl.h` from local ``iconv -l`` output. - Recode testing through ``make check`` also needs what people - :code:`python-devel`, providing C header files for Python and - :code:`distutils`. The :file:`Makemore` file has been merged within - regular Makefiles and is not distributed separately anymore. - -+ It is likely that new bugs have been introduced through the above - changes. In particular, not everything is cosy on the side of release - engineering. A few files are either spuriously remade, or remade late. - I'm a bit surprised by the difficulty to get this right. - -+ ``make check`` accepts a ``LIMIT=`` option, for limiting tests to one or - a few cases. See :file:`tests/Makefile` for more information. - -+ PO files have been updated from the Translation Project. - -Notes for version 3.7-beta1 ---------------------------- - -The beta 1 pre-test release for the incoming Recode 3.7 has been made -available for those needing it right away. While it solves some serious -bugs and portability problems, others are meant to be addressed only in -later pre-tests. In particular, none of charset or surface issues, user -requests, and various suggestions appear in this pre-test, and will not -either in later pretests, until all real show-stoppers are solved first. -So this is in no way a candidate for a Recode 3.7 release. - -The test suite is worth more comments: - -+ The suite is very partial, and may not be thought as a validation - suite. Before it could be used to ascertain confidence, it would need - much more tests than it has already. - -+ Testing is notably more speedy than it used to be. For example, the - previous :code:`bigauto` test, which was not run by default because it - ran for too long, is now executed within the standard test suite, once - in non-strict mode, and a second time in strict mode. - -+ It does not use Autotest anymore, but rather a home grown test driver - much inspired from the Codespeak project. The link between the test and - the Recode library is established through a Pyrex interface, so you need - to have :code:`python` and :code:`python-devel` installed first. - -+ Beware that the Pyrex interface to the Recode library is only meant - for testing, for now at least. While you may play with it, it would not - be wise relying on it, as the specifications might change at any time. - -Installation -============ - -Prerequisites -------------- - -Simple installation of Recode requires the usual tools and facilities as -those needed for most GNU packages. If not already bundled with your -system, you also need to pre-install Python, version 2.2 or better. You -may get it from: - - http://www.python.org - -It is also convenient to have some :code:`iconv` library already present -on your system, this much extends Recode capabilities, especially in -the area of Asiatic character sets. GNU :code:`libc`, as found on -Linux systems and a few others, already has such an :code:`iconv` -library. Otherwise, you might consider pre-installing the portable -:code:`libiconv`, written by Bruno Haible. You may get it from: - - http://www.gnu.org/software/libiconv/ - -Getting a release ------------------ - -Source files and various distributions (either latest, prestest, or -archive) are available through: - - https://github.com/pinard/Recode/ - -File timestamps after checkout may trigger Make difficulties. As a way -to avoid these, from the top level of the distribution, execute ``sh -after-patch.sh`` before configuring. If you miss either :code:`sh` or -GNU :code:`touch`, try ``python after-patch.py`` instead. - -Configure options ------------------ - -Once you have an unpacked distribution, see files: - - =================== ======================================================= - File name Description - =================== ======================================================= - :file:`ABOUT-NLS` how to customise this program to your language - :file:`COPYING` copying conditions for the program - :file:`COPYING.LIB` copying conditions for the library - :file:`INSTALL` compilation and installation instructions - :file:`NEWS` major changes in the current release - :file:`THANKS` partial list of contributors - =================== ======================================================= - -Besides those configure options documented in files :file:`INSTALL` -and :file:`ABOUT-NLS`, a few extra options may be accepted after -``./configure``: - -+ Options ``--disable-shared`` or ``--disable-static`` - - to inhibit the building of shared libraries or static libraries; the - default is to always build static libraries, and to attempt building - shared libraries if there is some known recipe for this. - -+ Option ``--with-gnu-ld`` - - to force the assumption that the C compiler uses GNU ld. - -+ Option ``--with-dmalloc`` - - to trigger a debugging feature for looking at memory management - problems, it pre-requires Gray Watson's package, which is available as - ftp://ftp.letters.com/src/dmalloc/dmalloc.tar.gz . - -Maintenance tools ------------------ - -For simple modifications to Recode, you should not need special tools -beyond those usual for installing GNU packages. However, if you modify -any :file:`.l` source file, Python and Flex are both needed for remaking -:file:`merged.c`. - -For more comprehensive modifications, you might need more tools. If not -done already, make sure you have a copy of the packages listed in the -following table. You may also choose to establish a link in your build -:file:`doc/` directory, as explained within :file:`doc/Makemore`. - - ================ ========== ========== ============= - Package name Current Minimum Install after - ================ ========== ========== ============= - :code:`autoconf` 2.61 2.12 :code:`m4` - :code:`automake` 1.10 1.9 :code:`Perl` - :code:`Flex` 2.5.33 2.5.4a - :code:`gettext` 0.16 0.16 - :code:`Help2man` 1.36 1.020 :code:`Perl` - :code:`libtool` 1.5.24 1.3.4 - :code:`m4` 1.4.10 1.4n - :code:`Make` 3.81 - :code:`Perl` 5.8.8 5.005.03 - :code:`Python` 2.5.1 2.2 - :code:`tar` 1.17 1.12 - :code:`wget` 1.10.2 - ================ ========== ========== ============= - -The *current* version numbers just happen to be those used for -development, it is often likely that older versions would work just as -well. The *minimum* version numbers were once acceptable, they might -not be anymore, this has not been verified; any updating information is -welcome! - -Installation hints ------------------- - -Here are a few hints which might help installing Recode on some systems. -Many may be applied by temporary presetting environment variables while -calling ``./configure``. File :file:`INSTALL` explains this. - -+ Compilation time - - Some C compilers, like Apollo's, have a hard time compiling - :file:`merged.c`. If this is your case, avoid compiler - optimisation. From within the Bourne shell, you may use:: - - CFLAGS= ./configure - - But if you want to give a real hard time to your C optimiser on - :file:`merged.c`, to get code that runs only a bit faster, merely - try:: - - CPPFLAGS=-DINLINE_HARDER ./configure - -+ Smallish systems - - For 80286 based systems (do some still exist?!), it has been - reported that some compilers generate wrong code while optimising - for *small* models. So, from within the Bourne shell, do:: - - CFLAGS=-Ml LDFLAGS=-Ml ./configure - - to force large memory model. For 80286 Xenix compiler, the last time - it was tried a while ago, one ought to use:: - - CFLAGS='-Ml -F2000' LDFLAGS=-Ml ./configure - - Other systems have poor :code:`pipe`/:code:`popen` support or thrash - heavily when processes fork. In this case, just before doing - ``make``, edit :file:`config.h` and ensure :code:`HAVE_PIPE` is - *not* defined. - -External pointers -================= - -Documentation -------------- - -+ IETF references - - + Character Mnemonics & Character Sets - - + ftp://nic.ddn.mil/rfc/rfc1345.txt - - Keld Simonsen , 1992-06. - - + UTF-7 - A Mail-Safe Transformation Format of Unicode - - + ftp://nic.ddn.mil/rfc/rfc1642.txt - - David Goldsmith and Mark Davis - , 1994-07. - - + UTF-8, a transformation format of Unicode and ISO 10646 - - + ftp://nic.ddn.mil/rfc/rfc2044.txt - - François Yergeau , 1997-10. - -+ Various references - - + Unicode charset mappings - - + ftp://ftp.unicode.org:/Public/MAPPINGS/ - - The Unicode consortium makes available plenty of charset mappings - for converting "legacy" charsets to Unicode. - - + Normalisation et internationalisation: Inventaire et prospectives des - normes clefs pour le traitement informatique du français. (392p.) - - This is a report, written in French, discussing charset issues and many - other topics as well. Laurent Bourbeau - and François Pinard , 1995-10. - - + ftp://ftp.iro.umontreal.ca/pub/contrib/pinard/accents/oqil-tome1.ps.gz - + http://www.ceveil.qc.ca/Normes - -+ Recode specific - - + ETL presentation - - In 1999, the organisers of the `m17n99 conference`__ in Tsukuba, - Japan, were kind enough to invite me. This has been for me a - fabulous trip and experience, and I met many extraordinary people in - there. At the conference, I presented the Translation Project, and - Recode. The Recode `presentation slides`__ are available. - -__ http://www.m17n.org/conference/m17n99_all_but_registration/welcome.en.html -__ m17n99.html - -Programs --------- - -+ :code:`libiconv` - - This comprehensive charset converter library revolves around Unicode, - and support Asian encodings among many others. Even Recode uses it! - - + http://www.gnu.org/software/libiconv/ - - Bruno Haible - -+ :code:`tcs` - - Here is the main recoding tool from the Plan9 project. - - + ftp://research.att.com/dist/tcs.shar.Z - -+ :code:`yuedit` - - This GUI editor handles many encodings, among which UTF-8. It also - installs uniconv, a recoding program, and uniprint, a printing tool. - - + ftp://sunsite.unc.edu/pub/Linux/apps/editors/X/yudit-1.2.tar.gz - - Gaspar Sinai , 1999-01. - -+ :code:`ucs-fonts` - - These 6x13 fonts, covering Unicode characters besides the Asian sets, - merely replace the Linux fixed 6x13 font. Works nicely with yudit. - - + http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz - - Markus Kuhn , 1998-11. - -+ :code:`MtRecode` - - This charset converter is oriented towards SGML text manipulation. It - may be freely downloaded for non-commercial, non-military use from: - - + http://www.lpl.univ-aix.fr/projects/multext/MtRecode/ - - Pointer given by Jean Véronis , 1996-06. - -+ :code:`sp` - - This quite nice SGML structure analyser contains internal C++ modules - for handling many charsets. - - + ftp://ftp.jclark.com/pub/sp/sp-1.3.tar.gz - - James Clark - -+ :code:`b2c` - - This program is able to generate interpreted - character dumps, but properly embedded within complete C header files. - - + http://research.de.uu.net:8080/~gnu/b2c/b2c-2.1.tar.gz - - Jörg Heitkötter , 1997-11. - -+ :code:`PyRecode` - - This wrapper provides Recode functionality to Python programs. - - + http://www.suxers.de/PyRecode.tgz - - Andreas Jung - - Also see: - - + http://www.vex.net/parnassus/apyllo.py?find=recode - + http://www.suxers.de/python/pyrecode.htm - -Non-Unix ports --------------- - -Please mailto:recode-bugs@iro.umontreal.ca if you are aware of various -ports to non-Unix systems not listed here, or for corrections. Please -provide the goal system, a complete and stable URL, the maintainer name -and address, the Recode version used as a base, and your comments. - -+ MSDOS (DJGPP) - - Juan Manuel Guerrero maintains this port, - dated 2001-03 and based on Recode 3.5. The following archives hold - binaries, docs and sources respectively. - - + ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/rcode35b.zip - + ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/rcode35d.zip - + ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/rcode35s.zip - - See `contrib/DJGPP/README`__ in the Recode distribution for more - information about compiling this port. - - __ /DJGPP.html - -+ MSDOS (Gnuish) - - Darrel Hankerson maintains this port, dated - 1994-11 and based on Recode 3.4. You get many GNU tools, not only - Recode. The GNUish project is described in :file:`gnuish_t.htm`. - - + http://www.simtel.net/simtel.net/ - + http://www.leo.org/pub/comp/platforms/pc/gnuish (Germany) - + ftp://ftp.simtel.net/simtelnet/gnu - + ftp://ftp.leo.org/pub/comp/platforms/pc/gnuish - -+ OS/2 (using emx/gcc) - - Maintainer unknown (maybe Kai Uwe Rommel ), dated - 1994-11 and based on Recode 3.4. - - + http://hobbes.nmsu.edu/pub/os2/util/convert/gnurcode.zip diff --git a/README.org b/README.org new file mode 100644 index 0000000..ba51c57 --- /dev/null +++ b/README.org @@ -0,0 +1,482 @@ +#+TITLE: README for Recode +#+OPTIONS: H:2 toc:2 + +* Introduction +** What is Recode? +Here is version 3.6 for the Recode program and library. Hereafter, +Recode means the whole package, *recode* means the executable program. +Glance through this =README= file before starting configuration. Make +sure you read files =ABOUT-NLS= and =INSTALL= if you are not familiar with +them already. + +The Recode library converts files between character sets and usages. +It recognises or produces over 200 different character sets (or about +300 if combined with an *iconv* library) and transliterates files +between almost any pair. When exact transliteration are not possible, +it gets rid of offending characters or falls back on approximations. +The *recode* program is a handy front-end to the library. + +The Recode program and library have been written by François Pinard, +yet it significantly reuses tabular works from Keld Simonsen. It is +an evolving package, and specifications might change in future +releases. + +On various Unix systems, Recode is usually compiled from sources, see +the [[Installation]] section below. On Linux, it often comes bundled. +Recode had been ported to other popular systems. See both +[[http:/contrib.html][contrib/README]] and the [[Non-Unix ports]] section below, to find some more +information about these. + +** Reports and collaboration +Send bug reports to [[mailto:recode-bugs@iro.umontreal.ca][this address]], or if you are comfortable with +GitHub facilities, through [[https://github.com/pinard/Recode/issues][GitHub Issues]]. A bug report is an adequate +description of the problem: your input, what you expected, what you +got, and why this is wrong. Diffs are welcome, but they only describe +a solution, from which the problem might be uneasy to infer. If +needed, submit actual data files with your report. Small data files +are preferred. Big files may sometimes be necessary, but do not send +them on the mailing list; rather take special arrangement with the +maintainer. + +Your feedback will help us to make a better and more portable package. +Consider documentation errors as bugs, and report them as such. If +you develop anything pertaining to Recode or have suggestions, let us +know and share your findings by writing at [[mailto:recode-forum@iro.umontreal.ca][Recode forum]]. You may also +choose to directly write to [[mailto:pinard@iro.umontreal.ca][the author]], yet be warned that such +correspondence is often visible for a while through the Recode Web +site. + +If you feel like receiving releases and pretest announcements for the +Recode package, send a message to [[mailto:majordomo@iro.umontreal.ca][this Majordomo]] having, in its body, +a line saying: + + #+BEGIN_EXAMPLE + subscribe recode-announce + #+END_EXAMPLE + +If you rather want to participate actively in discussions, pretesting +and development for Recode, do just as above, but this time, use: + + #+BEGIN_EXAMPLE + subscribe recode-forum + #+END_EXAMPLE + +The [[https://github.com/pinard/Recode][Git repository]] for Recode gives access, through the magic of Git +and GitHub, to all distribution releases, would they be actual or +past, pretest or official, as well as individual files. + +Please /do not/ widely redistribute releases having a letter after the +version numbers, as these are meant for pretesting only, and might not +be stable enough for other usages. + +* Release notes +** Notes for version 3.7-beta2 +Here are a few notes related to the *beta2* pre-test release for the +incoming Recode 3.7. I publish it to ease later exchanges of patches +with testers. + +- Long ago, I renamed /GNU recode/ to /Free recode/: the permission for + using the /GNU/ prefix mandated a level of obedience to the FSF that + once went overboard, in my opinion. After that change, I realized + that some people read /Free/ as a four letter word! To be peaceful, + this version changes the name again, from /Free recode/ to merely + /Recode/. *recode* (no capital) still names the executable program + specifically, or the distribution archive itself. + +- Recode does not itself include *libiconv* anymore. However, it uses + an external *iconv* library if one is available at installation time, + like *libiconv* or the one provided within GNU *libc*. The =-x:= option + to the program, or a new flag to the library *recode_new_outer* + function, inhibits the initialisation and usage of *iconv*. + +- The bug about loosing a few characters, here and there, when + recoding big files in *iconv* context, seems to have been corrected. + A patch for this problem has been floating around for years, but it + was not solving all cases. + +- Recode installation now uses Python. In particular, it creates file + =build/src/iconvdecl.h= from local =iconv -l= output. Recode testing + through =make check= also needs what people *python-devel*, providing C + header files for Python and *distutils*. The =Makemore= file has been + merged within regular Makefiles and is not distributed separately + anymore. + +- It is likely that new bugs have been introduced through the above + changes. In particular, not everything is cosy on the side of + release engineering. A few files are either spuriously remade, or + remade late. I'm a bit surprised by the difficulty to get this + right. + +- =make check= accepts a =LIMIT== option, for limiting tests to one or a + few cases. See =tests/Makefile= for more information. + +- PO files have been updated from the Translation Project. + +** Notes for version 3.7-beta1 +The beta 1 pre-test release for the incoming Recode 3.7 has been made +available for those needing it right away. While it solves some +serious bugs and portability problems, others are meant to be +addressed only in later pre-tests. In particular, none of charset or +surface issues, user requests, and various suggestions appear in this +pre-test, and will not either in later pretests, until all real +show-stoppers are solved first. So this is in no way a candidate for +a Recode 3.7 release. + +The test suite is worth more comments: + +- The suite is very partial, and may not be thought as a validation + suite. Before it could be used to ascertain confidence, it would + need much more tests than it has already. + +- Testing is notably more speedy than it used to be. For example, the + previous *bigauto* test, which was not run by default because it ran + for too long, is now executed within the standard test suite, once + in non-strict mode, and a second time in strict mode. + +- It does not use Autotest anymore, but rather a home grown test + driver much inspired from the Codespeak project. The link between + the test and the Recode library is established through a Pyrex + interface, so you need to have *python* and *python-devel* installed + first. + +- Beware that the Pyrex interface to the Recode library is only meant + for testing, for now at least. While you may play with it, it would + not be wise relying on it, as the specifications might change at any + time. + +** Non-Unix ports +Please [[mailto:recode-bugs@iro.umontreal.ca][inform us]] if you are aware of various ports to non-Unix systems +not listed here, or for corrections. Please provide the goal system, +a complete and stable URL, the maintainer name and address, the Recode +version used as a base, and your comments. + +- MSDOS (DJGPP) :: [[mailto:juan.guerrero@gmx.de][Juan Manuel Guerrero]] maintains this port, dated + 2001-03 and based on Recode 3.5. The following + archives hold binaries, docs and sources + respectively. See [[ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/rcode35b.zip][rcode35b]], [[ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/rcode35d.zip][rcode35d]] and [[ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/rcode35s.zip][rcode35s]]. + Also see [[http:/DJGPP.html][contrib/DJGPP/README]] in the Recode + distribution for more information about compiling + this port. +- MSDOS (Gnuish) :: [[mailto:hankedr@mail.auburn.edu][Darrel Hankerson]] maintains this port, dated + 1994-11 and based on Recode 3.4. You get many GNU + tools, not only Recode. The GNUish project is + described in =gnuish_t.htm=. See [[http://www.simtel.net/simtel.net/][simtel]] and [[http://www.leo.org/pub/comp/platforms/pc/gnuish][gnuish]] + (Germany), or for the FTP versions: [[ftp://ftp.simtel.net/simtelnet/gnu][simtel]] and + [[ftp://ftp.leo.org/pub/comp/platforms/pc/gnuish][gnuish]]. +- OS/2 (using emx/gcc) :: Maintainer unknown (maybe [[mailto:rommel@ars.de][Kai Uwe Rommel]]), + dated 1994-11 and based on Recode 3.4. See [[http://hobbes.nmsu.edu/pub/os2/util/convert/gnurcode.zip][gnurcode]]. +* Installation +** Prerequisites + +Simple installation of Recode requires the usual tools and facilities +as those needed for most GNU packages. If not already bundled with +your system, you also need to pre-install [[http://www.python.org][Python]], version 2.2 or +better. + +It is also convenient to have some *iconv* library already present on +your system, this much extends Recode capabilities, especially in the +area of Asiatic character sets. GNU *libc*, as found on Linux systems +and a few others, already has such an *iconv* library. Otherwise, you +might consider pre-installing the [[http://www.gnu.org/software/libiconv/][portable libiconv]], written by Bruno +Haible. + +** Getting a release +Source files and various distributions (either latest, prestest, or +archive) are available through [[https://github.com/pinard/Recode/][GitHub]]. + +File timestamps after checkout may trigger Make difficulties. As a +way to avoid these, from the top level of the distribution, execute =sh +after-patch.sh= before configuring. If you miss either *sh* or GNU +*touch*, try =python after-patch.py= instead. + +** Configure options +Once you have an unpacked distribution, see files: + + |-------------+------------------------------------------------| + | File name | Description | + |-------------+------------------------------------------------| + | =ABOUT-NLS= | how to customise this program to your language | + | =COPYING= | copying conditions for the program | + | =COPYING.LIB= | copying conditions for the library | + | =INSTALL= | compilation and installation instructions | + | =NEWS= | major changes in the current release | + | =THANKS= | partial list of contributors | + |-------------+------------------------------------------------| + +Besides those configure options documented in files =INSTALL= and +=ABOUT-NLS=, a few extra options may be accepted after =./configure=: + +- Options =--disable-shared= or =--disable-static= + + to inhibit the building of shared libraries or static libraries; the + default is to always build static libraries, and to attempt building + shared libraries if there is some known recipe for this. + +- Option =--with-gnu-ld= + + to force the assumption that the C compiler uses GNU *ld*. + +- Option =--with-dmalloc= + + to trigger a debugging feature for looking at memory management + problems, it pre-requires Gray Watson's [[ftp://ftp.letters.com/src/dmalloc/dmalloc.tar.gz][dmalloc package]]. + +** Maintenance tools +For simple modifications to Recode, you should not need special tools +beyond those usual for installing GNU packages. However, if you +modify any =.l= source file, Python and Flex are both needed for +remaking =merged.c=. + +For more comprehensive modifications, you might need more tools. If +not done already, make sure you have a copy of the packages listed in +the following table. You may also choose to establish a link in your +build =doc/= directory, as explained within =doc/Makemore=. + + |--------------+---------+----------+---------------| + | Package name | Current | Minimum | Install after | + |--------------+---------+----------+---------------| + | *autoconf* | 2.61 | 2.12 | *m4* | + | *automake* | 1.10 | 1.9 | *Perl* | + | *Flex* | 2.5.33 | 2.5.4a | | + | *gettext* | 0.16 | 0.16 | | + | *Help2man* | 1.36 | 1.020 | *Perl* | + | *libtool* | 1.5.24 | 1.3.4 | | + | *m4* | 1.4.10 | 1.4n | | + | *Make* | 3.81 | | | + | *Perl* | 5.8.8 | 5.005.03 | | + | *Python* | 2.5.1 | 2.2 | | + | *tar* | 1.17 | 1.12 | | + | *wget* | 1.10.2 | | | + |--------------+---------+----------+---------------| + +The /current/ version numbers just happen to be those used for +development, it is often likely that older versions would work just as +well. The /minimum/ version numbers were once acceptable, they might +not be anymore, this has not been verified; any updating information +is welcome! + +** Installation hints +Here are a few hints which might help installing Recode on some +systems. Many may be applied by temporary presetting environment +variables while calling =./configure=. File =INSTALL= explains this. + +- Compilation time + + Some C compilers, like Apollo's, have a hard time compiling + =merged.c=. If this is your case, avoid compiler optimisation. From + within the Bourne shell, you may use: + + #+BEGIN_EXAMPLE + CFLAGS= ./configure + #+END_EXAMPLE + + But if you want to give a real hard time to your C optimiser on + =merged.c=, to get code that runs only a bit faster, merely try: + + #+BEGIN_EXAMPLE + CPPFLAGS=-DINLINE_HARDER ./configure + #+END_EXAMPLE + +- Smallish systems + + For 80286 based systems (do some still exist?!), it has been + reported that some compilers generate wrong code while optimising + for /small/ models. So, from within the Bourne shell, do: + + #+BEGIN_EXAMPLE + CFLAGS=-Ml LDFLAGS=-Ml ./configure + #+END_EXAMPLE + + to force large memory model. For 80286 Xenix compiler, the last + time it was tried a while ago, one ought to use: + + #+BEGIN_EXAMPLE + CFLAGS='-Ml -F2000' LDFLAGS=-Ml ./configure + #+END_EXAMPLE + + Other systems have poor *pipe* / *popen* support or thrash heavily when + processes fork. In this case, just before doing =make=, edit =config.h= + and ensure *HAVE_PIPE* is /not/ defined. + +* The future of Recode +** Motivation +Recode is due for a major ovehaul. My plan is to end the 3.x series +of this package, rather aiming 4.0 as a major internal rewrite. + +For one thing, I want to explore some new avenues. It does not seem +natural anymore, to me at least, using C code for exploring or +prototyping new ideas requiring complex internal structures: +encompassing changes are stretchy, work overhead is just too high. I +want to add a run-time dependency between Recode and Python, with the +admitted goal of shifting the internals of Recode from C to Python. + +Another thing is that Recode should reuse more of the work of many +competent people in the recoding area. I was brought into the +business of character set conversion issues by a random set of +coincidences and needs, but have never been a character set specialist +myself. I rely on users to help me sketch what needs to be done. +There are other tools and other maintainers who address these matters +more competently than me. Recode might well rely on their work and +better concentrate on user functionalities and on an overall picture. + +For experimenting what Recode might become and experimenting new +concepts more easily, I created a subsidiary and standalone Python +project named [[https://github.com/pinard/Recodec][Recodec]], which reproduces a good part of Recode +functionality. My goal is now to merge Recodec back into Recode soon, +rather than slowly stretching the distance between Recode and Recodec. +Recode is going to be a mix of Python, C and either Pyrex or Cython. + +** Overall plan +The release 3.6 for Recode was likely the last in the 3.x series. As +there is still a long way before 4.0 gets ready, and /especially/ +because some of my good collaborators insisted that I do so, there +will likely be other Recode 3.X releases on the way to 4.0, at least +to provide a selection of user-contributed patches. Also, the next +releases of Recode will progressively implement the base mechanics for +the transition, through a list of development steps similar to the +following. By principle, the implementation should be working and +usable at each devewlopment step. Moreover, for better +maintainability, refactoring shall occur all along the way. + +I'll likely select Cython over Pyrex, the main arguments being +Unicode, Python 3 and pragmatic support, and a wide and active user +base. Pyrex, the inspiration behind Cython, is amazingly well +thought; I stay really admirative and grateful for Greg Ewing's work. + +- The main program is written in Python, and through a Cython + interface, calls the existing C API for doing the real work. +- The C API gets merely able to use Cython written steps internally, + besides the actual C steps, but with no Cython steps yet. +- New Cython steps wrap many standard Python codecs, with some + trickery to force Python codecs over actual, older Recode steps. +- Recode library initialization is moved from C to Python, and gets + called through Cython from the C API. +- Initialization is extended to cover the Recodec Python API, which + uses different tables and descriptive data. +- More steps from Recodec get moved into Recode, either coexisting + with or taking over the previous wrapping of Python codecs. +- The remaining code from the Recodec engine gets moved into Recode, + replacing C code having the same fonctionality. +- Special care is given to GNU *libc* or *libiconv* support, maybe going + from the C side to the Python side. +- Proper documentation and decisions follow extensive comparison and + diagnostic of multiple implementations of same charsets or surfaces. +- Profiling allows to fine tune when and how Cython gets used over + Python; standard Python codecs might even be cythonized in Recode. +- Program and library initialization get revised to spare disk + accesses and building descriptive structures, whenever possible. + +I once thought about resorting to kludges, within a Python API +interface, so the Python interpreter would not be required at all at +run-time. Today, I doubt this is doable in practice, or that the +implied restrictions on Cython code would be bearable. By the time, I +may come to think that this is not worth the effort, anyway. + +** Speed and memory +Historically, Recode has always been oriented towards some generality +in specifications, combined with good execution speed. Generality is +granted through providing recoding steps either as tables or fuller +algorithms expressed as C code. Speed surely results from careful C +coding of individual steps, and using Flex for more difficult +recognition problems. Speed also comes from the monolithic design of +an executable holding all tables at once, relying on system paging +instead of run-time opening of external data files. The automatic +selection among step sequencing methods also play a role in the speed +area. + +Rewriting a character shuffling engine in Python is going to have +consequences on both speed and memory: Python is inherently much +slower than C for such problems, program startup requires many disk +accesses to load all required modules, and the size of the Python +interpreter is not negligible. On the other hand, Recode is not small +as it stands. + +For prototyping various experiments, the slowdown is likely to be +bearable, especially considering the development speedup that might +result from using Python instead of C. It is fairly tedious to make +encompassing structural changes in the C version of Recode, while +similar changes are going to be rather easy in Python. I expect that +the shorter development cycle will counter-weight some duplication in +the maintenance of Recode both in Python and C afterwards. + +** Planned differences +Whenever the Python library offers a charset or a surface which Recode +also has, the Python library codec is used. In some cases, this +introduces differences, those will have to be resolved one by one, +either by accepting that the Python library does better, getting the +Python team to improve some codecs, or overriding these from Recodec. + +Other differences may occur, especially in the Asian charset area, +from the fact *libiconv*, GNU *libc* recoding facilities, and various +contributors to the Python codecs project, do not fully agree on how +things should be done. Recodec is likely to offer configuration +mechanisms to choose among various possibilities, but will not likely +attempt to rule out who is right and who is wrong! ☺ + +Issues about reversibility and canonicity, which were much present in +Recode 3.X, are fading out. While some of these were moderately easy +to implement, other cases stayed pending as fairly difficult to solve +without a significant loss of efficiency. I think these issues are +better abandoned than forever kept as half-hearted and not wholly +dependable. Any user concerned about such things might try the +reverse coding to find out if the original file is recoverable, some +new option might automate a (costly) reversibility test. + +One drawback of the whole move is that the Global Interpreter Lock in +Python gets in the way of parallel execution of the code. This would +have been more of a concern if GNU *libc* recoding facilities were +relying on the Recode library, but as things stand by now, I'm +guessing that users will not be much impacted in practice. + +* Other pointers +** Documentation +- IETF references + + - [[ftp://nic.ddn.mil/rfc/rfc1345.txt][Character Mnemonics & Character Sets]], by [[mailto:keld@dkuug.dk][Keld Simonsen]], 1992-06. + - [[ftp://nic.ddn.mil/rfc/rfc1642.txt][UTF-7 - A Mail-Safe Transformation Format of Unicode]], by [[mailto:david_goldsmith@taligent.com][David + Goldsmith]] and [[mailto:mark_davis@taligent.com][Mark Davis]], 1994-07. + - [[ftp://nic.ddn.mil/rfc/rfc2044.txt][UTF-8, a transformation format of Unicode and ISO 10646]], by [[mailto:yergeau@alis.com][François Yergeau]], 1997-10. + +- Various references + + - [[ftp://ftp.unicode.org:/Public/MAPPINGS/][Unicode charset mappings]]. The Unicode consortium makes available + plenty of charset mappings for converting /legacy/ charsets to + Unicode. + - [[ftp://ftp.iro.umontreal.ca/pub/contrib/pinard/accents/oqil-tome1.ps.gz][Normalisation et internationalisation: Inventaire et prospectives + des normes clefs pour le traitement informatique du français.]] + (392p.) or [[http://www.ceveil.qc.ca/Normes][this other copy]]. This is a report, written in French, + discussing charset issues and many other topics as well. [[mailto:bourbeau@progiciels-bpi.ca][Laurent + Bourbeau]] and [[mailto:pinard@iro.umontreal.ca][François Pinard]], 1995-10. + +- Recode specific + + - ETL presentation + + In 1999, the organisers of the [[http://www.m17n.org/conference/m17n99_all_but_registration/welcome.en.html][m17n99 conference]] in Tsukuba, + Japan, were kind enough to invite me. This has been for me a + fabulous trip and experience, and I met many extraordinary people + in there. At the conference, I presented the Translation Project, + and Recode. The Recode [[http:/m17n99.html][presentation slides]] are available. + +** Programs +- libiconv :: This comprehensive [[http://www.gnu.org/software/libiconv/][charset converter library]], by [[mailto:haible@ilog.fr][Bruno + Haible]], revolves around Unicode, and support Asian + encodings among many others. Even Recode uses it! +- tcs :: Here is the [[ftp://research.att.com/dist/tcs.shar.Z][main recoding tool]] from the Plan9 project. +- yuedit :: This [[ftp://sunsite.unc.edu/pub/Linux/apps/editors/X/yudit-1.2.tar.gz][GUI editor]], by [[mailto:gsinai@iname.com][Gaspar Sinai]], 1999-01, handles many + encodings, among which UTF-8. It also installs *uniconv*, a + recoding program, and *uniprint*, a printing tool. +- ucs-fonts :: These [[http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz][6x13 fonts]], by [[mailto:Markus.Kuhn@cl.cam.ac.uk][Markus Kuhn]], 1998-11, covering + Unicode characters besides the Asian sets, merely + replace the Linux fixed 6x13 font. Works nicely with + *yudit*. +- MtRecode :: This [[http://www.lpl.univ-aix.fr/projects/multext/MtRecode/][charset converter]] is oriented towards SGML text + manipulation. It may be freely downloaded for + non-commercial, non-military use. Pointer given by [[mailto:veronis@univ-aix.fr][Jean + Véronis]], 1996-06. +- sp :: This quite nice SGML [[ftp://ftp.jclark.com/pub/sp/sp-1.3.tar.gz][structure analyser]], by [[mailto:jjc@jclark.com][James Clark]], + contains internal C++ modules for handling many charsets. +- b2c :: This [[http://research.de.uu.net:8080/~gnu/b2c/b2c-2.1.tar.gz][program]], by [[mailto:Joerg.Heitkoetter@de.uu.net][Jörg Heitkötter]], 1997-11, is able to + generate interpreted character dumps, but properly embedded + within complete C header files. +- PyRecode :: This [[http://www.suxers.de/PyRecode.tgz][wrapper]], by [[mailto:ajung@server.python.net][Andreas Jung]], provides Recode functionality to Python programs. Also see [[http://www.vex.net/parnassus/apyllo.py?find%3Drecode][this link]] and [[http://www.suxers.de/python/pyrecode.htm][this other link]].