From: Reuben Thomas
Date: Thu, 18 Jan 2018 01:32:20 +0000 (+0000)
Subject: Fix -k (fixes Debian bug #607021)
X-Git-Tag: v3.7~96
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=c4519507365fc713d96d094cd5ccf2d1aa2c4bf9;p=recode

Fix -k (fixes Debian bug #607021)

Update manual to reflect differently-presented character set names (output
is cosmetically different).
---

diff --git a/doc/recode.texi b/doc/recode.texi
index 69038f0..0db823f 100644
--- a/doc/recode.texi
+++ b/doc/recode.texi
@@ -904,7 +904,7 @@ This particular option is meant to help identifying an unknown charset, using
 as hints some already identified characters of the charset. Some examples
 will help introducing the idea.
 
-Let's presume here that Recode is run in an ISO-8859-1 locale, and
+Let's presume here that Recode is run in a UTF-8 locale, and
 that @code{DEFAULT_CHARSET} is unset in the environment.
 Suppose you have guessed that code 130 (decimal) of the unknown charset
 represents a lower case @samp{e} with an acute accent. That is to say
@@ -919,17 +919,17 @@ recode -k 130:233
 you should obtain a listing similar to:
 
 @example
-AtariST atarist
-CWI cphu cwi cwi2
-IBM437 437 cp437 ibm437
-IBM850 850 cp850 ibm850
-IBM851 851 cp851 ibm851
-IBM852 852 cp852 ibm852
-IBM857 857 cp857 ibm857
-IBM860 860 cp860 ibm860
-IBM861 861 cp861 cpis ibm861
-IBM863 863 cp863 ibm863
-IBM865 865 cp865 ibm865
+AtariST
+CWI cp-hu CWI-2
+IBM437/CR-LF 437/CR-LF CP437/CR-LF
+IBM850/CR-LF 850/CR-LF CP850/CR-LF
+IBM851/CR-LF 851/CR-LF CP851/CR-LF
+IBM852/CR-LF 852/CR-LF CP852/CR-LF pcl2 pclatin2
+IBM857/CR-LF 857/CR-LF CP857/CR-LF
+IBM860/CR-LF 860/CR-LF CP860/CR-LF
+IBM861/CR-LF 861/CR-LF CP861/CR-LF cp-is
+IBM863/CR-LF 863/CR-LF CP863/CR-LF
+IBM865/CR-LF 865/CR-LF CP865/CR-LF
 @end example
 
 You can give more than one clue at once, to restrict the list further.
@@ -945,9 +945,9 @@ recode -k 130:233,211:203
 you should obtain:
 
 @example
-IBM850 850 cp850 ibm850
-IBM852 852 cp852 ibm852
-IBM857 857 cp857 ibm857
+IBM850/CR-LF 850/CR-LF CP850/CR-LF
+IBM852/CR-LF 852/CR-LF CP852/CR-LF pcl2 pclatin2
+IBM857/CR-LF 857/CR-LF CP857/CR-LF
 @end example
 
 The usual charset may be overridden by specifying one non-option argument.
@@ -962,17 +962,17 @@ recode -k 130:142 mac
 and get:
 
 @example
-AtariST atarist
-CWI cphu cwi cwi2
-IBM437 437 cp437 ibm437
-IBM850 850 cp850 ibm850
-IBM851 851 cp851 ibm851
-IBM852 852 cp852 ibm852
-IBM857 857 cp857 ibm857
-IBM860 860 cp860 ibm860
-IBM861 861 cp861 cpis ibm861
-IBM863 863 cp863 ibm863
-IBM865 865 cp865 ibm865
+AtariST
+CWI cp-hu CWI-2
+IBM437/CR-LF 437/CR-LF CP437/CR-LF
+IBM850/CR-LF 850/CR-LF CP850/CR-LF
+IBM851/CR-LF 851/CR-LF CP851/CR-LF
+IBM852/CR-LF 852/CR-LF CP852/CR-LF pcl2 pclatin2
+IBM857/CR-LF 857/CR-LF CP857/CR-LF
+IBM860/CR-LF 860/CR-LF CP860/CR-LF
+IBM861/CR-LF 861/CR-LF CP861/CR-LF cp-is
+IBM863/CR-LF 863/CR-LF CP863/CR-LF
+IBM865/CR-LF 865/CR-LF CP865/CR-LF
 @end example
 
 @noindent
diff --git a/src/names.c b/src/names.c
index 9a4868e..5d47c51 100644
--- a/src/names.c
+++ b/src/names.c
@@ -29,6 +29,10 @@ _GL_ATTRIBUTE_PURE
 int
 code_to_ucs2 (RECODE_CONST_SYMBOL charset, unsigned code)
 {
+  /* FIXME: if no specific UCS-2 translation, assume an identity map. */
+  if (charset->data_type != RECODE_STRIP_DATA)
+    return code;
+
   const struct strip_data *data = (const struct strip_data *) charset->data;
   const recode_ucs2 *pool = data->pool;
   unsigned offset = data->offset[code / STRIP_SIZE];
@@ -50,12 +54,6 @@ check_restricted (RECODE_CONST_OUTER outer,
   int left;
   int right;
 
-  /* Reject the charset if no UCS-2 translation known for it. */
-
-  if (before->data_type != RECODE_STRIP_DATA
-      || after->data_type != RECODE_STRIP_DATA)
-    return true;
-
   for (pair = outer->pair_restriction;
        pair < outer->pair_restriction + outer->pair_restrictions;
        pair++)
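
Note: the effect of the names.c change on `-k` clue matching can be illustrated with a small standalone sketch. This is not Recode's code. The `sketch_*` names, the `NULL`-table convention, and the toy tables are assumptions made for illustration, and only the few table entries the example needs are filled in. It shows the idea stated in the FIXME comment: a charset without an explicit UCS-2 table is treated as an identity map rather than being rejected outright. Unlike the early return removed from `check_restricted`, the sketch returns true when a charset is kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a charset: it either carries an explicit
   code -> UCS-2 table, or has none (table == NULL).  */
struct sketch_charset
{
  const char *name;
  const unsigned short *table;  /* 256 entries, or NULL */
};

/* Toy tables: only the entries used by the example are set, the rest
   default to 0.  The values are the usual CP437/CP850 mappings.  */
static const unsigned short toy_cp437[256] = {
  [130] = 0x00E9,  /* e with acute accent */
  [211] = 0x2559   /* a box-drawing character */
};
static const unsigned short toy_cp850[256] = {
  [130] = 0x00E9,  /* e with acute accent */
  [211] = 0x00CB   /* E with diaeresis */
};

/* Same idea as the patched code_to_ucs2: with no table available,
   fall back to an identity map instead of giving up.  */
static unsigned
sketch_code_to_ucs2 (const struct sketch_charset *charset, unsigned code)
{
  assert (code < 256);
  if (charset->table == NULL)
    return code;
  return charset->table[code];
}

/* One clue: code `left' of the unknown charset should be the same
   character as code `right' of the usual charset.  */
struct sketch_clue
{
  unsigned left;
  unsigned right;
};

/* Return true when every clue maps to the same UCS-2 value on both
   sides, that is, when `before' remains a candidate.  */
static bool
sketch_matches (const struct sketch_charset *before,
                const struct sketch_charset *after,
                const struct sketch_clue *clues, size_t count)
{
  for (size_t i = 0; i < count; i++)
    if (sketch_code_to_ucs2 (before, clues[i].left)
        != sketch_code_to_ucs2 (after, clues[i].right))
      return false;
  return true;
}
```

With a table-less charset standing in for the usual one (identity is exact for ISO-8859-1, whose codes coincide with the first 256 UCS-2 values), the single clue 130:233 keeps both toy tables as candidates. Adding 211:203 keeps only the CP850-like one, consistent with IBM437 appearing in the first listing of the manual hunk but not in the second.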