library if one was available at installation time. The -x: option
to the program, or a new flag to the library recode_new_outer
function, inhibits the initialisation and usage of iconv.
++ The experimental ``tree`` surface is removed. Structured data
+ needs a proper parser, and that doesn't fit the framework of Recode.
+ Many bug fixes.
+ Long ago, I renamed GNU recode to Free recode: the permission for
using the GNU prefix mandated a level of obedience to the FSF that
Conversion is possible between almost any pair of charsets. Here is a
list of the exceptions. One may not recode @emph{from} the @code{flat},
@code{count-characters} or @code{dump-with-names} charsets, nor @emph{from}
-or @emph{to} the @code{data}, @code{tree} or @code{:iconv:} charsets.
-Also, if we except the @code{data} and @code{tree} pseudo-charsets, charsets
+or @emph{to} the @code{data} or @code{:iconv:} charsets.
+Also, if we except the @code{data} pseudo-charset, charsets
and surfaces live in disjoint recoding spaces, one cannot really transform
a surface into a charset or vice-versa, as surfaces are only meant to be
applied over charsets, or removed from them.
@cindex surfaces, implementation in Recode
@tindex data@r{, a special charset}
-@tindex tree@r{, a special charset}
Surfaces are implemented within Recode as special charsets
-which may only transform to or from the @code{data} or @code{tree}
-special charsets. Clever users may use this knowledge for writing
+which may only transform to or from the @code{data}
+special charset. Clever users may use this knowledge for writing
surface names in requests exactly as if they were pure charsets, when
the only need is to change surfaces without any kind of recoding between
-real charsets. In such contexts, either @code{data} or @code{tree} may
-also be used as if it were some kind of generic, anonymous charset: the
+real charsets. In such contexts @code{data} may also be used as if it
+were some kind of generic, anonymous charset: the
request @samp{data..@var{surface}} merely adds the given @var{surface},
while the request @samp{@var{surface}..data} removes it.
-@cindex structural surfaces
-@cindex surfaces, structural
-@cindex surfaces, trees
-The Recode library distinguishes between mere data surfaces, and
-structural surfaces, also called tree surfaces for short. Structural
-surfaces might allow, in the long run, transformations between a few
-specialised representations of structural information like MIME parts,
-Perl or Python initialisers, LISP S-expressions, XML, Emacs outlines, etc.
-
-We are still experimenting with surfaces in Recode. The concept opens
-the doors to many avenues; it is not clear yet which ones are worth pursuing,
-and which should be abandoned. In particular, implementation of structural
-surfaces is barely starting, there is not even a commitment that tree
-surfaces will stay in Recode, if they do prove to be more cumbersome
-than useful. This chapter presents all surfaces currently available.
+This chapter presents all surfaces currently available.
@menu
* Permutations:: Permuting groups of bytes
Adding a new surface is technically quite similar to adding a new charset.
@xref{New charsets}. A surface is provided as a set of two transformations:
-one from the predefined special charset @code{data} or @code{tree} to the
+one from the predefined special charset @code{data} to the
new surface, meant to apply the surface, the other from the new surface
-to the predefined special charset @code{data} or @code{tree}, meant to
-remove the surface.
+to the predefined special charset @code{data}, meant to remove the surface.
@findex declare_step
Internally in Recode, function @code{declare_step} especially
-recognises when a charset is so related to @code{data} or @code{tree},
+recognises when a charset is so related to @code{data},
and then takes appropriate actions so that charset gets indeed installed
as a surface.
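
The rule described above, now that @code{data} is the only special charset left, can be illustrated with a small stand-alone model. This is only a sketch and is not the Recode source: the names @code{toy_symbol}, @code{toy_kind} and @code{toy_declare_step} are hypothetical, and the real @code{declare_step} compares symbols rather than strings and reports duplicates through @code{recode_error}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Recode symbol: it only remembers whether a
   step applying or removing the surface has already been registered.  */
struct toy_symbol
{
  const char *name;
  int has_resurfacer;           /* a data..NAME step was declared */
  int has_unsurfacer;           /* a NAME..data step was declared */
};

enum toy_kind
{
  TOY_CHARSET_STEP,             /* ordinary charset-to-charset step */
  TOY_RESURFACER,               /* step applying a surface */
  TOY_UNSURFACER,               /* step removing a surface */
  TOY_DUPLICATE                 /* the same direction was declared twice */
};

/* Classify the step BEFORE_NAME..AFTER_NAME.  SYMBOL is the side of the
   step that is not "data"; it is updated when the step is a surface step.  */
static enum toy_kind
toy_declare_step (struct toy_symbol *symbol,
                  const char *before_name, const char *after_name)
{
  assert (symbol != NULL);

  if (strcmp (before_name, "data") == 0)
    {
      /* data..SURFACE: this step applies the surface.  */
      if (symbol->has_resurfacer)
        return TOY_DUPLICATE;
      symbol->has_resurfacer = 1;
      return TOY_RESURFACER;
    }

  if (strcmp (after_name, "data") == 0)
    {
      /* SURFACE..data: this step removes the surface.  */
      if (symbol->has_unsurfacer)
        return TOY_DUPLICATE;
      symbol->has_unsurfacer = 1;
      return TOY_UNSURFACER;
    }

  /* Neither side is "data": a plain step between two charsets.  */
  return TOY_CHARSET_STEP;
}
```

In this model, a surface is complete once both directions have been declared for the same symbol, which mirrors the pair of transformations a new surface has to provide.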
{
case SYMBOL_CREATE_CHARSET:
case SYMBOL_CREATE_DATA_SURFACE:
- case SYMBOL_CREATE_TREE_SURFACE:
abort ();
case ALIAS_FIND_AS_CHARSET:
type = RECODE_DATA_SURFACE;
break;
- case SYMBOL_CREATE_TREE_SURFACE:
- type = RECODE_TREE_SURFACE;
- break;
-
default:
/* Clean and disambiguate first as requested. */
single->before = before->symbol;
single->after = outer->data_symbol;
}
- else if (strcmp (before_name, "tree") == 0)
- {
- single->before = outer->tree_symbol;
- after = find_alias (outer, after_name, SYMBOL_CREATE_TREE_SURFACE);
- single->after = after->symbol;
- }
- else if (strcmp(after_name, "tree") == 0)
- {
- before = find_alias (outer, before_name, SYMBOL_CREATE_TREE_SURFACE);
- single->before = before->symbol;
- single->after = outer->tree_symbol;
- }
else
{
before = find_alias (outer, before_name, SYMBOL_CREATE_CHARSET);
single->init_routine = init_routine;
single->transform_routine = transform_routine;
- if (single->before == outer->data_symbol
- || single->before == outer->tree_symbol)
+ if (single->before == outer->data_symbol)
{
if (single->after->resurfacer)
recode_error (outer, _("Resurfacer set more than once for `%s'"),
after_name);
single->after->resurfacer = single;
}
- else if (single->after == outer->data_symbol
- || single->after == outer->tree_symbol)
+ else if (single->after == outer->data_symbol)
{
if (single->before->unsurfacer)
recode_error (outer, _("Unsurfacer set more than once for `%s'"),
return false;
outer->data_symbol = alias->symbol;
- if (alias = find_alias (outer, "tree", SYMBOL_CREATE_CHARSET), !alias)
- return false;
- outer->tree_symbol = alias->symbol;
-
if (alias = find_alias (outer, "ISO-10646-UCS-2", SYMBOL_CREATE_CHARSET),
!alias)
return false;
/* Preset charsets and surfaces. */
RECODE_SYMBOL data_symbol;/* special charset defining surfaces */
- RECODE_SYMBOL tree_symbol; /* special charset defining structures */
RECODE_SYMBOL ucs2_charset; /* UCS-2 */
RECODE_SYMBOL iconv_pivot; /* `iconv' internal UCS */
RECODE_SYMBOL crlf_surface; /* for IBM PC machines */
{
RECODE_NO_SYMBOL_TYPE, /* missing value */
RECODE_CHARSET, /* visible in the space of charsets */
- RECODE_DATA_SURFACE, /* this is a mere data surface */
- RECODE_TREE_SURFACE /* this is a structural surface */
+ RECODE_DATA_SURFACE /* this is a mere data surface */
};
enum recode_data_type
{
SYMBOL_CREATE_CHARSET, /* charset as given, create as needed */
SYMBOL_CREATE_DATA_SURFACE, /* data surface as given, create as needed */
- SYMBOL_CREATE_TREE_SURFACE, /* tree surface as given, create as needed */
ALIAS_FIND_AS_CHARSET, /* disambiguate only as a charset */
ALIAS_FIND_AS_SURFACE, /* disambiguate only as a surface */
ALIAS_FIND_AS_EITHER /* disambiguate as a charset or a surface */
/* Find unsurfacers. */
while (step < request->sequence_array + request->sequence_length
- && (step->after == outer->data_symbol
- || step->after == outer->tree_symbol))
+ && step->after == outer->data_symbol)
step++;
unsurfacer_end = step;
add_work_string (request, "..");
if (step < request->sequence_array + request->sequence_length
- && step->before != outer->data_symbol
- && step->before != outer->tree_symbol)
+ && step->before != outer->data_symbol)
{
last_charset_printed = step->after;
add_work_string (request, last_charset_printed->name);
else
{
last_charset_printed = outer->data_symbol;
- /* FIXME: why not outer->tree_symbol? */
add_work_string (request, last_charset_printed->name);
}
/* Print resurfacers. */
while (step < request->sequence_array + request->sequence_length
- && (step->before == outer->data_symbol
- || step->before == outer->tree_symbol))
+ && step->before == outer->data_symbol)
{
add_work_character (request, '/');
last_charset_printed = NULL;
SUBTASK_RETURN (subtask);
}
-/*---------------------------------------------------------------------.
-| Execute the conversion sequence for a recoding TASK, using several |
-| passes with two alternating memory buffers or intermediate files, or |
-| forking for each step and interconnecting the processes with pipes. |
-| This routine assumes at least one needed recoding step. |
-`---------------------------------------------------------------------*/
-
-static bool
-perform_sequence (RECODE_TASK task)
+/*------------------------------------------------------------------------.
+| Execute the conversion sequence for a recoding TASK. If no conversions |
+| are needed, merely copy the input onto the output. |
+| Returns zero if the recoding has been found to be non-reversible. |
+| Tell what goes on if VERBOSE. |
+`------------------------------------------------------------------------*/
+
+bool
+recode_perform_task (RECODE_TASK task)
{
RECODE_CONST_REQUEST request = task->request;
struct recode_subtask subtask_block;
subtask->task = task;
subtask->input = task->input;
+ /* Switch stdin and stdout to binary mode unless they are ttys, as this has
+ nasty side-effects on several DOSish systems. For example, the Ctrl-Z
+ character is no longer interpreted as EOF, and thus the poor user cannot
+ signal end of input; the INTR character also doesn't work, so they cannot
+ even interrupt the program, and are stuck. On the other hand, output to
+ the screen doesn't have to follow the end-of-line format exactly, since
+ it is going to be discarded anyway. */
+ if (task->input.name && !*task->input.name && !isatty (fileno (stdin)))
+ xset_binary_mode (fileno (stdin), O_BINARY);
+ if (task->output.name && !*task->output.name && !isatty (fileno (stdout)))
+ xset_binary_mode (fileno (stdout), O_BINARY);
+
/* Prepare the first input file. */
if (subtask->input.name)
free (task);
return true;
}
-
-/*------------------------------------------------------------------------.
-| Execute the conversion sequence for a recoding TASK. If no conversions |
-| are needed, merely copy the input onto the output. |
-| Returns zero if the recoding has been found to be non-reversible. |
-| Tell what goes on if VERBOSE. |
-`------------------------------------------------------------------------*/
-
-bool
-recode_perform_task (RECODE_TASK task)
-{
- /* Switch stdin and stdout to binary mode unless they are ttys, as this has
- nasty side-effects on several DOSish systems. For example, the Ctrl-Z
- character is no longer interpreted as EOF, and thus the poor user cannot
- signal end of input; the INTR character also doesn't work, so they cannot
- even interrupt the program, and are stuck. On the other hand, output to
- the screen doesn't have to follow the end-of-line format exactly, since
- it is going to be discarded anyway. */
- if (task->input.name && !*task->input.name && !isatty (fileno (stdin)))
- xset_binary_mode (fileno (stdin), O_BINARY);
- if (task->output.name && !*task->output.name && !isatty (fileno (stdout)))
- xset_binary_mode (fileno (stdout), O_BINARY);
-
- return perform_sequence (task);
-}
RECODE_NO_SYMBOL_TYPE
RECODE_CHARSET
RECODE_DATA_SURFACE
- RECODE_TREE_SURFACE
enum recode_data_type:
RECODE_NO_CHARSET_DATA
unsigned number_of_singles
unsigned char *one_to_same
RECODE_SYMBOL data_symbol
- RECODE_SYMBOL tree_symbol
RECODE_SYMBOL ucs2_charset
RECODE_SYMBOL iconv_pivot
RECODE_SYMBOL crlf_surface
enum alias_find_type:
SYMBOL_CREATE_CHARSET_ 'SYMBOL_CREATE_CHARSET'
SYMBOL_CREATE_DATA_SURFACE_ 'SYMBOL_CREATE_DATA_SURFACE'
- SYMBOL_CREATE_TREE_SURFACE_ 'SYMBOL_CREATE_TREE_SURFACE'
ALIAS_FIND_AS_CHARSET_ 'ALIAS_FIND_AS_CHARSET'
ALIAS_FIND_AS_SURFACE_ 'ALIAS_FIND_AS_SURFACE'
ALIAS_FIND_AS_EITHER_ 'ALIAS_FIND_AS_EITHER'
NO_SYMBOL_TYPE = RECODE_NO_SYMBOL_TYPE
CHARSET = RECODE_CHARSET
DATA_SURFACE = RECODE_DATA_SURFACE
-TREE_SURFACE = RECODE_TREE_SURFACE
NO_CHARSET_DATA = RECODE_NO_CHARSET_DATA
STRIP_DATA = RECODE_STRIP_DATA
SYMBOL_CREATE_CHARSET = SYMBOL_CREATE_CHARSET_
SYMBOL_CREATE_DATA_SURFACE = SYMBOL_CREATE_DATA_SURFACE_
-SYMBOL_CREATE_TREE_SURFACE = SYMBOL_CREATE_TREE_SURFACE_
ALIAS_FIND_AS_CHARSET = ALIAS_FIND_AS_CHARSET_
ALIAS_FIND_AS_SURFACE = ALIAS_FIND_AS_SURFACE_
ALIAS_FIND_AS_EITHER = ALIAS_FIND_AS_EITHER_
while symbol is not NULL:
if (symbol.type == RECODE_CHARSET
and symbol is not self.outer.iconv_pivot
- and symbol is not self.outer.data_symbol
- and symbol is not self.outer.tree_symbol):
+ and symbol is not self.outer.data_symbol):
list.append(symbol.name)
symbol = symbol.next
return list
TCVN
Texinfo texi ti
Texte txte
-tree
UNICODE-1-1-UTF-7 TF-7 u7 UTF-7
UTF-8 FSS_UTF TF-8 u8 UTF-2 UTF-FSS
UTF-16 TF-16 u6 Unicode