From: Reuben Thomas Date: Tue, 30 Jan 2018 22:05:38 +0000 (+0000) Subject: Remove 'tree' surface (see NEWS for details) X-Git-Tag: v3.7~3 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=cf738e700e0a95b9e737bbf308476b147bd4c0d9;p=recode Remove 'tree' surface (see NEWS for details) --- diff --git a/NEWS b/NEWS index 6bf2f20..fb48b9e 100644 --- a/NEWS +++ b/NEWS @@ -27,6 +27,8 @@ Version 3.7 library if one was available at installation time. The -x: option to the program, or a new flag to the library recode_new_outer function, inhibits the initialisation and usage of iconv. ++ The experimental ``tree`` surface is removed. Structured data + needs a proper parser, and that doesn't fit the framework of Recode. + Many bug fixes. + Long ago, I renamed GNU recode to Free recode: the permission for using the GNU prefix mandated a level of obedience to the FSF that diff --git a/doc/recode.texi b/doc/recode.texi index fd91eaf..8e1a839 100644 --- a/doc/recode.texi +++ b/doc/recode.texi @@ -449,8 +449,8 @@ now, clashes are avoided, the old and new charsets are kept well separate. Conversion is possible between almost any pair of charsets. Here is a list of the exceptions. One may not recode @emph{from} the @code{flat}, @code{count-characters} or @code{dump-with-names} charsets, nor @emph{from} -or @emph{to} the @code{data}, @code{tree} or @code{:iconv:} charsets. -Also, if we except the @code{data} and @code{tree} pseudo-charsets, charsets +or @emph{to} the @code{data} or @code{:iconv:} charsets. +Also, if we except the @code{data} pseudo-charset, charsets and surfaces live in disjoint recoding spaces, one cannot really transform a surface into a charset or vice-versa, as surfaces are only meant to be applied over charsets, or removed from them. @@ -4264,32 +4264,17 @@ and usage. @xref{Universal}. 
@cindex surfaces, implementation in Recode @tindex data@r{, a special charset} -@tindex tree@r{, a special charset} Surfaces are implemented within Recode as special charsets -which may only transform to or from the @code{data} or @code{tree} -special charsets. Clever users may use this knowledge for writing +which may only transform to or from the @code{data} +special charset. Clever users may use this knowledge for writing surface names in requests exactly as if they were pure charsets, when the only need is to change surfaces without any kind of recoding between -real charsets. In such contexts, either @code{data} or @code{tree} may -also be used as if it were some kind of generic, anonymous charset: the +real charsets. In such contexts @code{data} may also be used as if it +were some kind of generic, anonymous charset: the request @samp{data..@var{surface}} merely adds the given @var{surface}, while the request @samp{@var{surface}..data} removes it. -@cindex structural surfaces -@cindex surfaces, structural -@cindex surfaces, trees -The Recode library distinguishes between mere data surfaces, and -structural surfaces, also called tree surfaces for short. Structural -surfaces might allow, in the long run, transformations between a few -specialised representations of structural information like MIME parts, -Perl or Python initialisers, LISP S-expressions, XML, Emacs outlines, etc. - -We are still experimenting with surfaces in Recode. The concept opens -the doors to many avenues; it is not clear yet which ones are worth pursuing, -and which should be abandoned. In particular, implementation of structural -surfaces is barely starting, there is not even a commitment that tree -surfaces will stay in Recode, if they do prove to be more cumbersome -than useful. This chapter presents all surfaces currently available. +This chapter presents all surfaces currently available. 
@menu * Permutations:: Permuting groups of bytes @@ -4771,14 +4756,13 @@ the fact your routines is written in C or in Flex. Adding a new surface is technically quite similar to adding a new charset. @xref{New charsets}. A surface is provided as a set of two transformations: -one from the predefined special charset @code{data} or @code{tree} to the +one from the predefined special charset @code{data} to the new surface, meant to apply the surface, the other from the new surface -to the predefined special charset @code{data} or @code{tree}, meant to -remove the surface. +to the predefined special charset @code{data}, meant to remove the surface. @findex declare_step Internally in Recode, function @code{declare_step} especially -recognises when a charset is so related to @code{data} or @code{tree}, +recognises when a charset is so related to @code{data}, and then takes appropriate actions so that charset gets indeed installed as a surface. diff --git a/src/names.c b/src/names.c index fbbc811..8c75597 100644 --- a/src/names.c +++ b/src/names.c @@ -203,7 +203,6 @@ disambiguate_name (RECODE_OUTER outer, { case SYMBOL_CREATE_CHARSET: case SYMBOL_CREATE_DATA_SURFACE: - case SYMBOL_CREATE_TREE_SURFACE: abort (); case ALIAS_FIND_AS_CHARSET: @@ -273,10 +272,6 @@ find_alias (RECODE_OUTER outer, const char *name, type = RECODE_DATA_SURFACE; break; - case SYMBOL_CREATE_TREE_SURFACE: - type = RECODE_TREE_SURFACE; - break; - default: /* Clean and disambiguate first as requested. 
*/ diff --git a/src/outer.c b/src/outer.c index c63aea6..82fabc4 100644 --- a/src/outer.c +++ b/src/outer.c @@ -85,18 +85,6 @@ declare_single (RECODE_OUTER outer, single->before = before->symbol; single->after = outer->data_symbol; } - else if (strcmp (before_name, "tree") == 0) - { - single->before = outer->tree_symbol; - after = find_alias (outer, after_name, SYMBOL_CREATE_TREE_SURFACE); - single->after = after->symbol; - } - else if (strcmp(after_name, "tree") == 0) - { - before = find_alias (outer, before_name, SYMBOL_CREATE_TREE_SURFACE); - single->before = before->symbol; - single->after = outer->tree_symbol; - } else { before = find_alias (outer, before_name, SYMBOL_CREATE_CHARSET); @@ -120,16 +108,14 @@ declare_single (RECODE_OUTER outer, single->init_routine = init_routine; single->transform_routine = transform_routine; - if (single->before == outer->data_symbol - || single->before == outer->tree_symbol) + if (single->before == outer->data_symbol) { if (single->after->resurfacer) recode_error (outer, _("Resurfacer set more than once for `%s'"), after_name); single->after->resurfacer = single; } - else if (single->after == outer->data_symbol - || single->after == outer->tree_symbol) + else if (single->after == outer->data_symbol) { if (single->before->unsurfacer) recode_error (outer, _("Unsurfacer set more than once for `%s'"), @@ -406,10 +392,6 @@ register_all_modules (RECODE_OUTER outer) return false; outer->data_symbol = alias->symbol; - if (alias = find_alias (outer, "tree", SYMBOL_CREATE_CHARSET), !alias) - return false; - outer->tree_symbol = alias->symbol; - if (alias = find_alias (outer, "ISO-10646-UCS-2", SYMBOL_CREATE_CHARSET), !alias) return false; diff --git a/src/recodext.h b/src/recodext.h index be7a965..57b3c1f 100644 --- a/src/recodext.h +++ b/src/recodext.h @@ -147,7 +147,6 @@ struct recode_outer /* Preset charsets and surfaces. 
*/ RECODE_SYMBOL data_symbol;/* special charset defining surfaces */ - RECODE_SYMBOL tree_symbol; /* special charset defining structures */ RECODE_SYMBOL ucs2_charset; /* UCS-2 */ RECODE_SYMBOL iconv_pivot; /* `iconv' internal UCS */ RECODE_SYMBOL crlf_surface; /* for IBM PC machines */ @@ -173,8 +172,7 @@ enum recode_symbol_type { RECODE_NO_SYMBOL_TYPE, /* missing value */ RECODE_CHARSET, /* visible in the space of charsets */ - RECODE_DATA_SURFACE, /* this is a mere data surface */ - RECODE_TREE_SURFACE /* this is a structural surface */ + RECODE_DATA_SURFACE /* this is a mere data surface */ }; enum recode_data_type @@ -570,7 +568,6 @@ enum alias_find_type { SYMBOL_CREATE_CHARSET, /* charset as given, create as needed */ SYMBOL_CREATE_DATA_SURFACE, /* data surface as given, create as needed */ - SYMBOL_CREATE_TREE_SURFACE, /* tree surface as given, create as needed */ ALIAS_FIND_AS_CHARSET, /* disambiguate only as a charset */ ALIAS_FIND_AS_SURFACE, /* disambiguate only as a surface */ ALIAS_FIND_AS_EITHER /* disambiguate as a charset or a surface */ diff --git a/src/request.c b/src/request.c index 41284f2..34d59c0 100644 --- a/src/request.c +++ b/src/request.c @@ -121,8 +121,7 @@ edit_sequence (RECODE_REQUEST request, bool edit_quality) /* Find unsurfacers. 
*/ while (step < request->sequence_array + request->sequence_length - && (step->after == outer->data_symbol - || step->after == outer->tree_symbol)) + && step->after == outer->data_symbol) step++; unsurfacer_end = step; @@ -154,8 +153,7 @@ edit_sequence (RECODE_REQUEST request, bool edit_quality) add_work_string (request, ".."); if (step < request->sequence_array + request->sequence_length - && step->before != outer->data_symbol - && step->before != outer->tree_symbol) + && step->before != outer->data_symbol) { last_charset_printed = step->after; add_work_string (request, last_charset_printed->name); @@ -164,15 +162,13 @@ edit_sequence (RECODE_REQUEST request, bool edit_quality) else { last_charset_printed = outer->data_symbol; - /* FIXME: why not outer->tree_symbol? */ add_work_string (request, last_charset_printed->name); } /* Print resurfacers. */ while (step < request->sequence_array + request->sequence_length - && (step->before == outer->data_symbol - || step->before == outer->tree_symbol)) + && step->before == outer->data_symbol) { add_work_character (request, '/'); last_charset_printed = NULL; diff --git a/src/task.c b/src/task.c index 5b70ca7..5dd62b2 100644 --- a/src/task.c +++ b/src/task.c @@ -214,15 +214,15 @@ transform_byte_to_variable (RECODE_SUBTASK subtask) SUBTASK_RETURN (subtask); } -/*---------------------------------------------------------------------. -| Execute the conversion sequence for a recoding TASK, using several | -| passes with two alternating memory buffers or intermediate files, or | -| forking for each step and interconnecting the processes with pipes. | -| This routine assumes at least one needed recoding step. | -`---------------------------------------------------------------------*/ - -static bool -perform_sequence (RECODE_TASK task) +/*------------------------------------------------------------------------. +| Execute the conversion sequence for a recoding TASK. 
If no conversions | +| are needed, merely copy the input onto the output. | +| Returns zero if the recoding has been found to be non-reversible. | +| Tell what goes on if VERBOSE. | +`------------------------------------------------------------------------*/ + +bool +recode_perform_task (RECODE_TASK task) { RECODE_CONST_REQUEST request = task->request; struct recode_subtask subtask_block; @@ -242,6 +242,18 @@ perform_sequence (RECODE_TASK task) subtask->task = task; subtask->input = task->input; + /* Switch stdin and stdout to binary mode unless they are ttys, as this has + nasty side-effects on several DOSish systems. For example, the Ctrl-Z + character is no longer interpreted as EOF, and thus the poor user cannot + signal end of input; the INTR character also doesn't work, so they cannot + even interrupt the program, and are stuck. On the other hand, output to + the screen doesn't have to follow the end-of-line format exactly, since + it is going to be discarded anyway. */ + if (task->input.name && !*task->input.name && !isatty (fileno (stdin))) + xset_binary_mode (fileno (stdin), O_BINARY); + if (task->output.name && !*task->output.name && !isatty (fileno (stdout))) + xset_binary_mode (fileno (stdout), O_BINARY); + /* Prepare the first input file. */ if (subtask->input.name) @@ -515,28 +527,3 @@ recode_delete_task (RECODE_TASK task) free (task); return true; } - -/*------------------------------------------------------------------------. -| Execute the conversion sequence for a recoding TASK. If no conversions | -| are needed, merely copy the input onto the output. | -| Returns zero if the recoding has been found to be non-reversible. | -| Tell what goes on if VERBOSE. | -`------------------------------------------------------------------------*/ - -bool -recode_perform_task (RECODE_TASK task) -{ - /* Switch stdin and stdout to binary mode unless they are ttys, as this has - nasty side-effects on several DOSish systems. 
For example, the Ctrl-Z - character is no longer interpreted as EOF, and thus the poor user cannot - signal end of input; the INTR character also doesn't work, so they cannot - even interrupt the program, and are stuck. On the other hand, output to - the screen doesn't have to follow the end-of-line format exactly, since - it is going to be discarded anyway. */ - if (task->input.name && !*task->input.name && !isatty (fileno (stdin))) - xset_binary_mode (fileno (stdin), O_BINARY); - if (task->output.name && !*task->output.name && !isatty (fileno (stdout))) - xset_binary_mode (fileno (stdout), O_BINARY); - - return perform_sequence (task); -} diff --git a/tests/Recode.pyx b/tests/Recode.pyx index 96dfaa9..366cc9c 100644 --- a/tests/Recode.pyx +++ b/tests/Recode.pyx @@ -39,7 +39,6 @@ cdef extern from "common.h": RECODE_NO_SYMBOL_TYPE RECODE_CHARSET RECODE_DATA_SURFACE - RECODE_TREE_SURFACE enum recode_data_type: RECODE_NO_CHARSET_DATA @@ -235,7 +234,6 @@ cdef extern from "common.h": unsigned number_of_singles unsigned char *one_to_same RECODE_SYMBOL data_symbol - RECODE_SYMBOL tree_symbol RECODE_SYMBOL ucs2_charset RECODE_SYMBOL iconv_pivot RECODE_SYMBOL crlf_surface @@ -310,7 +308,6 @@ cdef extern from "common.h": enum alias_find_type: SYMBOL_CREATE_CHARSET_ 'SYMBOL_CREATE_CHARSET' SYMBOL_CREATE_DATA_SURFACE_ 'SYMBOL_CREATE_DATA_SURFACE' - SYMBOL_CREATE_TREE_SURFACE_ 'SYMBOL_CREATE_TREE_SURFACE' ALIAS_FIND_AS_CHARSET_ 'ALIAS_FIND_AS_CHARSET' ALIAS_FIND_AS_SURFACE_ 'ALIAS_FIND_AS_SURFACE' ALIAS_FIND_AS_EITHER_ 'ALIAS_FIND_AS_EITHER' @@ -456,7 +453,6 @@ class error(Exception): NO_SYMBOL_TYPE = RECODE_NO_SYMBOL_TYPE CHARSET = RECODE_CHARSET DATA_SURFACE = RECODE_DATA_SURFACE -TREE_SURFACE = RECODE_TREE_SURFACE NO_CHARSET_DATA = RECODE_NO_CHARSET_DATA STRIP_DATA = RECODE_STRIP_DATA @@ -506,7 +502,6 @@ STRIP_SIZE = STRIP_SIZE_ SYMBOL_CREATE_CHARSET = SYMBOL_CREATE_CHARSET_ SYMBOL_CREATE_DATA_SURFACE = SYMBOL_CREATE_DATA_SURFACE_ -SYMBOL_CREATE_TREE_SURFACE = 
SYMBOL_CREATE_TREE_SURFACE_ ALIAS_FIND_AS_CHARSET = ALIAS_FIND_AS_CHARSET_ ALIAS_FIND_AS_SURFACE = ALIAS_FIND_AS_SURFACE_ ALIAS_FIND_AS_EITHER = ALIAS_FIND_AS_EITHER_ @@ -555,8 +550,7 @@ cdef class Outer: while symbol is not NULL: if (symbol.type == RECODE_CHARSET and symbol is not self.outer.iconv_pivot - and symbol is not self.outer.data_symbol - and symbol is not self.outer.tree_symbol): + and symbol is not self.outer.data_symbol): list.append(symbol.name) symbol = symbol.next return list diff --git a/tests/t21_names.py b/tests/t21_names.py index 0915452..401d7e7 100644 --- a/tests/t21_names.py +++ b/tests/t21_names.py @@ -220,7 +220,6 @@ T.61-7bit iso-ir-102 TCVN Texinfo texi ti Texte txte -tree UNICODE-1-1-UTF-7 TF-7 u7 UTF-7 UTF-8 FSS_UTF TF-8 u8 UTF-2 UTF-FSS UTF-16 TF-16 u6 Unicode
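
Reviewer note: with the two `tree` branches gone from `declare_single` in src/outer.c, a step can play only three roles. The sketch below is a minimal standalone model of that decision, not code from the Recode sources. `enum step_role` and `classify_step` are hypothetical names introduced for illustration. It also assumes the remaining `data` branches compare names with `strcmp` the way the removed `tree` branches did, since the hunk shows only the tail of them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the role a declared step plays; these names
   are illustrative and do not exist in the Recode sources.  */
enum step_role
{
  STEP_RESURFACER,   /* data -> surface: applies the surface */
  STEP_UNSURFACER,   /* surface -> data: removes the surface */
  STEP_CHARSET       /* ordinary charset-to-charset step */
};

/* Classify a step by its endpoint names.  The `before' name is tested
   first and the `after' name second, matching the order of the checks
   that remain in declare_single after this change.  */
enum step_role
classify_step (const char *before_name, const char *after_name)
{
  assert (before_name != NULL && after_name != NULL);

  if (strcmp (before_name, "data") == 0)
    return STEP_RESURFACER;
  if (strcmp (after_name, "data") == 0)
    return STEP_UNSURFACER;

  /* Anything else, including the former special name "tree", is now
     treated as a plain charset name.  */
  return STEP_CHARSET;
}
```

In this model, `classify_step ("tree", "x")` falls through to the charset case, which is the visible effect of deleting the `tree` branches: the name no longer receives special handling.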