Reuben Thomas [Sat, 27 Jan 2018 12:43:15 +0000 (12:43 +0000)]
task.c: slightly improve code and comments
This is the fruit of an aborted investigation into using pipes all the
time (with the present architecture, it won’t work for translations into a
buffer).
It could be achieved by either a) adding one more stage to write to a file,
and having the parent process read that back, or b) adding an extra pipe
back to the parent process, but that would then require non-blocking I/O,
which would involve i) a new type of communication channel (e.g. file
descriptors), and ii) reimplementation, if desired, of buffering.
An overall simpler and more portable solution would be to convert the
parallel code to use threads instead of child processes, so that the output
buffer would remain accessible to the parent process.
Reuben Thomas [Fri, 26 Jan 2018 21:02:31 +0000 (21:02 +0000)]
html.c: add explanation to FIXME
Outputting a BOM into HTML is marked “experimental”. Since there are
situations in which a BOM should not be output, this seems right. Needs
further evaluation.
Reuben Thomas [Thu, 25 Jan 2018 09:52:41 +0000 (09:52 +0000)]
Run tests with valgrind
Use the valgrind-python.supp suppressions file (latest version from cpython
git), adding wildcard * to end of PyObject_Realloc name to match LTO
symbols. However, even this is not currently enough to run with the standard
/usr/bin/python, so use /usr/bin/python-dbg instead, which does work (with
some extra suppressions). See https://bugs.python.org/issue32666
Remove a bogus setting of PYTHON in tests/Makefile.am.
Reuben Thomas [Fri, 26 Jan 2018 11:49:06 +0000 (11:49 +0000)]
Fix memory leaks
Use the existing, but unused, term_routine member of struct recode_step to
register finalisers.
Also add step_table_term_routine, as adding separate finalisers for
different table types is nicer than having to add finalisers for a huge
number of transformers.
Zero-initialise all allocated memory, both so that we can rely on it and as
a defensive measure.
Move struct ucs2_to_byte from recodext.h into recode.c, its only user.
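A minimal sketch of the finaliser idea, not the actual librecode code: the
struct below is a hypothetical, cut-down stand-in for struct recode_step, and
every demo_* name is invented for illustration. It shows one shared finaliser
registered through a term_routine-style member, and calloc used so that unset
members start out as zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for struct recode_step: only the two
   members needed to illustrate the finaliser idea.  */
struct demo_step
{
  void *step_table;                           /* per-step data, may be NULL */
  void (*term_routine) (struct demo_step *);  /* finaliser, may be NULL */
};

/* Shared finaliser for every step whose table is a single heap block.  */
void
demo_table_term_routine (struct demo_step *step)
{
  free (step->step_table);
  step->step_table = NULL;
}

/* Allocate a zero-initialised step, so that unset members read as NULL
   (true on mainstream platforms, where NULL is all-bits-zero).  */
struct demo_step *
demo_step_new (size_t table_size)
{
  struct demo_step *step = calloc (1, sizeof *step);
  if (step == NULL)
    return NULL;
  if (table_size > 0)
    {
      step->step_table = calloc (1, table_size);
      if (step->step_table == NULL)
        {
          free (step);
          return NULL;
        }
      /* Register the finaliser only when there is something to free.  */
      step->term_routine = demo_table_term_routine;
    }
  return step;
}

/* Run the finaliser, if one was registered, then release the step.  */
void
demo_step_free (struct demo_step *step)
{
  assert (step != NULL);
  if (step->term_routine != NULL)
    step->term_routine (step);
  free (step);
}
```

Registering one finaliser per table type, as here, means that each transformer
only has to pick the routine matching the kind of table it allocated.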
Reuben Thomas [Thu, 25 Jan 2018 09:56:09 +0000 (09:56 +0000)]
Travis: switch off verbose build
Comment it out for debugging use, but for now it makes the logs too long to
read in the nice web presentation (instead one is forced to read the raw
logs), and the Recode test system shouldn’t need verbose logging.
Reuben Thomas [Tue, 23 Jan 2018 23:37:40 +0000 (23:37 +0000)]
Remove ifdeffed-out alias of ‘.’ for ‘RFC 1345’
Although we will probably not implement ‘.’ to mean “guess the charset” as
suggested in the comment, there’s also no reason to have a catch-all alias
for RFC 1345 now, when iconv would be a better choice.
Reuben Thomas [Tue, 23 Jan 2018 22:26:42 +0000 (22:26 +0000)]
Remove built-in applemac module
Resolve differences with RFC 1345 code. They were caused simply by extra
characters in the latter missing from the former (I compared the two tables
character by character, by eye).
Reuben Thomas [Mon, 22 Jan 2018 23:35:27 +0000 (23:35 +0000)]
Allow pipe filters to signal when they are interrupted again
Rename the interrupted variable, previously static in main, to
recode_interrupted, and define it in task.c (hence in librecode), so that it
can now be tested at the end of a pipe recode.
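A rough sketch of the pattern, under stated assumptions: demo_interrupted
stands in for recode_interrupted, and the handler and filter are hypothetical,
not the code in task.c. The handler only sets a flag; the filter tests the
flag once its work is done and reports failure if a signal arrived.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the library-level flag described above.  */
volatile sig_atomic_t demo_interrupted = 0;

/* Signal handler: only records that the signal arrived.  */
void
demo_on_signal (int signum)
{
  (void) signum;
  demo_interrupted = 1;
}

/* Hypothetical pipe-style filter: copies a string into a bounded buffer,
   then reports failure if a signal was recorded at any earlier point.  */
bool
demo_filter (const char *in, char *out, size_t size)
{
  size_t i;
  for (i = 0; i + 1 < size && in[i] != '\0'; i++)
    out[i] = in[i];
  if (size > 0)
    out[i] = '\0';
  return demo_interrupted == 0;
}
```

Because the flag has external linkage, both the program that installs the
handler and the library code that runs the filter can see it.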
Reuben Thomas [Mon, 22 Jan 2018 22:22:09 +0000 (22:22 +0000)]
Remove --sequence=files
Assume that any reasonable target OS has virtual memory (on which holding
the data in memory has much the same performance implications as using
files).
Reuben Thomas [Sat, 20 Jan 2018 00:01:53 +0000 (00:01 +0000)]
Overhaul error handling in transform steps (fixes Debian bug #215285)
Three main principles were applied:
1. Check all return codes (this fixes Debian bug #215285, in which data is
lost when there is no space left on the device).
2. Ensure that resources are not leaked (memory and file descriptors).
3. Consistently use the recode error signalling mechanism: rather than
arbitrarily signalling failure on an I/O error, set error level
RECODE_SYSTEM_ERROR, and signal failure according to fail_level.
It proved to be useful to merge perform_{memory,pass}_sequence.
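The three principles can be sketched as follows. This is an illustration
only: the enumeration, struct demo_task and both functions are hypothetical
and far simpler than the real Recode types; only the idea of recording a
system error and comparing it against fail_level comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error levels, ordered by severity.  */
enum demo_error
{
  DEMO_NO_ERROR,
  DEMO_UNTRANSLATABLE,
  DEMO_INVALID_INPUT,
  DEMO_SYSTEM_ERROR,
  DEMO_INTERNAL_ERROR
};

struct demo_task
{
  enum demo_error error_so_far;  /* worst error seen so far */
  enum demo_error fail_level;    /* severity at which the task fails */
};

/* Record an error; return true if the task has now reached fail_level.  */
bool
demo_set_error (struct demo_task *task, enum demo_error error)
{
  if (error > task->error_so_far)
    task->error_so_far = error;
  return task->error_so_far >= task->fail_level;
}

/* Write a buffer and close the stream, checking every return code.  A NULL
   stream (for example from a failed fopen) counts as an I/O error.  The
   stream is always closed, so no descriptor leaks.  Returns false only if
   the recorded error reaches fail_level.  */
bool
demo_write_all (struct demo_task *task, FILE *stream,
                const char *data, size_t size)
{
  bool ok = stream != NULL;
  if (ok && fwrite (data, 1, size, stream) != size)
    ok = false;
  if (stream != NULL && fclose (stream) != 0)
    ok = false;
  if (!ok && demo_set_error (task, DEMO_SYSTEM_ERROR))
    return false;
  return true;
}
```

Checking fclose matters here because buffered data may only reach the device
at that point, which is where a full disk is typically reported.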
Reuben Thomas [Thu, 18 Jan 2018 00:36:48 +0000 (00:36 +0000)]
TODO: overhaul
Replace François’s email address with mine, remove things that are either
done or seem to refer to personal files of François, and the section on the
MS-DOS port (assumed defunct).
Reuben Thomas [Wed, 17 Jan 2018 22:43:10 +0000 (22:43 +0000)]
Try to diagnose untranslatable input when using iconv
See Debian bug #348909.
The problem starts with the fact that iconv returns EILSEQ (invalid input)
when in fact the input is merely untranslatable.
It is possible to diagnose this situation by running another conversion with
the output encoding the same as the input (so that it will always succeed on
valid input) at the same point. This is what we now do. Unfortunately,
there’s no way I can see to work out how much input to skip (i.e. the length
of the untranslatable character in the source encoding). Hence, we still
just skip one byte. The typical result is that invalid input is diagnosed on
the next step, resulting in the same problem as at present.
Two possible workarounds are to not use iconv, or to set abort_level to
RECODE_UNTRANSLATABLE (this is what test_2 in t80_error.py does).
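A simplified sketch of the diagnosis, assuming glibc-style behaviour in which
iconv fails with EILSEQ on untranslatable characters (other implementations
may substitute a replacement character instead). The demo_* names are
hypothetical, and unlike the approach described above, which re-checks at the
point of failure, this version re-checks the whole input.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

enum demo_diagnosis
{
  DEMO_OK,
  DEMO_UNTRANSLATABLE,
  DEMO_INVALID,
  DEMO_OTHER
};

/* Convert the whole input, discarding the output.  Return 0 on success,
   otherwise the errno value of the failing call.  */
int
demo_try (const char *to, const char *from, const char *input, size_t length)
{
  char buffer[256];
  char *in = (char *) input;
  size_t in_left = length;
  int result = 0;
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return errno;
  while (in_left > 0)
    {
      char *out = buffer;
      size_t out_left = sizeof buffer;
      /* E2BIG only means the scratch buffer filled up: go round again.  */
      if (iconv (cd, &in, &in_left, &out, &out_left) == (size_t) -1
          && errno != E2BIG)
        {
          result = errno;
          break;
        }
    }
  iconv_close (cd);
  return result;
}

/* On EILSEQ, convert again with the output encoding equal to the input
   encoding: success there means the input was valid, so the original
   failure was an untranslatable character rather than invalid input.  */
enum demo_diagnosis
demo_diagnose (const char *to, const char *from,
               const char *input, size_t length)
{
  int error = demo_try (to, from, input, length);
  if (error == 0)
    return DEMO_OK;
  if (error != EILSEQ)
    return DEMO_OTHER;
  return demo_try (from, from, input, length) == 0
         ? DEMO_UNTRANSLATABLE : DEMO_INVALID;
}
```

The sketch does not address the remaining problem noted above: it says
nothing about how many bytes the untranslatable character occupies.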
Reuben Thomas [Wed, 17 Jan 2018 13:13:12 +0000 (13:13 +0000)]
base64.c: fix handling of EOF and LF (fixes Debian bug #271939)
LF can occur before the end of a full line (76 characters) if it’s
immediately followed by EOF.
The last line can be any number of quadruplets long; it need not be 76
characters. (I suspect this was an attempt to deal with LF without the extra
call to get_byte and goto.)
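The line rules described above can be sketched as a standalone check. This is
a hypothetical helper, not the code in base64.c: it looks only at line
lengths, does not validate the Base64 alphabet or padding, and its treatment
of empty lines is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_LINE_LENGTH 76

/* Check the line layout of Base64 text.  Every line must be a non-empty
   whole number of quadruplets, at most 76 characters long.  Every line
   except the last must be exactly 76 characters.  The last line may be
   shorter, and its LF may be followed immediately by end of input.  */
bool
demo_base64_layout_ok (const char *text)
{
  while (*text != '\0')
    {
      size_t length = strcspn (text, "\n");
      size_t consumed = length + (text[length] == '\n' ? 1 : 0);
      bool is_last = text[consumed] == '\0';
      if (length == 0 || length % 4 != 0 || length > DEMO_LINE_LENGTH)
        return false;
      if (!is_last && length != DEMO_LINE_LENGTH)
        return false;
      text += consumed;
    }
  return true;
}
```

For instance, a single short line such as "QUJD" followed by LF passes, while
two short lines in a row do not, since only the last line may be short.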
Reuben Thomas [Tue, 16 Jan 2018 10:37:48 +0000 (10:37 +0000)]
Recode.pyx: minor improvements
size_t is supported natively these days, so don’t make a dangerous guess as
to its value (in particular, one should never hard-wire things determined in
config.h!).
Use libcpp’s bool instead of hand-defining a bool enum.
Use True for C true (will be automatically coerced).
Reuben Thomas [Tue, 16 Jan 2018 01:06:51 +0000 (01:06 +0000)]
varia.c: resolve various conflicts
Rather than using compile-time macros to choose between two different options:
1. Resolve the Kamenický encoding by referring to the Wikipedia version at
https://en.wikipedia.org/wiki/Kamenick%C3%BD_encoding
2. Allow the extra characters in the Cork encoding table.
Also fill in some missing Unicode code points, fixing some FIXMEs.