VML plugin: replace 'gvputs(… html_string(…))' with 'xml_escape' functionality
This is further work towards unifying the XML escaping code (#1868). This change
has no functional impact but makes this processing slightly more efficient
(escaped text is emitted directly into the target file/stream instead of first
constructed in an intermediate buffer) and thread-safe (a static buffer is no
longer used for escaping). The latter is of limited significance because other
factors still make it unsafe to call into this plugin from multiple threads.
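The difference between buffering and direct emission can be sketched as
follows. This is only an illustration: `escape_to`, `sink_fn`, `strbuf` and
`strbuf_sink` are hypothetical names, not the actual `xml_escape` interface.
Each escaped piece is handed straight to a caller-supplied sink, so the
escaper needs no static or intermediate buffer of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sink callback: receives one piece of output at a time and
 * returns a negative value on failure. */
typedef int (*sink_fn)(void *state, const char *s);

/* Escape `in` and pass each piece straight to `sink`. No static or
 * intermediate buffer is involved, so the function itself is reentrant. */
static int escape_to(const char *in, sink_fn sink, void *state) {
  for (; *in != '\0'; ++in) {
    char raw[2] = {*in, '\0'};
    const char *piece;
    switch (*in) {
    case '&': piece = "&amp;"; break;
    case '<': piece = "&lt;"; break;
    case '>': piece = "&gt;"; break;
    case '"': piece = "&quot;"; break;
    default:  piece = raw; break;
    }
    int rc = sink(state, piece);
    if (rc < 0)
      return rc;
  }
  return 0;
}

/* Example sink for illustration: append to a caller-owned buffer. A real
 * sink could just as well write to a file or stream. */
struct strbuf {
  char data[128];
  size_t len;
};

static int strbuf_sink(void *state, const char *s) {
  struct strbuf *b = state;
  size_t n = strlen(s);
  if (b->len + n >= sizeof(b->data))
    return -1; /* does not fit */
  memcpy(b->data + b->len, s, n + 1);
  b->len += n;
  return 0;
}
```

Because all state lives with the caller, two threads using separate sinks do
not share anything inside the escaper in this sketch.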
This functionality was previously tested only indirectly, through other graph
processing that uses escaping. This change introduces unit tests for the
function itself, giving us an extra safeguard and an easier way to diagnose
problems with this functionality.
xml_core: support a mode for escaping UTF-8 characters
This is modeled after `html_string` in the VML plugin and intended to replace
that function in a future commit. It differs from `html_string` in the following
ways:
* More limited Unicode character detection. `html_string` has a very
generalized notion of a valid character that extends to lengths beyond what
UTF-8 allows. This new implementation in `xml_core` adheres more strictly to
only valid UTF-8 character lengths.
* Simpler character parsing. `html_string` is written in a style to (1) decode
character byte length without branching and (2) use the outer loop to also
loop over the UTF-8 character’s bytes. This new implementation in `xml_core`
uses simpler, more obvious code for decoding the byte length and consumes
more than one character of the input instead of reusing the outer loop. This
code is not on a hot path and it is not necessary or helpful to
micro-optimize the control flow.
* Hex escapes instead of decimal escapes. `html_string` uses `&#[0-9]+;`
escapes while this new implementation uses `&#x[0-9a-f]+;` escapes. For
many characters, this results in a shorter sequence. A compiler that knows
`snprintf` as a built-in (all recent GCC and Clang) should also be able to
generate a hex escape without using any division operations.
Note that nothing yet uses this functionality; all existing calls that go
through this code have the `utf8` flag unset.
This code aborts on encountering an invalid UTF-8 character. This is not ideal,
but matches `html_string`’s error handling. Perhaps this can be improved in
future.
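A minimal sketch of the approach described above, not the `xml_core` code
itself: `utf8_length` and `escape_utf8_char` are hypothetical names. It
decodes the byte length with plain branches, accepts only lengths 1 to 4,
emits a hex escape via `snprintf`, advances the input pointer by the number of
bytes consumed, and aborts on malformed input.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Byte length of the UTF-8 sequence introduced by lead byte `c`, or 0 if
 * `c` cannot start a sequence. Only lengths 1-4 are accepted. */
static size_t utf8_length(unsigned char c) {
  if (c < 0x80) return 1;           /* 0xxxxxxx */
  if ((c & 0xE0) == 0xC0) return 2; /* 110xxxxx */
  if ((c & 0xF0) == 0xE0) return 3; /* 1110xxxx */
  if ((c & 0xF8) == 0xF0) return 4; /* 11110xxx */
  return 0;
}

/* Decode one character at `*s`, write its `&#x...;` escape into `out`, and
 * advance `*s` past the bytes consumed. Aborts on a bad lead byte or a bad
 * continuation byte (a NUL in the middle of a sequence counts as bad).
 * Simplification: overlong encodings and surrogates are not rejected. */
static void escape_utf8_char(const char **s, char *out, size_t out_size) {
  static const unsigned char lead_mask[] = {0, 0x7F, 0x1F, 0x0F, 0x07};
  const unsigned char *p = (const unsigned char *)*s;
  size_t len = utf8_length(p[0]);
  if (len == 0)
    abort();
  uint32_t cp = p[0] & lead_mask[len];
  for (size_t i = 1; i < len; ++i) {
    if ((p[i] & 0xC0) != 0x80) /* continuation bytes look like 10xxxxxx */
      abort();
    cp = (cp << 6) | (p[i] & 0x3F);
  }
  snprintf(out, out_size, "&#x%x;", (unsigned)cp);
  *s += len;
}
```

A caller would invoke `escape_utf8_char` only for non-ASCII input and keep
looping from the updated pointer, which is why the pointer is passed by
address.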
xml_core: update input pointer to reflect how many characters were consumed
This has no immediate effect because the function only ever consumes a single
character. However, a future change will introduce more sophisticated escaping
that sometimes involves consuming more than one character from the input.
This release comes sooner after 2.49.2 than the usual release cadence. An
unintended regression committed after 2.49.1 made it into 2.49.2 (fixed in
the commit series merged in c5ee41f65cc02c9d96f8f27a9fb5e6314424a4d9). To
minimize the time the latest Graphviz version contains a known regression,
this commit makes a new release sooner than would otherwise be done.
Magnus Jacobsson [Tue, 19 Oct 2021 05:15:02 +0000 (07:15 +0200)]
sfio: correct misleading indentation in SFnputc macro definition
Fixes errors like the following when building with CMake (which uses -Wall,
-Wextra and -Werror) on Ubuntu 21.10 with GCC 11.2.0:
../lib/sfio/sfvprintf.c:76:13: error: this ‘if’ clause does not guard... [-Werror=misleading-indentation]
76 | if(n != w) goto done; n = 0;\
| ^~
../lib/sfio/sfvprintf.c:511:25: note: in expansion of macro ‘SFnputc’
511 | SFnputc(f, '0', n);
| ^~~~~~~
../lib/sfio/sfvprintf.c:511:41: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
511 | SFnputc(f, '0', n);
| ^
../lib/sfio/sfvprintf.c:76:35: note: in definition of macro ‘SFnputc’
76 | if(n != w) goto done; n = 0;\
| ^
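The general shape of such a fix can be sketched as below. `FILL_N` and
`pad_zeros` are hypothetical and only loosely resemble the macro in the
diagnostic. Each statement sits on its own line inside a `do { } while (0)`
wrapper, so nothing appears to be guarded by the `if` that is not.

```c
#include <stddef.h>

/* Hypothetical multi-statement macro. The `goto` is the only statement
 * guarded by the `if`, and the layout now makes that obvious. */
#define FILL_N(buf, cap, len, ch, n)                                       \
  do {                                                                     \
    if ((len) + (n) > (cap))                                               \
      goto done;                                                           \
    for (size_t i_ = 0; i_ < (n); ++i_)                                    \
      (buf)[(len)++] = (ch);                                               \
    (n) = 0;                                                               \
  } while (0)

/* Write `n` '0' characters into `buf`. Returns the number written, or -1
 * if they do not fit into `cap` bytes. */
static int pad_zeros(char *buf, size_t cap, size_t n) {
  size_t len = 0;
  int rc = -1;
  FILL_N(buf, cap, len, '0', n);
  rc = (int)len;
done:
  return rc;
}
```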
Magnus Jacobsson [Tue, 19 Oct 2021 05:15:02 +0000 (07:15 +0200)]
sfio: correct misleading indentation in REINIT macro definition
Fixes errors like the following when building with CMake (which uses -Wall,
-Wextra and -Werror) on Ubuntu 21.10 with GCC 11.2.0:
../lib/sfio/sfdisc.c:114:25: error: this ‘for’ clause does not guard... [-Werror=misleading-indentation]
114 | { for(d = f->disc; d && !d->iof; d = d->disc) ; \
| ^~~
../lib/sfio/sfdisc.c:119:9: note: in expansion of macro ‘REINIT’
119 | REINIT(oreadf, readf, Sfread_f);
| ^~~~~~
../lib/sfio/sfdisc.c:115:25: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘for’
115 | if(DISCF(d,iof,type) != oiof) \
| ^~
../lib/sfio/sfdisc.c:119:9: note: in expansion of macro ‘REINIT’
119 | REINIT(oreadf, readf, Sfread_f);
| ^~~~~~
Magnus Jacobsson [Tue, 19 Oct 2021 05:15:02 +0000 (07:15 +0200)]
sfio: correct misleading indentation in GETDISCF macro definition
Fixes errors like the following when building with CMake (which uses -Wall,
-Wextra and -Werror) on Ubuntu 21.10 with GCC 11.2.0:
../lib/sfio/sfdisc.c:68:11: error: this ‘for’ clause does not guard... [-Werror=misleading-indentation]
68 | { for(d = f->disc; d && !d->iof; d = d->disc) ; \
| ^~~
../lib/sfio/sfdisc.c:71:5: note: in expansion of macro ‘GETDISCF’
71 | GETDISCF(oreadf, readf, Sfread_f);
| ^~~~~~~~
../lib/sfio/sfdisc.c:71:14: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘for’
71 | GETDISCF(oreadf, readf, Sfread_f);
| ^~~~~~
../lib/sfio/sfdisc.c:69:11: note: in definition of macro ‘GETDISCF’
69 | func = d ? d->iof : NULL; \
| ^~~~
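For the `for` variant of this warning, one way to keep the intent clear is to
give the loop an explicit body instead of a trailing `;` followed by another
statement on the same line. The sketch below uses a hypothetical `struct disc`
and helper name, not the sfio types.

```c
#include <stddef.h>

/* Hypothetical stand-in for a chain of disciplines. */
struct disc {
  struct disc *next;
  const void *iof; /* placeholder for an I/O function pointer */
};

/* Skip entries without an I/O function. The loop body is explicit, so the
 * statement that follows cannot be mistaken for part of the loop. */
static struct disc *first_with_iof(struct disc *d) {
  while (d != NULL && d->iof == NULL) {
    d = d->next;
  }
  return d;
}
```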
lib/glcomp: replace header guards with more modern #pragma once
Amusingly it looks like some time in the past (prior to version control) someone
got a little trigger happy find-and-replacing “CompText” with “CompFont”.
CMake: only pass 'YY_NO_UNISTD_H' to Flex when unistd.h is not found
This macro tells Flex to avoid #including unistd.h. It is unnecessary on almost
every platform except Windows, which is why `YY_NO_UNISTD_H` is otherwise only
mentioned in the MSBuild build system and not in the Autotools build system. In
the CMake build system, which is meant to be used across Windows and
non-Windows platforms, we can do something more nuanced and depend on the
existence check of unistd.h itself.
extokens: fix missing NUL terminator append in GVPR tokenization
This is the second half of a bug fix following the prior commit.
Commit 971293551421455a0d939b9f8cea17356b7968f8 refactored this code to avoid
the use of an SFIO buffer, inadvertently introducing a bug. The change did not
account for the source buffer not being NUL terminated. This fix sticks closer to
the original code, not assuming a NUL terminator and copying a known number of
bytes into the destination.
exsplit: fix missing NUL terminator append in GVPR splitting
Commit 7ef9d53e2e6dc53c44939ace7a9cad57c3aa00bf refactored this code to avoid
the use of an SFIO buffer, inadvertently introducing a bug. The change did not
account for the source buffer not being NUL terminated. This fix sticks closer to
the original code, not assuming a NUL terminator and copying a known number of
bytes into the destination.
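The pattern described here, copying a known number of bytes and appending the
terminator explicitly, can be sketched as follows. `dup_bytes` is a
hypothetical helper, not a function from the GVPR sources.

```c
#include <stdlib.h>
#include <string.h>

/* Copy exactly `len` bytes from `src`, which need not be NUL terminated,
 * into a fresh heap buffer and append the terminator explicitly.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *dup_bytes(const char *src, size_t len) {
  char *dst = malloc(len + 1);
  if (dst == NULL)
    return NULL;
  memcpy(dst, src, len); /* never reads past src[len - 1] */
  dst[len] = '\0';
  return dst;
}
```

A `strcpy`- or `strdup`-style call in the same place would keep reading until
it found a NUL, which is the assumption the source buffer does not satisfy.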
Match function signature in definition with declaration
GCC 11 emits the following warnings:
graphviz/lib/pathplan/shortest.c:93:47: warning: argument 2 of type ‘Ppoint_t *’ {aka ‘struct Pxy_t *’} declared as a pointer [-Warray-parameter=]
93 | int Pshortestpath(Ppoly_t * polyp, Ppoint_t * eps, Ppolyline_t * output)
| ~~~~~~~~~~~^~~
In file included from graphviz/lib/pathplan/pathutil.h:15,
from graphviz/lib/pathplan/shortest.c:16:
graphviz/lib/pathplan/pathplan.h:22:59: note: previously declared as an array ‘Ppoint_t[2]’ {aka ‘struct Pxy_t[2]’}
22 | extern int Pshortestpath(Ppoly_t * boundary, Ppoint_t endpoints[2],
| ~~~~~~~~~^~~~~~~~~~~~
[2/20] Building C object lib/pathplan/CMakeFiles/pathplan.dir/route.c.o
graphviz/lib/pathplan/route.c:76:29: warning: argument 4 of type ‘Ppoint_t *’ {aka ‘struct Pxy_t *’} declared as a pointer [-Warray-parameter=]
76 | Ppoint_t * evs, Ppolyline_t * output)
| ~~~~~~~~~~~^~~
In file included from graphviz/lib/pathplan/pathutil.h:15,
from graphviz/lib/pathplan/route.c:17:
graphviz/lib/pathplan/pathplan.h:28:39: note: previously declared as an array ‘Pvector_t[2]’ {aka ‘struct Pxy_t[2]’}
28 | Pvector_t endpoint_slopes[2],
| ~~~~~~~~~~^~~~~~~~~~~~~~~~~~
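A small illustration of what the warning is about, using a hypothetical
`point_t` and `span_x` rather than the pathplan types. A parameter declared as
`T name[2]` and one declared as `T *name` have the same type in C, but GCC 11
warns when a declaration and its definition spell them differently.

```c
/* Hypothetical types and names, for illustration only. */
typedef struct {
  double x, y;
} point_t;

/* Declaration, as it might appear in a header. */
double span_x(point_t endpoints[2]);

/* The definition uses the same array form as the declaration. Writing
 * `point_t *endpoints` here would compile to the same thing but is what
 * -Warray-parameter complains about. */
double span_x(point_t endpoints[2]) {
  return endpoints[1].x - endpoints[0].x;
}
```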