granicus.if.org Git - postgresql/commit

author	Tom Lane <tgl@sss.pgh.pa.us>
	Thu, 11 Feb 2016 00:30:12 +0000 (19:30 -0500)
committer	Tom Lane <tgl@sss.pgh.pa.us>
	Thu, 11 Feb 2016 00:30:12 +0000 (19:30 -0500)
commit	e56acbe2a3d942e0f029ecdb1eb8b11f7b9e99b2
tree	3ea425c722a5a41e673f70e8232c9d3b80cbb552
parent	3dca6f36fcd694c8c49d26e7c4971194dee2754a

Avoid use of sscanf() to parse ispell dictionary files.

It turns out that on FreeBSD-derived platforms (including OS X), the
*scanf() family of functions is pretty much brain-dead about multibyte
characters.  In particular it will apply isspace() to individual bytes
of input even when those bytes are part of a multibyte character, thus
allowing false recognition of a field-terminating space.
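
To make the failure mode concrete, here is a minimal sketch, not taken from the patch. It assumes UTF-8 input and uses a hypothetical bytewise_isspace() that reports byte 0xA0 as whitespace, standing in for a locale-dependent isspace(); whether a given libc does this is an assumption for illustration only. The helper names field_len_bytewise(), field_len_mbaware() and utf8_char_len() are likewise invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Plain ASCII whitespace test; safe to apply to a single-byte character. */
static int
ascii_isspace(unsigned char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/*
 * Hypothetical stand-in for a locale-dependent isspace() that also reports
 * byte 0xA0 as whitespace.  This is an assumption for illustration, not a
 * statement about any particular libc.
 */
static int
bytewise_isspace(unsigned char c)
{
	return ascii_isspace(c) || c == 0xA0;
}

/*
 * Byte length of the UTF-8 character starting at s (s[0] must not be NUL).
 * Never counts past the string terminator; an invalid lead byte counts as 1.
 */
static size_t
utf8_char_len(const char *s)
{
	unsigned char c = (unsigned char) s[0];
	size_t		want;
	size_t		have;

	if (c < 0x80)
		want = 1;
	else if ((c & 0xE0) == 0xC0)
		want = 2;
	else if ((c & 0xF0) == 0xE0)
		want = 3;
	else if ((c & 0xF8) == 0xF0)
		want = 4;
	else
		want = 1;

	for (have = 1; have < want && s[have] != '\0'; have++)
		;
	return have;
}

/* Length of the first field when every byte is tested on its own. */
static size_t
field_len_bytewise(const char *s)
{
	size_t		n = 0;

	assert(s != NULL);
	while (s[n] != '\0' && !bytewise_isspace((unsigned char) s[n]))
		n++;
	return n;
}

/* Length of the first field when whole characters are stepped over. */
static size_t
field_len_mbaware(const char *s)
{
	size_t		n = 0;

	assert(s != NULL);
	while (s[n] != '\0')
	{
		size_t		len = utf8_char_len(s + n);

		/* Only a single-byte character can be a field separator. */
		if (len == 1 && ascii_isspace((unsigned char) s[n]))
			break;
		n += len;
	}
	return n;
}
```

With the input "voil\xC3\xA0 x" (the bytes C3 A0 encode one character), the byte-wise scan stops inside the character at byte 0xA0, while the character-wise scan stops at the real space.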

We appear to have little alternative other than instituting a coding
rule that *scanf() is not to be used if the input string might contain
multibyte characters.  (There was some discussion of relying on "%ls",
but that probably just moves the portability problem somewhere else,
and besides it doesn't fully prevent BSD *scanf() from using isspace().)

This patch is a down payment on that: it gets rid of use of sscanf()
to parse ispell dictionary files, which are certainly at great risk
of having a problem.  The code is cleaner this way anyway, though
a bit longer.
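
As a rough illustration of what hand-written field parsing can look like in place of sscanf("%s"), here is a self-contained sketch. It is not the code from this patch: next_field(), is_field_space() and utf8_char_len() are hypothetical names, and the UTF-8 length helper is a stand-in for whatever encoding-aware routine the real code relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte length of the UTF-8 character starting at s (s[0] must not be NUL).
 * Never counts past the string terminator; an invalid lead byte counts as 1.
 */
static size_t
utf8_char_len(const char *s)
{
	unsigned char c = (unsigned char) s[0];
	size_t		want;
	size_t		have;

	if (c < 0x80)
		want = 1;
	else if ((c & 0xE0) == 0xC0)
		want = 2;
	else if ((c & 0xF0) == 0xE0)
		want = 3;
	else if ((c & 0xF8) == 0xF0)
		want = 4;
	else
		want = 1;

	for (have = 1; have < want && s[have] != '\0'; have++)
		;
	return have;
}

/* True only if the character at s is a single-byte ASCII whitespace. */
static bool
is_field_space(const char *s)
{
	return utf8_char_len(s) == 1 &&
		(*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r');
}

/*
 * Copy the next whitespace-delimited field from *str into dst, advancing
 * *str past it.  Input is consumed one whole character at a time, so a
 * byte inside a multibyte character is never mistaken for a separator.
 * An over-long field is truncated at a character boundary.
 * Returns false when no field remains.
 */
static bool
next_field(const char **str, char *dst, size_t dst_size)
{
	const char *p = *str;
	size_t		used = 0;
	bool		full = false;

	assert(dst_size > 0);

	/* Skip leading whitespace. */
	while (*p != '\0' && is_field_space(p))
		p++;

	/* Copy characters until the next separator or end of string. */
	while (*p != '\0' && !is_field_space(p))
	{
		size_t		len = utf8_char_len(p);

		if (!full && used + len < dst_size)
		{
			memcpy(dst + used, p, len);
			used += len;
		}
		else
			full = true;	/* keep scanning, but stop copying */
		p += len;
	}

	dst[used] = '\0';
	*str = p;
	return used > 0;
}
```

A caller would invoke next_field() repeatedly on one line of the dictionary file, taking one field per call, until it returns false.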

In passing, improve a few comments.

Report and patch by Artur Zakirov, reviewed and somewhat tweaked by me.
Back-patch to all supported branches.