Tom Lane [Wed, 24 Oct 2007 03:30:03 +0000 (03:30 +0000)]
Set read_only = TRUE while evaluating input queries for ts_rewrite()
and ts_stat(), per my recent suggestion. Also add a possibly-not-needed-
but-can't-hurt check for NULL SPI_tuptable, before we try to dereference
same.
Tom Lane [Wed, 24 Oct 2007 02:24:49 +0000 (02:24 +0000)]
Remove the aggregate form of ts_rewrite(), since it doesn't work as desired
if there are zero rows to aggregate over, and the API seems both conceptually
and notationally ugly anyway. We should look for something that improves
on the tsquery-and-text-SELECT version (which is also pretty ugly but at
least it works...), but it seems that will take query infrastructure that
doesn't exist today. (Hm, I wonder if there's anything in or near SQL2003
window functions that would help?) Per discussion.
Tom Lane [Tue, 23 Oct 2007 21:38:16 +0000 (21:38 +0000)]
Make configure probe for the location of the <uuid.h> header file.
Needed to accommodate different layout on some platforms (Debian for
one). Heikki Linnakangas
Tom Lane [Tue, 23 Oct 2007 20:46:12 +0000 (20:46 +0000)]
Rename and slightly redefine the default text search parser's "word"
categories, as per discussion. asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case. But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before. This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages. In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words". The hyphenated-word categories are adjusted
similarly.
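
As a rough illustration of the redefined categories, the sketch below classifies a single token. It is a Python model of the rules stated in this message, not the parser's actual state machine. The function name `classify_token` and the "other" bucket are hypothetical, and hyphenated words and the parser's remaining token types are not modelled.

```python
def classify_token(token: str) -> str:
    """Return 'asciiword', 'word', 'numword', or 'other' (hypothetical bucket)."""
    # Anything that is not purely letters and digits is outside the
    # three "word" categories described in the commit message.
    if not token or not all(ch.isalpha() or ch.isdigit() for ch in token):
        return "other"

    has_alpha = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)

    if has_alpha and has_digit:
        return "numword"      # most general case: letters mixed with digits
    if not has_alpha:
        return "other"        # digits only: not a word category, not modelled here
    if all(ord(ch) < 128 for ch in token):
        return "asciiword"    # ASCII letters only
    return "word"             # all letters, at least one of them non-ASCII


if __name__ == "__main__":
    for sample in ["hello", "héllo", "привет", "abc123", "123"]:
        print(sample, classify_token(sample))
```

Under the old definition a token such as "héllo" (mixed ASCII and non-ASCII letters) would not have qualified as all-non-ASCII; in this model it falls into "word" without needing the digit-accepting category.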
Magnus Hagander [Tue, 23 Oct 2007 17:58:01 +0000 (17:58 +0000)]
Use snprintf instead of wsprintf, and use getenv("APPDATA") instead of
SHGetFolderPath.
This removes the direct dependency on shell32.dll and user32.dll, which
eat a lot of "desktop heap" for each backend that's started. The
desktop heap is a very limited resource, causing backends to no
longer start once it's been exhausted.
We still have indirect dependencies on user32.dll through third party
libraries, but those can't easily be removed.
Tom Lane [Tue, 23 Oct 2007 01:44:40 +0000 (01:44 +0000)]
Fix two-argument form of ts_rewrite() so it actually works for cases where
a later rewrite rule should change a subtree modified by an earlier one.
Per my gripe of a few days ago.
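
The intended behaviour can be sketched as applying the rewrite rules one after another, each to the output of the previous one. This is a simplified Python model under stated assumptions: queries are nested tuples such as `("&", "a", "b")`, rules are (target, substitute) pairs applied in list order, and matching is by exact subtree equality. The names `substitute` and `rewrite` are hypothetical, and the real tsquery matching is more involved than this.

```python
def substitute(node, target, replacement):
    """Return a copy of node with every subtree equal to target replaced."""
    if node == target:
        # Do not descend into the replacement itself during this pass.
        return replacement
    if isinstance(node, tuple):
        op, *children = node
        return (op, *(substitute(c, target, replacement) for c in children))
    return node


def rewrite(query, rules):
    """Apply each (target, replacement) rule in order to the current query."""
    for target, replacement in rules:
        # Each rule sees the tree as already modified by earlier rules.
        query = substitute(query, target, replacement)
    return query


if __name__ == "__main__":
    rules = [
        ("a", ("|", "a", "x")),   # earlier rule: a  ->  a | x
        (("|", "a", "x"), "y"),   # later rule rewrites the subtree just produced
    ]
    print(rewrite(("&", "a", "b"), rules))
```

The point of the example is the second rule: it only matches because it is evaluated against the tree produced by the first rule rather than against the original query.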
Tom Lane [Tue, 23 Oct 2007 00:51:23 +0000 (00:51 +0000)]
Fix several bugs in tsvectorin, including crash due to uninitialized field and
miscomputation of required palloc size. The crash could only occur if the
input contained lexemes both with and without positions, which is probably not
common in practice. The miscomputation would definitely result in wasted
space. Also fix some inconsistent coding around alignment of strings and
positions in a tsvector value; these errors could also lead to crashes given
mixed with/without position data and a machine that's picky about alignment.
And be more careful about checking for overflow of string offsets.
Patch is only against HEAD --- I have not looked to see if same bugs are
in back-branch contrib/tsearch2 code.
Tom Lane [Mon, 22 Oct 2007 21:34:33 +0000 (21:34 +0000)]
Clarify example of planner cost computation, per a suggestion from
James Shaw. Also update a couple of examples to reflect 8.3's improved
plan-printing code.
Tom Lane [Mon, 22 Oct 2007 20:13:37 +0000 (20:13 +0000)]
Adjust ts_debug's output as per my proposal of yesterday: show the
active dictionary and its output lexemes as separate columns, instead
of smashing them into one text column, and lowercase the column names.
Also, define the output rowtype using OUT parameters instead of a
composite type, to be consistent with the other built-in functions.
Tom Lane [Mon, 22 Oct 2007 03:37:04 +0000 (03:37 +0000)]
Create a quick-and-dirty list of known migration issues for pre-8.3
users of tsearch. This isn't meant to be permanent documentation,
but to call out the areas that need either fixing or real documentation.
Tom Lane [Mon, 22 Oct 2007 01:02:22 +0000 (01:02 +0000)]
Add a useless return statement to suppress a warning seen with some
versions of gcc (I'm seeing it with Apple's gcc 4.0.1). I think the
reason we did not see this before was that the assert() macros in the
regex code were all no-ops till recently.
Tom Lane [Sun, 21 Oct 2007 22:29:56 +0000 (22:29 +0000)]
Fix shared tsvector/tsquery input code so that we don't say "syntax error in
tsvector" when we are really parsing a tsquery. Report the bogus input,
too. Make styles of some related error messages more consistent.
Tom Lane [Sun, 21 Oct 2007 20:04:37 +0000 (20:04 +0000)]
Editorial overhaul for text search documentation. Organize the info
more clearly, improve a lot of unclear descriptions, add some missing
material. We still need a migration guide though.
Tom Lane [Sat, 20 Oct 2007 04:00:38 +0000 (04:00 +0000)]
Add a note pointing out that you can't log to syslog without tweaking
the syslog configuration file (at least not on most known Unixen).
I dunno why we hadn't had that info in the docs all along ...
Tom Lane [Fri, 19 Oct 2007 22:01:45 +0000 (22:01 +0000)]
Found another small glitch in tsearch API: the two versions of ts_lexize()
are really redundant, since we invented a regdictionary alias type.
We can have just one function, declared as taking regdictionary, and
it will handle both behaviors. Noted while working on documentation.
Tom Lane [Wed, 17 Oct 2007 15:24:04 +0000 (15:24 +0000)]
Add missing entry for PG_WIN1250 encoding, per gripe from Pavel Stehule.
Also enable translation of PG_WIN874, which certainly seems to have an
obvious translation now, though maybe it did not at the time this table's
ancestor was created.
Tom Lane [Wed, 17 Oct 2007 01:01:28 +0000 (01:01 +0000)]
Another round of editorialization on the text search documentation.
Notably, standardize on using "token" for the strings output by a parser,
while "lexeme" is reserved for the normalized strings produced by a
dictionary.
Tom Lane [Tue, 16 Oct 2007 17:05:26 +0000 (17:05 +0000)]
Tweak toast-related logic in heapam.c so that the toaster is only invoked
when relkind = RELKIND_RELATION. This syncs these tests with the Asserts
in tuptoaster.c, and ensures that we won't ever try to, for example,
compress a sequence's tuple. Problem found by Greg Stark while stress-testing
with much-smaller-than-normal page sizes.
Tom Lane [Tue, 16 Oct 2007 16:00:00 +0000 (16:00 +0000)]
Teach pgxs.mk and Install.pm how to install files from a contrib module
into SHAREDIR/tsearch_data. Use this instead of ad-hoc coding in
dict_xsyn/Makefile. Should fix current ContribCheck failures on MSVC.
Tom Lane [Mon, 15 Oct 2007 22:46:27 +0000 (22:46 +0000)]
Fix pg_wchar_table[] to match revised ordering of the encoding ID enum.
Add some comments so hopefully the next poor sod doesn't fall into the
same trap. (Wrong comments are worse than none at all...)
Tom Lane [Mon, 15 Oct 2007 21:39:57 +0000 (21:39 +0000)]
Remove obsolete examples of add-on parsers and dictionary templates;
these are more easily and usefully maintained as contrib modules.
Various other wordsmithing, markup improvement, etc.
Tom Lane [Mon, 15 Oct 2007 21:36:50 +0000 (21:36 +0000)]
Add sample text search dictionary templates and parsers, to replace the
hard-to-maintain textual examples currently in the SGML docs. From
Sergey Karpov.
Tom Lane [Mon, 15 Oct 2007 15:11:29 +0000 (15:11 +0000)]
Include NOLOGIN roles in the 'flat' password file. In the original
coding this was seen as useless, but the problem with not including them
is that the error message will often be something about authentication
failure, rather than the more helpful one about 'role is not permitted
to log in'. Per discussion.
Tom Lane [Sat, 13 Oct 2007 22:33:38 +0000 (22:33 +0000)]
Strengthen type_sanity's check on pg_type.typarray. It failed to
complain about types that didn't have typarray set. Noted while
working on txid patch.
Tom Lane [Sat, 13 Oct 2007 20:46:47 +0000 (20:46 +0000)]
Guard against possible double free during error escape from XML
functions. Patch for the reported issue from Kris Jurka, some
other potential trouble spots plugged by Tom.
Tom Lane [Sat, 13 Oct 2007 20:18:42 +0000 (20:18 +0000)]
Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the
renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2
initdb and psql if they are run with an 8.3beta1 libpq.so. For the moment
we can rearrange the order of enum pg_enc to keep the same number for
everything except PG_JOHAB, which isn't a problem since there are no direct
references to it in the 8.2 programs anyway. (This does force initdb
unfortunately.)
Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using. To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.
It's still the case that we can't renumber backend encoding IDs until the
next bump in libpq's major version number, since doing so will break the
8.2-era client programs. However the code is now prepared to avoid this
type of problem in future.
Note that initdb is no longer a libpq client: we just pull in the two
source files we need directly. The patch also fixes a few places that
were being sloppy about checking for an unrecognized encoding name.
Tom Lane [Sat, 13 Oct 2007 15:55:40 +0000 (15:55 +0000)]
Fix ALTER COLUMN TYPE to preserve the tablespace and reloptions of indexes
it affects. The original coding neglected tablespace entirely (causing
the indexes to move to the database's default tablespace) and for an index
belonging to a UNIQUE or PRIMARY KEY constraint, it would actually try to
assign the parent table's reloptions to the index :-(. Per bug #3672 and
subsequent investigation.
8.0 and 8.1 did not have reloptions, but the tablespace bug is present.
Tom Lane [Sat, 13 Oct 2007 00:58:03 +0000 (00:58 +0000)]
Teach planagg.c that partial indexes specifying WHERE foo IS NOT NULL can be
used to perform MIN(foo) or MAX(foo), since we want to discard null rows in
the indexscan anyway. (This would probably fall out for free if we were
injecting the IS NOT NULL clause somewhere earlier, but given the current
anatomy of the MIN/MAX optimization code we have to do it explicitly.
Fortunately, very little added code is needed.) Per a discussion with
Henk de Wit.
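
Why such a partial index is sufficient can be shown with a small model. The sketch below is Python, not planner code: the "index" is just a sorted list of the non-null values, and the helper names are hypothetical. It relies only on the fact that MIN and MAX ignore null inputs.

```python
def aggregate_min(values):
    """MIN over a column: nulls (None) are ignored; empty input gives None."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


def aggregate_max(values):
    """MAX over a column, with the same null handling as aggregate_min."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def build_partial_index(values):
    """Model of an index on foo WHERE foo IS NOT NULL: sorted non-null entries."""
    return sorted(v for v in values if v is not None)


def min_via_index(index):
    """First entry of the ordered index, or None if the index is empty."""
    return index[0] if index else None


def max_via_index(index):
    """Last entry of the ordered index, or None if the index is empty."""
    return index[-1] if index else None


if __name__ == "__main__":
    column = [7, None, 3, None, 12]
    index = build_partial_index(column)
    print(min_via_index(index), aggregate_min(column))
    print(max_via_index(index), aggregate_max(column))
```

Because the rows excluded by the index predicate are exactly the rows the aggregate would discard, reading one end of the index gives the same answer as scanning the whole column.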
Tom Lane [Fri, 12 Oct 2007 19:39:59 +0000 (19:39 +0000)]
When telling the bgwriter that we need a checkpoint because too much xlog
has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal. This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoints is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed. Per report from Itagaki Takahiro.
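
The recheck described above might look roughly like the following. This is a Python model with integer WAL positions, not the xlog.c logic; `SharedState`, `too_much_xlog`, `maybe_request_checkpoint` and the `limit` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    redo_rec_ptr: int              # latest redo pointer, as set by the last checkpoint
    checkpoint_requests: int = 0   # stands in for signalling the bgwriter


def too_much_xlog(write_pos: int, redo_ptr: int, limit: int) -> bool:
    """True if the xlog consumed since redo_ptr reaches the limit."""
    return write_pos - redo_ptr >= limit


def maybe_request_checkpoint(shared: SharedState, local_redo_ptr: int,
                             write_pos: int, limit: int) -> int:
    """Request a checkpoint only if the refreshed redo pointer still warrants it.

    Returns the (possibly refreshed) local copy of the redo pointer.
    """
    if too_much_xlog(write_pos, local_redo_ptr, limit):
        # The local copy may be very stale; recheck against the shared value
        # before actually sending the signal.
        local_redo_ptr = shared.redo_rec_ptr
        if too_much_xlog(write_pos, local_redo_ptr, limit):
            shared.checkpoint_requests += 1
    return local_redo_ptr


if __name__ == "__main__":
    shared = SharedState(redo_rec_ptr=900)
    # A process that never refreshed its copy still believes the redo pointer is 100.
    refreshed = maybe_request_checkpoint(shared, local_redo_ptr=100,
                                         write_pos=1000, limit=500)
    print(refreshed, shared.checkpoint_requests)
```

In this model a process whose local copy is never updated by other means (the walwriter case mentioned above) would otherwise request a checkpoint on every call once its stale value fell far enough behind.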
Tom Lane [Fri, 12 Oct 2007 18:55:12 +0000 (18:55 +0000)]
Remove hack in pg_tablespace_aclmask() that disallowed permissions
on pg_global even to superusers, and replace it with checks in various
other places to complain about invalid uses of pg_global. This ends
up being a bit more code but it allows a more specific error message
to be given, and it un-breaks pg_tablespace_size() on pg_global.
Per discussion.
Tom Lane [Thu, 11 Oct 2007 21:27:49 +0000 (21:27 +0000)]
Ensure that the result of evaluating a function during constant-expression
simplification gets detoasted before it is incorporated into a Const node.
Otherwise, if an immutable function were to return a TOAST pointer (an
unlikely case, but it can be made to happen), we would end up with a plan
that depends on the continued existence of the out-of-line toast datum.
Tom Lane [Thu, 11 Oct 2007 19:54:17 +0000 (19:54 +0000)]
Code review for txid patch: add binary I/O functions, avoid dependence
on SerializableSnapshot, minor other cleanup. Marko Kreen, some further
editorialization by me.
Tom Lane [Thu, 11 Oct 2007 18:19:58 +0000 (18:19 +0000)]
Remove incorrect use of VARSIZE() on a toasted datum. We can just remove it
instead of fixing it, since once we've set toast_action[i] to 'p' it no longer
matters what toast_sizes[i] is. Greg Stark
Tom Lane [Thu, 11 Oct 2007 18:05:27 +0000 (18:05 +0000)]
Fix the plan-invalidation mechanism to treat regclass constants that refer to
a relation as a reason to invalidate a plan when the relation changes. This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan. Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway. Per bug #3662 and subsequent discussion.
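
The division of labour can be sketched as follows: one traversal collects the relation OIDs a plan depends on, and the cache later invalidates by simple membership test. This is a Python model only; the dict-based plan nodes, the node kinds "scan" and "regclass_const", and the class and function names are hypothetical and do not reflect the actual setrefs.c or plancache.c data structures.

```python
def collect_relation_oids(node, found=None):
    """Walk a plan tree once, gathering the OIDs of relations it depends on."""
    if found is None:
        found = set()
    # Both ordinary scans and regclass constants name a relation.
    if node.get("kind") in ("scan", "regclass_const"):
        found.add(node["oid"])
    for child in node.get("children", ()):
        collect_relation_oids(child, found)
    return found


class CachedPlan:
    def __init__(self, tree):
        self.tree = tree
        # Dependency list built during the traversal, stored with the plan.
        self.relation_oids = frozenset(collect_relation_oids(tree))
        self.valid = True


class PlanCache:
    def __init__(self):
        self.plans = []

    def add(self, tree):
        plan = CachedPlan(tree)
        self.plans.append(plan)
        return plan

    def relation_changed(self, oid):
        """Invalidate every cached plan that listed this relation OID."""
        for plan in self.plans:
            if oid in plan.relation_oids:
                plan.valid = False


if __name__ == "__main__":
    cache = PlanCache()
    # Model of a plan that scans table 100 and calls nextval() on sequence 200.
    with_seq = cache.add({"kind": "result", "children": [
        {"kind": "scan", "oid": 100},
        {"kind": "regclass_const", "oid": 200},
    ]})
    cache.relation_changed(200)   # sequence dropped and recreated
    print(with_seq.valid)
```

The cache itself never inspects plan trees here; it only consults the precomputed OID set, which is the design choice the message describes.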
Bruce Momjian [Mon, 8 Oct 2007 18:01:17 +0000 (18:01 +0000)]
Add:
> o Have ALTER SEQUENCE RENAME rename the sequence name stored
> in the sequence table
>
> http://archives.postgresql.org/pgsql-bugs/2007-09/msg00092.php
> http://archives.postgresql.org/pgsql-bugs/2007-10/msg00007.php
>
Jan Wieck [Sun, 7 Oct 2007 23:32:19 +0000 (23:32 +0000)]
Added the Skytools extended transaction ID module to contrib as discussed
on CORE previously.
This module offers transaction IDs containing the original XID and the
transaction epoch as a bigint value to the user level. It also provides
a special txid_snapshot data type that contains a transaction's entire
visibility snapshot information, which is useful to determine whether a
particular txid was visible to a transaction.
The module has been tested by porting Slony-I from using its original
xxid data type.
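
A minimal sketch of the two ideas in this message, in Python rather than the module's C. Both the bit layout (epoch in the high 32 bits, XID in the low 32 bits) and the snapshot fields (`xmin`, `xmax`, and a set of in-progress IDs `xip`) are assumptions made for illustration, as are all function and class names; the message itself only says that the value combines the XID and the epoch.

```python
from dataclasses import dataclass


def make_txid(epoch: int, xid: int) -> int:
    """Combine an epoch and a 32-bit XID into one 64-bit value (assumed layout)."""
    return (epoch << 32) | (xid & 0xFFFFFFFF)


def split_txid(txid: int):
    """Inverse of make_txid: return (epoch, xid)."""
    return txid >> 32, txid & 0xFFFFFFFF


@dataclass(frozen=True)
class TxidSnapshot:
    xmin: int        # every txid below this had finished when the snapshot was taken
    xmax: int        # every txid at or above this had not yet started
    xip: frozenset   # txids in [xmin, xmax) that were still in progress


def visible_in_snapshot(txid: int, snap: TxidSnapshot) -> bool:
    """True if txid had already finished from the snapshot's point of view."""
    if txid < snap.xmin:
        return True
    if txid >= snap.xmax:
        return False
    return txid not in snap.xip


if __name__ == "__main__":
    txid = make_txid(epoch=1, xid=5)
    print(txid, split_txid(txid))
    snap = TxidSnapshot(xmin=10, xmax=20, xip=frozenset({12, 15}))
    print([t for t in range(8, 22) if visible_in_snapshot(t, snap)])
```

Including the epoch means the combined value keeps increasing across XID wraparound, which is what makes it usable as a plain bigint at the user level.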
Alvaro Herrera [Sun, 7 Oct 2007 01:16:42 +0000 (01:16 +0000)]
A few improvements to analyze and vacuum sections in documentation: add "see
also" entries for autovacuum in analyze and vacuum reference pages, and
enhance usage of cross-references in the maintenance page.
Alvaro Herrera [Sun, 7 Oct 2007 00:32:11 +0000 (00:32 +0000)]
Clean up the doc makefile for draft HTML generation. It no longer works
to do "make DRAFT=Y html"; you need to use "make draft" (which was also
supported previously).
Tom Lane [Sat, 6 Oct 2007 16:18:09 +0000 (16:18 +0000)]
Make dumpcolors() have tolerable performance when using 32-bit chr,
as we do (and upstream Tcl doesn't). The loop limit might be subject
to negotiation if anyone ever tries to do regex debugging in Far
Eastern languages, but for now 1000 seems plenty. CHR_MAX was right out :-(
Tom Lane [Sat, 6 Oct 2007 16:01:51 +0000 (16:01 +0000)]
Adjust regcustom.h so that all those assert() calls in the regex package
are converted to Postgres Assert() macros, instead of using <assert.h>
as formerly. No difference in production builds, but --enable-cassert
debug builds will get better coverage for regex testing.