Tom Lane [Sat, 27 Oct 2007 17:53:15 +0000 (17:53 +0000)]
Add some rudimentary tracing code to the default text search parser, to help
in debugging its state-machine rules. Const-ify all the constant tables.
Minor other code cleanup, including using "token" rather than "lexeme" to
describe the output strings.
Tom Lane [Sat, 27 Oct 2007 16:01:09 +0000 (16:01 +0000)]
Rename default text search parser's "uri" token type to "url_path",
per recommendation from Alvaro. This doesn't force initdb since the
numeric token type in the catalogs doesn't change; but note that
the expected regression test output changed.
Tom Lane [Sat, 27 Oct 2007 05:45:43 +0000 (05:45 +0000)]
Avoid considering both sort directions as equally useful for merging.
This doubles the planning workload for mergejoins while not actually
accomplishing much. The only useful case is where one of the directions
matches the query's ORDER BY request; therefore, put a thumb on the scales
in that direction, and otherwise arbitrarily consider only the ASC direction.
(This is a lot easier now than it would've been before 8.3, since we have
more semantic knowledge embedded in PathKeys now.)
Magnus Hagander [Fri, 26 Oct 2007 21:50:10 +0000 (21:50 +0000)]
Change win32 child-death tracking code to use a threadpool to wait for
childprocess deaths instead of using one thread per child. This drastastically
reduces the address space usage and should allow for more backends running.
Also change the win32_waitpid functionality to use an IO Completion Port for
queueing child death notices instead of using a fixed-size array.
Alvaro Herrera [Fri, 26 Oct 2007 20:45:10 +0000 (20:45 +0000)]
Allow an autovacuum worker to be interrupted automatically when it is found
to be locking another process (except when it's working to prevent Xid
wraparound problems).
Tom Lane [Fri, 26 Oct 2007 18:10:50 +0000 (18:10 +0000)]
Change have_join_order_restriction() so that we do not force a clauseless join
if either of the input relations can legally be joined to any other rels using
join clauses. This avoids uselessly (and expensively) considering a lot of
really stupid join paths when there is a join restriction with a large
footprint, that is, lots of relations inside its LHS or RHS. My patch of
15-Feb-2007 had been causing the code to consider joining *every* combination
of rels inside such a group, which is exponentially bad :-(. With this
behavior, clauseless bushy joins will be done if necessary, but they'll be
put off as long as possible. Per report from Jakub Ouhrabka.
Backpatch to 8.2. We might someday want to backpatch to 8.1 as well, but 8.1
does not have the problem for OUTER JOIN nests, only for IN-clauses, so it's
not clear anyone's very likely to hit it in practice; and the current patch
doesn't apply cleanly to 8.1.
Tom Lane [Thu, 25 Oct 2007 20:22:53 +0000 (20:22 +0000)]
Make initdb's selection of default text search configuration depend
only on the 'language' part of the locale name, ignoring the country code.
We may need to be smarter later when there are more built-in configurations,
but for now this is good enough and avoids having to bloat the table.
Alvaro Herrera [Thu, 25 Oct 2007 19:13:37 +0000 (19:13 +0000)]
Fix memory management for new variables -- they must actually survive
transaction end, in case we decide to do a vacuum analyze (which is done in two
xacts).
Tom Lane [Thu, 25 Oct 2007 18:54:03 +0000 (18:54 +0000)]
Fix ALTER SEQUENCE so that it does not affect the value of currval() for
the sequence. Also, make setval() with is_called = false not affect the
currval state, either. Per report from Kris Jurka that an implicit
ALTER SEQUENCE OWNED BY unexpectedly caused currval() to become valid.
Since this isn't 100% backwards compatible, it will go into HEAD only;
I'll put a more limited patch into 8.2.
Tom Lane [Thu, 25 Oct 2007 13:48:57 +0000 (13:48 +0000)]
Tweak new error messages to match the actual syntax of DECLARE CURSOR.
(Last night I copied-and-pasted from the WITH HOLD case, but that's
wrong because of the bizarrely irregular syntax specified by the standard.)
Tom Lane [Wed, 24 Oct 2007 23:27:08 +0000 (23:27 +0000)]
Disallow scrolling of FOR UPDATE/FOR SHARE cursors, so as to avoid problems
in corner cases such as re-fetching a just-deleted row. We may be able to
relax this someday, but let's find out how many people really care before
we invest a lot of work in it. Per report from Heikki and subsequent
discussion.
While in the neighborhood, make the combination of INSENSITIVE and FOR UPDATE
throw an error, since they are semantically incompatible. (Up to now we've
accepted but just ignored the INSENSITIVE option of DECLARE CURSOR.)
Alvaro Herrera [Wed, 24 Oct 2007 20:55:36 +0000 (20:55 +0000)]
Rearrange vacuum-related bits in PGPROC as a bitmask, to better support
having several of them. Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).
Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.
Tom Lane [Wed, 24 Oct 2007 20:54:27 +0000 (20:54 +0000)]
Fix an error in make_outerjoininfo introduced by my patch of 30-Aug: the code
neglected to test whether an outer join's join-condition actually refers to
the lower outer join it is looking at. (The comment correctly described what
was supposed to happen, but the code didn't do it...) This often resulted in
adding an unnecessary constraint on the join order of the two outer joins,
which was bad enough. However, it also seems to expose a performance
problem in an older patch (from 15-Feb): once we've decided that there is a
join ordering constraint, we will start trying clauseless joins between every
combination of rels within the constraint, which pointlessly eats up lots of
time and space if there are numerous rels below the outer join. That probably
needs to be revisited :-(. Per gripe from Jakub Ouhrabka.
Alvaro Herrera [Wed, 24 Oct 2007 19:08:25 +0000 (19:08 +0000)]
Minor changes to autovacuum worker: change error handling so that it continues
with the next table on schedule instead of exiting, in all cases instead of
just on query cancel.
Add a errcontext() line indicating the activity of the worker to the error
message when it is cancelled.
Change the WorkerInfo struct to contain a pointer to the worker's PGPROC
instead of just the PID.
Add forgotten post-auth delays, per Simon Riggs. Also to autovac launcher.
Tom Lane [Wed, 24 Oct 2007 18:37:09 +0000 (18:37 +0000)]
Fix UPDATE/DELETE WHERE CURRENT OF to support repeated update and update-
then-delete on the current cursor row. The basic fix is that nodeTidscan.c
has to apply heap_get_latest_tid() to the current-scan-TID obtained from the
cursor query; this ensures we get the latest row version to work with.
However, since that only works if the query plan is a TID scan, we also have
to hack the planner to make sure only that type of plan will be selected.
(Formerly, the planner might decide to apply a seqscan if the table is very
small. This change is probably a Good Thing anyway, since it's hard to see
how a seqscan could really win.) That means the execQual.c code to support
CurrentOfExpr as a regular expression type is dead code, so replace it with
just an elog(). Also, add regression tests covering these cases. Note
that the added tests expose the fact that re-fetching an updated row
misbehaves if the cursor used FOR UPDATE. That's an independent bug that
should be fixed later. Per report from Dharmendra Goyal.
Tom Lane [Wed, 24 Oct 2007 03:30:03 +0000 (03:30 +0000)]
Set read_only = TRUE while evaluating input queries for ts_rewrite()
and ts_stat(), per my recent suggestion. Also add a possibly-not-needed-
but-can't-hurt check for NULL SPI_tuptable, before we try to dereference
same.
Tom Lane [Wed, 24 Oct 2007 02:24:49 +0000 (02:24 +0000)]
Remove the aggregate form of ts_rewrite(), since it doesn't work as desired
if there are zero rows to aggregate over, and the API seems both conceptually
and notationally ugly anyway. We should look for something that improves
on the tsquery-and-text-SELECT version (which is also pretty ugly but at
least it works...), but it seems that will take query infrastructure that
doesn't exist today. (Hm, I wonder if there's anything in or near SQL2003
window functions that would help?) Per discussion.
Tom Lane [Tue, 23 Oct 2007 21:38:16 +0000 (21:38 +0000)]
Make configure probe for the location of the <uuid.h> header file.
Needed to accommodate different layout on some platforms (Debian for
one). Heikki Linnakangas
Tom Lane [Tue, 23 Oct 2007 20:46:12 +0000 (20:46 +0000)]
Rename and slightly redefine the default text search parser's "word"
categories, as per discussion. asciiword (formerly lword) is still
ASCII-letters-only, and numword (formerly word) is still the most general
mixed-alpha-and-digits case. But word (formerly nlword) is now
any-group-of-letters-with-at-least-one-non-ASCII, rather than all-non-ASCII as
before. This is no worse than before for parsing mixed Russian/English text,
which seems to have been the design center for the original coding; and it
should simplify matters for parsing most European languages. In particular
it will not be necessary for any language to accept strings containing digits
as being regular "words". The hyphenated-word categories are adjusted
similarly.
Magnus Hagander [Tue, 23 Oct 2007 17:58:01 +0000 (17:58 +0000)]
Use snprintf instead of wsprintf, and use getenv("APPDATA") instead of
SHGetFolderPath.
This removes the direct dependency on shell32.dll and user32.dll, which
eats a lot of "desktop heap" for each backend that's started. The
desktop heap is a very limited resource, causing backends to no
longer start once it's been exhausted.
We still have indirect depdendencies on user32.dll through third party
libraries, but those can't easily be removed.
Tom Lane [Tue, 23 Oct 2007 01:44:40 +0000 (01:44 +0000)]
Fix two-argument form of ts_rewrite() so it actually works for cases where
a later rewrite rule should change a subtree modified by an earlier one.
Per my gripe of a few days ago.
Tom Lane [Tue, 23 Oct 2007 00:51:23 +0000 (00:51 +0000)]
Fix several bugs in tsvectorin, including crash due to uninitialized field and
miscomputation of required palloc size. The crash could only occur if the
input contained lexemes both with and without positions, which is probably not
common in practice. The miscomputation would definitely result in wasted
space. Also fix some inconsistent coding around alignment of strings and
positions in a tsvector value; these errors could also lead to crashes given
mixed with/without position data and a machine that's picky about alignment.
And be more careful about checking for overflow of string offsets.
Patch is only against HEAD --- I have not looked to see if same bugs are
in back-branch contrib/tsearch2 code.
Tom Lane [Mon, 22 Oct 2007 21:34:33 +0000 (21:34 +0000)]
Clarify example of planner cost computation, per a suggestion from
James Shaw. Also update a couple of examples to reflect 8.3's improved
plan-printing code.
Tom Lane [Mon, 22 Oct 2007 20:13:37 +0000 (20:13 +0000)]
Adjust ts_debug's output as per my proposal of yesterday: show the
active dictionary and its output lexemes as separate columns, instead
of smashing them into one text column, and lowercase the column names.
Also, define the output rowtype using OUT parameters instead of a
composite type, to be consistent with the other built-in functions.
Tom Lane [Mon, 22 Oct 2007 03:37:04 +0000 (03:37 +0000)]
Create a quick-and-dirty list of known migration issues for pre-8.3
users of tsearch. This isn't meant to be permanent documentation,
but to call out the areas that need either fixing or real documentation.
Tom Lane [Mon, 22 Oct 2007 01:02:22 +0000 (01:02 +0000)]
Add a useless return statement to suppress a warning seen with some
versions of gcc (I'm seeing it with Apple's gcc 4.0.1). I think the
reason we did not see this before was that the assert() macros in the
regex code were all no-ops till recently.
Tom Lane [Sun, 21 Oct 2007 22:29:56 +0000 (22:29 +0000)]
Fix shared tsvector/tsquery input code so that we don't say "syntax error in
tsvector" when we are really parsing a tsquery. Report the bogus input,
too. Make styles of some related error messages more consistent.
Tom Lane [Sun, 21 Oct 2007 20:04:37 +0000 (20:04 +0000)]
Editorial overhaul for text search documentation. Organize the info
more clearly, improve a lot of unclear descriptions, add some missing
material. We still need a migration guide though.
Tom Lane [Sat, 20 Oct 2007 04:00:38 +0000 (04:00 +0000)]
Add a note pointing out that you can't log to syslog without tweaking
the syslog configuration file (at least not on most known Unixen).
I dunno why we hadn't had that info in the docs all along ...
Tom Lane [Fri, 19 Oct 2007 22:01:45 +0000 (22:01 +0000)]
Found another small glitch in tsearch API: the two versions of ts_lexize()
are really redundant, since we invented a regdictionary alias type.
We can have just one function, declared as taking regdictionary, and
it will handle both behaviors. Noted while working on documentation.
Tom Lane [Wed, 17 Oct 2007 15:24:04 +0000 (15:24 +0000)]
Add missing entry for PG_WIN1250 encoding, per gripe from Pavel Stehule.
Also enable translation of PG_WIN874, which certainly seems to have an
obvious translation now, though maybe it did not at the time this table's
ancestor was created.
Tom Lane [Wed, 17 Oct 2007 01:01:28 +0000 (01:01 +0000)]
Another round of editorialization on the text search documentation.
Notably, standardize on using "token" for the strings output by a parser,
while "lexeme" is reserved for the normalized strings produced by a
dictionary.
Tom Lane [Tue, 16 Oct 2007 17:05:26 +0000 (17:05 +0000)]
Tweak toast-related logic in heapam.c so that the toaster is only invoked
when relkind = RELKIND_RELATION. This syncs these tests with the Asserts
in tuptoaster.c, and ensures that we won't ever try to, for example,
compress a sequence's tuple. Problem found by Greg Stark while stress-testing
with much-smaller-than-normal page sizes.
Tom Lane [Tue, 16 Oct 2007 16:00:00 +0000 (16:00 +0000)]
Teach pgxs.mk and Install.pm how to install files from a contrib module
into SHAREDIR/tsearch_data. Use this instead of ad-hoc coding in
dict_xsyn/Makefile. Should fix current ContribCheck failures on MSVC.
Tom Lane [Mon, 15 Oct 2007 22:46:27 +0000 (22:46 +0000)]
Fix pg_wchar_table[] to match revised ordering of the encoding ID enum.
Add some comments so hopefully the next poor sod doesn't fall into the
same trap. (Wrong comments are worse than none at all...)
Tom Lane [Mon, 15 Oct 2007 21:39:57 +0000 (21:39 +0000)]
Remove obsolete examples of add-on parsers and dictionary templates;
these are more easily and usefully maintained as contrib modules.
Various other wordsmithing, markup improvement, etc.
Tom Lane [Mon, 15 Oct 2007 21:36:50 +0000 (21:36 +0000)]
Add sample text search dictionary templates and parsers, to replace the
hard-to-maintain textual examples currently in the SGML docs. From
Sergey Karpov.
Tom Lane [Mon, 15 Oct 2007 15:11:29 +0000 (15:11 +0000)]
Include NOLOGIN roles in the 'flat' password file. In the original
coding this was seen as useless, but the problem with not including them
is that the error message will often be something about authentication
failure, rather than the more helpful one about 'role is not permitted
to log in'. Per discussion.
Tom Lane [Sat, 13 Oct 2007 22:33:38 +0000 (22:33 +0000)]
Strengthen type_sanity's check on pg_type.typarray. It failed to
complain about types that didn't have typarray set. Noted while
working on txid patch.
Tom Lane [Sat, 13 Oct 2007 20:46:47 +0000 (20:46 +0000)]
Guard against possible double free during error escape from XML
functions. Patch for the reported issue from Kris Jurka, some
other potential trouble spots plugged by Tom.
Tom Lane [Sat, 13 Oct 2007 20:18:42 +0000 (20:18 +0000)]
Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: the
renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2
initdb and psql if they are run with an 8.3beta1 libpq.so. For the moment
we can rearrange the order of enum pg_enc to keep the same number for
everything except PG_JOHAB, which isn't a problem since there are no direct
references to it in the 8.2 programs anyway. (This does force initdb
unfortunately.)
Going forward, we want to fix things so that encoding IDs can be changed
without an ABI break, and this commit includes the changes needed to allow
libpq's encoding IDs to be treated as fully independent of the backend's.
The main issue is that libpq clients should not include pg_wchar.h or
otherwise assume they know the specific values of libpq's encoding IDs,
since they might encounter version skew between pg_wchar.h and the libpq.so
they are using. To fix, have libpq officially export functions needed for
encoding name<=>ID conversion and validity checking; it was doing this
anyway unofficially.
It's still the case that we can't renumber backend encoding IDs until the
next bump in libpq's major version number, since doing so will break the
8.2-era client programs. However the code is now prepared to avoid this
type of problem in future.
Note that initdb is no longer a libpq client: we just pull in the two
source files we need directly. The patch also fixes a few places that
were being sloppy about checking for an unrecognized encoding name.
Tom Lane [Sat, 13 Oct 2007 15:55:40 +0000 (15:55 +0000)]
Fix ALTER COLUMN TYPE to preserve the tablespace and reloptions of indexes
it affects. The original coding neglected tablespace entirely (causing
the indexes to move to the database's default tablespace) and for an index
belonging to a UNIQUE or PRIMARY KEY constraint, it would actually try to
assign the parent table's reloptions to the index :-(. Per bug #3672 and
subsequent investigation.
8.0 and 8.1 did not have reloptions, but the tablespace bug is present.
Tom Lane [Sat, 13 Oct 2007 00:58:03 +0000 (00:58 +0000)]
Teach planagg.c that partial indexes specifying WHERE foo IS NOT NULL can be
used to perform MIN(foo) or MAX(foo), since we want to discard null rows in
the indexscan anyway. (This would probably fall out for free if we were
injecting the IS NOT NULL clause somewhere earlier, but given the current
anatomy of the MIN/MAX optimization code we have to do it explicitly.
Fortunately, very little added code is needed.) Per a discussion with
Henk de Wit.
Tom Lane [Fri, 12 Oct 2007 19:39:59 +0000 (19:39 +0000)]
When telling the bgwriter that we need a checkpoint because too much xlog
has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal. This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoint is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed. Per report from Itagaki Takahiro.
Tom Lane [Fri, 12 Oct 2007 18:55:12 +0000 (18:55 +0000)]
Remove hack in pg_tablespace_aclmask() that disallowed permissions
on pg_global even to superusers, and replace it with checks in various
other places to complain about invalid uses of pg_global. This ends
up being a bit more code but it allows a more specific error message
to be given, and it un-breaks pg_tablespace_size() on pg_global.
Per discussion.
Tom Lane [Thu, 11 Oct 2007 21:27:49 +0000 (21:27 +0000)]
Ensure that the result of evaluating a function during constant-expression
simplification gets detoasted before it is incorporated into a Const node.
Otherwise, if an immutable function were to return a TOAST pointer (an
unlikely case, but it can be made to happen), we would end up with a plan
that depends on the continued existence of the out-of-line toast datum.
Tom Lane [Thu, 11 Oct 2007 19:54:17 +0000 (19:54 +0000)]
Code review for txid patch: add binary I/O functions, avoid dependence
on SerializableSnapshot, minor other cleanup. Marko Kreen, some further
editorialization by me.