Fix using too many LWLocks bug, reported by Craig Ringer
<craig@postnewspapers.com.au>.
It was my mistake, I missed limitation of number of held locks, now GIN doesn't
use continiuous locks, but still hold buffers pinned to prevent interference
with vacuum's deletion algorithm.
Tom Lane [Tue, 22 Apr 2008 01:34:34 +0000 (01:34 +0000)]
Issue explicit error messages for attempts to use "shell" operators in
ordinary expressions. This probably doesn't catch every single case
where you might get "cache lookup failed for function 0" for use of a
shell operator, but it will catch most. Per bug #4120 from Pedro Gimeno.
This patch incidentally folds make_op_expr() into its sole remaining
caller --- the alternative was to give it yet more arguments, which
didn't seem an improvement.
Tom Lane [Mon, 21 Apr 2008 20:54:15 +0000 (20:54 +0000)]
Fix convert_IN_to_join to properly handle the case where the subselect's
output is not of the same type that's needed for the IN comparison (ie,
where the parser inserted an implicit coercion above the subselect result).
We should record the coerced expression, not just a raw Var referencing
the subselect output, as the quantity that needs to be unique-ified if
we choose to implement the IN as Unique followed by a plain join.
As of 8.3 this error was causing crashes, as seen in bug #4113 from Javier
Hernandez, because the executor was being told to hash or sort the raw
subselect output column using operators appropriate to the coerced type.
In prior versions there was no crash because the executor chose the
hash or sort operators for itself based on the column type it saw.
However, that's still not really right, because what's unique for one data
type might not be unique for another. In corner cases we could get multiple
outputs of a row that should appear only once, as demonstrated by the
regression test case included in this commit.
However, this patch doesn't apply cleanly to 8.2 or before, and the code
involved has shifted enough over time that I'm hesitant to try to back-patch.
Given the lack of complaints from the field about such corner cases, I think
the bug may not be important enough to risk breaking other things with a
back-patch.
Magnus Hagander [Mon, 21 Apr 2008 09:44:47 +0000 (09:44 +0000)]
Add link to major version release notes at the top of the minor
version ones, to make it clear to users just browsing the notes
that there are a lot more changes available from whatever version
they are at than what's in the minor version release notes.
Tom Lane [Mon, 21 Apr 2008 03:49:45 +0000 (03:49 +0000)]
Fix a couple of places in execMain that erroneously assumed that SELECT FOR
UPDATE/SHARE couldn't occur as a subquery in a query with a non-SELECT
top-level operation. Symptoms included outright failure (as in report from
Mark Mielke) and silently neglecting to take the requested row locks.
Back-patch to 8.3, because the visible failure in the INSERT ... SELECT case
is a regression from 8.2. I'm a bit hesitant to back-patch further given the
lack of field complaints.
Tom Lane [Mon, 21 Apr 2008 02:04:09 +0000 (02:04 +0000)]
Add FLOAT4PASSBYVAL/FLOAT8PASSBYVAL to pg_config.h.win32, as a stopgap
measure to get the Windows buildfarm members working again. I don't
know if it's worth exposing these as configurables, or exactly how to
do it in the MSVC build system ...
Tom Lane [Mon, 21 Apr 2008 01:11:43 +0000 (01:11 +0000)]
Make earthdistance use version-0 calling convention if not USE_FLOAT8_BYVAL,
and version-1 if USE_FLOAT8_BYVAL. This might seem a bit pointless, but the
idea is to have at least one regression test that will fail if we ever
accidentally break version-0 functions that return float8. However, they're
already broken, or at least hopelessly unportable, in the USE_FLOAT8_BYVAL
case.
Tom Lane [Mon, 21 Apr 2008 00:26:47 +0000 (00:26 +0000)]
Allow float8, int8, and related datatypes to be passed by value on machines
where Datum is 8 bytes wide. Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior. Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.
Zoltan Boszormenyi, plus configurability stuff by me.
Fix broken compare function for tsquery_ops. Per Tom's report.
I never understood why initial authors GiST in pgsql choose so
stgrange signature for 'same' method:
bool *sameFn(Datum a, Datum b, bool* result)
instead of simple, logical
bool sameFn(Datum a, Datum b)
This change will break any existing GiST extension, so we still live with
it and will live.
Modify the float4 datatype to be pass-by-val. Along the way, remove the last
uses of the long-deprecated float32 in contrib/seg; the definitions themselves
are still there, but no longer used. fmgr/README updated to match.
I added a CREATE FUNCTION to account for existing seg_center() code in seg.c
too, and some tests for it and the neighbor functions. At the same time,
remove checks for NULL which are not needed (because the functions are declared
STRICT).
I had to do some adjustments to contrib's btree_gist too. The choices for
representation there are not ideal for changing the underlying types :-(
Original patch by Zoltan Boszormenyi, with some adjustments by me.
Tom Lane [Fri, 18 Apr 2008 17:05:45 +0000 (17:05 +0000)]
Fix rmtree() so that it keeps going after failure to remove any individual
file; the idea is that we should clean up as much as we can, even if there's
some problem removing one file. Make the error messages a bit less misleading,
too. In passing, const-ify function arguments.
Fix two race conditions between the pending unlink mechanism that was put in
place to prevent reusing relation OIDs before next checkpoint, and DROP
DATABASE. First, if a database was dropped, bgwriter would still try to unlink
the files that the rmtree() call by the DROP DATABASE command has already
deleted, or is just about to delete. Second, if a database is dropped, and
another database is created with the same OID, bgwriter would in the worst
case delete a relation in the new database that happened to get the same OID
as a dropped relation in the old database.
To fix these race conditions:
- make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
- make ForgetDatabaseFsyncRequests forget unlink requests as well.
- force checkpoint on in dropdb on all platforms
Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't
enough on its own to fix the problem of dropping and creating a database with
same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.
Per Tom Lane's bug report and proposal. Backpatch to 8.3.
Tom Lane [Fri, 18 Apr 2008 01:42:17 +0000 (01:42 +0000)]
Cause EXPLAIN's VERBOSE option to print the target list (output column list)
of each plan node, instead of its former behavior of dumping the internal
representation of the plan tree. The latter display is still available for
those who really want it (see debug_print_plan), but uses for it are certainly
few and and far between. Per discussion.
This patch also removes the explain_pretty_print GUC, which is obsoleted
by the change.
Tom Lane [Thu, 17 Apr 2008 21:22:14 +0000 (21:22 +0000)]
Fix a couple of oversights associated with the "physical tlist" optimization:
we had several code paths where a physical tlist could be used for the input
to a Sort node, which is a dumb idea because any unneeded table columns will
increase the volume of data the sort has to push around.
(Unfortunately the easy-looking fix of calling disuse_physical_tlist during
make_sort_xxx doesn't work because in most cases we're already committed to
the current input tlist --- it's been marked with sort column numbers, or
we've built grouping column numbers using it, etc. The tlist has to be
selected properly at the calling level before we start constructing sort-col
information. This is easy enough to do, we were just failing to take the
point into consideration.)
Back-patch to 8.3. I believe the problem probably exists clear back to 7.4
when the physical tlist optimization was added, but I'm afraid to back-patch
further than 8.3 without a great deal more study than I want to put into it.
The code in this area has drifted a lot over time. The real-world importance
of these code paths is uncertain anyway --- I think in many cases we'd
probably prefer hash-based methods.
Tom Lane [Thu, 17 Apr 2008 18:30:18 +0000 (18:30 +0000)]
Add some code to EXPLAIN to show the targetlist (ie, output columns)
of each plan node. For the moment this is debug support only and is
not enabled unless EXPLAIN_PRINT_TLISTS is defined at build time.
Later I'll see about the idea of letting EXPLAIN VERBOSE do it.
Tom Lane [Wed, 16 Apr 2008 23:59:40 +0000 (23:59 +0000)]
Repair two places where SIGTERM exit could leave shared memory state
corrupted. (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.) To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook. Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.
Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
Tom Lane [Wed, 16 Apr 2008 18:23:04 +0000 (18:23 +0000)]
Fix LOAD_CRIT_INDEX() macro to take out AccessShareLock on the system index
it is trying to build a relcache entry for. This is an oversight in my 8.2
patch that tried to ensure we always took a lock on a relation before trying
to build its relcache entry. The implication is that if someone committed a
reindex of a critical system index at about the same time that some other
backend were starting up without a valid pg_internal.init file, the second one
might PANIC due to not seeing any valid version of the index's pg_class row.
Improbable case, but definitely not impossible.
Tom Lane [Mon, 14 Apr 2008 17:05:34 +0000 (17:05 +0000)]
Push index operator lossiness determination down to GIST/GIN opclass
"consistent" functions, and remove pg_amop.opreqcheck, as per recent
discussion. The main immediate benefit of this is that we no longer need
8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery
searches on GIN indexes. In future it should be possible to optimize some
other queries better than is done now, by detecting at runtime whether the
index match is exact or not.
Tom Lane, after an idea of Heikki's, and with some help from Teodor.
Tom Lane [Sun, 13 Apr 2008 20:51:21 +0000 (20:51 +0000)]
Since createplan.c no longer cares whether index operators are lossy, it has
no particular need to do get_op_opfamily_properties() while building an
indexscan plan. Postpone that lookup until executor start. This simplifies
createplan.c a lot more than it complicates nodeIndexscan.c, and makes things
more uniform since we already had to do it that way for RowCompare
expressions. Should be a bit faster too, at least for plans that aren't
re-used many times, since we avoid palloc'ing and perhaps copying the
intermediate list data structure.
Tom Lane [Sun, 13 Apr 2008 19:18:14 +0000 (19:18 +0000)]
Phase 2 of project to make index operator lossiness be determined at runtime
instead of plan time. Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now). For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions. The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
Tom Lane [Sun, 13 Apr 2008 03:49:22 +0000 (03:49 +0000)]
Turn the -i/--ignore-version options of pg_dump and pg_dumpall into no-ops:
the server version check is now always enforced. Relax the version check to
allow a server that is of pg_dump's own major version but a later minor
version; this is the only case that -i was at all safe to use in.
pg_restore already enforced only a very weak version check, so this is
really just a documentation change for it.
Tom Lane [Sat, 12 Apr 2008 23:21:04 +0000 (23:21 +0000)]
Clean up a few places where Datums were being treated as pointers without
going through DatumGetPointer or some other "official" conversion macro.
Not actually a bug, since Datum the same size as pointer is the only
supported case at the moment, but good cleanup for the future.
Tom Lane [Sat, 12 Apr 2008 23:14:21 +0000 (23:14 +0000)]
Create new routines systable_beginscan_ordered, systable_getnext_ordered,
systable_endscan_ordered that have API similar to systable_beginscan etc
(in particular, the passed-in scankeys have heap not index attnums),
but guarantee ordered output, unlike the existing functions. For the moment
these are just very thin wrappers around index_beginscan/index_getnext/etc.
Someday they might need to get smarter; but for now this is just a code
refactoring exercise to reduce the number of direct callers of index_getnext,
in preparation for changing that function's API.
In passing, remove index_getnext_indexitem, which has been dead code for
quite some time, and will have even less use than that in the presence
of run-time-lossy indexes.
Tom Lane [Fri, 11 Apr 2008 23:53:00 +0000 (23:53 +0000)]
A quick try at un-breaking the Cygwin build. Whether it needs the
pgwin32_safestat remains to be determined, but in any case the current
code is not tolerable.
Tom Lane [Fri, 11 Apr 2008 22:54:23 +0000 (22:54 +0000)]
Add some debug support code to try to catch future mistakes in the area of
input functions that include garbage bytes in their results. Provide a
compile-time option RANDOMIZE_ALLOCATED_MEMORY to make palloc fill returned
blocks with variable contents. This option also makes the parser perform
conversions of literal constants twice and compare the results, emitting a
WARNING if they don't match. (This is the code I used to catch the input
function bugs fixed in the previous commit.) For the moment, I've set it
to be activated automatically by --enable-cassert.
Tom Lane [Fri, 11 Apr 2008 22:52:05 +0000 (22:52 +0000)]
Fix several datatype input functions that were allowing unused bytes in their
results to contain uninitialized, unpredictable values. While this was okay
as far as the datatypes themselves were concerned, it's a problem for the
parser because occurrences of the "same" literal might not be recognized as
equal by datumIsEqual (and hence not by equal()). It seems sufficient to fix
this in the input functions since the only critical use of equal() is in the
parser's comparisons of ORDER BY and DISTINCT expressions.
Per a trouble report from Marc Cousin.
Patch all the way back. Interestingly, array_in did not have the bug before
8.2, which may explain why the issue went unnoticed for so long.
Tom Lane [Thu, 10 Apr 2008 22:25:26 +0000 (22:25 +0000)]
Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs. This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.
In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited. This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries. There might be other uses in future, but that one
alone is sufficient reason.
Magnus Hagander [Thu, 10 Apr 2008 16:58:51 +0000 (16:58 +0000)]
Create wrapper pgwin32_safestat() and redefine stat() to it
on win32, because the stat() function in the runtime cannot
be trusted to always update the st_size field.
Tom Lane [Tue, 8 Apr 2008 18:20:29 +0000 (18:20 +0000)]
Fix tsvector_update_trigger() to be domain-friendly: it needs to allow all
the columns it works with to be domains over the expected type, not just
exactly the expected type. In passing, fix ts_stat() the same way.
Per report from Markus Wollny.
Implement a few changes to how shared libraries and dynamically loadable
modules are built. Foremost, it creates a solid distinction between these two
types of targets based on what had already been implemented and duplicated in
ad hoc ways before. Specifically,
- Dynamically loadable modules no longer get a soname. The numbers previously
set in the makefiles were dummy numbers anyway, and the presence of a soname
upset a few packaging tools, so it is nicer not to have one.
- The cumbersome detour taken on installation (build a libfoo.so.0.0.0 and
then override the rule to install foo.so instead) is removed.
Tom Lane [Sun, 6 Apr 2008 16:54:49 +0000 (16:54 +0000)]
Improve hash_any() to use word-wide fetches when hashing suitably aligned
data. This makes for a significant speedup at the cost that the results
now vary between little-endian and big-endian machines; which forces us
to add explicit ORDER BYs in a couple of regression tests to preserve
machine-independent comparison results. Also, force initdb by bumping
catversion, since the contents of hash indexes will change (at least on
big-endian machines).
Kenneth Marshall and Tom Lane, based on work from Bob Jenkins. This commit
does not adopt Bob's new faster mix() algorithm, however, since we still need
to convince ourselves that that doesn't degrade the quality of the hashing.
Tom Lane [Sat, 5 Apr 2008 01:58:20 +0000 (01:58 +0000)]
Defend against JOINs having more than 32K columns altogether. We cannot
currently support this because we must be able to build Vars referencing
join columns, and varattno is only 16 bits wide. Perhaps this should be
improved in future, but considering that it never came up before, I'm not
sure the problem is worth much effort. Per bug #4070 from Marcello
Ceschia.
The problem seems largely academic in 8.0 and 7.4, because they have
(different) O(N^2) performance issues with such wide joins, but
back-patch all the way anyway.
Bruce Momjian [Sat, 5 Apr 2008 01:34:06 +0000 (01:34 +0000)]
Have pg_stop_backup() wait for all archive files to be sent, rather than
returing right away. This guarantees that when pg_stop_backup()
returns, you have a valid backup.
Tom Lane [Fri, 4 Apr 2008 18:45:36 +0000 (18:45 +0000)]
Re-implement division for numeric values using the traditional "schoolbook"
algorithm. This is a good deal slower than our old roundoff-error-prone
code for long inputs, so we keep the old code for use in the transcendental
functions, where everything is approximate anyway. Also create a
user-accessible function div(numeric, numeric) to provide access to the
exact result of trunc(x/y) --- since the regular numeric / operator will
round off its result, simply computing that expression in SQL doesn't
reliably give the desired answer. This fixes bug #3387 and various related
corner cases, and improves the usefulness of PG for high-precision integer
arithmetic.