Tom Lane [Mon, 10 Mar 2008 02:04:10 +0000 (02:04 +0000)]
Reduce memory consumption during VACUUM of large relations, by using
FSMPageData (6 bytes) instead of PageFreeSpaceInfo (8 or 16 bytes)
for the temporary array of page-free-space information.
Tom Lane [Mon, 10 Mar 2008 01:23:04 +0000 (01:23 +0000)]
Fix pgbench's getrand() function so that min and max have approximately
the same chance of being selected as do numbers between them. Problem
noted by Greg Stark; fix by Alexey Klyukin.
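A minimal sketch of the kind of endpoint bias described above. The commit message does not quote pgbench's old or new formula, so the two mappings below (`biased_pick`, `uniform_pick`) and the raw range `RAW_RANGE` are assumptions chosen for illustration, not the actual getrand() code. Rounding a scaled random value to the nearest integer gives each endpoint only half a bucket, while scaling by the number of outcomes and truncating gives every value a full bucket.

```python
from collections import Counter

RAW_RANGE = 6000  # hypothetical count of distinct raw random values (0 .. RAW_RANGE - 1)


def biased_pick(raw: int, lo: int, hi: int) -> int:
    """Map raw into [lo, hi] by rounding to nearest; lo and hi get half a bucket each."""
    # Integer form of floor((hi - lo) * raw / RAW_RANGE + 0.5), exact for all inputs.
    return lo + (2 * (hi - lo) * raw + RAW_RANGE) // (2 * RAW_RANGE)


def uniform_pick(raw: int, lo: int, hi: int) -> int:
    """Map raw into [lo, hi] using hi - lo + 1 equal buckets and truncation."""
    return lo + ((hi - lo + 1) * raw) // RAW_RANGE


def histogram(pick, lo: int, hi: int) -> Counter:
    """Count each result when every raw value is used exactly once."""
    return Counter(pick(raw, lo, hi) for raw in range(RAW_RANGE))
```

Calling `histogram(biased_pick, 1, 4)` and `histogram(uniform_pick, 1, 4)` side by side shows the difference: the rounded mapping under-represents 1 and 4, the truncating one does not.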
Tom Lane [Sun, 9 Mar 2008 04:56:28 +0000 (04:56 +0000)]
Remove postmaster.c's check that NBuffers is at least twice MaxBackends.
With the addition of multiple autovacuum workers, our choices were to delete
the check, document the interaction with autovacuum_max_workers, or complicate
the check to try to hide that interaction. Since this restriction has never
been adequate to ensure backends can't run out of pinnable buffers, it doesn't
really have enough excuse to live to justify the second or third choices.
Per discussion of a complaint from Andreas Kling (see also bug #3888).
This commit also removes several documentation references to this restriction,
but I'm not sure I got them all.
Tom Lane [Sun, 9 Mar 2008 00:32:09 +0000 (00:32 +0000)]
Change patternsel() so that instead of switching from a pure
pattern-examination heuristic method to purely histogram-driven selectivity at
histogram size 100, we compute both estimates and use a weighted average.
The weight put on the heuristic estimate decreases linearly with histogram
size, dropping to zero for 100 or more histogram entries.
Likewise in ltreeparentsel(). After a patch by Greg Stark, though I
reorganized the logic a bit to give the caller of histogram_selectivity()
more control.
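The weighting rule described above can be sketched as follows. This is an illustration under stated assumptions, not the patternsel() code: the function and constant names are hypothetical, and any extra conditions the real code applies (for example a minimum usable histogram size) are not stated in the message and are left out.

```python
FULL_TRUST_HISTOGRAM_SIZE = 100  # threshold taken from the commit message


def blended_selectivity(heuristic_sel: float,
                        histogram_sel: float,
                        histogram_size: int) -> float:
    """Weighted average of the two estimates.

    The weight on the heuristic estimate falls linearly from 1 (no histogram
    entries) to 0 (100 or more entries); the histogram estimate gets the rest.
    """
    heuristic_weight = max(0.0, 1.0 - histogram_size / FULL_TRUST_HISTOGRAM_SIZE)
    return (heuristic_weight * heuristic_sel
            + (1.0 - heuristic_weight) * histogram_sel)
```

With a 50-entry histogram the two estimates count equally; at 100 entries or more the result is the histogram estimate alone.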
Tom Lane [Sat, 8 Mar 2008 22:41:38 +0000 (22:41 +0000)]
Modify prefix_selectivity() so that it will never estimate the selectivity
of the generated range condition var >= 'foo' AND var < 'fop' as being less
than what eqsel() would estimate for var = 'foo'. This is intuitively
reasonable and it gets rid of the need for some entirely ad-hoc coding we
formerly used to reject bogus estimates. The basic problem here is that
if the prefix is more than a few characters long, the two boundary values
are too close together to be distinguishable by comparison to the column
histogram, resulting in a selectivity estimate of zero, which is often
not very sane. Change motivated by an example from Peter Eisentraut.
Arguably this is a bug fix, but I'll refrain from back-patching it
for the moment.
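A minimal sketch of the two ideas in this message: turning a prefix into a range condition, and refusing to let the range estimate fall below the equality estimate. Both function names are hypothetical, and the upper-bound construction is a simplification that bumps the last character by one code point; it ignores collation and characters that cannot be incremented, which the real code has to deal with.

```python
from typing import Tuple


def prefix_range_bounds(prefix: str) -> Tuple[str, str]:
    """Return (lower, upper) such that lower <= var < upper covers the prefix.

    Simplified: increments the last character by one code point.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


def clamped_prefix_selectivity(range_sel: float, equality_sel: float) -> float:
    """Never report the range condition as rarer than var = prefix itself."""
    return max(range_sel, equality_sel)
```

When the histogram cannot tell the two bounds apart and the range estimate comes out as zero, the clamp falls back to whatever the equality estimate was.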
Tom Lane [Sat, 8 Mar 2008 21:57:59 +0000 (21:57 +0000)]
Refactor heap_page_prune so that instead of changing item states on-the-fly,
it accumulates the set of changes to be made and then applies them. It had
to accumulate the set of changes anyway to prepare a WAL record for the
pruning action, so this isn't an enormous change; the only new complexity is
to not doubly mark tuples that are visited twice in the scan. The main
advantage is that we can substantially reduce the scope of the critical
section in which the changes are applied, thus avoiding PANIC in foreseeable
cases like running out of memory in inval.c. A nice secondary advantage is
that it is now far clearer that WAL replay will actually do the same thing
that the original pruning did.
This commit doesn't do anything about the open problem that
CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change
caused by collapsing out a redirect pointer. But whatever we do about that,
it'll be a good idea to not do it inside a critical section.
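The accumulate-then-apply structure can be illustrated with a toy model. Nothing here mirrors the real heap_page_prune data structures: the state names, the dictionary page representation, and both function names are assumptions made for the sketch. The point is only that planning may do arbitrary work, including skipping items reached twice, while the apply step is reduced to plain assignments.

```python
from typing import Dict, List, Tuple

# Item states for this toy model (illustrative names only).
NORMAL, REDIRECT, DEAD, UNUSED = "normal", "redirect", "dead", "unused"


def plan_changes(visits: List[Tuple[int, str]]) -> Dict[int, str]:
    """Phase 1: record the intended new state of each item, once per item.

    visits holds (item_offset, new_state) pairs in scan order; the same
    offset may appear more than once if the scan reaches it twice.
    """
    planned: Dict[int, str] = {}
    for offset, new_state in visits:
        if offset in planned:
            continue  # already marked on an earlier visit; do not mark twice
        planned[offset] = new_state
    return planned


def apply_changes(page: Dict[int, str], planned: Dict[int, str]) -> None:
    """Phase 2: apply the plan; only simple assignments happen here."""
    for offset, new_state in planned.items():
        page[offset] = new_state
```

In this model the same `planned` mapping could be used both to change the page and to describe the change in a log record, which is what makes it easy to see that replay would repeat the original action.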
Andrew Dunstan [Sat, 8 Mar 2008 01:16:26 +0000 (01:16 +0000)]
Improve efficiency of attribute scanning in CopyReadAttributesCSV.
The loop is split into two parts, one for text inside quotes and one for
text outside quotes, saving some instructions in both parts.
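A sketch of the split-loop idea, assuming a simplified CSV dialect: one delimiter, one quote character, and a doubled quote as the only escape. It is not the CopyReadAttributesCSV code and omits null handling and a separate escape character; `split_csv_line` is a hypothetical name. Each inner loop tests only the characters that are special in its own state.

```python
from typing import List


def split_csv_line(line: str, delim: str = ",", quote: str = '"') -> List[str]:
    """Split one CSV line into fields using separate loops per quoting state."""
    fields: List[str] = []
    current: List[str] = []
    i, n = 0, len(line)
    while True:
        # Outside quotes: only the delimiter and an opening quote are special.
        while i < n and line[i] != delim and line[i] != quote:
            current.append(line[i])
            i += 1
        if i >= n:
            fields.append("".join(current))
            return fields
        if line[i] == delim:
            fields.append("".join(current))
            current = []
            i += 1
            continue
        i += 1  # skip the opening quote
        # Inside quotes: only the quote character is special.
        while i < n:
            if line[i] == quote:
                if i + 1 < n and line[i + 1] == quote:
                    current.append(quote)  # doubled quote is a literal quote
                    i += 2
                    continue
                i += 1  # closing quote
                break
            current.append(line[i])
            i += 1
        else:
            raise ValueError("unterminated quoted field")
```

A single loop would have to check an "in quotes" flag on every character; here the state is implied by which loop is running.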
Tom Lane [Sat, 8 Mar 2008 01:09:36 +0000 (01:09 +0000)]
Improve pglz_decompress() so that it cannot clobber memory beyond the
available output buffer when presented with corrupt input. Some testing
suggests that this slows the decompression loop about 1%, which seems an
acceptable price to pay for more robustness. (Curiously, the penalty
seems to be *less* on not-very-compressible data, which I didn't expect
since the overhead per output byte ought to be more in the literal-bytes
path.)
Patch from Zdenek Kotala. I fixed a corner case and did some renaming
of variables to make the routine more readable.
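The kind of check involved can be shown on a toy LZ-style format. This is not the pglz format or the pglz_decompress() logic: the token layout, the function name, and the choice to raise an error on corrupt input are all assumptions for illustration, and the message does not say how the real routine reacts. Python cannot overrun a buffer, so the sketch models the output buffer's capacity with `raw_size` and refuses to exceed it.

```python
from typing import Iterable, Tuple, Union

# ("lit", byte) emits one byte; ("copy", offset, length) repeats earlier output.
Token = Union[Tuple[str, int], Tuple[str, int, int]]


def bounded_decompress(tokens: Iterable[Token], raw_size: int) -> bytes:
    """Expand tokens, never producing more than raw_size bytes of output."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            if len(out) + 1 > raw_size:
                raise ValueError("corrupt input: output would overflow buffer")
            out.append(token[1])
        else:
            _, offset, length = token
            if offset <= 0 or offset > len(out):
                raise ValueError("corrupt input: reference before start of output")
            if len(out) + length > raw_size:
                raise ValueError("corrupt input: output would overflow buffer")
            for _ in range(length):
                out.append(out[-offset])  # byte at a time, so overlapping copies work
    if len(out) != raw_size:
        raise ValueError("corrupt input: output shorter than declared size")
    return bytes(out)
```

The extra comparisons sit on the path of every token, which is consistent with the small slowdown the message reports for the real routine.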
Tom Lane [Fri, 7 Mar 2008 23:20:21 +0000 (23:20 +0000)]
This patch addresses some issues in TOAST compression strategy that
were discussed last year, but we felt it was too late in the 8.3 cycle to
change the code immediately. Specifically, the patch:
* Reduces the minimum datum size to be considered for compression from
256 to 32 bytes, as suggested by Greg Stark.
* Increases the required compression rate for compressed storage from
20% to 25%, again per Greg's suggestion.
* Replaces force_input_size (size above which compression is forced)
with a maximum size to be considered for compression. It was agreed
that allowing large inputs to escape the minimum-compression-rate
requirement was not bright, and that indeed we'd rather have a knob
that acted in the other direction. I set this value to 1MB for the
moment, but it could use some performance studies to tune it.
* Adds an early-failure path to the compressor as suggested by Jan:
if it's been unable to find even one compressible substring in the
first 1KB (parameterizable), assume we're looking at incompressible
input and give up. (Possibly this logic can be improved, but I'll
commit it as-is for now.)
* Improves the toasting heuristics so that when we have very large
fields with attstorage 'x' or 'e', we will push those out to toast
storage before considering inline compression of shorter fields.
This also responds to a suggestion of Greg's, though my original
proposal for a solution was a bit off base because it didn't fix
the problem for large 'e' fields.
There was some discussion in the earlier threads of exposing some
of the compression knobs to users, perhaps even on a per-column
basis. I have not done anything about that here. It seems to me
that if we are changing around the parameters, we'd better get some
experience and be sure we are happy with the design before we set
things in stone by providing user-visible knobs.
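The numeric knobs listed above can be collected into a small sketch. The values (32 bytes, 1MB, 25%, 1KB) come from the message; the class, the field names, the helper functions, and the choice of inclusive bounds are assumptions for illustration and are not taken from the source. The toasting-order change in the last bullet is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionStrategy:
    min_input_size: int = 32            # bytes; smaller datums are not considered
    max_input_size: int = 1024 * 1024   # bytes; larger datums are not considered
    min_compression_rate: int = 25      # percent of the raw size that must be saved
    first_success_by: int = 1024        # bytes scanned before giving up


def worth_trying(strategy: CompressionStrategy, raw_size: int) -> bool:
    """Is a datum of this size a candidate for compression at all?"""
    return strategy.min_input_size <= raw_size <= strategy.max_input_size


def max_acceptable_size(strategy: CompressionStrategy, raw_size: int) -> int:
    """Largest compressed size that still meets the required saving."""
    return raw_size * (100 - strategy.min_compression_rate) // 100


def should_give_up_early(strategy: CompressionStrategy,
                         bytes_scanned: int, matches_found: int) -> bool:
    """Early-failure path: nothing compressible found in the first stretch."""
    return matches_found == 0 and bytes_scanned >= strategy.first_success_by
```

Under these defaults a 1000-byte datum is worth storing compressed only if the result is 750 bytes or smaller.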
Bruce Momjian [Fri, 7 Mar 2008 20:38:59 +0000 (20:38 +0000)]
Add:
>
> * Add a function like pg_get_indexdef() that reports more detailed index
> information
>
> http://archives.postgresql.org/pgsql-bugs/2007-12/msg00166.php
>
Bruce Momjian [Fri, 7 Mar 2008 20:22:25 +0000 (20:22 +0000)]
Add:
>
>
> o Prevent autovacuum from running if an old transaction is still
> running from the last vacuum
>
> http://archives.postgresql.org/pgsql-hackers/2007-11/msg00899.php
>
Bruce Momjian [Fri, 7 Mar 2008 19:03:39 +0000 (19:03 +0000)]
Add item:
> o Store per-table autovacuum settings in pg_class.reloptions.
>
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01440.php
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00724.php
Tom Lane [Fri, 7 Mar 2008 15:59:03 +0000 (15:59 +0000)]
Change hashscan.c to keep its list of active hash index scans in
TopMemoryContext, rather than scattered through executor per-query contexts.
This poses no danger of memory leak since the ResourceOwner mechanism
guarantees release of no-longer-needed items. It is needed because the
per-query context might already be released by the time we try to clean up
the hash scan list. Report by ykhuang, diagnosis by Heikki.
Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced.
The given test case does not fail before 8.2, probably because we rearranged
transaction abort processing somehow; but this coding is undoubtedly risky
so I'll patch 8.0 and 8.1 anyway.
Bruce Momjian [Fri, 7 Mar 2008 14:57:39 +0000 (14:57 +0000)]
Add:
>
> * Add comments on system tables/columns using the information in
> catalogs.sgml
>
> Ideally the information would be pulled from the SGML file
> automatically.
>
Teodor Sigaev [Fri, 7 Mar 2008 14:30:20 +0000 (14:30 +0000)]
Fix memory arrangement of tsquery after removing stop words. The old code
left unused memory holes in the tsquery.
Per report by Richard Huxton <dev@archonet.com>.
It worked anyway because tsquery->size is in fact not used for any
kind of operation except comparing tsqueries. So in HEAD it's enough to
fix the to_tsquery function, but in previous versions the optimization
in CompareTSQ has to be removed, to avoid requiring that all stored
tsquery values be renewed.
Bruce Momjian [Fri, 7 Mar 2008 00:10:13 +0000 (00:10 +0000)]
Add:
> o Have \d show foreign keys that reference a table's primary key
>
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00424.php
>
> o Have \d show child tables that inherit from the specified parent
Bruce Momjian [Thu, 6 Mar 2008 22:09:43 +0000 (22:09 +0000)]
Add URLs for:
* Consider compressing indexes by storing key values duplicated in
several rows as a single index entry
>
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00341.php
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01264.php
> http://archives.postgresql.org/pgsql-hackers/2007-03/msg00465.php
>
Bruce Momjian [Thu, 6 Mar 2008 21:25:50 +0000 (21:25 +0000)]
Add:
>
> * Allow client certificate names to be checked against the client
> hostname
>
> This is already implemented in
> libpq/fe-secure.c::verify_peer_name_matches_certificate() but the code
> is commented out.
Bruce Momjian [Thu, 6 Mar 2008 17:19:38 +0000 (17:19 +0000)]
Add:
> * Prevent malicious functions from being executed with the permissions
> of unsuspecting users
>
> Index functions are safe, so VACUUM and ANALYZE are safe too.
> Triggers, CHECK and DEFAULT expressions, and rules are still vulnerable.
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00268.php
Bruce Momjian [Thu, 6 Mar 2008 03:18:19 +0000 (03:18 +0000)]
Add:
>
> o Have CONSTRAINT cname NOT NULL preserve the constraint name
>
> Right now pg_attribute.attnotnull records the NOT NULL status
> of the column, but does not record the constraint name
>
Tom Lane [Wed, 5 Mar 2008 17:01:26 +0000 (17:01 +0000)]
In PrepareToInvalidateCacheTuple, don't force initialization of catalog
caches that we don't actually need to touch. This saves some trivial
number of cycles and avoids certain cases of deadlock when doing concurrent
VACUUM FULL on system catalogs. Per report from Gavin Roy.
Backpatch to 8.2. In earlier versions, CatalogCacheInitializeCache didn't
lock the relation so there's no deadlock risk (though that certainly had
plenty of risks of its own).
Bruce Momjian [Wed, 5 Mar 2008 16:59:10 +0000 (16:59 +0000)]
Document that increasing the number of checkpoint segments or the
checkpoint timeout can increase the time needed for crash recovery, per
suggestion from Simon.
Tom Lane [Tue, 4 Mar 2008 19:54:06 +0000 (19:54 +0000)]
Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a
temporary table; we can't support that because there's no way to clean up the
source backend's internal state if the eventual COMMIT PREPARED is done by
another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(.
Patch by Heikki Linnakangas, original trouble report by John Smith.
Bruce Momjian [Tue, 4 Mar 2008 02:48:22 +0000 (02:48 +0000)]
Remove:
<
< o To better utilize resources, restore data, primary keys, and
< indexes for a single table before restoring the next table
<
< Hopefully this will allow the CPU-I/O load to be more uniform
< for simultaneous restores. The idea is to start data restores
< for several objects, and once the first object is done, to move
< on to its primary keys and indexes. Over time, simultaneous
< data loads and index builds will be running.
Bruce Momjian [Tue, 4 Mar 2008 01:33:32 +0000 (01:33 +0000)]
Add ideas for concurrent pg_dump and pg_restore:
< * pg_dump
> * pg_dump / pg_restore
> o Allow pg_dump to utilize multiple CPUs and I/O channels by dumping
> multiple objects simultaneously
>
> The difficulty with this is getting multiple dump processes to
> produce a single dump output file.
> http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php
>
> o Allow pg_restore to utilize multiple CPUs and I/O channels by
> restoring multiple objects simultaneously
>
> This might require a pg_restore flag to indicate how many
> simultaneous operations should be performed. Only pg_dump's
> -Fc format has the necessary dependency information.
>
> o To better utilize resources, restore data, primary keys, and
> indexes for a single table before restoring the next table
>
> Hopefully this will allow the CPU-I/O load to be more uniform
> for simultaneous restores. The idea is to start data restores
> for several objects, and once the first object is done, to move
> on to its primary keys and indexes. Over time, simultaneous
> data loads and index builds will be running.
>
> o To better utilize resources, allow pg_restore to check foreign
> keys simultaneously, where possible
> o Allow pg_restore to create all indexes of a table
> concurrently, via a single heap scan
>
> This requires a pg_dump -Fc file because that format contains
> the required dependency information.
> http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php
>
> o Allow pg_restore to load different parts of the COPY data
> simultaneously
< single heap scan, and have a restore of a pg_dump somehow use it
> single heap scan, and have pg_restore use it
< http://archives.postgresql.org/pgsql-general/2007-05/msg01274.php
Bruce Momjian [Mon, 3 Mar 2008 21:00:35 +0000 (21:00 +0000)]
Add another URL for:
o Consider using a ring buffer for COPY FROM
<
< http://archives.postgresql.org/pgsql-hackers/2008-02/msg01080.php
> http://archives.postgresql.org/pgsql-hackers/2008-02/msg01080.php
Bruce Momjian [Mon, 3 Mar 2008 18:45:24 +0000 (18:45 +0000)]
Add:
> * Speed WAL recovery by allowing more than one page to be prefetched
>
> This involves having a separate process that can be told which pages
> the recovery process will need in the near future.
> http://archives.postgresql.org/pgsql-hackers/2008-02/msg01279.php
>
Bruce Momjian [Mon, 3 Mar 2008 15:06:55 +0000 (15:06 +0000)]
Add URLs for sequence discussions:
>
> http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php
>
< o %Have ALTER TABLE RENAME rename SERIAL sequence names
> o Have ALTER TABLE RENAME rename SERIAL sequence names
>
> http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php
>
> http://archives.postgresql.org/pgsql-hackers/2008-03/msg00008.php
Tom Lane [Sat, 1 Mar 2008 19:26:22 +0000 (19:26 +0000)]
Fix another place that was assuming that a local variable declared as
"struct varlena" would be at least word-aligned. Per buildfarm results
from gypsy_moth. I did a little bit of trawling for other instances of
this coding pattern, and didn't find any; but if we turn up any more
of them I think we'd better revert the "char [4]" patch and find another
way of making tuptoaster.c alignment-safe.
Tom Lane [Sat, 1 Mar 2008 03:26:35 +0000 (03:26 +0000)]
Fix unportable usages of tolower(). On signed-char machines, it is necessary
to explicitly cast the output back to char before comparing it to a char
value, else we get the wrong result for high-bit-set characters. Found by
Rolf Jentsch. Also, fix several places where <ctype.h> functions were being
called without casting the argument to unsigned char; this is likewise
unportable, but we keep making that mistake :-(. These found by buildfarm
member salamander, which I will desperately miss if it ever goes belly-up.
Tom Lane [Sat, 1 Mar 2008 02:46:49 +0000 (02:46 +0000)]
Disable the undocumented xmlvalidate() function, which was unintentionally
left in the code though it was not meant to be provided. It represents a
security hole because unprivileged users could use it to look at (at least the
first line of) any file readable by the backend. Fortunately, this is only
possible if the backend was built with XML support, so the damage is at least
mitigated; and 8.3 probably hasn't propagated into any security-critical uses
yet anyway. Per report from Sergey Burladyan.
Tom Lane [Fri, 29 Feb 2008 17:47:41 +0000 (17:47 +0000)]
Reducing the assumed alignment of struct varlena means that the compiler
is also licensed to put a local variable declared that way at an unaligned
address. Which will not work if the variable is then manipulated with
SET_VARSIZE or other macros that assume alignment. So the previous patch
is not an unalloyed good, but on balance I think it's still a win, since
we have very few places that do that sort of thing. Fix the one place in
tuptoaster.c that does it. Per buildfarm results from gypsy_moth
(I'm a bit surprised that only one machine showed a failure).
Magnus Hagander [Fri, 29 Feb 2008 15:31:33 +0000 (15:31 +0000)]
Fix handling of restricted processes for Windows Vista (mainly),
by explicitly adding back the user to the DACL of the new process.
This fixes the failure case when executing as the Administrator
user, which had no permissions left at all after we dropped the
Administrators group.
Neil Conway [Fri, 29 Feb 2008 02:49:39 +0000 (02:49 +0000)]
Fix several memory leaks when rescanning SRFs. Arrange for an SRF's
"multi_call_ctx" to be a distinct sub-context of the EState's per-query
context, and delete the multi_call_ctx as soon as the SRF finishes
execution. This avoids leaking SRF memory until the end of the current
query, which is particularly egregious when the SRF is scanned
multiple times. This change also fixes a leak of the fields of the
AttInMetadata struct in shutdown_MultiFuncCall().
Also fix a leak of the SRF result TupleDesc when rescanning a
FunctionScan node. The TupleDesc is allocated in the per-query context
for every call to ExecMakeTableFunctionResult(), so we should free it
after calling that function. Since the SRF might choose to return
a non-expendable TupleDesc, we only free the TupleDesc if it is
not being reference-counted.
Peter Eisentraut [Wed, 27 Feb 2008 20:31:01 +0000 (20:31 +0000)]
Change expand_subsys function so that it preserves the relative order of
the files passed as argument. This is desirable so that the dtrace rule
in src/backend/Makefile works.
Tom Lane [Wed, 27 Feb 2008 17:44:19 +0000 (17:44 +0000)]
If RelationBuildDesc() fails to open a critical system index, PANIC with
a relevant error message instead of just dumping core. Odd that nobody
reported this before Darren Reed.
Peter Eisentraut [Tue, 26 Feb 2008 16:07:16 +0000 (16:07 +0000)]
In the SSH setup instructions, change
ssh -L 3333:foo.com:5432 joe@foo.com
to
ssh -L 3333:localhost:5432 joe@foo.com
The reason is that the original command assumes the postgres server on
foo.com accepts connections on its foo.com address, which the default
listen_addresses setting does not allow. Add more detail explaining this.
Pointed out by Faheem Mitha.
Also change the example port number 3333 to 63333 so no one can complain
that we are stealing a reserved port number.
Peter Eisentraut [Tue, 26 Feb 2008 13:31:40 +0000 (13:31 +0000)]
Create two separate libpq.rc's: One that is built at build time, and one
that is shipped in the distribution, named libpq-dist.rc. This way the
build system doesn't get upset when a distributed file is forcibly
overwritten during a normal build.
Peter Eisentraut [Tue, 26 Feb 2008 10:45:24 +0000 (10:45 +0000)]
Reorganize some of the exports list generation code. It seems that this
has been reinvented about four different times throughout history (aix,
cygwin, win32, darwin/linux) and a lot of the concepts are actually shared,
which the code now shows better.
Peter Eisentraut [Tue, 26 Feb 2008 07:20:38 +0000 (07:20 +0000)]
We don't need to rebuild objfiles.txt every time an object file changes.
So only rebuild when a makefile changes (which presumably defines the
file list somewhere), and only touch the file if an object changed. The
touch is necessary so the parent make knows something changed and
ultimately rebuilds postgres.
Tom Lane [Tue, 26 Feb 2008 02:54:08 +0000 (02:54 +0000)]
Fix encode(...bytea..., 'escape') so that it converts all high-bit-set byte
values into \nnn octal escape sequences. When the database encoding is
multibyte this is *necessary* to avoid generating invalidly encoded text.
Even in a single-byte encoding, the old behavior seems very hazardous ---
consider for example what happens if the text is transferred to another
database with a different encoding. Decoding would then yield some other
bytea value than what was encoded, which is surely undesirable. Per gripe
from Hernan Gonzalez.
Backpatch to 8.3, but not further. This is a bit of a judgment call, but I
make it on these grounds: pre-8.3 we don't really have much encoding safety
anyway because of the convert() function family, and we would also have much
higher risk of breaking existing apps that may not be expecting this behavior.
8.3 is still new enough that we can probably get away with making this change
in the function's behavior.
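A sketch of the escaping rule described above. The treatment of high-bit-set bytes follows the message; the handling of zero bytes (octal escape) and backslashes (doubled) is an assumption based on the usual description of the 'escape' format and is not stated here. `escape_encode` is a hypothetical name, not the backend function.

```python
def escape_encode(data: bytes) -> str:
    """Render bytes as text that contains only 7-bit characters."""
    parts = []
    for byte in data:
        if byte == 0 or byte >= 0x80:
            parts.append("\\%03o" % byte)   # \nnn octal escape
        elif byte == ord("\\"):
            parts.append("\\\\")            # assumed: backslash is doubled
        else:
            parts.append(chr(byte))
    return "".join(parts)
```

Because the result is pure ASCII, it is valid text in any ASCII-compatible database encoding, and transferring it to a database with a different encoding cannot change which bytes it decodes back to.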
Tom Lane [Mon, 25 Feb 2008 23:36:28 +0000 (23:36 +0000)]
Reject year zero during datetime input, except when it's a 2-digit year
(then it means 2000 AD). Formerly we silently interpreted this as 1 BC,
which at best is unwarranted familiarity with the implementation.
It's barely possible that some app somewhere expects the old behavior,
though, so we won't back-patch this into existing release branches.
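The year-zero rule can be sketched as below. This is an illustration only: `interpret_year_field` is a hypothetical helper, not the datetime input code, and it covers just the zero case from the message. How other two-digit years are expanded is not described here, so nonzero values are returned unchanged.

```python
def interpret_year_field(digits: str) -> int:
    """Interpret a year field given as a string of digits."""
    if not digits.isdigit():
        raise ValueError("year field must be all digits")
    year = int(digits)
    if year == 0:
        if len(digits) == 2:
            return 2000  # two-digit "00" is taken to mean 2000 AD
        raise ValueError("year zero is not valid")
    return year  # simplification: no two-digit expansion for other values
```

Keeping the digit count alongside the value is what lets "00" and "0000" be told apart.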