Bruce Momjian [Fri, 29 Jul 2005 03:23:00 +0000 (03:23 +0000)]
Done:
< * Consider use of open/fcntl(O_DIRECT) to minimize OS caching,
< especially for WAL writes
> * -Consider use of open/fcntl(O_DIRECT) to minimize OS caching,
> for WAL writes
> If we disable the writeback cache and use open_sync, the per-page writing
> behavior of the WAL module gives poor results. O_DIRECT is similar
> to O_DSYNC (at least on Linux), so its benefit disappears
> behind the slow disk rotation.
>
> In the current source, WAL is written as:
> for (i = 0; i < N; i++) { write(&buffers[i], BLCKSZ); }
> Is this intentional? Can we rewrite it as follows?
> write(&buffers[0], N * BLCKSZ);
>
> In order to achieve it, I wrote a 'gather-write' patch (xlog.gw.diff).
> Aside from this, I'll also send the fixed direct io patch (xlog.dio.diff).
> These two patches are independent, so either or both can be applied.
>
>
> I tested them on my machine, and the results are as follows. They show that
> direct-io and gather-write together are the best choice when the writeback
> cache is off. Are these two patches worth trying if they are used together?
>
>
>              | writeback | fsync= | fdata | open_ | fsync_ | open_
> patch        | cache     | false  | sync  | sync  | direct | direct
> -------------+-----------+--------+-------+-------+--------+--------
> direct io    | off       | 124.2  | 105.7 | 48.3  | 48.3   | 48.2
> direct io    | on        | 129.1  | 112.3 | 114.1 | 142.9  | 144.5
> gather-write | off       | 124.3  | 108.7 | 105.4 | (N/A)  | (N/A)
> both         | off       | 131.5  | 115.5 | 114.4 | 145.4  | 145.2
>
> - 20 runs * pgbench -s 100 -c 50 -t 200
> - with tuning (wal_buffers=64, commit_delay=500, checkpoint_segments=8)
> - using 2 ATA disks:
> - hda(reiserfs) includes system and wal.
> - hdc(jfs) includes database files. writeback-cache is always on.
>
> ---
> ITAGAKI Takahiro
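
The difference between the two write patterns discussed in the mail can be sketched in C as follows. This is only an illustration, not the code from xlog.gw.diff: the function names, the BLCKSZ value, and the policy of treating a short write as failure are assumptions made for the sketch, and it presumes the N pages are contiguous in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed page size for this sketch */

/* Per-page writing: one write() call for each buffer page. */
ssize_t
write_pages_one_by_one(int fd, const char *buffers, size_t npages)
{
    ssize_t     total = 0;

    assert(buffers != NULL);
    for (size_t i = 0; i < npages; i++)
    {
        ssize_t     n = write(fd, buffers + i * BLCKSZ, BLCKSZ);

        if (n != BLCKSZ)
            return -1;          /* error or short write: report failure */
        total += n;
    }
    return total;
}

/* Gathered writing: a single write() call covering all contiguous pages. */
ssize_t
write_pages_gathered(int fd, const char *buffers, size_t npages)
{
    size_t      nbytes = npages * BLCKSZ;
    ssize_t     n;

    assert(buffers != NULL);
    n = write(fd, buffers, nbytes);
    return (n == (ssize_t) nbytes) ? n : -1;
}
```

With a synchronous open mode, each write() call may have to wait for the disk, so issuing one call instead of N is where the gathered variant would be expected to help.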
Bruce Momjian [Fri, 29 Jul 2005 03:17:55 +0000 (03:17 +0000)]
Thank you for applying the regexp_replace patch.
The attached patch is a small additional improvement.
It uses appendStringInfoText instead of appendStringInfoString.
Calling appendStringInfoString on a text value incurs the overhead of
PG_TEXT_GET_STR; appendStringInfoText reduces that overhead.
Tom Lane [Thu, 28 Jul 2005 20:26:22 +0000 (20:26 +0000)]
Fix a bunch of bad interactions between partial indexes and the new
planning logic for bitmap indexscans. Partial indexes create corner
cases in which a scan might be done with no explicit index qual conditions,
and the code wasn't handling those cases nicely. Also be a little
tenser about eliminating redundant clauses in the generated plan.
Per report from Dmitry Karasik.
Neil Conway [Thu, 28 Jul 2005 07:51:13 +0000 (07:51 +0000)]
Refactor exec_cast_value() and exec_simple_cast_value(): since they do
not ever write through the `isnull' parameter, it does not need to be
an out parameter. Therefore it can be declared a "bool" rather than a
"bool *".
Neil Conway [Thu, 28 Jul 2005 07:38:33 +0000 (07:38 +0000)]
Mark a static array "const" to move a few bytes from the "data" segment
to the "text" segment. It would be possible to mark the elements of the
array "const" as well, but this would require multiple API changes and
does not seem to be worth the notational inconvenience.
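
A small C sketch of the distinction drawn above, using a hypothetical array rather than the one touched by the commit: the pointer slots of the array are const, so the array itself may be placed in read-only storage where the toolchain allows it, while the pointed-to strings are deliberately left without a const qualifier.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example array.  "char *const" makes each array slot
 * read-only; the element type is still plain "char *", so existing
 * callers that expect "char *" need no API change.
 */
static char *const level_names[] = {"debug", "notice", "error"};

#define N_LEVELS (sizeof(level_names) / sizeof(level_names[0]))

/* Look up a name by index; the index must be in range. */
char *
level_name(size_t i)
{
    assert(i < N_LEVELS);
    return level_names[i];
}
```

Declaring the elements as "const char *const" would be stricter, but every function receiving such an element would then have to accept "const char *".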
Tom Lane [Thu, 28 Jul 2005 04:31:30 +0000 (04:31 +0000)]
Put libpgport into OBJS instead of LIBS, so that it gets included
into .def and .exp files automatically on Windows, AIX, and the like.
An additional benefit is that changes in libpgport files correctly
propagate to force rebuild of the backend executable. This is my
reworking of Rocco Altier's idea, and if it breaks anything it's
definitely my fault.
Tom Lane [Thu, 28 Jul 2005 04:03:14 +0000 (04:03 +0000)]
Fix a whole bunch of #includes that were either wrong or redundant.
The first rule of portability for us is 'thou shalt have no other gods
before c.h', and a whole lot of these files were either not including
c.h at all, or including random system headers beforehand, either of
which sins can mess up largefile support nicely. Once you have
included c.h, there is no need to re-include what it includes, either.
Neil Conway [Wed, 27 Jul 2005 12:44:10 +0000 (12:44 +0000)]
Fix a few macro definitions to ensure that unary minus is enclosed in
parentheses. This avoids possible operator precedence problems, and
is consistent with most of the macro definitions in the tree.
Tom Lane [Tue, 26 Jul 2005 16:38:29 +0000 (16:38 +0000)]
Add a role property 'rolinherit' which, when false, denotes that the role
doesn't automatically inherit the privileges of roles it is a member of;
for such a role, membership in another role can be exploited only by doing
explicit SET ROLE. The default inherit setting is TRUE, so by default
the behavior doesn't change, but creating a user with NOINHERIT gives closer
adherence to our current reading of SQL99. Documentation still lacking,
and I think the information schema needs another look.
Tom Lane [Tue, 26 Jul 2005 00:04:19 +0000 (00:04 +0000)]
Add pg_has_role() family of privilege inquiry functions modeled after the
existing ones for object privileges. Update the information_schema for
roles --- pg_has_role() makes this a whole lot easier, removing the need
for most of the explicit joins with pg_user. The views should be a tad
faster now, too. Stephen Frost and Tom Lane.
Tom Lane [Mon, 25 Jul 2005 04:52:32 +0000 (04:52 +0000)]
A while back we replaced all uses of strcasecmp and strncasecmp with
pg_strcasecmp and pg_strncasecmp ... but I see some of the former have
crept back in.
Eternal vigilance is the price of locale independence, apparently.
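
To illustrate why a locale-independent comparison matters, here is a minimal sketch of a case-insensitive compare that folds only ASCII letters and therefore gives the same answer under every locale setting. It is not the pg_strcasecmp implementation; the names ascii_fold and ascii_strcasecmp are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only ASCII 'A'..'Z'; every other byte is returned unchanged. */
static unsigned char
ascii_fold(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    return ch;
}

/*
 * Compare two NUL-terminated strings, ignoring ASCII case only.
 * Returns a negative, zero, or positive value like strcmp().
 */
int
ascii_strcasecmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    for (;;)
    {
        unsigned char c1 = ascii_fold((unsigned char) *s1++);
        unsigned char c2 = ascii_fold((unsigned char) *s2++);

        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
}
```

Because no call to tolower() is involved, keyword matching such as ascii_strcasecmp("SELECT", "select") cannot change its result when the process locale changes.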
Tom Lane [Mon, 25 Jul 2005 00:58:27 +0000 (00:58 +0000)]
Change build of regress.so to use Makefile.shlib instead of depending
on the not-very-good .so pattern rules in the port-specific Makefiles.
(This leaves only pgxs' MODULES case needing those rules.) Also,
compile pgsleep.c locally and add it to regress.so to avoid failure
on AIX.
Tom Lane [Sun, 24 Jul 2005 17:07:18 +0000 (17:07 +0000)]
With the interval/day patch, the horology regression test no longer
fails near DST transition days, so remove the advice about that testing
problem. Also improve the description of variant-comparison-file
selection.
Tom Lane [Sun, 24 Jul 2005 02:25:26 +0000 (02:25 +0000)]
Fix logic error in tbm_intersect: the intersection of a normal page and
a lossy page has to be lossy, because we don't know exactly which tuples
on the page should remain part of the bitmap. Per Jie Zhang.
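
The rule stated above can be sketched with a simplified page representation. The PageBits struct and intersect_page function are assumptions made for illustration and do not reproduce the tidbitmap data structures: a lossy page records only that some tuples on the page match, an exact page records which ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page bitmap entry. */
typedef struct PageBits
{
    bool        lossy;          /* true: individual tuples are not tracked */
    uint64_t    tuples;         /* one bit per tuple; used only if !lossy */
} PageBits;

/* Intersect the entries that two bitmaps hold for the same page. */
PageBits
intersect_page(PageBits a, PageBits b)
{
    PageBits    r;

    if (a.lossy || b.lossy)
    {
        /* One side cannot say which tuples match, so the result is lossy. */
        r.lossy = true;
        r.tuples = 0;
    }
    else
    {
        /* Both sides are exact: keep only tuples present in both. */
        r.lossy = false;
        r.tuples = a.tuples & b.tuples;
    }
    assert(!r.lossy || r.tuples == 0);
    return r;
}
```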
Tom Lane [Sun, 24 Jul 2005 00:33:28 +0000 (00:33 +0000)]
Fix some failures to initialize table entries induced by recent autovacuum
integration. Not clear this explains recent stats problems, but it's
definitely wrong.
Tom Lane [Sat, 23 Jul 2005 21:05:48 +0000 (21:05 +0000)]
Simple constraint exclusion. For now, only child tables of inheritance
scans are candidates for exclusion; this should be fixed eventually.
Simon Riggs, with some help from Tom Lane.
Tom Lane [Sat, 23 Jul 2005 14:18:57 +0000 (14:18 +0000)]
In the stats test, delay for the stats collector to catch up using a
function that actually sleeps, instead of busy-waiting. Perhaps this
will resolve some of the intermittent stats failures we keep seeing.
Bruce Momjian [Sat, 23 Jul 2005 02:02:27 +0000 (02:02 +0000)]
Fix AT TIME ZONE for timestamps without time zones:
test=> select (CURRENT_DATE + '05:00'::time)::timestamp at time zone
'Canada/Pacific';
timezone
------------------------
2005-07-22 08:00:00-04
(1 row)
Bruce Momjian [Fri, 22 Jul 2005 21:16:15 +0000 (21:16 +0000)]
Fix AT TIME ZONE for timestamps without time zones:
test=> select ('2005-07-20 00:00:00'::timestamp without time zone) at
time zone 'Europe/Paris';
timezone
------------------------
2005-07-19 22:00:00-04
Tom Lane [Fri, 22 Jul 2005 19:12:02 +0000 (19:12 +0000)]
Fix compare_fuzzy_path_costs() to behave a bit more sanely. The original
coding would ignore startup cost differences of less than 1% of the
estimated total cost; which was OK for normal planning but highly not OK
if a very small LIMIT was applied afterwards, so that startup cost becomes
the name of the game. Instead, compare startup and total costs fuzzily
but independently. This changes the plan selected for two queries in the
regression tests; adjust expected-output files for resulting changes in
row order. Per reports from Dawid Kuroczko and Sam Mason.
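
A rough sketch of what comparing the two costs "fuzzily but independently" could look like. The PathCost struct, the function names, and the 1% fuzz factor applied to each cost separately are assumptions for illustration; the actual compare_fuzzy_path_costs() may differ in its details.

```c
#include <assert.h>
#include <stddef.h>

#define FUZZ_FACTOR 1.01        /* assumed: values within 1% count as equal */

/* Hypothetical stand-in for the cost fields of a planner path. */
typedef struct PathCost
{
    double      startup_cost;
    double      total_cost;
} PathCost;

/* Return -1 if a is fuzzily cheaper than b, +1 if dearer, 0 if about equal. */
static int
fuzzy_cmp(double a, double b)
{
    if (a > b * FUZZ_FACTOR)
        return 1;
    if (b > a * FUZZ_FACTOR)
        return -1;
    return 0;
}

/*
 * Compare total costs first; on a fuzzy tie, compare startup costs
 * against each other rather than against a fraction of the total cost.
 */
int
compare_costs_fuzzily(const PathCost *p1, const PathCost *p2)
{
    int         cmp;

    assert(p1 != NULL && p2 != NULL);
    cmp = fuzzy_cmp(p1->total_cost, p2->total_cost);
    if (cmp != 0)
        return cmp;
    return fuzzy_cmp(p1->startup_cost, p2->startup_cost);
}
```

For two paths with total costs 1000 and 1001 and startup costs 0.1 and 5.0, the totals tie fuzzily, but the startup costs differ by far more than 1% of each other, so the first path is preferred. Measured against 1% of the total cost, that difference would have been ignored.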
Tom Lane [Thu, 21 Jul 2005 04:15:04 +0000 (04:15 +0000)]
Fix storage size for btree_gist interval indexes. Fix penalty
calculations for interval and time/timetz to behave sanely for both
integer and float timestamps; up to now I think it's been doing
something pretty strange...
Tom Lane [Mon, 18 Jul 2005 22:34:14 +0000 (22:34 +0000)]
Fix some bogosities in geometric-function documentation: add an entry
for circle(polygon), which was missing; remove bogus entry for
point(lseg, lseg), which does not exist, and the documentation seemed to
describe lseg_interpt, which we already document as an operator not a
function. Also remove entry for box_intersect, which likewise is
preferentially used via the operator #.
Tom Lane [Mon, 18 Jul 2005 17:40:14 +0000 (17:40 +0000)]
Adjust psql describe queries so that any pg_foo_is_visible() condition
is applied last, after other constraints such as name patterns. This
is useful first because the pg_foo_is_visible() functions are relatively
expensive, and second because it minimizes the prospects for race
conditions. The change is fragile though since it makes unwarranted
assumptions about planner behavior, i.e., that WHERE clauses will be
executed in the original order if there's no reason to change it.
This should fix ... or at least hide ... an intermittent failure in the
prepared_xacts regression test, while we think about what else to do.
Tom Lane [Mon, 18 Jul 2005 15:53:28 +0000 (15:53 +0000)]
MemSet() must not cast its pointer argument to int32* until after it has
checked that the pointer is actually word-aligned. Casting a non-aligned
pointer to int32* is technically illegal per the C spec, and some recent
versions of gcc actually generate bad code for the memset() when given
such a pointer. Per report from Andrew Morrow.
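
The ordering described above, test alignment first and cast afterwards, can be sketched as a small function. This is not the MemSet() macro itself; zero_fill is a hypothetical name, and the word-at-a-time loop assumes the caller's buffer may legitimately be accessed as int32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero len bytes at start, using word stores only when that is safe. */
void
zero_fill(void *start, size_t len)
{
    assert(start != NULL);

    /* Test alignment on the untyped pointer, before any int32_t * cast. */
    if (((uintptr_t) start % sizeof(int32_t)) == 0 &&
        (len % sizeof(int32_t)) == 0)
    {
        /* Only now is it safe to form and use the int32_t pointer. */
        int32_t    *p = (int32_t *) start;
        int32_t    *stop = p + len / sizeof(int32_t);

        while (p < stop)
            *p++ = 0;
    }
    else
    {
        /* Unaligned start or odd length: fall back to plain memset(). */
        memset(start, 0, len);
    }
}
```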
Tom Lane [Sun, 17 Jul 2005 18:28:45 +0000 (18:28 +0000)]
Make pg_regress accept a command-line option for the temporary installation's
port number, and use a default value for it that is dependent on the
configuration-time DEF_PGPORT. Should make the world safe for running
parallel 'make check' in different branches. Back-patch as far as 7.4
so that this actually is useful.
Tom Lane [Fri, 15 Jul 2005 22:02:51 +0000 (22:02 +0000)]
Fix create_unique_plan() so it doesn't generate useless entries in the
output targetlist of the Unique or HashAgg plan. This code was OK when
written, but subsequent changes to use "physical tlists" where possible
had broken it: given an input subplan that has extra variables added to
avoid a projection step, it would copy those extra variables into the
upper tlist, which is pointless since a projection has to happen anyway.
Tom Lane [Fri, 15 Jul 2005 18:39:59 +0000 (18:39 +0000)]
Check for out-of-range varoattno in deparse_context_for_subplan.
I have seen this case in CVS tip due to new "physical tlist" optimization
for subqueries. I believe it probably can't happen in existing releases,
but the check is not going to hurt anything, so backpatch to 8.0 just
in case.
Tom Lane [Fri, 15 Jul 2005 17:09:26 +0000 (17:09 +0000)]
Fix overenthusiastic optimization of 'x IN (SELECT DISTINCT ...)' and related
cases: we can't just consider whether the subquery's output is unique on its
own terms, we have to check whether the set of output columns we are going to
use will be unique. Per complaint from Luca Pireddu and test case from
Michael Fuhr.
Tom Lane [Thu, 14 Jul 2005 21:46:30 +0000 (21:46 +0000)]
Adjust permissions checking for ALTER OWNER commands: instead of
requiring superuserness always, allow an owner to reassign ownership
to any role he is a member of, if that role would have the right to
create a similar object. These three requirements essentially state
that the would-be alterer has enough privilege to DROP the existing
object and then re-CREATE it as the new role; so we might as well
let him do it in one step. The ALTER TABLESPACE case is a bit
squirrely, but the whole concept of non-superuser tablespace owners
is pretty dubious anyway. Stephen Frost, code review by Tom Lane.
Neil Conway [Thu, 14 Jul 2005 07:12:27 +0000 (07:12 +0000)]
Mark xml2 CREATE FUNCTIONs as IMMUTABLE, and use the "STRICT" syntax
rather than the deprecated "WITH (isStrict)" syntax. Patch from Ilia
Kantor, minor editorializing by Neil Conway.
Neil Conway [Thu, 14 Jul 2005 06:17:36 +0000 (06:17 +0000)]
This doc patch replaces all inappropriate references to SQL:1999 when it
is used as if it were the latest (and/or still valid) SQL standard.
SQL:2003 is used in its place. Patch from Simon Riggs.
Tom Lane [Thu, 14 Jul 2005 05:13:45 +0000 (05:13 +0000)]
Integrate autovacuum functionality into the backend. There are still a
few loose ends to be dealt with, but it seems to work. Alvaro Herrera,
based on the contrib code by Matthew O'Connor.