Tom Lane [Mon, 31 May 2004 20:31:33 +0000 (20:31 +0000)]
Additional mop-up for sync-to-fsync changes: avoid issuing fsyncs for
temp tables, and avoid WAL-logging truncations of temp tables. Do issue
fsync on truncated files (not sure this is necessary but it seems like
a good idea).
Tom Lane [Mon, 31 May 2004 19:24:05 +0000 (19:24 +0000)]
Minor code rationalization: FlushRelationBuffers just returns void,
rather than an error code, and does elog(ERROR) not elog(WARNING)
when it detects a problem. All callers were simply elog(ERROR)'ing on
failure return anyway, and I find it hard to envision a caller that would
not, so we may as well simplify the callers and produce the more useful
error message directly.
Tom Lane [Mon, 31 May 2004 18:31:51 +0000 (18:31 +0000)]
I think I've finally identified the cause of the off-by-one-second
issue in timestamp conversion that we hacked around for so long by
ignoring the seconds field from localtime(). It's simple: you have
to watch out for platform-specific roundoff error when reducing a
possibly-fractional timestamp to integral time_t form. In particular
we should subtract off the already-determined fractional fsec field.
This should be enough to get an exact answer with int64 timestamps;
with float timestamps, throw in a rint() call just to be sure.
Tom Lane [Mon, 31 May 2004 03:48:10 +0000 (03:48 +0000)]
Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint. In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends. (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors. Anyone have a problem with that?) This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.
Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC: heap_delete_redo: no block'
later on in xlog replay.
Neil Conway [Sun, 30 May 2004 23:40:41 +0000 (23:40 +0000)]
Use the new List API function names throughout the backend, and disable the
list compatibility API by default. While doing this, I decided to keep
the llast() macro around and introduce llast_int() and llast_oid() variants.
Tom Lane [Sat, 29 May 2004 22:48:23 +0000 (22:48 +0000)]
Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files. Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses. While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
Tom Lane [Sat, 29 May 2004 05:55:13 +0000 (05:55 +0000)]
Fix another place that assumed 'x = lcons(y, z)' would not have any
side-effect on the original list z. I fear we have a few more of these
to track down yet :-(.
Bruce Momjian [Fri, 28 May 2004 18:37:10 +0000 (18:37 +0000)]
When checking for thread safety with src/tools/thread/thread_test.c, the
mktemp function wants an argument that contains 6 X, while the current
version only supplies 5 X which will fail on my SuSE 8.1.
Tom Lane [Fri, 28 May 2004 16:17:14 +0000 (16:17 +0000)]
Fix thinko in recent patch to change temp-table permissions behavior:
this is an aclmask function and does not have the same return convention
as aclcheck functions. Also adjust the behavior so that users without
CREATE TEMP permission still have USAGE permission on their session's
temp schema. This allows privileged code to create a temp table and
make it accessible to code that's not got the same privilege. (Since
the default permissions on a table are no-access, an explicit grant on
the table will still be needed; but I see no reason that the temp schema
itself should prohibit such access.)
Teodor Sigaev [Fri, 28 May 2004 10:43:32 +0000 (10:43 +0000)]
New version. Add support for int2, int8, float4, float8, timestamp with/without time zone, time with/without time zone, date, interval, oid, money and macaddr, char, varchar/text, bytea, numeric, bit, varbit, inet/cidr types for GiST
Tom Lane [Fri, 28 May 2004 05:13:32 +0000 (05:13 +0000)]
Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again. (But perhaps
I broke the WIN32 code, since I have no way to test that.) Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently. Take care of one or two
FIXMEs that remained in the code.
Tom Lane [Thu, 27 May 2004 17:12:57 +0000 (17:12 +0000)]
Get rid of the former rather baroque mechanism for propagating the values
of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just
to make them all grab the values out of shared memory during startup.
This helps to decouple the postmaster from checkpoint execution, which I
need since I'm intending to let the bgwriter do it instead, and it also
fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at
all AFAICS. (Doesn't give me a lot of faith in the amount of testing that
port has gotten.)
Tom Lane [Thu, 27 May 2004 03:30:11 +0000 (03:30 +0000)]
Recommend ALTER TABLE ... TYPE as the best way to reclaim space occupied by deleted columns. The old method involving UPDATE and VACUUM FULL will be considerably less efficient.
Tom Lane [Wed, 26 May 2004 19:44:15 +0000 (19:44 +0000)]
Reduce the minimum allocable chunk size to 8 bytes (from 16). Now that
ListCells are only 8 bytes instead of 12 (on 4-byte-pointer machines
anyway), it's worth maintaining a separate freelist for 8-byte objects.
Remembering that alloc chunks carry 8 bytes of overhead, this should
reduce the net storage requirement for a long List by about a third.
Bruce Momjian [Wed, 26 May 2004 18:51:43 +0000 (18:51 +0000)]
AIX doc addition:
> FWIW, the section on configuring kernel resources under various
> Unixen[1] doesn't have any documentation for AIX. If someone out there
> knows which knobs need to be tweaked, would they mind sending in a doc
> patch? (Or just specifying what needs to be done, and I'll add the
> SGML.)
After verifying that nobody wound up messing with the kernel
parameters, here's a docs patch...
Bruce Momjian [Wed, 26 May 2004 18:35:51 +0000 (18:35 +0000)]
*) inet_(client|server)_(addr|port)() and necessary documentation for
the four functions.
> Also, please justify the temp-related changes. I was not aware that we
> had any breakage there.
patch-tmp-schema.txt contains the following bits:
*) Changes pg_namespace_aclmask() so that the superuser is always able
to create objects in the temp namespace.
*) Changes pg_namespace_aclmask() so that if this is a temp namespace,
objects are only allowed to be created in the temp namespace if the
user has TEMP privs on the database. This encompasses all object
creation, not just TEMP tables.
*) InitTempTableNamespace() checks to see if the current user, not the
session user, has access to create a temp namespace.
The first two changes are necessary to support the third change. Now
it's possible to revoke all temp table privs from non-super users and
limiting all creation of temp tables/schemas via a function that's
executed with elevated privs (security definer). Before this change,
it was not possible to have a setuid function to create a temp
table/schema if the session user had no TEMP privs.
patch-area-path.txt contains:
*) Can now determine the area of a closed path.
patch-dfmgr.txt contains:
*) Small tweak to add the library path that's being expanded.
I was using $lib/foo.so and couldn't easily figure out what the error
message, "invalid macro name in dynamic library path" meant without
looking through the source code. With the path in there, at least I
know where to start looking in my config file.
Bruce Momjian [Wed, 26 May 2004 15:26:28 +0000 (15:26 +0000)]
The added aggregates are:
(1) boolean-and and boolean-or aggregates named bool_and and bool_or.
they (SHOULD;-) correspond to standard sql every and some/any aggregates.
they do not have the right name as there is a problem with
the standard and the parser for some/any. Tom also think that
the standard name is misleading because NULL are ignored.
Also add 'every' aggregate.
(2) bitwise integer aggregates named bit_and and bit_or for
int2, int4, int8 and bit types. They are not standard, but I find
them useful. I needed them once.
The patches adds:
- 2 new very short strict functions for boolean aggregates in
src/backed/utils/adt/bool.c,
src/include/utils/builtins.h and src/include/catalog/pg_proc.h
- the new aggregates declared in src/include/catalog/pg_proc.h and
src/include/catalog/pg_aggregate.h
- some documentation and validation about these new aggregates.
Bruce Momjian [Wed, 26 May 2004 15:07:41 +0000 (15:07 +0000)]
The patch adresses the TODO list item "Allow external interfaces to
extend the GUC variable set".
Plugin modules like the pl<lang> modules needs a way to declare
configuration parameters. The postmaster has no knowledge of such
modules when it reads the postgresql.conf file. Rather than allowing
totally unknown configuration parameters, the concept of a variable
"class" is introduced. Variables that belongs to a declared classes will
create a placeholder value of string type and will not generate an
error. When a module is loaded, it will declare variables for such a
class and make those variables "consume" any placeholders that has been
defined. Finally, the module will generate warnings for unrecognized
placeholders defined for its class.
More detail:
The design is outlined after the suggestions made by Tom Lane and Joe
Conway in this thread:
A new string variable 'custom_variable_classes' is introduced. This
variable is a comma separated string of identifiers. Each identifier
denots a 'class' that will allow its members to be added without error.
This variable must be defined in postmaster.conf.
The lexer (guc_file.l) is changed so that it can accept a qualified name
in the form <ID>.<ID> as the name of a variable. I also changed so that
the 'custom_variable_classes', if found, is added first of all variables
in order to remove the order of declaration issue.
The guc_variables table is made more dynamic. It is originally created
with 20% slack and can grow dynamically. A capacity is introduced to
avoid resizing every time a new variable is added. guc_variables and
num_guc_variables becomes static (hidden).
The GucInfoMain now uses the new function get_guc_variables() and
GetNumConfigOptions instead or using the guc_variables directly.
The find_option() function, when passed a missing name, will check if
the name is qualified. If the name is qualified and if the qualifier
denotes a class included in the 'custom_variable_classes', a placeholder
variable will be created. Such a placeholder will not participate in a
list operation but will otherwise function as a normal string variable.
Define<type>GucVariable() functions will be added, one for each variable
type. They are inteded to be used by add-on modules like the pl<lang>
mappings. Example:
(I created typedefs for the assign-hook and show-hook functions). A call
to these functions will define a new GUC-variable. If a placeholder
exists it will be replaced but it's value will be used in place of the
default value. The valueAddr is assumed ot point at a default value when
the define function is called. The only constraint that is imposed on a
Custom variable is that its name is qualified.
was added. This function should be called when a module has completed
its variable definitions. At that time, no placeholders should remain
for the class that the module uses. If they do, elog(INFO, ...) messages
will be issued to inform the user that unrecognized variables are
present.
Bruce Momjian [Wed, 26 May 2004 13:57:04 +0000 (13:57 +0000)]
This patch implement the TODO [ALTER DATABASE foo OWNER TO bar].
It was necessary to touch in grammar and create a new node to make home
to the new syntax. The command is also supported in E
CPG. Doc updates are attached too. Only superusers can change the owner
of the database. New owners don't need any aditional
privileges.
Neil Conway [Wed, 26 May 2004 04:41:50 +0000 (04:41 +0000)]
Reimplement the linked list data structure used throughout the backend.
In the past, we used a 'Lispy' linked list implementation: a "list" was
merely a pointer to the head node of the list. The problem with that
design is that it makes lappend() and length() linear time. This patch
fixes that problem (and others) by maintaining a count of the list
length and a pointer to the tail node along with each head node pointer.
A "list" is now a pointer to a structure containing some meta-data
about the list; the head and tail pointers in that structure refer
to ListCell structures that maintain the actual linked list of nodes.
The function names of the list API have also been changed to, I hope,
be more logically consistent. By default, the old function names are
still available; they will be disabled-by-default once the rest of
the tree has been updated to use the new API names.
Tom Lane [Tue, 25 May 2004 19:46:21 +0000 (19:46 +0000)]
Tweaks per discussion with Magnus: suppress chatter on unpatched MinGW
systems, add verbose logging (at DEBUG4) to help identify why a given
time zone is not matched.
Tom Lane [Tue, 25 May 2004 19:11:14 +0000 (19:11 +0000)]
Fix erroneous error message printout when a configuration file contains
an overlength token. Printout was always garbage and could dump core
entirely :-(. Per report from Martin Pitt.
Tom Lane [Tue, 25 May 2004 18:08:59 +0000 (18:08 +0000)]
Add code to identify_system_timezone() to try all zones in the zic
database, not just ones that we cons up POSIX names for. This looks
grim but it seems to take less than a second even on a relatively slow
machine, and since it only happens once during postmaster startup, that
seems acceptable.
Tom Lane [Mon, 24 May 2004 02:30:29 +0000 (02:30 +0000)]
Rewrite identify_system_timezone() to give it better-than-chance odds
of correctly identifying the system's daylight-savings transition rules.
This still begs the question of how to look through the zic database to
find a matching zone definition, but at least now we'll have some chance
of recognizing the match when we find it.
Tom Lane [Sun, 23 May 2004 23:12:11 +0000 (23:12 +0000)]
Avoid calling select_default_timezone() when backing out an unwanted TZ
setting. This is a temporary kluge to keep Alvaro happy; eventually we
should fix the TZ library API to make the problem really go away.
Tom Lane [Sun, 23 May 2004 22:24:08 +0000 (22:24 +0000)]
Use case-insensitive comparison so that explicitly setting timezone=unknown
in postgresql.conf does the right thing. variable.c got this right, but
not pgtz.c ...
Tom Lane [Sun, 23 May 2004 21:24:02 +0000 (21:24 +0000)]
New two-stage sampling method for ANALYZE, as per discussions a few weeks
ago. This should give significantly better results when the density of
live tuples is not uniform throughout a table. Manfred Koizar, with
minor kibitzing from Tom Lane.
Tom Lane [Sun, 23 May 2004 17:10:54 +0000 (17:10 +0000)]
Still another place to make the world safe for zero-column tables:
remove the ancient (and always pretty dodgy) assumption in parse_clause.c
that a query can't have an empty targetlist.
Tom Lane [Sun, 23 May 2004 03:50:45 +0000 (03:50 +0000)]
Handle impending sinval queue overflow by means of a separate signal
(SIGUSR1, which we have not been using recently) instead of piggybacking
on SIGUSR2-driven NOTIFY processing. This has several good results:
the processing needed to drain the sinval queue is a lot less than the
processing needed to answer a NOTIFY; there's less contention since we
don't have a bunch of backends all trying to acquire exclusive lock on
pg_listener; backends that are sitting inside a transaction block can
still drain the queue, whereas NOTIFY processing can't run if there's
an open transaction block. (This last is a fairly serious issue that
I don't think we ever recognized before --- with clients like JDBC that
tend to sit with open transaction blocks, the sinval queue draining
mechanism never really worked as intended, probably resulting in a lot
of useless cache-reset overhead.) This is the last of several proposed
changes in response to Philip Warner's recent report of sinval-induced
performance problems.
Tom Lane [Sat, 22 May 2004 23:14:38 +0000 (23:14 +0000)]
For multi-table ANALYZE, use per-table transactions when possible
(ie, when not inside a transaction block), so that we can avoid holding
locks longer than necessary. Per trouble report from Philip Warner.
Tom Lane [Sat, 22 May 2004 21:58:24 +0000 (21:58 +0000)]
Reduce pg_listener lock taken by NOTIFY et al from AccessExclusiveLock
to ExclusiveLock. This still serializes the operations of this module,
but doesn't conflict with concurrent ANALYZE operations. Per trouble
report from Philip Warner a few weeks ago.