From 2ff4e440432b007cf8132d04f17bc3631e92822e Mon Sep 17 00:00:00 2001
From: Neil Conway
Date: Thu, 22 Apr 2004 07:02:36 +0000
Subject: [PATCH] Improvements to the backup & restore documentation.

---
 doc/src/sgml/backup.sgml  |  49 +++++++++--------
 doc/src/sgml/perform.sgml | 109 +++++++++++++++++++++++++++-----------
 2 files changed, 101 insertions(+), 57 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index cff69b96a7..b8b958296c 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1,5 +1,5 @@
 Backup and Restore
@@ -30,7 +30,7 @@ $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.38 2004/03/09 16:57:46 neilc Exp
 commands that, when fed back to the server, will recreate the
 database in the same state as it was at the time of the dump.
 PostgreSQL provides the utility program
- pg_dump for this purpose. The basic usage of this
+  for this purpose. The basic usage of this
 command is:
 pg_dump dbname > outfile
@@ -126,10 +126,11 @@ psql dbname <
- Once restored, it is wise to run ANALYZE on each
- database so the optimizer has useful statistics. You
- can also run vacuumdb -a -z to ANALYZE all
- databases.
+ Once restored, it is wise to run  on each database so the optimizer has
+ useful statistics. You can also run vacuumdb -a -z to
+ VACUUM ANALYZE all databases; this is equivalent to
+ running VACUUM ANALYZE manually.
@@ -153,13 +154,11 @@ pg_dump -h host1 dbname | psql -h h
-
- Restore performance can be improved by increasing the
- configuration parameter .
-
+
+ For advice on how to load large amounts of data into
+ PostgreSQL efficiently, refer to .
+
@@ -167,12 +166,11 @@
 The above mechanism is cumbersome and inappropriate when backing
- up an entire database cluster. For this reason the
- pg_dumpall program is provided.
+ up an entire database cluster. For this reason the  program is provided.
 pg_dumpall backs up each database in a given
- cluster, and also preserves cluster-wide data such as
- users and groups. The call sequence for
- pg_dumpall is simply
+ cluster, and also preserves cluster-wide data such as users and
+ groups. The basic usage of this command is:
 pg_dumpall > outfile
@@ -195,7 +193,7 @@ psql template1 < infile
 Since PostgreSQL allows tables larger than the
 maximum file size on your system, it can be problematic to dump
 such a table to a file, since the resulting file will likely
- be larger than the maximum size allowed by your system. As
+ be larger than the maximum size allowed by your system. Since
 pg_dump can write to the standard output, you can
 just use standard Unix tools to work around this possible problem.
@@ -274,7 +272,7 @@ pg_dump -Fc dbname >
 pg_dump does not dump large objects by default. To dump
- large objects you must use either the custom or the TAR output
+ large objects you must use either the custom or the tar output
 format, and use the

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
- Use <command>COPY FROM</command>
+ Use <command>COPY</command>
+
+ Use  to load
+ all the rows in one command, instead of using a series of
+ INSERT commands. The COPY
+ command is optimized for loading large numbers of rows; it is less
+ flexible than INSERT, but incurs significantly
+ less overhead for large data loads. Since COPY
+ is a single command, there is no need to disable autocommit if you
+ use this method to populate a table.
+
-
- Use COPY FROM STDIN to load all the rows in one
- command, instead of using a series of INSERT
- commands. This reduces parsing, planning, etc. overhead a great
- deal. If you do this then it is not necessary to turn off
- autocommit, since it is only one command anyway.
+ Note that loading a large number of rows using
+ COPY is almost always faster than using
+ INSERT, even if multiple
+ INSERT commands are batched into a single
+ transaction.
@@ -678,11 +699,12 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 If you are augmenting an existing table, you can drop the index,
- load the table, then recreate the index. Of
- course, the database performance for other users may be adversely
- affected during the time that the index is missing. One should also
- think twice before dropping unique indexes, since the error checking
- afforded by the unique constraint will be lost while the index is missing.
+ load the table, and then recreate the index. Of course, the
+ database performance for other users may be adversely affected
+ during the time that the index is missing. One should also think
+ twice before dropping unique indexes, since the error checking
+ afforded by the unique constraint will be lost while the index is
+ missing.
@@ -701,16 +723,39 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
+
+ Increase <varname>checkpoint_segments</varname>
+
+
+ Temporarily increasing the  configuration variable can also
+ make large data loads faster. This is because loading a large
+ amount of data into PostgreSQL can
+ cause checkpoints to occur more often than the normal checkpoint
+ frequency (specified by the checkpoint_timeout
+ configuration variable). Whenever a checkpoint occurs, all dirty
+ pages must be flushed to disk. By increasing
+ checkpoint_segments temporarily during bulk
+ data loads, the number of checkpoints that are required can be
+ reduced.
+
+
 Run <command>ANALYZE</command> Afterwards
- It's a good idea to run ANALYZE or VACUUM
- ANALYZE anytime you've added or updated a lot of data,
- including just after initially populating a table. This ensures that
- the planner has up-to-date statistics about the table. With no statistics
- or obsolete statistics, the planner may make poor choices of query plans,
- leading to bad performance on queries that use your table.
+ Whenever you have significantly altered the distribution of data
+ within a table, running  is strongly recommended. This
+ includes when bulk loading large amounts of data into
+ PostgreSQL. Running
+ ANALYZE (or VACUUM ANALYZE)
+ ensures that the planner has up-to-date statistics about the
+ table. With no statistics or obsolete statistics, the planner may
+ make poor decisions during query planning, leading to poor
+ performance on any tables with inaccurate or nonexistent
+ statistics.
--
2.40.0
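
For reference, the dump-and-restore workflow that the backup.sgml hunks describe looks roughly like this. The database name mydb and the file names are placeholders, not part of the patch:

    pg_dump mydb > mydb.sql            # dump one database to a script of SQL commands
    psql mydb < mydb.sql               # feed the script back to the server to recreate the database
    vacuumdb -a -z                     # VACUUM ANALYZE every database after the restore

    pg_dumpall > cluster.sql           # dump the whole cluster, including users and groups
    psql template1 < cluster.sql       # restore the cluster-wide dump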
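
A minimal sketch of the COPY advice in the perform.sgml hunks; the table definition and the server-side data file path are hypothetical:

    CREATE TABLE mytable (id integer, val text);

    -- Row-at-a-time INSERTs carry more per-command overhead, even when
    -- batched into a single transaction.
    BEGIN;
    INSERT INTO mytable VALUES (1, 'one');
    INSERT INTO mytable VALUES (2, 'two');
    COMMIT;

    -- A single COPY loads all the rows in one command, so there is no
    -- need to disable autocommit for the load.
    COPY mytable FROM '/tmp/mytable.dat';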
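
And a sketch of the remaining perform.sgml advice (drop and recreate indexes, raise checkpoint_segments during the load, run ANALYZE afterwards), again with hypothetical object names and values; checkpoint_segments is a postgresql.conf setting in the PostgreSQL versions this patch targets:

    # postgresql.conf, raised only for the duration of the bulk load (value is illustrative):
    checkpoint_segments = 30

    DROP INDEX mytable_val_idx;                      -- assuming such an index exists; drop it before the load
    COPY mytable FROM '/tmp/mytable.dat';            -- server-side data file, path is a placeholder
    CREATE INDEX mytable_val_idx ON mytable (val);   -- recreate the index once the data is in place
    ANALYZE mytable;                                 -- refresh planner statistics after the load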