From: Tom Lane
Date: Sat, 29 May 2010 21:08:04 +0000 (+0000)
Subject: Add text to "Populating a Database" pointing out that bulk data load into a
X-Git-Tag: REL9_0_BETA2~36
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=63f591e9695160bce80c77714970213cf8ca3318;p=postgresql

Add text to "Populating a Database" pointing out that bulk data load into a
table with foreign key constraints eats memory. Per off-line discussion of
bug #5480 with its reporter. Also do some minor wordsmithing elsewhere in the
same section.
---

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 9400ebcc15..4b6768bb69 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1,4 +1,4 @@
-
+
  Performance Tips
@@ -870,11 +870,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  If you are adding large amounts of data to an existing table,
- it might be a win to drop the index,
- load the table, and then recreate the index. Of course, the
+ it might be a win to drop the indexes,
+ load the table, and then recreate the indexes. Of course, the
  database performance for other users might suffer
- during the time the index is missing. One should also think
- twice before dropping unique indexes, since the error checking
+ during the time the indexes are missing. One should also think
+ twice before dropping a unique index, since the error checking
  afforded by the unique constraint will be lost while the index
  is missing.
@@ -890,6 +890,19 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  the constraints. Again, there is a trade-off between data load
  speed and loss of error checking while the constraint is missing.
+
+
+ What's more, when you load data into a table with existing foreign key
+ constraints, each new row requires an entry in the server's list of
+ pending trigger events (since it is the firing of a trigger that checks
+ the row's foreign key constraint). Loading many millions of rows can
+ cause the trigger event queue to overflow available memory, leading to
+ intolerable swapping or even outright failure of the command. Therefore
+ it may be necessary, not just desirable, to drop and re-apply
+ foreign keys when loading large amounts of data. If temporarily removing
+ the constraint isn't acceptable, the only other recourse may be to split
+ up the load operation into smaller transactions.
+
@@ -930,11 +943,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  When loading large amounts of data into an installation that uses
  WAL archiving or streaming replication, it might be faster to take a
  new base backup after the load has completed than to process a large
- amount of incremental WAL data. You might want to disable archiving
- and streaming replication while loading, by setting
+ amount of incremental WAL data. To prevent incremental WAL logging
+ while loading, disable archiving and streaming replication, by setting
  to minimal,
- off, and
- to zero).
+ to off, and
+ to zero.
  But note that changing these settings requires a server restart.
@@ -1006,7 +1019,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  pg_dump dump as quickly as possible, you need to
  do a few extra things manually. (Note that these points apply while
  restoring a dump, not while creating it.
- The same points apply when using pg_restore to load
+ The same points apply whether loading a text dump with
+ psql or using pg_restore to load
  from a pg_dump archive file.)
@@ -1027,10 +1041,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  If using WAL archiving or streaming replication, consider disabling
- them during the restore. To do that, set archive_mode off,
+ them during the restore. To do that, set archive_mode
+ to off,
  wal_level to minimal, and
- max_wal_senders zero before loading the dump script,
- and afterwards set them back to the right values and take a fresh
+ max_wal_senders to zero before loading the dump.
+ Afterwards, set them back to the right values and take a fresh
  base backup.
@@ -1044,10 +1059,14 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
  possibly discarding many hours of processing. Depending on how
  interrelated the data is, that might seem preferable to manual cleanup,
  or not. COPY commands will run fastest if you use a single
  transaction and have WAL archiving turned off.
- pg_restore also has a