From 7d6e28149ea699b28773eefa48acc449b67d6ff4 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Fri, 22 Jun 2001 18:53:36 +0000
Subject: [PATCH] Mention that EXPLAIN results on small tables shouldn't be
 extrapolated to large tables.  Recommend ANALYZE or VACUUM ANALYZE after
 populating a table.

---
 doc/src/sgml/perform.sgml | 57 ++++++++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index bb0962d79d..e9328cb745 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1,5 +1,5 @@
@@ -110,7 +110,7 @@ select * from pg_class where relname = 'tenk1';
     you'll find out that tenk1 has 233 disk
-    pages and 10000 tuples.  So the cost is estimated at 233 block
+    pages and 10000 tuples.  So the cost is estimated at 233 page
     reads, defined as 1.0 apiece, plus 10000 * cpu_tuple_cost which is
     currently 0.01 (try show cpu_tuple_cost).
@@ -248,6 +248,19 @@ Hash Join (cost=173.44..557.03 rows=47 width=296)
     10000 times.  Note, however, that we are NOT charging 10000 times 173.32;
     the hash table setup is only done once in this plan type.
+
+    It is worth noting that EXPLAIN results should not be extrapolated
+    to situations other than the one you are actually testing; for example,
+    results on a toy-sized table can't be assumed to apply to large tables.
+    The planner's cost estimates are not linear and so it may well choose
+    a different plan for a larger or smaller table.  An extreme example
+    is that on a table that only occupies one disk page, you'll nearly
+    always get a sequential scan plan whether indexes are available or not.
+    The planner realizes that it's going to take one disk page read to
+    process the table in any case, so there's no value in expending additional
+    page reads to look at an index.
+
@@ -375,10 +388,13 @@ SELECT * FROM d LEFT JOIN
     Turn off auto-commit and just do one commit at
-    the end.  Otherwise Postgres is doing a
-    lot of work for each record
-    added. In general when you are doing bulk inserts, you want
-    to turn off some of the database features to gain speed.
+    the end. (In plain SQL, this means issuing BEGIN
+    at the start and COMMIT at the end.  Some client
+    libraries may do this behind your back, in which case you need to
+    make sure the library does it when you want it done.)
+    If you allow each insertion to be committed separately,
+    Postgres is doing a lot of work for each
+    record added.
@@ -387,10 +403,11 @@ SELECT * FROM d LEFT JOIN
     Use COPY FROM STDIN to load all the records in one
-    command, instead
-    of a series of INSERT commands. This reduces parsing, planning, etc
+    command, instead of using
+    a series of INSERT commands.  This reduces parsing,
+    planning, etc
     overhead a great deal. If you do this then it's not necessary to fool
-    around with autocommit, since it's only one command anyway.
+    around with auto-commit, since it's only one command anyway.
@@ -399,16 +416,32 @@ SELECT * FROM d LEFT JOIN
     If you are loading a freshly created table, the fastest way is to
-    create the table, bulk-load with COPY, then create any indexes needed
+    create the table, bulk-load with COPY, then create any
+    indexes needed
     for the table.  Creating an index on pre-existing data is quicker than
     updating it incrementally as each record is loaded.
     If you are augmenting an existing table, you can DROP
-    INDEX, load the table, then recreate the index.  Of
+    INDEX, load the table, then recreate the index.  Of
     course, the database performance for other users may be adversely
-    affected during the time that the index is missing.
+    affected during the time that the index is missing.  One should also
+    think twice before dropping UNIQUE indexes, since the error checking
+    afforded by the UNIQUE constraint will be lost while the index is missing.
+
+
+
+
+   ANALYZE Afterwards
+
+
+    It's a good idea to run ANALYZE or VACUUM
+    ANALYZE anytime you've added or updated a lot of data,
+    including just after initially populating a table.  This ensures that
+    the planner has up-to-date statistics about the table.  With no statistics
+    or obsolete statistics, the planner may make poor choices of query plans,
+    leading to bad performance on queries that use your table.
-- 
2.40.0
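
The bulk-loading advice this patch adds to the docs (one transaction around the whole load, create indexes after loading, then refresh planner statistics with ANALYZE) can be sketched as a small script. This is a minimal illustration only, not part of the patch: it uses SQLite via Python's standard sqlite3 module rather than Postgres so it runs standalone, and the table and index names are invented for the example.

```python
import sqlite3

# In-memory database stands in for a freshly created table in a real server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tenk (id INTEGER, payload TEXT)")

# One BEGIN/COMMIT around the whole load, instead of a commit per row.
# The connection used as a context manager issues a single COMMIT at the end.
with conn:
    cur.executemany(
        "INSERT INTO tenk VALUES (?, ?)",
        ((i, f"row-{i}") for i in range(10000)),
    )

# Create the index only after bulk loading, then gather statistics so the
# planner has up-to-date information about the freshly populated table.
cur.execute("CREATE INDEX tenk_id_idx ON tenk (id)")
cur.execute("ANALYZE")

count = cur.execute("SELECT count(*) FROM tenk").fetchone()[0]
print(count)  # 10000
```

With a real Postgres server the same shape applies, except the load step would use COPY FROM STDIN (or batched INSERTs inside one transaction) as the patch text recommends.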