From 170fe97da45d29f87e183982d3dd938ff2d86744 Mon Sep 17 00:00:00 2001 From: Sandro Santilli Date: Fri, 4 Feb 2005 10:20:57 +0000 Subject: [PATCH] Added Performance Tips chapter git-svn-id: http://svn.osgeo.org/postgis/trunk@1364 b70326c6-7e19-0410-871a-916f4a2858ee --- doc/postgis.xml | 98 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) diff --git a/doc/postgis.xml b/doc/postgis.xml index fdbedcd18..ee2466807 100644 --- a/doc/postgis.xml +++ b/doc/postgis.xml @@ -2402,6 +2402,104 @@ if( geom.getType() = Geometry.POLYGON ) + Performance tips + + Small tables of large geometries + +Problem description + + +Current PostgreSQL versions (including 8.0) suffer from a query +optimizer weakness regarding TOAST tables. TOAST tables are a kind of +"extension room" used to store large (in the sense of data size) values +that do not fit into normal data pages (like long texts, images or +complex geometries with lots of vertices), see +http://www.postgresql.org/docs/8.0/static/storage-toast.html for more +information). + + + +The problem appears if you happen to have a table with rather large +geometries, but not too much rows of them (like a table containing the +boundaries of all european countries in high resolution). Then the table +itsself is small, but it uses lots of TOAST space. In our example case, +the table itsself had about 80 rows and used only 3 data pages, but the +TOAST table used 8225 pages. + + + +Now issue a query where you use the geometry operator && to search for a +bounding box that matches only very few of those rows. Now the query +optimizer sees that the table has only 3 pages and 80 rows. He estimates +that a sequential scan on such a small table is much faster than using +an index. And so he decides to ignore the GIST index. Usually, this +estimation is correct. But in our case, the && operator has to fetch +every geometry from disk to compare the bounding boxes, thus reading all +TOAST pages, too. + + + +To see whether your suffer from this bug, use the "EXPLAIN ANALYZE" +postgresql command. For more information and the technical details, you +can read the thread on the postgres performance mailing list: +http://archives.postgresql.org/pgsql-performance/2005-02/msg00030.php + + + + +Workarounds + + +The PostgreSQL people are trying to solve this issue by making the query +estimation TOAST-aware. For now, here are two workarounds: + + + +The first workaround is to force the query planner to use the index. +Send "SET enable_seqscan TO off;" to the server before issuing the +query. This basically forces the query planner to avoid sequential scans +whenever possible. So it uses the GIST index as usual. But this flag has +to be set on every connection, and it causes the query planner to make +misestimations in other cases, so you should "SET enable_seqscan TO on;" +after the query. + + + +The second workaround is to make the sequential scan as fast as the +query planner thinks. This can be achieved by creating an additional +column that "caches" the bbox, and matching against this. In our +example, the commands are like: + + + +SELECT addGeometryColumn('myschema','mytable','bbox','4326','GEOMETRY','2'); + +UPDATE mytable set bbox = Envelope(Force_2d(the_geom)); + + + +Now change your query to use the && operator against bbox instead of +geom_column, like: + + + +SELECT geom_column FROM mytable WHERE bbox && SetSrid('BOX3D(0 0,1 1)'::box3d,4326); + + + +Of yourse, if you change or add rows to mytable, you have to keep the +bbox "in sync". The most transparent way to do this would be triggers, +but you also can modify your application to keep the bbox column current +or run the UPDATE query above after every modification. + + + + + + + + + PostGIS Reference -- 2.49.0