doc: Add best practises section to partitioning docs

author David Rowley <drowley@postgresql.org>

Tue, 11 Jun 2019 20:08:57 +0000 (08:08 +1200)

committer David Rowley <drowley@postgresql.org>

Tue, 11 Jun 2019 20:08:57 +0000 (08:08 +1200)
author David Rowley <drowley@postgresql.org>
Tue, 11 Jun 2019 20:08:57 +0000 (08:08 +1200)
committer David Rowley <drowley@postgresql.org>
Tue, 11 Jun 2019 20:08:57 +0000 (08:08 +1200)
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml

index cce1618fc135507cd5278cfe68ec5e00d5e325aa..15505f337c16f1703e42dd2dc5afed3ef18dba42 100644 (file)
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3450,8 +3450,9 @@ VALUES ('Albany', NULL, NULL, 'NY');
      </listitem>
     </itemizedlist>
  
-   These deficiencies will probably be fixed in some future release,
-   but in the meantime considerable care is needed in deciding whether
+   Some functionality not implemented for inheritance hierarchies is
+   implemented for declarative partitioning.
+   Considerable care is needed in deciding whether partitioning with legacy
     inheritance is useful for your application.
    </para>
  
@@ -4674,6 +4675,87 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
     </itemizedlist>
     </para>
    </sect2>
+  
+  <sect2 id="ddl-partitioning-declarative-best-practices">
+   <title>Declarative Partitioning Best Practices</title>
+
+   <para>
+    The choice of how to partition a table should be made carefully as the
+    performance of query planning and execution can be negatively affected by
+    poor design.
+   </para>
+
+   <para>
+    One of the most critical design decisions will be the column or columns
+    by which you partition your data.  Often the best choice will be to
+    partition by the column or set of columns which most commonly appear in
+    <literal>WHERE</literal> clauses of queries being executed on the
+    partitioned table.  <literal>WHERE</literal> clause items that match and
+    are compatible with the partition key can be used to prune unneeded
+    partitions.  However, you may be forced into making other decisions by
+    requirements for the <literal>PRIMARY KEY</literal> or a
+    <literal>UNIQUE</literal> constraint.  Removal of unwanted data is also a
+    factor to consider when planning your partitioning strategy.  An entire
+    partition can be detached fairly quickly, so it may be beneficial to
+    design the partition strategy in such a way that all data to be removed
+    at once is located in a single partition.
+   </para>
+
+   <para>
+    Choosing the target number of partitions that the table should be divided
+    into is also a critical decision to make.  Not having enough partitions
+    may mean that indexes remain too large and that data locality remains poor
+    which could result in low cache hit ratios.  However, dividing the table
+    into too many partitions can also cause issues.  Too many partitions can
+    mean longer query planning times and higher memory consumption during both
+    query planning and execution.  When choosing how to partition your table,
+    it's also important to consider what changes may occur in the future.  For
+    example, if you choose to have one partition per customer and you
+    currently have a small number of large customers, consider the
+    implications if in several years you instead find yourself with a large
+    number of small customers.  In this case, it may be better to choose to
+    partition by <literal>HASH</literal> and choose a reasonable number of
+    partitions rather than trying to partition by <literal>LIST</literal> and
+    hoping that the number of customers does not increase beyond what it is
+    practical to partition the data by.
+   </para>
+
+   <para>
+    Sub-partitioning can be useful to further divide partitions that are
+    expected to become larger than other partitions, although excessive
+    sub-partitioning can easily lead to large numbers of partitions and can
+    cause the same problems mentioned in the preceding paragraph.
+   </para>
+
+   <para>
+    It is also important to consider the overhead of partitioning during
+    query planning and execution.  The query planner is generally able to
+    handle partition hierarchies up a few thousand partitions fairly well,
+    provided that typical queries allow the query planner to prune all but a
+    small number of partitions.  Planning times become longer and memory
+    consumption becomes higher when more partitions remain after the planner
+    performs partition pruning.  This is particularly true for the
+    <command>UPDATE</command> and <command>DELETE</command> commands. Another
+    reason to be concerned about having a large number of partitions is that
+    the server's memory consumption may grow significantly over a period of
+    time, especially if many sessions touch large numbers of partitions.
+    That's because each partition requires its metadata to be loaded into the
+    local memory of each session that touches it.
+   </para>
+
+   <para>
+    With data warehouse type workloads, it can make sense to use a larger
+    number of partitions than with an <acronym>OLTP</acronym> type workload.
+    Generally, in data warehouses, query planning time is less of a concern as
+    the majority of processing time is spent during query execution.  With
+    either of these two types of workload, it is important to make the right
+    decisions early, as re-partitioning large quantities of data can be
+    painfully slow.  Simulations of the intended workload are often beneficial
+    for optimizing the partitioning strategy.  Never assume that more
+    partitions are better than fewer partitions and vice-versa.
+   </para>
+  </sect2>
+
   </sect1>
  
   <sect1 id="ddl-foreign-data">
author	David Rowley <drowley@postgresql.org>
	Tue, 11 Jun 2019 20:08:57 +0000 (08:08 +1200)
committer	David Rowley <drowley@postgresql.org>
	Tue, 11 Jun 2019 20:08:57 +0000 (08:08 +1200)