doc: Add best practises section to partitioning docs

author David Rowley <drowley@postgresql.org>

Tue, 11 Jun 2019 20:09:28 +0000 (08:09 +1200)

committer David Rowley <drowley@postgresql.org>

Tue, 11 Jun 2019 20:09:28 +0000 (08:09 +1200)
author David Rowley <drowley@postgresql.org>
Tue, 11 Jun 2019 20:09:28 +0000 (08:09 +1200)
committer David Rowley <drowley@postgresql.org>
Tue, 11 Jun 2019 20:09:28 +0000 (08:09 +1200)
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml

index 13a10017614ccf3ea0182551a5894bbde7623357..dfcf3c4ab0af4ef6eacca6a7bea27df3b1a18909 100644 (file)
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2833,8 +2833,9 @@ VALUES ('Albany', NULL, NULL, 'NY');
      </listitem>
     </itemizedlist>
  
-   These deficiencies will probably be fixed in some future release,
-   but in the meantime considerable care is needed in deciding whether
+   Some functionality not implemented for inheritance hierarchies is
+   implemented for declarative partitioning.
+   Considerable care is needed in deciding whether partitioning with legacy
     inheritance is useful for your application.
    </para>
  
@@ -3927,6 +3928,84 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
     </itemizedlist>
     </para>
    </sect2>
+  
+  <sect2 id="ddl-partitioning-declarative-best-practices">
+   <title>Declarative Partitioning Best Practices</title>
+
+   <para>
+    The choice of how to partition a table should be made carefully as the
+    performance of query planning and execution can be negatively affected by
+    poor design.
+   </para>
+
+   <para>
+    One of the most critical design decisions will be the column or columns
+    by which you partition your data.  Often the best choice will be to
+    partition by the column or set of columns which most commonly appear in
+    <literal>WHERE</literal> clauses of queries being executed on the
+    partitioned table.  <literal>WHERE</literal> clause items that match and
+    are compatible with the partition key can be used to prune unneeded
+    partitions.  Removal of unwanted data is also a factor to consider when
+    planning your partitioning strategy.  An entire partition can be detached
+    fairly quickly, so it may be beneficial to design the partition strategy
+    in such a way that all data to be removed at once is located in a single
+    partition.
+   </para>
+
+   <para>
+    Choosing the target number of partitions that the table should be divided
+    into is also a critical decision to make.  Not having enough partitions
+    may mean that indexes remain too large and that data locality remains poor
+    which could result in low cache hit ratios.  However, dividing the table
+    into too many partitions can also cause issues.  Too many partitions can
+    mean longer query planning times and higher memory consumption during both
+    query planning and execution.  When choosing how to partition your table,
+    it's also important to consider what changes may occur in the future.  For
+    example, if you choose to have one partition per customer and you
+    currently have a small number of large customers, consider the
+    implications if in several years you instead find yourself with a large
+    number of small customers.  In this case, it may be better to choose to
+    partition by <literal>RANGE</literal> and choose a reasonable number of
+    partitions, each containing a fixed number of customers, rather than
+    trying to partition by <literal>LIST</literal> and hoping that the number
+    of customers does not increase beyond what it is practical to partition
+    the data by.
+   </para>
+
+   <para>
+    Sub-partitioning can be useful to further divide partitions that are
+    expected to become larger than other partitions, although excessive
+    sub-partitioning can easily lead to large numbers of partitions and can
+    cause the same problems mentioned in the preceding paragraph.
+   </para>
+
+   <para>
+    It is also important to consider the overhead of partitioning during
+    query planning and execution.  The query planner is generally able to
+    handle partition hierarchies up a few hundred partitions.  Planning times
+    become longer and memory consumption becomes higher as more partitions are
+    added.  This is particularly true for the <command>UPDATE</command> and
+    <command>DELETE</command> commands. Another reason to be concerned about
+    having a large number of partitions is that the server's memory
+    consumption may grow significantly over a period of time, especially if
+    many sessions touch large numbers of partitions.  That's because each
+    partition requires its metadata to be loaded into the local memory of
+    each session that touches it.
+   </para>
+
+   <para>
+    With data warehouse type workloads, it can make sense to use a larger
+    number of partitions than with an <acronym>OLTP</acronym> type workload.
+    Generally, in data warehouses, query planning time is less of a concern as
+    the majority of processing time is spent during query execution.  With
+    either of these two types of workload, it is important to make the right
+    decisions early, as re-partitioning large quantities of data can be
+    painfully slow.  Simulations of the intended workload are often beneficial
+    for optimizing the partitioning strategy.  Never assume that more
+    partitions are better than fewer partitions and vice-versa.
+   </para>
+  </sect2>
+
   </sect1>
  
   <sect1 id="ddl-foreign-data">
author	David Rowley <drowley@postgresql.org>
	Tue, 11 Jun 2019 20:09:28 +0000 (08:09 +1200)
committer	David Rowley <drowley@postgresql.org>
	Tue, 11 Jun 2019 20:09:28 +0000 (08:09 +1200)