<!--
-$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.69 2005/06/25 22:47:28 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.70 2005/10/13 17:32:42 momjian Exp $
-->
<chapter id="backup">
<title>Backup and Restore</title>
</para>
<para>
- It should also be noted that the present <acronym>WAL</acronym>
- format is extremely bulky since it includes many disk page
- snapshots. This is appropriate for crash recovery purposes,
+ It should also be noted that the default <acronym>WAL</acronym>
+ format is fairly bulky, since it includes many disk page snapshots.
+ The snapshots are partially compressed by removing any empty
+ space within each block. You can significantly reduce
+ the total volume of archived logs by turning off page snapshots
+ using the <xref linkend="guc-full-page-writes"> parameter,
+ though you should read the notes and warnings in
+ <xref linkend="reliability"> before you do so.
+ These page snapshots are designed to allow crash recovery,
since we may need to fix partially-written disk pages. It is not
- necessary to store so many page copies for PITR operations, however.
+ necessary to store these page copies for PITR operations, however.
+ If you turn off <xref linkend="guc-full-page-writes">, your PITR
+ backup and recovery operations will continue to work successfully.
An area for future development is to compress archived WAL data by
- removing unnecessary page copies. In the meantime, administrators
+ removing unnecessary page copies when <xref linkend="guc-full-page-writes">
+ is turned on. In the meantime, administrators
may wish to reduce the number of page snapshots included in WAL by
increasing the checkpoint interval parameters as much as feasible.
</para>
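+  <para>
+   As an illustrative sketch only (the parameter names are real, but the
+   values are arbitrary examples rather than recommendations), the
+   settings discussed above might look like this in
+   <filename>postgresql.conf</filename>:
+<programlisting>
+# Omit page snapshots to reduce archived WAL volume; read the
+# Reliability chapter before turning this off.
+full_page_writes = off
+
+# Alternatively, space checkpoints further apart so that fewer
+# page snapshots are taken while full_page_writes remains on.
+checkpoint_segments = 30        # default is 3
+checkpoint_timeout = 900        # in seconds; default is 300
+</programlisting>
+  </para>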
-<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.35 2005/10/01 01:42:43 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.36 2005/10/13 17:32:42 momjian Exp $ -->
<chapter id="reliability">
<title>Reliability</title>
failure (unrelated to the non-volatile area itself). To accomplish
this, <productname>PostgreSQL</> uses the magnetic platters of modern
disk drives for permanent storage that is immune to the failures
- listed above. In fact, a computer can be completely destroyed, but if
+ listed above. In fact, even if a computer is fatally damaged, if
the disk drives survive they can be moved to another computer with
similar hardware and all committed transactions will remain intact.
</para>
these partially written cases. To guard against that,
<productname>PostgreSQL</> periodically writes full page images to
permanent storage <emphasis>before</> modifying the actual page on
- disk. By doing this, during recovery <productname>PostgreSQL</> can
+ disk. By doing this, during crash recovery <productname>PostgreSQL</> can
restore partially-written pages. If you have a battery-backed disk
- controller that prevents partial page writes, you can turn off this
- page imaging by using the <xref linkend="guc-full-page-writes">
- parameter.
+ controller or file system (e.g., Reiser4) that prevents partial page
+ writes, you can turn off this page imaging by using the
+ <xref linkend="guc-full-page-writes"> parameter. Turning off page
+ imaging does not affect the use of Point-in-Time Recovery (PITR),
+ described in <xref linkend="backup-online">.
</para>
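+  <para>
+   As a minimal sketch (assuming a battery-backed controller and that
+   your server re-reads its configuration on reload), you can check the
+   current setting from any session:
+<programlisting>
+SHOW full_page_writes;
+</programlisting>
+   and, after setting <literal>full_page_writes = off</literal> in
+   <filename>postgresql.conf</filename>, signal the server to reload
+   its configuration files.
+  </para>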
<para>
the data pages can be redone from the log records. (This is
roll-forward recovery, also known as REDO.)
</para>
- </sect1>
-
- <sect1 id="wal-benefits">
- <title>Benefits of Write-Ahead Logging</title>
- <indexterm zone="wal-benefits">
- <primary>fsync</primary>
- </indexterm>
+ <para>
+ WAL brings three major benefits:
+ </para>
<para>
The first major benefit of using <acronym>WAL</acronym> is a
</para>
<para>
- The next benefit is consistency of the data pages. The truth is
- that, before <acronym>WAL</acronym>,
+ The next benefit is crash recovery protection. Before
+ <acronym>WAL</acronym> was introduced in release 7.1,
<productname>PostgreSQL</productname> was never able to guarantee
- consistency in the case of a crash. Before
- <acronym>WAL</acronym>, any crash during writing could result in:
+ consistency in the case of a crash. Now,
+ <acronym>WAL</acronym> protects fully against the following problems:
<orderedlist>
<listitem>
of partially written data pages</simpara>
</listitem>
</orderedlist>
-
- Problems with indexes (problems 1 and 2) could possibly have been
- fixed by additional <function>fsync</function> calls, but it is
- not obvious how to handle the last case without
- <acronym>WAL</acronym>. <acronym>WAL</acronym> saves the entire data
- page content in the log if that is required to ensure page
- consistency for after-crash recovery.
</para>
<para>
<varname>checkpoint_timeout</varname> causes checkpoints to be done
more often. This allows faster after-crash recovery (since less work
will need to be redone). However, one must balance this against the
- increased cost of flushing dirty data pages more often. In addition,
- to ensure data page consistency, the first modification of a data
- page after each checkpoint results in logging the entire page
- content. Thus a smaller checkpoint interval increases the volume of
- output to the WAL log, partially negating the goal of using a smaller
- interval, and in any case causing more disk I/O.
+ increased cost of flushing dirty data pages more often. If
+ <xref linkend="guc-full-page-writes"> is set (the default), there is
+ another factor to consider. To ensure data page consistency,
+ the first modification of a data page after each checkpoint results in
+ logging the entire page content. In that case,
+ a smaller checkpoint interval increases the volume of output to the WAL log,
+ partially negating the goal of using a smaller interval,
+ and in any case causing more disk I/O.
</para>
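+  <para>
+   As a rough worked example (assuming the default 16MB WAL segment
+   size), <varname>checkpoint_segments</varname> = 3 forces a checkpoint
+   after about 3 x 16MB = 48MB of WAL has been written, whereas a
+   setting of 30 spaces checkpoints up to about 480MB apart, so each
+   data page is snapshotted much less often on a busy system.
+  </para>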
<para>
a message will be output to the server log recommending increasing
<varname>checkpoint_segments</varname>. Occasional appearance of such
a message is not cause for alarm, but if it appears often then the
- checkpoint control parameters should be increased.
+ checkpoint control parameters should be increased. Bulk operations such
+ as <command>COPY</command> or <command>INSERT ... SELECT</command> may
+ cause many such warnings if you do not set
+ <xref linkend="guc-checkpoint-segments"> high enough.
</para>
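+  <para>
+   For instance, during a large <command>COPY</command> you might see
+   server log output along these lines (the exact wording depends on
+   your server version; this is an illustration, not verbatim output):
+<programlisting>
+LOG:  checkpoints are occurring too frequently (16 seconds apart)
+HINT:  Consider increasing the configuration parameter "checkpoint_segments".
+</programlisting>
+   in which case raising <varname>checkpoint_segments</varname> before
+   the bulk load, and lowering it again afterward, may be worthwhile.
+  </para>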
<para>
</para>
<para>
- There are two commonly used <acronym>WAL</acronym> functions:
+ There are two commonly used internal <acronym>WAL</acronym> functions:
<function>LogInsert</function> and <function>LogFlush</function>.
<function>LogInsert</function> is used to place a new record into
the <acronym>WAL</acronym> buffers in shared memory. If there is no
modifying the configuration parameter <xref
linkend="guc-wal-buffers">. The default number of <acronym>WAL</acronym>
buffers is 8. Increasing this value will
- correspondingly increase shared memory usage. (It should be noted
- that there is presently little evidence to suggest that increasing
- <varname>wal_buffers</> beyond the default is worthwhile.)
+ correspondingly increase shared memory usage. When
+ <xref linkend="guc-full-page-writes"> is set and the system is very busy,
+ setting this value higher will help smooth response times during the
+ period immediately following each checkpoint. As a guide, a setting of 1024
+ would be considered high.
</para>
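+  <para>
+   A sketch only (the value is an arbitrary example): on a busy system
+   with <varname>full_page_writes</varname> enabled you might try
+<programlisting>
+wal_buffers = 64        # default is 8; each buffer is 8kB
+</programlisting>
+   and increase it further only if response times still dip just after
+   each checkpoint; per the guideline above, values near 1024 are
+   already at the high end.
+  </para>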
<para>
(provided that <productname>PostgreSQL</productname> has been
compiled with support for it) will result in each
<function>LogInsert</function> and <function>LogFlush</function>
- <acronym>WAL</acronym> call being logged to the server log. This
+ <acronym>WAL</acronym> call being logged to the server log. The output
+ is too verbose for use as a guide to performance tuning. This
option may be replaced by a more general mechanism in the future.
</para>
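+  <para>
+   One possible way to enable this (a sketch, assuming a source build;
+   the exact mechanism may vary by version) is to define the
+   <symbol>WAL_DEBUG</symbol> macro when configuring the build:
+<programlisting>
+./configure CFLAGS='-O2 -DWAL_DEBUG'
+</programlisting>
+   after which the logging can be switched on in a session with
+   <literal>SET wal_debug = on;</literal>.
+  </para>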
</sect1>