<!-- doc/src/sgml/replication-origins.sgml -->
<chapter id="replication-origins">
 <title>Replication Progress Tracking</title>
 <indexterm zone="replication-origins">
  <primary>Replication Progress Tracking</primary>
 </indexterm>
 <indexterm zone="replication-origins">
  <primary>Replication Origins</primary>
 </indexterm>

 <para>
  Replication origins are intended to make it easier to implement
  logical replication solutions on top
  of <xref linkend="logicaldecoding">. They provide a solution to two
  common problems:
  <itemizedlist>
   <listitem><para>How to safely keep track of replication progress</para></listitem>
   <listitem><para>How to change replication behavior based on the
   origin of a row; for example, to avoid loops in bi-directional
   replication setups</para></listitem>
  </itemizedlist>
 </para>

 <para>
  Replication origins consist of a name and an OID. The name, which
  is what should be used to refer to the origin across systems, is
  free-form text. It should be used in a way that makes conflicts
  between replication origins created by different replication
  solutions unlikely; for example, by prefixing the replication
  solution's name to it. The OID is used only to avoid having to store
  the long version in situations where space efficiency is important.
  It should never be shared between systems.
 </para>

 <para>
  Replication origins can be created using the function
  <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
  dropped using
  <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
  and seen in the
  <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
  system catalog.
 </para>
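 <para>
  The split between a shared name and a purely local OID can be illustrated
  with a toy, in-memory model of two nodes' catalogs. The class
  <literal>OriginRegistry</literal>, the helper <literal>origin_name()</literal>
  and the sequential id assignment are all hypothetical; they only mirror the
  roles of <function>pg_replication_origin_create()</function>,
  <function>pg_replication_origin_drop()</function> and the
  <structname>pg_replication_origin</structname> catalog, not their
  implementation.
 </para>

```python
# Toy model of one node's replication origin catalog (hypothetical, not
# PostgreSQL's implementation). Origin names are shared across systems;
# the numeric ids are assigned locally and must never be shared.
import itertools


class OriginRegistry:
    """Hypothetical stand-in for one node's pg_replication_origin catalog."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)  # arbitrary local id assignment
        self._by_name: dict[str, int] = {}

    def create(self, name: str) -> int:
        # Loosely analogous to pg_replication_origin_create(name).
        if name in self._by_name:
            raise ValueError(f"replication origin {name!r} already exists")
        origin_id = next(self._next_id)
        self._by_name[name] = origin_id
        return origin_id

    def drop(self, name: str) -> None:
        # Loosely analogous to pg_replication_origin_drop(name).
        del self._by_name[name]  # KeyError if the origin does not exist

    def oid(self, name: str) -> int:
        return self._by_name[name]

    def names(self) -> list[str]:
        return sorted(self._by_name)


def origin_name(solution: str, node: str) -> str:
    # Prefix the replication solution's name so that origins created by
    # different solutions are unlikely to collide.
    return f"{solution}_{node}"


if __name__ == "__main__":
    node_a, node_b = OriginRegistry(), OriginRegistry()
    name = origin_name("myrepl", "node_c")
    node_a.create("othertool_node_x")  # node A already has another origin
    print(node_a.create(name), node_b.create(name))  # prints "2 1"
```

 <para>
  Both nodes know the origin by the same name, but its OID differs between
  them, which is why only the name should be used across systems.
 </para>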

 <para>
  When replicating from one system to another (independent of whether
  those two might be in the same cluster, or even the same database), one
  nontrivial part of building a replication solution is to keep track of
  replay progress in a safe manner. When the applying process, or the whole
  cluster, dies, it needs to be possible to find out up to where data has
  successfully been replicated. Naive solutions to this, such as updating a
  row in a table for every replayed transaction, have problems like runtime
  overhead and database bloat.
 </para>
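 <para>
  The naive approach can be sketched with Python's <literal>sqlite3</literal>
  module standing in for the target database. The table and column names, the
  origin name and the use of plain integers for LSNs are assumptions of the
  sketch, and the "crash" is only a simulated exception. Keeping the progress
  update in the same local transaction as the replayed changes is what makes
  it safe; the cost is one extra row update per replayed transaction, and in
  PostgreSQL every such <command>UPDATE</command> leaves a dead row version
  behind until it is vacuumed.
 </para>

```python
# Sketch of the naive approach: the applier keeps the last replayed remote
# LSN in an ordinary table and updates it in the same local transaction as
# the replayed changes. sqlite3 stands in for the target database; table
# names, column names and integer LSNs are assumptions of this sketch.
import sqlite3

ORIGIN = "myrepl_node_a"  # hypothetical origin name


def open_target() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE replay_progress (origin TEXT PRIMARY KEY,
                                      remote_lsn INTEGER NOT NULL);
        """
    )
    conn.execute("INSERT INTO replay_progress VALUES (?, 0)", (ORIGIN,))
    conn.commit()
    return conn


def apply_remote_xact(conn, remote_lsn, rows, crash=False):
    """Replay one remote transaction and record its LSN atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
        if crash:
            raise RuntimeError("simulated crash before commit")
        # One extra row update per replayed transaction: the overhead
        # (and, in PostgreSQL, the bloat) of the naive approach.
        conn.execute(
            "UPDATE replay_progress SET remote_lsn = ? WHERE origin = ?",
            (remote_lsn, ORIGIN),
        )


def resume_point(conn) -> int:
    # After a restart, replay continues after this remote LSN.
    (lsn,) = conn.execute(
        "SELECT remote_lsn FROM replay_progress WHERE origin = ?", (ORIGIN,)
    ).fetchone()
    return lsn


if __name__ == "__main__":
    conn = open_target()
    apply_remote_xact(conn, 100, [(1, "a")])
    try:
        apply_remote_xact(conn, 200, [(2, "b")], crash=True)
    except RuntimeError:
        pass
    print(resume_point(conn))  # prints 100: the failed replay left no trace
```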

 <para>
  Using the replication origin infrastructure, a session can be
  marked as replaying from a remote node (using the
  <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link>
  function). Additionally, the <acronym>LSN</acronym> and commit
  timestamp of every source transaction can be configured on a
  per-transaction basis using
  <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>.
  If that's done, replication progress will persist in a crash-safe
  manner. Replay progress for all replication origins can be seen in the
  <link linkend="catalog-pg-replication-origin-status">
   <structname>pg_replication_origin_status</structname>
  </link> view. An individual origin's progress, for example when resuming
  replication, can be obtained using
  <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link>
  for any origin or
  <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link>
  for the origin configured in the current session.
 </para>
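 <para>
  The control flow described above can be sketched as a toy model under
  simplifying assumptions: LSNs are plain integers, everything lives in memory
  (so the sketch says nothing about durability), and the names
  <literal>ApplySession</literal>, <literal>xact_setup()</literal> and
  <literal>changes_to_replay()</literal> are hypothetical analogues of the SQL
  functions, not PostgreSQL's API. What it illustrates is that the recorded
  origin progress advances only when the local transaction commits, and that
  a restarted applier can skip everything up to that LSN.
 </para>

```python
# Toy model of session/transaction origin setup. Everything is in memory,
# LSNs are plain integers, and the names only echo the SQL functions; this
# is not PostgreSQL's API. It shows that the origin's progress advances
# exactly when the local transaction commits.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class OriginProgress:
    remote_lsn: int = 0
    remote_commit_ts: Optional[datetime] = None


@dataclass
class Node:
    # origin name -> last committed progress for that origin
    progress: dict = field(default_factory=dict)

    def origin_progress(self, origin: str) -> OriginProgress:
        # Rough analogue of pg_replication_origin_progress(origin, ...).
        return self.progress.get(origin, OriginProgress())


class ApplySession:
    def __init__(self, node: Node, origin: str) -> None:
        # Rough analogue of pg_replication_origin_session_setup(origin).
        self.node, self.origin = node, origin
        self._pending: Optional[OriginProgress] = None

    def xact_setup(self, remote_lsn: int, remote_commit_ts: datetime) -> None:
        # Rough analogue of pg_replication_origin_xact_setup(lsn, ts):
        # remember the source transaction's LSN and commit timestamp.
        self._pending = OriginProgress(remote_lsn, remote_commit_ts)

    def commit(self) -> None:
        # Progress is stored as part of the local commit, so it cannot get
        # ahead of, or fall behind, the data that was actually applied.
        if self._pending is not None:
            self.node.progress[self.origin] = self._pending
        self._pending = None

    def abort(self) -> None:
        self._pending = None  # progress is left unchanged

    def session_progress(self) -> OriginProgress:
        # Rough analogue of pg_replication_origin_session_progress(...).
        return self.node.origin_progress(self.origin)


def changes_to_replay(remote_log, resume_lsn):
    # When resuming, skip remote transactions at or before the stored LSN.
    return [(lsn, data) for lsn, data in remote_log if lsn > resume_lsn]


if __name__ == "__main__":
    node = Node()
    session = ApplySession(node, "myrepl_node_a")
    ts = datetime(2016, 1, 1, tzinfo=timezone.utc)
    session.xact_setup(100, ts)
    session.commit()  # progress advances to 100
    session.xact_setup(200, ts)
    session.abort()  # rolled back: progress stays at 100
    resume = session.session_progress().remote_lsn
    print(changes_to_replay([(100, "x"), (150, "y"), (200, "z")], resume))
    # prints [(150, 'y'), (200, 'z')]
```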

 <para>
  In replication topologies more complex than replication from exactly one
  system to one other, another problem can be that it is hard to avoid
  replicating replayed rows again. That can lead both to cycles in the
  replication and to inefficiencies. Replication origins provide an optional
  mechanism to recognize and prevent that. When configured using the functions
  referenced in the previous paragraph, every change and transaction generated
  by the session is tagged with the replication origin of the generating
  session when it is passed to output plugin callbacks (see
  <xref linkend="logicaldecoding-output-plugin">). This allows the output
  plugin to treat them differently, for example by ignoring all but locally
  originating rows. Additionally, the
  <link linkend="logicaldecoding-output-plugin-filter-by-origin">
  <function>filter_by_origin_cb</function></link> callback can be used
  to filter the logical decoding change stream based on the
  source. While less flexible, filtering via that callback is
  considerably more efficient than doing the same filtering in the output
  plugin's change callbacks.
 </para>
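 <para>
  Loop avoidance can be sketched with a similar toy model of two nodes that
  replicate to each other. Each change carries the origin of the session that
  produced it, with <literal>None</literal> standing for a locally made
  change; that representation and all names are assumptions of the sketch.
  <literal>filter_by_origin()</literal> follows the same contract as
  <function>filter_by_origin_cb</function>, returning true for changes that
  should be skipped, but it is plain Python rather than the C callback.
 </para>

```python
# Toy model of origin-based loop avoidance between two nodes that
# replicate to each other. Each change carries the origin of the session
# that produced it (None = made locally); all names are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    data: str
    origin: Optional[str]  # None means the change originated locally


def filter_by_origin(origin: Optional[str]) -> bool:
    # Same contract as filter_by_origin_cb: return True to skip the change.
    # Here only locally originating changes are passed on.
    return origin is not None


def outgoing(wal: list[Change], use_filter: bool = True) -> list[Change]:
    """Changes a decoder on this node would send to its peer."""
    if not use_filter:
        return list(wal)
    return [c for c in wal if not filter_by_origin(c.origin)]


def apply_changes(dst_wal: list[Change], changes: list[Change], origin: str) -> None:
    # The applying session is set up for `origin`, so every change it
    # writes on the destination is tagged with that origin.
    dst_wal.extend(Change(c.data, origin) for c in changes)


if __name__ == "__main__":
    wal_a = [Change("x", None)]  # a local change on node A
    wal_b: list[Change] = []
    apply_changes(wal_b, outgoing(wal_a), "myrepl_node_a")
    print(outgoing(wal_b))  # prints "[]": not sent back to node A
    print(len(outgoing(wal_b, use_filter=False)))  # prints "1": would loop
```

 <para>
  Without the filter, node B would send the change it received from node A
  straight back to node A, which is exactly the cycle described above.
 </para>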
</chapter>