-<!-- $PostgreSQL: pgsql/doc/src/sgml/protocol.sgml,v 1.76 2009/12/02 04:54:10 tgl Exp $ -->
+<!-- doc/src/sgml/protocol.sgml -->
<chapter id="protocol">
<title>Frontend/Backend Protocol</title>
if it is able.
</para>
- <para>
- Higher level features built on this protocol (for example, how
- <application>libpq</application> passes certain environment
- variables when the connection is established) are covered elsewhere.
- </para>
-
<para>
In order to serve multiple clients efficiently, the server launches
a new <quote>backend</> process for each client.
<para>
This message contains the response data from the previous step
of GSSAPI or SSPI negotiation (AuthenticationGSS, AuthenticationSSPI
- or a previous AuthenticationGSSContinue). If the GSSAPI
+ or a previous AuthenticationGSSContinue). If the GSSAPI
or SSPI data in this message
indicates more data is needed to complete the authentication,
the frontend must send that data as another PasswordMessage. If
<para>
The unnamed prepared statement is likewise planned during Parse processing
if the Parse message defines no parameters. But if there are parameters,
- query planning occurs during Bind processing instead. This allows the
- planner to make use of the actual values of the parameters provided in
- the Bind message when planning the query.
+ query planning occurs every time Bind parameters are supplied. This allows the
+ planner to make use of the actual values of the parameters provided by
+ each Bind message, rather than using generic estimates.
</para>
<note>
<para>
In the event of a backend-detected error during copy-in mode (including
- receipt of a CopyFail message), the backend will issue an ErrorResponse
+ receipt of a CopyFail message), the backend will issue an ErrorResponse
message. If the <command>COPY</> command was issued via an extended-query
message, the backend will now discard frontend messages until a Sync
message is received, then it will issue ReadyForQuery and return to normal
</para>
<para>
- The CopyInResponse and CopyOutResponse messages include fields that
- inform the frontend of the number of columns per row and the format
- codes being used for each column. (As of the present implementation,
- all columns in a given <command>COPY</> operation will use the same
- format, but the message design does not assume this.)
+ There is another Copy-related mode called Copy-both, which allows
+ high-speed bulk data transfer to <emphasis>and</> from the server.
+ Copy-both mode is initiated when a backend in walsender mode
+ executes a <command>START_REPLICATION</command> statement. The
+ backend sends a CopyBothResponse message to the frontend. Both
+ the backend and the frontend may then send CopyData messages
+ until the connection is terminated. See <xref
+ linkend="protocol-replication">.
</para>
+
+ <para>
+ The CopyInResponse, CopyOutResponse and CopyBothResponse messages
+ include fields that inform the frontend of the number of columns
+ per row and the format codes being used for each column. (As of
+ the present implementation, all columns in a given <command>COPY</>
+ operation will use the same format, but the message design does not
+ assume this.)
+ </para>
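The three Copy*Response messages share one wire layout after their type byte: an Int32 length that counts itself, an Int8 overall format, an Int16 column count, and one Int16 format code per column, all in network byte order. The following is a minimal Python sketch of decoding one such message. It assumes the whole message, type byte included, is already in memory. It also assumes that CopyBothResponse carries the same column fields as the other two, as the paragraph above states. The function name `parse_copy_response` is hypothetical.

```python
import struct

# Type bytes of the three Copy*Response messages.
COPY_RESPONSE_TYPES = {b"G": "CopyInResponse", b"H": "CopyOutResponse", b"W": "CopyBothResponse"}

def parse_copy_response(msg: bytes):
    """Decode a complete Copy*Response message (type byte included).

    Protocol integers are in network byte order, hence the "!" prefix.
    Returns (message name, overall format, list of per-column format codes).
    """
    kind = msg[:1]
    if kind not in COPY_RESPONSE_TYPES:
        raise ValueError(f"not a Copy*Response message: {kind!r}")
    (length,) = struct.unpack_from("!i", msg, 1)
    if length != len(msg) - 1:  # the length counts itself but not the type byte
        raise ValueError("length field does not match message size")
    # Int8 overall format at offset 5, Int16 column count at offset 6.
    overall_format, ncols = struct.unpack_from("!bh", msg, 5)
    # Int16 format code for each column, starting at offset 8.
    column_formats = list(struct.unpack_from(f"!{ncols}h", msg, 8))
    return COPY_RESPONSE_TYPES[kind], overall_format, column_formats
```

With the current implementation, every entry in the returned list should equal the overall format. The sketch does not enforce this, because the message design does not require it.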
+
</sect2>
<sect2 id="protocol-async">
<literal>standard_conforming_strings</> was not reported by releases
before 8.1;
<literal>IntervalStyle</> was not reported by releases before 8.4;
- <literal>application_name</> was not reported by releases before 8.5.)
+ <literal>application_name</> was not reported by releases before 9.0.)
Note that
<literal>server_version</>,
<literal>server_encoding</> and
backend will send a NotificationResponse message (not to be
confused with NoticeResponse!) whenever a
<command>NOTIFY</command> command is executed for the same
- notification name.
+ channel name.
</para>
<note>
</sect2>
</sect1>
+<sect1 id="protocol-replication">
+<title>Streaming Replication Protocol</title>
+
+<para>
+To initiate streaming replication, the frontend sends the
+<literal>replication</> parameter in the startup message. This tells the
+backend to go into walsender mode, wherein a small set of replication commands
+can be issued instead of SQL statements. Only the simple query protocol can be
+used in walsender mode.
+
+The commands accepted in walsender mode are:
+
+<variablelist>
+ <varlistentry>
+ <term>IDENTIFY_SYSTEM</term>
+ <listitem>
+ <para>
+ Requests the server to identify itself. The server replies with a result
+ set of a single row containing two fields:
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term>
+ systemid
+ </term>
+ <listitem>
+ <para>
+ The unique system identifier of the cluster. This
+ can be used to check that the base backup used to initialize the
+ standby came from the same cluster.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ timeline
+ </term>
+ <listitem>
+ <para>
+ The current TimelineID. This is also useful for checking that the standby is
+ consistent with the master.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>START_REPLICATION <replaceable>XXX</>/<replaceable>XXX</></term>
+ <listitem>
+ <para>
+ Instructs the server to start streaming WAL, beginning at
+ WAL position <replaceable>XXX</>/<replaceable>XXX</>.
+ The server can reply with an error, for example if the requested section of WAL
+ has already been recycled. On success, the server responds with a
+ CopyBothResponse message, and then starts to stream WAL to the frontend.
+ WAL will continue to be streamed until the connection is broken;
+ no further commands will be accepted.
+ </para>
+
+ <para>
+ WAL data is sent as a series of CopyData messages. (This allows
+ other information to be intermixed; in particular the server can send
+ an ErrorResponse message if it encounters a failure after beginning
+ to stream.) The payload in each CopyData message follows this format:
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term>
+ XLogData (B)
+ </term>
+ <listitem>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term>
+ Byte1('w')
+ </term>
+ <listitem>
+ <para>
+ Identifies the message as WAL data.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>
+ Byte8
+ </term>
+ <listitem>
+ <para>
+ The starting point of the WAL data in this message, given in
+ XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>
+ Byte8
+ </term>
+ <listitem>
+ <para>
+ The current end of WAL on the server, given in
+ XLogRecPtr format.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>
+ Byte8
+ </term>
+ <listitem>
+ <para>
+ The server's system clock at the time of transmission,
+ given in TimestampTz format.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>
+ Byte<replaceable>n</replaceable>
+ </term>
+ <listitem>
+ <para>
+ A section of the WAL data stream.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ <para>
+ A single WAL record is never split across two CopyData messages.
+ When a WAL record crosses a WAL page boundary, and is therefore
+ already split using continuation records, it can be split at the page
+ boundary. In other words, the first main WAL record and its
+ continuation records can be sent in different CopyData messages.
+ </para>
+ <para>
+ Note that all fields within the WAL data and the above-described header
+ will be in the sending server's native format. Endianness, and the
+ format for the timestamp, are unpredictable unless the receiver has
+ verified that the sender's system identifier matches its own
+ <filename>pg_control</> contents.
+ </para>
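As an illustration of the XLogData layout, here is a minimal Python sketch that splits one CopyData payload into its header fields and the WAL bytes. It assumes the receiver has already verified that the sender's system identifier matches its own, so the native-order decoding applies. Each Byte8 field is treated as an opaque unsigned 64-bit integer; converting it to an XLogRecPtr or TimestampTz is left to the caller. The names `XLOGDATA_HEADER` and `parse_xlogdata` are hypothetical.

```python
import struct

# Byte1('w') followed by three Byte8 fields, with no padding.
# "=" selects native byte order and standard sizes; this is only meaningful
# once the sender's system identifier has been checked against our own.
XLOGDATA_HEADER = struct.Struct("=cQQQ")

def parse_xlogdata(payload: bytes) -> dict:
    """Split one CopyData payload into XLogData header fields and WAL bytes."""
    if len(payload) < XLOGDATA_HEADER.size:
        raise ValueError("payload is shorter than the XLogData header")
    kind, data_start, wal_end, send_time = XLOGDATA_HEADER.unpack_from(payload)
    if kind != b"w":
        raise ValueError(f"not an XLogData message: {kind!r}")
    return {
        "data_start": data_start,  # starting point of the WAL data in this message
        "wal_end": wal_end,        # current end of WAL on the server
        "send_time": send_time,    # server clock at transmission
        "wal": payload[XLOGDATA_HEADER.size:],  # the section of the WAL stream
    }
```

Because a WAL record is never split across CopyData messages except at a page boundary, a receiver can append the `wal` bytes of successive messages without tracking record boundaries itself.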
+ <para>
+ If the WAL sender process is terminated normally (during postmaster
+ shutdown), it will send a CommandComplete message before exiting.
+ This might not happen during an abnormal shutdown, of course.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>BASE_BACKUP [<literal>LABEL</literal> <replaceable>'label'</replaceable>] [<literal>PROGRESS</literal>] [<literal>FAST</literal>]</term>
+ <listitem>
+ <para>
+ Instructs the server to start streaming a base backup.
+ The system will automatically be put in backup mode before the backup
+ is started, and taken out of it when the backup is complete. The
+ following options are accepted:
+ <variablelist>
+ <varlistentry>
+ <term><literal>LABEL</literal> <replaceable>'label'</replaceable></term>
+ <listitem>
+ <para>
+ Sets the label of the backup. If none is specified, a backup label
+ of <literal>base backup</literal> will be used. The quoting rules
+ for the label are the same as a standard SQL string with
+ <xref linkend="guc-standard-conforming-strings"> turned on.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>PROGRESS</></term>
+ <listitem>
+ <para>
+ Requests information required to generate a progress report. The server
+ will send back an approximate size for each tablespace in the header, which
+ can be used to calculate how much of the stream has been transferred. The
+ size is calculated by enumerating all the file sizes once before the transfer
+ is even started, which can have a negative impact on performance; in
+ particular, it may take longer before the first data is streamed. Since the
+ database files can change during the backup, the size is only approximate
+ and may both grow and shrink between the time of approximation and the
+ sending of the actual files.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>FAST</></term>
+ <listitem>
+ <para>
+ Requests a fast checkpoint.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
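Because the label follows standard SQL string quoting with <literal>standard_conforming_strings</> on, backslashes are taken literally, and the only escaping a client needs is to double any embedded single quote. The following minimal Python sketch builds the command text under that rule. It emits the options in the order shown in the synopsis. The function name `base_backup_command` is hypothetical.

```python
def base_backup_command(label=None, progress=False, fast=False) -> str:
    """Build a BASE_BACKUP command string for the simple query protocol."""
    parts = ["BASE_BACKUP"]
    if label is not None:
        # With standard_conforming_strings on, only single quotes need escaping,
        # and they are escaped by doubling them.
        parts.append("LABEL '" + label.replace("'", "''") + "'")
    if progress:
        parts.append("PROGRESS")
    if fast:
        parts.append("FAST")
    return " ".join(parts)
```

For example, `base_backup_command("it's", fast=True)` yields `BASE_BACKUP LABEL 'it''s' FAST`.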
+ <para>
+ When the backup is started, the server will first send a header in
+ ordinary result set format, followed by one or more CopyOutResponse
+ results, one for PGDATA and one for each additional tablespace other
+ than <literal>pg_default</> and <literal>pg_global</>. The data in
+ the CopyOutResponse results will be a tar format (using ustar00
+ extensions) dump of the tablespace contents.
+ </para>
+ <para>
+ The header is an ordinary result set with one row for each tablespace.
+ The fields in this row are:
+ <variablelist>
+ <varlistentry>
+ <term>spcoid</term>
+ <listitem>
+ <para>
+ The OID of the tablespace, or <literal>NULL</> if it's the base
+ directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>spclocation</term>
+ <listitem>
+ <para>
+ The full path of the tablespace directory, or <literal>NULL</>
+ if it's the base directory.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>size</term>
+ <listitem>
+ <para>
+ The approximate size of the tablespace, if a progress report has
+ been requested; otherwise it's <literal>NULL</>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
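To show how the header rows can feed a progress report, here is a minimal Python sketch. It assumes the rows have already been read as text tuples `(spcoid, spclocation, size)`, with `None` for SQL NULL. This section does not state the unit of <literal>size</>, so the sketch assumes the caller counts transferred data in that same unit. The function name `estimate_progress` is hypothetical.

```python
def estimate_progress(header_rows, done):
    """Return an approximate completion fraction for a BASE_BACKUP with PROGRESS.

    header_rows holds (spcoid, spclocation, size) text tuples, None meaning NULL.
    `done` must be measured in the same unit the server uses for `size`.
    Returns None when no sizes were reported (PROGRESS was not requested).
    """
    sizes = [int(size) for _, _, size in header_rows if size is not None]
    if not sizes:
        return None
    total = sum(sizes)
    if total == 0:
        return 1.0
    # The sizes are only estimates and files may grow during the backup, so clamp.
    return min(done / total, 1.0)
```

Clamping at 1.0 reflects the caveat above that the reported sizes may grow or shrink before the files are actually sent.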
+ <para>
+ The tar archive for the data directory and each tablespace will contain
+ all files in the directories, regardless of whether they are
+ <productname>PostgreSQL</> files or other files added to the same
+ directory. The only excluded files are:
+ <itemizedlist spacing="compact" mark="bullet">
+ <listitem>
+ <para>
+ <filename>postmaster.pid</>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <filename>pg_xlog</> (including subdirectories)
+ </para>
+ </listitem>
+ </itemizedlist>
+ Owner, group and file mode are set if the underlying filesystem on
+ the server supports it.
+ </para>
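Since each tablespace arrives as a plain tar stream, a client can inspect it without first writing it to disk. The following minimal Python sketch uses the standard `tarfile` module in sequential mode (`"r|"`), so the input need not be seekable. It assumes the CopyData payloads of one CopyOutResponse have been exposed as a single file-like object. The function name `list_backup_members` is hypothetical.

```python
import io
import tarfile

def list_backup_members(stream):
    """List (name, uid, gid, mode, size) for each member of one backup tar stream.

    Mode "r|" reads the archive strictly front to back, so `stream` can be a
    non-seekable object that yields the concatenated CopyData bytes.
    """
    members = []
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for info in tar:
            members.append((info.name, info.uid, info.gid, info.mode, info.size))
    return members
```

The owner, group and mode fields are only meaningful if the server's filesystem supports them, as noted above.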
+ </listitem>
+ </varlistentry>
+</variablelist>
+
+</para>
+
+</sect1>
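A walsender session starts like any other connection. The frontend sends a StartupMessage that includes the <literal>replication</> parameter, then issues replication commands as simple-protocol Query messages. The following minimal Python sketch builds both messages. The parameter value `true` and the user name are assumptions for illustration, and the function names are hypothetical.

```python
import struct

PROTOCOL_3_0 = 3 << 16  # 196608: major version 3, minor version 0

def startup_message(params: dict) -> bytes:
    """Build a protocol 3.0 StartupMessage (this message has no type byte)."""
    body = struct.pack("!i", PROTOCOL_3_0)
    for name, value in params.items():
        body += name.encode() + b"\0" + value.encode() + b"\0"
    body += b"\0"  # terminator after the last name/value pair
    return struct.pack("!i", len(body) + 4) + body  # length counts itself

def query_message(sql: str) -> bytes:
    """Build a Query ('Q') message; walsender mode accepts only the simple protocol."""
    payload = sql.encode() + b"\0"
    return b"Q" + struct.pack("!i", len(payload) + 4) + payload
```

A client would send `startup_message({"user": "replicator", "replication": "true"})`, complete authentication, and then send `query_message("IDENTIFY_SYSTEM")` followed by `query_message("START_REPLICATION 0/0")`, where the start position is a placeholder.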
+
<sect1 id="protocol-message-types">
<title>Message Data Types</title>
value that will appear, otherwise the value is variable.
Eg. String, String("user").
</para>
-
+
<note>
<para>
<emphasis>There is no predefined limit</emphasis> on the length of a string
(denoted <replaceable>R</> below).
This can be zero to indicate that there are no result columns
or that the result columns should all use the default format
- (text);
+ (text);
or one, in which case the specified format code is applied
to all result columns (if any); or it can equal the actual
number of result columns of the query.
<replaceable>rows</replaceable> is the number of rows updated.
</para>
+ <para>
+ For a <command>SELECT</command> or <command>CREATE TABLE AS</command>
+ command, the tag is <literal>SELECT <replaceable>rows</replaceable></literal>
+ where <replaceable>rows</replaceable> is the number of rows retrieved.
+ </para>
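Several command tags end with a row count, for example `SELECT 5`, `INSERT 0 3` or `MOVE 2`. A client that only needs the count can read the last word of the tag. The following minimal Python sketch does this for the counted commands listed in this section. It returns `None` for tags that carry no count, such as `CREATE TABLE`. The function name `rows_from_command_tag` is hypothetical.

```python
def rows_from_command_tag(tag: str):
    """Return the row count carried by a CommandComplete tag, or None if it has none."""
    words = tag.split()
    # Commands whose tag ends with a row count (INSERT also carries an OID first).
    counted = {"INSERT", "DELETE", "UPDATE", "SELECT", "MOVE", "FETCH", "COPY"}
    if len(words) >= 2 and words[0] in counted and words[-1].isdigit():
        return int(words[-1])
    return None
```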
+
<para>
For a <command>MOVE</command> command, the tag is
<literal>MOVE <replaceable>rows</replaceable></literal> where
characters, etc).
1 indicates the overall copy format is binary (similar
to DataRow format).
- See <xref linkend="sql-copy" endterm="sql-copy-title">
+ See <xref linkend="sql-copy">
for more information.
</para>
</listitem>
is textual (rows separated by newlines, columns
separated by separator characters, etc). 1 indicates
the overall copy format is binary (similar to DataRow
- format). See <xref linkend="sql-copy"
- endterm="sql-copy-title"> for more information.
+ format). See <xref linkend="sql-copy"> for more information.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int16
+</term>
+<listitem>
+<para>
+ The number of columns in the data to be copied
+ (denoted <replaceable>N</> below).
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int16[<replaceable>N</>]
+</term>
+<listitem>
+<para>
+ The format codes to be used for each column.
+ Each must presently be zero (text) or one (binary).
+ All must be zero if the overall copy format is textual.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+
+</para>
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>
+CopyBothResponse (B)
+</term>
+<listitem>
+<para>
+
+<variablelist>
+<varlistentry>
+<term>
+ Byte1('W')
+</term>
+<listitem>
+<para>
+ Identifies the message as a Start Copy Both response.
+ This message is used only for Streaming Replication.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int32
+</term>
+<listitem>
+<para>
+ Length of message contents in bytes, including self.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+ 0 indicates the overall <command>COPY</command> format
+ is textual (rows separated by newlines, columns
+ separated by separator characters, etc). 1 indicates
+ the overall copy format is binary (similar to DataRow
+ format). See <xref linkend="sql-copy"> for more information.
</para>
</listitem>
</varlistentry>
</term>
<listitem>
<para>
- The name of the condition that the notify has been raised on.
+ The name of the channel that the notify has been raised on.
</para>
</listitem>
</varlistentry>
</term>
<listitem>
<para>
- Additional information passed from the notifying process.
- (Currently, this feature is unimplemented so the field
- is always an empty string.)
+ The <quote>payload</> string passed from the notifying process.
</para>
</listitem>
</varlistentry>
</sect1>
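With the payload field now in use, a NotificationResponse message carries an Int32 process ID followed by two null-terminated strings: the channel name and the payload. The following minimal Python sketch decodes one such message. It assumes the whole message, including its type byte, is in memory and that the client encoding is UTF-8. The function name `parse_notification_response` is hypothetical.

```python
import struct

def parse_notification_response(msg: bytes):
    """Decode a NotificationResponse ('A') message into (pid, channel, payload)."""
    if msg[:1] != b"A":
        raise ValueError("not a NotificationResponse message")
    (length,) = struct.unpack_from("!i", msg, 1)  # counts itself, not the type byte
    (pid,) = struct.unpack_from("!i", msg, 5)
    # The rest holds two null-terminated strings: channel name, then payload.
    channel, payload, _ = msg[9:1 + length].split(b"\0", 2)
    return pid, channel.decode(), payload.decode()
```

An empty payload decodes to an empty string, which matches what older servers always sent in this field.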
-
<sect1 id="protocol-changes">
<title>Summary of Changes since Protocol 2.0</title>
<para>
The NotificationResponse ('<literal>A</>') message has an additional string
-field, which is presently empty but might someday carry additional data passed
+field, which can carry a <quote>payload</> string passed
from the <command>NOTIFY</command> event sender.
</para>
</sect1>
-
</chapter>