Improve documentation's description of JOIN clauses.

author Tom Lane <tgl@sss.pgh.pa.us>

Wed, 19 Nov 2014 21:00:24 +0000 (16:00 -0500)

committer Tom Lane <tgl@sss.pgh.pa.us>

Wed, 19 Nov 2014 21:00:24 +0000 (16:00 -0500)
author Tom Lane <tgl@sss.pgh.pa.us>
Wed, 19 Nov 2014 21:00:24 +0000 (16:00 -0500)
committer Tom Lane <tgl@sss.pgh.pa.us>
Wed, 19 Nov 2014 21:00:24 +0000 (16:00 -0500)
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml

index c4a8ca8d8a450c2041cf035a4b15bac41cad46a4..7dbad462a5dd5ab01cb3d05e1197446393a93e05 100644 (file)
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -118,10 +118,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
  </synopsis>
  
      A table reference can be a table name (possibly schema-qualified),
-    or a derived table such as a subquery, a table join, or complex
-    combinations of these.  If more than one table reference is listed
-    in the <literal>FROM</> clause they are cross-joined (see below)
-    to form the intermediate virtual table that can then be subject to
+    or a derived table such as a subquery, a <literal>JOIN</> construct, or
+    complex combinations of these.  If more than one table reference is
+    listed in the <literal>FROM</> clause, the tables are cross-joined
+    (that is, the Cartesian product of their rows is formed; see below).
+    The result of the <literal>FROM</> list is an intermediate virtual
+    table that can then be subject to
      transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
      and <literal>HAVING</> clauses and is finally the result of the
      overall table expression.
@@ -161,6 +163,16 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
       A joined table is a table derived from two other (real or
       derived) tables according to the rules of the particular join
       type.  Inner, outer, and cross-joins are available.
+     The general syntax of a joined table is
+<synopsis>
+<replaceable>T1</replaceable> <replaceable>join_type</replaceable> <replaceable>T2</replaceable> <optional> <replaceable>join_condition</replaceable> </optional>
+</synopsis>
+     Joins of all types can be chained together, or nested: either or
+     both <replaceable>T1</replaceable> and
+     <replaceable>T2</replaceable> can be joined tables.  Parentheses
+     can be used around <literal>JOIN</> clauses to control the join
+     order.  In the absence of parentheses, <literal>JOIN</> clauses
+     nest left-to-right.
      </para>
  
      <variablelist>
@@ -197,10 +209,28 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
         <para>
          <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
          <replaceable>T2</replaceable></literal> is equivalent to
-        <literal>FROM <replaceable>T1</replaceable>,
-        <replaceable>T2</replaceable></literal>.  It is also equivalent to
          <literal>FROM <replaceable>T1</replaceable> INNER JOIN
          <replaceable>T2</replaceable> ON TRUE</literal> (see below).
+        It is also equivalent to
+        <literal>FROM <replaceable>T1</replaceable>,
+        <replaceable>T2</replaceable></literal>.
+        <note>
+        <para>
+         This latter equivalence does not hold exactly when more than two
+         tables appear, because <literal>JOIN</> binds more tightly than
+         comma.  For example
+         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
+         <replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
+         ON <replaceable>condition</replaceable></literal>
+         is not the same as
+         <literal>FROM <replaceable>T1</replaceable>,
+         <replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
+         ON <replaceable>condition</replaceable></literal>
+         because the <replaceable>condition</replaceable> can
+         reference <replaceable>T1</replaceable> in the first case but not
+         the second.
+        </para>
+        </note>
         </para>
        </listitem>
       </varlistentry>
@@ -240,76 +270,6 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
          <quote>match</quote>, as explained in detail below.
         </para>
  
-       <para>
-        The <literal>ON</> clause is the most general kind of join
-        condition: it takes a Boolean value expression of the same
-        kind as is used in a <literal>WHERE</> clause.  A pair of rows
-        from <replaceable>T1</> and <replaceable>T2</> match if the
-        <literal>ON</> expression evaluates to true.
-       </para>
-
-       <para>
-        The <literal>USING</> clause allows you to take advantage of
-        the specific situation where both sides of the join use the
-        same name for the joining columns.  It takes a
-        comma-separated list of the shared column names
-        and forms a join using the equals operator.  Furthermore, the
-        output of <literal>JOIN USING</> has one column for each of the
-        listed columns, followed by the remaining columns from each table.
-       </para>
-
-       <para>The output column difference between <literal>ON</> and
-       <literal>USING</> when invoking <literal>SELECT *</> is:</para>
-         <itemizedlist>
-          <listitem>
-           <para>
-            <literal>ON</> - all columns from <replaceable>T1</> followed
-            by all columns from <replaceable>T2</>
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            <literal>USING</> - all join columns, one copy each
-            and in the listed order, followed by non-join columns
-            in <replaceable>T1</> followed by non-join columns in
-            <replaceable>T2</>
-           </para>
-          </listitem>
-          <listitem>
-           <para>
-            Examples provided below
-           </para>
-          </listitem>
-         </itemizedlist>
-
-       <para>
-        <indexterm>
-         <primary>join</primary>
-         <secondary>natural</secondary>
-        </indexterm>
-        <indexterm>
-         <primary>natural join</primary>
-        </indexterm>
-        Finally, <literal>NATURAL</> is a shorthand form of
-        <literal>USING</>: it forms a <literal>USING</> list
-        consisting of all column names that appear in both
-        input tables.  As with <literal>USING</>, these columns appear
-        only once in the output table.  If there are no common
-        columns, <literal>NATURAL</literal> behaves like
-        <literal>CROSS JOIN</literal>.
-       </para>
-
-       <note>
-        <para>
-         <literal>USING</literal> is reasonably safe from column changes
-         in the joined relations since only the specific columns mentioned
-         are considered.  <literal>NATURAL</> is considerably more problematic
-         if you are referring to relations only by name (views and tables)
-         since any schema changes to either relation that cause a new matching
-         column name to be present will cause the join to consider that new column.
-        </para>
-       </note>
-
         <para>
          The possible types of qualified join are:
  
@@ -387,19 +347,70 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
          </varlistentry>
         </variablelist>
         </para>
+
+       <para>
+        The <literal>ON</> clause is the most general kind of join
+        condition: it takes a Boolean value expression of the same
+        kind as is used in a <literal>WHERE</> clause.  A pair of rows
+        from <replaceable>T1</> and <replaceable>T2</> match if the
+        <literal>ON</> expression evaluates to true.
+       </para>
+
+       <para>
+        The <literal>USING</> clause is a shorthand that allows you to take
+        advantage of the specific situation where both sides of the join use
+        the same name for the joining column(s).  It takes a
+        comma-separated list of the shared column names
+        and forms a join condition that includes an equality comparison
+        for each one.  For example, joining <replaceable>T1</>
+        and <replaceable>T2</> with <literal>USING (a, b)</> produces
+        the join condition <literal>ON <replaceable>T1</>.a
+        = <replaceable>T2</>.a AND <replaceable>T1</>.b
+        = <replaceable>T2</>.b</literal>.
+       </para>
+
+       <para>
+        Furthermore, the output of <literal>JOIN USING</> suppresses
+        redundant columns: there is no need to print both of the matched
+        columns, since they must have equal values.  While <literal>JOIN
+        ON</> produces all columns from <replaceable>T1</> followed by all
+        columns from <replaceable>T2</>, <literal>JOIN USING</> produces one
+        output column for each of the listed column pairs (in the listed
+        order), followed by any remaining columns from <replaceable>T1</>,
+        followed by any remaining columns from <replaceable>T2</>.
+       </para>
+
+       <para>
+        <indexterm>
+         <primary>join</primary>
+         <secondary>natural</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>natural join</primary>
+        </indexterm>
+        Finally, <literal>NATURAL</> is a shorthand form of
+        <literal>USING</>: it forms a <literal>USING</> list
+        consisting of all column names that appear in both
+        input tables.  As with <literal>USING</>, these columns appear
+        only once in the output table.  If there are no common
+        column names, <literal>NATURAL</literal> behaves like
+        <literal>CROSS JOIN</literal>.
+       </para>
+
+       <note>
+        <para>
+         <literal>USING</literal> is reasonably safe from column changes
+         in the joined relations since only the listed columns
+         are combined.  <literal>NATURAL</> is considerably more risky since
+         any schema changes to either relation that cause a new matching
+         column name to be present will cause the join to combine that new
+         column as well.
+        </para>
+       </note>
        </listitem>
       </varlistentry>
      </variablelist>
  
-    <para>
-     Joins of all types can be chained together or nested: either or
-     both <replaceable>T1</replaceable> and
-     <replaceable>T2</replaceable> can be joined tables.  Parentheses
-     can be used around <literal>JOIN</> clauses to control the join
-     order.  In the absence of parentheses, <literal>JOIN</> clauses
-     nest left-to-right.
-    </para>
-
      <para>
       To put this together, assume we have tables <literal>t1</literal>:
  <programlisting>
@@ -516,6 +527,8 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
       clause is processed <emphasis>before</> the join, while
       a restriction placed in the <literal>WHERE</> clause is processed
       <emphasis>after</> the join.
+     That does not matter with inner joins, but it matters a lot with outer
+     joins.
      </para>
     </sect3>
author	Tom Lane <tgl@sss.pgh.pa.us>
	Wed, 19 Nov 2014 21:00:24 +0000 (16:00 -0500)
committer	Tom Lane <tgl@sss.pgh.pa.us>
	Wed, 19 Nov 2014 21:00:24 +0000 (16:00 -0500)