granicus.if.org Git - postgresql/blob - doc/src/sgml/queries.sgml

   1 <!-- doc/src/sgml/queries.sgml -->
   2
   3 <chapter id="queries">
   4  <title>Queries</title>
   5
   6  <indexterm zone="queries">
   7   <primary>query</primary>
   8  </indexterm>
   9
  10  <indexterm zone="queries">
  11   <primary>SELECT</primary>
  12  </indexterm>
  13
  14  <para>
  15   The previous chapters explained how to create tables, how to fill
  16   them with data, and how to manipulate that data.  Now we finally
  17   discuss how to retrieve the data from the database.
  18  </para>
  19
  20
  21  <sect1 id="queries-overview">
  22   <title>Overview</title>
  23
  24   <para>
  25    The process of retrieving or the command to retrieve data from a
  26    database is called a <firstterm>query</firstterm>.  In SQL the
  27    <xref linkend="sql-select"> command is
  28    used to specify queries.  The general syntax of the
  29    <command>SELECT</command> command is
  30 <synopsis>
  31 <optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
  32 </synopsis>
  33    The following sections describe the details of the select list, the
  34    table expression, and the sort specification.  <literal>WITH</>
  35    queries are treated last since they are an advanced feature.
  36   </para>
  37
  38   <para>
  39    A simple kind of query has the form:
  40 <programlisting>
  41 SELECT * FROM table1;
  42 </programlisting>
  43   Assuming that there is a table called <literal>table1</literal>,
  44   this command would retrieve all rows and all user-defined columns from
  45   <literal>table1</literal>.  (The method of retrieval depends on the
  46   client application.  For example, the
  47   <application>psql</application> program will display an ASCII-art
  48   table on the screen, while client libraries will offer functions to
  49   extract individual values from the query result.)  The select list
  50   specification <literal>*</literal> means all columns that the table
  51   expression happens to provide.  A select list can also select a
  52   subset of the available columns or make calculations using the
  53   columns.  For example, if
  54   <literal>table1</literal> has columns named <literal>a</>,
  55   <literal>b</>, and <literal>c</> (and perhaps others) you can make
  56   the following query:
  57 <programlisting>
  58 SELECT a, b + c FROM table1;
  59 </programlisting>
  60   (assuming that <literal>b</> and <literal>c</> are of a numerical
  61   data type).
  62   See <xref linkend="queries-select-lists"> for more details.
  63  </para>
  64
  65  <para>
  66   <literal>FROM table1</literal> is a simple kind of
  67   table expression: it reads just one table.  In general, table
  68   expressions can be complex constructs of base tables, joins, and
  69   subqueries.  But you can also omit the table expression entirely and
  70   use the <command>SELECT</command> command as a calculator:
  71 <programlisting>
  72 SELECT 3 * 4;
  73 </programlisting>
  74   This is more useful if the expressions in the select list return
  75   varying results.  For example, you could call a function this way:
  76 <programlisting>
  77 SELECT random();
  78 </programlisting>
  79   </para>
  80  </sect1>
  81
  82
  83  <sect1 id="queries-table-expressions">
  84   <title>Table Expressions</title>
  85
  86   <indexterm zone="queries-table-expressions">
  87    <primary>table expression</primary>
  88   </indexterm>
  89
  90   <para>
  91    A <firstterm>table expression</firstterm> computes a table.  The
  92    table expression contains a <literal>FROM</> clause that is
  93    optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
  94    <literal>HAVING</> clauses.  Trivial table expressions simply refer
  95    to a table on disk, a so-called base table, but more complex
  96    expressions can be used to modify or combine base tables in various
  97    ways.
  98   </para>
  99
 100   <para>
 101    The optional <literal>WHERE</>, <literal>GROUP BY</>, and
 102    <literal>HAVING</> clauses in the table expression specify a
 103    pipeline of successive transformations performed on the table
 104    derived in the <literal>FROM</> clause.  All these transformations
 105    produce a virtual table that provides the rows that are passed to
 106    the select list to compute the output rows of the query.
 107   </para>
 108
 109   <sect2 id="queries-from">
 110    <title>The <literal>FROM</literal> Clause</title>
 111
 112    <para>
 113     The <xref linkend="sql-from" endterm="sql-from-title"> derives a
 114     table from one or more other tables given in a comma-separated
 115     table reference list.
 116 <synopsis>
 117 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
 118 </synopsis>
 119
 120     A table reference can be a table name (possibly schema-qualified),
 121     or a derived table such as a subquery, a table join, or complex
 122     combinations of these.  If more than one table reference is listed
 123     in the <literal>FROM</> clause they are cross-joined (see below)
 124     to form the intermediate virtual table that can then be subject to
 125     transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
 126     and <literal>HAVING</> clauses and is finally the result of the
 127     overall table expression.
 128    </para>
 129
 130    <indexterm>
 131     <primary>ONLY</primary>
 132    </indexterm>
 133
 134    <para>
 135     When a table reference names a table that is the parent of a
 136     table inheritance hierarchy, the table reference produces rows of
 137     not only that table but all of its descendant tables, unless the
 138     key word <literal>ONLY</> precedes the table name.  However, the
 139     reference produces only the columns that appear in the named table
 140     &mdash; any columns added in subtables are ignored.
 141    </para>
 142
 143    <para>
 144     Instead of writing <literal>ONLY</> before the table name, you can write
 145     <literal>*</> after the table name to explicitly specify that descendant
 146     tables are included.  Writing <literal>*</> is not necessary since that
 147     behavior is the default (unless you have changed the setting of the <xref
 148     linkend="guc-sql-inheritance"> configuration option).  However writing
 149     <literal>*</> might be useful to emphasize that additional tables will be
 150     searched.
 151    </para>
 152
 153    <sect3 id="queries-join">
 154     <title>Joined Tables</title>
 155
 156     <indexterm zone="queries-join">
 157      <primary>join</primary>
 158     </indexterm>
 159
 160     <para>
 161      A joined table is a table derived from two other (real or
 162      derived) tables according to the rules of the particular join
 163      type.  Inner, outer, and cross-joins are available.
 164     </para>
 165
 166     <variablelist>
 167      <title>Join Types</title>
 168
 169      <varlistentry>
 170       <term>Cross join</term>
 171
 172       <indexterm>
 173        <primary>join</primary>
 174        <secondary>cross</secondary>
 175       </indexterm>
 176
 177       <indexterm>
 178        <primary>cross join</primary>
 179       </indexterm>
 180
 181       <listitem>
 182 <synopsis>
 183 <replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
 184 </synopsis>
 185
 186        <para>
 187         For every possible combination of rows from
 188         <replaceable>T1</replaceable> and
 189         <replaceable>T2</replaceable> (i.e., a Cartesian product),
 190         the joined table will contain a
 191         row consisting of all columns in <replaceable>T1</replaceable>
 192         followed by all columns in <replaceable>T2</replaceable>.  If
 193         the tables have N and M rows respectively, the joined
 194         table will have N * M rows.
 195        </para>
 196
 197        <para>
 198         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
 199         <replaceable>T2</replaceable></literal> is equivalent to
 200         <literal>FROM <replaceable>T1</replaceable>,
 201         <replaceable>T2</replaceable></literal>.  It is also equivalent to
 202         <literal>FROM <replaceable>T1</replaceable> INNER JOIN
 203         <replaceable>T2</replaceable> ON TRUE</literal> (see below).
 204        </para>
 205       </listitem>
 206      </varlistentry>
 207
 208      <varlistentry>
 209       <term>Qualified joins</term>
 210
 211       <indexterm>
 212        <primary>join</primary>
 213        <secondary>outer</secondary>
 214       </indexterm>
 215
 216       <indexterm>
 217        <primary>outer join</primary>
 218       </indexterm>
 219
 220       <listitem>
 221 <synopsis>
 222 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
 223 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
 224 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
 225 </synopsis>
 226
 227        <para>
 228         The words <literal>INNER</literal> and
 229         <literal>OUTER</literal> are optional in all forms.
 230         <literal>INNER</literal> is the default;
 231         <literal>LEFT</literal>, <literal>RIGHT</literal>, and
 232         <literal>FULL</literal> imply an outer join.
 233        </para>
 234
 235        <para>
 236         The <firstterm>join condition</firstterm> is specified in the
 237         <literal>ON</> or <literal>USING</> clause, or implicitly by
 238         the word <literal>NATURAL</>.  The join condition determines
 239         which rows from the two source tables are considered to
 240         <quote>match</quote>, as explained in detail below.
 241        </para>
 242
 243        <para>
 244         The <literal>ON</> clause is the most general kind of join
 245         condition: it takes a Boolean value expression of the same
 246         kind as is used in a <literal>WHERE</> clause.  A pair of rows
 247         from <replaceable>T1</> and <replaceable>T2</> match if the
 248         <literal>ON</> expression evaluates to true for them.
 249        </para>
 250
 251        <para>
 252         <literal>USING</> is a shorthand notation: it takes a
 253         comma-separated list of column names, which the joined tables
 254         must have in common, and forms a join condition specifying
 255         equality of each of these pairs of columns.  Furthermore, the
 256         output of <literal>JOIN USING</> has one column for each of
 257         the equated pairs of input columns, followed by the
 258         remaining columns from each table.  Thus, <literal>USING (a, b,
 259         c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
 260         t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
 261         if <literal>ON</> is used there will be two columns
 262         <literal>a</>, <literal>b</>, and <literal>c</> in the result,
 263         whereas with <literal>USING</> there will be only one of each
 264         (and they will appear first if <command>SELECT *</> is used).
 265        </para>
 266
 267        <para>
 268         <indexterm>
 269          <primary>join</primary>
 270          <secondary>natural</secondary>
 271         </indexterm>
 272         <indexterm>
 273          <primary>natural join</primary>
 274         </indexterm>
 275         Finally, <literal>NATURAL</> is a shorthand form of
 276         <literal>USING</>: it forms a <literal>USING</> list
 277         consisting of all column names that appear in both
 278         input tables.  As with <literal>USING</>, these columns appear
 279         only once in the output table.  If there are no common
 280         columns, <literal>NATURAL</literal> behaves like
 281         <literal>CROSS JOIN</literal>.
 282        </para>
 283
 284        <para>
 285         The possible types of qualified join are:
 286
 287        <variablelist>
 288         <varlistentry>
 289          <term><literal>INNER JOIN</></term>
 290
 291          <listitem>
 292           <para>
 293            For each row R1 of T1, the joined table has a row for each
 294            row in T2 that satisfies the join condition with R1.
 295           </para>
 296          </listitem>
 297         </varlistentry>
 298
 299         <varlistentry>
 300          <term><literal>LEFT OUTER JOIN</></term>
 301
 302          <indexterm>
 303           <primary>join</primary>
 304           <secondary>left</secondary>
 305          </indexterm>
 306
 307          <indexterm>
 308           <primary>left join</primary>
 309          </indexterm>
 310
 311          <listitem>
 312           <para>
 313            First, an inner join is performed.  Then, for each row in
 314            T1 that does not satisfy the join condition with any row in
 315            T2, a joined row is added with null values in columns of
 316            T2.  Thus, the joined table always has at least
 317            one row for each row in T1.
 318           </para>
 319          </listitem>
 320         </varlistentry>
 321
 322         <varlistentry>
 323          <term><literal>RIGHT OUTER JOIN</></term>
 324
 325          <indexterm>
 326           <primary>join</primary>
 327           <secondary>right</secondary>
 328          </indexterm>
 329
 330          <indexterm>
 331           <primary>right join</primary>
 332          </indexterm>
 333
 334          <listitem>
 335           <para>
 336            First, an inner join is performed.  Then, for each row in
 337            T2 that does not satisfy the join condition with any row in
 338            T1, a joined row is added with null values in columns of
 339            T1.  This is the converse of a left join: the result table
 340            will always have a row for each row in T2.
 341           </para>
 342          </listitem>
 343         </varlistentry>
 344
 345         <varlistentry>
 346          <term><literal>FULL OUTER JOIN</></term>
 347
 348          <listitem>
 349           <para>
 350            First, an inner join is performed.  Then, for each row in
 351            T1 that does not satisfy the join condition with any row in
 352            T2, a joined row is added with null values in columns of
 353            T2.  Also, for each row of T2 that does not satisfy the
 354            join condition with any row in T1, a joined row with null
 355            values in the columns of T1 is added.
 356           </para>
 357          </listitem>
 358         </varlistentry>
 359        </variablelist>
 360        </para>
 361       </listitem>
 362      </varlistentry>
 363     </variablelist>
 364
 365     <para>
 366      Joins of all types can be chained together or nested: either or
 367      both <replaceable>T1</replaceable> and
 368      <replaceable>T2</replaceable> can be joined tables.  Parentheses
 369      can be used around <literal>JOIN</> clauses to control the join
 370      order.  In the absence of parentheses, <literal>JOIN</> clauses
 371      nest left-to-right.
 372     </para>
 373
 374     <para>
 375      To put this together, assume we have tables <literal>t1</literal>:
 376 <programlisting>
 377  num | name
 378 -----+------
 379    1 | a
 380    2 | b
 381    3 | c
 382 </programlisting>
 383      and <literal>t2</literal>:
 384 <programlisting>
 385  num | value
 386 -----+-------
 387    1 | xxx
 388    3 | yyy
 389    5 | zzz
 390 </programlisting>
 391      then we get the following results for the various joins:
 392 <screen>
 393 <prompt>=&gt;</> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
 394  num | name | num | value
 395 -----+------+-----+-------
 396    1 | a    |   1 | xxx
 397    1 | a    |   3 | yyy
 398    1 | a    |   5 | zzz
 399    2 | b    |   1 | xxx
 400    2 | b    |   3 | yyy
 401    2 | b    |   5 | zzz
 402    3 | c    |   1 | xxx
 403    3 | c    |   3 | yyy
 404    3 | c    |   5 | zzz
 405 (9 rows)
 406
 407 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
 408  num | name | num | value
 409 -----+------+-----+-------
 410    1 | a    |   1 | xxx
 411    3 | c    |   3 | yyy
 412 (2 rows)
 413
 414 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
 415  num | name | value
 416 -----+------+-------
 417    1 | a    | xxx
 418    3 | c    | yyy
 419 (2 rows)
 420
 421 <prompt>=&gt;</> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
 422  num | name | value
 423 -----+------+-------
 424    1 | a    | xxx
 425    3 | c    | yyy
 426 (2 rows)
 427
 428 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
 429  num | name | num | value
 430 -----+------+-----+-------
 431    1 | a    |   1 | xxx
 432    2 | b    |     |
 433    3 | c    |   3 | yyy
 434 (3 rows)
 435
 436 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
 437  num | name | value
 438 -----+------+-------
 439    1 | a    | xxx
 440    2 | b    |
 441    3 | c    | yyy
 442 (3 rows)
 443
 444 <prompt>=&gt;</> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
 445  num | name | num | value
 446 -----+------+-----+-------
 447    1 | a    |   1 | xxx
 448    3 | c    |   3 | yyy
 449      |      |   5 | zzz
 450 (3 rows)
 451
 452 <prompt>=&gt;</> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
 453  num | name | num | value
 454 -----+------+-----+-------
 455    1 | a    |   1 | xxx
 456    2 | b    |     |
 457    3 | c    |   3 | yyy
 458      |      |   5 | zzz
 459 (4 rows)
 460 </screen>
 461     </para>
 462
 463     <para>
 464      The join condition specified with <literal>ON</> can also contain
 465      conditions that do not relate directly to the join.  This can
 466      prove useful for some queries but needs to be thought out
 467      carefully.  For example:
 468 <screen>
 469 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
 470  num | name | num | value
 471 -----+------+-----+-------
 472    1 | a    |   1 | xxx
 473    2 | b    |     |
 474    3 | c    |     |
 475 (3 rows)
 476 </screen>
 477      Notice that placing the restriction in the <literal>WHERE</> clause
 478      produces a different result:
 479 <screen>
 480 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';</>
 481  num | name | num | value
 482 -----+------+-----+-------
 483    1 | a    |   1 | xxx
 484 (1 row)
 485 </screen>
 486      This is because a restriction placed in the <literal>ON</>
 487      clause is processed <emphasis>before</> the join, while
 488      a restriction placed in the <literal>WHERE</> clause is processed
 489      <emphasis>after</> the join.
 490     </para>
 491    </sect3>
 492
 493    <sect3 id="queries-table-aliases">
 494     <title>Table and Column Aliases</title>
 495
 496     <indexterm zone="queries-table-aliases">
 497      <primary>alias</primary>
 498      <secondary>in the FROM clause</secondary>
 499     </indexterm>
 500
 501     <indexterm>
 502      <primary>label</primary>
 503      <see>alias</see>
 504     </indexterm>
 505
 506     <para>
 507      A temporary name can be given to tables and complex table
 508      references to be used for references to the derived table in
 509      the rest of the query.  This is called a <firstterm>table
 510      alias</firstterm>.
 511     </para>
 512
 513     <para>
 514      To create a table alias, write
 515 <synopsis>
 516 FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
 517 </synopsis>
 518      or
 519 <synopsis>
 520 FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
 521 </synopsis>
 522      The <literal>AS</literal> key word is optional noise.
 523      <replaceable>alias</replaceable> can be any identifier.
 524     </para>
 525
 526     <para>
 527      A typical application of table aliases is to assign short
 528      identifiers to long table names to keep the join clauses
 529      readable.  For example:
 530 <programlisting>
 531 SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
 532 </programlisting>
 533     </para>
 534
 535     <para>
 536      The alias becomes the new name of the table reference so far as the
 537      current query is concerned &mdash; it is not allowed to refer to the
 538      table by the original name elsewhere in the query.  Thus, this is not
 539      valid:
 540 <programlisting>
 541 SELECT * FROM my_table AS m WHERE my_table.a &gt; 5;    -- wrong
 542 </programlisting>
 543     </para>
 544
 545     <para>
 546      Table aliases are mainly for notational convenience, but it is
 547      necessary to use them when joining a table to itself, e.g.:
 548 <programlisting>
 549 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
 550 </programlisting>
 551      Additionally, an alias is required if the table reference is a
 552      subquery (see <xref linkend="queries-subqueries">).
 553     </para>
 554
 555     <para>
 556      Parentheses are used to resolve ambiguities.  In the following example,
 557      the first statement assigns the alias <literal>b</literal> to the second
 558      instance of <literal>my_table</>, but the second statement assigns the
 559      alias to the result of the join:
 560 <programlisting>
 561 SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
 562 SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
 563 </programlisting>
 564     </para>
 565
 566     <para>
 567      Another form of table aliasing gives temporary names to the columns of
 568      the table, as well as the table itself:
 569 <synopsis>
 570 FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
 571 </synopsis>
 572      If fewer column aliases are specified than the actual table has
 573      columns, the remaining columns are not renamed.  This syntax is
 574      especially useful for self-joins or subqueries.
 575     </para>
 576
 577     <para>
 578      When an alias is applied to the output of a <literal>JOIN</>
 579      clause, the alias hides the original
 580      name(s) within the <literal>JOIN</>.  For example:
 581 <programlisting>
 582 SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
 583 </programlisting>
 584      is valid SQL, but:
 585 <programlisting>
 586 SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
 587 </programlisting>
 588      is not valid; the table alias <literal>a</> is not visible
 589      outside the alias <literal>c</>.
 590     </para>
 591    </sect3>
 592
 593    <sect3 id="queries-subqueries">
 594     <title>Subqueries</title>
 595
 596     <indexterm zone="queries-subqueries">
 597      <primary>subquery</primary>
 598     </indexterm>
 599
 600     <para>
 601      Subqueries specifying a derived table must be enclosed in
 602      parentheses and <emphasis>must</emphasis> be assigned a table
 603      alias name (as in <xref linkend="queries-table-aliases">).  For
 604      example:
 605 <programlisting>
 606 FROM (SELECT * FROM table1) AS alias_name
 607 </programlisting>
 608     </para>
 609
 610     <para>
 611      This example is equivalent to <literal>FROM table1 AS
 612      alias_name</literal>.  More interesting cases, which cannot be
 613      reduced to a plain join, arise when the subquery involves
 614      grouping or aggregation.
 615     </para>
 616
 617     <para>
 618      A subquery can also be a <command>VALUES</> list:
 619 <programlisting>
 620 FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
 621      AS names(first, last)
 622 </programlisting>
 623      Again, a table alias is required.  Assigning alias names to the columns
 624      of the <command>VALUES</> list is optional, but is good practice.
 625      For more information see <xref linkend="queries-values">.
 626     </para>
 627    </sect3>
 628
 629    <sect3 id="queries-tablefunctions">
 630     <title>Table Functions</title>
 631
 632     <indexterm zone="queries-tablefunctions"><primary>table function</></>
 633
 634     <indexterm zone="queries-tablefunctions">
 635      <primary>function</>
 636      <secondary>in the FROM clause</>
 637     </indexterm>
 638
 639     <para>
 640      Table functions are functions that produce a set of rows, made up
 641      of either base data types (scalar types) or composite data types
 642      (table rows).  They are used like a table, view, or subquery in
 643      the <literal>FROM</> clause of a query. Columns returned by table
 644      functions can be included in <literal>SELECT</>,
 645      <literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
 646      as a table, view, or subquery column.
 647     </para>
 648
 649     <para>
 650      If a table function returns a base data type, the single result
 651      column name matches the function name. If the function returns a
 652      composite type, the result columns get the same names as the
 653      individual attributes of the type.
 654     </para>
 655
 656     <para>
 657      A table function can be aliased in the <literal>FROM</> clause,
 658      but it also can be left unaliased. If a function is used in the
 659      <literal>FROM</> clause with no alias, the function name is used
 660      as the resulting table name.
 661     </para>
 662
 663     <para>
 664      Some examples:
 665 <programlisting>
 666 CREATE TABLE foo (fooid int, foosubid int, fooname text);
 667
 668 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
 669     SELECT * FROM foo WHERE fooid = $1;
 670 $$ LANGUAGE SQL;
 671
 672 SELECT * FROM getfoo(1) AS t1;
 673
 674 SELECT * FROM foo
 675     WHERE foosubid IN (
 676                         SELECT foosubid
 677                         FROM getfoo(foo.fooid) z
 678                         WHERE z.fooid = foo.fooid
 679                       );
 680
 681 CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
 682
 683 SELECT * FROM vw_getfoo;
 684 </programlisting>
 685     </para>
 686
 687     <para>
 688      In some cases it is useful to define table functions that can
 689      return different column sets depending on how they are invoked.
 690      To support this, the table function can be declared as returning
 691      the pseudotype <type>record</>.  When such a function is used in
 692      a query, the expected row structure must be specified in the
 693      query itself, so that the system can know how to parse and plan
 694      the query.  Consider this example:
 695 <programlisting>
 696 SELECT *
 697     FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
 698       AS t1(proname name, prosrc text)
 699     WHERE proname LIKE 'bytea%';
 700 </programlisting>
 701      The <xref linkend="CONTRIB-DBLINK-FUNCTION"> function
 702      (part of the <xref linkend="dblink"> module>) executes
 703      a remote query.  It is declared to return
 704      <type>record</> since it might be used for any kind of query.
 705      The actual column set must be specified in the calling query so
 706      that the parser knows, for example, what <literal>*</> should
 707      expand to.
 708     </para>
 709    </sect3>
 710
 711    <sect3 id="queries-lateral">
 712     <title><literal>LATERAL</> Subqueries</title>
 713
 714     <indexterm zone="queries-lateral">
 715      <primary>LATERAL</>
 716      <secondary>in the FROM clause</>
 717     </indexterm>
 718
 719     <para>
 720      Subqueries appearing in <literal>FROM</> can be
 721      preceded by the key word <literal>LATERAL</>.  This allows them to
 722      reference columns provided by preceding <literal>FROM</> items.
 723      (Without <literal>LATERAL</literal>, each subquery is
 724      evaluated independently and so cannot cross-reference any other
 725      <literal>FROM</> item.)
 726     </para>
 727
 728     <para>
 729      Table functions appearing in <literal>FROM</> can also be
 730      preceded by the key word <literal>LATERAL</>, but for functions the
 731      key word is optional; the function's arguments can contain references
 732      to columns provided by preceding <literal>FROM</> items in any case.
 733     </para>
 734
 735     <para>
 736      A <literal>LATERAL</literal> item can appear at top level in the
 737      <literal>FROM</> list, or within a <literal>JOIN</> tree.  In the latter
 738      case it can also refer to any items that are on the left-hand side of a
 739      <literal>JOIN</> that it is on the right-hand side of.
 740     </para>
 741
 742     <para>
 743      When a <literal>FROM</> item contains <literal>LATERAL</literal>
 744      cross-references, evaluation proceeds as follows: for each row of the
 745      <literal>FROM</> item providing the cross-referenced column(s), or
 746      set of rows of multiple <literal>FROM</> items providing the
 747      columns, the <literal>LATERAL</literal> item is evaluated using that
 748      row or row set's values of the columns.  The resulting row(s) are
 749      joined as usual with the rows they were computed from.  This is
 750      repeated for each row or set of rows from the column source table(s).
 751     </para>
 752
 753     <para>
 754      A trivial example of <literal>LATERAL</literal> is
 755 <programlisting>
 756 SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;
 757 </programlisting>
 758      This is not especially useful since it has exactly the same result as
 759      the more conventional
 760 <programlisting>
 761 SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;
 762 </programlisting>
 763      <literal>LATERAL</literal> is primarily useful when the cross-referenced
 764      column is necessary for computing the row(s) to be joined.  A common
 765      application is providing an argument value for a set-returning function.
 766      For example, supposing that <function>vertices(polygon)</> returns the
 767      set of vertices of a polygon, we could identify close-together vertices
 768      of polygons stored in a table with:
 769 <programlisting>
 770 SELECT p1.id, p2.id, v1, v2
 771 FROM polygons p1, polygons p2,
 772      LATERAL vertices(p1.poly) v1,
 773      LATERAL vertices(p2.poly) v2
 774 WHERE (v1 &lt;-&gt; v2) &lt; 10 AND p1.id != p2.id;
 775 </programlisting>
 776      This query could also be written
 777 <programlisting>
 778 SELECT p1.id, p2.id, v1, v2
 779 FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
 780      polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
 781 WHERE (v1 &lt;-&gt; v2) &lt; 10 AND p1.id != p2.id;
 782 </programlisting>
 783      or in several other equivalent formulations.  (As already mentioned,
 784      the <literal>LATERAL</> key word is unnecessary in this example, but
 785      we use it for clarity.)
 786     </para>
 787
 788     <para>
 789      It is often particularly handy to <literal>LEFT JOIN</> to a
 790      <literal>LATERAL</literal> subquery, so that source rows will appear in
 791      the result even if the <literal>LATERAL</literal> subquery produces no
 792      rows for them.  For example, if <function>get_product_names()</> returns
 793      the names of products made by a manufacturer, but some manufacturers in
 794      our table currently produce no products, we could find out which ones
 795      those are like this:
 796 <programlisting>
 797 SELECT m.name
 798 FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
 799 WHERE pname IS NULL;
 800 </programlisting>
 801     </para>
 802    </sect3>
 803   </sect2>
 804
 805   <sect2 id="queries-where">
 806    <title>The <literal>WHERE</literal> Clause</title>
 807
 808    <indexterm zone="queries-where">
 809     <primary>WHERE</primary>
 810    </indexterm>
 811
 812    <para>
 813     The syntax of the <xref linkend="sql-where"
 814     endterm="sql-where-title"> is
 815 <synopsis>
 816 WHERE <replaceable>search_condition</replaceable>
 817 </synopsis>
 818     where <replaceable>search_condition</replaceable> is any value
 819     expression (see <xref linkend="sql-expressions">) that
 820     returns a value of type <type>boolean</type>.
 821    </para>
 822
 823    <para>
 824     After the processing of the <literal>FROM</> clause is done, each
 825     row of the derived virtual table is checked against the search
 826     condition.  If the result of the condition is true, the row is
 827     kept in the output table, otherwise (i.e., if the result is
 828     false or null) it is discarded.  The search condition typically
 829     references at least one column of the table generated in the
 830     <literal>FROM</> clause; this is not required, but otherwise the
 831     <literal>WHERE</> clause will be fairly useless.
 832    </para>
 833
 834    <note>
 835     <para>
 836      The join condition of an inner join can be written either in
 837      the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
 838      For example, these table expressions are equivalent:
 839 <programlisting>
 840 FROM a, b WHERE a.id = b.id AND b.val &gt; 5
 841 </programlisting>
 842      and:
 843 <programlisting>
 844 FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
 845 </programlisting>
 846      or perhaps even:
 847 <programlisting>
 848 FROM a NATURAL JOIN b WHERE b.val &gt; 5
 849 </programlisting>
 850      Which one of these you use is mainly a matter of style.  The
 851      <literal>JOIN</> syntax in the <literal>FROM</> clause is
 852      probably not as portable to other SQL database management systems,
 853      even though it is in the SQL standard.  For
 854      outer joins there is no choice:  they must be done in
 855      the <literal>FROM</> clause.  The <literal>ON</> or <literal>USING</>
 856      clause of an outer join is <emphasis>not</> equivalent to a
 857      <literal>WHERE</> condition, because it results in the addition
 858      of rows (for unmatched input rows) as well as the removal of rows
 859      in the final result.
 860     </para>
 861    </note>
 862
 863    <para>
 864     Here are some examples of <literal>WHERE</literal> clauses:
 865 <programlisting>
 866 SELECT ... FROM fdt WHERE c1 &gt; 5
 867
 868 SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
 869
 870 SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
 871
 872 SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
 873
 874 SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
 875
 876 SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 &gt; fdt.c1)
 877 </programlisting>
 878     <literal>fdt</literal> is the table derived in the
 879     <literal>FROM</> clause. Rows that do not meet the search
 880     condition of the <literal>WHERE</> clause are eliminated from
 881     <literal>fdt</literal>. Notice the use of scalar subqueries as
 882     value expressions.  Just like any other query, the subqueries can
 883     employ complex table expressions.  Notice also how
 884     <literal>fdt</literal> is referenced in the subqueries.
 885     Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
 886     if <literal>c1</> is also the name of a column in the derived
 887     input table of the subquery.  But qualifying the column name adds
 888     clarity even when it is not needed.  This example shows how the column
 889     naming scope of an outer query extends into its inner queries.
 890    </para>
 891   </sect2>
 892
 893
 894   <sect2 id="queries-group">
 895    <title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
 896
 897    <indexterm zone="queries-group">
 898     <primary>GROUP BY</primary>
 899    </indexterm>
 900
 901    <indexterm zone="queries-group">
 902     <primary>grouping</primary>
 903    </indexterm>
 904
 905    <para>
 906     After passing the <literal>WHERE</> filter, the derived input
 907     table might be subject to grouping, using the <literal>GROUP BY</>
 908     clause, and elimination of group rows using the <literal>HAVING</>
 909     clause.
 910    </para>
 911
 912 <synopsis>
 913 SELECT <replaceable>select_list</replaceable>
 914     FROM ...
 915     <optional>WHERE ...</optional>
 916     GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
 917 </synopsis>
 918
 919    <para>
 920     The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
 921     used to group together those rows in a table that have the same
 922     values in all the columns listed. The order in which the columns
 923     are listed does not matter.  The effect is to combine each set
 924     of rows having common values into one group row that
 925     represents all rows in the group.  This is done to
 926     eliminate redundancy in the output and/or compute aggregates that
 927     apply to these groups.  For instance:
 928 <screen>
 929 <prompt>=&gt;</> <userinput>SELECT * FROM test1;</>
 930  x | y
 931 ---+---
 932  a | 3
 933  c | 2
 934  b | 5
 935  a | 1
 936 (4 rows)
 937
 938 <prompt>=&gt;</> <userinput>SELECT x FROM test1 GROUP BY x;</>
 939  x
 940 ---
 941  a
 942  b
 943  c
 944 (3 rows)
 945 </screen>
 946    </para>
 947
 948    <para>
 949     In the second query, we could not have written <literal>SELECT *
 950     FROM test1 GROUP BY x</literal>, because there is no single value
 951     for the column <literal>y</> that could be associated with each
 952     group.  The grouped-by columns can be referenced in the select list since
 953     they have a single value in each group.
 954    </para>
 955
 956    <para>
 957     In general, if a table is grouped, columns that are not
 958     listed in <literal>GROUP BY</> cannot be referenced except in aggregate
 959     expressions.  An example with aggregate expressions is:
 960 <screen>
 961 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
 962  x | sum
 963 ---+-----
 964  a |   4
 965  b |   5
 966  c |   2
 967 (3 rows)
 968 </screen>
 969     Here <literal>sum</literal> is an aggregate function that
 970     computes a single value over the entire group.  More information
 971     about the available aggregate functions can be found in <xref
 972     linkend="functions-aggregate">.
 973    </para>
 974
 975    <tip>
 976     <para>
 977      Grouping without aggregate expressions effectively calculates the
 978      set of distinct values in a column.  This can also be achieved
 979      using the <literal>DISTINCT</> clause (see <xref
 980      linkend="queries-distinct">).
 981     </para>
 982    </tip>
 983
 984    <para>
 985     Here is another example:  it calculates the total sales for each
 986     product (rather than the total sales of all products):
 987 <programlisting>
 988 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
 989     FROM products p LEFT JOIN sales s USING (product_id)
 990     GROUP BY product_id, p.name, p.price;
 991 </programlisting>
 992     In this example, the columns <literal>product_id</literal>,
 993     <literal>p.name</literal>, and <literal>p.price</literal> must be
 994     in the <literal>GROUP BY</> clause since they are referenced in
 995     the query select list (but see below).  The column
 996     <literal>s.units</> does not have to be in the <literal>GROUP
 997     BY</> list since it is only used in an aggregate expression
 998     (<literal>sum(...)</literal>), which represents the sales
 999     of a product.  For each product, the query returns a summary row about
1000     all sales of the product.
1001    </para>
1002
1003    <indexterm><primary>functional dependency</primary></indexterm>
1004
1005    <para>
1006     If the products table is set up so that, say,
1007     <literal>product_id</literal> is the primary key, then it would be
1008     enough to group by <literal>product_id</literal> in the above example,
1009     since name and price would be <firstterm>functionally
1010     dependent</firstterm> on the product ID, and so there would be no
1011     ambiguity about which name and price value to return for each product
1012     ID group.
1013    </para>
1014
1015    <para>
1016     In strict SQL, <literal>GROUP BY</> can only group by columns of
1017     the source table but <productname>PostgreSQL</productname> extends
1018     this to also allow <literal>GROUP BY</> to group by columns in the
1019     select list.  Grouping by value expressions instead of simple
1020     column names is also allowed.
1021    </para>
1022
1023    <indexterm>
1024     <primary>HAVING</primary>
1025    </indexterm>
1026
1027    <para>
1028     If a table has been grouped using <literal>GROUP BY</literal>,
1029     but only certain groups are of interest, the
1030     <literal>HAVING</literal> clause can be used, much like a
1031     <literal>WHERE</> clause, to eliminate groups from the result.
1032     The syntax is:
1033 <synopsis>
1034 SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
1035 </synopsis>
1036     Expressions in the <literal>HAVING</> clause can refer both to
1037     grouped expressions and to ungrouped expressions (which necessarily
1038     involve an aggregate function).
1039    </para>
1040
1041    <para>
1042     Example:
1043 <screen>
1044 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) &gt; 3;</>
1045  x | sum
1046 ---+-----
1047  a |   4
1048  b |   5
1049 (2 rows)
1050
1051 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x &lt; 'c';</>
1052  x | sum
1053 ---+-----
1054  a |   4
1055  b |   5
1056 (2 rows)
1057 </screen>
1058    </para>
1059
1060    <para>
1061     Again, a more realistic example:
1062 <programlisting>
1063 SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
1064     FROM products p LEFT JOIN sales s USING (product_id)
1065     WHERE s.date &gt; CURRENT_DATE - INTERVAL '4 weeks'
1066     GROUP BY product_id, p.name, p.price, p.cost
1067     HAVING sum(p.price * s.units) &gt; 5000;
1068 </programlisting>
1069     In the example above, the <literal>WHERE</> clause is selecting
1070     rows by a column that is not grouped (the expression is only true for
1071     sales during the last four weeks), while the <literal>HAVING</>
1072     clause restricts the output to groups with total gross sales over
1073     5000.  Note that the aggregate expressions do not necessarily need
1074     to be the same in all parts of the query.
1075    </para>
1076
1077    <para>
1078     If a query contains aggregate function calls, but no <literal>GROUP BY</>
1079     clause, grouping still occurs: the result is a single group row (or
1080     perhaps no rows at all, if the single row is then eliminated by
1081     <literal>HAVING</>).
1082     The same is true if it contains a <literal>HAVING</> clause, even
1083     without any aggregate function calls or <literal>GROUP BY</> clause.
1084    </para>
1085   </sect2>
1086
1087   <sect2 id="queries-window">
1088    <title>Window Function Processing</title>
1089
1090    <indexterm zone="queries-window">
1091     <primary>window function</primary>
1092     <secondary>order of execution</>
1093    </indexterm>
1094
1095    <para>
1096     If the query contains any window functions (see
1097     <xref linkend="tutorial-window">,
1098     <xref linkend="functions-window"> and
1099     <xref linkend="syntax-window-functions">), these functions are evaluated
1100     after any grouping, aggregation, and <literal>HAVING</> filtering is
1101     performed.  That is, if the query uses any aggregates, <literal>GROUP
1102     BY</>, or <literal>HAVING</>, then the rows seen by the window functions
1103     are the group rows instead of the original table rows from
1104     <literal>FROM</>/<literal>WHERE</>.
1105    </para>
1106
1107    <para>
1108     When multiple window functions are used, all the window functions having
1109     syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
1110     clauses in their window definitions are guaranteed to be evaluated in a
1111     single pass over the data. Therefore they will see the same sort ordering,
1112     even if the <literal>ORDER BY</> does not uniquely determine an ordering.
1113     However, no guarantees are made about the evaluation of functions having
1114     different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
1115     (In such cases a sort step is typically required between the passes of
1116     window function evaluations, and the sort is not guaranteed to preserve
1117     ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
1118    </para>
1119
1120    <para>
1121     Currently, window functions always require presorted data, and so the
1122     query output will be ordered according to one or another of the window
1123     functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
1124     It is not recommended to rely on this, however.  Use an explicit
1125     top-level <literal>ORDER BY</> clause if you want to be sure the
1126     results are sorted in a particular way.
1127    </para>
1128   </sect2>
1129  </sect1>
1130
1131
1132  <sect1 id="queries-select-lists">
1133   <title>Select Lists</title>
1134
1135   <indexterm>
1136    <primary>SELECT</primary>
1137    <secondary>select list</secondary>
1138   </indexterm>
1139
1140   <para>
1141    As shown in the previous section,
1142    the table expression in the <command>SELECT</command> command
1143    constructs an intermediate virtual table by possibly combining
1144    tables, views, eliminating rows, grouping, etc.  This table is
1145    finally passed on to processing by the <firstterm>select list</firstterm>.  The select
1146    list determines which <emphasis>columns</emphasis> of the
1147    intermediate table are actually output.
1148   </para>
1149
1150   <sect2 id="queries-select-list-items">
1151    <title>Select-List Items</title>
1152
1153    <indexterm>
1154     <primary>*</primary>
1155    </indexterm>
1156
1157    <para>
1158     The simplest kind of select list is <literal>*</literal> which
1159     emits all columns that the table expression produces.  Otherwise,
1160     a select list is a comma-separated list of value expressions (as
1161     defined in <xref linkend="sql-expressions">).  For instance, it
1162     could be a list of column names:
1163 <programlisting>
1164 SELECT a, b, c FROM ...
1165 </programlisting>
1166      The columns names <literal>a</>, <literal>b</>, and <literal>c</>
1167      are either the actual names of the columns of tables referenced
1168      in the <literal>FROM</> clause, or the aliases given to them as
1169      explained in <xref linkend="queries-table-aliases">.  The name
1170      space available in the select list is the same as in the
1171      <literal>WHERE</> clause, unless grouping is used, in which case
1172      it is the same as in the <literal>HAVING</> clause.
1173    </para>
1174
1175    <para>
1176     If more than one table has a column of the same name, the table
1177     name must also be given, as in:
1178 <programlisting>
1179 SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
1180 </programlisting>
1181     When working with multiple tables, it can also be useful to ask for
1182     all the columns of a particular table:
1183 <programlisting>
1184 SELECT tbl1.*, tbl2.a FROM ...
1185 </programlisting>
1186     (See also <xref linkend="queries-where">.)
1187    </para>
1188
1189    <para>
1190     If an arbitrary value expression is used in the select list, it
1191     conceptually adds a new virtual column to the returned table.  The
1192     value expression is evaluated once for each result row, with
1193     the row's values substituted for any column references.  But the
1194     expressions in the select list do not have to reference any
1195     columns in the table expression of the <literal>FROM</> clause;
1196     they can be constant arithmetic expressions, for instance.
1197    </para>
1198   </sect2>
1199
1200   <sect2 id="queries-column-labels">
1201    <title>Column Labels</title>
1202
1203    <indexterm zone="queries-column-labels">
1204     <primary>alias</primary>
1205     <secondary>in the select list</secondary>
1206    </indexterm>
1207
1208    <para>
1209     The entries in the select list can be assigned names for subsequent
1210     processing, such as for use in an <literal>ORDER BY</> clause
1211     or for display by the client application.  For example:
1212 <programlisting>
1213 SELECT a AS value, b + c AS sum FROM ...
1214 </programlisting>
1215    </para>
1216
1217    <para>
1218     If no output column name is specified using <literal>AS</>,
1219     the system assigns a default column name.  For simple column references,
1220     this is the name of the referenced column.  For function
1221     calls, this is the name of the function.  For complex expressions,
1222     the system will generate a generic name.
1223    </para>
1224
1225    <para>
1226     The <literal>AS</> keyword is optional, but only if the new column
1227     name does not match any
1228     <productname>PostgreSQL</productname> keyword (see <xref
1229     linkend="sql-keywords-appendix">).  To avoid an accidental match to
1230     a keyword, you can double-quote the column name.  For example,
1231     <literal>VALUE</> is a keyword, so this does not work:
1232 <programlisting>
1233 SELECT a value, b + c AS sum FROM ...
1234 </programlisting>
1235     but this does:
1236 <programlisting>
1237 SELECT a "value", b + c AS sum FROM ...
1238 </programlisting>
1239     For protection against possible
1240     future keyword additions, it is recommended that you always either
1241     write <literal>AS</literal> or double-quote the output column name.
1242    </para>
1243
1244    <note>
1245     <para>
1246      The naming of output columns here is different from that done in
1247      the <literal>FROM</> clause (see <xref
1248      linkend="queries-table-aliases">).  It is possible
1249      to rename the same column twice, but the name assigned in
1250      the select list is the one that will be passed on.
1251     </para>
1252    </note>
1253   </sect2>
1254
1255   <sect2 id="queries-distinct">
1256    <title><literal>DISTINCT</literal></title>
1257
1258    <indexterm zone="queries-distinct">
1259     <primary>DISTINCT</primary>
1260    </indexterm>
1261
1262    <indexterm zone="queries-distinct">
1263     <primary>duplicates</primary>
1264    </indexterm>
1265
1266    <para>
1267     After the select list has been processed, the result table can
1268     optionally be subject to the elimination of duplicate rows.  The
1269     <literal>DISTINCT</literal> key word is written directly after
1270     <literal>SELECT</literal> to specify this:
1271 <synopsis>
1272 SELECT DISTINCT <replaceable>select_list</replaceable> ...
1273 </synopsis>
1274     (Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
1275     can be used to specify the default behavior of retaining all rows.)
1276    </para>
1277
1278    <indexterm>
1279     <primary>null value</>
1280     <secondary sortas="DISTINCT">in DISTINCT</>
1281    </indexterm>
1282
1283    <para>
1284     Obviously, two rows are considered distinct if they differ in at
1285     least one column value.  Null values are considered equal in this
1286     comparison.
1287    </para>
1288
1289    <para>
1290     Alternatively, an arbitrary expression can determine what rows are
1291     to be considered distinct:
1292 <synopsis>
1293 SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
1294 </synopsis>
1295     Here <replaceable>expression</replaceable> is an arbitrary value
1296     expression that is evaluated for all rows.  A set of rows for
1297     which all the expressions are equal are considered duplicates, and
1298     only the first row of the set is kept in the output.  Note that
1299     the <quote>first row</quote> of a set is unpredictable unless the
1300     query is sorted on enough columns to guarantee a unique ordering
1301     of the rows arriving at the <literal>DISTINCT</> filter.
1302     (<literal>DISTINCT ON</> processing occurs after <literal>ORDER
1303     BY</> sorting.)
1304    </para>
1305
1306    <para>
1307     The <literal>DISTINCT ON</> clause is not part of the SQL standard
1308     and is sometimes considered bad style because of the potentially
1309     indeterminate nature of its results.  With judicious use of
1310     <literal>GROUP BY</> and subqueries in <literal>FROM</>, this
1311     construct can be avoided, but it is often the most convenient
1312     alternative.
1313    </para>
1314   </sect2>
1315  </sect1>
1316
1317
1318  <sect1 id="queries-union">
1319   <title>Combining Queries</title>
1320
1321   <indexterm zone="queries-union">
1322    <primary>UNION</primary>
1323   </indexterm>
1324   <indexterm zone="queries-union">
1325    <primary>INTERSECT</primary>
1326   </indexterm>
1327   <indexterm zone="queries-union">
1328    <primary>EXCEPT</primary>
1329   </indexterm>
1330   <indexterm zone="queries-union">
1331    <primary>set union</primary>
1332   </indexterm>
1333   <indexterm zone="queries-union">
1334    <primary>set intersection</primary>
1335   </indexterm>
1336   <indexterm zone="queries-union">
1337    <primary>set difference</primary>
1338   </indexterm>
1339   <indexterm zone="queries-union">
1340    <primary>set operation</primary>
1341   </indexterm>
1342
1343   <para>
1344    The results of two queries can be combined using the set operations
1345    union, intersection, and difference.  The syntax is
1346 <synopsis>
1347 <replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
1348 <replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
1349 <replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
1350 </synopsis>
1351    <replaceable>query1</replaceable> and
1352    <replaceable>query2</replaceable> are queries that can use any of
1353    the features discussed up to this point.  Set operations can also
1354    be nested and chained, for example
1355 <synopsis>
1356 <replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
1357 </synopsis>
1358    which is executed as:
1359 <synopsis>
1360 (<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
1361 </synopsis>
1362   </para>
1363
1364   <para>
1365    <literal>UNION</> effectively appends the result of
1366    <replaceable>query2</replaceable> to the result of
1367    <replaceable>query1</replaceable> (although there is no guarantee
1368    that this is the order in which the rows are actually returned).
1369    Furthermore, it eliminates duplicate rows from its result, in the same
1370    way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
1371   </para>
1372
1373   <para>
1374    <literal>INTERSECT</> returns all rows that are both in the result
1375    of <replaceable>query1</replaceable> and in the result of
1376    <replaceable>query2</replaceable>.  Duplicate rows are eliminated
1377    unless <literal>INTERSECT ALL</> is used.
1378   </para>
1379
1380   <para>
1381    <literal>EXCEPT</> returns all rows that are in the result of
1382    <replaceable>query1</replaceable> but not in the result of
1383    <replaceable>query2</replaceable>.  (This is sometimes called the
1384    <firstterm>difference</> between two queries.)  Again, duplicates
1385    are eliminated unless <literal>EXCEPT ALL</> is used.
1386   </para>
1387
1388   <para>
1389    In order to calculate the union, intersection, or difference of two
1390    queries, the two queries must be <quote>union compatible</quote>,
1391    which means that they return the same number of columns and
1392    the corresponding columns have compatible data types, as
1393    described in <xref linkend="typeconv-union-case">.
1394   </para>
1395  </sect1>
1396
1397
1398  <sect1 id="queries-order">
1399   <title>Sorting Rows</title>
1400
1401   <indexterm zone="queries-order">
1402    <primary>sorting</primary>
1403   </indexterm>
1404
1405   <indexterm zone="queries-order">
1406    <primary>ORDER BY</primary>
1407   </indexterm>
1408
1409   <para>
1410    After a query has produced an output table (after the select list
1411    has been processed) it can optionally be sorted.  If sorting is not
1412    chosen, the rows will be returned in an unspecified order.  The actual
1413    order in that case will depend on the scan and join plan types and
1414    the order on disk, but it must not be relied on.  A particular
1415    output ordering can only be guaranteed if the sort step is explicitly
1416    chosen.
1417   </para>
1418
1419   <para>
1420    The <literal>ORDER BY</> clause specifies the sort order:
1421 <synopsis>
1422 SELECT <replaceable>select_list</replaceable>
1423     FROM <replaceable>table_expression</replaceable>
1424     ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
1425              <optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
1426 </synopsis>
1427    The sort expression(s) can be any expression that would be valid in the
1428    query's select list.  An example is:
1429 <programlisting>
1430 SELECT a, b FROM table1 ORDER BY a + b, c;
1431 </programlisting>
1432    When more than one expression is specified,
1433    the later values are used to sort rows that are equal according to the
1434    earlier values.  Each expression can be followed by an optional
1435    <literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
1436    ascending or descending.  <literal>ASC</> order is the default.
1437    Ascending order puts smaller values first, where
1438    <quote>smaller</quote> is defined in terms of the
1439    <literal>&lt;</literal> operator.  Similarly, descending order is
1440    determined with the <literal>&gt;</literal> operator.
1441     <footnote>
1442      <para>
1443       Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
1444       operator class</> for the expression's data type to determine the sort
1445       ordering for <literal>ASC</> and <literal>DESC</>.  Conventionally,
1446       data types will be set up so that the <literal>&lt;</literal> and
1447       <literal>&gt;</literal> operators correspond to this sort ordering,
1448       but a user-defined data type's designer could choose to do something
1449       different.
1450      </para>
1451     </footnote>
1452   </para>
1453
1454   <para>
1455    The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
1456    used to determine whether nulls appear before or after non-null values
1457    in the sort ordering.  By default, null values sort as if larger than any
1458    non-null value; that is, <literal>NULLS FIRST</> is the default for
1459    <literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
1460   </para>
1461
1462   <para>
1463    Note that the ordering options are considered independently for each
1464    sort column.  For example <literal>ORDER BY x, y DESC</> means
1465    <literal>ORDER BY x ASC, y DESC</>, which is not the same as
1466    <literal>ORDER BY x DESC, y DESC</>.
1467   </para>
1468
1469   <para>
1470    A <replaceable>sort_expression</> can also be the column label or number
1471    of an output column, as in:
1472 <programlisting>
1473 SELECT a + b AS sum, c FROM table1 ORDER BY sum;
1474 SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
1475 </programlisting>
1476    both of which sort by the first output column.  Note that an output
1477    column name has to stand alone, that is, it cannot be used in an expression
1478    &mdash; for example, this is <emphasis>not</> correct:
1479 <programlisting>
1480 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;          -- wrong
1481 </programlisting>
1482    This restriction is made to reduce ambiguity.  There is still
1483    ambiguity if an <literal>ORDER BY</> item is a simple name that
1484    could match either an output column name or a column from the table
1485    expression.  The output column is used in such cases.  This would
1486    only cause confusion if you use <literal>AS</> to rename an output
1487    column to match some other table column's name.
1488   </para>
1489
1490   <para>
1491    <literal>ORDER BY</> can be applied to the result of a
1492    <literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
1493    combination, but in this case it is only permitted to sort by
1494    output column names or numbers, not by expressions.
1495   </para>
1496  </sect1>
1497
1498
1499  <sect1 id="queries-limit">
1500   <title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
1501
1502   <indexterm zone="queries-limit">
1503    <primary>LIMIT</primary>
1504   </indexterm>
1505
1506   <indexterm zone="queries-limit">
1507    <primary>OFFSET</primary>
1508   </indexterm>
1509
1510   <para>
1511    <literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
1512    a portion of the rows that are generated by the rest of the query:
1513 <synopsis>
1514 SELECT <replaceable>select_list</replaceable>
1515     FROM <replaceable>table_expression</replaceable>
1516     <optional> ORDER BY ... </optional>
1517     <optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
1518 </synopsis>
1519   </para>
1520
1521   <para>
1522    If a limit count is given, no more than that many rows will be
1523    returned (but possibly less, if the query itself yields less rows).
1524    <literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
1525    clause.
1526   </para>
1527
1528   <para>
1529    <literal>OFFSET</> says to skip that many rows before beginning to
1530    return rows.  <literal>OFFSET 0</> is the same as omitting the
1531    <literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
1532    as omitting the <literal>LIMIT</> clause.  If both <literal>OFFSET</>
1533    and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
1534    skipped before starting to count the <literal>LIMIT</> rows that
1535    are returned.
1536   </para>
1537
1538   <para>
1539    When using <literal>LIMIT</>, it is important to use an
1540    <literal>ORDER BY</> clause that constrains the result rows into a
1541    unique order.  Otherwise you will get an unpredictable subset of
1542    the query's rows. You might be asking for the tenth through
1543    twentieth rows, but tenth through twentieth in what ordering? The
1544    ordering is unknown, unless you specified <literal>ORDER BY</>.
1545   </para>
1546
1547   <para>
1548    The query optimizer takes <literal>LIMIT</> into account when
1549    generating query plans, so you are very likely to get different
1550    plans (yielding different row orders) depending on what you give
1551    for <literal>LIMIT</> and <literal>OFFSET</>.  Thus, using
1552    different <literal>LIMIT</>/<literal>OFFSET</> values to select
1553    different subsets of a query result <emphasis>will give
1554    inconsistent results</emphasis> unless you enforce a predictable
1555    result ordering with <literal>ORDER BY</>.  This is not a bug; it
1556    is an inherent consequence of the fact that SQL does not promise to
1557    deliver the results of a query in any particular order unless
1558    <literal>ORDER BY</> is used to constrain the order.
1559   </para>
1560
1561   <para>
1562    The rows skipped by an <literal>OFFSET</> clause still have to be
1563    computed inside the server; therefore a large <literal>OFFSET</>
1564    might be inefficient.
1565   </para>
1566  </sect1>
1567
1568
1569  <sect1 id="queries-values">
1570   <title><literal>VALUES</literal> Lists</title>
1571
1572   <indexterm zone="queries-values">
1573    <primary>VALUES</primary>
1574   </indexterm>
1575
1576   <para>
1577    <literal>VALUES</> provides a way to generate a <quote>constant table</>
1578    that can be used in a query without having to actually create and populate
1579    a table on-disk.  The syntax is
1580 <synopsis>
1581 VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
1582 </synopsis>
1583    Each parenthesized list of expressions generates a row in the table.
1584    The lists must all have the same number of elements (i.e., the number
1585    of columns in the table), and corresponding entries in each list must
1586    have compatible data types.  The actual data type assigned to each column
1587    of the result is determined using the same rules as for <literal>UNION</>
1588    (see <xref linkend="typeconv-union-case">).
1589   </para>
1590
1591   <para>
1592    As an example:
1593 <programlisting>
1594 VALUES (1, 'one'), (2, 'two'), (3, 'three');
1595 </programlisting>
1596
1597    will return a table of two columns and three rows.  It's effectively
1598    equivalent to:
1599 <programlisting>
1600 SELECT 1 AS column1, 'one' AS column2
1601 UNION ALL
1602 SELECT 2, 'two'
1603 UNION ALL
1604 SELECT 3, 'three';
1605 </programlisting>
1606
1607    By default, <productname>PostgreSQL</productname> assigns the names
1608    <literal>column1</>, <literal>column2</>, etc. to the columns of a
1609    <literal>VALUES</> table.  The column names are not specified by the
1610    SQL standard and different database systems do it differently, so
1611    it's usually better to override the default names with a table alias
1612    list.
1613   </para>
1614
1615   <para>
1616    Syntactically, <literal>VALUES</> followed by expression lists is
1617    treated as equivalent to:
1618 <synopsis>
1619 SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
1620 </synopsis>
1621    and can appear anywhere a <literal>SELECT</> can.  For example, you can
1622    use it as part of a <literal>UNION</>, or attach a
1623    <replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
1624    <literal>LIMIT</>, and/or <literal>OFFSET</>) to it.  <literal>VALUES</>
1625    is most commonly used as the data source in an <command>INSERT</> command,
1626    and next most commonly as a subquery.
1627   </para>
1628
1629   <para>
1630    For more information see <xref linkend="sql-values">.
1631   </para>
1632
1633  </sect1>
1634
1635
1636  <sect1 id="queries-with">
1637   <title><literal>WITH</literal> Queries (Common Table Expressions)</title>
1638
1639   <indexterm zone="queries-with">
1640    <primary>WITH</primary>
1641    <secondary>in SELECT</secondary>
1642   </indexterm>
1643
1644   <indexterm>
1645    <primary>common table expression</primary>
1646    <see>WITH</see>
1647   </indexterm>
1648
1649   <para>
1650    <literal>WITH</> provides a way to write auxiliary statements for use in a
1651    larger query.  These statements, which are often referred to as Common
1652    Table Expressions or <acronym>CTE</acronym>s, can be thought of as defining
1653    temporary tables that exist just for one query.  Each auxiliary statement
1654    in a <literal>WITH</> clause can be a <command>SELECT</>,
1655    <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>; and the
1656    <literal>WITH</> clause itself is attached to a primary statement that can
1657    also be a <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>, or
1658    <command>DELETE</>.
1659   </para>
1660
1661  <sect2 id="queries-with-select">
1662    <title><command>SELECT</> in <literal>WITH</></title>
1663
1664   <para>
1665    The basic value of <command>SELECT</> in <literal>WITH</> is to
1666    break down complicated queries into simpler parts.  An example is:
1667
1668 <programlisting>
1669 WITH regional_sales AS (
1670         SELECT region, SUM(amount) AS total_sales
1671         FROM orders
1672         GROUP BY region
1673      ), top_regions AS (
1674         SELECT region
1675         FROM regional_sales
1676         WHERE total_sales &gt; (SELECT SUM(total_sales)/10 FROM regional_sales)
1677      )
1678 SELECT region,
1679        product,
1680        SUM(quantity) AS product_units,
1681        SUM(amount) AS product_sales
1682 FROM orders
1683 WHERE region IN (SELECT region FROM top_regions)
1684 GROUP BY region, product;
1685 </programlisting>
1686
1687    which displays per-product sales totals in only the top sales regions.
1688    The <literal>WITH</> clause defines two auxiliary statements named
1689    <structname>regional_sales</> and <structname>top_regions</>,
1690    where the output of <structname>regional_sales</> is used in
1691    <structname>top_regions</> and the output of <structname>top_regions</>
1692    is used in the primary <command>SELECT</> query.
1693    This example could have been written without <literal>WITH</>,
1694    but we'd have needed two levels of nested sub-<command>SELECT</command>s.  It's a bit
1695    easier to follow this way.
1696   </para>
1697
1698   <para>
1699    The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
1700    from a mere syntactic convenience into a feature that accomplishes
1701    things not otherwise possible in standard SQL.  Using
1702    <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
1703    output.  A very simple example is this query to sum the integers from 1
1704    through 100:
1705
1706 <programlisting>
1707 WITH RECURSIVE t(n) AS (
1708     VALUES (1)
1709   UNION ALL
1710     SELECT n+1 FROM t WHERE n &lt; 100
1711 )
1712 SELECT sum(n) FROM t;
1713 </programlisting>
1714
1715    The general form of a recursive <literal>WITH</> query is always a
1716    <firstterm>non-recursive term</>, then <literal>UNION</> (or
1717    <literal>UNION ALL</>), then a
1718    <firstterm>recursive term</>, where only the recursive term can contain
1719    a reference to the query's own output.  Such a query is executed as
1720    follows:
1721   </para>
1722
1723   <procedure>
1724    <title>Recursive Query Evaluation</title>
1725
1726    <step performance="required">
1727     <para>
1728      Evaluate the non-recursive term.  For <literal>UNION</> (but not
1729      <literal>UNION ALL</>), discard duplicate rows.  Include all remaining
1730      rows in the result of the recursive query, and also place them in a
1731      temporary <firstterm>working table</>.
1732     </para>
1733    </step>
1734
1735    <step performance="required">
1736     <para>
1737      So long as the working table is not empty, repeat these steps:
1738     </para>
1739     <substeps>
1740      <step performance="required">
1741       <para>
1742        Evaluate the recursive term, substituting the current contents of
1743        the working table for the recursive self-reference.
1744        For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1745        duplicate rows and rows that duplicate any previous result row.
1746        Include all remaining rows in the result of the recursive query, and
1747        also place them in a temporary <firstterm>intermediate table</>.
1748       </para>
1749      </step>
1750
1751      <step performance="required">
1752       <para>
1753        Replace the contents of the working table with the contents of the
1754        intermediate table, then empty the intermediate table.
1755       </para>
1756      </step>
1757     </substeps>
1758    </step>
1759   </procedure>
1760
1761   <note>
1762    <para>
1763     Strictly speaking, this process is iteration not recursion, but
1764     <literal>RECURSIVE</> is the terminology chosen by the SQL standards
1765     committee.
1766    </para>
1767   </note>
1768
1769   <para>
1770    In the example above, the working table has just a single row in each step,
1771    and it takes on the values from 1 through 100 in successive steps.  In
1772    the 100th step, there is no output because of the <literal>WHERE</>
1773    clause, and so the query terminates.
1774   </para>
1775
1776   <para>
1777    Recursive queries are typically used to deal with hierarchical or
1778    tree-structured data.  A useful example is this query to find all the
1779    direct and indirect sub-parts of a product, given only a table that
1780    shows immediate inclusions:
1781
1782 <programlisting>
1783 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
1784     SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
1785   UNION ALL
1786     SELECT p.sub_part, p.part, p.quantity
1787     FROM included_parts pr, parts p
1788     WHERE p.part = pr.sub_part
1789   )
1790 SELECT sub_part, SUM(quantity) as total_quantity
1791 FROM included_parts
1792 GROUP BY sub_part
1793 </programlisting>
1794   </para>
1795
1796   <para>
1797    When working with recursive queries it is important to be sure that
1798    the recursive part of the query will eventually return no tuples,
1799    or else the query will loop indefinitely.  Sometimes, using
1800    <literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1801    by discarding rows that duplicate previous output rows.  However, often a
1802    cycle does not involve output rows that are completely duplicate: it may be
1803    necessary to check just one or a few fields to see if the same point has
1804    been reached before.  The standard method for handling such situations is
1805    to compute an array of the already-visited values.  For example, consider
1806    the following query that searches a table <structname>graph</> using a
1807    <structfield>link</> field:
1808
1809 <programlisting>
1810 WITH RECURSIVE search_graph(id, link, data, depth) AS (
1811         SELECT g.id, g.link, g.data, 1
1812         FROM graph g
1813       UNION ALL
1814         SELECT g.id, g.link, g.data, sg.depth + 1
1815         FROM graph g, search_graph sg
1816         WHERE g.id = sg.link
1817 )
1818 SELECT * FROM search_graph;
1819 </programlisting>
1820
1821    This query will loop if the <structfield>link</> relationships contain
1822    cycles.  Because we require a <quote>depth</> output, just changing
1823    <literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
1824    Instead we need to recognize whether we have reached the same row again
1825    while following a particular path of links.  We add two columns
1826    <structfield>path</> and <structfield>cycle</> to the loop-prone query:
1827
1828 <programlisting>
1829 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1830         SELECT g.id, g.link, g.data, 1,
1831           ARRAY[g.id],
1832           false
1833         FROM graph g
1834       UNION ALL
1835         SELECT g.id, g.link, g.data, sg.depth + 1,
1836           path || g.id,
1837           g.id = ANY(path)
1838         FROM graph g, search_graph sg
1839         WHERE g.id = sg.link AND NOT cycle
1840 )
1841 SELECT * FROM search_graph;
1842 </programlisting>
1843
1844    Aside from preventing cycles, the array value is often useful in its own
1845    right as representing the <quote>path</> taken to reach any particular row.
1846   </para>
1847
1848   <para>
1849    In the general case where more than one field needs to be checked to
1850    recognize a cycle, use an array of rows.  For example, if we needed to
1851    compare fields <structfield>f1</> and <structfield>f2</>:
1852
1853 <programlisting>
1854 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1855         SELECT g.id, g.link, g.data, 1,
1856           ARRAY[ROW(g.f1, g.f2)],
1857           false
1858         FROM graph g
1859       UNION ALL
1860         SELECT g.id, g.link, g.data, sg.depth + 1,
1861           path || ROW(g.f1, g.f2),
1862           ROW(g.f1, g.f2) = ANY(path)
1863         FROM graph g, search_graph sg
1864         WHERE g.id = sg.link AND NOT cycle
1865 )
1866 SELECT * FROM search_graph;
1867 </programlisting>
1868   </para>
1869
1870   <tip>
1871    <para>
1872     Omit the <literal>ROW()</> syntax in the common case where only one field
1873     needs to be checked to recognize a cycle.  This allows a simple array
1874     rather than a composite-type array to be used, gaining efficiency.
1875    </para>
1876   </tip>
1877
1878   <tip>
1879    <para>
1880     The recursive query evaluation algorithm produces its output in
1881     breadth-first search order.  You can display the results in depth-first
1882     search order by making the outer query <literal>ORDER BY</> a
1883     <quote>path</> column constructed in this way.
1884    </para>
1885   </tip>
1886
1887   <para>
1888    A helpful trick for testing queries
1889    when you are not certain if they might loop is to place a <literal>LIMIT</>
1890    in the parent query.  For example, this query would loop forever without
1891    the <literal>LIMIT</>:
1892
1893 <programlisting>
1894 WITH RECURSIVE t(n) AS (
1895     SELECT 1
1896   UNION ALL
1897     SELECT n+1 FROM t
1898 )
1899 SELECT n FROM t LIMIT 100;
1900 </programlisting>
1901
1902    This works because <productname>PostgreSQL</productname>'s implementation
1903    evaluates only as many rows of a <literal>WITH</> query as are actually
1904    fetched by the parent query.  Using this trick in production is not
1905    recommended, because other systems might work differently.  Also, it
1906    usually won't work if you make the outer query sort the recursive query's
1907    results or join them to some other table, because in such cases the
1908    outer query will usually try to fetch all of the <literal>WITH</> query's
1909    output anyway.
1910   </para>
1911
1912   <para>
1913    A useful property of <literal>WITH</> queries is that they are evaluated
1914    only once per execution of the parent query, even if they are referred to
1915    more than once by the parent query or sibling <literal>WITH</> queries.
1916    Thus, expensive calculations that are needed in multiple places can be
1917    placed within a <literal>WITH</> query to avoid redundant work.  Another
1918    possible application is to prevent unwanted multiple evaluations of
1919    functions with side-effects.
1920    However, the other side of this coin is that the optimizer is less able to
1921    push restrictions from the parent query down into a <literal>WITH</> query
1922    than an ordinary sub-query.  The <literal>WITH</> query will generally be
1923    evaluated as written, without suppression of rows that the parent query
1924    might discard afterwards.  (But, as mentioned above, evaluation might stop
1925    early if the reference(s) to the query demand only a limited number of
1926    rows.)
1927   </para>
1928
1929   <para>
1930    The examples above only show <literal>WITH</> being used with
1931    <command>SELECT</>, but it can be attached in the same way to
1932    <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>.
1933    In each case it effectively provides temporary table(s) that can
1934    be referred to in the main command.
1935   </para>
1936  </sect2>
1937
1938  <sect2 id="queries-with-modifying">
1939    <title>Data-Modifying Statements in <literal>WITH</></title>
1940
1941    <para>
1942     You can use data-modifying statements (<command>INSERT</>,
1943     <command>UPDATE</>, or <command>DELETE</>) in <literal>WITH</>.  This
1944     allows you to perform several different operations in the same query.
1945     An example is:
1946
1947 <programlisting>
1948 WITH moved_rows AS (
1949     DELETE FROM products
1950     WHERE
1951         "date" &gt;= '2010-10-01' AND
1952         "date" &lt; '2010-11-01'
1953     RETURNING *
1954 )
1955 INSERT INTO products_log
1956 SELECT * FROM moved_rows;
1957 </programlisting>
1958
1959     This query effectively moves rows from <structname>products</> to
1960     <structname>products_log</>.  The <command>DELETE</> in <literal>WITH</>
1961     deletes the specified rows from <structname>products</>, returning their
1962     contents by means of its <literal>RETURNING</> clause; and then the
1963     primary query reads that output and inserts it into
1964     <structname>products_log</>.
1965    </para>
1966
1967    <para>
1968     A fine point of the above example is that the <literal>WITH</> clause is
1969     attached to the <command>INSERT</>, not the sub-<command>SELECT</> within
1970     the <command>INSERT</>.  This is necessary because data-modifying
1971     statements are only allowed in <literal>WITH</> clauses that are attached
1972     to the top-level statement.  However, normal <literal>WITH</> visibility
1973     rules apply, so it is possible to refer to the <literal>WITH</>
1974     statement's output from the sub-<command>SELECT</>.
1975    </para>
1976
1977    <para>
1978     Data-modifying statements in <literal>WITH</> usually have
1979     <literal>RETURNING</> clauses, as seen in the example above.
1980     It is the output of the <literal>RETURNING</> clause, <emphasis>not</> the
1981     target table of the data-modifying statement, that forms the temporary
1982     table that can be referred to by the rest of the query.  If a
1983     data-modifying statement in <literal>WITH</> lacks a <literal>RETURNING</>
1984     clause, then it forms no temporary table and cannot be referred to in
1985     the rest of the query.  Such a statement will be executed nonetheless.
1986     A not-particularly-useful example is:
1987
1988 <programlisting>
1989 WITH t AS (
1990     DELETE FROM foo
1991 )
1992 DELETE FROM bar;
1993 </programlisting>
1994
1995     This example would remove all rows from tables <structname>foo</> and
1996     <structname>bar</>.  The number of affected rows reported to the client
1997     would only include rows removed from <structname>bar</>.
1998    </para>
1999
2000    <para>
2001     Recursive self-references in data-modifying statements are not
2002     allowed.  In some cases it is possible to work around this limitation by
2003     referring to the output of a recursive <literal>WITH</>, for example:
2004
2005 <programlisting>
2006 WITH RECURSIVE included_parts(sub_part, part) AS (
2007     SELECT sub_part, part FROM parts WHERE part = 'our_product'
2008   UNION ALL
2009     SELECT p.sub_part, p.part
2010     FROM included_parts pr, parts p
2011     WHERE p.part = pr.sub_part
2012   )
2013 DELETE FROM parts
2014   WHERE part IN (SELECT part FROM included_parts);
2015 </programlisting>
2016
2017     This query would remove all direct and indirect subparts of a product.
2018    </para>
2019
2020    <para>
2021     Data-modifying statements in <literal>WITH</> are executed exactly once,
2022     and always to completion, independently of whether the primary query
2023     reads all (or indeed any) of their output.  Notice that this is different
2024     from the rule for <command>SELECT</> in <literal>WITH</>: as stated in the
2025     previous section, execution of a <command>SELECT</> is carried only as far
2026     as the primary query demands its output.
2027    </para>
2028
2029    <para>
2030     The sub-statements in <literal>WITH</> are executed concurrently with
2031     each other and with the main query.  Therefore, when using data-modifying
2032     statements in <literal>WITH</>, the order in which the specified updates
2033     actually happen is unpredictable.  All the statements are executed with
2034     the same <firstterm>snapshot</> (see <xref linkend="mvcc">), so they
2035     cannot <quote>see</> each others' effects on the target tables.  This
2036     alleviates the effects of the unpredictability of the actual order of row
2037     updates, and means that <literal>RETURNING</> data is the only way to
2038     communicate changes between different <literal>WITH</> sub-statements and
2039     the main query.  An example of this is that in
2040
2041 <programlisting>
2042 WITH t AS (
2043     UPDATE products SET price = price * 1.05
2044     RETURNING *
2045 )
2046 SELECT * FROM products;
2047 </programlisting>
2048
2049     the outer <command>SELECT</> would return the original prices before the
2050     action of the <command>UPDATE</>, while in
2051
2052 <programlisting>
2053 WITH t AS (
2054     UPDATE products SET price = price * 1.05
2055     RETURNING *
2056 )
2057 SELECT * FROM t;
2058 </programlisting>
2059
2060     the outer <command>SELECT</> would return the updated data.
2061    </para>
2062
2063    <para>
2064     Trying to update the same row twice in a single statement is not
2065     supported.  Only one of the modifications takes place, but it is not easy
2066     (and sometimes not possible) to reliably predict which one.  This also
2067     applies to deleting a row that was already updated in the same statement:
2068     only the update is performed.  Therefore you should generally avoid trying
2069     to modify a single row twice in a single statement.  In particular avoid
2070     writing <literal>WITH</> sub-statements that could affect the same rows
2071     changed by the main statement or a sibling sub-statement.  The effects
2072     of such a statement will not be predictable.
2073    </para>
2074
2075    <para>
2076     At present, any table used as the target of a data-modifying statement in
2077     <literal>WITH</> must not have a conditional rule, nor an <literal>ALSO</>
2078     rule, nor an <literal>INSTEAD</> rule that expands to multiple statements.
2079    </para>
2080
2081   </sect2>
2082
2083  </sect1>
2084
2085 </chapter>