granicus.if.org Git - postgresql/blob - doc/src/sgml/queries.sgml

   1 <!-- doc/src/sgml/queries.sgml -->
   2
   3 <chapter id="queries">
   4  <title>Queries</title>
   5
   6  <indexterm zone="queries">
   7   <primary>query</primary>
   8  </indexterm>
   9
  10  <indexterm zone="queries">
  11   <primary>SELECT</primary>
  12  </indexterm>
  13
  14  <para>
  15   The previous chapters explained how to create tables, how to fill
  16   them with data, and how to manipulate that data.  Now we finally
  17   discuss how to retrieve the data from the database.
  18  </para>
  19
  20
  21  <sect1 id="queries-overview">
  22   <title>Overview</title>
  23
  24   <para>
  25    The process of retrieving or the command to retrieve data from a
  26    database is called a <firstterm>query</firstterm>.  In SQL the
  27    <xref linkend="sql-select"> command is
  28    used to specify queries.  The general syntax of the
  29    <command>SELECT</command> command is
  30 <synopsis>
  31 <optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
  32 </synopsis>
  33    The following sections describe the details of the select list, the
  34    table expression, and the sort specification.  <literal>WITH</>
  35    queries are treated last since they are an advanced feature.
  36   </para>
  37
  38   <para>
  39    A simple kind of query has the form:
  40 <programlisting>
  41 SELECT * FROM table1;
  42 </programlisting>
  43   Assuming that there is a table called <literal>table1</literal>,
  44   this command would retrieve all rows and all columns from
  45   <literal>table1</literal>.  (The method of retrieval depends on the
  46   client application.  For example, the
  47   <application>psql</application> program will display an ASCII-art
  48   table on the screen, while client libraries will offer functions to
  49   extract individual values from the query result.)  The select list
  50   specification <literal>*</literal> means all columns that the table
  51   expression happens to provide.  A select list can also select a
  52   subset of the available columns or make calculations using the
  53   columns.  For example, if
  54   <literal>table1</literal> has columns named <literal>a</>,
  55   <literal>b</>, and <literal>c</> (and perhaps others) you can make
  56   the following query:
  57 <programlisting>
  58 SELECT a, b + c FROM table1;
  59 </programlisting>
  60   (assuming that <literal>b</> and <literal>c</> are of a numerical
  61   data type).
  62   See <xref linkend="queries-select-lists"> for more details.
  63  </para>
  64
  65  <para>
  66   <literal>FROM table1</literal> is a simple kind of
  67   table expression: it reads just one table.  In general, table
  68   expressions can be complex constructs of base tables, joins, and
  69   subqueries.  But you can also omit the table expression entirely and
  70   use the <command>SELECT</command> command as a calculator:
  71 <programlisting>
  72 SELECT 3 * 4;
  73 </programlisting>
  74   This is more useful if the expressions in the select list return
  75   varying results.  For example, you could call a function this way:
  76 <programlisting>
  77 SELECT random();
  78 </programlisting>
  79   </para>
  80  </sect1>
  81
  82
  83  <sect1 id="queries-table-expressions">
  84   <title>Table Expressions</title>
  85
  86   <indexterm zone="queries-table-expressions">
  87    <primary>table expression</primary>
  88   </indexterm>
  89
  90   <para>
  91    A <firstterm>table expression</firstterm> computes a table.  The
  92    table expression contains a <literal>FROM</> clause that is
  93    optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
  94    <literal>HAVING</> clauses.  Trivial table expressions simply refer
  95    to a table on disk, a so-called base table, but more complex
  96    expressions can be used to modify or combine base tables in various
  97    ways.
  98   </para>
  99
 100   <para>
 101    The optional <literal>WHERE</>, <literal>GROUP BY</>, and
 102    <literal>HAVING</> clauses in the table expression specify a
 103    pipeline of successive transformations performed on the table
 104    derived in the <literal>FROM</> clause.  All these transformations
 105    produce a virtual table that provides the rows that are passed to
 106    the select list to compute the output rows of the query.
 107   </para>
 108
 109   <sect2 id="queries-from">
 110    <title>The <literal>FROM</literal> Clause</title>
 111
 112    <para>
 113     The <xref linkend="sql-from" endterm="sql-from-title"> derives a
 114     table from one or more other tables given in a comma-separated
 115     table reference list.
 116 <synopsis>
 117 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
 118 </synopsis>
 119
 120     A table reference can be a table name (possibly schema-qualified),
 121     or a derived table such as a subquery, a table join, or complex
 122     combinations of these.  If more than one table reference is listed
 123     in the <literal>FROM</> clause they are cross-joined (see below)
 124     to form the intermediate virtual table that can then be subject to
 125     transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
 126     and <literal>HAVING</> clauses and is finally the result of the
 127     overall table expression.
 128    </para>
 129
 130    <indexterm>
 131     <primary>ONLY</primary>
 132    </indexterm>
 133
 134    <para>
 135     When a table reference names a table that is the parent of a
 136     table inheritance hierarchy, the table reference produces rows of
 137     not only that table but all of its descendant tables, unless the
 138     key word <literal>ONLY</> precedes the table name.  However, the
 139     reference produces only the columns that appear in the named table
 140     &mdash; any columns added in subtables are ignored.
 141    </para>
 142
 143    <sect3 id="queries-join">
 144     <title>Joined Tables</title>
 145
 146     <indexterm zone="queries-join">
 147      <primary>join</primary>
 148     </indexterm>
 149
 150     <para>
 151      A joined table is a table derived from two other (real or
 152      derived) tables according to the rules of the particular join
 153      type.  Inner, outer, and cross-joins are available.
 154     </para>
 155
 156     <variablelist>
 157      <title>Join Types</title>
 158
 159      <varlistentry>
 160       <term>Cross join</term>
 161
 162       <indexterm>
 163        <primary>join</primary>
 164        <secondary>cross</secondary>
 165       </indexterm>
 166
 167       <indexterm>
 168        <primary>cross join</primary>
 169       </indexterm>
 170
 171       <listitem>
 172 <synopsis>
 173 <replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
 174 </synopsis>
 175
 176        <para>
 177         For every possible combination of rows from
 178         <replaceable>T1</replaceable> and
 179         <replaceable>T2</replaceable> (i.e., a Cartesian product),
 180         the joined table will contain a
 181         row consisting of all columns in <replaceable>T1</replaceable>
 182         followed by all columns in <replaceable>T2</replaceable>.  If
 183         the tables have N and M rows respectively, the joined
 184         table will have N * M rows.
 185        </para>
 186
 187        <para>
 188         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
 189         <replaceable>T2</replaceable></literal> is equivalent to
 190         <literal>FROM <replaceable>T1</replaceable>,
 191         <replaceable>T2</replaceable></literal>.  It is also equivalent to
 192         <literal>FROM <replaceable>T1</replaceable> INNER JOIN
 193         <replaceable>T2</replaceable> ON TRUE</literal> (see below).
 194        </para>
 195       </listitem>
 196      </varlistentry>
 197
 198      <varlistentry>
 199       <term>Qualified joins</term>
 200
 201       <indexterm>
 202        <primary>join</primary>
 203        <secondary>outer</secondary>
 204       </indexterm>
 205
 206       <indexterm>
 207        <primary>outer join</primary>
 208       </indexterm>
 209
 210       <listitem>
 211 <synopsis>
 212 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
 213 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
 214 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
 215 </synopsis>
 216
 217        <para>
 218         The words <literal>INNER</literal> and
 219         <literal>OUTER</literal> are optional in all forms.
 220         <literal>INNER</literal> is the default;
 221         <literal>LEFT</literal>, <literal>RIGHT</literal>, and
 222         <literal>FULL</literal> imply an outer join.
 223        </para>
 224
 225        <para>
 226         The <firstterm>join condition</firstterm> is specified in the
 227         <literal>ON</> or <literal>USING</> clause, or implicitly by
 228         the word <literal>NATURAL</>.  The join condition determines
 229         which rows from the two source tables are considered to
 230         <quote>match</quote>, as explained in detail below.
 231        </para>
 232
 233        <para>
 234         The <literal>ON</> clause is the most general kind of join
 235         condition: it takes a Boolean value expression of the same
 236         kind as is used in a <literal>WHERE</> clause.  A pair of rows
 237         from <replaceable>T1</> and <replaceable>T2</> match if the
 238         <literal>ON</> expression evaluates to true for them.
 239        </para>
 240
 241        <para>
 242         <literal>USING</> is a shorthand notation: it takes a
 243         comma-separated list of column names, which the joined tables
 244         must have in common, and forms a join condition specifying
 245         equality of each of these pairs of columns.  Furthermore, the
 246         output of <literal>JOIN USING</> has one column for each of
 247         the equated pairs of input columns, followed by the
 248         remaining columns from each table.  Thus, <literal>USING (a, b,
 249         c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
 250         t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
 251         if <literal>ON</> is used there will be two columns
 252         <literal>a</>, <literal>b</>, and <literal>c</> in the result,
 253         whereas with <literal>USING</> there will be only one of each
 254         (and they will appear first if <command>SELECT *</> is used).
 255        </para>
 256
 257        <para>
 258         <indexterm>
 259          <primary>join</primary>
 260          <secondary>natural</secondary>
 261         </indexterm>
 262         <indexterm>
 263          <primary>natural join</primary>
 264         </indexterm>
 265         Finally, <literal>NATURAL</> is a shorthand form of
 266         <literal>USING</>: it forms a <literal>USING</> list
 267         consisting of all column names that appear in both
 268         input tables.  As with <literal>USING</>, these columns appear
 269         only once in the output table.
 270        </para>
 271
 272        <para>
 273         The possible types of qualified join are:
 274
 275        <variablelist>
 276         <varlistentry>
 277          <term><literal>INNER JOIN</></term>
 278
 279          <listitem>
 280           <para>
 281            For each row R1 of T1, the joined table has a row for each
 282            row in T2 that satisfies the join condition with R1.
 283           </para>
 284          </listitem>
 285         </varlistentry>
 286
 287         <varlistentry>
 288          <term><literal>LEFT OUTER JOIN</></term>
 289
 290          <indexterm>
 291           <primary>join</primary>
 292           <secondary>left</secondary>
 293          </indexterm>
 294
 295          <indexterm>
 296           <primary>left join</primary>
 297          </indexterm>
 298
 299          <listitem>
 300           <para>
 301            First, an inner join is performed.  Then, for each row in
 302            T1 that does not satisfy the join condition with any row in
 303            T2, a joined row is added with null values in columns of
 304            T2.  Thus, the joined table always has at least
 305            one row for each row in T1.
 306           </para>
 307          </listitem>
 308         </varlistentry>
 309
 310         <varlistentry>
 311          <term><literal>RIGHT OUTER JOIN</></term>
 312
 313          <indexterm>
 314           <primary>join</primary>
 315           <secondary>right</secondary>
 316          </indexterm>
 317
 318          <indexterm>
 319           <primary>right join</primary>
 320          </indexterm>
 321
 322          <listitem>
 323           <para>
 324            First, an inner join is performed.  Then, for each row in
 325            T2 that does not satisfy the join condition with any row in
 326            T1, a joined row is added with null values in columns of
 327            T1.  This is the converse of a left join: the result table
 328            will always have a row for each row in T2.
 329           </para>
 330          </listitem>
 331         </varlistentry>
 332
 333         <varlistentry>
 334          <term><literal>FULL OUTER JOIN</></term>
 335
 336          <listitem>
 337           <para>
 338            First, an inner join is performed.  Then, for each row in
 339            T1 that does not satisfy the join condition with any row in
 340            T2, a joined row is added with null values in columns of
 341            T2.  Also, for each row of T2 that does not satisfy the
 342            join condition with any row in T1, a joined row with null
 343            values in the columns of T1 is added.
 344           </para>
 345          </listitem>
 346         </varlistentry>
 347        </variablelist>
 348        </para>
 349       </listitem>
 350      </varlistentry>
 351     </variablelist>
 352
 353     <para>
 354      Joins of all types can be chained together or nested: either or
 355      both <replaceable>T1</replaceable> and
 356      <replaceable>T2</replaceable> can be joined tables.  Parentheses
 357      can be used around <literal>JOIN</> clauses to control the join
 358      order.  In the absence of parentheses, <literal>JOIN</> clauses
 359      nest left-to-right.
 360     </para>
 361
 362     <para>
 363      To put this together, assume we have tables <literal>t1</literal>:
 364 <programlisting>
 365  num | name
 366 -----+------
 367    1 | a
 368    2 | b
 369    3 | c
 370 </programlisting>
 371      and <literal>t2</literal>:
 372 <programlisting>
 373  num | value
 374 -----+-------
 375    1 | xxx
 376    3 | yyy
 377    5 | zzz
 378 </programlisting>
 379      then we get the following results for the various joins:
 380 <screen>
 381 <prompt>=&gt;</> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
 382  num | name | num | value
 383 -----+------+-----+-------
 384    1 | a    |   1 | xxx
 385    1 | a    |   3 | yyy
 386    1 | a    |   5 | zzz
 387    2 | b    |   1 | xxx
 388    2 | b    |   3 | yyy
 389    2 | b    |   5 | zzz
 390    3 | c    |   1 | xxx
 391    3 | c    |   3 | yyy
 392    3 | c    |   5 | zzz
 393 (9 rows)
 394
 395 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
 396  num | name | num | value
 397 -----+------+-----+-------
 398    1 | a    |   1 | xxx
 399    3 | c    |   3 | yyy
 400 (2 rows)
 401
 402 <prompt>=&gt;</> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
 403  num | name | value
 404 -----+------+-------
 405    1 | a    | xxx
 406    3 | c    | yyy
 407 (2 rows)
 408
 409 <prompt>=&gt;</> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
 410  num | name | value
 411 -----+------+-------
 412    1 | a    | xxx
 413    3 | c    | yyy
 414 (2 rows)
 415
 416 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
 417  num | name | num | value
 418 -----+------+-----+-------
 419    1 | a    |   1 | xxx
 420    2 | b    |     |
 421    3 | c    |   3 | yyy
 422 (3 rows)
 423
 424 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
 425  num | name | value
 426 -----+------+-------
 427    1 | a    | xxx
 428    2 | b    |
 429    3 | c    | yyy
 430 (3 rows)
 431
 432 <prompt>=&gt;</> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
 433  num | name | num | value
 434 -----+------+-----+-------
 435    1 | a    |   1 | xxx
 436    3 | c    |   3 | yyy
 437      |      |   5 | zzz
 438 (3 rows)
 439
 440 <prompt>=&gt;</> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
 441  num | name | num | value
 442 -----+------+-----+-------
 443    1 | a    |   1 | xxx
 444    2 | b    |     |
 445    3 | c    |   3 | yyy
 446      |      |   5 | zzz
 447 (4 rows)
 448 </screen>
 449     </para>
 450
 451     <para>
 452      The join condition specified with <literal>ON</> can also contain
 453      conditions that do not relate directly to the join.  This can
 454      prove useful for some queries but needs to be thought out
 455      carefully.  For example:
 456 <screen>
 457 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
 458  num | name | num | value
 459 -----+------+-----+-------
 460    1 | a    |   1 | xxx
 461    2 | b    |     |
 462    3 | c    |     |
 463 (3 rows)
 464 </screen>
 465      Notice that placing the restriction in the <literal>WHERE</> clause
 466      produces a different result:
 467 <screen>
 468 <prompt>=&gt;</> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';</>
 469  num | name | num | value
 470 -----+------+-----+-------
 471    1 | a    |   1 | xxx
 472 (1 row)
 473 </screen>
 474      This is because a restriction placed in the <literal>ON</>
 475      clause is processed <emphasis>before</> the join, while
 476      a restriction placed in the <literal>WHERE</> clause is processed
 477      <emphasis>after</> the join.
 478     </para>
 479    </sect3>
 480
 481    <sect3 id="queries-table-aliases">
 482     <title>Table and Column Aliases</title>
 483
 484     <indexterm zone="queries-table-aliases">
 485      <primary>alias</primary>
 486      <secondary>in the FROM clause</secondary>
 487     </indexterm>
 488
 489     <indexterm>
 490      <primary>label</primary>
 491      <see>alias</see>
 492     </indexterm>
 493
 494     <para>
 495      A temporary name can be given to tables and complex table
 496      references to be used for references to the derived table in
 497      the rest of the query.  This is called a <firstterm>table
 498      alias</firstterm>.
 499     </para>
 500
 501     <para>
 502      To create a table alias, write
 503 <synopsis>
 504 FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
 505 </synopsis>
 506      or
 507 <synopsis>
 508 FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
 509 </synopsis>
 510      The <literal>AS</literal> key word is optional noise.
 511      <replaceable>alias</replaceable> can be any identifier.
 512     </para>
 513
 514     <para>
 515      A typical application of table aliases is to assign short
 516      identifiers to long table names to keep the join clauses
 517      readable.  For example:
 518 <programlisting>
 519 SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
 520 </programlisting>
 521     </para>
 522
 523     <para>
 524      The alias becomes the new name of the table reference so far as the
 525      current query is concerned &mdash; it is not allowed to refer to the
 526      table by the original name elsewhere in the query.  Thus, this is not
 527      valid:
 528 <programlisting>
 529 SELECT * FROM my_table AS m WHERE my_table.a &gt; 5;    -- wrong
 530 </programlisting>
 531     </para>
 532
 533     <para>
 534      Table aliases are mainly for notational convenience, but it is
 535      necessary to use them when joining a table to itself, e.g.:
 536 <programlisting>
 537 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
 538 </programlisting>
 539      Additionally, an alias is required if the table reference is a
 540      subquery (see <xref linkend="queries-subqueries">).
 541     </para>
 542
 543     <para>
 544      Parentheses are used to resolve ambiguities.  In the following example,
 545      the first statement assigns the alias <literal>b</literal> to the second
 546      instance of <literal>my_table</>, but the second statement assigns the
 547      alias to the result of the join:
 548 <programlisting>
 549 SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
 550 SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
 551 </programlisting>
 552     </para>
 553
 554     <para>
 555      Another form of table aliasing gives temporary names to the columns of
 556      the table, as well as the table itself:
 557 <synopsis>
 558 FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
 559 </synopsis>
 560      If fewer column aliases are specified than the actual table has
 561      columns, the remaining columns are not renamed.  This syntax is
 562      especially useful for self-joins or subqueries.
 563     </para>
 564
 565     <para>
 566      When an alias is applied to the output of a <literal>JOIN</>
 567      clause, the alias hides the original
 568      name(s) within the <literal>JOIN</>.  For example:
 569 <programlisting>
 570 SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
 571 </programlisting>
 572      is valid SQL, but:
 573 <programlisting>
 574 SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
 575 </programlisting>
 576      is not valid; the table alias <literal>a</> is not visible
 577      outside the alias <literal>c</>.
 578     </para>
 579    </sect3>
 580
 581    <sect3 id="queries-subqueries">
 582     <title>Subqueries</title>
 583
 584     <indexterm zone="queries-subqueries">
 585      <primary>subquery</primary>
 586     </indexterm>
 587
 588     <para>
 589      Subqueries specifying a derived table must be enclosed in
 590      parentheses and <emphasis>must</emphasis> be assigned a table
 591      alias name.  (See <xref linkend="queries-table-aliases">.)  For
 592      example:
 593 <programlisting>
 594 FROM (SELECT * FROM table1) AS alias_name
 595 </programlisting>
 596     </para>
 597
 598     <para>
 599      This example is equivalent to <literal>FROM table1 AS
 600      alias_name</literal>.  More interesting cases, which cannot be
 601      reduced to a plain join, arise when the subquery involves
 602      grouping or aggregation.
 603     </para>
 604
 605     <para>
 606      A subquery can also be a <command>VALUES</> list:
 607 <programlisting>
 608 FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
 609      AS names(first, last)
 610 </programlisting>
 611      Again, a table alias is required.  Assigning alias names to the columns
 612      of the <command>VALUES</> list is optional, but is good practice.
 613      For more information see <xref linkend="queries-values">.
 614     </para>
 615    </sect3>
 616
 617    <sect3 id="queries-tablefunctions">
 618     <title>Table Functions</title>
 619
 620     <indexterm zone="queries-tablefunctions"><primary>table function</></>
 621
 622     <indexterm zone="queries-tablefunctions">
 623      <primary>function</>
 624      <secondary>in the FROM clause</>
 625     </indexterm>
 626
 627     <para>
 628      Table functions are functions that produce a set of rows, made up
 629      of either base data types (scalar types) or composite data types
 630      (table rows).  They are used like a table, view, or subquery in
 631      the <literal>FROM</> clause of a query. Columns returned by table
 632      functions can be included in <literal>SELECT</>,
 633      <literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
 634      as a table, view, or subquery column.
 635     </para>
 636
 637     <para>
 638      If a table function returns a base data type, the single result
 639      column name matches the function name. If the function returns a
 640      composite type, the result columns get the same names as the
 641      individual attributes of the type.
 642     </para>
 643
 644     <para>
 645      A table function can be aliased in the <literal>FROM</> clause,
 646      but it also can be left unaliased. If a function is used in the
 647      <literal>FROM</> clause with no alias, the function name is used
 648      as the resulting table name.
 649     </para>
 650
 651     <para>
 652      Some examples:
 653 <programlisting>
 654 CREATE TABLE foo (fooid int, foosubid int, fooname text);
 655
 656 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
 657     SELECT * FROM foo WHERE fooid = $1;
 658 $$ LANGUAGE SQL;
 659
 660 SELECT * FROM getfoo(1) AS t1;
 661
 662 SELECT * FROM foo
 663     WHERE foosubid IN (
 664                         SELECT foosubid
 665                         FROM getfoo(foo.fooid) z
 666                         WHERE z.fooid = foo.fooid
 667                       );
 668
 669 CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
 670
 671 SELECT * FROM vw_getfoo;
 672 </programlisting>
 673     </para>
 674
 675     <para>
 676      In some cases it is useful to define table functions that can
 677      return different column sets depending on how they are invoked.
 678      To support this, the table function can be declared as returning
 679      the pseudotype <type>record</>.  When such a function is used in
 680      a query, the expected row structure must be specified in the
 681      query itself, so that the system can know how to parse and plan
 682      the query.  Consider this example:
 683 <programlisting>
 684 SELECT *
 685     FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
 686       AS t1(proname name, prosrc text)
 687     WHERE proname LIKE 'bytea%';
 688 </programlisting>
 689      The <xref linkend="CONTRIB-DBLINK-FUNCTION"> function
 690      (part of the <xref linkend="dblink"> module>) executes
 691      a remote query.  It is declared to return
 692      <type>record</> since it might be used for any kind of query.
 693      The actual column set must be specified in the calling query so
 694      that the parser knows, for example, what <literal>*</> should
 695      expand to.
 696     </para>
 697    </sect3>
 698   </sect2>
 699
 700   <sect2 id="queries-where">
 701    <title>The <literal>WHERE</literal> Clause</title>
 702
 703    <indexterm zone="queries-where">
 704     <primary>WHERE</primary>
 705    </indexterm>
 706
 707    <para>
 708     The syntax of the <xref linkend="sql-where"
 709     endterm="sql-where-title"> is
 710 <synopsis>
 711 WHERE <replaceable>search_condition</replaceable>
 712 </synopsis>
 713     where <replaceable>search_condition</replaceable> is any value
 714     expression (see <xref linkend="sql-expressions">) that
 715     returns a value of type <type>boolean</type>.
 716    </para>
 717
 718    <para>
 719     After the processing of the <literal>FROM</> clause is done, each
 720     row of the derived virtual table is checked against the search
 721     condition.  If the result of the condition is true, the row is
 722     kept in the output table, otherwise (i.e., if the result is
 723     false or null) it is discarded.  The search condition typically
 724     references at least one column of the table generated in the
 725     <literal>FROM</> clause; this is not required, but otherwise the
 726     <literal>WHERE</> clause will be fairly useless.
 727    </para>
 728
 729    <note>
 730     <para>
 731      The join condition of an inner join can be written either in
 732      the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
 733      For example, these table expressions are equivalent:
 734 <programlisting>
 735 FROM a, b WHERE a.id = b.id AND b.val &gt; 5
 736 </programlisting>
 737      and:
 738 <programlisting>
 739 FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val &gt; 5
 740 </programlisting>
 741      or perhaps even:
 742 <programlisting>
 743 FROM a NATURAL JOIN b WHERE b.val &gt; 5
 744 </programlisting>
 745      Which one of these you use is mainly a matter of style.  The
 746      <literal>JOIN</> syntax in the <literal>FROM</> clause is
 747      probably not as portable to other SQL database management systems,
 748      even though it is in the SQL standard.  For
 749      outer joins there is no choice:  they must be done in
 750      the <literal>FROM</> clause.  The <literal>ON</> or <literal>USING</>
 751      clause of an outer join is <emphasis>not</> equivalent to a
 752      <literal>WHERE</> condition, because it results in the addition
 753      of rows (for unmatched input rows) as well as the removal of rows
 754      in the final result.
 755     </para>
 756    </note>
 757
 758    <para>
 759     Here are some examples of <literal>WHERE</literal> clauses:
 760 <programlisting>
 761 SELECT ... FROM fdt WHERE c1 &gt; 5
 762
 763 SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
 764
 765 SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
 766
 767 SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
 768
 769 SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
 770
 771 SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 &gt; fdt.c1)
 772 </programlisting>
 773     <literal>fdt</literal> is the table derived in the
 774     <literal>FROM</> clause. Rows that do not meet the search
 775     condition of the <literal>WHERE</> clause are eliminated from
 776     <literal>fdt</literal>. Notice the use of scalar subqueries as
 777     value expressions.  Just like any other query, the subqueries can
 778     employ complex table expressions.  Notice also how
 779     <literal>fdt</literal> is referenced in the subqueries.
 780     Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
 781     if <literal>c1</> is also the name of a column in the derived
 782     input table of the subquery.  But qualifying the column name adds
 783     clarity even when it is not needed.  This example shows how the column
 784     naming scope of an outer query extends into its inner queries.
 785    </para>
 786   </sect2>
 787
 788
 789   <sect2 id="queries-group">
 790    <title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
 791
 792    <indexterm zone="queries-group">
 793     <primary>GROUP BY</primary>
 794    </indexterm>
 795
 796    <indexterm zone="queries-group">
 797     <primary>grouping</primary>
 798    </indexterm>
 799
 800    <para>
 801     After passing the <literal>WHERE</> filter, the derived input
 802     table might be subject to grouping, using the <literal>GROUP BY</>
 803     clause, and elimination of group rows using the <literal>HAVING</>
 804     clause.
 805    </para>
 806
 807 <synopsis>
 808 SELECT <replaceable>select_list</replaceable>
 809     FROM ...
 810     <optional>WHERE ...</optional>
 811     GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
 812 </synopsis>
 813
 814    <para>
 815     The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
 816     used to group together those rows in a table that have the same
 817     values in all the columns listed. The order in which the columns
 818     are listed does not matter.  The effect is to combine each set
 819     of rows having common values into one group row that
 820     represents all rows in the group.  This is done to
 821     eliminate redundancy in the output and/or compute aggregates that
 822     apply to these groups.  For instance:
 823 <screen>
 824 <prompt>=&gt;</> <userinput>SELECT * FROM test1;</>
 825  x | y
 826 ---+---
 827  a | 3
 828  c | 2
 829  b | 5
 830  a | 1
 831 (4 rows)
 832
 833 <prompt>=&gt;</> <userinput>SELECT x FROM test1 GROUP BY x;</>
 834  x
 835 ---
 836  a
 837  b
 838  c
 839 (3 rows)
 840 </screen>
 841    </para>
 842
 843    <para>
 844     In the second query, we could not have written <literal>SELECT *
 845     FROM test1 GROUP BY x</literal>, because there is no single value
 846     for the column <literal>y</> that could be associated with each
 847     group.  The grouped-by columns can be referenced in the select list since
 848     they have a single value in each group.
 849    </para>
 850
 851    <para>
 852     In general, if a table is grouped, columns that are not
 853     listed in <literal>GROUP BY</> cannot be referenced except in aggregate
 854     expressions.  An example with aggregate expressions is:
 855 <screen>
 856 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
 857  x | sum
 858 ---+-----
 859  a |   4
 860  b |   5
 861  c |   2
 862 (3 rows)
 863 </screen>
 864     Here <literal>sum</literal> is an aggregate function that
 865     computes a single value over the entire group.  More information
 866     about the available aggregate functions can be found in <xref
 867     linkend="functions-aggregate">.
 868    </para>
 869
 870    <tip>
 871     <para>
 872      Grouping without aggregate expressions effectively calculates the
 873      set of distinct values in a column.  This can also be achieved
 874      using the <literal>DISTINCT</> clause (see <xref
 875      linkend="queries-distinct">).
 876     </para>
 877    </tip>
 878
 879    <para>
 880     Here is another example:  it calculates the total sales for each
 881     product (rather than the total sales of all products):
 882 <programlisting>
 883 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
 884     FROM products p LEFT JOIN sales s USING (product_id)
 885     GROUP BY product_id, p.name, p.price;
 886 </programlisting>
 887     In this example, the columns <literal>product_id</literal>,
 888     <literal>p.name</literal>, and <literal>p.price</literal> must be
 889     in the <literal>GROUP BY</> clause since they are referenced in
 890     the query select list (but see below).  The column
 891     <literal>s.units</> does not have to be in the <literal>GROUP
 892     BY</> list since it is only used in an aggregate expression
 893     (<literal>sum(...)</literal>), which represents the sales
 894     of a product.  For each product, the query returns a summary row about
 895     all sales of the product.
 896    </para>
 897
 898    <para>
 899     If the products table is set up so that,
 900     say, <literal>product_id</literal> is the primary key, then it
 901     would be enough to group by <literal>product_id</literal> in the
 902     above example, since name and price would
 903     be <firstterm>functionally
 904     dependent</firstterm><indexterm><primary>functional
 905     dependency</primary></indexterm> on the product ID, and so there
 906     would be no ambiguity about which name and price value to return
 907     for each product ID group.
 908    </para>
 909
 910    <para>
 911     In strict SQL, <literal>GROUP BY</> can only group by columns of
 912     the source table but <productname>PostgreSQL</productname> extends
 913     this to also allow <literal>GROUP BY</> to group by columns in the
 914     select list.  Grouping by value expressions instead of simple
 915     column names is also allowed.
 916    </para>
 917
 918    <indexterm>
 919     <primary>HAVING</primary>
 920    </indexterm>
 921
 922    <para>
 923     If a table has been grouped using <literal>GROUP BY</literal>,
 924     but only certain groups are of interest, the
 925     <literal>HAVING</literal> clause can be used, much like a
 926     <literal>WHERE</> clause, to eliminate groups from the result.
 927     The syntax is:
 928 <synopsis>
 929 SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
 930 </synopsis>
 931     Expressions in the <literal>HAVING</> clause can refer both to
 932     grouped expressions and to ungrouped expressions (which necessarily
 933     involve an aggregate function).
 934    </para>
 935
 936    <para>
 937     Example:
 938 <screen>
 939 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) &gt; 3;</>
 940  x | sum
 941 ---+-----
 942  a |   4
 943  b |   5
 944 (2 rows)
 945
 946 <prompt>=&gt;</> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x &lt; 'c';</>
 947  x | sum
 948 ---+-----
 949  a |   4
 950  b |   5
 951 (2 rows)
 952 </screen>
 953    </para>
 954
 955    <para>
 956     Again, a more realistic example:
 957 <programlisting>
 958 SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
 959     FROM products p LEFT JOIN sales s USING (product_id)
 960     WHERE s.date &gt; CURRENT_DATE - INTERVAL '4 weeks'
 961     GROUP BY product_id, p.name, p.price, p.cost
 962     HAVING sum(p.price * s.units) &gt; 5000;
 963 </programlisting>
 964     In the example above, the <literal>WHERE</> clause is selecting
 965     rows by a column that is not grouped (the expression is only true for
 966     sales during the last four weeks), while the <literal>HAVING</>
 967     clause restricts the output to groups with total gross sales over
 968     5000.  Note that the aggregate expressions do not necessarily need
 969     to be the same in all parts of the query.
 970    </para>
 971
 972    <para>
 973     If a query contains aggregate function calls, but no <literal>GROUP BY</>
 974     clause, grouping still occurs: the result is a single group row (or
 975     perhaps no rows at all, if the single row is then eliminated by
 976     <literal>HAVING</>).
 977     The same is true if it contains a <literal>HAVING</> clause, even
 978     without any aggregate function calls or <literal>GROUP BY</> clause.
 979    </para>
 980   </sect2>
 981
 982   <sect2 id="queries-window">
 983    <title>Window Function Processing</title>
 984
 985    <indexterm zone="queries-window">
 986     <primary>window function</primary>
 987     <secondary>order of execution</>
 988    </indexterm>
 989
 990    <para>
 991     If the query contains any window functions (see
 992     <xref linkend="tutorial-window">,
 993     <xref linkend="functions-window"> and
 994     <xref linkend="syntax-window-functions">), these functions are evaluated
 995     after any grouping, aggregation, and <literal>HAVING</> filtering is
 996     performed.  That is, if the query uses any aggregates, <literal>GROUP
 997     BY</>, or <literal>HAVING</>, then the rows seen by the window functions
 998     are the group rows instead of the original table rows from
 999     <literal>FROM</>/<literal>WHERE</>.
1000    </para>
1001
1002    <para>
1003     When multiple window functions are used, all the window functions having
1004     syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
1005     clauses in their window definitions are guaranteed to be evaluated in a
1006     single pass over the data. Therefore they will see the same sort ordering,
1007     even if the <literal>ORDER BY</> does not uniquely determine an ordering.
1008     However, no guarantees are made about the evaluation of functions having
1009     different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
1010     (In such cases a sort step is typically required between the passes of
1011     window function evaluations, and the sort is not guaranteed to preserve
1012     ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
1013    </para>
1014
1015    <para>
1016     Currently, window functions always require presorted data, and so the
1017     query output will be ordered according to one or another of the window
1018     functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
1019     It is not recommendable to rely on this, however.  Use an explicit
1020     top-level <literal>ORDER BY</> clause if you want to be sure the
1021     results are sorted in a particular way.
1022    </para>
1023   </sect2>
1024  </sect1>
1025
1026
1027  <sect1 id="queries-select-lists">
1028   <title>Select Lists</title>
1029
1030   <indexterm>
1031    <primary>SELECT</primary>
1032    <secondary>select list</secondary>
1033   </indexterm>
1034
1035   <para>
1036    As shown in the previous section,
1037    the table expression in the <command>SELECT</command> command
1038    constructs an intermediate virtual table by possibly combining
1039    tables, views, eliminating rows, grouping, etc.  This table is
1040    finally passed on to processing by the <firstterm>select list</firstterm>.  The select
1041    list determines which <emphasis>columns</emphasis> of the
1042    intermediate table are actually output.
1043   </para>
1044
1045   <sect2 id="queries-select-list-items">
1046    <title>Select-List Items</title>
1047
1048    <indexterm>
1049     <primary>*</primary>
1050    </indexterm>
1051
1052    <para>
1053     The simplest kind of select list is <literal>*</literal> which
1054     emits all columns that the table expression produces.  Otherwise,
1055     a select list is a comma-separated list of value expressions (as
1056     defined in <xref linkend="sql-expressions">).  For instance, it
1057     could be a list of column names:
1058 <programlisting>
1059 SELECT a, b, c FROM ...
1060 </programlisting>
1061      The columns names <literal>a</>, <literal>b</>, and <literal>c</>
1062      are either the actual names of the columns of tables referenced
1063      in the <literal>FROM</> clause, or the aliases given to them as
1064      explained in <xref linkend="queries-table-aliases">.  The name
1065      space available in the select list is the same as in the
1066      <literal>WHERE</> clause, unless grouping is used, in which case
1067      it is the same as in the <literal>HAVING</> clause.
1068    </para>
1069
1070    <para>
1071     If more than one table has a column of the same name, the table
1072     name must also be given, as in:
1073 <programlisting>
1074 SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
1075 </programlisting>
1076     When working with multiple tables, it can also be useful to ask for
1077     all the columns of a particular table:
1078 <programlisting>
1079 SELECT tbl1.*, tbl2.a FROM ...
1080 </programlisting>
1081     (See also <xref linkend="queries-where">.)
1082    </para>
1083
1084    <para>
1085     If an arbitrary value expression is used in the select list, it
1086     conceptually adds a new virtual column to the returned table.  The
1087     value expression is evaluated once for each result row, with
1088     the row's values substituted for any column references.  But the
1089     expressions in the select list do not have to reference any
1090     columns in the table expression of the <literal>FROM</> clause;
1091     they can be constant arithmetic expressions, for instance.
1092    </para>
1093   </sect2>
1094
1095   <sect2 id="queries-column-labels">
1096    <title>Column Labels</title>
1097
1098    <indexterm zone="queries-column-labels">
1099     <primary>alias</primary>
1100     <secondary>in the select list</secondary>
1101    </indexterm>
1102
1103    <para>
1104     The entries in the select list can be assigned names for subsequent
1105     processing, such as for use in an <literal>ORDER BY</> clause
1106     or for display by the client application.  For example:
1107 <programlisting>
1108 SELECT a AS value, b + c AS sum FROM ...
1109 </programlisting>
1110    </para>
1111
1112    <para>
1113     If no output column name is specified using <literal>AS</>,
1114     the system assigns a default column name.  For simple column references,
1115     this is the name of the referenced column.  For function
1116     calls, this is the name of the function.  For complex expressions,
1117     the system will generate a generic name.
1118    </para>
1119
1120    <para>
1121     The <literal>AS</> keyword is optional, but only if the new column
1122     name does not match any
1123     <productname>PostgreSQL</productname> keyword (see <xref
1124     linkend="sql-keywords-appendix">).  To avoid an accidental match to
1125     a keyword, you can double-quote the column name.  For example,
1126     <literal>VALUE</> is a keyword, so this does not work:
1127 <programlisting>
1128 SELECT a value, b + c AS sum FROM ...
1129 </programlisting>
1130     but this does:
1131 <programlisting>
1132 SELECT a "value", b + c AS sum FROM ...
1133 </programlisting>
1134     For protection against possible
1135     future keyword additions, it is recommended that you always either
1136     write <literal>AS</literal> or double-quote the output column name.
1137    </para>
1138
1139    <note>
1140     <para>
1141      The naming of output columns here is different from that done in
1142      the <literal>FROM</> clause (see <xref
1143      linkend="queries-table-aliases">).  It is possible
1144      to rename the same column twice, but the name assigned in
1145      the select list is the one that will be passed on.
1146     </para>
1147    </note>
1148   </sect2>
1149
1150   <sect2 id="queries-distinct">
1151    <title><literal>DISTINCT</literal></title>
1152
1153    <indexterm zone="queries-distinct">
1154     <primary>DISTINCT</primary>
1155    </indexterm>
1156
1157    <indexterm zone="queries-distinct">
1158     <primary>duplicates</primary>
1159    </indexterm>
1160
1161    <para>
1162     After the select list has been processed, the result table can
1163     optionally be subject to the elimination of duplicate rows.  The
1164     <literal>DISTINCT</literal> key word is written directly after
1165     <literal>SELECT</literal> to specify this:
1166 <synopsis>
1167 SELECT DISTINCT <replaceable>select_list</replaceable> ...
1168 </synopsis>
1169     (Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
1170     can be used to specify the default behavior of retaining all rows.)
1171    </para>
1172
1173    <para>
1174     <indexterm><primary>null value</><secondary sortas="DISTINCT">in
1175     DISTINCT</></indexterm>
1176     Obviously, two rows are considered distinct if they differ in at
1177     least one column value.  Null values are considered equal in this
1178     comparison.
1179    </para>
1180
1181    <para>
1182     Alternatively, an arbitrary expression can determine what rows are
1183     to be considered distinct:
1184 <synopsis>
1185 SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
1186 </synopsis>
1187     Here <replaceable>expression</replaceable> is an arbitrary value
1188     expression that is evaluated for all rows.  A set of rows for
1189     which all the expressions are equal are considered duplicates, and
1190     only the first row of the set is kept in the output.  Note that
1191     the <quote>first row</quote> of a set is unpredictable unless the
1192     query is sorted on enough columns to guarantee a unique ordering
1193     of the rows arriving at the <literal>DISTINCT</> filter.
1194     (<literal>DISTINCT ON</> processing occurs after <literal>ORDER
1195     BY</> sorting.)
1196    </para>
1197
1198    <para>
1199     The <literal>DISTINCT ON</> clause is not part of the SQL standard
1200     and is sometimes considered bad style because of the potentially
1201     indeterminate nature of its results.  With judicious use of
1202     <literal>GROUP BY</> and subqueries in <literal>FROM</>, this
1203     construct can be avoided, but it is often the most convenient
1204     alternative.
1205    </para>
1206   </sect2>
1207  </sect1>
1208
1209
1210  <sect1 id="queries-union">
1211   <title>Combining Queries</title>
1212
1213   <indexterm zone="queries-union">
1214    <primary>UNION</primary>
1215   </indexterm>
1216   <indexterm zone="queries-union">
1217    <primary>INTERSECT</primary>
1218   </indexterm>
1219   <indexterm zone="queries-union">
1220    <primary>EXCEPT</primary>
1221   </indexterm>
1222   <indexterm zone="queries-union">
1223    <primary>set union</primary>
1224   </indexterm>
1225   <indexterm zone="queries-union">
1226    <primary>set intersection</primary>
1227   </indexterm>
1228   <indexterm zone="queries-union">
1229    <primary>set difference</primary>
1230   </indexterm>
1231   <indexterm zone="queries-union">
1232    <primary>set operation</primary>
1233   </indexterm>
1234
1235   <para>
1236    The results of two queries can be combined using the set operations
1237    union, intersection, and difference.  The syntax is
1238 <synopsis>
1239 <replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
1240 <replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
1241 <replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
1242 </synopsis>
1243    <replaceable>query1</replaceable> and
1244    <replaceable>query2</replaceable> are queries that can use any of
1245    the features discussed up to this point.  Set operations can also
1246    be nested and chained, for example
1247 <synopsis>
1248 <replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
1249 </synopsis>
1250    which is executed as:
1251 <synopsis>
1252 (<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
1253 </synopsis>
1254   </para>
1255
1256   <para>
1257    <literal>UNION</> effectively appends the result of
1258    <replaceable>query2</replaceable> to the result of
1259    <replaceable>query1</replaceable> (although there is no guarantee
1260    that this is the order in which the rows are actually returned).
1261    Furthermore, it eliminates duplicate rows from its result, in the same
1262    way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
1263   </para>
1264
1265   <para>
1266    <literal>INTERSECT</> returns all rows that are both in the result
1267    of <replaceable>query1</replaceable> and in the result of
1268    <replaceable>query2</replaceable>.  Duplicate rows are eliminated
1269    unless <literal>INTERSECT ALL</> is used.
1270   </para>
1271
1272   <para>
1273    <literal>EXCEPT</> returns all rows that are in the result of
1274    <replaceable>query1</replaceable> but not in the result of
1275    <replaceable>query2</replaceable>.  (This is sometimes called the
1276    <firstterm>difference</> between two queries.)  Again, duplicates
1277    are eliminated unless <literal>EXCEPT ALL</> is used.
1278   </para>
1279
1280   <para>
1281    In order to calculate the union, intersection, or difference of two
1282    queries, the two queries must be <quote>union compatible</quote>,
1283    which means that they return the same number of columns and
1284    the corresponding columns have compatible data types, as
1285    described in <xref linkend="typeconv-union-case">.
1286   </para>
1287  </sect1>
1288
1289
1290  <sect1 id="queries-order">
1291   <title>Sorting Rows</title>
1292
1293   <indexterm zone="queries-order">
1294    <primary>sorting</primary>
1295   </indexterm>
1296
1297   <indexterm zone="queries-order">
1298    <primary>ORDER BY</primary>
1299   </indexterm>
1300
1301   <para>
1302    After a query has produced an output table (after the select list
1303    has been processed) it can optionally be sorted.  If sorting is not
1304    chosen, the rows will be returned in an unspecified order.  The actual
1305    order in that case will depend on the scan and join plan types and
1306    the order on disk, but it must not be relied on.  A particular
1307    output ordering can only be guaranteed if the sort step is explicitly
1308    chosen.
1309   </para>
1310
1311   <para>
1312    The <literal>ORDER BY</> clause specifies the sort order:
1313 <synopsis>
1314 SELECT <replaceable>select_list</replaceable>
1315     FROM <replaceable>table_expression</replaceable>
1316     ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
1317              <optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
1318 </synopsis>
1319    The sort expression(s) can be any expression that would be valid in the
1320    query's select list.  An example is:
1321 <programlisting>
1322 SELECT a, b FROM table1 ORDER BY a + b, c;
1323 </programlisting>
1324    When more than one expression is specified,
1325    the later values are used to sort rows that are equal according to the
1326    earlier values.  Each expression can be followed by an optional
1327    <literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
1328    ascending or descending.  <literal>ASC</> order is the default.
1329    Ascending order puts smaller values first, where
1330    <quote>smaller</quote> is defined in terms of the
1331    <literal>&lt;</literal> operator.  Similarly, descending order is
1332    determined with the <literal>&gt;</literal> operator.
1333     <footnote>
1334      <para>
1335       Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
1336       operator class</> for the expression's data type to determine the sort
1337       ordering for <literal>ASC</> and <literal>DESC</>.  Conventionally,
1338       data types will be set up so that the <literal>&lt;</literal> and
1339       <literal>&gt;</literal> operators correspond to this sort ordering,
1340       but a user-defined data type's designer could choose to do something
1341       different.
1342      </para>
1343     </footnote>
1344   </para>
1345
1346   <para>
1347    The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
1348    used to determine whether nulls appear before or after non-null values
1349    in the sort ordering.  By default, null values sort as if larger than any
1350    non-null value; that is, <literal>NULLS FIRST</> is the default for
1351    <literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
1352   </para>
1353
1354   <para>
1355    Note that the ordering options are considered independently for each
1356    sort column.  For example <literal>ORDER BY x, y DESC</> means
1357    <literal>ORDER BY x ASC, y DESC</>, which is not the same as
1358    <literal>ORDER BY x DESC, y DESC</>.
1359   </para>
1360
1361   <para>
1362    A <replaceable>sort_expression</> can also be the column label or number
1363    of an output column, as in:
1364 <programlisting>
1365 SELECT a + b AS sum, c FROM table1 ORDER BY sum;
1366 SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
1367 </programlisting>
1368    both of which sort by the first output column.  Note that an output
1369    column name has to stand alone, that is, it cannot be used in an expression
1370    &mdash; for example, this is <emphasis>not</> correct:
1371 <programlisting>
1372 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c;          -- wrong
1373 </programlisting>
1374    This restriction is made to reduce ambiguity.  There is still
1375    ambiguity if an <literal>ORDER BY</> item is a simple name that
1376    could match either an output column name or a column from the table
1377    expression.  The output column is used in such cases.  This would
1378    only cause confusion if you use <literal>AS</> to rename an output
1379    column to match some other table column's name.
1380   </para>
1381
1382   <para>
1383    <literal>ORDER BY</> can be applied to the result of a
1384    <literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
1385    combination, but in this case it is only permitted to sort by
1386    output column names or numbers, not by expressions.
1387   </para>
1388  </sect1>
1389
1390
1391  <sect1 id="queries-limit">
1392   <title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
1393
1394   <indexterm zone="queries-limit">
1395    <primary>LIMIT</primary>
1396   </indexterm>
1397
1398   <indexterm zone="queries-limit">
1399    <primary>OFFSET</primary>
1400   </indexterm>
1401
1402   <para>
1403    <literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
1404    a portion of the rows that are generated by the rest of the query:
1405 <synopsis>
1406 SELECT <replaceable>select_list</replaceable>
1407     FROM <replaceable>table_expression</replaceable>
1408     <optional> ORDER BY ... </optional>
1409     <optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
1410 </synopsis>
1411   </para>
1412
1413   <para>
1414    If a limit count is given, no more than that many rows will be
1415    returned (but possibly less, if the query itself yields less rows).
1416    <literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
1417    clause.
1418   </para>
1419
1420   <para>
1421    <literal>OFFSET</> says to skip that many rows before beginning to
1422    return rows.  <literal>OFFSET 0</> is the same as omitting the
1423    <literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
1424    as omitting the <literal>LIMIT</> clause.  If both <literal>OFFSET</>
1425    and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
1426    skipped before starting to count the <literal>LIMIT</> rows that
1427    are returned.
1428   </para>
1429
1430   <para>
1431    When using <literal>LIMIT</>, it is important to use an
1432    <literal>ORDER BY</> clause that constrains the result rows into a
1433    unique order.  Otherwise you will get an unpredictable subset of
1434    the query's rows. You might be asking for the tenth through
1435    twentieth rows, but tenth through twentieth in what ordering? The
1436    ordering is unknown, unless you specified <literal>ORDER BY</>.
1437   </para>
1438
1439   <para>
1440    The query optimizer takes <literal>LIMIT</> into account when
1441    generating query plans, so you are very likely to get different
1442    plans (yielding different row orders) depending on what you give
1443    for <literal>LIMIT</> and <literal>OFFSET</>.  Thus, using
1444    different <literal>LIMIT</>/<literal>OFFSET</> values to select
1445    different subsets of a query result <emphasis>will give
1446    inconsistent results</emphasis> unless you enforce a predictable
1447    result ordering with <literal>ORDER BY</>.  This is not a bug; it
1448    is an inherent consequence of the fact that SQL does not promise to
1449    deliver the results of a query in any particular order unless
1450    <literal>ORDER BY</> is used to constrain the order.
1451   </para>
1452
1453   <para>
1454    The rows skipped by an <literal>OFFSET</> clause still have to be
1455    computed inside the server; therefore a large <literal>OFFSET</>
1456    might be inefficient.
1457   </para>
1458  </sect1>
1459
1460
1461  <sect1 id="queries-values">
1462   <title><literal>VALUES</literal> Lists</title>
1463
1464   <indexterm zone="queries-values">
1465    <primary>VALUES</primary>
1466   </indexterm>
1467
1468   <para>
1469    <literal>VALUES</> provides a way to generate a <quote>constant table</>
1470    that can be used in a query without having to actually create and populate
1471    a table on-disk.  The syntax is
1472 <synopsis>
1473 VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
1474 </synopsis>
1475    Each parenthesized list of expressions generates a row in the table.
1476    The lists must all have the same number of elements (i.e., the number
1477    of columns in the table), and corresponding entries in each list must
1478    have compatible data types.  The actual data type assigned to each column
1479    of the result is determined using the same rules as for <literal>UNION</>
1480    (see <xref linkend="typeconv-union-case">).
1481   </para>
1482
1483   <para>
1484    As an example:
1485 <programlisting>
1486 VALUES (1, 'one'), (2, 'two'), (3, 'three');
1487 </programlisting>
1488
1489    will return a table of two columns and three rows.  It's effectively
1490    equivalent to:
1491 <programlisting>
1492 SELECT 1 AS column1, 'one' AS column2
1493 UNION ALL
1494 SELECT 2, 'two'
1495 UNION ALL
1496 SELECT 3, 'three';
1497 </programlisting>
1498
1499    By default, <productname>PostgreSQL</productname> assigns the names
1500    <literal>column1</>, <literal>column2</>, etc. to the columns of a
1501    <literal>VALUES</> table.  The column names are not specified by the
1502    SQL standard and different database systems do it differently, so
1503    it's usually better to override the default names with a table alias
1504    list.
1505   </para>
1506
1507   <para>
1508    Syntactically, <literal>VALUES</> followed by expression lists is
1509    treated as equivalent to:
1510 <synopsis>
1511 SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
1512 </synopsis>
1513    and can appear anywhere a <literal>SELECT</> can.  For example, you can
1514    use it as part of a <literal>UNION</>, or attach a
1515    <replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
1516    <literal>LIMIT</>, and/or <literal>OFFSET</>) to it.  <literal>VALUES</>
1517    is most commonly used as the data source in an <command>INSERT</> command,
1518    and next most commonly as a subquery.
1519   </para>
1520
1521   <para>
1522    For more information see <xref linkend="sql-values">.
1523   </para>
1524
1525  </sect1>
1526
1527
1528  <sect1 id="queries-with">
1529   <title><literal>WITH</literal> Queries (Common Table Expressions)</title>
1530
1531   <indexterm zone="queries-with">
1532    <primary>WITH</primary>
1533    <secondary>in SELECT</secondary>
1534   </indexterm>
1535
1536   <indexterm>
1537    <primary>common table expression</primary>
1538    <see>WITH</see>
1539   </indexterm>
1540
1541   <para>
1542    <literal>WITH</> provides a way to write auxiliary statements for use in a
1543    larger query.  These statements, which are often referred to as Common
1544    Table Expressions or <acronym>CTE</acronym>s, can be thought of as defining
1545    temporary tables that exist just for one query.  Each auxiliary statement
1546    in a <literal>WITH</> clause can be a <command>SELECT</>,
1547    <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>; and the
1548    <literal>WITH</> clause itself is attached to a primary statement that can
1549    also be a <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>, or
1550    <command>DELETE</>.
1551   </para>
1552
1553  <sect2 id="queries-with-select">
1554    <title><command>SELECT</> in <literal>WITH</></title>
1555
1556   <para>
1557    The basic value of <command>SELECT</> in <literal>WITH</> is to
1558    break down complicated queries into simpler parts.  An example is:
1559
1560 <programlisting>
1561 WITH regional_sales AS (
1562         SELECT region, SUM(amount) AS total_sales
1563         FROM orders
1564         GROUP BY region
1565      ), top_regions AS (
1566         SELECT region
1567         FROM regional_sales
1568         WHERE total_sales &gt; (SELECT SUM(total_sales)/10 FROM regional_sales)
1569      )
1570 SELECT region,
1571        product,
1572        SUM(quantity) AS product_units,
1573        SUM(amount) AS product_sales
1574 FROM orders
1575 WHERE region IN (SELECT region FROM top_regions)
1576 GROUP BY region, product;
1577 </programlisting>
1578
1579    which displays per-product sales totals in only the top sales regions.
1580    The <literal>WITH</> clause defines two auxiliary statements named
1581    <structname>regional_sales</> and <structname>top_regions</>,
1582    where the output of <structname>regional_sales</> is used in
1583    <structname>top_regions</> and the output of <structname>top_regions</>
1584    is used in the primary <command>SELECT</> query.
1585    This example could have been written without <literal>WITH</>,
1586    but we'd have needed two levels of nested sub-SELECTs.  It's a bit
1587    easier to follow this way.
1588   </para>
1589
1590   <para>
1591    The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
1592    from a mere syntactic convenience into a feature that accomplishes
1593    things not otherwise possible in standard SQL.  Using
1594    <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
1595    output.  A very simple example is this query to sum the integers from 1
1596    through 100:
1597
1598 <programlisting>
1599 WITH RECURSIVE t(n) AS (
1600     VALUES (1)
1601   UNION ALL
1602     SELECT n+1 FROM t WHERE n &lt; 100
1603 )
1604 SELECT sum(n) FROM t;
1605 </programlisting>
1606
1607    The general form of a recursive <literal>WITH</> query is always a
1608    <firstterm>non-recursive term</>, then <literal>UNION</> (or
1609    <literal>UNION ALL</>), then a
1610    <firstterm>recursive term</>, where only the recursive term can contain
1611    a reference to the query's own output.  Such a query is executed as
1612    follows:
1613   </para>
1614
1615   <procedure>
1616    <title>Recursive Query Evaluation</title>
1617
1618    <step performance="required">
1619     <para>
1620      Evaluate the non-recursive term.  For <literal>UNION</> (but not
1621      <literal>UNION ALL</>), discard duplicate rows.  Include all remaining
1622      rows in the result of the recursive query, and also place them in a
1623      temporary <firstterm>working table</>.
1624     </para>
1625    </step>
1626
1627    <step performance="required">
1628     <para>
1629      So long as the working table is not empty, repeat these steps:
1630     </para>
1631     <substeps>
1632      <step performance="required">
1633       <para>
1634        Evaluate the recursive term, substituting the current contents of
1635        the working table for the recursive self-reference.
1636        For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1637        duplicate rows and rows that duplicate any previous result row.
1638        Include all remaining rows in the result of the recursive query, and
1639        also place them in a temporary <firstterm>intermediate table</>.
1640       </para>
1641      </step>
1642
1643      <step performance="required">
1644       <para>
1645        Replace the contents of the working table with the contents of the
1646        intermediate table, then empty the intermediate table.
1647       </para>
1648      </step>
1649     </substeps>
1650    </step>
1651   </procedure>
1652
1653   <note>
1654    <para>
1655     Strictly speaking, this process is iteration not recursion, but
1656     <literal>RECURSIVE</> is the terminology chosen by the SQL standards
1657     committee.
1658    </para>
1659   </note>
1660
1661   <para>
1662    In the example above, the working table has just a single row in each step,
1663    and it takes on the values from 1 through 100 in successive steps.  In
1664    the 100th step, there is no output because of the <literal>WHERE</>
1665    clause, and so the query terminates.
1666   </para>
1667
1668   <para>
1669    Recursive queries are typically used to deal with hierarchical or
1670    tree-structured data.  A useful example is this query to find all the
1671    direct and indirect sub-parts of a product, given only a table that
1672    shows immediate inclusions:
1673
1674 <programlisting>
1675 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
1676     SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
1677   UNION ALL
1678     SELECT p.sub_part, p.part, p.quantity
1679     FROM included_parts pr, parts p
1680     WHERE p.part = pr.sub_part
1681   )
1682 SELECT sub_part, SUM(quantity) as total_quantity
1683 FROM included_parts
1684 GROUP BY sub_part
1685 </programlisting>
1686   </para>
1687
1688   <para>
1689    When working with recursive queries it is important to be sure that
1690    the recursive part of the query will eventually return no tuples,
1691    or else the query will loop indefinitely.  Sometimes, using
1692    <literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1693    by discarding rows that duplicate previous output rows.  However, often a
1694    cycle does not involve output rows that are completely duplicate: it may be
1695    necessary to check just one or a few fields to see if the same point has
1696    been reached before.  The standard method for handling such situations is
1697    to compute an array of the already-visited values.  For example, consider
1698    the following query that searches a table <structname>graph</> using a
1699    <structfield>link</> field:
1700
1701 <programlisting>
1702 WITH RECURSIVE search_graph(id, link, data, depth) AS (
1703         SELECT g.id, g.link, g.data, 1
1704         FROM graph g
1705       UNION ALL
1706         SELECT g.id, g.link, g.data, sg.depth + 1
1707         FROM graph g, search_graph sg
1708         WHERE g.id = sg.link
1709 )
1710 SELECT * FROM search_graph;
1711 </programlisting>
1712
1713    This query will loop if the <structfield>link</> relationships contain
1714    cycles.  Because we require a <quote>depth</> output, just changing
1715    <literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
1716    Instead we need to recognize whether we have reached the same row again
1717    while following a particular path of links.  We add two columns
1718    <structfield>path</> and <structfield>cycle</> to the loop-prone query:
1719
1720 <programlisting>
1721 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1722         SELECT g.id, g.link, g.data, 1,
1723           ARRAY[g.id],
1724           false
1725         FROM graph g
1726       UNION ALL
1727         SELECT g.id, g.link, g.data, sg.depth + 1,
1728           path || g.id,
1729           g.id = ANY(path)
1730         FROM graph g, search_graph sg
1731         WHERE g.id = sg.link AND NOT cycle
1732 )
1733 SELECT * FROM search_graph;
1734 </programlisting>
1735
1736    Aside from preventing cycles, the array value is often useful in its own
1737    right as representing the <quote>path</> taken to reach any particular row.
1738   </para>
1739
1740   <para>
1741    In the general case where more than one field needs to be checked to
1742    recognize a cycle, use an array of rows.  For example, if we needed to
1743    compare fields <structfield>f1</> and <structfield>f2</>:
1744
1745 <programlisting>
1746 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1747         SELECT g.id, g.link, g.data, 1,
1748           ARRAY[ROW(g.f1, g.f2)],
1749           false
1750         FROM graph g
1751       UNION ALL
1752         SELECT g.id, g.link, g.data, sg.depth + 1,
1753           path || ROW(g.f1, g.f2),
1754           ROW(g.f1, g.f2) = ANY(path)
1755         FROM graph g, search_graph sg
1756         WHERE g.id = sg.link AND NOT cycle
1757 )
1758 SELECT * FROM search_graph;
1759 </programlisting>
1760   </para>
1761
1762   <tip>
1763    <para>
1764     Omit the <literal>ROW()</> syntax in the common case where only one field
1765     needs to be checked to recognize a cycle.  This allows a simple array
1766     rather than a composite-type array to be used, gaining efficiency.
1767    </para>
1768   </tip>
1769
1770   <tip>
1771    <para>
1772     The recursive query evaluation algorithm produces its output in
1773     breadth-first search order.  You can display the results in depth-first
1774     search order by making the outer query <literal>ORDER BY</> a
1775     <quote>path</> column constructed in this way.
1776    </para>
1777   </tip>
1778
1779   <para>
1780    A helpful trick for testing queries
1781    when you are not certain if they might loop is to place a <literal>LIMIT</>
1782    in the parent query.  For example, this query would loop forever without
1783    the <literal>LIMIT</>:
1784
1785 <programlisting>
1786 WITH RECURSIVE t(n) AS (
1787     SELECT 1
1788   UNION ALL
1789     SELECT n+1 FROM t
1790 )
1791 SELECT n FROM t LIMIT 100;
1792 </programlisting>
1793
1794    This works because <productname>PostgreSQL</productname>'s implementation
1795    evaluates only as many rows of a <literal>WITH</> query as are actually
1796    fetched by the parent query.  Using this trick in production is not
1797    recommended, because other systems might work differently.  Also, it
1798    usually won't work if you make the outer query sort the recursive query's
1799    results or join them to some other table, because in such cases the
1800    outer query will usually try to fetch all of the <literal>WITH</> query's
1801    output anyway.
1802   </para>
1803
1804   <para>
1805    A useful property of <literal>WITH</> queries is that they are evaluated
1806    only once per execution of the parent query, even if they are referred to
1807    more than once by the parent query or sibling <literal>WITH</> queries.
1808    Thus, expensive calculations that are needed in multiple places can be
1809    placed within a <literal>WITH</> query to avoid redundant work.  Another
1810    possible application is to prevent unwanted multiple evaluations of
1811    functions with side-effects.
1812    However, the other side of this coin is that the optimizer is less able to
1813    push restrictions from the parent query down into a <literal>WITH</> query
1814    than an ordinary sub-query.  The <literal>WITH</> query will generally be
1815    evaluated as written, without suppression of rows that the parent query
1816    might discard afterwards.  (But, as mentioned above, evaluation might stop
1817    early if the reference(s) to the query demand only a limited number of
1818    rows.)
1819   </para>
1820
1821   <para>
1822    The examples above only show <literal>WITH</> being used with
1823    <command>SELECT</>, but it can be attached in the same way to
1824    <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>.
1825    In each case it effectively provides temporary table(s) that can
1826    be referred to in the main command.
1827   </para>
1828  </sect2>
1829
1830  <sect2 id="queries-with-modifying">
1831    <title>Data-Modifying Statements in <literal>WITH</></title>
1832
1833    <para>
1834     You can use data-modifying statements (<command>INSERT</>,
1835     <command>UPDATE</>, or <command>DELETE</>) in <literal>WITH</>.  This
1836     allows you to perform several different operations in the same query.
1837     An example is:
1838
1839 <programlisting>
1840 WITH moved_rows AS (
1841     DELETE FROM products
1842     WHERE
1843         "date" &gt;= '2010-10-01' AND
1844         "date" &lt; '2010-11-01'
1845     RETURNING *
1846 )
1847 INSERT INTO products_log
1848 SELECT * FROM moved_rows;
1849 </programlisting>
1850
1851     This query effectively moves rows from <structname>products</> to
1852     <structname>products_log</>.  The <command>DELETE</> in <literal>WITH</>
1853     deletes the specified rows from <structname>products</>, returning their
1854     contents by means of its <literal>RETURNING</> clause; and then the
1855     primary query reads that output and inserts it into
1856     <structname>products_log</>.
1857    </para>
1858
1859    <para>
1860     A fine point of the above example is that the <literal>WITH</> clause is
1861     attached to the <command>INSERT</>, not the sub-<command>SELECT</> within
1862     the <command>INSERT</>.  This is necessary because data-modifying
1863     statements are only allowed in <literal>WITH</> clauses that are attached
1864     to the top-level statement.  However, normal <literal>WITH</> visibility
1865     rules apply, so it is possible to refer to the <literal>WITH</>
1866     statement's output from the sub-<command>SELECT</>.
1867    </para>
1868
1869    <para>
1870     Data-modifying statements in <literal>WITH</> usually have
1871     <literal>RETURNING</> clauses, as seen in the example above.
1872     It is the output of the <literal>RETURNING</> clause, <emphasis>not</> the
1873     target table of the data-modifying statement, that forms the temporary
1874     table that can be referred to by the rest of the query.  If a
1875     data-modifying statement in <literal>WITH</> lacks a <literal>RETURNING</>
1876     clause, then it forms no temporary table and cannot be referred to in
1877     the rest of the query.  Such a statement will be executed nonetheless.
1878     A not-particularly-useful example is:
1879
1880 <programlisting>
1881 WITH t AS (
1882     DELETE FROM foo
1883 )
1884 DELETE FROM bar;
1885 </programlisting>
1886
1887     This example would remove all rows from tables <structname>foo</> and
1888     <structname>bar</>.  The number of affected rows reported to the client
1889     would only include rows removed from <structname>bar</>.
1890    </para>
1891
1892    <para>
1893     Recursive self-references in data-modifying statements are not
1894     allowed.  In some cases it is possible to work around this limitation by
1895     referring to the output of a recursive <literal>WITH</>, for example:
1896
1897 <programlisting>
1898 WITH RECURSIVE included_parts(sub_part, part) AS (
1899     SELECT sub_part, part FROM parts WHERE part = 'our_product'
1900   UNION ALL
1901     SELECT p.sub_part, p.part
1902     FROM included_parts pr, parts p
1903     WHERE p.part = pr.sub_part
1904   )
1905 DELETE FROM parts
1906   WHERE part IN (SELECT part FROM included_parts);
1907 </programlisting>
1908
1909     This query would remove all direct and indirect subparts of a product.
1910    </para>
1911
1912    <para>
1913     Data-modifying statements in <literal>WITH</> are executed exactly once,
1914     and always to completion, independently of whether the primary query
1915     reads all (or indeed any) of their output.  Notice that this is different
1916     from the rule for <command>SELECT</> in <literal>WITH</>: as stated in the
1917     previous section, execution of a <command>SELECT</> is carried only as far
1918     as the primary query demands its output.
1919    </para>
1920
1921    <para>
1922     The sub-statements in <literal>WITH</> are executed concurrently with
1923     each other and with the main query.  Therefore, when using data-modifying
1924     statements in <literal>WITH</>, the order in which the specified updates
1925     actually happen is unpredictable.  All the statements are executed with
1926     the same <firstterm>snapshot</> (see <xref linkend="mvcc">), so they
1927     cannot <quote>see</> each others' effects on the target tables.  This
1928     alleviates the effects of the unpredictability of the actual order of row
1929     updates, and means that <literal>RETURNING</> data is the only way to
1930     communicate changes between different <literal>WITH</> sub-statements and
1931     the main query.  An example of this is that in
1932
1933 <programlisting>
1934 WITH t AS (
1935     UPDATE products SET price = price * 1.05
1936     RETURNING *
1937 )
1938 SELECT * FROM products;
1939 </programlisting>
1940
1941     the outer <command>SELECT</> would return the original prices before the
1942     action of the <command>UPDATE</>, while in
1943
1944 <programlisting>
1945 WITH t AS (
1946     UPDATE products SET price = price * 1.05
1947     RETURNING *
1948 )
1949 SELECT * FROM t;
1950 </programlisting>
1951
1952     the outer <command>SELECT</> would return the updated data.
1953    </para>
1954
1955    <para>
1956     Trying to update the same row twice in a single statement is not
1957     supported.  Only one of the modifications takes place, but it is not easy
1958     (and sometimes not possible) to reliably predict which one.  This also
1959     applies to deleting a row that was already updated in the same statement:
1960     only the update is performed.  Therefore you should generally avoid trying
1961     to modify a single row twice in a single statement.  In particular avoid
1962     writing <literal>WITH</> sub-statements that could affect the same rows
1963     changed by the main statement or a sibling sub-statement.  The effects
1964     of such a statement will not be predictable.
1965    </para>
1966
1967    <para>
1968     At present, any table used as the target of a data-modifying statement in
1969     <literal>WITH</> must not have a conditional rule, nor an <literal>ALSO</>
1970     rule, nor an <literal>INSTEAD</> rule that expands to multiple statements.
1971    </para>
1972
1973   </sect2>
1974
1975  </sect1>
1976
1977 </chapter>