1 <!-- doc/src/sgml/queries.sgml -->
6 <indexterm zone="queries">
7 <primary>query</primary>
10 <indexterm zone="queries">
11 <primary>SELECT</primary>
15 The previous chapters explained how to create tables, how to fill
16 them with data, and how to manipulate that data. Now we finally
17 discuss how to retrieve the data from the database.
21 <sect1 id="queries-overview">
22 <title>Overview</title>
25 The process of retrieving or the command to retrieve data from a
26 database is called a <firstterm>query</firstterm>. In SQL the
27 <xref linkend="sql-select"> command is
28 used to specify queries. The general syntax of the
29 <command>SELECT</command> command is
31 <optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
33 The following sections describe the details of the select list, the
34 table expression, and the sort specification. <literal>WITH</>
35 queries are treated last since they are an advanced feature.
39 A simple kind of query has the form:
43 Assuming that there is a table called <literal>table1</literal>,
44 this command would retrieve all rows and all columns from
45 <literal>table1</literal>. (The method of retrieval depends on the
46 client application. For example, the
47 <application>psql</application> program will display an ASCII-art
48 table on the screen, while client libraries will offer functions to
49 extract individual values from the query result.) The select list
50 specification <literal>*</literal> means all columns that the table
51 expression happens to provide. A select list can also select a
52 subset of the available columns or make calculations using the
53 columns. For example, if
54 <literal>table1</literal> has columns named <literal>a</>,
55 <literal>b</>, and <literal>c</> (and perhaps others) you can make
58 SELECT a, b + c FROM table1;
60 (assuming that <literal>b</> and <literal>c</> are of a numerical
62 See <xref linkend="queries-select-lists"> for more details.
66 <literal>FROM table1</literal> is a simple kind of
67 table expression: it reads just one table. In general, table
68 expressions can be complex constructs of base tables, joins, and
69 subqueries. But you can also omit the table expression entirely and
70 use the <command>SELECT</command> command as a calculator:
74 This is more useful if the expressions in the select list return
75 varying results. For example, you could call a function this way:
83 <sect1 id="queries-table-expressions">
84 <title>Table Expressions</title>
86 <indexterm zone="queries-table-expressions">
87 <primary>table expression</primary>
91 A <firstterm>table expression</firstterm> computes a table. The
92 table expression contains a <literal>FROM</> clause that is
93 optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
94 <literal>HAVING</> clauses. Trivial table expressions simply refer
95 to a table on disk, a so-called base table, but more complex
96 expressions can be used to modify or combine base tables in various
101 The optional <literal>WHERE</>, <literal>GROUP BY</>, and
102 <literal>HAVING</> clauses in the table expression specify a
103 pipeline of successive transformations performed on the table
104 derived in the <literal>FROM</> clause. All these transformations
105 produce a virtual table that provides the rows that are passed to
106 the select list to compute the output rows of the query.
109 <sect2 id="queries-from">
110 <title>The <literal>FROM</literal> Clause</title>
113 The <xref linkend="sql-from" endterm="sql-from-title"> derives a
114 table from one or more other tables given in a comma-separated
115 table reference list.
117 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
120 A table reference can be a table name (possibly schema-qualified),
121 or a derived table such as a subquery, a table join, or complex
122 combinations of these. If more than one table reference is listed
123 in the <literal>FROM</> clause they are cross-joined (see below)
124 to form the intermediate virtual table that can then be subject to
125 transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
126 and <literal>HAVING</> clauses and is finally the result of the
127 overall table expression.
131 <primary>ONLY</primary>
135 When a table reference names a table that is the parent of a
136 table inheritance hierarchy, the table reference produces rows of
137 not only that table but all of its descendant tables, unless the
138 key word <literal>ONLY</> precedes the table name. However, the
139 reference produces only the columns that appear in the named table
140 — any columns added in subtables are ignored.
143 <sect3 id="queries-join">
144 <title>Joined Tables</title>
146 <indexterm zone="queries-join">
147 <primary>join</primary>
151 A joined table is a table derived from two other (real or
152 derived) tables according to the rules of the particular join
153 type. Inner, outer, and cross-joins are available.
157 <title>Join Types</title>
160 <term>Cross join</term>
163 <primary>join</primary>
164 <secondary>cross</secondary>
168 <primary>cross join</primary>
173 <replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
177 For every possible combination of rows from
178 <replaceable>T1</replaceable> and
179 <replaceable>T2</replaceable> (i.e., a Cartesian product),
180 the joined table will contain a
181 row consisting of all columns in <replaceable>T1</replaceable>
182 followed by all columns in <replaceable>T2</replaceable>. If
183 the tables have N and M rows respectively, the joined
184 table will have N * M rows.
188 <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
189 <replaceable>T2</replaceable></literal> is equivalent to
190 <literal>FROM <replaceable>T1</replaceable>,
191 <replaceable>T2</replaceable></literal>. It is also equivalent to
192 <literal>FROM <replaceable>T1</replaceable> INNER JOIN
193 <replaceable>T2</replaceable> ON TRUE</literal> (see below).
199 <term>Qualified joins</term>
202 <primary>join</primary>
203 <secondary>outer</secondary>
207 <primary>outer join</primary>
212 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
213 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
214 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
218 The words <literal>INNER</literal> and
219 <literal>OUTER</literal> are optional in all forms.
220 <literal>INNER</literal> is the default;
221 <literal>LEFT</literal>, <literal>RIGHT</literal>, and
222 <literal>FULL</literal> imply an outer join.
226 The <firstterm>join condition</firstterm> is specified in the
227 <literal>ON</> or <literal>USING</> clause, or implicitly by
228 the word <literal>NATURAL</>. The join condition determines
229 which rows from the two source tables are considered to
230 <quote>match</quote>, as explained in detail below.
234 The <literal>ON</> clause is the most general kind of join
235 condition: it takes a Boolean value expression of the same
236 kind as is used in a <literal>WHERE</> clause. A pair of rows
237 from <replaceable>T1</> and <replaceable>T2</> match if the
238 <literal>ON</> expression evaluates to true for them.
242 <literal>USING</> is a shorthand notation: it takes a
243 comma-separated list of column names, which the joined tables
244 must have in common, and forms a join condition specifying
245 equality of each of these pairs of columns. Furthermore, the
246 output of <literal>JOIN USING</> has one column for each of
247 the equated pairs of input columns, followed by the
248 remaining columns from each table. Thus, <literal>USING (a, b,
249 c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
250 t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
251 if <literal>ON</> is used there will be two columns
252 <literal>a</>, <literal>b</>, and <literal>c</> in the result,
253 whereas with <literal>USING</> there will be only one of each
254 (and they will appear first if <command>SELECT *</> is used).
259 <primary>join</primary>
260 <secondary>natural</secondary>
263 <primary>natural join</primary>
265 Finally, <literal>NATURAL</> is a shorthand form of
266 <literal>USING</>: it forms a <literal>USING</> list
267 consisting of all column names that appear in both
268 input tables. As with <literal>USING</>, these columns appear
269 only once in the output table.
273 The possible types of qualified join are:
277 <term><literal>INNER JOIN</></term>
281 For each row R1 of T1, the joined table has a row for each
282 row in T2 that satisfies the join condition with R1.
288 <term><literal>LEFT OUTER JOIN</></term>
291 <primary>join</primary>
292 <secondary>left</secondary>
296 <primary>left join</primary>
301 First, an inner join is performed. Then, for each row in
302 T1 that does not satisfy the join condition with any row in
303 T2, a joined row is added with null values in columns of
304 T2. Thus, the joined table always has at least
305 one row for each row in T1.
311 <term><literal>RIGHT OUTER JOIN</></term>
314 <primary>join</primary>
315 <secondary>right</secondary>
319 <primary>right join</primary>
324 First, an inner join is performed. Then, for each row in
325 T2 that does not satisfy the join condition with any row in
326 T1, a joined row is added with null values in columns of
327 T1. This is the converse of a left join: the result table
328 will always have a row for each row in T2.
334 <term><literal>FULL OUTER JOIN</></term>
338 First, an inner join is performed. Then, for each row in
339 T1 that does not satisfy the join condition with any row in
340 T2, a joined row is added with null values in columns of
341 T2. Also, for each row of T2 that does not satisfy the
342 join condition with any row in T1, a joined row with null
343 values in the columns of T1 is added.
354 Joins of all types can be chained together or nested: either or
355 both <replaceable>T1</replaceable> and
356 <replaceable>T2</replaceable> can be joined tables. Parentheses
357 can be used around <literal>JOIN</> clauses to control the join
358 order. In the absence of parentheses, <literal>JOIN</> clauses
363 To put this together, assume we have tables <literal>t1</literal>:
371 and <literal>t2</literal>:
379 then we get the following results for the various joins:
381 <prompt>=></> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
382 num | name | num | value
383 -----+------+-----+-------
395 <prompt>=></> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
396 num | name | num | value
397 -----+------+-----+-------
402 <prompt>=></> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
409 <prompt>=></> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
416 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
417 num | name | num | value
418 -----+------+-----+-------
424 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
432 <prompt>=></> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
433 num | name | num | value
434 -----+------+-----+-------
440 <prompt>=></> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
441 num | name | num | value
442 -----+------+-----+-------
452 The join condition specified with <literal>ON</> can also contain
453 conditions that do not relate directly to the join. This can
454 prove useful for some queries but needs to be thought out
455 carefully. For example:
457 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
458 num | name | num | value
459 -----+------+-----+-------
465 Notice that placing the restriction in the <literal>WHERE</> clause
466 produces a different result:
468 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';</>
469 num | name | num | value
470 -----+------+-----+-------
474 This is because a restriction placed in the <literal>ON</>
475 clause is processed <emphasis>before</> the join, while
476 a restriction placed in the <literal>WHERE</> clause is processed
477 <emphasis>after</> the join.
481 <sect3 id="queries-table-aliases">
482 <title>Table and Column Aliases</title>
484 <indexterm zone="queries-table-aliases">
485 <primary>alias</primary>
486 <secondary>in the FROM clause</secondary>
490 <primary>label</primary>
495 A temporary name can be given to tables and complex table
496 references to be used for references to the derived table in
497 the rest of the query. This is called a <firstterm>table
502 To create a table alias, write
504 FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
508 FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
510 The <literal>AS</literal> key word is optional noise.
511 <replaceable>alias</replaceable> can be any identifier.
515 A typical application of table aliases is to assign short
516 identifiers to long table names to keep the join clauses
517 readable. For example:
519 SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
524 The alias becomes the new name of the table reference so far as the
525 current query is concerned — it is not allowed to refer to the
526 table by the original name elsewhere in the query. Thus, this is not
529 SELECT * FROM my_table AS m WHERE my_table.a > 5; -- wrong
534 Table aliases are mainly for notational convenience, but it is
535 necessary to use them when joining a table to itself, e.g.:
537 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
539 Additionally, an alias is required if the table reference is a
540 subquery (see <xref linkend="queries-subqueries">).
544 Parentheses are used to resolve ambiguities. In the following example,
545 the first statement assigns the alias <literal>b</literal> to the second
546 instance of <literal>my_table</>, but the second statement assigns the
547 alias to the result of the join:
549 SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
550 SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
555 Another form of table aliasing gives temporary names to the columns of
556 the table, as well as the table itself:
558 FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
560 If fewer column aliases are specified than the actual table has
561 columns, the remaining columns are not renamed. This syntax is
562 especially useful for self-joins or subqueries.
566 When an alias is applied to the output of a <literal>JOIN</>
567 clause, the alias hides the original
568 name(s) within the <literal>JOIN</>. For example:
570 SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
574 SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
576 is not valid; the table alias <literal>a</> is not visible
577 outside the alias <literal>c</>.
581 <sect3 id="queries-subqueries">
582 <title>Subqueries</title>
584 <indexterm zone="queries-subqueries">
585 <primary>subquery</primary>
589 Subqueries specifying a derived table must be enclosed in
590 parentheses and <emphasis>must</emphasis> be assigned a table
591 alias name. (See <xref linkend="queries-table-aliases">.) For
594 FROM (SELECT * FROM table1) AS alias_name
599 This example is equivalent to <literal>FROM table1 AS
600 alias_name</literal>. More interesting cases, which cannot be
601 reduced to a plain join, arise when the subquery involves
602 grouping or aggregation.
606 A subquery can also be a <command>VALUES</> list:
608 FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
609 AS names(first, last)
611 Again, a table alias is required. Assigning alias names to the columns
612 of the <command>VALUES</> list is optional, but is good practice.
613 For more information see <xref linkend="queries-values">.
617 <sect3 id="queries-tablefunctions">
618 <title>Table Functions</title>
620 <indexterm zone="queries-tablefunctions"><primary>table function</></>
622 <indexterm zone="queries-tablefunctions">
624 <secondary>in the FROM clause</>
628 Table functions are functions that produce a set of rows, made up
629 of either base data types (scalar types) or composite data types
630 (table rows). They are used like a table, view, or subquery in
631 the <literal>FROM</> clause of a query. Columns returned by table
632 functions can be included in <literal>SELECT</>,
633 <literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
634 as a table, view, or subquery column.
638 If a table function returns a base data type, the single result
639 column name matches the function name. If the function returns a
640 composite type, the result columns get the same names as the
641 individual attributes of the type.
645 A table function can be aliased in the <literal>FROM</> clause,
646 but it also can be left unaliased. If a function is used in the
647 <literal>FROM</> clause with no alias, the function name is used
648 as the resulting table name.
654 CREATE TABLE foo (fooid int, foosubid int, fooname text);
656 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
657 SELECT * FROM foo WHERE fooid = $1;
660 SELECT * FROM getfoo(1) AS t1;
665 FROM getfoo(foo.fooid) z
666 WHERE z.fooid = foo.fooid
669 CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
671 SELECT * FROM vw_getfoo;
676 In some cases it is useful to define table functions that can
677 return different column sets depending on how they are invoked.
678 To support this, the table function can be declared as returning
679 the pseudotype <type>record</>. When such a function is used in
680 a query, the expected row structure must be specified in the
681 query itself, so that the system can know how to parse and plan
682 the query. Consider this example:
685 FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
686 AS t1(proname name, prosrc text)
687 WHERE proname LIKE 'bytea%';
689 The <xref linkend="CONTRIB-DBLINK-FUNCTION"> function
690 (part of the <xref linkend="dblink"> module>) executes
691 a remote query. It is declared to return
692 <type>record</> since it might be used for any kind of query.
693 The actual column set must be specified in the calling query so
694 that the parser knows, for example, what <literal>*</> should
700 <sect2 id="queries-where">
701 <title>The <literal>WHERE</literal> Clause</title>
703 <indexterm zone="queries-where">
704 <primary>WHERE</primary>
708 The syntax of the <xref linkend="sql-where"
709 endterm="sql-where-title"> is
711 WHERE <replaceable>search_condition</replaceable>
713 where <replaceable>search_condition</replaceable> is any value
714 expression (see <xref linkend="sql-expressions">) that
715 returns a value of type <type>boolean</type>.
719 After the processing of the <literal>FROM</> clause is done, each
720 row of the derived virtual table is checked against the search
721 condition. If the result of the condition is true, the row is
722 kept in the output table, otherwise (i.e., if the result is
723 false or null) it is discarded. The search condition typically
724 references at least one column of the table generated in the
725 <literal>FROM</> clause; this is not required, but otherwise the
726 <literal>WHERE</> clause will be fairly useless.
731 The join condition of an inner join can be written either in
732 the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
733 For example, these table expressions are equivalent:
735 FROM a, b WHERE a.id = b.id AND b.val > 5
739 FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5
743 FROM a NATURAL JOIN b WHERE b.val > 5
745 Which one of these you use is mainly a matter of style. The
746 <literal>JOIN</> syntax in the <literal>FROM</> clause is
747 probably not as portable to other SQL database management systems,
748 even though it is in the SQL standard. For
749 outer joins there is no choice: they must be done in
750 the <literal>FROM</> clause. The <literal>ON</> or <literal>USING</>
751 clause of an outer join is <emphasis>not</> equivalent to a
752 <literal>WHERE</> condition, because it results in the addition
753 of rows (for unmatched input rows) as well as the removal of rows
759 Here are some examples of <literal>WHERE</literal> clauses:
761 SELECT ... FROM fdt WHERE c1 > 5
763 SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
765 SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
767 SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
769 SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
771 SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)
773 <literal>fdt</literal> is the table derived in the
774 <literal>FROM</> clause. Rows that do not meet the search
775 condition of the <literal>WHERE</> clause are eliminated from
776 <literal>fdt</literal>. Notice the use of scalar subqueries as
777 value expressions. Just like any other query, the subqueries can
778 employ complex table expressions. Notice also how
779 <literal>fdt</literal> is referenced in the subqueries.
780 Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
781 if <literal>c1</> is also the name of a column in the derived
782 input table of the subquery. But qualifying the column name adds
783 clarity even when it is not needed. This example shows how the column
784 naming scope of an outer query extends into its inner queries.
789 <sect2 id="queries-group">
790 <title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
792 <indexterm zone="queries-group">
793 <primary>GROUP BY</primary>
796 <indexterm zone="queries-group">
797 <primary>grouping</primary>
801 After passing the <literal>WHERE</> filter, the derived input
802 table might be subject to grouping, using the <literal>GROUP BY</>
803 clause, and elimination of group rows using the <literal>HAVING</>
808 SELECT <replaceable>select_list</replaceable>
810 <optional>WHERE ...</optional>
811 GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
815 The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
816 used to group together those rows in a table that have the same
817 values in all the columns listed. The order in which the columns
818 are listed does not matter. The effect is to combine each set
819 of rows having common values into one group row that
820 represents all rows in the group. This is done to
821 eliminate redundancy in the output and/or compute aggregates that
822 apply to these groups. For instance:
824 <prompt>=></> <userinput>SELECT * FROM test1;</>
833 <prompt>=></> <userinput>SELECT x FROM test1 GROUP BY x;</>
844 In the second query, we could not have written <literal>SELECT *
845 FROM test1 GROUP BY x</literal>, because there is no single value
846 for the column <literal>y</> that could be associated with each
847 group. The grouped-by columns can be referenced in the select list since
848 they have a single value in each group.
852 In general, if a table is grouped, columns that are not
853 listed in <literal>GROUP BY</> cannot be referenced except in aggregate
854 expressions. An example with aggregate expressions is:
856 <prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
864 Here <literal>sum</literal> is an aggregate function that
865 computes a single value over the entire group. More information
866 about the available aggregate functions can be found in <xref
867 linkend="functions-aggregate">.
872 Grouping without aggregate expressions effectively calculates the
873 set of distinct values in a column. This can also be achieved
874 using the <literal>DISTINCT</> clause (see <xref
875 linkend="queries-distinct">).
880 Here is another example: it calculates the total sales for each
881 product (rather than the total sales of all products):
883 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
884 FROM products p LEFT JOIN sales s USING (product_id)
885 GROUP BY product_id, p.name, p.price;
887 In this example, the columns <literal>product_id</literal>,
888 <literal>p.name</literal>, and <literal>p.price</literal> must be
889 in the <literal>GROUP BY</> clause since they are referenced in
890 the query select list (but see below). The column
891 <literal>s.units</> does not have to be in the <literal>GROUP
892 BY</> list since it is only used in an aggregate expression
893 (<literal>sum(...)</literal>), which represents the sales
894 of a product. For each product, the query returns a summary row about
895 all sales of the product.
899 If the products table is set up so that,
900 say, <literal>product_id</literal> is the primary key, then it
901 would be enough to group by <literal>product_id</literal> in the
902 above example, since name and price would
903 be <firstterm>functionally
904 dependent</firstterm><indexterm><primary>functional
905 dependency</primary></indexterm> on the product ID, and so there
906 would be no ambiguity about which name and price value to return
907 for each product ID group.
911 In strict SQL, <literal>GROUP BY</> can only group by columns of
912 the source table but <productname>PostgreSQL</productname> extends
913 this to also allow <literal>GROUP BY</> to group by columns in the
914 select list. Grouping by value expressions instead of simple
915 column names is also allowed.
919 <primary>HAVING</primary>
923 If a table has been grouped using <literal>GROUP BY</literal>,
924 but only certain groups are of interest, the
925 <literal>HAVING</literal> clause can be used, much like a
926 <literal>WHERE</> clause, to eliminate groups from the result.
929 SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
931 Expressions in the <literal>HAVING</> clause can refer both to
932 grouped expressions and to ungrouped expressions (which necessarily
933 involve an aggregate function).
939 <prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;</>
946 <prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';</>
956 Again, a more realistic example:
958 SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
959 FROM products p LEFT JOIN sales s USING (product_id)
960 WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
961 GROUP BY product_id, p.name, p.price, p.cost
962 HAVING sum(p.price * s.units) > 5000;
964 In the example above, the <literal>WHERE</> clause is selecting
965 rows by a column that is not grouped (the expression is only true for
966 sales during the last four weeks), while the <literal>HAVING</>
967 clause restricts the output to groups with total gross sales over
968 5000. Note that the aggregate expressions do not necessarily need
969 to be the same in all parts of the query.
973 If a query contains aggregate function calls, but no <literal>GROUP BY</>
974 clause, grouping still occurs: the result is a single group row (or
975 perhaps no rows at all, if the single row is then eliminated by
977 The same is true if it contains a <literal>HAVING</> clause, even
978 without any aggregate function calls or <literal>GROUP BY</> clause.
982 <sect2 id="queries-window">
983 <title>Window Function Processing</title>
985 <indexterm zone="queries-window">
986 <primary>window function</primary>
987 <secondary>order of execution</>
991 If the query contains any window functions (see
992 <xref linkend="tutorial-window">,
993 <xref linkend="functions-window"> and
994 <xref linkend="syntax-window-functions">), these functions are evaluated
995 after any grouping, aggregation, and <literal>HAVING</> filtering is
996 performed. That is, if the query uses any aggregates, <literal>GROUP
997 BY</>, or <literal>HAVING</>, then the rows seen by the window functions
998 are the group rows instead of the original table rows from
999 <literal>FROM</>/<literal>WHERE</>.
1003 When multiple window functions are used, all the window functions having
1004 syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
1005 clauses in their window definitions are guaranteed to be evaluated in a
1006 single pass over the data. Therefore they will see the same sort ordering,
1007 even if the <literal>ORDER BY</> does not uniquely determine an ordering.
1008 However, no guarantees are made about the evaluation of functions having
1009 different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
1010 (In such cases a sort step is typically required between the passes of
1011 window function evaluations, and the sort is not guaranteed to preserve
1012 ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
1016 Currently, window functions always require presorted data, and so the
1017 query output will be ordered according to one or another of the window
1018 functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
1019 It is not recommendable to rely on this, however. Use an explicit
1020 top-level <literal>ORDER BY</> clause if you want to be sure the
1021 results are sorted in a particular way.
1027 <sect1 id="queries-select-lists">
1028 <title>Select Lists</title>
1031 <primary>SELECT</primary>
1032 <secondary>select list</secondary>
1036 As shown in the previous section,
1037 the table expression in the <command>SELECT</command> command
1038 constructs an intermediate virtual table by possibly combining
1039 tables, views, eliminating rows, grouping, etc. This table is
1040 finally passed on to processing by the <firstterm>select list</firstterm>. The select
1041 list determines which <emphasis>columns</emphasis> of the
1042 intermediate table are actually output.
1045 <sect2 id="queries-select-list-items">
1046 <title>Select-List Items</title>
1049 <primary>*</primary>
1053 The simplest kind of select list is <literal>*</literal> which
1054 emits all columns that the table expression produces. Otherwise,
1055 a select list is a comma-separated list of value expressions (as
1056 defined in <xref linkend="sql-expressions">). For instance, it
1057 could be a list of column names:
1059 SELECT a, b, c FROM ...
1061 The columns names <literal>a</>, <literal>b</>, and <literal>c</>
1062 are either the actual names of the columns of tables referenced
1063 in the <literal>FROM</> clause, or the aliases given to them as
1064 explained in <xref linkend="queries-table-aliases">. The name
1065 space available in the select list is the same as in the
1066 <literal>WHERE</> clause, unless grouping is used, in which case
1067 it is the same as in the <literal>HAVING</> clause.
1071 If more than one table has a column of the same name, the table
1072 name must also be given, as in:
1074 SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
1076 When working with multiple tables, it can also be useful to ask for
1077 all the columns of a particular table:
1079 SELECT tbl1.*, tbl2.a FROM ...
1081 (See also <xref linkend="queries-where">.)
1085 If an arbitrary value expression is used in the select list, it
1086 conceptually adds a new virtual column to the returned table. The
1087 value expression is evaluated once for each result row, with
1088 the row's values substituted for any column references. But the
1089 expressions in the select list do not have to reference any
1090 columns in the table expression of the <literal>FROM</> clause;
1091 they can be constant arithmetic expressions, for instance.
1095 <sect2 id="queries-column-labels">
1096 <title>Column Labels</title>
1098 <indexterm zone="queries-column-labels">
1099 <primary>alias</primary>
1100 <secondary>in the select list</secondary>
1104 The entries in the select list can be assigned names for subsequent
1105 processing, such as for use in an <literal>ORDER BY</> clause
1106 or for display by the client application. For example:
1108 SELECT a AS value, b + c AS sum FROM ...
1113 If no output column name is specified using <literal>AS</>,
1114 the system assigns a default column name. For simple column references,
1115 this is the name of the referenced column. For function
1116 calls, this is the name of the function. For complex expressions,
1117 the system will generate a generic name.
1121 The <literal>AS</> keyword is optional, but only if the new column
1122 name does not match any
1123 <productname>PostgreSQL</productname> keyword (see <xref
1124 linkend="sql-keywords-appendix">). To avoid an accidental match to
1125 a keyword, you can double-quote the column name. For example,
1126 <literal>VALUE</> is a keyword, so this does not work:
1128 SELECT a value, b + c AS sum FROM ...
1132 SELECT a "value", b + c AS sum FROM ...
1134 For protection against possible
1135 future keyword additions, it is recommended that you always either
1136 write <literal>AS</literal> or double-quote the output column name.
1141 The naming of output columns here is different from that done in
1142 the <literal>FROM</> clause (see <xref
1143 linkend="queries-table-aliases">). It is possible
1144 to rename the same column twice, but the name assigned in
1145 the select list is the one that will be passed on.
1150 <sect2 id="queries-distinct">
1151 <title><literal>DISTINCT</literal></title>
1153 <indexterm zone="queries-distinct">
1154 <primary>DISTINCT</primary>
1157 <indexterm zone="queries-distinct">
1158 <primary>duplicates</primary>
1162 After the select list has been processed, the result table can
1163 optionally be subject to the elimination of duplicate rows. The
1164 <literal>DISTINCT</literal> key word is written directly after
1165 <literal>SELECT</literal> to specify this:
1167 SELECT DISTINCT <replaceable>select_list</replaceable> ...
1169 (Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
1170 can be used to specify the default behavior of retaining all rows.)
1174 <indexterm><primary>null value</><secondary sortas="DISTINCT">in
1175 DISTINCT</></indexterm>
1176 Obviously, two rows are considered distinct if they differ in at
1177 least one column value. Null values are considered equal in this
1182 Alternatively, an arbitrary expression can determine what rows are
1183 to be considered distinct:
1185 SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
1187 Here <replaceable>expression</replaceable> is an arbitrary value
1188 expression that is evaluated for all rows. A set of rows for
1189 which all the expressions are equal are considered duplicates, and
1190 only the first row of the set is kept in the output. Note that
1191 the <quote>first row</quote> of a set is unpredictable unless the
1192 query is sorted on enough columns to guarantee a unique ordering
1193 of the rows arriving at the <literal>DISTINCT</> filter.
1194 (<literal>DISTINCT ON</> processing occurs after <literal>ORDER
1199 The <literal>DISTINCT ON</> clause is not part of the SQL standard
1200 and is sometimes considered bad style because of the potentially
1201 indeterminate nature of its results. With judicious use of
1202 <literal>GROUP BY</> and subqueries in <literal>FROM</>, this
1203 construct can be avoided, but it is often the most convenient
1210 <sect1 id="queries-union">
1211 <title>Combining Queries</title>
1213 <indexterm zone="queries-union">
1214 <primary>UNION</primary>
1216 <indexterm zone="queries-union">
1217 <primary>INTERSECT</primary>
1219 <indexterm zone="queries-union">
1220 <primary>EXCEPT</primary>
1222 <indexterm zone="queries-union">
1223 <primary>set union</primary>
1225 <indexterm zone="queries-union">
1226 <primary>set intersection</primary>
1228 <indexterm zone="queries-union">
1229 <primary>set difference</primary>
1231 <indexterm zone="queries-union">
1232 <primary>set operation</primary>
1236 The results of two queries can be combined using the set operations
1237 union, intersection, and difference. The syntax is
1239 <replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
1240 <replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
1241 <replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
1243 <replaceable>query1</replaceable> and
1244 <replaceable>query2</replaceable> are queries that can use any of
1245 the features discussed up to this point. Set operations can also
1246 be nested and chained, for example
1248 <replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
1250 which is executed as:
1252 (<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
1257 <literal>UNION</> effectively appends the result of
1258 <replaceable>query2</replaceable> to the result of
1259 <replaceable>query1</replaceable> (although there is no guarantee
1260 that this is the order in which the rows are actually returned).
1261 Furthermore, it eliminates duplicate rows from its result, in the same
1262 way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
1266 <literal>INTERSECT</> returns all rows that are both in the result
1267 of <replaceable>query1</replaceable> and in the result of
1268 <replaceable>query2</replaceable>. Duplicate rows are eliminated
1269 unless <literal>INTERSECT ALL</> is used.
1273 <literal>EXCEPT</> returns all rows that are in the result of
1274 <replaceable>query1</replaceable> but not in the result of
1275 <replaceable>query2</replaceable>. (This is sometimes called the
1276 <firstterm>difference</> between two queries.) Again, duplicates
1277 are eliminated unless <literal>EXCEPT ALL</> is used.
1281 In order to calculate the union, intersection, or difference of two
1282 queries, the two queries must be <quote>union compatible</quote>,
1283 which means that they return the same number of columns and
1284 the corresponding columns have compatible data types, as
1285 described in <xref linkend="typeconv-union-case">.
1290 <sect1 id="queries-order">
1291 <title>Sorting Rows</title>
1293 <indexterm zone="queries-order">
1294 <primary>sorting</primary>
1297 <indexterm zone="queries-order">
1298 <primary>ORDER BY</primary>
1302 After a query has produced an output table (after the select list
1303 has been processed) it can optionally be sorted. If sorting is not
1304 chosen, the rows will be returned in an unspecified order. The actual
1305 order in that case will depend on the scan and join plan types and
1306 the order on disk, but it must not be relied on. A particular
1307 output ordering can only be guaranteed if the sort step is explicitly
1312 The <literal>ORDER BY</> clause specifies the sort order:
1314 SELECT <replaceable>select_list</replaceable>
1315 FROM <replaceable>table_expression</replaceable>
1316 ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
1317 <optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
1319 The sort expression(s) can be any expression that would be valid in the
1320 query's select list. An example is:
1322 SELECT a, b FROM table1 ORDER BY a + b, c;
1324 When more than one expression is specified,
1325 the later values are used to sort rows that are equal according to the
1326 earlier values. Each expression can be followed by an optional
1327 <literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
1328 ascending or descending. <literal>ASC</> order is the default.
1329 Ascending order puts smaller values first, where
1330 <quote>smaller</quote> is defined in terms of the
1331 <literal><</literal> operator. Similarly, descending order is
1332 determined with the <literal>></literal> operator.
1335 Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
1336 operator class</> for the expression's data type to determine the sort
1337 ordering for <literal>ASC</> and <literal>DESC</>. Conventionally,
1338 data types will be set up so that the <literal><</literal> and
1339 <literal>></literal> operators correspond to this sort ordering,
1340 but a user-defined data type's designer could choose to do something
1347 The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
1348 used to determine whether nulls appear before or after non-null values
1349 in the sort ordering. By default, null values sort as if larger than any
1350 non-null value; that is, <literal>NULLS FIRST</> is the default for
1351 <literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
1355 Note that the ordering options are considered independently for each
1356 sort column. For example <literal>ORDER BY x, y DESC</> means
1357 <literal>ORDER BY x ASC, y DESC</>, which is not the same as
1358 <literal>ORDER BY x DESC, y DESC</>.
1362 A <replaceable>sort_expression</> can also be the column label or number
1363 of an output column, as in:
1365 SELECT a + b AS sum, c FROM table1 ORDER BY sum;
1366 SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
1368 both of which sort by the first output column. Note that an output
1369 column name has to stand alone, that is, it cannot be used in an expression
1370 — for example, this is <emphasis>not</> correct:
1372 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- wrong
1374 This restriction is made to reduce ambiguity. There is still
1375 ambiguity if an <literal>ORDER BY</> item is a simple name that
1376 could match either an output column name or a column from the table
1377 expression. The output column is used in such cases. This would
1378 only cause confusion if you use <literal>AS</> to rename an output
1379 column to match some other table column's name.
1383 <literal>ORDER BY</> can be applied to the result of a
1384 <literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
1385 combination, but in this case it is only permitted to sort by
1386 output column names or numbers, not by expressions.
1391 <sect1 id="queries-limit">
1392 <title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
1394 <indexterm zone="queries-limit">
1395 <primary>LIMIT</primary>
1398 <indexterm zone="queries-limit">
1399 <primary>OFFSET</primary>
1403 <literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
1404 a portion of the rows that are generated by the rest of the query:
1406 SELECT <replaceable>select_list</replaceable>
1407 FROM <replaceable>table_expression</replaceable>
1408 <optional> ORDER BY ... </optional>
1409 <optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
1414 If a limit count is given, no more than that many rows will be
1415 returned (but possibly less, if the query itself yields less rows).
1416 <literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
1421 <literal>OFFSET</> says to skip that many rows before beginning to
1422 return rows. <literal>OFFSET 0</> is the same as omitting the
1423 <literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
1424 as omitting the <literal>LIMIT</> clause. If both <literal>OFFSET</>
1425 and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
1426 skipped before starting to count the <literal>LIMIT</> rows that
1431 When using <literal>LIMIT</>, it is important to use an
1432 <literal>ORDER BY</> clause that constrains the result rows into a
1433 unique order. Otherwise you will get an unpredictable subset of
1434 the query's rows. You might be asking for the tenth through
1435 twentieth rows, but tenth through twentieth in what ordering? The
1436 ordering is unknown, unless you specified <literal>ORDER BY</>.
1440 The query optimizer takes <literal>LIMIT</> into account when
1441 generating query plans, so you are very likely to get different
1442 plans (yielding different row orders) depending on what you give
1443 for <literal>LIMIT</> and <literal>OFFSET</>. Thus, using
1444 different <literal>LIMIT</>/<literal>OFFSET</> values to select
1445 different subsets of a query result <emphasis>will give
1446 inconsistent results</emphasis> unless you enforce a predictable
1447 result ordering with <literal>ORDER BY</>. This is not a bug; it
1448 is an inherent consequence of the fact that SQL does not promise to
1449 deliver the results of a query in any particular order unless
1450 <literal>ORDER BY</> is used to constrain the order.
1454 The rows skipped by an <literal>OFFSET</> clause still have to be
1455 computed inside the server; therefore a large <literal>OFFSET</>
1456 might be inefficient.
1461 <sect1 id="queries-values">
1462 <title><literal>VALUES</literal> Lists</title>
1464 <indexterm zone="queries-values">
1465 <primary>VALUES</primary>
1469 <literal>VALUES</> provides a way to generate a <quote>constant table</>
1470 that can be used in a query without having to actually create and populate
1471 a table on-disk. The syntax is
1473 VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
1475 Each parenthesized list of expressions generates a row in the table.
1476 The lists must all have the same number of elements (i.e., the number
1477 of columns in the table), and corresponding entries in each list must
1478 have compatible data types. The actual data type assigned to each column
1479 of the result is determined using the same rules as for <literal>UNION</>
1480 (see <xref linkend="typeconv-union-case">).
1486 VALUES (1, 'one'), (2, 'two'), (3, 'three');
1489 will return a table of two columns and three rows. It's effectively
1492 SELECT 1 AS column1, 'one' AS column2
1499 By default, <productname>PostgreSQL</productname> assigns the names
1500 <literal>column1</>, <literal>column2</>, etc. to the columns of a
1501 <literal>VALUES</> table. The column names are not specified by the
1502 SQL standard and different database systems do it differently, so
1503 it's usually better to override the default names with a table alias
1508 Syntactically, <literal>VALUES</> followed by expression lists is
1509 treated as equivalent to:
1511 SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
1513 and can appear anywhere a <literal>SELECT</> can. For example, you can
1514 use it as part of a <literal>UNION</>, or attach a
1515 <replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
1516 <literal>LIMIT</>, and/or <literal>OFFSET</>) to it. <literal>VALUES</>
1517 is most commonly used as the data source in an <command>INSERT</> command,
1518 and next most commonly as a subquery.
1522 For more information see <xref linkend="sql-values">.
1528 <sect1 id="queries-with">
1529 <title><literal>WITH</literal> Queries (Common Table Expressions)</title>
1531 <indexterm zone="queries-with">
1532 <primary>WITH</primary>
1533 <secondary>in SELECT</secondary>
1537 <primary>common table expression</primary>
1542 <literal>WITH</> provides a way to write auxiliary statements for use in a
1543 larger query. These statements, which are often referred to as Common
1544 Table Expressions or <acronym>CTE</acronym>s, can be thought of as defining
1545 temporary tables that exist just for one query. Each auxiliary statement
1546 in a <literal>WITH</> clause can be a <command>SELECT</>,
1547 <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>; and the
1548 <literal>WITH</> clause itself is attached to a primary statement that can
1549 also be a <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>, or
1553 <sect2 id="queries-with-select">
1554 <title><command>SELECT</> in <literal>WITH</></title>
1557 The basic value of <command>SELECT</> in <literal>WITH</> is to
1558 break down complicated queries into simpler parts. An example is:
1561 WITH regional_sales AS (
1562 SELECT region, SUM(amount) AS total_sales
1568 WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
1572 SUM(quantity) AS product_units,
1573 SUM(amount) AS product_sales
1575 WHERE region IN (SELECT region FROM top_regions)
1576 GROUP BY region, product;
1579 which displays per-product sales totals in only the top sales regions.
1580 The <literal>WITH</> clause defines two auxiliary statements named
1581 <structname>regional_sales</> and <structname>top_regions</>,
1582 where the output of <structname>regional_sales</> is used in
1583 <structname>top_regions</> and the output of <structname>top_regions</>
1584 is used in the primary <command>SELECT</> query.
1585 This example could have been written without <literal>WITH</>,
1586 but we'd have needed two levels of nested sub-SELECTs. It's a bit
1587 easier to follow this way.
1591 The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
1592 from a mere syntactic convenience into a feature that accomplishes
1593 things not otherwise possible in standard SQL. Using
1594 <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
1595 output. A very simple example is this query to sum the integers from 1
1599 WITH RECURSIVE t(n) AS (
1602 SELECT n+1 FROM t WHERE n < 100
1604 SELECT sum(n) FROM t;
1607 The general form of a recursive <literal>WITH</> query is always a
1608 <firstterm>non-recursive term</>, then <literal>UNION</> (or
1609 <literal>UNION ALL</>), then a
1610 <firstterm>recursive term</>, where only the recursive term can contain
1611 a reference to the query's own output. Such a query is executed as
1616 <title>Recursive Query Evaluation</title>
1618 <step performance="required">
1620 Evaluate the non-recursive term. For <literal>UNION</> (but not
1621 <literal>UNION ALL</>), discard duplicate rows. Include all remaining
1622 rows in the result of the recursive query, and also place them in a
1623 temporary <firstterm>working table</>.
1627 <step performance="required">
1629 So long as the working table is not empty, repeat these steps:
1632 <step performance="required">
1634 Evaluate the recursive term, substituting the current contents of
1635 the working table for the recursive self-reference.
1636 For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1637 duplicate rows and rows that duplicate any previous result row.
1638 Include all remaining rows in the result of the recursive query, and
1639 also place them in a temporary <firstterm>intermediate table</>.
1643 <step performance="required">
1645 Replace the contents of the working table with the contents of the
1646 intermediate table, then empty the intermediate table.
1655 Strictly speaking, this process is iteration not recursion, but
1656 <literal>RECURSIVE</> is the terminology chosen by the SQL standards
1662 In the example above, the working table has just a single row in each step,
1663 and it takes on the values from 1 through 100 in successive steps. In
1664 the 100th step, there is no output because of the <literal>WHERE</>
1665 clause, and so the query terminates.
1669 Recursive queries are typically used to deal with hierarchical or
1670 tree-structured data. A useful example is this query to find all the
1671 direct and indirect sub-parts of a product, given only a table that
1672 shows immediate inclusions:
1675 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
1676 SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
1678 SELECT p.sub_part, p.part, p.quantity
1679 FROM included_parts pr, parts p
1680 WHERE p.part = pr.sub_part
1682 SELECT sub_part, SUM(quantity) as total_quantity
1689 When working with recursive queries it is important to be sure that
1690 the recursive part of the query will eventually return no tuples,
1691 or else the query will loop indefinitely. Sometimes, using
1692 <literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1693 by discarding rows that duplicate previous output rows. However, often a
1694 cycle does not involve output rows that are completely duplicate: it may be
1695 necessary to check just one or a few fields to see if the same point has
1696 been reached before. The standard method for handling such situations is
1697 to compute an array of the already-visited values. For example, consider
1698 the following query that searches a table <structname>graph</> using a
1699 <structfield>link</> field:
1702 WITH RECURSIVE search_graph(id, link, data, depth) AS (
1703 SELECT g.id, g.link, g.data, 1
1706 SELECT g.id, g.link, g.data, sg.depth + 1
1707 FROM graph g, search_graph sg
1708 WHERE g.id = sg.link
1710 SELECT * FROM search_graph;
1713 This query will loop if the <structfield>link</> relationships contain
1714 cycles. Because we require a <quote>depth</> output, just changing
1715 <literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
1716 Instead we need to recognize whether we have reached the same row again
1717 while following a particular path of links. We add two columns
1718 <structfield>path</> and <structfield>cycle</> to the loop-prone query:
1721 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1722 SELECT g.id, g.link, g.data, 1,
1727 SELECT g.id, g.link, g.data, sg.depth + 1,
1730 FROM graph g, search_graph sg
1731 WHERE g.id = sg.link AND NOT cycle
1733 SELECT * FROM search_graph;
1736 Aside from preventing cycles, the array value is often useful in its own
1737 right as representing the <quote>path</> taken to reach any particular row.
1741 In the general case where more than one field needs to be checked to
1742 recognize a cycle, use an array of rows. For example, if we needed to
1743 compare fields <structfield>f1</> and <structfield>f2</>:
1746 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1747 SELECT g.id, g.link, g.data, 1,
1748 ARRAY[ROW(g.f1, g.f2)],
1752 SELECT g.id, g.link, g.data, sg.depth + 1,
1753 path || ROW(g.f1, g.f2),
1754 ROW(g.f1, g.f2) = ANY(path)
1755 FROM graph g, search_graph sg
1756 WHERE g.id = sg.link AND NOT cycle
1758 SELECT * FROM search_graph;
1764 Omit the <literal>ROW()</> syntax in the common case where only one field
1765 needs to be checked to recognize a cycle. This allows a simple array
1766 rather than a composite-type array to be used, gaining efficiency.
1772 The recursive query evaluation algorithm produces its output in
1773 breadth-first search order. You can display the results in depth-first
1774 search order by making the outer query <literal>ORDER BY</> a
1775 <quote>path</> column constructed in this way.
1780 A helpful trick for testing queries
1781 when you are not certain if they might loop is to place a <literal>LIMIT</>
1782 in the parent query. For example, this query would loop forever without
1783 the <literal>LIMIT</>:
1786 WITH RECURSIVE t(n) AS (
1791 SELECT n FROM t LIMIT 100;
1794 This works because <productname>PostgreSQL</productname>'s implementation
1795 evaluates only as many rows of a <literal>WITH</> query as are actually
1796 fetched by the parent query. Using this trick in production is not
1797 recommended, because other systems might work differently. Also, it
1798 usually won't work if you make the outer query sort the recursive query's
1799 results or join them to some other table, because in such cases the
1800 outer query will usually try to fetch all of the <literal>WITH</> query's
1805 A useful property of <literal>WITH</> queries is that they are evaluated
1806 only once per execution of the parent query, even if they are referred to
1807 more than once by the parent query or sibling <literal>WITH</> queries.
1808 Thus, expensive calculations that are needed in multiple places can be
1809 placed within a <literal>WITH</> query to avoid redundant work. Another
1810 possible application is to prevent unwanted multiple evaluations of
1811 functions with side-effects.
1812 However, the other side of this coin is that the optimizer is less able to
1813 push restrictions from the parent query down into a <literal>WITH</> query
1814 than an ordinary sub-query. The <literal>WITH</> query will generally be
1815 evaluated as written, without suppression of rows that the parent query
1816 might discard afterwards. (But, as mentioned above, evaluation might stop
1817 early if the reference(s) to the query demand only a limited number of
1822 The examples above only show <literal>WITH</> being used with
1823 <command>SELECT</>, but it can be attached in the same way to
1824 <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>.
1825 In each case it effectively provides temporary table(s) that can
1826 be referred to in the main command.
1830 <sect2 id="queries-with-modifying">
1831 <title>Data-Modifying Statements in <literal>WITH</></title>
1834 You can use data-modifying statements (<command>INSERT</>,
1835 <command>UPDATE</>, or <command>DELETE</>) in <literal>WITH</>. This
1836 allows you to perform several different operations in the same query.
1840 WITH moved_rows AS (
1841 DELETE FROM products
1843 "date" >= '2010-10-01' AND
1844 "date" < '2010-11-01'
1847 INSERT INTO products_log
1848 SELECT * FROM moved_rows;
1851 This query effectively moves rows from <structname>products</> to
1852 <structname>products_log</>. The <command>DELETE</> in <literal>WITH</>
1853 deletes the specified rows from <structname>products</>, returning their
1854 contents by means of its <literal>RETURNING</> clause; and then the
1855 primary query reads that output and inserts it into
1856 <structname>products_log</>.
1860 A fine point of the above example is that the <literal>WITH</> clause is
1861 attached to the <command>INSERT</>, not the sub-<command>SELECT</> within
1862 the <command>INSERT</>. This is necessary because data-modifying
1863 statements are only allowed in <literal>WITH</> clauses that are attached
1864 to the top-level statement. However, normal <literal>WITH</> visibility
1865 rules apply, so it is possible to refer to the <literal>WITH</>
1866 statement's output from the sub-<command>SELECT</>.
1870 Data-modifying statements in <literal>WITH</> usually have
1871 <literal>RETURNING</> clauses, as seen in the example above.
1872 It is the output of the <literal>RETURNING</> clause, <emphasis>not</> the
1873 target table of the data-modifying statement, that forms the temporary
1874 table that can be referred to by the rest of the query. If a
1875 data-modifying statement in <literal>WITH</> lacks a <literal>RETURNING</>
1876 clause, then it forms no temporary table and cannot be referred to in
1877 the rest of the query. Such a statement will be executed nonetheless.
1878 A not-particularly-useful example is:
1887 This example would remove all rows from tables <structname>foo</> and
1888 <structname>bar</>. The number of affected rows reported to the client
1889 would only include rows removed from <structname>bar</>.
1893 Recursive self-references in data-modifying statements are not
1894 allowed. In some cases it is possible to work around this limitation by
1895 referring to the output of a recursive <literal>WITH</>, for example:
1898 WITH RECURSIVE included_parts(sub_part, part) AS (
1899 SELECT sub_part, part FROM parts WHERE part = 'our_product'
1901 SELECT p.sub_part, p.part
1902 FROM included_parts pr, parts p
1903 WHERE p.part = pr.sub_part
1906 WHERE part IN (SELECT part FROM included_parts);
1909 This query would remove all direct and indirect subparts of a product.
1913 Data-modifying statements in <literal>WITH</> are executed exactly once,
1914 and always to completion, independently of whether the primary query
1915 reads all (or indeed any) of their output. Notice that this is different
1916 from the rule for <command>SELECT</> in <literal>WITH</>: as stated in the
1917 previous section, execution of a <command>SELECT</> is carried only as far
1918 as the primary query demands its output.
1922 The sub-statements in <literal>WITH</> are executed concurrently with
1923 each other and with the main query. Therefore, when using data-modifying
1924 statements in <literal>WITH</>, the order in which the specified updates
1925 actually happen is unpredictable. All the statements are executed with
1926 the same <firstterm>snapshot</> (see <xref linkend="mvcc">), so they
1927 cannot <quote>see</> each others' effects on the target tables. This
1928 alleviates the effects of the unpredictability of the actual order of row
1929 updates, and means that <literal>RETURNING</> data is the only way to
1930 communicate changes between different <literal>WITH</> sub-statements and
1931 the main query. An example of this is that in
1935 UPDATE products SET price = price * 1.05
1938 SELECT * FROM products;
1941 the outer <command>SELECT</> would return the original prices before the
1942 action of the <command>UPDATE</>, while in
1946 UPDATE products SET price = price * 1.05
1952 the outer <command>SELECT</> would return the updated data.
1956 Trying to update the same row twice in a single statement is not
1957 supported. Only one of the modifications takes place, but it is not easy
1958 (and sometimes not possible) to reliably predict which one. This also
1959 applies to deleting a row that was already updated in the same statement:
1960 only the update is performed. Therefore you should generally avoid trying
1961 to modify a single row twice in a single statement. In particular avoid
1962 writing <literal>WITH</> sub-statements that could affect the same rows
1963 changed by the main statement or a sibling sub-statement. The effects
1964 of such a statement will not be predictable.
1968 At present, any table used as the target of a data-modifying statement in
1969 <literal>WITH</> must not have a conditional rule, nor an <literal>ALSO</>
1970 rule, nor an <literal>INSTEAD</> rule that expands to multiple statements.