1 <!-- doc/src/sgml/queries.sgml -->
6 <indexterm zone="queries">
7 <primary>query</primary>
10 <indexterm zone="queries">
11 <primary>SELECT</primary>
15 The previous chapters explained how to create tables, how to fill
16 them with data, and how to manipulate that data. Now we finally
17 discuss how to retrieve the data from the database.
21 <sect1 id="queries-overview">
22 <title>Overview</title>
25 The process of retrieving or the command to retrieve data from a
26 database is called a <firstterm>query</firstterm>. In SQL the
27 <xref linkend="sql-select"> command is
28 used to specify queries. The general syntax of the
29 <command>SELECT</command> command is
31 <optional>WITH <replaceable>with_queries</replaceable></optional> SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> <optional><replaceable>sort_specification</replaceable></optional>
33 The following sections describe the details of the select list, the
34 table expression, and the sort specification. <literal>WITH</>
35 queries are treated last since they are an advanced feature.
39 A simple kind of query has the form:
43 Assuming that there is a table called <literal>table1</literal>,
44 this command would retrieve all rows and all user-defined columns from
45 <literal>table1</literal>. (The method of retrieval depends on the
46 client application. For example, the
47 <application>psql</application> program will display an ASCII-art
48 table on the screen, while client libraries will offer functions to
49 extract individual values from the query result.) The select list
50 specification <literal>*</literal> means all columns that the table
51 expression happens to provide. A select list can also select a
52 subset of the available columns or make calculations using the
53 columns. For example, if
54 <literal>table1</literal> has columns named <literal>a</>,
55 <literal>b</>, and <literal>c</> (and perhaps others) you can make
58 SELECT a, b + c FROM table1;
60 (assuming that <literal>b</> and <literal>c</> are of a numerical
62 See <xref linkend="queries-select-lists"> for more details.
66 <literal>FROM table1</literal> is a simple kind of
67 table expression: it reads just one table. In general, table
68 expressions can be complex constructs of base tables, joins, and
69 subqueries. But you can also omit the table expression entirely and
70 use the <command>SELECT</command> command as a calculator:
74 This is more useful if the expressions in the select list return
75 varying results. For example, you could call a function this way:
83 <sect1 id="queries-table-expressions">
84 <title>Table Expressions</title>
86 <indexterm zone="queries-table-expressions">
87 <primary>table expression</primary>
91 A <firstterm>table expression</firstterm> computes a table. The
92 table expression contains a <literal>FROM</> clause that is
93 optionally followed by <literal>WHERE</>, <literal>GROUP BY</>, and
94 <literal>HAVING</> clauses. Trivial table expressions simply refer
95 to a table on disk, a so-called base table, but more complex
96 expressions can be used to modify or combine base tables in various
101 The optional <literal>WHERE</>, <literal>GROUP BY</>, and
102 <literal>HAVING</> clauses in the table expression specify a
103 pipeline of successive transformations performed on the table
104 derived in the <literal>FROM</> clause. All these transformations
105 produce a virtual table that provides the rows that are passed to
106 the select list to compute the output rows of the query.
109 <sect2 id="queries-from">
110 <title>The <literal>FROM</literal> Clause</title>
113 The <xref linkend="sql-from" endterm="sql-from-title"> derives a
114 table from one or more other tables given in a comma-separated
115 table reference list.
117 FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_reference</replaceable> <optional>, ...</optional></optional>
120 A table reference can be a table name (possibly schema-qualified),
121 or a derived table such as a subquery, a table join, or complex
122 combinations of these. If more than one table reference is listed
123 in the <literal>FROM</> clause they are cross-joined (see below)
124 to form the intermediate virtual table that can then be subject to
125 transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
126 and <literal>HAVING</> clauses and is finally the result of the
127 overall table expression.
131 <primary>ONLY</primary>
135 When a table reference names a table that is the parent of a
136 table inheritance hierarchy, the table reference produces rows of
137 not only that table but all of its descendant tables, unless the
138 key word <literal>ONLY</> precedes the table name. However, the
139 reference produces only the columns that appear in the named table
140 — any columns added in subtables are ignored.
144 Instead of writing <literal>ONLY</> before the table name, you can write
145 <literal>*</> after the table name to explicitly specify that descendant
146 tables are included. Writing <literal>*</> is not necessary since that
147 behavior is the default (unless you have changed the setting of the <xref
148 linkend="guc-sql-inheritance"> configuration option). However writing
149 <literal>*</> might be useful to emphasize that additional tables will be
153 <sect3 id="queries-join">
154 <title>Joined Tables</title>
156 <indexterm zone="queries-join">
157 <primary>join</primary>
161 A joined table is a table derived from two other (real or
162 derived) tables according to the rules of the particular join
163 type. Inner, outer, and cross-joins are available.
167 <title>Join Types</title>
170 <term>Cross join</term>
173 <primary>join</primary>
174 <secondary>cross</secondary>
178 <primary>cross join</primary>
183 <replaceable>T1</replaceable> CROSS JOIN <replaceable>T2</replaceable>
187 For every possible combination of rows from
188 <replaceable>T1</replaceable> and
189 <replaceable>T2</replaceable> (i.e., a Cartesian product),
190 the joined table will contain a
191 row consisting of all columns in <replaceable>T1</replaceable>
192 followed by all columns in <replaceable>T2</replaceable>. If
193 the tables have N and M rows respectively, the joined
194 table will have N * M rows.
198 <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
199 <replaceable>T2</replaceable></literal> is equivalent to
200 <literal>FROM <replaceable>T1</replaceable>,
201 <replaceable>T2</replaceable></literal>. It is also equivalent to
202 <literal>FROM <replaceable>T1</replaceable> INNER JOIN
203 <replaceable>T2</replaceable> ON TRUE</literal> (see below).
209 <term>Qualified joins</term>
212 <primary>join</primary>
213 <secondary>outer</secondary>
217 <primary>outer join</primary>
222 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> ON <replaceable>boolean_expression</replaceable>
223 <replaceable>T1</replaceable> { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable> USING ( <replaceable>join column list</replaceable> )
224 <replaceable>T1</replaceable> NATURAL { <optional>INNER</optional> | { LEFT | RIGHT | FULL } <optional>OUTER</optional> } JOIN <replaceable>T2</replaceable>
228 The words <literal>INNER</literal> and
229 <literal>OUTER</literal> are optional in all forms.
230 <literal>INNER</literal> is the default;
231 <literal>LEFT</literal>, <literal>RIGHT</literal>, and
232 <literal>FULL</literal> imply an outer join.
236 The <firstterm>join condition</firstterm> is specified in the
237 <literal>ON</> or <literal>USING</> clause, or implicitly by
238 the word <literal>NATURAL</>. The join condition determines
239 which rows from the two source tables are considered to
240 <quote>match</quote>, as explained in detail below.
244 The <literal>ON</> clause is the most general kind of join
245 condition: it takes a Boolean value expression of the same
246 kind as is used in a <literal>WHERE</> clause. A pair of rows
247 from <replaceable>T1</> and <replaceable>T2</> match if the
248 <literal>ON</> expression evaluates to true for them.
252 <literal>USING</> is a shorthand notation: it takes a
253 comma-separated list of column names, which the joined tables
254 must have in common, and forms a join condition specifying
255 equality of each of these pairs of columns. Furthermore, the
256 output of <literal>JOIN USING</> has one column for each of
257 the equated pairs of input columns, followed by the
258 remaining columns from each table. Thus, <literal>USING (a, b,
259 c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
260 t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
261 if <literal>ON</> is used there will be two columns
262 <literal>a</>, <literal>b</>, and <literal>c</> in the result,
263 whereas with <literal>USING</> there will be only one of each
264 (and they will appear first if <command>SELECT *</> is used).
269 <primary>join</primary>
270 <secondary>natural</secondary>
273 <primary>natural join</primary>
275 Finally, <literal>NATURAL</> is a shorthand form of
276 <literal>USING</>: it forms a <literal>USING</> list
277 consisting of all column names that appear in both
278 input tables. As with <literal>USING</>, these columns appear
279 only once in the output table. If there are no common
280 columns, <literal>NATURAL</literal> behaves like
281 <literal>CROSS JOIN</literal>.
285 The possible types of qualified join are:
289 <term><literal>INNER JOIN</></term>
293 For each row R1 of T1, the joined table has a row for each
294 row in T2 that satisfies the join condition with R1.
300 <term><literal>LEFT OUTER JOIN</></term>
303 <primary>join</primary>
304 <secondary>left</secondary>
308 <primary>left join</primary>
313 First, an inner join is performed. Then, for each row in
314 T1 that does not satisfy the join condition with any row in
315 T2, a joined row is added with null values in columns of
316 T2. Thus, the joined table always has at least
317 one row for each row in T1.
323 <term><literal>RIGHT OUTER JOIN</></term>
326 <primary>join</primary>
327 <secondary>right</secondary>
331 <primary>right join</primary>
336 First, an inner join is performed. Then, for each row in
337 T2 that does not satisfy the join condition with any row in
338 T1, a joined row is added with null values in columns of
339 T1. This is the converse of a left join: the result table
340 will always have a row for each row in T2.
346 <term><literal>FULL OUTER JOIN</></term>
350 First, an inner join is performed. Then, for each row in
351 T1 that does not satisfy the join condition with any row in
352 T2, a joined row is added with null values in columns of
353 T2. Also, for each row of T2 that does not satisfy the
354 join condition with any row in T1, a joined row with null
355 values in the columns of T1 is added.
366 Joins of all types can be chained together or nested: either or
367 both <replaceable>T1</replaceable> and
368 <replaceable>T2</replaceable> can be joined tables. Parentheses
369 can be used around <literal>JOIN</> clauses to control the join
370 order. In the absence of parentheses, <literal>JOIN</> clauses
375 To put this together, assume we have tables <literal>t1</literal>:
383 and <literal>t2</literal>:
391 then we get the following results for the various joins:
393 <prompt>=></> <userinput>SELECT * FROM t1 CROSS JOIN t2;</>
394 num | name | num | value
395 -----+------+-----+-------
407 <prompt>=></> <userinput>SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num;</>
408 num | name | num | value
409 -----+------+-----+-------
414 <prompt>=></> <userinput>SELECT * FROM t1 INNER JOIN t2 USING (num);</>
421 <prompt>=></> <userinput>SELECT * FROM t1 NATURAL INNER JOIN t2;</>
428 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;</>
429 num | name | num | value
430 -----+------+-----+-------
436 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 USING (num);</>
444 <prompt>=></> <userinput>SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;</>
445 num | name | num | value
446 -----+------+-----+-------
452 <prompt>=></> <userinput>SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;</>
453 num | name | num | value
454 -----+------+-----+-------
464 The join condition specified with <literal>ON</> can also contain
465 conditions that do not relate directly to the join. This can
466 prove useful for some queries but needs to be thought out
467 carefully. For example:
469 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx';</>
470 num | name | num | value
471 -----+------+-----+-------
477 Notice that placing the restriction in the <literal>WHERE</> clause
478 produces a different result:
480 <prompt>=></> <userinput>SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';</>
481 num | name | num | value
482 -----+------+-----+-------
486 This is because a restriction placed in the <literal>ON</>
487 clause is processed <emphasis>before</> the join, while
488 a restriction placed in the <literal>WHERE</> clause is processed
489 <emphasis>after</> the join.
493 <sect3 id="queries-table-aliases">
494 <title>Table and Column Aliases</title>
496 <indexterm zone="queries-table-aliases">
497 <primary>alias</primary>
498 <secondary>in the FROM clause</secondary>
502 <primary>label</primary>
507 A temporary name can be given to tables and complex table
508 references to be used for references to the derived table in
509 the rest of the query. This is called a <firstterm>table
514 To create a table alias, write
516 FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceable>
520 FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
522 The <literal>AS</literal> key word is optional noise.
523 <replaceable>alias</replaceable> can be any identifier.
527 A typical application of table aliases is to assign short
528 identifiers to long table names to keep the join clauses
529 readable. For example:
531 SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.id = a.num;
536 The alias becomes the new name of the table reference so far as the
537 current query is concerned — it is not allowed to refer to the
538 table by the original name elsewhere in the query. Thus, this is not
541 SELECT * FROM my_table AS m WHERE my_table.a > 5; -- wrong
546 Table aliases are mainly for notational convenience, but it is
547 necessary to use them when joining a table to itself, e.g.:
549 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
551 Additionally, an alias is required if the table reference is a
552 subquery (see <xref linkend="queries-subqueries">).
556 Parentheses are used to resolve ambiguities. In the following example,
557 the first statement assigns the alias <literal>b</literal> to the second
558 instance of <literal>my_table</>, but the second statement assigns the
559 alias to the result of the join:
561 SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
562 SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
567 Another form of table aliasing gives temporary names to the columns of
568 the table, as well as the table itself:
570 FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
572 If fewer column aliases are specified than the actual table has
573 columns, the remaining columns are not renamed. This syntax is
574 especially useful for self-joins or subqueries.
578 When an alias is applied to the output of a <literal>JOIN</>
579 clause, the alias hides the original
580 name(s) within the <literal>JOIN</>. For example:
582 SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
586 SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
588 is not valid; the table alias <literal>a</> is not visible
589 outside the alias <literal>c</>.
593 <sect3 id="queries-subqueries">
594 <title>Subqueries</title>
596 <indexterm zone="queries-subqueries">
597 <primary>subquery</primary>
601 Subqueries specifying a derived table must be enclosed in
602 parentheses and <emphasis>must</emphasis> be assigned a table
603 alias name (as in <xref linkend="queries-table-aliases">). For
606 FROM (SELECT * FROM table1) AS alias_name
611 This example is equivalent to <literal>FROM table1 AS
612 alias_name</literal>. More interesting cases, which cannot be
613 reduced to a plain join, arise when the subquery involves
614 grouping or aggregation.
618 A subquery can also be a <command>VALUES</> list:
620 FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
621 AS names(first, last)
623 Again, a table alias is required. Assigning alias names to the columns
624 of the <command>VALUES</> list is optional, but is good practice.
625 For more information see <xref linkend="queries-values">.
629 <sect3 id="queries-tablefunctions">
630 <title>Table Functions</title>
632 <indexterm zone="queries-tablefunctions"><primary>table function</></>
634 <indexterm zone="queries-tablefunctions">
636 <secondary>in the FROM clause</>
640 Table functions are functions that produce a set of rows, made up
641 of either base data types (scalar types) or composite data types
642 (table rows). They are used like a table, view, or subquery in
643 the <literal>FROM</> clause of a query. Columns returned by table
644 functions can be included in <literal>SELECT</>,
645 <literal>JOIN</>, or <literal>WHERE</> clauses in the same manner
646 as a table, view, or subquery column.
650 If a table function returns a base data type, the single result
651 column name matches the function name. If the function returns a
652 composite type, the result columns get the same names as the
653 individual attributes of the type.
657 A table function can be aliased in the <literal>FROM</> clause,
658 but it also can be left unaliased. If a function is used in the
659 <literal>FROM</> clause with no alias, the function name is used
660 as the resulting table name.
666 CREATE TABLE foo (fooid int, foosubid int, fooname text);
668 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
669 SELECT * FROM foo WHERE fooid = $1;
672 SELECT * FROM getfoo(1) AS t1;
677 FROM getfoo(foo.fooid) z
678 WHERE z.fooid = foo.fooid
681 CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
683 SELECT * FROM vw_getfoo;
688 In some cases it is useful to define table functions that can
689 return different column sets depending on how they are invoked.
690 To support this, the table function can be declared as returning
691 the pseudotype <type>record</>. When such a function is used in
692 a query, the expected row structure must be specified in the
693 query itself, so that the system can know how to parse and plan
694 the query. Consider this example:
697 FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
698 AS t1(proname name, prosrc text)
699 WHERE proname LIKE 'bytea%';
701 The <xref linkend="CONTRIB-DBLINK-FUNCTION"> function
702 (part of the <xref linkend="dblink"> module>) executes
703 a remote query. It is declared to return
704 <type>record</> since it might be used for any kind of query.
705 The actual column set must be specified in the calling query so
706 that the parser knows, for example, what <literal>*</> should
711 <sect3 id="queries-lateral">
712 <title><literal>LATERAL</> Subqueries</title>
714 <indexterm zone="queries-lateral">
716 <secondary>in the FROM clause</>
720 Subqueries appearing in <literal>FROM</> can be
721 preceded by the key word <literal>LATERAL</>. This allows them to
722 reference columns provided by preceding <literal>FROM</> items.
723 (Without <literal>LATERAL</literal>, each subquery is
724 evaluated independently and so cannot cross-reference any other
725 <literal>FROM</> item.)
729 Table functions appearing in <literal>FROM</> can also be
730 preceded by the key word <literal>LATERAL</>, but for functions the
731 key word is optional; the function's arguments can contain references
732 to columns provided by preceding <literal>FROM</> items in any case.
736 A <literal>LATERAL</literal> item can appear at top level in the
737 <literal>FROM</> list, or within a <literal>JOIN</> tree. In the latter
738 case it can also refer to any items that are on the left-hand side of a
739 <literal>JOIN</> that it is on the right-hand side of.
743 When a <literal>FROM</> item contains <literal>LATERAL</literal>
744 cross-references, evaluation proceeds as follows: for each row of the
745 <literal>FROM</> item providing the cross-referenced column(s), or
746 set of rows of multiple <literal>FROM</> items providing the
747 columns, the <literal>LATERAL</literal> item is evaluated using that
748 row or row set's values of the columns. The resulting row(s) are
749 joined as usual with the rows they were computed from. This is
750 repeated for each row or set of rows from the column source table(s).
754 A trivial example of <literal>LATERAL</literal> is
756 SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;
758 This is not especially useful since it has exactly the same result as
759 the more conventional
761 SELECT * FROM foo, bar WHERE bar.id = foo.bar_id;
763 <literal>LATERAL</literal> is primarily useful when the cross-referenced
764 column is necessary for computing the row(s) to be joined. A common
765 application is providing an argument value for a set-returning function.
766 For example, supposing that <function>vertices(polygon)</> returns the
767 set of vertices of a polygon, we could identify close-together vertices
768 of polygons stored in a table with:
770 SELECT p1.id, p2.id, v1, v2
771 FROM polygons p1, polygons p2,
772 LATERAL vertices(p1.poly) v1,
773 LATERAL vertices(p2.poly) v2
774 WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;
776 This query could also be written
778 SELECT p1.id, p2.id, v1, v2
779 FROM polygons p1 CROSS JOIN LATERAL vertices(p1.poly) v1,
780 polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
781 WHERE (v1 <-> v2) < 10 AND p1.id != p2.id;
783 or in several other equivalent formulations. (As already mentioned,
784 the <literal>LATERAL</> key word is unnecessary in this example, but
785 we use it for clarity.)
789 It is often particularly handy to <literal>LEFT JOIN</> to a
790 <literal>LATERAL</literal> subquery, so that source rows will appear in
791 the result even if the <literal>LATERAL</literal> subquery produces no
792 rows for them. For example, if <function>get_product_names()</> returns
793 the names of products made by a manufacturer, but some manufacturers in
794 our table currently produce no products, we could find out which ones
798 FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
805 <sect2 id="queries-where">
806 <title>The <literal>WHERE</literal> Clause</title>
808 <indexterm zone="queries-where">
809 <primary>WHERE</primary>
813 The syntax of the <xref linkend="sql-where"
814 endterm="sql-where-title"> is
816 WHERE <replaceable>search_condition</replaceable>
818 where <replaceable>search_condition</replaceable> is any value
819 expression (see <xref linkend="sql-expressions">) that
820 returns a value of type <type>boolean</type>.
824 After the processing of the <literal>FROM</> clause is done, each
825 row of the derived virtual table is checked against the search
826 condition. If the result of the condition is true, the row is
827 kept in the output table, otherwise (i.e., if the result is
828 false or null) it is discarded. The search condition typically
829 references at least one column of the table generated in the
830 <literal>FROM</> clause; this is not required, but otherwise the
831 <literal>WHERE</> clause will be fairly useless.
836 The join condition of an inner join can be written either in
837 the <literal>WHERE</> clause or in the <literal>JOIN</> clause.
838 For example, these table expressions are equivalent:
840 FROM a, b WHERE a.id = b.id AND b.val > 5
844 FROM a INNER JOIN b ON (a.id = b.id) WHERE b.val > 5
848 FROM a NATURAL JOIN b WHERE b.val > 5
850 Which one of these you use is mainly a matter of style. The
851 <literal>JOIN</> syntax in the <literal>FROM</> clause is
852 probably not as portable to other SQL database management systems,
853 even though it is in the SQL standard. For
854 outer joins there is no choice: they must be done in
855 the <literal>FROM</> clause. The <literal>ON</> or <literal>USING</>
856 clause of an outer join is <emphasis>not</> equivalent to a
857 <literal>WHERE</> condition, because it results in the addition
858 of rows (for unmatched input rows) as well as the removal of rows
864 Here are some examples of <literal>WHERE</literal> clauses:
866 SELECT ... FROM fdt WHERE c1 > 5
868 SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)
870 SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)
872 SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)
874 SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100
876 SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)
878 <literal>fdt</literal> is the table derived in the
879 <literal>FROM</> clause. Rows that do not meet the search
880 condition of the <literal>WHERE</> clause are eliminated from
881 <literal>fdt</literal>. Notice the use of scalar subqueries as
882 value expressions. Just like any other query, the subqueries can
883 employ complex table expressions. Notice also how
884 <literal>fdt</literal> is referenced in the subqueries.
885 Qualifying <literal>c1</> as <literal>fdt.c1</> is only necessary
886 if <literal>c1</> is also the name of a column in the derived
887 input table of the subquery. But qualifying the column name adds
888 clarity even when it is not needed. This example shows how the column
889 naming scope of an outer query extends into its inner queries.
894 <sect2 id="queries-group">
895 <title>The <literal>GROUP BY</literal> and <literal>HAVING</literal> Clauses</title>
897 <indexterm zone="queries-group">
898 <primary>GROUP BY</primary>
901 <indexterm zone="queries-group">
902 <primary>grouping</primary>
906 After passing the <literal>WHERE</> filter, the derived input
907 table might be subject to grouping, using the <literal>GROUP BY</>
908 clause, and elimination of group rows using the <literal>HAVING</>
913 SELECT <replaceable>select_list</replaceable>
915 <optional>WHERE ...</optional>
916 GROUP BY <replaceable>grouping_column_reference</replaceable> <optional>, <replaceable>grouping_column_reference</replaceable></optional>...
920 The <xref linkend="sql-groupby" endterm="sql-groupby-title"> is
921 used to group together those rows in a table that have the same
922 values in all the columns listed. The order in which the columns
923 are listed does not matter. The effect is to combine each set
924 of rows having common values into one group row that
925 represents all rows in the group. This is done to
926 eliminate redundancy in the output and/or compute aggregates that
927 apply to these groups. For instance:
929 <prompt>=></> <userinput>SELECT * FROM test1;</>
938 <prompt>=></> <userinput>SELECT x FROM test1 GROUP BY x;</>
949 In the second query, we could not have written <literal>SELECT *
950 FROM test1 GROUP BY x</literal>, because there is no single value
951 for the column <literal>y</> that could be associated with each
952 group. The grouped-by columns can be referenced in the select list since
953 they have a single value in each group.
957 In general, if a table is grouped, columns that are not
958 listed in <literal>GROUP BY</> cannot be referenced except in aggregate
959 expressions. An example with aggregate expressions is:
961 <prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x;</>
969 Here <literal>sum</literal> is an aggregate function that
970 computes a single value over the entire group. More information
971 about the available aggregate functions can be found in <xref
972 linkend="functions-aggregate">.
977 Grouping without aggregate expressions effectively calculates the
978 set of distinct values in a column. This can also be achieved
979 using the <literal>DISTINCT</> clause (see <xref
980 linkend="queries-distinct">).
985 Here is another example: it calculates the total sales for each
986 product (rather than the total sales of all products):
988 SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
989 FROM products p LEFT JOIN sales s USING (product_id)
990 GROUP BY product_id, p.name, p.price;
992 In this example, the columns <literal>product_id</literal>,
993 <literal>p.name</literal>, and <literal>p.price</literal> must be
994 in the <literal>GROUP BY</> clause since they are referenced in
995 the query select list (but see below). The column
996 <literal>s.units</> does not have to be in the <literal>GROUP
997 BY</> list since it is only used in an aggregate expression
998 (<literal>sum(...)</literal>), which represents the sales
999 of a product. For each product, the query returns a summary row about
1000 all sales of the product.
1003 <indexterm><primary>functional dependency</primary></indexterm>
1006 If the products table is set up so that, say,
1007 <literal>product_id</literal> is the primary key, then it would be
1008 enough to group by <literal>product_id</literal> in the above example,
1009 since name and price would be <firstterm>functionally
1010 dependent</firstterm> on the product ID, and so there would be no
1011 ambiguity about which name and price value to return for each product
1016 In strict SQL, <literal>GROUP BY</> can only group by columns of
1017 the source table but <productname>PostgreSQL</productname> extends
1018 this to also allow <literal>GROUP BY</> to group by columns in the
1019 select list. Grouping by value expressions instead of simple
1020 column names is also allowed.
1024 <primary>HAVING</primary>
1028 If a table has been grouped using <literal>GROUP BY</literal>,
1029 but only certain groups are of interest, the
1030 <literal>HAVING</literal> clause can be used, much like a
1031 <literal>WHERE</> clause, to eliminate groups from the result.
1034 SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</optional> GROUP BY ... HAVING <replaceable>boolean_expression</replaceable>
1036 Expressions in the <literal>HAVING</> clause can refer both to
1037 grouped expressions and to ungrouped expressions (which necessarily
1038 involve an aggregate function).
1044 <prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING sum(y) > 3;</>
1051 <prompt>=></> <userinput>SELECT x, sum(y) FROM test1 GROUP BY x HAVING x < 'c';</>
1061 Again, a more realistic example:
1063 SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit
1064 FROM products p LEFT JOIN sales s USING (product_id)
1065 WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
1066 GROUP BY product_id, p.name, p.price, p.cost
1067 HAVING sum(p.price * s.units) > 5000;
1069 In the example above, the <literal>WHERE</> clause is selecting
1070 rows by a column that is not grouped (the expression is only true for
1071 sales during the last four weeks), while the <literal>HAVING</>
1072 clause restricts the output to groups with total gross sales over
1073 5000. Note that the aggregate expressions do not necessarily need
1074 to be the same in all parts of the query.
1078 If a query contains aggregate function calls, but no <literal>GROUP BY</>
1079 clause, grouping still occurs: the result is a single group row (or
1080 perhaps no rows at all, if the single row is then eliminated by
1081 <literal>HAVING</>).
1082 The same is true if it contains a <literal>HAVING</> clause, even
1083 without any aggregate function calls or <literal>GROUP BY</> clause.
1087 <sect2 id="queries-window">
1088 <title>Window Function Processing</title>
1090 <indexterm zone="queries-window">
1091 <primary>window function</primary>
1092 <secondary>order of execution</>
1096 If the query contains any window functions (see
1097 <xref linkend="tutorial-window">,
1098 <xref linkend="functions-window"> and
1099 <xref linkend="syntax-window-functions">), these functions are evaluated
1100 after any grouping, aggregation, and <literal>HAVING</> filtering is
1101 performed. That is, if the query uses any aggregates, <literal>GROUP
1102 BY</>, or <literal>HAVING</>, then the rows seen by the window functions
1103 are the group rows instead of the original table rows from
1104 <literal>FROM</>/<literal>WHERE</>.
1108 When multiple window functions are used, all the window functions having
1109 syntactically equivalent <literal>PARTITION BY</> and <literal>ORDER BY</>
1110 clauses in their window definitions are guaranteed to be evaluated in a
1111 single pass over the data. Therefore they will see the same sort ordering,
1112 even if the <literal>ORDER BY</> does not uniquely determine an ordering.
1113 However, no guarantees are made about the evaluation of functions having
1114 different <literal>PARTITION BY</> or <literal>ORDER BY</> specifications.
1115 (In such cases a sort step is typically required between the passes of
1116 window function evaluations, and the sort is not guaranteed to preserve
1117 ordering of rows that its <literal>ORDER BY</> sees as equivalent.)
1121 Currently, window functions always require presorted data, and so the
1122 query output will be ordered according to one or another of the window
1123 functions' <literal>PARTITION BY</>/<literal>ORDER BY</> clauses.
1124 It is not recommended to rely on this, however. Use an explicit
1125 top-level <literal>ORDER BY</> clause if you want to be sure the
1126 results are sorted in a particular way.
1132 <sect1 id="queries-select-lists">
1133 <title>Select Lists</title>
1136 <primary>SELECT</primary>
1137 <secondary>select list</secondary>
1141 As shown in the previous section,
1142 the table expression in the <command>SELECT</command> command
1143 constructs an intermediate virtual table by possibly combining
1144 tables, views, eliminating rows, grouping, etc. This table is
1145 finally passed on to processing by the <firstterm>select list</firstterm>. The select
1146 list determines which <emphasis>columns</emphasis> of the
1147 intermediate table are actually output.
1150 <sect2 id="queries-select-list-items">
1151 <title>Select-List Items</title>
1154 <primary>*</primary>
1158 The simplest kind of select list is <literal>*</literal> which
1159 emits all columns that the table expression produces. Otherwise,
1160 a select list is a comma-separated list of value expressions (as
1161 defined in <xref linkend="sql-expressions">). For instance, it
1162 could be a list of column names:
1164 SELECT a, b, c FROM ...
1166 The columns names <literal>a</>, <literal>b</>, and <literal>c</>
1167 are either the actual names of the columns of tables referenced
1168 in the <literal>FROM</> clause, or the aliases given to them as
1169 explained in <xref linkend="queries-table-aliases">. The name
1170 space available in the select list is the same as in the
1171 <literal>WHERE</> clause, unless grouping is used, in which case
1172 it is the same as in the <literal>HAVING</> clause.
1176 If more than one table has a column of the same name, the table
1177 name must also be given, as in:
1179 SELECT tbl1.a, tbl2.a, tbl1.b FROM ...
1181 When working with multiple tables, it can also be useful to ask for
1182 all the columns of a particular table:
1184 SELECT tbl1.*, tbl2.a FROM ...
1186 (See also <xref linkend="queries-where">.)
1190 If an arbitrary value expression is used in the select list, it
1191 conceptually adds a new virtual column to the returned table. The
1192 value expression is evaluated once for each result row, with
1193 the row's values substituted for any column references. But the
1194 expressions in the select list do not have to reference any
1195 columns in the table expression of the <literal>FROM</> clause;
1196 they can be constant arithmetic expressions, for instance.
1200 <sect2 id="queries-column-labels">
1201 <title>Column Labels</title>
1203 <indexterm zone="queries-column-labels">
1204 <primary>alias</primary>
1205 <secondary>in the select list</secondary>
1209 The entries in the select list can be assigned names for subsequent
1210 processing, such as for use in an <literal>ORDER BY</> clause
1211 or for display by the client application. For example:
1213 SELECT a AS value, b + c AS sum FROM ...
1218 If no output column name is specified using <literal>AS</>,
1219 the system assigns a default column name. For simple column references,
1220 this is the name of the referenced column. For function
1221 calls, this is the name of the function. For complex expressions,
1222 the system will generate a generic name.
1226 The <literal>AS</> keyword is optional, but only if the new column
1227 name does not match any
1228 <productname>PostgreSQL</productname> keyword (see <xref
1229 linkend="sql-keywords-appendix">). To avoid an accidental match to
1230 a keyword, you can double-quote the column name. For example,
1231 <literal>VALUE</> is a keyword, so this does not work:
1233 SELECT a value, b + c AS sum FROM ...
1237 SELECT a "value", b + c AS sum FROM ...
1239 For protection against possible
1240 future keyword additions, it is recommended that you always either
1241 write <literal>AS</literal> or double-quote the output column name.
1246 The naming of output columns here is different from that done in
1247 the <literal>FROM</> clause (see <xref
1248 linkend="queries-table-aliases">). It is possible
1249 to rename the same column twice, but the name assigned in
1250 the select list is the one that will be passed on.
1255 <sect2 id="queries-distinct">
1256 <title><literal>DISTINCT</literal></title>
1258 <indexterm zone="queries-distinct">
1259 <primary>DISTINCT</primary>
1262 <indexterm zone="queries-distinct">
1263 <primary>duplicates</primary>
1267 After the select list has been processed, the result table can
1268 optionally be subject to the elimination of duplicate rows. The
1269 <literal>DISTINCT</literal> key word is written directly after
1270 <literal>SELECT</literal> to specify this:
1272 SELECT DISTINCT <replaceable>select_list</replaceable> ...
1274 (Instead of <literal>DISTINCT</> the key word <literal>ALL</literal>
1275 can be used to specify the default behavior of retaining all rows.)
1279 <primary>null value</>
1280 <secondary sortas="DISTINCT">in DISTINCT</>
1284 Obviously, two rows are considered distinct if they differ in at
1285 least one column value. Null values are considered equal in this
1290 Alternatively, an arbitrary expression can determine what rows are
1291 to be considered distinct:
1293 SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ...</optional>) <replaceable>select_list</replaceable> ...
1295 Here <replaceable>expression</replaceable> is an arbitrary value
1296 expression that is evaluated for all rows. A set of rows for
1297 which all the expressions are equal are considered duplicates, and
1298 only the first row of the set is kept in the output. Note that
1299 the <quote>first row</quote> of a set is unpredictable unless the
1300 query is sorted on enough columns to guarantee a unique ordering
1301 of the rows arriving at the <literal>DISTINCT</> filter.
1302 (<literal>DISTINCT ON</> processing occurs after <literal>ORDER
1307 The <literal>DISTINCT ON</> clause is not part of the SQL standard
1308 and is sometimes considered bad style because of the potentially
1309 indeterminate nature of its results. With judicious use of
1310 <literal>GROUP BY</> and subqueries in <literal>FROM</>, this
1311 construct can be avoided, but it is often the most convenient
1318 <sect1 id="queries-union">
1319 <title>Combining Queries</title>
1321 <indexterm zone="queries-union">
1322 <primary>UNION</primary>
1324 <indexterm zone="queries-union">
1325 <primary>INTERSECT</primary>
1327 <indexterm zone="queries-union">
1328 <primary>EXCEPT</primary>
1330 <indexterm zone="queries-union">
1331 <primary>set union</primary>
1333 <indexterm zone="queries-union">
1334 <primary>set intersection</primary>
1336 <indexterm zone="queries-union">
1337 <primary>set difference</primary>
1339 <indexterm zone="queries-union">
1340 <primary>set operation</primary>
1344 The results of two queries can be combined using the set operations
1345 union, intersection, and difference. The syntax is
1347 <replaceable>query1</replaceable> UNION <optional>ALL</optional> <replaceable>query2</replaceable>
1348 <replaceable>query1</replaceable> INTERSECT <optional>ALL</optional> <replaceable>query2</replaceable>
1349 <replaceable>query1</replaceable> EXCEPT <optional>ALL</optional> <replaceable>query2</replaceable>
1351 <replaceable>query1</replaceable> and
1352 <replaceable>query2</replaceable> are queries that can use any of
1353 the features discussed up to this point. Set operations can also
1354 be nested and chained, for example
1356 <replaceable>query1</replaceable> UNION <replaceable>query2</replaceable> UNION <replaceable>query3</replaceable>
1358 which is executed as:
1360 (<replaceable>query1</replaceable> UNION <replaceable>query2</replaceable>) UNION <replaceable>query3</replaceable>
1365 <literal>UNION</> effectively appends the result of
1366 <replaceable>query2</replaceable> to the result of
1367 <replaceable>query1</replaceable> (although there is no guarantee
1368 that this is the order in which the rows are actually returned).
1369 Furthermore, it eliminates duplicate rows from its result, in the same
1370 way as <literal>DISTINCT</>, unless <literal>UNION ALL</> is used.
1374 <literal>INTERSECT</> returns all rows that are both in the result
1375 of <replaceable>query1</replaceable> and in the result of
1376 <replaceable>query2</replaceable>. Duplicate rows are eliminated
1377 unless <literal>INTERSECT ALL</> is used.
1381 <literal>EXCEPT</> returns all rows that are in the result of
1382 <replaceable>query1</replaceable> but not in the result of
1383 <replaceable>query2</replaceable>. (This is sometimes called the
1384 <firstterm>difference</> between two queries.) Again, duplicates
1385 are eliminated unless <literal>EXCEPT ALL</> is used.
1389 In order to calculate the union, intersection, or difference of two
1390 queries, the two queries must be <quote>union compatible</quote>,
1391 which means that they return the same number of columns and
1392 the corresponding columns have compatible data types, as
1393 described in <xref linkend="typeconv-union-case">.
1398 <sect1 id="queries-order">
1399 <title>Sorting Rows</title>
1401 <indexterm zone="queries-order">
1402 <primary>sorting</primary>
1405 <indexterm zone="queries-order">
1406 <primary>ORDER BY</primary>
1410 After a query has produced an output table (after the select list
1411 has been processed) it can optionally be sorted. If sorting is not
1412 chosen, the rows will be returned in an unspecified order. The actual
1413 order in that case will depend on the scan and join plan types and
1414 the order on disk, but it must not be relied on. A particular
1415 output ordering can only be guaranteed if the sort step is explicitly
1420 The <literal>ORDER BY</> clause specifies the sort order:
1422 SELECT <replaceable>select_list</replaceable>
1423 FROM <replaceable>table_expression</replaceable>
1424 ORDER BY <replaceable>sort_expression1</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional>
1425 <optional>, <replaceable>sort_expression2</replaceable> <optional>ASC | DESC</optional> <optional>NULLS { FIRST | LAST }</optional> ...</optional>
1427 The sort expression(s) can be any expression that would be valid in the
1428 query's select list. An example is:
1430 SELECT a, b FROM table1 ORDER BY a + b, c;
1432 When more than one expression is specified,
1433 the later values are used to sort rows that are equal according to the
1434 earlier values. Each expression can be followed by an optional
1435 <literal>ASC</> or <literal>DESC</> keyword to set the sort direction to
1436 ascending or descending. <literal>ASC</> order is the default.
1437 Ascending order puts smaller values first, where
1438 <quote>smaller</quote> is defined in terms of the
1439 <literal><</literal> operator. Similarly, descending order is
1440 determined with the <literal>></literal> operator.
1443 Actually, <productname>PostgreSQL</> uses the <firstterm>default B-tree
1444 operator class</> for the expression's data type to determine the sort
1445 ordering for <literal>ASC</> and <literal>DESC</>. Conventionally,
1446 data types will be set up so that the <literal><</literal> and
1447 <literal>></literal> operators correspond to this sort ordering,
1448 but a user-defined data type's designer could choose to do something
1455 The <literal>NULLS FIRST</> and <literal>NULLS LAST</> options can be
1456 used to determine whether nulls appear before or after non-null values
1457 in the sort ordering. By default, null values sort as if larger than any
1458 non-null value; that is, <literal>NULLS FIRST</> is the default for
1459 <literal>DESC</> order, and <literal>NULLS LAST</> otherwise.
1463 Note that the ordering options are considered independently for each
1464 sort column. For example <literal>ORDER BY x, y DESC</> means
1465 <literal>ORDER BY x ASC, y DESC</>, which is not the same as
1466 <literal>ORDER BY x DESC, y DESC</>.
1470 A <replaceable>sort_expression</> can also be the column label or number
1471 of an output column, as in:
1473 SELECT a + b AS sum, c FROM table1 ORDER BY sum;
1474 SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1;
1476 both of which sort by the first output column. Note that an output
1477 column name has to stand alone, that is, it cannot be used in an expression
1478 — for example, this is <emphasis>not</> correct:
1480 SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- wrong
1482 This restriction is made to reduce ambiguity. There is still
1483 ambiguity if an <literal>ORDER BY</> item is a simple name that
1484 could match either an output column name or a column from the table
1485 expression. The output column is used in such cases. This would
1486 only cause confusion if you use <literal>AS</> to rename an output
1487 column to match some other table column's name.
1491 <literal>ORDER BY</> can be applied to the result of a
1492 <literal>UNION</>, <literal>INTERSECT</>, or <literal>EXCEPT</>
1493 combination, but in this case it is only permitted to sort by
1494 output column names or numbers, not by expressions.
1499 <sect1 id="queries-limit">
1500 <title><literal>LIMIT</literal> and <literal>OFFSET</literal></title>
1502 <indexterm zone="queries-limit">
1503 <primary>LIMIT</primary>
1506 <indexterm zone="queries-limit">
1507 <primary>OFFSET</primary>
1511 <literal>LIMIT</> and <literal>OFFSET</> allow you to retrieve just
1512 a portion of the rows that are generated by the rest of the query:
1514 SELECT <replaceable>select_list</replaceable>
1515 FROM <replaceable>table_expression</replaceable>
1516 <optional> ORDER BY ... </optional>
1517 <optional> LIMIT { <replaceable>number</replaceable> | ALL } </optional> <optional> OFFSET <replaceable>number</replaceable> </optional>
1522 If a limit count is given, no more than that many rows will be
1523 returned (but possibly less, if the query itself yields less rows).
1524 <literal>LIMIT ALL</> is the same as omitting the <literal>LIMIT</>
1529 <literal>OFFSET</> says to skip that many rows before beginning to
1530 return rows. <literal>OFFSET 0</> is the same as omitting the
1531 <literal>OFFSET</> clause, and <literal>LIMIT NULL</> is the same
1532 as omitting the <literal>LIMIT</> clause. If both <literal>OFFSET</>
1533 and <literal>LIMIT</> appear, then <literal>OFFSET</> rows are
1534 skipped before starting to count the <literal>LIMIT</> rows that
1539 When using <literal>LIMIT</>, it is important to use an
1540 <literal>ORDER BY</> clause that constrains the result rows into a
1541 unique order. Otherwise you will get an unpredictable subset of
1542 the query's rows. You might be asking for the tenth through
1543 twentieth rows, but tenth through twentieth in what ordering? The
1544 ordering is unknown, unless you specified <literal>ORDER BY</>.
1548 The query optimizer takes <literal>LIMIT</> into account when
1549 generating query plans, so you are very likely to get different
1550 plans (yielding different row orders) depending on what you give
1551 for <literal>LIMIT</> and <literal>OFFSET</>. Thus, using
1552 different <literal>LIMIT</>/<literal>OFFSET</> values to select
1553 different subsets of a query result <emphasis>will give
1554 inconsistent results</emphasis> unless you enforce a predictable
1555 result ordering with <literal>ORDER BY</>. This is not a bug; it
1556 is an inherent consequence of the fact that SQL does not promise to
1557 deliver the results of a query in any particular order unless
1558 <literal>ORDER BY</> is used to constrain the order.
1562 The rows skipped by an <literal>OFFSET</> clause still have to be
1563 computed inside the server; therefore a large <literal>OFFSET</>
1564 might be inefficient.
1569 <sect1 id="queries-values">
1570 <title><literal>VALUES</literal> Lists</title>
1572 <indexterm zone="queries-values">
1573 <primary>VALUES</primary>
1577 <literal>VALUES</> provides a way to generate a <quote>constant table</>
1578 that can be used in a query without having to actually create and populate
1579 a table on-disk. The syntax is
1581 VALUES ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) [, ...]
1583 Each parenthesized list of expressions generates a row in the table.
1584 The lists must all have the same number of elements (i.e., the number
1585 of columns in the table), and corresponding entries in each list must
1586 have compatible data types. The actual data type assigned to each column
1587 of the result is determined using the same rules as for <literal>UNION</>
1588 (see <xref linkend="typeconv-union-case">).
1594 VALUES (1, 'one'), (2, 'two'), (3, 'three');
1597 will return a table of two columns and three rows. It's effectively
1600 SELECT 1 AS column1, 'one' AS column2
1607 By default, <productname>PostgreSQL</productname> assigns the names
1608 <literal>column1</>, <literal>column2</>, etc. to the columns of a
1609 <literal>VALUES</> table. The column names are not specified by the
1610 SQL standard and different database systems do it differently, so
1611 it's usually better to override the default names with a table alias
1616 Syntactically, <literal>VALUES</> followed by expression lists is
1617 treated as equivalent to:
1619 SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable>
1621 and can appear anywhere a <literal>SELECT</> can. For example, you can
1622 use it as part of a <literal>UNION</>, or attach a
1623 <replaceable>sort_specification</replaceable> (<literal>ORDER BY</>,
1624 <literal>LIMIT</>, and/or <literal>OFFSET</>) to it. <literal>VALUES</>
1625 is most commonly used as the data source in an <command>INSERT</> command,
1626 and next most commonly as a subquery.
1630 For more information see <xref linkend="sql-values">.
1636 <sect1 id="queries-with">
1637 <title><literal>WITH</literal> Queries (Common Table Expressions)</title>
1639 <indexterm zone="queries-with">
1640 <primary>WITH</primary>
1641 <secondary>in SELECT</secondary>
1645 <primary>common table expression</primary>
1650 <literal>WITH</> provides a way to write auxiliary statements for use in a
1651 larger query. These statements, which are often referred to as Common
1652 Table Expressions or <acronym>CTE</acronym>s, can be thought of as defining
1653 temporary tables that exist just for one query. Each auxiliary statement
1654 in a <literal>WITH</> clause can be a <command>SELECT</>,
1655 <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>; and the
1656 <literal>WITH</> clause itself is attached to a primary statement that can
1657 also be a <command>SELECT</>, <command>INSERT</>, <command>UPDATE</>, or
1661 <sect2 id="queries-with-select">
1662 <title><command>SELECT</> in <literal>WITH</></title>
1665 The basic value of <command>SELECT</> in <literal>WITH</> is to
1666 break down complicated queries into simpler parts. An example is:
1669 WITH regional_sales AS (
1670 SELECT region, SUM(amount) AS total_sales
1676 WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
1680 SUM(quantity) AS product_units,
1681 SUM(amount) AS product_sales
1683 WHERE region IN (SELECT region FROM top_regions)
1684 GROUP BY region, product;
1687 which displays per-product sales totals in only the top sales regions.
1688 The <literal>WITH</> clause defines two auxiliary statements named
1689 <structname>regional_sales</> and <structname>top_regions</>,
1690 where the output of <structname>regional_sales</> is used in
1691 <structname>top_regions</> and the output of <structname>top_regions</>
1692 is used in the primary <command>SELECT</> query.
1693 This example could have been written without <literal>WITH</>,
1694 but we'd have needed two levels of nested sub-<command>SELECT</command>s. It's a bit
1695 easier to follow this way.
1699 The optional <literal>RECURSIVE</> modifier changes <literal>WITH</>
1700 from a mere syntactic convenience into a feature that accomplishes
1701 things not otherwise possible in standard SQL. Using
1702 <literal>RECURSIVE</>, a <literal>WITH</> query can refer to its own
1703 output. A very simple example is this query to sum the integers from 1
1707 WITH RECURSIVE t(n) AS (
1710 SELECT n+1 FROM t WHERE n < 100
1712 SELECT sum(n) FROM t;
1715 The general form of a recursive <literal>WITH</> query is always a
1716 <firstterm>non-recursive term</>, then <literal>UNION</> (or
1717 <literal>UNION ALL</>), then a
1718 <firstterm>recursive term</>, where only the recursive term can contain
1719 a reference to the query's own output. Such a query is executed as
1724 <title>Recursive Query Evaluation</title>
1726 <step performance="required">
1728 Evaluate the non-recursive term. For <literal>UNION</> (but not
1729 <literal>UNION ALL</>), discard duplicate rows. Include all remaining
1730 rows in the result of the recursive query, and also place them in a
1731 temporary <firstterm>working table</>.
1735 <step performance="required">
1737 So long as the working table is not empty, repeat these steps:
1740 <step performance="required">
1742 Evaluate the recursive term, substituting the current contents of
1743 the working table for the recursive self-reference.
1744 For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1745 duplicate rows and rows that duplicate any previous result row.
1746 Include all remaining rows in the result of the recursive query, and
1747 also place them in a temporary <firstterm>intermediate table</>.
1751 <step performance="required">
1753 Replace the contents of the working table with the contents of the
1754 intermediate table, then empty the intermediate table.
1763 Strictly speaking, this process is iteration not recursion, but
1764 <literal>RECURSIVE</> is the terminology chosen by the SQL standards
1770 In the example above, the working table has just a single row in each step,
1771 and it takes on the values from 1 through 100 in successive steps. In
1772 the 100th step, there is no output because of the <literal>WHERE</>
1773 clause, and so the query terminates.
1777 Recursive queries are typically used to deal with hierarchical or
1778 tree-structured data. A useful example is this query to find all the
1779 direct and indirect sub-parts of a product, given only a table that
1780 shows immediate inclusions:
1783 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
1784 SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
1786 SELECT p.sub_part, p.part, p.quantity
1787 FROM included_parts pr, parts p
1788 WHERE p.part = pr.sub_part
1790 SELECT sub_part, SUM(quantity) as total_quantity
1797 When working with recursive queries it is important to be sure that
1798 the recursive part of the query will eventually return no tuples,
1799 or else the query will loop indefinitely. Sometimes, using
1800 <literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1801 by discarding rows that duplicate previous output rows. However, often a
1802 cycle does not involve output rows that are completely duplicate: it may be
1803 necessary to check just one or a few fields to see if the same point has
1804 been reached before. The standard method for handling such situations is
1805 to compute an array of the already-visited values. For example, consider
1806 the following query that searches a table <structname>graph</> using a
1807 <structfield>link</> field:
1810 WITH RECURSIVE search_graph(id, link, data, depth) AS (
1811 SELECT g.id, g.link, g.data, 1
1814 SELECT g.id, g.link, g.data, sg.depth + 1
1815 FROM graph g, search_graph sg
1816 WHERE g.id = sg.link
1818 SELECT * FROM search_graph;
1821 This query will loop if the <structfield>link</> relationships contain
1822 cycles. Because we require a <quote>depth</> output, just changing
1823 <literal>UNION ALL</> to <literal>UNION</> would not eliminate the looping.
1824 Instead we need to recognize whether we have reached the same row again
1825 while following a particular path of links. We add two columns
1826 <structfield>path</> and <structfield>cycle</> to the loop-prone query:
1829 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1830 SELECT g.id, g.link, g.data, 1,
1835 SELECT g.id, g.link, g.data, sg.depth + 1,
1838 FROM graph g, search_graph sg
1839 WHERE g.id = sg.link AND NOT cycle
1841 SELECT * FROM search_graph;
1844 Aside from preventing cycles, the array value is often useful in its own
1845 right as representing the <quote>path</> taken to reach any particular row.
1849 In the general case where more than one field needs to be checked to
1850 recognize a cycle, use an array of rows. For example, if we needed to
1851 compare fields <structfield>f1</> and <structfield>f2</>:
1854 WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
1855 SELECT g.id, g.link, g.data, 1,
1856 ARRAY[ROW(g.f1, g.f2)],
1860 SELECT g.id, g.link, g.data, sg.depth + 1,
1861 path || ROW(g.f1, g.f2),
1862 ROW(g.f1, g.f2) = ANY(path)
1863 FROM graph g, search_graph sg
1864 WHERE g.id = sg.link AND NOT cycle
1866 SELECT * FROM search_graph;
1872 Omit the <literal>ROW()</> syntax in the common case where only one field
1873 needs to be checked to recognize a cycle. This allows a simple array
1874 rather than a composite-type array to be used, gaining efficiency.
1880 The recursive query evaluation algorithm produces its output in
1881 breadth-first search order. You can display the results in depth-first
1882 search order by making the outer query <literal>ORDER BY</> a
1883 <quote>path</> column constructed in this way.
1888 A helpful trick for testing queries
1889 when you are not certain if they might loop is to place a <literal>LIMIT</>
1890 in the parent query. For example, this query would loop forever without
1891 the <literal>LIMIT</>:
1894 WITH RECURSIVE t(n) AS (
1899 SELECT n FROM t LIMIT 100;
1902 This works because <productname>PostgreSQL</productname>'s implementation
1903 evaluates only as many rows of a <literal>WITH</> query as are actually
1904 fetched by the parent query. Using this trick in production is not
1905 recommended, because other systems might work differently. Also, it
1906 usually won't work if you make the outer query sort the recursive query's
1907 results or join them to some other table, because in such cases the
1908 outer query will usually try to fetch all of the <literal>WITH</> query's
1913 A useful property of <literal>WITH</> queries is that they are evaluated
1914 only once per execution of the parent query, even if they are referred to
1915 more than once by the parent query or sibling <literal>WITH</> queries.
1916 Thus, expensive calculations that are needed in multiple places can be
1917 placed within a <literal>WITH</> query to avoid redundant work. Another
1918 possible application is to prevent unwanted multiple evaluations of
1919 functions with side-effects.
1920 However, the other side of this coin is that the optimizer is less able to
1921 push restrictions from the parent query down into a <literal>WITH</> query
1922 than an ordinary sub-query. The <literal>WITH</> query will generally be
1923 evaluated as written, without suppression of rows that the parent query
1924 might discard afterwards. (But, as mentioned above, evaluation might stop
1925 early if the reference(s) to the query demand only a limited number of
1930 The examples above only show <literal>WITH</> being used with
1931 <command>SELECT</>, but it can be attached in the same way to
1932 <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>.
1933 In each case it effectively provides temporary table(s) that can
1934 be referred to in the main command.
1938 <sect2 id="queries-with-modifying">
1939 <title>Data-Modifying Statements in <literal>WITH</></title>
1942 You can use data-modifying statements (<command>INSERT</>,
1943 <command>UPDATE</>, or <command>DELETE</>) in <literal>WITH</>. This
1944 allows you to perform several different operations in the same query.
1948 WITH moved_rows AS (
1949 DELETE FROM products
1951 "date" >= '2010-10-01' AND
1952 "date" < '2010-11-01'
1955 INSERT INTO products_log
1956 SELECT * FROM moved_rows;
1959 This query effectively moves rows from <structname>products</> to
1960 <structname>products_log</>. The <command>DELETE</> in <literal>WITH</>
1961 deletes the specified rows from <structname>products</>, returning their
1962 contents by means of its <literal>RETURNING</> clause; and then the
1963 primary query reads that output and inserts it into
1964 <structname>products_log</>.
1968 A fine point of the above example is that the <literal>WITH</> clause is
1969 attached to the <command>INSERT</>, not the sub-<command>SELECT</> within
1970 the <command>INSERT</>. This is necessary because data-modifying
1971 statements are only allowed in <literal>WITH</> clauses that are attached
1972 to the top-level statement. However, normal <literal>WITH</> visibility
1973 rules apply, so it is possible to refer to the <literal>WITH</>
1974 statement's output from the sub-<command>SELECT</>.
1978 Data-modifying statements in <literal>WITH</> usually have
1979 <literal>RETURNING</> clauses, as seen in the example above.
1980 It is the output of the <literal>RETURNING</> clause, <emphasis>not</> the
1981 target table of the data-modifying statement, that forms the temporary
1982 table that can be referred to by the rest of the query. If a
1983 data-modifying statement in <literal>WITH</> lacks a <literal>RETURNING</>
1984 clause, then it forms no temporary table and cannot be referred to in
1985 the rest of the query. Such a statement will be executed nonetheless.
1986 A not-particularly-useful example is:
1995 This example would remove all rows from tables <structname>foo</> and
1996 <structname>bar</>. The number of affected rows reported to the client
1997 would only include rows removed from <structname>bar</>.
2001 Recursive self-references in data-modifying statements are not
2002 allowed. In some cases it is possible to work around this limitation by
2003 referring to the output of a recursive <literal>WITH</>, for example:
2006 WITH RECURSIVE included_parts(sub_part, part) AS (
2007 SELECT sub_part, part FROM parts WHERE part = 'our_product'
2009 SELECT p.sub_part, p.part
2010 FROM included_parts pr, parts p
2011 WHERE p.part = pr.sub_part
2014 WHERE part IN (SELECT part FROM included_parts);
2017 This query would remove all direct and indirect subparts of a product.
2021 Data-modifying statements in <literal>WITH</> are executed exactly once,
2022 and always to completion, independently of whether the primary query
2023 reads all (or indeed any) of their output. Notice that this is different
2024 from the rule for <command>SELECT</> in <literal>WITH</>: as stated in the
2025 previous section, execution of a <command>SELECT</> is carried only as far
2026 as the primary query demands its output.
2030 The sub-statements in <literal>WITH</> are executed concurrently with
2031 each other and with the main query. Therefore, when using data-modifying
2032 statements in <literal>WITH</>, the order in which the specified updates
2033 actually happen is unpredictable. All the statements are executed with
2034 the same <firstterm>snapshot</> (see <xref linkend="mvcc">), so they
2035 cannot <quote>see</> each others' effects on the target tables. This
2036 alleviates the effects of the unpredictability of the actual order of row
2037 updates, and means that <literal>RETURNING</> data is the only way to
2038 communicate changes between different <literal>WITH</> sub-statements and
2039 the main query. An example of this is that in
2043 UPDATE products SET price = price * 1.05
2046 SELECT * FROM products;
2049 the outer <command>SELECT</> would return the original prices before the
2050 action of the <command>UPDATE</>, while in
2054 UPDATE products SET price = price * 1.05
2060 the outer <command>SELECT</> would return the updated data.
2064 Trying to update the same row twice in a single statement is not
2065 supported. Only one of the modifications takes place, but it is not easy
2066 (and sometimes not possible) to reliably predict which one. This also
2067 applies to deleting a row that was already updated in the same statement:
2068 only the update is performed. Therefore you should generally avoid trying
2069 to modify a single row twice in a single statement. In particular avoid
2070 writing <literal>WITH</> sub-statements that could affect the same rows
2071 changed by the main statement or a sibling sub-statement. The effects
2072 of such a statement will not be predictable.
2076 At present, any table used as the target of a data-modifying statement in
2077 <literal>WITH</> must not have a conditional rule, nor an <literal>ALSO</>
2078 rule, nor an <literal>INSTEAD</> rule that expands to multiple statements.