2 $PostgreSQL: pgsql/doc/src/sgml/query.sgml,v 1.43 2005/01/22 22:56:36 momjian Exp $
5 <chapter id="tutorial-sql">
6 <title>The <acronym>SQL</acronym> Language</title>
8 <sect1 id="tutorial-sql-intro">
9 <title>Introduction</title>
12 This chapter provides an overview of how to use
13 <acronym>SQL</acronym> to perform simple operations. This
14 tutorial is only intended to give you an introduction and is in no
15 way a complete tutorial on <acronym>SQL</acronym>. Numerous books
16 have been written on <acronym>SQL</acronym>, including <xref
17 linkend="MELT93"> and <xref linkend="DATE97">.
18 You should be aware that some <productname>PostgreSQL</productname>
19 language features are extensions to the standard.
23 In the examples that follow, we assume that you have created a
24 database named <literal>mydb</literal>, as described in the previous
25 chapter, and have started <application>psql</application>.
29 Examples in this manual can also be found in the
30 <productname>PostgreSQL</productname> source distribution
31 in the directory <filename>src/tutorial/</filename>. To use those
32 files, first change to that directory and run <application>make</>:
35 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
36 <prompt>$</prompt> <userinput>make</userinput>
39 This creates the scripts and compiles the C files containing user-defined
40 functions and types. (You must use GNU make for this — it may be named
41 something different on your system, often <application>gmake</>.)
42 Then, to start the tutorial, do the following:
45 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
46 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
51 <prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
54 The <literal>\i</literal> command reads in commands from the
55 specified file. The <literal>-s</literal> option puts you in
56 single step mode which pauses before sending each statement to the
57 server. The commands used in this section are in the file
58 <filename>basics.sql</filename>.
63 <sect1 id="tutorial-concepts">
64 <title>Concepts</title>
67 <indexterm><primary>relational database</primary></indexterm>
68 <indexterm><primary>hierarchical database</primary></indexterm>
69 <indexterm><primary>object-oriented database</primary></indexterm>
70 <indexterm><primary>relation</primary></indexterm>
71 <indexterm><primary>table</primary></indexterm>
73 <productname>PostgreSQL</productname> is a <firstterm>relational
74 database management system</firstterm> (<acronym>RDBMS</acronym>).
75 That means it is a system for managing data stored in
76 <firstterm>relations</firstterm>. Relation is essentially a
77 mathematical term for <firstterm>table</firstterm>. The notion of
78 storing data in tables is so commonplace today that it might
79 seem inherently obvious, but there are a number of other ways of
80 organizing databases. Files and directories on Unix-like
81 operating systems form an example of a hierarchical database. A
82 more modern development is the object-oriented database.
86 <indexterm><primary>row</primary></indexterm>
87 <indexterm><primary>column</primary></indexterm>
89 Each table is a named collection of <firstterm>rows</firstterm>.
90 Each row of a given table has the same set of named
91 <firstterm>columns</firstterm>,
92 and each column is of a specific data type. Whereas columns have
93 a fixed order in each row, it is important to remember that SQL
94 does not guarantee the order of the rows within the table in any
95 way (although they can be explicitly sorted for display).
99 <indexterm><primary>database cluster</primary></indexterm>
100 <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
102 Tables are grouped into databases, and a collection of databases
103 managed by a single <productname>PostgreSQL</productname> server
104 instance constitutes a database <firstterm>cluster</firstterm>.
109 <sect1 id="tutorial-table">
110 <title>Creating a New Table</title>
112 <indexterm zone="tutorial-table">
113 <primary>CREATE TABLE</primary>
117 You can create a new table by specifying the table
118 name, along with all column names and their types:
121 CREATE TABLE weather (
123 temp_lo int, -- low temperature
124 temp_hi int, -- high temperature
125 prcp real, -- precipitation
130 You can enter this into <command>psql</command> with the line
131 breaks. <command>psql</command> will recognize that the command
132 is not terminated until the semicolon.
136 White space (i.e., spaces, tabs, and newlines) may be used freely
137 in SQL commands. That means you can type the command aligned
138 differently than above, or even all on one line. Two dashes
139 (<quote><literal>--</literal></quote>) introduce comments.
140 Whatever follows them is ignored up to the end of the line. SQL
141 is case insensitive about key words and identifiers, except
142 when identifiers are double-quoted to preserve the case (not done
147 <type>varchar(80)</type> specifies a data type that can store
148 arbitrary character strings up to 80 characters in length.
149 <type>int</type> is the normal integer type. <type>real</type> is
150 a type for storing single precision floating-point numbers.
151 <type>date</type> should be self-explanatory. (Yes, the column of
152 type <type>date</type> is also named <literal>date</literal>.
153 This may be convenient or confusing — you choose.)
157 <productname>PostgreSQL</productname> supports the standard
158 <acronym>SQL</acronym> types <type>int</type>,
159 <type>smallint</type>, <type>real</type>, <type>double
160 precision</type>, <type>char(<replaceable>N</>)</type>,
161 <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
162 <type>time</type>, <type>timestamp</type>, and
163 <type>interval</type>, as well as other types of general utility
164 and a rich set of geometric types.
165 <productname>PostgreSQL</productname> can be customized with an
166 arbitrary number of user-defined data types. Consequently, type
167 names are not syntactical key words, except where required to
168 support special cases in the <acronym>SQL</acronym> standard.
172 The second example will store cities and their associated
173 geographical location:
175 CREATE TABLE cities (
180 The <type>point</type> type is an example of a
181 <productname>PostgreSQL</productname>-specific data type.
186 <primary>DROP TABLE</primary>
189 Finally, it should be mentioned that if you don't need a table any
190 longer or want to recreate it differently you can remove it using
191 the following command:
193 DROP TABLE <replaceable>tablename</replaceable>;
199 <sect1 id="tutorial-populate">
200 <title>Populating a Table With Rows</title>
202 <indexterm zone="tutorial-populate">
203 <primary>INSERT</primary>
207 The <command>INSERT</command> statement is used to populate a table with
211 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
214 Note that all data types use rather obvious input formats.
215 Constants that are not simple numeric values usually must be
216 surrounded by single quotes (<literal>'</>), as in the example.
218 <type>date</type> type is actually quite flexible in what it
219 accepts, but for this tutorial we will stick to the unambiguous
224 The <type>point</type> type requires a coordinate pair as input,
227 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
232 The syntax used so far requires you to remember the order of the
233 columns. An alternative syntax allows you to list the columns
236 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
237 VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
239 You can list the columns in a different order if you wish or
240 even omit some columns, e.g., if the precipitation is unknown:
242 INSERT INTO weather (date, city, temp_hi, temp_lo)
243 VALUES ('1994-11-29', 'Hayward', 54, 37);
245 Many developers consider explicitly listing the columns better
246 style than relying on the order implicitly.
250 Please enter all the commands shown above so you have some data to
251 work with in the following sections.
256 <primary>COPY</primary>
259 You could also have used <command>COPY</command> to load large
260 amounts of data from flat-text files. This is usually faster
261 because the <command>COPY</command> command is optimized for this
262 application while allowing less flexibility than
263 <command>INSERT</command>. An example would be:
266 COPY weather FROM '/home/user/weather.txt';
269 where the file name for the source file must be available to the
270 backend server machine, not the client, since the backend server
271 reads the file directly. You can read more about the
272 <command>COPY</command> command in <xref linkend="sql-copy">.
277 <sect1 id="tutorial-select">
278 <title>Querying a Table</title>
281 <indexterm><primary>query</primary></indexterm>
282 <indexterm><primary>SELECT</primary></indexterm>
284 To retrieve data from a table, the table is
285 <firstterm>queried</firstterm>. An <acronym>SQL</acronym>
286 <command>SELECT</command> statement is used to do this. The
287 statement is divided into a select list (the part that lists the
288 columns to be returned), a table list (the part that lists the
289 tables from which to retrieve the data), and an optional
290 qualification (the part that specifies any restrictions). For
291 example, to retrieve all the rows of table
292 <classname>weather</classname>, type:
294 SELECT * FROM weather;
296 Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
299 While <literal>SELECT *</literal> is useful for off-the-cuff
300 queries, it is widely considered bad style in production code,
301 since adding a column to the table would change the results.
304 So the same result would be had with:
306 SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
309 The output should be:
312 city | temp_lo | temp_hi | prcp | date
313 ---------------+---------+---------+------+------------
314 San Francisco | 46 | 50 | 0.25 | 1994-11-27
315 San Francisco | 43 | 57 | 0 | 1994-11-29
316 Hayward | 37 | 54 | | 1994-11-29
322 You can write expressions, not just simple column references, in the
323 select list. For example, you can do:
325 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
329 city | temp_avg | date
330 ---------------+----------+------------
331 San Francisco | 48 | 1994-11-27
332 San Francisco | 50 | 1994-11-29
333 Hayward | 45 | 1994-11-29
336 Notice how the <literal>AS</literal> clause is used to relabel the
337 output column. (The <literal>AS</literal> clause is optional.)
341 A query can be <quote>qualified</> by adding a <literal>WHERE</>
342 clause that specifies which rows are wanted. The <literal>WHERE</>
343 clause contains a Boolean (truth value) expression, and only rows for
344 which the Boolean expression is true are returned. The usual
345 Boolean operators (<literal>AND</literal>,
346 <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
347 the qualification. For example, the following
348 retrieves the weather of San Francisco on rainy days:
351 SELECT * FROM weather
352 WHERE city = 'San Francisco' AND prcp > 0.0;
356 city | temp_lo | temp_hi | prcp | date
357 ---------------+---------+---------+------+------------
358 San Francisco | 46 | 50 | 0.25 | 1994-11-27
364 <indexterm><primary>ORDER BY</primary></indexterm>
366 You can request that the results of a query
367 be returned in sorted order:
370 SELECT * FROM weather
375 city | temp_lo | temp_hi | prcp | date
376 ---------------+---------+---------+------+------------
377 Hayward | 37 | 54 | | 1994-11-29
378 San Francisco | 43 | 57 | 0 | 1994-11-29
379 San Francisco | 46 | 50 | 0.25 | 1994-11-27
382 In this example, the sort order isn't fully specified, and so you
383 might get the San Francisco rows in either order. But you'd always
384 get the results shown above if you do
387 SELECT * FROM weather
388 ORDER BY city, temp_lo;
393 <indexterm><primary>DISTINCT</primary></indexterm>
394 <indexterm><primary>duplicate</primary></indexterm>
396 You can request that duplicate rows be removed from the result of
412 Here again, the result row ordering might vary.
413 You can ensure consistent results by using <literal>DISTINCT</literal> and
414 <literal>ORDER BY</literal> together:
417 In some database systems, including older versions of
418 <productname>PostgreSQL</productname>, the implementation of
419 <literal>DISTINCT</literal> automatically orders the rows and
420 so <literal>ORDER BY</literal> is redundant. But this is not
421 required by the SQL standard, and current
422 <productname>PostgreSQL</productname> doesn't guarantee that
423 <literal>DISTINCT</literal> causes the rows to be ordered.
436 <sect1 id="tutorial-join">
437 <title>Joins Between Tables</title>
439 <indexterm zone="tutorial-join">
440 <primary>join</primary>
444 Thus far, our queries have only accessed one table at a time.
445 Queries can access multiple tables at once, or access the same
446 table in such a way that multiple rows of the table are being
447 processed at the same time. A query that accesses multiple rows
448 of the same or different tables at one time is called a
449 <firstterm>join</firstterm> query. As an example, say you wish to
450 list all the weather records together with the location of the
451 associated city. To do that, we need to compare the city column of
452 each row of the weather table with the name column of all rows in
453 the cities table, and select the pairs of rows where these values match.
456 This is only a conceptual model. The join is usually performed
457 in a more efficient manner than actually comparing each possible
458 pair of rows, but this is invisible to the user.
461 This would be accomplished by the following query:
470 city | temp_lo | temp_hi | prcp | date | name | location
471 ---------------+---------+---------+------+------------+---------------+-----------
472 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
473 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
480 Observe two things about the result set:
484 There is no result row for the city of Hayward. This is
485 because there is no matching entry in the
486 <classname>cities</classname> table for Hayward, so the join
487 ignores the unmatched rows in the weather table. We will see
488 shortly how this can be fixed.
494 There are two columns containing the city name. This is
495 correct because the lists of columns of the
496 <classname>weather</classname> and the
497 <classname>cities</classname> table are concatenated. In
498 practice this is undesirable, though, so you will probably want
499 to list the output columns explicitly rather than using
500 <literal>*</literal>:
502 SELECT city, temp_lo, temp_hi, prcp, date, location
512 <title>Exercise:</title>
515 Attempt to find out the semantics of this query when the
516 <literal>WHERE</literal> clause is omitted.
521 Since the columns all had different names, the parser
522 automatically found out which table they belong to, but it is good
523 style to fully qualify column names in join queries:
526 SELECT weather.city, weather.temp_lo, weather.temp_hi,
527 weather.prcp, weather.date, cities.location
529 WHERE cities.name = weather.city;
534 Join queries of the kind seen thus far can also be written in this
539 FROM weather INNER JOIN cities ON (weather.city = cities.name);
542 This syntax is not as commonly used as the one above, but we show
543 it here to help you understand the following topics.
547 <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
549 Now we will figure out how we can get the Hayward records back in.
550 What we want the query to do is to scan the
551 <classname>weather</classname> table and for each row to find the
552 matching <classname>cities</classname> row. If no matching row is
553 found we want some <quote>empty values</quote> to be substituted
554 for the <classname>cities</classname> table's columns. This kind
555 of query is called an <firstterm>outer join</firstterm>. (The
556 joins we have seen so far are inner joins.) The command looks
561 FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
563 city | temp_lo | temp_hi | prcp | date | name | location
564 ---------------+---------+---------+------+------------+---------------+-----------
565 Hayward | 37 | 54 | | 1994-11-29 | |
566 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
567 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
571 This query is called a <firstterm>left outer
572 join</firstterm> because the table mentioned on the left of the
573 join operator will have each of its rows in the output at least
574 once, whereas the table on the right will only have those rows
575 output that match some row of the left table. When outputting a
576 left-table row for which there is no right-table match, empty (null)
577 values are substituted for the right-table columns.
581 <title>Exercise:</title>
584 There are also right outer joins and full outer joins. Try to
585 find out what those do.
590 <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
591 <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
593 We can also join a table against itself. This is called a
594 <firstterm>self join</firstterm>. As an example, suppose we wish
595 to find all the weather records that are in the temperature range
596 of other weather records. So we need to compare the
597 <structfield>temp_lo</> and <structfield>temp_hi</> columns of
598 each <classname>weather</classname> row to the
599 <structfield>temp_lo</structfield> and
600 <structfield>temp_hi</structfield> columns of all other
601 <classname>weather</classname> rows. We can do this with the
605 SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
606 W2.city, W2.temp_lo AS low, W2.temp_hi AS high
607 FROM weather W1, weather W2
608 WHERE W1.temp_lo < W2.temp_lo
609 AND W1.temp_hi > W2.temp_hi;
611 city | low | high | city | low | high
612 ---------------+-----+------+---------------+-----+------
613 San Francisco | 43 | 57 | San Francisco | 46 | 50
614 Hayward | 37 | 54 | San Francisco | 46 | 50
618 Here we have relabeled the weather table as <literal>W1</> and
619 <literal>W2</> to be able to distinguish the left and right side
620 of the join. You can also use these kinds of aliases in other
621 queries to save some typing, e.g.:
624 FROM weather w, cities c
625 WHERE w.city = c.name;
627 You will encounter this style of abbreviating quite frequently.
632 <sect1 id="tutorial-agg">
633 <title>Aggregate Functions</title>
635 <indexterm zone="tutorial-agg">
636 <primary>aggregate function</primary>
640 <indexterm><primary>average</primary></indexterm>
641 <indexterm><primary>count</primary></indexterm>
642 <indexterm><primary>max</primary></indexterm>
643 <indexterm><primary>min</primary></indexterm>
644 <indexterm><primary>sum</primary></indexterm>
646 Like most other relational database products,
647 <productname>PostgreSQL</productname> supports
649 An aggregate function computes a single result from multiple input rows.
650 For example, there are aggregates to compute the
651 <function>count</function>, <function>sum</function>,
652 <function>avg</function> (average), <function>max</function> (maximum) and
653 <function>min</function> (minimum) over a set of rows.
657 As an example, we can find the highest low-temperature reading anywhere
661 SELECT max(temp_lo) FROM weather;
673 <indexterm><primary>subquery</primary></indexterm>
675 If we wanted to know what city (or cities) that reading occurred in,
679 SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
682 but this will not work since the aggregate
683 <function>max</function> cannot be used in the
684 <literal>WHERE</literal> clause. (This restriction exists because
685 the <literal>WHERE</literal> clause determines the rows that will
686 go into the aggregation stage; so it has to be evaluated before
687 aggregate functions are computed.)
688 However, as is often the case
689 the query can be restated to accomplish the intended result, here
690 by using a <firstterm>subquery</firstterm>:
693 SELECT city FROM weather
694 WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
704 This is OK because the subquery is an independent computation
705 that computes its own aggregate separately from what is happening
710 <indexterm><primary>GROUP BY</primary></indexterm>
711 <indexterm><primary>HAVING</primary></indexterm>
713 Aggregates are also very useful in combination with <literal>GROUP
714 BY</literal> clauses. For example, we can get the maximum low
715 temperature observed in each city with
718 SELECT city, max(temp_lo)
725 ---------------+-----
731 which gives us one output row per city. Each aggregate result is
732 computed over the table rows matching that city.
733 We can filter these grouped
734 rows using <literal>HAVING</literal>:
737 SELECT city, max(temp_lo)
740 HAVING max(temp_lo) < 40;
750 which gives us the same results for only the cities that have all
751 <literal>temp_lo</> values below 40. Finally, if we only care about
753 names begin with <quote><literal>S</literal></quote>, we might do
756 SELECT city, max(temp_lo)
758 WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
760 HAVING max(temp_lo) < 40;
763 <callout arearefs="co.tutorial-agg-like">
765 The <literal>LIKE</literal> operator does pattern matching and
766 is explained in <xref linkend="functions-matching">.
773 It is important to understand the interaction between aggregates and
774 <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
775 The fundamental difference between <literal>WHERE</literal> and
776 <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
777 input rows before groups and aggregates are computed (thus, it controls
778 which rows go into the aggregate computation), whereas
779 <literal>HAVING</literal> selects group rows after groups and
780 aggregates are computed. Thus, the
781 <literal>WHERE</literal> clause must not contain aggregate functions;
782 it makes no sense to try to use an aggregate to determine which rows
783 will be inputs to the aggregates. On the other hand, the
784 <literal>HAVING</literal> clause always contains aggregate functions.
785 (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
786 clause that doesn't use aggregates, but it's wasteful. The same condition
787 could be used more efficiently at the <literal>WHERE</literal> stage.)
791 In the previous example, we can apply the city name restriction in
792 <literal>WHERE</literal>, since it needs no aggregate. This is
793 more efficient than adding the restriction to <literal>HAVING</literal>,
794 because we avoid doing the grouping and aggregate calculations
795 for all rows that fail the <literal>WHERE</literal> check.
800 <sect1 id="tutorial-update">
801 <title>Updates</title>
803 <indexterm zone="tutorial-update">
804 <primary>UPDATE</primary>
808 You can update existing rows using the
809 <command>UPDATE</command> command.
810 Suppose you discover the temperature readings are
811 all off by 2 degrees as of November 28. You may update the
816 SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
817 WHERE date > '1994-11-28';
822 Look at the new state of the data:
824 SELECT * FROM weather;
826 city | temp_lo | temp_hi | prcp | date
827 ---------------+---------+---------+------+------------
828 San Francisco | 46 | 50 | 0.25 | 1994-11-27
829 San Francisco | 41 | 55 | 0 | 1994-11-29
830 Hayward | 35 | 52 | | 1994-11-29
836 <sect1 id="tutorial-delete">
837 <title>Deletions</title>
839 <indexterm zone="tutorial-delete">
840 <primary>DELETE</primary>
844 Rows can be removed from a table using the <command>DELETE</command>
846 Suppose you are no longer interested in the weather of Hayward.
847 Then you can do the following to delete those rows from the table:
849 DELETE FROM weather WHERE city = 'Hayward';
852 All weather records belonging to Hayward are removed.
855 SELECT * FROM weather;
859 city | temp_lo | temp_hi | prcp | date
860 ---------------+---------+---------+------+------------
861 San Francisco | 46 | 50 | 0.25 | 1994-11-27
862 San Francisco | 41 | 55 | 0 | 1994-11-29
868 One should be wary of statements of the form
870 DELETE FROM <replaceable>tablename</replaceable>;
873 Without a qualification, <command>DELETE</command> will
874 remove <emphasis>all</> rows from the given table, leaving it
875 empty. The system will not request confirmation before
882 <!-- Keep this comment at the end of the file
887 sgml-minimize-attributes:nil
888 sgml-always-quote-attributes:t
891 sgml-parent-document:nil
892 sgml-default-dtd-file:"./reference.ced"
893 sgml-exposed-tags:nil
894 sgml-local-catalogs:("/usr/lib/sgml/catalog")
895 sgml-local-ecat-files:nil