2 $Header: /cvsroot/pgsql/doc/src/sgml/query.sgml,v 1.22 2001/11/23 21:08:51 tgl Exp $
5 <chapter id="tutorial-sql">
6 <title>The <acronym>SQL</acronym> Language</title>
8 <sect1 id="tutorial-sql-intro">
9 <title>Introduction</title>
12 This chapter provides an overview of how to use
13 <acronym>SQL</acronym> to perform simple operations. This
14 tutorial is only intended to give you an introduction and is in no
15 way a complete tutorial on <acronym>SQL</acronym>. Numerous books
16 have been written on <acronym>SQL92</acronym>, including <xref
17 linkend="MELT93"> and <xref linkend="DATE97">.
18 You should be aware that some <productname>PostgreSQL</productname>
19 language features are extensions to the standard.
23 In the examples that follow, we assume that you have created a
24 database named <quote>mydb</quote>, as described in the previous
25 chapter, and have started <application>psql</application>.
29 Examples in this manual can also be found in the
30 <productname>PostgreSQL</productname> source distribution
31 in the directory <filename>src/tutorial/</filename>. Refer to the
32 <filename>README</filename> file in that directory for how to use
33 them. To start the tutorial, do the following:
36 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
37 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
42 <prompt>mydb=></prompt> <userinput>\i basics.sql</userinput>
45 The <literal>\i</literal> command reads in commands from the
46 specified file. The <literal>-s</literal> option puts you in
47 single step mode which pauses before sending each query to the
48 server. The commands used in this section are in the file
49 <filename>basics.sql</filename>.
54 <sect1 id="tutorial-concepts">
55 <title>Concepts</title>
58 <indexterm><primary>relational database</primary></indexterm>
59 <indexterm><primary>hierarchical database</primary></indexterm>
60 <indexterm><primary>object-oriented database</primary></indexterm>
61 <indexterm><primary>relation</primary></indexterm>
62 <indexterm><primary>table</primary></indexterm>
64 <productname>PostgreSQL</productname> is a <firstterm>relational
65 database management system</firstterm> (<acronym>RDBMS</acronym>).
66 That means it is a system for managing data stored in
67 <firstterm>relations</firstterm>. Relation is essentially a
68 mathematical term for <firstterm>table</firstterm>. The notion of
69 storing data in tables is so commonplace today that it might
70 seem inherently obvious, but there are a number of other ways of
71 organizing databases. Files and directories on Unix-like
72 operating systems form an example of a hierarchical database. A
73 more modern development is the object-oriented database.
77 <indexterm><primary>row</primary></indexterm>
78 <indexterm><primary>column</primary></indexterm>
80 Each table is a named collection of <firstterm>rows</firstterm>.
81 Each row of a given table has the same set of named
82 <firstterm>columns</firstterm>,
83 and each column is of a specific data type. Whereas columns have
84 a fixed order in each row, it is important to remember that SQL
85 does not guarantee the order of the rows within the table in any
86 way (although they can be explicitly sorted for display).
90 <indexterm><primary>cluster</primary></indexterm>
92 Tables are grouped into databases, and a collection of databases
93 managed by a single <productname>PostgreSQL</productname> server
94 instance constitutes a database <firstterm>cluster</firstterm>.
99 <sect1 id="tutorial-table">
100 <title>Creating a New Table</title>
102 <indexterm zone="tutorial-table">
103 <primary>CREATE TABLE</primary>
107 You can create a new table by specifying the table
108 name, along with all column names and their types:
111 CREATE TABLE weather (
113 temp_lo int, -- low temperature
114 temp_hi int, -- high temperature
115 prcp real, -- precipitation
120 You can enter this into <command>psql</command> with the line
121 breaks. <command>psql</command> will recognize that the command
122 is not terminated until the semicolon.
126 White space (i.e., spaces, tabs, and newlines) may be used freely
127 in SQL commands. That means you can type the command aligned
128 differently than above, or even all on one line. Two dashes
129 (<quote><literal>--</literal></quote>) introduce comments.
130 Whatever follows them is ignored up to the end of the line. SQL
131 is case insensitive about key words and identifiers, except
132 when identifiers are double-quoted to preserve the case (not done
137 <type>varchar(80)</type> specifies a data type that can store
138 arbitrary character strings up to 80 characters in length.
139 <type>int</type> is the normal integer type. <type>real</type> is
140 a type for storing single precision floating point numbers.
141 <type>date</type> should be self-explanatory. (Yes, the column of
142 type <type>date</type> is also named <literal>date</literal>.
143 This may be convenient or confusing -- you choose.)
147 <productname>PostgreSQL</productname> supports the usual
148 <acronym>SQL</acronym> types <type>int</type>,
149 <type>smallint</type>, <type>real</type>, <type>double
150 precision</type>, <type>char(<replaceable>N</>)</type>,
151 <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
152 <type>time</type>, <type>timestamp</type>, and
153 <type>interval</type>, as well as other types of general utility
154 and a rich set of geometric types.
155 <productname>PostgreSQL</productname> can be customized with an
156 arbitrary number of user-defined data types. Consequently, type
157 names are not syntactical keywords, except where required to
158 support special cases in the <acronym>SQL</acronym> standard.
162 The second example will store cities and their associated
163 geographical location:
165 CREATE TABLE cities (
170 The <type>point</type> type is an example of a
171 <productname>PostgreSQL</productname>-specific data type.
176 <primary>DROP TABLE</primary>
179 Finally, it should be mentioned that if you don't need a table any
180 longer or want to recreate it differently you can remove it using
181 the following command:
183 DROP TABLE <replaceable>tablename</replaceable>;
189 <sect1 id="tutorial-populate">
190 <title>Populating a Table With Rows</title>
192 <indexterm zone="tutorial-populate">
193 <primary>INSERT</primary>
197 The <command>INSERT</command> statement is used to populate a table with
201 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
204 Note that all data types use rather obvious input formats.
205 Constants that are not simple numeric values usually must be
206 surrounded by single quotes (<literal>'</>), as in the example.
208 <type>date</type> column is actually quite flexible in what it
209 accepts, but for this tutorial we will stick to the unambiguous
214 The <type>point</type> type requires a coordinate pair as input,
217 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
222 The syntax used so far requires you to remember the order of the
223 columns. An alternative syntax allows you to list the columns
226 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
227 VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
229 You can list the columns in a different order if you wish or
230 even omit some columns, e.g., if the precipitation is unknown:
232 INSERT INTO weather (date, city, temp_hi, temp_lo)
233 VALUES ('1994-11-29', 'Hayward', 54, 37);
235 Many developers consider explicitly listing the columns better
236 style than relying on the order implicitly.
240 Please enter all the commands shown above so you have some data to
241 work with in the following sections.
246 <primary>COPY</primary>
249 You could also have used <command>COPY</command> to load large
250 amounts of data from flat text files. This is usually faster
251 because the <command>COPY</command> is optimized for this
252 application while allowing less flexibility than
253 <command>INSERT</command>. An example would be:
256 COPY weather FROM '/home/user/weather.txt';
259 where the path name for the source file must be available to the
260 backend server machine, not the client, since the backend server
261 reads the file directly. You can read more about the
262 <command>COPY</command> command in the <citetitle>Reference
268 <sect1 id="tutorial-select">
269 <title>Querying a Table</title>
272 <indexterm><primary>query</primary></indexterm>
273 <indexterm><primary>SELECT</primary></indexterm>
275 To retrieve data from a table it is
276 <firstterm>queried</firstterm>. An <acronym>SQL</acronym>
277 <command>SELECT</command> statement is used to do this. The
278 statement is divided into a select list (the part that lists the
279 columns to be returned), a table list (the part that lists the
280 tables from which to retrieve the data), and an optional
281 qualification (the part that specifies any restrictions). For
282 example, to retrieve all the rows of
283 <classname>weather</classname>, type:
285 SELECT * FROM weather;
287 (here <literal>*</literal> means <quote>all columns</quote>) and
288 the output should be:
290 city | temp_lo | temp_hi | prcp | date
291 ---------------+---------+---------+------+------------
292 San Francisco | 46 | 50 | 0.25 | 1994-11-27
293 San Francisco | 43 | 57 | 0 | 1994-11-29
294 Hayward | 37 | 54 | | 1994-11-29
300 You may specify any arbitrary expressions in the target list. For
303 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
307 city | temp_avg | date
308 ---------------+----------+------------
309 San Francisco | 48 | 1994-11-27
310 San Francisco | 50 | 1994-11-29
311 Hayward | 45 | 1994-11-29
314 Notice how the <literal>AS</literal> clause is used to relabel the
315 output column. (It is optional.)
319 Arbitrary Boolean operators (<literal>AND</literal>,
320 <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
321 the qualification of a query. For example, the following
322 retrieves the weather of San Francisco on rainy days:
325 SELECT * FROM weather
326 WHERE city = 'San Francisco'
331 city | temp_lo | temp_hi | prcp | date
332 ---------------+---------+---------+------+------------
333 San Francisco | 46 | 50 | 0.25 | 1994-11-27
339 <indexterm><primary>ORDER BY</primary></indexterm>
340 <indexterm><primary>DISTINCT</primary></indexterm>
341 <indexterm><primary>duplicate</primary></indexterm>
343 As a final note, you can request that the results of a select can
344 be returned in sorted order or with duplicate rows removed. (Just
345 to make sure the following won't confuse you,
346 <literal>DISTINCT</literal> and <literal>ORDER BY</literal> can be
366 <sect1 id="tutorial-join">
367 <title>Joins Between Tables</title>
369 <indexterm zone="tutorial-join">
370 <primary>join</primary>
374 Thus far, our queries have only accessed one table at a time.
375 Queries can access multiple tables at once, or access the same
376 table in such a way that multiple rows of the table are being
377 processed at the same time. A query that accesses multiple rows
378 of the same or different tables at one time is called a
379 <firstterm>join</firstterm> query. As an example, say you wish to
380 list all the weather records together with the location of the
381 associated city. To do that, we need to compare the city column of
382 each row of the weather table with the name column of all rows in
383 the cities table, and select the pairs of rows where these values match.
386 This is only a conceptual model. The actual join may
387 be performed in a more efficient manner, but this is invisible
391 This would be accomplished by the following query:
400 city | temp_lo | temp_hi | prcp | date | name | location
401 ---------------+---------+---------+------+------------+---------------+-----------
402 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
403 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
410 Observe two things about the result set:
414 There is no result row for the city of Hayward. This is
415 because there is no matching entry in the
416 <classname>cities</classname> table for Hayward, so the join
417 ignores the unmatched rows in the weather table. We will see
418 shortly how this can be fixed.
424 There are two columns containing the city name. This is
425 correct because the lists of columns of the
426 <classname>weather</classname> and the
427 <classname>cities</classname> tables are concatenated. In
428 practice this is undesirable, though, so you will probably want
429 to list the output columns explicitly rather than using
430 <literal>*</literal>:
432 SELECT city, temp_lo, temp_hi, prcp, date, location
442 <title>Exercise:</title>
445 Attempt to find out the semantics of this query when the
446 <literal>WHERE</literal> clause is omitted.
451 Since the columns all had different names, the parser
452 automatically found out which table they belong to, but it is good
453 style to fully qualify column names in join queries:
456 SELECT weather.city, weather.temp_lo, weather.temp_hi,
457 weather.prcp, weather.date, cities.location
459 WHERE cities.name = weather.city;
464 Join queries of the kind seen thus far can also be written in this
469 FROM weather INNER JOIN cities ON (weather.city = cities.name);
472 This syntax is not as commonly used as the one above, but we show
473 it here to help you understand the following topics.
477 <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
479 Now we will figure out how we can get the Hayward records back in.
480 What we want the query to do is to scan the
481 <classname>weather</classname> table and for each row to find the
482 matching <classname>cities</classname> row. If no matching row is
483 found we want some <quote>empty values</quote> to be substituted
484 for the <classname>cities</classname> table's columns. This kind
485 of query is called an <firstterm>outer join</firstterm>. (The
486 joins we have seen so far are inner joins.) The command looks
491 FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
493 city | temp_lo | temp_hi | prcp | date | name | location
494 ---------------+---------+---------+------+------------+---------------+-----------
495 Hayward | 37 | 54 | | 1994-11-29 | |
496 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
497 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
501 This query is called a <firstterm>left outer
502 join</firstterm> because the table mentioned on the left of the
503 join operator will have each of its rows in the output at least
504 once, whereas the table on the right will only have those rows
505 output that match some row of the left table. When outputting a
506 left-table row for which there is no right-table match, empty (NULL)
507 values are substituted for the right-table columns.
511 <title>Exercise:</title>
514 There are also right outer joins and full outer joins. Try to
515 find out what those do.
520 <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
521 <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
523 We can also join a table against itself. This is called a
524 <firstterm>self join</firstterm>. As an example, suppose we wish
525 to find all the weather records that are in the temperature range
526 of other weather records. So we need to compare the
527 <structfield>temp_lo</> and <structfield>temp_hi</> columns of
528 each <classname>weather</classname> row to the
529 <structfield>temp_lo</structfield> and
530 <structfield>temp_hi</structfield> columns of all other
531 <classname>weather</classname> rows. We can do this with the
535 SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
536 W2.city, W2.temp_lo AS low, W2.temp_hi AS high
537 FROM weather W1, weather W2
538 WHERE W1.temp_lo < W2.temp_lo
539 AND W1.temp_hi > W2.temp_hi;
541 city | low | high | city | low | high
542 ---------------+-----+------+---------------+-----+------
543 San Francisco | 43 | 57 | San Francisco | 46 | 50
544 Hayward | 37 | 54 | San Francisco | 46 | 50
548 Here we have relabeled the weather table as <literal>W1</> and
549 <literal>W2</> to be able to distinguish the left and right side
550 of the join. You can also use these kinds of aliases in other
551 queries to save some typing, e.g.:
554 FROM weather w, cities c
555 WHERE w.city = c.name;
557 You will encounter this style of abbreviating quite frequently.
562 <sect1 id="tutorial-agg">
563 <title>Aggregate Functions</title>
565 <indexterm zone="tutorial-agg">
566 <primary>aggregate</primary>
570 <indexterm><primary>average</primary></indexterm>
571 <indexterm><primary>count</primary></indexterm>
572 <indexterm><primary>max</primary></indexterm>
573 <indexterm><primary>min</primary></indexterm>
574 <indexterm><primary>sum</primary></indexterm>
576 Like most other relational database products,
577 <productname>PostgreSQL</productname> supports
579 An aggregate function computes a single result from multiple input rows.
580 For example, there are aggregates to compute the
581 <function>count</function>, <function>sum</function>,
582 <function>avg</function> (average), <function>max</function> (maximum) and
583 <function>min</function> (minimum) over a set of rows.
587 As an example, we can find the highest low-temperature reading anywhere
591 SELECT max(temp_lo) FROM weather;
603 <indexterm><primary>subquery</primary></indexterm>
605 If we want to know what city (or cities) that reading occurred in,
609 SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
612 but this will not work since the aggregate
613 <function>max</function> cannot be used in the
614 <literal>WHERE</literal> clause. (This restriction exists because
615 the <literal>WHERE</literal> clause determines the rows that will
616 go into the aggregation stage; so it has to be evaluated before
617 aggregate functions are computed.)
618 However, as is often the case
619 the query can be restated to accomplish the intended result; here
620 by using a <firstterm>subquery</firstterm>:
623 SELECT city FROM weather
624 WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
634 This is OK because the sub-select is an independent computation
635 that computes its own aggregate separately from what is happening
640 <indexterm><primary>GROUP BY</primary></indexterm>
641 <indexterm><primary>HAVING</primary></indexterm>
643 Aggregates are also very useful in combination with <literal>GROUP
644 BY</literal> clauses. For example, we can get the maximum low
645 temperature observed in each city with
648 SELECT city, max(temp_lo)
655 ---------------+-----
661 which gives us one output row per city. Each aggregate result is
662 computed over the table rows matching that city.
663 We can filter these grouped
664 rows using <literal>HAVING</literal>:
667 SELECT city, max(temp_lo)
670 HAVING max(temp_lo) < 40;
680 which gives us the same results for only the cities that have all
681 <literal>temp_lo</> values below forty. Finally, if we only care about
683 names begin with <quote><literal>S</literal></quote>, we might do
686 SELECT city, max(temp_lo)
690 HAVING max(temp_lo) < 40;
695 It is important to understand the interaction between aggregates and
696 SQL's <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
697 The fundamental difference between <literal>WHERE</literal> and
698 <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
699 input rows before groups and aggregates are computed (thus, it controls
700 which rows go into the aggregate computation), whereas
701 <literal>HAVING</literal> selects group rows after groups and
702 aggregates are computed. Thus, the
703 <literal>WHERE</literal> clause must not contain aggregate functions;
704 it makes no sense to try to use an aggregate to determine which rows
705 will be inputs to the aggregates. On the other hand,
706 <literal>HAVING</literal> clauses always contain aggregate functions.
707 (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
708 clause that doesn't use aggregates, but it's wasteful; the same condition
709 could be used more efficiently at the <literal>WHERE</literal> stage.)
713 Observe that we can apply the city name restriction in
714 <literal>WHERE</literal>, since it needs no aggregate. This is
715 more efficient than adding the restriction to <literal>HAVING</literal>,
716 because we avoid doing the grouping and aggregate calculations
717 for all rows that fail the <literal>WHERE</literal> check.
722 <sect1 id="tutorial-update">
723 <title>Updates</title>
725 <indexterm zone="tutorial-update">
726 <primary>UPDATE</primary>
730 You can update existing rows using the
731 <command>UPDATE</command> command.
732 Suppose you discover the temperature readings are
733 all off by 2 degrees as of November 28, you may update the
738 SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
739 WHERE date > '1994-11-28';
744 Look at the new state of the data:
746 SELECT * FROM weather;
748 city | temp_lo | temp_hi | prcp | date
749 ---------------+---------+---------+------+------------
750 San Francisco | 46 | 50 | 0.25 | 1994-11-27
751 San Francisco | 41 | 55 | 0 | 1994-11-29
752 Hayward | 35 | 52 | | 1994-11-29
758 <sect1 id="tutorial-delete">
759 <title>Deletions</title>
761 <indexterm zone="tutorial-delete">
762 <primary>DELETE</primary>
766 Suppose you are no longer interested in the weather of Hayward,
767 then you can do the following to delete those rows from the table.
768 Deletions are performed using the <command>DELETE</command>
771 DELETE FROM weather WHERE city = 'Hayward';
774 All weather records belonging to Hayward are removed.
777 SELECT * FROM weather;
781 city | temp_lo | temp_hi | prcp | date
782 ---------------+---------+---------+------+------------
783 San Francisco | 46 | 50 | 0.25 | 1994-11-27
784 San Francisco | 41 | 55 | 0 | 1994-11-29
790 One should be wary of queries of the form
792 DELETE FROM <replaceable>tablename</replaceable>;
795 Without a qualification, <command>DELETE</command> will
796 remove <emphasis>all</> rows from the given table, leaving it
797 empty. The system will not request confirmation before
804 <!-- Keep this comment at the end of the file
809 sgml-minimize-attributes:nil
810 sgml-always-quote-attributes:t
813 sgml-parent-document:nil
814 sgml-default-dtd-file:"./reference.ced"
815 sgml-exposed-tags:nil
816 sgml-local-catalogs:("/usr/lib/sgml/catalog")
817 sgml-local-ecat-files:nil