granicus.if.org Git - postgresql/blob - doc/src/sgml/query.sgml

   1 <!--
   2 $PostgreSQL: pgsql/doc/src/sgml/query.sgml,v 1.43 2005/01/22 22:56:36 momjian Exp $
   3 -->
   4
   5  <chapter id="tutorial-sql">
   6   <title>The <acronym>SQL</acronym> Language</title>
   7
   8   <sect1 id="tutorial-sql-intro">
   9    <title>Introduction</title>
  10
  11    <para>
  12     This chapter provides an overview of how to use
  13     <acronym>SQL</acronym> to perform simple operations.  This
  14     tutorial is only intended to give you an introduction and is in no
  15     way a complete tutorial on <acronym>SQL</acronym>.  Numerous books
  16     have been written on <acronym>SQL</acronym>, including <xref
  17     linkend="MELT93"> and <xref linkend="DATE97">.
  18     You should be aware that some <productname>PostgreSQL</productname>
  19     language features are extensions to the standard.
  20    </para>
  21
  22    <para>
  23     In the examples that follow, we assume that you have created a
  24     database named <literal>mydb</literal>, as described in the previous
  25     chapter, and have started <application>psql</application>.
  26    </para>
  27
  28    <para>
  29     Examples in this manual can also be found in the
  30     <productname>PostgreSQL</productname> source distribution
  31     in the directory <filename>src/tutorial/</filename>.  To use those
  32     files, first change to that directory and run <application>make</>:
  33
  34 <screen>
  35 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
  36 <prompt>$</prompt> <userinput>make</userinput>
  37 </screen>
  38
  39     This creates the scripts and compiles the C files containing user-defined
  40     functions and types.  (You must use GNU make for this &mdash; it may be named
  41     something different on your system, often <application>gmake</>.)
  42     Then, to start the tutorial, do the following:
  43
  44 <screen>
  45 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
  46 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
  47 <computeroutput>
  48 ...
  49 </computeroutput>
  50
  51 <prompt>mydb=&gt;</prompt> <userinput>\i basics.sql</userinput>
  52 </screen>
  53
  54     The <literal>\i</literal> command reads in commands from the
  55     specified file. The <literal>-s</literal> option puts you in
  56     single step mode which pauses before sending each statement to the
  57     server.  The commands used in this section are in the file
  58     <filename>basics.sql</filename>.
  59    </para>
  60   </sect1>
  61
  62
  63   <sect1 id="tutorial-concepts">
  64    <title>Concepts</title>
  65
  66    <para>
  67     <indexterm><primary>relational database</primary></indexterm>
  68     <indexterm><primary>hierarchical database</primary></indexterm>
  69     <indexterm><primary>object-oriented database</primary></indexterm>
  70     <indexterm><primary>relation</primary></indexterm>
  71     <indexterm><primary>table</primary></indexterm>
  72
  73     <productname>PostgreSQL</productname> is a <firstterm>relational
  74     database management system</firstterm> (<acronym>RDBMS</acronym>).
  75     That means it is a system for managing data stored in
  76     <firstterm>relations</firstterm>.  Relation is essentially a
  77     mathematical term for <firstterm>table</firstterm>.  The notion of
  78     storing data in tables is so commonplace today that it might
  79     seem inherently obvious, but there are a number of other ways of
  80     organizing databases.  Files and directories on Unix-like
  81     operating systems form an example of a hierarchical database.  A
  82     more modern development is the object-oriented database.
  83    </para>
  84
  85    <para>
  86     <indexterm><primary>row</primary></indexterm>
  87     <indexterm><primary>column</primary></indexterm>
  88
  89     Each table is a named collection of <firstterm>rows</firstterm>.
  90     Each row of a given table has the same set of named
  91     <firstterm>columns</firstterm>,
  92     and each column is of a specific data type.  Whereas columns have
  93     a fixed order in each row, it is important to remember that SQL
  94     does not guarantee the order of the rows within the table in any
  95     way (although they can be explicitly sorted for display).
  96    </para>
  97
  98    <para>
  99     <indexterm><primary>database cluster</primary></indexterm>
 100     <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
 101
 102     Tables are grouped into databases, and a collection of databases
 103     managed by a single <productname>PostgreSQL</productname> server
 104     instance constitutes a database <firstterm>cluster</firstterm>.
 105    </para>
 106   </sect1>
 107
 108
 109   <sect1 id="tutorial-table">
 110    <title>Creating a New Table</title>
 111
 112    <indexterm zone="tutorial-table">
 113     <primary>CREATE TABLE</primary>
 114    </indexterm>
 115
 116    <para>
 117     You  can  create  a  new  table by specifying the table
 118     name, along with all column names and their types:
 119
 120 <programlisting>
 121 CREATE TABLE weather (
 122     city            varchar(80),
 123     temp_lo         int,           -- low temperature
 124     temp_hi         int,           -- high temperature
 125     prcp            real,          -- precipitation
 126     date            date
 127 );
 128 </programlisting>
 129
 130     You can enter this into <command>psql</command> with the line
 131     breaks.  <command>psql</command> will recognize that the command
 132     is not terminated until the semicolon.
 133    </para>
 134
 135    <para>
 136     White space (i.e., spaces, tabs, and newlines) may be used freely
 137     in SQL commands.  That means you can type the command aligned
 138     differently than above, or even all on one line.  Two dashes
 139     (<quote><literal>--</literal></quote>) introduce comments.
 140     Whatever follows them is ignored up to the end of the line.  SQL
 141     is case insensitive about key words and identifiers, except
 142     when identifiers are double-quoted to preserve the case (not done
 143     above).
 144    </para>
 145
 146    <para>
 147     <type>varchar(80)</type> specifies a data type that can store
 148     arbitrary character strings up to 80 characters in length.
 149     <type>int</type> is the normal integer type.  <type>real</type> is
 150     a type for storing single precision floating-point numbers.
 151     <type>date</type> should be self-explanatory.  (Yes, the column of
 152     type <type>date</type> is also named <literal>date</literal>.
 153     This may be convenient or confusing &mdash; you choose.)
 154    </para>
 155
 156    <para>
 157     <productname>PostgreSQL</productname> supports the standard
 158     <acronym>SQL</acronym> types <type>int</type>,
 159     <type>smallint</type>, <type>real</type>, <type>double
 160     precision</type>, <type>char(<replaceable>N</>)</type>,
 161     <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
 162     <type>time</type>, <type>timestamp</type>, and
 163     <type>interval</type>, as well as other types of general utility
 164     and a rich set of geometric types.
 165     <productname>PostgreSQL</productname> can be customized with an
 166     arbitrary number of user-defined data types.  Consequently, type
 167     names are not syntactical key words, except where required to
 168     support special cases in the <acronym>SQL</acronym> standard.
 169    </para>
 170
 171    <para>
 172     The second example will store cities and their associated
 173     geographical location:
 174 <programlisting>
 175 CREATE TABLE cities (
 176     name            varchar(80),
 177     location        point
 178 );
 179 </programlisting>
 180     The <type>point</type> type is an example of a
 181     <productname>PostgreSQL</productname>-specific data type.
 182    </para>
 183
 184    <para>
 185     <indexterm>
 186      <primary>DROP TABLE</primary>
 187     </indexterm>
 188
 189     Finally, it should be mentioned that if you don't need a table any
 190     longer or want to recreate it differently you can remove it using
 191     the following command:
 192 <synopsis>
 193 DROP TABLE <replaceable>tablename</replaceable>;
 194 </synopsis>
 195    </para>
 196   </sect1>
 197
 198
 199   <sect1 id="tutorial-populate">
 200    <title>Populating a Table With Rows</title>
 201
 202    <indexterm zone="tutorial-populate">
 203     <primary>INSERT</primary>
 204    </indexterm>
 205
 206    <para>
 207     The <command>INSERT</command> statement is used to populate a table  with
 208     rows:
 209
 210 <programlisting>
 211 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
 212 </programlisting>
 213
 214     Note that all data types use rather obvious input formats.
 215     Constants that are not simple numeric values usually must be
 216     surrounded by single quotes (<literal>'</>), as in the example.
 217     The
 218     <type>date</type> type is actually quite flexible in what it
 219     accepts, but for this tutorial we will stick to the unambiguous
 220     format shown here.
 221    </para>
 222
 223    <para>
 224     The <type>point</type> type requires a coordinate pair as input,
 225     as shown here:
 226 <programlisting>
 227 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
 228 </programlisting>
 229    </para>
 230
 231    <para>
 232     The syntax used so far requires you to remember the order of the
 233     columns.  An alternative syntax allows you to list the columns
 234     explicitly:
 235 <programlisting>
 236 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
 237     VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
 238 </programlisting>
 239     You can list the columns in a different order if you wish or
 240     even omit some columns, e.g., if the precipitation is unknown:
 241 <programlisting>
 242 INSERT INTO weather (date, city, temp_hi, temp_lo)
 243     VALUES ('1994-11-29', 'Hayward', 54, 37);
 244 </programlisting>
 245     Many developers consider explicitly listing the columns better
 246     style than relying on the order implicitly.
 247    </para>
 248
 249    <para>
 250     Please enter all the commands shown above so you have some data to
 251     work with in the following sections.
 252    </para>
 253
 254    <para>
 255     <indexterm>
 256      <primary>COPY</primary>
 257     </indexterm>
 258
 259     You could also have used <command>COPY</command> to load large
 260     amounts of data from flat-text files.  This is usually faster
 261     because the <command>COPY</command> command is optimized for this
 262     application while allowing less flexibility than
 263     <command>INSERT</command>.  An example would be:
 264
 265 <programlisting>
 266 COPY weather FROM '/home/user/weather.txt';
 267 </programlisting>
 268
 269     where the file name for the source file must be available to the
 270     backend server machine, not the client, since the backend server
 271     reads the file directly.  You can read more about the
 272     <command>COPY</command> command in <xref linkend="sql-copy">.
 273    </para>
 274   </sect1>
 275
 276
 277   <sect1 id="tutorial-select">
 278    <title>Querying a Table</title>
 279
 280    <para>
 281     <indexterm><primary>query</primary></indexterm>
 282     <indexterm><primary>SELECT</primary></indexterm>
 283
 284     To retrieve data from a table, the table is
 285     <firstterm>queried</firstterm>.  An <acronym>SQL</acronym>
 286     <command>SELECT</command> statement is used to do this.  The
 287     statement is divided into a select list (the part that lists the
 288     columns to be returned), a table list (the part that lists the
 289     tables from which to retrieve the data), and an optional
 290     qualification (the part that specifies any restrictions).  For
 291     example, to retrieve all the rows of table
 292     <classname>weather</classname>, type:
 293 <programlisting>
 294 SELECT * FROM weather;
 295 </programlisting>
 296     Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
 297      <footnote>
 298       <para>
 299        While <literal>SELECT *</literal> is useful for off-the-cuff
 300        queries, it is widely considered bad style in production code,
 301        since adding a column to the table would change the results.
 302       </para>
 303      </footnote>
 304     So the same result would be had with:
 305 <programlisting>
 306 SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
 307 </programlisting>
 308
 309     The output should be:
 310
 311 <screen>
 312      city      | temp_lo | temp_hi | prcp |    date
 313 ---------------+---------+---------+------+------------
 314  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 315  San Francisco |      43 |      57 |    0 | 1994-11-29
 316  Hayward       |      37 |      54 |      | 1994-11-29
 317 (3 rows)
 318 </screen>
 319    </para>
 320
 321    <para>
 322     You can write expressions, not just simple column references, in the
 323     select list.  For example, you can do:
 324 <programlisting>
 325 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
 326 </programlisting>
 327     This should give:
 328 <screen>
 329      city      | temp_avg |    date
 330 ---------------+----------+------------
 331  San Francisco |       48 | 1994-11-27
 332  San Francisco |       50 | 1994-11-29
 333  Hayward       |       45 | 1994-11-29
 334 (3 rows)
 335 </screen>
 336     Notice how the <literal>AS</literal> clause is used to relabel the
 337     output column.  (The <literal>AS</literal> clause is optional.)
 338    </para>
 339
 340    <para>
 341     A query can be <quote>qualified</> by adding a <literal>WHERE</>
 342     clause that specifies which rows are wanted.  The <literal>WHERE</>
 343     clause contains a Boolean (truth value) expression, and only rows for
 344     which the Boolean expression is true are returned.  The usual
 345     Boolean operators (<literal>AND</literal>,
 346     <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
 347     the qualification.  For example, the following
 348     retrieves the weather of San Francisco on rainy days:
 349
 350 <programlisting>
 351 SELECT * FROM weather
 352     WHERE city = 'San Francisco' AND prcp &gt; 0.0;
 353 </programlisting>
 354     Result:
 355 <screen>
 356      city      | temp_lo | temp_hi | prcp |    date
 357 ---------------+---------+---------+------+------------
 358  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 359 (1 row)
 360 </screen>
 361    </para>
 362
 363    <para>
 364     <indexterm><primary>ORDER BY</primary></indexterm>
 365
 366     You can request that the results of a query
 367     be returned in sorted order:
 368
 369 <programlisting>
 370 SELECT * FROM weather
 371     ORDER BY city;
 372 </programlisting>
 373
 374 <screen>
 375      city      | temp_lo | temp_hi | prcp |    date
 376 ---------------+---------+---------+------+------------
 377  Hayward       |      37 |      54 |      | 1994-11-29
 378  San Francisco |      43 |      57 |    0 | 1994-11-29
 379  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 380 </screen>
 381
 382     In this example, the sort order isn't fully specified, and so you
 383     might get the San Francisco rows in either order.  But you'd always
 384     get the results shown above if you do
 385
 386 <programlisting>
 387 SELECT * FROM weather
 388     ORDER BY city, temp_lo;
 389 </programlisting>
 390    </para>
 391
 392    <para>
 393     <indexterm><primary>DISTINCT</primary></indexterm>
 394     <indexterm><primary>duplicate</primary></indexterm>
 395
 396     You can request that duplicate rows be removed from the result of
 397     a query:
 398
 399 <programlisting>
 400 SELECT DISTINCT city
 401     FROM weather;
 402 </programlisting>
 403
 404 <screen>
 405      city
 406 ---------------
 407  Hayward
 408  San Francisco
 409 (2 rows)
 410 </screen>
 411
 412     Here again, the result row ordering might vary.
 413     You can ensure consistent results by using <literal>DISTINCT</literal> and
 414     <literal>ORDER BY</literal> together:
 415      <footnote>
 416       <para>
 417        In some database systems, including older versions of
 418        <productname>PostgreSQL</productname>, the implementation of
 419        <literal>DISTINCT</literal> automatically orders the rows and
 420        so <literal>ORDER BY</literal> is redundant.  But this is not
 421        required by the SQL standard, and current
 422        <productname>PostgreSQL</productname> doesn't guarantee that
 423        <literal>DISTINCT</literal> causes the rows to be ordered.
 424       </para>
 425      </footnote>
 426
 427 <programlisting>
 428 SELECT DISTINCT city
 429     FROM weather
 430     ORDER BY city;
 431 </programlisting>
 432    </para>
 433   </sect1>
 434
 435
 436   <sect1 id="tutorial-join">
 437    <title>Joins Between Tables</title>
 438
 439    <indexterm zone="tutorial-join">
 440     <primary>join</primary>
 441    </indexterm>
 442
 443    <para>
 444     Thus far, our queries have only accessed one table at a time.
 445     Queries can access multiple tables at once, or access the same
 446     table in such a way that multiple rows of the table are being
 447     processed at the same time.  A query that accesses multiple rows
 448     of the same or different tables at one time is called a
 449     <firstterm>join</firstterm> query.  As an example, say you wish to
 450     list all the weather records together with the location of the
 451     associated city.  To do that, we need to compare the city column of
 452     each row of the weather table with the name column of all rows in
 453     the cities table, and select the pairs of rows where these values match.
 454     <note>
 455      <para>
 456       This  is only a conceptual model.  The join is usually performed
 457       in a more efficient manner than actually comparing each possible
 458       pair of rows, but this is invisible to the user.
 459      </para>
 460     </note>
 461     This would be accomplished by the following query:
 462
 463 <programlisting>
 464 SELECT *
 465     FROM weather, cities
 466     WHERE city = name;
 467 </programlisting>
 468
 469 <screen>
 470      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 471 ---------------+---------+---------+------+------------+---------------+-----------
 472  San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 473  San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
 474 (2 rows)
 475 </screen>
 476
 477    </para>
 478
 479    <para>
 480     Observe two things about the result set:
 481     <itemizedlist>
 482      <listitem>
 483       <para>
 484        There is no result row for the city of Hayward.  This is
 485        because there is no matching entry in the
 486        <classname>cities</classname> table for Hayward, so the join
 487        ignores the unmatched rows in the weather table.  We will see
 488        shortly how this can be fixed.
 489       </para>
 490      </listitem>
 491
 492      <listitem>
 493       <para>
 494        There are two columns containing the city name.  This is
 495        correct because the lists of columns of the
 496        <classname>weather</classname> and the
 497        <classname>cities</classname> table are concatenated.  In
 498        practice this is undesirable, though, so you will probably want
 499        to list the output columns explicitly rather than using
 500        <literal>*</literal>:
 501 <programlisting>
 502 SELECT city, temp_lo, temp_hi, prcp, date, location
 503     FROM weather, cities
 504     WHERE city = name;
 505 </programlisting>
 506       </para>
 507      </listitem>
 508     </itemizedlist>
 509    </para>
 510
 511    <formalpara>
 512     <title>Exercise:</title>
 513
 514     <para>
 515      Attempt to find out the semantics of this query when the
 516      <literal>WHERE</literal> clause is omitted.
 517     </para>
 518    </formalpara>
 519
 520    <para>
 521     Since the columns all had different names, the parser
 522     automatically found out which table they belong to, but it is good
 523     style to fully qualify column names in join queries:
 524
 525 <programlisting>
 526 SELECT weather.city, weather.temp_lo, weather.temp_hi,
 527        weather.prcp, weather.date, cities.location
 528     FROM weather, cities
 529     WHERE cities.name = weather.city;
 530 </programlisting>
 531    </para>
 532
 533    <para>
 534     Join queries of the kind seen thus far can also be written in this
 535     alternative form:
 536
 537 <programlisting>
 538 SELECT *
 539     FROM weather INNER JOIN cities ON (weather.city = cities.name);
 540 </programlisting>
 541
 542     This syntax is not as commonly used as the one above, but we show
 543     it here to help you understand the following topics.
 544    </para>
 545
 546    <para>
 547     <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
 548
 549     Now we will figure out how we can get the Hayward records back in.
 550     What we want the query to do is to scan the
 551     <classname>weather</classname> table and for each row to find the
 552     matching <classname>cities</classname> row.  If no matching row is
 553     found we want some <quote>empty values</quote> to be substituted
 554     for the <classname>cities</classname> table's columns.  This kind
 555     of query is called an <firstterm>outer join</firstterm>.  (The
 556     joins we have seen so far are inner joins.)  The command looks
 557     like this:
 558
 559 <programlisting>
 560 SELECT *
 561     FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
 562
 563      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 564 ---------------+---------+---------+------+------------+---------------+-----------
 565  Hayward       |      37 |      54 |      | 1994-11-29 |               |
 566  San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 567  San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
 568 (3 rows)
 569 </programlisting>
 570
 571     This query is called a <firstterm>left outer
 572     join</firstterm> because the table mentioned on the left of the
 573     join operator will have each of its rows in the output at least
 574     once, whereas the table on the right will only have those rows
 575     output that match some row of the left table.  When outputting a
 576     left-table row for which there is no right-table match, empty (null)
 577     values are substituted for the right-table columns.
 578    </para>
 579
 580    <formalpara>
 581     <title>Exercise:</title>
 582
 583     <para>
 584      There are also right outer joins and full outer joins.  Try to
 585      find out what those do.
 586     </para>
 587    </formalpara>
 588
 589    <para>
 590     <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
 591     <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
 592
 593     We can also join a table against itself.  This is called a
 594     <firstterm>self join</firstterm>.  As an example, suppose we wish
 595     to find all the weather records that are in the temperature range
 596     of other weather records.  So we need to compare the
 597     <structfield>temp_lo</> and <structfield>temp_hi</> columns of
 598     each <classname>weather</classname> row to the
 599     <structfield>temp_lo</structfield> and
 600     <structfield>temp_hi</structfield> columns of all other
 601     <classname>weather</classname> rows.  We can do this with the
 602     following query:
 603
 604 <programlisting>
 605 SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
 606     W2.city, W2.temp_lo AS low, W2.temp_hi AS high
 607     FROM weather W1, weather W2
 608     WHERE W1.temp_lo &lt; W2.temp_lo
 609     AND W1.temp_hi &gt; W2.temp_hi;
 610
 611      city      | low | high |     city      | low | high
 612 ---------------+-----+------+---------------+-----+------
 613  San Francisco |  43 |   57 | San Francisco |  46 |   50
 614  Hayward       |  37 |   54 | San Francisco |  46 |   50
 615 (2 rows)
 616 </programlisting>
 617
 618     Here we have relabeled the weather table as <literal>W1</> and
 619     <literal>W2</> to be able to distinguish the left and right side
 620     of the join.  You can also use these kinds of aliases in other
 621     queries to save some typing, e.g.:
 622 <programlisting>
 623 SELECT *
 624     FROM weather w, cities c
 625     WHERE w.city = c.name;
 626 </programlisting>
 627     You will encounter this style of abbreviating quite frequently.
 628    </para>
 629   </sect1>
 630
 631
 632   <sect1 id="tutorial-agg">
 633    <title>Aggregate Functions</title>
 634
 635    <indexterm zone="tutorial-agg">
 636     <primary>aggregate function</primary>
 637    </indexterm>
 638
 639    <para>
 640     <indexterm><primary>average</primary></indexterm>
 641     <indexterm><primary>count</primary></indexterm>
 642     <indexterm><primary>max</primary></indexterm>
 643     <indexterm><primary>min</primary></indexterm>
 644     <indexterm><primary>sum</primary></indexterm>
 645
 646     Like  most  other relational database products,
 647     <productname>PostgreSQL</productname> supports
 648     aggregate functions.
 649     An aggregate function computes a single result from multiple input rows.
 650     For example, there are aggregates to compute the
 651     <function>count</function>, <function>sum</function>,
 652     <function>avg</function> (average), <function>max</function> (maximum) and
 653     <function>min</function> (minimum) over a set of rows.
 654    </para>
 655
 656    <para>
 657     As an example, we can find the highest low-temperature reading anywhere
 658     with
 659
 660 <programlisting>
 661 SELECT max(temp_lo) FROM weather;
 662 </programlisting>
 663
 664 <screen>
 665  max
 666 -----
 667   46
 668 (1 row)
 669 </screen>
 670    </para>
 671
 672    <para>
 673     <indexterm><primary>subquery</primary></indexterm>
 674
 675     If we wanted to know what city (or cities) that reading occurred in,
 676     we might try
 677
 678 <programlisting>
 679 SELECT city FROM weather WHERE temp_lo = max(temp_lo);     <lineannotation>WRONG</lineannotation>
 680 </programlisting>
 681
 682     but this will not work since the aggregate
 683     <function>max</function> cannot be used in the
 684     <literal>WHERE</literal> clause.  (This restriction exists because
 685     the <literal>WHERE</literal> clause determines the rows that will
 686     go into the aggregation stage; so it has to be evaluated before
 687     aggregate functions are computed.)
 688     However, as is often the case
 689     the query can be restated to accomplish the intended result, here
 690     by using a <firstterm>subquery</firstterm>:
 691
 692 <programlisting>
 693 SELECT city FROM weather
 694     WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
 695 </programlisting>
 696
 697 <screen>
 698      city
 699 ---------------
 700  San Francisco
 701 (1 row)
 702 </screen>
 703
 704     This is OK because the subquery is an independent computation
 705     that computes its own aggregate separately from what is happening
 706     in the outer query.
 707    </para>
 708
 709    <para>
 710     <indexterm><primary>GROUP BY</primary></indexterm>
 711     <indexterm><primary>HAVING</primary></indexterm>
 712
 713     Aggregates are also very useful in combination with <literal>GROUP
 714     BY</literal> clauses.  For example, we can get the maximum low
 715     temperature observed in each city with
 716
 717 <programlisting>
 718 SELECT city, max(temp_lo)
 719     FROM weather
 720     GROUP BY city;
 721 </programlisting>
 722
 723 <screen>
 724      city      | max
 725 ---------------+-----
 726  Hayward       |  37
 727  San Francisco |  46
 728 (2 rows)
 729 </screen>
 730
 731     which gives us one output row per city.  Each aggregate result is
 732     computed over the table rows matching that city.
 733     We can filter these grouped
 734     rows using <literal>HAVING</literal>:
 735
 736 <programlisting>
 737 SELECT city, max(temp_lo)
 738     FROM weather
 739     GROUP BY city
 740     HAVING max(temp_lo) &lt; 40;
 741 </programlisting>
 742
 743 <screen>
 744   city   | max
 745 ---------+-----
 746  Hayward |  37
 747 (1 row)
 748 </screen>
 749
 750     which gives us the same results for only the cities that have all
 751     <literal>temp_lo</> values below 40.  Finally, if we only care about
 752     cities whose
 753     names begin with <quote><literal>S</literal></quote>, we might do
 754
 755 <programlisting>
 756 SELECT city, max(temp_lo)
 757     FROM weather
 758     WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
 759     GROUP BY city
 760     HAVING max(temp_lo) &lt; 40;
 761 </programlisting>
 762    <calloutlist>
 763     <callout arearefs="co.tutorial-agg-like">
 764      <para>
 765       The <literal>LIKE</literal> operator does pattern matching and
 766       is explained in <xref linkend="functions-matching">.
 767      </para>
 768     </callout>
 769    </calloutlist>
 770    </para>
 771
 772    <para>
 773     It is important to understand the interaction between aggregates and
 774     <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
 775     The fundamental difference between <literal>WHERE</literal> and
 776     <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
 777     input rows before groups and aggregates are computed (thus, it controls
 778     which rows go into the aggregate computation), whereas
 779     <literal>HAVING</literal> selects group rows after groups and
 780     aggregates are computed.  Thus, the
 781     <literal>WHERE</literal> clause must not contain aggregate functions;
 782     it makes no sense to try to use an aggregate to determine which rows
 783     will be inputs to the aggregates.  On the other hand, the
 784     <literal>HAVING</literal> clause always contains aggregate functions.
 785     (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
 786     clause that doesn't use aggregates, but it's wasteful. The same condition
 787     could be used more efficiently at the <literal>WHERE</literal> stage.)
 788    </para>
 789
 790    <para>
 791     In the previous example, we can apply the city name restriction in
 792     <literal>WHERE</literal>, since it needs no aggregate.  This is
 793     more efficient than adding the restriction to <literal>HAVING</literal>,
 794     because we avoid doing the grouping and aggregate calculations
 795     for all rows that fail the <literal>WHERE</literal> check.
 796    </para>
 797   </sect1>
 798
 799
 800   <sect1 id="tutorial-update">
 801    <title>Updates</title>
 802
 803    <indexterm zone="tutorial-update">
 804     <primary>UPDATE</primary>
 805    </indexterm>
 806
 807    <para>
 808     You can update existing rows using the
 809     <command>UPDATE</command> command.
 810     Suppose you discover the temperature readings are
 811     all  off  by 2 degrees as of November 28.  You may update the
 812     data as follows:
 813
 814 <programlisting>
 815 UPDATE weather
 816     SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
 817     WHERE date &gt; '1994-11-28';
 818 </programlisting>
 819    </para>
 820
 821    <para>
 822     Look at the new state of the data:
 823 <programlisting>
 824 SELECT * FROM weather;
 825
 826      city      | temp_lo | temp_hi | prcp |    date
 827 ---------------+---------+---------+------+------------
 828  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 829  San Francisco |      41 |      55 |    0 | 1994-11-29
 830  Hayward       |      35 |      52 |      | 1994-11-29
 831 (3 rows)
 832 </programlisting>
 833    </para>
 834   </sect1>
 835
 836   <sect1 id="tutorial-delete">
 837    <title>Deletions</title>
 838
 839    <indexterm zone="tutorial-delete">
 840     <primary>DELETE</primary>
 841    </indexterm>
 842
 843    <para>
 844     Rows can be removed from a table using the <command>DELETE</command>
 845     command.
 846     Suppose you are no longer interested in the weather of Hayward.
 847     Then you can do the following to delete those rows from the table:
 848 <programlisting>
 849 DELETE FROM weather WHERE city = 'Hayward';
 850 </programlisting>
 851
 852     All weather records belonging to Hayward are removed.
 853
 854 <programlisting>
 855 SELECT * FROM weather;
 856 </programlisting>
 857
 858 <screen>
 859      city      | temp_lo | temp_hi | prcp |    date
 860 ---------------+---------+---------+------+------------
 861  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 862  San Francisco |      41 |      55 |    0 | 1994-11-29
 863 (2 rows)
 864 </screen>
 865    </para>
 866
 867    <para>
 868     One should be wary of statements of the form
 869 <synopsis>
 870 DELETE FROM <replaceable>tablename</replaceable>;
 871 </synopsis>
 872
 873     Without a qualification, <command>DELETE</command> will
 874     remove  <emphasis>all</>  rows from the given table, leaving it
 875     empty.  The system will not request confirmation before
 876     doing this!
 877    </para>
 878   </sect1>
 879
 880  </chapter>
 881
 882 <!-- Keep this comment at the end of the file
 883 Local variables:
 884 mode:sgml
 885 sgml-omittag:nil
 886 sgml-shorttag:t
 887 sgml-minimize-attributes:nil
 888 sgml-always-quote-attributes:t
 889 sgml-indent-step:1
 890 sgml-indent-data:t
 891 sgml-parent-document:nil
 892 sgml-default-dtd-file:"./reference.ced"
 893 sgml-exposed-tags:nil
 894 sgml-local-catalogs:("/usr/lib/sgml/catalog")
 895 sgml-local-ecat-files:nil
 896 End:
 897 -->