granicus.if.org Git - postgresql/blob - doc/src/sgml/sql.sgml

   1 <!--
   2 $Header: /cvsroot/pgsql/doc/src/sgml/sql.sgml,v 1.10 2000/06/14 13:10:48 thomas Exp $
   3 -->
   4
   5  <chapter id="sql">
   6   <title>SQL</title>
   7
   8   <abstract>
   9    <para>
  10     This chapter introduces the mathematical concepts behind
  11     relational databases. It is not required reading, so if you bog
  12     down or want to get straight to some simple examples feel free to
  13     jump ahead to the next chapter and come back when you have more
  14     time and patience. This stuff is supposed to be fun!
  15    </para>
  16
  17    <para>
  18     This material originally appeared as a part of
  19     Stefan Simkovics' Master's Thesis
  20     (<xref linkend="SIM98" endterm="SIM98">).
  21    </para>
  22   </abstract>
  23
  24   <para>
  25    <acronym>SQL</acronym> has become the most popular relational query
  26    language.
  27    The name "<acronym>SQL</acronym>" is an abbreviation for
  28    <firstterm>Structured Query Language</firstterm>.
  29    In 1974 Donald Chamberlin and others defined the
  30    language SEQUEL (<firstterm>Structured English Query
  31     Language</firstterm>) at IBM
  32    Research. This language was first implemented in an IBM
  33    prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version
  34    of SEQUEL called SEQUEL/2 was defined and the name was changed to
  35    <acronym>SQL</acronym>
  36    subsequently.
  37   </para>
  38
  39   <para>
  40    A new prototype called System R was developed by IBM in 1977. System R
  41    implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>)
  42    and a number of
  43    changes were made to <acronym>SQL</acronym> during the project.
  44    System R was installed in
  45    a number of user sites, both internal IBM sites and also some selected
  46    customer sites. Thanks to the success and acceptance of System R at
  47    those user sites IBM started to develop commercial products that
  48    implemented the <acronym>SQL</acronym> language based on the System
  49    R technology.
  50   </para>
  51
  52   <para>
  53    Over the next years IBM and also a number of other vendors announced
  54    <acronym>SQL</acronym> products such as
  55    <productname>SQL/DS</productname> (IBM),
  56    <productname>DB2</productname> (IBM),
  57    <productname>ORACLE</productname> (Oracle Corp.),
  58    <productname>DG/SQL</productname> (Data General Corp.),
  59    and <productname>SYBASE</productname> (Sybase Inc.).
  60   </para>
  61
  62   <para>
  63    <acronym>SQL</acronym> is also an official standard now. In 1982
  64    the American National
  65    Standards Institute (<acronym>ANSI</acronym>) chartered its
  66    Database Committee X3H2 to
  67    develop a proposal for a standard relational language. This proposal
  68    was ratified in 1986 and consisted essentially of the IBM dialect of
  69    <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym>
  70    standard was also accepted as an international
  71    standard by the International Organization for Standardization
  72    (<acronym>ISO</acronym>).
  73    This original standard version of <acronym>SQL</acronym> is often
  74    referred to,
  75    informally, as "<abbrev>SQL/86</abbrev>". In 1989 the original
  76    standard was extended
  77    and this new standard is often, again informally, referred to as
  78    "<abbrev>SQL/89</abbrev>". Also in 1989, a related standard called
  79    <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm>
  80    (<acronym>ESQL</acronym>) was developed.
  81   </para>
  82
  83   <para>
  84    The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees
  85    have been working for many years on the
  86    definition of a greatly expanded version of the original standard,
  87    referred to informally as <firstterm><acronym>SQL2</acronym></firstterm>
  88    or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a
  89    ratified standard - "International Standard ISO/IEC 9075:1992,
  90    Database Language <acronym>SQL</acronym>" - in late 1992.
  91    <acronym>SQL/92</acronym> is the version
  92    normally meant when people refer to "the <acronym>SQL</acronym>
  93    standard". A detailed
  94    description of <acronym>SQL/92</acronym> is given in
  95    <xref linkend="DATE97" endterm="DATE97">. At the time of
  96    writing this document a new standard informally referred to
  97    as <firstterm><acronym>SQL3</acronym></firstterm>
  98    is under development. It is planned to make <acronym>SQL</acronym>
  99    a Turing-complete
 100    language, i.e. all computable queries (e.g. recursive queries) will be
 101    possible. This is a very complex task and therefore the completion of
 102    the new standard can not be expected before 1999.
 103   </para>
 104
 105   <sect1 id="rel-model">
 106    <title>The Relational Data Model</title>
 107
 108   <para>
 109     As mentioned before, <acronym>SQL</acronym> is a relational
 110     language. That means it is
 111     based on the <firstterm>relational data model</firstterm>
 112     first published by E.F. Codd in
 113     1970. We will give a formal description of the relational model
 114     later (in
 115     <xref linkend="formal-notion" endterm="formal-notion">)
 116     but first we want to have a look at it from a more intuitive
 117     point of view.
 118   </para>
 119
 120   <para>
 121     A <firstterm>relational database</firstterm> is a database that is
 122     perceived by its
 123     users as a <firstterm>collection of tables</firstterm> (and
 124     nothing else but tables).
 125     A table consists of rows and columns where each row represents a
 126     record and each column represents an attribute of the records
 127     contained in the table.
 128     <xref linkend="supplier-fig" endterm="supplier-fig">
 129     shows an example of a database consisting of three tables:
 130
 131     <itemizedlist>
 132      <listitem>
 133       <para>
 134        SUPPLIER is a table storing the number
 135        (SNO), the name (SNAME) and the city (CITY) of a supplier.
 136       </para>
 137      </listitem>
 138
 139      <listitem>
 140       <para>
 141        PART is a table storing the number (PNO) the name (PNAME) and
 142        the price (PRICE) of a part.
 143       </para>
 144      </listitem>
 145
 146      <listitem>
 147       <para>
 148        SELLS stores information about which part (PNO) is sold by which
 149        supplier (SNO).
 150        It serves in a sense to connect the other two tables together.
 151       </para>
 152      </listitem>
 153     </itemizedlist>
 154
 155     <example>
 156      <title id="supplier-fig">The Suppliers and Parts Database</title>
 157      <programlisting>
 158 SUPPLIER:                   SELLS:
 159  SNO |  SNAME  |  CITY       SNO | PNO
 160 ----+---------+--------     -----+-----
 161  1  |  Smith  | London        1  |  1
 162  2  |  Jones  | Paris         1  |  2
 163  3  |  Adams  | Vienna        2  |  4
 164  4  |  Blake  | Rome          3  |  1
 165                               3  |  3
 166                               4  |  2
 167 PART:                         4  |  3
 168  PNO |  PNAME  |  PRICE       4  |  4
 169 ----+---------+---------
 170  1  |  Screw  |   10
 171  2  |  Nut    |    8
 172  3  |  Bolt   |   15
 173  4  |  Cam    |   25
 174      </programlisting>
 175     </example>
 176    </para>
 177
 178    <para>
 179     The tables PART and SUPPLIER may be regarded as
 180     <firstterm>entities</firstterm> and
 181     SELLS may be regarded as a <firstterm>relationship</firstterm>
 182     between a particular
 183     part and a particular supplier.
 184    </para>
 185
 186    <para>
 187     As we will see later, <acronym>SQL</acronym> operates on tables
 188     like the ones just
 189     defined but before that we will study the theory of the relational
 190     model.
 191    </para>
 192   </sect1>
 193
 194   <sect1>
 195    <title id="formal-notion">Relational Data Model Formalities</title>
 196
 197    <para>
 198     The mathematical concept underlying the relational model is the
 199     set-theoretic <firstterm>relation</firstterm> which is a subset of
 200     the Cartesian
 201     product of a list of domains. This set-theoretic relation gives
 202     the model its name (do not confuse it with the relationship from the
 203     <firstterm>Entity-Relationship model</firstterm>).
 204     Formally a domain is simply a set of
 205     values. For example the set of integers is a domain. Also the set of
 206     character strings of length 20 and the real numbers are examples of
 207     domains.
 208    </para>
 209
 210    <para>
 211 <!--
 212 \begin{definition}
 213 The <firstterm>Cartesian product</firstterm> of domains $D_{1},
 214     D_{2},\ldots, D_{k}$ written
 215 \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set of
 216 all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in
 217 D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.
 218 \end{definition}
 219 -->
 220     The <firstterm>Cartesian product</firstterm> of domains
 221     <parameter>D<subscript>1</subscript></parameter>,
 222     <parameter>D<subscript>2</subscript></parameter>,
 223     ...
 224     <parameter>D<subscript>k</subscript></parameter>,
 225     written
 226     <parameter>D<subscript>1</subscript></parameter> &times;
 227     <parameter>D<subscript>2</subscript></parameter> &times;
 228     ... &times;
 229     <parameter>D<subscript>k</subscript></parameter>
 230     is the set of all k-tuples
 231     <parameter>v<subscript>1</subscript></parameter>,
 232     <parameter>v<subscript>2</subscript></parameter>,
 233     ...
 234     <parameter>v<subscript>k</subscript></parameter>,
 235     such that
 236     <parameter>v<subscript>1</subscript></parameter> &isin;
 237     <parameter>D<subscript>1</subscript></parameter>,
 238     <parameter>v<subscript>1</subscript></parameter> &isin;
 239     <parameter>D<subscript>1</subscript></parameter>,
 240     ...
 241     <parameter>v<subscript>k</subscript></parameter> &isin;
 242     <parameter>D<subscript>k</subscript></parameter>.
 243    </para>
 244
 245    <para>
 246     For example, when we have
 247 <!--
 248  $k=2$, $D_{1}=\{0,1\}$ and
 249 $D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is
 250 $\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.
 251 -->
 252     <parameter>k</parameter>=2,
 253     <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and
 254     <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then
 255     <parameter>D<subscript>1</subscript></parameter> &times;
 256     <parameter>D<subscript>2</subscript></parameter> is
 257     <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>.
 258    </para>
 259
 260    <para>
 261 <!--
 262 \begin{definition}
 263 A Relation is any subset of the Cartesian product of one or more
 264 domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}
 265 \end{definition}
 266 -->
 267     A Relation is any subset of the Cartesian product of one or more
 268     domains: <parameter>R</parameter> &sube;
 269     <parameter>D<subscript>1</subscript></parameter> &times;
 270     <parameter>D<subscript>2</subscript></parameter> &times;
 271     ... &times;
 272     <parameter>D<subscript>k</subscript></parameter>.
 273    </para>
 274
 275    <para>
 276     For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation;
 277     it is in fact a subset of
 278     <parameter>D<subscript>1</subscript></parameter> &times;
 279     <parameter>D<subscript>2</subscript></parameter>
 280     mentioned above.
 281    </para>
 282
 283    <para>
 284     The members of a relation are called tuples. Each relation of some
 285     Cartesian product
 286     <parameter>D<subscript>1</subscript></parameter> &times;
 287     <parameter>D<subscript>2</subscript></parameter> &times;
 288     ... &times;
 289     <parameter>D<subscript>k</subscript></parameter>
 290     is said to have arity <literal>k</literal> and is therefore a set
 291     of <literal>k</literal>-tuples.
 292    </para>
 293
 294    <para>
 295     A relation can be viewed as a table (as we already did, remember
 296     <xref linkend="supplier-fig" endterm="supplier-fig"> where
 297     every tuple is represented by a row and every column corresponds to
 298     one component of a tuple. Giving names (called attributes) to the
 299     columns leads to the definition of a
 300     <firstterm>relation scheme</firstterm>.
 301    </para>
 302
 303    <para>
 304 <!--
 305 \begin{definition}
 306 A {\it relation scheme} $R$ is a finite set of attributes
 307 \mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for
 308 each attribute $A_{i}, 1 \le i \le k$ where the values of the
 309 attributes are taken from. We often write a relation scheme as
 310 \mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.
 311 \end{definition}
 312 -->
 313     A <firstterm>relation scheme</firstterm> <literal>R</literal> is a
 314     finite set of attributes
 315     <parameter>A<subscript>1</subscript></parameter>,
 316     <parameter>A<subscript>2</subscript></parameter>,
 317     ...
 318     <parameter>A<subscript>k</subscript></parameter>.
 319     There is a domain
 320     <parameter>D<subscript>i</subscript></parameter>,
 321     for each attribute
 322     <parameter>A<subscript>i</subscript></parameter>,
 323     1 &lt;= <literal>i</literal> &lt;= <literal>k</literal>,
 324     where the values of the attributes are taken from. We often write
 325     a relation scheme as
 326     <literal>R(<parameter>A<subscript>1</subscript></parameter>,
 327     <parameter>A<subscript>2</subscript></parameter>,
 328     ...
 329     <parameter>A<subscript>k</subscript></parameter>)</literal>.
 330
 331     <note>
 332      <para>
 333       A <firstterm>relation scheme</firstterm> is just a kind of template
 334       whereas a <firstterm>relation</firstterm> is an instance of a
 335       <firstterm>relation
 336        scheme</firstterm>. The relation consists of tuples (and can
 337       therefore be
 338       viewed as a table); not so the relation scheme.
 339      </para>
 340     </note>
 341    </para>
 342
 343    <sect2>
 344     <title id="domains">Domains vs. Data Types</title>
 345
 346     <para>
 347      We often talked about <firstterm>domains</firstterm>
 348      in the last section. Recall that a
 349      domain is, formally, just a set of values (e.g., the set of integers or
 350      the real numbers). In terms of database systems we often talk of
 351      <firstterm>data types</firstterm> instead of domains.
 352      When we define a table we have to make
 353      a decision about which attributes to include. Additionally we
 354      have to decide which kind of data is going to be stored as
 355      attribute values. For example the values of
 356      <classname>SNAME</classname> from the table
 357      <classname>SUPPLIER</classname> will be character strings,
 358      whereas <classname>SNO</classname> will store
 359      integers. We define this by assigning a data type to each
 360      attribute. The type of <classname>SNAME</classname> will be
 361      <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type
 362      for character strings of length &lt;= 20),
 363      the type of <classname>SNO</classname> will be
 364      <type>INTEGER</type>. With the assignment of a data type we also
 365      have selected
 366      a domain for an attribute. The domain of
 367      <classname>SNAME</classname> is the set of all
 368      character strings of length &lt;= 20,
 369      the domain of <classname>SNO</classname> is the set of
 370      all integer numbers.
 371     </para>
 372    </sect2>
 373   </sect1>
 374
 375   <sect1>
 376    <title id="operations">Operations in the Relational Data Model</title>
 377
 378    <para>
 379     In the previous section
 380     (<xref linkend="formal-notion" endterm="formal-notion">)
 381     we defined the mathematical notion of
 382     the relational model. Now we know how the data can be stored using a
 383     relational data model but we do not know what to do with all these
 384     tables to retrieve something from the database yet. For example somebody
 385     could ask for the names of all suppliers that sell the part
 386     'Screw'. Therefore two rather different kinds of notations for
 387     expressing operations on relations have been defined:
 388
 389     <itemizedlist>
 390      <listitem>
 391       <para>
 392        The <firstterm>Relational Algebra</firstterm> which is an
 393        algebraic notation,
 394        where queries are expressed by applying specialized operators to the
 395        relations.
 396       </para>
 397      </listitem>
 398
 399      <listitem>
 400       <para>
 401        The <firstterm>Relational Calculus</firstterm> which is a
 402        logical notation,
 403        where queries are expressed by formulating some logical restrictions
 404        that the tuples in the answer must satisfy.
 405       </para>
 406     </listitem>
 407     </itemizedlist>
 408    </para>
 409
 410    <sect2>
 411     <title id="rel-alg">Relational Algebra</title>
 412
 413     <para>
 414      The <firstterm>Relational Algebra</firstterm> was introduced by
 415      E. F. Codd in 1972. It consists of a set of operations on relations:
 416
 417      <itemizedlist>
 418       <listitem>
 419        <para>
 420         SELECT (&sigma;): extracts <firstterm>tuples</firstterm> from
 421         a relation that
 422         satisfy a given restriction. Let <parameter>R</parameter> be a
 423         table that contains an attribute
 424         <parameter>A</parameter>.
 425 &sigma;<subscript>A=a</subscript>(R) = {t &isin; R &mid; t(A) = a}
 426         where <literal>t</literal> denotes a
 427         tuple of <parameter>R</parameter> and <literal>t(A)</literal>
 428         denotes the value of attribute <parameter>A</parameter> of
 429         tuple <literal>t</literal>.
 430        </para>
 431       </listitem>
 432
 433       <listitem>
 434        <para>
 435         PROJECT (&pi;): extracts specified
 436         <firstterm>attributes</firstterm> (columns) from a
 437         relation. Let <classname>R</classname> be a relation
 438         that contains an attribute <classname>X</classname>.
 439         &pi;<subscript>X</subscript>(<classname>R</classname>) = {t(X) &mid; t &isin; <classname>R</classname>},
 440         where <literal>t</literal>(<classname>X</classname>) denotes the value of
 441         attribute <classname>X</classname> of tuple <literal>t</literal>.
 442        </para>
 443       </listitem>
 444
 445       <listitem>
 446        <para>
 447         PRODUCT (&times;): builds the Cartesian product of two
 448         relations. Let <classname>R</classname> be a table with arity
 449         <literal>k</literal><subscript>1</subscript> and let
 450         <classname>S</classname> be a table with
 451         arity <literal>k</literal><subscript>2</subscript>.
 452         <classname>R</classname> &times; <classname>S</classname>
 453         is the set of all
 454         <literal>k</literal><subscript>1</subscript>
 455         + <literal>k</literal><subscript>2</subscript>-tuples
 456         whose first <literal>k</literal><subscript>1</subscript>
 457         components form a tuple in <classname>R</classname> and whose last
 458         <literal>k</literal><subscript>2</subscript> components form a
 459         tuple in <classname>S</classname>.
 460        </para>
 461       </listitem>
 462
 463       <listitem>
 464        <para>
 465         UNION (&cup;): builds the set-theoretic union of two
 466         tables. Given the tables <classname>R</classname> and
 467         <classname>S</classname> (both must have the same arity),
 468         the union <classname>R</classname> &cup; <classname>S</classname>
 469         is the set of tuples that are in <classname>R</classname>
 470         or <classname>S</classname> or both.
 471        </para>
 472       </listitem>
 473
 474       <listitem>
 475        <para>
 476         INTERSECT (&cap;): builds the set-theoretic intersection of two
 477         tables. Given the tables <classname>R</classname> and
 478         <classname>S</classname>,
 479         <classname>R</classname> &cap; <classname>S</classname> is the
 480         set of tuples
 481         that are in <classname>R</classname> and in
 482         <classname>S</classname>.
 483         We again require that <classname>R</classname> and
 484         <classname>S</classname> have the
 485         same arity.
 486        </para>
 487       </listitem>
 488
 489       <listitem>
 490        <para>
 491         DIFFERENCE (&minus; or &setmn;): builds the set difference of
 492         two tables. Let <classname>R</classname> and <classname>S</classname>
 493         again be two tables with the same
 494         arity. <classname>R</classname> - <classname>S</classname>
 495         is the set of tuples in <classname>R</classname> but not in
 496         <classname>S</classname>.
 497        </para>
 498       </listitem>
 499
 500       <listitem>
 501        <para>
 502         JOIN (&prod;): connects two tables by their common
 503         attributes. Let <classname>R</classname> be a table with the
 504         attributes <classname>A</classname>,<classname>B</classname>
 505         and <classname>C</classname> and
 506         let <classname>S</classname> be a table with the attributes
 507         <classname>C</classname>,<classname>D</classname>
 508         and <classname>E</classname>. There is one
 509         attribute common to both relations,
 510         the attribute <classname>C</classname>.
 511 <!--
 512         <classname>R</classname> &prod; <classname>S</classname> =
 513         &pi;<subscript><classname>R</classname>.<classname>A</classname>,<classname>R</classname>.<classname>B</classname>,<classname>R</classname>.<classname>C</classname>,<classname>S</classname>.<classname>D</classname>,<classname>S</classname>.<classname>E</classname></subscript>(&sigma;<subscript><classname>R</classname>.<classname>C</classname>=<classname>S</classname>.<classname>C</classname></subscript>(<classname>R</classname> &times; <classname>S</classname>)).
 514 -->
 515         R &prod; S = &pi;<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(&sigma;<subscript>R.C=S.C</subscript>(R &times; S)).
 516         What are we doing here? We first calculate the Cartesian
 517         product
 518         <classname>R</classname> &times; <classname>S</classname>.
 519         Then we select those tuples whose values for the common
 520         attribute <classname>C</classname> are equal
 521         (&sigma;<subscript>R.C = S.C</subscript>).
 522         Now we have a table
 523         that contains the attribute <classname>C</classname>
 524         two times and we correct this by
 525         projecting out the duplicate column.
 526        </para>
 527
 528        <example>
 529         <title id="join-example">An Inner Join</title>
 530
 531         <para>
 532          Let's have a look at the tables that are produced by evaluating the steps
 533          necessary for a join.
 534          Let the following two tables be given:
 535
 536          <programlisting>
 537 R:                 S:
 538  A | B | C          C | D | E
 539 ---+---+---        ---+---+---
 540  1 | 2 | 3          3 | a | b
 541  4 | 5 | 6          6 | c | d
 542  7 | 8 | 9
 543          </programlisting>
 544         </para>
 545        </example>
 546
 547        <para>
 548         First we calculate the Cartesian product
 549         <classname>R</classname> &times; <classname>S</classname> and
 550         get:
 551
 552         <programlisting>
 553 R x S:
 554  A | B | R.C | S.C | D | E
 555 ---+---+-----+-----+---+---
 556  1 | 2 |  3  |  3  | a | b
 557  1 | 2 |  3  |  6  | c | d
 558  4 | 5 |  6  |  3  | a | b
 559  4 | 5 |  6  |  6  | c | d
 560  7 | 8 |  9  |  3  | a | b
 561  7 | 8 |  9  |  6  | c | d
 562         </programlisting>
 563        </para>
 564
 565        <para>
 566         After the selection
 567         &sigma;<subscript>R.C=S.C</subscript>(R &times; S)
 568         we get:
 569
 570         <programlisting>
 571  A | B | R.C | S.C | D | E
 572 ---+---+-----+-----+---+---
 573  1 | 2 |  3  |  3  | a | b
 574  4 | 5 |  6  |  6  | c | d
 575         </programlisting>
 576        </para>
 577
 578        <para>
 579         To remove the duplicate column
 580         <classname>S</classname>.<classname>C</classname>
 581         we project it out by the following operation:
 582         &pi;<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(&sigma;<subscript>R.C=S.C</subscript>(R &times; S))
 583         and get:
 584
 585         <programlisting>
 586  A | B | C | D | E
 587 ---+---+---+---+---
 588  1 | 2 | 3 | a | b
 589  4 | 5 | 6 | c | d
 590         </programlisting>
 591        </para>
 592       </listitem>
 593
 594       <listitem>
 595        <para>
 596         DIVIDE (&divide;): Let <classname>R</classname> be a table
 597         with the attributes A, B, C, and D and let
 598         <classname>S</classname> be a table with the attributes
 599         C and D.
 600         Then we define the division as:
 601
 602         <programlisting>
 603 R &divide; S = {t &mid; &forall; t<subscript>s</subscript> &isin; S &exist; t<subscript>r</subscript> &isin; R
 604         </programlisting>
 605
 606         such that
 607 t<subscript>r</subscript>(A,B)=t&and;t<subscript>r</subscript>(C,D)=t<subscript>s</subscript>}
 608         where
 609         t<subscript>r</subscript>(x,y)
 610         denotes a
 611         tuple of table <classname>R</classname> that consists only of
 612         the components <literal>x</literal> and <literal>y</literal>.
 613         Note that the tuple <literal>t</literal> only consists of the
 614         components <classname>A</classname> and
 615         <classname>B</classname> of relation <classname>R</classname>.
 616        </para>
 617
 618        <para id="divide-example">
 619         Given the following tables
 620
 621         <programlisting>
 622 R:                    S:
 623  A | B | C | D         C | D
 624 ---+---+---+---       ---+---
 625  a | b | c | d         c | d
 626  a | b | e | f         e | f
 627  b | c | e | f
 628  e | d | c | d
 629  e | d | e | f
 630  a | b | d | e
 631         </programlisting>
 632
 633         R &divide; S
 634         is derived as
 635
 636         <programlisting>
 637  A | B
 638 ---+---
 639  a | b
 640  e | d
 641         </programlisting>
 642        </para>
 643       </listitem>
 644      </itemizedlist>
 645     </para>
 646
 647     <para>
 648      For a more detailed description and definition of the relational
 649      algebra refer to [<xref linkend="ULL88" endterm="ULL88">] or
 650      [<xref linkend="DATE94" endterm="DATE94">].
 651     </para>
 652
 653     <example>
 654      <title id="suppl-rel-alg">A Query Using Relational Algebra</title>
 655      <para>
 656       Recall that we formulated all those relational operators to be able to
 657       retrieve data from the database. Let's return to our example from
 658       the previous
 659       section (<xref linkend="operations" endterm="operations">)
 660       where someone wanted to know the names of all
 661       suppliers that sell the part <literal>Screw</literal>.
 662       This question can be answered
 663       using relational algebra by the following operation:
 664
 665       <programlisting>
 666 &pi;<subscript>SUPPLIER.SNAME</subscript>(&sigma;<subscript>PART.PNAME='Screw'</subscript>(SUPPLIER &prod; SELLS &prod; PART))
 667       </programlisting>
 668      </para>
 669
 670      <para>
 671       We call such an operation a query. If we evaluate the above query
 672       against the our example tables
 673       (<xref linkend="supplier-fig" endterm="supplier-fig">)
 674       we will obtain the following result:
 675
 676       <programlisting>
 677  SNAME
 678 -------
 679  Smith
 680  Adams
 681       </programlisting>
 682      </para>
 683     </example>
 684    </sect2>
 685
 686    <sect2 id="rel-calc">
 687     <title>Relational Calculus</title>
 688
 689     <para>
 690      The relational calculus is based on the
 691      <firstterm>first order logic</firstterm>. There are
 692      two variants of the relational calculus:
 693
 694      <itemizedlist>
 695       <listitem>
 696        <para>
 697         The <firstterm>Domain Relational Calculus</firstterm>
 698         (<acronym>DRC</acronym>), where variables
 699         stand for components (attributes) of the tuples.
 700        </para>
 701       </listitem>
 702
 703       <listitem>
 704        <para>
 705         The <firstterm>Tuple Relational Calculus</firstterm>
 706         (<acronym>TRC</acronym>), where variables stand for tuples.
 707        </para>
 708       </listitem>
 709      </itemizedlist>
 710     </para>
 711
 712     <para>
 713      We want to discuss the tuple relational calculus only because it is
 714      the one underlying the most relational languages. For a detailed
 715      discussion on <acronym>DRC</acronym> (and also
 716      <acronym>TRC</acronym>) see
 717      <xref linkend="DATE94" endterm="DATE94">
 718      or
 719      <xref linkend="ULL88" endterm="ULL88">.
 720     </para>
 721    </sect2>
 722
 723    <sect2>
 724     <title>Tuple Relational Calculus</title>
 725
 726     <para>
 727      The queries used in <acronym>TRC</acronym> are of the following
 728      form:
 729
 730      <programlisting>
 731 x(A) &mid; F(x)
 732      </programlisting>
 733
 734      where <literal>x</literal> is a tuple variable
 735      <classname>A</classname> is a set of attributes and <literal>F</literal> is a
 736      formula. The resulting relation consists of all tuples
 737      <literal>t(A)</literal> that satisfy <literal>F(t)</literal>.
 738     </para>
 739
 740     <para>
 741      If we want to answer the question from example
 742      <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">
 743      using <acronym>TRC</acronym> we formulate the following query:
 744
 745      <programlisting>
 746 {x(SNAME) &mid; x &isin; SUPPLIER &and;
 747     &exist; y &isin; SELLS &exist; z &isin; PART (y(SNO)=x(SNO) &and;
 748     z(PNO)=y(PNO) &and;
 749     z(PNAME)='Screw')}
 750      </programlisting>
 751     </para>
 752
 753     <para>
 754      Evaluating the query against the tables from
 755      <xref linkend="supplier-fig" endterm="supplier-fig">
 756      again leads to the same result
 757      as in
 758      <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">.
 759     </para>
 760    </sect2>
 761
 762    <sect2 id="alg-vs-calc">
 763     <title>Relational Algebra vs. Relational Calculus</title>
 764
 765     <para>
 766      The relational algebra and the relational calculus have the same
 767      <firstterm>expressive power</firstterm>; i.e. all queries that
 768      can be formulated using relational algebra can also be formulated
 769      using the relational calculus and vice versa.
 770      This was first proved by E. F. Codd in
 771      1972. This proof is based on an algorithm ("Codd's reduction
 772      algorithm") by which an arbitrary expression of the relational
 773      calculus can be reduced to a semantically equivalent expression of
 774      relational algebra. For a more detailed discussion on that refer to
 775      <xref linkend="DATE94" endterm="DATE94">
 776      and
 777      <xref linkend="ULL88" endterm="ULL88">.
 778     </para>
 779
 780     <para>
 781      It is sometimes said that languages based on the relational calculus
 782      are "higher level" or "more declarative" than languages based on
 783      relational algebra because the algebra (partially) specifies the order
 784      of operations while the calculus leaves it to a compiler or
 785      interpreter to determine the most efficient order of evaluation.
 786     </para>
 787    </sect2>
 788   </sect1>
 789
 790   <sect1 id="sql-language">
 791    <title>The <acronym>SQL</acronym> Language</title>
 792
 793    <para>
 794     As is the case with most modern relational languages,
 795     <acronym>SQL</acronym> is based on the tuple
 796     relational calculus. As a result every query that can be formulated
 797     using the tuple relational calculus (or equivalently, relational
 798     algebra) can also be formulated using
 799     <acronym>SQL</acronym>. There are, however,
 800     capabilities beyond the scope of relational algebra or calculus. Here
 801     is a list of some additional features provided by
 802     <acronym>SQL</acronym> that are not
 803     part of relational algebra or calculus:
 804
 805     <itemizedlist>
 806      <listitem>
 807       <para>
 808        Commands for insertion, deletion or modification of data.
 809       </para>
 810      </listitem>
 811
 812      <listitem>
 813       <para>
 814        Arithmetic capability: In <acronym>SQL</acronym> it is possible
 815        to involve
 816        arithmetic operations as well as comparisons, e.g.
 817
 818        <programlisting>
 819 A &lt; B + 3.
 820        </programlisting>
 821
 822        Note
 823        that + or other arithmetic operators appear neither in relational
 824        algebra nor in relational calculus.
 825       </para>
 826      </listitem>
 827
 828      <listitem>
 829       <para>
 830        Assignment and Print Commands: It is possible to print a
 831        relation constructed by a query and to assign a computed relation to a
 832        relation name.
 833       </para>
 834      </listitem>
 835
 836      <listitem>
 837       <para>
 838        Aggregate Functions: Operations such as
 839        <firstterm>average</firstterm>, <firstterm>sum</firstterm>,
 840        <firstterm>max</firstterm>, etc. can be applied to columns of a
 841        relation to
 842        obtain a single quantity.
 843       </para>
 844      </listitem>
 845     </itemizedlist>
 846    </para>
 847
 848    <sect2 id="select">
 849     <title id="select-title">Select</title>
 850
 851     <para>
 852      The most often used command in <acronym>SQL</acronym> is the
 853      SELECT statement,
 854      used to retrieve data. The syntax is:
 855
 856      <synopsis>
 857 SELECT [ALL|DISTINCT]
 858     { * | <replaceable class="parameter">expr_1</replaceable> [AS <replaceable class="parameter">c_alias_1</replaceable>] [, ...
 859      [, <replaceable class="parameter">expr_k</replaceable> [AS <replaceable class="parameter">c_alias_k</replaceable>]]]}
 860     FROM <replaceable class="parameter">table_name_1</replaceable> [<replaceable class="parameter">t_alias_1</replaceable>]
 861      [, ... [, <replaceable class="parameter">table_name_n</replaceable> [<replaceable class="parameter">t_alias_n</replaceable>]]]
 862     [WHERE <replaceable class="parameter">condition</replaceable>]
 863     [GROUP BY <replaceable class="parameter">name_of_attr_i</replaceable>
 864      [,... [, <replaceable class="parameter">name_of_attr_j</replaceable>]] [HAVING <replaceable class="parameter">condition</replaceable>]]
 865     [{UNION [ALL] | INTERSECT | EXCEPT} SELECT ...]
 866     [ORDER BY <replaceable class="parameter">name_of_attr_i</replaceable> [ASC|DESC]
 867      [, ... [, <replaceable class="parameter">name_of_attr_j</replaceable> [ASC|DESC]]]];
 868      </synopsis>
 869     </para>
 870
 871     <para>
 872      Now we will illustrate the complex syntax of the SELECT statement
 873      with various examples. The tables used for the examples are defined in
 874      <xref linkend="supplier-fig" endterm="supplier-fig">.
 875     </para>
 876
 877     <sect3>
 878      <title>Simple Selects</title>
 879
 880      <para>
 881       Here are some simple examples using a SELECT statement:
 882
 883       <example>
 884        <title id="simple-query">Simple Query with Qualification</title>
 885        <para>
 886         To retrieve all tuples from table PART where the attribute PRICE is
 887         greater than 10 we formulate the following query:
 888
 889         <programlisting>
 890 SELECT * FROM PART
 891     WHERE PRICE > 10;
 892         </programlisting>
 893
 894         and get the table:
 895
 896         <programlisting>
 897  PNO |  PNAME  |  PRICE
 898 -----+---------+--------
 899   3  |  Bolt   |   15
 900   4  |  Cam    |   25
 901         </programlisting>
 902        </para>
 903
 904        <para>
 905         Using "*" in the SELECT statement will deliver all attributes from
 906         the table. If we want to retrieve only the attributes PNAME and PRICE
 907         from table PART we use the statement:
 908
 909         <programlisting>
 910 SELECT PNAME, PRICE
 911     FROM PART
 912     WHERE PRICE > 10;
 913         </programlisting>
 914
 915         In this case the result is:
 916
 917         <programlisting>
 918                       PNAME  |  PRICE
 919                      --------+--------
 920                       Bolt   |   15
 921                       Cam    |   25
 922         </programlisting>
 923
 924         Note that the <acronym>SQL</acronym> SELECT corresponds to the
 925         "projection" in relational algebra not to the "selection"
 926         (see <xref linkend="rel-alg" endterm="rel-alg"> for more details).
 927        </para>
 928
 929        <para>
 930         The qualifications in the WHERE clause can also be logically connected
 931         using the keywords OR, AND, and NOT:
 932
 933         <programlisting>
 934 SELECT PNAME, PRICE
 935     FROM PART
 936     WHERE PNAME = 'Bolt' AND
 937          (PRICE = 0 OR PRICE < 15);
 938         </programlisting>
 939
 940         will lead to the result:
 941
 942         <programlisting>
 943  PNAME  |  PRICE
 944 --------+--------
 945  Bolt   |   15
 946         </programlisting>
 947        </para>
 948
 949        <para>
 950         Arithmetic operations may be used in the target list and in the WHERE
 951         clause. For example if we want to know how much it would cost if we
 952         take two pieces of a part we could use the following query:
 953
 954         <programlisting>
 955 SELECT PNAME, PRICE * 2 AS DOUBLE
 956     FROM PART
 957     WHERE PRICE * 2 < 50;
 958         </programlisting>
 959
 960         and we get:
 961
 962         <programlisting>
 963  PNAME  |  DOUBLE
 964 --------+---------
 965  Screw  |    20
 966  Nut    |    16
 967  Bolt   |    30
 968         </programlisting>
 969
 970         Note that the word DOUBLE after the keyword AS is the new title of the
 971         second column. This technique can be used for every element of the
 972         target list to assign a new title to the resulting
 973         column. This new title
 974         is often referred to as alias. The alias cannot be used throughout the
 975         rest of the query.
 976        </para>
 977       </example>
 978      </para>
 979     </sect3>
 980
 981     <sect3>
 982      <title>Joins</title>
 983
 984      <para id="simple-join">
 985       The following example shows how <firstterm>joins</firstterm> are
 986       realized in <acronym>SQL</acronym>.
 987      </para>
 988
 989      <para>
 990       To join the three tables SUPPLIER, PART and SELLS over their common
 991       attributes we formulate the following statement:
 992
 993       <programlisting>
 994 SELECT S.SNAME, P.PNAME
 995     FROM SUPPLIER S, PART P, SELLS SE
 996     WHERE S.SNO = SE.SNO AND
 997           P.PNO = SE.PNO;
 998       </programlisting>
 999
1000       and get the following table as a result:
1001
1002       <programlisting>
1003  SNAME | PNAME
1004 -------+-------
1005  Smith | Screw
1006  Smith | Nut
1007  Jones | Cam
1008  Adams | Screw
1009  Adams | Bolt
1010  Blake | Nut
1011  Blake | Bolt
1012  Blake | Cam
1013       </programlisting>
1014      </para>
1015
1016      <para>
1017       In the FROM clause we introduced an alias name for every relation
1018       because there are common named attributes (SNO and PNO) among the
1019       relations. Now we can distinguish between the common named attributes
1020       by simply prefixing the attribute name with the alias name followed by
1021       a dot. The join is calculated in the same way as shown in
1022       <xref linkend="join-example" endterm="join-example">.
1023       First the Cartesian product
1024
1025       SUPPLIER &times; PART &times; SELLS
1026
1027       is derived. Now only those tuples satisfying the
1028       conditions given in the WHERE clause are selected (i.e. the common
1029       named attributes have to be equal). Finally we project out all
1030       columns but S.SNAME and P.PNAME.
1031      </para>
1032     </sect3>
1033
1034     <sect3>
1035      <title>Aggregate Operators</title>
1036
1037      <para>
1038       <acronym>SQL</acronym> provides aggregate operators
1039       (e.g. AVG, COUNT, SUM, MIN, MAX) that
1040       take the name of an attribute as an argument. The value of the
1041       aggregate operator is calculated over all values of the specified
1042       attribute (column) of the whole table. If groups are specified in the
1043       query the calculation is done only over the values of a group (see next
1044       section).
1045
1046       <example>
1047        <title id="aggregates-example">Aggregates</title>
1048
1049        <para>
1050         If we want to know the average cost of all parts in table PART we use
1051         the following query:
1052
1053         <programlisting>
1054 SELECT AVG(PRICE) AS AVG_PRICE
1055     FROM PART;
1056         </programlisting>
1057        </para>
1058
1059        <para>
1060         The result is:
1061
1062         <programlisting>
1063  AVG_PRICE
1064 -----------
1065    14.5
1066         </programlisting>
1067        </para>
1068
1069        <para>
1070         If we want to know how many parts are stored in table PART we use
1071         the statement:
1072
1073         <programlisting>
1074 SELECT COUNT(PNO)
1075     FROM PART;
1076         </programlisting>
1077
1078         and get:
1079
1080         <programlisting>
1081  COUNT
1082 -------
1083    4
1084         </programlisting>
1085
1086        </para>
1087       </example>
1088      </para>
1089     </sect3>
1090
1091     <sect3>
1092      <title>Aggregation by Groups</title>
1093
1094      <para>
1095       <acronym>SQL</acronym> allows one to partition the tuples of a table
1096       into groups. Then the
1097       aggregate operators described above can be applied to the groups
1098       (i.e. the value of the aggregate operator is no longer calculated over
1099       all the values of the specified column but over all values of a
1100       group. Thus the aggregate operator is evaluated individually for every
1101       group.)
1102      </para>
1103
1104      <para>
1105       The partitioning of the tuples into groups is done by using the
1106       keywords <command>GROUP BY</command> followed by a list of
1107       attributes that define the
1108       groups. If we have
1109       <command>GROUP BY A<subscript>1</subscript>, &tdot;, A<subscript>k</subscript></command>
1110       we partition
1111       the relation into groups, such that two tuples are in the same group
1112       if and only if they agree on all the attributes
1113       A<subscript>1</subscript>, &tdot;, A<subscript>k</subscript>.
1114
1115       <example>
1116        <title id="aggregates-groupby">Aggregates</title>
1117        <para>
1118         If we want to know how many parts are sold by every supplier we
1119         formulate the query:
1120
1121         <programlisting>
1122 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1123     FROM SUPPLIER S, SELLS SE
1124     WHERE S.SNO = SE.SNO
1125     GROUP BY S.SNO, S.SNAME;
1126         </programlisting>
1127
1128         and get:
1129
1130         <programlisting>
1131  SNO | SNAME | COUNT
1132 -----+-------+-------
1133   1  | Smith |   2
1134   2  | Jones |   1
1135   3  | Adams |   2
1136   4  | Blake |   3
1137         </programlisting>
1138        </para>
1139
1140        <para>
1141         Now let's have a look of what is happening here.
1142         First the join of the
1143         tables SUPPLIER and SELLS is derived:
1144
1145         <programlisting>
1146  S.SNO | S.SNAME | SE.PNO
1147 -------+---------+--------
1148    1   |  Smith  |   1
1149    1   |  Smith  |   2
1150    2   |  Jones  |   4
1151    3   |  Adams  |   1
1152    3   |  Adams  |   3
1153    4   |  Blake  |   2
1154    4   |  Blake  |   3
1155    4   |  Blake  |   4
1156         </programlisting>
1157        </para>
1158
1159        <para>
1160         Next we partition the tuples into groups by putting all tuples
1161         together that agree on both attributes S.SNO and S.SNAME:
1162
1163         <programlisting>
1164  S.SNO | S.SNAME | SE.PNO
1165 -------+---------+--------
1166    1   |  Smith  |   1
1167                  |   2
1168 --------------------------
1169    2   |  Jones  |   4
1170 --------------------------
1171    3   |  Adams  |   1
1172                  |   3
1173 --------------------------
1174    4   |  Blake  |   2
1175                  |   3
1176                  |   4
1177         </programlisting>
1178        </para>
1179
1180        <para>
1181         In our example we got four groups and now we can apply the aggregate
1182         operator COUNT to every group leading to the total result of the query
1183         given above.
1184        </para>
1185       </example>
1186      </para>
1187
1188      <para>
1189       Note that for the result of a query using GROUP BY and aggregate
1190       operators to make sense the attributes grouped by must also appear in
1191       the target list. All further attributes not appearing in the GROUP
1192       BY clause can only be selected by using an aggregate function. On
1193       the other hand you can not use aggregate functions on attributes
1194       appearing in the GROUP BY clause.
1195      </para>
1196     </sect3>
1197
1198     <sect3>
1199      <title>Having</title>
1200
1201      <para>
1202       The HAVING clause works much like the WHERE clause and is used to
1203       consider only those groups satisfying the qualification given in the
1204       HAVING clause. The expressions allowed in the HAVING clause must
1205       involve aggregate functions. Every expression using only  plain
1206       attributes belongs to the WHERE clause. On the other hand every
1207       expression involving an aggregate function must be put to the HAVING
1208       clause.
1209
1210       <example>
1211        <title id="having-example">Having</title>
1212
1213        <para>
1214         If we want only those suppliers selling more than one part we use the
1215         query:
1216
1217         <programlisting>
1218 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1219     FROM SUPPLIER S, SELLS SE
1220     WHERE S.SNO = SE.SNO
1221     GROUP BY S.SNO, S.SNAME
1222     HAVING COUNT(SE.PNO) > 1;
1223         </programlisting>
1224
1225         and get:
1226
1227         <programlisting>
1228  SNO | SNAME | COUNT
1229 -----+-------+-------
1230   1  | Smith |   2
1231   3  | Adams |   2
1232   4  | Blake |   3
1233         </programlisting>
1234        </para>
1235       </example>
1236      </para>
1237     </sect3>
1238
1239     <sect3>
1240      <title>Subqueries</title>
1241
1242      <para>
1243       In the WHERE and HAVING clauses the use of subqueries (subselects) is
1244       allowed in every place where a value is expected. In this case the
1245       value must be derived by evaluating the subquery first. The usage of
1246       subqueries extends the expressive power of
1247       <acronym>SQL</acronym>.
1248
1249       <example>
1250        <title id="subselect-example">Subselect</title>
1251
1252        <para>
1253         If we want to know all parts having a greater price than the part
1254         named 'Screw' we use the query:
1255
1256         <programlisting>
1257 SELECT *
1258     FROM PART
1259     WHERE PRICE > (SELECT PRICE FROM PART
1260                    WHERE PNAME='Screw');
1261         </programlisting>
1262        </para>
1263
1264        <para>
1265         The result is:
1266
1267         <programlisting>
1268  PNO |  PNAME  |  PRICE
1269 -----+---------+--------
1270   3  |  Bolt   |   15
1271   4  |  Cam    |   25
1272         </programlisting>
1273        </para>
1274
1275        <para>
1276         When we look at the above query we can see
1277         the keyword SELECT two times. The first one at the beginning of the
1278         query - we will refer to it as outer SELECT - and the one in the WHERE
1279         clause which begins a nested query - we will refer to it as inner
1280         SELECT. For every tuple of the outer SELECT the inner SELECT has to be
1281         evaluated. After every evaluation we know the price of the tuple named
1282         'Screw' and we can check if the price of the actual tuple is
1283         greater.
1284        </para>
1285
1286        <para>
1287         If we want to know all suppliers that do not sell any part
1288         (e.g. to be able to remove these suppliers from the database) we use:
1289
1290         <programlisting>
1291 SELECT *
1292     FROM SUPPLIER S
1293     WHERE NOT EXISTS
1294         (SELECT * FROM SELLS SE
1295          WHERE SE.SNO = S.SNO);
1296         </programlisting>
1297        </para>
1298
1299        <para>
1300         In our example the result will be empty because every supplier sells
1301         at least one part. Note that we use S.SNO from the outer SELECT within
1302         the WHERE clause of the inner SELECT. As described above the subquery
1303         is evaluated for every tuple from the outer query i.e. the value for
1304         S.SNO is always taken from the actual tuple of the outer SELECT.
1305        </para>
1306       </example>
1307      </para>
1308     </sect3>
1309
1310     <sect3>
1311      <title>Union, Intersect, Except</title>
1312
1313      <para>
1314       These operations calculate the union, intersect and set theoretic
1315       difference of the tuples derived by two subqueries.
1316
1317       <example>
1318        <title id="union-example">Union, Intersect, Except</title>
1319
1320        <para>
1321         The following query is an example for UNION:
1322
1323         <programlisting>
1324 SELECT S.SNO, S.SNAME, S.CITY
1325     FROM SUPPLIER S
1326     WHERE S.SNAME = 'Jones'
1327     UNION
1328     SELECT S.SNO, S.SNAME, S.CITY
1329     FROM SUPPLIER S
1330     WHERE S.SNAME = 'Adams';
1331         </programlisting>
1332
1333 gives the result:
1334
1335         <programlisting>
1336  SNO | SNAME |  CITY
1337 -----+-------+--------
1338   2  | Jones | Paris
1339   3  | Adams | Vienna
1340         </programlisting>
1341        </para>
1342
1343        <para>
1344         Here an example for INTERSECT:
1345
1346         <programlisting>
1347 SELECT S.SNO, S.SNAME, S.CITY
1348     FROM SUPPLIER S
1349     WHERE S.SNO > 1
1350     INTERSECT
1351     SELECT S.SNO, S.SNAME, S.CITY
1352     FROM SUPPLIER S
1353     WHERE S.SNO > 2;
1354         </programlisting>
1355
1356         gives the result:
1357
1358         <programlisting>
1359  SNO | SNAME |  CITY
1360 -----+-------+--------
1361   2  | Jones | Paris
1362         </programlisting>
1363
1364         The only tuple returned by both parts of the query is the one having $SNO=2$.
1365        </para>
1366
1367        <para>
1368         Finally an example for EXCEPT:
1369
1370         <programlisting>
1371 SELECT S.SNO, S.SNAME, S.CITY
1372     FROM SUPPLIER S
1373     WHERE S.SNO > 1
1374     EXCEPT
1375     SELECT S.SNO, S.SNAME, S.CITY
1376     FROM SUPPLIER S
1377     WHERE S.SNO > 3;
1378         </programlisting>
1379
1380         gives the result:
1381
1382         <programlisting>
1383  SNO | SNAME |  CITY
1384 -----+-------+--------
1385   2  | Jones | Paris
1386   3  | Adams | Vienna
1387         </programlisting>
1388        </para>
1389       </example>
1390      </para>
1391     </sect3>
1392    </sect2>
1393
1394    <sect2 id="datadef">
1395     <title>Data Definition</title>
1396
1397     <para>
1398      There is a set of commands used for data definition included in the
1399      <acronym>SQL</acronym> language.
1400     </para>
1401
1402     <sect3 id="create">
1403      <title id="create-title">Create Table</title>
1404
1405      <para>
1406       The most fundamental command for data definition is the
1407       one that creates a new relation (a new table). The syntax of the
1408       CREATE TABLE command is:
1409
1410       <synopsis>
1411 CREATE TABLE <replaceable class="parameter">table_name</replaceable>
1412     (<replaceable class="parameter">name_of_attr_1</replaceable> <replaceable class="parameter">type_of_attr_1</replaceable>
1413      [, <replaceable class="parameter">name_of_attr_2</replaceable> <replaceable class="parameter">type_of_attr_2</replaceable>
1414      [, ...]]);
1415       </synopsis>
1416
1417       <example>
1418        <title id="table-create">Table Creation</title>
1419
1420        <para>
1421         To create the tables defined in
1422         <xref linkend="supplier-fig" endterm="supplier-fig"> the
1423         following <acronym>SQL</acronym> statements are used:
1424
1425         <programlisting>
1426 CREATE TABLE SUPPLIER
1427     (SNO   INTEGER,
1428      SNAME VARCHAR(20),
1429      CITY  VARCHAR(20));
1430      </programlisting>
1431
1432      <programlisting>
1433 CREATE TABLE PART
1434     (PNO   INTEGER,
1435      PNAME VARCHAR(20),
1436      PRICE DECIMAL(4 , 2));
1437      </programlisting>
1438
1439      <programlisting>
1440 CREATE TABLE SELLS
1441     (SNO INTEGER,
1442      PNO INTEGER);
1443         </programlisting>
1444        </para>
1445       </example>
1446      </para>
1447     </sect3>
1448
1449     <sect3>
1450      <title>Data Types in <acronym>SQL</acronym></title>
1451
1452      <para>
1453       The following is a list of some data types that are supported by
1454       <acronym>SQL</acronym>:
1455
1456       <itemizedlist>
1457        <listitem>
1458         <para>
1459          INTEGER: signed fullword binary integer (31 bits precision).
1460         </para>
1461        </listitem>
1462
1463        <listitem>
1464         <para>
1465          SMALLINT: signed halfword binary integer (15 bits precision).
1466         </para>
1467        </listitem>
1468
1469        <listitem>
1470         <para>
1471          DECIMAL (<replaceable class="parameter">p</replaceable>[,<replaceable class="parameter">q</replaceable>]):
1472          signed packed decimal number of
1473          <replaceable class="parameter">p</replaceable>
1474          digits precision with assumed
1475          <replaceable class="parameter">q</replaceable>
1476          of them right to the decimal point.
1477
1478 (15 &ge; <replaceable class="parameter">p</replaceable> &ge; <replaceable class="parameter">q</replaceable> &ge; 0).
1479
1480          If <replaceable class="parameter">q</replaceable>
1481          is omitted it is assumed to be 0.
1482         </para>
1483        </listitem>
1484
1485        <listitem>
1486         <para>
1487          FLOAT: signed doubleword floating point number.
1488         </para>
1489        </listitem>
1490
1491        <listitem>
1492         <para>
1493          CHAR(<replaceable class="parameter">n</replaceable>):
1494          fixed length character string of length
1495          <replaceable class="parameter">n</replaceable>.
1496         </para>
1497        </listitem>
1498
1499        <listitem>
1500         <para>
1501          VARCHAR(<replaceable class="parameter">n</replaceable>):
1502          varying length character string of maximum length
1503          <replaceable class="parameter">n</replaceable>.
1504         </para>
1505        </listitem>
1506       </itemizedlist>
1507      </para>
1508     </sect3>
1509
1510     <sect3>
1511      <title>Create Index</title>
1512
1513      <para>
1514       Indices are used to speed up access to a relation. If a relation <classname>R</classname>
1515       has an index on attribute <classname>A</classname> then we can
1516       retrieve all tuples <replaceable>t</replaceable>
1517       having
1518       <replaceable>t</replaceable>(<classname>A</classname>) = <replaceable>a</replaceable>
1519       in time roughly proportional to the number of such
1520       tuples <replaceable>t</replaceable>
1521       rather than in time proportional to the size of <classname>R</classname>.
1522      </para>
1523
1524      <para>
1525       To create an index in <acronym>SQL</acronym>
1526       the CREATE INDEX command is used. The syntax is:
1527
1528       <programlisting>
1529 CREATE INDEX <replaceable class="parameter">index_name</replaceable>
1530     ON <replaceable class="parameter">table_name</replaceable> ( <replaceable class="parameter">name_of_attribute</replaceable> );
1531       </programlisting>
1532      </para>
1533
1534      <para>
1535       <example>
1536        <title id="index-create">Create Index</title>
1537
1538        <para>
1539         To create an index named I on attribute SNAME of relation SUPPLIER
1540         we use the following statement:
1541
1542       <programlisting>
1543 CREATE INDEX I ON SUPPLIER (SNAME);
1544       </programlisting>
1545      </para>
1546
1547        <para>
1548         The created index is maintained automatically, i.e. whenever a new tuple
1549         is inserted into the relation SUPPLIER the index I is adapted. Note
1550         that the only changes a user can percept when an index is present
1551         are an increased speed.
1552        </para>
1553       </example>
1554      </para>
1555     </sect3>
1556
1557     <sect3>
1558      <title>Create View</title>
1559
1560      <para>
1561       A view may be regarded as a <firstterm>virtual table</firstterm>,
1562       i.e. a table that
1563       does not <emphasis>physically</emphasis> exist in the database
1564       but looks to the user
1565       as if it does. By contrast, when we talk of a
1566       <firstterm>base table</firstterm> there is
1567       really a physically stored counterpart of each row of the table
1568       somewhere in the physical storage.
1569      </para>
1570
1571      <para>
1572       Views do not have their own, physically separate, distinguishable
1573       stored data. Instead, the system stores the definition of the
1574       view (i.e. the rules about how to access physically stored base
1575       tables in order to materialize the view) somewhere in the system
1576       catalogs (see
1577       <xref linkend="catalogs-title" endterm="catalogs-title">). For a
1578       discussion on different techniques to implement views refer to
1579 <!--
1580       section
1581       <xref linkend="view-impl" endterm="view-impl">.
1582 -->
1583       <citetitle>SIM98</citetitle>.
1584      </para>
1585
1586      <para>
1587       In <acronym>SQL</acronym> the <command>CREATE VIEW</command>
1588       command is used to define a view. The syntax
1589       is:
1590
1591       <programlisting>
1592 CREATE VIEW <replaceable class="parameter">view_name</replaceable>
1593     AS <replaceable class="parameter">select_stmt</replaceable>
1594       </programlisting>
1595
1596       where <replaceable class="parameter">select_stmt</replaceable>
1597       is a valid select statement as defined
1598       in <xref linkend="select-title" endterm="select-title">.
1599       Note that <replaceable class="parameter">select_stmt</replaceable> is
1600       not executed when the view is created. It is just stored in the
1601       <firstterm>system catalogs</firstterm>
1602       and is executed whenever a query against the view is made.
1603      </para>
1604
1605      <para>
1606       Let the following view definition be given (we use
1607       the tables from
1608       <xref linkend="supplier-fig" endterm="supplier-fig"> again):
1609
1610       <programlisting>
1611 CREATE VIEW London_Suppliers
1612     AS SELECT S.SNAME, P.PNAME
1613         FROM SUPPLIER S, PART P, SELLS SE
1614         WHERE S.SNO = SE.SNO AND
1615               P.PNO = SE.PNO AND
1616               S.CITY = 'London';
1617       </programlisting>
1618      </para>
1619
1620      <para>
1621       Now we can use this <firstterm>virtual relation</firstterm>
1622       <classname>London_Suppliers</classname> as
1623       if it were another base table:
1624
1625       <programlisting>
1626 SELECT * FROM London_Suppliers
1627     WHERE P.PNAME = 'Screw';
1628       </programlisting>
1629
1630       which will return the following table:
1631
1632       <programlisting>
1633  SNAME | PNAME
1634 -------+-------
1635  Smith | Screw
1636       </programlisting>
1637      </para>
1638
1639      <para>
1640       To calculate this result the database system has to do a
1641       <emphasis>hidden</emphasis>
1642       access to the base tables SUPPLIER, SELLS and PART first. It
1643       does so by executing the query given in the view definition against
1644       those base tables. After that the additional qualifications
1645       (given in the
1646       query against the view) can be applied to obtain the resulting
1647       table.
1648      </para>
1649     </sect3>
1650
1651     <sect3>
1652      <title>Drop Table, Drop Index, Drop View</title>
1653
1654      <para>
1655       To destroy a table (including all tuples stored in that table) the
1656       DROP TABLE command is used:
1657
1658       <programlisting>
1659 DROP TABLE <replaceable class="parameter">table_name</replaceable>;
1660        </programlisting>
1661       </para>
1662
1663      <para>
1664       To destroy the SUPPLIER table use the following statement:
1665
1666       <programlisting>
1667 DROP TABLE SUPPLIER;
1668       </programlisting>
1669      </para>
1670
1671      <para>
1672       The DROP INDEX command is used to destroy an index:
1673
1674       <programlisting>
1675 DROP INDEX <replaceable class="parameter">index_name</replaceable>;
1676       </programlisting>
1677      </para>
1678
1679      <para>
1680       Finally to destroy a given view use the command DROP VIEW:
1681
1682       <programlisting>
1683 DROP VIEW <replaceable class="parameter">view_name</replaceable>;
1684       </programlisting>
1685      </para>
1686     </sect3>
1687    </sect2>
1688
1689    <sect2>
1690     <title>Data Manipulation</title>
1691
1692     <sect3>
1693      <title>Insert Into</title>
1694
1695      <para>
1696       Once a table is created (see
1697       <xref linkend="create-title" endterm="create-title">), it can be filled
1698       with tuples using the command <command>INSERT INTO</command>.
1699       The syntax is:
1700
1701       <programlisting>
1702 INSERT INTO <replaceable class="parameter">table_name</replaceable> (<replaceable class="parameter">name_of_attr_1</replaceable>
1703     [, <replaceable class="parameter">name_of_attr_2</replaceable> [,...]])
1704     VALUES (<replaceable class="parameter">val_attr_1</replaceable> [, <replaceable class="parameter">val_attr_2</replaceable> [, ...]]);
1705       </programlisting>
1706      </para>
1707
1708      <para>
1709       To insert the first tuple into the relation SUPPLIER (from
1710       <xref linkend="supplier-fig" endterm="supplier-fig">) we use the
1711       following statement:
1712
1713       <programlisting>
1714 INSERT INTO SUPPLIER (SNO, SNAME, CITY)
1715     VALUES (1, 'Smith', 'London');
1716       </programlisting>
1717      </para>
1718
1719      <para>
1720       To insert the first tuple into the relation SELLS we use:
1721
1722       <programlisting>
1723 INSERT INTO SELLS (SNO, PNO)
1724     VALUES (1, 1);
1725       </programlisting>
1726      </para>
1727     </sect3>
1728
1729     <sect3>
1730      <title>Update</title>
1731
1732      <para>
1733       To change one or more attribute values of tuples in a relation the
1734       UPDATE command is used. The syntax is:
1735
1736       <programlisting>
1737 UPDATE <replaceable class="parameter">table_name</replaceable>
1738     SET <replaceable class="parameter">name_of_attr_1</replaceable> = <replaceable class="parameter">value_1</replaceable>
1739         [, ... [, <replaceable class="parameter">name_of_attr_k</replaceable> = <replaceable class="parameter">value_k</replaceable>]]
1740     WHERE <replaceable class="parameter">condition</replaceable>;
1741       </programlisting>
1742      </para>
1743
1744      <para>
1745       To change the value of attribute PRICE of the part 'Screw' in the
1746       relation PART we use:
1747
1748       <programlisting>
1749 UPDATE PART
1750     SET PRICE = 15
1751     WHERE PNAME = 'Screw';
1752       </programlisting>
1753      </para>
1754
1755      <para>
1756       The new value of attribute PRICE of the tuple whose name is 'Screw' is
1757       now 15.
1758      </para>
1759     </sect3>
1760
1761     <sect3>
1762      <title>Delete</title>
1763
1764      <para>
1765       To delete a tuple from a particular table use the command DELETE
1766       FROM. The syntax is:
1767
1768       <programlisting>
1769 DELETE FROM <replaceable class="parameter">table_name</replaceable>
1770     WHERE <replaceable class="parameter">condition</replaceable>;
1771       </programlisting>
1772      </para>
1773
1774      <para>
1775       To delete the supplier called 'Smith' of the table SUPPLIER the
1776       following statement is used:
1777
1778       <programlisting>
1779 DELETE FROM SUPPLIER
1780     WHERE SNAME = 'Smith';
1781       </programlisting>
1782      </para>
1783     </sect3>
1784    </sect2>
1785
1786    <sect2 id="catalogs">
1787     <title id="catalogs-title">System Catalogs</title>
1788
1789     <para>
1790      In every <acronym>SQL</acronym> database system
1791      <firstterm>system catalogs</firstterm> are used to keep
1792      track of which tables, views indexes etc. are defined in the
1793      database. These system catalogs can be queried as if they were normal
1794      relations. For example there is one catalog used for the definition of
1795      views. This catalog stores the query from the view definition. Whenever
1796      a query against a view is made, the system first gets the
1797      <firstterm>view definition query</firstterm> out of the catalog
1798      and materializes the view
1799      before proceeding with the user query (see
1800 <!--
1801       section
1802       <xref linkend="view-impl" endterm="view-impl">.
1803     <citetitle>SIM98</citetitle>
1804 -->
1805      <xref linkend="SIM98" endterm="SIM98">
1806      for a more detailed
1807      description). For more information about system catalogs refer to
1808      <xref linkend="DATE94" endterm="DATE94">.
1809     </para>
1810    </sect2>
1811
1812    <sect2>
1813     <title>Embedded <acronym>SQL</acronym></title>
1814
1815     <para>
1816      In this section we will sketch how <acronym>SQL</acronym> can be
1817      embedded into a host language (e.g. <literal>C</literal>).
1818      There are two main reasons why we want to use <acronym>SQL</acronym>
1819      from a host language:
1820
1821      <itemizedlist>
1822       <listitem>
1823        <para>
1824         There are queries that cannot be formulated using pure <acronym>SQL</acronym>
1825         (i.e. recursive queries). To be able to perform such queries we need a
1826         host language with a greater expressive power than
1827         <acronym>SQL</acronym>.
1828        </para>
1829       </listitem>
1830
1831       <listitem>
1832        <para>
1833         We simply want to access a database from some application that
1834         is written in the host language (e.g. a ticket reservation system
1835         with a graphical user interface is written in C and the information
1836         about which tickets are still left is stored in a database that can be
1837         accessed using embedded <acronym>SQL</acronym>).
1838        </para>
1839       </listitem>
1840      </itemizedlist>
1841     </para>
1842
1843     <para>
1844      A program using embedded <acronym>SQL</acronym>
1845      in a host language consists of statements
1846      of the host language and of
1847      <firstterm>embedded <acronym>SQL</acronym></firstterm>
1848      (<acronym>ESQL</acronym>) statements. Every <acronym>ESQL</acronym>
1849      statement begins with the keywords <command>EXEC SQL</command>.
1850      The <acronym>ESQL</acronym> statements are
1851      transformed to statements of the host language
1852      by a <firstterm>precompiler</firstterm>
1853      (which usually inserts
1854      calls to library routines that perform the various <acronym>SQL</acronym>
1855      commands).
1856     </para>
1857
1858     <para>
1859      When we look at the examples throughout
1860      <xref linkend="select-title" endterm="select-title"> we
1861      realize that the result of the queries is very often a set of
1862      tuples. Most host languages are not designed to operate on sets so we
1863      need a mechanism to access every single tuple of the set of tuples
1864      returned by a SELECT statement. This mechanism can be provided by
1865      declaring a <firstterm>cursor</firstterm>.
1866      After that we can use the FETCH command to
1867      retrieve a tuple and set the cursor to the next tuple.
1868     </para>
1869
1870     <para>
1871      For a detailed discussion on embedded <acronym>SQL</acronym>
1872      refer to
1873      <xref linkend="DATE97" endterm="DATE97">,
1874      <xref linkend="DATE94" endterm="DATE94">,
1875      or
1876      <xref linkend="ULL88" endterm="ULL88">.
1877     </para>
1878    </sect2>
1879   </sect1>
1880  </chapter>
1881
1882 <!-- Keep this comment at the end of the file
1883 Local variables:
1884 mode:sgml
1885 sgml-omittag:nil
1886 sgml-shorttag:t
1887 sgml-minimize-attributes:nil
1888 sgml-always-quote-attributes:t
1889 sgml-indent-step:1
1890 sgml-indent-data:t
1891 sgml-parent-document:nil
1892 sgml-default-dtd-file:"./reference.ced"
1893 sgml-exposed-tags:nil
1894 sgml-local-catalogs:("/usr/lib/sgml/catalog")
1895 sgml-local-ecat-files:nil
1896 End:
1897 -->