+++ /dev/null
-<!-- doc/src/sgml/sql.sgml -->
-
- <chapter id="sql-intro">
- <title>SQL</title>
-
- <abstract>
- <para>
- This chapter introduces the mathematical concepts behind
- relational databases. It is not required reading, so if you bog
- down or want to get straight to some simple examples feel free to
- jump ahead to the next chapter and come back when you have more
- time and patience. This stuff is supposed to be fun!
- </para>
-
- <para>
- This material originally appeared as a part of
- Stefan Simkovics' Master's Thesis
- (<xref linkend="SIM98" endterm="SIM98">).
- </para>
- </abstract>
-
- <para>
- <acronym>SQL</acronym> has become the most popular relational query
- language.
- The name <quote><acronym>SQL</acronym></quote> is an abbreviation for
- <firstterm>Structured Query Language</firstterm>.
- In 1974 Donald Chamberlin and others defined the
- language SEQUEL (<firstterm>Structured English Query
- Language</firstterm>) at IBM
- Research. This language was first implemented in an IBM
- prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version
- of SEQUEL called SEQUEL/2 was defined and the name was changed to
- <acronym>SQL</acronym>
- subsequently.
- </para>
-
- <para>
- A new prototype called System R was developed by IBM in 1977. System R
- implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>)
- and a number of
- changes were made to <acronym>SQL</acronym> during the project.
- System R was installed in
- a number of user sites, both internal IBM sites and also some selected
- customer sites. Thanks to the success and acceptance of System R at
- those user sites IBM started to develop commercial products that
- implemented the <acronym>SQL</acronym> language based on the System
- R technology.
- </para>
-
- <para>
- Over the next years IBM and also a number of other vendors announced
- <acronym>SQL</acronym> products such as
- <productname>SQL/DS</productname> (IBM),
- <productname>DB2</productname> (IBM),
- <productname>ORACLE</productname> (Oracle Corp.),
- <productname>DG/SQL</productname> (Data General Corp.),
- and <productname>SYBASE</productname> (Sybase Inc.).
- </para>
-
- <para>
- <acronym>SQL</acronym> is also an official standard now. In 1982
- the American National
- Standards Institute (<acronym>ANSI</acronym>) chartered its
- Database Committee X3H2 to
- develop a proposal for a standard relational language. This proposal
- was ratified in 1986 and consisted essentially of the IBM dialect of
- <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym>
- standard was also accepted as an international
- standard by the International Organization for Standardization
- (<acronym>ISO</acronym>).
- This original standard version of <acronym>SQL</acronym> is often
- referred to,
- informally, as <quote><abbrev>SQL/86</abbrev></quote>. In 1989 the original
- standard was extended
- and this new standard is often, again informally, referred to as
- <quote><abbrev>SQL/89</abbrev></quote>. Also in 1989, a related standard called
- <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm>
- (<acronym>ESQL</acronym>) was developed.
- </para>
-
- <para>
- The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees
- have been working for many years on the
- definition of a greatly expanded version of the original standard,
- referred to informally as <firstterm><acronym>SQL2</acronym></firstterm>
- or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a
- ratified standard - <quote>International Standard ISO/IEC 9075:1992,
- Database Language <acronym>SQL</acronym></quote> - in late 1992.
- <acronym>SQL/92</acronym> is the version
- normally meant when people refer to <quote>the <acronym>SQL</acronym>
- standard</quote>. A detailed
- description of <acronym>SQL/92</acronym> is given in
- <xref linkend="DATE97" endterm="DATE97">. At the time of
- writing this document a new standard informally referred to
- as <firstterm><acronym>SQL3</acronym></firstterm>
- is under development. It is planned to make <acronym>SQL</acronym>
- a Turing-complete
- language, i.e., all computable queries (e.g., recursive queries) will be
- possible. This has now been completed as SQL:2003.
- </para>
-
- <sect1 id="rel-model">
- <title>The Relational Data Model</title>
-
- <para>
- As mentioned before, <acronym>SQL</acronym> is a relational
- language. That means it is
- based on the <firstterm>relational data model</firstterm>
- first published by E.F. Codd in
- 1970. We will give a formal description of the relational model
- later (in
- <xref linkend="formal-notion" endterm="formal-notion">)
- but first we want to have a look at it from a more intuitive
- point of view.
- </para>
-
- <para>
- A <firstterm>relational database</firstterm> is a database that is
- perceived by its
- users as a <firstterm>collection of tables</firstterm> (and
- nothing else but tables).
- A table consists of rows and columns where each row represents a
- record and each column represents an attribute of the records
- contained in the table.
- <xref linkend="supplier-fig" endterm="supplier-fig">
- shows an example of a database consisting of three tables:
-
- <itemizedlist>
- <listitem>
- <para>
- SUPPLIER is a table storing the number
- (SNO), the name (SNAME) and the city (CITY) of a supplier.
- </para>
- </listitem>
-
- <listitem>
- <para>
- PART is a table storing the number (PNO) the name (PNAME) and
- the price (PRICE) of a part.
- </para>
- </listitem>
-
- <listitem>
- <para>
- SELLS stores information about which part (PNO) is sold by which
- supplier (SNO).
- It serves in a sense to connect the other two tables together.
- </para>
- </listitem>
- </itemizedlist>
-
- <example>
- <title id="supplier-fig">The Suppliers and Parts Database</title>
-<screen>
-SUPPLIER: SELLS:
- SNO | SNAME | CITY SNO | PNO
-----+---------+-------- -----+-----
- 1 | Smith | London 1 | 1
- 2 | Jones | Paris 1 | 2
- 3 | Adams | Vienna 2 | 4
- 4 | Blake | Rome 3 | 1
- 3 | 3
- 4 | 2
-PART: 4 | 3
- PNO | PNAME | PRICE 4 | 4
-----+---------+---------
- 1 | Screw | 10
- 2 | Nut | 8
- 3 | Bolt | 15
- 4 | Cam | 25
-</screen>
- </example>
- </para>
-
- <para>
- The tables PART and SUPPLIER can be regarded as
- <firstterm>entities</firstterm> and
- SELLS can be regarded as a <firstterm>relationship</firstterm>
- between a particular
- part and a particular supplier.
- </para>
-
- <para>
- As we will see later, <acronym>SQL</acronym> operates on tables
- like the ones just
- defined but before that we will study the theory of the relational
- model.
- </para>
- </sect1>
-
- <sect1 id="relmodel-formal">
- <title id="formal-notion">Relational Data Model Formalities</title>
-
- <para>
- The mathematical concept underlying the relational model is the
- set-theoretic <firstterm>relation</firstterm> which is a subset of
- the Cartesian
- product of a list of domains. This set-theoretic relation gives
- the model its name (do not confuse it with the relationship from the
- <firstterm>Entity-Relationship model</firstterm>).
- Formally a domain is simply a set of
- values. For example the set of integers is a domain. Also the set of
- character strings of length 20 and the real numbers are examples of
- domains.
- </para>
-
- <para>
-<!--
-\begin{definition}
-The <firstterm>Cartesian product</firstterm> of domains $D_{1},
- D_{2},\ldots, D_{k}$ written
-\mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set of
-all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in
-D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.
-\end{definition}
--->
- The <firstterm>Cartesian product</firstterm> of domains
- <parameter>D<subscript>1</subscript></parameter>,
- <parameter>D<subscript>2</subscript></parameter>,
- ...
- <parameter>D<subscript>k</subscript></parameter>,
- written
- <parameter>D<subscript>1</subscript></parameter> ×
- <parameter>D<subscript>2</subscript></parameter> ×
- ... ×
- <parameter>D<subscript>k</subscript></parameter>
- is the set of all k-tuples
- <parameter>v<subscript>1</subscript></parameter>,
- <parameter>v<subscript>2</subscript></parameter>,
- ...
- <parameter>v<subscript>k</subscript></parameter>,
- such that
- <parameter>v<subscript>1</subscript></parameter> ∈
- <parameter>D<subscript>1</subscript></parameter>,
- <parameter>v<subscript>2</subscript></parameter> ∈
- <parameter>D<subscript>2</subscript></parameter>,
- ...
- <parameter>v<subscript>k</subscript></parameter> ∈
- <parameter>D<subscript>k</subscript></parameter>.
- </para>
-
- <para>
- For example, when we have
-<!--
- $k=2$, $D_{1}=\{0,1\}$ and
-$D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is
-$\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.
--->
- <parameter>k</parameter>=2,
- <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and
- <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then
- <parameter>D<subscript>1</subscript></parameter> ×
- <parameter>D<subscript>2</subscript></parameter> is
- <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>.
- </para>
-
- <para>
-<!--
-\begin{definition}
-A Relation is any subset of the Cartesian product of one or more
-domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}
-\end{definition}
--->
- A Relation is any subset of the Cartesian product of one or more
- domains: <parameter>R</parameter> ⊆
- <parameter>D<subscript>1</subscript></parameter> ×
- <parameter>D<subscript>2</subscript></parameter> ×
- ... ×
- <parameter>D<subscript>k</subscript></parameter>.
- </para>
-
- <para>
- For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation;
- it is in fact a subset of
- <parameter>D<subscript>1</subscript></parameter> ×
- <parameter>D<subscript>2</subscript></parameter>
- mentioned above.
- </para>
-
- <para>
- The members of a relation are called tuples. Each relation of some
- Cartesian product
- <parameter>D<subscript>1</subscript></parameter> ×
- <parameter>D<subscript>2</subscript></parameter> ×
- ... ×
- <parameter>D<subscript>k</subscript></parameter>
- is said to have arity <literal>k</literal> and is therefore a set
- of <literal>k</literal>-tuples.
- </para>
-
- <para>
- A relation can be viewed as a table (as we already did, remember
- <xref linkend="supplier-fig" endterm="supplier-fig"> where
- every tuple is represented by a row and every column corresponds to
- one component of a tuple. Giving names (called attributes) to the
- columns leads to the definition of a
- <firstterm>relation scheme</firstterm>.
- </para>
-
- <para>
-<!--
-\begin{definition}
-A {\it relation scheme} $R$ is a finite set of attributes
-\mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for
-each attribute $A_{i}, 1 \le i \le k$ where the values of the
-attributes are taken from. We often write a relation scheme as
-\mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.
-\end{definition}
--->
- A <firstterm>relation scheme</firstterm> <literal>R</literal> is a
- finite set of attributes
- <parameter>A<subscript>1</subscript></parameter>,
- <parameter>A<subscript>2</subscript></parameter>,
- ...
- <parameter>A<subscript>k</subscript></parameter>.
- There is a domain
- <parameter>D<subscript>i</subscript></parameter>,
- for each attribute
- <parameter>A<subscript>i</subscript></parameter>,
- 1 <= <literal>i</literal> <= <literal>k</literal>,
- where the values of the attributes are taken from. We often write
- a relation scheme as
- <literal>R(<parameter>A<subscript>1</subscript></parameter>,
- <parameter>A<subscript>2</subscript></parameter>,
- ...
- <parameter>A<subscript>k</subscript></parameter>)</literal>.
-
- <note>
- <para>
- A <firstterm>relation scheme</firstterm> is just a kind of template
- whereas a <firstterm>relation</firstterm> is an instance of a
- <firstterm>relation
- scheme</firstterm>. The relation consists of tuples (and can
- therefore be
- viewed as a table); not so the relation scheme.
- </para>
- </note>
- </para>
-
- <sect2>
- <title id="domains">Domains vs. Data Types</title>
-
- <para>
- We often talked about <firstterm>domains</firstterm>
- in the last section. Recall that a
- domain is, formally, just a set of values (e.g., the set of integers or
- the real numbers). In terms of database systems we often talk of
- <firstterm>data types</firstterm> instead of domains.
- When we define a table we have to make
- a decision about which attributes to include. Additionally we
- have to decide which kind of data is going to be stored as
- attribute values. For example the values of
- <classname>SNAME</classname> from the table
- <classname>SUPPLIER</classname> will be character strings,
- whereas <classname>SNO</classname> will store
- integers. We define this by assigning a data type to each
- attribute. The type of <classname>SNAME</classname> will be
- <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type
- for character strings of length <= 20),
- the type of <classname>SNO</classname> will be
- <type>INTEGER</type>. With the assignment of a data type we also
- have selected
- a domain for an attribute. The domain of
- <classname>SNAME</classname> is the set of all
- character strings of length <= 20,
- the domain of <classname>SNO</classname> is the set of
- all integer numbers.
- </para>
- </sect2>
- </sect1>
-
- <sect1 id="relmodel-oper">
- <title id="operations">Operations in the Relational Data Model</title>
-
- <para>
- In the previous section
- (<xref linkend="formal-notion" endterm="formal-notion">)
- we defined the mathematical notion of
- the relational model. Now we know how the data can be stored using a
- relational data model but we do not know what to do with all these
- tables to retrieve something from the database yet. For example somebody
- could ask for the names of all suppliers that sell the part
- 'Screw'. Therefore two rather different kinds of notations for
- expressing operations on relations have been defined:
-
- <itemizedlist>
- <listitem>
- <para>
- The <firstterm>Relational Algebra</firstterm> which is an
- algebraic notation,
- where queries are expressed by applying specialized operators to the
- relations.
- </para>
- </listitem>
-
- <listitem>
- <para>
- The <firstterm>Relational Calculus</firstterm> which is a
- logical notation,
- where queries are expressed by formulating some logical restrictions
- that the tuples in the answer must satisfy.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <sect2>
- <title id="rel-alg">Relational Algebra</title>
-
- <para>
- The <firstterm>Relational Algebra</firstterm> was introduced by
- E. F. Codd in 1972. It consists of a set of operations on relations:
-
- <itemizedlist>
- <listitem>
- <para>
- SELECT (σ): extracts <firstterm>tuples</firstterm> from
- a relation that
- satisfy a given restriction. Let <parameter>R</parameter> be a
- table that contains an attribute
- <parameter>A</parameter>.
-σ<subscript>A=a</subscript>(R) = {t ∈ R ∣ t(A) = a}
- where <literal>t</literal> denotes a
- tuple of <parameter>R</parameter> and <literal>t(A)</literal>
- denotes the value of attribute <parameter>A</parameter> of
- tuple <literal>t</literal>.
- </para>
- </listitem>
-
- <listitem>
- <para>
- PROJECT (π): extracts specified
- <firstterm>attributes</firstterm> (columns) from a
- relation. Let <classname>R</classname> be a relation
- that contains an attribute <classname>X</classname>.
- π<subscript>X</subscript>(<classname>R</classname>) = {t(X) ∣ t ∈ <classname>R</classname>},
- where <literal>t</literal>(<classname>X</classname>) denotes the value of
- attribute <classname>X</classname> of tuple <literal>t</literal>.
- </para>
- </listitem>
-
- <listitem>
- <para>
- PRODUCT (×): builds the Cartesian product of two
- relations. Let <classname>R</classname> be a table with arity
- <literal>k</literal><subscript>1</subscript> and let
- <classname>S</classname> be a table with
- arity <literal>k</literal><subscript>2</subscript>.
- <classname>R</classname> × <classname>S</classname>
- is the set of all
- <literal>k</literal><subscript>1</subscript>
- + <literal>k</literal><subscript>2</subscript>-tuples
- whose first <literal>k</literal><subscript>1</subscript>
- components form a tuple in <classname>R</classname> and whose last
- <literal>k</literal><subscript>2</subscript> components form a
- tuple in <classname>S</classname>.
- </para>
- </listitem>
-
- <listitem>
- <para>
- UNION (∪): builds the set-theoretic union of two
- tables. Given the tables <classname>R</classname> and
- <classname>S</classname> (both must have the same arity),
- the union <classname>R</classname> ∪ <classname>S</classname>
- is the set of tuples that are in <classname>R</classname>
- or <classname>S</classname> or both.
- </para>
- </listitem>
-
- <listitem>
- <para>
- INTERSECT (∩): builds the set-theoretic intersection of two
- tables. Given the tables <classname>R</classname> and
- <classname>S</classname>,
- <classname>R</classname> ∩ <classname>S</classname> is the
- set of tuples
- that are in <classname>R</classname> and in
- <classname>S</classname>.
- We again require that <classname>R</classname> and
- <classname>S</classname> have the
- same arity.
- </para>
- </listitem>
-
- <listitem>
- <para>
- DIFFERENCE (− or ∖): builds the set difference of
- two tables. Let <classname>R</classname> and <classname>S</classname>
- again be two tables with the same
- arity. <classname>R</classname> - <classname>S</classname>
- is the set of tuples in <classname>R</classname> but not in
- <classname>S</classname>.
- </para>
- </listitem>
-
- <listitem>
- <para>
- JOIN (∏): connects two tables by their common
- attributes. Let <classname>R</classname> be a table with the
- attributes <classname>A</classname>,<classname>B</classname>
- and <classname>C</classname> and
- let <classname>S</classname> be a table with the attributes
- <classname>C</classname>,<classname>D</classname>
- and <classname>E</classname>. There is one
- attribute common to both relations,
- the attribute <classname>C</classname>.
-<!--
- <classname>R</classname> ∏ <classname>S</classname> =
- π<subscript><classname>R</classname>.<classname>A</classname>,<classname>R</classname>.<classname>B</classname>,<classname>R</classname>.<classname>C</classname>,<classname>S</classname>.<classname>D</classname>,<classname>S</classname>.<classname>E</classname></subscript>(σ<subscript><classname>R</classname>.<classname>C</classname>=<classname>S</classname>.<classname>C</classname></subscript>(<classname>R</classname> × <classname>S</classname>)).
--->
- R ∏ S = π<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(σ<subscript>R.C=S.C</subscript>(R × S)).
- What are we doing here? We first calculate the Cartesian
- product
- <classname>R</classname> × <classname>S</classname>.
- Then we select those tuples whose values for the common
- attribute <classname>C</classname> are equal
- (σ<subscript>R.C = S.C</subscript>).
- Now we have a table
- that contains the attribute <classname>C</classname>
- two times and we correct this by
- projecting out the duplicate column.
- </para>
-
- <example>
- <title id="join-example">An Inner Join</title>
-
- <para>
- Let's have a look at the tables that are produced by evaluating the steps
- necessary for a join.
- Let the following two tables be given:
-
-<screen>
-R: S:
- A | B | C C | D | E
----+---+--- ---+---+---
- 1 | 2 | 3 3 | a | b
- 4 | 5 | 6 6 | c | d
- 7 | 8 | 9
-</screen>
- </para>
- </example>
-
- <para>
- First we calculate the Cartesian product
- <classname>R</classname> × <classname>S</classname> and
- get:
-
-<screen>
-R x S:
- A | B | R.C | S.C | D | E
----+---+-----+-----+---+---
- 1 | 2 | 3 | 3 | a | b
- 1 | 2 | 3 | 6 | c | d
- 4 | 5 | 6 | 3 | a | b
- 4 | 5 | 6 | 6 | c | d
- 7 | 8 | 9 | 3 | a | b
- 7 | 8 | 9 | 6 | c | d
-</screen>
- </para>
-
- <para>
- After the selection
- σ<subscript>R.C=S.C</subscript>(R × S)
- we get:
-
-<screen>
- A | B | R.C | S.C | D | E
----+---+-----+-----+---+---
- 1 | 2 | 3 | 3 | a | b
- 4 | 5 | 6 | 6 | c | d
-</screen>
- </para>
-
- <para>
- To remove the duplicate column
- <classname>S</classname>.<classname>C</classname>
- we project it out by the following operation:
- π<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(σ<subscript>R.C=S.C</subscript>(R × S))
- and get:
-
-<screen>
- A | B | C | D | E
----+---+---+---+---
- 1 | 2 | 3 | a | b
- 4 | 5 | 6 | c | d
-</screen>
- </para>
- </listitem>
-
- <listitem>
- <para>
- DIVIDE (÷): Let <classname>R</classname> be a table
- with the attributes A, B, C, and D and let
- <classname>S</classname> be a table with the attributes
- C and D.
- Then we define the division as:
-
-<programlisting>
-R ÷ S = {t ∣ ∀ t<subscript>s</subscript> ∈ S ∃ t<subscript>r</subscript> ∈ R
-</programlisting>
-
- such that
-t<subscript>r</subscript>(A,B)=t∧t<subscript>r</subscript>(C,D)=t<subscript>s</subscript>}
- where
- t<subscript>r</subscript>(x,y)
- denotes a
- tuple of table <classname>R</classname> that consists only of
- the components <literal>x</literal> and <literal>y</literal>.
- Note that the tuple <literal>t</literal> only consists of the
- components <classname>A</classname> and
- <classname>B</classname> of relation <classname>R</classname>.
- </para>
-
- <para id="divide-example">
- Given the following tables
-
-<screen>
-R: S:
- A | B | C | D C | D
----+---+---+--- ---+---
- a | b | c | d c | d
- a | b | e | f e | f
- b | c | e | f
- e | d | c | d
- e | d | e | f
- a | b | d | e
-</screen>
-
- R ÷ S
- is derived as
-
-<screen>
- A | B
----+---
- a | b
- e | d
-</screen>
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- For a more detailed description and definition of the relational
- algebra refer to [<xref linkend="ULL88" endterm="ULL88">] or
- [<xref linkend="DATE04" endterm="DATE04">].
- </para>
-
- <example>
- <title id="suppl-rel-alg">A Query Using Relational Algebra</title>
- <para>
- Recall that we formulated all those relational operators to be able to
- retrieve data from the database. Let's return to our example from
- the previous
- section (<xref linkend="operations" endterm="operations">)
- where someone wanted to know the names of all
- suppliers that sell the part <literal>Screw</literal>.
- This question can be answered
- using relational algebra by the following operation:
-
-<programlisting>
-π<subscript>SUPPLIER.SNAME</subscript>(σ<subscript>PART.PNAME='Screw'</subscript>(SUPPLIER ∏ SELLS ∏ PART))
-</programlisting>
- </para>
-
- <para>
- We call such an operation a query. If we evaluate the above query
- against the our example tables
- (<xref linkend="supplier-fig" endterm="supplier-fig">)
- we will obtain the following result:
-
-<screen>
- SNAME
--------
- Smith
- Adams
-</screen>
- </para>
- </example>
- </sect2>
-
- <sect2 id="rel-calc">
- <title>Relational Calculus</title>
-
- <para>
- The relational calculus is based on the
- <firstterm>first order logic</firstterm>. There are
- two variants of the relational calculus:
-
- <itemizedlist>
- <listitem>
- <para>
- The <firstterm>Domain Relational Calculus</firstterm>
- (<acronym>DRC</acronym>), where variables
- stand for components (attributes) of the tuples.
- </para>
- </listitem>
-
- <listitem>
- <para>
- The <firstterm>Tuple Relational Calculus</firstterm>
- (<acronym>TRC</acronym>), where variables stand for tuples.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- We want to discuss the tuple relational calculus only because it is
- the one underlying the most relational languages. For a detailed
- discussion on <acronym>DRC</acronym> (and also
- <acronym>TRC</acronym>) see
- <xref linkend="DATE04" endterm="DATE04">
- or
- <xref linkend="ULL88" endterm="ULL88">.
- </para>
- </sect2>
-
- <sect2>
- <title>Tuple Relational Calculus</title>
-
- <para>
- The queries used in <acronym>TRC</acronym> are of the following
- form:
-
-<programlisting>
-x(A) ∣ F(x)
-</programlisting>
-
- where <literal>x</literal> is a tuple variable
- <classname>A</classname> is a set of attributes and <literal>F</literal> is a
- formula. The resulting relation consists of all tuples
- <literal>t(A)</literal> that satisfy <literal>F(t)</literal>.
- </para>
-
- <para>
- If we want to answer the question from example
- <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">
- using <acronym>TRC</acronym> we formulate the following query:
-
-<programlisting>
-{x(SNAME) ∣ x ∈ SUPPLIER ∧
- ∃ y ∈ SELLS ∃ z ∈ PART (y(SNO)=x(SNO) ∧
- z(PNO)=y(PNO) ∧
- z(PNAME)='Screw')}
-</programlisting>
- </para>
-
- <para>
- Evaluating the query against the tables from
- <xref linkend="supplier-fig" endterm="supplier-fig">
- again leads to the same result
- as in
- <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">.
- </para>
- </sect2>
-
- <sect2 id="alg-vs-calc">
- <title>Relational Algebra vs. Relational Calculus</title>
-
- <para>
- The relational algebra and the relational calculus have the same
- <firstterm>expressive power</firstterm>; i.e., all queries that
- can be formulated using relational algebra can also be formulated
- using the relational calculus and vice versa.
- This was first proved by E. F. Codd in
- 1972. This proof is based on an algorithm (<quote>Codd's reduction
- algorithm</quote>) by which an arbitrary expression of the relational
- calculus can be reduced to a semantically equivalent expression of
- relational algebra. For a more detailed discussion on that refer to
- <xref linkend="DATE04" endterm="DATE04">
- and
- <xref linkend="ULL88" endterm="ULL88">.
- </para>
-
- <para>
- It is sometimes said that languages based on the relational
- calculus are <quote>higher level</quote> or <quote>more
- declarative</quote> than languages based on relational algebra
- because the algebra (partially) specifies the order of operations
- while the calculus leaves it to a compiler or interpreter to
- determine the most efficient order of evaluation.
- </para>
- </sect2>
- </sect1>
-
- <sect1 id="sql-language">
- <title>The <acronym>SQL</acronym> Language</title>
-
- <para>
- As is the case with most modern relational languages,
- <acronym>SQL</acronym> is based on the tuple
- relational calculus. As a result every query that can be formulated
- using the tuple relational calculus (or equivalently, relational
- algebra) can also be formulated using
- <acronym>SQL</acronym>. There are, however,
- capabilities beyond the scope of relational algebra or calculus. Here
- is a list of some additional features provided by
- <acronym>SQL</acronym> that are not
- part of relational algebra or calculus:
-
- <itemizedlist>
- <listitem>
- <para>
- Commands for insertion, deletion or modification of data.
- </para>
- </listitem>
-
- <listitem>
- <para>
- Arithmetic capability: In <acronym>SQL</acronym> it is possible
- to involve
- arithmetic operations as well as comparisons, e.g.:
-
-<programlisting>
-A < B + 3.
-</programlisting>
-
- Note
- that + or other arithmetic operators appear neither in relational
- algebra nor in relational calculus.
- </para>
- </listitem>
-
- <listitem>
- <para>
- Assignment and Print Commands: It is possible to print a
- relation constructed by a query and to assign a computed relation to a
- relation name.
- </para>
- </listitem>
-
- <listitem>
- <para>
- Aggregate Functions: Operations such as
- <firstterm>average</firstterm>, <firstterm>sum</firstterm>,
- <firstterm>max</firstterm>, etc. can be applied to columns of a
- relation to
- obtain a single quantity.
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <sect2 id="select">
- <title id="select-title">Select</title>
-
- <para>
- The most often used command in <acronym>SQL</acronym> is the
- <command>SELECT</command> statement,
- used to retrieve data. The syntax is:
-
-<synopsis>
-SELECT [ ALL | DISTINCT [ ON ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) ] ]
- * | <replaceable class="PARAMETER">expression</replaceable> [ [ AS ] <replaceable class="PARAMETER">output_name</replaceable> ] [, ...]
- [ INTO [ TEMPORARY | TEMP ] [ TABLE ] <replaceable class="PARAMETER">new_table</replaceable> ]
- [ FROM <replaceable class="PARAMETER">from_item</replaceable> [, ...] ]
- [ WHERE <replaceable class="PARAMETER">condition</replaceable> ]
- [ GROUP BY <replaceable class="PARAMETER">expression</replaceable> [, ...] ]
- [ HAVING <replaceable class="PARAMETER">condition</replaceable> [, ...] ]
- [ { UNION | INTERSECT | EXCEPT } [ ALL ] <replaceable class="PARAMETER">select</replaceable> ]
- [ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
- [ LIMIT { <replaceable class="PARAMETER">count</replaceable> | ALL } ]
- [ OFFSET <replaceable class="PARAMETER">start</replaceable> ]
- [ FOR { UPDATE | SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
-</synopsis>
- </para>
-
- <para>
- Now we will illustrate the complex syntax of the
- <command>SELECT</command> statement with various examples. The
- tables used for the examples are defined in <xref
- linkend="supplier-fig" endterm="supplier-fig">.
- </para>
-
- <sect3>
- <title>Simple Selects</title>
-
- <para>
- Here are some simple examples using a <command>SELECT</command> statement:
-
- <example>
- <title id="simple-query">Simple Query with Qualification</title>
- <para>
- To retrieve all tuples from table PART where the attribute PRICE is
- greater than 10 we formulate the following query:
-
-<programlisting>
-SELECT * FROM PART
- WHERE PRICE > 10;
-</programlisting>
-
- and get the table:
-
-<screen>
- PNO | PNAME | PRICE
------+---------+--------
- 3 | Bolt | 15
- 4 | Cam | 25
-</screen>
- </para>
-
- <para>
- Using <quote>*</quote> in the <command>SELECT</command> statement
- will deliver all attributes from the table. If we want to retrieve
- only the attributes PNAME and PRICE from table PART we use the
- statement:
-
-<programlisting>
-SELECT PNAME, PRICE
- FROM PART
- WHERE PRICE > 10;
-</programlisting>
-
- In this case the result is:
-
-<screen>
- PNAME | PRICE
- --------+--------
- Bolt | 15
- Cam | 25
-</screen>
-
- Note that the <acronym>SQL</acronym> <command>SELECT</command>
- corresponds to the <quote>projection</quote> in relational algebra
- not to the <quote>selection</quote> (see <xref linkend="rel-alg"
- endterm="rel-alg"> for more details).
- </para>
-
- <para>
- The qualifications in the WHERE clause can also be logically connected
- using the keywords OR, AND, and NOT:
-
-<programlisting>
-SELECT PNAME, PRICE
- FROM PART
- WHERE PNAME = 'Bolt' AND
- (PRICE = 0 OR PRICE <= 15);
-</programlisting>
-
- will lead to the result:
-
-<screen>
- PNAME | PRICE
---------+--------
- Bolt | 15
-</screen>
- </para>
-
- <para>
- Arithmetic operations can be used in the target list and in the WHERE
- clause. For example if we want to know how much it would cost if we
- take two pieces of a part we could use the following query:
-
-<programlisting>
-SELECT PNAME, PRICE * 2 AS DOUBLE
- FROM PART
- WHERE PRICE * 2 < 50;
-</programlisting>
-
- and we get:
-
-<screen>
- PNAME | DOUBLE
---------+---------
- Screw | 20
- Nut | 16
- Bolt | 30
-</screen>
-
- Note that the word DOUBLE after the keyword AS is the new title of the
- second column. This technique can be used for every element of the
- target list to assign a new title to the resulting
- column. This new title
- is often referred to as alias. The alias cannot be used throughout the
- rest of the query.
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Joins</title>
-
- <para id="simple-join">
- The following example shows how <firstterm>joins</firstterm> are
- realized in <acronym>SQL</acronym>.
- </para>
-
- <para>
- To join the three tables SUPPLIER, PART and SELLS over their common
- attributes we formulate the following statement:
-
-<programlisting>
-SELECT S.SNAME, P.PNAME
- FROM SUPPLIER S, PART P, SELLS SE
- WHERE S.SNO = SE.SNO AND
- P.PNO = SE.PNO;
-</programlisting>
-
- and get the following table as a result:
-
-<screen>
- SNAME | PNAME
--------+-------
- Smith | Screw
- Smith | Nut
- Jones | Cam
- Adams | Screw
- Adams | Bolt
- Blake | Nut
- Blake | Bolt
- Blake | Cam
-</screen>
- </para>
-
- <para>
- In the FROM clause we introduced an alias name for every relation
- because there are common named attributes (SNO and PNO) among the
- relations. Now we can distinguish between the common named attributes
- by simply prefixing the attribute name with the alias name followed by
- a dot. The join is calculated in the same way as shown in
- <xref linkend="join-example" endterm="join-example">.
- First the Cartesian product
-
- SUPPLIER × PART × SELLS
-
- is derived. Now only those tuples satisfying the
- conditions given in the WHERE clause are selected (i.e., the common
- named attributes have to be equal). Finally we project out all
- columns but S.SNAME and P.PNAME.
- </para>
-
- <para>
- Another way to perform joins is to use the SQL JOIN syntax as follows:
-<programlisting>
-SELECT sname, pname from supplier
- JOIN sells USING (sno)
- JOIN part USING (pno);
-</programlisting>
- giving again:
-<screen>
- sname | pname
--------+-------
- Smith | Screw
- Adams | Screw
- Smith | Nut
- Blake | Nut
- Adams | Bolt
- Blake | Bolt
- Jones | Cam
- Blake | Cam
-(8 rows)
-</screen>
- </para>
-
- <para>
- A joined table, created using JOIN syntax, is a table reference list
- item that occurs in a FROM clause and before any WHERE, GROUP BY,
- or HAVING clause. Other table references, including table names or
- other JOIN clauses, can be included in the FROM clause if separated
- by commas. JOINed tables are logically like any other
- table listed in the FROM clause.
- </para>
-
- <para>
- SQL JOINs come in two main types, CROSS JOINs (unqualified joins)
- and <firstterm>qualified JOINs</>. Qualified joins can be further
- subdivided based on the way in which the <firstterm>join condition</>
- is specified (ON, USING, or NATURAL) and the way in which it is
- applied (INNER or OUTER join).
- </para>
-
- <variablelist>
- <title>Join Types</title>
- <varlistentry>
- <term>CROSS JOIN</term>
- <listitem>
- <cmdsynopsis>
- <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg>
- <command> CROSS JOIN </command>
- <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg>
- </cmdsynopsis>
-
- <para>
- A cross join takes two tables T1 and T2 having N and M rows
- respectively, and returns a joined table containing all
- N*M possible joined rows. For each row R1 of T1, each row
- R2 of T2 is joined with R1 to yield a joined table row JR
- consisting of all fields in R1 and R2. A CROSS JOIN is
- equivalent to an INNER JOIN ON TRUE.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>Qualified JOINs</term>
- <listitem>
-
- <cmdsynopsis>
- <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg>
- <arg choice="opt"> NATURAL </arg>
- <group choice="opt">
- <arg choice="opt"> INNER </arg>
- <arg choice="plain">
- <group choice="req">
- <arg choice="plain"> LEFT </arg>
- <arg choice="plain"> RIGHT </arg>
- <arg choice="plain"> FULL </arg>
- </group>
- <arg choice="opt"> OUTER </arg>
- </arg>
- </group>
- <command> JOIN </command>
- <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg>
- <group choice="req">
- <arg choice="plain"> ON <replaceable>search condition</replaceable></arg>
- <arg choice="plain"> USING ( <replaceable>join column list</replaceable> ) </arg>
- </group>
- </cmdsynopsis>
-
- <para>
- A qualified JOIN must specify its join condition
- by providing one (and only one) of NATURAL, ON, or
- USING. The ON clause
- takes a <replaceable>search condition</replaceable>,
- which is the same as in a WHERE clause. The USING
- clause takes a comma-separated list of column names,
- which the joined tables must have in common, and joins
- the tables on equality of those columns. NATURAL is
- shorthand for a USING clause that lists all the common
- column names of the two tables. A side-effect of both
- USING and NATURAL is that only one copy of each joined
- column is emitted into the result table (compare the
- relational-algebra definition of JOIN, shown earlier).
- </para>
-
- <!-- begin join semantics -->
- <variablelist>
- <varlistentry>
- <term>
- <cmdsynopsis>
- <arg choice="opt"> INNER </arg>
- <command> JOIN </command>
- </cmdsynopsis>
- </term>
- <listitem>
- <para>
- For each row R1 of T1, the joined table has a row for each row
- in T2 that satisfies the join condition with R1.
- </para>
- <tip>
- <para>
- The words INNER and OUTER are optional for all JOINs.
- INNER is the default. LEFT, RIGHT, and FULL imply an
- OUTER JOIN.
- </para>
- </tip>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>
- <cmdsynopsis>
- <arg choice="plain"> LEFT </arg>
- <arg choice="opt"> OUTER </arg>
- <command> JOIN </command>
- </cmdsynopsis>
- </term>
- <listitem>
- <para>
- First, an INNER JOIN is performed.
- Then, for each row in T1 that does not satisfy the join
- condition with any row in T2, an additional joined row is
- returned with null fields in the columns from T2.
- </para>
- <tip>
- <para>
- The joined table unconditionally has a row for each row in T1.
- </para>
- </tip>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>
- <cmdsynopsis>
- <arg choice="plain"> RIGHT </arg>
- <arg choice="opt"> OUTER </arg>
- <command> JOIN </command>
- </cmdsynopsis>
- </term>
- <listitem>
- <para>
- First, an INNER JOIN is performed.
- Then, for each row in T2 that does not satisfy the join
- condition with any row in T1, an additional joined row is
- returned with null fields in the columns from T1.
- </para>
- <tip>
- <para>
- The joined table unconditionally has a row for each row in T2.
- </para>
- </tip>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>
- <cmdsynopsis>
- <arg choice="plain"> FULL </arg>
- <arg choice="opt"> OUTER </arg>
- <command> JOIN </command>
- </cmdsynopsis>
- </term>
- <listitem>
- <para>
- First, an INNER JOIN is performed.
- Then, for each row in T1 that does not satisfy the join
- condition with any row in T2, an additional joined row is
- returned with null fields in the columns from T2.
- Also, for each row in T2 that does not satisfy the join
- condition with any row in T1, an additional joined row is
- returned with null fields in the columns from T1.
- </para>
- <tip>
- <para>
- The joined table unconditionally has a row for every row of T1
- and a row for every row of T2.
- </para>
- </tip>
- </listitem>
- </varlistentry>
- </variablelist>
- <!-- end join semantics -->
-
- </listitem>
- </varlistentry>
- </variablelist>
-
- <para>
- JOINs of all types can be chained together or nested where either or both of
- <replaceable class="parameter">T1</replaceable> and
- <replaceable class="parameter">T2</replaceable> can be JOINed tables.
- Parenthesis can be used around JOIN clauses to control the order
- of JOINs which are otherwise processed left to right.
- </para>
-
- </sect3>
-
- <sect3>
- <title id="aggregates-tutorial">Aggregate Functions</title>
-
- <para>
- <acronym>SQL</acronym> provides aggregate functions such as AVG,
- COUNT, SUM, MIN, and MAX. The argument(s) of an aggregate function
- are evaluated at each row that satisfies the WHERE
- clause, and the aggregate function is calculated over this set
- of input values. Normally, an aggregate delivers a single
- result for a whole <command>SELECT</command> statement. But if
- grouping is specified in the query, then a separate calculation
- is done over the rows of each group, and an aggregate result is
- delivered per group (see next section).
-
- <example>
- <title id="aggregates-example">Aggregates</title>
-
- <para>
- If we want to know the average cost of all parts in table PART we use
- the following query:
-
-<programlisting>
-SELECT AVG(PRICE) AS AVG_PRICE
- FROM PART;
-</programlisting>
- </para>
-
- <para>
- The result is:
-
-<screen>
- AVG_PRICE
------------
- 14.5
-</screen>
- </para>
-
- <para>
- If we want to know how many parts are defined in table PART we use
- the statement:
-
-<programlisting>
-SELECT COUNT(PNO)
- FROM PART;
-</programlisting>
-
- and get:
-
-<screen>
- COUNT
--------
- 4
-</screen>
-
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Aggregation by Groups</title>
-
- <para>
- <acronym>SQL</acronym> allows one to partition the tuples of a table
- into groups. Then the
- aggregate functions described above can be applied to the groups —
- i.e., the value of the aggregate function is no longer calculated over
- all the values of the specified column but over all values of a
- group. Thus the aggregate function is evaluated separately for every
- group.
- </para>
-
- <para>
- The partitioning of the tuples into groups is done by using the
- keywords <command>GROUP BY</command> followed by a list of
- attributes that define the
- groups. If we have
- <command>GROUP BY A<subscript>1</subscript>, ⃛, A<subscript>k</subscript></command>
- we partition
- the relation into groups, such that two tuples are in the same group
- if and only if they agree on all the attributes
- A<subscript>1</subscript>, ⃛, A<subscript>k</subscript>.
-
- <example>
- <title id="aggregates-groupby">Aggregates</title>
- <para>
- If we want to know how many parts are sold by every supplier we
- formulate the query:
-
-<programlisting>
-SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
- FROM SUPPLIER S, SELLS SE
- WHERE S.SNO = SE.SNO
- GROUP BY S.SNO, S.SNAME;
-</programlisting>
-
- and get:
-
-<screen>
- SNO | SNAME | COUNT
------+-------+-------
- 1 | Smith | 2
- 2 | Jones | 1
- 3 | Adams | 2
- 4 | Blake | 3
-</screen>
- </para>
-
- <para>
- Now let's have a look of what is happening here.
- First the join of the
- tables SUPPLIER and SELLS is derived:
-
-<screen>
- S.SNO | S.SNAME | SE.PNO
--------+---------+--------
- 1 | Smith | 1
- 1 | Smith | 2
- 2 | Jones | 4
- 3 | Adams | 1
- 3 | Adams | 3
- 4 | Blake | 2
- 4 | Blake | 3
- 4 | Blake | 4
-</screen>
- </para>
-
- <para>
- Next we partition the tuples into groups by putting all tuples
- together that agree on both attributes S.SNO and S.SNAME:
-
-<screen>
- S.SNO | S.SNAME | SE.PNO
--------+---------+--------
- 1 | Smith | 1
- | 2
---------------------------
- 2 | Jones | 4
---------------------------
- 3 | Adams | 1
- | 3
---------------------------
- 4 | Blake | 2
- | 3
- | 4
-</screen>
- </para>
-
- <para>
- In our example we got four groups and now we can apply the aggregate
- function COUNT to every group leading to the final result of the query
- given above.
- </para>
- </example>
- </para>
-
- <para>
- Note that for a query using GROUP BY and aggregate
- functions to make sense, the target list can only refer directly to
- the attributes being grouped by. Other attributes can only be used
- inside the arguments of aggregate functions. Otherwise there would
- not be a unique value to associate with the other attributes.
- </para>
-
- <para>
- Also observe that it makes no sense to ask for an aggregate of
- an aggregate, e.g., AVG(MAX(sno)), because a
- <command>SELECT</command> only does one pass of grouping and
- aggregation. You can get a result of this kind by using a
- temporary table or a sub-SELECT in the FROM clause to do the
- first level of aggregation.
- </para>
- </sect3>
-
- <sect3>
- <title>Having</title>
-
- <para>
- The HAVING clause works much like the WHERE clause and is used to
- consider only those groups satisfying the qualification given in the
- HAVING clause. Essentially, WHERE filters out unwanted input rows
- before grouping and aggregation are done, whereas HAVING filters out
- unwanted group rows post-GROUP. Therefore, WHERE cannot refer to the
- results of aggregate functions. On the other hand, there's no point
- in writing a HAVING condition that doesn't involve an aggregate
- function! If your condition doesn't involve aggregates, you might
- as well write it in WHERE, and thereby avoid the computation of
- aggregates for groups that you're just going to throw away anyway.
-
- <example>
- <title id="having-example">Having</title>
-
- <para>
- If we want only those suppliers selling more than one part we use the
- query:
-
-<programlisting>
-SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
- FROM SUPPLIER S, SELLS SE
- WHERE S.SNO = SE.SNO
- GROUP BY S.SNO, S.SNAME
- HAVING COUNT(SE.PNO) > 1;
-</programlisting>
-
- and get:
-
-<screen>
- SNO | SNAME | COUNT
------+-------+-------
- 1 | Smith | 2
- 3 | Adams | 2
- 4 | Blake | 3
-</screen>
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Subqueries</title>
-
- <para>
- In the WHERE and HAVING clauses the use of subqueries (subselects) is
- allowed in every place where a value is expected. In this case the
- value must be derived by evaluating the subquery first. The usage of
- subqueries extends the expressive power of
- <acronym>SQL</acronym>.
-
- <example>
- <title id="subselect-example">Subselect</title>
-
- <para>
- If we want to know all parts having a greater price than the part
- named 'Screw' we use the query:
-
-<programlisting>
-SELECT *
- FROM PART
- WHERE PRICE > (SELECT PRICE FROM PART
- WHERE PNAME='Screw');
-</programlisting>
- </para>
-
- <para>
- The result is:
-
-<screen>
- PNO | PNAME | PRICE
------+---------+--------
- 3 | Bolt | 15
- 4 | Cam | 25
-</screen>
- </para>
-
- <para>
- When we look at the above query we can see the keyword
- <command>SELECT</command> two times. The first one at the
- beginning of the query - we will refer to it as outer
- <command>SELECT</command> - and the one in the WHERE clause which
- begins a nested query - we will refer to it as inner
- <command>SELECT</command>. For every tuple of the outer
- <command>SELECT</command> the inner <command>SELECT</command> has
- to be evaluated. After every evaluation we know the price of the
- tuple named 'Screw' and we can check if the price of the actual
- tuple is greater. (Actually, in this example the inner query need
- only be evaluated once, since it does not depend on the state of
- the outer query.)
- </para>
-
- <para>
- If we want to know all suppliers that do not sell any part
- (e.g., to be able to remove these suppliers from the database) we use:
-
-<programlisting>
-SELECT *
- FROM SUPPLIER S
- WHERE NOT EXISTS
- (SELECT * FROM SELLS SE
- WHERE SE.SNO = S.SNO);
-</programlisting>
- </para>
-
- <para>
- In our example the result will be empty because every supplier
- sells at least one part. Note that we use S.SNO from the outer
- <command>SELECT</command> within the WHERE clause of the inner
- <command>SELECT</command>. Here the subquery must be evaluated
- afresh for each tuple from the outer query, i.e., the value for
- S.SNO is always taken from the current tuple of the outer
- <command>SELECT</command>.
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Subqueries in FROM</title>
-
- <para>
- A somewhat different way of using subqueries is to put them in the
- FROM clause. This is a useful feature because a subquery of this
- kind can output multiple columns and rows, whereas a subquery used
- in an expression must deliver just a single result. It also lets
- us get more than one round of grouping/aggregation without resorting
- to a temporary table.
-
- <example>
- <title id="subselect-in-from-example">Subselect in FROM</title>
-
- <para>
- If we want to know the highest average part price among all our
- suppliers, we cannot write MAX(AVG(PRICE)), but we can write:
-
-<programlisting>
-SELECT MAX(subtable.avgprice)
- FROM (SELECT AVG(P.PRICE) AS avgprice
- FROM SUPPLIER S, PART P, SELLS SE
- WHERE S.SNO = SE.SNO AND
- P.PNO = SE.PNO
- GROUP BY S.SNO) subtable;
-</programlisting>
-
- The subquery returns one row per supplier (because of its GROUP BY)
- and then we aggregate over those rows in the outer query.
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Union, Intersect, Except</title>
-
- <para>
- These operations calculate the union, intersection and set theoretic
- difference of the tuples derived by two subqueries.
-
- <example>
- <title id="union-example">Union, Intersect, Except</title>
-
- <para>
- The following query is an example for UNION:
-
-<programlisting>
-SELECT S.SNO, S.SNAME, S.CITY
- FROM SUPPLIER S
- WHERE S.SNAME = 'Jones'
-UNION
- SELECT S.SNO, S.SNAME, S.CITY
- FROM SUPPLIER S
- WHERE S.SNAME = 'Adams';
-</programlisting>
-
-gives the result:
-
-<screen>
- SNO | SNAME | CITY
------+-------+--------
- 2 | Jones | Paris
- 3 | Adams | Vienna
-</screen>
- </para>
-
- <para>
- Here is an example for INTERSECT:
-
-<programlisting>
-SELECT S.SNO, S.SNAME, S.CITY
- FROM SUPPLIER S
- WHERE S.SNO > 1
-INTERSECT
- SELECT S.SNO, S.SNAME, S.CITY
- FROM SUPPLIER S
- WHERE S.SNO < 3;
-</programlisting>
-
- gives the result:
-
-<screen>
- SNO | SNAME | CITY
------+-------+--------
- 2 | Jones | Paris
-</screen>
-
- The only tuple returned by both parts of the query is the one having SNO=2.
- </para>
-
- <para>
- Finally an example for EXCEPT:
-
-<programlisting>
-SELECT S.SNO, S.SNAME, S.CITY
- FROM SUPPLIER S
- WHERE S.SNO > 1
-EXCEPT
- SELECT S.SNO, S.SNAME, S.CITY
- FROM SUPPLIER S
- WHERE S.SNO > 3;
-</programlisting>
-
- gives the result:
-
-<screen>
- SNO | SNAME | CITY
------+-------+--------
- 2 | Jones | Paris
- 3 | Adams | Vienna
-</screen>
- </para>
- </example>
- </para>
- </sect3>
- </sect2>
-
- <sect2 id="datadef">
- <title>Data Definition</title>
-
- <para>
- There is a set of commands used for data definition included in the
- <acronym>SQL</acronym> language.
- </para>
-
- <sect3 id="create">
- <title id="create-title">Create Table</title>
-
- <para>
- The most fundamental command for data definition is the
- one that creates a new relation (a new table). The syntax of the
- <command>CREATE TABLE</command> command is:
-
-<synopsis>
-CREATE TABLE <replaceable class="parameter">table_name</replaceable>
- (<replaceable class="parameter">name_of_attr_1</replaceable> <replaceable class="parameter">type_of_attr_1</replaceable>
- [, <replaceable class="parameter">name_of_attr_2</replaceable> <replaceable class="parameter">type_of_attr_2</replaceable>
- [, ...]]);
-</synopsis>
-
- <example>
- <title id="table-create">Table Creation</title>
-
- <para>
- To create the tables defined in
- <xref linkend="supplier-fig" endterm="supplier-fig"> the
- following <acronym>SQL</acronym> statements are used:
-
-<programlisting>
-CREATE TABLE SUPPLIER
- (SNO INTEGER,
- SNAME VARCHAR(20),
- CITY VARCHAR(20));
-</programlisting>
-
-<programlisting>
-CREATE TABLE PART
- (PNO INTEGER,
- PNAME VARCHAR(20),
- PRICE DECIMAL(4 , 2));
-</programlisting>
-
-<programlisting>
-CREATE TABLE SELLS
- (SNO INTEGER,
- PNO INTEGER);
-</programlisting>
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Data Types in <acronym>SQL</acronym></title>
-
- <para>
- The following is a list of some data types that are supported by
- <acronym>SQL</acronym>:
-
- <itemizedlist>
- <listitem>
- <para>
- INTEGER: signed fullword binary integer (31 bits precision).
- </para>
- </listitem>
-
- <listitem>
- <para>
- SMALLINT: signed halfword binary integer (15 bits precision).
- </para>
- </listitem>
-
- <listitem>
- <para>
- DECIMAL (<replaceable class="parameter">p</replaceable>[,<replaceable class="parameter">q</replaceable>]):
- signed packed decimal number of up to
- <replaceable class="parameter">p</replaceable>
- digits, with
- <replaceable class="parameter">q</replaceable>
- digits to the right of the decimal point.
- If <replaceable class="parameter">q</replaceable>
- is omitted it is assumed to be 0.
- </para>
- </listitem>
-
- <listitem>
- <para>
- FLOAT: signed doubleword floating point number.
- </para>
- </listitem>
-
- <listitem>
- <para>
- VARCHAR(<replaceable class="parameter">n</replaceable>):
- varying length character string of maximum length
- <replaceable class="parameter">n</replaceable>.
- </para>
- </listitem>
-
- <listitem>
- <para>
- CHAR(<replaceable class="parameter">n</replaceable>):
- fixed length character string of length
- <replaceable class="parameter">n</replaceable>.
- </para>
- </listitem>
-
- </itemizedlist>
- </para>
- </sect3>
-
- <sect3>
- <title>Create Index</title>
-
- <para>
- Indexes are used to speed up access to a relation. If a relation <classname>R</classname>
- has an index on attribute <classname>A</classname> then we can
- retrieve all tuples <replaceable>t</replaceable>
- having
- <replaceable>t</replaceable>(<classname>A</classname>) = <replaceable>a</replaceable>
- in time roughly proportional to the number of such
- tuples <replaceable>t</replaceable>
- rather than in time proportional to the size of <classname>R</classname>.
- </para>
-
- <para>
- To create an index in <acronym>SQL</acronym>
- the <command>CREATE INDEX</command> command is used. The syntax is:
-
-<programlisting>
-CREATE INDEX <replaceable class="parameter">index_name</replaceable>
- ON <replaceable class="parameter">table_name</replaceable> ( <replaceable class="parameter">name_of_attribute</replaceable> );
-</programlisting>
- </para>
-
- <para>
- <example>
- <title id="index-create">Create Index</title>
-
- <para>
- To create an index named I on attribute SNAME of relation SUPPLIER
- we use the following statement:
-
-<programlisting>
-CREATE INDEX I ON SUPPLIER (SNAME);
-</programlisting>
- </para>
-
- <para>
- The created index is maintained automatically, i.e., whenever a new
- tuple is inserted into the relation SUPPLIER the index I is
- adapted. Note that the only changes a user can perceive when an
- index is present are increased speed for <command>SELECT</command>
- and decreases in speed of updates.
- </para>
- </example>
- </para>
- </sect3>
-
- <sect3>
- <title>Create View</title>
-
- <para>
- A view can be regarded as a <firstterm>virtual table</firstterm>,
- i.e., a table that
- does not <emphasis>physically</emphasis> exist in the database
- but looks to the user
- as if it does. By contrast, when we talk of a
- <firstterm>base table</firstterm> there is
- really a physically stored counterpart of each row of the table
- somewhere in the physical storage.
- </para>
-
- <para>
- Views do not have their own, physically separate, distinguishable
- stored data. Instead, the system stores the definition of the
- view (i.e., the rules about how to access physically stored base
- tables in order to materialize the view) somewhere in the system
- catalogs (see
- <xref linkend="tutorial-catalogs-title" endterm="tutorial-catalogs-title">). For a
- discussion on different techniques to implement views refer to
-<!--
- section
- <xref linkend="view-impl" endterm="view-impl">.
--->
- <citetitle>SIM98</citetitle>.
- </para>
-
- <para>
- In <acronym>SQL</acronym> the <command>CREATE VIEW</command>
- command is used to define a view. The syntax
- is:
-
-<programlisting>
-CREATE VIEW <replaceable class="parameter">view_name</replaceable>
- AS <replaceable class="parameter">select_stmt</replaceable>
-</programlisting>
-
- where <replaceable class="parameter">select_stmt</replaceable>
- is a valid select statement as defined
- in <xref linkend="select-title" endterm="select-title">.
- Note that <replaceable class="parameter">select_stmt</replaceable> is
- not executed when the view is created. It is just stored in the
- <firstterm>system catalogs</firstterm>
- and is executed whenever a query against the view is made.
- </para>
-
- <para>
- Let the following view definition be given (we use
- the tables from
- <xref linkend="supplier-fig" endterm="supplier-fig"> again):
-
-<programlisting>
-CREATE VIEW London_Suppliers
- AS SELECT S.SNAME, P.PNAME
- FROM SUPPLIER S, PART P, SELLS SE
- WHERE S.SNO = SE.SNO AND
- P.PNO = SE.PNO AND
- S.CITY = 'London';
-</programlisting>
- </para>
-
- <para>
- Now we can use this <firstterm>virtual relation</firstterm>
- <classname>London_Suppliers</classname> as
- if it were another base table:
-
-<programlisting>
-SELECT * FROM London_Suppliers
- WHERE PNAME = 'Screw';
-</programlisting>
-
- which will return the following table:
-
-<screen>
- SNAME | PNAME
--------+-------
- Smith | Screw
-</screen>
- </para>
-
- <para>
- To calculate this result the database system has to do a
- <emphasis>hidden</emphasis>
- access to the base tables SUPPLIER, SELLS and PART first. It
- does so by executing the query given in the view definition against
- those base tables. After that the additional qualifications
- (given in the
- query against the view) can be applied to obtain the resulting
- table.
- </para>
- </sect3>
-
- <sect3>
- <title>Drop Table, Drop Index, Drop View</title>
-
- <para>
- To destroy a table (including all tuples stored in that table) the
- <command>DROP TABLE</command> command is used:
-
-<programlisting>
-DROP TABLE <replaceable class="parameter">table_name</replaceable>;
-</programlisting>
- </para>
-
- <para>
- To destroy the SUPPLIER table use the following statement:
-
-<programlisting>
-DROP TABLE SUPPLIER;
-</programlisting>
- </para>
-
- <para>
- The <command>DROP INDEX</command> command is used to destroy an index:
-
-<programlisting>
-DROP INDEX <replaceable class="parameter">index_name</replaceable>;
-</programlisting>
- </para>
-
- <para>
- Finally to destroy a given view use the command <command>DROP
- VIEW</command>:
-
-<programlisting>
-DROP VIEW <replaceable class="parameter">view_name</replaceable>;
-</programlisting>
- </para>
- </sect3>
- </sect2>
-
- <sect2>
- <title>Data Manipulation</title>
-
- <sect3>
- <title>Insert Into</title>
-
- <para>
- Once a table is created (see
- <xref linkend="create-title" endterm="create-title">), it can be filled
- with tuples using the command <command>INSERT INTO</command>.
- The syntax is:
-
-<programlisting>
-INSERT INTO <replaceable class="parameter">table_name</replaceable> (<replaceable class="parameter">name_of_attr_1</replaceable>
- [, <replaceable class="parameter">name_of_attr_2</replaceable> [, ...]])
- VALUES (<replaceable class="parameter">val_attr_1</replaceable> [, <replaceable class="parameter">val_attr_2</replaceable> [, ...]]);
-</programlisting>
- </para>
-
- <para>
- To insert the first tuple into the relation SUPPLIER (from
- <xref linkend="supplier-fig" endterm="supplier-fig">) we use the
- following statement:
-
-<programlisting>
-INSERT INTO SUPPLIER (SNO, SNAME, CITY)
- VALUES (1, 'Smith', 'London');
-</programlisting>
- </para>
-
- <para>
- To insert the first tuple into the relation SELLS we use:
-
-<programlisting>
-INSERT INTO SELLS (SNO, PNO)
- VALUES (1, 1);
-</programlisting>
- </para>
- </sect3>
-
- <sect3>
- <title>Update</title>
-
- <para>
- To change one or more attribute values of tuples in a relation the
- <command>UPDATE</command> command is used. The syntax is:
-
-<programlisting>
-UPDATE <replaceable class="parameter">table_name</replaceable>
- SET <replaceable class="parameter">name_of_attr_1</replaceable> = <replaceable class="parameter">value_1</replaceable>
- [, ... [, <replaceable class="parameter">name_of_attr_k</replaceable> = <replaceable class="parameter">value_k</replaceable>]]
- WHERE <replaceable class="parameter">condition</replaceable>;
-</programlisting>
- </para>
-
- <para>
- To change the value of attribute PRICE of the part 'Screw' in the
- relation PART we use:
-
-<programlisting>
-UPDATE PART
- SET PRICE = 15
- WHERE PNAME = 'Screw';
-</programlisting>
- </para>
-
- <para>
- The new value of attribute PRICE of the tuple whose name is 'Screw' is
- now 15.
- </para>
- </sect3>
-
- <sect3>
- <title>Delete</title>
-
- <para>
- To delete a tuple from a particular table use the command DELETE
- FROM. The syntax is:
-
-<programlisting>
-DELETE FROM <replaceable class="parameter">table_name</replaceable>
- WHERE <replaceable class="parameter">condition</replaceable>;
-</programlisting>
- </para>
-
- <para>
- To delete the supplier called 'Smith' of the table SUPPLIER the
- following statement is used:
-
-<programlisting>
-DELETE FROM SUPPLIER
- WHERE SNAME = 'Smith';
-</programlisting>
- </para>
- </sect3>
- </sect2>
-
- <sect2 id="tutorial-catalogs">
- <title id="tutorial-catalogs-title">System Catalogs</title>
-
- <para>
- In every <acronym>SQL</acronym> database system
- <firstterm>system catalogs</firstterm> are used to keep
- track of which tables, views indexes etc. are defined in the
- database. These system catalogs can be queried as if they were normal
- relations. For example there is one catalog used for the definition of
- views. This catalog stores the query from the view definition. Whenever
- a query against a view is made, the system first gets the
- <firstterm>view definition query</firstterm> out of the catalog
- and materializes the view
- before proceeding with the user query (see
-<!--
- section
- <xref linkend="view-impl" endterm="view-impl">.
- <citetitle>SIM98</citetitle>
--->
- <xref linkend="SIM98" endterm="SIM98">
- for a more detailed
- description). For more information about system catalogs refer to
- <xref linkend="DATE04" endterm="DATE04">.
- </para>
- </sect2>
-
- <sect2>
- <title>Embedded <acronym>SQL</acronym></title>
-
- <para>
- In this section we will sketch how <acronym>SQL</acronym> can be
- embedded into a host language (e.g., <literal>C</literal>).
- There are two main reasons why we want to use <acronym>SQL</acronym>
- from a host language:
-
- <itemizedlist>
- <listitem>
- <para>
- There are queries that cannot be formulated using pure <acronym>SQL</acronym>
- (i.e., recursive queries). To be able to perform such queries we need a
- host language with a greater expressive power than
- <acronym>SQL</acronym>.
- </para>
- </listitem>
-
- <listitem>
- <para>
- We simply want to access a database from some application that
- is written in the host language (e.g., a ticket reservation system
- with a graphical user interface is written in C and the information
- about which tickets are still left is stored in a database that can be
- accessed using embedded <acronym>SQL</acronym>).
- </para>
- </listitem>
- </itemizedlist>
- </para>
-
- <para>
- A program using embedded <acronym>SQL</acronym>
- in a host language consists of statements
- of the host language and of
- <firstterm>embedded <acronym>SQL</acronym></firstterm>
- (<acronym>ESQL</acronym>) statements. Every <acronym>ESQL</acronym>
- statement begins with the keywords <command>EXEC SQL</command>.
- The <acronym>ESQL</acronym> statements are
- transformed to statements of the host language
- by a <firstterm>precompiler</firstterm>
- (which usually inserts
- calls to library routines that perform the various <acronym>SQL</acronym>
- commands).
- </para>
-
- <para>
- When we look at the examples throughout
- <xref linkend="select-title" endterm="select-title"> we
- realize that the result of the queries is very often a set of
- tuples. Most host languages are not designed to operate on sets so we
- need a mechanism to access every single tuple of the set of tuples
- returned by a SELECT statement. This mechanism can be provided by
- declaring a <firstterm>cursor</firstterm>.
- After that we can use the <command>FETCH</command> command to
- retrieve a tuple and set the cursor to the next tuple.
- </para>
-
- <para>
- For a detailed discussion on embedded <acronym>SQL</acronym>
- refer to
- <xref linkend="DATE97" endterm="DATE97">,
- <xref linkend="DATE04" endterm="DATE04">,
- or
- <xref linkend="ULL88" endterm="ULL88">.
- </para>
- </sect2>
- </sect1>
- </chapter>