2 $Header: /cvsroot/pgsql/doc/src/sgml/sql.sgml,v 1.10 2000/06/14 13:10:48 thomas Exp $
10 This chapter introduces the mathematical concepts behind
11 relational databases. It is not required reading, so if you bog
12 down or want to get straight to some simple examples feel free to
13 jump ahead to the next chapter and come back when you have more
14 time and patience. This stuff is supposed to be fun!
18 This material originally appeared as a part of
19 Stefan Simkovics' Master's Thesis
20 (<xref linkend="SIM98" endterm="SIM98">).
25 <acronym>SQL</acronym> has become the most popular relational query
27 The name "<acronym>SQL</acronym>" is an abbreviation for
28 <firstterm>Structured Query Language</firstterm>.
29 In 1974 Donald Chamberlin and others defined the
30 language SEQUEL (<firstterm>Structured English Query
31 Language</firstterm>) at IBM
32 Research. This language was first implemented in an IBM
33 prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version
34 of SEQUEL called SEQUEL/2 was defined and the name was changed to
35 <acronym>SQL</acronym>
40 A new prototype called System R was developed by IBM in 1977. System R
41 implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>)
43 changes were made to <acronym>SQL</acronym> during the project.
44 System R was installed in
45 a number of user sites, both internal IBM sites and also some selected
46 customer sites. Thanks to the success and acceptance of System R at
47 those user sites IBM started to develop commercial products that
48 implemented the <acronym>SQL</acronym> language based on the System
53 Over the next years IBM and also a number of other vendors announced
54 <acronym>SQL</acronym> products such as
55 <productname>SQL/DS</productname> (IBM),
56 <productname>DB2</productname> (IBM),
57 <productname>ORACLE</productname> (Oracle Corp.),
58 <productname>DG/SQL</productname> (Data General Corp.),
59 and <productname>SYBASE</productname> (Sybase Inc.).
63 <acronym>SQL</acronym> is also an official standard now. In 1982
65 Standards Institute (<acronym>ANSI</acronym>) chartered its
66 Database Committee X3H2 to
67 develop a proposal for a standard relational language. This proposal
68 was ratified in 1986 and consisted essentially of the IBM dialect of
69 <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym>
70 standard was also accepted as an international
71 standard by the International Organization for Standardization
72 (<acronym>ISO</acronym>).
73 This original standard version of <acronym>SQL</acronym> is often
75 informally, as "<abbrev>SQL/86</abbrev>". In 1989 the original
77 and this new standard is often, again informally, referred to as
78 "<abbrev>SQL/89</abbrev>". Also in 1989, a related standard called
79 <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm>
80 (<acronym>ESQL</acronym>) was developed.
84 The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees
85 have been working for many years on the
86 definition of a greatly expanded version of the original standard,
87 referred to informally as <firstterm><acronym>SQL2</acronym></firstterm>
88 or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a
89 ratified standard - "International Standard ISO/IEC 9075:1992,
90 Database Language <acronym>SQL</acronym>" - in late 1992.
91 <acronym>SQL/92</acronym> is the version
92 normally meant when people refer to "the <acronym>SQL</acronym>
94 description of <acronym>SQL/92</acronym> is given in
95 <xref linkend="DATE97" endterm="DATE97">. At the time of
96 writing this document a new standard informally referred to
97 as <firstterm><acronym>SQL3</acronym></firstterm>
98 is under development. It is planned to make <acronym>SQL</acronym>
100 language, i.e. all computable queries (e.g. recursive queries) will be
101 possible. This is a very complex task and therefore the completion of
102 the new standard can not be expected before 1999.
105 <sect1 id="rel-model">
106 <title>The Relational Data Model</title>
109 As mentioned before, <acronym>SQL</acronym> is a relational
110 language. That means it is
111 based on the <firstterm>relational data model</firstterm>
112 first published by E.F. Codd in
113 1970. We will give a formal description of the relational model
115 <xref linkend="formal-notion" endterm="formal-notion">)
116 but first we want to have a look at it from a more intuitive
121 A <firstterm>relational database</firstterm> is a database that is
123 users as a <firstterm>collection of tables</firstterm> (and
124 nothing else but tables).
125 A table consists of rows and columns where each row represents a
126 record and each column represents an attribute of the records
127 contained in the table.
128 <xref linkend="supplier-fig" endterm="supplier-fig">
129 shows an example of a database consisting of three tables:
134 SUPPLIER is a table storing the number
135 (SNO), the name (SNAME) and the city (CITY) of a supplier.
141 PART is a table storing the number (PNO) the name (PNAME) and
142 the price (PRICE) of a part.
148 SELLS stores information about which part (PNO) is sold by which
150 It serves in a sense to connect the other two tables together.
156 <title id="supplier-fig">The Suppliers and Parts Database</title>
159 SNO | SNAME | CITY SNO | PNO
160 ----+---------+-------- -----+-----
161 1 | Smith | London 1 | 1
162 2 | Jones | Paris 1 | 2
163 3 | Adams | Vienna 2 | 4
164 4 | Blake | Rome 3 | 1
168 PNO | PNAME | PRICE 4 | 4
169 ----+---------+---------
179 The tables PART and SUPPLIER may be regarded as
180 <firstterm>entities</firstterm> and
181 SELLS may be regarded as a <firstterm>relationship</firstterm>
183 part and a particular supplier.
187 As we will see later, <acronym>SQL</acronym> operates on tables
189 defined but before that we will study the theory of the relational
195 <title id="formal-notion">Relational Data Model Formalities</title>
198 The mathematical concept underlying the relational model is the
199 set-theoretic <firstterm>relation</firstterm> which is a subset of
201 product of a list of domains. This set-theoretic relation gives
202 the model its name (do not confuse it with the relationship from the
203 <firstterm>Entity-Relationship model</firstterm>).
204 Formally a domain is simply a set of
205 values. For example the set of integers is a domain. Also the set of
206 character strings of length 20 and the real numbers are examples of
213 The <firstterm>Cartesian product</firstterm> of domains $D_{1},
214 D_{2},\ldots, D_{k}$ written
215 \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set of
216 all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in
217 D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}.
220 The <firstterm>Cartesian product</firstterm> of domains
221 <parameter>D<subscript>1</subscript></parameter>,
222 <parameter>D<subscript>2</subscript></parameter>,
224 <parameter>D<subscript>k</subscript></parameter>,
226 <parameter>D<subscript>1</subscript></parameter> ×
227 <parameter>D<subscript>2</subscript></parameter> ×
229 <parameter>D<subscript>k</subscript></parameter>
230 is the set of all k-tuples
231 <parameter>v<subscript>1</subscript></parameter>,
232 <parameter>v<subscript>2</subscript></parameter>,
234 <parameter>v<subscript>k</subscript></parameter>,
236 <parameter>v<subscript>1</subscript></parameter> ∈
237 <parameter>D<subscript>1</subscript></parameter>,
238 <parameter>v<subscript>1</subscript></parameter> ∈
239 <parameter>D<subscript>1</subscript></parameter>,
241 <parameter>v<subscript>k</subscript></parameter> ∈
242 <parameter>D<subscript>k</subscript></parameter>.
246 For example, when we have
248 $k=2$, $D_{1}=\{0,1\}$ and
249 $D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is
250 $\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$.
252 <parameter>k</parameter>=2,
253 <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and
254 <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then
255 <parameter>D<subscript>1</subscript></parameter> ×
256 <parameter>D<subscript>2</subscript></parameter> is
257 <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>.
263 A Relation is any subset of the Cartesian product of one or more
264 domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$}
267 A Relation is any subset of the Cartesian product of one or more
268 domains: <parameter>R</parameter> ⊆
269 <parameter>D<subscript>1</subscript></parameter> ×
270 <parameter>D<subscript>2</subscript></parameter> ×
272 <parameter>D<subscript>k</subscript></parameter>.
276 For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation;
277 it is in fact a subset of
278 <parameter>D<subscript>1</subscript></parameter> ×
279 <parameter>D<subscript>2</subscript></parameter>
284 The members of a relation are called tuples. Each relation of some
286 <parameter>D<subscript>1</subscript></parameter> ×
287 <parameter>D<subscript>2</subscript></parameter> ×
289 <parameter>D<subscript>k</subscript></parameter>
290 is said to have arity <literal>k</literal> and is therefore a set
291 of <literal>k</literal>-tuples.
295 A relation can be viewed as a table (as we already did, remember
296 <xref linkend="supplier-fig" endterm="supplier-fig"> where
297 every tuple is represented by a row and every column corresponds to
298 one component of a tuple. Giving names (called attributes) to the
299 columns leads to the definition of a
300 <firstterm>relation scheme</firstterm>.
306 A {\it relation scheme} $R$ is a finite set of attributes
307 \mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for
308 each attribute $A_{i}, 1 \le i \le k$ where the values of the
309 attributes are taken from. We often write a relation scheme as
310 \mbox{$R(A_{1},A_{2},\ldots,A_{k})$}.
313 A <firstterm>relation scheme</firstterm> <literal>R</literal> is a
314 finite set of attributes
315 <parameter>A<subscript>1</subscript></parameter>,
316 <parameter>A<subscript>2</subscript></parameter>,
318 <parameter>A<subscript>k</subscript></parameter>.
320 <parameter>D<subscript>i</subscript></parameter>,
322 <parameter>A<subscript>i</subscript></parameter>,
323 1 <= <literal>i</literal> <= <literal>k</literal>,
324 where the values of the attributes are taken from. We often write
326 <literal>R(<parameter>A<subscript>1</subscript></parameter>,
327 <parameter>A<subscript>2</subscript></parameter>,
329 <parameter>A<subscript>k</subscript></parameter>)</literal>.
333 A <firstterm>relation scheme</firstterm> is just a kind of template
334 whereas a <firstterm>relation</firstterm> is an instance of a
336 scheme</firstterm>. The relation consists of tuples (and can
338 viewed as a table); not so the relation scheme.
344 <title id="domains">Domains vs. Data Types</title>
347 We often talked about <firstterm>domains</firstterm>
348 in the last section. Recall that a
349 domain is, formally, just a set of values (e.g., the set of integers or
350 the real numbers). In terms of database systems we often talk of
351 <firstterm>data types</firstterm> instead of domains.
352 When we define a table we have to make
353 a decision about which attributes to include. Additionally we
354 have to decide which kind of data is going to be stored as
355 attribute values. For example the values of
356 <classname>SNAME</classname> from the table
357 <classname>SUPPLIER</classname> will be character strings,
358 whereas <classname>SNO</classname> will store
359 integers. We define this by assigning a data type to each
360 attribute. The type of <classname>SNAME</classname> will be
361 <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type
362 for character strings of length <= 20),
363 the type of <classname>SNO</classname> will be
364 <type>INTEGER</type>. With the assignment of a data type we also
366 a domain for an attribute. The domain of
367 <classname>SNAME</classname> is the set of all
368 character strings of length <= 20,
369 the domain of <classname>SNO</classname> is the set of
376 <title id="operations">Operations in the Relational Data Model</title>
379 In the previous section
380 (<xref linkend="formal-notion" endterm="formal-notion">)
381 we defined the mathematical notion of
382 the relational model. Now we know how the data can be stored using a
383 relational data model but we do not know what to do with all these
384 tables to retrieve something from the database yet. For example somebody
385 could ask for the names of all suppliers that sell the part
386 'Screw'. Therefore two rather different kinds of notations for
387 expressing operations on relations have been defined:
392 The <firstterm>Relational Algebra</firstterm> which is an
394 where queries are expressed by applying specialized operators to the
401 The <firstterm>Relational Calculus</firstterm> which is a
403 where queries are expressed by formulating some logical restrictions
404 that the tuples in the answer must satisfy.
411 <title id="rel-alg">Relational Algebra</title>
414 The <firstterm>Relational Algebra</firstterm> was introduced by
415 E. F. Codd in 1972. It consists of a set of operations on relations:
420 SELECT (σ): extracts <firstterm>tuples</firstterm> from
422 satisfy a given restriction. Let <parameter>R</parameter> be a
423 table that contains an attribute
424 <parameter>A</parameter>.
425 σ<subscript>A=a</subscript>(R) = {t ∈ R ∣ t(A) = a}
426 where <literal>t</literal> denotes a
427 tuple of <parameter>R</parameter> and <literal>t(A)</literal>
428 denotes the value of attribute <parameter>A</parameter> of
429 tuple <literal>t</literal>.
435 PROJECT (π): extracts specified
436 <firstterm>attributes</firstterm> (columns) from a
437 relation. Let <classname>R</classname> be a relation
438 that contains an attribute <classname>X</classname>.
439 π<subscript>X</subscript>(<classname>R</classname>) = {t(X) ∣ t ∈ <classname>R</classname>},
440 where <literal>t</literal>(<classname>X</classname>) denotes the value of
441 attribute <classname>X</classname> of tuple <literal>t</literal>.
447 PRODUCT (×): builds the Cartesian product of two
448 relations. Let <classname>R</classname> be a table with arity
449 <literal>k</literal><subscript>1</subscript> and let
450 <classname>S</classname> be a table with
451 arity <literal>k</literal><subscript>2</subscript>.
452 <classname>R</classname> × <classname>S</classname>
454 <literal>k</literal><subscript>1</subscript>
455 + <literal>k</literal><subscript>2</subscript>-tuples
456 whose first <literal>k</literal><subscript>1</subscript>
457 components form a tuple in <classname>R</classname> and whose last
458 <literal>k</literal><subscript>2</subscript> components form a
459 tuple in <classname>S</classname>.
465 UNION (∪): builds the set-theoretic union of two
466 tables. Given the tables <classname>R</classname> and
467 <classname>S</classname> (both must have the same arity),
468 the union <classname>R</classname> ∪ <classname>S</classname>
469 is the set of tuples that are in <classname>R</classname>
470 or <classname>S</classname> or both.
476 INTERSECT (∩): builds the set-theoretic intersection of two
477 tables. Given the tables <classname>R</classname> and
478 <classname>S</classname>,
479 <classname>R</classname> ∩ <classname>S</classname> is the
481 that are in <classname>R</classname> and in
482 <classname>S</classname>.
483 We again require that <classname>R</classname> and
484 <classname>S</classname> have the
491 DIFFERENCE (− or ∖): builds the set difference of
492 two tables. Let <classname>R</classname> and <classname>S</classname>
493 again be two tables with the same
494 arity. <classname>R</classname> - <classname>S</classname>
495 is the set of tuples in <classname>R</classname> but not in
496 <classname>S</classname>.
502 JOIN (∏): connects two tables by their common
503 attributes. Let <classname>R</classname> be a table with the
504 attributes <classname>A</classname>,<classname>B</classname>
505 and <classname>C</classname> and
506 let <classname>S</classname> be a table with the attributes
507 <classname>C</classname>,<classname>D</classname>
508 and <classname>E</classname>. There is one
509 attribute common to both relations,
510 the attribute <classname>C</classname>.
512 <classname>R</classname> ∏ <classname>S</classname> =
513 π<subscript><classname>R</classname>.<classname>A</classname>,<classname>R</classname>.<classname>B</classname>,<classname>R</classname>.<classname>C</classname>,<classname>S</classname>.<classname>D</classname>,<classname>S</classname>.<classname>E</classname></subscript>(σ<subscript><classname>R</classname>.<classname>C</classname>=<classname>S</classname>.<classname>C</classname></subscript>(<classname>R</classname> × <classname>S</classname>)).
515 R ∏ S = π<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(σ<subscript>R.C=S.C</subscript>(R × S)).
516 What are we doing here? We first calculate the Cartesian
518 <classname>R</classname> × <classname>S</classname>.
519 Then we select those tuples whose values for the common
520 attribute <classname>C</classname> are equal
521 (σ<subscript>R.C = S.C</subscript>).
523 that contains the attribute <classname>C</classname>
524 two times and we correct this by
525 projecting out the duplicate column.
529 <title id="join-example">An Inner Join</title>
532 Let's have a look at the tables that are produced by evaluating the steps
533 necessary for a join.
534 Let the following two tables be given:
539 ---+---+--- ---+---+---
548 First we calculate the Cartesian product
549 <classname>R</classname> × <classname>S</classname> and
554 A | B | R.C | S.C | D | E
555 ---+---+-----+-----+---+---
556 1 | 2 | 3 | 3 | a | b
557 1 | 2 | 3 | 6 | c | d
558 4 | 5 | 6 | 3 | a | b
559 4 | 5 | 6 | 6 | c | d
560 7 | 8 | 9 | 3 | a | b
561 7 | 8 | 9 | 6 | c | d
567 σ<subscript>R.C=S.C</subscript>(R × S)
571 A | B | R.C | S.C | D | E
572 ---+---+-----+-----+---+---
573 1 | 2 | 3 | 3 | a | b
574 4 | 5 | 6 | 6 | c | d
579 To remove the duplicate column
580 <classname>S</classname>.<classname>C</classname>
581 we project it out by the following operation:
582 π<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(σ<subscript>R.C=S.C</subscript>(R × S))
596 DIVIDE (÷): Let <classname>R</classname> be a table
597 with the attributes A, B, C, and D and let
598 <classname>S</classname> be a table with the attributes
600 Then we define the division as:
603 R ÷ S = {t ∣ ∀ t<subscript>s</subscript> ∈ S ∃ t<subscript>r</subscript> ∈ R
607 t<subscript>r</subscript>(A,B)=t∧t<subscript>r</subscript>(C,D)=t<subscript>s</subscript>}
609 t<subscript>r</subscript>(x,y)
611 tuple of table <classname>R</classname> that consists only of
612 the components <literal>x</literal> and <literal>y</literal>.
613 Note that the tuple <literal>t</literal> only consists of the
614 components <classname>A</classname> and
615 <classname>B</classname> of relation <classname>R</classname>.
618 <para id="divide-example">
619 Given the following tables
624 ---+---+---+--- ---+---
648 For a more detailed description and definition of the relational
649 algebra refer to [<xref linkend="ULL88" endterm="ULL88">] or
650 [<xref linkend="DATE94" endterm="DATE94">].
654 <title id="suppl-rel-alg">A Query Using Relational Algebra</title>
656 Recall that we formulated all those relational operators to be able to
657 retrieve data from the database. Let's return to our example from
659 section (<xref linkend="operations" endterm="operations">)
660 where someone wanted to know the names of all
661 suppliers that sell the part <literal>Screw</literal>.
662 This question can be answered
663 using relational algebra by the following operation:
666 π<subscript>SUPPLIER.SNAME</subscript>(σ<subscript>PART.PNAME='Screw'</subscript>(SUPPLIER ∏ SELLS ∏ PART))
671 We call such an operation a query. If we evaluate the above query
672 against the our example tables
673 (<xref linkend="supplier-fig" endterm="supplier-fig">)
674 we will obtain the following result:
686 <sect2 id="rel-calc">
687 <title>Relational Calculus</title>
690 The relational calculus is based on the
691 <firstterm>first order logic</firstterm>. There are
692 two variants of the relational calculus:
697 The <firstterm>Domain Relational Calculus</firstterm>
698 (<acronym>DRC</acronym>), where variables
699 stand for components (attributes) of the tuples.
705 The <firstterm>Tuple Relational Calculus</firstterm>
706 (<acronym>TRC</acronym>), where variables stand for tuples.
713 We want to discuss the tuple relational calculus only because it is
714 the one underlying the most relational languages. For a detailed
715 discussion on <acronym>DRC</acronym> (and also
716 <acronym>TRC</acronym>) see
717 <xref linkend="DATE94" endterm="DATE94">
719 <xref linkend="ULL88" endterm="ULL88">.
724 <title>Tuple Relational Calculus</title>
727 The queries used in <acronym>TRC</acronym> are of the following
734 where <literal>x</literal> is a tuple variable
735 <classname>A</classname> is a set of attributes and <literal>F</literal> is a
736 formula. The resulting relation consists of all tuples
737 <literal>t(A)</literal> that satisfy <literal>F(t)</literal>.
741 If we want to answer the question from example
742 <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">
743 using <acronym>TRC</acronym> we formulate the following query:
746 {x(SNAME) ∣ x ∈ SUPPLIER ∧
747 ∃ y ∈ SELLS ∃ z ∈ PART (y(SNO)=x(SNO) ∧
754 Evaluating the query against the tables from
755 <xref linkend="supplier-fig" endterm="supplier-fig">
756 again leads to the same result
758 <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">.
762 <sect2 id="alg-vs-calc">
763 <title>Relational Algebra vs. Relational Calculus</title>
766 The relational algebra and the relational calculus have the same
767 <firstterm>expressive power</firstterm>; i.e. all queries that
768 can be formulated using relational algebra can also be formulated
769 using the relational calculus and vice versa.
770 This was first proved by E. F. Codd in
771 1972. This proof is based on an algorithm ("Codd's reduction
772 algorithm") by which an arbitrary expression of the relational
773 calculus can be reduced to a semantically equivalent expression of
774 relational algebra. For a more detailed discussion on that refer to
775 <xref linkend="DATE94" endterm="DATE94">
777 <xref linkend="ULL88" endterm="ULL88">.
781 It is sometimes said that languages based on the relational calculus
782 are "higher level" or "more declarative" than languages based on
783 relational algebra because the algebra (partially) specifies the order
784 of operations while the calculus leaves it to a compiler or
785 interpreter to determine the most efficient order of evaluation.
790 <sect1 id="sql-language">
791 <title>The <acronym>SQL</acronym> Language</title>
794 As is the case with most modern relational languages,
795 <acronym>SQL</acronym> is based on the tuple
796 relational calculus. As a result every query that can be formulated
797 using the tuple relational calculus (or equivalently, relational
798 algebra) can also be formulated using
799 <acronym>SQL</acronym>. There are, however,
800 capabilities beyond the scope of relational algebra or calculus. Here
801 is a list of some additional features provided by
802 <acronym>SQL</acronym> that are not
803 part of relational algebra or calculus:
808 Commands for insertion, deletion or modification of data.
814 Arithmetic capability: In <acronym>SQL</acronym> it is possible
816 arithmetic operations as well as comparisons, e.g.
823 that + or other arithmetic operators appear neither in relational
824 algebra nor in relational calculus.
830 Assignment and Print Commands: It is possible to print a
831 relation constructed by a query and to assign a computed relation to a
838 Aggregate Functions: Operations such as
839 <firstterm>average</firstterm>, <firstterm>sum</firstterm>,
840 <firstterm>max</firstterm>, etc. can be applied to columns of a
842 obtain a single quantity.
849 <title id="select-title">Select</title>
852 The most often used command in <acronym>SQL</acronym> is the
854 used to retrieve data. The syntax is:
857 SELECT [ALL|DISTINCT]
858 { * | <replaceable class="parameter">expr_1</replaceable> [AS <replaceable class="parameter">c_alias_1</replaceable>] [, ...
859 [, <replaceable class="parameter">expr_k</replaceable> [AS <replaceable class="parameter">c_alias_k</replaceable>]]]}
860 FROM <replaceable class="parameter">table_name_1</replaceable> [<replaceable class="parameter">t_alias_1</replaceable>]
861 [, ... [, <replaceable class="parameter">table_name_n</replaceable> [<replaceable class="parameter">t_alias_n</replaceable>]]]
862 [WHERE <replaceable class="parameter">condition</replaceable>]
863 [GROUP BY <replaceable class="parameter">name_of_attr_i</replaceable>
864 [,... [, <replaceable class="parameter">name_of_attr_j</replaceable>]] [HAVING <replaceable class="parameter">condition</replaceable>]]
865 [{UNION [ALL] | INTERSECT | EXCEPT} SELECT ...]
866 [ORDER BY <replaceable class="parameter">name_of_attr_i</replaceable> [ASC|DESC]
867 [, ... [, <replaceable class="parameter">name_of_attr_j</replaceable> [ASC|DESC]]]];
872 Now we will illustrate the complex syntax of the SELECT statement
873 with various examples. The tables used for the examples are defined in
874 <xref linkend="supplier-fig" endterm="supplier-fig">.
878 <title>Simple Selects</title>
881 Here are some simple examples using a SELECT statement:
884 <title id="simple-query">Simple Query with Qualification</title>
886 To retrieve all tuples from table PART where the attribute PRICE is
887 greater than 10 we formulate the following query:
898 -----+---------+--------
905 Using "*" in the SELECT statement will deliver all attributes from
906 the table. If we want to retrieve only the attributes PNAME and PRICE
907 from table PART we use the statement:
915 In this case the result is:
924 Note that the <acronym>SQL</acronym> SELECT corresponds to the
925 "projection" in relational algebra not to the "selection"
926 (see <xref linkend="rel-alg" endterm="rel-alg"> for more details).
930 The qualifications in the WHERE clause can also be logically connected
931 using the keywords OR, AND, and NOT:
936 WHERE PNAME = 'Bolt' AND
937 (PRICE = 0 OR PRICE < 15);
940 will lead to the result:
950 Arithmetic operations may be used in the target list and in the WHERE
951 clause. For example if we want to know how much it would cost if we
952 take two pieces of a part we could use the following query:
955 SELECT PNAME, PRICE * 2 AS DOUBLE
957 WHERE PRICE * 2 < 50;
970 Note that the word DOUBLE after the keyword AS is the new title of the
971 second column. This technique can be used for every element of the
972 target list to assign a new title to the resulting
973 column. This new title
974 is often referred to as alias. The alias cannot be used throughout the
984 <para id="simple-join">
985 The following example shows how <firstterm>joins</firstterm> are
986 realized in <acronym>SQL</acronym>.
990 To join the three tables SUPPLIER, PART and SELLS over their common
991 attributes we formulate the following statement:
994 SELECT S.SNAME, P.PNAME
995 FROM SUPPLIER S, PART P, SELLS SE
996 WHERE S.SNO = SE.SNO AND
1000 and get the following table as a result:
1017 In the FROM clause we introduced an alias name for every relation
1018 because there are common named attributes (SNO and PNO) among the
1019 relations. Now we can distinguish between the common named attributes
1020 by simply prefixing the attribute name with the alias name followed by
1021 a dot. The join is calculated in the same way as shown in
1022 <xref linkend="join-example" endterm="join-example">.
1023 First the Cartesian product
1025 SUPPLIER × PART × SELLS
1027 is derived. Now only those tuples satisfying the
1028 conditions given in the WHERE clause are selected (i.e. the common
1029 named attributes have to be equal). Finally we project out all
1030 columns but S.SNAME and P.PNAME.
1035 <title>Aggregate Operators</title>
1038 <acronym>SQL</acronym> provides aggregate operators
1039 (e.g. AVG, COUNT, SUM, MIN, MAX) that
1040 take the name of an attribute as an argument. The value of the
1041 aggregate operator is calculated over all values of the specified
1042 attribute (column) of the whole table. If groups are specified in the
1043 query the calculation is done only over the values of a group (see next
1047 <title id="aggregates-example">Aggregates</title>
1050 If we want to know the average cost of all parts in table PART we use
1051 the following query:
1054 SELECT AVG(PRICE) AS AVG_PRICE
1070 If we want to know how many parts are stored in table PART we use
1092 <title>Aggregation by Groups</title>
1095 <acronym>SQL</acronym> allows one to partition the tuples of a table
1096 into groups. Then the
1097 aggregate operators described above can be applied to the groups
1098 (i.e. the value of the aggregate operator is no longer calculated over
1099 all the values of the specified column but over all values of a
1100 group. Thus the aggregate operator is evaluated individually for every
1105 The partitioning of the tuples into groups is done by using the
1106 keywords <command>GROUP BY</command> followed by a list of
1107 attributes that define the
1109 <command>GROUP BY A<subscript>1</subscript>, ⃛, A<subscript>k</subscript></command>
1111 the relation into groups, such that two tuples are in the same group
1112 if and only if they agree on all the attributes
1113 A<subscript>1</subscript>, ⃛, A<subscript>k</subscript>.
1116 <title id="aggregates-groupby">Aggregates</title>
1118 If we want to know how many parts are sold by every supplier we
1119 formulate the query:
1122 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1123 FROM SUPPLIER S, SELLS SE
1124 WHERE S.SNO = SE.SNO
1125 GROUP BY S.SNO, S.SNAME;
1132 -----+-------+-------
1141 Now let's have a look of what is happening here.
1142 First the join of the
1143 tables SUPPLIER and SELLS is derived:
1146 S.SNO | S.SNAME | SE.PNO
1147 -------+---------+--------
1160 Next we partition the tuples into groups by putting all tuples
1161 together that agree on both attributes S.SNO and S.SNAME:
1164 S.SNO | S.SNAME | SE.PNO
1165 -------+---------+--------
1168 --------------------------
1170 --------------------------
1173 --------------------------
1181 In our example we got four groups and now we can apply the aggregate
1182 operator COUNT to every group leading to the total result of the query
1189 Note that for the result of a query using GROUP BY and aggregate
1190 operators to make sense the attributes grouped by must also appear in
1191 the target list. All further attributes not appearing in the GROUP
1192 BY clause can only be selected by using an aggregate function. On
1193 the other hand you can not use aggregate functions on attributes
1194 appearing in the GROUP BY clause.
1199 <title>Having</title>
1202 The HAVING clause works much like the WHERE clause and is used to
1203 consider only those groups satisfying the qualification given in the
1204 HAVING clause. The expressions allowed in the HAVING clause must
1205 involve aggregate functions. Every expression using only plain
1206 attributes belongs to the WHERE clause. On the other hand every
1207 expression involving an aggregate function must be put to the HAVING
1211 <title id="having-example">Having</title>
1214 If we want only those suppliers selling more than one part we use the
1218 SELECT S.SNO, S.SNAME, COUNT(SE.PNO)
1219 FROM SUPPLIER S, SELLS SE
1220 WHERE S.SNO = SE.SNO
1221 GROUP BY S.SNO, S.SNAME
1222 HAVING COUNT(SE.PNO) > 1;
1229 -----+-------+-------
1240 <title>Subqueries</title>
1243 In the WHERE and HAVING clauses the use of subqueries (subselects) is
1244 allowed in every place where a value is expected. In this case the
1245 value must be derived by evaluating the subquery first. The usage of
1246 subqueries extends the expressive power of
1247 <acronym>SQL</acronym>.
1250 <title id="subselect-example">Subselect</title>
1253 If we want to know all parts having a greater price than the part
1254 named 'Screw' we use the query:
1259 WHERE PRICE > (SELECT PRICE FROM PART
1260 WHERE PNAME='Screw');
1269 -----+---------+--------
1276 When we look at the above query we can see
1277 the keyword SELECT two times. The first one at the beginning of the
1278 query - we will refer to it as outer SELECT - and the one in the WHERE
1279 clause which begins a nested query - we will refer to it as inner
1280 SELECT. For every tuple of the outer SELECT the inner SELECT has to be
1281 evaluated. After every evaluation we know the price of the tuple named
1282 'Screw' and we can check if the price of the actual tuple is
1287 If we want to know all suppliers that do not sell any part
1288 (e.g. to be able to remove these suppliers from the database) we use:
1294 (SELECT * FROM SELLS SE
1295 WHERE SE.SNO = S.SNO);
1300 In our example the result will be empty because every supplier sells
1301 at least one part. Note that we use S.SNO from the outer SELECT within
1302 the WHERE clause of the inner SELECT. As described above the subquery
1303 is evaluated for every tuple from the outer query i.e. the value for
1304 S.SNO is always taken from the actual tuple of the outer SELECT.
1311 <title>Union, Intersect, Except</title>
1314 These operations calculate the union, intersect and set theoretic
1315 difference of the tuples derived by two subqueries.
1318 <title id="union-example">Union, Intersect, Except</title>
1321 The following query is an example for UNION:
1324 SELECT S.SNO, S.SNAME, S.CITY
1326 WHERE S.SNAME = 'Jones'
1328 SELECT S.SNO, S.SNAME, S.CITY
1330 WHERE S.SNAME = 'Adams';
1337 -----+-------+--------
1344 Here an example for INTERSECT:
1347 SELECT S.SNO, S.SNAME, S.CITY
1351 SELECT S.SNO, S.SNAME, S.CITY
1360 -----+-------+--------
1364 The only tuple returned by both parts of the query is the one having $SNO=2$.
1368 Finally an example for EXCEPT:
1371 SELECT S.SNO, S.SNAME, S.CITY
1375 SELECT S.SNO, S.SNAME, S.CITY
1384 -----+-------+--------
1394 <sect2 id="datadef">
1395 <title>Data Definition</title>
1398 There is a set of commands used for data definition included in the
1399 <acronym>SQL</acronym> language.
1403 <title id="create-title">Create Table</title>
1406 The most fundamental command for data definition is the
1407 one that creates a new relation (a new table). The syntax of the
1408 CREATE TABLE command is:
1411 CREATE TABLE <replaceable class="parameter">table_name</replaceable>
1412 (<replaceable class="parameter">name_of_attr_1</replaceable> <replaceable class="parameter">type_of_attr_1</replaceable>
1413 [, <replaceable class="parameter">name_of_attr_2</replaceable> <replaceable class="parameter">type_of_attr_2</replaceable>
1418 <title id="table-create">Table Creation</title>
1421 To create the tables defined in
1422 <xref linkend="supplier-fig" endterm="supplier-fig"> the
1423 following <acronym>SQL</acronym> statements are used:
1426 CREATE TABLE SUPPLIER
1436 PRICE DECIMAL(4 , 2));
1450 <title>Data Types in <acronym>SQL</acronym></title>
1453 The following is a list of some data types that are supported by
1454 <acronym>SQL</acronym>:
1459 INTEGER: signed fullword binary integer (31 bits precision).
1465 SMALLINT: signed halfword binary integer (15 bits precision).
1471 DECIMAL (<replaceable class="parameter">p</replaceable>[,<replaceable class="parameter">q</replaceable>]):
1472 signed packed decimal number of
1473 <replaceable class="parameter">p</replaceable>
1474 digits precision with assumed
1475 <replaceable class="parameter">q</replaceable>
1476 of them right to the decimal point.
1478 (15 ≥ <replaceable class="parameter">p</replaceable> ≥ <replaceable class="parameter">q</replaceable> ≥ 0).
1480 If <replaceable class="parameter">q</replaceable>
1481 is omitted it is assumed to be 0.
1487 FLOAT: signed doubleword floating point number.
1493 CHAR(<replaceable class="parameter">n</replaceable>):
1494 fixed length character string of length
1495 <replaceable class="parameter">n</replaceable>.
1501 VARCHAR(<replaceable class="parameter">n</replaceable>):
1502 varying length character string of maximum length
1503 <replaceable class="parameter">n</replaceable>.
1511 <title>Create Index</title>
1514 Indices are used to speed up access to a relation. If a relation <classname>R</classname>
1515 has an index on attribute <classname>A</classname> then we can
1516 retrieve all tuples <replaceable>t</replaceable>
1518 <replaceable>t</replaceable>(<classname>A</classname>) = <replaceable>a</replaceable>
1519 in time roughly proportional to the number of such
1520 tuples <replaceable>t</replaceable>
1521 rather than in time proportional to the size of <classname>R</classname>.
1525 To create an index in <acronym>SQL</acronym>
1526 the CREATE INDEX command is used. The syntax is:
1529 CREATE INDEX <replaceable class="parameter">index_name</replaceable>
1530 ON <replaceable class="parameter">table_name</replaceable> ( <replaceable class="parameter">name_of_attribute</replaceable> );
1536 <title id="index-create">Create Index</title>
1539 To create an index named I on attribute SNAME of relation SUPPLIER
1540 we use the following statement:
1543 CREATE INDEX I ON SUPPLIER (SNAME);
1548 The created index is maintained automatically, i.e. whenever a new tuple
1549 is inserted into the relation SUPPLIER the index I is adapted. Note
1550 that the only changes a user can percept when an index is present
1551 are an increased speed.
1558 <title>Create View</title>
1561 A view may be regarded as a <firstterm>virtual table</firstterm>,
1563 does not <emphasis>physically</emphasis> exist in the database
1564 but looks to the user
1565 as if it does. By contrast, when we talk of a
1566 <firstterm>base table</firstterm> there is
1567 really a physically stored counterpart of each row of the table
1568 somewhere in the physical storage.
1572 Views do not have their own, physically separate, distinguishable
1573 stored data. Instead, the system stores the definition of the
1574 view (i.e. the rules about how to access physically stored base
1575 tables in order to materialize the view) somewhere in the system
1577 <xref linkend="catalogs-title" endterm="catalogs-title">). For a
1578 discussion on different techniques to implement views refer to
1581 <xref linkend="view-impl" endterm="view-impl">.
1583 <citetitle>SIM98</citetitle>.
1587 In <acronym>SQL</acronym> the <command>CREATE VIEW</command>
1588 command is used to define a view. The syntax
1592 CREATE VIEW <replaceable class="parameter">view_name</replaceable>
1593 AS <replaceable class="parameter">select_stmt</replaceable>
1596 where <replaceable class="parameter">select_stmt</replaceable>
1597 is a valid select statement as defined
1598 in <xref linkend="select-title" endterm="select-title">.
1599 Note that <replaceable class="parameter">select_stmt</replaceable> is
1600 not executed when the view is created. It is just stored in the
1601 <firstterm>system catalogs</firstterm>
1602 and is executed whenever a query against the view is made.
1606 Let the following view definition be given (we use
1608 <xref linkend="supplier-fig" endterm="supplier-fig"> again):
1611 CREATE VIEW London_Suppliers
1612 AS SELECT S.SNAME, P.PNAME
1613 FROM SUPPLIER S, PART P, SELLS SE
1614 WHERE S.SNO = SE.SNO AND
1621 Now we can use this <firstterm>virtual relation</firstterm>
1622 <classname>London_Suppliers</classname> as
1623 if it were another base table:
1626 SELECT * FROM London_Suppliers
1627 WHERE P.PNAME = 'Screw';
1630 which will return the following table:
1640 To calculate this result the database system has to do a
1641 <emphasis>hidden</emphasis>
1642 access to the base tables SUPPLIER, SELLS and PART first. It
1643 does so by executing the query given in the view definition against
1644 those base tables. After that the additional qualifications
1646 query against the view) can be applied to obtain the resulting
1652 <title>Drop Table, Drop Index, Drop View</title>
1655 To destroy a table (including all tuples stored in that table) the
1656 DROP TABLE command is used:
1659 DROP TABLE <replaceable class="parameter">table_name</replaceable>;
1664 To destroy the SUPPLIER table use the following statement:
1667 DROP TABLE SUPPLIER;
1672 The DROP INDEX command is used to destroy an index:
1675 DROP INDEX <replaceable class="parameter">index_name</replaceable>;
1680 Finally to destroy a given view use the command DROP VIEW:
1683 DROP VIEW <replaceable class="parameter">view_name</replaceable>;
1690 <title>Data Manipulation</title>
1693 <title>Insert Into</title>
1696 Once a table is created (see
1697 <xref linkend="create-title" endterm="create-title">), it can be filled
1698 with tuples using the command <command>INSERT INTO</command>.
1702 INSERT INTO <replaceable class="parameter">table_name</replaceable> (<replaceable class="parameter">name_of_attr_1</replaceable>
1703 [, <replaceable class="parameter">name_of_attr_2</replaceable> [,...]])
1704 VALUES (<replaceable class="parameter">val_attr_1</replaceable> [, <replaceable class="parameter">val_attr_2</replaceable> [, ...]]);
1709 To insert the first tuple into the relation SUPPLIER (from
1710 <xref linkend="supplier-fig" endterm="supplier-fig">) we use the
1711 following statement:
1714 INSERT INTO SUPPLIER (SNO, SNAME, CITY)
1715 VALUES (1, 'Smith', 'London');
1720 To insert the first tuple into the relation SELLS we use:
1723 INSERT INTO SELLS (SNO, PNO)
1730 <title>Update</title>
1733 To change one or more attribute values of tuples in a relation the
1734 UPDATE command is used. The syntax is:
1737 UPDATE <replaceable class="parameter">table_name</replaceable>
1738 SET <replaceable class="parameter">name_of_attr_1</replaceable> = <replaceable class="parameter">value_1</replaceable>
1739 [, ... [, <replaceable class="parameter">name_of_attr_k</replaceable> = <replaceable class="parameter">value_k</replaceable>]]
1740 WHERE <replaceable class="parameter">condition</replaceable>;
1745 To change the value of attribute PRICE of the part 'Screw' in the
1746 relation PART we use:
1751 WHERE PNAME = 'Screw';
1756 The new value of attribute PRICE of the tuple whose name is 'Screw' is
1762 <title>Delete</title>
1765 To delete a tuple from a particular table use the command DELETE
1766 FROM. The syntax is:
1769 DELETE FROM <replaceable class="parameter">table_name</replaceable>
1770 WHERE <replaceable class="parameter">condition</replaceable>;
1775 To delete the supplier called 'Smith' of the table SUPPLIER the
1776 following statement is used:
1779 DELETE FROM SUPPLIER
1780 WHERE SNAME = 'Smith';
1786 <sect2 id="catalogs">
1787 <title id="catalogs-title">System Catalogs</title>
1790 In every <acronym>SQL</acronym> database system
1791 <firstterm>system catalogs</firstterm> are used to keep
1792 track of which tables, views indexes etc. are defined in the
1793 database. These system catalogs can be queried as if they were normal
1794 relations. For example there is one catalog used for the definition of
1795 views. This catalog stores the query from the view definition. Whenever
1796 a query against a view is made, the system first gets the
1797 <firstterm>view definition query</firstterm> out of the catalog
1798 and materializes the view
1799 before proceeding with the user query (see
1802 <xref linkend="view-impl" endterm="view-impl">.
1803 <citetitle>SIM98</citetitle>
1805 <xref linkend="SIM98" endterm="SIM98">
1807 description). For more information about system catalogs refer to
1808 <xref linkend="DATE94" endterm="DATE94">.
1813 <title>Embedded <acronym>SQL</acronym></title>
1816 In this section we will sketch how <acronym>SQL</acronym> can be
1817 embedded into a host language (e.g. <literal>C</literal>).
1818 There are two main reasons why we want to use <acronym>SQL</acronym>
1819 from a host language:
1824 There are queries that cannot be formulated using pure <acronym>SQL</acronym>
1825 (i.e. recursive queries). To be able to perform such queries we need a
1826 host language with a greater expressive power than
1827 <acronym>SQL</acronym>.
1833 We simply want to access a database from some application that
1834 is written in the host language (e.g. a ticket reservation system
1835 with a graphical user interface is written in C and the information
1836 about which tickets are still left is stored in a database that can be
1837 accessed using embedded <acronym>SQL</acronym>).
1844 A program using embedded <acronym>SQL</acronym>
1845 in a host language consists of statements
1846 of the host language and of
1847 <firstterm>embedded <acronym>SQL</acronym></firstterm>
1848 (<acronym>ESQL</acronym>) statements. Every <acronym>ESQL</acronym>
1849 statement begins with the keywords <command>EXEC SQL</command>.
1850 The <acronym>ESQL</acronym> statements are
1851 transformed to statements of the host language
1852 by a <firstterm>precompiler</firstterm>
1853 (which usually inserts
1854 calls to library routines that perform the various <acronym>SQL</acronym>
1859 When we look at the examples throughout
1860 <xref linkend="select-title" endterm="select-title"> we
1861 realize that the result of the queries is very often a set of
1862 tuples. Most host languages are not designed to operate on sets so we
1863 need a mechanism to access every single tuple of the set of tuples
1864 returned by a SELECT statement. This mechanism can be provided by
1865 declaring a <firstterm>cursor</firstterm>.
1866 After that we can use the FETCH command to
1867 retrieve a tuple and set the cursor to the next tuple.
1871 For a detailed discussion on embedded <acronym>SQL</acronym>
1873 <xref linkend="DATE97" endterm="DATE97">,
1874 <xref linkend="DATE94" endterm="DATE94">,
1876 <xref linkend="ULL88" endterm="ULL88">.
1882 <!-- Keep this comment at the end of the file
1887 sgml-minimize-attributes:nil
1888 sgml-always-quote-attributes:t
1891 sgml-parent-document:nil
1892 sgml-default-dtd-file:"./reference.ced"
1893 sgml-exposed-tags:nil
1894 sgml-local-catalogs:("/usr/lib/sgml/catalog")
1895 sgml-local-ecat-files:nil