Course Notes On SQL
SQL: Summary of Presentation
- Bases of SQL
- Discussion of SQL features through examples
- Criticism of SQL
- Standardization
SQL: Introduction
SQL =
- another syntax for expressions of DRC, TRC, and the relational algebra
- + a data definition language
- + some extensions
SQL was defined in the mid 70s, after the relational algebra and calculi (early 70s): SQL was meant to be simpler (this is not obvious)
SQL uses a vocabulary different from that of the usual relational model
- relation: table
- tuple: row
- attribute: column
A major difference with the relational model: tables can have identical rows in SQL
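The point about identical rows can be checked on a small case. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module as the SQL engine (the notes do not name a system), and the three-row Employee table is made up.

```python
import sqlite3

# Hypothetical three-row Employee table; SQLite stands in for any SQL system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (LName TEXT, DNo INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Smith", 5), ("Zelaya", 4), ("English", 5)],
)

# Projecting on DNo keeps one row per employee, so 5 appears twice.
with_duplicates = conn.execute(
    "SELECT DNo FROM Employee ORDER BY DNo"
).fetchall()

# DISTINCT must be requested explicitly to obtain a set of rows.
without_duplicates = conn.execute(
    "SELECT DISTINCT DNo FROM Employee ORDER BY DNo"
).fetchall()

print(with_duplicates)     # [(4,), (5,), (5,)]
print(without_duplicates)  # [(4,), (5,)]
```

In the relational algebra, a projection on DNo would directly yield the second result.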
Relation Schema Declaration: Example
CREATE TABLE Employee (
    SSN       CHAR(9)       NOT NULL,
    FName     VARCHAR(15)   NOT NULL,
    MInit     CHAR,
    LName     VARCHAR(15)   NOT NULL,
    BDate     DATE,
    Address   VARCHAR(30),
    Sex       CHAR,
    Salary    DECIMAL(10,2),
    SuperSSN  CHAR(9),
    DNo       INT           NOT NULL,
    PRIMARY KEY (SSN),
    FOREIGN KEY (SuperSSN) REFERENCES Employee(SSN),
    FOREIGN KEY (DNo) REFERENCES Department(DNumber) );
- Compared with relational query languages, the operations and structure of a relational DDL are simple and leave little room for variations
- The SQL DDL borrows little from the query part of SQL
Compare with the structure of the same query in TRC:
- {t.BDate, t.Address | Employee(t) ∧ t.FName = 'John' ∧ t.LName = 'Smith'}
- What is the major difference between SQL and TRC for this query?
Not a good idea in a multi-user database
- The number of attributes of the tuples in the answer will change if the relation schema of Employee changes (if attributes are added or deleted), without the query having changed: this is a violation of logical data independence
- The text of the query does not make explicit the structure of the result (lack of self-documentation)
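The two remarks about the changing result structure read as a criticism of SELECT *, although the query itself did not survive in these notes. Assuming that reading, the sketch below runs the same query text before and after a schema change. It uses Python's sqlite3 module (an assumption, not something the notes prescribe), and the sample row and the added Salary column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (FName TEXT, LName TEXT, BDate TEXT, Address TEXT)"
)
# Made-up sample row.
conn.execute(
    "INSERT INTO Employee VALUES ('John', 'Smith', '1965-01-09', '731 Fondren')"
)

star = "SELECT * FROM Employee WHERE FName = 'John' AND LName = 'Smith'"
explicit = (
    "SELECT BDate, Address FROM Employee "
    "WHERE FName = 'John' AND LName = 'Smith'"
)

def column_names(query):
    # cursor.description holds one entry per result column; item 0 is the name.
    return [d[0] for d in conn.execute(query).description]

columns_before = column_names(star)

# Another user changes the schema; the text of the query stays the same.
conn.execute("ALTER TABLE Employee ADD COLUMN Salary REAL")

columns_after = column_names(star)
explicit_columns = column_names(explicit)

print(len(columns_before), len(columns_after))  # 4 5
print(explicit_columns)                         # ['BDate', 'Address']
```

The query with an explicit attribute list keeps the same result structure and documents it in its own text.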
DNumber = DNo (join condition) in (1) signals a join in SQL, between a relation with attribute DNumber (Department) and a relation with attribute DNo (Employee)
Example (2) has two joins (and thus two join conditions)
Join condition DNumber = DNo in (1) is decoded unambiguously: DNumber occurs only in Department and DNo occurs only in Employee
If the attribute name DNumber is chosen in both relations to reference the domain of departments, then SQL must formulate the query as (2)
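A minimal sketch of the naming issue, assuming Python's sqlite3 module as the engine. Here both tables use the attribute name DNumber; the DName column and the sample rows are hypothetical. The unqualified join condition is rejected as ambiguous, and the qualified form is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both relations use the same attribute name DNumber (DName is hypothetical).
conn.execute("CREATE TABLE Department (DNumber INTEGER, DName TEXT)")
conn.execute("CREATE TABLE Employee (LName TEXT, DNumber INTEGER)")
conn.execute("INSERT INTO Department VALUES (5, 'Research')")
conn.execute("INSERT INTO Employee VALUES ('Smith', 5)")

# Without qualification, the system cannot tell which DNumber is meant.
try:
    conn.execute(
        "SELECT LName FROM Employee, Department WHERE DNumber = DNumber"
    )
    ambiguous_rejected = False
except sqlite3.OperationalError:
    ambiguous_rejected = True

# Qualifying each attribute with its relation name removes the ambiguity.
rows = conn.execute(
    "SELECT LName, DName FROM Employee, Department "
    "WHERE Employee.DNumber = Department.DNumber"
).fetchall()

print(ambiguous_rejected)  # True
print(rows)                # [('Smith', 'Research')]
```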
The first SQL query above is the general form concerning the use of variables, of which
the previous examples are special cases
When the same relation is invoked more than once in the same query, then SQL uses
variables exactly like TRC
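As an illustration of a relation invoked twice, the sketch below pairs each employee with a supervisor by declaring two variables, e and s, over Employee. It assumes Python's sqlite3 module; the SSN and SuperSSN attributes follow the Employee schema of these notes, and the three rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (SSN TEXT, LName TEXT, SuperSSN TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("111", "Smith", "222"), ("222", "Wong", "333"), ("333", "Borg", None)],
)

# e and s range over the same relation, like two tuple variables in TRC:
# {e.LName, s.LName | Employee(e) ∧ Employee(s) ∧ e.SuperSSN = s.SSN}
rows = conn.execute(
    "SELECT e.LName, s.LName "
    "FROM Employee e, Employee s "
    "WHERE e.SuperSSN = s.SSN "
    "ORDER BY e.LName"
).fetchall()

print(rows)  # [('Smith', 'Wong'), ('Wong', 'Borg')]
```

Borg has no supervisor (SuperSSN is null), so no row pairs Borg with anyone.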
Union
List project names of projects for which an employee whose last name is Smith is a worker or a manager of the department that controls the project
(SELECT PName
 FROM Employee, Project, Department
 WHERE DNum = DNumber AND MgrSSN = SSN AND LName = 'Smith')
UNION
(SELECT PName
 FROM Employee, Project, WorksOn
 WHERE PNumber = PNo AND ESSN = SSN AND LName = 'Smith')
EXISTS and NOT EXISTS
(1) List the name of employees with at least one dependent
SELECT FName, LName
FROM Employee
WHERE EXISTS (SELECT *
              FROM Dependent
              WHERE SSN = ESSN)
(2) List the name of employees with no dependent
SELECT FName, LName
FROM Employee
WHERE NOT EXISTS (SELECT *
                  FROM Dependent
                  WHERE SSN = ESSN)
Here, the intuition of (1) extends well to the negated query (2)
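The EXISTS and NOT EXISTS queries (1) and (2) can be tried on a toy database. This is a sketch under assumptions: Python's sqlite3 module as the engine, made-up rows, and a hypothetical DependentName column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (SSN TEXT, FName TEXT, LName TEXT)")
# DependentName is a hypothetical column; only ESSN matters for the queries.
conn.execute("CREATE TABLE Dependent (ESSN TEXT, DependentName TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("111", "John", "Smith"), ("222", "Franklin", "Wong")],
)
conn.execute("INSERT INTO Dependent VALUES ('111', 'Alice')")

# (1) Employees with at least one dependent. SSN refers to the outer Employee
# row, because Dependent has no attribute of that name.
with_dependent = conn.execute(
    "SELECT FName, LName FROM Employee "
    "WHERE EXISTS (SELECT * FROM Dependent WHERE SSN = ESSN)"
).fetchall()

# (2) Employees with no dependent: the same subquery, negated.
without_dependent = conn.execute(
    "SELECT FName, LName FROM Employee "
    "WHERE NOT EXISTS (SELECT * FROM Dependent WHERE SSN = ESSN)"
).fetchall()

print(with_dependent)     # [('John', 'Smith')]
print(without_dependent)  # [('Franklin', 'Wong')]
```

Each employee falls in exactly one of the two answers.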
Union Example Reformulated with Disjunction
SELECT DISTINCT PName
FROM Project
WHERE PNumber IN (SELECT PNumber
                  FROM Project, Department, Employee
                  WHERE DNum = DNumber AND MgrSSN = SSN
                        AND LName = 'Smith')
      OR
      PNumber IN (SELECT PNo
                  FROM WorksOn, Employee
                  WHERE ESSN = SSN AND LName = 'Smith')
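To see that the UNION formulation and the disjunctive formulation describe the same answer, both can be run on one small database. The sketch assumes Python's sqlite3 module, made-up rows, and tables reduced to the attributes the queries use. The parentheses around the UNION operands are dropped because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee   (SSN TEXT, LName TEXT);
    CREATE TABLE Department (DNumber INTEGER, MgrSSN TEXT);
    CREATE TABLE Project    (PNumber INTEGER, PName TEXT, DNum INTEGER);
    CREATE TABLE WorksOn    (ESSN TEXT, PNo INTEGER);
    INSERT INTO Employee   VALUES ('111', 'Smith'), ('222', 'Wong');
    INSERT INTO Department VALUES (5, '111'), (4, '222');
    INSERT INTO Project    VALUES (1, 'ProductX', 5), (2, 'ProductY', 4),
                                  (3, 'ProductZ', 4);
    INSERT INTO WorksOn    VALUES ('111', 2), ('222', 3);
""")

union_query = """
    SELECT PName FROM Employee, Project, Department
    WHERE DNum = DNumber AND MgrSSN = SSN AND LName = 'Smith'
    UNION
    SELECT PName FROM Employee, Project, WorksOn
    WHERE PNumber = PNo AND ESSN = SSN AND LName = 'Smith'
"""

disjunction_query = """
    SELECT DISTINCT PName FROM Project
    WHERE PNumber IN (SELECT PNumber FROM Project, Department, Employee
                      WHERE DNum = DNumber AND MgrSSN = SSN
                            AND LName = 'Smith')
       OR PNumber IN (SELECT PNo FROM WorksOn, Employee
                      WHERE ESSN = SSN AND LName = 'Smith')
"""

# Sort in Python so the comparison does not depend on row order.
union_rows = sorted(conn.execute(union_query).fetchall())
disjunction_rows = sorted(conn.execute(disjunction_query).fetchall())

print(union_rows)                      # [('ProductX',), ('ProductY',)]
print(union_rows == disjunction_rows)  # True
```

Smith manages department 5, which controls ProductX, and works on ProductY; ProductZ appears in neither answer.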
Very different perceptions often exist in SQL for the same query
This may not be a good idea, as learning the language gets more complicated
(1) is a flat TRC-like formulation
(2) evokes the TRC structure, but, like (3), is really a mixture of tuple-oriented and
set-oriented formulations
There is no agreed-upon notation for specifying aggregate functions in the algebra or the calculi
{e.FName, e.LName | Employee(e) ∧
    ∀p (Project(p) ∧ p.DNum = 5 →
        ∃w (WorksOn(w) ∧ w.ESSN = e.SSN ∧ w.PNo = p.PNumber) ) }
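SQL has no universal quantifier, so a calculus query of this shape is usually translated with the equivalence ∀p (A → B) ≡ ¬∃p (A ∧ ¬B), which gives two nested NOT EXISTS. The sketch below assumes that the quantifiers lost from the expression above are ∀ on p and ∃ on w, that Project is keyed by PNumber as in the other queries of these notes, and that Python's sqlite3 module is the engine. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (SSN TEXT, FName TEXT, LName TEXT);
    CREATE TABLE Project  (PNumber INTEGER, DNum INTEGER);
    CREATE TABLE WorksOn  (ESSN TEXT, PNo INTEGER);
    INSERT INTO Employee VALUES ('111', 'John', 'Smith'),
                                ('222', 'Franklin', 'Wong');
    INSERT INTO Project  VALUES (1, 5), (2, 5), (3, 4);
    INSERT INTO WorksOn  VALUES ('111', 1), ('111', 2),
                                ('222', 1), ('222', 3);
""")

# "There is no project of department 5 on which e does not work."
query = """
    SELECT e.FName, e.LName
    FROM Employee e
    WHERE NOT EXISTS (SELECT *
                      FROM Project p
                      WHERE p.DNum = 5
                        AND NOT EXISTS (SELECT *
                                        FROM WorksOn w
                                        WHERE w.ESSN = e.SSN
                                          AND w.PNo = p.PNumber))
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('John', 'Smith')]
```

Smith works on both projects of department 5 (projects 1 and 2); Wong works on project 1 only and is excluded.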
[Figure: programming effort (axis label), with a curve labeled SQL]
FROM R
GROUP BY A
HAVING SET(B) CONTAINS S
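HAVING SET(B) CONTAINS S is not standard SQL. Assuming the fragment asks for the values of A whose group of B values contains every element of a set S (relational division), one common workaround compares counts. The sketch below assumes Python's sqlite3 module, stores S as a one-column table, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B TEXT);
    CREATE TABLE S (B TEXT);
    INSERT INTO R VALUES ('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3'),
                         ('a2', 'b1');
    INSERT INTO S VALUES ('b1'), ('b2');
""")

# Keep only the R rows whose B value is in S, then retain the groups that
# cover as many distinct B values as S contains.
query = """
    SELECT A
    FROM R
    WHERE B IN (SELECT B FROM S)
    GROUP BY A
    HAVING COUNT(DISTINCT B) = (SELECT COUNT(DISTINCT B) FROM S)
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('a1',)]
```

The set condition has to be rephrased in terms of counting, which is arguably less direct than the form shown in the fragment.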
Do users exist that could satisfy their needs in the lower part of the SQL curve?
We argue that, for many people, the calculi get simpler at some point before relational completeness
Strategy for the more complex queries: formulate queries in calculus and translate them into SQL
True optimality can only be realized on average, because the true optimum in general depends on populations
Data independence is violated when several equivalent queries are evaluated with different strategies (with different performance)
Redundancy and nonorthogonality of SQL make full data independence hopelessly impossible, because some equivalent queries have very different forms
[Figure: programming effort versus query complexity; labels: SQL, relational completeness]
SQL History
SQL was ill-conceived from the beginning: from a technical point of view, the
history of its evolution is essentially uninteresting
Still, SQL has become practically inescapable
The bases for the calculi were defined with the relational model (1970)
First version of SQUARE at IBM Research in 1974
Numerous successive versions (SEQUEL, SEQUEL/2, SQL) with no clear rationale for evolution during the 70s
In parallel, numerous proposals for other similar languages (e.g., QUEL, QUERY BY EXAMPLE)
First RDBMSs (1980-1986): ORACLE, SQL/DS then DB2 (IBM), INGRES, SYBASE, INFORMIX, etc.
SQL became the leading language by the mid-80s