1.16k likes | 1.6k Views
Chapter 3: Relational Model and Relational Algebra & Calculus ( [S] Chp . 2 and 5 ). Structure of Relational Databases Relational Algebra Tuple Relational Calculus Domain Relational Calculus Extended Relational-Algebra-Operations Modification of the Database Views.
E N D
Chapter 3: Relational Model and Relational Algebra & Calculus( [S] Chp. 2 and 5 ) • Structure of Relational Databases • Relational Algebra • Tuple Relational Calculus • Domain Relational Calculus • Extended Relational-Algebra-Operations • Modification of the Database • Views
Why Study the Relational Model? • Most widely used model. • Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. • “Legacy systems” in older models • E.G., IBM’s IMS • Recent competitor: object-oriented model • ObjectStore, Versant, Ontos • A synthesis emerging: object-relational model • Informix Universal Server, UniSQL, O2, Oracle, DB2
Basic Structure • Formally, given sets D1, D2, …. Dn, a relation r is a subset of D1 x D2 x … x Dn Each Di is a Domain Thus a relation is a set of n-tuples (a1, a2, …, an) where each ai Di • Example: if customer-name = {Jones, Smith, Curry, Lindsay}customer-street = {Main, North, Park}customer-city = {Harrison, Rye, Pittsfield}Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} is a relation over customer-name x customer-street x customer-city
Attribute Types • Each attribute of a relation has a name • The set of allowed values for each attribute is called the domain of the attribute • Attribute values are (normally) required to be atomic, that is, indivisible • E.g. multivalued attribute values are not atomic • E.g. composite attribute values are not atomic • The special value null is a member of every domain • The null value causes complications in the definition of many operations • we shall ignore the effect of null values in our main presentation and consider their effect later
Relation Schema • A1, A2, …, Anare attributes • R = (A1, A2, …, An ) is a relation schema E.g. Customer-schema = (customer-name, customer-street, customer-city) • r(R) is a relation on the relation schema R E.g. customer (Customer-schema) Our common notation: R – relational schema, r- relation instance When we say “relation” we will mean relation instance or relation schema depending on the context
Relation Instance • The current values (relation instance) of a relation are specified by a table • An element t of r is a tuple, represented by a row in a table • The number of attributes is the relation degree, the number of rows is the relation cardinality attributes (or columns) customer-name customer-street customer-city Jones Smith Curry Lindsay Main North North Park Harrison Rye Rye Pittsfield tuples (or rows) customer
Relations are Unordered • Order of tuples is irrelevant (tuples may be stored in an arbitrary order) • E.g. account relation with unordered tuples
Example Instance of Students Relation • Cardinality = 3, degree = 5, all rows distinct
Database • A database consists of multiple relations • Information about an enterprise is broken up into parts (usually an entity), with each relation storing one part of the information E.g.: account : stores information about accounts depositor : stores information about which customer owns which account customer : stores information about customers • Storing all information as a single relation such as bank(account-number, balance, customer-name, ..)results in • repetition of information (e.g. two customers own an account) • the need for null values (e.g. represent a customer without an account) • Normalization theory (Chapter 7) deals with how to design relational schemas
Keys • Let K R • K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R) • by “possible r” we mean a relation r that could exist in the enterprise we are modeling. • Example: {customer-name, customer-street} and {customer-name} are both superkeys of Customer, if no two customers can possibly have the same name. • K is a candidate key if K is minimal • Example: {customer-name} is a candidate key for Customer. • Since it is a superkey, and no subset of it is a superkey. • Assuming no two customers can possibly have the same name.
Determining Keys from E-R Sets • Strong entity set. • The primary key of the entity set becomes the primary key of the relation. • Weak entity set. • The primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set. • Relationship set. • The union of the primary keys of the related entity sets becomes a super key of the relation. • For binary many-to-one relationship sets • The primary key of the “many” entity set becomes the relation’s primary key. • For one-to-one relationship sets, the relation’s primary key can be that of either entity set. • For many-to-many relationship sets • The union of the primary keys becomes the relation’s primary key
המודל הטבלאי • פותח לראשונה על ידי Codd בשנת 1970. • הפך להיות המודל השליט בשנות ה- 08 וה- 90. • רוב המערכות המסחריות מבוססות עליו: • DB2, ORACLE, SYBASE, Informix • מדוע? • פשוט, אלגנטי ומבוסס תאורטית.
המודל הטבלאי - מושגים • תחום – Domain – קבוצה של ערכים אטומיים. • טבלה – Relation(scheme)R(A1, A2…An)(Intention) • Ai זו תכונה – Attribute עם ערכים הלקוחים מתחום מסויים. • טבלה – Relation(Instance)r(t1, t2…tm)(Extension) • שורה – tuplet=(v1, v2…vn) • Vi – ערך של תכונה i בשורה t. • מספר העמודות בטבלה – דרגה – Degree or Arity. • מס. השורות - Cardinality
המודל הטבלאי - תכונות • אין חשיבות לסדר השורות. • אין חשיבות לסדר העמודות. • אין שתי שורות זהות בטבלה. • קבוצת תכונות המזהה שורה באופן יחיד – סופר מפתח – Super-Key. • קבוצה מינימלית כזו נקראת Candidate key. • אחת הקבוצות הללו נקראת ה – Primary key.
המודל הטבלאי – סימונים • טבלאות (סכימה) – Q, R, S • טבלאות (מופע ) – q, r, s • שורות – t, u, v • t[Ai] – הערך של תכונה – Ai עבור t. • לדוגמה: Ai – עבור שורה tt(SSN, GPA). • במידה ושם התכונה לא יחיד – R. Ai • לדוגמה: STUDENT.NAME • ערך לא קיים - Null
המודל הטבלאי - אילוצים • תחומים: כל הערכים בתחום הם אטומיים (או Null). • מפתחות: לכל טבלה יש לפחות מפתח אחד, לאחד מהם קוראים מפתח ראשי – primary. • ערך של תכונה במפתח לא יכול להיות Null. • מפתח זר: לתכונה של מפתח זר בטבלה יש אותם תחומים כמו לתכונות של מפתח ראשי בטבלה אחרת. • אלוץ מפתח זר: FK constraint – referential integrity constraint ערך של מפתח זר הוא או Null או שקיים כערך של מפתח ראשי בטבלה המתאימה. • משמעות: קשר • חשיבות: הוספה, עדכון ובטול • נראה הגדרה פורמלית מייד
המודל הטבלאי – אילוצים נוספים • אילוצי ערך: 0 < salary < 100,000 • אילוצים בין שדות: QTY_SUPPLIED ≤ QTY_ORDERED • תלויות פונקציונליות: NAME -> AGE אם אותו שם מופיע פעמיים אז הוא מופיע עם אותו גיל. • SQL Integrity Constraints – אילוצים מורכבים
בסיס נתונים טבלאי - הגדרה • אוסף הטבלאות, תכונות, תחומים, מפתחות ואילוצי שלמות וכל המופעים של הטבלאות הללו המקיימים את ההגדרות והאילוצים הללו.
Domain Constraints • Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency. • Domain constraints are the most elementary form of integrity constraint. • They test values inserted in the database, and test queries to ensure that the comparisons make sense. • New domains can be created from existing data types • E.g. create domainDollarsnumeric(12, 2)create domainPoundsnumeric(12,2) • We cannot assign or compare a value of type Dollars to a value of type Pounds. • However, we can convert type as below (castr.AasPounds) (Should also multiply by the dollar-to-pound conversion-rate)
Domain Constraints (Cont.) • The check clause in SQL-92 permits domains to be restricted: • Use check clause to ensure that an hourly-wage domain allows only values greater than a specified value. create domain hourly-wage numeric(5,2) constraintvalue-test check(value > = 4.00) • The domain has a constraint that ensures that the hourly-wage is greater than 4.00 • The clause constraintvalue-test is optional; useful to indicate which constraint an update violated. • Can have complex conditions in domain check • createdomainAccountTypechar(10)constraintaccount-type-testcheck (valuein (‘Checking’, ‘Saving’)) • check (branch-namein (selectbranch-namefrombranch))
Referential Integrity • Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. • Example: If “Perryridge” is a branch name appearing in one of the tuples in the account relation, then there exists a tuple in the branch relation for branch “Perryridge”. • Formal Definition • Let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively. • The subset of R2 is a foreign key referencing K1in relation r1, if for every t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[]. • Referential integrity constraint is also called subset dependency since its can be written as (r2) K1 (r1)
R E1 E2 Referential Integrity in the E-R Model • Consider relationship set R between entity sets E1 and E2. The relational schema for R includes the primary keys K1 of E1and K2 of E2.Then K1 and K2 form foreign keys on the relational schemas for E1and E2 respectively. • Weak entity sets are also a source of referential integrity constraints. • For the relation schema for a weak entity set must include the primary key attributes of the entity set on which it depends
Checking Referential Integrity on Database Modification • The following tests must be made in order to preserve the following referential integrity constraint: (r2) K (r1) • Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K] = t2[]. That is t2[] K (r1) • Delete. If a tuple, t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1: = t1[K] (r2) If this set is not empty • either the delete command is rejected as an error, or • the tuples that reference t1 must themselves be deleted(cascading deletions are possible).
Database Modification (Cont.) • Update. There are two cases: • If a tuple t2 is updated in relation r2and the update modifies values for foreign key , then a test similar to the insert case is made: • Let t2’ denote the new value of tuple t2. The system must ensure that t2’[] K(r1) • If a tuple t1 is updated in r1, and the update modifies values for the primary key (K), then a test similar to the delete case is made: • The system must compute = t1[K] (r2) using the old value of t1 (the value before the update is applied). • If this set is not empty • the update may be rejected as an error, or • the update may be cascaded to the tuples in the set, or • the tuples in the set may be deleted.
Referential Integrity in SQL • Primary and candidate keys and foreign keys can be specified as part of the SQL create table statement: • The primary key clause lists attributes that comprise the primary key. • The unique key clause lists attributes that comprise a candidate key. • The foreign key clause lists the attributes that comprise the foreign key and the name of the relation referenced by the foreign key. • By default, a foreign key references the primary key attributes of the referenced table foreign key (account-number) references account • Short form for specifying a single column as foreign key account-number char (10) references account • Reference columns in the referenced table can be explicitly specified • but must be declared as primary/candidate keys foreign key (account-number) references account(account-number)
Referential Integrity in SQL – Example create table customer(customer-name char(20),customer-street char(30),customer-city char(30),primary key (customer-name)) create table branch(branch-name char(15),branch-city char(30),assets integer,primary key(branch-name))
Referential Integrity in SQL – Example (Cont.) create table account(account-number char(10),branch-name char(15),balance integer,primary key (account-number), foreign key (branch-name) references branch) create table depositor(customer-name char(20),account-number char(10),primary key (customer-name, account-number),foreign key (account-number) references account,foreign key (customer-name) references customer)
Cascading Actions in SQL create table account . . .foreign key(branch-name) references branchon delete cascade on update cascade. . . ) • Due to the on delete cascade clauses, if a delete of a tuple in branchresults in referential-integrity constraint violation, the delete “cascades” to the account relation, deleting the tuple that refers to the branch that was deleted. • Cascading updates are similar.
Cascading Actions in SQL (Cont.) • If there is a chain of foreign-key dependencies across multiple relations, with on delete cascade specified for each dependency, a deletion or update at one end of the chain can propagate across the entire chain. • If a cascading update to delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the transaction. • As a result, all the changes caused by the transaction and its cascading actions are undone. • Referential integrity is only checked at the end of a transaction • Intermediate steps are allowed to violate referential integrity provided later steps remove the violation • Otherwise it would be impossible to create some database states, e.g. insert two tuples whose foreign keys point to each other • E.g. spouse attribute of relation marriedperson(name, address, spouse)
Referential Integrity in SQL (Cont.) • Alternative to cascading: • on delete set null • on delete set default • Null values in foreign key attributes complicate SQL referential integrity semantics, and are best prevented using not null • if any attribute of a foreign key is null, the tuple is defined to satisfy the foreign key constraint!
Foreign keys and Relationships • In the relational model foreign keys represent relationships • In contrast to primary (candidate) keys foreign keys may be null representing partial relationship. • One to one is represented as a foreign key in one of the participating relations. Always prefer the relation where the participation is total to avoid null values • Many to one is represented as foreign key in the Many relation • Many to Many is represented as two foreign keys for each of the two relations
Mapping ER to Relational and Vice-versa • ER to Relational – see slides in Chp 2 • Relational to ER. Must determine for each foreign key which type of relationship it represents (may not be unique) or existence of a weak entity set.
Schema Diagram for the Banking Enterprise(another style to denote foreign keys…)
Possible relational database state corresponding to the COMPANY scheme
הגדרות בסיס נתונים ב- SQL CREATE TABLE EMPOLYEE ( FNAME VARCHAR(15) NOT NULL. MINIT CHAR. LNAME VARCHAR(15) NOT NULL. SSN CHAR(9) NOT NULL. BDATE DATE ADDRESS VARCHAR(30). SEX CHAR. SALARY DECIMAL(10,2). SUPERSSN CHAR(9). DNO INT NOT NULL. PRIMARY KEY (SSN). FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE (SSN), FOREIGN KEY (DNO) REFERENCES DEPARTMENT (DNUMBER)); CREATE TABLE DEPARTMENT ( DNAME VARCHAR(15) NOT NULL DNUMBER INT NOT NULL MGRSSN CHR(9) NOT NULL MGRSTARTDATE DATE, PRIMARY KEY (DNUMBER) UNIQUE (DNAME) FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE (SSN)); CREATE TABLE DEPT_LOCATIONS ( DNUMBER INT NOT NULL, DLOCATION VARCHAR(15) NOT NULL, PRIMARY KEY (DNUMBER, DLOCATION), FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT (DNUMBER)
הגדרות בסיס נתונים ב- SQL CREATE TABLE PROJECT ( PNAME VARCHAR(15) NOT NULL, PNUMBER INT NOT NULL, PLOCATION VARCHAR(15) . DNUM INT NOT NULL, PRIMARY KEY (PNUMBER) UNIQUE (PNAME) FOREIGN KEY (DNUM) REFERENCES DEPARTMENT (DNUMBER) ); CREATE TABLE WORKS_ON ( ESSN CHAR(9) NOT NULL, PNO INT NOT NULL, HOURS DECIMAL(3, 1) NOT NULL, PRIMARY KEY (ESSN, PNO), FOREIGN KEY (ESSN) REFERENCES EMPLOYEE (SSN), FOREIGN KEY (PNO) REFERENCES PROJECT (PNUMBER) ); CREATE TABLE DEPENDENT ( ESSN CHAR(9) NOT NULL, DEPENDENT_NAME VARCHR(15) NOT NULL, SEX CHAR, BDATE DATE, RELATIONSHIP VARCHAR(8), PRIMARY KEY (ESSN, DEPENDENR_NAME), FOREIGN KEY (ESSN) REFERENCES EMPLOYEE (SSN) );
Query Languages • Language in which user requests information from the database. • Categories of languages • procedural • non-procedural • “Pure” languages: • Relational Algebra • Tuple Relational Calculus • Domain Relational Calculus • Pure languages form underlying basis of query languages that people use.
Relational Algebra • Procedural language • Six basic operators • select • project • union • set difference • Cartesian product • rename • The operators take one, two or more relations as inputs and give a new relation as a result.
אלגברה טבלאית • בחירה – selectB • הטלה – projectA, B, C • מכפלה קרטזית – AxB • איחוד – UnionU • חיתוך – Intersection∩ • הפרש – Difference - • צימוד - JOINB • חילוק – Division %
Select Operation – Example A B C D • Relation r 1 5 12 23 7 7 3 10 • A=B ^ D > 5(r) A B C D 1 23 7 10
Select Operation • Notation: p(r) • p is called the selection predicate • Defined as: p(r) = {t | t rand p(t)} Where p is a formula in propositional calculus consisting of terms connected by : (and), (or), (not)Each term is one of: <attribute> op <attribute> or <constant> where op is one of: =, , >, . <. • Example of selection:branch-name=“Perryridge”(account)
Project Operation – Example • Relation r: A B C 10 20 30 40 1 1 1 2 A C A C • A,C (r) 1 1 1 2 1 1 2 =
Project Operation • Notation:A1, A2, …, Ak (r) where A1, A2 are attribute names and r is a relation name. • The result is defined as the relation of k columns obtained by erasing the columns that are not listed • Duplicate rows removed from result, since relations are sets • E.g. To eliminate the branch-name attribute of accountaccount-number, balance (account)
Union Operation – Example • Relations r, s: A B A B 1 2 1 2 3 s r r s: A B 1 2 1 3