[Figure: design framework diagram. Labels shown: Universe of Interest; Requirements Specification (Process Specifications, Data Specifications); Process Model; Conceptual Design/Schema [ER Modeling Grammar], comprising an ER Diagram plus a semantic integrity constraints list; Logical Data Modeling, comprising a Design-Specific ER Model plus an updated semantic integrity constraints list; Technology-Independent Logical Schema [Information Preserving Grammar]; Normalization; Technology-Dependent Logical Schema [Relational Modeling Grammar]; Physical Design/Schema.]
Copyright 2015 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
DATA MODELING AND
DATABASE DESIGN
Second Edition
Narayan S. Umanath
University of Cincinnati
Richard W. Scamell
University of Houston
This is an electronic version of the print textbook. Due to electronic rights restrictions,
some third party content may be suppressed. Editorial review has deemed that any suppressed
content does not materially affect the overall learning experience. The publisher reserves the right
to remove content from this title at any time if subsequent rights restrictions require it. For
valuable information on pricing, previous editions, changes to current editions, and alternate
formats, please visit www.cengage.com/highered to search by ISBN#, author, title, or keyword for
materials in your areas of interest.
Data Modeling and Database Design, Second Edition
Narayan S. Umanath and Richard W. Scamell
Production Director: Patty Stephan
Product Manager: Clara Goosman
Managing Developer: Jeremy Judson
Content Developer: Wendy Langeurd
Product Assistant: Brad Sullender
Senior Marketing Manager: Eric La Scola
IP Analyst: Sara Crane
Senior IP Project Manager: Kathryn Kucharek
Manufacturing Planner: Ron Montgomery
Art and Design Direction, Production Management, and Composition: PreMediaGlobal
Cover Image: © VikaSuh/www.Shutterstock.com
© 2015 Cengage Learning
WCN: 02-200-203
ALL RIGHTS RESERVED. No part of this work covered by the copyright
herein may be reproduced, transmitted, stored, or used in any form or by
any means graphic, electronic, or mechanical, including but not limited to
photocopying, recording, scanning, digitizing, taping, Web distribution,
information networks, or information storage and retrieval systems, except
as permitted under Section 107 or 108 of the 1976 United States Copyright
Act, without the prior written permission of the publisher.
For product information and technology assistance, contact us at
Cengage Learning Customer & Sales Support, 1-800-354-9706
For permission to use material from this text or product,
submit all requests online at www.cengage.com/permissions
Further permissions questions can be e-mailed to
[email protected]
Library of Congress Control Number: 2014934580
ISBN-13: 978-1-285-08525-8
ISBN-10: 1-285-08525-6
Cengage Learning
20 Channel Center Street
Boston, MA 02210
USA
To Beloved Bhagwan Sri Sathya Sai Baba, the very source
of my thoughts, words, and deeds
To my Graduate Teaching Assistants and students,
the very source of my inspiration
To my dear children, Sharda and Kausik, always concerned
about their dad overworking
To my dear wife Lalitha, a pillar of courage I always lean on
Uma
BRIEF CONTENTS
Preface
Chapter 1
Database Systems: Architecture and Components
Chapter 2
Foundation Concepts
Chapter 3
Entity-Relationship Modeling
Chapter 4
Enhanced Entity-Relationship (EER) Modeling
Chapter 5
Modeling Complex Relationships
Chapter 6
The Relational Data Model
Part III: Normalization
Chapter 7
Functional Dependencies
Chapter 8
Normal Forms Based on Functional Dependencies
Chapter 9
Higher Normal Forms
Part IV: Database Implementation Using the Relational Data Model
Chapter 10
Database Creation
Chapter 11
Relational Algebra
Chapter 12
Structured Query Language (SQL)
Chapter 13
Advanced Data Manipulation Using SQL
Appendix A
Data Modeling Architectures Based on the Inverted Tree and Network Data Structures
Appendix B
Object-Oriented Data Modeling Architectures
Index
PREFACE
QUOTE
Everything should be made as simple as possible—but no simpler.
—Albert Einstein
Popular business database books typically provide broad coverage of a wide variety of
topics, including data modeling, database design and implementation, database
administration, the client/server database environment, the Internet database
environment, distributed databases, and object-oriented database development. This is
invariably at the expense of deeper treatment of critical topics, such as principles of
data modeling and database design. Using current business database books in our
courses, we found that in order to properly cover data modeling and database design,
we had to augment the texts with significant supplemental material (1) to achieve
precision and detail and (2) to impart the depth necessary for the students to gain a
robust understanding of data modeling and database design. In addition, we ended up
skipping several chapters as topics to be covered in a different course. We also know
other instructors who share this experience. Broad coverage of many database topics
in a single book is appropriate for some audiences, but that is not the aim of this
book.
The goal of Data Modeling and Database Design, Second Edition is to provide
core competency in the areas that every Information Systems (IS), Computer Science
(CS), and Computer Information Systems (CIS) student and professional should
acquire: data modeling and database design. It is our experience that this set of
topics is the most essential for database professionals, and that, covered in sufficient
depth, these topics alone require a full semester of study. It is our intention to
address these topics at the level of technical depth achieved in CS textbooks, yet make
them palatable to the business student/IS professional with little sacrifice in precision. We
deliberately refrain from the mathematics and algorithmic solutions usually found in
CS textbooks, yet we attempt to capture the precision therein via heuristic
expressions.
Data Modeling and Database Design, Second Edition provides not just hands-on
instruction in current data modeling and database design practices; it also gives readers a
thorough conceptual background for these practices. We do not subscribe to the idea
that a textbook should limit itself to describing what is actually being practiced.
Teaching only what is being practiced is bound to lead to knowledge stagnation.
Where do practitioners learn what they know? Did they invent the relational data
model? Did they invent the ER model? We believe that it is our responsibility not only to
present industry “best practices” but also to provide students (future practitioners)
with concepts and techniques that are not necessarily used in industry today
but can enliven their practice and help it evolve without knowledge stagnation. One
of the coauthors of this book has worked in the software development industry for
over 15 years, with a significant focus on database development. His experience indicates
that having a richness of advanced data modeling constructs available enhances
the robustness of database design and that practitioners readily adopt these techniques
in their design practices.
In a nutshell, our goal is to take an IS/CS/CIS student/professional through an
intense educational experience, starting at conceptual modeling and culminating in a
fully implemented database design—nothing more and nothing less. This educational
journey is briefly articulated in the following paragraphs.
STRUCTURE
We have tried very hard to make the book “fluff-free.” It is our hope that every sentence
in the book, including this preface, adds value to a reader’s learning (and footnotes
are no exception to this statement).
The book begins with an introduction to rudimentary concepts of data, metadata,
and information, followed by an overview of data management. Pointing out the limitations
of file-processing systems, Chapter 1 introduces database systems as a solution to
overcome these limitations. The architecture and components of a database system that
make this possible are discussed. The chapter concludes with the presentation of a
framework for the database system design life cycle. Following the introductory chapter
on database systems architecture and components, the book contains four parts.
database systems. We have chosen to limit the scope of this book exclusively to data
modeling and database design since we firmly believe that this set of topics is the
core of database systems and must be learned in depth by every IS/CS/CIS student
and practitioner. Any system designed robustly has the potential to best serve the
needs of the users. More importantly, a poor design is a virus that can ruin an
enterprise.
In this light, we believe these are the unique strengths of this book:
• It presents conceptual modeling using the entity-relationship modeling grammar,
including extensive discussion of the enhanced entity-relationship (EER)
model.
We believe that a conceptual model should capture all possible constraints
conveyed by the business rules implicit in users’ requirement specifications.
To that end, we posit that an ER diagram is not an ER model unless
accompanied by a comprehensive specification of characteristics of and
constraints pertaining to attributes. We accomplish this via a list of
semantic integrity constraints (sort of a conceptual data dictionary) that
will accompany an ER diagram, a unique feature that we have not seen in
other database textbooks. We also seek to demonstrate the systematic
development of a multi-layer conceptual data model via a comprehensive
illustration at the beginning of each Part. We consider the multi-layer
modeling strategy and the heuristics for systematic development as unique
features of this book.
• It includes substantial coverage of higher-degree relationships and other
complex relationships in the entity-relationship diagram.
Most business database books seem to provide only a cursory treatment of
complex relationships in an ER model. We not only cover relationships
beyond binary relationships (e.g., ternary and higher-degree relationships),
we also clarify the nuances pertaining to the necessity and efficacy of
higher-degree relationships and the various conditions under which even
recursive and binary relationships are aggregated in interesting ways to
form cluster entity types.
• It discusses the information-preserving issue in data model mapping and
introduces a new information-preserving grammar for logical data modeling.
Many computer scientists have noted that the major difficulty of logical
database design (i.e., transforming an ER schema into a schema in the language
of some logical model) is the information preservation issue. Indeed,
assuring a complete mapping of all modeling constructs and constraints
that are inherent, implicit or explicit, in the source schema (e.g., ER/EER
model) is problematic since constraints of the source model often cannot be
represented directly in terms of structures and constraints of the target
model (e.g., relational schema). In such a case, they must be realized
through application programs; alternatively, an information-reducing transformation
must be accepted (Fahrner and Vossen, 1995). In their research,
initially presented at the Workshop on Information Technologies (WITS) in
the ICIS (International Conference on Information Systems) in Brisbane,
SUPPORTING TECHNOLOGIES
Any business database book can be effective only when supporting technologies are
made available for student use. Yet, we don’t think that the type of book we are writing
should be married to any commercial product. The specific technologies that will
render this book highly effective include a drawing tool (such as Microsoft Visio), a
software engineering tool (such as ERWIN, ORACLE/Designer, or Visible Analyst),
and a relational database management system (RDBMS) product (such as ORACLE,
SQL Server, or DB2).
SUPPLEMENTAL MATERIALS
The following supplemental materials are available to instructors when this book is
used in a classroom setting. Some of these materials may also be found on the
Cengage Learning Web site at www.cengage.com.
• Electronic Instructor’s Manual: The Instructor’s Manual assists in class
preparation by providing suggestions and strategies for teaching the text, and
solutions to the end-of-chapter questions/problems.
• Sample Syllabi and Course Outline: The sample syllabi and course outlines
are provided as a foundation to begin planning and organizing your course.
• Cognero Test Bank: Cognero allows instructors to create and administer
printed, computer (LAN-based), and Internet exams. The Test Bank includes
an array of questions that correspond to the topics covered in this text,
enabling students to generate detailed study guides that include page
references for further review. The computer-based and Internet testing
components allow students to take exams at their computers, and also save
the instructor time by automatically grading each exam. The Test Bank is
also available in Blackboard and WebCT versions posted online at
www.course.com.
• PowerPoint Presentations: Microsoft PowerPoint slides for each chapter are
included as a teaching aid for classroom presentation, to make available to
students on the network for chapter review, or to be printed for classroom
distribution. Instructors can add their own slides for additional topics they
introduce to the class.
• Figure Files: Figure files from each chapter are provided for the instructor’s
use in the classroom.
• Data Files: Data files containing scripts to populate the database tables used
as examples in Chapters 11 and 12 are provided on the Cengage Learning
Web site at www.cengage.com.
ACKNOWLEDGMENTS
We have never written a textbook before. We have been using books written by our
academic colleagues, always supplemented with handouts that we developed ourselves.
Over the years, we accumulated a lot of supplemental material. In the beginning,
we took the positive feedback from the students about the supplemental
material rather lightly until we started to see comments like “I don’t know why I
bought the book; the instructor’s handouts were so good and much clearer than the
book” in the student evaluation forms. Our impetus to write a textbook thus originated
from the consistent positive feedback from our students.
We also realized that, contrary to popular belief, business students are certainly
capable of assimilating intricate technical concepts; the trick is to frame the concepts
in meaningful business scenarios. The unsolicited testimonials from our alumni about
the usefulness of the technical depth offered in our database course in solving real-world
design problems reinforced our faith in developing a book focused exclusively
on data modeling and database design that was technically rigorous but permeated
with business relevance.
Since we both teach database courses regularly, we have had the opportunity to
field-test the manuscript of this book for close to 10 years at both undergraduate-level
and graduate-level information systems courses in the Carl Lindner College of
Business at the University of Cincinnati and in the C. T. Bauer College of Business at
the University of Houston. Hundreds of students—mostly business students—have
used earlier drafts of this textbook so far. Interestingly, even the computer science
and engineering students taking our courses have expressed their appreciation of the
content. This is a long preamble to acknowledge one of the most important and formative
elements in the creation of this book: our students.
The students’ continued feedback (comments, complaints, suggestions, and criticisms)
has significantly contributed to the improvement of the content. As we were
cycling through revisions of the manuscript, the graduate teaching assistants of
Dr. Umanath were a constant source of inspiration. Their meaningful questions and
suggestions added significant value to the content of this book. Dr. Scamell was ably
assisted by his graduate assistants as well.
We would also like to thank the following reviewers whose critiques, comments,
and suggestions helped shape every chapter of this book’s first edition:
Akhilesh Bajaj, University of Tulsa
Iris Junglas, Florida State University
Margaret Porciello, State University of New York/Farmingdale
Sandeep Purao, Pennsylvania State University
Jaymeen Shah, Texas State University
Last, but by no means least, we gratefully acknowledge the significant contribution
of Deb Kaufmann and Kent Williams, the development editors of our first and
second editions, respectively. We cannot thank them enough for their thorough,
prompt, and supportive efforts.
Enjoy!
N. S. Umanath
R. W. Scamell
CHAPTER 1
DATABASE SYSTEMS:
ARCHITECTURE AND
COMPONENTS
Data modeling and database design involve elements of both art and engineering.
Understanding user requirements and modeling them in the form of an effective logical
database design is an artistic process. Transforming the design into a physical database
with functionally complete and efficient applications is an engineering process.
To better comprehend what drives the design of databases, it is important to understand
the distinction between data and information. Data consists of raw facts; that is,
facts that have not yet been processed to reveal their meaning. Processing these facts
provides information on which decisions can be based.
Timely and useful information requires that data be accurate and stored in a manner
that is easy to access and process. And, like any basic resource, data must be managed
carefully. Data management is a discipline that focuses on the proper acquisition, storage,
maintenance, and retrieval of data. Typically, the use of a database enables efficient and
effective management of data.
This chapter introduces the rudimentary concepts of data and how information
emerges from data when viewed through the lens of metadata. Next, the discussion
addresses data management, contrasting file-processing systems with database systems.
This is followed by brief examples of desktop, workgroup, and enterprise databases. The
chapter then presents a framework for database design that describes the multiple tiers of
data modeling and how these tiers function in database design. This framework serves as a
roadmap to guide the reader through the remainder of the book.
the number 31 (obtained by summing the 10 digits). A mathematician may see a set of
prime numbers, viz., 2, 3, 5, 7, 11, 13, 17. Another might see a person’s phone number with
the first three digits constituting the area code and the remaining seven digits the local
phone number. On the other hand, if the first digit is used to represent a person’s gender
(1 for male and 2 for female) and the remaining nine digits the person’s Social Security
number, the 10 digits would mean something else. Numerous other interpretations are pos-
sible, but without a context it is impossible to say what the digits represent. However, when
framed in a specific context (such as being told that the first digit represents a person’s
gender and the remaining digits the Social Security number), the data is transformed into
information. It is important to note that “information” is not necessarily the “Truth” since
the same data yields different information based on the context; information is an inference.
Metadata, in a database environment, is data that describes the properties of data. It
contains a complete definition or description of database structure (i.e., the file structure,
data type, and storage format of each data item), and other constraints on the stored data.
For example, when the structure of the 10 digits 2357111317 is revealed, the 10 digits
become information, such as a phone number. Metadata defines this structure. In other
words, through the lens of metadata, data takes on specific meaning and yields information.1
Metadata may be characterized as follows:
• The lens to view data and infer information
• A precise definition of the context for framing the data
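The role of metadata as a lens can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `METADATA` layouts and the field names in them are hypothetical and are not taken from Table 1.1. The same raw digits are read three ways, once with no structure at all (a digit sum) and twice through different field layouts.

```python
RAW = "2357111317"  # the raw data: ten digits with no inherent meaning

# Hypothetical metadata: each context maps field names to (start, end) positions.
METADATA = {
    "phone_number": [("area_code", 0, 3), ("local_number", 3, 10)],
    "gender_and_ssn": [("gender_code", 0, 1), ("ssn", 1, 10)],
}


def interpret(raw, layout):
    """Apply a field layout (metadata) to raw digits and return named fields."""
    return {name: raw[start:end] for name, start, end in layout}


# One interpretation needs no layout: summing the ten digits.
digit_sum = sum(int(d) for d in RAW)

# Two further interpretations of the very same data, each driven by metadata.
phone = interpret(RAW, METADATA["phone_number"])
person = interpret(RAW, METADATA["gender_and_ssn"])

print(digit_sum, phone, person)
```

The data never changes; only the layout chosen from `METADATA` does, which is why the resulting information is better regarded as an inference than as the truth.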
Table 1.1 contains metadata for the data associated with a manufacturing plant. Later
in this chapter, we will see that in a database environment, metadata is recorded in what
is called a data dictionary.
Record Type    Data Element    Data Type    Size    Source    Role    Domain
As reflected in Table 1.1, the smallest unit of data is called a data element. A group of
related data elements treated as a unit (such as Pl_name, Pl_number, Budget, Building,
1
With the advent of the data warehouse, the term “metadata” assumes a more comprehensive
meaning to include business and technical metadata, which is outside the scope of the current
discussion.
and No_of_employees) is called a record type. A set of values for the data elements
constituting a record type is called a record instance or simply a record. A file is a collection
of records. A file is sometimes referred to as a data set. A company with 10 plants would
have a PLANT file or a PLANT data set that contains 10 records.
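The terms data element, record type, record instance, and file can be illustrated with a short Python sketch. The data element names come from the text; the data types and all values are assumptions made for the example, since the body of Table 1.1 is not reproduced here.

```python
from dataclasses import dataclass, fields


@dataclass
class Plant:
    """A record type: related data elements treated as a unit."""
    Pl_name: str            # data types are assumed for illustration
    Pl_number: int
    Budget: float
    Building: str
    No_of_employees: int


# A record instance: one set of values for the data elements (values invented).
plant_1 = Plant("Plant-1", 1, 1_000_000.0, "B-1", 250)

# A file (data set): a collection of records. Ten plants yield ten records.
plant_file = [
    Plant(f"Plant-{n}", n, 1_000_000.0, f"B-{n}", 250) for n in range(1, 11)
]

# The data elements that make up the record type.
data_elements = [f.name for f in fields(Plant)]

print(data_elements, len(plant_file))
```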
billing systems. Nonetheless, understanding their limitations provides insight into the
development of and justification for database systems.
Figure 1.1 shows three file-processing systems for a hypothetical university. One pro-
cesses data for students, another processes data for faculty and staff, and a third processes
data for alumni. In such an environment, each file-processing system has its own collec-
tion of private files and programs that access these files.
While an improvement over the manual systems that preceded them, file-processing
systems suffer from a number of limitations:
• Lack of data integrity—Data integrity ensures that data values are correct,
consistent, complete, and current. Duplication of data in isolated file-
processing systems leads to the possibility of inconsistent data. Then it is
difficult to identify which of these duplicate data is correct, complete, and/
or current. This creates data integrity problems. For example, if an
employee who is also a student and an alumnus changes his or her mailing
address, files that contain the mailing address in three different file-
processing systems require updating to ensure consistency of information
across the board. Data redundancy across the three file-processing
systems not only creates maintenance inefficiencies, it also leads to the
problem of not knowing which is the current, correct, and/or complete
address of the person.
• Lack of standards—Organizations with file-processing systems often lack or
find it difficult to enforce standards for naming data items as well as for
accessing, updating, and protecting data. The absence of such standards can
2
In a database context, the word “schema” stands for “description of metadata.”
3
While an external schema is technically a collection of external subschemas or views, the term
“external schema” is used here in the context of either an individual user view or a collection of
different user views.
4
Informally, a “view” is a term that describes the information of interest to a user or a group of
users, where a user can be either an end user or a programmer. See Chapter 6 (Section 6.4) for
a more precise definition of a “view.”
schema. The internal schema describes the physical structure of the stored data (how the
data is actually laid out on storage devices) and the mechanism used to implement the
access strategies (indexes, hashed addresses, and so on). The internal schema is con-
cerned with efficiency of data storage and access mechanisms in the database. Thus, the
internal schema is technology dependent, while the conceptual schema and external
schemas are technology independent. In principle, user views are generated on demand
through logical reference to data items in the conceptual schema independent of the logi-
cal or physical structure of the data.
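How user views can be generated by logical reference to a shared schema may be sketched with Python's built-in `sqlite3` module. The EMPLOYEE table, its columns, and the two view names are hypothetical; the sketch only assumes that a base table stands in for the conceptual schema and that each SQL view stands in for one external view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the conceptual schema (hypothetical table and columns).
    CREATE TABLE EMPLOYEE (
        Emp_number INTEGER PRIMARY KEY,
        Emp_name   TEXT NOT NULL,
        Dept       TEXT NOT NULL,
        Salary     REAL NOT NULL
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Ada', 'Research', 90000),
                                (2, 'Grace', 'Payroll', 85000);

    -- Two external views, each exposing only what one user group needs.
    CREATE VIEW DIRECTORY_VIEW AS SELECT Emp_name, Dept FROM EMPLOYEE;
    CREATE VIEW PAYROLL_VIEW   AS SELECT Emp_number, Salary FROM EMPLOYEE;
""")

directory = conn.execute(
    "SELECT Emp_name, Dept FROM DIRECTORY_VIEW ORDER BY Emp_name").fetchall()
payroll = conn.execute(
    "SELECT Emp_number, Salary FROM PAYROLL_VIEW ORDER BY Emp_number").fetchall()

print(directory, payroll)
```

Neither view stores data of its own; both are derived on demand from the same underlying table, and neither says anything about how that table is laid out on storage.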
in Figure 1.3. Here, the programmer’s view corresponds to the physical structure of the
data, meaning that the physical structure of data (internal schema) is fully mapped
(incorporated) into the application program. The file-processing system lacks program-
data independence because any modification to the storage structure or access strategy in
the internal schema necessitates changes to application programs and subsequent recom-
pilation and testing. In the absence of a conceptual schema, the internal schema struc-
tures are necessarily mapped directly to external views (or subschemas). Consequently,
changes in the internal schema require appropriate changes in the external schema;
therefore, data independence is lost. Because changes to the internal schema, such as
incorporating new user requirements and accommodating technological enhancements,
are expected in a typical application environment, absence of a conceptual schema essen-
tially sacrifices data independence. In short, file-processing systems lack data indepen-
dence because they employ what amounts to a two-schema architecture.
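The loss of program-data independence can be made concrete with a small Python sketch. The fixed-width record layout below is hypothetical; the point is only that the physical layout is written into the application code, so a change to the storage structure silently breaks the program until it is modified and retested.

```python
# Hypothetical fixed-width layout, hard-coded in the application program:
# positions 0-9 hold the plant name, positions 10-14 hold the plant number.
def read_plant_number(record: str) -> str:
    return record[10:15].strip()


# A record written under the original storage structure.
old_record = "Alpha".ljust(10) + "00042"

# The storage structure changes: the name field is widened to 15 characters.
new_record = "Alpha".ljust(15) + "00042"

old_result = read_plant_number(old_record)  # reads the intended field
new_result = read_plant_number(new_record)  # reads padding, not the number

print(repr(old_result), repr(new_result))
```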
1.5 CHARACTERISTICS OF DATABASE SYSTEMS
Database systems seek to overcome the two root causes of the limitations that plague file-
processing systems by creating a single integrated set of files that can be accessed by all
users. This integrated set of files is known as a database. A database management system
(typically referred to as a DBMS) is a collection of general-purpose software that facilitates
the processes of defining, constructing, and manipulating a database for various applica-
tions. Figure 1.4 provides a layman’s view of the difference between a database and a
database management system. This illustration shows how neither a user nor a program-
mer is able to access data in the database without going through the database manage-
ment system software. Whether a program is written in Java, C, COBOL, or some other
language, the program must “ask” the DBMS for the data, and the DBMS will fetch the
data. SQL (Structured Query Language) has been established as the language for accessing
data in a database by the International Organization for Standardization (ISO) and
the American National Standards Institute (ANSI). Accordingly, any application program
that seeks to access a database must do so via embedded SQL statements.
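A minimal sketch of a program "asking" the DBMS for data is given below, using SQLite through Python's standard `sqlite3` module. Note the assumptions: SQLite stands in for the DBMS, the STUDENT columns and rows are invented, and Python passes SQL text through a call interface rather than through pre-compiled embedded SQL, so this illustrates the idea rather than the exact mechanism described above.

```python
import sqlite3

# The program never touches the stored files; every request goes to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (Student_Number INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)", [(1, "Lee"), (2, "Kim")])

# The program states which data it wants; the DBMS locates and fetches it.
rows = conn.execute(
    "SELECT Name FROM STUDENT WHERE Student_Number = ?", (2,)).fetchall()

print(rows)
```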
An important purpose of this book is to discuss how to organize the data items
conceptualized in Figure 1.4. In reality, data items do not exist in one big pool surrounded by
the database management system. Several different architectures exist for organizing this
data. One is a hierarchical organization, another is a network organization, and a third is
relational; in this book, the relational approach is emphasized.5 While the data items that
collectively comprise the database at the physical level are stored as sequential, indexed,
and random files, the DBMS is a layer on top of these files that frees the user and appli-
cation programs from the burden of knowing the structures of the physical files (unlike a
file-processing system).
Next, let us look more closely at what constitutes a database, a database management
system, and finally a database system.
5
Two relatively new data modeling architectures (the object-oriented data model and the object-
relational model) also exist. Appendix B briefly discusses each of these architectures. Appendix A
reviews architectures based on the hierarchical and network organizations.
Data manipulation languages (DMLs) facilitate the retrieval, insertion, deletion, and
modification of data in a database. SQL is the best-known nonprocedural6 DML and
can be used to specify many complex database operations in a concise manner. Most
DBMS products also include procedural language extensions, such as Oracle PL/SQL, to
supplement the capabilities of SQL. SQL statements can also be embedded in host
languages such as C, Java, Visual Basic, and COBOL; pre-compilers extract the data
manipulation commands written in SQL from such a program and send them to a DML
compiler for compilation into object code for subsequent database access by the run-time
subsystem.7 Finally, the access routines handle database access at run time by passing
requests to the file manager of the operating system to retrieve data from the physical files
of the database.
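The four DML operations, and the difference between nonprocedural and procedural retrieval, can be sketched as follows. The COURSE table, its columns, and its rows are assumptions made for the example, and SQLite (via Python's `sqlite3` module) stands in for the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE COURSE ("
    "Course_Number TEXT PRIMARY KEY, Title TEXT, Credits INTEGER)")

# Insertion
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?, ?)",
    [("IS101", "Databases", 3),
     ("IS102", "Systems Analysis", 3),
     ("IS103", "Seminar", 1)])

# Modification
conn.execute("UPDATE COURSE SET Credits = 4 WHERE Course_Number = 'IS101'")

# Deletion
conn.execute("DELETE FROM COURSE WHERE Course_Number = 'IS103'")

# Retrieval, nonprocedural: the statement says WHAT is wanted, not how to find it.
declarative = conn.execute(
    "SELECT Title FROM COURSE WHERE Credits >= 4 ORDER BY Title").fetchall()

# Retrieval, procedural: the program spells out HOW, row by row.
procedural = []
for title, credits in conn.execute("SELECT Title, Credits FROM COURSE"):
    if credits >= 4:
        procedural.append((title,))
procedural.sort()

print(declarative, procedural)
```

Both retrievals return the same rows; the nonprocedural form leaves the choice of access strategy to the DBMS.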
Much as a dictionary is a reference book that provides information about the form,
origin, function, meaning, and syntax of words, a data dictionary in a DBMS environment
6
SQL is known to be a nonprocedural language since it only specifies what data to retrieve as
opposed to specifying how actually to retrieve it. A procedural language specifies how to retrieve
data in addition to what data to retrieve.
7
The run-time subsystem of a database management system processes applications created by the
various design tools at run time.
stores metadata that provides such information as the definitions of the data items and
their relationships, authorizations, and usage statistics. The DBMS makes use of the data
dictionary to look up the required data component structures and relationships, thus
relieving application developers (end users and programmers) from having to incorporate
data structures and relationships in their applications. In addition, any changes
made to the physical structure of the database are automatically recorded in the data
dictionary. This removes the need to modify application programs that access the
modified structure.
As a simple illustration of what constitutes a data dictionary, consider a database for
a university with four tables (i.e., data sets) of user data: a STUDENT table, an ADVISOR
table, a COURSE table, and an ENROLLMENT table. Assume that the STUDENT table
contains one row for each of the 40,000 students, the ADVISOR table one row for each of
the 500 academic advisors, the COURSE table one row for each of the 2,000 courses, and
the ENROLLMENT table one row for each enrollment of a student in a course. As shown
in the following table, the DB_TABLES data dictionary
table8 would include one row for each table containing user data. While the DB_TABLES
data dictionary table shown here (see Table 1.2) contains only the table name, number of
columns, number of rows, and primary key for each table of user data, a data dictionary
table comparable to DB_TABLES in a commercial database management system product
might have several dozen columns of data about the STUDENT table.
Table Name    Number of Columns    Number of Rows    Primary Key
The DB_TABLES data dictionary table indicates that the STUDENT table contains
four columns but, other than indicating that the Student_Number column is the primary
key, does not contain any information about any of the other column names. However,
this data is contained in the DB_TABLES_COLUMNS data dictionary table (see Table 1.3).
DB_TABLES_COLUMNS contains one row for each column in each of the four
tables STUDENT, ADVISOR, COURSE, and ENROLLMENT. As was the case for
DB_TABLES, a data dictionary table comparable to DB_TABLES_COLUMNS in a
commercial database management system product might have several dozen columns of data
about each column in each table of user data.
8
Data dictionary tables are typically accessed via built-in views.
It is important to note that the content of these data dictionary tables is updated
whenever a change is made to the database. For example, if a PROFESSOR table that
consists of five columns is added to the database, then one row would be inserted into the
DB_TABLES data dictionary table and five rows would be inserted into the
DB_TABLES_COLUMNS data dictionary table. Each time a row is inserted into the PROFESSOR
table, the Number of Rows column in the DB_TABLES data dictionary table would be
incremented by one.
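The behavior of a data dictionary can be observed in a small sketch that uses SQLite's own catalog through Python's `sqlite3` module. Several assumptions apply: SQLite's `sqlite_master` table and `PRAGMA table_info` stand in for DB_TABLES and DB_TABLES_COLUMNS, all column names other than Student_Number are invented, and SQLite does not record row counts in its catalog, so the hypothetical helper `db_tables` counts rows on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (Student_Number INTEGER PRIMARY KEY, Name TEXT,
                          Major TEXT, Advisor_Number INTEGER);
    CREATE TABLE ADVISOR (Advisor_Number INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE COURSE (Course_Number TEXT PRIMARY KEY, Title TEXT);
    CREATE TABLE ENROLLMENT (Student_Number INTEGER, Course_Number TEXT,
                             PRIMARY KEY (Student_Number, Course_Number));
""")


def db_tables(conn):
    """Return DB_TABLES-like rows: (table name, number of columns, number of rows)."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    result = []
    for name in names:
        # PRAGMA table_info returns one row per column of the named table.
        n_columns = len(conn.execute(f'PRAGMA table_info("{name}")').fetchall())
        n_rows = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        result.append((name, n_columns, n_rows))
    return result


before = db_tables(conn)

# Adding a five-column PROFESSOR table is reflected in the catalog automatically.
conn.execute(
    "CREATE TABLE PROFESSOR (Professor_Number INTEGER PRIMARY KEY, Name TEXT, "
    "Dept TEXT, Office TEXT, Phone TEXT)")
conn.execute("INSERT INTO PROFESSOR VALUES (1, 'Rao', 'IS', 'L-210', '555-0100')")

after = db_tables(conn)
print(before)
print(after)
```

No application code maintains the catalog here; the DBMS updates it as a side effect of the `CREATE TABLE` statement.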
While not a component of the DBMS per se, the data repository has become an inte-
gral part of the data management suite of tools. The data repository is a collection of
metadata about data models and application program interfaces. CASE (computer-aided
software engineering) tools such as Oracle Designer and ERWIN that are used for develop-
ing a conceptual/logical schema9 interact with the data repository and are independent of
the database and the DBMS.
9
The data modeling activity includes the development of the conceptual and logical schemas. The
role of the conceptual schema in database design is discussed in Section 1.6.
database management system software to create and manipulate the database, a database
management system is usually purchased from a software vendor such as IBM, Microsoft,
or Oracle. Figure 1.6 illustrates how a database system for the hypothetical university
introduced earlier might be structured.
The notion of data relatability involves the creation of logical relationships between
different types of records, usually in different files. In a file-processing environment,
information often cannot be generated without a programmer writing or at least modifying
an application program to consolidate the files. With the advent of database systems, all
that is necessary is for one to specify the data to be combined; the DBMS will perform the
necessary operations to accomplish the task.
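Data relatability can be sketched with a join. In the example below, which uses SQLite through Python's `sqlite3` module, the column names and rows are invented for illustration; the user only specifies which data to combine, and the DBMS carries out the matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ADVISOR (Advisor_Number INTEGER PRIMARY KEY, Advisor_Name TEXT);
    CREATE TABLE STUDENT (
        Student_Number INTEGER PRIMARY KEY,
        Student_Name   TEXT,
        Advisor_Number INTEGER REFERENCES ADVISOR (Advisor_Number)
    );
    INSERT INTO ADVISOR VALUES (10, 'Dr. Shah'), (20, 'Dr. Ortiz');
    INSERT INTO STUDENT VALUES (1, 'Lee', 10), (2, 'Kim', 20), (3, 'Cruz', 10);
""")

# Records of two different types are related through a shared value.
pairs = conn.execute("""
    SELECT s.Student_Name, a.Advisor_Name
    FROM STUDENT AS s
    JOIN ADVISOR AS a ON a.Advisor_Number = s.Advisor_Number
    ORDER BY s.Student_Name
""").fetchall()

print(pairs)
```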
[Figure 1.7: Data modeling/database design life cycle. Stages shown: Universe of Interest; Requirements Specification (Process Specifications, Data Specifications); Process Model; Conceptual Design/Schema [ER Modeling Grammar] (ER Diagram; Design-Specific ER Model + Updated semantic integrity constraints List); Logical Data Modeling (Technology-Independent Logical Schema [Information Preserving Grammar]; Normalization; Technology-Dependent Logical Schema [Relational Modeling Grammar]); Physical Design/Schema.]
The initial step in the design process is the requirements specification. During this
step, systems analysts review existing documents and systems and interview prospective
users in an effort to identify the objectives to be supported by the database system. The
output of the requirements specification activity is a set of data and process specifications.
This is essentially an organized conglomeration of user-specified restrictions on the orga-
nization’s activities (business processes) that must be reflected in the database and/or
database applications. Such restrictions are commonly referred to as business rules.
In order to define the data requirements, one needs to know the process requirements—
that is, what is going to be done with the data. For example, suppose a company is going to
sell a product. What processes are involved? When a company sells a product, it bills the
customers who purchase the product. Then, shipping has to be notified to dispatch the
product to the customer. Shipping also has to check the inventory and make sure that
inventory levels are adjusted as a result of sales. The inventory system must make sure that
inventory levels are optimal and, accordingly, replenish inventory periodically.
10
The early technique for expressing process requirements was “data flow diagrams.” The current
methods include UML (Unified Modeling Language) and BPMN (Business Process Modeling Notation).
An examination of approaches for identifying process requirements is usually part of systems analysis
and design and thus is not discussed here.
11
Entity relationship (ER) modeling is a “design by analysis” modeling approach and is top-down in
nature, while NIAM (Nijssen Information Analysis Methodology) modeling is a “design by synthesis”
approach and is bottom-up in nature.
12
It is not possible to capture all business rules contained in a requirements specification in the ER
diagram. Semantic integrity constraints are business rules that are not captured in the ER diagram.
[Figure 1.8a: ER diagram for the vignette. Entity labels: LAW_FIRM, LAWYER, CLIENT, LAW_SUIT. Relationship labels: Files, Employs, Handles, Represents, Involved_in. Attribute labels: Location, Status, No_of_emps, Type_of_case, Role, Name, First_nm, Last_nm, Gender, Employee#, Address, Phone#, Qualification, Experience.]
The ERD is a pictorial representation of the business rules reflected in the require-
ments specification. As stated earlier in this section, ER modeling is covered in
Chapters 2, 3, 4, and 5. So, here we can use a simple intuitive interpretation of the ERD.
LAW_FIRM, LAWYER, CLIENT, and LAW_SUIT appear as entity units, with specific
attributes describing each of them (see Figure 1.8a). The diamonds represent specific
associations among these entity units. One can see the user-friendliness of this diagram
and appreciate how it can be used by systems analysts as a tool to communicate with
the user community about how the user requirements are being modeled for ultimate
implementation as a database system.
The relational schema for the vignette mapped from the ER model in Figure 1.8a is
displayed in Figure 1.8b. It should be obvious to the reader that the relational schema is
certainly not a user-friendly rendition of the metadata specification and cannot be used for
communication with the user community. The relational schema here is the database
design using the relational architecture and is intended for communication with the
database designer/architect/administrator. From an intuitive stance, one can see that, for
starters, each entity unit in the ERD is mapped to a relation schema. All associations
except one among these entity units are marked by an expression of connectivity between
attributes in the respective entity units using simple arrow marks. The relationship
Involved_in, however, is expressed as a relation schema, due to the nature of the association
between CLIENT and LAW_SUIT (m:n). Please note that the other three diamonds
convey an association of (1:n or n:1). Likewise, the descriptor Location of LAW_FIRM,
indicating multiple locations (see Figure 1.8a), is mapped as a separate relation schema in
the logical model.
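The two mapping decisions just described can be sketched in SQL, run here through Python's `sqlite3` module. This is not the schema of Figure 1.8b: the key columns (Firm_id, Client_id, Suit_id), the sample rows, and the table name LAW_FIRM_LOCATION are assumptions introduced for illustration. The sketch shows only that an m:n relationship becomes a relation of its own and that a multivalued descriptor becomes a separate relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE LAW_FIRM (Firm_id INTEGER PRIMARY KEY, Name TEXT);

    -- Multivalued descriptor Location: mapped to a separate relation.
    CREATE TABLE LAW_FIRM_LOCATION (
        Firm_id  INTEGER NOT NULL REFERENCES LAW_FIRM (Firm_id),
        Location TEXT NOT NULL,
        PRIMARY KEY (Firm_id, Location)
    );

    CREATE TABLE CLIENT (Client_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE LAW_SUIT (Suit_id INTEGER PRIMARY KEY, Type_of_case TEXT);

    -- m:n relationship Involved_in: mapped to a relation of its own.
    CREATE TABLE INVOLVED_IN (
        Client_id INTEGER NOT NULL REFERENCES CLIENT (Client_id),
        Suit_id   INTEGER NOT NULL REFERENCES LAW_SUIT (Suit_id),
        PRIMARY KEY (Client_id, Suit_id)
    );

    INSERT INTO LAW_FIRM VALUES (1, 'Firm A');
    INSERT INTO LAW_FIRM_LOCATION VALUES (1, 'Cincinnati'), (1, 'Dayton');
    INSERT INTO CLIENT VALUES (1, 'Client X'), (2, 'Client Y');
    INSERT INTO LAW_SUIT VALUES (1, 'Civil'), (2, 'Patent');
    INSERT INTO INVOLVED_IN VALUES (1, 1), (1, 2), (2, 1);
""")

locations = conn.execute(
    "SELECT Location FROM LAW_FIRM_LOCATION WHERE Firm_id = 1 ORDER BY Location"
).fetchall()
suits_per_client = conn.execute(
    "SELECT Client_id, COUNT(*) FROM INVOLVED_IN GROUP BY Client_id ORDER BY Client_id"
).fetchall()
clients_per_suit = conn.execute(
    "SELECT Suit_id, COUNT(*) FROM INVOLVED_IN GROUP BY Suit_id ORDER BY Suit_id"
).fetchall()

# A row that refers to a nonexistent client is rejected by the DBMS.
try:
    conn.execute("INSERT INTO INVOLVED_IN VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(locations, suits_per_client, clients_per_suit, rejected)
```

One client appears in several suits and one suit involves several clients, which a single foreign key column in either CLIENT or LAW_SUIT could not express.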
The transition from a logical schema to a physical design entails an intermediate step
of transforming the logical schema to a database language. In the relational database
architecture, this language is called SQL. For a quick preview of SQL, Figure 1.8c presents
the relational schema displayed in Figure 1.8b in SQL—more precisely, using the Data
Definition Language (DDL) subset of SQL.
FIGURE 1.8c DDL/SQL for the relational schema in Figure 1.8b (continued)
As shown in Figure 1.7, the physical design activity is fully technology dependent.
Physical design involves using the tools of a particular DBMS product to create the data-
base and to design and develop applications that address the high-level requirements of
the universe of interest. The objective here is twofold: (a) developing an appropriate
structure for the database and (b) keeping focus on performance while determining the
physical structure for the database. A good physical database design is impossible without
the database designer understanding the “job mix” for the particular application
environment—that is, the mix of transactions, queries, applications, etc.
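As one small illustration of a physical design decision driven by the job mix, the sketch below adds an index for a frequently run query and inspects the access plan before and after. It assumes SQLite (through Python's `sqlite3` module) as the DBMS product; the table contents, the index name `idx_student_advisor`, and the query are invented. The exact wording of the plan text varies between SQLite versions, so the sketch only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT ("
    "Student_Number INTEGER PRIMARY KEY, Name TEXT, Advisor_Number INTEGER)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(n, f"Student {n}", n % 50) for n in range(1, 1001)])

# Assume the job mix is dominated by lookups of students by advisor.
query = "SELECT Student_Number, Name FROM STUDENT WHERE Advisor_Number = ?"


def plan(conn):
    """Return the DBMS's access plan for the query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, (7,)).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan text


result_before = conn.execute(query, (7,)).fetchall()
plan_before = plan(conn)

# Physical design change: an index that supports the dominant query.
conn.execute("CREATE INDEX idx_student_advisor ON STUDENT (Advisor_Number)")

result_after = conn.execute(query, (7,)).fetchall()
plan_after = plan(conn)

print(plan_before)
print(plan_after)
```

The query text and its result are unchanged by the index; only the access strategy chosen by the DBMS differs, which is the sense in which physical design can be tuned without touching the logical schema.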
The ANSI/SPARC three-schema architecture in Figure 1.3 forms the basis for the data
modeling/database design life cycle shown in Figure 1.7. Starting with the conceptual
design activity and progressing through the logical design and physical design activities
mirrors the nucleus of interest of the three-schema architecture—that is, the conceptual
schema. This figure is replicated at the beginning of Parts I, II, III, and IV of this book to
serve as a roadmap through the topics covered in the chapters of each part.
Chapter Summary
Data consists of raw facts—that is, facts that have not yet been organized or processed to
reveal their meaning. Information is data in context—that is, data that has been organized into
a specific context that has meaning and value. Metadata describes the properties of data. It is
through the lens of metadata that data becomes information.
Data management involves four actions: creating, retrieving, updating, and deleting data.
Two data management functions support these actions: organizing data and accessing data.
Two primary forms of access are sequential access and direct access.
Database systems have been successful because they overcome the problems associated
with the lack of integration of data and program-data dependence that plague their predeces-
sors, file-processing systems. Database management system (DBMS) software has been the
vehicle that has allowed many organizations to move from a file-processing environment to a
database system environment. Among the components of a DBMS are tools for (a) retrieving
and analyzing data in a database, (b) creating reports, (c) creating the structure of database
objects, (d) protecting the database from unauthorized use, and (e) facilitating recovery from
various types of failures. In a DBMS environment, the data dictionary (metadata that describes
characteristics of data) functions as the lens through which data in the database is viewed.
The ANSI/SPARC three-schema architecture divides a database system into three levels or
tiers. The external level is closest to the users and is concerned with the way data is used or
viewed by individual users. The conceptual level is technology independent and represents the
global or community view of the entire database. The internal level is the one closest to physical
storage and is concerned with the way the data is physically stored. As such, the internal level
is technology dependent. The data as perceived at each level are described by a schema (or
subschemas, in the case of the external level). A file-processing system is essentially a two-tier
architecture with only external and internal levels. Without the conceptual schema, the internal
schema must be mapped directly into external views. Thus, changes in the internal schema
require appropriate changes in the external subschemas; this is how data independence is lost.
Data models play a crucial role in database design. Data models describe the database
structure. The approach described in this book begins with the creation of a conceptual schema
that describes the structure of the data to be stored in the database without specifying how and
where it will be stored or the methods used to retrieve it. The conceptual schema takes the form
of Presentation Layer and Design-Specific ER models and, once appropriately validated,
serves as input to the logical design activity. During logical design, the technology-independent
conceptual schema evolves into a technology-dependent logical schema. This technology-
dependent logical schema is subsequently used during the physical design activity. The transi-
tion from logical design to physical design is initiated by the execution of the DDL/SQL script.
Exercises
1. What is the difference between data, metadata, and information?
2. Demonstrate your understanding of data, metadata, and information using an example.
3. Describe the four actions involved in data management.
4. Distinguish between sequential access and direct access. Give an example of a type of
application for which each is particularly appropriate.
5. Identify a common task in a payroll system for which sequential access is more appropriate
than direct access and explain why this is so.
6. What is the difference between a serial collection of data and a sequential collection of
data? Which can be used for direct access?
7. What is the purpose of an external index?
8. What is data integrity, and what is the significance of a lack of data integrity?
9. Describe the limitations of file-processing systems. How do database systems make it
possible to overcome these limitations?
10. Using the Internet, trace the history of ANSI and ISO and their relevance to the information
systems discipline. Write a summary of your findings.
11. Describe the structure of the ANSI/SPARC three-schema architecture. Compare this
structure with that of the two-schema architecture inherent in a file-processing system.
12. Explain why a file-processing system may be referred to as belonging to a two-schema
architecture.
13. Define data independence.
14. What is the difference between logical and physical data independence? Why is the
distinction between the two important?
15. What is the difference between a database and a database management system?
16. Since ANSI and ISO have adopted SQL as the standard language for database access,
explore via the Internet the history and features of SQL and its appropriateness for
database access. Write a summary of your findings.
17. Write a short essay (one or two pages) about distributed databases using information
available from Internet resources.
18. Write a short essay (one or two pages) about data warehousing using information available
from Internet resources.
19. Oil companies have functional databases, and the consumer-product industry tends to have
product databases. How do financial institutions and the airline industry classify their
enterprise database systems? Use Internet resources to find the answer, and record your
findings.
20. Find out and describe briefly what a CASE tool is, using Internet sources.
21. Distinguish between a model and a data model.
22. What is the role of data models in database design?
23. Write a short essay (one or two pages) summarizing the content of the Harvard Business
Review article on databases cited in Figure 1.4.
PART I
CONCEPTUAL DATA
MODELING
INTRODUCTION
Database systems typically have a very limited understanding of what the data in the database actu-
ally mean. Therefore, semantic modeling (the overall activity of attempting to represent meaning)
can be a valuable precursor to the database design process to capture at least some of the meaning
conveyed by the users’ business rules. The term “conceptual modeling” is often used as a synonym
for semantic modeling. In this book, conceptual modeling refers to only data specifications (not pro-
cess specifications) of the user requirements, at a high level of abstraction. Batini, Ceri, and Navathe
(1992) have argued that conceptual modeling in database design is justified because it encourages
user participation in the design process, allows the model to be more DBMS independent, facilitates
understanding of how the database fits into the organization as a whole, and eases maintenance of
schemas and applications in the long run. Sophisticated interpretations of the meaning of data are
FIGURE I.1 Wand and Weber’s framework for research in conceptual modeling
The chapters in Part I introduce and elaborate upon conceptual data modeling using
the Wand and Weber framework. Chapter 2 introduces the fundamental constructs and
rules for the entity-relationship (ER) modeling grammar, which is used in this book as the
tool for conceptual data modeling. The constructs pertain to inter-entity class binary rela-
tionships. Chapter 3 employs a comprehensive case to illustrate the method component of
the Wand and Weber framework. In this chapter, the method to use the ER modeling
grammar is explicated in progressive steps leading to the emergence of a specific script
(i.e., an ER diagram and a semantic integrity constraints list) for the case in question.
Chapter 4 introduces newer constructs that enhance the ER (EER) modeling grammar
with means to model intra-entity class relationships. An extension to the case from
Chapter 3 incorporating a story line that requires application of EER constructs is used
here to demonstrate the method pertaining to the use of the EER constructs. Chapter 5
presents higher-order relationships, namely relationships of degree 3 and beyond, innova-
tive use of the grammar, and a few additional ER constructs (e.g., cluster entity type,
interrelationship dependency). A second comprehensive case is used to illustrate the
method component of the modeling framework.
Figure I.2, which replicates Figure 1.7, is a “road map” that serves as an overview of
the process of data modeling and database design, indicating how the topics in Part I fit
into the overall picture.
[Figure I.2 diagram: Universe of Interest; Requirements Specification (Process Specifications and Data Specifications); Process Modeling (Process Model); Conceptual Data Modeling, marked "We are here" (ER Modeling Grammar; Presentation Layer ER Model: ER diagram plus a list of other semantic integrity constraints; Design-Specific ER Model: ER diagram plus updated semantic integrity constraints list); Logical Data Modeling (technology-independent logical schema, Information Preserving Grammar; normalization; technology-dependent logical schema, Relational Modeling Grammar); Physical Design/Schema]
FIGURE I.2 Road map for data modeling and database design
CHAPTER 2
FOUNDATION CONCEPTS
This chapter introduces the fundamental constructs and rules for conceptual data model-
ing using the entity-relationship (ER) modeling grammar as the modeling tool. The basic
units of the ER model—that is, entity type, entity class, attribute, unique identifier, and
relationship type—are treated in detail. The chapter also includes a brief introduction to
cluster entity type and a comprehensive treatment of the incorporation of deletion rules
in the ER modeling grammar.
Real-world primitive    Conceptual primitive
Property                Attribute
Fact                    Value
Association             Relationship
TABLE 2.1 Equivalence between real world primitives and conceptual primitives
In the conceptual world, an object type is referred to as an entity type. Objects belonging
to an object type are considered to be entities or entity instances of the corresponding entity
type. The concept of an entity is the most fundamental one of the ER modeling grammar and
serves as the foundation for other concepts. The entity is not the actual person or object: the
entity representing Anna Li takes the form of a record in a STUDENT data set. Thus,
actual students are student objects and are referred to in terms of a STUDENT object type,
whereas a representation of the STUDENT object type is called the STUDENT entity type.
In the real world there can be many occurrences of a particular object type. For
example, there can be 35,000 students enrolled and taking courses in a university during
a semester—thus, 35,000 student objects. In this case, there would be 35,000 student
entities represented by 35,000 records in a STUDENT data set. The collection of these
35,000 student entities is referred to as an entity set.
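As a rough illustration of these terms (it is not part of the ER modeling grammar), the sketch below treats a Python class as the STUDENT entity type, each object as an entity instance, and a collection of objects as the entity set. The attribute names and the second student are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Stands in for the STUDENT entity type; the attributes are illustrative."""
    student_no: str
    name: str


# Each record is one entity (entity instance) representing a real-world student.
anna = Student(student_no="S001", name="Anna Li")
raj = Student(student_no="S002", name="Raj Patel")  # hypothetical second student

# The collection of all STUDENT entities is the entity set.
student_entity_set = {anna, raj}
print(len(student_entity_set))
```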
An object type can have many properties. For example, a STUDENT object type has
properties such as student number, date of birth, gender, and so on. Correspondingly, an
entity type is said to have attributes.
An entity or entity instance is created when a value is supplied for some attribute(s).
Thus, a STUDENT data set with 35,000 records would contain values that represent the facts
associated with the 35,000 student objects. In addition, in the real world, a fact is drawn
from a property value set. In the conceptual world, the value of an attribute comes from a
domain of possible values. For example, the domain for the attribute Gender can be (Male,
Female), whereas the domain associated with the set of two-character U.S. postal codes is
(AK, AL, AR, … , WY). A domain can be either explicit or implicit. The domain for Gender is
an example of an explicit domain consisting of a set of only two possible values. The domain
for the attribute Salary is an example of an implicit domain because it is not practical to
explicitly list the set of all possible salaries between, say, $10,000 and $2,000,000.
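The difference between the two kinds of domain can be sketched as follows. An explicit domain is written out as an enumerated set, whereas an implicit domain is described by a rule. The function names are assumptions made for this illustration, and the salary bounds are simply the ones mentioned above.

```python
# Explicit domain: every permitted value is listed.
GENDER_DOMAIN = {"Male", "Female"}


def in_gender_domain(value: str) -> bool:
    return value in GENDER_DOMAIN


# Implicit domain: permitted values are described by a rule, not enumerated.
# The bounds are the illustrative ones from the text.
def in_salary_domain(value: float) -> bool:
    return 10_000 <= value <= 2_000_000


print(in_gender_domain("Female"), in_salary_domain(5_000))
```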
In the real world, there are associations between objects of different object types.
For example, students enroll in courses, suppliers supply parts, and salespersons process
orders. In the conceptual world, these associations are referred to as relationships.
Two other terms need to be introduced as part of our foundation concepts: object class
and entity class. In the real world, an object class is a generalization of related object
types that have shared properties; in the conceptual world, an entity class is a generaliza-
tion of different related entity types that have shared attributes. For example, the entity
type CHAIR and the entity type TABLE would both belong to the entity class FURNITURE.
Likewise, STUDENT, FACULTY, and CUSTOMER entity types can be said to belong to an
entity class called HUMAN_BEING.
Attribute Characteristics
*In a stricter sense, other characteristics of an attribute (e.g., type, value) are also viewed as part of
the domain of an attribute.
A variety of data types can be associated with attributes. A “numeric” data type is
used when an attribute’s value can consist of positive and negative numbers; it is often
used in arithmetic operations. Numeric attributes can further be constrained so as to allow
only integer values, decimal values, and so on. The “alphabetic” data type permits an
attribute to consist of only letters and spaces, whereas an “alphanumeric” data type allows
the value of an attribute to consist of text, numbers (telephone numbers, postal codes,
account numbers, and so on), and certain special characters. Alphabetic and alphanumeric
data types should not be used for attributes involved in arithmetic operations. Likewise,
an attribute that is not involved in arithmetic operations should not be defined with a
numeric data type, even if its values contain only digits (telephone number, Social Security
number); an alphanumeric type keeps textual manipulations possible. A “logical” data type is
associated with an attribute whose value can be either true or false. Attributes with a “date”
data type occur frequently in database applications (for example, date of birth, date hired,
or flight date).
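A small sketch shows why digit-only values that never take part in arithmetic are better kept as text. The postal code used here is a hypothetical value chosen because it starts with a zero.

```python
# A hypothetical postal code with a leading zero.
postal_code_text = "02134"

# Stored as a number, the leading zero is lost.
postal_code_number = int(postal_code_text)
print(postal_code_number)  # 2134

# Stored as text, the value is preserved and textual operations stay available.
print(postal_code_text[:3])  # 021
```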
A particularly important characteristic of an attribute is its classification—that is,
whether it is an atomic attribute or a composite attribute. An attribute that has a discrete
factual value and cannot be meaningfully subdivided is called an atomic or simple attri-
bute. On the other hand, a composite or molecular attribute can be meaningfully subdi-
vided into smaller subparts (i.e., atomic attributes) with independent meaning. Salary is an
atomic attribute because it cannot be meaningfully divided further. Depending on the
user’s specification, Name can be an atomic attribute or a composite attribute made up of
First name, Middle initial, and Last name. Figure 2.1 illustrates how an address might be
modeled in the form of a hierarchy of composite attributes.
[Figure 2.1: Address modeled as a hierarchy of composite attributes]
An attribute can be either a stored attribute or a derived attribute. In some cases, two
or more attributes are related in the sense that the value of one can be calculated or
derived from the values of the other(s). For example, if Flight time is calculated as the
difference between the arrival time at the destination and departure time at the point of
origin, then Flight time in this case can be a derived attribute since it need not be stored.
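One way to picture a derived attribute is as a value computed on demand from stored attributes. The sketch below is only an illustration: the class and attribute names are assumptions, and it assumes both times are expressed in the same time zone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Flight:
    # Stored attributes (names are illustrative).
    departure_time: datetime
    arrival_time: datetime

    @property
    def flight_time(self) -> timedelta:
        """Derived attribute: computed on demand, never stored."""
        return self.arrival_time - self.departure_time


flight = Flight(datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 11, 30))
print(flight.flight_time)
```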
Most attributes have a single value for a particular entity and are referred to as single-
valued attributes. For example, Date of birth is a single-valued attribute of an employee. There
are attributes, however, that can have more than one value. For example, a programmer may
be skilled in several programming languages, thus making the attribute Skill a multi-valued
attribute. Figure 2.2 contains another example of a multi-valued attribute; here, whereas
Album_no, Price, and Stock are single-valued attributes, Artist_nm is a multi-valued attribute.
[Figure 2.2: example in which Artist_nm is a multi-valued attribute and Album_no, Price, and Stock are single-valued]
For each entity of an entity type, some attributes must be assigned a value. Such attri-
butes are referred to as mandatory attributes. On the other hand, attributes that need not be
assigned a value for each entity are referred to as optional attributes. For example, it is pos-
sible that the attribute Salary might be classified as an optional attribute if the salary of each
employee need not necessarily be provided or is unknown. Another attribute, Commission,
might be classified as an optional attribute because a commission is not necessarily mean-
ingful for job types other than, perhaps, “salesperson.” Attributes classified as optional are
assigned a special value called “null” when their value is not available or is unknown.1
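The mandatory/optional distinction can be sketched with Python's None standing in for null. This is an analogy under stated assumptions rather than a definition: the Employee class and its attribute names are hypothetical, and, like null, None does not by itself say whether a value is unknown or simply not applicable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    emp_no: str                          # mandatory: must always be supplied
    salary: Optional[float] = None       # optional: may be unknown
    commission: Optional[float] = None   # optional: may not apply to the job type


clerk = Employee(emp_no="E100", salary=42_000.0)
print(clerk.commission is None)
```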
Composite and/or multi-valued attributes that are nested as a meaningful cluster are
called complex attributes.2 As an example, let’s consider the medical profile for a patient
as shown here:
Medical_profile (Blood (Type, Cholesterol (HDL, LDL, Triglyceride), Sugar), Height,
Weight, {Allergy (Code, Name, Intensity)})
Observe that composite attributes are enclosed in parentheses ( ), whereas multi-valued
attributes are enclosed in braces { }. The medical profiles for two patients appear in Figure 2.3.
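The nesting in this notation can be mirrored with nested record types, as in the sketch below: each parenthesized group becomes a small class (a composite), and the braces become a list (a multi-valued component). The class names, the field types, and all sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cholesterol:      # composite
    hdl: int
    ldl: int
    triglyceride: int


@dataclass
class Blood:            # composite that nests another composite
    blood_type: str
    cholesterol: Cholesterol
    sugar: int


@dataclass
class Allergy:          # composite; used below as a multi-valued component
    code: str
    name: str
    intensity: str


@dataclass
class MedicalProfile:   # complex attribute
    blood: Blood
    height: float
    weight: float
    allergies: List[Allergy] = field(default_factory=list)  # { } in the notation


# All values below are invented for illustration.
profile = MedicalProfile(
    blood=Blood("O+", Cholesterol(55, 110, 140), 90),
    height=170.0,
    weight=65.0,
    allergies=[Allergy("A1", "Pollen", "Mild"), Allergy("A2", "Peanut", "Severe")],
)
print(len(profile.allergies))
```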
1 “Value unknown” and “value exists but is not available” constitute what is typically called “missing
data.”
2 A complex attribute is most likely an entity type instead of an attribute. Nonetheless, the purpose
here is to explain the existence of an attribute called a complex attribute.
Observe that the first six business rules are obvious from the narrative, whereas the
last two (shown in italics) are inferred and precisely specified from the requirements
specification as a whole.
A systematic study of the narrative reveals ambiguities requiring clarification by the
user community, who can be asked such questions as:
• Is a particular course offered in more than one quarter?
• Are there courses that are still “in the books” but no longer offered?
• Can an instructor teach for more than one college in the university?
Answers to such questions will sharpen the requirements specification by way of
additional business rules. In short, business rules are indispensable in the process of
translating a requirements specification to integrity constraints.
In general, integrity constraints are considered part of the schema (the description of the
metadata) in that they are declared along with the structural design of the data model (concep-
tual, logical, and physical) and hold for all valid states of a database that correctly model an
application. Although it is possible to specify all the integrity constraints as a part of the model-
ing process, some cannot be expressed explicitly or implicitly in the schema of the data model.
For instance, an ER diagram (conceptual schema) is not capable of expressing domain con-
straints of attributes. Likewise, there are other constraints that a logical schema (relational
schema) is not capable of expressing (this topic is covered in more detail in Chapter 6). As a
consequence, such constraints are carried forward through the data modeling tiers in textual
form and are often referred to as semantic integrity constraints.
At the conceptual tier of data modeling, two types of data integrity constraints per-
taining to entity types and attributes are specified:
• Domain constraint—This is imposed on an attribute to ensure that its
observed value is not outside the defined domain.
• Uniqueness constraint—This requires entities of an entity type to be
uniquely identifiable. It is sometimes referred to as the “key constraint.”
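A uniqueness constraint can be checked mechanically once the identifying attribute(s) are known. The following is a minimal sketch, assuming entities are held as dictionaries; the function name, the attribute names, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def duplicate_identifiers(entities: Iterable[Dict[str, str]],
                          identifier: Tuple[str, ...]) -> List[Tuple[str, ...]]:
    """Return identifier values that occur in more than one entity."""
    seen = set()
    duplicates = []
    for entity in entities:
        key = tuple(entity[attr] for attr in identifier)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


students = [
    {"student_no": "S001", "name": "Anna Li"},
    {"student_no": "S002", "name": "Raj Patel"},
    {"student_no": "S001", "name": "A. Li"},   # same identifier value twice
]
print(duplicate_identifiers(students, ("student_no",)))
```

An empty result means every entity in the collection is uniquely identifiable by the chosen attribute(s).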
3 Strictly speaking, the data type and size of an attribute can also be construed as domain constraints
on the attribute.
error will not be identified. Here is an example of an explicit specification of a domain
constraint: A Student_type takes a value from the set {FR, SO, JR, SR, GR}. Here is another
example: Music_skill takes a value from the set {Rock, Jazz, Classical}.
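The two explicit domain constraints just given could be enforced along the following lines. This is a sketch, not a prescribed mechanism: the helper name check_domain and the rejected value are assumptions chosen for the example.

```python
STUDENT_TYPE_DOMAIN = {"FR", "SO", "JR", "SR", "GR"}
MUSIC_SKILL_DOMAIN = {"Rock", "Jazz", "Classical"}


def check_domain(attribute: str, value: str, domain: set) -> None:
    """Raise ValueError when the observed value is outside the defined domain."""
    if value not in domain:
        raise ValueError(f"{attribute}={value!r} is outside its domain")


check_domain("Student_type", "JR", STUDENT_TYPE_DOMAIN)       # accepted
try:
    check_domain("Music_skill", "Polka", MUSIC_SKILL_DOMAIN)  # rejected
except ValueError as err:
    print(err)
```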
NOTE
When a composite or complex attribute is optional and one or more of its proper subsets (atomic or composite)
are mandatory (e.g., Medical Profile in Figure 2.4), it simply means that whenever the composite or complex
attribute is present, at least the mandatory subset(s) must be present.
4 Another popular term, “primary key,” is deliberately avoided here because a strict definition of the
term is possible only in the context of a relation schema discussed in Chapter 6.
5 A unique identifier is irreducible when none of its proper subsets is a unique identifier. The
concept of irreducibility is discussed in detail in Chapter 6.
6 In formal terms, a key attribute is a proper subset of a unique identifier and a non-key attribute is
any attribute that is not a subset of a unique identifier.
[Figure residue: attribute labels F_name, M_int, L_name, Pat_name, Pat_prefix, Pat_no, Pat_id, Date_of_birth, Medical_profile]
relationship, a more generalized term is “n-ary relationship,” in which the degree of the
relationship type is “n.” A relationship type in which an entity type is related to itself is
termed a “recursive relationship type.”
Figures 2.5 through 2.9 illustrate binary, ternary, quaternary, and recursive relationship
types, respectively; in the figures, a diamond signifies a relationship type. Figure 2.5
illustrates a binary relationship type called Flies between PILOT and FLIGHT. A particular
pilot flying a specific flight is an instance of the relationship type Flies. This relationship
instance is often referred to as a “relationship.” The set of all relationship instances
involving pilots and flights is called a relationship set.
Figure 2.6 illustrates the ternary relationship type Teaches among PROFESSOR,
SUBJECT, and COURSE. An instance of the relationship type Teaches involves a particu-
lar Professor teaching a certain Course in a specific Subject. Another relationship instance
of Teaches could involve the same Professor teaching another Course in the same Subject
area. A third instance of the relationship type Teaches could involve the same Professor
teaching a Course in a different Subject area. A fourth relationship instance of Teaches
could involve another Professor teaching a different Course in the previous Subject area.
Examples of the four relationship instances just described are as follows:
• Example 1: Professor Einstein teaches the course Optics in Physics.
• Example 2: Professor Einstein teaches the course Mechanics in Physics.
• Example 3: Professor Einstein teaches the course Calculus in Mathematics.
• Example 4: Professor Chu teaches the course Algebra in Mathematics.
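The four examples can be written down as a relationship set in which every relationship instance names one entity from each of the three participating entity types. The sketch below assumes a tuple-based representation; the class and field names are illustrative.

```python
from typing import NamedTuple


class Teaches(NamedTuple):
    """One instance of the ternary relationship type Teaches."""
    professor: str
    course: str
    subject: str


# The relationship set: the four instances from Examples 1 to 4.
teaches_set = {
    Teaches("Einstein", "Optics", "Physics"),
    Teaches("Einstein", "Mechanics", "Physics"),
    Teaches("Einstein", "Calculus", "Mathematics"),
    Teaches("Chu", "Algebra", "Mathematics"),
}

# Courses Einstein teaches in Physics.
print(sorted(t.course for t in teaches_set
             if t.professor == "Einstein" and t.subject == "Physics"))
```

Because each instance carries all three participants, none of the four facts could be recorded as a single pairing of only two entity types.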
[Instance diagram for Teaches: relationship instances r1 to r4 connect PROFESSOR entities (Einstein, Chu), SUBJECT entities (Physics, Mathematics), and course entities (Optics, Mechanics, Calculus, Algebra)]
Quaternary relationship types are discussed in greater detail in Chapter 5. Figure 2.9
illustrates a recursive relationship type in which a NURSE acts as a supervisor of other
nurses. Two instances of this relationship type might involve the same Nurse (e.g., Flor-
ence Nightingale) supervising two different Nurses (e.g., Jean Warren and Michael Evans).
The participation of an entity type in a relationship type can be indicated by its role
name. When used in recursive relationship types, role names describe the function of each
participation. The use of role names to describe the Supervises relationship type that
appears in Figure 2.9 is given in Figure 2.10. One participation of NURSE in the Super-
vises relationship is given the role name of Supervisor to reflect the possibility that a
nurse may supervise other nurses, and the second participation is given the role name of
Supervisee to indicate the possibility that a nurse may be supervised by another nurse.
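Role names matter most when both participants come from the same entity type, because position alone would not say who supervises whom. A minimal sketch, assuming the same tuple-style representation of relationship instances and using the nurses named above:

```python
from typing import NamedTuple


class Supervises(NamedTuple):
    """One instance of the recursive relationship type Supervises on NURSE."""
    supervisor: str   # role name of the first participation of NURSE
    supervisee: str   # role name of the second participation of NURSE


supervises_set = {
    Supervises(supervisor="Florence Nightingale", supervisee="Jean Warren"),
    Supervises(supervisor="Florence Nightingale", supervisee="Michael Evans"),
}

print(sorted(s.supervisee for s in supervises_set
             if s.supervisor == "Florence Nightingale"))
```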
Role names may also be used when two entity types are associated through more than
one relationship type. For example, consider the possibility that the PILOT and FLIGHT
entity types are associated through the two relationship types Flies and Scheduled_for.
Flies exists in order to indicate the specific pilot in charge of the flight, whereas
Scheduled_for indicates all pilots and co-pilots who are members of the flight crew. As
shown in Figure 2.11, the use of role names can clarify the purpose of each relationship.
Apart from these two situations (recursive relationship types and multiple relationship types
between the same entity types), role names are often unnecessary because the relationship
specification is unambiguous.7
7 In commercial software engineering tools, role names often replace the relationship symbol in the
specification of a relationship type.
FIGURE 2.12 Cardinality ratio and participation constraint for an m:n relationship (ER diagram and instance diagram)
8
Sometimes (for example, in UML—Unified Modeling Language), the term “multiplicity” is used
instead of “structural constraints.”
than n entities in entity set A. This is the general form of a cardinality constraint
in a binary relationship. An example of an m:n cardinality constraint
would involve the two entity types EMPLOYEE and CERTIFICATION, where
each Employee can hold many different Certifications and each Certification
can be held by many different Employees (see Figure 2.12).
• 1:n—An entity in entity set A is associated with no more than n entities in
entity set B; however, an entity in entity set B is associated with no more
than one entity in entity set A. When m takes a value of 1, the general form
m:n becomes 1:n. The Sales relationship type in Figure 2.13 is an example of
a 1:n cardinality constraint. A relationship of this type is sometimes referred
to as a parent-child relationship (PCR), in which the 1 side (SALESPERSON)
is the parent and the n side (VEHICLE) is the child.9
• n:1—An entity in entity set A is associated with no more than one entity of
entity set B; however, an entity in entity set B is associated with as many as
n entities in entity set A. This is just a reverse expression of the cardinality
constraint, 1:n. The Sales relationship type presented earlier, read from the
VEHICLE side, also serves as an example of an n:1 cardinality constraint (see Figure 2.13).
• 1:1—An entity in entity set A is associated with no more than one entity of
entity set B, and an entity in entity set B is associated with no more than one
entity in entity set A. When both m and n take a value of 1, the general form
m:n becomes 1:1. A 1:1 cardinality constraint would exist between the two
entity types EMPLOYEE and COMPUTER if each Employee was assigned no
more than a single Computer and each Computer was assigned to no more
than a single Employee (see Figure 2.14).
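The four ratios above can be illustrated with a minimal Python sketch that inspects a set of relationship instances. The function name and the sample pairs are hypothetical. A cardinality ratio is a constraint that comes from business rules, so the sketch only reports the tightest ratio that a given set of instances happens to satisfy; it does not derive the constraint itself.

```python
from collections import Counter


def cardinality_ratio(pairs):
    """Return '1:1', '1:n', 'n:1', or 'm:n' for a set of (a, b) instances.

    Each pair is one relationship instance linking an entity a of entity
    set A to an entity b of entity set B.
    """
    pairs = set(pairs)  # a relationship set holds no duplicate instances
    per_a = Counter(a for a, _ in pairs)  # instances per entity of A
    per_b = Counter(b for _, b in pairs)  # instances per entity of B
    a_many = max(per_a.values(), default=0) > 1  # some a relates to several b
    b_many = max(per_b.values(), default=0) > 1  # some b relates to several a
    if a_many and b_many:
        return "m:n"
    if a_many:
        return "1:n"
    if b_many:
        return "n:1"
    return "1:1"


# Hypothetical Sales instances: (salesperson, vehicle)
sales = [("s1", "v1"), ("s1", "v2"), ("s2", "v3")]
sales_ratio = cardinality_ratio(sales)
# The same instances read from the VEHICLE side
reverse_ratio = cardinality_ratio((v, s) for s, v in sales)
```

Swapping the order of the two entity sets turns a 1:n result into n:1, which mirrors the remark that n:1 is just the reverse expression of 1:n.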
FIGURE 2.13 Cardinality ratio of 1:n and partial participation of VEHICLE and total
participation of SALESPERSON in Sales
Note: Look across optional participation of VEHICLE and mandatory participation of SALESPERSON in Sales
9
The original source of this acronym is said to be the practitioners’ world. Bachman used it in the
data structure diagram that predates the ER model. The term was also prevalent in hierarchical and
network data models before the advent of the relational data model.
FIGURE 2.14 Cardinality ratio of 1:1 and partial participation of EMPLOYEE and total participation
of COMPUTER in Assignment
When the cardinality ratio in a binary relationship is 1:n, one of the entity types in the
relationship (PCR) is unambiguously the parent and the other is the child. On the other
hand, in a 1:1 binary relationship type it is not possible to unequivocally assign the parent
or child role to either of the participating entity types. In fact, from a modeling perspective,
both entity types will have to be evaluated in both roles. In an m:n binary relationship type,
both entity types hold the role of parent because both a 1:n relationship and a 1:m relation-
ship underlie an m:n relationship, where the relationship itself serves as a pseudo-entity
type taking on the role of the child in both of the underlying relationships—that is, a child
with two parents.
The cardinality constraint reflects the maximum cardinality of the entity types partic-
ipating in the binary relationship type. The maximum cardinality indicates the maximum
number of relationship instances in which an entity participates. For example, in the Sales
relationship type shown in Figure 2.13, a Salesperson entity is connected to a maximum of
n Sales relationship instances, whereas a Vehicle entity is connected to a maximum of one
Sales relationship instance.
The participation constraint for an entity type in a relationship type is based on
whether, in order to exist, an entity of that entity type needs to participate in that rela-
tionship. In a binary relationship type, the entity will be related to an entity of the other
entity type through this relationship type. Participation can be total or partial. If, in order
to exist, every entity of an entity type must participate in the relationship, then participa-
tion of the entity type in that relationship type is termed total participation. On the other
hand, if an entity in an entity set can exist without participating in the relationship, then
participation of the entity type in that relationship type is called partial participation.
Total and partial participation are also commonly referred to as mandatory and optional
participation, respectively. For example, if every Salesperson must have sold at least one
Vehicle, then there is total participation of SALESPERSON in the Sales relationship type.
Similarly, if a Salesperson need not have sold any Vehicle, then there is partial
FIGURE 2.15 Cardinality ratio of 1:n and total participation of VEHICLE and partial participation
of SALESPERSON in Sales
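As a rough illustration of total versus partial participation, the following Python sketch checks whether every entity of an entity set appears in at least one relationship instance. The function name and the sample data are invented for illustration; they follow the combination named in the Figure 2.15 caption (total participation of VEHICLE, partial participation of SALESPERSON). As with cardinality, the constraint itself comes from business rules, so the check only tests whether a given dataset conforms.

```python
def participation(entity_set, relationship_members):
    """Return 'total' if every entity participates, otherwise 'partial'."""
    return "total" if set(entity_set) <= set(relationship_members) else "partial"


# Hypothetical entity sets and Sales instances: (salesperson, vehicle)
salespersons = {"s1", "s2", "s3"}
vehicles = {"v1", "v2", "v3", "v4", "v5"}
sales = {("s1", "v1"), ("s1", "v2"), ("s2", "v3"), ("s2", "v4"), ("s2", "v5")}

# s3 has sold nothing, so SALESPERSON participates only partially
salesperson_participation = participation(salespersons, {s for s, _ in sales})
# every vehicle has been sold, so VEHICLE participates totally
vehicle_participation = participation(vehicles, {v for _, v in sales})
```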
illustrates the partial participation of NURSE as Supervisor and as Supervisee in the Supervises
relationship type. For example, observe how Nurse n1 is the supervisor of Nurses n2
and n3 but is not supervised by another Nurse. On the other hand, Nurse n2 is a supervisee
of Nurse n1 (i.e., is supervised by Nurse n1) but is not a supervisor of any Nurses.
FIGURE 2.16 Structural constraints for recursive relationships: cardinality ratio of 1:n
In Figure 2.16b, the small oval on the right-hand side of the ER diagram indicates
that a Nurse may or may not supervise other Nurses. However, the hash on the left-hand
side indicates that every Nurse must be supervised by another Nurse. Observe that in
the instance diagram, Nurse n3 is the supervisor of Nurses n1, n3, and n4 (i.e., Nurse
n3 acts as his/her own supervisor). In addition, note how each of the six relationship
instances pertains to a different one of the six nurses, indicating that each nurse is
supervised.
A recursive relationship type with an m:n cardinality ratio appears in Figure 2.17. In
this relationship, a Course may not only serve as a prerequisite for many other courses
but may also have many other courses as its prerequisites. The instance diagram in
Figure 2.17a illustrates this Prerequisite relationship type by showing a duplicate copy
of the COURSE entity type. Observe how Course c2 has Courses c1, c3, and c4 as its
prerequisites. In addition, note that Course c3 is a prerequisite of Course c2 as well as of
Course c5, whereas Course c2 is not a prerequisite of any other courses. Moreover, note
that Courses c1 and c3 have no prerequisites. The instance diagram in Figure 2.17b
illustrates the same relationships among courses in a truly recursive sense through the
use of only one COURSE entity type.
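A recursive m:n relationship set can be pictured as a set of pairs drawn from the same entity set. The Python sketch below is only an illustration: it contains just the Prerequisite instances that the text spells out for Figure 2.17 (the figure may show more), and the helper names are hypothetical.

```python
# (prerequisite, course) pairs stated in the text; both come from COURSE
prerequisite = {("c1", "c2"), ("c3", "c2"), ("c4", "c2"), ("c3", "c5")}


def prerequisites_of(course):
    """Courses that must be taken before the given course."""
    return sorted(p for p, c in prerequisite if c == course)


def is_prerequisite_for(course):
    """Courses that list the given course as a prerequisite."""
    return sorted(c for p, c in prerequisite if p == course)


c2_needs = prerequisites_of("c2")        # c2 has several prerequisites
c3_unlocks = is_prerequisite_for("c3")   # c3 is a prerequisite of several courses
c2_unlocks = is_prerequisite_for("c2")   # c2 is a prerequisite of none
```

The same course identifier can appear on either side of a pair, which is what makes the relationship recursive; one course having many prerequisites and one course serving many others is what makes it m:n.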
The attributes described in Section 2.3.1 can also be assigned to relationship types.
For example, consider the 1:n relationship type in Figure 2.18. Here, Figure 2.18a depicts
the condition in which Rent is a mandatory attribute of DORMITORY, conveying the
semantics of one fixed rent per Dormitory. If, on the other hand, Rent is shown as an
attribute of the relationship type (Figure 2.18b), the semantics of the diagram changes
to the following: The mandatory rent changes for a given dormitory based on the
occupancy—that is, each student may pay a different rent even in the same dormitory.
In a 1:n relationship type, attributes of the relationship can alternatively be shown as
attributes of the child entity type in the relationship without altering the semantics of
the relationship.
Consider the example shown in Figure 2.18c. Here, Rent is shown as an attribute of
the entity type STUDENT, meaning that the Rent can be different for different Students.
Thus, the semantics expressed in Figures 2.18b and 2.18c are the same. When Rent is
included as an attribute of DORMITORY (Figure 2.18a), even if it is shown as a multi-
valued attribute, it is impossible to identify the rent paid by each individual student living
in the dormitory. In short, when rent varies by occupant, rent can be an attribute of
STUDENT instead of Occupies without affecting the semantics. That is, an attribute of
a relationship type can always be stored in the child entity type in the relationship without
changing the semantics conveyed.
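The equivalence claimed for the 1:n case can be checked with a small Python sketch. The student and dormitory identifiers and the rent amounts are invented for illustration. Because each student occupies at most one dormitory, a rent recorded per Occupies instance and a rent recorded per Student carry the same information.

```python
# Figure 2.18b view: Rent is an attribute of the relationship type Occupies.
# One entry per relationship instance: student -> (dormitory, rent).
occupies = {
    "st1": ("d1", 500),
    "st2": ("d1", 650),  # same dormitory as st1, different rent
    "st3": ("d2", 500),
}

# Figure 2.18c view: Rent is an attribute of the child entity type STUDENT.
student_rent = {student: rent for student, (_, rent) in occupies.items()}
student_dorm = {student: dorm for student, (dorm, _) in occupies.items()}

# Rebuilding the relationship view from the child-side attribute loses nothing.
rebuilt = {s: (student_dorm[s], student_rent[s]) for s in student_dorm}

# Rents seen from the parent side: d1 has two different rents, so a single
# Rent value on DORMITORY could not represent this data.
rents_in_d1 = sorted(rent for dorm, rent in occupies.values() if dorm == "d1")
```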
Next, consider the 1:1 relationship type Heads between PROFESSOR and
DEPARTMENT, which is shown in Figure 2.19. In a semantic sense, the attributes
Start_dt and End_dt belong to the relationship type Heads because their values
are determined by when a particular Professor assumed the duties of the
Head of a particular Department and when those duties were relinquished.
Here, the attributes Start_dt and End_dt can be included either in the entity type
PROFESSOR or in the entity type DEPARTMENT because the cardinality ratio of the
relationship is 1:1, implying that either entity type can be the child in the relationship.
FIGURE 2.17 Structural constraints for recursive relationships: cardinality ratio of m:n
FIGURE 2.18c Rent as an attribute of the entity type STUDENT instead of an attribute
of the relationship type Occupies
[Figure 2.19, ER diagram: PROFESSOR (Name) and DEPARTMENT (Name) joined by the 1:1 relationship type Heads, which carries the attributes Start_dt and End_dt]
Note: Since the cardinality constraint of Heads is 1:1, Start_dt and End_dt can be attributes of either PROFESSOR or
DEPARTMENT instead of being attributes of Heads.
Attributes of m:n relationship types cannot be shown anywhere other than as attributes
of the relationship type itself. For example, consider the attribute Cost in the Imports
relationship type shown in Figure 2.20. Because the cost incurred by a vendor to import a
product is determined by a vendor-product combination, the cost can only be specified as an
attribute of Imports and not as an attribute of either VENDOR or PRODUCT.
[Figure 2.20 (a) ER diagram: VENDOR (Vendor_id, Vendor_name) and PRODUCT (Product_id, Product_name) joined by the m:n relationship type Imports, which carries the attribute Cost]
Note: Since the cardinality constraint of Imports is m:n, Cost cannot be an attribute of either VENDOR or
PRODUCT; it must remain as an attribute of Imports.
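The dependence of Cost on the vendor-product combination can be sketched in Python with a mapping keyed by the pair. All identifiers and amounts below are hypothetical.

```python
# Cost is determined by the (vendor, product) combination, so the pair is the key.
cost = {
    ("v1", "p1"): 10.0,
    ("v1", "p2"): 12.5,
    ("v2", "p1"): 9.0,
}


def costs_by_vendor(vendor):
    """Costs a vendor incurs, one per imported product."""
    return {p: c for (v, p), c in cost.items() if v == vendor}


def costs_by_product(product):
    """Costs incurred for a product, one per importing vendor."""
    return {v: c for (v, p), c in cost.items() if p == product}


v1_costs = costs_by_vendor("v1")    # two different costs for the same vendor
p1_costs = costs_by_product("p1")   # two different costs for the same product
```

Since a single vendor and a single product can each be associated with several costs, neither VENDOR nor PRODUCT alone has a place to hold one Cost value.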
entity type because it does not have a unique identifier of its own. To distinguish a base
entity type from a weak entity type, as shown in Figure 2.21, the weak entity type is
depicted as a double rectangular box. To signify the identification dependency of APARTMENT
on BUILDING, a double diamond is used to portray the identifying relationship
type, Contains. Recall that a single diamond is used to represent a “regular” relationship
between entity types.
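One way to picture identification dependency is to look ahead to a relational realization, which goes beyond the conceptual tier discussed here. The sketch below uses Python's standard sqlite3 module and assumes that an apartment is identified by its own number together with the identifier of its building; the table and column names are hypothetical, loosely following the Bldg_no and Apt_no attributes that appear with Figure 2.26.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE building (
    bldg_no INTEGER PRIMARY KEY
);
CREATE TABLE apartment (
    bldg_no INTEGER NOT NULL REFERENCES building(bldg_no),
    apt_no  INTEGER NOT NULL,
    PRIMARY KEY (bldg_no, apt_no)  -- identified only together with its building
);
""")
conn.executemany("INSERT INTO building VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO apartment VALUES (?, ?)", (1, 101))


def rejected(bldg_no, apt_no):
    """Return True if SQLite refuses to insert the given apartment."""
    try:
        conn.execute("INSERT INTO apartment VALUES (?, ?)", (bldg_no, apt_no))
        return False
    except sqlite3.IntegrityError:
        return True


same_number_other_building = not rejected(2, 101)  # apt 101 may recur in building 2
duplicate_rejected = rejected(1, 101)              # but not twice in building 1
orphan_rejected = rejected(3, 101)                 # and not without a building
```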
When the participation of an entity type in a relationship type is total (min = 1), the entity
type is said to have existence dependency on the relationship type irrespective of the other entity
type(s) present in the relationship.
When an entity type is dependent on (an)other entity type(s) for its unique identification, the
dependent entity type is said to have identification (ID) dependency on the identifying entity
type(s) in a relationship.
Additional examples of a weak entity type appear in Figures 2.22 through 2.26. Note
in Figure 2.22 that the weak entity type INTERNSHIP has a collective identification
dependency on both COMPANY and STUDENT via two independent identifying relation-
ships. Sample data for Figure 2.22 is shown in Figure 2.23.
FIGURE 2.22 Example of a weak entity type with multiple identifying parents
FIGURE 2.23 Sample data for the COMPANY, STUDENT, and INTERNSHIP entity types
in Figure 2.22
FIGURE 2.25 Sample data for the INTERNSHIP and TRAINING_PROGRAM entity types
of Figure 2.24
The final example in this section demonstrates how a weak entity type, though
identification-dependent on some base entity type(s), may also participate in other “regular”
(non-identifying) relationship(s). The example shown in Figure 2.26 is a minor extension
to the example discussed earlier using Figure 2.21.
[Figure 2.26, ER diagram: BUILDING (1) Contains (n) APARTMENT; APARTMENT (1) Rented_to (3) TENANT. Attributes shown in the diagram: Bldg_no, Size, No_of_floors, Apt_no, No_of_bedrooms, No_of_bathrooms, Sqr_ft, Rent, Vacancy, Name, Reference, Phone#]
Apartments are rented to tenants. The structural constraints of this regular relationship
indicate that an apartment need not be rented (that is, it may stay vacant) and can be rented to no
more than three tenants. A tenant, on the other hand, must rent an apartment. (Perhaps
that is why the entity type is named TENANT!) Also, a tenant cannot rent more than one
apartment. Observe that in this regular 1:n relationship, not only is the weak entity type
APARTMENT not identification-dependent on the base entity type TENANT, it is not even
existence-dependent on the relationship type Rented_to. Also, the weak entity type APARTMENT
is the parent in the Rented_to relationship, and the base entity type TENANT,
the child in the Rented_to relationship, is existence-dependent on Rented_to. Finally,
since the value of n (maximum cardinality) in this 1:n relationship happens to be
known as 3, the specification indicates the cardinality constraint as 1:3.
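The structural constraints of Rented_to can be sketched as a small Python check. The function name, the tenant identifiers, and the use of a (Bldg_no, Apt_no) pair to identify an apartment are assumptions made for illustration only.

```python
MAX_TENANTS_PER_APARTMENT = 3  # the "3" in the 1:3 cardinality constraint


def rent_apartment(rented_to, tenant, apartment):
    """Record that tenant rents apartment, enforcing the Rented_to constraints.

    rented_to maps each tenant to the single apartment that tenant rents;
    an apartment is identified by its (Bldg_no, Apt_no) pair.
    """
    if tenant in rented_to:
        # a tenant cannot rent more than one apartment
        raise ValueError(f"{tenant} already rents an apartment")
    occupants = sum(1 for apt in rented_to.values() if apt == apartment)
    if occupants >= MAX_TENANTS_PER_APARTMENT:
        raise ValueError(f"apartment {apartment} already has {occupants} tenants")
    rented_to[tenant] = apartment


rented_to = {}
for name in ("t1", "t2", "t3"):
    rent_apartment(rented_to, name, (1, 101))

try:
    rent_apartment(rented_to, "t4", (1, 101))
    fourth_tenant_accepted = True
except ValueError:
    fourth_tenant_accepted = False
```

Because a tenant exists in this sketch only as a key of `rented_to`, every tenant necessarily rents an apartment, while an apartment that never appears as a value is simply vacant.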
Product to any Customer. In fact, in the extreme case, the model would permit all ordered
Products to be shipped to all Customers by all the Warehouses. Rich modeling constructs
like a ternary relationship type can result in unmanageable relationship patterns; therefore,
they require careful evaluation before being incorporated. Figure 2.27a captures the semantics
conveyed by a business rule that prohibits more than one Warehouse from shipping the
same Product to the same Customer. However, the ER modeling grammar does not permit a
relationship type to be linked to another relationship type; doing so is a syntactic error.
In order to resolve this issue, a new construct labeled “cluster entity type” is
introduced in Figure 2.27b. The virtual entity type labeled ORDER denoted by a dotted line
encircling the composite object PRODUCT-Order-CUSTOMER is referred to as the cluster
entity type. Observe that in Figure 2.27b the relationship Shipment connects the base entity
type WAREHOUSE to the cluster entity type ORDER, thus averting the syntactic error
found in Figure 2.27a. The structural constraints of the relationship type Shipment now
ensure that a Customer can get an ordered Product shipped from only one Warehouse.
The use and value of cluster entity types is covered in great detail in Chapter 5.
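The effect of attaching Shipment to the cluster entity type ORDER can be sketched in Python by treating each (product, customer) order as a single unit that maps to at most one warehouse. The identifiers and the function name are hypothetical.

```python
# Instances of the Order relationship: (product, customer) pairs.
orders = {("p1", "cu1"), ("p1", "cu2"), ("p2", "cu1")}

# Shipment: each order, taken as a unit, is shipped from at most one warehouse.
shipment = {}


def ship(product, customer, warehouse):
    """Assign a warehouse to an existing order; reject a second warehouse."""
    order = (product, customer)
    if order not in orders:
        raise KeyError(f"no order for {order}")
    assigned = shipment.setdefault(order, warehouse)
    if assigned != warehouse:
        raise ValueError(f"{order} is already shipped from {assigned}")


ship("p1", "cu1", "w1")
ship("p1", "cu2", "w2")  # same product, different customer: another warehouse is fine

try:
    ship("p1", "cu1", "w2")
    second_warehouse_accepted = True
except ValueError:
    second_warehouse_accepted = False
```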
FIGURE 2.27 Customer order for a product shipped from only one warehouse: (a) Shipment (1:n) linking WAREHOUSE directly to the relationship type Order between PRODUCT and CUSTOMER; (b) Shipment (1:n) linking WAREHOUSE to the cluster entity type ORDER
in the database. Note that no action is needed when an entity is deleted from the entity set of
the child entity type in the relationship except when the cardinality constraint of the rela-
tionship is 1:1. When the cardinality constraint of a relationship type is 1:1, both the entity
types in the relationship can be a parent or a child and should be evaluated accordingly.
Four rules apply to deletion constraints: the restrict rule, the cascade rule, the set null
rule, and the set default rule.10 Here are descriptions of each:
• When an attempt is made to delete a parent entity in a relationship, if the deletion
should be disallowed when child entities related to this parent in this relationship
exist, the restrict rule is specified on the parent entity type in the relationship.
• When an attempt is made to delete a parent entity in a relationship, if all
child entities related to this parent in this relationship should be deleted
along with the parent entity, the cascade rule is specified on the child entity
type in the relationship.
• When an attempt is made to delete a parent entity in a relationship, if all child
entities related to this parent in this relationship should be retained but no
longer referenced to this parent while the deletion of the parent entity is
allowed, the set null rule is specified on the child entity type in the relationship.
• When an attempt is made to delete a parent entity in a relationship, if all
child entities related to this parent in this relationship should be retained
despite the deletion of the parent entity by shifting the parent reference
to a predefined default parent, the set default rule is specified on the child
entity type in the relationship.
10
Some argue that the deletion constraints belong in physical database design. It is our view that the
semantics for the deletion constraints also emerges from user-specified business rules and ought to
be captured, modeled, and passed through the data modeling tiers. A similar constraint is necessary
on those occasions when the value of the unique identifier of the parent entity is changed. This type
of constraint is called an update constraint. This is discussed in Section 10.1.1.1 of Chapter 10.
Figure 2.28 illustrates these deletion rules in the context of a relationship between the
entity type FACULTY and the entity type STUDENT. In Figure 2.28a, the restrict rule (R)
prohibits the deletion of a faculty member serving as the dissertation chair of one or more
Ph.D. students. Observe the conflict between the restrict rule and the total participation
constraint on FACULTY in this relationship. Since every faculty member must participate in
the Dis_chair relationship (that is, must be the dissertation chair of some Phd_student(s)),
the restrict rule here (Figure 2.28a1) prevents the deletion of any faculty member from this
entity set ever. In other words, though syntactically correct, this condition is semantically
almost always incorrect since restricting any entity set from deletion forever is highly
improbable, if not impossible, in a typical application domain. Figure 2.28a2 remedies this
semantic error through the specification of partial participation of FACULTY in the
Dis_chair relationship when the deletion rule imposed is “restrict.” On the other hand, in
Figure 2.28b, the cascade rule (C) implies that the deletion of a faculty member leads to the
deletion of all Ph.D. students for whom the faculty member serves as dissertation chair. The
set null rule (N) allows a Ph.D. student to exist without a dissertation chair by simply nulli-
fying the relationship of the Ph.D. student with the faculty member, should the faculty
member be removed from the FACULTY entity set (see Figure 2.28c). Here, observe how
total participation of STUDENT in Figure 2.28c is incompatible with the set null rule and must
therefore be corrected to permit partial participation of STUDENT in the Dis_chair relationship.
Alternatively, when the total participation of the Ph.D. student in the relationship is a nec-
essary condition, the set null rule must be replaced by a compatible deletion rule. Finally,
the set default rule (D) portrayed in Figure 2.28d is somewhat similar to the set null rule.
Here, instead of nullifying the relationship, the Ph.D. student is linked to a predetermined
(default) dissertation chair, should the student’s current dissertation chair be deleted. Con-
ventionally, when a deletion constraint is not specified, the restrict rule is implied.
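The behavior of the four deletion rules can be tried out with foreign keys in SQLite, using Python's standard sqlite3 module. This is only a sketch of one relational realization, not part of the ER model itself: the table names, column names, and sample rows are hypothetical, and faculty member 99 is an assumed predefined default chair. Each rule is exercised in a fresh in-memory database by deleting faculty member 1, who chairs two students.

```python
import sqlite3


def delete_chair(rule):
    """Delete faculty member 1 under the given ON DELETE rule and return
    the (stu_id, dis_chair) rows that remain in phd_student."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(f"""
        CREATE TABLE faculty (fac_id INTEGER PRIMARY KEY);
        CREATE TABLE phd_student (
            stu_id    INTEGER PRIMARY KEY,
            dis_chair INTEGER DEFAULT 99      -- nullable, with a default parent
                      REFERENCES faculty(fac_id) ON DELETE {rule}
        );
        INSERT INTO faculty VALUES (1), (2), (99);
        INSERT INTO phd_student VALUES (10, 1), (11, 1), (12, 2);
    """)
    conn.execute("DELETE FROM faculty WHERE fac_id = 1")
    return conn.execute(
        "SELECT stu_id, dis_chair FROM phd_student ORDER BY stu_id"
    ).fetchall()


results = {}
for rule in ("RESTRICT", "CASCADE", "SET NULL", "SET DEFAULT"):
    try:
        results[rule] = delete_chair(rule)
    except sqlite3.IntegrityError:
        # the parent still has children, so the deletion is refused
        results[rule] = "deletion disallowed"
```

The dis_chair column is deliberately nullable here, which corresponds to partial participation of the student in the relationship; declaring it NOT NULL would make the set null rule fail, matching the incompatibility described above.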
A second example for the application of deletion rules for a 1:n relationship type is
shown in Figure 2.29. Observe the conflict between participation constraint and deletion
rule in Figures 2.29a and 2.29d.
In an m:n binary relationship type, both entity types hold the role of a parent since
both a 1:n relationship and a 1:m relationship underlie an m:n relationship where the
relationship serves as a pseudo-entity type taking on the role of the child in both the
underlying relationships. For this reason, sometimes the relationship type in an m:n rela-
tionship is referred to as a “relationship entity type” or more often an “associative entity
type.” Figure 2.30 portrays an extension of the m:n relationship exhibited in Figure 2.20,
highlighting the expression of the relationship type as an associative entity type (see
Figure 2.30c). The sample dataset (Figure 2.30b) can facilitate understanding of the struc-
ture of the associative entity type as one where the entities are uniquely identified by the
concatenation of the unique identifiers of the two parent entity types in the relationship.
An important inference ensues from this: The relationship type in an m:n relationship
(associative entity type) will have total participation (existence dependency) in the
relationships with both the parent entity types.
[Figure 2.28: deletion rules R, C, N, and D applied to a 1:n relationship type, shown in panels that include (a1), (a2), (c1), and (c2); some combinations of deletion rule and participation constraint are marked “Conflicting”]
(c) The “set null” Rule
When a parent entity in a relationship is deleted, if all child entities related to this parent in this relationship
should be retained but no longer referenced to this parent, the “set null” (N) rule applies.
(d) The “set default” Rule
When a parent entity in a relationship is deleted, if all child entities related to this parent in this relationship
should be retained, no longer referenced to this parent, but should be referenced to a predefined default
parent, the “set default” (D) rule applies.
FIGURE 2.29 Application of deletion rules in a 1:n relationship type: Another example
Note: Imports is viewed as an “artificial” entity type, sometimes referred to as an “Associative Entity
Type,” and serves as the “child” of both VENDOR and PRODUCT.
FIGURE 2.30 An m:n relationship type depicted as an associative entity type
This portrayal further clarifies that an m:n relationship entails two deletion rules since
there are two parents for one child in this relationship. Figure 2.31 explicates the role of
deletion constraints in an m:n relationship type. Observe that there are two deletion constraints
in the m:n relationship type. In Figure 2.31a, the “R” next to VENDOR indicates that
when an attempt is made to delete an entity from the entity set of the VENDOR entity type,
if that entity participates in the Imports relationship type (meaning that it is related to one
or more Product entities), then the deletion of the vendor entity is disallowed. Since the participation of
VENDOR in the Imports relationship is optional, the restrict rule on VENDOR is compatible
with the [partial] participation constraint on VENDOR. In contrast, the “R” next to
PRODUCT restricting the deletion of a Product entity if it participates in the Imports relationship,
though correct in syntax, is incompatible with the [total] participation constraint
imposed on PRODUCT in this relationship. Please note how the deletion constraints C, N,
and D prevailing on the child entity type appear next to the relationship type Imports in
Figures 2.31(b), (c), and (d). The set null constraint (N), however, cannot ever be imposed
on a relationship type (associative entity type) of an m:n relationship since nullifying the
relationship here entails disabling the unique identification of the associative entity type.
The same logic applies to the set default (D) constraint. It is critical to understand that the
cascade constraint pertaining to the deletion of an entity type—say, VENDOR in an m:n
relationship—does not imply cascading deletion of related entities from the PRODUCT
entity set. The cascading deletion (C) applies to the related instances in the relationship—
that is, entities of the associative entity type. The same logic applies to the set default (D)
constraint.
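The behavior described above can be tried out with a small sketch. It is an illustration only, not part of the ER grammar: it assumes one possible relational rendering of the VENDOR, Imports, and PRODUCT example using Python's built-in sqlite3 module, with a restrict rule on VENDOR and a cascade rule for PRODUCT. The table layout, the helper names (build_demo, try_delete, count), and the sample rows are all hypothetical.

```python
import sqlite3

def build_demo():
    """Create an in-memory database for a hypothetical VENDOR-Imports-PRODUCT schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE vendor  (vendor_id  INTEGER PRIMARY KEY, vendor_name  TEXT);
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
        -- Imports plays the role of the associative entity type (child of both parents).
        -- Both foreign keys are part of its identifier, so they can never be set to null.
        CREATE TABLE imports (
            vendor_id  INTEGER NOT NULL REFERENCES vendor(vendor_id)   ON DELETE RESTRICT,
            product_id INTEGER NOT NULL REFERENCES product(product_id) ON DELETE CASCADE,
            cost       REAL,
            PRIMARY KEY (vendor_id, product_id)
        );
        INSERT INTO vendor  VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO product VALUES (10, 'Widget'), (20, 'Gadget');
        INSERT INTO imports VALUES (1, 10, 4.5), (1, 20, 7.0);
    """)
    return conn

def try_delete(conn, table, key_column, key):
    """Attempt a deletion; return False if a deletion rule blocks it."""
    try:
        conn.execute(f"DELETE FROM {table} WHERE {key_column} = ?", (key,))
        return True
    except sqlite3.IntegrityError:
        return False

def count(conn, table):
    """Number of rows currently in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

In this sketch, deleting vendor 1 is refused because it participates in Imports, while deleting vendor 2 succeeds. Deleting product 10 removes only the matching Imports row; no vendor is deleted as a side effect, which mirrors the point that cascade acts on the associative entity type and not on the other parent.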
[Figure 2.31, panels (a) through (d): deletion constraints on the m:n relationship type Imports between VENDOR (m) and PRODUCT (n). Diagrams not reproduced.]
(a) The “restrict” Rule: R appears next to VENDOR; the R next to PRODUCT is marked Incompatible.
When a parent entity in a relationship is deleted, if the deletion of the parent should be
prohibited even if one child entity related to this parent is present, then the restrict (R)
rule is used.
(b) The “cascade” Rule: C appears next to Imports on both the VENDOR and PRODUCT sides.
When a parent entity in a relationship is deleted, if all child entities related to this
parent should be deleted, then the cascade (C) rule applies.
(c) The “set null” Rule: N is marked Incorrect on both sides of Imports.
When a parent entity in a relationship is deleted, if all child entities related to this
parent in this relationship should be retained but no longer referenced to this parent,
the “set null” (N) rule applies.
(d) The “set default” Rule: D is marked Incorrect on both sides of Imports.
When a parent entity in a relationship is deleted, if all child entities related to this
parent in this relationship should be retained, no longer referenced to this parent, but
should be referenced to a predefined default parent, the “set default” (D) rule applies.
[Figure 2.32: deletion constraints (R, N, C, D) in 1:1 binary relationship types, with some combinations marked Incompatible or Conflicting. Diagrams not reproduced.]
Observe that in a 1:1 binary relationship type, since both participating entity types
can be the parent and child of each other, two deletion rules will have to be imposed.
Figure 2.32d represents the condition where the deletion of the parent entity type
EMPLOYEE is prohibited if an employee entity participates in the Assigned relationship
type. The partial participation of EMPLOYEE in the Assigned relationship type is
consistent with the restrict deletion constraint imposed on EMPLOYEE. The set
null deletion constraint also imposed on EMPLOYEE pertains to the role of EMPLOYEE
as the child in the Assigned relationship type. The set default deletion rule, applicable
only on the child entity type (like set null and cascade), is demonstrated in
Figure 2.32e.
Deletion rules, if not carefully specified, may also cause conflict in a hierarchy of
relationships. An example is presented in Figure 2.33. If an Investment Club closes
down, the cascade deletion constraint in the Member_of relationship type specifies that
all the Investors related to this specific Investment Club be removed (Figure 2.33a).
However, the restrict deletion constraint imposed on the INVESTOR entity type in the
Purchased_by relationship type requires that an Investor entity not be removed if it
participates in the Purchased_by relationship type—that is, if that Investor is related to
one or more Security entities. Thus, the INVESTOR entity type is subject to conflicting
deletion constraints through the two relationship types, Purchased_by and Member_of,
in which it participates. A way to resolve this conflict by changing just one of the dele-
tion constraints is demonstrated in Figure 2.33b—that is, by replacing the restrict dele-
tion constraint on the parent entity type INVESTOR in the Purchased_by relationship
type by the set null deletion constraint on the child entity type SECURITY. A second
illustration stems from an earlier example (Figure 2.26) reproduced in Figure 2.34.
As in the previous example, the cascade deletion constraint on APARTMENT in the
Contains relationship type and the restrict deletion constraint on it in the Rented_to
relationship type are mutually conflicting (Figure 2.34a). Here the conflict is resolved
in a different way, to demonstrate an alternative solution. The
cascade deletion constraint on APARTMENT in the Contains relationship type is
replaced by a restrict deletion constraint on the parent entity type of the Contains
relationship type—that is, BUILDING. However, the solution is not quite complete;
notwithstanding the resolution of the conflict just identified, the restrict deletion
constraint on APARTMENT in the Rented_to relationship type is incompatible with
the total participation of APARTMENT in this relationship type. Replacing the restrict
deletion constraint on APARTMENT in the Rented_to relationship type with a cascade
deletion constraint on the child entity type, TENANT, of Rented_to solves this problem,
as shown in Figure 2.34b.
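The kind of conflict discussed in these two examples can be detected mechanically. The following is a minimal sketch, assuming each relationship type is recorded as a (relationship, parent, child, rule) entry; the DeletionRule class and find_conflicts function are hypothetical names introduced for illustration. It flags an entity type that is the child under a cascade rule in one relationship type while being the parent under a restrict rule in another.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeletionRule:
    relationship: str  # name of the relationship type
    parent: str        # parent entity type
    child: str         # child entity type
    rule: str          # "R" (restrict), "C" (cascade), "N" (set null), "D" (set default)

def find_conflicts(rules):
    """Return (entity type, cascading relationship, restricting relationship) triples.

    A conflict is reported when cascade would remove entities of some entity type
    (as the child) while restrict protects those same entities (as the parent).
    """
    conflicts = []
    for cascade in rules:
        if cascade.rule != "C":
            continue
        for restrict in rules:
            if restrict.rule == "R" and restrict.parent == cascade.child:
                conflicts.append(
                    (cascade.child, cascade.relationship, restrict.relationship)
                )
    return conflicts

# The four situations described in the text for Figures 2.33 and 2.34.
FIG_2_33A = [
    DeletionRule("Member_of", "INVESTMENT_CLUB", "INVESTOR", "C"),
    DeletionRule("Purchased_by", "INVESTOR", "SECURITY", "R"),
]
FIG_2_33B = [
    DeletionRule("Member_of", "INVESTMENT_CLUB", "INVESTOR", "C"),
    DeletionRule("Purchased_by", "INVESTOR", "SECURITY", "N"),
]
FIG_2_34A = [
    DeletionRule("Contains", "BUILDING", "APARTMENT", "C"),
    DeletionRule("Rented_to", "APARTMENT", "TENANT", "R"),
]
FIG_2_34B = [
    DeletionRule("Contains", "BUILDING", "APARTMENT", "R"),
    DeletionRule("Rented_to", "APARTMENT", "TENANT", "C"),
]
```

Under these assumptions, find_conflicts reports INVESTOR for the first configuration and APARTMENT for the third, and nothing for the two resolved configurations. The check covers only the cascade-versus-restrict pattern described here; it does not test compatibility with participation constraints.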
[Figure 2.33, panels (a) and (b): INVESTMENT_CLUB (1) Member_of (n) INVESTOR, and INVESTOR (1) Purchased_by (n) SECURITY. In both panels C appears next to INVESTOR in Member_of; in panel (b) N appears next to SECURITY in Purchased_by. Attribute labels shown: Club_id, Office, Size, Member_id, name (First_nm, Last_nm), Cost, Symbol, Security_nm, Rating, Type. Diagrams not reproduced.]
[Figure 2.34, panels (a) and (b): BUILDING (1) Contains (n) APARTMENT, and APARTMENT (1) Rented_to (3) TENANT. Panel (a) shows C on APARTMENT in Contains and R on APARTMENT in Rented_to; panel (b) shows R on BUILDING in Contains and C on TENANT in Rented_to. Attribute labels shown: Bldg_no, Size, No_of_floors, Vacancy, Apt_no, Sqr_ft, Rent, No_of_bedrooms, No_of_bathrooms, Name, Reference, Phone#. Diagrams not reproduced.]
In summary, deletion rules are also business rules and, therefore, an inherent part of
the requirements specification. Deletion constraints express the deletion rules in terms of
modeling specifications. In a PCR, the deletion constraint restrict (R) always entails an
action on the parent entity type, whereas the other three constraints (N, C, D) always
result in actions on the child. Deletion rules cannot be arbitrarily applied in a relationship
type. Here are a few guidelines in this regard:
• Specification of the set null constraint on an entity type existent-dependent
on the relationship type is invalid.
• Specification of the set null constraint on a weak entity type in its identifying
relationship type(s) is invalid since a weak entity type is always existent-dependent
on its identifying relationship type(s).
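As a rough illustration of how such guidelines could be checked, the sketch below encodes two compatibility points made in this section: a restrict rule clashes with total participation of the parent, and a set null rule is invalid when the child is existent-dependent on the relationship type. The function name check_deletion_rule and its boolean parameters are hypothetical, and the sketch makes no claim to cover every guideline.

```python
def check_deletion_rule(rule, parent_total, child_total):
    """Return a list of problems for one deletion rule in one relationship type.

    rule         -- "R", "C", "N", or "D"
    parent_total -- True if the parent participates totally in the relationship type
    child_total  -- True if the child participates totally (is existent-dependent)
    """
    problems = []
    if rule == "R" and parent_total:
        # Every parent entity has at least one child, so no parent could be deleted.
        problems.append("restrict is incompatible with total participation of the parent")
    if rule == "N" and child_total:
        # A child that must be related to a parent cannot be left without one.
        problems.append("set null is invalid for an existent-dependent child")
    return problems
```

For example, a restrict rule on a parent with partial participation yields no problems, whereas a set null rule on a weak entity type in its identifying relationship type (where child_total is True) is flagged.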
Chapter Summary
The fundamental constructs and rules for conceptual data modeling can be understood using the
Wand and Weber (2002) framework for research in conceptual modeling. The entity-relationship
(ER) modeling grammar is a popular tool for conceptual data modeling.
The entity-relationship (ER) model was developed by Peter Chen in 1976. Using this model,
object types are conceptualized as entity types. Objects belonging to an object type are considered
to be entities of the corresponding entity type. Properties of an object type are represented as attri-
butes of an entity type. An entity is created when a value is supplied for each attribute. Some, but not
all, attribute values can be null. Associations exist among objects of different object types. In con-
ceptual modeling, these associations are referred to as relationships among entity types.
An attribute possesses a number of characteristics. These include a name, a data type, and
a class (atomic or composite). Furthermore, an attribute can be stored or derived, single or multi-
valued, and mandatory or optional. An atomic attribute or collection of atomic attributes (i.e., a
composite attribute) can serve as a unique identifier of an entity type. Every attribute plays only
one of three roles in an entity type. It is a key attribute, a non-key attribute, or a unique identifier.
Any attribute that is a constituent part of a unique identifier is a key attribute. An attribute that is
not a constituent part of a unique identifier is a non-key attribute.
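The distinction between key and non-key attributes can be made concrete with a short sketch. It assumes that the unique identifiers of an entity type are given as collections of attribute names; the function classify_attributes and the EMPLOYEE attributes used in the example are hypothetical.

```python
def classify_attributes(attributes, unique_identifiers):
    """Label each attribute "key" if it is part of any unique identifier, else "non-key"."""
    key_attributes = set().union(*unique_identifiers)
    return {
        name: ("key" if name in key_attributes else "non-key")
        for name in attributes
    }

# Hypothetical EMPLOYEE entity type with one atomic and one composite unique identifier.
EMPLOYEE_ATTRIBUTES = ["Emp_no", "First_nm", "Last_nm", "Salary"]
EMPLOYEE_IDENTIFIERS = [{"Emp_no"}, {"First_nm", "Last_nm"}]
```

Here Emp_no, First_nm, and Last_nm come out as key attributes because each is a constituent part of some unique identifier, while Salary is a non-key attribute. The sketch does not separately label an atomic attribute that is itself a unique identifier.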
Business rules supplied by the users expressed in terms of constraints allow data integrity
to be achieved. At the conceptual tier of data modeling, two types of data integrity constraints
must be specified: (a) the domain constraint imposed on an attribute to ensure that its observed
value is not outside the defined domain, and (b) the key (or uniqueness) constraint, which
requires entities of an entity type to be uniquely identifiable.
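The two constraints can be sketched as simple checks over sample data. This is only an illustration under stated assumptions: entities are represented as dictionaries, a domain is any collection supporting membership tests, and the names violates_domain and duplicate_identifiers are hypothetical.

```python
def violates_domain(value, domain):
    """Domain constraint: True if the observed value lies outside the defined domain."""
    return value not in domain

def duplicate_identifiers(entities, identifier):
    """Key constraint: return identifier values that occur more than once.

    entities   -- list of dicts mapping attribute names to values
    identifier -- tuple of attribute names forming the unique identifier
    """
    seen = set()
    duplicates = []
    for entity in entities:
        key = tuple(entity[name] for name in identifier)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

For instance, with a hypothetical domain of range(10, 21) for a plant number, the value 25 violates the domain constraint while 15 does not; and two entities sharing the same identifier value are reported by duplicate_identifiers.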
A relationship type is a meaningful association among entity types. The degree of a relation-
ship is defined as the number of entity types participating in a relationship type. A relationship type
is said to be binary or “of degree two” when two entity types are involved. Relationship types that
involve three entity types (of degree three) are referred to as “ternary relationships,” whereas
relationships that involve four or more entity types are referred to as “n-ary relationships.” A
relationship type in which an entity type is related to itself is termed a “recursive relationship
type.” A relationship type is not fully specified until two structural constraints are explicitly
imposed. These are the cardinality constraint and
the participation constraint. Role names are used to indicate the participation of entity types in
relationship types. In addition, a relationship type can have attributes.
An entity type where the entities have independent existences (that is, where each entity is
unique) is referred to as a base or strong entity type. An entity type that does not have indepen-
dent existence (where some entities in the entity set may be identical) is known as a weak entity
type. An attribute, atomic or composite, in a weak entity type, which in conjunction with a unique
identifier of the parent entity type in the identifying relationship type uniquely identifies weak enti-
ties, is called the partial key or the discriminator of the weak entity type.
A cluster entity type is a virtual entity type emerging from a grouping operation on a collec-
tion of entity types and the relationship(s) among them. It does not have a “real” existence,
unlike a base or weak entity type. Nonetheless, as an ER modeling grammar construct, it
enriches conceptual modeling. A brief introduction of the cluster entity type is presented in this
chapter. More sophisticated use of this construct appears in Chapter 5.
In the context of every relationship type in an ER model, the deletion of an entity from the entity
set of the parent entity type in the relationship requires specific action either in the parent entity set
or in the child entity set in order to maintain consistency of the relationships in the database. It
must be noted that the semantics for the deletion constraints emerge from user-specified business
rules and should therefore be part of data modeling right from the start. Four distinct rules that apply
to deletion constraints are discussed as the last topic in this chapter.
Exercises
1. What is the difference between the conceptual world and the real world? Is it possible for a
conceptual model to represent reality in total? Why or why not?
2. Use examples to distinguish between the following:
a. an object type and an entity type
b. an object and an entity
c. a property and an attribute
d. an entity and an entity instance
e. an association and a relationship
f. an object class and an entity class
3. Describe various data types associated with attributes.
4. What is the difference between a stored attribute and a derived attribute?
5. What would be the domain of the attribute County_name in the state of Texas?
6. Distinguish among a simple attribute, a single-valued attribute, a composite attribute, a
multi-valued attribute, and a complex attribute. Develop an example similar to Figure 2.3
that illustrates the differences among these attributes.
7. What is a unique identifier of an entity type? Is it possible for there to be more than one
unique identifier for an entity type?
8. What is the difference between a key attribute and a non-key attribute?
9. Consider the EMPLOYEE entity type shown here.
a. List all key and non-key attributes.
b. What is (are) the unique identifier(s)?
c. Which attribute(s) is (are) derived attributes?
d. Using the following figure as a guide, develop sample data for four employees that
illustrate the nature of the various mandatory and optional attributes in the EMPLOYEE
entity type. Be sure to illustrate the various ways the Name attribute might appear.
11. Give an example of three entity types and accompanying attributes that might be associ-
ated with a database for a car rental agency.
12. What is a relationship type? How does a relationship type differ from a relationship
instance?
13. What is meant by the “degree” of a relationship?
14. What is the value of using role names to describe the participation of an entity type in a
relationship type?
15. What is the difference between a binary relationship that exhibits a 1:1 cardinality constraint
and a binary relationship that exhibits a 1:n cardinality constraint?
16. Describe how Married_to can be modeled as a recursive relationship.
17. Create an example of a recursive relationship with an m:n cardinality constraint.
18. Distinguish between a participation constraint and minimum cardinality.
19. Why can total participation of an entity type in a relationship type also be referred to as
existence dependency of that entity type in that relationship type?
20. How do cardinality constraints and participation constraints relate to the notions of total and
partial participation?
21. Discuss the difference between existence dependency and identification dependency.
22. Give an example of a relationship type between two entity types where an attribute can be
assigned to the relationship type instead of to one of the two entity types.
23. What is the difference between a base entity type and a weak entity type? When is a weak
entity type used in data modeling?
24. Define the term “partial key.”
25. A small university is comprised of several colleges. Each college has a name, location, and
size. A college offers many courses over four college terms or quarters—Fall, Winter,
Spring, and Summer—during which one or more of these courses are offered. Course#,
name, and credit hours describe a course. No two courses in any college have the same
course#; likewise, no two courses have the same name. Terms are identified by year and
quarter, and they contain numbers. Courses are offered during every term. The college also
has several instructors. Instructors teach; that is why they are called instructors. Often, not
all instructors are scheduled to teach during all terms, but every term has some instructors
teaching. Also, the same course is never taught by more than one instructor in a specific
term. Further, instructors are capable of teaching a variety of courses offered by the col-
lege. Instructors have a unique employee ID; their name, qualifications, and experience are
also recorded.
For this narrative, perform the following tasks:
a. List the business rules explicitly stated and implicitly indicated in the narrative.
b. Study the narrative carefully and identify the missing information required for develop-
ing a semantically complete conceptual data model.
26. The instance diagram shown here illustrates the relationship between Sullivan Insurance
Agency’s agents and clients. Using this instance diagram, write the narrative that describes
the relationship between agents and clients. Your narrative should include a description of
both the cardinality ratio and participation constraints implied in the instance diagram. In
addition, draw the ER diagram that fully describes the relationship between the company’s
agents and clients.
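[Instance diagram: CLIENT entities c1 to c5 and AGENT entities a1 to a4, connected through Policy relationship instances r1 to r6. Connecting lines not reproduced.]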
27. Revise the ER diagram you drew in the previous exercise to include the following manda-
tory attributes: CLIENT—ID number, name, address (city, state, zip), phone number(s),
birthdate; AGENT—agent number, name, phone number, area; and commission received
by an agent for selling a Policy to a client.
28. Using the instance diagram depicting the ternary relationship Orders shown on the next
page, answer the following questions:
a. Identify an error seeded in the diagram and correct the error.
b. Which customers order pens from the Galveston warehouse?
c. Which items are ordered by customers from both warehouses?
d. Which warehouse fills one or more orders of items from both customers?
e. Describe orders filled from both warehouses.
f. What changes must be made to the instance diagram for order r1 to involve both pen-
cils and pens?
29. The following two ER diagrams contain both a cardinality ratio constraint and a participation
constraint.
a. In the first ER diagram, is the instance diagram on the right consistent with the ER
diagram on the left? Why or why not?
b. In the second ER diagram, is the instance diagram on the right consistent with the ER
diagram on the left? Why or why not?
[Two ER diagrams, each paired with an instance diagram on its right. Diagrams not reproduced.]
30. Suppose you want to show that a person can have multiple degrees. Would each of the
following two ER diagrams get the job done? Why or why not? What is the difference?
[Two ER diagrams of the PERSON entity type with attributes SSN, Name (First, Last), Phone, Birthdate, and Education. The first diagram also shows BBA-MIS, MBA-Acctg, and Ph.D.-Finance alongside Education. Diagrams not reproduced.]
31. Adams, Ives, and Scott Incorporated is an agency that specializes in representing clients in the
fields of sports and entertainment. Given the nature of the business, some employees are given a
company car to drive, each company car being assigned to a particular employee. Each
employee has a unique employee number, plus an address and set of certifications. Not all
employees have earned one or more certifications. Company cars are identified by their respec-
tive vehicle IDs and also have a license plate number, make, model, and year. Employees rep-
resent clients. Not all employees represent clients, whereas some employees represent many
clients. Each client is represented by one and only one employee. Sometimes, clients refer one
another to use Adams, Ives, and Scott to represent them. A given client can refer one or more
other clients. A client may or may not have been referred to Adams, Ives, and Scott by another
client, but a client may be referred by only one other client. Each client is assigned a unique client
number. Additional attributes recorded for each client are: name, address, and date of birth.
Draw an ER diagram that shows the entity types and relationship types for Adams, Ives,
and Scott. You must name each relationship type and define its structural constraints;
however, it is not necessary that you provide role names.
32. Draw the ER diagram for the two instance diagrams depicted here.
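[Instance diagram 1: AIRFORCE_BASE entities A1 to A5, NEW_ASSET entities S1 to S11, and NAVAL BASE entities N1 to N3, with relationship instances r1 to r11 under the relationship types Scheduled_for and Assigned_to. Instance diagram 2 includes PROPERTY entities (P1, P2, P10, and P11 are legible). Connecting lines not reproduced.]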
33. This vignette is a small excerpt from a comprehensive case about a clinic. Various physi-
cians and surgeons working for a clinic are on an annual salary [o]. These doctors are
identified by their respective employee numbers. The other descriptors of a doctor are:
name, gender, address, and phone. Each physician’s specialty and rank [o] are captured;
each surgeon’s specialty and skill are also captured; a surgeon may have one or more
skills.
Every physician serves as a primary care physician for at least seven patients; however, no
more than 20 patients are allotted to a physician. Every patient is assigned one physician
for primary care. Some patients need surgeries; others don’t. Surgeons perform surgeries
for the patients in the clinic. Some do a lot of surgeries; others do just a few. The date and
operation theater [o] for each surgery need to be recorded, too. Removal of a surgeon
from the clinic database is prohibited if that surgeon is scheduled to perform any surgery.
However, if a patient chooses to pull out of the surgery schedule, all surgeries scheduled
for that patient are cancelled.
Data for patients include: patient number (the unique identifier of a patient), name, gender,
date of birth, blood type, cholesterol (consisting of HDL, LDL, and triglyceride), blood sugar,
and the code and name of allergies, if any.
Physicians may prescribe medications to patients; thus, it is necessary to capture which
physician(s) prescribe(s) what medication(s) to which patient(s) along with dosage and fre-
quency. In addition, no two physicians can prescribe the same medication to the same
patient. If a physician leaves the clinic, all prescriptions written by that physician should
also be removed because this information is retained in the archives.
A patient may be taking several medications, and a particular medication may be taken by
several patients. Despite its list price, a medication’s cost varies from patient to patient,
perhaps because of the difference in insurance coverage. The cost of a medication for a
patient needs to be captured. A medication may interact with several other medications.
When a medication is removed from the system, its interaction with other medications, if
any, should be voided. When a patient leaves the clinic, all the medication records for that
patient are removed from the system.
Medications are identified by either their unique medication codes or by their unique names.
Other attributes of a medication are its classification, list price, and manufacturer [o]. For
every medication, either the medication code or the medication name must be present—not
necessarily both.
Note: [o] indicates optionality of value for the attribute. Develop an ER model for this
scenario.
CHAPTER 3
ENTITY-RELATIONSHIP
MODELING
projects undertaken by that plant cannot be canceled. The project assignments from a
closed plant must be temporarily removed in order to allow the project to be transferred
to another plant.
Employees work in these plants, and each employee works in only one plant. A plant
may employ many employees but must have at least 100 employees in order to exist. A
plant with employees cannot be closed down. Every plant is managed by an employee who
works in the same plant; but every employee is not a plant manager, nor can an employee
manage more than one plant. Company policy dictates that every plant must have a
manager. Therefore, an employee currently managing a plant cannot be deleted from the
database. If a plant is closed down, the employee no longer manages the plant but
becomes an employee of another plant.
Some employees are assigned to work on projects and in some cases might even be
assigned to work on several projects simultaneously. For a project to exist, it must have at
least one employee assigned to it. A project might need several employees, depending on its
size and scope. As long as an employee is assigned to a project, his or her record cannot be
removed from the database. However, once a project ends, the employee records are removed
from the database and all assignments of employees to that project must be removed.
Some employees also supervise other employees, but all employees need not be super-
vised; the employees that are supervised are supervised by just one employee. An employee
may be a supervisor of several employees but of no more than 20. The Human Resources
Department uses a designated default employee number to replace a supervisor who leaves
the company. It is not possible for an employee to be his or her own supervisor.
Some employees may have several dependents. Bearcat Incorporated does not allow
both husband and wife to be employees of the company. Also, a dependent can only be
a dependent of one employee at any time.
Bearcat Incorporated offers credit union facilities as a service to its employees and
to their dependents. An employee is not required to become a member of Bearcat Credit
Union (BCU). However, most employees and some of their dependents have accounts
with BCU. Some BCU accounts are individual accounts, and others are joint accounts
held by an employee and his or her dependent(s). Every BCU account must belong to at
least one employee or dependent. Each joint account must involve no more than one
employee and no more than one of his or her dependents. If an employee leaves the
company, all dependents and BCU accounts of that employee must be removed. In
addition, as long as a dependent has a BCU account, deletion of the dependent is not
permitted.
To nurture the hobbies of employees’ dependents, Bearcat Incorporated sponsors
recreational opportunities. Dependents need not have a hobby, but some dependents may
have several hobbies. Because some hobbies are not as popular as others, every hobby
need not have participants. If a dependent is no longer in the database, no records of that
dependent’s participation in hobbies should exist in the database. Finally, as long as at
least one dependent participates in a hobby, that hobby should continue to exist.
All plants of Bearcat Incorporated have a plant name, number, budget, and building.
A plant has three or more buildings. Each plant can be identified by either its name or
number. Bearcat Incorporated operates seven plants, and the plant numbers can be any in
the range 10 through 20. However, either the plant number or plant name must always be
recorded.
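As an illustration of how the plant rules combine, the sketch below validates a single plant record. The function name and the use of `None` for an unrecorded value are assumptions made for the example only.

```python
def plant_record_valid(number, name, buildings):
    """Validate one plant record against the rules stated in the narrative."""
    # Either the plant number or the plant name must always be recorded.
    if number is None and name is None:
        return False
    # When recorded, the plant number must lie in the range 10 through 20.
    if number is not None and not 10 <= number <= 20:
        return False
    # A plant has three or more buildings.
    return len(buildings) >= 3
```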
The name of an employee of Bearcat Incorporated consists of the first name, middle
initial, last name, and a nametag. Employee numbers are used to identify employees in
the company. However, names can also be used as identifiers. Both employee number and
employee name must be recorded. Where two or more employees have the same name, a
one-position numeric nametag is used so that up to 10 otherwise duplicate names can be
distinguished from one another. Sometimes, an employee’s middle initial may not be
available.
Although the address, gender (male or female), and hiring date of each employee
must be recorded, salary information is optional. Salaries at Bearcat Incorporated range
from $35,000 to $90,000. Also, the salary of an employee cannot exceed the salary of the
employee’s supervisor.
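The salary rules can be sketched as a small check. The function and constant names are hypothetical, and skipping the supervisor comparison when the supervisor's salary is unrecorded is an assumption, since the narrative does not say how that case is handled.

```python
MIN_SALARY = 35_000
MAX_SALARY = 90_000


def salary_valid(salary, supervisor_salary=None):
    """Return True when an (optional) salary satisfies the stated rules."""
    if salary is None:
        return True  # salary information is optional
    if not MIN_SALARY <= salary <= MAX_SALARY:
        return False
    # Assumption: the comparison is skipped when the supervisor's salary
    # is unknown or the employee has no supervisor.
    if supervisor_salary is not None and salary > supervisor_salary:
        return False
    return True
```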
The date on which an employee starts working as a manager of a plant should be
gathered. In addition, the number of employees working in each plant should be gathered;
this can be computed. Information about the dependents related to each employee, such
as the dependent’s name, relationship to the employee, birth date, and gender, should also
be captured. The dependent’s name as well as how the dependent is related to the
employee is mandatory. A mother or daughter must be a female, a father or son must be a
male, and a spouse can be either male or female. Since a dependent cannot exist inde-
pendently of an employee, the dependent’s name and relationship to the employee, in
conjunction with either the employee name or the employee number, is used to identify
the dependents of an employee. The number of dependents of each employee must also
be captured, but this (like the number of employees working in each plant) can be
computed.
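The rule tying a dependent's relationship to gender lends itself to a lookup table. The sketch below assumes the codes "M" and "F" and lowercase relationship labels, neither of which is specified in the narrative; it also assumes that a missing gender and relationships not named in the rule are left unconstrained.

```python
# Genders permitted for each relationship named in the narrative.
ALLOWED_GENDERS = {
    "mother": {"F"},
    "daughter": {"F"},
    "father": {"M"},
    "son": {"M"},
    "spouse": {"M", "F"},
}


def relationship_gender_consistent(related_how, gender):
    """Return True when gender is compatible with the stated relationship."""
    if gender is None:
        return True  # assumption: gender may be unrecorded
    allowed = ALLOWED_GENDERS.get(related_how)
    if allowed is None:
        return True  # assumption: other relationships are unconstrained
    return gender in allowed
```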
Projects have unique names and numbers, and their locations must be specified.
Every project must have a project number but sometimes may not have a project name;
project numbers range from 1 to 40. Bearcat Incorporated’s projects are located in the
cities of Stafford, Bellaire, Sugarland, Blue Ash, and Mason. The amount of time an
employee has been assigned to a particular project should be recorded for accounting
purposes.
BCU accounts are identified by a unique account ID, composed of an account num-
ber and an account type (C: checking account; S: savings account; I: investment
account). For each account, the account balance is recorded. Only the account ID is
required for every account. Account numbers contain a maximum of six alphanumeric
characters.
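A composite identifier such as the account ID can be modeled as a small immutable value type. This is only a sketch: the class name `AccountId`, its field names, and the requirement of at least one character in the account number are assumptions beyond what the narrative states.

```python
import re
from dataclasses import dataclass

ACCOUNT_TYPES = {"C": "checking", "S": "savings", "I": "investment"}
_ACCOUNT_NUMBER = re.compile(r"[A-Za-z0-9]{1,6}")  # at most six alphanumerics


@dataclass(frozen=True)
class AccountId:
    """Composite identifier: account number plus account type."""
    number: str
    acct_type: str

    def __post_init__(self):
        if not _ACCOUNT_NUMBER.fullmatch(self.number):
            raise ValueError("account number must be 1 to 6 alphanumeric characters")
        if self.acct_type not in ACCOUNT_TYPES:
            raise ValueError("account type must be C, S, or I")
```

Because both components take part in equality, `AccountId("A1", "S")` and `AccountId("A1", "C")` identify different accounts.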
Hobbies are identified by a unique hobby name and include a code that indicates
whether the hobby is an indoor or outdoor activity and another code that indicates if the
hobby is a group or individual activity. The time spent by a dependent per week on each
hobby and the associated annual cost are also captured.
notational variations occur in the expression of the properties of a relationship type. In all
these notations, an entity type is expressed by a rectangular box, whereas a weak entity
type appears as a double rectangular box. In order to contrast it from a weak entity type,
the entity type is often referred to as a base (or strong) entity type. A relationship type is
shown as a diamond, whereas an identifying relationship type (a relationship type that
connects a weak entity type to its identifying parent entity type) is shown as a double
diamond. The CASE tools tend to avoid the use of the diamond for a relationship type,
instead labeling the edges connecting the entity types to capture the semantics of the
relationship. There are just a couple of different ways to express attributes graphically.
This text uses the convention in which the optional/mandatory property can be explicitly
expressed. Thus, an attribute is shown by a circle with the name of the attribute written
adjacent to the circle. A dark circle represents an attribute with a mandatory value (also
known as a mandatory attribute), whereas an empty circle indicates an attribute with an
optional value (also known as an optional attribute). Component attributes constituting a
composite attribute are attached to the circle that represents the composite attribute. A
multi-valued attribute is shown by a double circle, whereas a derived attribute is shown by
a dotted circle. An attribute that serves as a unique identifier of a base entity type is
underlined with a solid line, whereas the partial key of a weak entity type (also known as a
discriminator) is underlined with a dotted line. Note that an entity type can and often does
have multiple unique identifiers and that each identifier can be an atomic or composite
attribute.
Figure 3.2 summarizes the notational scheme used for the Presentation Layer ERDs in
this book. Although it is based on Peter Chen’s original notation (1976), it also incorpo-
rates a few desirable features of other commonly used notational schemes. A relationship
is shown through the use of edges (lines) that connect the relationship type to the partici-
pating entity types. Both the cardinality ratio and the participation constraint are
expressed via a “look across” approach. The cardinality ratio is placed on the connectors
adjacent to the relationship type. The oval adjacent to E2 in Figure 3.2 indicates that E1 is
optionally related to E2—that is, there can be entities (e11, e12, …, e1x, …, e1n) of E1 not
related to any entity of E2. As stated in Section 2.3.4, this is known as partial participation
of E1 in R. The bar ( | ) adjacent to E1 signifies that E2 is mandatorily related to E1.¹ As
discussed in Section 2.3.4, this is known as total participation of E2 in R. This also implies
that E2 has existence dependency on R—that is, in order for an entity e21 of E2 to exist, it
must participate in a relationship r1 with an entity e1x of E1. The Crow’s Foot notation to
specify the relationship properties is also popular and essentially replaces the m and n in
the Chen scheme with fork-like symbols. The Crow’s Foot notation, originally introduced
by Everest (1986) for the Knowledgeware software, follows the same “look across” strategy
to specify both cardinality ratios and participation constraints, as is done in the Chen
scheme. The meaning of the exclusive, inclusive arcs and the either/both (hash) notation
is discussed later in this chapter.
1
Mandatory participation in Chen’s original notational scheme is implicitly indicated by the absence
of the oval (symbol for partial participation). We, however, employ an explicit indicator: the
bar ( | ).
[Figure 3.2: Notational scheme for the Presentation Layer ERDs. Recoverable legend entries: entity type; relationship type; composite (molecular) attribute; derived attribute; E1 optionally related to E2 (partial participation of E1 in R); E2 mandatorily related to E1 (total participation of E2 in R; existence dependency of E2 on R).]
application, and because the ER modeling grammar in this case expresses the conceptual
schema, the resulting script is referred to as the ER model/schema.
the ERD. This is followed by gathering properties that appear to belong to individual entity
types. These properties, also typically nouns, are labeled as attributes of that entity type.
As was the case in the synthesis approach, throughout this process the identification of
relationships among various entity types must also be recognized.
Under both the synthesis and the analysis approach, caution should be exercised to
ensure that elements outside the scope of the narrative are not brought into the modeling
process based on an individual’s whims.
The excerpts that follow, taken from the Bearcat Incorporated narrative presented in
Section 3.1, illustrate the application of the analysis approach to identifying possible entity
types and their attributes. Capitalized nouns constitute the entity types, whereas nouns in
bold monofont are the attributes of the respective entity types.
• Bearcat Incorporated is a manufacturing company with several
PLANTs.… All plants of Bearcat Incorporated have a Plant name, Number,
Budget, and Building.…
• These plants are responsible for leading different PROJECTs.… Projects
have … Names and Numbers, and their Location must be specified.…
• EMPLOYEEs work in these plants.… The Name of an employee … consists of
the First name, Middle initial, Last name, and a Nametag. While the Address,
Gender, … and Date hired … must be recorded, Salary information is
optional.… The Start date of an employee as a manager of a plant should also
be gathered. There is also the requirement that the Number of employees
working in each plant be available.… Company policy dictates that every
plant must have a MANAGER.…
• Some employees may have several DEPENDENTs.… Information about the
dependents related to each employee, such as the dependent’s Name,
Relationship to the employee, Birth date, and Gender, should also be
captured.… There is also the requirement that the Number of dependents of
each employee be captured.…
• Bearcat Incorporated offers CREDIT UNION facilities as a service to its
employees and to their dependents.… BCU accounts are identified by a
unique Account ID composed of an Account number and an Account type
(C: checking account; S: savings account; I: investment account). For each
account, the Account balance is recorded.
• To nurture the HOBBY(ies) of employees’ dependents, Bearcat Incorporated
sponsors recreational opportunities. Hobbies are identified by a unique
Hobby name and include one code that indicates whether the hobby is an
indoor or outdoor activity and another code that indicates if the hobby is a
group or individual activity. The Time spent by a dependent per week on
each hobby and the associated Annual cost are also captured.
italicized in a particular box will be described later in this chapter. Note that in order to
enhance the clarity of the presentation, attributes are not shown in these ERDs.
BOX 1
In Box 1, observe that the cardinality ratio of the relationship type (Undertaken_by)
is shown as 1:n by looking across, from PLANT to PROJECT. This is because a
plant can be associated with several projects but a project is always under the control of
a single plant. Also, since the data requirements explicitly specify that “Some plants do
not undertake any projects at all,” one can infer that a plant may or may not under-
take/control a project (i.e., a plant optionally controls a project). This is indicated in the
diagram in Box 1 by looking across, from the PLANT entity type to the PROJECT entity
type, and placing the oval optionality marker just above the PROJECT entity type.
Accordingly, the participation constraint of PLANT in this relationship is said to have
the value “partial.” Likewise, since every project must be controlled by a plant, a look
across, from the PROJECT entity type to the PLANT entity type, signifies the mandatory
participation of the PROJECT in the relationship through the bar ( | ) just below the
PLANT.
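The structural constraints described for Undertaken_by can be tested against a set of relationship instances. The sketch below assumes the instances are given as (plant, project) pairs; the function name and this representation are illustrative only.

```python
from collections import Counter


def undertaken_by_valid(projects, pairs):
    """Check Undertaken_by instances against the Box 1 constraints.

    projects -- all project identifiers in the database
    pairs    -- iterable of (plant, project) relationship instances

    Every project must be related to exactly one plant (total participation
    of PROJECT, with at most one plant per project). Plants are not checked:
    a plant may undertake zero, one, or many projects.
    """
    plants_per_project = Counter(project for _plant, project in set(pairs))
    return all(plants_per_project[p] == 1 for p in projects)
```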
BOX 2
In Box 2, the Works_in relationship type between the PLANT entity type and the
EMPLOYEE entity type indicates that every employee must work in a plant and only in
one plant (look across, from EMPLOYEE to PLANT) and that a plant contains many
employees. However, note that the requirement that a plant must have at least 100
employees in order to exist is not reflected in the ERD. Instead, the ERD in Box 2 only
indicates that a plant must have at least one employee and may have more than one
employee. A second relationship type, Managed_by, also exists between the PLANT and
EMPLOYEE entity types in order to show that (a) each plant must be managed by one
and no more than one employee, and (b) an employee may manage one plant, but not all
employees are managers. Observe that, in Section 3.2.2, MANAGER was identified as
a possible base entity type. The basis for the design decision portrayed in Box 2 is dis-
cussed at the end of this section. The relationship type (Assigned) between the
EMPLOYEE and PROJECT entity types exhibits an m:n cardinality ratio. The oval next
to PROJECT indicates that every employee need not be assigned to a project (looking
across, from EMPLOYEE), and the bar next to EMPLOYEE indicates that every project
has at least one employee assigned to it (looking across, from PROJECT).
BOX 3
BOX 4
The DEPENDENT entity type in Box 4 is shown as a weak entity type, and the
Dependent_of relationship type is shown as an identifying relationship type. How do we
know that DEPENDENT is a weak entity type? The fact is, we do not know from what is
stated in Box 4. The statement “a dependent can only be a dependent of one employee at
any time” only indicates total participation of DEPENDENT in the relationship. Any entity
type, base or weak, has existence dependency in a relationship type if its participation in
the relationship is total. The reason we know that DEPENDENT is a weak entity type par-
ticipating as a child in an identifying relationship type with EMPLOYEE is that, later in the
narrative, the following is stated: “Since a dependent cannot exist independently of an
employee, the dependent’s name and relationship to the employee, in conjunction with
either the employee name or the employee number, is used to identify the dependents of
an employee.” Observe that while not all employees have dependents, each dependent is
related to one and only one employee.
The BCU_ACCOUNT entity type participates in two relationships: Held_by_E and
Held_by_D. The Held_by_E relationship type reflects the fact that an employee may have a
BCU account but that not all BCU accounts are held by employees. In addition, no more
than one employee can be associated with a BCU account. The Held_by_D relationship
type reflects a similar relationship between the BCU_ACCOUNT entity type and the
DEPENDENT entity type. Here, while a dependent may have several BCU accounts, he or
she need not have a BCU account. Likewise, a BCU account need not be associated with a
dependent, but if it is, then there can be no more than one dependent per BCU account.
The business rules that every BCU account must belong to at least one employee or a
dependent and, if held jointly, must belong to an employee and his or her dependent—not
an employee or the dependent of any other employee—are not reflected in the ERD in
Box 4. Business rules that cannot be captured in the ERD must be included in the list of
semantic integrity constraints that accompanies the ERD as part of the ER model.
BOX 5
BOX 6
What if joint accounts between employees and dependents are not permitted? This
means that the same BCU account entity cannot be related to an employee entity as well
as a dependent entity. That is, the relationship types Held_by_E and Held_by_D are
mutually exclusive. An exclusive arc, as shown in Box 5, represents this constraint. Like-
wise, if, for any reason, the user requirement specifies that all BCU accounts must be
jointly held between an employee and a dependent, then an inclusive arc is used to
express such a business rule as a constraint in the ERD, as shown in Box 6.
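The two arcs, as this passage describes them, correspond to simple predicates over the pair of relationships for one account. In this sketch, `employee` and `dependent` stand for the entity related to the account through Held_by_E and Held_by_D respectively, with `None` meaning no participation; the function names are hypothetical.

```python
def exclusive_arc_ok(employee, dependent):
    """Exclusive arc: an account never participates in both relationships."""
    return not (employee is not None and dependent is not None)


def inclusive_arc_ok(employee, dependent):
    """Inclusive arc (as used here): an account always participates in both."""
    return employee is not None and dependent is not None
```

Note that the exclusive arc by itself does not require any holder at all; a separate constraint is needed to state that each account participates in at least one of the two relationships.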
[Box 7 diagram: the EMPLOYEE, DEPENDENT, and BCU_ACCOUNT entity types connected by the relationship types Dependent_of, Supervised_by (labeled 1 and 20), Held_by_E, and Held_by_D.]
Note: The exclusive arc indicates that the relationship types Held_by_E and Held_by_D are mutually
exclusive; the either/both hash indicates that every entity in the entity set of BCU_ACCOUNT must
participate in one or the other of Held_by_E and Held_by_D.
BOX 7
Observe that the business rule “Every BCU account must belong to at least one
employee or dependent” cannot be captured in the ERD using either the inclusive arc or
the exclusive arc. Use of an inclusive arc here would indicate that every BCU account
must always belong jointly to an employee and a dependent. An exclusive arc, on the
other hand, would mean that a BCU account must never belong jointly to an employee
and a dependent. The business rule, however, indicates that every BCU account must
BOX 8
The relationship type (Participates) between the DEPENDENT entity type and the
HOBBY entity type shown in Box 8 reflects an m:n cardinality ratio. Observe that a
dependent may optionally have many hobbies (up to n) and that up to m dependents may
participate in a hobby. Also, a hobby need not have any participants. Observe that while
DEPENDENT is a weak entity type, its relationship with HOBBY is not an identifying
relationship type.
A review of the data requirements for Bearcat Incorporated in Section 3.1 also
makes it possible to identify the attributes of each of the entity types. Figure 3.3² is a
collective representation of Boxes 1, 2, 3, 7, and 8 and includes the attributes.
Although the attribute names in the ERD are arbitrary, as much as possible, meaningful
abbreviated attribute names have been used to identify the attributes. The following
comments elaborate on selected attributes in the ERD.
2
All ER diagrams that appear in this book were created using Microsoft Visio.
[Figure 3.3: Presentation Layer ERD for Bearcat Incorporated. Entity types: EMPLOYEE, PLANT, PROJECT, DEPENDENT, HOBBY, and the BCU account entity type (labeled BANK ACCOUNT in the figure). Relationship types: Works_in, Managed_by, Undertaken_by, Assigned, Supervised_by, Dependent_of, Held_by_E, Held_by_D, Participates. Attribute labels: Emp#, Name (Fname, Minit, Lname, Name_tag), Address, Gender, Salary, Date_hired, No_of_dependents, Mgr_start_dt, Pnumber, Pl_name, Budget, Building, No_of_employees, Pr_name, Plocation, Hours, Dname, Related_how, Birthdate, Account_id (Account#, Acct_type), Balance, Hb_name, Io_activity, Gi_activity, Hrs_per_wk, Annual_cost.]
3
Note that Pnumber and Pl_name are shown as optional attributes in Figure 3.3. Both are also desig-
nated as unique identifiers (underlined). This may initially appear contradictory because we expect
unique identifiers to be mandatory attributes. However, at the conceptual level, the task is to simply
identify all possible unique identifiers of an entity type. The implication here is that in any entity of
this entity type (PLANT) it is enough if either Pnumber or Pl_name has a value. This requirement is
captured by the either/both (hash) notation. Later, at the logical level, only one of these unique iden-
tifiers (Pnumber or Pl_name) will be chosen to serve as the primary unique identifier. At that point,
the primary unique identifier is made mandatory.
4
It is conceivable that a DEPENDENT entity type can have a naturally occurring unique identifier
(e.g., Social Security number), in which case it should be modeled as a base entity type. Since the
requirements specification of Bearcat Incorporated does not include such an attribute, DEPENDENT
is modeled here as a weak entity type identification-dependent on the identifying parent
EMPLOYEE.
[Figure (evidently Figure 3.4): the ERD of Figure 3.3 annotated with deletion-rule markers placed next to the affected entity types: R (restrict), C (cascade), N (set null), and D (set default).]
• In an m:n binary relationship, because both participating entity types are parents,
the relationship type serves the role of the child (an artificial entity type, also
known as an “associative entity type,” as discussed in Chapter 2), with two
parents. Accordingly, once again, the relationship type is informed by two
deletion constraints (see the Assigned relationship type between the entity types
EMPLOYEE and PROJECT).
Each of the 12 deletion constraints listed here is followed by a description (in italics)
of how the relevant deletion rule is indicated in Figure 3.4.
1. A plant with employees cannot be closed down.
The R adjacent to the PLANT entity type indicates the restriction of the
deletion of a plant if the plant has one or more employees. However, the
deletion of a plant without employees is permitted.
2. If an employee leaves the company, all BCU accounts of the employee must
be removed.
The C immediately above the BCU_ACCOUNT entity type under the
relationship Held_by_E indicates that the deletion of an employee should
cascade through in order to delete all BCU accounts associated with this
employee.
3. If a plant is closed down, the projects undertaken by that plant cannot be
canceled. The project assignments from a closed plant must be temporar-
ily removed in order to allow the project to be transferred to another
plant.
The N (representing the “set null” rule) immediately above the PROJ-
ECT entity type indicates that a project’s relationship with a plant can be
temporarily removed, resulting in the project not being undertaken by (i.e.,
under the control of) any plant.
4. The Human Resources Department uses a designated default employee num-
ber to replace a supervisor who leaves the company.
The D (representing the “set default” rule) shown immediately below
the EMPLOYEE entity type indicates that if a supervisor is deleted, all
supervisees under that supervisor are reassigned to the designated default
supervisor.
5. An employee currently managing a plant cannot be deleted from the
database.
The R adjacent to the EMPLOYEE entity type indicates the restriction of
a deletion of an employee if the employee manages a plant since company
policy dictates that every plant must have a manager.
6. If a plant is closed down, the employee no longer manages the plant but
becomes an employee of another plant.
The N adjacent to the EMPLOYEE entity type indicates that an employ-
ee’s relationship with a plant as a manager can be removed, resulting in
the employee no longer playing the role of a manager.
7. If an employee leaves the company, all dependents and BCU accounts of that
employee must be removed.
The C immediately above the DEPENDENT entity type indicates that the
deletion of an employee should cascade through in order to delete all
dependents associated with this employee. Similarly, the C immediately
above the BCU_ACCOUNT entity type indicates that the deletion of an
employee should cascade through in order to delete all BCU accounts asso-
ciated with this employee.
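The four deletion rules listed above (R, C, N, and D) resemble the referential actions that a relational DBMS can attach to a foreign key. The following is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the sample rows, and the default supervisor number 99999 are assumptions made for illustration only; they are not taken from the Bearcat Incorporated specification.

```python
import sqlite3

# Hypothetical schema: each foreign key carries one of the four deletion rules.
SCHEMA = """
CREATE TABLE employee (
    emp_no     INTEGER PRIMARY KEY,
    -- D (set default): supervisees fall back to the designated default supervisor
    supervisor INTEGER DEFAULT 99999
               REFERENCES employee(emp_no) ON DELETE SET DEFAULT
);
CREATE TABLE plant (
    plant_no INTEGER PRIMARY KEY,
    -- R (restrict): an employee who manages a plant cannot be deleted
    manager  INTEGER NOT NULL REFERENCES employee(emp_no) ON DELETE RESTRICT
);
CREATE TABLE project (
    proj_no  INTEGER PRIMARY KEY,
    -- N (set null): closing a plant leaves its projects without a plant
    plant_no INTEGER REFERENCES plant(plant_no) ON DELETE SET NULL
);
CREATE TABLE dependent (
    -- C (cascade): dependents are deleted along with their employee
    emp_no INTEGER NOT NULL REFERENCES employee(emp_no) ON DELETE CASCADE,
    dname  TEXT NOT NULL,
    PRIMARY KEY (emp_no, dname)
);
"""


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [(99999, None), (1, 99999), (2, 1), (3, 99999)],
    )
    conn.execute("INSERT INTO plant VALUES (10, 3)")
    conn.execute("INSERT INTO project VALUES (100, 10)")
    conn.execute("INSERT INTO dependent VALUES (1, 'Ann')")

    # Employee 1 leaves: the dependent is removed (C), supervisee 2 is reassigned (D).
    conn.execute("DELETE FROM employee WHERE emp_no = 1")
    dependents_left = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
    new_supervisor = conn.execute(
        "SELECT supervisor FROM employee WHERE emp_no = 2"
    ).fetchone()[0]

    # Employee 3 manages plant 10, so the deletion is rejected (R).
    try:
        conn.execute("DELETE FROM employee WHERE emp_no = 3")
        manager_deleted = True
    except sqlite3.IntegrityError:
        manager_deleted = False

    # Plant 10 closes: project 100 survives with no plant (N).
    conn.execute("DELETE FROM plant WHERE plant_no = 10")
    project_plant = conn.execute(
        "SELECT plant_no FROM project WHERE proj_no = 100"
    ).fetchone()[0]
    return dependents_left, new_supervisor, manager_deleted, project_plant
```

The sketch only mirrors the intent of the rules at the implementation level; the ERD itself records them as annotations on the relationship types, independent of any particular DBMS.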
[Figure: portion of the Presentation Layer ERD in Chen's notation showing the entity types DEPENDENT, HOBBY, and BCU_ACCOUNT, the relationship types Dependent_of, Held_by_E, Held_by_D, and Participates, and the deletion rules (C, R) discussed above; the diagram is not recoverable from the extracted text.]
Please note that in all three cases, alternative solutions are available, too. The reader
is encouraged to explore the alternative solutions.
There are several other business rules stated in the requirements specification that
still cannot be captured in the Presentation Layer ERD. These rules are expressed as a list
of Semantic Integrity Constraints (SICs) and are shown in Table 3.1. The SIC list is a cru-
cial supplement to the ERD in order to preserve all the information conveyed in the data
requirements. These Semantic Integrity Constraints are grouped into two categories:
attribute-level business rules, and entity-level business rules. For example, the business
rule requiring gender to be either male or female cannot be captured in an ERD and
therefore must be recorded as a semantic integrity constraint; all domain constraints on
attributes usually fall under the category of attribute-level business rules. A constraint of
the form “an employee cannot be his or her own supervisor” entails verification involving
more than one attribute in the same entity. Such business rules are captured in the SIC
list in the entity-level business rules. Next, the business rule that the salary of an employee
cannot exceed the salary of the employee’s supervisor cannot be captured as a constraint
in the ERD per se, nor can it be expressed as an attribute-level or entity-level business
rule; therefore, it must be captured as a semantic integrity constraint that takes the form
1. A mother or daughter dependent must be a female, a father or son dependent must be a male, and a
spouse dependent can be either male or female.
2. An employee cannot be his or her own supervisor.
3. A dependent may have a joint account only with an employee of Bearcat Incorporated to whom he or
she is related.
4. Every plant is managed by an employee who works in the same plant.
TABLE 3.1 Semantic integrity constraints for the Presentation Layer ER model
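To make the three kinds of rules described above more concrete, the following sketch checks one rule of each kind against a small set of employee records: a domain constraint on a single attribute, a rule involving more than one attribute of the same entity, and a rule that compares two different entities. The Employee class, its field names, and the "M"/"F" coding of gender are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Employee:
    emp_no: int
    gender: str
    salary: float
    supervisor_no: Optional[int] = None  # None means "no supervisor recorded"


def violations(employees: List[Employee]) -> List[str]:
    """Return a description of every rule violation found."""
    by_no: Dict[int, Employee] = {e.emp_no: e for e in employees}
    found: List[str] = []
    for e in employees:
        # Attribute-level rule: a domain constraint on one attribute.
        if e.gender not in ("M", "F"):
            found.append(f"{e.emp_no}: gender outside domain")
        # Entity-level rule: compares two attributes of the same entity.
        if e.supervisor_no == e.emp_no:
            found.append(f"{e.emp_no}: is own supervisor")
        # Rule spanning two entities: needs the supervisor's record as well.
        boss = by_no.get(e.supervisor_no) if e.supervisor_no is not None else None
        if boss is not None and boss is not e and e.salary > boss.salary:
            found.append(f"{e.emp_no}: salary exceeds supervisor's")
    return found
```

The third check cannot be evaluated by looking at one employee in isolation, which is why the text treats the salary rule separately from the attribute-level and entity-level rules.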
3.2.3.1 The (Min, Max) Notation for the Structural Constraints of Relationships
As noted in Section 3.2.1, a handful of popular notational schemes for the ERD exists. In
this section, the (min, max) notation, originally prescribed by Abrial (1974) for specifying
the structural constraints of a relationship, is introduced. Here, min depicts the minimum
cardinality of an entity type’s participation in a relationship type—the participation
constraint—and max indicates the maximum cardinality of an entity type’s participation
Entity/Relationship Type Name    Attribute Name    Data Type    Size    Domain Constraint
TABLE 3.2 Semantic integrity constraints for the initial Design-Specific ER model
*(n1.n2) is used to indicate n1 places to the left of the decimal point and n2 places to the right of the
decimal point
Entity-Level Domain Constraints
1. A mother or daughter dependent must be a female, a father or son dependent must be a male, and a
spouse dependent can be either gender.
2. An employee cannot be his or her own supervisor.
3. A dependent may have a joint account only with an employee of Bearcat Incorporated to whom he or
she is related.
4. Every plant is managed by an employee who works in the same plant.
Remaining Miscellaneous Constraints
1. Each plant has at least three buildings.
2. The salary of an employee cannot exceed the salary of the employee’s supervisor.
in a relationship, thus reflecting the cardinality ratio. This notation is in general more
precise and particularly more expressive for specifying relationships of higher degrees
beyond two. In this notation, a pair of finite whole numbers (min, max) is used with each
participation of an entity type E in a relationship type R without any reference to other
entity types participating in the relationship R, where 0 ≤ min ≤ max and max > 0. The
meaning conveyed here is that each entity e in E participates in at least “min” and at most
“max” relationships (r1, r2, …) in R. In this notation, min = 0 implies partial or optional
participation of E in R, and min ≥ 1 implies total or mandatory participation of E in R.
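The definition above can be read as a simple counting rule: for every entity e in E, count the relationship instances in R in which e appears, and require the count to lie between min and max. The sketch below illustrates that reading. The function name, the representation of relationship instances as tuples, and the use of None for an unbounded maximum (the symbolic m or n) are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple


def participation_violations(
    entities: Iterable[Hashable],
    relationships: Sequence[Tuple[Hashable, ...]],
    position: int,
    min_card: int,
    max_card: Optional[int] = None,
) -> List[Hashable]:
    """Return the entities whose participation count falls outside (min, max).

    `position` is the slot that entity type E occupies in each relationship
    tuple; `max_card=None` stands for a symbolic, unbounded maximum.
    """
    if min_card < 0 or (max_card is not None and (max_card <= 0 or min_card > max_card)):
        raise ValueError("need 0 <= min <= max and max > 0")
    counts = Counter(r[position] for r in relationships)
    return [
        e
        for e in entities
        if counts[e] < min_card or (max_card is not None and counts[e] > max_card)
    ]


# Hypothetical Held_by_E instances as (employee, account) pairs.
held_by_e = [("e1", "a1"), ("e1", "a2"), ("e2", "a2")]
# Accounts participate with (0, 1): account a2 is held by two employees.
bad_accounts = participation_violations(["a1", "a2", "a3"], held_by_e, 1, 0, 1)
```

Note that each call looks only at the participation of one entity type, without reference to the other participants, which is the property the text emphasizes.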
The mapping of the structural constraints of a relationship from the presentation layer
to the Design-Specific layer—in other words, converting Chen’s notation to the (min, max)
notation—is demonstrated in Figure 3.6. Figure 3.6a indicates that an employee may have
several BCU accounts but need not have any. This is done by “looking across,” from
EMPLOYEE to BCU_ACCOUNT, through the relationship Held_by_E (Chen’s notation). On
the other hand, a BCU account belongs to no more than one employee, and some BCU
accounts need not belong to any employee. This is inferred by “looking across,” from
BCU_ACCOUNT to EMPLOYEE, in the relationship Held_by_E. The same metadata is
reflected in Figure 3.6b using the Crow’s Foot notation. In the (min, max) notation shown
in Figure 3.6c, the cardinality ratio (maximum cardinality) and participation (minimum
cardinality) constraint in the Held_by_E relationship type are captured in terms of the
participation of EMPLOYEE in the Held_by_E relationship type independent of the par-
ticipation of BCU_ACCOUNT in the Held_by_E relationship type. An employee partici-
pates in at least zero (0) (optional participation) and at most m Held_by_E relationships,
meaning an employee need not have a BCU account, but may have many BCU accounts.
Likewise, a BCU account participates in at least zero (0) (optional participation) and at
most one (1) Held_by_E relationship, meaning a BCU account need not belong to any
employee, but can belong to a maximum of one employee. Note that the (min, max) nota-
tion employs a “look near” approach as opposed to the “look across” approach used in the
Presentation Layer ERD.
(a) “Look across” (Chen’s) notation for a binary relationship:
    EMPLOYEE 1 ---- Held_by_E ---- m BCU_ACCOUNT
(b) “Look across” (crow’s foot variant) notation for a binary relationship (diagram not reproduced)
    Note: The bar adjacent to EMPLOYEE and the crow’s foot adjacent to BCU_ACCOUNT
    convey the cardinality ratio of 1:m.
(c) “Look near” (min, max) notation for a binary relationship:
    EMPLOYEE (0,m) ---- Held_by_E ---- (0,1) BCU_ACCOUNT
FIGURE 3.6 Introduction of (min, max) notation for a 1:m binary relationship
An example of mapping an m:n relationship from the “look across” (Chen) notation to
the “look near” notation ([min, max]) is shown in Figure 3.7. Here, two points are
noteworthy:
• The fact that an employee must have at least three certifications can be
explicitly captured in the (min, max) notation; the “look across” notation
cannot capture this specification.
• Because m and n represent different maximum cardinality values, it is
imperative that these two values not be arbitrarily placed in the mapping
process; an employee cannot be issued more than m certifications, and a
specific certification cannot be issued to more than n employees, as indicated
in Figure 3.7b.
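The mapping from the “look across” notation to the “look near” notation can be sketched as a swap: the markings drawn next to one entity type in Chen's notation describe the participation of the other entity type, so they move to the opposite end in the (min, max) notation. The class and function names below are hypothetical, and the sketch assumes, as the text describes, that both the cardinality and the participation markings are read by looking across. A specific minimum such as 3 cannot come from the look-across diagram, so it is supplied separately.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LookAcrossEnd:
    entity: str       # entity type drawn at this end of the relationship
    max_symbol: str   # cardinality drawn next to this entity: "1", "m", or "n"
    mandatory: bool   # participation marking drawn next to this entity


def to_look_near(
    a: LookAcrossEnd,
    b: LookAcrossEnd,
    exact_min: Optional[Dict[str, int]] = None,
) -> Dict[str, Tuple[int, str]]:
    """Map a binary relationship from look-across to (min, max) notation."""
    exact_min = exact_min or {}

    def pair(owner: LookAcrossEnd, marks: LookAcrossEnd) -> Tuple[int, str]:
        # The owner's (min, max) comes from the markings at the OTHER end.
        minimum = exact_min.get(owner.entity, 1 if marks.mandatory else 0)
        return (minimum, marks.max_symbol)

    return {a.entity: pair(a, b), b.entity: pair(b, a)}


# Held_by_E: 1 next to EMPLOYEE, m next to BCU_ACCOUNT, both optional.
held_by_e = to_look_near(
    LookAcrossEnd("EMPLOYEE", "1", False),
    LookAcrossEnd("BCU_ACCOUNT", "m", False),
)

# Issued: n next to EMPLOYEE, m next to CERTIFICATION; the minimum of three
# certifications per employee is extra information supplied by the designer.
issued = to_look_near(
    LookAcrossEnd("EMPLOYEE", "n", False),
    LookAcrossEnd("CERTIFICATION", "m", True),
    exact_min={"EMPLOYEE": 3},
)
```

Swapping the ends mechanically is what keeps m and n from being placed arbitrarily during the mapping.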
(a) “Look across” (Chen’s) notation for a binary relationship:
    EMPLOYEE n ---- Issued ---- m CERTIFICATION
(b) “Look near” ([min, max]) notation for a binary relationship:
    EMPLOYEE (3,m) ---- Issued ---- (0,n) CERTIFICATION
Note: The business rule, “an employee must have at least three certifications,” is captured in the
(min, max) notation, while the “look across” notation does not capture this specification.
The mapping of the structural constraints specified in the Presentation Layer ERD in
Figure 3.5 to the (min, max) notation appears in the initial version of the Design-Specific
ERD in Figure 3.8. In contrast to what is shown in Figure 3.5, observe how the (min, max)
notation allows the requirement that a plant must have at least 100 employees to be
explicitly specified. Also, note that a weak entity type participating as a child in an iden-
tifying relationship type, since it also has existence dependency on the identifying parent,
will always have a (min, max) value of (1,1) in that relationship.
[Figure 3.8: initial version of the Design-Specific ERD, with the structural constraints of relationships expressed in the (min, max) notation; the diagram is not recoverable from the extracted text.]
The other business rules not mapped to the Design-Specific ERD are included with the
semantic integrity constraints for the Design-Specific ER model in Table 3.2. Thus, the
Design-Specific ER model, comprising the ERD in Figure 3.8 and the semantic integrity
constraints in Table 3.2, fully preserves all constructs and constraints reflected in the
Presentation Layer ER model.
Now we have seen how the user-oriented Presentation Layer ER model is translated to
a database design orientation. To this end, two specific steps were taken in the process of
developing the Design-Specific ER model:
1. Collection of a few characteristics for attributes (e.g., data type and size)
2. Introduction of the technically more precise (min, max) notation for the
specification of relationships in the ERD
The next section presents the second stage of this line of enquiry in order to render
the conceptual schema amenable to direct mapping to the logical level.
Figure 3.9 presents an example of mapping the attribute characteristics to the ERD.
[Figure 3.9: portion of the Design-Specific ERD (EMPLOYEE, PLANT, and PROJECT) with attribute characteristics annotated in the form [data type, size]; the diagram is not recoverable from the extracted text.]
(a) Transformation of a multi-valued attribute to a single-valued attribute:
    Two design variations (diagrams of PLANT with the attributes Pnumber, Plname, Budget, and Building not reproduced)
(b) Mapping a multi-valued attribute to a weak entity type:
    A structural alternative to Figure (a)
    PLANT (1,n) ---- Houses ---- (1,1) BUILDING
Note: The impact of the solutions in Figures a and b above during the normalization process is
discussed in Chapter 8.
FIGURE 3.10 Two structural alternatives for the resolution of a multi-valued attribute
The first solution (Figure 3.10a) is apparently simpler and appears more efficient
than the alternative solution (Figure 3.10b); however, this solution will pose a data
redundancy problem that will then have to be resolved in the logical data model during
normalization (see footnote 5). Interestingly, such resolution will always result in a schema
equivalent to the alternative solution proposed in Figure 3.10b. Therefore, the second solution is
used in the final ERD in Figure 3.13. Observe that creation of a new entity type (BUILDING)
and relating it to PLANT requires specification of a deletion constraint for the relationship.
While the rule of thumb suggests a default value of “Restrict” (R), R is not
compatible with the total participation of PLANT in this relationship (every plant has at
least three buildings). Therefore, the “Set default” (D) rule is assumed by supposing that
when a plant is closed the associated buildings will be reassigned to some other plant.
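The redundancy mentioned above can be illustrated with a small sketch. It assumes that the first solution amounts to storing one PLANT instance per (plant, building) combination, so that the plant's name and budget repeat for every building, whereas the second solution stores each plant once and moves the buildings to a separate BUILDING entity type identified by the owning plant's Pnumber together with the building value. The sample data and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One plant with a multi-valued Building attribute (sample values are made up).
plants: List[Dict] = [
    {"Pnumber": 10, "Pl_name": "Black", "Budget": 1_500_000, "Building": ["A", "B", "C"]},
]


def flatten(plants: List[Dict]) -> List[Tuple[int, str, int, str]]:
    """Solution (a), as assumed here: Building made single-valued in PLANT."""
    return [
        (p["Pnumber"], p["Pl_name"], p["Budget"], b)
        for p in plants
        for b in p["Building"]
    ]


def split(plants: List[Dict]) -> Tuple[List[Tuple[int, str, int]], List[Tuple[int, str]]]:
    """Solution (b): PLANT rows plus BUILDING rows keyed by (Pnumber, Building)."""
    plant_rows = [(p["Pnumber"], p["Pl_name"], p["Budget"]) for p in plants]
    building_rows = [(p["Pnumber"], b) for p in plants for b in p["Building"]]
    return plant_rows, building_rows


flat_rows = flatten(plants)
plant_rows, building_rows = split(plants)
# In flat_rows the name and budget appear once per building;
# in plant_rows they appear exactly once per plant.
```

Removing the repetition from the flattened form yields the same two groups of rows that the second solution starts with, which is the equivalence the text points out.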
5 Normalization is covered in Chapter 8.
6 This issue is not unique to binary relationships. Any n-ary relationship (ternary, quaternary, etc.)
poses the same problem and has a similar solution. This issue is discussed further in Chapter 5.
[Figure with three panels, each carrying the attributes Freq and Dosage; the diagrams are not recoverable from the extracted text.]
(a) An m:n relationship in [min, max] (look-near) notation
(b) PRESCRIBED shown as an associative entity type
(c) Decomposition of the Prescribed relationship type to a gerund entity type to eliminate the m:n
cardinality ratio
[Figure 3.13: final Design-Specific ERD, including the entity types EMPLOYEE, PLANT, PROJECT, BUILDING, DEPENDENT, BCU_ACCOUNT, HOBBY, ASSIGNMENT, and PARTICIPATION, with (min, max) structural constraints, deletion rules, and attribute characteristics; the diagram is not recoverable from the extracted text.]
Note that the domain constraint on the Emp# in Table 3.2 specifies a numeric and
alphabetic component for Emp#. This requirement is expressed in Figure 3.13 by making
Emp# a composite attribute composed of the two atomic attributes Emp_n and Emp_a.
The integrity constraints not mapped to the final Design-Specific ERD are included
with the semantic integrity constraints for the Design-Specific ER model in Table 3.4.
Thus, the final Design-Specific ER model, comprising the ERD in Figure 3.13 and the constraint
specifications in Table 3.4, fully preserves all constructs and constraints reflected in
the Presentation Layer ER model.
TABLE 3.4 Semantic integrity constraints for the final Design-Specific ER model
3.3.1 Vignette 1
This vignette is a slightly expanded version of the example about a university’s academic
program, which was presented in Section 2.3.2.
There are several colleges in the university. Each college has a name, location, and
size. A college offers many courses over four college terms or quarters—Fall, Winter,
Spring, and Summer—during which one or more of these courses are offered. Course#,
name, and credit hours describe a course. No two courses in any college have the
same course#; likewise, no two courses have the same name. Terms are identified by
year and quarter, and they contain enrollment numbers. Courses are offered during
every term. The college also has several instructors. Instructors teach; that is why they
are called instructors. Often, not all instructors are scheduled to teach during all
terms, but every term has some instructors teaching. Also, the same course is never
taught by more than one instructor in a specific term. Furthermore, instructors are
capable of teaching a variety of courses offered by the college. Instructors have unique
employee IDs, and their names, qualifications, and experience are also recorded.
To begin with, COLLEGE may be modeled as an entity type since a collection of
attributes—namely, Name, Location, Size, Course, and Instructor—seem to cluster under
this title. Figure 3.14a portrays the COLLEGE entity type using the ER modeling gram-
mar. The ERD at this point is syntactically correct. However, there are a couple of pro-
blems with reference to the semantics conveyed by this entity type. In Figure 3.14a,
COLLEGE is modeled as an entity type with attributes that indicate that every college
has one name, one location, one size, one course, and one instructor. Of course, the
attributes Course and Instructor are correctly shown as composite attributes, with their
appropriate content of atomic attributes. Nonetheless, this is not quite correct according
to the semantics conveyed in the requirements specification. A college indeed offers
many courses and also has several instructors. Therefore, these two attributes (Course
and Instructor) must be modeled as multi-valued attributes, as shown in Figure 3.14b.
Also, the attribute Name is duplicated in COLLEGE, one referring to the name of a
college and the other referring to the name of a course. Duplicate attribute names
within an entity type are semantically incorrect since the only reference point for the
attributes is the entity type. Thus, one of the attributes labeled Name is changed to
Course_name, as shown in Figure 3.14b. The ERD at this point is both semantically
and syntactically correct.
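[Figures 3.14a and 3.14b: the COLLEGE entity type with the attributes Name, Location, Size, Course (Course#, Name, Credit_hr), and Instructor (Emp_id, Qualification, Experience). In Figure 3.14b, Course and Instructor are multi-valued and the Name component of Course is renamed Course_name. Diagrams not reproduced.]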
Next, we notice that instructors are capable of teaching a variety of courses. This
indicates a relationship between instructors and courses. Given the current portrayal of
the COLLEGE entity type, the relationship between instructors and courses can be mod-
eled as shown in Figure 3.14c. While the data model shown in Figure 3.14c correctly con-
veys the intended semantics, the model violates a syntactic rule of the ER modeling
grammar. While all attributes of an entity type are implicitly related to one another, an
explicit relationship between attributes of an entity type independent of the entity type is
not permitted in the ER modeling grammar. This syntactic error can be corrected only by
modeling COURSE and INSTRUCTOR as independent entity types related to the COL-
LEGE entity type and then establishing the relationship between the INSTRUCTOR and
COURSE entity types. The corrected ERD is shown in Figure 3.14d. Since both COURSE
and INSTRUCTOR have unique identifiers, they both are modeled as base entity types.
Furthermore, since courses are offered every term and there are four terms, Term is also
modeled here as an optional multi-valued attribute of COURSE; the optional property of
the attribute allows for the possibility that some courses are not offered at all even though
they may still be on the books. However, when Term has a value, it must have the quarter
made up of Year and Qtr#.
[Figures 3.14c and 3.14d: (c) the 1:n relationship Can_teach drawn between the attributes Instructor and Course of COLLEGE; (d) the corrected ERD, with INSTRUCTOR and COURSE as base entity types related to COLLEGE, Can_teach connecting INSTRUCTOR and COURSE, and Term (Year, Qtr#, Enrollment) as a multi-valued attribute of COURSE. Diagrams not reproduced.]
[Diagram not reproduced: the m:n relationship Scheduled links INSTRUCTOR to Term, which is still an attribute of COURSE.]
FIGURE 3.14e A syntactically incorrect m:n relationship between INSTRUCTOR and Term
[Diagram not reproduced: TERM is an entity type with the attributes Year, Qtr#, and Enrollment, and Scheduled links INSTRUCTOR to TERM.]
FIGURE 3.14f A syntactically correct m:n relationship between INSTRUCTOR and TERM
The conceptual model at this point captures the fact that instructors are scheduled to
teach in the four terms and the instructors are capable of teaching several courses. How-
ever, the fact that courses are offered over the four terms and that in each term one or
more of the courses are offered is yet to be incorporated in the ERD. More importantly,
the business rule that the same course is never taught by more than one instructor in a
specific term is not incorporated in the conceptual data model either. An m:n relationship
between COURSE and TERM will capture the semantics of the first statement. This is
shown in Figure 3.14g. On another note, the unique identifier of TERM is defined as the
combination of Year and Qtr#. However, in Figure 3.14f, these two attributes are shown as
two independent unique identifiers of the entity type TERM. This is a syntactic error and
is corrected in Figure 3.14g by first constructing a composite attribute Quarter whose
atomic components are Year and Qtr# and then labeling (underlining) Quarter as the
unique identifier of TERM.
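
The difference between two independent unique identifiers and a single composite identifier can be sketched in code. The following is a minimal illustration using Python's standard-library sqlite3 module, not part of the ER modeling grammar itself; the table and column names (term, year, qtr_no, enrollment) are assumed renderings of TERM, Year, Qtr#, and Enrollment, and the sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite identifier: the pair (year, qtr_no) identifies a term,
# mirroring the composite attribute Quarter in Figure 3.14g.
conn.execute("""
    CREATE TABLE term (
        year       INTEGER NOT NULL,
        qtr_no     INTEGER NOT NULL,
        enrollment INTEGER,
        PRIMARY KEY (year, qtr_no)
    )
""")

conn.execute("INSERT INTO term VALUES (2015, 1, 300)")
conn.execute("INSERT INTO term VALUES (2015, 2, 280)")  # same year, new quarter
conn.execute("INSERT INTO term VALUES (2016, 1, 310)")  # same quarter, new year


def try_insert(year, qtr_no, enrollment):
    """Return True if the row is accepted, False if it violates the identifier."""
    try:
        conn.execute("INSERT INTO term VALUES (?, ?, ?)", (year, qtr_no, enrollment))
        return True
    except sqlite3.IntegrityError:
        return False


# Repeats an existing (year, qtr_no) pair, so it should be rejected.
duplicate_accepted = try_insert(2015, 1, 999)
term_count = conn.execute("SELECT COUNT(*) FROM term").fetchone()[0]
```

Under these assumptions, only the repeated (year, qtr_no) pair is rejected. Had year and qtr_no each been declared unique on its own, which corresponds to the error in Figure 3.14f, the second and third rows would have been rejected as well.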
FIGURE 3.14g No more than one instructor per course in a term: Syntactically incorrect modeling
In addition, an attempt is made to convey the business rule that the same course is
never taught by more than one instructor in a specific term by establishing a relation-
ship between INSTRUCTOR and Offered (see Figure 3.14g). This relationship does cor-
rectly convey the semantics that a course offered in a term is assigned to only one
instructor, that an instructor may teach several courses in the same term, and that the
instructor may teach the same as well as other courses in other terms. However, there is
a violation of the ER modeling grammar in the expression of relationship type Assigned
since it relates an entity type, INSTRUCTOR, with a relationship type, Offered—in other
words, there is, in effect, a diamond (Assigned) connecting to another diamond
(Offered). The solution for this syntactic error is not immediately obvious. To get a
better understanding of this scenario, let us review the ERDs that appear in Figures 2.27a
and 2.27b in Chapter 2. Following this line of logic, the m:n relationship type Offered,
along with the entity types TERM and COURSE participating in this relationship, can be
modeled as a cluster entity type, OFFERING. The relationship type Assigned can then relate
the base entity type INSTRUCTOR and the cluster entity type OFFERING, which expresses the
intended semantics without violating the rules of the ER modeling grammar. The revised ERD
is shown in Figure 3.14h. The final Presentation Layer ERD expressed using the “look near”
(min, max) notation is shown in Figure 3.14i.
FIGURE 3.14h Presentation Layer ERD for vignette 1: “Look across” notation
FIGURE 3.14i Presentation Layer ERD for vignette 1: “Look near” (min, max) notation
The next step in the modeling process is to transform the Presentation Layer ERD to
the design-specific layer. Since the vignette does not provide the attribute characteristics
(data type, size, etc.), the only transformation that can be done at this point is to decom-
pose multi-valued attributes, if any, to weak entity types and then decompose the m:n
binary relationships to the gerund entity type with two identifying parents. It can be
observed from Figure 3.14i that the Presentation Layer ERD does not contain any multi-valued
attributes and that there are two m:n relationship types (Offered and Scheduled).
Observe the decompositions portrayed in Figure 3.14j, the Design-Specific ERD for
vignette 1 transformed from the Presentation Layer ERD shown in Figure 3.14i.
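
To make the effect of this decomposition concrete, the following is a minimal sketch of one possible relational rendering of the two gerund entity types, written with Python's standard-library sqlite3 module. It anticipates the logical design step and is not the mapping prescribed by the text. All table and column names, the sample rows, and the choice to record the Assigned relationship as an emp_id column in offering are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE instructor (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (course_no TEXT PRIMARY KEY, course_name TEXT);
    CREATE TABLE term (
        year INTEGER NOT NULL, qtr_no INTEGER NOT NULL,
        PRIMARY KEY (year, qtr_no)
    );
    -- Gerund entity type for the m:n relationship Scheduled (INSTRUCTOR, TERM).
    CREATE TABLE schedule (
        emp_id INTEGER NOT NULL REFERENCES instructor (emp_id),
        year INTEGER NOT NULL, qtr_no INTEGER NOT NULL,
        PRIMARY KEY (emp_id, year, qtr_no),
        FOREIGN KEY (year, qtr_no) REFERENCES term (year, qtr_no)
    );
    -- Gerund entity type for the m:n relationship Offered (COURSE, TERM).
    -- The single emp_id column records Assigned: one instructor per offering.
    CREATE TABLE offering (
        course_no TEXT NOT NULL REFERENCES course (course_no),
        year INTEGER NOT NULL, qtr_no INTEGER NOT NULL,
        emp_id INTEGER REFERENCES instructor (emp_id),
        PRIMARY KEY (course_no, year, qtr_no),
        FOREIGN KEY (year, qtr_no) REFERENCES term (year, qtr_no)
    );
    INSERT INTO instructor VALUES (1, 'A'), (2, 'B');
    INSERT INTO course VALUES ('C101', 'Course one'), ('C102', 'Course two');
    INSERT INTO term VALUES (2015, 1), (2015, 2);
""")


def assign(course_no, year, qtr_no, emp_id):
    """Try to record an offering with its instructor; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO offering VALUES (?, ?, ?, ?)",
                     (course_no, year, qtr_no, emp_id))
        return True
    except sqlite3.IntegrityError:
        return False


first = assign("C101", 2015, 1, 1)                   # course C101, term 2015/1, instructor 1
same_term_other_course = assign("C102", 2015, 1, 1)  # same instructor, another course
second_instructor = assign("C101", 2015, 1, 2)       # second instructor, same offering
other_term = assign("C101", 2015, 2, 2)              # same course, different term
```

In this sketch the identifier of offering is the combination of the identifiers of its two identifying parents, so a second instructor for the same course in the same term is rejected, while the same instructor may take other courses in that term and the course may be taught by someone else in another term.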
FIGURE 3.14j Design-Specific Layer ERD for vignette 1: “Look near” (min, max) notation
As a final note, the scenario described in vignette 1 is not quite complete in its speci-
fication. The reader may find it a good exercise to first identify all business rules implicitly
expressed in the vignette and then list the ambiguities present in the description of the
scenario due to incomplete specification.
3.3.2 Vignette 2
Widget USA is a widgets manufacturer located in Whitefield, Indiana. A widget is an
intricate assembly of numerous parts. The assembly and manufacture of a few of the
intricate parts are done at Widget USA’s small but highly sophisticated plant in
Whitefield. All the other parts of the widget are outsourced to various vendors. Some
of the vendors supply more than one part, but a specific part is supplied by only one
vendor. A part has a unique part number and a unique name. Therefore, it is enough if
a value for one of these two attributes is present for a given part. Other parts attributes
are: size, weight, color, design, and quality standard. Manufactured parts have a cost
and raw material, and they follow a production plan, whereas purchased parts have
a price and delivery schedule. A production plan has a machine sequence, timetable,
and capacity, and it is identified by a unique plan number. Vendors are identified by
a vendor name. Other information available on a vendor includes its address, phone
number, and vendor rating.
The cardinality constraint between VENDOR and PART is 1:n, indicating that a vendor may
supply many parts and a part is supplied by only one vendor. Also, the participation
constraints (look across) indicate that every vendor must be a supplier of part(s); however,
not all parts are supplied by vendors since some are manufactured by Widget USA. A similar
relationship exists between PRODUCTION_PLAN and PART since some but not all parts are
manufactured by Widget USA in its own plant. The ERD is syntactically correct.
There are two commonly committed semantic errors seeded in the ERD in Figure 3.15a
in order to point out that careful scrutiny of the details present in the requirements specifi-
cation is crucial to accurate data modeling. First, note that a vendor has telephone numbers
(plural) indicating that Phone# in the VENDOR entity type should be a multi-valued attri-
bute. Also, every part has a unique part number and a unique name, implying that PART
has two unique identifiers. Concatenating Part# and Name to form a single attribute and
labeling that as a single unique identifier of PART is incorrect. These errors are corrected
in Figure 3.15b. The fact that an attribute called Name occurs in the entity type VENDOR
as well as the entity type PART is not an error.
FIGURE 3.15b Semantic errors in PART and VENDOR in Figure 3.15a corrected
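
The practical difference between two unique identifiers and one concatenated identifier can be sketched as follows. This is a minimal illustration using Python's standard-library sqlite3 module; the table names part_two_ids and part_concatenated, the column names, and the sample values are hypothetical and chosen only to contrast the two designs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two independent unique identifiers (the intent of Figure 3.15b).
    CREATE TABLE part_two_ids (
        part_no TEXT NOT NULL UNIQUE,
        name    TEXT NOT NULL UNIQUE
    );
    -- One concatenated identifier (the error seeded in Figure 3.15a).
    CREATE TABLE part_concatenated (
        part_no TEXT NOT NULL,
        name    TEXT NOT NULL,
        UNIQUE (part_no, name)
    );
""")


def accepted(table, part_no, name):
    """Return True if the row can be inserted into the given table."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (part_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


results = {}
for table in ("part_two_ids", "part_concatenated"):
    accepted(table, "P1", "Bolt")
    # Reuse the name "Bolt" under a different part number.
    results[table] = accepted(table, "P2", "Bolt")
```

With two separate identifiers the reused name is rejected; with the concatenated identifier it is accepted, because only the combination of part number and name has to be unique. The latter contradicts the requirement that a part has a unique name.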
The conceptual model in Figure 3.15b is still semantically incomplete. What is missing
here can be discovered only when the scenario described in the vignette is systematically
analyzed for all explicit and implicit business rules conveyed by the story. For instance, a
business rule that a manufactured part is not purchased and vice versa is implicitly
stated in the story. The ERD in Figure 3.15b does not capture this business rule.
Suppose we split PART into two different entity types: PURCHASED_PART and
MANUFACTURED_PART, as shown in Figure 3.15c. Does this solve the problem? This
design certainly offers an opportunity to separate manufactured parts from purchased
parts and relate them to production plans and vendors, respectively. However, the data
model does not explicitly prohibit a manufactured part from also being a purchased part
and vice versa. In other words, inclusion of the same part in both entity sets, even if
done inadvertently, will not be considered an error by this design.
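
A small sketch shows why the split alone does not enforce the rule. It uses Python's standard-library sqlite3 module; the table names, column names, and the sample part are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufactured_part (part_no TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE purchased_part    (part_no TEXT PRIMARY KEY, name TEXT);
""")

# Each table guards only its own identifier, so the same part fits in both.
conn.execute("INSERT INTO manufactured_part VALUES ('P1', 'Gear')")
conn.execute("INSERT INTO purchased_part VALUES ('P1', 'Gear')")

# Parts that appear in both entity sets.
overlap = conn.execute("""
    SELECT m.part_no
    FROM manufactured_part AS m
    JOIN purchased_part AS p ON p.part_no = m.part_no
""").fetchall()
```

Both inserts succeed, and the query reports part P1 as both manufactured and purchased. Nothing in the two unconnected structures flags this as an error.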
We also notice that a significant number of attributes are duplicated across
MANUFACTURED_PART and PURCHASED_PART. Does this mean data redundancy?
The answer is “No.” As long as the manufactured parts are included in the
MANUFACTURED_PART entity set and purchased parts are present in the PURCHASED_
PART entity set, mere duplication of attribute names does not result in data redundancy.
Also, observe that the participation constraint of the entity type MANUFACTURED_PART
in the As_per relationship type and the participation constraint of the entity type
PURCHASED_PART in the Supplied_by relationship type in this design are mandatory—
different from the design depicted in Figure 3.15b. All said and done, Figure 3.15c depicts
two independent ERDs unconnected to each other; appearing in the same figure does not make
them parts of a single ERD. This is poor modeling.
FIGURE 3.15c Entity type PART in Figure 3.15b split into two entity types
In short, the semantic incompleteness of both designs (Figures 3.15b and 3.15c) with
respect to the business rule that a manufactured part is not purchased and vice versa is
the same. The ER modeling construct “exclusive arc” is, in fact, capable of capturing this
business rule, as can be seen in Figure 3.15d. While the exclusive arc ensures that a part
that participates in the relationship type Supplied_by cannot participate in the relationship
type As_per, the optional participation of PART in both relationships may leave some parts
participating in neither relationship. Once again, the implicit business rule that between
purchasing and manufacturing all parts are covered is rather obvious in the vignette
narrative. The either/both (hash) constraint can take care of this condition, as shown in
Figure 3.15d. In Chapter 4, we will see a more precise and elegant way of specifying both
these constraints.
[Figure 3.15d: ERD with VENDOR, PART, and PRODUCTION_PLAN related through Supplied_by and As_per]
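
The combined effect of the exclusive arc and the either/both constraint can be sketched as two checks on a single PART structure. This is a minimal illustration using Python's standard-library sqlite3 module, not the notation of the ERD. Representing participation in Supplied_by and As_per by the nullable columns vendor_name and plan_no is an assumption, as are the constraint names and sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE part (
        part_no     TEXT PRIMARY KEY,
        vendor_name TEXT,   -- participation in Supplied_by (optional)
        plan_no     TEXT,   -- participation in As_per (optional)
        -- Exclusive arc: a part may not take part in both relationships.
        CONSTRAINT exclusive_arc
            CHECK (NOT (vendor_name IS NOT NULL AND plan_no IS NOT NULL)),
        -- Either/both: a part must take part in at least one of them.
        CONSTRAINT either_both
            CHECK (vendor_name IS NOT NULL OR plan_no IS NOT NULL)
    )
""")


def accepted(part_no, vendor_name, plan_no):
    """Return True if the part satisfies both constraints and is stored."""
    try:
        conn.execute("INSERT INTO part VALUES (?, ?, ?)",
                     (part_no, vendor_name, plan_no))
        return True
    except sqlite3.IntegrityError:
        return False


purchased_only = accepted("P1", "Acme", None)
manufactured_only = accepted("P2", None, "PL7")
both = accepted("P3", "Acme", "PL7")   # violates the exclusive arc
neither = accepted("P4", None, None)   # violates the either/both constraint
```

Taken together, the two checks admit exactly the parts that are either purchased or manufactured, which is the pair of business rules discussed above.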
Let us now explore the following business rule: It is enough if a value for one of the
two attributes, Part# or Name, is present for a given part. In Figures 3.15b and 3.15c, both
Part# and Name are designated mandatory (dark circle), implying that a value must be
present for both of these unique identifiers (underlined attributes) in all entities of the
PART entity set. Thus, the design does not reflect the business rule just stated. Designating
both unique identifiers as optional (empty circle) in conjunction with the either/both
(hash) notation will accomplish the stated objective, as shown in Figure 3.15d. Some data
modeling scholars may disagree with this design option because at some point in the
design cycle one of the unique identifiers must be designated as the primary means of
identifying entities of the entity set and that unique identifier must be a mandatory attri-
bute. However, at the conceptual level of data modeling, there is no reason why all the
richness of the scenario cannot be captured.
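
As a rough sketch of this rule, both identifiers can be left optional while a check insists that at least one is present. The example uses Python's standard-library sqlite3 module; the column names and sample values are assumptions. It relies on SQLite's behavior of allowing several null values in a column declared UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE part (
        part_no TEXT UNIQUE,   -- optional unique identifier
        name    TEXT UNIQUE,   -- optional unique identifier
        -- Either/both: at least one of the two identifiers must be present.
        CHECK (part_no IS NOT NULL OR name IS NOT NULL)
    )
""")


def accepted(part_no, name):
    """Return True if the part can be stored under the stated rule."""
    try:
        conn.execute("INSERT INTO part VALUES (?, ?)", (part_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


by_number = accepted("P1", None)       # identified by part number only
by_name = accepted(None, "Bolt")       # identified by name only
by_both = accepted("P2", "Nut")        # both identifiers present
unidentified = accepted(None, None)    # neither identifier present
repeated_name = accepted(None, "Bolt") # name already in use
```

The first three parts are accepted; the unidentified part and the repeated name are rejected. A later design stage would still have to pick one mandatory identifier, as the text notes.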
A further fine-tuning of the ERD is perhaps in order for the following reason. Observe
that manufactured parts and purchased parts have several common characteristics (e.g.,
the attributes Part#, Name, Size, Weight, Color, Design, and Qlty_std) as well as attributes
that are specific to each of them. Thus, accumulating the specific attributes of manufactured
parts and purchased parts together with the attributes common to both in a single entity
type PART entails some inefficiencies: the attributes specific to manufactured parts carry
null values for purchased parts, and vice versa. Additional constraints are also necessary
to manage the different relationships for manufactured parts and purchased parts. This can
be resolved by creating separate entity types for MANUFACTURED_PART,
with its specific attributes, and PURCHASED_PART, with its specific attributes, and
independently relating them to PART, which contains the common attributes. Each of the two
resulting relationship types, Manufactured and Purchased, is essentially a 1:1 relationship
with partial participation of PART and total participation of MANUFACTURED_PART and
PURCHASED_PART, respectively, as reflected in Figure 3.15e.
[Figure 3.15e: ERD in which PART is related to PURCHASED_PART and MANUFACTURED_PART through the 1:1 relationship types Purchased and Manufactured]
At this point, let us tweak the scenario of this vignette to incorporate the fact that
a widget is an intricate assembly of numerous parts. Some of the parts are actually sub-
assemblies, in that they have many subparts. However, a part can be a subpart of only
one part—that is, a part can be a subpart in only one subassembly.
Since we have not been advised to the contrary, the practical assumption to make is
that any subpart of a part can be either a manufactured part or a purchased part. Now that
the PART entity set includes all the manufactured and purchased parts, a part containing
subparts can be modeled by establishing a relationship between PART and SUB_PART, a
mirror image of the entity type PART. The ERD in Figure 3.15f reflecting this relationship
is syntactically correct. Is it also semantically correct? The answer is “No.” Since any part
can also be a subpart of another part, duplication of several part entities in the parts entity
set and the subparts entity set is inevitable. This is data redundancy and can create
semantic problems of data consistency, currency, and correctness in addition to storage
inefficiencies during database implementation.
[Figure 3.15f: ERD of Figure 3.15e extended with the entity type SUB_PART, related to PART through May_have (1:n)]
The correct solution for this scenario is to model Sub_part as a recursive relationship,
as shown in Figure 3.15g. The final Design-Specific ER model is presented in the (min,
max) notation in Figure 3.15h. The only transformation required here is the decomposition
of the multi-valued attribute Phone# in the entity type VENDOR to a weak entity
type identification-dependent on VENDOR. Observe that neither deletion rules nor the
attribute characteristics (data type, size, domain constraints, and any other business rules
that cannot be captured in the ERD) are provided in the narrative of the vignette.
Accordingly, the final ER model does not have a list of semantic integrity constraints to go
with the ERD, and the ERD itself is devoid of deletion rules and attribute characteristics.
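
A recursive relationship of this kind can be sketched as a structure that refers to itself. The following minimal example uses Python's standard-library sqlite3 module; the column parent_part_no, the helper all_subparts, and the sample parts are hypothetical names and values introduced for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE part (
        part_no        TEXT PRIMARY KEY,
        name           TEXT,
        -- Single-valued column: a part is a subpart of at most one part.
        parent_part_no TEXT REFERENCES part (part_no)
    );
    INSERT INTO part VALUES ('W',  'Widget',      NULL);
    INSERT INTO part VALUES ('S1', 'Subassembly', 'W');
    INSERT INTO part VALUES ('P1', 'Pin',         'S1');
    INSERT INTO part VALUES ('P2', 'Plate',       'S1');
""")


def all_subparts(part_no):
    """List every direct and indirect subpart of the given part."""
    rows = conn.execute("""
        WITH RECURSIVE sub(part_no) AS (
            SELECT part_no FROM part WHERE parent_part_no = ?
            UNION ALL
            SELECT p.part_no
            FROM part AS p JOIN sub ON p.parent_part_no = sub.part_no
        )
        SELECT part_no FROM sub ORDER BY part_no
    """, (part_no,)).fetchall()
    return [row[0] for row in rows]


widget_subparts = all_subparts("W")
```

Every part is stored exactly once, whether or not it is a subpart, so the duplication that arises with a separate SUB_PART entity set does not occur here. Because each part carries a single reference to its containing part, it can belong to only one subassembly.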
FIGURE 3.15g The final Presentation Layer ERD with PART-Sub_part as a recursive relationship
FIGURE 3.15h The final Design-Specific ERD for vignette 2 in (min, max) notation
The two vignettes presented in this section used a learning technique of step-by-step devel-
opment of an ERD with deliberately incomplete stages and seeded errors so that the progressive
development can also sensitize the reader to common errors (semantic and syntactic) that
occur during the development process of an ER model and the way to correct them.
Chapter Summary
The ER modeling grammar for the conceptual data modeling activity includes an ERD as well
as the specification of semantic integrity constraints (SICs) not captured in the ERD. The ER
modeling framework presented in this book contains two basic layers: a presentation layer
and a design-specific layer.
The case of Bearcat Incorporated, a manufacturing company with several plants located
in the northeastern part of the United States, is used to illustrate the ER modeling framework,
proceeding from the Presentation Layer ER model to the Design-Specific ER model.
The Presentation Layer ER model is the principal vehicle for communicating with the end-
user community. The Presentation Layer ERD contains the initial definition of attributes, entity
types, and relationship types using the “look across” specification for the structural constraints of
relationships. In addition, it allows for the deletion constraints to be explicitly shown in the ERD.
Business rules and data requirements not incorporated into the Presentation Layer ERD are
expressed as semantic integrity constraints (domain constraints on an attribute or collection of
attributes, deletion constraints, and miscellaneous constraints).
Transformation of the end-user–oriented Presentation Layer ER model to the database-
designer–oriented Design-Specific ER model incorporates additional details about the character-
istics of attributes obtained from the users. The Design-Specific ERD makes use of the relatively
more accurate (min, max) notation to represent the structural constraints. Next, the Design-
Specific ER model is further revised by:
The richness of the original requirements specification is preserved all the way through the
conceptual modeling process. Finally, two short vignettes are used to highlight semantic and
syntactic errors that usually occur during the development of an ERD and the way to correct
them.
Exercises
1. What is information preservation and why is it important?
2. Describe what constitutes an ER model.
3. What is the focus of a Presentation Layer ER model?
4. Crow’s Foot notation and IDEF1X notation are two other popular notations for the
ER modeling grammar. Investigate these two grammars and compare them with
Chen’s notation used in this chapter.
5. Examine the CASE tools ERWin and Oracle/Designer and discuss the ER modeling
grammar supported by each of them.
6. Give examples of types of business rules that are not reflected in a Presentation
Layer ER diagram.
7. Consider the following Presentation Layer ER diagram.
a. Identify the base and weak entity types. What is (are) the unique identifier(s) of each
entity type? Which unique identifiers are composite attributes?
b. Identify the partial key(s).
c. Identify the optional attributes. Which of these are multi-valued attributes?
d. Identify the derived attributes. What is it about these attributes that allows them to be
considered “derived” attributes?
e. Identify the recursive and binary relationship types.
f. Which relationship types exhibit (a) total participation of each entity type, (b) partial
participation of each entity type, and (c) total participation of one entity type and partial
participation of the second entity type?
g. Describe the nature of each relationship type with (a) a 1:1 cardinality ratio, (b) a 1:n
cardinality ratio, and (c) an m:n cardinality ratio.
h. What is the significance of the use of the double diamond in naming the Goes_for
relationship type versus the use of a single diamond to name the Performs relationship
type?
i. Describe an obvious business rule that would be associated with the
Date_of_Reservation attribute in the RESERVATION entity type.
j. What does the assignment of the attribute Cost to the Performs relationship
type mean?
8. What is the difference between an exclusive arc and an inclusive arc?
9. What is a deletion constraint?
10. What four deletion rules are applicable to deletion constraints? Which rule(s) refer(s) to an
action on the parent and which rule(s) refer(s) to an action on the child?
11. Is the use of the “set null” rule applicable to an identifying relationship type? If yes, explain.
If no, definitely explain.
12. What must be done to develop a Design-Specific ER model from a Presentation Layer ER
model?
13. When used in the context of data modeling, what is meant by the use of the term
“mapping”?
14. What constructs in a Presentation Layer ER diagram cannot be directly mapped to a logical
schema? What is required to represent these constructs in a Design-Specific ER diagram?
15. Assume that data are maintained on airports around the country for a company that offers a
flight chartering service for college basketball teams. The company gathers information
from a wide variety of sources but has had considerable difficulty obtaining data on runway
surfaces at airports located in small college towns. Company pilots need access to this
information when they land, either at airports such as these or when they must land at
small airports that are not located in college towns. As a result, as shown here, they have
created a RUNWAY_SURFACE entity type separate from a RUNWAY entity type.
a. What do the cardinality ratio and participation constraint suggest about some airport
runways in the Consists_of relationship?
b. How might the Presentation Layer ER diagram be revised to make the relationship
between an airport runway and runway surface more efficient?
16. Transform the Presentation Layer ER diagram for Exercise 7 to a Design-Specific ER
diagram.
17. Develop a Presentation Layer Entity-Relationship (ER/EER) model for building a database
for the Indian Hill Company described in the following narrative. Indian Hill Company is a
factory manufacturing miscellaneous spare parts for the farm equipment industry. The ER
diagram should be fully specified, with unique identifier(s) and other attributes for each
entity type, and relationship(s) among the entity types. The narrative is complete. However,
if you discover any ambiguities in the narrative, make up reasonable assumptions to com-
plete the story and state the assumptions made. Note that no assumption you make can
contradict the specifications contained in the narrative. Also, business rule(s) not
incorporated in the ERD, if any, must be explicitly stated as semantic integrity constraint(s).
A business rule captured in the ERD should not be restated as a Semantic Integrity
Constraint and vice versa. You should also list any ambiguities/conflicts in the stated
specifications.
Caution: Do not read any extra meanings into the story and make it more complicated than
the simple one given below.
The factory has several departments. A department may have many employees but must
have at least seven. Every employee works for one and only one department. Every
department has a manager—only one manager per department. Clearly, a manager is an
employee of the company, but not all employees are managers. For an employee to be the
manager of a department, that employee must belong to that particular department. If a
department is closed down, all employees of that department are laid off.
A department may have many machines, and every machine is assigned to a specific
department. A machine may go for maintenance numerous times. Maintenance is per-
formed on a machine only once on a given day. Some machines are so new that they may
not have gone for maintenance yet. Maintenance tasks are outsourced to contractors.
Every contractor performs at least one maintenance task, often more. If a contractor quits,
the association of that contractor with its maintenance tasks is temporarily suspended;
after all, the maintenance task itself cannot go away! A maintenance task is often done by
one contractor but may sometimes involve up to three contractors. When a machine is
retired from service, all the associated maintenance records are erased.
Products are produced on machines. A product can be an assembly of several different
components (products) or a single piece. Also, a product cannot be a component of more
than one product. A product cannot be a component of itself. Every product (component)
goes through one or more machines for appropriate production operations. Likewise, sev-
eral products may go through a particular machine for a production operation. If a product is
deleted (due to obsolescence), all operations on that product can be discarded. Operations
have precise specifications; so, every operation of a product on a machine is specified by a
designer.
Designers design the products and/or specify production operations. Some designers may
design more than one product; others may specify more than one operation, and some may
do both. Of course, all designers are employees of the factory. Operators, who are also
employees of the factory, operate the machines. Due to multiple shifts, several operators
will operate the same machine. All operators are routinely assigned to work on only one
machine, and no operator is kept idle. A machine is never kept idle except when it is out for
maintenance. The same employee cannot be a designer as well as an operator. The fac-
tory also has employees other than designers and operators.
The ER model should capture employee’s name, which will include first name, last name,
and middle initial [o]. It should also capture gender, address, and salary [o]. An employee
number uniquely identifies an employee. Likewise, department number [o], department
name [o], type, and location [o] must be captured. The department number and department
name are both unique identifiers of a department. (Note that it is enough if one of these two
is present for any particular department.) Every machine will have a unique machine num-
ber. It will also have other attributes, like name of machine, type [o], and vendor’s name [o].
When a machine goes for maintenance, the maintenance date for that machine must be
captured since a maintenance activity is identified by the date of maintenance for each
machine. The attributes of a maintenance activity are time taken and cost. A product is identified
by its component ID. Component name and description [o] must also be recorded. It
should be possible to compute the number of components in a product. When a component
goes through machining operation, the starting time [o] and completion time [o] for each
product on every machine must be captured, from which the hours of machining operation
for a particular product in a specific machine can be computed. The information about
designer includes his/her qualifications [o], specialization field, and experience [o] in years.
Operators, who are responsible for operating the machines, belong to a labor union [o] and
have certain skill sets (at least one skill) associated with them. Contractors are identified by
their names. Additional attributes captured for a contractor are experience and expertise.
Note: [o] indicates optional attribute; where specification of deletion rule is missing, the
default value of Restrict should be considered first; if found problematic, an alternative may
be suggested and used.
Hint: No more than nine entity types are needed to complete this design. It is possible to
model all but two business rules in the ER diagram, if the ER modeling grammar is fully
employed; otherwise, a few more business rules may have to be stated in the semantic
integrity constraints.
18. The NCAA (National Collegiate Athletic Association) wants to develop a database to keep
track of information about college basketball. Each university team belongs to only one
conference (the University of Houston belongs to Conference USA; the University of
Cincinnati belongs to the Big East Conference, etc.), but a team may not belong to any
conference. A conference has several teams; no conference has fewer than five (5) teams.
Each team can have a maximum of 20 players and a minimum of 13 players. Each player
can play for only one team. Each team has from three (3) to seven (7) coaches on its
coaching staff, and a coach works for only one team. Lots of games are played in each
university location every year, but a game between any two universities is played at a given
location only one time a year. Three referees from a larger pool of referees are assigned to
each game. A referee can work several games; however, some referees may not be
assigned to any game. Players are called players because they play in games—in fact,
several games. A game involves at least 10 players. It is possible that some players simply
sit on the bench and do not play in any game. Player performance statistics (i.e., points
scored, rebounds, assists, minutes played, and personal fouls committed) are recorded for
each player for every game. Information collected about a game includes the final score,
the attendance, and the date of the game. During the summer months, some of the players
serve as counselors in summer youth basketball camps. These camps are identified by
their unique campsite location (e.g., Mason, Bellaire, Kenwood, League City, etc.). Each
camp has at least three (3) players who serve as counselors, and a player serving as a
counselor may work in a number of camps.
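The structural constraints stated above are minimum and maximum cardinalities. As a rough illustration only (it is not a solution to the exercise), the following Python sketch checks three of them against in-memory sample data. The names `LIMITS`, `within`, and `check_cardinalities`, the dictionary layout, and the sample team names are assumptions made for this example.

```python
from collections import Counter

# Cardinality limits quoted from the exercise text: (minimum, maximum).
# None means that no upper bound is stated.
LIMITS = {
    "players_per_team": (13, 20),
    "coaches_per_team": (3, 7),
    "teams_per_conference": (5, None),
}


def within(count, limits):
    """Return True if count satisfies the (minimum, maximum) pair."""
    low, high = limits
    return count >= low and (high is None or count <= high)


def check_cardinalities(rule, assignments):
    """Return the parents whose number of children violates the rule.

    assignments maps each child (e.g., a player) to its single parent
    (e.g., a team). Parents with no children at all do not appear in
    the mapping and are therefore not checked here.
    """
    counts = Counter(assignments.values())
    return sorted(p for p, n in counts.items() if not within(n, LIMITS[rule]))


# Hypothetical sample data: 13 players on one team, 12 on another.
players = {f"P{i}": "Team A" for i in range(13)}
players.update({f"Q{i}": "Team B" for i in range(12)})

violations = check_cardinalities("players_per_team", players)
print(violations)  # ['Team B']
```

Rules of this kind are what the ERD expresses through structural constraints; a check written in code is only a way to make their meaning concrete.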
A player can be identified by student number (i.e., Social Security number) only. The other
attributes for a player include name, major, and grade point average. For a coach, relevant
attributes include name, title (e.g., head coach, assistant coach), salary, address, and tele-
phone number. Attributes for a referee include name, salary, years of experience, address,
telephone number, and certifications. Both coaches and referees are identified by their per-
sonal NCAA identification number. A team is identified by the name of the university (i.e.,
team). Other team attributes include current ranking, capacity of home court, and number of
players. Each conference has a unique name, number of teams, and an annual budget. For
the basketball camps, data is available on the campsite (i.e., location) and the number of
courts.
Develop a Presentation Layer ER model for the NCAA database. The ERD should be fully
specified with the unique identifiers, other attributes for each entity type, and the relation-
ship types that exist among the various entity types. All business rules that can be captured
in the ERD must be present in the ERD. Any business rule that cannot be captured in the
ERD should be specified as part of a list of semantic integrity constraints.
Hint: No more than seven entity types are needed to complete this design.
19. This exercise contains additional information in the form of deletion rules that will enable us
to develop a Design-Specific ER diagram for the NCAA database in Exercise 18.
When a referee retires, all links to the games handled by that referee should be removed.
Likewise, if a game is cancelled, all links to the referees for that game should be dropped.
Although it does not happen often, a university may sometimes leave the conference of
which it is a member. Naturally, we want to keep the team in the database since the uni-
versity could decide to join another conference at a later date. However, if a team (univer-
sity) leaves the NCAA altogether, all players and coaches of that team should be removed
from the database along with the team. In all other relationships that exist in the database,
the default value of “Restriction of Deletion” should be explicitly indicated.
a. Incorporate the above business rules in the Presentation Layer ER diagram for the
NCAA database developed in the previous exercise.
b. Transform the design in (a) to a Design-Specific ER diagram. Note that attribute char-
acteristics are not provided and thus need not appear in the diagram.
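To make the effect of a deletion rule concrete, the sketch below simulates three commonly used rules on a child-to-parent mapping. Only Restrict is named in this section; the labels "cascade" and "set_null", the function name `delete_parent`, and the sample data are assumptions made for illustration, and the sketch is not a solution to the exercise.

```python
def delete_parent(parent_key, children, rule):
    """Apply a deletion rule to the children that reference parent_key.

    children maps each child key to the key of its parent (or None).
    rule is one of "restrict", "cascade", or "set_null".
    Returns an updated copy of the mapping; raises ValueError when the
    rule is "restrict" and related children exist.
    """
    related = [c for c, p in children.items() if p == parent_key]
    if rule == "restrict":
        if related:
            raise ValueError(f"cannot delete {parent_key}: children exist")
        return dict(children)
    if rule == "cascade":
        # Children of the deleted parent are deleted along with it.
        return {c: p for c, p in children.items() if p != parent_key}
    if rule == "set_null":
        # Children are kept, but their link to the parent is cleared.
        return {c: (None if p == parent_key else p) for c, p in children.items()}
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical data: teams reference a conference, players reference a team.
team_conference = {"Cincinnati": "Big East", "Houston": "Conference USA"}
player_team = {"P1": "Cincinnati", "P2": "Cincinnati", "P3": "Houston"}

# Under a set-null rule, removing a conference keeps its teams.
after_set_null = delete_parent("Big East", team_conference, "set_null")
print(after_set_null)  # {'Cincinnati': None, 'Houston': 'Conference USA'}

# Under a cascade rule, removing a team removes its players as well.
after_cascade = delete_parent("Cincinnati", player_team, "cascade")
print(after_cascade)  # {'P3': 'Houston'}
```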
CHAPTER 4
ENHANCED ENTITY-RELATIONSHIP (EER) MODELING
layers for Bearcat Incorporated. Finally, Section 4.5 specifies deletion rules and
incorporation of the corresponding deletion constraints for intra-entity class relationships.
[Figure: ER diagram involving the entity types VENDOR, PART, and PURCHASED_PART and the relationship type Supplied_by. Other recoverable labels: Part#, Name, Phone#, Rating, Price, Size, Weight, Color, Design, Delv_sch, Qlty_std, Purchased, Manufactured.]
Because FURNITURE represents the generic class of entity that includes one or more
entity type occurrences (CHAIR, TABLE, SOFA), FURNITURE is labeled a Superclass (SC)
entity type. Because CHAIR (or TABLE or SOFA) represents an entity type that is a sub-
group of FURNITURE, it is labeled a subclass (sc) entity type. The relationship between a
superclass and any one of the subclasses is called an SC/sc relationship. There are three
SC/sc relationships present in the intra-entity class relationship shown in Figure 4.2. Note
that FURNITURE and CHAIR (or TABLE or SOFA) are separate entity types, although they
are not independent, like FURNITURE and STORE are. Figure 4.3 shows that entities
belonging to the CHAIR, TABLE, and SOFA subclasses also belong to the FURNITURE
superclass. Note that an entity that belongs to a subclass represents the same entity that is
connected to it from the FURNITURE superclass.
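The idea that a subclass entity is the same entity as its superclass counterpart has a loose analogy in class inheritance. The Python sketch below is offered only as that analogy; the attributes `item_id` and `name` are hypothetical and do not come from Figure 4.2.

```python
class Furniture:
    """Superclass (SC): attributes shared by every piece of furniture."""

    def __init__(self, item_id, name):
        self.item_id = item_id
        self.name = name


class Chair(Furniture):
    """Subclass (sc) of Furniture."""


class Table(Furniture):
    """Subclass (sc) of Furniture."""


class Sofa(Furniture):
    """Subclass (sc) of Furniture."""


chair = Chair(item_id=1, name="Desk chair")

# The same object is a member of both the subclass and the superclass.
print(isinstance(chair, Chair))      # True
print(isinstance(chair, Furniture))  # True
# Sibling subclasses remain distinct entity types.
print(isinstance(chair, Table))      # False
```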
1 Note that in Section 2.2, the term “entity” is defined as an instance (occurrence) of an entity type.
2 These attributes need not be unique to the subclass.
other hand, pulls together the common properties (attributes) shared by a set of entity
types (“sc”s) into a generic entity type (SC). In other words, generalization is the reverse
process of specialization in which the differences among a set of entity types are sup-
pressed and the common features are “generalized” into a single superclass of which the
original source entity types become subclasses. Put another way, specialization is a top-down
approach for describing an SC/sc relationship, whereas generalization is a bottom-up
approach for describing the same relationship. Specialization and generalization can be
thought of as two sides of the same coin; therefore, all discussions about specialization
apply equally well to generalization and vice versa.
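The bottom-up nature of generalization can be sketched as a set operation: the attributes common to every entity type move to the superclass, and whatever remains stays with each subclass. In the Python sketch below, the function name `generalize` and all attribute names are hypothetical, and matching attributes by name alone is a simplification of what commonality actually requires.

```python
def generalize(entity_types):
    """Split attribute sets into a shared superclass part and specific parts.

    entity_types maps an entity type name to its set of attribute names.
    Returns (common, specific), where common holds the attributes shared by
    every entity type and specific maps each name to what is left over.
    """
    common = set.intersection(*entity_types.values())
    specific = {name: attrs - common for name, attrs in entity_types.items()}
    return common, specific


# Hypothetical attribute names, chosen only for illustration.
subclasses = {
    "CHAIR": {"Item#", "Price", "Material", "Has_arms"},
    "TABLE": {"Item#", "Price", "Material", "Shape"},
    "SOFA": {"Item#", "Price", "Material", "Seats"},
}

common, specific = generalize(subclasses)
print(sorted(common))            # ['Item#', 'Material', 'Price']
print(sorted(specific["SOFA"]))  # ['Seats']
```

Read in the other direction, the same data describes specialization: start from the common set and add the specific attributes to obtain each subclass.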
[Figure: EER diagram relating VENDOR, PART, and MANUFACTURED_PART. Recoverable labels: Supplied_by, Source_type, Outsourced, Insourced, Is a, May have, Sub_part, Delv_sch, Qlty_std, Cost, Raw_material, Prod_plan.]
3 It should be noted that a defining predicate is not an attribute; it is a condition based on the value(s)
of one or more attributes in the superclass.
4.1.4.2 Vignette 1
This vignette is about Division I collegiate sports programs in which universities provide
academic support in the form of academic advising, tutoring, career counseling, and so
on to student-athletes who participate in the sports programs sponsored by the univer-
sity. Note that a university provides this support to numerous student-athletes, whereas
each student-athlete receives the support from only the university that he or she attends.
Every student-athlete participates in one or more sponsored sports. Football, basketball,
and baseball are among the many sponsored sports. Attributes common to all student-
athletes, regardless of sport, include student# (a unique identifier), name, major, grade
point average, eligibility, sport (a multi-valued attribute), weight, and height. For student-
athletes participating in the sports of football, basketball, and baseball, attributes spe-
cific to each sport are also collected. These include:
• For football players: touchdowns, position, speed, and uniform#
• For basketball players: uniform# (a unique identifier), position (a multi-
valued attribute), points scored per game, assists per game, and rebounds
per game
• For baseball players: uniform# (a unique identifier), position (a multi-
valued attribute), batting average, home runs, and errors
4 Sometimes, the term “subclass discriminator” is used instead.
5 Commonality implies that all characteristics of the individual attributes across the entity types are
the same.
For other sports, such sport-specific attributes are not available. Also, note that
uniform# is not a unique identifier of football players because it is possible for two foot-
ball players to have the same uniform number in squads with more than 100 players
(one may be an offensive player and the other a defensive player).
Since football players, basketball players, and baseball players are all student-athletes,
the relationships can indeed be modeled as a specialization. Figure 4.6 depicts this scenario
using the specialization construct of the enhanced ER (EER) modeling grammar.
The subclasses that participate in the specialization, FOOTBALL_PLAYER, BASKETBALL_
PLAYER, and BASEBALL_PLAYER, are represented as base entity types and are collec-
tively connected to the parent superclass, STUDENT_ATHLETE, using the specialization
notation.
As we know now, each subclass may have its own specific attributes that the
superclass does not. For example, in Figure 4.6, FOOTBALL_PLAYER has the attributes
Speed, Position, Touchdowns, and Uniform#; BASKETBALL_PLAYER has the attributes
The dotted line for the superclass connector in Figure 4.6 indicates that there are
student-athletes who are neither football players, nor basketball players, nor baseball
players; they participate in other sports. The reason these other sports are not depicted as
subclasses is that they do not have any specific attributes beyond what they inherit from
the superclass. Had this been a solid line, it would have conveyed that there are no
student-athletes playing anything other than football, basketball, or baseball. Notice in
Figure 4.6 that the value “o” for the disjointness constraint conveys that a football player
can also be a basketball player and/or baseball player. Had this value been a “d,” it would
have prohibited a student-athlete from participating in more than one of these three
sports.
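The completeness and disjointness constraints just described can be read as simple set conditions. The following Python sketch checks them for one specialization; the function name `check_specialization`, the parameter names, and the student identifiers are assumptions made for this example.

```python
def check_specialization(superclass, subclasses, total, disjoint):
    """Return a list of constraint violations for one specialization.

    superclass: set of entity identifiers in the superclass.
    subclasses: dict mapping each subclass name to its set of identifiers.
    total: True for total completeness (solid line), False for partial
        completeness (dotted line).
    disjoint: True for "d" (disjoint), False for "o" (overlapping).
    """
    problems = []
    members = set().union(*subclasses.values())
    # Every subclass entity must also exist in the superclass.
    for entity in sorted(members - superclass):
        problems.append(f"{entity} is not in the superclass")
    if total:
        for entity in sorted(superclass - members):
            problems.append(f"{entity} belongs to no subclass")
    if disjoint:
        for entity in sorted(members):
            count = sum(entity in ids for ids in subclasses.values())
            if count > 1:
                problems.append(f"{entity} belongs to {count} subclasses")
    return problems


# Hypothetical data: S1 plays two sports, S3 plays a sport with no subclass.
athletes = {"S1", "S2", "S3"}
by_sport = {
    "FOOTBALL_PLAYER": {"S1"},
    "BASKETBALL_PLAYER": {"S1", "S2"},
    "BASEBALL_PLAYER": set(),
}

# Partial and overlapping, as described for Figure 4.6: no violations.
print(check_specialization(athletes, by_sport, total=False, disjoint=False))
# []

# Total and disjoint: both S3 and S1 would violate the constraints.
print(check_specialization(athletes, by_sport, total=True, disjoint=True))
# ['S3 belongs to no subclass', 'S1 belongs to 2 subclasses']
```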
A closer inspection of Figure 4.6 may suggest that the attributes Uniform# and Position
should be included among the attributes of the STUDENT_ATHLETE entity type instead of
each of the individual subclasses. However, there are a variety of factors that prohibit
adding these attributes to STUDENT_ATHLETE. First, the intent is to show Uniform# as a
unique identifier in two of the subclasses. But Uniform# cannot be shown as a unique
identifier of STUDENT_ATHLETE because two student-athletes in baseball and basketball
respectively may have the same uniform number. Then the requirement that Uniform# is a
unique identifier in BASKETBALL_PLAYER and BASEBALL_PLAYER would not be pre-
served. In addition, designation of Uniform# as a unique identifier in STUDENT_ATHLETE
will propagate the same property to FOOTBALL_PLAYER, which would also be incorrect.
The multi-valued attribute Position poses a similar problem because a student-athlete par-
ticipating in multiple sports may play the same position in more than one sport—a center
in football could also be a center in basketball.
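The argument about Uniform# can be checked with a few rows of sample data. In the Python sketch below, the student numbers and uniform numbers are invented for illustration; the point is only that a value can be unique within each subclass and still repeat once the subclasses are pooled into the superclass.

```python
def is_unique(values):
    """Return True if no value occurs more than once."""
    values = list(values)
    return len(values) == len(set(values))


# Hypothetical data: student# -> uniform# within each subclass.
basketball = {"S1": 10, "S2": 23}
baseball = {"S3": 10, "S4": 7}

# Uniform# identifies a player uniquely inside each subclass ...
print(is_unique(basketball.values()))  # True
print(is_unique(baseball.values()))    # True

# ... but not across all student-athletes taken together.
combined = list(basketball.values()) + list(baseball.values())
print(is_unique(combined))  # False
```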
specialization (i.e., the completeness constraint is “total”), meaning that every student-
athlete must be either a varsity player or intramural player.
Also, in Figure 4.7, the inter-entity class relationship depicted by Sponsored_by
between INTRAMURAL_PLAYER and ORGANIZATION illustrates that an entity type par-
ticipating in a specialization as a subclass may also have specific (external) relationship(s)
beyond the specialization. Sample data representing this extension of vignette 1 appears
in Table 4.2. Observe that in both Figure 4.7 and Table 4.2, the subclasses INTRAMURAL_
PLAYER and VARSITY_PLAYER also have their own specific attributes.
[Figure, apparently Figure 4.7 (diagram not reproduced). Recoverable labels: Sport_type; constraint values o and d; quoted values "Football" and "Basketball"; attributes League, Team, Scholarship, Redshirt, Uniform#, Position, Speed, Pts_per_game, Assists_per_game, Rebounds_per_game, Batting_avg, Home_runs, Errors; relationship type Sponsored_by; entity type ORGANIZATION with attribute Name.]
[Figure, apparently Figure 4.8 (diagram not reproduced). Recoverable entity types: STUDENT, STUDENT_ATHLETE, FOOTBALL_PLAYER, BASKETBALL_PLAYER, BASEBALL_PLAYER, TEAM_CAPTAIN, VARSITY_PLAYER, INTRAMURAL_PLAYER, DEFENSIVE_PLAYER, OFFENSIVE_PLAYER, PITCHER, and ORGANIZATION. Relationship type: Sponsored_by. Other recoverable labels: Student#, Name, Major, Gpa, Eligibility, Height, Weight, Sport_type, F_b_b, d1, d2, d3, "Baseball", Speed, Touchdowns, Uniform#, Position, Batting_avg, Home_runs, Errors, Pts_per_game, Assists_per_game, Rebounds_per_game, Team, Scholarship, Redshirt, Auto, Walks, Strikeouts, Era, Innings_pitched, Pitching_speed, Receptions, Yards_gained, No_of_tackles, No_of_interceptions, Type.]
4.1.5.1 Vignette 2
Vignette 2 is a continuation of vignette 1, which was introduced in Section 4.1.4.2. It is
common today for a football player to specialize in either offense or defense. Attributes
collected for offensive players include receptions and yards gained. Attributes collected
for defensive players include number of tackles and number of interceptions. Likewise,
some baseball players are also pitchers. Attributes collected about pitchers include
earned run average (ERA), pitching speed, innings pitched, strikeouts, and walks.
The structure of a specialization hierarchy is constrained to an inverted tree in that
an entity type must not participate as a child in more than one specialization. In informal
terms, a child entity type cannot have more than one parent. Notice that Figure 4.8
4.1.5.2 Vignette 3
Vignette 3 describes a scenario suitable for modeling as a specialization lattice. Each
member of the high school staff at Homer Hanna High School is either a full-time or part-
time employee. At the same time, a staff member may be either a trainer or coach and a
member of the teaching staff or support staff. A part-time employee may be a part-time
trainer, and each part-time trainer is also a trainer. Finally, a member of the athletic
staff is a full-time employee, a coach, and a teacher.
This scenario describes a situation in which an entity type can participate as a sub-
class in more than one specialization; in simple terms, a child can have more than one
parent. Such a specialization is called a specialization lattice. Since a specialization can
involve only one superclass, each parent in a specialization lattice comes from a different
specialization. The subclass itself inherits all the attributes and relationship types from the
superclasses of all the specializations participating in the specialization lattice and the
predecessor hierarchy of all these superclasses. This is called multiple type inheritance,
and the subclass in the specialization lattice is referred to as a shared subclass since it is
participating as a subclass in multiple specializations. As a rule, any attribute or relation-
ship type inherited more than once via different paths in the specialization lattice is not
duplicated in the shared subclass.
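The difference between a specialization hierarchy and a specialization lattice comes down to whether any subclass has more than one direct superclass. The Python sketch below tests exactly that; the function name `classify` and the dictionary layout are assumptions, while the entity type names follow vignettes 2 and 3.

```python
def classify(parents):
    """Classify a specialization structure as a hierarchy or a lattice.

    parents maps each subclass name to the set of its direct superclasses.
    Returns a pair: the kind of structure and the sorted shared subclasses.
    """
    shared = sorted(child for child, sups in parents.items() if len(sups) > 1)
    kind = "lattice" if shared else "hierarchy"
    return kind, shared


# Vignette 2: every child has exactly one parent.
vignette_2 = {
    "FOOTBALL_PLAYER": {"STUDENT_ATHLETE"},
    "BASEBALL_PLAYER": {"STUDENT_ATHLETE"},
    "OFFENSIVE_PLAYER": {"FOOTBALL_PLAYER"},
    "DEFENSIVE_PLAYER": {"FOOTBALL_PLAYER"},
    "PITCHER": {"BASEBALL_PLAYER"},
}

# Vignette 3: an athletic staff member has three parents.
vignette_3 = {
    "FULL_TIME_EMPLOYEE": {"HIGH_SCHOOL_STAFF"},
    "COACH": {"HIGH_SCHOOL_STAFF"},
    "TEACHER": {"HIGH_SCHOOL_STAFF"},
    "ATHLETIC_STAFF_MEMBER": {"FULL_TIME_EMPLOYEE", "COACH", "TEACHER"},
}

print(classify(vignette_2))  # ('hierarchy', [])
print(classify(vignette_3))  # ('lattice', ['ATHLETIC_STAFF_MEMBER'])
```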
Figure 4.9 illustrates a specialization lattice. Here, the entity type
ATHLETIC_STAFF_MEMBER is a shared subclass in three distinct specializations:
[FULL_TIME_EMPLOYEE → ATHLETIC_STAFF_MEMBER], [COACH → ATHLETIC_STAFF_MEMBER],
and [TEACHER → ATHLETIC_STAFF_MEMBER] and inherits the
specific attributes of FULL_TIME_EMPLOYEE, COACH, and TEACHER. In addition,
ATHLETIC_STAFF_MEMBER inherits the attributes of HIGH_SCHOOL_STAFF, but only
once, even though the inheritance itself occurs via three paths. It should be noted that the
relationship types Scheduled between TEACHER and COURSE and Coaches between
COACH and TEAM are also inherited by ATHLETIC_STAFF_MEMBER. At the same time,
the specific attributes of ATHLETIC_STAFF_MEMBER and the specific relationship type
Recruits belong only to ATHLETIC_STAFF_MEMBER.
6 The symbol → is used here to convey the hierarchical predecessor and successor.
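Multiple type inheritance without duplication can be sketched by walking up the lattice and collecting attributes in a set. In the Python sketch below, the function name `inherited_attributes` is hypothetical, and the assignment of Salary, Sport, and Subject to particular subclasses is a guess made for illustration, not a reading of Figure 4.9.

```python
def inherited_attributes(entity, parents, own):
    """Collect the attributes an entity type inherits from all ancestors.

    parents maps an entity type to a list of its direct superclasses;
    own maps an entity type to its specific attributes. Because the result
    is a set, an attribute reached through several paths appears only once.
    """
    result = set()
    for parent in parents.get(entity, []):
        result |= own.get(parent, set())
        result |= inherited_attributes(parent, parents, own)
    return result


parents = {
    "FULL_TIME_EMPLOYEE": ["HIGH_SCHOOL_STAFF"],
    "COACH": ["HIGH_SCHOOL_STAFF"],
    "TEACHER": ["HIGH_SCHOOL_STAFF"],
    "ATHLETIC_STAFF_MEMBER": ["FULL_TIME_EMPLOYEE", "COACH", "TEACHER"],
}

# Attribute placement below is assumed for the sake of the example.
own = {
    "HIGH_SCHOOL_STAFF": {"Emp#", "Name"},
    "FULL_TIME_EMPLOYEE": {"Salary"},
    "COACH": {"Sport"},
    "TEACHER": {"Subject"},
}

attrs = inherited_attributes("ATHLETIC_STAFF_MEMBER", parents, own)
print(sorted(attrs))  # ['Emp#', 'Name', 'Salary', 'Sport', 'Subject']
```

HIGH_SCHOOL_STAFF is reached through three paths, yet Emp# and Name each appear once in the result.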
[Figure 4.9 (diagram not reproduced). Recoverable entity types: HIGH_SCHOOL_STAFF, PART_TIME_TRAINER, TEAM, ATHLETIC_STAFF_MEMBER, and STUDENT_ATHLETE. Relationship types: Coaches, Scheduled, Recruits. Other recoverable labels: Emp#, Name, Birthdate, Degrees, Gender, d1, d2, d3, Certification, Pay_scale, Subject, Position, Salary, Sport, Credits, Team_id, Course_nm, Student#, Gpa, Major, Eligibility, Weight, Height.
Notes in the figure: The dotted line indicates that not every full-time employee is an athletic staff member. ATHLETIC_STAFF_MEMBER is a shared subclass in 3 distinct specializations: {FULL_TIME_EMPLOYEE, ATHLETIC_STAFF_MEMBER}, {COACH, ATHLETIC_STAFF_MEMBER}, and {TEACHER, ATHLETIC_STAFF_MEMBER}.]
4.1.6 Categorization
A fundamental characteristic of the specialization/generalization construct is that there
can be only one superclass in the construct. For this reason, there are intra-entity
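[Figure 4.10 (diagram not reproduced). Recoverable entity types: UNIVERSITY, INDIVIDUAL, COMPANY, FOUNDATION, DONOR, and FUND. Relationship types: Employed_by, Administers, Donates. Other recoverable labels: Name, Ssno, Tax_id, Id, Title, Address, Type, Donor_id, Amount, Fund#, Value.]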
In the example shown in Figure 4.10, the participation of COMPANY and FOUNDA-
TION is total (every company and foundation must be a donor), whereas the participation
of INDIVIDUAL is partial (some individuals are not donors). On the other hand, a donor is
either a company, a foundation, or an individual. If the completeness constraint for all
superclasses in the categorization exhibits total participation, then the category (subclass)
itself is called a total category. That is, the category set is a union of all the three superclass
entities. Likewise, if the completeness constraint is partial, the category itself is
referred to as a partial category (i.e., the category set is a proper subset of the union of all
the superclass entities). Finally, type inheritance in categorization is selective. That is,
members of the category (subclass) selectively inherit attributes and relationships of the
superclass entity based on the SC/sc relationship in the categorization in which the mem-
ber participates. This is often referred to as the selective type inheritance property of a
category. Observe that this is diametrically opposite to the property of multiple type
inheritance exhibited by a shared subclass in a specialization lattice.
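The distinction between a total and a partial category, and the selective nature of inheritance in a category, can both be expressed with sets. The Python sketch below uses hypothetical identifiers and the assumed function names `category_kind` and `source_superclass`; it is meant only to restate the definitions above.

```python
def category_kind(superclasses, category):
    """Return 'total' or 'partial' for a category.

    superclasses maps each superclass name to its set of entity identifiers;
    category is the set of identifiers that belong to the category.
    """
    union = set().union(*superclasses.values())
    if not category <= union:
        raise ValueError("category members must come from some superclass")
    return "total" if category == union else "partial"


def source_superclass(member, superclasses):
    """Return the single superclass from which a category member inherits."""
    sources = [name for name, ids in superclasses.items() if member in ids]
    if len(sources) != 1:
        raise ValueError(f"{member} must belong to exactly one superclass")
    return sources[0]


# Hypothetical identifiers: one individual (I2) is not a donor.
superclasses = {
    "INDIVIDUAL": {"I1", "I2"},
    "COMPANY": {"C1"},
    "FOUNDATION": {"F1"},
}
donors = {"I1", "C1", "F1"}

print(category_kind(superclasses, donors))           # partial
print(category_kind(superclasses, donors | {"I2"}))  # total
print(source_superclass("C1", superclasses))         # COMPANY
```

A donor that comes from COMPANY inherits only the attributes and relationships of COMPANY, which is the selective type inheritance property in miniature.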
Sometimes, a category may not have a unique identifier. For instance, DONOR does
not have a unique identifier stated as part of the data specification. While type inheri-
tance, even when selective, will furnish the category (in this case, DONOR) with a unique
identifier, the properties of the unique identifier will vary across category instances,
depending on the superclass from which the attributes are inherited. This situation is
often simplified by specifying a “manufactured” surrogate key for the category to serve the
role of unique identifier. The attribute Donor_id shown in the EERD in Figure 4.10 is the
surrogate key artificially created by the modeler.
The sample data shown in Figure 4.11 illustrates the relationship among INDIVIDUAL,
COMPANY, FOUNDATION, DONOR, and FUND shown in Figure 4.10. Observe that
DONOR is a partial category because, while the participation of COMPANY and FOUNDA-
TION is total, the participation of INDIVIDUAL is partial (i.e., the Jim Jones with Social
Security number 456456456 is not a donor). In Figure 4.11, the surrogate key “manufac-
tured” for each donor begins with a one-character code that represents the donor category
(F = Foundation, C = Company, I = Individual) followed by the donor number within that
category. In addition, the DONATION data reflects the fact that each donor makes a
donation to at least one fund and that each fund has at least one donation from a donor.
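A surrogate key of the kind described here, a category code followed by a running number, could be generated along the following lines. This is a minimal sketch: the function name `make_donor_id_generator` is hypothetical, and the exact layout of the number (for example, whether it is zero-padded) is an assumption, since the values in Figure 4.11 are not reproduced here.

```python
from itertools import count

# One-character codes as listed in the text.
PREFIX = {"FOUNDATION": "F", "COMPANY": "C", "INDIVIDUAL": "I"}


def make_donor_id_generator():
    """Return a function that issues the next Donor_id for a superclass.

    Each donor category keeps its own running number, starting at 1.
    """
    counters = {name: count(1) for name in PREFIX}

    def next_id(superclass):
        return f"{PREFIX[superclass]}{next(counters[superclass])}"

    return next_id


next_id = make_donor_id_generator()
ids = [next_id("COMPANY"), next_id("COMPANY"), next_id("FOUNDATION")]
print(ids)  # ['C1', 'C2', 'F1']
```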
FIGURE 4.11 Sample data sets for the categorization example in Figure 4.10
VEHICLE represents the entity class (superclass) to which the entity types (subclasses)
CAR, TRUCK, VAN, and SUV belong. Notice that the completeness constraint indicates
that the dealership has other vehicles not captured explicitly in this generalization and the
only data captured on these vehicles is that which is common to all vehicles. It is also
important to realize that an entity cannot be a member of one of the subclasses {CAR,
TRUCK, VAN, SUV} unless it exists in the superclass VEHICLE.
FIGURE 4.12 A set of entity types: CAR, TRUCK, VAN, and SUV
FIGURE 4.13 Generalization of subclasses CAR, TRUCK, VAN, and SUV to a VEHICLE
superclass
Not all vehicles in the dealership are registered vehicles, though. In other words, the
dealership has in its inventory lots of vehicles not yet registered; they do not have an
assigned license plate number. How would you model REGISTERED_VEHICLE? The cru-
cial issue here is that the model should allow for the possibility of some unregistered
cars, trucks, vans, and/or SUVs to be present. This property cannot be captured in the
generalization/specialization construct shown in Figure 4.13. One method for modeling
this situation involves the use of the categorization construct. Figure 4.14 presents this
scenario. Note that REGISTERED_VEHICLE is the subclass in this categorization, while
CAR, TRUCK, VAN, and SUV are superclasses of the categorization. Participation of CAR
and VAN in this relationship is partial, while TRUCK and SUV exhibit total participation.
This is indicated in Figure 4.14 by the dotted lines and solid lines for the completeness
constraint. That there are other vehicles in the lot and some of them may be registered
vehicles is incorporated via the superclass OTHER with a partial participation in the
categorization. Also, since the category REGISTERED_VEHICLE has a unique identifier
License_plate#, there is no need to create a surrogate key for REGISTERED_VEHICLE.
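Whether a superclass participates totally or partially in a category can be checked mechanically against sample data. The following Python sketch is offered only as an illustration: the function name participation and the vehicle identifiers V1 through V6 are hypothetical and do not come from the figures.

```python
def participation(superclass_ids, category_member_ids):
    """Return 'total' if every superclass entity is in the category, else 'partial'."""
    return "total" if set(superclass_ids) <= set(category_member_ids) else "partial"


# Hypothetical vehicle identifiers, invented for illustration only.
inventory = {
    "CAR": {"V1", "V2"},
    "TRUCK": {"V3"},
    "VAN": {"V4", "V5"},
    "SUV": {"V6"},
}
# Members of the category REGISTERED_VEHICLE in this made-up data set.
registered = {"V1", "V3", "V4", "V6"}

result = {name: participation(ids, registered) for name, ids in inventory.items()}
print(result)
# {'CAR': 'partial', 'TRUCK': 'total', 'VAN': 'partial', 'SUV': 'total'}
```

With this sample data, CAR and VAN participate partially while TRUCK and SUV participate totally, which mirrors the scenario described for Figure 4.14.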
[Diagram omitted. Entity types shown: VEHICLE, specialized disjointly (d) into CAR, TRUCK, VAN, and SUV, with the further subclasses REGISTERED_CAR, REGISTERED_TRUCK, REGISTERED_VAN, and REGISTERED_SUV, each carrying License_plate#.]
Note: In this specification, all SUVs are registered_SUVs and all trucks are registered_trucks.
FIGURE 4.15 The partial category REGISTERED_VEHICLE in Figure 4.14 expressed in a spe-
cialization hierarchy
The ERD in Figure 4.15 can be rendered a little more efficient by eliminating the
subclasses REGISTERED_SUV and REGISTERED_TRUCK from the ERD since, according
to the specifications provided, all trucks and SUVs are indeed registered. The revised ERD
appears in Figure 4.16.
[Diagram omitted. Entity types shown: VEHICLE, specialized disjointly (d) into CAR, TRUCK, VAN, and SUV, with the further subclasses REGISTERED_CAR, REGISTERED_VAN, and REGISTERED_OTHER.]
Note: Since in this specification all SUVs are registered_SUVs and all trucks are registered_trucks, further specialization of SUV and TRUCK is not necessary.
FIGURE 4.16 A revised version of the disjoint specialization hierarchy of VEHICLE displayed in
Figure 4.15
4.1.8 Aggregation7,8
While categorization is capable of expressing a modeling variation that generalization/
specialization cannot incorporate, there are other constraints that categorization imposes
in order to sharpen its expressive power. In categorization, an entity that is a member of a
category (subclass) must exist in only one of its superclasses. With aggregation, this constraint
is relaxed. In fact, it is not only relaxed, it is reversed in the aggregation construct,
thereby enriching the capabilities of an EER modeling domain.
Aggregation allows modeling a “whole/part” association as an “Is-a-part-of” relation-
ship between a superclass and a subclass. An aggregate here is a subclass that is a subset
of the aggregation of the superclasses in the relationship. In other words, an entity in the
aggregate contains superclass entities from all SC/sc relationships in which it participates.
Therefore, in this case the type inheritance is collective, as opposed to categorization,
where it is selective. Collective type inheritance connotes inheritance of attributes and
relationships from all superclass entities contained in the specific aggregation.
A diagrammatic representation of the aggregation construct in the EER diagram is
shown in Figure 4.17. The notation is similar to that of categorization except that the
union indicator (U) is replaced by the aggregation indicator (A). In the example in
Figure 4.17, the subclass PERSONAL_COMPUTER is the aggregate of which HARDWARE
and OPERATING_SYSTEM are parts. While a category can be total or partial, an aggregate
can never be partial (no connector is a dotted line). That is, all hardware and operating
system entities are “part of” some personal computer. Further, a hardware entity or an
operating system entity can belong to only one personal computer entity. Figure 4.18
portrays an aggregation hierarchy.
7
Unified modeling language (UML) for object-oriented modeling distinguishes between simple
aggregation, which is entirely conceptual, and composition, which is classified as a variation of
simple aggregation and does add some valuable semantics (Booch, Rumbaugh, and Jacobson, 2005).
Composition changes the meaning of navigation across the association between the whole and its
parts and links the lifetimes of the whole and its parts. Aggregation, as described in this section,
corresponds to UML’s composition construct.
8
The term “aggregation” is also used in inter-entity class relationships to indicate a cluster of related
entity types, which is referred to as an aggregate entity type or a cluster entity type. This will be
addressed in Chapter 5.
Figure 4.19 depicts an aggregation and a categorization involving the same set of
entity types. Every taxable property is a lot or a house (not a lot and a house together as a
single taxable property). Some lots and some houses are not taxable properties (superclass
connectors are dotted lines, denoting partial completeness). Each lot and each house is
recorded as a separate taxable property. In contrast, a lot and a house are “parts of”
a home; that is, a HOME entity includes both a LOT entity and a HOUSE entity. All lots
and houses participate in the aggregation. A house and a lot cannot belong to more than
one home.
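The contrast between selective inheritance (categorization) and collective inheritance (aggregation) can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the class names mirror the entity types in the text, while the identifier attributes lot_id and house_id and the helper check_parts_not_shared are hypothetical. The sketch checks only that parts are not shared among homes; it does not enforce that every lot and house participates in the aggregation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lot:
    lot_id: str      # hypothetical identifier


@dataclass(frozen=True)
class House:
    house_id: str    # hypothetical identifier


@dataclass(frozen=True)
class TaxableProperty:
    """Category: each instance comes from exactly ONE superclass (selective)."""
    source: Union[Lot, House]


@dataclass(frozen=True)
class Home:
    """Aggregate: each instance contains an entity from EVERY superclass (collective)."""
    lot: Lot
    house: House


def check_parts_not_shared(homes):
    """Return True if no lot and no house belongs to more than one home."""
    lots = [h.lot for h in homes]
    houses = [h.house for h in homes]
    return len(set(lots)) == len(lots) and len(set(houses)) == len(houses)


lot1, lot2 = Lot("L1"), Lot("L2")
house1, house2 = House("H1"), House("H2")

# A lot and a house are recorded as separate taxable properties,
# but together they form a single home.
taxables = [TaxableProperty(lot1), TaxableProperty(house1)]
homes = [Home(lot1, house1), Home(lot2, house2)]

print(check_parts_not_shared(homes))                         # True
print(check_parts_not_shared(homes + [Home(lot1, house2)]))  # False
```

A TaxableProperty holds a reference to a single superclass entity, whereas a Home must be given one entity from each superclass, which is the structural difference between the two constructs.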
[Diagram omitted. Entity types shown: STUDENT, STUDENT_ATHLETE, FOOTBALL_PLAYER, BASKETBALL_PLAYER, BASEBALL_PLAYER, TEAM_CAPTAIN, VARSITY_PLAYER, and INTRAMURAL_PLAYER, annotated with attribute data types and sizes.]
In the Presentation Layer ER diagram shown in Figure 4.8, the multi-valued attribute
Position in BASKETBALL_PLAYER and BASEBALL_PLAYER indicates that both baseball and
basketball players can play more than one position within that sport. Observe in Figure 4.20
that the multi-valued attribute Position, which appears in both BASKETBALL_PLAYER and
BASEBALL_PLAYER, has been replaced by the weak entity types POSITION_BK (for
BASKETBALL_PLAYER) and POSITION_BA (for BASEBALL_PLAYER), with the attribute
Position serving as its partial key in each weak entity type. In BASEBALL_PLAYER, replacing
the multi-valued attribute Position with the weak entity type POSITION_BA also requires
that a relationship Plays_pitcher between the POSITION_BA and PITCHER be established,
where each POSITION_BA entity is associated with at most one PITCHER entity and each
PITCHER entity is associated with exactly one POSITION_BA entity. The SC/sc relationship
between BASEBALL_PLAYER and PITCHER remains intact.
The deletion constraints associated with the inter-entity class relationship types in
Figure 4.20 are listed here, followed by a description, in italics, of how the deletion
constraint is shown in the figure.
• If an organization stops sponsoring intramural players, then intramural
players sponsored by that organization are retained in the database for
more employees. An employee is involved in no more than seven in-house projects but
need not be involved in any project. For both in-house and outsourced projects, a
description of the project is gathered. Data gathered about each vendor include a vendor
name, address, phone number, and contact person. Vendor name is used to identify each
vendor.
Because the same vendors are often contracted for future projects, when an outsourced
project is removed from the system, the vendor information should be retained
for future use. If a hobby is removed from the recreation portfolio of Bearcat Incorpo-
rated, its relationship with any sponsor is removed as well. Likewise, when a sponsor is
removed from the recreation portfolio of Bearcat Incorporated, its relationship with any
hobby is removed.
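The deletion rules in this paragraph can be previewed with a small relational sketch using Python's standard sqlite3 module. Everything here is an assumption made for illustration: the chapter has not yet mapped the model to tables, so the table names, column names, and sample rows are hypothetical. "Retain the vendor" is rendered as ON DELETE SET NULL, and "remove the relationship" as ON DELETE CASCADE on the linking table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE outsourced_project (pnumber INTEGER PRIMARY KEY, description TEXT);
-- A vendor works on at most one outsourced project at a time.
CREATE TABLE vendor (
    v_name  TEXT PRIMARY KEY,
    pnumber INTEGER REFERENCES outsourced_project(pnumber) ON DELETE SET NULL
);
CREATE TABLE hobby   (hb_name TEXT PRIMARY KEY);
CREATE TABLE sponsor (sponsor_id TEXT PRIMARY KEY);
-- The hobby/sponsor relationship disappears when either side is removed.
CREATE TABLE supports (
    hb_name    TEXT REFERENCES hobby(hb_name)      ON DELETE CASCADE,
    sponsor_id TEXT REFERENCES sponsor(sponsor_id) ON DELETE CASCADE,
    PRIMARY KEY (hb_name, sponsor_id)
);
INSERT INTO outsourced_project VALUES (1, 'Hypothetical project');
INSERT INTO vendor   VALUES ('Acme', 1);
INSERT INTO hobby    VALUES ('Chess');
INSERT INTO sponsor  VALUES ('S1');
INSERT INTO supports VALUES ('Chess', 'S1');
""")

conn.execute("DELETE FROM outsourced_project WHERE pnumber = 1")
conn.execute("DELETE FROM hobby WHERE hb_name = 'Chess'")

vendors = conn.execute("SELECT v_name, pnumber FROM vendor").fetchall()
links = conn.execute("SELECT COUNT(*) FROM supports").fetchone()[0]
sponsors = conn.execute("SELECT COUNT(*) FROM sponsor").fetchone()[0]
print(vendors, links, sponsors)  # [('Acme', None)] 0 1
```

After the two deletions, the vendor row survives with no project attached, the hobby's link to its sponsor is gone, and the sponsor itself remains.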
A plant’s project may be done in-house or outsourced to one or more contracted vendors. However, a vendor can participate in only one outsourced
project at a time. A plant employee is assigned to an in-house project, and an in-house project involves one or more employees. An employee is involved
in no more than seven in-house projects but need not be involved in any project. For both in-house and outsourced projects, a description of the project
is gathered. Data gathered about each vendor include a vendor name, address, phone number, and contact person. Vendor name is used to identify each
vendor. Because the same vendors are often contracted for future projects, when an outsourced project is removed from the system, the vendor informa-
tion should be retained for future use.
[Diagrams omitted. (a) Before: EMPLOYEE and PROJECT connected through the m:n relationship Assignment, which carries the attribute Hours. (b) After: PROJECT specialized disjointly (d) into IN_HOUSE_PROJECT and OUTSOURCED_PROJECT, each with a Description; Assignment connects EMPLOYEE and IN_HOUSE_PROJECT; OUTSOURCED_PROJECT is connected to VENDOR (V_name, V_address, V_phone) through Contracted_to.]
BOX 1
The sponsor of a hobby can be one or more individuals, schools, or churches. Although each hobby need not
have a sponsor, each individual, school, and church is involved in sponsoring one or more hobbies. A Social
Security number is used to identify each individual sponsor. Other data captured about individuals include
name, address, and phone number. Schools and churches are identified by their names. For a church, its
denomination and pastor are recorded; for a school, its size and the name of its principal are recorded.
Many of the sponsors of hobbies are not-for-profit organizations; for these organizations, the type, exempt ID,
and annual operating budget are recorded.
Some of the schools are public schools and therefore are also classified as not-for-profit organizations.
For the public schools, the name of the school district and its tax base are recorded.
[Diagrams omitted. (a) SPONSOR (Sponsor_id, Status) as a category (U) of CHURCH, SCHOOL, and INDIVIDUAL. (b) NOT_FOR_PROFIT_ORGANIZATION (Type, Exempt_id, Budget) as a subclass of SPONSOR. (c) The combination of (a) and (b), with PUBLIC_SCHOOL (District, Tax_base) as a shared subclass of SCHOOL and NOT_FOR_PROFIT_ORGANIZATION.]
BOX 2
The excerpts from the requirements specification stated in Section 4.3 that are dis-
played at the top of Box 2 depict three distinct, yet interrelated concepts:
a) Sponsors of hobbies comprise churches, schools and individuals.
b) Some of the sponsors are not-for-profit organizations.
c) All public schools are not-for-profit organizations.
First, CHURCH, SCHOOL, and INDIVIDUAL represent different entity classes. Thus,
the fact that these three entity types are the sponsors cannot be represented by a spe-
cialization simply because the entity types participating in a specialization relationship
type must, by definition, belong to the same entity class. Categorization is the EER
modeling construct designed to express the union of entity types from different entity
classes (SCs) selectively representing a subclass entity type SPONSOR. Box 2(a) presents
this scenario using the intra-entity class relationship “Categorization.” Observe that
the requirements specification does not provide a unique identifier for the “category”
(subclass in this relationship). Although, through selective inheritance, SPONSOR does
indeed inherit the respective unique identifier from each of CHURCH, SCHOOL, and
INDIVIDUAL, none of them can serve the role of the unique identifier of all the entities
in the SPONSOR entity set due to incompatible attribute characteristics across them.
Under these circumstances, the modeler has no choice but to specify a surrogate key for
the category; thus, Sponsor_id is the surrogate key manufactured by the modeler for the
entity type SPONSOR participating as the subclass in the categorization that has three
superclass entity types; see Box 2(a).
The second concept (b) stated earlier is captured in Box 2(b). This is a simple partial
specialization depicting the business rule that some of the sponsors are not-for-profit
organizations.
Finally, Box 2(c) first combines the ERDs in (a) and (b). Next, the entity type
PUBLIC_SCHOOL is modeled as the shared subclass participating in two distinct partial
specializations—one in which SCHOOL is the superclass and the other in which NOT_
FOR_PROFIT_ORGANIZATION is the superclass. Thus, the specialization lattice results.
Observe that the entity type SPONSOR plays the role of a subclass in one intra-entity
class relationship (categorization) while also assuming the role of a superclass in another
intra-entity class relationship (specialization).
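The way attributes flow through the combined structure in Box 2(c) can be traced with a short Python sketch. It is a simplified illustration only: the dictionaries and the function attributes_of are hypothetical, the attribute lists are taken from the narrative at the top of Box 2, and relationships are ignored. A specialization subclass inherits from all of its superclasses, whereas a category member inherits from exactly one.

```python
# Attributes each entity type declares itself (taken from the Box 2 narrative).
OWN_ATTRIBUTES = {
    "CHURCH": {"Name", "Denomination", "Pastor"},
    "SCHOOL": {"Name", "Size", "Principal"},
    "INDIVIDUAL": {"Ssn", "Name", "Address", "Phone#"},
    "SPONSOR": {"Sponsor_id"},
    "NOT_FOR_PROFIT_ORGANIZATION": {"Type", "Exempt_id", "Budget"},
    "PUBLIC_SCHOOL": {"District", "Tax_base"},
}

# Specialization: a subclass inherits from ALL of its superclasses.
SPECIALIZATION_PARENTS = {
    "NOT_FOR_PROFIT_ORGANIZATION": ["SPONSOR"],
    "PUBLIC_SCHOOL": ["SCHOOL", "NOT_FOR_PROFIT_ORGANIZATION"],
}

# Categorization: a category member inherits from exactly ONE superclass.
CATEGORY_PARENTS = {"SPONSOR": ["CHURCH", "SCHOOL", "INDIVIDUAL"]}


def attributes_of(entity_type, category_choice):
    """Collect own plus inherited attributes for one entity.

    category_choice maps a category name to the single superclass this
    particular entity comes from (selective inheritance).
    """
    attrs = set(OWN_ATTRIBUTES[entity_type])
    for parent in SPECIALIZATION_PARENTS.get(entity_type, []):
        attrs |= attributes_of(parent, category_choice)
    if entity_type in CATEGORY_PARENTS:
        chosen = category_choice[entity_type]
        if chosen not in CATEGORY_PARENTS[entity_type]:
            raise ValueError(f"{chosen} is not a superclass of {entity_type}")
        attrs |= attributes_of(chosen, category_choice)
    return attrs


public_school = attributes_of("PUBLIC_SCHOOL", {"SPONSOR": "SCHOOL"})
church_sponsor = attributes_of("SPONSOR", {"SPONSOR": "CHURCH"})
print(sorted(public_school))
print(sorted(church_sponsor))
```

A public school ends up with the attributes of SCHOOL, NOT_FOR_PROFIT_ORGANIZATION, and SPONSOR in addition to its own, while a sponsor that is a church carries only Sponsor_id and the church attributes.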
Figure 4.21 and Table 4.3 incorporate the additional requirements specified in
Section 4.3 into the Presentation Layer ER model for Bearcat Incorporated.
[Diagram omitted. Entity types shown include EMPLOYEE, PLANT, PROJECT, IN_HOUSE_PROJECT, OUTSOURCED_PROJECT, DEPENDENT, HOBBY, VENDOR, SPONSOR, NOT_FOR_PROFIT_ORGANIZATION, CHURCH, SCHOOL, INDIVIDUAL, and PUBLIC_SCHOOL.]
FIGURE 4.21 Presentation Layer Enhanced ER (EER) diagram for Bearcat Incorporated
TABLE 4.3 Presentation Layer semantic integrity constraints for expanded Bearcat Incorporated
scenario
Table 4.4 represents the first step in the transition from the Presentation Layer ER
model to the Design-Specific ER model—the collection of attribute characteristics (data
type, size, and domain constraints) for the incremental specifications added for the
Bearcat Incorporated case in Section 4.3.
Entity/Relationship Type Name   Attribute Name   Data Type      Size     Domain Constraint
BCU_ACCOUNT                     Acc_type         Alphabetic     1        C = Checking Acct., S = Savings Acct., I = Investment Acct.
HOBBY                           Io_activity      Alphabetic     1        I = Indoor Activity, O = Outdoor Activity
TABLE 4.4 Semantic integrity constraints for Initial Design-Specific ER model for expanded Bearcat
Incorporated scenario—Initial
Entity/Relationship Type Name   Attribute Name   Data Type      Size     Domain Constraint
HOBBY                           Gi_activity      Alphabetic     1        G = Group Activity, I = Individual Activity
OUTSOURCED_COMPONENT            Proj_pct         Numeric        (3.1)
OUTSOURCED_COMPONENT            Component        Alphanumeric   50
NOT_FOR_PROFIT_ORGANIZATION     Exempt_id        Alphanumeric   9
NOT_FOR_PROFIT_ORGANIZATION     Budget           Numeric        (10.0)
NOT_FOR_PROFIT_ORGANIZATION     Type             Alphabetic     1
CHURCH                          Denomination     Alphabetic     20       Baptist, Catholic, Lutheran, Methodist
*(n.m) is used to indicate n places to the left of the decimal point and m places to the right of the decimal point
TABLE 4.4 Semantic integrity constraints for Initial Design-Specific ER model for expanded Bearcat
Incorporated scenario—Initial (continued)
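Attribute characteristics of the kind collected in Table 4.4 lend themselves to a simple mechanical check. The Python sketch below is an illustration only: the dictionary DOMAIN_CONSTRAINTS and the function is_valid are hypothetical names, and only the rows of Table 4.4 that list explicit domain values are transcribed.

```python
# (entity type, attribute) -> (maximum size, allowed values), from Table 4.4.
DOMAIN_CONSTRAINTS = {
    ("BCU_ACCOUNT", "Acc_type"): (1, {"C", "S", "I"}),
    ("HOBBY", "Io_activity"): (1, {"I", "O"}),
    ("HOBBY", "Gi_activity"): (1, {"G", "I"}),
    ("CHURCH", "Denomination"): (20, {"Baptist", "Catholic", "Lutheran", "Methodist"}),
}


def is_valid(entity_type, attribute, value):
    """Check a value against the size and domain constraint of its attribute."""
    size, allowed = DOMAIN_CONSTRAINTS[(entity_type, attribute)]
    return len(value) <= size and value in allowed


print(is_valid("HOBBY", "Io_activity", "O"))         # True
print(is_valid("HOBBY", "Gi_activity", "O"))         # False
print(is_valid("CHURCH", "Denomination", "Baptist")) # True
```

Note that the same code letter can be valid for one attribute and invalid for another: "O" is an outdoor activity under Io_activity but has no meaning under Gi_activity.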
The next step is to transform the Presentation Layer EER model to the Design-
Specific tier. Here, the majority of the mapping has already been done in Chapter 3 and
appears in Figure 3.13. One of the incremental mappings pertains to relating
IN_HOUSE_PROJECT to the gerund entity type ASSIGNMENT in place of PROJECT. Likewise,
OUTSOURCED_PROJECT, the new subclass entity type resulting from the specialization
of PROJECT, is now related to a new entity type, VENDOR, in an inter-entity class
relationship. Figure 4.22 contains the ER diagram for the Design-Specific ER model.
[Diagram omitted. Entity types shown include EMPLOYEE, PLANT, BUILDING, PROJECT, IN_HOUSE_PROJECT, OUTSOURCED_PROJECT, ASSIGNMENT, DEPENDENT, BANK_ACCOUNT, PARTICIPATION, VENDOR, SPONSOR, NOT_FOR_PROFIT_ORGANIZATION, CHURCH, SCHOOL, INDIVIDUAL, and PUBLIC_SCHOOL, annotated with attribute data types and sizes and with (min, max) structural constraints.]
Because the data type and size of attributes are now captured in the EER diagram,
the associated list of semantic integrity constraints accordingly shrinks. The updated
semantic integrity constraints are displayed in Table 4.5.
TABLE 4.5 Semantic integrity constraints for the Design-Specific ER model for expanded Bearcat
Incorporated scenario—Final
Similar to the gerund entity type ASSIGNMENT created as a result of the m:n
relationship between EMPLOYEE and IN_HOUSE_PROJECT, the gerund entity type
SUPPORT appears in Figure 4.22 as a result of the decomposition of the m:n relationship
between HOBBY and SPONSOR. This makes it possible to tie the support of a certain
sponsor of a specific hobby to its two identifying parents (SPONSOR and HOBBY).
Furthermore, the rule that an employee cannot be assigned to more than seven project
components is captured by the maximum cardinality shown on the edge connecting EMPLOYEE
to the Uses relationship.
correct this error by revising one of the two restrict (R) constraints to a cascade (C); note
that the cascade constraint is applied on the child entity in this SC/sc relationship.
Because this is a disjoint specialization (d), as long as at least one part is manufactured,
the participation of PART in the relationship with PURCHASED_PART will be partial.
Therefore, the restrict (R) constraint becomes legal. The revision of the cascade constraint
on PART in the role of child in the relationship with MANUFACTURED_PART to a
“set null” (N) is incorrect since the total specialization (solid line on the SC edge)
contradicts the deletion constraint N.
[Figure 4.23, panels (a) through (e), omitted. Each panel shows PART specialized on Source_type into PURCHASED_PART (Outsourced) and MANUFACTURED_PART (Insourced), with a different combination of completeness, disjointness, and deletion constraints (R, C, N).]
Note: Square enclosure of a deletion constraint indicates erroneous specification.
The only revision made in Figure 4.23c that differentiates it from Figure 4.23b is the
partial specialization of PART. Because this revision implies that there can be Part entities
that are neither manufactured nor purchased, the deletion constraint N now becomes
legal. Next, what if, while some Parts are manufactured and some are purchased, some
others are insourced as well as outsourced, that is, manufactured as well as purchased?
Perhaps some Parts are large-quantity items and neither source has sufficient capacity to
supply the quantity needed. There can be other reasons for such situations; what is
important from a data modeling perspective is that the model should be able to handle
this condition.
In fact, the overlapping specialization (disjointness constraint = O) in the intra-entity
class relationship of the EER modeling grammar is specifically intended to serve this need.
Note that, in Section 4.1.4.1, the story line for this example presents this model as an
attribute-defined specialization where the role of subclass discriminator (defining attri-
bute) is played by the mandatory attribute Sourcing in the entity type PART. The over-
lapping specialization then requires that this attribute be a mandatory multi-valued
attribute. These revisions appear in Figure 4.23d. When any revision to an ER/EER model
is made, one must carefully review the model for any unintended consequences. The cur-
rent example (see Figure 4.23d) is a case in point for such an occurrence. Observe that
the cascade deletion constraint on MANUFACTURED_PART was legal in the previous
stage (Figure 4.23c); however, it is shown as problematic in Figure 4.23d. That is because,
in itself, the cascade rule on the child entity type in the SC/sc relationship depicted by
the defining predicate “insourced” is correct. However, the change in the semantics con-
veyed through the change in the specialization characteristic from disjoint to overlapping
raises an unintended consequence. According to the current model, deletion of a
Purchased_part entity will trigger deletion of the associated part from the PART entity
type. This is perfectly all right for a disjoint specialization (Figure 4.23c) because the
specific Part entity deleted will not also be a Manufactured_part. When the specialization
is overlapping (Figure 4.23d), the deletion of a Part entity triggered by the deletion
of a Purchased_part entity can further trigger the deletion of the corresponding
Manufactured_part entity if this Part happens to be obtained from both sources. What
if the intention was to only stop outsourcing the specific Part and continue insourcing it?
Then, an inadvertent error has occurred by the automatic deletion of the corresponding
Manufactured_part due to the cascade deletion constraint imposed on that sc entity type.
That is why, in Figure 4.23d, the deletion constraint C on MANUFACTURED_PART is
labeled an error. The revision of the cascade deletion constraint to the “set null” deletion
constraint on PART in the SC/sc relationship defined by the defining predicate
“outsourced” remedies this error (Figure 4.23e). It must be noted that this correction is
possible only because the specialization is partial; if this is total, an alternative
solution must be sought.
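
The unintended consequence described above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the three entity sets are modeled as plain Python sets, the function name delete_purchased_part and the part number P7 are hypothetical, and the cascade rule on MANUFACTURED_PART is taken to be in force as in Figure 4.23d.

```python
def delete_purchased_part(state, part_no, rule_on_part):
    """Return the state after deleting one Purchased_part entity.

    state: dict holding the sets "part", "purchased", and "manufactured".
    rule_on_part: deletion rule on PART as the child in the "outsourced"
    SC/sc relationship, either "C" (cascade) or "N" (set null).
    """
    # Work on copies so the caller's state is left untouched.
    new = {name: set(members) for name, members in state.items()}
    new["purchased"].discard(part_no)
    if rule_on_part == "C":
        # Cascade: the Part entity is deleted as well ...
        new["part"].discard(part_no)
        # ... and the cascade rule on MANUFACTURED_PART (child of PART)
        # then removes the matching Manufactured_part entity too.
        new["manufactured"].discard(part_no)
    elif rule_on_part != "N":
        raise ValueError("rule_on_part must be 'C' or 'N'")
    # Under "N" the Part entity survives and merely stops being purchased.
    return new


# Hypothetical part P7 is both purchased and manufactured (overlapping).
before = {"part": {"P7"}, "purchased": {"P7"}, "manufactured": {"P7"}}
after_cascade = delete_purchased_part(before, "P7", "C")
after_set_null = delete_purchased_part(before, "P7", "N")
print(after_cascade["manufactured"])   # the insourcing record is gone
print(after_set_null["manufactured"])  # the insourcing record is kept
```

With the cascade rule the Manufactured_part entity disappears along with the Part, which is the inadvertent error; with the set null rule the Part and its Manufactured_part entity remain.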
The second example pertaining to the role of deletion constraints in the intra-entity
class relationship types demonstrates the incorporation of deletion rules for the EER
construct “Categorization.” As a quick review, categorization entails an intra-entity class
relationship where a sc is a subset of the union of one or more SCs (for a detailed review,
see Section 4.1.6). Figure 4.24 is an excerpt from an example introduced in Section 4.1.6
(Figure 4.10). The example here (Figure 4.24a) is a partial categorization since not all
entities of the entity type INDIVIDUAL are Donors. The “category” DONOR being a subclass
in this relationship, by definition it always has total participation in each of the SC/sc
relationships in the categorization. Thus, the “set null” (N) specification on FOUNDATION
end of the edge conflicts with the participation constraint of DONOR in this relationship.
Likewise, since FOUNDATION also exhibits total participation (solid edge) in this rela-
tionship, the deletion constraint N on the DONOR end of the edge is not valid either.
Next, since all Company entities are Donors (solid edge implying total participation), the
deletion rule R on COMPANY will create a deadlock in that a deletion action cannot be
directly initiated on COMPANY. The deletion of a Donor entity triggering the deletion of
the related Company entity through the deletion rule C is perfectly legal. Once again,
the deletion rule R on INDIVIDUAL is compatible with the partial participation of
INDIVIDUAL in the relationship. On the same grounds, the total participation of DONOR
in the relationship conflicts with the deletion constraint R on DONOR in this SC/sc rela-
tionship. The reader should now be able to interpret the revisions shown in Figure 4.24b
and the associated errors identified and the final solution shown in Figure 4.24c.
[Figure 4.24, panels (a) through (c): EER diagrams of the category DONOR over the
superclasses INDIVIDUAL, COMPANY, and FOUNDATION, showing (a) the initial deletion
constraints, (b) a revision, and (c) the final solution. Attribute labels shown: Ssno,
Name, Tax_id, Donor_id, Id, Type, Title, Address.]
Note: Square enclosure of a deletion constraint indicates erroneous specification.
Chapter Summary
With the advent of complex database applications, the constructs available in the entity-
relationship (ER) modeling grammar were found inadequate to fully capture the richness of the
conceptual design. The enhanced entity-relationship (EER) modeling grammar extends the
original ER modeling grammar to include a few new constructs: specialization/generalization,
specialization hierarchies and lattices, categorization, and aggregation. Each of these new
constructs can be displayed in an EER diagram.
A fundamental unit of these intra-entity class relationships is the Superclass/subclass (SC/sc)
relationship where the superclass represents a generic entity type for a group of entity types (sub-
classes). Since the generic entity type subsumes the group, it is also referred to as an entity class.
Specialization and generalization can be viewed as two sides of the same coin. Specializa-
tion involves the generation of subgroups (i.e., subclasses) of a generic entity class by specify-
ing the distinguishing attributes of the subgroups, whereas generalization consolidates common
attributes shared by a set of entity types into a generic entity type. Two main constraints apply to
specialization/generalization: the completeness constraint, which can be total or partial, and the
disjointness constraint, which can be disjoint or overlapping.
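
The two constraints just named can be read off a sample population. The sketch below is illustrative only: entity sets are represented as Python sets of identifiers, the function name classify_specialization is hypothetical, and every subclass is assumed to be a subset of the superclass.

```python
def classify_specialization(superclass, subclasses):
    """Return (completeness, disjointness) for one specialization.

    superclass: set of entity identifiers in the superclass.
    subclasses: dict mapping each subclass name to its set of identifiers,
    each assumed to be a subset of the superclass.
    """
    covered = set().union(*subclasses.values())
    # Total: every superclass entity appears in at least one subclass.
    completeness = "total" if covered == superclass else "partial"
    # Disjoint: no entity is counted in more than one subclass.
    member_count = sum(len(members) for members in subclasses.values())
    disjointness = "disjoint" if member_count == len(covered) else "overlapping"
    return completeness, disjointness


# Hypothetical parts 1..3: part 3 is neither purchased nor manufactured.
example_partial = classify_specialization(
    {1, 2, 3}, {"PURCHASED_PART": {1}, "MANUFACTURED_PART": {2}}
)
# Here part 2 is both purchased and manufactured, and every part is covered.
example_overlap = classify_specialization(
    {1, 2, 3}, {"PURCHASED_PART": {1, 2}, "MANUFACTURED_PART": {2, 3}}
)
print(example_partial, example_overlap)
```

A sample population can only illustrate the constraints; in the model itself they are declared by the designer, not inferred from data.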
A specialization/generalization can take the form of a hierarchy or a lattice. In a specializa-
tion hierarchy, an entity type can participate as a subclass in only one specialization, whereas in
a specialization lattice an entity type can participate as a subclass in more than one specializa-
tion. A category allows for the modeling of a situation where a subclass can be a subset of the
union of several superclasses. A constraint imposed by the use of a category is that an entity
that is a member of a category (subclass) must exist in only one of its superclasses. On the
other hand, aggregation allows for the relaxation of this constraint and requires that an entity that
is a member of an aggregate must exist in all of its superclasses.
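
The category constraint stated above (each member exists in only one superclass) can be checked mechanically on sample data. This is a sketch under assumptions: entities are identified by hypothetical string keys shared across the superclasses, and the function name check_category is invented for illustration.

```python
def check_category(category, superclasses):
    """Check a category against its superclasses.

    category: set of keys of the category's members.
    superclasses: dict mapping each superclass name to its set of keys.
    Returns (violations, kind): the members that do not exist in exactly one
    superclass, and "total" if every superclass entity belongs to the
    category, otherwise "partial".
    """
    violations = sorted(
        member
        for member in category
        if sum(member in keys for keys in superclasses.values()) != 1
    )
    all_superclass_entities = set().union(*superclasses.values())
    kind = "total" if all_superclass_entities <= category else "partial"
    return violations, kind


# Hypothetical keys: individual i2 is not a donor, so the category is partial.
donor_check = check_category(
    {"i1", "c1", "f1"},
    {"INDIVIDUAL": {"i1", "i2"}, "COMPANY": {"c1"}, "FOUNDATION": {"f1"}},
)
print(donor_check)
```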
Finally, the incorporation of deletion rules in intra-entity class relationships is articulated.
Exercises
1. What is the difference between an inter-entity class relationship and an intra-entity class
relationship?
2. What is a subclass and when is a subclass used in data modeling?
3. Under what circumstances in a specialization is it possible for one superclass to be related
to more than one subclass and one subclass to be related to one or more superclasses?
4. What is the type inheritance property?
5. What is the difference between specialization and generalization? Why is this difference not
reflected in ER diagrams?
6. What is the difference between total specialization and partial specialization, and how is
each reflected in an ER diagram?
7. What is the difference between a specialization hierarchy and a specialization lattice?
8. How does categorization differ from specialization?
9. What is the difference between a total category and a partial category?
10. In categorization, what is meant by the property of selective type inheritance?
11. Explain how a total category and specialization are mutually substitutable constructs.
12. Contrast categorization and aggregation.
13. Consider the following Presentation Layer EER diagram.
a. Identify all entity types that at one time or another function as a superclass.
b. List the superclass entity type and subclass entity type(s) in each distinct
specialization.
c. Identify an entity type that functions as a shared subclass. What type of specialization
does this entity type reflect?
d. Which entity types comprise the entire specialization hierarchy? For each level in this
specialization hierarchy, define which entity types serve as a superclass and which
serve as a subclass.
e. Which entity type functions as a superclass in more than one specialization?
f. How many Superclass/subclass (SC/sc) relationships appear in the ER diagram?
g. How many specializations appear in the ER diagram?
h. Which entity types in the diagram form a category?
i. What is the meaning of the arc in the diagram?
j. List the attributes inherited by the entity type RESV_AGENT.
a. Describe what is reflected by the entity types and relationship types in Presentation
Layer ER diagram number 1. In other words, please tell the basic facts of the story. An
explanation of the various attributes need not be part of your story.
b. How do the basic facts of the story change with the change noted in Presentation
Layer ER diagram number 2?
c. How many data sets would be required to illustrate the nature of the information
requirements in Presentation Layer ER diagram number 1?
d. What would be the name and attributes associated with each data set identified in
question (c)?
e. Using Presentation Layer ER diagram number 1, assume there are four clinic owners,
at least one of which is a physician and at least one of which is a medical corporation.
How many entities (instance of each entity type) would appear in the MEDICAL_
CORPORATION, PHYSICIAN, and CLINIC_OWNER data sets? It is possible that
there may be more than one correct answer to this question.
f. How would your answer to question (e) change if it were based on illustrating the dif-
ference between Presentation Layer ER diagram number 1 and Presentation Layer ER
diagram number 2? Please justify your answer.
g. Discuss how the basic facts of the story change if the story is based on Presentation
Layer ER diagram number 3.
16. Consider the Presentation Layer ER diagram that appears in Figure 4.21.
a. What makes SCHOOL part of the specialization lattice involving SCHOOL, NOT_
FOR_PROFIT_ORGANIZATION, and PUBLIC_SCHOOL and, at the same time, part
of the SPONSOR category that involves CHURCH, SCHOOL, and INDIVIDUAL? What
role (i.e., subclass, superclass, category, aggregate) do SCHOOL and NOT_FOR_
PROFIT_ORGANIZATION serve in each of these structures?
b. Does SPONSOR take the form of a total category or a partial category?
c. Is it possible to redraw the SPONSOR category as a specialization and still retain the
participation of SCHOOL in the specialization? If the answer is “yes,” redraw this por-
tion of Figure 4.21.
17. This exercise adds to the information given in Exercises 17 and 18 in Chapter 3 in
order to give you an opportunity to work with various types of enhanced ER modeling
constructs.
• Both coaches and referees are basketball professionals who have chosen different
careers within this profession.
• Players no longer serve as counselors in summer youth basketball camps. Instead,
some of the players and all of the coaches do voluntary service as trainers in the
summer youth basketball camps.
Incorporate this additional information into the Presentation Layer ER diagram that you
developed for Exercise 17 in Chapter 3.
18. Develop valid deletion rules and incorporate the associated deletion constraints in the
EERDs shown next. Then explain the meaning of the incorporated deletion constraints,
clarifying the absence of errors in your specifications.
[EERD 1: entity types PERSONNEL, PHYSICIAN, and SURGEON, with an overlapping (O)
specialization symbol. Attribute labels shown: Address, Name, Emp_numb, Salary, Gender,
Role, Speciality, Skill.]
[EERD 2: entity types EMPLOYEE, DESIGNER, and OPERATOR. Attribute labels shown:
Employee_no, Name (Fname, Minit, Lname), Salary, Gender, Address, Skill, Qualification,
Field, Experience, Union.]
[EERD 3: entity types CLINIC_PERSONNEL, PERSON, SALARIED_EMPLOYEE, SURGEON, PATIENT,
NURSE, and PHYSICIAN, with disjoint (d) and overlapping (O) specialization symbols.
Attribute labels shown: Emp#, Gender, Address, Name, Phone#, Ssno, Salary, Speciality,
Con_type, Con_years, Grade, Yrs_experience.]
19. Develop valid deletion rules and incorporate the associated deletion constraints for the
EERD in diagram 3 of Exercise 15. Then explain the meaning of the incorporated deletion
constraints, clarifying the absence of errors in your specifications.
CHAPTER 5
MODELING COMPLEX
RELATIONSHIPS
[Figure 5.1 diagram: INSTRUCTOR (Name, Rank, Qualification) and COURSE (Course#, Cname,
Credits) related by Can_teach (m:n); QUARTER (Quarter, From_dt, To_dt) related to the
other two entity types by Offered_during and Teaches_during.]
FIGURE 5.1 An initial ERD and sample data sets for vignette 1
teaching that course if it is offered during the quarter when he or she teaches, then both
Pezman and Fite would be scheduled to teach EE812 in both Fall and Spring. This result
can, in fact, be derived from the three binary relationship types Teaches, Teaches_during,
and Offered_during that are shown in Figure 5.2. Absent such a business rule, it is
possible, as stated in the story line, for Pezman to teach EE812 in the Spring quarter
and for Fite to teach EE812 in the Fall quarter; this can never be derived from the three
binary relationship types Teaches, Teaches_during, and Offered_during that are shown
in Figure 5.2.
[Figure 5.2 diagram: the binary relationship types Teaches (m:n), Teaches_during, and
Offered_during, with QUARTER (Quarter, From_dt, To_dt).]
FIGURE 5.2 A second ERD and sample data sets for vignette 1
What is the solution to this problem? When can one unequivocally infer who teaches
what and when? It is possible to capture this condition precisely via a ternary relationship
type among INSTRUCTOR, COURSE, and QUARTER. This relationship type is shown as
Schedule in Figure 5.3, with a supporting sample data set.
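[Figure 5.3 diagram: the ternary relationship type Schedule among INSTRUCTOR (Name, Rank,
Qualification), COURSE (Course#, Cname, Credits), and QUARTER (Quarter, From_dt, To_dt),
drawn with (min, max) structural constraints, together with the binary relationship types
Can_teach, Teaches_during, and Offered_during.]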
Note: A ternary relationship type is of degree three and is signified by three edges emanating from the relationship
diamond to the participating entity types.
[Pezman teaches EE812 in Spring; and Fite teaches EE812 in Fall] cannot be inferred from the
datasets above representing binary relationships among the three entity types taken two at a time;
the specified fact is unequivocally captured in the dataset below, the ternary
relationship among the three entity types, Schedule.
Schedule
Name Course# Quarter
Pezman EE812 Spring
Pezman EE832 Winter
Pezman EE330 Fall
Pezman EE330 Spring
Fite EE812 Fall
Fite EE430 Fall
Fite EE430 Spring
Hall EE821 Winter
Hall EE430 Spring
Hall EE430 Summer
FIGURE 5.3 The ternary relationship type Schedule and associated sample data set
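
Why the three binary relationship types cannot stand in for the ternary one can be checked directly on the Schedule data set above. The sketch below makes one assumption that is not in the text: the three binary data sets are taken to be the pairwise projections of Schedule, which is the most information binary relationships could carry about it. The helper names project and join_binaries are hypothetical.

```python
# Rows of the Schedule data set of Figure 5.3: (Name, Course#, Quarter).
SCHEDULE = {
    ("Pezman", "EE812", "Spring"), ("Pezman", "EE832", "Winter"),
    ("Pezman", "EE330", "Fall"), ("Pezman", "EE330", "Spring"),
    ("Fite", "EE812", "Fall"), ("Fite", "EE430", "Fall"),
    ("Fite", "EE430", "Spring"), ("Hall", "EE821", "Winter"),
    ("Hall", "EE430", "Spring"), ("Hall", "EE430", "Summer"),
}


def project(rows, i, j):
    """Keep only columns i and j of each row (a binary relationship)."""
    return {(row[i], row[j]) for row in rows}


def join_binaries(teaches, teaches_during, offered_during):
    """Recombine the three binary relationships into candidate triples."""
    return {
        (name, course, quarter)
        for (name, course) in teaches
        for (name2, quarter) in teaches_during
        if name == name2 and (course, quarter) in offered_during
    }


teaches = project(SCHEDULE, 0, 1)          # instructor, course
teaches_during = project(SCHEDULE, 0, 2)   # instructor, quarter
offered_during = project(SCHEDULE, 1, 2)   # course, quarter

reconstructed = join_binaries(teaches, teaches_during, offered_during)
spurious = reconstructed - SCHEDULE
print(sorted(spurious))
```

The recombined set contains every genuine Schedule row plus triples that were never scheduled, so the binary data sets alone cannot tell who teaches EE812 in which quarter.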
Note that ER modeling grammars that use the “look across” notation (Chen’s notation
employed in the Presentation Layer ERD) to express a cardinality ratio cannot accurately
capture the cardinality ratio of any relationship type beyond degree two. In a binary rela-
tionship type, there is only one entity type present when looking across the relationship
type. For instance, in Figure 5.1, it is possible to state unambiguously that one instructor
entity is related to a maximum of n course entities. However, looking across from the
INSTRUCTOR in a ternary relationship type Schedule (Figure 5.3), we see both COURSE
and QUARTER, and the maximum cardinality should actually reflect the maximum
combination of courses and quarters per instructor. The (min, max) notation, which uses
the “look near” approach to express the structural constraints of a relationship type, is
able to express the maximum cardinality precisely in a relationship type of any degree.
For this reason the (min, max) notation is used here to specify the structural constraints
of relationship types beyond degree two; in the interest of consistency, the same notation
will be used for recursive and binary relationship types henceforth.
In the ERD that appears in Figure 5.3, the (0, n) on the edge labeled “Teaches_by”
indicates that an instructor need not teach any course in any quarter (0 for the min value)
and that an instructor may teach up to n {course, quarter} pairs (n for the max value). For
example, using the data sets in Figure 5.3, while Stansbury can teach EE430 (see the
Can_teach data set), the data set Schedule indicates (by his absence) that he is not
scheduled to teach a course during any of the four quarters. Likewise, Cords can teach
EE435 but is not scheduled to teach a course during any of the four quarters either. Hall,
on the other hand, who can teach three different courses, is scheduled to teach two of
them. Looking at the (1, p) on the “Included_in” edge associated with the ternary rela-
tionship type Schedule reveals that a quarter must be related to at least one {instructor,
course} pair (1 for the min value) and that a quarter may be related to up to p {instructor,
course} pairs (p for the max value). The data set Schedule in Figure 5.3 indicates that
there is at least one {instructor, course} pair related to each quarter and that each quarter
has one or more instructors teaching one or more courses.
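
The “look near” reading of (min, max) can be illustrated by counting, for each entity, the Schedule rows in which it participates. This is a sketch only: the instructor list is assembled from the names mentioned in the text, the quarter names are assumed to be the four shown in the data set, and the function name observed_min_max is hypothetical. The counts describe this sample, whereas the structural constraints in the ERD are design decisions.

```python
from collections import Counter

# Rows of the Schedule data set of Figure 5.3: (Name, Course#, Quarter).
SCHEDULE = [
    ("Pezman", "EE812", "Spring"), ("Pezman", "EE832", "Winter"),
    ("Pezman", "EE330", "Fall"), ("Pezman", "EE330", "Spring"),
    ("Fite", "EE812", "Fall"), ("Fite", "EE430", "Fall"),
    ("Fite", "EE430", "Spring"), ("Hall", "EE821", "Winter"),
    ("Hall", "EE430", "Spring"), ("Hall", "EE430", "Summer"),
]
# Instructors named in the text; Stansbury and Cords are not scheduled.
INSTRUCTORS = ["Pezman", "Fite", "Hall", "Stansbury", "Cords"]
QUARTERS = ["Fall", "Winter", "Spring", "Summer"]


def observed_min_max(rows, position, population):
    """Smallest and largest number of rows any one entity takes part in."""
    counts = Counter(row[position] for row in rows)
    per_entity = [counts[entity] for entity in population]  # 0 if absent
    return min(per_entity), max(per_entity)


# {course, quarter} pairs per instructor, then {instructor, course} pairs
# per quarter.
instructor_range = observed_min_max(SCHEDULE, 0, INSTRUCTORS)
quarter_range = observed_min_max(SCHEDULE, 2, QUARTERS)
print(instructor_range, quarter_range)
```

The observed minimum of zero for instructors and of one for quarters is consistent with the (0, n) and (1, p) constraints discussed above.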
One can also argue that Schedule can be conceptualized as a base entity type and, in
effect, preempt the need for formulating a ternary relationship type. The ERD in Figure 5.4
models this viewpoint. This is a valid argument since SCHEDULE in Figure 5.4 lends itself
to conceptualization as a base entity type with its own unique identifier (Call#).
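[Figure 5.4 diagram: SCHEDULE as a base entity type related to QUARTER through
Included_in, with the structural constraints (1,1) and (1,p). Attribute labels shown:
Credits, Name, Call#, Room, Qualification, No_of_students, Course#, Quarter, From_dt,
To_dt.]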
Note: In this conceptualization it is possible to show that a course can be scheduled for a quarter without
any instructor assignment (see the bolded 0 in the Teaches_by relationship connector to SCHEDULE). It is not
possible to do the same in the design shown in Figure 5.3.
FIGURE 5.4 Modeling the ternary relationship type Schedule of Figure 5.3 as a base entity type
Let us now examine another scenario to illustrate the utility of a ternary relationship.
[Diagram: entity types PHARMACY, PATIENT, MEDICATION, and PHYSICIAN with the relationship
types Dispenses and Prescribes. Attribute labels shown: Pharm#, Location, Working_hrs
(Day, Time (From, To)), Pat_name, Occupation, Insurance, Age, Med_code, Med_name, Price,
List_price, Expiration_dt, Dosage, Frequency, Speciality, Experience.]
[Diagram: entity types PHARMACY, PATIENT, and MEDICATION with the relationship types
Dispenses and Writes. Attribute labels shown: Pharm#, Location, Working_hrs (Day, Time
(From, To)), Occupation, Med_code, Dosage, Frequency.]
FIGURE 5.8 The ternary relationship type Advising plus two binary relationship types
Trained_in and Enrolls
FIGURE 5.9 Changing the maximum cardinality of the “Advisee” edge in Figure 5.8a from
p to 1
Rule V3R1 requires that whereas a student may have multiple advisors and also mul-
tiple majors, the student is not permitted to have the same advisor for more than one
major. This essentially means that a {student, advisor} pair must be related to no more
than one major while a major may be related to many such pairs. This constraint is
reflected in the revised Advising data set in Table 5.1.
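
Rule V3R1 amounts to a check that no {student, advisor} pair appears with more than one major. The sketch below is illustrative: the rows are hypothetical and are not the contents of Table 5.1, and the function name v3r1_violations is invented for this example.

```python
def v3r1_violations(advising):
    """Return the {student, advisor} pairs related to more than one major.

    advising: iterable of (student, advisor, major) rows.
    """
    first_major = {}
    violations = set()
    for student, advisor, major in advising:
        pair = (student, advisor)
        # Remember the first major seen for the pair; any other major
        # for the same pair breaks Rule V3R1.
        if first_major.setdefault(pair, major) != major:
            violations.add(pair)
    return violations


# Hypothetical rows: one student with two advisors and two majors is fine
# as long as each advisor covers a single major.
valid_rows = [
    ("Ann", "Dr. Lee", "Math"),
    ("Ann", "Dr. Roy", "Physics"),
    ("Bob", "Dr. Lee", "Math"),
]
invalid_rows = valid_rows + [("Ann", "Dr. Lee", "Physics")]
print(v3r1_violations(valid_rows), v3r1_violations(invalid_rows))
```

Note that the major Math is related to two different pairs in the valid rows, which the rule permits.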
As previously discussed, Rule V3R1 cannot be incorporated in the ERD by changing the
structural constraints of the ternary relationship type Advising in Figure 5.8. The implica-
tion of this rule is that one {student, advisor} pair can be related to only one major. This
does not prohibit the presence of several {student, advisor} pairs, nor does it preclude a
major being related to several {student, advisor} pairs. Rule V3R1 can then be interpreted as
a relationship between MAJOR and an emergent entity type—say, COUNSELING—that
results from the cluster {ADVISOR—Counseling—STUDENT}, as shown in Figure 5.10.
FIGURE 5.10 The relationship type In_charge_of with the cluster entity type
COUNSELING
COUNSELING here is called a cluster entity type,¹ indicated in the ERD by the dotted rectangle.
Note that each cluster entity “COUNSELING” represents an inter-related {Student,
Advisor} pair. Rule V3R1 is implemented in the ERD by replacing the ternary relationship
type Advising in Figure 5.8 with the cluster entity type COUNSELING and its relationship
with MAJOR (Figure 5.10). The maximum cardinality of 1 on the edge labeled
“Takes_charge” is also required to specify the constraint implied by the Rule V3R1.
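Rule V3R1 can also be checked directly against a set of Advising instances. The following Python sketch is illustrative only: the sample triples and the function name `violations_v3r1` are assumptions, not data taken from Table 5.1. It groups the instances by {student, advisor} pair (the role played by the cluster entity type COUNSELING) and reports any pair that is related to more than one major.

```python
from collections import defaultdict

def violations_v3r1(advising):
    """Return {(student, advisor): majors} for pairs tied to more than one major."""
    majors_by_pair = defaultdict(set)
    for student, advisor, major in advising:
        # Each distinct (student, advisor) pair stands for one COUNSELING entity.
        majors_by_pair[(student, advisor)].add(major)
    return {pair: majors for pair, majors in majors_by_pair.items() if len(majors) > 1}

# Hypothetical Advising instances: (student, advisor, major)
advising = [
    ("Ann", "Dr. Lee", "Finance"),
    ("Ann", "Dr. Kim", "Marketing"),  # allowed: a different advisor for a second major
    ("Bob", "Dr. Lee", "Finance"),    # allowed: a major may relate to many pairs
    ("Ann", "Dr. Lee", "Marketing"),  # violates V3R1: same advisor for a second major
]

bad = violations_v3r1(advising)
print(bad)
```

Only the last instance is flagged; the first three show that several {student, advisor} pairs and several pairs per major remain permissible.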
The other two relationship types (Trained_in and Enrolls) shown in Figure 5.8 also
persist in the ERD. They are not shown in Figure 5.10 in order to enhance the clarity of
expression of modeling this particular business rule.
Finally, consider the addition of one more business rule (V3R2) to vignette 3: In order
to minimize advising snafus, the college mandates that no more than one advisor can
advise the same student for the same major.
The Advising data set in Table 5.2 reflects Rule V3R2. This rule can be interpreted as
one advisor per {student, major} pair triggering the emergence of another cluster entity
type ENROLLMENT as the product of the relationship type Enrolls between STUDENT
and MAJOR. ENROLLMENT then is related to ADVISOR.
1
“Aggregate entity type” is also a popular term for this. Sometimes, this is referred to as a composite
(molecular) object. However, in order to distinguish this from an aggregate arising from an aggregation
construct (see Section 4.1.6), the term “cluster entity type” is used. Notice that this entity type
is essentially virtual: unlike a base or weak entity type, it does not have a real existence.
Figure 5.11 portrays the cluster entity type ENROLLMENT and its relationship type
Assignment with ADVISOR. Once again, note that the maximum cardinality on the edge
labeled “Assigned_to” indicates a 1 in order to enforce the constraint of no more than one
advisor per {student, major} pair in the ERD. Figure 5.12 is a consolidated view of the two
cluster entity types ENROLLMENT and COUNSELING, along with the rest of the scenario
from the beginning of the episode in vignette 1 and the embellishment added in vignette 3.
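As a rough illustration of how Rules V3R1 and V3R2 jointly restrict Advising instances, the sketch below uses Python's built-in `sqlite3` module. The single table `advising`, its column names, and the sample rows are assumptions made for this example; the mapping of an ERD to a logical schema is a separate topic and is not implied here. Each rule is expressed as a uniqueness constraint on the pair of entities that forms the corresponding cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE advising (
        student TEXT NOT NULL,
        advisor TEXT NOT NULL,
        major   TEXT NOT NULL,
        UNIQUE (student, advisor),  -- V3R1: a {student, advisor} pair has at most one major
        UNIQUE (student, major)     -- V3R2: a {student, major} pair has at most one advisor
    )
""")

def try_insert(student, advisor, major):
    """Insert one Advising instance; return False if either rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO advising VALUES (?, ?, ?)",
                         (student, advisor, major))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("Ann", "Dr. Lee", "Finance"),     # accepted
    try_insert("Ann", "Dr. Kim", "Marketing"),   # accepted: new advisor, new major
    try_insert("Ann", "Dr. Lee", "Accounting"),  # rejected: breaks V3R1
    try_insert("Ann", "Dr. Roy", "Finance"),     # rejected: breaks V3R2
    try_insert("Bob", "Dr. Lee", "Finance"),     # accepted: a different student
]
print(results)
```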
FIGURE 5.11 The relationship type Assignment with the cluster entity type
ENROLLMENT
2
While specification of attributes for a relationship type in ER modeling grammar is an accepted
practice, designation of a multi-valued attribute for a relationship type is somewhat uncommon. An
alternate solution essentially resulting from the decomposition of this multi-valued attribute of a
relationship type is presented in Section 5.5.2.
FIGURE 5.14 An example of the cluster entity type SECTION emerging as a product of a
quaternary relationship type
similar to Figure 5.15, with Location and Time_slot becoming part of the multi-valued
attribute of the Offering relationship type, as long as the entity types CLASS_ROOM
and TIME_SLOT need not be represented as entity types. This design appears in
Figure 5.17.
FIGURE 5.17 Reducing Offering to a ternary relationship type with a multi-valued attribute
[ERD not reproduced. Labels in the figure: cluster entity type SECTION; relationship type Offering; entity types INSTRUCTOR, COURSE, and QUARTER; attribute labels From, To, Room#, Bldg#, Course#, Name, Year, Qtr_name, Quarter, Qtr_prefix.]
Observe that the cluster entity type SECTION (ERD in Figure 5.16 or 5.17) in its
current form implies that “team teaching” of a course section is possible—that is, a
particular course in a certain time slot in a given room during a specific quarter can be
offered by more than one instructor. Observe that the current design also permits other
semantically impractical states; for example, an instructor (or another instructor) may
teach a different course in the same room at the same time slot during a specific quarter.
Nonetheless, let us presently focus on the “team teaching” issue because the purpose of
the design in Figure 5.16 is limited to exemplifying the possibility of a relationship type of
degree five.
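The two states described above can be made concrete with a small Python sketch. The five-part tuples, the sample values, and the helper `multi_valued` are assumptions chosen for illustration; they treat each instance of a degree-five Offering relationship as (instructor, course, quarter, room, time slot). The first query finds team teaching, and the second finds different courses sharing a room and time slot in the same quarter.

```python
from collections import defaultdict

# Hypothetical Offering instances: (instructor, course, quarter, room, time_slot)
offering = [
    ("Rao",   "IS301", "2015-Fall", "LH-101", "MWF 9-10"),
    ("Mills", "IS301", "2015-Fall", "LH-101", "MWF 9-10"),  # team teaching
    ("Mills", "IS450", "2015-Fall", "LH-101", "MWF 9-10"),  # another course, same room and slot
]

def multi_valued(instances, key, value):
    """Map each key to its set of values, keeping only keys with more than one value."""
    groups = defaultdict(set)
    for inst in instances:
        groups[key(inst)].add(value(inst))
    return {k: v for k, v in groups.items() if len(v) > 1}

# Same course, quarter, room, and slot offered by more than one instructor.
team_taught = multi_valued(offering, key=lambda i: i[1:], value=lambda i: i[0])

# Same quarter, room, and slot hosting more than one course.
room_clashes = multi_valued(offering, key=lambda i: i[2:], value=lambda i: i[1])

print(team_taught)
print(room_clashes)
```

Nothing in the structure of the tuples rules out either result, which is the point the paragraph makes about the current design.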
3
The ER modeling grammar does allow a weak relationship type to be involved in a relationship with
another relationship type. A weak relationship type is discussed in Section 5.3.
Interestingly, the design in Figure 5.19 does not prevent sections of two or more dif-
ferent courses from being taught in the same classroom at the same time during a quarter.
One can argue that this is intended to permit cross-listing of multiple courses. Suppose
cross-listing of courses is not permitted. How should the ERD in Figure 5.19 be altered to
handle such a business rule? This is left as an exercise for the reader.
Now, let us embellish the Madeira College story further by stating a couple of addi-
tional business rules: (1) While a course section need not use a textbook, it is also possi-
ble that a course section may sometimes use more than one textbook and that a textbook
may be used in multiple course sections (V4R3). (2) Likewise, a student must enroll in
one or more course sections and a course section must have more than one student
(V4R4).
Does the ERD shown in Figure 5.20 (a relationship type of degree five) accurately
describe this scenario? Not unless it is okay for a course section in a quarter, when held
in one or more classrooms, to use one or more textbooks for one or more students and use
a different textbook for another student or students, and also okay for a course section in
a quarter to use a different textbook (or textbooks) for the same student(s) in a different
classroom (or classrooms). Clearly, while the ERD is syntactically correct, it does not
reflect the semantics conveyed by the business rules stated above. SECTION, to begin
with, should be a cluster entity type akin to the design shown in Figure 5.18b. Then, it is
possible to establish an m:n relationship type between SECTION and TEXTBOOK as well
as between SECTION and STUDENT. The ERD in Figure 5.21 is a correct rendition of this
scenario in which a student enrolls in a course section and uses the same textbook(s) that
all other students in that course section are using, and the classroom location of the
course section does not change with the student(s) or textbook(s).
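The contrast between the two designs can be sketched in Python. All names and values below are hypothetical: a section key stands in for the cluster entity type SECTION, and the participants assumed for the degree-five relationship type (course, quarter, classroom, textbook, student) are inferred from the discussion above rather than read from the figure.

```python
# SECTION treated as a single unit, identified here by a made-up section key.
section_room = {"IS301-F15-01": "LH-101"}                    # one location per section
uses = {("IS301-F15-01", "DB Design"),                       # SECTION m:n TEXTBOOK
        ("IS301-F15-01", "SQL Primer")}
enrolls = {("IS301-F15-01", "Ann"), ("IS301-F15-01", "Bob")}  # SECTION m:n STUDENT

def textbooks_for(student):
    """Textbooks a student uses, derived from the sections the student is enrolled in."""
    sections = {sec for sec, stu in enrolls if stu == student}
    return {book for sec, book in uses if sec in sections}

# Instances of a degree-five relationship type can disagree within one course section.
quinary = [
    ("IS301", "F15", "LH-101", "DB Design",  "Ann"),
    ("IS301", "F15", "LH-202", "SQL Primer", "Bob"),  # different room and book for Bob
]

def books_per_student(rows):
    """Collect the textbooks recorded against each student in the degree-five design."""
    result = {}
    for _course, _quarter, _room, book, student in rows:
        result.setdefault(student, set()).add(book)
    return result

print(textbooks_for("Ann") == textbooks_for("Bob"))
print(books_per_student(quinary))
```

With SECTION as the unit, every enrolled student necessarily sees the same textbooks and the same room; the degree-five instances carry no such guarantee.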
The final ERD reflecting all the business rules stated in the rest of the story for
Madeira College appears in Figure 5.22.
FIGURE 5.25a Syntactic error in the representation of the Linked_with relationship type
Note: ER modeling grammar does not permit an edge connecting two relationship types.
FIGURE 5.25b Resolution of the syntactic error in Figure 5.25a using a cluster entity type CONTRACT
Here is another real-world scenario similar to the one described in the previous
example: A flight often connects to several other flights at an airline’s hub. Thus, a flight’s
arrival information and the connecting flight’s departure information are crucial to the
airline and its passengers. Figure 5.26a depicts this scenario.
A technical examination of the ERD yields the following facts captured in the ERD:
• An airport need not be a place for any flight connections; then again, as
many as “n” flight connections can happen in an airport.
• A flight connection happens in exactly one airport—no more, no less, and not
in thin air, either.
• A flight connection is defined as a flight (say, Flight A) connecting to another
flight (say, Flight B) on a given day; accordingly, Flights A and B may connect
more than once a week, and for each of these connections, the Flight_gate,
the Connection_gate, and the Connection_time can be the same or different.
A closer scrutiny reveals that the current model will not prevent a connection between
Flights A and B occurring at different airports on different days. Since the business rule
forming the basis for this ERD is not available, one may certainly question the intent
behind this ERD. It is perhaps more practical to assume that Flights A and B connect at
the same airport on all days they connect. If that is true, then this ERD is in error.
Figure 5.26b offers a solution to this problem. The structural constraints (cardinality
constraint and participation constraint) of the binary relationship between AIRPORT and
the cluster entity type FLIGHT_CONNECTION remain the same. In fact, the only differ-
ence between the ERDs in Figures 5.26a and 5.26b is that the multi-valued, composite
attribute Connection in Figure 5.26b is on the relationship Happens instead of on the rela-
tionship Connect. Now, a flight connection is between two flights (Flight A and Flight B,
independent of the day of flight), and a specific flight connection (say, between Flights A
and B) happens not only at just one airport but at the same airport, irrespective of the day
of flight. The fact that Flights A and B connect more than once a week and that, for each
of these connections, the Flight_gate, the Connection_gate, and the Connection_time can be
the same or different, is independent of the fact that the two flights connect in not just one
but the same airport. The difference between the two ERDs will get further clarified after
the final decomposition of the two models later in this chapter (Section 5.5).
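One way to see the difference between the two ERDs is to look at what identifies a flight connection in each. The Python sketch below is an interpretation under stated assumptions: the flight numbers, airport codes, gates, and times are invented, and plain dictionaries stand in for the relationship Happens. In the first structure a connection is identified by the flight pair and the day; in the second it is identified by the flight pair alone, with the per-day details held as a multi-valued, composite attribute.

```python
# Figure 5.26a reading: a connection is identified by (flight, connecting_flight, day).
happens_a = {
    ("F100", "F200", "Mon"): "AAA",
    ("F100", "F200", "Thu"): "BBB",  # nothing prevents a different airport on another day
}

# Figure 5.26b reading: a connection is identified by (flight, connecting_flight) alone;
# the per-day details sit on Happens as the multi-valued, composite attribute Connection.
happens_b = {
    ("F100", "F200"): {
        "airport": "AAA",
        "connection": [
            {"day": "Mon", "flight_gate": "B4", "connect_gate": "C9", "connect_time": "14:05"},
            {"day": "Thu", "flight_gate": "B7", "connect_gate": "C9", "connect_time": "14:20"},
        ],
    },
}

def airports_per_pair(happens):
    """Collect the airports at which each flight pair connects under the 5.26a reading."""
    pairs = {}
    for (flight, connecting, _day), airport in happens.items():
        pairs.setdefault((flight, connecting), set()).add(airport)
    return pairs

print(airports_per_pair(happens_a))
print(happens_b[("F100", "F200")]["airport"])
```

Under the second structure a flight pair has exactly one airport by construction, while gates and times may still vary from day to day.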
[ERD not reproduced. Labels in the figure: entity types FLIGHT (Flight#, Origin, Destination) and AIRPORT (Ap_code, Ap_name, City); relationship types Connect and Happens; cluster entity type FLIGHT_CONNECTION; multi-valued, composite attribute Connection (Day, Flight_gate, Connect_gate, Connect_time).]
So far, we have seen several real-world scenarios that require relationship types
beyond the conventional binary relationship types. The presentation is by no means
exhaustive because other innovative ways of combining the various modeling constructs of
the ER/EER modeling grammar are possible. An understanding of the domain of the busi-
ness problem normally leads to the emergence of such uses. The objective of this section
has been to sensitize data modelers and database designers to the rich modeling opportu-
nities available at their disposal via the ER modeling grammar.
FIGURE 5.27 The Works_in and Managed_by relationship types from Figure 3.13
all Bearcat employees work at the plants. The new business rule essentially suggests a
precedence relationship between Works_in and Managed_by. In other words, for a
Managed_by relationship instance between an employee and a plant to exist, a corre-
sponding Works_in instance between the same two entities must be present. In 1999,
Debabrata Dey, Veda Storey, and Terry Barron published an article in the journal ACM
Transactions on Database Systems in which they introduced a new ER modeling construct
called the weak relationship type, which indicates an inter-relationship integrity constraint.
The symbol used to denote a weak relationship type is the same as that of the identifying
relationship type (a double-diamond symbol; see Figure 3.2), as shown in Figure 5.28.⁴ A
solid arrow from a regular relationship type to the weak relationship type indicates an
inter-relationship integrity constraint, implying that the latter relationship set is included in
(i.e., is a subset of) the former relationship set. That is, in order for an instance of the
Managed_by relationship type to occur, an instance of the Works_in relationship type between
the same entity pair must be present; essentially, a manager of a plant must work in the
same plant. This constraint specification is referred to as inclusion dependency.⁵ Dey,
Storey, and Barron define a weak relationship type as “... a relationship, the existence of
whose instances depends on the [presence of] instances of (one or more) other
relationships.”⁶ The inclusion dependency is shown as Managed_by ⊆ Works_in.
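Because an inclusion dependency is a subset condition on relationship sets, it can be tested with ordinary set operations. The following Python sketch assumes that each relationship instance is an (employee, plant) pair; the identifiers and the function name `satisfies_inclusion` are made up for the example.

```python
# Hypothetical relationship instances as (employee, plant) pairs.
works_in = {("E1", "P10"), ("E2", "P10"), ("E3", "P20")}
managed_by = {("E1", "P10"), ("E3", "P20")}

def satisfies_inclusion(weak, regular):
    """True when every instance of the weak relationship also appears in the regular one."""
    return weak <= regular

ok = satisfies_inclusion(managed_by, works_in)

# E2 works in P10, not P20, so naming E2 the manager of P20 breaks the dependency.
proposed = {("E1", "P10"), ("E2", "P20")}
broken = satisfies_inclusion(proposed, works_in)
offending = proposed - works_in  # instances of the weak relationship with no counterpart

print(ok, broken, offending)
```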
4
This does not cause any conflict in the ER modeling grammar because an identifying relationship
type can exist only between a weak entity type and its identifying parent entity type(s), while a
weak relationship type can relate only to a regular relationship type. Furthermore, interpretation in
context will clarify if a double diamond is an identifying relationship type or a weak relationship
type or both.
5
Inclusion dependency is different from the inclusive arc construct of the ER modeling grammar in
that it conveys directionality through a subset relationship, whereas the inclusive arc conveys the
idea of mutuality in inclusiveness. Furthermore, an inclusive arc pertains to the participation of
an entity type in any two relationship types, whereas inclusion dependency is about the inter-
relationship between two relationship types among the same two entity types. Inclusion dependency
when mapping a conceptual schema to a logical schema is discussed in Chapter 6.
6
Dey, D., V. C. Storey, and T. M. Barron. “Improving Database Design through the Analysis of
Relationships.” ACM Transactions on Database Systems, 24, 4 (December) 453–486, 1999.
A weak relationship type arises in many real-world situations when two relationship
types are linked by (1) a condition precedence sequence (based on meeting a condition)
or (2) an event precedence sequence (based on the occurrence of an event). The weak
relationship type Managed_by in Figure 5.28 is a condition-precedent weak relationship
type because the condition that one has to be an employee of the plant in order to be a
manager of that plant semantically precedes that employee being the manager of that
plant. Another opportunity for a condition-precedent weak relationship can be seen in the
scenario reflected in Figure 5.2. An appropriate excerpt from Figure 5.2 is presented in
Figure 5.29. Consider a business rule: In order for an instructor to teach a course, he or
she must be capable of teaching that course. This refinement of the story is incorporated
in the ERD in Figure 5.29. Teaches here is a condition-precedent weak relationship type
that is inclusion-dependent on Can_teach, expressed as Teaches ⊆ Can_teach. This is
captured in the ERD by the solid arrow drawn from Can_teach to Teaches.
Figure 5.30 is a subset of the ERD for the story narrated in vignette 2 about Get
Well Pharmacists, Inc. (see Section 5.1.2 and Figure 5.6) where a new constraint is
imposed via this business rule: A medication must be stocked by a pharmacy before it
[Figure 5.30, ERD not reproduced. Labels in the figure: entity types PHARMACY, PATIENT, and MEDICATION; relationship types Stocks and Dispenses.]
Note: It is also possible to view this as a condition-precedent weak relationship based on the storyline.
Suppose a rental agency rents an array of vehicles (e.g., cars, trucks, vans, boats). A plau-
sible business rule in this context is: Before the event “return of a vehicle by a customer”
happens, the event “rental of that specific vehicle by that particular customer” should tran-
spire. Figure 5.31 captures this requirement as an event-precedent weak relationship type.
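An event-precedent constraint of this kind can be enforced at the moment the dependent event is recorded. The class below is a minimal Python sketch, not a rendering of Figure 5.31: the class name `RentalLog`, the set names `rents` and `returns`, and the identifiers are all assumptions. A return is accepted only when the matching rental by the same customer for the same vehicle is already on record.

```python
class RentalLog:
    """Records rental and return events as (customer, vehicle) pairs."""

    def __init__(self):
        self.rents = set()
        self.returns = set()

    def rent(self, customer, vehicle):
        self.rents.add((customer, vehicle))

    def return_vehicle(self, customer, vehicle):
        # Event precedence: the rental must have transpired before the return.
        if (customer, vehicle) not in self.rents:
            raise ValueError(f"{customer} has no rental of {vehicle} on record")
        self.returns.add((customer, vehicle))

log = RentalLog()
log.rent("C1", "V9")
log.return_vehicle("C1", "V9")      # accepted: the rental precedes the return

try:
    log.return_vehicle("C2", "V9")  # rejected: C2 never rented V9
except ValueError as err:
    print(err)
```

Guarding the insertion this way keeps the set of returns a subset of the set of rentals at all times.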
7
What is important is that this indicates a weak relationship type. Interpretation of whether it is
event-precedent or condition-precedent has only amusement value.
While the weak relationship type can be used to indicate inclusion dependency in
terms of a unidirectional subset relationship between two relationship types, it can also be
used to model scenarios of bidirectional (mutual) exclusion between two relationship
types. Referred to as exclusion dependency, this construct in the ER modeling grammar
may appear to be equivalent to the ER modeling construct “exclusive arc” defined in
Chapter 3 (see Figure 3.2), but it is not. Thus, these two constructs are not mutually
substitutable. As an example, consider an excerpt from a larger ERD, displayed in
Figure 5.32, in which a technician performs and/or supervises a maintenance activity on
an aircraft. Figure 5.32a models some of the technicians to be supervisors in addition to
being technicians. Only the supervisors supervise the maintenance activities. And, being a
technician, a supervisor can also engage in performing a maintenance activity. In fact, the
modeling here permits a technician to also engage in performing the maintenance activity
s/he supervises.
[Figure 5.32, panels (a) and (b), ERD not reproduced. Labels in the figure: entity types EMPLOYEE, TECHNICIAN, MAINTENANCE_ACTIVITY, and AIRCRAFT; relationship types Performed_by, Supervised_by, and Goes_for.]
FIGURE 5.33 Modeling exclusion dependency between two relationships among the same
two entity types
8
The symbol , referred to as a “composite” in A (B C), implies a projection from the natural
join of B and C that is union-compatible with A.
Figure 5.36 depicts this scenario as an ERD. The relevant entity types and their
attributes are arbitrarily assigned.
Following Dey, Storey, and Barron (1999), this story line can be expressed as an exclusion
dependency between (Writes Referees) and Conflict_of_interest, as shown in
Figure 5.37. Here, the composite (Writes Referees) represents the reviewer who referees
an author’s paper, and Conflict_of_interest captures the reviewer who has a conflict of
interest with the author. That is, Conflict_of_interest and (Writes Referees) are mutually
exclusive. In other words, if Conflict_of_interest exists, then the composite (Writes
Referees) cannot exist and thus the reviewer cannot be assigned to referee that author’s paper.
On the other hand, if the composite (Writes Referees) exists, then the Conflict_of_interest
relationship cannot exist. Thus, the exclusion is bidirectional. Notice that all three
relationships (Writes, Referees, and Conflict_of_interest) are modeled as weak relationship types
since, unlike in inclusion dependency, there is no directionality in expressing an exclusion
dependency. In other words, the order of the relationship types is immaterial. However, a
relationship instance cannot be part of both the composite (weak) relationship type and
the other weak relationship type at the same time. The exclusion dependency itself is
indicated by a dotted line with no directional pointer connecting the weak relationship
type Conflict_of_interest and the composite (weak) relationship type (Writes Referees).
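The exclusion dependency can be sketched in Python by building the composite explicitly and intersecting it with Conflict_of_interest. The sample names are invented, and the orientation of each pair, (author, paper) for Writes, (reviewer, paper) for Referees, and (reviewer, author) for Conflict_of_interest, is an assumption made so that the composite and Conflict_of_interest are comparable. The composite is computed as a join of Writes and Referees on the paper, projected onto the reviewer and the author.

```python
# Hypothetical relationship instances.
writes = {("Ava", "P1"), ("Ben", "P2")}                 # (author, paper)
referees = {("Cy", "P1"), ("Dee", "P2")}                # (reviewer, paper)
conflict_of_interest = {("Cy", "Ben"), ("Dee", "Ben")}  # (reviewer, author)

def composite(writes, referees):
    """Join Writes and Referees on paper, then project onto (reviewer, author)."""
    return {
        (reviewer, author)
        for author, paper_w in writes
        for reviewer, paper_r in referees
        if paper_w == paper_r
    }

def exclusion_violations(writes, referees, conflicts):
    """Pairs that appear in both the composite and Conflict_of_interest."""
    return composite(writes, referees) & conflicts

pairs = composite(writes, referees)
violations = exclusion_violations(writes, referees, conflict_of_interest)
print(pairs)
print(violations)
```

An empty intersection means the dependency holds; here Dee referees a paper written by Ben despite a recorded conflict, so that pair is reported. Because the check is an intersection, it is symmetric in its two operands, mirroring the bidirectional nature of the exclusion.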
FIGURE 5.39 Decomposition of the ternary relationship type Uses to the gerund entity type
USE
The transformation process is the same for quaternary (degree four), quintary (degree
five), and higher-order relationship types. The reader may, as an exercise, map the qua-
ternary relationship Performs (Figure 5.13) to a gerund entity type PERFORMS.
[Figure 5.40, ERD not reproduced. Panel titles:]
(a) Contract as a binary relationship type with a multi-valued attribute Contract
(b) Decomposition of the multi-valued attribute Contract in Figure 5.40a to a weak entity type CONTRACT
(c) Decomposition of the m:n cardinality constraint in the identifying relationship type Executes in Figure 5.40b
conveyed by them are the same as in Figures 5.41a and 5.40a, respectively.⁹ In fact, the
degree of this relationship in the decomposition is not three. In order for this to qualify as
a ternary relationship, Executes cannot be an identifying relationship type.
9
In Chapter 6, we will see how the mapped logical schema conveys the semantics of the conceptual
schema correctly.
[Figure 5.41: ER diagrams not reproduced; panel captions follow]
(a) Contract as a binary relationship type with a multi-valued attribute Contract
(b) Decomposition of the multi-valued attribute Contract in Figure 5.41a to a weak entity type CONTRACT
(c) Decomposition of the m:n cardinality constraint in the identifying relationship type Executes in Figure 5.41b
Let us now tweak the story line to refine the business rule about the retainer: The retainer
is independent of the contract. That is, a company pays a retainer (a fixed amount) to a
consultant for availability at short notice with or without a contract; this amount may change, but
not as a part of the contract provisions. Moving the attribute Retainer out of the composite
attribute Contract, as in Figure 5.42a, is the obvious solution. By decomposing the multi-valued
attribute of the m:n relationship type Contract, as was done in the previous scenario (see
Figures 5.40 and 5.41), we end up with a weak entity type CONTRACT, with the two identifying
parents COMPANY and CONSULTANT. Since Retainer is independent of CONTRACT, it
cannot be an attribute of CONTRACT; instead, it is mapped as an attribute of the identifying
relationship type Execute, as shown in Figure 5.42b. Is this mapping correct? While syntactically
acceptable, this mapping generates a semantic error. Given that CONTRACT is the only
child entity type in the relationship type Execute, Retainer (the attribute of this relationship
type) will end up as an attribute of CONTRACT. Then, the decomposition is no different from
the ERD in Figure 5.41b. But the source ERDs from which the decompositions shown in
Figures 5.42b and 5.41b are derived (Figures 5.42a and 5.41a, respectively) are not
semantically equivalent. In short, the mapping shown in Figure 5.42b fails to honor the
business rule that the retainer is independent of the contract, which is expressed in
Figure 5.42a. This generalizes to a semantic rule that an identifying relationship type
cannot have an attribute. Thus, Figure 5.42b is in error.
[Figure 5.42: ER diagrams not reproduced; panel captions follow]
(a) “Retainer” as an attribute of Contract independent of the multi-valued attribute Contract
(b) Erroneous decomposition of the ERD in Figure 5.42a with an attribute of the identifying relationship type
[Figure 5.43: ER diagrams not reproduced; panel captions follow]
(a) Retainer as an attribute of Retains independent of CONTRACT, a mapping of the multi-valued attribute Contract in Figure 5.42a
(b) Decomposition of the ER diagram in Figure 5.43a
Given the original source ERD in Figure 5.42a, which of the two decompositions—the
one in Figure 5.43b or the one in Figure 5.44—more appropriately captures the import of
the intended semantics of Figure 5.42a? The reader is encouraged to investigate this
question.
expressed “as is” in the logical schema. The solution is to configure the cluster entity type
either as a weak entity type or as a gerund entity type.
Invariably, there is a nucleus relationship within a cluster from which the cluster
entity type emerges. Often, this relationship type gets distilled to an entity type due to
previous decompositions, such as a gerund entity type resulting from the decomposition of
an m:n binary or recursive relationship type or any other relationship type of higher
degree. Otherwise, at this time, this nucleus relationship from which the cluster entity
type emerges can be condensed to a gerund or weak entity type. Observe that the cluster
entity type AGREEMENT in Figure 5.43a gets decomposed to the gerund entity type
AGREEMENT, as seen in Figure 5.43b. Also, notice in Figure 5.14 that the relationship
type Offering can be expressed as a gerund entity type to represent a cluster entity type
called SECTION. However, in Figure 5.17, SECTION will decompose to a weak entity child
of a relationship among INSTRUCTOR, COURSE, and QUARTER. The reader may work
these out as exercises.
How does one model in Figures 5.46a and 5.46b the business rule that a flight cannot
connect to itself? The exclusion dependency between the relationship types Connects_to
and Connection_for imposes this constraint. Observe that the two identifying relationship
types Connects_to and Connection_for also serve the role of weak relationship type for the
inter-relationship constraint in these ERDs.
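As a rough illustration of this exclusion dependency, a CONNECTION instance can be pictured as a pair of FLIGHT identifiers, one reached through Connects_to and the other through Connection_for. The sketch below assumes that pair layout; the function name and the flight numbers are hypothetical and are not part of the ER model. The rule that a flight cannot connect to itself then amounts to rejecting any pair in which both roles name the same flight.

```python
def violates_exclusion(connection):
    """Return True if the same flight fills both roles of a connection.

    `connection` is a (connects_to_flight, connection_for_flight) pair;
    this layout is an assumption made for the example.
    """
    connects_to_flight, connection_for_flight = connection
    return connects_to_flight == connection_for_flight


# Hypothetical flight numbers, used only for illustration.
connections = [("DL100", "DL245"), ("DL245", "DL245")]
illegal = [c for c in connections if violates_exclusion(c)]
print(illegal)  # lists the self-connection only
```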
[Figure: ER diagrams not reproduced; panel captions follow]
(a) A cluster of a recursive relationship type with a multi-valued attribute in another relationship (a replication of Figure 5.25b)
(b) The cluster entity type with a multi-valued attribute in (a) above decomposed to a weak entity type
(c) The m:n relationship with a weak entity child in (b) above decomposed: the final Design-Specific ER diagram
[Figure: ER diagrams not reproduced; caption not recovered. The surviving labels name the entity types FLIGHT, CONNECTION, AIRPORT, and SCHEDULE and the relationship types Connects_to, Connection_for, Happens, and Scheduled_for.]
FIGURE 5.47b Decomposition of the weak relationship type Teaches using the EER
construct Specialization
occurs.10 A fan trap results when the pathway between certain entities in the relationship
fan becomes ambiguous. The following example illustrates a fan trap:
Suppose a library has a large membership and each patron is a member of exactly one
library. Every library also stocks a lot of books, while a specific book can only be in one library.
Every patron borrows at least one book, and every book is borrowed by a patron. A patron may
borrow books only from the library in which he or she is a member. Since we are considering
the database environment at a given point in time, a book is borrowed by only one patron.
The ERD in Figure 5.48a models this scenario. The relevant entity types and their
attributes are arbitrarily assigned. At the outset, it may appear that the connection of
PATRON to BOOK via LIBRARY will facilitate deduction of which book(s) is/are available
for borrowing to which patron. But a closer scrutiny of the ERD reveals this not to be the
case. For instance, from the description of the scenario, one may reasonably expect
questions about the following to be answered by the design shown in Figure 5.48a:
1. Number of members in a given library
2. Library in which a particular patron has membership
3. Library in which a particular book is present
4. Number of books in a given library
5. Number of books borrowed by a patron
6. The patron who has borrowed a particular book
Observe that it is impossible to answer questions about items 5 and 6 from the
current design. The design then, while semantically correct, is not semantically complete.
The cause of this error could be a fan trap present in the design. Clearly, relationship
types Member_of and Available_in are “fanning out” from LIBRARY. Thus, a relationship
fan does exist. What we need to investigate is whether the relationship fan, in this case,
creates ambiguity in the pathway between patrons and books, resulting in a fan trap.
The instance diagram (see Figure 5.48b), reflecting a legal state of the relationships
indicated in the ERD, facilitates the investigation. The pathway in the relationship fan
clearly shows that p5 can borrow three books (k4, k5, and k6); no one else can borrow
these three books. But can we know the number of books borrowed by patron p1 or p2?
The answer is “No.” Also, can we know who borrowed the books k1 and k2? The answer,
again, is “No.” We do know that p1 and p2 are members of the library l1, and could have
borrowed a book only from l1. We also know that the books k1 and k2 are available for
borrowing only from l1. From these two facts, it is impossible to infer who between p1 and
p2 borrowed which of the two books, k1 and k2. Likewise, it is impossible to infer who
between p3 and p4 could have borrowed the book k3. That is, the pathway connecting
patrons and books through the relationships has ambiguity, and it is not possible to answer
10
Decomposition of an m:n relationship type results in a gerund entity type that, when viewed as the
focal point (in this case, child) into which two relationship types fan in, should not be misconstrued
as a relationship fan because the gerund entity type does not have an independent existence as an
entity type—that is, it is identification-dependent on the entity types constituting the original m:n
relationship type. Thus, the apparent fan structure emanating from the gerund entity type is at best
a trivial relationship fan incapable of a potential fan trap.
questions about items 5 and 6 in the list of questions from this design. This is because the
relationship fan here is causing this ambiguity and so is a fan trap. The reason this is a
“trap” is because, superficially, the design appears to provide an unambiguous pathway
between PATRON and BOOK, although in reality it doesn’t.
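The ambiguity can be made concrete with a small sketch. The dictionaries below mimic the instance described for Figure 5.48b, holding only what the design records: Member_of (patron to library) and Available_in (book to library). The library labels l2 and l3 and the function name are assumptions introduced for the example, since the text names only l1 explicitly. All the design can yield for a book is the set of patrons who could have borrowed it.

```python
# Instance data patterned on the discussion of Figure 5.48b.
# The labels l2 and l3 are assumptions; the text names only l1.
member_of = {"p1": "l1", "p2": "l1", "p3": "l2", "p4": "l2", "p5": "l3"}
available_in = {"k1": "l1", "k2": "l1", "k3": "l2",
                "k4": "l3", "k5": "l3", "k6": "l3"}


def possible_borrowers(book):
    """Patrons who could have borrowed `book`: the members of its library."""
    library = available_in[book]
    return sorted(p for p, lib in member_of.items() if lib == library)


for book in sorted(available_in):
    candidates = possible_borrowers(book)
    status = "unambiguous" if len(candidates) == 1 else "ambiguous"
    print(book, candidates, status)
```

Only the books held by the single-member library resolve to one patron; for every other book the pathway through LIBRARY returns more than one candidate, which is the fan trap.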
Figure 5.49 depicts an alternative design. This design is also syntactically and
semantically correct. Furthermore, the design answers questions about items 5 and 6 that are
not answerable using the design presented in Figure 5.48. So, is this design the correct
solution? Is it semantically complete in the context of the stated scenario and the list of
anticipated questions? Let us investigate. The instance diagram in Figure 5.49b reflects a
legal state of the relationships indicated in the ERD that appears in Figure 5.49a. As per
the design shown in the ERD (Figure 5.49a), a patron may borrow several books. So, the
patron p4 has borrowed the books k4 and k5. Likewise, a library may have many books.
Observe that libraries l1, l2, and l3 have two books each. A book, however, can be in only
one library. Accordingly, the book k4 is in library l2 and book k5 is in l3. In short, the
instance diagram does not violate any relationship constraint defined in the ERD. If we
navigate through the available pathway in the design from p4 to the library entities, it is
seen that p4 is linked to libraries l2 and l3. Incidentally, the only link available from the
entity type PATRON to the entity type LIBRARY is through the entity type BOOK. So, the
inevitable inference about membership from this design is that p4 is a member of two
libraries. Thus, the design violates a business rule of the stated scenario that each patron
is a member of exactly one library. Consequently, the design also yields wrong answers to
questions about items 1 and 2 in the list. The cause of this error could be a fan trap present
in the design. Clearly, relationship types Borrowed_by and Available_in are “fanning
in” to BOOK. So, a relationship fan does exist. Our investigation reveals that the relationship
fan, in this case, does create ambiguity in the pathway between patrons and libraries,
resulting in a fan trap. The design then, while syntactically correct, is not semantically
correct, let alone semantically complete.
FIGURE 5.49 An alternative solution for the design shown in Figure 5.48
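The same kind of sketch shows the wrong inference this design produces. Here the only recorded facts are Borrowed_by (book to patron) and Available_in (book to library). The placement of k4 and k5 and their borrower p4 follow the discussion of Figure 5.49b; the remaining assignments and the function name are assumptions made to complete a legal state.

```python
# Available_in: each library holds two books (k4 in l2 and k5 in l3 per the text;
# the other placements are assumed).
available_in = {"k1": "l1", "k2": "l1", "k3": "l2",
                "k4": "l2", "k5": "l3", "k6": "l3"}
# Borrowed_by: p4 has borrowed k4 and k5 per the text; the rest is assumed.
borrowed_by = {"k1": "p1", "k2": "p2", "k3": "p3",
               "k4": "p4", "k5": "p4", "k6": "p5"}


def inferred_libraries(patron):
    """Libraries a patron appears to belong to, reached only through BOOK."""
    return sorted({available_in[book]
                   for book, borrower in borrowed_by.items()
                   if borrower == patron})


print(inferred_libraries("p4"))  # more than one library is reachable from p4
```

Nothing in this structure prevents a patron from being linked to two libraries, so the rule that each patron is a member of exactly one library cannot be enforced or even checked from it.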
One way to prove that the relationship fans in the designs shown in Figures 5.48 and
5.49 are indeed fan traps is to demonstrate the absence of ambiguous pathways among the
entities in a design that is free of fan traps (proof by contradiction). Figure 5.50 is a design
of the same scenario restructured to eliminate relationship fans. In fact, the ERD depicts a
relationship hierarchy. The instance diagram of Figure 5.50b demonstrates that questions
pertaining to all six items about the scenario are answered using the design shown in
Figure 5.50a. The ERD that appears in Figure 5.50a is:
• syntactically correct, as are the ERDs in Figures 5.48a and 5.49a
• semantically correct, as is the ERD in Figure 5.48a, in that both portray the
scenario specified equally accurately (whereas Figure 5.49a has been shown to
be semantically incorrect)
• semantically complete because it does not have a fan trap, while the designs
in Figures 5.48a and 5.49a are plagued by fan traps and therefore are semantically
incomplete
FIGURE 5.50 Resolution of the fan trap present in Figures 5.48 and 5.49
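A sketch of how a relationship hierarchy answers all six items follows. It assumes that the hierarchy runs from LIBRARY through PATRON to BOOK (Member_of, then Borrowed_by), which this extract does not spell out, and it relies on the stated rules that every book is borrowed by a patron and that a patron borrows only from his or her own library. The instance data and function names are hypothetical.

```python
# Assumed hierarchy: BOOK --Borrowed_by--> PATRON --Member_of--> LIBRARY.
member_of = {"p1": "l1", "p2": "l1", "p3": "l2", "p4": "l2", "p5": "l3"}
borrowed_by = {"k1": "p1", "k2": "p2", "k3": "p3",
               "k4": "p4", "k5": "p5", "k6": "p5"}


def members_of_library(library):      # item 1
    return sorted(p for p, lib in member_of.items() if lib == library)


def library_of_patron(patron):        # item 2
    return member_of[patron]


def library_of_book(book):            # item 3: follow the single path upward
    return member_of[borrowed_by[book]]


def books_in_library(library):        # item 4
    return sorted(b for b in borrowed_by if library_of_book(b) == library)


def books_borrowed_by(patron):        # item 5
    return sorted(b for b, p in borrowed_by.items() if p == patron)


def borrower_of(book):                # item 6
    return borrowed_by[book]


print(len(members_of_library("l1")), library_of_patron("p3"),
      library_of_book("k5"), len(books_in_library("l3")),
      len(books_borrowed_by("p5")), borrower_of("k3"))
```

Because each book leads to exactly one patron and each patron to exactly one library, every lookup follows a single path and no ambiguity arises.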
That said, it is important to understand that not all relationship fans are
fan traps. If the ambiguities that a relationship fan structurally permits are not of any
semantic significance in the context of the requirements specification prevailing over the
scenario, then that relationship fan is not a fan trap. Thus, unconditional avoidance of relationship fans in
ERDs limits the richness of the ER modeling grammar and is not recommended.
For example, the scenario depicted by the ERD in Figure 5.51 expresses the following
story line:
A music group has several musicians. Every musician owns one or more vehicles,
and a given vehicle is owned by only one musician. Likewise, a musician can play several
instruments, but in this group an instrument is played by only one musician.
To begin with, the relevant entity types and their attributes in the ERD (Figure 5.51)
are arbitrarily assigned. Observe that a relationship fan exists in this design: two distinct
relationship types, Owned_by and Played_by, fan out of the entity type MUSICIAN. Is
there a fan trap inherent in the design? It is true that there can be ambiguities in the
pathway between VEHICLE and INSTRUMENT. If questions like “what vehicle does a
musician own while playing a guitar” or “how many instruments does a musician play
while owning an SUV” were semantically relevant, then this relationship fan would indeed
constitute a fan trap. However, since a pathway between VEHICLE and INSTRUMENT is
not semantically relevant in the story line, any ambiguity in the pathway caused by the
structural arrangement in the design is irrelevant. Therefore, the relationship fan, in this
case, does not cause a fan trap.
list provided in Section 5.6.1. The original designs shown in Figures 5.48 and 5.49 do
indeed answer this particular question but are unacceptable solutions because the
presence of fan traps in these designs raises other semantic issues relevant to the scenario. So,
what is the solution?
[Figure 5.53: diagrams not reproduced; panel captions follow]
(a) ER diagram in Figure 5.52a augmented by the relationship type Available_in
(b) An instance diagram for the design shown in the figure above
FIGURE 5.53 Final design free of connection traps for the scenario specified
5.6.3.1 Vignette 5
Suppose vendors supply products to projects. A critical thing to know is which vendor
supplies what product to which project and the frequency of the supplies. In addition,
say there is another business rule: A project can get a specific product only from one
vendor. This does not preclude a project from getting other products from the same and
other vendors. It only restricts a project from getting the same product from several
vendors.
Based on the several examples provided at the beginning of this chapter (see
Section 5.1), it appears that a ternary relationship among entity types VENDOR,
PRODUCT, and PROJECT, with Frequency as the attribute of the relationship type,
captures this scenario. This design is presented in the ERD shown in Figure 5.54a.
Does this ERD capture the business rule A project can get a specific product only
from one vendor? The answer is “No.” From a modeling perspective, what needs to be
accomplished is that a {product, project} pair must be restricted to a relationship with just
one vendor. The instinctive reaction to this constraint specification is to change the
structural constraints of VENDOR in the Supplies relationship from (1, n) to (1, 1). This is
a semantic trap in that the change does more than what the business rule specifies. That
is, not only can a project get a specific product from just one vendor, as required by the
business rule, but a vendor can supply no more than one product, and that product can be
supplied to no more than one project. This unexpected side effect amounts to a semantic
trap.
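The difference between the intended rule and the (1, 1) change can be checked on a small instance. In the sketch below, each Supplies relationship instance is written as a (vendor, product, project) triple; the function names and the sample data are hypothetical. The first function tests the business rule itself; the second tests what the (1, 1) constraint on VENDOR demands, namely that each vendor takes part in only one relationship instance.

```python
from collections import Counter, defaultdict


def pair_has_single_vendor(supplies):
    """Business rule: every {product, project} pair is tied to one vendor."""
    vendors_by_pair = defaultdict(set)
    for vendor, product, project in supplies:
        vendors_by_pair[(product, project)].add(vendor)
    return all(len(vendors) == 1 for vendors in vendors_by_pair.values())


def vendor_in_single_instance(supplies):
    """Effect of (1, 1) on VENDOR: each vendor appears in one triple only."""
    counts = Counter(vendor for vendor, _product, _project in supplies)
    return all(n == 1 for n in counts.values())


# A state the business rule allows: v1 supplies two products to one project.
legal_state = [("v1", "bolt", "bridge"),
               ("v1", "nut", "bridge"),
               ("v2", "bolt", "tower")]

print(pair_has_single_vendor(legal_state))     # the business rule holds
print(vendor_in_single_instance(legal_state))  # the (1, 1) constraint rejects it
```

A state that the business rule permits is rejected by the (1, 1) constraint, which is the over-restriction described above.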
The solution lies in restricting the relationship of a {product, project} pair to just one
vendor while permitting a vendor to relate to multiple {product, project} pairs. This
cannot be done by manipulating the structural constraints of a ternary relationship type
among PRODUCT, PROJECT, and VENDOR. This is accomplished by restructuring the
ternary relationship type to two binary relationships:
• Rendering an m:n relationship type connecting PRODUCT and PROJECT to a
cluster entity type, and
• Specifying a 1:n relationship type between VENDOR and the cluster entity
type
The ERD for this revised design is depicted in Figure 5.54b. Observe that Frequency is
now the attribute of the binary relationship Uses. Also, the cluster entity type has been
arbitrarily named INVENTORY. In essence, what appeared to be a possible ternary relationship
at first glance is not the correct model to capture the complete scenario conveyed
in the story line. Does the revised model (Figure 5.54b) continue to preserve the
requirement as to which vendor supplies what product to which project and the frequency
of supplies? It certainly does capture the frequency of use of a certain product by a certain
project in the Uses relationship in the cluster entity type INVENTORY. Since there is
only one vendor related to this {product, project} pair, as indicated by the (1, 1) structural
constraint of INVENTORY in the Supplies relationship, the requirement is intact in the
revised design.
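One way to picture the revised design is as a lookup keyed by the {product, project} pair, where each entry carries the Frequency of the Uses relationship and the single vendor reached through Supplies. This is only a sketch at the conceptual level, not the logical schema; the function name, the dictionary layout, and the sample values are assumptions made for illustration.

```python
def record_supply(inventory, vendor, product, project, frequency):
    """Add or update an INVENTORY entry, keeping one vendor per pair."""
    key = (product, project)
    current = inventory.get(key)
    if current is not None and current["vendor"] != vendor:
        raise ValueError(f"{key} is already supplied by {current['vendor']}")
    inventory[key] = {"vendor": vendor, "frequency": frequency}


inventory = {}
record_supply(inventory, "v1", "bolt", "bridge", "weekly")
record_supply(inventory, "v1", "nut", "bridge", "monthly")  # same vendor, new pair
record_supply(inventory, "v2", "bolt", "tower", "daily")    # same product, new project

# Which vendor supplies what product to which project, and how often.
for (product, project), entry in sorted(inventory.items()):
    print(entry["vendor"], product, project, entry["frequency"])
```

A vendor may appear under many pairs, but a second vendor for an existing pair is refused, which matches the (1, 1) constraint of INVENTORY in Supplies.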
A couple of other examples are available in Section 5.2.2, where the cluster entity
type as an ER modeling construct is first introduced.
5.6.3.2 Vignette 6
Let us revisit another example from earlier in the chapter to examine a concealed
semantic trap. The second example in Section 5.4.1 involves a scenario of restaurants catering to
banquets. For convenience, the scenario is reproduced here along with the ER model
(from Figure 5.35), which is shown in Figure 5.55a:
Restaurants cater to banquets. A banquet has a menu of food items and a restaurant
caters various food items. Unless a restaurant is capable of preparing the set of food items
contained in a banquet’s menu, the restaurant cannot cater that particular banquet.
A quick scrutiny of the ERD in Figure 5.55a reveals that it is not possible to answer
the first two questions using this ERD, even though the ERD is syntactically correct and
can in fact be claimed as semantically correct, too. From the ERD, it is possible to list the
banquets a particular restaurant is capable of catering, but that does not indicate whether
the restaurant actually catered any or all of these banquets (Question 1). Likewise, it is
possible to find out the number of restaurants a banquet can use given its menu. However,
one cannot identify the number of restaurants that actually catered a given banquet
(Question 2). Thus, the ERD is certainly not semantically complete. In other words, a
semantic trap is present in the design. It is a trap simply because its presence is concealed
and the design was completed without recognizing the presence of a semantic trap, a case
in point for the importance of data model validation.
An alternative design for the same scenario is portrayed in Figure 5.55b. Structurally,
the entity types BANQUET and FOOD_ITEM are rearranged so that a direct relationship
between RESTAURANT and BANQUET is enabled. This should facilitate answering
Questions 1 and 2. This rearrangement triggers another structural change in order to preserve
the other business rules prevailing over the scenario. A composite of Caters and Contains
becomes inclusion-dependent on Capable_of_preparing. As a consequence, pursuant to
ER modeling grammar rules, Caters and Contains become weak relationship types (double
diamond), and the direction of the solid arrow in the ERD is accordingly reversed. Is this
design superior to the one developed earlier (Figure 5.55a)? The answer to this question is
context dependent. Given the scenario and the list of probable questions pertaining to the
scenario, the alternative design just developed (Figure 5.55b) is indeed superior since it
fully captures all the specified semantics and specifically eliminates the semantic trap
identified in the original ERD.
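The distinction between what a restaurant is capable of catering and what it actually caters can be sketched with sets. In the fragment below, Capable_of_preparing and Contains are held as sets of food items, and Caters is a set of (restaurant, banquet) pairs. The inclusion dependency is checked by requiring each catered banquet's menu to be a subset of the caterer's food items. All names and data values are hypothetical.

```python
# Capable_of_preparing: restaurant -> food items it can prepare.
capable_of_preparing = {"r1": {"soup", "salad", "steak"},
                        "r2": {"soup", "salad"}}
# Contains: banquet -> food items on its menu.
contains = {"b1": {"soup", "salad"},
            "b2": {"soup", "steak"}}
# Caters: the direct relationship added in the alternative design.
caters = {("r1", "b1"), ("r1", "b2")}


def can_cater(restaurant, banquet):
    """True if the banquet's whole menu is within the restaurant's repertoire."""
    return contains[banquet] <= capable_of_preparing[restaurant]


def respects_inclusion_dependency(caters_pairs):
    """Every catered banquet must be one the caterer is capable of catering."""
    return all(can_cater(r, b) for r, b in caters_pairs)


capable = sorted(r for r in capable_of_preparing if can_cater(r, "b1"))
actual = sorted(r for r, b in caters if b == "b1")
print(capable, actual, respects_inclusion_dependency(caters))
```

The list of capable restaurants is all that can be derived from Capable_of_preparing and Contains alone; the list of actual caterers needs the direct Caters relationship.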
The more important question is: What is the approach used to solve this problem?
Unfortunately, semantic traps often do not fall in a pattern like connection traps, where a
structural configuration of the ERD capable of generating a specific connection trap is
known; and the approach(es) to resolve the particular connection trap is(are) known as
well. The two examples presented here bear no similarity to each other regarding the
identification or resolution of the semantic trap. Trial and error and experimentation based on
possible hints discerned in the scenario and its business rules seem to be the only available
approach. The lesson to be learned here is simply that validation of the conceptual design is
a crucial step in the data modeling process. Inadvertent misinterpretation of the semantics
embedded in a requirements specification is an unavoidable aspect of conceptual modeling.
Being aware of the possibility of connection traps and other semantic traps sensitizes a
designer to pay close attention to the requirements specification during the conceptual
modeling process, and including a formal step of model validation in the conceptual model-
ing process enhances the quality of the conceptual modeling script (e.g., ER model).
salary. Surgeons do not receive a salary but work for Cougar Medical Associates on a contract
basis. It is possible for a physician to have an ownership position in the clinic.
Since surgeons perform surgery on patients as needed, a surgery schedule must keep
track of the operation theater where a surgeon performs a certain surgery type on a
particular patient and when that surgery type is performed. Some patients need surgeries and
others don’t. Surgeons perform surgeries in the clinic; some do a lot, others just a few. Some
surgery types are so rare that they may not yet have been performed in the clinic, but there are
others that are performed numerous times. In addition, there is the need to keep track of
nurses who can be assigned to a specific surgery type, since not all nurses can be assigned to
assist in all types of surgeries. A nurse cannot be assigned to more than one surgery type. It is
the policy of the clinic that all types of surgery have at least two nurses. The clinic maintains a
list of surgery skills. A surgery type requires at least one but often many surgery skills. However,
not all surgery skills are utilized in the clinic, whereas some surgery skills are utilized in
numerous surgery types. Nurses possess one or more of these surgery skills. There are certain
surgery skills for which no nurse in the clinic qualifies; at the same time, there are other
surgery skills that have several qualified nurses. In order to be assigned to a surgery type, a
nurse should possess one or more of the skills required for the surgery type.
Depending on the illness, some patients may stay in the clinic for a few days, but
most require no hospitalization. In-patients are assigned a room and a bed. A nurse
attends to several in-patients but must have at least five. No more than one nurse attends
to an in-patient, but some in-patients may not have any nurse attending to them. If a
nurse leaves the clinic, the association of all in-patients who were previously attended to
by that nurse should be temporarily removed in order to allow these patients to be trans-
ferred to another nurse at a later time. Every physician serves as a primary care physician
for at least seven patients; however, no more than 20 patients are allotted to a physician.
If a physician leaves the clinic, that physician’s patients should be temporarily assigned to
the clinic’s chief of staff. Clinic personnel can also become ill and be treated in the clinic.
A patient is assigned one physician for primary care.
Physicians prescribe medications to patients; thus, it is necessary to capture which
physician(s) prescribe(s) what medication(s) to which patient(s), along with dosage and
frequency. In addition, no two physicians can prescribe the same medication to the same
patient. If a physician leaves the clinic, all prescriptions prescribed by that physician
should be removed because this information is also retained in the archives. A person
affiliated with the clinic as a surgeon cannot be deleted as long as a record of all surgeries
performed by the surgeon is retained.
A patient may be taking several medications, and a particular medication may be
taken by several patients. However, in order for a patient to take a medicine, the
medicine must be prescribed to that patient. As a medicine may interact with several other
medicines, the severity of such interactions must be recorded in the system. Possible
interactions include S = Severe interaction, M = Moderate interaction, L = Little
interaction, and N = No interaction.
A patient may have several illnesses, and several patients may have the same illness.
In order to qualify as a patient, a patient must have at least one illness. Also, a patient
may have several allergies.
All clinic personnel have an employee number, name, gender (male or female),
address, and telephone number. With the exception of surgeons, all clinic personnel also
have a salary (which can range from $25,000 to $300,000), but salaries of some can be
missing. Each person who works in the clinic can be identified by an employee number.
For each physician, his or her specialty is captured; whereas, for each surgeon, data
pertaining to his or her specialty and contract are captured. Contract data for surgeons
include the type of contract and the length of the contract (in years). Grade and years of
experience represent the specific data requirements for nurses.
A surgery code is used to identify each type of surgery. In addition, the name,
category, anatomical location, and special needs are captured for each surgery type. There are
two surgery categories: those that require hospitalization (category = H) and those that
can be performed on an outpatient basis (category = O). A surgery skill is identified by its
description and a unique skill code. Data for patients consist of personal data and
medical data. Personal data include patient number (the unique identifier of a patient), name,
gender (male or female), date of birth, address, and telephone number. Medical data
include the patient’s blood type, cholesterol (consisting of HDL, LDL, and triglyceride),
blood sugar, and the code and name of all the patient’s allergies.
For both clinic personnel and patients, a Social Security number is collected. For
each illness, a code and description are recorded. Additional data for each in-patient con-
sists of a required date of admission along with the patient’s location (nursing unit, room
number, and bed number). Nursing units are numbered 1 through 7, rooms are located in
either the Blue or Green wing, and the bed numbers in a room are labeled A or B. Medi-
cations are identified by their unique medication code, and medication data also includes
name, quantity on hand, quantity on order, unit cost, and year-to-date usage. For medical
corporations with ownership interest in the clinic, the corporation name and headquarters
are obtained. Corporation name uniquely identifies a medical corporation. The percentage
ownership of each clinic owner is also recorded.
The physicians who work in the clinic have recently embarked on a program to monitor
the cholesterol level of their patients because cholesterol contributes to heart disease.
Risk of heart disease is classified as N (None), L (Low), M (Moderate), and H (High). The
ratio of a person’s total cholesterol divided by HDL is used in the field of medicine as one
indicator of heart risk. Total cholesterol is calculated as the sum of the HDL, LDL, and
one-fifth of triglycerides. A total cholesterol/HDL ratio less than 4 suggests no risk of heart
disease due to cholesterol; a ratio between 4 and 5 reflects a low risk; and a ratio greater
than 5 is a moderate risk. The high-risk category is not coded as a function of cholesterol.
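The cholesterol rule above can be sketched in a few lines of code. This is only an illustration of the stated arithmetic, not part of the CMA specification: the function names are hypothetical, and since the narrative does not say how a ratio of exactly 4 or exactly 5 is classified, the sketch assumes both boundary values fall in the low-risk band. The high-risk code H is deliberately never returned because it is not a function of cholesterol.

```python
def total_cholesterol(hdl: float, ldl: float, triglycerides: float) -> float:
    """Total cholesterol = HDL + LDL + one-fifth of triglycerides."""
    return hdl + ldl + triglycerides / 5


def heart_risk(hdl: float, ldl: float, triglycerides: float) -> str:
    """Return the cholesterol-based heart risk code: 'N', 'L', or 'M'."""
    if hdl <= 0:
        raise ValueError("HDL must be positive to compute the ratio")
    ratio = total_cholesterol(hdl, ldl, triglycerides) / hdl
    if ratio < 4:
        return "N"   # no risk due to cholesterol
    if ratio <= 5:   # assumption: ratios of exactly 4 and 5 count as low risk
        return "L"
    return "M"       # moderate risk
```

For example, a patient with HDL 50, LDL 130, and triglycerides 150 has a total cholesterol of 210 and a ratio of 4.2, which this sketch classifies as low risk.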
A slightly simplified alternative design can model NURSE, PHYSICIAN, SURGEON, and
SUPPORT_STAFF as subclasses of a disjoint specialization of CLINIC_PERSONNEL. Observe
that SUPPORT_STAFF is modeled as a subclass instead of subsumed in CLINIC_PERSONNEL
because Salary cannot be included as an attribute of CLINIC_PERSONNEL—that is, surgeons
are not salaried members of clinic personnel. Therefore, in this design, Salary is included as
an attribute of NURSE, PHYSICIAN, and SUPPORT_STAFF. The ERD depicting this design
appears in Figure 5.57. This design has the same number of entity types as the one in
Figure 5.56 and actually has one less tier in the specialization hierarchy. Therefore, this
design is used for the ER model of CMA.
The list of attributes for the entity type PATIENT is relatively large and appears clearly
demarcated as personal and medical data about a patient. Although all these attributes can
certainly be recorded under PATIENT, it may be worthwhile to model PERSONAL_INFO
and MEDICAL_INFO as separate entity types, especially if CMA expects to treat a large
number of patients and the use of personal and medical information is clearly divided
between administrative and medical personnel of the clinic. Since personal and medical
information together make up patient information, the aggregation construct seems an
appropriate way to depict this relationship. Figure 5.58 captures this aggregation construct.
FIGURE 5.57 Presentation Layer ERD for Cougar Medical Associates: Stage 1 (an alternative design)
Next, a patient having several illnesses as well as certain illnesses afflicting many
patients can be captured as an m:n relationship between a new entity type called ILLNESS
and the entity type PATIENT. But then, observe that Allergy is modeled as a multi-valued
attribute of a patient’s MEDICAL_INFO instead of as an entity type ALLERGY. Why? The
requirements specified in the narrative simply state that “a patient may have several
allergies” and nothing more. That Allergy can be a multi-valued attribute of PATIENT is
quite clear from this statement. Attempting to specify an entity type ALLERGY amounts to
speculating beyond the requirements specification and is incorrect in the context of the
CMA scenario.
A relationship between PATIENT and PHYSICIAN also seems obvious from the story.
MEDICATION appears to be another candidate for being modeled as an entity type. Once
again, an m:n relationship between PATIENT and MEDICATION appears imminent. Which
physician(s) prescribe(s) what medication(s) to which patient(s) appears to fit the mold of
a ternary relationship type, with Dosage and Frequency as the attributes of the relation-
ship type. Likewise, medicines interacting with other medicines convey an ideal recursive
relationship type. Finally, the fact that a medicine must be prescribed in order for a
patient to take that medicine can be handled by an inter-relationship constraint. These,
as well as the relationships that follow, are incorporated in the Presentation Layer ERD
shown in Figure 5.59.
FIGURE 5.59 [Presentation Layer ERD; diagram not reproduced. Entity types shown: PERSON, CLINIC_PERSONNEL, SUPPORT_STAFF, PHYSICIAN, SURGEON, NURSE, MEDICAL_CORPORATION, CLINIC_OWNER, PATIENT, PERSONAL_INFO, MEDICAL_INFO, IN_PATIENT, ILLNESS, MEDICATION, SURGERY_SKILL, SURGERY_TYPE. Relationship types shown: PCP, Prescribes, Takes, Interacts, Suffers, Surg_sch, Nurse_skill, Req_skill, Assigned_to, Attends.]
Another ternary relationship looms in the story line “Surgeons perform surgery on
patients as needed.” For this, the creation of an entity type called SURGERY_TYPE is
necessary. Incidences of surgery performed by a surgeon on a patient indicating the time
and operation theater for these events are captured by this ternary relationship type. Note
that the structural constraints of the ternary relationship types cannot be accurately
reflected in the presentation layer because of the use of “look-across” notation in the
grammar. It is important to note that nurses are assigned to surgery types (heart surgery,
knee surgery, etc.), not to scheduled surgery events. Since nurses also attend to in-
patients and since other attributes specific to in-patients do not apply to patients in
general, specializing IN_PATIENT as a subclass of PATIENT and relating it to NURSE
makes sense.
At first glance, it is conceivable to think of a nurse’s surgery skills and skills required
for a surgery type as multi-valued attribute(s). However, this will not be adequate to model
the kinds of associations among nurses, surgery types, and surgery skills specified in the
requirements. A closer perusal of the story will justify the creation of an entity type for
SURGERY_SKILL and relate it to SURGERY_TYPE as well as to NURSE.
Finally, because the clinic can have multiple owners, and because the owners can
be medical corporations or individual physicians (belonging to two different entity
classes), CLINIC_OWNER is modeled as a category arising from a subset of
the union of PHYSICIAN and another new entity type, MEDICAL_CORPORATION.
Alternatively, when the system is being developed for just one clinic, the ownership
information may be captured in the entity types PHYSICIAN and MEDICAL_CORPORATION.
MEDICAL_CORPORATION, in this case, will be a stand-alone entity type
in the ERD which, while technically acceptable, may not be considered an
elegant design.
This gives an initial version of the ERD, which will serve as the input for further
refinement and specification of other semantic integrity constraints that cannot be incor-
porated in the Presentation Layer ERD shown in Figure 5.59. Notice that the cardinality
ratios and participation constraints culled from the narrative are shown in the ERD. The
deletion rules embedded in the narrative are:
1. If a nurse leaves the clinic, temporarily remove the association of all patients
previously attended to by that nurse in order to allow these patients to be
transferred to another nurse sooner or later.
2. If a physician leaves the clinic, temporarily assign the physician’s patients to
the clinic’s chief of staff.
3. If a physician leaves the clinic, all prescriptions prescribed by that physician
should be removed because this information will be retained in the archives.
4. A person affiliated with the clinic as a surgeon cannot be deleted as long as a
record of surgeries performed by that surgeon is retained.
These deletion rules have also been incorporated in the ERD as deletion constraints.
Observe that the narrative hasn’t provided a comprehensive set of deletion rules for the
scenario. As we know, at the time of database implementation, missing deletion con-
straints will, by default, translate to the Restrict (R) option, which can sometimes create
specification conflicts. It is the responsibility of the data modeler to make sure that such a
default specification for the deletion constraint does not conflict with the participation
constraint prevailing in the relationship type or the specified deletion constraint on an
associated relationship type.
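To make the deletion rules concrete, the following sketch shows how three of them might eventually surface as referential actions in a relational implementation. It is a hedged illustration only: the table and column names are hypothetical, SQLite (through Python's standard sqlite3 module) is used purely for convenience, and rule 2 (reassignment to the chief of staff) is omitted for brevity. Rule 1 is approximated by SET NULL, rule 3 by CASCADE, and rule 4 by RESTRICT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE nurse     (emp_no INTEGER PRIMARY KEY);
CREATE TABLE physician (emp_no INTEGER PRIMARY KEY);
CREATE TABLE surgeon   (emp_no INTEGER PRIMARY KEY);
CREATE TABLE in_patient (
    patient_no INTEGER PRIMARY KEY,
    nurse_no   INTEGER REFERENCES nurse(emp_no) ON DELETE SET NULL   -- rule 1
);
CREATE TABLE prescription (
    patient_no   INTEGER,
    med_code     TEXT,
    physician_no INTEGER NOT NULL
                 REFERENCES physician(emp_no) ON DELETE CASCADE      -- rule 3
);
CREATE TABLE surg_sch (
    surgeon_no INTEGER NOT NULL
               REFERENCES surgeon(emp_no) ON DELETE RESTRICT,        -- rule 4
    patient_no INTEGER
);
INSERT INTO nurse VALUES (1);
INSERT INTO physician VALUES (10);
INSERT INTO surgeon VALUES (20);
INSERT INTO in_patient VALUES (100, 1);
INSERT INTO prescription VALUES (100, 'MED1', 10);
INSERT INTO surg_sch VALUES (20, 100);
""")

# Rule 1: the nurse leaves; the in-patient remains but is no longer attended.
conn.execute("DELETE FROM nurse WHERE emp_no = 1")
nurse_after = conn.execute("SELECT nurse_no FROM in_patient").fetchone()[0]

# Rule 3: the physician leaves; that physician's prescriptions are removed.
conn.execute("DELETE FROM physician WHERE emp_no = 10")
rx_count = conn.execute("SELECT COUNT(*) FROM prescription").fetchone()[0]

# Rule 4: a surgeon with recorded surgeries cannot be deleted.
try:
    conn.execute("DELETE FROM surgeon WHERE emp_no = 20")
    surgeon_deleted = True
except sqlite3.IntegrityError:
    surgeon_deleted = False
```

Note how SET NULL is only possible because nurse_no is allowed to be missing; this is the kind of interplay between deletion constraints and participation constraints that the modeler is asked to check.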
The business rules that cannot be expressed in the ERD are listed as semantic integ-
rity constraints in Table 5.3.
Attribute-Level Business Rules
1. The gender of a person (i.e., a person affiliated with the clinic or a patient) is either male or female.
2. Nursing unit numbers range from 1 to 7.
3. Salaries of clinic personnel range from $25,000 to $300,000.
4. A surgery type can be either H = requires hospitalization or O = can be performed on an outpatient basis.
5. Rooms in the clinic are located in either the B = Blue wing or G = Green wing.
6. Bed numbers in a room are labeled either A or B.
7. Severity of medication interaction can be N = No interaction; L = Little interaction; M = Moderate
interaction; and S = Severe interaction.
8. Heart risk can be N = No risk; L = Low risk; M = Moderate risk; and H = High risk.
1. A patient’s heart risk is (a) “N” when the cholesterol ratio is below 4; (b) “L” when the cholesterol
ratio is between 4 and 5; and (c) “M” when the cholesterol ratio is greater than 5.
Miscellaneous Business Rules
1. A physician serves as a primary care physician for at least seven but no more than 20 patients.
2. Each nurse is assigned a minimum of five patients.
3. All types of surgery require at least two nurses.
TABLE 5.3 Semantic integrity constraints for the Presentation Layer ER model for Cougar
Medical Associates
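As an informal illustration, the attribute-level rules of Table 5.3 can be read as simple domain checks. The sketch below is an assumption-laden example rather than part of the ER model: the attribute names used as dictionary keys and the function name are hypothetical, the literal values "male" and "female" stand in for whatever gender codes an implementation might choose, and a missing salary is accepted because the narrative allows some salaries to be missing.

```python
from typing import Any, Callable, Dict, List

# One predicate per attribute-level rule; keys are illustrative attribute names.
CHECKS: Dict[str, Callable[[Any], bool]] = {
    "Gender": lambda v: v in {"male", "female"},
    "Nursing_unit": lambda v: isinstance(v, int) and 1 <= v <= 7,
    "Salary": lambda v: v is None or 25_000 <= v <= 300_000,
    "Category": lambda v: v in {"H", "O"},
    "Wing": lambda v: v in {"B", "G"},
    "Bed": lambda v: v in {"A", "B"},
    "Severity": lambda v: v in {"N", "L", "M", "S"},
    "Heart_risk": lambda v: v in {"N", "L", "M", "H"},
}


def violations(record: Dict[str, Any]) -> List[str]:
    """Return the names of attributes in `record` whose values break a rule."""
    return [name for name, value in record.items()
            if name in CHECKS and not CHECKS[name](value)]
```

A record such as {"Gender": "female", "Nursing_unit": 8, "Salary": None, "Bed": "C"} would be reported as violating the rules for Nursing_unit and Bed.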
maximum cardinality of 1 for the PHYSICIAN in Prescribes will limit the relationship of a
physician to one {patient, medication} pair. The chances are that this side effect is unin-
tended and unacceptable. If so, then this is a case of a semantic trap like the one discussed
in vignette 5 (Section 5.6.3.1). The solution lies in restricting the relationship of a {patient,
medication} pair to just one physician while permitting a physician to relate to multiple
{patient, medication} pairs. This cannot be done by manipulating the structural constraints
of a ternary relationship type among PATIENT, PHYSICIAN, and MEDICATION. This is
accomplished by restructuring the ternary relationship type to two binary relationships:
• Rendering an m:n relationship type connecting PATIENT and MEDICATION
(Prescribes) as a cluster entity type (PRESCRIPTION)
• Specifying a 1:n relationship type (Writes) between PHYSICIAN and the clus-
ter entity type, PRESCRIPTION
This solution implements the stated business rule without causing any unexpected
side effects. This modification to the design is shown in Figure 5.60. An interesting ques-
tion that may arise at this point is: Is there a need for a cluster entity type? How about
specifying a base entity type called PRESCRIPTION, which conventional wisdom would
suggest? Strict adherence to the story line of the CMA scenario reveals that a base entity
type called PRESCRIPTION is not feasible because, given the information available in the
requirements specification, a unique identifier for such an entity type cannot be culled
out. Furthermore, a closer observation of the design reveals that the cluster entity type
PRESCRIPTION will get decomposed to a gerund entity type PRESCRIPTION at the
design-specific level.
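The effect of this restructuring can be sketched with a small in-memory structure. This is a minimal illustration under stated assumptions, not the book's design notation: the class and method names are hypothetical, and a dictionary keyed by the {patient, medication} pair stands in for the cluster entity type PRESCRIPTION, while the stored value plays the role of the 1:n relationship type Writes.

```python
from typing import Dict, List, Tuple


class PrescriptionBook:
    """PRESCRIPTION keyed by (patient, medication); Writes is 1:n from PHYSICIAN."""

    def __init__(self) -> None:
        # (patient, medication) -> physician; one key can hold only one physician
        self._writer: Dict[Tuple[str, str], str] = {}

    def write(self, physician: str, patient: str, medication: str) -> None:
        key = (patient, medication)
        current = self._writer.get(key)
        if current is not None and current != physician:
            raise ValueError(f"{key} is already prescribed by {current}")
        self._writer[key] = physician

    def written_by(self, physician: str) -> List[Tuple[str, str]]:
        """All {patient, medication} pairs written by one physician."""
        return sorted(k for k, p in self._writer.items() if p == physician)
```

A physician can appear as the value for any number of keys, so the unintended limit of one {patient, medication} pair per physician disappears, while a second physician attempting to prescribe the same medication to the same patient is rejected.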
Business rule (2) can be incorporated into the ERD by transforming the Assigned_to
relationship type to a weak relationship type with an inclusion dependency on the
composite of Nurse_skill and Req_skill. This particular design was discussed earlier in this
chapter (see Section 5.4.1).
At this point the ER model consists of the ERD in Figure 5.60 and the list of semantic
integrity constraints in Table 5.3.
FIGURE 5.60 [Presentation Layer ERD with the cluster entity type PRESCRIPTION; diagram not reproduced. Entity types shown: PERSON, CLINIC_PERSONNEL, SUPPORT_STAFF, PHYSICIAN, SURGEON, NURSE, MEDICAL_CORPORATION, CLINIC_OWNER, PATIENT, PRESCRIPTION, PERSONAL_INFO, MEDICAL_INFO, ILLNESS, MEDICATION, SURGERY_SKILL, SURGERY_TYPE. Relationship types shown: PCP, Writes, Prescribed, Takes, Interacts, Suffers, Surg_sch, Nurse_skill, Req_skill, Assigned_to.]
Entity/Relationship Type Name   Attribute Name   Data Type   Size   Domain Constraint
TABLE 5.4 Domain specifications for the attributes of the ER model for Cougar Medical Associates
Three major tasks in this transformation process for the ERD are:
1. Map the structural constraints of relationship types from the “look across”
(Chen’s) notation to the “look near” [(min, max)] notation.
2. Decompose m:n relationship types to gerund entity types.
3. Transform multi-valued attributes to single-valued attributes.
Since each of these three tasks is discussed in detail in Chapters 3 and 4, suffice it to
say here that there is only one multi-valued attribute (Allergy) requiring evaluation, and
the m:n binary relationships requiring decomposition to gerund entity types are
Nurse_skill, Req_skill, Prescribes, Takes, Interacts, and Suffers. In addition, the ternary
relationship type Surg_sch requires decomposition in preparation for mapping to the logical
tier. How to decompose a ternary relationship was illustrated in Section 5.5.1. Following
that procedure, the decomposition of Surg_sch results in a gerund entity type SURG_SCH
with three identifying parents, SURGEON, SURGERY_TYPE, and PATIENT, as shown in
Figure 5.61.
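The outcome of decomposing Surg_sch can be pictured as follows. This is a rough sketch only: the field names, the sample rows, and the helper function are hypothetical, and the exact identifier of SURG_SCH is not settled here. The point is simply that each SURG_SCH instance refers to exactly one surgeon, one surgery type, and one patient, while each parent may be referenced by many SURG_SCH instances, which yields three binary 1:n relationships.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SurgSch:
    """Gerund entity type SURG_SCH; the first three fields refer to its parents."""
    surgeon_emp_no: int   # identifying parent: SURGEON
    surgery_code: str     # identifying parent: SURGERY_TYPE
    patient_no: int       # identifying parent: PATIENT
    surg_date: str        # attribute of the original relationship type
    theatre: str          # attribute of the original relationship type


def group_by_parent(rows: List[SurgSch], parent_field: str) -> Dict[object, List[SurgSch]]:
    """Collect the SURG_SCH instances that belong to each parent instance."""
    grouped: Dict[object, List[SurgSch]] = defaultdict(list)
    for row in rows:
        grouped[getattr(row, parent_field)].append(row)
    return dict(grouped)


# Hypothetical sample data, for illustration only.
schedule = [
    SurgSch(20, "HRT", 100, "2015-03-01", "T1"),
    SurgSch(20, "KNE", 101, "2015-03-02", "T2"),
    SurgSch(21, "HRT", 102, "2015-03-02", "T1"),
]
by_surgeon = group_by_parent(schedule, "surgeon_emp_no")
by_type = group_by_parent(schedule, "surgery_code")
```

Here surgeon 20 is the parent of two scheduled surgeries and surgery type "HRT" is the parent of two, yet every scheduled surgery has exactly one surgeon and one surgery type.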
Next, the weak relationship type Takes, inclusion-dependent on the regular relation-
ship type Prescribes, needs attention. Note that both Prescribes and Takes, being rela-
tionship types depicting the m:n cardinality ratio, will first be decomposed to the gerund
entity types PRESCRIPTION and MED_TAKEN, respectively. Then, following the procedure
prescribed in Section 5.5.4 for transforming a weak relationship type to the
design-specific state, PRESCRIPTION and MED_TAKEN are modeled as the superclass
and subclass, respectively, of a partial specialization (see Figure 5.61).
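The inclusion dependency of Takes on Prescribes amounts to a subset condition between the two gerund entity types. The sketch below illustrates that condition with plain sets; the type alias and function names are hypothetical, and each element is assumed to be a (patient number, medication code) pair.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (patient number, medication code); an assumed representation


def unprescribed(prescription: Set[Pair], med_taken: Set[Pair]) -> Set[Pair]:
    """Pairs in MED_TAKEN with no matching PRESCRIPTION; should always be empty."""
    return med_taken - prescription


def take(prescription: Set[Pair], med_taken: Set[Pair], pair: Pair) -> None:
    """Record that a patient takes a medication, keeping MED_TAKEN within PRESCRIPTION."""
    if pair not in prescription:
        raise ValueError(f"{pair} has not been prescribed")
    med_taken.add(pair)
```

Because the specialization is partial, a prescribed medication need not be taken, but a medication cannot be taken unless it appears among the prescriptions.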
The situation with the weak relationship type Assigned_to in the Presentation Layer
(Figure 5.60) is somewhat different because it has inclusion dependency on a composite of
the two relationship types Nurse_skill and Req_skill. Remember, the story line states that
a nurse must have some of the skills required for the surgery type in order to get assigned
to that surgery type. Our first task is to recognize that Nurse_skill and Req_skill first get
translated to the gerund entity types NURSE_SKILL and REQ_SKILL, respectively,
because these two are relationship types bearing an m:n cardinality ratio. The task here is
two-fold: (1) rendering the inclusion dependency of Assigned_to on the composite of
Nurse_skill and Req_skill implementable, and (2) keeping the structural constraints of the
relationship type Assigned_to intact.
As shown in Figure 5.61, using a cross-referencing design for mapping Assigned_to,
the weak entity type ASSIGNMENT is identification-dependent on NURSE. This meets the
specification in item (2). The relationship type Subject_to between ASSIGNMENT and
NURSE_SKILL and the relationship type Depends_on between ASSIGNMENT and
REQ_SKILL followed by an inclusive arc across these two relationship types partially
captures the specification in item (1). In addition, a constraint specifying that the {Nurse,
Surgery_type} projection from ASSIGNMENT should be a subset of the {Nurse,
Surgery_type} set resulting from the intersection of NURSE_SKILL and REQ_SKILL must be
included in the list of semantic integrity constraints (see footnote 11).
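The additional constraint can be illustrated with sets of pairs. This is a sketch under stated assumptions: the function names are hypothetical, NURSE_SKILL is assumed to hold (nurse, skill) pairs and REQ_SKILL (surgery type, skill) pairs, and the "intersection" is interpreted here as matching the two on a shared skill, which yields the {Nurse, Surgery_type} pairs for which the nurse has at least one required skill.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def eligible_pairs(nurse_skill: Set[Pair], req_skill: Set[Pair]) -> Set[Pair]:
    """{Nurse, Surgery_type} pairs that share at least one surgery skill.

    nurse_skill holds (nurse, skill) pairs; req_skill holds (surgery_type, skill) pairs.
    """
    return {(nurse, surgery_type)
            for nurse, skill in nurse_skill
            for surgery_type, required in req_skill
            if skill == required}


def invalid_assignments(assignment: Set[Pair],
                        nurse_skill: Set[Pair],
                        req_skill: Set[Pair]) -> Set[Pair]:
    """Assignments that violate the constraint; the result should be empty."""
    return assignment - eligible_pairs(nurse_skill, req_skill)
```

An assignment of a nurse to a surgery type is acceptable only when the pair appears in the eligible set; any pair returned by invalid_assignments signals a violation of the semantic integrity constraint.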
The final version of the Design-Specific ERD ready for conversion to a logical schema
appears in Figure 5.61. Note that in addition to all the necessary decompositions, all mis-
cellaneous business rules from Table 5.3 and attribute characteristics from Table 5.4 have
been incorporated in this ERD. Table 5.5 records the remaining semantic integrity con-
straints to be carried forward to the logical modeling phase, thus rendering the Design-
Specific ER model fully information-preserving.
Footnote 11: Projection is a vertical subset of attributes from an entity set. Intersection of two
entity sets results in a third entity set containing entities common to the first two. A precise
definition and detailed explanation of “projection” and “intersection” are included in Chapters 6 and 11.
FIGURE 5.61 The final form of the Design-Specific ERD for Cougar Medical Associates
TABLE 5.5 Semantic integrity constraints for the final Design-Specific ER model for Cougar
Medical Associates
Chapter Summary
This chapter focuses on modeling relationships beyond binary relationships, characteristics of
weak relationship types, the decomposition of complex relationships, and model validation.
How to handle ternary (degree 3), quaternary (degree 4), and quintary (degree 5) relation-
ships is shown through a series of application scenarios and vignettes. The cluster entity type is
a way to represent entity types that naturally emerge from a higher-order relationship type and/or
a group of entity types and associations among them.
The relationship construct known as the weak relationship type was originally defined by
Dey, Storey, and Barron (1999). A weak relationship type occurs when two relationship types
are linked by either an event-precedent sequence or a condition-precedent sequence. An event-
precedent weak relationship type occurs when an event associated with the occurrence of one
relationship type must precede an occurrence of the weak relationship type. A condition-
precedent weak relationship type occurs when a condition that triggers the occurrence of one
relationship type must precede the occurrence of the weak relationship type.
The decomposition of ternary and higher-order relationship types is very similar to the
decomposition of binary relationship types with an m:n cardinality ratio. It involves converting the
relationship type to a gerund entity type such that the resulting transformation contains nothing
but a set of binary relationships with cardinality ratios of 1:m. Binary relationship types with multi-
valued attributes and design alternatives in their decomposition are also introduced. A brief dis-
cussion about the alteration of weak relationship types in preparation for logical model mapping
follows.
It is important that a conceptual model be an accurate representation of the “universe of
interest.” Such an objective can only be achieved through the careful evaluation of how well
the developed conceptual model addresses the explicit and/or implicit questions associated
with the requirements specification. This is accomplished through a formal design validation step
in the conceptual modeling process.
The Cougar Medical Associates case represents a real-world scenario and provides an
opportunity to employ several complex relationship constructs described in the chapter. This
case is developed all the way to a Design-Specific ER model ready for transformation to the
logical tier.
Exercises
1. What is a cluster entity type?
2. What is a weak relationship type? Contrast a condition-precedent weak relationship type
with an event-precedent weak relationship type.
3. What is a composite relationship type?
4. What is required in order to decompose a ternary or higher-order relationship type in
preparation for mapping to a logical schema? What is the cardinality ratio of each of the
resulting set of binary relationships?
5. What is required to decompose a cluster entity type in preparation for mapping to a logical
schema?
6. Figures 5.4, 5.7, and 5.39 show the decomposition of a ternary relationship. Explain why
the entity types SCHEDULE and PRESCRIPTION in Figures 5.4 and 5.7 are shown as
base entity types while the entity type USE in Figure 5.39 is shown as a weak entity type.
7. Consider the ERD in Figure 5.5. State meaningful semantics for additional binary relation-
ships among the entity types in the diagram and update the ERD accordingly with full
specification of the structural constraints of the relationship types you have proposed. Can
the three binary relationships among the three entity types present collectively capture the
semantics conveyed by the ternary relationship type? State the condition under which the
answer is “yes” and the condition under which the answer is “no.”
8. The Design-Specific ERD in Figure 5.13a requires further decomposition before it can be
mapped to the logical tier. Specify the final form of the Design-Specific ERD that will render
this design ready for mapping to a logical schema.
9. Decompose the Design-Specific ERDs in Figures 5.17 and 5.18 to the final Design-Specific
stage and explain the differences.
10. Convert the ERD in Figure 5.22 to the final form of Design-Specific ERD.
11. Transform the Design-Specific ERD in Figure 5.28 to a final-form Design-Specific ERD.
12. The weak relationship type shown in Figure 5.32 requires further decomposition preparatory
to mapping to a logical schema. Develop the final form of the Design-Specific ERD.
13. Business Process, Inc. (BPI), a consulting company offering business process reengineer-
ing and application system development expertise, wants to develop a prototype of a
simple University Registration System (UNIVREG) to handle student/faculty information,
course/section schedules, and co-op and lab information. Many small universities are in
need of such a system. BPI believes that the profit potential from economies of scale alone
in custom-fitting such an IS application to small universities that primarily offer a small
number of programs is an attractive business opportunity. You have recently been hired by
BPI and assigned to develop the conceptual design for this application. Here are the data
specifications:
A university has several departments and these departments employ faculty (professors) for
purposes of teaching, research, and administration. A department may have many profes-
sors but has to employ at least five. A professor, however, belongs to only one department
at any time. In addition to teaching, some of the professors may work as department heads.
Each department has a department head, but no more than one. Every department should
continue to exist as long as it has at least one professor associated with it or it offers at
least one course. If a faculty member serving as a department head leaves the university,
some other professor (often, the most senior faculty member of the department) assumes
the role by default.
The departments may offer several courses as part of their academic missions. However,
any particular course is offered by only one department. Not all courses are offered all the
time, but every course is offered sometime. When a course is offered, multiple sections of
that course may be offered during a specific quarter of the year. If a particular course is no
longer offered, all offerings (sections) of that course should be deleted unless there are
students enrolled in the course sections. If, however, a student leaves, that student’s
enrollment in all associated course sections should be removed. A course may be a pre-
requisite for several other courses, but a course may have no more than one prerequisite.
A course cannot be removed from the database as long as it is a prerequisite for other
course(s); however, if it only has prerequisites, then its deletion should be accompanied by
the removal of its links to all of its prerequisites.
Some professors in the university also write textbooks. Sometimes, more than one professor
may be co-authoring a book, but all textbooks used by the university need not have one
of its professors as an author. Some professors are also authors of multiple textbooks.
There is no plan for this system to record the authorship of professors who are not working
at this university. When a professor leaves the university, the university no longer keeps
track of books written by that professor. Likewise, if a textbook is no longer in use, the
authorship of the textbook is not preserved. The system also needs to record which profes-
sor uses what book in which course. Removal of a textbook from the database is prohibited
if it is used by a professor in a course. However, if a course is removed from the catalogue,
its link to a professor using a particular textbook is also removed. Likewise, if a faculty
member leaves the university, the link to the textbook used by the professor in a specific
course is deleted. All professors teach, and a professor may teach several course sections.
A course section, however, is taught by just one professor and must have some professor
assigned to teach it. Some of the course sections may have multiple lab sessions in a
quarter. Each lab session caters to only one course section—that is, there are no joint lab
sessions. If a course section has an associated lab session, cancellation of the course
section is not permitted.
Students enroll in course sections. In fact, to remain a student, one has to take at least one
course (section), but university rules forbid a student from taking more than six courses
(sections) in a quarter. Each section has to have at least 10 students enrolled; otherwise, it
will be cancelled. If a student has registered for a section, the section should continue to
exist. Also, when a professor is assigned to teach a section, deletion of the professor’s
record is prohibited. The university admits mostly graduate and undergraduate students, but
a few non-matriculating students are also admitted. The undergraduate students may, as
part of their academic programs, enroll for professional practice (co-op) sessions with com-
panies. Several students may be enrolled in the same co-op session, and a co-op session
has at least one undergraduate student enrolled in it. An undergraduate student can co-op
more than once. When a graduate student leaves the university for whatever reason, the
associated graduate student record is deleted from the database; if an undergraduate stu-
dent leaves, the associated student record is not deleted so that the co-op status of the
student can be properly verified. If, having verified the co-op status, the decision is made to
drop the undergraduate student information from the system, all co-op enrollments for that
student should also be erased. Cancellation of a co-op session is prohibited if there is/are
student(s) enrolled in it. As part of their academic experience some of the graduate stu-
dents are assigned to conduct one or more lab sessions. A lab session can be conducted
by at most one graduate student, but some lab sessions are not assigned to any graduate
student. When a graduate student graduates, the lab sessions assigned to him/her cannot
be cancelled; instead, the capability should exist to indicate that, for the present, the lab
session is not handled by a graduate student.
Students borrow books from a single (main) library on the campus. A student may borrow a
lot of books, and a book may be borrowed by several students, when available. Book-
returns by students are also recorded in this system. The return pattern is the same as that
of borrowing. Deletion of a student record is not allowed if he or she has any borrowed
books outstanding. If a book is removed from the library catalogue, all borrow and return
links for that book are removed. When a student leaves, the book-return links for that stu-
dent are also discarded. It is important to note that a book should have been borrowed in
order for it to be returned.
The registration system should capture student information like the name [o], address, and
a unique student ID for each student. Please note that optional attributes are marked by
an [o]; so, the rest of the attributes are mandatory. In addition, the status of the student
should be recorded. For undergraduate students, data on the student’s concentrations
should be available; all undergraduate students have multiple (at least two) concentrations.
Thesis option [o] and the undergraduate major of each graduate student should be cap-
tured by this system. A co-op session is identified by year and quarter, and each co-op
session has a session manager [o]. A particular student during a particular co-op session
works in a company, and the database should record the name of the company and co-op
assessment [o] for the student for each co-op session. Every professor has a name,
employee ID, office [o], and phone [o]. Both professor name and employee ID have unique
values. Data gathered about a department are: department name [o], department code [o],
location, and phone# [o]. For a department, the name and code are both unique. The
courses offered have data on course name, credit hours, college [o], and course#. The
course# is used to distinguish between courses. Each course may have multiple course
sections, with data including the classroom [o], class time, class size [o], section number,
quarter, and year. There is no unique identifier for course section because the course sec-
tion has existence dependency on course; section number, quarter, and year together in
conjunction with course# can uniquely identify course sections. The grade a student makes
in a particular course should be available through the system. The lab sessions have infor-
mation about the topic [o], time, lab location, and the lab session number for a given course
section. Attributes of textbooks include ISBN, the unique identifier, year [o], title, and pub-
lisher [o]. The library books, on the other hand, are identified by a call#. The ISBN# and
copy# together also identify a copy of the book. The name of the book [o] and author [o]
are also recorded.
a. Develop a Presentation Layer ER model for the UNIVREG. The ERD should be fully
specified, with the unique identifiers, other attributes for each entity type, and the rela-
tionship types that exist among the various entity types. All business rules that can be
captured in the ERD must be present in the ERD. Any business rule that cannot be
captured in the ERD should be specified as part of a list of semantic integrity
constraints.
b. Incorporate the following business rule into the Presentation Layer ER model: No two
courses can be taught by the same professor using the same textbook.
c. Transform the Presentation Layer ER model developed in Exercise 13a to a Design-
Specific ER model. Note that attribute characteristics are not provided and thus need
not appear in the ERD.
PART II
LOGICAL DATA MODELING
INTRODUCTION
At the completion of the conceptual modeling phase, the systems analyst/application developer
usually has a reasonably clear understanding of the data requirements for the application system at
a high level of abstraction. It is important to note that a conceptual data model is technology indepen-
dent. During conceptual modeling, the analysis and design activities are not constrained by the
technology that will be used for implementation. A conceptual schema may contain constructs not
directly compatible with the technology intended for implementation. In addition, some of the design
may require refinement to eliminate data redundancy. The next step after conceptual modeling, then,
is to transform the conceptual schema into a logical schema that is more compatible with the
Part II introduces logical data modeling, which serves as the transition between the
conceptual schema and the physical design. Logical data modeling begins with the crea-
tion of a technology-independent logical schema, proceeds through the normalization
process,1 and concludes with a technology-dependent logical schema expressed using the
relational modeling grammar. Figure II.1 points to where we are now in the database
development process.
[Figure II.1 (diagram): the database development process. Stages shown, in order: Universe of
Interest; Requirements Specification (Process Specifications, Data Specifications); Process Model;
Conceptual Design/Schema [ER Modeling Grammar], comprising the ER Diagram and the Design-Specific
ER Model plus the updated semantic integrity constraints list; Logical Data Modeling, marked
"We are here," comprising the Technology-Independent Logical Schema [Information-Preserving
Grammar], Normalization, and the Technology-Dependent Logical Schema; Physical Design/Schema.]
1
Normalization is the subject of Chapters 7, 8, and 9 in Part III.
2
Two basic logical data structures—inverted tree and network—underlie conventional database design;
and three basic data model architectures—relational, hierarchical, and CODASYL (Conference on
Data Systems Languages)—employ one or both of the logical data structures. A brief discussion of the
inverted tree and network data structures and an overview of how the hierarchical and CODASYL
data model architectures express the inverted tree and network data structures respectively are pre-
sented in Appendix A. Object-oriented concepts have drawn considerable attention among researchers
and practitioners since the late 1980s and have significantly influenced efforts to incorporate in the
DBMS the ability to process complex data types beyond just storage and retrieval. Appendix B briefly
introduces the reader to object-oriented concepts exclusively from a database—or, to be more precise,
from a data modeling perspective.
CHAPTER 6
THE RELATIONAL DATA MODEL
This chapter introduces the relational data model and the process of mapping a conceptual
schema that is technology independent to a logical schema (in this case, a relational data
model) that provides the transition to a technology-dependent database design. Recall that
in Part I of this book, the ER modeling grammar was used to develop a conceptual data
model in the form of a Design-Specific ER model. It is this model that serves as the input
for the logical data modeling activity.
The chapter flows as follows. Section 6.1 formally defines a relation, and Section 6.2
gives an informal description of a relation. Section 6.3 discusses the data integrity con-
straints pertaining to a relational data model. This is followed by a brief introduction in
Section 6.4 to relational algebra as a means of specifying the logic for data retrieval from a
series of relations. Section 6.5 introduces the concepts of views and materialized views as
different ways of looking at data stored in relations. The rest of the chapter discusses
mapping a conceptual schema to its logical counterpart using several examples. Section
6.6 introduces the idea of information preservation in data model mapping. Sections
6.7.1.1 and 6.7.1.2 present a detailed discussion of fundamental methods for transforming
basic ER constructs (entity types, relationship types) to the logical tier. Section 6.7.1.3
demonstrates mapping techniques using Bearcat Incorporated’s Design-Specific ERD as
the source (conceptual) schema. The solution highlights the information-reducing nature
of the transformation process. Then, an information-preserving grammar for the logical
schema is presented in Section 6.7.2. A discussion of the heuristics for mapping EER
constructs to the logical schema and the metadata lost in the transformation process
follows in Section 6.8.1. Next, Section 6.8.2 presents the information-preserving grammar
for modeling EER constructs at the logical tier. Finally, mapping complex ER modeling
constructs to the logical tier is covered in Section 6.9.
6.1 DEFINITION
In Foundation for Object/Relational Databases: The Third Manifesto, C. J. Date and Hugh
Darwen wrote the following: “The foundation of modern database technology is without
question the relational model; it is that foundation that makes the field a science. Thus,
any book on the fundamentals of database technology that does not include a thorough
coverage of the relational model is by definition shallow. Likewise, any claim to expertise
in the database field can hardly be justified if the claimant does not understand the rela-
tional model in depth.”3
E. F. Codd proposed the relational data model in 1970 as a logically sound basis for
describing the structure of data as well as data manipulation operations.4 The model uses
the concept of mathematical relations as its foundation and is based on set theory. The
relational data model includes a group of basic data manipulation operations called
relational algebra. This chapter discusses the structural aspect of the relational data
model and contains a brief introduction to relational algebra. Relational algebra is covered
in more detail in Chapter 11.
The simplicity of the concept and its sound theoretical basis are two reasons why the
relational data model has gained popularity as a logical data model for database design. As
the name implies, the relational data model represents a database as a collection of rela-
tion values (“relations,” for short), where a relation resembles a two-dimensional table of
values presented as rows and columns. A row in the table represents a set of related data
values and is called a tuple. All values in a column are of the same data type. A column is
formally referred to as an attribute. The set of all tuples in the table goes by the name
relation. A relation consists of two parts: (a) an empty shell called the heading, which is a
tuple of attribute names, and (b) a body of data that inhabits the shell; this body of data is
a set of tuples all having the same heading. The heading of a relation is also referred to in
the literature as a relation schema, schema, scheme, or intension. When the heading is
called intension, then the body of the relation is referred to as extension.
Recall from Chapter 2 that the domain of an attribute is the set of possible values for
the attribute. In a relational data model, a domain is defined as a set of atomic values for
an attribute, and an attribute is the name of a role played by a domain in the relation.
In formal terms, an attribute, A, is an ordered pair (N, D) in which N is the name of
the attribute and D is the domain that the named attribute represents. If r is a relation
whose structure is defined by a set of attributes A1, A2, …, An, then R (A1, A2, …, An) is
called the relation schema5 of the relation r. In other words, a relation schema, R, is a
named collection of attributes (R, C) in which R is the name of the relation schema and C
is the set {(N1, D1), (N2, D2), …, (Nn, Dn)}, where N1, N2, …, Nn are distinct names. r is
the relation (or relation state) over the schema R. The domain of Ai (i = 1, 2, …, n) is
often denoted as Dom (Ai). The number of attributes (n) in R is called the degree (or
arity) of R. A relation state r of the relation schema R (A1, A2, …, An), also denoted as
r (R), is a set of n-tuples {t1, t2, …, tm}. Each n-tuple tj (j = 1, 2, …, m) in r (R) is an
ordered list of n values <v1j, v2j, …, vnj> where each vij (i = 1, 2, …, n; j = 1, 2, …, m) is
an element of Dom (Ai) (i = 1, 2, …, n) or, when allowed, a missing value represented by
a special value called null. The number of tuples, m, in the relation state is called the
3
Date, C. J., and Hugh Darwen. Foundation for Object/Relational Databases, Addison-Wesley, 1998.
4
Codd, E. F. “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM, 13, 6
(June, 1970) 377–387.
5
A relation schema is sometimes loosely referred to as a relation. C. J. Date (2004) has coined the
term “relvar” (for relation variable) to distinguish a relation schema from a relation. It is important
to notice the difference between a relation and a relation schema as well as between a relation
schema and a relational schema. A relational schema defines a set of relation schemas in a
relational data model.
cardinality of the relation. Figure 6.1 shows an example of a relation schema of degree 3
and two relation states of cardinality 5 and 4, respectively, for that relation schema.
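The formal definition above can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the text: the class names RelationSchema and RelationState, the PRESCRIPTION schema used in the usage lines, and the choice to model a domain as a membership test are all assumptions. For simplicity it also allows null (None) for every attribute, whereas the definition allows it only "when allowed."

```python
from dataclasses import dataclass, field
from typing import Callable, Tuple

# An attribute is an ordered pair (N, D): a name and a domain.
# Assumption of this sketch: a domain is modeled as a membership test.
Attribute = Tuple[str, Callable[[object], bool]]


@dataclass(frozen=True)
class RelationSchema:
    name: str
    attributes: Tuple[Attribute, ...]

    def __post_init__(self):
        # N1, N2, ..., Nn must be distinct names.
        names = [n for n, _ in self.attributes]
        if len(set(names)) != len(names):
            raise ValueError("attribute names must be distinct")

    @property
    def degree(self) -> int:
        # Degree (arity) is the number of attributes n.
        return len(self.attributes)


@dataclass
class RelationState:
    schema: RelationSchema
    tuples: set = field(default_factory=set)  # a set, so duplicates collapse

    def insert(self, *values) -> None:
        if len(values) != self.schema.degree:
            raise ValueError("expected an n-tuple with n equal to the degree")
        for (name, in_domain), value in zip(self.schema.attributes, values):
            # None stands in for null; this sketch allows it everywhere.
            if value is not None and not in_domain(value):
                raise ValueError(f"{value!r} is not in Dom({name})")
        self.tuples.add(values)

    @property
    def cardinality(self) -> int:
        # Cardinality is the number of tuples m in the relation state.
        return len(self.tuples)


def is_text(value) -> bool:
    return isinstance(value, str)


# Hypothetical schema of degree 3 and a state of cardinality 2.
schema = RelationSchema(
    "PRESCRIPTION",
    (("Rx_rx#", is_text), ("Rx_pat#", is_text), ("Rx_medcode", is_text)),
)
r = RelationState(schema)
r.insert("A100", "P1", "M1")
r.insert("A100", "P1", "M2")
r.insert("A100", "P1", "M1")  # duplicate tuple, ignored by the set
```

Storing the body as a Python set mirrors the statement that a relation state is a set of n-tuples: inserting the same tuple twice leaves the cardinality unchanged.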
6
An implicit assumption of the relational database theory is that attributes have unique names over
the entire relational schema. Additional discussion of this issue follows in Section 6.6.2.
Since the relational database theory stipulates that attribute names must be unique over
the entire relational schema, the following guidelines represent the approach used in this
chapter for developing attribute names:
• Each attribute name begins with a prefix of up to three letters that represents
a meaningful abbreviation of the name of the relation schema to which the
attribute belongs. This prefix is followed by an underscore character.
• Only the first letter of the prefix is capitalized.
• Following the underscore character is a suffix that corresponds to the attri-
bute name itself. This suffix may contain only lowercase letters, the pound
sign (#), and underscore characters; it corresponds to the name of the attri-
bute in the conceptual data model.
These guidelines were used in developing the attribute names for the PLANT relation
schema in Figure 6.1.
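The naming guidelines above can be expressed as a simple pattern check. The following Python is a minimal sketch, not part of the text: the function names follows_guidelines and build_attribute_name are hypothetical, and the regular expression is one possible reading of the three guidelines (a prefix of one to three letters with only the first capitalized, an underscore, then a suffix of lowercase letters, pound signs, and underscores).

```python
import re

# Prefix: one capital letter plus up to two lowercase letters, then "_",
# then a suffix made only of lowercase letters, "#", and "_".
ATTRIBUTE_NAME = re.compile(r"[A-Z][a-z]{0,2}_[a-z#_]+")


def follows_guidelines(name: str) -> bool:
    """Return True if the whole name matches the naming pattern."""
    return ATTRIBUTE_NAME.fullmatch(name) is not None


def build_attribute_name(prefix: str, attribute: str) -> str:
    """Combine a relation-schema abbreviation with a conceptual attribute name."""
    suffix = attribute.lower().replace(" ", "_")
    name = f"{prefix.capitalize()}_{suffix}"
    if not follows_guidelines(name):
        raise ValueError(f"{name!r} does not follow the naming guidelines")
    return name
```

For instance, build_attribute_name("rx", "Medcode") yields Rx_medcode, the form used for the PRESCRIPTION relations later in this chapter, while a four-letter prefix is rejected.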
7
A relation maps to a flat file at the physical level. This is supported by the assumption of the theory
behind the relational model called the First Normal Form assumption. A discussion of normal forms
begins in Chapter 8.
textual form and are often referred to as semantic integrity constraints,8 whereas the
constraints directly expressed in the schema of the data model are labeled as schema-
based or declarative constraints.9 Domain constraints, key constraints, entity integrity
constraints, referential integrity constraints, and functional dependency constraints are
part of the declarative constraints of a relational data model. (Entity integrity constraints
and referential integrity constraints are discussed in the following sections. Functional
dependency constraints are discussed in Chapter 7.) Uniqueness constraints and structural
constraints of a relationship type are declarative constraints specified at the conceptual
level in an ER model. Semantic integrity constraints that require procedural
intervention can always be specified and enforced through application programming code
and are called application-based constraints. Procedural enforcement of integrity constraints
is also often possible via mechanisms incorporated in the DBMS, such as triggers and
assertions, that use general-purpose constraint specification languages.
Since every valid state of a database must satisfy the declarative and procedural forms of
the integrity constraints noted earlier, these constraints are collectively referred to as state
constraints. Sometimes, integrity constraints may have to be specified to define legal transi-
tions of state. For example, the value of the attribute Marital_status can change from Married to
Divorced or Widowed, but not to Single. These types of constraints are referred to as transition
constraints and invariably require procedural language support either through application
programs or general-purpose constraint specification languages of the DBMS.
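As an illustration of application-based enforcement of a transition constraint, the sketch below checks a proposed change to Marital_status before it is applied. It is a minimal example under stated assumptions: only the rule that Married may change to Divorced or Widowed, but not to Single, comes from the text; the remaining entries in the table and the function names are hypothetical.

```python
# Legal next states for each current state of Marital_status.
# Only the "Married" entry is taken from the text; the others are assumed.
LEGAL_TRANSITIONS = {
    "Single": {"Married"},
    "Married": {"Divorced", "Widowed"},
    "Divorced": {"Married"},
    "Widowed": {"Married"},
}


def is_legal_transition(old_status: str, new_status: str) -> bool:
    """A transition constraint compares the old state with the new one."""
    if old_status == new_status:
        return True  # no change of state, nothing to check
    return new_status in LEGAL_TRANSITIONS.get(old_status, set())


def update_marital_status(record: dict, new_status: str) -> None:
    """Apply the update only if the state transition is legal."""
    old_status = record["Marital_status"]
    if not is_legal_transition(old_status, new_status):
        raise ValueError(f"illegal transition: {old_status} -> {new_status}")
    record["Marital_status"] = new_status
```

Unlike a state constraint, which can be checked against the new value alone, this check needs both the old and the new value, which is why such rules call for procedural support.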
8
Note, however, that all integrity constraints pertain to the “semantics” of the database application.
9
A third category of constraints is inherent to the data model; these are called inherent model-based
constraints and do not necessarily emerge from the semantics of the application. For example, the
characteristics of a relation stated in Section 6.2 are inherent to a relational schema. Similarly, an
inherent constraint of an ERD is that a relationship type can link only entity types.
PRESCRIPTION-A
PRESCRIPTION-B
10
A set S1 is a superset of another set S2 if every element in S2 is in S1. S1 may have elements that
are not in S2.
11
A set S2 is a subset of another set S 1 if every element in S2 is in S1 . S1 may have exactly the
same elements as S2 . If S2 is a subset of S1, S1 is a superset of S 2. A set S2 is a proper subset of
another set S1 if every element in S2 is in S1 and S1 has some elements which are not in S2.
TABLE 6.2 Superkeys and candidate keys in the PRESCRIPTION-A and PRESCRIPTION-B relations
While (Rx_rx#), (Rx_pat#), (Rx_medcode), and (Rx_rx#, Rx_pat#) are not superkeys of
PRESCRIPTION-B, (Rx_rx#, Rx_medcode) and (Rx_pat#, Rx_medcode) are indeed superkeys
of PRESCRIPTION-B. Therefore, (Rx_rx#, Rx_pat#, Rx_medcode) is not a candidate key of
PRESCRIPTION-B because two of its proper subsets are superkeys of PRESCRIPTION-B.
The same procedure can be used to evaluate if any other superkey of PRESCRIPTION-B
is also a candidate key of PRESCRIPTION-B. A review of the data in Table 6.1 indicates that
none of the other superkeys is also a candidate key of PRESCRIPTION-B.
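The procedure of testing every proper subset of a superkey can be sketched in a few lines of Python. The tuples below are hypothetical (they are not the data of Table 6.1) and were chosen only so that they behave like PRESCRIPTION-B in the discussion above. Keep in mind that a key is a property of the relation schema; sample data can refute a proposed key but can never prove one.

```python
from itertools import combinations

ATTRS = ("Rx_rx#", "Rx_pat#", "Rx_medcode")

# Hypothetical tuples, NOT the contents of Table 6.1.
ROWS = [
    {"Rx_rx#": 1, "Rx_pat#": 10, "Rx_medcode": "A"},
    {"Rx_rx#": 1, "Rx_pat#": 10, "Rx_medcode": "B"},
    {"Rx_rx#": 2, "Rx_pat#": 20, "Rx_medcode": "A"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, all_attrs):
    """Superkeys that have no proper subset which is itself a superkey."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            if not is_superkey(rows, combo):
                continue
            # Smaller candidate keys were found first, so it is enough
            # to check whether one of them is a proper subset of combo.
            if not any(set(k) < set(combo) for k in keys):
                keys.append(combo)
    return keys

keys = candidate_keys(ROWS, ATTRS)
```

For these sample tuples, the full attribute set is a superkey but is rejected as a candidate key because two of its proper subsets are superkeys.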
Every attribute, whether atomic or composite, plays one of three roles in a relation
schema. It is a key attribute, a non-key attribute, or a candidate key. Any attribute that is
a proper subset of a candidate key is a key attribute. An attribute that is not a subset of a
candidate key is a non-key attribute. For example, in PRESCRIPTION-A, because
(Rx_pat#, Rx_medcode) is a candidate key, Rx_pat# and Rx_medcode are key attributes.
Because Rx_rx#, an atomic attribute, is a candidate key of PRESCRIPTION-A, it is, by
definition, not a key attribute in PRESCRIPTION-A because Rx_rx# is not a proper subset
of Rx_rx#. However, Rx_rx# is not a non-key attribute either, because it is a candidate key
of PRESCRIPTION-A. In short, a candidate key in itself is neither a key attribute nor a
non-key attribute in R. Further discussion of this appears in Chapter 7.
A primary key serves the role of uniquely identifying tuples of a relation. A primary
key is a candidate key (an irreducible unique identifier) with one additional property. This
additional property results from what is known as the entity integrity constraint, which
specifies that the primary key of a relation schema cannot have a “missing” value (i.e., a
null value), essentially assuring identification of every tuple in a relation. Given a set of
candidate keys for a relation schema, exactly one is chosen as the primary key. Since the
entity integrity constraint applies exclusively to a primary key, by implication the rest of
the candidate keys12 not chosen as the primary key apparently tolerate “missing” values;
otherwise, the entity integrity rule would apply to all candidate keys (Date, 2004). In short,
when a candidate key of a relation schema is chosen to be the primary key of that relation
schema, it is bound by the entity integrity constraint, and from that point forward, the
alternate keys (other candidate keys) may entertain “missing” values.13 What happens
when an alternate key is chosen as the primary key at a later time? The answer is inher-
ent in the definition of the primary key—that is, the entity integrity constraint must be
enforced in order for the alternate key to become the primary key of the relation schema.
Since the primary key value is used to identify individual tuples in a relation, if null
values are allowed for the primary key, some tuples cannot be identified; hence, the entity
integrity constraint, which disallows null values for a primary key. A primary key of a
relation schema is denoted by underlining the atomic attributes (or single attribute) that
constitute a primary key. In Table 6.3, Pl_p# is the primary key of PLANT. Observe that
both Pl_name and Pl_p# are candidate keys of PLANT as modeled in the conceptual
schema (refer to Figure 3.11) and that Pl_p# has been chosen by the systems analyst/
database designer as the primary key. This implies that the entity integrity constraint is
imposed only on Pl_p# and not on Pl_name.
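A minimal sketch of how this distinction might be declared, assuming SQLite accessed through Python's sqlite3 module: the primary key is declared NOT NULL, while the alternate key is declared only UNIQUE and therefore tolerates a missing value. The tuple values are illustrative and are not the contents of Table 6.3. (The column type INT is used rather than INTEGER because SQLite treats an INTEGER PRIMARY KEY specially and would silently generate a value for a null.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE PLANT (
    "Pl_p#"  INT  PRIMARY KEY NOT NULL,  -- primary key: entity integrity
    Pl_name  TEXT UNIQUE                 -- alternate key: null tolerated
)""")

def try_insert(p_no, name):
    """Return True if the DBMS accepts the tuple, False otherwise."""
    try:
        conn.execute("INSERT INTO PLANT VALUES (?, ?)", (p_no, name))
        return True
    except sqlite3.IntegrityError:
        return False

ok_full      = try_insert(12, "Whitefield")    # illustrative values
ok_null_name = try_insert(15, None)            # alternate key missing
bad_null_pk  = try_insert(None, "Ashton")      # violates entity integrity
bad_dup_name = try_insert(19, "Whitefield")    # violates uniqueness
```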
12
Once the primary key is chosen from among the set of candidate keys, the remaining candidate
keys are referred to as “alternate keys.” The choice of a primary key of a relation schema from
among its candidate keys is essentially arbitrary.
13
This indeed is in contradiction with the strict definition of a relation—i.e., a relation cannot have
missing value for any of its attributes. Nonetheless, relaxation of this constraint is very common in
practice.
A foreign key constraint is just a special kind of inclusion dependency. In terms of the
foreign key constraint, the foreign key value in a tuple of r2 represents a reference to the
tuple in r1 that contains the matching candidate key value. When a relation
schema includes a foreign key that references some candidate key in the same relation
schema, the relation schema is said to be self-referencing (equivalent to a recursive
relationship type in the ERD).
As an example, consider a scenario in which manufacturing plants undertake pro-
jects. All plants need not undertake projects, but any plant may undertake several
projects. Likewise, a project is controlled by only one plant and not all projects are
controlled by plants. Version 1 of the PROJECT relation in Table 6.3 uses Prj_pl_p# as
the foreign key referencing the primary key of PLANT (Pl_p#). Observe that the name
of the foreign key attribute begins with the prefix Prj that represents an abbreviation
14
If the participation of R2 in this relationship with R1 is partial, the foreign key attribute, A2, can
have null values.
15
The relational model originally required that foreign keys reference, very specifically, the primary
key, not just candidate keys. This limitation is unnecessary and undesirable in general, although it
might often constitute good discipline in practice (Date, 2004, p. 274).
of the referencing relation schema name; the suffix is the name of the referenced
attribute Pl_p# from the referenced relation schema. The constraint is expressed as
follows:
PROJECT.{Prj_pl_p#} ⊆ PLANT.{Pl_p#} or Ø
Here, Ø indicates “null,” meaning that a project need not be controlled by a plant.
In Version 2 of the PROJECT relation, Prj_pl_name in PROJECT references Pl_name in
PLANT, a candidate key of PLANT, not the primary key of PLANT.
The constraint is expressed as follows:
PROJECT.{Prj_pl_name} ⊆ PLANT.{Pl_name} or Ø
The naming convention applied here to name a foreign key attribute in the referencing
relation schema consists of (a) the prefix used in conjunction with the attribute names in
the referencing relation schema, (b) an underscore, and (c) the referenced attribute name.
The example in Table 6.3 demonstrates enforcement of a referential integrity constraint
along with this naming convention. Observe that in its current state, all projects in the
PROJECT relation are controlled by some plant.
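The meaning of such a constraint can be sketched as a simple check: every foreign key value that is not null must appear among the referenced key values. The tuples and the attribute Prj_name below are hypothetical and do not reproduce Table 6.3; the function name is likewise an assumption of this sketch.

```python
def satisfies_inclusion(referencing_rows, fk_attr, referenced_rows, key_attr):
    """Check R2.{fk} ⊆ R1.{key} or Ø: every non-null foreign key value
    must match some key value in the referenced relation."""
    key_values = {row[key_attr] for row in referenced_rows}
    return all(
        row[fk_attr] is None or row[fk_attr] in key_values
        for row in referencing_rows
    )

# Hypothetical tuples, not the contents of Table 6.3.
plant = [
    {"Pl_p#": 11, "Pl_name": "Plant A"},
    {"Pl_p#": 12, "Pl_name": "Plant B"},
]
project = [
    {"Prj_name": "P1", "Prj_pl_p#": 11},
    {"Prj_name": "P2", "Prj_pl_p#": None},  # not controlled by any plant
]
# Adding a project that references a nonexistent plant breaks the constraint.
with_orphan = project + [{"Prj_name": "P3", "Prj_pl_p#": 99}]

valid = satisfies_inclusion(project, "Prj_pl_p#", plant, "Pl_p#")
invalid = satisfies_inclusion(with_orphan, "Prj_pl_p#", plant, "Pl_p#")
```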
16
A formal discussion of relational algebra, including the Cartesian product and Division operations,
appears in Chapter 11.
from a relation. Selection and Projection are referred to as unary operations because they
produce a new relation by manipulating only a single relation.
Example of a Selection operation: Which award-winning plants have a budget that
exceeds $2,000,000?
Result:17
Whitefield 12 2910000
King’s Island 19 2500000
Ashton 15 2500000
Example of a Projection operation: What is the plant number and budget of each
award-winning plant?
Result:
R_aw_pl_p# R_aw_pl_budget
11 1230000
13 1930000
12 2910000
17 1930000
19 2500000
15 2500000
If the attributes involved in a Projection operation do not together uniquely identify the tuples
of the original relation, the operation may generate duplicate tuples. If this occurs, the duplicate tuples are deleted.
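Because a relation is a set of tuples, both unary operations can be sketched with Python sets. The sketch below uses only the (plant number, budget) pairs that appear in the Projection result above; the function names select and project are assumptions of this sketch, not notation from the relational algebra.

```python
# (plant number, budget) pairs taken from the Projection result above.
AW_PLANT = {
    (11, 1230000), (13, 1930000), (12, 2910000),
    (17, 1930000), (19, 2500000), (15, 2500000),
}

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, *positions):
    """Projection: keep the listed attribute positions; because the
    result is a set, duplicate tuples disappear automatically."""
    return {tuple(t[i] for i in positions) for t in relation}

over_2m = select(AW_PLANT, lambda t: t[1] > 2_000_000)
budgets = project(AW_PLANT, 1)   # budget alone is not unique
```

Projecting on the budget alone yields four tuples rather than six, since two budget values each occur twice in the source tuples.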
17
The result obtained in this and all other examples in this section produces the new relation
RESULTS, which contains its own unique attribute names.
Example of the Union operation: What plants are either located in Texas or are
award-winning plants?
Result:
Ashton 15 2500000
Kingwood 18 1930000
Observe that duplicate tuples are omitted (i.e., the River Oaks plant, an award-winning
plant located in Texas, appears only once in the result). In addition, note that AW_PLANT
and TX_PLANT are union compatible.
Example of the Difference operation: Which Texas plants are not award-winning
plants?
Result:
Kingwood 18 1930000
Note the difference between this result and the following result, to the question
“Which award-winning plants are not located in Texas?”
Result:
Whitefield 12 2910000
Ashton 15 2500000
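The set operations map directly onto Python's set operators. The sketch below is deliberately reduced: it carries only the plant name and only a few of the plants named in the examples above, so it should be read as an abbreviated stand-in for AW_PLANT and TX_PLANT rather than as the full relations. The helper union_compatible is an assumption of this sketch and checks only the number of attributes, not their domains.

```python
# Abbreviated, name-only stand-ins for the two relations.
AW_PLANT = {("Whitefield",), ("Ashton",), ("River Oaks",)}
TX_PLANT = {("Kingwood",), ("River Oaks",)}

def union_compatible(r, s):
    """Rough check: all tuples in both relations have the same degree.
    (Matching domains would also be required; not checked here.)"""
    degrees = {len(t) for t in r} | {len(t) for t in s}
    return len(degrees) == 1

either    = AW_PLANT | TX_PLANT   # Union: River Oaks appears only once
tx_not_aw = TX_PLANT - AW_PLANT   # Difference
aw_not_tx = AW_PLANT - TX_PLANT   # Difference with the operands reversed
```

Union is commutative, whereas Difference is not: reversing the operands changes the result.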
6.4.3 The Natural Join (*) Operation
The Join operation combines two relations into a third relation by matching values for
attributes in the two relations that come from the same domain. The tuples in the new
relation consist of the tuples extracted from the first relation concatenated with each tuple
in the second relation where there is a match on the joining attributes. When the new
relation contains all the attributes from the first relation plus all the attributes from the
second relation but does not redundantly carry the joining attributes, the result is called a
Natural Join.18
Example of a Natural Join operation: Perform a natural join of the award-winning
plant and project relations.
Result:
Note that the joining attributes here are Aw_pl_p# and Prj_aw_pl_p#. Relational algebra
operations can be combined to form more complex expressions. Using the Natural
Join operation (as shown here) followed by a Selection operation on plant number 11
and a Projection operation on the project name yields a relation that contains the
names of the projects controlled by plant number 11 (i.e., Solar Heating and Robot
Sweeping).
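The combined expression just described (Natural Join, then Selection, then Projection) can be sketched as follows. Only plant 11, its budget, and its two project names come from the examples in this section; the attribute names Aw_pl_budget and Prj_name, the second plant's project, and the function natural_join are assumptions made for illustration.

```python
AW_PLANT = [
    {"Aw_pl_p#": 11, "Aw_pl_budget": 1230000},
    {"Aw_pl_p#": 12, "Aw_pl_budget": 2910000},
]
PROJECT = [
    {"Prj_name": "Solar Heating",  "Prj_aw_pl_p#": 11},
    {"Prj_name": "Robot Sweeping", "Prj_aw_pl_p#": 11},
    {"Prj_name": "Placeholder Project", "Prj_aw_pl_p#": 12},  # hypothetical
]

def natural_join(left, right, left_attr, right_attr):
    """Concatenate matching tuples, carrying the joining attribute once."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[left_attr] == r_row[right_attr]:
                row = dict(l_row)
                row.update(
                    {k: v for k, v in r_row.items() if k != right_attr}
                )
                joined.append(row)
    return joined

joined = natural_join(AW_PLANT, PROJECT, "Aw_pl_p#", "Prj_aw_pl_p#")
plant_11 = [row for row in joined if row["Aw_pl_p#"] == 11]   # Selection
names = {row["Prj_name"] for row in plant_11}                 # Projection
```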
18
Three other types of joins exist: Equijoin, Theta Join, and Outer Join. These are discussed in
Chapter 11.
19
Date, C. J., and Hugh Darwen. Foundation for Object/Relational Databases, Addison-Wesley, 1998.
20
Normalization is covered in Part III.
21
Deletion of an attribute or a relation schema usually does not yield an information-equivalent logi-
cal structure; if it does, this implies that the original conceptual schema had information
redundancy.
for by re-creating the original structure to serve as the source schema(s) for the con-
struction of the external schema (e.g., a denormalized view of the normalized relation
schemas that simulates the original logical structure). Views are an effective means to
achieve logical data independence as long as any restructuring of the conceptual schema
creates a version that is information-equivalent to the original conceptual schema.
A materialized view (also known as a snapshot), despite the similarity in name, is not
a view. Like a view, it is constructed from one or more relation schemas; unlike a view,
a materialized view is stored in the database and refreshed when updates occur to the
relation schemas from which the materialized view is generated. Materialized views are
often used to freeze data as of a certain moment without preventing further updates
to the data in the relation schemas on which they are based. A materialized view is often
deleted when it is not used for a period of time and then reconstructed from scratch as
future needs dictate.
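The difference between a view and a stored snapshot can be sketched with SQLite through Python's sqlite3 module. SQLite has no native materialized views, so the snapshot is simulated here with an ordinary table built from a query and refreshed by hand; the names PLANT_V, PLANT_SNAP, and Pl_budget, as well as the tuple values, are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PLANT (Pl_name TEXT, Pl_budget INT);
INSERT INTO PLANT VALUES ('Whitefield', 2910000);

-- A view stores only its definition and is evaluated when queried.
CREATE VIEW PLANT_V AS SELECT Pl_name, Pl_budget FROM PLANT;

-- A stored copy of the query result, standing in for a materialized view.
CREATE TABLE PLANT_SNAP AS SELECT Pl_name, Pl_budget FROM PLANT;

UPDATE PLANT SET Pl_budget = 3000000 WHERE Pl_name = 'Whitefield';
""")

def budget_in(source):
    """Read the single budget value from the named table or view."""
    return conn.execute(f"SELECT Pl_budget FROM {source}").fetchone()[0]

def refresh_snapshot():
    """Rebuild the stored copy from the current base data."""
    conn.execute("DELETE FROM PLANT_SNAP")
    conn.execute("INSERT INTO PLANT_SNAP SELECT Pl_name, Pl_budget FROM PLANT")

view_budget = budget_in("PLANT_V")        # reflects the update at once
stale_budget = budget_in("PLANT_SNAP")    # still frozen at the old value
refresh_snapshot()
fresh_budget = budget_in("PLANT_SNAP")
```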
22
Formally, a transformation in which all possible database instances that can be represented in a
source schema can be represented in a target schema implies information preservation (Fahrner and
Vossen, 1995). To that extent, the spirit of the discussion here pertains to design information
preservation.
23
Fahrner, C., and Vossen, G. “A Survey of Database Design Transformations Based on the Entity-
Relationship Model.” Data & Knowledge Engineering, 15, 3, 1995: 213–250.
Thus, it is crucial that a logical schema captures as much of the inherent, implicit, and
explicit constructs and constraints conveyed by the conceptual schema as possible
through the logical modeling grammar, and that it carry forward the remainder of the
semantic integrity constraints to the next step in the design process, which is physical
data modeling.
However, the popular mapping techniques that map directly to a relational
schema are information-reducing in nature. Section 6.7 discusses these techniques
and points out their information-reducing aspects. Then, Section 6.8 presents a new
information-preserving logical modeling grammar24 for transforming ER and EER models
to their logical counterparts.
24
An early version of this grammar was presented in the Workshop on Information Technology and
Systems (WITS) in December 2000 at Brisbane, Australia (Umanath and Chiang, 2000).
the primary key of EMPLOYEE because both are defined as candidate keys of EMPLOYEE
in the ERD. Likewise, either Pl_name or Pnumber becomes the primary key of PLANT.
Observe in Figure 6.3 that [Emp_e#a, Emp_e#n] and Pl_p# have been specified as the pri-
mary key of the relation schema (EMPLOYEE and PLANT, respectively), as indicated by
the underlining of these attributes.
FIGURE 6.3 Logical schema for the ERD in Figure 6.2: Foreign key design
A weak entity type in the ERD is mapped in a similar manner except that the primary
key of each identifying parent of the weak entity type is added to the relation schema. The
attributes thus added, along with the partial key of the weak entity type, form the primary
key of the relation schema representing the weak entity type. In other words, there is no
such thing as a “weak” relation schema. All relation schemas are “strong,” meaning
they have a primary key. Observe the mapping of the weak entity type BUILDING in
Figure 6.3. The primary key of PLANT, the only identifying parent of BUILDING, has been
concatenated to the partial key, Bld_building, to form the primary key of the relation
schema BUILDING.
Mapping the Cardinality Ratio of 1:n: Foreign Key Design When the cardinality ratio
of the relationship type under consideration is 1:n, the entity type on the n-side of the
relationship type is the child in the PCR (parent-child relationship), and hence it becomes
the referencing relation schema in the logical tier. Because each tuple in the child relation
is related to at most one tuple in the referenced (parent) relation, placing a
foreign key attribute (or attributes) in the referencing relation schema maps the
relationship type specified in the ERD. Of course, the foreign key placed in the referencing
schema shares the same domain with a candidate key (usually and preferably, but not
necessarily, the primary key) of the referenced relation schema. This is expressed
diagrammatically by drawing a directed arc originating from the foreign key attribute (or
attributes) to the relation schema it references, with the arrowhead terminating at the
referenced candidate (or primary) key. In addition, the design implies a referential
integrity constraint: every foreign key value must match some value of the referenced
candidate key, except, of course, when the foreign key value is null. This technique is
often labeled the foreign key technique/design.
In the example in Figure 6.2, the Works_in relationship type represents a PCR where
PLANT is the parent and EMPLOYEE is the child. Therefore, the mapping of Works_in is
implemented in the logical schema (Figure 6.3) by adding the attribute Emp_pl_name to
EMPLOYEE, which fulfills the role of a foreign key by referencing Pl_name, an alternate
key of PLANT.25 Also, in Figure 6.3, notice the directed arc originating from
EMPLOYEE.Emp_pl_name and pointing at PLANT.Pl_name. The attributes of a relationship
type in the ERD, if any, are also added to the referencing (child) relation schema
along with the foreign key attribute(s). Next, while the primary key of BUILDING is
[Bld_building, Bld_pl_p#], BUILDING.Bld_pl_p# also serves as the foreign key referencing
PLANT.Pl_p#, the primary key of the (identifying) parent, PLANT, thus accurately portray-
ing the presence of the identifying relationship type, Houses.
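A minimal sketch of this foreign key design, assuming SQLite accessed through Python's sqlite3 module, is given below. Only the key and foreign key attributes named in the text are declared; the data types and tuple values are assumptions, and Figure 6.3 itself contains more attributes than are shown here. SQLite enforces foreign keys only after the PRAGMA shown has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
conn.executescript("""
CREATE TABLE PLANT (
    "Pl_p#"  INT  PRIMARY KEY NOT NULL,
    Pl_name  TEXT UNIQUE                   -- alternate key
);
CREATE TABLE EMPLOYEE (
    "Emp_e#a"    TEXT NOT NULL,
    "Emp_e#n"    INT  NOT NULL,
    Emp_pl_name  TEXT REFERENCES PLANT (Pl_name),   -- maps Works_in
    PRIMARY KEY ("Emp_e#a", "Emp_e#n")
);
CREATE TABLE BUILDING (
    Bld_building TEXT NOT NULL,
    "Bld_pl_p#"  INT  NOT NULL REFERENCES PLANT ("Pl_p#"),  -- maps Houses
    PRIMARY KEY (Bld_building, "Bld_pl_p#")
);
INSERT INTO PLANT VALUES (12, 'Whitefield');        -- illustrative tuple
""")

def try_exec(sql):
    """Return True if the statement is accepted, False if a constraint fails."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

emp_ok       = try_exec("INSERT INTO EMPLOYEE VALUES ('A', 1, 'Whitefield')")
emp_no_plant = try_exec("INSERT INTO EMPLOYEE VALUES ('A', 2, NULL)")
emp_orphan   = try_exec("INSERT INTO EMPLOYEE VALUES ('A', 3, 'Nowhere')")
bld_ok       = try_exec("INSERT INTO BUILDING VALUES ('B1', 12)")
bld_orphan   = try_exec("INSERT INTO BUILDING VALUES ('B1', 99)")
```

The null foreign key is accepted while the unmatched values are rejected, which is the referential integrity behavior described above. Nothing in these declarations records whether participation in Works_in is total or partial beyond the nullability of Emp_pl_name.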
The clarity of the foreign key design deteriorates very quickly as the number of rela-
tion schemas in the relational data model increases, because the spaghetti of directed arcs
becomes difficult to trace. Alternatively, instead of the directed arcs, it is possible to
express the relationship types via the specification of inclusion dependencies. This
method of expressing a referential integrity constraint is shown in Figure 6.4. Both methods
fail to map the participation constraints available in the ERD to the logical tier, a case
in point for information reduction in mapping.
25
Instead, the foreign key added to EMPLOYEE could have been an attribute that references Pl_p#,
the primary key of PLANT.
FIGURE 6.4 Referential integrity constraints in Figure 6.3 expressed as inclusion dependencies:
Foreign key design
When using the foreign key design, it is important to note what happens should
the foreign key attribute inadvertently be placed in the parent relation (what should be
the referenced relation schema) instead of in the child relation (what should be the
referencing relation schema). For example, mapping the Works_in relationship type by
placing the primary key of EMPLOYEE [Pl_emp_e#a, Pl_emp_e#n] as a foreign key
attribute in PLANT amounts to a reversal of the cardinality constraint and results in a
serious semantic error.
26
There is a sense of tentativeness associated with the status of “null” values in data. Therefore, a
database design that avoids the presence of null values is expected to be relatively more robust than
one that freely allows null values in the data.
FIGURE 6.5 Reproduction of Figure 6.2 with a change in participation constraints of Works_in
FIGURE 6.6 Logical schema for the ERD in Figure 6.5: Cross-referencing design
FIGURE 6.7 Referential integrity constraints in Figure 6.6 expressed as inclusion dependencies:
Cross-referencing design
In the previous approach (the foreign key design illustrated in Figures 6.2 through
6.4), if there is a need to add an employee who does not work for any plant, a
tuple is added to the EMPLOYEE relation with a null value for the foreign key
EMPLOYEE.Emp_pl_name. In the current cross-referencing design, however, an employee
can be added regardless of whether he or she works in a plant, and no null value is
needed in EMPLOYEE to portray this. Figure 6.7 is a representation of the
same cross-referencing design using inclusion dependencies.
The cross-referencing design is not as compact as the foreign key design and can quickly
crowd the logical data model with numerous relation schemas. Even when the
participation of EMPLOYEE in Works_in is optional, it is probably practical to use the
foreign key design and allow null values for the foreign key EMPLOYEE.Emp_pl_name in
the interest of a somewhat more efficient design. However, the price paid is that, by
definition, the result is no longer strictly a “relational schema”; in addition, the null value
possible for the foreign key in some of the EMPLOYEE tuples is an issue to be considered. We will encounter
another such condition in the following discussion about mapping 1:1 relationship types
to the logical tier.
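For comparison, the cross-referencing idea can be sketched in SQLite (through Python's sqlite3 module) as a separate relation that carries the relationship, so that EMPLOYEE needs no nullable foreign key. The relation name WORKS_IN and the Wi_ attribute prefix are hypothetical and are not taken from Figure 6.6; data types and tuple values are likewise assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
conn.executescript("""
CREATE TABLE PLANT (
    "Pl_p#"  INT  PRIMARY KEY NOT NULL,
    Pl_name  TEXT UNIQUE
);
CREATE TABLE EMPLOYEE (
    "Emp_e#a" TEXT NOT NULL,
    "Emp_e#n" INT  NOT NULL,
    PRIMARY KEY ("Emp_e#a", "Emp_e#n")
);
-- Hypothetical cross-referencing relation for Works_in.
CREATE TABLE WORKS_IN (
    "Wi_emp_e#a" TEXT NOT NULL,
    "Wi_emp_e#n" INT  NOT NULL,
    Wi_pl_name   TEXT NOT NULL REFERENCES PLANT (Pl_name),
    -- The employee key alone is the primary key, so an employee
    -- can be linked to at most one plant (the 1:n cardinality ratio).
    PRIMARY KEY ("Wi_emp_e#a", "Wi_emp_e#n"),
    FOREIGN KEY ("Wi_emp_e#a", "Wi_emp_e#n")
        REFERENCES EMPLOYEE ("Emp_e#a", "Emp_e#n")
);
INSERT INTO PLANT VALUES (12, 'Whitefield');
INSERT INTO EMPLOYEE VALUES ('A', 1), ('A', 2);   -- ('A', 2) has no plant
""")

def try_exec(sql):
    """Return True if the statement is accepted, False if a constraint fails."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

assigned     = try_exec("INSERT INTO WORKS_IN VALUES ('A', 1, 'Whitefield')")
second_plant = try_exec("INSERT INTO WORKS_IN VALUES ('A', 1, 'Whitefield')")
null_plant   = try_exec("INSERT INTO WORKS_IN VALUES ('A', 2, NULL)")
```

An employee who works in no plant is simply absent from WORKS_IN, so no null value is stored anywhere; the cost is one additional relation schema per relationship type mapped this way.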
Mapping the Cardinality Ratio of 1:1 When the cardinality ratio of a relationship type
is 1:1, mapping such a relationship type to the logical tier becomes somewhat complicated
because either one of the entity types engaged in this relationship type can be the parent
or the child. Three solutions are possible, and each is conducive to specific situations—
the situations occasioned by particular participation constraints in the relationship.
Case 1: The participation constraint of one of the entity types participating in the
relationship type is total, as shown in Figure 6.8.
FIGURE 6.8 A 1:1 relationship type with total participation of PLANT in Managed_by
The best way to handle this case is to choose the entity type with total participation in
the relationship type to assume the role of child in the PCR. Then the foreign key design
described at the beginning of this section for mapping a 1:n cardinality ratio can be
directly applied here as well, as shown in Figures 6.9 and 6.10. As the foreign key design
amounts to an implicit specification of 1:n cardinality ratio, an additional constraint
explicitly specifying that the foreign key value must be unique is necessary to convey
the 1:1 cardinality ratio, which incidentally renders the foreign key an alternate key
(candidate key) of the child entity type.
FIGURE 6.9 Logical schema for the ERD in Figure 6.8: Foreign key design
EMPLOYEE (Emp_e#a, Emp_e#n, Emp_fname, Emp_minit, Emp_lname, Emp_nametag, Emp_gender, Emp_address, Emp_salary, Emp_datehired)
FIGURE 6.10 Referential integrity constraint in Figure 6.9 expressed as an inclusion dependency:
Foreign key design
Equally important, the total participation of the child in the relationship type is
incorporated in the design via an explicit specification of “no missing value” for the foreign
key. For instance, consider the Managed_by relationship type in Figure 6.8. The partici-
pation of PLANT in this relationship type is total, as indicated by the (min) value of 1.
Therefore, if an attribute (or attributes) representing the foreign key is (are) added to the
relation schema PLANT and this foreign key shares the same domain with the primary key
[Emp_e#a, Emp_e#n] or any other candidate key (for example, [Emp_fname, Emp_minit,
Emp_lname, Emp_nametag]) of the relation schema EMPLOYEE (see Figure 6.9), then with
additional constraint specifications of uniqueness and “not null” on the foreign key in
PLANT, the Managed_by relationship type will be fully implemented in the relational data
model. These two constraints can be specified declaratively.
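A sketch of such a declarative specification, assuming SQLite accessed through Python's sqlite3 module, follows. The foreign key attributes in PLANT are named Pl_emp_e#a and Pl_emp_e#n after the naming convention used in this chapter; the data types and tuple values are assumptions, and attributes not involved in the relationship are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
conn.executescript("""
CREATE TABLE EMPLOYEE (
    "Emp_e#a" TEXT NOT NULL,
    "Emp_e#n" INT  NOT NULL,
    PRIMARY KEY ("Emp_e#a", "Emp_e#n")
);
CREATE TABLE PLANT (
    "Pl_p#"      INT  PRIMARY KEY NOT NULL,
    "Pl_emp_e#a" TEXT NOT NULL,            -- NOT NULL: total participation
    "Pl_emp_e#n" INT  NOT NULL,
    UNIQUE ("Pl_emp_e#a", "Pl_emp_e#n"),   -- UNIQUE: 1:1 rather than 1:n
    FOREIGN KEY ("Pl_emp_e#a", "Pl_emp_e#n")
        REFERENCES EMPLOYEE ("Emp_e#a", "Emp_e#n")
);
INSERT INTO EMPLOYEE VALUES ('A', 1), ('A', 2);    -- illustrative tuples
""")

def try_exec(sql):
    """Return True if the statement is accepted, False if a constraint fails."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

first_plant    = try_exec("INSERT INTO PLANT VALUES (11, 'A', 1)")
same_manager   = try_exec("INSERT INTO PLANT VALUES (12, 'A', 1)")
no_manager     = try_exec("INSERT INTO PLANT VALUES (13, NULL, NULL)")
other_manager  = try_exec("INSERT INTO PLANT VALUES (12, 'A', 2)")
```

The uniqueness constraint prevents one employee from managing two plants, and the NOT NULL constraint prevents a plant from existing without a manager; no procedural code is involved.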
On the other hand, suppose EMPLOYEE is chosen as the child in this 1:1 PCR. Then,
under the foreign key design, either Emp_pl_name or Emp_pl_p# will be added to the rela-
tion schema EMPLOYEE as the foreign key to depict the Managed_by relationship type.
However, since the participation of EMPLOYEE in the Managed_by relationship is only
partial, the corresponding foreign key values can legitimately have null values in some of
the EMPLOYEE tuples. As a consequence, addition of a tuple in PLANT will require a pro-
cedural intervention in EMPLOYEE that ensures, at the least, a concurrent assignment of
an employee to manage a plant, because every plant must have a manager (total partici-
pation of PLANT in the Managed_by relationship type). In other words, a plant added to
the PLANT relation must reference some employee tuple in the EMPLOYEE relation.
Thus, including the foreign key in the relation schema whose participation in the
relationship type is mandatory is the better solution.
Case 2: The participation constraints of both entity types in the relationship type are
partial, as shown in Figure 6.11.
[ERD: EMPLOYEE (0,1) Managed_by (0,1) PLANT]
FIGURE 6.11 A 1:1 relationship type with partial participation of both EMPLOYEE and PLANT in
Managed_by
In this case, from a strictly design perspective, addition of a foreign key in either one
of the relation schemas involved in the 1:1 relationship type is sufficient. Figures 6.11,
6.12, and 6.13 display an example for Case 2. The ERD in Figure 6.11 is a simple
variation of the earlier example (Figure 6.8) in that the participation of PLANT in
Managed_by is also partial, as reflected by the (min) value of 0. Since the participation
of both EMPLOYEE and PLANT in the Managed_by relationship type is partial, either
PLANT or EMPLOYEE can assume the role of the child in this relationship. Figure 6.12
shows the foreign key design using a directed arc in which EMPLOYEE, as the child,
carries the foreign key. The same design using inclusion dependency appears in
Figure 6.13.
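The foreign key design for Case 2, with EMPLOYEE as the child, can be sketched as follows. This is an illustrative sketch only: the column names are simplified stand-ins for those in Figure 6.12, the employee key is reduced to a single column, and accepts is a hypothetical helper. The foreign key is left nullable because the participation of EMPLOYEE is partial. Note that SQLite treats null values as distinct under UNIQUE; some other products behave differently.

```python
import sqlite3

def build_case2_schema():
    """EMPLOYEE is the child: a nullable, unique foreign key references PLANT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE plant (
            pl_p    INTEGER PRIMARY KEY,
            pl_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE employee (
            emp_e_n  INTEGER PRIMARY KEY,
            emp_pl_p INTEGER UNIQUE          -- nullable: partial participation of EMPLOYEE
                     REFERENCES plant (pl_p)
        );
        INSERT INTO plant VALUES (19, 'North');
    """)
    return conn

def accepts(conn, emp_e_n, emp_pl_p):
    """Return True if the employee row is accepted by the declared constraints."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_e_n, emp_pl_p))
        return True
    except sqlite3.IntegrityError:
        return False
```

A plant that no employee references simply has no manager, which is how the partial participation of PLANT shows up in this design.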
FIGURE 6.12 Logical schema for the ERD in Figure 6.11: Foreign key design
FIGURE 6.13 Referential integrity constraint in Figure 6.12 expressed as an inclusion dependency:
Foreign key design
Other semantic or operational considerations may sometimes suggest inclusion of
the foreign key in a specific relation schema. For instance, the user may have a predis-
position towards the semantics of the relationship. Sometimes, one of the entity types
will have a small entity set relative to the other, in which case it is operationally efficient
to designate it as the child in the PCR. In certain cases, optimal data access is facilitated
by mutual-referencing—that is, when the two relation schemas directly reference each
other by placing foreign keys in both. In this situation, cross-referencing ought to be
considered instead of mutual-referencing, because mutual-referencing between two rela-
tion schemas entails specification of additional constraints to ensure consistency mainte-
nance (i.e., reference to the correct tuple). Such constraints can only be implemented
procedurally, and the ramifications are further clarified in the upcoming discussion of
Case 3. The cross-referencing design eliminates imposition of such a constraint, however,
at the expense of adding a relation schema to portray the relationship type. Such an
alternative may be worth considering, for example when the expected size of the intervening
relation is small relative to the two base relations in the relationship type. The
cross-referencing designs (using directed arcs and inclusion dependencies) for the ERD that was
shown in Figure 6.11 and reproduced in Figure 6.14 are portrayed in Figures 6.15 and 6.16,
respectively.
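A cross-referencing design along these lines can be sketched as below. The sketch is illustrative rather than a transcription of Figure 6.15: the names managed_by, mb_emp_e_n, and mb_pl_p are assumptions, the employee key is reduced to one column, and record_manager is a hypothetical helper. Each base relation is referenced once from the intervening relation, and a key on each side keeps the cardinality ratio at 1:1.

```python
import sqlite3

def build_cross_reference_schema():
    """An intervening relation MANAGED_BY references both EMPLOYEE and PLANT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE employee (emp_e_n INTEGER PRIMARY KEY);
        CREATE TABLE plant    (pl_p    INTEGER PRIMARY KEY);
        CREATE TABLE managed_by (
            mb_emp_e_n INTEGER PRIMARY KEY     REFERENCES employee (emp_e_n),
            mb_pl_p    INTEGER NOT NULL UNIQUE REFERENCES plant (pl_p)
        );
        INSERT INTO employee VALUES (12357), (20001);
        INSERT INTO plant VALUES (19), (20);
    """)
    return conn

def record_manager(conn, emp_e_n, pl_p):
    """Return True if the (employee, plant) pairing is accepted."""
    try:
        with conn:
            conn.execute("INSERT INTO managed_by VALUES (?, ?)", (emp_e_n, pl_p))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because each pairing is stored exactly once, there is no second copy of the reference that could disagree with the first.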
[ERD: EMPLOYEE (0,1) Managed_by (0,1) PLANT]
FIGURE 6.14 A 1:1 relationship type with partial participation of both EMPLOYEE and PLANT in
Managed_by
FIGURE 6.15 Logical schema for the ERD in Figure 6.14: Cross-referencing design
FIGURE 6.16 Referential integrity constraints in Figure 6.15 expressed as inclusion dependencies:
Cross-referencing design
Case 3: The participation constraints of both entity types in the relationship type are
total.
Here, it is first necessary to add a foreign key in both relation schemas engaged in the
relationship—in other words, mutual-referencing. Only by constraining both foreign keys
to be unique can it be ascertained that the cardinality ratio is 1:1. By virtue of this con-
straint, the defined foreign keys also become alternate keys of the respective relation
schemas. Total participation of both entity types in the relationship type is incorporated in
the design by not allowing null values for the two foreign keys. The presence of foreign
keys in both relation schemas referencing each other creates two problems. The first
problem is that it becomes necessary to make sure that the [primary/candidate key, for-
eign key] pairs in the two relations match. This cannot be done using declarative con-
straints; procedural intervention is necessary to accomplish this. For instance, if employee
A12357 manages plant 19, since mutually referencing foreign keys are present in both
relations, plant 19 must be managed by employee A12357 and nobody else. This is an
additional constraint and can only be implemented via procedural intervention. Notice
that in a cross-referencing design (as shown in Figure 6.15), the fact that A12357 manages
plant 19 and vice versa is captured in the MANAGED_BY relation; this eliminates the
need for any procedural intervention. The second problem is that the two relation sche-
mas referencing each other create a cycle. Therefore, enforcement of at least one of the
two referential integrity constraints must be deferred to run time. An alternative solution
to preempt these problems is to merge the two relation schemas into a single-schema
design. However, it is not always possible to adopt a single-schema design, especially if the
relationship involves two distinct entity types and/or the entity types also participate
independently in other relationship types.
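The cycle created by mutual-referencing, and the deferred enforcement it calls for, can be illustrated with SQLite, which supports deferrable foreign keys. This is a sketch under stated assumptions: the table and column names are illustrative, the manager key is reduced to a single column, and run_transaction is a hypothetical helper. Both foreign keys are declared UNIQUE and NOT NULL, as the text requires, and both are checked only when the transaction commits.

```python
import sqlite3

def build_mutual_reference_schema():
    """MANAGER and PLANT reference each other; both foreign keys are deferred."""
    # isolation_level=None: transactions are opened and closed explicitly below
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE manager (
            mgr_emp_e_n INTEGER PRIMARY KEY,
            mgr_pl_p    INTEGER NOT NULL UNIQUE
                        REFERENCES plant (pl_p) DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE plant (
            pl_p       INTEGER PRIMARY KEY,
            pl_emp_e_n INTEGER NOT NULL UNIQUE
                       REFERENCES manager (mgr_emp_e_n) DEFERRABLE INITIALLY DEFERRED
        );
    """)
    return conn

def run_transaction(conn, statements):
    """Execute (sql, params) pairs as one transaction; return False if a constraint fails."""
    conn.execute("BEGIN")
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        conn.execute("COMMIT")  # deferred foreign keys are checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
```

Inserting a manager and the matching plant in one transaction succeeds even though neither row can satisfy its foreign key on its own; inserting only one of the two fails at commit time.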
A variation of the Managed_by relationship type (intended only to make better
semantic sense) appears in Figure 6.17. MANAGER is a partial specialization of
EMPLOYEE, a given plant is managed by exactly one manager, and each manager
manages exactly one plant. The total participation of both MANAGER and PLANT in
Managed_by implies mutual-referencing between PLANT and MANAGER.
FIGURE 6.17 A 1:1 relationship type with total participation of both MANAGER and PLANT in
Managed_by
The mutual-referencing designs in Figures 6.18 and 6.19 reflect this relationship.
Clearly, a procedural constraint is required to verify that the pairs ([Mgr_emp_e#a,
Mgr_emp_e#n, Mgr_pl_p#], [Pl_emp_e#a, Pl_emp_e#n, Pl_p#]) of values from MANAGER and
PLANT match. In addition, deferred enforcement of at least one of the two referential
integrity constraints (Figure 6.19) is also necessary. Additional procedures may be
required to manage other constraints—for example, the deletion rule that restricts dele-
tion of a tuple in MANAGER if a matching tuple in PLANT exists. Incidentally, the example
here includes a partial specialization of EMPLOYEE as MANAGER. Mapping of enhanced
ER model constructs is discussed in Section 6.8.
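The procedural check that the two sets of references agree can be sketched in a few lines. The function below is a hypothetical illustration, not a procedure from the text: it takes the references held in MANAGER (manager to plant) and those held in PLANT (plant to manager) as plain dictionaries and reports every pair that the other relation does not mirror.

```python
def mismatched_pairs(manager_to_plant, plant_to_manager):
    """Return (manager, plant) pairs whose reference is not mirrored by the other relation."""
    return sorted(
        (mgr, plant)
        for mgr, plant in manager_to_plant.items()
        if plant_to_manager.get(plant) != mgr
    )

# Consistent: A12357 manages plant 19, and plant 19 is managed by A12357.
consistent = mismatched_pairs({"A12357": 19}, {19: "A12357"})

# Inconsistent: every foreign key is unique and non-null, yet the pairs are crossed.
crossed = mismatched_pairs(
    {"A12357": 19, "B00001": 20},
    {19: "B00001", 20: "A12357"},
)
```

The crossed case satisfies uniqueness, "not null," and both referential integrity constraints, which is why a declarative specification alone cannot detect it.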
FIGURE 6.18 Logical schema for the ERD in Figure 6.17: Mutual-referencing design
FIGURE 6.19 Referential integrity constraints in Figure 6.18 expressed as inclusion dependencies:
Mutual-referencing design
Note that the redundant inclusion of foreign keys in both relation schemas referencing
each other (mutual-referencing) can be done in all three cases as long as one is willing to
incur the penalty of maintaining consistency. Similarly, in any relationship type that has a
1:1 cardinality ratio, combining the two entity types into a single relation schema requires
evaluation because a single-schema design, when feasible, minimizes complex integrity
constraints and attribute redundancies.
Also, the self-referencing property inherent to a recursive relationship type does not pose
any special problems beyond what has been considered in the discussions in this section.
TABLE 6.5 A stepwise guide for mapping a Design-Specific ERD to a relational schema using
the foreign key design
The Design-Specific ERD for Bearcat Incorporated, the domain constraints on the
attributes, and a few other semantic integrity constraints not recorded in the ERD, which
were provided in Chapter 3, are reproduced in Figure 6.20 and Table 6.6 for conve-
nience.27 The objective of this section is to develop the corresponding logical data model.
To this end, the section first focuses on mapping the ERD to a logical schema.
27 Recall that an ER model comprises the ERD and the semantic integrity constraints not
specified in the ERD.
[Figure 6.20: Design-Specific ERD for Bearcat Incorporated. Entity types: EMPLOYEE, PLANT,
PROJECT, BUILDING, DEPENDENT, ASSIGNMENT, BCU_ACCOUNT, HOBBY, PARTICIPATION. Relationship
types: Works_in, Managed_by, Supervised_by, Undertaken_by, Uses, Houses, Dependent_of,
Belongs_to, Held_by_E, Held_by_D, Includes_D, Includes_H.]
TABLE 6.6 Semantic integrity constraints for the final Design-Specific ER model
Using the mapping guide in Table 6.5 and applying the foreign key design discussed
thus far in this chapter, the logical schema shown in Figure 6.21 and Table 6.7 can be
obtained (the reader may wish to do this as an exercise and verify the result against
Figure 6.21). The corresponding relational schema using inclusion dependencies is shown
in Figure 6.22.
FIGURE 6.21 Logical schema for Bearcat Incorporated: Foreign key design using directed arcs
None of the lost information is trivial. These items convey the business rules
specified by the user community and design characteristics explicitly expressed in
the Design-Specific ERD that serves as the source schema for the mapping process.
Most of this metadata is needed to implement the physical data model correctly.
One alternative is to include all the lost information in a list of semantic integrity
constraints at the logical tier. Alternatively, the conceptual data model (the
Design-Specific ERD and the semantic integrity constraints) can be used to supplement
the underdeveloped logical schema. In that case, the very utility of a logical schema
in the systematic development of a database design becomes questionable. The next
section presents a logical modeling grammar that can produce an information-
preserving script (logical schema).
FIGURE 6.22 Logical schema for Bearcat Incorporated: Foreign key design using inclusion
dependencies
Step 1: Specify a logical scheme for each base and weak entity type in the ERD
following the grammar described here:

Lx: SCHEME   Att1   Q[Att2]  Att3•  Att4   [Att5   Att6]  Att7   FKAtt1  FKAtt2  ...  AttZ
             (t,s)  (t,s)    (t,s)  (t,s)  (t,s)   (t,s)  (t,s)  (t,s)   (t,s)        (t,s)
or
Lx: SCHEME   (Att1,  Q[Att2],  Att3•,  Att4,  [Att5,  Att6],  Att7, ...,  FKAtt1,  FKAtt2, ...,  AttZ)
             (t,s)   (t,s)     (t,s)   (t,s)  (t,s)   (t,s)   (t,s)       (t,s)    (t,s)         (t,s)

where x = 1, 2, 3, 4, ..., N
• SCHEME is the name of the entity type being mapped. (Use all capital
letters.)
• Lx is a label for the SCHEME. (Or it could be an abbreviated short name of
the SCHEME.)
• N is the number of SCHEMEs in the logical schema.
• Att1, …, AttZ are the names of the atomic attributes from the entity type.
(Capitalize first letter only.)28
• The primary key is underlined—the constituent attributes need not be
recorded successively.
• Attributes specified as mandatory in the ERD (•) are marked by an • follow-
ing the attribute.
• The data type (t) and size (s) of an attribute mapped from the ERD are
recorded immediately below the attribute.
• Composite (molecular) attributes are enclosed by square brackets [ …, …];
for this reason, the constituent atomic attributes are recorded next to each
other.
• Alternate keys are enclosed in square brackets and marked by a Q (meaning
unique) preceding the alternate key attribute(s).
• Derived attributes not stored in the database are denoted by a dotted under-
line (_ _ _ _ _).
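The elements of this grammar can be mirrored in a small data structure. The classes below are a hypothetical sketch (the names Attribute, Scheme, primary_key, and render are assumptions, not part of the grammar), and the sample scheme borrows BUILDING's two attributes, sizes, and key from Figure 6.23. Since plain text cannot show underlining, the primary key is held as a flag rather than rendered.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str                      # e.g. "Bld_building"
    dtype: str                     # t, e.g. "A", "N", "X", "Dt"
    size: str                      # s, e.g. "20" or "2.1"
    mandatory: bool = False        # the • marker following the attribute
    in_primary_key: bool = False   # underlined in the grammar
    derived: bool = False          # dotted underline in the grammar

@dataclass
class Scheme:
    label: str                     # Lx
    name: str                      # SCHEME, in capital letters
    attributes: list
    alternate_keys: list = field(default_factory=list)  # each entry is one Q[...] group

    def primary_key(self):
        """Names of the attributes that make up the primary key."""
        return [a.name for a in self.attributes if a.in_primary_key]

    def render(self):
        """One-line text form: label, scheme name, attributes with (t,s)."""
        parts = [
            f"{a.name}{'•' if a.mandatory else ''} ({a.dtype},{a.size})"
            for a in self.attributes
        ]
        return f"{self.label}: {self.name} (" + ", ".join(parts) + ")"

building = Scheme(
    label="L3",
    name="BUILDING",
    attributes=[
        Attribute("Bld_building", "A", "20", in_primary_key=True),
        Attribute("Bld_pl_p#", "N", "2", in_primary_key=True),
    ],
)
```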
28
The Universal Relation Schema (URS) assumption dictates that every attribute name must be
unique because attributes have a global meaning in a database schema. Therefore, if an attribute
name appears in several relation schemas, all of these denote the same meaning—that is, the attri-
butes are semantically join-compatible. In an ER model, however, the same attribute name is
allowed to appear in different entity types since they imply different roles for the attribute name. We
have adopted the URS assumption for the logical schema presented here. Thus, mapping of attri-
butes from an ER model to a logical schema requires careful attention in order to ensure unique
attribute names in the logical schema. Note that a referencing foreign key and corresponding refer-
enced primary (or alternate) key having the same attribute name in a logical schema do not violate
the URS assumption.
Step 2: Map each relationship type in the ERD using either the foreign key design,
cross-referencing design, or mutual-referencing design, as discussed in Section 6.7.1.2:
• Each relationship type (diamond in the ERD) is accounted for by adding a
foreign key (say, FKAtt1) attribute or attributes to the child scheme in the
PCR. Obviously, the foreign key should share the same domain with the pri-
mary key of the parent scheme in the relationship type (the highlighting of
the foreign key with italics is a convention just intended to draw attention).
While the reference implied in general is to the primary key of the parent
scheme, if the design intention is to refer, in some cases, to an alternate key
of the parent scheme, it is accomplished by coding the foreign key name to
exactly match the name of the alternate key that is the target of the refer-
ence. This does not violate the universal relation schema assumption (see
Footnote 28).
• The label (e.g., Ly, Lz) of the parent relation is coded on top of the foreign
key box that depicts the relationship. The explicit reference connoted by the
foreign key is supplemented via this notation for ease of locating the refer-
enced scheme. Alternatively, the label can be prefixed to the foreign key
attribute’s name.
Step 3: Incorporate the structural constraints of the relationships (i.e., cardinality
ratio and participation constraints) using (min, max), as described here:
• The structural constraints of the relationship (i.e., cardinality ratio and par-
ticipation constraint) are expressed using the (min, max) notation as follows:
The (min, max) expression coded on the parent edge of the PCR in the ERD
is shown on the top of the foreign key, and the (min, max) expression coded
on the child edge of the PCR is shown on the bottom of the foreign key.
• min: participation constraint; (min = 0) indicates partial participation, and
(min > 0) indicates total participation.
• max: cardinality ratio (max ≥ min).
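The reading of a (min, max) pair given above can be restated as a short function. This is a hypothetical sketch: the name interpret_min_max and the returned wording are assumptions, and a letter such as n or m is taken to mean "many."

```python
def interpret_min_max(min_value, max_value):
    """Return (participation, maximum) for one edge of a relationship type.

    min_value is an integer; max_value is an integer or a letter such as 'n'.
    """
    if min_value < 0:
        raise ValueError("min cannot be negative")
    if isinstance(max_value, int) and max_value < max(min_value, 1):
        raise ValueError("max must be at least 1 and at least min")
    participation = "partial" if min_value == 0 else "total"
    maximum = "one" if max_value == 1 else "many"
    return participation, maximum

# Edges taken from Figure 6.23
works_in_employee = interpret_min_max(1, 1)      # EMPLOYEE edge of Works_in
works_in_plant = interpret_min_max(100, "n")     # PLANT edge of Works_in
managed_by_edge = interpret_min_max(0, 1)        # either edge of Managed_by
```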
Step 4: Indicate the deletion rule parameter (C, N, D, or R).
• A deletion constraint parameter (C, N, D, or R) between min and max
recorded below the foreign key specifies the deletion rules—that is, action to
be taken when a tuple from the parent scheme in the relationship type (PCR)
is deleted. Four options are possible: C = Cascade; N = Set null; D = Set
default value provided; R = Restrict.
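The four deletion rules correspond to the ON DELETE actions of SQL, which SQLite supports. The sketch below is illustrative only: the parent and child tables, their columns, and the function effect_of_parent_delete are hypothetical, and the default value 0 is an assumption chosen so that the default refers to an existing parent tuple.

```python
import sqlite3

# Deletion rule parameter of the grammar -> SQL referential action
RULES = {"C": "CASCADE", "N": "SET NULL", "D": "SET DEFAULT", "R": "RESTRICT"}

def effect_of_parent_delete(rule):
    """Delete parent 1 under the given rule and report what happens to its child."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE parent (p_id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            c_id   INTEGER PRIMARY KEY,
            c_p_id INTEGER DEFAULT 0
                   REFERENCES parent (p_id) ON DELETE {RULES[rule]}
        );
        INSERT INTO parent VALUES (0), (1);
        INSERT INTO child VALUES (100, 1);
    """)
    try:
        conn.execute("DELETE FROM parent WHERE p_id = 1")
    except sqlite3.IntegrityError:
        return "restricted"            # R: the parent tuple cannot be deleted
    row = conn.execute("SELECT c_p_id FROM child WHERE c_id = 100").fetchone()
    if row is None:
        return "child deleted"         # C: the child tuple is deleted as well
    return row[0]                      # N: None; D: the default value
```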
Despite its significant capacity to preserve design information, the grammar pre-
sented here is not capable of preserving all metadata (i.e., expressed in the ERD and
semantic integrity constraints) declaratively. For instance, the domain constraints on
the attributes often listed as semantic integrity constraints are not captured by this
grammar, and so will have to be carried forward via a list of semantic integrity con-
straints prepared at the logical tier. Also, the specific names and the roles of the rela-
tionship types are not preserved in the mapping process even though the relationship
types themselves are fully captured.
[Figure 6.23, upper panel: Design-Specific ERD relating EMPLOYEE, PLANT, and BUILDING through
Works_in, Managed_by, Supervised_by, and Houses. Lower panel: the corresponding logical schemes
L1: EMPLOYEE, L2: PLANT, and L3: BUILDING, with (type, size) recorded below each attribute and
the parent label, (min, max) values, and deletion rule recorded above and below each foreign key.]
FIGURE 6.23 A Design-Specific ERD and its associated information-preserving logical schema
The attribute type and size for each attribute are shown right below the attribute
name. The mandatory property of an attribute is marked by • above the attribute. The
primary key for each logical scheme is denoted by the underline. Observe that when a
primary key is a composite attribute, then all atomic attributes constituting the primary
key are underlined (for example, [Bld_building, Bld_pl_p#]).29 An alternate key (or keys) of
a logical scheme is identified by the letter Q preceding the bracketed attribute(s); for
example, Q[Pl_name]. Q[Emp_fname, Emp_minit, Emp_lname, Emp_nametag] means that this
composite attribute as a whole is unique, not its constituent atomic attributes. Foreign keys
are highlighted by italics (Emp_pl_name in EMPLOYEE, Bld_pl_p# in BUILDING). The derived
attribute of PLANT (Pl_nemps) is also mapped to the logical tier, the dotted underline
indicating the derived nature of the attribute. The label of the parent in a PCR and the
(min, max) on the parent edge of the relationship type are stated on top of the foreign key
representing the relationship type. For example, 100 L2 n above Emp_pl_name in
EMPLOYEE signifies that L2 (i.e., PLANT) is the parent in the PCR Works_in, and Works_in
is represented in the logical scheme by the foreign key Emp_pl_name in EMPLOYEE. The
(min, max) comes from the edge connecting PLANT to Works_in. Likewise, the (min, max)
on the edge connecting the child in the PCR (i.e., EMPLOYEE) to Works_in as well as the
deletion rule for Works_in are stated right below the foreign key representing that rela-
tionship type (i.e., 1 C 1 below Emp_pl_name in EMPLOYEE). Notice that the attribute pair
[Emp_emp_e#a, Emp_emp_e#n] in EMPLOYEE references L1, which is EMPLOYEE itself,
thus capturing the recursive relationship type Supervised_by. In the same fashion,
Managed_by and Houses are captured by PLANT.[Pl_emp_e#a, Pl_emp_e#n] and BUILDING.
Bld_pl_p# respectively. In essence, the logical schema in Figure 6.23 preserves all the
design information portrayed in the ERD above it.
The information-preserving logical schema for the Design-Specific ERD of Bearcat
Incorporated shown in Figure 6.20 is given in Figure 6.24. In addition to incorporating all
metadata conveyed by the ERD in the logical schema, item 4 of the semantic integrity
constraints for the Design-Specific ER model (Table 6.6) is also implicitly captured in the
logical schema. The rest of the semantic integrity constraints in Table 6.7 along with the
logical schema (Figure 6.24) complete the logical data model, which then becomes fully
information-preserving.
29
It is not mandatory that atomic attributes comprising the primary key of a logical scheme be listed
contiguously, although it is a good practice to do so.
[Logical schemes L1: EMPLOYEE, L2: PLANT, L3: BUILDING, L6: DEPENDENT, and the remaining
schemes of Bearcat Incorporated, each annotated with (type, size), parent labels, (min, max)
values, and deletion rules.]
FIGURE 6.24 Information-preserving logical schema for the Design-Specific ERD for Bearcat
Incorporated in Figure 6.20
[Figure 6.25: Disjoint (d) specialization of STUDENT_ATHLETE (Student# [N,6], Height, Weight)
on the attribute F_b_b [A,1] into the subclasses FOOTBALL_PLAYER (Uniform#, Speed, Touchdowns)
for "Football", BASKETBALL_PLAYER (Uniform#, Position, Pts_per_game, Assists_per_game,
Rebounds_per_game) for "Basketball", and BASEBALL_PLAYER (Batting_avg, Home_runs, Errors) for
"Baseball".]
Solution 1
STUDENT_ATHLETE (Sat_fbb, Sat_s#, Sat_hgt, Sat_wgt)
FOOTBALL_PLAYER (Fb_sat_s#, Fb_unif#, Fb_speed, Fb_tds)
BASKETBALL_PLAYER (Bk_sat_s#, Bk_unif#, Bk_pos, Bk_ppg, Bk_apg, Bk_rpg)
BASEBALL_PLAYER (Ba_sat_s#, Ba_avg, Ba_hrs, Ba_errs)
Solution 2
STUDENT_ATHLETE (Sat_s#, Sat_fbb, Sat_hgt, Sat_wgt, Sat_unif#, Sat_speed, Sat_tds, Sat_pos,
Sat_ppg, Sat_apg, Sat_rpg, Sat_avg, Sat_hrs, Sat_errs)
Solution 3
FIGURE 6.26 Three possible logical schemas for the specialization in Figure 6.25
The first solution yields one logical scheme each for the subclasses and the superclass
in the specialization. Observe that the inheritance property of the specialization yields a
candidate key for every subclass (sc) in the specialization; since the cardinality ratio
between a SC and sc is 1:1, this candidate key in the sc also serves as the foreign key
referencing the superclass (SC) in the specialization. Sometimes, when the number of
attributes in the subclasses is very small, the efficiency of this design, with all its referen-
tial integrity constraints, becomes questionable. The single-schema design (Solution 2)
also supports all four combinations of disjointness and completeness constraints by essen-
tially eliminating the need for these constraints. The absence of referential integrity con-
straints and a single-schema design certainly enhance operational efficiency. Unless the
specialization is overlapping (as opposed to disjoint), the database will have null values for
several attributes in each tuple. If subclasses contain many attributes, this may be an
issue. In addition, any specific relationship types in which one or more of the subclasses
independently or collectively participate cannot always be optimally implemented. The
third solution will have to be rejected if the completeness constraint is partial, as in the
example under illustration. The information that there are student-athletes who are nei-
ther football players, nor basketball players, nor baseball players will be lost in this design.
Assuming that the completeness constraint is total, this solution can be an optimal middle
ground among the three, especially when the number of attributes in the superclass is
minimal. However, if the specialization is overlapping, some data redundancy is to be
expected. The alternative foreign key design using inclusion dependencies in place of the
directed arcs is left as an exercise for the reader.
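The key property of Solution 1, that the candidate key of a subclass doubles as the foreign key referencing the superclass, can be sketched in SQL run through Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the schema is trimmed to one subclass, the column names replace # with _no (since # is not legal in an unquoted SQL identifier), and the cascade deletion rule is assumed rather than taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Assumed, trimmed-down versions of the superclass and one subclass scheme.
conn.executescript("""
CREATE TABLE student_athlete (
    sat_s_no INTEGER PRIMARY KEY,
    sat_hgt  INTEGER
);
CREATE TABLE baseball_player (
    -- The primary key of the subclass is also the foreign key to the superclass.
    ba_sat_s_no INTEGER PRIMARY KEY
        REFERENCES student_athlete (sat_s_no) ON DELETE CASCADE,
    ba_avg REAL
);
""")

conn.execute("INSERT INTO student_athlete VALUES (101, 74)")
conn.execute("INSERT INTO baseball_player VALUES (101, 0.312)")


def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A subclass tuple without a matching superclass tuple violates referential integrity.
orphan_ok = accepted("INSERT INTO baseball_player VALUES (999, 0.250)")
# A second subclass tuple for the same athlete violates the 1:1 cardinality ratio.
duplicate_ok = accepted("INSERT INTO baseball_player VALUES (101, 0.400)")

# Deleting the superclass tuple removes the subclass tuple under the assumed cascade rule.
conn.execute("DELETE FROM student_athlete WHERE sat_s_no = 101")
remaining = conn.execute("SELECT COUNT(*) FROM baseball_player").fetchone()[0]
```

Both rejected inserts follow from declaring the same attribute as primary key and foreign key; no separate constraint is needed to express the 1:1 ratio.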
The remainder of the discussion in this section demonstrates only the foreign key
design using the directed arc notation shown in Figure 6.26, Solution 1.
FIGURE 6.28 Logical schema for the specialization hierarchy in Figure 6.27: Foreign key design
using directed arcs
schemes. Figure 6.30 portrays the logical schema for the ERD in Figure 6.29 using the
foreign key design with directed arcs. It must be noted that in a categorization a candidate
key of the subclass (i.e., SPONSOR) is added to each superclass (i.e., CHURCH, SCHOOL,
and INDIVIDUAL) as the foreign key instead of the other way around as is done in a
specialization.
FIGURE 6.30 Logical schema for the specialization lattice and categorization in Figure 6.29:
Foreign key design using directed arcs
alternate keys for LOT and BUILDING respectively. Likewise, Lot_pr_address and
Bld_pr_address are also alternate keys for LOT and BUILDING, respectively. As in a cate-
gorization, a candidate key of an aggregate (subclass) is mapped as a foreign key in every
superclass that participates in that aggregation. The reader may wish to develop the logical
schema using inclusion dependencies as an exercise.
FIGURE 6.32 Logical schema for the category and aggregate in Figure 6.31:
Foreign key design using directed arcs
6.8.1.5 Information Lost While Mapping EER Constructs to the Logical Tier
Once again, the techniques discussed in Section 6.8.1 are information-reducing. Two
kinds of metadata information are lost: user-specified business rules and design features.
In addition to all but the first two information loss items listed in Section 6.7.1.3, the
following information is lost:
• The type of relationship (e.g., specialization/generalization, categorization,
aggregation) is not carried forward to the logical schema.
• SC/sc (i.e., intra-entity class) relationships become indistinguishable from the
regular (inter-entity class) relationships.
• The disjointness constraint of a specialization/generalization is lost during the
conversion process.
• Multiple specializations of the same superclass are not captured.
• Specialization lattices are not discernible.
• The number of subclasses participating in a specialization and the number of
superclasses participating in a categorization and/or aggregation are lost in
the mapping.
• The completeness constraint of an SC/sc relationship is not present in the
logical schema.
Are these losses of information trivial? If so, why are they collected in the conceptual
data model to begin with? The argument that the ERD and the list of semantic integrity
constraints can be used to supplement the logical schema during physical design defeats
the purpose of even attempting to develop a logical schema. Furthermore, there is nothing
wrong in attempting to develop a self-sufficient logical data model. Simplicity of design as
the rationale for an underdeveloped logical schema is not a worthy compromise. The
next section describes an extension to the information-preserving logical modeling gram-
mar presented in Section 6.7.2, as applied to EER modeling.
                 min  LS  #
                 FKAtt1     Att1     Att2     . . .   AttZ
Lu: SCHEME
                 (t, s)     (t, s)   (t, s)           (t, s)
                 min  D  Jx
Or
                 min  LS  #
Lu: SCHEME      (FKAtt1,    Att1,    Att2,    . . .,  AttZ)
                 (t, s)     (t, s)   (t, s)           (t, s)
                 min  D  Jx
Note: Since the cardinality ratio in an SC/sc relationship is always (1:1) the
max part of (min,max) is always a 1. Therefore, the max is inherently
preserved.
• LS is the label of the parent scheme coded on top of the foreign key that
depicts the SC/sc relationship.
• # denotes the number of sc’s in the specialization/generalization or the num-
ber of SCs in the categorization, specialization lattice, and aggregation.
• Jx is the SC/sc label where:
   * J is the disjointness constraint value: d = disjoint; o = overlapping; u = union;
     a = aggregate; L = specialization lattice; = none.
   * x is the marker for the specialization/categorization/aggregation/
     specialization lattice (e.g., 1, 2, 3, …)
• The D between min and Jx under the foreign key specifies the action to be taken
when a tuple from the parent entity in a relationship is deleted. C = Cascade;
N = Set null; D = Set default value provided; R = Restrict.
While the grammar notation is almost the same as the one employed for the ER constructs,
two syntactical markers are specific to intra-entity class (SC/sc) relationships: Jx and #.
The J in Jx reflects the value of the disjointness constraint in the case of
specialization/generalization (o for overlapping and d for disjoint), the union property (u)
in the case of categorization, and the aggregate property (a) in the case of aggregation. The
x in Jx marks the specific specialization occurrence in which the scheme participates; it
plays the same role for categorization and aggregation. The # marker above the foreign key
is an integer that indicates the number of subclasses in the specialization flagged by Jx.
In the cases of categorization, aggregation, and specialization lattice, # is the number of
superclasses in the particular SC/sc construct.
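To make the pieces of this annotation concrete, the markers coded above and below a foreign key can be held in a small record. The following Python sketch is hypothetical: the class name, field names, and methods are invented for illustration, and the two min values are simply stored as given because their meaning is defined in Section 6.7.2 rather than here. The two sample annotations are read off Figures 6.34 and 6.36 as they appear in this chapter.

```python
from dataclasses import dataclass
from typing import Optional

# Code letters taken from the legend in the text.
J_CODES = {"d": "disjoint", "o": "overlapping", "u": "union",
           "a": "aggregate", "L": "specialization lattice"}
D_CODES = {"C": "cascade", "N": "set null", "D": "set default", "R": "restrict"}


@dataclass(frozen=True)
class ScForeignKey:
    """Annotation of one foreign key that depicts an SC/sc relationship."""
    attribute: str                      # name of the foreign key attribute
    min_above: int                      # min coded above the key
    parent_label: str                   # LS, the label of the parent scheme
    count: int                          # the # marker
    min_below: int                      # min coded below the key
    j: str                              # J part of Jx
    x: Optional[int] = None             # x part of Jx; None when omitted
    delete_rule: Optional[str] = None   # D; None when not shown

    def __post_init__(self):
        # Reject code letters that are not part of the legend.
        if self.j not in J_CODES:
            raise ValueError(f"unknown J code: {self.j!r}")
        if self.delete_rule is not None and self.delete_rule not in D_CODES:
            raise ValueError(f"unknown deletion code: {self.delete_rule!r}")

    def above(self) -> str:
        """Text coded above the foreign key: min LS #."""
        return f"{self.min_above} {self.parent_label} {self.count}"

    def below(self) -> str:
        """Text coded below the foreign key: min D Jx (D omitted when unknown)."""
        jx = self.j + ("" if self.x is None else str(self.x))
        parts = [str(self.min_below)]
        if self.delete_rule is not None:
            parts.append(self.delete_rule)
        parts.append(jx)
        return " ".join(parts)


# Foreign key of BASEBALL_PLAYER: one of 3 subclasses in disjoint specialization 1.
baseball = ScForeignKey("Ba_sat_s#", 0, "L1", 3, 1, "d", 1)
# Foreign key of CHURCH: one of 3 superclasses in the categorization (union).
church = ScForeignKey("Ch_sp_s#", 1, "L1", 3, 1, "u")
```

Here baseball.above() yields "0 L1 3" and baseball.below() yields "1 d1", matching the layout of the general form.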
The general form of the specialization lattice is shown here:
Or
Note that each specialization in which the shared subclass is a member is captured by
an independent foreign key. LL is the shared subclass in the lattice, and LS1, LS2, … point
to parents in the SC/sc relationships.
The following examples illustrate the extension to the information-preserving
grammar prescribed earlier for mapping EER model constructs to the logical schema.
To begin with, let us revisit the specialization hierarchy depicted in Figure 6.27, which is
reproduced in Figure 6.33. As pointed out in Section 6.8.1.5, both the foreign key design
using directed arcs approach for logical model mapping (Figure 6.28) and its inclusion
dependency alternative are information-reducing.
the fact that STUDENT_ATHLETE participates as the superclass (SC) in three distinct
specializations, and the related metadata are fully captured in the logical schema with no loss of
information in the transformation process. The overlapping specialization of the superclass
FOOTBALL_PLAYER as {OFFENSIVE_PLAYER, DEFENSIVE_PLAYER} is coded by the value
o for Jx (instead of o1 or o2) because there is only one specialization of FOOTBALL_PLAYER.
                          0  L1  3
L6: BASEBALL_PLAYER       Ba_sat_s#    Ba_avg•     Ba_hrs    Ba_errs
                          (N, 6)       (N, 1.3)    (N, 2)    (N, 2)
                          1  d1

                          0  L1  1
L7: TEAM_CAPTAIN          Cpt_sat_s#   Cpt_team
                          (N, 6)       (A, 15)
                          1  d2

                          1  L1  2
L8: VARSITY_PLAYER        Var_sat_s#   Var_sship   Var_redshrt
                          (N, 6)       (A, 1)      (A, 1)
                          1  d3

                          1  L1  2
L9: INTRAMURAL_PLAYER     Im_sat_s#    Im_class    Im_league
                          (N, 6)       (A, 15)     (A, 15)
                          1  d3
FIGURE 6.34 Information-preserving logical schema for the specialization hierarchy in Figure 6.33
Let us turn our attention back to Figure 6.29, in which a categorization and a
specialization lattice are portrayed. While the relational schema in Figure 6.30 indicates
the presence of these relationships, the fact that they are SC/sc relationships representing
a categorization and a specialization lattice is completely lost in Figure 6.30. The
logical schema in Figure 6.30 is correct in that the physical implementation of the
design will work, but the mapping is not complete, and so the implemented database system
will not be robust. Admittedly, some of the metadata lost in the transformation may not matter
in implementations in certain relational DBMSs. The lost information, however, consists of
integrity constraints arising from the business rules of the application; that is, it is
metadata of the data model. While declarative implementation of some of these constraints
may not be possible, procedural implementation methods can be used to make up for it. In sum,
during database design, losing metadata through the data modeling tiers cannot be casually
accepted and should be strictly avoided.
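As an illustration of a procedural implementation, a disjointness constraint can be enforced with triggers when the DBMS offers no declarative way to state it. The sketch below uses SQLite through Python's sqlite3 module; the table and trigger names are hypothetical, only two of the subclasses of the disjoint specialization are shown, the superclass is omitted, and # is replaced by _no in column names. It is one possible approach, not a prescribed one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two subclasses of a disjoint specialization; each trigger rejects an athlete
# who is already present in the other subclass.
conn.executescript("""
CREATE TABLE football_player (fb_sat_s_no INTEGER PRIMARY KEY);
CREATE TABLE baseball_player (ba_sat_s_no INTEGER PRIMARY KEY);

CREATE TRIGGER baseball_disjoint BEFORE INSERT ON baseball_player
BEGIN
    SELECT RAISE(ABORT, 'disjointness violated')
    WHERE EXISTS (SELECT 1 FROM football_player
                  WHERE fb_sat_s_no = NEW.ba_sat_s_no);
END;

CREATE TRIGGER football_disjoint BEFORE INSERT ON football_player
BEGIN
    SELECT RAISE(ABORT, 'disjointness violated')
    WHERE EXISTS (SELECT 1 FROM baseball_player
                  WHERE ba_sat_s_no = NEW.fb_sat_s_no);
END;
""")


def accepted(sql):
    """Return True if the statement succeeds, False if the database rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


first = accepted("INSERT INTO football_player VALUES (1)")    # athlete 1 plays football
overlap = accepted("INSERT INTO baseball_player VALUES (1)")  # athlete 1 again: rejected
other = accepted("INSERT INTO baseball_player VALUES (2)")    # a different athlete: fine
```

A complete implementation would also need triggers on updates of the key columns; they are left out here to keep the sketch short.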
L1: SPONSOR    Sp_s#      Sp_status
               (N,5)      (A,1)

                          1  L1  3
L2: CHURCH     Ch_name    Ch_sp_s#    Ch_pastor    Ch_den
               (A,30)     (N,5)       (A,30)       (A,20)
                          1  u
Note: While the entity type SPONSOR need not have a candidate key, every logical scheme, by definition, must have a primary
key. Therefore, the surrogate key, Sp_s# is “manufactured” for the relation scheme.
FIGURE 6.36 Information-preserving logical schema for the specialization lattice and
categorization in Figure 6.35
In Figure 6.35, observe that the category SPONSOR does not have a candidate key.
While this is okay in the conceptual tier, absence of a candidate key in a logical scheme is,
by definition, unacceptable. Therefore, a surrogate key, Sp_s#, has been coined for the
logical scheme of SPONSOR in Figure 6.36 and mapped as the foreign key in each of the
three superclasses in the categorization: CHURCH, SCHOOL, and INDIVIDUAL. Also, con-
forming to the universal relation schema assumption, some of the attributes in the logical
schema have been renamed so that the attribute names are globally unique in the logical
schema; for example, Ch_name in CHURCH and Sch_name in SCHOOL. Note that these
foreign keys also end up as candidate keys of the respective logical schemes due to the
cardinality ratio of 1:1 implicitly present in the design.30
In the specialization lattice, however, PUBLIC_SCHOOL is a shared subclass in two
specializations. It inherits attributes and relationship types from both SCHOOL and
NOT_FOR_PROFIT_ORGANIZATION. Since this is a lattice of “specialization,” the shared
subclass, PUBLIC_SCHOOL, assumes the role of child in the two PCRs of specialization.
Therefore, there are two foreign keys, Psh_sch_name and Psh_nfp_sp_s#, in PUBLIC_
SCHOOL that capture the relationships with SCHOOL and NOT_FOR_PROFIT_
ORGANIZATION, respectively. Observe that the two foreign keys are also candidate keys
of PUBLIC_SCHOOL because of the 1:1 cardinality ratios inherent in specialization rela-
tionships. Since Psh_sch_name is chosen as the primary key, Psh_nfp_sp_s# automatically
becomes an alternate key of the logical scheme.
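The mapping of the shared subclass can be sketched in SQL, here run through Python's sqlite3 module. Only the two foreign key names come from the text (with # replaced by _no, since # is not legal in an unquoted SQL identifier); the parent key name nfp_sp_s_no, the sample data, and the omission of all other attributes are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE school (sch_name TEXT PRIMARY KEY NOT NULL);
CREATE TABLE not_for_profit_organization (nfp_sp_s_no INTEGER PRIMARY KEY);

-- Shared subclass: one foreign key per specialization it participates in.
-- The first is chosen as primary key; the second, being UNIQUE and NOT NULL,
-- is an alternate key.
CREATE TABLE public_school (
    psh_sch_name    TEXT PRIMARY KEY NOT NULL REFERENCES school (sch_name),
    psh_nfp_sp_s_no INTEGER NOT NULL UNIQUE
        REFERENCES not_for_profit_organization (nfp_sp_s_no)
);

INSERT INTO school VALUES ('Lincoln'), ('Roosevelt');
INSERT INTO not_for_profit_organization VALUES (7), (8);
INSERT INTO public_school VALUES ('Lincoln', 7);
""")


def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Organization 7 is already a public school, so the 1:1 ratio forbids reusing it.
reuse_ok = accepted("INSERT INTO public_school VALUES ('Roosevelt', 7)")
# A different organization is acceptable.
distinct_ok = accepted("INSERT INTO public_school VALUES ('Roosevelt', 8)")
```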
How do you convert an aggregation construct to the logical tier? This was discussed
earlier in this chapter and illustrated in Figures 6.31 and 6.32 (see Section 6.8.1.4). The
information-preserving mapping of aggregation is similar to that of categorization except
that the aggregate is denoted by an a for the Jx value instead of a u. Recall that the #
above the foreign key indicates the number of superclasses participating in the
aggregation, and that the grammar records no max value because the cardinality ratio of an
SC/sc relationship is presumed to be 1:1. Therefore, when the cardinality ratio of an SC/sc
relationship in an aggregation is not 1:1, the associated metadata is not captured in the
logical schema; instead, it is recorded in the list of semantic integrity constraints that
accompanies the logical schema. Figure 6.37 shows the design-specific rendition of the ERD
(a repeat of Figure 6.31, for the reader’s convenience). The information-preserving logical
schema for the aggregation and categorization shown in Figures 6.31 and 6.37 appears in
Figure 6.38.
30
Selective type inheritance as a distinguishing property of categorization is inherent in the
relationship construct. This property is not explicitly seen in the logical schema either. The
constraint means that the same value of Sponsor# cannot occur in more than one of the relations
CHURCH, SCHOOL, and INDIVIDUAL. Because this constraint is subtle, it must be explicitly
specified in the list of semantic integrity constraints that accompanies the logical schema.
[Figure: ERD with entity types PROPERTY, LOT, BUILDING, and TAXABLE_PROPERTY; attributes Address, Mkt_vlu, Location, Bldg#, Survey#, Lot#, Area, Size, Type, Appraisal, and Living_units; construct markers U and A]
FIGURE 6.38 Information-preserving logical schema for the category and aggregate in Figure 6.37
data model, first using the foreign key design with both the directed arcs and inclusion
dependencies alternatives, and then using the information-preserving grammar. Such an
exercise will not only reinforce understanding of the information-preserving grammar, it
will also clarify the information-preserving nature of this new grammar. The logical sche-
mas in Figures 6.21, 6.22, and 6.24 may be used as springboards to embark on this
adventure.
[Figure: ERD with entity types BOOK, INSTRUCTOR, and COURSE; relationship types Adopts, Uses, and Teaches; attributes Isbn#, Price, Title, Name, Office, Qualification, Type_of_use, Credits, and Course#]
Note: It is not possible to express structural constraints of n-ary relationships where n > 2 (ternary and beyond) using the
look-across notation.
FIGURE 6.39a Presentation Layer ERD for a ternary relationship type Adopts
[Figure: ERD with entity types BOOK, USE, INSTRUCTOR, ADOPTION, and COURSE; relationship types Is_for, Used_in, Has, Selects, Follows, and Teaches; attributes Isbn#, Price, Title, Type_of_use, Name, Office, Qualification, Credits, and Course#]
FIGURE 6.39b A Design-specific ERD for the Presentation Layer ERD shown in Figure 6.39a
The mapping of a Design-Specific ERD to the logical tier has been discussed extensively
earlier in this chapter. Thus, the information-reducing logical schema
(Figure 6.39c) and the information-preserving logical schema (Figure 6.39d) require no
additional clarification, except to draw the reader's attention to the logical scheme
ADOPTION in both figures.
R3: BOOK B_isbn# B_title B_price
FIGURE 6.39c Logical schema (information-reducing) for the Design-Specific ERD in Figure 6.39b
                 1  L1  m
L2: COURSE       C_i_name     Q[C_name]    C_credits•    C_course#
                 (A,30)       (A,30)       (N,1)         (A,11)
                 0  N  1

                 0  L1  n     1  L3  p     0  L2  m
L4: ADOPTION     A_i_name     A_b_isbn#    A_c_course#
                 (A,30)       (A,17)       (A,11)
                 1  R  1      1  D  1      1  C  1

                 1  L3  n     0  L2  m
L5: USE          U_b_isbn#    U_c_course#  U_type_of_use•
                 (A,17)       (A,11)       (A,17)
                 1  D  1      1  C  1
Note: Data type and size for the attributes have been arbitrarily assigned for closure since this data
is not provided in the ER diagram.
A weak entity type child in a recursive relationship type is modeled in Figure 6.40a.
The decomposition of this relationship type appears in the final Design-Specific ERD in
Figure 6.40b. Also, the business rule stated as a semantic integrity constraint in
Figure 6.40a is incorporated as the inter-relationship constraint “exclusion dependency”
in Figure 6.40b using weak relationships.
[Figure: ERD with entity types COMPANY, CONTRACT, and PROJECT; relationship types Executes (roles Client and Contractor) and Linked_with; attributes Name, Location, Size, Year, Contract_yr, Contract#, Pname, Pno, and Location]
Business rule: A company in the role of a client in a contract cannot be a contractor in the same
contract and vice versa.
FIGURE 6.40a A copy of the ERD presented in Figure 5.45b with added deletion constraints
[Figure: ERD with entity types COMPANY, CONTRACT, and PROJECT; relationship types Executes, Services, and Linked_with (roles Client and Contractor); attributes Name, Location, Size, Year, Contract_yr, Contract#, Pname, Pno, and Location]
FIGURE 6.40b The m:n relationship with a weak entity child in Figure 6.40(a) decomposed:
The final Design-Specific ERD
FIGURE 6.40c Logical schema (information-reducing) for the Design-Specific ERD in Figure 6.40b
                                       0  L1  n         0  L1  m             0  L3  n
L2: CONTRACT   Cnt_year    Cnt_num     Cnt_co_client    Cnt_co_contractor    Cnt_prj_pno
               (A,4)       (A,7)       (A,30)           (A,30)               (A,7)
                                       1  C  1          1  R  1              1  C  1
FIGURE 6.40d Logical schema (information-preserving) for the Design-Specific ERD in Figure 6.40b
remember is that the standard decomposition procedure will establish CUSTOMER and
VEHICLE as the parents of RETURN through the appropriate identification relationship
type constructs. However, the partial specialization of RENTAL as RETURN renders these
identifying relationships redundant; that is why they are absent in Figure 6.41b. Once this
modeling insight is clear, the mapping to the logical tier becomes a simple mechanical
process.
[Figure: ERD with entity types CUSTOMER and VEHICLE; relationship types Rents and Returns; attributes Drv_License#, Name, Gender, Vehicle_id, Type, Make, and Rate]
FIGURE 6.41a Presentation Layer ERD with a weak relationship expressing inclusion dependency
[Figure: ERD with entity types CUSTOMER, RENTAL, VEHICLE, and RETURN; relationship types Seeks and Up_for; attributes Drv_License#, Name, Gender, Vehicle_id, Type, Make, and Rate]
Figures 6.41c and 6.41d display the information-reducing and the information-preserving
logical schemas, respectively, for the Design-Specific ERD portrayed in
Figure 6.41b. Please note that the foreign key reference from RETURN to RENTAL is a single
reference based on the two attributes Vehicle_id and Drv_license# together, not
individually. Individual foreign key references here would be an error.
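The difference between a composite foreign key and individual ones can be seen in a small SQL sketch, run here through Python's sqlite3 module. The sketch is illustrative only: the parent schemes CUSTOMER and VEHICLE are left out, the table is called rental_return rather than RETURN to steer clear of SQL keywords, # is replaced by _no in column names, and the cascade rule and sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE rental (
    r_v_veh_id     TEXT NOT NULL,
    r_c_drv_lic_no TEXT NOT NULL,
    r_rate         INTEGER,
    PRIMARY KEY (r_v_veh_id, r_c_drv_lic_no)
);
CREATE TABLE rental_return (
    e_r_v_veh_id     TEXT NOT NULL,
    e_r_c_drv_lic_no TEXT NOT NULL,
    PRIMARY KEY (e_r_v_veh_id, e_r_c_drv_lic_no),
    -- One foreign key over both attributes together.
    FOREIGN KEY (e_r_v_veh_id, e_r_c_drv_lic_no)
        REFERENCES rental (r_v_veh_id, r_c_drv_lic_no) ON DELETE CASCADE
);
INSERT INTO rental VALUES ('V1', 'D1', 40), ('V2', 'D2', 55);
""")


def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# The pair (V1, D1) is an existing rental, so its return is accepted.
matching_pair = accepted("INSERT INTO rental_return VALUES ('V1', 'D1')")
# V1 and D2 each occur in RENTAL, but never together: no such rental exists.
mixed_pair = accepted("INSERT INTO rental_return VALUES ('V1', 'D2')")
```

The second insert is exactly the case that two individual references would let through: each value exists somewhere in RENTAL, yet customer D2 never rented vehicle V1.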
FIGURE 6.41c Logical schema (information-reducing) for the Design-Specific ERD in Figure 6.41b
               0  L2  n       0  L1  n
L3: RENTAL     R_v_veh_id     R_c_drv_lic#     R_rate
               (A,9)          (A,13)           (N,3)
               1  R  1        1  C  1

               0 <- - - - - - - L3 - - - - - - -> 1
L4: RETURN     E_r_v_veh_id   E_r_c_drv_lic#
               (A,9)          (A,13)
               1 <- - - - - - - C - - - - - - - -> 1
Note: Data type and size for the attributes have been arbitrarily assigned for
closure since this data is not provided in the ER diagram.
FIGURE 6.41d Logical schema (information-preserving) for the Design-Specific ERD in Figure 6.41b
Managed_by
No_of_dependents
(a)
An enhanced replication of the weak relationship type portrayed in Figure 5.28
R1: EMPLOYEE Emp_emp# Emp_pl_p# Emp_fname Emp_minit Emp_lname Emp_nametag Emp_gender Emp_address Emp_salary Emp_datehired
(b)
Logical schema (Information-reducing) for the Design-Specific ERD in 6.42a
100 L2 n
• • • •
Emp_emp# Emp_pl_p# Q[Emp_fname Emp_minit Emp_lname Emp_nametag] Emp_gender Emp_address
L1: EMPLOYEE (A,7) (N,2) (A,20) (A,1) (A,20) (N,1) (A,1) (X,50)
0 C 1
•
Emp_salary Emp_datehired No_of_dependents
(N,6) (Dt,8) (N,2)
0 <- - - - - - - - L 1 - - - - - - - - > 1
Q[Pl_name] Q[Pl_emp_emp#] Pl_p#• Pl_budget Pl_no_of_emps Pl_mgrstdte
L2: PLANT (A,30) (A,7) (N,2) (N,7) (N,3) (Dt,8)
0 <- - - - - - - - R - - - - - - - - > 1
Note: Data type and size for the attributes have been arbitrarily assigned for closure since this data is not provided in the
ER diagram.
(c)
Logical schema (Information-preserving) for the Design-Specific ERD in 6.42a
FIGURE 6.42a,b,c Mapping a weak relationship type to the logical tier—A second example
31
A strict relational schema would in this case require a cross-referencing design (for example, see Figure 6.15)
because EMPLOYEE, the child entity type, sports a partial participation in the Works_in relationship type and
thus will permit null value for the foreign key, which is prohibited in a strict relational schema.
Here, mapping the Works_in relationship type to the logical tier is rather
straightforward, as reflected in Figures 6.42b and 6.42c. The intricate aspect of
the design is the ability to capture the Managed_by relationship type and the inclusion
dependency of Managed_by on Works_in using a single foreign key reference, as
depicted in Figures 6.42b and 6.42c. The reader is encouraged to study this
mapping closely.
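One way to study this mapping is to try it out. The following sketch, run through Python's sqlite3 module, reduces EMPLOYEE and PLANT to the attributes involved in the two relationship types. It is an approximation under stated assumptions: # is replaced by _no in column names, the UNIQUE constraint on the attribute pair in EMPLOYEE is added because SQLite requires the referenced columns of a foreign key to be covered by a key, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE plant (
    pl_p_no       INTEGER PRIMARY KEY,
    pl_name       TEXT NOT NULL,
    pl_emp_emp_no TEXT,  -- manager; null while no manager is assigned
    -- Managed_by: the manager must be an employee who works in THIS plant.
    FOREIGN KEY (pl_emp_emp_no, pl_p_no)
        REFERENCES employee (emp_emp_no, emp_pl_p_no) ON DELETE RESTRICT
);
CREATE TABLE employee (
    emp_emp_no  TEXT PRIMARY KEY NOT NULL,
    -- Works_in: nullable because participation of EMPLOYEE is partial.
    emp_pl_p_no INTEGER REFERENCES plant (pl_p_no) ON DELETE CASCADE,
    UNIQUE (emp_emp_no, emp_pl_p_no)
);
INSERT INTO plant VALUES (1, 'North', NULL), (2, 'South', NULL);
INSERT INTO employee VALUES ('E1', 1), ('E2', 2);
""")


def accepted(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# E1 works in plant 1, so E1 may manage plant 1.
own_plant = accepted("UPDATE plant SET pl_emp_emp_no = 'E1' WHERE pl_p_no = 1")
# E2 works in plant 2, so E2 may not manage plant 1.
other_plant = accepted("UPDATE plant SET pl_emp_emp_no = 'E2' WHERE pl_p_no = 1")
```

Because the reference pairs the manager with the plant number, a single foreign key expresses both that the manager is an employee and that the manager works in the plant being managed.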
Chapter Summary
The relational data model was first outlined in a paper published by E. F. Codd in 1970. The
model uses mathematical relations as its foundation and is based on set theory. The simplicity of
the concept and the sound theoretical basis are two reasons why the relational data model has
become the model on which most commercial database systems are based. The relational data
model represents a database as a collection of relations, where a relation resembles a two-
dimensional table of values presented as rows and columns. A row in the table represents a set
of related data values and is called a tuple. All values in a column are of the same data type. A
column is formally referred to as an attribute. The set of all tuples in the table goes by the name
“relation.” A relationship (association) between two relations in the relational data model takes
the form of a referential integrity constraint.
An attribute or collection of attributes can serve as a unique identifier of a relation. A superkey
is a set of one or more attributes, which, taken collectively, uniquely identifies a tuple. A second
type of unique identifier is called a candidate key. A candidate key is defined as a superkey with
no proper subsets that are superkeys. A candidate key has two properties: uniqueness and
irreducibility. The uniqueness property is common to both a superkey and a candidate key, whereas
the irreducibility property is present only in a candidate key. Every attribute plays only one of three
roles in a relation schema: It is a candidate key, a key attribute, or a non-key attribute of the
relation schema. Any attribute that is a constituent part (proper subset) of a candidate key of the
relation schema is a key attribute. An attribute that is neither a candidate key nor a constituent
part of one is a non-key attribute. A primary key is the candidate key chosen to uniquely identify
tuples of a relation. In addition to possessing the uniqueness and irreducibility properties, a
primary key is not allowed to have a missing (i.e., null) value (this property is known as the
entity integrity constraint).
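The two properties can be checked mechanically against a sample of tuples. The Python sketch below is hypothetical (the function names, attribute names, and data are invented), and it carries an important caveat: keys are properties of the relation schema, so a sample can only refute a proposed key, never establish one.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_attrs):
    """Irreducibility: superkeys of the sample with no proper subset that is a superkey."""
    keys = []
    # Smaller attribute sets are examined first, so any reducible superkey
    # is recognized by already containing a key found earlier.
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if is_superkey(rows, attrs) and not any(set(k) < set(attrs) for k in keys):
                keys.append(attrs)
    return keys


rows = [
    {"emp_no": 1, "ssn": "111", "name": "Ann", "dept": "A"},
    {"emp_no": 2, "ssn": "222", "name": "Bob", "dept": "A"},
    {"emp_no": 3, "ssn": "333", "name": "Ann", "dept": "B"},
    {"emp_no": 4, "ssn": "444", "name": "Ann", "dept": "A"},
]
attrs = ["emp_no", "ssn", "name", "dept"]

keys = candidate_keys(rows, attrs)             # emp_no and ssn, each on its own
reducible = is_superkey(rows, ("emp_no", "name"))  # unique, but not irreducible
not_unique = is_superkey(rows, ("name", "dept"))   # two rows share (Ann, A)
```

In this sample, (emp_no, name) is a superkey but not a candidate key because emp_no alone is already unique.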
The relational data model includes a group of basic manipulation operations that involves
relations. Collectively, these operations comprise what is known as relational algebra. Section
6.4 discussed six (selection, projection, union, minus, intersection, and natural join) of the eight
basic relational algebra operations. A more in-depth treatment of relational algebra appears in
Chapter 11. The relational data model also includes views and materialized views. A view is
defined as a named “virtual” relation schema constructed from one or more relation schemas
through the use of one or more relational algebra operations. In a database environment, views
(a) allow the same data to be seen by different users in different ways at the same time, (b)
provide security by restricting user access to predetermined data, and (c) hide complexity from
the user. Unlike a view, a materialized view is real and contains its own separate data. Materialized
views are used to freeze data at a certain point in time without preventing updates from continuing
on the data in the relation schemas on which they are based.
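The difference between a view and a materialized view can be seen in a small experiment. The sketch below uses Python's sqlite3 module; the table and column names are hypothetical. SQLite has no materialized views, so a snapshot table created with CREATE TABLE ... AS SELECT stands in for one here, which is an assumption of this example and not the mechanism of any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL
    );
    INSERT INTO employee VALUES (1, 'Ann', 'Sales', 50000), (2, 'Bob', 'IT', 60000);

    -- A view: a named, virtual relation defined by a projection; it stores no data
    -- and hides the salary column from its users.
    CREATE VIEW emp_public AS SELECT emp_no, name, dept FROM employee;

    -- Stand-in for a materialized view: a snapshot holding its own copy of the data.
    CREATE TABLE emp_snapshot AS SELECT emp_no, name, dept FROM employee;
""")

# An update to the base relation after the snapshot was taken.
conn.execute("UPDATE employee SET dept = 'HR' WHERE emp_no = 2")

view_rows = conn.execute("SELECT dept FROM emp_public WHERE emp_no = 2").fetchall()
snapshot_rows = conn.execute("SELECT dept FROM emp_snapshot WHERE emp_no = 2").fetchall()
```

The view reflects the update because it is evaluated against the base relation each time it is queried, while the snapshot still holds the value that was current when it was created.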
Sections 6.6 through 6.8 described ways to map conceptual schemas (both ER and EER
models) to logical schemas. Approaches for mapping ER constructs to a logical schema begin by
creating a relation schema for each base and weak entity type present in the Design-Specific
ERD. Only the stored attributes of the entity type become attributes of the relation schema. In the
case of composite attributes, only their constituent atomic components are recorded. For each
relation schema based on a base entity type, the atomic attribute(s) serving as the primary key is
(are) underlined. The primary key of a relation schema for a weak entity type includes the partial
key of the weak entity type plus the primary key of each identifying parent of the weak entity type.
A Design-Specific ERD contains binary and recursive relationship types that exhibit a cardi-
nality ratio of 1:n and 1:1. In cases where the cardinality ratio is 1:n, the entity type on the n-side
of the relationship type is the child in the parent-child relationship and the child (or referencing
relation schema) contains the foreign key attribute(s). This approach, where the foreign key in
the child shares the same domain with a candidate key (most of the time, the primary key) of the
parent, is known as the foreign key design. The foreign key design can be expressed diagram-
matically via the use of directed arcs or by the specification of inclusion dependencies. An alter-
native to the foreign key design is the cross-referencing design (which can also be expressed
using directed arcs or inclusion dependencies), which entails the creation of a relation schema
to represent the relationship type. This approach can be used if the absence of null values in the
foreign key is an important consideration.
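The two designs can be compared side by side in SQL. The following is a minimal sketch using Python's sqlite3 module, assuming a hypothetical 1:n relationship in which a department has many employees and an employee's participation is partial; none of the table or column names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT NOT NULL);

    -- Foreign key design: the child (n-side) carries the foreign key.
    -- Partial participation shows up as a null in dept_no.
    CREATE TABLE employee_fk (
        emp_no   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_no  INTEGER REFERENCES department (dept_no)
    );

    -- Cross-referencing design: a separate relation schema represents the
    -- relationship type, so no foreign key ever needs to be null.
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
    CREATE TABLE works_in (
        emp_no  INTEGER PRIMARY KEY REFERENCES employee (emp_no),
        dept_no INTEGER NOT NULL REFERENCES department (dept_no)
    );
""")

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee_fk VALUES (1, 'Ann', 10)")
conn.execute("INSERT INTO employee_fk VALUES (2, 'Bob', NULL)")  # Bob has no department
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")
conn.execute("INSERT INTO employee VALUES (2, 'Bob')")
conn.execute("INSERT INTO works_in VALUES (1, 10)")  # Bob simply has no row here

# Referential integrity: a reference to a nonexistent department is rejected.
try:
    conn.execute("INSERT INTO works_in VALUES (2, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

nulls_in_fk_design = conn.execute(
    "SELECT COUNT(*) FROM employee_fk WHERE dept_no IS NULL").fetchone()[0]
rows_in_cross_ref = conn.execute("SELECT COUNT(*) FROM works_in").fetchone()[0]
```

The cross-referencing design avoids the null at the cost of an extra relation schema and an extra join when employee and department data are retrieved together.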
In situations where the cardinality ratio of the relationship type is 1:1, either one of the entity
types can be the parent or child. Approaches for handling a 1:1 cardinality ratio depend on the
nature of the participation constraint that characterizes the relationship. In cases where the par-
ticipation constraint of only one of the entity types participating in the relationship type is total,
the entity type with total participation in the relationship type assumes the role of the child in the
parent-child relationship and the foreign key design is applied. When the participation constraints
of both entity types in the relationship type are partial, a variety of approaches can be consid-
ered: (a) a foreign key design with the addition of a foreign key in either one of the relation
schemas involved in the 1:1 relationship type, (b) mutual-referencing, in which the two relation
schemas directly reference each other via foreign keys included in both, and (c) a cross-
referencing design. A third case is when the participation constraints of both entity types in the
relationship type are total. Situations of this type are handled by using mutual-referencing.
Mutual-referencing must be accompanied by the imposition of several constraints, some of
which can be established declaratively and some of which require procedural intervention.
Merging the entity types involved in this or any other type of 1:1 relationship into a single relation
schema is always a possibility, but it is often not employed when the distinct nature of the entity types
would be lost or when the entity types also participate independently in other relationship types.
The information-reducing nature of design approaches that make use of directed arcs and
inclusion dependencies for mapping ER constructs (i.e., entity types and relationship types) to a
logical schema is illustrated via their application to the mapping of the Design-Specific ER Model
for Bearcat Incorporated to a logical schema. Information lost (i.e., ignored) in the transformation
process includes: the nature of the cardinality ratios, the participation constraints of each rela-
tionship type, the optional/mandatory property of an attribute, the identification of alternate keys,
the composite nature of certain atomic attributes, the existence of derived attributes, deletion
rules, and attribute type and size. A logical modeling grammar capable of producing a logical
schema that is information-preserving is described and then applied to the mapping of the
Design-Specific ER Model for Bearcat Incorporated.
The next section started with a discussion of the application of the foreign key design
approach using directed arcs to map the EER constructs of (a) the specialization/generalization
hierarchy, (b) the specialization/generalization lattice, (c) categorization, and (d) aggregation.
With the exception of aggregation, given the presence of only 1:1 cardinality ratios, the mapping
of these constructs can make use of strategies that are similar to those used to map relationship
types with 1:1 cardinality ratios, depending on the type of participation of the superclass (partial
or total) in the relationship. Since aggregation permits the relaxation of the inherent property of
the 1:1 cardinality ratio in an SC/sc relationship (an aggregate is a subclass that is a subset of
the aggregation of the superclasses in the relationship), the logical mapping of an aggregation is
similar to that of the foreign key design in a 1:n relationship type. As with ER constructs, applying
the foreign key design using directed arcs and inclusion dependencies to EER constructs in order to
develop a logical schema was shown to be information-reducing and thus amenable to an
extension of the information-preserving logical modeling grammar described previously.
Accordingly, the information-preserving grammar for EER constructs was described at this point
and application of this grammar was demonstrated using a few examples.
Section 6.9 explicated mapping techniques for some of the advanced ER modeling grammar
constructs as well as a few complex ER models. Both information-reducing and information-preserving
logical modeling grammars were used for the mapping process.
Exercises
1. Define the terms “tuple,” “attribute,” and “relation.”
2. What is a relation schema? What is the difference between a relation, a relation schema,
and a relational schema?
3. What is the difference between a derived attribute and a stored attribute in terms of their
representation in a relation schema?
4. What is a null value? What gives rise to null values in a relation?
5. What is the difference between a subset and a proper subset?
6. What is a candidate key? How does a candidate key differ from a superkey?
7. What is a primary key? How do the properties of a primary key differ from those of a can-
didate key?
8. Identify the superkeys, candidate key(s), and the primary key for the following relation
instance of the STU-CLASS relation schema.
Student Number    Student Name    Student Major    Class Name    Class Time
[rows of the relation instance not recovered]
9. Define the term “referential integrity constraint.” Why is referential integrity important?
How is the term “foreign key” used in the context of referential integrity?
Relation R1
30 A 20
45 B 32
75 A 24
Relation R2
30 B 24
75 C 12
30 B 20
Show the relations created as a result of the following relational algebra operations:
a. The union of R1 and R2
b. The difference of R1 and R2
c. The difference of R2 and R1
d. The intersection of R1 and R2
e. The natural join of R1 and R2. [Assume that R1.a and R2.x are the joining attributes.]
11. Consider the following relations of DRIVER, TICKET_TYPE, and TICKET:
DRIVER (Dr_license_no, Dr_name, Dr_city, Dr_state)
TICKET_TYPE (Ttp_offense, Ttp_fine)
TICKET (Tic_ticket_no, Tic_ticket_date, Tic_dr_license, Tic_ttp_offense)
An instance of each of these relations appears here:
TICKET_TYPE
Ttp_offense        Ttp_fine
Parking            15
Red Light          50
Speeding           65
Failure To Stop    30

TICKET
Tic_ticket_no    Tic_ticket_date    Tic_dr_license_no    Tic_ttp_offense
[TICKET rows not recovered]
Use the data to answer the following questions. Also, list the relational algebra operation(s)
required to obtain the answer.
a. What are the names of all drivers?
b. What are the license numbers of all drivers who have been issued a ticket?
c. What are the license numbers of those drivers who have never been issued a ticket?
Hint: Consider the use of the minus operator along with one other relational algebra
operator.
d. What are the names of all drivers who have been issued a ticket?
12. What would cause a relational schema for a database to contain more relation schemas
than there are entity types?
13. Discuss the concept of information preservation in data model mapping.
14. What is required to map a base entity type to a relation schema? Describe how this
approach differs for a weak entity type.
15. What is required to map a relationship type that exhibits a 1:n cardinality ratio?
16. What is the difference between the referencing relation schema and the referenced relation
schema? How are these terms incorporated into the foreign key design?
17. What is the purpose of the cross-referencing design?
18. What complicates the mapping of 1:1 cardinality ratios?
19. Describe mutual-referencing and the complexities that it introduces.
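[Figure: ERD for Exercise 23. Entity types SALESPERSON and VEHICLE are related through Sells, with structural constraints (0, n) and (0, 1). Attribute labels recovered: Emp_no, Name, Fname, Minit, Lname, Gender, Date_hired, Base_salary, Pct_commision, Vin_num, Lic_plate#, Msrp, Model_yr, Make. Data type annotations recovered: [N,6], [A,1], [A,20], [A,7], [N,2], [A,30], [N,5], [N,5.2], [N,4], [Dt,8], [A,13].]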
24. List (tabulate) the metadata available in the ERD for Exercise 23 and indicate the ones
captured in the logical schema of design 23(a) and design 23(b).
25. For the ERD for Exercise 23, specify the logical (relational) schema as per the cross-
referencing design in the following ways:
a. using directed arcs
b. in terms of inclusion dependencies
Also, explain the merits and demerits of this cross-referencing design over the foreign key
design solution.
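[Figure: ERD excerpt. Entity types PROGRAMMER and SYSTEM are related through Resident_expert, with structural constraints (x, 1) and (y, 1). Attribute labels recovered: Emp_no, Name, Gender, Date_hired, Salary, Qualification, Sys_id, Sys_name, Size.]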
For the following three cases, specify the logical (relational) schema using either directed
arcs or in terms of inclusion dependencies according to the foreign key design, cross-
referencing design, and mutual-referencing design:
a. when x = 0 and y = 1
b. when x = 0 and y = 0
c. when x = 1 and y = 1
In each case, offer a comparative discussion of the merits and demerits of the three design
options.
27. Consider the following excerpt from an ERD:
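[Figure: ERD excerpt. Entity type NURSE, with attributes Emp_no and Salary, participates in the relationship type Supervises; a structural constraint of (x, 1) is shown.]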
Specify the logical (relational) schema (choosing a design without a need to use null values
to indicate partial participation) in the following ways:
a. using directed arcs
b. in terms of inclusion dependencies
28. How can the design requirement in Exercise 27 be satisfied if the “Supervisee” part of the
relationship has the structural constraints (0, 1)? Again, show a design using both directed
arcs and in terms of inclusion dependencies.
29. Specify the logical schema for the ERD in Exercise 23 using the information-preserving
grammar, and indicate the metadata present in the ERD (Exercise 24) captured by this
logical schema.
30. Specify the logical schema for the ERD for Exercise 27 using the information-preserving
grammar.
31. Consider the following ERD:
[Figure: ERD for Exercise 31. Labels recovered: Date, Prescription #, Med_name, Med_code, List_price.]
Using the foreign key design, specify the logical schema for this diagram in the following
ways:
a. with a directed arc
b. in terms of an inclusion dependency
c. using the information-preserving grammar
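[Figure: ERD. Entity types COMPANY and STUDENT are linked to INTERNSHIP through the relationship types Provides and Gets. Labels recovered: Location, Period, Stipend; cardinality markers 1, 1, m, n.]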
Using the foreign key design, specify the logical schema in the following ways:
a. with directed arcs
b. in terms of inclusion dependencies
c. using the information-preserving grammar
Also, list the metadata present in the ERD and indicate the ones captured by each of the
three designs of the logical schema.
33. Specify the logical (relational) schema for the ERD shown in Figure 5.61 in Chapter 5 for
the Cougar Medical Associates as per the foreign key design:
PART III
NORMALIZATION
INTRODUCTION
At this point in the database design life cycle, we are in the logical tier, and a logical data model
comprising a logical schema and a list of semantic integrity constraints has been developed.
The modeling process to this point has been more heuristic and intuitive than scientific, and in fact
the source schema (ERD) for the logical modeling process was conceived intuitively.
Now that we are at the doorstep of implementing a database system using this design,
a valid question to consider concerns the “goodness” of the design. What do we know about
the quality of the data model we have in our hands? How do we vouch for the goodness of
the initial conceptual model and the quality of the process of transforming the conceptual
data model to its logical counterpart? The data models on hand at this point are probably
“good” for user-analyst interaction purposes. But how can we make sure that the database design,
if implemented, will work without causing any problems? No matter what approach is
taken,1 grouping of attributes is an intuitive process and so requires validation for design
quality. How do we go about doing this? The answer is normalization.
A major problem that often escapes attention during semantic considerations in data
modeling is data redundancy.2 Data redundancy creates the potential for inconsistencies
in the stored data. Normalization is a technique that systematically eliminates data
redundancies in a relational database. The principles of normalization have been developed
as a part of relational database theory. While the information-preserving logical data
model developed in Part II accommodates constructs beyond what is permissible in a
relational data model, the issues and answers addressed by normalization principles apply
equally to all data models in the logical tier. Since contemporary database systems are
dominated by relational data models, we confine our attention to the relational data
model and relational database systems. Figure III.1 points out our current location in the
data modeling journey.
Chapter 7 looks at data redundancy in a relation schema and why it is a problem.
The problem is then traced to its source, that is, undesirable functional dependencies.
Functional dependencies are examined through inference rules called Armstrong’s axioms.
Next, we study techniques to derive the candidate keys of a universal relation schema for
a given set of functional dependencies. Chapter 8 is dedicated to developing a solution to
data redundancy problems triggered by undesirable functional dependencies; in other
words, normalization. After discussing normal forms associated with functional
dependencies in isolation, we examine the side effects of normalization—namely, the
lossless-join property and dependency preservation property. Chapter 8 presents a
comprehensive approach to resolving various normal form violations triggered by a set of
functional dependencies in a universal relation schema. This is followed by a brief
discussion of how to “reverse engineer” a normalized relational schema to the conceptual
tier, which often forges a better understanding of the database design. Chapter 9
completes the discussion of normalization by examining the impact of multi-valued
dependency and join-dependency on a relation schema.
1 This book uses a top-down approach to database design (also known as design by analysis), as
shown in Figure III.1 and the other Part-introductory figures. A bottom-up approach to database
design, based on the early binary modeling work by Abrial (1974), is also possible. Somewhat less
popular than the top-down approach, this design-by-synthesis approach is the basis for the NIAM
model.
2 Redundancy means “superfluous repetition” that does not add any new meaning.
[Figure III.1: Roadmap of the data modeling and database design process, with “We are here” marking Normalization. Stages shown: Universe of Interest; Requirements Specification (Process Specifications, Data Specifications); Process Model; Conceptual Design/Schema [ER Modeling Grammar]; ER Diagram; Design-Specific ER Model + updated semantic integrity constraints list; Logical Data Modeling; Technology-Independent Logical Schema [Information-Preserving Grammar]; Normalization; Technology-Dependent Logical Schema; Physical Design/Schema.]
CHAPTER 7
FUNCTIONAL DEPENDENCIES
When developing ER models, the entity types and relationships among them are
intuitively distilled from the requirement specifications; then attributes are assigned
to each entity type and sometimes to relationship types. Alternatively, all discernible
data elements in the requirement specifications are treated as attributes, and these
attributes are grouped based on apparent commonalities. The clusters of attributes
are then labeled as entity types and related to each other as semantically obvious.
Unfortunately, there is no objective means to validate the attribute allocation process
during conceptual modeling.1
Normalization, the topic of Chapter 8, is a technique that facilitates systematic
validation of participation of attributes in a relation schema from a perspective of data
redundancy. The building block that enables a scientific analysis of data redundancy and
the elimination of anomalies caused by data redundancy through the process of
normalization is called functional dependency. This chapter introduces the concept of
functional dependency and its role in the normalization process.
This chapter begins with a simple example in Section 7.1 that highlights the issues
pertaining to “goodness” of design of a conceptual/logical data model. Section 7.2
introduces functional dependency and how this concept can be used to scientifically
evaluate the “goodness” of a conceptual/logical design from the perspective of data
redundancy. This section includes a definition of “functional dependency,” a discus-
sion of inference rules that govern functional dependencies (called Armstrong’s
axioms) and the idea of a minimal cover for a set of functional dependencies.
Application of Armstrong’s axioms to systematically derive the candidate keys of
a relation schema, given a set of functional dependencies that hold on the relation
schema, is presented in Section 7.3.
1 In fact, some people question the efficacy of the conceptual modeling step in a database design.
Hypothetically speaking, it is possible to develop a relational data model directly from user require-
ment specifications by transforming all business rules to domain constraints and functional depen-
dencies. However, we subscribe to the school that advocates conceptual modeling as a necessary
and useful step in the database development process.
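[Figure 7.1: ERD portion. Labels recovered: STOCK, WAREHOUSE, Replenishes, Stock_item, Product, Store, Price, Quantity, Location, Discount, Sq_ft, Manager, W_no, Name, Address; structural constraint (1, m) is shown on one side of Replenishes.]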
2 A representative state means that all characteristics of the real, complete relation can be inferred
from the instance shown. That is, the tuples in the relation instance have been hand-picked to fully
represent all the characteristics of the source relation. For instance, one can infer that each Product
has exactly one Price from Figure 7.1c. It is incorrect to argue about the possibility of the Price of a
Product varying from store to store on common sense grounds. After all, common sense varies from
person to person! In other words, any inference about the properties (not data values) of this rela-
tion must be made from the instance of the relation presented—that is why the instance is made
available.
A quick review of the contents of Figure 7.1c confirms {Store, Product} to be a candi-
date key of STOCK, as denoted in the ERD. As a consequence, it can be the primary key
of the relation schema STOCK, as indicated by the underlining here:
STOCK (Store, Product, Price, Quantity, Location, Discount, Sq_ft, Manager)
A cursory look at the table in Figure 7.1c indicates all sorts of redundancy in its content—
literally, every attribute value appears to be duplicated. A closer inspection reveals that there
is some data redundancy in the table, but not all data that appear on the surface to be redun-
dant are actually redundant. For instance, there are lots of duplicate values of Quantity. Does
this mean there is data redundancy in the attribute Quantity? No, because there is no “super-
fluous repetition” of data values of Quantity in STOCK. It is true that a given Product has the
same Quantity in more than one row of the table. This would be redundant only if this is the
case irrespective of the store in which it is stocked. Since that is not the case, presence of
duplicate values of Quantity in STOCK does not signify redundancy. On the other hand, the
Price of a Product in STOCK is the same irrespective of any other fact in the table (e.g., any
store). Therefore, duplication of the Price of a Product in multiple rows in the table amounts to
redundant data. Based on similar reasoning, notice that there is redundancy in the data for
Location as well as Discount in STOCK. It is a good exercise for the reader to reason this out.
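The reasoning above amounts to asking whether one set of attributes always fixes the value of another. A minimal Python sketch of that test follows. The rows are hypothetical: they are not the contents of Figure 7.1c, only a small instance chosen to be consistent with the facts stated in the text, and Sq_ft and Manager are omitted for brevity.

```python
def determines(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs (in this instance)."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True

COLUMNS = ["Store", "Product", "Price", "Quantity", "Location", "Discount"]
# Hypothetical STOCK instance (assumed for illustration only).
STOCK = [dict(zip(COLUMNS, values)) for values in [
    (11, "Dishwasher",     600, 180, "Houston", 10),
    (11, "Refrigerator",  1850, 120, "Houston",  5),
    (14, "Dishwasher",     600, 180, "Tulsa",   10),
    (14, "Refrigerator",  1850,  60, "Tulsa",    0),
    (17, "Dishwasher",     600, 120, "Memphis",  5),
    (17, "Vacuum Cleaner", 300,  60, "Memphis",  0),
]]

price_is_redundant = determines(STOCK, ["Product"], ["Price"])        # True
quantity_is_redundant = determines(STOCK, ["Product"], ["Quantity"])  # False
location_is_redundant = determines(STOCK, ["Store"], ["Location"])    # True
discount_is_redundant = determines(STOCK, ["Quantity"], ["Discount"]) # True
```

Quantity repeats across rows, yet it is not fixed by Product alone, so the repetition is not superfluous. Price, Location, and Discount are each fixed by something less than the whole key {Store, Product}, which is why their repetition is redundant.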
The next issue to investigate is the “so what?” question—that is, why does the data
redundancy matter? While the wasted storage space need not be a serious issue, there are
more significant problems. Suppose we want to add Washing Machine to the stock with a
Price. We cannot do this without knowing a Store where washing machines are stocked. This
is because {Store, Product} is the primary key of this table STOCK, and the entity integrity
constraint stipulates that neither Store nor Product can have “null” values in this table. This
is what is called an insertion anomaly.3 This is a serious problem because it may be an
unreasonable imposition on the user community. Now, say that store 17 is closed. In order
to remove store 17, not only do we need to remove several rows from the STOCK table, we
inadvertently lose the information that the vacuum cleaner is priced at $300, since no other
store presently stocks vacuum cleaners. This is a deletion anomaly. If, for instance, we want
to change the Location of store 11 from Houston to Cincinnati, we need to update all rows in
the STOCK table that pertain to store 11. Failure to do so will result in store 11 being located in
both Houston and Cincinnati. This is referred to as an update anomaly. In this and other
chapters of the book, we use the umbrella term modification anomalies to collectively refer
to insertion, deletion, and update anomalies. One way of addressing modification anomalies
is to decompose the STOCK table into other relations, as shown in Figure 7.2. The three
relations (tables) STORE, PRODUCT, and INVENTORY are decompositions that collectively
replace the table STOCK shown in Figure 7.1.
3 An anomaly, according to the Random House Dictionary, means a “deviation from the rule, type,
or form; an irregularity or abnormality.”
Now, if we want to add Washing Machine and its price, we can add the information to
the PRODUCT table; whenever a store begins to stock washing machines, we can add the
necessary data to the INVENTORY table, which eliminates the insertion anomaly in
STOCK. If store 17 is closed, a single row in the STORE table is deleted and there is no
other loss of information (we still know that a vacuum cleaner costs $300)—an example of
removing the deletion anomaly. Changing the location for store 11 requires the update of
a single row in the STORE table, as opposed to modifying several rows in the original
STOCK table, removing the update anomaly. It is true that the decomposed design in
Figure 7.2 may be less efficient for data retrieval; but then, that is the price for eliminat-
ing modification anomalies caused by redundant data.
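The effect of the decomposition can be sketched with projection and natural join written as plain Python functions. The STOCK rows below are hypothetical (not the contents of Figure 7.1c), and the attribute lists assumed for STORE, PRODUCT, and INVENTORY are inferred from the discussion, not copied from Figure 7.2; Sq_ft and Manager are left out for brevity.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]

def natural_join(left, right):
    """Natural join on every attribute name the two relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {frozenset(row.items()) for row in rows}

COLUMNS = ["Store", "Product", "Price", "Quantity", "Location", "Discount"]
# Hypothetical STOCK instance (assumed for illustration only).
STOCK = [dict(zip(COLUMNS, values)) for values in [
    (11, "Dishwasher",     600, 180, "Houston", 10),
    (11, "Refrigerator",  1850, 120, "Houston",  5),
    (14, "Dishwasher",     600, 180, "Tulsa",   10),
    (14, "Refrigerator",  1850,  60, "Tulsa",    0),
    (17, "Dishwasher",     600, 120, "Memphis",  5),
    (17, "Vacuum Cleaner", 300,  60, "Memphis",  0),
]]

# Decomposition (attribute lists are assumptions of this sketch).
STORE = project(STOCK, ["Store", "Location"])
PRODUCT = project(STOCK, ["Product", "Price"])
INVENTORY = project(STOCK, ["Store", "Product", "Quantity", "Discount"])

# Joining the three relations gives back exactly the original tuples.
rejoined = natural_join(natural_join(INVENTORY, STORE), PRODUCT)
lossless = as_set(rejoined) == as_set(STOCK)

# Deletion anomaly in STOCK: closing store 17 loses the vacuum cleaner price.
stock_after = [r for r in STOCK if r["Store"] != 17]
price_lost = all(r["Product"] != "Vacuum Cleaner" for r in stock_after)

# In the decomposed design, store 17 leaves STORE (and its rows leave INVENTORY),
# while PRODUCT is untouched, so the price survives.
store_after = [s for s in STORE if s["Store"] != 17]
inventory_after = [i for i in INVENTORY if i["Store"] != 17]
price_kept = any(p["Product"] == "Vacuum Cleaner" and p["Price"] == 300
                 for p in PRODUCT)
```

The joins in the last step are also the retrieval cost mentioned above: a query that once read a single STOCK table now has to combine three relations.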
STORE and PRODUCT now do not have any data redundancy. How about INVENTORY?
Are Store and Product in this relation redundant because these attributes are already present
in the other two relations? The answer is “No,” simply because the repetition of the attributes
in INVENTORY is not superfluous. These attributes in INVENTORY convey more semantics
than what is present in STORE and PRODUCT. However, it is obvious from the table instance
INVENTORY as well as from the original table instance of STOCK that Discount values are
redundantly stored. After all, for a given Quantity, there is only a single, specific Discount
value. The solution is a simple further decomposition, shown in Figure 7.3.
However, there are times when some data redundancy is willfully tolerated as a tradeoff
for efficiency of querying (data retrieval). Suppose the discount structure, according to the
user, is relatively stable (i.e., it changes rarely). Then, the design in Figure 7.2 may be a
better design than that in Figure 7.3, despite the data redundancy in INVENTORY.
Such a redundancy is sometimes referred to as controlled redundancy.
The table STOCK in Figure 7.1c is said to be “unnormalized”—that is, it has many
data redundancies—whereas the set of tables in Figure 7.3 is said to be fully “normal-
ized,” given that it has no data redundancies. The questions at this point ought to be:
• How do we systematically identify data redundancies?
• How do we know how to decompose the base relation schema under
investigation?
Relation schemas shown: L1: STORE (Store, Location, Sq_ft, Manager); L2: PRODUCT (Product, Price)
FIGURE 7.4a A reverse-engineered logical schema for the set of tables in Figure 7.3
Note that the structural constraints of relationships emerge from the data in the
tables. Since deletion rules are not discernible from the tables, the default value of restrict
(R) is adopted. At this point, we are able to further “reverse engineer” the relational
schema to a Design-Specific ERD, as shown in Figure 7.4b.
FIGURE 7.4b Design-Specific ERD reverse-engineered from the logical schema in Figure 7.4a (surviving labels: Sq_ft, Manager, (1, n), Quantity, DISC_STRUCTURE, Discount)
Observe that INVENTORY is a gerund entity type with two identifying parents, STORE
and PRODUCT. Finally, we can also abstract the Design-Specific ERD to the Presentation
layer by unraveling the m:n relationship type implicit in the gerund entity type. Since ER
modeling grammar does not allow a relationship type to be related to another relationship
type, INVENTORY necessarily becomes a cluster entity type. The Presentation Layer ERD
that will yield the Design-Specific ERD in Figure 7.4b is presented in Figure 7.4c, and we
have just completed what is known in data modeling circles as “reverse engineering.” By
comparing the ERDs in Figures 7.4c and 7.1, we ought to be able to appreciate the design
problems due to data redundancy hidden in the original entity type STOCK.
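[Figure 7.4c: Presentation Layer ERD. Surviving labels: STORE (Store, Location, Sq_ft, Manager), PRODUCT (Product, Price), m:n relationship type Stocks, cluster entity type INVENTORY, DISC_STRUCTURE (Quantity, Discount); cardinality and deletion-rule markers m, n, 1, C, R, N.]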
Similar problems may be present in WAREHOUSE (see Figure 7.1a) and every other
relation schema in a relational data model. At this point in the database design life cycle,
each relation schema in the relational data model must be independently scrutinized and
“normalized” where needed. While conceptual modeling is not a scientific process, orga-
nized application of intuition in the development of the conceptual data model and careful
mapping of the conceptual schema to the logical tier often yields a logical (relational)
schema where most of the constituent relation schemas are fully normalized—that is, free
from modification anomalies. Nonetheless, as shown in the example, a systematic verifi-
cation of the “goodness” of the design at this stage of data modeling is imperative lest the
implemented database system should fail to meet user expectations. To that end, let us
proceed to investigate the “how-do” questions:
• How do we systematically identify data redundancies?
• How do we know how to decompose the base relation schema under
investigation?
• How do we know that the decomposition is correct and complete without
looking at sample data?
In order to engage in this enquiry, it is necessary to understand the concept of func-
tional dependency.
4 As a matter of technicality, functional dependencies exist only when attributes or other elements
involved have unique and singular identifiers (Kent, 1983, p. 121). Since a relation schema, by definition,
has a candidate key, the functional dependency concept is applicable to a relation schema.
a relation instance is, by our definition, a specially prepared relation state r of R con-
forming to the set of FDs specified on R and can be used to infer the FDs in R.
Note: The reflexivity rule defines what is called a trivial dependency. A dependency is
trivial if it is impossible to not satisfy it.7
• Augmentation rule: If X → Y, then {X,Z} → {Y,Z}; also, {X,Z} → Y.
• Transitivity rule: If X → Y, and Y → Z, then X → Z.
5 Here, it is useful to assume that the entire database for the application domain is a single universal
relation schema.
6 Armstrong, W. W., “Dependence Structures of Data Base Relationships.” Proc. IFIP Congress.
Stockholm, Sweden, 1974.
7 An FD in R is trivial if and only if the dependent is a subset of the determinant. Since trivial
dependencies do not provide any additional information (i.e., do not add any new constraints on R),
they are usually removed from F+.
Given:
Store → Location and Store → Sq_ft,
Store → {Location, Sq_ft} exemplifies the union rule.
The pseudotransitivity rule is a handy corollary of the transitivity rule and works the
following way:
If Manager → Store and {Store, Product} → Quantity, then {Manager, Product} → Quantity.
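One way to see why pseudotransitivity holds is to derive it from the augmentation and transitivity rules. The derivation below is a sketch in generic symbols (X, Y, W, Z are placeholders introduced here, not notation from the text):

```latex
\begin{align*}
&\text{Given: } X \to Y \quad\text{and}\quad \{Y, W\} \to Z \\
&1.\ \{X, W\} \to \{Y, W\} && \text{augmentation of } X \to Y \text{ with } W \\
&2.\ \{X, W\} \to Z && \text{transitivity of step 1 and } \{Y, W\} \to Z
\end{align*}
```

Substituting X = Manager, Y = Store, W = Product, and Z = Quantity reproduces the example: {Manager, Product} determines Quantity.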
The inference rules for FDs known as Armstrong’s axioms are summarized in
Table 7.1. Several of the inference rules discussed earlier can be derived from Darwen’s
General Unification Theorem (1992):
If X → Y and Z → W, then X ∪ {Z − Y} → {Y, W}
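As an illustration of how earlier rules fall out of the theorem, two specializations can be worked through by choosing Z suitably. This is a sketch rather than a full axiomatic proof; the last step of the second case assumes the decomposition rule (an attribute may be dropped from the dependent).

```latex
\begin{align*}
&\text{Theorem: } X \to Y,\ Z \to W \;\Rightarrow\; X \cup (Z - Y) \to Y \cup W \\[4pt]
&\text{Union } (Z = X):\quad X \cup (X - Y) = X,\ \text{so } X \to Y \cup W \\
&\text{Transitivity } (Z = Y):\quad X \cup (Y - Y) = X,\ \text{so } X \to Y \cup W,\ \text{hence } X \to W
\end{align*}
```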
In principle, the closure, F+, of a given set of FDs, F, can be computed by the use of a
rather inefficient algorithm: Repeatedly apply Armstrong’s axioms on F until it stops producing
new FDs (Date, 2004).
Table 7.1 (columns: Rule | Definition)
References: Armstrong, W. W. “Dependence Structures of Data Base Relationships.” Proc. IFIP Congress,
Stockholm, Sweden, 1974; Darwen, H. “The Role of Functional Dependencies in Query Decomposition.”
In Date, C. J., and H. Darwen, Relational Database Writings 1989–1991. Addison-Wesley, 1992.
It is likely that a set of FDs, F, translated from user-specified business rules has redundant
(extraneous) attributes and sometimes redundant (extraneous) FDs8 because, as narratives,
requirement specifications are prone to be somewhat repetitive. For example, consider
the attribute set {Store, Product, Price} and an associated set of FDs, F, culled from the
requirement specification, where fd1: {Store, Product} → Price and fd2: Product → Price. Here,
fd1 is redundant because (F – fd1) is equivalent to F, meaning removal of fd1 will not change
F+. Alternatively, Store in fd1 is a redundant attribute, and removal of Store from fd1 will not
change F+. Likewise, given an attribute set {Store, Product, Price, Quantity} and an associated
set of FDs, G, where fd1: {Store, Product} → {Quantity, Price} and fd2: Product → Price, the
attribute Price is redundant in fd1; that is, removal of Price from fd1 will not change G+.
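These redundancy claims can be checked mechanically. The sketch below is one possible way to do it, assuming FDs are represented as (determinant, dependent) tuples and using an attribute-closure helper; the function names `closure`, `is_redundant_fd`, and `is_redundant_attribute` are invented for this illustration.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_redundant_fd(fd, fds):
    """An FD is redundant if it still follows after being removed from fds."""
    rest = [other for other in fds if other != fd]
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, rest)

def is_redundant_attribute(attr, fd, fds):
    """attr is redundant in the determinant of fd if the reduced determinant
    still determines the dependent under fds."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {attr}, fds)

# F from the text: fd1: {Store, Product} -> Price, fd2: Product -> Price
fd1 = (("Store", "Product"), ("Price",))
fd2 = (("Product",), ("Price",))
F = [fd1, fd2]

print(is_redundant_fd(fd1, F))                  # True
print(is_redundant_fd(fd2, F))                  # False
print(is_redundant_attribute("Store", fd1, F))  # True

# G from the text: dropping Price from the dependent of fd1 loses nothing,
# because Price is still reachable from {Store, Product} through fd2.
G_reduced = [(("Store", "Product"), ("Quantity",)), fd2]
print({"Price"} <= closure(("Store", "Product"), G_reduced))  # True
```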
Suppose that a set of FDs, F, prevails over a relational schema. This means that
whenever a user performs an update that entails changes to one or more relations in the
database, the DBMS must ensure that the update does not violate any of the FDs in F. All
FDs in F must hold in the updated database state; otherwise, the DBMS must roll back the
updates and restore the database to the original state that prevailed before the update. It is
always useful to identify a simplified set of FDs, Gc, equivalent to F (that is, having the
same closure, F+, as F) and not further reducible. This Gc is not only equivalent to F, but
further reduction of Gc destroys the equivalence. The practical value of such a simplified
set of FDs, Gc, is that the effort required to check for violations in the database is minimized
because a database that satisfies Gc will also satisfy F and vice versa. Gc in this case
is called the canonical cover or minimal cover of F. Formally, a set of FDs, Gc, is a minimal
cover for another set of FDs, F, if the following conditions are satisfied:
• Gc ≡ F (Gc and F are equivalent.)
• The dependent (right-hand side) in every FD in Gc is a singleton attribute.
This is known as the standard or canonical form of an FD and is intended to
8 This is based on the application of the transitivity rule of Armstrong’s axioms.
Using Armstrong’s axioms, we can deduce from F that Tenant → Apartment is in F+.
Note that the set of FDs, G:
is a cover for F. However, is G a minimal cover for F? No. It will be a minimal cover only if
there are no redundant attributes and redundant FDs in G. An examination of G reveals
that, given fd2, the attribute Tenant is redundant in fd3. Removal of the redundant attribute
from fd3 renders fd2 and fd3 identical; so one of these two FDs (say, fd3) is redundant
and can be deleted. Next, given fd1, fd4 is redundant and can be removed without
any consequence. Thus, we are left with G′: {fd1, fd2}, where
fd1: Tenant → {Apartment, Rent}; and fd2: Apartment → Rent
Is G′ a minimal cover for F? The answer is still no, because Tenant → Rent in fd1 is
still redundant. Removing this redundancy from fd1, we have Gc: {fd1, fd2}, where:
fd1: Tenant → Apartment and fd2: Apartment → Rent
which now is a minimal cover for F, G, and G′.
Is Gx: {fd1, fd2}, where
fd1: Tenant → Apartment and fd2: Tenant → Rent
a minimal cover for F?
The answer is no because Gx is not equivalent to F, in that Gx+ ≠ F+. For
instance, the FD Apartment → Rent present in F+ is not present in Gx+.
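Equivalence of two FD sets can be tested without enumerating either closure: each set must imply every FD of the other. The sketch below assumes that representation and uses Gc as a stand-in for F (the text states they are equivalent); the helper names `follows`, `covers`, and `equivalent` are invented for this illustration.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def follows(fd, fds):
    """True if fd can be inferred from fds."""
    lhs, rhs = fd
    return rhs <= closure(lhs, fds)

def covers(g, f):
    """True if every FD in f can be inferred from g."""
    return all(follows(fd, g) for fd in f)

def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)

Gc = [({"Tenant"}, {"Apartment"}), ({"Apartment"}, {"Rent"})]
Gx = [({"Tenant"}, {"Apartment"}), ({"Tenant"}, {"Rent"})]
# The two-FD set before the last reduction step in the text.
G_before = [({"Tenant"}, {"Apartment", "Rent"}), ({"Apartment"}, {"Rent"})]

print(equivalent(G_before, Gc))                  # True
print(equivalent(Gx, Gc))                        # False
print(follows(({"Apartment"}, {"Rent"}), Gx))    # False
```

The last line pinpoints the reason for the failure: Apartment alone determines nothing under Gx.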
In short, a minimal cover Gc of a set of FDs, F, is not only equivalent to F (that is,
F ≡ Gc, meaning F+ = Gc+), it also contains neither redundant FDs nor redundant attributes.
Every set of FDs, F, possesses a minimal cover. F can be its own minimal cover, too.
In fact, careful construction of F from the business rules often yields F itself as a minimal
cover of the set of FDs in it. Furthermore, there can be several minimal covers for F.
When several sets of FDs qualify as minimal covers of F, additional criteria are used to
choose a minimal cover, such as the minimal cover with the least number of FDs or the
minimal cover that most closely resembles F.
7.2.3.2 Example 2
Consider a set of attributes {A, B, C} and an associated set of FDs, F: {fd1, fd2, fd3, fd4},
where:
is a minimal cover of F.
Targeting attributes and FDs for evaluation in a different sequence, it can be shown that:
Gm: {fd1, fd2, fd3}, where:
7.2.3.3 Example 3
Consider the set of attributes {Student, Advisor, Subject, Grade} and an associated set of
FDs, F:
fd1: {Student, Advisor} → {Grade, Subject};
fd2: Advisor → Subject;
fd3: {Student, Subject} → {Grade, Advisor}
G, the expression of F in standard form, can be written as follows:
7.2.3.4 Example 4
Consider the attribute set:
{Product, Store, Vendor, Date, Quantity, Unit_price, Discount, Size, Color}
and the set of FDs, F:
fd1: Product → {Size, Color}
fd2: {Vendor, Quantity} → {Unit_price, Discount};
fd3: {Product, Store, Date, Quantity} → {Vendor, Discount};
fd4: {Product, Size, Color, Store, Date} → Vendor
What is the minimal cover of F?
The first step is to express F in standard form, say, G: {fd1a, fd1b, fd2a, fd2b, fd3a,
fd3b, fd4}, where:
fd3a: {Product, Store, Date, Quantity} → Vendor; fd3b: {Product, Store, Date, Quantity} → Discount;
Now that G is in the standard form, the next step in the algorithm to deduce the
minimal cover is to identify and remove redundant attributes from the left-hand side
(determinant) of the FDs constituting G. Accordingly, based on fd1a and fd1b, Size and
Color in fd4 are redundant. Thus, fd4 reduces to fd4a: {Product, Store, Date} → Vendor.
Based on fd4a, Quantity in fd3a becomes a redundant attribute, and elimination of this
attribute from fd3a renders fd4a and fd3a identical. Thus, one of them (fd4a) entails the
other (fd3a), which then becomes a redundant FD and so can be removed with no consequence.
Next, fd3b, implied by {fd2b, fd4a}, becomes a redundant FD and so can be
deleted. Thus, we have Gc: {fd1a, fd1b, fd2a, fd2b, fd4a}, where:
A close examination of F and Gc reveals that Gc is a cover for F since F ≡ Gc, meaning
F+ = Gc+. Since Gc does not contain any redundant attributes or redundant FDs, Gc is
then, by definition, a minimal cover for F. How do we know that Gc does not contain any
redundant attributes or redundant FDs? This is tested by searching for an attribute or an FD in
Gc whose removal from Gc does not disturb the equivalence of Gc to F; if no such attribute
or FD can be found, Gc is minimal. The reader is encouraged to try this as an exercise.
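The three stages used in this example (standard form, removal of redundant determinant attributes, removal of redundant FDs) can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function names are invented here, FDs are represented as (determinant, dependent) pairs of sets, and when several minimal covers exist the one returned depends on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Stage 1: standard form, one FD per dependent attribute.
    g = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset([attr]))
            if fd not in g:
                g.append(fd)
    # Stage 2: drop redundant attributes from each determinant.
    for i, (lhs, rhs) in enumerate(g):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    # Reducing determinants can make two FDs identical; keep one copy.
    g = list(dict.fromkeys(g))
    # Stage 3: drop FDs that follow from the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g

# F from Example 4.
F = [
    ({"Product"}, {"Size", "Color"}),
    ({"Vendor", "Quantity"}, {"Unit_price", "Discount"}),
    ({"Product", "Store", "Date", "Quantity"}, {"Vendor", "Discount"}),
    ({"Product", "Size", "Color", "Store", "Date"}, {"Vendor"}),
]

for lhs, rhs in minimal_cover(F):
    print(sorted(lhs), "->", sorted(rhs))
```

For this F the sketch arrives at the same five FDs as the text: fd1a, fd1b, fd2a, fd2b, and fd4a.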
Since {Product, Store, Date} → Vendor (fd4a), could we have retained fd3b in G instead
of fd2b? In other words, is Gx: {fd1a, fd1b, fd2a, fd3b, fd4a}, where:
a minimal cover for F? The answer is no. In fact, Gx is not even a cover for F, let alone
a minimal cover, because F is not equivalent to Gx, meaning F+ ≠ Gx+. Note that {Vendor,
Quantity} → Discount does not exist in Gx+.
Once again, the practical value of a minimal cover, Gc, of a set of FDs, F, is that the
effort required to check for violations in the database is minimized because a database
that satisfies Gc will also satisfy F and vice versa.
What is the closure [A | F]? What is being sought here is the set of attributes in R that
are functionally dependent on A under the set of FDs, F. A quick perusal of F (that is, fd1,
fd2, and fd3) indicates that A+ = {A, B, G, H}.
Here is an algorithm to compute the closure of an attribute set, Z, under a set of FDs,
F, in a relation schema R:
1. Set Closure [Z | F] to Z.
2. For each FD of the form X → Y in F,
if X ⊆ Closure [Z | F], set Closure [Z | F] to (Closure [Z | F] ∪ Y).9
3. Iterate step 2 through F until no further change in the Closure [Z | F].
Suppose we want to compute {A,C}+, the Closure [{A,C} | F] in R using this algorithm.
Start: {A,C}+ = {A,C}
First iteration through F:
• In fd1, the determinant B is not a subset of {A,C}+; so, no change in {A,C}+.
• In fd2, the determinant A is a subset of {A,C}+; so, {A,C}+ = {A,C}+ ∪ B = {A,C,B}.
• In fd3, the determinant C is a subset of {A,C}+; so, {A,C}+ = {A,C}+ ∪ D = {A,C,B,D}.
Second iteration through F:
• In fd1, the determinant B is a subset of {A,C}+; so, {A,C}+ = {A,C}+ ∪ {G,H} = {A,C,B,D,G,H}.
• In fd2, the determinant A is a subset of {A,C}+; so, {A,C}+ = {A,C}+ ∪ B = no change.
• In fd3, the determinant C is a subset of {A,C}+; so, {A,C}+ = {A,C}+ ∪ D = no change.
Third iteration through F, as can be seen, is not necessary; so the algorithm terminates.
End: {A,C}+ = {A,C,B,D,G,H}.
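The three-step algorithm translates almost line for line into Python. In the sketch below the FDs are read off the walkthrough above (fd1: B determines {G, H}; fd2: A determines B; fd3: C determines D), which is an assumption since the original listing of F is not reproduced here; the function name `attribute_closure` is likewise invented.

```python
def attribute_closure(z, fds):
    """Closure [Z | F]: all attributes functionally dependent on z under fds."""
    closure = set(z)                      # step 1: start with Z itself
    changed = True
    while changed:                        # step 3: repeat until nothing changes
        changed = False
        for lhs, rhs in fds:              # step 2: scan every FD X -> Y
            if lhs <= closure and not rhs <= closure:
                closure |= rhs            # X is inside the closure, so add Y
                changed = True
    return closure

# FDs as inferred from the walkthrough (assumed, see lead-in).
F = [
    ({"B"}, {"G", "H"}),   # fd1
    ({"A"}, {"B"}),        # fd2
    ({"C"}, {"D"}),        # fd3
]

print(sorted(attribute_closure({"A", "C"}, F)))  # ['A', 'B', 'C', 'D', 'G', 'H']
print(sorted(attribute_closure({"A"}, F)))       # ['A', 'B', 'G', 'H']
```

Unlike the manual walkthrough, the loop makes one extra pass over F to confirm that nothing changes before it stops.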
Two useful corollaries are worthy of attention:
• Given F, it is possible to know if a specific FD X → Y follows from F by computing
the attribute closure X+. If and only if Y is a subset of X+ can we infer
that X → Y follows from F. Note that we are able to determine whether the
FD X → Y follows from F without actually having to compute F+.
• Given F, it is possible to know if a certain subset K of the attributes of R is a
superkey of R by computing K+, the closure [K | F]. K is a superkey of R if and
only if the closure [K | F] is precisely the set of all attributes of R. If K happens
to be an irreducible superkey of R under F, then K is a candidate key of R.
Note that {A,C} is not a superkey of R in the earlier example because {A,C}+ is not
precisely the set of all attributes of R; that is, {A,C}+ = {A,C,B,D,G,H} does not contain
the attribute E of R.
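Both corollaries reduce to a closure computation, as the following sketch suggests. It assumes that R consists of exactly the attributes A, B, C, D, E, G, H and that F holds the three FDs inferred from the earlier walkthrough; both are assumptions, and the function names are invented for this illustration.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def follows_from(x, y, fds):
    """First corollary: X -> Y follows from F iff Y is a subset of X+."""
    return set(y) <= closure(x, fds)

def is_superkey(k, all_attrs, fds):
    """Second corollary: K is a superkey iff K+ is the whole attribute set."""
    return closure(k, fds) == set(all_attrs)

def is_candidate_key(k, all_attrs, fds):
    """A candidate key is a superkey from which no attribute can be removed."""
    k = set(k)
    return is_superkey(k, all_attrs, fds) and not any(
        is_superkey(k - {attr}, all_attrs, fds) for attr in k
    )

R = {"A", "B", "C", "D", "E", "G", "H"}                      # assumed schema
F = [({"B"}, {"G", "H"}), ({"A"}, {"B"}), ({"C"}, {"D"})]    # assumed FDs

print(follows_from({"A"}, {"G", "H"}, F))        # True
print(is_superkey({"A", "C"}, R, F))             # False: E is missing
print(is_candidate_key({"A", "C", "E"}, R, F))   # True
print(is_candidate_key({"A", "B", "C", "E"}, R, F))  # False: B is removable
```

Checking single-attribute removals is enough for irreducibility, because any superset of a superkey is itself a superkey.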
9 This is based on the application of the transitivity rule of Armstrong’s axioms.
somewhat more heuristic in nature, while the decomposition approach is more algorithmic.
Each of these approaches is illustrated in the following sections.
Derivation of First Candidate Key of URS
Step 1 Derive the minimal cover Fc for the set of functional dependencies (FDs), F, that prevails
over the URS.
Step 2 Select the FD with the maximum number of attributes constituting the determinant as the
starting point. Let us call this FD the target FD, and the determinant of this first FD
the target determinant (TD1).
Note: If more than one such target FD exists, select one of them for now.
Step 3 If TD1+ is precisely the set of all attributes of URS, then TD1 is a candidate key of URS.
If so, skip to step 7.
Step 4 If TD1+ is not precisely the set of all attributes of URS, select a functional dependency
whose determinant is not a subset of TD1+ as the next target FD; the determinant of this
FD will be TD2.
Step 5 If TD2+ is precisely the set of all attributes of URS, then TD2 is a candidate key of URS.
Otherwise, if {TD1+ ∪ TD2+} is precisely the set of all attributes of URS, then {TD1 ∪ TD2}
is a candidate key of URS.
If either one is true, skip to step 7.
Step 6 Otherwise, repeat steps 4 and 5 using the next target FD.
Repeat steps 4 and 5 until an attribute set K is derived such that K+ is precisely the set of
all attributes of URS. Then, K is a candidate key of URS.
Step 7 If Fc contains an FD, fdx, where a candidate key of URS is a dependent, then the
determinant of fdx is also a candidate key of URS.
Step 8 When a candidate key of URS is a composite attribute, for each key attribute (atomic or
composite), evaluate if the key attribute is a dependent in an FD, fdy, in Fc.
If so, then the determinant of fdy, by the rule of pseudotransitivity, can replace the key
attribute under consideration, thus yielding additional candidate key(s) of URS.
Step 9 Repetition of steps 7 and 8 for every candidate key of URS will systematically reveal all the
other candidate key(s), if any, of URS.
TABLE 7.2 A heuristic for the derivation of candidate key(s) by synthesis, given a URS and a set of
functional dependencies, F, that prevails over it
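A result obtained with this heuristic can be cross-checked on small schemas by exhaustive search. The sketch below is not the synthesis heuristic itself; it simply tests every attribute subset, smallest first, and is only practical for a handful of attributes. It uses the attribute set and FDs of Example 3, and the function name `candidate_keys` is invented for this illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Enumerate every irreducible superkey by brute force."""
    attrs = sorted(all_attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue                  # contains a smaller key: reducible
            if closure(k, fds) == set(attrs):
                keys.append(k)
    return keys

# Attribute set and FDs of Example 3.
URS = {"Student", "Advisor", "Subject", "Grade"}
F = [
    ({"Student", "Advisor"}, {"Grade", "Subject"}),
    ({"Advisor"}, {"Subject"}),
    ({"Student", "Subject"}, {"Grade", "Advisor"}),
]

for key in candidate_keys(URS, F):
    print(sorted(key))
# ['Advisor', 'Student']
# ['Student', 'Subject']
```

The two keys are related exactly as step 8 describes: Subject is a dependent of Advisor, so Advisor can replace Subject in {Student, Subject}.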
Target FD fd1: {Store, Branch} → Location; Target determinant (TD1): {Store, Branch}
Step 5. If TD2+ is precisely the set of all attributes of URS, then TD2 is a candidate
key of URS. Otherwise, if {TD1+ ∪ TD2+} is precisely the set of all attributes of URS, then
{TD1 ∪ TD2} is a candidate key of URS.
If either one is true, skip to step 7. Since neither is true here, continue with step 5.
Customer+ is not precisely the set of all attributes of URS1; therefore, Customer is not
a candidate key of URS1.
{Store, Branch}+ ∪ Customer+ = {Store, Branch, Location, Sq_ft, Manager, Type,
Customer, Address}.
{Store, Branch}+ ∪ Customer+ is not precisely the set of all attributes of URS1;
therefore, {Store, Branch, Customer} is not a candidate key of URS1 either.
Step 6. Since at this point no candidate key has emerged, repeat steps 4 and 5 using
the next target FD. Repeat steps 4 and 5 until an attribute set K is derived such that K+ is
precisely the set of all attributes of URS.
Then, and only then, K is a candidate key of URS.
Step 8. When a candidate key of URS is a composite attribute, evaluate, for each key
attribute (atomic or composite), whether the key attribute is a dependent in an FD, fdy, in F.
If so, then the determinant of fdy, by the pseudotransitivity rule, can replace the key
attribute under consideration, thus yielding additional candidate key(s) of URS.
Since in Manager → {Store, Branch} (see fd7), the dependent, {Store, Branch}, is a subset
of the candidate key {Store, Branch, Customer, Vendor}, using the pseudotransitivity
rule, {Manager, Customer, Vendor} is extracted as another candidate key of URS1.
Since there is no other key attribute, atomic or composite, of the candidate key, {Store,
Branch, Customer, Vendor}, that is a dependent in any other FD in F, continue to step 9.
Step 9. Repetition of steps 7 and 8 for every candidate key of URS will systematically
reveal all the other candidate key(s), if any, of URS.
The only other candidate key of URS1 so far is {Manager, Customer, Vendor}. In this
case, application of steps 7 and 8 does not yield any more candidate keys.
In summary, URS1, where the set of FDs denoted by F prevails, has two candidate
keys. They are:
{Store, Branch, Customer, Vendor} and {Manager, Customer, Vendor}
Step 2 Remove an attribute Ai (i = 1, 2, 3, …, n) from URS such that {K – Ai} is still a superkey,
K′, of URS.
Note: In order for K′ to be a superkey of URS, the FD (K′ → Ai) should persist in F+.
TABLE 7.3 An algorithm for the derivation of candidate key(s) by decomposition of the superkey
given a universal relation schema: URS = {A1, A2, A3, …, An} and the set of FDs
over URS: F = {fd1, fd2, fd3, …, fdm}
Step 4 If F contains an FD, fdx, where a candidate key of URS is a dependent, then the
determinant of fdx is also a candidate key of URS.
Step 5 When a candidate key of URS is a composite attribute, for each key attribute (atomic or
composite):
• Evaluate if the key attribute is a dependent in an FD, fdy, in F.
• If so, then the determinant of fdy, by the rule of pseudotransitivity, can replace the key
attribute under consideration, thus yielding additional candidate key(s) of URS.
Step 6 Repetition of steps 4 and 5 above for every candidate key of URS will systematically reveal
all the other candidate key(s), if any, of URS.
TABLE 7.3 An algorithm for the derivation of candidate key(s) by decomposition of the superkey
given a universal relation schema: URS = {A1, A2, A3, …, An} and the set of FDs
over URS: F = {fd1, fd2, fd3, …, fdm} (continued)
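The core of the decomposition approach (start from the full attribute set as the superkey, then shed attributes while the remainder is still a superkey) can be sketched as follows. The sketch covers only the derivation of a first candidate key, assumes the loop is what the steps not reproduced here amount to, and borrows the attribute set and FDs of Example 3; the function name and the `removal_order` parameter are invented for this illustration.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_key_by_decomposition(all_attrs, fds, removal_order=None):
    """Shrink the trivial superkey (all attributes) to a candidate key."""
    everything = set(all_attrs)
    key = set(everything)                         # K starts as the whole URS
    for attr in (removal_order or sorted(everything)):
        reduced = key - {attr}
        if closure(reduced, fds) == everything:   # still a superkey without attr
            key = reduced
    return key

# Attribute set and FDs of Example 3.
URS = {"Student", "Advisor", "Subject", "Grade"}
F = [
    ({"Student", "Advisor"}, {"Grade", "Subject"}),
    ({"Advisor"}, {"Subject"}),
    ({"Student", "Subject"}, {"Grade", "Advisor"}),
]

print(sorted(candidate_key_by_decomposition(URS, F)))
# ['Student', 'Subject']
print(sorted(candidate_key_by_decomposition(
    URS, F, removal_order=["Subject", "Student", "Grade", "Advisor"])))
# ['Advisor', 'Student']
```

Which candidate key emerges first depends on the order in which attributes are tried, which is why the later steps of the table are needed to reveal the remaining keys.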
fd4: {Product, Company} → Price; fd5: Sales → Production; fd6: {Company, Product} → Sales;
fd7: {Product, Company} → Supplier; fd8: Supplier → Product; fd9: President → Company
Here are the steps to determine the candidate key(s) of URS2 by decomposing the
superkey:
Step 1. Set superkey, K, of URS = {A1, A2, A3, …, An}
K = {Company, Location, Size, President, Product, Price, Sales, Production, Supplier}
fd4: Proj_nm → Budget; fd5: Fund# → Proj_nm; fd6: {Proj_nm, Emp#} → Hours;
fd7: Proj_nm → Proj#; fd8: Emp# → Job_type; fd9: {Proj#, Emp#} → Fund#
FIGURE 7.7 Dependency diagram for the relation schema, URS3
For this example, solutions using both the decomposition technique and the
synthesis technique are demonstrated.
Target FD – fd6: {Proj_nm, Emp#} → Hours; Target determinant (TD1): {Proj_nm, Emp#}
Step 3. If TD1+ is precisely the set of all attributes of URS, then TD1 is a candidate
key of URS. If so, skip to step 7.
{Proj_nm, Emp#}+ is not precisely the set of all attributes of URS3; therefore, {Proj_nm,
Emp#} is not a candidate key of URS3.
Step 4. If TD1+ is not precisely the set of all attributes of URS, select an FD whose
determinant is not a subset of TD1+ as the next target FD; the determinant of this FD will
be TD2.
Compute the attribute closure [TD2 | Fc].
There are no FDs in Fc whose determinant is not a subset of {Proj_nm, Emp#}+.
Observe that the attribute Division does not participate in any FD in F while it is
present in URS3. This indicates the independent state of this attribute in URS3. One way to
formalize the presence of the attribute Division in URS3 is to portray it through an implicit
trivial FD in F+, viz., Division → Division. Therefore, let fd10 be Division → Division.
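The effect of an attribute that appears in no FD can be checked with a small attribute-closure computation. The sketch below is illustrative only: it uses just fd4 through fd9 as listed above (fd1 through fd3 are not shown in this excerpt), and the attribute set URS3_PARTIAL is an assumption limited to the attributes those FDs mention plus Division.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (determinant, dependent) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# ASSUMPTION: only the attributes visible in fd4-fd9, plus Division.
URS3_PARTIAL = {"Proj_nm", "Budget", "Fund#", "Emp#", "Hours",
                "Proj#", "Job_type", "Division"}

F_PARTIAL = [
    ({"Proj_nm"}, {"Budget"}),          # fd4
    ({"Fund#"}, {"Proj_nm"}),           # fd5
    ({"Proj_nm", "Emp#"}, {"Hours"}),   # fd6
    ({"Proj_nm"}, {"Proj#"}),           # fd7
    ({"Emp#"}, {"Job_type"}),           # fd8
    ({"Proj#", "Emp#"}, {"Fund#"}),     # fd9
]

td1 = {"Proj_nm", "Emp#"}
td1_plus = closure(td1, F_PARTIAL)
missing = URS3_PARTIAL - td1_plus                       # attributes TD1 cannot reach
with_division = closure(td1 | {"Division"}, F_PARTIAL)  # TD1 extended by Division
print(missing, with_division == URS3_PARTIAL)
```

Because no FD has Division on its right-hand side, no closure can ever produce it; in this partial setting Division has to be added to the determinant before the closure covers every attribute.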
10
For a review of the properties of superkey and candidate key, see Section 6.3.1 in Chapter 6.
attribute. Any attribute, atomic or composite, in a relation schema R that fails the test for
a key attribute (being a proper subset of a candidate key) is a non-key attribute, with the
exception of the candidate key(s) of R—which are, in fact, neither key attributes nor non-
key attributes in R. This is because a candidate key is a subset of itself and thus fails the
test for non-key attribute. It is, however, not a proper subset of itself and thus fails the test
for a key attribute.
Based on this discussion, we have an alternative definition for a candidate key from
this point forward:
A candidate key of a relation schema, R, fully functionally determines all
attributes of R.
While the choice of primary key from among the candidate keys is essentially
arbitrary, some rules of thumb are often helpful:11
• A candidate key with the least number of attributes may be a good choice.
• A candidate key whose attributes are numeric and/or of small sizes may be
easy to work with from a developer’s perspective.
• A candidate key that is a determinant in a functional dependency in F rather
than F+ may be a good choice because it is probably semantically obvious
from the user’s perspective.
• Surrogate keys (especially DBMS-developed sequence numbers) should be
used only as a last resort because they don’t offer semantic reference points
to the user community.
Primary keys were discussed in Chapter 6 (see Section 6.3.1). As a refresher, a
primary key is an irreducible unique identifier like any other candidate key of a relation
schema, but a primary key is in addition bound by the entity integrity constraint—that is,
none of the attributes constituting a primary key is allowed to have “null” values. Suppose
{Company, Product} is the primary key of URS2 in our example. Then, neither Company
nor Product can have null values in any tuple (in any state) of the relation, URS2. Similar
to a key attribute, any attribute, atomic or composite, in a relation schema, R, that is a
proper subset of the primary key of R is called a prime attribute. An attribute of R that is
not a member of the primary key is a non-prime attribute except when it is a candidate
key of R. Any candidate key of R not chosen as the primary key is referred to as an
alternate key of R and, like the primary key, is neither a prime nor a non-prime attribute
of R. Here are some examples that provide additional clarification.
Consider URS2, the example from Section 7.3.2.1:
URS2 (Company, Location, Size, President, Product, Price, Sales, Production, Supplier)
where, under the FDs specified in F with reference to URS2, the candidate keys are:
{Company, Product}; {Company, Supplier}; {President, Product}; and {President, Supplier} and
the chosen primary key is {Company, Product}.
Table 7.4 shows the key and non-key attributes and the prime and non-prime attri-
butes for URS2.
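The definitions above can be applied mechanically to the atomic attributes of URS2. The following Python sketch is one possible reading of those definitions, restricted to atomic attributes; the function name classify is a hypothetical helper, and the candidate keys and primary key are taken from the text.

```python
URS2 = ["Company", "Location", "Size", "President", "Product",
        "Price", "Sales", "Production", "Supplier"]

CANDIDATE_KEYS = [
    frozenset({"Company", "Product"}),
    frozenset({"Company", "Supplier"}),
    frozenset({"President", "Product"}),
    frozenset({"President", "Supplier"}),
]
PRIMARY_KEY = frozenset({"Company", "Product"})


def classify(attr):
    """Return (key status, prime status) for one atomic attribute."""
    single = frozenset({attr})
    if single in CANDIDATE_KEYS:
        # A candidate key is neither key/non-key nor prime/non-prime.
        return ("candidate key", "candidate key")
    # Key attribute: proper subset of some candidate key.
    key_status = "key" if any(single < ck for ck in CANDIDATE_KEYS) else "non-key"
    # Prime attribute: proper subset of the primary key.
    prime_status = "prime" if single < PRIMARY_KEY else "non-prime"
    return (key_status, prime_status)


for attribute in URS2:
    print(attribute, classify(attribute))
```

Note that Supplier and President come out as key attributes that are nevertheless non-prime, because they belong to candidate keys other than the chosen primary key.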
11
No matter which candidate key has been chosen to be the primary key of a relation schema, the
normalization process may spontaneously force some changes. We will have an opportunity to
observe this phenomenon in Chapter 9.
Note: Any composite attribute that includes one or more non-key attribute(s) is a non-key attribute.
Chapter Summary
Normalization is a technique that facilitates systematic validation of the participation of attributes
in a relation schema from the perspective of data redundancy. One of the main concepts
associated with normalization is functional dependency. An FD in a relation schema, R, is a
constraint of the form A → B, where an attribute, B (atomic or composite), is dependent on attribute
A (atomic or composite) if each value, a, of A is associated with exactly one value, b, of B.
Examination of functional dependencies in a relation schema is important because certain
functional dependencies (i.e., those that are undesirable) can lead to insertion, deletion, and
update anomalies via data redundancies in the associated relation instances, collectively known
as modification anomalies. Since functional dependencies, desirable or undesirable, arise from
the business rules embedded in the user requirements specification, they cannot be conve-
niently discarded if undesirable. Therefore, the data redundancies and modification anomalies
are removed by decomposing the relation schema such that the undesirable functional depen-
dencies are rendered desirable. The example in Section 7.1 illustrates this.
The functional dependencies in a relation schema that are semantically obvious from the
business rules are often explicitly specified and are collectively referred to as F. All possible FDs
that can be inferred from the set F plus the set F itself constitute the closure of F. The closure of
F is denoted as F+. Armstrong’s axioms are a set of seven inference rules pertaining to
functional dependencies that are used to derive F+. Table 7.1 in Section 7.2.2 summarizes the
inference rules for functional dependencies.
It is possible to progressively synthesize a candidate key from a set of functional depen-
dencies through the systematic application of the principle of closure of an attribute set. A sec-
ond method for identifying the candidate key(s) of a relation schema uses a top-down approach
of decomposition. In this method, given a set of functional dependencies that prevail over a
Universal Relation Schema (URS), the superkey, K, consisting of all attributes of URS, is
progressively decomposed by arbitrarily removing one attribute at a time from K until K′, a superkey
that is not further reducible (i.e., no proper subset of K′ has the uniqueness property), results,
yielding a candidate key. The other candidate keys of URS are derived using F and the initial
candidate key by the application of the pseudotransitivity rule of Armstrong’s axioms. Three
examples are used to illustrate the use of Armstrong’s axioms to derive candidate keys of a
relation schema using the method of synthesis and the method of decomposition. Finally, defini-
tions of prime/non-prime and key/non-key attributes are presented, along with a handful of rules
of thumb to choose a primary key from among the candidate keys.
Exercises
1. What is the purpose of the normalization technique in the data modeling process?
2. Explain why data redundancy exists for the attributes Discount and Location in the STOCK
table in Figure 7.1c.
3. Explain functional dependency between two attributes.
4. Why can functional dependency not be inferred from a particular relation state?
5. Identify the set of functional dependencies in the relation instance CAR shown next. Does
this constitute the minimal cover for the set of functional dependencies present in CAR? If it
is not a minimal cover, derive a minimal cover.
CAR
Camry 4 Japan 15 30
Mustang 6 USA 0 45
Fiat 4 Italy 18 30
Accord 4 Japan 15 30
Century 8 USA 0 60
Mustang 4 Canada 0 30
Civic 4 Japan 15 30
Mustang 4 Mexico 15 30
Mustang 6 Mexico 15 45
Civic 4 Korea 15 30
13. Consider the universal relation schema INVENTORY (Store#, Item, Vendor, Date, Cost,
Units, Manager, Price, Sale, Size, Color, Location) and the constraint set F {fd1, fd2, fd3,
fd4, fd5, fd6, fd7}, where:
a. Construct the universal relation schema that includes (i.e., preserves) the set of FDs in F.
b. Do the FDs shown constitute a minimal cover of F? If not, derive a minimal cover.
c. Derive the candidate key(s) of F.
d. Select the primary key and justify your choice.
e. Considering your primary key and candidate key(s), distinguish between (1) key versus
non-key attributes and (2) prime versus non-prime attributes.
15. Given the set of functional dependencies F {fd1, fd2, fd3, fd4, fd5, fd6, fd7, fd8, fd9, fd10,
fd11}, where:
a. Construct the Universal Relation Schema that includes (i.e., preserves) the set of FDs in F.
b. Do the FDs shown constitute a minimal cover of F? If not, derive a minimal cover.
c. Derive the candidate key(s) of F.
d. Select the primary key and justify your choice.
e. Considering your primary key and candidate key(s), distinguish between (1) key versus
non-key attributes and (2) prime versus non-prime attributes.
16. Given the Universal Relation Schema URS (A, B, C, D, F, G) and the set of FDs prevailing
over URS F {fd1, fd2, fd3, fd4, fd5}, where:
fd1: A → G; fd2: {A, B} → {C, D}; fd3: {B, C} → {F, G};
CHAPTER 8
NORMAL FORMS BASED ON
FUNCTIONAL DEPENDENCIES
In Chapter 7, we saw how certain functional dependencies (FDs) can create data
redundancy problems in a relation schema. This chapter shows how the process of
normalization can be used to resolve these problems. Recall that normalization is a
technique that facilitates systematic validation of the participation of attributes in a
relation schema from a perspective of data redundancy.
This chapter flows as follows. Section 8.1 introduces normalization as a technique to
facilitate systematic validation of the goodness of design of a relation schema. The first,
second, and third normal forms (1NF, 2NF, and 3NF) are explained with appropriate
examples in Subsections 8.1.1, 8.1.2, and 8.1.3, respectively. Boyce-Codd Normal Form
(BCNF) is presented as a stronger version of the 3NF in Section 8.1.4. The lossless-join
property and dependency preservation are then presented, in Section 8.1.5, as two critical
side effects that require attention in the normalization process. The motivating exemplar
originally introduced in Section 7.1 of the previous chapter is revisited in Section 8.2 to
provide an explanation for and illustration of the logic behind the normalization process.
Section 8.3 gives a comprehensive example of normalizing a universal relation schema
subject to a defined set of FDs. Section 8.4 briefly discusses denormalization, and Section
8.5 presents the use of reverse engineering in data modeling.
8.1 NORMALIZATION
Data redundancy and the consequent modification (insertion, deletion, and update)
anomalies can be traced to “undesirable” functional dependencies in a relation schema.
What is an undesirable functional dependency? Any FD in a relation schema, R, where
the determinant is a candidate key of R is a desirable FD because it will not cause data
redundancy and the consequent modification anomalies. Where the determinant of an FD
in R is not a candidate key of R, the FD will cause data redundancy and the consequent
modification anomalies and so is an undesirable FD.
So, what can we do with the undesirable FDs? All FDs, desirable and undesirable,
originate from the set of user-specified business rules and so must be incorporated in the
database system. Therefore, FDs cannot be selectively ignored or discarded because they
are undesirable. The only solution is to somehow render the undesirable FDs desirable,
and the process of doing this is called normalization.
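The test for a desirable FD (is the determinant a candidate key?) can be sketched in Python. The example borrows the FLIGHT schema and its FDs from Section 8.1.3 of this chapter; the helper names closure and is_candidate_key are hypothetical and serve only as an illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (determinant, dependent) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(attrs, schema, fds):
    """True if attrs determines every attribute of schema and is irreducible."""
    attrs = set(attrs)
    if closure(attrs, fds) != schema:
        return False
    # Irreducible: removing any single attribute destroys the superkey property.
    return all(closure(attrs - {a}, fds) != schema for a in attrs)


FLIGHT = {"Flight#", "Origin", "Destination", "Mileage"}
F = {
    "fd1": ({"Flight#"}, {"Origin"}),
    "fd2": ({"Flight#"}, {"Destination"}),
    "fd3": ({"Origin", "Destination"}, {"Mileage"}),
}

fds = list(F.values())
desirable = {name: is_candidate_key(lhs, FLIGHT, fds) for name, (lhs, _) in F.items()}
print(desirable)
```

In this sketch fd1 and fd2 are desirable because Flight# is a candidate key, while fd3 is undesirable because {Origin, Destination} does not determine Flight#.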
Normal forms (NFs) provide a stepwise progression toward the goal of a fully normal-
ized relation schema that is guaranteed to be free of data redundancies that cause modifi-
cation anomalies from a functional dependency perspective.1 A relation schema is said to
be in a particular normal form if it satisfies certain prescribed criteria; otherwise, the
relation schema is said to violate that normal form. First normal form (1NF) reflects one
of the properties of a relation schema—that is, by definition, a relation schema is in 1NF.
The normal forms associated with functional dependencies are second normal form (2NF),
third normal form (3NF), and Boyce-Codd Normal Form (BCNF).2
The violations of each of these normal forms signal the presence of a specific type of
“undesirable” FD. When a relation schema violates a certain normal form, it can be
interpreted as equivalent to an inadvertent mixing of entity types belonging to two different
entity classes in a single entity type. Therefore, by appropriately decomposing the relation
schema, the undesirable FD causing the violation of a specific normal form can be
rendered desirable in the resulting set of relation schemas (that is, the relational schema).
It is important to note that the normalization process is anchored to the candidate
key of a relation schema, R. The assessment of normal form can be based on the primary
key or any candidate key of R. This is not an issue when R has only one candidate key.
Even when R has multiple candidate keys, normalization based on any and every candi-
date key (including the primary key) will yield the same set of normalized relation sche-
mas. Therefore, we will use the primary key as the basis for evaluating and normalizing a
relation schema. This does not by any means contradict the assertion that an FD in R is
undesirable only when the determinant of that FD is not a candidate key of R.
The following sections delineate each of the normal forms using meaningful examples.
Later in the chapter, we will address the situation of generating a fully normalized rela-
tional schema from a given set of FDs.
1
A fully normalized relation schema from the perspective of functional dependencies need not be
completely free of data redundancies and the consequent modification anomalies if multi-valued
dependencies are present in the relation schema. This will be addressed in Chapter 9.
2
E. F. Codd first proposed the 1NF, 2NF, and 3NF in 1972. Later, it was discovered that under cer-
tain conditions (i.e., FDs) a relation schema in 3NF continues to have data redundancies, causing
modification anomalies. A revised, stronger definition of the 3NF was then proposed by Boyce and
Codd in 1974, which came to be known as Boyce-Codd Normal Form (BCNF).
3
This constraint is relaxed in object-relational database systems, which allow non-1NF relations.
DEFINITION
1NF definition: A schema R is in 1NF only when the attributes comprising the schema are atomic and
single-valued. Unless a schema is in 1NF, it is not a “relation schema.” That is, a relation schema is, by
definition, in 1NF.
Consider the schema ALBUM and the corresponding instance of ALBUM shown in
Figure 8.1a. Here, for a given Album_no, there is a single specific value of Price (i.e.,
Album_no → Price) and a single specific value of Stock (i.e., Album_no → Stock). On the
other hand, either there are multiple values for Artist_nm associated with an Album_no or
the domain of Artist_nm does not have atomic values. In either case, 1NF is violated. In
fact, by definition, ALBUM is not even a relation. The solution to render ALBUM in 1NF
is to simply expand the relation so that there is a tuple for each (atomic) Artist_nm for a
given Album_no. This is shown in NEW_ALBUM (Figure 8.1b), which is in 1NF, with
{Album_no, Artist_nm} as its primary key.
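The expansion into one tuple per atomic Artist_nm can be sketched in Python as follows. The list-valued representation of ALBUM and the helper name to_1nf are assumptions made for illustration; the two albums and their values are taken from the NEW_ALBUM instance.

```python
# ASSUMED non-1NF representation: Artist_nm holds a list of values.
ALBUM = [
    {"Album_no": "BS123", "Artist_nm": ["Britney Spears"],
     "Price": 17.95, "Stock": 1000},
    {"Album_no": "DJM237",
     "Artist_nm": ["John Denver", "Michael Jackson", "Madonna"],
     "Price": 11.95, "Stock": 2000},
]


def to_1nf(rows, multi_valued):
    """Emit one tuple per atomic value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multi_valued] = value  # replace the list by one atomic value
            flat.append(new_row)
    return flat


NEW_ALBUM = to_1nf(ALBUM, "Artist_nm")
keys = {(r["Album_no"], r["Artist_nm"]) for r in NEW_ALBUM}
for r in NEW_ALBUM:
    print(r)
```

Each resulting tuple is identified by the pair (Album_no, Artist_nm), while Price and Stock are repeated once per artist of the same album.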
[Figure 8.1a: ERD for ALBUM (attributes Album_no, Artist_nm, Price, Stock) and the ALBUM instance with columns Album_no, Artist_nm, Price, Stock. ALBUM violates 1NF.]
[Figure 8.1b: ERD for NEW_ALBUM (no multi-valued attribute); labels shown: Alb_art, Album_no, Artist_nm, Price, Stock]

NEW_ALBUM
Album_no  Artist_nm          Price  Stock
BS123     Britney Spears     17.95  1000
JT111     Justin Timberlake  17.95  1200
BTL007    John Lennon        23.95
BTL007    Paul McCartney     23.95
BTL007    George Harrison    23.95
BTL007    Ringo Star         23.95
MJ100     Michael Jackson    17.95
JM456     John Mayer         16.95  1000
JM151     John Mayer         16.95  1000
MX789     Madonna            11.95  500
DJM237    John Denver        11.95  2000
DJM237    Michael Jackson    11.95  2000
DJM237    Madonna            11.95  2000
DR711     Diana Ross         12.95  1000
PM137     Paul McCartney     19.95
DEFINITION
2NF definition: A relation schema R is in 2NF if every non-prime attribute in R is fully functionally
dependent on the primary key of R—that is, a non-prime attribute is not functionally dependent on a
proper subset of the primary key of R.
R: NEW_ALBUM (Album_no, Artist_nm, Price, Stock)
NEW_ALBUM
Album_no  Artist_nm          Price  Stock
BS123     Britney Spears     17.95  1000
JT111     Justin Timberlake  17.95  1200
BTL007    John Lennon        23.95
BTL007    Paul McCartney     23.95
BTL007    George Harrison    23.95
BTL007    Ringo Star         23.95
MJ100     Michael Jackson    17.95
JM456     John Mayer         16.95  1000
JM151     John Mayer         16.95  1000
MX789     Madonna            11.95  500
DJM237    John Denver        11.95  2000
DJM237    Michael Jackson    11.95  2000
DJM237    Madonna            11.95  2000
DR711     Diana Ross         12.95  1000
PM137     Paul McCartney     19.95

F+: fd12: Album_no → {Price, Stock}; fd12x: {Album_no, Artist_nm} → {Price, Stock};
What is the primary key of NEW_ALBUM? Using Armstrong’s axioms, we can infer
that fd12: Album_no → {Price, Stock} and fd12x: {Album_no, Artist_nm} → {Price, Stock}
exist in F+. Therefore, {Album_no, Artist_nm} is a candidate key of NEW_ALBUM; being the
only candidate key, it becomes the primary key of NEW_ALBUM. Are there any
“undesirable” FDs in NEW_ALBUM? The answer is “Yes.” Given that {Album_no, Artist_nm} is the
primary key of NEW_ALBUM, fd1: Album_no → Price and fd2: Album_no → Stock (or fd12)
reflect a partial dependency of non-prime attributes on the primary key of NEW_ALBUM.
Therefore, 2NF is violated in NEW_ALBUM.
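The 2NF test applied here can be sketched in Python: for each proper subset of the primary key, compute its closure and see whether it reaches any non-prime attribute. The helper names closure and partial_dependencies are hypothetical; treating every attribute outside the primary key as non-prime is valid here only because NEW_ALBUM has a single candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (determinant, dependent) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """Map each non-prime attribute to a proper subset of the key that determines it."""
    non_prime = set(schema) - set(primary_key)  # assumes one candidate key only
    found = {}
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for attr in closure(subset, fds) & non_prime:
                found.setdefault(attr, set(subset))
    return found


NEW_ALBUM = {"Album_no", "Artist_nm", "Price", "Stock"}
PRIMARY_KEY = {"Album_no", "Artist_nm"}
F = [
    ({"Album_no"}, {"Price"}),   # fd1
    ({"Album_no"}, {"Stock"}),   # fd2
]

violations = partial_dependencies(NEW_ALBUM, PRIMARY_KEY, F)
print(violations)
```

A non-empty result signals a 2NF violation; here both Price and Stock depend on Album_no alone.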
Now let us examine the relation instance of NEW_ALBUM to see if there are any data
redundancies in it that lead to modification anomalies. It is obvious that both Price and
Stock are repeated for every Artist_nm for a given Album_no (e.g., see Album_no BTL007).
Are there any modification anomalies in NEW_ALBUM?
If we want to change the value of Price or Stock of Album_no BTL007, four tuples in
the relation require update—a clear case of update anomaly. The anomaly is due to the
fact that it is possible to erroneously post different values for Price and Stock for the same
value of Album_no in the four different tuples.
Also, if we want to add a new tuple, (Album_no: XY111, Price: 17.95, and Stock: 100),
to NEW_ALBUM, it is not possible to do so without knowing a value for Artist_nm because
the primary key of NEW_ALBUM is {Album_no, Artist_nm} and a prime attribute, Artist_nm,
cannot have a null value. This is an insertion anomaly simply because of the inability to
add a genuine tuple to the database.
In order to delete Album_no BTL007, four tuples in the relation NEW_ALBUM must be
deleted. This is an example of a deletion anomaly. If all four tuples are not deleted, the
information conveyed by the data in the relation is distorted; hence the anomaly.
The next question is, How do we know that the undesirable FDs identified earlier (i.e.,
fd1 and fd2) indeed cause the data redundancies and the associated modification anoma-
lies? The way this can be verified is to eliminate the undesirable FDs from NEW_ALBUM
(that is, rendering them desirable) and see if the data redundancies and the associated
anomalies persist. The resolution of 2NF violation is a two-step process that decomposes
the target relation schema with undesirable FDs into multiple relation schemas that are
free from undesirable FDs:
1. Pull out the undesirable FD(s) from the target relation schema as a separate
relation schema.
2. Retain the determinant of the pulled-out relation schema as an attribute
(foreign key) or attributes of the leftover target relation schema to facilitate
reconstruction of the original target relation schema.
Applying this two-step process to NEW_ALBUM, we have the decomposition, D:
D: ALBUM_INFO (Album_no, Price, Stock); ALBUM_ARTIST (Album_no, Artist_nm)
Note: ALBUM_INFO and ALBUM_ARTIST are arbitrary (meaningful) names assigned to the
decomposed relation schemas.
All non-prime attributes in ALBUM_INFO are fully functionally dependent on the pri-
mary key of ALBUM_INFO, and in ALBUM_ARTIST there are no non-prime attributes. So,
both are in 2NF. Reviewing the corresponding decomposed relations (see Figure 8.2b), it is
clear that there are no data redundancies causing modification anomalies in either rela-
tion. It is now possible to insert the tuple (Album_no: XY111, Price: 17.95, and Stock: 100)
in the database (i.e., in ALBUM_INFO) without having a value for Artist_nm. If the Price or
Stock of Album_no BTL007 changes, the corresponding update requires change of just one
tuple in spite of the fact that the album involves multiple artists. So, we can infer that the
resolution of undesirable FDs causing the 2NF violation eliminated the data redundancies
and the associated modification anomalies.
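The decomposition and its effect on the anomalies can be sketched in Python using a few tuples from NEW_ALBUM. Representing relations as sets of tuples is an assumption made for illustration; the projections and the join on Album_no follow the two-step process described above.

```python
# A few NEW_ALBUM tuples: (Album_no, Artist_nm, Price, Stock)
NEW_ALBUM = [
    ("BS123", "Britney Spears", 17.95, 1000),
    ("DJM237", "John Denver", 11.95, 2000),
    ("DJM237", "Michael Jackson", 11.95, 2000),
    ("DJM237", "Madonna", 11.95, 2000),
]

# Step 1: pull out the undesirable FDs (Album_no -> Price, Stock) as a relation.
ALBUM_INFO = {(no, price, stock) for no, _, price, stock in NEW_ALBUM}
# Step 2: keep the determinant (Album_no) in the leftover relation as a foreign key.
ALBUM_ARTIST = {(no, artist) for no, artist, _, _ in NEW_ALBUM}

# Joining on Album_no reconstructs the original relation.
rejoined = {
    (no, artist, price, stock)
    for no, artist in ALBUM_ARTIST
    for info_no, price, stock in ALBUM_INFO
    if no == info_no
}

# The tuple that could not be inserted into NEW_ALBUM now fits in ALBUM_INFO.
ALBUM_INFO_AFTER_INSERT = ALBUM_INFO | {("XY111", 17.95, 100)}

print(len(ALBUM_INFO), rejoined == set(NEW_ALBUM))
```

Price and Stock of DJM237 are now stored once instead of three times, so a price change touches a single tuple.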
[Figure 8.2b: ERD corresponding to the 2NF solution (ALBUM, Made_by, ARTIST; structural constraints (1,m) and (1,1))]

ALBUM_INFO
Album_no  Price  Stock
BS123     17.95  1000
JT111     17.95  1200
BTL007    23.95
MJ100     17.95
JM456     16.95  1000
JM151     16.95  1000
MX789     11.95  500
DJM237    11.95  2000
DR711     12.95  1000
PM137     19.95

ALBUM_ARTIST
Album_no  Artist_nm
BS123     Britney Spears
JT111     Justin Timberlake
BTL007    John Lennon
BTL007    Paul McCartney
BTL007    George Harrison
BTL007    Ringo Star
MJ100     Michael Jackson
JM456     John Mayer
JM151     John Mayer
MX789     Madonna
DJM237    John Denver
DJM237    Michael Jackson
DJM237    Madonna
DR711     Diana Ross
PM137     Paul McCartney
Clearly, this form of FD is by definition undesirable, because the determinant in the FD is,
again, not a candidate key of R. Fundamentally, the source of the problem is not the
principle of transitivity itself, because had A (or B) been another candidate key (alternate key)
of R, the transitive nature of X → B (or X → A) does not yield an undesirable FD. In
essence, the problem boils down to a non-prime attribute functionally determining
another non-prime attribute. Nonetheless, this is not a violation of 2NF, because A → B
(or B → A) in R is not a partial dependency. In fact, R is in 2NF.
DEFINITION
3NF definition: A relation schema R is in 3NF if no non-prime attribute is functionally dependent on
another non-prime attribute in R.
While violations of 2NF and 3NF are independent effects, since these two normal
forms are labeled as such (i.e., 2NF and 3NF), it is customary to specify that in order for a
relation schema to be in 3NF, it should also be in 2NF.
As an example, consider the relation schema:
FLIGHT (Flight#, Origin, Destination, Mileage)
and the set of FDs, F, where:
fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
In order to assess the normal form of FLIGHT, we first need to identify the candidate
keys of FLIGHT. To that end, using Armstrong’s axioms, we can infer that F+ includes the
following FDs:
fd12: Flight# → {Origin, Destination} (Union rule)
fd3x: Flight# → Mileage (Transitivity rule)
fd123: Flight# → {Origin, Destination, Mileage} (Union rule)
Thus, we see that Flight# is a candidate key of FLIGHT. Since there are no other candidate
keys of FLIGHT, Flight# becomes the primary key of FLIGHT. Since the primary key of
FLIGHT is an atomic attribute, a 2NF violation is impossible in FLIGHT. So, FLIGHT is in
2NF. Is it also in 3NF? No, because fd3 (i.e., {Origin, Destination} → Mileage) causes a
transitive dependency in FLIGHT: a composite non-prime attribute, {Origin, Destination},
functionally determines another non-prime attribute, Mileage. Thus, 3NF is violated in FLIGHT.
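The key-finding and 3NF reasoning above can be checked mechanically. What follows is a minimal Python sketch, not part of the original text. The helper names closure, candidate_keys, and third_nf_violations are hypothetical, and the 3NF test uses the common formulation (an FD X → A is a violation when X is not a superkey and A is non-prime), which is assumed here to agree with the definition above for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """Enumerate the minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # a smaller key is contained in cand, so cand is not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def third_nf_violations(schema, fds):
    """FDs whose determinant is not a superkey and whose dependent is non-prime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == schema:
            continue  # determinant is a superkey
        if any(a not in prime for a in rhs - lhs):
            bad.append((lhs, rhs))
    return bad


FLIGHT = frozenset({"Flight#", "Origin", "Destination", "Mileage"})
F = [
    (frozenset({"Flight#"}), frozenset({"Origin"})),                 # fd1
    (frozenset({"Flight#"}), frozenset({"Destination"})),            # fd2
    (frozenset({"Origin", "Destination"}), frozenset({"Mileage"})),  # fd3
]

keys = candidate_keys(FLIGHT, F)
violations = third_nf_violations(FLIGHT, F)
print("candidate keys:", keys)
print("3NF violations:", violations)
```

Under these assumptions the sketch reports Flight# as the only candidate key and fd3 as the only offending FD, matching the argument above. The brute-force key search is exponential in the number of attributes, which is acceptable only for small teaching examples.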
Let us now explore if there are data redundancies in FLIGHT. Figure 8.3 displays an
instance of a relation for FLIGHT. Note that the relation instance precisely reflects the FDs
specified. Data redundancy is exemplified in FLIGHT by the repetition of distance 1058
from Chicago to Dallas in more than one tuple.
Are there any modification anomalies in FLIGHT?
If the tuple identified by Flight# DL507 is deleted, the information that Seattle to Denver
is 1537 miles is inadvertently lost from the database, an example of a deletion anomaly.
Addition of a tuple to FLIGHT to indicate that the mileage for Cincinnati to Houston is
1100 (Origin: ‘Cincinnati’, Destination: ‘Houston’, Mileage: 1100) is not possible without a
Flight# identifying this route. Since Flight# is the primary key of FLIGHT, it cannot have
null values. This is an insertion anomaly.
Once again, if the normalization of FLIGHT to 3NF by the removal of the undesirable FD:
{Origin, Destination} → Mileage
eliminates the modification anomalies, we can infer that the data redundancy and the
consequent modification anomalies are due to the presence of the undesirable FD in
FLIGHT.
The resolution of the 3NF violation is accomplished by applying the same two-step process
used earlier to resolve the 2NF violation (see Section 8.1.2). To review, the two-step
process is:
1. Pull out the undesirable FD(s) from the target relation schema as a separate
relation schema.
2. Retain the determinant of the pulled-out relation schema as an attribute
(foreign key) or attributes of the leftover target relation schema to facilitate
reconstruction of the original target relation schema.
Accordingly, we have the decomposition D (see footnote 4):
D: DISTANCE (Origin, Destination, Mileage); FLIGHT (Flight#, Origin, Destination)
Note that DISTANCE is an arbitrary (meaningful) name assigned to the decomposed
relation schema. The leftover target relation schema retains the same name as FLIGHT.
There are no 3NF violations in the decomposed set of relation schemas, DISTANCE
and FLIGHT. So, the solution has yielded a relational schema that is in 3NF. A review of
the corresponding decomposed relations (see Figure 8.3) reveals that there are no data
redundancies causing modification anomalies in either relation of the 3NF design. It is now
possible to insert the tuple (Origin: ‘Cincinnati’, Destination: ‘Houston’, Mileage: 1100) in
the database (i.e., in DISTANCE) without having a value for Flight#. Deletion of the tuple
identified by Flight# DL507 in FLIGHT no longer removes the information that Seattle to
Denver is 1537 miles from the database (i.e., from DISTANCE). So, we can infer that the
resolution of undesirable FDs causing the 3NF violation eliminated the data redundancies
and the associated anomalies. The controlled redundancy between DISTANCE and
FLIGHT in the 3NF design via the referencing attributes {Origin, Destination} establishes
a referential integrity constraint between the two relation schemas, as reflected in the
inclusion dependency:
FLIGHT.{Origin, Destination} ⊆ DISTANCE.{Origin, Destination}
which can also be used to reconstruct the original target relation schema.
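The effect of the decomposition can be illustrated on a small relation instance. The Python sketch below is illustrative only. Figure 8.3 is not reproduced here, so the instance is hypothetical: only DL507 (Seattle to Denver, 1537 miles) and the Chicago to Dallas distance of 1058 come from the text, and the other flight numbers are invented.

```python
# Hypothetical FLIGHT instance; rows are (Flight#, Origin, Destination, Mileage).
# Only DL507 and the two distances come from the text; DL101 and DL202 are made up.
flight_rows = {
    ("DL101", "Chicago", "Dallas", 1058),
    ("DL202", "Chicago", "Dallas", 1058),
    ("DL507", "Seattle", "Denver", 1537),
}

# Step 1: pull out {Origin, Destination} -> Mileage as DISTANCE.
distance = {(o, d, m) for (_, o, d, m) in flight_rows}
# Step 2: the leftover FLIGHT retains the determinant as a foreign key.
flight = {(f, o, d) for (f, o, d, _) in flight_rows}


def natural_join(flight, distance):
    """Join FLIGHT and DISTANCE on the shared {Origin, Destination} attributes."""
    return {
        (f, o, d, m)
        for (f, o, d) in flight
        for (o2, d2, m) in distance
        if (o, d) == (o2, d2)
    }


def inclusion_holds(flight, distance):
    """Check FLIGHT.{Origin, Destination} is a subset of DISTANCE.{Origin, Destination}."""
    return {(o, d) for (_, o, d) in flight} <= {(o, d) for (o, d, _) in distance}


reconstructed = natural_join(flight, distance) == flight_rows
stored_once = len(distance) == 2  # the Chicago-Dallas mileage is now recorded once

distance.add(("Cincinnati", "Houston", 1100))    # insertion needs no Flight#
flight.discard(("DL507", "Seattle", "Denver"))   # deletion keeps the mileage fact
mileage_survives = ("Seattle", "Denver", 1537) in distance

print(reconstructed, stored_once, mileage_survives, inclusion_holds(flight, distance))
```

Sets of tuples are used so that projection removes duplicates automatically, as a relation would. A DBMS would enforce the inclusion dependency with a foreign key rather than with a check function.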
Footnote 4: There are two other ways to decompose FLIGHT. The merits and demerits of those
solutions are discussed later in this chapter (see Section 8.1.5).
Consider a relation schema R (X, A, B, C), where X, A, B, and C are pairwise disjoint
atomic or composite attributes. Suppose the set of FDs (a minimal cover over R), F, given
here, prevails over R:
F: fd1: {X, A} → B; fd2: {X, A} → C; and fd3: B → A
Using Armstrong’s axioms, we first infer from fd1 and fd2 that {X, A} is a candidate key
of R. Then, based on fd3, we infer that {X, B} is another candidate key of R (using the
pseudotransitivity rule of Armstrong’s axioms). Choosing {X, A} as the primary key of R, R
is in 2NF because there are no partial dependencies in R.
R is also in 3NF because there is no transitive dependency of a non-prime attribute on
the primary key.
Note: B → A (fd3) does not violate 3NF because A is a prime attribute. In fact, in fd3 a
non-prime attribute determines a prime attribute.
Therefore, R is, indeed, in 3NF when evaluated on the basis of {X, A} as the primary
key of R. Observe that B → A, by definition, is an undesirable FD in R simply because B is
not a candidate key of R.
DEFINITION
BCNF definition: A relation schema R is in BCNF if for every non-trivial functional dependency in R, the
determinant is a superkey of R.
The immediate questions, then, ought to be: “Is there any data redundancy in R? If
so, does it cause any modification anomalies?” To explore this condition, let us review an
example.
STU_SUB (Stu#, Subject, Teacher, Ap_score) subject to
F: fd1: {Stu#, Subject} → Teacher; fd2: {Stu#, Subject} → Ap_score;
fd3: Teacher → Subject
It is obvious that {Stu#, Subject} is a candidate key of STU_SUB. Choosing this as the
primary key of STU_SUB, fd1, fd2, and fd3 do not violate either 2NF or 3NF. In fact, it
can be shown that no FD in F+ violates 2NF or 3NF.
A relation instance for STU_SUB appears in Figure 8.4, where one can observe that
{Subject, Teacher} pairs are redundantly recorded. Does this cause any anomalies? Since
Teacher → Subject, if we want to add a new Teacher for a Subject (e.g., Teacher: ‘Salter,’
Subject: ‘English’), it is not possible to do so unless a corresponding Stu# is also
provided. Likewise, if Campbell is no longer advising and is being replaced by, say,
Smith, multiple tuples require modification. These are cases of insertion anomaly
and update anomaly, respectively, in STU_SUB. In short, STU_SUB is in 3NF and yet
modification anomalies are present in it. An examination of F (or F+) reveals that fd3
(Teacher → Subject) in STU_SUB is an undesirable FD because the determinant in fd3,
Teacher, is not a candidate key of STU_SUB. Since fd3 is not a trivial dependency
(i.e., the dependent in fd3 is not a subset of the determinant of fd3), the fact that
Teacher is not a superkey of STU_SUB causes a BCNF violation in STU_SUB per the
definition of BCNF.
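The BCNF test stated in the definition is easy to automate for a single relation schema. The following Python sketch is one possible rendering. The function names closure and bcnf_violations are hypothetical, and the check inspects only the FDs listed in F, which is sufficient for testing the original schema.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]


STU_SUB = frozenset({"Stu#", "Subject", "Teacher", "Ap_score"})
F = [
    (frozenset({"Stu#", "Subject"}), frozenset({"Teacher"})),   # fd1
    (frozenset({"Stu#", "Subject"}), frozenset({"Ap_score"})),  # fd2
    (frozenset({"Teacher"}), frozenset({"Subject"})),           # fd3
]

violations = bcnf_violations(STU_SUB, F)
teacher_is_superkey = closure({"Teacher"}, F) == STU_SUB
stu_teacher_is_superkey = closure({"Stu#", "Teacher"}, F) == STU_SUB
print(violations, teacher_is_superkey, stu_teacher_is_superkey)
```

The sketch flags only fd3, because the closure of Teacher is {Teacher, Subject}. It also confirms that {Stu#, Teacher} determines every attribute of STU_SUB.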
Again, if the removal of the undesirable FD, fd3 (Teacher → Subject), from STU_SUB
eliminates the modification anomalies, we can infer that the data redundancy and the
consequent anomalies are due to the presence of the undesirable FD in STU_SUB. Let us
see if the removal of fd3, the undesirable FD, eliminates the BCNF violation from STU_SUB
also. The resolution of the BCNF violation is accomplished by applying the same two-step
process used earlier to resolve the 2NF and 3NF violations (see Sections 8.1.2 and 8.1.3):
1. Pull out the undesirable FD(s) from the target relation schema as a separate
relation schema.
2. Retain the determinant of the pulled-out relation schema as an attribute
(foreign key) or attributes of the leftover target relation schema to facilitate
reconstruction of the original target relation schema.
Accordingly, we have the decomposition, D:
D: TEACH_SUB (Teacher, Subject); STU_AP (Stu#, Teacher, Ap_score)
Note: TEACH_SUB and STU_AP are arbitrary (meaningful) names assigned to the decomposed
relation schemas (see footnote 5).
There are no BCNF violations in the decomposed set of relation schemas TEACH_SUB
and STU_AP, because in both relation schemas the determinant of the only FD
present in each is a superkey of the respective relation schema. So, the solution has
yielded a relational schema that is in BCNF. Reviewing the corresponding decomposed
relations (see Figure 8.4), it is seen that there are no data redundancies causing
modification anomalies in either relation of the BCNF design. It is now possible to insert
the tuple (Teacher: ‘Salter,’ Subject: ‘English’) in the database (i.e., in TEACH_SUB)
without having a value for Stu#. Replacement of Campbell as a teacher requires a change
of attribute value in just one tuple (i.e., in TEACH_SUB). So, we can infer that the
resolution of the undesirable FD causing the BCNF violation eliminated the data
redundancies and the associated modification anomalies. The controlled redundancy
between TEACH_SUB and STU_AP in the BCNF design via the referencing attribute
(Teacher) establishes a referential integrity constraint between the two relation schemas,
as reflected in the inclusion dependency:
STU_AP.{Teacher} ⊆ TEACH_SUB.{Teacher}
which can also be used to reconstruct the original target relation schema.
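The two-step process used for the 2NF, 3NF, and BCNF violations can be written as a single schema-level operation. The Python sketch below is illustrative only. The function name two_step_decompose is hypothetical, and schemas are modeled simply as sets of attribute names.

```python
def two_step_decompose(schema, determinant, dependent):
    """Apply the two-step process to one undesirable FD, determinant -> dependent.

    Step 1: pull the FD out as its own relation schema.
    Step 2: keep the determinant in the leftover schema as a foreign key.
    """
    schema, determinant, dependent = map(frozenset, (schema, determinant, dependent))
    pulled_out = determinant | dependent
    leftover = schema - (dependent - determinant)
    return pulled_out, leftover


# BCNF example from the text: fd3 is Teacher -> Subject.
teach_sub, stu_ap = two_step_decompose(
    {"Stu#", "Subject", "Teacher", "Ap_score"}, {"Teacher"}, {"Subject"}
)

# 3NF example from the text: {Origin, Destination} -> Mileage.
distance, flight = two_step_decompose(
    {"Flight#", "Origin", "Destination", "Mileage"},
    {"Origin", "Destination"},
    {"Mileage"},
)

print(sorted(teach_sub), sorted(stu_ap))
print(sorted(distance), sorted(flight))
```

In both cases the attributes shared by the two resulting schemas are exactly the determinant of the pulled-out FD, which is what makes the join back to the original schema possible.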
Footnote 5: Note that {Stu#, Teacher} → Ap_score is in F+ and that {Stu#, Teacher} is also a
candidate key of STU_SUB. By choosing {Stu#, Teacher} as the primary key, the BCNF violation
can be viewed as a 2NF violation and resolved accordingly to produce the same answer. There is
another BCNF decomposition of STU_SUB. The merits and demerits of this solution are discussed
later in this chapter (see Section 8.1.5.3).
design require attention during the decomposition that results from normalization. They
are dependency preservation and lossless-join decomposition (see footnote 6). These two
independent properties are an expected requirement of an “ideal” design, and both are always
tied to a set of functional dependencies, F, that holds over the relation schema, R, being
normalized.
Footnote 6: The lossless-join property is also referred to as the non-additive join property
or sometimes the non-loss property.
The clarification sought here is about what it means to not preserve the FD {Origin,
Destination} → Mileage. Suppose we want to add a new flight (Flight#: DL111, Origin: Seattle,
Destination: Denver, Mileage: 1300). In the 3NF decomposition that is dependency-preserving
(the set of tables in the middle of Figure 8.5), it is legal to add the tuple (Flight#:
DL111, Origin: Seattle, Destination: Denver) to FLIGHT, while it is not possible to add the
tuple (Origin: Seattle, Destination: Denver, Mileage: 1300) to DISTANCE because {Origin,
Destination} is the primary key of DISTANCE. Since a tuple with values (Origin: Seattle,
Destination: Denver, Mileage: 1537) already exists in DISTANCE, another tuple that contains
a duplicate value for the primary key cannot be added. Thus, {Origin, Destination} → Mileage
continues to mean that for a given value of {Origin, Destination}, say (‘Seattle’, ‘Denver’),
there is a single, specific value of Mileage, say 1537; that is, the FD is preserved. Also, when
the two relations (R1 and R2) are joined, the FD {Origin, Destination} → Mileage will continue
to be preserved as in the original FLIGHT table from which the decomposition occurred.
On the other hand, in the 3NF decomposition that is not dependency-preserving (the
set of tables at the bottom of Figure 8.5), it certainly is legal to add the tuple (Flight#:
DL111, Origin: Seattle, Destination: Denver) to FLIGHT, and it is equally legal to add the
tuple (Flight#: DL111, Mileage: 1300) to DISTANCE. Clearly, it is not possible to verify the
FD {Origin, Destination} → Mileage from any single relation. If there is a need to combine
(join) multiple relations (in this case, two) to check for an FD, then, by definition, that
dependency is not preserved in the relational schema.
What does this entail? This can be seen by joining the two relations, R1a and R2a.
When the two relations are joined, observe that the FD {Origin, Destination} → Mileage is not
preserved because there is a tuple (Flight#: DL507, Origin: Seattle, Destination: Denver,
Mileage: 1537) and another tuple (Flight#: DL111, Origin: Seattle, Destination: Denver,
Mileage: 1300) in the joined relation; this means that for (‘Seattle’, ‘Denver’) we do not have
a single, specific value of Mileage; there are two values, 1537 and 1300. This demonstrates
the seriousness of failing to preserve the FD, {Origin, Destination} → Mileage. Essentially,
the database has been contaminated with incorrect data. In short, failure to preserve the
specified functional dependencies renders the resulting database vulnerable to contamination
in the context of the business rules conveyed by the specified functional dependencies.
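The contamination scenario can be replayed in a few lines of Python. This is a sketch under stated assumptions. The starting rows are hypothetical apart from DL507 (Seattle to Denver, 1537 miles), and insert_unique is a made-up helper that mimics primary-key enforcement on the leading columns of a tuple.

```python
def insert_unique(relation, row, key_len):
    """Insert row, rejecting it if its first key_len columns duplicate an existing key."""
    if any(existing[:key_len] == row[:key_len] for existing in relation):
        raise ValueError(f"duplicate primary key: {row[:key_len]}")
    relation.add(row)


# Decomposition that does not preserve {Origin, Destination} -> Mileage.
flight_a = {("DL507", "Seattle", "Denver")}   # R1a: (Flight#, Origin, Destination)
distance_a = {("DL507", 1537)}                # R2a: (Flight#, Mileage)

# Both insertions are legal, because Flight# DL111 is new in each relation.
insert_unique(flight_a, ("DL111", "Seattle", "Denver"), 1)
insert_unique(distance_a, ("DL111", 1300), 1)

# Only after joining on Flight# does the conflict become visible.
joined = {(f, o, d, m) for (f, o, d) in flight_a for (f2, m) in distance_a if f == f2}
mileages = {}
for _, origin, destination, mileage in joined:
    mileages.setdefault((origin, destination), set()).add(mileage)

# Dependency-preserving decomposition: the key of DISTANCE enforces the FD.
distance = {("Seattle", "Denver", 1537)}      # R2: (Origin, Destination, Mileage)
try:
    insert_unique(distance, ("Seattle", "Denver", 1300), 2)
    rejected = False
except ValueError:
    rejected = True

print(mileages[("Seattle", "Denver")], rejected)
```

In the non-preserving design the route ends up with two mileages and no single relation can detect it. In the preserving design the primary key of DISTANCE blocks the second value at insertion time.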
A similar analysis can be conducted about the other 3NF decomposition, D: {R1b,
R2b}. It is important to note that the simple algorithm reflected by the two-step process
prescribed will always yield a dependency-preserving decomposition for a relation schema
that violates 2NF or 3NF (e.g., D: {R1, R2}).
Even though the decomposition of a relation schema during normalization may ultimately
yield a set of multiple relation schemas, in the stepwise process of normalization,
each step typically involves a decomposition that produces two relation schemas, that is,
a binary decomposition. Therefore, we will address the lossless-join property in binary
decompositions.
Formally, a decomposition D: {R1, R2} of a relation schema, R, is lossless (non-additive)
with respect to a set of FDs, F, specified on R, if for every relation state r of R that
satisfies F, the natural join of r(R1) and r(R2) strictly yields r(R), from which the
projections r(R1) and r(R2) emerged.
Note that the term “loss” in lossless-join implies loss of information, not loss of tuples.
In fact, a loss-join occurs when the natural join of r(R1) and r(R2) yields r′(R), which
includes additional spurious tuples beyond r(R) from which r(R1) and r(R2) are projected.
The additional spurious tuples amount to loss of information because their presence
corrupts the semantics of the source relation schema, R; that is, the FDs specified on R
no longer hold in r′(R). Then, the decomposition is a loss-join decomposition, not a
lossless-join decomposition.
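Spurious tuples are easiest to appreciate on a concrete instance. The Python sketch below is illustrative only. The FLIGHT rows are hypothetical (DL101 and DL909 are invented, and the Dallas to Chicago distance is assumed equal to the Chicago to Dallas distance), and project, natural_join, and as_set are made-up helper names.

```python
def project(rows, attrs):
    """Project dict-rows onto attrs, eliminating duplicates as a relation would."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]


def natural_join(left, right):
    """Combine every pair of rows that agree on all attributes they share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Order-insensitive view of a list of dict-rows."""
    return {frozenset(r.items()) for r in rows}


FLIGHT = [
    {"Flight#": "DL101", "Origin": "Chicago", "Destination": "Dallas", "Mileage": 1058},
    {"Flight#": "DL909", "Origin": "Dallas", "Destination": "Chicago", "Mileage": 1058},
    {"Flight#": "DL507", "Origin": "Seattle", "Destination": "Denver", "Mileage": 1537},
]

# Projections that share {Origin, Destination}: the join gives back r(R) exactly.
lossless = natural_join(
    project(FLIGHT, ("Flight#", "Origin", "Destination")),
    project(FLIGHT, ("Origin", "Destination", "Mileage")),
)

# Projections that share only Mileage: the join adds tuples that were never in r(R).
lossy = natural_join(
    project(FLIGHT, ("Origin", "Destination", "Mileage")),
    project(FLIGHT, ("Flight#", "Mileage")),
)

spurious = as_set(lossy) - as_set(FLIGHT)
print(as_set(lossless) == as_set(FLIGHT), len(lossy), len(spurious))
```

The second join returns more tuples than the source relation, yet information is lost: it is no longer possible to tell which of the two flights flies Chicago to Dallas, so Flight# → Origin does not hold in the joined result.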
At this point, let us revisit the FLIGHT example. The three 3NF solutions for this
example are reproduced here:
F: fd1: Flight# → Origin; fd2: Flight# → Destination; fd3: {Origin, Destination} → Mileage
is specified over the relation schema:
R: FLIGHT (Flight#, Origin, Destination, Mileage)
The three 3NF solutions of FLIGHT are the decompositions:
Case 1:
D: {FLIGHT, DISTANCE} where R1: FLIGHT (Flight#, Origin, Destination); R2: DISTANCE (Origin, Destination, Mileage)
Case 2:
D: {FLIGHT_A, DISTANCE_A} where R1a: FLIGHT_A (Flight#, Origin, Destination); R2a: DISTANCE_A (Flight#, Mileage)
Case 3:
D: {FLIGHT_B, DISTANCE_B} where R1b: FLIGHT_B (Origin, Destination, Mileage); R2b: DISTANCE_B (Flight#, Mileage)
A review of this example in Section 8.1.5.1 combined with the above analysis indicates
that while all three solutions yield 3NF decompositions, the following distinctions
should be noted by the reader:
• {FLIGHT, DISTANCE} delivers a lossless-join decomposition that is also
dependency-preserving with respect to the FDs, F, specified on the relation
schema FLIGHT.
• {FLIGHT_A, DISTANCE_A} delivers a lossless-join decomposition. However,
the decomposition is not dependency-preserving with respect to the FDs, F,
specified on the relation schema FLIGHT.
• {FLIGHT_B, DISTANCE_B} delivers a loss-join decomposition that also fails
to preserve the FDs, F, specified on the relation schema FLIGHT.
Clearly, the decomposition {FLIGHT, DISTANCE} is the only acceptable design. Once
again, note that the simple algorithm incorporated in the two-step decomposition process
always yields 2NF and 3NF solutions that are dependency-preserving and also possess the
lossless-join property.
A test for verifying the lossless-join property of a binary decomposition can be specified
as follows. A decomposition D: {R1, R2} of a relation schema, R, is a lossless-join
decomposition with respect to a set of FDs, F, that holds on R, if and only if F+ contains:
• either the FD (R1 ∩ R2) → R1
• or the FD (R1 ∩ R2) → R2
In other words, the attribute(s) common to R1 and R2 must contain a candidate key
of either R1 or R2 (see footnote 7). In our example, the join attribute in the case 1 solution is
{Origin, Destination}, which is the primary key of DISTANCE. Likewise, in the decomposition in
case 2, the join attribute, Flight#, is the primary key of both FLIGHT_A and DISTANCE_A.
Therefore, both these solutions offer lossless-join decompositions, as confirmed by the
data in Figure 8.6a. The third solution presented, however, fails the test for lossless-join
decomposition prescribed above, because the join attribute in this case, Mileage, does not
contain a candidate key of either FLIGHT_B or DISTANCE_B. Once again, the spurious
tuples in the natural join {FLIGHT_B * DISTANCE_B} (see Figure 8.6a) confirm this.
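The binary lossless-join test reduces to one attribute-closure computation on the common attributes. A minimal Python sketch follows. The names closure and is_lossless_binary are hypothetical, and the three cases are the FLIGHT decompositions discussed above.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [
    (frozenset({"Flight#"}), frozenset({"Origin"})),                 # fd1
    (frozenset({"Flight#"}), frozenset({"Destination"})),            # fd2
    (frozenset({"Origin", "Destination"}), frozenset({"Mileage"})),  # fd3
]

cases = {
    "case 1": (frozenset({"Flight#", "Origin", "Destination"}),
               frozenset({"Origin", "Destination", "Mileage"})),
    "case 2": (frozenset({"Flight#", "Origin", "Destination"}),
               frozenset({"Flight#", "Mileage"})),
    "case 3": (frozenset({"Origin", "Destination", "Mileage"}),
               frozenset({"Flight#", "Mileage"})),
}

results = {name: is_lossless_binary(r1, r2, F) for name, (r1, r2) in cases.items()}
print(results)
```

Cases 1 and 2 pass and case 3 fails, in agreement with the discussion above. Because the test works on the schema and F alone, it gives the verdict without constructing any relation instance.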
fd1: {Stu#, Subject} → Teacher; fd2: {Stu#, Subject} → Ap_score; fd3: Teacher → Subject
Footnote 7: A more general specification of the test is: F+ contains the FD
(R1 ∩ R2) → (R1 − R2) or the FD (R1 ∩ R2) → (R2 − R1).
exist in F+. Thus, {Stu#, Teacher} is a candidate key of STU_SUB and, if treated as the primary
key of STU_SUB, renders fd3 a violation of 2NF.
Either way, the decomposition of STU_SUB derived in Section 8.1.4 as the solution is:
The decomposition is in BCNF. The union of the FDs that hold on individual relation
schemas of D: {R1a, R2a} is:
Teacher → Subject (in R1a) and {Stu#, Subject} → Ap_score (in R2a)
Once again, this set of FDs is not a cover for F. In other words, while this solution
preserves fd2 and fd3, fd1 is not preserved in this solution. In fact, there is no
BCNF solution that can preserve fd1. Moreover, this second solution also fails to
produce a lossless-join decomposition, as can be seen by applying the prescribed test
for lossless-join decomposition. In this case, (TEACH_SUB1 ∩ STU_AP1) is the
attribute Subject, which is a candidate key of neither TEACH_SUB1 nor STU_AP1.
Accordingly, the decomposition yields a loss-join. This can be
verified by constructing the natural join of the relations TEACH_SUB1 and STU_AP1,
as shown in Figure 8.6b. The spurious tuples that result from the natural join
(TEACH_SUB1 * STU_AP1) are clear proof of the absence of the lossless-join property
in the solution.
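Both properties of this second solution can be checked without building a relation instance. The Python sketch below uses the standard closure-based procedure for dependency preservation and is illustrative only. The helper names are hypothetical, and the attribute lists of TEACH_SUB1 and STU_AP1 are inferred from the FDs and the common attribute named above rather than quoted from the text.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_preserved(lhs, rhs, decomposition, fds):
    """Check lhs -> rhs using only what each decomposed schema can enforce locally."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def is_lossless_binary(r1, r2, fds):
    """True if the common attributes functionally determine all of R1 or all of R2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = {
    "fd1": (frozenset({"Stu#", "Subject"}), frozenset({"Teacher"})),
    "fd2": (frozenset({"Stu#", "Subject"}), frozenset({"Ap_score"})),
    "fd3": (frozenset({"Teacher"}), frozenset({"Subject"})),
}
fds = list(F.values())

# Inferred schemas of the second BCNF solution (an assumption of this sketch).
TEACH_SUB1 = frozenset({"Teacher", "Subject"})
STU_AP1 = frozenset({"Stu#", "Subject", "Ap_score"})

preserved = {
    name: is_preserved(lhs, rhs, [TEACH_SUB1, STU_AP1], fds)
    for name, (lhs, rhs) in F.items()
}
lossless = is_lossless_binary(TEACH_SUB1, STU_AP1, fds)
print(preserved, lossless)
```

Under these assumptions the sketch reports fd2 and fd3 as preserved, fd1 as not preserved, and the decomposition as failing the lossless-join test, which matches the conclusions drawn above.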
In sum, since neither solution provides full dependency preservation, the
designer’s choice ought to be the solution that at least generates lossless-join projections,
if a BCNF design is desired. If one has to aim for one of these two properties, the lossless-join
condition is an absolute must (Elmasri and Navathe, 2010, p. 357).
Assuming that we always seek a lossless-join condition, if we are forced to choose
between BCNF without preserving dependencies and 3NF with preserved dependencies, it
is generally preferable to opt for the latter. After all, if one can’t test for dependency
preservation efficiently, one either pays a high penalty in system performance or risks the
integrity of the data in the database. Neither is an attractive alternative. Thus, the limited
amount of redundancy allowed under 3NF is regarded as the lesser of the two evils.
Therefore, the design goals can be expressed in two basic options:
Option 1
BCNF
Lossless-join
Dependency preservation
This is the ideal option and is achieved only when a relational schema is in 3NF and
there are no BCNF violations in the relational schema, because then the relational schema
is also in BCNF.
If the above design cannot be achieved, we may have to settle for:
Option 2
3NF
Lossless-join
Dependency preservation
That said, with the advent of materialized views (see footnote 8), it is possible to always
achieve Option 1 rather cost-effectively. In the absence of BCNF (Option 2), the application
developer assumes the responsibility to keep redundant data consistent programmatically when
modifications to the database occur. If we opt for a BCNF design (Option 1), the cost of
application programming incurred in Option 2 is eliminated. However, we need to supplement
the BCNF design with a materialized view for each unpreserved FD in the minimal
cover of F. The advantage of this approach is that the DBMS takes care of the maintenance
of the materialized views as modifications occur in the source relations, thus assuring
preservation of the associated dependencies. While the DBMS overhead for the maintenance
of materialized views requires consideration, BCNF violations are usually few and
far between. Costs and inefficiencies associated with application programming in this
situation are often far more burdensome than the DBMS overhead.
Footnote 8: A view defines a “virtual” relation schema constructed from one or more base
relation schemas; unlike base relations, a view does not store data; the value of a view at any
given time is a ‘derived’ relation and results from the evaluation of a specified relational
expression at that time. A view is just a logical window to view selected data (attributes and
tuples) from one or a set of relation schemas. Materialized views (also known as snapshots) are
also derived like views except that they are stored in the database and refreshed on every
modification (i.e., maintained current by the DBMS as modifications occur) in the source
relations from where the materialized views are generated.
fd4: Product → Price; fd5: {Store, Product} → Quantity; fd6: Quantity → Discount
Applying the two-step normalization process to the partial dependencies fd1, fd2, fd3,
and fd4 causing the 2NF violation, we have the decomposition D:
D: {R1, R2, R3, R4, R5}
where:
The decomposed relational schema, D, consisting of the five (arbitrarily named) relation
schemas—STORE_LOC, STORE_SIZE, STORE_MGR, PRODUCT, and INVENTORY—does
not have any 2NF violations anymore; that is, D is in 2NF. In addition, the decomposition is
lossless and preserves all the FDs in F.
In fact, STORE_LOC, STORE_SIZE, STORE_MGR, and PRODUCT are also in 3NF
and BCNF.
However, INVENTORY is in violation of 3NF because fd6: Quantity → Discount causes
a transitive dependency in INVENTORY. Application of the two-step normalization
process, once again, leads to the following decomposition of INVENTORY:
R5a: DISC_STRUCTURE (Quantity, Discount); R5b: INVENTORY (Store, Product,
Quantity)
and the inclusion dependency:
INVENTORY.{Quantity} ⊆ DISC_STRUCTURE.{Quantity}
Both the decomposed relation schema, DISC_STRUCTURE, and the leftover relation
schema, INVENTORY (R5b), are in 3NF as well as in BCNF.
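The check behind this step can be sketched with an attribute-closure computation. The Python below is only an illustrative sketch: it assumes that INVENTORY, before this second decomposition, holds (Store, Product, Quantity, Discount) with primary key {Store, Product}, and it uses only fd5 and fd6. The names closure and bcnf_violations are hypothetical helpers, not part of the text's method. The test applied is the BCNF test (every determinant must be a superkey); because the offending dependent, Discount, is a non-prime attribute, the FD it reports is also the 3NF violation noted above.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed FDs that apply to `relation` but whose determinant is not a superkey of it."""
    bad = []
    for lhs, rhs in fds:
        # The FD applies if its determinant and a non-trivial dependent lie in the relation.
        applies = lhs <= relation and (rhs & relation) - lhs
        if applies and not relation <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


fd5 = (frozenset({"Store", "Product"}), frozenset({"Quantity"}))
fd6 = (frozenset({"Quantity"}), frozenset({"Discount"}))
FDS = [fd5, fd6]

# Assumed shape of INVENTORY before the second decomposition.
INVENTORY_BEFORE = frozenset({"Store", "Product", "Quantity", "Discount"})
DISC_STRUCTURE = frozenset({"Quantity", "Discount"})      # R5a
INVENTORY = frozenset({"Store", "Product", "Quantity"})   # R5b

print(bcnf_violations(INVENTORY_BEFORE, FDS))  # fd6 is reported
print(bcnf_violations(DISC_STRUCTURE, FDS))    # nothing reported
print(bcnf_violations(INVENTORY, FDS))         # nothing reported
```

Note that the sketch inspects only the FDs it is given; a complete test would also consider FDs in F+ that project onto each relation schema.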
Thus, a BCNF solution yields the following result. The relation schema STOCK where
2NF, 3NF, and BCNF violations were present is replaced by the relational schema that
contains the set of relation schemas STORE_LOC, STORE_SIZE, STORE_MGR,
PRODUCT, INVENTORY, and DISC_STRUCTURE, as shown next:
D: {R1, R2, R3, R4, R5a, R5b}
where:
Observe that, incidentally, this consolidation also preserves fd7, and {Manager, Location} is
a candidate key of R123. The resulting solution is of the form:
D: {R123, R4, R5a, R5b}
where:
R123: STORE (Store, Location, Sq_ft, Manager); R4: PRODUCT (Product, Price);
8.3.1 Case 1
Let us investigate the example presented in Section 7.3.1. The URS and the set of FDs that
hold in it are reproduced here:
URS1 (Store, Branch, Location, Sq_ft, Manager, Product, Price, Customer, Address,
Vendor, Type)
The set of FDs in F may also be expressed via a dependency diagram, as shown in
Figure 8.8b.
R1: STORE (Store, Type); R2: BRANCH (Store, Branch, Location, Sq_ft, Manager);
property. This is possible because R2 and R6 have at least one common candidate key.
Consequently, a superior design is of the form:
D: {R1, R26, R3, R4, R5}
where:
If the above design is correct and complete, there is no utility in normalization. So,
an important question arises: Is the above design correct and complete? In Section 8.2,
three conditions were specified to determine if a design is correct. Accordingly, the
above design is attribute-preserving and each relation schema in the relational schema D
{R1, R26, R3, R4, R5} is in BCNF. Does the join of D {R1, R26, R3, R4, R5} strictly yield
R? The answer is “No” because while R1 and R26 as well as R4 and R5 can be joined,
the results of these joins and R3 are not joinable. So, the design is incorrect. One
way to ratify or refute this is to see if we can arrive at the same solution through the
normalization process. In order to do that, we return to the original question: Is URS1
normalized?
Step 2: Choose a primary key for URS1—Since the choice of primary key from among
the candidate keys is essentially arbitrary (see Section 6.3.1), using the rules of thumb
prescribed in Chapter 7 (see Section 7.3.4), let us choose {Manager, Customer, Vendor}
as the primary key of URS1. Then, the other candidate key, {Store, Branch, Customer,
Vendor}, becomes the alternate key of URS1.
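The two candidate keys named in Step 2 can be verified with an attribute-closure test. The following Python sketch is illustrative only. The FDs are taken from the executions described in this section (fd3, fd5, fd7, fd8, and fdx, the union of fd1, fd4, and fd6); fd2 is not reproduced here and is assumed to be Customer → Address. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(key, attributes, fds):
    """True if `key` determines every attribute and no attribute can be dropped from it."""
    key = frozenset(key)
    if closure(key, fds) != attributes:
        return False  # not even a superkey
    # Closure is monotone, so testing single-attribute removals suffices for minimality.
    return all(closure(key - {a}, fds) != attributes for a in key)


URS1 = frozenset({"Store", "Branch", "Location", "Sq_ft", "Manager", "Product",
                  "Price", "Customer", "Address", "Vendor", "Type"})

FDS = [
    ({"Store", "Branch"}, {"Location", "Sq_ft", "Manager"}),  # fdx = fd1, fd4, fd6 combined
    ({"Customer"}, {"Address"}),                              # fd2 (assumed, see lead-in)
    ({"Vendor"}, {"Product"}),                                # fd3
    ({"Product"}, {"Price"}),                                 # fd5
    ({"Manager"}, {"Store", "Branch"}),                       # fd7
    ({"Store"}, {"Type"}),                                    # fd8
]

print(is_candidate_key({"Manager", "Customer", "Vendor"}, URS1, FDS))
print(is_candidate_key({"Store", "Branch", "Customer", "Vendor"}, URS1, FDS))
print(is_candidate_key({"Manager", "Store", "Customer", "Vendor"}, URS1, FDS))
```

Under these assumptions the first two sets pass the test, while the third is a superkey that is not minimal, since Store can be dropped.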
Step 3: Record the immediate normal form violated in URS1 with respect to the primary
key by each of the FDs in F—The normal form violations are shown in Table 8.1.
URS1 (Store, Branch, Location, Sq_ft, Manager, Product, Price, Customer, Address, Vendor, Type)
NF violated in URS1
Note: While we are only interested in the normal form violations predicated upon the primary key, the
table above also shows the normal form violations with respect to the alternate key just to demonstrate
that the same FD can violate different normal forms depending on the candidate key on which the
evaluation is based.
Step 4: Resolve 2NF and 3NF violations in URS1; the sequence of resolution is immaterial—
We use the two-step process prescribed in Sections 8.1.2 and 8.1.3 to resolve the normal
form violations. It is important to note that as Step 4 is executed recursively, URS1 ceases to
remain a single relation schema. Instead, in each successive execution of this step, the URS1
used is a revised set of decomposed relation schemas, as shown here:
Execution 1:
R0: LORS1 (Manager, Customer, Vendor, Store, Branch, Location, Sq_ft, Product, Price, Type)
Note: LORS1 is an acronym for Leftover Relation Schema 1. The input to the next execution of this step
is URS1 {R1, R0}.
Execution 2:
Input: URS1 {R1, R0} FD: fd3: Vendor → Product; Violation: 2NF in LORS1
R0: LORS2 (Manager, Customer, Vendor, Store, Branch, Location, Sq_ft, Price, Type)
The input to the next execution of this step is URS1 {R1, R2, R0}.
Execution 3:
Input: URS1 {R1, R2, R0} FD: fd5: Product → Price; Violation: 3NF in ?
Observe that the attributes in fd5 have been fragmented in previous decompositions.
In order to evaluate the effect of fd5 properly, fd5 will have to be restored. This is accom-
plished by moving the dependent in fd5 (Price) to the relation schema R2, where the
determinant of fd5 (Product) now resides. Here is the revised URS1 {R1, R2, R0}:
R0: LORS2 (Manager, Customer, Vendor, Store, Branch, Location, Sq_ft, Type)
Accordingly, the violation of 3NF by fd5 in URS1 has moved from LORS2 to R2 and is
now resolved as follows:
Resolution: Decomposition URS1 {R1, R2, R3, R0}
where:
R0: LORS3 (Manager, Customer, Vendor, Store, Branch, Location, Sq_ft, Type)
The input to the next execution of this step is URS1 {R1, R2, R3, R0}.
Execution 4:
Input: URS1 {R1, R2, R3, R0} FD: fd7: Manager → {Store, Branch}; Violation: 2NF in LORS3
LORS4.{Manager} ⊆ MANAGER.{Manager}
Note that the resolution of normal form violation due to fd7 also resolves the BCNF
violation in LORS3 due to fd6: {Store, Branch} → Manager.
The input to the next execution of this step is URS1 {R1, R2, R3, R4, R0}.
Execution 5:
Input: URS1 {R1, R2, R3, R4, R0} FD: fd8: Store → Type; Violation: 3NF in ?
Once again, the attributes in fd8 have been fragmented in previous decompositions.
In order to evaluate the effect of fd8 properly, fd8 will have to be restored. This is accom-
plished by moving the dependent in fd8 (Type) to the relation schema R4, where the
determinant of fd8 (Store) now resides. The affected relation schemas in URS1 {R1, R2,
R3, R4, R0} are R4 and R0, as shown here:
Since URS1 {R1, R2, R3, R4, R5, R0} is still not fully normalized—for instance,
LORS5 is not in BCNF—we continue with the normalization process. Accordingly, the
input to the next execution of this step is URS1 {R1, R2, R3, R4, R5, R0}.
Execution 6:
Input: URS1 {R1, R2, R3, R4, R5, R0} FD: fdx = {fd1 ∪ fd4 ∪ fd6}; Violation: 3NF in ?
fdx: {Store, Branch} → {Location, Sq_ft, Manager};
Since {Store, Branch} is a candidate key of R4, fdx does not violate any normal form
in R4. Hence, R4 is in BCNF. As can be seen, LORS5 is also in BCNF.
Step 5: Resolve all BCNF violations in URS1; the sequence of resolution is immaterial—
There are no other BCNF violations in URS1 {R1, R2, R3, R4, R5, R0}.
Thus, the final design URS1 {R1, R2, R3, R4, R5, R0} that is free from modification
anomalies is as follows:
Is the above design correct and complete? The above design is attribute-preserving,
and each relation schema in the relational schema URS1 {R1, R2, R3, R4, R5, R0} is in
BCNF. Does the join of URS1 {R1, R2, R3, R4, R5, R0} strictly yield R? Yes, it does. Therefore,
the solution is correct. Since the design also yields a lossless-join decomposition that is
also dependency-preserving, the solution is also complete.
It is crucial to observe that LORS5 is part of the final relational schema. Without
LORS5 as a part of the final decomposed URS1, the solution is incomplete and therefore
incorrect. The initial approach of constructing the relational schema by simply mapping
the FDs in F to relation schemas explored at the start of Section 8.3.1 fails to generate
LORS5 as a relation schema in the design and therefore is flawed.9 The solution generated
above via normalization demonstrates that the normalization process is indispensable for
analyzing a universal relation schema and the set of FDs holding over it, thereby generat-
ing a correct and complete relational schema.
8.3.2 Case 2
Let us next review the example presented in Section 7.3.3, this time as a second exer-
cise in normalization. The URS and the set of FDs that prevail over it are reproduced
here:
URS3 (Proj_nm, Emp#, Proj#, Job_type, Chg_rate, Emp_nm, Budget, Fund#, Hours,
Division)
fd4: Proj_nm → Budget; fd5: Fund# → Proj_nm; fd6: {Proj_nm, Emp#} → Hours;
fd7: Proj_nm → Proj#; fd8: Emp# → Job_type; fd9: {Proj#, Emp#} → Fund#
The set of FDs in F may also be expressed via a dependency diagram, as shown in
Figure 8.8c.
9
This is further clarified when the relational schema is reverse engineered to an ERD (see Section
8.4.1). A simpler decomposition algorithm is available if a 3NF solution is acceptable; see Elmasri
and Navathe (2010, p. 342) for details.
Convinced from the previous example that arbitrary development of relation schemas
from a set of FDs yields an incomplete and incorrect solution, let us reject that approach
and proceed with the evaluation of URS3 from the normalization perspective.10 URS3 is in
1NF by virtue of its definition as a relation schema—that is, there are no multi-valued
attributes or composite attributes in URS3, and all FDs in F are preserved in URS3.
Lossless-join property is not an issue since URS3 is a single relation schema. If URS3 is in
BCNF, it is guaranteed that there will be no data redundancies/modification anomalies in
URS3 due to functional dependencies. In order to check this, we need to know at least one
candidate key of URS3, since normal form violations can be checked only with respect to
candidate keys (or a primary key, it being one of the candidate keys of the relation
schema). So, we start with Step 1 of the normalization heuristic.
Step 1: Identify the candidate keys of URS3 given the set of FDs, F—In Chapter 7, we
derived the candidate keys of URS3 (see Section 7.3.3.) as:
{Proj#, Emp#, Division}, {Proj_nm, Emp#, Division}, and {Fund#, Emp#, Division}
Step 2: Choose a primary key for URS3—Since the choice of primary key from among
the candidate keys is essentially arbitrary, using the rules of thumb prescribed in
Chapter 7, let us choose {Proj#, Emp#, Division} as the primary key of URS3. Then, the
other two candidate keys, {Proj_nm, Emp#, Division} and {Fund#, Emp#, Division}, become
the alternate keys of URS3.
Step 3: Record the immediate normal form violated in URS3 with respect to the primary
key by each of the FDs in F—Based on the primary key chosen, viz., {Proj#, Emp#,
Division}, normal forms are violated in URS3 by:
10
The unconvinced reader may, as an exercise, construct the relational schema by mapping each FD
to a relation schema, do some consolidation based on Armstrong’s axioms, and compare the results
with the normalized solution.
Step 4: Resolve 2NF and 3NF violations in URS3; the sequence of resolution is
immaterial—We now know that this step is executed recursively as necessary and that in
successive executions of this step, the URS3 is progressively revised to a set of decom-
posed relation schemas. Since we have already observed the successive, iterative execu-
tion of this step for each of the 2NF and 3NF violations in the first example, we handle
them collectively in one execution of this step in this example. Solving for the 2NF and
3NF violations in URS3, we have:
Decomposition: URS3 {R1, R2, R3, R4, R5, R0}
where:
While at this stage of the solution, the revised relational schema URS3 is not in BCNF, it
possesses the lossless-join property, and all FDs in F are preserved. Note that, incidentally,
the decomposition of URS3 for the resolution of 2NF and 3NF violations eliminated the
BCNF violation due to fd7 by the migration of Proj_nm to R1. However, the decomposition
has seeded a new BCNF violation in R5.
Step 5: Resolve all BCNF violations in URS3; the sequence of resolution is immaterial—
The only BCNF violation in the URS3 {R1, R2, R3, R4, R5, R0} is the one in:
The resolution of the BCNF violation using the prescribed two-step process results in:
R5a can be consolidated with R2, since the determinant of the FDs preserved in both is
the same—viz., Fund#. Thus, we have:
Since Proj# and Proj_nm functionally determine each other, as per fd1 and fd7, one of the
two attributes moves to R1 in the process of decomposing R2a, to eliminate the 3NF vio-
lation. Thus, we have:
*Indicates revision to relationships via changes in inclusion dependencies in order to indicate a lossless-join
navigation path.
longer present; instead, the decomposition has created a loss-join relationship between
R5b and R0. A more robust design is to propagate the revision of R5b to R0 and thus
retain the originally established lossless-join relationship between R0 and R5.
Step 6: Propagate the revisions to the primary keys in the BCNF resolutions to the pri-
mary keys of related relation schemas—In this example, the propagation involves just one
relation schema (R0):
This design is superior because the lossless-join property is fully restored. Thus, the
final form of the BCNF design that retains the lossless-join property is:
Decomposition: URS3 {R1, R2, R3, R4, R5b, R0}
where:
The final solution is attribute-preserving and the join of URS3 {R1, R2, R3, R4, R5b,
R0} does strictly yield R. All the relation schemas in URS3 {R1, R2, R3, R4, R5b, R0} are
in BCNF. Therefore, the solution is correct. As for completeness, the solution is a lossless-
join design, but is not dependency-preserving. The only way to compensate for the
dependencies that are not preserved in this solution is to supplement the relational
schema with the necessary materialized views for covering the lost FDs. In this case, the
two lost FDs happen to have the same determinant and so can be captured in a single
materialized view of the form:
MV1: SCHEDULE (Proj#, Emp#, Fund#, Hours)
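How such a view could be used to police the two lost FDs can be sketched in Python. This is a minimal sketch under stated assumptions: the rows of MV1 are taken to be available as dictionaries, the sample rows are invented for illustration, and fd_holds is a hypothetical helper. Because Proj# and Proj_nm determine each other, fd6 is restated here with Proj# as part of the determinant, so that both lost FDs read {Proj#, Emp#} → Fund# and {Proj#, Emp#} → Hours.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the `lhs` attributes but differ on the `rhs` attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns the stored one.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical contents of MV1: SCHEDULE (Proj#, Emp#, Fund#, Hours).
SCHEDULE = [
    {"Proj#": "P1", "Emp#": "E1", "Fund#": "F10", "Hours": 12},
    {"Proj#": "P1", "Emp#": "E2", "Fund#": "F11", "Hours": 30},
    {"Proj#": "P2", "Emp#": "E1", "Fund#": "F20", "Hours": 8},
]

# A row that would assign a second fund to the same (project, employee) pair.
CONFLICT = {"Proj#": "P1", "Emp#": "E1", "Fund#": "F11", "Hours": 12}

DETERMINANT = ["Proj#", "Emp#"]
print(fd_holds(SCHEDULE, DETERMINANT, ["Fund#"]))
print(fd_holds(SCHEDULE, DETERMINANT, ["Hours"]))
print(fd_holds(SCHEDULE + [CONFLICT], DETERMINANT, ["Fund#"]))
```

In a DBMS the same effect would typically be obtained by declaring {Proj#, Emp#} unique on the materialized view; the function above merely makes the condition being enforced explicit.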
Finally, while the choice of primary key at the beginning of the normalization process
(Step 2) may have some impact on the execution of the normalization steps prescribed,
the final solution will be the same or a close equivalent no matter which candidate key
is chosen as the primary key. Verification of this is left as an exercise for the reader.
fd4: Proj_nm → Budget; fd5: Fund# → Proj_nm; fd6: {Proj_nm, Emp#} → Hours;
fd7: Proj_nm → Proj#; fd8: Emp# → Job_type; fd9: {Proj#, Emp#} → Fund#
The first step in the fast-track algorithm asks for developing the canonical cover
(Gc) of F. Here, it just happens that Gc = F. Note that since fd1 and fd7 indicate mutual
referencing between the attributes Proj# and Proj_nm, the dependent in fd5 can also be
Proj#, essentially providing an alternate canonical cover for F. This property will be
invoked later, in the development of the normalized solution.
Next, we need to derive the candidate keys of URS3. Since this was already done once
(see Section 7.3.3), we have the candidate keys of URS3, which are the following:
{Proj#, Emp#, Division}, {Proj_nm, Emp#, Division}, and {Fund#, Emp#, Division}
The next two steps in the fast-track algorithm dictate that we specify a relation
schema for each FD in Gc and that we specify another relation schema containing the
attributes of a candidate key of the URS if this set of attributes is not present in one of the
specified relation schemas. Accordingly, we have R1 through R9 to meet the first rule and
R0 to fulfill the second rule:
Since R1 is a subset of R7, R1 is redundant and can be eliminated. Next, the consoli-
dations based on common primary key will lead to a relational schema as shown here:
Observe that, in this solution, all relation schemas except R9 are in BCNF. R9 violates
BCNF since Fund# → Proj# is in F+. The design is dependency-preserving. On first glance,
it may appear that there are several loss-join conditions in the design (e.g., between R6
and R9, between R6 and R0, and between R5 and R6). While this algorithm does not
explicitly handle this condition, because the attributes Proj# and Proj_nm are mutually
substitutable based on the FDs fd1 and fd7, slightly revising Fc as the input will indeed
resolve this issue. For now, based on this property of substitutability due to the mutual
referencing between the attributes Proj# and Proj_nm, it is seen that R6 and R9 can be
consolidated as follows:
R69: PRJ_EMP (Proj#, Emp#, Fund#, Hours)
Thus, the final design generated by the fast-track algorithm leads to a non-loss 3NF
solution that is dependency-preserving, as shown here:
Observe that the solution here is identical to the one arrived at in Step 4 of Case 2 in
Section 8.3.2. Having arrived at this stage of solution rather quickly, the only task that
remains is to resolve the BCNF violation in R69 caused by the FD in F+ Fund# → Proj#.
While this resolution is exactly the same as in Step 5 of Case 2 in Section 8.3.2, a slight
variation in the approach is shown here:
Given the substitutable nature of the attributes Proj# and Proj_nm, R69b can be seen
as a proper subset of R47 and hence redundant.
It is crucial to note that this decomposition to resolve the BCNF violation in R69
changes the primary key of R69a. Three consequences result from this process:
• The final relational schema in BCNF guarantees total eradication of modifi-
cation anomalies due to functional dependencies.
• The solution is not dependency-preserving. Two FDs preserved in the 3NF
solution are no longer preserved in this BCNF solution. They are:
fd6: {Proj_nm, Emp#} → Hours; fd9: {Proj#, Emp#} → Fund#
• The lossless-join that prevailed in the 3NF solution is now violated in the
BCNF solution. That is, a non-loss join between R0 and R69 present in the
3NF solution is now violated in R0 and R69a.
Thus, the final non-loss BCNF solution is identical to that in Case 2, presented in
Section 8.3.2:
Observe that the price paid to achieve a non-loss BCNF solution is the inability to
preserve the following two FDs:
fd6: {Proj_nm, Emp#} → Hours; and fd9: {Proj#, Emp#} → Fund#
A compensatory mechanism for this shortcoming using “materialized views” is pre-
sented in Section 8.3.2.
8.3.3.2 Case 3
The next example, first presented in Section 7.3.2.1, offers a different variation for a
normalization exercise. The URS and the set of FDs that prevail over URS are reproduced
here:
URS2 (Company, Location, Size, President, Product, Price, Sales, Production, Supplier)
fd4: {Product, Company} → Price; fd5: Sales → Production; fd6: {Company, Product} → Sales;
fd7: {Product, Company} → Supplier; fd8: Supplier → Product; fd9: President → Company
The set of FDs in F may also be expressed via a dependency diagram, as shown in
Figure 8.8d.
Let us step through the fast-track algorithm to quickly arrive at the non-loss,
dependency-preserving stage of the normalization process.
Step 1: Derive the canonical cover Gc of F.
Here, Gc = F.
In addition, since fd3 and fd9 indicate mutual referencing between the attributes
Company and President, it is obvious that another canonical cover can be derived by
substituting the attribute President for Company in the relevant FDs in F (Gc).
Step 2: Derive the candidate keys of URS | Gc.
In Chapter 7, we already derived the candidate keys of URS2 (see Section 7.3.2) as
follows:
{Company, Product}; {Company, Supplier}; {President, Product}; and {President,
Supplier}.
Step 4: If none of the relation schemas in D contains a candidate key of URS, create one
more relation schema in D that contains attributes that form a candidate key of URS.
R4, R6, and R7 in D shown in Step 3 each contain a candidate key of URS2; hence,
there is no need to create an additional relation schema.
Step 5: Eliminate redundant relation schemas from the resulting relational schema (database
schema).
In the presence of R3, it can be seen that R9 is redundant and should be eliminated.
In addition, R8 is a proper subset of R7 and hence, by definition, redundant; it should
therefore be eliminated.
Step 6: Consolidate D to a parsimonious set of relation schemas by combining relation
schemas in D that share the same primary key.
The consolidation leads to D: {R123, R467, R5}
where:
The final design generated by the fast-track algorithm shown above is a non-loss 3NF
solution that is also dependency-preserving. The only additional step required at this
point is to examine each relation schema in the design for possible violation of BCNF. A
critical point to note here is that one must evaluate every FD preserved in each relation
schema since it is possible that some of the FDs are present in F+. In this example,
fd8: Supplier → Product violates BCNF in R467; therefore, modification anomalies will
persist in this solution.
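The steps just traced can be approximated in a few lines of Python. This is a rough sketch, not the fast-track algorithm as formally specified: it merges the per-FD relation schemas by determinant before dropping redundant ones, which gives the same result for this example. fd1 and fd2 are not reproduced in this section; the sketch assumes they are Company → Location and Company → Size, which is consistent with R1, R2, and R3 consolidating into R123. All function names are hypothetical.

```python
def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def synthesize(fds):
    """One schema per FD, merged by determinant, with redundant schemas dropped."""
    merged = {}
    for lhs, rhs in fds:
        merged.setdefault(lhs, set(lhs)).update(rhs)
    schemas = [frozenset(s) for s in merged.values()]
    kept = []
    for s in schemas:
        if any(s < t for t in schemas) or s in kept:
            continue  # proper subset of another schema, or a duplicate
        kept.append(s)
    return kept


def bcnf_violations(relation, fds):
    """Listed FDs that apply to `relation` but whose determinant is not a superkey of it."""
    return [(lhs, rhs) for lhs, rhs in fds
            if lhs <= relation and (rhs & relation) - lhs
            and not relation <= closure(lhs, fds)]


FDS = [
    fd({"Company"}, {"Location"}),              # fd1 (assumed)
    fd({"Company"}, {"Size"}),                  # fd2 (assumed)
    fd({"Company"}, {"President"}),             # fd3
    fd({"Product", "Company"}, {"Price"}),      # fd4
    fd({"Sales"}, {"Production"}),              # fd5
    fd({"Company", "Product"}, {"Sales"}),      # fd6
    fd({"Product", "Company"}, {"Supplier"}),   # fd7
    fd({"Supplier"}, {"Product"}),              # fd8
    fd({"President"}, {"Company"}),             # fd9
]
URS2 = frozenset().union(*(lhs | rhs for lhs, rhs in FDS))

D = synthesize(FDS)
# Step 4: some schema must contain a candidate key, i.e. be a superkey of URS2.
HAS_KEY = any(closure(s, FDS) >= URS2 for s in D)

for schema in D:
    print(sorted(schema), bcnf_violations(schema, FDS))
```

Under these assumptions the sketch yields three schemas corresponding to R123, R467, and R5, and reports fd8 as the only BCNF violation, in R467. It examines only the FDs listed in F, so the caveat about FDs in F+ still applies.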
Decomposition of R467 to resolve the BCNF violation using the basic two-step algorithm
prescribed in Sections 8.1.2, 8.1.3, and 8.1.4 yields the following:
R8: SUPPLIER (Supplier, Product); R46: CO_SUP (Company, Price, Sales, Supplier)
Observe that the resolution of immediate BCNF violation has resulted in a change in
the primary key of R46. However, in this example, there is no ripple effect (disruption) on
the non-loss status in the rest of the relation schemas in the solution, obviating any need
to propagate changes. On the other hand, the BCNF solution is not dependency-preserving
anymore. The FDs in Gc that are not preserved in the BCNF solution are as follows:
fd4: {Product, Company} → Price; fd6: {Company, Product} → Sales;
fd7: {Product, Company} → Supplier.
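The loss of these three FDs can be confirmed with the standard dependency-preservation test, which grows the determinant using only closures projected onto the individual relation schemas. The Python below is a sketch under assumptions: fd1 and fd2 are taken to be Company → Location and Company → Size, and R123 and R5 are taken to hold (Company, Location, Size, President) and (Sales, Production), since neither the two FDs nor those attribute lists are reproduced in this section. The function names are hypothetical.

```python
def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(dep, schemas, fds):
    """True if `dep` follows from the FDs of `fds` projected onto the individual schemas."""
    lhs, rhs = dep
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for r in schemas:
            # Attributes derivable inside schema r from what is already known.
            gained = closure(z & r, fds) & r
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


F = {
    "fd1": fd({"Company"}, {"Location"}),             # assumed
    "fd2": fd({"Company"}, {"Size"}),                 # assumed
    "fd3": fd({"Company"}, {"President"}),
    "fd4": fd({"Product", "Company"}, {"Price"}),
    "fd5": fd({"Sales"}, {"Production"}),
    "fd6": fd({"Company", "Product"}, {"Sales"}),
    "fd7": fd({"Product", "Company"}, {"Supplier"}),
    "fd8": fd({"Supplier"}, {"Product"}),
    "fd9": fd({"President"}, {"Company"}),
}

BCNF_DESIGN = [
    frozenset({"Company", "Location", "Size", "President"}),  # R123 (assumed attributes)
    frozenset({"Company", "Price", "Sales", "Supplier"}),     # R46: CO_SUP
    frozenset({"Sales", "Production"}),                       # R5 (assumed attributes)
    frozenset({"Supplier", "Product"}),                       # R8: SUPPLIER
]

ALL_FDS = list(F.values())
LOST = [name for name, dep in F.items() if not preserved(dep, BCNF_DESIGN, ALL_FDS)]
print(LOST)
```

The reason is visible in the schemas themselves: after the decomposition, no single relation schema holds Company and Product together, so none of the FDs with that determinant can be checked without a join.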
The final non-loss BCNF solution is presented here:
D: {R123, R46, R5, R8}
where:
R46: CO_SUP (Company, Price, Sales, Supplier); R8: SUPPLIER (Supplier, Product)
Since Company → President (fd3) and President → Company (fd9), either Company or
President can be the primary key of R123 and the other becomes the alternate key of
R123. Also, using the rule of pseudotransitivity, since President ã Company, President can
replace Company in R46 to generate an equivalent relation schema, as follows:
8.4 DENORMALIZATION
The jovial remark, “Normalize until it hurts, and denormalize until it works,” while
certainly funny, also indicates that the concept of denormalization is ill-understood. The
general case against normalization is that the process results in lots of logically separate
relations (tables), leading to lots of physically separate stored files and the consequent
data retrieval inefficiencies.
Denormalization entails combining relations so that they are easier to query. The
combined relations may lose their normalized status in that they may reintroduce data
redundancies eliminated by the normalization process. The general misunderstanding,
however, is that denormalization always improves data retrieval performance.
Formally, denormalization may be defined as replacing a set of (often normalized)
relation schemas D: {R1, R2, ..., Rn} by their join R, such that projecting R over the set
of attributes of R1, R2, ..., Rn, respectively, is guaranteed to yield the original set D.
The objective is to reduce the number of joins that may be required during the run time
of queries (data retrieval) by including some of these joins structurally as a part of the
database design.
As an example, consider the following relation schema:
R: CUSTOMER (Id, Name, Street, City, State, Zip_code).
From the semantics of this general scenario, it is obvious that an FD of the form:
Zip_code → {City, State}
holds on CUSTOMER, resulting in a violation of 3NF. In a strict normalization paradigm,
one would decompose R to eliminate the 3NF violation, leading to the solution D: {R1, R2},
where:
R1: ZIP (Zip_code, City, State); R2: CUSTOMER (Id, Name, Street, Zip_code)
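The 3NF violation in R can be confirmed with a small Python sketch. It assumes, beyond what is stated above, that Id is the only candidate key of CUSTOMER, expressed here as the FD Id → {Name, Street, City, State, Zip_code}. The function names are hypothetical, and candidate keys are found by brute force, which is reasonable only for small schemas. An FD X → A is flagged when X is not a superkey and A is not a prime attribute.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets whose closure is the schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= attributes:
                keys.append(candidate)
    return keys


def third_nf_violations(attributes, fds):
    """FDs whose determinant is not a superkey and whose target is non-prime."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        nonprime_targets = (rhs - lhs) - prime
        if not is_superkey and nonprime_targets:
            violations.append((lhs, frozenset(nonprime_targets)))
    return violations


CUSTOMER = frozenset({"Id", "Name", "Street", "City", "State", "Zip_code"})
FDS = [
    # Assumption: Id is the primary (and only candidate) key of CUSTOMER.
    (frozenset({"Id"}), CUSTOMER - {"Id"}),
    (frozenset({"Zip_code"}), frozenset({"City", "State"})),
]

keys = candidate_keys(CUSTOMER, FDS)
violations = third_nf_violations(CUSTOMER, FDS)
print(keys, violations)
```

With these inputs the only FD flagged is Zip_code → {City, State}, the transitive dependency that the decomposition into R1 and R2 removes.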
While the normalization process eliminates data redundancies and the associated
modification anomalies in the design, most queries on CUSTOMER may require joining
the two relations R1 and R2, while a denormalized R eliminates the repeated join opera-
tion and the attendant retrieval inefficiencies. The semantics of the scenario indicates
that a Zip_code is rarely deleted, added, or updated. Thus, the expected modification
anomalies are not a serious practical problem. In this case, one may opt for the
denormalized design instead of the normalized solution. However, any query that
seeks only Zip_code data must then access the larger relation R and is
correspondingly less efficient.
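The formal definition of denormalization can be illustrated with the CUSTOMER example. In the Python sketch below, the sample tuples and the helper names natural_join and project are hypothetical. The sketch joins R1 and R2 into the denormalized R and then projects R back onto the attributes of each schema. Note the assumption built into the data: every ZIP tuple is referenced by at least one customer. A ZIP tuple with no matching customer would not survive the join, so the projection would not return it.

```python
def natural_join(left, right):
    """Natural join of two relations held as lists of dicts."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]


def project(rows, attrs):
    """Projection with duplicate elimination, returned as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


ZIP_ATTRS = ("Zip_code", "City", "State")
CUST_ATTRS = ("Id", "Name", "Street", "Zip_code")

# Hypothetical sample data; every ZIP tuple is referenced by some customer.
ZIP = [
    {"Zip_code": "45221", "City": "Cincinnati", "State": "OH"},
    {"Zip_code": "10001", "City": "New York", "State": "NY"},
]
CUSTOMER = [
    {"Id": 1, "Name": "Ann", "Street": "1 Main St", "Zip_code": "45221"},
    {"Id": 2, "Name": "Bob", "Street": "9 Elm St", "Zip_code": "45221"},
    {"Id": 3, "Name": "Cy", "Street": "5 Oak Ave", "Zip_code": "10001"},
]

R = natural_join(CUSTOMER, ZIP)  # the denormalized relation
round_trip_ok = (project(R, ZIP_ATTRS) == project(ZIP, ZIP_ATTRS)
                 and project(R, CUST_ATTRS) == project(CUSTOMER, CUST_ATTRS))
# Rows of R that repeat the (City, State) pair stored once in ZIP for 45221.
redundant_rows = sum(1 for row in R if row["Zip_code"] == "45221")
print(len(R), round_trip_ok, redundant_rows)
```

The join is computed once, as part of the design, rather than at query time. The cost is visible in the last count: the city and state for Zip_code 45221 are stored once in ZIP but twice in R.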
Step 1 Translate the normalized relational schema to an information-preserving logical schema based on available
information.
• Represent the foreign key attribute(s) in a logical scheme by a relationship type in the ERD*:
  ◦ Establish the relationship type
  ◦ Connect the relationship type to the parent (referenced) and child (referencing) entity types; if the
    child is a weak entity type, the relationship type is mapped as an identifying relationship type
  ◦ Map the (min, max) to the appropriate edge of the relationship type
• Map attributes of individual logical schemes to corresponding entity types in the ERD
• Transform gerund entity types to n-way relationship types. Attributes of the gerund entity type remain as
  attributes of the relationship type
• Any relationship with a gerund entity type is transformed to a relationship type with the cluster entity type
  that the gerund represents
• A weak entity type not participating in any relationship other than the identifying relationship is transformed
  to a multi-valued (atomic/composite) attribute of the parent entity type
*The foreign key attribute is removed from the referencing (child) entity type unless the attribute plays some other
role in the entity type, in which case the attribute is retained in the entity type.
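The foreign key portion of the heuristic can be sketched in Python as follows. Everything in this sketch is an assumption made for illustration: the LogicalScheme class, the to_erd function, and the DEPARTMENT/EMPLOYEE/DEPENDENT schemes do not come from the text. Whether a child is a weak entity type, and whether a foreign key attribute "plays some other role," are supplied by the designer as inputs rather than inferred. The (min, max) mapping and the gerund transformations are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalScheme:
    """A hypothetical, simplified stand-in for one logical scheme."""
    name: str
    attributes: list
    foreign_keys: dict = field(default_factory=dict)  # FK attribute -> parent scheme
    is_weak: bool = False                             # decided by the designer
    retained: set = field(default_factory=set)        # FK attributes with another role


def to_erd(schemes):
    """Map each foreign key to a relationship type and drop it from the child."""
    entity_types = {}
    relationship_types = []
    for scheme in schemes:
        # Footnote rule: an FK attribute stays only if it plays some other role.
        entity_types[scheme.name] = [
            a for a in scheme.attributes
            if a not in scheme.foreign_keys or a in scheme.retained
        ]
        # One relationship type per referenced parent; a weak child is assumed
        # here to have a single owner, so its relationship is identifying.
        for parent in sorted(set(scheme.foreign_keys.values())):
            relationship_types.append(
                {"parent": parent, "child": scheme.name,
                 "identifying": scheme.is_weak}
            )
    return entity_types, relationship_types


schemes = [
    LogicalScheme("DEPARTMENT", ["Dept_no", "Dept_name"]),
    LogicalScheme("EMPLOYEE", ["Emp_no", "Emp_name", "Dept_no"],
                  foreign_keys={"Dept_no": "DEPARTMENT"}),
    LogicalScheme("DEPENDENT", ["Emp_no", "Dep_name", "Birth_date"],
                  foreign_keys={"Emp_no": "EMPLOYEE"}, is_weak=True),
]
entity_types, relationship_types = to_erd(schemes)
print(entity_types)
print(relationship_types)
```

In this hypothetical run, Dept_no disappears from EMPLOYEE and Emp_no disappears from DEPENDENT, because each is now carried by a relationship type, and only the EMPLOYEE-DEPENDENT relationship type is marked as identifying.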
When fully normalized to BCNF, the resulting relational schema looks like this:
URS1 {R1, R2, R3, R4, R5, R0}
where:
FIGURE 8.10b Design-Specific ERD reverse engineered from the logical schema
in Figure 8.10a
[Figure not reproduced. Legible labels: STORE, MANAGER, PRODUCT, Location, Sq_ft, Store, Branch.]
Finally, the Design-Specific ERD is abstracted up one more notch based on the
following procedure:
• The gerund entity types are transformed into m:n relationships between the
participating entity types.
• Weak entity types with no relationship other than the identifying relation-
ship, and no attributes other than the partial keys are transformed into multi-
valued attributes of the identifying parent.
• The (min, max) grammar for the structural constraints of relationship types
is replaced by the cardinality ratio and participation constraint, expressed as
independent constructs.
The Presentation Layer ERD for this case is shown in Figure 8.10c.
FIGURE 8.10c Presentation Layer ERD reverse engineered from the Design-Specific ERD
in Figure 8.10b
Reverse engineering a relational schema can generate more than one solution—
that is, more than one ERD that can produce the same relational schema. From a
database-implementation perspective, the effect is insignificant since the same relational
schema is generated. However, at the conceptual level, the designer/user may relate
better to one ERD than the other. While presenting an equivalent ERD for the readers’
review, we do not make any comparative assessment of the efficacy of two different
reverse engineered ERDs in this book. Figure 8.11a is an alternative design equivalent to
the reverse engineered Design-Specific ERD shown in Figure 8.10b. By selecting {Store,
Branch} as the primary key instead of Manager in R4, R4 can be depicted as a weak
entity child of R5 in the ERD with no impact on the relational schema. The correspond-
ing Presentation Layer ERD appears in Figure 8.11b. The fact that the attribute Manager
is a unique identifier of BRANCH does not appear in this ERD. This is because a weak
entity type, by definition, does not have an independent unique identifier. Therefore,
this information should be carried in the list of semantic integrity constraints that
accompanies the ERD.
FIGURE 8.11a An alternative design equivalent to the Design-Specific ERD in Figure 8.10b
FIGURE 8.11b Presentation Layer ERD reverse engineered from the Design-Specific ERD
in Figure 8.11a
The logical schema in Figure 8.12a is the translation of the relational schema
constructed by directly mapping the FDs in F to relation schemas instead of following
the normalization process. At the end of Section 8.3.1, it is pointed out that the solution
from this arbitrary approach fails to capture a relation schema—viz., LORS5 (Manager,
Customer, Vendor)—that the normalized solution yields. The consequence of this error
becomes obvious in the ERD that is reverse engineered from the logical schema depicted
in Figure 8.12a, in that we see three islands of ERDs that are unconnected (see
Figure 8.12b). Since the original source of the reengineering task is a single relation
schema—viz., URS1—the ERD in Figure 8.12b cannot be correct.
FIGURE 8.12a Logical schema for URS1 {L1, L3, L4, L5, L26} constructed by directly
mapping FDs to relation schemas
FIGURE 8.12b Presentation Layer ERD for the logical schema in Figure 8.12a
[Figure not reproduced. Legible labels: CUSTOMER, VENDOR, STORE, PRODUCT, BRANCH, Store, Type, Location, Manager.]
FIGURE 8.13b Design-Specific ERD reverse engineered from the logical schema
in Figure 8.13a
FIGURE 8.13c Presentation Layer ERD reverse engineered from the Design-Specific ERD in
Figure 8.13b
[Figure not reproduced. Legible labels: COMPANY, Procures, SUPPLY, STOCK, SALES, President, Company, Sales, Production.]
FIGURE 8.14b Design-Specific ERD reverse engineered from the logical schema in
Figure 8.14a
The last step is to reverse engineer the Design-Specific ERD to the Presentation Layer
ERD. This is done by following the procedure suggested in Step 3 of the
reverse-engineering heuristic in Figure 8.9. The reverse-engineered Presentation Layer ERD is
shown in Figure 8.14c. An additional level of abstraction of Figure 8.14c appears in
Figure 8.14d. In fact, only Figure 8.14d represents the Presentation Layer ERD for the
given relational schema.
FIGURE 8.14c Design-Specific ERD reverse engineered from Figure 8.14b to a higher level of
abstraction
[Figure not reproduced. Legible labels: ASSIGNMENT, Emp#, Fund#, Hours, Emp_nm.]
FIGURE 8.14d Presentation Layer ERD reverse engineered from the Design-Specific ERD
in Figure 8.14c
FIGURE 8.15a An alternative design equivalent to the Design-Specific ERD in Figure 8.14b
FIGURE 8.15b Presentation Layer ERD reverse engineered from the Design-Specific ERD
in Figure 8.15a
FIGURE 8.15c Presentation Layer ERD reverse engineered from the Design-Specific ERD
in Figure 8.15a: an alternative design
Chapter Summary
The root cause of data redundancy and the consequent modification anomalies in a database is
the presence of undesirable functional dependencies in a relation schema. Any FD whose
determinant is not a candidate key of the relation schema in which the FD holds is an undesir-
able FD. FDs emerge from business rules of the application domain and so cannot be ignored or
discarded when they are undesirable. The set of FDs that prevail over a relation schema is
referred to as F. This chapter explores the solution for this problem.
Normalization is the mechanism capable of systematically weeding out undesirable FDs
from a relation schema. The undesirable FDs manifest themselves either as partial dependen-
cies or as transitive dependencies in a relation schema. Normalization prescribes a method to
render the undesirable FDs desirable. This is done by decomposing a relation schema with
undesirable FDs to a set of relation schemas so that, in each relation schema of the decom-
posed set, the determinant of the preserved FD is the candidate key of that relation schema.
Normal forms (NFs) provide a stepwise progression towards attaining the goal of a fully
normalized relational schema that is guaranteed to be free of data redundancies that cause
modification anomalies from a functional dependency perspective. First normal form (1NF)
defines a relation schema—that is, a schema that is not in 1NF is not a relation schema.
Elimination of partial dependencies establishes 2NF. Two variations of transitive dependencies
are resolved by 3NF and Boyce-Codd Normal Form (BCNF). A relation schema in BCNF is
guaranteed to be free of modification anomalies due to functional dependencies.
A relational schema (that is, a set of relation schemas) in BCNF does not necessarily result
in a good database design. Relational schemas created through the decomposition process
should also exhibit the lossless-join and dependency preservation properties. The lossless-join
property is critical to ensure that spurious tuples are not generated by a join operation between
two relations in the relational schema. The dependency preservation property, which ensures
that each functional dependency is represented in some individual relation schema resulting
after the decomposition, is sometimes sacrificed.
After introducing the first, second, third, and Boyce-Codd normal forms plus the lossless-join and dependency
preservation properties in the context of a series of individual examples, the motivating exemplar
introduced originally in Chapter 7 was reexamined to illustrate how a set of functional depen-
dencies derived from user-specified business rules can be used to develop a fully normalized
relational schema. This discussion answers the three questions presented in Section 7.1:
(a) How are data redundancies identified? (b) How is the base relation schema decomposed?
and (c) How can the decomposition be evaluated for correctness and completeness?
This chapter presented a comprehensive approach to the normalization process that
incorporates trade-offs among a BCNF design, a lossless-join decomposition, and dependency
preservation. The approach involves (a) the derivation of the candidate keys along with the
primary key of the initial universal relational schema (URS), (b) the identification and resolution
of all second and third normal form violations in the URS, (c) the resolution of all Boyce-Codd
normal form violations in the URS, and (d) an evaluation of whether the relational schema is
dependency-preserving and yields a lossless-join decomposition.
At that point, once the various issues and trade-offs associated with the normalization
process had been explained, a fast-track algorithm to quickly achieve a non-loss and dependency-
preserving solution was introduced. A reader who completely understands the ramifications of the
various issues associated with normalization can use this algorithm to quickly achieve a non-loss
3NF design that is also dependency-preserving, and can then use the traditional process of
normalization to resolve the BCNF violation, if any, in the relational schema.
When a relational schema is in BCNF, it may not be possible to achieve both lossless-join
and dependency-preservation properties in the design. One alternative in this situation is to
accept a relational schema in third normal form and handle the potential data-redundancy pro-
blems via application programs. Often, a superior alternative is to establish a BCNF design that
possesses the lossless-join property and then use materialized views to capture the few func-
tional dependencies that are not preserved.
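To see why a view over the join is needed, consider the relation schemas R8: SUPPLIER (Supplier, Product) and R46: CO_SUP (Company, Price, Sales, Supplier) from the BCNF solution, in which fd4: {Product, Company} → Price is not preserved. The Python sketch below is a hedged illustration, not a DBMS feature. The sample tuples are hypothetical, {Company, Supplier} is assumed to be the primary key of R46, and an in-memory join stands in for a materialized view. The sketch shows an insertion that satisfies the key of R46 yet violates fd4, a violation that becomes detectable only on the join.

```python
def natural_join(left, right):
    """Natural join of two relations held as lists of dicts."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instances of R8 and R46.
SUPPLIER = [
    {"Supplier": "S1", "Product": "Bolt"},
    {"Supplier": "S2", "Product": "Bolt"},
    {"Supplier": "S3", "Product": "Nut"},
]
CO_SUP = [
    {"Company": "Acme", "Price": 10, "Sales": 100, "Supplier": "S1"},
    {"Company": "Zenith", "Price": 12, "Sales": 50, "Supplier": "S2"},
    {"Company": "Acme", "Price": 7, "Sales": 30, "Supplier": "S3"},
]
FD4 = (("Product", "Company"), ("Price",))

ok_before = fd_holds(natural_join(CO_SUP, SUPPLIER), *FD4)

# A new row that is acceptable to R46 on its own (assumed key: Company, Supplier).
new_row = {"Company": "Acme", "Price": 11, "Sales": 20, "Supplier": "S2"}
co_sup_after = CO_SUP + [new_row]
key_ok = fd_holds(co_sup_after, ("Company", "Supplier"), ("Price", "Sales"))

# Only the joined "view" reveals that Acme now has two prices for Bolt.
ok_after = fd_holds(natural_join(co_sup_after, SUPPLIER), *FD4)
print(ok_before, key_ok, ok_after)
```

In a DBMS that supports materialized views, the corresponding design choice would be to materialize the join and declare {Product, Company} unique on it, so that the offending insertion is rejected rather than merely detected.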
Since the early 1990s, reverse engineering has drawn the attention of database research-
ers. Traditionally, reverse engineering seeks to examine operational software systems with a
view to analyzing data patterns to extract data structures and behaviors. This chapter presented
a unique use of reverse engineering in the data modeling task in order to get a better grip on
design errors. To that end, the scope of Section 8.5 is restricted to reverse-engineering normal-
ized relational schemas to their conceptual counterparts, viz., ERDs. Reverse engineering a
normalized relational schema to an ERD reveals how the ERD should have been to begin with.
Such discovery enriches the designers’ understanding of the application domain being modeled.
The chapter concluded with a description and a few demonstrations of a heuristic to reverse
engineer a relational schema to a Presentation Layer ERD.
Exercises
1. What is the source of functional dependencies?
2. What is the difference between a desirable and an undesirable functional dependency?
Describe the nature of the problems caused by undesirable functional dependencies.
What prevents us from simply ignoring undesirable functional dependencies?
3. What is the role of normalization in the database design process?
4. Figure 8.1 illustrates how a first normal form violation of ALBUM can be resolved. Sup-
pose that a single album could never have more than four artists. Describe another
approach for defining ALBUM that does not violate first normal form.
5. Suppose {A, B} → C in a relation schema R. Under what condition would this not reflect a
full functional dependency?
6. Consider the relation instance of the STU-CLASS relation schema:
STU-CLASS (Snum, Sname, Major, Cname, Time, Room)
Identify at least one update anomaly, one insertion anomaly, and one deletion anomaly.
7. What are the undesirable functional dependencies in the relation instance of the STU-
CLASS relation schema shown in Exercise 6?
8. Consider the following relation instance of the CAR relation schema:
CAR
Camry 4 Japan 15 30
Mustang 6 USA 0 45
Fiat 4 Italy 18 30
Accord 4 Japan 15 30
Century 8 USA 0 45
Mustang 4 Canada 0 30
Civic 4 Japan 15 30
Mustang 4 Mexico 15 30
Mustang 6 Mexico 15 45
Civic 4 Korea 15 30
SPORT
EXAM
McGrady Math 3
Howard Math 4
McGrady English 2
Jackson English 1
Yao Math 1
Yao Chemistry 1
Sura Math 2
Ward English 3
Taylor Chemistry 2
Taylor Math 5
Ewing Chemistry 3
13. Consider the relation schema PATIENT_VISIT (Patient, Hospital, Doctor) and the relation
instance given here:
150 TENNIS 50
175 KARATE 50
200 TENNIS 50
SUPPLY
S# Sname P# Qty
S1 SMITH P1 300
S1 SMITH P2 300
S1 SMITH P3 400
S1 SMITH P4 200
S1 SMITH P5 100
S1 SMITH P6 100
S2 CLARK P1 300
S2 CLARK P2 400
S3 MORRIS P2 200
S4 MCNARY P2 200
S4 MCNARY P4 300
S4 MCNARY P5 400
a. Is there a 3NF violation in SUPPLY? If yes, explain. If no, is there a BCNF violation in
SUPPLY? Explain.
b. Decompose SUPPLY if necessary so that the resulting relational schema is in BCNF.
Is your design attribute-preserving, dependency-preserving, and a lossless-join decom-
position? Explain.
18. Consider the relation schema CLASS with attributes Student, Subject, and Teacher. The
meaning of this relation is that the specified student is taught the specified subject by the
specified teacher. Assume that semantic rules, depicted by the following functional depen-
dencies, exist:
{Student, Subject} → Teacher
Teacher → Subject
Subject → Teacher
{Student, Teacher} → Subject
a. What do these rules mean in words? Are all these rules necessary? If not, explain
which are not needed.
b. Is the following sample data consistent with these rules? Why or why not?
c. What causes CLASS to contain a BCNF violation? What anomalies does it exhibit?
d. Decompose CLASS if necessary so that the resulting relational schema is in BCNF. Is
your design attribute-preserving, dependency-preserving, and a lossless-join decompo-
sition? Explain.
19. This exercise is a variation of Exercise 18. Consider the following functional dependencies
prevailing over the relation schema CLASS (Student, Subject, Teacher):
{Student, Subject} → Teacher
Teacher → Subject
Subject → Teacher
{Student, Teacher} → Subject
a. What do these rules mean in words? Are all these rules necessary? If not, explain
which are not needed.
b. Is the following sample data consistent with these rules? Why or why not?
c. Does this version of CLASS satisfy the requirements of BCNF? Is it free of modification
anomalies triggered by undesirable functional dependencies? Is it free of modification
anomalies altogether?
20. Given the relation schema FLIGHT (Gate#, Flight#, Date, Airport, Aircraft, Pilot) and the
constraint set F {fd1, fd2, fd3}, where:
fd1: {Airport, Flight#, Date} → Gate#; fd2: {Flight#, Date} → Aircraft;
fd3: {Flight#, Date} → Pilot
a. List the candidate key(s) of FLIGHT.
b. For each candidate key, indicate the immediate normal form violated in FLIGHT by
each of the functional dependencies given above.
c. If FLIGHT is not in BCNF, design a relational schema that
• is in BCNF, and
• yields a lossless-join decomposition
d. Are all functional dependencies in F preserved? If not, which are not preserved?
21. Consider the universal relation schema INVENTORY (Store#, Item, Vendor, Date, Cost,
Units, Manager, Price, Sale, Size, Color, Location) and the constraint set F {fd1, fd2, fd3,
fd4, fd5, fd6, fd7} introduced originally in Chapter 7, Exercise 13, where:
a. Confirm that F is a minimal cover for the set of functional dependencies given above.
b. List the candidate key(s) of INVENTORY.
c. For each candidate key, indicate the immediate Normal Form violated in INVENTORY
by each of the functional dependencies given above.
d. If INVENTORY is not in BCNF, design a relational schema that
• is in BCNF so that all modification anomalies due to functional dependencies are
eradicated, and
• yields all lossless-join decompositions
e. List the functional dependencies in F that are not preserved in this design.
f. Show the final design. The design should be parsimonious (i.e., minimal set in BCNF).
Also, clearly indicate entity integrity and referential integrity constraints.
g. Revise the above design so that all dependencies are preserved in a lossless-join
decomposition with the least sacrifice in the achieved level of normal form.
22. Given the set of functional dependencies F {fd1, fd2, fd3, fd4, fd5, fd6, fd7, fd8, fd9, fd10}
introduced originally in Chapter 7, Exercise 14:
a. Using the universal relation schema (URS) and its associated primary key developed in
Chapter 7, Exercise 14, indicate the immediate Normal Form violated in each of the
functional dependencies given above and explain how the particular Normal Form is
violated in each case.
b. If the URS is not in BCNF, design a relational schema that:
• is in BCNF so that all modification anomalies due to functional dependencies are
eradicated, and
• yields all lossless-join decompositions
c. List the functional dependencies in F that are not preserved in this design.
d. Show the final design. The design should be parsimonious (i.e., minimal set in BCNF).
Also, clearly indicate entity integrity and referential integrity constraints.
e. Revise the above design so that all dependencies are preserved in a lossless-join
decomposition with the least sacrifice in the achieved level of normal form.
f. Reverse engineer the design to the conceptual level and show it as a Presentation
Layer ERD.
23. Given the set of functional dependencies F {fd1, fd2, fd3, fd4, fd5, fd6, fd7, fd8, fd9, fd10,
fd11} introduced originally in Chapter 7, Exercise 15:
a. Using the universal relation schema (URS) and its associated primary key developed in
Chapter 7, Exercise 15, indicate the immediate Normal Form violated in each of the
functional dependencies given above and explain how the particular normal form is
violated in each case.
b. Decompose the URS to arrive at a design/schema that is in BCNF. In each decompo-
sition, identify the primary key and the functional dependencies accounted for. Demon-
strate that each decomposition is a lossless-join decomposition.
c. Show the final design. The design should be parsimonious (minimal set in BCNF).
Clearly indicate entity integrity and referential integrity constraints.
d. Indicate the functional dependencies in F that are not preserved in the BCNF design
and specify materialized view(s) for the same.
e. Reverse engineer the design to the conceptual level and show it as a Presentation
Layer ERD.
f. Revise the design above so that all dependencies are preserved in a lossless-join
decomposition, with the least sacrifice in the achieved level of normal form.
CHAPTER 9
HIGHER NORMAL FORMS
The focus of Chapter 8 was data redundancies resulting from undesirable functional
dependencies and the aspects of normalization that address this specific problem. This
chapter completes the discussion of normalization by addressing normal forms that are
beyond the purview of functional dependencies.
This chapter flows as follows. Section 9.1 introduces the concept of multi-valued
dependency (MVD) in a relation schema. It begins with a motivation exemplar in
Subsection 9.1.1 to help the reader appreciate the import of MVD intuitively; Subsections
9.1.2 and 9.1.3 then present the formal definition of MVD and the inference rules
associated with MVD, respectively. In Section 9.2, fourth normal form (4NF) is introduced
as the solution to eliminate data redundancies caused by MVDs. This is followed, in
Section 9.3, by a comprehensive example describing the occurrence and resolution of
4NF violation in a relation schema. The generality of 4NF is discussed in Section 9.4 by
showing how 4NF subsumes all the previously discussed normal forms. The topic of
Section 9.5 is the concept of join-dependency and the associated normal form called
Project/Join normal form (PJNF) or fifth normal form (5NF). A brief note on Domain/Key
normal form (DKNF) in Section 9.6 concludes this chapter.
If we add only one or two of the above tuples, the semantics of the relation will be
altered. For example, if we add only the first tuple from the set above to MUSIC_VEHICLE,
it will mean that Kamath learns country music only when owning a jeep. In other words,
the fact that Music and Vehicle are independent attributes will be compromised in
MUSIC_VEHICLE. Therefore, an insertion anomaly is present. Likewise, if Kamath does
not own a jeep anymore, the following two tuples will have to be deleted in order to keep
the relation instance consistent with the implied semantics:
Given R (A, B, C, D), where A, B, C, and D are arbitrary subsets, atomic or composite,
of the set of attributes of a relation schema, R, the following inference rules apply to
MVDs:
• Reflexivity rule: If B ⊆ A, then A ↠ B
• Complementation rule: If A ↠ B, then A ↠ [R − (A ∪ B)]
• Augmentation rule: If A ↠ B, and C ⊆ D, then (A, D) ↠ (B, C)
• Transitivity rule: If A ↠ B, and B ↠ C, then A ↠ (C − B)
A few other inference rules can be derived from these four (e.g., union rule, decomposition
rule, pseudotransitivity rule). In addition, there is an inference rule that essentially
bridges an FD and an MVD¹:
• Replication rule: If A → B, then A ↠ B
The replication rule indicates that an MVD is a generalization of an FD in the sense
that every FD is an MVD. Note that the converse is not true. More precisely, an FD is an
MVD in which the set of dependent values matching a specific determinant value is always
a singleton set.
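The relationship between an FD and an MVD can be checked on a relation instance. The following Python sketch is illustrative only: the relation R (A, B, C), its sample tuples, and the helper names `groups`, `fd_holds`, and `mvd_holds` are assumptions made for this example and do not come from the chapter. It treats the MVD X ↠ Y as holding in an instance when, for every X-value, the (Y, Z) combinations present are exactly the cross product of the Y-values and Z-values seen with that X-value, and it treats the FD X → Y as the case in which each such set of Y-values is a singleton.

```python
from collections import defaultdict
from itertools import product

def groups(rows, x, attrs):
    """Map each X-value to the set of `attrs`-values occurring with it."""
    out = defaultdict(set)
    for row in rows:
        out[tuple(row[a] for a in x)].add(tuple(row[a] for a in attrs))
    return out

def fd_holds(rows, x, y):
    """X -> Y: every X-value is matched by exactly one Y-value."""
    return all(len(ys) == 1 for ys in groups(rows, x, y).values())

def mvd_holds(rows, x, y):
    """X ->> Y: per X-value, the (Y, Z) pairs form a full cross product."""
    z = [a for a in rows[0] if a not in x and a not in y]
    y_sets, z_sets = groups(rows, x, y), groups(rows, x, z)
    yz_sets = groups(rows, x, y + z)
    return all(
        yz_sets[k] == {ys + zs for ys, zs in product(y_sets[k], z_sets[k])}
        for k in yz_sets
    )

def make(tuples):
    """Turn plain tuples into rows of the hypothetical schema R (A, B, C)."""
    return [dict(zip(("A", "B", "C"), t)) for t in tuples]

# Hypothetical instances of R (A, B, C)
fd_case = make([("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")])
mvd_only = make([("a1", "b1", "c1"), ("a1", "b1", "c2"),
                 ("a1", "b2", "c1"), ("a1", "b2", "c2")])
neither = make([("a1", "b1", "c1"), ("a1", "b2", "c2")])

for name, rows in [("fd_case", fd_case), ("mvd_only", mvd_only), ("neither", neither)]:
    print(name, fd_holds(rows, ["A"], ["B"]), mvd_holds(rows, ["A"], ["B"]))
```

In `fd_case` the FD A → B holds and so does the MVD A ↠ B, as the replication rule predicts; `mvd_only` satisfies the MVD without satisfying the FD, which illustrates that the converse does not hold.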
From the above set of rules for FDs and MVDs, it is possible to infer the complete set
of FDs in F+ and the MVDs that hold in any relation state r of R that satisfies V (the set of
specified MVDs). The closure of V is referred to as V+.
Another useful rule, derived by Beeri, Fagin, and Howard (1977), is²:
If A ↠ B, and (A, B) → C, then A → (C − B)
The property of symmetry in an MVD emerges directly from the complementation
rule. Accordingly, if an MVD X ↠ Y holds in the relation schema R (X, Y, Z), then so does
the MVD X ↠ Z. This is often represented as X ↠ Y | Z. The MVD X ↠ Y in the relation
schema R (X, Y, Z) is a trivial MVD if either (i) Y ⊆ X or (ii) (X ∪ Y) = R because in either
case the MVD does not convey any additional constraint (meaning). Trivial MVDs are
usually removed from V+ without any consequence. An MVD that satisfies neither (i) nor
(ii) is a non-trivial MVD and requires attention.
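The complementation rule and the two triviality conditions reduce to simple set operations on attribute names. The short Python sketch below is only an illustration; the function names `complement` and `is_trivial_mvd` are hypothetical, and the schema used is MUSIC_VEHICLE (Name, Music, Vehicle) from the motivating example.

```python
def complement(r, x, y):
    """Complementation rule: from X ->> Y infer X ->> R - (X ∪ Y)."""
    return set(r) - (set(x) | set(y))

def is_trivial_mvd(r, x, y):
    """Trivial when (i) Y is a subset of X or (ii) X ∪ Y covers all of R."""
    x, y = set(x), set(y)
    return y <= x or (x | y) == set(r)

R = {"Name", "Music", "Vehicle"}
print(complement(R, {"Name"}, {"Music"}))                 # {'Vehicle'}
print(is_trivial_mvd(R, {"Name"}, {"Music"}))             # False
print(is_trivial_mvd(R, {"Name"}, {"Music", "Vehicle"}))  # True, condition (ii)
print(is_trivial_mvd(R, {"Name", "Music"}, {"Music"}))    # True, condition (i)
```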
¹ There is a second inference rule called the coalescence rule linking FD to MVD, which is a little less
intuitive: if A ↠ B, and (i) B and D are disjoint, (ii) D → C, and (iii) C ⊆ B, then A → C.
² This rule helps us understand 2NF violation as a special case of 4NF violation.
DEFINITION
4NF defined: A relation schema R is in 4NF if there are no non-trivial multi-valued dependencies in R,
or the determinant of any non-trivial multi-valued dependency in R is a superkey of R.³
For instance, let us evaluate the relation schema from Section 9.1.1 for 4NF:
R: MUSIC_VEHICLE (Name, Music, Vehicle)
As seen in Section 9.1.2, V+ contains the MVDs Name ↠ Music | Vehicle. Therefore,
MUSIC_VEHICLE violates 4NF. The resolution of 4NF violation is accomplished by the
decomposition strategy:
• Replace the target relation schema (R) by the projections (R1 and R2) that
contain the determinant and dependent present in each of the two MVDs.
Accordingly, we have the decomposition:
Solution 1
D {R1, R2}
where:
R1: N_MUSIC (Name, Music); R2: N_VEHICLE (Name, Vehicle)
D {R1, R2} is in 4NF because both R1 and R2 are free of any non-trivial MVD.
Other decompositions are also possible:
Solution 2
D {R1a, R2a}
where:
R1a: N1_MUSIC (Name, Music); R2a: N2_VEHICLE (Music, Vehicle)
Does this decomposition resolve the 4NF violation in MUSIC_VEHICLE? Yes, because
both R1a and R2a are free of any non-trivial MVD. The difference between Solutions 1
and 2 is that Solution 1 is a lossless-join decomposition while Solution 2 is a lossy-join
decomposition, as shown in Figure 9.1.
How do we detect a lossy-join or lossless-join decomposition in a 4NF resolution? With reference
to MVDs that hold on a relation schema R, a decomposition D: {R1, R2} is a lossless-join
decomposition iff V+ contains:
• either the MVD (R1 ∩ R2) ↠ (R1 − R2)
• or the MVD (R1 ∩ R2) ↠ (R2 − R1)
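The contrast between Solution 1 and Solution 2 can be reproduced by projecting a relation instance and joining the projections back together. The Python sketch below is a minimal illustration under stated assumptions: the sample tuples are hypothetical (they are not the contents of Figure 9.1), and the helpers `relation`, `project`, and `natural_join` are names invented for this example.

```python
def relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rel, attrs):
    """Projection onto `attrs`; duplicates vanish because relations are sets."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def natural_join(r1, r2):
    """Natural join: merge tuple pairs that agree on all shared attributes."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

# Hypothetical instance of MUSIC_VEHICLE in which Name ->> Music | Vehicle holds
music_vehicle = relation(
    ("Name", "Music", "Vehicle"),
    [("Kamath", "Country", "Jeep"), ("Kamath", "Country", "Sedan"),
     ("Kamath", "Jazz", "Jeep"), ("Kamath", "Jazz", "Sedan"),
     ("Lee", "Jazz", "Truck")],
)

# Solution 1: the common attribute is Name, the determinant of the MVD
solution1 = natural_join(project(music_vehicle, {"Name", "Music"}),
                         project(music_vehicle, {"Name", "Vehicle"}))
# Solution 2: the common attribute is Music, which is not a determinant here
solution2 = natural_join(project(music_vehicle, {"Name", "Music"}),
                         project(music_vehicle, {"Music", "Vehicle"}))

print(solution1 == music_vehicle)      # the join restores the instance exactly
spurious = solution2 - music_vehicle   # tuples the instance never contained
print(sorted(sorted(t) for t in spurious))
```

In Solution 1 the shared attribute (Name) is the determinant of the MVD, so the test above is met and the join is lossless. In Solution 2 the shared attribute is Music, neither MVD required by the test holds, and the join produces spurious tuples.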
³ This is an informal definition of 4NF. More formally, a relation schema R is in 4NF if for every
non-trivial MVD X ↠ Z in V+ in R, X is a superkey of R. Equivalently, it can be stated that R is in 4NF if
it is in BCNF and the dependents in all non-trivial MVDs in R are singleton sets; that is, the
non-trivial MVDs are also FDs whose determinants are candidate keys of R.
MUSIC_SKILL. The next question is: Are there any other MVDs in MUSIC_SKILL?
The only other possible MVD pairs in MUSIC_SKILL are Music ↠ Skill | Name and
Skill ↠ Name | Music.
mvd1: Name ↠ Dependent
By the rule of complementation, we can infer that mvd2: Name ↠ {Music, Skill}.⁴
MUSICIAN is not in 1NF because of the presence of multi-valued attributes; for this
reason, MUSICIAN is not even a relation schema. In order to transform MUSICIAN to a
1NF relation schema, we specify (Name, Music, Skill, Dependent) as the primary key of
MUSICIAN. Thus we have a 1NF relation schema:
MUSICIAN (Name, Age, Ph#, Band, Rate, Music, Skill, Dependent)
Eliminating normal form violations due to FDs yields the decomposition:
R1: PERSON (Name, Age, Ph#, Band)
R2: BAND (Band, Rate)
R3: NMSD (Name, Music, Skill, Dependent)
R1 and R2 are in BCNF (in fact, in 4NF), and R3 violates 4NF because mvd1 | mvd2
persists in R3. Solving R3 for 4NF violation yields:
R3a: FAMILY (Name, Dependent)
R3b: NMS (Name, Music, Skill)
The Presentation Layer ERD reverse engineered from this solution enabling a deeper
insight into the semantics of the scenario is shown in Figure 9.3.
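As a rough illustration of how the final decomposition (PERSON, BAND, FAMILY, NMS) might be declared, here is a minimal sketch using Python's built-in `sqlite3` module. Several details are assumptions rather than part of the text: the column types, the column name `Ph_no` standing in for Ph#, every REFERENCES clause, and all sample data. The primary keys follow the decomposition above, with FAMILY and NMS treated as all-key relations.

```python
import sqlite3

# Column types, the name Ph_no (for Ph#), and every REFERENCES clause are assumptions.
DDL = """
CREATE TABLE BAND   (Band TEXT PRIMARY KEY, Rate REAL);
CREATE TABLE PERSON (Name TEXT PRIMARY KEY, Age INTEGER, Ph_no TEXT,
                     Band TEXT REFERENCES BAND (Band));
CREATE TABLE FAMILY (Name TEXT REFERENCES PERSON (Name), Dependent TEXT,
                     PRIMARY KEY (Name, Dependent));
CREATE TABLE NMS    (Name TEXT REFERENCES PERSON (Name), Music TEXT, Skill TEXT,
                     PRIMARY KEY (Name, Music, Skill));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(DDL)

# Hypothetical sample data
conn.execute("INSERT INTO BAND VALUES ('B1', 50.0)")
conn.execute("INSERT INTO PERSON VALUES ('Kamath', 30, '555-0100', 'B1')")
conn.executemany("INSERT INTO FAMILY VALUES (?, ?)",
                 [("Kamath", "Asha"), ("Kamath", "Ravi")])
conn.executemany("INSERT INTO NMS VALUES (?, ?, ?)",
                 [("Kamath", "Country", "Vocals"), ("Kamath", "Jazz", "Guitar")])

# Joining the two all-key relations rebuilds the NMSD tuples on demand
nmsd = conn.execute(
    "SELECT n.Name, n.Music, n.Skill, f.Dependent "
    "FROM NMS n JOIN FAMILY f ON f.Name = n.Name"
).fetchall()
print(len(nmsd))  # one row per (dependent, (Music, Skill)) combination
```

Each dependent and each (Music, Skill) pair is stored once; the cross product that made NMSD redundant appears only when the join is computed.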
⁴ It is incorrect to conclude that Name ↠ Music and Name ↠ Skill.
FIGURE 9.3 Reverse engineered 4NF design scenario 1
Note: Participation constraints arbitrarily assumed
Suppose we add a few more attributes to the 1NF relation schema MUSICIAN, where
the following FDs hold:
R1: PERSON (Name, Age, Ph#, Band); R2: BAND (Band, Rate);
The Presentation Layer ERD reverse engineered from this revised solution appears in
Figure 9.4.
Comparing the two solutions along with the ERDs in Figures 9.3 and 9.4 provides
interesting insights as to how identical relation schemas in the two solutions (R3a and
R3b) depict somewhat different scenarios.
that the dependent value is restricted to a singleton set—that is, for a given value of A, B
has only one value.
To get a better grip on the concept, let us review the 2NF violation from this perspective
using the example studied in Chapter 8 (see Sections 8.1.1 and 8.1.2). The non-1NF
schema ALBUM (Album_no, {Artist_nm}, Price, Stock) with a multi-valued attribute,
Artist_nm, and an FD, Album_no → (Price, Stock), is first normalized to 1NF as:
NEW_ALBUM (Album_no, Artist_nm, Price, Stock)
Since NEW_ALBUM is in 1NF, it is a relation schema and the tenets of the relational
theory can be applied to it. Does Album_no ↠ Artist_nm hold in NEW_ALBUM? This can
be determined by comparing the natural join (ALBUM_ARTIST * ALBUM_INFO) with
NEW_ALBUM, where:
D: ALBUM_INFO (Album_no, Price, Stock); and ALBUM_ARTIST (Album_no, Artist_nm)
represents a binary decomposition of NEW_ALBUM.
Figure 8.2 contains the relation instances pertaining to this example. By computing the
natural join, (ALBUM_ARTIST * ALBUM_INFO), it can be seen that the natural join indeed
strictly yields the relation NEW_ALBUM. Therefore, it is concluded that Album_no ↠
Artist_nm. Then, by the rule of complementation, we have Album_no ↠ (Price, Stock). In
other words, for each value of Album_no in NEW_ALBUM, the same set of (Price, Stock)
occurs for each value of Artist_nm. However, (Price, Stock) is a singleton set, so the
MVD, Album_no ↠ (Price, Stock), is equivalent to the FD,
Album_no → (Price, Stock), and that is why the primary key for ALBUM_INFO is Album_no
instead of (Album_no, Price, Stock). Thus, it is seen that a 2NF violation is a special
case of 4NF violation; that is, 4NF subsumes 2NF. The fact that the binary decomposition
of NEW_ALBUM yields a lossless-join decomposition signals the presence of MVD in
NEW_ALBUM, thus ratifying Assertion 2 above. The inference rule derived by Beeri,
Fagin, and Howard (see Section 9.1.3) essentially conveys the same idea.
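The argument above can be traced on a small instance. The following Python sketch uses hypothetical album data (the instances of Figure 8.2 are not reproduced here) and simply compares the natural join of the two projections with NEW_ALBUM, then checks that each Album_no is matched by a single (Price, Stock) combination.

```python
# Hypothetical instance of NEW_ALBUM (Album_no, Artist_nm, Price, Stock)
new_album = {
    ("A100", "Ray", 12, 40),
    ("A100", "Mia", 12, 40),
    ("A200", "Ray", 15, 5),
}

# The binary decomposition D, obtained by projection
album_info = {(no, price, stock) for (no, _, price, stock) in new_album}
album_artist = {(no, artist) for (no, artist, _, _) in new_album}

# Natural join ALBUM_ARTIST * ALBUM_INFO on the common attribute Album_no
rejoined = {
    (no, artist, price, stock)
    for (no, artist) in album_artist
    for (no2, price, stock) in album_info
    if no == no2
}
print(rejoined == new_album)  # a lossless join signals Album_no ->> Artist_nm

# Set of (Price, Stock) combinations seen with each Album_no
price_stock = {}
for no, price, stock in album_info:
    price_stock.setdefault(no, set()).add((price, stock))

# Singleton sets mean the MVD Album_no ->> (Price, Stock) is also an FD
is_fd = all(len(values) == 1 for values in price_stock.values())
print(is_fd)
```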
In a similar fashion, it can be shown that a 3NF violation and BCNF violation are also
special cases of 4NF violation. The mere existence of a lossless-join decomposition of the
target relation schema that violates 3NF or BCNF in itself signals the presence of an MVD
in the relation schema, and that is how a 3NF and a BCNF violation can be seen as a 4NF
violation. To gain deeper insight, the reader is encouraged to explore this further using
the examples from Chapter 8 (see Sections 8.1.3 and 8.1.4). Identifying the MVDs that
prevail in these cases is an interesting exercise to pursue. In short, a relation schema in
4NF also is in 3NF and BCNF.
Here is a summary of the salient properties of multi-valued dependencies and 4NF:
• A multi-valued attribute (MVA) can cause multi-valued dependencies (MVD)
in a relation schema R.
• An MVA is necessary to cause an MVD in a relation schema R.
• When MVAs cause MVDs, modification anomalies will be present in R if the
determinant of any MVD present in R is not a super (candidate) key of R,
even though there are no undesirable functional dependencies (FDs) in R.
• When MVAs don’t cause MVDs in R, R is in 4NF.
• Independent MVAs in a relation schema cause MVDs and hence immediate
violation of 4NF.
• MVAs in a relation schema dependent on each other do not cause MVDs and
hence no immediate violation of 4NF.
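The redundancy caused by independent MVAs can be made concrete with a small count. The Python sketch below is illustrative only: the function names and the sample values are hypothetical. It flattens two independent multi-valued attributes of one person into a single 1NF relation, where every combination must be stored, and compares that with one binary relation per attribute.

```python
from itertools import product

def flatten_independent(name, musics, vehicles):
    """1NF tuples when two independent MVAs are kept in one relation."""
    return {(name, m, v) for m, v in product(musics, vehicles)}

def decompose(name, musics, vehicles):
    """The 4NF alternative: one binary relation per multi-valued attribute."""
    return {(name, m) for m in musics}, {(name, v) for v in vehicles}

musics = ["Country", "Jazz", "Blues"]   # hypothetical values
vehicles = ["Jeep", "Sedan"]

single = flatten_independent("Kamath", musics, vehicles)
n_music, n_vehicle = decompose("Kamath", musics, vehicles)
print(len(single), len(n_music) + len(n_vehicle))   # 6 tuples versus 5

# One more kind of music costs one tuple per vehicle in the single relation
grown = flatten_independent("Kamath", musics + ["Folk"], vehicles)
print(len(grown) - len(single))                      # 2
```

The single relation grows multiplicatively while the decomposed design grows additively, which is the source of the modification anomalies described above.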
DEFINITION
Join dependency (JD) defined: JD (R1, R2, R3, … Rn) specified over a relation schema, R, states
that every legal state r of R has a lossless-join decomposition into R1, R2, R3, … Rn. In other words,
given (R1, R2, R3, … Rn) as projections of R, if the natural join of (R1, R2, R3, … Rn) produces every
legal state r of R, then JD (R1, R2, R3, … Rn) exists in R. A JD (R1, R2, R3, … Rn) is trivial if one of
the projections, Ri, is equal to R because this condition guarantees lossless-join property for any state
r of R and essentially does not specify any constraint at all on R. Observe that given (R1, R2) as projections
of R, the MVDs (R1 ∩ R2) ↠ (R1 − R2) | (R2 − R1) are equivalent to JD (R1, R2). Thus, MVDs
are essentially binary join dependencies possessing a number of algebraic properties similar to those
of FDs. Since an FD can be seen as a special case of MVD (i.e., an MVD subsumes an FD), an FD is
also subsumed by a JD. Therefore, it can be said that JD is the most general form of dependency
constraint that deals with decomposition via projections and re-compositions via natural joins.
Fifth normal form (5NF) is about JDs. In simple terms, the presence of JDs in a rela-
tion schema R violates 5NF—that is, if a relation state r of R can be strictly reconstructed
from the natural join of all of its projections, (R1, R2, R3, … , Rn), then a join-dependency
is present in R (i.e., a constraint specifying JD has been imposed on R) and R is not in 5NF.
In order to establish 5NF, R should be replaced by its decomposition (R1, R2, R3, … Rn).
The set of relation schemas (R1, R2, R3, … Rn) in this case is in 5NF.
DEFINITION
5NF defined: A relation schema R is in 5NF if there are no non-trivial join dependencies in R.⁵ A relation
schema that cannot be reconstructed by a natural join of all its projections does not have a JD
imposed on it and so is already in 5NF and should not be decomposed to achieve 5NF.
Let us now review an illustration that clarifies the concept. SCHEDULE_X at the top
of Figure 9.5 is the relation instance that remains after all the normal form violations due
to functional dependencies have been resolved and decomposed from the original relation
schema; it is represented as:
R: SCHEDULE_X (Prof_name, Course#, Quarter)
⁵ A technically more precise definition is: R is in 5NF if, for every non-trivial join-dependency, JD
(R1, R2, ... Rn), every Ri is a superkey of R (Elmasri and Navathe, 2010). A comprehensive (highly
technical) discussion of 4NF and 5NF can be found in Johnson (1997).
There are no MVDs specified on SCHEDULE_X. In fact, we are able to verify this from
the relation instance in Figure 9.5. The three projections whose pairwise joins can suggest
the presence of non-trivial MVDs in SCHEDULE_X are:
R1x: TAUGHT_X (Prof_name, Course#);
R2x: TAUGHT_DURING_X (Prof_name, Quarter)
R3x: COURSE_OFFERING_X (Course#, Quarter)
If a natural join of any binary decomposition of R—namely, (R1x * R2x) or
(R1x * R3x) or (R2x * R3x)—is sufficient to strictly reconstruct R (non-loss composition),
then the presence of MVD is evidenced. When verified with the relation instances in
Figure 9.5, the reader will find that none of these natural joins strictly yields R. Then,
there are no MVDs in R implying that R (i.e., SCHEDULE_X) is in 4NF. Is SCHEDULE_X
in 5NF? In order to check this, we need to know if JD (R1x, R2x, R3x) exists. If the
natural join of {R1x, R2x, R3x} strictly results in R, then we can conclude that JD (R1x,
R2x, R3x) persists. Then, R (i.e., SCHEDULE_X) violates 5NF. When verified using the
relation instances in Figure 9.5, the reader will find that a natural join of {R1x, R2x, R3x}
indeed yields strictly R, meaning that JD (R1x, R2x, R3x) is present. Therefore,
SCHEDULE_X violates 5NF. In order to restore the design to 5NF, SCHEDULE_X should
be replaced by the set of relation schemas (TAUGHT_X, TAUGHT_DURING_X,
COURSE_OFFERING_X).
Let us now review the relation instance SCHEDULE_Y in Figure 9.6. The relation
schema that represents this relation instance is:
R: SCHEDULE_Y (Prof_name, Course#, Quarter)
SCHEDULE_Y is in 4NF. The verification of this claim is left as an exercise to the
reader. Is SCHEDULE_Y in 5NF? In order to check this, we need to know if JD (R1y, R2y,
R3y) exists, where:
R1y: TAUGHT_Y (Prof_name, Course#);
R2y: TAUGHT_DURING_Y (Prof_name, Quarter)
R3y: COURSE_OFFERING_Y (Course#, Quarter)
Once again, if the natural join of {R1y, R2y, R3y} strictly results in R, then we can
conclude that JD (R1y, R2y, R3y) persists in R. Then, R (i.e., SCHEDULE_Y) violates
5NF. When verified using the relation instances in Figure 9.6, we find that a natural
join of {R1y, R2y, R3y} does not strictly yield R, meaning that JD (R1y, R2y, R3y) is
not present in R. Therefore, SCHEDULE_Y does not violate 5NF—that is, SCHEDULE_Y
is in 5NF. Replacing SCHEDULE_Y by the set of relation schemas (TAUGHT_Y,
TAUGHT_DURING_Y, COURSE_OFFERING_Y) will change the intended semantics
of the design and amounts to an erroneous decomposition. Table 9.2 is provided as an
aid to a better understanding of 5NF through a comparative review of SCHEDULE_X
(Figure 9.5) and SCHEDULE_Y (Figure 9.6).
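The two checks described above (pairwise joins for MVDs, the three-way join for the JD) can be run mechanically. The Python sketch below is a hedged illustration: the two instances are hypothetical stand-ins with the same behavior as SCHEDULE_X and SCHEDULE_Y (the tuples of Figures 9.5 and 9.6 are not reproduced here), and `rejoin` is a helper invented for this example. It relies on the fact that a tuple belongs to the natural join of a set of projections exactly when each of its own projections appears in the corresponding projection of the relation.

```python
from itertools import product

def rejoin(rel, components):
    """Natural join of the projections of `rel` onto each component.

    Components are tuples of attribute positions and must together
    cover every attribute of the relation.
    """
    projections = [{tuple(t[i] for i in comp) for t in rel} for comp in components]
    arity = len(next(iter(rel)))
    domains = [{t[i] for t in rel} for i in range(arity)]
    return {
        cand for cand in product(*domains)
        if all(tuple(cand[i] for i in comp) in proj
               for comp, proj in zip(components, projections))
    }

# Attribute positions: 0 = Prof_name, 1 = Course#, 2 = Quarter
PC, PQ, CQ = (0, 1), (0, 2), (1, 2)
BINARY = [(PC, PQ), (PC, CQ), (PQ, CQ)]

# Hypothetical instances (placeholder values, not the book's data)
schedule_x_like = {("P1", "C1", "Q2"), ("P1", "C2", "Q1"),
                   ("P2", "C1", "Q1"), ("P1", "C1", "Q1")}
schedule_y_like = schedule_x_like - {("P1", "C1", "Q1")}

for name, rel in [("X-like", schedule_x_like), ("Y-like", schedule_y_like)]:
    binary_lossless = [rejoin(rel, pair) == rel for pair in BINARY]
    ternary_lossless = rejoin(rel, (PC, PQ, CQ)) == rel
    print(name, binary_lossless, ternary_lossless)
```

For the X-like instance no pairwise join restores the relation but the three-way join does, so the instance is consistent with the JD and would call for decomposition. For the Y-like instance even the three-way join adds a spurious tuple, so decomposing it would change its meaning.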
TABLE 9.2 A comparative analysis of the presence and absence of 5NF violation
The relation instance MUSIC_SKILL in Figure 9.2 is in 4NF. Is it also in 5NF? If not,
what can be done to this relation instance to establish 5NF in MUSIC_SKILL? On the
other hand, if MUSIC_SKILL is in 5NF, how can the relation instance be altered to depict
a 4NF relation that violates 5NF? The reader should find this an interesting exercise.
A summary of the salient properties of join-dependencies (JD) and Project-Join Normal
Form (PJNF, also termed 5NF) is presented here, along with a clarification of how JD
subsumes MVD, and therefore FD, and thus represents the general form of all dependencies
predicated upon the project and join relational algebra operations:
• A JD in a relation schema R pertains to conditions where the natural join of
any proper subset of its projections results in the strict reconstruction of R.
• Therefore, a relation schema R that cannot be reconstructed by a natural
join of any proper subset of its projections does not have a JD.
• A JD is trivial if the set of projections includes R.
• A JD (R1, R2, R3, … Rn) specified over a relation schema R states that every
legal state r of R has a lossless-join decomposition into R1, R2, R3, … Rn.
• Given (R1, R2, R3, … Rn) as a decomposition of R, if the natural join of (R1,
R2, R3, … Rn) produces every legal state r of R, then JD (R1, R2, R3, … Rn)
exists in R.
• A relation schema R is in 5NF if there are no non-trivial join-dependencies in R.
• Alternatively, R is in 5NF if for every non-trivial join-dependency, JD (R1,
R2, …, Rn), every Ri is a superkey of R.
• Presence of non-trivial join-dependency in a relation schema R violates 5NF
in R except when the projections are superkeys of R.
• 5NF Solution: If R has a non-trivial join-dependency, replace R with the
appropriate proper subset of projections [R1, R2, R3, …, Rn].
• An MVD is a special case of JD where the decomposition is binary—that is,
JD (R1, R2) in R is equivalent to MVD, (R1 ∩ R2) ↠ (R1 − R2) | (R2 − R1).
• MVDs are essentially binary join-dependencies possessing a number of alge-
braic properties similar to those of FDs.
• Since an FD can be seen as a special case of MVD (i.e., an MVD subsumes an
FD), an FD is also subsumed by a JD.
• JD is the most general form of dependency constraint that deals with decom-
position through projections and re-compositions using natural joins.
While FDs portray binary relationships among a set of entity types, MVDs and JDs
pertain to ternary and n-ary relationship types, respectively. The salient characteristics of
the relationship type intrinsic to the 4NF and 5NF relations are presented here:
• Erroneous decomposition of a relation schema R that is in 4NF renders a
genuinely ternary relationship into two incorrect binary relationships.
• When a binary relationship between R1 & R2 and R1 & R3 is erroneously
expressed as a ternary relationship among R1, R2, and R3, 4NF is violated.
• A ternary relationship that can be reconstructed from any two of its binary
projections should not be modeled as a ternary relationship.
OFFERS
Prof_name Course# Quarter
Verstrate IS812 Spring
Verstrate IS832 Winter
Verstrate IS330 Fall
Verstrate IS330 Spring
Surendra IS812 Fall
Surendra IS430 Fall
Surendra IS430 Spring
Kim IS821 Winter
Kim IS430 Spring
Kim IS430 Summer
CAN_TEACH
Prof_name Course#
Verstrate IS812
Verstrate IS832
Verstrate IS330
Surendra IS812
Surendra IS430
Kim IS812
Kim IS821
Kim IS430
Seligman IS430
AVAILABLE_TO_TEACH
Prof_name Quarter
Verstrate Fall
Verstrate Winter
Verstrate Spring
Surendra Fall
Surendra Spring
Barron Fall
Barron Winter
Kim Winter
Kim Spring
Kim Summer
COURSE_OFFERING
Course# Quarter
IS330 Fall
IS330 Spring
IS330 Summer
IS340 Winter
IS430 Fall
IS430 Spring
IS430 Summer
IS812 Fall
IS812 Spring
IS821 Winter
IS832 Winter
Figures 9.8a, 9.8b, and 9.8c, which present relational schema and ERDs reverse
engineered from them for the relations SCHEDULE_X, SCHEDULE_Y, and OFFERS,
respectively, ought to complete the picture for the reader.
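One way to read the four relation instances above is that the ternary relation OFFERS coexists with the three binary relations, each OFFERS tuple being backed by a matching tuple in CAN_TEACH, AVAILABLE_TO_TEACH, and COURSE_OFFERING. That reading is an assumption made for this sketch, since the accompanying explanation is not part of this excerpt. The Python snippet below checks it against the data listed above.

```python
offers = {
    ("Verstrate", "IS812", "Spring"), ("Verstrate", "IS832", "Winter"),
    ("Verstrate", "IS330", "Fall"), ("Verstrate", "IS330", "Spring"),
    ("Surendra", "IS812", "Fall"), ("Surendra", "IS430", "Fall"),
    ("Surendra", "IS430", "Spring"), ("Kim", "IS821", "Winter"),
    ("Kim", "IS430", "Spring"), ("Kim", "IS430", "Summer"),
}
can_teach = {
    ("Verstrate", "IS812"), ("Verstrate", "IS832"), ("Verstrate", "IS330"),
    ("Surendra", "IS812"), ("Surendra", "IS430"), ("Kim", "IS812"),
    ("Kim", "IS821"), ("Kim", "IS430"), ("Seligman", "IS430"),
}
available_to_teach = {
    ("Verstrate", "Fall"), ("Verstrate", "Winter"), ("Verstrate", "Spring"),
    ("Surendra", "Fall"), ("Surendra", "Spring"), ("Barron", "Fall"),
    ("Barron", "Winter"), ("Kim", "Winter"), ("Kim", "Spring"), ("Kim", "Summer"),
}
course_offering = {
    ("IS330", "Fall"), ("IS330", "Spring"), ("IS330", "Summer"),
    ("IS340", "Winter"), ("IS430", "Fall"), ("IS430", "Spring"),
    ("IS430", "Summer"), ("IS812", "Fall"), ("IS812", "Spring"),
    ("IS821", "Winter"), ("IS832", "Winter"),
}

# Natural join of the three binary relations: every feasible combination
feasible = {
    (prof, course, quarter)
    for (prof, course) in can_teach
    for (prof2, quarter) in available_to_teach
    if prof2 == prof and (course, quarter) in course_offering
}

print(offers <= feasible)               # every offering is backed by the binary facts
print(sorted(feasible - offers)[:3])    # feasible combinations that were not offered
```

Because the join of the binary relations contains combinations that OFFERS does not, OFFERS cannot be derived from them and carries information of its own.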
R23: COURSE_OFFERING_X(O_course#,O_quarter)
R13: TAUGHT_DURING_X(D_quarter,D_prof_name)
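[ERD residue: entity type QUARTER (attribute Quarter); relationship types Taught_during_x and Course_offering_x, each with structural constraints (0,m) and (0,n); attributes Prof_name and Course#]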
[ERD residue: relationship type Schedule_y among the entity types QUARTER (Quarter), PROFESSOR (Prof_name), and COURSE (Course#); structural constraints (0,m), (0,p), and (0,n)]
[ERD residue: entity types QUARTER (Quarter), PROFESSOR, and COURSE; relationship type Can_teach; structural constraints (0,m) and (0,n)]
FIGURE 9.8c Relational schema and ERD to model co-existing ternary and binary relationships
[ERD residue: entity type STUDENT (Stu_num, Name, Status) and relationship type Takes; attribute Course#; structural constraints (0,m) and (1,p)]
These semantics are conveyed by two m:n relationships (one between COURSE and
INSTRUCTOR, the other between COURSE and STUDENT) along with the appropriate
participation constraints.
Strictly speaking, before this ERD is mapped to the logical tier, the m:n relationship
types must be decomposed to gerund entity types. Here we bypass that intermediate step
and map the ERD to the logical tier directly. The relational schema therefore contains one
relation schema for each of the three entity types, INSTRUCTOR, COURSE, and
STUDENT. The two relationship types that capture m:n relationships, Offers and Takes,
resolve to gerund entity types in the Design-Specific ER diagram and appear as the
relation schemas OFFERING and TAKEN, bridging the relation schemas COURSE and
INSTRUCTOR and the relation schemas COURSE and STUDENT, respectively. The
relational schema mapped from the ERD in Figure 9.9a is displayed in Figure 9.9b as
D [R1, R2, R3, R4, R5].
Fc [fd1, fd2, fd3, fd4, fd5, fd6] derived from D [R1, R2, R3, R4, R5] where
Now that we have a URS and a set of FDs prevailing over it, we should be able to carry
out the normalization process, reverse engineer the resulting BCNF design to the
conceptual tier (an ER model), and verify whether we arrive at the source ERD we
started with. The URS and the Fc prevailing over it are reproduced here from Figure 9.9c:
URS (Ins_name, Ins_rank, Ins_qualification, O_ins_name, O_co_course#,
Semester, Co_course#, Co_cname, Co_credits, T_co_course#,
T_stu_num, Grade, Stu_num, Stu_name, Stu_status)
and
Fc [fd1, fd2, fd3, fd4, fd5, fd6] prevailing over URS, where
We note that the only candidate key (and therefore the primary key) of URS is
(Ins_name, Co_course#, Stu_num). Next, using the fast-track algorithm, we arrive at the
relation schemas portrayed in Figure 9.10a. Observe that the relation schema R0 arises
from the algorithm's requirement that, when no candidate key of URS is contained in
R1, R2, or R3, a relation schema containing a candidate key of URS be created.
[ERD residue: entity types STUDENT (Stu_num, Stu_name, Stu_status), INSTRUCTOR, and COURSE (Credits); relationship type Tutors; structural constraints (0,m), (0,p), and (0,n)]
FIGURE 9.10b ERD reverse engineered from the relational schema displayed in Figure 9.10a
While multiple solutions are certainly acceptable, the semantics captured in these two
solutions are drastically different. In fact, for the business rules specified at the beginning
of this section, the second solution, depicting a ternary relationship type, is clearly
incorrect. Let us now examine the two relational schemas shown in Figures 9.9b and
9.10a. From a technical perspective, the two designs can be reconciled only when R0: ICS
(L_ins_name, L_co_course#, L_stu_num) in Figure 9.10a can be decomposed to the two
relation schemas, RX: OFFERING (L_ins_name, L_co_course#) and RY: TAKING
(L_co_course#, L_stu_num). A close scrutiny of R0, RX, and RY reveals that this precise
decomposition is possible when the MVD of the form L_co_course# →→ L_ins_name |
L_stu_num is imposed on URS.
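The condition can be made concrete with a minimal Python sketch. It tests, for a single relation state of ICS, whether the natural join of the two projections reproduces that state, which is what the MVD requires of every state. The function names and the sample instructor, course, and student values are hypothetical and do not come from the text.

```python
def project(rows, positions):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(row[i] for i in positions) for row in rows}


def mvd_holds(ics):
    """Check course ->-> instructor | student for one state of ICS.

    Each tuple is (ins_name, co_course, stu_num). The MVD holds in this
    state exactly when joining the OFFERING and TAKING projections on
    the course attribute gives back the original tuples.
    """
    offering = project(ics, (0, 1))  # (ins_name, co_course)
    taking = project(ics, (1, 2))    # (co_course, stu_num)
    rejoined = {
        (ins, course, stu)
        for (ins, course) in offering
        for (course2, stu) in taking
        if course == course2
    }
    return rejoined == set(ics)


# Hypothetical state in which instructors and students of C1 are independent.
independent = {
    ("Ann", "C1", "s1"), ("Ann", "C1", "s2"),
    ("Bob", "C1", "s1"), ("Bob", "C1", "s2"),
}
# Hypothetical state in which each student is tied to a specific instructor.
tied = {("Ann", "C1", "s1"), ("Bob", "C1", "s2")}

independent_ok = mvd_holds(independent)
tied_ok = mvd_holds(tied)
print(independent_ok, tied_ok)
```

In the second state, the join would manufacture the pairings (Ann, C1, s2) and (Bob, C1, s1), so decomposing ICS would lose information. A check on one state can refute an MVD but cannot establish it; the dependency is a statement about all legal states.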
To this end, the review of the dependencies specified on the URS in Figure 9.9c, based
on the relational schema shown in Figure 9.9b, focused only on the FDs derived from
R1, R3, and R5. While no non-trivial FDs are present in R2 and R4, R2 and R4 are
mappings of the gerund entity types decomposed from the relationship types Offers and
Takes in Figure 9.9a. So, although our original specification of the FDs prevailing in the
relational schema in Figure 9.9b is technically correct, the overall specification of
dependencies is incomplete without capturing the effect of the two relationship types
Offers and Takes. We should now be able to infer that, while no non-trivial FDs are
present in R2 and R4, together they represent the MVD L_co_course# →→ L_ins_name |
L_stu_num that we failed to specify in Figure 9.9c. In other words, this MVD pair
represents the business rules that specify the two m:n relationships the entity type
COURSE has in the ERD shown in Figure 9.9a. In short, had this MVD been specified in
Figure 9.9c, the normalization process would not have yielded the solution presented in
Figure 9.10a; instead, the solution would have been exactly the same as the one in
Figure 9.9b, first mapped from the ERD in Figure 9.9a.
In Section 9.5, we discussed the rationale for, the possibility of, and perhaps the need
(pursuant to requirement specification) for a relational model that captures the
simultaneous existence of ternary and binary relationships among the participating entity
types (e.g., SCHEDULE_Y and OFFERS). Let us now explore a similar design in the current
case. The relational schema and the ERD from which it is mapped are displayed in
Figures 9.11a and 9.11b, respectively. While the ERD is technically legitimate and can
make semantic sense, the relational schema, from a purely technical perspective, appears
to contain redundant relation schemas. A parsimonious design will evaluate the relation
schemas OFFERING and TAKEN as redundant because they are subsumable in the relation
schema ICS. The dilemma here is rather straightforward:
• If the MVD L_co_course# →→ L_ins_name | L_stu_num exists, the consequent 4NF
violation in ICS will require decomposition of ICS to the two projections
OFFERING and TAKEN; thus, ICS cannot exist.
• Alternatively, if ICS is in 4NF, decomposing ICS is invalid, and so the relation
schemas OFFERING and TAKEN cannot exist.
[ERD residue: entity types INSTRUCTOR, COURSE (Course#, Cname, Credits), and STUDENT (Stu_num, Name, Status); relationship types Can_offer, Scheduled, and May_take; structural constraints (0,m), (0,n), and (0,p)]
FIGURE 9.11b ERD for the relational schema presented in Figure 9.11a
The underlying issue is the inability to specify, in terms of dependencies (e.g., FDs,
MVDs, and/or JDs), the semantically unrelated, independent existence of one or more
binary relationships alongside a ternary relationship among the same three participating
entity types. The only solution appears to be to tolerate apparent redundancies in a
relational schema when the semantics captured in the ERD indicate that no true
redundancy exists.
An interesting variation of this design inadvertently resolves the conflict we discussed
regarding Figures 9.11a and 9.11b. The only difference between the design presented in
Figures 9.12a and 9.12b and the one we analyzed in Figures 9.11a and 9.11b is that
the relation schemas OFFERING and TAKEN each contain an additional attribute:
Semester and Grade are attributes of the respective relationships. The presence of these
attributes ensures that the relation schemas OFFERING and TAKEN are no longer subsets
of the relation schema ICS and are no longer redundant; in fact, their presence in the
relational schema is supported by the FDs {O_co_course#, O_ins_name} → Semester and
{T_co_course#, T_stu_num} → Grade.
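To illustrate what such an FD demands of the data, here is a minimal Python sketch that tests whether a functional dependency holds in one relation state. The function name fd_holds and the sample OFFERING rows are hypothetical; only the attribute names come from the text.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for a key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical state of OFFERING with the extra attribute Semester.
offering = [
    {"O_co_course#": "C1", "O_ins_name": "Ann", "Semester": "Fall"},
    {"O_co_course#": "C1", "O_ins_name": "Bob", "Semester": "Spring"},
    {"O_co_course#": "C2", "O_ins_name": "Ann", "Semester": "Fall"},
]
# The same state plus a row that contradicts the first one.
conflicting = offering + [
    {"O_co_course#": "C1", "O_ins_name": "Ann", "Semester": "Spring"},
]

determinant = ["O_co_course#", "O_ins_name"]
fd_ok = fd_holds(offering, determinant, ["Semester"])
fd_broken = fd_holds(conflicting, determinant, ["Semester"])
print(fd_ok, fd_broken)
```

Because Semester does not appear in ICS, a relation such as this one cannot be obtained by projecting ICS, which is the sense in which OFFERING stops being redundant.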
[ERD residue: entity types INSTRUCTOR, COURSE (Course#, Credits), and STUDENT (Stu_num); relationship types Can_offer, Scheduled, and May_take; attribute Semester; structural constraints (0,m), (0,n), and (0,p)]
FIGURE 9.12b ERD for the relational schema presented in Figure 9.12a
Chapter Summary
While relation schemas in BCNF are free from modification anomalies due to undesirable
functional dependencies, BCNF relation schemas that fail to achieve 4NF and 5NF are vulnerable to
modification anomalies as a result of multi-valued dependencies (in the case of 4NF) and
join-dependencies (in the case of 5NF).
A multi-valued dependency is defined as follows: In a relation schema R (X, Y, Z), where X,
Y, and Z are atomic or composite attributes, a multi-valued dependency X →→ Y exists if each
value of X in any relation state r of R is associated with a set of Y values independent of the Z
values with which X is associated. In a relation schema R (X, Y, Z) in BCNF, a multi-valued
dependency X →→ Y exists if and only if the natural join of r (R1) and r (R2), where R1 (X, Y) and
R2 (X, Z) are projections of R, yields exactly r (R) for every relation state r of R. A relation
schema R is in 4NF if it is in BCNF and has no non-trivial multi-valued dependencies.
A join-dependency in a relation schema R pertains to conditions where the natural join of a set
of its projections results in the reconstruction of R for every relation state of R. A relation schema
in 4NF in which every non-trivial join-dependency is implied by the candidate keys is said to be
in 5NF.
The normal forms 2NF through 5NF have considered constraints imposed by functional
dependencies, multi-valued dependencies, or join-dependencies. Fagin (1981) suggests a generalized
constraint that both implies the three kinds of dependencies and allows more general constraints.
Constraints are explained here in terms of domain constraints of the relation schema’s attributes and
relation keys. Relation schemas that satisfy these constraints are said to be in domain key normal
form (DK/NF). Unfortunately, other than checking to see if each constraint on a relation schema is a
logical consequence of the definition of keys of a relation schema or domains of attributes, there are
no formal methods to systematically verify if a relation schema is in DK/NF.
Exercises
1. What is a multi-valued dependency?
2. Consider the instance of the relation SHIRT (Shirt#, Color, Size), where Shirt# is equivalent
to a style number (e.g., style number 341 might be a shirt with a button-down collar, while
style number 342 might be a shirt with an open collar, etc.). Observe that each Shirt#
comes in a variety of colors and sizes.
13. Given the schema SCHEDULE (Prof, Office, Major, {Book}, Course, Quarter) along with F:
fd1: Prof → {Office, Major} and V: mvd: Prof →→ Book, do the following:
a. Identify the primary key of SCHEDULE such that SCHEDULE is a 1NF relation
schema.
b. Normalize SCHEDULE to 4NF.
c. Indicate the entity integrity and referential integrity constraints.
d. Reverse engineer the design to a conceptual schema using the ER modeling grammar.
14. This exercise is based on four different inventory relation schemas, P, Q, R, and S. An
instance of each relation schema is given here, each of which reflects a different design
objective.
P Q R S
Part# Color Store# Part# Color Store# Part# Color Store# Part# Color Store#
18 Green B 18 Green C
18 Green C 18 Black C
18 Black B
18 Black C
PART IV
DATABASE IMPLEMENTATION USING
THE RELATIONAL DATA MODEL
INTRODUCTION
The final phase of the data modeling life cycle is physical data modeling. At this point, we have an
information-preserving logical schema normalized to the extent we want and ready for implementation.
Because contemporary database systems are dominated by relational data models, the transition from
the logical schema to a physical data model will employ the technology-driven relational data modeling grammar.
Accordingly, a relation becomes a table, the tuples are the rows of the table, and the attributes are
the columns of the table. Figure IV.1 points out our location in the data modeling hierarchy.
[Figure IV.1 residue: the data modeling hierarchy, showing Universe of Interest; Requirements Specification (Process Specifications, Data Specifications); Process Model; Conceptual Design/Schema [ER Modeling Grammar]; Logical Data Modeling: Design-Specific ER Model, Technology-Independent Logical Schema [Information-Preserving Grammar], Normalization, Technology-Dependent Logical Schema]
Structured Query Language (SQL) is the standard universally accepted by the relational
database (RDB) community (the users and the RDBMS vendors) for the implementation
of a relational database. The name SQL comes from its predecessor SEQUEL
(Structured English Query Language), developed by IBM Research as an experimental
product. SQL¹ is a comprehensive relational database language and comprises three
sublanguages: (1) a data definition language (SQL/DDL) intended for the creation and
alteration of tables and other associated structures such as views, domains, and schemas;
(2) a data manipulation language (SQL/DML) aimed at data manipulation tasks; and (3) a
data control language (SQL/DCL) geared to controlling database access. There are numerous
idiosyncratic differences among commercial DBMS products. Nonetheless, the RDBMS
vendors endeavor to meet a certain level of the SQL standards developed jointly by ANSI
and ISO. Consequently, the users of ANSI/ISO standard SQL find migration and
interoperability across RDBMS products relatively painless. At this writing, most commercial
RDBMS products provide reasonable support for SQL-2003. Therefore, the discussions in
Part IV are based on SQL-2003. Also, we provide only an overview of the salient features
of SQL. The reader is directed to the vendor-specific reference manuals for the complete
syntax of the language.
¹ Officially pronounced ess-que-ell (Date, 2004, p. 4), not sequel.
Chapter 10 begins the discussion with two sections that focus on database creation,
followed by Chapter 11, where relational algebra is introduced as a means to retrieve data
from a relational database. A query expressed in relational algebra involves a series of
operations which, when executed in the order specified, produce the desired results. Even
though SQL uses some of the relational algebra operators explicitly, it is a high-level
declarative language based on tuple relational calculus.² Chapter 12 is dedicated to an
extensive coverage of SQL pertaining to data retrieval (querying). Sometimes, this portion
of the DML is referred to as the data query language (DQL).
Chapter 13 covers additional features of the SQL language. The discussion begins
with two sections that focus on built-in functions that facilitate working with strings,
dates, and times. This is followed by sections that introduce the reader to SQL features
for writing hierarchical queries, using extended group by clauses, working with analytical
functions, and incorporating elements of spreadsheet modeling into the SQL SELECT
statement.
² For more details on tuple relational calculus, see Elmasri and Navathe (2010).
CHAPTER 10
DATABASE CREATION
At the implementation stage of a database, the principal tasks include creating and
modifying the database tables and other related structures, enforcing integrity constraints, and
populating the database tables. Once the tables are created and populated, data must be
retrieved from them. Relational algebra, a mathematical expression of data retrieval methods
prescribed by E. F. Codd, is introduced first as a means to specify the logic for data retrieval
from a relational database. A query expressed in relational algebra involves a sequence of
operations that, when executed in the order specified, produces the desired results.
SQL is the most common way that relational algebra is implemented for data retrieval
operations in a relational database.
In this chapter as well as in Chapters 11 through 13, we use the SQL-2003 language
standard in the code for the SQL statements. A commercial DBMS implementation need not
include all the standard SQL constructs or follow the standard language syntax verbatim
for all SQL constructs. It is highly likely that commercial DBMS products
differ in the implementation of at least some of the SQL syntax. Also, some vendors offer
additional non-standard SQL constructs. Therefore, the reader using the SQL scripts¹
presented in Chapters 10 through 13 may occasionally need to refer to the SQL reference
material of the DBMS platform being used. To the extent that a DBMS product does not
conform to a common syntactical standard, portability and migration across product
platforms might be difficult.
Chapter 10 is divided into two sections. Section 10.1 discusses elements of SQL’s data
definition language used to create and alter the structure of base tables. Section 10.2
discusses three statements used to modify the contents of database tables: INSERT, DELETE,
and UPDATE.
¹ A script is a command or series of commands usually stored in a file.
[ERD residue: entity types PATIENT (attributes Patient_num, P_alpha, P_num, Name, Gender, Age, Location, Unit#, Wing, Room, Bed, Admit_date), MEDICATION (Med_code, Name, On_hand, On_order, Unit_price), and ORDER (Rx_num, Dosage, Frequency); relationship types Placed_for and Is_for; structural constraints (0, n), (1, 1), (5, m), and (1, 1); annotations R and C; attribute types and sizes shown in the form [A, 5], [N, 2], [Dt, 8]]
L1: PATIENT
Pat_p#a (A, 2) | Pat_p#n (A, 5) | Pat_name (A, 41) | Pat_gender (A, 1) | Pat_age (N, 2) | Pat_wing (A, 1) | Pat_room# (N, 3) | Pat_bed (A, 1) | Pat_admit_dt (Dt, 8)
[Markings in the figure: grouping [[Pat_wing, Pat_room#], Pat_bed]; • on Pat_name, Pat_age, the [Pat_wing, Pat_room#] group, and Pat_admit_dt]
L2: ORDER
Ord_rx# (A, 13) | Ord_pat_p#a (A, 2) | Ord_pat_p#n (A, 5) | Ord_med_code (A, 5) | Ord_dosage (N, 1) | Ord_freq (N, 1)
[Link annotations in the figure: 0 <----- L1 -----> n; 5 L3 m; 1 <----- C -----> 1; 0 R 1]
FIGURE 10.1d An information-preserving logical schema for the ERD in Figure 10.1a
Box 1
Observe that while it is legal to use the same attribute name in different entity types
in an ER diagram, relational database theory stipulates that attribute names must be
unique over the entire relational schema. Accordingly, the attribute names in the
relational/logical schema in this example follow the naming convention proposed in
Chapter 6. The column names of a base table in SQL/DDL are considered to be ordered
in the sequence in which they are specified in the CREATE TABLE syntax. However,
when populated with data, the rows are not considered to be ordered within a base
table. Common data types supported by SQL-2003 are grouped under Number, String,
Date/time, and Interval, and they are listed in Table 10.1. DBMS vendors often build
their own data types based on these four categories.
Data type | Description
integer or integer (p), where p indicates precision | Exact numeric type; binary representation of large whole number values; precision often set by the DBMS vendor (e.g., 2 bytes)
smallint or smallint (p), where p indicates precision | Exact numeric type; binary representation of small whole number values; precision often set by the DBMS vendor (e.g., 1 byte)
character (l) or char (l), where l indicates length | Fixed length character strings including blanks from the defined language set SQL_TEXT within a database; can be compared to other columns of the same type with different lengths or varchar type with different maximum lengths; most DBMS have an upper limit on l (e.g., 255)
character varying (l) or char (l) varying or varchar (l), where l indicates the maximum length | Variable length character strings except trailing blanks from the defined language set SQL_TEXT within a database; DBMS records actual length of column values; can be compared to other columns of the same type with different maximum lengths or char type with different lengths; most DBMS have an upper limit on l (e.g., 2000)
bit (l), where l indicates length | Fixed length binary digits (0,1); can be compared to other columns of the same type with different lengths or bit varying type with different maximum lengths
bit varying (l), where l indicates maximum length | Variable length binary digits (0,1); can be compared to other columns of the same type with different maximum lengths or bit type with different lengths
interval (q) | Represents measure of time; there are two types of intervals: year-month (yyyy:mm), which stores the year and month, and day-time (dd hh:mi:ss), which stores the days, hours, minutes, and seconds; the qualifier (q), known in some databases as the interval lead precision, dictates whether the interval is year-month or day-time; implementation of the qualifier value varies
Note that the “CREATE TABLE order” statement in Box 1 will generate an error; the word
“order” (in any letter case) cannot be used as a user-defined name for a table, column, or
any other construct in SQL because ORDER is itself an SQL keyword and thus a reserved
word. A list of SQL reserved words appears in Appendix A.
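The reserved-word behavior is easy to observe. The following Python sketch uses SQLite through the standard sqlite3 module, chosen here only because it needs no setup; the error text and the handling of quoted names are SQLite's and may differ on other DBMS platforms. The helper try_create and the column rx_num are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def try_create(table_name):
    """Attempt a CREATE TABLE with the given name; report the outcome."""
    try:
        conn.execute(f"CREATE TABLE {table_name} (rx_num CHAR(13))")
        return "ok"
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


results = {
    name: try_create(name)
    for name in ("order", "ORDER", "orders", '"order"')
}
for name, outcome in results.items():
    print(name, "->", outcome)
```

In SQLite, the unquoted names order and ORDER are rejected with a syntax error, the name orders is accepted, and the delimited identifier "order" is also accepted. Choosing a different name, as in the second case, is usually the simpler route.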
The CREATE TABLE statement is a single statement starting at “CREATE” and ending
with a semicolon (;). The entire statement could be written on a single line, but it spans
multiple lines to enhance clarity and readability. The general form of the syntax for the
CREATE TABLE statement is:
CREATE TABLE table_name (comma-delimited list of table-elements);
Constraints in a base table are declarative in nature and are imposed either on a single
column in the table or on a set of columns in the table. The former is referred to as
an attribute-level or column-level constraint, while the latter goes by the name tuple-level
or row-level constraint. A constraint definition can be an independent table element (i.e.,
row-level constraint) or, if applicable to a specific column only, part of the column
definition (i.e., column-level constraint). Constraint definitions include the following:
• The primary key definition—i.e., specification of an entity integrity
constraint:
PRIMARY KEY (comma-delimited column list)
Example:
CONSTRAINT pk_pat PRIMARY KEY (Pat_p#a, Pat_p#n)
• An alternate key definition—i.e., specification of a uniqueness constraint:
UNIQUE (comma-delimited column list)
Example:
CONSTRAINT unq_med UNIQUE (Med_code)
• A foreign key constraint—i.e., specification of a referential integrity
constraint:
FOREIGN KEY (comma-delimited column list of referencing table)
REFERENCES table_name (comma-delimited column list of referenced
table)4
[ referential triggered action clause ]
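The three constraint forms listed above can be tried out with a short script. What follows is a minimal sketch rather than the DDL of the boxes in this chapter: it assumes Python's built-in sqlite3 module in place of a full SQL-2003 DBMS, and the column names (pat_pa, pat_pn, and so on) and data types are simplified stand-ins chosen for illustration, with the # character dropped to keep the identifiers portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption specific to SQLite: foreign keys are enforced only when enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE patient (
    pat_pa   CHAR(2),
    pat_pn   CHAR(5),
    pat_name VARCHAR(41),
    CONSTRAINT pk_pat PRIMARY KEY (pat_pa, pat_pn)      -- entity integrity
);
CREATE TABLE medication (
    med_name VARCHAR(31),
    med_code CHAR(5),
    CONSTRAINT pk_med  PRIMARY KEY (med_name),
    CONSTRAINT unq_med UNIQUE (med_code)                 -- alternate key
);
CREATE TABLE orders (
    ord_rx       CHAR(13),
    ord_pat_pa   CHAR(2),
    ord_pat_pn   CHAR(5),
    ord_med_code CHAR(5),
    CONSTRAINT pk_ord PRIMARY KEY (ord_rx),
    CONSTRAINT fk_pat FOREIGN KEY (ord_pat_pa, ord_pat_pn)
        REFERENCES patient (pat_pa, pat_pn),             -- referential integrity
    CONSTRAINT fk_med FOREIGN KEY (ord_med_code)
        REFERENCES medication (med_code)
);
""")


def try_sql(sql):
    """Run one statement; report whether a declarative constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_sql("INSERT INTO patient VALUES ('DB', '77642', 'Davis, Bill')"),
    try_sql("INSERT INTO patient VALUES ('DB', '77642', 'Duplicate key')"),  # pk_pat
    try_sql("INSERT INTO medication VALUES ('Tagament', 'TAG')"),
    try_sql("INSERT INTO medication VALUES ('Other name', 'TAG')"),          # unq_med
    try_sql("INSERT INTO orders VALUES ('104', 'DB', '77642', 'TAG')"),
    try_sql("INSERT INTO orders VALUES ('109', 'DB', '77642', 'KEF')"),      # fk_med
]
print(results)
```

Every second statement in the list is written to break exactly one of the named constraints, so the printed list alternates between accepted and rejected statements.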
2
A niladic-function is a built-in function that takes no arguments (Date and Darwen, 1997, p. 55).
The niladic-functions that are allowed here are: USER, CURRENT_USER, SESSION_USER,
SYSTEM_USER, CURRENT_DATE, CURRENT_TIME, and CURRENT_TIMESTAMP.
3
The CONSTRAINT constraint_name phrase enclosed by [ ] is optional. However, in practice, it is
extremely useful to use this phrase, especially for ease of later reference to the constraint by a name
known to the creator of the table or an individual responsible for altering the table. We strongly
suggest mandatory use of this phrase in constraint specification.
4
If the referenced column list is the primary key of the referenced table, the specification of this
column list is optional.
5
In Chapters 11–13, table names are shown in all capital letters; that convention is preserved in the
running text of this chapter and other chapters of Part IV. However, in this chapter, for clarity in
distinguishing SQL keywords from table names, table names are sometimes shown in lowercase in
SQL code.
Box 2
Observe that the DDL script produced on the basis of the relational schema
(Figure 10.1c) does not fully capture all the information conveyed in the ERD. For
instance, the optional property of some attributes, candidate keys of relation schemas,
deletion rules, and participation constraints of relationships have not been mapped from
the ERD to the relational schema and hence are not reflected in the DDL. An inspection of
the information-preserving logical schema (Figure 10.1d) reveals that:
• Pat_name, Pat_age, Pat_admit_dt, Med_code, and Med_qty_onhand are
mandatory attributes—that is, cannot have null values in any tuple.
• Med_code is the alternate key since Med_name has been chosen as the
primary key of the MEDICATION table.
• Participation of ORDER in the Placed_for relationship is total.
• Participation of PATIENT in the Placed_for relationship is partial.
• Participation of ORDER in the Is_for relationship is total.
• Participation of MEDICATION in the Is_for relationship is total.
• The deletion rule for the Is_for relationship is restrict.
• The deletion rule for the Placed_for relationship is cascade.
• [Pat_wing, Pat_room] is a composite attribute.
• [Pat_wing, Pat_room, Pat_bed] is a composite attribute.
Note: The cardinality ratio of the form (1, n) in a relationship type is implicitly
captured in the DDL specification via the foreign key constraint. Any (1, 1) cardinality ratio
can be implemented using the UNIQUE constraint definition.
At this point, let us write the SQL/DDL script a third time so as to capture all these
constraint definitions. The revised DDL script appears in Box 3, with the added constraint
definitions highlighted.
Box 3
By default, a column is allowed to have null values in the rows. The “not null”
constraint definitions take care of the mandatory attribute value specification in the
corresponding columns in the associated base tables. Likewise, unless explicitly prohibited, a
column can contain the same value in multiple rows. An alternate key (i.e., a candidate
key not chosen as the primary key of a base table) must therefore be given an explicit
uniqueness constraint. This is accomplished by the UNIQUE constraint definition for the
Med_code column in the MEDICATION table.
Partial participation of a parent as well as a child in a relationship (i.e., min = 0) exists
by default. Total participation of a parent in a relationship (i.e., min > 0) cannot be
enforced using any of the constraint definitions discussed so far. SQL-2003 offers another
mechanism, called a “declarative assertion,” to specify broader constraints at the database
schema level. However, although assertions are part of the SQL-2003 standard, major
database products do not support the declarative definition of an assertion. Total
participation of a child in a relationship (i.e., min = 1) is enforced by specifying a “not null”
constraint on the foreign key attribute(s) (e.g., the not null constraint definition for
(Ord_pat_p#a, Ord_pat_p#n) in the ORDERS table). The deletion rules (referential triggered
action clause) are incorporated using the ON DELETE clause of the foreign key constraint
definition.6 Since, by definition, a relation schema has only atomic attributes, SQL/DDL does
not provide for the specification of composite columns; all columns in a table are atomic.
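The difference between partial and total participation of a child can be seen by declaring the same foreign key with and without a “not null” constraint. The sketch below is illustrative only: it assumes Python's built-in sqlite3 module, a simplified single-column patient key (pat_id), and two hypothetical order tables (orders_partial and orders_total) that do not appear in the chapter's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption specific to SQLite: foreign keys are enforced only when enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE patient (
    pat_id   CHAR(7)     CONSTRAINT pk_pat PRIMARY KEY,
    pat_name VARCHAR(41) CONSTRAINT nn_patname NOT NULL
);
-- Partial participation of the child (min = 0): the foreign key may be null.
CREATE TABLE orders_partial (
    ord_rx     CHAR(13) CONSTRAINT pk_ordp PRIMARY KEY,
    ord_pat_id CHAR(7)  CONSTRAINT fk_patp REFERENCES patient (pat_id)
);
-- Total participation of the child (min = 1): NOT NULL on the foreign key.
CREATE TABLE orders_total (
    ord_rx     CHAR(13) CONSTRAINT pk_ordt PRIMARY KEY,
    ord_pat_id CHAR(7)  CONSTRAINT nn_ordpat NOT NULL
                        CONSTRAINT fk_patt REFERENCES patient (pat_id)
);
""")


def accepted(sql):
    """Return True if the statement satisfies every declarative constraint."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order that refers to no patient at all (null foreign key).
partial_ok = accepted("INSERT INTO orders_partial VALUES ('104', NULL)")
total_ok = accepted("INSERT INTO orders_total VALUES ('104', NULL)")
print(partial_ok, total_ok)
```

A null foreign key value satisfies the referential integrity constraint by itself, which is why only the table carrying the additional “not null” constraint rejects the row.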
where table_name is the name of the base table being altered and action is one of the
following:
• Adding a column or altering a column’s default-definition or removing the
existing default-definition via the syntax:
ADD [ COLUMN ] column_definition
Suppose we want to add a column to the base table PATIENT to store the phone
number of every patient. The SQL/DDL code to do this is as follows:
ALTER TABLE patient ADD Pat_phone# char (10);
6
Similar to the ON DELETE clause, SQL/DDL offers an ON UPDATE clause for referential triggered
action that is intended for specifying action to be taken when the referenced attribute(s) value
(primary key or alternate key value) in a foreign key constraint is changed. Since in our example
this is not specified, we defaulted to an assumption of same as ON DELETE clause specifications.
7
Braces { } are used to specify that one of the items from the list of items separated by the vertical
bar must be chosen.
Now, the rows of the base table PATIENT are capable of receiving values for the
Pat_phone# column. Since no default has been specified for the new column added to the
table, the rows of PATIENT will, by default, have “null” value for the column, Pat_phone#.
Clearly, it is not possible to specify a “not null” constraint on the column until either a
non-null default value is specified for the column or the column is populated with non-null
values in all rows of the base table.
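The behavior of a newly added column can be checked with a few statements. This is a hedged sketch, assuming Python's built-in sqlite3 module; the single-column key pat_id and the columns pat_email and pat_fax are hypothetical names introduced only for the illustration, and other DBMS products may word the ALTER TABLE syntax or the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    " pat_id CHAR(7) PRIMARY KEY,"
    " pat_name VARCHAR(41) NOT NULL)"
)
conn.execute("INSERT INTO patient VALUES ('DB77642', 'Davis, Bill')")

# Add a column with no default: existing rows receive null for it.
conn.execute("ALTER TABLE patient ADD COLUMN pat_phone CHAR(10)")
phone_after_add = conn.execute("SELECT pat_phone FROM patient").fetchall()

# A "not null" column is acceptable when a non-null default is supplied.
conn.execute(
    "ALTER TABLE patient ADD COLUMN pat_email VARCHAR(40) NOT NULL DEFAULT 'unknown'"
)
email_after_add = conn.execute("SELECT pat_email FROM patient").fetchone()[0]

# Without a default, the existing row could not satisfy "not null".
try:
    conn.execute("ALTER TABLE patient ADD COLUMN pat_fax CHAR(10) NOT NULL")
    not_null_without_default = "accepted"
except sqlite3.Error:
    not_null_without_default = "rejected"

print(phone_after_add, email_after_add, not_null_without_default)
```

The first query returns a null (None in Python) for the existing row, while the column declared with a default is populated with that default.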
The column can be removed from the base table by either of these two statements:
ALTER TABLE patient DROP Pat_phone# CASCADE;
or:
ALTER TABLE patient DROP Pat_phone# RESTRICT;
Observe the definition of DROP behavior (i.e., CASCADE or RESTRICT) in the SQL/DDL
statement. The SQL-2003 standard requires the DROP behavior definition. The
CASCADE option implies that all constraints and derived tables that reference the column
also be dropped from the database schema. Likewise, the RESTRICT option prevents the
dropping of the column, should any schema element that references the column exist.
Also, SQL-2003 provides for the dropping of only one column per ALTER statement.
Suppose we want to specify a default value of $3.00 for the unit price of all medications.
This can be done as follows:
ALTER TABLE medication ALTER Med_unitprice SET DEFAULT 3.00;
Since the naming of the constraint definition is optional, it is possible to write the
above DDL using a second method, as follows:
Pat_age smallint not null CHECK (Pat_age BETWEEN 1 AND 90)
Clearly, the second method code appears simpler and more concise. However, if we
decide to permit null values for Pat_age, in method 2, the entire column definition has to
be re-specified. On the other hand, using method 1, we simply drop the “not null” con-
straint, as shown here:
ALTER TABLE patient DROP CONSTRAINT nn_patage CASCADE;
or:
ALTER TABLE patient DROP CONSTRAINT nn_patage RESTRICT;
This is possible only because we named the constraint. While the DBMS names every
constraint when we don’t, finding out the constraint name given by the DBMS is an
inefficient task; and the constraint name given by the DBMS is not generally a user-friendly
name. Thus, the coding technique of method 1 offers greater flexibility and is strongly
recommended.
not only deletes the MEDICATION table (all rows and the table definition) from the data-
base, it also removes all schema elements (constraints and derived tables) that reference
any column of the MEDICATION table. For instance, the constraint definition, fk_med in
the base table ORDERS, defines the foreign key constraint that references a candidate
key of the MEDICATION table. The CASCADE option in the DROP TABLE statement
automatically drops the constraint fk_med in ORDERS when the MEDICATION table is
dropped. The RESTRICT option, on the other hand, disallows the deletion of the
MEDICATION table because the constraint definition, fk_med, exists in the schema.
FIGURE 10.2 A Design-Specific ERD for the Madeira College registration system
L0: COURSE (Course#, Name, College, Credit, Hrs, Dcode)
L2: PROFESSOR (Name, Phone, Empid, Datehired, Salary, Dcode)
L3: STUDENT
L6: TEXTBOOK (Isbn, Title, Year, Publisher)
L7: USE (Course#, Isbn, Empid)
FIGURE 10.3 Information-reducing logical schema for the Madeira College registration system
8
In the interest of shortening the column names used throughout this chapter, the guidelines for
naming attributes given in Section 6.2 are not used in Figure 10.3, Figure 10.4, and Box 4.
L0: COURSE (Course# (A,10), Name (A,24), College (A,20), Dcode (N,2), Credit (A,1), Hrs (N,1))
L1: DEPARTMENT (Phone (N,10), [Name] (A,15), Dcode (N,2), Hodid (A,7), Location (A,15), College (A,20))
L2: PROFESSOR ([Name] (A,20), Phone (N,10), Empid (A,7), Dcode (N,2), Datehired (Dt,8), Salary (N,6))
L4: SECTION (Section# (N,3), Course# (A,10), Profid (A,7), Maxst (N,2), Room (A,15), Time (A,8))
L5: TAKES (Section# (N,3), Sid (A,7), Grade (A,1))
L6: TEXTBOOK (Isbn (A,18), Title (A,25), Year (N,4), Publisher (A,20))
L7: USE (Course# (A,10), Isbn (A,18), Empid (A,7))
FIGURE 10.4 Information-preserving logical schema for the Madeira College registration system
Box 4
Hall')));
Box 4 (continued)
Box 5
The values should be listed in the same order in which they are specified in the CREATE
TABLE statement. The following three INSERT statements add one row to the PATIENT,
MEDICATION, and ORDERS tables. Note that SQL allows us to omit the column names from
the INSERT statement when assigning a value to each column in the table.
INSERT INTO PATIENT VALUES ('DB', '77642', 'Davis, Bill', 'M', 27, '2013-07-07', 'B',
108, 'B');10
1 row created.
INSERT INTO MEDICATION VALUES ('TAG', 'Tagament', 3.00, 3000, 0);
1 row created.
INSERT INTO ORDERS VALUES ('104', 'DB', '77642', 'TAG', 3, 1);
1 row created.
Each of these INSERT statements is successful only because each honors the declara-
tive constraints established in the respective CREATE TABLE statements.
It is also permissible for an INSERT statement to specify explicit column names that
correspond to the values provided in the INSERT statement. This is useful if a table has a
number of columns but only a few columns are assigned values in a particular new row.
Example 1 inserts a row into the PATIENT table that contains only the patient number,
patient name, age, and date of admission. In this case, specification of the column names
is required.
Example 1.
INSERT INTO PATIENT (PATIENT.PAT_P#A, PATIENT.PAT_P#N, PATIENT.PAT_NAME,
PATIENT.PAT_AGE, PATIENT.PAT_ADMIT_DT) VALUES ('GD', '72222', 'Grimes, David',
44, '2013-07-12');
1 row created.
This INSERT statement was successful because each of the columns not listed in the
INSERT statement permits null values.11 By contrast, had an attempt been made to insert
a patient without specifying a date of admission, the INSERT statement would have failed.
9
The SQL SELECT statement is used to retrieve (i.e., query) data from tables. In its simplest form,
SELECT * FROM table_name, all columns from the table_name listed are retrieved. The remainder
of this section contains several examples that make use of this form of the SQL SELECT statement.
Chapter 12 contains an extensive discussion of the SQL SELECT statement.
10
The ‘2013-07-07’ character string represents the date of admission of the patient. For a date data
type, SQL-2003 uses a default date format where the first four digits represent the year component,
the next two digits (1-12) represent the month component, and the final two digits (as constrained
by the rules of the Gregorian calendar) represent the day of the month (see Table 10.1). Other for-
mats for representing dates are covered in Section 13.1 of Chapter 13.
11
Though not illustrated here, it is also permissible to omit columns with a DEFAULT value. If a
DEFAULT exists for a column not explicitly listed in the INSERT statement, the default value will
also be included for this column when the row is inserted.
Thus, every row in the PATIENT table must contain a patient name, age, and date of
admission. In addition, since the patient number consisting of the combination of
PATIENT.Pat_p#a and PATIENT.Pat_p#n constitutes the primary key, these two columns
must be defined as well.
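The two forms of the single-row INSERT statement, and the role of the “not null” constraints, can be reproduced in a small script. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, a six-column version of the PATIENT table (the wing, room, and bed columns are omitted), and identifiers without the # character; the third row ('XX', '00000') is invented purely to trigger the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE patient (
    pat_pa       CHAR(2),
    pat_pn       CHAR(5),
    pat_name     VARCHAR(41) CONSTRAINT nn_patname  NOT NULL,
    pat_gender   CHAR(1),
    pat_age      SMALLINT    CONSTRAINT nn_patage   NOT NULL,
    pat_admit_dt DATE        CONSTRAINT nn_patadmit NOT NULL,
    CONSTRAINT pk_pat PRIMARY KEY (pat_pa, pat_pn)
)""")

# Form 1: no column list; one value per column, in CREATE TABLE order.
conn.execute(
    "INSERT INTO patient VALUES ('DB', '77642', 'Davis, Bill', 'M', 27, '2013-07-07')"
)

# Form 2: explicit column list; the omitted pat_gender column becomes null.
conn.execute(
    "INSERT INTO patient (pat_pa, pat_pn, pat_name, pat_age, pat_admit_dt) "
    "VALUES ('GD', '72222', 'Grimes, David', 44, '2013-07-12')"
)
gender = conn.execute(
    "SELECT pat_gender FROM patient WHERE pat_pa = 'GD'"
).fetchone()[0]

# Omitting a mandatory column (date of admission) violates its NOT NULL constraint.
try:
    conn.execute(
        "INSERT INTO patient (pat_pa, pat_pn, pat_name, pat_age) "
        "VALUES ('XX', '00000', 'No Admit Date', 30)"
    )
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "rejected"

row_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(gender, outcome, row_count)
```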
Each order must involve both an existing patient and existing medication. Observe
what happens in Example 2 when an attempt is made to insert an order for an existing
patient (David Grimes) but nonexistent medication (KEF).
Example 2.
INSERT INTO ORDERS VALUES ('109', 'GD', '72222', 'KEF', 1, 1);
integrity constraint (FK_MED) violated - parent key not found
Note that FK_MED (see Box 5) is the name of the referential integrity constraint
requiring each medication code in the ORDERS table to exist in the MEDICATION table.
The multi-row INSERT statement adds multiple rows of data to a table via the execution
of a query. In this form of the INSERT statement, the data values for the new rows
appear in a SELECT statement specified as part of the INSERT statement. Suppose, for
example, separate patient tables exist for different hospitals within the same hospital
system. The INSERT statement in Example 3 inserts all rows in the PATIENT_SUGARLAND
table into the PATIENT table. Since the PATIENT_SUGARLAND table has only six columns
while the PATIENT table has nine columns, the column names are specified in the
INSERT statement.
Example 3.
INSERT INTO PATIENT
(PATIENT.PAT_P#A,PATIENT.PAT_P#N,PATIENT.PAT_NAME,PATIENT.PAT_GENDER,
PATIENT.PAT_AGE,
PATIENT.PAT_ADMIT_DT)
SELECT * FROM PATIENT_SUGARLAND;
3 rows created.
SELECT PAT_P#A, PAT_P#N, PAT_NAME, PAT_GENDER, PAT_AGE, PAT_ADMIT_DT
FROM PATIENT;12
Had there been an interest in inserting only those rows in the PATIENT_SUGARLAND
table with a date of admission after June 1, 2014, a WHERE clause referencing the appro-
priate column name in the PATIENT_SUGARLAND table could have been added to the
SELECT statement in the INSERT statement given in Example 3.
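A multi-row INSERT restricted by a WHERE clause might look like the following sketch. It assumes Python's built-in sqlite3 module, identifiers without the # character, a PATIENT table reduced to seven columns (the single extra column pat_wing stands in for the three columns that PATIENT_SUGARLAND lacks), and three sample rows modeled on the patients shown later in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    pat_pa CHAR(2), pat_pn CHAR(5), pat_name VARCHAR(41) NOT NULL,
    pat_gender CHAR(1), pat_age SMALLINT NOT NULL, pat_admit_dt DATE NOT NULL,
    pat_wing CHAR(1),
    CONSTRAINT pk_pat PRIMARY KEY (pat_pa, pat_pn)
);
CREATE TABLE patient_sugarland (
    pat_pa CHAR(2), pat_pn CHAR(5), pat_name VARCHAR(41) NOT NULL,
    pat_gender CHAR(1), pat_age SMALLINT NOT NULL, pat_admit_dt DATE NOT NULL,
    CONSTRAINT pk_pat_sl PRIMARY KEY (pat_pa, pat_pn)
);
INSERT INTO patient_sugarland VALUES
    ('LH', '97384', 'Lisauckis, Hal', 'M', 69, '2014-06-06'),
    ('HJ', '99182', 'Hargrove, Jan',  'F', 21, '2014-05-25'),
    ('RN', '31678', 'Robbins, Nancy', 'F', 57, '2014-06-01');
""")

# The column list is required because the source table has fewer columns
# than the target; the WHERE clause keeps only admissions after June 1, 2014.
# Dates are stored here as ISO-8601 strings, so string comparison is chronological.
conn.execute("""
INSERT INTO patient
    (pat_pa, pat_pn, pat_name, pat_gender, pat_age, pat_admit_dt)
SELECT * FROM patient_sugarland
WHERE pat_admit_dt > '2014-06-01'
""")

copied = conn.execute(
    "SELECT pat_name, pat_wing FROM patient ORDER BY pat_name"
).fetchall()
print(copied)
```

Only the patient admitted on 2014-06-06 qualifies; the patient admitted exactly on June 1 is excluded because the comparison is strict.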
12
The “SELECT <column list> FROM <table list>” form of the SQL SELECT statement is formally
introduced in Chapter 12.
results in the deletion of all orders for a particular patient when that patient is deleted from
the PATIENT table. The DELETE statement in Example 1 illustrates the propagation of a
deletion to another table by deleting the first patient inserted into the PATIENT table, Bill
Davis. Observe the content of the PATIENT and ORDERS tables before and after the deletion.
Content of Tables Prior to Deletion
SELECT PAT_P#A, PAT_P#N, PAT_NAME, PAT_GENDER, PAT_AGE, PAT_ADMIT_DT
FROM PATIENT;
PAT_P#A PAT_P#N PAT_NAME        PAT_GENDER PAT_AGE PAT_ADMIT_DT
------------------------------------------------------------------------
DB      77642   Davis, Bill     M          27      2013-07-07
GD      72222   Grimes, David              44      2013-07-12
LH      97384   Lisauckis, Hal  M          69      2014-06-06
HJ      99182   Hargrove, Jan   F          21      2014-05-25
RN      31678   Robbins, Nancy  F          57      2014-06-01
13
The requirement that each medication be associated with at least five orders (see Figure 10.1a)
has been changed to only one order for this example.
Example 1.
DELETE FROM PATIENT WHERE PATIENT.PAT_NAME LIKE '%Davis, Bill%';14
1 row deleted.
SELECT PAT_P#A, PAT_P#N, PAT_NAME, PAT_GENDER, PAT_AGE, PAT_ADMIT_DT
FROM PATIENT;
PAT_P#A PAT_P#N PAT_NAME        PAT_GENDER PAT_AGE PAT_ADMIT_DT
------------------------------------------------------------------------
GD      72222   Grimes, David              44      2013-07-12
LH      97384   Lisauckis, Hal  M          69      2014-06-06
HJ      99182   Hargrove, Jan   F          21      2014-05-25
RN      31678   Robbins, Nancy  F          57      2014-06-01
no rows selected
On the other hand, observe the effect of the constraint in the ORDERS table:
Ord_med_code char(5) constraint fk_med references medication (Med_code)
ON DELETE RESTRICT ON UPDATE RESTRICT
when the attempt is made in Example 2 to delete a medication for which one or more
orders exist. Assume that the rows previously deleted from the PATIENT and ORDERS
tables have been reinserted prior to the execution of the DELETE statement that attempts
to delete the Tagament medication from the MEDICATION table.
Example 2.
DELETE FROM MEDICATION WHERE MEDICATION.MED_CODE = 'TAG';
integrity constraint (FK_MED) violated - child record found
Note that FK_MED is the name of the referential integrity constraint requiring each
medication code in the ORDERS table to exist in the MEDICATION table and restricting
the deletion of a medication with one or more orders.
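The contrast between the cascade and restrict deletion rules can be reproduced end to end. The sketch below assumes Python's built-in sqlite3 module, a reduced set of columns, and identifiers without the # character; the constraint names fk_pat and fk_med follow the chapter, but the table definitions are simplified stand-ins rather than the DDL of Box 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption specific to SQLite: foreign keys are enforced only when enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE patient (
    pat_pa CHAR(2), pat_pn CHAR(5), pat_name VARCHAR(41) NOT NULL,
    CONSTRAINT pk_pat PRIMARY KEY (pat_pa, pat_pn)
);
CREATE TABLE medication (
    med_name VARCHAR(31) CONSTRAINT pk_med PRIMARY KEY,
    med_code CHAR(5) CONSTRAINT nn_medcode NOT NULL CONSTRAINT unq_med UNIQUE
);
CREATE TABLE orders (
    ord_rx       CHAR(13) CONSTRAINT pk_ord PRIMARY KEY,
    ord_pat_pa   CHAR(2) NOT NULL,
    ord_pat_pn   CHAR(5) NOT NULL,
    ord_med_code CHAR(5) NOT NULL
        CONSTRAINT fk_med REFERENCES medication (med_code)
        ON DELETE RESTRICT ON UPDATE RESTRICT,
    CONSTRAINT fk_pat FOREIGN KEY (ord_pat_pa, ord_pat_pn)
        REFERENCES patient (pat_pa, pat_pn) ON DELETE CASCADE
);
INSERT INTO patient    VALUES ('DB', '77642', 'Davis, Bill');
INSERT INTO medication VALUES ('Tagament', 'TAG');
INSERT INTO orders     VALUES ('104', 'DB', '77642', 'TAG');
""")


def count(table):
    """Number of rows currently in the named table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Restrict: the medication still has an order, so the deletion is refused.
try:
    conn.execute("DELETE FROM medication WHERE med_code = 'TAG'")
    restrict_outcome = "deleted"
except sqlite3.IntegrityError:
    restrict_outcome = "rejected"
meds_left = count("medication")

# Cascade: deleting the patient also deletes that patient's orders.
conn.execute("DELETE FROM patient WHERE pat_name LIKE '%Davis, Bill%'")
patients_left = count("patient")
orders_left = count("orders")

print(restrict_outcome, meds_left, patients_left, orders_left)
```

Once the cascade has removed the only order, the medication no longer has dependent rows, so a second attempt to delete it would be accepted.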
14
The LIKE operator and the percent character (%) are used for pattern matching. Pattern matching
in SQL is discussed in Section 12.1.6 of Chapter 12.
The SET clause specifies which columns are to be updated and calculates the new
values for the columns.
As illustrated in Example 1, it is important that an UPDATE statement not violate any
existing constraints.
Example 1.
UPDATE MEDICATION SET MEDICATION.MED_UNITPRICE = 5.00
WHERE MEDICATION.MED_CODE = 'TAG';
The UPDATE statement in this example violates the check constraint CHK_UNITPRICE
and thus generates the following message:
check constraint (CHK_UNITPRICE) violated
Observe the effect of the UPDATE statement in Example 2 designed to add 500 to the
quantity on hand for each medication with a unit price greater than 0.50.
Example 2.
UPDATE MEDICATION
SET MEDICATION.MED_QTY_ONHAND = MEDICATION.MED_QTY_ONHAND + 500
WHERE MEDICATION.MED_UNITPRICE > 0.50;
check constraint (CHK_QTY) violated
Although the CHK_QTY constraint, which involves the quantity on hand plus the quantity on
order, is violated for only one of the four otherwise qualifying rows, none of the four rows
is updated. When the WHERE clause excludes Tagament, as shown in Example 3, the
UPDATE is successful.
Example 3.
UPDATE MEDICATION
SET MEDICATION.MED_QTY_ONHAND = MEDICATION.MED_QTY_ONHAND + 500
WHERE MEDICATION.MED_UNITPRICE > 0.50 AND MEDICATION.MED_CODE <> 'TAG';
3 rows updated.
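The all-or-nothing behavior of a multi-row UPDATE can be verified with a small script. This is a sketch under loudly stated assumptions: it uses Python's built-in sqlite3 module; the upper limit of 3000 in chk_qty is a hypothetical value chosen so that Tagament fails the check, not the limit defined in Box 5; and the medication codes AAA through DDD and their prices and quantities are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medication (
    med_code        CHAR(5) CONSTRAINT pk_med PRIMARY KEY,
    med_unitprice   DECIMAL(3,2),
    med_qty_onhand  INTEGER,
    med_qty_onorder INTEGER,
    -- Hypothetical limit, assumed for this illustration only.
    CONSTRAINT chk_qty CHECK (med_qty_onhand + med_qty_onorder <= 3000)
);
INSERT INTO medication VALUES
    ('AAA', 1.00, 1000,   0),
    ('BBB', 2.00,  500, 100),
    ('CCC', 0.75,  200,   0),
    ('TAG', 3.00, 3000,   0),
    ('DDD', 0.25,  100,   0);
""")


def quantities():
    """Map each medication code to its current quantity on hand."""
    return dict(conn.execute("SELECT med_code, med_qty_onhand FROM medication"))


before = quantities()

# Four rows qualify, but TAG would exceed the limit, so the statement fails.
try:
    conn.execute(
        "UPDATE medication SET med_qty_onhand = med_qty_onhand + 500 "
        "WHERE med_unitprice > 0.50"
    )
    first_outcome = "updated"
except sqlite3.IntegrityError:
    first_outcome = "rejected"
after_failed = quantities()

# Excluding TAG lets the remaining three qualifying rows be updated.
conn.execute(
    "UPDATE medication SET med_qty_onhand = med_qty_onhand + 500 "
    "WHERE med_unitprice > 0.50 AND med_code <> 'TAG'"
)
after_success = quantities()

print(first_outcome, after_failed == before, after_success)
```

Comparing the snapshots taken before and after the failed statement shows that the rows processed ahead of the violating row were restored to their original values.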
Chapter Summary
The SQL-2003 standard contains a data definition language (SQL/DDL) that allows database
objects to be created. Section 10.1 focuses primarily on the CREATE TABLE statement as the
vehicle for defining required data by specifying the proper data type and using, where
appropriate, the NOT NULL clause, domain constraints (defined by the CHECK clause), entity
integrity (via the PRIMARY KEY clause), referential integrity (via the FOREIGN KEY clause
along with, as appropriate, update and deletion rules), and row-level constraints (defined by the
CHECK and UNIQUE clauses).
The discussion highlights the efficacy of using a logical schema based on the information-
preserving grammar in SQL-2003 SQL/DDL to create tables that fully capture all information
contained in the Design-Specific ER model.
SQL-2003 DDL includes an ALTER TABLE statement as well as a DROP TABLE statement.
The ALTER TABLE statement is used to add or drop a column, add or drop a constraint,
or to modify a column definition. The DROP TABLE statement is used to remove a table
(structure and content) from the database. Both the ALTER TABLE statement, if it involves
some kind of drop action, and the DROP TABLE statement must specify a drop behavior
associated with the action (i.e., dropping a column or constraint in the alteration of a table
or the dropping of an entire table). The options available in both cases are RESTRICT or
CASCADE. RESTRICT implies that the action is rejected if any other object referencing the
base table that is the subject of the ALTER TABLE statement or DROP TABLE statement
exists. On the other hand, the CASCADE option deletes the object (column, constraint, or
base table) along with all references to the object.
Section 10.2 covers the three statements that can be used to modify the database:
INSERT, DELETE, and UPDATE. Two types of INSERT statements exist. One allows for the
addition of a single row to a table, while the other allows for multiple rows to be added to a table.
Only one type of DELETE and UPDATE statement exists. However, with each statement it is
possible to delete (in the case of the DELETE statement) or update (in the case of the UPDATE
statement) one or more rows in the table.
Exercises
1. Discuss the differences between a relation and a table.
2. What are the minimum elements that must be included in the CREATE TABLE statement in
defining the structure of a table?
3. What is the difference between a column-level constraint and a row-level constraint?
4. Describe the SQL clauses used in the definition of:
a. a primary key constraint
b. an alternate key constraint
c. a foreign key constraint
d. a check constraint
5. With which of the four types of constraints in Exercise 4 is a requirement that an attribute
not contain a null value associated?
a. Write appropriate CREATE TABLE statements for the logical schema. Be sure to
define all appropriate constraints.
b. Write an ALTER TABLE statement to add to the MEDICATION table the attribute unit
cost that represents the per unit cost of the medication. The unit cost of a medication
can range from $0.50 to $7.50.
c. Write an ALTER TABLE statement that imposes the business rule that the list price of
a medication must be at least 20 percent higher than its unit cost.
d. Write an ALTER TABLE statement to drop the occupation attribute from the PATIENT
table.
10. You must have completed Exercise 9 before beginning this exercise, and thus have used
the SQL Data Definition Language to create tables for the three relations DRIVER,
TICKET_TYPE, and TICKET. Use the SQL INSERT statement to populate these tables
with the following data.
Ttp_offense Ttp_fine
Parking 15
Red Light 50
Speeding 65
Failure To Stop 30
Note: When entering the date of the ticket in the INSERT statement, enclose the entire date in single
quotes.
11. This exercise is based on the data sets associated with Figure 2.25 in Chapter 2.
a. Use the SQL Data Definition Language to create a relational schema that consists of
the following three relations:
COMPANY (Co_name, Co_size, Co_headquarters)
STUDENT (St_name, St_major, St_status)
INTERNSHIP (In_co_name, In_st_name, In_year, In_qtr, In_location, In_stipend)
When you create a table for each relation, in addition to defining its primary key, define
all the appropriate referential integrity constraints. Assume that Co_name is a character
data type of size 5, Co_size is an integer data type of size 4, Co_headquarters is a
character data type of size 10, St_name is a Varchar data type of size 10, St_major is
a character data type of size 20, St_status is a character data type of size 2,
In_co_name is a character data type of size 5, In_st_name is a Varchar data type of size 10,
In_year is an integer data type of size 4, In_qtr is a character data type of size 10,
In_location is a character data type of size 15, and In_stipend is an integer data type of
size 4. In_stipend represents the monthly stipend associated with the internship.
b. Use the SQL Insert statement to populate the three tables with the following data:
A 1000 Boston
B 500 Chicago
C 1000 Boston
D 400 Houston
Michelle Communications SR
Chris Chemistry JR
Andy Finance SO
Anna Communications SR
Amy Communications FR
12. The purpose of this exercise is to give you an opportunity to create the tables for Bearcat
Incorporated. The tables themselves are based on the relations that appear in the following figure:
In addition to the primary key constraints shown in the figure, these tables contain the
following constraints (i.e., business rules):
PLANT Table
• No two plants can have the same name.
• Plant numbers are allowed to range between 10 and 20 inclusive.
EMPLOYEE Table
• Each employee must have a first name and a last name.
• Employee salaries can range between $35,000 and $90,000 inclusive.
• Valid genders are “M” and “F.”
• Each employee must work in an existing plant.
• The supervisor of an employee must be an existing employee.
• No two employees can have the same first name, middle initial, last name, and name
tag combination.
BUILDING Table
• Each building must be part of an existing plant.
PROJECT Table
• Projects are located in the following cities: Bellaire, Blue Ash, Mason, Stafford, and
Sugarland.
• Each project must be associated with an existing plant.
• Project numbers range from 1 to 40 inclusive.
ASSIGNMENT Table
• Each assignment must be associated with an existing employee and an existing
project.
DEPENDENT Table
• The sex of a dependent can be: “M,” “F,” “m,” or “f.”
• A dependent must be a dependent of an existing employee.
• A dependent can be related to an employee in the following ways:
• A dependent can be the employee’s spouse.
• A dependent who is a mother or daughter must be a female.
• A dependent who is a father or son must be a male.
BCU_ACCOUNT Table
• A bcu_account can belong to an employee, a dependent, or (an employee and a
dependent).
• Valid account types are “C,” “S,” or “I.”
HOBBY Table
• Valid values for the indoor/outdoor attribute are “I” or “O.”
• Valid values for the group/individual attribute are “G” or “I.”
PARTICIPATION Table
• A participation must involve an existing hobby and an existing dependent.
Once the tables have been created, they must be tested to make sure that the table and
column definitions allow for entry of only valid data. For example, it should not be possible
to insert two plants with the same name. Data to give the tables you have created a thor-
ough test is stored in the file insertdata.sql15. If you have defined your constraints properly,
some of these insert statements will successfully insert a row into a table. On the other
hand, some of the insert statements should fail because the data they contain violate one
of the constraints associated with the table.
At the end of running the test data through your table definitions, there should be 4 rows in the
PLANT table, 7 rows in the BUILDING table, 13 rows in the EMPLOYEE table, 9 rows in the
PROJECT table, 7 rows in the ASSIGNMENT table, 6 rows in the DEPENDENT table, 6 rows in
the BCU_ACCOUNT table, 7 rows in the HOBBY table, and 4 rows in the PARTICIPATION table.
Note: You need not make any changes to the insertdata.sql file. This file was “seeded” with errors
in order to test whether your CREATE TABLE statements handle all of the necessary constraints.
15
Insertdata.sql can be downloaded from www.course.com (search on the ISBN for this book) or
obtained from your instructor.
CHAPTER 11
RELATIONAL ALGEBRA
The relational data model includes a group of basic data manipulation operations. As a result
of its theoretical foundation in set theory, the relational data model’s operations include
Union, Intersection, and Difference. Five other relational operators also exist: Select, Project,
Cartesian Product, Join, and Divide. Collectively, these eight operators comprise relational
algebra.1 This section discusses each of these eight operators and gives examples of their use
in the formulation of queries. The examples are based on the Madeira College registration
system introduced in Section 10.1.1.5. The Design-Specific ERD for Madeira College appears
in Figure 10.2, and its information-reducing and information-preserving logical schema are
shown in Figures 10.3 and 10.4, respectively. Figure 11.1 contains representative data for
the DEPARTMENT, PROFESSOR, COURSE, and SECTION relations used in the relational
algebra examples in this chapter along with representative data for other Madeira College
relations used in conjunction with the SQL examples that begin in Chapter 12. In an effort
to keep the amount of sample data used in the examples in Chapters 11, 12, and 13 reason-
ably small, not all cardinality ratio and participation constraints defined in the Design-
Specific ERD in Figure 10.2 are reflected in the relations in Figure 11.1. In addition, in order
to reduce the number of characters in various column names, the attribute naming conven-
tion introduced in Section 6.2 of Chapter 6 has been dropped. For example, the column
names PR_NAME, PR_EMPID, PR_PHONE, PR_DATEHIRED, PR_DPT_CODE, and PR_
SALARY used in the PROFESSOR relation in Box 4 have been changed to NAME, EMPID,
PHONE, DATEHIRED, DCODE, and SALARY in Chapter 11 and all of Chapters 12 and 13.2
The discussion of these relational algebra operators begins with the two that operate on
a single relation (unary operators) followed by those that operate on two relations (binary
operators). Figure 11.2 shows the relational algebra operators, along with their
symbolic representations. In Figure 11.2, the fundamental operators are indicated by a double
asterisk (**). The rest can be defined in terms of the fundamental operators. Nonetheless, these
derived operators are usually included in relational algebra as a matter of convenience.
1
The discussion of and notation used for various relational algebra operations in this chapter is based
on R. Elmasri and S.B. Navathe (2010), Fundamentals of Database Systems, Pearson Education. The
reader is encouraged to refer to Elmasri and Navathe (2010) and C. J. Date (2006), An Introduction to
Database Systems, Pearson Education for a more in-depth discussion of relational algebra.
2
The Madeira College relations contain a NAME column in the DEPARTMENT, COURSE, STUDENT
and PROFESSOR tables. When referenced in either a relational algebra operation or an SQL query,
the NAME column is always prefaced by the name of the relation or table. DCODE, COLLEGE,
COURSE#, and SECTION# are other column names that appear in multiple relations/tables.
DEPARTMENT Relation
NAME DCODE COLLEGE PHONE LOCATION HODID
COURSE Relation
NAME COURSE# CREDIT COLLEGE HRS DCODE
STUDENT Relation
SID NAME ADDRESS BIRTHDATE GRADELEVEL
SECTION Relation
SECTION# TIME MAXST ROOM COURSE# PROFID
PROFESSOR Relation
NAME EMPID PHONE DATEHIRED DCODE SALARY
TAKES Relation
SECTION# GRADE SID
101A2014 A KP78924
101A2014 A KS39874
101A2014 B BG66765
201S2013 C BE76598
104A2014 B KJ56656
104A2014 A KP78924
104A2014 A KS39874
401W2014 A KS39874
104A2014 A BE76598
401W2014 B BG66765
104A2014 C GS76775
USES Relation
COURSE# ISBN EMPID
has the same attributes as R. The Boolean expression specified as <selection condition> is
composed of a number of clauses of the following forms:
<attribute name><comparison operator><constant value>
or:
<attribute name><comparison operator><attribute name>
Observe that the result is an unnamed relation that contains a subset of the tuples in
the COURSE relation.
Result:
Use of the logical operator AND requires that both conditions (DCODE = 7 and HRS = 3)
be satisfied.
Result:
Selection Example 3. Which courses are offered in either the College of Arts and
Sciences or the College of Education?
Relational Algebra Syntax:
σ(COLLEGE = ‘Arts and Sciences’ or COLLEGE = ‘Education’) (COURSE)
Use of the logical operator OR allows either condition (COLLEGE = ‘Arts and Sciences’
or COLLEGE = ‘Education’) to be satisfied.
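A Selection with AND and OR conditions can be sketched in Python as a filter over a list of tuples. The function name select and the four COURSE tuples are hypothetical; they are not the Figure 11.1 data and serve only to show how the Boolean condition decides which tuples survive.

```python
def select(relation, condition):
    """Selection: keep the tuples of `relation` for which `condition` is true."""
    return [t for t in relation if condition(t)]


# Hypothetical COURSE tuples (illustrative values only).
COURSE = [
    {"NAME": "Course A", "COLLEGE": "Arts and Sciences", "HRS": 3, "DCODE": 1},
    {"NAME": "Course B", "COLLEGE": "Business", "HRS": 3, "DCODE": 7},
    {"NAME": "Course C", "COLLEGE": "Education", "HRS": 4, "DCODE": 7},
    {"NAME": "Course D", "COLLEGE": "Engineering", "HRS": 5, "DCODE": 4},
]

# AND: both conditions must hold.
both = select(COURSE, lambda t: t["DCODE"] == 7 and t["HRS"] == 3)

# OR: either condition may hold.
either = select(
    COURSE,
    lambda t: t["COLLEGE"] == "Arts and Sciences" or t["COLLEGE"] == "Education",
)
```

Every tuple in a Selection result keeps all of the attributes of the original relation; only the number of tuples changes.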
Result:
Since COLLEGE is not a superkey of COURSE, the number of tuples in the result
is less than the number of tuples in COURSE. There are five distinct values for the
COLLEGE attribute: Engineering, Education, Arts and Sciences, Business, and the null
value associated with the Architectural History course. This null value appears as the
blank line in the following result.
Result:
COLLEGE
-----------------
Engineering
Education
Arts and Sciences
Business
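The effect of duplicate elimination in a Projection can be sketched as follows. The function name project and the seven COURSE tuples are hypothetical. The tuples were chosen so that COLLEGE takes five distinct values, one of them null (None in Python), while COURSE# is unique in every tuple.

```python
def project(relation, attributes):
    """Projection: keep only `attributes` and eliminate duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in result:  # a relation is a set, so a duplicate is dropped
            result.append(row)
    return result


# Hypothetical COURSE tuples (illustrative values only).
COURSE = [
    {"COURSE#": "X101", "COLLEGE": "Engineering"},
    {"COURSE#": "X102", "COLLEGE": "Education"},
    {"COURSE#": "X103", "COLLEGE": "Arts and Sciences"},
    {"COURSE#": "X104", "COLLEGE": "Business"},
    {"COURSE#": "X105", "COLLEGE": "Business"},
    {"COURSE#": "X106", "COLLEGE": "Engineering"},
    {"COURSE#": "X107", "COLLEGE": None},
]

colleges = project(COURSE, ["COLLEGE"])         # COLLEGE is not a superkey
course_numbers = project(COURSE, ["COURSE#"])   # COURSE# is unique in this sample
```

Projecting on COLLEGE yields fewer tuples than COURSE contains, whereas projecting on an attribute that is unique in every tuple yields exactly as many tuples as the original relation.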
NAME COLLEGE
--------------- ----------------
Economics Arts and Sciences
QA/QM Business
Economics Education
Mathematics Engineering
IS Business
Philosophy Arts and Sciences
Since there are six tuples in DEPARTMENT and 12 tuples in COURSE, a total of
72 tuples that contain 12 attributes per tuple is produced when the Cartesian Product
of DEPARTMENT and COURSE is formed.
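The tuple and attribute counts can be checked with a short Python sketch. Only the sizes are taken from the text (six DEPARTMENT tuples, 12 COURSE tuples, six attributes in each relation); the attribute values are made-up placeholders.

```python
from itertools import product

# Placeholder tuples: the values are invented, only the sizes matter.
DEPARTMENT = [tuple(f"d{i}_{j}" for j in range(6)) for i in range(6)]
COURSE = [tuple(f"c{i}_{j}" for j in range(6)) for i in range(12)]

# Cartesian Product: concatenate every DEPARTMENT tuple with every COURSE tuple.
cartesian = [d + c for d, c in product(DEPARTMENT, COURSE)]

print(len(cartesian), len(cartesian[0]))  # prints: 72 12
```

The number of tuples is the product of the two tuple counts (6 × 12), while the number of attributes is the sum of the two attribute counts (6 + 6).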
In order to illustrate the result obtained by a Cartesian Product operation, consider
two relations—D and C derived from the DEPARTMENT and COURSE relations. Relation
D contains six tuples with the attributes NAME, DCODE, and COLLEGE; relation C contains
12 tuples with the attributes NAME, COURSE#, CREDIT, and DCODE. The content of
relations C and D is shown in Figure 11.5.
C Relation
NAME COURSE# CREDIT DCODE
D Relation
NAME DCODE COLLEGE
The first 24 of the 72 tuples produced by the Cartesian Product of relations C and D
are as follows:
Result:
Note that the result shown here is the concatenation of the 12 tuples of relation C (see
columns 1–4) with the first two tuples of relation D (see columns 5–7).
The Cartesian Product operation by itself is generally of little value. It is useful when
followed first by a Selection operation that matches values of attributes coming from the
component relations (technically, a Cartesian Product operation followed by a Selection
operation is equivalent to a Join operation) and sometimes by a Projection operation that
selects certain columns from the selected set of tuples.
Cartesian Product Example 2. What are the names of the departments and associated
colleges that offer a four-hour course?
π (DEPARTMENT.NAME, COLLEGE)(σ (HRS = 4 and COURSE.DCODE = DEPARTMENT.DCODE) (COURSE X DEPARTMENT))
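The sequence of Cartesian Product, Selection, and Projection in this example can be sketched in Python. The data are hypothetical (two departments and three courses with invented DCODE values), and the COURSE tuples carry only the attributes needed for the query.

```python
from itertools import product

# Hypothetical data (illustrative values only).
DEPARTMENT = [
    {"NAME": "Economics", "DCODE": 1, "COLLEGE": "Arts and Sciences"},
    {"NAME": "IS", "DCODE": 7, "COLLEGE": "Business"},
]
COURSE = [
    {"NAME": "Course A", "HRS": 4, "DCODE": 1},
    {"NAME": "Course B", "HRS": 3, "DCODE": 7},
    {"NAME": "Course C", "HRS": 4, "DCODE": None},  # not affiliated with a department
]

# COURSE X DEPARTMENT: every pairing, with attribute names qualified by relation name.
pairs = [
    {
        **{f"COURSE.{k}": v for k, v in c.items()},
        **{f"DEPARTMENT.{k}": v for k, v in d.items()},
    }
    for c, d in product(COURSE, DEPARTMENT)
]

# Selection: four-hour courses paired with the department that offers them.
selected = [
    p for p in pairs
    if p["COURSE.HRS"] == 4 and p["COURSE.DCODE"] == p["DEPARTMENT.DCODE"]
]

# Projection: department name and college, with duplicates removed.
answer = sorted({(p["DEPARTMENT.NAME"], p["DEPARTMENT.COLLEGE"]) for p in selected})
```

Of the six pairings produced by the product, only the one that matches a four-hour course with its own department survives the Selection.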
The remainder of this section consists of the definition and examples of the three set
theoretic operators.
Union: The result of this operation, denoted by R ∪ S, is a relation that includes all
tuples that belong to either R or S or to both R and S. Duplicate tuples are eliminated.
Union Example. Let relations R and S be derived from the SECTION relation.
R contains tuples indicating Fall quarter sections (the fourth character in the SECTION#
is an A) and S contains tuples listing sections offered in a Lindner classroom (ROOM =
‘Lindner’).
RELATION R
RELATION S
Relational Algebra Syntax and Result: R ∪ S
The Union R ∪ S contains the sections that are offered either exclusively in a Fall
quarter or exclusively in a Lindner classroom, or offered in a Lindner classroom during a
Fall quarter (see the third and seventh tuples).
Intersection: The result of this operation, denoted by R ∩ S, is a relation that includes
all tuples that are in both R and S.
Intersection Example. Using the data in the relations R and S given previously, form
the intersection of R and S.
Relational Algebra Syntax and Result: R ∩ S
Observe that the sections shown here are the only sections offered in a Lindner class-
room during a Fall quarter.
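Because relations are sets of tuples, Union and Intersection map directly onto Python's set operators. The sketch below uses hypothetical (SECTION#, ROOM) tuples; the room name "Hall B" is invented, and the fourth character of each section number is read as the quarter.

```python
# Hypothetical (SECTION#, ROOM) tuples (illustrative values only).
SECTION = {
    ("101A2014", "Lindner"),
    ("104A2014", "Hall B"),
    ("201S2013", "Lindner"),
    ("401W2014", "Hall B"),
}

R = {t for t in SECTION if t[0][3] == "A"}     # Fall-quarter sections
S = {t for t in SECTION if t[1] == "Lindner"}  # sections held in a Lindner classroom

union = R | S          # in R, in S, or in both; the shared tuple appears only once
intersection = R & S   # only the tuples that are in both R and S
```

The tuple that belongs to both R and S contributes a single tuple to the union, which is why the union holds three tuples rather than four.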
Note that the result obtained by subtracting S from R is equal to all sections offered
during a Fall quarter in a classroom other than Lindner. The classroom associated with
the section in the first tuple is not available (i.e., is a null value) and satisfies the condition
that the section be offered in a classroom other than Lindner.
Difference Example 2. Using the data in the relations R and S given previously, form
the difference S minus R.
Relational Algebra Syntax and Result: S − R
Observe that the result obtained by subtracting R from S is equal to all sections
offered in a classroom located in Lindner during a quarter other than the Fall quarter.
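The two Difference examples show that the operation is not commutative. A minimal Python sketch, using hypothetical (SECTION#, ROOM) tuples in which one Fall section has no classroom assigned (None) and the room name "Hall B" is invented:

```python
# Hypothetical (SECTION#, ROOM) tuples (illustrative values only).
SECTION = {
    ("101A2014", None),       # classroom not available (null)
    ("102A2014", "Hall B"),
    ("104A2014", "Lindner"),
    ("201S2013", "Lindner"),
    ("401W2014", "Hall B"),
}

R = {t for t in SECTION if t[0][3] == "A"}     # Fall-quarter sections
S = {t for t in SECTION if t[1] == "Lindner"}  # sections held in a Lindner classroom

r_minus_s = R - S   # Fall sections that are not in a Lindner classroom
s_minus_r = S - R   # Lindner sections that are not offered in a Fall quarter
```

The Fall section with a null classroom is a member of R but not of S, so it remains in R − S, just as the text describes.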
The general form of a Join operation on two relations R (A1, A2, …, An) and S (B1, B2, …,
Bm) is:
R ⋈ <join condition> S
The result of the Join operation is a relation Q with n + m attributes Q (A1, A2, …, An,
B1, B2, …, Bm), in that order. Q has one tuple for each combination of tuples (one from R
and one from S) whenever the combination satisfies the join condition. This is the main
difference between Cartesian Product and Join; in Join, only combinations of tuples
satisfying the join condition appear in the result, whereas in the Cartesian Product, all
combinations of tuples are included in the result. The join condition is specified on attri-
butes from the two relations R and S and is evaluated for each combination of tuples. Each
tuple combination for which the join condition evaluates to true is included in the result-
ing relation Q as a single combined tuple. In order to join the two relations R and S, they
must be join compatible—that is, the join condition must involve attributes from R and S
that share the same domain.
A general join condition is of the form:
<condition> AND <condition> AND … AND <condition>
where each condition is of the form Ai θ Bj, where Ai is an attribute of R, Bj is an attribute
of S, Ai and Bj have the same domain, and θ (theta) is one of the comparison
operators {=, ≠, <, ≤, >, ≥}. A Join operation with such a general join condition is called
a Theta join. Tuples whose join attributes are null do not appear in the result.
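A Theta join can be sketched in Python as a nested loop that tests every combination of tuples against the ANDed conditions. The function name theta_join, the attribute names A1, A2, B1, B2, and the sample tuples are hypothetical. Null values are represented by None, and any comparison involving a null is treated as not satisfied.

```python
import operator


def theta_join(r, s, conditions):
    """Join two lists of dicts. `conditions` is a list of
    (attribute_of_r, comparison, attribute_of_s) triples that are ANDed together."""
    result = []
    for t_r in r:
        for t_s in s:
            satisfied = True
            for a, compare, b in conditions:
                x, y = t_r[a], t_s[b]
                # A null join attribute never satisfies a comparison.
                if x is None or y is None or not compare(x, y):
                    satisfied = False
                    break
            if satisfied:
                # Attributes of R first, then attributes of S, in that order.
                result.append(tuple(t_r.values()) + tuple(t_s.values()))
    return result


# Hypothetical relations R(A1, A2) and S(B1, B2).
R = [{"A1": 1, "A2": "x"}, {"A1": 2, "A2": "y"}, {"A1": None, "A2": "z"}]
S = [{"B1": 1, "B2": "p"}, {"B1": 3, "B2": "q"}]

equi = theta_join(R, S, [("A1", operator.eq, "B1")])   # θ is =
less = theta_join(R, S, [("A1", operator.lt, "B1")])   # θ is <
```

The tuple of R whose A1 value is null appears in neither result, and every result tuple has n + m = 4 attributes.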
Result:
Observe that the name (NAME, see column 5) and college (COLLEGE, see column 7) of the
department associated with each course appear in the result just shown.
Had the Equijoin involved the complete COURSE and DEPARTMENT relations joined
on the COURSE.DCODE and DEPARTMENT.DCODE attributes instead of just relations C and
D, the result would have also included 11 tuples, with each tuple containing 12 attributes.
While it is possible to join COURSE and DEPARTMENT over the COLLEGE attribute
from COURSE and the COLLEGE attribute from DEPARTMENT, the result of such a join
would be meaningless; each tuple in COURSE would be concatenated not only with the
tuple from DEPARTMENT that offers the course but also with the tuples from DEPART-
MENT with the same DEPARTMENT.COLLEGE but associated with a different department
code than that associated with the course. Using the data in the COURSE and DEPART-
MENT relations in Figure 11.1, it is left as an exercise for the reader to demonstrate that
an Equijoin of COURSE and DEPARTMENT on the attributes COURSE.COLLEGE and
DEPARTMENT.COLLEGE yields a result that contains 19 tuples.
Natural Join Example. Join the C and D relations over their common attribute
department code (DCODE in relation D and DCODE in relation C).
Relational Algebra Syntax:3
C* C.DCODE = D.DCODE D
3
Contrary to the requirement that attributes have unique names over the entire relational schema,
the standard definition of “Natural Join” requires that the two join attributes (or each pair of join
attributes) have the same name. If this is not the case, a renaming operation must be applied first.
See Elmasri and Navathe (2010) for a discussion of the use of the renaming operation in represent-
ing a Natural Join.
Result:
Observe that only one (DCODE) of the two join attributes common to both relations
(D.DCODE and C.DCODE) appears in the result just shown.
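The way a Natural Join keeps a single copy of the join attribute can be sketched in Python. The function name natural_join and the data are hypothetical. To keep the example small, the name attributes of the two relations are called CNAME and DNAME here so that DCODE is the only attribute the relations share.

```python
def natural_join(r, s, on):
    """Equijoin of two lists of dicts on attribute `on`, keeping one copy of `on`."""
    result = []
    for t_r in r:
        for t_s in s:
            if t_r[on] is not None and t_r[on] == t_s[on]:
                # Merging the two dicts collapses the shared attribute into one column.
                result.append({**t_r, **t_s})
    return result


# Hypothetical relations (illustrative values only).
C = [
    {"CNAME": "Course A", "COURSE#": "X101", "DCODE": 1},
    {"CNAME": "Course B", "COURSE#": "X102", "DCODE": 2},
    {"CNAME": "Course C", "COURSE#": "X103", "DCODE": None},
]
D = [
    {"DNAME": "Economics", "DCODE": 1, "COLLEGE": "Arts and Sciences"},
    {"DNAME": "Philosophy", "DCODE": 3, "COLLEGE": "Arts and Sciences"},
]

joined = natural_join(C, D, "DCODE")
```

C has three attributes and D has three, yet each joined tuple has five attributes rather than six because DCODE appears only once.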
The Join operation is used to combine data from multiple relations so that related
information can be presented in a single relation. As illustrated in Cartesian Product
Example 2, Join operations are typically followed by a Projection operation.
In such a Theta Join, a tuple from D is concatenated with a tuple from C only when an
inequality exists for each of the join conditions. Thus, the first tuple in D is not
concatenated with the first tuple in C because the join condition is not satisfied. Observe,
however, that the join condition is satisfied when the first tuple of D is evaluated against
all other tuples in C, thus resulting in the first 10 tuples of the 55 tuples in the result.4
Using the data for relations C and D shown in Figure 11.5, the reader is encouraged to
verify why this Theta Join produces a total of 55 tuples.
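The count of 55 can be checked with a short Python sketch. The DCODE values below are hypothetical; only the shape of the data follows the text: six departments, 12 courses of which one has a null DCODE, each remaining course belonging to exactly one department, and the first department offering only the first course.

```python
# Hypothetical DCODE values; only the counts follow the text.
D_DCODES = [1, 2, 3, 4, 5, 6]                        # six tuples in D
C_DCODES = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, None]   # 12 tuples in C, one with a null DCODE


def count_join(compare):
    """Count the (D tuple, C tuple) combinations that satisfy the comparison.
    A null DCODE never satisfies a comparison."""
    return sum(
        1
        for d in D_DCODES
        for c in C_DCODES
        if c is not None and compare(d, c)
    )


equal_pairs = count_join(lambda d, c: d == c)
unequal_pairs = count_join(lambda d, c: d != c)
first_department = sum(1 for c in C_DCODES if c is not None and c != D_DCODES[0])

print(equal_pairs, unequal_pairs, first_department)  # prints: 11 55 10
```

Of the 6 × 11 = 66 combinations that involve a non-null C.DCODE, 11 satisfy the equality, which leaves 66 − 11 = 55 combinations for the inequality.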
4
The reason why the join condition is not satisfied for the 12th tuple in COURSE where C.DCODE is
a null value is discussed in Section 12.1.5 of Chapter 12.
Result:
Observe how the sixth tuple of D is concatenated with the tuple of null values from
the C relation to produce the twelfth tuple in the result, revealing that the Philosophy
Department has yet to offer a course.
Right Outer Join Example. Do a Right Outer Join of relations D and C over their
common attributes (D.DCODE in relation D and C.DCODE in relation C).
Result:
Observe how each tuple of C (including the course in Architectural History that is not
affiliated with a department) appears in the result just shown.
Full Outer Join Example. Join the D and C relations, making sure that each tuple
from each relation appears in the result.
A Full Outer Join of D and C, expressed as:
Relational Algebra Syntax:
D ⟗ D.DCODE = C.DCODE C
adds a blank tuple to both relation D and relation C to ensure that each tuple in each
relation is reflected in the result. Observe how each tuple in each relation appears in the
result shown here:
It is left as an exercise for the reader to verify that the union of the results of a Left Outer
Join and a Right Outer Join is equal to the result of a Full Outer Join.
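One way to approach this verification is a small Python sketch. The relations below are hypothetical reductions of D and C to (NAME, DCODE) pairs with invented DCODE values; Philosophy has no course and Architectural History has no department, mirroring the two unmatched tuples discussed above.

```python
# Hypothetical (NAME, DCODE) tuples (illustrative values only).
D = [("Economics", 1), ("IS", 2), ("Philosophy", 3)]
C = [("Course A", 1), ("Course B", 2), ("Course C", 2), ("Architectural History", None)]

NULL_D = (None, None)  # tuple of null values standing in for a missing D tuple
NULL_C = (None, None)  # tuple of null values standing in for a missing C tuple


def matches(d, c):
    """Join condition D.DCODE = C.DCODE; a null DCODE never matches."""
    return c[1] is not None and d[1] == c[1]


inner = {d + c for d in D for c in C if matches(d, c)}
unmatched_d = {d + NULL_C for d in D if not any(matches(d, c) for c in C)}
unmatched_c = {NULL_D + c for c in C if not any(matches(d, c) for d in D)}

left_outer = inner | unmatched_d                 # every tuple of D appears
right_outer = inner | unmatched_c                # every tuple of C appears
full_outer = inner | unmatched_d | unmatched_c   # every tuple of D and of C appears
```

Because the matched tuples are common to both outer joins and set union eliminates duplicates, the union of left_outer and right_outer contains exactly the tuples of full_outer.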
For ease of understanding, Kifer, Bernstein, and Lewis (2005) illustrate how the Divi-
sion operation can be decomposed into a sequence of Projection, Cartesian Product, and
Difference operations, as shown here:
T1 = πA(R) × S     All possible pairings of the A values in R with the B values in S.
T2 = πA(T1 − R)    All those A values in R that are not associated in R with every B value in S.
                   These are the A values that should not be in the answer.
T3 = πA(R) − T2    The quotient: all those A values in R that are associated in R with all B values in S.
Divide Example. List the course numbers of courses that are offered in all quarters
during which course sections are offered.5
Relational Algebra Syntax:
R = π (COURSE#, SUBSTR(SECTION#,4,1)) (SECTION)
S = π (SUBSTR(SECTION#,4,1)) (SECTION)
R ÷ S
Result:
COURSE#
-------
22QA375
Relation R                            Relation S
COURSE#    SUBSTR(SECTION#,4,1)6      SUBSTR(SECTION#,4,1)
---------  --------------------       --------------------
22QA375    A                          A
22IS270    A                          S
22IS330    S                          W
22IS832    W                          U
20ECES212  A
22QA375    U
22IS330    A
22QA375    S
22QA375    W
5
The fourth character position of the SECTION# represents the quarter in which the section is
offered. The letters “A,” “S,” “W,” and “U” represent the Fall, Spring, Winter, and Summer quarters,
respectively.
6
SUBSTR(SECTION#,4,1) is used to extract the fourth character containing the quarter in which the
section is offered from the 8-character section number. The SQL SUBSTR(char, m, [n]) function is
discussed in Section 13.1.1 of Chapter 13.
πCOURSE#(R)   ×   S   =   T1
-----------       -       ----------------
22QA375           A       22QA375    A
22IS270           S       22QA375    S
22IS330           W       22QA375    W
22IS832           U       22QA375    U
20ECES212                 22IS270    A
                          22IS270    S
                          22IS270    W
                          22IS270    U
                          22IS330    A
                          22IS330    S
                          22IS330    W
                          22IS330    U
                          22IS832    A
                          22IS832    S
                          22IS832    W
                          22IS832    U
                          20ECES212  A
                          20ECES212  S
                          20ECES212  W
                          20ECES212  U
T2 = πCOURSE#(T1 − R): all courses that are not offered during each quarter:
(T1 − R)              T2
----------------      ---------
22IS270    S          22IS270
22IS270    W          22IS330
22IS270    U          22IS832
22IS330    W          20ECES212
22IS330    U
22IS832    A
22IS832    S
22IS832    U
20ECES212  S
20ECES212  W
20ECES212  U
T3 = πCOURSE#(R) − T2: all courses offered during each quarter:
πCOURSE#(R)   −   T2          =   T3
-----------       ---------       ---------
22QA375           22IS270         22QA375
22IS270           22IS330
22IS330           22IS832
22IS832           20ECES212
20ECES212
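The three steps above can be replayed with ordinary set operations. The sketch below is an illustration in Python rather than relational algebra notation; it uses the tuples of R and S from the Divide Example, and the variable names simply mirror T1, T2, and T3.

```python
# R pairs each course with a quarter in which it is offered; S lists the quarters.
R = {
    ("22QA375", "A"), ("22IS270", "A"), ("22IS330", "S"), ("22IS832", "W"),
    ("20ECES212", "A"), ("22QA375", "U"), ("22IS330", "A"),
    ("22QA375", "S"), ("22QA375", "W"),
}
S = {"A", "S", "W", "U"}

# Projection of R on COURSE# (a set, so duplicates disappear).
courses = {course for course, _ in R}

# T1: Cartesian Product of the projected courses with S.
T1 = {(course, quarter) for course in courses for quarter in S}

# T2: courses that are missing at least one quarter in R.
T2 = {course for course, _ in T1 - R}

# T3: the quotient R ÷ S.
T3 = courses - T2

print(sorted(T2))
print(sorted(T3))
```

Only 22QA375 is paired in R with all four quarters, so it is the only course that survives the final Difference.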
Table 11.1 summarizes the basic relational algebra operators and notation.
Relational Algebra
Operator      Purpose                                                    Notation
------------  ---------------------------------------------------------  ---------------------------
Select        Selects all tuples that satisfy the selection condition    σ <selection condition> (R)
              from a relation R.
Project       Produces a new relation with only some of the attributes   π <attribute list> (R)
              of R and removes duplicate tuples.
Equijoin      Produces all the combinations of tuples from R1 and R2     R1 [X] <join condition> R2
              that satisfy a join condition with only equality
              comparisons. R1 and R2 must be join compatible.
Natural Join  Produces all the combinations of tuples from R1 and R2     R1 * <join condition> R2
              that satisfy a join condition with only equality
              comparisons, except that one of the join attributes is
              not included in the resulting relation. R1 and R2 must
              be join compatible.
Theta Join    Produces all the combinations of tuples from R1 and R2     R1 [X] <join condition> R2
              that satisfy a join condition which does not have to
              involve equality comparisons. R1 and R2 must be join
              compatible.
*Source: R. Elmasri and S. B. Navathe (2010), Fundamentals of Database Systems, Pearson Education.
projected back on the attributes of R. The Semi-Join operation can be expressed using the
Projection and Join operations, as follows:
πR (R |X| A=B S)
Result:
The result of the Semi-Minus operation is thus the tuples of R that have no counterpart in S.
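Both operations can be sketched with sets. The Python below is a minimal illustration under stated assumptions: the COURSE and SECTION tuples are hypothetical, semi_join and semi_minus are helper names invented here, and the matching rule mirrors the fall 2014 condition (characters 4 through 8 of SECTION# equal to A2014).

```python
# Hypothetical data: COURSE tuples are (COURSE#, NAME),
# SECTION tuples are (SECTION#, COURSE#).
COURSE = {
    ("22IS270", "Principles of IS"),
    ("22IS832", "Database Principles"),
    ("22QA411", "Supply Chain Analysis"),
}
SECTION = {
    ("101A2014", "22IS270"),
    ("102S2015", "22QA411"),
}


def semi_join(r, s, match):
    """Tuples of r that have at least one counterpart in s."""
    return {rt for rt in r if any(match(rt, st) for st in s)}


def semi_minus(r, s, match):
    """Tuples of r that have no counterpart in s: r minus (r semi-join s)."""
    return r - semi_join(r, s, match)


def offered_fall_2014(course, section):
    # Same course number, and SUBSTR(SECTION#,4,5) = 'A2014'.
    return course[0] == section[1] and section[0][3:8] == "A2014"


with_fall_section = semi_join(COURSE, SECTION, offered_fall_2014)
without_fall_section = semi_minus(COURSE, SECTION, offered_fall_2014)
print(sorted(with_fall_section))
print(sorted(without_fall_section))
```

Note that the result of either operation contains only attributes of the first relation; SECTION is used purely as a filter.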
Semi-Minus Example. List complete details of all courses for which sections are not
offered in the fall 2014 quarter.
Relational Algebra Syntax:
COURSE − πCOURSE(COURSE |X| (COURSE.COURSE# = SECTION.COURSE# and SUBSTR(SECTION#,4,5) = ‘A2014’) SECTION)
Result:
the value of some of their attributes and then applying an aggregate function indepen-
dently to each group. An example would be to group the courses taken by course number
and then count the number of students taking each course during the year 2014.
The Aggregate Function operation can be defined using the symbol f to specify these
types of requests, as follows:
Result:
COLLEGE            AVG(HRS)
-----------------  --------
                   3
Engineering        3
Education          3.5
Arts and Sciences  3
Business           3
As mentioned in Section 11.1.2 in the context of Projection Example 1, there are five
distinct values for the COLLEGE attribute: Engineering, Education, Arts and Sciences,
Business, and the null value associated with the Architectural History course. From a
relational algebra standpoint, the null value is considered one of the five values of the
COLLEGE grouping attribute.
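The treatment of the null value as its own group can be observed in an SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in for a relational DBMS; the rows are a small, partly hypothetical subset of COURSE (the "Hypothetical Seminar" row is invented so that one group has a non-integer average).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (NAME TEXT, COLLEGE TEXT, HRS INTEGER)")
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?, ?)",
    [
        ("Intro to Economics", "Arts and Sciences", 3),
        ("Programming in C++", "Engineering", 3),
        ("Financial Accounting", "Education", 3),
        ("Hypothetical Seminar", "Education", 4),   # invented row
        ("Architectural History", None, 3),          # null COLLEGE
    ],
)

# Group on COLLEGE and average HRS within each group.
groups = conn.execute(
    "SELECT COLLEGE, AVG(HRS) FROM COURSE GROUP BY COLLEGE ORDER BY COLLEGE"
).fetchall()

for college, avg_hrs in groups:
    print(college, avg_hrs)
```

In SQLite the null group sorts first in ascending order; other platforms may place it last, so the position of that row should not be relied upon.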
Chapter Summary
Chapter 11 introduces relational algebra, a mathematical expression of data retrieval methods
prescribed by E. F. Codd, as a means to specify the logic for data retrieval from a relational
database. The tuples of a relation can be considered elements of a set and thus can be
involved in operations. In the same way that algebra is a system of operations on numbers,
relational algebra is a system of operations on relations. Expressed in terms of the relations
R and S, the basic operations of relational algebra are Union (R ∪ S), Difference (R – S),
Selection (σ<selection condition>(R)), Projection (π<attribute list>(R)), and Cartesian Product
(R(A1, A2, …, An) X S(B1, B2, …, Bm)), where Ai and Bj are attributes of R and S, respectively.
Certain combinations of these five operations can be used to define three other basic
operations. When the Union of R and S is formed, an Intersection operation identifies those
tuples common to both R and S. A Join operation consists of a Cartesian Product followed by
a Selection. A Division operation can be expressed as a sequence of Projection, Cartesian
Product, and Difference operations. A summary of the basic relational algebra operations
discussed in this chapter appears in Figure 11.2.
Exercises
1. What constitutes union compatibility?
2. What are the purposes of the Union, Intersection, and Difference operations?
3. What is a Cartesian Product operation?
4. What is the difference between the result obtained from a Selection operation versus a
Projection operation?
5. Why does the relation created as a result of a Projection operation on a relation that includes
the primary key not contain fewer tuples than those in the source relation?
6. What is a Join operation? What is meant when it is said that two relations are join compat-
ible? What is the difference between an Inner Join operation and an Outer Join operation?
7. What is meant by the term “division compatibility”?
8. This question is based on the four relations shown below. Show the results along with the
relational algebra expressions for the following four retrieval requests.
a. Display the products and their associated prices for stores located in Houston.
b. Which products are not in the inventory of any store?
c. Which products are offered at a 10% discount?
d. Which stores stock products with a price greater than $1000?
STORE PRODUCT
Computer
INVENTORY DISC_STRUCTURE
14 Television 280 30
14 Humidifier 30 10
17 Television 10
11 Computer 120
11 Refrigerator 180
11 Lawn Mower
a. What are the Course IDs for those courses that do not have an assigned tutor?
b. What are the names of the tutors for MIS 4372?
c. What are the names of the tutors for those courses with no prerequisites?
d. What are the course names and number of prerequisites for those tutors hired in 2009?
TUTOR
TUTOR_ASSIGNMENT
TutorID CourseID
COURSE
Number of
CourseID CourseName Prerequisites
CHAPTER 12
STRUCTURED QUERY
LANGUAGE (SQL)
SQL is the standard language for manipulating relational databases. The language was
created to facilitate implementation of relational algebra in a database. Like relational
algebra, SQL uses one or more relations as input and produces a single relation as
output.1 This chapter and Chapter 13 contain an informal overview of the use of the
SQL SELECT statement for information retrieval. Through the use of numerous example
queries, the discussion introduces the SELECT statement’s features, beginning gradually
with queries based on a single table followed by queries that retrieve data from several
tables. As was the case in Chapter 10, except where indicated, the examples in this
chapter as well as in Chapter 13 are based on the syntax associated with the SQL-2003
standard.
Before proceeding further, it is important to note that space does not permit a
thorough and complete discussion of all features of the SQL SELECT statement as well
as other features associated with SQL’s data definition language and data manipulation
language. Hundreds of books and reference manuals are available on SQL, and the
interested reader is encouraged to consider such sources for more comprehensive syntax
than that shown in this book. In addition, many Web sites have information about
standard SQL and its implementation under various database platforms.
As mentioned previously, the SQL SELECT statement is employed to retrieve
(i.e., query) data from tables (i.e., relations) and is used in conjunction with all relational
algebra operations. For example, using a SELECT statement, it is possible to view all the
columns and rows within a table (i.e., to execute a relational algebra Selection operation)
or specify that only certain columns and rows be viewed (execute a Selection operation
followed by a Projection operation).
1
SQL uses the terms “table,” “row,” and “column” for “relation,” “tuple,” and “attribute,” respectively.
Thus, when the discussion focuses on SQL, the terms “table,” “row,” and “column” will be used.
When the discussion focuses on relational algebra operations, the terms “relation,” “tuple,” and
“attribute” will be used.
The syntax for an SQL statement gives the basic structure, or rules, required to
execute the statement. The basic form of the SQL SELECT statement is called a
select-from-where block and contains the three clauses SELECT, FROM, and WHERE in
the following form:
SELECT <column list>
FROM <table list>
WHERE <condition>
2
As we will see, the table list may include both database views and inline views.
3
Kifer, M., A. Bernstein, and P. M. Lewis. Databases and Transactions Processing: An Application-
Oriented Approach, Second Edition. Addison-Wesley, 2005b.
4
While the WHERE clause is optional, the SELECT and FROM clauses are not.
5
The SQL-2003 standard actually prescribes the semicolon as a statement terminator only in the
case of embedded SQL.
or:
SELECT *
FROM COURSE
WHERE COURSE.HRS = 3;
Result:
NAME COURSE# CREDIT COLLEGE HRS DCODE
---------------------- ---------- ------- ---------------- ---- ------
Intro to Economics 15ECON112 U Arts and Sciences 3 1
Supply Chain Analysis 22QA411 U Business 3 3
Principles of IS 22IS270 G Business 3 7
Programming in C++ 20ECES212 G Engineering 3 6
Optimization 22QA888 G Business 3 3
Financial Accounting 18ACCT801 G Education 3 4
Database Principles 22IS832 G Business 3 7
Systems Analysis 22IS430 G Business 3 7
Architectural History 05ARCH101 U 3
The asterisk (*) means that all columns from the table are to be selected. The WHERE
clause tells SQL to search the rows in the COURSE table and to return (i.e., display) only
those rows where the value of COURSE.HRS is equal to exactly 3. In addition, while not
required, in an SQL SELECT statement it is a good idea to prefix each column name with
its table name in order to minimize ambiguity and confusion.6
Since SQL SELECT statements are not case sensitive, the two SQL statements shown
here could also have been written as follows:
select name, course#, credit, college, hrs, dcode
from course
where hrs = 3;
select *
from course
where hrs = 3;
6
The convention of prefixing each column name with its table name is used throughout this chapter.
SQL, however, permits duplicate column names as long as they appear in different tables. When two
tables involved in a SELECT statement share a common column name or names, the qualification of
the column name(s) with the appropriate table name is required if the column name appears in the
<attribute list> or in the WHERE condition. Otherwise, ambiguity will exist and an error message of
the form “column ambiguously defined” will appear.
lists the three-hour courses in descending order by the department number in which the
course is offered, whereas the query:
SELECT *
FROM COURSE
WHERE COURSE.HRS = 3
ORDER BY COURSE.DCODE DESC, COURSE.NAME;
lists the three-hour courses in descending order by the department number in which the
course is offered and in ascending order by course name within each department.
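The effect of a two-column sort can be tried on a few rows. The following sketch runs the ORDER BY clause through Python's built-in sqlite3 module, used here only as a convenient SQL engine; the table holds an abbreviated COURSE with five of the three-hour courses listed earlier, and only the columns needed for the sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (NAME TEXT, DCODE INTEGER, HRS INTEGER)")
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?, ?)",
    [
        ("Systems Analysis", 7, 3),
        ("Optimization", 3, 3),
        ("Database Principles", 7, 3),
        ("Supply Chain Analysis", 3, 3),
        ("Principles of IS", 7, 3),
    ],
)

# DCODE descending first, then NAME ascending (the default) within each DCODE.
rows = conn.execute(
    "SELECT COURSE.DCODE, COURSE.NAME "
    "FROM COURSE "
    "WHERE COURSE.HRS = 3 "
    "ORDER BY COURSE.DCODE DESC, COURSE.NAME"
).fetchall()

for dcode, name in rows:
    print(dcode, name)
```

The second sort key only breaks ties left by the first: the names are alphabetical within department 7 and again within department 3, not across the whole result.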
Example 1.1.2 (Corresponds to Selection Example 2 in Section 11.1.1). Which
courses offered by department 7 are three-hour courses?
SQL SELECT Statement:
SELECT *
FROM COURSE
WHERE COURSE.DCODE = 7 AND COURSE.HRS = 3;
Result:
Use of the logical operator AND requires that both conditions (i.e., COURSE.DCODE = 7
and COURSE.HRS = 3) be satisfied.
Example 1.1.3. Which sections have a maximum number of students greater than 30
or are offered in Lindner 110?
SQL SELECT Statement:
SELECT *
FROM SECTION
WHERE SECTION.MAXST > 30 OR SECTION.ROOM = 'Lindner 110';
Result:
Use of the logical operator OR requires that either one or both conditions (i.e.,
SECTION.MAXST > 30 OR SECTION.ROOM = ‘Lindner 110’) be satisfied. Observe that in
Example 1.1.3, none of the rows in the SECTION table satisfy both conditions of the WHERE
clause. Also, whenever a character or string literal (e.g., Lindner 110) is used as part of a
condition, the value must be enclosed within single quotation marks. Character or string lit-
erals enclosed in single quotation marks are case sensitive. Thus, the WHERE condition will
not be true if anything other than Lindner 110 appears inside the single quotation marks.
12.1.2 Use of Comparison and Logical Operators
Combining AND and OR in the same logical expression must be done with care. When AND
and OR appear in the same WHERE clause, all the ANDs are performed first, then all the
ORs are performed. In this way, AND is said to have a higher precedence than OR.
For example, the WHERE clause:
WHERE COURSE.CREDIT = 'U' AND COURSE.COLLEGE = 'Business'
OR COURSE.COLLEGE = 'Engineering'
In SQL, all operators are arranged in a hierarchy that determines their precedence. In
any expression, operations are performed in order of their precedence, from highest to
lowest. When operators of equal precedence are used next to each other, they are
performed from left to right. The precedence of common logical operators in SQL is:
1. All of the comparison operators (=, <>, <=, <, >=, >) have equal precedence.
2. NOT
3. AND
4. OR
When the normal rules of operator precedence do not fit the needs, one can override
them by placing part of an expression in parentheses. That part of the expression will be
evaluated first, then the rest of the expression will be evaluated.
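The difference that parentheses make can be seen by running the same conditions with and without them. The sketch below uses Python's built-in sqlite3 module as the SQL engine; the four rows are taken from the COURSE result shown earlier, and the names helper is an illustrative convenience rather than part of SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (NAME TEXT, CREDIT TEXT, COLLEGE TEXT)")
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?, ?)",
    [
        ("Supply Chain Analysis", "U", "Business"),
        ("Optimization", "G", "Business"),
        ("Programming in C++", "G", "Engineering"),
        ("Intro to Economics", "U", "Arts and Sciences"),
    ],
)


def names(where):
    """Return the course names, sorted, that satisfy the given WHERE condition."""
    sql = "SELECT NAME FROM COURSE WHERE " + where + " ORDER BY NAME"
    return [row[0] for row in conn.execute(sql)]


# AND binds more tightly than OR: (CREDIT = 'U' AND Business) OR Engineering.
default = names("CREDIT = 'U' AND COLLEGE = 'Business' OR COLLEGE = 'Engineering'")

# Parentheses force the OR to be evaluated first.
grouped = names("CREDIT = 'U' AND (COLLEGE = 'Business' OR COLLEGE = 'Engineering')")

print(default)
print(grouped)
```

Without parentheses the graduate-level Engineering course is returned, because the CREDIT test applies only to the Business condition; with parentheses only the undergraduate Business course qualifies.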
The following examples illustrate the incorporation of logical operators in a SELECT
statement. Note that Example 1.2.1 is based on the WHERE clause discussed earlier.
Example 1.2.1
SELECT *
FROM COURSE
WHERE COURSE.CREDIT = 'U'
AND COURSE.COLLEGE = 'Business'
OR COURSE.COLLEGE = 'Engineering';
Result:
Result:
In Example 1.2.3, each of the first 11 rows satisfies at least one of the conditions in
parentheses (COURSE.COLLEGE <> ‘Business’ OR COURSE.COLLEGE <> ‘Engineering’)7.
Five of these rows are also associated with a course that is offered for undergraduate
credit.
Example 1.2.3
SELECT *
FROM COURSE
WHERE (COURSE.COLLEGE <> 'Business'
OR COURSE.COLLEGE <> 'Engineering')
AND COURSE.CREDIT = 'U';
Result:
7
Section 12.1.5 explains why the Architectural History course in row 12 of COURSE fails to satisfy
either condition.
Result:
Result:
The logical operator NOT reverses the result of a logical expression. NOT can be used
to precede any of the comparison operators (=, <>, <=, <, >=, >) as well as the word IN.
Thus, Example 1.2.6 displays all courses offered for undergraduate credit that are not
offered at either the College of Business or the College of Engineering. Once again, the
Architectural History course does not appear in the result because a comparison with a null
value is never true; a null value can be tested only with IS NULL or IS NOT NULL.
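The behavior of null values in these comparisons can be checked directly. The sketch below uses Python's built-in sqlite3 module as the SQL engine; the three undergraduate rows come from the COURSE table, with None representing the null COLLEGE of the Architectural History course, and names is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (NAME TEXT, CREDIT TEXT, COLLEGE TEXT)")
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?, ?)",
    [
        ("Intro to Economics", "U", "Arts and Sciences"),
        ("Supply Chain Analysis", "U", "Business"),
        ("Architectural History", "U", None),   # null COLLEGE
    ],
)


def names(where):
    """Return the course names, sorted, that satisfy the given WHERE condition."""
    sql = "SELECT NAME FROM COURSE WHERE " + where + " ORDER BY NAME"
    return [row[0] for row in conn.execute(sql)]


# The null row is excluded even though its college is "not Business or Engineering".
not_in = names("CREDIT = 'U' AND COLLEGE NOT IN ('Business', 'Engineering')")

# Every non-null college differs from at least one of the two literals.
either = names("COLLEGE <> 'Business' OR COLLEGE <> 'Engineering'")

# Only IS NULL finds the row with the null COLLEGE.
is_null = names("COLLEGE IS NULL")

print(not_in)
print(either)
print(is_null)
```

The second query also shows why joining two "not equal" tests with OR rarely filters anything: the Business course passes because Business is not Engineering.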
Example 1.2.6
SELECT *
FROM COURSE
WHERE COURSE.CREDIT = 'U'
AND COURSE.COLLEGE NOT IN ('Business', 'Engineering');
Result:
So far, the conditions in the WHERE clause have involved comparisons where the
operands were of the form:
<column name><comparison operator><constant value>
and the constant value was either a numeric constant or a character constant (i.e.,
character string). However, as we will see, the operands of a comparison operator can
be expressions, not just a simple column name or constant. For example, suppose we
are interested in identifying all professors with a monthly salary that exceeds $6,000.
Example 1.2.7
SELECT *
FROM PROFESSOR
WHERE PROFESSOR.SALARY/12 > 6000;
Result:
NAME EMPID PHONE DATEHIRED DCODE SALARY
------------------ --------- ----------- ------------ ------- ------
Mike Faraday FM49276 5235568492 01-MAY-96 1 92000
Chelsea Bush BC65437 5235567777 01-MAY-93 3 77000
Tony Hopkins HT54347 5235569977 20-JAN-97 3 77000
Alan Brodie BA54325 5235569876 16-MAY-00 3 76000
Marie Curie CM65436 5235569899 22-OCT-99 4 99000
John Nicholson NJ43728 5235569999 22-JUN-03 4 99000
If we follow this Selection operation with a Projection operation, the query in Example
1.2.7 could be rewritten to display just the name and monthly salary of each qualifying
professor, as follows:
SELECT NAME, PROFESSOR.SALARY/12 AS "Monthly Salary"
FROM PROFESSOR
WHERE PROFESSOR.SALARY/12 > 6000;
Result:
8
The keyword AS is optional and is often used in the column list to distinguish between the column
name and column alias.
provide a more descriptive column heading as a column alias. It should be noted that this
column alias cannot be referenced in the WHERE clause of the same SELECT statement, although
it can be used in an ORDER BY clause. The
TRUNC function of the form TRUNC (PROFESSOR.SALARY/12,2) or the ROUND function
of the form ROUND (PROFESSOR.SALARY/12,2) could be used to either truncate or round
each monthly salary to two places to the right of the decimal point.
WHERE clauses can also refer to a range of values through use of the comparison
operator BETWEEN, which searches for rows in a specific range of values. For example,
suppose we are interested in identifying the name and monthly salary of all professors
whose monthly salary is between $6,000 and $7,000.
Example 1.2.8
SELECT PROFESSOR.NAME, PROFESSOR.SALARY/12 AS "Monthly Salary"
FROM PROFESSOR
WHERE PROFESSOR.SALARY/12 BETWEEN 6000 AND 7000;
Result:
The BETWEEN operator is inclusive (i.e., professors with a monthly salary of exactly
$6,000 or $7,000 would also be included in the result) and can be applied to all data types.
Further, NOT can also be used with the BETWEEN operator. For example, the query in
Example 1.2.9 identifies all professors whose monthly salary is outside the range of $6,000
to $7,000. A close inspection of the results indicates that the two professors without a sal-
ary (John B. Smith and Tiger Woods) do not appear. Note: Professor John Smith (without a
middle initial) has a $45,000 annual salary.
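Both points, the inclusive endpoints and the disappearance of rows with a null salary, can be reproduced on a handful of rows. The sketch below uses Python's built-in sqlite3 module as the SQL engine. The Mike Faraday, John Smith, and John B. Smith rows follow the text; the two "Boundary" professors are hypothetical rows added to land exactly on $6,000 and $7,000. SQLite performs integer division on integers, so the sketch divides by 12.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROFESSOR (NAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO PROFESSOR VALUES (?, ?)",
    [
        ("Mike Faraday", 92000),
        ("John Smith", 45000),
        ("John B. Smith", None),      # salary unknown (null)
        ("Boundary Low", 72000),      # hypothetical: exactly 6000 per month
        ("Boundary High", 84000),     # hypothetical: exactly 7000 per month
    ],
)


def names(where):
    """Return the professor names, sorted, that satisfy the given WHERE condition."""
    sql = "SELECT NAME FROM PROFESSOR WHERE " + where + " ORDER BY NAME"
    return [row[0] for row in conn.execute(sql)]


between = names("SALARY / 12.0 BETWEEN 6000 AND 7000")
not_between = names("SALARY / 12.0 NOT BETWEEN 6000 AND 7000")

print(between)
print(not_between)
```

The professor with the null salary appears in neither list, so the two results together contain four of the five rows.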
Example 1.2.9
SELECT PROFESSOR.NAME, ROUND(PROFESSOR.SALARY/12,0) AS
"Monthly Salary"
FROM PROFESSOR
WHERE PROFESSOR.SALARY/12 NOT BETWEEN 6000 AND 7000;
Result:
NAME Monthly Salary
------------------- --------------
John Smith 3750
Mike Faraday 7667
Kobe Bryant 5500
Ram Raj 3667
Prester John 3667
Jessica Simpson 5583
Laura Jackson 3583
Marie Curie 8250
Jack Nicklaus 5583
John Nicholson 8250
Sunil Shetty 5333
Katie Shef 5417
Cathy Cobal 3750
Jeanine Troy 3750
Mike Crick 5750
Section 12.1.5 explains why John B. Smith and Tiger Woods are not included in the
results for either Example 1.2.8 or Example 1.2.9. Section 12.2 illustrates how the WHERE
and ON clauses can contain comparisons with operands of the form:
<column name><comparison operator><column name>
Execution of the SELECT statement given above, since it refers to only one of the
six columns in the COURSE table, actually generates duplicate rows, and thus the
result shown next does not constitute a relation. Since the Projection operation displays
the content of the COLLEGE column from each row of the COURSE table, the null
value in the COLLEGE column for the Architectural History course is displayed in the
first row.
Result:9
COLLEGE
------------------
Business
Business
Education
Arts and Sciences
Education
Business
Business
Business
Engineering
Business
Business
where ALL retains duplicate values in queries (note that ALL is the default as compared to
DISTINCT).
Thus, the revised SELECT statement with the DISTINCT qualifier produces the
following result:
SELECT DISTINCT COURSE.COLLEGE
FROM COURSE;
Result:
COLLEGE
------------------
Engineering
Education
Arts and Sciences
Business
Observe that a null value for an attribute is distinctly different from non-null values
such as Engineering, Education, etc.
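The contrast between the default (ALL) and DISTINCT can be reproduced on a few rows. The sketch below uses Python's built-in sqlite3 module as the SQL engine; the five COLLEGE values are an abbreviated, illustrative sample, with None representing the null value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (NAME TEXT, COLLEGE TEXT)")
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?)",
    [
        ("Supply Chain Analysis", "Business"),
        ("Optimization", "Business"),
        ("Financial Accounting", "Education"),
        ("Programming in C++", "Engineering"),
        ("Architectural History", None),   # null COLLEGE
    ],
)

# Default behavior (ALL): one output row per input row, duplicates retained.
all_rows = conn.execute("SELECT COURSE.COLLEGE FROM COURSE").fetchall()

# DISTINCT: duplicates removed; the null value counts as one distinct value.
distinct_rows = conn.execute("SELECT DISTINCT COURSE.COLLEGE FROM COURSE").fetchall()

print(len(all_rows), len(distinct_rows))
print((None,) in distinct_rows)
```

Only the DISTINCT form corresponds to the relational algebra Projection operation, which by definition removes duplicate tuples.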
9
Across SQL implementations, the order in which rows are displayed as a result of a
Projection operation is unpredictable. Use of an ORDER BY clause is the best way to control the
order of the rows displayed.
NAME ADDRESS
------------------- -----------------
Elijah Baley 2920 Scioto Street
Daniel Olive 338 Bishop Street
Wanda Seldon 3138 Probasco
……
……
……
Poppy Kramer 437 Love Lane
Sweety Kramer 748 Hope Avenue
Diana Jackson 2920 Scioto Street
In most cases, several relational algebra operations are applied one after the other
(e.g., a Selection operation is followed by a Projection operation).
Example 1.3.3. What are the names of the courses offered in the College of Business?
SQL SELECT Statement:
SELECT COURSE.NAME
FROM COURSE
WHERE COURSE.COLLEGE = 'Business';
Result:
NAME
----------------------
Database Concepts
Database Principles
Operations Research
Optimization
Principles of IS
Supply Chain Analysis
Systems Analysis
Example 1.4.1. Count the number of salaries in the PROFESSOR table and, at the
same time, display the sum of the salaries, the average salary, the maximum salary, and
the minimum salary.
SQL SELECT Statement:
SELECT COUNT(SALARY), SUM(SALARY),
AVG(SALARY), MIN(SALARY), MAX(SALARY)
FROM PROFESSOR;
Result:
COUNT(SALARY) SUM(SALARY) AVG(SALARY) MIN(SALARY) MAX(SALARY)
------------- ----------- ----------- ----------- -----------
18 1184000 65777.7778 43000 99000
Note that while there are 20 rows in the PROFESSOR table, the salaries of two pro-
fessors (John B. Smith and Tiger Woods) are unknown, and thus a null value is stored
in the respective SALARY column for these professors. Section 12.1.5 discusses the impact
of null values on aggregate functions.
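The arithmetic behind this result can be checked by hand: 1184000 / 18 is approximately 65777.7778, whereas dividing by all 20 rows would give 59200. The following is a minimal sketch of the same effect on a small scale. The table PROF_DEMO and its three rows are hypothetical and are not part of the Madeira College database.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
CREATE TABLE PROF_DEMO (
    NAME   VARCHAR(20),
    SALARY INTEGER             -- NULL means the salary is unknown
);

INSERT INTO PROF_DEMO (NAME, SALARY) VALUES ('A', 40000);
INSERT INTO PROF_DEMO (NAME, SALARY) VALUES ('B', 60000);
INSERT INTO PROF_DEMO (NAME, SALARY) VALUES ('C', NULL);

-- COUNT(*) counts rows (3); COUNT(SALARY) counts non-null salaries (2).
-- AVG(SALARY) divides the sum by the non-null count: 100000 / 2 = 50000,
-- not 100000 / 3.
SELECT COUNT(*)      AS ALL_ROWS,
       COUNT(SALARY) AS KNOWN_SALARIES,
       SUM(SALARY)   AS TOTAL,
       AVG(SALARY)   AS AVERAGE
FROM PROF_DEMO;
```

The number of decimal places displayed for the average depends on the implementation.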
Aggregate functions are often applied to groups of rows rather than to all rows in a
table. Doing this requires the GROUP BY clause, which divides the rows into sets
(i.e., groups) based on the contents of specified columns. The column or columns on which
the grouping takes place are called the grouping column(s). The general form of the
GROUP BY clause is:
GROUP BY column name [, column name, …]
The grouping column(s) should also appear in the SELECT clause so that the value from
applying each function to a group of rows appears along with the value of the grouping column(s).
Example 1.4.2. Count the number of students at each grade level.
SQL SELECT Statement:
SELECT STUDENT.GRADELEVEL, COUNT(*)
FROM STUDENT
GROUP BY STUDENT.GRADELEVEL;
Result:
GRADELEVEL COUNT(*)
------------- -------------
SR 3
FR 3
SO 3
GR 2
JR 5
Example 1.4.3. Calculate the average salary of the professors in each department.
SQL SELECT Statement:
SELECT PROFESSOR.DCODE, AVG(PROFESSOR.SALARY)
FROM PROFESSOR
GROUP BY PROFESSOR.DCODE;
Result:
DCODE AVG(PROFESSOR.SALARY)
----- ---------------------
1 67666.6667
6 44000
4 88333.3333
3 68000
7 58000
9 57000
Result:
Observe that a group is created for those courses for which the department code is a
null value (see row 1 of the result). Use of a GROUP BY clause is often combined with an
ORDER BY clause to display the result in a more meaningful manner.
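As a minimal sketch of combining GROUP BY with ORDER BY, the statements below build a hypothetical table COURSE_DEMO (an assumption made for illustration, not the COURSE table used in this chapter) and display one row per department code in sorted order.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
CREATE TABLE COURSE_DEMO (
    NAME  VARCHAR(30),
    DCODE INTEGER              -- NULL means no department is recorded
);

INSERT INTO COURSE_DEMO (NAME, DCODE) VALUES ('Course A', 4);
INSERT INTO COURSE_DEMO (NAME, DCODE) VALUES ('Course B', 1);
INSERT INTO COURSE_DEMO (NAME, DCODE) VALUES ('Course C', 4);
INSERT INTO COURSE_DEMO (NAME, DCODE) VALUES ('Course D', NULL);

-- One group per distinct DCODE, including one group for the null value.
-- ORDER BY makes the display order of the non-null groups predictable.
SELECT DCODE, COUNT(*) AS COURSES
FROM COURSE_DEMO
GROUP BY DCODE
ORDER BY DCODE;
```

The groups are department 1 (one course), department 4 (two courses), and the null group (one course). Whether the null group is listed first or last depends on the implementation.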
The HAVING clause is used in conjunction with the GROUP BY clause to place
restrictions on the rows returned by the GROUP BY clause in a query. A condition in a
HAVING clause must always involve an aggregation. In addition, a HAVING clause cannot
be used apart from an associated GROUP BY clause.
Example 1.4.5. Calculate the average salary of the professors in each department for
those departments where the minimum salary of a professor is at least $45,000.
SQL SELECT Statement:
SELECT PROFESSOR.DCODE, AVG(PROFESSOR.SALARY)
FROM PROFESSOR
GROUP BY PROFESSOR.DCODE
HAVING MIN(PROFESSOR.SALARY) >= 45000;
Result:
DCODE AVG(PROFESSOR.SALARY)
----- ---------------------
1 67666.6667
4 88333.3333
7 58000
9 57000
Note that departments 3 and 6 do not appear in the results because there is at least
one professor in each department with a salary less than $45,000. In addition, observe
that the calculation of the average salary of the professors in department 9 ignores Tiger
Woods since his salary is a null value (i.e., is not available). The following section goes into
more detail on the handling of null values. In addition, Section 12.2 describes how to join
tables in SQL, which would represent one way to revise the query in Example 1.4.5 in
order to display department names instead of department numbers.
SELECT *
FROM TEXTBOOK;
Result:
PUBLISHER
----------------
Thomson
Prentice-Hall
Springer
Thomson
Prentice-Hall
Thomson
9 rows selected.
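[10] Some implementations (Oracle, for example) treat a character value that is zero characters
long as a null value, although the SQL standard distinguishes a zero-length string from a null value.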
[11] Most versions of SQL contain a system variable that allows for the number of rows retrieved by the
execution of a query to be displayed. The value of this variable will be shown in conjunction with
various examples when appropriate. It is useful here as a way of showing that all rows from the
TEXTBOOK table were selected.
A query (see Example 1.5.2) that displays the textbooks where the
TEXTBOOK.PUBLISHER column contains a not null value excludes the rows associated
with the titles Economics For Managers and Fundamentals of SQL, since the
TEXTBOOK.PUBLISHER column for each of these textbooks contains a null value.
The row associated with the title Data Modeling is not excluded since the value in the
TEXTBOOK.PUBLISHER column for this title is not null (i.e., consists of a single blank space).
Example 1.5.2
SELECT TEXTBOOK.TITLE, TEXTBOOK.PUBLISHER
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NOT NULL;
Result:
TITLE                   PUBLISHER
----------------------- -------------
Database Management     Thomson
Linear Programming      Prentice-Hall
Simulation Modeling     Springer
Systems Analysis        Thomson
Principles of IS        Prentice-Hall
Programming in C++      Thomson
Data Modeling
7 rows selected.
The only comparison operators that can be used with null values are IS NULL and IS
NOT NULL. If any other operator (=, >, <>, etc.) is used with a null value, the result is
always unknown [12]. In addition, since a NULL represents a lack of data, a null value cannot
be equal or unequal to any other value, even another NULL. Examples 1.5.3, 1.5.4, and
1.5.5 illustrate the fact that only IS NULL and IS NOT NULL can be used as comparison
operators with null values.
Example 1.5.3
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER = NULL;
Result:
no rows selected
Example 1.5.4
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER <> NULL;
Result:
no rows selected
[12] In a WHERE clause, SQL-2003 selects only those rows for which the condition evaluates to
TRUE, so a condition that evaluates to unknown excludes a row just as FALSE does.
Although conditional expressions of the form “WHERE X = NULL” and “WHERE X <>
NULL” can never evaluate to true, SQL does not generate a syntax error for them. This can
create serious problems in cases where “WHERE X IS NULL” would otherwise cause one or
more rows to be selected.
Example 1.5.5
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NULL;
Result:
The SELECT statement in Example 1.5.6 demonstrates that there are five distinct
publishers in the TEXTBOOK table and reflects the fact that a null value in a column can
be distinguished from a column that contains a single blank space. The SELECT statement
in Example 1.5.7 (with the IS NOT NULL condition) indicates that the publisher whose
name consists of a single blank space can be distinguished from the publisher with a null
value for its name.
Example 1.5.6
SELECT DISTINCT TEXTBOOK.PUBLISHER
FROM TEXTBOOK;
Result:
PUBLISHER
-----------------
Springer
Thomson
Prentice-Hall
5 rows selected.
Example 1.5.7
SELECT DISTINCT TEXTBOOK.PUBLISHER
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NOT NULL;
Result:
PUBLISHER
----------------
Springer
Thomson
Prentice-Hall
4 rows selected.
Example 1.5.8 uses the COUNT function to count the number of rows in the TEXTBOOK
table and returns the result in a single table with a single column. Recall that when COUNT(*)
is used, SQL focuses on the presence of rows rather than values appearing in a column.
Example 1.5.8
SELECT COUNT(*)
FROM TEXTBOOK;
Result:
COUNT(*)
------------
9
1 row selected.
When the COUNT function refers to a column, it behaves like the other aggregate
functions and ignores null values (see Section 12.1.4). Thus the query in Example 1.5.9
displays the number of rows in the TEXTBOOK table with something other than a null
value in the TEXTBOOK.PUBLISHER column.
Example 1.5.9
SELECT COUNT(TEXTBOOK.PUBLISHER)
FROM TEXTBOOK;
Result:
COUNT(TEXTBOOK.PUBLISHER)
---------------------------
7
1 row selected.
Numeric functions, such as the COUNT function, are associated with columns of
output whose width is equal to the number of characters required to display the name
of the function plus its argument(s). Often, a column alias, such as the one used in
Example 1.5.10, is used to provide a more descriptive column heading. Observe how
the width of the column has been adjusted to display the entire column alias
(i.e., heading).
Example 1.5.10
SELECT COUNT(TEXTBOOK.PUBLISHER) "Number of Publishers"
FROM TEXTBOOK;
Result:
Number of Publishers
---------------------
7
1 row selected.
In Example 1.5.10, the two rows in the TEXTBOOK table with null publishers
are ignored by the COUNT function. This can be verified by the query in Example
1.5.11, which counts the number of distinct not null values in the TEXTBOOK.PUBLISHER
column.
Example 1.5.11
SELECT COUNT(DISTINCT TEXTBOOK.PUBLISHER) "Number of Distinct Publishers"
FROM TEXTBOOK;
Result:
Number of Distinct Publishers
------------------------------
4
1 row selected.
Aggregate functions are frequently applied to groups of rows in a table rather than to all
rows in a table. The SELECT statement in Example 1.5.12 reflects the fact that there are five
distinct publishers in the TEXTBOOK table and counts the number of rows (i.e., the COUNT(*)
function is focusing on the presence of a row) associated with each distinct publisher.
Example 1.5.12
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER       COUNT(*)
--------------- --------
Springer        1
                2
Thomson         3
Prentice-Hall   2
                1
5 rows selected.
Observe that the second row displayed is associated with the two textbooks with a null
value in the TEXTBOOK.PUBLISHER column.
The SELECT statement in Example 1.5.13 also recognizes that there are five distinct
publishers, but since group functions ignore the presence of null values in the
TEXTBOOK.PUBLISHER column, the accumulator set up for the publisher whose value is a
null value never has its initial value of zero incremented.
Example 1.5.13
SELECT TEXTBOOK.PUBLISHER, COUNT(TEXTBOOK.PUBLISHER)
FROM TEXTBOOK
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER             COUNT(TEXTBOOK.PUBLISHER)
--------------------- -------------------------
Springer              1
                      0
Thomson               3
Prentice-Hall         2
                      1
5 rows selected.
The SELECT statement in Example 1.5.14 works as expected, since the WHERE
clause places the condition that the value in the TEXTBOOK.PUBLISHER column must
be not null before that publisher can be used to form a group. The SELECT statement
in Example 1.5.15 also works as expected since the WHERE clause focuses only on the
publisher whose name consists of a single blank space. Finally, the SELECT statement
in Example 1.5.16 serves as a reminder that the only publishers whose name consists
of something other than a single blank space are Thomson, Prentice-Hall, and
Springer. Recall that using the <> comparison operator to test whether a
null value differs from a single blank space produces an unknown result (which is treated
as false).
Example 1.5.14
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NOT NULL
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER         COUNT(*)
----------------- --------
Springer          1
Thomson           3
Prentice-Hall     2
                  1
4 rows selected.
Example 1.5.15
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER = ' '
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER         COUNT(*)
----------------- --------
                  1
1 row selected.
Example 1.5.16
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER <> ' '
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER COUNT(*)
----------------- --------
Springer 1
Thomson 3
Prentice-Hall 2
3 rows selected.
The SELECT statements in Examples 1.5.17, 1.5.18, and 1.5.19 illustrate the impact
of a null value in queries that involve multiple conditions. In Example 1.5.17, the first
condition selects the rows associated with the four not null publishers, while the second
condition selects the rows associated with the three publishers Thomson, Prentice-Hall,
and Springer. Since OR is used to connect the two conditions, groups are formed and
counts accumulated for the four not null publishers.
Example 1.5.17
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NOT NULL
OR TEXTBOOK.PUBLISHER <> ' '
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER       COUNT(*)
--------------- --------
Springer        1
Thomson         3
Prentice-Hall   2
                1
4 rows selected.
In Example 1.5.18, since AND is used to connect the two conditions, groups are
formed only for those publishers that satisfy both conditions (i.e., Thomson, Prentice-Hall,
and Springer).
Example 1.5.18
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NOT NULL
AND TEXTBOOK.PUBLISHER <> ' '
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER COUNT(*)
----------------- --------
Springer 1
Thomson 3
Prentice-Hall 2
3 rows selected.
The SELECT statement in Example 1.5.19 illustrates a situation where the first
condition selects rows associated with the TEXTBOOK.PUBLISHER column containing a
null value and the second condition selects the rows associated with the three
publishers other than the single-space publisher. Since OR is used to connect the two
conditions, groups are formed and counts accumulated for Thomson, Prentice-Hall, and
Springer, and the rows associated with the TEXTBOOK.PUBLISHER column containing a
null value.
Example 1.5.19
SELECT TEXTBOOK.PUBLISHER, COUNT(*)
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IS NULL
OR TEXTBOOK.PUBLISHER <> ' '
GROUP BY TEXTBOOK.PUBLISHER;
Result:
PUBLISHER       COUNT(*)
--------------- --------
Springer        1
                2
Thomson         3
Prentice-Hall   2
4 rows selected.
The SELECT statement in Example 1.5.20 introduces the use of the operator IN to
test whether a value is contained within a set of values.
Example 1.5.20
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER IN ('Thomson', 'Springer');
Result:
4 rows selected.
NOT IN tests whether a value does not appear within a set of values. Example
1.5.21 illustrates that rows with a null value can be selected only by a condition that uses
IS NULL: the titles Economics For Managers and Fundamentals of SQL, whose publisher is
a null value, do not appear in the results, while the textbooks published by Prentice-Hall
and by the single-space publisher do.
Example 1.5.21
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER NOT IN ('Thomson', 'Springer');
Result:
3 rows selected.
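When the rows with a null value are also wanted, an explicit IS NULL condition can be added to the NOT IN condition. The following is a minimal sketch; the table BOOK_DEMO and its four rows are hypothetical and stand in for the TEXTBOOK table.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
CREATE TABLE BOOK_DEMO (
    TITLE     VARCHAR(22),
    PUBLISHER VARCHAR(15)      -- NULL means the publisher is unknown
);

INSERT INTO BOOK_DEMO (TITLE, PUBLISHER) VALUES ('Book A', 'Thomson');
INSERT INTO BOOK_DEMO (TITLE, PUBLISHER) VALUES ('Book B', 'Springer');
INSERT INTO BOOK_DEMO (TITLE, PUBLISHER) VALUES ('Book C', 'Prentice-Hall');
INSERT INTO BOOK_DEMO (TITLE, PUBLISHER) VALUES ('Book D', NULL);

-- NOT IN alone selects only 'Book C'. For 'Book D' the comparison of a
-- null value with each list member is unknown, so the row is not selected.
SELECT TITLE
FROM BOOK_DEMO
WHERE PUBLISHER NOT IN ('Thomson', 'Springer');

-- Adding IS NULL selects 'Book C' and 'Book D'.
SELECT TITLE
FROM BOOK_DEMO
WHERE PUBLISHER NOT IN ('Thomson', 'Springer')
   OR PUBLISHER IS NULL;
```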
Result:
no rows selected.
Example 1.6.2
SELECT TEXTBOOK.TITLE
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE '_i';
Result:
no rows selected.
The SELECT statements in Examples 1.6.1 and 1.6.2 return no rows because no textbook
has an ISBN that is exactly two characters long, nor is there any textbook
title that has the letter “i” in the second character position and is exactly two characters
long. As illustrated in Example 1.6.3, revising Example 1.6.2 by adding the % sign as a
wildcard character searches for textbook titles with the letter “i” in the second character
position but allows the title to have as many as 22 characters (the width of the
TEXTBOOK.TITLE column).
[13] Although the SQL-2003 standard specifies use of the underscore character (_) and percent
character (%), some implementations of SQL use other characters to represent a single
character or a series of characters.
Example 1.6.3
SELECT TEXTBOOK.TITLE
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE '_i%';
Result:
TITLE
---------------------
Linear Programming
Simulation Modeling
2 rows selected.
Examples 1.6.4 and 1.6.5 illustrate that the LIKE operator is case sensitive.
Example 1.6.4
SELECT TEXTBOOK.TITLE
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE 'P%';
Result:
TITLE
--------------------
Principles of IS
Programming in C++
2 rows selected.
Example 1.6.5
SELECT TEXTBOOK.TITLE
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE 'p%';
Result:
no rows selected.
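Where a match is wanted regardless of case, one common approach is to convert the column to a single case before applying LIKE. The following is a minimal sketch under that assumption; the table TITLE_CASE_DEMO and its rows are hypothetical. Some implementations and collations compare case-insensitively by default, in which case both queries return the same rows.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
CREATE TABLE TITLE_CASE_DEMO (
    TITLE VARCHAR(22)
);

INSERT INTO TITLE_CASE_DEMO (TITLE) VALUES ('Principles of IS');
INSERT INTO TITLE_CASE_DEMO (TITLE) VALUES ('programming basics');
INSERT INTO TITLE_CASE_DEMO (TITLE) VALUES ('Data Modeling');

-- With a case-sensitive LIKE, only 'Principles of IS' is selected.
SELECT TITLE
FROM TITLE_CASE_DEMO
WHERE TITLE LIKE 'P%';

-- UPPER converts each title to uppercase before the comparison, so both
-- 'Principles of IS' and 'programming basics' are selected.
SELECT TITLE
FROM TITLE_CASE_DEMO
WHERE UPPER(TITLE) LIKE 'P%';
```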
The SELECT statements in Examples 1.6.6 and 1.6.7 represent rather unusual uses of
the LIKE operator. Example 1.6.6 searches for the titles of all textbooks that contain the
letter “e,” while Example 1.6.7 displays the titles of all textbooks.
Example 1.6.6
SELECT TEXTBOOK.TITLE
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE '%e%';
Result:
TITLE
------------------------
Database Management
Linear Programming
Simulation Modeling
Systems Analysis
Principles of IS
Economics For Managers
Fundamentals of SQL
Data Modeling
8 rows selected.
Example 1.6.7
SELECT TEXTBOOK.TITLE
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE '%';
Result:
TITLE
-------------------------
Database Management
Linear Programming
Simulation Modeling
Systems Analysis
Principles of IS
Economics For Managers
Programming in C++
Fundamentals of SQL
Data Modeling
9 rows selected.
As shown in the SELECT statement in Example 1.6.8, the string ‘%’ cannot match a
null value: the two textbooks with a null value in the TEXTBOOK.PUBLISHER column do
not appear in the results.
Example 1.6.8
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.PUBLISHER LIKE '%';
Result:
7 rows selected.
CHAR(ℓ) and VARCHAR(ℓ) are string data types (see Table 10.1) commonly used to
define fixed-length and variable-length character data. A CHAR(ℓ) data type, where ℓ
represents the length of the column, has a maximum size of 255 characters in most
DBMSs; blank characters are added to the data should the number of characters be less
than ℓ. A VARCHAR(ℓ) data type, where ℓ also represents the length of the column, has a
maximum size of 2,000 characters in most DBMSs. A VARCHAR(ℓ) data type does not
append blank characters to the data if the number of characters is less than ℓ. CHAR(ℓ)
and VARCHAR(ℓ) data types can produce different results in some comparisons that
involve the LIKE operator. For example, the query in Example 1.6.9 searches for and
locates all titles that end with the letter “s.”
Example 1.6.9
SELECT *
FROM TEXTBOOK
WHERE TEXTBOOK.TITLE LIKE '%s';
Result:
2 rows selected.
However, had the TEXTBOOK.TITLE column been defined as a CHAR(22) data type
instead of as a VARCHAR(22) data type, only one of the titles, Economics For Managers
(a title with an “s” in the 22nd character position) would have been located since the 22nd
character position in the title, Systems Analysis, would have contained a single blank
space character (i.e., the rightmost “s” in Systems Analysis would have been in the 16th
character position, with character positions 17–22 containing blank spaces).
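The difference between the two data types can be explored with a minimal sketch such as the following. The table TITLE_DEMO and its columns are hypothetical, and the sketch assumes a DBMS that blank-pads CHAR columns. Whether the second query returns a row depends on how the implementation treats the padding during pattern matching. RTRIM is a widely available function for removing trailing blanks, although it is an assumption that the DBMS in use provides it.

```sql
-- Hypothetical demo table; the name and the row are illustrative assumptions.
CREATE TABLE TITLE_DEMO (
    TITLE_FIXED CHAR(22),      -- fixed length, blank-padded by many DBMSs
    TITLE_VAR   VARCHAR(22)    -- variable length, stored without padding
);

INSERT INTO TITLE_DEMO (TITLE_FIXED, TITLE_VAR)
VALUES ('Systems Analysis', 'Systems Analysis');

-- Variable-length column: the last stored character is 's', so this matches.
SELECT TITLE_VAR
FROM TITLE_DEMO
WHERE TITLE_VAR LIKE '%s';

-- Fixed-length column: where the padding is kept during pattern matching,
-- the last character is a blank and the row is not selected.
SELECT TITLE_FIXED
FROM TITLE_DEMO
WHERE TITLE_FIXED LIKE '%s';

-- Removing trailing blanks first makes the test independent of padding.
SELECT TITLE_FIXED
FROM TITLE_DEMO
WHERE RTRIM(TITLE_FIXED) LIKE '%s';
```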
SQL allows for the definition of an escape character for cases where a percent
character (%) or an underscore character (_) must stand for itself as part of the search.
For example, suppose you need to identify the name of each table in the Madeira College
database with an underscore character as part of its table name. This would be possible
with the following SELECT statement:
Example 1.6.10 [14]
SELECT TABLE_NAME
FROM USER_TABLES
WHERE TABLE_NAME LIKE '%/_%'
ESCAPE '/';
ESCAPE is used here to declare the slash (/) as an escape character so that it can be
prefixed to the underscore character in the character string expression used in the LIKE
operator.
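The same technique applies to the percent character. The following is a minimal sketch; the table LABEL_DEMO and its rows are hypothetical and were chosen so that the effect of the escape character is visible without access to a data dictionary.

```sql
-- Hypothetical demo table; the name and the rows are illustrative assumptions.
CREATE TABLE LABEL_DEMO (
    LABEL VARCHAR(20)
);

INSERT INTO LABEL_DEMO (LABEL) VALUES ('50% off');
INSERT INTO LABEL_DEMO (LABEL) VALUES ('50 off');
INSERT INTO LABEL_DEMO (LABEL) VALUES ('half_price');
INSERT INTO LABEL_DEMO (LABEL) VALUES ('halfprice');

-- '/%' is a literal percent character: only '50% off' is selected.
SELECT LABEL
FROM LABEL_DEMO
WHERE LABEL LIKE '%/%%' ESCAPE '/';

-- '/_' is a literal underscore character: only 'half_price' is selected.
SELECT LABEL
FROM LABEL_DEMO
WHERE LABEL LIKE '%/_%' ESCAPE '/';

-- Without the escape character, '_' matches any single character,
-- so all four rows are selected.
SELECT LABEL
FROM LABEL_DEMO
WHERE LABEL LIKE '%_%';
```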
[14] USER_TABLES is the name of a data dictionary table whose columns contain data
about the tables that make up the Madeira College database. Included in this table is a column
that records the name of each table.
In the SQL-2003 standard, the CROSS keyword, combined with the JOIN keyword, is
used in the FROM clause to create a Cartesian Product. Sometimes, a Cartesian Product is
referred to as a Cross Join. The first 24 rows of the result of this Cartesian Product appear
next. Since there are 12 rows in C and six rows in D, a total of 72 rows are produced as a
result of the concatenation of C and D. The first 12 rows of the result constitute the
concatenation of the first row of D (observe how the final three columns remain the same)
with the 12 rows in C. Observe that the next 12 rows of the result show the second row of
D with each of the 12 rows of C.
Result:
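The row-count arithmetic of a Cartesian Product (the number of rows in one table multiplied by the number of rows in the other) can be checked on a smaller pair of tables. The following is a minimal sketch; C_DEMO and D_DEMO are hypothetical tables with three rows and two rows, respectively, and are not the tables C and D discussed above.

```sql
-- Hypothetical demo tables; the names and the rows are illustrative assumptions.
CREATE TABLE C_DEMO (CNAME VARCHAR(10));
CREATE TABLE D_DEMO (DNAME VARCHAR(10));

INSERT INTO C_DEMO (CNAME) VALUES ('c1');
INSERT INTO C_DEMO (CNAME) VALUES ('c2');
INSERT INTO C_DEMO (CNAME) VALUES ('c3');
INSERT INTO D_DEMO (DNAME) VALUES ('d1');
INSERT INTO D_DEMO (DNAME) VALUES ('d2');

-- Every row of C_DEMO is concatenated with every row of D_DEMO.
-- Sorting by DNAME first lists the three rows for 'd1', then those for 'd2'.
SELECT C_DEMO.CNAME, D_DEMO.DNAME
FROM C_DEMO CROSS JOIN D_DEMO
ORDER BY D_DEMO.DNAME, C_DEMO.CNAME;

-- 3 rows x 2 rows = 6 rows in the Cartesian Product.
SELECT COUNT(*) AS PAIRS
FROM C_DEMO CROSS JOIN D_DEMO;
```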
[15] Use of the word “INNER” is optional.
[16] Recall that the letter “A” in the fourth character position of the SECTION# represents the Fall
quarter. The letters “S,” “W,” and “U” represent the Spring, Winter, and Summer quarters,
respectively.
Result:
SID TIME
-------- ------
BG66765 T1015
KP78924 T1015
KS39874 T1015
BE76598 T1045
4 rows selected.
Observe that the ON clause given above effectively constrains the concatenation of a
row from TAKES with a row from SECTION to the condition where the SECTION# in
TAKES matches a SECTION# in SECTION and the day of the week when the section meets,
recorded in the first character position of the TIME column, is the letter “T.”
Reviewing the content of the SECTION and TAKES tables provides an explanation of
this result. Since there are 11 rows in the SECTION table and 11 rows in the TAKES table,
the Cartesian Product operation results in an unnamed table with 121 rows and nine
columns (there are six columns in SECTION and three columns in TAKES). Since only five
rows in SECTION involve sections offered on a Tuesday, this Cartesian Product operation
yields a result where 55 of these 121 rows contain a value in the TIME column that begins
with the letter “T.” However, the portion of the Selection operation that requires
SECTION.SECTION# = TAKES.SECTION# selects only four of these 55 rows. Finally, the
SQL Projection operation displays the Student ID and time that appear on these four rows [17].
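The reasoning above can be traced on a small scale. The following is a minimal sketch of a join whose ON clause carries both conditions. The tables SECTION_DEMO and TAKES_DEMO, their columns, and their rows are hypothetical stand-ins for the SECTION and TAKES tables, so this should not be read as the statement that produced the result shown earlier.

```sql
-- Hypothetical demo tables; the names, columns, and rows are assumptions
-- made for illustration.
CREATE TABLE SECTION_DEMO (
    SECTION_NO INTEGER,
    MEET_TIME  VARCHAR(5)      -- first character encodes the meeting day
);
CREATE TABLE TAKES_DEMO (
    SID        VARCHAR(7),
    SECTION_NO INTEGER
);

INSERT INTO SECTION_DEMO (SECTION_NO, MEET_TIME) VALUES (1, 'T1015');
INSERT INTO SECTION_DEMO (SECTION_NO, MEET_TIME) VALUES (2, 'M0900');
INSERT INTO TAKES_DEMO (SID, SECTION_NO) VALUES ('S1', 1);
INSERT INTO TAKES_DEMO (SID, SECTION_NO) VALUES ('S2', 2);
INSERT INTO TAKES_DEMO (SID, SECTION_NO) VALUES ('S3', 1);

-- Both conditions appear in the ON clause: matching section numbers and a
-- meeting time that begins with 'T'. Of the 2 x 3 = 6 concatenated rows,
-- 3 involve the 'T1015' section, and 2 of those also match on SECTION_NO
-- (students 'S1' and 'S3').
SELECT TAKES_DEMO.SID, SECTION_DEMO.MEET_TIME
FROM TAKES_DEMO JOIN SECTION_DEMO
     ON TAKES_DEMO.SECTION_NO = SECTION_DEMO.SECTION_NO
    AND SECTION_DEMO.MEET_TIME LIKE 'T%'
ORDER BY TAKES_DEMO.SID;
```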
[17] The actual execution of this SELECT statement would differ from this description. For example, a
Selection operation on the SECTION table would occur first and result in “selecting” the five rows
where SECTION.TIME LIKE 'T%'. A Cartesian Product operation would follow, concatenating the five
rows with the 11 rows in the TAKES table. This would be followed by a second Selection operation
that would “select” the four rows in the concatenated result where SECTION.TIME LIKE 'T%' and
where TAKES.SECTION# = SECTION.SECTION#. The Projection operation on the TAKES.SID and
SECTION.TIME columns is the final operation executed.
[18] The tables R and S were created simply as a means to illustrate use of the UNION, INTERSECT,
and DIFFERENCE operators in an SQL query. Each of the examples in this section can be expressed
by a query that refers to just the SECTION table.
RELATION R
RELATION S
Note that when duplicate rows exist in tables, only one row per set of duplicates is
displayed in the result when using the UNION operator unless UNION ALL has been used.
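The difference between UNION and UNION ALL can be seen in a minimal sketch such as the following. The tables R_DEMO and S_DEMO and their rows are hypothetical and are much simpler than the relations R and S used in this section.

```sql
-- Hypothetical demo tables; the names and the rows are illustrative assumptions.
CREATE TABLE R_DEMO (SECTION_NO INTEGER);
CREATE TABLE S_DEMO (SECTION_NO INTEGER);

INSERT INTO R_DEMO (SECTION_NO) VALUES (101);
INSERT INTO R_DEMO (SECTION_NO) VALUES (102);
INSERT INTO S_DEMO (SECTION_NO) VALUES (102);
INSERT INTO S_DEMO (SECTION_NO) VALUES (103);

-- UNION removes duplicates: 101, 102, and 103 (3 rows).
SELECT SECTION_NO FROM R_DEMO
UNION
SELECT SECTION_NO FROM S_DEMO;

-- UNION ALL retains duplicates: 101, 102, 102, and 103 (4 rows).
SELECT SECTION_NO FROM R_DEMO
UNION ALL
SELECT SECTION_NO FROM S_DEMO;
```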
Result:
Result:
Result:
The word MINUS is used in Oracle’s SQL. The word EXCEPT is part of the SQL-2003
standard.
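As a rough illustration of how the set operators treat duplicate rows, the following Python sketch runs them against an in-memory SQLite database. The tables r and s, their columns, and their rows are made up for this sketch and are not the relations R and S used in this chapter. SQLite follows the standard and spells the difference operator EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (section_no TEXT, room TEXT);
    CREATE TABLE s (section_no TEXT, room TEXT);
    -- Hypothetical rows; section 102 appears in both tables.
    INSERT INTO r VALUES ('101', 'Baldwin'), ('102', 'Lindner');
    INSERT INTO s VALUES ('102', 'Lindner'), ('103', 'Lindner');
""")

def run(set_operator):
    """Combine r and s with the given set operator and return sorted rows."""
    sql = f"SELECT * FROM r {set_operator} SELECT * FROM s"
    return sorted(conn.execute(sql).fetchall())

union_rows = run("UNION")          # one row per set of duplicates
union_all_rows = run("UNION ALL")  # duplicates are kept
intersect_rows = run("INTERSECT")  # rows found in both tables
r_minus_s = run("EXCEPT")          # rows of r that do not appear in s
```

In this sketch UNION returns three rows while UNION ALL returns four, because the row for section 102 exists in both tables.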
Example 2.2.4 (Corresponds to Difference Example 2 in Section 11.2.2). Using the
data in the relations R and S given previously, form the difference S minus R (i.e., those
sections offered in a room located in Lindner Hall but not in the Fall quarter).
SQL SELECT Statement:
SELECT *
FROM S
MINUS
SELECT *
FROM R;
Result:
Result:
Note that the result of this Equijoin displays for each section taken by a student (see
columns 7–9) a complete description of the section (see columns 1–6). The following
example illustrates how the combination of a Natural Join operation and a Projection
operation allows a less cluttered result to be displayed.
SELECT *
FROM SECTION NATURAL JOIN TAKES;
Result:
Approach 2
SELECT * FROM tablename1 JOIN tablename2 USING (columnname_a, columnname_b, ...,
columnname_n)
SELECT *
FROM SECTION JOIN TAKES
USING (SECTION#);
Result:
Approach 3
SELECT SECTION.*, TAKES.GRADE, TAKES.SID
FROM SECTION JOIN TAKES ON
SECTION.SECTION# = TAKES.SECTION#;
Result:
Since most queries that involve one or more Join operations also include some sort
of Projection operation, in effect virtually all joins take the form of this “variation” of a
Natural Join.
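The difference between the join approaches is easiest to see in the columns each one returns. The sketch below is a minimal illustration in Python with an in-memory SQLite database; the table layouts, the column name section_no (standing in for SECTION#), and the rows are assumptions made for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE section (section_no TEXT, course_no TEXT);
    CREATE TABLE takes (section_no TEXT, sid TEXT, grade TEXT);
    INSERT INTO section VALUES ('101A', 'C1'), ('102A', 'C2');
    INSERT INTO takes VALUES ('101A', 'S1', 'A'), ('101A', 'S2', 'B');
""")

def columns(sql):
    """Return the column names that a query produces."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]

# NATURAL JOIN and JOIN ... USING show the shared column only once.
natural_cols = columns("SELECT * FROM section NATURAL JOIN takes")
using_cols = columns("SELECT * FROM section JOIN takes USING (section_no)")

# JOIN ... ON behaves like an Equijoin: the shared column appears twice.
on_cols = columns(
    "SELECT * FROM section JOIN takes ON section.section_no = takes.section_no"
)
```

Listing explicit columns in the SELECT clause, as in Approach 3, is what removes the repeated column when the ON form is used.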
Sometimes, a join may be specified between a table and itself. This type of join is
often referred to as a Self Join.
Example 2.3.3. List the student IDs of those students recorded as having taken more
than one course.
SELECT X.SID
FROM TAKES X JOIN TAKES Y
ON X.SID = Y.SID
AND X.SECTION# <> Y.SECTION#;
The result of this query contains duplicate student IDs because the join condition is
satisfied two times for each of the three rows in TAKES that involve X.SID KS39874, and
one time for each of the two rows in TAKES that involve X.SID KP78924, X.SID BE76598,
and X.SID BG66765, respectively.
Use of the qualifier DISTINCT in the SELECT statement eliminates the duplicate rows
and produces a result that corresponds to the relational algebra result, as shown here:
SQL SELECT Statement Revised:
SELECT DISTINCT X.SID
FROM TAKES X JOIN TAKES Y
ON X.SID = Y.SID
AND X.SECTION# <> Y.SECTION#;
Result:
SID
---------
KP78924
KS39874
BE76598
BG66765
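The effect of DISTINCT on a Self Join can be checked with a small experiment. The following Python sketch uses an in-memory SQLite database; the student IDs are taken from the text, but the section labels, the column name section_no, and the extra single-section student are placeholders assumed for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE takes (sid TEXT, section_no TEXT)")
# Student IDs follow the text; the section labels are placeholders.
rows = [
    ("KS39874", "s1"), ("KS39874", "s2"), ("KS39874", "s3"),
    ("KP78924", "s1"), ("KP78924", "s2"),
    ("BE76598", "s1"), ("BE76598", "s3"),
    ("BG66765", "s2"), ("BG66765", "s3"),
    ("XX00000", "s1"),  # hypothetical student with a single section
]
conn.executemany("INSERT INTO takes VALUES (?, ?)", rows)

self_join = """
    SELECT {distinct} x.sid
    FROM takes x JOIN takes y
      ON x.sid = y.sid AND x.section_no <> y.section_no
"""
with_duplicates = conn.execute(self_join.format(distinct="")).fetchall()
without_duplicates = conn.execute(self_join.format(distinct="DISTINCT")).fetchall()
```

A student with three sections contributes 3 x 2 = 6 rows and each student with two sections contributes 2 rows, so the query without DISTINCT returns 12 rows here, while the student with only one section never satisfies the join condition.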
The Natural Join or Equijoin operation can also be specified among multiple tables,
leading to what is sometimes referred to as an “n-way join.”
Example 2.3.4. Instead of listing the student IDs of those students having taken more
than one course, list the names of those students having taken more than one course.
SQL SELECT Statement:
SELECT DISTINCT STUDENT.NAME
FROM (TAKES X JOIN TAKES Y
ON X.SID = Y.SID
AND X.SECTION# <> Y.SECTION#)
JOIN STUDENT ON X.SID = STUDENT.SID;
This SELECT statement uses the result of the Self Join from Example 2.3.3 and joins
it with the STUDENT table. Without the use of the qualifier DISTINCT to eliminate
duplicate rows, the four names shown next would be displayed a total of 12 times (i.e.,
Gladis Bale twice, Poppy Kramer twice, Elijah Baley twice, and Sweety Kramer six times).
Result:
NAME
---------------
Poppy Kramer
Sweety Kramer
Gladis Bale
Elijah Baley
Example 2.3.5. For each student taking a section where the maximum number of
students is greater than 25, list the student’s name, classroom where the course is offered,
course number, and section number.
SQL SELECT Statement:
SELECT STUDENT.NAME, SECTION.ROOM, SECTION.COURSE#, SECTION.SECTION#
FROM (STUDENT JOIN TAKES
ON STUDENT.SID = TAKES.SID)
JOIN SECTION ON TAKES.SECTION# = SECTION.SECTION#
AND SECTION.MAXST > 25;
Result:
In effect, prior to the execution of the Projection operation, the SQL SELECT state-
ment first links (i.e., concatenates) each row of TAKES to the corresponding row in
STUDENT and then links each row in the combined result to the corresponding row in
SECTION. Should both the course number and course name need to be displayed, the
COURSE table must be included in the join. As shown next, the SELECT statement
required to generate this result takes the result of joining the STUDENT, TAKES, and
SECTION tables and joins it with the COURSE table.
SQL SELECT Statement:
SELECT STUDENT.NAME, SECTION.ROOM, SECTION.COURSE#, COURSE.NAME,
SECTION.SECTION#
FROM ((STUDENT JOIN TAKES
ON STUDENT.SID = TAKES.SID)
JOIN SECTION ON TAKES.SECTION# = SECTION.SECTION#
AND SECTION.MAXST > 25)
JOIN COURSE ON SECTION.COURSE# = COURSE.COURSE#;
Result:
19
Use of the word “OUTER” is not required to create a LEFT, RIGHT, or FULL Outer Join.
Result:
In SQL-2003, the words “LEFT OUTER JOIN” designate a Left Outer Join operation
(LEFT JOIN is an accepted shorthand). If a row of the table listed to the left of the
JOIN keywords has no matching row in the table on the right, it is paired with a row of
nulls and still displayed in the result.
One way to verify that the 10 rows with null values for the attributes in
TAKES represent the 10 students who have not taken a course is to take the difference
between the STUDENT and the TAKES relations over those attributes that have the
same domain:
SELECT STUDENT.SID
FROM STUDENT
MINUS
SELECT TAKES.SID
FROM TAKES;
Result:
SID
----------
AJ76998
DT87656
FR45545
FV67733
HJ45633
HT67657
JD35477
OD76578
SD23556
SW56547
The student IDs shown here can be replaced by a list of student names through the
use of the following nested subquery (subqueries are discussed in Section 12.3):
SELECT STUDENT.NAME
FROM STUDENT
WHERE STUDENT.SID IN
(SELECT STUDENT.SID
FROM STUDENT
MINUS
SELECT TAKES.SID
FROM TAKES);
Result:
NAME
------------------
Jenny Aniston
Tim Duncan
Rick Fox
Vanessa Fox
Jenna Hopp
Troy Hudson
Diana Jackson
Daniel Olive
David Sane
Wanda Seldon
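The relationship between the Left Outer Join and the difference-based check can be reproduced on a small scale. The sketch below uses Python with an in-memory SQLite database; the three students and their enrollments are invented for illustration, and EXCEPT is used because SQLite does not accept Oracle's MINUS keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid TEXT, name TEXT);
    CREATE TABLE takes (sid TEXT, section_no TEXT);
    INSERT INTO student VALUES ('S1', 'Ann'), ('S2', 'Bob'), ('S3', 'Cy');
    INSERT INTO takes VALUES ('S1', '101A'), ('S1', '102A');
""")

# Every student row is kept; students without a TAKES row get NULL (None).
left_join = conn.execute("""
    SELECT student.sid, takes.section_no
    FROM student LEFT OUTER JOIN takes ON student.sid = takes.sid
    ORDER BY student.sid, takes.section_no
""").fetchall()

# Cross-check with a set difference (EXCEPT here, MINUS in Oracle).
never_enrolled = sorted(conn.execute("""
    SELECT sid FROM student
    EXCEPT
    SELECT sid FROM takes
""").fetchall())

# Student IDs whose TAKES columns came back as NULL in the outer join.
null_padded = sorted(sid for sid, section_no in left_join if section_no is None)
```

The IDs in null_padded match those returned by the difference query, which is the same verification the text performs on the 10 students who have not taken a course.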
Result:
In SQL-2003, the words “RIGHT OUTER JOIN” designate a Right Outer Join operation
(RIGHT JOIN is an accepted shorthand). If a row of the table listed to the right of the
JOIN keywords has no matching row in the table on the left, it is paired with a row of
nulls and still displayed in the result.
Result:
Result:
A Semi-Minus operation occurs in cases where there are tuples in relation R that have
no counterpart in relation S. A Semi-Minus operation can be handled in SQL through use
of a combination of Join, Projection, and Minus operations.
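One way to read the combination of Join, Projection, and Minus is: join R with S, project the result back onto the columns of R, and subtract that from R. The Python sketch below tries this on an in-memory SQLite database; the tables r and s, their columns, and their rows are hypothetical, and EXCEPT stands in for Oracle's MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (id TEXT, label TEXT);
    CREATE TABLE s (id TEXT, note TEXT);
    INSERT INTO r VALUES ('1', 'a'), ('2', 'b'), ('3', 'c');
    INSERT INTO s VALUES ('1', 'x'), ('1', 'y'), ('3', 'z');
""")

# R semi-minus S = R minus (projection onto R's columns of R join S).
semi_minus = conn.execute("""
    SELECT id, label FROM r
    EXCEPT
    SELECT r.id, r.label FROM r JOIN s ON r.id = s.id
""").fetchall()
```

Only the row of r with id 2 has no counterpart in s, so it is the only row that survives the subtraction.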
Result:
12.3 SUBQUERIES
A complete SELECT statement embedded within another SELECT statement is called a
subquery. In data retrieval, subqueries may be used (a) in the SELECT list of a SELECT
statement, (b) in the FROM clause of a SELECT statement, (c) in the WHERE clause of
a SELECT statement, and (d) in the ORDER BY clause of a SELECT statement. As illus-
trated in Section 10.2.1, subqueries can also be used in an INSERT … SELECT … FROM
statement as well as in the SET clause of an UPDATE statement. The output of a subquery
can consist of a single value (a single-row subquery) or several rows of values (a multiple-
row subquery). There are two types of subqueries: (a) uncorrelated subqueries, where
the subquery is executed first and passes one or more values to the outer query, and
(b) correlated subqueries, where the subquery is executed once for every row retrieved
by the outer query.
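The distinction between the two types of subquery shows up in whether the inner query mentions a column of the outer query. The following Python sketch, run against an in-memory SQLite database, is a minimal illustration; the tables, the column name course_no (standing in for COURSE#), and the rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_no TEXT, name TEXT);
    CREATE TABLE section (section_no TEXT, course_no TEXT);
    INSERT INTO course VALUES ('C1', 'Databases'), ('C2', 'Networks');
    INSERT INTO section VALUES ('101A', 'C1'), ('102W', 'C1');
""")

# Uncorrelated: the inner query makes no reference to the outer query,
# so it can be evaluated once and its values handed to the outer query.
uncorrelated = conn.execute("""
    SELECT name FROM course
    WHERE course_no IN (SELECT course_no FROM section)
""").fetchall()

# Correlated: the inner query refers to course.course_no from the outer
# query, so conceptually it is re-evaluated for each row of course.
correlated = conn.execute("""
    SELECT name FROM course
    WHERE EXISTS (SELECT * FROM section
                  WHERE section.course_no = course.course_no)
""").fetchall()
```

Both formulations return the same course here; how often the inner query is physically executed is up to the query optimizer of the database product in use.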
Result:
COURSE# NAME COLLEGE
----------- --------------------- -----------
22QA375 Operations Research Business
22IS270 Principles of IS Business
22IS330 Database Concepts Business
22IS832 Database Principles Business
20ECES212 Programming in C++ Engineering
The NOT IN operator is the opposite of the IN operator and indicates that the rows
processed by the outer query are not equal to any of the values returned by the subquery.
Example 3.1.2 displays the course number, course name, and college of those courses for
which sections have not been offered.
Example 3.1.2
SELECT COURSE.COURSE#, COURSE.NAME, COURSE.COLLEGE
FROM COURSE
WHERE COURSE.COURSE# NOT IN
(SELECT SECTION.COURSE#
FROM SECTION);
Result:
The comparison operators =, <>, >, >=, <, and <= are single-row operators.
The purpose of Example 3.1.3 is to display the section number and course number of each
section in which at least one grade of “A” has been assigned. Observe the error message
generated when the single-row operator = (equals sign) is used in place of the
multiple-row operator IN. The error occurs because several section numbers in the TAKES
table are associated with a grade of “A,” so the subquery returns more than one row.
When = is replaced by IN, the query runs and produces the result shown after the error
message.
Example 3.1.3
SELECT DISTINCT SECTION.SECTION#, SECTION.COURSE#
FROM SECTION
WHERE SECTION.SECTION# =
(SELECT TAKES.SECTION#
FROM TAKES
WHERE TAKES.GRADE = 'A');
(SELECT TAKES.SECTION#
*
ERROR at line 4:
ORA-01427: single-row subquery returns more than one row
Result:
SECTION# COURSE#
----------- --------
101A2014 22QA375
401W2014 22IS832
104A2014 22IS330
The remaining examples in this section illustrate the use of the ANY and ALL opera-
tors in the context of the PROFESSOR table.
Operator Description
> ALL Greater than the highest value returned by the subquery
>= ALL Greater than or equal to the highest value returned by the subquery
< ALL Less than the lowest value returned by the subquery
<= ALL Less than or equal to the lowest value returned by the subquery
> ANY Greater than the lowest value returned by the subquery
>= ANY Greater than or equal to the lowest value returned by the subquery
< ANY Less than the highest value returned by the subquery
<= ANY Less than or equal to the highest value returned by the subquery
= ANY Equal to any value returned by the subquery (same as the IN operator)
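The descriptions in this table follow from reading ALL as "the comparison holds for every value returned" and ANY as "the comparison holds for at least one value returned." The Python sketch below is only a model of that reading, not of how a database evaluates the operators; the function names and the sample salaries are assumptions made for this sketch.

```python
import operator

COMPARATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "=": operator.eq,
}

def compare_all(value, op, subquery_values):
    """True when `value op v` holds for every v returned by the subquery."""
    return all(COMPARATORS[op](value, v) for v in subquery_values)

def compare_any(value, op, subquery_values):
    """True when `value op v` holds for at least one v returned by the subquery."""
    return any(COMPARATORS[op](value, v) for v in subquery_values)

# Illustrative salaries standing in for the values a subquery might return.
sample = [77000, 77000, 76000, 67000, 43000]

# > ALL is the same as "greater than the highest value" ...
beats_everyone = compare_all(92000, ">", sample)
# ... and > ANY is the same as "greater than the lowest value".
beats_someone = compare_any(44000, ">", sample)
```

The sketch assumes a non-empty list with no nulls. With an empty subquery result or null values, SQL's rules give answers that the highest-value and lowest-value shorthand in the table does not capture.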
Example 3.1.4. Display the names and salaries of those professors who earn more
than all professors in department number 3.
SQL SELECT Statement:
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY > ALL
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
---------------- -------
Mike Faraday 92000
Marie Curie 99000
John Nicholson 99000
The following query includes Chelsea Bush and Tony Hopkins in the result since their
salary is equal to the highest value returned by the subquery:
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY >= ALL
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
---------------- -------
Mike Faraday 92000
Chelsea Bush 77000
Tony Hopkins 77000
Marie Curie 99000
John Nicholson 99000
Example 3.1.5. Display the names and salaries of those professors who earn less than
all professors in department number 7.
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY < ALL
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 7);
Result:
NAME SALARY
----------------- -------
Ram Raj 44000
Prester John 44000
Laura Jackson 43000
Example 3.1.6. Revise the query in Example 3.1.5 and display the names and salaries
of those professors with a salary that is less than or equal to that of the lowest paid pro-
fessor in department number 7.
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY <= ALL
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 7);
Result:
NAME SALARY
----------------- -------
John Smith 45000
Ram Raj 44000
Prester John 44000
Laura Jackson 43000
Cathy Cobal 45000
Jeanine Troy 45000
Example 3.1.7. Revise the query in Example 3.1.6 to exclude display of any professors
in department number 7.
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE <> 7
AND PROFESSOR.SALARY <= ALL
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 7);
Result:
NAME SALARY
----------------------- -------
John Smith 45000
Ram Raj 44000
Prester John 44000
Laura Jackson 43000
Jeanine Troy 45000
Since < ANY returns all rows with a salary less than the highest salary associated with
department 3, the query in Example 3.1.8 displays the rows for all professors with a salary
less than $77,000.
Example 3.1.8
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY < ANY
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
----------------------- -------
Laura Jackson 43000
Prester John 44000
Ram Raj 44000
Jeanine Troy 45000
John Smith 45000
Cathy Cobal 45000
Sunil Shetty 64000
Katie Shef 65000
Kobe Bryant 66000
Jessica Simpson 67000
Jack Nicklaus 67000
Mike Crick 69000
Alan Brodie 76000
On the other hand, since <= ANY returns all rows with a salary less than or equal to
the highest salary associated with department 3, the query in Example 3.1.9 also displays
the rows for both Chelsea Bush and Tony Hopkins.
Example 3.1.9
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY <=ANY
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
----------------------- -------
Laura Jackson 43000
Prester John 44000
Ram Raj 44000
Jeanine Troy 45000
John Smith 45000
Cathy Cobal 45000
Sunil Shetty 64000
Katie Shef 65000
Kobe Bryant 66000
Jessica Simpson 67000
Jack Nicklaus 67000
Mike Crick 69000
Alan Brodie 76000
Tony Hopkins 77000
Chelsea Bush 77000
Since > ANY returns all rows with a salary greater than the lowest salary associated
with professors who work in department 3, the query in Example 3.1.10 displays all rows
except for the professor with the lowest salary (i.e., Laura Jackson) and the two professors
who have a null salary.
Example 3.1.10
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY > ANY
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
----------------------- -------
John Nicholson 99000
Marie Curie 99000
Mike Faraday 92000
Chelsea Bush 77000
Tony Hopkins 77000
Alan Brodie 76000
Mike Crick 69000
Jessica Simpson 67000
Jack Nicklaus 67000
Kobe Bryant 66000
Katie Shef 65000
Sunil Shetty 64000
Cathy Cobal 45000
Jeanine Troy 45000
John Smith 45000
Prester John 44000
Ram Raj 44000
As expected, since >= ANY returns all rows with a salary greater than or equal to the
lowest salary associated with department 3, all rows are returned in Example 3.1.11
except for those associated with the professors who have null salaries.
Example 3.1.11
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY >= ANY
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
----------------------- -------
John Nicholson 99000
Marie Curie 99000
Mike Faraday 92000
Chelsea Bush 77000
Tony Hopkins 77000
Alan Brodie 76000
Mike Crick 69000
Jessica Simpson 67000
Jack Nicklaus 67000
Kobe Bryant 66000
As illustrated in Example 3.1.12, = ANY produces the same result as the IN operator.
Note how the query in Example 3.1.12a restricts the rows displayed to those not associ-
ated with department 3.
Example 3.1.12
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY = ANY
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3);
Result:
NAME SALARY
----------------------- -------
Tony Hopkins 77000
Chelsea Bush 77000
Alan Brodie 76000
Jack Nicklaus 67000
Jessica Simpson 67000
Laura Jackson 43000
Example 3.1.12a
SELECT PROFESSOR.NAME, PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.SALARY = ANY
(SELECT PROFESSOR.SALARY
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3)
AND PROFESSOR.DCODE <> 3;
Result:
NAME SALARY
----------------------- -------
Jack Nicklaus 67000
Although ANY and ALL are most commonly used with subqueries that return a set of
numeric values, it is also possible to use them in conjunction with subqueries that return a
set of character values.
Result:
NAME SALARY
----------------------- -------
Mike Faraday 92000
Marie Curie 99000
John Nicholson 99000
Result:
NAME SALARY
----------------------- -------
Ram Raj 44000
Prester John 44000
Laura Jackson 43000
Result:
NAME SALARY
----------------------- -------
John Smith 45000
Ram Raj 44000
Prester John 44000
Laura Jackson 43000
Cathy Cobal 45000
Jeanine Troy 45000
Observe how the shaded subquery, in essence, creates a temporary table that records
the average salary of the professors in each department. The syntax calls for the table alias
B to be located outside the parenthetical expression of the subquery since the execution
of the subquery yields a temporary (i.e., virtual) table. The Join operation uses the
PROFESSOR table and concatenates a row from PROFESSOR (table alias A) with a row
from the temporary table created by the subquery (table alias B) when (a) the department
number of the row from A matches the department number of a row from B, and (b) the
salary of the professor in the row from A exceeds the average salary of the professors in
his or her department.
20
Such a “temporary table” is more formally called an inline view.
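The join against a subquery in the FROM clause described above can be sketched as follows. This is a minimal Python illustration using an in-memory SQLite database; the professor rows, the lowercase column names, and the column alias avg_salary are assumptions made for this sketch and do not reproduce the PROFESSOR table of this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (name TEXT, dcode INTEGER, salary INTEGER);
    INSERT INTO professor VALUES
        ('P1', 1, 50000), ('P2', 1, 70000),
        ('P3', 2, 40000), ('P4', 2, 40000), ('P5', 2, 70000);
""")

above_dept_average = conn.execute("""
    SELECT a.name, a.salary, b.avg_salary
    FROM professor a
    JOIN (SELECT dcode, AVG(salary) AS avg_salary
          FROM professor
          GROUP BY dcode) b          -- alias b sits outside the parentheses
      ON a.dcode = b.dcode AND a.salary > b.avg_salary
    ORDER BY a.name
""").fetchall()
```

Department 1 averages 60000 and department 2 averages 50000 in this data, so only P2 and P5 satisfy both parts of the join condition.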
Result:
In order to explain this query, each line has been numbered. Two identical SELECT
statements appear in the column list. The first (shown in italics on lines 2−4)
determines the average salary for the professors in the department of a given
professor, while the second (shown highlighted on lines 5 and 6) recalculates this
average salary and uses it to determine the deviation between the salary of the
professor and the average salary for the professors in the department. The WHERE clause
in the main query (a) excludes from consideration those professors with a null salary
(see line 8), and (b) includes only those professors whose salary exceeds the average
salary in their department (note that this average is calculated a third time in lines
9 and 10). The ORDER BY clause on line 11 allows the result to be displayed in
descending order by the amount of the deviation
between the salary of the professor and the average salary of the professors in their
department.
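The numbered query itself is not reproduced here, so the following Python sketch only shows a query of the same general shape: a correlated subquery repeated in the column list and again in the WHERE clause, with the result ordered by the deviation. The data, the lowercase column names, and the aliases dept_avg and deviation are assumptions made for this sketch, which runs against an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (name TEXT, dcode INTEGER, salary INTEGER);
    INSERT INTO professor VALUES
        ('P1', 1, 50000), ('P2', 1, 70000),
        ('P3', 2, 40000), ('P4', 2, 90000), ('P5', 2, NULL);
""")

deviations = conn.execute("""
    SELECT a.name,
           (SELECT AVG(b.salary) FROM professor b
             WHERE b.dcode = a.dcode) AS dept_avg,
           a.salary - (SELECT AVG(b.salary) FROM professor b
                        WHERE b.dcode = a.dcode) AS deviation
    FROM professor a
    WHERE a.salary IS NOT NULL
      AND a.salary > (SELECT AVG(b.salary) FROM professor b
                       WHERE b.dcode = a.dcode)
    ORDER BY deviation DESC
""").fetchall()
```

AVG ignores the null salary of P5, so department 2 averages 65000 and P4 is listed first with a deviation of 25000, followed by P2 with a deviation of 10000.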
Result:
Result:
NAME AVG(PROFESSOR.SALARY)
--------------- ---------------------
Economics 78000
QA/QM 68000
The SELECT statement in the HAVING clause acts as a filter that ensures the selection
of only those departments (i.e., groups) with an average salary greater than the average
salary of the entire college.
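A subquery used as a group filter can be tried out on a few rows. The Python sketch below runs against an in-memory SQLite database; the professor rows and the department labels D1 to D3 are invented for illustration and are not the data behind the result shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO professor VALUES
        ('P1', 'D1', 90000), ('P2', 'D1', 70000),
        ('P3', 'D2', 40000), ('P4', 'D2', 60000),
        ('P5', 'D3', 65000);
""")

# Keep only the groups whose average exceeds the average over all rows.
above_overall_average = conn.execute("""
    SELECT dept, AVG(salary)
    FROM professor
    GROUP BY dept
    HAVING AVG(salary) > (SELECT AVG(salary) FROM professor)
    ORDER BY dept
""").fetchall()
```

The overall average is 65000 in this data. Department D3 averages exactly 65000 and is therefore excluded by the strict > comparison, leaving only D1.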
Example 3.2.1. Display the names of professors who have offered at least one section.
SELECT PROFESSOR.NAME
FROM PROFESSOR
WHERE EXISTS
(SELECT *
FROM SECTION
WHERE PROFESSOR.EMPID = SECTION.PROFID);
Result:
NAME
---------------
Cathy Cobal
Tony Hopkins
Ram Raj
Katie Shef
Result
COURSE# NAME
------------- -------------------
22QA375 Operations Research
Both the subquery that begins on line 4 and the subquery that begins on line 7 refer to
the SECTION table. To prevent ambiguity, these two uses of the SECTION table have
been assigned the aliases of A and B, respectively. Since the two uses of the NOT EXISTS
operator may make this query difficult to understand, let’s begin by assuming the first
row retrieved from the COURSE table as part of the execution of lines 1–3 defines
COURSE.COURSE# as 15ECON112. Replacing COURSE.COURSE# in line 9 with
‘15ECON112’ causes the execution of lines 4–10 to generate the result shown here:
4 (SELECT DISTINCT SUBSTR(A.SECTION#,4,1)
5 FROM SECTION A
6 WHERE NOT EXISTS
7 (SELECT *
8 FROM SECTION B
9 WHERE COURSE.COURSE# = B.COURSE#
10 AND SUBSTR(A.SECTION#,4,1) = SUBSTR(B.SECTION#,4,1)));
Result
SUBSTR(A.SECTION#,4,1)
-------------------------
W
U
A
S
Since Course# 15ECON112 does not appear at all in the SECTION table, the NOT
EXISTS condition is true for each of the four quarters. On the other hand, when Course#
22QA375 replaces Course# 15ECON112 in line 9, the NOT EXISTS condition is false for all
four quarters (note that a section of Course# 22QA375 is offered during each quarter in
the SECTION table) and thus “no rows selected” is the result when lines 4–10 are
executed.
4 (SELECT DISTINCT SUBSTR(A.SECTION#,4,1)
5 FROM SECTION A
6 WHERE NOT EXISTS
7 (SELECT *
8 FROM SECTION B
9 WHERE COURSE.COURSE# = B.COURSE#
10 AND SUBSTR(A.SECTION#,4,1) = SUBSTR(B.SECTION#,4,1)));
Result:
no rows selected.
In other words, the NOT EXISTS condition in line 3 is true for Course# 22QA375.
This SQL formulation corresponds to the following informal statement: “Display the
course numbers of those courses such that there does not exist a quarter during which the
course is not offered.” It is left as an exercise for the reader to determine the result when
lines 4–10 are executed for other courses (e.g., Course# 22IS330).
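The double NOT EXISTS formulation can be explored with a small experiment. The sketch below uses Python's sqlite3 module in place of Oracle. The column names COURSE_NO and SECTION_NO (replacing COURSE# and SECTION#), the section numbers, and two of the course names are assumptions made for illustration; as in the text, the fourth character of the section number is taken to be the quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COURSE (COURSE_NO TEXT PRIMARY KEY, NAME TEXT);
    CREATE TABLE SECTION (SECTION_NO TEXT, COURSE_NO TEXT);
    -- Hypothetical rows; the fourth character of SECTION_NO is the quarter.
    INSERT INTO COURSE VALUES
        ('15ECON112', 'Course X'),
        ('22IS330', 'Course Y'),
        ('22QA375', 'Operations Research');
    INSERT INTO SECTION VALUES
        ('101A', '22QA375'), ('102W', '22QA375'),
        ('103S', '22QA375'), ('104U', '22QA375'),
        ('105A', '22IS330'), ('106W', '22IS330');
""")

# Courses for which there is no quarter in which the course is not offered.
always_offered = conn.execute("""
    SELECT COURSE.COURSE_NO, COURSE.NAME
    FROM COURSE
    WHERE NOT EXISTS
        (SELECT DISTINCT substr(A.SECTION_NO, 4, 1)
         FROM SECTION A
         WHERE NOT EXISTS
             (SELECT *
              FROM SECTION B
              WHERE COURSE.COURSE_NO = B.COURSE_NO
                AND substr(A.SECTION_NO, 4, 1) = substr(B.SECTION_NO, 4, 1)))
""").fetchall()

# The inner part on its own, with the course number supplied as a constant:
# it lists the quarters in which the given course is NOT offered.
missing_quarters = [row[0] for row in conn.execute("""
    SELECT DISTINCT substr(A.SECTION_NO, 4, 1)
    FROM SECTION A
    WHERE NOT EXISTS
        (SELECT *
         FROM SECTION B
         WHERE B.COURSE_NO = ?
           AND substr(A.SECTION_NO, 4, 1) = substr(B.SECTION_NO, 4, 1))
    ORDER BY 1
""", ("22IS330",))]

print(always_offered)
print(missing_quarters)
```

With these assumed rows, only the course that has a section in every quarter survives the outer NOT EXISTS, while the second query shows why a course offered in only two quarters is rejected.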
21
In addition to listing column names, expressions (e.g., see Section 12.1.2), functions (e.g.,
COUNT(*)), and SELECT statements (e.g., see Section 12.3), the SELECT list of a SELECT statement may
contain either a numeric constant (e.g., the numeric literal 0) or a string constant.
Result:
The same result could be obtained using the Left Outer Join shown next, since the
value of COUNT(TAKES.SID) is zero when a row for a student not enrolled in a section is
concatenated with the row of null values in TAKES.
SQL SELECT Statement:
SELECT STUDENT.SID, STUDENT.NAME, COUNT(TAKES.SID) AS "Sections Taken"
FROM STUDENT LEFT OUTER JOIN TAKES
ON STUDENT.SID = TAKES.SID
GROUP BY STUDENT.SID, STUDENT.NAME
ORDER BY "Sections Taken" DESC;
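To see why COUNT(TAKES.SID) is zero for a student with no enrollments, the Left Outer Join can be run against a few rows. This is a minimal sketch using Python's sqlite3 module rather than Oracle; the student names, the SECTION_NO column, and the extra COUNT(*) column are assumptions added for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (SID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TAKES (SID INTEGER, SECTION_NO INTEGER);
    -- Hypothetical rows: Student C is not enrolled in any section.
    INSERT INTO STUDENT VALUES (1, 'Student A'), (2, 'Student B'), (3, 'Student C');
    INSERT INTO TAKES VALUES (1, 101), (1, 102), (2, 101);
""")

sections_taken = conn.execute("""
    SELECT STUDENT.SID, STUDENT.NAME,
           COUNT(TAKES.SID) AS "Sections Taken",
           COUNT(*) AS "Rows In Group"
    FROM STUDENT LEFT OUTER JOIN TAKES
         ON STUDENT.SID = TAKES.SID
    GROUP BY STUDENT.SID, STUDENT.NAME
    ORDER BY "Sections Taken" DESC, STUDENT.SID
""").fetchall()

for row in sections_taken:
    print(row)
```

For the unenrolled student the join produces one row whose TAKES columns are null, so COUNT(*) reports 1 while COUNT(TAKES.SID), which ignores nulls, reports 0.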
Example 3.3.2. Display the maximum, minimum, total, and average salary for the
professors affiliated with each department. In addition, count the number of professors in
each department as well as the number of professors in each department with a not null
salary.
SQL SELECT Statement:
SELECT DEPARTMENT.NAME AS "Dept Name", DEPARTMENT.DCODE AS "Dept Code",
MAX(PROFESSOR.SALARY) AS "Max Salary", MIN(PROFESSOR.SALARY) AS "Min Salary",
SUM (PROFESSOR.SALARY) AS "Total Salary",
ROUND(AVG(PROFESSOR.SALARY),0) AS "Avg Salary", COUNT(*) AS "Size",
COUNT(PROFESSOR.SALARY) AS "# Sals"
FROM DEPARTMENT JOIN PROFESSOR
ON DEPARTMENT.DCODE = PROFESSOR.DCODE
GROUP BY DEPARTMENT.NAME, DEPARTMENT.DCODE
ORDER BY "# Sals" DESC;
Result:
Dept Name   Dept Code Max Salary Min Salary Total Salary Avg Salary  Size # Sals
----------- --------- ---------- ---------- ------------ ---------- ----- ------
QA/QM               3      77000      43000       340000      68000     5      5
Economics           1      92000      45000       203000      67667     3      3
IS                  7      65000      45000       174000      58000     3      3
Economics           4      99000      67000       265000      88333     3      3
Mathematics         6      44000      44000        88000      44000     3      2
Philosophy          9      69000      45000       114000      57000     3      2
Since department names are not unique but required as part of the output,
grouping must be done on both DEPARTMENT.NAME and DEPARTMENT.DCODE. In
addition, instead of grouping by DEPARTMENT.DCODE, grouping could have been by
DEPARTMENT.COLLEGE had the name of the college housing the department been
required as opposed to the department code.
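The two points made above, that grouping on the name alone would merge departments that share a name and that COUNT(*) and COUNT(column) differ when nulls are present, can be checked with a small experiment. The sketch below uses Python's sqlite3 module; the professor rows are hypothetical and much smaller than the book's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (NAME TEXT, DCODE INTEGER PRIMARY KEY);
    CREATE TABLE PROFESSOR (NAME TEXT, DCODE INTEGER, SALARY INTEGER);
    -- Hypothetical rows: two departments share a name; one salary is null.
    INSERT INTO DEPARTMENT VALUES ('Economics', 1), ('Economics', 4), ('Mathematics', 6);
    INSERT INTO PROFESSOR VALUES
        ('Prof A', 1, 92000), ('Prof B', 1, 45000),
        ('Prof C', 4, 99000),
        ('Prof D', 6, 44000), ('Prof E', 6, NULL);
""")

# Grouping on both columns keeps the two Economics departments apart.
by_name_and_code = conn.execute("""
    SELECT DEPARTMENT.NAME, DEPARTMENT.DCODE,
           COUNT(*) AS "Size", COUNT(PROFESSOR.SALARY) AS "# Sals"
    FROM DEPARTMENT JOIN PROFESSOR
         ON DEPARTMENT.DCODE = PROFESSOR.DCODE
    GROUP BY DEPARTMENT.NAME, DEPARTMENT.DCODE
    ORDER BY DEPARTMENT.DCODE
""").fetchall()

# Grouping on the name alone merges them into a single group.
by_name_only = conn.execute("""
    SELECT DEPARTMENT.NAME, COUNT(*) AS "Size"
    FROM DEPARTMENT JOIN PROFESSOR
         ON DEPARTMENT.DCODE = PROFESSOR.DCODE
    GROUP BY DEPARTMENT.NAME
    ORDER BY DEPARTMENT.NAME
""").fetchall()

print(by_name_and_code)
print(by_name_only)
```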
Chapter Summary
A query expressed in relational algebra involves a sequence of operations that, when executed
in the order specified, produces the desired results. SQL, the most common way that relational
algebra is implemented for data retrieval operations in a relational database, is the subject of
Chapter 12. The SQL SELECT statement is used to express a query and is the most important
statement in the language. Every SELECT statement, when executed, produces as its result a
table that consists of one or more columns and zero or more rows. Six clauses make up a
SELECT statement. Two of these clauses, the SELECT clause and the FROM clause, are
required. The SELECT clause identifies the columns, calculated values, and literals to appear in
the result table. All column names that appear in the SELECT clause must have their
corresponding tables or views listed in the FROM clause.
The other four clauses—WHERE, GROUP BY, HAVING, and ORDER BY—are optional.
The WHERE clause of the SELECT statement includes a search condition that consists of an
expression involving constant values, column names, and comparison operators. The ORDER
BY clause allows the result table to be sorted on the values that appear in the SELECT clause.
If specified, the ORDER BY clause must be the final clause in the SELECT statement. The
GROUP BY clause is used to form groups of rows of the result table based on column
values. When grouping of rows occurs, all aggregate functions (e.g., COUNT, SUM, AVG) are
computed on the individual groups and not the entire table. If used, the HAVING clause follows
the GROUP BY clause. The HAVING clause functions as a WHERE clause for groups, keeping
some groups and eliminating other groups from further consideration.
A data field without a value in it is said to contain a null value. A null value can occur in a data
field where a value is unknown or where a value is not meaningful. In an SQL SELECT statement,
the only comparison operators that can be used with null values are IS NULL and IS NOT
NULL. Any other operator (e.g., =, >, <) used with a null value will always produce an unknown
result, which a WHERE or HAVING clause treats in the same way as a false result: the row or
group is not selected.
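The treatment of comparisons with null values can be observed directly. The following is a minimal sketch using Python's sqlite3 module; the two PROFESSOR rows and the helper function count_where are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROFESSOR (NAME TEXT, SALARY INTEGER);
    -- Hypothetical rows: one known salary and one null salary.
    INSERT INTO PROFESSOR VALUES ('Prof A', 50000), ('Prof B', NULL);
""")

def count_where(condition):
    """Count the PROFESSOR rows that satisfy the given search condition."""
    sql = "SELECT COUNT(*) FROM PROFESSOR WHERE " + condition
    return conn.execute(sql).fetchone()[0]

equals_null = count_where("SALARY = NULL")            # comparison is unknown for every row
not_equals_null = count_where("NOT (SALARY = NULL)")  # NOT unknown is still unknown
is_null = count_where("SALARY IS NULL")
is_not_null = count_where("SALARY IS NOT NULL")

print(equals_null, not_equals_null, is_null, is_not_null)
```

Both the comparison and its negation select no rows in this sketch, which is what distinguishes an unknown result from a false one; only IS NULL and IS NOT NULL separate the two rows.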
SQL queries are based on one or more tables or views and often take the form of
subqueries and joins. A subquery is an SQL SELECT statement embedded within another query or
even another subquery. Subqueries may appear in the FROM clause, the column list, the
WHERE clause, and the HAVING clause. The SQL SELECT statement is used to implement
each of the relational algebra operations, including both inner and outer joins. Use of the SQL
SELECT statement in joining tables is required for all queries where the result comes from more
than one table.
A complete list of the SQL SELECT statement features appears in Appendix B. After
studying this chapter, it is hoped that readers will be able to use the features discussed and,
where necessary, adapt them to their specific database platform with a minimum of difficulty.
Exercises
1. Describe the six clauses that can be used in the syntax of the SQL SELECT statement.
Which two clauses must be part of each SELECT statement?
2. What is the difference between a SELECT statement used in conjunction with the relational
algebra Selection operation and a SELECT statement used in conjunction with the
relational algebra Projection operation?
3. Of what value is the use of parentheses when making use of the rules of operator
precedence?
4. What is the difference between a character field that contains a null value and a character
field that contains a single blank space?
5. Which comparison operators can be used when searching for null values? Which comparison
operators cannot be used when searching for null values? What is the result when
these unacceptable comparison operators are used when searching for null values?
6. What is the result of an attempt to add, subtract, multiply, or divide two number fields, one
of which contains a null value?
7. How are null values treated when one or more appears during the execution of a group
function?
8. When must a GROUP BY clause be used in a query?
9. What SQL operator (i.e., keyword) is used in conjunction with pattern matching?
10. What is the difference between a SELECT statement that uses COUNT (*) and a SELECT
statement that uses COUNT (column name)? How does COUNT (column name) differ from
COUNT (DISTINCT column name)?
11. Why is it important to be aware of the distinction between the CHAR and VARCHAR data
types?
12. What is the difference between a Cross Join, an Inner Join, and an Outer Join?
13. What is the difference between the JOIN … USING and the JOIN … ON approaches for
joining tables? Which approach must be used if the requirement that attributes have unique
names over the entire relational schema is enforced?
14. What is a subquery, and where can subqueries appear within an SQL SELECT statement?
15. What do the ALL and ANY operators do when used in a subquery?
16. You must have completed Exercise 10 in Chapter 10 before beginning this exercise, and
thus have used the SQL Data Definition Language to populate the tables for the three
relations DRIVER, TICKET_TYPE, and TICKET. Once the three tables have been populated,
write SQL SELECT statements to satisfy the following information requests:
a. Display the names of all drivers.
b. Display the license numbers of all drivers who have been issued a ticket.
c. Display the names of all drivers who have been issued a ticket.
d. Display the license numbers of all drivers who have never been issued a ticket.
e. Display the names of all drivers who have never been issued a ticket.
f. Count the number of tickets issued for each offense. Include as part of what you
display any offense for which a ticket has not been issued.
g. For each ticket issued, display the name of the driver, the ticket number, and the
nature of the offense. Order the results in ascending order by the name of the driver;
and, within each driver, order the results by ticket number.
17. You must have completed Exercise 11 in Chapter 10 before beginning this exercise, and
thus have used the SQL Data Definition Language to populate the tables for the three
relations COMPANY, STUDENT, and INTERNSHIP. Once the three tables have been
populated, write SQL SELECT statements to satisfy the following information requests:
CHAPTER 13
ADVANCED DATA
MANIPULATION USING SQL
SQL for data manipulation is covered extensively in Chapter 12. In this chapter, we offer the
reader a glimpse of some advanced features of SQL via an assortment of simple,
easy-to-grasp examples. The discussion begins in Section 13.1 with an examination of a number of
character-based built-in functions that can be used in an SQL statement anywhere a constant
of the same data type can be used, whereas Section 13.2 focuses on functions that facilitate
the manipulation of dates and times. The next four sections introduce SQL’s features for
writing hierarchical queries, using extended GROUP BY clauses, working with analytical
functions, and incorporating elements of spreadsheet modeling into the SQL SELECT statement.
Hierarchical relationships exist in an organization chart, a bill of materials, or a family
tree. Section 13.3 discusses the use of the CONNECT BY clause and the PRIOR operator in
processing data of this type. Section 13.4 discusses enhancements to the GROUP BY clause that
make aggregating data from different perspectives simpler and more efficient. These enhancements
take the form of the ROLLUP and CUBE operators supplemented by the GROUPING SETS
extension to the GROUP BY clause, the GROUPING function, the GROUPING_ID function,
and the GROUP_ID function. SQL’s analytical functions and MODEL clause are business
intelligence tools that allow data to be retrieved, analyzed, and reported. Two categories of SQL’s
analytical functions, ranking functions and window functions, are addressed in Section 13.5. The
SQL MODEL clause, introduced in Section 13.6, makes it possible to define a multidimensional
array on query results and then apply rules on the array to calculate new values.
Section 13.7 concludes the chapter with a series of examples that apply a number of the SQL
features introduced in this chapter and in Chapter 12.
As was the case in parts of Chapter 12, the execution of the queries in this chapter
includes feedback as to the number of rows retrieved. In an effort to conserve space,
information of this type is omitted for those queries where the number of rows retrieved is
obvious.
TABLE 13.1 Selected SQL:2003 built-in functions and their Oracle equivalents
[Table body not reproduced. Column headings: SQL:2003 Function; Oracle 10g Implementation; Chapter 13 Sections Containing Useful Examples]
This section covers the use of the functions listed in Table 13.1 via a variety of short
examples in the context of the SQL SELECT statement. Note that Oracle’s SQL requires
use of the FROM keyword in every SQL SELECT statement. Thus, many of the Oracle
SQL examples in this section make use of the DUAL table. The DUAL table has one column,
DUMMY CHAR(1), and one row with a value of “X.” As we will see, the DUAL table
is useful when a SELECT statement is issued to display data that does not exist in a table.
It is particularly useful when you want to display a numeric or character literal in a
SELECT statement.
Column 2 of Table 13.1 contains names of selected SQL:2003 standard built-in functions
used by Oracle, while column 3 contains the section numbers in Chapter 13 where
examples of the use of these functions can be found.
It is important to note that the syntax and some of the functionality of some of the
SQL:2003 functions in this section vary across database platforms. In short, the material
in this chapter, along with the material in Chapters 10 and 12, is intended to be a highly
useful but not necessarily stand-alone reference to SQL. As such, the reader may need to
supplement the material in this textbook with product-specific documentation.
where:
• n indicates the character position where the search begins
• len represents the length of the search.
Oracle implements the SUBSTRING function with the SUBSTR (char, m [,n]) function,
which returns a portion of char, beginning at character m, that is n characters long (if n is
omitted, to the end of char). The first position of char is 1. Floating point numbers passed
as arguments to SUBSTR are automatically converted to integers.
The SELECT statement in SUBSTRING Example 1 goes into the character string
‘ABCDEFG’ beginning at character position 3 and returns the next four characters, thus
displaying “CDEF.”
SUBSTRING Example 1
SELECT SUBSTR('ABCDEFG',3,4) "Substring" FROM DUAL;
Result: CDEF¹
As shown in Examples 2 and 3, if the position where the search begins is not an integer,
the value of the position argument m is truncated.
SUBSTRING Example 2
SELECT SUBSTR('ABCDEFG',3.1,4) "Substring" FROM DUAL;
Result: CDEF
SUBSTRING Example 3
SELECT SUBSTR('ABCDEFG',3.7,4) "Substring" FROM DUAL;
Result: CDEF
The position where the search begins can also be a negative number. In this case,
the starting position is counted backward from the rightmost character of the string. As
expected, the SELECT statement in Example 4 illustrates that a value of -5 for the
starting position of the search produces the same result as when the starting position of
the search has a value of 3.
SUBSTRING Example 4
SELECT SUBSTR('ABCDEFG',-5,4) "Substring" FROM DUAL;
Result: CDEF
As shown in Example 5, if the number of characters to be searched is omitted, the
number of characters returned extends to the end of the string.
SUBSTRING Example 5
SELECT SUBSTR('ABCDEFG',-1) FROM DUAL;
Result: G
1
The output of each function is accompanied by a column heading. Since the format used to display
column headings varies by function and by product, only the value returned by the function is
displayed in this section.
Observe that if the position where the search begins is a negative value and the number
of characters to be stripped off is greater than the absolute value of the position where the
search begins, the number of characters returned extends only to the end of the string
(see Example 6).
SUBSTRING Example 6
SELECT SUBSTR('ABCDEFG',-1, 3) FROM DUAL;
Result: G
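Several of the SUBSTRING examples can be reproduced without an Oracle installation. The sketch below is an approximation that uses the substr function of SQLite through Python's sqlite3 module; for the integer arguments shown here SQLite behaves like the Oracle examples above, but the reader should not assume the two products agree in every case. The helper function substr is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def substr(text, start, length=None):
    """Evaluate SQLite's substr(text, start [, length]) and return the value."""
    if length is None:
        sql, params = "SELECT substr(?, ?)", (text, start)
    else:
        sql, params = "SELECT substr(?, ?, ?)", (text, start, length)
    return conn.execute(sql, params).fetchone()[0]

example_1 = substr("ABCDEFG", 3, 4)    # start at position 3, take 4 characters
example_4 = substr("ABCDEFG", -5, 4)   # start 5 characters from the right end
example_5 = substr("ABCDEFG", -1)      # length omitted: run to the end of the string
example_6 = substr("ABCDEFG", -1, 3)   # length exceeds what is left: stop at the end

print(example_1, example_4, example_5, example_6)
```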
However, as indicated in Examples 7 and 8, if the number of characters to be stripped
off is zero or negative, a null value is displayed for the result.
SUBSTRING Example 7
SELECT SUBSTR ('ABCDEFG',-1, 0) FROM DUAL;
Result:
NAME Phone
------------ -------------
John Smith (523)556-7645
John B Smith (523)556-7556
Sunil Shetty (523)556-6764
Katie Shef (523)556-8765
Cathy Cobal (523)556-5345
Jeanine Troy (523)556-5545
Tiger Woods (523)556-5563
7 rows selected.
Result: 7
Using the TEXTBOOK table introduced in Section 12.1.5 of Chapter 12, the SELECT
statements in Examples 2 and 3 illustrate the difference when the LENGTH function is
applied to a column defined as a VARCHAR data type (TEXTBOOK.TITLE) versus one
defined as a CHAR data type (TEXTBOOK.PUBLISHER).
LENGTH Example 2
SELECT TEXTBOOK.TITLE, TEXTBOOK.PUBLISHER, LENGTH(TEXTBOOK.TITLE)
FROM TEXTBOOK;
Result:
TITLE                         PUBLISHER             LENGTH(TEXTBOOK.TITLE)
----------------------------- --------------------- ----------------------
Database Management           Thomson               19
Linear Programming            Prentice-Hall         18
Simulation Modeling           Springer              19
Systems Analysis              Thomson               16
Principles of IS              Prentice-Hall         16
Economics For Managers                              22
Programming in C++            Thomson               18
Fundamentals of SQL                                 19
Data Modeling                                       13
9 rows selected.
LENGTH Example 3
SELECT TEXTBOOK.TITLE, TEXTBOOK.PUBLISHER,
LENGTH(TEXTBOOK.PUBLISHER)
FROM TEXTBOOK;
2
The LENGTH (char) function is called the CHAR_LENGTH (string) function in the SQL:2003
standard. It represents the length of a character string.
3
The LENGTH (char) function is the LEN (char) function in SQL Server.
Result:
9 rows selected.
TRIM Example 4 illustrates how the Oracle LTRIM function not only trims the word
“Systems” from the beginning of the textbook title Systems Analysis, it also trims the
leading “S” from all titles that begin with the letter “S” (e.g., Simulation Modeling). This
happens because Oracle’s LTRIM function treats unwanted as a set of characters rather than
as a single string: leading characters are removed one at a time for as long as each one
appears somewhere in unwanted.
TRIM Example 4
SELECT TEXTBOOK.TITLE, LTRIM(TEXTBOOK.TITLE, 'Systems') "Trimmed Title"
FROM TEXTBOOK;
Result:
9 rows selected.
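The character-set behavior described for TRIM Example 4 can be observed on a few of the titles. This is a minimal sketch using the two-argument ltrim function of SQLite through Python's sqlite3 module, which also treats its second argument as a set of characters; it is offered as an approximation of the Oracle function, not as a reproduction of the result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

titles = ["Systems Analysis", "Simulation Modeling", "Database Management"]

# Leading characters are removed while each one is a member of the set
# {S, y, s, t, e, m}; trimming stops at the first character outside the set.
trimmed_titles = {
    title: conn.execute("SELECT ltrim(?, 'Systems')", (title,)).fetchone()[0]
    for title in titles
}

for title, trimmed in trimmed_titles.items():
    print(repr(title), "->", repr(trimmed))
```

In this sketch the title Systems Analysis keeps its blank space, because a blank is not in the set, and Simulation Modeling loses only its first letter, because the letter i is not in the set.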
TRIM Example 7
SELECT LTRIM ('xxxXxxLAST WORD', 'yX') FROM DUAL;
Result:
Trimmed Result
----------------------
Database Management
Linear Programmin
Simulation Modelin
Systems Analysis
Principles of IS
Economics For Managers
Programming in C++
Fundamentals of SQL
Data Modelin
9 rows selected.
TRIM Example 9 illustrates how the RTRIM function can be used by Oracle to trim
(i.e., remove) unwanted blank spaces at the end of a CHAR data type. Column 3 of the
result in TRIM Example 9 confirms that the TEXTBOOK.PUBLISHER column in the
TEXTBOOK table is defined as a CHAR(13) data type, while column 5 verifies that
the RTRIM function used in column 4 removed all trailing blank spaces from the end of all
not-null publishers.
TRIM Example 9
SELECT TEXTBOOK.TITLE, TEXTBOOK.PUBLISHER,
LENGTH(TEXTBOOK.PUBLISHER) "Length Pub",
RTRIM(TEXTBOOK.PUBLISHER) "Trimmed Pub",
LENGTH(RTRIM(TEXTBOOK.PUBLISHER)) "Trimmed Length"
FROM TEXTBOOK;
Result:
9 rows selected.
The function searches char, replacing each occurrence of a character found in
from_string with the corresponding character from to_string. Characters that are in char but
not in from_string are left untouched, whereas characters in from_string but not in
to_string are deleted. For example, the following TRANSLATE function could be used
to extract the identifier of the department from each course number.
SELECT COURSE#, TRANSLATE (COURSE#, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
'ABCDEFGHIJKLMNOPQRSTUVWXYZ') "Department"
FROM COURSE;
Result:
COURSE# Department
--------- ----------
05ARCH101 ARCH
15ECON112 ECON
18ACCT801 ACCT
18ECON123 ECON
20ECES212 ECES
22IS270 IS
22IS330 IS
22IS430 IS
22IS832 IS
22QA375 QA
22QA411 QA
22QA888 QA
12 rows selected.
None of the characters in COURSE.COURSE# are left untouched because each character
is part of from_string. However, since the characters ‘0123456789’ in from_string have
no corresponding characters in to_string ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ’, the digits in
COURSE.COURSE# are not returned by the TRANSLATE function.
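The character-by-character rule just described can be sketched in a few lines of Python. The function translate below is a hypothetical emulation written for this illustration, not Oracle's implementation; it assumes that the first occurrence of a character in from_string decides its mapping, and it does not model Oracle's handling of null arguments.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

def translate(char, from_string, to_string):
    """Emulate the replace-or-delete rule of a TRANSLATE-style function."""
    table = {}
    for index, ch in enumerate(from_string):
        if ord(ch) in table:
            continue  # assumed: the first occurrence in from_string wins
        if index < len(to_string):
            table[ord(ch)] = to_string[index]  # replace with the counterpart
        else:
            table[ord(ch)] = None              # no counterpart: delete
    # Characters that are absent from from_string pass through unchanged.
    return char.translate(table)

course_numbers = ["05ARCH101", "15ECON112", "22IS270", "22QA375"]
departments = [translate(c, LETTERS + DIGITS, LETTERS) for c in course_numbers]

print(departments)
```

Each letter is mapped to itself and each digit is deleted, which leaves only the department identifier embedded in the course number.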
The position argument is used to specify the starting position for the search in source,
and occurrence makes it possible for a specific occurrence to be found. If position is
negative, the search begins from the end of the string.
The SELECT statement in INSTR Example 1 locates the character position of the
second occurrence of the character ‘S’ in the character string ‘MISSISSIPPI’ beginning
at character position 5. When the value of occurrence is changed to a 1 (see INSTR
Example 2), observe that a different ‘S’ is located.
INSTR Example 1
SELECT INSTR ('MISSISSIPPI','S',5,2) FROM DUAL;
Result: 7
INSTR Example 2
SELECT INSTR ('MISSISSIPPI','S',5,1) FROM DUAL;
Result: 6
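The roles of position and occurrence in INSTR Examples 1 and 2 can be traced with a short Python function. The function instr below is a hypothetical emulation for positive values of position only; it is a sketch of the search rule as described in this section and makes no claim about how Oracle handles negative positions or overlapping matches.

```python
def instr(source, target, position=1, occurrence=1):
    """Return the 1-based position of the requested occurrence, or 0 if none."""
    if position < 1 or occurrence < 1:
        raise ValueError("this sketch supports positive arguments only")
    start = position - 1          # convert to Python's 0-based indexing
    found = -1
    for _ in range(occurrence):
        found = source.find(target, start)
        if found == -1:
            return 0              # fewer occurrences than requested
        start = found + 1         # resume just past the start of this match
    return found + 1              # convert back to a 1-based position

instr_example_1 = instr("MISSISSIPPI", "S", 5, 2)
instr_example_2 = instr("MISSISSIPPI", "S", 5, 1)
instr_no_match = instr("Database Management", "ing")

print(instr_example_1, instr_example_2, instr_no_match)
```

Starting at position 5 skips the two occurrences of S at positions 3 and 4, so the first S found is at position 6 and the second is at position 7.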
The SELECT statement in INSTR Example 3 locates all textbooks that contain the
character string ‘ing’ somewhere in the title. Since both position and occurrence are
omitted, their values are assumed to be equal to 1.
INSTR Example 3
SELECT TEXTBOOK.TITLE, INSTR(TEXTBOOK.TITLE, 'ing') "Position of ing"
FROM TEXTBOOK
WHERE INSTR(TEXTBOOK.TITLE, 'ing') > 0;
Result:
4 rows selected.
Observe that the value of the INSTR function returned for all other titles is zero.
Result:
3 rows selected.
The WHERE clause shown on lines 7 and 8 governs the titles displayed when the query
is executed. The INSTR(TEXTBOOK.TITLE, ' ') function on line 8 is evaluated first and returns
the character position of the first blank space character in the title of each textbook (i.e., it is
responsible for skipping over the first word of the title). By adding 1 to the value returned by
the INSTR function, the character position of the first character in the second word of the
title is obtained. The SUBSTR(TEXTBOOK.TITLE, INSTR(TEXTBOOK.TITLE, ' ')+1) function on
lines 7 and 8 is evaluated next. Since a value of n is not provided, all remaining characters
beginning with the second word of the title are selected. Finally, the outer INSTR function on
line 7 looks for the first occurrence of the character string ‘ing’ in the second or remaining
words in the title. The values displayed in columns 2 and 3 indicate the values returned by
the INSTR function when asked to find the character position of the character string ‘ing’
starting with the second word of the title (column 2) and when asked to find the character
position of the character string ‘ing’ starting with the first word of the title (column 3).
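The nesting of INSTR inside SUBSTR inside INSTR can be followed step by step on the nine textbook titles. The sketch below uses the instr and substr functions of SQLite through Python's sqlite3 module. It is a query of the kind described above, written for illustration; its layout, its column aliases, and its ORDER BY clause are assumptions and do not reproduce the line numbering referred to in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEXTBOOK (TITLE TEXT)")
conn.executemany(
    "INSERT INTO TEXTBOOK VALUES (?)",
    [(title,) for title in [
        "Database Management", "Linear Programming", "Simulation Modeling",
        "Systems Analysis", "Principles of IS", "Economics For Managers",
        "Programming in C++", "Fundamentals of SQL", "Data Modeling",
    ]],
)

# instr(TITLE, ' ') + 1     -> position of the first character of the second word
# substr(TITLE, that value) -> the second and all remaining words
# outer instr(..., 'ing')   -> position of 'ing' within those remaining words
later_word_matches = conn.execute("""
    SELECT TITLE,
           instr(substr(TITLE, instr(TITLE, ' ') + 1), 'ing') AS later_words,
           instr(TITLE, 'ing') AS whole_title
    FROM TEXTBOOK
    WHERE instr(substr(TITLE, instr(TITLE, ' ') + 1), 'ing') > 0
    ORDER BY TITLE
""").fetchall()

for row in later_word_matches:
    print(row)
```

The title Programming in C++ contains the string ing but is not selected here, because the match occurs in the first word, which the inner functions skip.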
16 rows selected.
Result:
16 rows selected
Simple CASE Expressions. Simple CASE expressions use expressions to determine the
returned value and have the following syntax:
CASE search_expression
WHEN expression1 THEN result1
WHEN expression2 THEN result2
…
WHEN expressionN THEN resultN
ELSE default_result
END
where: search_expression is the expression being evaluated
expression1, expression2, …, expressionN are the expressions to be
evaluated against search_expression
result1, result2, …, resultN are the returned results (one for each possible
expression)
default_result is the default result returned when no matching expression is
found
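A simple CASE expression following this syntax might look like the sketch below. The quarter codes and their names are hypothetical values supplied inline through DUAL; they are not taken from the SECTION table.

```sql
-- Hypothetical quarter codes; 'X' has no matching WHEN branch.
WITH sample_sections AS (
  SELECT 'A' AS quarter_code FROM DUAL UNION ALL
  SELECT 'W' FROM DUAL UNION ALL
  SELECT 'X' FROM DUAL
)
SELECT quarter_code,
       CASE quarter_code            -- search_expression
         WHEN 'A' THEN 'Autumn'     -- expression1 / result1
         WHEN 'W' THEN 'Winter'     -- expression2 / result2
         ELSE 'Unknown'             -- default_result
       END AS quarter_name
FROM sample_sections;
```

The row with code 'X' matches none of the WHEN expressions and therefore receives the default result.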
Result:
16 rows selected.
Result:
16 rows selected.
You can also use comparison operators other than an equals sign in a searched CASE
expression.
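As an illustration of comparison operators in a searched CASE expression, the following sketch classifies salaries into bands. The identifiers, salary values, and band boundaries are hypothetical and are supplied inline through DUAL.

```sql
-- Hypothetical salaries; the band boundaries are illustrative only.
WITH sample_salaries AS (
  SELECT 'P1' AS emp_id, 45000 AS salary FROM DUAL UNION ALL
  SELECT 'P2', 82000 FROM DUAL UNION ALL
  SELECT 'P3', 120000 FROM DUAL
)
SELECT emp_id,
       salary,
       CASE
         WHEN salary < 50000  THEN 'Low'     -- conditions are tested in order
         WHEN salary < 100000 THEN 'Medium'  -- reached only if salary >= 50000
         ELSE 'High'
       END AS salary_band
FROM sample_salaries;
```

Because the WHEN conditions are tested from top to bottom, the second condition does not need to repeat the lower bound.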
in all quarters during which sections are offered? For the convenience of the reader, the
content of the COURSE and SECTION tables follows:
COURSE Table
The following SQL SELECT statement can also be used to display the course(s)
offered during all quarters. The statement is evaluated in three steps. First, the
number of distinct quarters in the SECTION table is determined by the subquery:
(SELECT DISTINCT(COUNT(DISTINCT(SUBSTR(SECTION.SECTION#,4,1)))) FROM SECTION).
Second, the COURSE and SECTION tables are joined on their course number
attributes, which share the same domain. This join yields a total of 11 rows. Third, the rows
resulting from the join are logically grouped by the combination of
COURSE.COURSE# and COURSE.NAME, with the HAVING clause used to identify the subset
of groups we want to consider. Since only one course (course number 22QA375) is
associated with all four quarters, only one course is displayed.
SQL SELECT Statement:
SELECT DISTINCT COURSE.COURSE#, COURSE.NAME
FROM COURSE JOIN SECTION
ON COURSE.COURSE# = SECTION.COURSE#
GROUP BY COURSE.COURSE#, COURSE.NAME
HAVING COUNT(*) = (SELECT DISTINCT(COUNT(DISTINCT(SUBSTR
(SECTION.SECTION#,4,1)))) FROM SECTION);
Result:
COURSE# NAME
------- -------------------
22QA375 Operations Research
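Comparing COUNT(*) with the number of distinct quarters works only when no course has more than one section in the same quarter. A possible variation, sketched below on hypothetical inline data rather than the COURSE and SECTION tables, counts distinct quarters per course instead.

```sql
-- Hypothetical data: course C1 has two sections in quarter 'A'.
WITH section_sample AS (
  SELECT 'C1' AS course_no, 'A' AS quarter FROM DUAL UNION ALL
  SELECT 'C1', 'A' FROM DUAL UNION ALL
  SELECT 'C1', 'W' FROM DUAL UNION ALL
  SELECT 'C2', 'A' FROM DUAL
)
SELECT course_no
FROM section_sample
GROUP BY course_no
-- Compare distinct quarters per course with distinct quarters overall.
HAVING COUNT(DISTINCT quarter) =
       (SELECT COUNT(DISTINCT quarter) FROM section_sample);
```

Here C1 is returned because it is offered in both quarters present in the data; a COUNT(*) comparison would have compared 3 with 2 and missed it.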
The remainder of the examples in this section makes use of the Oracle default format
for representing a date.4
Although referenced as a non-numeric field, a date is actually stored internally in a
numeric format that includes the century, year, month, day, hour, minute, and second.
While dates appear as non-numeric fields when displayed, calculations can be performed
with dates because they are stored internally as numeric data in accordance with the
Julian calendar. A Julian date represents the number of days that have elapsed between a
specified date and January 1, 4712 B.C.
4
NLS is an acronym for “National Language Support.”
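The link between Julian day numbers and date arithmetic can be checked with a short query. This is a sketch using two arbitrary, hypothetical dates written as ANSI date literals; the 'J' format element is the one listed in Table 13.2.

```sql
-- Subtracting Julian day numbers and subtracting the dates directly
-- should give the same number of days (11 for these two dates).
SELECT TO_NUMBER(TO_CHAR(DATE '2014-08-12', 'J'))
     - TO_NUMBER(TO_CHAR(DATE '2014-08-01', 'J')) AS diff_via_julian_numbers,
       DATE '2014-08-12' - DATE '2014-08-01'      AS diff_via_subtraction
FROM DUAL;
```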
The CURRENT_DATE function is part of the SQL:2003 standard and returns
the current date and time. For example, the following SELECT statement calculates the
tenure in days and the tenure in years for each professor in department 3.
SQL SELECT Statement:
SELECT PROFESSOR.NAME, CURRENT_DATE - PROFESSOR.DATEHIRED "Tenure in Days",
TRUNC((CURRENT_DATE - PROFESSOR.DATEHIRED)/365.25,0) "Tenure in Years"
FROM PROFESSOR
WHERE PROFESSOR.DCODE = 3;
Result:
5 rows selected
5
The TO_CHAR function can also be used to convert a number to a formatted character string.
Format masks used in conjunction with numbers appear in Table 13.3.
MONTH, Month, or month | Name of the month spelled out (padded with blank spaces to a total width of nine characters); case follows format. | JULY, July, or july (5 spaces follow each representation of July)
DAY, Day, or day | Name of the day of the week spelled out (padded with blank spaces to a length of nine characters) | MONDAY, Monday, or monday (3 spaces follow each representation of Monday)
YEAR, Year, or year | Spells out the year; case follows format. | TWO THOUSAND FOURTEEN
J | Julian date; January 1, 4712 B.C. is day 1. | July 27, 2014 is Julian date 2456865
TABLE 13.2 Selected date and time format elements used with the TO_CHAR and TO_DATE
functions
TABLE 13.3 Selected number format elements used with the TO_CHAR function
The following query illustrates the use of the TO_CHAR function to display the date
hired and salary of each professor in department 3, using the default format and an
alternative format:
SQL Select Statement:
Result:
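The statement and its result are not reproduced here. As a rough sketch of the kind of formatting involved, the query below applies TO_CHAR to a hypothetical date and a hypothetical salary supplied through DUAL; the format masks chosen are illustrative and are not necessarily those used in the original example.

```sql
-- Hypothetical date and salary values; aliases are illustrative.
SELECT TO_CHAR(DATE '2014-07-27', 'Month DD, YYYY') AS long_date,
       TO_CHAR(DATE '2014-07-27', 'DAY')            AS day_name,
       TO_CHAR(DATE '2014-07-27', 'YYYY-MM-DD')     AS iso_style_date,
       TO_CHAR(68000, '$999,999.00')                AS formatted_salary
FROM DUAL;
```

The first two columns use the Month and DAY elements described in Table 13.2, so the month and day names are padded with blanks to nine characters.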
When a date is inserted in a table without a time, Oracle assumes a default time of 12:00 AM (midnight).
Should it be necessary to associate a time other than 12:00 AM with a date, the TO_DATE
function can be used. For example, suppose we wish to insert the date and time of admission
for each new patient into the PATIENT table. The INSERT statement that appears
below illustrates how this could be done. Two SELECT statements precede the INSERT
statement and two follow it. The first two SELECT statements display the name and date of
admission of each patient prior to the insertion of the new patient. Note that the first
SELECT statement displays the date of admission using the default date format, while the
second displays the date of admission using a format mask that includes the time portion of
the date of admission. The use of the TO_DATE function in the INSERT statement for
patient Zhaoping Zhang allows both the date and time of her admission to be recorded.
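The effect of supplying or omitting the time portion can be seen without touching the PATIENT table. The sketch below uses a hypothetical admission date and time and assumes an English NLS date language so that the month abbreviation AUG is recognized.

```sql
-- Hypothetical admission date; the second column omits the time,
-- so Oracle stores midnight (12:00 AM) for it.
SELECT TO_CHAR(TO_DATE('08-AUG-2014 14:20', 'DD-MON-YYYY HH24:MI'),
               'DD-MON-YYYY HH:MI AM') AS with_time,
       TO_CHAR(TO_DATE('08-AUG-2014', 'DD-MON-YYYY'),
               'DD-MON-YYYY HH:MI AM') AS default_midnight
FROM DUAL;
```

The first column displays 08-AUG-2014 02:20 PM and the second 08-AUG-2014 12:00 AM.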
3 rows selected.
3 rows selected.
1 row created.
4 rows selected.
4 rows selected.
Suppose patient Zhaoping Zhang is discharged on August 12 at 3:35 PM (i.e., the value
returned by the CURRENT_DATE function is August 12 at 3:35 PM). The following SELECT
statement calculates her length of stay in days:
SELECT PATIENT.PAT_NAME, CURRENT_DATE - PATIENT.PAT_ADMIT_DT "Length of Stay"
FROM PATIENT
WHERE PATIENT.PAT_P#A = 'ZZ' AND PATIENT.PAT_P#N = '06912';
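Subtracting one date from another yields a number of days that may include a fractional part. The sketch below shows this with a hypothetical admission time (the year and the admission date are assumptions; only the discharge day and time come from the text) and converts the result to whole days and to hours.

```sql
-- Hypothetical admission and discharge timestamps.
WITH stay AS (
  SELECT TO_DATE('08-AUG-2014 14:20', 'DD-MON-YYYY HH24:MI') AS admitted,
         TO_DATE('12-AUG-2014 15:35', 'DD-MON-YYYY HH24:MI') AS discharged
  FROM DUAL
)
SELECT discharged - admitted                  AS stay_in_days,   -- 4 days plus a fraction
       TRUNC(discharged - admitted)           AS whole_days,     -- 4
       ROUND((discharged - admitted) * 24, 2) AS stay_in_hours   -- 97.25
FROM stay;
```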
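[Figure 13.1: organization chart of AZ Consultants. Root: Zhou (AZ02), Director. Third-level nodes recoverable from the figure: Hoffpauir (DH01), Programmer; Kuncheria (GK01), Developer; Li (ZL01), Sr. Analyst; Ryan (MR01), Sr. Analyst; Chan (JC01), Sr. Analyst; Smith (MS01), Sr. Analyst; David (JD01), Programmer; Mai (LM01), Developer.]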
In other words, Alicia Zhou, the company founder, serves as its director, with James
Ryan, Michael Parks, and Ron Mayfield serving as managers of these three departments:
Finance and Accounting Excellence (FAE), Risk and Compliance (RC), and Internal Audit
and Financial Controls (IAFC). The CONSULTANT table contains data on each of the
company’s 14 consultants:
SELECT * FROM CONSULTANT;
14 rows selected.
The REPTS_TO column refers back to the ID column and thus reflects the supervisor
of the employee (if any).
Data organized in a hierarchy are said to form a tree, the elements of which are called
nodes. Four types of nodes make up a tree:
• Root node—The node at the top of the tree. For example, in Figure 13.1, the
root node is Zhou, the Director.
• Parent node—A node that has one or more nodes below it. For example, in
Figure 13.1, James Ryan is the parent of Hoffpauir, Kuncheria, Li, and Michael Ryan. Note
that two different employees of AZ Consultants have the name Ryan.
• Child node—A node with one parent node above it. For example, in
Figure 13.1, Zhang’s parent is Smith.
• Leaf node—A node with no children. Hoffpauir and David are two of the eight
leaf nodes.
where:
• LEVEL is a pseudo-column that tells you how far into the tree you are.
• start_condition specifies where to start the hierarchical query from.
• prior_condition specifies the relationship between the parent and child rows;
the relationship between the parent and the child is established by placing
the PRIOR operator before the parent column.
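The three elements described above (LEVEL, START WITH, and CONNECT BY PRIOR) can be seen together in one short query. The sketch below is self-contained: instead of the CONSULTANT table it uses a hypothetical inline subset of six consultants whose IDs, names, and reporting lines are taken from the output shown elsewhere in this section.

```sql
-- Inline subset standing in for the CONSULTANT table.
WITH consultant_subset AS (
  SELECT 'AZ02' AS id, 'Zhou, Alicia' AS name,
         CAST(NULL AS VARCHAR2(4)) AS repts_to FROM DUAL UNION ALL
  SELECT 'MP01', 'Parks, Michael', 'AZ02' FROM DUAL UNION ALL
  SELECT 'JC01', 'Chan, Jackie',   'MP01' FROM DUAL UNION ALL
  SELECT 'NN01', 'Nguyen, Nicole', 'JC01' FROM DUAL UNION ALL
  SELECT 'MS01', 'Smith, Maranda', 'MP01' FROM DUAL UNION ALL
  SELECT 'AZ01', 'Zhang, Anthony', 'MS01' FROM DUAL
)
SELECT LEVEL, id, name, repts_to
FROM consultant_subset
START WITH id = 'AZ02'               -- start_condition: the root row
CONNECT BY PRIOR id = repts_to;      -- prior_condition: parent ID = child REPTS_TO
```

With this subset, Zhou is at level 1, Parks at level 2, Chan and Smith at level 3, and Nguyen and Zhang at level 4.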
To find the children of a parent, SQL evaluates the expression qualified by the PRIOR
operator for the parent row. Rows for which the condition is true are the children of the
parent. Using the CONSULTANT table, the following CONNECT BY clause makes it possible
to see the children of a parent:
CONNECT BY PRIOR ID = REPTS_TO
The ID column is the parent, and the REPTS_TO column is the child. The PRIOR
operator is placed in front of the parent column ID. As illustrated in the queries that follow
in Examples 1 and 2, depending on which column you prefix with the PRIOR operator, the
direction of the hierarchy changes.
The START WITH clause determines the root rows of the hierarchy. The records for
which the START WITH clause is true are selected first. All children are retrieved from
these records going forward.
To find the children of a specific parent—for example, Michael Parks (or ID = 'MP01')
in Example 1—SQL evaluates the expression qualified by the PRIOR operator for the parent
row. Rows for which the condition is true are the children of the parent.
Example 1
SELECT * FROM CONSULTANT
START WITH ID = 'MP01'
CONNECT BY PRIOR ID = REPTS_TO;
5 rows selected.
The START WITH clause determines the root rows6 of the hierarchy. The records for
which the START WITH clause is true are first selected. All children are retrieved from
these records going forward.
• In this case, after displaying Michael Parks in the first row, the first consultant
(i.e., child) reporting to Parks is displayed (Jackie Chan, employee id
JC01) in row 2. Note that the content of the REPTS_TO column in row 2 (the
child) is equal to the content of the ID column in row 1 (the parent).
• After displaying Jackie Chan in row 2, the first consultant reporting to Chan
is displayed (Nicole Nguyen, employee id NN01) in row 3.
• Since no consultant reports to Nicole Nguyen, the second consultant reporting
to Michael Parks is displayed next (Maranda Smith, employee id MS01) in
row 4.
• After displaying Maranda Smith, the first consultant reporting to Smith is
displayed in row 5 (Anthony Zhang, employee id AZ01).
• Since no consultant reports to Anthony Zhang and no more consultants
report to Maranda Smith or to Michael Parks,
the execution of the query terminates with the fifth row.
In summary, note that Michael Parks is displayed first, then one of his senior analysts
is displayed, followed by all (in this case, just the one) of the senior analyst’s subordinates,
followed by the other senior analyst, followed by all (in this case, just one) of her
subordinates. Without the START WITH clause, SQL uses all rows in the table as root rows. It is
left as an exercise to the reader to verify that such a query will generate 40 rows of output.
A hierarchical query that omits the CONNECT BY clause generates a syntax error.
As mentioned previously, depending on which column you prefix with the PRIOR operator,
the direction of the hierarchy changes. For example, consider the following query:
Example 2
SELECT * FROM CONSULTANT
START WITH ID = 'NN01'
CONNECT BY ID = PRIOR REPTS_TO;
4 rows selected.
Rather than displaying all employees under Nicole Nguyen in the hierarchy, all employees
above Nicole Nguyen in the hierarchy (i.e., all employees that Nicole Nguyen ultimately
reports to) are displayed. In other words, this query starts at the child and traverses upward.
6
In this example, there is only one root row.
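The bottom-up traversal can be reproduced on a small scale. The sketch below uses a hypothetical inline subset (four rows whose IDs and reporting lines appear in this section) in place of the CONSULTANT table, and adds the LEVEL pseudo-column to show how far each row is from the starting row.

```sql
-- Inline subset standing in for the CONSULTANT table.
WITH consultant_subset AS (
  SELECT 'AZ02' AS id, 'Zhou, Alicia' AS name,
         CAST(NULL AS VARCHAR2(4)) AS repts_to FROM DUAL UNION ALL
  SELECT 'MP01', 'Parks, Michael', 'AZ02' FROM DUAL UNION ALL
  SELECT 'JC01', 'Chan, Jackie',   'MP01' FROM DUAL UNION ALL
  SELECT 'NN01', 'Nguyen, Nicole', 'JC01' FROM DUAL
)
SELECT LEVEL, id, name, repts_to
FROM consultant_subset
START WITH id = 'NN01'
-- PRIOR is on REPTS_TO, so each step moves from a row to its supervisor.
CONNECT BY id = PRIOR repts_to;
```

Four rows are returned: Nguyen at level 1, then Chan, Parks, and Zhou at levels 2, 3, and 4. Note that LEVEL counts steps from the starting row, not depth in the organization chart.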
14 rows selected.
Observe how the use of the LEVEL pseudo-column traverses the hierarchy from top to
bottom, left to right.
The COUNT() function can be used with the LEVEL pseudo-column to obtain the
number of levels in the “tree.”
Example 4
SELECT COUNT (DISTINCT LEVEL)
FROM CONSULTANT
START WITH ID = 'AZ02'
CONNECT BY PRIOR ID = REPTS_TO;
COUNT(DISTINCTLEVEL)
--------------------
4
1 row selected.
The LEVEL pseudo-column makes it easier to understand the results of a query without
a START WITH clause.
14 rows selected.
Note how use of the LPAD function allows you to visualize the hierarchy by indenting
it with spaces. The number of padding characters is calculated from the LEVEL pseudo-column.
Of course, it is not necessary to display the value of the LEVEL pseudo-column.
7
The LPAD(x, width [, pad_string]) function pads x with spaces to the left to bring the total length
of the string up to width characters. If a string is supplied in pad_string, this string is repeated to the
left to fill up the padded space. The resulting padded string is then returned.
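The behavior of LPAD, and its use for indenting a hierarchy, can be sketched as follows. The first statement uses literal values only; the second uses a hypothetical three-row inline subset in place of the CONSULTANT table.

```sql
-- LPAD on literal values.
SELECT LPAD('abc', 6)      AS padded_with_spaces,  -- '   abc'
       LPAD('abc', 6, '*') AS padded_with_stars,   -- '***abc'
       LPAD('abc', 2)      AS truncated            -- 'ab' (width shorter than x)
FROM DUAL;

-- Indenting each name by two spaces per level below the root.
WITH consultant_subset AS (
  SELECT 'MP01' AS id, 'Parks, Michael' AS name,
         CAST(NULL AS VARCHAR2(4)) AS repts_to FROM DUAL UNION ALL
  SELECT 'JC01', 'Chan, Jackie',   'MP01' FROM DUAL UNION ALL
  SELECT 'NN01', 'Nguyen, Nicole', 'JC01' FROM DUAL
)
SELECT LEVEL,
       LPAD(' ', 2 * (LEVEL - 1)) || name AS "Consultant Name"
FROM consultant_subset
START WITH id = 'MP01'
CONNECT BY PRIOR id = repts_to;
```

At level 1 the requested width is 0, so LPAD returns a null string and the name is displayed without indentation.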
Example 6
SELECT TITLE, LPAD(' ', 2*(LEVEL-1)) || NAME "Consultant Name"
FROM CONSULTANT
START WITH ID IN
(SELECT ID FROM CONSULTANT
WHERE TITLE = 'Manager')
CONNECT BY PRIOR ID = REPTS_TO;
13 rows selected.
A WHERE clause can also be used to eliminate a particular node from a query. For
example, the query in Example 7 eliminates Michael Parks, and with him the consultants
who report to him, from the previous query.
Example 7
SELECT TITLE, LPAD(' ', 2*(LEVEL-1)) || NAME "Consultant Name"
FROM CONSULTANT
START WITH ID IN
(SELECT ID FROM CONSULTANT
WHERE TITLE = 'Manager'
AND NAME NOT LIKE '%Parks%')
CONNECT BY PRIOR ID = REPTS_TO;
8 rows selected.
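Where a filter is placed in a hierarchical query determines what is eliminated. The sketch below, on a hypothetical four-row inline subset rather than the CONSULTANT table, contrasts a WHERE clause on the outer query with an extra condition in the CONNECT BY clause; it is an illustration of Oracle's evaluation order, not a restatement of Example 7.

```sql
-- (a) WHERE on the outer query removes only the named row;
--     Nguyen, who reports to Chan, is still returned (3 rows).
WITH consultant_subset AS (
  SELECT 'MP01' AS id, 'Parks, Michael' AS name,
         CAST(NULL AS VARCHAR2(4)) AS repts_to FROM DUAL UNION ALL
  SELECT 'JC01', 'Chan, Jackie',   'MP01' FROM DUAL UNION ALL
  SELECT 'NN01', 'Nguyen, Nicole', 'JC01' FROM DUAL UNION ALL
  SELECT 'MS01', 'Smith, Maranda', 'MP01' FROM DUAL
)
SELECT LEVEL, id, name
FROM consultant_subset
WHERE id <> 'JC01'
START WITH id = 'MP01'
CONNECT BY PRIOR id = repts_to;

-- (b) A condition in CONNECT BY stops the traversal at Chan,
--     so Chan and Nguyen are both omitted (2 rows).
WITH consultant_subset AS (
  SELECT 'MP01' AS id, 'Parks, Michael' AS name,
         CAST(NULL AS VARCHAR2(4)) AS repts_to FROM DUAL UNION ALL
  SELECT 'JC01', 'Chan, Jackie',   'MP01' FROM DUAL UNION ALL
  SELECT 'NN01', 'Nguyen, Nicole', 'JC01' FROM DUAL UNION ALL
  SELECT 'MS01', 'Smith, Maranda', 'MP01' FROM DUAL
)
SELECT LEVEL, id, name
FROM consultant_subset
START WITH id = 'MP01'
CONNECT BY PRIOR id = repts_to AND id <> 'JC01';
```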
ID NAME Path
------- -------------- -------------------------------------------------
AZ02 Zhou, Alicia /Zhou, Alicia
JR01 Ryan, James R /Zhou, Alicia/Ryan, James R.
DH01 Hoffpauir, Deb /Zhou, Alicia/Ryan, James R./Hoffpauir, Deb
GK01 Kuncheria, Ginu /Zhou, Alicia/Ryan, James R./Kuncheria, Ginu
MR01 Ryan, Michael /Zhou, Alicia/Ryan, James R./Ryan, Michael
ZL01 Li, ZP /Zhou, Alicia/Ryan, James R./Li, ZP
MP01 Parks, Michael /Zhou, Alicia/Parks, Michael
JC01 Chan, Jackie /Zhou, Alicia/Parks, Michael/Chan, Jackie
NN01 Nguyen, Nicole /Zhou, Alicia/Parks, Michael/Chan, Jackie/Nguyen, Nicole
MS01 Smith, Maranda /Zhou, Alicia/Parks, Michael/Smith, Maranda
AZ01 Zhang, Anthony /Zhou, Alicia/Parks, Michael/Smith, Maranda/Zhang, Anthony
RM01 Mayfield, Ron /Zhou, Alicia/Mayfield, Ron
JD01 David, Jason /Zhou, Alicia/Mayfield, Ron/David, Jason
LM01 Mai, Ly H /Zhou, Alicia/Mayfield, Ron/Mai, Ly H.
14 rows selected.
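The statement that produced the listing above is not reproduced here. Output of this shape is what Oracle's SYS_CONNECT_BY_PATH function generates, so a query along the following lines is a plausible sketch; it uses a hypothetical four-row inline subset in place of the CONSULTANT table.

```sql
-- Inline subset standing in for the CONSULTANT table.
WITH consultant_subset AS (
  SELECT 'AZ02' AS id, 'Zhou, Alicia' AS name,
         CAST(NULL AS VARCHAR2(4)) AS repts_to FROM DUAL UNION ALL
  SELECT 'MP01', 'Parks, Michael', 'AZ02' FROM DUAL UNION ALL
  SELECT 'JC01', 'Chan, Jackie',   'MP01' FROM DUAL UNION ALL
  SELECT 'NN01', 'Nguyen, Nicole', 'JC01' FROM DUAL
)
SELECT id,
       name,
       -- Concatenates the NAME values from the root down to the current row,
       -- each preceded by the separator '/'.
       SYS_CONNECT_BY_PATH(name, '/') AS "Path"
FROM consultant_subset
START WITH id = 'AZ02'
CONNECT BY PRIOR id = repts_to;
```

The separator must not occur within the column values themselves; a slash is safe here because none of the names contains one.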
It is left as an exercise for the reader to explain the output displayed when the PRIOR
operator is placed in front of the REPTS_TO column.
10 rows selected.
Note that the START WITH and CONNECT BY clauses in Example 9 are simply part of
the join condition.
Example 9
SELECT NAME, TITLE, DID, AS_C_ID, AS_P_ID, AS_HOURS
FROM CONSULTANT JOIN ASSIGNMENT
ON CONSULTANT.ID = AS_C_ID
START WITH ID = 'MP01'
CONNECT BY PRIOR ID = REPTS_TO;
5 rows selected.
14 rows selected.
Note how the foreign key constraint has not been included in the CREATE TABLE
statement. Indeed, had it been, the first row could not have been inserted into the
CONSULTANT table because the consultant to whom Ginu Kuncheria reports did not exist
in the table at this time.
Observe how the hierarchical structure is reflected in Example 10 despite the fact that
there is no constraint in CONSULTANT that requires a consultant to report to an existing
consultant.
Example 10
SELECT * FROM CONSULTANT
START WITH ID = 'AZ02'
CONNECT BY PRIOR ID = REPTS_TO;
14 rows selected.
Note how it is possible to insert a new consultant (King Nelson) who does not report to
an existing consultant (i.e., there is no employee id ZZ02).
INSERT INTO CONSULTANT VALUES ('KN01', 'Nelson, King', 'M', 'Developer', 'FAE',
100000, '23-JUL-13', 'ZZ02');
1 row created.
However, since the CONNECT BY clause is not satisfied in the following hierarchical
query (i.e., employee KN01 does not report to anyone in the hierarchy), consultant KN01
does not appear in the output generated by the SELECT statement shown in Example 11.
Example 11
SELECT * FROM CONSULTANT
START WITH ID = 'AZ02'
CONNECT BY PRIOR ID = REPTS_TO;
Let’s assign consultant KN01 to one of the existing consultants (consultant JR01):
UPDATE CONSULTANT SET REPTS_TO = 'JR01' WHERE ID = 'KN01';
1 row updated.
Since the CONNECT BY clause is now satisfied, consultant KN01 now appears in the
highlighted row:
Example 12
SELECT * FROM CONSULTANT
START WITH ID = 'AZ02'
CONNECT BY PRIOR ID = REPTS_TO;
Now, let’s add the constraint that requires each consultant to report to an existing
consultant. Once this constraint has been added, new consultants can be safely added to
the hierarchical structure as long as they report to an existing consultant.
ALTER TABLE CONSULTANT ADD CONSTRAINT REPTS_TO_FK
FOREIGN KEY (REPTS_TO) REFERENCES CONSULTANT (ID);
Table altered.
INSERT INTO CONSULTANT VALUES ('JA01', 'Abbott, John', 'M', 'Programmer', 'FAE',
100000, '24-JUL-13', 'ZZ02');
INSERT INTO CONSULTANT VALUES ('JA01', 'Abbott, John', 'M', 'Programmer', 'FAE',
*
ERROR at line 1:
ORA-02291: integrity constraint (BUILDHIERARCHY.REPTS_TO_FK) violated - parent
key not found
8
The following CREATE TABLE statement created the DEPARTMENT table:
CREATE TABLE DEPARTMENT (DID VARCHAR2(5) PRIMARY KEY, DNAME VARCHAR2(40) NOT
NULL, DMGR VARCHAR2(5));
The query in Example 2 rewrites the query in Example 1 using the ROLLUP operator.
Notice how the additional row at the end contains the total salaries for the three
departments.
Example 2
SELECT DNAME "DEPARTMENT NAME", SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY ROLLUP(DNAME);
9
When using the ROLLUP and CUBE operators, the order in which the rows are displayed can vary
among different database products.
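The extra grand-total row produced by ROLLUP can be seen on a few rows of made-up data. In the sketch below the department codes FAE and RC are borrowed from the text, but the salary figures and the inline data set are hypothetical and replace the CONSULTANT and DEPARTMENT tables.

```sql
-- Hypothetical salaries for two departments.
WITH sample_pay AS (
  SELECT 'FAE' AS dept, 'F' AS gender, 100 AS salary FROM DUAL UNION ALL
  SELECT 'FAE', 'M', 200 FROM DUAL UNION ALL
  SELECT 'RC',  'F', 300 FROM DUAL
)
SELECT dept, SUM(salary) AS total_salary
FROM sample_pay
GROUP BY ROLLUP(dept);
-- One row per department (FAE 300, RC 300) plus one row with a null
-- department holding the grand total (600).
```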
10 rows selected.
Note how the result includes (a) the sum of all salaries in each department, (b) the
sum of the salaries of the consultants in each department for each gender, and (c) the
sum of the salaries of all consultants. As shown in Example 4, if a GROUP BY (DNAME,
GENDER) clause had simply been used, only the six sums associated with (b) would be
displayed.
Example 4
SELECT DNAME "DEPARTMENT NAME", GENDER, SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY (DNAME, GENDER);
6 rows selected.
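The difference between a plain GROUP BY and a two-column ROLLUP can be reproduced on a small scale. The sketch below uses hypothetical inline data (department codes from the text, made-up salaries) rather than the CONSULTANT and DEPARTMENT tables.

```sql
-- Hypothetical salaries for two departments and two genders.
WITH sample_pay AS (
  SELECT 'FAE' AS dept, 'F' AS gender, 100 AS salary FROM DUAL UNION ALL
  SELECT 'FAE', 'M', 200 FROM DUAL UNION ALL
  SELECT 'RC',  'F', 300 FROM DUAL
)
SELECT dept, gender, SUM(salary) AS total_salary
FROM sample_pay
GROUP BY ROLLUP(dept, gender);
-- Six rows: three (dept, gender) sums, two department subtotals
-- (FAE 300, RC 300), and the grand total (600).
-- GROUP BY dept, gender would return only the first three.
```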
The SELECT statement in Example 5 illustrates the totals generated when three
columns are passed to the ROLLUP operator. Since there are five different jobs across
the three departments and two genders, a total of 30 possible sums could be generated.
Another six possible sums could be generated, one for each of the six department name and
gender combinations. There would also be four additional sums (a sum for each department
name along with a sum of the salaries for all consultants). However, given the fact
that no department has more than one consultant of each gender for each of its job titles,
only 13 of the 30 possible job, department name, and gender combinations exist. When
added to the sums for the six department name and gender combinations, the three
departments, and the single grouping of all consultants, the query shown next produces a
total of 23 rows (13 + 6 + 3 + 1).
Example 5
SELECT DNAME "DEPARTMENT NAME", GENDER, TITLE, SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY ROLLUP(DNAME, GENDER, TITLE);
23 rows selected.
Example 6
SELECT DNAME "DEPARTMENT NAME", GENDER, SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY ROLLUP(GENDER, DNAME);
9 rows selected.
Any of the other aggregate functions can be used with ROLLUP (e.g., AVG(), COUNT(),
MAX(), MEDIAN(), MIN(), STDDEV(), VARIANCE()).
12 rows selected.
The query in Example 8 switches department name and gender so that gender is listed
before department name. This still results in CUBE returning separate sums of the salaries
for each department and for each gender.
Example 8
SELECT DNAME "DEPARTMENT NAME", GENDER, SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY CUBE(GENDER, DNAME);
12 rows selected.
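That CUBE produces subtotals for every combination of its columns, regardless of the order in which they are listed, can be checked on a small data set. The sketch below uses hypothetical inline data (department codes from the text, made-up salaries) in place of the CONSULTANT and DEPARTMENT tables.

```sql
-- Hypothetical salaries for two departments and two genders.
WITH sample_pay AS (
  SELECT 'FAE' AS dept, 'F' AS gender, 100 AS salary FROM DUAL UNION ALL
  SELECT 'FAE', 'M', 200 FROM DUAL UNION ALL
  SELECT 'RC',  'F', 300 FROM DUAL
)
SELECT dept, gender, SUM(salary) AS total_salary
FROM sample_pay
GROUP BY CUBE(gender, dept);
-- Eight rows: three (dept, gender) sums, two gender subtotals
-- (F 400, M 200), two department subtotals (FAE 300, RC 300),
-- and the grand total (600).
-- CUBE(dept, gender) yields the same eight sums.
```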
4 rows selected.
The GROUPING () function eliminates any ambiguities. Whenever you see a value of 1
in a column where the GROUPING () function is applied, it indicates the presence of what
is referred to as a “super aggregate row,” such as a subtotal or grand total row created with
the ROLLUP or CUBE operator.10
13.4.5.2 Using the DECODE () Function to Convert the Returned Value from
the GROUPING () Function
As illustrated in Example 10, the DECODE () function can be used with the GROUPING ()
function to add a label to what would otherwise be displayed as a null value for a column
(the department name column in this case).
Example 10
SELECT DECODE (GROUPING (DNAME), 1, 'All Departments', DNAME) "DEPARTMENT NAME",
SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY ROLLUP(DNAME);
4 rows selected.
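The way DECODE () turns the GROUPING () flag into a label can be sketched in Python. The decode function below is a rough, hypothetical analogue of Oracle's DECODE, not its implementation, and the department totals are those shown in this chapter's CUBE output. Each input row carries the department name (None on the super aggregate row), the flag that GROUPING (DNAME) would return, and the salary sum.

```python
def decode(value, *args):
    """Rough analogue of DECODE(value, search1, result1, ..., default):
    return the result paired with the first matching search value,
    otherwise the default (None when no default is supplied)."""
    pairs, default = args, None
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if value == search:
            return result
    return default


# (DNAME, GROUPING(DNAME), SUM(SALARY)) as ROLLUP(DNAME) would return them.
rollup_rows = [
    ("Risk and Compliance", 0, 264500),
    ("Finance and Accounting Excellence", 0, 395000),
    ("Internal Audit and Financial Controls", 0, 399500),
    (None, 1, 1059000),
]

labeled = [
    (decode(flag, 1, "All Departments", dname), total)
    for dname, flag, total in rollup_rows
]
```

Only the row whose flag is 1 receives the label; the other rows fall through to the default argument, which is the department name itself.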
The DECODE () and GROUPING () functions can also be used to display a meaningful
label for multiple column values. The SELECT statement in Example 11 replaces the null
values in a ROLLUP based on the department name and gender columns.
Example 11
SELECT DECODE(GROUPING(DNAME), 1, 'All Departments', DNAME) "DEPARTMENT NAME",
DECODE(GROUPING(GENDER), 1, 'Both Genders', GENDER) "GENDER", SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY ROLLUP(DNAME, GENDER);
10
The rows where GROUPING (DNAME) is equal to 0 exist because of the GROUP BY clause.
10 rows selected.
5 rows selected.
Note how the GROUPING SETS extension eliminates the highlighted rows in Example 13
that would appear had the GROUP BY CUBE (DNAME, GENDER) been used.
Example 13
SELECT DNAME "DEPARTMENT NAME", GENDER, SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY CUBE(DNAME, GENDER);
12 rows selected.
While the GROUPING SETS extension allows the subtotals for the department
name and gender columns to be returned, the total for all consultants is not returned.
The GROUPING_ID () function described below makes it possible to obtain this total.
The SELECT statement in Example 14 passes department name and gender to the
GROUPING_ID () function. Note that the output from the GROUPING_ID () function
shown in the GROUPING ID VALUE column agrees with the expected returned values
described earlier.
Example 14
SELECT DNAME "DEPARTMENT NAME", GENDER,
GROUPING(DNAME) "DNAME GROUP",
GROUPING(GENDER) "GENDER GROUP",
GROUPING_ID(DNAME,GENDER) "GROUPING ID VALUE",
SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY CUBE(DNAME, GENDER);
DEPARTMENT NAME                        GENDER  DNAME GROUP  GENDER GROUP  GROUPING ID VALUE  SUM(SALARY)
-------------------------------------  ------  -----------  ------------  -----------------  -----------
                                                         1             1                  3      1059000
                                       F                 1             0                  2       366500
                                       M                 1             0                  2       692500
Risk and Compliance                                      0             1                  1       264500
Risk and Compliance                    F                 0             0                  0        72000
Risk and Compliance                    M                 0             0                  0       192500
Finance and Accounting Excellence                        0             1                  1       395000
Finance and Accounting Excellence      F                 0             0                  0       180000
Finance and Accounting Excellence      M                 0             0                  0       215000
Internal Audit and Financial Controls                    0             1                  1       399500
Internal Audit and Financial Controls  F                 0             0                  0       114500
Internal Audit and Financial Controls  M                 0             0                  0       285000
12 rows selected.
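The GROUPING ID VALUE column in the output above can be reproduced by reading the two GROUPING () flags as binary digits, with the first argument as the most significant digit. The Python sketch below assumes that interpretation and uses the rows of the Example 14 output; the helper names are hypothetical, and the flag function infers the GROUPING () result from a None, which is only safe here because the underlying data is assumed to hold no genuine nulls.

```python
def grouping_id(*flags):
    """Combine GROUPING() flags into one number, treating them as binary
    digits with the leftmost flag as the most significant digit."""
    value = 0
    for flag in flags:
        value = value * 2 + flag
    return value


def flag(value):
    # Stand-in for GROUPING(): assumes no genuine NULLs in the base data.
    return 1 if value is None else 0


# (DNAME, GENDER, SUM(SALARY)) rows from the CUBE output; None marks a NULL.
CUBE_ROWS = [
    (None, None, 1059000),
    (None, "F", 366500),
    (None, "M", 692500),
    ("Risk and Compliance", None, 264500),
    ("Risk and Compliance", "F", 72000),
    ("Risk and Compliance", "M", 192500),
    ("Finance and Accounting Excellence", None, 395000),
    ("Finance and Accounting Excellence", "F", 180000),
    ("Finance and Accounting Excellence", "M", 215000),
    ("Internal Audit and Financial Controls", None, 399500),
    ("Internal Audit and Financial Controls", "F", 114500),
    ("Internal Audit and Financial Controls", "M", 285000),
]

ids = [grouping_id(flag(dname), flag(gender)) for dname, gender, _ in CUBE_ROWS]

# Keeping only rows with a nonzero value is one plausible way to retain
# the subtotal and grand total rows while dropping the detail rows.
subtotals = [row for row, gid in zip(CUBE_ROWS, ids) if gid > 0]
```

Because each combination of flags maps to a distinct number, a single comparison on the combined value can select any level of aggregation.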
6 rows selected.
The CUBE operator with the HAVING clause shown earlier considers only the
sum of the salaries for each department as a separate subtotal, the sum of the salaries
for each gender as a separate subtotal, and the sum of the salaries for all consultants
(i.e., departments/genders) as a separate subtotal.
12 rows selected.
Note that the last three rows are duplicates of the previous three rows. These
duplicates can be eliminated using the GROUP_ID () function illustrated in Examples 17
and 18.
If the department name column is referenced only as part of the ROLLUP operator,
as we have seen previously, the output includes the sum of the salaries of all consultants
in addition to the sum of the salaries for each department. However, when you group
by department name before using the ROLLUP operator, the sum of the salaries of all
consultants is not calculated.
GROUP_ID () distinguishes among duplicate groupings so that duplicate rows returned by
a GROUP BY clause can be filtered out. GROUP_ID () does not accept any parameters. If n
duplicates exist for a particular grouping, GROUP_ID () returns numbers in the range
of 0 to n-1. The query in Example 17 rewrites the preceding query, which references the
department name column two times, so as to include the output of the GROUP_ID () function.
Note that GROUP_ID () returns 0 for all rows except for the last three, which are
duplicates of the previous three rows. For these rows, GROUP_ID () returns a value of 1.
The query in Example 18 then uses a HAVING clause to keep only the rows for which
GROUP_ID () returns 0.
Example 17
SELECT DNAME "DEPARTMENT NAME", GENDER, GROUP_ID(), SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY DNAME, ROLLUP(DNAME, GENDER);
12 rows selected.
Example 18
SELECT DNAME "DEPARTMENT NAME", GENDER, SUM(SALARY)
FROM CONSULTANT JOIN DEPARTMENT
ON CONSULTANT.DID = DEPARTMENT.DID
GROUP BY DNAME, ROLLUP(DNAME, GENDER)
HAVING GROUP_ID() = 0;
9 rows selected.
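The row counts in Examples 17 and 18 (12 rows, then 9) follow from how GROUP BY DNAME, ROLLUP(DNAME, GENDER) expands into grouping sets. The Python sketch below is a simplified model of that expansion, not Oracle's algorithm: the helper names are hypothetical, and the row counts per grouping set assume the three departments and two genders seen in this chapter's output.

```python
from collections import Counter


def expand(prefix, rollup_cols):
    """Grouping sets for GROUP BY prefix, ROLLUP(rollup_cols): the prefix
    columns combined with every prefix of the ROLLUP column list."""
    return [
        tuple(prefix) + tuple(rollup_cols[:k])
        for k in range(len(rollup_cols), -1, -1)
    ]


def with_group_id(grouping_sets):
    """Number repeated grouping sets the way GROUP_ID() is described:
    0 for the first occurrence, 1 for the first duplicate, and so on."""
    seen = Counter()
    numbered = []
    for gset in grouping_sets:
        key = frozenset(gset)  # (DNAME, DNAME) groups the same way as (DNAME,)
        numbered.append((gset, seen[key]))
        seen[key] += 1
    return numbered


numbered = with_group_id(expand(["DNAME"], ["DNAME", "GENDER"]))

# Rows each distinct grouping set contributes (3 departments, 2 genders).
ROWS_PER_SET = {
    frozenset({"DNAME", "GENDER"}): 6,
    frozenset({"DNAME"}): 3,
}

total_rows = sum(ROWS_PER_SET[frozenset(gset)] for gset, _ in numbered)
kept_rows = sum(ROWS_PER_SET[frozenset(gset)] for gset, gid in numbered if gid == 0)
```

The third grouping set repeats the second, so it receives the number 1; discarding it leaves 9 of the 12 rows.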
functions. The ORDER_BY_CLAUSE is much like the familiar ordering clause; however, it
is applied to the result of an analytical function. The WINDOWING_CLAUSE lets you
compute moving and cumulative aggregates (such as moving averages, moving sums, or
cumulative sums) by choosing only certain data within a specified window.
Rischert (2010) describes query processing with analytical functions as being
performed in three steps:
1. The joins, WHERE, GROUP BY, and HAVING clauses are carried out.
2. The following analysis of the results from Step 1 takes place:
• If any partitioning clause is listed, the rows are split into appropriate
partitions. These partitions are formed after the GROUP BY clause, so
you may be able to analyze data by partition, not just the expressions of
the GROUP BY clause.
• If a windowing clause is involved, the ranges of the sliding windows of
rows are determined. The analytical functions are evaluated against the
specified window and allow moving averages and moving sums.
• Analytic functions may have an ORDER BY clause as part of the function
specification that allows you to order the result before the analytical
function is applied.
3. The third step occurs if an ORDER BY clause is present at the end of the
statement and the results are sorted accordingly.
This section illustrates several analytical functions in the context of the MONTHLY_
SALES and MONTH tables that appear in Figure 13.2. The MONTHLY_SALES table
contains the number of units sold per month for each of five products. The MONTH
table contains the names of the months of the year. It is included so that the examples
that follow can display month names instead of month numbers.
Sections 13.5.2–13.5.4 illustrate the use of the ranking functions, window functions,
and the first and last functions. The reader is encouraged to refer to Price (2004) and
Rischert (2010) for a more in-depth discussion of analytical function types.
5 rows selected.
Since there are five products in the MONTHLY_SALES table, there are five
groups to rank. Twelve “monthly units sold” amounts are added together to produce
the five sums.
The PARTITION BY clause is used when there is a need to divide the groups into
subgroups. If you need to subdivide the “monthly units sold” by product, so that you
can obtain which months generated the best sales for each product, you can use the
PARTITION BY clause, as shown in Example 2.
Example 2
SELECT PRODUCT, MNAME, UNITS_SOLD,
RANK() OVER (PARTITION BY PRODUCT ORDER BY UNITS_SOLD DESC) RANK
FROM MONTHLY_SALES JOIN MONTH
ON MONTH.MNUM = MONTHLY_SALES.MNUM
ORDER BY PRODUCT, RANK;
60 rows selected.
The ranks shown in Example 2 reveal that the number of units sold for certain
products was the same for multiple months and that the month of November was the
best selling month for products 4 and 5. In fact, as shown in Example 3, the month of
November ranked first in terms of total units sold for the five products.
Example 3
SELECT MNAME, SUM(UNITS_SOLD),
RANK() OVER (ORDER BY SUM(UNITS_SOLD) DESC) RANK
FROM MONTHLY_SALES JOIN MONTH
ON MONTH.MNUM = MONTHLY_SALES.MNUM
GROUP BY MNAME
ORDER BY RANK;
12 rows selected.
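The tie handling and partitioning that RANK () performs in Examples 2 and 3 can be sketched in Python. The sales figures below are hypothetical (the MONTHLY_SALES data is not reproduced here), and the helper names are illustrative only. Tied values share a rank, and the next rank skips ahead by the number of tied rows.

```python
from itertools import groupby


def rank_desc(values):
    """RANK() OVER (ORDER BY value DESC): the rank of a value is one plus
    the number of rows with a strictly larger value."""
    ordered = sorted(values, reverse=True)
    return {v: ordered.index(v) + 1 for v in set(values)}


# (product, month, units_sold): hypothetical figures for illustration.
SALES = [
    (1, "January", 40), (1, "February", 55), (1, "March", 55),
    (2, "January", 30), (2, "February", 20), (2, "March", 25),
]


def rank_within_partitions(rows):
    """Mimic RANK() OVER (PARTITION BY product ORDER BY units_sold DESC),
    followed by ORDER BY product, rank."""
    ranked = []
    by_product = sorted(rows, key=lambda r: r[0])
    for product, group in groupby(by_product, key=lambda r: r[0]):
        group = list(group)
        ranks = rank_desc([units for _, _, units in group])
        ranked.extend(
            (product, month, units, ranks[units]) for _, month, units in group
        )
    return sorted(ranked, key=lambda r: (r[0], r[3]))


result = rank_within_partitions(SALES)
for row in result:
    print(row)
```

For product 1, February and March tie at rank 1 and January receives rank 3, not 2. Omitting the partitioning step and ranking all rows together corresponds to a RANK () call without PARTITION BY.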
You can use the ROLLUP, CUBE, and GROUPING SETS operators with the analytical
functions. The SELECT statement in Example 4 uses ROLLUP and RANK () to get the
rankings of total units sold by product. Note that the ROLLUP clause is responsible for
the first row.
Example 4
SELECT PRODUCT, SUM(UNITS_SOLD),
RANK() OVER (ORDER BY SUM(UNITS_SOLD) DESC) RANK
FROM MONTHLY_SALES
GROUP BY ROLLUP(PRODUCT)
ORDER BY RANK;
6 rows selected.
The SELECT statement in Example 5 uses GROUPING SETS and RANK () to get just
the sales amount subtotal rankings. Since there are five products and 12 months, 17 rows
of output are generated.
Example 5
COLUMN RANK NOPRINT
SELECT PRODUCT, MNAME,SUM(UNITS_SOLD),
RANK() OVER (ORDER BY SUM(UNITS_SOLD) DESC) RANK
FROM MONTHLY_SALES JOIN MONTH
ON MONTH.MNUM = MONTHLY_SALES.MNUM
GROUP BY GROUPING SETS(PRODUCT, MNAME)
ORDER BY RANK;
17 rows selected.
11
Different database products have a number of supporting commands that control the display attributes
for a single column or all columns. Oracle 10g’s SQL*Plus environment allows the command COLUMN
RANK NOPRINT prior to the SELECT statement to suppress the display of the column headed by the
alias RANK, leaving only the PRODUCT, MNAME, and SUM(UNITS_SOLD) columns displayed.
12
A moving average is used to analyze a series of data points by creating a series of averages of
different subsets of the full data set. Moving averages are typically used with time-series data.
forward with each subsequent row. A moving average has sliding starting and ending rows
for a constant logical or physical range.
You can use windows with the following functions: SUM(), AVG(), MAX(), MIN(),
COUNT(), VARIANCE(), and STDDEV(). You can also use windows with FIRST_VALUE()
and LAST_VALUE(), which return the first and last values in a window.
The SELECT statement in Example 6 performs a cumulative sum to compute the
cumulative sales amount, starting with January and ending in December. Note how each
monthly sales amount is added to the cumulative amount that grows after each month.
Example 6
COLUMN MONTH.MNUM NOPRINT
SELECT MONTH.MNUM, MNAME, SUM(UNITS_SOLD), SUM(SUM(UNITS_SOLD))
OVER (ORDER BY MONTH.MNUM ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
"CUMULATIVE SALES"
FROM MONTHLY_SALES JOIN MONTH
ON MONTHLY_SALES.MNUM = MONTH.MNUM
GROUP BY MONTH.MNUM, MNAME
ORDER BY MONTH.MNUM;
MNAME SUM(UNITS_SOLD) CUMULATIVE SALES
----------- --------------- ----------------
January 371 371
February 414 785
March 350 1135
April 396 1531
May 334 1865
June 344 2209
July 340 2549
August 396 2945
September 324 3269
October 356 3625
November 427 4052
December 363 4415
12 rows selected.
In this expression:
• SUM(UNITS_SOLD) computes the sum of the units sold for a given month. The
outer SUM() computes the cumulative amount.
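The effect of the window ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW can be checked with a short Python sketch. The monthly figures are the SUM(UNITS_SOLD) values from the output above; the variable names are illustrative. Each cumulative value is the sum of every monthly total from the first row through the current row.

```python
from itertools import accumulate

# SUM(UNITS_SOLD) per month, January through December, from the output above.
MONTHLY_UNITS = [371, 414, 350, 396, 334, 344, 340, 396, 324, 356, 427, 363]

# Running total: the window always starts at the first row and ends at the
# current row, so each entry adds one more month to the previous entry.
cumulative = list(accumulate(MONTHLY_UNITS))

for units, running in zip(MONTHLY_UNITS, cumulative):
    print(units, running)
```

The final entry equals the sum of all twelve monthly totals, matching the December row of the CUMULATIVE SALES column.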
13
While the MNUM column is needed to join the MONTH and MONTHLY_SALES tables, it is not
necessary to display both the month number and month name.
6 rows selected.
The query in Example 8 computes the moving average of the sales amount between
(i.e., involving) the current month and the previous three months.
Example 8
SELECT MONTH.MNUM, MNAME, SUM(UNITS_SOLD), AVG(SUM(UNITS_SOLD))
OVER (ORDER BY MONTH.MNUM ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
"MOVING AVERAGE"
FROM MONTHLY_SALES JOIN MONTH
ON MONTHLY_SALES.MNUM = MONTH.MNUM
GROUP BY MONTH.MNUM, MNAME
ORDER BY MONTH.MNUM;
12 rows selected.
In this expression:
• SUM(UNITS_SOLD) computes the sum of the units sold for a given month.
The outer AVG() computes the average.
• ORDER BY MONTH.MNUM orders the rows read by the query by month.
• ROWS BETWEEN 3 PRECEDING AND CURRENT ROW defines the starting
point of the window as including the three rows preceding the current
row; the ending point of the window is the current row. Since current
row is the default, ROWS 3 PRECEDING would have accomplished the
same thing.
The entire expression means compute the moving average of the sales amount
between the current month and the previous three months. Because fewer than the full
four months of data are available for the first three months, the moving average for
those months is based on only the months available. Once the month of April is reached,
each moving average is based on four months of data.
Both the starting point and the ending point of the window begin at row 1 read by the
query. The ending point of the window moves down after each row is processed. The
starting point of the window only moves down after row 4 has been processed, after which
time the starting point of the window moves down after each row is processed. Processing
continues until the last row read by the query is processed.
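The sliding window described above can be sketched in Python. The monthly totals are the SUM(UNITS_SOLD) values shown earlier in this section for the cumulative sum example; the moving_average helper is hypothetical. The window covers the current row and up to three preceding rows, so it holds fewer than four rows until April.

```python
# SUM(UNITS_SOLD) per month, January through December.
MONTHLY_UNITS = [371, 414, 350, 396, 334, 344, 340, 396, 324, 356, 427, 363]


def moving_average(values, preceding):
    """Mimic AVG(...) OVER (ORDER BY ... ROWS BETWEEN n PRECEDING AND
    CURRENT ROW): average the current row and up to n rows before it."""
    averages = []
    for i in range(len(values)):
        # The window start cannot move before the first row.
        window = values[max(0, i - preceding): i + 1]
        averages.append(sum(window) / len(window))
    return averages


averages = moving_average(MONTHLY_UNITS, 3)
for units, avg in zip(MONTHLY_UNITS, averages):
    print(units, round(avg, 2))
```

January's average is its own total, February's is based on two months, March's on three, and every month from April onward on four.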
FIRST_VALUE() and LAST_VALUE() are functions used to get the first and last values in a
window. The SELECT statement in Example 9 uses FIRST_VALUE() and LAST_VALUE() to
get the previous and next month’s sales amount.
Example 9
SELECT MONTH.MNUM, MNAME, SUM(UNITS_SOLD),
FIRST_VALUE(SUM(UNITS_SOLD)) OVER (ORDER BY MONTH.MNUM
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) "PREVIOUS AMOUNT",
LAST_VALUE(SUM(UNITS_SOLD)) OVER (ORDER BY MONTH.MNUM
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) "NEXT AMOUNT"
FROM MONTHLY_SALES JOIN MONTH
ON MONTHLY_SALES.MNUM = MONTH.MNUM
GROUP BY MONTH.MNUM, MNAME
ORDER BY MONTH.MNUM;
12 rows selected.
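How a window of ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING yields the neighboring months can be sketched in Python. The monthly totals are the SUM(UNITS_SOLD) values shown earlier in this section; the helper name is hypothetical. Note the behavior at the edges: the window is simply cut short, so the first row's "previous" value and the last row's "next" value are the row's own amount.

```python
# SUM(UNITS_SOLD) per month, January through December.
MONTHLY_UNITS = [371, 414, 350, 396, 334, 344, 340, 396, 324, 356, 427, 363]


def first_and_last(values, preceding=1, following=1):
    """For each row, return (current, first value in window, last value in
    window) for a window of `preceding` rows before and `following` after."""
    out = []
    for i, current in enumerate(values):
        window = values[max(0, i - preceding): i + following + 1]
        out.append((current, window[0], window[-1]))
    return out


neighbors = first_and_last(MONTHLY_UNITS)
for current, previous_amount, next_amount in neighbors:
    print(current, previous_amount, next_amount)
```

For February the sketch returns January's total as the first value and March's total as the last value, which is the pairing Example 9 labels PREVIOUS AMOUNT and NEXT AMOUNT.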
14
The MERGE statement allows you to merge rows from one table into another. The MERGE
statement is not covered in this book.
Rules:
12 rows selected.
The SELECT statement in Example 1 retrieves the sales amount for each month in
2013 for product 1 and computes the predicted sales for January, February, and March
of 2014, based on sales in 2013. It is used as a vehicle by which to illustrate the basic
elements of the MODEL clause syntax.
Example 1
SELECT PRODUCT, YEAR, MNUM, UNITS_SOLD
FROM MONTHLY_SALES
WHERE PRODUCT = 1
MODEL
PARTITION BY (PRODUCT)
DIMENSION BY (MNUM, YEAR)
MEASURES (UNITS_SOLD)
(UNITS_SOLD [1, 2014] = UNITS_SOLD [1, 2013],
UNITS_SOLD [2, 2014] = UNITS_SOLD [2, 2013] + UNITS_SOLD [3, 2013],
UNITS_SOLD [3, 2014] = ROUND (UNITS_SOLD [3, 2013] * 1.25, 2))
ORDER BY YEAR, MNUM;
• After MEASURES come three lines that compute the future sales for
January, February, and March of 2014. The following three lines constitute
the rules of the model:
• UNITS_SOLD [1, 2014] = UNITS_SOLD [1, 2013] sets the sales amount for
January 2014 to the amount for January 2013.
• UNITS_SOLD [2, 2014] = UNITS_SOLD [2, 2013] + UNITS_SOLD [3, 2013] sets
the sales amount for February 2014 to the amount for February 2013
plus March 2013.
• UNITS_SOLD [3, 2014] = ROUND (UNITS_SOLD [3, 2013] * 1.25, 2) sets the
sales amount for March 2014 to the rounded value of the sales amount
for March 2013 multiplied by 1.25.
• ORDER BY YEAR, MNUM orders the results returned by the entire
query.
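The three rules of the model can be read as assignments to cells of an array indexed by the DIMENSION BY columns. The Python sketch below models that reading with a dictionary keyed by (MNUM, YEAR). It is an analogy rather than a description of how the MODEL clause is executed, and the 2013 figures are hypothetical values invented for illustration, not the book's data.

```python
# Hypothetical UNITS_SOLD for product 1 in 2013, keyed by (mnum, year) to
# mirror DIMENSION BY (MNUM, YEAR). These figures are not the book's data.
units_sold = {(mnum, 2013): 40 + 10 * mnum for mnum in range(1, 13)}


def apply_rules(cells):
    """Apply the three rules of Example 1 to a copy of the cell array."""
    cells = dict(cells)
    # January 2014 takes the January 2013 amount.
    cells[(1, 2014)] = cells[(1, 2013)]
    # February 2014 is February 2013 plus March 2013.
    cells[(2, 2014)] = cells[(2, 2013)] + cells[(3, 2013)]
    # March 2014 is March 2013 scaled by 1.25 and rounded to two decimals.
    cells[(3, 2014)] = round(cells[(3, 2013)] * 1.25, 2)
    return cells


result = apply_rules(units_sold)

# ORDER BY YEAR, MNUM: sort on the year first, then on the month number.
ordered = sorted(result.items(), key=lambda item: (item[0][1], item[0][0]))
for (mnum, year), units in ordered:
    print(year, mnum, units)
```

Each rule writes a cell that did not exist before, so the twelve original cells are joined by three new ones for 2014.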
The output of the query in Example 1 is shown next. Note that the results contain the
units sold for all months in 2013 for product 1 plus the predicted number of units sold for
the first three months of 2014.
Example 2
SELECT YEAR, MNUM, UNITS_SOLD
FROM MONTHLY_SALES
WHERE PRODUCT = 1
MODEL
DIMENSION BY (MNUM, YEAR)
MEASURES (UNITS_SOLD)
(UNITS_SOLD [1, 2014] = ROUND (AVG(UNITS_SOLD) [MNUM BETWEEN 1 AND 3, 2013],2))
ORDER BY YEAR, MNUM;
13 rows selected.
13 rows selected.
1 2013 9 59
1 2013 10 58
1 2013 11 86
1 2013 12 81
1 2014 1 113.75
1 2014 2 114
1 2014 3 66.25
15 rows selected.
15 rows selected.
5 ON MONTHS.MNUM = TO_CHAR(PROFESSOR.DATEHIRED,'mm')
6 GROUP BY MONTHS.MNUM,
7 NVL(TO_CHAR(PROFESSOR.DATEHIRED,'Month'),MONTHS.MNAME)
8 UNION
9 SELECT '13', 'Unknown', COUNT(*)
10 FROM PROFESSOR
11 WHERE PROFESSOR.DATEHIRED IS NULL
12 GROUP BY '13', 'Unknown';
1. Since the months February, March, April, and July do not appear in the
PROFESSOR table, the MONTHS table was created using the following
CREATE TABLE statement:
CREATE TABLE MONTHS (MNUM VARCHAR(2), MNAME VARCHAR(10));
and a row containing a two-character month number (e.g., 01 for January, ...,
12 for December) and the full name of the month was inserted for each of the
12 months of the year.
2. Lines 6 and 7 indicate that grouping is first done by month number and then
by month name. Grouping by month name alone would cause the months to
be displayed in alphabetical (i.e., April, August, December, etc.) as opposed
to chronological (i.e., January, February, March, etc.) order.
3. Lines 2 and 7 make use of the null value function NVL (expression1, expression2)
as a way to replace a null value with a string in the results of a query.
If expression1 is a null value, the NVL function returns expression2. If
expression1 is not null, the NVL function returns expression1.
4. Line 4 indicates a left outer join involving the MONTHS and PROFESSOR
tables. This allows an unmatched month number in the MONTHS table to be
combined with the row of null values in the PROFESSOR table as a result of
the left outer join. Use of the NVL function in lines 2 and 7 is required so
that the name of the month in the MONTHS table can replace the null value
associated with the PROFESSOR.DATEHIRED on the row of null values in the
PROFESSOR table.
5. COUNT(PROFESSOR.DATEHIRED) is used on line 3 so that the null value in
the PROFESSOR.DATEHIRED column on rows associated with months during
which no professor was hired will allow the initial value of the accumulator to
remain at zero. If COUNT(PROFESSOR.DATEHIRED) is replaced by COUNT(*),
the number of professors hired during each of the months of February, March,
April, and July will be 1.
6. Execution of the query on lines 1−7 will not include a row that records the
number of professors for which a date hired is unavailable. However, a
query that contains the union of the SELECT statement on lines 1−7
with the SELECT statement on lines 9−12 will include a final row that
shows the number of professors for which a date hired is unavailable
(i.e., unknown).
7. Execution of the query that appears above generates the following result:
Result:
Different database products have a number of supporting commands that control the
display attributes for a single column or all columns. For example, in Oracle 10g’s
SQL*Plus environment, adding the command:
COLUMN MNUM NOPRINT
before line 1 would suppress the display of the column headed Mnum and leave only the
Month and Number Hired columns displayed.
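The interplay of the left outer join, NVL, and the two forms of COUNT discussed in the notes above can be sketched in Python. The sketch uses a hypothetical three-month MONTHS table and a hypothetical list of hire months, not the PROFESSOR data, and the helper names are illustrative. None plays the role of a null value.

```python
def nvl(expr1, expr2):
    """Analogue of NVL(expression1, expression2): return expression2 when
    expression1 is null (None), otherwise expression1."""
    return expr2 if expr1 is None else expr1


# Hypothetical data: (MNUM, MNAME) rows, and one hire month per professor
# (None means DATEHIRED is null for that professor).
MONTHS = [("01", "January"), ("02", "February"), ("03", "March")]
HIRE_MONTHS = ["01", "01", "03", None]


def left_outer_join(months, hires):
    """Pair each month with its matching hires; a month with no match is
    kept once, padded with None, as a left outer join would do."""
    joined = []
    for mnum, mname in months:
        matches = [h for h in hires if h == mnum]
        if matches:
            joined.extend((mnum, mname, h) for h in matches)
        else:
            joined.append((mnum, mname, None))
    return joined


count_column = {}  # COUNT(PROFESSOR.DATEHIRED): nulls are not counted
count_star = {}    # COUNT(*): every joined row is counted
for mnum, mname, hired in left_outer_join(MONTHS, HIRE_MONTHS):
    count_star[mname] = count_star.get(mname, 0) + 1
    count_column[mname] = count_column.get(mname, 0) + int(hired is not None)
```

February has no hires, so counting the nullable column gives 0 while counting rows gives 1, which is the discrepancy described in note 5. The professor with a null hire date matches no month at all, which is why a separate SELECT is needed to report that case.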
Result:
1. Since both the salary of a professor and the salary of the professor’s department
head are recorded in the PROFESSOR table, this query references the
PROFESSOR table twice (once using table alias A and once using table alias
B) in order to compare the salary of a professor with that of his or her
department head.
2. The ON clauses that appear on lines 4 and 5 work together to identify the pro-
fessor in copy B of PROFESSOR serving as department head of the professor
under investigation in copy A of PROFESSOR. The condition on line 6 ensures
that the salary of the professor exceeds that of his or her department head.
3. Execution of the query yields the result shown here:
Result:
Prof Name     Prof Salary  Dept Name  Dept Head    Dept Head Salary  Dept Name
------------  -----------  ---------  -----------  ----------------  ---------
Chelsea Bush        77000  QA/QM      Alan Brodie             76000  QA/QM
Tony Hopkins        77000  QA/QM      Alan Brodie             76000  QA/QM
Sunil Shetty        64000  IS         Cathy Cobal             45000  IS
Katie Shef          65000  IS         Cathy Cobal             45000  IS
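The self-join idea in notes 1 and 2 can be sketched as follows. This is an illustration under assumed table layouts, not the query discussed above: the column names (empid, dcode, head_empid), the row for Jane Doe, and the use of SQLite through Python's sqlite3 module are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layouts; the book's actual columns may differ.
    CREATE TABLE department (dcode INTEGER PRIMARY KEY, dname TEXT, head_empid TEXT);
    CREATE TABLE professor (empid TEXT PRIMARY KEY, name TEXT,
                            salary INTEGER, dcode INTEGER);
    INSERT INTO department VALUES (1, 'QA/QM', 'P1'), (2, 'IS', 'P4');
    INSERT INTO professor VALUES
        ('P1', 'Alan Brodie',  76000, 1),
        ('P2', 'Chelsea Bush', 77000, 1),
        ('P3', 'Jane Doe',     70000, 1),   -- invented row, earns less than the head
        ('P4', 'Cathy Cobal',  45000, 2),
        ('P5', 'Sunil Shetty', 64000, 2);
""")

# Alias a is the professor under investigation; alias b is a second copy of
# the same table used to look up that professor's department head.
query = """
    SELECT a.name, a.salary, d.dname, b.name, b.salary
    FROM professor a
    JOIN department d ON a.dcode = d.dcode
    JOIN professor b  ON d.head_empid = b.empid
    WHERE a.salary > b.salary
    ORDER BY a.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two ON conditions pair each professor with the head of his or her department, and the WHERE condition keeps only those who earn more than that head.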
Since the PROFESSOR table does not have separate columns for first name, last
name, and middle initial, the subquery uses the SUBSTR and INSTR functions
along with the GROUP BY and HAVING clauses to retrieve the first names that appear
more than once in the PROFESSOR table. The main query then
displays the names of all professors whose first name appears in the set of first names
retrieved by the subquery. Execution of this query produces the following result. It is left
as an exercise for the reader to sort the result in ascending order by last name.
Result:
NAME
---------------
John Smith
Mike Faraday
John B Smith
John Nicholson
Mike Crick
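A minimal sketch of the technique just described, assuming a PROFESSOR table with a single NAME column and a handful of sample rows. It runs in SQLite through Python's sqlite3 module, whose substr and instr functions play the role of Oracle's SUBSTR and INSTR here. The rows are ordered by full name only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professor (name TEXT)")
conn.executemany(
    "INSERT INTO professor VALUES (?)",
    [("John Smith",), ("Mike Faraday",), ("John B Smith",), ("John Nicholson",),
     ("Mike Crick",), ("Marie Curie",), ("Alan Brodie",)],
)

# The first name is everything before the first blank in NAME.
# The subquery keeps first names that occur more than once; the main
# query lists every professor whose first name is in that set.
query = """
    SELECT name
    FROM professor
    WHERE substr(name, 1, instr(name, ' ') - 1) IN (
        SELECT substr(name, 1, instr(name, ' ') - 1)
        FROM professor
        GROUP BY substr(name, 1, instr(name, ' ') - 1)
        HAVING COUNT(*) > 1
    )
    ORDER BY name
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)
```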
NAME
-------------
Marie Curie
Mike Faraday
Mike Crick
Alan Brodie
Chapter Summary
In addition to the aggregate functions introduced in Chapter 12, the SQL:2003 standard
contains a number of built-in functions for working with strings, dates, and times. Although
the functionality associated with these functions is available in practically all database pro-
ducts, different products contain slightly different syntax from that specified in the standard
(e.g., Oracle’s SUBSTR function versus the SUBSTRING function in the SQL:2003 stan-
dard). While such variations in syntax limit the portability of SQL SELECT statements, it is
hoped that, after studying Chapters 10 through 13, readers will be able to use the features
discussed here and, where necessary, adapt them to their specific database platform with a
minimum of difficulty.
Hierarchical queries can be used to retrieve records from a table by their natural relationship,
be it a family tree, an employee/supervisor tree, or a bill of materials. Hierarchical queries incorporate
keywords such as START WITH, CONNECT BY, and PRIOR into the SELECT statement.
The START WITH clause defines the root rows of the hierarchy; the CONNECT BY clause
specifies the relationship between parent and child rows; and the keyword PRIOR, when prefixing
a column name, indicates whether the hierarchy is traversed from parent to child (i.e., top to
bottom) or from child to parent (i.e., bottom to top).
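The top-to-bottom traversal of an employee/supervisor tree can be sketched as shown below. SQLite does not support START WITH, CONNECT BY, or PRIOR, so this example swaps in a recursive common table expression (WITH RECURSIVE) as a stand-in. The EMPLOYEE table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT, supervisor INTEGER);
    INSERT INTO employee VALUES
        (1, 'Root Boss', NULL),
        (2, 'Manager A', 1),
        (3, 'Manager B', 1),
        (4, 'Worker A1', 2);
""")

# Rough Oracle analogue of the recursion below:
#   START WITH supervisor IS NULL CONNECT BY PRIOR empid = supervisor
query = """
    WITH RECURSIVE tree(empid, name, lvl, path) AS (
        -- anchor member: the root rows (the role of START WITH)
        SELECT empid, name, 1, name
        FROM employee
        WHERE supervisor IS NULL
        UNION ALL
        -- recursive member: children of rows already found (parent to child)
        SELECT e.empid, e.name, t.lvl + 1, t.path || '/' || e.name
        FROM employee e
        JOIN tree t ON e.supervisor = t.empid
    )
    SELECT lvl, name FROM tree ORDER BY path
"""
rows = conn.execute(query).fetchall()
for lvl, name in rows:
    print("  " * (lvl - 1) + name)
```

Ordering by the accumulated path lists each subordinate directly beneath his or her supervisor, which is the order a top-to-bottom hierarchical query would typically display.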
SQL:2003 contains the following enhancements to the GROUP BY clause: (a) ROLLUP
and CUBE extensions to the GROUP BY clause, (b) three GROUPING functions, and (c) the
GROUPING SETS expression. ROLLUP enables a SELECT statement to create subtotals
that roll up from the most detailed level to a grand total, following a grouping list of columns
specified in the ROLLUP clause. CUBE is an extension of ROLLUP, taking a specified set of
grouping columns and creating subtotals for all of their possible combinations. GROUPING
SETS is an expression that allows only a selectively specified set of groups to be displayed. The
three grouping functions allow you to determine (a) which rows are subtotals and (b) the exact
level of aggregation for a given subtotal.
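The difference between ROLLUP and CUBE comes down to which grouping sets each one generates. The following plain-Python sketch (no database involved) enumerates those sets and computes the corresponding subtotals over a few sample rows. The function names are invented for illustration, and None plays the part of the null that marks a subtotal row.

```python
from itertools import combinations

def rollup_sets(columns):
    """Grouping sets of ROLLUP: every prefix of the column list, ending with ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

def cube_sets(columns):
    """Grouping sets of CUBE: every subset of the column list."""
    return [combo for n in range(len(columns), -1, -1)
            for combo in combinations(columns, n)]

def subtotals(rows, columns, grouping_sets, measure):
    """Sum `measure` per grouping set; columns outside the set become None."""
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] = totals.get(key, 0) + row[measure]
    return totals

# A few sample rows in the shape of a production table.
rows = [
    {"region": "N", "year": 2006, "pounds": 118},
    {"region": "N", "year": 2006, "pounds": 118},
    {"region": "E", "year": 2006, "pounds": 193},
    {"region": "N", "year": 2010, "pounds": 117},
]
cols = ["region", "year"]
rollup_totals = subtotals(rows, cols, rollup_sets(cols), "pounds")
cube_totals = subtotals(rows, cols, cube_sets(cols), "pounds")
print(rollup_sets(cols))
print(cube_sets(cols))
```

With n grouping columns, ROLLUP yields n + 1 grouping sets while CUBE yields 2 to the power n, which is why CUBE adds the per-year totals that ROLLUP (region, year) omits.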
SQL:2003 introduces a new family of analytical functions. These functions make it possible
to calculate (a) rankings and percentiles, (b) moving window aggregates, (c) lag/lead analysis,
(d) first/last analysis, and (e) least squares regression. Query processing using analytical
functions takes place in three steps. First, all joins, WHERE, GROUP BY, and HAVING
clauses are carried out. Second, the query results are divided into groups of rows called partitions,
and the analytical functions are evaluated over the rows of each partition.
A query result set may be partitioned into just one partition holding all rows, a few large
partitions, or many small partitions holding just a few rows each. Third, if the query has an
ORDER BY clause at its end, the ORDER BY is processed to allow for precise ordering of the
results.
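The three processing steps can be observed in a small ranking query. The sketch below assumes a hypothetical single-table layout (name, salary, dname) and runs in SQLite through Python's sqlite3 module, which supports window functions from SQLite version 3.25 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout; sample rows only.
    CREATE TABLE professor (name TEXT, salary INTEGER, dname TEXT);
    INSERT INTO professor VALUES
        ('Chelsea Bush', 77000, 'QA/QM'),
        ('Tony Hopkins', 77000, 'QA/QM'),
        ('Alan Brodie',  76000, 'QA/QM'),
        ('Sunil Shetty', 64000, 'IS'),
        ('Katie Shef',   65000, 'IS'),
        ('Cathy Cobal',  45000, 'IS');
""")

query = """
    SELECT dname, name, salary,
           RANK() OVER (PARTITION BY dname ORDER BY salary DESC) AS rnk
    FROM professor
    WHERE salary < 77000        -- step 1: rows are filtered first
    ORDER BY dname, rnk         -- step 3: final ordering of the result
"""
# Step 2 happens in between: the surviving rows are split into one
# partition per department and RANK() is evaluated within each partition.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the WHERE clause is applied before the ranking is computed, Alan Brodie receives rank 1 in QA/QM in this sketch even though two colleagues in the full table earn more.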
The SQL MODEL clause allows you to view query results in the form of multidimensional
arrays and then apply formulas to calculate new array values. The formulas can be sophisticated
interdependent calculations with inter-row and inter-array references. The MODEL clause
defines a multidimensional array by mapping the columns of a query into three groups: partition-
ing, dimension, and measure columns. Partitions define logical blocks of the result set in a way
similar to the partitions of the analytical functions. Each partition is viewed by the formulas as an
independent array. Dimensions identify each measure cell within a partition. These columns are
identifying characteristics such as date, region, and product name. Measures contain numeric
values such as sales. Each cell is accessed within its partition by specifying its full combination
of dimensions.
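The mapping of query columns into partitions, dimensions, and measures can be pictured with ordinary dictionaries. The sketch below is a plain-Python analogy rather than MODEL clause syntax: the function names, the sample rows, and the rule applied are all invented for illustration.

```python
from collections import defaultdict

def build_model(rows, partition_by, dimension_by, measure):
    """Map flat rows into {partition key: {dimension key: measure value}}."""
    model = defaultdict(dict)
    for row in rows:
        part = tuple(row[c] for c in partition_by)
        dims = tuple(row[c] for c in dimension_by)
        model[part][dims] = row[measure]
    return model

def apply_rule(model, target, formula):
    """Evaluate `formula` in each partition independently; store it in cell `target`."""
    for cells in model.values():
        cells[target] = formula(cells)

# Hypothetical query result: one row per region, product, and year.
rows = [
    {"region": "East", "product": "Widget", "year": 2009, "sales": 100},
    {"region": "East", "product": "Widget", "year": 2010, "sales": 120},
    {"region": "West", "product": "Widget", "year": 2009, "sales": 80},
    {"region": "West", "product": "Widget", "year": 2010, "sales": 90},
]

# Partition by region, dimension by (product, year), measure = sales.
model = build_model(rows, ["region"], ["product", "year"], "sales")

# Invented rule: the 2011 cell is the sum of the 2009 and 2010 cells.
apply_rule(model, ("Widget", 2011),
           lambda cells: cells[("Widget", 2009)] + cells[("Widget", 2010)])

for part, cells in model.items():
    print(part, cells[("Widget", 2011)])
```

Each partition behaves as an independent array, and a cell is addressed by its full combination of dimension values, here the pair (product, year).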
Exercises
1. Display the subassemblies of a snow shovel in the same top to bottom order as shown here:
[Figure: bill-of-materials tree for the snow shovel. Nodes recoverable from the figure: 1605 Shovel complete (root); 129 Top handle bracket; 1118 Top handle coupling.]
MILK_PROD Table
YEAR MONTH POUNDS REGION YEAR MONTH POUNDS REGION YEAR MONTH POUNDS REGION
---- ------ ------ ------ ---- ------ ------ ------ ----- ----- ------ ------
2006 1 118 N 2009 10 124 N 2008 7 208 E
2006 2 118 N 2009 11 111 N 2008 8 224 E
2006 3 149 N 2009 12 121 N 2008 9 232 E
2006 4 177 N 2010 1 117 N 2008 10 261 E
2006 5 226 N 2010 2 118 N 2008 11 261 E
2006 6 215 N 2010 3 147 N 2008 12 243 E
2006 7 189 N 2010 4 166 N 2009 1 201 E
2006 8 172 N 2010 5 193 N 2009 2 199 E
2006 9 140 N 2010 6 195 N 2009 3 237 E
2006 10 143 N 2010 7 158 N 2009 4 246 E
2006 11 134 N 2010 8 159 N 2009 5 257 E
2006 12 143 N 2010 9 134 N 2009 6 228 E
2007 1 137 N 2010 10 126 N 2009 7 207 E
2007 2 124 N 2010 11 107 N 2009 8 230 E
2007 3 150 N 2010 12 110 N 2009 9 239 E
2007 4 181 N 2006 1 193 E 2009 10 258 E
YEAR MONTH POUNDS REGION YEAR MONTH POUNDS REGION YEAR MONTH POUNDS REGION
---- ------ ------ ------ ---- ------ ------ ------ ----- ----- ------ ------
2007 4 272 S 2006 2 103 W 2009 10 175 W
2007 5 286 S 2006 3 120 W 2009 11 172 W
2007 6 266 S 2006 4 147 W 2009 12 150 W
2007 7 239 S 2006 5 167 W 2010 1 159 W
2007 8 261 S 2006 6 173 W 2010 2 158 W
2007 9 272 S 2006 7 152 W 2010 3 160 W
2007 10 296 S 2006 8 142 W 2010 4 183 W
2007 11 292 S 2006 9 116 W 2010 5 184 W
2007 12 285 S 2006 10 97 W 2010 6 189 W
2008 1 240 S 2006 11 85 W 2010 7 215 W
2008 2 231 S 2006 12 93 W 2010 8 208 W
2008 3 282 S 2007 1 86 W 2010 9 204 W
2008 4 282 S 2007 2 86 W 2010 10 183 W
2008 5 309 S 2007 3 96 W 2010 11 195 W
2008 6 289 S 2007 4 125 W 2010 12 193 W
2008 7 255 S 2007 5 147 W
2008 8 273 S 2007 6 138 W
2008 9 289 S 2007 7 138 W
2008 10 309 S 2007 8 135 W
2008 11 305 S 2007 9 108 W
2008 12 290 S 2007 10 102 W
2009 1 241 S 2007 11 95 W
2009 2 234 S 2007 12 110 W
2009 3 278 S 2008 1 143 W
2009 4 294 S 2008 2 134 W
2009 5 300 S 2008 3 127 W
2009 6 279 S 2008 4 121 W
2009 7 246 S 2008 5 110 W
2009 8 270 S 2008 6 107 W
2009 9 270 S 2008 7 111 W
2009 10 296 S 2008 8 114 W
2009 11 306 S 2008 9 122 W
2009 12 283 S 2008 10 134 W
2010 1 231 S 2008 11 143 W
2010 2 236 S 2008 12 130 W
2010 3 282 S 2009 1 132 W
2010 4 278 S 2009 2 124 W
2010 5 307 S 2009 3 126 W
2010 6 279 S 2009 4 134 W
2010 7 250 S 2009 5 136 W
2010 8 278 S 2009 6 145 W
2010 9 293 S 2009 7 140 W
2010 10 316 S 2009 8 140 W
2010 11 316 S 2009 9 172 W
2010 12 316 S
2006 1 105 W
REGIONS Table
REG_CODE REG_NAME
-------- ---------------
N Northern Region
E Eastern Region
S Southern Region
W Western Region
8. Display the number of pounds of milk produced in each region during the 2006–2010 time period.
9. Display the number of pounds of milk produced in each region during the 2006–2010 time
period. In addition, include in the output you display the total number of pounds of milk
produced across all regions. Use the GROUPING function and the DECODE function to
display an appropriate label for the number of pounds produced across all regions.
10. Display the number of pounds of milk produced in each region each year, along with the
same totals you displayed in your answer to the previous question. Use the GROUPING
function and the DECODE function to display an appropriate label for the number of pounds
produced in all years for each region and for all years for all regions.
11. Modify the previous query so that instead of displaying as one of the totals the total number
of pounds of milk produced in each region, the total number of pounds of milk produced in
each year across all regions is displayed.
12. How many rows of output would be produced if the ROLLUP used in your query were to
take the following form: GROUP BY ROLLUP (YEAR, MONTH, REG_NAME)? Write a
query to verify your result.
13. How many rows of output would be produced if the ROLLUP used in your query were to
take the following form: GROUP BY ROLLUP (MONTH, YEAR, REG_NAME)? Write a
query to verify your result.
14. Display the number of pounds of milk produced in each region each year along with the
total number of pounds produced in each region, the total number of pounds of milk pro-
duced during each year, and the total number of pounds of milk produced overall.
15. Display only the total number of pounds of milk produced in each region and in each year.
Order the results by the totals generated.
16. Display the number of pounds of milk produced in each year, along with the ranking of each
year in terms of the number of pounds of milk produced. The rankings should be displayed in
descending order (i.e., year with the top production should receive the highest rank).
17. Display the number of pounds of milk produced in each region during the 2006–2010
period, along with the ranking of each region in terms of the number of pounds of milk pro-
duced. The ranks should be displayed in descending order (i.e., the region with the top
production should receive the highest rank).
18. Display the number of pounds of milk produced in each month during the 2006–2010
period, along with the ranking of each month in terms of the number of pounds of milk pro-
duced. The ranks should be displayed in descending order (i.e., the month with the top
production should receive the highest rank).
19. Display a separate set of rankings for the number of pounds of milk produced in each
month for each region.
20. Display the sum along with the cumulative sum of the production of milk across the regions
for the 2006–2010 period, starting with January and ending in December.
21. Display the sum along with a three-month moving average of the production of milk across
the regions for the 2006–2010 period, starting with January and ending in December.
22. Display the sum along with a three-month moving average of the production of milk across the
regions for the 2006–2010 period, starting with January, 2006 and ending in December, 2010.
23. Write an SQL query that makes use of the MODEL clause to estimate the monthly produc-
tion of milk in 2011 for each region to be equal to the average of the production of milk for
that month during the 2006 through 2010 time period.
SQL Project
This project [15] is based on eight tables (AIRPORT, FLIGHT, DEPARTURES, PASSENGER,
RESERVATION, EQUIP_TYPE, PILOTS, and TICKET) that contain data about Belle Airlines.
The script required to create and populate these eight tables can be downloaded from
www.course.com (search on the ISBN of this book) or obtained from your instructor.
[15] This project is an adaptation of an example that appears in Lorents and Morgan (1998).
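[Figure: ER diagram for Belle Airlines. Entity types recoverable from the figure: AIRPORT, FLIGHT, PILOT, TICKET, RESERVATION, PASSENGER. Relationship types: Departure, Arrival, Contains, Captains, Used_in, Involved_in, Possess, Has. Each relationship carries (min,max) structural constraints such as (0,n), (1,n), and (1,1).]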
Example 1. On April 1, Ole Olson (assigned confirmation number 1 in the Reservation table)
reserved two tickets on Flight Number 15, scheduled to leave on April 1, and return via Flight
Number 329 on April 1 and April 10. As a result, the Passenger table shows Ole Olson as Itin-
erary Number 1, and his wife, Lena Olson, as Itinerary Number 2. Observe that both have
Confirmation Number 1 since both reservations were booked at the same time by Ole. Observe
that the Ticket table shows that Ole (Itinerary Number 1) is sitting in Seat 10D on Flight Number
15 and in Seat 12D on Flight Number 329. On the other hand, his wife, Lena (Itinerary
Number 2), is sitting in Seat 10E on Flight Number 15 and in Seat 12E on Flight Number 329.
Example 2. On April 17, Andy Anderson (assigned confirmation number 6 in the Reserva-
tion table) reserved two tickets on Flight Number 102, scheduled to leave on April 18. As a
result, the Passenger table shows Andy Anderson as Itinerary Number 12, and his wife, Gloria
Anderson, as Itinerary Number 13. Once again, observe that both have Confirmation Number 6
since both reservations were booked at the same time by Andy. Observe that the Ticket
table shows that Andy (Itinerary Number 12) has a ticket on Flight Number 102, with seat
assignment 10B. The same is true for Gloria (Itinerary Number 13). She has a ticket on Flight
Number 102 and seat assignment 7C.
The 101 queries that follow should collectively provide a good introduction to SQL. Please
note that in some cases it is quite clear which columns (fields) need to be displayed. In other
cases, the columns to be displayed are left to individual judgment. Use DISTINCT to eliminate
duplicate rows, and don’t be afraid to use column aliases (especially when a
column heading would otherwise be the text of a numeric or character function); they can
make the output of a query more readable. In addition, please try to avoid wraparound as much
as possible. In all cases, please resist the temptation to believe that the results are
correct the first time output from a query is obtained. In other words, display
enough columns (fields), and study the output in order to verify the accuracy of the result.
In order to challenge the student, the 101 queries are for the most part randomly ordered.
In other words, queries 1–10 are not necessarily the easiest to write, nor is query 101 neces-
sarily the most difficult to write. One may want to begin working with queries that seem to
involve either one or two tables, such as FLIGHT and PASSENGER. These queries involve the
use of the LIKE operator, aggregate functions (e.g., COUNT, MIN, MAX, and AVG), grouping,
nested subqueries, and joins.
1. Display the origin, destination, departure time from origin, and arrival time at destination
for all flights that occur in the same time zone. Your results should be displayed in order
by flight number.
2. Display the code, location, and elevation of all airports without a hub airline. Your results
should be in descending order by elevation.
3. Display the departures originating from Los Angeles, CA. Include in your results flights
from Los Angeles for which no departures currently exist. Los Angeles, CA and not LAX
should be used in the WHERE clause of your query.
4. Display the flight numbers and the codes for the origins and destinations of all flight
reservations made by Andy Anderson.
5. Display the seating capacity, fuel capacity, and miles per gallon for all aircraft manufac-
tured by Boeing. Information about each equipment type should be displayed only once.
6. Display the names of all pilots who live outside of the state of Texas. Order the results in
alphabetical order by last name.
7. Display the flight number, flight date, fare, origin, and destination for all tickets with a flight
date of July 2006. Use the fare in the FLIGHT table as the fare for the ticket. Order your
results in ascending order by flight date and within flight date by flight number.
8. Display all flights that originate at an airport without a hub airline.
9. Display all flights that arrive at an airport without a hub airline.
10. Display all flights that both originate and arrive at an airport without a hub airline.
11. Display all departures that are flown by an aircraft not manufactured by Boeing. Your
results should be in ascending order by departure date and within departure date by flight
number.
12. Display the distance divided by the fare for each flight. For each flight, display the flight
number, the origin, the destination, the distance, the fare, and the quotient. Your results
should be in descending order by the quotient and rounded to two places to the right of
the decimal point. Create a descriptive column alias for the quotient.
13. Display the total number of flights that originate from each point of origin.
14. Revise the previous query so that instead of displaying the code for each point of origin,
the location of each point of origin from the AIRPORT table is displayed.
15. Revise the previous query to also include the display of those locations where no flights
originate.
16. Display the average flight pay for pilots that live in each state.
17. Display the name and flight pay for those pilots whose flight pay exceeds the average
flight pay for all pilots.
18. Display the name and flight pay for those pilots whose flight pay exceeds the average
flight pay for all pilots in the state in which they reside.
19. Display the date of the most recent departure flown by each pilot. Include in what you
display the name of the pilot.
20. Display not only the date of the most recent departure by each pilot but also the number
of days since the last departure date. Truncate the number of days (i.e., if number of days
is 37.67655, display 37) to zero places to the right of the decimal point. Order the result in
descending order by the number of days.
21. Display the number of departures that involve flights for each of the three time zone
differences.
22. Display the number of airports located in each state.
23. Display the number of departures where the distance flown is greater than or equal to
1000 miles.
24. Display the difference in age between the oldest and youngest pilot.
25. For each type of aircraft, display the total distance that can be flown before refueling.
Display your results in descending order by total distance that can be flown.
26. For each passenger listed in the PASSENGER table, display the name of the person
responsible for his or her reservation.
27. For each passenger listed in the PASSENGER table, display the name of the person
responsible for his or her reservation only if the passenger himself or herself was not
responsible for making the reservation.
28. For each reservation in the RESERVATION table, display the name of the pilot who will
be piloting the flight.
29. Display those tickets that include only one flight.
30. Display the name of the passengers whose tickets include only one flight.
31. What flights leave Phoenix for Los Angeles between 3:00 PM and midnight? Display each
flight’s flight number, city name of the flight’s origin, city name of the flight’s destination,
departure time, and arrival time.
32. What are the fares from Phoenix to Los Angeles if Belle Airlines is running a 20 percent
discount special off the current fares? Display each flight’s flight number, city name of the
flight’s origin, city name of the flight’s destination, and discounted fare.
33. Andy Anderson wants to know the passenger and ticket information on all passengers for
which he has made reservations. For each reservation, display the passenger name, flight
number, and flight date and seat assignment.
34. Display the maximum fare for flights originating at each airport if that fare is greater than
$100. For each qualifying airport, display the airport code, location, and maximum fare.
Display the results in descending order by fare.
35. Display the flight number of the flights that have no ticketed passengers scheduled on
April 18, 2006. Display the results in ascending order by flight number.
36. What is the passenger count for each flight that has more than one ticketed passenger
scheduled? For each qualifying flight, display the flight number and number of
passengers.
37. Display the city name of the flight’s origin and city name of the flight’s destination as well
as all data about the flight for all tickets held by Pete Peterson.
38. Display the names and phone numbers of persons who have reservations on flights
leaving Phoenix, Arizona on May 17, 2006 for each flight booked with fewer than three
passengers.
39. Display all flights where the origination time is later than at least one of the Minneapolis to
Phoenix flights.
40. Display all flights that leave later than all flights going from Phoenix to Los Angeles.
41. Display the flight information (origination, destination, and times) on all passengers who
are flying under a reservation made by Pete Peterson. The origin and destination should
include the entire name of the city.
42. Display the number of departures associated with each pilot in ascending order by pilot
name. Include all pilots, even those without any departures, in your results.
43. Display the names of those pilots who were not assigned to a departure during April 2006.
44. A passenger wants to fly from Phoenix to Los Angeles and back in a single day. He
needs at least five hours in Los Angeles to get to and from the airport and conduct his
business. List the flight numbers, origin times, and destination times of the flights that will
accommodate his schedule.
45. What flights from Flagstaff to Phoenix have connecting flights in Phoenix going on to Los
Angeles? Allow 40 minutes for a connection.
46. Display the total of the fares for all tickets related to Pete Peterson. Use the fare in the
FLIGHT table in your calculation.
47. Display the total number of tickets sold for each flight across all dates.
48. Display the total of the fares collected for each flight on each date. Assume all tickets
were sold at full fare and that the fares come from the FLIGHT table.
49. Display the maximum fare for flights between each origin and destination airport (e.g.,
Phoenix to Los Angeles, Phoenix to Flagstaff, Phoenix to San Francisco). As part of your
result, display the name of the city where the airport is located—not the code.
50. Display the total miles flown by each pilot. Display the name of each pilot along with the
miles flown. Display the results in descending order by miles flown. Include all pilots in
your result—even those who have not yet flown a flight.
51. Display the names of all pilots who have flown a Boeing 727. Please display the first
name followed by the middle initial and the last name of each pilot.
52. Display the names of the passengers with tickets on the July 23, 2006 departure of Flight
Number 104. Your results should be displayed in ascending order by seat number.
53. Display the flight number and date of all departures originating from Phoenix that serve
either a snack or nothing to passengers.
54. Display all flights with either a California origination or destination.
55. Display all flights that depart on one day and arrive the next day.
56. Display the total compensation to each pilot in April 2006. The total compensation for a
pilot is the product of the pilot’s flight pay and the number of flights flown. Include pilots
who did not fly any flights in April 2006.
57. Under the assumption that fuel costs $2.31 per gallon, display the total cost of fuel for all
departures flown by each aircraft. Include the equipment number and equipment type of
each aircraft.
58. Display the name and age of the youngest pilot.
59. Display the name and hiredate of all pilots who were hired during the 1990s. The qualifying
pilots should be displayed in descending order by length of service.
60. Display the name and age of all pilots who were less than 40 years of age when hired
and are now older than 47 years of age.
61. Display the names of those passengers whose last name begins with the letter “A.” Each
unique passenger name should be displayed only one time.
62. Display the confirmation number, reservation date, reservation name, phone number of
the person making the reservation, and flight number for those reservations made during
April 2006 for flights scheduled to depart sometime after April 2006. In addition, the area
code of the phone number should be separated from the first three digits by a hyphen,
and the first three digits of the phone number should be separated from the last four digits
by another hyphen.
63. In a Boeing 727, seats A and E are window seats. Display the name, flight number, and
departure date for all passengers who have reserved a window seat.
64. Display the names of passengers who have flown on a flight piloted by William B. Pasewark.
Include in your results the flight number as well as the date of the flight. Display the result in
order by passenger name.
65. Produce a list showing the total number of pilots hired in each year. The list should
contain the calculated current average flight pay for all pilots hired during that year.
The resulting list should be in ascending order by year.
66. Display the total number of pilots who live in each state in alphabetical order by state.
67. Display the total number of departures associated with each meal type.
68. Display the names of those pilots scheduled for a departure in an airplane manufactured
by Boeing Corporation during April 2006.
69. Display the equipment number and equipment type of airplanes flown by Stuart Long.
70. Display the first and last name of all pilots who have flown more than one type of
aircraft. Display the results in ascending order by last name and include in the result the
equipment types each qualifying pilot has flown.
71. Display just the first and last name (first name followed by last name with no middle initial)
of pilots who have not been assigned to a flight during May 2006. Note: This is a
challenging query.
72. Display the first name and last name of those passengers who have a ticket with at least
two departures.
73. For the oldest pilot, display his or her name and the aircraft flown.
74. Display the names of those passengers who at one time or another have made at least
one reservation.
75. Display the name and age of the pilot with the highest pay among those pilots younger
than 50 years of age.
76. Display information about all departures flown by the aircraft with the largest fuel capacity.
Include in your results the flight number, departure date, origin, and destination.
77. Display the first and last name of those passengers whose reservation was made at least one
month in advance of their departure. In addition, display the flight date, origin, and destination.
78. How many flights are flown within the same state? Display the flight number, origin city,
destination city, and type of aircraft used.
79. Display the total number of miles flown on ticketed departures by each pilot. If a pilot has
never flown a ticketed flight, display a zero.
80. Display the names of those pilots who live in a city that does not have an airport.
81. Display the types of aircraft used to fly to or from the cities with the highest or lowest
elevation.
82. Display the name and phone number of the passenger booked on the most reservations.
83. Display the name, age, and hiredate of the pilots who fly either to or from the city with the
highest elevation. The result should include the name of the city.
84. Display the name(s) of all persons making a reservation for Lena Olson.
85. Display information on tickets associated with a person whose last name is Peterson.
86. Display the number of tickets associated with those people who make reservations.
87. Display the number of tickets sold on each row. The results should be displayed in
ascending order by row number.
88. Display the number of tickets associated with each seat in ascending order by seat.
89. Display the names of passengers who have never made a reservation themselves.
90. Display the total fare associated with each reservation. Base your calculation on the fare
associated with each flight.
91. Display the total number of tickets sold during each month.
92. For each ticket, display the name of the passenger, the name of the person making the
reservation, the name of the pilot, the fare, the fuel capacity of the airplane assigned to
the departure, and the flight’s point of origination and point of destination.
93. For each flight, display the number of departures associated with flights within the same
time zone.
94. Display the location of those airports that have been neither the point of origination nor
the point of destination for any Belle Airlines flights.
95. Display flights without any departures in May 2006.
96. Calculate the number of scheduled departures of flights from each airport during May
2006 that include lunch or dinner. Display the number of flights for each qualifying airport.
97. Assume that airports with a hub airline offer flights for just that airline (e.g., the only
airline flying out of Phoenix is Belle Airlines, the only airline flying out of Minneapolis is
Northwest). Calculate the number of departures associated with each hub airline.
98. Display the flying time associated with each flight. Remember to take into account the
difference in time zones from the point of origination versus the point of destination.
99. Display the number of tickets associated with each flight that has a California destination.
100. Calculate the total fare paid by each passenger. Use the fare in the FLIGHT table.
101. Display the number of departures associated with each date in ascending order by date.
APPENDIX A
DATA MODELING
ARCHITECTURES BASED
ON THE INVERTED TREE
AND NETWORK DATA
STRUCTURES
Appendix A begins with illustrations of the inverted tree and network data structures—the
two basic data structures that underlie the data models used today for designing data-
bases. Discussion of these data structures is followed by a brief overview of how the
hierarchical and CODASYL data model architectures express the inverted tree and
network data structures, respectively.
1
When two nodes exhibit a 1:1 relationship, either can be designated as the parent or the child.
2
Observe that the line connecting EMPLOYEE to PLANT has a single arrowhead on the PLANT end,
indicating that an employee works for (is associated with) just one PLANT; since a PLANT can have
many EMPLOYEEs, the same line connecting PLANT to EMPLOYEE has two (i.e., multiple)
arrowheads on the EMPLOYEE end.
3
The semantics of this illustration allow for a project to be an in-house or an outsourced project or
contain no more than one in-house and one outsourced component.
Thus, both inverted tree and network data structures permit a node to participate as
a parent in multiple PCRs. That a node can participate as a child in only one PCR is an
additional constraint specific to the inverted tree structure. When this constraint is
relaxed and a node is permitted to also participate as a child in multiple PCRs, a network
structure eventuates. From this it should be clear that an inverted tree structure is a spe-
cial case of the network structure. In other words, implementation of the latter implicitly
implements the former.
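The distinction can be made concrete with a minimal sketch, assuming each PCR is written as a (parent, child) pair. The function names and the sample PCR lists are illustrative assumptions; the node names reuse those mentioned in this appendix. The check covers only the single-parent constraint discussed above, not other properties of a tree such as a single root.

```python
from collections import defaultdict


def parents_by_child(pcrs):
    """Collect, for every child node, the set of parents it has across all PCRs."""
    parents = defaultdict(set)
    for parent, child in pcrs:
        parents[child].add(parent)
    return parents


def satisfies_tree_constraint(pcrs):
    """True when no node participates as a child in more than one PCR."""
    return all(len(p) <= 1 for p in parents_by_child(pcrs).values())


# Hypothetical PCRs: every child has exactly one parent.
tree_pcrs = [
    ("PLANT", "EMPLOYEE"),
    ("PLANT", "PROJECT"),
    ("EMPLOYEE", "DEPENDENT"),
    ("EMPLOYEE", "BCU_ACCOUNT"),
]

# Relaxing the constraint: BCU_ACCOUNT now participates as a child in two PCRs.
network_pcrs = tree_pcrs + [("DEPENDENT", "BCU_ACCOUNT")]

tree_ok = satisfies_tree_constraint(tree_pcrs)        # single-parent rule holds
network_ok = satisfies_tree_constraint(network_pcrs)  # rule is violated
```

Every list of PCRs that passes the check is also a legal network structure, which is the sense in which the inverted tree is a special case of the network.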
4
Observe that the network data structure is similar to the EER construct Specialization Lattice (see
Section 4.1.3 in Chapter 4).
A.2 LOGICAL DATA MODEL ARCHITECTURES
There are three major architectures for logical data models. The relational data model,
discussed in Chapter 6, dominates contemporary DBMS architectures. However,
the hierarchical and CODASYL data models precede the relational data model and have
historical value in that many legacy systems still run on DBMS platforms that implement
hierarchical or CODASYL architectures. Two more architectures are currently emerging
and are often referred to as post-relational data models: the object-oriented data model and
the object-relational data model. The former is considered an aggressive competitor to the
relational data model, while the latter, which combines relational and object-oriented
constructs, is seen at least by some [e.g., Date (2004)] as the future of practical
data modeling for database design. These models are the subject of Appendix B.
5
A bill of materials is a sequence of 1:n relationships between the subassemblies necessary for
producing an item (Shepherd, 1990, p. 67).
6
For example, if BCU_ACCOUNT can be a child of both EMPLOYEE and DEPENDENT, Figure A.1
would have to be revised to show a BCU_ACCOUNT node under DEPENDENT as well as a second
BCU_ACCOUNT node under EMPLOYEE, thus introducing possible redundant bank account data.
However, in practice this data redundancy is handled through the use of logical pointers. Figures A.3
and A.4 illustrate a situation of this nature.
7
This is an issue of navigational efficiency through the hierarchy. Several strategies are available for
ordering the segments. However, this topic is beyond the scope of this book.
Figure A.3 shows two occurrences (one for PLANT11 and a second for PLANT12) of
the inverted tree structure that appears in Figure A.1. Figure A.4 shows two occurrences
of the network structure that appears in Figure A.2 implemented in a hierarchical archi-
tecture. Since in the case shown in Figure A.2 it is possible for a bank account to be
associated with both an employee and a dependent, the segment occurrences for bank
accounts 11a and 11b must appear redundantly as children of both EMPLOYEE11 and
DEPENDENT11A (one of the three dependents of EMPLOYEE11). Likewise, the segment
occurrence for assignment 11-11 also appears redundantly under EMPLOYEE11 and
PROJECT11. While this exemplifies how a network structure is accommodated in a hier-
archical architecture, from a practical standpoint, the resulting redundancy often induces
processing inefficiencies that are bound to reduce the database system performance.
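The source of this redundancy can be illustrated with a small sketch. It assumes the occurrences are held as a mapping from each parent occurrence to its child occurrences; the labels ACCOUNT11a and ACCOUNT11b and both function names are hypothetical. Expanding the mapping into a strict hierarchy copies every shared child once per parent.

```python
def expand(node, children):
    """Copy everything reachable from node into a nested (node, subtrees) hierarchy."""
    return (node, [expand(child, children) for child in children.get(node, [])])


def count_occurrences(tree, target):
    """Count how many times a segment occurrence appears in the expanded hierarchy."""
    node, subtrees = tree
    return (node == target) + sum(count_occurrences(s, target) for s in subtrees)


# Network-style occurrences: both accounts belong to the employee and to one dependent.
children = {
    "EMPLOYEE11": ["DEPENDENT11A", "ACCOUNT11a", "ACCOUNT11b"],
    "DEPENDENT11A": ["ACCOUNT11a", "ACCOUNT11b"],
}

hierarchy = expand("EMPLOYEE11", children)
copies_of_11a = count_occurrences(hierarchy, "ACCOUNT11a")
copies_of_11b = count_occurrences(hierarchy, "ACCOUNT11b")
```

In this sketch each account occurrence is stored twice, once under the employee and once under the dependent, so every update to an account has to be applied in two places.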
FIGURE A.3 Two occurrences of the hierarchical structure shown in Figure A.1
FIGURE A.4 Two occurrences of the network structure shown in Figure A.2
The Data Definition Language for IMS is Data Language/1 (DL/1). DL/1 makes it pos-
sible to map a conceptual schema to a logical schema as well as to a physical schema. In
DL/1, a field is the smallest unit of data, a segment is a group of related fields, a database
record is a hierarchically structured group of segments, and a database is a collection of
database record occurrences of one or more database record types. DL/1 makes use of a
database description (DBD) to define the way data is stored for use by IMS. In addition,
the DBD also defines the format, length, and location of each data item to be accessed by
the DL/1 data manipulation language. The following lines illustrate a DBD that reflects the
hierarchical structure in Figure A.1. It should be noted that many of the details associated
with a complete DBD (e.g., the length and location of each field) are not shown.
The DBD begins with a DBD macro (see footnote 8), which among other things assigns a name to the
hierarchical structure. Each segment description is headed by a SEGM macro that names
the segment, indicates its total length in bytes (not shown here), and gives the name of its
parent. The first segment, or root, has no parent. Each field within a segment is repre-
sented by a FIELD macro. Applications access an IMS database through DL/1 data
8
A macro is a short program consisting of several operations saved in a file under a certain name,
which can be invoked from within another program.
9
A record type is equivalent to a segment type in a hierarchical architecture. Likewise, a set type is
the same as a PCR: the owner represents the parent and the member represents the child.
Like the hierarchical data model, implementations of the CODASYL data model
include a data definition language. The following lines illustrate how the eight record types
and nine set types for the structure in Figure A.5 can be declared:
SCHEMA NAME IS BEARCAT
RECORD NAME IS PLANT
RECORD NAME IS EMPLOYEE
RECORD NAME IS PROJECT
RECORD NAME IS DEPENDENT
RECORD NAME IS BCU_ACCOUNT
RECORD NAME IS ASSIGNMENT
RECORD NAME IS IN_HOUSE
RECORD NAME IS OUT_SOURCED
SET NAME IS WORKS_IN
OWNER IS PLANT
MEMBER IS EMPLOYEE
SET NAME IS UNDERTAKEN_BY
OWNER IS PLANT
MEMBER IS PROJECT
SET NAME IS DEPENDENT_OF
OWNER IS EMPLOYEE
MEMBER IS DEPENDENT
SET NAME IS HELD_BY_E
OWNER IS EMPLOYEE
MEMBER IS BCU_ACCOUNT
As was the case for the DBD for the hierarchical data model that appears in
Section A.2.1, many of the details associated with the definition of the record types and
sets in this schema are not provided. For example, the definition of a record type includes
several clauses that specify the scheme used to locate the record and to define the indi-
vidual data items that comprise the record type. Besides identifying the owner and mem-
ber of each set type, the definition of each set type includes rules for the insertion of new
member records and for moving existing records from one set occurrence to another.
Interested readers may wish to refer to Shepherd (1990) for an excellent introduction to
the CODASYL as well as the hierarchical data model.
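As an informal illustration of how record types and set types relate, the sketch below reads the simplified declarations shown earlier into an in-memory structure. It is not a CODASYL DDL processor: the parser and its names are assumptions made for this example, it handles only the five clause forms that appear in the excerpt, and it includes only the set types listed there.

```python
DDL = """\
SCHEMA NAME IS BEARCAT
RECORD NAME IS PLANT
RECORD NAME IS EMPLOYEE
RECORD NAME IS PROJECT
RECORD NAME IS DEPENDENT
RECORD NAME IS BCU_ACCOUNT
RECORD NAME IS ASSIGNMENT
RECORD NAME IS IN_HOUSE
RECORD NAME IS OUT_SOURCED
SET NAME IS WORKS_IN
OWNER IS PLANT
MEMBER IS EMPLOYEE
SET NAME IS UNDERTAKEN_BY
OWNER IS PLANT
MEMBER IS PROJECT
SET NAME IS DEPENDENT_OF
OWNER IS EMPLOYEE
MEMBER IS DEPENDENT
SET NAME IS HELD_BY_E
OWNER IS EMPLOYEE
MEMBER IS BCU_ACCOUNT
"""


def parse_schema(ddl):
    """Read the simplified declarations into a dictionary of record and set types."""
    schema = {"name": None, "records": [], "sets": {}}
    current_set = None
    for line in ddl.splitlines():
        words = line.split()
        if not words:
            continue
        value = words[-1]  # in every clause form used here, the name comes last
        if words[:3] == ["SCHEMA", "NAME", "IS"]:
            schema["name"] = value
        elif words[:3] == ["RECORD", "NAME", "IS"]:
            schema["records"].append(value)
        elif words[:3] == ["SET", "NAME", "IS"]:
            current_set = value
            schema["sets"][current_set] = {}
        elif words[:2] == ["OWNER", "IS"]:
            schema["sets"][current_set]["owner"] = value
        elif words[:2] == ["MEMBER", "IS"]:
            schema["sets"][current_set]["member"] = value
    return schema


schema = parse_schema(DDL)

# Record types that are members of a set owned by EMPLOYEE (its children as PCRs).
employee_members = [
    s["member"] for s in schema["sets"].values() if s["owner"] == "EMPLOYEE"
]
```

Each parsed set type is one owner-member pair, that is, one PCR. A record type may appear as the member of more than one set type, which is what distinguishes the network structure from the inverted tree.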
Summary
The hierarchical data model is the oldest of the data models and organizes data in the form of
an inverted tree consisting of a hierarchy of parent and child segments, where a child is allowed
to have only one parent. Coinciding with the development of the hierarchical data model was the
CODASYL data model, which allowed more than one parent per child. Both of these models
were used primarily during the mainframe era as vehicles for describing the structure of data as
well as data manipulation operations but are no longer used as the basis for new database systems.
While a number of legacy systems structured in accordance with these models remain in
use today, many predict that they will be phased out over time as the number of qualified staff
declines due to retirement and retraining.
Selected Bibliography
Date, C. J. (2004) An Introduction to Database Systems, Eighth Edition, Addison-Wesley.
Kroenke, D. M. (1977) Database Processing, Science Research Associates, Inc.
Shepherd, J. C. (1990) Database Management: Theory and Application, Richard D. Irwin, Inc.
APPENDIX B
OBJECT-ORIENTED DATA
MODELING ARCHITECTURES
Object-oriented concepts have drawn considerable attention among researchers and prac-
titioners since the late 1980s and have significantly influenced efforts to incorporate in
the DBMS the ability to process complex data types beyond just storage and retrieval.
Appendix B briefly introduces the reader to object-oriented concepts exclusively from a
database or, to be more precise, from a data modeling perspective.
1
CAD/CAM stands for Computer-aided Design/Computer-aided Manufacturing, CIM is the acronym
for Computer-integrated Manufacturing, and CASE is the acronym for Computer-aided Software
Engineering.
2
SQL:1999 supports the Large Object (LOB) data type with two possible variants—Binary Large
Object (BLOB) and Character Large Object (CLOB). A LOB has a unique id called a locator which
allows LOBs to be manipulated without extensive copying. LOBs are typically stored separately from
the tuples in whose attributes they appear.
3
Persistence in the OO paradigm refers to continued existence of data even after the program that
created it has terminated.
4
ORION was developed at Microelectronic and Computer Technology Corporation (MCC), Austin,
TX; IRIS was developed by Hewlett-Packard; and ODE was developed at AT&T Bell Labs, now a part
of Lucent Technologies.
5
In a pure sense, all instance variables are hidden from the user. While logically unnecessary, in
practice objects typically expose the physical representation of some instance variables, usually via
some special syntax; these are called public instance variables. The truly hidden
instance variables are accordingly labeled private instance variables.
support multiple inheritance—i.e., the ability of a subclass to inherit variables and meth-
ods from multiple superclasses. Modeled as a directed acyclic graph (DAG), this concept is
similar to that of a specialization lattice in the EER modeling grammar (see Section 4.1.3
in Chapter 4).
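A brief sketch in Python, a language that permits multiple inheritance, may help make the idea concrete. The class names and methods are hypothetical and are not drawn from any particular OODBMS. TeachingAssistant inherits methods from two superclasses, and the four classes together form a DAG rather than a tree.

```python
class Person:
    def describe(self):
        return "person"


class Employee(Person):
    def pay_grade(self):
        return "staff"


class Student(Person):
    def is_enrolled(self):
        return True


class TeachingAssistant(Employee, Student):
    """Subclass with two superclasses; it inherits the methods of both."""


ta = TeachingAssistant()
inherited = (ta.describe(), ta.pay_grade(), ta.is_enrolled())

# Python resolves method lookups by linearizing the inheritance DAG.
lookup_order = [cls.__name__ for cls in TeachingAssistant.__mro__]
```

Here Person is reachable from TeachingAssistant along two paths, through Employee and through Student, which is why the structure is a lattice and not a hierarchy.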
Another related concept that is rather useful is known as object containment. The
idea essentially conceptualizes objects as containing other objects (i.e., having them as parts) in
addition to public and private instance variables and methods. Sometimes these objects
are referred to as complex objects or composite objects. Portrayal of multiple levels of
containment leads to a containment hierarchy. Containment allows different users to view
objects at different granularities. The concept essentially replicates the aggregation con-
structs in the EER modeling grammar in the OO context. Figure B.1 shows a containment
hierarchy for a complex object called Information Systems. In this containment hierar-
chy, a business analyst can focus attention on the Procedure objects without any concern
about Information Technology, Data, and Personnel objects. Likewise, a computer engi-
neer can choose to limit the scope of his/her analysis to the hardware objects. The
Information Systems manager, on the other hand, may use the containment hierarchy
to monitor the whole Information Systems. In applications where an object is a part of
several objects, the containment relationship can be portrayed as a DAG instead of a
hierarchy.
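Containment can be sketched as a class whose instances hold references to their parts. The class name, its methods, and the exact nesting of the objects below are assumptions made for illustration (in particular, placing Hardware under Information Technology); the sketch is not a transcription of Figure B.1.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexObject:
    """An object that may contain other objects as parts."""
    name: str
    parts: list = field(default_factory=list)

    def find(self, name):
        """Return the contained object with the given name, or None if absent."""
        if self.name == name:
            return self
        for part in self.parts:
            hit = part.find(name)
            if hit is not None:
                return hit
        return None

    def names(self):
        """List this object and everything it contains, depth first."""
        result = [self.name]
        for part in self.parts:
            result.extend(part.names())
        return result


information_systems = ComplexObject("Information Systems", [
    ComplexObject("Procedure"),
    ComplexObject("Information Technology", [ComplexObject("Hardware")]),
    ComplexObject("Data"),
    ComplexObject("Personnel"),
])

# Different users view the same complex object at different granularities.
manager_view = information_systems.names()
engineer_view = information_systems.find("Hardware").names()
analyst_view = information_systems.find("Procedure").names()
```

Because each view starts from a different object in the hierarchy, the engineer and the analyst never touch the parts that do not concern them, while the manager sees the whole structure.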
an OID is therefore meant for internal use by the database system and so is not visible to
the user. The fundamental property of an OID is that it is immutable. Therefore, it is preferred
that an OID value be retired when the associated object is removed from the database
instead of being reassigned to another object. For these reasons, the value of OIDs
should not be a function of any variable in the database schema. Likewise, basing the
value of an OID on a physical storage address is also discouraged. A commonly prac-
ticed strategy in object databases is the use of system-generated long integers as OID
values and using an index (or hash table) to map the OID values to a physical storage
address. Immutable objects like numbers and character strings usually do not have
OIDs since they are typically stored within an object and cannot be referenced from
outside the object. Note that OIDs do not eliminate the need for user-defined keys
(e.g., candidate keys or primary key) because OIDs are not only prohibited for use in
external interactions, but also are often not user-friendly means of external interac-
tion. With respect to inter-object reference, the use of OID is somewhat similar to the
use of a foreign key in a relational data model, except that an OID can point to an
object anywhere in the OODBMS while a foreign key in an RDBMS is constrained to
reference an attribute in a specific referenced relation. Lack of such a restriction in
OODBMS imposes the responsibility for proper references on the application program.
Use of OIDs for inter-object reference as in containment hierarchies is essentially
equivalent to the low-level pointer mechanism originally defined in the CODASYL data
model. Date (1998) asserts that OIDs have no place in the data model as far as the user
is concerned.
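The OID strategy described above (system-generated integers, an index from OID to storage address, and retirement of OIDs on deletion) can be sketched as follows. The class and its methods are hypothetical and greatly simplified; storage addresses are modeled as slot numbers purely for illustration.

```python
import itertools


class ObjectStore:
    """Toy object store whose OIDs are system-generated and never reused."""

    def __init__(self):
        self._oid_counter = itertools.count(1)  # monotonically increasing OID source
        self._index = {}       # OID -> storage address (here, a slot number)
        self._slots = {}       # storage address -> stored object state
        self._free_slots = []  # addresses released by deleted objects
        self._next_slot = 0

    def insert(self, state):
        """Store an object and return its newly generated OID."""
        oid = next(self._oid_counter)
        if self._free_slots:
            slot = self._free_slots.pop()  # storage may be reused
        else:
            slot = self._next_slot
            self._next_slot += 1
        self._slots[slot] = state
        self._index[oid] = slot
        return oid

    def address_of(self, oid):
        return self._index[oid]

    def fetch(self, oid):
        return self._slots[self._index[oid]]

    def delete(self, oid):
        """Remove the object; its OID is retired, its storage is freed."""
        slot = self._index.pop(oid)
        del self._slots[slot]
        self._free_slots.append(slot)


store = ObjectStore()
first = store.insert({"name": "object one"})
second = store.insert({"name": "object two"})
first_address = store.address_of(first)
store.delete(first)
third = store.insert({"name": "object three"})
third_address = store.address_of(third)
```

In this sketch the third object reuses the storage slot released by the first but receives a fresh OID, so a stale reference to the first object can never silently resolve to a different object. Keeping the OID independent of the storage address is what makes this possible.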
6
Rational Rose is one of the popular CASE tools for drawing UML diagrams.
Summary
The need to handle complex data types and to represent an object as consisting of both the
data structure and set of operations that can be used to manipulate it (i.e., the concept of
encapsulation) led to the creation of the object-oriented data model, which is intricately woven
with the OO programming languages. In fact, an OODBMS in principle is simply equivalent to adding
persistence to an OO programming language. The object-relational data model incorporating
selected OO constructs as an extension to the relational data model has been proposed as an
alternative to combat this problem. This has triggered a debate between proponents of the
object-oriented data model and the relational data model. Both sides agree that the relational
model is capable of supporting standard business applications, but lacks the capability to sup-
port special applications using complex data types. They, however, disagree as to whether
extensions to the relational model can overcome this limitation. Proponents of the relational data
model claim that the relational data model is a necessary part of any database management
system and believe that extensions to the relational model (i.e., the object-relational data model)
can address its limitations by effectively incorporating OO constructs as an extension to the the-
oretically sound relational data model.
Selected Bibliography
Date, C. J. (2004) An Introduction to Database Systems, Eighth Edition, Addison-Wesley.
Elmasri, R. and Navathe, S. B. (2003) Fundamentals of Database Systems, Fourth Edition,
Addison-Wesley.
Ramakrishnan, R. and Gehrke, J. (2000) Database Management Systems, McGraw Hill.
Stonebraker, M. et al. (1990) “Third-Generation Database System Manifesto,” ACM SIGMOD
Record, 19, 3.
SELECTED BIBLIOGRAPHY
Abrial, J. “Data Semantics.” Data Base Management. Eds. J. W. Klimbie and K. L. Koffeman. Amsterdam: North-Holland, 1974.
Armstrong, W. W. “Dependence Structures of Data Base Relationships.” Proceedings of IFIP Congress, Stockholm, Sweden, 1974.
Batini, C., M. Lenzerini, and S. B. Navathe. “A Comparative Analysis of Methodologies for Database Schema Integration.” ACM Computing Surveys, 8, 4 (December) 323–364, 1986.
Batini, C., S. Ceri, and S. B. Navathe. Conceptual Database Design: An Entity-Relationship Approach. Boston: Addison-Wesley, 1991.
Booch, G., J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide, Second Edition. Boston: Addison-Wesley, 2005.
Catriel, B., R. Fagin, and J. H. Howard. “A Complete Axiomatization for Functional and Multi-valued Dependencies.” Proceedings of the ACM/SIGMOD International Conference on Management of Data, Toronto, Canada (August) 1977.
Chen, P. “The Entity Relationship Model: Toward a Unified View of Data.” ACM Transactions on Database Systems, 1 (March) 9–36, 1976.
Chen, P. “The Entity-Relationship Approach to Logical Database Design.” Information Technology in Action: Trends and Perspectives. Ed. R. Y. Wang. Upper Saddle River, NJ: Prentice Hall, 1993.
Chiang, R. H. L., T. M. Barron, and V. C. Storey. “Reverse Engineering of Relational Databases: Extraction of an ER Model from a Relational Database.” Data & Knowledge Engineering, 12, 107–142, 1994.
Chiang, R. H. L., T. M. Barron, and V. C. Storey. “A Framework for the Design and Evaluation of Reverse Engineering Methods for Relational Databases.” Data & Knowledge Engineering, 21, 55–77, 1997.
Codd, E. F. “A Relational Model for Large Shared Data Banks.” Communications of the ACM, 13, 6 (June) 377–387, 1970.
Connolly, T. M., and C. Begg. Database Systems: A Practical Approach to Design, Implementation, and Management, 4th Edition. Boston: Addison-Wesley, 2005.
Courtney, J. F., and D. B. Paradice. Database Systems for Management. St. Louis: Times Mirror/Mosby College Publishing, 1988.
Darwen, H. “The Role of Functional Dependencies in Query Decomposition.” In C. J. Date and H. Darwen, Relational Database Writings 1989–1991, 133–154. Boston: Addison-Wesley, 1992.
Date, C. J. An Introduction to Database Systems, 8th Edition. Boston: Addison-Wesley, 2004.
Date, C. J., and H. Darwen. A Guide to the SQL Standard, Fourth Edition. Boston: Addison-Wesley, 1997.
Date, C. J., and H. Darwen. Foundation for Object/Relational Databases. Boston: Addison-Wesley, 1998.
Dey, D., V. C. Storey, and T. M. Barron. “Improving Database Design through the Analysis of Relationships.” ACM Transactions on Database Systems, 24, 4 (December) 453–486, 1999.
Elmasri, R., and S. B. Navathe. Fundamentals of Database Systems, Sixth Edition. Boston: Addison-Wesley, 2010.
Everest, G. C. Database Management Objectives, System Functions and Administration. New York: McGraw-Hill, 1986.
Fagin, R. “Normal Forms and Relational Database Operators.” Proceedings of the ACM/SIGMOD International Conference on Management of Data, (May/June) 1979.
INDEX
J
join dependencies (JDs) and 5NF, 480–497
join operations
  examples of, 602–613
  natural join operation, 295
  and operators, 551–557
join operators, 551–557
K
key attributes, 37, 386
key constraints, 497
L
lattice, specialization, 156
leaf nodes, 657
Left Outer Join operations, 555–556, 608–610
LEVEL pseudo-column (SQL), 660
life cycle
  database design, 19–24
  data modeling/database design (fig.), 18
LIKE operator, 593, 596
logical
  data independence, 8
  data modeling, 21–22, 722–728
  data structures, 719–721
  data types, 33
logical data modeling
  introduction, 277–279
  relational data model. See relational data model
logical data structures, 719–721
  inverted tree structure, 719–720
  network data structure, 721
logical schema, mapping ER model to, 298–315
M
Madeira College (case study), modeling complex relationships, 198–203, 216–221
maintenance of file-processing systems, 5
mandatory attributes, 34, 95
mapping
  aggregations, 326–327
  complex ER model, 336–344
  enhanced ER model constructs to logical schema, 321–328
  entity types, 298–299
  ER model to logical schema, 298–315
  information-preserving, relational data model, 315–320
  specialization, specialization hierarchy, lattice, 321–326
materialized views in relational data model, 296–297
MAX function (SQL), 622–623
metadata described, 2
methods in conceptual data model, 28
min, max notation, for structural constraints, 104–111
MIN function (SQL), 622–623
minimal cover, 368–369
MINUS keyword (SQL), 601
m:n relationship types, resolution of, 115–116
MODEL clause (SQL), 706
  basic syntax of, 693–694
  concepts, 693
  example of, 694–699
modeling
  complex relationships, 197
  conceptual data. See conceptual data modeling
  data. See data modeling
  ER (entity relationship). See ER modeling
relationships. See also specific relationship
  complex. See complex relationships
  in conceptual data modeling, 31
  (min, max) notation for the structural constraints of, 104–111
  parent-child relationship (PCR), 44
relationship sets described, 39
relationship types
  attributes of, 96
  described, 38–42, 70
  mapping, 300–309
  resolution of m:n, 115–116
  structural constraints of, 43–51
REPTS_TO column (SQL), 657
requirements specification in database design, 18–21, 35
resolution of m:n relationship types, 115–116
restrict rule, 59
reverse engineering role in data modeling, 442–456, 458
Right Outer Join operations, 555–556, 610–611
roles, names, and entity types, 42
ROLLUP operator, 668–672
root nodes, 657
RTRIM function (SQL), 640–643
rules, business. See business rules
S
schema-based constraints, 284
schemas
  and CODASYL data model, 726–728
  in relational data model, 282
scripts
  conceptual modeling, 28
  SQL, 506, 506n
SC/se relationships, 144
second normal form (2NF), 398–401
Selection operation, 292–293
selective type inheritance, 159
Select operator, 542–544
SELECT statement (SQL), 569–631
semantic
  data modeling errors, 119–133
  errors, and connection traps, 253–257
  integrity constraints, 36, 82, 103, 284
semantic modeling, 27
semicolons (;) in SQL statements, 569
Semi-Join operations, 560–561, 612–613
Semi-Minus operations, 561, 612–613
sequential access, 3
servers, universal, 737
set default rule, 60
set null rule, 60
set theoretic operators, 549–551
shared subclass, 156
simple attributes, 33
single-valued attributes, 33–34
SPARC (Standards Planning and Requirements Committee), 6
specialization, 149
  and categorization, 157–160
  and generalization in superclass/subclass relationships, 146–154, 188
  hierarchy, and lattice, 154–157, 188
specialization lattice, 156
SQL (Structured Query Language). See also specific statements
  built-in functions, 2003 standard, 635–651
  database data population using, 524–531
  database implementation using, 504–505
  and database systems, 10
  data definition language (DDL) and, 12
  as data manipulation language (DML), 13
  dates and times, handling, 651–656
  pattern matching in, 593–597
  queries based in single table, 569–597, 631
  queries based on binary operators, 597–613, 631
  query examples, 700–705
  subqueries, 613–630, 631
  using generally, 568–569
SQL/DCL (SQL data control language), 504
SQL/DDL (SQL data definition language)
  base database table specification in, 507–524
  described, 504–505
SQL/DML (SQL data manipulation language), 504
SQL scripts, 506, 506n