Data Modeling and Database Design, Second Edition

To Beloved Bhagwan Sri Sathya Sai Baba, the very source
of my thoughts, words, and deeds
To my Graduate Teaching Assistants and students,
the very source of my inspiration
To my dear children, Sharda and Kausik, always concerned
about their dad overworking
To my dear wife Lalitha, a pillar of courage I always lean on
Uma

There is a verse that says:

Focus on what I’m doing right now
And tell me that you appreciate me
So that I learn to feel worthy
And motivated to do more
Led by my family, I have always been surrounded by people
(friends, teachers, and students) who
With their kind thoughts, words, and deeds treat me in this way.
This book is dedicated to these people.
Richard

Copyright 2015 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
BRIEF CONTENTS

Preface xvii
Chapter 1
Database Systems: Architecture and Components 1

Part I: Conceptual Data Modeling

Chapter 2
Foundation Concepts 30

Chapter 3
Entity-Relationship Modeling 79

Chapter 4
Enhanced Entity-Relationship (EER) Modeling 141

Chapter 5
Modeling Complex Relationships 197

Part II: Logical Data Modeling

Chapter 6
The Relational Data Model 280

Part III: Normalization

Chapter 7
Functional Dependencies 358

Chapter 8
Normal Forms Based on Functional Dependencies 395

Chapter 9
Higher Normal Forms 467

Part IV: Database Implementation Using the Relational Data Model

Chapter 10
Database Creation 506

Chapter 11
Relational Algebra 539

Chapter 12
Structured Query Language (SQL) 567

Chapter 13
Advanced Data Manipulation Using SQL 635

Appendix A
Data Modeling Architectures Based on the Inverted Tree
and Network Data Structures 719

Appendix B
Object-Oriented Data Modeling Architectures 731

Selected Bibliography 739

Index 743

TABLE OF CONTENTS

Preface xvii

Chapter 1 Database Systems: Architecture and Components 1


1.1 Data, Information, and Metadata 1
1.2 Data Management 3
1.3 Limitations of File-Processing Systems 3
1.4 The ANSI/SPARC Three-Schema Architecture 6
1.4.1 Data Independence Defined 8
1.5 Characteristics of Database Systems 10
1.5.1 What Is a Database System? 11
1.5.2 What Is a Database Management System? 12
1.5.3 Advantages of Database Systems 15
1.6 Data Models 17
1.6.1 Data Models and Database Design 17
1.6.2 Data Modeling and Database Design in a Nutshell 19
Chapter Summary 25
Exercises 25

Part I: Conceptual Data Modeling

Chapter 2 Foundation Concepts 30


2.1 A Conceptual Modeling Framework 30
2.2 ER Modeling Primitives 30
2.3 Foundations of the ER Modeling Grammar 32
2.3.1 Entity Types and Attributes 32
2.3.2 Entity and Attribute-Level Data Integrity Constraints 35
2.3.3 Relationship Types 38
2.3.4 Structural Constraints of a Relationship Type 43
2.3.5 Base Entity Types and Weak Entity Types 52
2.3.6 Cluster Entity Type: A Brief Introduction 57
2.3.7 Specification of Deletion Constraints 58
Chapter Summary 70
Exercises 71

Chapter 3 Entity-Relationship Modeling 79


3.1 Bearcat Incorporated: A Case Study 79
3.2 Applying the ER Modeling Grammar to the Conceptual Modeling Process 81
3.2.1 The Presentation Layer ER Model 82
3.2.2 The Presentation Layer ER Model for Bearcat Incorporated 85
3.2.3 The Design-Specific ER Model 104
3.2.4 The Decomposed Design-Specific ER Model 111
3.3 Data Modeling Errors 119
3.3.1 Vignette 1 120
3.3.2 Vignette 2 127
Chapter Summary 134
Exercises 134

Chapter 4 Enhanced Entity-Relationship (EER) Modeling 141


4.1 Superclass/subclass Relationship 142
4.1.1 A Motivating Exemplar 142
4.1.2 Introduction to the Intra-Entity Class Relationship Type 143
4.1.3 General Properties of a Superclass/subclass Relationship 145
4.1.4 Specialization and Generalization 146
4.1.5 Specialization Hierarchy and Specialization Lattice 154
4.1.6 Categorization 157
4.1.7 Choosing the Appropriate EER Construct 160
4.1.8 Aggregation 166
4.2 Converting from the Presentation Layer to a Design-Specific EER Diagram 168
4.3 Bearcat Incorporated Data Requirements Revisited 170
4.4 ER Model for the Revised Story 171
4.5 Deletion Rules for Intra-Entity Class Relationships 182
Chapter Summary 188
Exercises 188

Chapter 5 Modeling Complex Relationships 197


5.1 The Ternary Relationship Type 198
5.1.1 Vignette 1—Madeira College 198
5.1.2 Vignette 2—Get Well Pharmacists, Inc. 203
5.2 Beyond the Ternary Relationship Type 205
5.2.1 The Case for a Cluster Entity Type 205
5.2.2 Vignette 3—More on Madeira College 206
5.2.3 Vignette 4—A More Complex Entity Clustering 212
5.2.4 Cluster Entity Type—Additional Examples 212
5.2.5 Madeira College—The Rest of the Story 216
5.2.6 Clustering a Recursive Relationship Type 221
5.3 Inter-Relationship Integrity Constraint 224
5.4 Composites of Weak Relationship Types 230
5.4.1 Inclusion Dependency in Composite Relationship Types 230
5.4.2 Exclusion Dependency in Composites of Weak Relationship Types 231
5.5 Decomposition of Complex Relationship Constructs 234
5.5.1 Decomposing Ternary and Higher-Order Relationship Types 234
5.5.2 Decomposing a Relationship Type with a Multi-Valued Attribute 235
5.5.3 Decomposing a Cluster Entity Type 240
5.5.4 Decomposing Recursive Relationship Types 241
5.5.5 Decomposing a Weak Relationship Type 244
5.6 Validation of the Conceptual Design 246
5.6.1 Fan Trap 246
5.6.2 Chasm Trap 251
5.6.3 Miscellaneous Semantic Traps 253
5.7 Cougar Medical Associates 257
5.7.1 Conceptual Model for CMA: The Genesis 259
5.7.2 Conceptual Model for CMA: The Next Generation 265
5.7.3 The Design-Specific ER Model for CMA: The Final Frontier 266
Chapter Summary 273
Exercises 273

Part II: Logical Data Modeling

Chapter 6 The Relational Data Model 280


6.1 Definition 280
6.2 Characteristics of a Relation 282
6.3 Data Integrity Constraints 283
6.3.1 The Concept of Unique Identifiers 284
6.3.2 Referential Integrity Constraint in the Relational Data Model 290
6.4 A Brief Introduction to Relational Algebra 291
6.4.1 Unary Operations: Selection (σ) and Projection (π) 292
6.4.2 Binary Operations: Union (∪), Difference (−), and Intersection (∩) 293
6.4.3 The Natural Join (*) Operation 295
6.5 Views and Materialized Views in the Relational Data Model 296
6.6 The Issue of Information Preservation 297
6.7 Mapping an ER Model to a Logical Schema 298
6.7.1 Information-Reducing Mapping of ER Constructs 298
6.7.2 An Information-Preserving Mapping 315
6.8 Mapping Enhanced ER Model Constructs to a Logical Schema 320
6.8.1 Information-Reducing Mapping of EER Constructs 321
6.8.2 Information-Preserving Grammar for Enhanced ER Modeling Constructs 328
6.9 Mapping Complex ER Model Constructs to a Logical Schema 336
Chapter Summary 345
Exercises 347

Part III: Normalization

Chapter 7 Functional Dependencies 358


7.1 A Motivating Exemplar 359
7.2 Functional Dependencies 365
7.2.1 Definition of Functional Dependency 365
7.2.2 Inference Rules for Functional Dependencies 366
7.2.3 Minimal Cover for a Set of Functional Dependencies 367
7.2.4 Closure of a Set of Attributes 372
7.2.5 When Do FDs Arise? 374
7.3 Candidate Keys Revisited 374
7.3.1 Deriving Candidate Key(s) by Synthesis 375
7.3.2 Deriving Candidate Keys by Decomposition 379
7.3.3 Deriving a Candidate Key—Another Example 382
7.3.4 Prime and Non-prime Attributes 386
Chapter Summary 390
Exercises 390

Chapter 8 Normal Forms Based on Functional Dependencies 395


8.1 Normalization 395
8.1.1 First Normal Form (1NF) 396
8.1.2 Second Normal Form (2NF) 398
8.1.3 Third Normal Form (3NF) 401
8.1.4 Boyce-Codd Normal Form (BCNF) 404
8.1.5 Side Effects of Normalization 407
8.1.6 Summary Notes on Normal Forms 418
8.2 The Motivating Exemplar Revisited 420
8.3 A Comprehensive Approach to Normalization 424
8.3.1 Case 1 424
8.3.2 Case 2 431
8.3.3 A Fast-Track Algorithm for a Non-Loss, Dependency-Preserving
Solution 436
8.4 Denormalization 442
8.5 Role of Reverse Engineering in Data Modeling 443
8.5.1 Reverse Engineering the Normalized Solution of Case 1 445
8.5.2 Reverse Engineering the Normalized Solution of URS2 (Case 3) 451
8.5.3 Reverse Engineering the Normalized Solution of URS3 (Case 2) 453
Chapter Summary 457
Exercises 458

Chapter 9 Higher Normal Forms 467


9.1 Multi-Valued Dependency 467
9.1.1 A Motivating Exemplar for Multi-Valued Dependency 467
9.1.2 Multi-Valued Dependency Defined 469
9.1.3 Inference Rules for Multi-Valued Dependencies 470
9.2 Fourth Normal Form (4NF) 472
9.3 Resolution of a 4NF Violation—A Comprehensive Example 476
9.4 Generality of Multi-Valued Dependencies and 4NF 478
9.5 Join-Dependencies and Fifth Normal Form (5NF) 480
9.6 A Thought-Provoking Exemplar 490
9.7 A Note on Domain Key Normal Form (DK/NF) 497
Chapter Summary 498
Exercises 498

Part IV: Database Implementation Using the Relational Data Model

Chapter 10 Database Creation 506


10.1 Data Definition Using SQL 507
10.1.1 Base Table Specification in SQL/DDL 507
10.2 Data Population Using SQL 524
10.2.1 The INSERT Statement 525
10.2.2 The DELETE Statement 528
10.2.3 The UPDATE Statement 530
Chapter Summary 532
Exercises 532

Chapter 11 Relational Algebra 539


11.1 Unary Operators 542
11.1.1 The Select Operator 542
11.1.2 The Project Operator 544
11.2 Binary Operators 546
11.2.1 The Cartesian Product Operator 546
11.2.2 Set Theoretic Operators 549
11.2.3 Join Operators 551
11.2.4 The Divide Operator 557
11.2.5 Additional Relational Operators 560
Chapter Summary 563
Exercises 563

Chapter 12 Structured Query Language (SQL) 567


12.1 SQL Queries Based on a Single Table 569
12.1.1 Examples of the Selection Operation 569
12.1.2 Use of Comparison and Logical Operators 572
12.1.3 Examples of the Projection Operation 578
12.1.4 Grouping and Summarizing 580
12.1.5 Handling Null Values 583
12.1.6 Pattern Matching in SQL 593
12.2 SQL Queries Based on Binary Operators 597
12.2.1 The Cartesian Product Operation 597
12.2.2 SQL Queries Involving Set Theoretic Operations 599
12.2.3 Join Operations 602
12.2.4 Outer Join Operations 608
12.2.5 SQL and the Semi-Join and Semi-Minus Operations 612
12.3 Subqueries 613
12.3.1 Multiple-Row Uncorrelated Subqueries 613
12.3.2 Multiple-Row Correlated Subqueries 625
12.3.3 Aggregate Functions and Grouping 628
Chapter Summary 631
Exercises 631

Chapter 13 Advanced Data Manipulation Using SQL 635


13.1 Selected SQL:2003 Built-In Functions 635
13.1.1 The SUBSTRING Function 636
13.1.2 The CHAR_LENGTH (char) Function 639
13.1.3 The TRIM Function 640
13.1.4 The TRANSLATE Function 643
13.1.5 The POSITION Function 644
13.1.6 Combining the INSTR and SUBSTR Functions 645
13.1.7 The DECODE Function and the CASE Expression 646
13.1.8 A Query to Simulate the Division Operation 649
13.2 Some Brief Comments on Handling Dates and Times 651
13.3 Hierarchical Queries 656
13.3.1 Using the CONNECT BY and START WITH Clauses with
the PRIOR Operator 658
13.3.2 Using the LEVEL Pseudo-Column 660
13.3.3 Formatting the Results from a Hierarchical Query 661
13.3.4 Using a Subquery in a START WITH Clause 661
13.3.5 The SYS_CONNECT_BY_PATH Function 663
13.3.6 Joins in Hierarchical Queries 664
13.3.7 Incorporating a Hierarchical Structure into a Table 665
13.4 Extended GROUP BY Clauses 668
13.4.1 The ROLLUP Operator 668
13.4.2 Passing Multiple Columns to ROLLUP 669
13.4.3 Changing the Position of Columns Passed to ROLLUP 671
13.4.4 Using the CUBE Operator 672
13.4.5 The GROUPING () Function 674
13.4.6 The GROUPING SETS Extension to the GROUP BY Clause 676
13.4.7 The GROUPING_ID () 677
13.4.8 Using a Column Multiple Times in a GROUP BY Clause 679
13.5 Using the Analytical Functions 681
13.5.1 Analytical Function Types 682
13.5.2 The RANK () and DENSE_RANK () Functions 684
13.5.3 Using ROLLUP, CUBE, and GROUPING SETS Operators with
Analytical Functions 687
13.5.4 Using the Window Functions 688
13.6 A Quick Look at the MODEL Clause 692
13.6.1 MODEL Clause Concepts 693
13.6.2 Basic Syntax of the MODEL Clause 693
13.6.3 An Example of the MODEL Clause 694
13.7 A Potpourri of Other SQL Queries 700
13.7.1 Concluding Example 1 700
13.7.2 Concluding Example 2 702
13.7.3 Concluding Example 3 704
13.7.4 Concluding Example 4 704
13.7.5 Concluding Example 5 705
Chapter Summary 706
Exercises 707
SQL Project 711

Appendix A Data Modeling Architectures Based on the Inverted Tree
and Network Data Structures 719
A.1 Logical Data Structures 719
A.1.1 Inverted Tree Structure 719
A.1.2 Network Data Structure 721
A.2 Logical Data Model Architectures 722
A.2.1 Hierarchical Data Model 722
A.2.2 CODASYL Data Model 726
Summary 729
Selected Bibliography 729

Appendix B Object-Oriented Data Modeling Architectures 731


B.1 The Object-Oriented Data Model 731
B.1.1 Overview of OO Concepts 732
B.1.2 A Note on UML 735
B.2 The Object-Relational Data Model 737
Summary 738
Selected Bibliography 738

Selected Bibliography 739


Index 743

PREFACE

Everything should be made as simple as possible—but no simpler.
—Albert Einstein

Popular business database books typically provide broad coverage of a wide variety of
topics, including data modeling, database design and implementation, database
administration, the client/server database environment, the Internet database
environment, distributed databases, and object-oriented database development. This
breadth invariably comes at the expense of deeper treatment of critical topics, such as
the principles of data modeling and database design. Using current business database
books in our courses, we found that to cover data modeling and database design
properly, we had to augment the texts with significant supplemental material (1) to
achieve precision and detail and (2) to impart the depth students need to gain a
robust understanding of data modeling and database design. In addition, we ended up
skipping several chapters, leaving those topics to a different course. We also know
other instructors who share this experience. Broad coverage of many database topics
in a single book is appropriate for some audiences, but that is not the aim of this
book.
The goal of Data Modeling and Database Design, Second Edition is to provide
core competency in the areas that every Information Systems (IS), Computer Science
(CS), and Computer Information Systems (CIS) student and professional should
acquire: data modeling and database design. In our experience, this set of topics is
the most essential for database professionals, and, covered in sufficient depth, these
topics alone require a full semester of study. We intend to address these topics at the
level of technical depth achieved in CS textbooks, yet make them palatable to the
business student and IS professional with little sacrifice in precision. We deliberately
refrain from the mathematics and algorithmic solutions usually found in CS textbooks,
yet we attempt to capture the precision therein via heuristic expressions.
Data Modeling and Database Design, Second Edition not only provides hands-on
instruction in current data modeling and database design practices but also gives
readers a thorough conceptual background for these practices. We do not subscribe to
the idea that a textbook should limit itself to describing what is actually being
practiced. Teaching only what is being practiced is bound to lead to knowledge
stagnation. Where do practitioners learn what they know? Did they invent the
relational data model? Did they invent the ER model? We believe that it is our
responsibility not only to present industry “best practices” but also to provide
students (future practitioners) with concepts and techniques that are not necessarily
used in industry today
but can enliven their practice and help it evolve without knowledge stagnation. One
of the coauthors of this book has worked in the software development industry for
over 15 years, with a significant focus on database development. His experience
indicates that having a rich set of advanced data modeling constructs available
enhances the robustness of database design and that practitioners readily adopt these
techniques in their design practices.
In a nutshell, our goal is to take an IS/CS/CIS student/professional through an
intense educational experience, starting at conceptual modeling and culminating in a
fully implemented database design—nothing more and nothing less. This educational
journey is briefly articulated in the following paragraphs.

STRUCTURE
We have tried very hard to make the book “fluff-free.” It is our hope that every
sentence in the book, including this preface, adds value to a reader’s learning (and
footnotes are no exception to this statement).
The book begins with an introduction to rudimentary concepts of data, metadata,
and information, followed by an overview of data management. Pointing out the
limitations of file-processing systems, Chapter 1 introduces database systems as a
means to overcome these limitations and discusses the architecture and components of
a database system that make this possible. The chapter concludes with the presentation
of a framework for the database system design life cycle. Following the introductory
chapter on database systems architecture and components, the book contains four parts.

Part I: Conceptual Data Modeling


Part I addresses the topic of conceptual data modeling, that is, modeling at the
highest level of abstraction, independent of the limitations of the technology employed
to deploy the database system. Four chapters (Chapters 2–5) provide an extensive
discussion of conceptual data modeling. Chapter 2 lays the groundwork using the
Entity-Relationship (ER) modeling grammar as the principal means to model a database
application domain. Chapter 3 elaborates on the use of the ER modeling grammar in
progressive layers and exemplifies the modeling technique with a comprehensive case
called Bearcat Incorporated. Chapter 4 then presents richer data modeling constructs
that overlap with object-oriented modeling constructs, and the Bearcat Incorporated
story is further enriched to demonstrate the value of Enhanced ER (EER) modeling
constructs. Chapter 5 is devoted exclusively to modeling complex relationships that
have meaningful real-world significance. At the end of Part I, the reader ought to be
able to fully appreciate the value of conceptual data modeling in the database system
design life cycle.
This second edition of Data Modeling and Database Design includes the following
major enhancements:
• The material in Chapters 2 and 3 has been reorganized and streamlined so that
the reader not only learns the ER modeling grammar but also learns to develop
simple applications of ER modeling. In Chapter 3, the modeling method steps
have been reconfigured across the Presentation Layer and
Design-Specific Layer of the ER model. Also, the unique learning technique
based on error detection, which we developed, is presented at the end of
Chapter 3.
• The intra-entity class relationships are introduced with a new, simpler
example at the beginning of Chapter 4.
• The already extensive coverage of complex relationships in Chapter 5 is
augmented by a few new modeling ideas. Additional examples clarifying the
decomposition of complex relationships in preparation for logical model
mapping have also been added to this chapter.

Part II: Logical Data Modeling


Part II of the book is dedicated to the migration of a conceptual data model to its
logical counterpart. Since the relational data model architecture forms the basis for
the logical data modeling discussed in this textbook, Chapter 6 begins by focusing on
its characteristics. Two other logical data modeling architectures prevalent in some
legacy systems, the hierarchical data model and the CODASYL data model, appear in
Appendix A. An introduction to object-oriented data modeling concepts is presented
in Appendix B. The rest of Chapter 6 describes techniques to map a conceptual data
model to its logical counterpart. An information-preserving logical data modeling
grammar is introduced and contrasted with existing popular mapping techniques that
are information-reducing. A comprehensive set of examples is used to clarify the use
and value of the information-preserving grammar.
An important addition to the current edition of the book is a section on mapping
complex relationship types to the logical tier.

Part III: Normalization


Part III addresses the critical question of the “goodness” of a database design that
results from the conceptual and logical data modeling processes. Normalization is
introduced as the “scientific” way to verify and improve the quality of the logical
schema available at this stage of the database design. Three chapters cover the topic
of normalization. In Chapter 7, we look at data redundancy in a relation schema and
see how it manifests as a problem. We then trace the problem to its source, namely,
undesirable functional dependencies. To that end, we first learn about functional
dependencies axiomatically and how inference rules (Armstrong’s axioms) can be used
to derive the candidate keys of a relation schema. In Chapter 8, we present the
solution that the normalization process offers to data redundancy problems triggered
by undesirable functional dependencies. After discussing first, second, third, and
Boyce-Codd normal forms individually, we examine the side effects of normalization,
namely, dependency preservation and non-loss decomposition, and their consequences.
Next, using several examples, we present real-world scenarios of deriving full-fledged
relational schemas (sets of relation schemas) from given sets of functional
dependencies. The useful topic of denormalization follows. Reverse engineering a
normalized relational schema to the conceptual tier often yields insight into the
database design and enables a database designer to become a better data modeler.
Despite its practical utility, this
topic is rarely covered in database textbooks. Chapter 9 completes the discussion of
normalization by examining multi-valued dependency (MVD) and join-dependency (JD)
and their impact on a relation schema in terms of fourth normal form (4NF) and
Project/Join normal form (PJNF, also known as fifth normal form or 5NF),
respectively.
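The candidate-key derivation mentioned above rests on attribute closure: a set of attributes X is a superkey of a relation schema R when its closure X⁺ under the functional dependencies contains every attribute of R, and it is a candidate key when no proper subset of X has that property. The Python sketch below illustrates this idea; it is not the book's own procedure, its brute-force search is practical only for small schemas, and the relation R(A, B, C) with AB → C and C → A, as well as the `fd` helper, are hypothetical.

```python
from collections.abc import Iterable
from itertools import combinations

# A functional dependency X -> Y, stored as (left side, right side).
FD = tuple[frozenset[str], frozenset[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Hypothetical helper: fd("AB", "C") builds AB -> C over one-letter attributes."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: list[FD]) -> frozenset[str]:
    """Return the closure of attrs: every attribute they functionally determine."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply X -> Y once all of X is known and Y adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema: set[str], fds: list[FD]) -> list[frozenset[str]]:
    """Return every minimal attribute set whose closure is the whole schema."""
    keys: list[frozenset[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            # A superset of a key already found is a superkey, not a candidate key.
            if any(key <= cand for key in keys):
                continue
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


fds = [fd("AB", "C"), fd("C", "A")]
print(sorted(closure({"C"}, fds)))                                # ['A', 'C']
print([sorted(k) for k in candidate_keys({"A", "B", "C"}, fds)])  # [['A', 'B'], ['B', 'C']]
```

Because C → A has a left side that is not a superkey while A is a prime attribute, this small schema is a standard example of a relation that is in 3NF but not in BCNF.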
An interesting enhancement in Chapter 8 is the introduction of a fast-track algorithm to achieve a non-loss, dependency-preserving 3NF design; two distinct examples demonstrate the use of the algorithm. Chapter 9 presents MVDs and 4NF, JDs and 5NF, and how each expresses ternary and n-ary relationships, respectively. Additional examples offer unique insights into apparently conflicting alternative solutions.
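The fast-track algorithm itself is not reproduced in this preface, so the sketch below substitutes the widely taught 3NF synthesis approach as a stand-in: compute a minimal (canonical) cover, create one relation schema per determinant, and add a relation holding a candidate key if none exists. Keys are tested with attribute closure, which computes exactly the attributes derivable from a set X by Armstrong’s axioms. This is a minimal Python sketch, assuming FDs are pairs of frozensets; the helper names and the attributes (Sid, Sname, Cid, Ctitle, Grade) are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency lhs -> rhs from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure X+: every attribute derivable from attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= schema


def candidate_keys(schema, fds):
    """Brute force (exponential): minimal attribute sets whose closure is the schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(cand, schema, fds):
                keys.append(cand)
    return keys


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    cover = [(lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop extraneous left-hand-side attributes.
    reduced = []
    for lhs, rhs in cover:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    # Step 3: drop FDs implied by the remaining ones.
    result = list(dict.fromkeys(reduced))
    for dep in list(result):
        rest = [g for g in result if g != dep]
        if dep[1] <= closure(dep[0], rest):
            result = rest
    return result


def synthesize_3nf(schema, fds):
    cover = minimal_cover(fds)
    # One relation schema per determinant X: X plus every A with X -> A.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    relations = [frozenset(r) for r in groups.values()]
    # Non-loss join: make sure some relation contains a candidate key.
    if not any(is_superkey(r, schema, cover) for r in relations):
        relations.append(candidate_keys(schema, cover)[0])
    # Drop any relation schema contained in another one.
    relations = list(dict.fromkeys(relations))
    return [r for r in relations if not any(r < s for s in relations)]


if __name__ == "__main__":
    R = frozenset({"Sid", "Sname", "Cid", "Ctitle", "Grade"})
    F = [fd("Sid", "Sname"), fd("Cid", "Ctitle"), fd("Sid Cid", "Grade")]
    print("Candidate keys:", [sorted(k) for k in candidate_keys(R, F)])
    for rel in synthesize_3nf(R, F):
        print(sorted(rel))
```

For the sample schema this yields {Sid, Sname}, {Cid, Ctitle}, and {Sid, Cid, Grade}; the last already contains the candidate key {Sid, Cid}, so no extra key relation is added.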

Part IV: Database Implementation Using the Relational Database Model


Part IV pertains to database implementation using the relational data model. Spread
over four chapters, this part of the book covers relational algebra and the ANSI/ISO
standard Structured Query Language (SQL). Chapter 10 focuses on the data defini-
tion language (DDL) aspect of SQL. Included in the discussion are the SQL schema
evolution statements for adding, altering, or dropping table structures, attributes,
constraints, and supporting structures. This is followed by the development of an SQL/DDL script for a comprehensive case about a college registration system. The chapter
also includes the use of INSERT, UPDATE, and DELETE statements in populating a
database and performing database maintenance.
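As a rough illustration of the DDL and DML statements listed above, the following Python sketch uses the standard-library sqlite3 module with an in-memory SQLite database as a stand-in RDBMS. SQLite’s dialect is close to, but not the same as, ANSI/ISO SQL (its ALTER TABLE support, for instance, is limited), and the table and column names are hypothetical rather than the book’s college registration case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled

# DDL: hypothetical tables with key, NOT NULL, CHECK, and foreign-key constraints.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        major      TEXT
    )""")
conn.execute("""
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        credits   INTEGER CHECK (credits BETWEEN 1 AND 5)
    )""")
conn.execute("""
    CREATE TABLE registration (
        student_id INTEGER REFERENCES student (student_id),
        course_id  TEXT    REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    )""")

# Schema evolution: add an attribute and a supporting structure (an index).
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
conn.execute("CREATE INDEX student_major_idx ON student (major)")

# DML: populate the database ...
conn.executemany("INSERT INTO student (student_id, name, major) VALUES (?, ?, ?)",
                 [(1, "Ada", "IS"), (2, "Ben", "CS")])
conn.execute("INSERT INTO course VALUES ('IS301', 'Database Design', 3)")
conn.execute("INSERT INTO registration VALUES (1, 'IS301')")

# ... and maintain it.
conn.execute("UPDATE student SET email = 'ada@example.edu' WHERE student_id = 1")
conn.execute("DELETE FROM student WHERE student_id = 2")

# The CHECK constraint rejects out-of-range data.
try:
    conn.execute("INSERT INTO course VALUES ('IS999', 'Overloaded', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schema evolution: drop the supporting structure again.
conn.execute("DROP INDEX student_major_idx")
conn.commit()

rows = conn.execute("SELECT student_id, name, email FROM student").fetchall()
print(rows)      # [(1, 'Ada', 'ada@example.edu')]
print(rejected)  # True
```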
Chapters 11, 12, and 13 focus on relational algebra and the use of SQL for data
manipulation. Chapter 11 concentrates on E. F. Codd’s eight original relational alge-
bra operations as a means to specify the logic for data retrieval from a relational
database. SQL, the most common way that relational algebra is implemented for data
retrieval operations, is the subject of Chapter 12. Chapter 13 covers a number of
built-in functions used by SQL to work with strings, dates, and times, and it illustrates
how SQL can be used to do retrievals against hierarchically structured data. This
chapter also provides an introduction to some of the features of SQL that facilitate
the summarization and analysis of data. The chapter ends with an SQL database
project that provides students with a real-life scenario to test and apply the skills and
concepts presented in Part IV.
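To show how relational algebra specifies retrieval logic that SQL then expresses, here is a minimal Python sketch of three of the operations (selection, projection, and natural join) over relations modeled as lists of dictionaries. The function names and the STUDENT and ENROLL data are hypothetical; since a true relation is a set, only projection has to remove duplicates here.

```python
def select(rel, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate; SQL WHERE."""
    return [t for t in rel if predicate(t)]


def project(rel, attrs):
    """Projection (pi): keep only attrs and remove duplicates; SQL SELECT DISTINCT."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def natural_join(r, s):
    """Natural join: pair tuples that agree on every shared attribute; SQL NATURAL JOIN."""
    out = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                out.append({**t, **u})
    return out


STUDENT = [{"Sid": 1, "Sname": "Ada"}, {"Sid": 2, "Sname": "Ben"}]
ENROLL = [{"Sid": 1, "Cid": "IS301"}, {"Sid": 2, "Cid": "IS301"}, {"Sid": 1, "Cid": "IS420"}]

# pi_Sname(sigma_Cid='IS420'(STUDENT join ENROLL)), which SQL writes as:
#   SELECT DISTINCT Sname FROM STUDENT NATURAL JOIN ENROLL WHERE Cid = 'IS420';
result = project(select(natural_join(STUDENT, ENROLL), lambda t: t["Cid"] == "IS420"), ["Sname"])
print(result)  # [{'Sname': 'Ada'}]
```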

FEATURES OF EACH CHAPTER


Since our objective is a crisp and clear presentation of rather intricate subject matter,
each chapter begins with a simple introduction, followed by the treatment of the
subject matter, and concludes with a chapter summary and a set of exercises based
on the subject matter.

WHAT MAKES THIS BOOK DIFFERENT?


Every book has strengths and weaknesses. If lack of breadth in the coverage of
database topics is considered a weakness, we have deliberately chosen to be weak in
that dimension. We have not planned this book to be another general book on
database systems. We have chosen to limit the scope of this book exclusively to data
modeling and database design since we firmly believe that this set of topics is the
core of database systems and must be learned in depth by every IS/CS/CIS student
and practitioner. Any system designed robustly has the potential to best serve the
needs of the users. More importantly, a poor design is a virus that can ruin an
enterprise.
In this light, we believe these are the unique strengths of this book:
• It presents conceptual modeling using the entity-relationship modeling grammar, including extensive discussion of the enhanced entity-relationship (EER) model.
We believe that a conceptual model should capture all possible constraints
conveyed by the business rules implicit in users’ requirement specifica-
tions. To that end, we posit that an ER diagram is not an ER model unless
accompanied by a comprehensive specification of characteristics of and
constraints pertaining to attributes. We accomplish this via a list of
semantic integrity constraints (sort of a conceptual data dictionary) that
will accompany an ER diagram, a unique feature that we have not seen in
other database textbooks. We also seek to demonstrate the systematic
development of a multi-layer conceptual data model via a comprehensive
illustration at the beginning of each Part. We consider the multi-layer
modeling strategy and the heuristics for systematic development as unique
features of this book.
• It includes substantial coverage of higher-degree relationships and other
complex relationships in the entity-relationship diagram.
Most business database books seem to provide only a cursory treatment of
complex relationships in an ER model. We not only cover relationships
beyond binary relationships (e.g., ternary and higher-degree relationships),
we also clarify the nuances pertaining to the necessity and efficacy of
higher-degree relationships and the various conditions under which even
recursive and binary relationships are aggregated in interesting ways to
form cluster entity types.
• It discusses the information-preserving issue in data model mapping and
introduces a new information-preserving grammar for logical data modeling.
Many computer scientists have noted that the major difficulty of logical
database design (i.e., transforming an ER schema into a schema in the lan-
guage of some logical model) is the information preservation issue. Indeed,
assuring a complete mapping of all modeling constructs and constraints
that are inherent, implicit or explicit, in the source schema (e.g., ER/EER
model) is problematic since constraints of the source model often cannot be
represented directly in terms of structures and constraints of the target
model (e.g., relational schema). In such a case, they must be realized
through application programs; alternatively, an information-reducing trans-
formation must be accepted (Fahrner and Vossen, 1995). In their research,
initially presented at the Workshop on Information Technologies and Systems (WITS) in
the ICIS (International Conference on Information Systems) in Brisbane,
Australia, Umanath and Chiang (2000) describe a logical modeling grammar that generates an information-preserving transformation. Umanath further revised this modeling grammar based on the feedback received at WITS. We have included this logical modeling grammar as a unique component of this textbook.
• It includes unique features under the topic of normalization rarely covered in
business database books:
• Inference rules for functional dependencies (Armstrong’s axioms)
and derivations of candidate keys from a set of functional
dependencies
• Derivation of canonical covers for a set of semantically obvious func-
tional dependencies
• Rich examples to clarify the basic normal forms (first, second, third,
and Boyce-Codd)
• Derivation of a complete logical schema from a large set of functional
dependencies considering lossless (non-additive) join properties and
dependency preservation
• Reverse engineering a logical schema to an entity-relationship diagram
• Advanced coverage of fourth and fifth normal form (project-join normal
form, abbreviated “PJNF”) using a variety of examples
• It offers in-depth coverage of relational algebra, with a significant number of examples of how its operations are expressed in ANSI/ISO SQL.

A NOTE TO THE INSTRUCTOR


The content of this book is designed for a rigorous one-semester course in database
design and development and may be used at both undergraduate and graduate levels.
Technical emphasis can be tempered by minimizing or eliminating the coverage of
some of the following topics from the course syllabus: Enhanced Entity-Relationship
(EER) Modeling (Chapter 4) and the related data model mapping topics in Chapter 6
(Section 6.8) on Mapping Enhanced ER Modeling Constructs to a Logical Schema;
Modeling Complex Relationships (Chapter 5); and higher normal forms (Chapter 9).
The suggested exclusions will not impair the continuity of the subject matter in the
rest of the book.

SUPPORTING TECHNOLOGIES
Any business database book can be effective only when supporting technologies are
made available for student use. Yet, we don’t think that the type of book we are writ-
ing should be married to any commercial product. The specific technologies that will
render this book highly effective include a drawing tool (such as Microsoft Visio), a
software engineering tool (such as ERWIN, ORACLE/Designer, or Visible Analyst),
and a relational database management system (RDBMS) product (such as ORACLE,
SQL Server, or DB2).


SUPPLEMENTAL MATERIALS
The following supplemental materials are available to instructors when this book is
used in a classroom setting. Some of these materials may also be found on the
Cengage Learning Web site at www.cengage.com.
• Electronic Instructor’s Manual: The Instructor’s Manual assists in class
preparation by providing suggestions and strategies for teaching the text, and
solutions to the end-of-chapter questions/problems.
• Sample Syllabi and Course Outline: The sample syllabi and course outlines
are provided as a foundation to begin planning and organizing your course.
• Cognero Test Bank: Cognero allows instructors to create and administer printed, computer (LAN-based), and Internet exams. The Test Bank includes an array of questions that correspond to the topics covered in this text, enabling students to generate detailed study guides that include page references for further review. The computer-based and Internet testing components allow students to take exams at their computers, and also save the instructor time by automatically grading each exam. The Test Bank is also available in Blackboard and WebCT versions posted online at www.course.com.
• PowerPoint Presentations: Microsoft PowerPoint slides for each chapter are
included as a teaching aid for classroom presentation, to make available to
students on the network for chapter review, or to be printed for classroom
distribution. Instructors can add their own slides for additional topics they
introduce to the class.
• Figure Files: Figure files from each chapter are provided for the instructor’s
use in the classroom.
• Data Files: Data files containing scripts to populate the database tables used
as examples in Chapters 11 and 12 are provided on the Cengage Learning
Web site at www.cengage.com.

ACKNOWLEDGMENTS
We have never written a textbook before. We have been using books written by our
academic colleagues, always supplemented with handouts that we developed our-
selves. Over the years, we accumulated a lot of supplemental material. In the begin-
ning, we took the positive feedback from the students about the supplemental
material rather lightly until we started to see comments like “I don’t know why I
bought the book; the instructor’s handouts were so good and much clearer than the
book” in the student evaluation forms. Our impetus to write a textbook thus origi-
nated from the consistent positive feedback from our students.
We also realized that, contrary to popular belief, business students are certainly
capable of assimilating intricate technical concepts; the trick is to frame the concepts
in meaningful business scenarios. The unsolicited testimonials from our alumni about
the usefulness of the technical depth offered in our database course in solving real-
world design problems reinforced our faith in developing a book focused exclusively
on data modeling and database design that was technically rigorous but permeated
with business relevance.
Since we both teach database courses regularly, we have had the opportunity to
field-test the manuscript of this book for close to 10 years at both undergraduate-level
and graduate-level information systems courses in the Carl Lindner College of
Business at the University of Cincinnati and in the C. T. Bauer College of Business at
the University of Houston. Hundreds of students—mostly business students—have
used earlier drafts of this textbook so far. Interestingly, even the computer science
and engineering students taking our courses have expressed their appreciation of the
content. This is a long preamble to acknowledge one of the most important and for-
mative elements in the creation of this book: our students.
The students’ continued feedback (comments, complaints, suggestions, and criticisms) has significantly contributed to the improvement of the content. As we were
cycling through revisions of the manuscript, the graduate teaching assistants of
Dr. Umanath were a constant source of inspiration. Their meaningful questions and
suggestions added significant value to the content of this book. Dr. Scamell was ably
assisted by his graduate assistants as well.
We would also like to thank the following reviewers whose critiques, comments,
and suggestions helped shape every chapter of this book’s first edition:
Akhilesh Bajaj, University of Tulsa
Iris Junglas, Florida State University
Margaret Porciello, State University of New York/Farmingdale
Sandeep Purao, Pennsylvania State University
Jaymeen Shah, Texas State University
Last, but by no means least, we gratefully acknowledge the significant contribution of Deb Kaufmann and Kent Williams, the development editors of our first and
second editions, respectively. We cannot thank them enough for their thorough and
also prompt and supportive efforts.
Enjoy!

N. S. Umanath

R. W. Scamell

CHAPTER 1
DATABASE SYSTEMS: ARCHITECTURE AND COMPONENTS

Data modeling and database design involve elements of both art and engineering.
Understanding user requirements and modeling them in the form of an effective logical
database design is an artistic process. Transforming the design into a physical database
with functionally complete and efficient applications is an engineering process.
To better comprehend what drives the design of databases, it is important to under-
stand the distinction between data and information. Data consists of raw facts—that is,
facts that have not yet been processed to reveal their meaning. Processing these facts
provides information on which decisions can be based.
Timely and useful information requires that data be accurate and stored in a manner
that is easy to access and process. And, like any basic resource, data must be managed
carefully. Data management is a discipline that focuses on the proper acquisition, storage,
maintenance, and retrieval of data. Typically, the use of a database enables efficient and
effective management of data.
This chapter introduces the rudimentary concepts of data and how information
emerges from data when viewed through the lens of metadata. Next, the discussion
addresses data management, contrasting file-processing systems with database systems.
This is followed by brief examples of desktop, workgroup, and enterprise databases. The
chapter then presents a framework for database design that describes the multiple tiers of
data modeling and how these tiers function in database design. This framework serves as a
roadmap to guide the reader through the remainder of the book.

1.1 DATA, INFORMATION, AND METADATA


Although the terms are often used interchangeably, information is different from data.
Data can be viewed as raw material consisting of unorganized facts about things, events,
activities, and transactions. While data may have implicit meaning, the lack of organization renders it valueless. Information, by contrast, is data in context; that is, data that has been organized into a specific context such that it has value to its recipient.
As an example, consider the digits 2357111317. What does this string of digits
represent? One response is that they are simply 10 meaningless digits. Another might be
the number 31 (obtained by summing the 10 digits). A mathematician may see a set of
prime numbers, viz., 2, 3, 5, 7, 11, 13, 17. Another might see a person’s phone number with
the first three digits constituting the area code and the remaining seven digits the local
phone number. On the other hand, if the first digit is used to represent a person’s gender
(1 for male and 2 for female) and the remaining nine digits the person’s Social Security
number, the 10 digits would mean something else. Numerous other interpretations are pos-
sible, but without a context it is impossible to say what the digits represent. However, when
framed in a specific context (such as being told that the first digit represents a person’s
gender and the remaining digits the Social Security number), the data is transformed into
information. It is important to note that “information” is not necessarily the “Truth” since
the same data yields different information based on the context; information is an inference.
Metadata, in a database environment, is data that describes the properties of data. It
contains a complete definition or description of database structure (i.e., the file structure,
data type, and storage format of each data item), and other constraints on the stored data.
For example, when the structure of the 10 digits 2357111317 is revealed, the 10 digits
become information, such as a phone number. Metadata defines this structure. In other
words, through the lens of metadata, data takes on specific meaning and yields information.¹
Metadata may be characterized as follows:
• The lens to view data and infer information
• A precise definition of the context for framing the data
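The idea that metadata frames the context can be sketched in a few lines of Python: the same ten digits are split according to two hypothetical layouts, each playing the role of metadata. The layout names are illustrative; the phone-number split and the gender/Social Security coding follow the interpretations described above.

```python
DIGITS = "2357111317"

# Hypothetical metadata: each layout lists (field name, width in digits).
PHONE = [("area_code", 3), ("local_number", 7)]
GENDER_SSN = [("gender_code", 1), ("ssn", 9)]
GENDER_CODES = {"1": "male", "2": "female"}  # coding scheme taken from the text


def interpret(raw, layout):
    """Split a raw digit string into named fields as the layout (metadata) dictates."""
    if sum(width for _, width in layout) != len(raw):
        raise ValueError("layout does not match the length of the data")
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = raw[pos:pos + width]
        pos += width
    return fields


as_phone = interpret(DIGITS, PHONE)
as_person = interpret(DIGITS, GENDER_SSN)
print(as_phone)                                # {'area_code': '235', 'local_number': '7111317'}
print(GENDER_CODES[as_person["gender_code"]])  # female
print(sum(int(d) for d in DIGITS))             # 31
```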
Table 1.1 contains metadata for the data associated with a manufacturing plant. Later
in this chapter, we will see that in a database environment, metadata is recorded in what
is called a data dictionary.

Record Type  Data Element     Data Type   Size  Source   Role     Domain
PLANT        Pl_name          Alphabetic  30    Stored   Non-key
PLANT        Pl_number        Numeric     2     Stored   Key      Integer values from 10 to 20
PLANT        Budget           Numeric     7     Stored   Non-key
PLANT        Building         Alphabetic  20    Stored   Non-key
PLANT        No_of_employees  Numeric     4     Derived  Non-key

TABLE 1.1 Some metadata for a manufacturing plant
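One way to picture metadata held in a data dictionary is to encode Table 1.1 as data and let a program check records against it. This is a minimal Python sketch, assuming that Size means maximum characters for Alphabetic elements and maximum digits for Numeric ones; the class and function names are hypothetical, and the derived element No_of_employees is skipped because it is computed rather than supplied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DataElement:
    """One data-dictionary entry; the fields mirror the columns of Table 1.1."""
    record_type: str
    name: str
    data_type: str   # "Alphabetic" or "Numeric"
    size: int        # assumed: max characters (Alphabetic) or digits (Numeric)
    source: str      # "Stored" or "Derived"
    role: str        # "Key" or "Non-key"
    domain: Optional[Callable[[int], bool]] = None


DICTIONARY = [
    DataElement("PLANT", "Pl_name", "Alphabetic", 30, "Stored", "Non-key"),
    DataElement("PLANT", "Pl_number", "Numeric", 2, "Stored", "Key",
                domain=lambda v: 10 <= v <= 20),
    DataElement("PLANT", "Budget", "Numeric", 7, "Stored", "Non-key"),
    DataElement("PLANT", "Building", "Alphabetic", 20, "Stored", "Non-key"),
    DataElement("PLANT", "No_of_employees", "Numeric", 4, "Derived", "Non-key"),
]


def validate(record, record_type="PLANT"):
    """Return a list of metadata violations for one record (empty means valid)."""
    problems = []
    for el in DICTIONARY:
        if el.record_type != record_type or el.source == "Derived":
            continue  # derived elements are computed, not supplied with the record
        value = record.get(el.name)
        if value is None:
            problems.append(f"{el.name}: missing")
        elif el.data_type == "Numeric" and not isinstance(value, int):
            problems.append(f"{el.name}: not numeric")
        elif el.data_type == "Alphabetic" and not isinstance(value, str):
            problems.append(f"{el.name}: not text")
        elif len(str(value)) > el.size:
            problems.append(f"{el.name}: exceeds size {el.size}")
        elif el.domain is not None and not el.domain(value):
            problems.append(f"{el.name}: outside domain")
    return problems


print(validate({"Pl_name": "Riverside", "Pl_number": 12, "Budget": 1500000, "Building": "North"}))  # []
print(validate({"Pl_name": "Riverside", "Pl_number": 25, "Budget": 1500000, "Building": "North"}))  # ['Pl_number: outside domain']
```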

As reflected in Table 1.1, the smallest unit of data is called a data element. A group of related data elements treated as a unit (such as Pl_name, Pl_number, Budget, Building, and No_of_employees) is called a record type. A set of values for the data elements constituting a record type is called a record instance or simply a record. A file is a collection of records. A file is sometimes referred to as a data set. A company with 10 plants would have a PLANT file or a PLANT data set that contains 10 records.

¹ With the advent of the data warehouse, the term “metadata” assumes a more comprehensive meaning to include business and technical metadata, which is outside the scope of the current discussion.

1.2 DATA MANAGEMENT


This book focuses strictly on management of data, as opposed to the management of
human resources. Data management involves four actions: (a) data creation, (b) data
retrieval, (c) data modification or updating, and (d) data deletion. Two data management
functions support these four actions: Data must be accessed and, for ease of access, data
must be organized.
Despite today’s sophisticated information technologies, there are still only two pri-
mary approaches for accessing data. One is sequential access, where in order to get to the
nth record in a data set it is necessary to pass through the previous n–1 records in the
data set. The second approach is direct access, where it is possible to get to the nth
record without having to pass through the previous n–1 records. While direct access is
useful for ad hoc querying of information, sequential access remains essential for
transaction processing applications such as generating payroll, grade reports, and
utility bills.
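The difference between the two access approaches can be made concrete by counting how many records are passed over. In this minimal Python sketch, a list stands in for a file and list indexing stands in for reaching a record by its relative position; the function names and the payroll data are hypothetical.

```python
def sequential_read(file, n):
    """Return the nth record (1-based) and how many earlier records were passed over."""
    passed = 0
    for record in file:  # read records one after another, from the start
        if passed == n - 1:
            return record, passed
        passed += 1
    raise IndexError(f"file has fewer than {n} records")


def direct_read(file, n):
    """Go straight to the nth record by its position; nothing is passed over."""
    return file[n - 1], 0


payroll_file = [f"EMP{i:04d}" for i in range(1, 1001)]  # 1,000 hypothetical records
print(sequential_read(payroll_file, 750))  # ('EMP0750', 749)
print(direct_read(payroll_file, 750))      # ('EMP0750', 0)
```

A payroll run that processes every record in order pays no penalty for passing over earlier records, which is why sequential access suits such applications.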
In order to access data, the data must be organized. For sequential access, this means
that all records in a file must be stored (organized) in sequence by a unique
identifier, such as employee number, inventory number, flight number, account number,
or stock symbol. This is called sequential organization. A serial (unordered) collection of
records, also known as a “heap file,” cannot provide sequential access. For direct
access, the records in a file can be stored serially and organized either randomly or by
using an external index. A randomly organized file is one in which the value of a unique
identifier is processed by some sort of transformation routine (often called a “hashing
algorithm”) that computes the location of records within the file (relative record
numbers). An indexed file makes use of an index external to the data set similar in nature
to the one found at the back of this book to identify the location where a record is
physically stored.
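The two direct-access organizations can be sketched in Python. The text does not prescribe a particular hashing algorithm, so this sketch assumes division-remainder hashing with linear probing for collisions, one common textbook scheme; the external index is a sorted list of (key, position) pairs searched with the standard bisect module. The file size, field names, and account numbers are hypothetical.

```python
import bisect

SLOTS = 11  # hypothetical number of record slots in the hashed file (a prime)


def hash_address(key):
    """Division-remainder hashing: key -> relative record number (0..SLOTS-1)."""
    return key % SLOTS


def build_hashed_file(records):
    """Place each record at its hashed slot; on a collision, try the next slot."""
    if len(records) >= SLOTS:
        raise ValueError("too many records for this sketch's fixed file size")
    slots = [None] * SLOTS
    for rec in records:
        rrn = hash_address(rec["acct"])
        while slots[rrn] is not None:  # linear probing
            rrn = (rrn + 1) % SLOTS
        slots[rrn] = rec
    return slots


def hashed_lookup(slots, key):
    rrn = hash_address(key)
    for _ in range(SLOTS):
        rec = slots[rrn]
        if rec is None:
            return None  # an empty slot ends the probe sequence: key not present
        if rec["acct"] == key:
            return rec
        rrn = (rrn + 1) % SLOTS
    return None


def build_index(heap):
    """External index: (key, position) pairs sorted by key, like a book's back index."""
    return sorted((rec["acct"], pos) for pos, rec in enumerate(heap))


def indexed_lookup(heap, index, key):
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return heap[index[i][1]]
    return None


# Records in arrival order (serial, unordered): a heap file.
heap = [{"acct": 4021, "owner": "Lee"}, {"acct": 1187, "owner": "Kim"},
        {"acct": 3305, "owner": "Diaz"}, {"acct": 4032, "owner": "Roy"}]

slots = build_hashed_file(heap)  # 4021 and 4032 both hash to 6; 4032 lands in 7
index = build_index(heap)
print(hashed_lookup(slots, 4032)["owner"])         # Roy
print(indexed_lookup(heap, index, 3305)["owner"])  # Diaz
```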
As discussed in Section 1.5, a database takes advantage of software called a database
management system (DBMS) that sits on top of a set of files physically organized as
sequential files and/or as some form of direct access files. A DBMS facilitates data access
in a database without burdening a user with the details of how the data is physically
organized.

1.3 LIMITATIONS OF FILE-PROCESSING SYSTEMS


Computer applications in the 1960s and 1970s focused primarily on automating clerical
tasks. These applications made use of records stored in separate files and thus were
called file-processing systems. Although file-processing systems for information systems
applications have been useful for many years, database technology has rendered them
obsolete except for their use in a few legacy systems such as some payroll and customer
billing systems. Nonetheless, understanding their limitations provides insight into the
development of and justification for database systems.
Figure 1.1 shows three file-processing systems for a hypothetical university. One pro-
cesses data for students, another processes data for faculty and staff, and a third processes
data for alumni. In such an environment, each file-processing system has its own collec-
tion of private files and programs that access these files.

© 2015 Cengage Learning®


FIGURE 1.1 An example of a file-processing environment

While an improvement over the manual systems that preceded them, file-processing
systems suffer from a number of limitations:
• Lack of data integrity—Data integrity ensures that data values are correct,
consistent, complete, and current. Duplication of data in isolated file-processing systems leads to the possibility of inconsistent data, and it is then difficult to identify which of the duplicate copies is correct, complete, and/or current. This creates data integrity problems. For example, if an employee who is also a student and an alumnus changes his or her mailing address, the files that contain the mailing address in three different file-processing systems all require updating to ensure consistency of information across the board. Data redundancy across the three file-processing systems not only creates maintenance inefficiencies but also leads to the problem of not knowing which is the current, correct, and/or complete address of the person.
• Lack of standards—Organizations with file-processing systems often lack or
find it difficult to enforce standards for naming data items as well as for
accessing, updating, and protecting data. The absence of such standards can

lead to unauthorized access and accidental or intentional damage to or
destruction of data. In essence, security and confidentiality of information
may be compromised.
• Lack of flexibility/maintainability—Information systems make it possible
for end users to develop information requirements that they had never
envisioned previously. This inevitably leads to a substantial increase in
requests for new queries and reports. However, file-processing systems are
dependent upon a programmer who has to either write or modify program
code to meet these information requirements from isolated data. This can result in unmet information requests or in programs that are inefficiently written, poorly documented, and difficult to maintain.
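The data integrity problem described above can be reproduced with three private copies of the same person’s record. This Python sketch is purely illustrative: the dictionaries stand in for the separate student, employee, and alumni files, and the identifiers, names, and addresses are made up.

```python
# Three isolated file-processing systems, each with a private copy of one person's data.
student_file = {"P100": {"name": "Lee", "address": "12 Elm St"}}
employee_file = {"P100": {"name": "Lee", "address": "12 Elm St"}}
alumni_file = {"P100": {"name": "Lee", "address": "12 Elm St"}}


def known_addresses(person_id, *files):
    """Collect every distinct address that the separate files hold for one person."""
    return sorted({f[person_id]["address"] for f in files if person_id in f})


# Only the student system processes the change of address.
student_file["P100"]["address"] = "9 Oak Ave"

print(known_addresses("P100", student_file, employee_file, alumni_file))
# ['12 Elm St', '9 Oak Ave']: two "current" addresses, and nothing says which is right
```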
These limitations are actually symptoms resulting from two fundamental problems:
lack of integration of related data and lack of program-data independence.
• Lack of data integration—Data is separated and isolated, and ownership of
data is compartmentalized, resulting in limited data sharing. For example, to
produce a list of employees who are students and alumni at the same time,
data from multiple files must be accessed. This process can be quite complex
and time consuming since a program has to access and perform logical com-
parisons across independent files containing employee, student, and alumni
data. In short, lack of integration of data contributes to all of the problems
listed previously as symptoms.
• Lack of program-data independence—In a file-processing environment, the
structural layout of each file is embedded in the application programs. That
is, the metadata of a file is fully coded in each application program that uses
the particular file. Perhaps the most often-cited example of the program-data
dependence problem occurred during the file-processing era, when it was
common for an organization to expand the zip code field from five digits to
nine digits. In order to implement this change, every program in the
employee, student, and alumni file-processing systems containing the zip
code field had to be identified (often a time-consuming process itself) and
then modified to conform to the new file structure. This not only required
modification of each program and its documentation but also recompiling and
retesting of the program. Likewise, if a decision was made to change the
organization of a file from indexed to random, since the structure of the file
was mapped into every program using the file, every program using the file
had to be modified. Identifying all the affected programs for corrective action
was not a simple task, either. Thus, because of lack of program-data inde-
pendence, file-processing systems lack flexibility since they are not amenable
to structural changes in data. Program-data dependence also exacerbates data
security and confidentiality problems.
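The zip code example can be sketched in Python to contrast a program with a hard-coded record layout against one that reads the layout from shared metadata. The layout dictionaries are a toy stand-in for the idea of keeping structural descriptions outside application programs, not a model of how any particular DBMS stores its catalog; all names are hypothetical.

```python
# A hypothetical fixed-width record layout: name (10 characters) + zip code.
old_record = "Lee".ljust(10) + "45221"      # five-digit zip
new_record = "Lee".ljust(10) + "452210001"  # zip expanded to nine digits


def zip_from_record(record):
    """Program-data dependent: the zip field's offsets are hard-coded in the program."""
    return record[10:15]


# A more independent style: programs look the layout up in shared metadata.
LAYOUT_V1 = {"name": (0, 10), "zip": (10, 15)}
LAYOUT_V2 = {"name": (0, 10), "zip": (10, 19)}  # only the metadata changes


def get_field(record, field, layout):
    start, end = layout[field]
    return record[start:end].rstrip()


print(zip_from_record(new_record))              # 45221 (silently truncated)
print(get_field(old_record, "zip", LAYOUT_V1))  # 45221
print(get_field(new_record, "zip", LAYOUT_V2))  # 452210001
```

Under the first style, every program containing a slice like record[10:15] must be found and changed; under the second, get_field stays as it is and only the shared layout is updated.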
It is only through attacking the problems of lack of program-data independence and
lack of integration of related data that the limitations of file-processing systems can be
eliminated. If a way is found to deal with these problems so as to establish centralized
control of data, then unnecessary redundancy can be reduced, data can be shared, stan-
dards can be enforced, security restrictions can be applied, and integrity can be main-
tained. One of the objectives of database systems is to integrate data without programmer
intervention in a way that eliminates data redundancy. The other objective of database
systems is to establish program-data independence, so that programs that access the data
are immune to changes in storage structure (how the data is physically organized) and
access technique.
The Time Life company experienced many of these problems in its early days. Time Life
was established in 1961 as a book-marketing division. It took its name from Time and
Life magazines, which at the time were two of the most popular weeklies on the market.
Time Life gained fame as a seller of book series that were mailed to households in
monthly installments, operating as book sales clubs. Most of the series were more or
less encyclopedic in nature (e.g., The LIFE History of the United States, The Time-Life
Encyclopedia of Gardening, The Great Cities, The American Wilderness, etc.), covering
the basics of each subject much as a series of lectures aimed at the general public
might. Over the years, more than 50 series were published.
During the 1970s and first half of the 1980s, Time Life exhibited all of the
characteristics of a file-processing system. A separate collection of files was
maintained for each book series. Thus, when the company sought to promote a new series
to its existing customer base, a customer who had already purchased, or was currently
subscribing to, several book series would receive multiple copies of the same glossy
brochure promoting the new series. In addition, it was not uncommon for a customer to
receive the same brochure at multiple addresses if that customer had used different
mailing addresses when subscribing to different publications. In the mid-1980s, the
company replaced its separate file-processing systems with an integrated database
system that eliminated much of the data duplication and lack of data integrity that
characterized its previous file-processing environment.
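The duplicate-mailing problem can be reduced to a toy Python sketch. The customer, series keys, and addresses are invented for illustration. The integrated version assumes the customer's two addresses have already been reconciled into a single customer record, which in practice is the hard part of such a migration.

```python
# Hypothetical per-series subscriber files in a file-processing system:
# each file keeps its own copy of the customer's name and address.
series_files = {
    "gardening": [("Ann Lee", "12 Oak St")],
    "great_cities": [("Ann Lee", "12 Oak St")],
    "us_history": [("Ann Lee", "PO Box 7")],  # same person, another address
}

# File-processing promotion run: one brochure per record in every file.
file_mailing = [rec for records in series_files.values() for rec in records]

# Integrated design: one customer record; subscriptions refer to it by key.
customers = {101: ("Ann Lee", "12 Oak St")}
subscriptions = [(101, "gardening"), (101, "great_cities"), (101, "us_history")]

# Promotion run over the integrated data: one brochure per distinct customer.
subscriber_ids = sorted({cust_id for cust_id, _series in subscriptions})
integrated_mailing = [customers[cust_id] for cust_id in subscriber_ids]

print(len(file_mailing))       # 3 brochures to the same person
print(len(set(file_mailing)))  # 2 distinct addresses
print(integrated_mailing)      # [('Ann Lee', '12 Oak St')]
```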

1.4 THE ANSI/SPARC THREE-SCHEMA ARCHITECTURE


In the 1970s, the Standards Planning and Requirements Committee (SPARC) of the
American National Standards Institute (ANSI) offered a solution to these problems by
proposing what came to be known as the ANSI/SPARC three-schema architecture.² The
ANSI/SPARC three-schema architecture, as illustrated in Figure 1.2, consists of three
perspectives of metadata in a database. The conceptual schema is the nucleus of the
three-schema architecture. Located between the external schema and internal schema, the
conceptual schema represents the global conceptual view of the structure of the entire
database for a community of users. By insulating applications and programs from changes
in physical storage structure and data access strategy, the conceptual schema achieves
program-data independence in a database environment.

² In a database context, the word “schema” stands for “description of metadata.”

Database Systems: Architecture and Components

FIGURE 1.2 The ANSI/SPARC three-schema architecture

The external schema³ consists of a number of different user views⁴ or subschemas, each
describing portions of the database of interest to a particular user or group of users.
The conceptual schema, which represents the global view of the structure of the entire
database for a community of users, is the consolidation of these user views. The data
specification (metadata) for the entire database is captured by the conceptual schema.
The internal schema describes the physical structure of the stored data (how the data is
actually laid out on storage devices) and the mechanism used to implement the access
strategies (indexes, hashed addresses, and so on). The internal schema is concerned with
the efficiency of data storage and access mechanisms in the database. Thus, the internal
schema is technology dependent, while the conceptual schema and external schemas are
technology independent. In principle, user views are generated on demand through logical
reference to data items in the conceptual schema, independent of the logical or physical
structure of the data.

³ While an external schema is technically a collection of external subschemas or views,
the term “external schema” is used here in the context of either an individual user view
or a collection of different user views.
⁴ Informally, a “view” is a term that describes the information of interest to a user or a
group of users, where a user can be either an end user or a programmer. See Chapter 6
(Section 6.4) for a more precise definition of a “view.”
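As a rough analogy only (SQL products do not implement the ANSI/SPARC levels one-to-one), the three schemas can be approximated in SQLite through Python's standard `sqlite3` module. A base table plays the role of the conceptual schema, an index together with SQLite's own page storage stands in for the internal schema, and views act as external schemas. The table, index, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the global, logical description of the data.
conn.execute("""
    CREATE TABLE customer (
        cust_id      INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        street       TEXT,
        zip          TEXT,
        credit_limit REAL
    )
""")

# Internal schema (in part): an access-path decision that no user query names.
conn.execute("CREATE INDEX idx_customer_zip ON customer(zip)")

# External schemas: user views exposing only what each user group needs.
conn.execute("CREATE VIEW mailing_view AS SELECT name, street, zip FROM customer")
conn.execute("CREATE VIEW credit_view AS SELECT cust_id, credit_limit FROM customer")

conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann Lee", "12 Oak St", "30303", 500.0),
     (2, "Raj Patel", "9 Elm Ave", "10001", 1500.0)],
)

print(conn.execute("SELECT * FROM mailing_view ORDER BY zip").fetchall())
# [('Raj Patel', '9 Elm Ave', '10001'), ('Ann Lee', '12 Oak St', '30303')]
print(conn.execute("SELECT * FROM credit_view ORDER BY cust_id").fetchall())
# [(1, 500.0), (2, 1500.0)]
```

Neither view mentions `idx_customer_zip`; whether that index is used is decided by SQLite's query planner, not by the users of the views.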

1.4.1 Data Independence Defined


Data independence is the central concept driving a database system, and the very purpose
of a three-schema architecture is to enable data independence. The theme underlying the
concept of data independence is that when a schema at a lower level is changed, the
higher-level schemas themselves are unaffected by such changes. In other words, when a
change is made to storage structure or access strategy in the internal schema, there will be
no need to make any changes in the conceptual or external schemas; only the mapping
information between a schema and the higher-level schemas (i.e., the information used to
transform requests and results between levels of schema) needs to be changed. Only then
can it be said that data independence is fully supported.
For instance, suppose direct access to data ordered by zip code is required. This
requirement may be recorded as “direct access” in the conceptual schema, while a certain
type of indexing technique is employed in the internal schema. This fact is kept as part of
the mapping information, so that if the indexing technique in the internal schema is later
changed, only the mapping information changes, and the conceptual schema is unaffected.
The external views, moreover, are completely shielded from any knowledge of this change in
the internal schema. That is, specifying and implementing a change in the indexing
mechanism on zip code does not require any modification or retesting of the application
programs that use the external views containing zip code.
This capacity to change the internal schema without having to change the conceptual or
external schema is sometimes referred to as physical data independence. The internal
schema may be changed when certain file structures are reorganized or new indexes are
created to improve database performance. Physical data independence enables such changes
to be implemented without requiring any corresponding changes in the conceptual or
external schemas.
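The zip-code scenario described above can be mimicked in SQLite via Python's standard `sqlite3` module. In this hypothetical sketch, `application_query` stands for an application program that reads only the external view `zip_view`. The internal schema is then changed by replacing a single-column index on zip with a composite index. SQLite's query planner, not the program, decides which index (if any) to use, so the program text and its results stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO customer (name, zip) VALUES (?, ?)",
    [("Ann Lee", "30303"), ("Raj Patel", "10001"), ("Mei Chen", "30303")],
)
# External view used by the (hypothetical) application program.
conn.execute("CREATE VIEW zip_view AS SELECT name, zip FROM customer")


def application_query(conn: sqlite3.Connection) -> list:
    """Names only the view and its columns, never tables or indexes."""
    return conn.execute(
        "SELECT name FROM zip_view WHERE zip = ? ORDER BY name", ("30303",)
    ).fetchall()


# Original internal schema: a single-column index on zip.
conn.execute("CREATE INDEX idx_zip ON customer(zip)")
before = application_query(conn)

# Internal-schema change: swap in a composite index on (zip, name).
conn.execute("DROP INDEX idx_zip")
conn.execute("CREATE INDEX idx_zip_name ON customer(zip, name)")
after = application_query(conn)

print(before)           # [('Ann Lee',), ('Mei Chen',)]
print(before == after)  # True
```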
Likewise, enhancements to the conceptual schema in the form of growth or restructuring
will have no impact on any of the external views (subschemas), since all external views
are spawned from the conceptual schema only by logical reference to elements in the
conceptual schema. For instance, redefinition of the logical structures of a data model
(such as adding or restructuring tables in a relational database) may sometimes be in
order. Since the external views (subschemas) are generated exclusively by logical
references, the user views are immune to such logical design changes in the conceptual
schema. This property is often called logical data independence. Logical data
independence also enables a user (external) view to be immune to changes in the other
user views.
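A minimal sketch of logical data independence, again using Python's standard `sqlite3` module. In this hypothetical example, the conceptual schema is restructured by splitting a `customer` table into `person` and `address` tables. Only the view definition (the mapping between the external view and the conceptual schema) is rewritten; `application_query` and its results are unchanged. In SQL systems this independence is only partial: the mapping must be redefined by hand, and not every restructuring can be hidden behind a view.

```python
import sqlite3


def application_query(conn: sqlite3.Connection) -> list:
    """Written once against the external view and never edited."""
    return conn.execute("SELECT name, zip FROM mailing_view ORDER BY name").fetchall()


conn = sqlite3.connect(":memory:")

# Original conceptual schema: a single customer table.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann Lee", "30303"), (2, "Raj Patel", "10001")],
)
conn.execute("CREATE VIEW mailing_view AS SELECT name, zip FROM customer")
before = application_query(conn)

# Restructured conceptual schema: addresses move to their own table.
# Only the view definition (the mapping) is rewritten; the view keeps
# its name and column names.
conn.executescript("""
    CREATE TABLE person (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (
        cust_id INTEGER PRIMARY KEY REFERENCES person(cust_id),
        zip     TEXT
    );
    INSERT INTO person  SELECT cust_id, name FROM customer;
    INSERT INTO address SELECT cust_id, zip  FROM customer;
    DROP VIEW mailing_view;
    DROP TABLE customer;
    CREATE VIEW mailing_view AS
        SELECT p.name AS name, a.zip AS zip
        FROM person AS p JOIN address AS a ON a.cust_id = p.cust_id;
""")
after = application_query(conn)

print(after)            # [('Ann Lee', '30303'), ('Raj Patel', '10001')]
print(before == after)  # True
```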
A file-processing system, in contrast, may be viewed as a two-schema architecture
consisting of the internal schema and the programmer’s view (external schema), as shown
