
DBMS

UNIT 1 Introduction to Database Management Systems


Objectives
At the end of this chapter the reader will be able to:

Distinguish between data, information, and knowledge

Distinguish between a file processing system and a DBMS

Describe a DBMS and its advantages and disadvantages

Describe database users, including the database administrator

Describe data models, schemas and instances.

Describe DBMS Architecture & Data Independence

Describe Data Languages


Introduction
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. By data, we mean known facts that can be
recorded and that have implicit meaning; a database is a collection of such related data. The
primary goal of a DBMS is to provide a way to store and retrieve database information that is
both convenient and efficient. Database systems are designed to manage large bodies of
information. Management of data involves both defining structures for storage of information
and providing mechanisms for the manipulation of information. In addition, the database system
must ensure the safety of the information stored, despite system crashes or attempts at
unauthorized access. If data are to be shared among several users, the system must avoid
possible anomalous results.
Data Processing Vs. Data Management Systems
Although Data Processing and Data Management Systems both refer to functions that take raw
data and transform it into usable information, the usage of the terms is very different. Data
Processing is the term generally used to describe what was done by large mainframe computers
from the late 1940s until the early 1980s (and which continues to be done in most large
organizations to a greater or lesser extent even today): large volumes of raw transaction data
were fed into programs that updated a master file, with fixed-format reports written to paper.
The term Data Management Systems refers to an expansion of this concept, where the raw
data, previously copied manually from paper to punched cards and later into data-entry
terminals, is now fed into the system from a variety of sources, including ATMs, EFT, and direct
customer entry through the Internet. The master file concept has been largely displaced by
database management systems, and static reporting has been replaced or augmented by ad-hoc
reporting and direct inquiry, including downloading of data by customers. The ubiquity of the
Internet and the personal computer has been the driving force in the transformation of Data
Processing into the more global concept of Data Management Systems.
File Oriented Approach
The earliest business computer systems were used to process business records and produce
information. They were generally faster and more accurate than equivalent manual systems.
These systems stored groups of records in separate files, and so they were called file processing
systems. In a typical file processing system, each department has its own files, designed
specifically for its own applications. The department itself, working with the data processing
staff, sets policies or standards for the format and maintenance of its files.
Programs are dependent on the files and vice versa; that is, when the physical format of a file
is changed, the program also has to be changed. Although the traditional file-oriented approach
to information processing is still widely used, it does have some very important disadvantages.
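The dependence of programs on the physical file format can be made concrete with a small sketch. The fixed-width record layout, the field widths, and the function name below are hypothetical and chosen only for illustration. The program hard-codes the column offsets of the old layout, so when the file format changes and the program does not, the salary is silently misread.

```python
# Hypothetical fixed-width layout: name in columns 0-9, salary in columns 10-15.
OLD_RECORD = "Smith".ljust(10) + "004500"

# The same employee after the file format widens the name field to 12 columns.
NEW_RECORD = "Smith".ljust(12) + "004500"


def parse_employee(record):
    """Parse a record using offsets hard-coded for the old layout."""
    name = record[0:10].strip()
    salary = int(record[10:16])
    return name, salary


# With the layout the program was written for, the record is read correctly.
old_result = parse_employee(OLD_RECORD)

# The file changed but the program did not: the salary field is now read
# from the wrong columns and a wrong value comes back without any error.
new_result = parse_employee(NEW_RECORD)

print(old_result)
print(new_result)
```

Every program that reads the file would need the same correction, which is the maintenance burden the file-oriented approach carries.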
Characteristics
Traditionally, data was organized in files. The DBMS was a new concept at the time, and research
went into making it overcome the deficiencies of the traditional style of data management. A
modern DBMS has the following characteristics:
Real-world entity: A modern DBMS models real-world entities in its design, together with their
attributes and behavior. For example, a school database may use students as an entity and their
age as an attribute.
Relation-based tables: A DBMS allows entities and the relations among them to be represented as
tables. This simplifies how data is stored. A user can understand the architecture of a database
just by looking at the table names.
Isolation of data and application: A database system is distinct from the data it holds. The
database system is the active entity, while the data is the passive one on which it works and
which it organizes. A DBMS also stores metadata, which is data about data, to support its own
operation.
Less redundancy: A DBMS follows the rules of normalization, which split a relation when any of
its attributes holds redundant values. Normalization is a mathematically grounded process that
leaves the database with as little redundancy as possible.

Consistency: A DBMS maintains consistency, which earlier data-storing applications such as file
processing systems do not guarantee. Consistency is a state in which every relation in the
database remains consistent. Methods and techniques exist that can detect an attempt to leave
the database in an inconsistent state.
Query Language: A DBMS is equipped with a query language, which makes it more efficient to
retrieve and manipulate data. A user can apply as many different filtering options as he or she
wants. This was not possible with traditional file processing systems.
ACID Properties: A DBMS follows the ACID properties, which stand for Atomicity, Consistency,
Isolation and Durability. These concepts are applied to transactions, which manipulate data in
the database. The ACID properties keep the database in a healthy state in a multi-transactional
environment and in case of failure.
Multiuser and Concurrent Access: A DBMS supports a multi-user environment and allows users to
access and manipulate data in parallel. There are restrictions on transactions when they attempt
to handle the same data item, but users are generally unaware of them.
Multiple views: A DBMS offers multiple views for different users. A user in the sales department
will have a different view of the database than a person working in the production department.
This gives each user a focused view of the database according to their requirements.
Security: Features like multiple views offer security to some extent, since users are unable to
access the data of other users and departments. A DBMS offers methods to impose constraints
while entering data into the database and while retrieving it at a later stage. A DBMS offers
many different levels of security features, which enable multiple users to have different views
with different features.
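As a rough illustration of the "less redundancy" characteristic, the sketch below splits a flat set of records into two relations. The student and department data are made up, and this shows only the idea of decomposition, not a full normalization procedure.

```python
# Hypothetical flat records: the department phone number is repeated
# for every student who belongs to that department.
flat_rows = [
    {"student": "Asha", "age": 19, "dept": "Physics", "dept_phone": "555-0101"},
    {"student": "Ravi", "age": 20, "dept": "Physics", "dept_phone": "555-0101"},
    {"student": "Meena", "age": 21, "dept": "History", "dept_phone": "555-0199"},
]

# Split into two relations: each department's phone is now stored exactly once.
departments = {row["dept"]: row["dept_phone"] for row in flat_rows}
students = [
    {"student": row["student"], "age": row["age"], "dept": row["dept"]}
    for row in flat_rows
]

# The original rows can be rebuilt by joining on the department name,
# so no information was lost by the split.
rebuilt = [dict(s, dept_phone=departments[s["dept"]]) for s in students]

print(departments)
print(rebuilt == flat_rows)
```

After the split, changing a department's phone number means updating one entry instead of one entry per student.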
Concurrent Use
A database system allows several users to access the database concurrently. Answering different
questions from different users with the same (base) data is a central aspect of an information
system. Such concurrent use of data increases the economy of a system.
Structured and Described Data
A fundamental feature of the database approach is that the database systems do not only contain
the data but also the complete definition and description of these data. These descriptions are
basically details about the extent, the structure, the type and the format of all data and,
additionally, the relationship between the data. This kind of stored data is called metadata ("data
about data").
Separation of Data and Applications
As described under structured data above, the structure of a database is described through
metadata, which is also stored in the database. Application software does not need any
knowledge about the physical data storage, such as encoding, format, or storage location. It
communicates only with the database management system (DBMS) via a standardized interface,
with the help of a standardized language like SQL.
Data Integrity
Data integrity is a byword for the quality and the reliability of the data in a database system.
In a broader sense, data integrity also includes the protection of the database from
unauthorized access (confidentiality) and unauthorized changes.
Transactions
A transaction is a bundle of actions performed within a database that brings it from one
consistent state to a new consistent state.
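A minimal sketch of this idea follows, using SQLite through Python's standard `sqlite3` module as a stand-in DBMS. The `account` table, the account numbers, and the `transfer` helper are assumptions made for illustration. A transfer consists of two updates; if the second one violates a constraint, the first is rolled back as well, so the database never leaves a consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A-101", "A-215", 200)       # both updates succeed
failed = transfer(conn, "A-101", "A-215", 1000)  # debit would overdraw A-101

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
print(ok, failed, balances)
```

The failed transfer leaves both balances exactly as they were after the first transfer, even though its credit step had already run before the debit was rejected.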
Data Persistence
Data persistence means that in a DBMS all data is maintained as long as it is not deleted
explicitly. The life span of data needs to be determined directly or indirectly by the user and
must not be dependent on system features. Additionally, data once stored in a database must not
be lost. Changes to a database that are made by a transaction are persistent. When a transaction
is finished, even a system crash cannot put the data in danger.
Advantages and Disadvantages of a DBMS
Using a DBMS to manage data has many advantages:
Data independence: Application programs should be as independent as possible from details of
data representation and storage. The DBMS can provide an abstract view of the data to insulate
application code from such details.
Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
Data integrity and security: If data is always accessed through the DBMS, the DBMS can
enforce integrity constraints on the data. For example, before inserting salary information for an
employee, the DBMS can check that the department budget is not exceeded. Also, the DBMS
can enforce access controls that govern what data is visible to different classes of users.
Data administration: When several users share the data, centralizing the administration of data
can offer significant improvements. Experienced professionals, who understand the nature of the
data being managed, and how different groups of users use it, can be responsible for organizing
the data representation to minimize redundancy and fine-tuning the storage of the data to make
retrieval efficient.

Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in
such a manner that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures.
Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS. This, in
conjunction with the high-level interface to the data, facilitates quick development of
applications. Such applications are also likely to be more robust than applications developed
from scratch because many important tasks are handled by the DBMS instead of being
implemented by the application.
Disadvantages of a DBMS
Danger of Overkill: For small and simple single-user applications, a database system is often
not advisable.
Complexity: A database system creates additional complexity and requirements. The supply and
operation of a database management system with several users and databases is quite costly and
demanding.
Qualified Personnel: The professional operation of a database system requires appropriately
trained staff. Without a qualified database administrator, nothing will work for long.
Costs: The use of a database system generates new costs for the system itself, and also for
additional hardware and the more complex handling of the system.
Lower Efficiency: A database system is multi-purpose software, which is often less efficient than
specialized software that is produced and optimized for exactly one problem.
Instances and Schemas
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database. The overall
design of the database is called the database schema. Schemas are changed infrequently, if at all.
The concept of database schemas and instances can be understood by analogy to a program
written in a programming language. A database schema corresponds to the variable declarations
(along with associated type definitions) in a program. Each variable has a particular value at a
given instant. The values of the variables in a program at a point in time correspond to an
instance of a database schema. A database schema is therefore the skeleton structure of the
database, and it represents the logical view of the entire database. It describes how the data
is organized and how the relations among the data are associated. It formulates all the
constraints that are to be applied to the data in the relations that reside in the database. A
database schema defines its entities and the relationships among them. It is a descriptive
detail of the database, which can be depicted by means of schema diagrams. Database designers
draw up the schema to help programmers understand all aspects of the database.
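The difference between a schema and an instance can be seen in a short sketch, here using SQLite through Python's `sqlite3` module. The `student` table and its rows are hypothetical. The stored table definition stays the same while the set of rows changes with every insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")


def schema(conn):
    """Return the stored definition of the student table (the schema)."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]


def instance(conn):
    """Return the rows currently in the student table (the instance)."""
    return conn.execute("SELECT name, age FROM student ORDER BY name").fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO student VALUES ('Asha', 19)")
conn.execute("INSERT INTO student VALUES ('Ravi', 20)")

schema_after = schema(conn)
instance_after = instance(conn)

print(schema_before == schema_after)    # the schema did not change
print(instance_before, instance_after)  # the instance did
```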
Database systems have several schemas, partitioned according to the levels of abstraction. The
physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas at
the view level, sometimes called subschemas, that describe different views of the database. Of
these, the logical schema is by far the most important, in terms of its effect on application
programs, since programmers construct applications by using the logical schema. The physical
schema is hidden beneath the logical schema, and can usually be changed easily without
affecting application programs. Application programs are said to exhibit physical data
independence if they do not depend on the physical schema, and thus need not be rewritten if
the physical schema changes.
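One way to see physical data independence in practice is to change an access path and rerun an unchanged query. The sketch below assumes SQLite via Python's `sqlite3` module; the table contents and the index name are made up. Adding an index is a physical-level change, yet the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-%03d" % i, 100 + i) for i in range(200)],
)

# The application's query is written against the logical schema only.
QUERY = "SELECT balance FROM account WHERE account_number = 'A-042'"
result_before = conn.execute(QUERY).fetchall()

# Physical-level change: add an access path. The logical schema is untouched.
conn.execute("CREATE INDEX account_number_idx ON account (account_number)")
result_after = conn.execute(QUERY).fetchall()

print(result_before, result_after)
```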
DBMS Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
To illustrate the concept of a data model, we outline two data models in this section: the
entity-relationship model and the relational model. Both provide a way to describe the design of
a database at the logical level. A data model tells how the logical structure of a database is
modeled. Data models are fundamental to introducing abstraction in a DBMS. They define how data
items are connected to each other and how they will be processed and stored inside the system.
The very first data models were flat data models, in which all the data was kept in the same
plane. Because earlier data models were less rigorous, they were prone to introducing a lot of
duplication and update anomalies.
Other Data Models:
The object-oriented data model is another data model that has seen increasing attention.
The object-oriented model can be seen as extending the E-R model with notions of encapsulation,
methods (functions), and object identity. The object-relational data model combines features of
the object-oriented data model and the relational data model. Semistructured data models permit
the specification of data where individual data items of the same type may have different sets
of attributes. This is in contrast with the data models mentioned earlier, where every data item
of a particular type must have the same set of attributes. The extensible markup language (XML)
is widely used to represent semistructured data.
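A small sketch of semistructured data in XML, parsed with Python's standard `xml.etree.ElementTree` module, is given below. The document and its element names are invented for illustration. Both items are of the same type (`person`), yet they carry different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two items of the same type with different attributes.
doc = """
<people>
  <person><name>Asha</name><email>asha@example.com</email></person>
  <person><name>Ravi</name><phone>555-0101</phone><phone>555-0102</phone></person>
</people>
""".strip()

root = ET.fromstring(doc)

# Collect the set of attribute names each person actually carries.
attribute_sets = [sorted({child.tag for child in person}) for person in root]

# The second person also has a repeated attribute (two phone numbers).
phones = [p.text for p in root[1].findall("phone")]

print(attribute_sets)
print(phones)
```

In a relational table, every row would instead need the same fixed list of columns.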
Historically, two other data models, the network data model and the hierarchical data model,
preceded the relational data model. These models were tied closely to the underlying
implementation, and complicated the task of modeling data. As a result they are little used now,
except in old database code that is still in service in some places.

DBMS Architecture

Three important characteristics of the database approach are (1) insulation of programs and data
(program-data and program-operation independence); (2) support of multiple user views; and (3)
use of a catalog to store the database description (schema). In this section we specify architecture
for database systems, called the three-schema architecture, which was proposed to help achieve
and visualize these characteristics.

The Three-Schema Architecture


The goal of the three-schema architecture is to separate the user applications and the physical
database. In this architecture, schemas can be defined at the following three levels:
The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of
data storage and access paths for the database.
The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types, relationships, user operations, and
constraints. A high-level data model or an implementation data model can be used at this level.
The external or view level includes a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and hides
the rest of the database from that user group. A high-level data model or an implementation data
model can be used at this level.
The three-schema architecture is a convenient tool for the user to visualize the schema levels in a
database system. Most DBMSs do not separate the three levels completely, but support the
three-schema architecture to some extent. Some DBMSs may include physical-level details in the
conceptual schema. In most DBMSs that support user views, external schemas are specified in the
same data model that describes the conceptual-level information. Some DBMSs allow different data
models to be used at the conceptual and external levels.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at
the physical level. In a DBMS based on the three-schema architecture, each user group refers
only to its own external schema. Hence, the DBMS must transform a request specified on an
external schema into a request against the conceptual schema, and then into a request on the
internal schema for processing over the stored database. If the request is a database retrieval, the
data extracted from the stored database must be reformatted to match the user's external view.
The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs, especially those that are meant to support
small databases, do not support external views. Even in such systems, however, a certain
amount of mapping is necessary to transform requests between the conceptual and internal
levels.
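The mapping from an external schema to the conceptual schema can be sketched with an SQL view, run here on SQLite through Python's `sqlite3` module. The `employee` table, its rows, and the `sales_staff` view are hypothetical. A request against the view is translated by the DBMS into a request against the underlying table, and the rest of the database stays hidden from that user group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full employee relation.
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000), ("Ravi", "Production", 48000),
     ("Meena", "Sales", 61000)],
)

# External level: the sales group sees only its own staff, without salaries.
conn.execute(
    "CREATE VIEW sales_staff AS SELECT name FROM employee WHERE dept = 'Sales'"
)

# The request is written against the view; the DBMS maps it onto the table.
sales_view = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()

# Column names exposed by the view (second field of each table_info row).
view_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_staff)")]

print(sales_view)
print(view_columns)
```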
The design of a Database Management System depends heavily on its architecture. It can be
centralized, decentralized, or hierarchical. DBMS architecture can be seen as single-tier or
multi-tier. An n-tier architecture divides the whole system into n related but independent
modules, which can be independently modified, altered, changed or replaced.
In 1-tier architecture, the DBMS is the only entity, and the user works directly on the DBMS.
Any changes made here are made directly on the DBMS itself. It does not provide handy tools for
end users, so it is mainly database designers and programmers who use single-tier architecture.
In 2-tier architecture, there must be an application through which the DBMS is used.
Programmers use 2-tier architecture when they access the DBMS by means of an application. Here
the application tier is entirely independent of the database in terms of operation, design and
programming.
3-tier architecture
The most widely used architecture is 3-tier architecture. It separates its tiers from each
other on the basis of the users: a database (data) tier, an application (middle) tier, and a
user (presentation) tier.

Database Languages
A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates. In practice, the data definition
and data manipulation languages are not two separate languages; instead they simply form parts
of a single database language, such as the widely used SQL language.

Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL). For instance, the following statement in the SQL language
defines the account table:
create table account (account_number char(10), balance integer)
Execution of the above DDL statement creates the account table. In addition, it updates a
special set of tables called the data dictionary or data directory. A data dictionary contains
metadata, that is, data about data. The schema of a table is an example of metadata. A database
system consults the data dictionary before reading or modifying actual data. We specify the
storage structure and access methods used by the database system by a set of statements in a
special type of DDL called a data storage and definition language.
These statements define the implementation details of the database schemas, which are
usually hidden from the users. The data values stored in the database must satisfy certain
consistency constraints. For example, suppose the balance on an account should not fall below
$100. The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated.
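As a sketch of how such a constraint might be declared and enforced, the example below runs a DDL statement with a CHECK clause on SQLite via Python's `sqlite3` module. The column name `account_number`, the primary key, and the sample row are assumptions added for illustration. An update that would take the balance below 100 is rejected by the database system itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table together with the consistency constraint.
conn.execute("""
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        balance        INTEGER CHECK (balance >= 100)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

# The constraint is checked on every update, not just on insertion.
try:
    conn.execute("UPDATE account SET balance = 50 WHERE account_number = 'A-101'")
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(update_rejected, balance)
```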
Data-Manipulation Language
Data manipulation is:
The retrieval of information stored in the database
The insertion of new information into the database
The deletion of information from the database
The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model. There are basically two types:
Procedural DMLs require a user to specify what data are needed and how to get those data.
Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data
are needed without specifying how to get those data. Declarative DMLs are usually easier to
learn and use than are procedural DMLs. However, since a user does not have to specify how to
get the data, the database system has to figure out an efficient means of accessing data. The DML
component of the SQL language is nonprocedural. A query is a statement requesting the retrieval
of information. The portion of a DML that involves information retrieval is called a query
language. Although technically incorrect, it is common practice to use the terms query language
and data manipulation language synonymously. This query in the SQL language finds the name
of the customer whose customer_id is 192-83-7465:

select customer.customer_name from customer where customer.customer_id = '192-83-7465'

The query specifies that those rows from the table customer where the customer_id is
192-83-7465 must be retrieved, and the customer_name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following query
finds the balance of all accounts owned by the customer with customer_id 192-83-7465.
select account.balance from depositor, account where depositor.customer_id = '192-83-7465'
and depositor.account_number = account.account_number
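The two queries can be tried out on a small made-up database. The sketch below uses SQLite through Python's `sqlite3` module; the table layouts follow the queries, while the rows, customer names, and account numbers are hypothetical sample data. Attribute names are written with underscores so that they are valid SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'),
                                 ('019-28-3746', 'Smith');
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201'),
                                 ('019-28-3746', 'A-215');
""")

# Single-table query: the name of one customer.
names = conn.execute(
    "SELECT customer.customer_name FROM customer "
    "WHERE customer.customer_id = '192-83-7465'"
).fetchall()

# Two-table query: balances of all accounts owned by that customer.
balances = conn.execute(
    "SELECT account.balance FROM depositor, account "
    "WHERE depositor.customer_id = '192-83-7465' "
    "AND depositor.account_number = account.account_number "
    "ORDER BY account.balance"
).fetchall()

print(names)
print(balances)
```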
There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating
data. At the physical level, we must define algorithms that allow efficient access to data. At
higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact
efficiently with the system. The query processor component of the database system translates
DML queries into sequences of actions at the physical level of the database system.
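The contrast between procedural and declarative access, and the role of the query processor, can be sketched as follows. The example assumes SQLite via Python's `sqlite3` module, and the table and rows are made up. The loop spells out how to find the rows, while the SQL statement only says what is wanted. `EXPLAIN QUERY PLAN` is SQLite's own way of reporting the strategy its query processor chose; its output format varies between versions, so it is printed here without further interpretation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-305", 350)],
)

# Procedural style: the caller spells out how to find the rows (scan and test).
procedural = []
for number, balance in conn.execute("SELECT account_number, balance FROM account"):
    if balance > 400:
        procedural.append(number)
procedural.sort()

# Declarative style: the caller states what is wanted; the system picks the method.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account "
        "WHERE balance > 400 ORDER BY account_number"
    )
]

# Ask the query processor which physical-level steps it plans to take.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT account_number FROM account WHERE balance > 400"
).fetchall()

print(procedural, declarative)
print(plan)
```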

Data Dictionary
We can define a data dictionary as a DBMS component that stores the definition of data
characteristics and relationships. You may recall that such data about data were labeled
metadata. The DBMS data dictionary provides the DBMS with its self-describing characteristic.
In effect, the data dictionary resembles an X-ray of the company's entire data set, and is a
crucial element in the data administration function. Two main types of data dictionary exist:
integrated and stand-alone. An integrated data dictionary is included with the DBMS. For
example, all relational DBMSs include a built-in data dictionary or system catalog that is
frequently accessed and updated by the RDBMS. Other DBMSs, especially older types, do not
have a built-in data dictionary; instead, the DBA may use third-party stand-alone data
dictionary systems. Data dictionaries can also be classified as active or passive. An active
data dictionary is automatically updated by the DBMS with every database access, thereby
keeping its access information up to date. A passive data dictionary is not updated
automatically and usually requires a batch process to be run. Data dictionary access
information is normally used by the DBMS for query optimization purposes. The data dictionary's
main function is to store the description of all objects that interact with the database.
Integrated data dictionaries tend to limit their metadata to the data managed by the DBMS.
Stand-alone data dictionary systems are usually more flexible and allow the DBA to describe and
manage all the organization's data, whether or not they are computerized. Whatever the data
dictionary's format, its existence provides database designers and end users with a much
improved ability to communicate. In addition, the data dictionary is the tool that helps the
DBA to resolve data conflicts. Although there is no standard format for the information stored
in the data dictionary, several features are common. For example, the data dictionary typically
stores descriptions of all:

Data elements that are defined in all tables of all databases. Specifically, the data dictionary
stores the names, data types, display formats, internal storage formats, and validation rules.
The data dictionary tells where an element is used, by whom it is used, and so on.

Tables defined in all databases. For example, the data dictionary is likely to store the name of
the table creator, the date of creation, access authorizations, the number of columns, and so on.

Indexes defined for each database table. For each index, the DBMS stores at least the index
name, the attributes used, the location, specific index characteristics, and the creation date.

Defined databases: who created each database, the date of creation, where the database is
located, who the DBA is, and so on.

End users and administrators of the database.

Programs that access the database, including screen formats, report formats, application
formats, SQL queries, and so on.

Access authorizations for all users of all databases.

Relationships among data elements: which elements are involved, whether the relationship is
mandatory or optional, the connectivity and cardinality, and so on.

If the data dictionary can be organized to include data external to the DBMS itself, it becomes
an especially flexible tool for more general corporate resource management. The management of
such an extensive data dictionary thus makes it possible to manage the use and allocation of all
of the organization's information, regardless of whether it has its roots in the database data.
This is why some managers consider the data dictionary to be the key element of the information
resource management function. And this is also why the data dictionary might be described as the
information resource dictionary. The metadata stored in the data dictionary is often the basis
for monitoring database use and for assigning access rights to database users. The information
stored in the data dictionary is usually based on the relational table format, thus enabling
the DBA to query it with SQL commands. For example, SQL commands can be used to extract
information about the users of a specific table or about the access rights of a particular
user.
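Querying a data dictionary with SQL can be sketched on SQLite, whose built-in catalog is a table named `sqlite_master`. The example uses Python's `sqlite3` module, and the `customer` table and index are hypothetical. SQLite has no user accounts, so this catalog holds table and index definitions but no access rights; other DBMSs expose such information through their own catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)"
)
conn.execute("CREATE INDEX customer_name_idx ON customer (customer_name)")

# The catalog is itself a table, so it is queried with ordinary SQL.
# Internal objects, whose names start with "sqlite", are filtered out.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite%' ORDER BY name"
).fetchall()

# Column-level metadata: name and declared type of each column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(catalog)
print(columns)
```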

UNIT 2
Objectives
At the end of this chapter the reader will be able to:

Describe data modeling and the Entity-Relationship model

Distinguish between entity sets, weak entities, and strong entities

Describe the relational model and relational constraints

Describe relational model concepts


Introduction
A data model is a conceptual representation of the data structures that are required by a database.
The data structures include the data objects, the associations between data objects, and the rules
which govern operations on the objects. As the name implies, the data model focuses on what
data is required and how it should be organized rather than what operations will be performed on
the data. To use a common analogy, the data model is equivalent to an architect's building plans.
A data model is independent of hardware or software constraints. Rather than try to represent the
data as a database would see it, the data model focuses on representing the data as the user sees it
in the "real world". It serves as a bridge between the concepts that make up real-world events and
processes and the physical representation of those concepts in a database.
Components of a Data Model
The data model gets its inputs from the planning and analysis stage. Here the modeler, along
with analysts, collects information about the requirements of the database by reviewing existing
documentation and interviewing end-users. The data model has two outputs. The first is an
entity-relationship diagram, which represents the data structures in a pictorial form. Because
the diagram is easily learned, it is a valuable tool for communicating the model to the
end-user. The second component is a data document. This is a document that describes in detail
the data objects, relationships, and rules required by the database. This document provides the
detail required by the database developer to construct the physical database.
Why is Data Modeling Important?
Data modeling is probably the most labor-intensive and time-consuming part of the
development process. Why bother, especially if you are pressed for time? A common response by
practitioners who write on the subject is that you should no more build a database without a
model than you should build a house without blueprints. The goal of the data model is to make
sure that all data objects required by the database are completely and accurately represented.
Because the data model uses easily understood notations and natural language, it can be
reviewed and verified as correct by the end-users. The data model is also detailed enough to be
used by the database developers as a "blueprint" for building the physical database. The
information contained in the data model will be used to define the relational tables, primary and
foreign keys, stored procedures, and triggers.
A database model or database schema is the structure or format of a database, described in a
formal language supported by the database management system. In other words, a "database
model" is the application of a data model when used in conjunction with a database management
system.

A database model is a theory or specification describing how a database is structured and used.
Several such models have been suggested.
Common models include:
Hierarchical model
Network model
Relational model
Entity-relationship model
Object-relational model
Object model
A data model is not just a way of structuring data: it also defines a set of operations that can be
performed on the data. The relational model, for example, defines operations such as select,
project, and join. Although these operations may not be explicit in a particular query language,
they provide the foundation on which a query language is built.
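As a rough illustration of the three operations named above, the following sketch models a relation as a list of Python dictionaries. The sample data and the function names select, project and join are assumptions made for this example; they are not taken from any particular DBMS.

```python
# Hypothetical sample relations, each modeled as a list of dict rows.
employees = [
    {"emp_id": 1, "name": "Brown", "loc_id": 10},
    {"emp_id": 2, "name": "Smith", "loc_id": 20},
    {"emp_id": 3, "name": "Jones", "loc_id": 10},
]
locations = [
    {"loc_id": 10, "city": "London"},
    {"loc_id": 20, "city": "Cheam"},
]


def select(relation, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def join(left, right, attribute):
    """Equi-join two relations on an attribute they share."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[attribute] == right_row[attribute]
    ]


# Names of the employees located in London.
london_names = project(
    select(join(employees, locations, "loc_id"), lambda r: r["city"] == "London"),
    ["name"],
)
print(london_names)
```

A query language such as SQL would express the same request declaratively; the point of the sketch is only that the request can be built from these three operations.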
Models
Various techniques are used to model data structure. Most database systems are built around one
particular data model, although it is increasingly common for products to offer support for more
than one model. For any one logical model various physical implementations may be possible,
and most products will offer the user some level of control in tuning the physical
implementation, since the choices that are made have a significant effect on performance. An
example of this is the relational model: all serious implementations of the relational model allow
the creation of indexes which provide fast access to rows in a table if the values of certain
columns are known.
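A minimal sketch of this kind of physical tuning, using Python's built-in sqlite3 module. The table employee, its columns and the index name are hypothetical. The index changes how rows are located, not which rows a query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, surname TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, surname, city) VALUES (?, ?, ?)",
    [(1, "Brown", "London"), (2, "Smith", "Cheam"), (3, "Brown", "Cheam")],
)

# Physical tuning choice: an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_employee_surname ON employee (surname)")

rows = conn.execute(
    "SELECT emp_id FROM employee WHERE surname = ? ORDER BY emp_id", ("Brown",)
).fetchall()
print(rows)
```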

The flat (or table) model consists of a single, two-dimensional array of data elements, where all
members of a given column are assumed to be similar values, and all members of a row are
assumed to be related to one another. For instance, a table might have columns for name and
password as part of a system security database. Each row would have the specific password
associated with an individual user. Columns of the table often have a type associated with them,
defining them as character data, date or time information, integers, or floating point numbers.
This may not strictly qualify as a data model, as defined above.

Hierarchical model

Hierarchical Model.[1]
In a hierarchical model, data is organized into a tree-like structure, implying a single upward link
in each record to describe the nesting, and a sort field to keep the records in a particular order in
each same-level list. Hierarchical structures were widely used in the early mainframe database
management systems, such as the Information Management System (IMS) by IBM, and now
describe the structure of XML documents. This structure allows a 1:N relationship between
two types of data. It describes many real-world relationships very efficiently:
recipes, tables of contents, the ordering of paragraphs or verses, and any other nested and sorted information.
However, the hierarchical structure is inefficient for certain database operations when a full path
(as opposed to upward link and sort field) is not also included for each record.
Parent-child relationship: A child may have only one parent, but a parent can have multiple
children. Parents and children are tied together by links called "pointers". A parent has a
list of pointers to each of its children.
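The parent-child structure can be sketched in a few lines of Python. The Record class and the book/chapter data are assumptions made for illustration; real hierarchical systems such as IMS store these links very differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Record:
    name: str
    parent: Optional["Record"] = None  # the single upward link
    children: List["Record"] = field(default_factory=list)  # parent's pointer list

    def add_child(self, name: str) -> "Record":
        child = Record(name, parent=self)
        self.children.append(child)
        # The sort field (here: the name) keeps each same-level list ordered.
        self.children.sort(key=lambda record: record.name)
        return child

    def path(self) -> str:
        """Follow the upward links to rebuild the full path from the root."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


book = Record("book")
chapter2 = book.add_child("chapter-2")
chapter1 = book.add_child("chapter-1")
section = chapter1.add_child("section-1.1")

print(section.path())
print([child.name for child in book.children])
```

Rebuilding the path by walking upward on every request is the cost the text refers to when no full path is stored with each record.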
Network model

Network Model.[1]
The network model (defined by the CODASYL specification) organizes data using two
fundamental constructs, called records and sets. Records contain fields (which may be organized
hierarchically, as in the programming language COBOL). Sets (not to be confused with
mathematical sets) define one-to-many relationships between records: one owner, many
members. A record may be an owner in any number of sets, and a member in any number of sets.
The network model is a variation on the hierarchical model, to the extent that it is built on the
concept of multiple branches (lower-level structures) emanating from one or more nodes
(higher-level structures), while the model differs from the hierarchical model in that branches can be
connected to multiple nodes. The network model is able to represent redundancy in data more
efficiently than in the hierarchical model.

The operations of the network model are navigational in style: a program maintains a current
position, and navigates from one record to another by following the relationships in which the
record participates. Records can also be located by supplying key values.
Although it is not an essential feature of the model, network databases generally implement the
set relationships by means of pointers that directly address the location of a record on disk. This
gives excellent retrieval performance, at the expense of operations such as database loading and
reorganization.
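The navigational style can be illustrated with a toy in-memory store. The NetworkDB class, the set names and the record labels below are hypothetical; the sketch uses ordinary Python references where a real network database would typically use disk pointers.

```python
from collections import defaultdict


class NetworkDB:
    """Toy store of owner/member sets: one owner record, many member records."""

    def __init__(self):
        # sets[set_name][owner] -> list of member records, in insertion order
        self.sets = defaultdict(lambda: defaultdict(list))

    def connect(self, set_name, owner, member):
        self.sets[set_name][owner].append(member)

    def members(self, set_name, owner):
        return list(self.sets[set_name].get(owner, []))

    def owner_of(self, set_name, member):
        for owner, members in self.sets[set_name].items():
            if member in members:
                return owner
        return None


db = NetworkDB()
db.connect("supplies", "supplier:Acme", "part:BC109")
db.connect("supplies", "supplier:Acme", "part:BC108")
# The same part record is also a member of a second set.
db.connect("stocked_in", "warehouse:North", "part:BC109")

# Navigate: start at a supplier, move to its first part, then to that part's warehouse.
current = "supplier:Acme"
current = db.members("supplies", current)[0]
current = db.owner_of("stocked_in", current)
print(current)
```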
Most object databases use the navigational concept to provide fast navigation across networks of
objects, generally using object identifiers as "smart" pointers to related objects. Objectivity/DB,
for instance, implements named 1:1, 1:many, many:1 and many:many relationships that
can cross databases. Many object databases also support SQL, combining the strengths of both
models.
Relational model

Example of a Relational Model.[1]


The relational model was introduced by E.F. Codd in 1970[2] as a way to make database
management systems more independent of any particular application. It is a mathematical model
defined in terms of predicate logic and set theory.
The products that are generally referred to as relational databases in fact implement a model that
is only an approximation to the mathematical model defined by Codd. Three key terms are used
extensively in relational database models: relations, attributes, and domains. A relation is a table
with columns and rows. The named columns of the relation are called attributes, and the domain
is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a particular
entity (say, an employee) is represented in rows (also called tuples) and columns. Thus, the
"relation" in "relational database" refers to the various tables in the database; a relation is a set of
tuples. The columns enumerate the various attributes of the entity (the employee's name, address
or phone number, for example), and a row is an actual instance of the entity (a specific
employee) that is represented by the relation. As a result, each tuple of the employee table
represents various attributes of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic rules to
qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can't be
identical tuples or rows in a table. And third, each tuple will contain a single value for each of its
attributes.
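These rules can be mimicked, loosely, by treating a relation as a Python set of named tuples. The Employee type and its values are assumptions for the example. A set cannot hold two identical tuples, each field holds a single value, and attributes are read by name rather than by column position.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    emp_id: int
    name: str
    phone: str


relation = {
    Employee(1, "Brown", "555-0100"),
    Employee(2, "Smith", "555-0101"),
    Employee(1, "Brown", "555-0100"),  # identical tuple: the set keeps one copy
}

print(len(relation))
# Attributes are referenced by name, so column order does not matter here.
names = sorted(row.name for row in relation)
print(names)
```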
A relational database contains multiple tables, each similar to the one in the "flat" database
model. One of the strengths of the relational model is that, in principle, any value occurring in
two different records (belonging to the same table or to different tables), implies a relationship
among those two records. Yet, in order to enforce explicit integrity constraints, relationships
between records in tables can also be defined explicitly, by identifying or non-identifying
parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also
have a designated single attribute or a set of attributes that can act as a "key", which can be used
to uniquely identify each tuple in the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys are
commonly used to join or combine data from two or more tables. For example,
an Employee table may contain a column named Location which contains a value that matches
the key of a Location table. Keys are also critical in the creation of indexes, which facilitate fast
retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped
together into a compound key. It is not necessary to define all the keys in advance; a column can
be used as a key even if it was not originally intended to be one.
A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a
car's serial number) is sometimes called a "natural" key. If no natural key is suitable (think of the
many people named Brown), an arbitrary or surrogate key can be assigned (such as by giving
employees ID numbers). In practice, most databases have both generated and natural keys,
because generated keys can be used internally to create links between rows that cannot break,
while natural keys can be used, less reliably, for searches and for integration with other
databases. (For example, records in two independently developed databases could be matched up
by social security number, except when the social security numbers are incorrect, missing, or
have changed.)
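The Employee and Location example can be sketched with Python's built-in sqlite3 module. The column names and sample rows are assumptions; the text fixes only the table names and the idea that Employee.Location matches the key of Location. Surrogate keys identify the rows, while the natural attribute name is deliberately not unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,   -- surrogate key
        city        TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,   -- surrogate key
        name        TEXT NOT NULL,         -- not unique: many people are named Brown
        location    INTEGER NOT NULL REFERENCES location (location_id)
    );
    INSERT INTO location VALUES (1, 'London'), (2, 'Cheam');
    INSERT INTO employee VALUES (10, 'Brown', 1), (11, 'Brown', 2);
    """
)

# The key is used to combine data from the two tables.
rows = conn.execute(
    """
    SELECT e.employee_id, e.name, l.city
    FROM employee AS e
    JOIN location AS l ON e.location = l.location_id
    ORDER BY e.employee_id
    """
).fetchall()
print(rows)

# A link to a location that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (12, 'Jones', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```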
Dimensional model
The dimensional model is a specialized adaptation of the relational model used to represent data
in data warehouses in a way that data can be easily summarized using OLAP queries. In the
dimensional model, a database consists of a single large table of facts that are described using
dimensions and measures. A dimension provides the context of a fact (such as who participated,
when and where it happened, and its type) and is used in queries to group related facts together.
Dimensions tend to be discrete and are often hierarchical; for example, the location might
include the building, state, and country. A measure is a quantity describing the fact, such as
revenue. It's important that measures can be meaningfully aggregated - for example, the revenue
from different locations can be added together.
In an OLAP query, dimensions are chosen and the facts are grouped and added together to create
a summary.
The dimensional model is often implemented on top of the relational model using a star schema,
consisting of one table containing the facts and surrounding tables containing the dimensions.
Particularly complicated dimensions might be represented using multiple tables, resulting in
a snowflake schema.
A data warehouse can contain multiple star schemas that share dimension tables, allowing them
to be used together. Coming up with a standard set of dimensions is an important part of
dimensional modeling.
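A small sketch of the idea, assuming one hypothetical fact table (sales) and one dimension table (locations). The summarize function groups the facts by a chosen level of the dimension and adds up the measure, which is roughly what an OLAP query does.

```python
from collections import defaultdict

# Dimension table (hypothetical): location key -> hierarchical attributes.
locations = {
    1: {"building": "HQ", "state": "NY", "country": "USA"},
    2: {"building": "Annex", "state": "NY", "country": "USA"},
    3: {"building": "Depot", "state": "CA", "country": "USA"},
}

# Fact table: each fact carries a dimension key and a measure.
sales = [
    {"location_id": 1, "revenue": 100},
    {"location_id": 2, "revenue": 50},
    {"location_id": 3, "revenue": 70},
    {"location_id": 1, "revenue": 30},
]


def summarize(facts, dimension_table, key, level, measure):
    """Group facts by one level of a dimension and sum the measure."""
    totals = defaultdict(int)
    for fact in facts:
        group = dimension_table[fact[key]][level]
        totals[group] += fact[measure]
    return dict(totals)


by_state = summarize(sales, locations, "location_id", "state", "revenue")
by_country = summarize(sales, locations, "location_id", "country", "revenue")
print(by_state)
print(by_country)
```

Because revenue can be meaningfully added, the same facts roll up cleanly from building to state to country.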
Object database models

Example of an Object-Oriented Model.[1]



In recent years, the object-oriented paradigm has been applied to database technology, creating a
new programming model known as object databases. These databases attempt to bring the
database world and the application programming world closer together, in particular by ensuring
that the database uses the same type system as the application program. This aims to avoid the
overhead (sometimes referred to as the impedance mismatch) of converting information between
its representation in the database (for example as rows in tables) and its representation in the
application program (typically as objects). At the same time, object databases attempt to
introduce the key ideas of object programming, such as encapsulation and polymorphism, into the
world of databases.
A variety of ways have been tried for storing objects in a database. Some products have
approached the problem from the application programming end, by making the objects
manipulated by the program persistent. This also typically requires the addition of some kind of
query language, since conventional programming languages do not have the ability to find
objects based on their information content. Others have attacked the problem from the database
end, by defining an object-oriented data model for the database, and defining a database
programming language that allows full programming capabilities as well as traditional query
facilities.
Object databases suffered because of a lack of standardization: although standards were defined
by ODMG, they were never implemented well enough to ensure interoperability between
products. Nevertheless, object databases have been used successfully in many applications:
usually specialized applications such as engineering databases or molecular biology databases
rather than mainstream commercial data processing. However, object database ideas were picked
up by the relational vendors and influenced extensions made to these products and indeed to
the SQL language.

Unit-3
Entity-Relationship Model
Introduction
The entity-relationship (E-R) data model is based on a perception of a real world that consists of
a collection of basic objects, called entities, and of relationships among these objects. An entity is
a thing or object in the real world that is distinguishable from other objects. The
entity-relationship model is based on the notion of real-world entities and the relationships among them.
When a real-world scenario is formulated as a database model, the E-R model produces entity sets,
relationship sets, general attributes and constraints. For example, each person is an entity, and
bank accounts can be considered as entities. Entities are described in a database by a set of
attributes. For example, the attributes account-number and balance may describe one particular
account in a bank, and they form attributes of the account entity set. Similarly, attributes
customer-name, customer-street address and customer-city may describe a customer entity.
An extra attribute customer-id is used to uniquely identify customers (since it may be possible to
have two customers with the same name, street address, and city).
A unique customer identifier must be assigned to each customer. In the United States, many
enterprises use the social-security number of a person (a unique number the U.S. government
assigns to every person in the United States) as a customer identifier.
A relationship is an association among several entities. For example, a depositor relationship
associates a customer with each account that she has. The set of all entities of the same type and
the set of all relationships of the same type are termed an entity set and relationship set,
respectively.
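The banking example can be sketched as follows. The attribute names follow the text; the sample values and the helper accounts_of are hypothetical. Each entity set is a Python set of entities, and the depositor relationship set is a set of (customer-id, account-number) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    customer_name: str
    customer_street: str
    customer_city: str


@dataclass(frozen=True)
class Account:
    account_number: str
    balance: float


# Entity sets. The two customers differ only in customer_id.
customers = {
    Customer("C-1", "Brown", "Main St", "Harrison"),
    Customer("C-2", "Brown", "Main St", "Harrison"),
}
accounts = {Account("A-101", 500.0), Account("A-215", 700.0)}

# Relationship set: each pair associates one customer with one account.
depositor = {("C-1", "A-101"), ("C-1", "A-215"), ("C-2", "A-215")}


def accounts_of(customer_id):
    """Account numbers associated with a customer through depositor."""
    return sorted(acc for cust, acc in depositor if cust == customer_id)


print(len(customers))
print(accounts_of("C-1"))
```

Without customer_id the two Brown entities would be indistinguishable, which is the reason the text gives for adding the extra attribute.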
Objective

To explain the need for entity-relationship modeling

To explain the terms entity-relationship model, entity-relationship diagram

To define the terms entity type, entity, attribute, attribute value, primary key, relationship,
relationship type, inverse relationship type

To define the grammar of entity-relationship diagrams

To describe ways of classifying relationship types

To describe the terms unary, binary, ternary, degree, cardinality and optionality with regard to
relationship types

To give various examples of entity-relationship modeling

3.1 Introduction
When a relational database is to be designed, an entity-relationship diagram is drawn at an early
stage and developed as the requirements of the database and its processing become better
understood. Drawing an entity-relationship diagram aids understanding of an organization's data
needs and can serve as a schema diagram for the required system's database. A schema diagram
is any diagram that attempts to show the structure of the data in a database. Nearly all systems
analysis and design methodologies contain entity-relationship diagramming as an important part
of the methodology and nearly all CASE (Computer Aided Software Engineering) tools contain
the facility for drawing entity-relationship diagrams. An entity-relationship diagram could serve
as the basis for the design of the files in a conventional file-based system as well as for a schema
diagram in a database system.
The details of how to draw the diagrams vary slightly from one method to another, but they all
have the same basic elements: entity types, attributes and relationships. These three categories
are considered to be sufficient to model the essentially static data-based parts of any
organization's information processing needs.
3.2 Entity Types
An entity type is any type of object that we wish to store data about. Which entity types you
decide to include on your diagram depends on your application. In an accounting application for
a business you would store data about customers, suppliers, products, invoices and payments and
if the business manufactured the products, you would need to store data about materials and
production steps. Each of these would be classified as an entity type because you would want to
store data about each one. In an entity-relationship diagram an entity type is shown as a box. In
Fig. 3.1, CUSTOMER is an entity type. Each entity type is shown once. There may be many
entity types in an entity-relationship diagram. The name of an entity type is singular since it
represents a type.
An entity type is considered to be a set of objects. For this reason some people use the alternative
term entity set. An entity is simply one member or example or element or instance of the type or
set. So an entity is one individual within an entity type. For example, within the entity type
CUSTOMER, J. Smith might be one entity. He is an individual entity within the type, an element
in the set, an instance of the type 'customer'.

Fig. 3.1 An entity type CUSTOMER and one of its attributes Cus_no
3.3 Attributes
The data that we want to keep about each entity within an entity type is contained in attributes.
An attribute is some quality about the entities that we are interested in and want to hold on the
database. In fact we store the value of the attributes on the database. Each entity within the entity
type will have the same set of attributes, but in general different attribute values. For example the
value of the attribute ADDRESS for a customer J. Smith in a CUSTOMER entity type might be
'10 Downing St., London' whereas the value of the attribute 'address' for another customer J.
Major might be '22 Railway Cuttings, Cheam'.
There will be the same number of attributes for each entity within an entity type. That is one of
the characteristics of entity-relationship modeling and relational databases. We store the same
type of facts (attributes) about every entity within the entity type. If you knew that one of your
customers happened to be your cousin, there would be no attribute to store that fact in, unless
you wanted to have a 'cousin-yes-no' attribute, in which case nearly every customer would be a
no, which would be considered a waste of space.
3.4 Primary Key
Attributes can be shown on the entity-relationship diagram in an oval. In Fig. 3.1, one of the
attributes of the entity type CUSTOMER is shown. It is up to you which attributes you show on
the diagram. In many cases an entity type may have ten or more attributes. There is often not
room on the diagram to show all of the attributes, but you might choose to show an attribute that
is used to identify each entity from all the others in the entity type. This attribute is known as the
primary key. In some cases you might need more than one attribute in the primary key to identify
the entities.
In Fig. 3.1, the attribute CUS_NO is shown. Assuming the organization storing the data ensures
that each customer is allocated a different cus_no, that attribute could act as the primary key,
since it identifies each customer; it distinguishes each customer from all the rest. No two
customers have the same value for the attribute cus_no. Some people would say that an attribute
is a candidate for being a primary key because it is 'unique'. They mean that no two entities
within that entity type can have the same value of that attribute. In practice it is best not to use
that word because it has other connotations.
As already mentioned, you may need to have a group of attributes to form a primary key, rather
than just one attribute, although the latter is more common. For example if the organization using
the CUSTOMER entity type did not allocate a customer number to its customers, then it might
be necessary to use a composite key, for example one consisting of the attributes SURNAME
and INITIALS together, to distinguish between customers with common surnames such as
Smith. Even this may not be sufficient in some cases.
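A short sketch of a composite key, assuming a hypothetical in-memory customer file keyed on the pair (SURNAME, INITIALS). It also shows the weakness just mentioned: a second J. Smith cannot be stored. The addresses are made-up sample values.

```python
customers = {}


def add_customer(surname, initials, address):
    """Store a customer under the composite primary key (surname, initials)."""
    key = (surname, initials)
    if key in customers:
        raise KeyError(f"duplicate primary key {key}")
    customers[key] = {"address": address}


add_customer("Smith", "J", "10 Downing St., London")
add_customer("Smith", "A", "22 Railway Cuttings, Cheam")

# A different person with the same surname and initials clashes with the first.
try:
    add_customer("Smith", "J", "1 High St., Leeds")
    clash = False
except KeyError:
    clash = True

print(clash)
print(sorted(customers))
```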
Primary keys are not the only attributes you might want to show on the entity-relationship
diagram. For example, in a manufacturing organization you might have an entity type called
COMPONENT and you want to make it clear on the entity-relationship diagram that the entities
within the type are not single components but a component type such as a BC109 transistor.
There are thousands of BC109s in stock and any one will do for any application. It is therefore
not necessary to identify each BC109 differently (they all look and work the same). However
you might want to distinguish BC109s from another transistor type BC108. To make it clear that
you are considering all the BC109s as one entity and all the BC108s as another entity, you might
put the attribute QIS (quantity in stock) on the entity-relationship diagram as in Fig. 3.2. This
makes it clearer at the entity-relationship model level that each entity in the entity type is in fact
a stock item of which there will be several in stock. Any doubts on this point should be resolved
by inspecting the entity description, which shows all the attributes of the entity type and (ideally)
their meaning. The primary key might be STOCK_NO and one of the attributes QIS, which
should remove any doubt on this point.

Fig. 3.2 A well-placed attribute may clarify the meaning of an entity type.
In a quality control situation however you might be interested in individual components
(pieces) and you would then consider each piece as an entity within the entity type BC109.
STOCK_NO would not then be an adequate primary key.
Object Oriented Analysis, which is sometimes considered as an alternative to entity-relationship
modeling, focuses on this distinction between object and type, making it clear that it is possible
for an item to be both an object (instance, entity) and a type (class, entity type) at the same time.
There is generally no problem in coping with this in entity-relationship modeling provided the
modeler makes clear what he or she means. In this example we have seen that the simple placing
of a well-chosen attribute on the entity-relationship diagram helps clear up any ambiguity. It is an
important skill of the systems analyst and database designer to be able to recognize and control
such ambiguities where they arise. Careful naming of entity types is another device to enhance
clarity and reduce ambiguity. Changing the name of COMPONENT to COMPONENT_TYPE
would be a further improvement.
Fig. 3.3(a) uses the idea of a card file and individual cards within it as being analogous to an
entity type and an entity respectively. In Fig. 3.3(b) the set - element model is used to show the
same thing, and in Fig.3.3(c) the entity-relationship model for the same situation is shown. These
are three different models of the same phenomenon. Notice that the entity-relationship model
version does not explicitly show individual entities. You are meant to know that 'within' the
entity type CUSTOMER there are lots of customer entities.

Fig. 3.3 Three ways of thinking of an entity type.


Apart from serving as an identifier for each entity within an entity type, the primary key also
serves as the method of representing relationships between entities. The primary key becomes a
foreign key in all those entity types to which it is related in a one-one or one-many relationship
type. The concept of foreign keys is discussed later in the course.

3.5 Relationship Types


The first two major elements of entity-relationship diagrams are entity types and attributes. The
final element is the relationship type. Sometimes, the word 'types' is dropped and relationship
types are called simply 'relationships' but since there is a difference between the terms, one
should really use the term relationship type.
Real-world entities have relationships between them, and relationships between entities on the
entity-relationship diagram are shown where appropriate. An entity-relationship diagram consists
of a network of entity types and connecting relationship types. A relationship type is a named
association between entities. Individual entities have individual relationships of the type between
them. An individual person (entity) occupies (relationship) an individual house (entity). In an
entity-relationship diagram, this is generalized into entity types and relationship types. The entity
type PERSON is related to the entity type HOUSE by the relationship type OCCUPIES. There
are lots of individual persons, lots of individual houses, and lots of individual relationships
linking them.
There can be more than one type of relationship between entities. For an example of three
different relationship types between two entity types see Fig. 3.31. Fig. 3.4 shows a single
relationship type 'Received' and its inverse relationship type 'Was_sent_to' between the two
entity types CUSTOMER and INVOICE. It is very important to name all relationship types. The
reader of the diagram must know what the relationship type means and it is up to you the
designer to make the meaning clear from the relationship type name. The direction of both the
relationship type and its inverse should be shown to aid clarity and immediate readability of the
diagram. The tense of the relationship type should also be clear from its name.

Fig. 3.4 Representing a relationship on an entity-relationship diagram.


In the development of a database system, many people will be reading the entity-relationship
diagram and so it should be immediately readable and totally unambiguous. When the database is
implemented, the entity-relationship diagram will continue to be used by application
programmers and query writers. Misinterpretation of the model can result in many lost
man-hours going down wrong tracks. There is little harm in putting redundant information into your
entity-relationship model. What seems redundant to you can sometimes remove potential
ambiguities for other users of your diagram. Get your user to explain your entity-relationship
model to you! Then you will see how clear it is.

In Fig. 3.4 what is being 'said' is that customers received invoices and invoices were_sent_to
customers. How many invoices a customer might have received (the maximum number and the
minimum number) and how many customers an invoice might have been sent to, is shown by the
degree of the relationship type. The 'degree' of relationship types is defined below.
In Fig. 3.5 three different ways of illustrating the existence of a relationship type are shown. In
(a), in which the CUSTOMER and INVOICE entity types are represented by index cards, it can
be seen that there is a 'received' relationship type between customer number 2 and invoice
numbers 7 and 9. Customer number 2 has 'received' these two invoices. These two invoices
were_sent_to customer number 2. In (b) the same information is shown using set notation with
the relationship type received and inverse relationship type was_sent_to linking customer
entities and invoice entities. Fig. 3.5(c) is the entity-relationship diagram version and information
about individual entities and which entity is linked to which is lost. The reason for this is simply
that in a real database there would be hundreds of customer and invoice entities and it would be
impossible to show each one on the entity-relationship diagram.

Fig. 3.5 Three ways of thinking of a relationship.

It was mentioned earlier that there is in fact a distinction between relationships and relationship
types. In Fig. 3.5(a) and (b) there are in fact two relationships shown: one between customer 2
and invoice 7 and one between customer 2 and invoice 9, so strictly speaking received is a
relationship type consisting of a number of relationships between entities. However, this
distinction is sometimes dropped and both are given the name relationship.
Finally, note that relationships between entity types are represented in a relational database using
foreign keys. The value of the primary key of one entity is placed in every entity of the second
type to which it is related. This is discussed in detail later on in the course.
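The customer and invoice example can be sketched with two Python dictionaries. The customer names and invoice 8 are hypothetical; customer 2 and invoices 7 and 9 follow the example in the text. Each invoice row carries cus_no, the primary key of the customer it was sent to, as a foreign key.

```python
# CUSTOMER entities, keyed by the primary key cus_no.
customers = {
    1: {"name": "J. Smith"},
    2: {"name": "J. Major"},
}

# INVOICE entities, keyed by invoice number; cus_no is the foreign key.
invoices = {
    7: {"cus_no": 2},
    8: {"cus_no": 1},
    9: {"cus_no": 2},
}


def received(cus_no):
    """Invoice numbers received by a customer (the 'received' direction)."""
    return sorted(inv_no for inv_no, row in invoices.items() if row["cus_no"] == cus_no)


def was_sent_to(inv_no):
    """Customer number an invoice was sent to (the inverse direction)."""
    return invoices[inv_no]["cus_no"]


print(received(2))
print(was_sent_to(7))
```

One stored value per invoice serves both the relationship type and its inverse.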
3.6 Ways of Classifying Relationship Types
A relationship type can be classified by the number of entity types involved, and by the degree of
the relationship type, as is shown in Fig. 3.6. These methods of classifying relationship types are
complementary. To describe a relationship type adequately, you need to state the names of the
relationship type and its inverse, and their meaning if it is not clear from the names. You also
need to declare the entity type or types involved and the degree of the relationship type that links
the entities. We now discuss the latter two items.
The purpose of discussing the number of entity types is to introduce the terms unary relationship
type, binary relationship type, and ternary relationship type, and to give examples of each. The
number of entity types in the relationship type affects the final form of the relational database.
The purpose of discussing the degree of relationship types is to define the relevant terms, to give
examples, and to show the impact that the degree of a relationship type has on the form of the
final implemented relational database.

Fig. 3.6 Ways of classifying relationships.

3.6.1 Number of Entity Types


If a relationship type is between entities in a single entity type then it is called a unary
relationship type. One example is the relationship friendship between entities within the entity
type PERSON. If a relationship type is between entities in one entity type and entities in another
entity type then it is called a binary relationship type because two entity types are involved in the
relationship type. An example is the relationship Received in Fig. 3.4 and Fig. 3.5 between
customers and invoices. Another example of a binary relationship type is Purchased between
entity types CUSTOMER and PRODUCT. Two entity types are involved so the relationship is
binary.
It is possible to model relationship types involving more than two entity types. For example a
LECTURER recommends a certain TEXT on a certain COURSE. Here the relationship type is
recommends. This relationship type is said to be a ternary relationship type since three entity
types are involved. Examples of unary, binary and ternary relationship types are shown in Fig.
3.7.

Fig. 3.7 There can be one, two, three or more entity types involved in a relationship.
It is sometimes possible to replace higher-order relationship types (ternary and above) by a
collection of binary relationship types linking pairs of the original entity types. However this is
not always possible (although as we shall see, in 3.6.1.1 below, the high-order relationship can
always be redefined, with suitable renaming, as an entity type). In the example cited above
concerning lecturers recommending textbooks on courses, it is not possible to replace the ternary
relationship type recommends with two or even three binary relationship types because
information would be lost.

Fig. 3.8 (a) shows the ternary relationship type recommends linking LECTURER, TEXT and
COURSE.
In Fig. 3.8(b) an attempt has been made to replace the ternary relationship type with two binary
relationship types. LECTURERs recommend TEXTs and TEXTs are_used_on COURSEs.
The fact that a lecturer recommends a text and that text is used on a course does not necessarily
mean that that lecturer recommended that text for that course. The text might be used on the
course and recommended by someone else, whereas our lecturer does recommend that text but
for a different course.
In Fig. 3.8(c) it is possible to tell which texts a lecturer recommends and which courses he or she
teaches on, but not which texts are used on a course or which courses use a given text. The fact
that a lecturer recommends a text and teaches a course does not imply that he or she recommends
that text for that course.
In Fig. 3.8(d) it is possible to tell which courses a lecturer teaches and which texts a course uses
but not which texts a teacher recommends. Only if every course had only one lecturer would (d)
be satisfactory because then the fact that a course used a text implies who recommended it.
Otherwise (d) is unsatisfactory.
In Fig. 3.8(e) it is possible to tell who recommends which texts, who teaches which courses, and
which texts are used on which courses. However it is still not possible to ascertain, in general,
the answers to questions like:
'Which text does Mr Smith recommend for the 4th year Database course?'
The reason is that even though Mr Smith may recommend text1 and Mr Smith teaches on 4th
year Database, it is not known whether it was Mr Smith who recommended the text for the
course, because he may have recommended the text for another course, and another lecturer on
the 4th year Database course may have recommended text1. The only satisfactory relationship
type is that shown in Fig. 3.8(a).
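The loss of information described above can be checked mechanically. The following is a minimal sketch, assuming some invented sample data (the lecturer, text and course names are hypothetical): it projects a ternary recommends relationship onto the three binary relationships of Fig. 3.8(e) and then tries to rebuild the original triples from them.

```python
# Hypothetical sample data: each tuple is (lecturer, text, course).
recommends = {
    ("Smith", "text1", "Networks"),
    ("Jones", "text1", "Database"),
    ("Smith", "text2", "Database"),
}

# Project the ternary relationship onto the three binary pairs of Fig. 3.8(e).
lecturer_text = {(l, t) for l, t, c in recommends}
lecturer_course = {(l, c) for l, t, c in recommends}
text_course = {(t, c) for l, t, c in recommends}

# Recombine: every triple that is consistent with all three binary relationships.
rebuilt = {
    (l, t, c)
    for l, t in lecturer_text
    for l2, c in lecturer_course
    if l == l2 and (t, c) in text_course
}

# Triples that the binary relationships suggest but that were never recorded.
spurious = rebuilt - recommends
print(sorted(spurious))
```

With this data the rebuilt set contains a triple saying that Smith recommends text1 for Database, although it was Jones who made that recommendation. The three binary relationship types cannot tell the two situations apart, which is why only Fig. 3.8(a) is satisfactory.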
3.6.1.1 Removing Ternary relationship types
It is advantageous to remove ternary and higher order relationship types. One reason is that it
might be considered more natural to think of entity types having attributes than relationship
types having them. It is in fact always possible to remove these high-order relationship types and
replace them with an entity type. A ternary relationship type is then replaced by an entity type
and three binary relationship types linking it to the entity types which were originally linked by
the ternary. A quaternary relationship type would be replaced by an entity type and four
relationship types and so on.
In Fig. 3.8(a), the ternary relationship type recommends (verb) can be replaced with an entity
type recommendation (noun), and a binary relationship between it and each of the entity types
LECTURER, TEXT and COURSE (three binary relationships in all). It is natural to think about
the attributes of a recommendation but not so natural to think about the attributes of a
relationship type recommends. Typical non-key attributes of the RECOMMENDATION might
be DATE_RECOMMENDED and STATUS (whether the recommendation has been approved or
not). Another advantage of replacing the ternary relationship type is that a ternary or higher-order
relationship type cannot in any real sense have a direction. Another is that in Fig. 3.8(a) it is not

clear from the diagram (without pre-existing contextual knowledge) what is recommending what
to what. Does a lecturer recommend a course in a text? Or does a lecturer recommend a text for a
course?
When the single ternary relationship type has been replaced by three binary relationship types,
each of the relationships and their inverses can be named, lending considerably more semantic
information to the diagram. Clearly, replacing the ternary has allowed us to convey more
semantics about the real-world situation than before.
The general conclusion then is that the only relationship types that should be shown on the entity
relationship diagram should be either unary (involving one entity type) or binary (involving two
entity types).
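As an illustration of the replacement described in this section, here is a sketch of how a RECOMMENDATION entity type with three binary relationships might look as relational tables, using Python's built-in sqlite3 module. The table and column names, the sample rows and the dates are assumptions made for the example (the TEXT entity type is called textbook here only to avoid confusion with the SQL type name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE lecturer (lecturer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE textbook (text_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE course   (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- The former ternary relationship type, now an entity type with
-- one foreign key per binary relationship and its own non-key attributes.
CREATE TABLE recommendation (
    lecturer_id INTEGER NOT NULL REFERENCES lecturer(lecturer_id),
    text_id     INTEGER NOT NULL REFERENCES textbook(text_id),
    course_id   INTEGER NOT NULL REFERENCES course(course_id),
    date_recommended TEXT,
    status TEXT,
    PRIMARY KEY (lecturer_id, text_id, course_id)
);
""")
conn.executemany("INSERT INTO lecturer VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
conn.executemany("INSERT INTO textbook VALUES (?, ?)", [(1, "text1"), (2, "text2")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Networks"), (2, "4th year Database")])
conn.executemany(
    "INSERT INTO recommendation VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, "2024-01-10", "approved"),   # Smith, text1, Networks
     (2, 1, 2, "2024-01-11", "approved"),   # Jones, text1, 4th year Database
     (1, 2, 2, "2024-01-12", "pending")],   # Smith, text2, 4th year Database
)

# 'Which text does Mr Smith recommend for the 4th year Database course?'
smith_db_texts = conn.execute("""
    SELECT t.title FROM recommendation r
    JOIN lecturer l ON l.lecturer_id = r.lecturer_id
    JOIN textbook t ON t.text_id = r.text_id
    JOIN course c   ON c.course_id = r.course_id
    WHERE l.name = 'Smith' AND c.title = '4th year Database'
""").fetchall()
print(smith_db_texts)
```

Because each recommendation row ties one lecturer, one text and one course together, the question that could not be answered from the binary relationship types alone now has a definite answer.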
As stated, the naming of the new entity type and the new relationship types is important.
Inappropriately naming the entity type or omitting or inappropriately naming the relationship
types will lead to misunderstanding and consequent incorrect processing of data (possibly caused
by programmers misunderstanding the meaning of the database schema) and incorrect data
appearing on the database. As a general guide entity types should have noun names (e.g.
RECOMMENDATION) and relationships should have the form of a verb (e.g. 'made',
'concerned' or 'was_for').
We shall return to this example when we study Fourth Normal Form. That is one of the methods
in Normalization, which is a more detailed and mechanical method of categorizing data.
3.6.2 The Degree of a Relationship Type
The second way of classifying relationship types is to state their degree. As stated in the
preceding section, the number of entity types and the degree both have an important impact on
the final design of the relational database. The use of terminology related to the degree of a
relationship type varies between different authors (See Fig. 3.9). In this tutorial, we use the
James Martin terminology.
[Table: for each source (Author 1: C.J. Date; Author 2: James Martin; Authors 3 to 7), the term
used for (i) the number of entity types in the relationship, (ii) the minimum number of
participants in the relationship and (iii) the maximum number of participants in the relationship.
The terms that appear are Degree, Optionality, Cardinality, Cardinality and Degree, Minimum
Cardinality and Maximum Cardinality.]

Fig. 3.9 Assorted usage of the entity-relationship terminology concerning relationships.


The degree of a relationship type concerns the number of entities within each entity type that can
be linked by a given relationship type. Fig 3.10 shows how this degree is shown on an entity
relationship diagram. There are two directions of a relationship type. Each is named and each has
a minimum degree and a maximum degree.
3.6.2.1 Cardinality and Optionality
The maximum degree is called cardinality and the minimum degree is called optionality. In
another context the terms degree and cardinality have different meanings. The term degree is
used to denote the number of attributes in a relation while cardinality is the number of tuples in
a relation. Here, we are not talking about relations (database tables) but relationship types, the
associations between database tables and the real world entity types they model.
There are three symbols used to show degree. A circle means zero, a line means one and a
crowsfoot means many. The cardinality is shown next to the entity type and the optionality (if
shown at all) is shown behind it. Refer to Fig. 3.10(a). In Fig. 3.10(b) the relationship type R has
cardinality one-to-many because one A is related by R to many Bs and one B is related (by R's
inverse) to one A. Generally, the degree of a relationship type is described by its cardinality. R
would be called a 'one-many' or a one-to-many or a 1 : N relationship type. To fully describe
the degree of a relationship type however we should also specify its optionality.

Fig. 3.10 Relationship degree.


The optionality of relationship type R in Fig. 3.10(b) is one as shown by the line. This means that
the minimum number of Bs that an A is related to is one. A must be related to at least one B.
Considering the optionality and cardinality of relationship type R together, we can say that one A
entity is related by R to one or more B entities. Another way of describing the optionality of one,
is to say that R is a mandatory relationship type. An A must be related to a B. R's optionality is
mandatory. With optionality, the opposite of mandatory is optional. In Fig. 3.10(b) the inverse
of R happens to be optional, as shown by the circle. The inverse of R is an optional relationship
type. This means that one B might not be related (by the inverse of R) to any A. There may be a
B entity not related to any A entity. Considering the optionality and cardinality of the inverse of
R together, we can say that a B entity is related (by the inverse of R) to zero or one A entities.

Fig. 3.11 A set diagram representation of Fig. 3.10(b).


The case of Fig. 3.10(b) is shown in the form of a set diagram in Fig 3.11. The two entity types A
and B are shown as sets (the oval shapes). The entities are shown as small boxes: elements in the
sets. The relationship type R links A entities to B entities. It shows which A entities are related to
which B entities. Notice that it is possible for an A entity to be related to one or more B entities.
The maximum number of Bs for a given A is many (for example the first A entity is related to
two Bs) and the maximum number of As for a given B is one. This establishes the one-many
cardinality of R. The minimum number of Bs for a given A is 1. (There are no A entities without
a B entity). This establishes mandatory optionality of R. There can exist a B that is not related to
any A; for example the last B entity. This establishes the optional optionality of the inverse of
R.
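The reading of the set diagram can be sketched in a few lines of Python. The entity names and the links below are invented for illustration and are not taken from Fig. 3.11; the function simply counts, for each entity on one side, how many entities it is linked to on the other side and reports the smallest and largest count.

```python
def degree(pairs, sources):
    """Return (minimum, maximum) number of targets related to any one source entity."""
    counts = {s: 0 for s in sources}
    for source, _target in pairs:
        counts[source] += 1
    return min(counts.values()), max(counts.values())

# Hypothetical population of entity types A and B.
a_entities = ["a1", "a2", "a3"]
b_entities = ["b1", "b2", "b3", "b4", "b5"]

# Relationship type R as a set of (A entity, B entity) links; b5 is left unrelated.
r = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")]

r_min, r_max = degree(r, a_entities)                              # direction A to B
inv_min, inv_max = degree([(b, a) for a, b in r], b_entities)     # inverse, B to A

print("R:", r_min, r_max)
print("inverse of R:", inv_min, inv_max)
```

A minimum of one or more corresponds to a mandatory relationship type and a minimum of zero to an optional one; a maximum above one corresponds to 'many'. Note that a sample population can only illustrate a degree. The degree itself is a rule about what is permitted, which has to come from the analysis of the situation, not from counting existing data.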
Fig. 3.12 summarizes the terminology in another example.

Fig. 3.12 More examples of our relationship terminology.


3.6.2.2 Deriving a One-Many relationship type
In Fig. 3.13 the procedure for deriving the degree of a relationship type and putting it on the
entity relationship diagram is shown. The example concerns part of a sales ledger system.
Customers may have received zero or more invoices from us. The relationship type is thus called
received and is from CUSTOMER to INVOICE. The arrow shows the direction. The minimum
number of invoices the customer has received is zero and thus the received relationship type is
optional. This is shown by the zero on the line. The maximum number of invoices the customer
may have received is many. This is shown by the crowsfoot. This is summarized in Fig.
3.13(a). To complete the definition of the relationship type the next step is to name the inverse
relationship type. Clearly if a customer received an invoice, the invoice was sent to the customer
and this is an appropriate name for this inverse relationship type. Now consider the degree of the
inverse relationship type. The minimum number of customers you would send an invoice to is
one; you wouldn't send it to no-one. The optionality is thus one. The inverse relationship type is
mandatory. The maximum number of customers you would send an invoice to is also one so the
cardinality is also one. This is summarized in Fig. 3.13(b). Fig. 3.13(b) shows the completed
relationship.

Fig. 3.13 Deriving a 1:N (one:many) relationship.


A word of warning is useful here. In order to obtain the correct degree for a relationship type
(one-one or one-many or many-many) you must ask two questions. Both questions must begin
with the word one. In the present case (Fig. 3.13), the two questions you would ask when
drawing in the relationship line and deciding on its degree would be:
Question 1: One customer received how many invoices?
Answer: Zero or more.
Question 2: One invoice was sent to how many customers?
Answer: One.

This warning is based on observations of many student database designers getting the degree of
relationship types wrong. The usual cause of error is only asking one question and not starting
with the word 'one'. For example a student might say (incorrectly): 'Many customers receive
many invoices' (which is true) and wrongly conclude that the relationship type is many-many.
The second most common source of error is either to fail to name the relationship type and say
something like 'Customer to Invoice is one-to-many' (which is meaningless) or give the
relationship type an inappropriate name.
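The two answers map quite directly onto a relational design. The sketch below, using Python's sqlite3 module, is one possible implementation under assumed table and column names and invented sample rows: the 'One' answer to Question 2 becomes a single NOT NULL foreign key in the invoice table, while the 'Zero or more' answer to Question 1 needs no constraint at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (account_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice (
    invoice_no INTEGER PRIMARY KEY,
    -- NOT NULL: an invoice was_sent_to at least one customer (mandatory)
    -- a single column: an invoice was_sent_to at most one customer
    account_no INTEGER NOT NULL REFERENCES customer(account_no)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer VALUES (2, 'Bolt')")  # has received no invoices yet
conn.executemany("INSERT INTO invoice VALUES (?, ?)", [(100, 1), (101, 1)])

# An invoice sent to no-one violates the mandatory inverse relationship type.
try:
    conn.execute("INSERT INTO invoice VALUES (102, NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Question 1 answered per customer: how many invoices has each one received?
counts = dict(conn.execute("""
    SELECT c.name, COUNT(i.invoice_no) FROM customer c
    LEFT JOIN invoice i ON i.account_no = c.account_no
    GROUP BY c.account_no
"""))
print(orphan_rejected, counts)
```

The customer with no invoices is perfectly legal, which reflects the optional 'received' direction.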
3.6.2.3 Deriving a Many-Many relationship type
Fig. 3.14 gives an example of a many-many relationship type being derived.

Fig. 3.14 Deriving a M:N (many-many) relationship.


The two questions you have to ask to correctly derive the degree of this relationship (and the
answers) are:
Question 1: One customer purchased how many product types?
Answer: One or more.
Question 2: One product type was purchased by how many customers?
Answer: Zero or more.


Note that the entity type has been called PRODUCT TYPE rather than PRODUCT which might
mean an individual piece that the customer has bought. In that case the cardinality of
'was_purchased_by' would be one not many because an individual piece can of course only go to
one customer. This point is another common source of error: the tendency to call one item (e.g.
an individual 4" paintbrush) a product and the whole product type (or 'line') (e.g. the 4"
paintbrush product type) a product. You should make the meaning clear from the name you give
the entity type.
We have assumed here that every customer on the database has purchased at least one product;
hence the mandatory optionality of purchased. If this were not true in the situation under study
then a zero would appear instead. The zero optionality of 'was_purchased_by' is due to our
assumption that a product type might as yet have had no purchases at all.

In practice it is wise to replace many-many relationship types such as this with a set (often two)
of one-many relationship types and a set (often one) of new, previously hidden entity types. This
is covered in a later section in this tutorial.
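As a preview of that replacement, here is a minimal sketch in Python with sqlite3. The linking table is given the hypothetical name purchase, and the column names and sample rows are likewise assumptions made for illustration. The many-many relationship type becomes two one-many relationship types that meet at the new entity type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (account_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_type (product_code TEXT PRIMARY KEY, description TEXT NOT NULL);
-- The previously hidden entity type: one row per (customer, product type) pair.
CREATE TABLE purchase (
    account_no   INTEGER NOT NULL REFERENCES customer(account_no),
    product_code TEXT    NOT NULL REFERENCES product_type(product_code),
    PRIMARY KEY (account_no, product_code)
);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO product_type VALUES
    ('PB4', '4 inch paintbrush'), ('PB2', '2 inch paintbrush'), ('RL9', '9 inch roller');
INSERT INTO purchase VALUES (1, 'PB4'), (1, 'PB2'), (2, 'PB4');
""")

# One product type was purchased by how many customers? (zero is allowed)
buyers_per_product = dict(conn.execute("""
    SELECT p.product_code, COUNT(pu.account_no) FROM product_type p
    LEFT JOIN purchase pu ON pu.product_code = p.product_code
    GROUP BY p.product_code
"""))
print(buyers_per_product)
```

The roller has no buyers, which matches the zero optionality of 'was_purchased_by'. The assumption that every customer has purchased at least one product type is not enforced by these foreign keys; it would need an additional rule in the application or the database.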
3.6.2.4 Deriving a One-One relationship type
Fig. 3.15 gives an example of a one-one relationship type being derived. It concerns a person and
his or her birth certificate. We assume that everyone has one and that a certificate registers the
birth of one person only.

Fig. 3.15 Deriving a 1:1 (one:one) relationship.


Question 1: How many birth certificates has a person?
Answer: One.
Question 2: How many persons is a birth certificate owned by?
Answer: One.


Where there is a one-one relationship type we have the option of merging the two entity types.
The birth certificate attributes may be considered as attributes of the person and placed in the
person entity type. The birth certificate entity type would then be removed. There are two
reasons for not doing this. Firstly, the majority of processing involving PERSON records might
not involve any or many of the BIRTH_CERTIFICATE attributes. The BIRTH CERTIFICATE
attributes might only be subject to very specific processes which are rarely executed. The second

reason for not merging might be that the BIRTH CERTIFICATE entity type has relationship
types to other entity types that the PERSON entity type does not have. The two entity types have
different relationship types to other entity types.
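If the two entity types are kept separate, the one-one degree can still be enforced. The following sketch (sqlite3 again, with assumed table and column names and an invented sample row) uses a foreign key that is both NOT NULL and UNIQUE, so that a certificate belongs to exactly one person and a person can have at most one certificate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE birth_certificate (
    certificate_no INTEGER PRIMARY KEY,
    -- UNIQUE turns what would otherwise be a one-many link into one-one.
    person_id INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
);
INSERT INTO person VALUES (1, 'Ann');
INSERT INTO birth_certificate VALUES (7001, 1);
""")

# A second certificate for the same person breaks the one-one rule.
try:
    conn.execute("INSERT INTO birth_certificate VALUES (7002, 1)")
    second_certificate_allowed = True
except sqlite3.IntegrityError:
    second_certificate_allowed = False
print(second_certificate_allowed)
```

This design does not by itself guarantee that every person has a certificate; that half of the assumption would need a separate check.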

3.6.3 Mutually Exclusive relationship types


In some cases the existence of one kind of relationship type precludes the existence of another.
Entities within an entity type A may be related by a relationship type R to an entity in entity type
B or entity type C but not both. The relationship types are said to be mutually exclusive. Usually
both relationship types will have the same name, as in the following example. In Fig. 3.16 a fault
report may have been for a computer or a printer but not both. The fact that it might not have
concerned a computer is shown by the zero optionality of the upper 'was_for' relationship type
between FAULT REPORT and COMPUTER. The fact that it might not have concerned a printer
is shown by the zero optionality of the lower 'was_for' relationship type between FAULT
REPORT and PRINTER. However a fault report must have been for either a computer or a
printer (in this example). The zero optionality cannot apply for both. Both this and the fact that
the fault report can have been for a maximum of one of the two entity types is indicated by the
arc on the diagram linking the two relationship types. In summary then, the arc shows that a fault
report can be for a maximum and a minimum of one entity from the types COMPUTER and
PRINTER.

Fig. 3.16 A mutually exclusive relationship 'Was for'.
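One way the arc might be carried into a relational schema is a CHECK constraint stating that exactly one of the two foreign keys is filled in. The sketch below uses Python's sqlite3 module; the table and column names are assumptions, and this is only one of several possible implementations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE computer (computer_id INTEGER PRIMARY KEY);
CREATE TABLE printer  (printer_id  INTEGER PRIMARY KEY);
CREATE TABLE fault_report (
    report_no   INTEGER PRIMARY KEY,
    computer_id INTEGER REFERENCES computer(computer_id),
    printer_id  INTEGER REFERENCES printer(printer_id),
    -- The arc: exactly one of the two 'was_for' links must be present.
    CHECK ((computer_id IS NULL) <> (printer_id IS NULL))
);
INSERT INTO computer VALUES (1);
INSERT INTO printer VALUES (1);
""")

def try_insert(report_no, computer_id, printer_id):
    """Return True if the fault report is accepted, False if the arc rule rejects it."""
    try:
        conn.execute("INSERT INTO fault_report VALUES (?, ?, ?)",
                     (report_no, computer_id, printer_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, 1, None),     # for a computer only
    try_insert(2, None, 1),     # for a printer only
    try_insert(3, 1, 1),        # for both: not allowed
    try_insert(4, None, None),  # for neither: not allowed
]
print(results)
```

Each foreign key is individually nullable, matching the zero optionality of each 'was_for' relationship type, while the CHECK expresses the maximum and minimum of one across the pair.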


The set of relationship types is normally assumed to be exhaustive in one sense (i.e. there are not
any other relationship types) because it is customary to put all relationships of interest on the
diagram. However the set of relationship types might not be exhaustive in the sense that a given

entity A might not be related to an entity in any of the other entity types in the group marked by
the arc. This second type of exhaustiveness (or lack of it) cannot be shown using this arc device.
Another limitation of the arc device is that it cannot show excluded and mandatory combinations
of permitted relationships. For example, it might be the case that an entity in type A might be
related to some subset of entities from types B, C and D. It might be that if it is related to a B and
a C then it cannot be related to a D entity. It might be that if it is related to a B then it must also
be related to either a C or D but not both.
A further constraint type that may be required in practice is that an entity of type A may legally
be related to any n entities from a selection of m entity types.
The suggestion being made here is that current methods for drawing entity relationship diagrams
could be extended to allow these types of relationship constraints to be shown on the diagram.
3.6.4 Redundant Relationship Types
In Fig. 3.17 there is a 'received' relationship type between CUSTOMER and INVOICE and an
'obtained' relationship type between INVOICE and PAYMENT. It is possible via 'received' to
find which invoices have been received by a given customer. It is possible to find the customer
an invoice was sent to via the 'was_sent_to' relationship type (the inverse of received). Using
the 'obtained' relationship type it is possible to find the payments that a given invoice has
received and via its inverse 'was_posted_to', the invoice that a payment was posted to. Using the
composition of 'received' and 'obtained' (that is, using one relationship type followed by the
other), it is possible to find all the payments that a given customer has made. By navigating from
CUSTOMER to INVOICE and thence to PAYMENT this can be done.

Fig. 3.17 The 'Made' relationship is redundant.


Similarly, it is possible to find the sender of a payment using the composition of the two
relationship types 'was_posted_to' and 'was_sent_to', navigating from PAYMENT to INVOICE
to CUSTOMER.
If an extra (direct) relationship type from CUSTOMER to PAYMENT were to be implemented it
would be redundant. If it showed only which customer sent a payment and (via its inverse) which
payments a customer made, it would be unnecessary because it shows nothing that cannot be
shown using compositions of the other two relationship types.
In general, when there are loops in your entity relationship diagram, be on the lookout for the
possibility of breaking the loop at some point by removing a relationship type that can be
synthesized from the composition of other relationship types on the diagram. This is often not
possible because of the nature of the relationships i.e. their meaning.
In some rare cases you might consider it advisable to introduce a logically redundant relationship
type simply out of consideration of efficiency.

Fig. 3.18 An entity-relationship diagram for a simple accounting system.


Fig. 3.18 shows the entity relationship diagram for a relatively simple accounting system. The
top grouping of entity types and relationship types constitute a sales ledger ('accounts receivable'
and 'debtors ledger' are two other names for this). The whole diagram will be used as a schema
for the database holding such data and the sales ledger entity types and relationship types will be
called the sales ledger subschema. The bottom subschema is the accounts payable subschema
(also called the 'purchase ledger', because it shows what the company has purchased from its
suppliers). Notice that sub-schemas may overlap, as here. The PRODUCT entity type is used in
both contexts; sales and purchasing. Products are purchased and they are sold.

Returning to the subject of redundant relationship types, let us consider placing a 'redundant'
relationship type between the entity type CUS and the entity type CUS_PAYMENT. There are
many queries that could be answered using the schema shown, including:
'List all payments made by customer X'.
The problem with this query is that to answer it, it is necessary to navigate via four relationship
types. Using the first relationship type 'made', all the customer orders are accessed. For each
order, 'Contains' is used to access every order line. For each order line, the customer invoice (if
any) is accessed and the payment is retrieved and then listed. An invoice might not yet have been
sent; this is shown using an 'optional' circle at the left-hand end of the relationship type. The
pseudo-code for this could be written as shown in Fig. 3.19.
RETRIEVE CUS record
OBTAIN customer's account number
RETRIEVE first CUS_ORDER record for this account number
DOWHILE not end of CUS_ORDERs
    OBTAIN order's order number
    RETRIEVE first CUS_ORDER_LINE record for this order number
    DOWHILE not end of CUS_ORDER_LINEs
        OBTAIN order line's invoice number
        IF invoice number is not null
            RETRIEVE CUS_INVOICE record for this inv no
            OBTAIN invoice's payment number
            IF invoice's payment number is not null
                RETRIEVE PAYMENT record for this pmt no
                LIST payment details
            ENDIF
        ENDIF
        RETRIEVE next CUS_ORDER_LINE record for this order no
    ENDWHILE
    RETRIEVE next CUS_ORDER for this account number
ENDWHILE
Fig. 3.19 Pseudo Code for 'List all payments made by Customer x'

This pseudo-code assumes that a customer order line that has not yet been invoiced is indicated
by a null value for the invoice number attribute in the order line and that an invoice that has not
yet been paid is indicated in a similar way using a null value for the payment number in the
invoice record. It must be noted also that this pseudo-code may be considered rather physical
since it talks about records rather than real-world entities. However in general every entity of
interest will be modeled by a database record. Also, in a relational database, the relationship
types are shown using foreign keys such as invoice number in CUS_ORDER_LINE and
payment_no in CUS_INVOICE. In other types of database, in particular the older network
(CODASYL) and hierarchical databases, foreign keys are not used so the details of the
pseudo-code in Fig. 3.19 would be different. How relationship types are represented, including a
discussion of foreign keys, is covered later in the course.
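In a relational database the navigation of Fig. 3.19 would typically be written as a join over the foreign keys. The sketch below uses Python's sqlite3 module. The table layout follows the foreign keys named in the text (invoice number in the order line, payment number in the invoice); the remaining columns, such as line_no and amount, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cus (account_no INTEGER PRIMARY KEY);
CREATE TABLE cus_payment (payment_no INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE cus_invoice (invoice_no INTEGER PRIMARY KEY,
                          payment_no INTEGER REFERENCES cus_payment(payment_no));
CREATE TABLE cus_order (order_no INTEGER PRIMARY KEY,
                        account_no INTEGER NOT NULL REFERENCES cus(account_no));
CREATE TABLE cus_order_line (order_no INTEGER NOT NULL REFERENCES cus_order(order_no),
                             line_no INTEGER NOT NULL,
                             invoice_no INTEGER REFERENCES cus_invoice(invoice_no),
                             PRIMARY KEY (order_no, line_no));
INSERT INTO cus VALUES (1), (2);
INSERT INTO cus_payment VALUES (500, 40.0), (501, 15.0);
INSERT INTO cus_invoice VALUES (900, 500), (901, NULL), (902, 501);
INSERT INTO cus_order VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO cus_order_line VALUES
    (10, 1, 900), (10, 2, 900), (11, 1, 901), (11, 2, NULL), (20, 1, 902);
""")

def payments_for(account_no):
    """List the payments reachable from a customer via order, order line and invoice."""
    rows = conn.execute("""
        SELECT DISTINCT p.payment_no, p.amount
        FROM cus_order o
        JOIN cus_order_line ol ON ol.order_no = o.order_no
        JOIN cus_invoice i     ON i.invoice_no = ol.invoice_no
        JOIN cus_payment p     ON p.payment_no = i.payment_no
        WHERE o.account_no = ?
        ORDER BY p.payment_no
    """, (account_no,))
    return rows.fetchall()

print(payments_for(1))
print(payments_for(2))
```

The inner joins play the part of the two IF tests: order lines without an invoice and invoices without a payment simply drop out. DISTINCT is used because two order lines on the same invoice would otherwise list the same payment twice, as the loop in Fig. 3.19 would.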
The pseudo-code might be considered rather complex for such a simple query. It can be
considerably simplified by adding a redundant direct relationship type from CUS to
CUS_PAYMENT. A foreign key (the customer's account number) would be placed in
CUS_PAYMENT as an extra attribute. While unnecessary, as we have said, this relationship type
is advantageous in that the pseudo-code for the query is now as shown in Fig. 3.20, which is
much simpler.

RETRIEVE CUS record
OBTAIN customer's account number
RETRIEVE first CUS_PAYMENT record for this account number
DOWHILE not end of CUS_PAYMENTs
    LIST payment details
    RETRIEVE next CUS_PAYMENT for this account number
ENDWHILE
Fig. 3.20 Simplified Pseudo Code for 'List all payments made by Customer x'
In summary, redundant relationship types should be identified and in general removed. However,
implementing a redundant relationship type into the database schema may make the
programming of some queries, reports and updates simpler. The major disadvantage of having
redundant data on the database is that it may lead to inconsistency. The redundant one-many
relationship type we are considering putting between CUS and CUS_PAYMENT would be
implemented by placing a foreign key (the customer's account number) into the
CUS_PAYMENT entity type. If this value was different from the value obtained by navigating
back via the long route (CUS_PAYMENT, CUS_INVOICE, CUS_ORDER_LINE,
CUS_ORDER, CUS) then this would constitute an inconsistency.
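A consistency check of this kind can be sketched as follows. The data structures are deliberately simple Python dictionaries and lists with invented values, standing in for the tables; the function compares the redundant account number stored with each payment against the account number reached by the long route.

```python
# Hypothetical data, keyed by the identifiers used in the text.
orders = {10: 1, 11: 1}                 # order_no -> account_no
order_lines = [(10, 900), (11, 901)]    # (order_no, invoice_no)
invoices = {900: 500, 901: 501}         # invoice_no -> payment_no (None if unpaid)
payment_account = {500: 1, 501: 2}      # payment_no -> redundant account_no

def inconsistent_payments():
    """Return (payment_no, stored account, account via long route) for each mismatch."""
    bad = []
    for order_no, invoice_no in order_lines:
        payment_no = invoices.get(invoice_no)
        if payment_no is None:          # not invoiced, or invoice not yet paid
            continue
        via_long_route = orders[order_no]
        if payment_account[payment_no] != via_long_route:
            bad.append((payment_no, payment_account[payment_no], via_long_route))
    return bad

print(inconsistent_payments())
```

Here payment 501 is recorded directly against account 2 but is reached from account 1 through the order, order line and invoice, so it is reported. A check like this would have to be run, or the rule enforced on every update, for as long as the redundant relationship type is kept.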

Entity Relationship Modeling - Summary


1. An entity type is a type of entity you want to store data about. The data is stored in the form of
attributes. An individual within an entity type is an entity. Each entity in an entity type has a
different value of the entity type's primary key.
2. Entity types are linked by relationship types. A relationship type is a type of relationship in the
real world that you want to represent on the database. An individual within a relationship type is
a relationship.
3. Relationship types are binary or unary. Ternary and higher order relationship types can always
be replaced with binary relationship types and entity types.
4. Relationship types have two directions, may have degree one-one, one-many or many-many,
and may be mandatory or optional in either direction.
5. Redundant relationship types may occasionally be modeled and implemented to improve
performance.
6. Many-many relationship types should always be split to reveal entity types and relationship
types that might otherwise remain hidden.
7. Relationship types must always be named in both directions. There can be more than one
relationship type between entity types. The names distinguish them.
8. It should be possible to read an entity-relationship diagram which should represent a set of
simple subject-verb-object sentences. Subject and object are entity types. Relationship types
correspond approximately to verbs. An entity-relationship model should explain itself. An
unclear or ambiguous model does more harm than good, since it might mislead. The good entity
modeler is good at grammar, good at spotting ambiguity, and uses the simplest words.
9. When drawing an entity-relationship model, don't assume anything; rather write a list of
questions you would have to find answers to complete the model. State any assumptions you are
aware of having made.
Assignment - Complete all 3 parts as described below.
Exercise 1 - CARS
Identify all entity types, attributes, relationship types and their degrees in the following case.
Hence draw an entity-relationship diagram.
An organization makes many models of cars, where a model is characterized by a name and a
suffix (such as GL or XL which indicates the degree of luxury) and an engine size.

Each model is made up from many parts and each part may be used in the manufacture of more
than one model. Each part has a description and an id code. Each model of car is produced at just
one of the firm's factories, which are located in London, Birmingham, Bristol and Manchester,
one in each city. A factory produces many models of car and many types of part although each
type of part is produced at one factory only.
Exercise 2 - A UNIVERSITY
A university consists of several faculties. Within each faculty there are several departments. Each
department may run a number of courses. All teaching staff is attached to departments, each staff
member belonging to a unique department. (Note: see how many meanings you can assign to this
ambiguous sentence). Every course is composed of sub-courses. Some sub-courses are part of
more than one course. Staff may teach on many sub-courses and each sub-course may be taught
by a number of staff.
Draw an entity-relationship model for this example. Show both cardinalities and optionalities.
Put a question mark where the degree is not clear from the text. Don't assume anything; rather,
write a list of questions you would have to find answers to in order to complete the model.
Exercise 3 - MORTGAGES
Draw an entity-relationship diagram for the following. Produce also a list of questions you would
have to have answered in order to complete the model.
In a case study of this kind, and in particular in exam questions, there is not usually the space to
completely specify a problem. Remember also that not all the information given in a case study
of this type is necessarily relevant. Some information, while relevant to the organization
concerned, might not be relevant as far as database design is concerned.
Members of a friendly society invest money in any one of the society's branches. A member may
hold a number of investment accounts. Each investment account is associated with the branch
where it was opened, but money may be paid in or withdrawn at any branch. For each account,
the member holds an account book to record all transactions. A member may also have one
mortgage account. All mortgage accounts are associated with the Head Office. Payments may be
transferred from any investment account into the mortgage account.
Steps In Building the Data Model
While the ER model lists and defines the constructs required to build a data model, there is no
standard process for doing so. Some methodologies, such as IDEF1X, specify a bottom-up
development process where the model is built in stages. Typically, the entities and relationships
are modeled first, followed by key attributes, and then the model is finished by adding non-key
attributes. Other experts argue that in practice, using a phased approach is impractical because it
requires too many meetings with the end-users. The sequence used for this document is:
1. Identification of data objects and relationships
2. Drafting the initial ER diagram with entities and relationships
3. Refining the ER diagram
4. Adding key attributes to the diagram
5. Adding non-key attributes
6. Diagramming Generalization Hierarchies
7. Validating the model through normalization
8. Adding business and integrity rules to the Model


In practice, model building is not a strict linear process. As noted above, the requirements
analysis and the draft of the initial ER diagram often occur simultaneously. Refining and
validating the diagram may uncover problems or missing information which require more
information gathering and analysis.
Identifying Data Objects and Relationships
In order to begin constructing the basic model, the modeler must analyze the information
gathered during the requirements analysis for the purpose of:

Classifying data objects as either entities or attributes

Identifying and defining relationships between entities

Naming and defining identified entities, attributes, and relationships

Documenting this information in the data document


To accomplish these goals the modeler must analyze narratives from users, notes from meetings,
policy and procedure documents, and, if lucky, design documents from the current information
system. Although it is easy to define the basic constructs of the ER model, it is not an easy task
to distinguish their roles in building the data model. What makes an object an entity or attribute?
For example, given the statement "employees work on projects", should employees be classified
as an entity or attribute? Very often, the correct answer depends upon the requirements of the
database. In some cases, employee would be an entity, in some it would be an attribute.
Attributes

Attributes are data objects that either identify or describe entities. Attributes that identify entities
are called key attributes. Attributes that describe an entity are called non-key attributes. Key
attributes will be discussed in detail in a later section. The process for identifying attributes is
similar to that for identifying entities, except that now you want to look for and extract those
names that appear to be descriptive noun phrases.

Relationships
Relationships are associations between entities. Typically, a relationship is indicated by a verb
connecting two or more entities. For example: employees are assigned to projects. As
relationships are identified they should be classified in terms of cardinality, optionality, direction,
and dependence. As a result of defining the relationships, some relationships may be dropped and
new relationships added. Cardinality quantifies the relationships between entities by measuring
how many instances of one entity are related to a single instance of another. To determine the
cardinality, assume the existence of an instance of one of the entities and ask how many instances
of the other entity can be related to it.

Relationships are mapped with entities in various ways. Mapping cardinalities define the number
of associations between two entities.
Mapping cardinalities:
one to one
one to many
many to one
many to many
The overall logical structure (schema) of a database can be expressed graphically by an E-R
diagram.

Relational Model
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
The data is arranged in a relation, which is visually represented as a two-dimensional table. The
data is inserted into the table in the form of tuples (rows). A tuple is formed from one or more
attributes, which are the basic building blocks of the expressions used to derive meaningful
information. There can be any number of tuples in the table, but all the tuples have the same
fixed set of attributes, with varying values. The relational model is implemented in a database
where a relation is represented by a table, a tuple by a row and an attribute by a column of the
table. The attribute name is the name of the column (such as identifier, name or city) and the
attribute value is the value for that column in a given row. Constraints are applied to the table
and form the logical schema. To select a particular row (tuple) from the table, the attributes, i.e.
column names, are used. To expedite the selection of rows, some fields are defined as unique so
that they can be used as indexes; this helps in finding the required data as fast as possible.
The relational algebra operations, such as Select, Intersection, Product, Union, Difference,
Project, Join and Division, can be performed on the Relational Database Model.
Operations on the Relational Database Model are facilitated by conditional expressions, key
attributes, pre-defined constraints etc. In a nutshell, the most popular data model in DBMS is the
Relational Model. It is a more scientific model than the others: it is based on first-order predicate
logic and defines a table as an n-ary relation.
The main highlights of this model are:
Data is stored in tables called relations. Relations can be normalized.
In normalized relations, the values stored are atomic.
Each row in a relation is unique.
Each column in a relation contains values from the same domain.
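
Three of the relational algebra operations named above (Select, Project, Join) can be sketched in plain Python, treating a relation as a list of rows and each row as a dictionary from attribute name to value. This is only an illustration of what the operations compute, not how a DBMS implements them; the function names and the sample students and enrolled data are assumptions made for the example.

```python
def select(relation, predicate):
    """Select: keep only the rows that satisfy the condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Project: keep only the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all attributes they share."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result

# Hypothetical sample relations.
students = [
    {"SID": 1, "Name": "Asha", "City": "Pune"},
    {"SID": 2, "Name": "Ravi", "City": "Delhi"},
    {"SID": 3, "Name": "Meena", "City": "Pune"},
]
enrolled = [
    {"SID": 1, "Course": "DBMS"},
    {"SID": 3, "Course": "OS"},
    {"SID": 3, "Course": "DBMS"},
]

pune_names = project(select(students, lambda r: r["City"] == "Pune"), ["Name"])
cities = project(students, ["City"])
joined = natural_join(students, enrolled)
print(pune_names)
print(cities)
print(len(joined))
```

Project removes duplicates because a relation is a set of tuples, which matches the rule that each row in a relation is unique.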

Relational Model Concepts

We shall represent a relation as a table with columns and rows. Each column of the table has a
name, or attribute. Each row is called a tuple.

Domain: a set of atomic values that an attribute can take

Attribute: name of a column in a particular table (all data is stored in tables). Each attribute Ai
must have a domain, dom(Ai).

Relational Schema: The design of one table, containing the name of the table (i.e. the name of
the relation), and the names of all the columns, or attributes.
Example: STUDENT( Name, SID, Age, GPA)

Degree of a Relation: the number of attributes in the relation's schema.

Tuple, t, of R(A1, A2, A3, ..., An): an ORDERED set of values, <v1, v2, v3, ..., vn>, where
each vi is a value from dom(Ai).
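
The concepts above (schema, degree, domain, tuple) can be made concrete with a short Python sketch built on the STUDENT example. The domains chosen here (for instance, GPA between 0.0 and 4.0) are assumptions for illustration only; the notes do not specify them, and the helper names are hypothetical.

```python
# Relational schema: STUDENT(Name, SID, Age, GPA)
STUDENT_SCHEMA = ("Name", "SID", "Age", "GPA")

# dom(Ai) for each attribute, written as a membership test.
# These particular domains are assumed for the example.
STUDENT_DOMAINS = {
    "Name": lambda v: isinstance(v, str) and len(v) > 0,
    "SID": lambda v: isinstance(v, int) and v > 0,
    "Age": lambda v: isinstance(v, int) and 0 < v < 150,
    "GPA": lambda v: isinstance(v, float) and 0.0 <= v <= 4.0,
}

def degree(schema):
    """Degree of a relation: the number of attributes in its schema."""
    return len(schema)

def is_valid_tuple(schema, domains, t):
    """A tuple <v1, ..., vn> is valid if it has n values and each vi is in dom(Ai)."""
    if len(t) != len(schema):
        return False
    return all(domains[attr](value) for attr, value in zip(schema, t))

print(degree(STUDENT_SCHEMA))
print(is_valid_tuple(STUDENT_SCHEMA, STUDENT_DOMAINS, ("Asha", 101, 20, 3.6)))
print(is_valid_tuple(STUDENT_SCHEMA, STUDENT_DOMAINS, ("Asha", 101, 20, 7.5)))
```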

Properties of relations
Properties of database relations are:
Relation name is distinct from all other relations
Each cell of relation contains exactly one atomic (single) value
Each attribute has a distinct name
Values of an attribute are all from the same domain
Order of attributes has no significance
Each tuple is distinct; there are no duplicate tuples
Order of tuples has no significance, theoretically.
Relational keys:
There are two kinds of keys in relations. The first are identifying keys: the primary key is the
main concept, while two other keys, the super key and the candidate key, are related concepts. The
second kind is the foreign key.
Identity Keys
Super Keys
A super key is a set of attributes whose values can be used to uniquely identify a tuple within a
relation. A relation may have more than one super key, but it always has at least one: the set of all
attributes that make up the relation.
Candidate Keys
A candidate key is a super key that is minimal; that is, there is no proper subset that is itself a
super key. A relation may have more than one candidate key, and the different candidate keys
may have a different number of attributes. In other words, you should not interpret 'minimal' to
mean the super key with the fewest attributes.
A candidate key K of a relation R has two properties:
(i) in each tuple of R, the values of K uniquely identify that tuple (uniqueness)
(ii) no proper subset of K has the uniqueness property (irreducibility).


Primary Key
The primary key of a relation is a candidate key specially selected to be the key for the relation.
In other words, it is a design choice, and only one candidate key can be designated as the
primary key.
Relationship between identity keys
The relationship between keys:

Super key ⊇ Candidate Key ⊇ Primary Key
(every primary key is a candidate key, and every candidate key is a super key)


Foreign keys
A foreign key is the attribute (or set of attributes) within one relation that matches a candidate
key of another relation. A relation may have several foreign keys, associated with different target
relations.
Foreign keys allow users to link information in one relation to information in another relation.
Without foreign keys (FKs), a database would be a collection of unrelated tables.

Relational Model Constraints: Integrity Constraints


Each relational schema must satisfy the following four types of constraints.
A. Domain constraints
The value of each attribute Ai must be an atomic value from dom(Ai).
The attribute Name in the STUDENT example is a BAD DESIGN, because sometimes we may want to
search for a person using only their last name, and a combined name is not an atomic value.
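
A minimal sketch of domain constraints, using Python's built-in sqlite3 module with an in-memory database. The table layout, the CHECK ranges, and the helper try_insert are assumptions for illustration. Name is split into first_name and last_name so that each value is atomic and a search by last name is straightforward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        sid        INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        age        INTEGER CHECK (age BETWEEN 1 AND 149),  -- assumed domain
        gpa        REAL    CHECK (gpa BETWEEN 0.0 AND 4.0) -- assumed domain
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Rao', 20, 3.6)")

def try_insert(row):
    """Return 'accepted' or 'rejected' depending on whether the constraints hold."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# A GPA of 7.5 lies outside the declared domain, so the row is refused.
out_of_domain = try_insert((2, "Ravi", "Shah", 21, 7.5))
in_domain = try_insert((3, "Meena", "Rao", 22, 3.1))

# Searching by last name alone works because the name is stored atomically.
by_last_name = conn.execute(
    "SELECT sid FROM student WHERE last_name = 'Rao' ORDER BY sid"
).fetchall()
print(out_of_domain, in_domain, by_last_name)
```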
B. Key Constraints
Superkey of R: A set of attributes, SK, of R such that no two tuples in any valid relational
instance, r(R), will have the same value for SK. Therefore, for any two distinct tuples, t1 and t2
in r(R), t1[SK] != t2[SK].
Key of R: A minimal superkey. That is, a superkey, K, of R such that the removal of ANY
attribute from K will result in a set of attributes that is not a superkey.
Example: CAR(State, LicensePlateNo, VehicleID, Model, Year, Manufacturer). This schema has
two keys:
K1 = {State, LicensePlateNo}
K2 = {VehicleID}
Both K1 and K2 are superkeys.
K3 = {VehicleID, Manufacturer} is a superkey, but not a key (Why?).
If a relation has more than one key, we can select any one (arbitrarily) to be the primary key.
Primary key attributes are underlined in the schema:
CAR(State, LicensePlateNo, VehicleID, Model, Year, Manufacturer)
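
The superkey and key definitions can be checked mechanically on a sample instance of CAR, as in the Python sketch below. The three rows are made-up data and the function names are hypothetical. One caveat: a key is a property of the schema that must hold in every valid instance, so a test on one instance can show that a set of attributes is not a key, but passing the test does not prove that it is one.

```python
from itertools import combinations

CAR_SCHEMA = ("State", "LicensePlateNo", "VehicleID", "Model", "Year", "Manufacturer")

# Made-up sample instance r(CAR).
cars = [
    ("TX", "ABC-123", "V001", "Civic", 2019, "Honda"),
    ("CA", "ABC-123", "V002", "Civic", 2019, "Honda"),
    ("TX", "XYZ-789", "V003", "Focus", 2018, "Ford"),
]

def is_superkey(schema, rows, attrs):
    """True if no two rows agree on all the attributes in attrs."""
    positions = [schema.index(a) for a in attrs]
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(set(projected)) == len(projected)

def is_key(schema, rows, attrs):
    """True if attrs is a superkey and no non-empty proper subset is a superkey."""
    if not is_superkey(schema, rows, attrs):
        return False
    return not any(
        is_superkey(schema, rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

K1 = ("State", "LicensePlateNo")
K2 = ("VehicleID",)
K3 = ("VehicleID", "Manufacturer")

for name, k in (("K1", K1), ("K2", K2), ("K3", K3)):
    print(name, is_superkey(CAR_SCHEMA, cars, k), is_key(CAR_SCHEMA, cars, k))
```

K3 fails the minimality test because removing Manufacturer leaves {VehicleID}, which is still a superkey.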

C. Entity Integrity Constraints


The primary key attribute, PK, of any relational schema R in a database cannot have null values
in any tuple. In other words, for each table in a DB, there must be a primary key, and every row
in the table must have non-null values for it. This is because PK is used to identify the individual
tuples.
Mathematically, t[PK] != NULL for any tuple t ∈ r(R).
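
Entity integrity can be observed with Python's built-in sqlite3 module, as in the sketch below. The car table and the helper attempt are assumptions for illustration. NOT NULL is written out explicitly because SQLite, for historical reasons, does not reject NULL in every kind of PRIMARY KEY column unless it is told to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE car (
        vehicle_id TEXT NOT NULL PRIMARY KEY,
        model      TEXT
    )
""")
conn.execute("INSERT INTO car VALUES ('V001', 'Civic')")

def attempt(sql, params=()):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# t[PK] must not be NULL: a row without a primary key value is refused.
null_pk = attempt("INSERT INTO car VALUES (?, ?)", (None, "Focus"))
# The primary key must also be unique: a second 'V001' is refused.
duplicate_pk = attempt("INSERT INTO car VALUES (?, ?)", ("V001", "Accord"))
new_pk = attempt("INSERT INTO car VALUES (?, ?)", ("V002", "Accord"))
print(null_pk, duplicate_pk, new_pk)
```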
D. Referential Integrity Constraints
Referential integrity constraints are used to specify the relationships between two relations in a
database.

Consider a referencing relation, R1, and a referenced relation, R2. Tuples in the referencing
relation, R1, have attributes FK (called foreign key attributes) that reference the primary key
attributes of the referenced relation, R2. A tuple, t1, in R1 is said to reference a tuple, t2, in R2 if
t1[FK] = t2[PK].
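
The rule t1[FK] = t2[PK] can be tried out with Python's built-in sqlite3 module. In this sketch, employee plays the role of the referencing relation R1 and department the referenced relation R2; the table names, sample rows, and the helper attempt are assumptions for illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- the foreign key FK
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")

def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# References an existing department tuple: allowed.
valid_reference = attempt("INSERT INTO employee VALUES (1, 'Asha', 10)")
# No department has dept_id 99, so this would be a dangling reference.
dangling_reference = attempt("INSERT INTO employee VALUES (2, 'Ravi', 99)")
# Deleting a department that is still referenced would orphan employee 1.
delete_referenced = attempt("DELETE FROM department WHERE dept_id = 10")
print(valid_reference, dangling_reference, delete_referenced)
```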
A referential integrity constraint can be displayed in a relational database schema as a directed
arc from the referencing (foreign) key to the referenced (primary) key. Examples are shown in
the figure below:

Examples of ER diagrams:
Business rules (i.e., relationships)
Example 1
1. a professor teaches zero, one or many classes and a class is taught by one professor
2. a course may generate zero, one or many classes and a class comes from one course
3. a class is held in one room but a room has many classes
Example 2 (try this at home and if you have questions raise them next class)
1. an invoice is written by one salesrep but a salesrep writes many invoices
2. a vendor sells many products but a product is bought from one vendor
3. an invoice has one or many products and a product is found on zero, one or many
invoices
Example-1 Solution (Incomplete)
The many-to-many relationship is not resolved, therefore the solution is incomplete. In the final
solution the many-to-many must always be resolved.

Final Solution (Complete)

In this example, the many-to-many relationship between student and class is resolved.
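
Resolving a many-to-many relationship usually means introducing a third (associative) table whose rows pair up the keys of the two original tables. The sketch below shows this for student and class using Python's built-in sqlite3 module; the table name enrollment, the column names, and the sample rows are all assumptions for illustration, not taken from the missing figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE class   (class_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Associative table: each row records one student taking one class.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        class_id   INTEGER NOT NULL REFERENCES class(class_id),
        PRIMARY KEY (student_id, class_id)
    );

    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO class   VALUES (100, 'DBMS'), (200, 'OS');
    INSERT INTO enrollment VALUES (1, 100), (1, 200), (2, 100);
""")

# One student, many classes.
classes_of_asha = [row[0] for row in conn.execute("""
    SELECT c.title FROM enrollment e
    JOIN class c ON c.class_id = e.class_id
    WHERE e.student_id = 1 ORDER BY c.title
""")]

# One class, many students.
students_in_dbms = [row[0] for row in conn.execute("""
    SELECT s.name FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    WHERE e.class_id = 100 ORDER BY s.name
""")]
print(classes_of_asha, students_in_dbms)
```

The many-to-many relationship is replaced by two one-to-many relationships: student to enrollment and class to enrollment.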

Solution to the INVOICE problem

Collect one E-R diagram with extended ER features and explanations


QUESTION:
Create an E-R diagram of a sports club conducted by a school.
a) A school has decided to set up a sports club outside the school.
b) Sports clubs can be categorized based on type: cricket club, football club.
c) A student can join any one of the sports clubs.
d) Each sports club has a coach who trains the students.
e) Each student can be identified using an id no.
f) Salary, experience, name of the coach can also be included.

Design a database to keep track of information for an art museum. Assume that the following
requirements were collected:
The museum has a collection of art_objects. Each art_object has a unique id, an artist (if known), a
year (when it was created, if known) and a title. Art_objects are categorized based on
their types. There are two main types: painting and sculpture. A painting has a paint type and
style. A sculpture has a material from which it was created, height and weight.
Different exhibitions occur, each having a name, start date and end date. Exhibitions are related to
all the art objects that were on display during the exhibition.

BLOCK I
PART A

1. Define a data-base-management system.


2. List any eight applications of database system.
3. What is the difference between single-user and multi-user system?
4. Write short notes on file system.
5. What are the disadvantages of file system?
6. What are the characteristics of a database system?
7. Explain the terms,
i) Catalog
ii) Meta-data
8. Define the terms,
i) Program-data independence
ii) Program-operation independence
9. What is meant by a view?
10. What is the need for concurrency control software in a DBMS?
11. Write short notes on online Transaction processing.
12. What are the advantages of using a DBMS?
13. In what situations, should we not use a DBMS?
14. Write notes on database system structure.
15. What is a storage Manager?
16. What is the purpose of a storage manager?
17. List the data structures implemented by the storage manager.
18. List the data structures implemented by the storage manager.
19. What is a data dictionary?
20. List out the components of a query processor.
21. Define the terms,
i) instance

ii) schema

22. Define the terms i. Physical schema, ii. Logical schema.

23. What is a data model?


24. List the categories of data models according to the types of concepts used to describe the
database structure.
25. Explain the following:
i) Conceptual data model ii) Physical data model
iii) Implementational data model.
26. What are record-based data models?
27. What is an access path?
28. What is an Entity-Relationship model?
29. What are attributes? Give example.
30. What is a relationship? Give example.
31. Define the terms,
i) Entity set ii) Relationship set
32. Express graphically by an ER diagram the following components
i) Entity sets

ii) Attributes

iii) Relationship

iv) Links

33. What is a Relational model?


34. Define the terms,
i) DDL

ii) DML

35. What are the two types of DML?


36. What are the three levels of architecture in ANSI/SPARC?
37. What is the need for a data administrator and a database administrator?
38. What is meant by a client /server architecture?
39. What are simple and composite attributes?
40. Define Single valued and multivalued attributes.
41. What are stored and derived attributes?
42. When do we use null values?
43. What are complex attributes?
44. Define the terms
i) Entity type
ii) Entity set
45. Define the terms
i) Key attribute
ii) Value set

46. What are weak and strong entity types? How are they represented in an ER diagram?

47. Draw ER diagram notations for the following:


i) Key attribute

ii) Multivalued attribute

iii) Composite attribute

iv) Derived attribute

48. What is a relationship type? What is meant by the degree of relationship type?
49. What does a role name signify?
50. What does the participation constraint specify?
Long answer questions:
1) Explain Database Systems Versus File Systems.
2) Explain Database system Structure
3) Explain Entity Relationship model.
4) Explain Relational Model.
5) Explain Database Language.
6) Limitations of the ER Model.
7) What is Extended Entity Relationship (EER) Model
8) Compare logical and physical independence
9) Explain the two types of participation constraint.
10) What are Recursive Relationships?
11) What does the cardinality ratio specify?
12) What are structural constraints?
13) Define Data base Management system.
14) Define Data model.
15) What is a schema?
16) What is DML?
17) What are the subsystems in a database system?
18) What are Data base applications?
19) List the advantages of Hierarchical Model.
20) List out the disadvantages of hierarchical model.
21) What are the advantages of network model?
22) What are the disadvantages of network model?

23) Compare logical and physical independence.


24) Explain Data Definition Language.
25) What is Data Manipulation Language?
26) What is Extended Entity Relationship (EER) Model
27) Limitations of the ER Model.
28) Explain Weak Entity Types.

Say true or false:


1.A file is a collection of similar records.
2.A database is a collection of interrelated files.
3.An historical advantage of using conventional files has been processing speed. They can be
optimized for the access of the application.
4.Duplication of data items in multiple files is normally cited as the principal disadvantage of
file-based systems.
5.A database is not necessarily dependent on the applications that use it.
6.Given the large capacity disks that are now available, database administrators no longer have to
be concerned about estimating how much disk capacity is required for a new database.
7.It is important for the database administrator to estimate how much disk capacity is required
for a new database to ensure that sufficient disk space is available.
8.Conventional files are relatively difficult to design and implement because they are normally
designed for use with multiple applications or information systems.
9.Files tend to be built around single applications without regard to other, future applications.
10.A significant disadvantage of conventional files is their inflexibility and non-scalability.
11.A significant advantage of conventional files is their flexibility and scalability.
12.As legacy file-based systems and applications become candidates for reengineering, the trend
is overwhelmingly in favor of replacing file-based systems and applications with database
systems and applications.
13.As enterprise systems and applications are re-engineered, the trend is overwhelmingly in
favor of replacing database systems and applications with legacy file-based systems and
applications.
14.The principal advantage of database systems is the ability to share the same data across
multiple applications and systems.
15.A principal advantage of the database approach is that you can build a single super-database
that contains all data items of interest to an organization.

16.Most organizations build several databases, each one sharing data with several information
systems. Thus, there will be some redundancy between databases.
17.Most organizations build several databases leading to significant and uncontrolled redundancy
between databases.
18.Database technology offers the advantage of storing data in flexible formats.
19.A disadvantage of database technology is the lack of flexibility in data storage formats.
20.Data independence refers to the fact that databases are defined separately from the
information systems and application programs that will use them.
21. Database technology provides superior scalability, meaning that the database and the systems
that use it can be grown or expanded to meet the changing needs of an organization.
22.Database technology provides better technology for client/server and network computing
architectures.
23.You see a return to conventional file-based architectures today because they are better
technology for client/server and network computing architectures.
24.File technology is more complex than database technology.
25.While a database management system (DBMS) is somewhat slower than file technology,
these performance limitations are rapidly disappearing.
Fill in the Blank Questions
1. Database design should proceed only if the underlying logical data model is in at least
_____________________ normal form.
2. A(n) _____________________________ is the physical model or blueprint for a database. It
represents the technical implementation of the logical data model.
3. During the creation of database schemas, _________________________ means the field does
not have to have a value; whereas, __________________________ means the field must have a
value.
4. ____________________ integrity for a database means that every table should have a primary
key (which may be concatenated) but is controlled such that no two records in the table have the
same primary key value.
5. The ______________________ key for a record must never be allowed to have a NULL value.
6. ____________________________ integrity means that appropriate controls must be designed to
ensure that no field takes on a value that is outside the range of legal values.
7. _______________________________ integrity means that the architecture of relational
databases implements the relationships between the records in tables via foreign keys.
8. A(n) __________________________________ error exists when a foreign key value in one
table has no matching primary key value in the related table.
9. A(n) ________________________________ is an alternate name for a foreign key that clearly
distinguishes the purpose that foreign key serves in the table.
10. _______________________________________ establishes which business locations need
access to which logical data entities and attributes.
11. ____________________________________ of a database means that it would be implemented
on a single server regardless of the number of physical locations that may require access to it.
12. ______________________________ distribution of the data means that each table or entire rows
in a table would be assigned to different database servers and locations. This option results in
efficient access and security because each location has only those tables and rows required for
that location.
13. _________________________________ distribution of the data has the unfortunate side effect
that data cannot always be easily recombined for management analysis across sites.
14. ________________________________ distribution of the data has specific columns of tables
assigned to specific databases and servers.
15. ___________________________________ of data refers to the physical duplication of entire
tables to multiple locations.
16. _________________________________ of data offers performance and accessibility advantages
and reduces network traffic, but it also increases the complexity of data integrity and requires
more physical storage.
17. ______________________________ is the simplest and easiest solution to maintain; however, it
violates a data management rule that has become important to many data administrators and
users - data should be located as closely as possible to its users.
18. A(n) _____________________________ is a collection of similar records.
19. A(n) ______________________________ is a collection of interrelated files.
20. A historical ___________________________ (advantage or disadvantage) of using conventional
files has been processing speed. They can be optimized for the access of the application.
21. Duplication of data items in multiple files is normally cited as the principal disadvantage of
______________ (file-based or database) systems.
22. _________________________________ is a three-step technique that places the data model into
first normal form, second normal form and third normal form.
23. Once a database design and its corresponding schema have been completed, a
_________________________ database can usually be generated very quickly.
24. Conventional files are relatively __________________ (easy or hard) to design and implement
because they are normally designed for use with a single application or information system.
25. The trend is overwhelmingly in favor of replacing file-based systems and applications with
______________ systems and applications.

Multiple choice questions:

1. A Gender field can hold only the values M or F. This is an example of:
A) key integrity
B) domain integrity
C) referential integrity
D) logical integrity
E) schema integrity

2. The EmployeeID field in an employee table cannot be left blank. This is an example of:
A) key integrity
B) domain integrity
C) referential integrity
D) logical integrity
E) schema integrity

3. The DeptID field in an employee table must match the DeptID of an existing record in the
department table. This is an example of:
A) key integrity
B) domain integrity
C) referential integrity
D) logical integrity
E) schema integrity

4. Specialized computer software that is used to create, access, control, and manage the database
is called:
A) network system
B) database management system
C) operating system
D) network operating system
E) none of these

5. A program embedded within a table and invoked automatically by updates is called a(n):
A) DML
B) DDL
C) trigger
D) stored procedure
E) view

6. The person responsible for data planning, definition, architecture, and management is known as
a(n):
A) data administrator
B) database administrator
C) system owner
D) end-user
E) none of these

7. The person responsible for the database technology, database design and construction
consultation, security, backup and recovery, and performance tuning is known as a(n):
A) data administrator
B) database administrator
C) system owner
D) end-user
E) none of these

8. To add a new record to a database table you would use:
A) DML
B) DDL
C) DBA
D) CASE
E) none of these

9. Which language is used by the DBMS to physically establish those record types, fields, and
structural relationships in a relational database?
A) DML
B) DDL
C) DBA
D) CASE
E) none of these

10. Which language is used by the DBMS to create, read, update, and delete records in the
database and to navigate between different records and types of records?
A) DML
B) DDL
C) CASE
D) navigator
E) none of these

11. The physical, relational database implementation of a data model is known as a:
A) scenario
B) role model
C) schema
D) primary data model
E) none of these

12. Which of the following is not a command in SQL?
A) SELECT
B) BLOCK
C) PROJECT
D) JOIN
E) all of these are commands in SQL

13. Which of the following is NOT a high-performance relational DBMS?
A) Oracle
B) IBM DB2
C) Microsoft SQL Server
D) Microsoft Access
E) Sybase

14. A program that is embedded within a relational database table that can be called from an
application program is known as a(n):
A) embedded procedure
B) stored procedure
C) trigger procedure
D) schema procedure
E) none of these

15. Which of the following are not criteria for producing a quality data model?
A) A good data model is simple.
B) A good data model is redundant.
C) A good data model is flexible.
D) A good data model is adaptable to future needs.
E) all of these are criteria for producing a quality data model

16. No two records in an employee table can have the same value for EmployeeID. This is an
example of:
A) key integrity
B) domain integrity
C) referential integrity
D) logical integrity
E) schema integrity

17. Appropriate controls must be designed to ensure that no field takes on a value that is outside of
the range of legal values. This refers to:
A) referential integrity
B) domain integrity
C) key integrity
D) data integrity
E) none of these

18. The foreign key value in one table must have a matching primary key value in the related
table. This refers to:
A) referential integrity
B) domain integrity
C) key integrity
D) data integrity
E) none of these

19. The alternate name for a foreign key that clearly distinguishes the purpose that foreign key
serves in the table is known as:
A) role name
B) attribute name
C) service pointer
D) domain name
E) none of these

20. When a database is implemented on a single server regardless of the number of physical
locations that may require access to it, it is known as:
A) centralization
B) horizontal distribution
C) vertical distribution
D) replication
E) none of these

21. When a table or entire rows in a table are assigned to different database servers and locations, it
is known as:
A) centralization
B) horizontal distribution
C) vertical distribution
D) replication
E) none of these

22. A collection of similar records is known as:
A) a field
B) a file
C) a database
D) an attribute
E) none of these

23. A collection of interrelated files is known as:
A) a field
B) a record
C) a database
D) a network
E) none of these

24. The physical implementation of a data attribute; it is the smallest unit of meaningful data to
be stored:
A) a field
B) a file
C) a record
D) a key
E) none of these

25. The field whose values identify one and only one record in a file is known as the:
A) attribute
B) associative field
C) primary key
D) secondary key
E) none of these

26. An alternate identifier for a database, its value may identify either a single record or a subset of
all records is known as a(n):
A) attribute
B) associative field
C) primary key
D) secondary key
E) none of these

27. Pointers to the records of a different file in a database, they are used to link records of one
type to those of another type:
A) attributes
B) referential pointers
C) descriptive fields
D) foreign keys
E) none of these

28. A collection of fields arranged in a predefined format is known as a(n):
A) attribute
B) file
C) concatenated key
D) record
E) none of these

29. Record structures that require each record instance to have the same fields, same number of
fields, and same logical size are classified as:
A) a fixed length record structure
B) a variable length record structure
C) a table
D) a transaction file structure
E) none of these

30. Record structures that allow different records in the same file to have different lengths are
known as:
A) a fixed length record structure
B) a standard deviation record structure
C) a variance record structure
D) a variable length record structure
E) none of these

31. The number of logical records included in a single read or write operation from the
computer's perspective is known as the:
A) length factor
B) transaction factor
C) blocking factor
D) referential factor
E) none of these

32. The set of all occurrences of a record structure is known as a(n):
A) field
B) file
C) object
D) database
E) none of these

33. The relational database equivalent of a file is known as a(n):
A) scenario
B) Transaction
C) Block
D) Table
E) None of these

34. Files or tables that contain records that are relatively permanent are known as:
A) Master
B) Transaction
C) Document
D) Archival
E) None of these

35. Files or tables that contain records that describe business events are known as:
A) master
B) transaction
C) document
D) archival
E) none of these

36. Files and tables that contain stored copies of historical data for easy retrieval and review without
the overhead of regeneration are known as:
A) master
B) archival
C) document
D) table look-up
E) none of these

37. Files and tables that contain master and transaction file records that have been deleted from
on-line storage are known as:
A) document
B) table look-up
C) archival
D) audit
E) none of these

38. Files that contain relatively static data that can be shared by applications to maintain
consistency and improve performance are known as:
A) document
B) table look-up
C) archival
D) audit
E) none of these

39. Files that are special records of updates to other files, especially master and transaction files,
are known as:
A) Document
B) Audit
C) Archival
D) Block
E) None of these

40. A database that stores data extracted from operational databases for the purpose of data
mining is called a(n):
A) Transactional database
B) Personal database
C) Workgroup database
D) Data warehouse
E) Distributed database

41. Which of the following is the smallest unit of data stored?
A) File
B) Logical record
C) Block
D) Field
E) None of these

42. A collection of fields arranged in a predefined format is called a:
A) character
B) record
C) field
D) database
E) none of these

43. Fields whose values identify one and only one record in a file are called:
A) foreign keys
B) primary keys
C) alternative keys
D) concatenated keys
E) none of these

44. Which of the following is an acceptable technique for implementing supertype/subtype
entities?
A) Each supertype and subtype can be implemented with a separate table.
B) The subtypes may be collapsed into the supertype to create a single table.
C) The supertype's attributes could be duplicated in a table for each subtype.
D) A and C
E) all of these

45. Which form of distribution duplicates data in multiple locations?
A) centralization
B) horizontal distribution
C) vertical distribution
D) replication
E) none of these

46. What is a blocking factor?
A) size of a physical record
B) size of a block of logical records
C) number of logical records included in a single read
D) number of physical records in a physical record
E) size of a logical record

47. The main reason why data is retrieved in "blocks" is to:
A) minimize the number of actual disk accesses
B) save space
C) make programming easier
D) increase the lifetime of hardware
E) all of these

48. The person responsible for the database technology, database design and construction,
security, backup and recovery, and performance tuning is the:
A) network administrator
B) systems administrator
C) systems analyst
D) database administrator
E) none of these

49. DDL stands for which of the following:
A) data definition language
B) data defined language
C) driven data language
D) driven data loop
E) data definition lookup

50. Which of the following languages is used to create, read, update, and delete records in the
database and to navigate between different records and types of records?
A) DML
B) DDL
C) DSS
D) DDS
E) none of these

51. The database technology used to support data architecture is called:
A) database architecture
B) network architecture
C) systems architecture
D) data architecture
E) none of these

52. An employee and a customer file are both examples of which type of file?
A) master
B) transaction
C) scratch
D) table
E) archive
53. A file that contains records that describe business events is what type of file?
A) master
B) transaction
C) scratch
D) table
E) archive

54. A file that contains off-line records of master or transaction files is what type of file?
A) log
B) scratch
C) table
D) archive
E) none of these

55. When specific columns of tables are assigned to specific databases or servers, it is known as:
A) centralization
B) horizontal distribution
C) vertical distribution
D) replication
E) none of these

56. When entire tables are duplicated and stored in multiple locations or file servers, it is known
as:
A) centralization
B) horizontal duplication
C) vertical duplication
D) replication
E) none of these

57. Which of the following is NOT a step of database capacity planning?
A) Calculate the record size by summing the field sizes in each table.
B) Calculate the table size by multiplying the record size times the number of records.
C) Sum the table sizes.
D) Optionally, add a slack capacity buffer.
E) none of these

58. Every nonkey field is called a(n):
A) secondary key
B) foreign key
C) descriptive field
D) record
E) none of these
