RDBMS Important Questions With Answers

Uploaded by

Vsarchana Qa
== RDBMS – Relational Database Management System

An RDBMS is software used to define, create, and maintain a database, and it provides controlled access
to the data. An RDBMS is an advanced version, or extension, of a DBMS.

In an RDBMS, information is stored in tables, and the links formed by data shared between tables are called relationships.

Data : a collection of raw, unstructured facts.

Raw indicates that the data has not yet been processed to reveal its meaning.

Data can be made up of numbers (digits 0 to 9), characters, special symbols, pictures, audio, video, etc.

Process : the manipulation of data.

Manipulation : changing the data from one form to another.

Information : processed data is called information. Before processing it is called data, and after
processing it is called information.

Eg: student number, name, address, phone number, father's name, etc.

Database : an organized collection of logically related data, without redundancy, stored in one
place and accessed by multiple users.

That is, a database does not store repeated (duplicate) data, it is kept in a centralized unit, and
more than one person can access the data in it.

Eg: a college (staff, students, non-staff, administration), a hospital, a bank.

DBMS : a software system (often commercial) which can be used to:

Create a database

Insert data into a DB

Modify data in a DB

Remove data from a DB

Maintain a DB

RDBMS : a DBMS that implements these operations systematically, using related tables.

DBMS software examples : Oracle, SQL Server, MySQL, DB2
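
The operations listed above (create, insert, modify, remove) can be tried out with a minimal sketch. It assumes SQLite through Python's standard `sqlite3` module as a stand-in for any DBMS; the `student` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: an assumption made only for this illustration.
conn = sqlite3.connect(":memory:")

# Create: define a table inside the database.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Insert data into the database.
conn.execute("INSERT INTO student VALUES (1, 'Asha', '5550001')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', '5550002')")

# Modify data in the database.
conn.execute("UPDATE student SET phone = '5559999' WHERE roll_no = 1")

# Remove data from the database.
conn.execute("DELETE FROM student WHERE roll_no = 2")

# Retrieve whatever is left.
rows = conn.execute("SELECT roll_no, name, phone FROM student").fetchall()
print(rows)
```

Each step is a single SQL statement handed to the DBMS; the program never touches the stored data directly.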

Introduction to OS --- refer to the downloaded document

RDBMS Important Questions


UNIT-I
1) What is DBMS? Explain the file-based system.
2) Explain the advantages of DBMS over a file-based system.
3) Explain DBA roles and responsibilities.
4) Explain all keys in DBMS (Primary Key, Foreign Key, ...).
5) Explain the E-R model and relationships. Explain with an ER model for a college database.
6) Explain the three-level architecture of DBMS.

UNIT – I

1) What is DBMS? Explain the DBMS file system
Data:
• Data is defined as a collection of raw facts about a place, person, thing, or object involved in the
transactions of an organization.
• Data can be represented in various forms such as text, numbers, images, audio, video, graphs,
document files, etc.
• Data constitutes the building blocks of information.
• Data is one of the most important assets of a modern business.
• Data becomes relevant based on the context.

Information:
• Information can be defined as processed data that increases the knowledge of the end user.
• Information reveals the meaning of data.
• Good, accurate, and timely information is used in decision making.
• The quality of the data influences the quality of the information.
• Information can be presented in tabular form, as a bar graph, or as an image.

Database:
• A database can be defined as an organized collection of logically related data.
• A database can be of any size and complexity.
• Data are structured so as to be easily stored, manipulated, and retrieved by users.
• Example: a salesperson can store customer contacts on a laptop, amounting to a few megabytes
of data, while a big company can store the data of all activities in the organization, which
helps in decision making.

DBMS:
• A database management system (DBMS) can be defined as an organized collection of logically related
data together with a set of programs used for creating, storing, updating, and retrieving data from
the database.
• A DBMS acts as a mediator between the end user and the database.
• Equivalently, a DBMS is a collection of programs that manages the database structure and controls
access to the data.
• A DBMS enables data to be shared.
• A DBMS integrates many users' views of the data.

Database Systems
• Database system consists of logically related data stored in a single logical data repository.
• Database system may be physically distributed among multiple storage facilities
• DBMS eliminates most of file system’s problems.
• Current-generation DBMS software stores data structures, the relationships between structures, and
access paths. It also defines, stores, and manages all access paths and components.

TYPES OF DATABASES
• Databases can be classified according to:
– Number of users
– Database location(s)
– Expected type and extent of use
• Single-user database supports only one user at a time
– Desktop database: single-user; runs on PC
• Multiuser database supports multiple users at the same time
– Workgroup and enterprise databases
• Centralized database: data located at a single site
• Distributed database: data distributed across several different sites
• Operational database: supports a company’s day-to-day operations
– Transactional or production database
• Data warehouse: stores data used for tactical or strategic decisions

2) Explain advantages of DBMS over file system
File System: A file management system allows access to a single file or table at a time. In a
file system, data is stored directly in a set of files. It contains flat files that have no
relation to other files (when only one table is stored in a single file, the file is known as
a flat file).
DBMS: A Database Management System (DBMS) is software that allows users to efficiently
define, create, maintain, and share databases. Defining a database involves specifying the
data types, structures, and constraints of the data to be stored in the database. Creating a
database involves storing the data on some storage medium that is controlled by the DBMS.
Maintaining a database involves updating it whenever required so that it evolves with and
reflects changes in the miniworld (the part of the real world it models), and also generating
reports for each change. Sharing a database involves allowing multiple users to access it.
A DBMS also serves as an interface between the database and end users or application
programs. It provides controlled access to the data and ensures that the data is consistent
and correct by defining rules on it. An application program accesses the database by sending
queries or requests for data to the DBMS. A query causes some data to be retrieved from the
database.

Advantages of DBMS over File system:

• Data redundancy and inconsistency: Redundancy means repetition of data, i.e. a data item may
have more than one copy. The file system cannot control redundancy, because each user defines
and maintains the files needed for a specific application. Two users may maintain the same
data in separate files for different applications, so changes made by one user are not
reflected in the files used by the other, which leads to inconsistent data. A DBMS controls
redundancy by maintaining a single repository of data that is defined once and accessed by
many users. With little or no redundancy, the data remains consistent.
• Data sharing: In a file system, sharing data is either not possible or too complex. In a
DBMS, data can be shared easily because the system is centralized.
• Data concurrency: Concurrent access means that more than one user accesses the same data
at the same time. Anomalies occur when changes made by one user are lost because of changes
made by another user. The file system provides no mechanism to prevent such anomalies,
whereas a DBMS provides a locking system to prevent them.
• Data searching: In a file system, a separate application program has to be written for
every search operation. A DBMS provides built-in search operations, so the user only has to
write a small query to retrieve data from the database.
• Data integrity: Some constraints may need to be applied to the data before it is inserted
into the database. The file system provides no way to check these constraints automatically,
whereas a DBMS maintains data integrity by enforcing user-defined constraints itself.
• System crashing: Systems can crash for various reasons. In a file system, data lost in a
crash generally cannot be recovered. A DBMS has a recovery manager that restores the data,
which is another advantage over file systems.
• Data security: A file system typically offers only a password mechanism to protect the
data, and a password alone cannot guarantee long-term protection. A DBMS has specialized
features that help protect its data.
• Backup: A DBMS provides a backup subsystem to restore the data if required.
• Interfaces: A DBMS provides multiple user interfaces, such as graphical user interfaces and
application program interfaces.
• Easy maintenance: A DBMS is easy to maintain due to its centralized nature.
DBMS technology continues to evolve. It is a powerful tool for data storage and protection,
and AI-based DBMS features are expected to appear in the coming years.
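
Two of the points above, data integrity and recovery from a failed operation, can be illustrated with a minimal sketch. It assumes SQLite via Python's `sqlite3` module; the `account` table and the `transfer` helper are hypothetical names used only here. A `CHECK` constraint stands in for a user-defined rule, and a transaction rollback stands in for the work of a recovery manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # user-defined constraint
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The DBMS rejected the change; the credit made above is undone as well.
        return False

ok = transfer(conn, 1, 2, 30)         # allowed
rejected = transfer(conn, 1, 2, 500)  # would make the source balance negative
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, rejected, balances)
```

The second transfer credits account 2 first and then fails the constraint on account 1. Because both updates belong to one transaction, the credit is rolled back and the data stays consistent.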

Database approach:
An information system that uses a Database
Management System (DBMS) to manage its
information has a particular structure, comprising three
components: Data, DBMS, and Application software.
This structure as described below is referred to as the
database approach to information system
development.
The central component of the database approach is the
DBMS. This software is also referred to as the
“database engine” or the “back end.” With regard to
the data it manages, it has several responsibilities
including the following:

Characteristics of the database approach in DBMS:

1. Data Independence:- One of the important characteristics of the database approach is
data independence. The database approach allows the physical storage of data to be
separated from the logical organization of data. This means that changes made to the
physical storage of data do not affect the logical organization of the data, and vice
versa. As a result, applications that use the database can be modified without affecting
the underlying data.
2. Data Integration:- Another characteristic of the database approach is data integration.
The database approach allows data from different sources to be integrated into a single
database. This means that data can be shared between different applications, reducing
redundancy and improving data consistency.
3. Centralized Control:- The database approach allows for centralized control of data. This
means that data can be managed and controlled from a single location, making it easier to
enforce data security and access control policies.
4. Consistency:- The database approach provides consistency in data storage and retrieval.
Data within the database is organized in a consistent and predictable manner, making it
easier to retrieve and analyze data. Consistency is maintained through the use of rules and
constraints that ensure that data is entered and stored in the correct format.
5. Scalability:- The database approach is scalable, allowing for the management of large
amounts of data. The use of a DBMS allows data to be managed in a structured way, making
it easier to manage and retrieve large amounts of data.
6. Security:- The database approach provides data security. Data can be secured through
the use of access control policies and encryption. This ensures that data is only accessed
by authorized users and that sensitive data is protected from unauthorized access.
7. Concurrent Access:- The database approach allows for concurrent access to data.
Multiple users can access the same data simultaneously, ensuring that data is available to
all users in real time. This is achieved through the use of locking mechanisms that prevent
multiple users from modifying the same data at the same time.
8. Data Modeling:- The database approach allows the use of data modeling techniques to
create a logical representation of data. Data modeling is a process that involves
identifying the data, entities, attributes, and relationships in a system. By using data
modeling techniques, developers can gain a clear understanding of the data in a system and
how it is related, making it easier to design the database schema.
9. Data Integrity:- The database approach provides mechanisms for ensuring data integrity.
These include data constraints such as primary keys, foreign keys, and check constraints,
which ensure that data is entered in the correct format and that relationships between
records are maintained.
10. Data Querying:- The database approach provides powerful data querying capabilities.
The use of Structured Query Language (SQL) allows users to retrieve and manipulate data in
a flexible and efficient manner. SQL allows for the filtering, sorting, grouping, and
aggregation of data, making it easier to extract insights from large amounts of data.
11. Data Backup and Recovery:- The database approach provides mechanisms for data backup
and recovery. This means that in the event of data loss or corruption, data can be restored
to a previous state. DBMSs provide mechanisms for creating backups of the database at
regular intervals and for restoring data from these backups in the event of data loss.
12. Data Replication:- The database approach allows data to be replicated across multiple
servers. Data replication is the process of copying data from one server to another,
providing redundancy and improving data availability. By replicating data across multiple
servers, organizations can ensure that data is available in the event of a server failure.
In conclusion, the database approach is an effective way of organizing and managing data in
a DBMS. The characteristics of the database approach include data independence, data
integration, centralized control, consistency, scalability, security, concurrent access,
data modeling, data integrity, data querying, data backup and recovery, data replication,
data analysis, and application development. By using the database approach, organizations
can improve data management and accessibility, making it easier to extract insights from
large quantities of data.
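
Two of the characteristics listed above, data integrity through key constraints and querying with SQL, lend themselves to a short illustration. This is a minimal sketch assuming SQLite via Python's `sqlite3` module; the `department` and `employee` tables and their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  INTEGER CHECK (salary > 0),
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Accounts');
INSERT INTO employee VALUES
    (10, 'Asha', 4000, 1),
    (11, 'Ravi', 3000, 1),
    (12, 'Meena', 5000, 2);
""")

# Integrity: an employee row that points at a missing department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (13, 'Kiran', 3500, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Querying: filter, group, aggregate, and sort in one statement.
report = conn.execute("""
    SELECT d.name, COUNT(*) AS staff, SUM(e.salary) AS total_salary
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    WHERE e.salary >= 3000
    GROUP BY d.name
    ORDER BY total_salary DESC
""").fetchall()
print(fk_rejected, report)
```

The primary key, foreign key, and check constraints are declared once in the schema, and the DBMS then enforces them on every insert without any application code.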
As can be seen from the above list, a DBMS is a
complex software application. While all database
management systems may not provide all of these
features, these are the general characteristics of
today’s DBMSs. Using a database requires considerable
expertise and knowledge about the specific DBMS
being used. Some of the more popular DBMSs today
are MySQL, Microsoft SQL Server, Oracle, PostgreSQL,
Microsoft Access, and IBM's DB2.
The second component in the database approach is the
data. Although the physical location or manner in
which the data are stored may be important for
performance reasons, the location of the data does not
determine whether or not a system is developed using
the database approach. As long as the DBMS has
access to the data and can perform its responsibilities
with regard to it, the details of the data storage are
not relevant.
The final component of the database approach is the
application, also called "front end" software.
Application software interacts with the DBMS to
provide information to a user. It may also provide a
way for a user to invoke other functionality of the
DBMS. In fact, the DBMS software itself is non-visual,
meaning that the user does not interact directly with
the DBMS. Any software that provides an interface for
the user to invoke procedures in the DBMS we will
define as application software.
Once the application has determined what the user is
trying to accomplish, it sends a request to the DBMS.
The request may be an instruction to change data or a
request for information such as the list of employees
who were hired on a particular date. All relational
databases use a standard language to receive and
process requests. The standard language is
called Structured Query Language (SQL).
The DBMS receives the request and determines if the
operation requested is allowed for the authenticated
user. If the operation is allowed, the DBMS completes
the operation and sends a response to the application.
The application then communicates the information to
the user. If the operation is not authorized for the user
or if there is an error in fulfilling the operation, the
DBMS responds with an appropriate message. Again, it
is up to the application to display that to the user. It is
a critical feature of the database approach that the
application never bypasses the DBMS to access stored
data directly.
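
The request and response flow described above can be sketched in a few lines. This is a toy model, not how any particular DBMS is built: `PERMISSIONS`, `dbms_handle`, and `front_end` are hypothetical names, SQLite stands in for the back end, and a real DBMS enforces privileges internally rather than in application-level Python.

```python
import sqlite3

# Hypothetical permission table: which SQL verbs each user may run.
PERMISSIONS = {"clerk": {"SELECT"}, "manager": {"SELECT", "INSERT", "UPDATE"}}

def make_backend():
    """Set up a tiny 'back end' with sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, hired TEXT)")
    conn.execute("INSERT INTO employee VALUES ('Asha', '2021-04-01'), ('Ravi', '2022-07-15')")
    conn.commit()
    return conn

def dbms_handle(conn, user, sql, params=()):
    """Stand-in for the DBMS: authorize the request, execute it, and respond."""
    verb = sql.strip().split()[0].upper()
    if verb not in PERMISSIONS.get(user, set()):
        return {"ok": False, "message": f"{verb} not permitted for {user}"}
    try:
        with conn:
            rows = conn.execute(sql, params).fetchall()
        return {"ok": True, "rows": rows}
    except sqlite3.Error as exc:
        return {"ok": False, "message": str(exc)}

def front_end(conn, user, sql, params=()):
    """Stand-in for the application: forward the request and present the reply."""
    reply = dbms_handle(conn, user, sql, params)
    return reply["rows"] if reply["ok"] else "Error: " + reply["message"]

conn = make_backend()
hired = front_end(conn, "clerk", "SELECT name FROM employee WHERE hired = ?", ("2021-04-01",))
denied = front_end(conn, "clerk", "UPDATE employee SET name = 'X'")
print(hired, denied)
```

The front end only ever calls `dbms_handle`; it has no other path to the stored rows, which mirrors the rule that the application never bypasses the DBMS.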
Figure 1.1 illustrates some of the primary components
of a typical DBMS and how they are used in an
information system. The user interacts with the DBMS
generally by writing SQL statements through the front
end. (Although a sophisticated front end could format
the SQL statements itself based on other types of user
input.) These SQL statements are interpreted and
executed by the DBMS by either updating the data or
returning results from the data. In this class, we will
focus on query statements, whose purpose is to
retrieve data from the database and present it in a
form that is understandable by the user.
Difference between File System and DBMS
File System Approach

File-based systems were an early attempt to computerize the manual system. This is also
called the traditional approach: a decentralized approach in which each department stored
and controlled its own data with the help of a data processing specialist. The main role of
the data processing specialist was to create the necessary computer file structures, manage
the data within those structures, and design application programs that create reports based
on the file data.

In the above figure:

Consider an example of a student's file system. The student file will contain
information regarding the student (i.e. roll no, student name, course etc.).
Similarly, we have a subject file that contains information about the subject
and the result file which contains the information regarding the result.

Some fields are duplicated in more than one file, which leads to data
redundancy. So to overcome this problem, we need to create a centralized
system, i.e. DBMS approach.
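
The redundancy problem described above can be made concrete with a minimal sketch. Plain Python lists and dictionaries stand in for the separate files and for the centralized store; the field names and values are hypothetical.

```python
# Hypothetical flat "files": each one keeps its own copy of the student name.
student_file = [{"roll_no": 1, "name": "Asha", "course": "BSc"}]
result_file = [{"roll_no": 1, "name": "Asha", "subject": "RDBMS", "marks": 82}]

# The name is corrected, but only in the student file.
student_file[0]["name"] = "Asha Rao"

# The two files now disagree about the same student.
inconsistent = student_file[0]["name"] != result_file[0]["name"]

# Centralized alternative: the name is stored once and looked up by roll number.
students = {1: {"name": "Asha", "course": "BSc"}}
results = [{"roll_no": 1, "subject": "RDBMS", "marks": 82}]

students[1]["name"] = "Asha Rao"  # a single update reaches every report
report = [(students[r["roll_no"]]["name"], r["subject"], r["marks"]) for r in results]
print(inconsistent, report)
```

With the duplicated field, one update leaves the files inconsistent; with the name stored once, every report built from it sees the corrected value.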

DBMS:

In the database approach, a well-organized collection of data that are related in a
meaningful way is stored only once in the system but can be accessed by different users.
The operations performed by the DBMS include insertion, deletion, selection, sorting, etc.

In the above figure, duplication of data is reduced due to centralization of data.

The differences between a file system and a DBMS are as follows:

Difference between File System and DBMS

Basis | File System | DBMS
Structure | The file system is a way of arranging files in a storage medium within a computer. | DBMS is software for managing the database.
Data Redundancy | Redundant data can be present in a file system. | In a DBMS there is little or no redundant data.
Backup and Recovery | It does not provide a built-in mechanism for backup and recovery of data if it is lost. | It provides built-in tools for backup and recovery of data if it is lost.
Query Processing | There is no efficient query processing in the file system. | Efficient query processing is available in a DBMS.
Consistency | There is less data consistency in the file system. | There is more data consistency because of the process of normalization.
Complexity | It is less complex than a DBMS. | It is more complex to handle than a file system.
Security Constraints | File systems provide less security than a DBMS. | A DBMS has more security mechanisms than a file system.
Cost | It is less expensive than a DBMS. | It has a comparatively higher cost than a file system.
Data Independence | There is no data independence. | Data independence exists, mainly of two types: 1) Logical Data Independence. 2) Physical Data Independence.
User Access | Only one user can access data at a time. | Multiple users can access data at a time.
Meaning | The user has to write procedures for managing the data. | The users are not required to write procedures.
Sharing | Data is distributed in many files, so it is not easy to share data. | Due to its centralized nature, data sharing is easy.
Data Abstraction | It gives details of the storage and representation of data. | It hides the internal details of the database.
Integrity Constraints | Integrity constraints are difficult to implement. | Integrity constraints are easy to implement.
Attributes | To access data in a file, the user requires attributes such as file name and file location. | No such attributes are required.
Example | Cobol, C++ | Oracle, SQL Server
The main difference between a file system and a DBMS (Database Management System)
is the way they organize and manage data.
1. File systems are used to manage files and directories, and provide basic operations for
creating, deleting, renaming, and accessing files. They typically store data in a hierarchical
structure, where files are organized in directories and subdirectories. File systems are simple
and efficient, but they lack the ability to manage complex data relationships and ensure data
consistency.
2. On the other hand, DBMS is a software system designed to manage large amounts of
structured data, and provide advanced operations for storing, retrieving, and manipulating
data. DBMS provides a centralized and organized way of storing data, which can be
accessed and modified by multiple users or applications. DBMS offers advanced features
like data validation, indexing, transactions, concurrency control, and backup and recovery
mechanisms. DBMS ensures data consistency, accuracy, and integrity by enforcing data
constraints, such as primary keys, foreign keys, and data types.
In summary, file systems are suitable for managing small amounts of unstructured data, while
DBMS is designed for managing large amounts of structured data, and offers more advanced
features for ensuring data integrity, security, and performance.
3) Explain DBA roles and responsibilities
A Database Administrator (DBA) is a person responsible for controlling, maintaining,
coordinating, and operating a database management system. Managing, securing, and taking
care of the database systems is the DBA's prime responsibility. DBAs are in charge of
authorizing access to the database, coordinating and monitoring its use, capacity planning,
installation, and acquiring software and hardware resources as and when needed. Their role
also ranges across configuration, database design, migration, security, troubleshooting,
backup, and data recovery. Database administration is a key function in any firm or
organization that relies on one or more databases. DBAs are the overall commanders of the
database system.
Types of Database Administrator (DBA) :
• Administrative DBA –
Their job is to maintain the server and keep it functional. They are concerned with
data backups, security, troubleshooting, replication, migration, etc.
• Data Warehouse DBA –
They perform the roles above and are also accountable for merging data from various
sources into the data warehouse. They also design the warehouse, and clean and scrub
data prior to loading.
• Cloud DBA –
Companies now often prefer to keep their data in cloud storage, as it can reduce the
chance of data loss and provide an extra layer of data security and integrity. A cloud
DBA manages the databases hosted there.
• Development DBA –
They build and develop queries, stored procedures, etc. that meet the needs of the firm
or organization. They are on a par with programmers.
• Application DBA –
They manage all requirements of the application components that interact with the
database and carry out activities such as application installation and coordination,
application upgrades, database cloning, data load process management, etc.
• Architect –
They are responsible for designing schemas, such as building tables. They work to
build a structure that meets organizational needs. The design is then used by
developers and development DBAs to design and implement real applications.
• OLAP DBA –
They design and build multi-dimensional cubes for decision support or OLAP systems.
• Data Modeler –
In general, a data modeler is in charge of a portion of a data architect's duties. A data
modeler is typically not regarded as a DBA, but this is not a hard and fast rule.
• Task-Oriented DBA –
Large businesses may hire highly specialised DBAs to concentrate on a specific DBA task.
They are quite uncommon outside of big corporations. An example is a backup and recovery
DBA, whose responsibility is to guarantee that the databases of the business can be
recovered. This specialism is not present in the majority of firms. Where they exist,
task-oriented DBAs ensure that highly qualified professionals work on crucial DBA tasks.
• Database Analyst –
This position does not have a set definition. Junior DBAs may occasionally be referred to
as database analysts. A database analyst sometimes performs functions comparable to those
of a database architect. The term "Data Administrator" is also used to describe database
analysts and data analysts. Additionally, some businesses refer to database administrators
as data analysts.
Importance of Database Administrator (DBA) :
• The Database Administrator manages and controls the three levels of the database
management system architecture: the internal level, the conceptual level, and the
external level. In discussion with the wider user community, the DBA defines the world
(global) view of the database, and then provides external views for different users and
applications.
• The Database Administrator is responsible for maintaining the integrity and security of
the database by keeping out unauthorized users. The DBA grants permissions to the users of
the database and maintains a profile of every user.
• The Database Administrator is also accountable for ensuring that the database is
protected and secured and that any chance of data loss is kept to a minimum.
• The Database Administrator reduces the risk of data loss by backing up the data at
regular intervals.
Role and Duties of Database Administrator (DBA) :
• Decides hardware –
The DBA decides on economical hardware that best suits the organization, based on cost,
performance, and efficiency. The hardware is the interface between end users and the
database.
• Manages data integrity and security –
Data integrity needs to be checked and managed accurately, and data must be protected
from unauthorized use. The DBA keeps an eye on the relationships within the data to
maintain data integrity.
• Database accessibility –
The DBA is solely responsible for granting permission to access the data in the database,
and also determines who has the right to change the content.
• Database design –
The DBA is responsible and accountable for logical design, physical design, external
model design, and integrity and security control.
• Database implementation –
The DBA implements the DBMS and checks database loading at the time of its
implementation.
• Query processing performance –
The DBA enhances query processing by improving speed, performance, and accuracy.
• Tuning database performance –
If users cannot get data quickly and accurately, the organization may lose business. By
tuning SQL commands, the DBA can enhance the performance of the database.
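
One common tuning step, adding an index so that a frequent query no longer scans the whole table, can be sketched as follows. The sketch assumes SQLite via Python's `sqlite3` module; the `orders` table, the index name, and the data are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"cust{i % 100}", i) for i in range(1000)],
)
conn.commit()

QUERY = "SELECT amount FROM orders WHERE customer = ?"

def plan(conn):
    """Return SQLite's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("cust7",)).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column holds the plan text

before = plan(conn)  # without an index, the plan describes a scan of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(conn)   # with the index, the plan refers to idx_orders_customer

print(before)
print(after)
```

Comparing the plan before and after a change is a typical way to check that a tuning step has the intended effect before relying on it.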
Various responsibilities of Database Administrator (DBA) :
• Designing the overall database schema (tables and fields).
• Selecting and installing database software and hardware.
• Deciding on access methods and data storage.
• Selecting appropriate DBMS software such as Oracle, SQL Server, or MySQL.
• Designing recovery procedures.
• Deciding the user access levels and security checks for accessing, modifying, or
manipulating data.
• Specifying techniques for monitoring database performance.
• Operations management, which deals with the data problems that arise on a day-to-day
basis. These responsibilities include:
1. Investigating any errors found in the data.
2. Supervising restart and recovery procedures in case of any failure.
3. Supervising reorganization of the databases.
4. Controlling and handling all periodic dumps of data.
Skills Required for DBA:
1. The programming and soft skills required of a DBA are as follows:
• Good communication skills.
• Excellent knowledge of database architecture and design, and of RDBMS.
• Knowledge of Structured Query Language (SQL).
2. In addition, database administration includes the maintenance of data security, which
involves maintaining security authorization tables, conducting periodic security audits,
and investigating all known security breaches.
3. To carry out all these functions, it is crucial that the DBA has accurate information
about the company's data readily at hand. For this purpose the DBA maintains a data
dictionary.
4. The data dictionary contains definitions of all data items and structures, the various
schemas, the relevant authorization and validation checks, and the different mapping
definitions.
5. It should also have information about the source and destination of each data item and
the flow of the data item as it is used by a system. This type of information is a great
help to the DBA in maintaining centralized control of data.

What is a data dictionary?

A data dictionary contains metadata, i.e., data about the database. The data
dictionary is very important as it contains information such as what is in the
database, who is allowed to access it, where the database is physically stored, etc.
Users of the database normally do not interact with the data dictionary; it is
handled only by the database administrators.

The data dictionary in general contains information about the following −

• Names of all the database tables and their schemas.
• Details about all the tables in the database, such as their owners, their
security constraints, when they were created, etc.
• Physical information about the tables, such as where they are stored and how.
• Table constraints such as primary key attributes, foreign key information, etc.
• Information about the database views that are visible.
This is a data dictionary describing a table that contains employee details.

Field Name | Data Type | Field Size for display | Description | Example
EmployeeNumber | Integer | 10 | Unique ID of each employee | 1645000001
Name | Text | 20 | Name of the employee | David Heston
Date of Birth | Date/Time | 10 | DOB of Employee | 08/03/1995
Phone Number | Integer | 10 | Phone number of employee | 6583648648

The different types of data dictionary are −

Active Data Dictionary

If the structure of the database or its specifications change at any point of time, it
should be reflected in the data dictionary. This is the responsibility of the database
management system in which the data dictionary resides.

So, the data dictionary is automatically updated by the database management
system when any changes are made in the database. This is known as an active
data dictionary as it is self-updating.
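
As a small illustration of an active data dictionary, the sketch below uses SQLite through Python's standard sqlite3 module. SQLite keeps its own catalog (the sqlite_master table and the PRAGMA table_info command), and that catalog changes by itself when the schema changes. The employee table and its column names are assumptions made up for this example, loosely following the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, loosely based on the employee example above.
conn.execute("CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, name TEXT)")

def table_names(connection):
    """Return the table names recorded in SQLite's built-in catalog."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

def column_names(connection, table):
    """Return the column names of a table as reported by PRAGMA table_info."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

before = column_names(conn, "employee")
# Change the structure of the database; nobody edits the catalog by hand.
conn.execute("ALTER TABLE employee ADD COLUMN phone_number INTEGER")
after = column_names(conn, "employee")

print(table_names(conn))
print(before)
print(after)
```

The catalog reports the new column immediately after the ALTER TABLE statement, which is the self-updating behaviour described above.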

Passive Data Dictionary

This is not as useful or easy to handle as an active data dictionary. A passive data
dictionary is maintained separately from the database whose contents are described in
the dictionary. That means that if the database is modified, the data dictionary
is not automatically updated as in the case of an active data dictionary.

So, the passive data dictionary has to be manually updated to match the database.
This needs careful handling, or else the database and data dictionary go out of
sync.

Data file indices


Types of Databases
There are various types of databases used for storing different varieties of data:

1) Centralized Database

It is the type of database that stores data in one centralized database system. It
lets users access the stored data from different locations through
several applications. These applications contain an authentication process to let
users access data securely. An example of a centralized database is a central
library that carries a central database of each library in a college/university.

Advantages of Centralized Database


o It reduces the risk in data management, i.e., manipulation of data will
not affect the core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data
standards.
o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database


o The size of the centralized database is large, which increases the response
time for fetching the data.
o It is not easy to update such an extensive database system.
o If the central server fails, the entire data set becomes unavailable and may
even be lost, which could be a huge loss.

2) Distributed Database

Unlike a centralized database system, in distributed systems, data is distributed
among different database systems of an organization. These database systems are
connected via communication links. Such links help the end-users to access the
data easily. Examples of distributed databases are Apache Cassandra, HBase,
Ignite, etc.

We can further divide a distributed database system into:

o Homogeneous DDB: Those database systems which execute on the same
operating system, use the same application process, and run on the same
hardware devices.
o Heterogeneous DDB: Those database systems which execute on different
operating systems under different application procedures, and run on
different hardware devices.

Advantages of Distributed Database


o Modular development is possible in a distributed database, i.e., the system
can be expanded by including new computers and connecting them to the
distributed system.
o One server failure will not affect the entire data set.
3) Relational Database

This database is based on the relational data model, which stores data in the form
of rows (tuples) and columns (attributes), which together form a table (relation). A
relational database uses SQL for storing, manipulating, as well as maintaining the
data. E. F. Codd proposed the relational model in 1970. Each table in the database
carries a key that distinguishes each row from the others. Examples of relational
databases are MySQL, Microsoft SQL Server, Oracle, etc.

Properties of Relational Database

The following four commonly known properties of transactions in a relational
database are known as the ACID properties, where:

A means Atomicity: This ensures that a data operation either completes entirely
or has no effect at all. It follows the 'all or nothing' strategy. For example, a
transaction will either be committed or aborted.

C means Consistency: Any operation performed over the data must leave the data
in a valid state. For example, the account balances before and after a transaction
should be correct, i.e., the total amount should remain conserved.

I means Isolation: Several users may access data in the database at the same
time, so concurrent transactions should remain isolated from one another.
For example, when multiple transactions occur at the same time, the effects of one
transaction should not be visible to the other transactions until it completes.

D means Durability: It ensures that once an operation completes and its data is
committed, the changes remain permanent.
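
The atomicity and consistency points can be tried out with a minimal sketch in Python's standard sqlite3 module. The account table, the balances, and the transfer function are assumptions made up for this illustration. The rule that a balance may not go below zero stands in for a consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK clause is the consistency rule for this example.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move amount between accounts; both updates take effect or neither does."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    """Return {account id: balance}."""
    return dict(connection.execute("SELECT id, balance FROM account"))

ok = transfer(conn, 1, 2, 30)        # succeeds: both updates are committed
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)   # debit breaks the CHECK, so the credit is undone too
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the second call the credit to account 2 is executed first, but it is rolled back when the debit is rejected, so the balances stay as they were and the total remains conserved.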

4) NoSQL Database

Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of
data sets. It is not a relational database as it stores data not only in tabular form but
in several different ways. It came into existence when the demand for building
modern applications increased. Thus, NoSQL presented a wide variety of database
technologies in response to the demands. We can further divide a NoSQL database
into the following four types:
a. Key-value storage: It is the simplest type of database storage where it
stores every single item as a key (or attribute name) holding its value,
together.
b. Document-oriented Database: A type of database used to store data as
JSON-like documents. It helps developers to store data in the same
document-model format as used in the application code.
c. Graph Databases: It is used for storing vast amounts of data in a graph-like
structure. Most commonly, social networking websites use the graph
database.
d. Wide-column stores: It is similar to the data represented in relational
databases. Here, data is stored in large columns together, instead of storing
in rows.

Advantages of NoSQL Database


o It enables good productivity in the application development as it is not
required to store data in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.

5) Cloud Database

A type of database where data is stored in a virtual environment and runs on
a cloud computing platform. It provides users with various cloud computing
services (SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous
cloud platforms; some well-known options are:

o Amazon Web Services (AWS)
o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
o Google Cloud SQL, etc.

6) Object-oriented Databases

The type of database that uses the object-based data model approach for storing
data in the database system. The data is represented and stored as objects which
are similar to the objects used in the object-oriented programming language.

7) Hierarchical Databases

It is the type of database that stores data in the form of parent-children relationship
nodes. Here, it organizes data in a tree-like structure.

Data is stored in the form of records that are connected via links. Each child
record in the tree has only one parent. On the other hand, each parent
record can have multiple child records.
8) Network Databases

It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them.
Unlike the hierarchical database, it allows each record to have multiple children and
parent nodes to form a generalized graph structure.

9) Personal Database

Collecting and storing data on the user's system defines a Personal Database. This
database is basically designed for a single user.

Advantage of Personal Database


o It is simple and easy to handle.
o It occupies less storage space as it is small in size.

10) Operational Database

The type of database which creates and updates data in real time. It is
basically designed for executing and handling the daily data operations in several
businesses. For example, an organization uses operational databases for managing
its daily transactions.

11) Enterprise Database

Large organizations or enterprises use this database for managing a massive
amount of data. It helps organizations to increase and improve their efficiency. Such
a database allows simultaneous access to users.

Advantages of Enterprise Database:


o Multiple processes are supported over the Enterprise database.
o It allows executing parallel queries on the system.

Tuple in DBMS
A tuple is an essential component in the relational database management system.
Relational database management stores data in tables.

A tuple contains all the information about a particular entity. A table is composed of
fields (columns) and tuples (rows). A tuple represents a row in the table and the data
associated with the entity. The data in an RDBMS is arranged across several columns and
rows. A column represents an attribute of an entity, such as age, gender, marks, etc.
In the relational model, the database is represented as a collection of
relations between entities and their attributes.

A relation represents a table that contains certain values in the form of rows and
columns, where a row or tuple is a collection of related data values. Each value
is located by the name of the table together with its row and its column.
Though the data is arranged in tables in an RDBMS, the physical data storage
does not depend on the logical structure of the data.

A tuple in the DBMS refers to a single record in the relational database. It represents
the entire data in a single row of the relational table. The data is stored in table
format, organized by attributes and tuples.

Viewed as a spreadsheet, the data in a table consists of numerous rows, each
known as a tuple, with a value corresponding to each field present in
the tuple.

The user can perform several operations on the data stored in the fields or tuples of
the tables. These operations include inserting, removing, and modifying (updating)
the data values stored in the table. The user can also perform a join operation on
the tuples in two different relations. Join operations are used to combine the data
values in two tables.

Below is an example of a tuple with fields such as the client's name, contact
number, email address, and nationality of the client.

Harsh Verma   9337XXXXX   harsh@gmail.com   Indian

In mathematics, a tuple can be described as an ordered list of elements of the same
or different data types. According to set theory, an n-tuple can be defined as an
ordered collection of n elements. In a table, each tuple represents a particular record.

A database management system that implements the relational model is called a
relational database management system. Most of the time, when a user wants to store
data in a DBMS, the data is stored in tables. It is easy to read the data in tables as
it is more organized.

In DBMS, a unique key is assigned to each table that is used to organize and identify
the elements. This key is known as the table's primary key and is unique for each
record present. In DBMS, the user can add a column containing the value from
another table's column. This enables the user to link the tuple of different tables.

The rows in the tables represent the records in the database, and the columns
represent the attributes associated with the entity.

A tuple is a single row in a database table. The record contains all the information
about one entity in the relation. "Tuple" and "record" are two names for the same
thing. In mathematics, by contrast, a tuple is simply an ordered list of
elements; in a table, those elements are the values associated with one entity.

Working with Tuple in DBMS with Examples

Given below is a table that contains several tuples or records. Using this table, we
will learn to work with a tuple in DBMS. It is a student record table
that contains information such as students' names, ages, subjects, and marks. The
table has an additional column, ID, that contains a unique value for each row present
in the table.

ID   Name       Age   Subject   Marks
1    Harsh      27    Hindi     84
2    Harshil    24    Physics   93
3    Harshit    23    English   78
4    Harshita   26    Maths     91

As you can see, each row in the table contains information about
a different individual.

For instance, the first row contains information about a student named Harsh. This
row can also be referred to as a record, as it contains the record of one student
in the database. These rows or records in the database are called tuples.

Thus, in the database management system, the tuple is a row containing all the
information related to a particular entity. The entity can be an employee, a student,
a customer, or a user.

In the above table, you can see that a tuple or record contains the entire
information about a single entity, such as the age, subject, and marks obtained in
the subject.

Most databases store data in the form of tables that consist of tuples and
attributes. A row holding the data for a certain object is called a tuple in a
database management system.
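
The student table above can be loaded into a small in-memory SQLite database to see that each row really does come back as a tuple. This is only a sketch using Python's standard sqlite3 module; the table name student and the lower-case column names are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, "
    "age INTEGER, subject TEXT, marks INTEGER)"
)
# The four tuples (rows) from the table above.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Harsh", 27, "Hindi", 84),
        (2, "Harshil", 24, "Physics", 93),
        (3, "Harshit", 23, "English", 78),
        (4, "Harshita", 26, "Maths", 91),
    ],
)

# One row of the table is returned as one Python tuple.
first_row = conn.execute("SELECT * FROM student WHERE id = 1").fetchone()
print(first_row)  # (1, 'Harsh', 27, 'Hindi', 84)

# Selecting some attributes of the tuples that satisfy a condition.
high_scorers = conn.execute(
    "SELECT name FROM student WHERE marks > 90 ORDER BY id"
).fetchall()
print(high_scorers)  # [('Harshil',), ('Harshita',)]
```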

What is a relation ?
A general term used in database design is a “relational database"—but a database
relation is not the same thing and does not imply, as its name suggests, a
relationship between tables. A database relation simply refers to an individual table
in a relational database.
In a relational database, the table is a relation because it stores the relation
between data in its column-row format. The columns are the table's attributes,
while the rows represent the data records. A single row is known as a tuple to
database designers.

The Definition and Properties of a Relation

A relation, or table, in a relational database has some common properties.

 Its name must be unique in the database; for example, a database cannot
contain multiple tables of the same name.
 Each relation must have a set of columns or attributes, and it must have a
set of rows to contain the data. As with the table names, no two attributes of a
relation can have the same name.
 No tuple (or row) can be a duplicate. In practice, a database might actually
contain duplicate rows, but there should be practices in place to avoid this,
such as the use of unique primary keys (next up).
Given that a tuple cannot be a duplicate, it follows that a relation must contain at
least one attribute (or column) that identifies each tuple (or row) uniquely. This is
usually the primary key. This primary key cannot be duplicated.

Further, each field must contain a single value. For example, you cannot enter
something like "Tom Smith" and expect the database to understand that you have a
first and last name; rather, the database will understand that the value of that cell
is exactly what has been entered.

5) Explain all keys in DBMS (primary key, foreign key, …)

o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data from the table.
They are also used to establish and identify relationships between tables.

For example, ID is used as a key in the Student table because it is unique
for each student. In the PERSON table, passport_number, license_number, and
SSN are keys since they are unique for each person.
Types of keys:

1. Primary key
o It is the key chosen to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys, as we saw in the
PERSON table. The key which is most suitable from that list becomes
the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for
each employee. We could instead select License_Number or
Passport_Number as the primary key, since they are also unique.
o For each entity, the choice of primary key depends on the requirements
and on the developers.
2. Candidate key
o A candidate key is a minimal attribute or set of attributes that can
uniquely identify a tuple.
o The primary key is chosen from among the candidate keys. The remaining
candidate keys are as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key.
The other unique attributes, like SSN, Passport_Number, License_Number, etc.,
are also candidate keys.
3. Super Key

Super key is an attribute set that can uniquely identify a tuple. A super key is
a superset of a candidate key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same, but their
EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.


The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.

4. Foreign key
o A foreign key is a column (or set of columns) of a table used to point to
the primary key of another table.
o Every employee works in a specific department in a company, and
employee and department are two different entities. So we should not store
the department's information in the employee table. That's why we link
these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as
a new attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the
tables are related.
5. Alternate key

There may be one or more attributes or a combination of attributes that
uniquely identify each tuple in a relation. These attributes or combinations of
the attributes are called the candidate keys. One key is chosen as the
primary key from these candidate keys, and each remaining candidate key, if
any exists, is termed an alternate key. In other words, the total number of
alternate keys is the total number of candidate keys minus one (the primary
key). Alternate keys may or may not exist. If there is only one candidate
key in a relation, it does not have an alternate key.

For example, an employee relation has two attributes, Employee_Id and
PAN_No, that act as candidate keys. In this relation, Employee_Id is chosen
as the primary key, so the other candidate key, PAN_No, acts as the
alternate key.
6. Composite key

Whenever a primary key consists of more than one attribute, it is known as a
composite key. This key is also known as a concatenated key.

For example, in an employee relation, we assume that an employee may be
assigned multiple roles, and an employee may work on multiple projects
simultaneously. So the primary key will be composed of all three attributes,
namely Emp_ID, Emp_role, and Proj_ID in combination. So these attributes
act as a composite key since the primary key comprises more than one
attribute.
7. Artificial key

Keys created using arbitrarily assigned data are known as artificial keys.
These keys are created when a primary key is large and complex and has no
relationship with many other relations. The data values of the artificial keys
are usually numbered in a serial order. For example, the primary key, which
is composed of Emp_ID, Emp_role, and Proj_ID, is large in the employee
relation. So it would be better to add a new artificial attribute to identify each
tuple in the relation uniquely.
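
The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is plain Python; the function names and the three sample employees are assumptions made up for this illustration. Note that a sample of rows can only show that an attribute set is not a key. Whether something really is a key depends on the rules of the application, not on the rows that happen to be stored.

```python
from itertools import combinations

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Return the minimal super keys for this sample of rows."""
    found = []
    # Try smaller attribute sets first so that minimal keys are found first.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is a super key but not minimal.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical sample data: two employees share the same name.
employees = [
    {"Employee_Id": 1, "PAN_No": "P100", "Employee_Name": "Asha"},
    {"Employee_Id": 2, "PAN_No": "P200", "Employee_Name": "Ravi"},
    {"Employee_Id": 3, "PAN_No": "P300", "Employee_Name": "Asha"},
]

keys = candidate_keys(employees, ["Employee_Id", "PAN_No", "Employee_Name"])
print(keys)  # [('Employee_Id',), ('PAN_No',)]
print(is_super_key(employees, ("Employee_Id", "Employee_Name")))  # True
print(is_super_key(employees, ("Employee_Name",)))                # False
```

If Employee_Id is picked as the primary key, PAN_No is the alternate key, and (Employee_Id, Employee_Name) is a super key that is not a candidate key because it is not minimal.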

Constraints on Relational Database Model

In modeling the design of the relational database we can put
some restrictions like what values are allowed to be inserted in
the relation, and what kind of modifications and deletions are
allowed in the relation. These are the restrictions we impose on
the relational database.
In models like Entity-Relationship models, we did not have such
features. Database Constraints can be categorized into 3 main
categories:
1. Constraints that are applied in the data model are
called Implicit Constraints.
2. Constraints that are directly applied in the schemas of the data
model, by specifying them in the DDL(Data Definition
Language). These are called Schema-Based Constraints or
Explicit Constraints.
3. Constraints that cannot be directly applied in the schemas of the
data model. We call these Application-based or Semantic
Constraints.
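The constraints discussed below (domain, key, entity integrity, and
referential integrity) can all be specified in the DDL, so they are
schema-based constraints.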
Relational Constraints
These are the restrictions or sets of rules imposed on the
database contents. They safeguard the quality of the database. They
check operations like data insertion, updation, and
other processes, which have to be performed without affecting the
integrity of the data. They protect the database against damage.
Mainly, constraints on the relational database are of 4
types:
 Domain constraints
 Key constraints or Uniqueness Constraints
 Entity Integrity constraints
 Referential integrity constraints

Types of Relational Constraints

Let’s discuss each of the above constraints in detail.


1. Domain Constraints
 Every domain must contain atomic values(smallest indivisible
units) which means composite and multi-valued attributes are
not allowed.
 We perform a datatype check here, which means when we
assign a data type to a column we limit the values that it can
contain. Eg. If we assign the datatype of attribute age as int, we
can’t give it values other than int datatype.
Example:
EID   Name           Phone
01    Bikash Dutta   123456789
                     234456678

Explanation: In the above relation, Name is a composite
attribute and Phone is a multi-valued attribute, so it is violating
the domain constraint.
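
The datatype check on the age attribute mentioned above can be sketched with SQLite and Python's standard sqlite3 module. SQLite is lenient about column types by default, so this example spells the domain out in a CHECK clause. The person table, the range 0 to 150, and the try_insert helper are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause describes the domain of age: a whole number from 0 to 150.
conn.execute(
    """
    CREATE TABLE person (
        name TEXT NOT NULL,
        age  INTEGER CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 150)
    )
    """
)

def try_insert(name, age):
    """Insert one row and report whether the DBMS accepted it."""
    try:
        with conn:  # commit if accepted, roll back if rejected
            conn.execute("INSERT INTO person VALUES (?, ?)", (name, age))
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"

results = [
    try_insert("Bikash", 25),        # value is inside the domain
    try_insert("Paul", "twenty"),    # wrong data type
    try_insert("Sony", 200),         # right type, but outside the allowed range
]
stored = conn.execute("SELECT name, age FROM person").fetchall()
print(results)
print(stored)
```

Only the first row is stored; the other two are rejected because their age values do not belong to the domain.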
2. Key Constraints or Uniqueness Constraints
 These are called uniqueness constraints since they ensure that
every tuple in the relation is unique.
 A relation can have multiple keys or candidate keys (minimal
superkeys), out of which we choose one as the
primary key. There is no restriction on which candidate key is
chosen as the primary key, but it is suggested to go with
the candidate key that has the fewest attributes.
 Null values are not allowed in the primary key, hence the Not Null
constraint is also part of the key constraint.
Example:
EID   Name     Phone
01    Bikash   6000000009
02    Paul     9000090009
01    Tuhin    9234567892

Explanation: In the above table, EID is the primary key, and the
first and the last tuple have the same value in EID, i.e., 01, so it is
violating the key constraint.
3. Entity Integrity Constraints
 Entity Integrity constraints say that no primary key can take a
NULL value, since using the primary key we identify each tuple
uniquely in a relation.
Example:
EID    Name     Phone
01     Bikash   9000900099
02     Paul     600000009
NULL   Sony     9234567892

Explanation: In the above relation, EID is made the primary key,
and the primary key can’t take NULL values, but in the third tuple
the primary key is null, so it is violating the Entity Integrity
constraint.
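
The key constraint and the entity integrity constraint from the two examples above can be tried out in SQLite with Python's standard sqlite3 module. This is a sketch under assumptions: the table and column names are made up, and NOT NULL is written out explicitly because SQLite, for historical reasons, otherwise accepts NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (eid TEXT PRIMARY KEY NOT NULL, name TEXT, phone TEXT)"
)
conn.execute("INSERT INTO employee VALUES ('01', 'Bikash', '6000000009')")
conn.execute("INSERT INTO employee VALUES ('02', 'Paul', '9000090009')")

def rejection(row):
    """Try to insert a row; return the DBMS error message, or None if accepted."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as error:
        return str(error)

# Same EID as the first tuple: violates the key (uniqueness) constraint.
duplicate = rejection(("01", "Tuhin", "9234567892"))
# NULL in the primary key: violates entity integrity.
missing = rejection((None, "Sony", "9234567892"))

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate)
print(missing)
print(row_count)
```

Both offending rows are rejected, so the table still holds only the two valid tuples.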
4. Referential Integrity Constraints
 The referential integrity constraint is specified between two
relations or tables and is used to maintain consistency among
the tuples in the two relations.
 This constraint is enforced through a foreign key: when the
attributes in the foreign key of relation R1 have the same
domain(s) as the primary key of relation R2, the foreign key
of R1 is said to reference or refer to the primary key of relation
R2.
 The value of the foreign key in a tuple of relation R1 must either
match the primary key value of some tuple in relation R2
or be NULL; no other value is allowed.
Example:
Table 1
EID   Name     DNO
01    Divine   12
02    Dino     22
04    Vivian   14

Table 2
DNO   Place
12    Jaipur
13    Mumbai
14    Delhi

Explanation: In the above tables, the DNO of Table 1 is the
foreign key, and DNO in Table 2 is the primary key. DNO = 22 in
the foreign key of Table 1 is not allowed because DNO = 22 is not
defined in the primary key of Table 2. Therefore, referential
integrity constraints are violated here.
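
The two tables above can be recreated in SQLite to watch the DBMS reject DNO = 22. This is a minimal sketch with Python's standard sqlite3 module; the table names department and employee, the helper function, and the extra row with a NULL foreign key are assumptions added for the illustration. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are not checked without this
conn.execute("CREATE TABLE department (dno INTEGER PRIMARY KEY, place TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        eid  TEXT PRIMARY KEY NOT NULL,
        name TEXT,
        dno  INTEGER REFERENCES department (dno)
    )
    """
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(12, "Jaipur"), (13, "Mumbai"), (14, "Delhi")],
)

def insert_employee(eid, name, dno):
    """Return True if the row is accepted, False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (eid, name, dno))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    insert_employee("01", "Divine", 12),    # department 12 exists
    insert_employee("02", "Dino", 22),      # no department 22: rejected
    insert_employee("04", "Vivian", 14),    # department 14 exists
    insert_employee("05", "Nobody", None),  # hypothetical row: a NULL foreign key is allowed
]
print(results)  # [True, False, True, True]
```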
Advantages of Relational Database Model
 It is simpler than the hierarchical model and network model.
 It is easy and simple to understand.
 Its structure can be changed anytime upon requirement.
 Data Integrity: The relational database model enforces data
integrity through various constraints such as primary keys,
foreign keys, and unique constraints. This ensures that the data
in the database is accurate, consistent, and valid.
 Flexibility: The relational database model is highly flexible and
can handle a wide range of data types and structures. It also
allows for easy modification and updating of the data without
affecting other parts of the database.
 Scalability: The relational database model can scale to handle
large amounts of data by adding more tables, indexes, or
partitions to the database. This allows for better performance
and faster query response times.
 Security: The relational database model provides robust
security features to protect the data in the database. These
include user authentication, authorization, and encryption of
sensitive data.
 Data consistency: The relational database model helps keep
the data in the database consistent across all tables. For
example, when cascading rules are defined on foreign keys, a change
made to one table is carried over to the related tables.
 Query Optimization: The relational database model provides a
query optimizer that can analyze and optimize SQL queries to
improve their performance. This allows for faster query response
times and better scalability.
Disadvantages of the Relational Model
 Few database relations have certain limits which can’t be
expanded further.
 It can be complex and it becomes hard to use.
 Complexity: The relational model can be complex and difficult
to understand, particularly for users who are not familiar with
SQL and database design principles. This can make it
challenging to set up and maintain a relational database.
 Performance: The relational model can suffer from
performance issues when dealing with large data sets or
complex queries. In particular, joins between tables can be slow,
and indexing strategies can be difficult to optimize.
 Scalability: While the relational model is generally scalable, it
can become difficult to manage as the database grows in size.
Adding new tables or indexes can be time-consuming, and
managing relationships between tables can become complex.
 Cost: Relational databases can be expensive to license and
maintain, particularly for large-scale deployments. Additionally,
relational databases often require dedicated hardware and
specialized software to run, which can add to the cost.
 Limited flexibility: The relational model is designed to work
with tables that have predefined structures and relationships.
This can make it difficult to work with data that does not fit
neatly into a table-based format, such as unstructured or semi-
structured data.
 Data redundancy: In some cases, the relational model can
lead to data redundancy, where the same data is stored in
multiple tables. This can lead to inefficiencies and can make it
difficult to ensure data consistency across the database.
Conclusion
Relational database constraints are rules in a database model
that help maintain the integrity and consistency of data. These
rules include primary key constraints, unique constraints, foreign
key constraints, check constraints, default constraints, not null
constraints, multi-column constraints, etc. Relational database
constraints help keep data accurate, maintain relationships, and
avoid the insertion of wrong or inconsistent data.

Update Operations, Transactions & Dealing with Constraint Violations

The operations of the relational model can be categorized into
retrievals and updates.

Modification or update operations

There are three basic operations that can change the states of
relations in the database: Insert, Delete, and Update (or
Modify).

Insert is used to insert one or more new tuples in a relation,
Delete is used to delete tuples, and
Update (or Modify) is used to change the values of some attributes in existing tuples.

Whenever these operations are applied, the integrity constraints specified on the
relational database schema should not be violated.

The Insert Operation

The Insert operation provides a list of attribute values for a new tuple that is to be
inserted into a relation R.

Insert can violate any of the four types of constraints:

Domain constraints can be violated if an attribute value is given that does not
appear in the corresponding domain or is not of the appropriate data type.

Key constraints can be violated if a key value in the new tuple t already exists in
another tuple in the relation r(R).

Entity integrity can be violated if any part of the primary key of the new tuple t is
NULL.

Referential integrity can be violated if the value of any foreign key in t refers to a
tuple that does not exist in the referenced relation.

Here are some examples to illustrate this discussion.

The Insert Operation:

1) Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F,
28000, NULL, 4> into EMPLOYEE. This insertion violates the entity integrity
constraint (NULL for the primary key Ssn), so it is rejected.

2) Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F,
28000, ‘987654321’, 4> into EMPLOYEE. This insertion violates the key constraint
because another tuple with the same Ssn value already exists in the EMPLOYEE
relation, and so it is rejected.

3) Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357
Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE. This insertion
violates the referential integrity constraint specified on Dno in EMPLOYEE
because no corresponding referenced tuple exists in DEPARTMENT with Dnumber = 7.

4) Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy
Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE. This insertion satisfies all
constraints, so it is acceptable.
If an insertion violates one or more constraints:

The default option is to reject the insertion. In this case, it would be useful if the
DBMS could provide a reason to the user as to why the insertion was rejected.

Another option is to attempt to correct the reason for rejecting the insertion, but
this is typically not used for violations caused by Insert.
The Delete Operation

The DELETE statement is used to delete rows from a table. Generally, a DELETE
statement removes one or more records from a table.

The Delete operation can violate only referential integrity.

This occurs if the tuple being deleted is referenced by foreign keys from
other tuples in the database.

To specify deletion, a condition on the attributes of the relation selects the
tuple (or tuples) to be deleted.

Here are some examples:

1) Delete the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10. This deletion
is acceptable and deletes exactly one tuple.

2) Delete the EMPLOYEE tuple with Ssn = ‘999887777’. This deletion is not
acceptable, because there are tuples in WORKS_ON that refer to this tuple.
Hence, if the tuple in EMPLOYEE is deleted, referential integrity violations will
result.

3) Delete the EMPLOYEE tuple with Ssn = ‘333445555’. This deletion is not
acceptable. It will result in even worse referential integrity violations, because the
tuple involved is referenced by tuples from the EMPLOYEE, DEPARTMENT, WORKS_ON,
and DEPENDENT relations.
The Delete Operation:

Several options are available if a deletion operation causes a violation:

1) Reject the deletion.

2) Cascade: attempt to cascade (or propagate) the deletion by deleting the tuples that
reference the tuple being deleted. For example, in operation 2, the DBMS
could automatically delete the offending tuples from WORKS_ON with Essn=
‘999887777’.

3) Set null or set default: modify the referencing attribute values that cause the
violation; each such value is either set to NULL or changed to reference another
default valid tuple. Notice that if a referencing attribute that causes a violation
is part of the primary key, it cannot be set to NULL; otherwise, it would violate
entity integrity.

4) Combinations of these three options are also possible.
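
The reject, cascade and set null options can be tried out with a short sketch. It uses Python's built-in sqlite3 module and a deliberately simplified schema: EMPLOYEE and WORKS_ON are cut down to the columns needed here, and WORKS_ON is given no primary key so that SET NULL is allowed. The function name delete_employee is an assumption made for this illustration.

```python
import sqlite3

def delete_employee(on_delete):
    """Delete EMPLOYEE '999887777' under the given ON DELETE rule and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY)")
    # Simplified WORKS_ON: Essn is a foreign key but not part of a primary key.
    conn.execute(
        "CREATE TABLE WORKS_ON ("
        " Essn TEXT REFERENCES EMPLOYEE(Ssn) ON DELETE " + on_delete + ","
        " Pno INTEGER)"
    )
    conn.execute("INSERT INTO EMPLOYEE VALUES ('999887777')")
    conn.execute("INSERT INTO WORKS_ON VALUES ('999887777', 10)")
    try:
        conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '999887777'")
    except sqlite3.IntegrityError:
        return "rejected"  # option 1: the DBMS refuses the deletion
    # Options 2 and 3: show what happened to the referencing tuples.
    return conn.execute("SELECT Essn, Pno FROM WORKS_ON").fetchall()

print(delete_employee("RESTRICT"))  # deletion is rejected
print(delete_employee("CASCADE"))   # referencing tuple is deleted too
print(delete_employee("SET NULL"))  # referencing attribute becomes NULL
```

If Essn were part of the primary key of WORKS_ON, as in the full schema, SET NULL would not be a valid choice, which matches the entity integrity remark above.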

The Update Operation

The Update (or Modify) operation is used to change the values of one or more
attributes in a tuple (or tuples) of some relation R. It is necessary to specify a
condition on the attributes of the relation to select the tuple (or tuples) to be
modified.

Here are some examples:

1) Update the salary of the EMPLOYEE tuple with Ssn= ‘999887777’ to 28000. Acceptable.

2) Update the Dno of the EMPLOYEE tuple with Ssn= ‘999887777’ to 1. Acceptable.

3) Update the Dno of the EMPLOYEE tuple with Ssn= ‘999887777’ to 7. Unacceptable,
because it violates referential integrity.

4) Update the Ssn of the EMPLOYEE tuple with Ssn= ‘999887777’ to ‘987654321’.
Unacceptable, because it violates the primary key constraint by repeating a value
that already exists as a primary key in another tuple; it also violates referential
integrity constraints because there are other relations that refer to the existing
value of Ssn.

The Update Operation

Updating an attribute that is neither part of a primary key nor of a foreign key
usually causes no problems; the DBMS need only check to confirm that the new
value is of the correct data type and domain. Modifying a primary key value
is similar to deleting one tuple and inserting another in its place because we use
the primary key to identify tuples.

Dealing with constraint violations:


The options for dealing with referential integrity violations caused by Update
are similar to those discussed for the Delete operation.

Explain ER model and relationships. Explain ER model for college database
ER Diagrams in DBMS: Entity Relationship Diagram Model
An Entity Relationship Diagram is a diagram that represents relationships
among entities in a database. It is commonly known as an ER Diagram. An
ER Diagram in DBMS plays a crucial role in designing the database.
In practice, the requirements gathered from the users are first captured
in the form of an ER Diagram, which is then forwarded to the database
administrators to design the database.
What is an ER Diagram?

An Entity Relationship Diagram (ER Diagram) pictorially explains the
relationship between entities to be stored in a database. Fundamentally,
the ER Diagram is a structural design of the database. It acts as a
framework created with specialized symbols for the purpose of defining the
relationship between the database entities. An ER diagram is created based
on three principal components: entities, attributes, and relationships.
The following diagram showcases two entities - Student and Course, and
their relationship. The relationship described between student and course
is many-to-many, as a course can be opted by several students, and a
student can opt for more than one course. Student entity possesses
attributes - Stu_Id, Stu_Name & Stu_Age. The course entity has attributes
such as Cou_ID & Cou_Name.

What is an ER Model?

An Entity-Relationship Model represents the structure of the database with
the help of a diagram. ER Modelling is a systematic process to design a
database as it would require you to analyze all data requirements before
implementing your database.
History of ER models

Peter Chen proposed ER Diagrams in 1976 to create a uniform convention
that can be used as a conceptual modeling tool. Many models had been
presented and discussed, but none were suitable. The data structure
diagrams offered by Charles Bachman also inspired his model.

Why Use ER Diagrams in DBMS?

 ER Diagram helps you conceptualize the database and lets you know
which fields need to be embedded for a particular entity
 ER Diagram gives a better understanding of the information to be stored
in a database
 It reduces complexity and allows database designers to build databases
quickly
 It helps to describe elements using Entity-Relationship models
 It allows users to get a preview of the logical structure of the database

Symbols Used in ER Diagrams

 Rectangles: This Entity Relationship Diagram symbol represents entity
types
 Ellipses: This symbol represents attributes
 Diamonds: This symbol represents relationship types
 Lines: They link attributes to entity types and entity types to
relationship types
 Primary key: The attributes that form the primary key are underlined
 Double Ellipses: Represent multi-valued attributes
Components of ER Diagram

You base an ER Diagram on three basic concepts:

 Entities
     Weak Entity
 Attributes
     Key Attribute
     Composite Attribute
     Multivalued Attribute
     Derived Attribute
 Relationships
     One-to-One Relationships
     One-to-Many Relationships
     Many-to-One Relationships
     Many-to-Many Relationships
Entities

An entity can be either a living or non-living component.
An entity is shown as a rectangle in an ER diagram.
For example, when a student studies a course, both the student and the course
are entities.

Weak Entity
An entity that depends on another entity for its identification is called a weak entity.
A weak entity is shown as a double rectangle in an ER Diagram.
In the example below, school is a strong entity because it has a primary
key attribute - school number. Unlike school, the classroom is a weak entity
because it does not have any primary key and the room number here acts
only as a discriminator.
Attribute

An attribute describes a property of an entity.
An attribute is shown as an oval in an ER diagram.

Key Attribute
A key attribute uniquely identifies an entity from an entity set.
The text of a key attribute is underlined.
For example: For a student entity, the roll number can uniquely identify a
student from a set of students.
Composite Attribute
An attribute that is composed of several other attributes is known as a
composite attribute.
An oval showcases the composite attribute, and the composite attribute
oval is further connected with other ovals.

Multivalued Attribute
Some attributes can hold more than one value; such attributes are called
multivalued attributes.
The double oval shape is used to represent a multivalued attribute.
Derived Attribute
An attribute that can be derived from other attributes of the entity is known
as a derived attribute.
In the ER diagram, the dashed oval represents the derived attribute.

Relationship

A relationship is shown as a diamond in the ER diagram.
It depicts the relationship between two entities.
In the example below, both the student and the course are entities, and
study is the relationship between them.
One-to-One Relationship
When a single element of an entity is associated with a single element of
another entity, it is called a one-to-one relationship.
For example, a student has only one identification card and an identification
card is given to one person.

One-to-Many Relationship
When a single element of an entity is associated with more than one
element of another entity, it is called a one-to-many relationship.
For example, a customer can place many orders, but an order cannot be
placed by many customers.

Many-to-One Relationship
When more than one element of an entity is related to a single element of
another entity, then it is called a many-to-one relationship.
For example, each student opts for a single course, but a course can
have many students.
Many-to-Many Relationship
When more than one element of an entity is associated with more than one
element of another entity, this is called a many-to-many relationship.
For example, you can assign an employee to many projects and a project
can have many employees.

How to Draw an ER Diagram?


Below are some important points for drawing an ER diagram:

 First, identify all the entities. Draw each entity as a rectangle and
label it properly.
 Identify relationships between entities and connect them using a diamond
in the middle, illustrating the relationship. Do not connect relationships
with each other.
 Connect attributes to their entities and label them properly.
 Remove any redundant entities or relationships.
 Make sure your ER Diagram supports all the data provided to design the
database.
 Use colors effectively to highlight key areas in your diagrams.
Conclusion

ER Diagram in DBMS is widely used to describe the conceptual design of
databases. It helps both users and database developers to preview the
structure of the database before implementing the database.

ER diagram of College database

How to Convert ER Diagram to Relational Database
The ER Model is intended as a description of real-world
entities. Although it is constructed in such a way as to
allow easy translation to the relational schema model, this
is not an entirely trivial process. The ER diagram
represents the conceptual level of database design,
while the relational schema is the logical level of the
database design. We will be following these simple rules:
1. Entities and Simple Attributes:
An entity type within the ER diagram is turned into a table.
You may keep the same name for the entity or give it
another sensible name, but avoid DBMS reserved words
and special characters.
Each attribute turns into a column (attribute) in the table.
The key attribute of the entity is the primary key of the
table, which is usually underlined. It can be composite if
required but can never be null.
Note: It is highly recommended that every table should start
with its primary key attribute, conventionally named
TablenameID.
Taking the following simple ER diagram:

The initial relational schema is expressed in the following
format, writing each table name with its attribute list inside
parentheses, as shown below for Persons:
Persons( personid , name, lastname, email )
Persons and Phones are tables; name and lastname are
table columns (attributes).
Note: personid is the primary key of the table Persons.
2. Multi-Valued Attributes
A multi-valued attribute is usually represented with a
double-line oval.

If you have a multi-valued attribute, take the attribute and
turn it into a new entity or table of its own. Then make a
1:N relationship between the new entity and the existing
one. In simple words:
1. Create a table for the attribute.
2. Add the primary (id) column of the parent entity as a
foreign key within the new table, as shown below:
Persons( personid , name, lastname, email )
Phones ( phoneid , personid, phone )
Note: personid within the table Phones is a foreign key
referring to the personid of Persons.
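
As a minimal sketch, the two tables above can be created with Python's built-in sqlite3 module. The column types, the sample row values and the helper phones_of are assumptions made for illustration; only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Persons (
    personid INTEGER PRIMARY KEY,
    name     TEXT,
    lastname TEXT,
    email    TEXT
);
-- The multi-valued attribute "phone" becomes its own table (1:N with Persons).
CREATE TABLE Phones (
    phoneid  INTEGER PRIMARY KEY,
    personid INTEGER NOT NULL REFERENCES Persons(personid),
    phone    TEXT
);
INSERT INTO Persons VALUES (1, 'Asha', 'Rao', 'asha@example.com');
INSERT INTO Phones (personid, phone) VALUES (1, '555-0100');
INSERT INTO Phones (personid, phone) VALUES (1, '555-0101');
""")

def phones_of(personid):
    """Return every phone number stored for one person."""
    rows = conn.execute(
        "SELECT phone FROM Phones WHERE personid = ? ORDER BY phoneid", (personid,)
    )
    return [phone for (phone,) in rows]

print(phones_of(1))
```

One person row can now be linked to any number of phone rows, and a phone row that points to a non-existent personid is rejected by the foreign key.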
3. 1:1 Relationships

To keep it simple, and even for better performance at data
retrieval, I would personally recommend using attributes to
represent such a relationship. For instance, let us consider the case
where the Person has, or optionally has, one wife. You can place
the primary key of the wife within the Persons table, where it
acts as a foreign key, as shown below.
Persons( personid , name, lastname, email , wifeid )
Wife ( wifeid , name )
Or vice versa, put the personid as a foreign key within the
Wife table as shown below:
Persons( personid , name, lastname, email )
Wife ( wifeid , name , personid)
Note: For cases when the Person is not married, i.e. has no wifeid,
the attribute can be set to NULL.
4. 1:N Relationships
This is the tricky part! For simplicity, use attributes in the
same way as for a 1:1 relationship, but here we have only one
choice as opposed to two. For instance, a Person can
have zero to many Houses, but a House can
have only one Person. To represent such a relationship,
the personid of the parent must be placed within
the child table as a foreign key, and not the other way
around, as shown next:

It should convert to:
Persons( personid , name, lastname, email )
House ( houseid , num , address, personid)
5. N:N Relationships
We normally use tables to express this type of
relationship. The same applies to N-ary relationships in
ER diagrams. For instance, a Person can live or work in
many countries, and a country can have many people.
To express this relationship within a relational schema we
use a separate table, as shown below:

It should convert into:
Persons( personid , name, lastname, email )
Countries ( countryid , name, code)
HasRelat ( hasrelatid , personid , countryid)
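
A small sketch of how the separate HasRelat table is used in practice, again with Python's built-in sqlite3 module. Persons and Countries are trimmed to two columns each, and the sample rows and the helper names build_db, countries_of and people_in are assumptions for illustration.

```python
import sqlite3

def build_db():
    """Create the three tables and a few sample rows (sample data is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Persons   (personid  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Countries (countryid INTEGER PRIMARY KEY, name TEXT);
    -- One row per (person, country) pair expresses the N:N relationship.
    CREATE TABLE HasRelat (
        hasrelatid INTEGER PRIMARY KEY,
        personid   INTEGER REFERENCES Persons(personid),
        countryid  INTEGER REFERENCES Countries(countryid)
    );
    INSERT INTO Persons   VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Countries VALUES (1, 'India'), (2, 'Japan');
    INSERT INTO HasRelat (personid, countryid) VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn

def countries_of(conn, person):
    """All countries linked to one person, found by joining through HasRelat."""
    rows = conn.execute(
        "SELECT c.name FROM Persons p"
        " JOIN HasRelat h ON h.personid = p.personid"
        " JOIN Countries c ON c.countryid = h.countryid"
        " WHERE p.name = ? ORDER BY c.name", (person,))
    return [name for (name,) in rows]

def people_in(conn, country):
    """All persons linked to one country, using the same table in the other direction."""
    rows = conn.execute(
        "SELECT p.name FROM Countries c"
        " JOIN HasRelat h ON h.countryid = c.countryid"
        " JOIN Persons p ON p.personid = h.personid"
        " WHERE c.name = ? ORDER BY p.name", (country,))
    return [name for (name,) in rows]
```

The same HasRelat rows answer both questions, which is why neither Persons nor Countries needs a foreign key column of its own.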
Relationship with attributes:
It is recommended to use a table to represent them, to keep the
design tidy and clean regardless of the cardinality of the
relationship.
Case Study
For the sake of simplicity, we will be producing the relational
schema for the following ER diagram:
The relational schema for the ER Diagram is given below as:
Company( CompanyID , name , address )
Staff( StaffID , dob , address , WifeID)
Child( ChildID , name , StaffID )
Wife ( WifeID , name )
Phone(PhoneID , phoneNumber , StaffID)
Task ( TaskID , description)
Work(WorkID , CompanyID , StaffID , since )
Perform(PerformID , StaffID , TaskID )

Explain three level architecture of DBMS
o The three schema architecture is also called ANSI/SPARC
architecture or three-level architecture.
o This framework is used to describe the structure of a specific
database system.
o The three schema architecture is also used to separate the
user applications and physical database.
o The three schema architecture contains three-levels. It
breaks the database down into three different categories.

The three-schema architecture is as follows:

In the above diagram:

o It shows the DBMS architecture.
o Mapping is used to transform the request and response
between various database levels of architecture.
o Mapping is not good for small DBMS because it takes more
time.
o In External / Conceptual mapping, it is necessary to
transform the request from the external level to the conceptual
schema.
o In Conceptual / Internal mapping, the DBMS transforms the
request from the conceptual to the internal level.

Objectives of Three schema Architecture

The main objective of three level architecture is to enable
multiple users to access the same data with a personalized view
while storing the underlying data only once. Thus it separates the
user's view from the physical structure of the database. This
separation is desirable for the following reasons:

o Different users need different views of the same data.
o The way in which a particular user needs to see the
data may change over time.
o The users of the database should not have to worry about the
physical implementation and internal workings of the
database such as data compression and encryption
techniques, hashing, optimization of the internal structures
etc.
o All users should be able to access the same data according
to their requirements.
o DBA should be able to change the conceptual structure of
the database without affecting the users' views.
o Internal structure of the database should be unaffected by
changes to physical aspects of the storage.

1. Internal Level
o The internal level has an internal schema which describes
the physical storage structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It defines how
the data will be stored in a block.
o The physical level is used to describe complex low-level data
structures in detail.
o This level is also known as the physical level. It describes how
the data is actually stored on the storage devices. It is also
responsible for allocating space to the data. This is the lowest level of
the architecture.

The internal level is generally concerned with the following
activities:

o Storage space allocations.
For Example: B-Trees, Hashing etc.
o Access paths.
For Example: Specification of primary and secondary keys,
indexes, pointers and sequencing.
o Data compression and encryption techniques.
o Optimization of internal structures.
o Representation of stored fields.

2. Conceptual Level
o The conceptual schema describes the design of a database
at the conceptual level. Conceptual level is also known as
logical level.
o The conceptual schema describes the structure of the whole
database.
o The conceptual level describes what data are to be stored in
the database and also describes what relationship exists
among those data.
o In the conceptual level, internal details such as an
implementation of the data structure are hidden.
o Programmers and database administrators work at this level.
o It is also called logical level. The whole design of the database such
as relationship among data, schema of data etc. are described in this
level.
o Database constraints and security are also implemented in this level
of architecture. This level is maintained by DBA (database
administrator).

3. External Level

o At the external level, a database contains several schemas
that are sometimes called subschemas. A subschema is
used to describe a different view of the database.
o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a
particular user group is interested in and hides the remaining
database from that user group.
o The view schema describes the end user interaction with
database systems.
o It is also called the view level. This level is called “view”
because several users can view their desired data from this level,
which is internally fetched from the database with the help of conceptual
and internal level mapping.
o The user doesn’t need to know the database schema details such as
data structure, table definition etc. The user is only concerned with the data
that is returned to the view level after it has been fetched
from the database (present at the internal level).
o External level is the “top level” of the Three Level DBMS Architecture.

Mapping between Views

The three levels of DBMS architecture don't exist independently of
each other. There must be correspondence between the three
levels i.e. how they actually correspond with each other. DBMS is
responsible for correspondence between the three types of
schema. This correspondence is called Mapping.

There are basically two types of mapping in the database
architecture:

o Conceptual/ Internal Mapping
o External / Conceptual Mapping

Conceptual/ Internal Mapping

The Conceptual/ Internal Mapping lies between the conceptual
level and the internal level. Its role is to define the
correspondence between the records and fields of the conceptual
level and files and data structures of the internal level.

External/ Conceptual Mapping

The External/ Conceptual Mapping lies between the external level
and the conceptual level. Its role is to define the correspondence
between a particular external view and the conceptual view.
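
The external/conceptual mapping can be illustrated with SQL views, using Python's built-in sqlite3 module. In this sketch the base table stands for the conceptual level and each view stands for one external schema. The table student, the two views and the helper columns are all hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the full logical description of the data.
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, fee_due REAL);
INSERT INTO student VALUES (1, 'Ravi', 1200.0), (2, 'Meena', 0.0);

-- External schema for library staff: fee information is hidden.
CREATE VIEW library_view AS
    SELECT roll_no, name FROM student;

-- External schema for the accounts office: only students who owe fees.
CREATE VIEW accounts_view AS
    SELECT roll_no, fee_due FROM student WHERE fee_due > 0;
""")

def columns(relation):
    """Return the column names that a table or view exposes to its users."""
    cursor = conn.execute("SELECT * FROM " + relation)
    return [description[0] for description in cursor.description]

print(columns("library_view"))
print(conn.execute("SELECT * FROM accounts_view").fetchall())
```

Each view definition is the mapping: the DBMS rewrites a query on the view into a query on the underlying table, so the two user groups see different pictures of data that is stored only once.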

Physical DBMS Architecture

 Describes the software components used to enter and process data.
 Describes how these software components are related and interconnected.

 Data Definition Language (DDL) - Set of commands required to define the
format of data.
 Data Manipulation Language (DML) - Set of commands that modify and process
data.
 DML Precompiler - It converts DML statements embedded in an application
program to normal procedural calls in the host language. It interacts with the
query processor in order to generate the appropriate code.
 DDL Compiler - It converts DDL statements into a set of tables containing
metadata, in a form that can be used by other components of the DBMS.
These tables are stored in the system catalog or data dictionary.
 File Manager - Manages the allocation of space on disk storage.
 Query Processor - Responsible for receiving query language statements and
changing them to a form the DBMS can understand. It has two parts: the parser
and the query optimizer.

Database Manager
It is the interface between low-level data, application programs and queries. A
database manager is a program module responsible for interfacing between the
database file system and the user queries. It enforces constraints to maintain the
consistency and integrity of the data as well as its security. It synchronizes the
simultaneous operations performed by concurrent users. It also performs backup
and recovery operations.

Components of Database Manager:

 Authorization Control - Checks that the user has necessary authorization
to carry out the required function.
 Command Processor - Converts commands to a logical sequence of steps.
 Integrity Checker - Checks the requested operation satisfies all necessary
integrity constraints such as key constraints.
 Query Optimizer - Examines the query language statements and tries to
choose the best and most efficient way of executing the query. Factors considered
include CPU time, disk time, network time, sorting methods and scanning methods.
 Transaction Manager - The transaction manager maintains the tables used for
authorization and concurrency control.
 Scheduler - It controls the relative order in which transaction operations are
executed. This module is responsible for ensuring that concurrent operations
or transactions on the database proceed without conflicting with one another.
A database may also support concurrency control tables to prevent conflicts
when simultaneous, conflicting commands are executed.
 Recovery Manager - Ensures that the database remains in a consistent
state in the presence of failures. It is responsible for transaction commit and
abort, that is success or failure of transaction.
 Buffer Manager (Cache Manager) - Responsible for the transfer of data
between main memory and secondary storage.

Important responsibilities of Database Manager:

 Interaction with File Manager - The raw data is stored on the disk using
the file system which is usually provided by a conventional operating system.
The database manager translates the various DML statements into low-level
file system commands. Thus, the database manager is responsible for the
actual storing, retrieving and updating of data in the database.
 Integrity Enforcement - The data values stored in the database must
satisfy certain types of consistency constraints. These constraints must be
specified explicitly by the DBA. If such constraints are specified, then the
database manager can check whether updates to the database result in the
violation of any of these constraints and if so appropriate action may be
imposed.
 Security Enforcement - Not every user of the database needs to have
access to the entire content of the database. It is the job of the database
manager to enforce these security requirements.
 Backup and Recovery - It is the responsibility of database manager to
detect system failures due to disk crash, power failure, software errors, etc
and restore the database to a state that existed prior to the occurrence of the
failure. This is usually accomplished through the backup and recovery
procedures.
 Concurrency Control - When several users update the database
concurrently, the consistency of data may no longer be preserved. It is
necessary for the system to control the interaction among the concurrent
users, and achieving such a control is one of the responsibilities of database
manager.

Query Processor:
The query language processor is responsible for receiving query language
statements and changing them from the English like syntax of the query language
to a form the DBMS can understand.
It consists of two separate parts:

 The parser
 The query optimizer

The parser receives query language statements from application programs or
command-line utilities and examines the syntax of the statements to ensure they
are correct. To do this, the parser breaks a statement down into basic units of
syntax and examines them to make sure each statement consists of the proper
component parts. If the statements follow the syntax rules, the tokens are passed to
the query optimizer.

The query optimizer examines the query language statements, and tries to
choose the best and most efficient way of executing the query. To do this, the query
optimizer will generate several query plans in which operations are performed in
different orders, and then try to estimate which plan will execute most efficiently.
When making this estimate, the query optimizer may examine factors such as: CPU
time, disk time, network time, sorting methods, and scanning methods.
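
The idea of generating several plans and estimating their cost can be shown with a toy sketch in Python. It is not how any particular DBMS works: the row counts, the rule that every join keeps one of every 100 row pairs, and the cost measure (number of row comparisons in a nested-loop join) are all assumptions made for this illustration.

```python
from itertools import permutations

# Assumed table sizes (number of rows); purely illustrative figures.
ROWS = {"EMPLOYEE": 1000, "DEPARTMENT": 10, "PROJECT": 100}
KEEP_ONE_IN = 100  # assumed: each join keeps 1 of every 100 candidate row pairs

def plan_cost(order):
    """Estimate the cost of joining the tables left to right in the given order."""
    size = ROWS[order[0]]  # rows in the intermediate result so far
    cost = 0
    for table in order[1:]:
        cost += size * ROWS[table]                # comparisons made by this join
        size = size * ROWS[table] // KEEP_ONE_IN  # estimated rows that survive
    return cost

def best_plan():
    """Enumerate every join order and keep the one with the lowest estimate."""
    return min(permutations(ROWS), key=plan_cost)

for order in permutations(ROWS):
    print(order, plan_cost(order))
print("chosen:", best_plan())
```

Under these assumptions the cheapest plans join the two small tables first and bring in the largest table last, which is the kind of decision a query optimizer makes from its own statistics.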

UNIT – 2
Part 1 – pdf sent
UNIT 2 - Part 2
Physical database design issue :
Storage of database on hard disks :

File organization and its types


File Organization
o The File is a collection of records. Using the primary
key, we can access the records. The type and
frequency of access can be determined by the type
of file organization which was used for a given set of
records.
o File organization is a logical relationship among
various records. This method defines how file records
are mapped onto disk blocks.
o File organization is used to describe the way in which
the records are stored in terms of blocks, and the
blocks are placed on the storage medium.
o The first approach to map the database to the file is
to use the several files and store only one fixed
length record in any given file. An alternative
approach is to structure our files so that we can
contain multiple lengths for records.
o Files of fixed length records are easier to implement
than the files of variable length records.
Objective of file organization
o Records should be selected optimally, i.e.,
as fast as possible.
o Insert, delete or update transactions on
the records should be quick and easy.
o Duplicate records should not be introduced as a result
of insert, update or delete.
o Records should be stored efficiently so that the
cost of storage is minimal.

Types of file organization:

File organization contains various methods. These
particular methods have pros and cons on the basis of
access or selection. In the file organization, the
programmer decides the best-suited file organization
method according to his requirement.

Types of file organization are as follows:

o Sequential file organization
o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization

Sequential File Organization


This method is the easiest method for file organization. In this method, files are stored sequentially.
This method can be implemented in two ways:

1. Pile File Method:
o It is a quite simple method. In this method, we store the records in a sequence, i.e., one after
another. Records are stored in the order in which they are inserted into the table.
o In case of updating or deleting any record, the record is first searched for in the memory
blocks. When it is found, it is marked for deletion, and the new record is inserted.

Insertion of the new record:

Suppose we have a sequence of records R1, R3 and so on up to R9 and R8. Here, a record is
nothing but a row in a table. Suppose we want to insert a new record R2 in the sequence; it will
be placed at the end of the file.
2. Sorted File Method:
o In this method, the new record is always inserted at the file's end, and then it will sort the
sequence in ascending or descending order. Sorting of records is based on any primary key or
any other key.
o In the case of modification of any record, it will update the record and then sort the file, and
lastly, the updated record is placed in the right place.

Insertion of the new record:

Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and R7.
Suppose a new record R2 has to be inserted in the sequence; it will be inserted at the end of the
file, and then the sequence will be sorted.
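
The difference between the two insertion methods can be sketched in a few lines of Python. Records are represented only by their key numbers (1 for R1, 3 for R3 and so on), and the function names pile_insert and sorted_insert are assumptions made for this illustration.

```python
def pile_insert(records, new_key):
    """Pile file: the new record simply goes to the end of the file."""
    return records + [new_key]

def sorted_insert(records, new_key):
    """Sorted file: append the new record at the end, then re-sort the file."""
    return sorted(records + [new_key])

print(pile_insert([1, 3, 9, 8], 2))    # R2 ends up last
print(sorted_insert([1, 3, 6, 7], 2))  # R2 ends up between R1 and R3
```

The extra sort in the second function is the time cost that the sorted file method pays on every insertion.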

Pros of sequential file organization


o It is a fast and efficient method for handling huge amounts of data.
o In this method, files can be easily stored in cheaper storage mechanisms like magnetic tapes.
o It is simple in design. It does not require much effort to store the data.
o This method is used when most of the records have to be accessed, like grade calculation of
students, generating salary slips, etc.
o This method is used for report generation or statistical calculations.

Cons of sequential file organization


o It wastes time because we cannot jump directly to the required record; we have to
move through the file sequentially.
o The sorted file method takes more time and space for sorting the records.

Heap file organization


o It is the simplest and most basic type of organization. It works with data blocks. In heap file
organization, the records are inserted at the file's end. When the records are inserted, it
doesn't require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new data block
need not be the very next data block; the DBMS can select any data block in the memory to
store new records. The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the same size. It is the
responsibility of the DBMS to store and manage the new records.
Insertion of a new record
Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we want to insert a new
record R2 in the heap. If data block 3 is full, then it will be inserted in any data block selected by
the DBMS, let's say data block 1.

If we want to search, update or delete the data in heap file organization, then we need to traverse the
data from the start of the file till we get the requested record.

If the database is very large then searching, updating or deleting of record will be time-consuming
because there is no sorting or ordering of records. In the heap file organization, we need to check all
the data until we get the requested record.

Pros of Heap file organization


o It is a very good method of file organization for bulk insertion. If a large amount of
data needs to be loaded into the database at a time, then this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than in sequential
file organization.

Cons of Heap file organization


o This method is inefficient for the large database because it takes time to search or modify the
record.
Hash File Organization
Hash File Organization uses the computation of a hash function on some fields of the records. The hash
function's output determines the location of the disk block where the records are to be placed.

When a record has to be retrieved using the hash key columns, the address is generated, and the
whole record is retrieved using that address. In the same way, when a new record has to be inserted,
the address is generated using the hash key and the record is directly inserted. The same process is
applied in the case of delete and update.

In this method, there is no effort for searching and sorting the entire file. In this method, each record
will be stored randomly in the memory.
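
A minimal Python sketch of this idea follows. The hash function (key modulo the number of blocks), the block count of 4 and the names block_address, insert and lookup are all assumptions chosen to keep the example small; a real DBMS uses its own hash function and handles full blocks.

```python
NUM_BLOCKS = 4  # assumed number of disk blocks

# Each block is modelled as a list of (key, row) pairs.
blocks = [[] for _ in range(NUM_BLOCKS)]

def block_address(key):
    """Hash function: maps the hash key of a record to a block number."""
    return key % NUM_BLOCKS

def insert(key, row):
    """Place the record directly in the block chosen by the hash function."""
    blocks[block_address(key)].append((key, row))

def lookup(key):
    """Recompute the address and read only that block, not the whole file."""
    for stored_key, row in blocks[block_address(key)]:
        if stored_key == key:
            return row
    return None

insert(10, "row for key 10")
insert(14, "row for key 14")  # 14 % 4 == 2, so it shares block 2 with key 10
insert(7, "row for key 7")
print(block_address(10), lookup(14), lookup(3))
```

Insert, lookup, update and delete all start from the same address calculation, which is why no search or sort over the entire file is needed.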
B+ File Organization
o B+ tree file organization is the advanced method of an indexed sequential access method. It
uses a tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort the records. For
each primary key, the value of the index is generated and mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In
this method, all the records are stored only at the leaf node. Intermediate nodes act as a
pointer to the leaf nodes. They do not contain any records.
The above B+ tree shows that:
o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual record. They have only
pointers to the leaf node.
o The node to the left of the root holds a value smaller than the root and the node to the right
holds a value larger than the root, i.e., 15 and 30 respectively.
o The leaf nodes hold the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier as all the leaf nodes are balanced.
o In this method, any record can be reached by traversing a single path from the root and accessed
easily.

Pros of B+ tree file organization


o In this method, searching becomes very easy as all the records are stored only in the leaf
nodes, which are linked as a sorted sequential list.
o Traversing through the tree structure is easier and faster.
o The size of the B+ tree has no restrictions, so the number of records can increase or decrease
and the B+ tree structure can also grow or shrink.
o It is a balanced tree structure, and inserts, updates and deletes do not degrade the performance of
the tree.

Cons of B+ tree file organization


o This method is inefficient for static tables.
Indexed sequential access method (ISAM)
ISAM is an advanced form of sequential file organization. In this method, records are stored in the file
using the primary key. An index value is generated for each primary key and mapped to the record.
This index contains the address of the record in the file.

If any record has to be retrieved based on its index value, the address of the data block is fetched
and the record is retrieved from memory.

Pros of ISAM:
o Since each index entry holds the address of its data block, searching for a record in a huge
database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is based
on the primary key values, we can retrieve the data for the given range of value. In the same
way, the partial value can also be easily searched, i.e., the student name starting with 'JA' can
be easily searched.

Cons of ISAM
o This method requires extra disk space to store the index values.
o When new records are inserted, the files have to be reorganized to maintain the
sequence.
o When a record is deleted, the space used by it needs to be released. Otherwise, the
performance of the database will degrade.
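
The range and partial retrieval mentioned in the pros above can be sketched in Python as follows. This is a simplified illustration, not an actual ISAM implementation: the student names used as primary keys, the sample rows, and the functions range_lookup and prefix_lookup are assumptions made for the example, and a list position stands in for a record address.

```python
from bisect import bisect_left, bisect_right

# Data file: records stored in primary-key order (list position = "address").
data_file = [
    ("JACK", "CSE"), ("JAMES", "ECE"), ("JANE", "EEE"),
    ("JOHN", "CSE"), ("KATE", "MECH"), ("MARY", "ECE"),
]
# Index: sorted primary keys; entry i holds the address of record i.
index_keys = [name for name, _ in data_file]


def range_lookup(low: str, high: str):
    """Return all records whose key lies in the range [low, high]."""
    start = bisect_left(index_keys, low)
    end = bisect_right(index_keys, high)
    return data_file[start:end]


def prefix_lookup(prefix: str):
    """Return all records whose key starts with the given prefix."""
    start = bisect_left(index_keys, prefix)
    end = start
    while end < len(index_keys) and index_keys[end].startswith(prefix):
        end += 1
    return data_file[start:end]


print(prefix_lookup("JA"))
print(range_lookup("JANE", "KATE"))
```

Both lookups work only because the index is kept in key order, which is also why inserts force the reorganization listed under the cons.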
Cluster file organization
o When records from two or more tables are stored in the same file, it is known as a cluster. These files
have two or more tables in the same data block, and the key attributes which are used to map
these tables together are stored only once.
o This method reduces the cost of searching for related records in different files.
o The cluster file organization is used when there is a frequent need for joining the tables with
the same condition. These joins give only a few records from both tables. In the given
example, we are retrieving the records for only particular departments. This method can't be
used to retrieve the records for all departments.

In this method, we can directly insert, update or delete any record. Data is sorted based on the key
with which searching is done. The cluster key is the key on which the tables are joined.
Types of Cluster file organization:
Cluster file organization is of two types:

1. Indexed Clusters:

In an indexed cluster, records are grouped based on the cluster key and stored together. The above
EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here, all the records are
grouped based on the cluster key DEP_ID.

2. Hash Clusters:

It is similar to the indexed cluster. In hash cluster, instead of storing the records based on the cluster
key, we generate the value of the hash key for the cluster key and store the records with the same
hash key value.
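
The difference between the two cluster types can be sketched in Python. This is a rough illustration under stated assumptions: the sample EMPLOYEE and DEPARTMENT rows, the bucket count NUM_BUCKETS, and the modulo hash are invented for the example, and dictionaries stand in for the clustered data blocks.

```python
from collections import defaultdict

# Hypothetical rows: EMPLOYEE is (EMP_ID, EMP_NAME, DEP_ID).
employees = [(1, "John", 14), (2, "Robert", 12), (3, "David", 14), (4, "Amelia", 13)]
departments = {12: "Sales", 13: "HR", 14: "Accounts"}  # DEP_ID -> DEP_NAME

NUM_BUCKETS = 2  # assumed number of hash buckets

indexed_cluster = defaultdict(list)  # grouped by the cluster key DEP_ID itself
hash_cluster = defaultdict(list)     # grouped by the hash of DEP_ID

for emp_id, name, dep_id in employees:
    indexed_cluster[dep_id].append((emp_id, name))
    hash_cluster[dep_id % NUM_BUCKETS].append((dep_id, emp_id, name))


def department_with_employees(dep_id: int):
    """Answer the join for one department from a single cluster."""
    return departments[dep_id], indexed_cluster[dep_id]


print(department_with_employees(14))
print(hash_cluster[0])
```

In the indexed cluster, DEP_ID appears once as the grouping key. In the hash cluster, rows of different departments can share a bucket, so each row keeps its DEP_ID.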

Pros of Cluster file organization


o The cluster file organization is used when there is a frequent request for joining the tables with
the same joining condition.
o It gives efficient results when there is a 1:M mapping between the tables.

Cons of Cluster file organization


o This method has low performance for very large databases.
o If there is any change in the joining condition, then this method cannot be used. If we change the
joining condition, then traversing the file takes a lot of time.
o This method is not suitable for tables with a 1:1 mapping.

TYPES of INDEXES and INDEX and TREE structure

Indexing in DBMS – Types of Indexes in Database


A database index is a data structure that helps in improving the speed of data access.
However, it comes at the cost of additional write operations and the storage space needed to store
the database index. The database index helps quickly locate the data in the database
without having to search every row of the database. The process of creating an index
for a database is known as indexing. In this guide, you will learn various types of indexes
in DBMS (Database management system) with examples.

Real life example of Indexing


1. You must have read a book. The first few pages of a book contain the index of the book,
which tells which topic is covered on which page number. This helps you quickly locate
the topic in the book using the index. Without the index, you would have to scan the
entire book to look for the topic, which would take a long time.

2. In the library, the books are arranged on the shelves in alphabetical order. If
you are looking for a book starting with the letter ‘A’ then you go to the shelf ‘A’.
Here the shelf named with the letter ‘A’ is the index. If the books were not arranged
in alphabetical order on the shelves, it would take a very long time to search for a book.

Index structure in Database


The most common index data structure contains two fields.

1. The first field is the search key. This is the column that a user can use to access the
record quickly. For example, if a user is searching for a student in the database, the user
can use the student id as a search key to quickly locate the student record.
2. The second field contains the address of the student record in the database.
Remember that indexing doesn’t replicate the whole database; rather, it creates an index that
refers to the actual data in the database. This field is a reference to the data. If a user is
searching for a student with student id “S01”, then S01 is the search key and the
second field of the index contains the address where the student data such as student
name, age and address is stored.
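
The two-field structure described above can be sketched in Python. This is a minimal illustration, assuming a list position stands in for the record address; the sample student rows and the function find_student are made up for the example.

```python
# Data file: a list of records; the list position plays the role of the address.
student_file = [
    {"id": "S03", "name": "Meena", "age": 20},
    {"id": "S01", "name": "Arun", "age": 19},
    {"id": "S02", "name": "Divya", "age": 21},
]

# Index: field 1 is the search key (student id), field 2 is the record address.
student_index = {rec["id"]: addr for addr, rec in enumerate(student_file)}


def find_student(student_id: str):
    """Look up the address in the index, then fetch the record at that address."""
    addr = student_index.get(student_id)
    return None if addr is None else student_file[addr]


print(find_student("S01"))
```

The index holds only keys and addresses, not the name or age, which is what is meant by the index not replicating the whole database.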
Indexing Methods

Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted
are known as ordered indices.
Example: Suppose we have an employee table with thousands of records, each of which
is 10 bytes long. Their IDs start with 1, 2, 3... and so on, and we have to search for
the employee with ID 543.
o In the case of a database with no index, we have to scan the disk blocks from the
start until we reach 543. The DBMS will read the record after reading
543*10 = 5430 bytes.
o In the case of an index, we search using the index, and the DBMS will read the
record after reading 542*2 = 1084 bytes (assuming each index entry is 2 bytes
long), which is far less than in the previous case.
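
The arithmetic in this example can be checked with a few lines of Python. The 10-byte record size comes from the example; the 2-byte index entry size is an assumption implied by the 542*2 figure, and the variable names are chosen only for illustration.

```python
RECORD_SIZE = 10       # bytes per record, as given in the example
INDEX_ENTRY_SIZE = 2   # bytes per index entry, assumed from the 542*2 figure
target_id = 543

# Without an index: every record up to and including ID 543 is read.
bytes_without_index = target_id * RECORD_SIZE

# With an index: the 542 index entries before the target are read.
bytes_with_index = (target_id - 1) * INDEX_ENTRY_SIZE

print(bytes_without_index, bytes_with_index)
print(round(bytes_without_index / bytes_with_index, 1))
```

Under these assumptions the index scan reads roughly one fifth of the bytes of the full scan.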
Primary Index
o If the index is created on the basis of the primary key of the table, then it is
known as primary indexing. Primary keys are unique to each record, so there is a 1:1
relation between key values and records.
o As primary keys are stored in sorted order, the searching operation is quite
efficient.
o The primary index can be classified into two types: Dense index and Sparse index.

1. Dense Index
In a dense index, there is an index entry for every record in the database. For example, if a table
student contains 100 records, then in a dense index the number of index entries would be 100, one
entry for each record in the table.

If more than one record has the same search key, then the dense index points to the first record
in the database that has the search key.

The name dense is given to this index because every record in the database has
a corresponding entry in the index file, so the index file is very dense.
Advantages of dense indexes:
1. Searching for a record is faster compared to other indexes.
2. It doesn’t require the database to be sorted in any order to generate a dense index.

Disadvantages of dense indexes:


1. It requires more space, as the index file is huge because it contains index entries for all records.
2. More write operations are needed to generate the index file.
3. It requires more maintenance, as a change in any record requires a matching update in the
index file.

2. Sparse Index
In this index-based system, index entries for only a few data items are maintained in the index
file. Unlike the dense index system, where every record has an index entry in the index file, in this
system index entries are limited to one per block of data items, as shown in the following diagram.
In sparse indexing, the database needs to be sorted in some order.

For example, let’s say we are creating a sparse index file for a student database that contains
records for 100 students.

Student records are divided into blocks where every block contains two records. If the index file
contains index entries for alternate records, then we need to maintain index entries for only 50 records,
whereas in the dense index system we had to have 100 entries in the index file.

Advantages of sparse indexing:


1. It requires less storage space for managing the index file, as it stores index entries for a few
records instead of all records. This improves performance.
2. Since only limited entries need to be maintained in the index file, it requires fewer write operations for
generating a sparse index file.
3. It requires less maintenance compared to dense indexes.

Disadvantages of sparse indexing:


1. Searching is a little slower than with dense indexes, as not all records have corresponding index entries
and a binary search is required to locate the record being searched for.
2. A sparse index requires the file to be sorted.
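
The 100-student example above can be sketched in Python to compare the two index types. This is an illustrative sketch only: the student ids 1 to 100, the use of the first key of each block as the sparse entry, and the function sparse_lookup are assumptions made for the example.

```python
from bisect import bisect_right

BLOCK_SIZE = 2                      # two records per block, as in the example
student_ids = list(range(1, 101))   # 100 records, sorted by id

# Split the sorted file into blocks of two records each.
blocks = [student_ids[i:i + BLOCK_SIZE]
          for i in range(0, len(student_ids), BLOCK_SIZE)]

# Dense index: one entry per record -> (block number, offset in block).
dense_index = {sid: (b, off)
               for b, block in enumerate(blocks)
               for off, sid in enumerate(block)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [block[0] for block in blocks]


def sparse_lookup(sid: int):
    """Binary-search the sparse index, then scan the single block it points to."""
    pos = bisect_right(sparse_index, sid) - 1
    if pos < 0:
        return None
    for off, key in enumerate(blocks[pos]):
        if key == sid:
            return pos, off
    return None


print(len(dense_index), len(sparse_index))
print(dense_index[44], sparse_lookup(44))
```

The sparse index needs half as many entries here, but each lookup does an extra scan inside the block, and it only works because the file is sorted.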

Difference between Dense and Sparse indexes

DESCRIPTION | DENSE | SPARSE
1. Performance | Search is faster as an index entry is present for every data item. | Write operations to generate indexes are faster as index entries for only a few records need to be generated.
2. Prerequisite | No prerequisites. | It requires the database to be sorted.
3. Storage | More storage space is required. | Less storage space is required.
4. Maintenance | Requires more time as every insert, update and delete operation in the database requires maintenance in the index file. | Requires less maintenance as the number of index entries is smaller compared to the dense index system.

3. Clustered Index
As the name suggests, in a clustered index, records of a similar type are grouped together
to form a cluster, and an index is created for this cluster, which is maintained in the clustered index
file.
For example:
Let’s say students are assigned to multiple courses and we are creating indexes
on the course_id field. In this case, all the students that are assigned to a
particular course_id form a cluster, and the index for that particular course_id points to this
cluster, as shown in the following diagram.

This helps in quickly locating a record in a particular cluster, as the size of the cluster is
limited and smaller than the actual database, so searching for a record is faster.

One type of clustered indexing is primary indexing: in this type of clustered indexing, data
is sorted based on the search key. In this type of indexing, searching is even faster as the
records are sorted.

4. Non-clustered or secondary indexing


In non-clustered indexing, the indexing is done on multiple levels. This indexing is also known
as secondary indexing.

For example, let’s say we have records of 300 students in the database. Instead of creating index entries
for 300 records at the root level, we create entries for the 1st, 101st and
201st student records. This index is maintained in primary memory such as RAM. Here we have
divided the complete index file into three groups.
The second level of indexes is stored on the hard disk. The primary index file, which is stored in
RAM, refers to this second-level file, and that file then points to the actual data block, as shown
below:
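
The two-level lookup for the 300-student example can be sketched in Python. This is a simplified illustration, assuming student ids 1 to 300 and list positions as record addresses; the names first_level, second_level and lookup are hypothetical, and plain Python structures stand in for RAM and disk.

```python
from bisect import bisect_right

# Data blocks: list position stands in for the record address.
records = [f"student-{sid}" for sid in range(1, 301)]

# First level (kept "in RAM"): one entry per group of 100 students.
first_level = [1, 101, 201]

# Second level (kept "on disk"): one index group per first-level entry,
# mapping each student id in the group to its record address.
second_level = [
    {sid: sid - 1 for sid in range(start, start + 100)}
    for start in first_level
]


def lookup(sid: int):
    """Pick the group via the first level, then the address via the second."""
    group = bisect_right(first_level, sid) - 1
    if group < 0:
        return None
    addr = second_level[group].get(sid)
    return None if addr is None else records[addr]


print(lookup(150))
```

Only the three-entry first level has to stay in memory. A lookup then touches one second-level group instead of the whole 300-entry index.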

5. Multilevel index

B+ Tree
o The B+ tree is a balanced multiway search tree. It
follows a multi-level index format.
o In the B+ tree, leaf nodes hold the actual data
pointers. The B+ tree ensures that all leaf nodes remain
at the same height.
o In the B+ tree, the leaf nodes are linked using a linked
list. Therefore, a B+ tree can support random access
as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance
from the root node. The B+ tree is of order n,
where n is fixed for every B+ tree.
o It contains internal nodes and leaf nodes.

Internal node
o An internal node of the B+ tree, except the root
node, contains at least n/2 pointers.
o At most, an internal node of the tree contains n
pointers.

Leaf node
o A leaf node of the B+ tree contains at least n/2
record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n
key values.
o Every leaf node of the B+ tree contains one block
pointer P that points to the next leaf node.
Searching a record in B+ Tree

Suppose we have to search for 55 in the below B+ tree
structure. First, we go to the intermediary node,
which directs us to the leaf node that can contain a
record for 55.

So, in the intermediary node, we find the branch
between the 50 and 75 keys. Then at the end, we are
redirected to the third leaf node. Here the DBMS performs
a sequential search to find 55.
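
The search steps above can be sketched in Python on a small two-level tree. This is only an illustration: the figure is not reproduced here, so the intermediate keys and leaf contents below are assumed values chosen so that 55 falls between 50 and 75 in the third leaf node, and the function search is hypothetical.

```python
from bisect import bisect_right

# Assumed intermediate node: keys that separate the leaf nodes.
internal_keys = [25, 50, 75]

# Assumed leaf nodes, left to right; leaf i holds keys below internal_keys[i]
# and at or above internal_keys[i - 1].
leaves = [
    [5, 10, 15, 20],
    [25, 30, 35, 40],
    [50, 55, 65, 70],
    [75, 80, 85, 90],
]


def search(key: int):
    """Return (leaf number, position) of the key, or None if it is absent."""
    # Step 1: use the intermediate node to pick the branch for this key.
    leaf_no = bisect_right(internal_keys, key)
    # Step 2: sequential search inside the chosen leaf node.
    for pos, value in enumerate(leaves[leaf_no]):
        if value == key:
            return leaf_no, pos
    return None


print(search(55))
```

Leaf number 2 in the result is the third leaf node, since the list is numbered from 0.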

B+ Tree Insertion

Suppose we want to insert a record 60 in the below
structure. It belongs in the 3rd leaf node, after 55. It is a
balanced tree, and that leaf node is already full,
so we cannot simply insert 60 there.

In this case, we have to split the leaf node, so that the record can
be inserted into the tree without affecting the fill factor,
balance and order.
With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and
its current key in the intermediate node is 50. We split the leaf node
in the middle so that the balance of the tree is not altered.
So we can group (50, 55) and (60, 65, 70) into 2 leaf
nodes.

If these two have to be leaf nodes, the intermediate node
cannot branch from 50 alone. It should have 60 added to it, and
then we can have a pointer to the new leaf node.

This is how we can insert an entry when there is overflow.
In a normal scenario, it is very easy to find the leaf node
where the entry fits and place it there.
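
The leaf split described above can be sketched in Python. This is a minimal sketch under stated assumptions: a leaf capacity of 4 keys (LEAF_CAPACITY), intermediate keys of 25, 50 and 75, and the function insert_into_leaf are all made up for the example, and overflow of the intermediate node itself is not handled.

```python
from bisect import insort

LEAF_CAPACITY = 4  # assumed maximum number of keys in one leaf node


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf.

    Returns (left, right, separator). If the leaf does not overflow,
    right and separator are None and left is the updated leaf.
    """
    keys = list(leaf)
    insort(keys, key)                  # keep the leaf in sorted order
    if len(keys) <= LEAF_CAPACITY:
        return keys, None, None
    mid = len(keys) // 2               # split the overflowing leaf in the middle
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]       # first key of the new leaf goes up


parent_keys = [25, 50, 75]             # assumed intermediate node
left, right, separator = insert_into_leaf([50, 55, 65, 70], 60)
if separator is not None:
    insort(parent_keys, separator)     # add 60 to the intermediate node

print(left, right, parent_keys)
```

The split produces the groups (50, 55) and (60, 65, 70) from the text, and 60 is added to the intermediate node so that it can point to the new leaf.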
B+ Tree Deletion

Suppose we want to delete 60 from the above example.
In this case, we have to remove 60 from the intermediate
node as well as from the 4th leaf node. If we simply remove
it from the intermediate node, the tree will no longer
satisfy the rules of the B+ tree. So we need to modify it to
keep the tree balanced.

After deleting 60 from the above B+ tree and
rearranging the nodes, it will show as follows: