DBMS-Introduction To Database Management Systems Notes
Database
A database is a shared, systematically organized collection of logically related data,
stored to meet the requirements of the different users of an organization, that can
easily be accessed, managed and updated.
It is a place where related pieces of information are stored and on which various
operations can be performed.
A database can be maintained manually or through electronic devices such as digital
diaries, mobile phones, computers, etc.
Advantages of DBMS
o Controls data redundancy: A DBMS can control data redundancy because all the
data is stored in a single, centralized database instead of in separate copies.
o Data sharing: In a DBMS, the authorized users of an organization can share the
data among themselves.
o Easy maintenance: The database is easy to maintain because of the centralized
nature of the database system.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems which automatically back up
the data to protect it against hardware and software failures and restore the data
if required.
o Multiple user interfaces: It provides different types of user interfaces, such as
graphical user interfaces and application program interfaces.
Disadvantages of DBMS
o Cost of hardware and software: A high-speed processor and a large memory are
required to run DBMS software.
o Size: It occupies a large amount of disk space and needs a large memory to run
efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a high impact because in most
organizations all the data is stored in a single database; if that database is
damaged due to an electrical failure or database corruption, the data may be lost
forever.
Applications of DBMS
Banking: For customer information, account activities, payments, deposits, loans,
etc.
Airlines: For reservations and schedule information.
Universities: For student information, course registrations, colleges and grades.
Telecommunication: It helps to keep call records, monthly bills, maintaining
balances, etc.
Finance: For storing information about stock, sales, and purchases of financial
instruments like stocks and bonds.
Sales: Used for storing customer, product and sales information.
Manufacturing: It is used for the management of the supply chain and for tracking
the production of items and the status of inventories in warehouses.
HR Management: For information about employees, salaries, payroll, deductions,
generation of paychecks, etc.
Types of DBMS
Hierarchical Model
This database model organizes data into a tree-like structure with a single root, to
which all the other data is linked. The hierarchy starts from the root data and
expands like a tree, adding child nodes to the parent nodes.
In this model, a child node has only a single parent node.
This model efficiently describes many real-world relationships, such as the index of
a book, recipes, etc.
In the hierarchical model, data is organized into a tree-like structure with a
one-to-many relationship between two different types of data. For example, one
department can have many courses, many professors and, of course, many students.
Network Model
This is an extension of the hierarchical model. In this model data is organised more
like a graph, and a child node is allowed to have more than one parent node.
Because more relationships are established between the data in this model, related
data can be reached more easily and quickly. This database model was used to map
many-to-many data relationships.
In this model, entities are organized in a graph in which an entity can be accessed
through several paths.
Relational model
The relational DBMS is the most widely used DBMS model because it is one of the
easiest to use.
This model is based on organizing normalized data in the rows and columns of tables.
Relational data is stored in fixed structures and manipulated using SQL.
In this model, data is organised in two-dimensional tables, and the relationship
between tables is maintained by storing a common field.
The basic structure of data in the relational model is the table. All the information
related to a particular type is stored in the rows of that table.
Tables are also known as relations in the relational model.
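The idea of two tables related through a common field can be sketched with Python's
built-in sqlite3 module. The table names, column names and rows below are
hypothetical and serve only as an illustration; dept_id plays the role of the common
field.

```python
import sqlite3

# In-memory database; all names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- the common field
    );
    INSERT INTO department VALUES (1, 'Physics'), (2, 'History');
    INSERT INTO student VALUES (10, 'Asha', 1), (11, 'Ravi', 2), (12, 'Meena', 1);
""")

# The relationship is followed by matching the common field in both tables.
rows = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s JOIN department AS d ON s.dept_id = d.dept_id
    ORDER BY s.student_id
""").fetchall()
print(rows)
```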
Object-Oriented Model
In the object-oriented model, data is stored in the form of objects.
The structures that hold the data are called classes.
It defines a database as a collection of objects which store both the values of data
members and the operations on them.
The distinguishing feature of an object-oriented database is that it adds database
functionality to an object-oriented programming language.
Components of DBMS
User: - Users are the ones who actually use the database. Users can be
administrators, developers or end users.
Data or Database: - Data is one of the most important components of a database
system. A very large amount of data is stored in the database, and it forms the main
source through which all the other components interact with each other. There are
two types of data. One is user data: the data the database exists to hold, stored,
according to the requirements, in the various tables of the database in the form of
rows and columns. The other is metadata, known as 'data about data': it stores
information such as how many tables there are, their names, how many columns they
have and their names, primary keys, foreign keys, etc. In short, the metadata holds
information about each table and its constraints in the database.
DBMS: - This is the software that helps the user interact with the database. It
allows the users to insert, delete, update or retrieve the data. All these operations
are expressed in a query language such as SQL, which is implemented by DBMS products
like MySQL and Oracle.
Database Application: - It is the application program which helps the users
interact with the database by means of query languages. The database application
does not need to know the internal details of the underlying DBMS.
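The difference between user data and metadata described above can be seen in a small
sketch using Python's built-in sqlite3 module. SQLite keeps its metadata in the
sqlite_master catalog table and exposes column details through PRAGMA table_info; the
student table itself is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

# User data: the rows stored in the table.
user_data = conn.execute("SELECT * FROM student").fetchall()

# Metadata: which tables exist, taken from SQLite's catalog table.
table_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata: column name, declared type and primary-key flag of each column.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
columns = [(r[1], r[2], bool(r[5])) for r in conn.execute("PRAGMA table_info(student)")]

print(user_data)
print(table_names)
print(columns)
```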
Functions of DBA:
Defining Conceptual Schema: The DBA creates the original database schema
by executing a set of data definition statements in the DDL.
Security and Integrity Checks: The DBA ensures data integrity, which means that the
data are complete, accurate and current for the tasks at hand, and controls data
security, including preventing unauthorized access to the data and protecting
against other security threats.
Backup and Recovery Strategies: DBAs create backup and recovery plans and
procedures based on industry best practices, then make sure that the necessary
steps are followed. Backups cost time and money, so the DBA may have to
persuade management to take the necessary precautions to preserve data. System
admins or other personnel may actually create the backups, but it is the DBA's
responsibility to make sure that everything is done on schedule.
In the case of a server failure or other form of data loss, the DBA will use existing
backups to restore lost information to the system. Different types of failures may
require different recovery strategies, and the DBA must be prepared for any
eventuality.
A file processing system works well when there is only a limited number of files and
the amount of data is small. As the data and files in the system grow, handling them
becomes difficult.
1. Data Mapping and Access: - Although all the related information is grouped
and stored in different files, there is no mapping between any two files, i.e.
dependent files are not linked. Even though the Student file and the Student_Report
file are related, they are two different files and are not linked by any means.
Hence, if we need to display a student's details along with his report, we cannot
pick them directly from those two files. We have to write a lengthy program that
searches the Student file first, gets all the details, and then searches the
Student_Report file for his report.
When there is a very large amount of data, searching the file system for a
particular piece of information is always time consuming and inefficient.
2. Data Redundancy: - A file system has no way to prevent the insertion of duplicate
data. Any user can enter any data. The file system validates neither the kind of
data being entered nor whether the same data already exists in the same file.
Duplicate data is undesirable because it wastes space and leads to confusion and
mishandling of data. When a file contains duplicate records and we need to update
or delete a record, we may end up updating or deleting only one copy and leaving
the other in the file. Again, the file system does not check this. Hence the purpose
of storing the data is lost.
Though the file is named Student, there is a chance of staff information or report
information being entered in it. The file system allows any information to be
entered into any file. It does not keep the data being entered within the group it
belongs to.
3. Data Dependence: - In files, data is stored in a specific format, say separated
by tabs, commas or semicolons. If the format of any file is changed, then the
programs that process this file need to be changed. But there may be many programs
dependent on this file. We need to know in advance all the programs which use this
file and change every one of them. Missing the change in any one place will make
the whole application fail. Similarly, changes in the storage structure or in the
way the data is accessed affect all the places where the file is used. That is, even
the smallest change in the file affects all the programs and requires changes in
all of them.
4. Data Inconsistency: - Imagine that both the Student and Student_Report files
contain the student's address, and there is a change request for one particular
student's address. The program searches only the Student file for the address and
updates it correctly. Another program prints the student's report and mails it to
the address mentioned in the Student_Report file. What happens to the report of a
student whose address has been changed? The two addresses no longer match, and his
report is sent to his old address. This mismatch between different copies of the
same data is called data inconsistency. It occurs here because there is no proper
record of which files hold copies of the same data.
5. Data Isolation: - Imagine we have to generate a single report for a student
containing the class he is studying in, his study report, his library book details
and his hostel information. All this information is stored in different files. How
do we get all these details in one report? We have to write a program. But before
writing the program, the programmer has to find out which files hold the information
needed, what the format of each file is, how to search for data in each file, etc.
Only once all this analysis is done can he write the program. If 2-3 files are
involved, programming is fairly simple. If many files are involved, it requires a
lot of effort from the programmer. Since the data is isolated in different files,
programming becomes difficult.
6. Security: - Each file can be password protected. But what if we have to give
access to only a few records in the file? For example, a user has to be given access
to view only their own bank account information in the file. This is very difficult
in the file system.
7. Integrity: - If we need to check certain criteria while data is being entered
into a file, the file system cannot do it directly. It can only be done by writing
programs. Say we have to disallow students above the age of 18: this can be enforced
by a program alone. There is no direct checking facility in the file system. Hence
these kinds of integrity checks are not easy in a file system.
8. Atomicity: - If an insert, update or delete fails in the file system, there is no
mechanism to switch back to the previous state. Imagine the marks for one particular
subject need to be entered into the Report file and then the total needs to be
recalculated. But after the new marks are entered, the file is closed without
saving. That means the required transaction has not been performed as a whole: the
total has been calculated, but the new marks have not been stored. The total
recorded is wrong in this case. Atomicity means that a transaction is either
completed as a whole or not performed at all. Partial completion of any transaction
leads to incorrect data in the system. The file system does not guarantee atomicity.
It may be possible with complex programs, but introducing them for each transaction
costs money.
9. Concurrent Access: - Accessing the same data in the same file by several users
at the same time is called concurrent access. In the file system, concurrent access
leads to incorrect data. For example, a student wants to borrow a book from the
library. He searches for the book in the library file and sees that only one copy
is available. At the same time another student also wants to borrow the same book
and sees that one copy is available. The first student opts to borrow and gets the
book. But the file has not yet been updated to zero copies, so the second student
also opts to borrow, although no copies are left. This is the problem of concurrent
access in the file system.
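By contrast, a DBMS can enforce integrity rules and atomicity itself. The sketch
below uses Python's built-in sqlite3 module; the report table, its columns and the
age limit of 18 are assumptions made for illustration. A CHECK constraint rejects an
invalid row, and the connection used as a context manager commits a transaction only
if every statement in it succeeds, rolling back otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint is the integrity rule (age at most 18).
conn.execute("""
    CREATE TABLE report (
        student TEXT PRIMARY KEY,
        age INTEGER CHECK (age <= 18),
        marks INTEGER,
        total INTEGER
    )
""")
conn.execute("INSERT INTO report VALUES ('Asha', 17, 40, 40)")
conn.commit()

# Integrity: the DBMS itself rejects a row that violates the rule.
try:
    conn.execute("INSERT INTO report VALUES ('Ravi', 25, 0, 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
conn.rollback()

def add_marks(conn, student, extra, fail_midway=False):
    """Store new marks and update the total as one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE report SET marks = marks + ? WHERE student = ?",
                     (extra, student))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE report SET total = total + ? WHERE student = ?",
                     (extra, student))

def current_state(conn):
    return conn.execute(
        "SELECT marks, total FROM report WHERE student = 'Asha'").fetchone()

# Atomicity: a failure halfway leaves the row exactly as it was.
try:
    add_marks(conn, "Asha", 35, fail_midway=True)
except RuntimeError:
    pass
state_after_failure = current_state(conn)

add_marks(conn, "Asha", 35)
state_after_success = current_state(conn)
print(rejected, state_after_failure, state_after_success)
```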
Difference between DBMS and File system
1. DBMS: A DBMS is a collection of data. In a DBMS, the user is not required to
   write the procedures.
   File system: A file system is a collection of data. In this system, the user has
   to write the procedures for managing the data.
2. DBMS: A DBMS gives an abstract view of data that hides the details.
   File system: A file system exposes the details of the representation and storage
   of data.
3. DBMS: A DBMS provides a crash recovery mechanism, i.e., it protects the user from
   system failure.
   File system: A file system doesn't have a crash recovery mechanism, i.e., if the
   system crashes while some data is being entered, the content of the file will be
   lost.
4. DBMS: A DBMS provides a good protection mechanism.
   File system: It is very difficult to protect a file under the file system.
5. DBMS: A DBMS contains a wide variety of sophisticated techniques to store and
   retrieve the data.
   File system: A file system can't store and retrieve the data efficiently.
6. DBMS: A DBMS takes care of concurrent access to data using some form of locking.
   File system: In the file system, concurrent access causes many problems, such as
   one user reading the file while another is deleting or updating some information.
Data Abstraction
Logical: This level comprises the information that is actually stored in the
database in the form of tables. It also stores the relationships among the data
entities in relatively simple structures. At this level, the information available
to the user at the view level is unknown. For example, we can store the various
attributes of an employee, and relationships such as the one with the manager can
also be stored.
View: This is the highest level of abstraction. Only a part of the actual database is
viewed by the users. This level exists to ease access to the database by an
individual user. Users view data in the form of rows and columns. Tables and
relations are used to store data. Multiple views of the same database may exist.
Users can just view the data and interact with the database; storage and
implementation details are hidden from them.
Data Independence
o The ability to modify a schema definition at one level without affecting a schema
definition at a higher level is called data independence.
o Metadata itself follows a layered architecture, so that when we change data at one
layer, it does not affect the data at another layer. The data in each layer is
independent, but the layers are mapped to each other.
Logical Data Independence
Logical data is data about the database, that is, it stores information about how
data is managed inside. For example, a table (relation) stored in the database and
all the constraints applied on that relation.
Logical data independence is a mechanism that decouples this logical description
from the actual data stored on the disk. If we make some changes to the table
format, it should not change the data residing on the disk.
Logical data independence refers to the characteristic of being able to change the
conceptual schema without having to change the external schema.
Logical data independence is used to separate the external level from the
conceptual view.
If we make any changes in the conceptual view of the data, the user view of the
data is not affected.
Logical data independence occurs at the user interface level.
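A small sketch of logical data independence, using Python's built-in sqlite3 module:
a view stands in for the external schema, and the table for the conceptual schema.
The employee table, the employee_directory view and the added manager_id column are
hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema (hypothetical table).
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Asha', 50000), (2, 'Ravi', 30000);
    -- External schema: the part of the data a particular user sees.
    CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee;
""")

before = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id").fetchall()

# Change the conceptual schema by adding a column to the table.
conn.execute("ALTER TABLE employee ADD COLUMN manager_id INTEGER")

# The user's view is unaffected by the change.
after = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id").fetchall()
print(before == after)
```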
DBMS Architecture
o The design of a DBMS depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected by networks.
o The client/server architecture consists of many PCs and workstations which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get
their requests done.
Database architecture can be seen as single-tier or multi-tier. Logically, however,
multi-tier database architecture is of two types: 2-tier architecture and 3-tier
architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user, i.e. the
user works directly on the DBMS.
o Any changes made here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
o The 1-Tier architecture is used for the development of local applications, where
programmers can communicate directly with the database for a quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture. In the
two-tier architecture, applications on the client end can communicate directly with
the database on the server side. For this interaction, APIs such as ODBC and JDBC
are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query
processing and transaction management.
o To communicate with the DBMS, the client-side application establishes a connection
with the server side.
3-tier Architecture
o A 3-tier architecture separates its tiers from each other based on the complexity of
the users and how they use the data present in the database.
o It is the most widely used architecture to design a DBMS.
Fig: 3-tier Architecture
Database (Data) Tier − At this tier, the database resides along with its query
processing languages. We also have the relations that define the data and their
constraints at this level.
Application (Middle) Tier − At this tier reside the application server and the
programs that access the database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of any existence of the
database beyond the application. At the other end, the database tier is not aware of
any other user beyond the application tier. Hence, the application layer sits in the
middle and acts as a mediator between the end-user and the database.
User (Presentation) Tier − End-users operate on this tier and they know nothing
about any existence of the database beyond this layer. At this layer, multiple views
of the database can be provided by the application. All views are generated by
applications that reside in the application tier.
The DBMS architecture has the following three levels or layers:
• External Level
• Conceptual Level
• Internal Level
Functions of DBMS:
There are the following important functions of a DBMS:
(i) Data Storage Management: It provides a mechanism for management of permanent
storage of the data. The internal schema defines how the data should be stored by the
storage management mechanism and the storage manager interfaces with the operating
system to access the physical storage.
(ii) Data Manipulation Management: A DBMS furnishes users with the ability to retrieve,
update and delete existing data in the database.
(iii) Data Definition Services: The DBMS accepts the data definitions such as external
schema, the conceptual schema, the internal schema, and all the associated mappings in
source form.
(iv) Data Dictionary/System Catalog Management: The DBMS provides a data
dictionary or system catalog function in which descriptions of data items are stored and
which is accessible to users.
(v) Database Communication Interfaces: The end-user's requests for database access
are transmitted to DBMS in the form of communication messages.
(vi) Authorization / Security Management: The DBMS protects the database against
unauthorized access, either intentional or accidental. It furnishes mechanisms to ensure
that only authorized users can access the database.
(vii) Backup and Recovery Management: The DBMS provides mechanisms for backing
up data periodically and recovering from different types of failures. This prevents the loss
of data.
(viii) Concurrency Control Service: Since DBMSs support sharing of data among multiple
users, they must provide a mechanism for managing concurrent access to the database.
DBMSs ensure that the database is kept in a consistent state and that the integrity of the
data is preserved.
(ix) Transaction Management: A transaction is a series of database operations, carried
out by a single user or application program, which accesses or changes the contents of the
database. Therefore, a DBMS must provide a mechanism to ensure either that all the
updates corresponding to a given transaction are made or that none of them is made.
Relational algebra
SELECT (symbol: σ)
PROJECT (symbol: π)
RENAME (symbol: ρ)
UNION (symbol: ∪)
INTERSECTION (symbol: ∩)
DIFFERENCE (symbol: -)
CARTESIAN PRODUCT (symbol: x)
JOIN
DIVISION
o Projection (π)
Projection is used to extract the required columns (attributes) from a relation.
Example :
o Selection (σ)
Selection is used to select the required tuples of a relation. For the above relation,
σ (c>3)R will select the tuples whose value of c is greater than 3.
Note: the selection operator returns the selected tuples in full, with all their
attributes. To keep only particular columns of the selected tuples, the projection
operator is applied to the result of the selection.
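Selection and projection can be sketched in a few lines of Python, with a relation
represented as a list of dictionaries. The relation R and its attributes a, b and c
below are hypothetical sample data, and the helper names select and project are
illustrative only.

```python
def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate, with all attributes."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """pi: keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result

# Hypothetical relation R(a, b, c).
R = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 4, "b": 5, "c": 6},
    {"a": 7, "b": 5, "c": 9},
]

selected = select(R, lambda row: row["c"] > 3)   # sigma (c > 3) R
projected = project(selected, ["b"])             # pi b (sigma (c > 3) R)
print(selected)
print(projected)
```

Note that the projection returns a single tuple here, because the two selected rows
share the same value of b and a relation does not contain duplicate tuples.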
o Union (∪)
UNION is denoted by the ∪ symbol. It includes all tuples that are in table A or in
table B, and eliminates duplicate tuples. So, set A UNION set B would be expressed as:
Result <- A ∪ B
For a union operation to be valid, the following conditions must hold -
A and B must have the same number of attributes.
The domains of corresponding attributes must be compatible.
Example
Table A
column 1 column 2
1 1
1 2
Table B
column 1 column 2
1 1
1 3
A ∪ B gives
Table A ∪ B
column 1 column 2
1 1
1 2
1 3
o Intersection (∩)
An intersection is denoted by the symbol ∩
A∩B
It defines a relation consisting of the set of all tuples that are in both A and B.
A and B must be union-compatible.
Example:
A∩B
Table A ∩ B
column 1 column 2
1 1
o Set Difference (-)
Set difference in relational algebra is the same operation as set difference in set
theory, with the constraint that both relations must have the same set of attributes.
It is denoted by the - symbol. The result of A - B is a relation which includes all
tuples that are in A but not in B.
Example
A-B
Table A - B
column 1 column 2
1 2
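Since relations are sets of tuples, the three set operations above map directly onto
Python's built-in set type. The sketch below uses the rows of Table A and Table B
from the example.

```python
# Each tuple is (column 1, column 2); rows taken from Table A and Table B above.
A = {(1, 1), (1, 2)}
B = {(1, 1), (1, 3)}

union = A | B          # A ∪ B: tuples in A or in B, duplicates eliminated
intersection = A & B   # A ∩ B: tuples in both A and B
difference = A - B     # A - B: tuples in A but not in B

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```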
o Rename (ρ)
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
1. Operation: (EMPLOYEE ⋈ SALARY)
Result:
EMP_CODE EMP_NAME SALARY
101 Stephan 50000
102 Jack 30000
103 Harry 25000
Input:
1. ∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
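The join above can be sketched in Python with relations as lists of dictionaries.
The helper name natural_join is illustrative; it pairs tuples that agree on every
attribute the two relations have in common (here EMP_CODE). The data is taken from
the EMPLOYEE and SALARY tables of the example.

```python
def natural_join(r, s):
    """Combine tuples of r and s that agree on all common attributes."""
    result = []
    for row_r in r:
        for row_s in s:
            common = set(row_r) & set(row_s)
            if all(row_r[a] == row_s[a] for a in common):
                result.append({**row_r, **row_s})
    return result

EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]

joined = natural_join(EMPLOYEE, SALARY)
# Projection on EMP_NAME, SALARY applied to the join result.
names_and_salaries = [(row["EMP_NAME"], row["SALARY"]) for row in joined]
print(names_and_salaries)
```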
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing
information.
Example: for two relations EMPLOYEE and FACT_WORKERS, the join variants are written as
follows.
Natural join: EMPLOYEE ⋈ FACT_WORKERS
Left outer join: EMPLOYEE ⟕ FACT_WORKERS
Right outer join: EMPLOYEE ⟖ FACT_WORKERS
Full outer join: EMPLOYEE ⟗ FACT_WORKERS
3. Equi join:
The equi join is the most common form of inner join. It combines tuples that match on an
equality condition, i.e. the join condition uses only the comparison operator (=).
Example:
CUSTOMER
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
1. CUSTOMER ⋈ (CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID) PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
Relational Calculus
o Relational calculus is a non-procedural query language. In a non-procedural query
language, the user is not concerned with the details of how to obtain the end results.
o The relational calculus tells what to do but never explains how to do it.
Table-1: Customer
CUSTOMER NAME STREET CITY
Saurabh A7 Patiala
Mehak B6 Jalandhar
Sumiti D9 Ludhiana
Ria A5 Patiala
Table-2: Branch
BRANCH NAME BRANCH CITY
ABC Patiala
DEF Ludhiana
GHI Jalandhar
Table-3: Account
Saurabh L33
Mehak L49
Ria L98
Table-6: Depositor
CUSTOMER NAME ACCOUNT NUMBER
Saurabh 1111
Mehak 1113
Sumiti 1114
Query-1: Find the loan number, branch name and amount of all loans with an amount greater
than or equal to 10000.
{t | t ∈ loan ∧ t[amount] ≥ 10000}
Resulting relation:
LOAN NUMBER
L33
L35
L98
Queries-3: Find the names of all customers who have a loan and an account at the bank.
{t | ∃ s ∈ borrower( t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor( t[customer-name] = u[customer-name])}
Resulting relation:
CUSTOMER NAME
Saurabh
Mehak
Queries-4: Find the names of all customers having a loan at the “ABC” branch.
{t | ∃ s ∈ borrower(t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan(u[branch-name] = “ABC” ∧ u[loan-number] = s[loan-number]))}
Resulting relation:
CUSTOMER NAME
Saurabh
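Notation (domain relational calculus):
1. { a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where a1, a2, ..., an are attributes (domain variables) and P is a formula built from them.
Example: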
Table-1: Customer
CUSTOMER NAME STREET CITY
Table-2: Loan
LOAN NUMBER BRANCH NAME AMOUNT
L01 Main 200
L03 Main 150
L10 Sub 90
L08 Main 60
Table-3: Borrower
CUSTOMER NAME LOAN NUMBER
Ritu L01
Debomit L08
Soumya L03
Query-1: Find the loan number, branch name and amount of all loans with an amount greater
than or equal to 100.
{<l, b, a> | <l, b, a> ∈ loan ∧ (a ≥ 100)}
Resulting relation:
LOAN NUMBER BRANCH NAME AMOUNT
L01 Main 200
L03 Main 150
Query-2: Find the loan number of each loan with an amount greater than or equal to 150.
{<l> | ∃ b, a (<l, b, a> ∈ loan ∧ (a ≥ 150))}
Resulting relation:
LOAN NUMBER
L01
L03
Query-3: Find the names of all customers having a loan at the “Main” branch and find the
loan amount.
{<c, a> | ∃ l (<c, l> ∈ borrower ∧ ∃ b (<l, b, a> ∈ loan ∧ (b = “Main”)))}
Resulting relation:
CUSTOMER NAME AMOUNT
Ritu 200
Debomit 60
Soumya 150
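Domain relational calculus expressions read almost directly as Python set
comprehensions, where each loop variable plays the role of a domain variable. In the
sketch below, the borrower rows come from Table-3; the loan rows for L01 and L03 are
assumptions inferred from the resulting relations of Query-2 and Query-3, not data
stated explicitly in the tables.

```python
# loan(loan_number, branch_name, amount)
loan = {
    ("L01", "Main", 200),   # assumed, inferred from the query results
    ("L03", "Main", 150),   # assumed, inferred from the query results
    ("L10", "Sub", 90),
    ("L08", "Main", 60),
}
# borrower(customer_name, loan_number)
borrower = {("Ritu", "L01"), ("Debomit", "L08"), ("Soumya", "L03")}

# Query-1: {<l, b, a> | <l, b, a> in loan and a >= 100}
q1 = {(l, b, a) for (l, b, a) in loan if a >= 100}

# Query-2: {<l> | exists b, a (<l, b, a> in loan and a >= 150)}
q2 = {l for (l, b, a) in loan if a >= 150}

# Query-3: {<c, a> | exists l (<c, l> in borrower and
#                   exists b (<l, b, a> in loan and b = "Main"))}
q3 = {(c, a)
      for (c, l) in borrower
      for (l2, b, a) in loan
      if l2 == l and b == "Main"}

print(sorted(q1))
print(sorted(q2))
print(sorted(q3))
```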
1. Let R = (A, B, C) and let r1 and r2 both be relations on schema R. Give an expression in
tuple relational calculus and domain relational calculus that is equivalent to each of
the following:
a. Π A(r1)
b. σ B = 17 (r1)
c. r1 ∪ r2
d. r1 − r2
e. r1 ∩ r2
Solution:
In tuple relational calculus:
a. {t | ∃ q ∈ r1 (q[A] = t[A])}
b. {t | t ∈ r1 ∧ t[B] = 17}
c. {t | t ∈ r1 ∨ t ∈ r2}
d. {t | t ∈ r1 ∧ t ∉ r2}
e. {t | t ∈ r1 ∧ t ∈ r2}
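The five tuple relational calculus expressions can be checked against the
corresponding set operations with a short Python sketch. The sample tuples in r1 and
r2 are hypothetical; each tuple is (A, B, C), and the finite set r1 | r2 stands in
for the range of the tuple variable t in expression c.

```python
# Hypothetical sample relations on schema R = (A, B, C).
r1 = {(1, 17, 5), (2, 8, 6), (3, 17, 7)}
r2 = {(2, 8, 6), (4, 17, 9)}

a = {(q[0],) for q in r1}                             # {t | exists q in r1 (q[A] = t[A])}
b = {t for t in r1 if t[1] == 17}                     # {t | t in r1 and t[B] = 17}
c = {t for t in r1 | r2 if t in r1 or t in r2}        # {t | t in r1 or t in r2}
d = {t for t in r1 if t not in r2}                    # {t | t in r1 and t not in r2}
e = {t for t in r1 if t in r2}                        # {t | t in r1 and t in r2}

print(sorted(a))
print(sorted(b))
print(sorted(c))
print(sorted(d))
print(sorted(e))
```

The difference (d) and the intersection (e) differ only in the membership test on
r2, which is why the two expressions must not be written identically.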
2. Consider the following relational database and give the relational algebra for each of
the following.
Manages (Person_name, manager_name)
Company (Company_name, city)
Works (Person_name, company_name, salary)
Employee (Person_name, street, city)
Underline columns are the primary keys.
2) Find the names and cities of residence of all employees who work for SBI.
Π person-name, city (employee ⋈ (σ company-name = “SBI” (works)))
3) Find the names, street address, and cities of residence of all employees who work for
SBI and earn more than $10,000 per annum.
Π person-name, street, city (σ (company-name = “SBI” ∧ salary > 10000) (works ⋈ employee))
4) Find the names of all employees in this database who live in the same city as the
company for which they work.
Π person-name (employee ⋈ works ⋈ company)
5) Find the names of all employees who do not work for SBI.
The following solution assumes that all people work for exactly one company. If one
allows people to appear in the database (e.g. in employee) but not appear in works, the
problem is more complicated.
Π person-name (σ company-name ≠ “SBI” (works))
If people may not work for any company:
Π person-name (employee) – Π person-name (σ (company-name = “SBI”) (works))
6) Find the names of all employees who earn more than every employee of SBI.
Π person-name (works) − Π works.person-name (works ⋈ (works.salary ≤ works2.salary ∧
works2.company-name = “SBI”) ρ works2 (works))
7) Find the names of students and the titles of the courses they registered for.
π name, title (Student ⋈ Registered ⋈ Course)
Parallel network database system – This system has the advantage of improving
processing and input/output speeds. It is mainly used in applications that query
large databases. It uses multiple central processing units and data storage disks
in parallel.
Distributed database system – In this system, the data and the DBMS software are
distributed over several sites that are connected by a computer network.