DBMS Notes
(Database Management System)
Introduction to Database Management System
Before we dive into core concepts, let us see the basic terminologies:
What is Data?
Data is a collection of raw, unprocessed facts such as text, observations,
figures, symbols, and descriptions. On its own it carries no particular
purpose or significance. In a computer it is stored and measured in bits and
bytes, and it acquires meaning only once it has been processed.
Data is broadly classified into two types:
● Quantitative data: numerical values such as counts or measurements.
● Qualitative data: descriptive values such as names or colors.
What is Information?
Information refers to organized data that provides meaningful insights or
knowledge. For example, a weather forecast predicting rain tomorrow is
information derived from data collected by meteorological instruments.
Similarly, a report detailing quarterly sales figures provides information
distilled from transaction records.
Difference between data and information
1. Nature: Data consists of raw, unprocessed facts, whereas information
is data that has been processed, organized, and given context to
make it meaningful and useful.
2. Context: Data lacks context and interpretation, while information is
contextualized and interpreted to provide insights or understanding.
3. Purpose: Data alone does not serve a specific purpose, whereas
information is intended to inform, guide decision-making, or provide
understanding.
4. Representation: Data can be represented in various forms, such as
text, numbers, or symbols, while information is typically presented in a
structured and understandable format, such as reports, charts, or
graphs.
5. Actionability: Data may not always be actionable on its own, while
information is often actionable, providing guidance or prompting
decisions based on the insights it conveys.
What is a database?
A database is a structured collection of data organized for efficient storage,
retrieval, and manipulation. It typically consists of tables, each containing
rows and columns, where each row represents a record and each column
represents a field or attribute. Databases are managed by database
management systems (DBMS) and are used in various applications to store
and manage information, such as customer data, inventory records, or
financial transactions.
Database Management System
A Database Management System (DBMS) is software designed to efficiently
and securely manage databases. It provides an interface for users to interact
with the database, allowing them to create, retrieve, update, and delete
data. DBMS handles tasks such as data organization, storage, retrieval,
indexing, security, and backup. Examples of popular DBMS include MySQL,
Oracle Database, Microsoft SQL Server, and PostgreSQL.
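The four basic operations named above (create, retrieve, update, delete) can be sketched with Python's built-in sqlite3 module. SQLite is chosen here only because it needs no server setup; the customers table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: insert two records (rows).
conn.execute("INSERT INTO customers (id, name, city) VALUES (1, 'Asha', 'Pune')")
conn.execute("INSERT INTO customers (id, name, city) VALUES (2, 'Ravi', 'Delhi')")

# Update: change one field (column) of one record.
conn.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 1")

# Delete: remove a record.
conn.execute("DELETE FROM customers WHERE id = 2")

# Retrieve: read back what is left.
rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()
print(rows)
```

Each row returned is one record, and each position in the tuple corresponds to one column of the table.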
3. Data Integrity and Security: DBMS offers better data integrity and
security mechanisms compared to file systems. It enforces data
integrity constraints, such as primary keys and foreign keys, and
provides access control features to regulate who can access and
modify data.
4. Concurrency Control: DBMS supports concurrent access by multiple
users, providing mechanisms like locking and transactions to ensure
data consistency and integrity during concurrent operations. File
systems typically lack built-in concurrency control, leading to potential
data corruption in multi-user environments.
5. Data Retrieval and Manipulation: DBMS provides powerful querying
capabilities, allowing users to retrieve, filter, and manipulate data using
structured query languages (e.g., SQL). In contrast, file systems offer
limited querying capabilities, often requiring custom scripts or
programs for data extraction and manipulation.
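The integrity constraints mentioned in point 3 can be illustrated with a small sketch. It assumes SQLite through Python's sqlite3 module, and the departments and employees tables are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on; other DBMS products usually enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employees ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES departments(dept_id))"
)
conn.execute("INSERT INTO departments VALUES (10, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Meera', 10)")

def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# Same primary key as an existing row: violates the primary key constraint.
duplicate_key = try_insert("INSERT INTO employees VALUES (1, 'Kiran', 10)")
# Department 99 does not exist: violates the foreign key constraint.
dangling_ref = try_insert("INSERT INTO employees VALUES (2, 'Kiran', 99)")
# Fresh key and existing department: satisfies both constraints.
valid_row = try_insert("INSERT INTO employees VALUES (2, 'Kiran', 10)")
print(duplicate_key, dangling_ref, valid_row)
```

A plain file offers no equivalent check: a program writing to it would have to detect the duplicate identifier and the missing department itself.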
DBMS Application Architecture
There are three types of architecture for database management systems:
1-Tier Architecture:
In a 1-Tier Architecture, the database is accessible directly to the user. This
means the user can directly access the Database Management System
(DBMS) without any intermediary layers. In this setup, the client, server, and
database all reside on the same machine. For instance, when learning SQL,
one can set up an SQL server and database on their local system. This
allows direct interaction with the relational database to execute operations.
However, in practice, the industry typically opts for 2-tier or 3-tier
architectures instead of this setup.
2-Tier Architecture:
The 2-tier architecture resembles a fundamental client-server model. The
application on the client end communicates directly with the database
located on the server side. APIs such as ODBC and JDBC facilitate this
interaction. The server side handles tasks like query processing and
transaction management. Meanwhile, the client side runs user interfaces
and application programs. The client-side application establishes a
connection with the server-side to interact with the DBMS.
3-Tier Architecture:
In a 3-tier architecture, an additional layer sits between the client and the
database server, so the client never communicates with the database directly.
Instead, the client interacts with an application server, which in turn
communicates with the database system. Query processing and transaction
management occur at the server level. The intermediary layer serves as a
conduit for the exchange of partially processed data between the server and
the client. This architecture is commonly employed in large-scale web
applications.
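The separation between the three tiers can be sketched as three layers of Python code. This is only an illustration under simplifying assumptions: all three layers run in one process here, whereas in a real deployment they would normally run on separate machines and talk over a network. The class names, the accounts table, and the reply format are hypothetical.

```python
import sqlite3

# --- Data tier: the only code that talks to the DBMS directly. ---
class DataTier:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
        self.conn.execute("INSERT INTO accounts VALUES (1, 500)")

    def get_balance(self, account_id):
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        return None if row is None else row[0]

# --- Application tier: business rules; hides the database from clients. ---
class ApplicationServer:
    def __init__(self, data_tier):
        self._data = data_tier

    def handle_balance_request(self, account_id):
        balance = self._data.get_balance(account_id)
        if balance is None:
            return {"status": "error", "message": "unknown account"}
        return {"status": "ok", "balance": balance}

# --- Client tier: sends requests to the application server, never SQL. ---
def client_show_balance(server, account_id):
    reply = server.handle_balance_request(account_id)
    if reply["status"] == "ok":
        return f"Balance: {reply['balance']}"
    return reply["message"]

server = ApplicationServer(DataTier())
print(client_show_balance(server, 1))
print(client_show_balance(server, 7))
```

The client function never sees a connection object or a SQL string, which is the property that distinguishes this layout from the 2-tier model.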
View of Data (Three Schema Architecture)
In a DBMS, data can be viewed at different levels of abstraction. Each level
lets developers shield users from complex data structures by concealing the
details of the levels beneath it.
The primary goal of the three-level architecture is to allow multiple users to
access identical data with individualized perspectives, all while storing the
core data only once.
1. Physical Level:
● Represents the lowest level of abstraction, detailing the storage
mechanisms of the data.
● Utilizes low-level data structures.
● Contains the Physical schema, delineating the physical storage
arrangement of the database.
● Addresses aspects such as storage allocation methods (e.g.,
optimizations.
What is Schema and Instances?
Three types of schemas exist: physical, logical, and view schemas. View
schemas are also known as subschemas.
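The idea of several individualized views over data that is stored only once can be sketched with SQL views. The example assumes SQLite through Python's sqlite3 module; the employees table and the two view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: one base table holds the data once.
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Meera", "Sales", 52000), (2, "Kiran", "HR", 48000)],
)

# View level: two subschemas defined over the same stored rows.
conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employees")
conn.execute("CREATE VIEW payroll AS SELECT emp_id, salary FROM employees")

directory_rows = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
payroll_rows = conn.execute("SELECT * FROM payroll ORDER BY emp_id").fetchall()
print(directory_rows)
print(payroll_rows)
```

A user of the directory view never sees salaries, and neither view says anything about how the rows are laid out on disk, which is left to the physical level.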
Data Models
A data model is a conceptual framework for organizing and representing
data in a database. It defines the structure, relationships, constraints, and
operations associated with the data. There are several types of data models:
Data Languages
1. Data Definition Language (DDL): used to define, modify, and
manage the structure of database objects such as tables, views,
indexes, and schemas. DDL commands include CREATE, ALTER,
DROP, and TRUNCATE, allowing users to create new database
objects, modify existing ones, or remove them entirely.
2. Data Manipulation Language (DML): used to manipulate data within
the database. DML commands include SELECT, INSERT, UPDATE, and
DELETE, enabling users to retrieve, insert, modify, and delete data in
database tables.
3. Data Control Language (DCL): used to control access to data within
the database. DCL commands include GRANT and REVOKE, allowing
users to grant or revoke permissions on database objects such as
tables, views, and procedures.
4. Transaction Control Language (TCL): used to manage transactions
within the database. TCL commands include COMMIT, ROLLBACK,
and SAVEPOINT, enabling users to control the outcome and integrity
of transactions.
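The language categories above can be seen side by side in a short sketch. It assumes SQLite through Python's sqlite3 module, and the items table is hypothetical. DCL is left out because SQLite has no user accounts and therefore no GRANT or REVOKE; TRUNCATE is also not part of SQLite's SQL dialect.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / ROLLBACK / COMMIT statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure, then alter it.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE items ADD COLUMN qty INTEGER DEFAULT 0")

# TCL statements wrap a group of DML statements.
conn.execute("BEGIN")
conn.execute("INSERT INTO items (id, name, qty) VALUES (1, 'pen', 10)")  # DML
conn.execute("SAVEPOINT before_bulk")                                    # TCL
conn.execute("INSERT INTO items (id, name, qty) VALUES (2, 'ink', 99)")  # DML
conn.execute("UPDATE items SET qty = 0")                                 # DML
conn.execute("ROLLBACK TO before_bulk")  # TCL: undo only the work after the savepoint
conn.execute("COMMIT")                   # TCL: make the remaining work permanent

items = conn.execute("SELECT id, name, qty FROM items").fetchall()       # DML
print(items)

# DDL: remove the object entirely.
conn.execute("DROP TABLE items")
```

Only the first insert survives, because the second insert and the update were issued after the savepoint and were rolled back before the commit.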
Entity-Relationship diagram
An ER diagram, or Entity Relationship diagram, provides a visual depiction of
the logical structure of a database.
For example, Student and College are entity sets whose attributes are
(Stud_id, Stud name, Stud addr) and (Col_id, Col name) respectively.
Stud_id and Col_id are their primary keys.
Entity Set:
It is a set of the same type of entities.
1. Strong Entity Set: an entity set that has a primary key. The primary
key is shown in the diagram by underlining it.
2. Weak Entity Set: an entity set that has no primary key of its own. It
contains a partial key called a discriminator.
Relationship:
Association among entities. It can be unary, binary, ternary and n-ary.
Cardinality Constraint:
Maximum number of relationship instances an entity set can take part in. It
can be one-to-one, one-to-many, many-to-one, many-to-many.
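The four cardinality types can be made concrete with a small helper that inspects the instances of a binary relationship. This is an illustrative sketch: a cardinality constraint is declared on the schema, so looking at sample instances can only show the tightest type those instances are consistent with. The function name and the Student/College pairs are hypothetical, and the notes do not state which relationship links the two entity sets.

```python
from collections import Counter

def classify_cardinality(pairs):
    """Classify a binary relationship from its instances (left_id, right_id)."""
    pairs = set(pairs)
    # Number of partners each entity has on the other side of the relationship.
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    left_has_many = any(n > 1 for n in left_counts.values())
    right_has_many = any(n > 1 for n in right_counts.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"   # one left entity relates to many right entities
    if right_has_many:
        return "many-to-one"   # many left entities relate to one right entity
    return "one-to-one"

# Assumed example: (Stud_id, Col_id) pairs where two students attend college C1.
studies_in = [("S1", "C1"), ("S2", "C1"), ("S3", "C2")]
print(classify_cardinality(studies_in))
```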
Attributes:
Attributes are the descriptive characteristics possessed by each entity within
an Entity Set.
Types of attributes-
Extended ER Features
Extended Entity-Relationship (EER) features in database management
systems (DBMS) extend the capabilities of traditional Entity-Relationship (ER)
models.
Keys in DBMS
A key is a set of attributes that can identify each tuple uniquely in the given
relation. Types of keys:
individual attribute in a composite key may not be unique on its own,
but the combination of attributes is unique.
6. Alternate Key: Alternate keys are candidate keys that were not
selected as the primary key. They could potentially serve as the
primary key if the primary key is not available.
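The notion that a key identifies each tuple uniquely can be tested on sample data. The sketch below finds the minimal attribute sets that are unique within a given set of rows. It is a heuristic only: a key is a property of the schema, so uniqueness in a sample can suggest candidate keys but cannot prove them. The function names and the enrolment rows are hypothetical.

```python
from itertools import combinations

def identifies_uniquely(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def minimal_unique_sets(rows):
    """Minimal attribute sets that are unique in this sample."""
    attributes = sorted(rows[0])
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # A superset of a set already found is a superkey, not a minimal one.
            if any(set(k) <= set(combo) for k in found):
                continue
            if identifies_uniquely(rows, combo):
                found.append(combo)
    return found

enrolment = [
    {"stud_id": 1, "course": "DBMS", "email": "a@example.com"},
    {"stud_id": 1, "course": "OS",   "email": "a@example.com"},
    {"stud_id": 2, "course": "DBMS", "email": "b@example.com"},
]
keys = minimal_unique_sets(enrolment)
print(keys)
```

In this sample no single attribute is unique, but two attribute pairs are. If one pair is chosen as the (composite) primary key, the other is an alternate key.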
Decomposition of a relation
Decomposition of a relation refers to the process of splitting a single relation
into two or more sub-relations.
Types of decomposition:
Normalization
In DBMS, database normalization is the process of organizing and
structuring the database to reduce redundancies and ensure data integrity
through lossless decomposition. Its goals are to:
1. Reduce redundancy,
2. Ensure data integrity, and
3. Optimize storage and query performance.
Normal Forms
1. First Normal Form (1NF): Ensures that each table has atomic
attributes, meaning each attribute contains only indivisible values. No
repeating groups or arrays are allowed.
2. Second Normal Form (2NF): Requires that a table be in 1NF and that
all non-key attributes are fully functionally dependent on the entire
primary key, eliminating partial dependencies.
3. Third Normal Form (3NF): Requires a table to be in 2NF and ensures
that there are no transitive dependencies, meaning that non-key
attributes are not dependent on other non-key attributes.
4. Boyce-Codd Normal Form (BCNF): A given relation is considered to
be in Boyce-Codd Normal Form (BCNF) if and only if it meets the
following criteria:
● The relation is already in Third Normal Form (3NF).
● For every non-trivial functional dependency 'A → B' in the
relation, where 'A' and 'B' are sets of attributes, 'A' must be a
superkey of the relation.
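The superkey condition in the BCNF definition can be checked mechanically with attribute closure: a set 'A' is a superkey exactly when its closure under the functional dependencies contains every attribute of the relation. The following is a minimal sketch that checks only the dependencies it is given; the function names and the student/course/teacher relation are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the given non-trivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(relation):
            bad.append((lhs, rhs))
    return bad

# Assumed example: each teacher teaches exactly one course.
relation = {"student", "course", "teacher"}
fds = [
    (("student", "course"), ("teacher",)),  # left side is a key
    (("teacher",), ("course",)),            # left side is not a superkey
]
violations = bcnf_violations(relation, fds)
print(violations)
```

The second dependency is reported because the closure of {teacher} is only {teacher, course}, so teacher alone does not determine the whole relation.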
Transaction
In a database management system (DBMS), a transaction represents a unit
of work performed against the database. It is a sequence of one or more
database operations (such as insert, update, delete, or select) that must be
executed atomically, ensuring data integrity and consistency.
contains the modified value, which is then written to the
corresponding location in the database.
Transaction States
1. Active State:
This represents the initial phase in the lifecycle of a transaction.
During the active state, the transaction's instructions are being
executed. Any modifications made by the transaction are temporarily
stored in the buffer located in the main memory.
3. Committed State:
Once all the changes made by the transaction have been
successfully stored in the database, it transitions into a committed
state. At this stage, the transaction is considered fully committed,
indicating that its modifications are now permanent and reflected in
the database.
4. Failed State:
When a transaction is being executed in the active state or partially
committed state and encounters a failure that prevents it from
continuing execution, it transitions into a failed state.
5. Aborted State:
After the transaction has failed and entered a failed state, it becomes
necessary to undo all the changes made by the transaction. To
accomplish this, the transaction is rolled back, reverting any
modifications it made to the database. Once the rollback process is
complete, the transaction transitions into an aborted state.
6. Terminated State:
This represents the final stage in the lifecycle of a transaction.
Once the transaction has entered either the committed state or the
aborted state, it ultimately transitions into a terminated state, signifying
the conclusion of its lifecycle.
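The lifecycle described above can be summarized as a small state machine. The sketch below is an illustration, not how any particular DBMS implements it. The moves from active to partially committed and from partially committed to committed follow the usual textbook lifecycle and are an assumption here, since the list above does not spell them out; all identifiers are hypothetical.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"
    TERMINATED = "terminated"

# For each state, the states a transaction may move to next.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: {State.TERMINATED},
    State.TERMINATED: set(),
}

def run(path):
    """Follow a sequence of states from ACTIVE; raise ValueError on an illegal move."""
    current = State.ACTIVE
    for nxt in path:
        if nxt not in ALLOWED[current]:
            raise ValueError(f"{current.value} -> {nxt.value} is not allowed")
        current = nxt
    return current

# A successful transaction and a failed one both end up terminated.
success = run([State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED])
failure = run([State.FAILED, State.ABORTED, State.TERMINATED])
print(success, failure)
```

Calling run([State.COMMITTED]) raises ValueError, because a transaction cannot jump from active straight to committed.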
ACID Properties
The properties that ensure the consistency, reliability, and correctness of
transactions in a database system are known as the ACID properties. ACID
stands for:
1. Atomicity:
Ensures that a transaction is treated as a single unit of work, meaning
that either all of its operations are successfully completed, or none of
them are. If any part of the transaction fails, the entire transaction is
rolled back, and the database returns to its original state.
2. Consistency:
Ensures that the database remains in a consistent state before and
after the execution of a transaction. Constraints and rules defined on
the database schema must be enforced throughout the transaction,
preserving the integrity of the data.
3. Isolation:
Ensures that the execution of transactions is isolated from each other,
preventing interference or corruption of data. Each transaction
operates as if it were the only transaction executing against the
database, even in the presence of concurrent transactions.
4. Durability:
Ensures that the effects of a committed transaction are permanent and
survive system failures. Once a transaction is committed, its changes
are saved to the database and remain intact even in the event of a
crash or power outage.
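Atomicity and consistency can be observed in a short sketch of a funds transfer. It assumes SQLite through Python's sqlite3 module; the accounts table, the CHECK rule, and the amounts are hypothetical. Isolation and durability are not demonstrated, since the example uses a single connection to an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """Move amount from A to B as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(30)        # both updates apply
after_ok = balances()
failed = transfer(500)   # the credit runs, the debit violates the CHECK, both are undone
after_failed = balances()
print(ok, after_ok, failed, after_failed)
```

In the second call the credit to B has already been applied when the debit fails, yet the rollback removes it, so the balances are the same as before the call.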
Schedules:
In a database system, a schedule refers to the sequential order in which
the operations of multiple transactions are executed. It represents the
timeline of transactional operations within the system.
1. Serial Schedule:
● Transactions execute one after the other, in a sequential
manner.
● Only one transaction is allowed to execute at any given time.
● Serial schedules guarantee consistency, recoverability,
cascadelessness, and strictness.
2. Non-Serial Schedule:
● Multiple transactions execute simultaneously.
● Operations of different transactions are interleaved or mixed
with each other, allowing for parallel execution.
● Non-serial schedules may not always guarantee consistency,
recoverability, cascadelessness, and strictness.
Serializability
Serializability is a concept used to determine the correctness of non-serial
schedules and ensure database consistency. It identifies which concurrent
schedules are valid and will preserve the integrity of the database.
1. Serializable Schedules:
If a given non-serial schedule involving 'n' transactions is equivalent to
some serial schedule of the same 'n' transactions, it is termed a
serializable schedule. A serializable schedule preserves database
consistency. Recoverability, cascadelessness, and strictness are separate
properties that depend on when transactions commit, so they must be
checked independently.
2. Non-Serializable Schedules:
A non-serial schedule that is not equivalent to any serial schedule is
referred to as a non-serializable schedule. Such a schedule is not
guaranteed to yield the same outcome as any serial schedule on a
consistent database.
Non-serializable schedules may or may not be consistent, may or may
not be recoverable.
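One common way to test whether a schedule is serializable is the precedence graph test for conflict serializability. These notes do not name a specific notion of equivalence, so treat this as one possible test: it is sufficient but not necessary, because a schedule can fail it and still be view serializable. The schedule encoding and function names below are hypothetical.

```python
def precedence_edges(schedule):
    """schedule: list of (transaction, action, item), with action 'R' or 'W'."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Two operations conflict if they belong to different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))  # ti must come before tj in any equivalent serial order
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        if any(visit(n) for n in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(n) for n in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))

# Interleaved, but every conflict orders T1 before T2.
s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
      ("T1", "R", "B"), ("T2", "W", "A"), ("T1", "W", "B")]
# Lost-update pattern: T1 before T2 and T2 before T1 are both required.
s2 = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(is_conflict_serializable(s1), is_conflict_serializable(s2))
```

The test says nothing about recoverability, which depends on the order of commits and would need a separate check.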