MCS023

This document discusses key concepts in database management systems including entity relationship diagrams, views, normalization, checkpoints, and data replication. It provides an example ER diagram with entities for employees, departments, projects, and dependents. Views are described as virtual tables that allow selecting fields from tables to query specific data. Normalization is explained as decomposing tables to minimize data dependencies and anomalies through functional dependencies and primary keys. Checkpoints mark consistent states in the transaction log to facilitate recovery from crashes. Data replication copies a database to multiple sites for improved availability and distributed access.


1.

(a) Draw an ER diagram for the situation given below :


"In a department many employees are working on many projects, which are under control of
the manager of the department. The manager of the department also holds the responsibility
of the welfare of the employees." Make suitable choices of the attributes for the entities,
identified by you for your ER diagram. Transform your ER diagram into a Relational Database.

This Company ER diagram illustrates key information about the company, covering entities such as employee, department, project and dependent, and the relationships between them.
Entities and their attributes are:
• Employee Entity : Attributes of Employee Entity are Name, Id, Address, Gender,
Dob and Doj.
Id is Primary Key for Employee Entity.
• Department Entity : Attributes of Department Entity are D_no, Name and Location.
D_no is Primary Key for Department Entity.
• Project Entity : Attributes of Project Entity are P_No, Name and Location.
P_No is Primary Key for Project Entity.
• Dependent Entity : Attributes of Dependent Entity are D_no, Gender and
relationship.
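
A minimal sketch (in Python, using the standard sqlite3 module) of how these entities might be transformed into a relational schema. The works_on table for the many-to-many employee-project link, the manager_id column for the manages relationship, and the dependent key (id, name) are assumptions introduced for illustration, not part of the original answer:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE department (d_no INTEGER PRIMARY KEY, name TEXT, location TEXT,
                             manager_id INTEGER);           -- manages relationship
    CREATE TABLE employee   (id INTEGER PRIMARY KEY, name TEXT, address TEXT,
                             gender TEXT, dob TEXT, doj TEXT,
                             d_no INTEGER REFERENCES department(d_no));
    CREATE TABLE project    (p_no INTEGER PRIMARY KEY, name TEXT, location TEXT,
                             d_no INTEGER REFERENCES department(d_no));
    CREATE TABLE works_on   (id INTEGER REFERENCES employee(id),
                             p_no INTEGER REFERENCES project(p_no),
                             PRIMARY KEY (id, p_no));        -- many-to-many link
    CREATE TABLE dependent  (id INTEGER REFERENCES employee(id),
                             name TEXT, gender TEXT, relationship TEXT,
                             PRIMARY KEY (id, name));        -- weak entity
""")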

(c) What is the role of views in DBMS ? Can we perform insert, delete or modify operations, if
the view contains a group function ? Justify.
Views in SQL are a kind of virtual table. A view has rows and columns just like a real table in the database. We can create a view by selecting fields from one or more tables present in the database. A view can either contain all the rows of a table or only specific rows selected by a condition. Below we cover creating and updating views.
We can create View using CREATE VIEW statement. A View can be created from a single table
or multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;

view_name: Name for the View


table_name: Name of the table
condition: Condition to select rows
Examples:
• Creating View from a single table:
• In this example we will create a View named DetailsView from the table
StudentDetails. Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM StudentDetails
WHERE S_ID < 5;
• To see the data in the View, we can query the view in the same manner as we
query a table.
SELECT * FROM DetailsView;
A view allows INSERT, DELETE and UPDATE operations only if it satisfies conditions such as the following; in particular, if the view contains a group (aggregate) function, it cannot be updated, because a single row of the view no longer corresponds to a single row of the base table:
1. The SELECT statement used to create the view should not include a GROUP BY clause or an ORDER BY clause.
2. The SELECT statement should not have the DISTINCT keyword.
3. The view should have all NOT NULL values.
4. The view should not be created using nested or complex queries.
5. The view should be created from a single table; if a view is created using multiple tables, we are not allowed to update it.
• We can use the CREATE OR REPLACE VIEW statement to add or remove fields from
a view. Syntax:
CREATE OR REPLACE VIEW view_name AS
SELECT column1,column2,..
FROM table_name
WHERE condition;
• For example, if we want to update the view MarksView and add the field AGE to
this View from StudentMarks Table, we can do this as:
CREATE OR REPLACE VIEW MarksView AS
SELECT StudentDetails.NAME, StudentDetails.ADDRESS, StudentMarks.MARKS,
StudentMarks.AGE
FROM StudentDetails, StudentMarks
WHERE StudentDetails.NAME = StudentMarks.NAME;
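
A minimal sketch of the group-function case, in Python with sqlite3 (the table and view names are illustrative). Note the hedge: SQLite happens to treat every view as read-only, so the rejection below is unconditional; systems such as MySQL allow updates on simple single-table views but likewise reject them once a group function is involved:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE StudentMarks (name TEXT, marks INT);
    INSERT INTO StudentMarks VALUES ('A', 80), ('B', 90);
    CREATE VIEW AvgMarks AS SELECT AVG(marks) AS avg_marks FROM StudentMarks;
""")
try:
    con.execute("UPDATE AvgMarks SET avg_marks = 100")  -- which base row would change?
except sqlite3.OperationalError as e:
    print("update rejected:", e)   # cannot modify AvgMarks because it is a view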

(d) Why do we do normalization of databases ? Discuss synthesis and decomposition


approaches of normalization. Give one example for each approach.
Database normalization is a stepwise formal process that allows us to decompose database tables in such a way that both data dependency and update anomalies are minimized. It makes use of the functional dependencies that exist in a table and the primary key or candidate keys in analyzing the tables. Three normal forms were initially proposed: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Subsequently, R. Boyce and E. F. Codd introduced a stronger definition of 3NF called Boyce-Codd Normal Form (BCNF). With the exception of 1NF, all these normal forms are based on functional dependencies among the attributes of a table. Higher normal forms that go beyond BCNF, such as Fourth Normal Form (4NF) and Fifth Normal Form (5NF), were introduced later; however, they deal with situations that are very rare.
The two broad approaches are decomposition (start from a large relation and repeatedly split it until each resulting relation satisfies the target normal form) and synthesis (start from the set of functional dependencies and build relations directly from a minimal cover, one per group of dependencies with the same determinant); a tiny sketch of both appears after the table below.
Normal Form — Test — Remedy (Normalization):
• 1NF. Test: the relation should have no non-atomic attributes or nested relations. Remedy: form a new relation for each non-atomic attribute or nested relation.
• 2NF. Test: for relations where the primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key. Remedy: decompose and set up a new relation for each partial key with its dependent attributes; make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
• 3NF. Test: the relation should not have a non-key attribute functionally determined by another non-key attribute (or by a set of non-key attributes), i.e., there should be no transitive dependency of a non-key attribute on the primary key. Remedy: decompose and set up a relation that includes the non-key attribute(s) that functionally determine(s) other non-key attribute(s).
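
The two approaches can be caricatured in a few lines of Python. The relation and FDs below are hypothetical, and the synthesis step assumes the FD set is already a minimal cover (the full Bernstein synthesis algorithm computes one first):

# Hypothetical relation R and its functional dependencies.
R = {"order_no", "item_code", "order_date", "price"}
fds = [({"order_no"}, {"order_date"}),
       ({"item_code"}, {"price"}),
       ({"order_no", "item_code"}, set())]   # the key itself

# Synthesis: build one relation per dependency (LHS union RHS), so every
# dependency is preserved by construction.
synthesized = [lhs | rhs for lhs, rhs in fds]
print(synthesized)   # one relation per FD, plus a relation for the key

# Decomposition: start from R and split off a violating dependency,
# keeping the determinant in both halves so the join is lossless.
lhs, rhs = fds[0]                  # order_no -> order_date violates 2NF
r1 = lhs | rhs                     # (order_no, order_date)
r2 = R - rhs                       # (order_no, item_code, price)
print(r1, r2)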

(e) What is the significance of checkpoints in DBMS ? Discuss the utility of checkpoints, with the
help of suitable example.
A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed. During transaction execution, such checkpoints are traced, and transaction log files are created.
Upon reaching a savepoint/checkpoint, the old log is discarded after its updates have been saved to the database. A new log then records the upcoming operations of the transactions and is updated until the next checkpoint, and the process continues.
1. Write begin_checkpoint record into log.
2. Collect checkpoint data in the stable storage.
3. Write end_checkpoint record into log.
The behaviour when the system crashes and recovers while concurrent transactions are executed (say T1 to T4, with a checkpoint falling after the start of T1):
• The recovery system reads the logs backward from the end to the last checkpoint, i.e., from T4 back to T1.
• It keeps track of two lists: Undo and Redo.
• Whenever there is a log with both <Tn, Start> and <Tn, Commit>, or only <Tn, Commit> (the start lies before the checkpoint), that transaction is put in the Redo list. T2 and T3 contain <Tn, Start> and <Tn, Commit>, whereas T1 has only <Tn, Commit>; here T1, T2, and T3 are in the Redo list.
• Whenever a log record with no commit or abort instruction is found, that transaction is put in the Undo list. Here, T4 has <Tn, Start> but no <Tn, Commit>, as it is an ongoing transaction, so T4 is put in the Undo list.
For the transactions in the Redo list, the previous logs are removed, the transactions are redone, and their logs are saved. The transactions in the Undo list are undone and their logs are deleted.
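
A minimal sketch of this backward scan, assuming a simple list-of-tuples log format (the format is hypothetical, introduced only for illustration):

def build_undo_redo(log):
    """Scan the log backward to the last checkpoint; transactions with a
    commit record go on the Redo list, started-but-uncommitted ones on Undo."""
    committed, started = set(), set()
    for rec in reversed(log):
        if rec == ("checkpoint",):
            break                        # stop at the most recent checkpoint
        kind, txn = rec
        if kind == "commit":
            committed.add(txn)
        elif kind == "start":
            started.add(txn)
    return committed, started - committed    # (redo, undo)

log = [("start", "T1"), ("checkpoint",), ("start", "T2"), ("commit", "T2"),
       ("start", "T3"), ("commit", "T3"), ("commit", "T1"), ("start", "T4")]
print(build_undo_redo(log))   # redo = {T1, T2, T3}, undo = {T4}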
Relevance of Checkpoints:
A checkpoint helps an RDBMS honour its ACID guarantees by bounding how much work recovery must do after an unexpected shutdown. Checkpoints run at intervals and write all dirty pages (modified pages) from the buffer to the data files on physical disk; this is also known as the hardening of dirty pages. It is a dedicated process run automatically by SQL Server at specific intervals, and it serves as the synchronization point between the database and the transaction log.

(g) Describe the utility of data replication in distributed DBMS. Briefly discuss the concept of
complete and selective replication.
Data Replication is the process of storing data in more than one site or node. It is useful
in improving the availability of data. It is simply copying data from a database from one server
to another server so that all the users can share the same data without any inconsistency. The
result is a distributed database in which users can access data relevant to their tasks without
interfering with the work of others.
Data replication encompasses duplication of transactions on an ongoing basis, so that the replica is in a consistently updated state and synchronized with the source. (Under pure fragmentation, by contrast, data is spread over different locations but a particular relation resides at only one location.)
There can be full (complete) replication, in which the whole database is stored at every site, and partial (selective) replication, in which some frequently used fragments of the database are replicated and others are not.
1. Transactional Replication – In transactional replication, users receive full initial copies of the database and then receive updates as data changes. Data is copied in real time from the publisher to the receiving database (subscriber) in the same order as the changes occur at the publisher; therefore, in this type of replication, transactional consistency is guaranteed. Transactional replication is typically used in server-to-server environments. It does not simply copy the data changes, but rather consistently and accurately replicates each change.
2. Snapshot Replication – Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. The entire snapshot is generated and sent to users. Snapshot replication is generally used when data changes are infrequent. It is a bit slower than transactional replication because on each attempt it moves multiple records from one end to the other. Snapshot replication is a good way to perform the initial synchronization between the publisher and the subscriber.
3. Merge Replication – Data from two or more databases is combined into a single database. Merge replication is the most complex type of replication because it allows both publisher and subscriber to independently make changes to the database. Merge replication is typically used in server-to-client environments. It allows changes to be sent from one publisher to multiple subscribers.

2. (a) Explain ANSI-SPARC 3 level architecture of DBMS. Discuss the languages associated at
different levels. What are the different types of data independence involved at different levels ?
The three-level architecture aims to separate each user’s view of the database from the way
the database is physically represented.
1. External level:
This is how the user views the database; the data of the database that is relevant to that user is described at this level. The external level consists of several different external views of the database. An external view includes only those entities, attributes, and relationships that the user wants. Different views may represent the same data in different ways; for example, one user may view a name in the form (firstname, lastname), while another may view it as (lastname, firstname).
2. Conceptual level:
This is the community view of the database. It describes what data is stored in the database and represents the entities, their attributes, and their relationships, as well as the semantic, security, and integrity information about the data. The conceptual level is the middle (second) level in the three-level architecture. It contains the logical structure of the entire database and represents the complete view of the database that the organization demands, independent of any storage considerations.
3. Internal level:
At the internal level, the database is represented physically on the computer. It emphasizes the physical implementation of the database, so as to achieve good storage space utilization and optimal runtime performance, along with data encryption techniques. It interfaces with the operating system to place the data in storage files, build the storage space, retrieve the data, and so on.
Languages are associated with each level: a view definition language describes the external schemas, the data definition language (DDL) defines the conceptual schema, a storage definition language describes the internal schema, and the data manipulation language (DML) is used to query and update the data. The mappings between the levels give the two types of data independence: logical data independence (external views are immune to changes in the conceptual schema) and physical data independence (the conceptual schema is immune to changes in internal storage).

(b) Discuss the following :


(i) Lossless Decomposition – A lossless-join decomposition of a relation R into relations R1 and R2 is one where performing the natural join of R1 and R2 returns exactly the original relation R. This is effective in removing redundancy from databases while preserving the original data.
In other words, with a lossless decomposition it becomes feasible to reconstruct the relation R from the decomposed tables R1 and R2 by using joins.
In a lossless decomposition, we select a common attribute, and the criterion for selecting it is that the common attribute must be a candidate key or super key of R1, of R2, or of both.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional dependencies is in F+ (the closure of the set of functional dependencies): R1 ∩ R2 → R1, or R1 ∩ R2 → R2.

(ii) Dependency Preserving Decomposition (with suitable examples) – Decomposition of a relation is done when a relation in the relational model is not in the appropriate normal form. A relation R is decomposed into two or more relations only if the decomposition is both lossless-join and dependency preserving. A decomposition is dependency preserving if every functional dependency of R can be checked within the decomposed relations alone, i.e., the union of the dependencies holding on R1 and R2 is equivalent to the set of dependencies on R. For example, R(A, B, C) with A → B and B → C decomposed into R1(A, B) and R2(B, C) preserves both dependencies, whereas the decomposition into R1(A, B) and R2(A, C) loses B → C.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2:
• The decomposition is lossy if R1 ⋈ R2 ⊃ R.
• The decomposition is lossless if R1 ⋈ R2 = R.
• The union of the attributes of R1 and R2 must equal the attributes of R; each attribute of R must be either in R1 or in R2:
Att(R1) ∪ Att(R2) = Att(R)
• The intersection of the attributes of R1 and R2 must not be empty:
Att(R1) ∩ Att(R2) ≠ Φ
• The common attributes must be a key for at least one of the relations (R1 or R2):
Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
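
A minimal sketch of checking this last condition in Python, using the standard attribute-closure algorithm (the relation and FDs in the example are hypothetical):

def closure(attrs, fds):
    """Compute the attribute closure attrs+ under a set of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """The decomposition is lossless if the common attributes functionally
    determine all of R1 or all of R2 (i.e., they are a super key of one)."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c

# Example: R(A, B, C) with A -> B, decomposed into R1(A, B) and R2(A, C).
fds = [(("A",), ("B",))]
print(is_lossless(("A", "B"), ("A", "C"), fds))  # True: A -> B, so A+ covers R1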
(b) What is the need of indices in a database system ? Mention the categories of indices
available in a DBMS. Which data structure is suitable for creating indices and why ?
o Indexing is used to optimize the performance of a database by minimizing the number of
disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a database
table quickly.

Index structure:

Indexes can be created using some database columns.

o The first column of the index is the search key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so
that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be found.

Ordered indices

The indices are usually sorted to make searching faster. The indices which are sorted are known
as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. If their IDs start with 1, 2, 3, ... and we have to search for the employee with ID 543:

o In the case of a database with no index, we have to scan the disk blocks from the start until we reach 543; the DBMS will find the record after reading 543*10 = 5430 bytes.
o In the case of an index (say each index entry is 2 bytes), the DBMS will find the record after reading 542*2 = 1084 bytes, which is far less than in the previous case.
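
A minimal sketch of this effect in Python with sqlite3: the query planner switches from a full scan to an index lookup once an index exists. The table and index names are illustrative, and the exact EXPLAIN QUERY PLAN wording varies between SQLite versions:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(i, f"emp{i}") for i in range(1, 1001)])
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE id = 543").fetchall()
print(plan)   # typically 'SCAN employee': every block must be read

con.execute("CREATE INDEX idx_emp_id ON employee(id)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE id = 543").fetchall()
print(plan)   # typically 'SEARCH employee USING INDEX idx_emp_id (id=?)'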
4. Differentiate between the following :
(a) DBMS and File base systems
Basis — File System vs DBMS:
• Structure: The file system is software that manages and organizes the files in a storage medium within a computer. DBMS is software for managing the database.
• Data Redundancy: Redundant data can be present in a file system. In DBMS there is no redundant data.
• Backup and Recovery: A file system doesn't provide backup and recovery of data if it is lost. A DBMS provides backup and recovery of data even if it is lost.
• Query Processing: There is no efficient query processing in the file system. Efficient query processing is available in DBMS.
• Consistency: There is less data consistency in the file system. There is more data consistency in DBMS because of the process of normalization.
• Complexity: The file system is less complex. DBMS has more complexity in handling as compared to the file system.
• Security Constraints: File systems provide less security in comparison to DBMS. DBMS has more security mechanisms as compared to file systems.
• Cost: The file system is less expensive than DBMS. DBMS has a comparatively higher cost than a file system.
• Data Independence: In the file system there is no data independence. In DBMS data independence exists.

(b) 2-Phase locking and 2-Phase commit


A transaction is said to follow the Two-Phase Locking (2PL) protocol if all its locking and unlocking operations occur in two phases:
1. Growing phase: new locks on data items may be acquired, but none can be released.
2. Shrinking phase: existing locks may be released, but no new locks can be acquired.
Note – If lock conversion is allowed, then upgrading of a lock (from S(a) to X(a)) is allowed in the growing phase, and downgrading of a lock (from X(a) to S(a)) must be done in the shrinking phase.
Two-phase commit (2PC), by contrast, is not a locking rule but a distributed commit protocol: a coordinator first asks every participating site to prepare and vote, and only if all sites vote yes does it send the global commit in the second phase.
This is just a skeleton transaction pair that shows how locking and unlocking work with 2PL (the step numbers refer to the schedule in the original figure). Note for:
Transaction T1:
• The growing phase is from steps 1-3.
• The shrinking phase is from steps 5-7.
• Lock point at step 3.
Transaction T2:
• The growing phase is from steps 2-6.
• The shrinking phase is from steps 8-9.
• Lock point at step 6.
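
A toy Python sketch of the two-phase rule itself (the class and method names are illustrative, not a real lock manager): once a transaction releases any lock, it may not acquire another.

class TwoPhaseLockingError(Exception):
    pass

class Transaction:
    """Enforces 2PL: after the first unlock, no further lock is allowed."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name}: cannot acquire lock({item}) in shrinking phase")
        self.held.add(item)          # growing phase: acquire

    def unlock(self, item):
        self.shrinking = True        # lock point passed; shrinking phase begins
        self.held.discard(item)

t1 = Transaction("T1")
t1.lock("A"); t1.lock("B")           # growing phase
t1.unlock("A")                       # shrinking phase begins
try:
    t1.lock("C")                     # violates 2PL
except TwoPhaseLockingError as e:
    print(e)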

(c) DDBMS and Centralized DBMS


Basis of comparison — Centralized database vs Distributed database:
1. Definition: A centralized database is stored, located and maintained at a single location only. A distributed database consists of multiple databases which are connected with each other and are spread across different physical locations.
2. Access time: The data access time in the case of multiple users is more in a centralized database and less in a distributed database.
3. Management of data: The management, modification, and backup of a centralized database are easier, as the entire data is present at the same location. For a distributed database they are very difficult, as it is spread across different physical locations.
4. View: A centralized database provides a uniform and complete view to the user. Since a distributed database is spread across different locations, it is difficult to provide a uniform view to the user.
5. Data consistency: A centralized database has more data consistency in comparison to a distributed database. A distributed database may have some data replication, so data consistency is less.
6. Failure: Users cannot access a centralized database if a database failure occurs. In a distributed database, if one database fails, users still have access to the other databases.
7. Cost: A centralized database is less costly; a distributed database is very expensive.

(d) Serial schedule and Serializable schedule

The serial schedule is a type of schedule where one transaction is executed completely before
starting another transaction. In the serial schedule, when the first transaction completes its cycle,
then the next transaction is executed.

For example, suppose there are two transactions T1 and T2 which have some operations. If there is no interleaving of operations, then there are the following two possible outcomes:

1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.

o In figure (a), Schedule A shows the serial schedule where T1 is followed by T2.
o In figure (b), Schedule B shows the serial schedule where T2 is followed by T1.

2. Non-serial Schedule

o If interleaving of operations is allowed, then there will be a non-serial schedule.
o It contains many possible orders in which the system can execute the individual operations of the transactions.
o In figures (c) and (d), Schedule C and Schedule D are non-serial schedules; they have interleaving of operations.
o A serializable schedule is a non-serial schedule whose effect is equivalent to that of some serial schedule; it keeps the concurrency benefits of interleaving while guaranteeing the correctness of a serial execution.

5. Write short notes on any four of the following :


(a) Write Ahead Log Protocol - SQL Server needs to guarantee the durability of your transactions (once you commit your data, it is there even in the event of power loss) and the ability to roll back the data changed by uncommitted transactions. The mechanism being utilized is called Write-Ahead Logging (WAL). It simply means that SQL Server needs to write the log records associated with a particular modification before it writes the modified page to disk. This process ensures that no modification to a database page is flushed to disk until the transaction log records associated with that modification have been written to disk first, which maintains the ACID properties of a transaction.
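
A minimal sketch of the write-ahead rule in Python; the file names, log record format, and buffer layout are hypothetical, introduced only to show the ordering (log durable first, page second):

import json, os

def wal_write(log_path, pages_path, page_id, new_value, buffer):
    """Write-ahead rule: the log record must reach stable storage
    before the modified page may be flushed."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"page": page_id, "new": new_value}) + "\n")
        log.flush()
        os.fsync(log.fileno())      # 1. log record is made durable first
    buffer[page_id] = new_value     # 2. page is modified in the buffer pool
    with open(pages_path, "w") as f:
        json.dump(buffer, f)        # 3. only now may the page reach disk

buffer = {"p1": "old"}
wal_write("wal.log", "data.json", "p1", "new", buffer)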

(b) Clustering Indices - A Clustered index is one of the special types of index which reorders the
way records in the table are physically stored on the disk. It sorts and stores the data rows in the
table or view based on their key values. It is essentially a sorted copy of the data in the indexed
columns.
Sometimes we are asked to create an index on a non-unique key, such as a dept-id column of an employee table. There can be several employees in each department. Here, all employees belonging to the same dept-id are considered to be within a single cluster, and the index pointers point to the cluster as a whole.

(c) Locks and its Types - In this type of protocol, any transaction cannot read or write data until it
acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions, because a transaction holding only a shared lock cannot update the data item.

2. Exclusive lock:

o Under an exclusive lock, the data item can be both read and written by the transaction.
o The lock is exclusive: while it is held, no other transaction may read or modify the same data item, so multiple transactions cannot modify the same data simultaneously.
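
A minimal sketch of the shared/exclusive compatibility rule in Python (the class is illustrative; real lock managers also queue and block requesters):

class LockTable:
    """Compatibility: any number of shared (S) locks may coexist on an item,
    but an exclusive (X) lock is incompatible with every other lock."""
    def __init__(self):
        self.holders = {}   # item -> (mode, set of transactions)

    def can_grant(self, item, mode):
        if item not in self.holders:
            return True
        held_mode, _ = self.holders[item]
        return held_mode == "S" and mode == "S"   # only S + S is compatible

    def grant(self, item, mode, txn):
        if not self.can_grant(item, mode):
            return False                # caller would block or wait here
        _, txns = self.holders.setdefault(item, (mode, set()))
        txns.add(txn)
        return True

lt = LockTable()
print(lt.grant("X1", "S", "T1"))  # True: shared lock granted
print(lt.grant("X1", "S", "T2"))  # True: S is compatible with S
print(lt.grant("X1", "X", "T3"))  # False: X conflicts with the S holders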

(d) Deadlock Prevention Protocols - Deadlock prevention –


For a large database, the deadlock prevention method is suitable. A deadlock can be prevented if the resources are allocated in such a way that a deadlock never occurs. The DBMS analyzes the operations to determine whether they can create a deadlock situation; if they can, that transaction is never allowed to be executed.
The deadlock prevention mechanism proposes two schemes:
• Wait-Die Scheme –
In this scheme, if a transaction requests a resource that is locked by another transaction, the DBMS checks the timestamps of both transactions and allows the older transaction to wait until the resource is available. Suppose there are two transactions T1 and T2, and let the timestamp of any transaction T be TS(T). If some other transaction holds a lock needed by T1, the DBMS performs the following actions:
If TS(T1) < TS(T2) – i.e., T1 is the older transaction and T2 holds the resource – then T1 is allowed to wait until the resource is available. In other words, if a younger transaction has locked a resource and an older transaction is waiting for it, the older transaction is allowed to wait until the resource is free.
If T1 is the older transaction and holds a resource that the younger T2 is waiting for, then T2 is killed and restarted later, after a small delay, but with its original timestamp. That is, if the older transaction holds the resource and a younger transaction waits for it, the younger transaction is killed and restarted with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one. The second scheme, wound-wait, is the converse: an older requester wounds (forces the abort of) a younger holder, while a younger requester is allowed to wait for an older holder.
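
A minimal sketch of the wait-die decision in Python (smaller timestamp = older transaction; the function is illustrative):

def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester dies and is
    restarted later with its original timestamp."""
    if requester_ts < holder_ts:
        return "wait"    # older transaction waits for the resource
    return "die"         # younger transaction is aborted and restarted

print(wait_die(5, 9))    # 'wait': requester ts=5 is older than holder ts=9
print(wait_die(9, 5))    # 'die' : younger requester is rolled back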

(e) Advantages and Disadvantages Distributed DBMS - The distributed database management
system contains the data in multiple locations. That can be in different systems in the same
place or across different geographical locations.
The database is divided across multiple locations, storing the data at, say, Site 1, Site 2, Site 3 and Site 4.
The advantages and disadvantages of distributed database management systems are as follows.

Advantages of DDBMS
• The database is easier to expand as it is already spread across multiple systems
and it is not too complicated to add a system.
• The distributed database can have the data arranged according to different levels
of transparency i.e data with different transparency levels can be stored at
different locations.
• The database can be stored according to the departmental information in an organisation; in that case, it is easier to support organisational hierarchical access.
• If there were a natural catastrophe such as a fire or an earthquake, all the data would not be destroyed, as it is stored at different locations.
• It is cheaper to create a network of systems containing a part of the database.
This database can also be easily increased or decreased.
• Even if some of the data nodes go offline, the rest of the database can continue
its normal functions.

Disadvantages of DDBMS

• The distributed database is quite complex and it is difficult to make sure that a
user gets a uniform view of the database because it is spread across multiple
locations.
• This database is more expensive as it is complex and hence, difficult to maintain.
• It is difficult to provide security in a distributed database as the database needs
to be secured at all the locations it is stored. Moreover, the infrastructure
connecting all the nodes in a distributed database also needs to be secured.

(f) What is the difference between DBMS and RDBMS ? Under what situations is it better to use
File based System than Database System ?
DBMS vs RDBMS:
• Storage: DBMS stores data as files; RDBMS stores data in tabular form.
• Access: In DBMS, data elements need to be accessed individually; in RDBMS, multiple data elements can be accessed at the same time.
• Relationships: In DBMS there is no relationship between data; in RDBMS, data is stored in the form of tables which are related to each other.
• Normalization: Normalization is not present in DBMS; it is present in RDBMS.
• Distributed databases: DBMS does not support distributed databases; RDBMS does.
• Structure: DBMS stores data in either a navigational or hierarchical form; RDBMS uses a tabular structure where the headers are the column names and the rows contain the corresponding values.
• Data volume: DBMS deals with small quantities of data; RDBMS deals with large amounts of data.
• Redundancy: Data redundancy is common in the DBMS model; in RDBMS, keys and indexes do not allow data redundancy.
• Scale: DBMS is used for small organizations dealing with small data; RDBMS is used to handle large amounts of data.
• Users: DBMS supports a single user; RDBMS supports multiple users.
• Fetching speed: Data fetching is slower for large amounts of data in DBMS; in RDBMS data fetching is fast because of the relational approach.
A file-based system can still be the better choice for small, single-user applications with simple, rarely changing data and few concurrency or security requirements, where the cost and overhead of a database system are not justified.

(g) Explain database recovery using system log with the help of an example.
(h) Explain the following terms :

(i) Candidate key - A candidate key is a minimal super key: a super key from which no attribute can be removed without losing the ability to identify tuples uniquely, i.e., a key that contains no redundant attribute. To select the candidate keys, we examine the set of super keys.

Role of a Candidate Key


The role of a candidate key is to identify a table row uniquely. Also, the value of a candidate key cannot be NULL. A candidate key is characterized by having "no redundant attributes" and being a "minimal representation of a tuple".

(ii) Primary key – A Primary Key is the minimal set of attributes of a table that has the task to
uniquely identify the rows, or we can say the tuples of the given particular table.

A primary key of a relation is one of the possible candidate keys which the database designer selects as the principal identifier. It may be chosen for convenience, performance and many other reasons; the choice among the candidate keys depends on such considerations.

(iii) Foreign key - A foreign key is used to link two tables together via the primary key: the column(s) of one table point to the primary key attribute of another table. An attribute that is a primary key in one table can thus work in another table as a foreign key attribute. Note that a foreign key need not be (part of) the primary key of its own table.

(iv) Super key - The role of a super key is simply to identify the tuples of the specified table in the database. It is a superset: every candidate key is contained in some super key. So, all those sets of attributes in a table that are capable of identifying its tuples in a unique manner are super keys.

(v) Alternate key - An alternate key is any candidate key of a relation that is not chosen as the primary key. Like every candidate key, it identifies tuples uniquely; it simply provides an alternate means of identification alongside the primary key.
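
A minimal sketch of these keys in Python with sqlite3 (the table and column names are illustrative; note that SQLite enforces foreign keys only when the pragma is enabled):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
con.execute("""
    CREATE TABLE department (
        d_no INTEGER PRIMARY KEY,   -- chosen candidate key = primary key
        name TEXT UNIQUE NOT NULL   -- another candidate key: an alternate key
    )""")
con.execute("""
    CREATE TABLE employee (
        id   INTEGER PRIMARY KEY,
        name TEXT,
        dept INTEGER REFERENCES department(d_no)   -- foreign key
    )""")
con.execute("INSERT INTO department VALUES (1, 'Sales')")
try:
    con.execute("INSERT INTO employee VALUES (10, 'A', 99)")   # no dept 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # FOREIGN KEY constraint failed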

(b) Explain 3NF. Discuss the Insert, Delete and Update anomalies associated with 3NF.
Third Normal Form (3NF):
A relation is in third normal form, if there is no transitive dependency for non-prime
attributes as well as it is in second normal form.
A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Equivalently, a relation that is in First and Second Normal Form and in which no non-primary-key attribute is transitively dependent on the primary key is in Third Normal Form (3NF).
Note – If A → B and B → C are two FDs, then A → C is called a transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies. If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant.

Anomalies

Consider a student table with attributes (rollno, game, feestructure), where game → feestructure. It suffers from all three anomalies:
• Insertion anomaly – A new game can't be inserted into the table unless we get a student to play that game.
• Deletion anomaly – If rollno 7 is deleted from the table, we also lose the complete information regarding tennis.
• Updation anomaly – To change the fee structure for basketball, we need to make changes in more than one place.
So, to convert the student table into 3NF, we first need to decompose it as follows:

Decomposition for 3NF

To overcome these anomalies, the student table should be divided into smaller tables.
If X → Y is a transitive dependency, then divide R into R1(X⁺) and R2(R − Y⁺).
game → feestructure is a transitive dependency (since game is not a key and feestructure is not a key attribute).
R1 = game⁺ = (game, feestructure)
R2 = (student − feestructure⁺) = (rollno, game)
So divide the student table into R1(game, feestructure) and R2(rollno, game).
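
A minimal sketch of this decomposition in Python with sqlite3 (sample rows are hypothetical): the natural join of R1 and R2 reconstructs the original relation, and the fee for each game is now stored once, removing the update anomaly.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (rollno INT, game TEXT, feestructure INT);
    INSERT INTO student VALUES (1,'tennis',500),(2,'tennis',500),(3,'chess',300);
    -- Decomposition: R1(game, feestructure) and R2(rollno, game)
    CREATE TABLE r1 AS SELECT DISTINCT game, feestructure FROM student;
    CREATE TABLE r2 AS SELECT rollno, game FROM student;
""")
rows = con.execute("""SELECT r2.rollno, r2.game, r1.feestructure
                      FROM r2 JOIN r1 ON r2.game = r1.game
                      ORDER BY rollno""").fetchall()
print(rows)   # [(1,'tennis',500), (2,'tennis',500), (3,'chess',300)]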

3. Differentiate the following : 20


(ii) Deadlock avoidance and Deadlock prevention protocols
Factors — Deadlock Prevention vs Deadlock Avoidance:
1. Concept: Prevention blocks at least one of the conditions necessary for deadlock to occur. Avoidance ensures that the system does not go into an unsafe state.
2. Resource request: In prevention, all the resources are requested together. In avoidance, resource requests are granted according to the available safe path.
3. Information required: Prevention does not require information about existing resources, available resources and resource requests. Avoidance requires information about existing resources, available resources and resource requests.
4. Procedure: Prevention prevents deadlock by constraining how resources are requested and handled. Avoidance automatically considers each request and checks whether granting it is safe for the system or not.
5. Preemption: In prevention, preemption sometimes occurs frequently. In avoidance there is no preemption.
6. Resource allocation strategy: The resource allocation strategy for deadlock prevention is conservative; for deadlock avoidance it is not conservative.
7. Future resource requests: Prevention doesn't require knowledge of future process resource requests. Avoidance requires knowledge of future process resource requests.

(iv) 3NF and BCNF


3NF vs BCNF:
1. In 3NF there should be no transitive dependency, that is, no non-prime attribute should be transitively dependent on the candidate key. In BCNF, for any non-trivial dependency A → B, A must be a super key of the relation.
2. 3NF is less strong; BCNF is comparatively stronger than 3NF.
3. In 3NF the functional dependencies are already in 1NF and 2NF. In BCNF the functional dependencies are already in 1NF, 2NF and 3NF.
4. The redundancy is comparatively higher in 3NF and lower in BCNF.
5. In 3NF there is preservation of all functional dependencies. In BCNF there may or may not be preservation of all functional dependencies.
6. 3NF is comparatively easier to achieve; BCNF is more difficult to achieve.
7. A lossless, dependency-preserving decomposition can always be achieved for 3NF; for BCNF, a lossless decomposition is always possible but dependency preservation may have to be sacrificed.

4. Write short notes on following : 20


(i) Precedence graph for serializability check - A Precedence Graph (Serialization Graph) is commonly used to test the conflict serializability of a schedule.
It is a directed graph G = (V, E) consisting of a set of nodes V = {T1, T2, T3, ..., Tn} and a set of directed edges E = {e1, e2, e3, ..., em}.
The graph contains one node for each transaction Ti. An edge ei is of the form Tj → Tk, where Tj is the starting node of ei and Tk is the ending node of ei. An edge ei is constructed between nodes Tj and Tk if one of the operations in Tj appears in the schedule before some conflicting operation in Tk.
The algorithm can be written as:
1. Create a node T in the graph for each participating transaction in the schedule.
2. For the conflicting operations read_item(X) and write_item(X) – if a transaction Tj executes a read_item(X) after Ti executes a write_item(X), draw an edge from Ti to Tj in the graph.
3. For the conflicting operations write_item(X) and read_item(X) – if a transaction Tj executes a write_item(X) after Ti executes a read_item(X), draw an edge from Ti to Tj in the graph.
4. For the conflicting operations write_item(X) and write_item(X) – if a transaction Tj executes a write_item(X) after Ti executes a write_item(X), draw an edge from Ti to Tj in the graph.
5. The schedule S is serializable if there is no cycle in the precedence graph.
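
A minimal sketch of this algorithm in Python; the schedule encoding (transaction, operation, item) is a convention introduced here for illustration:

from collections import defaultdict

def precedence_graph(schedule):
    """Edge Ti -> Tj for each pair of conflicting operations on the same
    item (at least one a write), with Ti's operation appearing first."""
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and (op1 == "W" or op2 == "W"):
                edges.add((t1, t2))
    return edges

def has_cycle(edges, nodes):
    """DFS cycle check: the schedule is conflict serializable iff acyclic."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for m in adj[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

# T1 reads X, T2 writes X, T1 writes X
s = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
e = precedence_graph(s)
txns = {t for t, _, _ in s}
print(e, "cycle:", has_cycle(e, txns))
# {('T1','T2'), ('T2','T1')} cycle: True -> not conflict serializable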

(ii) Types of Indexes in DBMS - Indexing is a technique for improving database performance by
reducing the number of disk accesses necessary when a query is run. An index is a form of data
structure. It's used to swiftly identify and access data and information present in a database table.
Ordered Indices
To make searching easier and faster, the indices are frequently arranged/sorted. Ordered indices are indices that have been sorted.
Example
Let's say we have a table of employees with thousands of records, each of which is ten bytes long. If their IDs begin with 1, 2, 3, ..., and we are looking for the employee with ID 543:

• We must scan the disk blocks from the beginning until we reach 543 in the case of a DB without an index. After reading 543*10 = 5430 bytes, the DBMS will find the record.
• With an index (say each index entry is 2 bytes), the DBMS will find the record after reading 542*2 = 1084 bytes, which is significantly less than in the prior case.
Primary Index

• Primary indexing refers to the process of creating an index based on the table’s primary
key. These primary keys are specific to each record and establish a 1:1 relationship
between them.
• The searching operation is fairly efficient because primary keys are stored in sorted
order.
• There are two types of primary indexes: dense indexes and sparse indexes.
Dense Index
Every search key value in the data file has an index record in the dense index. It speeds up the
search process. The total number of records present in the index table and the main table are
the same in this case. It requires extra space to hold the index record. A pointer to the actual
record on the disk and the search key are both included in the index records.
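
A minimal sketch contrasting the dense and sparse variants in Python; the record layout and block size of 4 are hypothetical:

import bisect

# Data file: records sorted on the key, grouped into blocks of 4.
records = [(k, f"row{k}") for k in range(1, 101)]
blocks = [records[i:i + 4] for i in range(0, len(records), 4)]

# Dense index: one entry per search-key value, pointing straight at the record.
dense = {k: (i, j) for i, blk in enumerate(blocks) for j, (k, _) in enumerate(blk)}

# Sparse index: one entry per block (the first key in each block).
sparse = [blk[0][0] for blk in blocks]

def sparse_lookup(key):
    """Binary-search the sparse index for the block, then scan that block."""
    b = bisect.bisect_right(sparse, key) - 1
    for k, row in blocks[b]:
        if k == key:
            return row
    return None

print(dense[43])          # direct pointer: (block number, offset)
print(sparse_lookup(43))  # 'row43' after scanning a single block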

(iii) Data fragmentation and its objectives – Fragmentation is a process of dividing the whole or
full database into various subtables or sub relations so that data can be stored in different
systems. The small pieces of sub relations or subtables are called fragments. These fragments
are called logical data units and are stored at various sites. It must be ensured that the fragments are such that they can be used to reconstruct the original relation (i.e., there is no loss of data).
In the fragmentation process, let’s say, If a table T is fragmented and is divided into a number
of fragments say T1, T2, T3….TN. The fragments contain sufficient information to allow the
restoration of the original table T. This restoration can be done by the use of UNION or JOIN
operation on various fragments. This process is called data fragmentation. All of these
fragments are independent which means these fragments can not be derived from others. The
users needn’t be logically concerned about fragmentation which means they should not
concerned that the data is fragmented and this is called fragmentation Independence or we
can say fragmentation transparency.

3. (a) With the help of an example for each, explain the following : 2×3=6
(i) Binary Lock - A binary lock is a variable capable of holding only 2 possible values, i.e., a 1
(depicting a locked state) or a 0 (depicting an unlocked state). This lock is usually associated
with every data item in the database ( maybe at table level, row level or even the entire
database level). Should item X be unlocked, then a corresponding object lock(X) would return
the value 0. So, the instant a user/session begins updating the contents of item X, lock(X) is set
to a value of 1. Due to this, for as long as the update query lasts, no other user may access the
item X – even read or write to it!
There are 2 operations used to implement binary locks: lock_data() and unlock_data(). A runnable Python rendering of the usual algorithm is given below (only the algorithm is shown, since the concrete syntax varies across DBMS scripts):
The locking and unlocking operations:

import threading

locks = {}                            # lock table: item -> 0 (unlocked) or 1 (locked)
guard = threading.Condition()         # protects the lock table itself

def lock_data(X):
    with guard:
        while locks.get(X, 0) == 1:   # item X is locked by another transaction
            guard.wait()              # queue until the holder finishes its update
        locks[X] = 1                  # acquire the binary lock

def unlock_data(X):
    with guard:
        locks[X] = 0                  # release the lock
        guard.notify_all()            # wake the waiting transactions
Note that the while loop plays the role of the 'label'/'go to' pair in the classical pseudocode: a transaction re-checks the lock after every wake-up and proceeds only once lock(X) = 0. The wait call effectively puts all other transactions wanting to access X into a queue. Since queueing blocked transactions until the item is unlocked is really the lock manager's job, this waiting is often described as happening outside the lock_data(X) operation itself, i.e., defined outside.

(ii) Multiple-mode Locks - The various concurrency control schemes have used each individual data item as the unit on which synchronization is performed. A drawback of this technique is that if a transaction Ti needs to access the entire database and a locking protocol is used, then Ti must lock each item in the database, which is inefficient; it would be simpler if Ti could use a single lock to lock the entire database. But this second proposal has its own flaw: suppose another transaction needs to access only a few data items. Locking the entire database is then unnecessary, and it costs us concurrency, which was our primary goal in the first place. To strike a bargain between efficiency and concurrency, we use granularity.
Let's start by understanding what is meant by granularity.
Granularity – It is the size of the data item that is allowed to be locked. Multiple granularity means hierarchically breaking up the database into blocks that can be locked, so that the system can track what needs to be locked and in what fashion. Such a hierarchy can be represented graphically as a tree, as in the small sketch below.
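
A toy Python sketch of such a hierarchy (the node names are illustrative): locking a node implicitly covers all of its descendants.

hierarchy = {
    "database": ["table_emp", "table_dept"],
    "table_emp": ["row_e1", "row_e2"],
    "table_dept": ["row_d1"],
}

def covered(node):
    """All data items implicitly locked when `node` is locked."""
    items = [node]
    for child in hierarchy.get(node, []):
        items.extend(covered(child))
    return items

print(covered("table_emp"))   # ['table_emp', 'row_e1', 'row_e2']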

(b) Define primary and clustering indexes. Briefly discuss implementation of clustering indexes.

Primary Index

o If the index is created on the basis of the primary key of the table, then it is known as
primary indexing. These primary keys are unique to each record and contain 1:1 relation
between the records.
o As primary keys are stored in sorted order, the performance of the searching operation is
quite efficient.
o The primary index can be classified into two types: Dense index and Sparse index.

Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary-key columns, which may not be unique for each record.
o In this case, to identify the records faster, we group two or more columns together to get a unique value and create an index out of them. This method is called a clustering index.
o The records which have similar characteristics are grouped, and indexes are created for these groups.

(c) Discuss the advantages and disadvantages of data replication. What are the objectives of
complete and selective replication ?
ADVANTAGES OF DATA REPLICATION – Data replication is generally performed:
• To provide a consistent copy of data across all the database nodes.
• To increase the availability of data.
• To increase the reliability of data.
• To support multiple users and give high performance.
• To remove inconsistency: the databases are merged, and slave databases holding outdated or incomplete data are updated.
• To reduce data movement: since replicas are created, the data is often found at the very site where the transaction is executing.
• To perform faster execution of queries.
DISADVANTAGES OF DATA REPLICATION –
• More storage space is needed as storing the replicas of same data at
different sites consumes more space.
• Data Replication becomes expensive when the replicas at all different sites
need to be updated.
• Maintaining Data consistency at all different sites involves complex
measures.
The objectives of the two replication strategies differ. Complete (full) replication stores a copy of the whole database at every site; its objectives are maximum availability, fast local reads, and survival of site failures, at the price of heavy update propagation and storage cost. Selective (partial) replication replicates only the frequently used fragments of the database at the sites that need them; its objective is to balance availability and read performance against update and storage overhead. The transactional, snapshot and merge replication mechanisms described in question 1(g) can be used to keep either kind of replica synchronized.

4. (a) What is use of a precedence graph in database ? Write all the steps for constructing a
precedence graph. Suppose there are two transactions T1 and T2. Draw an edge between T1
and T2, if T2 has written on item X first and T1 writes on the same item later.

A Precedence Graph (Serialization Graph) is commonly used to test the conflict serializability of a schedule.
It is a directed graph G = (V, E) consisting of a set of nodes V = {T1, T2, T3, ..., Tn} and a set of directed edges E = {e1, e2, e3, ..., em}.
The graph contains one node for each transaction Ti. An edge ei is of the form Tj → Tk, where Tj is the starting node of ei and Tk is the ending node of ei. An edge ei is constructed between nodes Tj and Tk if one of the operations in Tj appears in the schedule before some conflicting operation in Tk.
The algorithm can be written as:
1. Create a node T in the graph for each participating transaction in the schedule.
2. For the conflicting operations read_item(X) and write_item(X) – if a transaction Tj executes a read_item(X) after Ti executes a write_item(X), draw an edge from Ti to Tj in the graph.
3. For the conflicting operations write_item(X) and read_item(X) – if a transaction Tj executes a write_item(X) after Ti executes a read_item(X), draw an edge from Ti to Tj in the graph.
4. For the conflicting operations write_item(X) and write_item(X) – if a transaction Tj executes a write_item(X) after Ti executes a write_item(X), draw an edge from Ti to Tj in the graph.
5. The schedule S is serializable if there is no cycle in the precedence graph.
If there is no cycle in the precedence graph, it means we can construct a serial schedule S' which is conflict equivalent to schedule S. The serial schedule S' can be found by a topological sort of the acyclic precedence graph; there can be more than one such schedule.
For the situation given in the question – T2 writes item X first and T1 writes the same item later – rule 4 (the write-write conflict) applies, so we draw an edge from T2 to T1.

(b) What is Log-Based Recovery System ? Explain the type of information kept in a log about
transaction. Which type of transactions are selected for REDO and UNDO for database recovery
? Explain with an example.
Atomicity property of DBMS states that either all the operations of transactions must be
performed or none. The modifications done by an aborted transaction should not be visible to
database and the modifications done by committed transaction should be visible.
To achieve our goal of atomicity, we must first output to stable storage information describing the modifications, without modifying the database itself. This information helps ensure that all modifications performed by committed transactions are reflected in the database, and that no modifications made by an aborted transaction persist in the database.
Log and log records –
The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage. Any operation performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification.
An update log record represented as: <Ti, Xj, V1, V2> has these fields:
1. Transaction identifier: Unique Identifier of the transaction that performed the
write operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of data item prior to write.
4. New value: Value of data item after write operation.
Other type of log records are:
1. <Ti start>: It contains information about when a transaction Ti starts.
2. <Ti commit>: It contains information about when a transaction Ti commits.
3. <Ti abort>: It contains information about when a transaction Ti aborts.
Undo and Redo Operations (see the sketch after this list) –
Because all database modifications must be preceded by the creation of a log record, the system has available both the old value prior to the modification of the data item and the new value that is to be written. This allows the system to perform redo and undo operations as appropriate:
1. Undo: using a log record, sets the data item specified in the log record to the old value.
2. Redo: using a log record, sets the data item specified in the log record to the new value.
During recovery, a transaction is selected for REDO if the log contains both its <Ti start> and <Ti commit> records, and for UNDO if the log contains <Ti start> but neither <Ti commit> nor <Ti abort>. For example, given the log <T0 start>, <T0, A, 950, 900>, <T0 commit>, <T1 start>, <T1, B, 2000, 2050> followed by a crash, T0 is redone (A is set to 900) and T1 is undone (B is restored to 2000).
The database can be modified using two approaches –
1. Deferred modification technique: if the transaction does not modify the database until it has partially committed, it is said to use the deferred modification technique.
2. Immediate modification technique: if database modifications occur while the transaction is still active, it is said to use the immediate modification technique.
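
A minimal sketch of the two operations on an update log record <Ti, Xj, V1, V2>, with the database represented as a plain dictionary for illustration:

def undo(record, db):
    """Undo: restore the old value V1 from the record <Ti, Xj, V1, V2>."""
    txn, item, old, new = record
    db[item] = old

def redo(record, db):
    """Redo: reapply the new value V2 from the record."""
    txn, item, old, new = record
    db[item] = new

db = {"X": 100}
rec = ("T1", "X", 100, 50)   # T1 wrote X: 100 -> 50
redo(rec, db); print(db)     # {'X': 50}
undo(rec, db); print(db)     # {'X': 100}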

(f) What are the two advantages of a B-Tree as an index ? Write the important features of B-
Tree of order N.
B-Tree is a self-balancing search tree. In most of the other self-balancing search trees
(like AVL and Red-Black Trees), it is assumed that everything is in the main memory.
To understand the use of B-Trees, we must think of the huge amount of data that cannot fit in
the main memory. When the number of keys is high, the data is read from the disk in the
form of blocks. Disk access time is very high compared to the main memory access time. The
main idea of using B-Trees is to reduce the number of disk accesses. Most of the tree
operations (search, insert, delete, max, min, etc.) require O(h) disk accesses, where h is the height of the tree. The B-Tree is a "fat" tree: its height is kept low by putting the
maximum possible keys in a B-Tree node. Generally, the B-Tree node size is kept equal to the
disk block size. Since the height of the B-tree is low so total disk accesses for most of the
operations are reduced significantly compared to balanced Binary Search Trees like AVL Tree,
Red-Black Tree, etc.
Properties of B-Tree:
• All leaves are at the same level.
• B-Tree is defined by the term minimum degree ‘t‘. The value of ‘t‘ depends upon
disk block size.
• Every node except the root must contain at least t-1 keys. The root may contain a
minimum of 1 key.
• All nodes (including root) may contain at most (2*t – 1) keys.
• Number of children of a node is equal to the number of keys in it plus 1.
• All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all keys in the range from k1 to k2.
• The B-Tree grows and shrinks from the root, unlike a Binary Search Tree, which grows and shrinks downward.
• Like other balanced Binary Search Trees, the time complexity to search, insert and
delete is O(log n).
• Insertion of a Node in B-Tree happens only at Leaf Node.
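
A small worked calculation of these properties in Python, using the standard bound that a B-Tree of minimum degree t with n keys has height h ≤ log_t((n + 1) / 2):

import math

def btree_bounds(t, n):
    """Max keys per node (2t - 1) and the height bound log_t((n + 1) / 2)."""
    max_keys_per_node = 2 * t - 1
    height_bound = math.log((n + 1) / 2, t)
    return max_keys_per_node, height_bound

print(btree_bounds(t=100, n=1_000_000))
# (199, ~2.85): a million keys fit within about three levels of disk access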

(c) Define 2NF. The following are the functional dependencies in a relation:
(order_no, item_code) is the primary key; item_code → price/unit; order_no → order_date.
Is this relation in 2NF? Justify. In case the relation is not in 2NF, convert it to 2NF.
Second Normal Form (2NF):
Second Normal Form (2NF) is based on the concept of full functional dependency. Second
Normal Form applies to relations with composite keys, that is, relations with a primary key
composed of two or more attributes. A relation with a single-attribute primary key is
automatically in at least 2NF. A relation that is not in 2NF may suffer from the update
anomalies.
To be in second normal form, a relation must be in first normal form and relation must not
contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no
non-prime attribute (attributes which are not part of any candidate key) is dependent on any
proper subset of any candidate key of the table.
In other words,
A relation that is in First Normal Form and every non-primary-key attribute is fully
functionally dependent on the primary key, then the relation is in Second Normal Form (2NF).
Note – If the proper subset of candidate key determines non-prime attribute, it is
called partial dependency.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with a copy of their determinant.
Here the relation is not in 2NF: price/unit depends only on item_code, and order_date depends only on order_no, so both are partial dependencies on the composite key (order_no, item_code). To convert it to 2NF, decompose it into ORDER(order_no, order_date), ITEM(item_code, price/unit) and ORDER_ITEM(order_no, item_code), as sketched below.
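
A minimal sketch of this 2NF decomposition as a schema, in Python with sqlite3 (column types are assumptions for illustration):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Original relation: composite key (order_no, item_code) with
    -- partial dependencies item_code -> price_per_unit, order_no -> order_date.
    CREATE TABLE orders_raw (order_no INT, item_code INT,
                             order_date TEXT, price_per_unit REAL);
    -- 2NF decomposition: each non-key attribute now depends on a whole key.
    CREATE TABLE orders     (order_no INT PRIMARY KEY, order_date TEXT);
    CREATE TABLE item       (item_code INT PRIMARY KEY, price_per_unit REAL);
    CREATE TABLE order_item (order_no INT REFERENCES orders(order_no),
                             item_code INT REFERENCES item(item_code),
                             PRIMARY KEY (order_no, item_code));
""")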

2. (a) Define a view. Explain with the help of an example. Also specify the conditions that a view
must meet in order to allow updates.
Views in SQL are a kind of virtual table. A view has rows and columns just like a real table in the database. We can create a view by selecting fields from one or more tables present in the database. A view can either contain all the rows of a table or only specific rows selected by a condition. Below we cover creating and querying views.
We can create View using CREATE VIEW statement. A View can be created from a single table
or multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;

view_name: Name for the View


table_name: Name of the table
condition: Condition to select rows
Examples:
• Creating View from a single table:
• In this example we will create a View named DetailsView from the table
StudentDetails. Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM StudentDetails
WHERE S_ID < 5;
• To see the data in the View, we can query the view in the same manner as we
query a table.
SELECT * FROM DetailsView;
• In this example, we will create a view named StudentNames from the table
StudentDetails. Query:
CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;
If we now query the view with SELECT * FROM StudentNames; we get the student IDs and names, ordered by name. As for updates: a view allows INSERT, UPDATE and DELETE only under the conditions listed in question 1(c) above – in particular, it must be based on a single table and must not use GROUP BY, ORDER BY, DISTINCT, group functions or nested queries, and it should include the NOT NULL columns of the base table.

5. (a) Describe the following client-server architecture with the help of a diagram : 2×5=10
(i) 2-tier - A 2-tier architecture in DBMS is a database architecture where the presentation layer runs on a client (PC, mobile, tablet, etc.) and the data is stored on a server, called the second tier. Two-tier architecture provides added security to the DBMS, as it is not exposed to the end user directly, and it provides direct and faster communication. In a 2-tier client-server architecture, a single database server is connected directly to multiple clients (clients 1, 2 and 3 in the original diagram).

Two Tier Architecture Example:

A contact management system created using MS-Access.

(ii) 3-tier -


A 3 Tier Architecture in DBMS is the most popular client server architecture in DBMS in which
the development and maintenance of functional processes, logic, data access, data storage, and
user interface is done independently as separate modules. Three Tier architecture contains a
presentation layer, an application layer, and a database server.

3-tier database architecture design is an extension of the 2-tier client-server architecture. A 3-tier architecture has the following layers:

1. Presentation layer (your PC, Tablet, Mobile, etc.)


2. Application layer (server)
3. Database Server

(b) Explain the following concepts with the help of suitable example : 2×5=10
(i) Lossless decomposition - A lossless-join decomposition of a relation R into relations R1 and R2 is one where performing the natural join of R1 and R2 returns exactly the original relation R. This is effective in removing redundancy from databases while preserving the original data.
In other words, with a lossless decomposition it becomes feasible to reconstruct the relation R from the decomposed tables R1 and R2 by using joins.
In a lossless decomposition, we select a common attribute, and the criterion for selecting it is that the common attribute must be a candidate key or super key of R1, of R2, or of both.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one of the following functional dependencies is in F+ (the closure of the set of functional dependencies): R1 ∩ R2 → R1, or R1 ∩ R2 → R2.

(b) In terms of DBMS, what is a transaction ? Discuss ACID properties of transaction.


A transaction is a single logical unit of work that accesses and possibly modifies the contents
of a database. Transactions access data using read and write operations.
In order to maintain consistency in a database before and after a transaction, certain properties are followed. These are called the ACID properties.
Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at
all. There is no midway i.e. transactions do not occur partially. Each transaction is considered
as one unit and either runs to completion or is not executed at all. It involves the following
two operations.
—Abort: If a transaction aborts, changes made to the database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider, for example, a transaction T that transfers an amount from account X to account Y: its first part T1 deducts the amount from X (ending with write(X)) and its second part T2 adds it to Y (ending with write(Y)). If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.

Consistency:

This means that integrity constraints must be maintained, so that the database is consistent before and after the transaction. It refers to the correctness of a database. Referring to the transfer example above, the total amount in X and Y before and after the transaction must be the same.

Isolation:

This ensures that multiple transactions can occur concurrently without leading to an inconsistent database state. Changes made by one transaction are not visible to the others until it commits, and the net effect of concurrent execution must be the same as that of some serial order of the transactions.

Durability:

This ensures that once a transaction has committed, its updates are written to non-volatile storage and persist even if a system failure occurs afterwards.
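A minimal sketch of atomicity in SQL, assuming a hypothetical account table (BEGIN is START TRANSACTION in some DBMSs): either both updates take effect at COMMIT, or a ROLLBACK leaves the database untouched.

-- Hypothetical schema
CREATE TABLE account (
  acc_no  CHAR(10) PRIMARY KEY,
  balance DECIMAL(12,2) NOT NULL CHECK (balance >= 0)
);

-- Transfer 2000 from account X to account Y as one atomic unit
BEGIN;
UPDATE account SET balance = balance - 2000 WHERE acc_no = 'X';
UPDATE account SET balance = balance + 2000 WHERE acc_no = 'Y';
COMMIT;   -- makes both changes visible and durable together
-- On any failure in between, ROLLBACK restores the pre-transaction state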

(c) Compare and contrast serial file organization technique with indexed sequential technique
in terms of storage, access and other features.
Comparison of sequential, indexed and relative file organizations:

Access: Sequential files can be accessed only sequentially. Indexed files can be accessed sequentially as well as randomly with the help of the record key. Relative files can be accessed sequentially as well as randomly with the help of their relative record number.

Storage: In sequential files the records are stored sequentially. In indexed files the records are stored based on the value of the RECORD-KEY, which is part of the data. In relative files the records are stored by their relative address.

Insertion and deletion: In sequential files records cannot be deleted and can only be stored at the end of the file. In indexed files it is possible to store records in the middle of the file. In relative files records can be inserted at any given position.

Space: Sequential files occupy less space, as the records are stored in continuous order. Indexed files and relative files occupy more space.

Access speed: Sequential files provide slow access, since in order to reach any record all the previous records must be read first. Indexed files also provide slow access (though faster than sequential access), as it takes time to search the index. Relative files provide the fastest access of the three, since the record is located directly by its key.

Read/write order: In sequential file organization the records are read and written in sequential order. In indexed file organization the records are written in sequential order but can be read in sequential as well as random order. In relative file organization the records can be written and read in sequential as well as random order.

Keys: In sequential files there is no need to declare any KEY for storing and accessing the records. In indexed files one or more KEYS can be created for storing and accessing the records. In relative files only one unique KEY is declared for storing and accessing the records.

(d) Define a JOIN operation, How is this different from Cartesian product in relational algebra ?
Explain with the help of an example.
Join operation combines the relation R1 and R2 with respect to a condition. It is denoted by ⋈.
The different types of join operation are as follows −
• Theta join
• Natural join
• Outer join − It is further classified into following types −
o Left outer join.
o Right outer join.
o Full outer join.

Theta join

If we join R1 and R2 on a condition using any comparison operator (=, <, >, <=, >=, ≠), it is called a theta join; when the operator is anything other than equality, it is also called a non-equi join.
Natural join
If we join R1 and R2 on an equality condition over their common attributes (keeping only one copy of those attributes), it is called a natural join; a join on equality alone is an equi join. Generally, "join" on its own refers to the natural join.
The natural join of R1 and R2 on regno is:
{ we select those tuples from the cartesian product where R1.regno = R2.regno }
On applying CARTESIAN PRODUCT to two relations, that is to two sets of tuples, every tuple is taken one by one from the left relation and paired up with all the tuples in the right relation.
So, the CROSS PRODUCT of a relation A(R1, R2, R3, …, Rp) with degree p and a relation B(S1, S2, S3, …, Sn) with degree n is a relation C(R1, R2, R3, …, Rp, S1, S2, S3, …, Sn) with degree p + n attributes; if A has m tuples and B has k tuples, C has m × k tuples.
CROSS PRODUCT is a binary set operation, meaning we apply it to two relations at a time. The two relations need not have the same type of tuples; union compatibility (or type compatibility) of the two relations is not necessary. The JOIN, by contrast, keeps only those pairs of tuples that satisfy the join condition: it is effectively a selection applied on top of the Cartesian product.
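A small sketch contrasting the two operations, using hypothetical student and dept tables: CROSS JOIN pairs every student with every department, while the join keeps only the matching pairs.

CREATE TABLE student (regno INT, sname VARCHAR(30), dept_no INT);
CREATE TABLE dept    (dept_no INT, dname VARCHAR(30));

-- Cartesian product: if student has 4 rows and dept has 3, this returns 12 rows
SELECT * FROM student CROSS JOIN dept;

-- (Equi) join: only the pairs where the condition holds, typically far fewer rows
SELECT s.regno, s.sname, d.dname
FROM student s JOIN dept d ON s.dept_no = d.dept_no;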

3. (a) What is Normalization ? Compare the Synthesis and Decomposition approach of


Normalization. Discuss Lossless decomposition and Dependency preserving decomposition,
with a suitable example of each. 10
Normalization is the process of organizing the attributes of a database into relations so as to minimize data redundancy and avoid insertion, deletion and update anomalies; it is guided by functional dependencies and keys. There are two broad approaches: the decomposition approach starts from a large (universal) relation and repeatedly splits it wherever a normal-form violation is found (top-down), while the synthesis approach starts from a minimal cover of the functional dependencies and builds a set of 3NF relations directly from them (bottom-up).
Decomposition of a relation is done when a relation in the relational model is not in an appropriate normal form. A relation R is decomposed into two or more relations only if the decomposition is lossless join as well as dependency preserving.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
• Decomposition is lossy if R1 ⋈ R2 ⊃ R
• Decomposition is lossless if R1 ⋈ R2 = R
1. Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of
R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. Intersection of attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. Common attribute must be a key for at least one relation (R1 or R2)
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A-
>BC is given.
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, All dependencies of R either must be a
part of R1 or R2 or must be derivable from combination of FD’s of R1 and R2.
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of R1(ABC).
GATE Question: Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D.
Then the decomposition of R into R1(AB) and R2(CD) is [GATE-CS-2001]
A. dependency preserving and lossless join
B. lossless join but not dependency preserving
C. dependency preserving but not lossless join
D. not dependency preserving and not lossless join
Answer: For a lossless join decomposition, the three conditions above must hold true:
1. Att(R1) U Att(R2) = ABCD = Att(R), which holds.
2. Att(R1) ∩ Att(R2) = Φ, which violates the condition of lossless join decomposition.
Hence the decomposition is not lossless. It is, however, dependency preserving, since A->B is contained in R1(AB) and C->D in R2(CD). The correct option is therefore C.

(b) What are Checkpoints ? How does this technique of Checkpoints contribute to database
recovery ?
A checkpoint declares a point before which the DBMS was in a consistent state and all transactions had committed. Such checkpoints are recorded in the transaction log during execution. On reaching a checkpoint, all updates described by the log are flushed to the database, and the old log records can be discarded; a new log then records the subsequent operations of the transactions, up to the next checkpoint, and the process continues.
Suppose transactions T1, T2, T3 and T4 were active around the last checkpoint when a crash occurred. Recovery then proceeds as follows:
• The recovery system reads the logs backward from the end to the last checkpoint, i.e. from T4 to T1.
• It keeps track of two lists: Undo and Redo.
• Whenever there is a log with both <Tn, start> and <Tn, commit>, or with only <Tn, commit>, that transaction is put in the Redo list. Here T2 and T3 contain <Tn, start> and <Tn, commit>, whereas T1 (which started before the checkpoint) has only <Tn, commit>; so T1, T2 and T3 are in the Redo list.
• Whenever a log record has no commit or abort instruction, that transaction is put in the Undo list. Here T4 has <Tn, start> but no <Tn, commit>, as it was still ongoing, so T4 is put in the Undo list.
All the transactions in the redo-list are deleted with their previous logs and then redone
before saving their logs. All the transactions in the undo-list are undone and their logs are
deleted.

4. Differentiate between the following (give example for each) : 4×5=20


(c) DBMS vs. File Based Systems
Structure: The file system is software that manages and organizes the files on a storage medium within a computer; a DBMS is software for managing the database.

Data redundancy: Redundant data can be present in a file system; a DBMS controls redundancy, so redundant data is avoided.

Backup and recovery: A file system doesn't provide backup and recovery of data if it is lost; a DBMS provides backup and recovery of data even if it is lost.

Query processing: There is no efficient query processing in a file system; efficient query processing is there in a DBMS.

Consistency: There is less data consistency in a file system; there is more data consistency in a DBMS because of the process of normalization.

Complexity: A file system is less complex; a DBMS has more complexity in handling as compared to the file system.

Security constraints: File systems provide less security in comparison to a DBMS; a DBMS has more security mechanisms.

Cost: A file system is less expensive; a DBMS has a comparatively higher cost than a file system.
(d) 2-Phase Locking vs. 2-Phase Commit - A transaction is said to follow the Two-Phase Locking (2PL) protocol if all its locking and unlocking operations occur in two phases:
1. Growing phase: new locks on data items may be acquired, but none can be released.
2. Shrinking phase: existing locks may be released, but no new locks can be acquired.
The following is just a skeleton schedule that shows how unlocking and locking work with 2PL (the step numbers refer to the lock/unlock operations of each transaction):
Transaction T1:
• The growing phase is from steps 1-3.
• The shrinking phase is from steps 5-7.
• Lock point at step 3.
Transaction T2:
• The growing phase is from steps 2-6.
• The shrinking phase is from steps 8-9.
• Lock point at step 6.
Two-Phase Commit (2PC), in contrast, is not a locking protocol at all: it is an atomic commitment protocol used in distributed databases. In its first (prepare/voting) phase a coordinator asks every participating site whether it can commit; in its second (commit) phase the coordinator tells all sites to commit only if every site voted yes, and to abort otherwise. Thus 2PL ensures serializability of concurrent transactions, while 2PC ensures that a distributed transaction either commits at all sites or at none.
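A rough sketch of 2PL behaviour in SQL, reusing the hypothetical account table from above (exact locking semantics are DBMS-specific): SELECT ... FOR UPDATE acquires row locks during the growing phase, and COMMIT releases them all at once, which corresponds to strict 2PL.

-- Session 1 (transaction T1)
BEGIN;
SELECT balance FROM account WHERE acc_no = 'X' FOR UPDATE;  -- growing: lock X
SELECT balance FROM account WHERE acc_no = 'Y' FOR UPDATE;  -- growing: lock Y (lock point)
UPDATE account SET balance = balance - 2000 WHERE acc_no = 'X';
UPDATE account SET balance = balance + 2000 WHERE acc_no = 'Y';
COMMIT;  -- shrinking: all locks released together

-- Session 2 (transaction T2) blocks on the locked rows until T1 commits:
-- SELECT balance FROM account WHERE acc_no = 'X' FOR UPDATE;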

5. Write short notes on the following : 4×5=20


(a) Concurrent Transactions - Concurrency control deals with the interleaved execution of more than one transaction, and with deciding whether a given schedule of interleaved operations is serializable, i.e. equivalent to some serial execution of the transactions.
What is Transaction?
A set of logically related operations is known as a transaction. The main operations of a
transaction are:
Read(A): The read operation Read(A) or R(A) reads the value of A from the database and stores it in a buffer in main memory.
Write(A): The write operation Write(A) or W(A) writes the value back to the database from the buffer.
(Note: a write does not always reach the database immediately; it first records the change in the buffer. This is why dirty reads can occur.)

(b) Hashed File Organization - Hash file organization computes a hash function on some field(s) of each record. The hash function's output determines the location of the disk block where the record is to be placed. When a record has to be retrieved using the hash key columns, the address is generated and the whole record is fetched from that address. In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is inserted directly there; the same process applies to delete and update. With this method there is no effort spent searching and sorting the entire file, and the records end up scattered (apparently randomly) over the storage.
(c) DML Commands used in DBMS (any four) - The SQL commands that deal with the manipulation of data present in the database belong to DML, or Data Manipulation Language, and this includes most SQL statements. DML is the component of SQL that works on the data stored in the database (in some classifications, DCL statements are grouped together with DML statements).
List of DML commands:
• INSERT: used to insert data into a table.
• UPDATE: used to update existing data within a table.
• DELETE: used to delete records from a database table.
• LOCK: used to control concurrency on a table.
• CALL: used to call a PL/SQL or Java subprogram.
• EXPLAIN PLAN: describes the access path to the data.
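A short sketch of the first four commands against the StudentDetails table used earlier in this document (the LOCK TABLE syntax shown is Oracle-style and varies by DBMS):

INSERT INTO StudentDetails (S_ID, NAME, ADDRESS)
VALUES (6, 'Meena', 'Chennai');

UPDATE StudentDetails
SET ADDRESS = 'Mumbai'
WHERE S_ID = 6;

DELETE FROM StudentDetails
WHERE S_ID = 6;

-- Oracle-style explicit table lock for concurrency control
LOCK TABLE StudentDetails IN EXCLUSIVE MODE;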

(d) Functional Dependency - A functional dependency (FD) is a relationship that exists between two sets of attributes. It typically exists between the primary key and the non-key attributes within a table. An FD is written as

X → Y

The left side of the FD is known as the determinant, and the right side is known as the dependent. The FDs of a relation can be reasoned about using Armstrong's inference rules:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule


For example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid, by the augmentation rule.
For example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
For example, roll_no → dept_name & dept_name → dept_building, then roll_no →
dept_building is also valid.

(b) Verify the statement, "Any relation in BCNF is in 3NF but converse is not true." Give suitable
example. 5
A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key (this condition alone is what BCNF demands), or
2. Y is a prime attribute (each element of Y is part of some candidate key).
A relation is in BCNF iff X is a superkey for every functional dependency (FD) X → Y in the given relation.
Therefore, the BCNF relations are a subset of the 3NF relations: every BCNF relation is in 3NF, but the converse need not be true.
Example: consider R(student, subject, teacher) with FDs {student, subject} → teacher and teacher → subject. Its candidate keys are {student, subject} and {student, teacher}. For teacher → subject the determinant teacher is not a superkey, so R is not in BCNF; but subject is a prime attribute, so the 3NF condition is satisfied. Hence R is in 3NF but not in BCNF.

(c) Explain the term data replication and data fragmentation with suitable example. 5
Data Replication is the process of storing data in more than one site or node. It is useful
in improving the availability of data. It is simply copying data from a database from one server
to another server so that all the users can share the same data without any inconsistency. The
result is a distributed database in which users can access data relevant to their tasks without
interfering with the work of others.
Data replication encompasses duplication of transactions on an ongoing basis, so that the replica is in a consistently updated state and synchronized with the source.
Data fragmentation, by contrast, divides a relation into smaller fragments that are stored at different sites, with each fragment residing at only one location. In horizontal fragmentation each fragment is a subset of the tuples (for example, the customers of the northern region stored at the northern site); in vertical fragmentation each fragment contains a subset of the attributes, always including the key, so that the relation can be reconstructed by a join.
In replication there can be full replication, in which the whole database is stored at every site. There can also be partial replication, in which some frequently used fragments of the database are replicated and others are not replicated.
Types of Data Replication
1. Transactional Replication – In Transactional replication users receive full initial
copies of the database and then receive updates as data changes. Data is copied in
real time from the publisher to the receiving database(subscriber) in the same
order as they occur with the publisher therefore in this type of
replication, transactional consistency is guaranteed. Transactional replication is
typically used in server-to-server environments. It does not simply copy the data
changes, but rather consistently and accurately replicates each change.
2. Snapshot Replication – Snapshot replication distributes data exactly as it appears at a specific moment in time and does not monitor for updates to the data. The entire snapshot is generated and sent to users. Snapshot replication is generally used when data changes are infrequent. It is a bit slower than transactional replication, because on each attempt it moves multiple records from one end to the other. Snapshot replication is a good way to perform the initial synchronization between the publisher and the subscriber.
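A minimal sketch of horizontal fragmentation, assuming a hypothetical customer table with a region column: each site stores only its own region's rows, and the full relation can be reconstructed with a union.

-- Fragment stored at the northern site
CREATE TABLE customer_north AS
SELECT * FROM customer WHERE region = 'North';

-- Fragment stored at the southern site
CREATE TABLE customer_south AS
SELECT * FROM customer WHERE region = 'South';

-- Reconstruction of the original relation
SELECT * FROM customer_north
UNION ALL
SELECT * FROM customer_south;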

(d) What are integrity constraints ? Explain the various types of integrity constraints with
suitable examples.
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have
to be performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The
value of the attribute must be available in the corresponding domain.

Entity integrity constraints


o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and if
the primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.

Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key
of Table 2, then every value of the Foreign Key in Table 1 must be null or be available in
Table 2.

Key constraints
o Keys are the attribute(s) used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but one of them is chosen as the primary key. In a relational table the primary key must be unique and must not contain a null value.
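A compact sketch showing the four kinds of constraints on hypothetical tables: domain constraints via data types and CHECK, entity integrity via PRIMARY KEY, referential integrity via FOREIGN KEY, and key constraints via UNIQUE.

CREATE TABLE department (
  d_no   INT PRIMARY KEY,                   -- entity integrity: unique, not null
  d_name VARCHAR(30) NOT NULL UNIQUE        -- key constraint: alternate key
);

CREATE TABLE employee (
  emp_id INT PRIMARY KEY,
  age    INT CHECK (age BETWEEN 18 AND 65), -- domain constraint on valid values
  d_no   INT REFERENCES department(d_no)    -- referential integrity: must match
);                                          -- an existing department, or be NULL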

(e) How do you implement a hierarchical data model ? Explain through an illustration. 5
In a hierarchical data model, records are organized as a tree: every record has exactly one parent (except the root), and a parent can have many children. The model is implemented by storing each child record with a link (pointer) to its parent, so traversal always proceeds from the root downwards; a one-to-many relationship such as Department → Employees is represented directly by this parent-child structure.
Applications of the hierarchical model:
• Hierarchical models are generally used as semantic models in practice, as many real-world occurrences of events are hierarchical in nature, like biological, political, or social structures.
• Hierarchical models are also commonly used as physical models because of the inherently hierarchical structure of disk storage (tracks, cylinders, etc.). There are various examples, such as Information Management System (IMS) by IBM and NOMAD by NCSS.
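As an illustration, one common way to emulate a hierarchy in a relational DBMS (the table here is hypothetical) is a self-referencing parent pointer, where each row stores the key of its parent:

CREATE TABLE org_unit (
  unit_id   INT PRIMARY KEY,
  unit_name VARCHAR(40) NOT NULL,
  parent_id INT REFERENCES org_unit(unit_id)  -- NULL only for the root
);

INSERT INTO org_unit VALUES (1, 'Company',    NULL);
INSERT INTO org_unit VALUES (2, 'Sales',      1);
INSERT INTO org_unit VALUES (3, 'North Zone', 2);

-- The children of a given node are found by following the parent pointer
SELECT unit_name FROM org_unit WHERE parent_id = 1;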

(f) Define Data Manipulation Language (DML) of SQL. List and explain various DML commands.

DML is an abbreviation of Data Manipulation Language.

The DML commands in Structured Query Language change the data present in the SQL database.
We can easily access, store, modify, update and delete the existing records from the database
using DML commands.

Following are the four main DML commands in SQL:

1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command

SELECT DML Command


SELECT is the most important data manipulation command in Structured Query Language. The
SELECT command shows the records of the specified table. It also shows the particular record of
a particular column by using the WHERE clause.
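A small sketch of SELECT against the StudentDetails table used earlier in this document:

-- All records of the table
SELECT * FROM StudentDetails;

-- Particular columns of particular rows, using the WHERE clause
SELECT NAME, ADDRESS
FROM StudentDetails
WHERE S_ID < 5;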


(g) How do B-tree indexes differ from Binary search tree indexes ?
B-Tree: A B-tree is a self-balancing search tree: its height is adjusted automatically on each update, and an inorder traversal of its nodes yields the keys in sorted order. Unlike a binary tree, a node in a B-tree can have more than two children. A B-tree has a height of about log_M N (where M is the order of the tree and N is the number of keys). Within each node, data is sorted in a specific order, with the lowest value on the left and the highest value on the right. Inserting a key into a B-tree is more complicated than into a binary tree.
There are some conditions that must hold for a B-tree:
• All the leaf nodes of the B-tree must be at the same level.
• Above the leaf nodes of the B-tree there should be no empty subtrees.
• The B-tree's height should be kept as low as possible.
Binary search tree: A binary search tree is a special type of general tree. Unlike a B-tree, a node in a binary tree can have at most two children, i.e. its degree is limited to two. The topmost node of a binary tree is called the root node, and below it there are mainly two subtrees, a left subtree and a right subtree. Unlike the general tree, a binary tree can be empty. Like a B-tree, a binary search tree yields sorted order under inorder traversal, but it can also be traversed in preorder as well as postorder. Data insertion in a binary tree is less complicated than in a B-tree. However, a binary search tree can degenerate into a tall, unbalanced chain and stores only two children per node, so B-trees, with their high fan-out and uniform leaf depth, suit disk-based indexes far better.
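For this reason most DBMSs implement CREATE INDEX with a B-tree (usually B+-tree) structure. A sketch against the hypothetical StudentDetails table (the explicit USING clause is PostgreSQL-specific):

-- The default index structure in most DBMSs is a B-tree variant
CREATE INDEX idx_student_name ON StudentDetails (NAME);

-- PostgreSQL allows the access method to be named explicitly:
-- CREATE INDEX idx_student_name ON StudentDetails USING btree (NAME);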

(h) Differentiate between the concepts of Logical data independence and Physical data independence in DBMS.
Physical data independence:
• It is mainly concerned with how the data is stored in the system.
• Retrieval is easy.
• Compared to logical independence, physical data independence is easy to achieve.
• A change at the physical level does not require a change at the application level.
• Modifications made at the internal level may or may not be needed to improve the performance of the structure.
• It is concerned with the internal schema.
• Example: a change in compression techniques, hashing algorithms, or storage devices.

Logical data independence:
• It is mainly concerned with the structure, i.e. changing data definitions.
• Retrieval is difficult, because the data is mainly dependent on the logical structure of the data.
• Compared to physical independence, logical data independence is not easy to achieve.
• A change at the logical level requires a change at the application level.
• Modifications made at the logical level are significant whenever the logical structure of the database is to be changed.
• It is concerned with the conceptual schema.
• Example: adding, modifying or deleting an attribute.

3. (a) What do you understand by the term Query Optimization ? Discuss the role of relational
algebra in Query Optimization. List the operators used in relational algebra and discuss the
operation of each, with suitable example.

Query: A query is a request for information from a database.


Query Plans: A query plan (or query execution plan) is an ordered set of steps used to access
data in a SQL relational database management system.
Query Optimization: A single query can be executed through different algorithms or re-
written in different forms and structures. Hence, the question of query optimization comes
into the picture – Which of these forms or pathways is the most optimal? The query optimizer
attempts to determine the most efficient way to execute a given query by considering the
possible query plans.
Importance: The goal of query optimization is to reduce the system resources required to
fulfill a query, and ultimately provide the user with the correct result set faster.
Relational algebra's role: analyzing and transforming equivalent relational expressions.
Here we talk about generating minimal equivalent expressions. To analyze equivalent expressions, a set of equivalence rules is listed below. These generate expressions equivalent to a given query written in relational algebra. To optimize a query, we may rewrite it into any equivalent form permitted by the equivalence rules and then choose the cheapest plan among the equivalent forms.

1. Conjunctive selection operations can be written as a sequence of individual selections. This is called a sigma-cascade:

σθ1∧θ2(E) = σθ1(σθ2(E))

Explanation: Applying a conjunctive condition θ1 ∧ θ2 in one pass is expensive. Instead, filter out the tuples satisfying θ2 (the inner selection) and then apply θ1 (the outer selection) to the resulting, fewer tuples. This leaves fewer tuples to process the second time, and it can be extended to three or more conjoined conditions. Since we are breaking a single condition into a series of selections, it is called a "cascade".

2. Selection is commutative:

σθ1(σθ2(E)) = σθ2(σθ1(E))

Explanation: Selection is commutative in nature, meaning it does not matter whether we apply σθ1 first or σθ2 first. In practice, it is better and more optimal to first apply the selection that yields the fewer tuples; this saves time on the outer selection.

3. In a series of projections, all but the outermost projection can be omitted; only the first (outermost) projection is required. This is called a pi-cascade:

ΠL1(ΠL2(… ΠLn(E) …)) = ΠL1(E), where L1 ⊆ L2 ⊆ … ⊆ Ln

Explanation: A cascade or series of projections is redundant, because in the end we only keep those columns which are specified in the last, or outermost, projection. Hence it is better to collapse all the projections into just one, i.e. the outermost projection.

The relational algebra operators that such rules rearrange are:
• Selection (σ): selects the tuples that satisfy a given condition, e.g. σ salary>30000 (Employee).
• Projection (Π): selects the stated columns, e.g. Π name,salary (Employee).
• Union (∪): tuples appearing in either of two union-compatible relations.
• Set difference (−): tuples in the first relation but not in the second.
• Cartesian product (×): every pairing of tuples from the two relations.
• Rename (ρ): gives a relation or its attributes new names.
• Join (⋈): a derived operator equivalent to a selection applied over a Cartesian product.
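To see which plan the optimizer actually picks, most DBMSs expose the query plan; the syntax varies (EXPLAIN in PostgreSQL/MySQL, EXPLAIN PLAN FOR in Oracle). A sketch against a hypothetical Employee table:

-- Show the optimizer's chosen plan without running the query
EXPLAIN
SELECT name, salary
FROM Employee
WHERE dept_no = 10 AND salary > 30000;
-- A good optimizer pushes both selections down to the table scan (or to an
-- index scan on dept_no) before doing any join or projection work.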
(c) What do you understand by the terms Lossless decomposition and Dependency Preserving
decomposition ? Is it always true that a lossless decomposition is dependency preserving too ?
Give suitable example in support of your answer.
Decomposition of a relation is done when a relation in relational model is not in appropriate
normal form. Relation R is decomposed into two or more relations if decomposition is lossless
join as well as dependency preserving.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
• Decomposition is lossy if R1 ⋈ R2 ⊃ R
• Decomposition is lossless if R1 ⋈ R2 = R
1. Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of
R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. Intersection of attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. Common attribute must be a key for at least one relation (R1 or R2)
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A-
>BC is given.
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, All dependencies of R either must be a
part of R1 or R2 or must be derivable from combination of FD’s of R1 and R2.
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of R1(ABC).
GATE Question: Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D.
Then the decomposition of R into R1(AB) and R2(CD) is [GATE-CS-2001]
A. dependency preserving and lossless join
B. lossless join but not dependency preserving
C. dependency preserving but not lossless join
D. not dependency preserving and not lossless join
Answer: For a lossless join decomposition, the three conditions above must hold true:
1. Att(R1) U Att(R2) = ABCD = Att(R), which holds.
2. Att(R1) ∩ Att(R2) = Φ, which violates the condition of lossless join decomposition; hence the decomposition is not lossless, although it is dependency preserving (option C).
As to the question above: no, it is not always true that a lossless decomposition is dependency preserving. For example, R(A, B, C) with FDs {A->B, B->C} decomposed into R1(A, B) and R2(A, C) is lossless (the common attribute A is a key of both relations), but the dependency B->C is neither contained in R1 or R2 nor derivable from {A->B, A->C}, so the decomposition is not dependency preserving.
(e) Explain database recovery using a system log with the help of an example.
A log is a file that contains a sequence of records, each log record referring to a write operation. All the log records are recorded step by step in the log file; we can say that log files store the history of all update activities.
A log contains the start of each transaction, the transaction number, the record number, the old value, the new value, the end of the transaction, and so on. (A mini-statement from a bank ATM is an everyday analogue.)
If the system crashes within an ongoing transaction, then by using the log files we can return to the previous state, as if nothing had happened to the database.
The log is kept on disk so that it is not affected by failures other than disk failures and catastrophic failures.

Example

Different types of log records are as follows −


• <Ti, Xi, V1, V2> − update log record, where Ti = transaction, Xi = data item, V1 = old value, V2 = new value.
• <Ti, start> − Transaction Ti starts execution.
• <Ti, commit> − Transaction Ti is committed.
• <Ti, abort> − Transaction Ti is aborted
The log records can be written as follows −
Create a log for the given transactions T1 and T2 (T1 transfers 2000 from A to B, then T2 adds 5000 to A and 7000 to B; A starts at 5000, B at 8000):

T1: Read A; A = A - 2000; Write A; Read B; B = B + 2000; Write B
T2: Read A; A = A + 5000; Write A; Read B; B = B + 7000; Write B

Log:
<T1, start>
<T1, A, 5000, 3000>
<T1, B, 8000, 10000>
<T1, commit>
<T2, start>
<T2, A, 3000, 8000>
<T2, B, 10000, 17000>
<T2, commit>

Log based recovery techniques

Log based recovery uses one of the techniques −

Deferred database modification


It modifies the database after completion of transaction. The database modification is deferred
or delayed until the last operation of the transaction is executed. Update log records maintain
the new value of the data item.
The recovery system uses one operation, which is as follows −
• Redo(Ti) − All data items updated by the transaction Ti are set to a new value.

Immediate database modification

It modifies the database during the transaction itself: a database modification is made immediately when the transaction performs an update/write operation. Update log records maintain both the old and new values of data items.
The recovery system uses two operations, which are as follows −
• Undo(Ti) − All data items updated by the transaction Ti, are set to old value.
• Redo(Ti) − All data items updated by the transaction Ti are set to a new value.
