MCS023
This Company ER diagram illustrates key information about a company, covering entities such
as employee, department, project and dependent. It helps us understand the relationships
between these entities.
Entities and their Attributes are
• Employee Entity : Attributes of Employee Entity are Name, Id, Address, Gender,
Dob and Doj.
Id is Primary Key for Employee Entity.
• Department Entity : Attributes of Department Entity are D_no, Name and Location.
D_no is Primary Key for Department Entity.
• Project Entity : Attributes of Project Entity are P_No, Name and Location.
P_No is Primary Key for Project Entity.
• Dependent Entity : Attributes of Dependent Entity are D_no, Gender and
Relationship.
(c) What is the role of views in DBMS ? Can we perform insert, delete or modify operations, if
the view contains a group function ? Justify.
Views in SQL are a kind of virtual table. A view has rows and columns just like a real table
in the database, and is created by selecting fields from one or more tables present in the
database. A view can contain either all the rows of a table or only specific rows that satisfy
a condition.
A view that contains a group (aggregate) function or a GROUP BY clause is not updatable: each
row of such a view summarizes many base rows, so the DBMS cannot translate an INSERT, DELETE
or UPDATE on the view into an unambiguous change to the base table.
We can create a view using the CREATE VIEW statement. A view can be created from a single table
or multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
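A minimal sketch of why this fails (table and column names are assumed for illustration):
-- The view below groups rows and aggregates them, so one view row
-- summarizes many employee rows; the DBMS cannot map a change on the
-- view back to specific base rows, and rejects it.
CREATE VIEW dept_salary AS
SELECT dept_no, SUM(salary) AS total_salary
FROM employee
GROUP BY dept_no;

-- Fails: there is no single employee row this insert could become.
INSERT INTO dept_salary VALUES (10, 50000);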
(e) What is the significance of checkpoints in DBMS ? Discuss the utility of checkpoints, with the
help of suitable example.
A checkpoint declares a point before which the DBMS was in a consistent state and all
transactions had been committed. During transaction execution, such checkpoints are recorded
in the transaction log.
When a checkpoint is reached, all updates recorded in the log up to that point are written to
the database, and the old log can be discarded. A new log is then created for the upcoming
operations of the transactions; it is written to until the next checkpoint, and the process
continues.
1. Write begin_checkpoint record into log.
2. Collect checkpoint data in the stable storage.
3. Write end_checkpoint record into log.
The behavior when the system crashes and recovers while concurrent transactions are
executed (suppose four transactions T1–T4 were active around the last checkpoint):
• The recovery system reads the logs backwards from the end to the last checkpoint,
i.e. from T4 to T1.
• It will keep track of two lists – Undo and Redo.
• Whenever a log contains both <Tn, Start> and <Tn, Commit>, or only <Tn, Commit>,
that transaction is put in the Redo list. T2 and T3 contain <Tn, Start> and
<Tn, Commit>, whereas T1 has only <Tn, Commit>; hence T1, T2 and T3 are in the
Redo list.
• Whenever a log record has no Commit or Abort instruction, that transaction is put
in the Undo list. Here, T4 has <Tn, Start> but no <Tn, Commit>, as it is an ongoing
transaction, so T4 is put in the Undo list.
All the transactions in the Redo list and their previous logs are removed, and the
transactions are then redone before their logs are saved. All the transactions in the Undo
list are undone and their logs are deleted.
Relevance of Checkpoints :
A checkpoint helps an RDBMS enforce the C (consistency) of the ACID properties, and it is
used for recovery if there is an unexpected shutdown of the database. Checkpoints run at
intervals and write all dirty pages (modified pages) from the buffer cache to the physical
data files on disk; this is also known as the hardening of dirty pages. In SQL Server, for
example, it is a dedicated process that runs automatically at specific intervals. The
checkpoint thus serves as the synchronization point between the database and the transaction
log.
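As a small illustration, SQL Server also exposes the checkpoint mechanism as a manual T-SQL
command (the automatic background process does the same work):
CHECKPOINT;     -- flush all dirty pages of the current database to disk
CHECKPOINT 10;  -- request that the checkpoint complete within about 10 seconds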
(g) Describe the utility of data replication in distributed DBMS. Briefly discuss the concept of
complete and selective replication.
Data Replication is the process of storing data in more than one site or node. It is useful
in improving the availability of data. It is simply copying data from a database from one server
to another server so that all the users can share the same data without any inconsistency. The
result is a distributed database in which users can access data relevant to their tasks without
interfering with the work of others.
Data replication encompasses duplication of transactions on an ongoing basis, so that
the replica stays in a consistently updated state, synchronized with the source. Fragmentation,
by contrast, also places data at different locations, but each fragment of a relation resides
at only one location.
There can be full (complete) replication, in which the whole database is stored at every site,
or partial (selective) replication, in which only some frequently used fragments of the
database are replicated and the rest are not.
1. Transactional Replication – In transactional replication, users receive a full
initial copy of the database and then receive updates as the data changes. Data is
copied in real time from the publisher to the receiving database (subscriber) in
the same order as the changes occur at the publisher; therefore, transactional
consistency is guaranteed. Transactional replication is typically used in
server-to-server environments. It does not simply copy the data changes, but
rather consistently and accurately replicates each change.
2. Snapshot Replication – Snapshot replication distributes data exactly as it appears
at a specific moment in time and does not monitor for subsequent updates. The
entire snapshot is generated and sent to the subscribers. Snapshot replication is
generally used when data changes are infrequent. It is a bit slower than
transactional replication because each run moves multiple records from one end to
the other. Snapshot replication is a good way to perform the initial
synchronization between the publisher and the subscriber.
3. Merge Replication – Data from two or more databases is combined into a single
database. Merge replication is the most complex type of replication because it
allows both publisher and subscriber to make changes to the database
independently. It is typically used in server-to-client environments and allows
changes to be sent from one publisher to multiple subscribers.
2. (a) Explain ANSI-SPARC 3 level architecture of DBMS. Discuss the languages associated at
different levels. What are the different types of data independence involved at different levels ?
The three-level architecture aims to separate each user’s view of the database from the way
the database is physically represented.
1. External level:
This level describes how users view the database: only the data of the database that
is relevant to a particular user is described at this level. The external level
consists of several different external views of the database, and each external view
includes only those entities, attributes and relationships that its user needs.
Different views may represent the same data differently; for example, one user may
view names in the form (firstname, lastname), while another may view them as
(lastname, firstname).
2. Conceptual level:
It is the community view of the database and describes what data is stored in the
database and represents the entities, their attributes, and their relationships. It
represents the semantic, security, and integrity information about the data. The
middle-level or the second-level in the three-level architecture is the conceptual
level. This level contains the logical structure of the entire database, it represents
the complete view of the database that the organization demands independent of
any storage consideration.
3. Internal level:
At the internal level, the database is represented physically on the computer. This
level emphasizes the physical implementation of the database, aiming at good
storage-space utilization, optimal runtime performance and data encryption. It
interfaces with the operating system to place the data in storage files, build the
storage space, retrieve the data, and so on.
The conceptual schema is defined with the DDL, the internal schema with the storage
definition facilities, and users manipulate data through external views using the DML/query
language. The mappings between the levels give two kinds of data independence: logical data
independence (the conceptual schema can change without changing the external schemas or
application programs) between the external and conceptual levels, and physical data
independence (the internal schema can change without changing the conceptual schema) between
the conceptual and internal levels.
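A rough sketch of how the three levels surface in SQL (object names are assumed; in practice
a single DDL statement may touch more than one level):
-- Conceptual level: the logical structure of the data
CREATE TABLE employee (
    id        INT PRIMARY KEY,
    firstname VARCHAR(30),
    lastname  VARCHAR(30),
    salary    DECIMAL(10,2)
);

-- Internal level: a physical access path, invisible to applications
CREATE INDEX idx_emp_name ON employee (lastname, firstname);

-- External level: one user's tailored view of the same data
CREATE VIEW emp_names AS
SELECT lastname, firstname FROM employee;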
(ii) Dependency Preserving Decomposition (give suitable examples in support of your discussion)
- Decomposition of a relation is done when a relation in the relational model is not in the
appropriate normal form. A relation R is decomposed into two or more relations only if the
decomposition is both lossless join and dependency preserving. A decomposition is dependency
preserving if every functional dependency of R can be checked using the decomposed relations
alone, i.e. the closure of the dependencies projected onto R1, R2, … equals the closure of the
dependencies of R.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
• Decomposition is lossy if R1 ⋈ R2 ⊃ R
• Decomposition is lossless if R1 ⋈ R2 = R
• Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R
must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
• Intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
• The common attribute must be a key for at least one relation (R1 or R2):
Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
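A small sketch in SQL, assuming a base table r(regno, name, dept) with regno as its key;
regno is the common attribute of the two fragments and a key of both, so the natural join
returns exactly R:
CREATE TABLE r1 AS SELECT regno, name FROM r;
CREATE TABLE r2 AS SELECT regno, dept FROM r;

-- Reconstruction: returns exactly the rows of r (lossless join)
SELECT regno, name, dept
FROM r1 NATURAL JOIN r2;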
(b) What is the need of indices in a database system ? Mention the categories of indices
available in a DBMS. Which data structure is suitable for creating indices and why ?
o Indexing is used to optimize the performance of a database by minimizing the number of
disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a database
table quickly.
Index structure:
o The first column of the database is the search key that contains a copy of the primary key
or candidate key of the table. The values of the primary key are stored in sorted order so
that the corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be found.
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known
as ordered indices.
Example: Suppose we have an employee table with thousands of records, each 10 bytes long,
whose IDs start with 1, 2, 3, … and we have to search for the employee with ID 543.
o In the case of a database with no index, we have to scan the disk blocks from the start
until we reach 543. The DBMS will find the record after reading 543*10 = 5430 bytes.
o In the case of an index (say, with 2-byte entries), we search the index instead, and the
DBMS will find the record after reading 542*2 = 1084 bytes, which is far less than in the
previous case.
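A minimal sketch (table and column names assumed): once the index exists, the lookup below is
served through the index instead of a full scan:
CREATE INDEX idx_employee_id ON employee (id);

SELECT * FROM employee WHERE id = 543;  -- located via the index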
4. Differentiate between the following :
(a) DBMS and File base systems
Basis: Data consistency
File System: Redundant copies of the same data are common, so data consistency is low.
DBMS: Redundancy is controlled, so data consistency is comparatively high.
The serial schedule is a type of schedule where one transaction is executed completely before
starting another transaction. In the serial schedule, when the first transaction completes its cycle,
then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has
no interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.
o In the given (a) figure, Schedule A shows the serial schedule where T1 followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 followed by T1.
2. Non-serial Schedule – In a non-serial schedule, the operations of multiple transactions
are interleaved; the schedule does not wait for one transaction to finish before starting
another.
(b) Clustering Indices - A Clustered index is one of the special types of index which reorders the
way records in the table are physically stored on the disk. It sorts and stores the data rows in the
table or view based on their key values. It is essentially a sorted copy of the data in the indexed
columns.
Sometimes we are asked to create an index on a non-unique key like dept-id in the below table.
There could be several employees in each department. Here, all employees belonging to the
same dept-id are considered to be within a single cluster, and the index pointers point to the
cluster as a whole.
(c) Locks and its Types - In a lock-based protocol, no transaction can read or write a data
item until it acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read
by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock cannot
update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o The lock is exclusive: multiple transactions cannot modify the same data item
simultaneously.
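A PostgreSQL-style sketch (table names assumed): the first transaction takes a shared table
lock, which allows other readers but blocks writers; the second takes exclusive row locks,
blocking both readers and writers of those rows:
BEGIN;
LOCK TABLE accounts IN SHARE MODE;   -- shared (read) lock
SELECT SUM(balance) FROM accounts;
COMMIT;

BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;   -- exclusive row lock
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;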
• Wait-Die Scheme –
In this scheme, If a transaction requests a resource that is locked by another
transaction, then the DBMS simply checks the timestamp of both transactions and
allows the older transaction to wait until the resource is available for execution.
Suppose there are two transactions T1 and T2, and let the timestamp of any transaction T be
TS(T). If T2 holds a lock on some resource and T1 requests that resource, the DBMS performs
the following actions:
It checks whether TS(T1) < TS(T2). If T1 is the older transaction and T2 holds the resource,
then T1 is allowed to wait until the resource is available. That is, if a younger transaction
has locked a resource and an older transaction is waiting for it, the older transaction is
allowed to wait until the resource is available.
If T1 is the older transaction and holds some resource, and T2 (the younger transaction) is
waiting for it, then T2 is killed and restarted later with a small delay, but with its
original timestamp. That is, if the older transaction holds a resource and the younger
transaction waits for it, the younger transaction is killed and restarted with the same
timestamp.
This scheme allows the older transaction to wait but kills the younger one.
(e) Advantages and Disadvantages Distributed DBMS - The distributed database management
system contains the data in multiple locations. That can be in different systems in the same
place or across different geographical locations.
The database is divided into multiple locations and stores the data in Site1, Site2,Site3 and Site4.
The advantages and disadvantages of Distributed database management systems are as follows
−
Advantages of DDBMS
• The database is easier to expand as it is already spread across multiple systems
and it is not too complicated to add a system.
• The distributed database can have the data arranged according to different levels
of transparency i.e data with different transparency levels can be stored at
different locations.
• The database can be stored according to departmental information in an
organisation; in that case, it is easier to support the organisation's hierarchy
of access.
• If there were a natural catastrophe such as a fire or an earthquake, all the data
would not be destroyed, because it is stored at different locations.
• It is cheaper to create a network of systems containing a part of the database.
This database can also be easily increased or decreased.
• Even if some of the data nodes go offline, the rest of the database can continue
its normal functions.
Disadvantages of DDBMS
• The distributed database is quite complex and it is difficult to make sure that a
user gets a uniform view of the database because it is spread across multiple
locations.
• This database is more expensive as it is complex and hence, difficult to maintain.
• It is difficult to provide security in a distributed database as the database needs
to be secured at all the locations it is stored. Moreover, the infrastructure
connecting all the nodes in a distributed database also needs to be secured.
(f) What is the difference between DBMS and RDBMS ? Under what situations is it better to use
File based System than Database System ?
DBMS: Data elements need to be accessed individually. It stores data in either a navigational
or hierarchical form.
RDBMS: Multiple data elements can be accessed at the same time. It uses a tabular structure
where the headers are the column names and the rows contain the corresponding values.
A file-based system is preferable to a database system when the data volume is small, the
application is simple and fixed, multi-user access, recovery and complex querying are not
required, and the cost and overhead of installing a DBMS are not justified.
(g) Explain database recovery using system log with the help of an example.
(h) Explain the following terms :
(i) Candidate key - A candidate key is a minimal super key: a super key from which no
attribute can be removed without losing the ability to identify tuples uniquely. To select
the candidate keys, we examine the set of super keys and keep those that contain no redundant
attribute.
(ii) Primary key – A primary key is a minimal set of attributes of a table whose task is to
uniquely identify the rows (tuples) of that table.
The primary key of a relation is the one candidate key that the database designer selects as
the principal means of identification. It may be chosen for convenience, performance and many
other reasons.
(iii) Foreign key - A foreign key is used to link two tables together via the primary key: a
column (or set of columns) of one table points to the primary key of another table. An
attribute that is a primary key in one table can therefore serve as a foreign key attribute
in another table. Note that a foreign key need not itself be a key of its own table.
(iv) Super key - The role of a super key is simply to identify the tuples of the specified
table in the database. It is a superset: every candidate key is part of some super key. All
those attribute sets in a table that are capable of identifying the other attributes of the
table in a unique manner are super keys.
(v) Alternate key - An alternate key is any candidate key of a table that is not selected as
the primary key. Like the primary key, it identifies rows uniquely (and is typically enforced
with a UNIQUE constraint), but it is not the designer's chosen identifier.
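These kinds of keys in one assumed schema sketch: emp_id and email are both candidate keys;
emp_id is chosen as the primary key, leaving email as an alternate key (enforced with UNIQUE);
dept_no is a foreign key referencing another table's primary key:
CREATE TABLE department (
    dept_no INT PRIMARY KEY,
    name    VARCHAR(30)
);

CREATE TABLE employee (
    emp_id  INT PRIMARY KEY,                    -- primary key
    email   VARCHAR(50) NOT NULL UNIQUE,        -- alternate (candidate) key
    dept_no INT REFERENCES department(dept_no)  -- foreign key
);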
(b) Explain 3NF. Discuss the Insert, Delete and Update anomalies associated with 3NF.
Third Normal Form (3NF):
A relation is in third normal form, if there is no transitive dependency for non-prime
attributes as well as it is in second normal form.
A relation is in 3NF if, in every non-trivial functional dependency X → Y, at least one of
the following conditions holds:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Equivalently, a relation that is in First and Second Normal Form and in which no
non-primary-key attribute is transitively dependent on the primary key is in Third Normal
Form (3NF).
Note – If A → B and B → C are two FDs, then A → C is called a transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
If a transitive dependency exists, we remove the transitively dependent attribute(s) from
the relation by placing them in a new relation along with a copy of the determinant.
Anomalies
Consider a student table student(rollno, game, feestructure), where each student plays one
game and game → feestructure. This table suffers from all three anomalies:
• Insertion anomaly − A new game cannot be inserted into the table unless we get a
student to play that game.
• Deletion anomaly − If rollno 7 is the only student playing tennis and that row is
deleted, we also lose all information regarding tennis.
• Updation anomaly − To change the fee structure for basketball, we need to make
changes in more than one place.
To overcome these anomalies and convert the student table into 3NF, we decompose the table
as follows:
If X → Y is a transitive dependency, divide R into R1(X+) and R2(R − Y+).
game → feestructure is a transitive dependency (since game is not a key and feestructure is
not a prime attribute).
R1 = game+ = (game, feestructure)
R2 = (student − feestructure) = (rollno, game)
So we divide the student table into R1(game, feestructure) and R2(rollno, game).
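The same decomposition written as DDL (column types are assumed):
CREATE TABLE game (
    game         VARCHAR(30) PRIMARY KEY,
    feestructure DECIMAL(8,2)
);

CREATE TABLE student (
    rollno INT PRIMARY KEY,
    game   VARCHAR(30) REFERENCES game(game)
);

-- game → feestructure now lives in its own relation: a game and its fee
-- can exist without any enrolled student, and a fee change touches one row.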
(Deadlock Prevention vs. Deadlock Avoidance)
Procedure: Deadlock prevention works by constraining how resources are requested and handled.
Deadlock avoidance automatically considers each request and checks whether granting it leaves
the system in a safe state.
(3NF vs. BCNF)
Strength: 3NF is less strong than BCNF; BCNF is comparatively stronger than 3NF.
Redundancy: The redundancy is high in 3NF; it is comparatively low in BCNF.
Achievability: 3NF is comparatively easier to achieve; BCNF is more difficult to achieve.
Decomposition: A lossless, dependency-preserving decomposition into 3NF is always possible;
for BCNF, a decomposition that preserves all dependencies is not always achievable.
(ii) Types of Indexes in DBMS - Indexing is a technique for improving database performance by
reducing the number of disk accesses necessary when a query is run. An index is a form of data
structure used to swiftly identify and access the data present in a database table.
Ordered Indices
To make searching easier and faster, the indices are frequently kept sorted. Indices that have
been sorted are known as ordered indices.
Example
Let’s say we have a table of employees with thousands of records, each of which is ten bytes
large. If their IDs begin with 1, 2, 3, …, etc., and we are looking for the employee with
ID 543:
• Without an index, we must scan the disk blocks from the beginning until we reach record
543; the DBMS finds the record after reading 543*10 = 5430 bytes.
• With an index (of 2-byte entries), the DBMS finds the record after reading
542*2 = 1084 bytes, which is significantly less than in the prior case.
Primary Index
• Primary indexing refers to the process of creating an index based on the table’s primary
key. These primary keys are specific to each record and establish a 1:1 relationship
between them.
• The searching operation is fairly efficient because primary keys are stored in sorted
order.
• There are two types of primary indexes: dense indexes and sparse indexes.
Dense Index
Every search key value in the data file has an index record in the dense index. It speeds up the
search process. The total number of records present in the index table and the main table are
the same in this case. It requires extra space to hold the index record. A pointer to the actual
record on the disk and the search key are both included in the index records.
(iii) Data fragmentation and its objectives – Fragmentation is a process of dividing the whole or
full database into various subtables or sub relations so that data can be stored in different
systems. The small pieces of sub relations or subtables are called fragments. These fragments
are called logical data units and are stored at various sites. It must be made sure that the
fragments are such that they can be used to reconstruct the original relation (i.e, there isn’t
any loss of data).
In the fragmentation process, suppose a table T is fragmented into a number of fragments
T1, T2, T3, …, TN. The fragments must contain sufficient information to allow the restoration
of the original table T; this restoration can be done by the use of the UNION or JOIN
operation on the fragments. All of these fragments are independent, which means no fragment
can be derived from the others. Users need not be logically concerned about fragmentation,
that is, they should not need to know that the data is fragmented; this is called
fragmentation independence, or fragmentation transparency.
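A sketch of horizontal fragmentation (table and predicates assumed): each site stores only
its own rows, and the original relation is reconstructed with UNION:
CREATE TABLE emp_site1 AS SELECT * FROM employee WHERE dept = 'Sales';
CREATE TABLE emp_site2 AS SELECT * FROM employee WHERE dept <> 'Sales';

-- Reconstruction of the original employee relation
SELECT * FROM emp_site1
UNION ALL
SELECT * FROM emp_site2;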
3. (a) With the help of an example for each, explain the following : 2×3=6
(i) Binary Lock - A binary lock is a variable capable of holding only 2 possible values, i.e., a 1
(depicting a locked state) or a 0 (depicting an unlocked state). This lock is usually associated
with every data item in the database ( maybe at table level, row level or even the entire
database level). Should item X be unlocked, the corresponding object lock(X) would return the
value 0. The instant a user/session begins updating the contents of item X, lock(X) is set to
a value of 1. Because of this, for as long as the update query lasts, no other user may access
item X, neither to read it nor to write to it.
There are 2 operations used to implement binary locks. They are lock_data( ) and unlock_data(
). The algorithms have been discussed below (only algorithms have been entertained due to
the diversity in DBMS scripts):
The locking operation :
lock_data(X):
label:  if lock(X) == 0              // item X is unlocked
        {
            lock(X) = 1;             // acquire the lock
        }
        else                         // lock(X) == 1, item X is locked
        {
            wait (until lock(X) == 0);   // wait for the current user to finish
            go to label;                 // retry the test
        }
Note that ‘label:‘ is literally a label for the line which can be referred to at a later step to transfer
execution to. The ‘wait’ command in the else block basically puts all other transactions wanting
to access X in a queue. Since it monitors or keeps other transactions scheduled until access to
the item is unlocked, it is often taken to be outside the lock_data(X) operation i.e., defined
outside.
(ii) Multiple-mode Locks - The various concurrency control schemes have used each individual
data item as the unit on which synchronization is performed. A drawback of this approach is
that if a transaction Ti needs to access the entire database and a locking protocol is used,
then Ti must lock each item in the database, which is inefficient; it would be simpler if Ti
could use a single lock to lock the entire database. On the other hand, if another transaction
needs to access only a few data items, locking the entire database is unnecessary and costs us
concurrency, which was our primary goal in the first place. To strike a balance between
efficiency and concurrency, we use granularity.
Let’s start by understanding what is meant by granularity.
Granularity – It is the size of the data item allowed to be locked. Multiple granularity means
hierarchically breaking the database up into blocks that can be locked, so that the system can
track what needs to be locked and in what fashion. Such a hierarchy can be represented
graphically as a tree.
(b) Define primary and clustering indexes. Briefly discuss implementation of clustering indexes.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as
primary indexing. These primary keys are unique to each record and contain 1:1 relation
between the records.
o As primary keys are stored in sorted order, the performance of the searching operation is
quite efficient.
o The primary index can be classified into two types: Dense index and Sparse index.
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created
on non-primary-key columns, which may not be unique for each record.
o In this case, to identify the records faster, we group two or more columns to obtain a
unique value and create an index out of them. This method is called a clustering index.
o Records which have similar characteristics are grouped together, and indexes are created
for these groups.
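A SQL Server-style sketch (names assumed): the clustered index below physically orders the
employee rows by dept_id, so all employees of a department are stored together on disk:
CREATE CLUSTERED INDEX ix_emp_dept ON employee (dept_id);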
(c) Discuss the advantages and disadvantages of data replication. What are the objectives of
complete and selective replication ?
ADVANTAGES OF DATA REPLICATION – Data replication is generally performed to:
• Provide a consistent copy of data across all the database nodes.
• Increase the availability of data.
• Increase the reliability of data.
• Support multiple users and give high performance.
• Reconcile redundancy: the databases are merged, and slave databases holding
outdated or incomplete data are brought up to date.
• Reduce data movement: since replicas exist, the data is often already present at
the site where a transaction executes.
• Achieve faster execution of queries.
DISADVANTAGES OF DATA REPLICATION –
• More storage space is needed as storing the replicas of same data at
different sites consumes more space.
• Data Replication becomes expensive when the replicas at all different sites
need to be updated.
• Maintaining Data consistency at all different sites involves complex
measures.
• Transactional Replication – In transactional replication, users receive a full initial
copy of the database and then receive updates as the data changes. Data is copied in
real time from the publisher to the receiving database (subscriber) in the same order
as the changes occur at the publisher; therefore, transactional consistency is
guaranteed. Transactional replication is typically used in server-to-server
environments. It does not simply copy the data changes, but rather consistently and
accurately replicates each change.
• Snapshot Replication – Snapshot replication distributes data exactly as it appears at a
specific moment in time and does not monitor for subsequent updates. The entire
snapshot is generated and sent to the subscribers. Snapshot replication is generally
used when data changes are infrequent. It is a bit slower than transactional
replication because each run moves multiple records from one end to the other.
Snapshot replication is a good way to perform the initial synchronization between the
publisher and the subscriber.
• Merge Replication – Data from two or more databases is combined into a single
database. Merge replication is the most complex type of replication because it allows
both publisher and subscriber to make changes to the database independently. It is
typically used in server-to-client environments and allows changes to be sent from one
publisher to multiple subscribers.
4. (a) What is use of a precedence graph in database ? Write all the steps for constructing a
precedence graph. Suppose there are two transactions T1 and T2. Draw an edge between T1
and T2, if T2 has written on item X first and T1 writes on the same item later.
(b) What is Log-Based Recovery System ? Explain the type of information kept in a log about
transaction. Which type of transactions are selected for REDO and UNDO for database recovery
? Explain with an example.
Atomicity property of DBMS states that either all the operations of transactions must be
performed or none. The modifications done by an aborted transaction should not be visible to
database and the modifications done by committed transaction should be visible.
To achieve our goal of atomicity, user must first output to stable storage information
describing the modifications, without modifying the database itself. This information can help
us ensure that all modifications performed by committed transactions are reflected in the
database. This information can also help us ensure that no modifications made by an aborted
transaction persist in the database.
Log and log records –
The log is a sequence of log records, recording all the update activities in the database.
Logs for each transaction are maintained in stable storage. Any operation performed on the
database is recorded in the log. Prior to performing any modification to the database, an
update log record is created to reflect that modification.
An update log record represented as: <Ti, Xj, V1, V2> has these fields:
1. Transaction identifier: Unique Identifier of the transaction that performed the
write operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of data item prior to write.
4. New value: Value of data item after write operation.
Other type of log records are:
1. <Ti start>: It contains information about when a transaction Ti starts.
2. <Ti commit>: It contains information about when a transaction Ti commits.
3. <Ti abort>: It contains information about when a transaction Ti aborts.
Undo and Redo Operations –
Because all database modifications must be preceded by creation of log record, the system
has available both the old value prior to modification of data item and new value that is to be
written for data item. This allows system to perform redo and undo operations as
appropriate:
1. Undo: using a log record sets the data item specified in log record to old value.
2. Redo: using a log record sets the data item specified in log record to new value.
The database can be modified using two approaches –
1. Deferred Modification Technique: If the transaction does not modify the database
until it has partially committed, it is said to use deferred modification technique.
2. Immediate Modification Technique: If database modification occur while
transaction is still active, it is said to use immediate modification technique.
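A sketch of the log records produced under immediate modification, assuming a hypothetical
accounts table where account A holds 500 and account B holds 200:
BEGIN;                               -- log: <T1 start>
UPDATE accounts SET bal = bal - 100
WHERE id = 'A';                      -- log: <T1, A, 500, 400>
UPDATE accounts SET bal = bal + 100
WHERE id = 'B';                      -- log: <T1, B, 200, 300>
COMMIT;                              -- log: <T1 commit>

-- If a crash occurs after <T1 commit> reaches stable storage, T1 is
-- redone (new values rewritten); if it occurs before, T1 is undone
-- (old values 500 and 200 restored).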
(f) What are the two advantages of a B-Tree as an index ? Write the important features of B-
Tree of order N.
B-Tree is a self-balancing search tree. In most of the other self-balancing search trees
(like AVL and Red-Black Trees), it is assumed that everything is in the main memory.
To understand the use of B-Trees, we must think of the huge amount of data that cannot fit in
the main memory. When the number of keys is high, the data is read from the disk in the
form of blocks. Disk access time is very high compared to the main memory access time. The
main idea of using B-Trees is to reduce the number of disk accesses. Most of the tree
operations (search, insert, delete, max, min, ..etc ) require O(h) disk accesses where h is the
height of the tree. B-tree is a fat tree. The height of B-Trees is kept low by putting the
maximum possible keys in a B-Tree node. Generally, the B-Tree node size is kept equal to the
disk block size. Since the height of the B-tree is low so total disk accesses for most of the
operations are reduced significantly compared to balanced Binary Search Trees like AVL Tree,
Red-Black Tree, etc.
Properties of B-Tree:
• All leaves are at the same level.
• B-Tree is defined by the term minimum degree ‘t‘. The value of ‘t‘ depends upon
disk block size.
• Every node except the root must contain at least t-1 keys. The root may contain a
minimum of 1 key.
• All nodes (including root) may contain at most (2*t – 1) keys.
• Number of children of a node is equal to the number of keys in it plus 1.
• All keys of a node are sorted in increasing order. The child between two
keys k1 and k2 contains all keys in the range from k1 to k2.
• B-Tree grows and shrinks from the root which is unlike Binary Search Tree. Binary
Search Trees grow downward and also shrink from downward.
• Like other balanced Binary Search Trees, the time complexity to search, insert and
delete is O(log n).
• Insertion of a Node in B-Tree happens only at Leaf Node.
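The "low height" claim can be made precise: for a B-tree holding n keys with minimum degree
t ≥ 2, a standard bound on the height h is
h ≤ log_t((n + 1) / 2)
so, for example, with t around 1000 a tree of a billion keys is only about three levels deep,
and a search costs only a handful of disk accesses.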
(c) Define 2NF. The following are the functional dependencies in a relation:
(order_no, item_code) is the primary key; item_code → price/unit; order_no → order_date.
Is this relation in 2NF ? Justify. In case the relation is not in 2NF, convert it into 2NF.
Second Normal Form (2NF):
Second Normal Form (2NF) is based on the concept of full functional dependency. Second
Normal Form applies to relations with composite keys, that is, relations with a primary key
composed of two or more attributes. A relation with a single-attribute primary key is
automatically in at least 2NF. A relation that is not in 2NF may suffer from the update
anomalies.
To be in second normal form, a relation must be in first normal form and relation must not
contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no
non-prime attribute (attributes which are not part of any candidate key) is dependent on any
proper subset of any candidate key of the table.
In other words, if a relation is in First Normal Form and every non-primary-key attribute is
fully functionally dependent on the primary key, then the relation is in Second Normal Form
(2NF).
Note – If the proper subset of candidate key determines non-prime attribute, it is
called partial dependency.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a
partial dependency exists, we remove the partially dependent attribute(s) from the relation
by placing them in a new relation along with a copy of their determinant.
For the given relation, the primary key is (order_no, item_code), but price/unit depends only
on item_code and order_date depends only on order_no. Both are partial dependencies on proper
subsets of the key, so the relation is not in 2NF. To convert it into 2NF, decompose it into
ORDER(order_no, order_date), ITEM(item_code, price/unit) and ORDER_ITEM(order_no, item_code).
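The 2NF decomposition written as DDL (types assumed; price/unit is renamed price_unit to be a
legal identifier):
CREATE TABLE orders (
    order_no   INT PRIMARY KEY,
    order_date DATE
);

CREATE TABLE item (
    item_code  INT PRIMARY KEY,
    price_unit DECIMAL(8,2)
);

CREATE TABLE order_item (
    order_no  INT REFERENCES orders(order_no),
    item_code INT REFERENCES item(item_code),
    PRIMARY KEY (order_no, item_code)
);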
2. (a) Define a view. Explain with the help of an example. Also specify the conditions that a view
must meet in order to allow updates.
Views in SQL are a kind of virtual table. A view has rows and columns just like a real table
in the database, and is created by selecting fields from one or more tables present in the
database. A view can contain either all the rows of a table or only specific rows that
satisfy a condition.
For a view to allow INSERT, UPDATE or DELETE operations, it must generally be defined over a
single base table; it must not contain group (aggregate) functions, GROUP BY, DISTINCT or set
operations; and it must include the base table's key (and any NOT NULL columns without
defaults), so that every view row maps back to exactly one base-table row.
We can create a view using the CREATE VIEW statement. A view can be created from a single
table or multiple tables. Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
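A contrasting sketch (names assumed): this view is defined on a single table with no
aggregates, DISTINCT or GROUP BY, so each view row maps to one employee row and the UPDATE
below is allowed:
CREATE VIEW sales_emp AS
SELECT id, name, salary FROM employee WHERE dept = 'Sales';

UPDATE sales_emp SET salary = salary * 1.10 WHERE id = 7;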
5. (a) Describe the following client-server architecture with the help of a diagram : 2×5=10
(i) 2-tier - A 2-tier architecture in DBMS is a database architecture where the presentation
layer runs on a client (PC, mobile, tablet, etc.) and the data is stored on a server, which
forms the second tier. Two-tier architecture provides added security to the DBMS, as it is
not exposed to the end user directly, and it provides direct and faster communication. In a
typical 2-tier client-server arrangement, one database server is connected directly to
clients 1, 2 and 3.
(b) Explain the following concepts with the help of suitable example : 2×5=10
(i) Lossless decomposition - Lossless join decomposition is a decomposition of a relation R
into relations R1, R2 such that if we perform a natural join of relation R1 and R2, it will return
the original relation R. This is effective in removing redundancy from databases while
preserving the original data…
In other words by lossless decomposition, it becomes feasible to reconstruct the relation R
from decomposed tables R1 and R2 by using Joins.
In Lossless Decomposition, we select the common attribute and the criteria for selecting a
common attribute is that the common attribute must be a candidate key or super key in
either relation R1, R2, or both.
Decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at least one
of the following functional dependencies is in F+ (the closure of the functional
dependencies):
R1 ∩ R2 → R1, or
R1 ∩ R2 → R2
Consistency:
This means that integrity constraints must be maintained, so that the database is consistent
before and after the transaction; it refers to the correctness of a database. For example, in
a funds transfer between two accounts, the total amount across the accounts must be the same
before and after the transaction.
(c) Compare and contrast serial file organization technique with indexed sequential technique
in terms of storage, access and other features.
Sequential files: There is no need to declare any KEY for storing and accessing the records.
Indexed files: One or more KEYs can be created for storing and accessing the records.
Relative files: Only one unique KEY is declared for storing and accessing the records.
(d) Define a JOIN operation. How is it different from the Cartesian product in relational
algebra ? Explain with the help of an example.
Join operation combines the relation R1 and R2 with respect to a condition. It is denoted by ⋈.
The different types of join operation are as follows −
• Theta join
• Natural join
• Outer join − It is further classified into following types −
o Left outer join.
o Right outer join.
o Full outer join.
Theta join
If we join R1 and R2 on a condition other than equality, it is called a theta join (non-equi
join).
Natural join
If we join R1 and R2 on an equality condition, it is called an equi join; a natural join is
an equi join taken over all common attributes, with the duplicate columns removed. Generally,
"join" refers to the natural join.
The natural join of R1 and R2 (on a common attribute regno) is:
{ select those tuples from the Cartesian product where R1.regno = R2.regno }
On applying CARTESIAN PRODUCT on two relations that is on two sets of tuples, it will take
every tuple one by one from the left set(relation) and will pair it up with all the tuples in the
right set(relation).
So, the CROSS PRODUCT of two relation A(R1, R2, R3, …, Rp) with degree p, and B(S1, S2, S3,
…, Sn) with degree n, is a relation C(R1, R2, R3, …, Rp, S1, S2, S3, …, Sn) with degree p + n
attributes.
CROSS PRODUCT is a binary set operation means, at a time we can apply the operation on
two relations. But the two relations on which we are performing the operations do not have
the same type of tuples, which means Union compatibility (or Type compatibility) of the two
relations is not necessary.
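A side-by-side sketch (tables r1 and r2 with a common regno column are assumed):
SELECT * FROM r1 CROSS JOIN r2;                   -- Cartesian product: |r1| * |r2| rows

SELECT * FROM r1 JOIN r2 ON r1.regno = r2.regno;  -- join: only the matching pairs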
(b) What are Checkpoints ? How does this technique of Checkpoints contribute to database
recovery ?
A checkpoint declares a point before which the DBMS was in a consistent state and all
transactions had been committed. During transaction execution, such checkpoints are recorded
in the transaction log.
When a checkpoint is reached, all updates recorded in the log up to that point are written to
the database, and the old log can be discarded. A new log is then created for the upcoming
operations of the transactions; it is written to until the next checkpoint, and the process
continues.
• The recovery system reads the logs backwards from the end to the last checkpoint,
i.e. from T4 to T1 (suppose transactions T1–T4 were active around the checkpoint).
• It will keep track of two lists – Undo and Redo.
• Whenever a log contains both <Tn, Start> and <Tn, Commit>, or only <Tn, Commit>,
that transaction is put in the Redo list. T2 and T3 contain <Tn, Start> and
<Tn, Commit>, whereas T1 has only <Tn, Commit>; hence T1, T2 and T3 are in the
Redo list.
• Whenever a log record has no Commit or Abort instruction, that transaction is put
in the Undo list. Here, T4 has <Tn, Start> but no <Tn, Commit>, as it is an ongoing
transaction, so T4 is put in the Undo list.
All the transactions in the Redo list and their previous logs are removed, and the
transactions are then redone before their logs are saved. All the transactions in the Undo
list are undone and their logs are deleted.
Backup and recovery: a file system does not provide backup and recovery of data if it is
lost, whereas a DBMS provides backup and recovery of data even if it is lost.
(b) Hashed File Organization - Hash file organization uses the computation of a hash function
on some fields of the records. The hash function's output determines the location of the disk
block where the record is to be placed. When a record has to be retrieved using the hash key
columns, the address is generated and the whole record is fetched using that address. In the
same way, when a new record has to be inserted, the address is generated using the hash key
and the record is inserted directly. The same process applies to delete and update. With this
method there is no need to search or sort the entire file, and each record is stored
effectively at a random position in the file.
(c) DML Commands used in DBMS (any four) - The SQL commands that deal with the manipulation
of data present in the database belong to DML (Data Manipulation Language), and this includes
most of the SQL statements. DML is the component of SQL that is used to query and modify the
data stored in the database.
List of DML commands:
• INSERT : It is used to insert data into a table.
• UPDATE: It is used to update existing data within a table.
• DELETE : It is used to delete records from a database table.
• LOCK: Table control concurrency.
• CALL: Call a PL/SQL or JAVA subprogram.
• EXPLAIN PLAN: It describes the access path to data.
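Minimal examples of four DML commands against an assumed employee table:
INSERT INTO employee (id, name, salary) VALUES (1, 'Asha', 40000);
UPDATE employee SET salary = 45000 WHERE id = 1;
SELECT name, salary FROM employee WHERE salary > 30000;
DELETE FROM employee WHERE id = 1;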
(d) Functional Dependency - The functional dependency is a relationship that exists between two
attributes. It typically exists between the primary key and non-key attribute within a table.
X → Y
The left side of the FD is known as the determinant, and the right side is known as the
dependent.
(b) Verify the statement, "Any relation in BCNF is in 3NF but converse is not true." Give suitable
example. 5
A relation is in 3NF if, in every non-trivial functional dependency X → Y, at least one of
the following conditions holds:
1. X is a super key. (This condition is a must for BCNF relations.)
2. Y is a prime attribute (each element of Y is part of some candidate key).
But a relation is in BCNF if and only if X is a super key for every functional dependency
(FD) X → Y that holds in the relation.
Therefore, the BCNF relations are a subset of the 3NF relations: every BCNF relation is in
3NF, but the converse need not be true.
For example, consider R(student, course, teacher) with the FDs (student, course) → teacher
and teacher → course. R is in 3NF, because in teacher → course the attribute course is prime
(it is part of the candidate key (student, course)); but R is not in BCNF, because teacher is
not a super key.
(c) Explain the term data replication and data fragmentation with suitable example. 5
Data Replication is the process of storing data in more than one site or node. It is useful
in improving the availability of data. It is simply copying data from a database from one server
to another server so that all the users can share the same data without any inconsistency. The
result is a distributed database in which users can access data relevant to their tasks without
interfering with the work of others.
Data replication encompasses duplication of transactions on an ongoing basis, so that
the replica stays in a consistently updated state, synchronized with the source. Fragmentation,
by contrast, also places data at different locations, but each fragment of a relation resides
at only one location.
There can be full (complete) replication, in which the whole database is stored at every site,
or partial (selective) replication, in which only some frequently used fragments of the
database are replicated and the rest are not.
Types of Data Replication
1. Transactional Replication – In transactional replication, users receive a full
initial copy of the database and then receive updates as the data changes. Data is
copied in real time from the publisher to the receiving database (subscriber) in
the same order as the changes occur at the publisher; therefore, transactional
consistency is guaranteed. Transactional replication is typically used in
server-to-server environments. It does not simply copy the data changes, but
rather consistently and accurately replicates each change.
2. Snapshot Replication – Snapshot replication distributes data exactly as it appears
at a specific moment in time and does not monitor for subsequent updates. The
entire snapshot is generated and sent to the subscribers. Snapshot replication is
generally used when data changes are infrequent. It is a bit slower than
transactional replication because each run moves multiple records from one end to
the other. Snapshot replication is a good way to perform the initial
synchronization between the publisher and the subscriber.
(d) What are integrity constraints ? Explain the various types of integrity constraints with
suitable examples.
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have
to be performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The
value of the attribute must be available in the corresponding domain.
Key constraints
o Keys are attribute sets used to identify an entity uniquely within its entity set.
o An entity set can have multiple keys, but one of them is chosen as the primary key. A
primary key must contain unique values and must not contain NULL values in the relational
table.
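A sketch combining both kinds of constraints in one assumed table: CHECK clauses implement
domain constraints, and PRIMARY KEY / UNIQUE implement key constraints:
CREATE TABLE student (
    rollno INT PRIMARY KEY,                     -- key constraint
    email  VARCHAR(50) UNIQUE,                  -- key constraint (candidate key)
    age    INT CHECK (age BETWEEN 17 AND 60),   -- domain constraint
    gender CHAR(1) CHECK (gender IN ('M', 'F')) -- domain constraint
);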
(e) How do you implement a hierarchical data model ? Explain through an illustration. 5
Applications of hierarchical model :
• Hierarchical models are generally used as semantic models in practice as many
real-world occurrences of events are hierarchical in nature like biological
structures, political, or social structures.
• Hierarchical models are also commonly used as physical models because of the
inherent hierarchical structure of the disk storage system like tracks, cylinders, etc.
There are various examples such as Information Management System (IMS) by
IBM, NOMAD by NCSS, etc.
(f) Define Data Manipulation Language (DML) of SQL. List and explain various DML commands.
The DML commands in Structured Query Language change the data present in the SQL database.
We can easily access, store, modify, update and delete the existing records from the database
using DML commands.
1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command
SELECT is the most important data manipulation command in Structured Query Language. The
SELECT command shows the records of the specified table. It also shows the particular record of
a particular column by using the WHERE clause.
(g) How do B-tree indexes differ from Binary search tree indexes ?
B-Tree : A B-tree is known as a self-balancing tree, and its keys are sorted in an inorder
traversal. Unlike a binary tree, a node in a B-tree can have more than two children. A B-tree
has a height of logM N (where M is the order of the tree and N is the number of nodes), and
the height is adjusted automatically at each update. In a B-tree the data is kept in sorted
order, with the lowest value on the left and the highest value on the right. Inserting data
or keys into a B-tree is more complicated than into a binary tree.
There are some conditions that must hold for a B-tree:
• All the leaf nodes of the B-tree must be at the same level.
• Above the leaf nodes of the B-tree, there should be no empty sub-trees.
• The B-tree's height should be kept as low as possible.
Binary Tree : A binary tree is a special type of general tree. Unlike a B-tree, a node in a
binary tree can have at most two children; there is a limit on the degree of a node because
nodes in a binary tree cannot have more than two child nodes (degree two). The topmost node
of a binary tree is called the root node, and there are two subtrees: the left subtree and
the right subtree. Unlike the general tree, a binary tree can be empty. Like a B-tree, a
binary tree can be traversed in sorted (inorder) fashion, but it can also be traversed in
preorder as well as postorder. In a binary tree, data insertion is less complicated than in
a B-tree.
(h) Differentiate between the concepts of Logical data independence and Physical data
independence in DBMS.
Physical Data Independence: It is mainly concerned with how the data is stored in the system.
A change at the physical level does not require a change at the application level.
Modifications made at the internal level may or may not be needed to improve the performance
of the structure. It is concerned with the internal schema.
Logical Data Independence: It is mainly concerned with the structure or the changing data
definition. A change at the logical level may require a change at the application level.
Modifications made at the logical level are significant whenever the logical structure of the
database is to be changed. It is concerned with the conceptual schema.
3. (a) What do you understand by the term Query Optimization ? Discuss the role of relational
algebra in Query Optimization. List the operators used in relational algebra and discuss the
operation of each, with suitable example.
The main relational algebra operators are selection (σ), projection (π), union (∪), set
difference (−), Cartesian product (×) and join (⋈). Query optimization rewrites a
relational-algebra expression into an equivalent but cheaper one using transformation rules
such as:
1. Selections can be cascaded: σc1∧c2(R) = σc1(σc2(R)).
2. Selection is commutative: σc1(σc2(R)) = σc2(σc1(R)).
3. In a cascade of projections, all but the first (outermost) projection can be omitted.
This is called a pi-cascade.
Example
In the immediate modification technique, the database is modified as soon as a transaction
performs an update/write operation. Update log records maintain both the old and the new
values of the data items.
The recovery system uses two operations, which are as follows −
• Undo(Ti) − All data items updated by the transaction Ti, are set to old value.
• Redo(Ti) − All data items updated by the transaction Ti are set to a new value.