Dbms Data Base Management System Is A: Features of Database

A DBMS is a collection of data and programs that allow for efficient access and storage of large amounts of data. It manages both the data itself and its structure. Key people involved in a DBMS include implementers, designers, developers, administrators, and end users. Data is represented at three levels of abstraction - physical, logical, and view - and is structured using schemas, entities, attributes, keys, and relationships modeled in entity-relationship diagrams. Relational algebra provides operations to manipulate the data.

Uploaded by Debarun Sarkar

DBMS is the acronym of Database Management System: a collection of interrelated data and a set of programs to access those data. It supports efficient storage of, and access to, large amounts of data. A DBMS manages both:
Data (which is large and is frequently modified)
Structure of data (which is small and stable over time)

Features of Database:

Faithfulness: The design and implementation should be faithful to the requirements.
Avoid Redundancy: Redundancy wastes space and, more importantly, invites inconsistency, so it should be avoided.
Simplicity: The design and implementation should avoid introducing more elements than are absolutely necessary.
Right kind of element: Attributes are easier to implement, but sometimes entity sets or relationships are needed to ensure that the right kind of element is introduced.

Types of Database

Centralized Database: All data is located at a single site.
Distributed Database: The database is stored on several computers.

A Database Management System (DBMS) provides efficient, reliable, convenient and safe multi-user storage of, and access to, massive amounts of persistent data. The information contained in a database is represented on two levels: the data itself and the structure (schema) of that data.
Key People Involved in a DBMS:

DBMS Implementer: Person who builds system


Database Designer: Person responsible for preparing external
schemas for applications, identifying and integrating user needs
into a conceptual (or community or enterprise) schema.
Database Application Developer: Person responsible for
implementing database application programs that facilitate data
access for end users.
Database Administrator: Person responsible for defining the internal schema and sub-schemas (with database designers), specifying mappings between schemas, monitoring database usage and supervising DBMS functionality (e.g., access control, performance optimisation, backup and recovery policies, conflict management).
End Users: Users who query and update the database through fixed programs (invoked by non-programmer users), e.g., in banking.

Levels of Data Abstraction:

A 3-tier architecture separates its tiers based on the complexity of the users and how they use the data in the database. It is the most widely used architecture for designing a DBMS.
Physical Level: The lowest level of abstraction; it describes in detail how the data are actually stored, in terms of complex low-level data structures.
Logical Level: It is the next higher level of abstraction and
describes what data are stored and what relationships exist among
those data. At the logical level, each such record is described by a
type definition and the interrelationship of these record types is
defined as well. Database administrators usually work at this level
of abstraction.
View Level: It is the highest level of abstraction and describes only
part of the entire database and hides the details of the logical level.

Schema:

A schema, also known as the database schema, is the logical design of the database; a database instance is a snapshot of the data in the database at a given instant of time. A relational schema consists of a list of attributes and their corresponding domains.
Types of Schemas: Schemas can be classified into three parts, according to the levels of abstraction:
Physical/Internal Schema: Describes the database design at the physical level.
Logical/Conceptual Schema/Community User View: Describes the database design at the logical level.
Sub-schemas/View/External Schema: Describes different views of the database. Views may be queried, combined in queries with base relations, and used to define other views; in general they are not updated freely.

Data model:

A data model is a plan for building a database. Data models define how data items are connected to each other and how they are processed and stored inside the system.
Two widely used data models are:
Object-based logical model
Record based logical model

Entity:

An entity may be an object with a physical existence or an object with a conceptual existence: a thing (animate or inanimate) of independent physical or conceptual existence, distinguishable from other objects. Each entity has attributes. In a university database, an individual student, a faculty member, a classroom, and a course are all entities.

Attributes

Each entity is described by a set of attributes/properties.


Types of Attributes
Simple Attributes: having atomic or indivisible values. Example: Dept - a string; Phone Number - an eight-digit number.
Composite Attributes: having several components in the value. Example: Qualification, with components (Degree Name, Year, University Name).
Derived Attributes: Attribute value is dependent on some other
attribute. example: Age depends on Date Of Birth. So age is a
derived attribute.
Single-valued: having only one value rather than a set of values. For instance, Place Of Birth - a single string value.
Multi-valued: having a set of values rather than a single value. For instance, the Courses Enrolled, Email Address, and Previous Degree attributes for a student.
Attributes can be: simple single-valued, simple multi-valued,
composite single-valued or composite multi-valued.

Keys

A super key of an entity set is a set of one or more attributes


whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key.
Customer-id is a candidate key of customer.
Account-number is a candidate key of account.
Although several candidate keys may exist, one of the candidate
keys is selected to be the primary key.
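As an aside, the super-key test can be sketched in code. The following is an illustrative Python sketch (not part of the original notes); the customer relation and its attribute names are made up:

```python
# Sketch: testing whether a set of attributes is a super key of a relation,
# with the relation stored as a list of dicts (made-up sample data).

def is_super_key(relation, attrs):
    """A set of attributes is a super key if no two tuples agree on all of them."""
    seen = set()
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

customer = [
    {"customer_id": 1, "name": "Atal", "city": "Lucknow"},
    {"customer_id": 2, "name": "Atal", "city": "Delhi"},
    {"customer_id": 3, "name": "Rani", "city": "Delhi"},
]

assert is_super_key(customer, ["customer_id"])          # candidate key (minimal)
assert is_super_key(customer, ["customer_id", "city"])  # super key, not minimal
assert not is_super_key(customer, ["name"])             # duplicates exist
```

A candidate key is then simply a super key from which no attribute can be removed without losing this uniqueness property.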

Keys for Relationship Sets

The combination of primary keys of the participating entity sets forms a


super key of a relationship set.
(customer-id, account-number) is the super key of the depositor
NOTE: this means a pair of entity sets can have at most one relationship in a particular relationship set.
If we wish to track all access-dates to each account by each customer, we cannot create a relationship for each access; we can use a multivalued attribute instead.
We must consider the mapping cardinality of the relationship set when deciding what the candidate keys are.
We need to consider the semantics of the relationship set when selecting the primary key, in case there is more than one candidate key.

ER Modeling:

The Entity-Relationship model (ER model) is an abstract way to describe a database. Describing a database usually starts with a relational database, which stores data in tables.

Notations/Shapes in ER Modeling:
The overall logical structure of a database can be expressed graphically by an E-R diagram. The diagram consists of the following major components:
Rectangles: represent entity sets.
Ellipses: represent attributes.

Diamonds: represent relationship sets.
Lines: link attributes to entity sets and entity sets to relationship sets.
Double ellipses: represent multi-valued attributes.
Dashed ellipses: denote derived attributes.
Double lines: represent total participation of an entity in a
relationship set.
Double rectangles: represent weak entity sets.

Mapping Cardinalities / Cardinality Ratio / Types of Relationship:

Expresses the number of entities to which another entity can be associated via a relationship set. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following:
One to One: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
One to Many: An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A.
Many to One: An entity in A is associated with at most one entity in B, but an entity in B can be associated with any number (zero or more) of entities in A.
Many to Many: An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A.
Specialization: Consider an entity set person with attributes name, street and city. A person may be further classified as one of the following: Customer or Employee. Each of these person types is described by a set of attributes that includes all the attributes of entity set person, plus possibly additional attributes. The process of designating subgroupings within an entity set is called specialization.
The specialization of person allows us to distinguish among persons according to whether they are employees or customers.
The refinement from an initial entity set into successive levels of entity
subgroupings represents a top-down design process in which distinctions
are made explicitly.
Generalization: Basically, generalization is a simple inversion of specialization. Some common attributes of multiple entity sets are chosen to create a higher-level entity set. If the customer entity set and the employee entity set have several attributes in common, then this commonality can be expressed by generalization.
Here, person is the higher-level entity set and customer and employee are lower-level entity sets. Higher- and lower-level entity sets may also be designated by the terms superclass and subclass, respectively.
Aggregation: Aggregation is used when we have to model a relationship involving an entity set and a relationship set. Aggregation is an abstraction through which relationships are treated as higher-level entities.

Relational Algebra:

A relational model is completely based on relational algebra, which consists of a collection of operators that operate on relations. Its main objective is data retrieval. Relational algebra is operational and very useful for representing execution plans, while relational calculus is non-operational and declarative. Here, declarative means that users define queries in terms of what they want, not in terms of how to compute it.
Basic Operations in Relational Algebra
The operations in relational algebra are classified as follows.
Selection (σ): The select operation selects tuples/rows that satisfy a given predicate or condition. We use σ to denote selection; the predicate/condition appears as a subscript to σ.
Projection (π): It selects only the required/specified columns/attributes from a given relation/table. The projection operator eliminates duplicates (i.e., duplicate rows from the result relation).
Union (∪): It forms a relation from rows/tuples which appear in either or both of the specified relations. For a union operation R ∪ S to be valid, two conditions must be satisfied:
The relations R and S must be of the same arity, i.e., they must have the same number of attributes.
The domains of the i-th attribute of R and the i-th attribute of S must be the same, for all i.

Intersection (∩): It forms a relation of rows/tuples which are present in both the relations R and S. Both relations must be union-compatible for the union and intersection operations.
Set Difference (−): It allows us to find tuples that are in one relation but not in another. The expression R − S produces a relation containing those tuples in R but not in S.
Cross Product/Cartesian Product (×): Assume that we have n1 tuples in R and n2 tuples in S. Then, there are n1 * n2 ways of choosing a pair of tuples, one from each relation. So, there will be n1 * n2 tuples in the result relation P if P = R × S.
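The operations above can be sketched over relations modeled as Python sets of tuples. This is an illustrative sketch only; the function names are my own, and projection here is by column position:

```python
# Sketch: basic relational-algebra operations over relations modeled as
# sets of tuples (fixed attribute order per relation).

def select(relation, predicate):          # sigma: keep tuples satisfying predicate
    return {t for t in relation if predicate(t)}

def project(relation, indexes):           # pi: sets eliminate duplicates for free
    return {tuple(t[i] for i in indexes) for t in relation}

def union(r, s):                          # R ∪ S (union-compatible relations)
    return r | s

def difference(r, s):                     # R − S
    return r - s

def cross(r, s):                          # R × S: |R| * |S| tuples
    return {t + u for t in r for u in s}

R = {(1, "a"), (2, "b"), (3, "a")}
S = {(2, "b"), (4, "c")}

assert select(R, lambda t: t[1] == "a") == {(1, "a"), (3, "a")}
assert project(R, [1]) == {("a",), ("b",)}
assert union(R, S) == {(1, "a"), (2, "b"), (3, "a"), (4, "c")}
assert difference(R, S) == {(1, "a"), (3, "a")}
assert len(cross(R, S)) == len(R) * len(S)
```

Modeling relations as sets makes duplicate elimination (as in projection and union) automatic, matching set semantics of relational algebra.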
Integrity Constraints:

Integrity constraints specify conditions to be satisfied by the data values in the relational instances, so that the set of data values constitutes a meaningful database. There are four types of integrity constraints:


Domain Constraint: The value of attribute must be within the domain.
Key Constraint: Every relation must have a primary key.
Entity Integrity Constraint: Primary key of a relation should not contain
NULL values.
Referential Integrity Constraint: In the relational model, two relations are related to each other on the basis of attributes. Every value of a referencing attribute must either be NULL or be available in the referenced attribute.
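As a runnable illustration of the referential integrity constraint, the sketch below uses Python's sqlite3 (table and column names are made up; note that SQLite only enforces foreign keys once the pragma is enabled):

```python
# Sketch: referential integrity in sqlite3. A referencing value must be NULL
# or exist in the referenced attribute; otherwise the insert is rejected.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite needs this per connection
con.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id))""")
con.execute("INSERT INTO dept VALUES (10, 'HR')")
con.execute("INSERT INTO emp VALUES (1, 10)")      # ok: 10 exists in dept
con.execute("INSERT INTO emp VALUES (2, NULL)")    # ok: NULL is allowed

try:
    con.execute("INSERT INTO emp VALUES (3, 99)")  # 99 is not in dept
    violated = False
except sqlite3.IntegrityError:
    violated = True

assert violated
```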
Schema Refinement/Normalization: Decomposition of complex records into simple records. Normalization reduces redundancy using the non-loss decomposition principle.

Decomposition

Splitting a relation R into two or more sub-relations R1 and R2. A fully normalized relation must have a primary key and a set of attributes.
Decomposition should satisfy: (i) lossless join, and (ii) dependency preservation.

Lossless Join Decomposition

The join between the sub-relations should not create any additional tuples, nor should any tuples of R be lost:
R ⊂ R1 ⋈ R2 (Lossy)
R = R1 ⋈ R2 (Lossless)
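A lossy decomposition can be demonstrated directly: join the sub-relations back together and compare with the original. Below is an illustrative Python sketch with made-up data, splitting R(A, B, C) on the non-key attribute B:

```python
# Sketch: a lossy decomposition produces extra (spurious) tuples when the
# sub-relations are natural-joined back together.

def natural_join(r1, r2):
    """Naive nested-loop natural join over lists of dicts."""
    out = []
    for t1 in r1:
        for t2 in r2:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out

R = [{"A": 1, "B": "x", "C": 10}, {"A": 2, "B": "x", "C": 20}]
R1 = [{"A": t["A"], "B": t["B"]} for t in R]   # projection of R on (A, B)
R2 = [{"B": t["B"], "C": t["C"]} for t in R]   # projection of R on (B, C)

joined = natural_join(R1, R2)
# Joining on the non-key attribute B brings back spurious tuples
# such as {A: 1, B: 'x', C: 20}: the decomposition is lossy.
assert len(joined) > len(R)
```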
Dependency Preservation: Decomposition must not cause the loss of any dependency.
Functional Dependency (FD): A dependency between attributes is known as a functional dependency. Let R be the relational schema, X and Y be non-empty sets of attributes, and t1, t2, ..., tn be the tuples of relation R. X → Y means the values for X functionally determine the values for Y.
Trivial Functional Dependency: If X ⊇ Y, then X → Y is a trivial FD. Here, X and Y are sets of attributes of a relation R. In a trivial FD, there must be a common attribute on both sides of the arrow.
Non-Trivial Functional Dependency: If X ∩ Y = ∅ (no common attributes) and X → Y holds, then it is a non-trivial FD (no common attribute on either side of the arrow).

Case of semi-trivial FD:
Sid → Sid Sname (semi-trivial)
Because on decomposition, we get
Sid → Sid (trivial FD) and
Sid → Sname (non-trivial FD)
Properties of Functional Dependency (FD)
Reflexivity: If Y ⊆ X, then X → Y (trivial)
Transitivity: If X → Y and Y → Z, then X → Z
Augmentation: If X → Y, then XZ → YZ
Splitting or Decomposition: If X → YZ, then X → Y and X → Z
Union: If X → Y and X → Z, then X → YZ

Attribute Closure: Let R(X, Y, Z) be a relation with set of attributes (X, Y, Z). Then X⁺, the attribute closure of X, is the set of attributes of the relation functionally determined by X (if not all of them, then at least X itself).
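The attribute closure X⁺ can be computed by repeatedly applying the FDs until nothing changes. A minimal Python sketch, assuming a hypothetical relation R(X, Y, Z) with FDs X → Y and Y → Z:

```python
# Sketch: computing the attribute closure X+ under a set of FDs, and using
# it to test whether X is a super key. The FDs here are hypothetical.

def closure(attrs, fds):
    """fds: list of (lhs, rhs) pairs, each a set of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already determined, the RHS is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
assert closure({"X"}, fds) == {"X", "Y", "Z"}   # X -> Y, Y -> Z, so X+ = XYZ
assert closure({"Y"}, fds) == {"Y", "Z"}

# X+ covers every attribute of R(X, Y, Z), so X is a super key of R.
assert closure({"X"}, fds) >= {"X", "Y", "Z"}
```

The same routine is the workhorse for the normal-form tests below: an FD's left-hand side is a super key exactly when its closure covers all attributes of the relation.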

Normal Forms/Normalization:

In relational database design, normalization is the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The normal forms define the status of a relation with respect to the dependencies among its attributes. There are five normal forms.
First Normal Form (1NF): A relation should not contain any multivalued attributes; every attribute should be atomic. The main disadvantage of 1NF is high redundancy.
Second Normal Form (2NF): Relation R is in 2NF if and only if R should
be in 1NF, and R should not contain any partial dependency.
Partial Dependency: Let R be a relational schema having X, Y, A as non-empty sets of attributes, where X = any candidate key of the relation, Y = a proper subset of any candidate key, and A = a non-prime attribute (i.e., A doesn't belong to any candidate key).
X → A already exists; if Y → A also holds, it is a partial dependency, if and only if
Y is a proper subset of a candidate key, and
A is a non-prime attribute.

If either of the above two conditions fails, then Y → A is instead a fully functional dependency.
Full Functional Dependency: A functional dependency P → Q is said to be a fully functional dependency if the removal of any attribute S from P means that the dependency no longer holds.
(Student_Name, College_Name → College_Address)
If the above functional dependency is a full functional dependency, then neither of the following FDs can hold:
(Student_Name → College_Address)
or (College_Name → College_Address)
Third Normal Form (3NF): Let R be a relational schema; then any non-trivial FD X → Y over R is in 3NF if X is a candidate key or super key, or Y is a prime attribute.
At least one of the above conditions must be true.
R should not contain any transitive dependency.

For a relation schema R to be in 3NF, it is necessary that it be in 2NF.

Transitive Dependency: An FD X → Y in a relation schema R is transitive if
There is a set of attributes Z that is not a subset of any key of R, and
Both X → Z and Z → Y hold.

The above relation is in 2NF.
In relation R1, C is not a candidate key and D is a non-prime attribute. Due to this, R1 fails to satisfy the 3NF condition: a transitive dependency is present.
If AB → C and C → D, then AB → D is transitive.


Boyce-Codd Normal Form (BCNF): Let R be a relation schema and X → Y be any non-trivial FD over R; R is in BCNF if and only if X is a candidate key or super key.
If R satisfies this condition, then of course it also satisfies 2NF and 3NF.
Summary of 1NF, 2NF and 3NF:
Fourth Normal Form (4NF): 4NF is mainly concerned with multivalued dependency. A relation is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a super key (i.e., X is either a candidate key or a superset of one).
Fifth Normal Form (5NF): It is also known as Project-Join Normal Form (PJ/NF). 5NF reduces redundancy in relational databases recording multivalued facts by isolating semantically related multiple relationships. A table or relation is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.

SQL:
Structured Query Language (SQL) is a language that provides an interface to relational database systems. SQL was developed by IBM in the 1970s for use in System R, and is a de facto standard as well as an ISO and ANSI standard.
To deal with the above database objects, we need a programming language, and that programming language is SQL.

Three subordinate languages of SQL are:

Type of SQL Statement - SQL Keywords - Function
Data Definition Language (DDL) - CREATE, ALTER, DROP - used to define, change and drop the structure of a table; TRUNCATE - used to remove all rows from a table.
Data Manipulation Language (DML) - SELECT, INSERT INTO, UPDATE, DELETE FROM - used to enter, modify, delete and retrieve data from a table.
Data Control Language (DCL) - GRANT, REVOKE - used to provide control over the data in a database; COMMIT, ROLLBACK - used to define the end of a transaction.

Data Definition Language (DDL) :

It includes commands such as:
CREATE - To create tables in the database.
ALTER - To modify the existing table structure.
DROP - To drop the table along with its structure.


Data Manipulation Language(DML) :

It is used to insert, delete, and update data, and to perform queries on these tables. Some of the DML commands are given below.
INSERT - To insert data into the table.
SELECT - To retrieve data from the table.
UPDATE - To update existing data in the table.
DELETE - To delete data from the table.

Data Control Language (DCL) :

It is used to control users' access to the database objects. Some of the DCL commands are:
GRANT - Used to grant select/insert/delete access.
REVOKE - Used to revoke the provided access.

Transaction Control Language (TCL): It is used to manage changes


affecting the data.
COMMIT - To save the work done, such as inserting, updating or deleting data to/from the table.
ROLLBACK - To restore the database to its original state since the last commit.
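COMMIT and ROLLBACK can be demonstrated with Python's sqlite3 module (the table and data are illustrative):

```python
# Sketch: COMMIT makes changes permanent; ROLLBACK undoes everything since
# the last commit.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_id INTEGER, city TEXT)")
con.execute("INSERT INTO employee VALUES (1, 'LUCKNOW')")
con.commit()                                   # COMMIT: the row is now permanent

con.execute("INSERT INTO employee VALUES (2, 'DELHI')")
con.rollback()                                 # ROLLBACK: undo since last commit

rows = con.execute("SELECT emp_id FROM employee").fetchall()
assert rows == [(1,)]                          # only the committed row survives
```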
SQL Data Types SQL data types specify the type, size and format
of data/information that can be stored in columns and variables.

Constraint Types with Description

Default Constraint: It is used to insert a default value into a column if no other value is specified at the time of insertion.
Syntax:
CREATE TABLE Employee
(
Emp_id int NOT NULL,
Last_Name varchar (250),
City varchar (50) DEFAULT 'BANGALURU'
);
DDL Commands
1. CREATE TABLE <Table_Name>
(
Column_name1 <data_type>,
Column_name2 <data_type>
);
2. ALTER TABLE <Table_Name>
ALTER COLUMN <Column_Name> SET NOT NULL;
3. RENAME <object_type> <object_name> TO <new_name>;
4. DROP TABLE <Table_Name>;
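The DDL commands above can be exercised against SQLite via Python's sqlite3. This sketch uses ALTER TABLE ... ADD COLUMN (the ALTER variant SQLite supports, rather than SET NOT NULL) and also shows the DEFAULT constraint firing; names are illustrative:

```python
# Sketch: CREATE (with a DEFAULT constraint), ALTER, and DROP in sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Employee (
    Emp_id    INTEGER NOT NULL,
    Last_Name VARCHAR(250),
    City      VARCHAR(50) DEFAULT 'BANGALURU')""")

# DEFAULT fires when no value is supplied for City.
con.execute("INSERT INTO Employee (Emp_id, Last_Name) VALUES (1, 'Sarkar')")
city = con.execute("SELECT City FROM Employee").fetchone()[0]
assert city == 'BANGALURU'

con.execute("ALTER TABLE Employee ADD COLUMN Salary INTEGER")  # ALTER
con.execute("DROP TABLE Employee")                             # DROP
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
assert tables == []                            # the table is gone
```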

DML Commands
SELECT A1, A2, A3, ..., An -- what to return
FROM R1, R2, R3, ..., Rm -- relations or tables
WHERE condition -- filter condition, i.e., on what basis we want to restrict the outcome/result
Comparison operators which we can use in the filter condition are (=, >, <, >=, <=, <>); <> means not equal to.
INSERT Statement: Used to add row(s) to the tables in a database.
INSERT INTO Employee (F_Name, L_Name) VALUES ('Atal', 'Bihari');
UPDATE Statement: It is used to modify/update or change existing data in a single row, a group of rows, or all the rows in a table.
Example:
-- Updates some rows in a table.
UPDATE Employee
SET City = 'LUCKNOW'
WHERE Emp_Id BETWEEN 9 AND 15;
-- Updates the City column for all the rows.
UPDATE Employee SET City = 'LUCKNOW';
DELETE Statement: This is used to delete rows from a table.
Example:
-- Deletes the row(s) with Emp_Id = 7.
DELETE FROM Employee
WHERE Emp_Id = 7;
-- Deletes all the rows from the Employee table.
DELETE FROM Employee;
ORDER BY Clause: This clause is used to sort the result of a query in a specific order (ascending or descending); by default, the sorting order is ascending.
SELECT Emp_Id, Emp_Name, City FROM Employee
WHERE City = 'LUCKNOW'
ORDER BY Emp_Id DESC;
GROUP BY Clause: It is used to divide the result set into groups.
Grouping can be done by a column name or by the results of computed
columns when using numeric data types.
The HAVING clause can be used to set conditions for the GROUP BY clause.
The HAVING clause is similar to the WHERE clause, but HAVING puts conditions on groups, while WHERE places conditions on individual rows.
The WHERE clause can't include aggregate functions, while HAVING conditions can.

Example:
SELECT Emp_Id, AVG (Salary)
FROM Employee
GROUP BY Emp_Id
HAVING AVG (Salary) > 25000;
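A runnable version of a GROUP BY ... HAVING query, using sqlite3 with a small made-up Employee table (grouping here by a Dept column rather than Emp_Id, so that each group averages over more than one row):

```python
# Sketch: GROUP BY with HAVING; HAVING filters whole groups after aggregation.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (Emp_Id INTEGER, Dept TEXT, Salary REAL)")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, "IT", 30000), (2, "IT", 40000),
    (3, "HR", 20000), (4, "HR", 24000),
])

rows = con.execute("""
    SELECT Dept, AVG(Salary)
    FROM Employee
    GROUP BY Dept
    HAVING AVG(Salary) > 25000
""").fetchall()

assert rows == [("IT", 35000.0)]   # HR's average (22000) is filtered out
```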

Aggregate Functions

Joins: Joins are needed to retrieve data from the related rows of two tables, on the basis of some condition involving both tables. A mandatory condition for a join is that at least one set of column(s) should take values from the same domain in each table.
Inner Join: Inner join is the most common join operation used in
applications and can be regarded as the default join-type. Inner join
creates a new result table by combining column values of two tables (A
and B) based upon the join-predicate. These may be further divided into
three parts.
1. Equi Join (satisfies an equality condition)
2. Non-Equi Join (satisfies a non-equality condition)
3. Self Join (a table joined with itself; one or more columns assume the same domain of values)

Outer Join: An outer join does not require each record in the two joined tables to have a matching record; the joined table retains each record even if no other matching record exists. It also considers rows from a table even if they don't satisfy the joining condition.
(i) Left outer join (ii) Right outer join (iii) Full outer join

Left Outer Join: The result of a left outer join for table A and B always
contains all records of the left table (A), even if the join condition does
not find any matching record in the right table (B).
Result set of T1 and T2
Right Outer Join: A right outer join closely resembles a left outer join, except with the treatment of the tables reversed. Every row from the right table will appear in the joined table at least once; if there is no match with the left table, NULL will appear for the left table's columns.

Result set of T1 and T2


Full Outer Join: A full outer join combines the effect of applying both left and right outer joins. Where records in the joined tables do not match, the result set has NULL values for every column of the table that lacks a matching row; for records that do match, a single row is produced in the result set.

Result set of T1 and T2 (Using tables of previous example)

Cross Join (Cartesian Product): A cross join returns the Cartesian product of rows from the tables in the join: it produces rows which combine each row from the first table with each row from the second table.
SELECT * FROM T1, T2;
Number of rows in the result set = (number of rows in table 1 × number of rows in table 2)
Result set of T1 and T2 (Using previous tables T1 and T2)
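The join types can be compared on two tiny made-up tables T1 and T2 using sqlite3 (which supports inner, left outer, and cross joins):

```python
# Sketch: inner join keeps only matches, left outer join keeps all left rows
# (NULL-padding the unmatched ones), cross join pairs every row with every row.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER, name TEXT)")
con.execute("CREATE TABLE T2 (id INTEGER, dept TEXT)")
con.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
con.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, "IT"), (3, "HR")])

inner = con.execute(
    "SELECT T1.id, name, dept FROM T1 JOIN T2 ON T1.id = T2.id").fetchall()
assert sorted(inner) == [(1, "a", "IT"), (3, "c", "HR")]

left = con.execute(
    "SELECT T1.id, dept FROM T1 LEFT JOIN T2 ON T1.id = T2.id").fetchall()
assert (2, None) in left and len(left) == 3    # unmatched row kept, NULL-padded

cross = con.execute("SELECT * FROM T1, T2").fetchall()
assert len(cross) == 3 * 2                     # |T1| * |T2| rows
```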

Structure Storage:
The storage structure can be divided into two categories:
Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded in the chipset itself. Main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
Non-volatile storage: These memories are made to survive system
crashes. They are huge in data storage capacity, but slower in
accessibility. Examples may include hard-disks, magnetic tapes, flash
memory, and non-volatile (battery backed up) RAM.

File Organisation:

The database is stored as a collection of files. Each file is a sequence of


records. A record is a sequence of fields. Data is usually stored in the
form of records. Records usually describe entities and their attributes.
e.g., an employee record represents an employee entity and each field
value in the record specifies some attributes of that employee, such as
Name, Birth-date, Salary or Supervisor.
Allocating File Blocks on Disk: There are several standard techniques
for allocating the blocks of a file on disk
Contiguous Allocation: The file blocks are allocated to
consecutive disk blocks. This makes reading the whole file very
fast.
Linked Allocation: In this, each file contains a pointer to the next
file block.
Indexed Allocation: Where one or more index blocks contain
pointers to the actual file blocks.

Files of Unordered Records (Heap Files): In the simplest type of

organization, records are placed in the file in the order in which they are
inserted, so new records are inserted at the end of the file. Such an
organisation is called a heap or pile file.
This organisation is often used with additional access paths, such as
secondary indexes.
In this type of organisation, inserting a new record is very efficient, but a
linear search must be used to find a record.
Files of Ordered Records (Sorted Files): We can physically order the
records of a file on disk based on the values of one of their fields called
the ordering field. This leads to an ordered or sequential file. If the
ordering field is also a key field of the file, a field guaranteed to have a
unique value in each record, then the field is called the ordering key for
the file. Binary searching is used to search a record.
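Binary search on the ordering key is what makes sorted files fast to query. A Python sketch using the standard bisect module, with made-up records keyed by emp_id:

```python
# Sketch: searching an ordered file on its ordering key with binary search.
import bisect

records = [(3, "Asha"), (7, "Bilal"), (12, "Chen"), (20, "Dev")]  # sorted by key
keys = [r[0] for r in records]

def find(emp_id):
    """Binary-search the ordering key; O(log n) instead of a linear scan."""
    i = bisect.bisect_left(keys, emp_id)
    if i < len(keys) and keys[i] == emp_id:
        return records[i]
    return None

assert find(12) == (12, "Chen")
assert find(5) is None
```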
Indexing Structures for Files: Indexing mechanisms are used to optimize certain accesses to data (records) managed in files, e.g., the author catalog in a library is a type of index. A search key is an attribute or combination of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form: search-key value plus a pointer.

Index files are typically much smaller than the original file because only
the values for search key and pointer are stored. The most prevalent
types of indexes are based on ordered files (single-level indexes) and
tree data structures (multilevel indexes).
Types of Single-Level Ordered Indexes: In an ordered index file, index entries are stored sorted by the search key value. There are several types of ordered indexes.
Primary Index: A primary index is an ordered file whose records are of
fixed length with two fields. The first field is of the same data type as the
ordering key field called the primary key of the data file and the second
field is a pointer to a disk block (a block address).
There is one index entry in the index file for each block in the data
file.
Indexes can also be characterised as dense or sparse.

Dense index A dense index has an index entry for every search
key value in the data file.
Sparse index A sparse index (non-dense), on the other hand has
index entries for only some of the search values.
A primary index is a non-dense (sparse) index, since it includes an
entry for each disk block of the data file rather than for every search
value.
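The dense-versus-sparse distinction can be sketched in a few lines: a sparse primary index keeps one entry per block (the block's first key), and a lookup binary-searches the index, then scans only a single block. The block size and data here are illustrative:

```python
# Sketch: a sparse (primary) index with one (first key, block number) entry
# per block of an ordered data file.
import bisect

BLOCK_SIZE = 3
data = sorted(range(1, 20, 2))                 # ordered file keys: 1, 3, ..., 19
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
index = [(blk[0], n) for n, blk in enumerate(blocks)]   # one entry per block

def lookup(key):
    first_keys = [k for k, _ in index]
    # Find the last block whose first key is <= the search key...
    b = bisect.bisect_right(first_keys, key) - 1
    # ...then scan just that one block.
    return key in blocks[b] if b >= 0 else False

assert lookup(7)                  # 7 is present in its block
assert not lookup(8)              # even keys are absent
assert len(index) < len(data)     # sparse: far fewer entries than records
```

A dense index, by contrast, would carry one entry for each of the keys in `data` rather than one per block.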

Clustering Index: If file records are physically ordered on a non-key field which does not have a distinct value for each record, that field is called the clustering field. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field.
A clustering index is also an ordered file with two fields. The first
field is of the same type as the clustering field of the data file.
The record field in the clustering index is a block pointer.

A clustering index is another example of a non-dense index.

Secondary Index: A secondary index provides a secondary means of


accessing a file for which some primary access already exists. The
secondary index may be on a field which is a candidate key and has a
unique value in every record or a non-key with duplicate values. The
index is an ordered file with two fields. The first field is of the same data
type as some non-ordering field of the data file that is an indexing field.
The second field is either a block pointer or a record pointer. A secondary
index usually needs more storage space and longer search time than
does a primary index.
Multilevel Indexes: The idea behind a multilevel index is to reduce the part of the index that must be searched at each step. A multilevel index treats the index file, referred to now as the first (or base) level of the multilevel index, as an ordered file in its own right. Therefore, we can create a primary index for the first level; this index to the first level is called the second level of the multilevel index, and so on.
Dynamic Multilevel Indexes Using B-Trees and B + -Trees: There are
two multilevel indexes

B-Trees

When data volume is large and does not fit in memory, an extension
of the binary search tree to disk based environment is the B-tree.
In fact, since the B-tree is always balanced (all leaf nodes appear
at the same level), it is an extension of the balanced binary search
tree.
The problem which the B-tree aims to solve is: given a large collection of objects, each having a key and a value, design a disk-based index structure which efficiently supports query and update.
A B-tree of order p, when used as an access structure on a key field
to search for records in a data file, can be defined as follows
1. Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., Pq-1, <Kq-1, Prq-1>, Pq> where q <= p.
Each Pi is a tree pointer to another node in the B-tree.
Each Pri is a data pointer to the record whose search key field
value is equal to Ki.
2. Within each node, K1 < K2 < ... < Kq-1.
3. Each node has at most p tree pointers.
4. Each node, except the root and leaf nodes, has at least ceil(p/2)
tree pointers.
5. A node with q tree pointers, q <= p, has q - 1 search key field
values (and hence q - 1 data pointers).
e.g., a B-tree of order p = 3, with the values inserted in the
order 8, 5, 1, 7, 3, 12, 9, 6.

B+-Trees
It is a variation of the B-tree data structure.

In a B-tree, every value of the search field appears once at some
level in the tree, along with a data pointer. In a B+-tree, data
pointers are stored only at the leaf nodes of the tree. Hence, the
structure of the leaf nodes differs from the structure of internal
nodes.
The pointers in the internal nodes are tree pointers to blocks that
are tree nodes, whereas the pointers in leaf nodes are data
pointers.
B+-Tree's Structure: The structure of a B+-tree of order p is as
follows
1. Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
where q <= p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < K3 < ... < Kq-1.
3. Each internal node has at most p tree pointers and, except for the
root, has at least ceil(p/2) tree pointers.
4. The root node has at least two tree pointers, if it is an internal
node.
5. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>
where q <= p, each Pri is a data pointer and Pnext points to the
next leaf node of the B+-tree.

All the best.


Team gradeup.
Tags : IBPS-SO :: IT Officer Computer
