DBMS - Unit 5 - Distributed Databases
A distributed database (DDB) is a collection of multiple logically interrelated databases distributed over a
computer network, and a distributed database management system (DDBMS) is a software system that
manages a distributed database while making the distribution transparent to the user.
The figure describing the generic schema architecture of a DDB presents the enterprise with a
consistent, unified view showing the logical structure of the underlying data across all nodes. This view is
represented by the global conceptual schema (GCS), which provides network transparency.
To accommodate potential heterogeneity in the DDB, each node is shown as having its own local internal
schema (LIS) based on physical organization details at that particular site. The logical organization of data
at each site is specified by the local conceptual schema (LCS). The GCS, LCS, and their underlying
mappings provide the fragmentation and replication transparency.
Federated Database Schema Architecture:
All the problems related to query processing, transaction processing, directory and metadata management,
and recovery apply to FDBSs, with additional considerations.
Presentation layer (client). This provides the user interface and interacts with the user. The programs at
this layer present Web interfaces or forms to the client in order to interface with the application. Web browsers
are often utilized, and the languages and specifications used include HTML, XHTML, CSS, Flash, MathML,
Scalable Vector Graphics (SVG), Java, JavaScript, Adobe Flex, and others. This layer handles user input, output,
and navigation by accepting user commands and displaying the needed information, usually in the form of static
or dynamic Web pages. The latter are employed when the interaction involves database access. When a Web
interface is used, this layer typically communicates with the application layer via the HTTP protocol.
Application layer (business logic). This layer implements the application logic. For example, queries can be
formulated based on user input from the client, or query results can be formatted and sent to the client for
presentation. Additional application functionality can be handled at this layer, such as security checks, identity
verification, and other functions. The application layer can interact with one or more databases or data sources
as needed by connecting to the database using ODBC, JDBC, SQL/CLI, or other database access techniques.
Database server. This layer handles query and update requests from the application layer, processes the
requests, and sends the results. Usually SQL is used to access the database if it is relational or object-relational,
and stored database procedures may also be invoked. Query results (and queries) may be formatted into XML
when transmitted between the application server and the database server.
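As a concrete illustration of this layering, the following is a minimal sketch of the application layer querying the database server through JDBC. The connection URL, credentials, and the accounts table are illustrative assumptions, not part of any particular product.

```java
// Minimal sketch: the application layer querying the database server via JDBC.
// The URL, credentials, and the "accounts" table are hypothetical.
import java.sql.*;

public class AppLayerQuery {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://dbserver:5432/bankdb"; // hypothetical database server
        try (Connection con = DriverManager.getConnection(url, "appuser", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT acct_no, balance FROM accounts WHERE branch = ?")) {
            ps.setString(1, "Downtown");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Results would be formatted for the presentation layer (e.g., HTML or XML).
                    System.out.println(rs.getString("acct_no") + " " + rs.getBigDecimal("balance"));
                }
            }
        }
    }
}
```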
Exactly how to divide the DBMS functionality between the client, application server, and database server
may vary. The common approach is to include the functionality of a centralized DBMS at the database
server level. A number of relational DBMS products have taken this approach, where an SQL server is
provided. The application server must then formulate the appropriate SQL queries and connect to the
database server when needed. The client provides the processing for user interface interactions. Since SQL
is a relational standard, various SQL servers, possibly provided by different vendors, can accept SQL
commands through standards such as ODBC, JDBC, and SQL/CLI.
In this architecture, the application server may also refer to a data dictionary that includes information on
the distribution of data among the various SQL servers, as well as modules for decomposing a global query
into a number of local queries that can be executed at the various sites. Interaction between an application
server and database server might proceed as follows during the processing of an SQL query:
1. The application server formulates a user query based on input from the client layer and decomposes it
into a number of independent site queries. Each site query is sent to the appropriate database server site.
2. Each database server processes its local query and sends the results to the application server site.
Increasingly, XML is being touted as the standard for data exchange, so the database server may format the
query result into XML before sending it to the application server.
3. The application server combines the results of the subqueries to produce the result of the original
query, formats it into HTML or some other form accepted by the client, and sends it to the client
site for display.
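The fan-out/fan-in pattern in these three steps can be sketched as follows. This is a simplified illustration with hypothetical site URLs and fragment names; executeAtSite stands in for a real JDBC call to each database server.

```java
// Sketch of the application server's role: fan a global query out to site
// queries, run them in parallel, and merge the partial results.
import java.util.*;
import java.util.concurrent.*;

public class GlobalQueryCoordinator {
    static List<String> executeAtSite(String siteUrl, String sql) {
        // In a real system this would open a JDBC connection to siteUrl and run sql;
        // here it just returns a placeholder row.
        return List.of(siteUrl + ": row for [" + sql + "]");
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> siteQueries = Map.of(
                "jdbc:postgresql://site1/db", "SELECT * FROM accounts_frag1",
                "jdbc:postgresql://site2/db", "SELECT * FROM accounts_frag2");

        ExecutorService pool = Executors.newFixedThreadPool(siteQueries.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (var e : siteQueries.entrySet())
            futures.add(pool.submit(() -> executeAtSite(e.getKey(), e.getValue())));

        List<String> globalResult = new ArrayList<>();
        for (Future<List<String>> f : futures)
            globalResult.addAll(f.get());   // union of the partial results
        pool.shutdown();

        globalResult.forEach(System.out::println); // then format as HTML/XML for the client
    }
}
```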
The application
server is responsible for generating a distributed execution plan for a multisite query or transaction and for
supervising distributed execution by sending commands to servers. These commands include local queries
and transactions to be executed, as well as commands to transmit data to other clients or servers. Another
function controlled by the application server (or coordinator) is that of ensuring consistency of replicated
copies of a data item by employing distributed (or global) concurrency control techniques. The application
server must also ensure the atomicity of global transactions by performing global recovery when certain
sites fail.
If the DDBMS has the capability to hide the details of data distribution from the application server, then it
enables the application server to execute global queries and transactions as though the database were
centralized, without having to specify the sites at which the data referenced in the query or transaction
resides. This property is called distribution transparency. Some DDBMSs do not provide distribution
transparency, instead requiring applications to be aware of the details of data distribution.
The global and local transaction management software modules, along with the concurrency control and
recovery manager of a DDBMS, collectively guarantee the ACID properties of transactions.
The global transaction manager supports distributed transactions. The site where the transaction
originated can temporarily assume the role of global transaction manager and coordinate the execution
of database operations with transaction managers across multiple sites. Transaction managers export
their functionality as an interface to the application programs.
The operations exported by this interface are BEGIN_TRANSACTION, READ or WRITE,
END_TRANSACTION, COMMIT_TRANSACTION, and ROLLBACK (or ABORT).
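A minimal Java sketch of such an exported interface, with a hypothetical TxId handle, might look like this; the names mirror the operations listed above.

```java
// Sketch of the interface a transaction manager might export to application
// programs. TxId is a hypothetical handle carrying the bookkeeping data
// described below (unique id, originating site, name, ...).
public interface TransactionManager {
    TxId beginTransaction(String originatingSite);      // BEGIN_TRANSACTION
    Object read(TxId tx, String item);                  // READ
    void write(TxId tx, String item, Object value);     // WRITE
    void endTransaction(TxId tx);                       // END_TRANSACTION
    void commit(TxId tx);                               // COMMIT_TRANSACTION
    void rollback(TxId tx);                             // ROLLBACK / ABORT

    record TxId(long id, String originatingSite, String name) {}
}
```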
The manager stores bookkeeping information related to each transaction, such as a unique identifier,
originating site, name, and so on. For READ operations, it returns a local copy if valid and available.
For WRITE operations, it ensures that updates are visible across all sites containing copies (replicas) of
the data item. For ABORT operations, the manager ensures that no effects of the transaction are reflected
in any site of the distributed database. For COMMIT operations, it ensures that the effects of a write
are persistently recorded on all databases containing copies of the data item. Atomic termination
(COMMIT/ABORT) of distributed transactions is commonly implemented using the two-phase commit
protocol.
The transaction manager passes to the concurrency controller the database operation and associated
information. The controller is responsible for acquisition and release of associated locks. If the
transaction requires access to a locked resource, it is delayed until the lock is acquired. Once the lock is
acquired, the operation is sent to the runtime processor, which handles the actual execution of the
database operation. Once the operation is completed, locks are released and the transaction manager is
updated with the result of the operation.
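The acquire-execute-release handshake just described can be sketched as follows. The lock table and the blocking lock() call are simplified stand-ins for a real lock manager, which would also support shared/exclusive modes and deadlock handling.

```java
// Sketch of the controller's lock handshake: acquire, execute, release.
// Blocking on lock() models the "delayed until the lock is acquired" behavior.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class ConcurrencyController {
    private final Map<String, ReentrantLock> lockTable = new ConcurrentHashMap<>();

    public Object execute(String item, java.util.function.Supplier<Object> operation) {
        ReentrantLock lock = lockTable.computeIfAbsent(item, k -> new ReentrantLock());
        lock.lock();                 // the transaction waits here if the item is locked
        try {
            return operation.get();  // runtime processor executes the database operation
        } finally {
            lock.unlock();           // release the lock, then report back to the TM
        }
    }
}
```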
The two-phase commit protocol (2PC) requires a global recovery manager, or coordinator, to maintain
information needed for recovery, in addition to the local recovery managers and the information they maintain
(log, tables). The two-phase commit protocol has certain drawbacks that led to the development of the three-
phase commit protocol.
1) The biggest drawback of 2PC is that it is a blocking protocol. Failure of the coordinator blocks all
participating sites, causing them to wait until the coordinator recovers. This can cause performance
degradation, especially if participants are holding locks to shared resources.
2) Another problematic scenario arises when both the coordinator and a participant that has committed
crash together. In the two-phase commit protocol, a participant has no way to ensure that all participants
got the commit message in the second phase. Hence, once the coordinator has decided to commit at the
end of the first phase, each participant commits its transaction in the second phase independently of
whether the other participants have received the global commit message. Thus, if both the
coordinator and a committed participant crash together, the result of the transaction becomes uncertain
or nondeterministic. Since the transaction has already been committed by one participant, it cannot be
aborted on recovery. Nor can the transaction be optimistically committed on recovery, since the original
decision may have been to abort.
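Both drawbacks trace back to the coordinator's single decision point between the two phases. The following minimal sketch, with a hypothetical Participant interface, shows where that blocking window sits; a real implementation would exchange these messages over the network and force-write log records around each decision.

```java
// Minimal sketch of a 2PC coordinator over a hypothetical Participant interface.
import java.util.List;

public class TwoPhaseCommitCoordinator {
    public interface Participant {
        boolean prepare();   // phase 1: vote yes/no (participant holds locks after voting yes)
        void commit();       // phase 2: global commit
        void abort();        // phase 2: global abort
    }

    public void run(List<Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants)        // phase 1: collect votes
            allYes &= p.prepare();

        // The blocking window: if the coordinator crashes here, participants
        // that voted yes must wait, holding their locks, until it recovers.
        if (allYes)
            for (Participant p : participants) p.commit();
        else
            for (Participant p : participants) p.abort();
    }
}
```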
These problems are solved by the three-phase commit (3PC) protocol, which essentially divides the second
commit phase into two subphases called prepare-to-commit and commit. The prepare-to-commit phase is used
to communicate the result of the vote phase to all participants. If all participants vote yes, then the coordinator
instructs them to move into the prepare-to-commit state. The commit subphase is identical to its two-phase
counterpart. Now, if the coordinator crashes during this subphase, another participant can see the transaction
through to completion. It can simply ask another participant whether it received a prepare-to-commit message. If it
did not, then it can safely assume an abort. Thus the state of the protocol can be recovered irrespective of which
participant crashes. Also, by limiting the time required for a transaction to commit or abort to a maximum time-
out period, the protocol ensures that a transaction attempting to commit via 3PC releases locks on time-out.
The main idea is to limit the wait time for participants that have voted to commit and are waiting for a global
commit or abort from the coordinator. When a participant receives a precommit message, it knows that the rest
of the participants have voted to commit. If a precommit message has not been received when the time-out
expires, the participant aborts and releases all locks.
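The participant-side time-out rule just described might be sketched as follows; the states and method names are illustrative, not from any particular system.

```java
// Sketch of a 3PC participant's time-out decision.
public class ThreePhaseCommitParticipant {
    enum State { VOTED_YES, PRECOMMIT }

    private volatile State state = State.VOTED_YES;

    public void onPrecommit() { state = State.PRECOMMIT; }

    // Called when no coordinator message arrives within the time-out period.
    public void onTimeout() {
        if (state == State.PRECOMMIT) {
            // A precommit was received, so every participant voted yes;
            // the transaction can be safely driven to commit.
            commitLocally();
        } else {
            // No precommit seen: abort and release all locks.
            abortLocally();
        }
    }

    private void commitLocally() { /* write commit record, release locks */ }
    private void abortLocally()  { /* write abort record, release locks */ }
}
```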
Operating System Support for Transaction Management
The following are the main benefits of operating system (OS)-supported transaction management:
• Typically, DBMSs use their own semaphores to guarantee mutually exclusive access to shared resources.
Since these semaphores are implemented in user space at the level of the DBMS application software, the
OS has no knowledge of them. Hence, if the OS deactivates a DBMS process holding a lock, other DBMS
processes wanting this lock resource get queued. Such a situation can cause serious performance
degradation. OS-level knowledge of semaphores can help eliminate such situations.
• Specialized hardware support for locking can be exploited to reduce the associated costs. This can be of
great importance, since locking is one of the most common DBMS operations.
• Providing a set of common transaction support operations through the kernel allows application
developers to focus on adding new features to their products instead of reimplementing the common
functionality for each application. For example, if different DDBMSs are to coexist on the same machine
and they choose the two-phase commit protocol, it is more beneficial to have this protocol implemented as
part of the kernel so that the DDBMS developers can focus on adding new features to their products.
Homogeneous DDBMS: since all sites use the same schema, there is no problem in query processing.
Heterogeneous DDBMS: since the sites use different schemas, there are a lot of problems in query processing.
8. What are the advantages of fragmentation?
It allows parallel processing on fragments of a relation
It allows a relation to be split so that tuples are located where they are most frequently
accessed.
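As an illustration of both advantages, a hypothetical ACCOUNT relation could be fragmented horizontally by branch, with each fragment stored at the branch site that accesses it most. The fragment names and predicates below are assumptions for the example.

```java
// Illustrative only: horizontal fragments of a hypothetical ACCOUNT relation.
//   ACCOUNT_DELHI  = SELECT * FROM account WHERE branch = 'Delhi'   (stored at the Delhi site)
//   ACCOUNT_MUMBAI = SELECT * FROM account WHERE branch = 'Mumbai'  (stored at the Mumbai site)
// The original relation is the union of the fragments; each site can scan its
// own fragment at the same time, and tuples live where they are used most.
import java.util.List;

public class FragmentationExample {
    public static void main(String[] args) {
        List<String> fragments = List.of("ACCOUNT_DELHI", "ACCOUNT_MUMBAI");
        fragments.parallelStream()
                 .forEach(f -> System.out.println("scanning " + f + " at its home site"));
    }
}
```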
1. Explain about Distributed Databases and their characteristics, functions and advantages
and disadvantages.
Distributed Database: A logically interrelated collection of shared data and their description,
physically distributed over a computer network.
Distributed Processing: A centralized database, which may be accessed from different
computer systems, over an underlying network.
Replicated DBMS: A DDBMS that keeps and controls replicated data, such as relations, in
multiple databases.
A distributed DBMS (DDBMS) consists of a collection of sites, each of which maintains a local
database system. It is thus a network of computers interconnected by a data communication system,
with the physical database distributed over at least two of the system's components:
Each site on the network is able to process local transactions (i.e., transactions that access data
only at that single site).
Each site may participate in the execution of global transactions (i.e., transactions that access data
at several sites), which requires communication among the sites.
Note 1: The above can be thought of: Local Applications & Global Applications
Note 2: This scheme is transparent to users.
Homogeneous DDBMS: This is the case when the application programs are independent of how
the database is distributed; i.e., the distribution of the physical data can be altered without having to
make alterations to the application programs. Here, all sites use the same DBMS product, with the
same schemas and the same data dictionaries.
Heterogeneous DDBMS: This is the case when the application programs are dependent on the
physical location of the stored data; i.e., application programs must be altered if data is moved
from one site to another. Here, there are different kinds of DBMSs (hierarchical, network,
relational, object, etc.) with different underlying data models.
Characteristics of a DDBMS
A DDBMS developed by a single vendor may contain:
• Data independence
• Concurrency Control
• Replication facilities
• Recovery facilities
• Coordinated Data Dictionary
• Authorization System
• Shared Manipulation Language
Also:
• Transaction Manager (TM)
• Data Manager (DM)
• Transaction Coordinator (TC)
NOTE: a Distributed Data Processing System is a system where the application programs run on
distributed computers which are linked together by a data transmission network.
Advantages of DDBMSs
More accurately reflects organizational structure
Shareability and Local Autonomy (enforces global and local policies)
Availability and Reliability (failed central db vs failed node)
Performance (process/data migration and speed)
Economics
Modular growth
Integration (with older systems)
Disadvantages of DDBMSs
Complexity (Replication overhead, etc)
Maintenance Costs (of sites)
Security (Network Security)
Integrity Control (More complex)
Lack of Standards
Lack of Experience and Misconceptions
Database Design more complex
1. Briefly explain about two-phase commit and three-phase commit protocols.
(OR)
Explain the two-phase commit protocol with an example.