
Module 4
Transaction Processing and Error Recovery
Outline
Transaction Concept
Transaction State
Concurrent Executions
Serializability
Recoverability
Implementation of Isolation
Transaction Definition in SQL
Testing for Serializability
Transaction Concept
A transaction is a unit of program execution that
accesses and possibly updates various data items.
E.g., transaction to transfer $50 from account A
to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Two main issues to deal with:
◦ Failures of various kinds, such as hardware failures and system crashes
◦ Concurrent execution of multiple transactions
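To make the atomicity issue concrete, here is a minimal sketch of the same transfer using Python's sqlite3 module; the accounts table, the balances, and the account names are assumptions invented for this example.

import sqlite3

# Minimal sketch: the fund transfer as one atomic transaction.
# The 'accounts' table and accounts 'A' and 'B' are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 100.0), ("B", 100.0)])
conn.commit()

try:
    # Steps 1-3: read A, debit 50, write A
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
    # Steps 4-6: read B, credit 50, write B
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
    conn.commit()          # all six steps become durable together
except Exception:
    conn.rollback()        # a failure between steps leaves no partial update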
Required Properties of a Transaction
 Consider a transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
 Atomicity requirement
◦ If the transaction fails after step 3 and before step 6, money will be “lost”
leading to an inconsistent database state
 Failure could be due to software or hardware
◦ The system should ensure that updates of a partially executed transaction are
not reflected in the database
 Durability requirement — once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the updates to the
database by the transaction must persist even if there are software or hardware
failures.
Required Properties of a Transaction (Cont.)
 Consistency requirement in the above example:
◦ The sum of A and B is unchanged by the execution of the transaction
 In general, consistency requirements include
 Explicitly specified integrity constraints such as primary keys and
foreign keys
 Implicit integrity constraints
 e.g., sum of balances of all accounts, minus sum of loan amounts
must equal value of cash-in-hand
 A transaction, when starting to execute, must see a consistent database.
 During transaction execution the database may be temporarily inconsistent.
 When the transaction completes successfully the database must be
consistent
◦ Erroneous transaction logic can lead to inconsistency
Required Properties of a Transaction (Cont.)
 Isolation requirement — if between steps 3 and 6 (of the fund transfer transaction), another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

T1                                  T2
1. read(A)
2. A := A – 50
3. write(A)
                                    read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
 Isolation can be ensured trivially by running transactions serially
◦ That is, one after the other.
 However, executing multiple transactions concurrently has significant
benefits, as we will see later.
ACID Properties
A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data the database
system must ensure:
 Atomicity. Either all operations of the transaction are properly
reflected in the database or none are.
 Consistency. Execution of a transaction in isolation preserves
the consistency of the database.
 Isolation. Although multiple transactions may execute
concurrently, each transaction must be unaware of other
concurrently executing transactions. Intermediate transaction
results must be hidden from other concurrently executed
transactions.
◦ That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj finished execution before Ti started, or Tj started execution
after Ti finished.
 Durability. After a transaction completes successfully, the
changes it has made to the database persist, even if there are
system failures.
Transaction State
Active – the initial state; the transaction stays in
this state while it is executing
Partially committed – after the final statement
has been executed.
Failed – after the discovery that normal execution
can no longer proceed.
Aborted – after the transaction has been rolled
back and the database restored to its state prior to
the start of the transaction. Two options after it
has been aborted:
◦ Restart the transaction
 can be done only if no internal logical error
◦ Kill the transaction
Committed – after successful completion.
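The states and allowed transitions above can be encoded as a small Python state machine; the transition table below is purely illustrative, not a real DBMS interface.

from enum import Enum, auto

class TxnState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()

# Legal transitions, as described above (illustrative only).
TRANSITIONS = {
    TxnState.ACTIVE: {TxnState.PARTIALLY_COMMITTED, TxnState.FAILED},
    TxnState.PARTIALLY_COMMITTED: {TxnState.COMMITTED, TxnState.FAILED},
    TxnState.FAILED: {TxnState.ABORTED},
    TxnState.ABORTED: set(),      # a restart is a brand-new transaction
    TxnState.COMMITTED: set(),
}

def move(state: TxnState, target: TxnState) -> TxnState:
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

state = move(TxnState.ACTIVE, TxnState.PARTIALLY_COMMITTED)
state = move(state, TxnState.COMMITTED)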
Transaction State (Cont.)
[Figure: transaction state-transition diagram]
Concurrent Executions
Multiple transactions are allowed to run
concurrently in the system. Advantages are:
◦ Increased processor and disk utilization, leading to
better transaction throughput
 E.g. one transaction can be using the CPU while another is
reading from or writing to the disk
◦ Reduced average response time for transactions:
short transactions need not wait behind long ones.
Concurrency control schemes – mechanisms
to achieve isolation
◦ That is, to control the interaction among the
concurrent transactions in order to prevent them from
destroying the consistency of the database
Schedules
Schedule – a sequence of instructions that specifies the
chronological order in which instructions of concurrent
transactions are executed
◦ A schedule for a set of transactions must consist of all
instructions of those transactions
◦ Must preserve the order in which the instructions appear in
each individual transaction.
A transaction that successfully completes its execution
will have a commit instruction as its last statement
◦ By default, a transaction is assumed to execute a commit instruction
as its last step
A transaction that fails to successfully complete its
execution will have an abort instruction as its last
statement
Schedule 1
 Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
 An example of a serial schedule in which T1 is followed by T2:
T1: read(A); A := A – 50; write(A); read(B); B := B + 50; write(B); commit
T2: read(A); temp := A * 0.1; A := A – temp; write(A); read(B); B := B + temp; write(B); commit
Schedule 2
 A serial schedule in which T2 is followed by T1 (the same instructions, with all of T2 executed first):
T2: read(A); temp := A * 0.1; A := A – temp; write(A); read(B); B := B + temp; write(B); commit
T1: read(A); A := A – 50; write(A); read(B); B := B + 50; write(B); commit
Testing for Conflict Serializability
 Follow these steps to check whether a given non-serial schedule is conflict serializable or not:
 Step-01: Find and list all the conflicting operations.
 Step-02: Start creating a precedence graph by drawing one node for each transaction.
 Step-03: Draw an edge for each conflict pair such that if Xi(V) and Yj(V) form a conflict pair, then draw an edge from Ti to Tj. This ensures that Ti gets executed before Tj.
 Step-04: Check if there is any cycle formed in the graph. If there is no cycle found, then the schedule is conflict serializable; otherwise it is not.
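A hedged sketch of these four steps in Python, assuming a schedule is represented as a list of (transaction, operation, data item) triples; the cycle test is a plain depth-first search over the precedence graph.

# Sketch: conflict-serializability test via a precedence graph.
# A schedule is a list of (txn, op, item) triples in chronological order.
def is_conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Step-01/03: same item, different txns, at least one write
            if x == y and ti != tj and "w" in (op_i, op_j):
                edges.add((ti, tj))          # Ti must precede Tj

    # Step-04: detect a cycle with depth-first search
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting or (nxt not in done and has_cycle(nxt, visiting, done)):
                return True
        visiting.remove(node)
        done.add(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(n, set(), set()) for n in nodes)

# Example: r1(A) w1(A) r2(A) w2(A) r2(B) w2(B) r1(B) w1(B)
# produces edges T1->T2 and T2->T1, a cycle, so it is not serializable.
s = [("T1","r","A"), ("T1","w","A"), ("T2","r","A"), ("T2","w","A"),
     ("T2","r","B"), ("T2","w","B"), ("T1","r","B"), ("T1","w","B")]
print(is_conflict_serializable(s))   # False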
Checking Whether a Schedule is View Serializable or Not
Method-01:
Check whether the given schedule is conflict serializable or not.
If the given schedule is conflict serializable, then it is surely view serializable. Stop and report your answer.
If the given schedule is not conflict serializable, then it may or may not be view serializable. Go and check using the other methods.
Method-02:
Check if there exists any blind write operation. (Writing a data item without reading it first is called a blind write.)
If there does not exist any blind write, then the schedule is surely not view serializable. Stop and report your answer.
If there exists any blind write, then the schedule may or may not be view serializable. Go and check using the other methods.
Method-03:
In this method, try finding a view equivalent serial schedule.
Using the three view-equivalence conditions (same initial reads, same read-from relationships, and same final writes), write all the dependencies.
Then, draw a graph using those dependencies.
If there exists no cycle in the graph, then the schedule is view serializable; otherwise it is not.
Recoverable Schedules-
If a transaction performs a dirty read operation from an uncommitted transaction, and its commit operation is delayed till the uncommitted transaction either commits or rolls back, then such a schedule is called a Recoverable Schedule.
Types of Recoverable Schedules-
Cascading Schedule
Cascadeless Schedule
Strict Schedule
Cascading Schedule-
If in a schedule, failure of one transaction causes several other dependent transactions to roll back or abort, then such a schedule is called a Cascading Schedule; the phenomenon is also called a Cascading Rollback or Cascading Abort.
It simply leads to the wastage of CPU time.
For example, consider a schedule in which:
 Transaction T2 depends on transaction T1.
 Transaction T3 depends on transaction T2.
 Transaction T4 depends on transaction T3.
In this schedule,
 The failure of transaction T1 causes the transaction T2 to roll back.
 The rollback of transaction T2 causes the transaction T3 to roll back.
 The rollback of transaction T3 causes the transaction T4 to roll back.
 Such a rollback is called a Cascading Rollback.
Cascadeless Schedule-
If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it has committed or aborted, then such a schedule is called a Cascadeless Schedule.
In other words,
A cascadeless schedule allows only committed read operations.
Therefore, it avoids cascading rollback and thus saves CPU time.
Strict Schedule-
If in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that has written it has committed or aborted, then such a schedule is called a Strict Schedule.
In other words,
A strict schedule allows only committed read and write operations.
Clearly, a strict schedule imposes more restrictions than a cascadeless schedule.
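As a rough illustration of the last two definitions, the sketch below classifies a schedule as cascadeless and/or strict, assuming the same (transaction, operation, item) triple representation plus explicit commit/abort entries; it is not a complete recoverability checker.

# Sketch: check the cascadeless and strict properties of a schedule.
# ops: list of (txn, op, item) with op in {"r", "w", "commit", "abort"}.
def classify(schedule):
    cascadeless, strict = True, True
    last_writer = {}        # item -> txn holding the last uncommitted write
    for txn, op, item in schedule:
        if op in ("commit", "abort"):
            last_writer = {k: v for k, v in last_writer.items() if v != txn}
        elif item in last_writer and last_writer[item] != txn:
            if op == "r":
                cascadeless = strict = False   # dirty read
            elif op == "w":
                strict = False                 # dirty write
        if op == "w":
            last_writer[item] = txn
    return cascadeless, strict

s = [("T1","w","A"), ("T2","r","A"), ("T1","commit",None), ("T2","commit",None)]
print(classify(s))   # (False, False): T2 read A while T1 was still uncommitted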
Concurrency control
Concurrency control is the procedure in a DBMS for managing simultaneous operations without their conflicting with one another.
It is provided in a database to:
(i) enforce isolation among transactions.
(ii) preserve database consistency through consistency-preserving execution of transactions.
(iii) resolve read-write and write-read conflicts.
Various concurrency control techniques are:
1. Two-phase locking protocol
2. Timestamp ordering protocol
3. Multiversion concurrency control
4. Validation concurrency control
Example
 Assume that two people go to electronic kiosks at the same time to buy a movie ticket for the same movie and the same show time.
 However, there is only one seat left for the movie show in that particular theatre. Without concurrency control, it is possible that both moviegoers will end up purchasing a ticket. However, a concurrency control method does not allow this to happen. Both moviegoers can still access information written in the movie seating database. But concurrency control only provides a ticket to the buyer who has completed the transaction process first.
Concurrency Control Protocols
The concurrency control protocols can be divided into three categories:
1. Lock based protocol
2. Time-stamp protocol
3. Validation based protocol
1. Lock Based Protocol
 In this type of protocol, no transaction can read or write data until it acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
 It is also known as a Read-only lock. Under a shared lock, the data item can only be read by the transaction.
 It can be shared between transactions, because while a transaction holds a shared lock it can't update the data item.
2. Exclusive lock:
 Under an exclusive lock, the data item can be both read and written by the transaction.
 This lock is exclusive: while it is held, multiple transactions cannot modify the same data item simultaneously.
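The compatibility rules reduce to: shared is compatible only with shared, and exclusive with nothing. Below is a minimal, assumed lock-table sketch in Python (lock upgrades and waiting queues are omitted); it is not a real DBMS interface.

# Sketch of a lock table with shared (S) and exclusive (X) modes.
class LockTable:
    def __init__(self):
        self.locks = {}                 # item -> list of (txn, mode)

    def can_grant(self, item, mode):
        held = self.locks.get(item, [])
        if not held:
            return True
        # S is compatible only with S; X is compatible with nothing.
        return mode == "S" and all(m == "S" for _, m in held)

    def lock(self, txn, item, mode):
        if not self.can_grant(item, mode):
            return False                # caller must wait (or handle deadlock)
        self.locks.setdefault(item, []).append((txn, mode))
        return True

    def unlock(self, txn, item):
        self.locks[item] = [(t, m) for t, m in self.locks.get(item, []) if t != txn]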
Lock Protocols
There are four types of lock protocols available:
1. Simplistic lock protocol
2. Pre-claiming lock protocol
3. Two-phase locking (2PL)
4. Strict two-phase locking (Strict-2PL)
1. Simplistic Lock Protocol
It is the simplest way of locking data during a transaction.
Simplistic lock-based protocols require every transaction to obtain a lock on the data before performing an insert, delete, or update on it.
The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
 Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
 Before initiating execution of the transaction, it requests the DBMS for locks on all those data items.
 If all the locks are granted, then this protocol allows the transaction to begin. When the transaction is completed, it releases all the locks.
 If all the locks are not granted, then the transaction rolls back and waits until all the locks are granted.
3. Two-phase locking (2PL)
The two-phase locking protocol divides the execution of the transaction into three parts.
In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
In the second part, the transaction acquires all the locks (the growing phase). The third phase (the shrinking phase) starts as soon as the transaction releases its first lock.
In the third phase, the transaction cannot demand any new locks. It only releases the acquired locks.
4. Strict Two-phase locking (Strict-2PL)
The first phase of Strict-2PL is the same as in 2PL: after acquiring all the locks, the transaction continues to execute normally.
The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock after using it.
Strict-2PL waits until the whole transaction commits, and then it releases all the locks at once.
The Strict-2PL protocol therefore has no gradual shrinking phase of lock release.
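Using the hypothetical LockTable sketch from the lock-based protocol slide, the difference between 2PL and Strict-2PL is purely the ordering of the lock/unlock calls, as this illustrative trace shows.

# Illustrative 2PL vs Strict-2PL call ordering (reuses the LockTable sketch).
lt = LockTable()

# --- Plain 2PL: growing phase, then shrinking phase ---
lt.lock("T1", "A", "X")     # growing: acquire
lt.lock("T1", "B", "X")     # growing: acquire
lt.unlock("T1", "A")        # shrinking begins: no new locks allowed after this
lt.unlock("T1", "B")

# --- Strict-2PL: hold every lock until commit ---
lt.lock("T2", "A", "X")
lt.lock("T2", "B", "X")
# ... perform all reads/writes, then commit ...
for item in ("A", "B"):     # all locks released together at commit
    lt.unlock("T2", item)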
Error / Crash Recovery
A DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depend on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, it is expected that the system would follow some sort of algorithm or technique to recover the lost data.
Failure Classification
1. Transaction Failure
2. System Crash
3. Disk Failure
Transaction failure
A transaction has to abort when it fails to execute or when it
reaches a point from where it can’t go any further. This is called
transaction failure where only a few transactions or processes
are hurt.
 Reasons for a transaction failure could be −
 Logical errors − Where a transaction cannot complete because
it has some code error or any internal error condition.
 System errors − Where the database system itself terminates an
active transaction because the DBMS is not able to execute it, or
it has to stop because of some system condition. For example, in
case of deadlock or resource unavailability, the system aborts an
active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop abruptly and crash. For example, interruptions in the power supply may cause the failure of underlying hardware or software. Examples may include operating system errors.
Disk Failure
In the early days of technology evolution, it was a common problem that hard-disk drives or storage drives used to fail frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure which destroys all or part of the disk storage.
Storage Structure
 In brief, the storage structure can be divided into two
categories −
 Volatile storage − As the name suggests, volatile storage cannot
survive system crashes. Volatile storage devices are placed very close
to the CPU; normally they are embedded in the chipset itself. Main
memory and cache memory are examples of volatile storage. They are
fast but can store only a small amount of information.
 Non-volatile storage − These memories are made to survive
system crashes. They are huge in data storage capacity, but
slower in accessibility. Examples may include hard-disks,
magnetic tapes, flash memory, and non-volatile (battery
backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items.
 Transactions are made of various operations, which are atomic in nature. But according to the ACID properties of a DBMS, atomicity of the transaction as a whole must be maintained, that is, either all the operations are executed or none.
Recovery and Atomicity
 When a DBMS recovers from a crash, it should
maintain the following −
 It should check the states of all the transactions,
which were being executed.
 A transaction may be in the middle of some
operation; the DBMS must ensure the atomicity of
the transaction in this case.
 It should check whether the transaction can be
completed now or it needs to be rolled back.
 No transactions would be allowed to leave the
DBMS in an inconsistent state.
Recovery and Atomicity
There are two types of techniques which can help a DBMS in recovering as well as maintaining the atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database.
Maintaining shadow paging, where the changes are done on volatile memory, and later, the actual database is updated.
Log-based Recovery
The log is a sequence of records, which maintains a record of the actions performed by a transaction.
It is important that the logs are written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log-based Recovery (Cont.)
The database can be modified using two approaches −
Deferred database modification − All logs are written on to the stable storage, and the database is updated only when the transaction commits.
Immediate database modification − Each database modification follows its log record; that is, the database is modified immediately after every operation.
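A minimal sketch contrasting the two approaches, with log records of the assumed form <Tn, item, old value, new value>; everything here is illustrative, not a real recovery manager.

# Sketch: deferred vs. immediate database modification.
db = {"A": 100}
log = []

# Immediate modification: log first, then update the database right away.
log.append(("T1", "A", db["A"], 50))   # <T1, A, old=100, new=50>
db["A"] = 50                            # modified before T1 commits
log.append(("T1", "commit"))

# Deferred modification: buffer new values; apply them only at commit.
pending = {}
log.append(("T2", "A", db["A"], 25))
pending["A"] = 25                       # database untouched so far
log.append(("T2", "commit"))
db.update(pending)                      # updates hit the database at commit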
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are interleaved.
At the time of recovery, it would become
hard for the recovery system to backtrack
all logs, and then start recovering.
To ease this situation, most modern
DBMS use the concept of 'checkpoints'.
Checkpoint
 Keeping and maintaining logs in real time and in real
environment may fill out all the memory space
available in the system.
 As time passes, the log file may grow too big to be
handled at all.
 Checkpoint is a mechanism where all the previous
logs are removed from the system and stored
permanently in a storage disk.
 Checkpoint declares a point before which the DBMS
was in consistent state, and all the transactions were
committed.
Recovery
 The recovery system reads the logs backwards from the end to the last checkpoint.
 It maintains two lists, an undo-list and a redo-list.
 If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
 If the recovery system sees a log with <Tn, Start> but no commit or abort log is found, it puts the transaction in the undo-list.
 All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list and their previous logs are removed and then redone before saving their logs.
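The redo-list/undo-list construction translates almost line for line into code. A sketch, assuming the log is a list of tuples:

# Sketch: building the redo and undo lists from a log after a crash.
# log entries: ("start", T), ("commit", T), ("abort", T).
def build_lists(log):
    started, committed, aborted = set(), set(), set()
    for kind, txn in log:
        if kind == "start":
            started.add(txn)
        elif kind == "commit":
            committed.add(txn)
        elif kind == "abort":
            aborted.add(txn)
    redo = committed                         # saw <Tn,Start> ... <Tn,Commit>
    undo = started - committed - aborted     # started but never finished
    return redo, undo

log = [("start", "T1"), ("start", "T2"), ("commit", "T1")]
print(build_lists(log))   # ({'T1'}, {'T2'}): redo T1, undo T2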
Recovery (Cont.)
[Figure: recovery examples with undo and redo logging]
Shadow Paging
 Shadow paging is a recovery technique that is used to recover the database.
 It is a copy-on-write technique for avoiding in-place updates of pages.
 It is a technique for providing atomicity and durability in database systems.
 In this recovery technique, the database is considered to be made up of fixed-size logical units of storage which are referred to as pages.
 Pages are mapped into physical blocks of storage with the help of the page table, which allows one entry for each logical page of the database.
 This method uses two page tables named the current page table and the shadow page table.
Shadow Paging
 The entries present in the current page table are used to point to the most recent database pages on disk.
 The shadow page table is created when the transaction starts and is a copy of the current page table.
 After this, the shadow page table is saved on disk, and the current page table is used for the transaction.
 Entries present in the current page table may be changed during execution, but the shadow page table is never changed. After the transaction, both tables become identical.
Shadow Paging
[Figure: current and shadow page tables during writes to pages 3 and 5]
 In this example, two write operations are performed on pages 3 and 5. Before the start of the write operation on page 3, the current page table points to the old page 3.
 When the write operation starts, the following steps are performed:
 First, a search is made for an available free block among the disk blocks.
 After finding a free block, page 3 is copied to the free block, which is represented by Page 3 (New).
 Now the current page table points to Page 3 (New) on disk, but the shadow page table still points to the old page 3 because it has not been modified.
 The changes are now propagated to Page 3 (New), which is pointed to by the current page table.
Shadow Paging
COMMIT Operation:
To commit a transaction, the following steps should be done:
All the modifications done by the transaction which are present in buffers are transferred to the physical database.
The current page table is output to disk.
The disk address of the current page table is output to the fixed location in stable storage containing the address of the shadow page table. This operation overwrites the address of the old shadow page table.
With this, the current page table becomes the shadow page table and the transaction is committed.
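A minimal sketch of the copy-on-write mechanics and the commit switch, with a Python list standing in for the disk blocks; all names here are illustrative.

# Sketch: shadow paging with copy-on-write page updates.
disk = ["page0", "page1", "page2", "page3"]       # physical blocks
shadow_table = [0, 1, 2, 3]                        # logical page -> block
current_table = list(shadow_table)                 # copied at txn start

def write_page(logical, data):
    disk.append(data)                              # copy into a free block
    current_table[logical] = len(disk) - 1         # current table points to it
    # shadow_table still points to the old block: untouched on purpose

def commit():
    global shadow_table
    # One atomic pointer switch in stable storage commits the transaction.
    shadow_table = list(current_table)

def crash_recover():
    global current_table
    current_table = list(shadow_table)             # discard uncommitted pages

write_page(3, "page3-new")   # before commit(), a crash recovers old page 3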
Shadow Paging
 Failure:
If the system crashes during execution of the transaction but before the commit operation, it is sufficient to free the modified database pages and discard the current page table. The state of the database before the transaction is recovered by reinstalling the shadow page table.
 If the crash occurs after the last write operation, it does not affect the propagation of changes made by the transaction. These changes are preserved and there is no need to perform a redo operation.
Shadow Paging
Advantages:
This method requires fewer disk accesses to perform an operation.
In this method, recovery from a crash is inexpensive and quite fast.
There is no need for operations like undo and redo.

Disadvantages:
Because updated pages change their location on disk, it is quite difficult to keep related database pages close together on disk.
During the commit operation, the blocks pointed to by the old shadow page table have to be returned to the collection of free blocks, otherwise they become inaccessible garbage.
The commit of a single transaction requires writing multiple blocks, which decreases execution speed.
It is difficult to extend this technique to allow multiple transactions to execute concurrently.
Dirty Page
 When a page is read from disk into memory, it is considered a clean page because it is identical to its equivalent on disk.
 However, once the page has been modified in memory it is marked as a dirty page: any page in the buffer pool that differs from its on-disk version is known as a dirty page.
 Simply, we can say that a page which has been modified in the buffer cache is called a 'dirty page'.
Write Ahead Log Protocol
Logs have to be kept so that when there is any failure, the DB can be recovered using the log files.
Whenever we are executing a transaction, there are two tasks –
a. one to perform the transaction and update the DB
b. another one to update the log files.
Write Ahead Log Protocol
But when should these log files be created – before executing the transaction, during the transaction, or after the transaction? Which will be helpful during a crash?
Write Ahead Log Protocol
If a log is created only after executing a transaction, there will not be any log information about the data prior to the transaction.
In addition, if a transaction fails, then there is no question of creating the log at all.
Suppose there is a media failure; then how can a log file be created? We will lose all the data if we create the log file after the transaction. Hence it is of no use while recovering the data.
Write Ahead Log Protocol
 Suppose we create the log file first, with the before value of the data. Then if the system crashes while executing the transaction, we know what its previous state / value was and we can easily revert the changes.
 Hence it is always a better idea to log the details into the log file before the transaction is executed.
 In addition, the system should be forced to update the log files first and only then write the data into the DB. That is,
 in an ATM withdrawal, each stage of the transaction should be logged into log files and stored on stable storage. Then the actual balance is updated in the DB. This guarantees the atomicity of the transaction even if the system fails. This is known as the Write-Ahead Logging Protocol.
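The protocol boils down to one rule: force the log record describing the change to stable storage before touching the database. A sketch, with a Python list standing in for the stable log:

# Sketch of the Write-Ahead Logging rule: log first, data second.
stable_log, db = [], {"balance": 500}

def wal_update(txn, key, new_value):
    record = (txn, key, db[key], new_value)    # <txn, item, old, new>
    stable_log.append(record)                  # 1. force log to stable storage
    db[key] = new_value                        # 2. only then modify the database

wal_update("T1", "balance", 400)               # e.g. one ATM withdrawal step
# After a crash, stable_log still holds the old value, so the update can be undone.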
Write Ahead Log Protocol
 But in this protocol, we have I/O access twice – once for writing the log and once for writing the actual data.
 This is reduced by keeping a log buffer in main memory – log records are kept in main memory for a certain pre-defined time period and then flushed to disk.
 The log is appended with records for a certain period; once the buffer is full or the time limit is reached, it is written to disk. This reduces the I/O time for writing the log files to disk.
Write Ahead Log Protocol
 Similarly, retrieving data from the disk also needs I/O.
 This can be reduced by maintaining the data in a page cache in main memory. That is, whenever a data item has to be retrieved, it is retrieved from the disk only the first time.
 Then it is kept in the page cache for future reference. If the same data is requested again, it is retrieved from the page cache rather than from the disk. This reduces the time for retrieval of data.
 When usage / access of this data falls below some threshold, it is removed from the page cache and the space is made available for other data.
ARIES Algorithm
 Algorithm for Recovery and Isolation Exploiting
Semantics (ARIES) is based on the Write Ahead Log
(WAL) protocol. Every update operation writes a
log record, which is one of the following:
 Undo-only log record:
Only the before image is logged. Thus, an undo
operation can be done to retrieve the old data.
 Redo-only log record:
Only the after image is logged. Thus, a redo operation
can be attempted.
 Undo-redo log record:
Both before images and after images are logged.
ARIES Algorithm
 In it, every log record is assigned a unique and
monotonically increasing log sequence number (LSN).
 Every data page has a page LSN field that is set to the LSN
of the log record corresponding to the last update on the
page.  
 WAL requires that the log record corresponding to an update
make it to stable storage before the data page corresponding
to that update is written to disk.
 For performance reasons, each log write is not immediately
forced to disk. A log tail is maintained in main memory to
buffer log writes. The log tail is flushed to disk when it gets
full. A transaction cannot be declared committed until the
commit log record makes it to disk.
ARIES Algorithm
 Once in a while the recovery subsystem writes a
checkpoint record to the log.
 The checkpoint record contains the transaction table and
the dirty page table.
 A master log record is maintained separately, in stable
storage, to store the LSN of the latest checkpoint record
that made it to disk.
 On restart, the recovery subsystem reads the master log
record to find the checkpoint’s LSN, reads the
checkpoint record, and starts recovery from there on.
ARIES Algorithm
 The recovery process actually consists of 3 phases:
 Analysis:
The recovery subsystem determines the earliest log record
from which the next pass must start. It also scans the log
forward from the checkpoint record to construct a snapshot
of what the system looked like at the instant of the crash.
 Redo:
Starting at the earliest LSN, the log is read forward and
each update redone.
 Undo:
The log is scanned backward and updates corresponding to
loser transactions are undone.
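A heavily simplified, assumed skeleton of the three passes; real ARIES tracks LSNs, pageLSNs, the dirty page table, and compensation log records, all of which are elided here.

# Heavily simplified skeleton of the three ARIES recovery passes.
# log: list of dicts; kinds: "update" (with txn, page, old, new) or "commit".
def aries_recover(log, db):
    # 1. Analysis: find loser transactions (updates with no commit record).
    committed = {r["txn"] for r in log if r["kind"] == "commit"}
    losers = {r["txn"] for r in log if r["kind"] == "update"} - committed

    # 2. Redo: repeat history forward, reapplying every logged update
    #    (real ARIES skips pages whose pageLSN shows the update is present).
    for r in log:
        if r["kind"] == "update":
            db[r["page"]] = r["new"]

    # 3. Undo: scan backward, rolling back the updates of loser transactions.
    for r in reversed(log):
        if r["kind"] == "update" and r["txn"] in losers:
            db[r["page"]] = r["old"]
    return db

log = [{"kind": "update", "txn": "T1", "page": "P1", "old": 0, "new": 5},
       {"kind": "commit", "txn": "T1"},
       {"kind": "update", "txn": "T2", "page": "P2", "old": 7, "new": 9}]
print(aries_recover(log, {}))   # {'P1': 5, 'P2': 7}: T1 redone, T2 undone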
