Module 4 Transaction Processing
T1                              T2
1. read(A)
2. A := A – 50
3. write(A)
                                read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
If T2 runs between steps 3 and 6, it sees A + B after the $50 has left
A but before it has arrived at B, and prints an inconsistent total.
Isolation can be ensured trivially by running transactions serially
◦ That is, one after the other.
However, executing multiple transactions concurrently has significant
benefits, as we will see later.
ACID Properties
A transaction is a unit of program execution that accesses and possibly
updates various data items. To preserve the integrity of data, the database
system must ensure:
Atomicity. Either all operations of the transaction are properly
reflected in the database or none are.
Consistency. Execution of a transaction in isolation preserves
the consistency of the database.
Isolation. Although multiple transactions may execute
concurrently, each transaction must be unaware of other
concurrently executing transactions. Intermediate transaction
results must be hidden from other concurrently executed
transactions.
◦ That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj finished execution before Ti started, or Tj started execution
after Ti finished.
Durability. After a transaction completes successfully, the
changes it has made to the database persist, even if there are
system failures.
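To make the transfer example above concrete, here is a minimal sketch using Python's built-in sqlite3 module (the account table, names, and balances are illustrative assumptions, not from the slides): the two updates either both commit or, on an error, both roll back, which is exactly the atomicity requirement.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
        con.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
except sqlite3.Error:
    pass  # if the block failed, neither update is visible: atomicity

print(dict(con.execute("SELECT name, balance FROM account")))  # {'A': 50, 'B': 250}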
Transaction State
Active – the initial state; the transaction stays in
this state while it is executing
Partially committed – after the final statement
has been executed.
Failed – after the discovery that normal execution
can no longer proceed.
Aborted – after the transaction has been rolled
back and the database restored to its state prior to
the start of the transaction. Two options after it
has been aborted:
◦ Restart the transaction
can be done only if there is no internal logical error
◦ Kill the transaction
Committed – after successful completion.
Transaction State (Cont.)
[State-transition diagram: active → partially committed → committed;
active → failed → aborted]
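The diagram's legal transitions can be encoded directly; the following is an illustrative Python sketch (not part of the slides) of the transaction state machine described above.

# Transaction state diagram as a transition table (illustrative sketch).
VALID_TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),  # terminal: restart or kill the transaction
    "committed":           set(),  # terminal
}

def transition(state: str, new_state: str) -> str:
    if new_state not in VALID_TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

s = "active"
s = transition(s, "partially committed")
s = transition(s, "committed")  # the normal successful path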
Concurrent Executions
Multiple transactions are allowed to run
concurrently in the system. Advantages are:
◦ Increased processor and disk utilization, leading to
better transaction throughput
E.g. one transaction can be using the CPU while another is
reading from or writing to the disk
◦ Reduced average response time for transactions:
short transactions need not wait behind long ones.
Concurrency control schemes – mechanisms
to achieve isolation
◦ That is, to control the interaction among the
concurrent transactions in order to prevent them from
destroying the consistency of the database
Schedules
Schedule – a sequence of instructions that specifies the
chronological order in which instructions of concurrent
transactions are executed
◦ A schedule for a set of transactions must consist of all
instructions of those transactions
◦ Must preserve the order in which the instructions appear in
each individual transaction.
A transaction that successfully completes its execution
will have a commit instruction as the last statement
◦ By default, a transaction is assumed to execute a commit
instruction as its last step
A transaction that fails to successfully complete its
execution will have an abort instruction as the last
statement
Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A
to B.
An example of a serial schedule in which T1 is followed by T2:
T1                      T2
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
                        read(A)
                        temp := A * 0.1
                        A := A – temp
                        write(A)
                        read(B)
                        B := B + temp
                        write(B)
Schedule 2
A serial schedule in which T2 is followed by T1: the same instructions
as in Schedule 1, but with all of T2's instructions executed before
all of T1's.
Follow these steps to check whether a given non-serial schedule is conflict
serializable or not-
Step-01:
Find and list all the conflicting operations.
Step-02:
Start creating a precedence graph by drawing one node for each transaction.
Step-03:
Draw an edge for each conflict pair: if Xi(V) and Yj(V) form a conflict
pair, then draw an edge from Ti to Tj.
This ensures that Ti gets executed before Tj.
Step-04:
Check if there is any cycle formed in the graph.
If there is no cycle found, then the schedule is conflict serializable otherwise not.
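The four steps above translate directly into code. Below is a short Python sketch (an illustration; the input format of (transaction, operation, item) triples is an assumption) that builds the precedence graph and tests it for a cycle with a depth-first search.

from collections import defaultdict

def is_conflict_serializable(schedule):
    # Steps 1-3: find conflicting operation pairs and add edges Ti -> Tj.
    # Two operations conflict if they come from different transactions,
    # touch the same item, and at least one of them is a write.
    edges = defaultdict(set)
    txns = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        txns.add(ti)
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    # Step 4: conflict serializable iff the precedence graph is acyclic.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}
    def has_cycle(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False
    return not any(color[t] == WHITE and has_cycle(t) for t in txns)

# R1(A) R2(A) W1(A) W2(A): edges T1 -> T2 and T2 -> T1 form a cycle.
s = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(is_conflict_serializable(s))  # False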
Checking Whether a Schedule is View Serializable or Not-
Method-01:
Check whether the given schedule is conflict
serializable or not.
If the given schedule is conflict serializable, then it
is surely view serializable. Stop and report your
answer.
If the given schedule is not conflict serializable,
then it may or may not be view serializable. Go and
check using other methods.
Method-02:
Check whether the schedule contains any blind write (a write operation
on a data item that is not preceded by a read of that item by the same
transaction).
If there is no blind write, then the schedule is surely not view
serializable.
If there is a blind write, then the schedule may or may not be view
serializable; check by trying to find a valid serial order.
Cascading Rollback-
Consider a schedule in which T2 reads a value written by the
uncommitted transaction T1, T3 reads a value written by T2, and T4
reads a value written by T3. In this schedule,
The failure of transaction T1 causes transaction T2 to roll back.
The rollback of transaction T2 causes transaction T3 to roll back.
The rollback of transaction T3 causes transaction T4 to roll back.
Such a rollback is called a Cascading Rollback.
Cascadeless Schedule-
If in a schedule, a transaction is not allowed to
read a data item until the last transaction that
has written it is committed or aborted, then
such a schedule is called a Cascadeless
Schedule.
In other words,
a cascadeless schedule allows only committed
read operations.
Therefore, it avoids cascading rollback and
thus saves CPU time.
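The definition above can be checked mechanically. The sketch below is illustrative (the schedule format of (transaction, action, item) tuples is an assumption): it returns False as soon as a transaction reads another transaction's uncommitted write.

def is_cascadeless(schedule):
    last_writer = {}   # item -> transaction with the latest write on it
    finished = set()   # committed or aborted transactions
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action in ("COMMIT", "ABORT"):
            finished.add(txn)
        elif action == "R":
            w = last_writer.get(item)
            # Reading another transaction's uncommitted write = dirty read.
            if w is not None and w != txn and w not in finished:
                return False
    return True

# T2 reads A before T1 commits, so this schedule is not cascadeless.
s = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "COMMIT", None)]
print(is_cascadeless(s))  # False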
Strict Schedule-
If in a schedule, a transaction is neither allowed to read nor write a
data item until the last transaction that has written it is committed
or aborted, then such a schedule is called a Strict Schedule. Strict
schedules are cascadeless and make recovery simpler.
Failure Classification-
1. Transaction Failure
2. System Crash
3. Disk Failure
Transaction failure
A transaction has to abort when it fails to execute or when it
reaches a point from where it can’t go any further. This is called a
transaction failure, where only a few transactions or processes
are affected.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because
it has some code error or any internal error condition.
System errors − Where the database system itself terminates an
active transaction because the DBMS is not able to execute it, or
it has to stop because of some system condition. For example, in
case of deadlock or resource unavailability, the system aborts an
active transaction.
System Crash
A system crash is a problem external to the transaction − for example,
a power failure or a hardware or software fault − that causes the
system to halt abruptly. The content of volatile storage (main memory)
is lost, but non-volatile storage survives.
Shadow Paging
Shadow paging is a recovery technique that maintains two page tables
during a transaction: the current page table, through which updated
copies of pages are reached, and the shadow page table, which is never
modified and therefore always points to the database state before the
transaction started.
Disadvantages:
Because an updated page is written to a new location on disk, it is
quite difficult to keep related database pages close together on disk.
After a commit, the old versions of the changed blocks − the ones still
pointed to by the shadow page table − have to be returned to the
collection of free blocks; otherwise they become inaccessible garbage.
The commit of a single transaction requires multiple blocks to be
written, which decreases execution speed.
It is difficult to extend this technique to allow multiple transactions
to execute concurrently.
Dirty Page
When a page is read from disk into memory, it is
considered a clean page because it is identical to its
copy on disk.
However, once the page has been modified in memory, it
is marked as a dirty page: any page in the buffer pool
whose contents differ from the disk copy is a dirty page.
Simply put, a page that has been modified in
the buffer cache is called a ‘dirty page’.
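A toy buffer pool makes the clean/dirty distinction explicit. The sketch below is an illustration, not a real DBMS buffer manager: a page becomes dirty the moment it is modified in memory and clean again once it is flushed to disk.

class BufferPool:
    def __init__(self, disk):
        self.disk = disk      # page_id -> bytes (stands in for the disk)
        self.pages = {}       # page_id -> bytes, the in-memory copies
        self.dirty = set()    # page_ids whose memory copy differs from disk

    def read(self, page_id):
        if page_id not in self.pages:
            self.pages[page_id] = self.disk[page_id]  # clean on load
        return self.pages[page_id]

    def write(self, page_id, data):
        self.pages[page_id] = data
        self.dirty.add(page_id)       # now a dirty page

    def flush(self, page_id):
        self.disk[page_id] = self.pages[page_id]
        self.dirty.discard(page_id)   # clean again

pool = BufferPool({1: b"old"})
pool.read(1); pool.write(1, b"new")
print(pool.dirty)  # {1}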
Write Ahead Log Protocol
Logs have to be kept so that when there is any
failure, the DB can be recovered using the log
files.
Whenever we execute a transaction, there are
two tasks:
a. one is to perform the transaction and update
the DB
b. the other is to update the log files.
Write Ahead Log Protocol
But when should these log entries be created –
before executing the transaction, during the
transaction, or after it? Which option will be
helpful after a crash?
Write Ahead Log Protocol
When a log is created after executing a
transaction, there will not be any log information
about the data prior to the transaction.
In addition, if a transaction fails, then there is no
question of creating the log at all.
Suppose there is a media failure − then how can a log
file be created? We would lose all the data if we
created the log file after the transaction. Hence such
a log is of no use while recovering the data.
Write Ahead Log Protocol
Suppose we first created a log record with the before value of the
data. Then if the system crashes while executing the
transaction, we know what its previous state / value
was and we can easily revert the changes.
Hence it is always a better idea to log the details into the log
file before the transaction is executed.
In addition, the system should be forced to update the log files first
and only then write the data into the DB. For example,
in an ATM withdrawal, each stage of the transaction is first
logged into the log files and forced to stable storage;
only then is the actual balance updated in the DB.
This guarantees the atomicity of the transaction even if
the system fails. This is known as the Write-Ahead Logging
Protocol.
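The rule − force the log record to stable storage first, update the database second − fits in a few lines. The sketch below is a toy illustration (the file names and JSON record format are assumptions, not a real DBMS log format).

import json, os

LOG, DATA = "wal.log", "data.json"

def load():
    return json.load(open(DATA)) if os.path.exists(DATA) else {}

def update(key, new_value):
    db = load()
    record = {"key": key, "before": db.get(key), "after": new_value}
    # 1. Append the log record and force it to stable storage FIRST ...
    with open(LOG, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. ... and only then write the update to the database itself.
    db[key] = new_value
    with open(DATA, "w") as f:
        json.dump(db, f)

update("balance", 950)  # e.g. an ATM withdrawal of 50 from a balance of 1000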
Write Ahead Log Protocol
But in this protocol, we have I/O access twice – once for
writing the log and once for writing the actual
data.
This is reduced by keeping a log buffer in main
memory – log records are kept in main memory for a
certain pre-defined time period and then flushed to the
disk.
The log file is appended with records for a certain period;
once the buffer is full or the time limit is reached, it
is written to the disk. This reduces the I/O time for
writing the log files to the disk.
Write Ahead Log Protocol
Similarly, retrieving data from the disk also needs I/O.
This can be reduced by maintaining the data in a page cache
in main memory. That is, whenever a data item has to be
retrieved, it is read from the disk the first time and then
kept in the page cache for future reference. If the same
data is requested again, it is served from this page cache
rather than retrieved from the disk. This reduces the time
for retrieval of data.
When the usage / access of this data drops below some
threshold, it is removed from the page cache and the space
is made available for other data.
ARIES Algorithm
Algorithm for Recovery and Isolation Exploiting
Semantics (ARIES) is based on the Write Ahead Log
(WAL) protocol. Every update operation writes a
log record, which is one of the following:
Undo-only log record:
Only the before image is logged. Thus, an undo
operation can be done to retrieve the old data.
Redo-only log record:
Only the after image is logged. Thus, a redo operation
can be attempted.
Undo-redo log record:
Both before images and after images are logged.
ARIES Algorithm
In ARIES, every log record is assigned a unique and
monotonically increasing log sequence number (LSN).
Every data page has a page LSN field that is set to the LSN
of the log record corresponding to the last update on the
page.
WAL requires that the log record corresponding to an update
make it to stable storage before the data page corresponding
to that update is written to disk.
For performance reasons, each log write is not immediately
forced to disk. A log tail is maintained in main memory to
buffer log writes. The log tail is flushed to disk when it gets
full. A transaction cannot be declared committed until the
commit log record makes it to disk.
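The pageLSN/flushedLSN interaction can be sketched as follows (an illustrative toy, not the real ARIES code): before a page may be written to disk, the log must be flushed at least up to that page's pageLSN.

class Log:
    def __init__(self):
        self.records, self.flushed_lsn, self.next_lsn = [], 0, 1

    def append(self, record):
        lsn, self.next_lsn = self.next_lsn, self.next_lsn + 1
        self.records.append((lsn, record))  # buffered in the log tail
        return lsn

    def flush(self, up_to_lsn):
        # (A real system writes the log tail to stable storage here.)
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

class Page:
    def __init__(self):
        self.page_lsn = 0  # LSN of the last update applied to this page

def write_page_to_disk(page, log):
    if log.flushed_lsn < page.page_lsn:
        log.flush(page.page_lsn)  # enforce WAL before the page goes out
    # ... now the data page may safely be written to disk ...

log, page = Log(), Page()
page.page_lsn = log.append({"op": "update", "page": 7})
write_page_to_disk(page, log)  # forces the log up to LSN 1 first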
ARIES Algorithm
Once in a while the recovery subsystem writes a
checkpoint record to the log.
The checkpoint record contains the transaction table and
the dirty page table.
A master log record is maintained separately, in stable
storage, to store the LSN of the latest checkpoint record
that made it to disk.
On restart, the recovery subsystem reads the master log
record to find the checkpoint’s LSN, reads the
checkpoint record, and starts recovery from there on.
ARIES Algorithm
The recovery process actually consists of 3 phases:
Analysis:
The recovery subsystem determines the earliest log record
from which the next pass must start. It also scans the log
forward from the checkpoint record to construct a snapshot
of what the system looked like at the instant of the crash.
Redo:
Starting at the earliest LSN determined by the analysis pass,
the log is read forward and each update is redone.
Undo:
The log is scanned backward and updates corresponding to
loser transactions are undone.
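A heavily simplified sketch of the three passes is shown below; it is illustrative only (real ARIES also uses prevLSN chains, CLRs, and the dirty page table to bound each pass), and the flat record format is an assumption.

def recover(log, db):
    # Analysis: transactions with updates but no commit record are losers.
    losers = {rec["txn"] for rec in log if rec["type"] == "update"}
    losers -= {rec["txn"] for rec in log if rec["type"] == "commit"}
    # Redo: repeat history forward, reapplying every update.
    for rec in log:
        if rec["type"] == "update":
            db[rec["item"]] = rec["after"]
    # Undo: scan backward, rolling back the losers' updates.
    for rec in reversed(log):
        if rec["type"] == "update" and rec["txn"] in losers:
            db[rec["item"]] = rec["before"]
    return db

log = [
    {"txn": "T1", "type": "update", "item": "A", "before": 100, "after": 50},
    {"txn": "T2", "type": "update", "item": "B", "before": 200, "after": 250},
    {"txn": "T1", "type": "commit"},
    # crash here: T2 never committed, so it is a loser
]
print(recover(log, {}))  # {'A': 50, 'B': 200}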