Distributed Deadlocks
Abstract
Distributed systems, in general, exhibit a high degree of resource and data sharing, a
situation in which deadlocks may happen. Deadlocks arise when members of a group of
processes which hold resources are blocked indefinitely from access to resources held by
other processes within the group.
Deadlock prevention is commonly achieved by either having a process acquire all the
needed resources simultaneously before it begins execution or by pre-empting a process
that holds the needed resource.
System Model
1. A distributed system consists of a set of processors that are connected by a
communication network. The communication delay is finite but unpredictable.
2. A distributed program is composed of a set of n asynchronous processes that
communicate by message passing over the communication network.
3. Each process is running on a different processor.
4. The processors do not share a common global memory and communicate solely
by passing messages over the communication network.
5. There is no physical global clock in the system to which processes have
instantaneous access.
6. The communication medium may deliver messages out of order, messages may be
lost, garbled, or duplicated due to timeout and retransmission, processors may
fail, and communication links may go down.
Comparison of Three Approaches
Here is a simple comparison of the three approaches to handling deadlocks in distributed systems:
Deadlock Prevention
Deadlock prevention is commonly achieved either by having a process acquire all the
needed resources simultaneously before it begins executing or by pre-empting a process
that holds the needed resource. This approach is highly inefficient because it decreases
system concurrency, and it is impractical in distributed systems.
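A closely related prevention technique, sketched below in illustrative Python (the resource names and helper functions are my own, and a single machine stands in for the distributed sites), is to acquire every needed lock up front in one fixed global order, so that a circular wait can never form:

```python
import threading

# Illustrative resource table; in a distributed system these would live on
# different sites and the order would be a globally agreed ranking.
resources = {name: threading.Lock() for name in ("A", "B", "C")}

def acquire_all(names):
    """Acquire every lock a task needs before it starts work, always in
    the same global (sorted) order, so no circular wait can arise."""
    for name in sorted(names):
        resources[name].acquire()

def release_all(names):
    for name in sorted(names, reverse=True):
        resources[name].release()

# Even if two tasks name their resources in different orders, both will
# physically acquire them in the same (sorted) order.
acquire_all(["B", "A"])
# ... critical section using A and B ...
release_all(["B", "A"])
```

The cost noted above is visible here: a task holds every resource for its entire run, even ones it needs only briefly, which reduces concurrency.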
Deadlock Avoidance
In deadlock avoidance approach to distributed systems, a resource is granted to a process
if the resulting global system state is safe (a global state includes all the processes and
resources of the distributed system).
Although deadlock avoidance strategies are often used in centralized systems, they are
rarely used in distributed systems. This is because checking for safe states is
computationally expensive, owing to the large number of processes and resources in
distributed systems and the absence of a global clock.
Deadlock Detection
Deadlock detection requires an examination of the status of process–resource interactions
for the presence of a cyclic wait. Deadlock detection seems to be the best approach for
handling deadlocks in distributed systems.
Wait-For Graph
In distributed systems, the state of the system can be modeled by a directed graph, called a
wait-for graph (WFG). In a WFG, nodes are processes and there is a directed edge from
node P1 to node P2 if P1 is blocked and is waiting for P2 to release some resource. A
system is deadlocked if and only if there exists a directed cycle in the WFG.
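The cycle condition above can be sketched in a few lines of Python (an illustrative, centralized sketch; the process names are arbitrary):

```python
# Detect deadlock in a wait-for graph (WFG).
# Nodes are processes; an edge P1 -> P2 means P1 is blocked waiting on P2.
# The system is deadlocked iff the WFG contains a directed cycle.

def has_deadlock(wfg):
    """wfg: dict mapping each process to the processes it waits for."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on DFS stack / done
    color = {p: WHITE for p in wfg}

    def visit(p):
        color[p] = GRAY
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GRAY:    # back edge: a cycle
                return True
            if color.get(q, WHITE) == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in list(wfg))

# P1 -> P2 -> P3 -> P1 is a cycle, so the first call reports a deadlock.
print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))  # True
print(has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": []}))      # False
```

In a real distributed system no site holds this whole graph, which is exactly why the distributed algorithms later in this section are needed.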
Steps to Follow
For handling deadlocks, the following steps are required:
1. Maintaining a wait-for graph and searching it for the presence of deadlock. A
cycle may consist of several sites.
2. Breaking existing wait-for dependencies between the processes to resolve the
deadlock. It involves rolling back one or more deadlocked processes and
assigning their resources to blocked processes so that they can resume execution.
Correctness Criteria
A deadlock detection algorithm must satisfy the following two conditions:
1. Progress (no undetected deadlocks): The algorithm must detect all existing
deadlocks in a finite time. Once a deadlock has occurred, the deadlock detection
activity should continuously progress until the deadlock is detected.
2. Safety (no false deadlocks): The algorithm should not report deadlocks that do
not exist (called phantom or false deadlocks).
The OR Model
In the OR model, a process can request multiple resources simultaneously. The
requesting process can proceed with execution as soon as it has acquired at least one of
those resources.
AND-OR Model
In the AND-OR model, a request may specify any combination of AND and OR in the
resource request. For example, in the AND-OR model, a request for multiple resources
can be of the form x and (y or z). The requested resources may exist at different
locations.
P-out-of-Q Model
In the P-out-of-Q model, a process simultaneously requests Q resources and remains
blocked until it is granted any P of those resources. Every request in the AND-OR model
can be expressed in the P-out-of-Q model and vice versa.
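The unblocking conditions of these models can be made concrete with a small sketch (illustrative code; the function names and the resource letters x, y, z are my own):

```python
# When does a blocked process unblock under each request model?

def and_satisfied(granted, needed):
    return set(needed) <= set(granted)              # AND: all resources held

def or_satisfied(granted, needed):
    return bool(set(needed) & set(granted))         # OR: at least one held

def p_out_of_q_satisfied(granted, requested, p):
    return len(set(granted) & set(requested)) >= p  # any p of the q granted

# The AND-OR request "x and (y or z)", checked against holdings {x, z}:
granted = {"x", "z"}
ok = and_satisfied(granted, ["x"]) and or_satisfied(granted, ["y", "z"])
print(ok)  # True: x is held and one of {y, z} (here z) is held

# The same request in P-out-of-Q form: "x" is a 1-out-of-1 request and
# "(y or z)" is a 1-out-of-2 request.
print(p_out_of_q_satisfied(granted, ["x"], 1)
      and p_out_of_q_satisfied(granted, ["y", "z"], 1))  # True
```

The last two lines illustrate the equivalence claim: each AND or OR clause becomes a P-out-of-Q sub-request.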
Distributed Deadlock Detection Algorithms
Classification
Distributed deadlock detection algorithms can be divided into four classes: path-pushing,
edge-chasing, diffusion computation, and global state detection.
Path-pushing algorithms
In path-pushing algorithms, distributed deadlocks are detected by maintaining an explicit
global WFG. The basic idea is to build a global WFG for each site of the distributed
system. In this class of algorithms, whenever deadlock computation is performed, each
site sends its local WFG to all the neighboring sites. After the local data structure of each
site is updated, this updated WFG is then passed along to other sites, and the procedure is
repeated until one site has a sufficiently complete picture of the global state to announce
deadlock or to establish that no deadlocks are present.
Edge-chasing algorithms
In an edge-chasing algorithm, the presence of a cycle in a distributed graph structure is
verified by propagating special messages called probes along the edges of the graph.
These probe messages are distinct from the request and reply messages. The formation of a
cycle can be detected by a site if it receives the matching probe sent by it previously.
Whenever a process that is executing receives a probe message, it simply discards this
message and continues. Only blocked processes propagate probe messages along their
outgoing edges.
The main advantage of edge-chasing algorithms is that probes are fixed size messages
that are normally very short.
Each node of the WFG has two local variables, called labels: a private label, which is
unique to the node at all times, though it is not constant, and a public label, which can be
read by other processes and which may not be unique. Each process is represented as u/v,
where u and v are the public and private labels, respectively. Initially, private and public
labels are equal for each process. A global WFG is maintained and it defines the entire
state of the system. The algorithm is defined by the four state transitions shown in the
figure:
Here z = inc(u, v), where inc(u, v) yields a unique label greater than both u and v. Labels that are
not shown do not change. Block creates an edge in the WFG. Two messages are needed:
one resource request and one message back to the blocked process to inform it of the
public label of the process it is waiting for. Activate denotes that a process has acquired
the resource from the process it was waiting for. Transmit propagates larger labels in the
opposite direction to the edges by sending a probe message. Whenever a process receives
a probe that is less than its public label, it simply ignores that probe. Detect means that
the probe with the private label of some process has returned to it, indicating a deadlock.
Whenever a process receives a signal, it compares its id with the one associated with the
signal and keeps the larger one in the outgoing signal. A process detects a deadlock when
it receives its own id.
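The four transitions can be simulated on one machine, collapsing the probe messages into direct reads of public labels (a rough sketch; the Proc class, the integer labels, and this particular inc() are my own simplifications):

```python
# Simulation of Block / Transmit / Detect on integer labels.
counter = 0

def inc(u, v):
    """Return a fresh label greater than both u and v."""
    global counter
    counter = max(counter, u, v) + 1
    return counter

class Proc:
    def __init__(self):
        self.public = self.private = inc(0, 0)
        self.waits_for = None            # the process this one is blocked on

def block(p, q):
    """Block: p waits on q and takes a new label exceeding both publics."""
    p.waits_for = q
    p.public = p.private = inc(p.public, q.public)

def run(procs, max_steps=100):
    """Transmit larger public labels against the WFG edges; Detect fires
    when a process sees its own private label on the process it waits on."""
    for _ in range(max_steps):
        for p in procs:
            q = p.waits_for
            if q is None:
                continue
            if q.public == p.private:    # Detect: own label came back
                return p
            if q.public > p.public:      # Transmit: adopt the larger label
                p.public = q.public
    return None                          # no deadlock found

a, b, c = Proc(), Proc(), Proc()
block(a, b); block(b, c); block(c, a)    # a cycle: deadlock
print(run([a, b, c]) is c)               # True: the last blocker detects it
```

Note how the largest label (created by the last Block in the cycle) travels all the way around and returns to its creator, which then declares deadlock.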
Data structures
Each process Pi maintains a boolean array, dependent-i, where dependent-i(j) is true only
if Pi knows that Pj is dependent on it. Initially, dependent-i(j) is false for all i and j.
The algorithm
The following algorithm is executed to determine whether a blocked process is
deadlocked. A probe message is circulated along the edges of the global WFG, and a
deadlock is detected when a probe message returns to its initiating process.
On the receipt of a probe(i, j, k), the site takes the following actions:
if
    (a) Pk is blocked, and
    (b) dependent-k(i) is false, and
    (c) Pk has not replied to all requests of Pj,
then begin
    dependent-k(i) = true;
    if k = i
    then declare that Pi is deadlocked
    else for all Pm and Pn such that
        (a') Pk is locally dependent upon Pm, and
        (b') Pm is waiting on Pn, and
        (c') Pm and Pn are on different sites,
    send a probe(i, m, n) to the home site of Pn
end.
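A sequential sketch of this probe computation is given below (sites are elided, so "sending" a probe(i, j, k) just pushes it onto a work queue; the data-structure names are illustrative):

```python
from collections import defaultdict

def detect(initiator, waits_for):
    """waits_for[k]: the set of processes Pk is blocked on (AND model).
    Returns True iff the initiator's probe comes back to it (deadlock)."""
    dependent = defaultdict(set)   # dependent[k]: initiators Pk depends on
    queue = [(initiator, initiator, k)
             for k in waits_for.get(initiator, ())]
    while queue:
        i, j, k = queue.pop()
        # Pk is blocked and not already known to depend on Pi:
        if k in waits_for and i not in dependent[k]:
            dependent[k].add(i)
            if k == i:
                return True        # probe returned to its initiator
            for n in waits_for.get(k, ()):
                queue.append((i, k, n))   # forward probe(i, k, n)
    return False

cycle = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
print(detect("P1", cycle))   # True: P1, P2, P3 form a cycle
chain = {"P1": {"P2"}, "P2": {"P3"}}     # P3 is active
print(detect("P1", chain))   # False
```

The dependent sets also show why duplicate probes are suppressed: once Pk records its dependence on Pi, later probes for the same initiator are discarded, which bounds the message count analyzed next.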
Performance analysis
In the algorithm, one probe message (per deadlock detection initiation) is sent on every
edge of the WFG which connects processes on two sites. Thus, the algorithm exchanges
at most m(n−1)/2 messages to detect a deadlock that involves m processes and spans over
n sites. The size of messages is fixed and is very small (only three integer words). The
delay in detecting a deadlock is O(n).
Basic idea
A blocked process initiates deadlock detection by sending query messages to all
processes in its dependent set (i.e., processes from which it is waiting to receive a
message). If an active process receives a query or reply message, it discards it. When a
blocked process Pk receives a query(i, j, k) message, it takes the following actions:
1. If this is the first query message received by Pk for the deadlock detection
initiated by Pi (called the engaging query), then it propagates the query to all the
processes in its dependent set and sets a local variable num-k(i) to the number of
query messages sent.
2. If this is not the engaging query, then Pk returns a reply message to it immediately
provided Pk has been continuously blocked since it received the corresponding
engaging query. Otherwise, it discards the query.
Process Pk maintains a boolean variable wait-k(i) that denotes the fact that it has been
continuously blocked since it received the last engaging query from process Pi. When a
blocked process Pk receives a reply(i, j, k) message, it decrements num-k(i) only if
wait-k(i) holds. A process sends a reply message in response to an engaging query only
after it has received a reply to every query message it has sent out for this engaging
query.
The initiator process detects a deadlock when it has received reply messages to all the
query messages it has sent out.
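Because active processes discard queries (so their reply never arrives), the initiator receives replies to all of its queries exactly when every process reachable through the dependent sets is blocked. A sequential sketch of that condition (the message exchange is collapsed into a graph traversal; names are illustrative):

```python
def deadlocked(initiator, dependents, blocked):
    """dependents[p]: the dependent set of p (processes p awaits a message
    from); blocked: the set of currently blocked processes."""
    seen, stack = set(), [initiator]
    while stack:
        p = stack.pop()
        if p in seen:
            continue
        seen.add(p)
        if p not in blocked:       # an active process discards the query:
            return False           # a reply is missing, so no deadlock
        stack.extend(dependents.get(p, ()))
    return True                    # every query was answered: deadlock

deps = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"]}
print(deadlocked("P1", deps, {"P1", "P2", "P3"}))  # True
print(deadlocked("P1", deps, {"P1", "P2"}))        # False: P3 is active
```

The second call shows the OR-model semantics: one active process anywhere in the reachable set is enough to rule out deadlock for the initiator.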
The algorithm
The algorithm works as shown in Algorithm 10.2. For ease of presentation, we have
assumed that only one diffusion computation is initiated for a process. In practice, several
diffusion computations may be initiated for a process (a diffusion computation is initiated
every time the process gets blocked), but at any time only one diffusion computation is
current for any process. However, messages for outdated diffusion computations may still
be in transit. The current diffusion computation can be distinguished from outdated ones
by using sequence numbers.
Performance analysis
For every deadlock detection, the algorithm exchanges e query messages and e reply
messages, where e = n(n − 1) is the number of edges.
System model
The system has n nodes, and every pair of nodes is connected by a logical channel. An
event in a computation can be an internal event, a message send event, or a message
receive event.
The computation messages can be either REQUEST, REPLY, or CANCEL messages. To
execute a p(i)-out-of-q(i) request, an active node i sends q(i) REQUESTs to q(i) other
nodes and remains blocked until it receives a sufficient number of REPLY messages.
When node i blocks on node j, node j becomes a successor of node i and node i becomes
a predecessor of node j in the WFG. A REPLY message denotes the granting of a request.
A node i unblocks when p(i) out of its q(i) requests have been granted. When a node
unblocks, it sends CANCEL messages to withdraw the remaining q(i) − p(i) requests it had
sent.
Sending and receiving of REQUEST, REPLY, and CANCEL messages are computation
events. The sending and receiving of deadlock detection algorithm messages are
algorithmic or control events.
The distributed WFG is recorded using FLOOD messages in the outward sweep and the
recorded WFG is examined for deadlocks using ECHO messages in the inward sweep. To
detect a deadlock, the initiator init records its local state and sends FLOOD messages
along all of its outward dependencies. When node i receives the first FLOOD message
along an existing inward dependency, it records its local state. If node i is blocked at this
time, it sends out FLOOD messages along all of its outward dependencies to continue the
recording of the WFG in the outward sweep. If node i is active at this time (i.e., it does
not have any outward dependencies and is a leaf node in the WFG), then it initiates
reduction of the WFG by returning an ECHO message along the incoming dependency
even before the states of all incoming dependencies have been recorded in the WFG
snapshot at the leaf node.
ECHO messages perform reduction of the recorded WFG by simulating the granting of
requests in the inward sweep. A node i in the WFG is reduced if it receives ECHOs along
p(i) out of its q(i) outgoing edges, indicating that p(i) of its requests can be granted. An
edge is reduced if an ECHO is received on the edge indicating that the request it
represents can be granted. After a local snapshot has been recorded at node i, any
transition made by i from idle to active state is captured in the process of reduction. The
nodes that can be reduced do not form a deadlock whereas the nodes that cannot be
reduced are deadlocked. The order in which reduction of the nodes and edges of the WFG
is performed does not alter the final result. Node init detects the deadlock if it is not
reduced when the deadlock detection algorithm terminates.
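The reduction step in the inward sweep can be sketched on an already recorded WFG (an illustrative sketch; the dictionaries out and p and the node names are my own, and the message exchange is replaced by an iterative fixed point):

```python
def reduce_wfg(out, p):
    """out[i]: nodes i has outstanding requests to; p[i]: grants i needs.
    Returns the set of reduced (non-deadlocked) nodes."""
    reduced = {i for i in out if not out[i]}   # leaves (active) echo first
    changed = True
    while changed:
        changed = False
        for i in out:
            if i in reduced:
                continue
            # Count ECHOs received: one per already reduced successor.
            echoes = sum(1 for j in out[i] if j in reduced)
            if echoes >= p[i]:                 # p-out-of-q satisfied
                reduced.add(i)
                changed = True
    return reduced

# Nodes 1-3 each need 1 grant but wait only on each other; node 4 is active.
out = {1: {2, 3}, 2: {1}, 3: {1}, 4: set()}
p = {1: 1, 2: 1, 3: 1, 4: 0}
print(sorted(out.keys() - reduce_wfg(out, p)))  # [1, 2, 3] are deadlocked
```

The loop also illustrates the order-independence claim above: whichever order reducible nodes are processed in, the fixed point, and hence the set of deadlocked nodes, is the same.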
In general, WFG reduction can begin at a non-leaf node before recording of the WFG has
been completed at that node; this happens when an ECHO message arrives and begins
reduction at a non-leaf node before all the FLOODs have arrived at it and recorded the
complete local WFG at that node. Thus, the activities of recording and reducing the WFG
snapshot are done concurrently in a single phase; no serialization
is imposed between the two activities. Since a reduction is done on an incompletely
recorded WFG at nodes, the local snapshot at each node has to be carefully manipulated
so as to give the effect that WFG reduction is initiated after WFG recording has been
completed.
When multiple nodes block concurrently, they may each initiate the deadlock detection
algorithm concurrently. Each invocation of the deadlock detection algorithm is treated
independently and is identified by the initiator’s identity and initiator’s timestamp when it
blocked. Every node maintains a local snapshot for the latest deadlock detection
algorithm initiated by every other node.