Searching Deadlocks
Frank Huch
Christian-Albrechts-University of Kiel
Institute of Computer Science
Olshausenstr. 40, 24118 Kiel, Germany
{jac,fhu}@informatik.uni-kiel.de
ABSTRACT
This paper presents an approach to searching for deadlocks
in Concurrent Haskell programs. The search is based on
a redefinition of the IO monad which allows the reversal
of Concurrent Haskell's concurrency primitives. Hence, it
is possible to implement this search as a backtracking algorithm that checks all possible schedules of the system. The search
is integrated into the Concurrent Haskell Debugger (CHD)
and automatically looks for deadlocks in the background
while debugging. The tool is easy to use, and the small modifications of the source program are made by a preprocessor.
In the tool we use iterative deepening as the search strategy,
which quickly detects deadlocks close to the current system
configuration and makes the best use of idle time during
debugging.
General Terms
Languages
Keywords
Concurrent Haskell, debugging, deadlock, detecting deadlocks
1.
INTRODUCTION
Developing concurrent applications is a difficult task. Besides the bugs a programmer can introduce in the sequential parts
of a program, there are bugs related to concurrency and thread synchronization, such as deadlocks, livelocks,
or violated mutual exclusion. One approach to protecting programmers from such bugs is formal verification, such as
model checking or theorem proving, combined with a formal
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ICFP'04, September 19-21, 2004, Snowbird, Utah, USA.
Copyright 2004 ACM 1-58113-905-5/04/0009 ...$5.00.
2.
CONCURRENT HASKELL
Concurrency is a useful feature for many real world applications. Examples are graphical user interfaces or web
servers. Such reactive systems (have to) interact with multiple interfaces at the same time. Therefore, a programming language should support concurrency for simultaneous serving of requests.
The lazy functional programming language Haskell 98 [18]
does not support concurrency. Therefore, the extension
Concurrent Haskell [19] was proposed. Concurrent Haskell
extends the monadic IO primitives of Haskell 98 with a
thread concept and inter-thread communication by means
of shared variables. It is implemented within the Glasgow
Haskell Compiler (ghc) [7]. This section will give a short
introduction to the possibilities and usage of Concurrent
Haskell.
2.1
Threads
value. If the MVar is already full, then the thread performing putMVar is suspended until the MVar is empty again.
takeMVar reads the contents of a full MVar, leaving it empty.
Similarly to putMVar, the thread executing takeMVar is suspended if the MVar is empty, until another thread
writes a value to it. If multiple threads are suspended on the same MVar, one of them is chosen non-deterministically
to continue when the MVar is emptied (for threads suspended in putMVar)
or filled (for threads suspended in takeMVar). The implementation guarantees that each of these functions (and all the other functions on communication abstractions) is executed atomically. They cannot be interrupted by the scheduler.
The library Concurrent implements more actions on MVars
such as testing for emptiness, swapping the contents, or just
reading without emptying the MVar. Furthermore, MVars
are the basis for other, more complex communication abstractions: a slight modification of MVars (SampleVar), two
kinds of quantity semaphores (QSem and QSemN), and unbounded channels (Chan).
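The blocking rules above can be observed without suspending a thread by using the non-blocking variants of the MVar operations. The following is a minimal sketch, assuming a current GHC in which these operations are exported by Control.Concurrent.MVar (taken here to be the successor of the library Concurrent used in this paper). The name mvarDemo is hypothetical.

```haskell
import Control.Concurrent.MVar

-- Hypothetical demo: returns what each non-blocking operation observed.
mvarDemo :: IO (Bool, Bool, Int, Bool, Maybe Int)
mvarDemo = do
  box      <- newEmptyMVar
  first    <- tryPutMVar box (1 :: Int) -- succeeds: the MVar was empty
  second   <- tryPutMVar box 2          -- fails: putMVar would suspend here
  peek     <- readMVar box              -- reads without emptying the MVar
  _        <- takeMVar box              -- empties the MVar
  emptyNow <- isEmptyMVar box           -- test for emptiness
  again    <- tryTakeMVar box           -- fails: takeMVar would suspend here
  return (first, second, peek, emptyNow, again)
```

Each False or Nothing in the result marks a point at which the blocking operation would have suspended the calling thread.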
A Concurrent Haskell program is stuck in a deadlock if
all threads are suspended on communication abstractions.
If at least one thread is waiting for input from the outside
world (e.g. like in a web server), then this is not a deadlock,
since the thread may awake the other threads after receiving
input. Note that systems may contain partial deadlocks in
which only some threads are waiting for each other. In the
remainder of this paper we do not consider partial deadlocks, since
they are very difficult to identify and many situations (e.g.
an idle web server) can be seen as a partial deadlock.
2.2
Communication
For inter-thread communication, Concurrent Haskell provides different kinds of communication abstractions. The
simplest communication abstraction is the mutable variable
(MVar). An MVar can either contain a value of a specified
type or be empty.
data MVar a                              -- abstract
newEmptyMVar :: IO (MVar a)
takeMVar     :: MVar a -> IO a
putMVar      :: MVar a -> a -> IO ()
2.3
An Example
3.
4.
MOTIVATION
5.
This section describes the generation of a tree representation of the actions used in a concurrent program, similar
to Hinze's backtracking monad transformers [11]. Before we
present how various schedules can be checked by means of
this representation, we briefly discuss how Haskell's module system supports the redefinition of the IO datatype. For
convenient use of our debugger, we want to keep the name
IO for our newly defined datatype. Hence, the original
IO type has to be hidden explicitly,
since the prelude is always imported.
import Prelude hiding ( IO )
Furthermore, we also need the original IO datatype from the
prelude. We import the prelude qualified, so that the original
type is available as P.IO.
import qualified Prelude as P
To create the IO tree we have to redefine the Concurrent
Haskell functions. Since these redefinitions will use the original ones, we import the module Concurrent qualified as
well.
import qualified Concurrent as C
Hence, each function imported from the original Concurrent
Haskell library is prefixed with C. (for example, C.putMVar).
5.1
IO Redefinition
5.1.1 ConcAction
The constructor ConcAction represents all functions of
Concurrent Haskell except forkIO. Naturally, we do
not want our checker to suspend when, for example, executing takeMVar on an empty MVar. Therefore, the first
argument of ConcAction, the retractable action, is of type
P.IO (Maybe a). It yields Nothing if the retractable action
would suspend. Otherwise, it yields Just v, where v is the
result of performing the retractable action. The second
argument of ConcAction is the corresponding restore function.
data IO a = ...
          | ConcAction (P.IO (Maybe a)) (a -> P.IO ())
          ...
5.1.2
SeqAction
5.1.3
ForkAction
        if b
          then do C.putMVar searchMVar x
                  return (Just ())
          else return Nothing)
    (\_ -> do C.takeMVar searchMVar
              return ())
The retractable action of takeMVar is similar to that of
putMVar. If the MVar is empty, it yields Nothing because
takeMVar would suspend on an empty MVar. Otherwise, it
takes the value out of the MVar and yields it. When we check
for a deadlock, we execute the retractable action and, if the
value can be taken, pass the result to the restore function.
The restore function puts this value back into the MVar.
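The description of takeMVar above can be sketched as follows. This is not the original definition but a minimal, self-contained reconstruction under stated assumptions: the IO type is reduced to its ConcAction constructor, the original operations are taken from Control.Concurrent.MVar (rather than the module Concurrent), and the helper probe is hypothetical and only serves to exercise the retractable action together with its restore function.

```haskell
import Prelude hiding (IO)
import qualified Prelude as P
import qualified Control.Concurrent.MVar as C

-- Simplified stand-in for the redefined IO type: ConcAction only.
data IO a = ConcAction (P.IO (Maybe a)) (a -> P.IO ())

type MVar a = C.MVar a

-- Retractable takeMVar: Nothing if the original would suspend,
-- otherwise the taken value; the restore function puts it back.
takeMVar :: MVar a -> IO a
takeMVar searchMVar =
  ConcAction
    (do b <- C.isEmptyMVar searchMVar
        if b
          then return Nothing
          else fmap Just (C.takeMVar searchMVar))
    (\v -> C.putMVar searchMVar v)

-- Hypothetical helper: perform the retractable action and undo it at once.
probe :: IO a -> P.IO (Maybe a)
probe (ConcAction retractable restore) = do
  r <- retractable
  case r of
    Nothing -> return Nothing
    Just v  -> do restore v
                  return (Just v)
```

As with putMVar, the emptiness test and the subsequent take are only safe because no other thread runs during the search.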
5.2
Concurrent Redefinition
5.2.1 ConcAction
Retracting newEmptyMVar is simple. We just do nothing
because the garbage collector will remove the created MVar.
Thus, the restore function is \_ -> return (). Executing
newEmptyMVar can never suspend, so the retractable
action always yields a new MVar wrapped in the constructor
Just.
newEmptyMVar :: IO (MVar a)
newEmptyMVar =
  ConcAction
    (do searchMVar <- C.newEmptyMVar
        return (Just searchMVar))
    (\_ -> return ())
In the case of putMVar the retractable action works as follows. First, it tests if the MVar is empty. If this is the
case, then it performs the C.putMVar and yields Just ().
Otherwise, it yields Nothing because a putMVar on a full
MVar would suspend. In the context of search there will
be no interleaving, so that the MVar cannot be filled in between isEmptyMVar searchMVar and putMVar searchMVar
(see Section 6). The restore function empties the MVar by
calling C.takeMVar.
putMVar :: MVar a -> a -> IO ()
putMVar searchMVar x =
  ConcAction
    (do b <- C.isEmptyMVar searchMVar
5.2.2
ForkAction
6.
DEADLOCK CHECKING
6.1
General Checking
| Stop
| Terminate
| Fork Thread (Thread,P.IO ())
checkThread executes the next node in the IO tree, i.e. it
executes the next retractable action. If the next node is
a NonBackTrackable SeqAction, then the result is Stop.
The result Terminate is used, when a thread terminates,
i.e. the node is Return (). If it is a ConcAction and the
retractable action yields Nothing, i.e. executing the action
would suspend, then the result is Suspended. Otherwise, the
result is Stepped with the performing thread and restore action as parameter. Similarly to Stepped, we yield Fork for
ForkActions.
On top of checkThread, the function checkThreads searches for a deadlock in a given list of Threads up to a given
depth. The function checkThreads yields a value of type
P.IO (Maybe [C.ThreadId]). The list of ThreadIds is a
schedule that leads to a deadlock. The result is Nothing if no deadlock is found. You can think of the function checkThreads
as an interpreter for the IO tree. It executes the associated
program with every possible schedule up to a given execution depth. The function uses a depth-first strategy, that is,
it first executes one schedule up to the maximum depth.
Then it undoes the last action, executes another thread
instead, and so on.
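The depth-first strategy with undo can be illustrated on a much smaller model than the IO tree. The following is a sketch under simplifying assumptions, not the checkThreads function of this paper: a thread is just an identifier with a list of steps, every step either reports that it would suspend or returns its own undo action, and there are no forks or non-backtrackable actions. All names (Step, ModelThread, tryTake, tryPut, findDeadlock, lockOrderExample) are hypothetical.

```haskell
import Control.Concurrent.MVar

-- A step tries to perform one action without blocking.
-- Nothing: the action would suspend. Just undo: it ran; undo reverses it.
type Step = IO (Maybe (IO ()))

-- A model thread: an identifier and its remaining steps.
type ModelThread = (Int, [Step])

-- Retractable take on a lock modelled as MVar () (full means free).
tryTake :: MVar () -> Step
tryTake v = do
  r <- tryTakeMVar v
  return (fmap (\() -> putMVar v ()) r)

-- Retractable put; it is undone by taking the value out again.
tryPut :: MVar () -> Step
tryPut v = do
  ok <- tryPutMVar v ()
  return (if ok then Just (takeMVar v) else Nothing)

-- Depth-bounded, depth-first search for a schedule after which at least
-- one thread is left and every remaining thread would suspend.
findDeadlock :: Int -> [ModelThread] -> IO (Maybe [Int])
findDeadlock depth threads = go True [0 .. length live - 1]
  where
    live = [t | t@(_, _ : _) <- threads]  -- drop terminated threads
    go dead [] = return (if dead && not (null live) then Just [] else Nothing)
    go dead (i : is)
      | depth <= 0 = return Nothing       -- depth bound reached
      | otherwise =
          case live !! i of
            (tid, step : rest) -> do
              r <- step
              case r of
                Nothing -> go dead is     -- would suspend: try the others
                Just undo -> do
                  let next = take i live ++ [(tid, rest)] ++ drop (i + 1) live
                  res <- findDeadlock (depth - 1) next
                  undo                    -- backtrack
                  case res of
                    Just path -> return (Just (tid : path))
                    Nothing   -> go False is
            _ -> go dead is

-- Two threads acquiring two locks in opposite order.
lockOrderExample :: IO (Maybe [Int])
lockOrderExample = do
  a <- newMVar ()
  b <- newMVar ()
  let t1 = (1, [tryTake a, tryTake b, tryPut b, tryPut a])
      t2 = (2, [tryTake b, tryTake a, tryPut a, tryPut b])
  findDeadlock 3 [t1, t2]
```

In this model, a search to depth 3 reports the schedule in which thread 1 takes the first lock and thread 2 then takes the second one. Because every step is undone on the way back, both locks are free again when the search returns.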
The signature of the function checkThreads is defined as
follows.
checkThreads :: [Thread] -> Int -> Bool -> [Int]
-> P.IO (Maybe [C.ThreadId])
The first argument ([Thread]) is the list of currently forked
threads. When checkThreads is called for the first time,
the list contains only one Thread, namely the main thread.
The second argument (Int) is the depth to be checked. The
third argument (Bool) indicates whether all threads checked so far at
this depth are dead, that is, suspended or terminated. If
all Threads at a depth are dead, then there is a deadlock.
The last argument ([Int]) says which Threads still have to
be checked at this depth. The numbers are positions in the
Thread list. We will call this list the position list.
The first rule of checkThreads processes an empty position list. In this case all Threads in this depth have been
checked.
checkThreads ts@(_:_) _ dead [] = do
  if dead
    then return (Just [])
    else return Nothing
If dead is True, all threads are dead and a deadlock is found.
Therefore, it yields Just []. The list is empty because we
are already in a deadlock state. If dead is False, we yield
Nothing because after checking all Threads no deadlock was
found, so there is no schedule leading to a deadlock.
If the position list is non-empty, checkThreads calls the
function checkThread with the first Thread to be checked.
Note that the pattern n1@(n+1) binds n1 to the current depth and n to
the value n1-1.
checkThreads ts@(_:_) n1@(n+1) dead (m:ms) =
  do let thread   = ts!!(m-1)
         threadId = fst thread
     resultT <- checkThread thread
     case resultT of
       ...
Suspended
This indicates that the thread would suspend on its next
action. In this case we just check the other Threads at this
depth. This is done by calling checkThreads with the remaining position list:
Suspended -> checkThreads ts n1 dead ms
Stop
This result indicates that the thread would next perform
a SeqAction of type NonBackTrackable. As already mentioned, in this case we stop the check and yield Nothing.
Stop -> return Nothing
Fork newT (t,restoreAction) -> do
  let ts' = replaceWithPos m t ts ++ [newT]
  checkRes <- checkThreads ts' n True
                [1..length ts']
  restoreAction
  case checkRes of
    Just path -> return (Just (threadId:path))
    Nothing   -> checkThreads ts n1 False ms
Terminate
In this case we know that the thread we checked terminated. First this Thread is deleted from our Thread list.
Then the search is continued with the new Thread list, a decreased depth and a newly initialized position list because
we have deleted one Thread. If this call yields a deadlock
path, then we add the ThreadId of the Thread that terminated to this path. Otherwise, we look for a deadlock in the
original Thread list, with the original depth and indicate by
False that no deadlock was found.
Terminate -> do
  let ts' = deleteWithPos m ts
  checkRes <- checkThreads ts' n True
                [1..length ts']
  case checkRes of
    Just path -> return (Just (threadId:path))
    Nothing   -> checkThreads ts n1 False ms
Fork
If the thread has performed a forkIO, then checkThread
yields Fork. The first argument of Fork is the forked Thread.
The second is a tuple consisting of the modified Thread that
forked the new one and a restore function. First the old
Thread is replaced with the modified one and the new Thread
Stepped
If the thread has executed an action that does not fit into one of
the cases discussed so far, then checkThread yields Stepped.
All the concurrent functions like newEmptyMVar, putMVar
and takeMVar and the SeqActions of type Ignorable fall
into this category. The argument of Stepped is a tuple consisting of the modified Thread and a restore function.
In this case the old Thread is replaced by the modified
one. Then the search is continued with the new Thread list
and a decreased depth. Afterwards, the retractable action performed
by checkThread is undone by calling restoreAction. If a
deadlock path is found, the ThreadId of the executed Thread
is added to the path and the search ends with this result. If
the call yields Nothing, the search proceeds with the old
Thread list and depth.
Stepped (t,restoreAction) -> do
  let ts' = replaceWithPos m t ts
  checkRes <- checkThreads ts' n True
                [1..length ts']
  restoreAction
  case checkRes of
    Just path -> return (Just (threadId:path))
    Nothing   -> checkThreads ts n1 False ms
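The listings above use the helpers replaceWithPos and deleteWithPos without showing their definitions. A plausible sketch is given below. It assumes 1-based positions, matching the lookup ts!!(m-1) in checkThreads. The definitions and their behaviour on out-of-range positions are assumptions, not the original code.

```haskell
-- Replace the element at 1-based position m (assumed definition).
replaceWithPos :: Int -> a -> [a] -> [a]
replaceWithPos m x xs = take (m - 1) xs ++ [x] ++ drop m xs

-- Delete the element at 1-based position m (assumed definition).
deleteWithPos :: Int -> [a] -> [a]
deleteWithPos m xs = take (m - 1) xs ++ drop m xs
```

Both functions build a new list and leave the original Thread list untouched, which is what allows the search to continue with the old list after backtracking.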
This search for deadlocks already finds nearby deadlocks
during debugging. Indeed, we first integrated this search
into CHD and used it successfully for debugging. However,
there is the possibility to optimize the approach with ideas
from partial order reduction in model checking [5]. Hence,
we will first discuss this optimization, before we describe the
integration into CHD in Section 7.
6.2
To motivate the reduction of the search space, we consider a concurrent program with three threads, where the
actions of all threads are independent from each other, e.g.
all threads work on disjoint sets of MVars. We search for
deadlocks in that program with a maximum depth of 3.
Figure 2 shows the structure of a tree representing the computation of checkThreads. Each node represents a state of
the checked program and each edge the execution of one action. The numbers at the edges indicate which Thread is
executed in this step. Because the three threads are independent from each other, some paths are equivalent with
Suspended/Stop
In both cases there are no changes because the search stops
or just the rest of the position list has to be checked.
Terminate
In the case of Terminate we just check all threads that have
a number less than the one of the current thread. We do
not check the m-th Thread because it just terminated and
we have deleted it from the list. We additionally check all
Threads that are dependent on the current one.
Terminate -> do
  let ts' = deleteWithPos m ts
  checkRes <- checkThreads ts' n True
                ([1..m-1]++list)
  case checkRes of
    Just path -> return (Just (threadId:path))
    Nothing   -> checkThreads ts n1 False ms
Fork
When we fork a thread we do not know which actions it
will perform. Thus, we do not know which other threads
are dependent on the actions of the new thread. We have
to assume that the ForkAction is dependent on all other
threads and we still check all Threads in this case. Thus
there is no change in this case.
Stepped
The case of Stepped completes the changes in the program
code. It is similar to the Terminate case except that we do
not delete the thread but replace it with the modified one.
We only check the threads at positions less than or
equal to the current position, together with the positions of the dependent threads.
Stepped (t,restoreAction) -> do
  let ts' = replaceWithPos m t ts
  checkRes <- checkThreads ts' n True
                ([1..m]++list)
  restoreAction
  case checkRes of
    Just path -> return (Just (threadId:path))
    Nothing   -> checkThreads ts n1 False ms
With this reduction the search space is decreased noticeably, as shown in Figure 3. Unfortunately, we cannot find
deadlocks anymore: as mentioned above, if the list of thread
positions to be checked is empty, then we test for a deadlock
as follows:
[Figures 2 and 3: search trees of checkThreads for three independent threads, without and with the reduction. Only node and edge labels survived extraction; the trees are not recoverable here.]
7.
INTEGRATION IN CHD
This section will give some information about our implementation of the presented approach on top of CHD.
In addition to the retractable action and the restore function, which are executed during the check, we add a third argument to ConcAction and ForkAction. These
actions are similar to the retractable actions but do not
check whether executing them would suspend.
Thus, in the case of ConcAction their type is not IO (Maybe a)
but IO a. Furthermore, these actions use the functions
provided by CHD instead of the original Concurrent Haskell
functions. A function interprets the IO tree by executing
these actions like the normal program. Because we use the
functions defined in CHD, we gain control over the scheduling of the program. CHD blocks every function and
allows the user to decide which thread to unblock. Every
time CHD blocks, the function checkThreads is called with
the current Thread list. Thus, the resulting program works
like CHD but searches for deadlocks in the background.
The screenshot in Figure 1 shows the debugger applied to
the dining philosophers problem, introduced in Section 2.3.
The black circle marks the thread with which the execution should be
continued (the user selects it by clicking on it) to reach
the deadlock that was found (here Thread5). If you allow this thread
to continue, the next thread of the deadlock path will be
marked. If you select another Thread, the program uses iterative deepening to find a new deadlock path based on the
new Thread list. The program increases the depth until a
deadlock is found or the search tree contains more states
than a maximum selectable by the user, initially 50,000
states. If the user unblocks a thread before checkThreads
has terminated, checkThreads is signaled to stop. It restores
the state before the check and yields Nothing. The selected
thread is unblocked and performs a step, and then a new check
is started with the new Thread list.
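The iterative deepening loop described above can be sketched generically. This is an illustration rather than the CHD implementation: it assumes a depth-bounded search that also reports how many states it visited, and it adds a stopping rule of its own (give up when a deeper search visits no additional states) so that the loop terminates on small, fully explored trees. The names iterativeDeepening, toySearch and example are hypothetical.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- Repeat a depth-bounded search with growing depth until it succeeds,
-- exceeds the state budget, or stops reaching new states.
iterativeDeepening
  :: Monad m
  => Int                        -- state budget (initially 50,000 in the text)
  -> (Int -> m (Maybe a, Int))  -- depth -> (result, number of states visited)
  -> m (Maybe a)
iterativeDeepening maxStates boundedSearch = go 1 (-1)
  where
    go depth previous = do
      (result, visited) <- boundedSearch depth
      case result of
        Just r -> return (Just r)
        Nothing
          | visited > maxStates -> return Nothing  -- budget exhausted
          | visited == previous -> return Nothing  -- nothing new was reached
          | otherwise           -> go (depth + 1) visited

-- Toy bounded search: succeeds from depth 4 on and visits 3^d states.
toySearch :: Int -> Identity (Maybe Int, Int)
toySearch d = return (if d >= 4 then Just d else Nothing, 3 ^ d)

example :: Maybe Int
example = runIdentity (iterativeDeepening 50000 toySearch)
```

Shallow depths are searched first, so a deadlock close to the current configuration is reported before the more expensive deep searches are started.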
We now explain which modifications have to be made to a Concurrent
Haskell program in order to apply our debugger.
Instead of importing the original module Concurrent
import Concurrent
we import our new debugger
import ConcurrentSearch
Because the prelude is always imported we have to hide its
IO datatype and all functions using it.
import Prelude hiding (IO,putStr,getLine,... )
Furthermore, we import the prelude qualified to access the
original IO monad.
import qualified Prelude as P
8.
PRACTICAL APPLICATION
9.
RELATED WORK
10.
We have presented a new approach for improving debugging of Concurrent Haskell programs. The system is conveniently integrated in the Concurrent Haskell Debugger. The
basic idea is searching for deadlocks, whenever the computation is idle, while the user steps through the system
states in the debugger. If a deadlock is found, then the
erties (in [13] we presented lock reversal) can easily be integrated in this approach. We want to identify more common
bugs and integrate checks for them into this extended search.
Another direction for future work is a larger case study.
This should give hints for improving our debugger, especially
for adding different views. We also want to check whether a
more precise distinction of independent actions will improve
the search. At the moment we distinguish them only by the
communication abstraction they operate on. As an improvement, we could also distinguish the different operations. For
example, two isEmptyMVar actions performed on the same
MVar are independent as well. However, being more precise
incurs additional costs, and it is not clear that this will really
allow deeper search in practice. Finally, we will investigate
whether a combination of our debugger with Hood could be useful.
Although the values in the communication abstractions are
not directly relevant for deadlocks, it can be interesting to
view the values inside a concurrency abstraction (and perhaps also how much they are evaluated). At the moment
the programmer can only label these values by hand e.g.
using putMVarLabel instead of putMVar, but an automatic
labeling by means of Hood could be more convenient.
11.
REFERENCES
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]