Debugging managed lock deadloacks in Mono
I just committed to svn a small function that can be used to
help debug deadlocks that result from the incorrect use of
managed locks.
Managed locks (implemented in the Monitor class and usually
invoked with the lock () construct in C#) are subject to the
same incorrect uses of normal locks, though they can be
safely taken recursively by the same thread.
One of the obviously incorrect way to use locks is to have
multiple locks and acquire them in different orders in
different codepaths. Here is an example:
using System;
using System.Threading;
class TestDeadlock {
static object lockA = new object ();
static object lockB = new object ();
static void normal_order () {
lock (lockA) {
Console.WriteLine ("took lock A");
// make the deadlock more likely
Thread.Sleep (500);
lock (lockB) {
Console.WriteLine ("took lock B");
}
}
}
static void reverse_order () {
lock (lockB) {
Console.WriteLine ("took lock B");
// make the deadlock more likely
Thread.Sleep (500);
lock (lockA) {
Console.WriteLine ("took lock A");
}
}
}
static void Main () {
TestDeadlock td = new TestDeadlock ();
lock (td) {
lock (td) { // twice for testing the nest level
Thread t1 = new Thread (
new ThreadStart (normal_order));
Thread t2 = new Thread (
new ThreadStart (reverse_order));
t1.Start ();
t2.Start ();
t1.Join ();
t2.Join ();
}
}
}
}
I added an explicit Sleep () call to make the race condition
happen almost every time you run such a program. The issue
with such deadlocks is that usually the race time window is
very small and it will go unnoticed during testing. The new
feature in the mono runtime is designed to help find the
issue when a process is stuck and we don't know why.
Now you can attach to the stuck process using gdb and issue
the following command:
(gdb) call mono_locks_dump (0)
which results in output like this:
Lock 0x824f108 in object 0x2ffd8 held by thread 0xb7d496c0,
nest level: 2
Lock 0x824f150 in object 0x2ffe8 held by thread 0xb7356bb0,
nest level: 1
Waiting on semaphore 0x40e: 1
Lock 0x824f1b0 in object 0x2ffe0 held by thread 0xb7255bb0,
nest level: 1
Waiting on semaphore 0x40d: 1
Total locks (in 1 array(s)): 16, used: 8, on freelist: 8, to
recycle: 0
We can see that there are three locks currently held by
three different threads. The first has been recursively
acquired 2 times. The other two are more interesting because
they each have a thread waiting on a semaphore associated
with the lock structure: they must be the ones involved in
the deadlock.
Once we know the threads that are deadlocking and the
objects that hold the lock we might have a better idea of
where exactly to look in the code for incorrect ordering of
lock statements.
In this particular case it's pretty easy since the objects
used for locking are static fields. The easy way to get the
class is to notice that the object which is locked twice
(0x2ffd8) is of the same class as the static fields:
(gdb) call mono_object_describe (0x2ffd8)
TestDeadlock object at 0x2ffd8 (klass: 0x820922c)
Now we know the class (0x820922c) and we can get a list of
the static fields and their values and correlate with the
objects locked in the mono_locks_dump () list:
(gdb) call mono_class_describe_statics (0x820922c)
At 0x26fd0 (ofs: 0) lockA: System.Object object at 0x2ffe8
(klass: 0x820beac)
At 0x26fd4 (ofs: 4) lockB: System.Object object at 0x2ffe0
(klass: 0x820beac)
Note that the lockA and lockB objects are the ones listed
above as deadlocking.