23 Nov 2006 lupus   » (Master)

Debugging managed lock deadloacks in Mono

I just committed to svn a small function that can be used to help debug deadlocks that result from the incorrect use of managed locks.

Managed locks (implemented in the Monitor class and usually invoked with the lock () construct in C#) are subject to the same incorrect uses of normal locks, though they can be safely taken recursively by the same thread.

One of the obviously incorrect way to use locks is to have multiple locks and acquire them in different orders in different codepaths. Here is an example:

using System;
using System.Threading;

class TestDeadlock {

static object lockA = new object (); static object lockB = new object ();

static void normal_order () { lock (lockA) { Console.WriteLine ("took lock A"); // make the deadlock more likely Thread.Sleep (500); lock (lockB) { Console.WriteLine ("took lock B"); } } } static void reverse_order () { lock (lockB) { Console.WriteLine ("took lock B"); // make the deadlock more likely Thread.Sleep (500); lock (lockA) { Console.WriteLine ("took lock A"); } } } static void Main () { TestDeadlock td = new TestDeadlock (); lock (td) { lock (td) { // twice for testing the nest level Thread t1 = new Thread ( new ThreadStart (normal_order)); Thread t2 = new Thread ( new ThreadStart (reverse_order)); t1.Start (); t2.Start (); t1.Join (); t2.Join (); } } } }

I added an explicit Sleep () call to make the race condition happen almost every time you run such a program. The issue with such deadlocks is that usually the race time window is very small and it will go unnoticed during testing. The new feature in the mono runtime is designed to help find the issue when a process is stuck and we don't know why.

Now you can attach to the stuck process using gdb and issue the following command:

(gdb) call mono_locks_dump (0)
which results in output like this:

Lock 0x824f108 in object 0x2ffd8 held by thread 0xb7d496c0,
nest level: 2
Lock 0x824f150 in object 0x2ffe8 held by thread 0xb7356bb0,
nest level: 1
        Waiting on semaphore 0x40e: 1
Lock 0x824f1b0 in object 0x2ffe0 held by thread 0xb7255bb0,
nest level: 1
        Waiting on semaphore 0x40d: 1
Total locks (in 1 array(s)): 16, used: 8, on freelist: 8, to
recycle: 0
We can see that there are three locks currently held by three different threads. The first has been recursively acquired 2 times. The other two are more interesting because they each have a thread waiting on a semaphore associated with the lock structure: they must be the ones involved in the deadlock.

Once we know the threads that are deadlocking and the objects that hold the lock we might have a better idea of where exactly to look in the code for incorrect ordering of lock statements.

In this particular case it's pretty easy since the objects used for locking are static fields. The easy way to get the class is to notice that the object which is locked twice (0x2ffd8) is of the same class as the static fields:

(gdb) call mono_object_describe (0x2ffd8)
TestDeadlock object at 0x2ffd8 (klass: 0x820922c)
Now we know the class (0x820922c) and we can get a list of the static fields and their values and correlate with the objects locked in the mono_locks_dump () list:

(gdb) call mono_class_describe_statics (0x820922c)
At 0x26fd0 (ofs:  0) lockA: System.Object object at 0x2ffe8
(klass: 0x820beac)
At 0x26fd4 (ofs:  4) lockB: System.Object object at 0x2ffe0
(klass: 0x820beac)
Note that the lockA and lockB objects are the ones listed above as deadlocking.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!