I guess some of you expected a blog entry about the
generational GC in Mono, given the title. As I understand
it, many people expect the new GC to solve all the issues
they believe are caused by the current GC, so they are
waiting for it eagerly.
As a matter of fact, having debugged all or almost all of
those issues, I can say the existing GC is usually not the
culprit. Sometimes there is an unmanaged leak, sometimes
excessive retention of objects in managed or unmanaged code,
but roughly 80% of the issues that get attributed to the GC
are not GC issues at all.
So, instead of waiting for the holy grail, provide test
cases or as much data as you can for the bugs you
experience, because chances are that the bug can be fixed
relatively easily without waiting for the new GC to
stabilize and get deployed.
Now, this is not to say that the new GC won't bring great
improvements, but that those improvements are mainly in
allocation speed and mean pause time, both of which, while
measurable, are not bugs per se and so are not among the
few issues that people hit with the current Boehm GC-based
implementation.
After the long introduction, let's get to the purpose of this entry: Mono from svn can now perform an object allocation entirely in managed code. Let me explain why this is significant.
The Mono runtime (including the GC) is written in C. We call
this unmanaged code, as opposed to managed code, which is
all the code that gets JITted from IL opcodes.
The JIT and the runtime cooperate so that managed code is
compiled in a way that lets the runtime inspect it, inject
exceptions, unwind the stack and so on. The unmanaged code,
on the other hand, is compiled by the C compiler and on most
systems and architectures, there is no info available on it
that would allow the same operations. For this reason,
whenever a program needs to make a transition from managed
code to unmanaged (for example for an internal call
implementation or for calling into the GC) the runtime needs
to perform some additional bookkeeping, which can be relatively
expensive, especially if the amount of code to execute in
unmanaged land is tiny.
For a while now we have made use of the Boehm GC's ability
to allocate objects in a thread-local fast-path, but we
couldn't take full advantage of it because the cost of the
transition from managed to unmanaged code and back was
bigger than the allocation cost itself.
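
To make the idea of a thread-local fast-path concrete, here is a minimal sketch in C of a per-thread, size-class free-list allocator. This is not the Boehm GC's code: every name in it (`local_alloc`, `refill_and_alloc`, `free_lists`, `GRANULE`, `NUM_CLASSES`) is hypothetical, the slow path simply carves cells out of `malloc`, and a real allocator would also zero the memory and register it with the collector. The point is only that the common case touches no locks and no shared state.

```c
#include <stddef.h>
#include <stdlib.h>

#define GRANULE 8        /* hypothetical size-class granularity, in bytes */
#define NUM_CLASSES 32   /* hypothetical: covers sizes up to 256 bytes */

typedef struct FreeCell { struct FreeCell *next; } FreeCell;

/* One free list per size class, private to each thread. */
static _Thread_local FreeCell *free_lists[NUM_CLASSES];
static int refill_calls; /* counts slow-path entries, for illustration */

/* Slow path stand-in: get a batch of cells and put all but one on the list. */
static void *refill_and_alloc(size_t cls) {
    enum { BATCH = 16 };
    size_t cell_size = (cls + 1) * GRANULE;
    char *chunk = malloc(cell_size * BATCH);
    if (chunk == NULL)
        return NULL;
    refill_calls++;
    /* Push cells BATCH-1 down to 1, so the list head ends up at cell 1. */
    for (size_t i = BATCH - 1; i >= 1; i--) {
        FreeCell *cell = (FreeCell *)(chunk + i * cell_size);
        cell->next = free_lists[cls];
        free_lists[cls] = cell;
    }
    return chunk; /* cell 0 goes to the caller */
}

/* Fast path: pop the head of the thread-local free list for this size. */
static void *local_alloc(size_t size) {
    if (size == 0 || size > NUM_CLASSES * GRANULE)
        return NULL; /* large objects are out of scope for this sketch */
    size_t cls = (size - 1) / GRANULE;
    FreeCell *cell = free_lists[cls];
    if (cell != NULL) {
        free_lists[cls] = cell->next;
        return cell;
    }
    return refill_and_alloc(cls);
}
```

Even in this stripped-down form the fast path computes a size class, indexes a table, loads the list head, tests it, loads the next pointer and stores it back. That is cheap, but it is of little help if every call has to pay for a managed-to-unmanaged transition first.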
Now the runtime can create a managed method that performs
the allocation fast-path entirely in managed code, avoiding
the cost of the transition in most cases. This
infrastructure will also be used for the generational GC,
where it will be more important: the allocation fast-path
sequence there is 4-5 instructions, versus the dozen or more
of the Boehm GC thread-local allocation.
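
A fast-path of only a few instructions is typical of bump-pointer allocation, where each thread owns a buffer and allocating means advancing a pointer. The following C sketch shows that technique under stated assumptions. It is not the Mono generational GC's code: `alloc_fast`, `alloc_slow`, `AllocBuffer`, `NURSERY_SIZE` and `ALIGNMENT` are hypothetical names, and the slow path just resets a static buffer where a real collector would run a collection or hand out a fresh buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NURSERY_SIZE 4096 /* hypothetical per-thread buffer size, in bytes */
#define ALIGNMENT 8       /* hypothetical object alignment */

/* Hypothetical per-thread allocation buffer. */
typedef struct {
    char *next;  /* first free byte */
    char *limit; /* one past the last usable byte */
} AllocBuffer;

static _Thread_local uint64_t nursery[NURSERY_SIZE / sizeof(uint64_t)];
static _Thread_local AllocBuffer tlab;
static int slow_path_calls; /* counts slow-path entries, for illustration */

/* Slow path stand-in: a real GC would collect or fetch a new buffer here.
 * `size` has already been rounded up to ALIGNMENT by the caller. */
static void *alloc_slow(size_t size) {
    slow_path_calls++;
    if (size > NURSERY_SIZE)
        return NULL; /* too big for the buffer; out of scope for this sketch */
    tlab.next = (char *)nursery;
    tlab.limit = (char *)nursery + NURSERY_SIZE;
    char *obj = tlab.next;
    tlab.next = obj + size;
    memset(obj, 0, size);
    return obj;
}

/* Fast path: round the size up, compare against the limit, bump the pointer. */
static void *alloc_fast(size_t size) {
    size = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    char *obj = tlab.next;
    if (obj != NULL && size <= (size_t)(tlab.limit - obj)) {
        tlab.next = obj + size;
        memset(obj, 0, size); /* new objects start out zeroed */
        return obj;
    }
    return alloc_slow(size);
}
```

In this sketch the first call on a thread falls into `alloc_slow` to set up the buffer. Every later call that fits stays on the short load, compare, add, store sequence, which is the kind of code that is worth emitting as a managed method.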
As for actual numbers, a benchmark that repeatedly allocates small objects is now more than 20% faster overall (the overall figure includes the time spent collecting the garbage objects; the increase in raw allocation speed is much bigger).