<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for djcb</title>
    <link>http://www.advogato.org/person/djcb/</link>
    <description>Advogato blog for djcb</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Tue, 16 Mar 2010 06:18:36 GMT</pubDate>
    <item>
      <pubDate>Tue, 9 Mar 2010 21:42:02 GMT</pubDate>
      <title>9 Mar 2010</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=165</link>
      <guid>http://www.advogato.org/person/djcb/diary.html?start=165</guid>
      <description>&lt;p&gt;Yesterday, I found that, unfortunately,&#xD;
&lt;code&gt;advogato.el&lt;/code&gt; does not work anymore&#xD;
&amp;ndash; its last version was from 2004 or so, and it&#xD;
expects to find browser&#xD;
cookies in some text file. However, times have changed, and&#xD;
these days, those&#xD;
cookies are stored in an SQLite-database.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt;&#xD;
Anyway, it's actually not &lt;i&gt;too&lt;/i&gt; hard to publish by&#xD;
hand. I am using&#xD;
&lt;code&gt;org-mode&lt;/code&gt; in emacs, which has some light-weight&#xD;
markup syntax, as I &lt;a href="http://emacs-fu.blogspot.com/2009/05/writing-and-blogging-with-org-mode.html" &gt;discussed&#xD;
here&lt;/a&gt;. I can simply type things there, and when I am&#xD;
done, I run&#xD;
&lt;code&gt;org-export-as-html&lt;/code&gt;. That will also put the&#xD;
result in my 'kill-ring' (i.e.,&#xD;
paste buffer), so I can paste in the advogato web form, et&#xD;
voil&amp;agrave;!&#xD;
&#xD;
&lt;p&gt; &lt;p&gt;&#xD;
LaTeX (and to a lesser extent, HTML) is sometimes promoted&#xD;
over WYSIWYG word&#xD;
processors because it allegedly &lt;i&gt;focuses on the&#xD;
contents&lt;/i&gt; and allows you to&#xD;
&lt;i&gt;describe semantics, not looks&lt;/i&gt;. That is only partly&#xD;
true, as anyone who wants&#xD;
to insert e.g., a table in a document can attest to: in&#xD;
MS-Word or Writer,&#xD;
it's &lt;i&gt;much&lt;/i&gt; easier to concentrate on table contents&#xD;
than it is in&#xD;
LaTeX. Programs like &lt;i&gt;LyX&lt;/i&gt; alleviate this to some&#xD;
extent, but for me it's a&#xD;
bit too much on the WYSIWYG side. &#xD;
&#xD;
&lt;p&gt; &lt;p&gt;&#xD;
So, I used to &lt;i&gt;endure&lt;/i&gt; the pain of raw LaTeX (and HTML)&#xD;
editing, because it&#xD;
still was the &lt;i&gt;least painful&lt;/i&gt; way to get what I&#xD;
want. For LaTeX, that&#xD;
is book-quality rendering, with all the magic of maths,&#xD;
indices, numbering,&#xD;
source code blobs and so on. For HTML, it would be&#xD;
standards-compliant 'clean'&#xD;
blobs that I can still understand, and can paste into e.g.,&#xD;
a blog.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt;&#xD;
However, with &lt;code&gt;org-mode&lt;/code&gt; the pain is mostly gone!&#xD;
I can export to both HTML&#xD;
and LaTeX and it really allows me to focus only on the&#xD;
contents of what I want&#xD;
to write (as said, &lt;code&gt;org-mode&lt;/code&gt;-markup is really&#xD;
lightweight); still it allows&#xD;
for a lot of massaging of the output if needed. I can&#xD;
imagine that this&#xD;
'output massage' would be quite hard if I hadn't already&#xD;
spent quite a bit of&#xD;
time using 'raw' HTML and LaTeX - anyway, for me it works&#xD;
very well. Coming&#xD;
back to adding tables in documents: this is &lt;i&gt;easy&lt;/i&gt; in&#xD;
&lt;code&gt;org-mode&lt;/code&gt;, and I can&#xD;
even use the tables as little spreadsheets, with all the&#xD;
power of GNU Calc&#xD;
formulae.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt;&#xD;
As a bonus, I can easily generate presentations from&#xD;
&lt;code&gt;org-mode&lt;/code&gt;, by &lt;a href="http://emacs-fu.blogspot.com/2009/10/writing-presentations-with-org-mode-and.html" &gt;exporting&#xD;
it through the LaTeX 'beamer' class&lt;/a&gt;. This works&#xD;
beautifully well for a lot of&#xD;
the presentations I do for colleagues at work: getting PDFs&#xD;
with the &lt;a href="http://nitens.org/taraborelli/latex" &gt;beauty of&#xD;
LaTeX&lt;/a&gt;, but without the headaches.&#xD;
&#xD;
</description>
    </item>
    <item>
      <pubDate>Mon, 8 Mar 2010 19:14:02 GMT</pubDate>
      <title>8 Mar 2010</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=164</link>
      <guid>http://www.advogato.org/person/djcb/diary.html?start=164</guid>
      <description>&lt;p&gt;I joined Advogato more than 10 years ago(!), and my last&#xD;
entries here are&#xD;
from ages ago. I am planning to do some more posting here;&#xD;
main reason for&#xD;
that is that I just installed the &lt;code&gt;advogato.el&lt;/code&gt;&#xD;
for emacs, which&#xD;
hopefully allows for painless publishing from within emacs,&#xD;
something which&#xD;
unfortunately cannot be said for the interaction with e.g.&#xD;
Blogger.   &lt;p&gt;&#xD;
In the last ten years, I've written a &lt;i&gt;lot&lt;/i&gt; of&#xD;
software, both for money&#xD;
and for fun, using C++, Perl, Python, Ruby, Emacs-Lisp, and&#xD;
good-old C. For&#xD;
some reason, most of the code has involved C and Emacs; I am&#xD;
somehow drawn to&#xD;
projects where that particular knowledge is useful.  &#xD;
&lt;p&gt; All those things&#xD;
that were once a bit mysterious, such as autotools, parsers,&#xD;
Lisp and all&#xD;
those obscure tools like &lt;code&gt;objdump&lt;/code&gt;,&#xD;
&lt;code&gt;strace&lt;/code&gt;,&#xD;
&lt;code&gt;procmail&lt;/code&gt;,&amp;hellip; have entered my comfort zone.&#xD;
Editor-wise, I am&#xD;
still using GNU/Emacs, as I've done since the mid-90s,&#xD;
with maybe a month&#xD;
or so somewhere in 2000 where I went cold-turkey to vim. That&#xD;
did not last; I&#xD;
do like vim, but I am much more productive with emacs, and&#xD;
it's taking over&#xD;
more and more of my computing universe.   &lt;p&gt; I went as&#xD;
far as starting a&#xD;
blog with emacs tips at the end of 2008: &lt;a href="http://emacs-fu.blogspot.com" &gt;Emacs-Fu&lt;/a&gt;, where I&#xD;
try to share useful&#xD;
things about the One True Editor. There are many little gems,&#xD;
but some of them&#xD;
are well hidden, such that I still often find some nifty&#xD;
trick that has been&#xD;
in emacs for twenty years, but that I had never discovered. My&#xD;
emacs-lisp is still a&#xD;
bit embryonic; good enough to glue things together, but not&#xD;
really fluent. I&#xD;
am brushing up my skills in this area though, and re-reading&#xD;
&lt;a href="http://mitpress.mit.edu/sicp/" &gt;SICP&lt;/a&gt;.   &lt;p&gt; I&#xD;
am also still a&#xD;
happy Gnome-user. I have learned a lot from reading the code&#xD;
from so many&#xD;
talented hackers. I think Gnome 3 offers some great&#xD;
opportunities, and I just&#xD;
got my first patch accepted into &lt;code&gt;gnome-shell&lt;/code&gt;&#xD;
(it fixes the&#xD;
12h/24h clock bug). But it must be said that with my&#xD;
workflow revolving around&#xD;
emacs, the desktop environment is less important.  </description>
    </item>
    <item>
      <pubDate>Wed, 24 Dec 2008 16:11:01 GMT</pubDate>
      <title>spitfall</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=163</link>
      <guid>http://djcbflux.blogspot.com/feeds/4559980394734629447/comments/default</guid>
      <description>Implementing GTK+-widgets and other &lt;a href="http://en.wikipedia.org/wiki/Gobject"  &gt;GObjects&lt;/a&gt; in C requires quite a bit of boilerplate code - that's hardly news. One obvious way to deal with that is to use a different programming language. If you're into C++, I can recommend the excellent &lt;a href="http://www.gtkmm.org/"  &gt;GtkMM&lt;/a&gt; C++-bindings for GTK+. Programming GtkMM feels very natural and follows the C++-idioms; it's easy to integrate with &lt;tt&gt;std::&lt;/tt&gt; and &lt;a href="http://www.boost.org"  &gt;friends&lt;/a&gt;. Also, it's LGPL and pure C++.&lt;br /&gt;&lt;br /&gt;&lt;p&gt;Another option is &lt;a href="http://live.gnome.org/Vala"  &gt;Vala&lt;/a&gt;. If you haven't heard about it, Vala is a programming language in its own right, with similarities to &lt;a href="http://en.wikipedia.org/wiki/C_Sharp_(programming_language)"  &gt;C#&lt;/a&gt;, but specifically designed for use with GObject. One very interesting thing about Vala is that it compiles to plain C-with-GObjects (as an intermediate step). Thus, you write in Vala, with no '&lt;tt&gt;libvala&lt;/tt&gt;' needed, with code which is just as fast as handwritten C. Vala also supports many other libraries, which can make them easier to use, compared with plain C. Using Vala, writing GObject/GTK+-based applications becomes a lot easier. &lt;a href="http://www.linux.com/feature/154784"  &gt;Vala Overview&lt;/a&gt;.&lt;br /&gt;&lt;p&gt;Finally, my truly &lt;em&gt;low-tech&lt;/em&gt; solution is &lt;tt&gt;spuug&lt;/tt&gt;. &lt;a href="http://www.djcbsoftware.nl/code/spuug/"  &gt;Spuug&lt;/a&gt; is a little &lt;a href="http://en.wikipedia.org/wiki/Gobject"  &gt;GObject&lt;/a&gt; code-generator that I wrote in 2006 to learn some &lt;a href="http://en.wikipedia.org/wiki/Ruby_(programming_language)"  &gt;Ruby&lt;/a&gt;, and to save myself some time. And boy, has it saved me some time! Now, finally a new version. 
The credit for this goes mostly to &lt;em&gt;Viktor Nagy&lt;/em&gt; (many thanks!), who submitted some patches.&lt;br /&gt;&lt;p&gt;&lt;tt&gt;spuug&lt;/tt&gt; usage is quite easy; for example:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;    $ spuug --class=FunkyFooBar --namespace=Funky --parent=GtkWidget&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;will generate &lt;tt&gt;funky-foobar.c&lt;/tt&gt; and &lt;tt&gt;funky-foobar.h&lt;/tt&gt; with 150 lines of boilerplate code, as a starting point for some &lt;tt&gt;FunkyFooBar&lt;/tt&gt;-widget.&lt;br /&gt;&lt;p&gt;Of course, spuug works well for Maemo-code, and I know of a number of programs that are using it.&lt;br /&gt;&lt;p&gt;There are of course some disadvantages to using code-generators. But the advantage of &lt;tt&gt;spuug&lt;/tt&gt; is that it doesn't require you to learn any new language. Also, after using it, you don't depend on &lt;tt&gt;spuug&lt;/tt&gt; - the output is perfectly readable C code.</description>
    </item>
    <item>
      <pubDate>Tue, 2 Dec 2008 21:06:19 GMT</pubDate>
      <title>the song remains the same</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=162</link>
      <guid>http://djcbflux.blogspot.com/feeds/1256013965682342246/comments/default</guid>
      <description>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_kGFGcbwevHE/STWYV0KJj1I/AAAAAAAAAWU/oyT0ANHiFO0/s1600-h/ttb101.png" &gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 400px; height: 352px;" src="http://4.bp.blogspot.com/_kGFGcbwevHE/STWYV0KJj1I/AAAAAAAAAWU/oyT0ANHiFO0/s400/ttb101.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5275290039080292178" /&gt;&lt;/a&gt;&lt;br /&gt;So, after three years I finally made a new version of &lt;a href="http://www.djcbsoftware.nl/code/ttb"  &gt;ttb&lt;/a&gt;, my teletekst viewer, which is especially interesting for Dutch-speakers and linguistically-inclined people studying West-Germanic languages. The new version brings user-help and some cosmetic updates. &lt;br /&gt;&lt;p&gt;The program is listed as the 'official' client for Linux by the NOS (state television), and I'm getting quite a few mails -- but interestingly, not one single bug report in three years. To be honest, there &lt;strong&gt;is&lt;/strong&gt; a bug remaining: there is &lt;em&gt;too much bad news&lt;/em&gt; in the news section. I am working on that one, but it might take a while.&lt;br /&gt;&lt;p&gt;I am also preparing a &lt;a href="http://maemo.org"  &gt;Maemo&lt;/a&gt;-version. Interestingly, I had a version running on a 770 in early 2005 at &lt;em&gt;LinuxTag&lt;/em&gt;, but I never got to packaging it. Anyway, the work has to wait until after my trip to a friend's wedding in the Eternal City of &lt;a href="http://en.wikipedia.org/wiki/Rome"  &gt;Rome&lt;/a&gt;, where I'll be flying.&lt;br /&gt;&lt;p&gt;As if all of that were not enough, I started a blog with tips for &lt;a href="http://en.wikipedia.org/wiki/Emacs"  &gt;emacs&lt;/a&gt;-users; the idea is to have frequent small posts that show one useful trick: &lt;a href="http://emacs-fu.blogspot.com"  &gt;Emacs-Fu&lt;/a&gt;. Let's see if I succeed.</description>
    </item>
    <item>
      <pubDate>Wed, 26 Nov 2008 21:14:11 GMT</pubDate>
      <title>it's so easy</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=161</link>
      <guid>http://djcbflux.blogspot.com/feeds/3888888310487382935/comments/default</guid>
      <description>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_kGFGcbwevHE/SS2p9e1B4QI/AAAAAAAAASg/eBOFKPywzs4/s1600-h/screenshot01.png" &gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 400px; height: 240px;" src="http://2.bp.blogspot.com/_kGFGcbwevHE/SS2p9e1B4QI/AAAAAAAAASg/eBOFKPywzs4/s400/screenshot01.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5273057612433318146" /&gt;&lt;/a&gt;&lt;br /&gt;Sometimes, I like to use mathematical notation in webpages, either to impress people or simply for decoration. One way to do that is &lt;a href="http://en.wikipedia.org/wiki/Mathml" &gt;MathML&lt;/a&gt;, which is an XML-based markup language for mathematical notation. However, many browsers do not support MathML at all, or require you to download plugins and/or special fonts. Another problem with MathML is that XML is a &lt;em&gt;really&lt;/em&gt; inconvenient format to edit by hand. Practically, you'll need some kind of formula editor.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;tex vs mathml&lt;/h3&gt;&lt;br /&gt;As an old-schooler, I prefer to use the math-notation invented for &lt;a href="http://en.wikipedia.org/wiki/TeX"  &gt;TeX&lt;/a&gt; instead - it is short and sweet and powerful. &lt;a href="http://en.wikipedia.org/wiki/Donald_Knuth"  &gt;Donald Knuth&lt;/a&gt;  invented the whole TeX language because he was unhappy with the quality of typesetting of mathematics, and it is widely used in both computer science and mathematics. 
Anyway, I'm sure many people remember the '&lt;em&gt;abc-formula&lt;/em&gt;' to calculate the roots of a quadratic function &lt;img src="http://www.djcbsoftware.nl/image/quadratic.png" title="quadratic" class="texdrive-formula" name="ax^2 + bx + c" border="0"&gt;:&lt;br /&gt;&lt;blockquote&gt;&lt;img src="http://www.djcbsoftware.nl/image/abc-formula.png" title="abc-formula" class="texdrive-formula" name="$-b \pm \sqrt{b^2 - 4ac} \over 2a$" border="0"&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;In the TeX-sublanguage for math, one can specify the formula as follows:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt; -b \pm \sqrt{b^2 - 4ac} \over 2a&lt;/pre&gt;&lt;br /&gt;The corresponding MathML is no fewer than 20 lines; see the &lt;a href="http://en.wikipedia.org/wiki/Mathml#Example"  &gt;example&lt;/a&gt; in Wikipedia. Clearly, MathML is not designed for hand-editing. There are some editors available, but hand-editing TeX is much faster (at least for me); and, as mentioned, even if you have the MathML, many browsers will not show it correctly.&lt;br /&gt;&lt;p&gt;So what I'd like is a way to (i) use TeX-notation and (ii) have it display correctly in any (graphical) browser. One way to do that is to use LaTeX to process and render the formulae, and convert that to a PNG-image. In 2004, I wrote a little tool called &lt;em&gt;WebTeX&lt;/em&gt; to create small images from TeX-formulae. It was nothing too fancy; you enter a &lt;tt&gt;&amp;lt;img ...&amp;gt;&lt;/tt&gt;-element with some description of some formula, and the little tool would turn it into an image, using &lt;a href="http://en.wikipedia.org/wiki/LaTeX"  &gt;LaTeX&lt;/a&gt; and &lt;a href="http://www.imagemagick.org/script/index.php"  &gt;ImageMagick&lt;/a&gt;. I don't maintain that old tool anymore - it was time for something new. 
Therefore...&lt;br /&gt;&lt;h3&gt;texdrive&lt;/h3&gt;&lt;br /&gt;This weekend, I wrote a new maths-in-webpages tool using &lt;a href="http://en.wikipedia.org/wiki/Emacs_lisp"  &gt;emacs-lisp&lt;/a&gt;. The emacs-integration makes adding formulae to html-pages really easy. For example, if I want to include the famous &lt;em&gt;Bayes' Theorem&lt;/em&gt;, I simply type:&lt;pre&gt;&lt;br /&gt;  M-x texdrive-insert-formula&lt;br /&gt;  Formula: $P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})}$&lt;br /&gt;  Title: bayes-theorem&lt;/pre&gt;&lt;br /&gt;Et voil&amp;agrave;; the following is inserted:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;  &amp;lt;img src="bayes-theorem.png" title="bayes-theorem"&lt;br /&gt;        class="texdrive-formula" name="$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})}$"&lt;br /&gt;        border="0"&amp;gt;&lt;/pre&gt; &lt;br /&gt;Now, all we need to do is &lt;tt&gt;texdrive-generate-images-from-html&lt;/tt&gt;, and the corresponding image will be generated:&lt;br /&gt;&lt;p align="center"&gt; &lt;img src="http://www.djcbsoftware.nl/image/bayes-theorem.png" title="bayes-theorem" class="texdrive-formula" name="$P(A|B) = {P(B|A)P(A)\over{P(B|A)P(A) + P(B|\overline{A})P(\overline{A})}}$" border="0"&gt;&lt;br /&gt;&lt;p&gt;So, for immediate download: &lt;a href="http://www.djcbsoftware.nl/code/texdrive"  &gt;texdrive.el&lt;/a&gt;. It works pretty well for me; please let me know if you have any problems or are missing something. In some cases, the formulae are not as sharp as they could be; I hope I'll be able to improve it with some tweaking. Anyway, it's nice to see how one can solve problems by glueing together some existing open-source tools. Standing on the shoulders of giants...&lt;br /&gt;&lt;P&gt;Note that some wiki-software, notably Wikipedia's MediaWiki, use a &lt;a href="http://en.wikipedia.org/wiki/Help:Formula"  &gt;similar approach&lt;/a&gt;.</description>
    </item>
    <item>
      <pubDate>Sun, 23 Nov 2008 16:14:27 GMT</pubDate>
      <title>the test that stumped them all</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=160</link>
      <guid>http://djcbflux.blogspot.com/feeds/1610864353738041481/comments/default</guid>
      <description>Most of us are not &lt;a href="http://en.wikipedia.org/wiki/Donald_knuth"  &gt;Donald Knuth&lt;/a&gt;, and indeed need to &lt;em&gt;test&lt;/em&gt; our software. That is even true for my hobby projects - when I offer software for use by others, it's a matter of craftsmanship to deliver the best software possible. It's very hard to foresee all the possible environments (architecture, compiler, library version, ...) where my software might be run. But at least, I can minimize the number of programming errors by testing things as much as possible.&lt;br /&gt;&lt;p&gt;The trouble with testing, however, is that it is &lt;em&gt;dead boring&lt;/em&gt;. I hate doing boring things -- life is just too short. So, I want to do my testing in the least boring way possible --  I'd like to be able to simply run:&lt;br /&gt;&lt;code&gt;&lt;pre&gt;&lt;br /&gt;$ make test&lt;br /&gt;&lt;/pre&gt;&lt;/code&gt;&lt;br /&gt;and have that go through all my test cases, and report any failures. The idea is that if it is so easy to run tests, you might actually &lt;strong&gt;do&lt;/strong&gt; so, and make sure your software is working according to plan. When doing a release, it is &lt;em&gt;so&lt;/em&gt; easy to forget something &lt;em&gt;really obvious&lt;/em&gt;, for which you get embarrassing bug reports... Running some automated tests gives some peace of mind when doing a release.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;gtest&lt;/h3&gt;Since 2.16, the &lt;a href="http://en.wikipedia.org/wiki/Glib"  &gt;GLib library&lt;/a&gt; offers a unit-testing framework called &lt;a href="http://library.gnome.org/devel/glib/stable/glib-Testing.html"  &gt;GTest&lt;/a&gt; (note, this is not to be confused with &lt;em&gt;Google Test&lt;/em&gt;, sometimes also called GTest). GTest is not much different from, say, &lt;a href="http://check.sourceforge.net/"  &gt;check&lt;/a&gt;, but it's part of GLib and integrates nicely with it. I have started to use it for mu, and I am quite happy with it. 
Here, I will not go into the details of actually &lt;em&gt;writing test cases&lt;/em&gt;, but talk about how to integrate GTest with your code. For the best results, you'd probably want to integrate it with your build system. I am using autotools.&lt;br /&gt;&lt;p&gt;The overall setup is that for all my directories with code, there is a subdirectory &lt;tt&gt;tests/&lt;/tt&gt; which contains the test code. Those test cases are &lt;strong&gt;unit-tests&lt;/strong&gt;, which test one function or a couple of them combined. Now, of course it's a lot easier when your code is written in such a way that makes this easy[1]. In addition to the per-directory &lt;tt&gt;tests/&lt;/tt&gt;, there is also a top-level &lt;tt&gt;tests/&lt;/tt&gt;, which tests the whole software workflow. In the case of &lt;a href="http://www.djcbsoftware.nl/code/mu" &gt;mu&lt;/a&gt;, this means that the tests will index some test messages, fill a database with that, and then run some test queries against this database. When all of that works correctly, I am quite confident that my software is not &lt;em&gt;totally&lt;/em&gt; broken.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;autotools&lt;/h3&gt;Now, let's discuss how you can integrate GTest with your code; this is inspired by the way GTK+ does it these days. First, here is &lt;tt&gt;gtest.mk&lt;/tt&gt;, a file in the top of my source tree, that I include in all &lt;tt&gt;Makefile.am&lt;/tt&gt;s that require GTest support:&lt;br /&gt;&lt;code&gt;&lt;pre&gt;&lt;br /&gt;TEST_PROGS=&lt;br /&gt;&lt;br /&gt;test: all $(TEST_PROGS)&lt;br /&gt;        @ test -z "$(TEST_PROGS)" || gtester -l --verbose $(TEST_PROGS); \&lt;br /&gt;        test -z "$(SUBDIRS)" || \&lt;br /&gt;                for subdir in $(SUBDIRS); do \&lt;br /&gt;                        test "$$subdir" = "." || \&lt;br /&gt;                (cd $$subdir &amp;&amp; $(MAKE) $(AM_MAKEFLAGS) $@ ) || exit $? 
; \&lt;br /&gt;                done&lt;br /&gt;&lt;br /&gt;.PHONY: test&lt;br /&gt;&lt;/pre&gt;&lt;/code&gt;&lt;br /&gt;This blob adds a &lt;tt&gt;test&lt;/tt&gt; target to various &lt;tt&gt;Makefile&lt;/tt&gt;s, which will run the &lt;tt&gt;gtester&lt;/tt&gt; program (part of GTest) with your test programs. &lt;br /&gt;In my &lt;tt&gt;configure.ac&lt;/tt&gt; I have:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# g_test was introduced in glib 2.16&lt;br /&gt;PKG_CHECK_MODULES(g_test,glib-2.0 &gt;= 2.16,&lt;br /&gt;                  [have_gtest=yes],[have_gtest=no])&lt;br /&gt;AM_CONDITIONAL(MU_HAVE_GTEST, test "x$have_gtest" = "xyes")&lt;br /&gt;if test "x$have_gtest" = "xno"; then&lt;br /&gt;   AC_MSG_WARN([You need GLIB version &gt;= 2.16 to build the unit tests])&lt;br /&gt;fi&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;With this, I make sure that my code also works with older versions of GLib; the unit tests will only work with newer versions, of course. With this, you'll have a symbol &lt;tt&gt;MU_HAVE_GTEST&lt;/tt&gt; that you can use in your &lt;tt&gt;Makefile.am&lt;/tt&gt;; for example, in &lt;tt&gt;index/Makefile.am&lt;/tt&gt;, I have:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;include $(top_srcdir)/gtest.mk&lt;br /&gt;&lt;br /&gt;SUBDIRS= .&lt;br /&gt;&lt;br /&gt;if MU_HAVE_GTEST&lt;br /&gt;SUBDIRS += tests&lt;br /&gt;endif&lt;br /&gt;[....]&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;As you can see, it includes &lt;tt&gt;gtest.mk&lt;/tt&gt; mentioned above, and (conditionally) adds &lt;tt&gt;tests/&lt;/tt&gt; as a subdirectory to visit. The unit tests are in this subdirectory. Note that by explicitly setting &lt;tt&gt;SUBDIRS&lt;/tt&gt; to '.' first, we ensure that first we build the code in index, before we go to &lt;tt&gt;tests/&lt;/tt&gt;.&lt;br /&gt;&lt;h3&gt;unit tests&lt;/h3&gt;Below is a simple example unit test program; it only uses a small subset of GTest. 
You can further organize your test cases (see &lt;a href="http://library.gnome.org/devel/glib/stable/glib-Testing.html#GTestSuite"  &gt;GTestSuite&lt;/a&gt; and &lt;a href="http://library.gnome.org/devel/glib/stable/glib-Testing.html#GTestCase"  &gt;GTestCase&lt;/a&gt;) and see &lt;em&gt;Fixtures&lt;/em&gt;, which set up the testing environment. I don't use those, but they might be useful for others. In general, I am only using a small subset; check out the GTest-documentation to find out more. Anyway, here are some simple test cases:&lt;br /&gt;&lt;code&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;glib.h&amp;gt;&lt;br /&gt;#include "my-code-to-test.h"&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;static void&lt;br /&gt;test_num_str (void)&lt;br /&gt;{&lt;br /&gt;        char *str;&lt;br /&gt;&lt;br /&gt; g_assert_cmpstr (str = my_num_str(1001),==,"one thousand and one");&lt;br /&gt; g_free (str);&lt;br /&gt;&lt;br /&gt; g_assert_cmpstr (str = my_num_str(-1),==,"minus one");&lt;br /&gt; g_free (str);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;static void&lt;br /&gt;test_warning (void)&lt;br /&gt;{&lt;br /&gt; /*  no complex roots: my_sqrt(-1) should&lt;br /&gt;         *  return MY_SQRT_ERROR and issue a g_warning; the &lt;br /&gt;         *  g_warning will trigger the process to fail,&lt;br /&gt;         *  which is what we're expecting */&lt;br /&gt; if (g_test_trap_fork (0, G_TEST_TRAP_SILENCE_STDERR))  &lt;br /&gt;  g_assert (my_sqrt (-1) == MY_SQRT_ERROR);&lt;br /&gt; &lt;br /&gt; g_test_trap_assert_failed ();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;int&lt;br /&gt;main (int argc, char *argv[])&lt;br /&gt;{&lt;br /&gt; g_test_init (&amp;argc, &amp;argv, NULL);&lt;br /&gt;&lt;br /&gt; g_test_add_func ("/mytests/test-num-str", test_num_str);&lt;br /&gt; g_test_add_func ("/mytests/test-warning", test_warning);&lt;br /&gt;&lt;br /&gt; return g_test_run ();&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Now, we can run our tests 
with:&lt;br /&gt;&lt;pre&gt;$ make test&lt;/pre&gt;&lt;br /&gt;(Note that the test cases are &lt;tt&gt;fork()&lt;/tt&gt;ed, and you can actually write a test case where it &lt;em&gt;passes&lt;/em&gt; if an &lt;tt&gt;abort&lt;/tt&gt; or even a segfault occurs.)&lt;br /&gt;&lt;p&gt;For mu-0.4 I get the following output:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;[...]&lt;br /&gt;make[1]: Entering directory `/home/djcb/src/mu-0.4/tests'&lt;br /&gt;TEST: test-index-search... (pid=15553)&lt;br /&gt;  /all/test-query01:                                                   OK&lt;br /&gt;  /all/test-query02:                                                   OK&lt;br /&gt;  /all/test-query03:                                                   OK&lt;br /&gt;  /all/test-query04:                                                   OK&lt;br /&gt;  /all/test-query05:                                                   OK&lt;br /&gt;  /all/test-query06:                                                   OK&lt;br /&gt;  /all/test-query07:                                                   OK&lt;br /&gt;  /all/test-stats01:                                                   OK&lt;br /&gt;PASS: test-index-search&lt;br /&gt;make[1]: Leaving directory `/home/djcb/src/mu-0.4/tests'&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Nice and easy; if you're less lucky, you might get something like:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;make[1]: Entering directory `/home/djcb/src/mu-0.4/tests'&lt;br /&gt;TEST: test-index-search... (pid=16024)&lt;br /&gt;  /all/test-query01:                                                   **&lt;br /&gt;ERROR:test-index-search.c:117:query_01: assertion failed (mu_msg_sqlite_get_subject(row) == "this can't be right"): ("Re: What does 'run' do in cperl-mode?" 
== "this can't be right")&lt;br /&gt;FAIL&lt;br /&gt;GTester: last random seed: R02S2d24e3907b0c62e6a008e891f401fedf&lt;br /&gt;/bin/bash: line 5: 16023 Terminated              gtester --verbose test-index-search&lt;br /&gt;make[1]: Leaving directory `/home/djcb/src/mu-0.4/tests'&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;With that, all we need to do is fix the bug and test again... rinse-lather-repeat. Using GTest, it's really easy to run test cases. In general I try to keep my software passing the tests at the end of every programming session. Now, this does not work when I do &lt;em&gt;big&lt;/em&gt; changes, but after stabilizing things again, I make sure all test cases pass, both old and new. &lt;br /&gt;&lt;br /&gt;&lt;h3&gt;parting thoughts&lt;/h3&gt;One thing still missing from GTest is some way to see the &lt;a href="http://en.wikipedia.org/wiki/Code_coverage"  &gt;code coverage&lt;/a&gt;, i.e. to see which parts of the code are covered by tests. I think it should be possible to do this using &lt;a href="http://gcc.gnu.org/onlinedocs/gcc/Gcov.html"  &gt;gcov&lt;/a&gt;, but it'd be nice if someone automated that a bit. Another issue is that for effective use, you will need something like the setup described here. One can hardly expect someone new to Unix-development to figure this out by themselves... but of course, we cannot really blame GTest for that.&lt;br /&gt;&lt;p&gt;Hopefully my setup helps a bit to set up non-boring testing (even though it might be a bit boring in itself...). There are real-life examples of this in both &lt;a href="http://www.djcbsoftware.nl/code/mu" &gt;mu&lt;/a&gt; and &lt;a href="http://www.gtk.org/" &gt;GTK+&lt;/a&gt;. And finally, if you find any inaccuracies, please let me know -- there are no unit tests for blog entries to save me from mistakes...&lt;br /&gt;&lt;hr&gt;&lt;br /&gt;&lt;small&gt;[1] Now, a discussion of how to write easily testable functions deserves its own blog entry, but there are some general things to keep in mind. 
Keep your functions short, limit the number of parameters, avoid global variables, limit side-effects to only a few functions, etc. In other words, use the lessons learnt from &lt;a href="http://en.wikipedia.org/wiki/Functional_programming_language" &gt;functional programming languages&lt;/a&gt;. And as a nice side-effect (ha!), such functions tend to be much less error-prone in the first place.&lt;br /&gt;&lt;/small&gt;</description>
    </item>
    <item>
      <pubDate>Sun, 23 Nov 2008 16:14:27 GMT</pubDate>
      <title>i dream in infra red</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=159</link>
      <guid>http://djcbflux.blogspot.com/feeds/2732137093089118494/comments/default</guid>
      <description>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_kGFGcbwevHE/SQwpiGRcFtI/AAAAAAAAASY/q1trOy6hS24/s1600-h/athdolls.jpg" &gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_kGFGcbwevHE/SQwpiGRcFtI/AAAAAAAAASY/q1trOy6hS24/s400/athdolls.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5263627730265315026" /&gt;&lt;/a&gt;&lt;br /&gt;I released &lt;a href="http://www.djcbsoftware.nl/code/mu"  &gt;mu 0.4&lt;/a&gt; (my e-mail indexing/search tool), and as always, I try to &lt;em&gt;learn&lt;/em&gt; things from it.&lt;br /&gt;&lt;br /&gt;One of the main problems with writing correct and maintainable software is &lt;strong&gt;complexity&lt;/strong&gt;. I am not talking about &lt;a href="http://en.wikipedia.org/wiki/Computational_complexity_theory"  &gt;&lt;em&gt;computational&lt;/em&gt;&lt;/a&gt; (big-O) complexity here - I am talking about code complexity, as a subjective measure for readability. Some people write very elegant and readable code, while others write code that is &lt;strong&gt;very hard&lt;/strong&gt; to understand. It would be nice to have some objective measure.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;cyclomatic complexity&lt;/h3&gt;While certainly not perfect, I found McCabe's &lt;a href="http://en.wikipedia.org/wiki/Cyclomatic_complexity"  &gt;&lt;em&gt;Cyclomatic Complexity&lt;/em&gt;&lt;/a&gt; a useful tool for this. Thomas J. McCabe describes his method in his &lt;a href="http://www.literateprogramming.com/mccabe.pdf"  &gt;classic paper&lt;/a&gt; from 1976 as a metric of the flow graph of the program. I won't go into the details of the exact calculation here (it's straightforward though, read the paper) -- the bottom line is that the higher the complexity, the harder the code is to understand and to test. 
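&lt;br /&gt;&lt;p&gt;Just to give a feel for the numbers, here is a minimal sketch, assuming the usual graph formulation of the metric: a flow graph with E edges, N nodes and P connected components (these letters are only labels picked for this example). In TeX notation, the complexity M would then be:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt; M = E - N + 2P&lt;/pre&gt;&lt;br /&gt;For a hypothetical function that consists of a single if/else, the graph has four nodes (the condition, the two branches and the point where they join) and four edges, all in one component, which gives:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt; M = 4 - 4 + 2 \cdot 1 = 2&lt;/pre&gt;&lt;br /&gt;That is, one decision and two possible paths through the function.&lt;br /&gt;&lt;p&gt;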
Indeed, it's not just about readability for humans: the complexity has a &lt;em&gt;direct&lt;/em&gt; relation to the number of code paths, and consequently, the testability of the function. If complexity is high, you'll have an unholy number of code paths, which are impossible to fully test, and software quality will suffer.&lt;br /&gt;&lt;p&gt;Making sure your code is not too complex (according to this measure) simply means ensuring that there are not too many code paths (really: &lt;em&gt;decisions&lt;/em&gt;); i.e., split your code into short functions that do one thing, and do it well.&lt;br /&gt;&lt;h3&gt;pmccabe&lt;/h3&gt;Now, how do we get the numbers to identify overly complex functions? Thankfully, we don't need to calculate anything by hand. There is the &lt;tt&gt;pmccabe&lt;/tt&gt;-package (Debian/Ubuntu) which does the work for us, for example:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;$ pmccabe -fv prime.c &lt;br /&gt;Modified McCabe Cyclomatic Complexity&lt;br /&gt;|   Traditional McCabe Cyclomatic Complexity&lt;br /&gt;|       |    # Statements in function&lt;br /&gt;|       |        |   First line of function&lt;br /&gt;|       |        |       |   # lines in function&lt;br /&gt;|       |        |       |       |  filename(definition line number):function&lt;br /&gt;|       |        |       |       |           |&lt;br /&gt;6 6 18 4 26 prime.c(5): main&lt;br /&gt;6 6 19 1 30 prime.c&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;An interesting example of complexity is the &lt;tt&gt;__strptime_internal&lt;/tt&gt; function in &lt;a href="http://svn.gnome.org/viewvc/evolution-data-server/trunk/libedataserver/e-time-utils.c?view=markup"  &gt;evolution-data-server/trunk/libedataserver/e-time-utils.c&lt;/a&gt;, which has a complexity of &lt;strong&gt;196&lt;/strong&gt;(!). 
I am glad I do not have to maintain that one...&lt;br /&gt;&lt;h3&gt;recommendation&lt;/h3&gt;The maximum &lt;em&gt;recommended&lt;/em&gt; cyclomatic complexity for a function is debatable - but many coding guidelines suggest a value of &lt;strong&gt;10&lt;/strong&gt;. If you go much beyond that, it's easy to see that the function gets very complex.&lt;br /&gt;&lt;p&gt;As always, we should use guidelines with care. I can imagine some inherently complex algorithms that you nevertheless wouldn't like to split, precisely &lt;em&gt;because&lt;/em&gt; you want to keep things as understandable as possible. But those will be rare exceptions.&lt;br /&gt;&lt;h3&gt;practical&lt;/h3&gt;Obviously, limiting cyclomatic complexity is not sufficient to create maintainable software; there are still many other opportunities for making your code hard to understand. Still, it does not hurt to at least keep this one aspect under control, especially as experience suggests there is a high correlation between function complexity and error density. Fortunately, it's usually not too hard to reduce the complexity: split big functions (carefully!) into smaller ones; logical units that do one thing, and do one thing well.&lt;br /&gt;&lt;p&gt;I made sure the new &lt;a href="http://www.djcbsoftware.nl/code/mu"  &gt;mu&lt;/a&gt; follows the &lt;em&gt;&amp;lt;=10&lt;/em&gt;-rule. I found some extra targets for Makefiles quite useful for that:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;cc10:&lt;br /&gt; @pmccabe `find -name '*.c'` | sort -nr | awk '($$1 &gt; 10)'&lt;br /&gt;&lt;br /&gt;cc20:&lt;br /&gt; @pmccabe `find -name '*.c'` | sort -nr | awk '($$1 &gt; 20)'&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now, I can simply type &lt;tt&gt;make cc10&lt;/tt&gt; or &lt;tt&gt;make cc20&lt;/tt&gt; to get all the functions that violate the rule &lt;em&gt;CC &amp;lt;= 10&lt;/em&gt; or &lt;em&gt;CC &amp;lt;= 20&lt;/em&gt;, respectively. 
Mu version 0.3 still contained a handful of functions that broke the rule, but I have now simplified them by splitting the big functions up. In my projects, I have usually followed the rule to some extent, intuitively, but I definitely could have written better code if I had paid attention to the number earlier. There is of course a risk in changing working code just because of 'some number', but in the long run I think it will really pay off.</description>
    </item>
    <item>
      <pubDate>Sun, 23 Nov 2008 16:14:27 GMT</pubDate>
      <title>a kind of magic</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=158</link>
      <guid>http://djcbflux.blogspot.com/feeds/8079968286175075454/comments/default</guid>
      <description>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_kGFGcbwevHE/SQi-rxH8JJI/AAAAAAAAASQ/d3nga0ggo9M/s1600-h/magit.png" &gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 400px; height: 221px;" src="http://2.bp.blogspot.com/_kGFGcbwevHE/SQi-rxH8JJI/AAAAAAAAASQ/d3nga0ggo9M/s400/magit.png" alt="" id="BLOGGER_PHOTO_ID_5262665823712715922" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Today, just a short tip: if you are using &lt;a href="http://en.wikipedia.org/wiki/Emacs" &gt;emacs&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Git_%28software%29" &gt;git&lt;/a&gt;, I can recommend &lt;a href="http://zagadka.vm.bytemark.co.uk/magit/" &gt;magit&lt;/a&gt;.&lt;br /&gt;&lt;p&gt;Magit is a git-mode for &lt;tt&gt;emacs&lt;/tt&gt;, which makes using &lt;tt&gt;git&lt;/tt&gt; convenient and easy. Magit was created by &lt;span style="font-style: italic;"&gt;running mate &lt;/span&gt;&lt;a href="http://maemo.org/profile/view/mvo/" &gt;Marius&lt;/a&gt;. It's under heavy development, but I have been a happy user for a while. There is even a &lt;a href="http://zagadka.vm.bytemark.co.uk/magit/magit.html" &gt;user manual&lt;/a&gt;, which you actually don't need very much, as things work very much as you would expect.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;If you are &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt; using emacs, this might be a good reason to start.&lt;br /&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Sun, 23 Nov 2008 16:14:27 GMT</pubDate>
      <title>seek </title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=157</link>
      <guid>http://djcbflux.blogspot.com/feeds/8156449964507686557/comments/default</guid>
      <description>In my &lt;a href="http://djcbflux.blogspot.com/2008/10/chasing-time.html"  &gt;last entry&lt;/a&gt; I wrote a bit about optimizing &lt;a href="http://www.djcbsoftware.nl/code/mu"  &gt;my little project&lt;/a&gt;. One other significant optimization I found was &lt;strong&gt;inode-sorting&lt;/strong&gt;, an idea I got from some old postings on the &lt;a href="http://www.mutt.org"  &gt;mutt&lt;/a&gt; mailing list.&lt;br /&gt;&lt;br /&gt;The idea is as follows: some file systems, in particular &lt;a href="http://en.wikipedia.org/wiki/Ext3"  &gt;ext3&lt;/a&gt;, support &lt;em&gt;hashed b-trees&lt;/em&gt; to speed up lookups in large directories (&lt;a href="http://www.usenix.org/publications/library/proceedings/als01/full_papers/phillips/phillips_html/index.html"  &gt;paper&lt;/a&gt;). That's nice for finding particular files. However, as a side-effect, when you scan full directories (as &lt;a href="http://www.djcbsoftware.nl/code/mu"  &gt;mu&lt;/a&gt; does when indexing), you might get the entries back in a rather chaotic order. If you then try to &lt;em&gt;open&lt;/em&gt; the files in that order, you suffer from long seek times, and consequently, bad performance.&lt;br /&gt;&lt;br /&gt;The solution is to sort the dir entries by their &lt;a href="http://en.wikipedia.org/wiki/Inode"  &gt;inode&lt;/a&gt; (in ascending order), and then open the corresponding files in that order. This is what &lt;tt&gt;mu&lt;/tt&gt; (&lt;tt&gt;mu-index&lt;/tt&gt;) does by default, starting with version 0.3. You can turn it off with &lt;tt&gt;--tune-sort-inodes=0&lt;/tt&gt;, but there is usually little need for that, as the overhead of sorting is negligible.&lt;br /&gt;&lt;br /&gt;So, what difference does it make? Answer: it depends on how the files are laid out; if you already get your files back in their 'natural order', there won't be much difference - this is what happens on my main machine. 
But, on another (old) machine where the files are &lt;em&gt;not&lt;/em&gt; in that order, the improvements are substantial: I found that indexing 1500 messages takes &lt;strong&gt;25&lt;/strong&gt; seconds without inode-sorting, and goes down to &lt;strong&gt;15&lt;/strong&gt; seconds with inode-sorting; a nice &lt;strong&gt;40%&lt;/strong&gt; improvement.&lt;br /&gt;&lt;br /&gt;Note(1): this works for ext3 directories with &lt;tt&gt;dir_index&lt;/tt&gt; enabled; there's a &lt;a href="http://ubuntuforums.org/showthread.php?t=37806"  &gt;HOWTO&lt;/a&gt;. There are other file systems that have similar features, but I haven't tested those. Note(2): This optimization is not very useful for flash-based file systems, as they don't really care in what order you open files.</description>
    </item>
    <item>
      <pubDate>Sun, 23 Nov 2008 16:14:26 GMT</pubDate>
      <title>chasing time</title>
      <link>http://www.advogato.org/person/djcb/diary.html?start=156</link>
      <guid>http://djcbflux.blogspot.com/feeds/2772592045019408508/comments/default</guid>
      <description>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_kGFGcbwevHE/SPnt3tPxLjI/AAAAAAAAASE/kob4paC922M/s1600-h/waydown.jpg" &gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://3.bp.blogspot.com/_kGFGcbwevHE/SPnt3tPxLjI/AAAAAAAAASE/kob4paC922M/s400/waydown.jpg" alt="" id="BLOGGER_PHOTO_ID_5258495581226085938" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;As discussed &lt;a href="http://djcbflux.blogspot.com/2008/09/its-all-greek-to-me.html" &gt;before&lt;/a&gt;, I am working on a little hobby project called &lt;a href="http://www.djcbsoftware.nl/code/mu/" &gt;mu&lt;/a&gt;, for indexing/searching e-mail messages in maildirs. As a true hobby project, it's about finding things out. I'll take notes as I go along.&lt;br /&gt;&lt;h3&gt;indexing&lt;/h3&gt;One important part of indexing and searching is.... &lt;strong&gt;indexing&lt;/strong&gt;. Indexing (in this context) is the operation of recursively going through a &lt;a href="http://en.wikipedia.org/wiki/Maildir" &gt;maildir&lt;/a&gt;, analyzing each message file, and storing the results in a database. In mu's case, there are actually two databases, one &lt;a href="http://sqlite.org/" &gt;SQLite&lt;/a&gt;-database and one &lt;a href="http://www.xapian.org/" &gt;Xapian&lt;/a&gt;-database (a really interesting tool - to be discussed later).&lt;br /&gt;&lt;p&gt;Indexing may take a considerable amount of time; mu version 0.1 took 192 seconds (on average) to index 10000 messages in my testing corpus. And this version did not even support the Xapian database. Indexing involves reading from disk, querying the database to see if the message is already there, and if not, storing the message metadata. 
Because of this scheme, &lt;strong&gt;re&lt;/strong&gt;-indexing of the same 10000 messages only takes about 5 seconds (with re-indexing, only modified/new messages need to be indexed).&lt;br /&gt;&lt;br /&gt;&lt;p&gt;The full indexing operation probably does not happen very often, for most people. Still, I think it's very worthwhile to try and make it faster. Nobody likes to wait for 192 seconds, even once - and during development, I need to do a full index rather often. Another important reason is that optimizing software is simply &lt;strong&gt;interesting&lt;/strong&gt; - which is a main motivator for a hobby project.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;So, let's see how we can make this a bit faster; here I'll only discuss some of the database-related optimizations.&lt;br /&gt;&lt;/p&gt;&lt;h3&gt;transactions&lt;/h3&gt;As mentioned, &lt;tt&gt;mu&lt;/tt&gt; stores the indexing data in two databases; one &lt;a href="http://sqlite.org/" &gt;SQLite&lt;/a&gt;-database and one &lt;a href="http://www.xapian.org/" &gt;Xapian&lt;/a&gt;-database. Both of these databases support the concept of a &lt;a href="http://en.wikipedia.org/wiki/Database_transaction" &gt;transaction&lt;/a&gt;. By default, SQLite puts every query in a separate transaction. This is very safe, but also quite expensive. When indexing messages, there is no risk of data loss, so it's quite reasonable to increase the transaction size. And this makes things a &lt;strong&gt;lot&lt;/strong&gt; faster. Between &lt;tt&gt;mu&lt;/tt&gt; versions 0.1 and 0.2, I increased the default from one transaction per message (3 queries) to one transaction per 100 messages. This made indexing more than &lt;strong&gt;2.5&lt;/strong&gt; times faster -- see the table below. 
This improvement is even more impressive when considering that I also added full-text search, indexing message bodies as well (this is what Xapian is for).&lt;br /&gt;&lt;p&gt;For Xapian transactions, the default value I chose is 1000 messages per transaction -- but the performance effects are much smaller. So, my 'optimal' values are 100 and 1000, respectively. I found that transactions bigger than that don't improve the performance very much, but of course still affect memory usage. You can tune these with &lt;tt&gt;--tune-sqlite-transaction-size&lt;/tt&gt; and &lt;tt&gt;--tune-xapian-transaction-size&lt;/tt&gt;. The defaults should be just fine for the normal desktop use case - still, if you need a less memory-hungry but slower version, that is possible too. See the &lt;tt&gt;&lt;a href="http://www.djcbsoftware.nl/code/mu/man1/mu-index.1.html" &gt;mu-index(1)&lt;/a&gt;&lt;/tt&gt; man page for details.&lt;br /&gt;&lt;/p&gt;&lt;h3&gt;pragmatic&lt;/h3&gt;Another area for performance tuning is SQLite's &lt;a href="http://www.sqlite.org/pragma.html" &gt;&lt;tt&gt;PRAGMA&lt;/tt&gt;-statements&lt;/a&gt;. Some useful ones are &lt;tt&gt;PRAGMA synchronous=&lt;/tt&gt; (which you can influence with &lt;tt&gt;--tune-synchronous&lt;/tt&gt;) and &lt;tt&gt;PRAGMA temp_store=&lt;/tt&gt; (which you can tune with &lt;tt&gt;--tune-temp-store&lt;/tt&gt;). Again, see the &lt;tt&gt;&lt;a href="http://www.djcbsoftware.nl/code/mu/man1/mu-index.1.html" &gt;mu-index(1)&lt;/a&gt;&lt;/tt&gt; man page for details.&lt;br /&gt;&lt;p&gt;It turns out that &lt;tt&gt;PRAGMA synchronous&lt;/tt&gt; allows for some improvement. This setting determines whether SQLite does its writes in a synchronous way. Turning synchronous writes off is faster (and slightly less safe, but see the notes at the end of this blog entry). From the table below, it seems that &lt;tt&gt;PRAGMA temp_store&lt;/tt&gt; does not make much difference in this case. This PRAGMA determines where we store temporary (non-committed) results. 
Some testing suggests this is because, when we do not enable synchronous writing (above), even the 'file' temp_store never physically hits the disk, due to caching by the kernel.&lt;br /&gt;&lt;/p&gt;&lt;h3&gt;results&lt;/h3&gt;Having optimization options tunable through command line options is really useful. Software optimization, especially from what you read online, seems to be a field full of myths, outdated 'facts' and placebo-effects. And even if the information is correct, it may not apply to your use case. The only thing you can do is measure it. And with command line-options I can easily do that, as well as see how various combinations of optimizations perform.&lt;br /&gt;&lt;p&gt;Here's a table with the results for indexing 10000 messages with version 0.3. Between all the runs, I used&lt;br /&gt;&lt;/p&gt;&lt;pre&gt;# sync &amp;amp;&amp;amp; echo 3 &gt; /proc/sys/vm/drop_caches&lt;br /&gt;&lt;/pre&gt;to flush the caches. That's a critical step - the kernel caches a lot of data, which makes subsequent runs much faster if you don't flush the caches. 
And that is not what I wanted to measure.&lt;table border="1"&gt;&lt;br /&gt;&lt;tbody&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;&lt;strong&gt;msg/sqlite tx&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;msg/xapian tx&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;synchronous sqlite&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;temp store sqlite&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;time (s)&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;notes&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;1&lt;br /&gt;&lt;/td&gt;&lt;td&gt;1&lt;br /&gt;&lt;/td&gt;&lt;td&gt;full&lt;br /&gt;&lt;/td&gt;&lt;td&gt;file&lt;br /&gt;&lt;/td&gt;&lt;td&gt;1536&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;1&lt;br /&gt;&lt;/td&gt;&lt;td&gt;1&lt;br /&gt;&lt;/td&gt;&lt;td&gt;normal&lt;br /&gt;&lt;/td&gt;&lt;td&gt;default&lt;br /&gt;&lt;/td&gt;&lt;td&gt;182&lt;br /&gt;&lt;/td&gt;&lt;td&gt;similar to defaults for mu 0.1, but faster&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;100&lt;br /&gt;&lt;/td&gt;&lt;td&gt;1000&lt;br /&gt;&lt;/td&gt;&lt;td&gt;full&lt;br /&gt;&lt;/td&gt;&lt;td&gt;file&lt;br /&gt;&lt;/td&gt;&lt;td&gt;73&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;100&lt;br /&gt;&lt;/td&gt;&lt;td&gt;1000&lt;br /&gt;&lt;/td&gt;&lt;td&gt;no&lt;br /&gt;&lt;/td&gt;&lt;td&gt;file&lt;br /&gt;&lt;/td&gt;&lt;td&gt;68&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;100&lt;br /&gt;&lt;/td&gt;&lt;td&gt;1000&lt;br /&gt;&lt;/td&gt;&lt;td&gt;no&lt;br /&gt;&lt;/td&gt;&lt;td&gt;memory&lt;br /&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;68&lt;/strong&gt;&lt;br /&gt;&lt;/td&gt;&lt;td&gt;default for mu 0.3&lt;br /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;br /&gt;&lt;td&gt;10000&lt;br /&gt;&lt;/td&gt;&lt;td&gt;10000&lt;br /&gt;&lt;/td&gt;&lt;td&gt;no&lt;br /&gt;&lt;/td&gt;&lt;td&gt;memory&lt;br /&gt;&lt;/td&gt;&lt;td&gt;67&lt;br 
/&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;As an example, the default for mu version 0.3 is equivalent to:&lt;br /&gt;&lt;pre&gt;./mu-index --tune-sqlite-transaction-size=100 --tune-xapian-transaction-size=1000  --tune-synchronous=0 --tune-temp-store=2 ~/data/testmaildir&lt;br /&gt;&lt;/pre&gt;Again, see the &lt;tt&gt;mu-index(1)&lt;/tt&gt; manpage for details.&lt;br /&gt;&lt;p&gt;Note that these optimizations are a good strategy for indexing data, that is, generating data from data that is already safely stored somewhere else. If anything goes wrong, we can always restart the indexing later. However, if your database stores data that cannot easily be retrieved again afterwards (say, that one occurrence of the &lt;a href="http://en.wikipedia.org/wiki/Higgs_boson" &gt;Higgs boson&lt;/a&gt; in your particle accelerator), you would want to be a bit more careful.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;There are some more optimizations possible; some I have even implemented, such as &lt;em&gt;inode-sorting&lt;/em&gt;, which is documented in the &lt;tt&gt;&lt;a href="http://www.djcbsoftware.nl/code/mu/man1/mu-index.1.html" &gt;mu-index(1)&lt;/a&gt;&lt;/tt&gt; man page. To be discussed some other time.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
