Finally managed to get some thoughts together on the spam/non-spam issue, a mere fortnight behind pretty much everybody else.
I've focused on the corpus collection side of things, since I worked on the SpeechDat(II) project for a while (the link via the Welsh flag on that page has long been dead, sorry). I could've written more about lexical model adaptation, but chose not to in the end.
Anyway, here's a link to what I wrote. Comments appreciated.