Older blog entries for Rhys (starting at number 7)

Thought some here might like to know that University of Wales Bangor are holding an e-Welsh day on Saturday November 30th. This is to celebrate the formation of their new 'e-Welsh: Terminology and Language Engineering' unit. I mention it here only because the day 'will be concentrating especially on what open source software has to offer small languages such as Welsh, with the intention of creating an e-Welsh network of contacts to promote and give direction to this work.'

10:30-14:00 in Bangor, with simultaneous English translation provided. Further details are available.

(No connection with the day other than that I was sent its details).

Had my birthday recently. Many gifts, for which I was very grateful.

One, though, had to be mentioned here: a ty penguin beanie. Unfortunately for the present-givers, and hilariously for me, the nametag of the penguin wasn't checked before it was handed over. Which really should have been done...

...are ty trying to tell us something?

Well, it's official. Thanks Gareth. Here goes...

Hacking about a bit with Portaloo, writing quite a lot, and wondering what on earth this job site is trying to tell me.

Finally managed to get some thoughts together on the spam/non-spam issue, a mere fortnight behind pretty much everybody else.

I've focused on the corpus collection side of things, since I worked on the SpeechDat(II) project for a while (the link via the Welsh flag on that page is long down, sorry). I could've written more about lexical model adaptation, but chose not to in the end.

Anyway, here's a link to what I wrote. Comments appreciated.

I have this account's passphrase back (it was obvious when I saw it, but then these things always are, I guess). Thanks to Telsa and to yosh for their help.

I've been wondering about the way that the current group of probabilistic spam-filters, from Vipul's Razor via SpamAssassin to those inspired by Paul Graham's work, actually collect their spam/non-spam corpora, and, where appropriate, adapt their n-gram and other lexical analyses. I'm putting that here in order to embarrass myself into writing something about it in the very near future.
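
To make the corpus question concrete, here is a minimal sketch of what "collecting a corpus and deriving per-token statistics" might look like. It is only loosely in the spirit of the statistical filters mentioned above and is not the method of any of them: the function names, the crude tokenizer, the neutral 0.5 value for unseen tokens and the 0.01/0.99 clamping are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a message into lower-case tokens (a deliberately crude stand-in)."""
    return re.findall(r"[a-z0-9'$-]+", text.lower())


def train(spam_corpus, ham_corpus):
    """Count token occurrences in two hand-labelled corpora (lists of strings)."""
    spam_counts = Counter(tok for msg in spam_corpus for tok in tokenize(msg))
    ham_counts = Counter(tok for msg in ham_corpus for tok in tokenize(msg))
    return spam_counts, ham_counts


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified estimate of how strongly a token indicates spam.

    Compares the token's per-message rate in the spam corpus with its rate
    in the non-spam corpus. Real filters add refinements omitted here.
    """
    b = spam_counts[token] / max(n_spam, 1)
    g = ham_counts[token] / max(n_ham, 1)
    if b + g == 0:
        return 0.5  # unseen token: assumed neutral
    # Clamp so that no single token is ever treated as certain evidence.
    return min(0.99, max(0.01, b / (b + g)))


if __name__ == "__main__":
    spam = ["buy cheap pills now", "cheap pills cheap"]
    ham = ["meeting notes for the project", "project deadline is now"]
    spam_counts, ham_counts = train(spam, ham)
    for word in ("cheap", "project", "now"):
        print(word, token_spam_probability(word, spam_counts, ham_counts,
                                           len(spam), len(ham)))
```

Even in a toy like this, the numbers depend entirely on which messages end up in each list, which is why how the corpora get collected matters as much as the arithmetic.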

A lot's happened in the past month. My PhD grinds on, very slowly - current deadline for completion is March 31st. I have a Real Job for when I finish that. And I've almost completely neglected Advogato (sorry), but I'm glad I'm not a Journeyer any more. Later then...

22 Dec 2000 (updated 23 Dec 2000 at 08:05 UTC)


Posted a first article. <strike>Please be gentle :)</strike>

Posted a rather firmer riposte to first article.

Now I'm pleading with people to read my two home pages before going for the minimalist jugular.
