Older blog entries for Stevey (starting at number 112)

Port Monitoring

 The new software I mentioned in my previous entry, Lestat, has just had an update.

 I was pleased with the previous release as it got some good feedback and useful suggestions - however even in the course of three days I've become almost embarrassed at some of the old code.

 PHP I've used before, for things like my LiveJournal Valentine System (double-blind blind dating .. almost), but never enough to code in a PHP-ish way.

 My PHP tends to look like Perl, clean but non-idiomatic.

 Still, the new release is out now - with template-based presentation, because I know that other people are more capable of writing GUIs than I am. Writing GUIs is hard; writing them well is harder still.

 I think in my previous project I was touched by a fair amount of luck when it came to creating the GUI - as most people liked it, and those that didn't were capable of creating new layouts/themes.

 Even now I feel humbled when viewing some other people's themes for my software (graphics intensive link).

 Sleep now.

Connection / Portscan Monitoring

 Prompted more by the spread of the MSBlaster RPC worm than anything else I've packaged and released my connection monitoring application, Lestat.

 This produces pretty graphs of connection attempts.

 The system is comprised of a Perl-based agent which collects packets and logs them to a database, and a PHP-based viewing system which pulls the data out and massages it into prettiness.


 I've been doing a fair bit of `real` hacking for the past few days... Looking through Debian packages for security holes.

 Mostly this has been triggered by somebody mailing me and telling me that the Debian Auditing Project had really nasty webpages - so I've updated them.

 Once I did that I got all enthusiastic and built up a list of all the setuid/setgid binaries in Debian stable, before starting to work my way through some of them.

 So far I've had several Debian Security Advisories published - and I've got a few more issues to report.

 Ideally I'd like to release one a day .. for the next few weeks!

 At the moment I have five in hand to report, so there is the chance that I can manage it.

 It's been a productive week or so - it looks like the proposal to audit all new setuid/setgid binaries before they enter the distribution is going to be accepted, so we should be ahead of the game :)


 In other life news I have a new cat.

 Cat Six is the successor to Tigger - (bet you thought I was gonna say cat 5 then didn't you? ;)

 I'm in love, she's beautiful and lovely and nice, and stuff :)

6 Jul 2003 (updated 6 Jul 2003 at 21:13 UTC) »
Bayesian Spelling

 A lot of people have heard of Bayesian Spam filtering recently, as a result of Paul Graham's "A Plan for Spam" article.

 I confess that my maths knowledge is lacking, but I can follow along with his idea. Counting tokens is trivial stuff, and applying weights to the different tokens appears to be reasonable - so I can see how it all works.

 Reading through the code of several implementations has been rewarding as I can see it all in action.

 The whole process has piqued my interest in statistics, something I've never really been that interested in before. I guess the closest statistical thing I have coded before has been Genetic Algorithms, where this kind of thing doesn't really turn up to the same extent.

 My formal maths training isn't terribly high, much like my computer training. Most of the things I know I've picked up by accidental discovery rather than pure theory, although I have read a lot of the literature over the past few years to shore up my home-learning approach to programming.

The Idea

 Whilst I was typing up the latest entry for my online journal I enabled the online spell checker.

 This managed to correct my erroneous spelling of "muscles" to "mussels". This was quite a fun mis-correction, but it did make me pause for thought.

 So often I've seen this in spell checkers before - you type "that" which is a real word - but not the one you should have written.

 Perhaps what we need is a statistical approach to spell checking; much like Paul's work - look over a corpus of previous emails/blog entries/whatever and look at the word distribution.

 Examining pairs of words it should be possible to see, for example, that "hot this" doesn't ever occur - but that "sex", "curry" and "weather" are acceptable words to follow "hot".

 I guess this does break down badly when you're using globally unique words for the first time - as there wouldn't be an entry in the database to describe them. So the first time you wrote "hot Madagascar" you'd be flagged as if you'd made an error.
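
 A minimal sketch of the pair-counting idea, in C. Everything here is hypothetical and chosen for illustration: the names learn and pair_seen, the fixed-size table, and the splitting on whitespace only. It learns adjacent word pairs from a corpus and reports how often a given pair was seen; a count of zero is what would get a pair flagged, which is also why a phrase used for the first time looks like an error.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 1024   /* assumed limit, purely for the sketch */
#define MAX_WORD  32     /* longer words are truncated */

/* One observed pair of adjacent words and how often it was seen. */
struct pair_count {
    char first[MAX_WORD];
    char second[MAX_WORD];
    int count;
};

static struct pair_count pairs[MAX_PAIRS];
static int n_pairs = 0;

/* Return how often "first second" was seen, or 0 if never. */
int pair_seen(const char *first, const char *second)
{
    for (int i = 0; i < n_pairs; i++) {
        if (strcmp(pairs[i].first, first) == 0 &&
            strcmp(pairs[i].second, second) == 0)
            return pairs[i].count;
    }
    return 0;
}

/* Record one occurrence of the pair; dropped if the table is full. */
static void add_pair(const char *first, const char *second)
{
    for (int i = 0; i < n_pairs; i++) {
        if (strcmp(pairs[i].first, first) == 0 &&
            strcmp(pairs[i].second, second) == 0) {
            pairs[i].count++;
            return;
        }
    }
    if (n_pairs == MAX_PAIRS)
        return;
    snprintf(pairs[n_pairs].first, MAX_WORD, "%s", first);
    snprintf(pairs[n_pairs].second, MAX_WORD, "%s", second);
    pairs[n_pairs].count = 1;
    n_pairs++;
}

/* Split text on whitespace and learn every adjacent word pair.
   Only the first 4095 bytes of text are examined. */
void learn(const char *text)
{
    char buf[4096];
    char prev[MAX_WORD] = "";
    char cur[MAX_WORD];

    snprintf(buf, sizeof buf, "%s", text);
    for (char *w = strtok(buf, " \t\n"); w != NULL; w = strtok(NULL, " \t\n")) {
        snprintf(cur, sizeof cur, "%s", w);
        if (prev[0] != '\0')
            add_pair(prev, cur);
        memcpy(prev, cur, sizeof prev);
    }
}
```

 After learn("hot curry and hot weather"), pair_seen("hot", "curry") is non-zero while pair_seen("hot", "this") is 0. A real checker would presumably want a hash table and some allowance for unseen pairs rather than a linear scan and a hard zero.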

 It's an interesting idea though nonetheless. I wonder if it's been done before?


 I like your idea of a good visualization tool for duplicate file finding.

 As you might have seen from my recent diary entry I spent a while working on a quick and dirty script for finding duplicate files.

 I'd love to see a screenshot if you could dig one up - as I have a hard time imagining a useful GUI for such a tool.

 Finding duplicate directories might be simple, but displaying partial duplications seems tricky to me - maybe I just don't have the eye for it.


 Spent a while investigating online presentation systems recently for managing a new website in a collaborative manner.

 I narrowed down the list of systems to a couple - then went looking through the code to see how secure/paranoid/flexible each one was.

 (Due to my mistrust of such systems - How many times have holes been pointed out in PHPNuke et al?)

 Depressingly in both cases I found exploitable weaknesses. To my shame I tried to demonstrate one in a non-malicious manner after the author(s) didn't seem to understand what I had discovered and reported ... it went wrong. The main site was borked for around 15 minutes.

 I guess there's a good side, in that the admins can now spot the problem, but the down side is that I may have inspired evil people to take advantage.

 It was a genuine error for which I can only apologise profusely; in my investigation I hadn't realised quite what effect I'd have.

 C'est la vie ..

 Based on early responses the sites/packages will both be fixed shortly so a "Name and Shame" is inappropriate - but I'll document the flaws which might encourage other authors to take more care and be more paranoid in the future...


 Nothing much to report - I wrote some quick and dirty scripts today to find duplicate files as I'm bad at organizing MP3s.

 First we scan through a directory, recursively, writing out a temporary file containing MD5 hashes and filenames - then we use that to find duplicate files.

 Handy, but messy.
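
 The second stage described above can be sketched in C roughly as follows. This is not the original script: it assumes the first stage has already produced (digest, filename) pairs, and the names struct entry and report_duplicates are made up for the example. Sorting by digest puts identical files next to each other, so one pass over the sorted list finds them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One line of the temporary file: a hex digest and the file it came from. */
struct entry {
    const char *digest;
    const char *path;
};

/* qsort comparator: order entries by digest string. */
static int by_digest(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;
    return strcmp(x->digest, y->digest);
}

/* Sort entries by digest, print every file whose digest matches the
   entry before it, and return how many such files were printed. */
size_t report_duplicates(struct entry *entries, size_t n)
{
    size_t dups = 0;

    qsort(entries, n, sizeof entries[0], by_digest);
    for (size_t i = 1; i < n; i++) {
        if (strcmp(entries[i].digest, entries[i - 1].digest) == 0) {
            printf("%s duplicates %s\n", entries[i].path, entries[i - 1].path);
            dups++;
        }
    }
    return dups;
}
```

 Matching digests only strongly suggest identical contents; comparing the files byte for byte before deleting anything would be the cautious option.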


 It looks like I was responsible for the following two Debian Security Advisories:

 (Details here, and here respectively).

 I'm such a naughty boy ... ;)

salmoni - No I guess all cats are perfect, although some are more perfect than others ;)


 I spent an hour or two working on the Advogato codebase last night - adding support for Article Editing.

 This isn't complete yet because I'm having issues with the way that articles are posted. What I have is an 'Edit' link displayed next to an article if you're the author.

 The edit link brings up the article, preamble, and title in a form which you may edit and submit.

 This is where it goes wrong - when the form is submitted a new article is posted with the changes applied. What should happen is the old article should be updated. I'll deal with this tonight if I get time.

 There are other issues to deal with - such as the forking of Advogato. There are many different versions of the code now. I think I'm correct in saying that Steven Rainwater's version is the most up to date - but there are different fixes and changes in each version.

 I have packaged a copy for Debian which is pretty standard with only the addition of my password emailing patch and no other changes. (I can make the .deb file available to the world if there's any interest - I didn't do this initially to avoid polluting the world with yet another codebase).

 The article editing I started just for fun - if it's complete I've no idea what to do with it. Keep it to myself? Add it to my .deb?


 Over the weekend I changed a lot of things in my MP3 streamer: rather than reading the tags from each audio file as needed, there's now an indexer script which builds up a database of all the files and tag information.

 This "database" is used throughout the code which provides a huge performance win - at the expense of potentially out of date information.

 So far I'm assuming that the indexer will be run from cron, but I'm experimenting with the auto-rebuilding of the index whenever the machine is "idle"... We'll see how that goes.

 A new release is going to arrive soon - I'm determined to get it out before I drop offline during my housemoving.

13 Jun 2003 (updated 13 Jun 2003 at 23:30 UTC) »
Article Feedback

 I've been swamped with feedback over the recent article I posted, both as comments and as personal mail.

 Firstly thanks to everybody for taking the time to offer their thoughts, and secondly sorry for my delay in replying to you all.

 Partly I wanted to digest all the information, and partly because I'm just gearing up to move house. Three weeks and counting!

 Two quick non-specific comments:

  • An idea repository appears to be a useful idea; but the site would have to be very flexible to be worthwhile.
  • I'm surprised lots of people jumped upon the Exchange mention - it was just a random example of a piece of software that I would like to see, not something I want to write myself. (To my mind any Exchange replacement will not be a PHP groupware system; that may be great - but if it's not a server that an unmodified copy of Outlook can connect to, it's not an Exchange server..)

 I've been writing code sporadically recently, working upon my MP3 streamer and trying to get it to pre-generate a cache of all the song tags.

 This is a fairly nice thing to work upon, it's simple, self-contained and doesn't involve any tricky coding. A job that I can divide into discrete parts and implement and test fairly easily.

 Hopefully this will be complete over the weekend, subject to my erratic sleeping pattern.


 Did I say I was moving soon? OK I am, and I'm alternating between wishing it wasn't necessary and hating the fact it's not happening yet.

 The reason for the move is entirely my own - until last September I had a cat who lived in my flat with me.

 I've always been a cat lover, and they really do suit my moods and temperament.

 However I grew up spending my summers upon my gran's farm - surrounded by Alsatians.

 When I lost my cat I was determined that he couldn't be "replaced", and I had the thought of making my next pet a dog.

 My flat being too small to lock a dog inside alone whilst I work I decided to move house - taking advantage of the fact that the Edinburgh property market is obscene, and I could double the money I'd paid for this property only 3 years ago.

 So now I've got a nice new flat all lined up - top floor overlooking a park. Twice the size of my current place and very very sexy.

 As I'm getting closer to the entry date I've been looking at lots of 2 or 3 year old Alsatians and Labradors .. but I'm increasingly thinking cat-ness is the pet for me.

 Wouldn't I feel stupid essentially having moved for no reason if I end up with another perfect cat?


 I posted my second ever article - this was really the result of some discussions in the pub with some friends over the weekend.

 Be gentle with me ..


 The company I'm working for is looking to upgrade the telephone system it uses, as sysadmin I'm in charge of this by default.

 Whilst networking is interesting I find I'm learning more than I really care to about telephone systems, voicemail, DDI, and switchboard systems.

 The first presentation I received was interesting, I felt curious about the technology involved to route calls from the exchanges down to our switchboard and from that to our individual telephones - but now I'm getting blasé about the whole thing.

 I can't help thinking this is a bad thing in a way that I cannot articulate appropriately. I know that I'm not turned on by all technology, and indeed I'm not a fan of the telephone in general - but even so I feel I should be interested.

 Maybe it's another artifact of my increasing sloth and apathy... as the temperature rises every summer I get less motivated and more sleepy. Always.

 Wintertime is my time .. in the same way that night time is mine.

 It's at times like this I think I should have continued spending my life working in nightclubs; let's face it, it's got appeal:

  • Beautiful people (tm).
  • Free beer.
  • Keeps you fit (lifting all the crates around).
  • Surrounded by your friends.
  • Good music.

 To be fair I know I could never really go back to it, and there were downsides too - being busy from 10PM-5AM every Friday and Saturday puts a damper upon your social life - and dealing with drunken people doesn't make you feel good about the human race, or customers in general.

 Life's not so bad .. I just need something to motivate me - either computery (see that article I mentioned) or outdoorsy.

 (Either that or jwz should offer me a job in his club ;)


 That's a neat POP3 library, any thoughts about IMAP access too?

 One minor quibble: Your sample code contains a couple of exploitable buffer overflows.

 I guess this isn't a huge problem as the code is clearly sample code, and isn't installed setuid() or anything silly like that. I just have a reflex action of trying to persuade people to always be more careful...

int main(int argc,char** argv){
        int mysock;
        char myservername[64];
        char username[64];
        char password[64];

// Time passes ...

strcpy(myservername,argv[1]); strcpy(username,argv[2]); strcpy(password,argv[3]);

// Thorin sits down and starts singing about gold ..
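
 For what it's worth, a bounded copy is not much more typing. The helper below is my own sketch, not part of the library; the name copy_arg is made up. It refuses any string that would not fit in the destination rather than writing past the end of it.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst, where dst_size is the full capacity of dst
   including the terminating NUL. Returns 0 on success, or -1 if src
   would not fit, in which case dst is left as an empty string. */
int copy_arg(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return -1;
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);   /* len + 1 copies the NUL as well */
    return 0;
}
```

 In the sample that would mean something like copy_arg(myservername, sizeof myservername, argv[1]) followed by a check of the return value. Checking argc before touching argv[1] to argv[3] is probably worth doing too, if the elided part doesn't already.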
