Older blog entries for Stevey (starting at number 642)

Some things change, some things do not.

After seven years working from home I've resigned from my position at Bytemark.

Why? A combination of wanting to do something different coupled with the desire to reclaim my second bedroom, which is currently tied up as an office.

Working in an office in the future will be weird ("You mean I have to get dressed every day?!") but hopefully not unduly burdensome.

My two-year plan remains in effect: pay off this flat as soon as possible, then purchase another and rent this one out, giving me some income of my own, which I will need.

The "five" year plan involves me quitting work, so that I can stay home and raise children. That makes sense because sometime next year I'll become the partner who earns the least amount of monies, and I'll also be the partner with the lowest upper-bound on salary potential (short of moving to London/similar which I've always ruled out).

Having rental income for myself means I'm not utterly dependent on other money, and all being well this place will be 100% paid off within 18 months.

(After that lots of saving will take place for a deposit for the second place. We did bid on a couple of places locally, which were outstanding, but it is perhaps for the best we didn't win them. No more looking at ESPC!)

Bytemark now becomes a company I recommend 100% for hosting in the UK. In the past I've always said nice things, but I've not strongly recommended them/us, because I'm too biased.

All my personal hosting, except for one virtual machine, will remain at Bytemark indefinitely. Lovely, flexible, and great.

(I have one outside guest for the purposes of diversification. That currently lives at Mythic Beasts.)

Syndicated 2013-10-24 09:39:32 from Steve Kemp's Blog

5 Oct 2013 (updated 5 Oct 2013 at 10:14 UTC) »

I understand volunteering is hard

The tail end of this week was mostly spoiled by the discovery that libbeanstalkclient-ruby was not included in Wheezy.

Apparently it was removed because the maintainer had no time, and there were no reverse dependencies - #650308.

Debian maintainers really need to appreciate that having no official reverse-dependencies doesn't mean a package is unused.

Last year I needed to redesign our company's monitoring software, because we'd run out of options that scaled well. I came up with the (obvious) solution:

  • Have a central queue containing jobs to process.
    • e.g. Run a ping-test on host1.example.com
    • e.g. Run an SSH-probe on host99.example.com
    • e.g. Fetch a web-page from https://example3.net/ and test it has some text or a given HTTP status code.
    • (About 15 different test-types are available).
  • Have N workers each pull one job from the queue, execute it, and send the results somewhere.
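The list above can be sketched in miniature. This is purely illustrative: an in-memory array stands in for beanstalkd, and the job shapes, test types, and function names are invented here, not Custodian's actual API.

```javascript
// Minimal sketch of the central-queue/worker pattern.
// An in-memory array plays the part of beanstalkd; a real worker
// would reserve/delete jobs via a beanstalk client instead.

var queue = [
    { type: "ping", target: "host1.example.com" },
    { type: "ssh",  target: "host99.example.com" },
    { type: "http", target: "https://example3.net/", expect: 200 }
];

var results = [];

// Stand-in for actually probing a host.
function runTest(job) {
    return { job: job, ok: true };
}

// Each "worker" pulls one job at a time, executes it, and records
// the result; with a real queue these would be separate processes.
function worker() {
    var job;
    while ((job = queue.shift()) !== undefined) {
        results.push(runTest(job));
    }
}

[1, 2].forEach(worker);
```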

I chose beanstalkd for my central queue precisely because it was packaged for Debian, had a client library I could use, and seemed to be a good fit. It was a good fit: a year on we're still running around 5000 tests every minute with 10 workers.

The monitoring tool is called Custodian, and I think I've mentioned it before here and on the company blog.

It looks like we'll need to re-package the Ruby beanstalk client, and distribute it alongside our project now. That's not ideal, but also not a huge amount of work.

In summary? Debian you're awesome. But libraries shouldn't be removed unless it can't be helped, because you have more users than you know.

Syndicated 2013-10-05 09:09:30 (Updated 2013-10-05 10:14:09) from Steve Kemp's Blog

28 Sep 2013 (updated 28 Sep 2013 at 16:12 UTC) »

Some thoughts ..

It has taken just over two weeks for blogspam to reject 1 million SPAM comments.

I'm not sure how paranoid I should be about false-positives now (I accept false-negatives easily enough).

Using node.js is pretty good for making toy servers, and on that basis here's another toy server:

This is a small server designed to accept HTTP POSTs containing a message payload; these are stored and later retrieved. Seems like a simple thing, right? Imagine how it is used:

root@server1:~# record-log Upgraded mysql

root@server2:~# record-log Tweaked /etc/sysctl.conf

root@server3:~# record-log Added user 'bob'
root@server3:~# record-log Added user 'steve'

Later:

root@server3:~# get-recent
1.2.3.4 2013-09-28T08:08:09.211Z
root:Added user 'bob'

1.2.3.4 2013-09-28T08:08:10.211Z
root:Added user 'steve'

In short it makes it easy to record "activity", and later retrieve it. A host can only fetch the entries it stored, but if you've got access to the remote server then you can get all logs.
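The access rule above - each host sees only its own entries - boils down to keying the store by the submitting IP. A minimal in-memory sketch (names invented; the real service stores and serves these over HTTP):

```javascript
// Store entries keyed by submitting IP; a host can only
// read back the messages it stored itself.

var store = {};   // ip -> [ { time, message } ]

function recordLog(ip, message) {
    if (!store[ip]) store[ip] = [];
    store[ip].push({ time: new Date().toISOString(), message: message });
}

function getRecent(ip) {
    // Only the caller's own entries are visible.
    return store[ip] || [];
}
```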

I suspect a more standard solution is to use syslog-ng and logger, or similar. But it is a cute hack, and if you've the discipline to record actions then it is actually reasonably useful.

Syndicated 2013-09-28 10:36:53 (Updated 2013-09-28 16:12:21) from Steve Kemp's Blog

Random hacking for fun

Recently I've been playing gtetrinet, against the publicly accessible server at tetrinet.debian.net.

If you're unfamiliar with the game it is a multi-player variant of Tetris. You clear many lines and your opponents suffer. Want to make them suffer some more? Use the special blocks you acquire.

Special blocks? How about shuffling your opponents playing field? Adding new semi-formed rows? etc. All good stuff.

There is support for up to six players. To fire a special block at the player in field 1 you press "1"; to fire one at the player in field 6 you press "6". But to fire a block at yourself, to clear your playing field ("nuke") or remove a single line ("clear"), you have to know what player-number you are. That changes from day to day, as it is literally a marker for the order in which you joined the channel.

It seems obvious that there should be a special-case keybinding "fire to self", and indeed there was bug #291844 filed in 2005 saying as much. I've just submitted a functional patch to resolve this, and already my playing is getting better.

Join me sometime.

Syndicated 2013-09-26 19:31:02 from Steve Kemp's Blog

Some days you just want to do nothing

Today I finally pushed out a new binary release of my slaughter server-automation tool. (Think "CFEngine-lite", written in Perl. Full documentation is available, though nobody ever reads it.)

Otherwise the weekend is being quiet; we spent last night mostly drinking vodka until midnight rolled over, along with some messing around with a camera ("Wow, your arms are getting bigger!").

Today has consisted of a Turkish breakfast, an Indonesian dinner, and an ice-cream based tea.

I could write more, but I'm hung-over. A rare thing for me.

Syndicated 2013-09-22 14:06:32 from Steve Kemp's Blog

A new wordpress plugin

There is now a new wordpress plugin, for testing against my blogspam site/service.

Now time to talk about something else.

This week my partner's sister & niece are visiting from Helsinki, so we've both got a few days off work, and we'll be acting like tourists.

Otherwise the job of this week is to find a local photographer to shoot the pair of us. I've shot her many, many, many times, and we also have many nice pictures of me but we have practically zero photos of the pair of us.

I spent a lot of time talking to local volunteers & models, because I like to shoot them, but I know only a couple of photographers.

Still, it's a big city; we're bound to find somebody suitable :)

Syndicated 2013-09-17 08:47:32 from Steve Kemp's Blog

CIDR-matching, in node.js

I recently mentioned that there wasn't any built-in node.js functionality to perform IP matching against CIDR ranges.

This surprised me, given that lots of other functionality is available by default.

As a learning experience I've hacked a simple cidr-matching module, and published it as an NPM module.

I've written a few other javascript "libraries", but this is the first time I've published a module. Happy times.
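The matching itself is just 32-bit arithmetic: turn the dotted-quad address into an integer and compare the network portions under the prefix mask. A sketch of the idea (the function names here are illustrative, not necessarily the module's exported API):

```javascript
// Convert an IPv4 dotted-quad string to an unsigned 32-bit integer.
function ipToInt(ip) {
    return ip.split(".").reduce(function (acc, octet) {
        return (acc << 8) + parseInt(octet, 10);
    }, 0) >>> 0;
}

// Does `ip` lie within the CIDR range, e.g. "10.11.12.0/24"?
function cidrMatch(ip, cidr) {
    var parts = cidr.split("/");
    var bits  = parseInt(parts[1], 10);
    // /0 matches everything; otherwise keep only the top `bits` bits.
    var mask  = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
    return (ipToInt(ip) & mask) === (ipToInt(parts[0]) & mask);
}
```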

The NPM documentation was pretty easy to follow:

  • Write a package.json file.
  • Run "npm publish".
  • Wait for time to pass, and Thorin to sit down and sing about gold.
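The first of those steps amounts to a file along these lines; every field value below is made up for illustration, and the real module's metadata differs:

```json
{
  "name": "example-cidr-matcher",
  "version": "0.0.1",
  "description": "Illustrative metadata only.",
  "main": "index.js",
  "license": "GPL-2.0"
}
```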

Now I can take a rest, and stop talking about blog-spam.

Syndicated 2013-09-14 12:04:00 from Steve Kemp's Blog

The blogspam code is live.

Living dangerously I switched DNS to point to the new codebase on my lunch hour.

I found some problems immediately, but nothing terribly severe. Certainly nothing that couldn't wait until I'd finished work to attend to.

I've spent an hour or so documenting the new API this evening, and now I'm just going to keep an eye on things over the next few days.

The code is faster, definitely. The load is significantly lower than it would have been under the old codebase - although it isn't a fair comparison:

  • I'm using redis to store IP-blacklists, which expire after 48 hours. Not the filesystem.
  • The plugins are nice and asynchronous now.
  • I've not yet coded a Bayesian filter, but looking at the user-supplied options that's the plugin everybody seems to want to disable, so I'm in no rush.
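Redis gives that 48-hour expiry for free via EXPIRE/SETEX; here is an in-memory sketch of the same behaviour, purely to illustrate what the blacklist cache does (this is not the production code):

```javascript
// Auto-expiring IP blacklist: entries lapse after 48 hours.
// Redis provides this natively; here expiry is checked lazily on lookup.
// The optional `now` parameter exists only to make the logic testable.

var TTL_MS = 48 * 60 * 60 * 1000;
var blacklist = {};   // ip -> expiry timestamp (ms)

function blacklistAdd(ip, now) {
    blacklist[ip] = (now === undefined ? Date.now() : now) + TTL_MS;
}

function isBlacklisted(ip, now) {
    var expires = blacklist[ip];
    if (expires === undefined) return false;
    if ((now === undefined ? Date.now() : now) >= expires) {
        delete blacklist[ip];   // lazily drop stale entries
        return false;
    }
    return true;
}
```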

The old XML-RPC API is still present, but now it just proxies to the JSON-version, which is a cute hack. How long it stays alive is an open question, but at least a year I guess.

God knows what my wordpress developer details are. I suspect it's not worth updating the wordpress plugin, since nobody ever seemed to love it.

These days the consumers of the API seem to be, in rough order of popularity:

  • Drupal.
  • ikiwiki.
  • Trac.

There are a few toy-users, like my own blog, and a few other similar small blogs. All told, since lunchtime I've had hits from 189 distinct sources, the majority of which don't identify themselves. (I'm tempted not to process their requests in the future, but I don't think I can make such a change now without pissing off the world. Oops.)

PS. Those ~200 users? They've rejected 12,000 spam comments since this afternoon. That's cool, huh?

Syndicated 2013-09-12 20:09:24 from Steve Kemp's Blog

I've always relied upon the kindness of strangers

Many thanks to Vincent Meurisse who solved my node.js callback woe.

Some history of the blogspam service:

Back in 2008 I was annoyed by the many spam-comments that were being submitted to my Debian Administration website. I added some simple anti-spam measures, which reduced the flow, but it was a losing battle.

In the end I decided I should test comments, as the users submitted them, via some kind of external service. The intention being that any improvements to that central service would benefit all users. (So I could move to testing comments on my personal blog too, for example).

Ultimately I registered the domain-name "blogspam.net", and set up a simple service on it which would test comments and judge them to be "SPAM" or "OK".

The current statistics show that this service has stopped 20 million spam comments, since then. (We have to pretend I didn't wipe the counters once or twice.)

I've spent a while now re-implementing most of the old plugins in node.js, and I think I'll be ready to deploy the new service over the weekend. The new service will have to handle two different kinds of requests:

New Requests

These will be submitted via HTTP POSTed JSON data, and will be handled by node.js. These should be nice and fast.

Legacy Requests

These will come in via XML-RPC, and be proxied through the new node.js implementation. Hopefully this will mean existing clients won't even notice the transition.

I've not yet deployed the new code, but it is just a matter of time. Hopefully, being node.js-based and significantly easier to install, update, and tweak, it will attract more contributions too. The dependencies are happily very minimal:

  • A redis-server for maintaining state:
    • The number of SPAM/OK comments for each submitting site.
    • An auto-expiring cache of blacklisted IP addresses. (I cache the results of various RBL lookups for 48 hours.)
  • node.js

The only significant outstanding issue is that I need to pick a node.js library for performing CIDR lookups - "Does 10.11.12.23 lie within 10.11.12.0/24?" - I'm surprised that functionality isn't available out of the box, but it is the only thing I've found missing.

I've been keeping load & RAM graphs, so it will be interesting to see how the node.js service compares. I expect that if clients were using it in preference to the XML-RPC version I'd get a hell of a lot more throughput, but with it hidden behind the XML-RPC proxy I'm less sure what will happen.

I guess I also need to write documentation for the new/preferred JSON-based API...

https://github.com/skx/blogspam.js

Syndicated 2013-09-11 18:26:46 from Steve Kemp's Blog

node.js is kicking me

Today I started hacking on a re-implementation of my BlogSpam service - which tests whether incoming comments are SPAM or HAM - in node.js (blogspam.js)

The current API uses XML::RPC and a perl server, along with a list of plugins, to do the work.

Having had some fun and success with the HTTP+JSON mstore toy I figured I'd have a stab at making BlogSpam more modern:

  • Receive a JSON body via HTTP-POST.
  • Deserialize it.
  • Run the body through a series of Javascript plugins.
  • Return the result back to the caller via HTTP status-code + text.

In theory this is easy, I've hacked up a couple of plugins, and a Perl client to make a submission. But sadly the async-stuff is causing me .. pain.

This is my current status:

shelob ~/git/blogspam.js $ node blogspam.js
Loaded plugin: ./plugins/10-example.js
Loaded plugin: ./plugins/20-ip.js
Loaded plugin: ./plugins/80-sfs.js
Loaded plugin: ./plugins/99-last.js
Received submission: {"body":"

This is my body ..

","ip":"109.194.111.184","name":"Steve Kemp"}
plugin 10-example.js said next :next
plugin 20-ip.js said next :next
plugin 99-last.js said spam SPAM: Listed in StopForumSpam.com

So we've loaded the plugins, and each has been called. But although the end result was "SPAM: Listed ..", the caller didn't get that result. Instead the caller got this:

shelob ~/git/blogspam.js $ ./client.pl
200 OK 99-last.js

The specific issue is that I iterate over every loaded plugin and wait for them to complete. Because they complete asynchronously, the plugin which should run last and just return "OK" has executed before the 80-sfs.js plugin (which makes an outgoing HTTP request).

I've looked at async, I've looked at promises, but right now I can't get anything working.
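For what it's worth, the usual callback-flavoured fix is to stop iterating and instead chain the plugins, so each one explicitly triggers the next. This is my own sketch of that idea, with invented names; it isn't the code the project ended up with:

```javascript
// Run plugins strictly one after another: a plugin calls next() to
// pass control on, or spam(reason) to short-circuit with a verdict.
function runPlugins(plugins, submission, done) {
    function step(index) {
        if (index >= plugins.length) {
            return done("OK", null);   // nobody objected
        }
        plugins[index](submission, {
            next: function () { step(index + 1); },
            spam: function (reason) { done("SPAM", reason); }
        });
    }
    step(0);
}
```

A plugin that makes an outgoing HTTP request simply calls next() from inside its own response handler, so later plugins can never overtake it.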

Meh.

Surprise me with a pull request ;)

Syndicated 2013-09-10 17:57:12 from Steve Kemp's Blog

