Recent blog entries for Stevey

Migration of services and hosts

Yesterday I carried out the upgrade of a Debian host from Squeeze to Wheezy for a friend. I like doing odd-jobs like this as they're generally painless, and when there are problems it is a fun learning experience.

I forgot to check on the status of the MySQL server on that particular host, which was a little embarrassing, but I later put together a reasonably thorough serverspec recipe describing how the machine should be set up, which will avoid that problem in the future - Introduction/tutorial here.

The more I use serverspec the more I like it. My own personal servers have good rules now:

shelob ~/Repos/git.steve.org.uk/server/testing $ make
..
Finished in 1 minute 6.53 seconds
362 examples, 0 failures

Slow, but comprehensive.

In other news I've now migrated every single one of my personal Mercurial repositories over to git. I didn't have a particular reason for doing that, but I've started using git more and more for collaboration with others, and using two systems felt like an annoyance.

That means I no longer have to host two different kinds of repositories, and I can use the excellent gitbucket software on my git repository host.

Needless to say I wrote a policy for this host too:

#
#  The host should be wheezy.
#
describe command("lsb_release -d") do
  its(:stdout) { should match /wheezy/ }
end


#
# Our gitbucket instance should be running, under runit.
#
describe supervise('gitbucket') do
  its(:status) { should eq 'run' }
end

#
# nginx will proxy to our back-end
#
describe service('nginx') do
  it { should be_enabled   }
  it { should be_running   }
end
describe port(80) do
  it { should be_listening }
end

#
#  Host should resolve
#
describe host("git.steve.org.uk" ) do
  it { should be_resolvable.by('dns') }
end

Simple stuff, but being able to trigger all these kinds of tests, on all my hosts, with one command, is very reassuring.
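That one command is just make, and behind it there is little more than a loop over the hosts - along these lines (a sketch; the host names, spec layout, and the TARGET_HOST convention are illustrative):

#!/bin/sh
# Run the serverspec tests against every host in turn.
for host in git.steve.org.uk www.steve.org.uk; do
    echo "Testing ${host}"
    TARGET_HOST="${host}" bundle exec rspec "spec/${host}" || exit 1
done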

Syndicated 2014-08-29 13:28:28 from Steve Kemp's Blog


Updates on git-hosting and load-balancing

To round up the discussion of the Debian Administration site, yesterday I flipped the switch on the load-balancing. Rather than this:

  https -> pound \
                  \
  http  -------------> varnish  --> apache

We now have the simpler route for all requests:

http  -> haproxy -> apache
https -> haproxy -> apache

This means there is one less proxied HTTP request for every incoming secure connection, and these days secure connections are preferred since a Strict-Transport-Security header is set.
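The haproxy configuration for this shape of routing is pleasantly short - something like the following sketch, where the certificate path, names, and addresses are examples rather than the real values (the "ssl crt" bind syntax needs haproxy 1.5+):

# Terminate SSL in haproxy and send everything to apache.
frontend www-http
    bind :80
    default_backend apache

frontend www-https
    bind :443 ssl crt /etc/haproxy/site.pem
    rspadd Strict-Transport-Security:\ max-age=31536000
    default_backend apache

backend apache
    server web1 127.0.0.1:8080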

In other news I've been juggling git repositories; I've set up an installation of GitBucket on my git-host. My personal git repository used to contain some private repositories and some mirrors.

Now it contains mirrors of most things on github, as well as many more private repositories.

The main reason for the switch was to get a prettier interface and bug-tracker support.

A side-benefit is that I can use "groups" to organize repositories.

Most of those repositories are mirrors of ones on github, but some are new. When signed in I see more sources, for example the source to http://steve.org.uk.

I've been pleased with the setup and performance, though I had to add some caching and some other magic at the nginx level to provide /robots.txt, etc, which are not otherwise present.
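The nginx side of that is nothing special - a static override plus the proxy stanza, along these lines (a sketch which ignores the caching; the paths and port are assumptions):

server {
    listen 80;
    server_name git.steve.org.uk;

    # Serve a static robots.txt ourselves ..
    location = /robots.txt {
        root /srv/git/static;
    }

    # .. and proxy everything else to gitbucket.
    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}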

I'm not abandoning github, but I will no longer be using it for private repositories (I was gifted a free subscription a year or three ago), nor will I post things there exclusively.

If a single canonical source location is required for a repository it will be one that I control, maintain, and host.

I don't expect I'll give people commit access on this mirror, but it is certainly possible. In the past I've given people access to private repositories for collaboration, etc.

Syndicated 2014-08-25 08:16:33 (Updated 2014-08-29 13:13:43) from Steve Kemp's Blog

Updating Debian Administration, the code

So I previously talked about the setup behind Debian Administration, and my complaints about the slowness.

The previous post talked about the logical setup, and the hardware. This post talks about the more interesting thing: the code.

The code behind the site was originally written by Denny De La Haye. I found it and reworked it a lot, most obviously adding structure and test cases.

Once I did that the early version of the site was born.

Later my version became the official version: when Denny set up Police State UK he used my codebase rather than his own.

So the code huh? Well as you might expect it is written in Perl. There used to be this layout:

yawns/cgi-bin/index.cgi
yawns/cgi-bin/Pages.pl
yawns/lib/...
yawns/htdocs/

Almost every request would hit the index.cgi script, which would parse the request and return the appropriate output via the standard CGI interface.

How did it know what you wanted? Well sometimes there would be a parameter set which would be looked up in a dispatch-table:

/cgi-bin/index.cgi?article=40         - Show article 40
/cgi-bin/index.cgi?view_user=Steve    - Show the user Steve
/cgi-bin/index.cgi?recent_comments=10 - Show the most recent comments.

Over time the code became hard to update because there was no consistency, and over time the site became slow because this is not a quick setup: spiders, bots, and just average users would cause a lot of Perl processes to run.

So? What did I do? I moved the thing to using FastCGI, which avoids the cost of forking Perl and loading the (100k+) codebase on every request.

Unfortunately this required a bit of work because all the parameter handling was messy and caused issues if I just renamed index.cgi -> index.fcgi. The most obvious solution was to use one parameter, globally, to specify the requested mode of operation.

Hang on? One parameter to control the page requested? A persistent environment? What does that remind me of? Yes: CGI::Application.

I started small, and pulled some of the code out of index.cgi + Pages.pl, and over into dedicated CGI::Application classes:

  • Application::Feeds - Called via /cgi-bin/f.fcgi.
  • Application::Ajax - Called via /cgi-bin/a.fcgi.

So now every part of the site that is called by Ajax has one persistent handler, and every part of the site which returns RSS feeds has another.
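The skeleton of one of those classes is small; the following is an illustrative sketch, with made-up mode names rather than the site's real code:

#!/usr/bin/perl
# Sketch of a CGI::Application subclass driven by one "mode" parameter.
package Application::Example;

use strict;
use warnings;
use parent 'CGI::Application';

# Map each value of the single dispatch parameter to a method.
sub setup {
    my ($self) = @_;
    $self->mode_param('mode');
    $self->start_mode('index');
    $self->run_modes(
        'index'   => 'show_index',
        'article' => 'show_article',
    );
}

sub show_index {
    my ($self) = @_;
    return "This would be the front-page ..";
}

sub show_article {
    my ($self) = @_;
    my $id = $self->query()->param('id');
    return "This would show article $id ..";
}

1;

The calling .fcgi wrapper then needs little more than "Application::Example->new()->run();".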

I had some fun setting up the sessions to match those created by the old code, but I quickly made it work.

The final job was the biggest: moving all the other (non-feed, non-ajax) modes over to a similar CGI::Application structure. There were 53 modes that had to be ported, and I did them methodically, first porting all the poll-related requests, then all the article-related ones, etc. I think I did about 15 a day for three days, then the rest in a sudden rush.

In conclusion the code is now fast because we don't use CGI, and instead use FastCGI.

This allowed minor changes to be carried out, such as compiling the HTML::Template templates which determine the look and feel, etc. Those things don't make sense in the CGI environment, but with persistence they are essentially free.

The site got a little more of a speed boost when I updated DNS, and a lot more when I blacklisted a bunch of IP-space.

As I was wrapping this up I realized that the code had accidentally become closed, because the old repository no longer exists. That was not intentional, and will be rectified soon.

The site would never have been started if I'd not seen Denny's original project, and although I don't expect others to use the code it should at least be possible for them to do so. I remember at the time I was searching for things like "Perl CMS" and finding Slashcode and Scoop, which I knew were too heavyweight for my little toy blog.

In conclusion, the Debian Administration website is 10 years old now. It might not have changed the world, it might have become less relevant, but I'm glad I tried, and I'm glad there were years when it really was the best place to be.

These days there are HowtoForges, blogs, and spam posts titled "How to install SSH on Trusty", "How to install SSH on Wheezy", "How to install SSH on Precise", and all that. There is no shortage of content; separating the good from the bad is the challenge.

Me? The single best resource I read these days is probably LWN.net.

Starting to ramble now.

Go look at my quick hack for remote command execution: https://github.com/skx/nanoexec

Syndicated 2014-08-23 08:04:46 from Steve Kemp's Blog

Updating Debian Administration

Recently I've been getting annoyed with the Debian Administration website; too often it would be slower than it should be considering the resources behind it.

As a brief recap I have six nodes:

  • 1 x MySQL Database - The only MySQL database I personally manage these days.
  • 4 x Web Nodes.
  • 1 x Misc server.

The misc server is designed to display events. There is a node.js listener which receives UDP messages and stores them in a rotating buffer. The messages might contain things like "User bob logged in", "Slaughter ran", etc. It's a neat hack which gives a good feeling of what is going on cluster-wide.

I need to rationalize that code - but there's a very simple predecessor posted on github for the curious.
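The heart of such a listener is only a few lines of node.js; this sketch is illustrative rather than the real code, and the port and buffer size are assumptions:

// Receive UDP messages and keep the most recent 100 of them.
var dgram = require("dgram");

var buffer = [];
var server = dgram.createSocket("udp4");

server.on("message", function (msg, rinfo) {
    buffer.push(rinfo.address + " " + msg.toString());
    if (buffer.length > 100) {
        buffer.shift();    // drop the oldest entry
    }
});

server.bind(4433);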

Anyway, enough diversions. The database is tuned, and "small". The misc server is almost entirely irrelevant, non-public, and not explicitly advertised.

So what do the web nodes run? Well they run a lot. Potentially.

Each web node has four services configured:

  • Apache 2.x - All nodes.
  • uCarp - All nodes.
  • Pound - Master node.
  • Varnish - Master node.

Apache runs the main site, listening on *:8080.

One of the nodes will be special and will claim a virtual IP provided via ucarp. The virtual IP is actually the end-point visitors hit, meaning we have:

Master host, running:

  • Apache.
  • Pound.
  • Varnish.

Other hosts, running:

  • Apache.

Pound is configured to listen on the virtual IP and perform SSL termination. That means that incoming requests get proxied from "vip:443 -> vip:80". Varnish listens on "vip:80" and proxies to the back-end apache instances.
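ucarp itself needs very little setup: each node runs the same command, differing only in its --srcip, and the up/down scripts add or remove the virtual IP. A sketch, with made-up addresses, vhid, and password:

# Claim (or stand by for) the shared virtual IP.
ucarp --interface=eth0 --srcip=192.0.2.11 \
      --vhid=1 --pass=s3kr1t \
      --addr=192.0.2.10 \
      --upscript=/etc/ucarp/vip-up.sh \
      --downscript=/etc/ucarp/vip-down.sh &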

The end result should be high availability. In the typical case all four servers are alive, and all is well.

If one server dies, and it is not the master, then it will simply be dropped as a valid back-end. If the server that dies is the master then a new master will be elected, thanks to the magic of ucarp, and the remaining three will be used as expected.

I'm sure there is a pathological case when all four hosts die, and at that point the site will be down, but that's something that should be atypical.

Yes, I am prone to over-engineering. The site doesn't have any availability requirements that justify this setup, but it is good to experiment and learn things.

So, with this setup in mind, with incoming requests (on average) being divided at random onto one of four hosts, why is the damn thing so slow?

We'll come back to that in the next post.

(Good news though; I fixed it ;)

Syndicated 2014-08-21 08:50:10 from Steve Kemp's Blog

A tale of two products

This is a random post inspired by recent purchases. Some things we buy are practical, others are a little arbitrary.

I tend to avoid buying things for the sake of it, and have explicitly started decluttering our house over the past few years. That said sometimes things just seem sufficiently "cool" that they get bought without too much thought.

This entry is about two things.

A couple of years ago my bathroom was ripped apart and refitted. Gone was the old and nasty room, and in its place was a glorious space. There was only one downside to the new bathroom - you turn on the light and the fan comes on too.

When your wife works funny shifts at the hospital you can find that the (quiet) fan sounds very loud in the middle of the night and wakes you up..

So I figured we could buy a couple of LED lights and scatter them around the place - when it is dark the movement sensors turn on the lights.

These things are amazing. We have one sat on a shelf, one velcroed to the bottom of the sink, and one on the floor, just hidden underneath the toilet.

Due to the shiny-white walls of the room they're all you need in the dark.

By contrast my second purchase was a mistake: the Logitech Harmony 650 Universal Remote Control should be great. It clearly has the features I want, being able to power:

  • Our TV.
  • Our Sky-box.
  • Our DVD player.

The problem is solely due to the horrific software. You program the device via an application/website which works only under Windows.

I had to resort to installing Windows in a virtual machine to make it run:

# Get the Bus/ID for the USB device, stripping leading zeros
bus=$(lsusb | grep -i Harmony | awk '{print $2}' | sed 's/^0*//')
id=$(lsusb | grep -i Harmony | awk '{print $4}' | sed 's/://; s/^0*//')

# pass to kvm
kvm -localtime ..  -usb -device usb-host,hostbus=$bus,hostaddr=$id ..

That allows the device to be passed through to windows, though you'll later have to jump onto the Qemu console to re-add the device as the software disconnects and reconnects it at random times, and the bus changes. Sigh.

I guess I can pretend it works, and it has cut down on the number of remotes sat on our table, but the overwhelmingly negative setup and configuration process has really soured me on it.

There is a Linux application which will take a configuration file and squirt it onto the device, when attached via a USB cable. This software, which I found during research prior to buying, is useful, but not as useful as I'd expected. Why? Well the software lets you upload a config file, but to get a config file you must fully complete the setup on Windows. It is impossible to configure/use this device solely using GNU/Linux.

(Apparently there is MacOS software too, but I don't use Macs. *shrugs*)

In conclusion - Motion-activated LED lights, more useful than expected, but Harmony causes Discord.

Syndicated 2014-08-15 12:14:46 from Steve Kemp's Blog

Rebooting the CMS

I run a cluster for the Debian Administration website, and the code is starting to show its age; it is not so modern, and has accumulated a lot of baggage.

Given the relatively clean separation between the logical components I'm interested in trying something new. In brief the current codebase allows:

  • Posting of articles, blog-entries, and polls.
  • The manipulation of the same.
  • User-account management.

It crossed my mind the other night that it might make sense to break this code down into a number of mini-servers - a server to handle all article-related things, a server to handle all poll-related things, etc.

If we have a JSON endpoint that will allow:

  • GET /article/32
  • POST /article/ [create]
  • GET /articles/offset/number [get the most recent]

Then we could have a very thin shim/server on top of that which would present the public API. Of course the internal HTTP overhead might make this unworkable, but it is an interesting approach to the problem, and would allow the backend storage to be migrated in the future without too much difficulty.
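To make that concrete, a mini-server for the article endpoint could be a single PSGI application. This is an illustrative sketch using Plack, with hard-coded storage - it is not the code in snooze:

#!/usr/bin/perl
# articles.psgi - a tiny JSON article-server.  Run with: plackup articles.psgi
use strict;
use warnings;
use JSON::PP qw(encode_json);

# Stand-in storage; the real thing would query a database.
my %articles = ( 32 => { id => 32, title => "Example", body => ".." } );

my $app = sub {
    my ($env) = @_;

    if ( $env->{REQUEST_METHOD} eq 'GET'
         && $env->{PATH_INFO} =~ m{^/article/([0-9]+)$} )
    {
        my $article = $articles{$1};
        return [ 404, [ 'Content-Type' => 'application/json' ],
                 [ '{"error":"no such article"}' ] ]
          unless ($article);

        return [ 200, [ 'Content-Type' => 'application/json' ],
                 [ encode_json($article) ] ];
    }

    return [ 404, [ 'Content-Type' => 'text/plain' ], ["Unknown route\n"] ];
};

$app;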

At the moment I've coded up two trivial servers, one for getting user-data (to allow login requests to succeed), and one for getting article data.

There is a tiny presentation server written to use those back-end servers and it seems like an approach that might work. Of course deployment might be a pain..

It is still an experiment rather than a plan, but it could work out: http://github.com/skx/snooze/.

Syndicated 2014-08-09 08:59:51 from Steve Kemp's Blog

Free (orange) SMS alerts

In the past I used to pay for an email->SMS gateway, which was used to alert me about some urgent things. That was nice because it was bi-directional, and at one point I could restart particular services via sending SMS messages.

These days I get it for free, and for my own reference here is how you get to receive free SMS alerts via Orange, which is my mobile phone company. If you don't use Orange/EE this will probably not help you.

The first step is to register an Orange email-account, which can be done here:

Once you've done that you'll have an email address of the form example@orange.net, which is kinda-sorta linked to your mobile number. You'll sign in and be shown something that looks like webmail from the early 90s.

The thing that makes this interesting is that you can look in the left-hand menu and see a link called "SMS Alerts". Visit it. That will let you do things like set the number of SMSs you wish to receive a month (I chose "1000"), and the hours during which delivery will be made (I chose "All the time").

Anyway if you go through this dance you'll end up with an email address example@orange.net, and when an email arrives at that destination an SMS will be sent to your phone.

The content of the SMS will be the subject of the mail, truncated if necessary, so you can send a hello message to yourself like this:

echo "nop" | mail -s "Hello, urgent message is present" username@orange.net

Delivery seems pretty reliable, and I've scheduled the mailbox to be purged every week, to avoid it getting full:

  • Hostname: pop.orange.net
  • Username: your mobile number
  • Password: your password
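One way of handling the weekly purge is a fetchmail run from cron, which downloads, and therefore deletes, everything - a sketch, with an obviously fake number and password:

# ~/.fetchmailrc - run from cron with:  @weekly fetchmail --silent
poll pop.orange.net
  proto pop3
  user "07700900000"
  password "secret"
  fetchall
  no keep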

If you wished to send mail from this you can use smtp.orange.net, but I pity the fool who used their mobile phone company for their primary email address.

Syndicated 2014-08-05 09:56:48 from Steve Kemp's Blog

Draft post - 31 July 2014

Yesterday I spent a while looking at the Debian code search site, an enormously useful service allowing you to search the code contained in the Debian archives.

The end result was three trivial bug reports:

#756565 - lives

Insecure usage of temporary files.

A CVE-identifier should be requested.

#756566 - libxml-dt-perl

Insecure usage of temporary files.

A CVE-identifier has been requested by Salvatore Bonaccorso, and will be added to my security log once allocated.

#756600 - xcfa

Insecure usage of temporary files.

A CVE-identifier should be requested.

Finding these bugs was a simple matter of using the code-search to look for patterns like "system.*>.*%2Ftmp".
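The kind of idiom those searches turn up is the classic predictable-filename pattern, along these lines (an illustrative example, not code from any of the packages above):

# Vulnerable: /tmp/output.$$ is predictable, and an attacker who
# pre-creates it as a symlink can make the victim overwrite a file
# of the attacker's choosing.
system("some-command > /tmp/output.$$");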

Perhaps tomorrow somebody else would like to have a go at looking for backtick-related operations ("`"), or the usage of popen.

Tomorrow I will personally be swimming in a loch, which is more fun than wading in code..

Syndicated 2014-07-31 12:54:16 from Steve Kemp's Blog

The selfish programmer

Once upon a time I wrote a piece of software for scheduling the classes available to a college.

There was a bug in the scheduler: Students who happened to be named 'Steve Kemp' had a significantly higher chance (>=80% IIRC) of being placed in lessons where the class makeup was more than 50% female.

This bug was never fixed. Which was nice, because I spent several hours both implementing and disguising this feature.

I was a bad coder when I was a teenager.

These days I'm still a bad coder, but in different ways.

Syndicated 2014-07-25 13:16:54 from Steve Kemp's Blog

An alternative to devilspie/devilspie2

Recently I was updating my dotfiles, because I wanted to ensure that media-players were "always on top", when launched, as this suits the way I work.

For many years I've used devilspie to script the placement of new windows, and once I googled a recipe I managed to achieve my aim.

However during the course of my googling I discovered that devilspie is unmaintained, and has been replaced by something using Lua - something I like.

I'm surprised I hadn't realized that the project was dead; although I've always hated the configuration syntax, it is something that I've used on a constant basis since I found it.

Unfortunately the replacement, despite using Lua, and despite being functional, just didn't seem to gel with me. So I figured "How hard could it be?".

In the past I've written software which iterated over all (visible) windows, and obviously I'm no stranger to writing Lua bindings.

However I did run into a snag. My initial implementation did two things:

  • Find all windows.
  • For each window invoke a lua script-file.

This worked. This worked well. This worked too well.

The problem I ran into was that if I wrote something like "Move window 'emacs' to desktop 2" that action would be applied over and over again. So if I launched emacs, and then manually moved the window to desktop 3, it would jump back!

In short I needed to add a "stop()" function, which would cause further actions against a given window to cease. (By keeping a linked list of windows-to-ignore, and avoiding processing them.)
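In script terms the usage looks like this - a sketch using the same primitives as the samples below:

-- Move the Pidgin buddy-list once, then leave the window alone:
-- stop() adds it to the ignore-list so later passes skip it.
if ( window_class() == "Pidgin" ) then
   workspace(2)
   stop()
end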

The code did work, but it felt wrong to have an ever-growing linked-list of processed windows. So I figured I'd look at the alternative - the original devilspie used libwnck to operate. That library allows you to nominate a callback to be executed every time a new window is created.

If you apply your magic only on a window-create event - well you don't need to bother caching prior-windows.

So in conclusion:

I think my code is better than devilspie2 because it is smaller, simpler, and does things more neatly - for example instead of one function to get geometry and another to set it, I use one: "xy()" returns the position of a window, while xy(3,3) sets it.

kpie also allows you to run as a one-off job, and using the simple primitives I wrote a script to dump your windows and their size/placement, which produces output like this:

shelob ~/git/kpie $ ./kpie --single ./samples/dump.lua
-- Screen width : 1920
-- Screen height: 1080
..
if ( ( window_title() == "Buddy List" ) and
     ( window_class() == "Pidgin" ) and
     ( window_application() == "Pidgin" ) ) then
     xy(1536,24 )
     size(384,1032 )
     workspace(2)
end
if ( ( window_title() == "feeds" ) and
     ( window_class() == "Pidgin" ) and
     ( window_application() == "Pidgin" ) ) then
     xy(1,24 )
     size(1536,1032 )
     workspace(2)
end
..

As you can see that has dumped all my windows, along with their current state. This allows a simple starting-point - Configure your windows the way you want them, then dump them to a script file. Re-run that script file and your windows will be set back the way they were! (Obviously there might be tweaks required.)

I used that starting-point to define a simple recipe for configuring pidgin, which is more flexible than anything I ever had with devilspie, and suits my tastes.

Bug-reports welcome.

Syndicated 2014-07-21 14:30:14 from Steve Kemp's Blog
