Recent blog entries for Stevey

I've not commented on security for a while

Unless you've been living under a rock, or in a tent (which would make me slightly jealous), you'll have heard about the recent Heartbleed attack many times by now.

The upshot of that attack is that lots of noise was made about hardening things, and there is now a new fork of OpenSSL being developed. Many people have commented about "hardening Debian" in particular, as well as offering random musings on hardening software in general. One or two brave souls have even made noises about auditing code.

Once upon a time I tried to set up a project to audit Debian software. You can still see the Debian Security Audit Project webpages if you look hard enough for them.

What did I learn? There are tons of easy security bugs, but finding the hard ones is hard.

(If you get bored sometime, just pick your favourite editor, which will be emacs, and look at how /tmp is abused during the build-process, or in random libraries such as tramp [tramp-uudecode].)

These days I still poke at source code, and I still report bugs, but my enthusiasm has waned considerably. I tend to only commit to auditing a package if it is a new one I install in production, which limits my efforts considerably, but makes me feel like I'm not taking steps into the dark. It looks like I reported only three security issues this year, and before that you have to go back to 2011 to find something I bothered to document.

What would I do if I had copious free time? I wouldn't audit code. Instead I'd write test-cases for code.

Many, many large projects have rudimentary test-cases at best, and zero coverage at worst. I appreciate that writing test-cases is hard, because lots of the time it is hard to test things "for real". For example I once wrote a filesystem using FUSE, which has some built-in unit-tests. (I was pretty pleased with that: you could launch the filesystem with a --test argument and it would invoke the unit-tests on itself. No separate steps, or source code, required. If it was installed you could use it, and you could test it in-situ.) Beyond that I also put together a simple filesystem-stress script, which read/wrote/found random files, computed MD5 hashes of their contents, etc. I've since seen similar random-filesystem-stresstest projects, and if they had existed at the time I'd have used them. Testing filesystems is hard.
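
The stress-script itself was nothing clever; the gist is something like the following sketch (node.js purely for illustration here, and the mount point, file count, and sizes are all invented):

// A sketch only: write random files, remember their MD5s, then read them
// back and verify. The mount point and sizes are invented for illustration.
var fs     = require('fs');
var path   = require('path');
var crypto = require('crypto');

var root     = process.argv[2] || '/mnt/testfs';   // hypothetical mount point
var expected = {};                                 // filename -> md5 we wrote

function md5(buf) {
    return crypto.createHash('md5').update(buf).digest('hex');
}

// Write a batch of random files, remembering the hash of each.
for (var i = 0; i < 100; i++) {
    var name = path.join(root, 'stress-' + i + '.bin');
    var data = crypto.randomBytes(1 + Math.floor(Math.random() * 65536));
    fs.writeFileSync(name, data);
    expected[name] = md5(data);
}

// Read everything back and complain if the contents didn't survive.
Object.keys(expected).forEach(function (name) {
    if (md5(fs.readFileSync(name)) !== expected[name]) {
        console.log('MISMATCH: ' + name);
    }
    fs.unlinkSync(name);
});
console.log('done');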

I've written kernel modules that have only a single implicit test case: It compiles. (OK that's harsh, I'd usually ensure the kernel didn't die when they were inserted, and that a new node in /dev appeared ;)

I've written a mail client, and beyond some trivial test-cases to prove my MIME-handling wasn't horrifically bad there are zero tests. How do you simulate all the mail that people will get, and the funky things they'll do with it?

But that said, I'd suggest that if you're keen, if you're eager, if you want internet-points, then writing test-cases/test-harnesses would be more useful than randomly auditing source code.

Still, what would I know? I don't even have a beard...

Syndicated 2014-04-22 21:14:46 from Steve Kemp's Blog


I was beaten to the punch, but felt nothing

A while back I mentioned github-backed DNS hosting.

Turns out NameCast.net does that already, and there is an interesting writeup on the design of something similar, from the same authors in 2009.

Fun to read.

In other news applying for jobs is a painful annoyance.

Should anybody wish to employ an Edinburgh-based system administrator, with a good Debian record, then please do shout at me. Remote work is an option, as is a local office, if you're nearby.

Now I need to go hide from the sun, lest I get burned again...

Good news? Going on holiday to Helsinki in a week or so, for Vappu. Anybody local who wants me should feel free to grab me, via the appropriate channels.

Syndicated 2014-04-19 16:27:32 (Updated 2014-04-19 19:14:14) from Steve Kemp's Blog

Is lumail a stepping stone?

I'm pondering a rewrite of my console-based mail-client.

While it is "popular" it is not popular.

I suspect "console-based" is the killer.

I like console, and I ssh to a remote server to use it, but having different front-ends would be neat.

In the world of mailpile, etc., is there room for a graphical console client? Possibly.

The limiting factor would be the lack of POP3/IMAP.

Reworking things such that there is a daemon to which a GUI, or a console client, could connect seems simple. The hard part would obviously be working out the IPC and writing the GUI. Any toolkit selected would rule out 40% of the audience.

In other news I'm stalling on replying to emails. Irony.

Syndicated 2014-04-14 23:21:20 from Steve Kemp's Blog

Putting the finishing touches to a nodejs library

For the past few years I've been running a simple service to block blog/comment-spam, which is (currently) implemented as a simple JSON API over HTTP, with a minimal core and all the logic in a series of plugins.

One obvious thing I wasn't doing until today was paying attention to the anchor-text used in hyperlinks, for example:

  <a href="http://fdsf.example.com/">buy viagra</a>

Blocking on the anchor-text is less prone to false positives than blocking on keywords in the comment/message bodies.

Unfortunately there seem to exist no simple nodejs modules for extracting all the links, and their associated anchor-text, from an arbitrary string. So I had to write such a module, but given how small it is there seems little point in sharing it. So I guess this is one of the reasons why there are often large gaps in the module ecosystem.
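
For the curious, the core of such a module needs little more than a regular expression walked over the input. This isn't my module, just a sketch of the idea (and yes, regexp-parsing HTML is fragile):

// Sketch: pull (href, anchor-text) pairs out of a string of HTML.
// Not the real module, just the shape of the idea.
function extractLinks(html) {
    var links   = [];
    var pattern = /<a\s+[^>]*href\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
    var match;

    while ((match = pattern.exec(html)) !== null) {
        links.push({
            href:   match[1],
            anchor: match[2].replace(/<[^>]+>/g, '').trim()   // strip nested tags
        });
    }
    return links;
}

// e.g. [ { href: 'http://fdsf.example.com/', anchor: 'buy viagra' } ]
console.log(extractLinks('<a href="http://fdsf.example.com/">buy viagra</a>'));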

(Equally some modules are essentially applications; great that the authors shared, but virtually unusable, unless you 100% match their problem domain.)

I've written about this before when I had to construct, and publish, my own cidr-matching module.

Anyway, expect an upload soon; currently I "parse" HTML and BBCode. Possibly markdown to follow, since I have an interest in markdown.

Syndicated 2014-04-11 14:14:32 from Steve Kemp's Blog

A small assortment of content

Today I took down my KVM-host machine, rebooting it and restarting all of my guests. It had been a while since I'd done so and I was a little nervous; as it turned out this nervousness was prophetic.

I'd forgotten to hardwire the use of proxy_arp so my guests were all broken when the systems came back online.

If you're curious this is what my incoming graph of email SPAM looks like:

I think it is obvious where the downtime occurred, right?

In other news I'm awaiting news from the system administration job I applied for here in Edinburgh; if that doesn't work out I'll need to hunt for another position...

Finally I've started hacking on my console-based mail-client some more. It is a modal client, which means you're always in one of three states/modes:

  • maildir - Viewing a list of maildir folders.
  • index - Viewing a list of messages.
  • message - Viewing a single message.

As a result of a lot of hacking there is now a fourth mode/state, "text-mode", which allows you to view arbitrary text: for example scrolling up and down a file on-disk to read the manual, or viewing messages in interesting ways.

Support is still basic at the moment, but both of these work:

  --
  -- Show a single file
  --
  show_file_contents( "/etc/passwd" )
  global_mode( "text" )

Or:

function x()
   txt = { "${colour:red}Steve",
           "${colour:blue}Kemp",
           "${bold}Has",
           "${underline}Definitely",
           "Made this work" }
   show_text( txt )
   global_mode( "text")
end

x()

There will be a new release within the week, I guess; I just need to wire up a few more primitives, write more of a manual, and close some more bugs.

Happy Thursday, or as we say in this house, Hyvää torstai!

Syndicated 2014-04-10 15:34:02 from Steve Kemp's Blog

So that distribution I'm not-building?

The other week I was toying with using GNU stow to build an NFS-share, which would allow remote machines to boot from it.

It worked. It worked well. (Standard stuff, PXE booting with an NFS-root.)

Then I started wondering about distributions, since in one sense what I'd built was a minimal distribution.

On that basis yesterday I started hacking something more minimal:

  • I compiled a monolithic GNU/Linux kernel.
  • I created a minimal initrd image, using busybox.
  • I built a static version of the tcc compiler.
  • I got the thing booting, via KVM.

Unfortunately here is where I ran out of patience. Using tcc and the static C library I can compile code. But I can't link it.

$ cat > t.c <<EOF
int main ( int argc, char *argv[] )
{
        printf("OK\n" );
        return 1;
}
EOF
$ /opt/tcc/bin/tcc t.c
tcc: error: file 'crt1.o' not found
tcc: error: file 'crti.o' not found
..

Attempting to fix this up resulted in nothing much better:

$ /opt/tcc/bin/tcc t.c -I/opt/musl/include -L/opt/musl/lib/

And because I don't have a full system I cannot compile t.c to t.o and use ld to link (because I have no ld.)

I had a brief flirt with the portable c-compiler, pcc, but didn't get any further with that.

I suspect the real solution here is to install gcc onto my host system, with something like --prefix=/opt/gcc, and then rsync that into my (suddenly huge) initramfs image. Then I have all the toys.

Syndicated 2014-04-06 14:35:27 from Steve Kemp's Blog

Tagging images, and maintaining collections?

I'm an amateur photographer, although these days I tend to drop the amateur prefix, given that I shoot people for cash at least once a month.

(It isn't my main job, and I'd never actually want it to be, because I'm certain I'd become unhappy hustling for jobs and doing the promotion thing.)

Anyway over the years I've built up a large library of images, mostly organized in a hierarchy of directories beneath ~/Images.

Unlike most photographers I don't use aperture, lighttable, or any similar library-management tool. I shoot my images in RAW, convert to JPG via rawtherapee, and keep both versions of the images.

In short I don't want to mix the "library management" functions with the "RAW conversion" because I do regard them as two separate steps. That said I'm reaching a point where I do want to start tagging images, and finding them more quickly.

In the past I wrote a couple of simple tools to inject tags into the EXIF data of images, and then indexed them. But that didn't work so well in practice. I'm starting to think that instead I should index images into sqlite, storing:

  • Size.
  • Date.
  • Content hash.
  • Tags.
  • Path.

The downside is that this breaks utterly as soon as you move images around on-disk, which is something my previous EXIF-manipulation was designed to avoid.

Anyway I'm still thinking at the moment, but I know that the existing tools such as F-Spot, shotwell, DigiKam, and similar aren't suitable. So either I go standalone and use EXIF tags, accepting the fact that the tags I enter won't be visible to other tools, or I cope with the file-rename issue by attempting to update an existing sqlite database via hash/size/etc.
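
To make that concrete, the hash-based approach might look something like the following sketch, using the node.js sqlite3 module purely as an example; the schema, table-name, and paths are invented. The point is that a rename leaves the content-hash unchanged, so re-indexing merely updates the path:

// Sketch: index an image by content-hash, so a rename only updates the path.
// The schema, table-name, and tag-handling are invented for illustration.
var fs      = require('fs');
var crypto  = require('crypto');
var sqlite3 = require('sqlite3');

var db = new sqlite3.Database('images.db');
db.serialize();
db.run("CREATE TABLE IF NOT EXISTS images (" +
       " hash TEXT PRIMARY KEY, size INTEGER, date TEXT," +
       " path TEXT, tags TEXT )");

function indexImage(file, tags) {
    var stat = fs.statSync(file);
    var hash = crypto.createHash('sha1')
                     .update(fs.readFileSync(file))
                     .digest('hex');

    // If the hash is already known this is (probably) a moved/renamed file,
    // so just update the path; otherwise insert a fresh row.
    db.run("UPDATE images SET path = ? WHERE hash = ?", [file, hash],
           function (err) {
               if (!err && this.changes === 0) {
                   db.run("INSERT INTO images (hash, size, date, path, tags)" +
                          " VALUES (?, ?, ?, ?, ?)",
                          [hash, stat.size, stat.mtime.toISOString(),
                           file, (tags || []).join(',')]);
               }
           });
}

indexImage('Images/2014/04/example.jpg', ['portrait', 'studio']);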

Syndicated 2014-04-03 11:02:30 from Steve Kemp's Blog

Some things on DNS and caching

Although there weren't too many comments on my "what would you pay for?" post, I did get some mails.

I was reminded about this via Mario Lang's post, which echoed a couple of private mails I received.

Despite being something that I take for granted, perhaps because my hosting comes from Bytemark, people do seem willing to pay money for DNS hosting.

Which is odd. I mean you could do it very very very cheaply if you had just four virtual machines. You can get complex and be geo-fancy, and you could use anycast on a small AS, but really? You could just deploy four virtual machines to provide a.ns, b.ns, c.ns, d.ns, and be better than 90% of DNS hosters out there.

The thing that many people mentioned was Git-backed, or Git-based, DNS, which would be trivial if you used tinydns, and not much harder if you used bind.

I suspect I'm "not allowed" to do DNS-things for a while, due to my contract with Dyn, but it might be worth checking...

ObRandom: Beat me to it. Register gitdns.io, or similar, and configure hooks from github to compile tinydns records.
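
To show how little there is to it, here's a sketch of that hook idea in node.js. The repository path, port, and URL are invented, and a real service would obviously want to authenticate the webhook and sanity-check the records before compiling them:

// Sketch: accept a GitHub push webhook, pull the repository of tinydns
// records, and let tinydns-data compile "data" into "data.cdb".
// The path, port, and URL are invented, and there is no authentication here.
var http = require('http');
var exec = require('child_process').exec;

var REPO = '/srv/dns/zones';    // hypothetical checkout holding the tinydns "data" file

http.createServer(function (req, res) {
    if (req.method !== 'POST' || req.url !== '/hook') {
        res.writeHead(404);
        return res.end();
    }
    exec('cd ' + REPO + ' && git pull --ff-only && tinydns-data',
         function (err, stdout, stderr) {
             res.writeHead(err ? 500 : 200);
             res.end(err ? stderr : "rebuilt\n");
         });
}).listen(8053);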

In other news I started documenting early thoughts about my caching reverse proxy, which now has a name: stockpile.

I wrote some stub code using node.js, and although it was functional it soon became callback hell:

  • Is this resource cachable?
  • Does this thing exist in the cache already?
  • Should we return the server's response to the client, archive to memcached, or do both?
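
The stub code isn't worth sharing, but the shape of the problem is easy to show. This isn't stockpile, just a sketch, with an in-memory object standing in for memcached and an invented backend address, of how quickly the nesting piles up:

// Not stockpile, just a sketch of the nesting. An in-memory object stands
// in for memcached, and the backend address is invented.
var http = require('http');
var url  = require('url');

var cache   = {};
var BACKEND = { host: '127.0.0.1', port: 8000 };

function isCachable(req, cb) {
    // Roughly the same rules as the configuration sketch below.
    if (req.headers.cookie) { return cb(false); }
    cb(/\.(jpg|png|txt|html?|gif)$/i.test(url.parse(req.url).pathname));
}

http.createServer(function (req, res) {
    isCachable(req, function (cachable) {
        if (cachable && cache[req.url]) {            // hit: serve from the cache
            return res.end(cache[req.url]);
        }
        var breq = http.request({ host: BACKEND.host, port: BACKEND.port,
                                  path: req.url, method: req.method,
                                  headers: req.headers }, function (bres) {
            var chunks = [];
            bres.on('data', function (c) { chunks.push(c); });
            bres.on('end', function () {
                var body = Buffer.concat(chunks);
                if (cachable) { cache[req.url] = body; }   // archive for next time
                res.writeHead(bres.statusCode, bres.headers);
                res.end(body);
            });
        });
        breq.on('error', function () { res.writeHead(502); res.end(); });
        req.pipe(breq);
    });
}).listen(8080);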

Expressing the rules neatly is also a challenge. I want the server core to be simple and the configuration to be something like:

is_cachable ( vhost, source, request, backend )
{
    /**
     * If the file is static, then it is cachable.
     */
    if ( request.url.match( /\.(jpg|png|txt|html?|gif)$/i ) ) {
        return true;
    }

    /**
     * If there is a cookie then the answer is false.
     */
    if ( request.has_cookie? ) { return false ; }

    /**
     * If the server is alive we'll now pass the remainder through it
     * if not then we'll serve from the cache.
     */
    if ( backend.alive? ) {
        return false;
    }
    else {
        return true;
    }
}

I can see there is value in judging the cachability based on the server response, but I plan to ignore that except for headers such as "Expires:", "Etag:", etc.

Anyway callback hell does make me want to reexamine the existing C/C++ libraries out there, because I think I could do better.

Syndicated 2014-03-29 18:16:12 from Steve Kemp's Blog

A diversion on off-site storage

Yesterday I took a diversion from thinking about my upcoming cache project, largely because I took some pictures inside my house, and realized my offsite backup was getting full.

I have three levels of backups:

  • Home stuff on my desktop is replicated to my wife's desktop, and vice-versa.
  • A simple server running rsync (content-free http://rsync.io/).
  • A "peering" arrangement with a small group of friends. Each of us makes available a small amount of space and we copy to/from each other's shares, via rsync / scp as appropriate.

Unfortunately my rsync-based personal server is getting a little too full, and will certainly be full by next year. S3 is pricey, and I don't trust the "unlimited" storage people (backblaze, etc) to be sustainable and reliable long-term.

The pricing on Google-drive seems appealing, but I guess I'm loath to share more data with Google. Perhaps I could dedicate a single "backup.account@gmail.com" login to that, separate from all else.

So the diversion came along when I looked for Amazon S3-compatible, self-hosted servers. There are a few, but most of them are PHP-based, or similarly icky.

So far cloudfoundry's vlob looks the most interesting, but the main project seems stalled/dead. Sadly using s3cmd to upload files failed, but the `curl`-based API certainly works as expected.

I looked at Gluster, CEPH, and similar, but haven't yet come up with a decent plan for handling offsite storage; I know I have only six months or so before the need becomes pressing. I imagine the plan has to be using N small servers with local storage, rather than one large server, purely because pricing is going to be better that way.

Decisions decisions.

Syndicated 2014-03-26 17:53:08 from Steve Kemp's Blog

New GPG-key

I've now generated a new GPG-key for myself:

$ gpg --fingerprint 229A4066
pub   4096R/0C626242 2014-03-24
      Key fingerprint = D516 C42B 1D0E 3F85 4CAB  9723 1909 D408 0C62 6242
uid                  Steve Kemp (Edinburgh, Scotland) <steve@steve.org.uk>
sub   4096R/229A4066 2014-03-24

The key can be found online via mit.edu: 0x1909D4080C626242

This has been signed with my old key:

pub   1024D/CD4C0D9D 2002-05-29
      Key fingerprint = DB1F F3FB 1D08 FC01 ED22  2243 C0CF C6B3 CD4C 0D9D
uid                  Steve Kemp <steve@steve.org.uk>
sub   2048g/AC995563 2002-05-29

If there is anybody who has signed my old key who wishes to sign my new one then please feel free to get in touch to arrange it.

Syndicated 2014-03-24 19:26:03 from Steve Kemp's Blog
