Older blog entries for joey (starting at number 479)

announcing github-backup

Partly as a followup to a Github survey, and partly because I had a free evening and the need to write more haskell code, any haskell code, I present to you, github-backup.

github-backup is a simple tool you run in a git repository you cloned from Github. It backs up everything Github knows about the repository, including other forks, issues, comments, milestones, pull requests, and watchers.

This is all stored in the repository, as regular files, on a "github" branch.
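To make "regular files" concrete, here is a minimal sketch of the general idea, not github-backup's actual code or on-disk layout. The function name `store_records`, the `issues/<number>.json` layout, and the sample issue are all assumptions made up for illustration; the real tool fetches its data from the Github API and commits it to the "github" branch, neither of which is shown here.

```python
import json
import tempfile
from pathlib import Path


def store_records(root: Path, kind: str, records: list) -> list:
    """Write each record to root/kind/<number>.json and return the paths written.

    Hypothetical layout: one JSON file per record, named after its number.
    """
    target = root / kind
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        path = target / f"{record['number']}.json"
        # Sorted keys and a trailing newline keep diffs small when a record changes.
        path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")
        written.append(path)
    return written


if __name__ == "__main__":
    # Made-up sample data standing in for what an API would return.
    issues = [{"number": 1, "title": "example issue", "state": "open"}]
    with tempfile.TemporaryDirectory() as tmp:
        for path in store_records(Path(tmp), "issues", issues):
            print(path.relative_to(tmp))
```

Files like these can be committed, diffed, and cloned like any other content, which is the point of keeping them in the repository.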

Available in Cabal now, in Debian maybe if someone packages haskell-github.

Syndicated 2012-01-26 04:44:10 from see shy jo

olduse.net 1982

Hard to believe I've consumed all of 1981's Usenet posts now on olduse.net, and it's been running for 7 months already.

Last night, there was a "very long" post, describing nearly every node on usenet in 1982. There had been a warning about this post the day before, since it would take many sites half an hour to download at 300 baud. It was handily formatted as a shell script, which created per-node files.

So, I ran this code nobody has run since 1982. It worked. I got files. I tossed them on the olduse.net wiki, and used some ikiwiki code TOVA contracted me to write just a few months ago, to make clickable links on my usenet map.

usenet map

The map data was contributed in another post a while back. By 1982, usenet is getting nearly impossible to map with 1982 technology of ascii art. I enjoyed throwing graphviz, git, wikis, and the web at it.

So, we have a collaboration across time: me, "Mark", a lot of people who described their usenet nodes, and piles of technology that make creating a mashup easy. Awesome!

I blog about stuff I find on the olduse.net blog. It's an open blog; Koldfront also blogs there, and we welcome other bloggers.

Some of the highlights for me have included:

As the space shuttle program is winding down, reading the excitement about the first shuttle flights, and the play-by-play coverage of a launch, posted to net.columbia by a high school student borrowing his dad's account. (A usegroup name that's hard to read without remembering its fate).

The announcements of the Motorola M68k, the IBM PC, and the CD-ROM.

Reading the TCP-IP digest, and Postel's plans for launching IPv4 soon, while the world IPv6 launch is being planned now. (The nay-sayers are especially fun to read. Including the guy who was concerned about the address space size, in 1981!)

Learning that nethack ascension tales have a history stretching back 30 years, to rogue, and that the stories back then had much the same flavor as they do today.

Various celebrity sightings. Dennis Ritchie teaching C and Unix. Bill Joy talking vi. RMS talking .. nuclear politics?

The general development of usenet. B-news being rolled out, groups proliferating, many first inklings of what will be major problems and developments in 5 or 10 years. A shift in tone is already apparent: by now usenet is not only about announcements; there are already some flames.

oldusenet in a period terminal

Still 9 years to go!

Syndicated 2012-01-21 20:58:05 from see shy jo

version numbers

Today I released two entirely different pieces of software with the identical version number 3.20120115. Debian developers will also soon be noticing a piece of software I released with the version number 9.20120115.

I expect to move more of my software to this version number scheme over time, unless I find something badly wrong with it. It reflects how I think about versions for my software: there's a kind of continual "now" that development progresses through, in which individual releases have little discrete meaning. At the same time, there can also be significant discontinuities that require the user to do something (such as moving to a new debhelper compat version, or a new git-annex repository format).

Those two things are really all that I need a version number for my software to communicate. I can do without the rest of the things that version numbers are used for:

  • The marketing of version 1.0 and 2.0.
  • The comparative nuances such as whether 1.0 to 1.1 is a relatively big change, and 1.0 to 1.0.1 is a relatively small change
  • The implication that 0.99 is almost 1.0 ready, and 1.1a is some kind of alpha release.

There is so much software, with so many version numbers, that any signal encoded in such version numbers is swamped in the noise. Even on projects that I develop, a version number like 2.88 is meaningless to me. All I care about is, how long ago was that version? Has there been a major change breaking compatibility since that version? "2.88" doesn't answer these questions well; "3.20111111" does.

It is a little wordy to have the full year in there, and it can be annoying to remember to set the version to the right date on release day (TODO: automate). This is balanced with the version not being so wordy as to include the time of day, which means I might have to do a 3.20120115.1 if I goof up. These minor problems are worth it to instantly know how old a version is when a user pastes it into a bug report.
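Here is a rough sketch of how the scheme can be generated and read back mechanically. It assumes the layout seen in the examples above: a leading number that changes at a compatibility break, then the release date as YYYYMMDD, then an optional same-day suffix like the ".1". The function names are made up for illustration and are not taken from any of my release tooling.

```python
from datetime import date
from typing import Tuple


def make_version(epoch: int, release_day: date) -> str:
    """Build a version such as '3.20120115' from a compatibility epoch and a date."""
    return f"{epoch}.{release_day:%Y%m%d}"


def parse_version(version: str) -> Tuple[int, date, int]:
    """Split 'EPOCH.YYYYMMDD[.N]' into (epoch, release date, same-day fixup number)."""
    parts = version.split(".")
    epoch = int(parts[0])
    stamp = parts[1]
    released = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    # A missing third component means this was the first release that day.
    fixup = int(parts[2]) if len(parts) > 2 else 0
    return epoch, released, fixup


def age_in_days(version: str, today: date) -> int:
    """How long ago was that version?"""
    return (today - parse_version(version)[1]).days


def breaks_compat(old: str, new: str) -> bool:
    """Has there been a compatibility break between the two versions?"""
    return parse_version(old)[0] != parse_version(new)[0]


if __name__ == "__main__":
    # Setting the version on release day is then a single call.
    print(make_version(3, date.today()))
    print(age_in_days("3.20111111", date(2012, 1, 15)))  # 65
```

The two questions I care about map onto `age_in_days` and `breaks_compat`; nothing else in the version string needs interpreting.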

And that is probably all I will ever have to say about version numbers. :)

Syndicated 2012-01-16 02:21:07 from see shy jo

a resolution that stuck

Last year, my new year's resolution was to write in my journal every day. That actually stuck: I wrote 262 journal entries in 2011. While I've been keeping a journal intermittently since 1998, last year I doubled the number of entries in it. And wrote a novel's worth of entries -- 53 thousand words!

Most of it is of course banal and mundane stuff. Not good compared with Lars, who does something with his journal where he goes into some detail about code he's working on, and other work. The excerpts I've seen are quite nice. But after I've written code, written a commit message, documentation, perhaps bug reports etc, I often can't find much to say about it in my journal, beyond the bare bones that I worked on $foo today or faced a particularly hard bug. I also worry that the journal, and my reluctance to repeat myself, often tips the balance away from me blogging, if I write down something in the journal first.

Here's my journal for today:

Compare what jokes are funny now with those in 1982. The 1982 ones from net.jokes on olduse.net seem juvenile. Now compare what Unix joke man pages are funny now with those I'm reading from 1982. They seem basically the same. What would Biella make of this?

Liw noticed ikiwiki OOM on pell. Tracked down to a perl markdown bug with long lines. Had quite enough of perl markdown; ikiwiki will be moving to a different engine. Added discount support to it today, still needs Debian package tho.


Really gorgeous sunset, with a high wind, moon, puffy low, fast moving clouds. Enjoyed it ecstatically. It's going to get cold soon. Very rainy early, but then got intermittently sunny; power is holding out ok.

Was going to roast a chicken today, but got distracted and had a large lunch besides. Need to find some quick food for supper.

I need to start a new book, should it be the River Cottage book about meat that I stole from Anna, or some SF?

Blogged about journaling, and put this journal entry in it, so also journaled about blogging. Wrote it somewhat self-consciously.

The benefits for me have ranged from being able to go back and work out dates of events, to forwarding the odd excerpts to others. The best thing though is certainly having a regular time of introspection, to look back over my day.

If you've not got a new year's resolution yet, I recommend this one. (Learning Haskell would be another good one, if you haven't yet.)

Just write something, anything, down in your journal every day.

Syndicated 2012-01-01 22:58:57 from see shy jo

solar year

I've been at the cabin, on solar power, for a year now. I have a year of data!

Everything went pretty well until last month. There was an April rainy spell where power felt slightly tight. Then over the summer, plenty of power, no need to conserve. The last month though had what seemed like weeks of continual grey clouds, where I never saw the sun.

high noon today

Of course, even on a sunny day in winter, it does not get far above the hills, and the peak production window is only a few hours. This bad combination had my battery power dipping below the 10 volts that I consider low, down to 9, and even to 8 volts.

I use kerosine lamps in the winter. (I prefer the light anyway.) I've also started unplugging my Thecus server at night to conserve power, meaning no internet late or early. For four or so nights, I had no power to run even my laptop after sunset. On one notable day, there was no power even in the daytime.

Even when it turned sunny again, I found that the batteries would seem to charge to 12 volts during the day, but then precipitously drop to 10 and 9 volts at night. I think the problem was not damaged batteries, but that these Nicads charge most efficiently above 12 volts (14 volts is best), and there was never enough power saved up to get them full enough that they could charge really efficiently.

So, I reluctantly spent three days away this week, to let the batteries soak up sun and recover. It seems to have worked; they've been holding a 12 volt charge overnight again.

Syndicated 2011-12-31 18:15:55 from see shy jo

a Github survey

The great thing about git and other distributed version control systems is that once you clone (or fork) a repository, you have all the data. You don't have to trust that Github will preserve it; everyone who develops the project is a backup.

Github carries this principle quite far among the features they provide. But not all the way. Today I have surveyed their features, and where the data for each is stored.

  • source code -- in git, of course!
  • user and project pages and wiki -- in git
  • gists -- in git
  • issues -- in a database accessible by an API
  • notes on commits -- in a database accessible by an API
  • relationships between repos (who forked what, pull requests) -- in a database accessible by an API
  • your account details and activity -- in a database, accessible by you via an API
  • list of all projects and users -- in a closed database (AFAIK)

The two that really stand out are the issues and notes not being stored in git. This means that, if a project uses github, it gets locked into github to a degree. The records of bugs and features, all the planning, and communication, are locked away in a database where they cannot be cloned, where every developer is not a backup.

Github's intent here is not to control this data to lock you in (to the extent they want to lock you in, they do that by providing a proprietary UI that people rave about); it was probably only expedient to use some sort of database, rather than git, when implementing these features.

They should automatically produce git repository branches containing a project's issues, and notes, based on the contents of their database. (For notes, git notes is the obviously right storage location.) Along with ensuring every developer checkout is a backup, this would allow accessing that data while offline, which is one of the reasons we use distributed version control.

The lack of a global list of projects is problematic in a more global sense. It means that we can't make a backup of all the (public) repositories in Github (assuming that we had the bandwidth and storage to do it). I recently backed up all the repositories on Berlios.de, when it looked to be shutting down; this was only possible because they allowed enumerating them all.

People at The Internet Archive say that their archival coverage of free software is actually quite bad. We trust our version control systems to save our free software data, but while this works individually, it will result in data loss globally over time. I'd encourage Github to help the Internet Archive improve their collections by donating periodic snapshots of their public git repositories to the Archive. You're located in the same city, 5 miles apart; they have lots of hard drives (though less right now during the shortage than usual); this should be pretty easy to do.

Full disclosure: Github has bought me dinner and seemed like stand-up guys to me.

Syndicated 2011-12-27 17:38:45 from see shy jo

roundtrip latency from a cabin with dialup in 2011

(imagine an xkcd-style infographic here)

0 seconds

  • peace and quiet
  • full history of all my projects (git repos)
  • my blog
  • email

0.5 seconds

  • chatting on IRC
  • searching through all email received since 1994
  • music
  • cached web pages

5 seconds

  • ssh to a server
  • search the web
  • lwn, hacker news, reddit, metafilter, and other web aggregators

10 seconds

  • resuming laptop from sleep and waiting for network-manager
  • view an unnecessarily pastebinned scrap of text
  • access local Debian mirror
  • looking up a typical bug report

20 seconds

  • click on a typical link from a web aggregator
  • an hour of video pulled from a USB drive with git-annex

2 minutes

  • downloading new email
  • an increasing number of websites that force https (average of 3 reloads needed due to timeouts)

5 minutes

  • viewing a single file, bug report, or merge request on github
  • cloning the full content of a typical not too large git repo
  • retrieving data from archival drives via git-annex
  • going offline and making a phone call
  • apt-get update (thanks aj, for the pdiffs)
  • viewing a single twitter page (megabytes of crud and #! redirections)

10 minutes

  • entering a state of flow while programming
  • boingboing.net (with all the pretty pictures)
  • my mailbox (after a nice walk down a long driveway)

22 minutes

  • milk and eggs
  • a swim in the river

30 minutes

  • broadband internet access
  • someone else who knows what linux is

32 minutes

  • an hour of video pulled from my server with git-annex (includes travel time to broadband access point)

70 minutes

  • a halfway decent but slightly overpriced grocery store
  • a produce stand
  • a coffee shop

180 minutes

  • family
  • a bakery with real bread

300 minutes

  • downloading a typical podcast

Syndicated 2011-11-23 21:44:04 from see shy jo

the Engelbart demo

Just watched the whole Douglas Engelbart demo from 1968. Somehow I'd only heard of this as the first demo of the computer mouse, and only seen a brief clip on youtube. All three 30-minute reels of the film are available online, and well worth a watch in full.

The mouse is the least of it. The demo includes an outlining text editor, model-view-controller, hypertext, wiki, domain specific programming languages, a precursor to email, bug tracking, version control(?), a chorded keyboard. (Ok, that last one didn't really take off.) Probably a dozen other things I've forgotten. All in a single interface, and all before I was born.

Just like any tech demo, there are fumbles and mistakes, which is reassuring to anyone who has tried to give a tech demo.

There's also the awesome crazy hack shown here. They could only afford these tiny, round CRTs, so they pointed a television camera at one, and the camera image was piped to their television console. (So add KVM switch to the list of firsts!) The demo was done in San Francisco, with the computer system remote in Palo Alto, so in this image you see the text on the CRT overlaid with the video from the camera.

Engelbart points out that the delay this added to the system acts as a short-term memory that filtered out flicker in the original display (and made the mouse have a mouse trail). To me it gives the whole demo a unique quality, as if it were underwater.

Despite the piping around of audio and video signals, and the multiuser system, the glaring thing missing from the demo that we have these days is networking. Although there is this amusing bit at the end where they compile a regular expression and then apply it, in order to search for documents containing certain terms, and end up with a hyperlinked list of 10 results, ordered by relevance. Yes, think Google.

Syndicated 2011-11-03 00:14:19 from see shy jo

two random thoughts about bugs

First thought is this: A bug's likelihood of ever being fixed decays with time, starting when I first read it. If I have to read it a second time, the bug has already become more complex, since something prevented me from just fixing it the first time. If more information has to be added to the bug, that makes it yet more complex. If there is an argument in the bug about whether it is a bug, or how to fix it, just revisiting the bug at a later date can become more expensive than it's worth. Much of what is involved in filing good and effective bug reports is an obvious corollary of this. It also follows that it's best to either fix, or at least plan how to fix, a bug immediately upon reading it.

Second thought is about "wontfix". A bug submitter and the developer responsible for the bug see this state in very different ways, but the name hides what it really means, which is that there is a meta-bug affecting either the bug submitter, the developer, or both. Once you realize this, wontfix bugs, from either side, become a bit personally insulting. They also quickly decay to uselessness (see first thought), and then just lurk there wasting the developer's time in various ways. Bug tracking systems should not provide a "wontfix" state; if they want to track meta-bugs they should provide a way to reassign such a bug to some other party who can actually resolve such a meta-bug.

Syndicated 2011-10-29 18:08:33 from see shy jo


I attended the Git Together earlier this week. I was tentative about this, since I'm not really much of a git developer; all my git work is building stuff on top of it. It turned out great though.

At first it seemed like one of those parties where you don't know anyone. But then I got to reconnect with Avery Pennarun for the first time since DebConf 2, and got to know Jonathan Nieder better, and it was also nice to see Jelmer Vernooij. And the core developers were also very welcoming. Junio Hamano knew of my work (and I am in awe of his), and Jeff King thinks my take on SHA1 security issues has value, and has been expanding on it. Shawn Pearce managed the unconference subtly and well. Lots of very smart people. At one point I found myself across the table from Android's lead developer.

I was very happy that everything I think needs improvement in git was discussed during the unconference:

  • big files: My postit suggesting this got more checks than most anything else, and I briefly presented git-annex at the start of a session on general scalability -- on its 1-year anniversary. Some ideas for improved hooks that git-annex and other tools could use are developing. Better scalability to lots of files and more efficient index files were also discussed.
  • git as a filesystem: There was a consensus that gone are the days when git was just about managing source code. (I remember being told on #git before I wrote etckeeper, that no, git should not be used for that..)
  • submodules: I was astounded that they're now considering supporting "floating" submodules, which would track the head of a branch, rather than the specific rev committed in the superproject. Many other problems that have kept me from ever trying submodules are also being worked on. This seems unlikely to replace mr, but who knows -- at least getting rid of repo is a goal.
  • SHA1 security was discussed for quite a long while, long enough that I felt a bit guilty for bringing it up, but it was an interesting and fruitful discussion. I went in thinking that the checksum basically has to be parameterized, but they have some good reasons not to do that, and some other good ideas, although what to do and when best to do it is still open for discussion. Signed commits are certainly coming soon. Also this amazing patch was developed.
  • Metadata storage was briefly discussed, but nobody seemed sure how to deal with it. Ideas floated included a metastore like tool that uses mergeable files, or storing metadata in some sort of notes-like separate branch.

Syndicated 2011-10-29 00:56:26 from see shy jo

