Recent blog entries for joey

STM Region contents

Yesterday's concurrent-output release got a lot of fun features. It now does full curses-style minimization of the output, redrawing updated lines with optimal efficiency. It supports multiline regions, and wraps too-long lines. And it allows the user to embed ANSI colors in a region. 3 features that are in some tension, and were fun to implement all together.

But I have a more interesting feature to blog about... I've added the ability for the content of a Region to be determined by an STM transaction.

Here, for example, is a region that's a clock:

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (async)
import Control.Concurrent.STM
import Control.Monad (forever, void)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Time.Clock (UTCTime, getCurrentTime)
import System.Console.Regions -- from concurrent-output

timeDisplay :: TVar UTCTime -> STM Text
timeDisplay tv = T.pack . show <$> readTVar tv

clockRegion :: IO ConsoleRegionHandle
clockRegion = do
    tv <- atomically . newTVar =<< getCurrentTime
    r <- openConsoleRegion Linear
    setConsoleRegion r (timeDisplay tv)
    void $ async $ forever $ do
        threadDelay 1000000 -- 1 sec
        atomically . writeTVar tv =<< getCurrentTime
    return r

There's something magical about this. Whenever a new value is written into the TVar, concurrent-output automatically knows that this region needs to be updated. How does it know how to do that?

Magic of STM. Basically, concurrent-output composes all the STM transactions of Regions, and asks STM to wait until there's something new to display. STM keeps track of whatever TVars might be looked at, and so can put the display thread to sleep until there's a change to display.
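Here's a sketch of that underlying pattern (my illustration, not concurrent-output's actual internals): re-run a region's content transaction, and retry unless the result differs from what was last displayed. STM wakes the thread only when a TVar the transaction read gets written, so this blocks cheaply.

import Control.Concurrent.STM

-- Block until the content transaction produces something different
-- from the last displayed value.
waitChange :: Eq a => a -> STM a -> STM a
waitChange old getContent = do
    new <- getContent
    if new == old
        then retry -- sleep until some TVar read above changes
        else return new

A display thread can then loop on atomically (waitChange lastShown content), redrawing with each new value it gets back.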

Using STM, I've gotten extensibility for free, due to the nice way that STM transactions compose.

A few other obvious things to do with this: Compose 2 regions with padding so they display on the same line, left and right aligned. Trim a region's content to the display width (which concurrent-output handily exports in a TVar for just this kind of thing).
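For example, composing two content sources onto one line might look something like this sketch (hypothetical helper, not part of the library's API), given a TVar holding the current display width:

import Control.Concurrent.STM
import Data.Text (Text)
import qualified Data.Text as T

-- Pad two pieces of content apart so they are left and right
-- aligned on a single line of the given width.
sameLine :: STM Text -> STM Text -> TVar Int -> STM Text
sameLine getl getr widthv = do
    l <- getl
    r <- getr
    w <- readTVar widthv
    let pad = max 1 (w - T.length l - T.length r)
    return (l <> T.replicate pad (T.pack " ") <> r)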

I'm tempted to write a console spreadsheet using this. Each visible cell of the spreadsheet would have its own region, that uses a STM transaction to display. Plain data Cells would just display their current value. Cells that contain a function would read the current values of other Cells, and use that to calculate what to display. Which means that a Cell containing a function would automatically update whenever any of the Cells that it depends on were updated!
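A sketch of what those cells might look like (hypothetical types; no such spreadsheet exists yet):

import Control.Concurrent.STM
import Data.Text (Text)
import qualified Data.Text as T

-- A plain cell holds a value; a formula cell is any STM action,
-- which can read other cells' TVars.
data Cell
    = Value (TVar Text)
    | Formula (STM Text)

-- What a cell's region displays.
displayCell :: Cell -> STM Text
displayCell (Value tv) = readTVar tv
displayCell (Formula f) = f

-- A formula cell that sums two numeric cells; its region would
-- redraw whenever either input cell is written.
sumCell :: TVar Int -> TVar Int -> Cell
sumCell a b = Formula $ do
    x <- readTVar a
    y <- readTVar b
    return (T.pack (show (x + y)))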

Do you think that a simple interactive spreadsheet built this way would be more than 100 lines of code?

Syndicated 2015-11-03 20:03:53 from see shy jo

a tiling region manager for the console

Building on top of concurrent-output, and some related work Joachim Breitner did earlier, I now have a kind of equivalent to a tiling window manager, except it's managing regions of the console for different parts of a single program.

Here's a really silly demo, in an animated gif:


Not bad for 23 lines of code, is it? Seems much less tedious to do things this way than using ncurses. Even with its panels, ncurses requires you to think about the layout of various things on the screen, and about many low-level details. This, by contrast, is compositional: just add another region and a thread to update it, and away it goes.

So, here's an apt-like download progress display, in 30 lines of code.


Not only does it have regions which are individual lines of the screen, but those can have sub-regions within them as seen here (and so on).

And, log-type messages automatically scroll up above the regions. External programs run by createProcessConcurrent will automatically get their output/errors displayed there, too.

What I'm working on now is support for multiline regions, which automatically grow/shrink to fit what's placed in them. The hard part, which I'm putting the finishing touches on, is to accurately work out how large a region is before displaying it, in order to lay it out. Requires parsing ANSI codes among other things.
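A minimal sketch of that measurement (ignoring complications like double-width characters that a real implementation also has to handle): skip over ANSI CSI escape sequences and count the characters that remain.

-- Visible width of a string that may contain ANSI CSI sequences
-- (ESC '[', parameter and intermediate bytes, then a final byte
-- in the range '@' to '~').
visibleLength :: String -> Int
visibleLength [] = 0
visibleLength ('\ESC':'[':rest) =
    visibleLength (drop 1 (dropWhile (not . isFinal) rest))
  where
    isFinal c = c >= '@' && c <= '~'
visibleLength (_:rest) = 1 + visibleLength rest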

STM rules

There's so much concurrency, with complicated interrelated data being updated by different threads, that I couldn't have possibly built this without Software Transactional Memory.

Rather than a nightmare of locks behind locks behind locks, the result is so well behaved that I'm confident that anyone who needs more control over the region layout, or wants to do funky things, can dive into the STM interface and update the data structures, and nothing will ever deadlock or be inconsistent; as soon as an update completes, it'll display on-screen.

As an example of how powerful and beautiful STM is, here's how the main display thread determines when it needs to refresh the display:

data DisplayChange
        = BufferChange [(StdHandle, OutputBuffer)]
        | RegionChange RegionSnapshot
        | TerminalResize (Maybe Width)
        | EndSignal ()

                change <- atomically $
                        (RegionChange <$> regionWaiter origsnapshot)
                                `orElse`
                        (RegionChange <$> regionListWaiter origsnapshot)
                                `orElse`
                        (BufferChange <$> outputBufferWaiterSTM waitCompleteLines)
                                `orElse`
                        (TerminalResize <$> waitwidthchange)
                                `orElse`
                        (EndSignal <$> waitTSem endsignal)
                case change of
                        RegionChange snapshot -> do
                                ...
                        BufferChange buffers -> do
                                ...
                        TerminalResize width -> do
                                ...

So, it composes all these STM actions that can wait on various kinds of changes, to get one big action that waits for any of them to happen, and builds up a nice sum type to represent what changed.

Another example is that the whole support for sub-regions only involved adding 30 lines of code, all of it using STM, and it worked 100% the first time.

Available in concurrent-output 1.1.0.

Syndicated 2015-10-31 01:44:47 from see shy jo

concurrent output library

concurrent-output is a Haskell library I've developed this week, to make it easier to write console programs that do a lot of different things concurrently, and want to serialize concurrent outputs sanely.

It's increasingly easy to write concurrent programs, but all their status reporting has to feed back through the good old console, which is still obstinately serial.

Haskell illustrates this problem well with this "Linus's first kernel" equivalent, interleaving the output of 2 threads:

> import System.IO
> import Control.Concurrent.Async
> putStrLn (repeat 'A') `concurrently` putStrLn (repeat 'B')

That's fun, but also horrible if you wanted to display some messages to the user:

> putStrLn "washed the car" `concurrently` putStrLn "walked the dog"
walwkaesdh etdh et hdeo gc

To add to the problem, we often want to run separate programs concurrently, which have output of their own to display. And, just to keep things interesting, sometimes a unix program will behave differently when stdout is not connected to a terminal (eg, ls | cat).

To tame simple concurrent programs like these so they generate readable output involves a lot of plumbing. Something like: run the actions concurrently, taking care to capture the output of any commands, and then feed the output that the user should see through some sort of serializing channel to the display. Dealing with that when you just wanted a simple concurrent program risks ending up with a not-so-simple program.

So, I wanted a library with basically 2 functions:

outputConcurrent :: String -> IO ()
createProcessConcurrent :: CreateProcess -> IO whatever

The idea is, you make your program use outputConcurrent to display all its output, and each String you pass to that will be displayed serially, without getting mixed up with any other concurrent output.

And, you make your program use createProcessConcurrent everywhere it starts a process that might output to stdout or stderr, and it'll likewise make sure its output is displayed serially.

Oh, and createProcessConcurrent should avoid redirecting stdout and stderr away from the console, when no other concurrent output is happening. So, if programs are mostly run sequentially, they behave as they normally would at the console; any behavior changes should only occur when there is concurrency. (It might also be nice for it to allocate ttys and run programs there to avoid any behavior changes at all, although I have not tried to do that.)
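Here's a sketch of that policy (hypothetical names, not the library's implementation), modeling "the console is free" as a TMVar:

import Control.Concurrent.STM
import System.Process

-- If the console is free, leave the process attached to it as
-- normal; otherwise redirect its output to pipes, to be displayed
-- serially later. (A real implementation would also release the
-- lock once the process finishes.)
decideRedirect :: TMVar () -> CreateProcess -> IO CreateProcess
decideRedirect consolefree p = do
    free <- atomically $ tryTakeTMVar consolefree
    return $ case free of
        Just () -> p -- console was free; no behavior change
        Nothing -> p { std_out = CreatePipe, std_err = CreatePipe }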

And that should be pretty much the whole API, although it's ok if it needs some function called by main to set it up:

import Control.Concurrent.Async
import Control.Concurrent.Output
import System.Process (proc)

main = withConcurrentOutput $
    outputConcurrent "washed the car\n"
        `concurrently`
    createProcessConcurrent (proc "ls" [])
        `concurrently`
    outputConcurrent "walked the dog\n"

$ ./demo
washed the car
walked the dog
Maildir/  bin/  doc/  html/  lib/  mail/  mnt/  src/  tmp/

I think that's a pretty good API to deal with this concurrent output problem. Anyone know of any other attempts at this I could learn from?

I implemented this over the past 3 days, in 320 lines of code. It got rather hairy:

  • It has to do buffering of the output.
  • There can be any quantity of output, but program memory use should be reasonably small. Solved by buffering up to 1 MB of output in RAM, and writing any excess to temp files. (A sketch of this idea follows the list.)
  • Falling off the end of the program is complicated; there can be buffered output to flush and it may have to wait for some processes to finish running etc.
  • The locking was tough to get right! I could not have managed to write it correctly without STM.
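For the buffering point above, here's a sketch of the spill-to-disk idea (hypothetical names, not the library's actual code):

import qualified Data.ByteString as B
import System.IO (hClose, openTempFile)

-- Output lives in RAM until it exceeds the limit, then the whole
-- buffer moves to a temp file and further output is appended there.
data Buffered = InMemory B.ByteString | InTempFile FilePath

bufferLimit :: Int
bufferLimit = 1048576 -- 1 MB

addOutput :: Buffered -> B.ByteString -> IO Buffered
addOutput (InMemory b) new
    | B.length b + B.length new <= bufferLimit =
        return (InMemory (b <> new))
    | otherwise = do
        (f, h) <- openTempFile "/tmp" "output"
        B.hPut h (b <> new)
        hClose h
        return (InTempFile f)
addOutput (InTempFile f) new = do
    B.appendFile f new
    return (InTempFile f)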

It seems to work pretty great though. I got Propellor using it, and Propellor can now run actions concurrently!

Syndicated 2015-10-29 02:07:34 from see shy jo

propelling disk images

Following up on Then and Now ...

In quiet moments at ICFP last August, I finished teaching Propellor to generate disk images. With an emphasis on doing a whole lot with very little new code and an extreme amount of code reuse.

For example, let's make a disk image with nethack on it. First, we need to define a chroot. Disk image creation reuses propellor's chroot support, described back in propelling containers. Any propellor properties can be assigned to the chroot, so it's easy to describe the system we want.

nethackChroot :: FilePath -> Chroot
nethackChroot d = Chroot.debootstrapped (System (Debian Stable) "amd64") mempty d
    & Apt.installed ["linux-image-amd64"]
    & Apt.installed ["nethack-console"]
    & accountFor gamer
    & gamer `hasInsecurePassword` "hello"
    & gamer `hasLoginShell` "/usr/games/nethack"
  where gamer = User "gamer"

Now to make an image from that chroot, we just have to tell propellor where to put the image file, some partitioning information, and to make it boot using grub.

nethackImage :: RevertableProperty
nethackImage = imageBuilt "/srv/images/nethack.img" nethackChroot
    MSDOS (grubBooted PC)
    [ partition EXT2 `mountedAt` "/boot"
        `setFlag` BootFlag
    , partition EXT4 `mountedAt` "/"
        `addFreeSpace` MegaBytes 100
    , swapPartition (MegaBytes 256)
    ]

The disk image partitions default to being sized to fit exactly the files from the chroot that go into each partition, so the disk image is as small as possible by default. There's a little DSL to configure the partitions. To give control over partition size, it has some functions, like addFreeSpace and setSize. Other functions like setFlag and extended can further adjust the partitions. I think that worked out rather well; the partition specification is compact and avoids unnecessary hardcoded sizes, while providing plenty of control.
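To illustrate, size-adjusting combinators like those could be typed along these lines (hypothetical definitions, not propellor's actual code):

newtype MegaBytes = MegaBytes Integer

data Partition = Partition
    { partMountPoint :: Maybe FilePath
    , partSizeMB :: Integer
    }

-- Grow a partition beyond its content-fitted size.
addFreeSpace' :: Partition -> MegaBytes -> Partition
addFreeSpace' p (MegaBytes n) = p { partSizeMB = partSizeMB p + n }

-- Override the computed size entirely.
setSize' :: Partition -> MegaBytes -> Partition
setSize' p (MegaBytes n) = p { partSizeMB = n }

Since each combinator maps a Partition to an adjusted Partition, they chain naturally in the partition specification.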

By the end of ICFP, I had Propellor building complete disk images, but no boot loader installed on them.

Fast forward to today. After struggling with some strange grub behavior, I found a working method to install grub onto a disk image.

The whole disk image feature weighs in at:

  • 203 lines to interface with parted
  • 88 lines to format and mount partitions
  • 90 lines for the partition table specification DSL and partition sizing
  • 196 lines to generate disk images
  • 75 lines to install grub on a disk image
  • 652 lines of code total

Which is about half the size of vmdebootstrap, 1/4th the size of partman-base (probably 1/100th the size of total partman), and 1/13th the size of live-build. All of which do similar things, in ways that seem to me to be much less flexible than Propellor.

One thing I'm considering doing is extending this so Propellor can use qemu-user-static to create disk images for eg, arm. Add some u-boot setup, and this could create bootable images for arm boards. A library of configs for various arm boards could then be included in Propellor. This would be a lot easier than running the Debian Installer on an arm board.

Oh! I only just now realized that if you have a propellor host configured, like this example for my dialup gateway, leech --

leech = host ""
    & os (System (Debian (Stable "jessie")) "armel")
    & Apt.installed ["linux-image-kirkwood", "ppp", "screen", "iftop"]
    & privContent "/etc/ppp/peers/provider"
    & privContent "/etc/ppp/pap-secrets"
    & Ppp.onBoot
    & hasPassword (User "root")
    & Ssh.installed

-- The host's properties can be extracted from it, using eg hostProperties leech and reused to create a disk image with the same properties as the host!

So, when my dialup gateway gets struck by lightning again, I could use this to build a disk image for its replacement:

import qualified Propellor.Property.Hardware.SheevaPlug as SheevaPlug

laptop = host ""
    & SheevaPlug.diskImage "/srv/images/leech.img" (MegaBytes 2000)
        (& propertyList "has all of leech's properties"
            (hostProperties leech))

This also means you can start with a manually built system, write down the properties it has, and iteratively run Propellor against it until you think you have a full specification of it, and then use that to generate a new, clean disk image. Nice way to transition from sysadmin days of yore to a clean declaratively specified system.

Syndicated 2015-10-23 02:09:17 from see shy jo

propellor orchestration

With the disclaimer that I don't really know much about orchestration, I have added support for something resembling it to Propellor.

Until now, when using propellor to manage a bunch of hosts, you updated them one at a time by running propellor --spin $somehost, or maybe you set up a central git repository, and a cron job to run propellor on each host, pulling changes from git.

I like both of these ways to use propellor, but they only go so far...

  • Perhaps you have a lot of hosts, and would like to run propellor on them all concurrently.

    master = host "" & concurrently conducts alotofhosts

  • Perhaps you want to run propellor on your dns server last, so when you add a new webserver host, it gets set up and working before the dns is updated to point to it.

    master = host "" & conducts webservers `before` conducts dnsserver

  • Perhaps you have something more complex, with multiple subnets that propellor can run in concurrently, finishing up by updating that dnsserver.

    master = host "" & concurrently conducts [sub1, sub2] `before` conducts dnsserver

    sub1 = host "" & concurrently conducts webservers & conducts loadbalancers

    sub2 = host "" & conducts dockerservers

  • Perhaps you need to first run some command that creates a VPS host, and then want to run propellor on that host to set it up.

    vpscreate h = cmdProperty "vpscreate" [hostName h] `before` conducts h

All those scenarios are supported by propellor now!

Well, I haven't actually implemented concurrently yet, but the point is that the conducts property can be used with any of propellor's property combinators, like before etc, to express all kinds of scenarios.

The conducts property works in combination with an orchestrate function to set up all the necessary stuff to let one host ssh into another and run propellor there.

main = defaultMain (orchestrate hosts)

hosts =
    [ master
    , webservers
    , ...
    ]

The orchestrate function does a bunch of stuff:

  • Builds up a graph of what conducts what.
  • Removes any cycles that might have snuck in by accident, before they cause foot shooting. (A sketch of this step follows the list.)
  • Arranges for the ssh keys to be accepted as necessary.
    Note that you need to add ssh key properties to all relevant hosts so it knows what keys to trust.
  • Arranges for the private data of a host to be provided to the hosts that conduct it, so they can pass it along.
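Here's a sketch of that cycle-removal step (hypothetical types, not the Conductor module's actual code): treat hosts as names with "conducts" edges, and drop any edge whose target can already reach its source. Any edge that would complete a cycle gets dropped, so the result is acyclic.

import qualified Data.Map as M
import qualified Data.Set as S

type Graph = M.Map String [String]

dropCycles :: Graph -> Graph
dropCycles g = M.mapWithKey keep g
  where
    keep a bs = filter (\b -> not (reaches b a)) bs
    -- depth-first reachability over the original graph
    reaches from to = go S.empty from
      where
        go seen h
            | h == to = True
            | h `S.member` seen = False
            | otherwise =
                any (go (S.insert h seen)) (M.findWithDefault [] h g)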

I'm very pleased that I was able to add the Propellor.Property.Conductor module implementing this with only a tiny change to the rest of propellor. Almost everything needed to implement it was already there in propellor's infrastructure.

Also kind of cool that it only needed 13 lines of imperative code, the other several hundred lines of the implementation being all pure code.

Syndicated 2015-10-22 01:02:58 from see shy jo

it's a bird, it's a plane, it's a super monoid for propellor

I've been doing a little bit of dynamically typed programming in Haskell, to improve Propellor's Info type. The result is kind of interesting in a scary way.

Info started out as a big record type, containing all the different sorts of metadata that Propellor needed to keep track of. Host IP addresses, DNS entries, ssh public keys, docker image configuration parameters... This got quite out of hand. Info needed to have its hands in everything, even types that should have been private to their module.

To fix that, recent versions of Propellor let a single Info contain many different types of values. Look at it one way and it contains DNS entries; look at it another way and it contains ssh public keys, etc.

As an émigré from lands where you can never know what type of value is in a $foo until you look, this was a scary prospect at first, but I found it's possible to have the benefits of dynamic types and the safety of static types too.

The key to doing it is Data.Dynamic. Thanks to Joachim Breitner for suggesting I could use it here. What I arrived at is this type (slightly simplified):

newtype Info = Info [Dynamic]
    deriving (Monoid)

So Info is a monoid, and it holds a bunch of dynamic values, which could each be of any type at all. Eep!

So far, this is utterly scary to me. To tame it, the Info constructor is not exported, and so the only way to create an Info is to start with mempty and use this function:

addInfo :: (IsInfo v, Monoid v) => Info -> v -> Info
addInfo (Info l) v = Info (toDyn v : l)

The important part of that is that it only allows adding values that are in the IsInfo type class. That prevents the foot shooting associated with dynamic types, by only allowing use of types that make sense as Info. Otherwise arbitrary Strings etc could be passed to addInfo by accident, and all get concatenated together, and that would be a total dynamic programming mess.

Anything you can add into an Info, you can get back out:

getInfo :: (IsInfo v, Monoid v) => Info -> v
getInfo (Info l) = mconcat (mapMaybe fromDynamic (reverse l))

Only monoids can be stored in Info, so if you ask for a type that an Info doesn't contain, you'll get back mempty.

Crucially, IsInfo is an open type class. Any module in Propellor can make a new data type and make it an instance of IsInfo, and then that new data type can be stored in the Info of a Property, and any Host that uses the Property will have that added to its Info, available for later introspection.

For example, this weekend I'm extending Propellor to have controllers: Hosts that are responsible for running Propellor on some other hosts. Useful if you want to run propellor once and have it update the configuration of an entire network of hosts.

There can be whole chains of controllers controlling other controllers etc. The problem is, what if host foo has the property controllerFor bar and host bar has the property controllerFor foo? I want to avoid a loop of foo running Propellor on bar, running Propellor on foo, ...

To detect such loops, each Host's Info should contain a list of the Hosts it's controlling. Which is not hard to accomplish:

newtype Controlling = Controlled [Host]
    deriving (Typeable, Monoid)

isControlledBy :: Host -> Controlling -> Bool
h `isControlledBy` (Controlled hs) = any (== hostName h) (map hostName hs)

instance IsInfo Controlling where
    propigateInfo _ = True

mkControllingInfo :: Host -> Info
mkControllingInfo controlled = addInfo mempty (Controlled [controlled])

getControlledBy :: Host -> Controlling
getControlledBy = getInfo . hostInfo

isControllerLoop :: Host -> Host -> Bool
isControllerLoop controller controlled = go S.empty controlled
  where
    go checked h
        | controller `isControlledBy` c = True
        -- avoid checking loops that have been checked before
        | hostName h `S.member` checked = False
        | otherwise = any (go (S.insert (hostName h) checked)) l
      where
        c@(Controlled l) = getControlledBy h

This is all internal to the module that needs it; the rest of propellor doesn't need to know that the Info is being used for this. And yet, the necessary information about Hosts is gathered as propellor runs.

So, that's a useful technique. I do wonder if I could somehow make addInfo combine together values in the list that have the same type; as it is, the list can get long.
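An untested sketch of how that coalescing could work, relying on fromDynamic only succeeding on a value of the matching type:

-- Like addInfo, but mappends to an existing value of the same type,
-- if any, instead of consing a new list entry.
addInfo' :: (IsInfo v, Monoid v) => Info -> v -> Info
addInfo' (Info l) v = Info (go l)
  where
    go [] = [toDyn v]
    go (d:ds) = case fromDynamic d of
        Just old -> toDyn (old `mappend` v) : ds
        Nothing -> d : go ds

And, to show Info, the best I could do was this: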

instance Show Info where
    show (Info l) = "Info " ++ show (map dynTypeRep l)

The resulting long list of the types of values stored in a host's info is not as useful as it could be. Of course, getInfo can be used to get any particular type of value:

*Main> hostInfo kite
Info [InfoVal System,PrivInfo,PrivInfo,Controlling,DnsInfo,DnsInfo,DnsInfo,AliasesInfo, ...
*Main> getInfo (hostInfo kite) :: AliasesInfo
AliasesInfo (fromList ["","","","" ...

And finally, I keep trying to think of a better name than "Info".

Syndicated 2015-10-17 18:43:20 from see shy jo

then and now

It's 2004 and I'm in Oldenburg DE, working on the Debian Installer. Colin and I pair program on partman, its new partitioner, to get it into shape. We've somewhat reluctantly decided to use it. Partman is in some ways a beautiful piece of work, a mass of semi-object-oriented, super extensible shell code that sprang fully formed from the brow of Anton. And in many ways, it's mad, full of sector alignment twiddling math implemented in tens of thousands of lines of shell script scattered among hundreds of tiny files that are impossible to keep straight. In the tiny Oldenburg Developers Meeting, full of obscure hardware and crazy intensity of ideas like porting Debian to VAXen, we hack late into the night, night after night, and crash on the floor.

sepia toned hackers round a table

It's 2015 and I'm at a Chinese bakery, then at the Berkeley pier, then in a SF food truck lot, catching half an hour here and there in my vacation to add some features to Propellor. Mostly writing down data types for things like filesystem formats and partition layouts, and then some small amount of Haskell code to use them in generic ways. Putting these pieces together and reusing stuff already in Propellor (like chroot creation).

Before long I have this, which is only 2 undefined functions away from (probably) working:

let chroot d = Chroot.debootstrapped (System (Debian Unstable) "amd64") mempty d
        & Apt.installed ["openssh-server"]
        & ...
    partitions = fitChrootSize MSDOS
        [ (Just "/boot", mkPartition EXT2)
        , (Just "/", mkPartition EXT4)
        , (Nothing, const (mkPartition LinuxSwap (MegaBytes 256)))
        ]
 in Diskimage.built chroot partitions (grubBooted PC)

This is at least a replication of vmdebootstrap, generating a bootable disk image from that config and 400 lines of code, with enormous customizability of the disk image contents, using all the abilities of Propellor. But it is also, effectively, a replication of everything partman is used for (aside from UI and RAID/LVM).

sailboat on the SF bay

What a difference a decade and better choices of architecture make! In many ways, this is the loosely coupled, extensible, highly configurable system partman aspired to be. Plus elegance. And I'm writing it on a lark, because I have some spare half hours in my vacation.

Past Debian Installer team lead Tollef stops by for lunch, I show him the code, and we have the conversation old d-i developers always have about partman.

I can't say that partman was a failure, because it's been used by millions to install Debian and Ubuntu and etc for a decade. Anything that deletes that many Windows partitions is a success. But it's been an unhappy success. Nobody has ever had a good time writing partman recipes; the code has grown duplication and unmaintainability.

I can't say that these extensions to Propellor will be a success; there's no plan here to replace Debian Installer (although with a few hundred more lines of code, propellor is d-i 2.0); indeed I'm just adding generic useful stuff and building further stuff out of it without any particular end goal. Perhaps that's the real difference.

Syndicated 2015-08-27 00:01:59 from see shy jo

sweet summer


Having a wonderful summer, full of simple sweet pleasures. Mom visited today, and I made her this blackberry chocolate tart. Picking berries, swimming in the river, perfect summer day.


Earlier this summer, camped in the dunes on Ocracoke island with many family and friends. Thunderstorms away across the sound flashed and grumbled long in the night, but mostly missed us. Jupiter and Venus in conjunction overhead, and the arch of the milky way completed the show.


Syndicated 2015-07-11 20:23:31 from see shy jo

our beautiful fake histories

Here's an odd thing about the git bisect command: It has only 1 option (--no-checkout). Compare with eg git commit, which has 36 options by my count.

The difference is largely down to git having a pervasive culture of carefully edited history. We need lots of git commit options to carefully produce commits that look Just Right. Staging only some of the files we've edited, perhaps even staging only some of the changes within a file. Amend that commit if we notice we made a mistake. Create a whole series of beautiful commits, and use rebase later to remix them into a more beautiful whole.

Beautiful fake histories. Because coding is actually messy; our actual edit history contains blind alleys and doublings back on itself; contains periods of many days when the code isn't building properly. We want to sweep that complexity away, hide it under the rug. This works well except when it doesn't, when some detail airbrushed out of the only remaining history turns out to be important.

Once we have these beautiful fake histories of changes, we can easily bisect them and find the commit that introduced a bug. So bisect doesn't need a lot of options to control how it works.

I'd like to suggest a new option though. At least as a thought experiment. --merges-only would make bisect only check the merge commits in the range of commits being bisected. The bisection would result in not a single commit, but in the set of commits between two merges.
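Since no such option exists, here's a sketch of the idea: binary search over just the list of merge commits (oldest first), assuming the bug is absent at the first merge and present at the last.

-- Given a test for whether the bug is present at a commit, narrow
-- down to the pair of adjacent merges that bracket the commits
-- which introduced it.
bisectMerges :: Monad m => (c -> m Bool) -> [c] -> m (c, c)
bisectMerges isBad merges = go 0 (length merges - 1)
  where
    at i = merges !! i
    go good bad
        | bad - good <= 1 = return (at good, at bad)
        | otherwise = do
            let mid = (good + bad) `div` 2
            bugged <- isBad (at mid)
            if bugged then go good mid else go mid bad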

I suspect this would be useful for faster bisecting some histories of the beautiful fake kind. But I know it would be useful when the history is messy and organic and full of false starts and points where the code doesn't build. Merges, in such histories, are often the points where things reach a certain level of beauty, where that messy feature branch got to the point it all built again (please let this happen today) and was merged into master. Bisecting such points in a messy organic history should work about as well as bisecting carefully gardened histories.

I think I'll save the full rant about beautiful fake history vs messy real history for some other day. Or maybe I've already ranted that rant here before, I can't remember.

Let's just say that I personally come down on the side of liking my git history to reflect the actual code I was working on, even if it was broken and even if I threw it away later. I've indeed taken this to extreme lengths with propellor; in its git history you can see every time I've ever run it, and the version of my config file and code at that point. Apologies to anyone who's been put off by that... But oddly, propellor gets by far more contributions from others than any of my other haskell programs.

All in the form of beautifully constructed commits, naturally.

Syndicated 2015-07-10 15:40:23 from see shy jo

I am ArchiveTeam

This seems as good a day as any to mention that I am a founding member of ArchiveTeam.

ArchiveTeam logo

Way back, when Geocities was closing down, I was one of a small rag-tag group who saved a copy of most of it. That snapshot has since generated more publicity than most other projects I've worked on. I've heard many heartwarming stories of it being the only remaining copy of baby pictures and writings of deceased friends, and so on. It's even been the subject of serious academic study as outlined in this talk, which is pretty awesome.

Jason Scott in full stage regalia

I'm happy to let this guy be the public face of ArchiveTeam in internet meme-land. It's a 0.1% project for me, and has grown into a well-oiled machine, albeit one that shouldn't need to exist. I only get involved these days when there's another crazy internet silo fire drill and/or I'm bored.

(Rumors of me being the hand model for ArchiveTeam are, however, unsubstantiated.)

Syndicated 2015-04-01 17:50:49 from see shy jo
