Recent blog entries for joey

type safe multi-OS Propellor

Propellor was recently ported to FreeBSD, by Evan Cofsky. This new feature led me down a two-week-long rabbit hole to make it type safe. In particular, Propellor needed to be taught that some properties work on Debian, others on FreeBSD, and others on both.

The user shouldn't need to worry about making a mistake like this; the type checker should tell them they're asking for something that can't fly.

-- Is this a Debian or a FreeBSD host? I can't remember, let's use both package managers!
host "" $ props
    & aptUpgraded
    & pkgUpgraded

As of propellor 3.0.0 (in git now; to be released soon), the type checker will catch such mistakes.

Also, it's really easy to combine two OS-specific properties into a property that supports both OS's:

upgraded = aptUpgraded `pickOS` pkgUpgraded

type level lists and functions

The magic making this work is type-level lists. A property has a metatypes list as part of its type. (So called because it's additional types describing the type, and I couldn't find a better name.) This list can contain one or more OS's targeted by the property:

aptUpgraded :: Property (MetaTypes '[ 'Targeting 'OSDebian, 'Targeting 'OSBuntish ])

pkgUpgraded :: Property (MetaTypes '[ 'Targeting 'OSFreeBSD ])

In Haskell, type-level lists and other DataKinds are indicated by the ', if you have not seen that syntax before. There are some convenience aliases and type operators, which let the same types be expressed more cleanly:

aptUpgraded :: Property (Debian + Buntish)

pkgUpgraded :: Property FreeBSD

Whenever two properties are combined, their metatypes are combined using a type-level function. Combining aptUpgraded and pkgUpgraded will yield a metatypes list that targets no OS's, since they have none in common, and so the combination will fail to type check.
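To give a flavor of how this can be implemented, here's a minimal, self-contained sketch of the technique. It is illustrative only, not propellor's actual code, and it uses closed type families for brevity, which need a newer ghc than the 7.6 mentioned below:

  {-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, KindSignatures #-}
  {-# LANGUAGE UndecidableInstances #-}

  -- the kinds of metatypes
  data MetaType = Targeting OS
  data OS = OSDebian | OSBuntish | OSFreeBSD

  -- convenience aliases
  type Debian  = '[ 'Targeting 'OSDebian ]
  type FreeBSD = '[ 'Targeting 'OSFreeBSD ]

  -- type-level list membership
  type family Elem (x :: MetaType) (ys :: [MetaType]) :: Bool where
      Elem x '[]       = 'False
      Elem x (x ': ys) = 'True
      Elem x (y ': ys) = Elem x ys

  -- type-level if-then-else
  type family If (b :: Bool) (t :: [MetaType]) (e :: [MetaType]) :: [MetaType] where
      If 'True  t e = t
      If 'False t e = e

  -- combining two metatypes lists keeps only the targets they share
  type family Intersect (xs :: [MetaType]) (ys :: [MetaType]) :: [MetaType] where
      Intersect '[]       ys = '[]
      Intersect (x ': xs) ys = If (Elem x ys) (x ': Intersect xs ys) (Intersect xs ys)

With something like this, Intersect Debian FreeBSD reduces to '[], and a property combinator can refuse to type check when the combined list is empty.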

My implementation of the metatypes lists is hundreds of lines of code, consisting entirely of types and type families. It includes a basic implementation of singletons, and is portable back to ghc 7.6 to support Debian stable. While it takes some contortions to support such an old version of ghc, it's pretty awesome that the ghc in Debian stable supports this stuff.

extending beyond targeted OS's

Before this change, Propellor's Property type had already been slightly refined, tagging them with HasInfo or NoInfo, as described in making propellor safer with GADTs and type families. I needed to keep that HasInfo in the type of properties.

But, it seemed unnecessarily verbose to have types like Property NoInfo Debian. Especially if I want to add even more information to Property types later. Property NoInfo Debian NoPortsOpen would be a real mouthful to have to write for every property.

Luckily I now have this handy type-level list, so I can shove more types into it: Property (HasInfo + Debian) is used where necessary, and Property Debian can be used everywhere else.

Since I can add more types to the type-level list, without affecting other properties, I expect to be able to implement type-level port conflict detection next. Should be fairly easy to do without changing the API except for properties that use ports.


As shown here, pickOS makes a property that decides which of two properties to use based on the host's OS.

aptUpgraded :: Property DebianLike
aptUpgraded = property "apt upgraded" (apt "upgrade" `requires` apt "update")

pkgUpgraded :: Property FreeBSD
pkgUpgraded = property "pkg upgraded" (pkg "upgrade")

upgraded :: Property UnixLike
upgraded = (aptUpgraded `pickOS` pkgUpgraded)
    `describe` "OS upgraded"

Any number of OS's can be chained this way, to build a property that is super-portable out of simple little non-portable properties. This is a sweet combinator!

Singletons are types that are inhabited by a single value. This lets the value be inferred from the type, which came in handy in building the pickOS property combinator.

Its implementation needs to be able to look at each of the properties at runtime, to compare the OS's they target with the actual OS of the host. That's done by stashing a target list value inside a property. The target list value is inferred from the type of the property, thanks to singletons, and so does not need to be passed in to property. That saves keyboard time and avoids mistakes.
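Here's a minimal sketch of the singleton idea (again illustrative, not propellor's actual code):

  {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

  data OS = OSDebian | OSBuntish | OSFreeBSD
      deriving (Show, Eq)

  -- the singleton: each SingOS type is inhabited by exactly one value
  data SingOS (os :: OS) where
      SDebian  :: SingOS 'OSDebian
      SBuntish :: SingOS 'OSBuntish
      SFreeBSD :: SingOS 'OSFreeBSD

  -- recover the ordinary value from the singleton
  fromSing :: SingOS os -> OS
  fromSing SDebian  = OSDebian
  fromSing SBuntish = OSBuntish
  fromSing SFreeBSD = OSFreeBSD

  -- let the compiler conjure up the singleton value from the type
  class SingI (os :: OS) where sing :: SingOS os
  instance SingI 'OSDebian  where sing = SDebian
  instance SingI 'OSBuntish where sing = SBuntish
  instance SingI 'OSFreeBSD where sing = SFreeBSD

With a similar class for lists of OS's, a smart constructor can require SingI of a property's metatypes list, and stash fromSing sing inside the property, so the target list value never has to be written out by hand.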

is it worth it?

It's important to consider whether more complicated types are a net benefit. Of course, opinions vary widely on that question in general! But let's consider it in light of my main goals for Propellor:

  1. Help save the user from pushing a broken configuration to their machines at a time when they're down in the trenches dealing with some urgent problem at 3 am.
  2. Advance the state of the art in configuration management by taking advantage of the state of the art in strongly typed haskell.

This change definitely meets both criteria. But there is a tradeoff; it got a little bit harder to write new propellor properties. Not only do new properties need to have their type set to target appropriate systems, but the more polymorphic code is, the more likely the type checker can't figure out all the types without some help.

A simple example of this problem is as follows.

foo :: Property UnixLike
foo = p `requires` bar
  where
    p = property "foo" $ do
        ...

The type checker will complain that "The type variable ‘metatypes1’ is ambiguous". The problem is that it can't infer the type of p, because many different types could be combined with the bar property and all would yield a Property UnixLike. The solution is simply to add a type signature like p :: Property UnixLike
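With that annotation added, the example type checks:

  foo :: Property UnixLike
  foo = p `requires` bar
    where
      p :: Property UnixLike
      p = property "foo" $ do
          ...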

Since this only affects creating new properties, and not combining existing properties (which have known types), it seems like a reasonable tradeoff.

things to improve later

There are a few warts that I'm willing to live with for now...

Currently, Property (HasInfo + Debian) is a different type than Property (Debian + HasInfo), but they should really be considered the same type. That is, I need type-level sets, not lists. While there's a type level sets library on hackage, it still seems to require a specific order of the set items when writing down a type signature.

Also, using ensureProperty, which runs one property inside the action of another property, got complicated by the need to pass it a type witness.

foo :: Property Debian
foo = property' $ \witness -> do
    ensureProperty witness (aptInstall "foo")

That witness is used to type check that the inner property targets every OS that the outer property targets. I think it might be possible to store the witness in the monad, and have ensureProperty read it, but it might complicate the type of the monad too much, since it would have to be parameterized on the type of the witness.

Oh no, I mentioned monads. While type level lists and type functions and generally bending the type checker to my will is all well and good, I know most readers stop reading at "monad". So, I'll stop writing. ;)


Thanks to David Miani who answered my first tentative question with a big hunk of example code that got me on the right track.

Also to many other people who answered increasingly esoteric Haskell type system questions.

Also thanks to the Shuttleworth Foundation, which funded this work by way of a Flash Grant.

Syndicated 2016-03-28 11:29:18 from see shy jo

documentation first

I write documentation first and code second. I've mentioned this from time to time (previously, previously) but a reader pointed out that I've never really explained why I work that way.

It's a way to make my thinking more concrete without diving all the way into the complexities of the code right away. So sometimes, what I write down is design documentation, and sometimes it's notes on a bug report[1], but if what I'm working on is user-visible, I start by writing down the end user documentation.

Writing things down lets me interact with them as words on a page, which are more concrete than muddled thoughts in the head, and much easier to edit and reason about. Code constrains you to existing structures; a blank page frees you to explore and build up new ideas. It's the essay writing process, applied to software development, with a side effect of making sure everything is documented.

Also, end-user documentation is best when it doesn't assume that the user has any prior knowledge. The point in time when I'm closest to perfect lack of knowledge about something is before I've built it[2]. So, that's the best time to document it.

I understand what I'm trying to tell you better now that I've written it down than I did when I started. Hopefully you do too.

[1] I'll often write a bug report down even if I have found the bug myself and am going to fix it myself on the same day. (example) This is one place where it's nice to have bug reports as files in the same repository as the code, so that the bug report can be included in the commit fixing it. Often the bug report has lots of details that don't need to go into the commit message, but explain more about my evolving thinking about a problem.

[2] Technically I'm even more clueless ten years later when I've totally forgotten whatever, but it's not practical to wait. ;-)

Syndicated 2016-02-27 16:05:24 from see shy jo

trademark nonsense

Canonical appear to require that you remove all trademarks entirely even if using them wouldn't be a violation of trademark law.

-- Matthew Garrett

Each time Matthew brings this up, and as evidence continues to mount that Canonical either actually intends their IP policy to be read that way, or is intentionally keeping the situation unclear to FUD derivatives, I start wondering about references to Ubuntu in my software.

Should such references be removed, or obscured, like "U*NIX" in software of old, to prevent exposing users to this trademark nonsense?

  joey@darkstar:~/src/git-annex>git grep -i ubuntu |wc -l
  joey@darkstar:~/src/ikiwiki>git grep -i ubuntu |wc -l
  joey@darkstar:~/src/etckeeper>git grep -i ubuntu |wc -l

Most of the code in git-annex, ikiwiki, and etckeeper is licensed under the GPL or AGPL, and so Canonical's IP policy probably does not require that anyone basing a distribution on Ubuntu strip all references to "Ubuntu" from them. But then, there's Propellor:

  joey@darkstar:~/src/propellor>git grep -i ubuntu |wc -l

Propellor is BSD licensed. It's in Ubuntu universe. It not only references Ubuntu in documentation, but contains code that uses that trademark:

  data Distribution
        = Debian DebianSuite
        | Ubuntu Release

So, if an Ubuntu-derived distribution has to remove "Ubuntu" from Propellor, they'd end up with a Propellor that either differs from upstream, or that can't be used to manage Ubuntu systems. Neither choice is good for users. Probably most small derived distributions would not have expertise to patch data types in a Haskell program and would have to skip including Propellor. That's not good for Propellor getting wide distribution either.

I think I've convinced myself it would be for the best to remove all references to "Ubuntu" from Propellor.

Similarly, Debconf is BSD licensed. I originally wrote it, but it's now maintained by Colin Watson, who works for Canonical. If I were still maintaining Debconf, I'd be looking at removing all instances of "Ubuntu" from it and preventing that and other Canonical trademarks from slipping back in later. Alternatively, I'd be happy to re-license all Debconf code that I wrote under the AGPL-3+.

PS: Shall we use "*buntu" as the, erm, canonical trademark-free spelling of "Ubuntu"? Seems most reasonable, unless Canonical has trademarked that too.

Syndicated 2016-02-19 17:41:21 from see shy jo

letsencrypt support in propellor

I've integrated letsencrypt into propellor today.

I'm using the reference letsencrypt client. While I've seen complaints that it has a lot of dependencies and is too complicated, it seemed to only need to pull in a few packages, and use only a few megabytes of disk space, and it has fewer options than ls does. So seems fine. (Although it would be nice to have some alternatives packaged in Debian.)

I ended up implementing this:

  letsEncrypt :: AgreeTOS -> Domain -> WebRoot -> CertInstaller -> Property NoInfo

The interesting part of that is the CertInstaller, which is passed the certificate files that letsencrypt generates, and is responsible for making the web server (or whatever) use them.

This avoids relying on the letsencrypt client's apache config munging, which is probably useful for many people, but not those of us using configuration management systems. And so avoids most of the complicated magic that the letsencrypt client has a reputation for.
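I haven't shown what a CertInstaller actually is. As a rough sketch (the real type in propellor may pass different or additional files), think of it as a function from the generated files to a property:

  -- hypothetical sketch: given the paths of the generated certificate,
  -- private key, and chain file, produce a property that deploys them
  type CertInstaller = FilePath -> FilePath -> FilePath -> Property NoInfo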

And, this API lets other propellor properties integrate with letsencrypt by providing a CertInstaller of their own. Like this property, which sets up apache to serve a https website, using letsencrypt to get the certificate:

  Apache.httpsVirtualHost "" "/var/www"
    (LetsEncrypt.AgreeTos (Just "me@my.domain"))

That's about as simple a configuration as I can imagine for such a website!

The two parts of letsencrypt that are complicated are not the fault of the client really. Those are renewal and rate limiting.

I'm currently rate limited for the next week because I asked letsencrypt for several certificates for a domain, as I was learning how to use it and integrating it into propellor. So I've not quite managed to fully test everything. That's annoying. I also worry that rate limiting could hit at an inopportune time once I'm relying on letsencrypt. It's especially problematic that it only allows 5 certs for subdomains of a given domain per week. What if I use a lot of subdomains?

Renewal is complicated mostly because there's no good way to test it. You set up your cron job, or whatever, and wait three months, and hopefully it worked. Just as likely, you got something wrong, and your website breaks. Maybe letsencrypt could offer certificates that will only last an hour, or a day, for use when testing renewal.

Also, what if something goes wrong with renewal? Perhaps the letsencrypt service is not available when your certificate needs to be renewed.

What I've done in propellor to handle renewal is, it runs letsencrypt every time, with the --keep-until-expiring option. If this fails, propellor will report a failure. As long as propellor is run periodically by a cron job, this should result in multiple failure reports being sent (for 30 days I think) before a cert expires without getting renewed. But, I have not been able to test this.

Syndicated 2016-02-07 22:10:20 from see shy jo

git-annex v6

Version 6 of git-annex, released last week, adds a major new feature: support for unlocked large files that can be edited as usual and committed using regular git commands.

For example:

  git init
  git annex init --version=6
  mv ~/foo.iso .
  git add foo.iso
  git commit -m "added hundreds of megabytes to git annex (not git)"
  git remote add origin ssh://server/dir
  git annex sync origin --content # uploads foo.iso

Compare that with how git-annex has worked from the beginning, where git annex add is used to add a file, and then the file is locked, preventing further modifications of it. That is still a very useful way to use git-annex for many kinds of files, and is still supported of course. Indeed, you can easily switch files back and forth between being locked and unlocked.

This new unlocked file mode uses git's smudge/clean filters, and I was busy developing it all through December. It started out playing catch-up with git-lfs somewhat, but has significantly surpassed it now in several ways.

So, if you had tried git-annex before, but found it didn't meet your needs, you may want to give it another look now.

Now a few thoughts on git-annex vs git-lfs, and different tradeoffs made by them.

After trying it out, my feeling is that git-lfs brings an admirable simplicity to using git with large files. File contents are automatically uploaded to the server when a git branch is pushed, and downloaded when a branch is merged, and after setting it up, the user may not need to change their git workflow at all to use git-lfs.

But there are some serious costs to that simplicity. git-lfs is a centralized system. This is especially problematic when dealing with large files. Being a decentralized system, git-annex has a lot more flexibility, like transferring large file contents peer-to-peer over a LAN, and being able to choose where large quantities of data are stored (maybe in S3, maybe on a local archive disk, etc).

The price git-annex pays for this flexibility is you have to configure it, and run some additional commands. And, it has to keep track of what content is located where, since it can't assume the answer is "in the central server".

The simplicity of git-lfs also means that the user doesn't have much control over what files are present in their checkout of a repository. git-lfs downloads all the files in the work tree. It doesn't have facilities for dropping the content of some files to free up space, or for configuring a repository to only want to get a subset of files in the first place. On the other hand, git-annex has excellent support for all those things, and this comes largely for free from its decentralized design.

If git has shown us anything, it's perhaps that a little added complexity to support a fully distributed system won't prevent people using it. Even if many of them end up using it in a mostly centralized way. And that being decentralized can have benefits beyond the obvious ones.

Oh yeah, one other advantage of git-annex over git-lfs. It can use half as much disk space!

A clone of a git-lfs repository contains one copy of each file in the work tree. Since the user can edit that file at any time, or checking out a different branch can delete the file, it also stashes a copy inside .git/lfs/objects/.

One of the main reasons git-annex used locked files, from the very beginning, was to avoid that second copy. A second local copy of a large file can be too expensive to put up with. When I added unlocked files in git-annex v6, I found it needed a second copy of them, same as git-lfs does. That's the default behavior. But, I decided to complicate git-annex with a config setting:

  git config annex.thin true
  git annex fix

Run those two commands, and now only one copy is needed for unlocked files! How's it work? Well, it comes down to hard links. But there is a tradeoff here, which is why this is not the default: When you edit a file, no local backup is preserved of its old content. So you have to make sure to let git-annex upload files to another repository before editing them or the old version could get lost. So it's a tradeoff, and maybe it could be improved. (Only thin out a file after a copy has been uploaded?)
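As a sketch of the idea (not git-annex's actual code), thinning out a file amounts to replacing the work tree copy with a hard link to the annexed object:

  import System.Posix.Files (createLink, removeLink)

  -- hypothetical: make the work tree file share the annexed object's
  -- inode, so the content is stored only once on disk
  thinFile :: FilePath -> FilePath -> IO ()
  thinFile annexobject worktreefile = do
      removeLink worktreefile              -- drop the duplicate copy
      createLink annexobject worktreefile  -- hard link to the object

Once the two names share an inode, editing the work tree file modifies the only copy, which is exactly the tradeoff described above.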

This adds a small amount of complexity to git-annex, but I feel it's well worth it to let unlocked files use half the disk space. If the git-lfs developers are reading this, that would probably be my first suggestion for a feature to consider adding to git-lfs. I hope for more opportunities to catch up with git-lfs in turn.

Syndicated 2016-01-19 17:28:50 from see shy jo

STM Region contents

concurrent-output released yesterday got a lot of fun features. It now does full curses-style minimization of the output, to redraw updated lines with optimal efficiency. And supports multiline regions/wrapping too long lines. And allows the user to embed ANSI colors in a region. 3 features that are in some tension and were fun to implement all together.

But I have a more interesting feature to blog about... I've added the ability for the content of a Region to be determined by an STM transaction.

Here, for example, is a region that's a clock:

timeDisplay :: TVar UTCTime -> STM Text
timeDisplay tv = T.pack . show <$> readTVar tv

clockRegion :: IO ConsoleRegionHandle
clockRegion = do
    tv <- atomically . newTVar =<< getCurrentTime
    r <- openConsoleRegion Linear
    setConsoleRegion r (timeDisplay tv)
    async $ forever $ do
        threadDelay 1000000 -- 1 sec
        atomically . (writeTVar tv) =<< getCurrentTime
    return r

There's something magical about this. Whenever a new value is written into the TVar, concurrent-output automatically knows that this region needs to be updated. How does it know how to do that?

Magic of STM. Basically, concurrent-output composes all the STM transactions of Regions, and asks STM to wait until there's something new to display. STM keeps track of whatever TVars might be looked at, and so can put the display thread to sleep until there's a change to display.
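The key primitive is STM's retry: a transaction that calls retry blocks until one of the TVars it has read is written. A minimal sketch of waiting for a region's content to change:

  import Control.Concurrent.STM
  import Data.Text (Text)

  -- block until the region's text differs from what was last displayed
  waitChange :: Text -> STM Text -> STM Text
  waitChange old getnew = do
      new <- getnew
      if new == old then retry else return new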

Using STM I've gotten extensibility for free, due to the nice ways that STM transactions compose.

A few other obvious things to do with this: Compose 2 regions with padding so they display on the same line, left and right aligned. Trim a region's content to the display width. (The current width is handily exported by concurrent-output in a TVar for just this kind of thing.)
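For instance, the first of those might look something like this sketch (sideBySide is a made-up name):

  import Control.Concurrent.STM
  import Data.Monoid ((<>))
  import qualified Data.Text as T
  import Data.Text (Text)

  -- pad between two STM text sources so they display on one line,
  -- left and right aligned within the given width
  sideBySide :: Int -> STM Text -> STM Text -> STM Text
  sideBySide width getl getr = do
      l <- getl
      r <- getr
      let pad = max 1 (width - T.length l - T.length r)
      return (l <> T.replicate pad (T.pack " ") <> r)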

I'm tempted to write a console spreadsheet using this. Each visible cell of the spreadsheet would have its own region, that uses a STM transaction to display. Plain data Cells would just display their current value. Cells that contain a function would read the current values of other Cells, and use that to calculate what to display. Which means that a Cell containing a function would automatically update whenever any of the Cells that it depends on were updated!
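A sketch of what such a Cell might look like (hypothetical names; cellDisplay would be set as the content of the cell's region):

  import Control.Concurrent.STM
  import Data.Text (Text)

  -- a plain cell holds a value; a formula cell reads other cells
  data Cell = Value (TVar Text) | Formula (STM Text)

  cellDisplay :: Cell -> STM Text
  cellDisplay (Value tv)  = readTVar tv
  cellDisplay (Formula f) = f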

Do you think that a simple interactive spreadsheet built this way would be more than 100 lines of code?

Syndicated 2015-11-03 20:03:53 from see shy jo

a tiling region manager for the console

Building on top of concurrent-output, and some related work Joachim Breitner did earlier, I now have a kind of equivalent to a tiling window manager, except it's managing regions of the console for different parts of a single program.

Here's a really silly demo, in an animated gif:

[animated gif demo]

Not bad for 23 lines of code, is it? Seems much less tedious to do things this way than using ncurses. Even with its panels, ncurses requires you to think about layout of various things on the screen, and many low-level details. This, by contrast, is compositional: just add another region and a thread to update it, and away it goes.

So, here's an apt-like download progress display, in 30 lines of code.

[animated gif of the download progress display]

Not only does it have regions which are individual lines of the screen, but those can have sub-regions within them as seen here (and so on).

And, log-type messages automatically scroll up above the regions. External programs run by createProcessConcurrent will automatically get their output/errors displayed there, too.

What I'm working on now is support for multiline regions, which automatically grow/shrink to fit what's placed in them. The hard part, which I'm putting the finishing touches on, is to accurately work out how large a region is before displaying it, in order to lay it out. Requires parsing ANSI codes among other things.

STM rules

There's so much concurrency, with complicated interrelated data being updated by different threads, that I couldn't have possibly built this without Software Transactional Memory.

Rather than a nightmare of locks behind locks behind locks, the result is so well behaved that I'm confident that anyone who needs more control over the region layout, or wants to do funky things, can dive into the STM interface and update the data structures, and nothing will ever deadlock or be inconsistent, and as soon as an update completes, it'll display on-screen.

An example of how powerful and beautiful STM is, here's how the main display thread determines when it needs to refresh the display:

data DisplayChange
        = BufferChange [(StdHandle, OutputBuffer)]
        | RegionChange RegionSnapshot
        | TerminalResize (Maybe Width)
        | EndSignal ()

                change <- atomically $
                        (RegionChange <$> regionWaiter origsnapshot)
                                `orElse`
                        (RegionChange <$> regionListWaiter origsnapshot)
                                `orElse`
                        (BufferChange <$> outputBufferWaiterSTM waitCompleteLines)
                                `orElse`
                        (TerminalResize <$> waitwidthchange)
                                `orElse`
                        (EndSignal <$> waitTSem endsignal)
                case change of
                        RegionChange snapshot -> do
                                ...
                        BufferChange buffers -> do
                                ...
                        TerminalResize width -> do
                                ...

So, it composes all these STM actions that can wait on various kinds of changes, to get one big action, that waits for all of the above, and builds up a nice sum type to represent what's changed.

Another example is that the whole support for sub-regions only involved adding 30 lines of code, all of it using STM, and it worked 100% the first time.

Available in concurrent-output 1.1.0.

Syndicated 2015-10-31 01:44:47 from see shy jo

concurrent output library

concurrent-output is a Haskell library I've developed this week, to make it easier to write console programs that do a lot of different things concurrently, and want to serialize concurrent outputs sanely.

It's increasingly easy to write concurrent programs, but all their status reporting has to feed back through the good old console, which is still obstinately serial.

Haskell illustrates this problem well with this "Linus's first kernel" equivalent, interleaving the output of 2 threads:

  > import System.IO
  > import Control.Concurrent.Async
  > putStrLn (repeat 'A') `concurrently` putStrLn (repeat 'B')

That's fun, but also horrible if you wanted to display some messages to the user:

  > putStrLn "washed the car" `concurrently` putStrLn "walked the dog"
walwkaesdh etdh et hdeo gc

To add to the problem, we often want to run separate programs concurrently, which have output of their own to display. And, just to keep things interesting, sometimes a unix program will behave differently when stdout is not connected to a terminal (eg, ls | cat).

To tame simple concurrent programs like these so they generate readable output involves a lot of plumbing. Something like: run the actions concurrently, taking care to capture the output of any commands, and then feed the output that the user should see through some sort of serializing channel to the display. Dealing with that when you just wanted a simple concurrent program risks ending up with a not-so-simple program.
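That plumbing usually looks something like this sketch: every thread writes to a channel, and a single display thread owns the console:

  import Control.Concurrent.STM
  import Control.Monad (forever)

  -- all threads funnel their messages here...
  sendOutput :: TChan String -> String -> IO ()
  sendOutput c s = atomically (writeTChan c s)

  -- ...and one thread serializes them to the console
  displayThread :: TChan String -> IO ()
  displayThread c = forever $ putStr =<< atomically (readTChan c)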

So, I wanted a library with basically 2 functions:

outputConcurrent :: String -> IO ()
createProcessConcurrent :: CreateProcess -> IO whatever

The idea is, you make your program use outputConcurrent to display all its output, and each String you pass to that will be displayed serially, without getting mixed up with any other concurrent output.

And, you make your program use createProcessConcurrent everywhere it starts a process that might output to stdout or stderr, and it'll likewise make sure its output is displayed serially.

Oh, and createProcessConcurrent should avoid redirecting stdout and stderr away from the console, when no other concurrent output is happening. So, if programs are mostly run sequentially, they behave as they normally would at the console; any behavior changes should only occur when there is concurrency. (It might also be nice for it to allocate ttys and run programs there to avoid any behavior changes at all, although I have not tried to do that.)
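One way to implement that last part (a sketch, not the library's actual code) is to treat the console as a lock, and only capture a process's output when the lock is already held:

  import Control.Concurrent.STM

  -- the lock is full while some concurrent output is in flight
  type ConsoleLock = TMVar ()

  -- if nothing else is using the console, the process can have it;
  -- otherwise its output should be captured and displayed later
  shouldCapture :: ConsoleLock -> STM Bool
  shouldCapture lock = fmap not (isEmptyTMVar lock)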

And that should be pretty much the whole API, although it's ok if it needs some function called by main to set it up:

import Control.Concurrent.Async
import Control.Concurrent.Output
import System.Process (proc)

main = withConcurrentOutput $
    outputConcurrent "washed the car\n"
        `concurrently`
    createProcessConcurrent (proc "ls" [])
        `concurrently`
    outputConcurrent "walked the dog\n"

  $ ./demo
  washed the car
  walked the dog
  Maildir/  bin/  doc/  html/  lib/  mail/  mnt/  src/  tmp/

I think that's a pretty good API to deal with this concurrent output problem. Anyone know of any other attempts at this I could learn from?

I implemented this over the past 3 days, in 320 lines of code. It got rather hairy:

  • It has to do buffering of the output.
  • There can be any quantity of output, but program memory use should be reasonably small. Solved by buffering up to 1 mb of output in RAM, and writing excess buffer to temp files.
  • Falling off the end of the program is complicated; there can be buffered output to flush and it may have to wait for some processes to finish running etc.
  • The locking was tough to get right! I could not have managed to write it correctly without STM.

It seems to work pretty great though. I got Propellor using it, and Propellor can now run actions concurrently!

Syndicated 2015-10-29 02:07:34 from see shy jo

propelling disk images

Following up on Then and Now ...

In quiet moments at ICFP last August, I finished teaching Propellor to generate disk images. With an emphasis on doing a whole lot with very little new code and an extreme amount of code reuse.

For example, let's make a disk image with nethack on it. First, we need to define a chroot. Disk image creation reuses propellor's chroot support, described back in propelling containers. Any propellor properties can be assigned to the chroot, so it's easy to describe the system we want.

nethackChroot :: FilePath -> Chroot
nethackChroot d = Chroot.debootstrapped (System (Debian Stable) "amd64") mempty d
    & Apt.installed ["linux-image-amd64"]
    & Apt.installed ["nethack-console"]
    & accountFor gamer
    & gamer `hasInsecurePassword` "hello"
    & gamer `hasLoginShell` "/usr/games/nethack"
  where gamer = User "gamer"

Now to make an image from that chroot, we just have to tell propellor where to put the image file, some partitioning information, and to make it boot using grub.

nethackImage :: RevertableProperty
nethackImage = imageBuilt "/srv/images/nethack.img" nethackChroot
    MSDOS (grubBooted PC)
    [ partition EXT2 `mountedAt` "/boot"
        `setFlag` BootFlag
    , partition EXT4 `mountedAt` "/"
        `addFreeSpace` MegaBytes 100
    , swapPartition (MegaBytes 256)
    ]

The disk image partitions default to being sized to fit exactly the files from the chroot that go into each partition, so the disk image is as small as possible by default. There's a little DSL to configure the partitions. To give control over the partition size, it has some functions, like addFreeSpace and setSize. Other functions like setFlag and extended can further adjust the partitions. I think that worked out rather well; the partition specification is compact and avoids unnecessary hardcoded sizes, while providing plenty of control.

By the end of ICFP, I had Propellor building complete disk images, but no boot loader installed on them.

Fast forward to today. After struggling with some strange grub behavior, I found a working method to install grub onto a disk image.

The whole disk image feature weighs in at:

203 lines to interface with parted
88 lines to format and mount partitions
90 lines for the partition table specification DSL and partition sizing
196 lines to generate disk images
75 lines to install grub on a disk image
652 lines of code total

Which is about half the size of vmdebootstrap, 1/4th the size of partman-base (probably 1/100th the size of partman as a whole), and 1/13th the size of live-build. All of which do similar things, in ways that seem to me to be much less flexible than Propellor.

One thing I'm considering doing is extending this so Propellor can use qemu-user-static to create disk images for eg, arm. Add some u-boot setup, and this could create bootable images for arm boards. A library of configs for various arm boards could then be included in Propellor. This would be a lot easier than running the Debian Installer on an arm board.

Oh! I only just now realized that if you have a propellor host configured, like this example for my dialup gateway, leech --

 leech = host ""
        & os (System (Debian (Stable "jessie")) "armel")
        & Apt.installed ["linux-image-kirkwood", "ppp", "screen", "iftop"]
        & privContent "/etc/ppp/peers/provider"
        & privContent "/etc/ppp/pap-secrets"
        & Ppp.onBoot
        & hasPassword (User "root")
        & Ssh.installed

-- then the host's properties can be extracted from it, using eg hostProperties leech, and reused to create a disk image with the same properties as the host!

So, when my dialup gateway gets struck by lightning again, I could use this to build a disk image for its replacement:

import qualified Propellor.Property.Hardware.SheevaPlug as SheevaPlug

laptop = host ""
    & SheevaPlug.diskImage "/srv/images/leech.img" (MegaBytes 2000)
        (& propertyList "has all of leech's properties"
            (hostProperties leech))

This also means you can start with a manually built system, write down the properties it has, and iteratively run Propellor against it until you think you have a full specification of it, and then use that to generate a new, clean disk image. Nice way to transition from sysadmin days of yore to a clean declaratively specified system.

Syndicated 2015-10-23 02:09:17 from see shy jo

propellor orchestration

With the disclaimer that I don't really know much about orchestration, I have added support for something resembling it to Propellor.

Until now, when using propellor to manage a bunch of hosts, you updated them one at a time by running propellor --spin $somehost, or maybe you set up a central git repository, and a cron job to run propellor on each host, pulling changes from git.

I like both of these ways to use propellor, but they only go so far...

  • Perhaps you have a lot of hosts, and would like to run propellor on them all concurrently.

    master = host "" & concurrently conducts alotofhosts

  • Perhaps you want to run propellor on your dns server last, so when you add a new webserver host, it gets set up and working before the dns is updated to point to it.

    master = host "" & conducts webservers before conducts dnsserver

  • Perhaps you have something more complex, with multiple subnets that propellor can run in concurrently, finishing up by updating that dnsserver.

    master = host "" & concurrently conducts [sub1, sub2] before conducts dnsserver

    sub1 = "" & concurrently conducts webservers & conducts loadbalancers

    sub2 = "" & conducts dockerservers

  • Perhaps you need to first run some command that creates a VPS host, and then want to run propellor on that host to set it up.

    vpscreate h = cmdProperty "vpscreate" [hostName h] before conducts h

All those scenarios are supported by propellor now!

Well, I haven't actually implemented concurrently yet, but the point is that the conducts property can be used with any of propellor's property combinators, like before etc, to express all kinds of scenarios.

The conducts property works in combination with an orchestrate function to set up all the necessary stuff to let one host ssh into another and run propellor there.

  main = defaultMain (orchestrate hosts)

  hosts =
      [ master
      , webservers
      , ...
      ]

The orchestrate function does a bunch of stuff:

  • Builds up a graph of what conducts what.
  • Removes any cycles that might have snuck in by accident, before they cause foot shooting.
  • Arranges for the ssh keys to be accepted as necessary.
    Note that you need to add ssh key properties to all relevant hosts so it knows what keys to trust.
  • Arranges for the private data of a host to be provided to the hosts that conduct it, so they can pass it along.

I'm very pleased that I was able to add the Propellor.Property.Conductor module implementing this with only a tiny change to the rest of propellor. Almost everything needed to implement it was there in propellor's infrastructure already.

Also kind of cool that it only needed 13 lines of imperative code, the other several hundred lines of the implementation being all pure code.

Syndicated 2015-10-22 01:02:58 from see shy jo
