Older blog entries for mjg59 (starting at number 334)

Fedora 17 and Mac support

Fedora 17 will be shipping next week. It's got a bunch of new features, none of which I contributed to in the slightest. What I did work on was improving our support for installation on x86 Apple hardware. There are still a few shortcomings in this, so it's not an announced or supported feature, but there's been enough progress that it's worth writing about.

There are a few ways in which Apple platforms differ from normal PC hardware. The first is that Apples are very much intended to be EFI first and BIOS second, while that's still not really true for commodity PC hardware. Apple launched Boot Camp shortly after they started shipping x86 Macs (and shortly after I'd got Ubuntu running on one for the first time[1]), but the focus of Boot Camp has always been to be good enough to boot Windows and not much else[2]. The second is the integration with the boot environment. You can boot Linux easily enough by setting the standard EFI boot variables, but if you ever boot back into OS X it's easy to trigger deletion of those. There's then no straightforward way of resetting them, and you're left having to recover your installation.

The other difficult bit has been actually starting the installer at all. If you're happy to use BIOS emulation then this can be done without too much misery, but you're then left with dealing with Apple's custom synchronised GPT/MBR. More modern Macs will boot via the standard EFI mechanisms, but there's a range of problems that can be caused that way. Fedora 17 is the first Linux distribution to ship install media that will EFI boot a Mac when written to either optical media or a USB stick[3]. Put the install media in your system and then either select it in the OS X Startup Disk preferences menu and reboot, or reboot and hold down the alt key. In the latter case there'll be a nice Fedora logo. Click on it and it'll boot.

The biggest difference between Apple hardware and everyone else's is that we'd normally put the bootloader in a FAT partition that's shared with any other installed operating systems. That does actually work on Apples, but then you run into the point I mentioned earlier. The Apple startup picker that you get by holding down the alt key doesn't pay any attention to the standard EFI boot variables, so it won't show a FAT partition merely because it's got an EFI bootloader on it. The same is true of the OS X Startup Disk preferences. The best way to get a drive to appear in the menu is to make it look like an OS X drive.

This was problematic in a few ways, due to the requirement for an OS X drive to be HFS+. The first was that we didn't have any support for that in our installer, so work had to be done there. The second was that the state of HFS+ support in Linux was woefully poor[4]. Linux will refuse to write to an HFS+ filesystem if it hasn't been cleanly unmounted or fscked, and since we can't rely on everyone always shutting their machine down cleanly, that means a working fsck.hfsplus. In theory we were shipping one of those. In practice, it reliably segfaulted on 64-bit systems[5]. So I had to package the latest version of Apple's code, and doing that involved dealing with the fact that Apple now use blocks almost as much as Grub 2 uses nested functions. That meant using Clang, which meant dealing with a bug where Clang segfaulted if it had been built with gcc 4.7[6]. But finally, once all of those yaks had been shaved, we could trust our filesystem.

Of course, more problems existed. You can't just drop a bootloader onto an HFS+ partition and expect it to work. You need to write its inode value into the superblock. Trivial, except that if you do this from userspace then the kernel kindly overwrites your update when you unmount the disk and it updates the superblock. That required a mechanism for updating that data via the kernel, which meant adding an ioctl. This would have been fine, except the OS X Startup Disk preference looks for a bootloader in a location other than where Fedora installs it. Changing our bootloader install path wasn't a great option, so the easiest thing to do was just to link them together. Except HFS+ doesn't really support hardlinks. When you hardlink something the original file gets moved into a magic directory in the filesystem root and the links are special files that refer to it. And it turns out that it's not actually the inode that the firmware wants when finding the bootloader, it's the catalogue ID, and these aren't the same when hardlinks are involved. So, another kernel patch.
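As an aside, here's roughly what the userspace side of that looks like. This is a minimal sketch assuming the bless ioctl that was added to the hfsplus driver for this work (what I believe landed as HFSPLUS_IOC_BLESS); the request number below is purely illustrative, so take the real definition from your kernel headers rather than from here.

/*
 * Ask the hfsplus driver to record this file in the volume header so
 * that Apple's firmware will treat it as the boot file. Sketch only -
 * the ioctl number here is a placeholder, not the kernel's definition.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <unistd.h>

#ifndef HFSPLUS_IOC_BLESS
#define HFSPLUS_IOC_BLESS _IO('h', 0x80)   /* illustrative value only */
#endif

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /path/to/bootloader.efi\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (ioctl(fd, HFSPLUS_IOC_BLESS, NULL) < 0) {
        perror("ioctl(HFSPLUS_IOC_BLESS)");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}

Doing the update inside the kernel is what lets it survive the superblock writeback that happens at unmount time, which is exactly the problem described above.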

All that was left for the boot partition then was adding an icon and a drive label. We already had code for generating the icon, and the drive label wasn't that difficult. Hurrah!

Ok, so we can construct a filesystem that (a) appears in the bootpicker, and (b) appears in the OS X Startup Disk preferences. We can even put a bootloader on it. This led to the next problem. The bootloader started without any trouble. But the bootloader couldn't read any files off the boot partition, including its configuration. This should have been obvious with hindsight - the problem was simply that grub legacy[7] doesn't support HFS+. Writing an HFS+ driver seemed like an excessive amount of work, especially since the firmware already supported reading HFS+, so instead I wrote a simple grub filesystem driver that uses the UEFI interfaces to read files.
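To give a feel for what that driver does, here's the firmware-facing part in isolation. It's a sketch using EDK2-style names (gBS, EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and friends); the real grub driver wraps the same handful of calls in grub's own filesystem driver interface rather than being a standalone function like this.

/*
 * Read a file by asking the firmware's own filesystem driver for it,
 * rather than parsing HFS+ ourselves. On a Mac the firmware already
 * has an HFS+ driver bound to the boot device.
 */
#include <Uefi.h>
#include <Protocol/SimpleFileSystem.h>
#include <Library/UefiBootServicesTableLib.h>

EFI_STATUS
ReadFileViaFirmware(EFI_HANDLE Device, CHAR16 *Path, VOID *Buffer, UINTN *Size)
{
  EFI_SIMPLE_FILE_SYSTEM_PROTOCOL *Fs;
  EFI_FILE_PROTOCOL *Root, *File;
  EFI_STATUS Status;

  Status = gBS->HandleProtocol(Device, &gEfiSimpleFileSystemProtocolGuid,
                               (VOID **)&Fs);
  if (EFI_ERROR(Status))
    return Status;

  Status = Fs->OpenVolume(Fs, &Root);
  if (EFI_ERROR(Status))
    return Status;

  Status = Root->Open(Root, &File, Path, EFI_FILE_MODE_READ, 0);
  if (EFI_ERROR(Status)) {
    Root->Close(Root);
    return Status;
  }

  /* Read() updates *Size with the number of bytes actually read. */
  Status = File->Read(File, Size, Buffer);

  File->Close(File);
  Root->Close(Root);
  return Status;
}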

A working bootloader! One more problem there. grub looks in its startup directory to find its configuration file. We have two "copies" of grub hardlinked to each other, each looking in a different directory for configuration. Hardlinking the config files together worked fine, except that most tools that update config files have this irritating habit of writing to a temporary file and moving that over the original, thus breaking the hardlink. Easily solved by using a symlink instead, except that the UEFI filesystem semantics don't really support symlinks because UEFI only really envisaged using FAT. If you use Apple's UEFI HFS+ driver to open a symlink, you get a short file containing the target path of the symlink. Rather less than ideal, and it took a rather gross hack to cope with it.

All set now, other than a bug in some older Mac firmware where a read would return BUFFER_TOO_SMALL without returning the real buffer size, and the odd way that Apple treat CDs. Oh, and a bug in Apple's Broadcom wifi driver that could overwrite the kernel, and case sensitivity, which obviously isn't something you often hit on UEFI when all you usually see is FAT.

Kernel-side, we had a couple of things. Obviously there were the bugs I'd mentioned in the past, but those had already been dealt with. New issues included the fact that we were frequently picking the wrong framebuffer address, especially on machines with multiple GPUs. Fixed now, along with another issue where we'd sometimes get confused by models with different GPU configurations. And at that point, most of the implementation work was done.

Of course, there was also the work of integrating this into the tools that we use to generate our install media, and I'd like to thank Will Woods, Brian Lane and Dennis Gilmore for the work they put into that, along with anyone I've forgotten. Tom Calloway handled getting the device label graphics generated right before deadline. A cast of thousands helped with testing.

So why did I mention shortcomings? First, this only works on machines with 64-bit firmware. Early 64-bit Apples still had 32-bit firmware. In theory we can support that case - in practice it's a moderate amount of work, including some mildly awkward kernel code. But anything later than mid-2007 should have 64-bit firmware, so it's not a massive concern. Second, we seem to have fairly significant trouble with Radeon hardware on a lot of Macs. We fixed one bug there, but something's still going wrong in setup and you'll frequently end up with a black screen. And third, I'm sure there are some other machines that still don't work for (as yet) undetermined reasons.

With luck we'll get these things dealt with by Fedora 18. But even with these flaws, Fedora 17 will work fine on a lot of Apple hardware, and testing it is as simple as downloading and booting a live image. Bugs go in the usual place.

[1] Note my charming belief that the Ubuntu installer would fully support this hardware via EFI in the near future. 6 years ago. It still doesn't.
[2] This can be fairly easily seen from little things like the name "Windows" being hardcoded into every UI that sees a Boot Camp drive.
[3] I cover this in more depth here
[4] To be fair, it mostly still is - there's plenty of work to be done there.
[5] Approximately nobody had noticed this, which is all kinds of unreassuring.
[6] Thanks to Kalev Lember for dealing with that
[7] Still the EFI bootloader in Fedora 17. We've already swapped Fedora 18 over to grub 2.


Syndicated 2012-05-25 03:42:53 from Matthew Garrett

I've been a terrible person (and so have most of you)

John Scalzi recently wrote a piece on straight white male privilege. If you haven't read it already, go and do so. No rush. I'll wait.

So. Some facts:

  • Women are underrepresented in free software development
  • Those women who are involved in free software development are overwhelmingly more likely to have been subjected to sexual harassment or belittling commentary, or to have been just plain ignored because of their gender
  • When asked, women tend to believe that these two facts are fairly strongly related

(If you disagree with any of these then that's absolutely your right. You're wrong, but that's ok. But please do me a favour and stop reading here. Otherwise you'll just get angry and then you'll write something ill-tempered and still wrong in the comments and then I'll have to delete it and why not just save everybody the time and effort and go and eat ice cream or something instead)

I know I've said this before, but inappropriate and marginalising behaviour is rife in our community, and at all levels of our community. There's the time an open source evangelist just flat out told a woman that her experiences didn't match his so she must be an outlier. There's the time a leading kernel developer said that most rape statistics were basically made up. There's the time that I said the most useful thing Debian could do with its money would be to buy prostitutes for its developers, simultaneously sexualising the discussion, implying that Debian developers were all straight men and casting sex workers as property. These aren't the exceptions. It's endemic. Almost all of us have been part of the problem, and in doing so we've contributed to an environment that has at best driven away capable contributors. You probably don't want to know what it's done at worst.

But what people have done in the past isn't important. What's important is how we behave in the future. If you're not angry about social injustice like this then you're doing it wrong. If you're reading this then there's a pretty high probability that you're a white male. So, it's great that you're angry. You should be! As a straight white male born into a fairly well-off family, a native English speaker in an English speaking country, I have plenty of time to be angry before going back to my nice apartment and living my almost entirely discrimination-free life. So if it makes me angry, I have absolutely no way of comprehending how angry it must make the people who actually have to live with this shit on a daily basis.


(Were tampon mouse able to form and express coherent thoughts, tampon mouse would not put up with this shit)

The point isn't to be smugly self aware of our own shortcomings and the shortcomings of others. The point is to actually do something about it. If you're not already devoting some amount of your resources to improving fairness in the world, then why not? It doesn't have to be about women in technology - if you're already donating to charity or helping out at schools or engaging in local politics or any of the countless other ways an individual can help make the world a better place, large or small, then keep on doing that. But do consider that many of us have done things in the past that contributed to the alienation of an astounding number of potential community members, and if you can then please do do something to make up for it. It might be donating to groups like The Ada Initiative. It might be mentoring students for projects like the GNOME Outreach Program for Women, or working to create similar programs. Even just making our communities less toxic by pointing out unacceptable behaviour when you see it makes a huge difference.

But most importantly, be aware that it was people like me who were responsible for this problem in the first place and people like me who need to take responsibility for solving it. We can't demand the victims do that for us.


Syndicated 2012-05-22 01:57:11 from Matthew Garrett

Fixing Asus UX21e unexpected power off

The Asus UX21e (and maybe the UX31e?) has the irritating misfeature that it reloads the CPU thermal tables when you unplug the power. One consequence of this is that it'll automatically throttle itself much more aggressively on battery (reducing performance) but the more serious one is that the new critical power off temperature may then be lower than the temperature the CPU is currently operating at, resulting in the machine turning itself off. As far as I can tell from debugging, this is completely OS-independent - it still happens even if I stub out all the ACPI code for power supply events and there are reports of the same thing occurring on Windows. The good news is that it seems to be fixed in newer firmware versions. The even better news is that you can flash it without Windows. Just download the BIOS image from the Asus website, copy it onto a FAT formatted USB stick, insert that, go into the firmware (hit F2 on the splash screen) and start the flash program from there.


Syndicated 2012-05-19 23:26:29 from Matthew Garrett

Anatomy of a Fedora 17 ISO image

My spare time has been limited lately, so progress on producing Mac-bootable Fedora install images has been a bit slow. Thankfully it looks like everything's going to land in time for Fedora 17[1], so hurrah for that - all that's missing right now are the last couple of patches that make sure the boot picker displays useful labels on the drives.

So how is this accomplished? Here's a hex dump of the first few K of the image.

00000000  45 52 08 00 00 00 90 90  00 00 00 00 00 00 00 00  |ER..............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  33 ed fa 8e d5 bc 00 7c  fb fc 66 31 db 66 31 c9  |3......|..f1.f1.|
00000030  66 53 66 51 06 57 8e dd  8e c5 52 be 00 7c bf 00  |fSfQ.W....R..|..|
00000040  06 b9 00 01 f3 a5 ea 4b  06 00 00 52 b4 41 bb aa  |.......K...R.A..|
00000050  55 31 c9 30 f6 f9 cd 13  72 16 81 fb 55 aa 75 10  |U1.0....r...U.u.|
00000060  83 e1 01 74 0b 66 c7 06  f1 06 b4 42 eb 15 eb 00  |...t.f.....B....|
00000070  5a 51 b4 08 cd 13 83 e1  3f 5b 51 0f b6 c6 40 50  |ZQ......?[Q...@P|
00000080  f7 e1 53 52 50 bb 00 7c  b9 04 00 66 a1 b0 07 e8  |..SRP..|...f....|
00000090  44 00 0f 82 80 00 66 40  80 c7 02 e2 f2 66 81 3e  |D.....f@.....f.>|
000000a0  40 7c fb c0 78 70 75 09  fa bc ec 7b ea 44 7c 00  |@|..xpu....{.D|.|
000000b0  00 e8 83 00 69 73 6f 6c  69 6e 75 78 2e 62 69 6e  |....isolinux.bin|
000000c0  20 6d 69 73 73 69 6e 67  20 6f 72 20 63 6f 72 72  | missing or corr|
000000d0  75 70 74 2e 0d 0a 66 60  66 31 d2 66 03 06 f8 7b  |upt...f`f1.f...{|
000000e0  66 13 16 fc 7b 66 52 66  50 06 53 6a 01 6a 10 89  |f...{fRfP.Sj.j..|
000000f0  e6 66 f7 36 e8 7b c0 e4  06 88 e1 88 c5 92 f6 36  |.f.6.{.........6|
00000100  ee 7b 88 c6 08 e1 41 b8  01 02 8a 16 f2 7b cd 13  |.{....A......{..|
00000110  8d 64 10 66 61 c3 e8 1e  00 4f 70 65 72 61 74 69  |.d.fa....Operati|
00000120  6e 67 20 73 79 73 74 65  6d 20 6c 6f 61 64 20 65  |ng system load e|
00000130  72 72 6f 72 2e 0d 0a 5e  ac b4 0e 8a 3e 62 04 b3  |rror...^....>b..|
00000140  07 cd 10 3c 0a 75 f1 cd  18 f4 eb fd 00 00 00 00  |...<.u..........|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
This is the boot sector. One of the fun things about ISO9660 is that it doesn't have a boot sector in the usual x86 sense - firmwares are expected to read enough of the filesystem that they can find the bootloader in it and run it directly. The PC BIOS isn't really smart enough to manage that, so it uses something called El Torito[2]. In any case, this is usually empty space on an x86 CD. But with an increasing number of systems shipping without optical drives, real CD installs are decreasing in popularity. We use a piece of software called isohybrid that allows the building of ISO9660 images that can be written directly onto a USB stick. Asking a machine to boot off them will execute the boot sector located here, which then loads a real bootloader that's capable of reading ISO9660 and booting the OS. It'll just be ignored when it's on a real CD.

There's one difference here, though. The isohybrid boot sector has been modified so that the first 32 bytes are just no-ops, and this has been inserted:
00000000  45 52 08 00 00 00 90 90  00 00 00 00 00 00 00 00  |ER..............|
45 52 ("ER") indicates the presence of an Apple partition map. This will become important later. 0800 indicates that it's using 2048-byte sectors. The 9090 is a lie to convince the firmware that it's not a zero-sized disk without representing dangerous code. We need an Apple partition map because some older Macs won't boot off CD unless the CD has one. But it's sitting right at the start of the boot sector, and this code is going to be executed. Thankfully, the entire Apple header decodes to
00000000  45                inc bp
00000001  52                push dx
00000002  0800              or [bx+si],al
00000004  0000              add [bx+si],al
00000006  90                nop
00000007  90                nop
00000008  0000              add [bx+si],al
0000000A  0000              add [bx+si],al
0000000C  0000              add [bx+si],al
0000000E  0000              add [bx+si],al
00000010  0000              add [bx+si],al
00000012  0000              add [bx+si],al
00000014  0000              add [bx+si],al
00000016  0000              add [bx+si],al
00000018  0000              add [bx+si],al
0000001A  0000              add [bx+si],al
0000001C  0000              add [bx+si],al
0000001E  0000              add [bx+si],al
which is completely harmless - we can execute all these instructions without any interesting changes in state. They'll all be undone by the rest of the boot sector.
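For reference, that ER signature is the start of what Apple call the Driver Descriptor Record, or Block0. Here's a sketch of its layout - field names follow the traditional Apple definitions and everything is big-endian on disk; only the first three fields matter for the trick described here.

#include <stdint.h>

struct apple_block0 {
    uint16_t sbSig;        /* 0x4552, "ER" */
    uint16_t sbBlkSize;    /* sector size: 0x0800 = 2048 bytes */
    uint32_t sbBlkCount;   /* sector count: the 0x9090 lie above */
    uint16_t sbDevType;
    uint16_t sbDevId;
    uint32_t sbData;
    uint16_t sbDrvrCount;  /* number of driver entries that follow */
    /* ...driver entries and padding out to the rest of the sector... */
} __attribute__((packed));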
000001b0  14 05 00 00 00 00 00 00  4c 1e ed 76 00 00 80 00  |........L..v....|
000001c0  01 00 00 3f a0 89 00 00  00 00 00 50 14 00 00 fe  |...?.......P....|
000001d0  ff ff ef fe ff ff a4 00  00 00 70 04 00 00 00 fe  |..........p.....|
000001e0  ff ff 00 fe ff ff 44 05  00 00 c0 08 00 00 00 00  |......D.........|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
This is the MBR partition table. The aim here was to produce media that would boot via BIOS just as well as it does via EFI, and it turns out that there are some systems that refuse to BIOS-boot off GPT media. So we need an MBR partition map. The first entry covers the entire disk and is of type 0. This is important, because some EFI implementations are strict about MBR parsing - if there's an MBR with overlapping partitions, the entire MBR will be ignored. Inconvenient. Fortunately, the relevant EFI code is available as source, and it turns out that partitions of type 0 are ignored when performing this check. So, there's a partition of type 0.

Why does it cover the entire disk? Once we've booted the OS, we need to be able to read the ISO image contained on the USB stick. That means the kernel needs to be able to mount it, which means we need a partition entry that covers the entire disk. Linux is perfectly happy mounting a filesystem from a partition of type 0.
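For reference, each of the four MBR slots starting at offset 0x1be is a 16-byte record along these lines (a sketch; the multi-byte values are little-endian):

#include <stdint.h>

struct mbr_partition_entry {
    uint8_t  status;        /* 0x80 = bootable */
    uint8_t  chs_first[3];  /* legacy CHS address of the first sector */
    uint8_t  type;          /* 0x00 for the whole-disk entry, which the strict parsers ignore */
    uint8_t  chs_last[3];   /* legacy CHS address of the last sector */
    uint32_t lba_first;     /* start sector, little-endian */
    uint32_t lba_count;     /* length in sectors, little-endian */
} __attribute__((packed));

In the dump above you can see the first slot with status 0x80, type 0x00 and a length covering the whole image, which is exactly the arrangement described here.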

So, the next partition. This is an EFI boot partition that points at the embedded VFAT El Torito image. Some firmware will decide that we've got an MBR and so will only boot off a partition that exists in it. So, here's an MBR partition to keep them happy.

The final partition covers the embedded HFS+ image. It's there purely for convenience - it means we can access it under Linux in order to update its contents for testing purposes.
00000200  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
00000210  09 27 93 7e 00 00 00 00  01 00 00 00 00 00 00 00  |.'.~............|
00000220  fe 4f 14 00 00 00 00 00  30 00 00 00 00 00 00 00  |.O......0.......|
00000230  de 4f 14 00 00 00 00 00  2a 26 0f 37 23 c9 7f 4f  |.O......*&.7#..O|
00000240  90 df 5e fe 0a 60 1c b0  10 00 00 00 00 00 00 00  |..^..`..........|
00000250  80 00 00 00 80 00 00 00  ec 70 63 c1 00 00 00 00  |.........pc.....|
00000260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
And here we have the GPT header. Nothing complicated here, other than it indicates that the first partition descriptor is at LBA 0x10, i.e. byte 0x2000 - that's further into the disk than normal. That leaves us enough space to fit the Apple partition entries.

Why have a GPT at all? The EFI spec says that machines should boot fine from an MBR partition. Sadly, not all seem to. It also lets us represent the HFS+ partition in a way that makes the Mac firmware happy.
00000800  50 4d 00 00 00 00 00 03  00 00 00 01 00 00 00 10  |PM..............|
00000810  41 70 70 6c 65 00 00 00  00 00 00 00 00 00 00 00  |Apple...........|
00000820  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000830  41 70 70 6c 65 5f 70 61  72 74 69 74 69 6f 6e 5f  |Apple_partition_|
00000840  6d 61 70 00 00 00 00 00  00 00 00 00 00 00 00 00  |map.............|
00000850  00 00 00 00 00 00 00 0a  00 00 00 03 00 00 00 00  |................|
00000860  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
2K into the disk, here's an Apple partition map. It's 2K into the disk because we set the sector size to 2K in the partition map header. Why did we do that? Because the default is 512 bytes, and if we'd put the partition map at 512 bytes it would have been on top of the GPT header. Why 2K rather than 1K? A couple of reasons. First, the Apple partition map is only used when we boot off an actual CD, and the CD sector size is genuinely 2K. Secondly, because CDs have a sector size of 2K, 2K is a value that's actually been tested in the real world. It's nice to avoid tempting fate.
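Each of those 2K sectors holds one map entry. Here's a sketch of the classic layout, with field names as in the traditional Apple definitions (big-endian on disk); in the dump above you can see pmMapBlkCnt = 3, pmPartName = "Apple" and pmParType = "Apple_partition_map".

#include <stdint.h>

struct apple_partition_entry {
    uint16_t pmSig;           /* 0x504d, "PM" */
    uint16_t pmSigPad;        /* reserved */
    uint32_t pmMapBlkCnt;     /* total number of entries in the map */
    uint32_t pmPyPartStart;   /* first sector of the partition */
    uint32_t pmPartBlkCnt;    /* length in sectors */
    char     pmPartName[32];  /* e.g. "Apple", "EFI" */
    char     pmParType[32];   /* e.g. "Apple_partition_map", "Apple_HFS" */
    uint32_t pmLgDataStart;
    uint32_t pmDataCnt;
    uint32_t pmPartStatus;
    /* ...boot code fields and padding out to the rest of the sector... */
} __attribute__((packed));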
00001000  50 4d 00 00 00 00 00 03  00 00 00 29 00 00 04 70  |PM.........)...p|
00001010  45 46 49 00 00 00 00 00  00 00 00 00 00 00 00 00  |EFI.............|
00001020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001030  41 70 70 6c 65 5f 48 46  53 00 00 00 00 00 00 00  |Apple_HFS.......|
00001040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001050  00 00 00 00 00 00 04 70  00 00 00 33 00 00 00 00  |.......p...3....|
00001060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001800  50 4d 00 00 00 00 00 03  00 00 01 51 00 00 08 c0  |PM.........Q....|
00001810  45 46 49 00 00 00 00 00  00 00 00 00 00 00 00 00  |EFI.............|
00001820  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001830  41 70 70 6c 65 5f 48 46  53 00 00 00 00 00 00 00  |Apple_HFS.......|
00001840  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001850  00 00 00 00 00 00 08 c0  00 00 00 33 00 00 00 00  |...........3....|
00001860  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
The remaining Apple partition map entries. One covers the embedded FAT (EFI El Torito) image, the other the embedded HFS+ filesystem. Why do this? Because not all Macs understand EFI El Torito, so won't EFI boot a CD unless there's an Apple partition map. Why have an HFS+ partition at all? Because the same Macs won't boot off FAT.
00002000  a2 a0 d0 eb e5 b9 33 44  87 c0 68 b6 b7 26 99 c7  |......3D..h..&..|
00002010  d0 84 a5 d4 aa 89 e0 40  81 f9 fc e4 04 e9 c2 99  |.......@........|
00002020  00 00 00 00 00 00 00 00  10 4b 14 00 00 00 00 00  |.........K......|
00002030  00 00 00 00 00 00 00 00  49 53 4f 48 79 62 72 69  |........ISOHybri|
00002040  64 20 49 53 4f 00 49 53  4f 48 79 62 72 69 64 00  |d ISO.ISOHybrid.|
00002050  41 70 70 6c 00 00 00 00  00 00 00 00 00 00 00 00  |Appl............|
00002060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002080  a2 a0 d0 eb e5 b9 33 44  87 c0 68 b6 b7 26 99 c7  |......3D..h..&..|
00002090  68 d5 e0 27 72 83 c6 4c  a7 ae 92 dd ed 6c 89 71  |h..'r..L.....l.q|
000020a0  a4 00 00 00 00 00 00 00  13 05 00 00 00 00 00 00  |................|
000020b0  00 00 00 00 00 00 00 00  49 53 4f 48 79 62 72 69  |........ISOHybri|
000020c0  64 00 41 70 70 6c 65 00  41 70 70 6c 00 00 00 00  |d.Apple.Appl....|
000020d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002100  00 53 46 48 00 00 aa 11  aa 11 00 30 65 43 ec ac  |.SFH.......0eC..|
00002110  68 d5 e0 27 72 83 c6 4c  a7 ae 92 dd ed 6c 89 71  |h..'r..L.....l.q|
00002120  44 05 00 00 00 00 00 00  03 0e 00 00 00 00 00 00  |D...............|
00002130  00 00 00 00 00 00 00 00  49 53 4f 48 79 62 72 69  |........ISOHybri|
00002140  64 00 41 70 70 6c 65 00  41 70 70 6c 00 00 00 00  |d.Apple.Appl....|
00002150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
Three GPT entries - one covers the entire CD, one covers the embedded FAT (EFI boot) filesystem, and one covers the embedded HFS+ filesystem. That ensures that they're all accessible to the firmware.
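Each of those is a standard 128-byte GPT partition entry. A sketch of the layout (per the UEFI spec; values little-endian) - the type GUID is what distinguishes the basic data entries from the Apple HFS+ one (48465300-0000-11AA-AA11-00306543ECAC) in the dump above:

#include <stdint.h>

struct gpt_partition_entry {
    uint8_t  type_guid[16];    /* partition type GUID */
    uint8_t  unique_guid[16];  /* unique per-partition GUID */
    uint64_t first_lba;        /* in 512-byte sectors */
    uint64_t last_lba;         /* inclusive */
    uint64_t attributes;
    uint16_t name[36];         /* UTF-16LE, e.g. "ISOHybrid ISO" */
} __attribute__((packed));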
00008000  01 43 44 30 30 31 01 00  4c 49 4e 55 58 20 20 20  |.CD001..LINUX   |
00008010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00008020  20 20 20 20 20 20 20 20  46 65 64 6f 72 61 2d 4c  |        Fedora-L|
00008030  69 76 65 43 44 20 20 20  20 20 20 20 20 20 20 20  |iveCD           |
00008040  20 20 20 20 20 20 20 20  00 00 00 00 00 00 00 00  |        ........|
00008050  c4 12 05 00 00 05 12 c4  00 00 00 00 00 00 00 00  |................|
00008060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00008070  00 00 00 00 00 00 00 00  01 00 00 01 01 00 00 01  |................|
00008080  00 08 08 00 40 00 00 00  00 00 00 40 15 00 00 00  |....@......@....|
00008090  00 00 00 00 00 00 00 17  00 00 00 00 22 00 1d 00  |............"...|
000080a0  00 00 00 00 00 1d 00 08  00 00 00 00 08 00 70 05  |..............p.|
000080b0  01 09 30 36 f0 02 00 00  01 00 00 01 01 00 20 20  |..06..........  |
000080c0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
The ISO9660 superblock. This is completely standard. From here on there's nothing terribly surprising - the CD filesystem itself is entirely normal. The only slight oddity is that we have three embedded El Torito images. The first of these is a BIOS-bootable disk image. That'll be executed whenever the CD is put in a non-EFI machine. The second is a VFAT EFI boot image. That's used for the majority of EFI systems. The third is an HFS+ EFI boot image. There's no strict requirement for the HFS+ one to be an El Torito image - CD booting on Macs is already handled by the Apple partition map. Storing it as an El Torito is purely for convenience, since it guarantees correct alignment and avoids us having to parse the entire ISO9660 image to find its location when generating the partition maps.

In summary: Three partition maps, three bootable images, support for BIOS, UEFI and Mac platforms, works whether it's burned to a CD or written directly to a USB stick. I'm pretty pleased with that.

[1] Other than an awkward bug where something goes horribly wrong during Radeon graphics init, so Radeon-based Macs aren't working too well at the moment. We're working on it.

[2] The world decided that writing a real ISO9660 driver for BIOS was hard, so came up with El Torito. El Torito is a spec for embedding disk images inside ISO9660. There's a pointer to the disk image in the CD header, so the BIOS simply reads a few blocks off CD, finds that pointer and then sets up a bunch of disk I/O interrupts to point at the embedded filesystem rather than a real disk. From that point on, the El Torito image simply behaves like a floppy or hard drive. This was a reasonable compromise given how resource limited older BIOS implementations were.

For reasons that aren't entirely clear, UEFI doesn't mandate that implementations support ISO9660. So even though our firmware is now sufficiently capable that the only thing standing between it and emacs is someone being sufficiently bored, we still use El Torito. Fortunately the spec allows multiple El Torito images to be embedded, so we can include one that's bootable by BIOS systems and another that's bootable by UEFI systems.
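For the curious, the boot catalog that makes this work is just a sequence of 32-byte records. Here's a sketch of the two main record types, with the layout as I understand it from the El Torito spec and the field names mine; additional images like the EFI ones hang off section headers carrying their own platform IDs.

#include <stdint.h>

struct eltorito_validation_entry {
    uint8_t  header_id;       /* 0x01 */
    uint8_t  platform_id;     /* 0x00 = 80x86, 0xef = EFI */
    uint16_t reserved;
    char     id_string[24];
    uint16_t checksum;        /* chosen so the entry sums to zero */
    uint8_t  key55;           /* 0x55 */
    uint8_t  keyaa;           /* 0xaa */
} __attribute__((packed));

struct eltorito_boot_entry {
    uint8_t  boot_indicator;  /* 0x88 = bootable */
    uint8_t  media_type;      /* 0 = no emulation, 1-3 = floppy, 4 = hard disk */
    uint16_t load_segment;
    uint8_t  system_type;
    uint8_t  unused;
    uint16_t sector_count;    /* virtual sectors to load */
    uint32_t load_rba;        /* where the embedded image starts, in 2K sectors */
    uint8_t  unused2[20];
} __attribute__((packed));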


Syndicated 2012-05-02 13:52:20 from Matthew Garrett

More ways for firmware to screw you

Some of my recent time has been devoted to making our boot media more Mac friendly, which has entailed rather a lot of rebooting. This would have been fine, if tedious, except that some number of boots would fall over with either a clearly impossible kernel panic or userspace segfaulting in places that made no sense. Something was clearly wrong. Crashes that shouldn't happen are generally an indication of memory corruption. The question is how that corruption is being triggered. Hunting that down wasn't terribly easy.

My first thought was that we were possibly managing to load the kernel over a region used by UEFI code. UEFI defines two types of code - boot services and runtime services. While runtime services code and data must be preserved by the OS, in theory boot services code and data is available to the OS once boot services have been exited. In practice, that's not true. It seemed entirely possible that the kernel might be ending up on top of some of that boot services code or data and getting trodden on. Grub now has code to avoid putting the kernel on top of boot services regions, so testing the latest code seemed like a good plan. But no, crashes still happened.

That pretty much ruled out the bootloader. My next thought was that executing some of the firmware code was triggering a write to some other memory that contained the kernel. Josh Boyer suggested the next trick, which was to try marking the kernel read-only to see whether anything was hitting it. x86 lets you mark pages as read-only - any attempt to write to them should take a fault. UEFI functions are executed in the context of the kernel, so share the same page tables. That let me rule this out, since everything still went just as wrong and I wasn't taking an extra fault first.

However, at this point I was reasonably happy that it wasn't the kernel itself being overwritten - faults were occurring in userspace code as well. That was a pretty strong indication that what was happening was continuing to happen once userspace had started, so it wasn't a direct response to a firmware call. I made sure of that by stubbing out all the calls that could be triggered after kernel initialisation, and saw the same failures. Once all attempts to be clever had failed, it was time to just start using brute force. The kernel lets you reserve areas of RAM by passing arguments like memmap=0xlength$0xstart to block length bytes starting at start from being used. It took a while, but I finally found a 256MB range that made a difference - reserving it resulted in the machine booting reliably, while letting the OS use it resulted in occasional crashes.

Definite progress. Comparing that memory range to the EFI memory map was helpful. There were several blocks of UEFI boot services data present there, which really seemed like too much of a coincidence. By reserving each of them in turn, I'd traced it down to a single 31MB region of boot service data - that is, memory reserved by the firmware for use by the UEFI boot services. Per spec, this is available to the OS once the boot environment has been exited. Nothing other than the OS should be touching this after boot, but something clearly was. Tracking down what was responsible was far easier than I expected, although the first attempt was a failure. Setting it read-only should have triggered a fault, but didn't. That was rather confusing. But, rather than give up, I patched the kernel to fill the region with 0xff at kernel init. Then I booted the system, read it back and looked for values that weren't 0xff. I got this:

00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
001568a0  ff ff ff ff ff ff ff ff  ff ff ff ff 84 00 00 00  |................|
001568b0  00 20 a7 ac 46 00 00 00  00 00 00 00 00 00 06 01  |. ..F...........|
001568c0  c2 0b 0c 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
001568d0  ff ff 0a 04 f0 03 82 0d  40 00 00 00 ff ff ff ff  |........@.......|
001568e0  ff ff 00 21 00 36 9a 80  ff ff ff ff ff ff 00 7e  |...!.6.........~|
001568f0  00 09 43 48 41 2d 47 75  65 73 74 01 04 02 04 0b  |..CHA-Guest.....|
00156900  16 32 08 0c 12 18 24 30  48 60 6c 2d 1a 0e 18 1a  |.2....$0H`l-....|
00156910  ff ff 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00156920  00 00 00 00 00 00 00 dd  09 00 10 18 02 00 10 01  |................|
00156930  00 00 dd 1e 00 90 4c 33  0e 18 1a ff ff 00 00 00  |......L3........|
00156940  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00156950  00 00 bd ea f8 b3 ff ff  ff ff ff ff ff ff ff ff  |................|
00156960  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
022fb000
That's a lot of 0xffs (around 31MB of them) with one small section that contains an 802.11 probe packet with the SSID of the hospital across the road from my house. Apple support network booting off wireless networks. It seems that the firmware brought up the wireless card, associated with this network (it's the only public one nearby) and then left the card DMAing packets into RAM. The read-only page attribute only applies to CPU-initiated accesses, so it could do this without triggering a page fault. It also explained why it was so random - whether memory corruption occurred would depend on whether a packet appeared between that memory being used by the OS and the kernel reinitialising the wireless card. It certainly explains why I couldn't reproduce it when I left the machine repeatedly rebooting on the bus home.
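As an aside, the read-back half of that fill-and-scan trick can be approximated from userspace once you know the suspect physical range. This is a rough sketch, assuming /dev/mem access to that range is permitted (a stock kernel with CONFIG_STRICT_DEVMEM won't allow it, which is part of why the original debugging was done with a kernel patch instead).

/*
 * Map a physical range via /dev/mem and report any bytes that no
 * longer hold the 0xff fill pattern. The start address needs to be
 * page-aligned; run as root.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <phys-start> <length>\n", argv[0]);
        return 1;
    }

    off_t start = strtoull(argv[1], NULL, 0);
    size_t len = strtoull(argv[2], NULL, 0);

    int fd = open("/dev/mem", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }

    uint8_t *mem = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, start);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Print the offset and value of every byte that has been dirtied. */
    for (size_t i = 0; i < len; i++)
        if (mem[i] != 0xff)
            printf("0x%zx: %02x\n", i, mem[i]);

    munmap(mem, len);
    close(fd);
    return 0;
}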

How do we fix this? Unsure. With luck disconnecting the UEFI driver in the bootloader should quiesce the hardware, but without testing I'm not sure of that yet. For now it's just another example of firmware managing to break expectations in deeply strange ways.


Syndicated 2012-03-17 23:07:32 from Matthew Garrett

Some things you may have heard about Secure Boot which aren't entirely true

Talking about Secure Boot again, I'm afraid. One of the things that's made discussion of this difficult is that, while the specification isn't overly complicated, some of the outcomes aren't obvious at all until you spend a long time thinking about it. So here's some clarification on a few points.

Secure Boot provides no additional security

Untrue. Attacks against the pre-boot environment are real and increasingly common - this is a recent proof of concept, and this has been seen in the wild. Once something has got into the MBR, all bets are off. It can modify your bootloader or kernel, inserting its own code to return valid results whenever any kind of malware checker scans for it. The only way to reliably identify it is to either move the disk to another system or to cold boot off verified media. By restricting pre-OS code to something that's been signed, Secure Boot does provide enhanced security.

Secure Boot is just another name for Trusted Boot

Untrue. Trusted Boot requires the ability to measure the entire boot process, which gives the OS the ability to figure out everything that's been run before OS startup. The root of trust is in the hardware and a TPM is required. Secure Boot is simply a way to limit the applications that are run before the OS. Once booted, there is no way for the OS to identify what was previously booted, or even if the system was booted securely.

Microsoft are just requiring that vendors implement the specification

Untrue. Quoting from the Windows 8 Hardware Certification Requirements:

MANDATORY: No in-line mechanism is provided whereby a user can bypass Secure Boot failures and boot anyway. Signature verification override during boot when Secure Boot is enabled is not allowed. A physically present user override is not permitted for UEFI images that fail signature verification during boot. If a user wants to boot an image that does not pass signature verification, they must explicitly disable Secure Boot on the target system.

Section 27.7.3.3 of version 2.3.1A of the UEFI spec explicitly permits implementations to provide a physically present user override. Whether this is a good thing or not is obviously open to argument, but the fact remains that Microsoft forbid behaviour that the specification permits.

Secure Boot can be used to implement DRM

Untrue. The argument here is that Secure Boot can be used to restrict the software that a machine can run, and so can limit a system to running code that implements effective copy protection mechanisms. This isn't the case. For that to be true, there would need to be a mechanism for the OS to identify which code had been run in the pre-OS environment. There isn't. The only communication channel between the firmware and the OS is via a single UEFI variable called "SecureBoot", which may be either "1" or "0". Additionally, the firmware may provide a table to the OS containing a list of UEFI executables that failed to authenticate. It is not required to provide any information about the executables that authenticated correctly.
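To make that concrete, here's roughly all the OS-side code it takes to ask the question - a sketch using EDK2-style names; on a running Linux system the same variable shows up via efivars instead.

/*
 * Query the only signal the firmware is required to provide: the
 * 8-bit "SecureBoot" variable under the EFI global variable GUID.
 * Whatever this returns, the OS is trusting the firmware's answer.
 */
#include <Uefi.h>
#include <Library/UefiRuntimeServicesTableLib.h>
#include <Guid/GlobalVariable.h>

BOOLEAN BootedWithSecureBoot(VOID)
{
  UINT8      SecureBoot = 0;
  UINTN      Size = sizeof(SecureBoot);
  EFI_STATUS Status;

  Status = gRT->GetVariable(L"SecureBoot", &gEfiGlobalVariableGuid,
                            NULL, &Size, &SecureBoot);

  /* Absent variable: the platform doesn't implement Secure Boot.
   * Value 0: Secure Boot is disabled. Value 1: enabled. */
  return !EFI_ERROR(Status) && SecureBoot == 1;
}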

In both these cases, the OS is required to trust the firmware. If the firmware has been compromised in any way (such as the user turning off Secure Boot), the data provided by the firmware can be trivially modified and the OS told that everything is fine. As long as machines exist where users are permitted to disable Secure Boot, Secure Boot is not any kind of DRM scheme.

Secure Boot provides physical security

Untrue. Secure Boot does not in any way claim to improve security against attackers who have physical access, for the same reasons as the DRM case. The OS has no way to determine that the firmware's behaviour has been modified. A physically-present attacker can simply disable Secure Boot and install a piece of malware that lies to the OS about platform security. The "Evil Maid" attack still exists.

Secure Boot only defines the interaction between the firmware and the bootloader. It doesn't specify any higher policy

Misleading. It's true that Secure Boot only specifies the authentication of pre-OS code. However, if you implement Secure Boot it's because you want to ensure that only authenticated code has run before your OS. If there is any way for unauthenticated code to touch the hardware before your OS starts, you can't ensure that. If an authenticated Linux kernel is booted and then loads an unsigned driver, that unsigned driver can fabricate a fake UEFI environment and then launch Windows from it. Windows would falsely believe that it has booted securely. If that authenticated Linux kernel were widely distributed, attackers could use it as an attack vector for Windows. The logical response from Microsoft would be to blacklist that kernel.

The inevitable outcome of implementing Secure Boot is that every component that can touch hardware must be signed. Anyone who implements Secure Boot without requiring that is adding no additional security whatsoever.

Only machines that want to boot Windows need to carry Microsoft's keys

Again, misleading. Microsoft only require one signing key to be installed, and the Windows bootloader will be signed with a key that chains back to this one. However, the bootloader is not the only component that must be signed. Any drivers that are carried on ROMs on plug-in cards must also be signed. One approach here would have been for all hardware vendors to have their own keys. This would have been unscalable - any shipped machine would have to carry keys for every vendor who produces PCI cards. If a machine carried an Nvidia key but not an AMD one, swapping a GeForce for a Radeon would have resulted in the firmware graphics driver failing to load. Instead, Microsoft are providing a signing service. Vendors will be able to sign up for WHQL membership and have their UEFI drivers signed by Microsoft.

This leads to the problem. The Authenticode format used for signing UEFI objects only allows for a single signature. If a driver is signed by Microsoft, it can't be signed by anybody else. Therefore, if a system vendor wants to support off-the-shelf PCI devices with Microsoft-signed drivers, the system must carry Microsoft's key. If the same key is used as the root of trust for the driver signing and for the bootloader signing, that also means that the system will boot Windows.

This is only a problem for clients, not servers

Not strictly true. While Microsoft's current requirements don't mandate the presence of Secure Boot on server hardware, if present it must be enabled and locked down as it is for clients. It's not valid for servers to ship with disabled Secure Boot support, or for it to be shipped in a mode allowing users to make automated policy modifications at OS install time.

Secure Boot is required by NIST

This is one that I've heard from multiple people working on Secure Boot. It's amazingly untrue. The document they're referring to is NIST SP800-147, which is a document that describes guidelines for firmware security - that is, what has to be done to ensure that the firmware itself is secure. This includes making sure that the OS can't overwrite the firmware and that firmware updates must be signed. It says absolutely nothing about the firmware only running signed software. Secure Boot depends upon the firmware being trusted, so these guidelines are effectively a required part of Secure Boot. But Secure Boot is not within the scope of SP800-147 at all.

It's easy for Linux to implement Secure Boot

Misleading. From a technical perspective, sure. From a practical perspective, not at all. I wrote about the details here.

It's only a problem for hobbyist Linux, not the real Linux market

Untrue. It's unclear whether even the significant Linux vendors can implement Secure Boot in a way that meets the needs of their customers and still allows them to boot on commodity hardware. A naive implementation removes many of the benefits of Linux for enterprise customers, such as the ability to use local modifications to micro-optimise systems for specific workloads. One of the key selling points of Linux is the ability to make use of local expertise when adapting the product for your needs. Secure Boot makes that more difficult.

Conclusion

Much reporting on the issues surrounding Secure Boot so far has been inaccurate, leading to misunderstandings about the (genuine) benefits and the (genuine) drawbacks. Any useful discussion must be based around an accurate understanding of the specification rather than statements from analysts with no real understanding of the Linux market or security.


Syndicated 2012-02-12 19:55:01 from Matthew Garrett

Is GPL usage really declining?

Matthew Aslett wrote about how the proportion of projects released under GPL-like licenses appears to be declining, at least as far as various sets of figures go. But what does that actually mean? In absolute terms, GPL use has increased - any change isn't down to GPL projects transitioning over to liberal licenses. But an increasing number of new projects are being released under liberal licenses. Why is that?

The figures from Black Duck aren't a great help here, because they tell us very little about the software they're looking at. FLOSSmole is rather more interesting. I pulled the license figures from a few sites and found the following proportion of GPLed projects:

RubyForge: ~30%
Google Code: ~50%
Launchpad: ~70%

I've left the numbers rough because there's various uncertainties - should proprietary licenses be included in the numbers, is CC Sharealike enough like the GPL to count it there, that kind of thing. But what's clear is that these three sites have massively different levels of GPL use, and it's not hard to imagine why. They all attract different types of developer. The RubyForge figures are obviously going to be heavily influenced by Ruby developers, and that (handwavily) implies more of a bias towards web developers than the general developer population. Launchpad, on the other hand, is going to have a closer association with people with an Ubuntu background - it's probably more representative of Linux developers. Google Code? The 50% figure is the closest to the 56.8% figure that Black Duck give, so it's probably representative of the more general development community.

The impression gained from this is that the probability of you using one of the GPL licenses is influenced by the community that you're part of. And it's not a huge leap to believe that an increasing number of developers are targeting the web, and the web development community has never been especially attached to the GPL. It's not hard to see why - the benefits of the GPL vanish pretty much entirely when you're never actually obliged to distribute the code, and while Affero attempts to compensate for that it also constrains your UI and deployment model. No matter how strong a believer in copyleft you are, the web makes it difficult for users to take any advantage of the freedoms you'd want to offer. It's as easy not to bother. So it's pretty unsurprising that an increase in web development would be associated with a decrease in the proportion of projects licensed under the GPL.

This obviously isn't a rigorous analysis. I have very little hard evidence to back up my assumptions. But nor does anyone who claims that the change is because the FSF alienated the community during GPLv3 development. I'd be fascinated to see someone spend some time comparing project type with license use and trying to come up with a more convincing argument.

(Raw data from FLOSSmole: Howison, J., Conklin, M., & Crowston, K. (2006). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering, 1(3), 17–26.)


Syndicated 2012-02-09 22:33:55 from Matthew Garrett

The ongoing fight against GPL enforcement

GPL enforcement is a surprisingly difficult task. It's not just a matter of identifying an infringement - you need to make sure you have a copyright holder on your side, spend some money sending letters asking people to come into compliance, spend more money initiating a suit, spend even more money encouraging people to settle, spend yet more money actually taking them to court and then maybe, at the end, you have some source code. One of the (tiny) number of groups involved in doing this is the Software Freedom Conservancy, a non-profit organisation that offers various services to free software projects. One of their notable activities is enforcing the license of Busybox, a GPLed multi-purpose application that's used in many embedded Linux environments. And this is where things get interesting.

GPLv2 (the license covering the relevant code) contains the following as part of section 4:

Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License.

There's some argument over what this means, precisely, but GPLv3 adds the following paragraph:

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation

which tends to support the assertion that, under V2, once the license is terminated you've lost it forever. That gives the SFC a lever. If a vendor is shipping products using Busybox, and is found to be in violation, this interpretation of GPLv2 means that they have no license to ship Busybox again until the copyright holders (or their agents) grant them another. This is a bit of a problem if your entire stock consists of devices running Busybox. The SFC will grant a new license, but on one condition - not only must you provide the source code to Busybox, you must provide the source code to all other works on the device that require source distribution.

The outcome of this is that we've gained access to large bodies of source code that would otherwise have been kept by companies. The SFC have successfully used Busybox to force the source release of many vendor kernels, ensuring that users have the freedoms that the copyright holders granted to them. Everybody wins, with the exception of the violators. And it seems that they're unenthusiastic about that.

A couple of weeks ago, this page appeared on the elinux.org wiki. It's written by an engineer at Sony, and it's calling for contributions to rewriting Busybox. This would be entirely reasonable if it were for technical reasons, but it's not - it's explicitly stated that companies are afraid that Busybox copyright holders may force them to comply with the licenses of software they ship. If you ship this Busybox replacement instead of the original Busybox you'll be safe from the SFC. You'll be able to violate licenses with impunity.

What can we do? The real problem here is that the SFC's reliance on Busybox means that they're only able to target infringers who use that Busybox code. No significant kernel copyright holders have so far offered to allow the SFC to enforce their copyrights, with the result that enforcement action will grind to a halt as vendors move over to this Busybox replacement. So, if you hold copyright over any part of the Linux kernel, I'd urge you to get in touch with them. The alternative is a strangely ironic world where Sony are simultaneously funding lobbying for copyright enforcement against individuals and tools to help large corporations infringe at will. I'm not enthusiastic about that.


Syndicated 2012-01-30 23:40:50 from Matthew Garrett

UEFI and bugs

I gave a presentation on UEFI at LCA 2012 - you can watch it here, with the bonus repeat (and different jokes) here. It's a gentle introduction to UEFI, followed by some discussion of the problems we've faced in dealing with implementation bugs.

The fundamental problem is that UEFI is a lot of code. And I really do mean a lot of code. Ignoring drivers, the x86 Linux kernel is around 30MB of code. A comparable subset of the UEFI tree is around 35MB. UEFI is of a comparable degree of complexity to the Linux kernel. There's no reason to assume that the people who've actually written this code are significantly more or less competent than an average Linux developer, so all else being equal we'd probably expect somewhere around the same number of bugs per line. Of course, not all else is equal.

Even today, basically all hardware is shipping with BIOS by default. The only people to enable UEFI are enthusiasts. Various machines will pop up all kinds of dire warnings if you try to turn it on. UEFI has had very little real world testing. And it really does show. In the few months I've been working on UEFI I've discovered machines where SetVirtualAddressMap() calls code that has already been (per spec) discarded. I've seen cases where it was possible to create variables, but not to delete them. I've seen a machine that would irreparably corrupt its firmware when you tried to set a variable. I've tripped over code that fails to parse invalid boot variables, bricking the hardware. Many vendors independently fail to report the correct framebuffer stride. And those are just the ones that have ended up on hardware which crosses my desk, which means I haven't even tested the majority of consumer-grade hardware with UEFI.

The problems with UEFI have very little to do with its design or the ability of the people implementing it. After a few years of iterative improvements it stands a good chance of being more reliable and useful than BIOS. Increased commonality of code between vendors is a blessing and a curse - in the long term it means that these bugs can be shaken out, but in the short term it means that at least one hardware-destroying bug has ended up carried by multiple vendors. Right now we're still in the short term and it's likely that we'll find yet more UEFI bugs that cause all sorts of problems. The next few years will probably be a bumpy ride.


Syndicated 2012-01-23 15:52:51 from Matthew Garrett

Why UEFI secure boot is difficult for Linux

I wrote about the technical details of supporting the UEFI secure boot specification with Linux. Despite me pretty clearly saying that this was ignoring issues of licensing and key distribution and the like, people are now using it to claim that Linux could support secure boot with minimal effort. In a sense, they're right. The technical implementation details are fairly straightforward. But they're not the difficult bit.

Secure boot requires that all code that can touch hardware be trusted

Right now, if you can run untrusted code before the OS then you can subvert the OS. Secure boot gives you a mechanism for making sure you only run trusted code, which protects against that. So your UEFI drivers have to be signed, your bootloader has to be signed, and your bootloader must only load a signed kernel. If you've only booted trusted code then you know that your OS is safe. But, unlike trusted boot, secure boot provides no way for you to know that only trusted code was executed. That has to be ensured by OS policy.

This doesn't sound like too much of a problem. But it is. Imagine we have a signed Linux bootloader and a signed Linux kernel, and that these signatures are made with a globally trusted key. These will boot on any hardware using secure boot. Now imagine that an attacker writes a kernel module that sets up a fake UEFI environment, stops the kernel from running code and then executes the Windows bootloader - kind of like kexec, but a little more fiddly. The Windows bootloader believes that it's performing a secure boot, but in fact everything that it believes is trustworthy is the attacker's fake UEFI code. The user's encryption passphrase is logged, the Windows kernel is modified to hide the Linux code and despite everything looking fine your credit card details are on their way to China. In this scenario, the signed Linux kernel is simply used as a malware loader. The only sign that anything is wrong is that boot will be slightly slowed down.

Signing the kernel isn't enough. Signed Linux kernels must refuse to load any unsigned kernel modules. Virtualbox on Linux? Dead. Nvidia binary driver on Linux? Dead. All out of tree kernel modules? Utterly, utterly dead. Building an updated driver locally? Not going to happen. That's going to make some people fairly unhappy.

(The same applies to Windows, of course. Windows 7 allows you to disable driver integrity checks. Windows 8 will have to forbid that when the system's using secure boot)

Licensing

GPLv3 has various requirements for signing keys to be available. Microsoft's new requirement that systems support the installation of user keys would let users boot their own modified bootloaders, so that may end up being sufficient to satisfy the license. But we're then beholden to Microsoft - if they remove that requirement then users lose that freedom, and suddenly we're in an awkward licensing situation. There are ongoing conversations about exactly what we're able to do here, but it's not a solved problem.

Key distribution

The UEFI spec doesn't describe or mandate a central certifying authority. Microsoft require that everyone carry their key. We could generate our own, but we have much less sway with vendors. There's no way to guarantee that all hardware vendors will include our key. And, obviously, if we generate a key, we can't just hand the private half out to others. That means that it becomes impossible for people to produce derivative versions of Linux distributions without getting their own key. The kind of identity verification that would be required for getting such a key is likely to be expensive, and also fairly likely to require that the distribution have a legally registered company in order to facilitate the identity verification. Think Extended Validation certificates, not StartSSL Free. Hobbyist Linux distributions will be a thing of the past.

Doesn't custom mode fix this?

Microsoft's certification requirements now state that all systems must support a custom mode, implying that it will be possible for a user to install their own keys. Linux vendors would then be able to ship with their own keys on the install media and impose their own policies. Everyone's happy. It's not really good enough, though. People have spent incredible amounts of time and effort making it easy to install Linux by doing little more than putting a CD in a drive. Asking them to go into the firmware and reconfigure things adds an extra barrier that restricts the ability to install Linux to more technically skilled users. And it's even worse than that. This is the full description of the requirement for custom mode:
  1. It shall be possible for a physically present user to use the Custom Mode firmware setup option to modify the contents of the Secure Boot signature databases and the PK.
  2. If the user ends up deleting the PK then, upon exiting the Custom Mode firmware setup, the system will be operating in Setup Mode with Secure Boot turned off.
  3. The firmware setup shall indicate if Secure Boot is turned on, and if it is operated in Standard or Custom Mode. The firmware setup must provide an option to return from Custom to Standard Mode which restores the factory defaults.

There are a few things missing from this, namely:
  • Any description of the UI. It's effectively impossible to document Linux installation when the first step becomes (a) complicated and (b) vendor specific. Vendors are using the UEFI transition to differentiate themselves by coming up with their own unique firmware interfaces. Custom mode is going to look different everywhere.
  • Any description of the key format. A raw binary representation of the key? An EFI_SIGNATURE_DATA struct? A base64 encoding of one, further protected with ROT13? We just don't know (there's a sketch of the spec's own signature structures after this list).
  • Any way to use custom mode for unattended installs. It's a firmware interface that requires a physically present user. Want to install a few thousand machines over the network? This isn't a scalable approach.
  • …and this one's nitpicking, but there's not actually any requirement that the user be able to add keys - a vendor could conform to this language by only letting users delete keys. This is actually ok as long as the user deletes PK, because then we'll effectively be back in setup mode and can install our own keys from the installer, but it still results in some practical problems.
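For reference, the format the spec itself uses for key databases (PK, KEK, db and dbx) is the EFI_SIGNATURE_LIST / EFI_SIGNATURE_DATA pair. This is a sketch of those structures as I read the UEFI 2.3.1 spec, written with EDK2-style types; whether custom mode expects these, a raw key or something else entirely is exactly the question left unanswered.

#include <Uefi.h>

typedef struct {
  EFI_GUID SignatureOwner;       /* who added this signature */
  /* UINT8 SignatureData[];         certificate or hash data follows */
} EFI_SIGNATURE_DATA;

typedef struct {
  EFI_GUID SignatureType;        /* e.g. X.509 certificate or SHA-256 hash */
  UINT32   SignatureListSize;    /* total size of this list */
  UINT32   SignatureHeaderSize;  /* size of the optional header that follows */
  UINT32   SignatureSize;        /* size of each EFI_SIGNATURE_DATA entry */
  /* UINT8 SignatureHeader[SignatureHeaderSize]; */
  /* EFI_SIGNATURE_DATA Signatures[][SignatureSize]; */
} EFI_SIGNATURE_LIST;

A signature database variable is simply a concatenation of one or more of these lists.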

So no, custom mode doesn't make everything ok. Custom mode with a mandated UI and a documented key format would be much closer, but it wouldn't solve the problem of unattended automated installs.

Summary

We can write the code required to support secure boot on Linux in a minimal amount of time - in fact, most of it's now done. But significant practical problems remain, and so far we have no workable solutions for any of them.


Syndicated 2012-01-18 00:55:05 from Matthew Garrett

