Older blog entries for mjg59 (starting at number 330)

More ways for firmware to screw you

Some of my recent time has been devoted to making our boot media more Mac friendly, which has entailed rather a lot of rebooting. This would have been fine, if tedious, except that some number of boots would fall over with either a clearly impossible kernel panic or userspace segfaulting in places that made no sense. Something was clearly wrong. Crashes that shouldn't happen are generally an indication of memory corruption. The question is how that corruption is being triggered. Hunting that down wasn't terribly easy.

My first thought was that we were possibly managing to load the kernel over a region used by UEFI code. UEFI defines two types of code - boot services and runtime services. While runtime services code and data must be preserved by the OS, in theory boot services code and data is available to the OS once the firmware has exited. In practice, that's not true. It seemed entirely possible that the kernel might be ending up on top of some of that boot services code or data and getting trodden on. Grub now has code to avoid putting the kernel on boot services, so testing the latest code seemed like a good plan. But no, crashes still happened.

That pretty much ruled out the bootloader. My next thought was that executing some of the firmware code was triggering a write to some other memory that contained the kernel. Josh Boyer suggested the next trick, which was to try marking the kernel read-only to see whether anything was hitting it. x86 lets you mark pages as read-only - any attempt to write to them should take a fault. UEFI functions are executed in the context of the kernel, so share the same page tables. That let me rule this out, since everything still went just as wrong and I wasn't taking an extra fault first.

However, at this point I was reasonably happy that it wasn't the kernel itself being overwritten - faults were occurring in userspace code as well. That was a pretty strong indication that what was happening was continuing to happen once userspace had started, so it wasn't a direct response to a firmware call. I made sure of that by stubbing out all the calls that could be triggered after kernel initialisation, and saw the same failures. Once all attempts to be clever have failed, it's time to just start using brute force. The kernel lets you reserve areas of RAM by passing arguments like memmap=0xlength$0xstart to block length bytes starting at start from being used. It took a while, but I finally found a 256MB range that made a difference - reserving it resulted in the machine booting reliably, letting the OS use it resulted in occasional crashes.
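As a rough illustration of that brute-force search, here is a minimal sketch in C that formats a memmap= argument for a candidate range and splits a range in half for the next round of testing. The function and struct names are invented for this example; only the memmap=length$start syntax comes from the text above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* A candidate range of physical memory to reserve (hypothetical helper type). */
struct mem_range {
    uint64_t start;
    uint64_t length;
};

/* Write "memmap=0x<length>$0x<start>" into buf.
 * Returns the number of characters snprintf wanted to write. */
int format_memmap_arg(char *buf, size_t buflen, uint64_t start, uint64_t length)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "memmap=0x%" PRIx64 "$0x%" PRIx64,
                    length, start);
}

/* One bisection step: split a range into a low half and a high half.
 * Boot once with each half reserved and keep whichever one stops the crashes. */
void split_range(struct mem_range whole, struct mem_range *low,
                 struct mem_range *high)
{
    uint64_t half = whole.length / 2;

    low->start = whole.start;
    low->length = half;
    high->start = whole.start + half;
    high->length = whole.length - half;
}
```

Calling format_memmap_arg() on each half gives the two kernel arguments to try on the next pair of boots. Since the crashes were intermittent, each candidate needs several boots before it can be ruled in or out.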

Definite progress. Comparing that memory range to the EFI memory map was helpful. There were several blocks of UEFI boot services data present there, which really seemed like too much of a coincidence. By reserving each of them in turn, I'd traced it down to a single 31MB region of boot service data - that is, memory reserved by the firmware for use by the UEFI boot services. Per spec, this is available to the OS once the boot environment has been exited. Nothing other than the OS should be touching this after boot, but something clearly was. Tracking down what was far easier than I expected, although the first attempt was a failure. Setting it read-only should have triggered a fault, but didn't. That was rather confusing. But, rather than give up, I patched the kernel to fill the region with 0xff at kernel init. Then I booted the system, read it back and looked for values that weren't 0xff. I got this:

00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
001568a0  ff ff ff ff ff ff ff ff  ff ff ff ff 84 00 00 00  |................|
001568b0  00 20 a7 ac 46 00 00 00  00 00 00 00 00 00 06 01  |. ..F...........|
001568c0  c2 0b 0c 00 ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
001568d0  ff ff 0a 04 f0 03 82 0d  40 00 00 00 ff ff ff ff  |........@.......|
001568e0  ff ff 00 21 00 36 9a 80  ff ff ff ff ff ff 00 7e  |...!.6.........~|
001568f0  00 09 43 48 41 2d 47 75  65 73 74 01 04 02 04 0b  |..CHA-Guest.....|
00156900  16 32 08 0c 12 18 24 30  48 60 6c 2d 1a 0e 18 1a  |.2....$0H`l-....|
00156910  ff ff 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00156920  00 00 00 00 00 00 00 dd  09 00 10 18 02 00 10 01  |................|
00156930  00 00 dd 1e 00 90 4c 33  0e 18 1a ff ff 00 00 00  |......L3........|
00156940  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00156950  00 00 bd ea f8 b3 ff ff  ff ff ff ff ff ff ff ff  |................|
00156960  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
022fb000

That's a lot of 0xffs (around 31MB of them) with one small section that contains an 802.11 probe packet with the SSID of the hospital across the road from my house. Apple support network booting off wireless networks. It seems that the firmware brought up the wireless card, associated with this network (it's the only public one nearby) and then left the card DMAing packets into RAM. The read-only page attribute only applies to CPU-initiated accesses, so it could do this without triggering a page fault. It also explained why it was so random - whether memory corruption occurred would depend on whether a packet appeared between that memory being used by the OS and the kernel reinitialising the wireless card. It certainly explains why I couldn't reproduce it when I left the machine repeatedly rebooting on the bus home.
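The fill-and-scan trick is simple enough to sketch. The following is not the kernel patch itself, just a minimal userspace illustration in C of the scanning half, assuming the region was filled with 0xff beforehand and has already been read back into a buffer. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON 0xff  /* fill value written over the region at init */

/* Find the first and last offsets whose contents no longer match the
 * poison value. Returns 1 if anything was overwritten, 0 if the region
 * is still entirely poison. Overwritten bytes that happen to equal
 * 0xff cannot be told apart from untouched ones. */
int find_dirty_span(const uint8_t *region, size_t len,
                    size_t *first, size_t *last)
{
    int found = 0;

    assert(region != NULL && first != NULL && last != NULL);

    for (size_t i = 0; i < len; i++) {
        if (region[i] != POISON) {
            if (!found) {
                *first = i;
                found = 1;
            }
            *last = i;
        }
    }
    return found;
}
```

Dumping the bytes between the two offsets gives the interesting part of a hexdump like the one above without wading through megabytes of 0xff.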

How do we fix this? Unsure. With luck disconnecting the UEFI driver in the bootloader should quiesce the hardware, but without testing I'm not sure of that yet. For now it's just another example of firmware managing to break expectations in deeply strange ways.

Syndicated 2012-03-17 23:07:32 from Matthew Garrett

Some things you may have heard about Secure Boot which aren't entirely true

Talking about Secure Boot again, I'm afraid. One of the things that's made discussion of this difficult is that, while the specification isn't overly complicated, some of the outcomes aren't obvious at all until you spend a long time thinking about it. So here's some clarification on a few points.

Secure Boot provides no additional security

Untrue. Attacks against the pre-boot environment are real and increasingly common - this is a recent proof of concept, and this has been seen in the wild. Once something has got into the MBR, all bets are off. It can modify your bootloader or kernel, inserting its own code to return valid results whenever any kind of malware checker scans for it. The only way to reliably identify it is to either move the disk to another system or to cold boot off verified media. By restricting pre-OS code to something that's been signed, Secure Boot does provide enhanced security.

Secure Boot is just another name for Trusted Boot

Untrue. Trusted Boot requires the ability to measure the entire boot process, which gives the OS the ability to figure out everything that's been run before OS startup. The root of trust is in the hardware and a TPM is required. Secure Boot is simply a way to limit the applications that are run before the OS. Once booted, there is no way for the OS to identify what was previously booted, or even if the system was booted securely.

Microsoft are just requiring that vendors implement the specification

Untrue. Quoting from the Windows 8 Hardware Certification Requirements:

MANDATORY: No in-line mechanism is provided whereby a user can bypass Secure Boot failures and boot anyway. Signature verification override during boot when Secure Boot is enabled is not allowed. A physically present user override is not permitted for UEFI images that fail signature verification during boot. If a user wants to boot an image that does not pass signature verification, they must explicitly disable Secure Boot on the target system.

Section 27.7.3.3 of version 2.3.1A of the UEFI spec explicitly permits implementations to provide a physically present user override. Whether this is a good thing or not is obviously open to argument, but the fact remains that Microsoft forbid behaviour that the specification permits.

Secure Boot can be used to implement DRM

Untrue. The argument here is that Secure Boot can be used to restrict the software that a machine can run, and so can limit a system to running code that implements effective copy protection mechanisms. This isn't the case. For that to be true, there would need to be a mechanism for the OS to identify which code had been run in the pre-OS environment. There isn't. The only communication channel between the firmware and the OS is via a single UEFI variable called "SecureBoot", which may be either "1" or "0". Additionally, the firmware may provide a table to the OS containing a list of UEFI executables that failed to authenticate. It is not required to provide any information about the executables that authenticated correctly.

In both these cases, the OS is required to trust the firmware. If the firmware has been compromised in any way (such as the user turning off Secure Boot), the data provided by the firmware can be trivially modified and the OS told that everything is fine. As long as machines exist where users are permitted to disable Secure Boot, Secure Boot is not any kind of DRM scheme.
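To make concrete how little the OS actually gets, here is a minimal sketch in C of reading that variable. It assumes a Linux system that exposes UEFI variables through efivarfs, where each file holds a 4-byte attributes word followed by the variable's data. That interface is an assumption of this example and is not described above, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum secure_boot_state { SB_UNKNOWN = -1, SB_DISABLED = 0, SB_ENABLED = 1 };

/* Assumed efivarfs location of the SecureBoot global variable. */
#define SECURE_BOOT_PATH \
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"

/* Interpret the raw file contents: 4 bytes of attributes, then 1 data byte. */
enum secure_boot_state parse_secure_boot(const uint8_t *blob, size_t len)
{
    if (blob == NULL || len != 5)
        return SB_UNKNOWN;
    if (blob[4] == 0)
        return SB_DISABLED;
    if (blob[4] == 1)
        return SB_ENABLED;
    return SB_UNKNOWN;
}

/* Read the variable from the given path. All this tells us is what the
 * firmware chose to report. */
enum secure_boot_state read_secure_boot(const char *path)
{
    uint8_t blob[8];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return SB_UNKNOWN;
    n = fread(blob, 1, sizeof blob, f);
    fclose(f);
    return parse_secure_boot(blob, n);
}
```

Calling read_secure_boot(SECURE_BOOT_PATH) yields a single yes or no. There is nothing here the OS can use to check what actually ran before it, which is the point: a compromised firmware environment can report whatever value it likes.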

Secure Boot provides physical security

Untrue. Secure Boot does not in any way claim to improve security against attackers who have physical access, for the same reasons as the DRM case. The OS has no way to determine that the firmware's behaviour has been modified. A physically-present attacker can simply disable Secure Boot and install a piece of malware that lies to the OS about platform security. The "Evil Maid" attack still exists.

Secure Boot only defines the interaction between the firmware and the bootloader. It doesn't specify any higher policy

Misleading. It's true that Secure Boot only specifies the authentication of pre-OS code. However, if you implement Secure Boot it's because you want to ensure that only authenticated code has run before your OS. If there is any way for unauthenticated code to touch the hardware before your OS starts, you can't ensure that. If an authenticated Linux kernel is booted and then loads an unsigned driver, that unsigned driver can fabricate a fake UEFI environment and then launch Windows from it. Windows would falsely believe that it has booted securely. If that authenticated Linux kernel were widely distributed, attackers could use it as an attack vector for Windows. The logical response from Microsoft would be to blacklist that kernel.

The inevitable outcome of implementing Secure Boot is that every component that can touch hardware must be signed. Anyone who implements Secure Boot without requiring that is adding no additional security whatsoever.

Only machines that want to boot Windows need to carry Microsoft's keys

Again, misleading. Microsoft only require one signing key to be installed, and the Windows bootloader will be signed with a key that chains back to this one. However, the bootloader is not the only component that must be signed. Any drivers that are carried on ROMs on plug-in cards must also be signed. One approach here would have been for all hardware vendors to have their own keys. This would have been unscalable - any shipped machine would have to carry keys for every vendor who produces PCI cards. If a machine carried an nvidia key but not an AMD one, swapping a geforce for a radeon would have resulted in the firmware graphics driver failing to load. Instead, Microsoft are providing a signing service. Vendors will be able to sign up for WHQL membership and have their UEFI drivers signed by Microsoft.

This leads to the problem. The Authenticode format used for signing UEFI objects only allows for a single signature. If a driver is signed by Microsoft, it can't be signed by anybody else. Therefore, if a system vendor wants to support off-the-shelf PCI devices with Microsoft-signed drivers, the system must carry Microsoft's key. If the same key is used as the root of trust for the driver signing and for the bootloader signing, that also means that the system will boot Windows.

This is only a problem for clients, not servers

Not strictly true. While Microsoft's current requirements don't mandate the presence of Secure Boot on server hardware, if present it must be enabled and locked down as it is for clients. It's not valid for servers to ship with disabled Secure Boot support, or for it to be shipped in a mode allowing users to make automated policy modifications at OS install time.

Secure Boot is required by NIST

This is one that I've heard from multiple people working on Secure Boot. It's amazingly untrue. The document they're referring to is NIST SP800-147, which is a document that describes guidelines for firmware security - that is, what has to be done to ensure that the firmware itself is secure. This includes making sure that the OS can't overwrite the firmware and that firmware updates must be signed. It says absolutely nothing about the firmware only running signed software. Secure Boot depends upon the firmware being trusted, so these guidelines are effectively a required part of Secure Boot. But Secure Boot is not within the scope of SP800-147 at all.

It's easy for Linux to implement Secure Boot

Misleading. From a technical perspective, sure. From a practical perspective, not at all. I wrote about the details here.

It's only a problem for hobbyist Linux, not the real Linux market

Untrue. It's unclear whether even the significant Linux vendors can implement Secure Boot in a way that meets the needs of their customers and still allows them to boot on commodity hardware. A naive implementation removes many of the benefits of Linux for enterprise customers, such as the ability to use local modifications to micro-optimise systems for specific workloads. One of the key selling points of Linux is the ability to make use of local expertise when adapting the product for your needs. Secure Boot makes that more difficult.

Conclusion

Much reporting on the issues surrounding Secure Boot so far has been inaccurate, leading to misunderstandings about the (genuine) benefits and the (genuine) drawbacks. Any useful discussion must be based around an accurate understanding of the specification rather than statements from analysts with no real understanding of the Linux market or security.

Syndicated 2012-02-12 19:55:01 from Matthew Garrett

Is GPL usage really declining?

Matthew Aslett wrote about how the proportion of projects released under GPL-like licenses appears to be declining, at least as far as various sets of figures go. But what does that actually mean? In absolute terms, GPL use has increased - any change isn't down to GPL projects transitioning over to liberal licenses. But an increasing number of new projects are being released under liberal licenses. Why is that?

The figures from Black Duck aren't a great help here, because they tell us very little about the software they're looking at. FLOSSmole is rather more interesting. I pulled the license figures from a few sites and found the following proportion of GPLed projects:

RubyForge: ~30%
Google Code: ~50%
Launchpad: ~70%

I've left the numbers rough because there are various uncertainties - should proprietary licenses be included in the numbers, is CC Sharealike enough like the GPL to count it there, that kind of thing. But what's clear is that these three sites have massively different levels of GPL use, and it's not hard to imagine why. They all attract different types of developer. The RubyForge figures are obviously going to be heavily influenced by Ruby developers, and that (handwavily) implies more of a bias towards web developers than the general developer population. Launchpad, on the other hand, is going to have a closer association with people with an Ubuntu background - it's probably more representative of Linux developers. Google Code? The 50% figure is the closest to the 56.8% figure that Black Duck give, so it's probably representative of the more general development community.

The impression gained from this is that the probability of you using one of the GPL licenses is influenced by the community that you're part of. And it's not a huge leap to believe that an increasing number of developers are targeting the web, and the web development community has never been especially attached to the GPL. It's not hard to see why - the benefits of the GPL vanish pretty much entirely when you're never actually obliged to distribute the code, and while Affero attempts to compensate for that it also constrains your UI and deployment model. No matter how strong a believer in Copyleft you are, the web makes it difficult for users to take any advantage of the freedoms you'd want to offer. It's just as easy not to bother.
So it's pretty unsurprising that an increase in web development would be associated with a decrease in the proportion of projects licensed under the GPL.

This obviously isn't a rigorous analysis. I have very little hard evidence to back up my assumptions. But nor does anyone who claims that the change is because the FSF alienated the community during GPLv3 development. I'd be fascinated to see someone spend some time comparing project type with license use and trying to come up with a more convincing argument.

(Raw data from FLOSSmole: Howison, J., Conklin, M., & Crowston, K. (2006). FLOSSmole: A collaborative repository for FLOSS research data and analyses. International Journal of Information Technology and Web Engineering, 1(3), 17–26.)

Syndicated 2012-02-09 22:33:55 from Matthew Garrett

The ongoing fight against GPL enforcement

GPL enforcement is a surprisingly difficult task. It's not just a matter of identifying an infringement - you need to make sure you have a copyright holder on your side, spend some money sending letters asking people to come into compliance, spend more money initiating a suit, spend even more money encouraging people to settle, spend yet more money actually taking them to court and then maybe, at the end, you have some source code. One of the (tiny) number of groups involved in doing this is the Software Freedom Conservancy, a non-profit organisation that offers various services to free software projects. One of their notable activities is enforcing the license of Busybox, a GPLed multi-purpose application that's used in many embedded Linux environments. And this is where things get interesting.

GPLv2 (the license covering the relevant code) contains the following as part of section 4:

Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License.

There's some argument over what this means, precisely, but GPLv3 adds the following paragraph:

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation

which tends to support the assertion that, under V2, once the license is terminated you've lost it forever. That gives the SFC a lever. If a vendor is shipping products using Busybox, and is found to be in violation, this interpretation of GPLv2 means that they have no license to ship Busybox again until the copyright holders (or their agents) grant them another. This is a bit of a problem if your entire stock consists of devices running Busybox. The SFC will grant a new license, but on one condition - not only must you provide the source code to Busybox, you must provide the source code to all other works on the device that require source distribution.

The outcome of this is that we've gained access to large bodies of source code that would otherwise have been kept by companies. The SFC have successfully used Busybox to force the source release of many vendor kernels, ensuring that users have the freedoms that the copyright holders granted to them. Everybody wins, with the exception of the violators. And it seems that they're unenthusiastic about that.

A couple of weeks ago, this page appeared on the elinux.org wiki. It's written by an engineer at Sony, and it's calling for contributions to rewriting Busybox. This would be entirely reasonable if it were for technical reasons, but it's not - it's explicitly stated that companies are afraid that Busybox copyright holders may force them to comply with the licenses of software they ship. If you ship this Busybox replacement instead of the original Busybox you'll be safe from the SFC. You'll be able to violate licenses with impunity.

What can we do? The real problem here is that the SFC's reliance on Busybox means that they're only able to target infringers who use that Busybox code. No significant kernel copyright holders have so far offered to allow the SFC to enforce their copyrights, with the result that enforcement action will grind to a halt as vendors move over to this Busybox replacement. So, if you hold copyright over any part of the Linux kernel, I'd urge you to get in touch with them. The alternative is a strangely ironic world where Sony are simultaneously funding lobbying for copyright enforcement against individuals and tools to help large corporations infringe at will. I'm not enthusiastic about that.

Syndicated 2012-01-30 23:40:50 from Matthew Garrett

UEFI and bugs

I gave a presentation on UEFI at LCA 2012 - you can watch it here, with the bonus repeat (and different jokes) here. It's a gentle introduction to UEFI, followed by some discussion of the problems we've faced in dealing with implementation bugs.

The fundamental problem is that UEFI is a lot of code. And I really do mean a lot of code. Ignoring drivers, the x86 Linux kernel is around 30MB of code. A comparable subset of the UEFI tree is around 35MB. UEFI is of a comparable degree of complexity to the Linux kernel. There's no reason to assume that the people who've actually written this code are significantly more or less competent than an average Linux developer, so all else being equal we'd probably expect somewhere around the same number of bugs per line. Of course, not all else is equal.

Even today, basically all hardware is shipping with BIOS by default. The only people to enable UEFI are enthusiasts. Various machines will pop up all kinds of dire warnings if you try to turn it on. UEFI has had very little real world testing. And it really does show. In the few months I've been working on UEFI I've discovered machines where SetVirtualAddressMap() calls code that has already been (per spec) discarded. I've seen cases where it was possible to create variables, but not to delete them. I've seen a machine that would irreparably corrupt its firmware when you tried to set a variable. I've tripped over code that fails to parse invalid boot variables, bricking the hardware. Many vendors independently fail to report the correct framebuffer stride. And those are just the ones that have ended up on hardware which crosses my desk, which means I haven't even tested the majority of consumer-grade hardware with UEFI.

The problems with UEFI have very little to do with its design or the ability of the people implementing it. After a few years of iterative improvements it stands a good chance of being more reliable and useful than BIOS. Increased commonality of code between vendors is a blessing and a curse - in the long term it means that these bugs can be shaken out, but in the short term it means that at least one hardware-destroying bug has ended up carried by multiple vendors. Right now we're still in the short term and it's likely that we'll find yet more UEFI bugs that cause all sorts of problems. The next few years will probably be a bumpy ride.

Syndicated 2012-01-23 15:52:51 from Matthew Garrett

Why UEFI secure boot is difficult for Linux

I wrote about the technical details of supporting the UEFI secure boot specification with Linux. Despite me pretty clearly saying that this was ignoring issues of licensing and key distribution and the like, people are now using it to claim that Linux could support secure boot with minimal effort. In a sense, they're right. The technical implementation details are fairly straightforward. But they're not the difficult bit.

Secure boot requires that all code that can touch hardware be trusted

Right now, if you can run untrusted code before the OS then you can subvert the OS. Secure boot gives you a mechanism for making sure you only run trusted code, which protects against that. So your UEFI drivers have to be signed, your bootloader has to be signed, and your bootloader must only load a signed kernel. If you've only booted trusted code then you know that your OS is safe. But, unlike trusted boot, secure boot provides no way for you to know that only trusted code was executed. That has to be ensured by OS policy.

This doesn't sound like too much of a problem. But it is. Imagine we have a signed Linux bootloader and a signed Linux kernel, and that these signatures are made with a globally trusted key. These will boot on any hardware using secure boot. Now imagine that an attacker writes a kernel module that sets up a fake UEFI environment, stops the kernel from running code and then executes the Windows bootloader - kind of like kexec, but a little more fiddly. The Windows bootloader believes that it's performing a secure boot, but in fact everything that it believes is trustworthy is the attacker's fake UEFI code. The user's encryption passphrase is logged, the Windows kernel is modified to hide the Linux code and despite everything looking fine your credit card details are on their way to China. In this scenario, the signed Linux kernel is simply used as a malware loader. The only sign that anything is wrong is that boot will be slightly slowed down.

Signing the kernel isn't enough. Signed Linux kernels must refuse to load any unsigned kernel modules. Virtualbox on Linux? Dead. Nvidia binary driver on Linux? Dead. All out of tree kernel modules? Utterly, utterly dead. Building an updated driver locally? Not going to happen. That's going to make some people fairly unhappy.

(The same applies to Windows, of course. Windows 7 allows you to disable driver integrity checks. Windows 8 will have to forbid that when the system's using secure boot)

Licensing

GPLv3 has various requirements for signing keys to be available. Microsoft's new requirement that systems support the installation of user keys would let users boot their own modified bootloaders, so that may end up being sufficient to satisfy the license. But we're then beholden to Microsoft - if they remove that requirement then users lose that freedom, and suddenly we're in an awkward licensing situation. There are ongoing conversations about exactly what we're able to do here, but it's not a solved problem.

Key distribution

The UEFI spec doesn't describe or mandate a central certifying authority. Microsoft require that everyone carry their key. We could generate our own, but we have much less sway with vendors. There's no way to guarantee that all hardware vendors will carry our key. And, obviously, if we generate a key, we can't just hand the private half out to others. That means that it becomes impossible for people to produce derivative versions of Linux distributions without getting their own key. The kind of identity verification that would be required for getting such a key is likely to be expensive, and also fairly likely to require that the distribution have a legally registered company in order to facilitate the identity verification. Think Extended Validation certificates, not StartSSL Free. Hobbyist Linux distributions will be a thing of the past.

Doesn't custom mode fix this?

Microsoft's certification requirements now state that all systems must support a custom mode, implying that it will be possible for a user to install their own keys. Linux vendors would then be able to ship with their own keys on the install media and impose their own policies. Everyone's happy. It's not really good enough, though. People have spent incredible amounts of time and effort making it easy to install Linux by doing little more than putting a CD in a drive. Asking them to go into the firmware and reconfigure things adds an extra barrier that restricts the ability to install Linux to more technically skilled users. And it's even worse than that. This is the full description of the requirement for custom mode:
  1. It shall be possible for a physically present user to use the Custom Mode firmware setup option to modify the contents of the Secure Boot signature databases and the PK.
  2. If the user ends up deleting the PK then, upon exiting the Custom Mode firmware setup, the system will be operating in Setup Mode with Secure Boot turned off.
  3. The firmware setup shall indicate if Secure Boot is turned on, and if it is operated in Standard or Custom Mode. The firmware setup must provide an option to return from Custom to Standard Mode which restores the factory defaults.

There's a few things missing from this, namely:
  • Any description of the UI. It's effectively impossible to document Linux installation when the first step becomes (a) complicated and (b) vendor specific. Vendors are using the UEFI transition to differentiate themselves by coming up with their own unique firmware interfaces. Custom mode is going to look different everywhere.
  • Any description of the key format. A raw binary representation of the key? An EFI_SIGNATURE_DATA struct? A base64 encoding of one, further protected with ROT13? We just don't know.
  • Any way to use custom mode for unattended installs. It's a firmware interface that requires a physically present user. Want to install a few thousand machines over the network? This isn't a scalable approach.
  • …and this one's nitpicking, but there's not actually any requirement that the user be able to add keys - a vendor could conform to this language by only letting users delete keys. This is actually ok as long as the user deletes the PK, because then we'll effectively be back in setup mode and can install our own keys from the installer, but it still results in some practical problems.
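For reference, one of the candidate formats mentioned in the list is the UEFI spec's signature structures: an EFI_SIGNATURE_LIST header (a type GUID followed by three 32-bit sizes), an optional signature header, and then fixed-size EFI_SIGNATURE_DATA entries that each start with an owner GUID. The sketch below, in C, counts the entries in one such list. It's illustrative only. The function names are hypothetical, and whether any given firmware's custom mode UI accepts this format is exactly what the requirement leaves unspecified.

```c
#include <stddef.h>
#include <stdint.h>

#define GUID_SIZE 16u
/* SignatureType GUID + SignatureListSize + SignatureHeaderSize + SignatureSize */
#define SIG_LIST_HEADER_SIZE (GUID_SIZE + 3u * 4u)

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count the EFI_SIGNATURE_DATA entries in a single signature list.
 * Returns 0 and stores the count on success, -1 if the sizes are
 * inconsistent with each other or with the buffer length. */
int count_signatures(const uint8_t *blob, size_t len, size_t *count)
{
    uint32_t list_size, header_size, sig_size;
    uint64_t payload;

    if (blob == NULL || count == NULL || len < SIG_LIST_HEADER_SIZE)
        return -1;

    list_size = read_le32(blob + GUID_SIZE);
    header_size = read_le32(blob + GUID_SIZE + 4);
    sig_size = read_le32(blob + GUID_SIZE + 8);

    if (list_size > len)
        return -1;
    if ((uint64_t)SIG_LIST_HEADER_SIZE + header_size > list_size)
        return -1;
    if (sig_size <= GUID_SIZE)  /* owner GUID plus at least one data byte */
        return -1;

    payload = (uint64_t)list_size - SIG_LIST_HEADER_SIZE - header_size;
    if (payload % sig_size != 0)
        return -1;

    *count = (size_t)(payload / sig_size);
    return 0;
}
```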

So no, custom mode doesn't make everything ok. Custom mode with a mandated UI and a documented key format would be much closer, but it wouldn't solve the problem of unattended automated installs.

Summary

We can write the code required to support secure boot on Linux in a minimal amount of time - in fact, most of it's now done. But significant practical problems remain, and so far we have no workable solutions for any of them.

Syndicated 2012-01-18 00:55:05 from Matthew Garrett

Firmware bugs considered enraging

Part of our work to make it possible to use UEFI Secure Boot on Linux has been to improve our EFI variable support code. Right now this has a hardcoded assumption that variables are 1024 bytes or smaller, which was true in pre-1.0 versions of the EFI spec. Modern implementations allow the maximum variable size to be determined by the hardware, and with implementations using large key sizes and hashes 1024 bytes isn't really going to cut it. My first attempt at this was a little ugly but also fell foul of the fact that sysfs only allows writes of up to the size of a page - so 4KB on most of the platforms we're caring about. So I've now reimplemented it as a filesystem[1], which is trickier but avoids this problem nicely.

Things were almost working fine - I could read variables of arbitrary sizes, and I could write to existing variables. I was just finishing hooking up new variable creation, but in the process accidentally set the contents of the Boot0002 variable to 0xffffffff 0xffffffff 0x00000000. Boot* variables provide the UEFI firmware with the different configured boot devices on the system - they can point either at a raw device or at a bootloader on a device, and they can do so using various different namespaces. They have a defined format, as documented in chapter 9 of the UEFI spec. At boot time the boot manager reads the variables and attempts to boot from them in a configured order as found in the BootOrder variable.

Now, obviously, 0xffffffff 0xffffffff 0x00000000 is unlikely to conform to the specification. And when I rebooted the machine, it gave me a flashing cursor and did nothing. Fair enough - I should be able to choose another boot path from the boot manager. Except the boot manager behaves identically, and I get a flashing cursor and nothing else.

I reported this to the EDK2 development list, and Andrew Fish (who invented EFI back in the 90s) pointed me at the code that's probably responsible. It's in the BDS (Boot Device Selection) library that's part of the UEFI reference implementation from Intel, and you can find it here. The relevant function is BdsLibVariableToOption, which is as follows (with irrelevant bits elided):

BdsLibVariableToOption (
  IN OUT LIST_ENTRY                   *BdsCommonOptionList,
  IN  CHAR16                          *VariableName
  )
{
  UINT16                    FilePathSize;
  UINT8                     *Variable;
  UINT8                     *TempPtr;
  UINTN                     VariableSize;
  VOID                      *LoadOptions;
  UINT32                    LoadOptionsSize;
  CHAR16                    *Description;

  //
  // Read the variable. We will never free this data.
  //
  Variable = BdsLibGetVariableAndSize (
              VariableName,
              &gEfiGlobalVariableGuid,
              &VariableSize
              );
  if (Variable == NULL) {
    return NULL;
  }

So far so good - we read the variable from flash and put it in Variable, which now holds 0xffffffff 0xffffffff 0x00000000. If it hadn't existed we'd have returned NULL and moved on. VariableSize is 12.

  //
  // Get the option attribute
  //
  TempPtr   =  Variable;
  Attribute =  *(UINT32 *) Variable;
  TempPtr   += sizeof (UINT32);

Attribute is now 0xffffffff and TempPtr points to Variable + 4.

  //
  // Get the option's device path size
  //
  FilePathSize =  *(UINT16 *) TempPtr;
  TempPtr      += sizeof (UINT16);

FilePathSize is 0xffff, TempPtr points to Variable + 6.

  //
  // Get the option's description string size
  //
  TempPtr     += StrSize ((CHAR16 *) TempPtr);

TempPtr points to 0xffff 0x0000, so StrSize (which is basically strlen, except that it counts bytes and includes the terminator) will be 4. TempPtr now points to Variable + 10.

  //
  // Get the option's device path
  //
  DevicePath =  (EFI_DEVICE_PATH_PROTOCOL *) TempPtr;
  TempPtr    += FilePathSize;

TempPtr now points to Variable + 65545 (10 + 65535, since FilePathSize is 0xffff).

  LoadOptions     = TempPtr;
  LoadOptionsSize = (UINT32) (VariableSize - (UINTN) (TempPtr - Variable));

LoadOptionsSize is now 12 - (Variable + 65545 - Variable), or 12 - 65545, or -65533. But it's cast to an unsigned 32 bit integer, so it's actually 4294901763.

  Option->LoadOptions = AllocateZeroPool (LoadOptionsSize);
  ASSERT(Option->LoadOptions != NULL);

We attempt to allocate just under 4GB of RAM. This probably fails - if it does the boot manager exits. This probably means game over. But if it somehow succeeds:

  CopyMem (Option->LoadOptions, LoadOptions, LoadOptionsSize);

we then proceed to read almost 4GB of content from uninitialised addresses. Since Variable was probably allocated below 4GB, that almost certainly includes all of your PCI space (which is typically still below 4GB), bits of your hardware probably generate very unhappy signals on the bus, and you lose anyway.
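
The arithmetic in the walkthrough above can be reproduced outside the firmware. What follows is a minimal sketch in plain C, not the EDK2 code: the function names (rd16le, unchecked_load_options_size, checked_load_options_size) are made up for illustration, and the checked variant is just one guess at the sort of bounds checking that would reject my broken variable rather than an actual fix from the reference implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Mirrors the unchecked pointer arithmetic from the walkthrough and returns
 * the value LoadOptionsSize would end up with. Only safe to call on buffers
 * that contain a 16-bit zero terminator at or after offset 6.
 */
static uint32_t unchecked_load_options_size(const uint8_t *var, size_t var_size)
{
    size_t off = sizeof(uint32_t);               /* skip Attribute */
    uint16_t file_path_size = rd16le(var + off); /* FilePathSize */
    off += sizeof(uint16_t);

    /* Skip the description: 16-bit units up to and including the terminator. */
    while (rd16le(var + off) != 0)
        off += 2;
    off += 2;

    off += file_path_size;                       /* skip the device path */
    return (uint32_t)(var_size - off);           /* wraps if off > var_size */
}

/*
 * Same parse, but every step is checked against var_size.
 * Returns 0 and stores the size on success, -1 if the variable is malformed.
 */
static int checked_load_options_size(const uint8_t *var, size_t var_size,
                                     uint32_t *out)
{
    assert(out != NULL);

    size_t off = sizeof(uint32_t) + sizeof(uint16_t);
    if (var_size < off)
        return -1;
    uint16_t file_path_size = rd16le(var + sizeof(uint32_t));

    /* Look for the description terminator without leaving the buffer. */
    for (;;) {
        if (var_size - off < 2)
            return -1;
        uint16_t c = rd16le(var + off);
        off += 2;
        if (c == 0)
            break;
    }

    /* The device path has to fit in whatever is left. */
    if (file_path_size > var_size - off)
        return -1;
    off += file_path_size;

    *out = (uint32_t)(var_size - off);
    return 0;
}
```

Feeding the 12 byte variable from above into unchecked_load_options_size gives 4294901763, matching the hand calculation, while the checked version returns -1 because a 65535 byte device path can't fit in the 2 bytes that remain.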

So now I have a machine that won't boot, and the clear CMOS jumper doesn't clear the flash contents so I have no idea how to recover from it. And because this code is present in the Intel reference implementation, doing the same thing on most other UEFI machines would probably have the same outcome. Thankfully, it's not something people are likely to do by accident - using any of the standard interfaces will always generate a valid entry, so you could only trigger this when modifying variables by hand. But now I need to set up another test machine.

[1] All code in Linux will evolve until the point where it's implemented as a filesystem.

Syndicated 2012-01-06 20:17:56 from Matthew Garrett

The economic incentive to violate the GPL

My post yesterday on how Google gains financial benefit from vendor GPL violations contained an assertion that some people have questioned - namely, "unscrupulous hardware vendors save money by ignoring their GPL obligations". And, to be fair, as written it's true but not entirely convincing. So instead, let's consider "unscrupulous hardware vendors have economic incentives to ignore their GPL obligations".

The direct act of compliance costs money


Complying with the GPL means having the source code that built the binaries you ship. This is easy if your workflow involves putting source in at one end and getting binaries out at the other, but getting to that workflow means having a certain degree of engineering rigour. If your current build process involves mixing a bunch of known good binaries you got from somewhere but you can't remember where with a hacked up source tree that exists on someone's hard drive and then pushing all of these into a tool that only runs on Windows ME, before taking the resulting image and replacing chunks of it by hand, compliance is effectively impossible.

We all know that this is against all kinds of best practices and probably causes so many problems that it's more expensive in the long term, but retooling and hiring someone to oversee all of this takes time and money, and given the margins on many of these devices that's probably enough to make you uncompetitive for a couple of product cycles. Maybe you'll be in a better position afterwards, but you don't know that there'll be an afterwards.

Suppliers who don't provide you with the source code may be cheaper than those who do


You can't be in compliance if you don't have the source code in the first place. The same arguments that apply to the hardware vendors also apply to the people selling you your chips, so there's also an economic incentive for them to avoid complying. And there's an obvious incentive for you to choose the cheaper chipset, even if they don't comply.

Getting the source may cost money


Buying a chipset doesn't necessarily get you the software that makes it work - several silicon vendors will charge you for the SDK. But many of these devices are effectively reference platforms, so are basically identical from a hardware perspective. So if one of your competitors paid for the SDK, you can just dump the binaries off their machine, flash them onto your own boards and save yourself a decent amount of money. You obviously don't get the source, and nor do you have the standing to insist that the vendor whose binaries you misappropriated give you the source.

In the absence of enforcement, GPL compliance only works if it's the norm


Let's imagine two companies, A and B. Both build a tablet device, and buy the full SDK including source code. Both find a bunch of bugs in the vendor SDK and fix a different subset of them. They ship. A provides source code. B doesn't. B can now take A's bugfixes and incorporate them, resulting in a more compelling product without any significant extra cost. You now have two products that can sell for the same price, but B's is better. A would need to prove that B copied their bugfixes rather than simply fixing them themselves, which probably isn't going to happen.

In a larger market, if B is the only vendor who does this then their advantage isn't large - some of A's work is misappropriated by B, but A does benefit from the engineering work contributed by C, D, E, F and G. A combination of social pressure and legal threats may bring B into compliance. But if infringement is the norm, A has no incentive at all to release the source - by doing so they'll be helping not only B, but also C, D, E, F and G. Everyone undercuts A and they go out of business quite quickly.

Moral: In the absence of enforcement, if everyone else is infringing, a single company who complies is at a disadvantage.

If compliance cost nothing then everyone would do it


You can argue that cheap tablets from China are infringing simply because nobody knows better. But what's HTC's excuse? They've clearly decided that there's a benefit in holding back their source code releases[1], balancing this against the risk of being sued. They know full well what they're doing. If compliance was free they'd ship the source at the same time as they shipped the binaries. Other significant vendors are also fully aware of their obligations but choose to ignore them anyway.

Summary


There are economic incentives to infringe the GPL, and therefore (all else being equal) an infringing device can be sold for less money. All else being equal, a cheaper device will sell more units. More sales means more devices selling adverts for Google. Google makes more money because Android vendors infringe the GPL.

[1] The usual argument is "We will release the source code within 120 days", implying that it's a process that takes time and we should just be patient. Every single time I've started making threatening noises, the source has appeared within a week.

Syndicated 2012-01-04 15:08:34 from Matthew Garrett

Android, GPL violations and Google

A bit over a year ago, I wrote about how an incredible number of Android tablets on the market were in violation of the terms of the GPL. I've had rather a lot else to do since then so it's now awfully out of date - but taking a quick browse through the current stack of cheaper devices indicates that things aren't all that much better. We've got source code for some chipsets that were missing it before, but to compensate we've got a whole bunch of new hardware that's entirely lacking. It's all pretty poor, really.

At the time, I wrote the following:

"(Side note: People sometimes ask why Google aren't doing more to prevent infringing devices. For the vast majority of these cases, Google's sole contribution has been to put Android source code on a public website. Red Hat own more of the infringing code than Google do. There's no real reason why Google should be the ones taking the lead role here, and there's fairly sound business reasons why it's not in their interest to do so)"

Factually speaking, nothing's changed. Each of these devices contains code owned by Google, and Google could absolutely take legal action against the vendors. Equally, so could Red Hat, Intel, Nokia and dozens of other companies who hold copyright on portions of the code carried on these devices, and so could thousands of individuals around the world. Nobody's obliged to enforce their copyrights, and in the absence of anyone else doing so it's unreasonable to insist that Google should do it.

However.

Google gives Android away. This seems like an odd thing for them to do, given that it's a significant engineering effort and costs a lot of money to produce. But remember what Android brings to Google - it's a platform with a well-integrated mechanism for distributing advertising to users. Scanning the market shows a huge number of ad-supported apps, and Google's getting money for every single one of those that gets shown. The more Android devices, the bigger the market for apps - and the wider their advertising reach.

In other words: unscrupulous hardware vendors save money by ignoring their GPL obligations. This lets them appeal on price, increasing the number of Android devices in use and increasing Google's profits. Google makes money off other people's violation of the GPL.

Could Google do anything to stop this? Yes. They could sue for copyright infringement, but that kind of thing's time consuming and awkward and any argument about the GPL always seems to end up as a big argument involving conspiracy theories. Instead, Google could attach some extra conditions to the Android trademark. Requiring that the trademark only be attached to GPL-compliant products ought to allow Google to take advantage of the existing well-tested mechanisms for seizing counterfeit goods, providing a direct economic incentive for companies to come into compliance. For added marks, they could restrict the adwords code to devices that use the trademark - if the vendor removes the trademark, applications depending on the adwords functionality would refuse to run and Google wouldn't make money off the infringing hardware.

Or, of course, they could just carry on making extra money as a result of vendors denying users the freedoms granted by the copyright holders. Although that sounds kind of evil to me.

Syndicated 2012-01-04 03:11:38 from Matthew Garrett

TVs are all awful

A discussion a couple of days ago about DPI detection (which is best summarised by this and this and I am not having this discussion again) made me remember a chain of other awful things about consumer displays and EDID and there not being enough gin in the world, and reading various bits of the internet and Wikipedia seemed to indicate that almost everybody who's written about this has issues with either (a) technology or (b) English, so I might as well write something.

The first problem is unique (I hope) to 720p LCD TVs. 720p is an HD broadcast standard that's defined as having a resolution of 1280x720. A 720p TV is able to display that image without any downscaling. So, naively, you'd expect them to have 1280x720 displays. Now obviously I wouldn't bother mentioning this unless there was some kind of hilarious insanity involved, so you'll be entirely unsurprised when I tell you that most actually have 1366x768 displays. So your 720p content has to be upscaled to fill the screen anyway, but given that you'd have to do the same for displaying 720p content on a 1920x1080 device this isn't the worst thing ever in the world. No, it's more subtle than that.

EDID is a standard for a blob of data that allows a display device to express its capabilities to a video source in order to ensure that an appropriate mode is negotiated. It allows resolutions to be expressed in a bunch of ways - you can set a bunch of bits to indicate which standard modes you support (1366x768 is not one of these standard modes), you can express the standard timing resolution (the horizontal resolution divided by 8, followed by an aspect ratio) and you can express a detailed timing block (a full description of a supported resolution).

1366/8 = 170.75. Hm.

Ok, so 1366x768 can't be expressed in the standard timing resolution block. The closest you can provide for the horizontal resolution is either 1360 or 1368. You also can't supply a vertical resolution - all you can do is say that it's a 16:9 mode. For 1360, that ends up being 765. For 1368, that ends up being 769 (769.5, truncated).
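
To make the numbers concrete, here's a small C sketch of that arithmetic. It only models the two constraints described above (widths come in multiples of 8, and the height is implied by the aspect ratio) and doesn't attempt the real EDID byte layout. The helper names are invented for the example, and truncating the height to whole pixels is an assumption that happens to match the figures above.

```c
#include <assert.h>

/* Largest multiple of 8 that is <= w. */
static unsigned round_down_8(unsigned w)
{
    return w - (w % 8);
}

/* Smallest multiple of 8 that is >= w. */
static unsigned round_up_8(unsigned w)
{
    return round_down_8(w + 7);
}

/*
 * Height implied by a 16:9 aspect ratio for a given width, truncated to a
 * whole number of pixels. The width must already be a multiple of 8, since
 * that's all the standard timing field can describe.
 */
static unsigned height_16_9(unsigned w)
{
    assert(w % 8 == 0);
    return w * 9 / 16;
}
```

For a 1366 pixel wide panel, round_down_8 and round_up_8 give 1360 and 1368, and height_16_9 turns those into 765 and 769. Neither pair is 1366x768.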

It's ok, though, because you can just put this in the detailed timing block, except it turns out that basically no TVs do, probably because the people making them are the ones who've taken all the gin.

So what we end up with is a bunch of hardware that people assume is 1280x720, but is actually 1366x768, except they're telling your computer that they're either 1360x765 or 1368x769. And you're probably running an OS that's doing sub-pixel anti-aliasing, which requires that the hardware be able to address the pixels directly which is obviously difficult if you think the screen is one size and actually it's another. Thankfully Linux takes care of you here, and this code makes everything ok. Phew, eh?

But ha ha, no, it's worse than that. And the rest applies to 1080p ones as well.

Back in the old days when TV signals were analogue and got turned into a picture by a bunch of magnets waving a beam of electrons about all over the place, it was impossible to guarantee that all TV sets were adjusted correctly and so you couldn't assume that the edges of a picture would actually be visible to the viewer. In order to put text on screen without risking bits of it being lost, you had to steer clear of the edges. Over time this became roughly standardised and the areas of the signal that weren't expected to be displayed were called overscan. Now, of course, we're in a mostly digital world and such things can be ignored, except that when digital TVs first appeared they were mostly used to watch analogue signals so still needed to overscan because otherwise you'd have the titles floating weirdly in the middle of the screen rather than towards the edges, and so because it's never possible to kill technology that's escaped into the wild we're stuck with it.

tl;dr - Your 1920x1080 TV takes a 1920x1080 signal, chops the edges off it and then stretches the rest to fit the screen because of decisions made in the 1930s.

So you plug your computer into a TV and even though you know what the resolution really is you still don't get to address the individual pixels. Even worse, the edges of your screen are missing.

The best thing about overscan is that it's not rigorously standardised - different broadcast bodies have different recommendations, but you're then still at the mercy of what your TV vendor decided to implement. So what usually happens is that graphics vendors have some way in their drivers to compensate for overscan, which involves you manually setting the degree of overscan that your TV provides. This works very simply - you take your 1920x1080 framebuffer and draw different sized black borders until the edge of your desktop lines up with the edge of your TV. The best bit about this is that while you're still scanning out a 1920x1080 mode, your desktop has now shrunk to something more like 1728x972 and your TV is then scaling it back up to 1920x1080. Once again, you lose.
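
As a rough sketch of what that compensation works out to, the following C computes the visible desktop rectangle inside the framebuffer for a given shrink percentage. The struct and function names are hypothetical rather than taken from any real driver, and the 10% figure is simply what the 1728x972 example above implies (1728/1920 = 972/1080 = 0.9).

```c
#include <assert.h>

/* Position and size of the usable desktop inside the scanned-out mode. */
struct rect {
    unsigned x, y;          /* width of the left and top black borders */
    unsigned width, height; /* size of the desktop that's left */
};

/*
 * Shrink the desktop by `percent` on each axis and centre it, leaving equal
 * black borders on opposite sides. The scanned-out mode stays fb_w x fb_h.
 */
static struct rect compensate_overscan(unsigned fb_w, unsigned fb_h,
                                       unsigned percent)
{
    assert(percent < 100);

    struct rect r;
    r.width  = fb_w * (100 - percent) / 100;
    r.height = fb_h * (100 - percent) / 100;
    r.x = (fb_w - r.width) / 2;
    r.y = (fb_h - r.height) / 2;
    return r;
}
```

With a 1920x1080 framebuffer and 10%, this gives a 1728x972 desktop with 96 pixel borders left and right and 54 pixel borders top and bottom, which the TV then stretches back out to fill the panel.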

The HDMI spec actually defines an extension block for EDID that indicates whether the display will overscan or not, but doesn't provide any way to work out how much it'll overscan. We haven't seen many of those in the wild. It's also possible to send an HDMI information frame that indicates whether or not the video source is expecting to be overscanned, but (a) we don't do that and (b) it'll probably be ignored even if we did, because who ever tests this stuff? The HDMI spec also says that the default behaviour for 1920x1080 (but not 1366x768) should be to assume overscan. Charming.

The best thing about all of this is that the same TV will often have different behaviour depending on whether you connect via DVI or HDMI, but some TVs will still overscan DVI. Some TVs have options in the menu to disable overscan and others don't. Some monitors will overscan if you feed them an HD resolution over HDMI, so if you have HD content and don't want to lose the edges then your hardware needs to scale it down and let the display scale it back up again. It's all awful. I recommend you drink until everything's already blurry and then none of this will matter.

Syndicated 2012-01-03 17:46:40 from Matthew Garrett
