2006-10-24: A day in the life of power management
Hmm, it seems the expected flamewar is flaming even better than expected. hub
chimes in to say that power management has been working just fine on Linux
since 1999, thank you very much. Um, yes, well... sit back and relax,
because I have a story to tell. Like all my favourite stories, it involves
a lot of stupid people doing stupid things. It doesn't have a happy ending.
If you're easily depressed, you should stop reading right now. That means
you, Pierre.
Once upon a time there was APM, or
"Advanced Power Management." As with most fancy-named specifications, its
name certainly doesn't imply that there was an earlier "Totally Simple Power
Management" that it replaced. APM worked by using a kind of
"super supervisor mode" in the x86-series CPUs (System Management Mode) to
run some BIOS code above the operating system layer, invisibly, so the kernel
couldn't detect it (except when bad BIOS coding resulted in lost
clock ticks and random interrupt latency). This BIOS code would do things
like detect presses of the laptop's suspend button and forcibly suspend the
system whether the operating system liked it or not. If your operating
system supported APM, which was by no means necessary in order for
suspend/resume to work, it could get a notification from the BIOS
that it was about to suspend. In later APM specifications, they added a way
for your software to reject the suspend operation in progress.
However, because of BIOS bugs, this didn't really work so well. Also, if
you took too long to service the notification, the BIOS would just give up
on you and suspend anyway. Ah, those were the days.
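For the programmers in the audience, here's a toy model of that handshake.
It's a sketch, not real APM: the names (bios_suspend, Reply), the five-second
timeout, and the trick of having the handler report how long it took are all
made up for illustration.

```python
import enum


class Reply(enum.Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def bios_suspend(os_handler, timeout_s=5.0, can_reject=True):
    """Toy model of an APM-style suspend. Returns True if the machine suspends.

    os_handler is None when the OS knows nothing about APM; otherwise it is a
    callable returning (Reply, seconds_taken). All names and the timeout value
    are hypothetical.
    """
    if os_handler is None:
        # No APM-aware OS: the BIOS suspends anyway, behind the OS's back.
        return True
    reply, seconds_taken = os_handler()
    if seconds_taken > timeout_s:
        # The OS dawdled, so the BIOS gives up waiting and suspends.
        return True
    if reply is Reply.REJECT and can_reject:
        # Only later APM revisions let the OS veto the suspend.
        return False
    return True
```

So a prompt rejection stops the suspend, but only if the BIOS is new enough to
listen and you answer before it loses patience.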
Oh, also, because the BIOS writers for a laptop always knew exactly
which hardware devices were in the particular laptop, the APM BIOS could
always know exactly how to suspend and resume all the devices in your
laptop transparently to the kernel. And yes, it actually worked just fine.
Even the video mode was saved and restored correctly on most systems, all
behind the scenes.
But that was a long time ago. To give you an idea of how obsolete APM is,
my apmd page seems to be the
fourth hit in a Google search for APM - and it's the first one about
power management. And let me tell you, that's not because I'm such a
popular guy.
There were two perceived problems with APM. First of all, it was highly
x86-specific, which annoyed the people at Intel who were trying to make and
sell a non-Intel-compatible processor. (Remember that failed experiment?
Me neither.) Secondly, operating systems programmers correctly noted that
all BIOS programmers are crackheaded morons who can't implement an API
correctly to save their lives. The way things tended to work was this:
Windows didn't come with native APM support at the time, and the BIOS
programmers would screw up the APM implementation horribly, but that was
okay! Because every motherboard had to include a special APM driver for
Windows anyway, and this APM driver would just implement the broken APM
calls that the BIOS required, and so nobody would know the difference. Sure
enough, nobody (er, well, no Windows users) did, and the world of power
management was a fine place. For Windows users.
Linux users had a bit of trouble because they had to independently discover
all the stupid BIOS bugs in various laptop APM implementations. But they
mostly sorted this out eventually. In any case, one popular way to make
your problems go away and have suspend/resume work mostly right was to
simply disable the Linux APM altogether and just have your BIOS do it
transparently.
So anyway, back to those people who wanted to fix what wasn't particularly
broken. They invented ACPI, and I want
to kill them. Oops, I'm getting ahead of myself.
ACPI stands for Advanced Configuration and Power Interface. Now, I have two
things to say about that. First of all, there was no "Delightfully Simple
and Straightforward Configuration and Power Interface," although ACPI actually
makes APM seem that way in retrospect. And secondly, ACPI has nothing at
all to do with APIC,
the Advanced Programmable Interrupt Controller. The only comparable thing
between the two is that they both have Linux kernel boot-time options to
disable them because they both have buggy Linux drivers that cause your
computer to crash a lot.
Now where was I? Oh, right, ACPI. So, the idea of ACPI was to get the BIOS
developers out of the way on a normally-running system by turning around the
power interface: instead of the BIOS running things and just occasionally
notifying the kernel when something happened, the kernel would run
things and just ask the BIOS to do stuff occasionally, like power down
various hardware and blink the lights and so on. That would mean BIOS bugs
wouldn't be so harmful. Oh! And while we're here, because we're insane,
why not implement the whole thing using Forth-like bytecode instead of real
assembly language, so it can also run on that new (doomed) 64-bit processor
we've been working on? Forth-like bytecode is super simple and can be
implemented in a couple of kbytes, so it won't cause much overhead, and
suddenly everything will be portable. It'll be great!
Because I like foreshadowing, I'll give you the quick version of what I'm
about to say. To my total amazement, they managed to fail totally on
all counts. How's that for consistency?
First of all, ACPI completely and utterly fails to remove the BIOS from the
picture: in fact, you're calling back and forth to it far more than you ever
did with APM. And because you're calling it from your context
instead of it having its well-known super-supervisor context, it's more
likely to get confused by the funny way you do your stack or registers or
memory protection modes. And there are so many ways to call
into ACPI, because they broke it into tiny pieces so your kernel can
control exactly what it wants to when it wants to... except that the BIOS
manufacturers didn't actually test what happens when you call their
stuff in random order, so actually you have to reverse engineer exactly what
order is safe to call things in, or your system crashes horribly.
Oh, also, the bytecode thing went awry somewhere, because the Linux ACPI
implementation runs to hundreds of kbytes and is filled with all
kinds of weird and very complicated special case drivers for obviously
totally dissimilar APIs like fans (it goes fast! it goes slow!) and CPUs
(it goes fast! it goes slow!) and LCD backlights (it gets bright! it gets
dark!). And naturally, ACPI, being a big horrible pile of crap, was barely
adopted outside x86 (that doomed 64-bit processor aside), so its CPU
independence doesn't help.
(Around the time all this garbage was being invented, people were trying to
make non-x86 platforms run PCI video cards, which was tough because the
video card initialization code was written in x86 assembly. The XFree86
group and other groups solved this problem unilaterally in a less
elegant-sounding but actually working way. To this day, video BIOSes are
still in x86 machine code, not bytecode.)
But that's not all!
Remember, the OS developers wanted to get the BIOS developers out of the
picture, because BIOS developers are indeed crackheaded morons - I think we
can all agree on that. Unfortunately, while they completely failed to do
this - and in fact, ACPI makes things much worse - they also made it
so the BIOS developers can happily just disclaim any responsibility for
whatever parts of power management they don't want. Once upon a time, in
the golden age of APM, the BIOS had to support all your
devices because it was the BIOS whose @#$! responsibility it was to
suspend and resume everything. Now, however, the OS is expected to pick up
wherever the BIOS leaves off - which is almost everywhere. That means most
ACPI BIOS implementations don't actually handle any parts of the
suspend/resume process properly, often including even the CPU speed.
Certainly they're not smart enough to suspend your ethernet chip. And
heaven help you if you want your video mode to be restored! My now-stolen
last laptop, a Sony Vaio, actually had an ACPI interface to control the LCD
backlight... but it didn't do anything. There was a totally different
non-ACPI backlight control elsewhere in the system, and the BIOS developers
simply didn't bother to take out the ACPI one leftover from a previous
laptop model. The OS has to know this, based on the laptop model number,
and deal with it.
But that's okay! Because the Windows driver programmers, sitting right next
to the BIOS programmers or maybe the hardware designers, can simply
compensate for all this stuff. The CD that comes with every laptop contains
modified drivers for all the broken stuff the hardware and BIOS designers
did wrong when building your system in the first place, so everything is
fine! For Windows users.
Now, Linux certainly didn't have it all easy in the days of APM, but things
had a pretty good chance of working because they were relatively simple.
With ACPI, everything is just a total disaster. You have to implement
suspend-to-disk all by yourself, and it's doomed to suck because the stupid
BIOS does its time-consuming and useless initialization before you're even
allowed to start. You have to implement power saving features in every
single driver, where with APM you didn't have to do it in any
driver. Linux developers are notoriously bad at handling exceptional
conditions, and power management is one, so the power management code for
most drivers is almost untested, barely working garbage. And of course, you
do still have to call into the
mostly-but-unfortunately-not-totally-useless ACPI BIOS, which for some
reason takes hundreds of kbytes of source code to do and requires
talking to a horrendously buggy BIOS. That means you need an exception
table listing every laptop anyway to tell the kernel which bugs you need to
work around at which time.
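In case "exception table" sounds abstract, here's roughly the shape of the
thing. This is a minimal sketch under invented names: the model string, the
flag, and the functions are hypothetical, not taken from any real kernel
table.

```python
# Hypothetical quirk table: (vendor, model) -> set of workaround flags.
# The model string and the flag name are invented for illustration.
QUIRKS = {
    ("Sony", "EXAMPLE-VAIO"): {"acpi_backlight_does_nothing"},
}


def quirks_for(vendor, model):
    """Return the workaround flags for this laptop (empty set if none known)."""
    return QUIRKS.get((vendor, model), set())


def backlight_backend(vendor, model):
    """Pick which backlight control to trust on this particular laptop."""
    if "acpi_backlight_does_nothing" in quirks_for(vendor, model):
        # The ACPI method exists but is a leftover; use the other control.
        return "vendor-specific"
    return "acpi"
```

Now multiply that by every laptop model ever shipped and every bug each one
has, and you get the idea.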
And if you do all that stuff correctly, your laptop will suspend and
resume properly!
And you know what? Even then, it'll still suck, because that's just what
you have to do to make it barely work at all. If you want to, say,
bypass the stupid BIOS POST phase to make it boot faster, or do what Apple
does and actually save to disk and memory at suspend time, then
resume from memory whenever possible, or have the system suspend to memory
and then suspend to disk if you stay suspended for a long time, or any of
that other complicated stuff: that's all extra. Meanwhile, most hardware
developers are a bunch of slackers and even when you do suspend the
bloody thing properly, the battery dies in a few hours anyhow.
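The two "extra" policies above are easy enough to state as code, which is
what makes it so sad. A sketch, with a made-up one-hour threshold and
hypothetical function names; the hard part is everything underneath, not
this.

```python
def sleep_state(seconds_asleep, hibernate_after_s=3600):
    """Suspend to memory first; fall back to disk after a long sleep.

    The one-hour default is an arbitrary illustrative value.
    """
    return "disk" if seconds_asleep >= hibernate_after_s else "memory"


def resume_source(ram_still_powered, disk_image_saved):
    """Image saved to both disk and memory at suspend time: prefer memory."""
    if ram_still_powered:
        return "memory"  # fast path: RAM contents survived
    if disk_image_saved:
        return "disk"    # battery died, but the saved image is still there
    return "cold boot"   # nothing to resume from
```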
So kudos to the Linux developers for making it almost work. I'm sure that
was really hard. Yay team.
Compared to that, Apple cheated like crazy. But I still like my Mac,
because it actually works.
(And I have Ubuntu running constantly in a virtual machine because I mostly
hate Darwin, but that's another story.)
Syndicated 2006-10-24 01:39:02 from apenwarr's log