Reducing power consumption on Haswell and Broadwell systems
Haswell and Broadwell (Intel's previous and current generations of x86) both introduced a range of new power saving states that promised significant improvements in battery life. Unfortunately, the typical experience on Linux was an increase in power consumption. The reasons why are kind of complicated and distinctly unfortunate, and I'm at something of a loss as to why none of the companies who get paid to care about this kind of thing seemed to actually be caring until I got a Broadwell and looked unhappy, but here we are so let's make things better.
Recent Intel mobile parts have the Platform Controller Hub (Intel's term for the Southbridge, the chipset component responsible for most system i/o like SATA and USB) integrated onto the same package as the CPU. This makes it easier to implement aggressive power saving - the CPU package already has a bunch of hardware for turning various clock and power domains on and off, and these can be shared between the CPU, the GPU and the PCH. But that also introduces additional constraints, since if any component within a power management domain is active then the entire domain has to be enabled. We've pretty much been ignoring that.
The tldr is that Haswell and Broadwell are only able to get into deeper package power saving states if several different components are in their own power saving states. If the CPU is active, you'll stay in a higher-power state. If the GPU is active, you'll stay in a higher-power state. And if the PCH is active, you'll stay in a higher-power state. The last one is the killer here. Having a SATA link in a full-power state is sufficient to keep the PCH active, and that constrains the deepest package power savings state you can enter.
SATA power management on Linux is in a kind of odd state. We support it, but we don't enable it by default. In fact, right now we even remove any existing SATA power management configuration that the firmware has initialised. Distributions don't enable it by default because there are horror stories about some combinations of disk and controller and power management configuration resulting in corruption and data loss and apparently nobody had time to investigate the problem.
I did some digging and it turns out that our approach isn't entirely inconsistent with the industry. The default behaviour on Windows is pretty much the same as ours. But vendors don't tend to ship with the Windows AHCI driver, they replace it with the Intel Rapid Storage Technology driver - and it turns out that that has a default-on policy. But to make things even more awkwad, the policy implemented by Intel doesn't match any of the policies that Linux provides.
In an attempt to address this, I've written some patches. The aim here is to provide two new policies. The first simply inherits whichever configuration the firmware has provided, on the assumption that the system vendor probably didn't configure their system to corrupt data out of the box[1]. The second implements the policy that Intel use in IRST. With luck we'll be able to use the firmware settings by default and switch to the IRST settings on Intel mobile devices.
This change alone drops my idle power consumption from around 8.5W to about 5W. One reason we'd pretty much ignored this in the past was that SATA power management simply wasn't that big a win. Even at its most aggressive, we'd struggle to see 0.5W of saving. But on these new parts, the SATA link state is the difference between going to PC2 and going to PC7, and the difference between those states is a large part of the CPU package being powered up.
But this isn't the full story. There's still work to be done on other components, especially the GPU. Keeping the link between the GPU and an internal display panel active is both a power suck and requires additional chipset components to be powered up. Embedded Displayport 1.3 introduced a new feature called Panel Self-Refresh that permits the GPU and the screen to negotiate dropping the link, leaving it up to the screen to maintain its contents. There's patches to enable this on Intel systems, but it's still not turned on by default. Doing so increases the amount of time spent in PC7 and brings corresponding improvements to battery life.
This trend is likely to continue. As systems become more integrated we're going to have to pay more attention to the interdependencies in order to obtain the best possible power consumption, and that means that distribution vendors are going to have to spend some time figuring out what these dependencies are and what the appropriate default policy is for their users. Intel's done the work to add kernel support for most of these features, but they're not the ones shipping it to end-users. Let's figure out how to make this right out of the box.
[1] This is not necessarily a good assumption, but hey, let's see
comments