6 Jul 2012 mjg59

DMI checks are a last resort

Most x86 devices export various bits of system information via SMBIOS, including the system manufacturer, model and firmware version. This makes it possible for the kernel to alter its behaviour depending on the machine it's running on, usually referred to as "DMI quirking". This is a very attractive approach to handling machine-specific bugs. Unfortunately, it also means that we often end up working around symptoms with no understanding of the underlying issue.
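
In the kernel these quirks are written as tables of `struct dmi_system_id` entries that are checked with `dmi_check_system()`. What follows is a simplified user-space model in Python, not kernel code. The `DmiInfo` and `Quirk` types, the machine names and the `force_alternate_reboot` workaround are all hypothetical, and matching uses exact string comparison for simplicity. It illustrates why a quirk table only helps the machines someone has already reported.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-in for the SMBIOS identification strings.
@dataclass(frozen=True)
class DmiInfo:
    sys_vendor: str
    product_name: str
    bios_version: str


# One quirk entry: every field in `matches` must equal the machine's value.
@dataclass(frozen=True)
class Quirk:
    ident: str                       # human-readable description of the machine
    matches: Dict[str, str]          # DmiInfo field name -> required exact value
    apply: Callable[[DmiInfo], str]  # workaround to run on a match


def force_alternate_reboot(dmi: DmiInfo) -> str:
    # Hypothetical workaround: stands in for selecting a different reboot method.
    return f"alternate reboot method for {dmi.product_name}"


QUIRKS: List[Quirk] = [
    Quirk(
        "Example Vendor Model 1000",
        {"sys_vendor": "Example Vendor", "product_name": "Model 1000"},
        force_alternate_reboot,
    ),
]


def check_quirks(table: List[Quirk], dmi: DmiInfo) -> List[str]:
    """Run every quirk whose match fields all equal the machine's DMI strings."""
    applied = []
    for quirk in table:
        if all(getattr(dmi, field) == value for field, value in quirk.matches.items()):
            applied.append(quirk.apply(dmi))
    return applied


listed = DmiInfo("Example Vendor", "Model 1000", "A01")
unlisted = DmiInfo("Example Vendor", "Model 1001", "A01")  # same bug, no table entry

print(check_quirks(QUIRKS, listed))
print(check_quirks(QUIRKS, unlisted))
```

The second call returns an empty list. In this model a sibling machine with the same firmware bug gets no workaround until someone reports it and another entry is added.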

Almost all x86 hardware is tested with Windows. For the most part vendors don't ship devices that don't work: if you install a stock copy of Windows on a system, you expect it to boot successfully and reboot properly. Basic ACPI functionality should be present and correct, including processor power-saving states. Time should pass at something approximating the real rate. And since stock Windows isn't updated with large numbers of DMI quirk entries, the hardware has to manage all of this with an operating system that has already shipped.

So if Linux doesn't provide the same level of functionality, we're doing something different to Windows. Sometimes this is because we're doing something fundamentally different with the hardware. The HP NX6125, for example, resets its thermal trip points if the timer is set up in a different way to Windows. In this case we've decided that the additional functionality of doing things the Linux way is worth it, and we'll just blacklist the small set of machines that are broken by it.

But other times there's no need for the difference. For years we were triggering system reboots in a different way to Windows and then adding DMI workarounds for any systems that didn't work. The problem with this approach is that it's basically impossible to guarantee that you've found the full set of broken hardware. It's very easy to add another DMI quirk, but that doesn't solve the problem for anyone who bought a machine, tried Linux and gave up when they found that reboot didn't work. More recently we've added a pile of quirks for Dells, and it turns out that in every case we're working around a bug in their firmware that hangs if we're using VT-d.

We typically don't merge code that fixes a specific example of a problem without at least considering whether there's a wider class of similar problems that could all be solved at once. We should take the same attitude to DMI quirks. Sometimes they're the least harmful way of handling an issue, but most of the time they're just fixing one person's itch and leaving a larger number of people to continue cursing at Linux. There should at least be a demonstration of an attempt to understand the underlying problem before just adding another quirk, and every patch that permits the removal of some DMI checks should be greeted with great cheer.


Syndicated 2012-07-06 14:19:43 from Matthew Garrett
