Older blog entries for mjg59 (starting at number 447)

Convenience, security and freedom - can we pick all three?

Moxie, the lead developer of the Signal secure communication application, recently blogged on the tradeoffs between providing a supportable federated service and providing a compelling application that gains significant adoption. There's a set of perfectly reasonable arguments there that I don't want to rehash - regardless of how you feel about the benefits of federation in general, there's certainly an increase in engineering cost in providing a stable inter-server protocol that still allows for the addition of new features, and the person leading a project gets to make the decision about whether that's a valid tradeoff.

One voiced complaint about Signal on Android is the fact that it depends on Google Play Services. These are a collection of proprietary functions for integrating with Google-provided services, and Signal depends on them to provide a good out-of-band notification protocol, allowing Signal to be notified when new messages arrive even if the phone is otherwise in a power-saving state. At the time this decision was made, there were no terribly good alternatives for Android. Even now, nobody's really demonstrated a free implementation that supports several million clients and has no negative impact on battery life, so if your aim is to write a secure messaging client that will be adopted by as many people as possible, keeping this dependency is entirely rational.

On the other hand, there are users for whom the decision not to install a Google root of trust on their phone is also entirely rational. I have no especially good reason to believe that Google will ever want to do something inappropriate with my phone or data, but it's certainly possible that they'll be compelled to do so against their will. The set of people who will ever actually face this problem is probably small, but it's probably also the set of people who benefit most from Signal in the first place.

(Even ignoring the dependency on Play Services, people may not find the official client sufficient - it's very difficult to write a single piece of software that satisfies all users, whether that's down to accessibility requirements, OS support or whatever. Slack may be great, but there are still people who choose to use Hipchat.)

This shouldn't be a problem. Signal is free software and anybody is free to modify it in any way they want to fit their needs, and as long as they don't break the protocol code in the process it'll carry on working with the existing Signal servers and allow communication with people who run the official client. Unfortunately, Moxie has indicated that he is not happy with forked versions of Signal using the official servers. Since Signal doesn't support federation, that means that users of forked versions will be unable to communicate with users of the official client.

This is awkward. Signal is deservedly popular. It provides strong security without being significantly more complicated than a traditional SMS client. In my social circle there are massively more users of Signal than of any other secure messaging app. If I transition to a fork of Signal, I'm no longer able to securely communicate with them unless they also install the fork. If the aim is to make secure communication ubiquitous, that's kind of a problem.

Right now the choices I have for communicating with people I know are either convenient and secure but require non-free code (Signal), convenient and free but insecure (SMS) or secure and free but horribly inconvenient (gpg). Is there really no way for us to work as a community to develop something that's all three?

Syndicated 2016-05-12 14:40:00 from Matthew Garrett

Circumventing Ubuntu Snap confinement

Ubuntu 16.04 was released today, with one of the highlights being the new Snap package format. Snaps are intended to make it easier to distribute applications for Ubuntu - they include their dependencies rather than relying on the archive, they can be updated on a schedule that's separate from the distribution itself and they're confined by a strong security policy that makes it impossible for an app to steal your data.

At least, that's what Canonical assert. It's true in a sense - if you're using Snap packages on Mir (ie, Ubuntu mobile) then there's a genuine improvement in security. But if you're using X11 (ie, Ubuntu desktop) it's horribly, awfully misleading. Any Snap package you install is completely capable of copying all your private data to wherever it wants with very little difficulty.

The problem here is the X11 windowing system. X has no real concept of different levels of application trust. Any application can register to receive keystrokes from any other application. Any application can inject fake key events into the input stream. An application that is otherwise confined by strong security policies can simply type into another window. An application that has no access to any of your private data can wait until your session is idle, open an unconfined terminal and then use curl to send your data to a remote site. As long as Ubuntu desktop still uses X11, the Snap format provides you with very little meaningful security. Mir and Wayland both fix this, which is why Wayland is a prerequisite for the sandboxed xdg-app design.
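
To make the injection half of that concrete, here's a minimal sketch using python-xlib and the XTest extension. Nothing here requires any privilege - it's just an ordinary X11 client, and the three-second sleep is only there to give you time to focus some other window:

    import time
    from Xlib import X, XK, display
    from Xlib.ext import xtest

    d = display.Display()
    time.sleep(3)  # time to give some other window focus

    # Synthesise a keystroke. It's delivered to whatever window has
    # focus, which may belong to a far more privileged application.
    keycode = d.keysym_to_keycode(XK.string_to_keysym("a"))
    xtest.fake_input(d, X.KeyPress, keycode)
    xtest.fake_input(d, X.KeyRelease, keycode)
    d.sync()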

I've produced a quick proof of concept of this. Grab XEvilTeddy from git, install Snapcraft (it's in 16.04), run snapcraft snap, sudo snap install xevilteddy*.snap, then /snap/bin/xevilteddy.xteddy. An adorable teddy bear! How cute. Now open Firefox and start typing, then check back in your terminal window. Oh no! All my secrets. Open another terminal window and give it focus. Oh no! An injected command that could instead have been a curl session that uploaded your private SSH keys to somewhere that's not going to respect your privacy.

The Snap format provides a lot of underlying technology that is a great step towards being able to protect systems against untrustworthy third-party applications, and once Ubuntu shifts to using Mir by default it'll be much better than the status quo. But right now the protections it provides are easily circumvented, and it's disingenuous to claim that it currently gives desktop users any real security.

Syndicated 2016-04-22 01:51:19 from Matthew Garrett

One more attempt at SATA power management

Around a year ago I wrote some patches in an attempt to improve power management on Haswell and Broadwell systems by configuring Serial ATA power management appropriately. I got a couple of reports of them triggering SATA errors for some users, couldn't reproduce them myself and so didn't have a lot of confidence in them. Time passed.

I've been working on power management stuff again this week, so it seemed like a good opportunity to revisit these. I've made a few changes and pushed a couple of trees - one against master and one against 4.5.

First, these probably only have relevance to users of mobile Intel parts in the U or S range (/proc/cpuinfo will tell you - you're looking for a four-digit number that starts with 4 (Haswell), 5 (Broadwell) or 6 (Skylake) and ends with U or S), and won't do anything unless you have SATA drives (including PCI-based SATA). To test them, first disable anything like TLP that might alter your SATA link power management policy. Then check powertop - you should only be getting to PC3 at best. Build a kernel with these patches and boot it. /sys/class/scsi_host/*/link_power_management_policy should read "firmware". Check powertop and see whether you're getting into deeper PC states. Now run your system for a while and check the kernel log for any SATA errors that you didn't see before.

Let me know if you see SATA errors and are willing to help debug this, and leave a comment if you don't see any improvement in PC states.

Syndicated 2016-04-18 02:15:58 from Matthew Garrett

David MacKay

The first time I was paid to do software development came as something of a surprise to me. I was working as a sysadmin in a computational physics research group when a friend asked me if I'd be willing to talk to her PhD supervisor. I had nothing better to do, so said yes. And that was how I started the evening having dinner with David MacKay, and ended the evening better fed, a little drunker and having agreed in principle to be paid to write free software.

I'd been hired to work on Dasher, an information-efficient text entry system. It had been developed by one of David's students as a practical demonstration of arithmetic encoding after David had realised that presenting a visualisation of an effective compression algorithm allowed you to compose text without having to enter as much information into the system. At first this was merely a neat toy, but it soon became clear that the benefits of Dasher had a great deal of overlap with good accessibility software. It required much less precision of input, it made it easy to correct mistakes (you merely had to reverse direction in order to start zooming back out of the text you had entered) and it worked with a variety of input technologies from mice to eye tracking to breathing. My job was to take this codebase and turn it into a project that would be interesting to external developers.

In the year I worked with David, we turned Dasher from a research project into a well-integrated component of Gnome, improved its support for Windows, accepted code from an external contributor who ported it to OS X (using an OpenGL canvas!) and wrote ports for a range of handheld devices. We added code that allowed Dasher to directly control the UI of other applications, making it possible for people to drive word processors without having to leave Dasher. We taught Dasher to speak. We strove to avoid the mistakes present in so many other pieces of accessibility software, such as configuration that could only be managed by an (expensive!) external consultant. And we visited Dasher users and learned how they used it and what more they needed, then went back home and did what we could to provide that.

Working on Dasher was an incredible opportunity. I was involved in the development of exciting code. I spoke on it at multiple conferences. I became part of the Gnome community. I visited the USA for the first time. I entered people's homes and taught them how to use Dasher and experienced their joy as they realised that they could now communicate up to an order of magnitude more quickly. I wrote software that had a meaningful impact on the lives of other people.

Working with David was certainly not easy. Our weekly design meetings were, charitably, intense. He had an astonishing number of ideas, and my job was to figure out how to implement them while (a) not making the application overly complicated and (b) convincing David that it still did everything he wanted. One memorable meeting involved me gradually arguing him down from wanting five new checkboxes to agreeing that there were only two combinations that actually made sense (and hence a single checkbox) - and then admitting that this was broadly equivalent to an existing UI element, so we could just change the behaviour of that slightly without adding anything. I took the opportunity to delete an additional menu item in the process.

I was already aware of the importance of free software in terms of developers, but working with David made it clear to me how important it was to users as well. A community formed around Dasher, helping us improve it and allowing us to develop support for new use cases that made the difference between someone being able to type at two words per minute and being able to manage twenty. David saw that this collaborative development would be vital to creating something bigger than his original ideas, and it succeeded in ways he couldn't have hoped for.

I spent a year in the group and then went back to biology. David went on to channel his strong feelings about social responsibility into issues such as sustainable energy, writing a freely available book on the topic. He served as chief adviser to the UK Department of Energy and Climate Change for five years. And earlier this year he was awarded a knighthood for his services to scientific outreach.

David died yesterday. It's unlikely that I'll ever come close to what he accomplished, but he provided me with much of the inspiration to try to do so anyway. The world is already a less fascinating place without him.

Syndicated 2016-04-15 06:26:14 from Matthew Garrett

Skylake's power management under Linux is dreadful and you shouldn't buy one until it's fixed

Linux 4.5 seems to have got Intel's Skylake platform (ie, 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.

The best thing about this is the following statement from page 64 of the 6th Generation Intel® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

Syndicated 2016-04-13 20:22:40 from Matthew Garrett

Making it easier to deploy TPMTOTP on non-EFI systems

I've been working on TPMTOTP a little this weekend. I merged a pull request that adds command-line argument handling, which includes the ability to choose the set of PCRs you want to seal to without rebuilding the tools, and also lets you print the base32 encoding of the secret rather than a QR code so you can import it into a wider range of devices. More importantly, it also adds support for setting the expected PCR values on the command line rather than reading them out of the TPM, so you can now re-seal the secret against new values before rebooting.
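
The base32 string is a standard TOTP seed, so anything implementing RFC 6238 can verify the code the machine displays. For reference, here's a minimal sketch of that computation, assuming the usual defaults (HMAC-SHA1, 30-second time steps, six digits):

    import base64, hashlib, hmac, struct, time

    def totp(secret_b32, interval=30, digits=6):
        # RFC 6238: HMAC-SHA1 over the big-endian time step counter,
        # followed by RFC 4226 dynamic truncation
        key = base64.b32decode(secret_b32.upper())
        counter = struct.pack(">Q", int(time.time()) // interval)
        digest = hmac.new(key, counter, hashlib.sha1).digest()
        offset = digest[-1] & 0x0f
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7fffffff
        return "%0*d" % (digits, code % 10 ** digits)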

I also wrote some new code myself. TPMTOTP is designed to be usable in the initramfs, allowing you to validate system state before typing in your passphrase. Unfortunately the initramfs itself is one of the things that's measured. So, you end up with something of a chicken-and-egg problem - TPMTOTP needs access to the secret, and the obvious thing to do is to put the secret in the initramfs. But the secret is sealed against the hash of the initramfs, and so you can't seal the secret until the initramfs has been built. Modify the initramfs to insert the secret and you change the hash, so the secret is no longer released. Boo.

On EFI systems you can handle this by sticking the secret in an EFI variable (there's some special-casing in the code to deal with the additional metadata on the front of things you read out of efivarfs). But that's not terribly useful if you're not on an EFI system. Thankfully, there's a way around this. TPMs have a small quantity of nvram built into them, so we can stick the secret there. If you pass the -n argument to sealdata, that'll happen. The unseal apps will attempt to pull the secret out of nvram before falling back to looking for a file, so things should just magically work.

I think it's pretty feature-complete now, other than TPM2 support? That's on my list.

Syndicated 2016-04-11 05:59:32 from Matthew Garrett

There's more than one way to exploit the commons

There's a piece of software called XScreenSaver. It attempts to fill two somewhat disparate roles:

  • Provide a functioning screen lock on systems using the X11 windowing system, a job made incredibly difficult due to a variety of design misfeatures in said windowing system[1]
  • Provide cute graphical output while the screen is locked
XScreenSaver does an excellent job of the second of these[2] and is pretty good at the first, which is to say that it only suffers from a disastrous security flaw once every few years and as such is certainly not appreciably worse than any other piece of software.

Debian ships an operating system that prides itself on stability. The Debian definition of stability is a very specific one - rather than referring to how often the software crashes or misbehaves, it refers to how often the software changes behaviour. Debian is very reluctant to upgrade software that is part of a stable release, to the extent that developers will attempt to backport individual security fixes to the version they shipped rather than upgrading to a release that contains all those security fixes but also adds a new feature. The argument here is that the new release may also introduce new bugs, and Debian's users desire stability (in the "things don't change" sense) more than new features. Backporting security fixes keeps them safe without compromising the reason they're running Debian in the first place.

This all makes plenty of sense at a theoretical level, but reality is sometimes less convenient. The first problem is that security bugs are typically also, well, bugs. They may make your software crash or misbehave in annoying but apparently harmless ways. And when you fix that bug you've also fixed a security bug, but the ability to determine whether a bug is a security bug or not is one that involves deep magic and a fanatical devotion to the cause. So, given the choice between maybe asking for a CVE and dealing with embargoes and all that crap when perhaps you've actually only fixed a bug that makes the letter "E" appear in places it shouldn't rather than one that allows the complete destruction of your intergalactic invasion fleet, people will tend to err on the side of "Eh, fuckit" and go drinking instead. As a result, new versions of software will often fix security vulnerabilities without there being any indication that they do so[3], and running old versions probably means you have a bunch of security issues that nobody will ever do anything about.

But that's broadly a technical problem and one we can apply various metrics to, and if somebody wanted to spend enough time performing careful analysis of software we could have actual numbers to figure out whether the better security approach is to upgrade or to backport fixes. Conversations become boring once we introduce too many numbers, so let's ignore that problem and go on to the second, which is far more handwavy and social and so significantly more interesting.

The second problem is that upstream developers remain associated with the software shipped by Debian. Even though Debian includes a tool for reporting bugs against packages included in Debian, some users will ignore that and go straight to the upstream developers. Those upstream developers then have to spend at least 15 or so seconds telling the user that the bug they're seeing has been fixed for some time, and then figure out how to explain that no, sorry, they can't make Debian include a fixed version because that's not how things work. Worst case, the stable release of Debian ends up including a bug that makes software just basically not work at all, everybody who uses it assumes that the upstream author is brutally incompetent, and they end up quitting the software industry and, I don't know, running a nightclub or something.

From the Debian side of things, the straightforward solution is to make it more obvious that users should file bugs with Debian and not bother the upstream authors. This doesn't solve the problem of damaged reputation, and nor does it entirely solve the problem of users contacting upstream developers. If a bug is filed with Debian and doesn't get fixed in a timely manner, it's hardly surprising that users will end up going upstream. The Debian bugs list for XScreenSaver does not make terribly attractive reading.

So, coming back to the title for this entry. The most obvious failure of the commons is where a basically malicious actor consumes while giving nothing back, but if an actor with good intentions ends up consuming more than they contribute that may still be a problem. An upstream author releases a piece of software under a free license. Debian distributes this to users. Debian's policies result in the upstream author having to do more work. What does the upstream author get out of this exchange? In an ideal world, plenty. The author's software is made available to more people. A larger set of developers is willing to work on making improvements to the software. In a less ideal world, rather less. The author has to deal with bug mail about already fixed bugs. The author's reputation may be harmed by user exposure to said fixed bugs. The author may get less in the way of useful bug fixes or features because people are running old versions rather than fixing new ones. If the balance tips towards the latter, the author's decision to release their software under a free license has made their life more difficult.

Most discussions about Debian's policies entirely ignore the latter scenario, focusing more on the fact that the author chose to release their software under a free license to begin with. If the author is unwilling to handle the consequences of that, goes the argument, why did they do it in the first place? The unfortunate logical conclusion to that argument is that the author realises that they made a huge mistake and never does so again, and woo uh oops.

The irony here is that one of Debian's foundational documents, the Debian Free Software Guidelines, makes allowances for this. Section 4 allows for distribution of software in Debian even if the author insists that modified versions[4] are renamed. This allows for an author to make a choice - allow themselves to be associated with the Debian version of their work and increase (a) their userbase and (b) their support load, or try to distinguish what Debian ship from their identity. But that document was ratified in 1997 and people haven't really spent much time since then thinking about why it says what it does, and so this tradeoff is rarely considered.

Free software doesn't benefit from distributions antagonising their upstreams, even if said upstream is a cranky nightclub owner. Debian's users are Debian's highest priority, but those users are going to suffer if developers decide that not using free licenses improves their quality of life. Kneejerk reactions around specific instances aren't helpful, but now is probably a good time to start thinking about what value Debian brings to its upstream authors and how that can be increased. Failing to do so doesn't serve users, Debian itself or the free software community as a whole.

[1] The X server has no fundamental concept of a screen lock. This is implemented by an application asking that the X server send all keyboard and mouse input to it rather than to any other application, and then that application creating a window that fills the screen. Due to some hilarious design decisions, opening a pop-up menu in an application prevents any other application from being able to grab input and so it is impossible for the screensaver to activate if you open a menu and then walk away from your computer. This is merely the most obvious problem - there are others that are more subtle and more infuriating. The only fix in this case is to nuke the site from orbit.

[2] There are screenshots here. My favourites are the one that emulates the electrical characteristics of an old CRT in order to present a more realistic depiction of the output of an Apple 2, and the one that includes a complete 6502 emulator.

[3] And obviously new versions of software will often also introduce new security vulnerabilities without there being any indication that they do so, because who would ever put that in their changelog. But the less ethically challenged members of the security community are more likely to be looking at new versions of software than ones released three years ago, so you're probably still tending towards winning overall.

[4] There's a perfectly reasonable argument that all packages distributed by Debian are modified in some way

Syndicated 2016-04-05 07:18:20 from Matthew Garrett

TPMs, event logs, fine-grained measurements and avoiding fragility in remote-attestation

Trusted Platform Modules are fairly unintelligent devices. They can do some crypto, but they don't have any ability to directly monitor the state of the system they're attached to. This is worked around by having each stage of the boot process "measure" state into registers (Platform Configuration Registers, or PCRs) in the TPM by taking the SHA1 of the next boot component and performing an extend operation. Extend works like this:

New PCR value = SHA1(current value||new hash)

ie, the TPM takes the current contents of the PCR (a 20-byte register), concatenates the new SHA1 to the end of that in order to obtain a 40-byte value, takes the SHA1 of this 40-byte value to obtain a 20-byte hash and sets the PCR value to this. This has a couple of interesting properties:
  • You can't directly modify the contents of the PCR. In order to obtain a specific value, you need to perform the same set of writes in the same order. If you replace the trusted bootloader with an untrusted one that runs arbitrary code, you can't rewrite the PCR to cover up that fact
  • The PCR value is predictable and can be reconstructed by replaying the same series of operations
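
A minimal simulation of the extend operation makes both properties concrete (the component names here are placeholders - in reality you'd be extending the hashes of actual boot stages):

    import hashlib

    def extend(current_pcr, new_hash):
        # SHA1 over the 20-byte current value concatenated with the
        # 20-byte hash being extended, exactly as described above
        return hashlib.sha1(current_pcr + new_hash).digest()

    pcr = b"\x00" * 20  # PCRs start out zeroed
    for component in (b"bootloader", b"kernel"):
        pcr = extend(pcr, hashlib.sha1(component).digest())
    # reproducing this value requires replaying the same writes in order
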
But how do we know what those operations were? We control the bootloader and the kernel and we know what extend operations they performed, so that much is easy. But the firmware itself will have performed some number of operations (the firmware itself is measured, as is the firmware configuration, and certain aspects of the boot process that aren't in our control may also be measured) and we may not be able to reconstruct those from scratch.

Thankfully we have more than just the final PCR value. The firmware provides an interface to log each extend operation, and you can read the event log in /sys/kernel/security/tpm0/binary_bios_measurements. You can pull information out of that log and use it to reconstruct the writes the firmware made. Merge those with the writes you performed and you should be able to reconstruct the final TPM state. Hurrah!

The problem is that a lot of what you want to measure into the TPM may vary between machines or change in response to configuration changes or system updates. If you measure every module that grub loads, and if grub changes the order that it loads modules in, you also need to update your calculations of the end result. Thankfully there's a way around this - rather than making policy decisions based on the final TPM value, just use the final TPM value to ensure that the log is valid. If you extract each hash value from the log and simulate an extend operation, you should end up with the same value as is present in the TPM. If so, you know that the log is valid. At that point you can examine individual log entries without having to care about the order that they occurred in, which makes writing your policy significantly easier.
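
Here's a sketch of that validation, assuming the pre-TPM2 SHA1 event log format (each entry being a little-endian PCR index, an event type, a 20-byte digest and a length-prefixed event description):

    import hashlib, struct

    def replay_log(path="/sys/kernel/security/tpm0/binary_bios_measurements"):
        pcrs = {}
        with open(path, "rb") as f:
            while True:
                header = f.read(32)
                if len(header) < 32:
                    break
                pcr, event_type, digest, size = struct.unpack("<II20sI", header)
                event = f.read(size)  # the human-readable(ish) description
                current = pcrs.get(pcr, b"\x00" * 20)
                pcrs[pcr] = hashlib.sha1(current + digest).digest()
        return pcrs

    # if replay_log()[n] matches PCR n as read from the TPM, the log
    # is valid and its individual entries can be examined safely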

But there's another source of fragility. Imagine that you're measuring every command executed by grub (as is the case in the CoreOS grub). You want to ensure that no inappropriate commands have been run (such as ones that would allow you to modify the loaded kernel after it's been measured), but you also want to permit certain variations - for instance, you might have a primary root filesystem and a fallback root filesystem, and you're ok with either being passed as a kernel argument. One approach would be to write two lines of policy, but there's an even more flexible approach. If the bootloader logs the entire command into the event log, when replaying the log we can verify that the event description hashes to the value that was passed to the TPM. If it does, rather than testing against an explicit hash value, we can examine the string itself. If the event description matches a regular expression provided by the policy then we're good.
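
A sketch of that check - the policy regex here is purely hypothetical, standing in for whatever your grub command lines actually look like:

    import hashlib, re

    def event_allowed(event_data, logged_digest, policy_regex):
        # the description is only trustworthy if it hashes to the
        # value that was actually extended into the TPM
        if hashlib.sha1(event_data).digest() != logged_digest:
            return False
        text = event_data.decode("ascii", "replace")
        return re.fullmatch(policy_regex, text) is not None

    # hypothetical policy permitting either root filesystem:
    # event_allowed(event, digest, r"linux /vmlinuz root=/dev/sda[12] ro")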

This approach makes it possible to write TPM policies that are resistant to changes in ordering, that permit fine-grained definition of acceptable values, and that cleanly separate local policy, generated policy values and values provided by the firmware. The split between machine-specific and OS-level policy means the static machine-specific part can be merged with whatever policy the OS provides, making remote attestation viable even across automated system upgrades.

We've implemented this kind of policy in the TPM support code we'd like to integrate into Kubernetes, and CoreOS will soon be generating known-good hashes at image build time. The combination of these means that people using Distributed Trusted Computing under Tectonic will be able to validate the state of their systems with nothing more than a minimal machine-specific policy description.

The support code for all of this should also start making it into other distributions in the near future (the grub code is already in Fedora 24), so with luck we can define a cross-distribution policy format and make it straightforward to handle this in a consistent way even in heterogeneous operating system environments. Remote attestation is a powerful tool for ensuring that your systems are in a valid state, but the difficulty of policy management has been a significant factor in making it difficult for people to deploy in their data centres. Making it easier for people to shield themselves against low-level boot attacks is a big step forward in improving the security of distributed workloads and makes bare-metal hosting a much more viable proposition.

Syndicated 2016-04-04 21:59:57 from Matthew Garrett

I stayed in a hotel with Android lightswitches and it was just as bad as you'd imagine

I'm in London for Kubecon right now, and the hotel I'm staying at has decided that light switches are unfashionable and replaced them with a series of Android tablets.

[Photo: a wall-mounted tablet displaying the dialog "UK_bathroom isn't responding. Do you want to close it?"]

One was embedded in the wall, but the two next to the bed had convenient-looking ethernet cables plugged into the wall. So.

I managed to borrow a couple of USB ethernet adapters, set up a transparent bridge (brctl addbr br0; brctl addif br0 enp0s20f0u1; brctl addif br0 enp0s20f0u2; ifconfig br0 up) and then stuck my laptop between the tablet and the wall. tcpdump -i br0 showed traffic, and wireshark revealed that it was Modbus over TCP. Modbus is a pretty trivial protocol, and notably has no authentication whatsoever. tcpdump showed that traffic was being sent to 172.16.207.14, and pymodbus let me start controlling my lights, turning the TV on and off and even making my curtains open and close. What fun!
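
For illustration, driving the controller with pymodbus looks something like this. The coil numbers are assumptions - in practice you'd find them by toggling things and watching what reacts:

    from pymodbus.client.sync import ModbusTcpClient

    # the room controller found via tcpdump; Modbus/TCP has no
    # authentication whatsoever, so this is the entire "handshake"
    client = ModbusTcpClient("172.16.207.14")
    client.connect()

    client.write_coil(1, True)        # hypothetical coil: lights on
    result = client.read_coils(1, 8)  # read back some coil states
    print(result.bits)
    client.close()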

And then I noticed something. My room number is 714. The IP address I was communicating with was 172.16.207.14. They wouldn't, would they?

I mean yes obviously they would.

It's not as bad as it could be - the only traffic I could see was from the 207 subnet, so it seems like there's a separate segment per floor. But I could query other rooms on my floor to figure out whether the lights were on or not, which strongly implies that I could control them as well. Jesus Molina talked about doing this kind of thing a couple of years ago, so it's not some kind of one-off - instead, hotels are happily deploying systems with no meaningful security, and the outcome of sending a constant stream of "Set room lights to full" and "Open curtain" commands at 3AM seems fairly predictable.

We're doomed.

Syndicated 2016-03-11 14:17:02 from Matthew Garrett

I bought some awful light bulbs so you don't have to

I maintain an application for bridging various non-Hue lighting systems to something that looks enough like a Hue that an Amazon Echo will still control them. One thing I hadn't really worked on was colour support, so I picked up some cheap bulbs and a bridge. The kit is badged as an iSuper iRainbow001, and it's terrible.

Things seemed promising enough at first, although the bulbs were alarmingly heavy (there's a significant chunk of heatsink built into them, which seems to get a lot warmer than I'd expect from something that claims a 7W power consumption). The app was a bit clunky, but eh - I wasn't planning on using it for long. I pressed the button on the bridge, launched the app and could control the bulbs. The first thing I noticed was that they had a separate "white" and "colour" mode. White mode was pretty bright, but colour mode massively less so - presumably the white LEDs are entirely independent of the RGB ones, and much higher intensity. Still, potentially useful as mood lighting.

Anyway. Next step was to start playing with the protocol, which meant finding the device on my network. I checked anything that had picked up a DHCP lease recently and nmapped them. The OS detection reported Linux, which wasn't hugely surprising - there was no GPL notice or source code included with the box, but I'm way past the point of shock at that. It also reported that there was a telnet daemon running. I connected and got a login prompt. And then I typed admin as the username and admin as the password and got a root prompt. So, there's that. The copy of Busybox included even came with tftp, so it was easy to get copies of tcpdump and strace on there to see what was up.

So. Next step. Protocol sniffing. I wanted to know how discovery worked, so I reset the device to factory settings and watched what happened. The app on my phone sent out a discovery packet on UDP port 18602 which looked like this:

INLAN:CLIP:23.21.212.48:CLPT:22345:MAC:02:00:00:00:00:00

The CLIP and CLPT fields refer to the cloud server that allows for management when you're not on the local network. The mac field contains an utterly fake address. If you send out a discovery packet and your mac hasn't been registered with the device, you get a denial back. If your mac has been (and by "mac" here I mean "utterly fake mac that's the same for all devices"), you get back a response including the device serial number and IP address. And if you just leave out the mac field entirely, you get back a response no matter whether your address is registered or not. So, that's a start. Once you've registered one copy of the app with the device, anything can communicate with it by just using the same fake mac in the discovery packets.
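
Here's a sketch of that discovery exchange with the mac field simply left out. The exact framing is inferred from the packet above, so treat it as an approximation:

    import socket

    # CLIP/CLPT copied from the sniffed packet; the MAC field is
    # omitted entirely, which the bridge accepts regardless of
    # whether you've ever registered with it
    probe = b"INLAN:CLIP:23.21.212.48:CLPT:22345"

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.settimeout(2.0)
    s.sendto(probe, ("255.255.255.255", 18602))
    try:
        data, addr = s.recvfrom(1024)
        print("bridge at %s replied: %r" % (addr[0], data))
    except socket.timeout:
        print("no bridge responded")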

Onwards. The command format turns out to be simple. They start ##, are followed by two ascii digits encoding a command, four ascii digits containing a bulb id, two ascii digits containing the number of following bytes and then the command data (in ascii). An example is:

##05010002ff

which decodes as command 5 (set white intensity) on bulb 1 with two bytes of data following, each of which is an f. This translates as "Set bulb 1 to full white intensity". I worked out the rest pretty quickly - command 03 sets the RGB colour of the bulb, 0A asks the bridge to search for new bulbs, 0B tells you which bulbs are available and their capabilities and 0E gives you the MAC addresses of the bulbs(‽). 0C crashes the server process, and 06 spews a bunch of garbage back at you and potentially crashes the bulb in a hilarious way that involves it flickering at about 15Hz. It turns out that 06 is actually the "Rename bulb" command, and if you send it less data than it's expecting something goes hilariously wrong in string parsing and everything is awful.
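
A little helper reproduces the example packet. The bulb id digit ordering, and the guess that the command and length fields are hex rather than decimal ASCII, are inferred from the handful of packets above:

    def build_command(cmd, bulb_id, data):
        # '##', two digits of command, four digits of bulb id, two
        # digits of payload length, then the ASCII payload itself
        return "##%02x%s%02x%s" % (cmd, bulb_id, len(data), data)

    # "set bulb 1 to full white intensity" - reproduces ##05010002ff,
    # where bulb 1 appears on the wire as "0100"
    build_command(0x05, "0100", "ff")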

Ok. Easy enough, but not hugely exciting. What about the remote protocol? This turns out to involve sending a login packet and then a wrapped command packet. The login has some length data, a header of "MQIsdp", a long bunch of ascii-encoded hex, a username and a password.

The username is w13733 and the password is gbigbi01. These are hardcoded in the app. The ascii-encoded hex can be replaced with 0s and the login succeeds anyway.

Ok. What about the wrapping on the command? The login never indicated which device we wanted to talk to, so presumably there's some sort of protection going on here oh wait. The command packet is a length, the serial number of the bridge and then a normal command packet. As long as you know the serial number of the device (which it tells you in response to a discovery packet, even if you're unauthenticated), you can use the cloud service to send arbitrary commands to the device (including the one that crashes the service). One of which involves the device then doing some kind of string parsing that doesn't appear to work that well. What could possibly go wrong?

Ok, so that seemed to be the entirety of the network protocol. Anything else to do? Some poking around on the bridge revealed (a) that it had an active wireless device and (b) a running DHCP server. They wouldn't, would they?

Yes. Yes, they would.

The devices all have a hardcoded SSID of "iRainbow", although they don't broadcast it. There's no security - anybody can associate. It'll then hand out an IP address. It's running telnetd on that interface as well. You can bounce through there to the owner's internal network.

So, in summary: it's a device that infringes my copyright, gives you root access in response to trivial credentials, has access control that depends entirely on nobody ever looking at the packets, is sufficiently poorly implemented that you can crash both it and the bulbs, has a cloud access protocol that has no security whatsoever and also acts as an easy mechanism for people to circumvent your network security. This may be the single worst device I've ever bought.

I called the manufacturer and they insisted that the device was designed in 2012, was no longer manufactured or supported, that they had no source code to give me and there would be no security fixes. The vendor wants me to pay for shipping back to them and reserves the right to deduct the cost of the original "free" shipping from the refund. Everything is awful, which is why I just ordered four more random bulbs to play with over the weekend.

Syndicated 2016-02-25 01:16:56 from Matthew Garrett
