Older blog entries for broonie (starting at number 138)

Expedient ABIs

The biggest change we’ve seen in the Linux kernel for ARM over the past few years has been the transition to providing descriptions of the hardware in systems via device tree. This moves the description of the devices in the system that can’t be automatically enumerated out of the kernel and into a separate binary instead of being part of the kernel binary. Currently the device tree source code for most systems that are actively used upstream is kept in the kernel, but the goal is to allow people to use device trees that are distributed separately from the kernel, especially device trees that are shipped as part of the board firmware. This is something that other platforms have done for a long time; PowerPC Macs and Sun SPARC systems both use device tree as the mechanism for describing the hardware to the operating system.

One consequence of this desire to allow the kernel and device tree to be shipped separately is that the device tree becomes an ABI. This is a really big change for people working in the embedded and consumer electronics areas where ARM has been most widely deployed: it means that any description of the hardware needs to be something that can stand the test of time, and anything we release is something we have to expect to carry code for indefinitely. When everything was done as part of the kernel binary we could easily do something that doesn’t quite represent the hardware with the intention of replacing it later; now that is much harder to do.

An example of this is the SAW in Qualcomm SoCs. This is a block in the SoC which provides control over some of the PMIC regulators used for the CPU cores in very low power states, and also allows the CPU to control those regulators with fast memory mapped registers rather than the slower buses used to talk to the PMIC. Unfortunately it doesn’t fully replace direct access to the PMIC: it supports a subset of the control we need for the PMIC but not all of it. We could represent the SAW as an independent regulator, but from a system integration point of view it is functioning as an extra control interface for the external PMIC, and if we want to use the extra functionality that is only available via direct access to the PMIC we need to take that into account and represent the SAW as an extension of it. Even if we have no need for that extra PMIC functionality at the current time, this means we have to do some extra work to describe the PMIC before we can use the SAW, even if we have no intention of using anything other than the SAW.

Now, few if any people are actually using the device tree as an ABI at present, so those working on enabling platforms often forget about the requirement and find it an obstacle: they have pressure to get things done, but not quite the same pressure to pay attention to device tree compatibility, so it can easily get overlooked. Over time this may change, especially if people start to take advantage of the device tree being an ABI so that it becomes more and more important, but for now if we want to enable that in the future it’s something we have to actively think about and work on, accepting that this means we won’t always be able to do the most expedient thing.

Syndicated 2016-02-20 17:19:41 from Technicalities

Performance problems

Just over a year ago I implemented an optimization to the SPI core code in Linux that avoids some needless context switches to a worker thread in the main data path that most clients use. This was really nice: it was simple to do but saved a bunch of work for most drivers using SPI and made things noticeably faster. The code got merged in v4.0 and that seemed to be that; I kept kicking around a few more ideas for optimizations in this area, but nothing more happened until the past month.

What happened then was that, for whatever reason, people started picking up v4.0 and using it in production more. On some systems people started seeing problems when there was heavy SPI flash usage, often during things like distribution installation. In some cases the lockup detector fired, but the most entertaining error was that on Marvell Orion systems (which are single core) the SATA controller started having trouble handling interrupts while the flash was being heavily used. These problems all bisected down to the key commit in that series, 0461a4149836c79 (spi: Pump transfers inside calling context for spi_sync()).

The problem is that there are a number of widely deployed SPI controllers out there that don’t support DMA and instead require the CPU to explicitly read and write everything sent to and from the controller via its registers. To make matters worse these register accesses will usually take many CPU cycles to complete, each one stalling the CPU while it happens. This is fine for short transfers or if the CPU has nothing else to do, but on a busy multitasking system it’s an issue. Before the optimization, the switches between the worker thread interacting with the hardware and the thread initiating the SPI operations provided breaks in this activity which allowed other things to run. Unfortunately, once we optimize those away, a thread generating a lot of work for the controller can run for a long time without pause. The fix for affected drivers, if there is no less CPU intensive way of driving the hardware, is to add some explicit sleeps into the driver itself, either at the end of transfer_one() or perhaps in an unprepare_message() function, as sketched below.
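
As a rough illustration, here is a minimal sketch of what such a fix might look like in a hypothetical PIO-only driver; the foo_* structure and helpers are invented for the example and this is not the code of any real driver:

#include <linux/delay.h>
#include <linux/spi/spi.h>

struct foo_spi;	/* driver private data, details not shown */

/* Hypothetical helpers that busy-wait data through the controller's
 * FIFO registers, stalling the CPU for the duration of the access. */
void foo_pio_write(struct foo_spi *fs, const void *buf, unsigned int len);
void foo_pio_read(struct foo_spi *fs, void *buf, unsigned int len);

static int foo_spi_transfer_one(struct spi_master *master,
				struct spi_device *spi,
				struct spi_transfer *xfer)
{
	struct foo_spi *fs = spi_master_get_devdata(master);

	if (xfer->tx_buf)
		foo_pio_write(fs, xfer->tx_buf, xfer->len);
	if (xfer->rx_buf)
		foo_pio_read(fs, xfer->rx_buf, xfer->len);

	/*
	 * Sleep briefly before returning so that on a single core
	 * system other runnable tasks get some CPU time during long
	 * sequences of transfers done in the caller's context.
	 */
	usleep_range(50, 200);

	return 0;
}

The exact length of the sleep is a tradeoff between SPI throughput and latency for everything else on the system, and would need tuning for the hardware in question.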

In a way I was quite pleased to see this: it was a clear demonstration that the optimization was having the intended effect, though obviously users of affected systems will not find that so comforting. It’s not the first time that making things faster or fixing a bug has revealed an underlying problem, and I’m sure it won’t be the last.

Syndicated 2016-02-13 00:01:12 from Technicalities

Maintaining your email

One of the difficulties of being a kernel maintainer for a busy subsystem is that you will often end up getting a lot of mail that requires reading and handling, which in turn requires sending a lot of mail out in reply. Some of that requires thought and careful consideration, but a lot of it is quite routine and (perhaps surprisingly) there is often more challenge in doing a good job of handling these routine messages.

For a long time I used to hand write every reply I sent, but the problem with doing that is that sending the same message many times tends to result in the messages getting more and more brief as they become routine and practised. Your words become more optimised, and if you’ve stopped thinking about the message before you’ve finished typing it then there’s a desire to finish the typing and get on to the next thing. I think this is where a lot of the reputation that kernel maintainers have for being terse and unhelpful comes from: messages that are very practised for someone sending them all the time aren’t always going to be obvious or helpful for someone who’s not so intimately familiar with what’s going on. The good part is that everyone gets a personalised response, and it’s easy to insert a comment about the specific situation when you’re already replying, but it’s not clear that the tradeoff is a good one.

What I’ve started doing instead for most things is keeping a set of pre-written paragraphs for common cases that I can just insert into a mail and edit as needed. Hopefully it’s working well for people; it means the replies are that bit more verbose than they might otherwise be (mainly adding an explanation of why a given thing is being asked for) but they can easily be adapted as needed. The one exception is the “Applied, thanks” mails I used to send when I apply a patch (literally just saying that). Those are now automatically generated by the script I use to sync my local git repository with kernel.org and are very much more verbose:

From: Mark Brown <broonie@kernel.org>
To: ${CCS}
Cc: ${LIST}
Subject: ${SUBJECT}
In-Reply-To: ${MSGID}

The patch

   ${TITLE}

has been applied to the ${REPO} tree at

   ${URL} ${BRANCH}

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

(unfortunately this bit seems to be something that it’s worth pointing out)

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

(the script does try to CC relevant lists). As well as giving people more information this also means that the mails only get sent out when things actually get published to my public repositories, which avoids some of the confusion that used to happen when people got my replies before I’d pushed, especially when I’d been working with poor connectivity as often happens when travelling. On the down side it’s very much an obvious form letter, which some people don’t like and which can make people glaze over.

My hope with this is to make things easier on average for patch submitters and easier for me. Feedback on the scripted e-mails appears to be good thus far, and the goal with the pasted-in content is that it should be less obvious that it’s happening, so I’d expect less feedback there.

Syndicated 2016-02-09 17:36:53 from Technicalities

Maximising the efficiency of chained regulators

Linux v4.4 will include a cool new feature contributed by Sascha Hauer of Pengutronix which propagates voltages set on a regulator to the regulators that supply it (taking into account the minimum headroom that the child regulator needs). The original reason for implementing it was to allow us to set voltages through simple unregulated power switches, but the cool bit is that we can also use it to save power in some systems.

There are two standard types of voltage regulator: DCDCs, which are very efficient but produce noisy output, and LDOs, which are much less efficient but a lot cheaper and simpler and produce much cleaner output. To avoid much of the inefficiency of LDOs a lot of systems use a DCDC to reduce the voltage from the main system power supply (eg, the battery) to something close to the minimum input the LDOs in the system need. This means that most of the voltage reduction (which is what generates inefficiency) happens in the DCDC rather than the LDO, but you still get the clean power supply from the LDO and can have several different output voltages from a single expensive DCDC. By managing the voltage we set on the DCDC at runtime depending on the LDO configurations we can maximise the power savings from this setup, putting as much of the work onto the DCDC as we can at any given moment.
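
To make the arithmetic concrete, here’s a toy model of the propagation (illustrative only, not the actual regulator core code, and with invented voltage and headroom numbers): the DCDC needs to produce at least the highest of its child LDOs’ output voltages plus that LDO’s headroom.

#include <stdio.h>

/* Toy model of the propagation: one DCDC supplying several LDOs.
 * Voltages are in microvolts, matching the regulator API convention.
 * All the numbers here are invented for illustration. */
struct ldo {
	int out_uV;      /* voltage the LDO is programmed to output */
	int headroom_uV; /* minimum dropout the LDO needs to regulate */
};

static int dcdc_target_uV(const struct ldo *ldos, int n)
{
	int target = 0;
	int i;

	for (i = 0; i < n; i++) {
		int need = ldos[i].out_uV + ldos[i].headroom_uV;

		if (need > target)
			target = need;
	}

	return target;
}

int main(void)
{
	struct ldo ldos[] = {
		{ 1800000, 100000 },	/* 1.8V rail needing 100mV headroom */
		{ 1200000, 150000 },	/* 1.2V rail needing 150mV headroom */
	};

	/* Prints 1900000: rather than running the DCDC at a fixed,
	 * always-safe voltage like 3.3V we can drop it to 1.9V, so
	 * most of the voltage reduction happens in the efficient DCDC. */
	printf("DCDC target: %duV\n", dcdc_target_uV(ldos, 2));

	return 0;
}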

This has been at the back of my mind for a long time, so I’m really pleased to see it implemented. It’s a pretty small change code-wise and probably not worth implementing for any one system, but when we do it in the core like this hopefully many systems will be able to use it and the effects will add up.

Syndicated 2015-12-08 12:52:14 from Technicalities

Unconscious biases

Matthew Garrett’s recent very good response to Eric Raymond’s recent post opposing inclusiveness efforts in free software reminded me of something I’ve been noticing more and more often: a very substantial proportion of the female developers I encounter working on the kernel are from non-European cultures where I (and I expect most people from western cultures) lack familiarity with the gender associations of all but the most common and familiar names. This could be happening for a lot of reasons. It could be better entry paths to kernel development in those cultures (though my experience visiting companies in the relevant countries makes me question that), or it could be that the sample sizes are so regrettably small that this really is just anecdote, but I worry that some of what’s going on is that the cultural differences are happening to mask and address some of the unconscious barriers that get thrown up.

Syndicated 2015-11-30 12:32:47 from Technicalities

Flashing an AT91SAM9G20-EK from bare metal

Since I just had cause to do this, and it was harder than it needed to be due to bitrot in the public documentation I could find, I thought I’d write up how to get a modern bootloader onto older Atmel boards. These instructions are written for the AT91SAM9G20-EK though they should also apply to other Atmel boards of a similar generation.

These instructions are for booting from NAND since that’s the default for the board; J34 should be fitted to enable the NAND chip select and J33 disconnected to disable the dataflash. If something broken has been programmed into flash then booting while holding down BP4 should cause the second stage bootloader to trash itself and ensure the ROM bootloader puts itself into recovery mode, or simply removing both J33 and J34 during power on will also ensure no second stage bootloader is found.

There is a ROM bootloader, but it just loads a small region from the boot media and jumps into it, which isn’t enough for u-boot, so there is a second stage bootloader called AT91Bootstrap. Download sources for current versions from github. If it (or a more sensibly written equivalent) is not yet merged upstream you’ll need to apply this patch to get it to build with a modern compiler, or you could use an old toolchain (which you’ll need in the next step anyway):

diff --git a/board/at91sam9g20ek/board.mk b/board/at91sam9g20ek/board.mk
index 45f59b1822a6..b8251ca2fbad 100644
--- a/board/at91sam9g20ek/board.mk
+++ b/board/at91sam9g20ek/board.mk
@@ -1,7 +1,7 @@
 CPPFLAGS += \
        -DCONFIG_AT91SAM9G20EK \
-       -mcpu=arm926ej-s
+       -mcpu=arm926ej-s -mfloat-abi=soft
 
 ASFLAGS += \
        -DCONFIG_AT91SAM9G20EK \
-       -mcpu=arm926ej-s
+       -mcpu=arm926ej-s -mfloat-abi=soft

Once that’s done you can build with:

make at91sam9g20eknf_uboot_defconfig
make CROSS_COMPILE=arm-linux-gnueabihf-

producing binaries/at91sam9g20ek-nandflashboot-uboot-${VERSION}.bin. This configuration will look for u-boot at 0x40000 in the flash so we need a u-boot binary. Unfortunately modern compilers seem to produce binaries that fail with no output. This is normally a sign that they need the ABI specifying more clearly as above, but I got fed up trying to spot what was missing so I used an old CodeSourcery 2013.05 release instead; hopefully future versions of u-boot will be able to build for this target with modern toolchains. Grab a recent release (I used 2015.01) and build with:

cd ${UBOOT}
make at91sam9g20ek_nandflash_defconfig
make CROSS_COMPILE=arm-linux-gnueabihf-

to get u-boot.bin.

These can then be flashed using the Atmel flashing tool SAM-BA. There is a Linux version, though it appears to rely on old versions of Tcl/Tk, so if you have trouble starting it the easiest thing is to use the sacrificial Windows laptop you’ve obtained in order to run the “entertaining” flashing tools companies sometimes provide without risking a real system (or, in my case, your shiny new laptop that you’ve not yet installed Linux on). Start it, then:

  1. Connect SAM-BA to the device following the dialog on start.
  2. Make sure you’ve selected “NandFlash” in the memory type tabs in the center of the window.
  3. Run the “Enable NandFlash” script.
  4. Run the “Erase All” script.
  5. Run the “Send Boot File” script and provide the at91bootstrap binary.
  6. Set “Send File Name” to be the u-boot binary you built earlier and “Address” to be 0x40000.
  7. Click “Send File”.
  8. Press the reset button

which should result in at91bootstrap output followed by u-boot output on the serial console. A similar process works for the AT91SAM9263; there the jumper you need is J19. (Sadly u-boot does not flash pictures of cute animals or forested shorelines on the screen as the default “Basic LCD Project 1.4” firmware does, so I’m not sure this “full operating system” thing is really delivering improved functionality.)

Syndicated 2015-04-14 18:04:43 from Technicalities

Acer Aspire E11

Recently I was in Seoul in the middle of three weeks of travel and my laptop died on me. Since I had some work that needed doing fairly urgently I went over to Yongsan Electronics Market and got myself a cheap replacement to tide me over.

What I ended up with was an Acer Aspire E11. There’s a bunch of different models, all with very similar plastics; I got one which has an N2940 SoC, 2G of RAM (upgraded to 4G in store), a 500G hard disk and no fans for just over 200,000 Korean won, or about $200. As you’d expect at that price it’s got shortcomings, but overall I’ve been extremely happy with it and it’s worth looking at if you need something cheap.

The keyboard in particular is probably the nicest I’ve used on a laptop in a long time, with a good, definite but not excessive click feel as you press. Battery life is about 5 hours as advertised, which is not wonderful but basically fine for me most of the time, and while the screen is not exactly Retina quality it’s clear, has good viewing angles, and is generally pleasant to look at. Everything is plastic but feels very solid and robust, better than a lot of more expensive devices I’ve used, and there’s not much bezel around the screen, which means it’s the first laptop I’ve had that’s been comfortable to use in a standard economy seat on a plane.

The biggest drawback is performance – it’s a little slow opening applications sometimes, and kernel builds crawl, with an x86 allmodconfig taking about one and three quarter hours. For e-mail and web browsing there’s no problem at all; I did have to move from offlineimap to mbsync to get my mail to sync in a reasonable time, but that’s more to do with the performance of offlineimap than that of the system. Overall in use it feels like the Dell I was using from about 2008-2011 or so, comfortable outside of builds, and I do appreciate having a system with no fans.

There were a couple of small tricks to getting Debian installed – this is the first system I’ve seen with secure boot enabled by default, which took me a few moments to work out (but is really good to see). Once that was disabled the install was smooth, other than being bitten by Debian bug #778810 which meant I needed a manual fixup to actually get it to boot from the disk. It’s also got a Broadcom WiFi module, which means the WiFi doesn’t work at all with mainline, but that looked to be on a standard mini PCI Express module so is easily replaceable (I happened to have a USB dongle handy so haven’t bothered), and the wired ethernet just worked.

Like I say, I’ve been very happy with it; there’s a bunch of other models with different specs for everything except the case (some touchscreen, some with small 32G eMMC drives) as well. Were it not for my need to do kernel builds I’d probably be keeping it as my primary laptop.

Syndicated 2015-04-12 18:52:10 from Technicalities

Heating the Internet of Things

The Internet of Things seems to be trendy these days: people like the shiny apps for controlling things, and typically there are claims that the devices will perform better than their predecessors by offloading things to the cloud – but this makes some people worry about potential security issues, and it’s not always clear that the internet usage is actually delivering benefits over something local. One of the more widely deployed applications is smart thermostats for central heating, which is something I’ve been playing with. I’m using Tado; there’s also at least Nest and Hive who do similar things, all relying on being connected to the internet for operation.

The main thing I’ve noticed has been that the temperature regulation in my flat is better. My previous thermostat allowed the temperature to vary by a couple of degrees around the target in winter, which was noticeable; with this one the temperature generally seems to vary by a fraction of a degree at most. That does use the internet connection to get the temperature outside, though I’m fairly sure that most of the improvement is just down to a better algorithm (the thermostat monitors how quickly the flat heats up while heating and uses this to decide when to turn off, rather than waiting for the temperature to hit the target and then seeing it rise further as the radiators cool down) and performance would still be substantially improved without it.
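
To illustrate the idea (this is purely my guess at the kind of calculation involved, not Tado’s actual algorithm, and all the numbers are invented), the control decision might look something like switching off early by the predicted overshoot:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative sketch: switch the heating off early by the amount the
 * temperature is predicted to keep rising while the radiators cool
 * down.  A real implementation would learn these parameters by
 * watching how the building responds to heating over time. */
struct heat_model {
	double heat_rate;  /* degrees per minute while heating, learned */
	double coast_mins; /* minutes the radiators keep heating after off */
};

static bool keep_heating(const struct heat_model *m,
			 double current_c, double target_c)
{
	/* Predicted further rise if we switched off right now */
	double overshoot = m->heat_rate * m->coast_mins;

	return current_c + overshoot < target_c;
}

int main(void)
{
	struct heat_model m = { 0.05, 15.0 }; /* invented numbers */

	/* With a 21C target: at 20.0C we keep heating, at 20.5C we
	 * stop and let the radiators coast the rest of the way. */
	printf("at 20.0C: %s\n", keep_heating(&m, 20.0, 21.0) ? "heat" : "off");
	printf("at 20.5C: %s\n", keep_heating(&m, 20.5, 21.0) ? "heat" : "off");

	return 0;
}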

The other thing these systems deliver, which benefits much more from the internet connection, is that it’s easy to control them remotely. This in turn makes it a lot easier to do things like turn the heating off when it’s not needed – you can do it remotely, and you can turn the heating back on without being in the flat, so you don’t need to remember to turn it off before you leave or come home to a cold building. The smarter ones do this automatically based on location detection from smartphones so you don’t need to think about it.

For example, when I started this post I was sitting in a coffee shop, so the heating had been turned off based on me taking my phone with me, and as a result the temperature had gone down a bit. By the time I got home the flat was back up to normal temperature, all without any meaningful intervention or visible difference on my part. This is particularly attractive for me given that I work from home – I can’t easily set a schedule to turn the heating off during the day like someone who works in an office, so the heating would otherwise be on a lot of the time. Tado and Nest will to varying extents try to do this automatically; I don’t know about Hive. The Tado one at least works very well, though I can’t speak to the others.

I’ve not had a bill for a full winter yet, but looking at the meter I’m fairly sure that between the two features I’m saving a substantial amount of energy (and hence money and/or the environment, depending on what you care about) and I’m also seeing a more constant temperature within the flat. My guess would be that most of the saving comes from the heating being turned off when I leave the flat. For me at least this means that having the thermostat internet connected is worthwhile.

Syndicated 2015-01-18 21:23:58 from Technicalities

Kernel build times for automated builders

Over the past year or so various people have been automating kernel builds with the aim of both setting the standard that things should build reliably and using the resulting builds for automated testing. This has been having good results; it’s especially nice to compare the results for older stable kernel builds with current ones and notice how much happier everything is.

One of the challenges with doing this is that for good coverage you really need to include allmodconfig or allyesconfig builds to cover as much kernel code as possible, but those are fairly resource intensive given the size of the kernel, especially when you want to cover several architectures. It’s also fairly important to get prompt results; development trees are changing all the time, and the longer the gap between a problem appearing and it being identified the more likely the report is to be redundant.

Since I was looking at my own setup, and I know of several people who’ve done similar benchmarking, I thought I’d publish some ballpark numbers for from-scratch allmodconfig builds on a single architecture:

i7-4770 with SSD       20 minutes
linode 2048            1.25 hours
EC2 m3.medium          1.5 hours
EC2 c3.medium          2 hours
Cubietruck with SSD    20 hours

All of these are with the number of tasks spawned by make set to the number of execution threads the system has, and with no speedups from anything like ccache. I may keep this updated in future with further results.

Obviously there are tradeoffs beyond the time, especially for someone like me doing this at home with their own resources. My desktop is substantially faster than anything else I’ve tried, but I’m also using it interactively for my work, it’s not easily accessible when I’m not at home, and the fans spin up during builds, while EC2 starts to cost noticeable money as you add more builds.

Syndicated 2015-01-14 22:37:52 from Technicalities

Adventures with ARM server

I recently got a CubieTruck with a terabyte SSD to use as a general always-on server. This being an ARM board rather than a PC (with a rather nice form factor – it’s basically the same size as an SSD) you’d normally expect a blog post about it to include instructions for kernels and patches and so on, but with these systems and current Debian testing there’s no need: Debian works out of the box (including our standard kernel) on it, the instructions worked easily, and I now have a new machine sitting quietly in the corner serving away. Sadly, being a dual core A7, it hasn’t got the grunt to replace my kernel build test system – an ARM allmodconfig takes eleven and a bit hours as opposed to a little less than twenty minutes on my desktop (which does draw well over an order of magnitude more power doing it) – but otherwise you’d never notice the difference when using the system.

The upshot of all this is that actually there’s no real adventure at all; for systems like these, where the system vendors and the communities around them are doing the right things and working well with upstream, things just work as you’d expect with minimal effort.

The one thing that’s noticeably different from installing on a PC, and really could do with improving, is that instead of being shipped as part of the board the boot firmware has to be written to an SD card. This could be addressed as easily as shipping a suitably programmed SD card in the box, even without any other modification of the hardware, though on-board flash would be even nicer.

Syndicated 2014-12-20 18:57:06 from Technicalities
