Recent blog entries for benad

On the Usability of Strings

I’ve recently read an article about why programmers should favour Python 2 over Python 3 (“The Case Against Python 3”), and most of it is an incoherent rant that exposes the author’s deep misunderstanding of how bytecode is internally used in scripting languages and how “market forces” of backwards-compatibility work against new languages. Somebody else already rebutted those arguments better than I could, and unlike the original author’s, his later edits are clear and don’t involve “it was meant as a joke”. One interesting and valid technical argument remains: Python 3’s opaque support for Unicode strings can be unintuitive for those used to manipulating strings as transparent sequences of bytes.

Many programming languages came from an era where text was either in English, or in Western languages whose characters would all neatly fit in 8-bit values. Internationalization, then, meant at worst indicating what “code page” or character encoding the text was in. When I started programming on 90s Macintosh computers, the go-to string memory representation was the Pascal string, whose first byte indicated the string length. This meant that performing the wrong memory manipulation on the string, using the wrong encoding to display it, or even attempting to display corrupted memory would at worst display 255 random characters.

There is a strong argument that UTF-8 should be used everywhere, and while it takes the occasion to educate programmers about Unicode (for a more complete “Unicode for programmers”, see this article and this more recent guide), doing so seems to conflate two different design (and usability) issues: What encoding should be used to store human-readable text, and what abstractions (if any) should programming languages offer to represent strings of text?

The “UTF-8 Everywhere” document already has strong arguments for UTF-8 as the best storage format for text, and looking at the popularity of UTF-8 in web standards, all that remains is to move legacy systems to it.

For strings in programming languages, you could imagine one that has absolutely no support for any form of strings, though it’s difficult to sell the idea of a language that doesn’t even support string literals or a “Hello World” program. The approach of “UTF-8 Everywhere” is very close to that, and seems to indicate the authors’ bias towards the C and C++ languages: Transparently use UTF-8 to store text, and shift the burden of not breaking multi-byte code points back to the programmer. The argument that counting characters, or “grapheme clusters”, is seldom needed is misleading: Splitting a UTF-8 string in the middle of a code point will break the validity of the UTF-8 sequence.
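To make the risk concrete, here is a minimal Python sketch of what happens when a UTF-8 byte sequence is cut at an arbitrary byte offset. The sample text and the helper names (decodes_cleanly, safe_prefix) are made up for illustration; they are not taken from “UTF-8 Everywhere” or from any library.

```python
text = "naïve 😀"              # 'ï' takes 2 bytes in UTF-8, the emoji takes 4
data = text.encode("utf-8")


def decodes_cleanly(raw: bytes) -> bool:
    """Return True if raw is a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def safe_prefix(raw: bytes, limit: int) -> bytes:
    """Cut raw to at most limit bytes without splitting a code point.

    UTF-8 continuation bytes all look like 0b10xxxxxx, so we back up
    until the byte at the cut position starts a new code point.
    """
    while 0 < limit < len(raw) and (raw[limit] & 0xC0) == 0x80:
        limit -= 1
    return raw[:limit]


broken = data[:3]                    # cuts between the two bytes of 'ï'
intact = text[:3].encode("utf-8")    # cuts on a code point boundary

print(decodes_cleanly(broken), decodes_cleanly(intact))
print(safe_prefix(data, 3))
```

Slicing the string abstraction (text[:3]) can never produce an invalid byte sequence, while slicing the bytes can, which is why byte-oriented code needs something like safe_prefix.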

In fact, it can be argued that programming languages that offer native abstractions of text strings not only give greater protection against accidentally building invalid byte representations, but also give the language runtime a chance to do a myriad of other worthwhile optimizations. Languages that present strings as immutable sequences of Unicode code points, or that transparently use copy-on-write when characters are changed, can optimize memory by de-duplicating identical strings. Even if de-duplication is done only for literals (like in Java), it can greatly help with memory reuse in programs that process large amounts of text. The internal memory representation of strings can even be optimized for size based on the biggest code point used in it, as Python 3.3 does.
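The size optimization mentioned for Python 3.3 can be observed, roughly, with sys.getsizeof. The sketch below assumes CPython 3.3 or later; other Python implementations may store strings differently, and bytes_per_char is a hypothetical helper written for this example.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> float:
    """Estimate per-character storage by comparing two string lengths.

    Subtracting the size of the 1-character string cancels out the
    fixed object header, leaving only the cost of n extra characters.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) / n


for label, ch in [("ASCII", "a"), ("Latin-1", "é"), ("BMP", "Ω"), ("beyond BMP", "😀")]:
    print(label, bytes_per_char(ch))
```

On CPython this should report 1, 1, 2 and 4 bytes per character respectively: the whole string is stored at the width required by its biggest code point.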

Of course, the biggest usability issue with using abstracted Unicode strings is that it forces the programmer to explicitly tell how to convert a byte sequence into a string and back. The article “The Case Against Python 3” above mentioned that the language’s runtime should automatically detect the encoding, but that is highly error-prone and CPU intensive. “UTF-8 Everywhere” argues that since both sides are using UTF-8, it boils down to a memory copy, but then breaking code points is still a risk so you’ll need some kind of UTF-8 encoder and parser.

I personally prefer the approach of most modern programming languages, including Perl, Python 3, Java, JavaScript and C#, of supporting both a string and a “char” type, and forcing the programmer to explicitly mention the input and output encoding when converting to bytes. Because Java and JavaScript were designed when it was naively assumed that the biggest code point would fit in 2 bytes, meaning before these days of emojis, they use UTF-16 and 2-byte characters, so they can still let you accidentally break code points outside the Basic Multilingual Plane, which UTF-16 stores as a pair of 2-byte surrogates (and UTF-8 as 4 bytes). Also, it would be nice to do like C# and assume by default that the encoding used when decoding or encoding is UTF-8, instead of having to explicitly say so each time like in Perl 5 and Java. Still, providing those string and “char” abstractions while using UTF-8 as the default byte representation reduces the burden on programmers when dealing with Unicode. Sure, learning about Unicode code points and how UTF-8 works is useful, but it shouldn’t be required from novice programmers who write a “Hello World” program that outputs a Unicode emoji to a text file.
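As an illustration of that last scenario, here is a short Python 3 sketch that writes a “Hello World” with an emoji to a text file, naming the encoding explicitly at each conversion. The file name is arbitrary, and the UTF-16 count is only there to show why 2-byte “char” types can split such a code point.

```python
import os
import tempfile

emoji = "😀"  # U+1F600, outside the Basic Multilingual Plane

# Python 3 strings count code points; UTF-16 based languages count
# 16-bit code units, so this single code point shows up as two "chars".
code_points = len(emoji)
utf16_units = len(emoji.encode("utf-16-le")) // 2
utf8_bytes = len(emoji.encode("utf-8"))

# String -> bytes: the encoding is stated explicitly when opening the file.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("Hello World " + emoji + "\n")

# Bytes -> string: again stated explicitly, nothing is guessed.
with open(path, "rb") as f:
    raw = f.read()
roundtrip = raw.decode("utf-8")

print(code_points, utf16_units, utf8_bytes, roundtrip)
```

The novice only has to write encoding="utf-8"; the string type takes care of never emitting half a code point.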

Syndicated 2016-12-26 15:11:21 from Benad's Blog

The Dongle Generation

Apple finally updated their MacBook Pros, and professionals weren’t impressed. Last time, with the late 2013 model, they reduced the number of ports, but having bought one I managed to live with it using a few dongle adapters. While I like that they moved to USB-C, I am annoyed that they moved to solely USB-C, since I would have to buy a new set of dongles, let alone the USB-A to USB-C adapters for “normal” USB devices. Beyond that, the specs are average for the price, and the “touch bar” has little to no use for a developer who frequently uses the function keys.

All that being said, I’m not planning to buy a new laptop until roughly a year from now. In the meantime, it does raise the question of whether MacBook Pros, let alone macOS in general, are what I need for my programming work. Each time I upgrade macOS I have to recompile countless packages from MacPorts, to the point where I realize that almost all of my work is done on command-line tools easily available on Linux. I have to constantly run a Windows 7 virtual machine, so having Windows 10 in that strange BootCamp + Parallels Desktop setup doesn’t seem to be necessary.

So I’m seriously considering buying a Linux laptop as my next programming laptop. Something like the New XPS 13 Developer Edition, the one that comes with Ubuntu, would be nice, and hopefully by next year they fix that annoying coil noise. If I feel adventurous I might take a typical ThinkPad with well-known components supported in Linux and install Linux myself. Yes, I get the irony that a “Mac guy” would buy what used to be an IBM laptop. Either way, I might save both money (even more in the former case since I don’t pay for Windows), and potentially time, since most of my development tools would be easy to set up. I might still have to buy some dongle for an Ethernet network connection if I get a thinner laptop, though interestingly both my DisplayPort-to-DVI and Thunderbolt Ethernet adapters from Apple may still work. Or I could even go with a thicker “portable computer” (like the ThinkPad P50) and use a docking connector, the 90s solution to dongles… In fact if I’m willing to let go of thinness and battery life, like I did with my first 17” MacBook Pro, I’d be able to get more storage and 32 GB of RAM.

I should admit that I have no experience with a Linux laptop or desktop, in the context of one attached to a display for day-to-day use. All my Linux systems were either “headless” or running in virtual machines, so I can’t tell if dealing with Xorg configuration files is going to be difficult or not. The same can be said for multiple displays, Bluetooth mice, and so on. But from what I’ve read, as long as I stay away from smaller ultrabooks I should be OK.

I’m not going to stop using Macs for home use, though the price increases may restrict my spending on a much-needed upgrade to my mid-2011 Mac mini. Home use now seems the natural fit for Macs anyway. Long gone is the business-oriented Apple from the mid-90s.

Syndicated 2016-11-10 08:30:07 from Benad's Blog

HTTPS, the New Standard

The “web” used to be simple. A simple plain-text protocol (HTTP) and a simple plain-text markup format (HTML). Sadly, after 20 years, things are not as simple anymore.

Nowadays, it is commonplace for ISPs to inject either “customer communications” or downright advertisements into unencrypted HTTP communications. Using web sites from an unencrypted or “open” WiFi is often a vector for a malicious user to inject viruses into any web page, let alone steal passwords and login tokens from popular web sites. On a larger scale, governments now have the capability to do deep packet inspection to systematically either censor or keep a record of all web traffic.

So, indirectly, my simple, unencrypted web site can become dangerous.

Buying an SSL certificate (actually TLS) used to be something both expensive and difficult to set up. Now with the help of “Let’s Encrypt”, any web site can be set up to use HTTPS, for free. Sure, the certificate merely says that HTTPS traffic came from the real web site, but that’s good enough. And for a personal web site, there is limited value in buying one of those expensive “Extended Validation” certificates.

This is why my web site is now using HTTPS. In fact, HTTPS only, though by doing so I’ve had to cut off browsers like Internet Explorer 6, since they do not support any of the cryptographic algorithms still considered secure. It breaks my rule of graceful degradation, but ultimately the security of people who visit my web site is more important than supporting their 15-year-old web browser.

What is sad with this though is that as older cryptographic algorithms become obsolete, so do machines too old to support the new algorithms, let alone those “Internet appliances” that aren’t supported anymore. This means that, unlike the original idea of simple, plain-text protocols, web browsers have to be at most a decade old to be usable.

And still, HTTP with TLS 1.2 is merely “good enough”. There are simply too many root certificates installed in our systems, with many from states that could hijack secure connections to popular sites by maliciously creating their own certificates for them. HTTP/2 is a nice update, but pales in comparison to the modern techniques used in QUIC. Considering that even today only a fraction of the Internet is using IPv6, it may take another decade before HTTP/2 becomes commonplace, let alone QUIC.

For now, enjoy the green lock displayed on my web site!

Implementation Notes

The site is an excellent starting point to configure your web server for maximum security. I also used the Qualys SSL Labs SSL test service to verify that my server has the highest security grade (A+).

I was also tempted to move from Apache to Caddy, as Caddy supports HTTP/2, QUIC and even Hugo (what I use for the blog section of this site), but then I remembered that I specifically chose Apache on Debian for its long-term, worry-free security updates, compared to a bleeding edge web server.

Syndicated 2016-09-13 01:15:00 from Benad's Blog

Mac-Only Dev Tools

Even though I use Macs, Linux and Windows machines daily and could switch to any of these exclusively, I prefer running my Mac alongside either Linux or Windows. A reason I do so is that there are some development tools that run exclusively on macOS that I prefer over their other platforms’ equivalents. Here are a few I use regularly.

To be fair, I’ll also list for each of those tools what I typically use to replace these on Windows or Linux.


BBEdit

While BBEdit isn’t as flexible or extensible as jEdit, Atom, Emacs, or even Vim to some extent, BBEdit feels and acts the most like a proper native Mac text editor. It is packed with features, is well supported, and is incredibly fast. It works quite well with SFTP, so I often use it to edit remote files. It also is the editor I have used the longest, since the late 90s.

Alternatives : Too many to mention, but I currently prefer Visual Studio Code on the desktop and vim on the command-line.


CodeKit

CodeKit, which I mentioned before, is my “go to” tool to bootstrap my web development. It sits in the background of your text editor (any one you want) and web browsers, and automatically validates and optimizes your JavaScript code and CSS files to your liking. It also supports many languages that compile to JavaScript or CSS, like CoffeeScript and SASS.

Alternative : Once I move closer to production, I do end up using Grunt. You can set it up to auto-rebuild your site like CodeKit using grunt-contrib-watch, but Grunt isn’t nearly as user-friendly as CodeKit.


Paw

Paw quickly became my preferred tool to explore and understand HTTP APIs. It is used to build up HTTP requests with various placeholder variables and then explore the results using multiple built-in viewers for JSON. All your requests and their results are saved, so it’s safe to experiment and retrace your way back to your previously working version. You can also create sequences of requests, and use complex authentication like OAuth. When you’re ready, it can generate template code in multiple languages, or cURL commands.

Alternative : I like using httpie for the HTTP requests and jq to extract values from the JSON results.


Dash

When I was learning Python 3, I constantly made use of Dash to search its built-in modules. It can do incremental search in many documentation packages and cheat sheets, and does so very quickly since it is done offline. It also makes reading “man” pages much more convenient.

Alternatives : There’s Google, of course, but I prefer using the custom search engine of each language’s documentation site using the DuckDuckGo “bang syntax”.

Syndicated 2016-08-16 23:44:10 from Benad's Blog

Good Enough Wireless Audio

For over six months there have been strong rumours that Apple will drop the 3.5mm headphone jack in the next iPhone, something that may have a large impact on the market of portable headphones. Either they will have to adopt the proprietary Lightning connector, or use the more standard Bluetooth wireless protocol.

This isn’t too surprising. I’ve noticed that with the latest Apple TV, the entirety of the Apple product line supports Bluetooth headphones, as if to prepare the market for a more “wireless” headphone future.

Still, if that were to happen, it would suck. And I’m not the only one to preemptively complain. I’ve avoided wireless headphones since they used to greatly reduce the sound quality, on top of the inconveniences of limited range and batteries that need recharging.

Unrelated to this Apple rumour, I did try out wireless headphones (of many kinds). So, are wireless headphones in 2016 good enough?

My first pair of wireless headphones is the Sennheiser RS 175. It is made for home use, as it requires a wireless base that also acts as a charging station for the headphones. It uses some kind of digital connection (up to 96 kHz PCM at source, and using 8-FSK digital lossless), so you either get the full quality or none at all, unlike analogue signals that would degrade the source quality depending on the interference. For me, the primary use for those headphones was to use them at home with some freedom of movement that wired headphones wouldn’t permit, and also to not have wires in the middle of the living room anymore. I was pleasantly surprised at the sound quality and how long the battery charge lasts.

Yet those sidestepped my main concern about Bluetooth headphones: Is the sound quality allowed by the Bluetooth protocol good enough? Some background about Bluetooth audio protocols first.

The first kind of audio supported by Bluetooth was for phone calls, so the audio quality targets what is typically needed for phone lines and not much more. This was done through the Headset Profile (HSP) for headsets, and the Hands-Free Profile (HFP) for cars. Both support 64 kbit/s signals, be it uncompressed, µ-law or A-law, commonly used in telephony, or CVSD. For music, this is pretty bad, and I suspect my early experiences with music over Bluetooth were through those profiles.

Later, Bluetooth supported “proper” music streaming through the Advanced Audio Distribution Profile (A2DP). While it can support modern MPEG audio codecs (MP3, AAC, etc.), the only required codec is SBC, the only one that was available for free for use in Bluetooth applications. There’s also the newer and better aptX, but most devices don’t support it, notably all iPhones, maybe due to licensing costs and patent protections. And since MPEG codecs are even more expensive to license, it means the only codec commonly supported is SBC. And how good is SBC? Well, good enough compared to other codecs at that bit rate. In plain English, you can hear some quality loss if you listen hard enough. The quality loss is comparable to 256 kbit/s MP3s, which is fine but not great.

So, before jumping into Bluetooth headphones, I tested out Bluetooth audio with two different portable Bluetooth receivers. The first is the OT-ADAPT, made for outdoor sporting, where you would place your phone in your backpack and use a standard set of headphones connected to the receiver outside. The second is the MPOW, this time made for car stereos, but portable enough to be used outdoors. What I noticed with both is that the audio quality is far more impacted by the quality of the DAC than by the SBC codec in the first place. The MPOW is generally louder than the OT-ADAPT (since it typically targets “audio line in” at full volume), but even at comparable volume it has far less background noise in the output signal. Still, with either receiver, the sound quality loss cannot be noticed with any sub-$100 headphones at average volume.

And then I finally made the jump to Bluetooth-enabled headphones, with the Sennheiser Momentum M2 AEBT. Testing with the provided cables with wired “airplane mode” use, I noticed no quality difference between wired and Bluetooth wireless audio, even while using SBC with my iPhone (the M2 does support aptX). The price difference for the premium of having Bluetooth is difficult to justify compared to sub-$30 Bluetooth receivers, but then those headphones have amazing sound quality and reasonable active noise cancellation.

So, what’s my recommendation? If you simply want some freedom of movement or Bluetooth-enable your car’s audio system, the MPOW Bluetooth receiver should be good enough. Otherwise, you may want to wait for a few months, as the release of the next headphone jack-free iPhone may spur a new wave of Bluetooth headphones, driving down the price of older models. And don’t worry too much about sound quality: It’s good enough.

Syndicated 2016-07-07 22:44:14 from Benad's Blog

Flattening Circular Buffers

A few weeks ago I discovered TPCircularBuffer, a circular buffer implementation for Darwin operating system implementations, including Mac OS X and iOS. Now, I’ve implemented circular buffers before, so I thought there wasn’t much need for yet another circular buffer implementation (let alone one specific to iOS), until I noticed something very interesting in the code.

A trick TPCircularBuffer uses is to map two adjacent virtual memory blocks to the same buffer. The buffer holds the actual data, and the virtual memory manager ensures that both maps contain the exact same data, since effectively both virtual memory blocks remap to the same memory. This makes things a lot easier than my naive implementations: Rather than dealing with convoluted pointer arithmetic each time the producer or consumer reads or writes a sequence of values that crosses the end of the buffer, a simple linear read or write works. In fact, the pointers from that doubly-mapped memory can be safely given to any normal function that accepts a pointer, removing the need to make memory copies before each use of the buffer by an external function.
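For contrast, here is a sketch in Python of the kind of naive implementation the double mapping replaces. It is not TPCircularBuffer’s code, and NaiveRing is a hypothetical class written for this example; the point is only to show the split copy needed whenever a read or write crosses the end of the buffer.

```python
class NaiveRing:
    """Fixed-size byte ring buffer without any virtual memory tricks."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0   # next index to write
        self.tail = 0   # next index to read
        self.used = 0   # number of bytes currently stored

    def write(self, data: bytes) -> None:
        if len(data) > self.capacity - self.used:
            raise BufferError("not enough free space")
        # Part that fits before the end of the buffer...
        first = min(len(data), self.capacity - self.head)
        self.buf[self.head:self.head + first] = data[:first]
        # ...and the remainder, wrapped around to the start.
        self.buf[0:len(data) - first] = data[first:]
        self.head = (self.head + len(data)) % self.capacity
        self.used += len(data)

    def read(self, n: int) -> bytes:
        n = min(n, self.used)
        first = min(n, self.capacity - self.tail)
        # Two slices glued together: a copy is needed just to hand the
        # caller one contiguous block.
        out = bytes(self.buf[self.tail:self.tail + first]) + bytes(self.buf[0:n - first])
        self.tail = (self.tail + n) % self.capacity
        self.used -= n
        return out


ring = NaiveRing(8)
ring.write(b"abcdef")
print(ring.read(4))
ring.write(b"ghijkl")   # crosses the end of the buffer
print(ring.read(8))
```

With the second mapping placed right after the first, both write and read would collapse into a single linear copy starting at head or tail, and the caller could be given a pointer into the buffer directly.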

In fact, this optimization is so common that a previous version of the Wikipedia page for circular buffers had some sample code using common POSIX functions. There’s even a 10-year-old VRB - Virtual Ring Buffer library for Linux systems. As for Windows, I’ve yet to see some good sample code, but you can do the equivalent with CreateFileMapping and MapViewOfFile.

Both Wikipedia’s and VRB’s implementations can be misleading, and not very portable though. On Darwin, and I suspect BSD and many other systems, the mapped memory must be fully aligned to the size of a memory page (“allocation granularity” in Windows terms). On POSIX, that means using the value of sysconf(_SC_PAGESIZE). Since most of the time the page size is a power of 2, that could explain the otherwise strange buffer->count_bytes = 1UL << order from Wikipedia’s sample code.
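The sizing arithmetic can be sketched in Python, whose mmap module exposes the same two values (mmap.PAGESIZE and mmap.ALLOCATIONGRANULARITY). The helper names below are hypothetical, and this only computes a suitable buffer size; it does not perform the double mapping itself.

```python
import mmap


def round_up_to_page(nbytes: int, page: int = mmap.PAGESIZE) -> int:
    """Smallest multiple of the page size that can hold nbytes."""
    return -(-nbytes // page) * page   # ceiling division, then scale back


def next_power_of_two(nbytes: int) -> int:
    """Smallest power of two >= nbytes, that is, 1 << order."""
    order = max(nbytes - 1, 0).bit_length()
    return 1 << order


wanted = 5000
print("page size:", mmap.PAGESIZE)
print("allocation granularity:", mmap.ALLOCATIONGRANULARITY)
print("page-aligned size:", round_up_to_page(wanted))
print("power-of-two size:", next_power_of_two(wanted))
```

Rounding to a power of two only gives a page-aligned size when the result is at least one page and the page size is itself a power of two, which is why asking the system for the page size is the safer option.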

By the way, I’d like to reiterate how poor the built-in Mac OS X documentation is for POSIX and UNIX-like functions. Though it does warn pretty well about page size alignment and the risks involved with MAP_FIXED of mmap, the rest of the documentation fails to mention how to set permissions of the memory map. Thankfully, the latest Linux man pages for the same functions are far better documented.

Syndicated 2016-05-23 18:01:52 from Benad's Blog

The Static Blog

A quick note to mention that I added The Static Blog to my main web site, discussing the relocation of the blog you’re reading right now to this site.

Syndicated 2016-04-27 01:11:57 from Benad's Blog

Final Blog Move?

This is a short post to mention that my blog, originally hosted on Squarespace as has now moved alongside my main web site under

All links to each post and the RSS feed should now automatically redirect to the new location. This may have created some duplicates in your feed reader.

There is still some clean up to do to make the older posts look better, and I need to post a longer article to explain the rationale behind this, but for now everything should be working fine. The domain will remain active as an option if something were to happen, but for the time being my blog should remain where it is.

Syndicated 2016-03-29 19:51:33 from Benad's Blog

Amazon Cloud Drive Backend for Duplicity

To follow up on my previous post, I wrote acdbackend, to add Amazon Cloud Drive support to duplicity, by wrapping around the acd_cli tool.

Actually, adding online storage services to duplicity is pleasantly easy to implement. If you have some command-line tool for your service that supports download and upload using stdin/stdout, file listing, and deleting files, then in an hour you can have it working in duplicity.

Syndicated 2016-03-03 02:16:19 from Benad's Blog

DIY Backup

While in the past I did recommend CrashPlan as an online backup solution, I stopped using it in December. At first I used it because their multi-year, unlimited plans had reasonable prices, and it was the only online backup (back in 2011) that had client-side encryption support. But over the years I ran into multiple major issues. In 2013, they started excluding backing up iOS backups done in iTunes in a transparent background update, even though they would publicly say otherwise. In fact, that hidden file exclusion list was pushed from their "enterprise" version, which has now moved to a much nicer version 5 while they abandoned their home users to version 4. In November, their Linux client started requiring Java 1.7, so my older client running in 1.6 kept downloading updates and failing to install them until the hard drive was full. Their pricing just kept increasing over time, making it difficult for me to keep renewing.

I moved to iDrive, which is half the cost of CrashPlan and works pretty well, though I'm still a bit worried that I have to trust client-side encryption to some closed-source software. Also, if you back up anything beyond 1 TB their pricing becomes punitive.

The main issue that I have with all those backup services is that your backups become locked into online storage plans that are more expensive than competing generic cloud storage providers, and your valuable backups are held hostage if they increase their pricing. Even tarsnap, with its open-source client for the paranoid, locks you into an expensive storage plan, since the client requires some closed-source server software that only they host. I miss the days of older backup software like MobileMe Backup, where the backup software was somewhat separate from the actual storage solution.

Arq Backup looks more like the traditional backup software I was looking for. It can back up to a handful of cloud storage providers, with different pricing models, and many with free initial storage plans if you have small backups. The software is $40 per machine, and then you're free to pick any supported cloud storage. The software isn't open-source, but the recovery software is open-source and documented, so you can vouch for its encryption to some extent.

But what if you're on Linux, or insist on an open-source solution (especially for the encryption part)? If you simply want to back up some files once, with no history, you can combine encfs, in "reverse" mode to have an encrypted view of your existing files, with Rclone. Note that with this approach extended file information may be lost in the transfer. If you want a more thorough versioned backup solution, Duplicity should work fine. It encrypts the files with GPG, and does file-level binary deltas to make backup files as small as possible. If Duplicity doesn't support your cloud storage directly, you can store the backups to disk and sync them with Rclone. To make using Duplicity easier, you can also use the wrapper tool duply.

As for what cloud storage provider to use, it depends on your needs. If you can fit your backups in less than about 15 GB, you can use the free version of Google Drive. If you want flexible pricing and good performance at the lowest cost, Google Nearline looks like a great deal at $0.01 per GB per month. If you already have Office 365, then you already have 1 TB of OneDrive, though downloads can be a bit slow. The Unlimited plan of Amazon Cloud Drive has good transfer speeds and is worry-free, though Duplicity doesn't support it.

Syndicated 2016-02-25 03:56:28 from Benad's Blog
