Older blog entries for bagder (starting at number 719)

getaddrinfo with round robin DNS and happy eyeballs

This is not news. This is only facts that seem to still be unknown to many people so I just want to help out documenting this to help educate the world. I’ll dance around the subject first a bit by providing the full background info…

round robin basics

Round robin DNS has been the way since a long time back to get some rough and cheap load-balancing and spreading out visitors over multiple hosts when they try to use a single host/service with static content. By setting up an A entry in a DNS zone to resolve to multiple IP addresses, clients would get different results in a semi-random manner and thus hitting different servers at different times:

server     A   192.168.0.1
server     A   10.0.0.1
server     A   127.0.0.1

For example, if you’re a small open source project it makes a perfect way to feature a distributed service that appears with a single name but is hosted by multiple distributed independent servers across the Internet. It is also used by high profile web servers, like for example www.google.com and www.yahoo.com.

host name resolving

If you’re an old-school hacker, if you learned to do socket and TCP/IP programming from the original Stevens’ books and if you were brought up on BSD unix you learned that you resolve host names with gethostbyname() and friends. This is a POSIX and single unix specification that’s been around since basically forever. When calling gethostbyname() on a given round robin host name, the function returns an array of addresses. That list of addresses will be in a seemingly random order. If an application just iterates over the list and connects to them in the order as received, the round robin concept works perfectly well.

but gethostbyname wasn’t good enough

gethostbyname() is really IPv4-focused. The mere whisper of IPv6 makes it break down and cry. It had to be replaced by something better. Enter getaddrinfo() also POSIX (and defined in RFC 3943 and again updated in RFC 5014). This is the modern function that supports IPv6 and more. It is the shiny thing the world needed!

not a drop-in replacement

So the (good parts of the) world replaced all calls to gethostbyname() with calls to getaddrinfo() and everything now supported IPv6 and things were all dandy and fine? Not exactly. Because there were subtleties involved. Like in which order these functions return addresses. In 2003 the IETF guys had shipped RFC 3484 detailing Default Address Selection for Internet Protocol version 6, and using that as guideline most (all?) implementations were now changed to return the list of addresses in that order. It would then become a list of hosts in “preferred” order. Suddenly applications would iterate over both IPv4 and IPv6 addresses and do it in an order that would be clever from an IPv6 upgrade-path perspective.

no round robin with getaddrinfo

So, back to the good old way to do round robin DNS: multiple addresses (be it IPv4 or IPv6 or both). With the new ideas of how to return addresses this load balancing way no longer works. Now getaddrinfo() returns basically the same order in every invoke. I noticed this back in 2005 and posted a question on the glibc hackers mailinglist: http://www.cygwin.com/ml/libc-alpha/2005-11/msg00028.html As you can see, my question was delightfully ignored and nobody ever responded. The order seems to be dictated mostly by the above mentioned RFCs and the local /etc/gai.conf file, but neither is helpful if getting decent round robin is your aim. Others have noticed this flaw as well and some have fought compassionately arguing that this is a bad thing, while of course there’s an opposite side with people claiming it is the right behavior and that doing round robin DNS like this was a bad idea to start with anyway. The impact on a large amount of common utilities is simply that when they go IPv6-enabled, they also at the same time go round-robin-DNS disabled.

no decent fix

Since getaddrinfo() now has worked like this for almost a decade, we can forget about “fixing” it. Since gai.conf needs local edits to provide a different function response it is not an answer. But perhaps worse is, since getaddrinfo() is now made to return the addresses in a sort of order of preference it is hard to “glue on” a layer on top that simple shuffles the returned results. Such a shuffle would need to take IP versions and more into account. And it would become application-specific and thus would have to be applied to one program at a time. The popular browsers seem less affected by this getaddrinfo drawback. My guess is that because they’ve already worked on making asynchronous name resolves so that name resolving doesn’t lock up their processes, they have taken different approaches and thus have their own code for this. In curl’s case, it can be built with c-ares as a resolver backend even when supporting IPv6, and c-ares does not offer the sort feature of getaddrinfo and thus in these cases curl will work with round robin DNSes much more like it did when it used gethostbyname.

alternatives

The downside with all alternatives I’m aware of is that they aren’t just taking advantage of plain DNS. In order to duck for the problems I’ve mentioned, you can instead tweak your DNS server to respond differently to different users. That way you can either just randomly respond different addresses in a round robin fashion, or you can try to make it more clever by things such as PowerDNS’s geobackend feature. Of course we all know that A) geoip is crude and often wrong and B) your real-world geography does not match your network topology.

happy eyeballs

During this period, another connection related issue has surfaced. The fact that IPv6 connections are often handled as a second option in dual-stacked machines, and the fact is that IPv6 is mostly present in dual stacks these days. This sadly punishes early adopters of IPv6 (yes, they unfortunately IPv6 must still be considered early) since those services will then be slower than the older IPv4-only ones.

There seems to be a general consensus on what the way to overcome this problem is: the Happy Eyeballs approach. In short (and simplified) it recommends that we try both (or all) options at once, and the fastest to respond wins and gets to be used. This requires that we resolve A and AAAA names at once, and if we get responses to both, we connec() to both the IPv4 and IPv6 addresses and see which one is the fastest to connect.

This of course is not just a matter of replacing a function or two anymore. To implement this approach you need to do something completely new. Like for example just doing getaddrinfo() + looping over addresses and try connect() won’t at all work. You would basically either start two threads and do the IPv4-only route in one and do the IPv6 route in the other, or you would have to issue non-blocking resolver calls to do A and AAAA resolves in parallel in the same thread and when the first response arrives you fire off a non-blocking connect() …

My point being that introducing Happy Eyeballs in your good old socket app will require some rather major remodeling no matter what. Doing this will most likely also affect how your application handles with round robin DNS so now you have a chance to reconsider your choices and code!

Syndicated 2012-01-03 22:15:21 from daniel.haxx.se

Top-3 curl bugs in 2011

This is a continuation of my little top-3 things in curl during 2011 which started with the top-3 changes 2011.

The changelog on the curl site lists 150 bugs fixed in the seven released of the year. The most import fixes in my view were…

Bug-fix 1: handle HTTP redirects to //hostname/path

Following redirects is one of the fundamentals of HTTP user agents and one of the primary things people use curl and libcurl for is to mimic browser to do automatic stuff on the web. Therefore it was even more embarrassing to realize that libcurl didn’t properly support the relative redirect when the Location: header doesn’t include the protocol but the host name. It basically means that the protocol shall remain (in reality that means HTTP or HTTPS) but it should move over to the new host and path. All browsers support this since ages ago. Since November 15th 2011, libcurl does too!

Bug-fix 2: inappropriate GSSAPI delegation

We had one security vulnerability announced in 2011 and this was it. I won’t try to blame someone else for this mistake, but there are some corners of curl and libcurl I’m not personally very familiar with and I would say the GSS stuff is one of those. In fact, even the actual GSS and GSSAPI technologies are mercy areas as far as my knowledge reaches so I was not at all aware of this feature or that we even made us of it… Of course it also turns out that there’s a certain amount of existing applications that need it so we now have that ability in the library again if enabled by an option.

Bug-fix 3: multi interface, connect fail continue to next IP

One of those silly bugs nobody would expect us to have at this point. It turned out the code for the multi interface didn’t properly move on to try the next IP in case a connect() failed and the host name had resolved to a number of addresses to try. A long term goal of mine is to remodel the internals of libcurl to always use the multi interface code and I would just wrap that interfact with some glue logic to offer the easy interface. For that to work (and for lots of other reasons of course), the multi interface simply must work for all of these things.

Additionally, this is another of those things that are hard to test for in the test suite as it would involve trickery on IP or TCP level and that’s not easy to accomplish in a portable manner.

Syndicated 2011-12-31 20:51:14 from daniel.haxx.se

Top-3 curl changes in 2011

At the end of the year I thought it would be interesting to have a look back over the past twelve months and see what the biggest changes in the curl project were and what the most important bug-fixes were etc. I’ve turned into a little mini series of blog posts. The top-3s. First out, the top-3 changes.

Top-3 changes

In total I counted 29 notable changes brought during 2011. The most significant ones in my view are:

Change 1 – fancier protocol support in proxy strings

This may sound trivial, and the code certainly is, but with this change suddenly a lot of applications that use libcurl got better proxy support without having do anything at all. Previously the application would have to set what protocol type the proxy it would use is, even though libcurl has supported having the proxy specified as an environment variable for ages.

Having only the proxy name was useful but limiting. With this new change you can specify proxy type or rather proxy protocol by prefixing the proxy name like “socks4://magic-proxy.example.com” if you want a SOCKS4 proxy. libcurl now supports socks4, socks4a, socks5 or socks5h used as prefixes as well as http.

Change 2 – allow sending “empty” HTTP headers

Another minor change in code but possibly larger impact in usefulness for applications. We introduced a way for applications to change internal headers and to add new ones ages ago. We (or rather I) then made the choice that if you’d provide a header with only the name and a colon, that is with no contents on the right side, it would delete the internal header. That was not a clever move as later on people have wanted to add “blank” or “empty” headers that look exactly like that, but libcurl has then refused to.

There have been some more or less hackish work-arounds to trick libcurl into allowing an empty header, but now finally we introduce a nice and clean way for applications to pass in these kinds of empty headers:

Pass in an empty header that instead of a colon has a semicolon! This is an otherwise illegal header that wouldn’t make sense, but libcurl will use that as a trigger that an empty header should be used and it will then replace the semicolon with a colon and things will be fine.

Change 3 – Added support for cyassl and axTLS

Proving that libcurl moves forward and into more and more markets, the number of supported SSL libraries grew to 7 this year. The all new cyassl backend that replaced the previously done yassl backend that was using cyassl’s former OpenSSL emulation layer. Now we’re using the native and pure API and things much cleaner. The possibly smallest available TLS library axTLS also got support.

Not all backends and not all SSL libraries are the same or support the same set of features, but then libcurl is used in many different scenarios and use cases and this way we offer more options to more users to craft libcurl for their particular needs. Our internal SSL backend API has managed quite well and proves to have been a worthy change. Adding support for yet another SSL library within libcurl is actually not a lot of work.

Change 4 – TSL-SRP

You may think number 4 of a top-3 list is weird, but I couldn’t cut it off here! =) TLS-SRP has been waiting in the shadows for so long and all of a sudden two of the major SSL libraries have support for it in released versions and libcurl got support for using these features in both libraries during 2011.

cURL

Syndicated 2011-12-30 20:27:34 from daniel.haxx.se

What I got from Google

Recall my present from Google. Here’s what I spent the gift codes on (they were only valid in google-store.com):

A red fleece for me, and then “softshell” jackets for my whole family.

GO11018B-2GO22003GO21003GO20003GO20003

I got the package today. The stuff matched my expectations in appearance and feel and my kids just loved these “extra Christmas gifts”.

Syndicated 2011-12-27 18:31:33 from daniel.haxx.se

curlers rest on Sundays and during July

We now run gitstats on the curl git repository daily and provides fun graphs.

We have almost 11 years of source code history covered and I personally have done some ~68% of all commits. Given this long history it is fun to see some very clear trends. Like this first one: look at the distribution of commits per weekday over the entire period. The amount of commits done during weekends are significantly lower than during the work week, and the Sunday amount is clearly even lower than Saturday:

day_of_week

Similarly, we can see how the activity is spread out over calendar months. This shows an obvious correlation to the slower periods in my life, which means that July is vacation times and the numbers show it:

month_of_year

Syndicated 2011-12-20 07:47:53 from daniel.haxx.se

My five ADSL modems

bredbandsbolagetI previously blogged when my network hardware died. Here’s the recap and continuation of that story and how things evolved…

One day my ADSL modem could no longer get sync, I couldn’t send data and my (landline) phone was dead. My phone is connected into the ADSL modem through which it does IP telephony. Other times this has happened I could just switch off the modem for 10 seconds and then back on again it would work again for another 6 months or a year or so.

I’ve had ADSL at roughly 12mbit working flawlessly for several years so this was an unexpected breakage.

On 14 sep 16:16 I called my operator’s (Bredbandsbolaget) support about the issue when the modem hadn’t been able to get contact for a whole day – I was suspecting some kind of glitch in the service from the other end. The support person said that I had a “very old modem” and they immediately decided to send me a new modem by mail that would fix my problems.

xavi technologies x5258-p2At 16 sep 18:51 I called support again. I received modem #2 and installed it this day. The modem, Xavi Technologies X5258-P2, is a much more fancy model than what I had been using for the last couple of years – the new one had 4 Ethernet ports and wifi. Not that I really care about that cruft as I want to use my own wifi router anyway to get control of things better.

When I plugged in modem #2 I noticed that it lit up the ‘phone’ LED at once (which normally would only be on if I use the phone) and while internet data seemed to work, the phone did not. When I called support again to ask about this, they decided it was a broken modem they had sent me and would send me a replacement at once.

A few days later I got modem #3 and installed it. I also got the joy of sending back two ADSL modems.

3 oct 20:25 – I called the support again. Modem #3 hung occasionally and I wanted to get their help to fix the problem. The support guy I talked to claimed his sometimes happens if a wifi router is too close to the modem and adviced me to put my ADSL modem and wifi router further apart. It sounded like a suspicious analysis and theory to me, as why would the modem completely hang from this and if it did, why would it keep on running for days at times after a reboot? The support person also revealed that he had detailed logs going back a few weeks at least where he could see my ADSL modem power recycles and he could also see “bad CRC” counters going up before my restarts. I moved my devices two meters apart.

A little side-story: the modem has wifi support, but as I run my own wifi router behind it I don’t want the modem’s wifi. I noticed it ran on a different channel than my regular one so it wasn’t an immediate concern. It did however turn out that in order to switch it off I had to configure that with a Windows program and in order to install that program I had to enter a username and password that I didn’t have. Asking support for the credentials, they instead offered to simply disable the wifi from their end instead. That was fine by me, but again showed what fancy controls they have over these things.

For a week or so my connection actually was better and I actually thought my suspicions about the fishy advice were wrong. But no. It turned out I was only lucky for a few days as then it started hanging again every few days. It would stop transfering data in/out, and the “phone” led would blink slowly. How on earth could a device like this hang in any circumstance? I’ve been an embedded developer all my professional life, I know hanging is the worst possible thing. I much better but still ugly way to resolve a problem without any obvious way out, would be to reboot. A reboot would’ve been annoying as well, but far from as annoying as this.

Now, after all, I have a fiber installation coming “soon” so I figured I could possibly just shut up and endure this ADSL mess and it will go away or at least change drastically once I get my new connection…

But eventually it got too tedious, also partly because my kids and my wife also found it annoying and troubling – I had to give up the eduring. The fiber installtion also seemed to be delayed. Who knows how long I was supposed to remain on ADSL.

So, on 5 dec 18:38 I was back on the phone with the support people and complained about the hangs I frequently get with modem #3. The guy listened to me explaining the issue, he checked the reboot logs from his side and swiftly decided he would send me a new modem. He decided to send a modem of a different brand this time to see if this made things work better in my end.

zyxel-p-2601hn

On dec 8th I got modem #4. A different model this time compared to #2 and #3. It was now a Zyxel P-2601. I got home from work at 18:15, had a quick dinner and then I connected the new equipment. Would this really be the end of my troubles? Anticipation!

- Oh harsh reality, how thee can be rough and cold.

This modem can’t be powered on. If I flip the power switch and turns it on, all the leds switch on but as soon as my finger leaves the power-on toggle again the modem turns itself off… At 18:52 I tried to call support, but a voice claimed they had “internal systems problems” so I gave up.

12:45 on Friday Dec 9th I called again and reported my broken modem and the friendly support woman was a bit surprised I had gotten a broken device as she said “straight from the factory”. She even expressed some sympathy about the replacement unit, modem #5, not being able to reach me until Monday.

On Monday the 12th I got an invoice wanting to charge me 500 SEK for one of the broken modems they claimed I never sent back so I had to call customer service again and have them not do that. (I find 500 SEK for a broken ADSL modem quite a hefty charge when that’s basically the price for a completely new and working unit…)

December 13, modem #5 arrived and I connected it. It didn’t work at once but the phone worked which gave me a clue, so I connected a laptop directly to the ADSL modem and when I then tried to use a browser on that network I reached an admin interface web server and by using that I could switch the modem over to “bridge mode”. It turned out the default setting for this device is to function as a DHCP server and all sorts of other funny things that I didn’t want it to do.

At the time of this writing, number five has been running without problems for 72 hours.

Syndicated 2011-12-16 17:00:43 from daniel.haxx.se

I’m interviewed by foss-magasin

foss-magasin

Claes at foss-magasin.se asked a bunch of questions about me, my commitments within the FOSS community and related matters recently over email. This Swedish interview just now went public: Daniel Stenberg – cURL, Rockbox och FOSS-Sthlm.

For my international friends who don’t understand the Swedish: I am quite happy with the questions and being allowed to answer them at this lengths etc, so I am considering doing a full translation of it and posting it at a later date.

Syndicated 2011-12-11 22:14:59 from daniel.haxx.se

Ten years of Rockbox

In december 2001 the mailing list was setup and the first mail was sent out on December 7th. This was months before the project had any name. We just gathered eager reverse-engineers wanting to improve the Archos Player firmware.

We were just a few friends who like hacking low level code, both as professionals but also in our spare time – and we really thought that these kinds of devices had much larger potential than what the firmwares they were given allowed them. “Rewriting Archos firmware from scratch, how hard can it be?” as we used to joke. Oh well, we did.

archosplayer-front

From that moment we worked on mp3 players. A couple of months later we started on the next target (Archos Recorder) and so we continued. We got ourselves the name Rockbox for the project and people joined up from everywhere, wanting to contribute their knowledge and enthusiasm.

Rewriting Archos firmware from scratch - the tshirts

(Björn “Zagor” Stenberg, Linus “LinusN” Nielsen Feltzing and Daniel “Bagder” Stenberg in September 2002.)

We got our logo in 2002. In 2003 we supported the FM recorder model. We ported code to and run our first stuff on a “software codec” target in 2004. During 2005 we added support for our first color screen targets and in 2006 we added ipod to our “family”. The flood gates opened and new targets have poured in ever since. iAudio X5 and the Sansa e200 were also added that year.

Today, we have code running natively on 75 something targets (on SH1, m68k, ARM and MIPS architectures) and we run Rockbox as an app on top of other operating systems such as Android and Maemo. The project keeps up a fast pace and even in the last few months we’ve seen several new ports having been added to the source code tree.

Being a large project with lots of strong personalities and committed developers we’ve had our share of politics and flame fests. The real name policy was originally a reason for lots of heated debates, as we only accept contributions from people who provide real names – no nick names, but as time has passed the arguments have more and more been over technical details or over how the development is or isn’t run.

Rockbox has participated in the Google summer of Code program four years as a mentor organization and in this time we’ve had perhaps 15 students that have worked on Rockbox, and a bunch of them were successful and a fair amount of those students stayed in the project after having finished their summer projects.

The Android version hasn’t been released on the Android market so far because lots of developers think that first impressions is very important and as Rockbox has been designed with fixed-size screens there has been no support for platforms with varying screen resolutions. This has forced Rockbox to provide different versions for different Android targets (screens really). In addition to that, the GUI of Rockbox has been all native Rockbox and not very Android-like which has also been mentioned as a con. These issues are being worked on, although I cannot provide any estimate for when we’ll see Rockbox “for real” on Android.

I’ll stick to my story about what I think of Rockbox’s future: I think the dedicated music player market is going away slowly and that phones and other portable devices is what people will use to play music on. Rockbox is a very capable music player, but the question is if there’s really a demand for it on the new generation of devices…

Syndicated 2011-12-07 07:00:21 from daniel.haxx.se

A special thank you from Google

Believe me, this kind of apparent random act of kindness warms up even the most seasoned and cold hacker’s heart. Look at this excerpt from a mail I received:

From: Stephanie T <…@google.com>
Subject: 2 Googlers would like to recognize your hard work on cURL

Hello Daniel,

As you know, we here in Google’s Open Source Programs Office are always interested in learning about new projects and people in the open source community. To that end, we asked our co-workers to help us by nominating people outside of Google that they thought were doing great things in the world of open source. This information helps us determine which organizations to fund and which projects to select for the Google Summer of Code and related programs.

Chris C and John M (Googlers) believe your work on cURL deserves to be recognized. To show our appreciation for this work, we would like to send you a special thank you gift and 2- $175 USD gift codes (totaling $350 USD) to the Google online store so you can pick out tee shirts or other gifts for yourself and loved ones.

[snip]

Thanks a lot!

(The names of the persons in the mail have been shortened by me to reduce their exposure here.)

Syndicated 2011-12-06 19:34:45 from daniel.haxx.se

Swedish FOSS-magasin

foss-magasinClaes, our friend from foss-sthlm and several Open Source adventures, has just fired off a new initiative: FOSS-Magasin. The site launched for real on the evening November 19th.

Where there’s no real content on the site yet, Claes has set out a mission for himself and future contributors to create a site with technical content in Swedish that we geeks miss. This would be within areas such as FOSS, *nix, networking and more.

Tired of the poor state of technical and IT related media in Sweden that always seem to try to capture the really large audience and therefore always dumb down everything to a silly level, this is meant to be directed on more competent and interested readers.

The site is free and Claes is looking around for contributors to help hem get content to publish. I can only urge my Swedish friends to join up and help it get going, as I think it would be nice to get a proper Swedish tech site. For me, it will be especially interesting for things that actually happen in or otherwise is related to Sweden, as for all the rest I personally have no problems accessing English sites to get the info.

Syndicated 2011-11-19 22:24:50 from daniel.haxx.se

710 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!