Recent blog entries for bagder

Can curl avoid to be in a future funnily named exploit that shakes the world?

During this year we’ve seen heartbleed and shellshock strike (and a  few more big flaws that I’ll skip for now). Two really eye opening recent vulnerabilities in projects with many similarities:

  1. Popular corner stones of open source stacks and internet servers
  2. Mostly run and maintained by volunteers
  3. Mature projects that have been around since “forever”
  4. Projects believed to be fairly stable and relatively trustworthy by now
  5. A myriad of features, switches and code that build on many platforms, with some parts of code only running on a rare few
  6. Written in C in a portable style

Does it sound like the curl project to you too? It does to me. Sure, this description also matches a slew of other projects but I lead the curl development so let me stay here and focus on this project.

cURLAre we in jeopardy? I honestly don’t know, but I want to explain what we do in our project in order to minimize the risk and maximize our ability to find problems on our own before they become serious attack vectors somewhere!

previous flaws

There’s no secret that we have let security problems slip through at times. We’re right now working toward our 143rd release during our around 16 years of life-time. We have found and announced 28 security problems over the years. Looking at these found problems, it is clear that very few security problems are discovered quickly after introduction. Most of them linger around for several years until found and fixed. So, realistically speaking based on history: there are security bugs still in the code, and they have probably been present for a while already.

code reviews and code standards

We try to review all patches from people without push rights in the project. It would probably be a good idea to review all patches before they go in for real, but that just wouldn’t work with the (lack of) man power we have in the project while we at the same time want to develop curl, move it forward and introduce new things and features.

We maintain code standards and formatting to keep code easy to understand and follow. We keep individual commits smallish for easier review now or in the future.

test cases

As simple as it is, we test that the basic stuff works. We don’t and can’t test everything but having test cases for most things give us the confidence to change code when we see problems as we then remain fairly sure things keep working the same way as long as the test go through. In projects with much less test coverage, you become much more conservative with what you dare to change and that also makes you more vulnerable.

We always want more test cases and we want to improve on how we always add test cases when we add new features and ideally we should also add new test cases when we fix bugs so that we know that we don’t introduce any such bug again in the future.

static code analyzes

We regularly scan our code base using static code analyzers. Both clang-analyzer and coverity are good tools, and they help us by pointing out code that look wrong or suspicious. By making sure we have very few or no such flaws left in the code, we minimize the risk. A static code analyzer is better than run-time tools for cases where they can check code flows that are hard to repeat in my local environment.

valgrind

bike helmetValgrind is an awesome tool to detect memory problems in run-time. Leaks or just stupid uses of memory or related functions. We have our test suite automatically use valgrind when it runs tests in case it is present and it helps us make sure that all situations we test for are also error-free from valgrind’s point of view.

autobuilds

Building and testing curl on a plethora of platforms non-stop is also useful to make sure we don’t depend on behaviors of particular library implementations or non-standard features and more. Testing it all is basically the only way to make sure everything keeps working over the years while we continue to develop and fix bugs. We would course be even better off with more platforms that would test automatically and with more developers keeping an eye on problems that show up there…

code complexity

Arguably, one of the best ways to avoid security flaws and bugs in general, is to keep the source code as simple as possible. Complex functions need to be broken down into smaller functions that are possible to read and understand. A good way to identify functions suitable for fixing is pmccabe,

essential third parties

curl and libcurl are usually built to use a whole bunch of third party libraries in order to perform all the functionality. In order to not have any of those uses turn into a source for trouble we must of course also participate in those projects and help them stay strong and make sure that we use them the proper way that doesn’t lead to any bad side-effects.

You can help!

All this takes time, energy and system resources. Your contributions and help will be appreciated where ever among these tasks that you can insert any. We could do more of all this, more often and more thorough if we only were more people involved!

Syndicated 2014-12-15 22:21:07 from daniel.haxx.se

libcurl multi_socket 3333 days later

.SE-logoOn October 25, 2005 I sent out the announcement about “libcurl funding from the Swedish IIS Foundation“. It was the beginning of what would eventually become the curl_multi_socket_action() function and its related API features. The API we provide for event-driven applications. This API is the most suitable one in libcurl if you intend to scale up your client up to and beyond hundreds or thousands of simultaneous transfers.

Thanks to this funding from IIS, I could spend a couple of months working full-time on implementing the ideas I had. They paid me the equivalent of 19,000 USD back then. IIS is the non-profit foundation that runs the .se TLD and they fund projects that help internet and internet usage, in particular in Sweden. IIS usually just call themselves “.se” (dot ess ee) these days.

Event-based programming isn’t generally the easiest approach so most people don’t easily take this route without careful consideration, and also if you want your event-based application to be portable among multiple platforms you also need to use an event-based library that abstracts the underlying function calls. These are all reasons why this remains a niche API in libcurl, used only by a small portion of users. Still, there are users and they seem to be able to use this API fine. A success in my eyes.

One dollar billPart of that improvement project to make libcurl scale and perform better, was also to introduce HTTP pipelining support. I didn’t quite manage that part with in the scope of that project but the pipelining support in libcurl was born in that period  (autumn 2006) but had to be improved several times over the years until it became decently good just a few years ago – and we’re just now (still) fixing more pipelining problems.

On December 10, 2014 there are exactly 3333 days since that initial announcement of mine. I’d like to highlight this occasion by thanking IIS again. Thanks IIS!

Current funding

These days I’m spending a part of my daytime job working on curl with my employer’s blessing and that’s the funding I have – most of my personal time spent is still spare time. I certainly wouldn’t mind seeing others help out, but the best funding is provided as pure man power that can help out and not by trying to buy my time to add your features. Also, I will decline all (friendly) offers to host the web site on your servers since we already have a fairly stable and reliable infrastructure sponsored.

I’m not aware of anyone else that are spending (much) paid work time on curl code, although I’m know there are quite a few who do it every now and then – especially to fix problems that occur in commercial products or services or to add features to such.

IIS still donates money to internet related projects in Sweden but I never applied for any funding from them again. Mostly because it has been hard to sync with my normal life and job situation. If you’re a Swede or just live in Sweden, do consider checking this out for your next internet adventure!

Syndicated 2014-12-10 07:15:25 from daniel.haxx.se

Why curl defaults to stdout

(Recap: I founded the curl project, I am still the lead developer and maintainer)

When asking curl to get a URL it’ll send the output to stdout by default. You can of course easily change this behavior with options or just using your shell’s redirect feature, but without any option it’ll spew it out to stdout. If you’re invoking the command line on a shell prompt you’ll immediately get to see the response as soon as it arrives.

I decided curl should work like this, and it was a natural decision I made already when I worked on the predecessors during 1997 or so that later would turn into curl.

On Unix systems there’s a common mantra that “everything is a file” but also in fact that “everything is a pipe”. You accomplish things on Unix by piping the output of one program into the input of another program. Of course I wanted curl to work as good as the other components and I wanted it to blend in with the rest. I wanted curl to feel like cat but for a network resource. And cat is certainly not the only pre-curl command that writes to stdout by default; they are plentiful.

And then: once I had made that decision and I released curl for the first time on March 20, 1998: the call was made. The default was set. I will not change a default and hurt millions of users. I rather continue to be questioned by newcomers, but now at least I can point to this blog post! :-)

About the wget rivalry

cURLAs I mention in my curl vs wget document, a very common comment to me about curl as compared to wget is that wget is “easier to use” because it needs no extra argument in order to download a single URL to a file on disk. I get that, if you type the full commands by hand you’ll use about three keys less to write “wget” instead of “curl -O”, but on the other hand if this is an operation you do often and you care so much about saving key presses I would suggest you make an alias anyway that is even shorter and then the amount of options for the command really doesn’t matter at all anymore.

I put that argument in the same category as the people who argue that wget is easier to use because you can type it with your left hand only on a qwerty keyboard. Sure, that is indeed true but I read it more like someone trying to come up with a reason when in reality there’s actually another one underneath. Sometimes that other reason is a philosophical one about preferring GNU software (which curl isn’t) or one that is licensed under the GPL (which wget is) or simply that wget is what they’re used to and they know its options and recognize or like its progress meter better.

I enjoy our friendly competition with wget and I seriously and honestly think it has made both our projects better and I like that users can throw arguments in our face like “but X can do Y”and X can alter between curl and wget depending on which camp you talk to. I also really like wget as a tool and I am the occasional user of it, just like most Unix users. I contribute to the wget project well, both with code and with general feedback. I consider myself a friend of the current wget maintainer as well as former ones.

Syndicated 2014-11-17 09:11:12 from daniel.haxx.se

Keyboard key frequency

A while ago I wrote about my hunt for a new keyboard, and in my follow-up conversations with friends around that subject I quickly came to the conclusion I should get myself better analysis and data on how I actually use a keyboard and the individual keys on it. And if you know me, you know I like (useless) statistics.

Func KB-460 keyboardSo, I tried out the popular and widely used Linux key-logger software ‘logkeys‘ and immediately figured out that it doesn’t really support the precision and detail level I wanted so I forked the project and modified the code to work the way I want it: keyfreq was born. Code on github. (I forked it because I couldn’t find any way to send back my modifications to the upstream project, I don’t really feel a need for another project.)

Then I fired up the logging process and it has been running in the background for a while now, logging every key stroke with a time stamp.

Counting key frequency and how it gets distributed very quickly turns into basically seeing when I’m active in front of the computer and it also gave me thoughts around what a high key frequency actually means in terms of activity and productivity. Does a really high key frequency really mean that I was working intensely or isn’t that purpose more a sign of mail sending time? When I debug problems or research details, won’t those periods result in slower key activity?

In the end I guess that over time, the key frequency chart basically says that if I have pressed a lot of keys during a period, I was working on something then. Hours or days with a very low average key frequency are probably times when I don’t work as much.

The weekend key frequency is bound to be slightly wrong due to me sometimes doing weekend hacking on other computers where I don’t log the keys since my results are recorded from a single specific keyboard only.

Conclusions

So what did I learn? Here are some conclusions and results from 1276614 keystrokes done over a period of the most recent 52 calendar days.

I have a 105-key keyboard, but during this period I only pressed 90 unique keys. Out of the 90 keys I pressed, 3 were pressed more than 5% of the time – each. In fact, those 3 keys are more than 20% of all keystrokes. Those keys are: <Space>, <Backspace> and the letter ‘e’.

<Space> stands out from all the rest as it has been used more than 10%.

Only 29 keys were used more than 1% of the presses, giving this a really long tail with lots of keys hardly ever used.

Over this logged time, I have registered key strokes during 46% of all hours. Counting only the hours in which I actually used the keyboard, the average number of key strokes were 2185/hour, 36 keys/minute.

The average week day (excluding weekend days), I registered 32486 key presses. The most active sinngle minute during this logging period, I hit 405 keys. The most active single hour I managed to do 7937 key presses. During weekends my activity is much lower, and then I average at 5778 keys/day (7.2% of all activity were weekends).

When counting most active hours over the day, there are 14 hours that have more than 1% activity and there are 5 with less than 1%, leaving 5 hours with no keyboard activity at all (02:00- 06:59). Interestingly, the hour between 23-24 at night is the single most busy hour for me, with 12.5% of all keypresses during the period.

Random “anecdotes”

Longest contiguous time without keys: 26.4 hours

Longest key sequence without backspace: 946

There are 7 keys I only pressed once during this period; 4 of them are on the numerical keypad and the other three are F10, F3 and <Pause>.

More

I’ll try to keep the logging going and see if things change over time or if there later might end up things that can be seen in the data when looked over a longer period.

Syndicated 2014-11-12 08:19:13 from daniel.haxx.se

Changing networks on Mac with Firefox

Not too long ago I blogged about my work to better deal with changing networks while Firefox is running. That job was basically two parts.

A) generic code to handle receiving such a network-changed event and then

B) a platform specific part that was for Windows that detected such a network change and sent the event

Today I’ve landed yet another fix for part B called bug 1079385, which detects network changes for Firefox on Mac OS X.

mac miniI’ve never programmed anything before on the Mac so this was sort of my christening in this environment. I mean, I’ve written countless of POSIX compliant programs including curl and friends that certainly builds and runs on Mac OS just fine, but I never before used the Mac-specific APIs to do things.

I got a mac mini just two weeks ago to work on this. Getting it up, prepared and my first Firefox built from source took all-in-all less than three hours. Learning the details of the mac API world was much more trouble and can’t say that I’m mastering it now either but I did find myself at least figuring out how to detect when IP addresses on the interfaces change and a changed address is a pretty good signal that the network changed somehow.

Syndicated 2014-10-30 21:46:20 from daniel.haxx.se

daniel.haxx.se episode 8

Today I hesitated to make my new weekly video episode. I looked at the viewers number and how they basically have dwindled the last few weeks. I’m not making this video series interesting enough for a very large crowd of people. I’m re-evaluating if I should do them at all, or if I can do something to spice them up…

… or perhaps just not look at the viewers numbers at all and just do what think is fun?

I decided I’ll go with the latter for now. After all, I enjoy making these and they usually give me some interesting feedback and discussions even if the numbers are really low. What good is a number anyway?

This week’s episode:

Personal

Firefox

Fun

HTTP/2

TALKS

  • I’m offering two talks for FOSDEM

curl

  • release next Wednesday
  • bug fixing period
  • security advisory is pending

wget

Syndicated 2014-10-16 21:01:58 from daniel.haxx.se

846 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!