Old Lessons Relearned

Posted 27 Oct 2001 at 15:28 UTC by apm

Sometimes even the master forgets his lessons and has to be re-taught them by the school of hard knocks. The lesson in this case is to always benchmark before trying to optimize - don't assume you know where the bottleneck is. And don't forget that the algorithm is often the most important speed factor.

For a while now we've been planning to speed up the version control tool in Suneido. So the other day I grabbed Jeff (jferguson) for some pair programming. I was sure the speed issue was in the actual diff code, and that the answer was to rewrite it from Suneido code to C++. It wasn't too hard since Suneido code is fairly similar to C++ and we finished up in 3 or 4 hours of pair programming spread over two days. Jeff hasn't done any C++ programming for a while, so part of the intent was to teach him some C++. Of course, what I should have been teaching him was to benchmark to find bottlenecks before diving into optimizing.

Needless to say, when we ran the new version it made no noticeable difference. More detailed timing showed that we had, in fact, doubled the speed of the diff function. But since it only accounted for about 10% of the time, we'd only cut 5% off the total time. Argh! Better late than never, we identified the real culprit as the method we were using to display the diff results. It took us less than an hour to revise the display to be about 10 times faster. (Without resorting to any C++ code.)

After making this improvement, we found the new bottleneck was the client-server protocol the version control was using. It was transferring the code a line at a time. It took about 10 minutes to rewrite this to send and receive the code as one block. This turned out to be about 5 times faster. With these improvements, the version control was starting to seem positively speedy.
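
The change is easy to see in miniature. Below is a small stand-alone model, not Suneido's actual client-server code: send_to_server and its 500-microsecond "round trip" are invented purely for illustration. The point it demonstrates is that N small requests pay the fixed round-trip cost N times, while one big request pays it once.

    // Toy model of the protocol change: per-line requests vs. one block.
    // send_to_server is a hypothetical stand-in for a client/server round
    // trip: a fixed latency plus work proportional to the payload size.
    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    void send_to_server(const std::string& payload) {
        std::this_thread::sleep_for(std::chrono::microseconds(500)); // latency
        volatile size_t bytes = payload.size();                      // "work"
        (void)bytes;
    }

    int main() {
        std::vector<std::string> lines(2000, "some line of source code");

        auto t0 = std::chrono::steady_clock::now();
        for (const auto& line : lines)            // one request per line
            send_to_server(line);
        auto t1 = std::chrono::steady_clock::now();

        std::string block;                        // one request for everything
        for (const auto& line : lines) { block += line; block += '\n'; }
        send_to_server(block);
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("line at a time: %lld ms\n",
            (long long)std::chrono::duration_cast<ms>(t1 - t0).count());
        std::printf("one block:      %lld ms\n",
            (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    }

With a real network the latency is not a constant, but the shape of the result is the same: the per-line version is dominated by round trips, not by the data.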

In the end we're not even sure we're going to use the C++ version of the diff function. The algorithm is obviously efficient enough that it isn't that critical.

The moral of the story?

"Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers."
-- The Practice of Programming, Brian W. Kernighan and Rob Pike

You'd think after 25 years of programming I would have learned this lesson once and for all. I guess one thing I learned is that it's always easy to jump to conclusions. And it's always easy to think you don't need to check your assumptions.

Reference: "An O(ND) Difference Algorithm and Its Variations" by Eugene W. Myers, Dept. of Computer Science, University of Arizona.

Andrew McKinlay
www.suneido.com


a saying that I like to use..., posted 27 Oct 2001 at 22:26 UTC by gstein » (Master)

For years now, I've been advocating a simple phrase to people who are thinking about optimization:

90% of the time, you're wrong about where you spend 90% of the time.

Net result is that you always profile before starting any work on optimization. If you're doing network programming, then try to profile across network operations, too. If you spend 95% of the time sending data over the network, then you just don't have to worry about optimizing your request handler :-)

Regarding algorithms: yes, algorithms are important. After programming for many years, however, they will become less and less of an issue. You will simply select the right one for the job up front. Invariably, your program will have the right/optimal algorithms to begin with, so your optimizations will be fine-tuning rather than algorithmic changes. What does happen to experienced programmers is that the code will evolve over time (typically due to adding new features), and the end result will use non-optimal algorithms for the overall feature set. For an evolved codebase, it will be quite common to rethink the problem and revamp large portions to use new algorithms: satisfying current feature requirements and satisfying performance requirements in one shot.

A lesson rarely learnt, and too easily forgotten, posted 27 Oct 2001 at 22:35 UTC by jtjm » (Journeyer)

My own experience is not with low-level benchmarking of code (I can write semi-competent Perl and basic C, but will never be a programmer to match those assembled here) but with benchmarking and tuning web servers and the applications served from them. Even so, the importance of never assuming that the bottleneck will be in the place that seems obvious cannot be stressed too highly.

Tuning tiers of web servers, application servers, and backend databases is a task that can too easily lead the experienced to leap to conclusions. "We're running [insert foo-Java-servlet-runner] here, and performance is, frankly, abysmal" is often met with "Oh, that's Java or [foo-JVM] for you. You'll want to use [foo-other-Java-servlet-runner] or [foo-flavour-of-month-JVM], or recode this critical bit of your application using FastCGI - it'll solve all your problems", only for one or more of the recommendations to be taken up, and the performance increase to be negligible.

Then, and too often only then, does someone step up and say "Hmm, let's have a closer look at the system", and after a few hours of load tests, and much netstat-ing, strace-ing and digging around in /proc, realise that the real bottleneck is the connection handling between the web server and the application server (untuned OS thread limits, perhaps), or class-based queuing issues, or misconfiguration of the web server such that it's needlessly passing all requests for static content to the servlet runner, or poorly tuned indexes in the database, or running the JVM with the paltry default heap size, leaving most of the available 1 GB of RAM unused, or...

And then a simple tuning fix produces the 200-300% (sometimes even as much as 2000%) improvement that was originally being looked for, instead of the 5-10% the knee-jerk reaction obtained (at much time and expense to the overworked sysadmin).

Tuning webservers, algorithms, posted 28 Oct 2001 at 03:37 UTC by goingware » (Master)

I wrote an article called Use Validators and Load Generators to Test Your Web Applications after I got some consulting work where the client wanted to know why his application fell over with only 50 users after they brought it live. It had worked fine for them in development.

Something important to note about tuning algorithms. I disagree with the way algorithm analysis is almost always taught. Or rather, I don't think the way to do the analysis is ever taught completely.

Traditional algorithm analysis teaches "Big-O" notation, where you determine how the algorithm scales with the increase in the size of the input. It is often emphasized that tweaking the code isn't the right way to approach optimization, rather that one must find an algorithm that scales to a lower order - n log n instead of n squared, for example. We are taught to ignore the constant factor in the algorithm.

This is all well and good in the case we are handling just a few data sets which are large, but it doesn't cover the case of handling large numbers of independent data sets. We may know the fastest way to sort a file with a million records, but what is the fastest way to sort 200,000 independent files with 5 records apiece?

In such a case the constant factor can come to outweigh the growth of the algorithm with the size of the input. It is certainly worthwhile in such a case to tweak the code, and it may even be appropriate to use a higher-order algorithm because it offers a lower setup and teardown time.
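
A quick, admittedly artificial way to see the constant factor at work (the file and record counts below just mirror the numbers above; nothing else is from the original): both calls are O(n log n) sorts, but C's qsort() pays an indirect comparator call per comparison while std::sort can inline its comparison, and on 5-element inputs repeated 200,000 times that hidden constant, not the big-O, decides the winner. The honest answer, as always, is to measure on your own data.

    // Two O(n log n) sorts applied to 200,000 tiny "files" of 5 records each.
    // Same asymptotic class, very different constant factors.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <random>
    #include <vector>

    int cmp_int(const void* a, const void* b) {
        int x = *(const int*)a, y = *(const int*)b;
        return (x > y) - (x < y);
    }

    int main() {
        const size_t kFiles = 200000, kRecords = 5;
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> dist(0, 1000000);

        std::vector<std::vector<int>> a(kFiles, std::vector<int>(kRecords));
        for (auto& file : a)
            for (auto& rec : file) rec = dist(rng);
        auto b = a;                               // identical data for both runs

        auto t0 = std::chrono::steady_clock::now();
        for (auto& file : a)
            std::qsort(file.data(), file.size(), sizeof(int), cmp_int);
        auto t1 = std::chrono::steady_clock::now();
        for (auto& file : b)
            std::sort(file.begin(), file.end());
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::printf("qsort:     %lld ms\n",
            (long long)std::chrono::duration_cast<ms>(t1 - t0).count());
        std::printf("std::sort: %lld ms\n",
            (long long)std::chrono::duration_cast<ms>(t2 - t1).count());
    }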

By the way, I learned to profile after I talked my boss into buying a $5000 floating point accelerator board for our Sun 3/160, only to find that it sped up the time to process one of our aerial photos from 90 seconds to 85. After I learned to profile (desperately, in an all-night hacking session), I had the code running in 10 seconds without the accelerator and 5 seconds with - so hardware acceleration of floating point lived up to its claims, but only when we weren't wasting time.

In this particular case, a subroutine was being called for each pixel of the image to determine which stdio FILE to read the next pixel from, and then the pixel was obtained with getc(). Changing this to determine the input just once, then read the whole image in a single read() system call provided the greatest speedup.
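
As a rough sketch of those two I/O patterns (simplified: it reads from a single hypothetical file, "pixels.raw" is an invented name, and the per-pixel "which FILE?" lookup from the original story is left out), the difference is one library call per pixel versus one system call for the whole image:

    // Per-pixel getc() versus one read() of the whole image.
    #include <cstdio>
    #include <cstdlib>
    #include <fcntl.h>
    #include <unistd.h>
    #include <vector>

    // Slow path: one stdio call per pixel.
    std::vector<unsigned char> read_per_pixel(const char* path, size_t npixels) {
        std::vector<unsigned char> img(npixels);
        FILE* f = std::fopen(path, "rb");
        if (!f) { std::perror("fopen"); std::exit(1); }
        for (size_t i = 0; i < npixels; ++i)
            img[i] = (unsigned char)std::getc(f);   // no EOF check, sketch only
        std::fclose(f);
        return img;
    }

    // Fast path: one system call for the whole image.
    std::vector<unsigned char> read_whole_image(const char* path, size_t npixels) {
        std::vector<unsigned char> img(npixels);
        int fd = open(path, O_RDONLY);
        if (fd < 0) { std::perror("open"); std::exit(1); }
        ssize_t got = read(fd, img.data(), npixels);
        if (got != (ssize_t)npixels) { std::perror("read"); std::exit(1); }
        close(fd);
        return img;
    }

    int main() {
        const size_t kPixels = 512 * 512;
        std::vector<unsigned char> a = read_per_pixel("pixels.raw", kPixels);
        std::vector<unsigned char> b = read_whole_image("pixels.raw", kPixels);
        std::printf("%zu %zu\n", a.size(), b.size());
    }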

Re: Tuning webservers, algorithms, posted 28 Oct 2001 at 10:06 UTC by ali » (Apprentice)

goingware, I disagree about the Big-O thing.

Algorithm analysis works pretty well for everything. It just has to be understood right. Let's say we have two algorithms A and B, where A runs in O(n^2) and B runs in O(n^3). Then, true, A is called "faster than" B.

This is true. Given that your code must handle arbitrary values for n, you should choose algorithm A.

But if you know that n is not free, but rather limited (say, n<10), you can just fill it in. You get that A runs in O(100) and B runs in O(1000), and you know that O(100)=O(1000)=O(1). That tells you that for your problem both algorithms have the same runtime complexity, and neither should be chosen over the other on the basis of the O alone.

Now, Big-O is a mathematical concept that hides some facts from you. If you did your runtime analysis carefully, you first calculated the correct runtime functions (for A it was, say, 17n^2+26n+1, and for B some other term). Then you derived the Big-O's from those terms. Given that those Big-O's gave you "same runtime complexity", you'd now insert n<10 into your terms and get upper bounds for A and B applied to your problem, in actual step counts. If you now get that A runs in less than 556 steps and B in less than 30, the complexity analysis gives you the correct result, namely that B is the faster algorithm for your specific problem.

If you got taught to always ignore the constant factors and the remaining term, you had the wrong teacher.

p.s.: Why can't I use <sup> here? It would look so much better...

Re: Tuning webservers, algorithms, posted 28 Oct 2001 at 10:56 UTC by goingware » (Master)

ali, you are of course correct here.

But I'm pretty sure almost everything I've seen recently published, and what I got taught in school, did say to forget all but the highest order terms. I particularly remember this from the graduate algorithm analysis course at UCSC.

My hazy recollection, though, is that in Knuth's The Art of Computer Programming he did do such a detailed analysis of the complexity of all his algorithms. One reason for inventing the MIX assembly code was so that he would have a standard processor to measure complexity with.

It happens that when I first decided to study programming in a serious way, I read Knuth. It was a little too deep for a beginner, but I did remember this way of looking at things.

Re: Tuning webservers, algorithms, posted 28 Oct 2001 at 15:07 UTC by ali » (Apprentice)

Um. Sorry for my previous post. It wasn't meant to be as offensive as it turned out. I just wrote it before the day's first coffee, when everything I say sounds offensive. Of course the initial "I disagree" is completely garbage, because afterwards I state that I agree with what you said. Damn. (Note to self: Never write anything while Caffeine-shortage-alarm is active.)

I also had some "lecture" on algorithms and complexity, and it taught the same. The reason is quite simple: if you have CS students who panic when they see polynomials, you cannot go deeply into complexity theory.

This is very annoying. To really understand it, you obviously need some knowledge in both mathematics and computer theory (which many CS students neither have nor want).

You can do a simple survey to see that problem: Ask some people what the Big-O complexity for the following problem is: "Given a set of n numbers, find the biggest one." Almost everyone will answer "O(n)", which is basically wrong (Or they'll gasp "Big-WHAT?").

On the other hand, a seriously done complexity analysis is practically impossible for most things. Nearly everyone does it using additional assumptions and simplifications. But if you know what it's really about, you can at least derive correct results from the analysis.

So, my(!) final conclusion is: If it must handle arbitrary data, use the mighty O, otherwise code multiple possibilities and use a profiler.
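
One hedged illustration of "code multiple possibilities" (the cutoff of 16 below is a placeholder, not a measured value; in practice you would set it by profiling on your own data): keep the low-constant-factor algorithm for small inputs and the asymptotically better one for large inputs, and let the code choose.

    // Hybrid dispatch: insertion sort for tiny inputs, std::sort otherwise.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    void insertion_sort(std::vector<int>& v) {
        for (size_t i = 1; i < v.size(); ++i) {
            int key = v[i];
            size_t j = i;
            while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
            v[j] = key;
        }
    }

    void hybrid_sort(std::vector<int>& v) {
        const size_t kCutoff = 16;           // placeholder: tune by measuring
        if (v.size() <= kCutoff)
            insertion_sort(v);               // O(n^2), tiny constant factor
        else
            std::sort(v.begin(), v.end());   // O(n log n)
    }

    int main() {
        std::vector<int> small_input = {5, 2, 9, 1};
        std::vector<int> big_input(100000);
        for (size_t i = 0; i < big_input.size(); ++i)
            big_input[i] = (int)(big_input.size() - i);   // descending values
        hybrid_sort(small_input);            // takes the insertion-sort path
        hybrid_sort(big_input);              // takes the std::sort path
        std::printf("sorted: %d %d\n",
            (int)std::is_sorted(small_input.begin(), small_input.end()),
            (int)std::is_sorted(big_input.begin(), big_input.end()));
    }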

Yes, Knuth, posted 28 Oct 2001 at 15:11 UTC by roundeye » (Journeyer)

goingware, I was just going to reply that an over-dependence upon big-O notation for choosing algorithms is an artifact of the way run-time analysis is being taught. While I've never seen big-O taught without a formal definition which includes examples of how big-O comparable algorithms may vary widely in performance (the current fastest matrix multiplication algorithms being a great expository case), it seems that without a solid grounding in low-level performance analysis most students end up taking home the idea that ignoring the big-O constants is just fine... which is limiting and probably just plain Bad.

I was thinking that Knuth's ACP is a great introduction to performance analysis. After spending a couple of semesters adding up operations in MIX a student would have a feel for algorithmic comparison, at which point big-O analysis could be taught without fear of the student forgetting about the constants indefinitely.

While I don't expect the normal academic curriculum to reroute itself to start with Knuth, it certainly wouldn't hurt. I'd say to aspiring programmers wanting to learn about performance and complexity, go to Knuth first and then come back to big-O analysis.

more than just "algorithms", posted 28 Oct 2001 at 17:20 UTC by apm » (Journeyer)

One of the common mistakes I run into is people not recognizing the difference between, for example:

    for each record in "select * from customers"
        if (record.city is "Saskatoon")
            process(record)

and

    for each record in "select * from customers where city = 'Saskatoon'"
        process(record)

If there are a lot of customers, and there's an index on city, then there's a huge difference, but a lot of inexperienced people just don't see it. There are two different algorithms here, but it's not obvious.

I/O, posted 29 Oct 2001 at 06:57 UTC by jmason » (Master)

goingware said:

In this particular case, a subroutine was being called for each pixel of the image to determine which stdio FILE to read the next pixel from, and then the pixel was obtained with getc(). Changing this to determine the input just once, then read the whole image in a single read() system call provided the greatest speedup.

Which raises a very good point. When you're profiling, be sure to take a good look at strace and see what system calls you're making; a highly efficient algorithm can be totally defeated by inefficient I/O (such as the above).

Big-O notation, posted 29 Oct 2001 at 16:08 UTC by forrest » (Journeyer)

I apologize if this is getting off-topic, but my curiosity was piqued when ali said

You can do a simple survey to see that problem: Ask some people what the Big-O complexity for the following problem is: "Given a set of n numbers, find the biggest one." Almost everyone will answer "O(n)", which is basically wrong (Or they'll gasp "Big-WHAT?").

I certainly would answer "O(n)".

To find the largest of n numbers, you cannot escape the need to examine the value of each and every one. In the most straightforward algorithm I can think of, you go through your list of numbers, comparing each number with the largest-found-so-far. That takes n-1 comparisons.

Is the algorithm I've just described not O(n)? Is there some more efficient algorithm to solve this problem?
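
For concreteness, here is a minimal version of the scan described above (under exactly the assumption questioned in the next comment: fixed-width machine integers, so each comparison counts as O(1)):

    // Largest of n numbers: one pass, n-1 comparisons.
    #include <cstdio>
    #include <vector>

    int largest(const std::vector<int>& nums) {
        int best = nums.front();             // assumes a non-empty set
        for (size_t i = 1; i < nums.size(); ++i)
            if (nums[i] > best) best = nums[i];
        return best;
    }

    int main() {
        std::vector<int> nums = {12, 7, 42, 3, 19};
        std::printf("largest = %d\n", largest(nums));   // prints 42
    }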

Re: Big-O notation, posted 29 Oct 2001 at 18:10 UTC by ali » (Apprentice)

No, it isn't possible to do it faster than O(n): unless the numbers are ordered in some way, you at least have to look at (or "read") each of them, which takes at least O(1) per number.

The problem is that reading and comparing numbers does not run in O(1). Given that (for example) you use binary-encoded integers, it takes O(m), where m is the number of binary digits. The common (but false, if we're talking about general algorithms) assumption is that m<=32 (or 64, or whatever), in which case it reduces to O(1). But generally speaking, finding the highest number in a set of just one number needs O(m) (with m as above), not O(1).

Because of that, real complexity analysis defines this magic n appearing in all those O's as the required space for the algorithm's input, measured in bits (for example).

That's what I meant by "it's hardly ever done correctly" - nobody wants to deal with that; you wouldn't get anywhere with reasonable effort. But it's a problem that nobody teaches it that way, so people cannot remember it when this distinction suddenly counts.

An (incorrect but simple) example of when it counts is with strings as identifiers. A lot of libraries provide algorithms with documented runtime complexities that rely on string comparison for key/value pairs or something similar. All those complexities are wrong, because they don't take into account that a user-provided arbitrary string may be 10 GByte long and isn't generally comparable in O(1). (This example is slightly incorrect because one can argue that memory isn't infinitely large. On a 32-bit OS the whole memory is limited to 2^32 bytes (I think...), so all memory operations are limited by that factor, and thus string comparison runs in O(1). But then you can also argue that find-the-largest is limited by that too, runs in O(1), and so on. But that doesn't lead anywhere...)

Hope this helped.

It depends (as always), posted 30 Oct 2001 at 19:03 UTC by roundeye » (Journeyer)

ali, I think the point on which we're all ultimately agreeing can be summarized by saying that complexity analysis is usually sloppy, and is unfortunately taught that way (maybe the last part's not fair to educators, who usually do teach rigorous concepts, but not in a way where students have to take home a rigorous understanding...). The corollary is that not only does sloppy analysis often not matter, but sometimes rigor is a downright annoyance. When it does matter, we should have the training, rigor, and tools to do it right.

The issues you brought up with the O(n) max problem have to do with sloppiness in analysis or specification. In the original form of the question we just know that we want the big-O complexity of finding the largest of a set of n numbers. We either have to ask for clarification, which is usually unnecessary and annoying (i.e., when it is without cause), or make a number of assumptions about the nature of the sloppily-stated problem. We assume things about the model of computation, the storage model, the complexity of numeric comparisons, the state of the data provided, etc. This is not to say that the analyst doesn't understand the impact of variations in these constraints, simply that he has to make assumptions.

In a computation model where the n numbers are represented by n sticks of spaghetti with n-proportional lengths the max problem can be solved in constant time by grabbing them in one hand and pushing them end-down on a flat surface and picking out the one which sticks up highest. This model is limited by the size of the hand, coefficients of friction, the ability to discriminate between close lengths, and a number of other factors.

If I were to say that analysis is hardly ever done correctly because it's rare to see anyone bringing up the spaghetti sorting model of computation (or whatever you'd like to call it) I'd be a bit extremist. Unless it really matters it doesn't really frigging matter. :-)

Re: It depends (as always), posted 31 Oct 2001 at 08:38 UTC by ali » (Apprentice)

roundeye, it's obviously true that it usually doesn't matter. But sometimes it does. So, it's okay to say that find-the-highest runs in O(n) because it's clear what assumptions are underlying it.

Besides that, your spaghetti model (albeit yummy) is a very nice example of when most assumptions break. Given n positive numbers, the largest being m, it takes O(log m) to compare two of them if they're binary encoded. Now, your spaghetti model is well known for being the ugliest problem in complexity theory: it's equivalent to using 1-ary digits (that is, "20" is represented by 20 repetitions of the same digit followed by a "stop" digit). In this case (because m = exp(log m)), you shift nearly every algorithm into exponential time.

(Um, the "look at your hand" fails if I give you three spaghetti, one is 1 inch long, and the other two have one end on your table and the other ends leaned against jupiter, their lengths differing by 2 inches. Now find the largest in O(3)...)

So, the common assumption is that you're using a k-ary alphabet with k>1.

All of this is not important if you want to solve simple problems, like our find-the-largest, and it's okay to completely neglect encodings, space requirements and everything.

It does, however, matter when you want to, for example, say something about P=NP, invent a crypto algorithm, or whatever. If you're discussing RSA (a prime-number based crypto algorithm), this encoding stuff gets massively important. If you play around with 32-bit numbers, you're lost. For the usual limited-size problems, complexity theory only gives you hints. It's really meant for arbitrary-sized problems.

I forgot the name, but there was a crypto algorithm that was called "safe" because it was based on (I think) the stable set problem, which is NP-hard. Now, people started using it, and everyone assumed that using NP-hard problems guaranteed safety. But, unfortunately, the inventors fell into exactly this trap: they used fixed sizes for the keys, which made the problem easily crackable. (Additionally, they had some structure in the key generation, which reduced the problem to a sub-problem-set of P-solvable problems.) So that's the perfect example of "you might neglect it, but you have to know when it counts, and what changes then." (Does anyone know what it was called?)
