2007-08-13: In praise of the C preprocessor
Let's face it. cpp, the C preprocessor, gets a lot of flak among language
designers. People blame it for all sorts of atrocities, including code that
doesn't do what it says it does, weird side effects caused by
double-evaluation of parameters, pollution of namespaces, slow compilations,
and the generally annoying evil that is the whole concept of
declaring your API in C/C++ header files.
All of these accusations are true. But there are some things you just can't
do without it. Watch.
Example #1: doubling
Here's a typical example of why C/C++ preprocessor macros are "bad." What's
wrong with this macro?
#define DOUBLE(x) ((x)+(x))
(Notice all the parens. Neophytes often leave those out too, and hilarity
ensues when something like 3*DOUBLE(4) turns into 3*4+4 instead of 3*(4+4).)
But the above macro has a bug too. Here's a hint: what if you write this?
y = DOUBLE(++x);
Aha. It expands to y=((++x)+(++x)), so x gets incremented twice
instead of just once like you expected.
Macro-haters correctly point out that in C++ (and most newer C compilers),
you can use an inline function to avoid this problem and everything like it:
inline double DOUBLE(double x) { return x+x; }
This works great, and look: I didn't need the extra parens either. That's
because C++ language rules require each argument to be fully evaluated,
exactly once, before the function body runs, whether it's inline or not. It
would have been totally disastrous if inline functions didn't work like that.
Oh, but actually, that one function isn't really good enough: what if x is
an int, or an instance of class Complex? The macro can double
anything, but the inline can only double floating point numbers.
Never fear: C++ actually has a replacement macro system that's intended to
obsolete cpp. It handles this case perfectly:
template<typename T>
inline T DOUBLE(T x) { return x+x; }
Cool! Now we can double any kind of object we want, assuming it supports
the "+" operation. Of course, we're getting a little heavy on screwy
syntax - the #define was much easier to read - but it works, and there are
never any surprises no matter what you give for "x".
Example #2: logging
In the above example, C++ templated inline functions were definitely better
than macros for solving our problem. Now let's look at something slightly
different: a log message printer. Which of the following is better, LOGv1
or LOGv2?
#define LOGv1(lvl,str) do { \
    if ((lvl) <= _loglevel) print(str); \
} while (0)

inline void LOGv2(int lvl, std::string str)
{
    if (lvl <= _loglevel) print(str);
}
(Trivia: can you figure out why I have to use the weird do { } while(0)
notation?)
Notice that the problem from the first example doesn't happen here. As long
as you only refer to each parameter once in the definition, you're okay.
And you don't need a template for the inline function, because actually the
log level is always an int and the thing you're printing is (let's assume)
always a string. You could complain about namespace pollution, but they're
both global functions and you only get them if you include their header
files, so you should be pretty safe.
But my claim is that the #define is much better here. Why?
Actually, for the same reason it was worse in the first example:
non-deterministic parameter evaluation. Try this:
LOGv1(1000, hexdump(buffer, 10240));
Let's say _loglevel is less than 1000, so we won't be printing the message.
The macro expands to something like
if (1000 <= _loglevel) print(hexdump(buffer, 10240));
So the print(), including the hexdump(), is bypassed if the log level is too
low. In fact, if _loglevel is a constant (or a #define, it doesn't matter),
then the optimizer can throw it away entirely: the if() is always false, and
anything inside an if(false) will never, ever run. There's no performance
penalty for LOGv1 if your log level is set low enough.
But because of the guaranteed evaluation rules, the inline function actually
expands out to something like this:
std::string s = hexdump(buffer, 10240);
if (1000 <= _loglevel) print(s);
The optimizer throws away the print statement, just like before - but
it's not allowed to discard the hexdump() call! That means your
program malloc()s a big string, fills it with stuff, and then free()s it -
for no reason.
Now, it's possible that C++ templates - being a full-powered macro system -
could be used to work around this, but I don't know how. And I'm pretty
smart. So it's effectively impossible for most C++ programmers to
get the behaviour they want here without using cpp macros.
Of course, the workaround is to just type this every time instead:
if (1000 <= _loglevel) LOGv2(1000, hexdump(buffer, 10240));
You're comparing to _loglevel twice - before LOGv2 and inside LOGv2 - but
since it's inline, the optimizer will throw away the extra compare. But the
fact that one if() is outside the function call means it can skip evaluating
the hexdump() call.
The fact that you can do this isn't really a justification for leaving out a
macro system - of course, anything a macro system can do, I can also
do by typing out all the code by hand. But why would I want to?
Java and C# programmers are pretty much screwed here(1) - they have no
macro processor at all, and those languages are especially slow so
you don't want to needlessly evaluate stuff. The only option is the
explicit if statement every time. Blech.
Example #3: assert()
My final example is especially heinous. assert() is one of the most
valuable functions to C/C++ programmers (although some of them don't realize
it yet). Even if you prefer your assertions to be non-fatal, frameworks
like JUnit and NUnit have their own variants of assert() to check unit test
results.
Here's what a simplified assert() implementation might look like in C.
#define assert(cond) do { \
if (!NDEBUG && !(cond)) \
_assert_fail(__FILE__, __LINE__, #cond); \
} while (0)
We have the same situation as example #2, where if NDEBUG is set, there's no
need to evaluate (cond). (Of course, exactly this lack of evaluation is
what sometimes confuses people about assert(). Think about what happens
with and without NDEBUG if you type assert(--x >= 0).)
But that's the least of our worries: I never use NDEBUG anyway.
The really valuable parts here are some things you just can't do
without a preprocessor. __FILE__ and __LINE__ refer to the line where
assert() is called, not the line where the macro is declared, or they
wouldn't be useful. And the highly magical "#cond" notation - which you've
probably never seen before, since it's almost, but not quite, never needed -
turns (cond) into a printable string. Why would you want to do that? Well,
so that you can have _assert_fail print out something awesome like this:
** Assertion "--x >= 0" failed at mytest.c line 56
Languages without a preprocessor just can't do useful stuff like that, and
it's very bothersome. As with any macroless language, you end up
typing it yourself, like in JUnit:
assertTrue("oh no, x >= 5!", --x >= 0);
As you can see in the above example, the message is usually a lie, leading
to debugging wild goose chases. It's also a lot more typing and
discourages people from writing tests. (JUnit does manage to capture the
file and function, thankfully, by throwing an exception and looking at its
backtrace. It's harder, but still possible, to get the line number too.)
Side note
(1) The C# language designers probably hate me, but actually
there's nothing stopping you from passing your
C# code through cpp to get these same advantages. Next time someone
tells you cpp is poorly designed, ask yourself whether their "well-designed"
macro language would let you do that.
Syndicated 2007-08-12 03:19:56 from apenwarr's log