A manifesto for wordexp()
Posted 23 Oct 2001 at 10:04 UTC by tim 
A common way of handling command lines leads to bugs and, in some
circumstances, potential security problems. Any program that uses a
configurable helper program (such as grip, KBiff, etc.) has to tackle
this issue. Unfortunately, the obvious way of doing it is wrong.
I saw a program called KBiff announced on freshmeat.net just now,
and the changes are:
Changed: Ability to reset 'mbox' style mailboxes to
old access time after checking and a few variable substitution rules
(%m, %u, %%) added.
The "variable substitution rules," on closer inspection, turn out
to be another instance of a problem that stems from handling command
lines as lines of text instead of as lists of words.
I have written an article aiming to help dispel the myth that a
command line is just a character string and that 'variable
substitutions' like %f-for-filename are just easy text
replacements in it.
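To make the failure concrete, here is a minimal sketch of what goes wrong
when %f is treated as a plain text replacement. The template "viewer %f"
and the file name are made up for illustration; shlex.split() stands in
for the word splitting a POSIX shell would do on the resulting line.

```python
import shlex

# Hypothetical helper-command template using a "%f" substitution rule.
template = "viewer %f"
filename = "my notes.txt"  # a single file whose name contains a space

# Naive approach: treat the command line as a plain character string.
command_line = template.replace("%f", filename)

# Split the result the way a POSIX shell splits an unquoted command line.
words = shlex.split(command_line)
print(words)  # the one file name has become two separate arguments
```

The helper ends up being handed two arguments where one was meant. A file
name containing shell metacharacters instead of a space is where the
security problems come in.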
If you maintain a program that manipulates command lines like that,
please take a look.
Tim's article advocates setting an environment variable before calling
system(), and using the shell to expand that variable for you. Using
the shell for variable expansion to avoid problems that
ultimately stem from shell quoting rules (and the fact that some
programmers and most users don't understand them) seems perverse. I
can't think of a solid technical reason why it won't work, it just
seems weird.
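For reference, the environment-variable approach can be sketched roughly
like this. It assumes a POSIX /bin/sh is available; the variable name FILE
and the printf command are placeholders, not anything taken from Tim's
article.

```python
import os
import subprocess

# A value that would cause trouble if pasted into the command string.
filename = "my notes; echo pwned"

# The command string is fixed; only the environment changes per invocation.
command = 'printf "%s\\n" "$FILE"'
env = dict(os.environ, FILE=filename)

# shell=True hands the string to /bin/sh, which expands "$FILE" itself.
result = subprocess.run(command, shell=True, env=env,
                        capture_output=True, text=True)
print(result.stdout, end="")
```

Because the shell expands "$FILE" after it has already parsed the command,
the value is never re-read as shell syntax, so the space and the semicolon
arrive in the helper as ordinary characters.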
My solution is to never, ever use the system() library call in
production code, because it involves an implicit invocation of
/bin/sh. As soon as a shell gets into the picture, you are doomed to
a future of quoting hell. If you avoid the shell entirely, and treat
command lines as they really are -- lists of strings -- then life is
good.
The only problem is that Perl is the only major programming
language that makes it easy to do things right. In Perl, you should
always use the list form of system:
system "ls", "-l", @FILES;
and never the string form.
In Python, avoid os.system(). You should either do-it-yourself with
os.fork() and one of the os.exec*() functions, or sneak in the back
way and use the 'spawn()' function from the distutils.spawn module (in
the standard library since Python 1.6). (Using the distutils spawn()
function also buys you portability to NT, which you would lose with
fork/exec.)
Perhaps Python's os.system() should be extended to take a list
argument. It's too late to get this into Python 2.2, but still worth
pursuing.
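As a sketch of what a list-taking interface looks like in practice: later
Python versions gained a subprocess module whose functions accept the
command as a list and run it without a shell. The child below is just the
Python interpreter echoing its arguments back, chosen so the example does
not depend on any particular helper program being installed.

```python
import subprocess
import sys

# Arguments that would need careful quoting if they went through a shell.
files = ["my notes.txt", "a;b", "$HOME"]

# Child program: print each received argument on its own line.
child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
argv = [sys.executable, "-c", child_code] + files

# A list argument is passed to the child as-is; no shell is involved.
result = subprocess.run(argv, capture_output=True, text=True, check=True)
received = result.stdout.splitlines()
print(received)
```

Each list element reaches the child as exactly one argument, so there is
nothing to quote.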
The problem isn't really the shell quoting rules, but the fact that
people don't tend to think in arrays when it comes to command lines.
The shell quoting rules are there to help you: they enable you
to handle file names with spaces, for example.
Really, I'm advocating that C users avoid trying to apply shell
quoting rules by hand and instead get wordexp() to do it for them.
Note that although an implementation of wordexp() that just calls
the shell is possible, it's not the implementation that GNU libc uses
(it is faster than that).
For Perl, I hadn't been aware that system() can take a list (I
guess I don't program enough Perl), and that form is certainly to be
preferred over the string form.
The real problem seems to stem from people's general desire to use
"special" substitution rules of their own rather than just using
environment variables from the outset.
Case in point: KBiff uses the special parameter '%u' to mean a
particular URL. In my view, it would have been wiser to start out
with '${URL}' instead of '%u'; that way there is no command line
manipulation necessary at all and wordexp() (or the shell, if you feel
the need to use that) will do The Right Thing.
It's the percent business that gets me. Just use environment
variables: it's what they're there for, after all.
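A rough illustration of the '${URL}' idea, with the caveat that this is
not wordexp(): Python has no binding for it in the standard library, so
the sketch below splits the template into words first and only then
expands variables inside each word. The build_argv() helper, the "browser"
command and the URL value are all hypothetical.

```python
import shlex
from string import Template


def build_argv(template, variables):
    """Split a command template into words, then expand ${NAME} per word."""
    # Splitting happens before substitution, so values are never re-split.
    words = shlex.split(template)
    return [Template(word).substitute(variables) for word in words]


argv = build_argv("browser --new-window ${URL}",
                  {"URL": "http://example.org/a b?x=1&y=2"})
print(argv)
```

Note one difference from the real thing: wordexp() follows the shell's
rules, so an unquoted ${URL} would still be subject to field splitting and
the template would want "${URL}" in double quotes. The sketch sidesteps
that by never splitting after substitution. Template.substitute() also
raises an error on a stray '$' or an unknown name, which seems like the
right failure mode for a configuration string.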