Advogato: A manifesto for wordexp()

Posted 23 Oct 2001 at 10:04 UTC by tim

A common way of handling command lines leads to bugs and in some circumstances potential security problems. Any program that uses a configurable helper program (such as grip, KBiff, etc.) has to tackle this issue. Unfortunately, the obvious way of doing it is wrong.

I saw a program called KBiff announced on freshmeat.net just now, and the changes are:

Changed: Ability to reset 'mbox' style mailboxes to old access time after checking and a few variable substitution rules (%m, %u, %%) added.

The "variable substitution rules," on closer inspection, turn out to be another instance of a problem that stems from handling command lines as single strings instead of as lists of words.

I have written an article aiming to help dispel the myth that a command line is just a character string and that 'variable substitutions' like %f-for-filename are just easy text replacements in it.

If you maintain a program that manipulates command lines like that, please take a look.
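
To make the failure concrete, here is a minimal sketch of the "obvious" approach going wrong. It is written in Python purely for illustration; the template "viewer %f" and the file name are made up, and shlex.split() only approximates how a POSIX shell would break the line into words.

```python
import shlex

def naive_substitute(template, filename):
    # The "obvious" approach: treat the command line as plain text
    # and paste the file name into it.
    return template.replace("%f", filename)

# Hypothetical helper command and file name, for illustration only.
command = naive_substitute("viewer %f", "my file.txt")

# Split the result roughly the way a POSIX shell would.
words = shlex.split(command)
print(words)  # ['viewer', 'my', 'file.txt']
```

The helper was meant to receive one file name and gets two bogus ones instead. A file name containing shell metacharacters would fare worse once a real shell is involved.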

Tim's article advocates setting an environment variable before calling system(), and using the shell to expand that variable for you. Using the shell for variable expansion to avoid problems that ultimately stem from shell quoting rules (and the fact that some programmers and most users don't understand them) seems perverse. I can't think of a solid technical reason why it won't work, it just seems weird.

My solution is to never, ever use the system() library call in production code, because it involves an implicit invocation of /bin/sh. As soon as a shell gets into the picture, you are doomed to a future of quoting hell. If you avoid the shell entirely, and treat command lines as they really are -- lists of strings -- then life is good.

The only problem is that Perl is the only major programming language that makes it easy to do things right. In Perl, you should always use the list form of system:

  system "ls", "-l", @FILES;

and never the string form.

In Python, avoid os.system(). You should either do-it-yourself with os.fork() and one of the os.exec*() functions, or sneak in the back way and use the 'spawn()' function from the distutils.spawn module (in the standard library since Python 1.6). (Using the distutils spawn() function also buys you portability to NT, which you would lose with fork/exec.)
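
A minimal sketch of the do-it-yourself route, assuming a Unix-like system where os.fork() is available. The function name run_argv and the choice of 127 as the "could not exec" status are my own conventions here, not anything Python prescribes.

```python
import os

def run_argv(argv):
    """Run argv (a list of strings) with no shell involved.

    Returns the child's exit status, or -1 if it was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process with the requested program.
        # Each list element reaches the program as exactly one argument.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (e.g. program not found); 127 is an assumed
            # convention borrowed from the shell.
            os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1

if __name__ == "__main__":
    # The space stays inside a single argument; nothing re-splits it.
    run_argv(["echo", "hello world"])
```

Because no shell ever sees the arguments, spaces and metacharacters in them need no quoting at all.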

Perhaps Python's os.system() should be extended to take a list argument. It's too late to get this into Python 2.2, but still worth pursuing.

The problem isn't really the shell quoting rules, but the fact that people don't tend to think in arrays when it comes to command lines. The shell quoting rules are there to help you: they enable you to handle file names with spaces, for example.

Really, I'm advocating that C users avoid trying to apply shell quoting rules by hand and instead get wordexp() to do it for them.

Note that although an implementation of wordexp() that just calls the shell is possible, it's not the implementation that GNU libc uses (it is faster than that).

For Perl, I hadn't been aware that system() can take a list (I guess I don't program enough Perl...), and that form is certainly to be preferred over the string form.

The real problem seems to stem from people's general desire to use "special" substitution rules of their own rather than just using environment variables from the outset.

Case in point: KBiff uses the special parameter '%u' to mean a particular URL. In my view, it would have been wiser to start out with '${URL}' instead of '%u'; that way there is no command line manipulation necessary at all and wordexp() (or the shell, if you feel the need to use that) will do The Right Thing.

It's the percent business that gets me. Just use environment variables: it's what they're there for, after all.
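
As a rough illustration of the idea in Python, which has no wordexp() in its standard library: the sketch below approximates it by splitting the configured command into words first and only then expanding ${NAME} references inside each word. The function name expand_command, the command "browser" and the URL are all hypothetical, and this split-then-expand order is a stand-in technique, not what wordexp() itself does internally.

```python
import os
import shlex
from string import Template

def expand_command(template, extra_env):
    """Split template into words, then expand ${NAME} in each word.

    Variables come from the process environment plus extra_env.
    A value is never re-split, so it always stays inside one word.
    """
    env = dict(os.environ, **extra_env)
    words = shlex.split(template)
    return [Template(word).substitute(env) for word in words]

# Hypothetical configured helper command and URL, for illustration only.
argv = expand_command("browser ${URL}", {"URL": "http://example.org/a b?x=1&y=2"})
print(argv)  # ['browser', 'http://example.org/a b?x=1&y=2']
```

The resulting list can be handed straight to an exec-style call. Note that a real shell would split an unquoted expansion on whitespace, so with the shell the configured command should say "${URL}" in double quotes; this sketch sidesteps that by never re-splitting.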

Using the shell to avoid shell problems?, posted 23 Oct 2001 at 14:10 UTC by gward » (Master)

Weird to use shell quoting rules?, posted 23 Oct 2001 at 15:52 UTC by tim » (Journeyer)