Unix Errors are Stupid
Posted 21 Mar 2000 at 19:07 UTC by Ankh 
The Unix system-level error reporting mechanism (errno and friends) leads to programs with poor error recovery,
misleading feedback to users, and helps to perpetuate a culture of system-based rather than task-based interaction.
For those of us who remember "?" as being the editor's only error, yes, this is an improvement, but...
It doesn't have to be this way.
Today, a C program that runs on Unix interacts with the world through
system calls.
All input and output, memory allocation, file access, starting and stopping,
ultimately boil down to system calls.
System calls either work or fail.
If they fail, they set a thread-local variable, errno, to indicate the error.
It is possible to map errno into a human-readable string that gives some
indication of the possible problem. It is not so easy for the program
itself to interpret the problem. Ad-hoc system-dependent code must be
written to handle each case.
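A minimal sketch of that ad-hoc handling, assuming a POSIX system. The helper open_for_reading() and its advice strings are hypothetical. The caller checks for -1 and saves errno at once. strerror() supplies text for a human, but deciding what the program should do still takes a hand-written case for each errno value.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a file for reading and, on failure, print
 * some advice.  Returns the descriptor, or -1 with the saved errno
 * value stored in *err_out. */
int open_for_reading(const char *path, int *err_out)
{
    int fd = open(path, O_RDONLY);
    int err;

    if (fd != -1) {
        *err_out = 0;
        return fd;
    }
    err = errno;        /* save it before any other call can change it */
    *err_out = err;

    /* strerror() turns the code into text for a human... */
    fprintf(stderr, "%s: %s\n", path, strerror(err));

    /* ...but acting on it needs a hand-written case per errno value. */
    switch (err) {
    case ENOENT:
        fprintf(stderr, "  the file, or some directory above it, is missing\n");
        break;
    case EACCES:
        fprintf(stderr, "  permission was denied somewhere along the path\n");
        break;
    case ENOTDIR:
        fprintf(stderr, "  something used as a directory in the path is not one\n");
        break;
    default:
        fprintf(stderr, "  no specific advice for this error\n");
        break;
    }
    return -1;
}
```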
Here are some examples to illustrate this.
ENOENT
A file or directory did not exist ("No such file or directory").
A call to open() a file that does not exist would generally
produce this error, but the same value would be returned on
an attempt to open a file in a directory that did not exist.
open("/temp/not-there/boy", O_RDONLY) is indistinguishable from
open("/tmp/not-there.txt", O_RDONLY).
Recovery in the first case might be to create the directory,
or to check that software has been installed correctly.
Recovery in the second case might be to create the file; but
trying to create a file in a directory that isn't there will
lead to exactly the same error.
The error message is too generic.
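One after-the-fact way to separate the two ENOENT cases is to stat the parent directory. This is a sketch under that assumption. classify_enoent() and its enum are hypothetical names. The filesystem can change between the failed open() and the check, so the result is only a best guess.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum enoent_cause {
    CAUSE_MISSING_FILE,      /* parent directory exists, the file does not */
    CAUSE_MISSING_DIRECTORY, /* the parent directory itself is missing */
    CAUSE_UNKNOWN
};

/* Hypothetical helper: call only after open(path, ...) failed with ENOENT. */
enum enoent_cause classify_enoent(const char *path)
{
    char parent[4096];
    const char *slash = strrchr(path, '/');
    struct stat st;

    if (slash == NULL) {
        snprintf(parent, sizeof parent, ".");   /* no directory part */
    } else if (slash == path) {
        snprintf(parent, sizeof parent, "/");   /* directly under the root */
    } else {
        int n = snprintf(parent, sizeof parent, "%.*s", (int)(slash - path), path);
        if (n < 0 || (size_t)n >= sizeof parent)
            return CAUSE_UNKNOWN;               /* path too long for the buffer */
    }

    if (stat(parent, &st) == -1)
        return errno == ENOENT ? CAUSE_MISSING_DIRECTORY : CAUSE_UNKNOWN;
    return S_ISDIR(st.st_mode) ? CAUSE_MISSING_FILE : CAUSE_UNKNOWN;
}
```

With this, a program could offer to create the directory in the first case and the file in the second.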
ENOTTY
The isatty(fd) function uses the ioctl() system call to check
whether the file descriptor fd refers to a terminal or a regular
file. The standard I/O package, stdio, uses isatty() to determine
whether to line-buffer standard output.
As a result, the first time you call printf() to a non-terminal
file, errno is set to ENOTTY.
The following code will always print "not a typewriter" on such
systems, if the user tries to save the error messages to a file:
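#include <stdio.h>
#include <stdlib.h>

void fatalError(char *message)
{
    extern char *progname;

    if (progname) {
        fprintf(stderr, "%s: ", progname);
    }
    fprintf(stderr, "fatal error: ");
    /* By now errno may hold ENOTTY from stdio, not the original error. */
    perror(message);
    exit(1);
}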
The error message mechanism decouples the cause of an error from the reporting.
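One way to keep cause and report together is sketched below. It assumes the caller captures errno itself, right after the failing call, and passes the saved value down explicitly instead of relying on the global errno at report time. format_error() is a hypothetical name. It writes into a buffer so the same text could go to stderr, a log, or a dialog.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of fatalError(): the error code is a parameter,
 * so nothing stdio does in the meantime can change it. */
int format_error(char *buf, size_t size, const char *progname,
                 const char *message, int errnum)
{
    return snprintf(buf, size, "%s%sfatal error: %s: %s",
                    progname ? progname : "",
                    progname ? ": " : "",
                    message, strerror(errnum));
}
```

A caller would write `int saved = errno;` immediately after the failed call and pass `saved` along.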
ENOTDIR
The shell command mkdir /bin/sh/x produces:
mkdir: cannot make directory "/bin/sh/x": Not a directory
Obviously I know it's not a directory; that's why I'm trying to create it with
mkdir in the first place. The real problem is that
/bin/sh is a regular file, not a directory.
The error message is at the
wrong level of detail, and does not relate at all to the user's task.
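A sketch of the diagnosis that would give the more useful answer: walk the path one component at a time and report the first prefix that exists but is not a directory. find_non_directory() is a hypothetical name. Like any check made after the failed call, it can race with other changes to the filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical ENOTDIR diagnosis: for "/bin/sh/x", report "/bin/sh".
 * Returns 1 and fills `culprit` when such a prefix is found, else 0. */
int find_non_directory(const char *path, char *culprit, size_t size)
{
    const char *slash = path;
    struct stat st;

    if (path[0] == '\0')
        return 0;
    /* Search from the second character so a leading "/" is not a prefix. */
    while ((slash = strchr(slash + 1, '/')) != NULL) {
        int n = snprintf(culprit, size, "%.*s", (int)(slash - path), path);
        if (n < 0 || (size_t)n >= size)
            return 0;          /* prefix too long for the buffer */
        if (stat(culprit, &st) == -1)
            return 0;          /* prefix missing: a different problem */
        if (!S_ISDIR(st.st_mode))
            return 1;          /* a non-directory used as a directory */
    }
    return 0;
}
```

A wrapper could then print something like `"/bin/sh" is a file, not a directory`.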
Solutions
A few years ago (1988) I wrote a small library of wrapper functions
for Unix system calls such as unlink() and open().
The idea was to attempt to diagnose common problems and produce
more helpful error messages than perror(3).
An example error message was:
etest: could not open file "/bin/sh/boy.c" for reading
etest: -- "/bin/sh" is a file, not a directory, and cannot contain "boy.c"
This library no longer exists in a useful form (I can mail what I have
to anyone who wants it, but there's not much left. Maybe I was testing the unlink wrapper
too much :-), but in any case it predates widespread implementation of ANSI C.
The library had to be hand-crafted for each platform, which was a pain.
Much of the information is split between the manual page and the source
code for the system call or library function concerned.
Maybe if the manuals used XML to document the error returns, I could
write something to process man pages and automatically build the
library.
As it is, the inconsistent use of troff macros makes that difficult.
Maybe if that XML was available at runtime, I could do something even
smarter, and check it on the fly.
But this is a lunatic idea!
I'm talking about interpreting the online
documentation on the fly so that a program can understand an error
return that's clearly inadequate.
Why not fix the API instead?
error_t e = Eopen(
    &result_fd,
    "filename",
    "configuration file", /* short description of the file's role */
    O_RDONLY,             /* or whatever */
    mode                  /* optional, as with open() */
);
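A minimal sketch of how such an Eopen() might work, assuming it simply wraps open() and records what was attempted alongside errno. The record type is called eopen_error here because glibc's <errno.h> already defines an error_t type when _GNU_SOURCE is set. The field names and message format are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

typedef struct {
    int errnum;               /* errno from open(), 0 on success */
    const char *path;         /* the file involved */
    const char *description;  /* its role, e.g. "configuration file" */
    int flags;                /* open() flags, to say what was attempted */
} eopen_error;

eopen_error Eopen(int *result_fd, const char *path, const char *description,
                  int flags, mode_t mode)
{
    eopen_error e = { 0, path, description, flags };

    *result_fd = open(path, flags, mode);
    if (*result_fd == -1)
        e.errnum = errno;     /* captured at the failure site */
    return e;
}

/* e.g. could not open configuration file "/etc/x.conf" for reading: ... */
int eopen_message(const eopen_error *e, char *buf, size_t size)
{
    const char *purpose;

    switch (e->flags & O_ACCMODE) {
    case O_RDONLY: purpose = "reading"; break;
    case O_WRONLY: purpose = "writing"; break;
    default:       purpose = "reading and writing"; break;
    }
    return snprintf(buf, size, "could not open %s \"%s\" for %s: %s",
                    e->description, e->path, purpose, strerror(e->errnum));
}
```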
This might be improved by passing down a task context stack,
so that an error message
can relate the user-level task (run web browser), the application's task (load user
interface definition files) and the system error (/proc/xml not found).
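A rough sketch of such a task context stack, assuming a single-threaded program and a fixed maximum depth. The task_push()/task_pop()/task_report() names are hypothetical. Each layer pushes a short description of what it is doing for the user. The report then names every level, ending with the system error.

```c
#include <stdio.h>
#include <string.h>

#define TASK_DEPTH 16

static const char *task_stack[TASK_DEPTH];
static int task_top;

void task_push(const char *what)
{
    if (task_top < TASK_DEPTH)
        task_stack[task_top] = what;
    task_top++;              /* count overflow pushes so pops still balance */
}

void task_pop(void)
{
    if (task_top > 0)
        task_top--;
}

/* Write "outer task: inner task: system error" into buf (size > 0). */
void task_report(char *buf, size_t size, const char *system_error)
{
    int depth = task_top < TASK_DEPTH ? task_top : TASK_DEPTH;
    size_t used = 0;

    buf[0] = '\0';
    for (int i = 0; i < depth && used + 1 < size; i++) {
        snprintf(buf + used, size - used, "%s: ", task_stack[i]);
        used = strlen(buf);  /* stays below size even if output was cut short */
    }
    if (used + 1 < size)
        snprintf(buf + used, size - used, "%s", system_error);
}
```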
Note that exception mechanisms may make this problem worse, not better, by
decoupling the cause of the error (where it's easiest to generate a more precise
report) from the task (where the information is needed).
When I offered to donate code in the past, I now see that I offered it to
the wrong people: people for whom ed's famous "?" error message was
satisfactory. But that was over ten years ago. Maybe times have changed?
Is it time to build something better?
exceptions, posted 21 Mar 2000 at 19:47 UTC by graydon »
(Master)
I do not agree that exceptions are worse; they are completely different
animals from error codes. an exception decouples the error's
recovery procedure from its detection phase. You are more than
welcome to write a wrapper library around the standard C libraries which
diagnoses errors in a detailed way, but it is still useful in
many cases to throw such well-detected and well-described error states
up the call stack.
it is especially useful in library code, where the library
author doesn't even know how the library is being used, and how the user
wants to recover.
one of the biggest problems with exceptions is that java and C++ both
implement them in a broken way. java checks exceptions at compile time
but does not provide any programmer control over resource release, so
there's no way to be sure that leaving a stackframe via an exception
will clean up all the things acquired in that frame. while C++ does
provide the means to release things properly, it fails to properly
enforce the compile-time checking and also fails to enforce
the cleanup by permitting non-smart pointers to unreclaimed memory.
proper use of C++ exceptions requires control over all the source you
depend on and all the programming practises of your contributors: not
likely.
the "errno" solution is popular, not because it is pretty or well
thought out, but because most people code programs optimistically
anyway. same reason unchecked casting is a popular solution to generic
programming, even though there are safe ways of doing it. enforcing
safety is never going to be popular. it's like enforcing documentation.
imagine a language which wouldn't compile a function, class or module
without docstrings!
the concept of a "task context stack" is identical to the
caller's stack of exception handlers, or rather set of
exception handlers embedded in the call stack.
The Why of the API, posted 21 Mar 2000 at 20:06 UTC by idcmp »
(Journeyer)
Basic API calls cover What/Where you are trying to do, and the
documentation discusses How they are going about it, (Who is an
environmental factor), but
in many cases "Why" isn't covered anywhere.
Why are you allocating this memory? Why are you opening this file? Why
are you binding to this socket? Why are you locking this fd?
I would imagine the only reason nobody has built an API call that can be told
"Why" you are doing things is that it's a recipe for
over-engineering, and that normal design-by-committee issues
occur and eventually clobber such projects.
Sometimes I wish apps would just sleep() for a bit if their malloc()
failed. Sometimes I really wish the OS would just tell me it's out of
memory. I really don't want it killing /bin/bash to make room for a
temporary buffer of another app.
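The "sleep for a bit if malloc() fails" wish can be sketched as a wrapper. The name malloc_patiently() and its retry parameters are hypothetical. On a kernel that overcommits memory, malloc() may succeed and the process be killed later instead, so this only helps where malloc() actually reports failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrapper: retry a failed allocation a few times, sleeping
 * in between, before handing the failure back to the caller. */
void *malloc_patiently(size_t size, int attempts, unsigned int delay_seconds)
{
    for (int i = 0; i < attempts; i++) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        if (i + 1 < attempts)
            sleep(delay_seconds);   /* give other processes a chance to free memory */
    }
    return NULL;                    /* caller still has to handle failure */
}
```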
From man 2 open:
ENOTDIR A component used as a directory in pathname is
not, in fact, a directory, or O_DIRECTORY was
specified and pathname was not a directory.
There are a great many possible errors. The fact that libcs don't
implement them correctly doesn't invalidate the API.
Anyway, not all applications want detailed errors (in fact, I believe
most don't), and that is reason enough for not rolling all kinds of
fancy checks into the libc.
If you want to do it, fine, your app can do it on its own. If you want
to write a wrapper lib to do it, go ahead.
But please don't consider bloating *all* applications for the benefit of
a few.
I don't understand why graydon says
that ``java checks exceptions at compile time but does not provide
any programmer control over resource release, so there's no way to be
sure that leaving a stackframe via an exception will clean up all the
things acquired in that frame.'' Java is not perfect, but I
think exceptions are generally pretty well designed and do not have
this shortfall at all.
The most obvious and common resource acquired by a program is the
creation of new objects. Since Java requires garbage collection these
objects will always be cleaned up properly: this code will not leak:
public void foo(boolean damage) {
Object a = new int[10];
if (damage)
throw new RuntimeException();
}
(Advogato's lack of <pre> tags sucks)
The second most common and important resource acquired is
synchronization monitors. Again, because these are always tied to
lexical scope they will be cleaned up without requiring any programmer
intervention. This method will also never return without releasing
the monitor:
public void foo(boolean damage) {
synchronized (this) {
this.a++;
if (damage)
throw new RuntimeException();
}
}
Well over 90% of Java classes will clean up in a completely
satisfactory way in the presence of exceptions using these techniques.
In general code can respond to exceptions it didn't expect in a
reasonable way.
Explicit programmer management of resource cleanup is sometimes
required, usually because the resources in question are not directly
controlled by the JVM. Consider for example wanting to make sure that
a database transaction is rolled back immediately in the case of an
error:
public void foo(Database db, boolean damage) {
Transaction tx = null;
try {
tx = db.newTransaction();
tx.doSomething();
tx.doOtherThing();
tx.commit();
tx = null;
} finally {
if (tx != null)
tx.rollback();
}
}
The finally clause is always executed when exiting the block,
whether normally or abnormally. (There is a formal and clear explanation of
this statement in the Java Language Specification.)
Another great feature of Java exceptions is that because they're
just objects we can for example define chained
exceptions that contain information about underlying causes. For
example "couldn't save record" because "database transaction failed"
because "IO error on hda1". We can also write for example a
general-purpose logger or error dialog that interrogates the exception
object for information.
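For comparison, C has no exception objects, but the same chain of causes can be sketched as a linked structure. The names here are hypothetical. Unlike an exception, each layer must hand the chain back to its caller explicitly.

```c
#include <stdio.h>
#include <string.h>

struct chained_error {
    const char *what;
    const struct chained_error *cause;   /* NULL for the innermost error */
};

/* Write "outer because inner because ..." into buf (size > 0). */
void describe_chain(const struct chained_error *e, char *buf, size_t size)
{
    size_t used = 0;

    buf[0] = '\0';
    for (int first = 1; e != NULL && used + 1 < size; e = e->cause, first = 0) {
        snprintf(buf + used, size - used, "%s%s", first ? "" : " because ", e->what);
        used = strlen(buf);              /* safe even if output was truncated */
    }
}
```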
One problem with this, of course, is that creating an Exception
object every time an operation fails is pretty expensive, whereas the
Unix kernel can just return an integer value which is much cheaper.
Creating all these objects can cause performance problems in Java
programs even on modern hardware, so I imagine it was completely
infeasible on original Unix systems. People might be hesitant to
put any additional cost into the kernel where it has to be paid by
every single program. Remember that in many cases error codes are
harmless and will not be reported to the user, and so it would be a
waste of time to generate detailed messages: look at how many times
-ENOENT is returned while libc starts up.
I'd be interested to see the code for your error-reporting library,
and I think it could be a very good thing. It seems nearly impossible
to implement your check for "/bin/sh/boy.c" in userspace without
introducing race conditions: it's no good to go back and check one
component at a time if the kernel fails the call, because the
situation may have changed in the interim. Perhaps we could augment
errno with a more detailed explanation that was filled out in the
kernel at the moment the error is detected.
Microsoft has experimented
with several different exception-handling schemes in the Windows API.
Perhaps other people can comment: last time I looked, three
incompatible systems were used in different parts of the Win32/AFC
code.
I believe that graydon is referring to the fact that since a Java class
does not have a destructor as such, and since in Java there is no such thing
as a stack-allocated object (all class instances (objects) in Java are
allocated in the free store and are subject to garbage collection), the
programmer cannot define classes whose instances release resources upon
destruction, as one can in C++.
As far as I recall, there is in Java a finalize method that is
run when the object is deallocated, but since this is run at some
indeterminate point in the future, it is not equivalent to a C++
destructor, since we always "know" when a C++ destructor is run. Not
only that: we rely on it.
The use of stack-allocated objects in C++ to acquire and release
resources is a very useful idiom. It interacts very well with exception
handling and C++ exception handling would be far less useful without it.
Finalizers in Java, posted 23 Mar 2000 at 19:10 UTC by mbp »
(Master)
You're correct that Java finalizers are not run at a strictly defined
time as C++ destructors are, but in practice this is not usually a
problem. It just requires a slightly different idiom to what one is
used to in C++, and in any case graydon said `any control', not `control
through object destruction'.
To my mind
public void foo(byte[] array) throws IOException {
    OutputStream os = new FileOutputStream("/tmp/a");
    try { os.write(array); } finally { os.close(); }
}
is sufficiently straightforward. Not allowing objects on the stack
trades off flexibility for simplicity, but it certainly doesn't disallow
deterministic cleanup.
"finally" isn't quite right though. all it guarantees is that the
finally block is "run" on exit -- it does not guarantee that the finally
block will complete. suppose I allocate 10 sensitive objects which need
to be finalized on exit. my finalizer block can do something like
finally {
frob.finalize();
snerk.finalize();
tweedle.finalize();
...
}
but if each of those finalizers might throw a different exception, I
need to nest the handlers and finalizers:
finally {
    try {
        frob.finalize();
    } finally {
        try {
            snerk.finalize();
        } finally {
            tweedle.finalize();
            // ...and so on, one more level per object
        }
    }
}
and honestly, if something is this awkward idiomatically, it's
being done wrong. if you have a program with a lot of different failure
modes, it can kill program comprehension to have a lot of noise
like this to handle things.
furthermore, since the VM itself will (might) call finalize()
on the objects when it GC's them, you really need to set a "clean" flag
inside the object which the finalizer checks to ensure it doesn't run
twice (and thus, in some cases, hurt the underlying system even worse).
compare this approach with stack objects (or even immutables, like
in Sather) and I think it's clear that stack objects win.
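For comparison, C has neither finally nor destructors. Its usual idiom for the nested-cleanup problem is a chain of goto labels that release, in reverse order, only what was already acquired, so the cleanup stays flat however many resources there are. This is a sketch: copy_first_lines() and the files it opens are hypothetical.

```c
#include <stdio.h>

int copy_first_lines(const char *a_path, const char *b_path, const char *out_path)
{
    int status = -1;
    FILE *a, *b, *out;
    char line[256];

    if ((a = fopen(a_path, "r")) == NULL)
        goto done;
    if ((b = fopen(b_path, "r")) == NULL)
        goto close_a;
    if ((out = fopen(out_path, "w")) == NULL)
        goto close_b;

    /* body: copy the first line of each input to the output */
    if (fgets(line, sizeof line, a) != NULL)
        fputs(line, out);
    if (fgets(line, sizeof line, b) != NULL)
        fputs(line, out);
    status = 0;

    if (fclose(out) != 0)   /* closing can fail too, e.g. on a full disk */
        status = -1;
close_b:
    fclose(b);
close_a:
    fclose(a);
done:
    return status;
}
```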
Not allowing objects on the stack trades off flexibility for simplicity,
but it certainly doesn't disallow deterministic cleanup.
But it does disallow deterministic automatic cleanup.
The problem with the finally { ... } idiom is that it places
the responsibility for releasing the resources with the programmer. I
feel that this task is better handled by the object itself; the object
is in a unique position to know what it has acquired and thus to safely
and completely release it.
The wonderful thing about the "resource acquisition is initialization"
idiom is that classes that use it "just work". That is where simplicity
is gained.
Stack objects, posted 24 Mar 2000 at 00:43 UTC by mbp »
(Master)
Sure, but in C++ having objects on the stack allows all kinds of
interesting damage to do with object slicing, keeping references to dead
objects, and so on. On the other hand scoping objects in this way can
be very clean. This is appropriate and necessary in the no-guard-rails
style of C++, but would go against the OH&S design criteria of
Java. Java's pretty keen on there only being a single right way to do
things.
I've had to use the nested finalizers idiom occasionally to get correct
cleanup, but my point is that in the general case cleanup is automatic, and the
complicated case (of holding externally controlled resources) is at
least possible.
Personally I think the Java design of monitors is worse than finalizers:
associating a lock with every object is inefficient; making monitors
public breaks encapsulation; and making primitive monitors re-entrant is
questionable.
Java exceptions, posted 24 Mar 2000 at 19:19 UTC by jwz »
(Master)
My two biggest complaints with Java exceptions are how static they are,
and that there is no way to register a handler for an exception that
will decide to continue, rather than throwing out.
I find that as I'm writing code, I constantly have to go back and
modify multiple files as I realize down the line that some routine calls
some other routine that might sometimes throw some new exception. So I
have to go all the way back up the potential call stack and add those
exceptions to the list of all callers. This is bogus and non-objecty.
It's bogus that I can't register a handler that would do something
analogous to handling floating-point underflow by returning 0. (I say
``analogous to'' because in that particular case, there are performance
class of problems still exists.)
Sneakums is right that putting code in object finalization methods
is generally better (cleaner, safer) than using `finally' clauses.
I found the exception mechanisms used in Flavors and CLOS to be a
lot easier to deal with than Java's.
I wish Java had some notion of stack-allocated objects, but only for
performance reasons, not because of the programming idioms it allows.
Generally, any time you care deeply about when an object is actually
destroyed/finalized, it's because you're doing manual storage
management, and that kind of misses the point of working in a GCed
environment. Because the thing is, some day your assumption about when
an object is really dead is going to be wrong, whereas if you
just let GC do its job, you would never be wrong.
Of course the real bitch about Java is that it's impossible to
define new syntactic elements. For example, Common Lisp has `open' and
`close' functions for files, but you pretty much never use them, instead
you use (with-open-file (fd "name" ...) ...body...) where
with-open-file is a macro that does the equivalent of `try/finally' for
you. Since neither Java nor C have a sensible macro mechanism like
this, you push the effort off to each and every programmer (each
consumer of your APIs) to manage their finallys by hand, rather than
just providing them with a `with-frobbing-foo' form that scopes things
properly.
It's much easier to get people to write
(with-open-file (a ...)
  (with-open-file (b ...)
    (with-open-file (c ...)
      ...body...)))
than the equivalent
try {
    a = open(...);
    try {
        b = open(...);
        try {
            c = open(...);
            ...body...
        } finally {
            close(c);
        }
    } finally {
        close(b);
    }
} finally {
    close(a);
}
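Without such macros, a C library can get part of the way to with-open-file with a callback. The library opens the file, runs the caller's code, and always closes the file afterwards. with_open_file() and write_string() are hypothetical names. Nesting three files would still take three callbacks, so this is clumsier than the Lisp form.

```c
#include <stdio.h>

typedef int (*file_body)(FILE *fp, void *arg);

/* Open, run the body, always close; returns the body's result, or -1
 * if the open or the close failed. */
int with_open_file(const char *path, const char *mode, file_body body, void *arg)
{
    FILE *fp = fopen(path, mode);
    int result;

    if (fp == NULL)
        return -1;
    result = body(fp, arg);
    if (fclose(fp) != 0 && result == 0)
        result = -1;     /* a failed close (e.g. lost buffered data) is an error too */
    return result;
}

/* Example body: write the string passed in arg. */
int write_string(FILE *fp, void *arg)
{
    return fputs((const char *)arg, fp) == EOF ? -1 : 0;
}
```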
(My god, the HTML parsing that Advogato does is complete shit!
Every time I do `preview' it doubles the number of <P> tags,
and adds more newlines. Now I'm editing text where some sentences
have each word on their own line!)
hmmm, posted 24 Mar 2000 at 21:50 UTC by Ankh »
(Master)
The comment I threw in (incoherently) about exceptions seems to have been more controversial than anything else,
interestingly. But perhaps not surprisingly.
The reason I said that exceptions make error reporting worse, apart from being a blatant effort to stir
people up :-) was that they invite a kind of programming that focuses on what worked, not on
what failed.
I'd like to go back a bit; when I mentioned a task context, I was thinking not of the call frame and a thread
context, but of a human user task context. You might have several function calls (going back to C) that
are all in support of saving a configuration file, and that might be done because you changed your
Garment Colour Preference to Purple as part of ordering a pair of socks.
In that example, if the config file save failed (out of disk space, say), I want to know whether
my order failed, or if I'll be charged money for purple socks that will never arrive. If the program is running
locally, or if I choose "more details", or in a logfile (erp, but we're out of space!) I want to see that the
code was trying to save a conf file, that it was because my preferences had changed, and that this
particular file system had filled up, and the conf file was (or was not) trashed, and the backup was (was not)
restored.
All of this is possible with exceptions and having the top-level module report the error, as long as your
language supports modifying the exception as it filters up, to add sub-task information.
It's also possible by passing the information down the stack, or maintaining a User Task Model
separately from, or as part of, your Data Model (if you use M-V-C). In this case, the code generating the
error message has access to all the details it might need about the exact local problem (which disk is full),
so it might be easier to write better errors, but the code is more likely to be generic and shared, so it is
less likely to want to do so.
The knee-jerk reaction of many programmers to someone who wants to make software accessible to
less technical people, or to provide enough information about problems that you don't need to be a
macho boot-wearing mountaineer (OK, I'm not macho, I admit it darlings) is to say that the result will
be "bloated" (scroll up, someone said it already). Someone told me the other day he wasn't interested
in using XML for data files (for bind) because "XML is a bloated library". Ignorance is everywhere.
Keep your laser handy.
I tried to connect to an IRC server yesterday and spelt the name wrong. BitchX said, can't connect
to server xxx: No such file or directory. Good one. A traceroute and a ping and a telnet later, I eventually
worked out the problem. OK, so I'm slow, and I don't wear shoes.
I'd like to see the environments I use have better error handling. Is that bad?
Oh, for those who wanted to see the code I have, I did go through an 11-year-old backup tape and
found something broken, so I salvaged a tiny part of it at
www.holoweb.net/~liam/elib0.01 but I'm not sure
it's worth looking at. I'll see if I can find the code that tried to diagnose problems, as it's more interesting,
but Clyde has my Zip drive and FreeBSD can't read SPARC SunOS SCSI disks...
I suggest a set of wrappers for open() and friends that can be in a shared library and can
help with errors. This can make command-line applications smaller (no need to test, print error and exit all over
the place) and can help GUI applications recover more gracefully. I don't have time now (alas!) to
devote to writing such a library, or even to managing it as an open source project, so I've thrown it out
in case someone else does. Open Source Ideas :-)
If you read the errno specification, it makes NO guarantees as to the
value of errno when you call another library function. This is because
the library function may make a syscall which will modify errno. The
correct implementation of fatalError is:
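#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

void fatalError(char *message)
{
    extern char *progname;
    int olderrno;

    olderrno = errno;              /* save it before stdio can change it */
    if (progname) {
        fprintf(stderr, "%s: ", progname);
    }
    fprintf(stderr, "fatal error: ");
    errno = olderrno;              /* restore it for perror() */
    perror(message);
    exit(1);
}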
Which will give you the results you expect. As someone pointed out,
if you actually read the documentation, errnos are a perfectly
acceptable error reporting mechanism.
What I'd rather complain about is all those people who assume a -1
return value from a syscall is an error!
There aren't really many cases where checking for -1 and checking for
errno are both necessary. There are a couple cases that come to mind:
If you lseek() and end up with a return value of -1, it could mean one
of two things. it could be an error; it could certainly also be that
you are now at the offset -1 in the fd you seeked upon.
If you call getpriority() to get the niceness of a process, it can
return -1 for the same two reasons: the priority of the process could be
-1, or there could have been an error.
There are similar issues with the strto*() functions, since the number could be (e.g.)
LONG_MIN or LONG_MAX, this case must be handled as well as the case
where LONG_MIN/LONG_MAX mean that an underflow/overflow has occurred.
Luckily, there aren't very many places where these can be problematic.
For an overwhelming majority of calls, -1 is the standard error
return (or NULL, MAP_FAILED, whatever is defined in the API of the
called function). For the ambiguous cases above, a good programmer
should know to set errno to 0 before the call and check it afterwards.
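The errno = 0 idiom for the ambiguous calls can be sketched as follows. parse_long() and current_niceness() are hypothetical helpers. In both cases the return value alone cannot signal failure, because LONG_MIN, LONG_MAX and -1 are legitimate results.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>

/* strtol(): LONG_MIN/LONG_MAX are valid values and overflow markers. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;                   /* strtol() only sets errno on error */
    v = strtol(s, &end, 10);
    if (errno == ERANGE)
        return -1;               /* overflow or underflow */
    if (end == s || *end != '\0')
        return -1;               /* no digits, or trailing garbage */
    *out = v;
    return 0;
}

/* getpriority(): -1 is a legal niceness, so only errno tells them apart. */
int current_niceness(int *out)
{
    int prio;

    errno = 0;
    prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        return -1;
    *out = prio;
    return 0;
}
```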
I do agree it can be confusing, but the only thing that will help is
experience. I don't feel that errno is a terrible API, but I do of
course sometimes get irked at weirdness in APIs with regard to error
returns. Some APIs are just badly designed, for example, char
*fgets(char *str, int size, FILE *stream); should return something
useful, such as an int of the length read, rather than the incredibly
obtuse NULL for failure. NULL should be for failure of something which
would be returning allocated memory...
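An interface closer to the one asked for was later standardized: POSIX.1-2008 getline() returns the number of characters read, including the newline, and -1 at end-of-file or on error, with feof()/ferror() to tell those two apart. The caller below, count_lines(), is a hypothetical example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count lines and record the longest; -1 if a read error occurred. */
long count_lines(FILE *fp, size_t *longest)
{
    char *line = NULL;     /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    long count = 0;

    *longest = 0;
    while ((len = getline(&line, &cap, fp)) != -1) {
        count++;
        if ((size_t)len > *longest)
            *longest = (size_t)len;
    }
    free(line);
    return ferror(fp) ? -1 : count;   /* -1 from getline() is EOF or error */
}
```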
Perhaps there should be a write-up somewhere which will help acquaint
programmers with these kind of quirks. I wouldn't mind contributing to
a "Common Unix programming pitfalls" page, or something of the sort.
One other issue which bothers me about (Java) exceptions is that
they make it real hard to do information hiding right.
Suppose you store an inventory, and at first you do it using ascii
files. There will be a method to open the inventory, and it is
natural that this method throw FileNotFoundException when that
happens. Now the project progresses and now you decide to switch to
an SQL database. Of course, now the natural exception to throw is an
SQLException of some kind.
There are two separate issues to think about. The first issue is
the signature of the method -- the kinds of exceptions it throws are
part of the signature since they are listed in the `throws' clause.
Trying to deal with this problem by declaring all methods to throw
Throwables doesn't deal with the second issue, though.
The second issue is the catch clauses. Before, you called
openInventory() and caught FileNotFoundExceptions, now you call
openInventory() and want to catch SQLExceptions.
The only way around this that I can think of is to change the
classes of the exceptions, which is highly tedious and not
satisfactory at all. That is, you design a class InventoryException
together with some subclasses, and have the openInventory() method
throw one of those. But this means that the exception changes class
every time it goes up the call chain (and crosses module boundaries).
But I thought the promise of exceptions was that they just
propagate up the call chain, and are dealt with at the point in the
call chain where it is most appropriate! And now you find that you
have to deal with every exception at (almost) every point in the call
chain.
Is there a silver bullet? Maybe Java exceptions are braindead and
other programming languages did it right? But which languages?
errno as a non-int, posted 10 Apr 2000 at 15:44 UTC by hpa »
(Master)
Although I think errno has to be an int in the current
C standard (I don't have it handy), it would be really nice if it
wasn't. If errno instead was a structure pointer, it could
contain a lot more information, and it would be much easier to add
error codes appropriate to specific libraries, since errno
values could now be defined in other places than
<errno.h> and the error messages don't have to be
organized in struct tables.
Symbols like ENOENT would then be macros of the form:
typedef const struct __error_info *errno_t;
extern errno_t errno;
/* ... */
extern const struct __error_info __ENOENT_struct;
#define ENOENT (&__ENOENT_struct)
... or even ...
extern const struct __error_info __ENOENT_struct;
static const errno_t ENOENT = &__ENOENT_struct; /* No macros! */
If a library adds its own errnos, the final link will make them unique
by sheer virtue of having it be a different structure at a different
address. Now strerror() for example becomes simply:
char *strerror(errno_t error)
{
/* Insert localization stuff here */
return error->message;
}
This obviously applies to user space. A table lookup would be needed to
convert the indices from the kernel into the pointers used in user
space, but that's an utter no-brainer.
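A compilable sketch of the pointer-based scheme, including the user-space table lookup from the kernel's integer codes. All names are hypothetical. The type is called error_ptr rather than errno_t because C11's optional Annex K already uses errno_t for an int type. Each error is a distinct const object, so a library can add its own without a central numbering.

```c
#include <errno.h>

struct error_info {
    const char *name;
    const char *message;
};

typedef const struct error_info *error_ptr;

const struct error_info error_noent   = { "ENOENT",   "No such file or directory" };
const struct error_info error_notdir  = { "ENOTDIR",  "Not a directory" };
const struct error_info error_unknown = { "EUNKNOWN", "Unknown error" };

/* A library's own error: unique simply because it is a separate object. */
const struct error_info xml_error_syntax = { "XML_SYNTAX", "Malformed XML document" };

/* strerror() becomes a field access. */
const char *error_message(error_ptr e)
{
    return e->message;
}

/* Translate the kernel's integer codes into pointers. */
error_ptr error_from_errno(int errnum)
{
    switch (errnum) {
    case ENOENT:  return &error_noent;
    case ENOTDIR: return &error_notdir;
    default:      return &error_unknown;
    }
}
```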