What is the best system for handling errors?

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow him on Twitter.

I dislike try/catch blocks because I have to put them inline with the code they are handling, like this:

try {
    // some code
} catch (Exception e) {
    // do something with the exception
}

What I want is the ability to write a function that has no try/catch block and then, somewhere else in the code, set up an observer that watches that function and handles any errors it produces. This is what the condition system in Common Lisp offers.

I was interested, but disappointed, to read about exception handling in Go:

Unlike most languages, Go enables you to return multiple values from a function without creating some ad hoc data structure or object to do it. One of the standard return values is an error code, which is accessed conventionally upon the function’s return through the err variable.

This solution solved a messy problem — C’s dual use of return values for data and error codes — which was necessary due to C’s lack of an exception mechanism. To be fair, this problem still exists in a different form in languages that have robust exceptions. For example, in Java, the conventional use of a null return both as an indicator of an error condition and as an actual data item laces codebases with endless tests for null. The problem is so ubiquitous in Java that many JVM scripting languages include shorthand to abbreviate the null checks.

In fact, Go has an exception mechanism as well, but its use is contrary to convention and convention is a central aspect of Go development. It’s also not as elaborate as the exception mechanisms in C++ or Java. Its use is supposed to be for truly exceptional circumstances. I believe this is due to Google scale issues (recalling that Go was designed primarily to address Google development needs, rather than programming problems at large). Namely, that when you’re running thousands of transactions on very large systems, exceptions become a very costly proposition. Not only are they slow, but they permanently fork the execution path. And on large, fast-moving systems, both effects tend to be highly undesirable. Moreover, exceptions have to be handled by future code that depends on current exception-oriented code. Because of these limitations, Google is fairly strict about limiting the use of exceptions in its C++ codebase:

“On their face, the benefits of using exceptions outweigh the costs, especially in new projects. However, for existing code, the introduction of exceptions has implications on all dependent code. If exceptions can be propagated beyond a new project, it also becomes problematic to integrate the new project into existing exception-free code. Because most existing C++ code at Google is not prepared to deal with exceptions, it is comparatively difficult to adopt new code that generates exceptions.

Given that Google’s existing code is not exception-tolerant, the costs of using exceptions are somewhat greater than the costs in a new project. The conversion process would be slow and error-prone. We don’t believe that the available alternatives to exceptions, such as error codes and assertions, introduce a significant burden.”

But I would still greatly prefer a system like what Common Lisp has:

The condition system is more flexible than exception systems because instead of providing a two-part division between the code that signals an error and the code that handles it, the condition system splits the responsibilities into three parts–signaling a condition, handling it, and restarting. In this chapter, I’ll describe how you could use conditions in part of a hypothetical application for analyzing log files. You’ll see how you could use the condition system to allow a low-level function to detect a problem while parsing a log file and signal an error, to allow mid-level code to provide several possible ways of recovering from such an error, and to allow code at the highest level of the application to define a policy for choosing which recovery strategy to use.

To start, I’ll introduce some terminology: errors, as I’ll use the term, are the consequences of Murphy’s law. If something can go wrong, it will: a file that your program needs to read will be missing, a disk that you need to write to will be full, the server you’re talking to will crash, or the network will go down. If any of these things happen, it may stop a piece of code from doing what you want. But there’s no bug; there’s no place in the code that you can fix to make the nonexistent file exist or the disk not be full. However, if the rest of the program is depending on the actions that were going to be taken, then you’d better deal with the error somehow or you will have introduced a bug. So, errors aren’t caused by bugs, but neglecting to handle an error is almost certainly a bug.

So, what does it mean to handle an error? In a well-written program, each function is a black box hiding its inner workings. Programs are then built out of layers of functions: high-level functions are built on top of the lower-level functions, and so on. This hierarchy of functionality manifests itself at runtime in the form of the call stack: if high calls medium, which calls low, when the flow of control is in low, it’s also still in medium and high, that is, they’re still on the call stack.

Because each function is a black box, function boundaries are an excellent place to deal with errors. Each function–low, for example–has a job to do. Its direct caller–medium in this case–is counting on it to do its job. However, an error that prevents it from doing its job puts all its callers at risk: medium called low because it needs the work done that low does; if that work doesn’t get done, medium is in trouble. But this means that medium’s caller, high, is also in trouble–and so on up the call stack to the very top of the program. On the other hand, because each function is a black box, if any of the functions in the call stack can somehow do their job despite underlying errors, then none of the functions above it needs to know there was a problem–all those functions care about is that the function they called somehow did the work expected of it.

In most languages, errors are handled by returning from a failing function and giving the caller the choice of either recovering or failing itself. Some languages use the normal function return mechanism, while languages with exceptions return control by throwing or raising an exception. Exceptions are a vast improvement over using normal function returns, but both schemes suffer from a common flaw: while searching for a function that can recover, the stack unwinds, which means code that might recover has to do so without the context of what the lower-level code was trying to do when the error actually occurred.

Consider the hypothetical call chain of high, medium, low. If low fails and medium can’t recover, the ball is in high’s court. For high to handle the error, it must either do its job without any help from medium or somehow change things so calling medium will work and call it again. The first option is theoretically clean but implies a lot of extra code–a whole extra implementation of whatever it was medium was supposed to do. And the further the stack unwinds, the more work that needs to be redone. The second option–patching things up and retrying–is tricky; for high to be able to change the state of the world so a second call into medium won’t end up causing an error in low, it’d need an unseemly knowledge of the inner workings of both medium and low, contrary to the notion that each function is a black box.
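
To make the retry option concrete, here is a small Python sketch of the high, medium, low chain using ordinary exceptions. The function bodies (summing integers parsed from strings) are invented purely for illustration.

```python
def low(items):
    parsed = []
    for item in items:
        parsed.append(int(item))  # raises ValueError on a non-numeric string
    return parsed


def medium(items):
    return sum(low(items))


def high(items):
    try:
        return medium(items)
    except ValueError:
        # The stack has already unwound: low's `parsed` list and its loop
        # position are gone, so the only option left is to start over.
        # To make the retry succeed, high has to guess what low accepts,
        # which duplicates (roughly) low's private idea of valid input.
        cleaned = [item for item in items if item.isdigit()]
        return medium(cleaned)
```

Calling high(["1", "oops", "3"]) does return a sum, but only by parsing "1" twice and by copying low's validity rule into high. The copy is also already slightly off: int accepts "-5", while isdigit rejects it, so a retry would silently drop negative numbers.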

….What happens when the error is signaled depends on the code above parse-log-entry on the call stack. To avoid landing in the debugger, you must establish a condition handler in one of the functions leading to the call to parse-log-entry. When a condition is signaled, the signaling machinery looks through a list of active condition handlers, looking for a handler that can handle the condition being signaled based on the condition’s class. Each condition handler consists of a type specifier indicating what types of conditions it can handle and a function that takes a single argument, the condition. At any given moment there can be many active condition handlers established at various levels of the call stack. When a condition is signaled, the signaling machinery finds the most recently established handler whose type specifier is compatible with the condition being signaled and calls its function, passing it the condition object.
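
The handler search described above can be sketched in a few lines of Python. This is an emulation under stated assumptions, not how any Lisp implementation does it: `active_handlers`, `signal`, and the two condition classes are hypothetical names, handlers are pushed by hand rather than tied to the call stack, and real Common Lisp also disables inner handlers while a handler is running, which is omitted here.

```python
class ParseProblem(Exception):
    """Hypothetical base condition class."""


class MalformedEntry(ParseProblem):
    """Hypothetical, more specific condition class."""


active_handlers = []  # (type specifier, function) pairs, oldest first


def signal(condition):
    # Search from the most recently established handler to the oldest.
    for condition_type, function in reversed(active_handlers):
        if isinstance(condition, condition_type):
            # The handler receives the condition object as its one argument.
            # If it returns normally, the search simply continues.
            function(condition)
    return None


calls = []
active_handlers.append((ParseProblem, lambda c: calls.append("outer")))
active_handlers.append((KeyError, lambda c: calls.append("unrelated")))
active_handlers.append((MalformedEntry, lambda c: calls.append("inner")))

signal(MalformedEntry("bad line"))
print(calls)  # ['inner', 'outer']
```

The KeyError handler is never called because its type specifier does not match, and the newer MalformedEntry handler runs before the older, more general ParseProblem one.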

The handler function can then choose whether to handle the condition. The function can decline to handle the condition by simply returning normally, in which case control returns to the SIGNAL function, which will search for the next most recently established handler with a compatible type specifier. To handle the condition, the function must transfer control out of SIGNAL via a nonlocal exit. In the next section, you’ll see how a handler can choose where to transfer control. However, many condition handlers simply want to unwind the stack to the place where they were established and then run some code. The macro HANDLER-CASE establishes this kind of condition handler. The basic form of a HANDLER-CASE is as follows:

(handler-case expression
  error-clause*)

where each error-clause is of the following form:

(condition-type ([var]) code)

If the expression returns normally, then its value is returned by the HANDLER-CASE. The body of a HANDLER-CASE must be a single expression; you can use PROGN to combine several expressions into a single form. If, however, the expression signals a condition that’s an instance of any of the condition-types specified in any error-clause, then the code in the appropriate error clause is executed and its value returned by the HANDLER-CASE. The var, if included, is the name of the variable that will hold the condition object when the handler code is executed. If the code doesn’t need to access the condition object, you can omit the variable name.
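
For readers who do not know Lisp, the closest everyday analogue is an ordinary try/except that unwinds to where it was written. The following Python sketch mimics the shape of HANDLER-CASE. The function name `handler_case` and its calling convention (a zero-argument callable plus (type, function) pairs) are assumptions for the example, not part of any library.

```python
def handler_case(expression, *error_clauses):
    """Evaluate expression(); on a matching condition, run that clause instead.

    Each error clause is a (condition_type, code) pair, where code is a
    function that receives the condition object.
    """
    try:
        return expression()  # normal return: its value is the result
    except Exception as condition:
        # The stack has unwound to this point before any clause runs.
        for condition_type, code in error_clauses:
            if isinstance(condition, condition_type):
                return code(condition)  # the clause's value is the result
        raise  # no clause applies: keep propagating


ok = handler_case(lambda: int("42"), (ValueError, lambda c: -1))
fallback = handler_case(lambda: int("x"), (ValueError, lambda c: -1))
print(ok, fallback)  # 42 -1
```

As with HANDLER-CASE, the clauses are tried in order, and a condition that matches none of them keeps propagating to outer handlers.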

Post external references

  1. http://www.drdobbs.com/architecture-and-design/the-scourge-of-error-handling/240143878?
  2. http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html