Notes on Adding Exception Handling to R

Luke Tierney
School of Statistics
University of Minnesota

**** This is all still very incomplete ...

Background

Terminology

Exception systems are used to signal an unusual condition when it occurs and to provide handlers to deal with such conditions when they are signaled. These systems usually dynamically match handlers with signals, using the most recently established eligible handler to handle the exception. If a handler declines, the next eligible handler is tried.

A related issue, sometimes handled by the same mechanism, sometimes handled separately, is the need to be able to execute clean-up code regardless of whether an expression is executed normally or is terminated by a non-local exit.

The Dylan Reference Manual's chapter on conditions distinguishes exception handling mechanisms on two dimensions. The first is whether they are calling or exiting:

In an exiting exception system, all dynamic state between the handler and the signaler is unwound before the handler receives control, as if signaling were a nonlocal goto from the signaler to the handler.

In a calling exception system the signaler is still active when a handler receives control. Control can be returned to the signaler, as if signaling were a function call from the signaler to the handler.

The second dimension is whether the conditions signaled are name-based or object-based:

In a name-based exception system a program signals a name, and a handler matches if it handles the same name or "any." The name is a constant in the source text of the program, not the result of an expression.

In an object-based exception system a program signals an object, and a handler matches if it handles a type that object belongs to. Object-based exceptions are more powerful, because the object can communicate additional information from the signaler to the handler, because the object to be signaled can be chosen at run-time rather than signaling a fixed name, and because type inheritance in the handler matching adds abstraction and provides an organizing framework.
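
To make the contrast concrete, here is a minimal sketch of object-based matching. It uses the condition functions found in present-day R (structure, stop, tryCatch) rather than any interface proposed in these notes; the class name diskFullError and its path field are made up for illustration.

```r
# Hypothetical condition class: "diskFullError" inherits from "error",
# which in turn inherits from "condition".
disk_full <- function(path) {
  structure(
    class = c("diskFullError", "error", "condition"),
    list(message = paste("disk full while writing", path),
         call = NULL,
         path = path)   # extra data carried from signaler to handler
  )
}

# The handler is registered for the general type "error"; it matches
# through inheritance and can still read the extra field.
caught <- tryCatch(
  stop(disk_full("/tmp/out.dat")),
  error = function(e) e$path
)
caught
```

The handler names only the general type, yet it receives the specific object chosen at run time, which is the extra power the quoted passage attributes to object-based systems.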

The case for something like an object-based system is quite strong.

Calling systems are more general and more flexible than pure exiting ones. They allow warnings and other conditions that need not result in termination to be handled, and they allow a debugger to be entered in the context where an error occurred. But there are some tricky aspects, as described below in the section "Calling or Exiting?".
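
The difference between the two kinds of handler can be sketched with the functions in present-day R, where withCallingHandlers establishes calling handlers and tryCatch establishes exiting ones. This is only an illustration, not the design discussed in these notes; the condition class progress is hypothetical.

```r
events <- character()

# Hypothetical condition class that is neither an error nor a warning.
cond <- structure(class = c("progress", "condition"),
                  list(message = "half way", call = NULL))

# Calling handler: runs while the signaler is still active. When the
# handler returns, evaluation continues after signalCondition().
withCallingHandlers(
  {
    signalCondition(cond)
    events <- c(events, "resumed")
  },
  progress = function(p) events <<- c(events, "calling handler")
)

# Exiting handler: control leaves the expression, so the code after
# signalCondition() is never evaluated.
result <- tryCatch(
  {
    signalCondition(cond)
    "not reached"
  },
  progress = function(p) "exited"
)
```

After this runs, events records the handler followed by the resumed computation, while result holds the value returned by the exiting handler.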

What Some Other Languages Have

At least as far as I understand them, which may not be very far.

Java

Java has an object-based exiting handler system, the try/catch construct. This system includes a clean-up mechanism, the finally clause.

Even though this mechanism is exiting, Java tries to support debugging by storing a stack trace in a condition. I'm not sure if this is documented, but by experimentation it looks like the constructor for Throwable fills in its stack trace by calling the fillInStackTrace method. This essentially assumes that exceptions are always signaled with the idiom

throw new MyExcept(...)
There is in principle no reason why an exception cannot be pre-allocated, but doing this circumvents the stack trace heuristic.

Common Lisp

The CL condition system is described in Steele [5], Chapter 29. It is also described in the Common Lisp HyperSpec, Chapter 9. A paper by Kent Pitman provides some background.

The CL mechanism is sort of object-based (it was developed before CLOS; under ANSI it is CLOS-based), and it supports calling and exiting handlers. Calling handlers are established with handler-bind, exiting ones with handler-case. Conditions are signaled with warn, error, cerror, or signal. The error function is guaranteed never to return; the others might return if the handlers don't do a nonlocal exit.

The basic tool for signaling conditions is signal. All handlers are conceptually calling handlers. Matching handlers are called one after another with more recently established ones first until one either does a non-local exit (as exiting ones will do) or until there are no more handlers. If signal does run out of handlers (i.e. if there are no eligible handlers or if all decline) then signal returns nil. With this approach, calling handlers decline to handle a condition by simply returning.
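
Present-day R's signalCondition behaves much like the CL signal described here, so it can serve as a small illustration: handlers that simply return have declined, and the signaling function then returns NULL. The condition class note is hypothetical.

```r
seen <- character()
cond <- structure(class = c("note", "condition"),
                  list(message = "fyi", call = NULL))

value <- withCallingHandlers(
  withCallingHandlers(
    signalCondition(cond),   # value of the whole expression
    # Most recently established handler: called first, declines by returning.
    note = function(n) seen <<- c(seen, "inner")
  ),
  # Older handler: called next, also declines by returning.
  note = function(n) seen <<- c(seen, "outer")
)
```

Both handlers run, innermost first, and since neither performs a non-local exit, value ends up NULL.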

In addition to conditions, this mechanism includes things called restarts that provide hooks for building recovery protocols. For example, when a symbol without a value has its value fetched, a restart that allows the value to be set is established and the error is signaled. I use this (sort of) in my auto-loading code.
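
A rough sketch of the unbound-symbol example, written with present-day R's withRestarts and invokeRestart. The names lookup, use_value, and unboundVariable are hypothetical and chosen only for this illustration.

```r
# Hypothetical lookup that offers a "use_value" restart when a name is unbound.
lookup <- function(name, env) {
  if (exists(name, envir = env, inherits = FALSE))
    return(get(name, envir = env))
  cond <- structure(class = c("unboundVariable", "error", "condition"),
                    list(message = paste("unbound variable:", name),
                         call = NULL, name = name))
  withRestarts(
    stop(cond),                  # signal the error
    use_value = function(v) v    # recovery hook: continue with v instead
  )
}

env <- new.env()
value <- withCallingHandlers(
  lookup("x", env),
  # The handler chooses the recovery strategy; the restart implements it.
  unboundVariable = function(u) invokeRestart("use_value", 42)
)
```

The signaler decides which recoveries are possible, and the handler, which may be far up the call stack, decides which one to use.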

When a calling handler established with handler-bind is executed, only outer handlers are visible. This prevents infinite recursions but also forces the restart mechanism to use a separate stack.
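
Present-day R documents the same rule for its calling handlers, so the effect can be shown with a short sketch. The helper make_note and the class note are hypothetical.

```r
seen <- character()
make_note <- function(msg) {
  structure(class = c("note", "condition"),
            list(message = msg, call = NULL))
}

withCallingHandlers(
  withCallingHandlers(
    signalCondition(make_note("first")),
    note = function(n) {
      seen <<- c(seen, paste("inner:", conditionMessage(n)))
      # Signaling from inside the handler: only the outer handler is
      # visible here, so the inner handler is not re-entered.
      signalCondition(make_note("second"))
    }
  ),
  note = function(n) seen <<- c(seen, paste("outer:", conditionMessage(n)))
)
seen
```

The second condition goes straight to the outer handler, and the first condition reaches the outer handler once the inner one has returned.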

CL does not have a mechanism for defining default handlers to use if no handlers have been set up with one of the handler forms.

My implementation is available in the file conditns.lsp available from the cvs archive. It uses the CL nonlocal exit mechanism for exiting handlers and restarts, using essentially the same code as given in Steele. Internally, the error functions just call out to a Lisp function defined in this file. Stack overflows are handled, as I recall, by invoking an abort restart, thus forcing an exit. I actually made this a bit more complex than it probably needed to be by trapping these aborts with an errset, the XLISP error handling primitive. My approach tries to get into the debugger as soon as possible so the source of stack overflows can be traced.

Common Lisp handles clean-up by a separate mechanism. Steele's section on dynamic non-local exits contains a detailed description of how clean-up handling interacts with non-local exits, which are used as the building blocks for escaping condition handlers. This description is under the unwind-protect heading.

Dylan

Dylan seems to have one of the richest and cleanest (given its richness) exception systems. It is derived from the Common Lisp one and has cleaned up that model in a number of ways. The Dylan condition system is documented in the Dylan Reference Manual [4]. Another good reference is [2].

Dylan's system is object-based, and handlers can be calling or exiting. Calling handlers are established with the let handler form. A calling handler for a condition is a function of two arguments, the condition and the next handler to use if the handler function wants to decline. If your handler for a condition of type <mycond> is a function h, you would do something like

let handler(<mycond>) = h
expr
to handle conditions that occur while executing expr. Dylan's signal only calls (at most) one handler; to decline, the handler must call the next handler supplied to it as its second argument. If the handler returns, its values are the values returned by signal.

Exiting handlers are established as part of the block form with an exception clause,

block ()
  expr
exception (c :: <mycond>) ...code to deal with c...
end block

Clean-up handling is also done using block by adding a cleanup clause.

Dylan also has a special subclass of conditions called <restart> that are used for setting up dynamic handling protocols. But they are not in any other way special as far as I can tell. In contrast, in Common Lisp restarts are a separate kind of animal. This is a point where I think Dylan is a bit cleaner, though the intent for restarts is that they only be used by exiting handlers, and there does not seem to be a way to enforce this.

When a calling handler is called, handlers established between the called handler and the signaler are not disabled. This means that there is no protection against an infinite recursion of calls to the same handler. On the other hand this seems to be essential if restarts and conditions are to be handled by the same mechanism.

In Dylan there is a generic function that can be used to establish default handlers. This seems quite useful. On the other hand, having default handlers defined at top level sort of forces them to be calling, which is fine as long as you are not dealing with stack overflow or heap exhaustion. Also, some provision needs to be made for signals raised by default handlers.

Dylan's let handler and block exception also allow options to be attached to handlers. These options can be used for a guard condition to test eligibility of the handler based on some runtime context, and they also allow for information to be provided that can help an interactive restart mechanism.

I am a little confused about why Dylan allows handlers to return values that are passed on by signal. It doesn't seem to use this in cerror, but there may be other uses. Maybe some of the examples in the reference manual would illustrate this.

Tom

Tom is an experimental language that seems to have some association with Gnome, but I know little about it other than that it uses a CL-like condition system. What exists of Tom so far seems rather stripped down; it may be worth a look as an example of a bare bones facility. I don't really know how well thought out it is.

ML

ML has an exception mechanism that is exiting. It seems to be based on functions that you can pass data as arguments, but I don't know enough about it to really describe it. References to ML are available off the Persimmon page.

Issues

Calling or Exiting?

Calling exceptions allow certain situations to be ignored at times and to cause exiting at other times by writing an appropriate handler. With calling exceptions, warnings can be part of the exception mechanism. With calling exceptions, one option is to enter a debugger that allows the context where the exception occurred to be inspected.

In principle, calling exception handling is a superset of exiting handling, but this assumes that the language supports some form of non-local exit (something like Common Lisp's or Dylan's catch/throw or the block mechanisms both those languages have). R does not have such a mechanism. It could be added; if it isn't, then it will be necessary to have separate mechanisms for calling and exiting handlers (if calling handlers are supported).

In fact, there is really always the need for this to some extent, even with a calling approach: If the reason an exception is being signaled is that the stack has run out, then you cannot call a handler in place. Similarly, if the heap is exhausted, a call is not likely to get very far. Both of those, and perhaps a few other resource-related exceptions, really require some sort of "native exiting" handler. If exiting handlers are separate from calling ones, then this does not require any special case construct.

One approach to stack overflow errors would be to follow the Java model: store a stack trace in an exception object and take the first eligible exiting handler. This would almost be consistent with the CL mechanism: you can conceptually imagine taking any intervening calling handlers, having them fail, and dropping down to the next exiting one. This makes sense since CL handlers only see the handlers that were active when they were established. A problem does arise if no exiting handler exists: in this case the debugger is supposed to be entered by the signaler, but that isn't possible since the stack is full. So this only works if there is guaranteed to be a top level exiting handler. This can be ensured by either writing one into the top level loop or by allowing signal to exit the process or thread if there isn't one.

Unfortunately I don't think this approach is consistent with the Dylan approach. In Dylan, calling handlers see all established handlers, so handling a stack overflow almost by definition leads to an infinite recursion of signals. One way out is to make stack overflow impossible by using a stack of heap-allocated frames, but that raises a similar problem with heap exhaustion. One possible way out would be to allow conditions to be marked as exiting-only, thus making only exiting handlers eligible. A convention for an exiting-only condition might be that if no handler is present the debugger will be entered "at the earliest feasible point" as the stack is unwound. This is essentially the hack I have used in my condition system for XLISP. The hack does make some sense, but I would prefer it to be part of a consistent semantics, rather than a special case.

Are Restarts Essential?

Restarts are useful to provide a mechanism for creating structured recovery protocols in a calling handler system. But are they essential or is it possible to leave them out of a first implementation and add them later?

I think the answer depends on the mechanism handlers can use to decline handling a condition and pass it on to the next eligible handler. In CL, signal calls the eligible handlers one after another. To decline handling a condition, a handler just returns. The signal function will then call the next handler. This means that the only way a handler can actually handle a condition and stop the chain of handler calls is by a non-local exit. If a handler wants to have the condition it is handling be ignored, it has to be able to do a non-local exit to a suitable point. This is why the warn function establishes the muffle-warning restart---this is the only way a warning can be ignored. Other non-local exit mechanisms could be used instead of restarts, but some form of non-local exit is essential since a simple return is equivalent to declining to handle the condition.
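
Present-day R's warning function follows the same pattern and establishes a muffleWarning restart, so the point can be illustrated directly. The function noisy is a made-up example.

```r
noisy <- function() {
  warning("something odd")
  "finished"   # reached only if the warning does not cause an exit
}

# Ignoring the warning requires a non-local exit to the restart that
# warning() established; simply returning would count as declining.
ignored <- withCallingHandlers(
  noisy(),
  warning = function(w) invokeRestart("muffleWarning")
)

# An exiting handler stops the chain as well, but abandons noisy().
abandoned <- tryCatch(
  noisy(),
  warning = function(w) "abandoned"
)
```

In the first call the computation continues and returns its normal value; in the second the handler's value replaces it.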

Dylan does this differently. If Dylan's signal finds an eligible handler, it calls only that handler. If the handler returns (and a return is allowed for that condition), then the values returned by the handler are returned as the values of signal. The handler is called with two arguments. The first is the condition. The second argument is a closure that takes no arguments. If the handler handles the condition, then it ignores this second argument. If the handler declines to handle the condition, then it should tail call the second argument. This will then call the next eligible handler or take the appropriate default action if there is none. This approach does not require a non-local exit to handle a condition.
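
The Dylan-style protocol can be modelled in a few lines of R. This is a toy model under simplifying assumptions, not Dylan's implementation and not R's built-in mechanism: handlers are kept in an explicit list, most recently established first, and the names dylan_signal, declines, and handles are invented for the sketch.

```r
# Each handler takes the condition and a zero-argument closure that
# invokes the next handler. `default` is the action when none are left.
dylan_signal <- function(cond, handlers, default = function(cond) NULL) {
  call_from <- function(i) {
    if (i > length(handlers)) return(default(cond))
    handlers[[i]](cond, function() call_from(i + 1))
  }
  call_from(1)
}

declines <- function(cond, next_handler) next_handler()      # pass it on
handles  <- function(cond, next_handler) paste("handled", cond)

dylan_signal("overflow", list(declines, handles))
```

A handler that handles the condition just returns a value, which becomes the value of the signal call, so no non-local exit is involved.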

Thus if Dylan's approach for declining to handle a condition is used, then restarts are not really needed up front and can be added later. With the CL approach, they are essential in order to be able to use calling handlers effectively.

I believe the decision on how to handle declining is orthogonal to the other main difference between Dylan and CL: whether restarts occupy a separate stack and handlers can be unwound during handler processing (CL), or whether restarts are placed in the same stack and therefore handlers must remain visible while they are processed (Dylan).

Mechanisms R Has Now

R now has several mechanisms that need to be integrated with an exception handling system.

The stop Function

This function should signal a condition and never return, like CL error.

The warning Function

This can signal a warning and return, like CL's warn. Default handling is to print the message. I assume R does some internal magic to avoid printing huge numbers of similar warnings in vectorized calculations; this needs to be thought through. Having this as part of the condition system means code could ask to enter the debugger on warnings to find out, for example, where NA's are being generated.
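
As an illustration of what such handling could look like, the sketch below uses present-day R's withCallingHandlers to record the call stack at the point where a coercion warning about NA's is raised. It records the stack instead of entering a debugger so that it runs non-interactively; the variable names are arbitrary.

```r
where <- list()

result <- withCallingHandlers(
  as.numeric(c("1", "two", "3")),   # "two" cannot be converted
  warning = function(w) {
    # The signaler is still active, so the stack can be inspected here.
    # An interactive session could call browser() at this point instead.
    where <<- c(where, list(sys.calls()))
    invokeRestart("muffleWarning")
  }
)
result
```

The computation still completes with an NA in the second position, and where holds one recorded stack for the single warning.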

Splus has the ability to customize the handling of warnings via options. This is similar to CL's use of the *break-on-warnings* variable.

The on.exit Function

This is R's clean-up mechanism. It could be retained or replaced by something like a finally clause. In either case, for exiting handlers the exact semantics of signaling a condition from within a clean-up expression need to be thought through---are all handlers down to the one being thrown to unwound prior to the clean-up or not?
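
For reference, the sketch below shows the two clean-up styles side by side using present-day R, where tryCatch accepts a finally argument. It illustrates only the ordering observed in this simple case, not a resolution of the semantic question raised above; the function risky is made up.

```r
steps <- character()

risky <- function(fail) {
  # Registered clean-up: runs on normal return and on non-local exit.
  on.exit(steps <<- c(steps, "on.exit ran"))
  if (fail) stop("boom")
  "ok"
}

normal <- risky(FALSE)

failed <- tryCatch(
  risky(TRUE),
  error = function(e) "recovered",
  finally = steps <- c(steps, "finally ran")   # clean-up attached to the expression
)
steps
```

In this run the on.exit expression fires for both calls, and the finally expression fires after the error handler has produced its value.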

The restart Function

This is an S function that currently does not seem to exist in R. A good exception handling mechanism should make it unnecessary.

Synchronous Signal Handlers

I think the only synchronous signal handled now is SIGFPE. This could just raise an R floating point exception and let the exception handling system take over from there.

Asynchronous Signal Handlers

R currently handles SIGINT. I think this is now almost always handled by a longjmp to top level. This could be replaced by a signal of an interrupt condition. One issue to worry about is that exiting is not always safe (for example in GC). This may already have been addressed by suspending the signal, or just using it to set a flag, and then processing it after the critical section is complete. The possible interaction of SIGINT with clean-up actions also needs another look, but these issues are not specific to adding an exception system.

A point I am not sure about is the issue of what can be done safely from within a UNIX signal handler. Places to read on this are [1, 3, 6] under the heading of async-signal-safe functions. One reading is that almost nothing can be safely done from within a signal handler, so in particular calling a user level handler that could then do almost anything would be a bad idea. If the signal handler just sets a flag and then allows the regular system to make the handler call when the flag is noticed, then this is not a problem. On the other hand, if there is a need to be able to interrupt a piece of not quite trusted code, then a jump out of the signal handler is needed. Exiting handlers would jump, but calling ones would not.

The Debugger

Interaction with the debugger needs to be thought through.

Interface Ideas

Here is a possible interface. Conditions are objects with a class that inherits from Condition.

Handlers can be established as calling or exiting handlers. This could be done with special syntax or with functions named something like with.handlers and with.catchers. A call of the form

with.handlers(expr, Error=fe, Warning=fw)
will call fe if an error is signaled while evaluating expr and fw if a warning is signaled. As in Dylan, while these are called, all existing handlers remain active.

For exiting handlers, something similar could be used,

with.catchers(expr, Error=fe, Warning=fw)
A signal would transfer control to the context of the outer expression and call the appropriate handler. Before transferring control, the available handlers would be unwound down to the level of the ones available in the surrounding context. on.exit expressions executed on the way down would therefore not be able to jump to intermediate handlers. (This is only one possible design, but I think it is the right one. At the least, intervening exit points should be disabled.)
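
To give a feel for the two forms, the sketch below maps the proposed names onto present-day base R functions. This mapping is an assumption made for illustration: base R spells the classes error and warning in lower case, and while one of its calling handlers runs only outer handlers are visible, which differs from the Dylan-style rule described above.

```r
# Assumed mapping of the proposed names onto base R.
with.handlers <- function(expr, ...) withCallingHandlers(expr, ...)
with.catchers <- function(expr, ...) tryCatch(expr, ...)

fw <- function(w) invokeRestart("muffleWarning")          # ignore and continue
fe <- function(e) paste("caught:", conditionMessage(e))   # report and exit

# Calling form: the expression runs to completion.
a <- with.handlers({ warning("careful"); "completed" }, warning = fw)

# Exiting form: the handler's value replaces that of the expression.
b <- with.catchers({ stop("bad input"); "completed" }, error = fe)
```

Here a is the normal value of the expression, while b is whatever the exiting handler returned.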

As an alternative, a syntax like

try
  expr
catch Error (e) e.expr
catch Warning (w) w.expr
could be used. This is basically the Java syntax. If this is used, it might be natural to add a finally clause.

As in Dylan, generic functions could be used to set up default calling and exiting handlers, with the calling ones searched before the exiting ones.

Conditions are signaled by stop, warning, or signal. signal would do the following:

signal <- function(c) {
  hlist <- find.handlers(c)
  for (h in hlist) {
    if (h$exiting) {
      # disable intervening exiting handlers
      .Throw(h, c)
    }
    else h$action(c)
  }
}
The handlers found might be local ones or default ones; a default catching handler would be executed at the top level (of the thread).

stop would be something like

stop <- function(arg) {
  c <- as.condition(arg)
  signal(c)
  exit.to.toplevel(c)
}
The as.condition function would, for example, convert a string to a condition.
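
One possible shape for such a conversion, sketched with the simpleError constructor from present-day R. The choice to turn strings into error conditions is an assumption made here to match its use in stop; as.condition itself is only a proposed name.

```r
# Hypothetical helper: pass condition objects through unchanged and
# wrap anything else in a simple error condition.
as.condition <- function(arg) {
  if (inherits(arg, "condition")) arg
  else simpleError(as.character(arg))
}

c1 <- as.condition("file not found")
class(c1)
conditionMessage(c1)
```

Passing an existing condition object leaves it untouched, so callers can supply either a message or a fully constructed condition.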

warning would be something like

warning<-function(arg) {
  try
    signal(as.warning(arg))
  catch muffle.warnings (w) return NULL;
}
The default handler could then look in options and signal muffle.warnings if warnings are to be ignored. This would follow the CL model for warn, but, like Dylan, the muffle.warnings restart would be just another condition.

One thing to think about with default handlers is whether the read/eval/print loop should be made public, with a command line switch for setting the loop to run.

Differences between interactive and batch mode could be encoded here in some way too.

Implementation

One approach is to add a handler field to the context structure. This field would contain a list of the form
[exiting, type1, e1, type2, e2, ...]
This list could just be a heap vector of SEXPR's. The exiting member is a boolean flag. This field will probably have to be marked by the GC.

Still not quite sure about the best way to handle default handlers.

For Control-C, could use signal to signal an interrupt condition. Not sure this is really safe in UNIX.

Stack overflow and heap exhaustion are signaled by special exiting-only exceptions.

Examples

Trapping All Errors

for (i in x) {
  try
    Big.Simulation(i)
  catch Error (e) {
    paste("Error in", i,":", e)
  }
}
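
The same loop can be written today with tryCatch, which gives a runnable version of the idea. The function big_simulation is a made-up stand-in for Big.Simulation, and the results are collected in a list rather than discarded.

```r
# Stand-in for the simulation: fails for negative input.
big_simulation <- function(i) {
  if (i < 0) stop("negative input")
  sqrt(i)
}

x <- c(4, -1, 9)
results <- vector("list", length(x))
for (k in seq_along(x)) {
  results[[k]] <- tryCatch(
    big_simulation(x[k]),
    # Any error ends only this iteration; the loop carries on.
    error = function(e) paste("Error in", x[k], ":", conditionMessage(e))
  )
}
results
```

The failing iteration yields a message string in place of a result, and the remaining iterations are unaffected.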

Autoloading

Have get for top level variables do something like
top.get <- function(var) {
  with.handlers({
      if (unbound(var))
        signal(make.unbound.variable.exception(var))
      else real.get(var)
    },
    use.value = function(c) c$value)
}
Then the default handler for the unbound variable exception could be something like
function(c) { autoload(c$name); signal(make.use.value(get(c$name))) }
This isn't quite right and is also too simplistic because it doesn't play nice with catching all errors, but it is an outline.

NA Handling

One option might be to allow an error handler to supply an na.action to a restart if none was originally provided and NA's were found. A skeleton might look like
if (nas.present(x) && na.action == na.fail) {
  try
    signal("NA's -- specify action if you like")
  catch na.action (n) na.action = n$action;
  ...
}
Not sure if this is a good idea, or if there are better choices.
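
For comparison, here is how the same idea might look with the restart functions in present-day R. The function strict_mean, the condition class naFound, and the restart name na_action are all hypothetical; this is a sketch of the protocol, not a recommendation.

```r
strict_mean <- function(x) {
  if (anyNA(x)) {
    cond <- structure(class = c("naFound", "condition"),
                      list(message = "NA's -- specify action if you like",
                           call = NULL))
    x <- withRestarts(
      {
        signalCondition(cond)           # give handlers a chance to act
        stop("missing values in x")     # nobody supplied an action
      },
      na_action = function(action) action(x)
    )
  }
  mean(x)
}

# A handler supplies the missing na.action through the restart.
m <- withCallingHandlers(
  strict_mean(c(1, NA, 3)),
  naFound = function(n) invokeRestart("na_action", na.omit)
)
```

Without a handler the call fails as before, so existing code keeps its na.fail behavior.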

Debugging

Still need to think about this.

References

[1] David R. Butenhof. Programming with POSIX Threads. Addison-Wesley, Reading, MA, 1997.

[2] Neal Feinberg, Sonya E. Keene, Robert O. Mathews, Peter S. Gordon, and P. Tucker Withington. Dylan Programming: An Object-Oriented and Dynamic Language. Addison-Wesley, 1996.

[3] Kay A. Robbins and Steven Robbins. Practical UNIX Programming. Prentice Hall, Upper Saddle River, NJ, 1996.

[4] Andrew Shalit, David Moon, and Orca Starbuck. The Dylan Reference Manual: The Definitive Guide to the New Object-Oriented Dynamic Language. Addison-Wesley, 1996.

[5] Guy L. Steele. Common Lisp the Language. Digital Press, Burlington, MA, 2nd edition, 1990.

[6] W. Richard Stevens. UNIX Network Programming, volume I. Prentice-Hall, Upper Saddle River, NJ, 1998.