**** This is all still very incomplete ...
A related issue, sometimes handled by the same mechanism, sometimes handled separately, is the need to be able to execute clean-up code regardless of whether an expression is executed normally or is terminated by a non-local exit.
The Dylan Reference Manual's chapter on conditions distinguishes exception handling mechanisms on two dimensions. The first is whether they are calling or exiting:
In an exiting exception system, all dynamic state between the handler and the signaler is unwound before the handler receives control, as if signaling were a nonlocal goto from the signaler to the handler.
In a calling exception system the signaler is still active when a handler receives control. Control can be returned to the signaler, as if signaling were a function call from the signaler to the handler.
The second dimension is whether the conditions signaled are name-based or object-based:
In a name-based exception system a program signals a name, and a handler matches if it handles the same name or "any." The name is a constant in the source text of the program, not the result of an expression.
In an object-based exception system a program signals an object, and a handler matches if it handles a type that object belongs to. Object-based exceptions are more powerful, because the object can communicate additional information from the signaler to the handler, because the object to be signaled can be chosen at run-time rather than signaling a fixed name, and because type inheritance in the handler matching adds abstraction and provides an organizing framework.
The case for something like an object-based system is quite strong.
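As a small illustration of the three advantages listed in the quotation, here is a sketch in Python, whose built-in exceptions are object-based (and exiting). The class names FileProblem and MissingFile, the path field, and the functions open_config and load are hypothetical and chosen only for this example.

```python
class FileProblem(Exception):
    """Hypothetical base condition type; carries data to the handler."""

    def __init__(self, path):
        super().__init__(f"problem with {path}")
        self.path = path


class MissingFile(FileProblem):
    """Hypothetical subtype; a handler for FileProblem also matches it."""


def open_config(path):
    # The object to signal is built at run time and carries the path.
    raise MissingFile(path)


def load(path):
    try:
        return open_config(path)
    except FileProblem as cond:  # matched by type, including subtypes
        return f"fallback used for {cond.path}"
```

The handler in load names only the general type, yet it receives the more specific object together with the information the signaler stored in it.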
Calling systems are more general and more flexible than pure exiting ones. They allow warnings and other conditions that need not result in termination to be handled, and they allow a debugger to be entered in the context where an error occurred. But there are some tricky aspects, as described below.
Java uses an exiting mechanism built around the try/catch construct. This system includes a clean-up mechanism, the finally clause.
Even though this mechanism is exiting, Java tries to support debugging by storing a stack trace in a condition. I'm not sure if this is documented, but by experimentation it looks like the constructor for Throwable fills in its stack trace by calling the fillInStackTrace method. This essentially assumes that exceptions are always signaled with the idiom
    throw new MyExcept(...)
There is in principle no reason why an exception cannot be pre-allocated, but doing this circumvents the stack trace heuristic.
The CL mechanism is sort of object-based (it was developed before CLOS; under ANSI it is CLOS-based), and it supports calling and exiting handlers. Calling handlers are established with handler-bind, exiting ones with handler-case. Conditions are signaled with warn, error, cerror, or signal. The error function is guaranteed never to return; the others might return if the handlers don't do a nonlocal exit.
The basic tool for signaling conditions is signal. All handlers are conceptually calling handlers. Matching handlers are called one after another, with more recently established ones first, until one either does a non-local exit (as exiting ones will do) or until there are no more handlers. If signal does run out of handlers (i.e. if there are no eligible handlers or if all decline) then signal returns nil. With this approach, calling handlers decline to handle a condition by simply returning.
In addition to conditions, this mechanism includes things called restarts that provide hooks for building recovery protocols. For example, when a symbol without a value has its value fetched, a restart that allows the value to be set is established and the error is signaled. I use this (sort of) in my auto-loading code.
When a calling handler established with handler-bind is executed, only outer handlers are visible. This prevents infinite recursions but also forces the restart mechanism to use a separate stack.
CL does not have a mechanism for defining default handlers to use if no handlers have been set up with one of the handler forms.
My implementation is in the file conditns.lsp, available from the cvs archive. It uses the CL nonlocal exit mechanism for exiting handlers and restarts, using essentially the same code as given in Steele. Internally, the error functions just call out to a Lisp function defined in this file. Stack overflows are handled, as I recall, by invoking an abort restart, thus forcing an exit. I actually made this a bit more complex than it probably needed to be by trapping these aborts with an errset, the XLISP error handling primitive. My approach tries to get into the debugger as soon as possible so the source of stack overflows can be traced.
Common Lisp handles clean-up by a separate mechanism. Steele's section on dynamic non-local exits contains a detailed description of how clean-up handling interacts with non-local exits, which are used as the building blocks for escaping condition handlers. This description is under the unwind-protect heading.
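The essential guarantee of a clean-up form is that the clean-up code runs whether the protected expression returns normally or is left by a non-local exit. A minimal sketch in Python, using try/finally as the analogue; the Escape class and the run and demo functions are hypothetical names for this example.

```python
class Escape(Exception):
    """Stands in for a non-local exit (hypothetical name)."""


def run(body, log):
    try:
        return body()
    finally:
        log.append("cleanup")  # runs on normal return and on non-local exit


def demo():
    log = []
    run(lambda: "ok", log)  # normal completion

    def leave():
        raise Escape()

    try:
        run(leave, log)  # terminated by a non-local exit
    except Escape:
        log.append("escaped")
    return log
```

Calling demo() records the clean-up twice, and the second clean-up happens before control reaches the point the exit was aimed at.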
Dylan's system is object based, and handlers can be calling or exiting. Calling handlers are established with the let handler form. A calling handler for a condition is a function of two arguments, the condition and the next handler to use if the handler function wants to decline. If your handler for a condition of type <mycond> is a function h, you would do something like
    let handler(<mycond>) = h
    expr
to handle conditions that occur while executing expr. Dylan's signal only calls (at most) one handler; to decline, the handler must call the next handler supplied to it as its second argument. If the handler returns, its values are the values returned by signal.
Exiting handlers are established as part of the block form with an exception clause,
    block ()
      expr
    exception (c :: <mycond>)
      ...code to deal with c...
    end block
Clean-up handling is also done using block by adding a cleanup clause.
Dylan also has a special subclass of conditions called <restart> that are used for setting up dynamic handling protocols. But they are not in any other way special as far as I can tell. In contrast, in Common Lisp restarts are a separate kind of animal. This is a point where I think Dylan is a bit cleaner, though the intent for restarts is that they only be used by exiting handlers, and there does not seem to be a way to enforce this.
When a calling handler is called, handlers established between the called handler and the signaler are not disabled. This means that there is no protection against an infinite recursion of calls to the same handler. On the other hand this seems to be essential if restarts and conditions are to be handled by the same mechanism.
In Dylan there is a generic function that can be used to establish default handlers. This seems quite useful. On the other hand, having default handlers defined at top level sort of forces them to be calling, which is fine as long as you are not dealing with stack overflow or heap exhaustion. Also, some provision needs to be made for signals raised by default handlers.
Dylan's let handler and block exception also allow options to be attached to handlers. These options can be used for a guard condition to test eligibility of the handler based on some runtime context, and they also allow for information to be provided that can help an interactive restart mechanism.
I am a little confused about why Dylan allows handlers to return values that are passed on by signal. It doesn't seem to use this in cerror, but there may be other uses. Maybe some of the examples in the reference manual would illustrate this.
In principle, calling exception handling is a superset of exiting handling, but this assumes that the language supports some form of non-local exit (something like Common Lisp's or Dylan's catch/throw or the block mechanisms both those languages have). R does not have such a mechanism. It could be added; if it isn't, then it will be necessary to have separate mechanisms for calling and exiting handlers (if calling handlers are supported).
In fact, there is really always the need for this to some extent, even with a calling approach: If the reason an exception is being signaled is that the stack has run out, then you cannot call a handler in place. Similarly, if the heap is exhausted, a call is not likely to get very far. Both of those, and perhaps a few other resource-related exceptions, really require some sort of "native exiting" handler. If exiting handlers are separate from calling ones, then this does not require any special case construct.
One approach to stack overflow errors would be to follow the Java model: store a stack trace in an exception object and take the first eligible exiting handler. This would almost be consistent with the CL mechanism: you can conceptually imagine taking any intervening calling handlers, having them fail, and dropping down to the next exiting one. This makes sense since CL handlers only see the handlers that were active when they were established. A problem does arise if no exiting handler exists: in this case the debugger is supposed to be entered by the signaler, but that isn't possible since the stack is full. So this only works if there is guaranteed to be a top level exiting handler. This can be ensured by either writing one into the top level loop or by allowing signal to exit the process or thread if there isn't one.
Unfortunately I don't think this approach is consistent with the Dylan approach. In Dylan, calling handlers see all established handlers, so handling a stack overflow almost by definition leads to an infinite recursion of signals. One way out is to make stack overflow impossible by using a stack of heap-allocated frames, but that raises a similar problem with heap exhaustion. One possible way out would be to allow conditions to be marked as exiting-only, thus making only exiting handlers eligible. A convention for an exiting-only condition might be that if no handler is present the debugger will be entered "at the earliest feasible point" as the stack is unwound. This is essentially the hack I have used in my condition system for XLISP. The hack does make some sense, but I would prefer it to be part of a consistent semantics, rather than a special case.
I think the answer depends on the mechanism handlers can use to decline handling a condition and pass it on to the next eligible handler. In CL, signal calls the eligible handlers one after another. To decline handling a condition, a handler just returns. The signal function will then call the next handler. This means that the only way a handler can actually handle a condition and stop the chain of handler calls is by a non-local exit. If a handler wants to have the condition it is handling be ignored, it has to be able to do a non-local exit to a suitable point. This is why the warn function establishes the muffle-warning restart: this is the only way a warning can be ignored. Other non-local exit mechanisms could be used instead of restarts, but some form of non-local exit is essential since a simple return is equivalent to declining to handle the condition.
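The role of muffle-warning can be sketched as follows. This is a Python model under simplifying assumptions, not CL's actual warn: the MuffleWarning exception stands in for the restart's non-local exit, conditions are plain strings, and the out list stands in for the place where warnings are printed.

```python
class MuffleWarning(Exception):
    """Stands in for the muffle-warning restart (a non-local exit)."""


handlers = []  # calling handlers, innermost last


def signal(cond):
    for fn in reversed(list(handlers)):
        fn(cond)  # a normal return declines


def warn(message, out):
    try:
        signal(message)
    except MuffleWarning:
        return None  # some handler chose to have the warning ignored
    out.append("Warning: " + message)  # default action when nobody muffles
    return None


def muffle(cond):
    """A handler that ignores the warning by exiting non-locally."""
    raise MuffleWarning()
```

A handler that merely returns leaves the default action in place; only the non-local exit through MuffleWarning suppresses the message.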
Dylan does this differently. If Dylan's signal finds an eligible handler it calls only that handler. If the handler returns (and a return is allowed for that condition), then the values returned by the handler are returned as the values of signal. The handler is called with two arguments. The first is the condition. The second argument is a closure that takes no argument. If the handler handles the condition, then it ignores this second argument. If the handler declines to handle the condition, then it should tail call the second argument. This will then call the next eligible handler or take the appropriate default action if there is none. This approach does not require a non-local exit to handle a condition.
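The Dylan-style protocol can be modeled like this. It is a Python sketch of the idea only, not Dylan's implementation; default_action and the start parameter are assumptions introduced for the example.

```python
handlers = []  # (condition type, function) pairs, innermost last


def default_action(cond):
    """Hypothetical default taken when no handler accepts the condition."""
    return "unhandled"


def signal(cond, start=None):
    """Call at most one handler; it may delegate through next_handler."""
    i = len(handlers) - 1 if start is None else start
    while i >= 0:
        cond_type, fn = handlers[i]
        if isinstance(cond, cond_type):
            # Closure of no arguments that resumes the search further out.
            next_handler = lambda j=i - 1: signal(cond, j)
            return fn(cond, next_handler)  # handler's value is signal's value
        i -= 1
    return default_action(cond)
```

A handler handles the condition by returning a value and declines by returning next_handler(), so no non-local exit is involved in either case.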
Thus if Dylan's approach for declining to handle a condition is used, then restarts are not really needed up front and can be added later. With the CL approach, they are essential in order to be able to use calling handlers effectively.
I believe the decision on how to handle declining is orthogonal to the other main difference between Dylan and CL: whether restarts occupy a separate stack and handlers can be unwound during handler processing (CL), or whether restarts are placed in the same stack and therefore handlers must remain visible while they are processed (Dylan).
stop Function
This corresponds to CL's error.
warning Function
This corresponds to CL's warn. Default handling is to print the message. I assume R does some internal magic to avoid printing huge numbers of similar warnings in vectorized calculations; this needs to be thought through. Having this as part of the condition system means code could ask to enter the debugger on warnings to find out, for example, where NA's are being generated.
Splus has the ability to customize the handling of warnings via options. This is similar to CL's use of the *break-on-warnings* variable.
on.exit Function
This corresponds to a finally clause. In either case, for exiting handlers the exact semantics of signaling a condition from within a clean-up expression need to be thought through: are all handlers down to the one being thrown to unwound prior to the clean-up or not?
restart Function
SIGFPE. This could just raise an R floating point exception and let the exception handling system take over from there.
SIGINT. I think this is now almost always handled by a longjmp to top level. This could be replaced by a signal of an interrupt condition. One issue to worry about is that exiting is not always safe (for example in GC). This may already have been addressed by suspending the signal, or just using it to set a flag, and then processing it after the critical section is complete. The possible interaction of SIGINT with clean-up actions also needs another look, but these issues are not specific to adding an exception system.
A point I am not sure about is the issue of what can be done safely from within a UNIX signal handler. Places to read on this are [1, 3, 6] under the heading of async-signal-safe functions. One reading is that almost nothing can be safely done from within a signal handler, so in particular calling a user level handler that could then do almost anything would be a bad idea. If the signal handler just sets a flag and then allows the regular system to make the handler call when the flag is noticed, then this is not a problem. On the other hand, if there is a need to be able to interrupt a piece of not quite trusted code, then a jump out of the signal handler is needed. Exiting handlers would jump, but calling ones would not.
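The flag-setting approach can be sketched as follows, in Python for concreteness. The Interrupt class and the functions check_interrupt and run_steps are hypothetical names; the sketch assumes it runs in the main thread, where Python allows signal handlers to be installed. (CPython itself already defers Python-level handlers in a similar way, so this shows the pattern rather than a necessity in that language.)

```python
import signal

_interrupted = False


def _note_interrupt(signum, frame):
    """Do the minimum in the signal handler: record that the signal arrived."""
    global _interrupted
    _interrupted = True


class Interrupt(Exception):
    """Hypothetical interrupt condition, raised from ordinary code."""


def check_interrupt():
    """Called at safe points; turns a pending flag into a condition."""
    global _interrupted
    if _interrupted:
        _interrupted = False
        raise Interrupt()


def run_steps(steps):
    """Run the steps, checking for a pending interrupt between them."""
    old = signal.signal(signal.SIGINT, _note_interrupt)
    done = []
    try:
        for step in steps:
            done.append(step())
            check_interrupt()  # a safe point: no step is half finished
    except Interrupt:
        done.append("interrupted")
    finally:
        signal.signal(signal.SIGINT, old)  # restore the previous handler
    return done
```

The cost of this design is the one noted above: a step that never reaches a safe point cannot be interrupted.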
Condition
.
Handlers can be established as calling or exiting handlers. This could be done with special syntax or with functions named something like with.handlers and with.catchers. A call of the form
    with.handlers(expr, Error=fe, Warning=fw)
will call fe if an error is signaled while evaluating expr and fw if a warning is signaled. As in Dylan, while these are called, all existing handlers remain active.
For exiting handlers, something similar could be used,
    with.catchers(expr, Error=fe, Warning=fw)
A signal would transfer control to the context of the outer expression and call the appropriate handler. Before transferring control, the available handlers would be unwound down to the level of the ones available in the surrounding context. on.exit expressions executed on the way down would therefore not be able to jump to intermediate handlers. (This is only one possible design, but I think it is the right one. At the least, intervening exit points should be disabled.)
As an alternative, a syntax like
    try expr
    catch Error (e) e.expr
    catch Warning (w) w.expr
could be used. This is basically the Java syntax. If this is used, it might be natural to add a finally clause.
As in Dylan, generic functions could be used to set up default calling and exiting handlers, with the calling ones searched before the exiting ones.
Conditions are signaled by stop, warning, or signal. signal would do the following:
    signal <- function(c) {
        hlist <- find.handlers(c)
        for (h in hlist) {
            if (h$exiting) {
                # disable intervening exiting handlers
                .Throw(h, c)
            } else h$action(c)
        }
    }
The handlers found might be local ones or default ones; a default catching handler would be executed at the top level (of the thread).
stop would be something like
    stop <- function(arg) {
        c <- as.condition(arg)
        signal(c)
        exit.to.toplevel(c)
    }
The as.condition function would, for example, convert a string to a condition.
warning would be something like
    warning <- function(arg) {
        try signal(as.warning(arg))
        catch muffle.warnings (w) return NULL;
    }
The default handler could then look in options and signal muffle.warnings if warnings are to be ignored. This would follow the CL model for warn, but, like Dylan, the muffle.warnings restart would be just another condition.
One thing to think about with default handlers is whether the read/eval/print loop should be made public, with a command line switch for setting the loop to run.
Differences between interactive and batch mode could be encoded here in some way too.
    [exiting, type1, e1, type2, e2, ...]
This list could just be a heap vector of SEXPR's. The exiting member is a boolean flag. This field will probably have to be marked by the GC.
Still not quite sure about best way to handle default handlers.
For Control-C, could use signal to signal an interrupt condition. Not sure this is really safe in UNIX.
Stack overflow and heap exhaustion are signaled by special exiting-only exceptions.
    for (i in x) {
        try Big.Simulation(i)
        catch Error (e) {
            paste("Error in", i, ":", e)
        }
    }
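The same loop pattern, written as runnable Python for comparison. The function run_all and its simulate argument are hypothetical stand-ins for the loop and Big.Simulation; the point is only that an error in one iteration is recorded and the loop continues.

```python
def run_all(inputs, simulate):
    """Run simulate on every input, recording errors instead of stopping."""
    results = []
    for i in inputs:
        try:
            results.append(simulate(i))
        except Exception as e:  # plays the role of `catch Error (e)`
            results.append(f"Error in {i}: {e}")
    return results
```

An exiting handler is enough here, since nothing needs to be resumed inside the failed simulation.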
For get on top level variables, do something like
    top.get <- function (var) {
        with.handlers({
            if (unbound(var))
                signal(make.unbound.variable.exception(var))
            else real.get(var)
        },
        use.value = function(c) c$value)
    }
Then the default handler for the unbound variable exception could be something like
    function(c) {
        autoload(c$name)
        signal(make.use.value(get(c$name)))
    }
This isn't quite right and is also too simplistic because it doesn't play nice with catching all errors, but it is an outline.
na.action to a restart if none was originally provided and NA's were found. A skeleton might look like
    if (nas.present(x) && na.action == na.fail) {
        try signal("NA's -- specify action if you like")
        catch na.action (n) na.action = n$action;
        ...
    }
Not sure if this is a good idea, or if there are better choices.
[1] David R. Butenhof. Programming with POSIX Threads. Addison-Wesley, Reading, MA, 1997.
[2] Neal Feinberg, Sonya E. Keene, Robert O. Mathews, Peter S. Gordon, and P. Tucker Withington. Dylan Programming: An Object-Oriented and Dynamic Language. Addison-Wesley, 1996.
[3] Kay A. Robbins and Steven Robbins. Practical UNIX Programming. Prentice Hall, Upper Saddle River, NJ, 1996.
[4] Andrew Shalit, David Moon, and Orca Starbuck. The Dylan Reference Manual: The Definitive Guide to the New Object-Oriented Dynamic Language. Addison-Wesley, 1996.
[5] Guy L. Steele. Common Lisp the Language. Digital Press, Burlington, MA, 2nd edition, 1990.
[6] W. Richard Stevens. UNIX Network Programming, volume I. Prentice-Hall, Upper Saddle River, NJ, 1998.