Write Barrier to Support Generational GC in R

Luke Tierney
School of Statistics
University of Minnesota

Background

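The generational collector divides allocated nodes into generations based on some notion of age. Younger generations are collected more frequently than older ones. For this to work correctly, younger nodes that are reachable only from older nodes must be handled properly: a collection of the younger generations does not trace through older nodes, so such nodes would otherwise be freed while still in use. This is accomplished by a write barrier that monitors each assignment and records the older node whenever a reference to a newer node is placed in it, so that the older node can be treated as a root when the younger generations are collected.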
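The idea can be illustrated with a small, self-contained C sketch. It is not R's implementation: the two-generation heap, the single `ref` field, and the names `set_ref`, `minor_collect`, and `remembered_set` are all hypothetical. The barrier adds an old node to a remembered set the first time a pointer to a younger node is stored in it, and a minor collection then traces from remembered nodes as well as from the ordinary roots.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node with one pointer field; gen 0 is young, gen 1 is old. */
typedef struct node {
    int gen;
    int marked;          /* set by the minor collection */
    int remembered;      /* nonzero once recorded by the write barrier */
    struct node *ref;
} node;

#define MAX_REMEMBERED 64
static node *remembered_set[MAX_REMEMBERED];
static int n_remembered = 0;

/* Write barrier: every store into a node's pointer field goes through here. */
void set_ref(node *x, node *v)
{
    if (v != NULL && x->gen > v->gen && !x->remembered) {
        if (n_remembered == MAX_REMEMBERED)
            abort();                            /* sketch: no overflow handling */
        x->remembered = 1;
        remembered_set[n_remembered++] = x;     /* record the old node */
    }
    x->ref = v;
}

/* Mark young nodes reachable from n; old nodes are not traced. */
static void mark_young(node *n)
{
    while (n != NULL && n->gen == 0 && !n->marked) {
        n->marked = 1;
        n = n->ref;
    }
}

/* Minor collection: trace from the roots and from remembered old nodes. */
void minor_collect(node **roots, int nroots)
{
    for (int i = 0; i < nroots; i++)
        mark_young(roots[i]);
    for (int i = 0; i < n_remembered; i++)
        mark_young(remembered_set[i]->ref);
}

/* Returns 1 if a young node referenced only by an old node gets marked. */
int young_survives_via_barrier(void)
{
    node old = { 1, 0, 0, NULL };
    node young = { 0, 0, 0, NULL };

    set_ref(&old, &young);      /* old -> young store passes the barrier */
    minor_collect(NULL, 0);     /* no root points at young directly */

    int survived = young.marked;
    n_remembered = 0;           /* reset: old is about to go out of scope */
    return survived;
}
```

In this sketch, replacing the `set_ref` call with a plain `old.ref = &young;` would leave the remembered set empty, so `young` would stay unmarked and a real collector would free it while `old` still refers to it.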

Implementation

The write barrier is implemented by functions that must be used for all assignments of SEXP pointers. These functions are listed in the following table:
Object type          Assignment functions
Attributes           SET_ATTRIB(x,v)
Strings and Vectors  SET_STRING_ELT(x,i,v), SET_VECTOR_ELT(x,i,v)
Lists                SET_TAG(x,v), SETCAR(x,v), SETCDR(x,v),
                     SETCADR(x,v), SETCADDR(x,v), SETCADDDR(x,v), SETCAD4R(x,v)
Closures             SET_FORMALS(x,v), SET_BODY(x,v), SET_CLOENV(x,v)
Symbols              SET_PRINTNAME(x,v), SET_SYMVALUE(x,v), SET_INTERNAL(x,v)
Environments         SET_FRAME(x,v), SET_ENCLOS(x,v), SET_HASHTAB(x,v)
Promises             SET_PREXPR(x,v), SET_PRENV(x,v), SET_PRVALUE(x,v)

All assignments of SEXP values into SEXPREC fields must be made through these functions; a direct assignment bypasses the write barrier.

Checking the Write Barrier

Code that includes Rinternals.h will see SEXP's as opaque pointers and only be able to access SEXPREC fields through functions. (This can be circumvented by defining USE_RINTERNALS before the include statement, but doing this should be strongly discouraged.)
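Opaque pointers of this kind are a standard C technique: the header declares the structure type without defining it, so client code can hold and pass pointers but cannot name fields. The following single-file sketch uses hypothetical names (`CELL`, `cell_car`, `cell_set_car`) as stand-ins for SEXP, SEXPREC, and their accessors; in a real build the first part would be the header and the second part a separate file such as memory.c.

```c
#include <stdlib.h>

/* ---- "header" part: the structure is declared but never defined here ---- */
typedef struct cell *CELL;      /* opaque pointer type */
CELL cell_new(void);
void cell_free(CELL x);
CELL cell_car(CELL x);
void cell_set_car(CELL x, CELL v);

/* Client code: struct cell is incomplete at this point, so an access
   such as a->car would not compile; only the accessors are usable. */
int client_code(void)
{
    CELL a = cell_new();
    CELL b = cell_new();
    int ok = 0;
    if (a != NULL && b != NULL) {
        cell_set_car(a, b);
        ok = (cell_car(a) == b);
    }
    cell_free(a);
    cell_free(b);
    return ok;
}

/* ---- "implementation" part: only code below sees the fields ---- */
struct cell {
    struct cell *car;
};

CELL cell_new(void)
{
    CELL x = malloc(sizeof *x);
    if (x != NULL)
        x->car = NULL;
    return x;
}

void cell_free(CELL x) { free(x); }

CELL cell_car(CELL x) { return x->car; }

void cell_set_car(CELL x, CELL v)
{
    /* a write barrier check would be applied here */
    x->car = v;
}
```

Because every pointer store must go through `cell_set_car`, a single function is the only place that needs the barrier check.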

Ordinarily, files that include Defn.h are able to access internal fields of SEXPREC's directly. This improves performance (by about 10% on the base tests, I believe), but it comes at the risk of violating the write barrier. To check that the write barrier is intact, define TESTING_WRITE_BARRIER in Defn.h. When TESTING_WRITE_BARRIER is defined, only files that explicitly define USE_RINTERNALS are able to access SEXPREC fields directly (for the generational collector this should only be memory.c). All other files see SEXP's as opaque pointers that must be accessed through functions. Using old-style assignments of the form

CAR(x) = v
will produce compiler errors.
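The reason the old-style assignment fails can be seen in a small sketch. With direct access, an accessor like CAR can be a macro that expands to a structure field, which is an lvalue; with opaque access it is a function call, whose result is not an lvalue, so `CAR(x) = v` is rejected at compile time. The switch `DIRECT_ACCESS` and the `cell` type are hypothetical stand-ins for USE_RINTERNALS and SEXPREC, and the local `CAR`/`SETCAR` definitions only mimic R's names.

```c
#include <stddef.h>

typedef struct cell {
    struct cell *car;
} cell;

#ifdef DIRECT_ACCESS
/* Macro form: expands to a field, so CAR(x) = v compiles and
   silently bypasses any write barrier. */
#define CAR(x) ((x)->car)
#else
/* Function form: CAR(x) is an rvalue, so CAR(x) = v is a compile error. */
static cell *CAR(cell *x) { return x->car; }
#endif

/* The sanctioned way to store a pointer; a barrier check would go here. */
static cell *SETCAR(cell *x, cell *v)
{
    x->car = v;
    return v;
}

int accessor_demo(void)
{
    cell a = { NULL }, b = { NULL };
    SETCAR(&a, &b);
    /* CAR(&a) = &b;   rejected unless DIRECT_ACCESS is defined */
    return CAR(&a) == &b;
}
```

Reads look identical in both configurations, so only code that writes through an accessor has to change when direct access is turned off.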

Accessing String and Vector Data Pointers

It is still possible to access pointers to the data of STRSXP's and VECSXP's. This is done using the macros/functions STRING_PTR and VECTOR_PTR.

The VECTOR_PTR function/macro is only used in one place, in RObjToCPtr in dotcode.c. I'm not sure whether this is needed; if it is not, it should be dropped. If it is, it has to be used with extreme caution, since storing values through the returned pointer bypasses the write barrier and can damage the integrity of the heap.
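Why a store through a data pointer is dangerous can be shown with a hypothetical vector of node pointers. In the sketch below, `set_vector_elt` applies a barrier check, while a store through the raw pointer returned by `vector_ptr` does not, so an old vector can end up referring to a young node without the collector being told. All names and the generation scheme are assumptions for illustration, not R's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap objects; gen 0 is young, gen 1 is old. */
typedef struct node { int gen; } node;

typedef struct vec {
    int gen;
    int remembered;      /* set when an old vector acquires a young element */
    size_t length;
    node **data;
} vec;

/* Checked store, in the spirit of SET_VECTOR_ELT. */
void set_vector_elt(vec *x, size_t i, node *v)
{
    assert(i < x->length);
    if (v != NULL && x->gen > v->gen)
        x->remembered = 1;          /* old-to-young reference recorded */
    x->data[i] = v;
}

/* Raw data pointer, in the spirit of VECTOR_PTR: stores are unchecked. */
node **vector_ptr(vec *x) { return x->data; }

/* Returns 1 if the checked store is recorded but the raw store is not. */
int raw_store_is_missed(void)
{
    node young = { 0 };
    node *slots1[1] = { NULL };
    node *slots2[1] = { NULL };
    vec checked = { 1, 0, 1, slots1 };
    vec raw = { 1, 0, 1, slots2 };

    set_vector_elt(&checked, 0, &young);
    vector_ptr(&raw)[0] = &young;   /* bypasses the write barrier */

    return checked.remembered == 1 && raw.remembered == 0;
}
```

Reading through the raw pointer is harmless in this sketch; only writes can create an unrecorded old-to-young reference.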

STRING_PTR is currently used in around 20 places. Most look safe; many have to do with sorting. It would be a good idea to replace all of them with accesses via STRING_ELT and SET_STRING_ELT.
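For the sorting uses, one possible replacement is to copy the elements out with the element accessor, sort the copy, and write the results back with the setter, so that every store passes the barrier. The sketch below uses a hypothetical string vector whose elements point to immutable string cells standing in for CHARSXPs; `string_elt`, `set_string_elt`, and `sort_strings` are illustrative names, not R functions. Sorting only permutes references the vector already holds, so it creates no new old-to-young references, which may be why the existing sorting code looks safe; going through the setter means that argument no longer has to be made case by case.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CHARSXP. */
typedef struct charsxp { const char *text; } charsxp;

typedef struct strvec {
    size_t length;
    charsxp **elts;
} strvec;

charsxp *string_elt(const strvec *x, size_t i) { return x->elts[i]; }

void set_string_elt(strvec *x, size_t i, charsxp *v)
{
    /* a write barrier check would be applied here */
    x->elts[i] = v;
}

static int cmp_charsxp(const void *a, const void *b)
{
    const charsxp *const *pa = a;
    const charsxp *const *pb = b;
    return strcmp((*pa)->text, (*pb)->text);
}

/* Sort x in place using only the element accessors; returns 0 on success. */
int sort_strings(strvec *x)
{
    if (x->length == 0)
        return 0;
    charsxp **tmp = malloc(x->length * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < x->length; i++)
        tmp[i] = string_elt(x, i);
    qsort(tmp, x->length, sizeof *tmp, cmp_charsxp);
    for (size_t i = 0; i < x->length; i++)
        set_string_elt(x, i, tmp[i]);
    free(tmp);
    return 0;
}

/* Returns 1 if {"b", "c", "a"} sorts to {"a", "b", "c"}. */
int sort_demo(void)
{
    charsxp a = { "a" }, b = { "b" }, c = { "c" };
    charsxp *slots[3] = { &b, &c, &a };
    strvec x = { 3, slots };
    if (sort_strings(&x) != 0)
        return 0;
    return strcmp(string_elt(&x, 0)->text, "a") == 0 &&
           strcmp(string_elt(&x, 1)->text, "b") == 0 &&
           strcmp(string_elt(&x, 2)->text, "c") == 0;
}
```

The temporary copy costs one extra allocation per sort, in exchange for keeping every store into the vector behind the setter.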

Changing the Representation of String Vectors

Currently STRSXP data consists of a vector of SEXP's. To change this to a char **, which would be more convenient for the C code interface, two steps are needed:
  1. Remove all uses of STRING_PTR
  2. Redefine STRING_ELT and SET_STRING_ELT so that SET_STRING_ELT stores the CHAR address of the CHARSXP as the data value, and STRING_ELT retrieves the CHARSXP by stepping back from the stored CHAR pointer.
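Step 2 relies on the character data sitting at a fixed offset inside the CHARSXP, so that the enclosing object can be recovered from a pointer to its data with the usual `offsetof` technique. The sketch below assumes a hypothetical layout (header fields followed by a flexible array member) and does not describe R's actual CHARSXP layout; `mkchar`, `char_of`, and `charsxp_of` are illustrative names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string cell: header fields followed by the characters. */
typedef struct charsxp {
    int gen;              /* stand-in for GC header information */
    size_t length;
    char data[];          /* flexible array member holding the text */
} charsxp;

charsxp *mkchar(const char *s)
{
    size_t n = strlen(s);
    charsxp *x = malloc(sizeof *x + n + 1);
    if (x == NULL)
        return NULL;
    x->gen = 0;
    x->length = n;
    memcpy(x->data, s, n + 1);
    return x;
}

/* The value that would be stored in the char ** vector. */
char *char_of(charsxp *x) { return x->data; }

/* Step back from a stored data pointer to the enclosing cell. */
charsxp *charsxp_of(char *p)
{
    return (charsxp *)(p - offsetof(charsxp, data));
}

/* Store data addresses in a char ** vector, then recover the cells. */
int roundtrip_demo(void)
{
    charsxp *a = mkchar("alpha");
    charsxp *b = mkchar("beta");
    int ok = 0;
    if (a != NULL && b != NULL) {
        char *strings[2];             /* the proposed char ** representation */
        strings[0] = char_of(a);      /* what SET_STRING_ELT would store */
        strings[1] = char_of(b);
        ok = charsxp_of(strings[0]) == a   /* what STRING_ELT would return */
            && charsxp_of(strings[1]) == b
            && strcmp(strings[1], "beta") == 0;
    }
    free(a);
    free(b);
    return ok;
}
```

With this layout, C code can use the vector directly as an array of C strings, while the collector can still find each cell from the stored pointer.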