Setting the reference group in SAS PROCEDURES

 

Some SAS procedures let you set the reference group directly in the CLASS statement with the REF= option (see the example below).

 

proc logistic data=ds;

   class sex (ref='female');

   model y = sex;

run;

 

SAS procedures that support this syntax include:

- PROC LOGISTIC

- PROC GENMOD

- PROC PHREG (for proportional hazards modeling of survival data)

- PROC SURVEYLOGISTIC
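
One caveat, offered as a hedged aside: by default PROC LOGISTIC uses effect coding for CLASS variables, so REF= alone names the reference level without producing estimates that are differences from it. A minimal sketch, reusing the ds, y, and sex names from the example above and adding the PARAM=REF option to request reference-cell coding:

```sas
proc logistic data=ds;
   /* param=ref requests reference-cell (dummy) coding;    */
   /* ref='female' makes female the baseline level of sex. */
   class sex (ref='female') / param=ref;
   model y = sex;
run;
```

With this coding, the estimate reported for the other level of sex is its log odds ratio relative to female.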

 

Unfortunately, PROC GLM and PROC MIXED, the procedures we use most often in the foundations of experimental design, do not offer this syntax. Below, we use PROC FORMAT to change which level of the factor serves as the reference (or baseline) group.

 

proc print data=hamster (obs=10);

run;

 

Obs    litter    daylength    enzyme

 

  1       1        short        2.1

  2       2        short        1.8

  3       3        short        1.4

  4       4        short        1.2

  5       5        short        1.9

  6       6        short        2.4

  7       1        long         2.6

  8       2        long         2.2

  9       3        long         2.4

 10       4        long         1.7

 

In the data set above, the default reference group for daylength is short, because it comes last in alphanumeric order. The output below confirms this: SAS sets the estimate (alpha-hat) for short to zero.

 

/*Original coding. */

proc glm data=hamster;

   class litter daylength;

   model enzyme=daylength litter/solution;

run;

 

The GLM Procedure

                                          Standard

Parameter               Estimate             Error    t Value    Pr > |t|

 

Intercept            2.075000000 B      0.22332711       9.29      0.0002

daylength long       0.550000000 B      0.16881943       3.26      0.0225

daylength short      0.000000000 B       .                .         .

litter    1          0.000000000 B      0.29240383       0.00      1.0000

litter    2         -0.350000000 B      0.29240383      -1.20      0.2850

litter    3         -0.450000000 B      0.29240383      -1.54      0.1844

litter    4         -0.900000000 B      0.29240383      -3.08      0.0275

litter    5          0.050000000 B      0.29240383       0.17      0.8709

litter    6          0.000000000 B       .                .         .

 

 

From the parameter estimates we see that the mean of the long group is 0.55 units higher than the mean of the short group (for any given litter).
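
To make the role of the zero estimate explicit, the fitted model can be written out as below. This is a sketch in notation chosen here for illustration: the symbols \mu, \alpha_i, and \beta_j stand for the intercept, the daylength effect, and the litter effect, and do not appear in the SAS output.

```latex
\begin{aligned}
\text{enzyme}_{ij} &= \mu + \alpha_i + \beta_j + \varepsilon_{ij},
  \qquad i \in \{\text{long}, \text{short}\},\; j = 1, \dots, 6, \\
\hat{\alpha}_{\text{short}} &= 0, \qquad \hat{\beta}_{6} = 0
  \quad \text{(last level of each factor set to zero)}, \\
\hat{\alpha}_{\text{long}} &= 0.55
  \quad \text{(estimates } \alpha_{\text{long}} - \alpha_{\text{short}}\text{)}, \\
\hat{\mu} &= 2.075
  \quad \text{(fitted mean for short daylength, litter 6)}.
\end{aligned}
```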

 

 

 

SETTING REFERENCE GROUP WITH NUMERIC FORMATTING

 

We will now define a new variable called newDL, which is simply a numeric coding of the original daylength factor. We will then set up a format for newDL whose label for long ('z-long') sorts last alphanumerically, which makes long the reference group.

 

/* Create numerically coded variable for DayLength. */

data hamster;

   set hamster;

   if daylength = "long" then newDL = 2;

   else if daylength = "short" then newDL = 1;

run;
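
Before moving on, it can be worth confirming that the recoding did what was intended. A minimal sketch, assuming the hamster data set and the newDL variable created above; the LIST and MISSING options are used here only to make unmatched or missing values easy to spot:

```sas
/* Cross-check the recode: each daylength value should map to exactly one newDL code. */
proc freq data=hamster;
   tables daylength*newDL / list missing;
run;
```

If daylength contained a value other than "long" or "short" (for example, different capitalization), newDL would be missing for those rows and they would show up in this table.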

 

/*Define format newDLcode for the numerically coded variable. */

proc format;

     value newDLcode

     1 = 'short'

     2 = 'z-long';

run;

 

/*Use new numeric variable and formatted coding in old model. */

proc glm data=hamster;

   format newDL newDLcode.;

   class litter newDL;

   model enzyme=newDL litter/solution;

run;

                                           Standard

Parameter                Estimate             Error    t Value    Pr > |t|

 

Intercept             2.625000000 B      0.22332711      11.75      <.0001

newDL     short      -0.550000000 B      0.16881943      -3.26      0.0225

newDL     z-long      0.000000000 B       .                .         .

litter    1           0.000000000 B      0.29240383       0.00      1.0000

litter    2          -0.350000000 B      0.29240383      -1.20      0.2850

litter    3          -0.450000000 B      0.29240383      -1.54      0.1844

litter    4          -0.900000000 B      0.29240383      -3.08      0.0275

litter    5           0.050000000 B      0.29240383       0.17      0.8709

litter    6           0.000000000 B       .                .         .

 

The reference group is now long (displayed as z-long). From the parameter estimates, we see that the mean of the short group is 0.55 units lower than the mean of the long group (for any given litter).
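
The two sets of estimates describe the same fitted model; only the baseline has moved. A quick check using the numbers reported in the two outputs:

```latex
\begin{aligned}
\text{new intercept (long, litter 6)} &= 2.075 + 0.55 = 2.625, \\
\text{new short effect} &= 0 - 0.55 = -0.55, \\
\text{fitted mean (short, litter 6)} &= 2.625 - 0.55 = 2.075 .
\end{aligned}
```

The litter estimates and the standard errors are the same in both outputs.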

 

 

SETTING REFERENCE GROUP WITH CATEGORICAL FORMATTING

 

The same result can also be obtained without a new variable by creating a character format for the original daylength factor. Note the $ prefix on the format name, which marks it as a character format.

 

/*Define format DLChar for factor variable daylength. */

proc format;

     value $DLChar

     'long' = 'z-long'

     'short' = 'short';

run;

 

/*Use original variable and the new formatting. */

proc glm data=hamster;

   format daylength $DLChar.;

   class litter daylength;

   model enzyme=daylength litter/solution;

run;

 

Parameter                Estimate             Error    t Value    Pr > |t|

 

Intercept             2.625000000 B      0.22332711      11.75      <.0001

daylength short      -0.550000000 B      0.16881943      -3.26      0.0225

daylength z-long      0.000000000 B       .                .         .

litter    1           0.000000000 B      0.29240383       0.00      1.0000

litter    2          -0.350000000 B      0.29240383      -1.20      0.2850

litter    3          -0.450000000 B      0.29240383      -1.54      0.1844

litter    4          -0.900000000 B      0.29240383      -3.08      0.0275

litter    5           0.050000000 B      0.29240383       0.17      0.8709

litter    6           0.000000000 B       .                .         .

 

 

 

 

Relevant Websites:

 

http://sas-and-r.blogspot.com/2010/09/example-86-changing-reference-category.html

 

http://support.sas.com/kb/37/108.html