Setting the reference group in SAS PROCEDURES
Some SAS procedures let you set the reference group directly in the CLASS statement, as in the example below.
proc logistic data=ds;
class sex (ref='female');
model y = sex;
run;
SAS procedures that use this syntax:
- PROC LOGISTIC
- PROC GENMOD
- PROC PHREG (for proportional hazards modeling of survival data)
- PROC SURVEYLOGISTIC
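As a further illustration, the same CLASS syntax might look like this in PROC PHREG, and the REF= option also accepts the keywords FIRST and LAST in place of a quoted level. This is only a sketch: the data set ds and the variables time, status and sex are hypothetical placeholders, and status=0 is assumed to mark censored observations.

```sas
/* Hypothetical survival data: time = follow-up time, status = 0 if censored. */
proc phreg data=ds;
class sex (ref='female');
model time*status(0) = sex;
run;

/* REF= also accepts a keyword: use the first ordered level as the reference. */
proc logistic data=ds;
class sex (ref=first);
model y = sex;
run;
```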
Unfortunately, PROC GLM and PROC MIXED do not offer this syntax, and those are the procedures we use most often in the foundations of experimental design.
Below, we use PROC FORMAT to change which level of the factor serves as the reference (or baseline) group.
proc print data=hamster (obs=10);
run;
Obs  litter  daylength  enzyme
  1       1  short         2.1
  2       2  short         1.8
  3       3  short         1.4
  4       4  short         1.2
  5       5  short         1.9
  6       6  short         2.4
  7       1  long          2.6
  8       2  long          2.2
  9       3  long          2.4
 10       4  long          1.7
In the data set above, the default reference group for daylength is short, because it sorts last alphanumerically. The output below confirms this: SAS sets the estimate (alpha-hat) for short to zero.
/*Original coding. */
proc glm data=hamster;
class litter daylength;
model enzyme=daylength litter/solution;
run;
The GLM Procedure

                                     Standard
Parameter             Estimate          Error   t Value   Pr > |t|
Intercept          2.075000000 B   0.22332711      9.29     0.0002
daylength long     0.550000000 B   0.16881943      3.26     0.0225
daylength short    0.000000000 B    .               .        .
litter 1           0.000000000 B   0.29240383      0.00     1.0000
litter 2          -0.350000000 B   0.29240383     -1.20     0.2850
litter 3          -0.450000000 B   0.29240383     -1.54     0.1844
litter 4          -0.900000000 B   0.29240383     -3.08     0.0275
litter 5           0.050000000 B   0.29240383      0.17     0.8709
litter 6           0.000000000 B    .               .        .
From the parameter estimates, we see that the mean of the long group is 0.55 units higher than the mean of the short group (for any given litter).
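To make the reading of the estimates concrete, the fitted model can be written out as below. The notation is an assumption added for illustration: alpha denotes the daylength effect (matching the alpha-hat mentioned above) and beta denotes the litter effect, with the estimates for short and for litter 6 set to zero by SAS.

```latex
y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij},
\qquad \hat{\alpha}_{\text{short}} = 0, \quad \hat{\beta}_{6} = 0

% Fitted means computed from the estimates in the output above
\hat{y}_{\text{short},6} = 2.075
\hat{y}_{\text{long},6}  = 2.075 + 0.550 = 2.625
\hat{y}_{\text{short},4} = 2.075 - 0.900 = 1.175
```

The intercept is therefore the fitted mean for the combination of both reference levels (short, litter 6). The value 2.625 for long in litter 6 is what the intercept becomes once long is made the reference group.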
SETTING REFERENCE GROUP WITH NUMERIC FORMATTING
We will now define a new variable called newDL, which is simply a numeric coding of the original daylength factor. We will then use this numeric coding to set up a format for newDL that makes long the reference group. The formatted label 'z-long' sorts after 'short', so long becomes the last level.
/* Create numerically coded variable for DayLength. */
data hamster; set hamster;
if daylength = "long" then newDL=2;
if daylength = "short" then newDL=1;
run;
/*Define format newDLcode for the numerically coded variable. */
proc format;
value newDLcode
1 = 'short'
2 = 'z-long';
run;
/*Use new numeric variable and formatted coding in old model. */
proc glm data=hamster;
format newDL newDLcode.;
class litter newDL;
model enzyme=newDL litter/solution;
run;
                                     Standard
Parameter             Estimate          Error   t Value   Pr > |t|
Intercept          2.625000000 B   0.22332711     11.75     <.0001
newDL short       -0.550000000 B   0.16881943     -3.26     0.0225
newDL z-long       0.000000000 B    .               .        .
litter 1           0.000000000 B   0.29240383      0.00     1.0000
litter 2          -0.350000000 B   0.29240383     -1.20     0.2850
litter 3          -0.450000000 B   0.29240383     -1.54     0.1844
litter 4          -0.900000000 B   0.29240383     -3.08     0.0275
litter 5           0.050000000 B   0.29240383      0.17     0.8709
litter 6           0.000000000 B    .               .        .
The reference group is now z-long. From the parameter estimates, we see that the mean of the short group is 0.55 units lower than the mean of the long group (for any given litter).
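The approach works because PROC GLM orders CLASS levels by their formatted values unless told otherwise, and it treats the last level as the reference. A minimal sketch that spells out this dependency is shown here; ORDER=FORMATTED is the procedure's default, so writing it explicitly should not change the results and is included only for illustration.

```sas
/* ORDER=FORMATTED is the default; stated explicitly to show what the trick relies on. */
proc glm data=hamster order=formatted;
format newDL newDLcode.;
class litter newDL;
model enzyme=newDL litter/solution;
run;
```

If a different ORDER= value were in effect (for example ORDER=INTERNAL), the levels would be sorted by the unformatted values instead, and the formatted label would no longer determine the reference group.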
SETTING REFERENCE GROUP WITH CATEGORICAL FORMATTING
The same result can be obtained without a new variable by creating a character format for the original daylength factor. The $ prefix in the format name marks it as a character format, as shown below.
/*Define format DLChar for factor variable daylength. */
proc format;
value $DLChar
'long' = 'z-long'
'short' = 'short';
run;
/*Use original variable and the new formatting. */
proc glm data=hamster;
format daylength $DLChar.;
class litter daylength;
model enzyme=daylength litter/solution;
run;
Parameter             Estimate          Error   t Value   Pr > |t|
Intercept          2.625000000 B   0.22332711     11.75     <.0001
daylength short   -0.550000000 B   0.16881943     -3.26     0.0225
daylength z-long   0.000000000 B    .               .        .
litter 1           0.000000000 B   0.29240383      0.00     1.0000
litter 2          -0.350000000 B   0.29240383     -1.20     0.2850
litter 3          -0.450000000 B   0.29240383     -1.54     0.1844
litter 4          -0.900000000 B   0.29240383     -3.08     0.0275
litter 5           0.050000000 B   0.29240383      0.17     0.8709
litter 6           0.000000000 B    .               .        .
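The same format should carry over to PROC MIXED, which also orders CLASS levels by formatted value by default. The sketch below reuses the $DLChar format defined above. Treating litter as a random effect is an assumption made here for illustration, not something the example above specifies, so the estimates need not match the PROC GLM output exactly.

```sas
/* Sketch only: litter as a random (blocking) effect is an assumption. */
proc mixed data=hamster;
format daylength $DLChar.;
class litter daylength;
model enzyme=daylength/solution;
random litter;
run;
```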
Relevant Websites:
http://sas-and-r.blogspot.com/2010/09/example-86-changing-reference-category.html
http://support.sas.com/kb/37/108.html