Web page to support

 

Hayes, A. F., & Cai, L. (2007).  Using heteroscedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation.  Behavior Research Methods, 39, 709-722.  For a copy of the paper corresponding to these macros, click here.

 

 

NOTE: Some users of SPSS 18 have reported problems with macros and scripts.  Click here for more information and a possible solution.

 

 

 

SPSS and SAS have powerful matrix algebra-based sets of commands similar in their features to such programming languages as GAUSS, R, and MATLAB. These command sets, combined with the ability to generate new command syntax through the macro definition procedures, allow the user to add tremendous functionality to SPSS and SAS.  Here we provide a set of macros that implement the HC methods described in this paper in SPSS and SAS. 

             Each time a new SPSS session is started, the macro must first be activated by running the program.  Once the program is run, a new command, HCREG, will be added to the SPSS syntax command set.   Once the macro is activated, HCREG will remain a part of the syntax command set until the SPSS session is terminated by quitting SPSS.  However, the macro must be reactivated each time SPSS is executed.  The syntax of the HCREG command is HCREG dv = yvar/iv = ivlist/const = c/method = m/covmat = cv/test = q.,  where c is 1 for regression with a constant and 0 for regression without the constant, m is an integer between 0 and 5 corresponding to the HC method described in this paper or 5 for ordinary standard errors that assume homoscedasticity, cv is 1 to print the variance-covariance matrix of the parameter estimates or 0 to disable its printing, yvar is the name of the criterion variable in the active SPSS data file, ivlist  is a list of one or more predictor variables in the SPSS file.  The parts of the command underlined above are optional.  When those parts are omitted, c defaults to 1, m defaults to 3, and cv defaults to 0.  The “/test = q” subcommand is also optional and can be omitted.  If it is included, the output will include a test that the regression coefficients for the last q variables in ivlist are all equal to zero.  The value of q must be between 1 and one less than the number of variables in ivlist.  There is no limit to the number of predictor variables that can be listed in ivlist.  For example, the command set HCREG dv = depress/iv = gender age anxiety ptsd/test = 2 produces an OLS regression estimating depress from a constant, gender, age, anxiety, and ptsd and prints standard error estimates and significance tests using the HC3 estimator.  It also produces a test of the null hypothesis that the regression coefficients for both anxiety and ptsd are equal to zero.

            The macro has no error handling procedures, so it is important that the data first be screened for such problems as singularities in the predictors and missing data.  The presence of a singularity can be easily identified by seeing if the SPSS OLS regression procedure rejects one or more of the predictors.  The macro can handle system missing data represented with the period character (“.”) through listwise deletion, but it does not recognize user missing values, so user-defined missing data should be deleted manually prior to the execution of the macro command.  Problems during computation will show up with a string of largely unintelligible errors.  If any errors appear, do not interpret the output.   With large sample sizes, SPSS may produce an “out of memory” error.  If this occurs, try increasing the workspace or memory allocated to SPSS  (See the SPSS command reference documentation for details on the SET command to accomplish this). 

            The SAS version of the macro functions identically, although the syntax is slightly different.  After the macro is activated, the new command is %HCREG(data = filename, dv = yvar, iv = xvarlist, const = c, method = m, covmat = cv, test = q), where filename is the name of a SAS data file and the other parameters are defined as described above.  The test parameter is optional, and c, m, and cv default to 1, 3, and 0, respectively.  The SAS version of the macro also has no error handling and assumes that missing data are entered as “.” 

 

            I do not recommend that you type the macros in yourself.  And cutting and pasting the macro text below often will produce extra line breaks that must be removed.  I recommend that you download the syntax files directly: Click here for the SPSS version, or here for the SAS version.  If prompted, select the option to save the file onto your computer.   

 

 

SPSS Version (version 2.0, 10/11/05)

 

DEFINE hcreg (dv =!charend ('/')/iv =!charend ('/')

             /test = !charend('/') !default (0)

             /const = !charend('/') !default(1)

             /method = !charend ('/') !default (3)

             /covmat = !charend('/') !default(0)).

PRESERVE.

set length = none.

SET MXLOOP = 100000000.

MATRIX.

GET x/file = */variables = !dv !iv/names = dv/missing = omit.

compute y=x(:,1).

compute x=x(:,2:ncol(x)).

compute iv5 = x.

compute pr = ncol(x).

compute n = nrow(x).

compute L = ident(pr).

compute tss=cssq(y)-(((csum(y)&**2)/n)*(!const <> 0)).

do if (!const = 0).

  compute iv = t(dv(1,2:ncol(dv))).

  compute df2 = n-pr.

else.

  compute iv = t({"Constant", dv(1,2:ncol(dv))}).

  compute con = make(n,1,1).

  compute x={con,x}.

  compute df2 = n-pr-1.

  compute L1 = make(1,pr,0).

  compute L = {L1;L}.

end if.

compute dv=dv(1,1).

compute b = inv(t(x)*x)*t(x)*y).

compute k = nrow(b).

compute invXtX = inv(t(x)*x).

compute h = x(:,1).

loop i=1 to n.

  compute h(i,1)= x(i,:)*invXtX*t(x(i,:)).

end loop.

compute resid = (y-(x*b)).

compute mse = csum(resid&**2)/(n-ncol(x)).

compute pred = x*b.

compute ess= cssq(resid).

 do if (!method = 2 or !method = 3).

  loop i=1 to k.

    compute x(:,i) = (resid&/(1-h)&**(1/(4-!method)))&*x(:,i).

  end loop.

 end if.

 do if (!method = 0 or !method = 1).

  loop i=1 to k.

    compute x(:,i) = resid&*x(:,i).

  end loop.

 end if.

 do if (!method = 5).

   loop i=1 to k.

    compute x(:,i) = sqrt(mse)&*x(:,i).

  end loop.

 end if.

do if (!method = 4).

 compute mn = make(n,2,4).

 compute pr3 = n-df2.

 compute mn(:,2) = (n*h)/pr3.

 compute ex=rmin(mn).

  loop i=1 to k.

    compute x(:,i) = (resid&/(1-h)&**(ex/2))&*x(:,i).

  end loop.

 end if.

compute hc = invXtX*t(x)*x*invXtX.

do if (!method = 1).

  compute hc = (n/(n-k))&*hc.

end if.

compute F = (t(t(L)*b)*inv(t(L)*hc*L)*((t(L)*b)))/pr).

compute pf = 1-fcdf(f,pr,df2).

compute r2 = (tss-ess)/tss.

compute pf = {r2,f,pr,df2,pf}.

do if (!method <> 5).

  print !method/title = "HC Method"/format F1.0.

end if.

print dv/title = "Criterion Variable"/format A8.

print pf/title = "Model Fit:"/clabels = "R-sq" "F" "df1" "df2" "p"/format F10.4.

compute sebhc = sqrt(diag(hc)).

compute te = b&/sebhc.

compute p = 2*(1-tcdf(abs(te), n-nrow(b))).

compute oput = {b,sebhc, te, p}.

do if (!method <> 5).

 print oput/title = 'Heteroscedasticity-Consistent Regression Results'/clabels

       = "Coeff" "SE(HC)" "t" "P>|t|"/rnames = iv/format f10.4.

 else if (!method = 5).

 print oput/title = 'OLS Regression Results Assuming Homoscedasticity'/clabels

       = "Coeff" "SE" "t" "P>|t|"/rnames = iv/format f10.4.

end if.

compute iv2 = t(iv).

do if (!covmat = 1).

 print hc/title = 'Covariance Matrix of Parameter Estimates'/cnames =

      iv/rnames = iv2/format f10.4.

end if.

do if (!test > 0 and !test < pr).

 compute L2 = make(pr-!test+!const,!test,0).

 compute L = {L2;L((pr+1-!test+!const):(pr+!const),(pr-!test+1):(pr))}.

 compute F = (t(t(L)*b)*inv(t(L)*hc*L)*((t(L)*b)))/!test).

 compute pf = 1-fcdf(f,!test,df2).

 compute pf = {f,!test,df2,pf}.

 print pf/title = "Setwise Hypothesis Test"

    /clabels = "F" "df1" "df2" "p"/format F10.4.

 compute iv = t(iv((pr+1-!test+!const):(pr+!const),1)).

 print iv/title = "Variables in Set:"/format A8.

end if.

END MATRIX.

RESTORE.

!END DEFINE.

 

 

SAS Version (version 2.1, 1/6/06)

 

 

%macro hcreg (data=,dv=,iv=,const=1,method=3,covmat=0,test=0);                        

proc iml;                                                                    

use &data;                                                                   

read all var{&dv &iv} into x;                                                 

nms={&iv};                                                                   

xx=(x = .);xx=xx[,+];                                                        

j=1;do i=1 to nrow(x);if xx[i,1]=0 then;do;x[j,]=x[i,];j=j+1;end;end;         

x=x[1:j-1,];                                                                 

y=x[,1];                                                                     

x=x[,2:ncol(x)];                                                             

pr=ncol(x);                                                                   

n=nrow(x);                                                                   

L=I(pr);                                                                     

tss=ssq(y)-(((y[+,]*y[+,])/n)*(&const ^= 0));                                

if (&const = 0) then;                                                        

  do;                                                                        

  df2 = n-pr;                                                                 

  end;                                                                       

if (&const = 1) then;                                                        

  do;                                                                         

  nms={constant}||{&iv};                                                     

  con=j(n,1,1);                                                              

  x=con||x;                                                                  

  df2 = n-pr-1;                                                              

  L1=j(1,pr,0);                                                              

  L=L1//L;                                                                   

  end;                                                                        

b=inv(x`*x)*x`*y;                                                            

k=nrow(b);                                                                   

invXtx=inv(x`*x);                                                             

h=x[,1];                                                                     

do i=1 to n;                                                                 

  h[i,1]=x[i,]*invXtX*(x[i,]`);                                               

end;                                                                         

resid=(y-(x*b));                                                             

pred = x*b;  

resid2 = resid##2;

mse = resid2[+,]/(n-ncol(x));

ess=ssq(resid);                                                               

if (&method > 1) then;                                                       

  do;                                                                        

  if (&method < 4) then;                                                      

    do i = 1 to k;                                                           

    x[,i]=(resid/(1-h)##(1/(4-&method)))#x[,i];                              

    end;                                                                      

  end;                                                                       

if (&method < 2) then;                                                       

  do i = 1 to k;                                                             

  x[,i]=resid#x[,i];                                                         

  end;         

if (&method = 5) then;

  do i = 1 to k;

  x[,i]=sqrt(mse)#x[,i];

  end;

mn1=j(n,1,4); 

if (&method = 4) then; 

  do;                                                      

  pr3=n-df2;

  mn2=(n*h)/pr3;                                                              

  ex = mn1 >< mn2;                                                               

  do i = 1 to k;                                                              

  x[,i]=(resid/(1-h)##(ex/2))#x[,i];                                         

  end;       

  end;                                                               

hc = invXtX*x`*x*invXtX;                                                      

if (&method = 1) then;                                                       

  do;                                                                        

  hc=(n/(n-k))*hc;                                                           

  end;                                                                        

F=((L`*b)`*inv(L`*hc*L)*(L`*b))/pr;                                          

pf=1-probf(f,pr,df2);                                                        

r2=(tss-ess)/tss;                                                             

pf=r2||f||pr||df2||pf;

cn={"Coeff" "SE" "t" "P>|t|"};

if (&method < 5) then;

 do;print "HC Method" &method;

 cn={"Coeff" "SE(HC)" "t" "P>|t|"};end;

                                          

print "Criterion Variable" {&dv};                                            

print "Model Fit";                                                           

cnms = {"R-sq" "F" "df1" "df2" "p"};                                         

print pf [colname = cnms format = 10.4];                                     

nmst=nms`;                                                                   

sebhc=sqrt(vecdiag(hc));                                                     

te=b/sebhc;                                                                   

p=2*(1-probt(abs(te),n-nrow(b)));                                            

oput=b||sebhc||te||p;

if (&method < 5) then;do;

print "Heteroscedasticity-Consistent Regression Results";end;

if (&method = 5) then;do;

print "OLS Regression Results Assuming Homoscedasticity";end;

print oput [colname = cn rowname = nmst format=10.4];    

if (&covmat = 1) then;

 do;print "Heteroscedasticity-Consistent Covariance Matrix";

 print hc [colname = nms rowname = nmst format=10.4];

 end;                   

if (&test > 0) then;                                                         

   do;if (&test < pr) then;do;                                               

   L2=j(pr-&test+&const,&test,0);                                            

   L=L2//L[(pr+1-&test+&const):(pr+&const),(pr-&test+1):(pr)];               

   F=((L`*b)`*inv(L`*hc*L)*(L`*b))/&test;                                    

   pf=1-probf(f,&test,df2);                                                  

   pf=f||&test||df2||pf;                                                     

   cnms = {"F" "df1" "df2" "p"};                                             

   print "Setwise Hypothesis Test";                                          

   print pf [colname = cnms format = 10.4];                                  

   nmst = nmst[(pr+1-&test+&const):(pr+&const),1]`;                          

   print "Variables in Set: " nmst;                                          

   end;                                                                       

   end;                                                                      

quit;                                                                         

%mend hcreg; 

 

 

invisible hit counter