Example 2


Data with Missing Categorical Covariates:
Binary Logistic Regression Model (2)

Unlike imputation procedures that do not consider the specific model being fitted to the data, the method of weights (built into LogXact 7 with Cytel Studio) uses the likelihood function of the fitted model when adjusting the parameter estimates. A small illustrative data set shows how much difference this can make. (For more details on the method of weights, see the references below.)

The small data set below serves to compare the method of weights with standard imputation procedures for missing data, such as those provided in SAS. The last two observations have missing covariate values, shown as ".". The model to be fitted is a logistic regression model with Y as the response, X as the covariate, and an intercept term.

Y    X
1    1
1    0
1    1
1    0
1    1
1    0
0    1
0    0
0    1
0    0
0    1
0    0
0    1
0    0
0    1
0    0
0    1
0    0
0    1
0    0
0    1
0    0
0    .
0    .
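To make the idea behind the method of weights concrete, the following is a minimal Python sketch of an EM-by-weights fit on the data above. It is not LogXact's implementation. It assumes the covariate is missing at random and follows a Bernoulli distribution with an unknown probability (called `alpha` here), and it relies on X being binary, so the logistic model is saturated and the weighted maximization step has a closed form. The function name `method_of_weights` and the starting values are illustrative choices.

```python
import math

# Data from the example: 24 observations; the last two have X missing (None).
Y = [1] * 6 + [0] * 18
X = [1, 0] * 11 + [None, None]


def logit(p):
    return math.log(p / (1.0 - p))


def method_of_weights(y, x, n_iter=100, tol=1e-12):
    """EM sketch for binary Y and binary X with some X values missing.

    Assumptions (not from the source): X ~ Bernoulli(alpha), missing at
    random, and P(Y=1 | X=x) = p[x]. Because X is binary the logistic
    model is saturated, so the M-step reduces to weighted proportions.
    """
    p = {0: 0.2, 1: 0.4}  # arbitrary starting values for P(Y=1 | X=x)
    alpha = 0.5           # arbitrary starting value for P(X=1)
    for _ in range(n_iter):
        # Weighted counts for each (Y, X) cell.
        n = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
        for yi, xi in zip(y, x):
            if xi is not None:
                n[(yi, xi)] += 1.0
                continue
            # E-step: split the incomplete observation into X=1 and X=0
            # copies, weighted by P(X=x | Y=yi) under the current estimates.
            like1 = (p[1] if yi == 1 else 1.0 - p[1]) * alpha
            like0 = (p[0] if yi == 1 else 1.0 - p[0]) * (1.0 - alpha)
            w1 = like1 / (like1 + like0)
            n[(yi, 1)] += w1
            n[(yi, 0)] += 1.0 - w1
        # M-step: weighted maximum likelihood estimates.
        new_p = {b: n[(1, b)] / (n[(0, b)] + n[(1, b)]) for b in (0, 1)}
        new_alpha = (n[(0, 1)] + n[(1, 1)]) / len(y)
        change = max(abs(new_p[0] - p[0]), abs(new_p[1] - p[1]),
                     abs(new_alpha - alpha))
        p, alpha = new_p, new_alpha
        if change < tol:
            break
    intercept = logit(p[0])
    slope = logit(p[1]) - logit(p[0])
    return intercept, slope


intercept, slope = method_of_weights(Y, X)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

Under these assumptions the iteration settles at an intercept of about log(1/3), roughly -1.0986, and a slope of essentially zero, which can be compared with the LogXact column of the results.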

A comparison of the method of weights (LogXact) with multiple imputation (SAS PROC MI and PROC MIANALYZE) shows that, for the above data, the default number of imputations (5) is inadequate. The results are given below.

 

 

             LogXact 6 with   SAS (# Imputations = 5)                SAS (# Imputations = 1000)
             Cytel Studio     Seed = 282797001   Seed = 680677001

             Beta             Beta               Beta                Beta
Intercept    -1.0986          -0.9808            -1.2040             -1.0959
X            -0.0000          -0.2231             0.2231              0.0011

Notice that with 5 imputations the slope coefficient differs substantially between starting seeds of the random number generator, and even changes sign. With 1000 imputations the values from SAS PROC MI are consistent with those computed by the method of weights. The default number of imputations in SAS is clearly insufficient for these data. Increasing the number of imputations until the results no longer depend on the random number sequence used to generate the imputed values can be time consuming for larger data sets.
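The seed dependence described above can be illustrated with a small simulation. The Python sketch below is a deliberately simplified stand-in for multiple imputation and does not reproduce SAS PROC MI or PROC MIANALYZE: it assumes each missing X is drawn from the proportion of X = 1 among complete cases with the same Y, ignores uncertainty in that proportion, and pools only the point estimate by averaging. The seeds 0 to 19 and the function names are arbitrary illustrative choices.

```python
import math
import random
import statistics

# Complete-case cell counts from the example data: (Y, X) -> count.
COMPLETE = {(1, 1): 3, (1, 0): 3, (0, 1): 8, (0, 0): 8}
N_MISSING_Y0 = 2  # two observations with Y = 0 and X missing


def logit(p):
    return math.log(p / (1.0 - p))


def fit_slope(counts):
    """Closed-form logistic slope for a binary covariate (log odds ratio)."""
    p1 = counts[(1, 1)] / (counts[(1, 1)] + counts[(0, 1)])
    p0 = counts[(1, 0)] / (counts[(1, 0)] + counts[(0, 0)])
    return logit(p1) - logit(p0)


def pooled_slope(n_imputations, seed):
    """Average the fitted slope over n_imputations imputed data sets."""
    rng = random.Random(seed)
    # Assumption: impute X from P(X=1 | Y=0) estimated on complete cases.
    p_x1 = COMPLETE[(0, 1)] / (COMPLETE[(0, 1)] + COMPLETE[(0, 0)])
    slopes = []
    for _ in range(n_imputations):
        counts = dict(COMPLETE)
        for _ in range(N_MISSING_Y0):
            x = 1 if rng.random() < p_x1 else 0
            counts[(0, x)] += 1
        slopes.append(fit_slope(counts))
    return statistics.fmean(slopes)


seeds = range(20)
few = [pooled_slope(5, s) for s in seeds]
many = [pooled_slope(1000, s) for s in seeds]
spread_few = max(few) - min(few)
spread_many = max(many) - min(many)
print(f"spread over seeds, 5 imputations:    {spread_few:.4f}")
print(f"spread over seeds, 1000 imputations: {spread_many:.4f}")
```

Under this simplified scheme a single imputed data set can only yield a slope of about -0.2231, 0, or 0.2231, so an average over 5 imputations moves in coarse steps from seed to seed, while an average over 1000 imputations stays close to zero.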

References:

Ibrahim, JG (1990), “Incomplete Data in Generalized Linear Models”, JASA, 85, 765-769.

Lipsitz, SR and Ibrahim, JG (1996a), “A Conditional Model For Incomplete Covariates in Parametric Regression Models”, Biometrika, 83, 916-922.

 
