Multiple Imputation
Imputation equations for filling missing values
- Up to 30 variables imputed simultaneously
- Six types of imputation procedures:
  - Continuous variables using multiple regression
  - Binary variables using logistic regression
  - Count variables using Poisson regression
  - Likert scale variables (ordered outcomes) using ordered probit
  - Fractional variables (proportional outcomes) using logistic regression
  - Unordered multinomial choice variables using multinomial logit
- No duplication of the base data set
Estimation step for any model in LIMDEP or NLOGIT
- All models supported by built-in procedures
- Any model written by the user with GMME, MAXIMIZE, NLSQ, etc.
- Estimate any number of models using each imputed data set
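This page does not say how the estimates from the separate imputed data sets are combined into one set of results. A common convention for multiple imputation is Rubin's rules: average the coefficient estimates, then add the average sampling variance to an inflated between-imputation variance. The Python sketch below assumes that convention and is not taken from the LIMDEP or NLOGIT documentation. The function name pool_rubin and the numbers fed to it are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across M imputed data sets with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    return q_bar, sqrt(total)


# Made-up estimates from three imputed data sets, for illustration only.
est = [0.045, 0.049, 0.047]
se = [0.020, 0.021, 0.020]
coef, pooled_se = pool_rubin(est, se)
print(f"pooled coefficient = {coef:.5f}, pooled std. error = {pooled_se:.5f}")
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation term adds the extra uncertainty that comes from not observing the missing values.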
Example
Here is a constructed example based on a data set that contains 27,326 observations and about 30 variables. The variable married is a marital status dummy variable. We injected missing values into about 10% of the observations on this binary variable (2,761 of the 27,326). We then create an imputation equation for married with the IMPUTE command, using age, education and income as predictors. The procedure that follows fits a probit model for doctor that uses married and several other variables. EXECUTE runs this procedure 25 times, and the missing values of married are imputed again in each repetition. The second set of results is the same probit model estimated with casewise deletion rather than imputation.
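? Use the full sample
SAMPLE ; All $
? Flag about 10% of the observations at random and set married to the missing code
CREATE ; missing = Rnu(0,1) < .1 $
CREATE ; If(missing=1)married = -999 $
? Store the imputation equation for married
IMPUTE ; Lhs = married ; Rhs = one,age,educ,income ; Type = Binary $
? Probit model to be estimated with each imputed data set
PROC $
PROBIT ; Lhs = doctor ; Rhs = one,married,age,kids,public ; Imputation = Probita $
ENDPROC $
? Run the procedure 25 times
EXECUTE ; N = 25 ; Imputation = Probita $
? Same probit model with casewise deletion, for comparison
SKIP $
PROBIT ; Lhs = doctor ; Rhs = one,married,age,kids,public $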
---------------------------------------------------------------
Deleted 2761 observations with missing data. N is now 24565
---------------------------------------------------------------
---------------------------------------------------------
Equation stored for imputing missing values of MARRIED
Imputation method: Binary Logistic
Observations currently in full data set = 33333
Complete observations for imputation equation = 24565
Missing observations on MARRIED in data set = 2761
---------------------------------------------------------
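To make the role of the stored imputation equation concrete, here is a minimal Python sketch of one way a binary logistic equation can fill in missing values: compute the predicted probability for each incomplete observation and draw a 0/1 value from it, repeating the draw for each imputed data set. This is an illustration under stated assumptions, not a description of LIMDEP's internal algorithm. The coefficients, the three data rows and the function name impute_binary are all hypothetical, and a full multiple imputation procedure would typically also account for the uncertainty in the estimated coefficients.

```python
import math
import random

MISSING = -999  # missing-value code used in the example commands


def impute_binary(rows, beta, rng):
    """Return a copy of rows with missing 'married' values replaced by 0/1 draws."""
    completed = []
    for row in rows:
        row = dict(row)  # copy, so the base data are not modified
        if row["married"] == MISSING:
            index = beta["one"] + sum(
                beta[name] * row[name] for name in ("age", "educ", "income")
            )
            prob = 1.0 / (1.0 + math.exp(-index))  # logistic probability
            row["married"] = 1 if rng.random() < prob else 0
        completed.append(row)
    return completed


# Hypothetical coefficients and data, for illustration only.
beta = {"one": -1.0, "age": 0.03, "educ": 0.02, "income": 0.5}
rows = [
    {"married": 1, "age": 45, "educ": 12, "income": 0.40},
    {"married": MISSING, "age": 30, "educ": 16, "income": 0.35},
    {"married": 0, "age": 25, "educ": 10, "income": 0.20},
]

rng = random.Random(12345)
imputed_sets = [impute_binary(rows, beta, rng) for _ in range(25)]
```

Observed values pass through unchanged. Only the rows that carry the missing code receive a draw, and the draw can differ from one imputed data set to the next.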
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -17679.32198
Restricted log likelihood -18019.55173
Chi squared [ 4 d.f.] 680.45951
Significance level .00000
Estimation based on N = 27326, K = 5
MI results based on 25 imputed samples
Likelihood based stats are not reliable
when using multiple imputation methods.
--------+--------------------------------------------------------------------
        |                 Standard            Prob.    95% Confidence
  DOCTOR| Coefficient        Error        z  |z|>Z*       Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|     -.42849***    .04118   -10.41   .0000     -.50920   -.34778
 MARRIED|      .04706**     .02052     2.29   .0218      .00684    .08728
     AGE|      .01379***    .00078    17.75   .0000      .01226    .01531
    KIDS|     -.13128***    .01799    -7.30   .0000     -.16653   -.09603
  PUBLIC|      .20659***    .02409     8.58   .0000      .15938    .25380
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Maximum repetitions of PROC
---------------------------------------------------------------
Deleted 2761 observations with missing data. N is now 24565
---------------------------------------------------------------
Normal exit: 4 iterations. Status=0, F= 15885.25
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable DOCTOR
Log likelihood function -15885.24892
Restricted log likelihood -16190.94099
Chi squared [ 4 d.f.] 611.38414
Significance level .00000
McFadden Pseudo R-squared .0188804
Estimation based on N = 24565, K = 5
Inf.Cr.AIC = 31780.5 AIC/N = 1.294
--------+--------------------------------------------------------------------
        |                 Standard            Prob.    95% Confidence
  DOCTOR| Coefficient        Error        z  |z|>Z*       Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|     -.42261***    .04335    -9.75   .0000     -.50758   -.33764
 MARRIED|      .05610***    .02112     2.66   .0079      .01471    .09749
     AGE|      .01351***    .00083    16.37   .0000      .01189    .01513
    KIDS|     -.13675***    .01921    -7.12   .0000     -.17439   -.09911
  PUBLIC|      .20923***    .02540     8.24   .0000      .15945    .25901
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
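The z statistics, probability values and confidence intervals in both tables follow from the coefficient and its standard error alone. The Python sketch below recomputes them for the MARRIED coefficient under multiple imputation and under casewise deletion, using the rounded values printed above. The function name wald_summary is hypothetical, and small differences in the last digit can arise because the inputs are rounded to five decimals.

```python
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Return the z statistic, two-sided p-value and confidence interval."""
    std_normal = NormalDist()
    z = coef / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return z, p_value, (coef - crit * se, coef + crit * se)


# MARRIED coefficient and standard error, copied from the two tables above.
mi = wald_summary(0.04706, 0.02052)        # multiple imputation, N = 27326
deletion = wald_summary(0.05610, 0.02112)  # casewise deletion, N = 24565

for label, (z, p, (low, high)) in (("MI", mi), ("Deletion", deletion)):
    print(f"{label:9s} z = {z:5.2f}  p = {p:.4f}  CI = ({low:.5f}, {high:.5f})")
```

In this example the imputed-data estimate of the MARRIED coefficient is somewhat smaller and has a slightly smaller standard error than the casewise deletion estimate, and the standard errors of the other coefficients are also smaller when all 27,326 observations are used.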