Multiple Imputation

LIMDEP’s new implementation of multiple imputation is woven into the entire program, not just a few specific models. Any estimator, including one you write yourself with MAXIMIZE, and any other computation on the data that produces a coefficient vector and a sampling covariance matrix, can be based on multiply imputed data sets. We have also built the technique to bypass the need to create multiple copies of the data; traditionally, having to replicate the full data set has hobbled this method. LIMDEP’s implementation of multiple imputation uses only the existing data set, and the results are fully replicable. (You can still create and save the imputed data sets if you wish.)
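The paragraph above rests on each fit yielding a coefficient vector and a sampling covariance matrix. The text does not spell out how LIMDEP combines them across imputations; the standard method for this, Rubin's combining rules, is shown here as an assumption about what such pooling involves. With M imputed data sets, estimates \hat{\theta}_m and covariance matrices V_m:

```latex
\begin{aligned}
\bar{\theta} &= \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m,
\qquad
\bar{W} = \frac{1}{M}\sum_{m=1}^{M} V_m,\\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_m-\bar{\theta}\right)\left(\hat{\theta}_m-\bar{\theta}\right)^{\top},\\
T &= \bar{W} + \left(1+\frac{1}{M}\right)B .
\end{aligned}
```

Under these rules \bar{\theta} is reported as the MI estimate and the square roots of the diagonal of T as its standard errors. The between-imputation term B is the extra uncertainty due to the missing values that a single imputation would ignore.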

Multiple Imputation Features Including Continuous Data, Binary Variables, Ordered Outcomes and More

Imputation equations for filling missing values

  • Up to 30 variables imputed simultaneously
  • Six types of imputation procedures for
    • Continuous variables using multiple regression
    • Binary variables using logistic regression
    • Count variables using Poisson regression
    • Likert-scale variables (ordered outcomes) using ordered probit
    • Fractional variables (proportional outcomes) using logistic regression
    • Unordered multinomial choices using multinomial logit
  • No duplication of the base data set
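The text does not say how LIMDEP avoids duplicating the data. One way to get both "no duplication" and replicability is to regenerate each imputed column on demand from a per-imputation random seed. The Python sketch below illustrates this for the binary (logistic) case; the function names, seed and coefficients are hypothetical, and for brevity it holds the logit coefficients fixed across imputations, whereas a proper MI draw would also sample them from their estimated sampling distribution.

```python
import math
import random
from typing import Iterator, Optional, Sequence


def logistic(x: float) -> float:
    """Numerically stable logistic CDF: P(y = 1) for linear index x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def imputed_column(
    y: Sequence[Optional[int]],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
    m: int,
    base_seed: int = 2024,  # hypothetical seed; any fixed value gives replicability
) -> Iterator[int]:
    """Yield the m-th imputed version of binary column y, one value at a time.

    Observed values pass through unchanged; each missing value (None) is drawn
    as Bernoulli(logistic(x_i'beta)). Because the generator for imputation m is
    seeded with base_seed + m, the same imputed values can be regenerated from
    the single base data set instead of being stored as a separate copy.
    Simplification: beta is fixed here rather than redrawn for each m.
    """
    rng = random.Random(base_seed + m)
    for yi, xi in zip(y, X):
        if yi is None:
            p = logistic(sum(b * x for b, x in zip(beta, xi)))
            yield 1 if rng.random() < p else 0
        else:
            yield yi


# Hypothetical data: a constant and age; None marks a missing value of married.
married = [1, None, 0, None, 1]
X = [[1.0, 30.0], [1.0, 45.0], [1.0, 25.0], [1.0, 60.0], [1.0, 50.0]]
beta = [-2.0, 0.05]  # hypothetical fitted logit coefficients
print(list(imputed_column(married, X, beta, m=3)))
```

Calling imputed_column again with the same m reproduces exactly the same draws, so an estimator can consume the m-th imputed data set as a stream without a stored copy.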

Estimation step for any model in LIMDEP or NLOGIT

  • All models supported by built-in procedures
  • Any model written by the user with GMME, MAXIMIZE, NLSQ, etc.
  • Estimate any number of models using each imputed data set

Example

Here is a constructed example based on a data set that contains 27,326 observations on about 30 variables. The variable married is a marital status dummy variable, into which we have injected about 10% missing values (coded -999). The IMPUTE command creates an imputation equation for married, a binary logistic model in age, educ and income. The commands between PROC and ENDPROC define a probit model for doctor that includes married and several other variables, and EXECUTE runs that procedure on 25 imputed data sets, each time filling the missing values of married from the imputation equation. The second set of results comes from the same probit model estimated with casewise deletion rather than imputation.

SAMPLE	; All $
CREATE 	; missing = Rnu(0,1) < .1 $
CREATE 	; If(missing=1)married = -999 $
IMPUTE 	; Lhs = married ; Rhs = one,age,educ,income ; Type = Binary $
PROC $
PROBIT	; Lhs = doctor ; Rhs = one,married,age,kids,public ; Imputation = Probita $
ENDPROC $
EXECUTE	; N = 25 ; Imputation = Probita $
SKIP $
PROBIT	; Lhs = doctor ; Rhs = one,married,age,kids,public $

---------------------------------------------------------------
Deleted   2761 observations with missing data. N is now  24565
---------------------------------------------------------------
---------------------------------------------------------
Equation stored for imputing missing values of    MARRIED
Imputation method: Binary Logistic
Observations currently in full data set        =    33333
Complete observations for imputation equation  =    24565
Missing observations on  MARRIED in data set   =     2761
---------------------------------------------------------
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable               DOCTOR
Log likelihood function    -17679.32198
Restricted log likelihood  -18019.55173
Chi squared [   4 d.f.]       680.45951
Significance level               .00000
Estimation based on N =  27326, K =   5
MI results based on  25 imputed samples
Likelihood based stats are not reliable
when using multiple imputation methods.
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
  DOCTOR|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|    -.42849***      .04118   -10.41  .0000     -.50920   -.34778
 MARRIED|     .04706**       .02052     2.29  .0218      .00684    .08728
     AGE|     .01379***      .00078    17.75  .0000      .01226    .01531
    KIDS|    -.13128***      .01799    -7.30  .0000     -.16653   -.09603
  PUBLIC|     .20659***      .02409     8.58  .0000      .15938    .25380
--------+--------------------------------------------------------------------
Note: ***, **, * ==>  Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Maximum repetitions of PROC
---------------------------------------------------------------
Deleted   2761 observations with missing data. N is now  24565
---------------------------------------------------------------
Normal exit:   4 iterations. Status=0, F=    15885.25
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable               DOCTOR
Log likelihood function    -15885.24892
Restricted log likelihood  -16190.94099
Chi squared [   4 d.f.]       611.38414
Significance level               .00000
McFadden Pseudo R-squared      .0188804
Estimation based on N =  24565, K =   5
Inf.Cr.AIC  =  31780.5 AIC/N =    1.294
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
  DOCTOR|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|    -.42261***      .04335    -9.75  .0000     -.50758   -.33764
 MARRIED|     .05610***      .02112     2.66  .0079      .01471    .09749
     AGE|     .01351***      .00083    16.37  .0000      .01189    .01513
    KIDS|    -.13675***      .01921    -7.12  .0000     -.17439   -.09911
  PUBLIC|     .20923***      .02540     8.24  .0000      .15945    .25901
--------+--------------------------------------------------------------------
Note: ***, **, * ==>  Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
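The z, Prob. and confidence-interval columns in both tables follow from the coefficient and its standard error alone. The short Python check below recomputes them for the MARRIED rows, assuming a standard normal reference distribution (the printed values are consistent with it); z_row is a hypothetical helper, not part of LIMDEP.

```python
import math
from statistics import NormalDist


def z_row(coef: float, se: float, level: float = 0.95):
    """Return z, the two-sided p-value and a confidence interval, using N(0,1)."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # equals 2 * (1 - Phi(|z|))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return z, p, (coef - crit * se, coef + crit * se)


# MARRIED row: multiple imputation results, then casewise deletion results.
for label, b, s in [("MI", 0.04706, 0.02052), ("casewise", 0.05610, 0.02112)]:
    z, p, (lo, hi) = z_row(b, s)
    print(f"{label:8s} z={z:6.2f} p={p:.4f} CI=({lo:.5f}, {hi:.5f})")
```

Both rows reproduce the printed z, Prob. and interval values to the displayed precision. The MI standard error for MARRIED (.02052) is slightly smaller than the casewise one (.02112), consistent with the MI estimates using all 27,326 observations rather than 24,565.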