Multiple Imputation
LIMDEP’s new implementation of multiple imputation is woven into the entire program, not just a few specific models. Any estimator, including one you write yourself with MAXIMIZE, and any other computation on the data that produces a coefficient vector and a sampling covariance matrix can be based on multiply imputed data sets. The technique is also built to bypass the need to create multiple data sets; traditionally, the need to replicate the full data set has hobbled this method. LIMDEP’s implementation uses only the existing data set, and the results are fully replicable. (You can create and save the imputed data sets if you wish.)
Multiple Imputation Features Including Continuous Data, Binary Variables, Ordered Outcomes and More
Imputation equations for filling missing values
- Up to 30 variables imputed simultaneously
- Six types of imputation procedures:
  - Continuous variables using multiple regression
  - Binary variables using logistic regression
  - Count variables using Poisson regression
  - Likert scale (ordered outcomes) using ordered probit
  - Fractional (proportional outcome) using logistic regression
  - Unordered multinomial choice using multinomial logit
- No duplication of the base data set
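As an illustration of the binary case listed above, here is a minimal sketch in Python (not LIMDEP code) of imputing a 0/1 variable with a logistic regression fitted to the complete cases. The function names `fit_logit` and `impute_binary`, the synthetic data, and the step that draws the coefficients from their estimated sampling distribution are all assumptions made for this sketch; the source does not say how LIMDEP draws its imputations internally.

```python
import numpy as np

def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns (beta, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

def impute_binary(X, y, missing, rng):
    """Return a copy of y with the missing entries replaced by 0/1 draws."""
    beta_hat, cov = fit_logit(X[~missing], y[~missing])
    # Assumption: draw coefficients so imputations reflect estimation error.
    beta_draw = rng.multivariate_normal(beta_hat, cov)
    p_miss = 1.0 / (1.0 + np.exp(-(X[missing] @ beta_draw)))
    filled = y.copy()
    filled[missing] = (rng.random(p_miss.size) < p_miss).astype(float)
    return filled

# Synthetic illustration (hypothetical data, not the data set in the example).
rng = np.random.default_rng(12345)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_p = 1.0 / (1.0 + np.exp(-(X @ np.array([0.3, 0.8, -0.5]))))
y = (rng.random(n) < true_p).astype(float)
missing = rng.random(n) < 0.1          # about 10% missing
y_obs = np.where(missing, -999.0, y)   # -999 marks a missing value

# The same seed reproduces the same imputed values.
filled_a = impute_binary(X, y_obs, missing, np.random.default_rng(1))
filled_b = impute_binary(X, y_obs, missing, np.random.default_rng(1))
```

Because the imputed values depend only on the data and the random seed, they can be regenerated on demand instead of being stored as extra copies of the data set, which is one way to read the "no duplication" and replicability claims.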
Estimation step for any model in LIMDEP or NLOGIT
- All models supported by built-in procedures
- Any model written by the user with GMME, MAXIMIZE, NLSQ, etc.
- Estimate any number of models using each imputed data set
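The estimation step works with any estimator because pooling needs only a coefficient vector and a covariance matrix from each imputed sample. The sketch below, in Python rather than LIMDEP, shows the standard combining rules (Rubin's rules) for that pooling. The function name `pool_rubin` and the toy numbers are hypothetical, and the source does not confirm that LIMDEP uses exactly these formulas.

```python
import numpy as np

def pool_rubin(coefs, covs):
    """Combine M sets of estimates with Rubin's rules.

    coefs: array-like of shape (M, K), one coefficient vector per imputation.
    covs:  array-like of shape (M, K, K), the matching covariance matrices.
    Returns the pooled coefficient vector and the total covariance matrix.
    """
    coefs = np.asarray(coefs, dtype=float)
    covs = np.asarray(covs, dtype=float)
    m = coefs.shape[0]
    pooled = coefs.mean(axis=0)            # average of the point estimates
    within = covs.mean(axis=0)             # average sampling covariance
    dev = coefs - pooled
    between = dev.T @ dev / (m - 1)        # covariance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Toy illustration with M = 2 imputations and K = 1 coefficient.
pooled, total = pool_rubin([[1.0], [3.0]], [[[0.5]], [[0.5]]])
std_err = np.sqrt(np.diag(total))
```

The between-imputation term is what makes the pooled standard errors reflect the uncertainty added by the missing data, rather than treating imputed values as if they had been observed.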
Example
Here is a constructed example based on a data set that contains 27,326 observations and about 30 variables. The variable married is a marital status dummy variable. We have injected about 10% missing values into this binary variable (2,761 observations in this run). We create an imputation equation for married with the IMPUTE command; the missing values are imputed by logistic regression on age, education and income in each of 25 repetitions. The procedure then fits a probit model for doctor that uses married and several other variables as regressors. The second set of results is the same probit model estimated with casewise deletion rather than imputation.
SAMPLE ; All $
CREATE ; missing = Rnu(0,1) < .1 $
CREATE ; If(missing=1)married = -999 $
IMPUTE ; Lhs = married ; Rhs = one,age,educ,income ; Type = Binary $
PROC $
PROBIT ; Lhs = doctor ; Rhs = one,married,age,kids,public ; Imputation = Probita $
ENDPROC $
EXECUTE ; N = 25 ; Imputation = Probita $
SKIP $
PROBIT ; Lhs = doctor ; Rhs = one,married,age,kids,public $

---------------------------------------------------------------
Deleted 2761 observations with missing data. N is now 24565
---------------------------------------------------------------
---------------------------------------------------------
Equation stored for imputing missing values of MARRIED
Imputation method: Binary Logistic
Observations currently in full data set = 33333
Complete observations for imputation equation = 24565
Missing observations on MARRIED in data set = 2761
---------------------------------------------------------
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable               DOCTOR
Log likelihood function    -17679.32198
Restricted log likelihood  -18019.55173
Chi squared [ 4 d.f.]         680.45951
Significance level               .00000
Estimation based on N = 27326, K = 5
MI results based on 25 imputed samples
Likelihood based stats are not reliable
when using multiple imputation methods.
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
  DOCTOR|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|    -.42849***      .04118   -10.41  .0000     -.50920   -.34778
 MARRIED|     .04706**       .02052     2.29  .0218      .00684    .08728
     AGE|     .01379***      .00078    17.75  .0000      .01226    .01531
    KIDS|    -.13128***      .01799    -7.30  .0000     -.16653   -.09603
  PUBLIC|     .20659***      .02409     8.58  .0000      .15938    .25380
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Maximum repetitions of PROC

---------------------------------------------------------------
Deleted 2761 observations with missing data. N is now 24565
---------------------------------------------------------------
Normal exit: 4 iterations. Status=0, F= 15885.25
-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable               DOCTOR
Log likelihood function    -15885.24892
Restricted log likelihood  -16190.94099
Chi squared [ 4 d.f.]         611.38414
Significance level               .00000
McFadden Pseudo R-squared      .0188804
Estimation based on N = 24565, K = 5
Inf.Cr.AIC = 31780.5 AIC/N = 1.294
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
  DOCTOR|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|    -.42261***      .04335    -9.75  .0000     -.50758   -.33764
 MARRIED|     .05610***      .02112     2.66  .0079      .01471    .09749
     AGE|     .01351***      .00083    16.37  .0000      .01189    .01513
    KIDS|    -.13675***      .01921    -7.12  .0000     -.17439   -.09911
  PUBLIC|     .20923***      .02540     8.24  .0000      .15945    .25901
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
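The reported statistics can be checked by hand from the numbers printed above. The short Python sketch below recomputes the z ratio and 95% confidence interval for MARRIED in the multiple imputation results, and the chi-squared and McFadden pseudo R-squared values from the log likelihoods. The 1.96 critical value is an assumption; the output does not state which critical value the program used.

```python
# Values copied from the multiple imputation results for MARRIED.
coef, se = 0.04706, 0.02052

z = coef / se                  # reported as 2.29
lower = coef - 1.96 * se       # reported as .00684
upper = coef + 1.96 * se       # reported as .08728

# Chi squared = 2 * (log likelihood - restricted log likelihood).
chi2_mi = 2.0 * (-17679.32198 - (-18019.55173))   # reported as 680.45951
chi2_cc = 2.0 * (-15885.24892 - (-16190.94099))   # reported as 611.38414

# McFadden pseudo R-squared for the casewise deletion fit.
pseudo_r2_cc = 1.0 - (-15885.24892) / (-16190.94099)   # reported as .0188804

print(f"z = {z:.2f}, CI = ({lower:.5f}, {upper:.5f})")
print(f"chi2 (MI) = {chi2_mi:.5f}, chi2 (deletion) = {chi2_cc:.5f}")
print(f"pseudo R2 (deletion) = {pseudo_r2_cc:.7f}")
```

Comparing the two tables, the standard errors from the imputed samples are slightly smaller than those from casewise deletion (for example .02052 versus .02112 for MARRIED), consistent with the imputed analysis using all 27,326 observations rather than 24,565.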