Data Management and Analysis: Monte Carlo Analysis

Bootstrapping

  • Draw specified number of observations from current sample
  • Draw with replacement
  • Draw specified number of groups in a panel data set (Block bootstrap)
  • Execute a procedure a specified number of times, drawing a new bootstrap sample with each repetition

The procedure can bootstrap any estimator in the program, whether it is one of the supported procedures or one created by the user. The command names the matrix or scalar to be bootstrapped, which may be any result that the procedure produces.
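The resampling schemes listed above can be sketched outside LIMDEP as follows. This is a minimal illustration in Python, not LIMDEP syntax; the function names `bootstrap` and `block_bootstrap`, their arguments, and the small data sets are hypothetical and chosen only for the example.

```python
import random
import statistics

def bootstrap(sample, estimator, reps, size=None, seed=None):
    """Return `reps` replicates of `estimator`, each computed on a
    sample of observations drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample) if size is None else size
    return [estimator(rng.choices(sample, k=n)) for _ in range(reps)]

def block_bootstrap(groups, estimator, reps, n_groups=None, seed=None):
    """Resample whole groups (blocks) with replacement, so that the
    observations of a panel unit always stay together."""
    rng = random.Random(seed)
    g = len(groups) if n_groups is None else n_groups
    replicates = []
    for _ in range(reps):
        drawn = rng.choices(groups, k=g)                    # draw g groups
        pooled = [obs for group in drawn for obs in group]  # stack their rows
        replicates.append(estimator(pooled))
    return replicates

# Illustrative data only.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
means = bootstrap(data, statistics.mean, reps=200, seed=1)
boot_se = statistics.stdev(means)   # bootstrap standard error of the mean

panel = [[1.0, 1.2, 0.9], [2.0, 2.3, 2.1], [3.1, 2.9, 3.0]]
block_means = block_bootstrap(panel, statistics.mean, reps=200, seed=1)
```

Passing the estimator as a function mirrors the idea that any computed statistic can be bootstrapped; the block version differs only in what is resampled.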

Gibbs Sampling

Random number generation can be conditioned on preceding values to set up Gibbs sampling and Markov Chain Monte Carlo estimators.  A simple example appears below.

Random Number Generation

  • Continuous random variables: uniform, normal, lognormal, t, chi-squared, F, exponential, Weibull, Gumbel, gamma, beta, logistic, Cauchy, truncated standard normal
  • Discrete random variables: Poisson, discrete uniform, binomial, geometric
  • Several estimators use Halton draws (quasi-Monte Carlo sequences) instead of pseudorandom draws. Users may generate Halton sequences.
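For readers unfamiliar with Halton sequences, the standard radical-inverse construction can be sketched in a few lines. This is an illustrative Python version, not the routine LIMDEP uses internally; the names `halton` and `halton_sequence` and the `skip` argument are assumptions made for the example.

```python
from statistics import NormalDist

def halton(index, base):
    """Return element `index` (index >= 1) of the Halton sequence for
    the given prime base, using the radical-inverse construction."""
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        result += (i % base) * fraction  # next base-`base` digit of index
        i //= base
        fraction /= base
    return result

def halton_sequence(n, base, skip=0):
    """Return n consecutive Halton points, optionally skipping the first few."""
    return [halton(i, base) for i in range(skip + 1, skip + n + 1)]

draws = halton_sequence(5, base=2)   # 0.5, 0.25, 0.75, 0.125, 0.625

# Uniform Halton points can be mapped to normal draws by the inverse CDF.
normal_draws = [NormalDist().inv_cdf(u) for u in draws]
```

The points fill the unit interval evenly rather than randomly, which is why they can stand in for a larger number of pseudorandom draws in simulation-based estimators.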

Estimation

Randomly generated data enter the sample in exactly the same way as external, real data. Monte Carlo samples may be specified and used with any estimation procedure or any other part of the program that uses data.


Example: Fixed Effects Probit Model

The following program carries out a Monte Carlo study of the fixed effects probit estimator. We fit the model 20 times with randomly generated data and compute descriptive statistics for the estimated parameter vectors. The underlying theory states that the estimator is biased when T is small, and the example is consistent with it: the estimator appears to be badly biased in our study with T = 5. This program illustrates several features of LIMDEP.

/*  We investigate the behavior of the fixed effects estimator
    in the panel probit model 

    y*(i,t) = a(i) + b1 * x(i,t) + b2 * d(i,t) + e(i,t),

    x(i,t) is a continuous variable, d(i,t) is a dummy variable
    and both b1 and b2 equal 1.  The model is fit 20 times
    with randomly generated data: a(i), x(i,t) and d(i,t) are
    drawn once and reused, and e(i,t) is resampled for each replication.
*/
?
? Achieve replicability by using a known seed for the generator
?
CALC   ; Ran(55377) $
SAMPLE ; 1 - 5000 $ ? Sample length is constant
CREATE ; xit = rnn(0,1) ; dit=(rnu(-.5,.5)+xit) > 0  
       ; i5=trn(5,0) $  ? I5 = 1,1,1,1,1,2,2,2,2,2,...
MATRIX ; u=rndm(1000) ?  Random matrix for fixed effects
       ; a5  = gxbr(xit,i5)$ ? Group means of x variable
CREATE ; ai = sqr(5)*a5(i5)+u(i5) ? Fixed effects correlated with x
       ; index  =  ai + xit + dit ; br = 0 ; dr = 0 $
NAMELIST; repl = br,dr $      ? Group two variables

? Set parameters for the simulation. Values can be changed
? 
PROC  = MCset(Tir,Nir) $
CALC    ; Ti = Tir ; Nobs = Nir ; NT = Ti*Nobs ; i = 0 $ 
SAMPLE  ; 1 - NT $  ? Sample set to NT = 5000
MATRIX  ; beta = init(20,2,0.0) $  ? Place to store results
ENDPROC
?
? This runs the replication.
? Binomial Probit is fit by unconditional maximum likelihood
?
PROC = MCrun$
CREATE ; eit = Rnn(0,1) ; yit = (index + eit) > 0 $
CALC   ; i = i + 1 $
PROBIT ; Lhs = yit ; Rhs = xit,dit ;  Pds = Ti ; Fixed effects $
MATRIX ; beta(i,*) = b $   ? Save estimated parameters in row i
ENDPROC
?
? T=5, N=1000 groups
? 
EXECUTE ; Proc = MCset(5,1000) $            ? Setup simulation
EXECUTE ; Proc = MCrun ; n = 20 ; Silent $  ? 20 replications
SAMPLE  ; 1 - 20 $                          ? Describe results
CREATE  ; repl = beta $       ? Copy matrix to 2 variables
DSTAT   ; rhs  = repl $       ? Examine the results

The following results emerge. The theory appears to be correct: the means of the 20 estimates, about 1.60 for b1 and 1.42 for b2, deviate substantially from the true value of 1.0 for both coefficients.

====================================================================
Variable        Mean         Std.Dev.        Minimum         Maximum
====================================================================
BR        1.60317327      .100071915      1.37897075      1.82531768
DR        1.42251625      .190029262      1.04402871      1.78544870
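
For readers who do not use LIMDEP, the data-generating steps of the program above can be sketched in Python. This is a rough translation under stated assumptions, not the original code: the names `simulate_panel` and `draw_outcomes` are hypothetical, the fixed-effect noise is assumed to be standard normal, and the random number stream differs from LIMDEP's, so the seed will not reproduce the results in the table. The sketch covers only the simulated data, not the fixed effects probit fit.

```python
import math
import random

def simulate_panel(n_groups=1000, t=5, seed=55377):
    """Generate x(i,t), d(i,t) and the index a(i) + x(i,t) + d(i,t),
    with fixed effects a(i) correlated with the group mean of x."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        x = [rng.gauss(0.0, 1.0) for _ in range(t)]
        # Dummy variable correlated with x, as in dit = (rnu(-.5,.5)+xit) > 0.
        d = [1 if rng.uniform(-0.5, 0.5) + xi > 0 else 0 for xi in x]
        xbar = sum(x) / t
        # Fixed effect: sqrt(T) * group mean of x plus noise (assumed N(0,1)).
        a = math.sqrt(t) * xbar + rng.gauss(0.0, 1.0)
        for xi, di in zip(x, d):
            rows.append({"group": g, "x": xi, "d": di, "index": a + xi + di})
    return rows

def draw_outcomes(rows, rng):
    """Resample e(i,t) and return y(i,t) = 1 if index + e > 0, else 0."""
    return [1 if r["index"] + rng.gauss(0.0, 1.0) > 0 else 0 for r in rows]

panel = simulate_panel(n_groups=200, t=5)   # regressors are drawn once
rng = random.Random(1)
y = draw_outcomes(panel, rng)               # one replication's outcomes
```

Calling `draw_outcomes` repeatedly on the same `panel` corresponds to the replications in the study: the regressors and fixed effects are reused while the disturbances are redrawn.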

Example: MCMC Estimation of the Binary Probit Model

This program uses data augmentation to estimate the posterior means of the coefficients in a probit model. We apply the program to the well-known Spector and Mazzeo study of student performance in a college economics course.
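
As a reading aid, the two conditional distributions that the sampler alternates between can be written out. This is our reading of the program that follows, assuming a noninformative (flat) prior on the coefficients; the symbols mu_i (the index x_i'beta) and u (a uniform draw on (0,1)) are notation introduced here.

```latex
\begin{aligned}
y_i^* \mid \beta, y_i = 1 &\sim N(x_i'\beta,\ 1) \ \text{truncated to } y_i^* > 0,\\
y_i^* \mid \beta, y_i = 0 &\sim N(x_i'\beta,\ 1) \ \text{truncated to } y_i^* \le 0,\\
\beta \mid y^*, X &\sim N\!\left((X'X)^{-1}X'y^*,\ (X'X)^{-1}\right).
\end{aligned}
\qquad
y_i^* =
\begin{cases}
\mu_i + \Phi^{-1}\!\left(1 - (1-u)\,\Phi(\mu_i)\right), & y_i = 1,\\
\mu_i + \Phi^{-1}\!\left(u\,\Phi(-\mu_i)\right), & y_i = 0,
\end{cases}
\qquad \mu_i = x_i'\beta .
```

The second expression is the inverse-CDF method for drawing from the truncated normal distributions in the first two lines.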

? This is the only application specific part of the program:
?
NAMELIST ; x = one,gpa,tuce,psi$
CREATE   ; y = grade $
?
? From this point forward, the routine is completely generic.
?
CALC     ; rep = 2000 ; burnin = 500 $
CALC     ; k=col(x)$
MATRIX   ; xx=x'x ; xxi = <xx> $
PROBIT   ; Lhs = y ; Rhs = x $  ? Classical estimator
CALC     ; iter=0 $
MATRIX   ; beta=init(k,1,0) ; bbar=beta;bv=init(k,k,0)$
PROC     = gibbs$
?
? Gibbs sampler
?
DO FOR   ; Simulate ; r =1,rep $
CALC     ; iter=iter+1$
?
? Data augmentation step:  draw yi*|beta,y,X (truncated normal)
?
CREATE   ; mui = x'beta ; f = rnu(0,1) 
         ; if(y=1)  ysg = mui + inp(1-(1-f)*phi( mui)); 
            (else)  ysg = mui + inp(     f *phi(-mui)) $
?
? Gibbs step: draw beta|yi*,y,X
?
MATRIX   ; mb = xxi*x'ysg ; beta = rndm(mb,xxi) $
MATRIX   ; if(iter > burnin) ; bbar=bbar+beta ; bv=bv+beta*beta'$
ENDDO    ; Simulate $
ENDPROC  $
EXECUTE  ; Proc = Gibbs $
CALC     ; ri = rep - burnin ; ri = 1/ri $
MATRIX   ; bbar=ri*bbar ; bv=ri*bv-bbar*bbar' $
MATRIX   ; Stat(bbar,bv,x); Stat(b,varb,x) $

The first set of results is the output of the Gibbs sampler: the Bayesian estimates (posterior means and standard deviations). The second is the classical probit estimator for the same model and data.

+---------------------------------------------------+
|Number of observations in current sample =      32 |
|Number of parameters computed here       =       4 |
|Number of degrees of freedom             =      28 |
+---------------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
 Constant|   -8.69232881      2.89006712    -3.008   .0026
 GPA     |    1.82171165       .75791800     2.404   .0162
 TUCE    |     .07038013       .09139852      .770   .4413
 PSI     |    1.65233826       .65265590     2.532   .0114

+---------------------------------------------------+
|Number of observations in current sample =      32 |
|Number of parameters computed here       =       4 |
|Number of degrees of freedom             =      28 |
+---------------------------------------------------+
+--------+--------------+----------------+--------+--------+
|Variable| Coefficient  | Standard Error |b/St.Er.|P[|Z|>z]|
+--------+--------------+----------------+--------+--------+
 Constant|   -7.45231965      2.54247232    -2.931   .0034
 GPA     |    1.62581004       .69388249     2.343   .0191
 TUCE    |     .05172895       .08389026      .617   .5375
 PSI     |    1.42633234       .59503790     2.397   .0165
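
The data augmentation step of the sampler above draws the latent variable from a truncated normal distribution by inverting the normal CDF. A minimal Python sketch of that one step follows, assuming `phi` in the program is the standard normal CDF and `inp` its inverse; the function name `draw_latent` and the clamping constant are hypothetical choices for this illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps the probability strictly inside (0, 1)

def draw_latent(mu, y, u):
    """Draw y* from N(mu, 1) truncated to y* > 0 when y == 1 and to
    y* <= 0 otherwise, by inverting the normal CDF at the uniform u."""
    if y == 1:
        p = 1.0 - (1.0 - u) * STD_NORMAL.cdf(mu)
    else:
        p = u * STD_NORMAL.cdf(-mu)
    p = min(max(p, EPS), 1.0 - EPS)   # guard against p hitting 0 or 1
    return mu + STD_NORMAL.inv_cdf(p)

rng = random.Random(0)
latent_pos = [draw_latent(0.3, 1, rng.random()) for _ in range(1000)]
latent_neg = [draw_latent(0.3, 0, rng.random()) for _ in range(1000)]
```

Every draw in `latent_pos` falls on the positive side and every draw in `latent_neg` on the nonpositive side, so the simulated latent variable always agrees with the observed outcome.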