Data Management: Sampling and Bootstrapping

These features define the specific observations used in the estimation procedures or other computations that access the data. The first group defines the ‘current sample.’ The second group subsamples from the current sample, either by drawing a random subset or by using the ‘leave one out’ procedure of the jackknife.

Sampling

  • Include observations based on algebraic conditions
  • Reject observations based on algebraic conditions
  • Period - specify time interval to be in sample
  • Sample - specify particular observations or ranges of observations
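The logic of setting a current sample can be sketched outside the program. The following Python fragment is an illustration only, not the program's command syntax: the data, the variable names (id, age, income) and the helper functions are hypothetical, and it assumes that include and reject conditions both act as filters on the current sample, which may differ from the program's exact semantics.

```python
# Hypothetical data set; variable names are illustrative only.
rows = [
    {"id": 1, "age": 25, "income": 3.1},
    {"id": 2, "age": 41, "income": 1.5},
    {"id": 3, "age": 37, "income": 4.0},
    {"id": 4, "age": 52, "income": 2.2},
    {"id": 5, "age": 29, "income": 5.0},
    {"id": 6, "age": 60, "income": 3.3},
]

def sample_range(data, first, last):
    """Select observations first..last (1-based, inclusive) from the data."""
    return data[first - 1:last]

def reject(sample, condition):
    """Drop observations in the current sample that satisfy the condition."""
    return [r for r in sample if not condition(r)]

def include(sample, condition):
    """Keep only observations in the current sample that satisfy the condition.
    Assumption: include acts as a filter on the current sample."""
    return [r for r in sample if condition(r)]

current = sample_range(rows, 1, 5)                       # observations 1-5
current = reject(current, lambda r: r["age"] < 30)       # drop age < 30
current = include(current, lambda r: r["income"] > 2.0)  # keep income > 2
ids = [r["id"] for r in current]
print(ids)
```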

Bootstrap and jackknife

  • Draw specified number of observations from current sample
  • Draw with replacement
  • Draw specified number of groups in a panel data set
  • Execute a procedure a specified number of times, drawing a new bootstrap sample with each repetition
  • Execute using jackknife procedure for sampling
  • Maximum score binary choice estimator uses bootstrapping
  • Regression, least absolute deviations uses bootstrapping
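The resampling schemes listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program's implementation: the function names are hypothetical, panel groups are assumed to be drawn whole and with replacement, and the jackknife standard error uses the usual leave-one-out formula.

```python
import math
import random

def draw_with_replacement(sample, n, rng):
    """Draw n observations from the current sample, with replacement."""
    return [rng.choice(sample) for _ in range(n)]

def draw_groups(panel, n_groups, rng):
    """Draw n_groups whole groups (with replacement) from a panel.
    panel maps a group id to that group's list of observations."""
    group_ids = list(panel)
    drawn = [rng.choice(group_ids) for _ in range(n_groups)]
    return [obs for g in drawn for obs in panel[g]]

def jackknife_samples(sample):
    """Yield each 'leave one out' subsample in turn."""
    for i in range(len(sample)):
        yield sample[:i] + sample[i + 1:]

def jackknife_se(sample, statistic):
    """Jackknife standard error of a statistic."""
    n = len(sample)
    reps = [statistic(s) for s in jackknife_samples(sample)]
    mean_rep = sum(reps) / n
    return math.sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in reps))

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2024)
data = [1.0, 2.0, 3.0, 4.0, 5.0]
boot = draw_with_replacement(data, len(data), rng)
panel = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
group_draw = draw_groups(panel, 3, rng)
se = jackknife_se(data, mean)
```

For the sample mean, the jackknife standard error coincides with the familiar s divided by the square root of n, which gives a quick check on the sketch.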

The EXECUTE procedure may be used to bootstrap any estimator in the program, whether it is one of the supported procedures or one created by the user. The command specifies the matrix or scalar to be bootstrapped, which may be anything that the program computes.

Application

A simple application illustrates estimating a standard error for a correlation coefficient. The delta method is sometimes used for this purpose. The following draws 100 bootstrap samples, reports the root mean squared deviation, skewness and kurtosis of the bootstrap estimates of r, centered on the original full-sample estimate, along with their minimum and maximum, and displays a histogram of the estimates.

PROC    $
CALC    ; Cor = Cor(DocVis,HospVis) $
ENDPROC $
EXECUTE ; N = 100 ; Bootstrap = Cor ; Histogram $

Completed   100 bootstrap iterations.
+------------------------------------------+
| Results of bootstrap estimation of model.|
| Model has been reestimated   100 times.  |
| Statistics shown below are centered      |
| around the  original estimate  based on  |
| the original full sample of observations.|
| Result is COR      =       .11751        |
| bootstrap samples have15000 observations.|
| Estimate  RtMnSqDev  Skewness   Kurtosis |
|     .118       .013     -.410      2.891 |
| Minimum =      .086  Maximum =      .148 |
+------------------------------------------+
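The computation summarized in the output above can be sketched in Python to show what is calculated at each repetition. This is an illustration under stated assumptions, not the program's implementation: the data are synthetic (they are not the DocVis and HospVis variables), the function names are hypothetical, and the root mean squared deviation is taken around the full-sample estimate, following the note in the output box.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_correlation(x, y, reps, rng):
    """Recompute r on reps bootstrap samples drawn with replacement.
    Returns the full-sample r, the bootstrap estimates, and their root
    mean squared deviation around the full-sample r."""
    n = len(x)
    original = pearson(x, y)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs jointly
        draws.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    rmsd = math.sqrt(sum((r - original) ** 2 for r in draws) / reps)
    return original, draws, rmsd

# Synthetic data for illustration only.
rng = random.Random(12345)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.3 * a + rng.gauss(0.0, 1.0) for a in x]

original, draws, rmsd = bootstrap_correlation(x, y, 100, rng)
print(f"r = {original:.3f}  RtMnSqDev = {rmsd:.3f}")
print(f"Minimum = {min(draws):.3f}  Maximum = {max(draws):.3f}")
```

Observations are resampled as pairs so that each bootstrap sample preserves the joint distribution of the two variables; resampling x and y separately would destroy the correlation being estimated.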

The bootstrapped quantity in the procedure can be anything that the program computes using any instruction, model estimator, or other procedure.