# Data Management and Analysis: Sampling and Bootstrapping

These features define the specific observations used by the estimation procedures and by other computations that access the data. The first group defines the ‘current sample.’ The second group draws subsamples from the current sample, either as a random subset of it (the bootstrap) or with the jackknife ‘leave one out’ procedure.

## Sampling

- Include or exclude observations based on algebraic conditions
- Period - specify the time interval to include in the sample
- Sample - specify particular observations or ranges of observations
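The three selection rules above can be combined into a ‘current sample’. The Python sketch below shows one way to do this. It assumes the data are held as a list of dicts and the current sample as a sorted list of row indices. The functions `sample_ranges`, `period`, `include` and `reject` and the variables `year` and `age` are hypothetical illustrations, not the program's own commands.

```python
from typing import Callable

# Hypothetical data layout: one dict per observation.
Row = dict[str, float]
Cond = Callable[[Row], bool]


def sample_ranges(n: int, ranges: list[tuple[int, int]]) -> list[int]:
    """Select observations by 1-based inclusive ranges, e.g. [(1, 3), (7, 8)]."""
    chosen: set[int] = set()
    for lo, hi in ranges:
        chosen.update(i - 1 for i in range(lo, hi + 1) if 1 <= i <= n)
    return sorted(chosen)


def period(data: list[Row], first: int, last: int, time_var: str = "year") -> list[int]:
    """Select observations whose time variable lies in [first, last]."""
    return [i for i, row in enumerate(data) if first <= row[time_var] <= last]


def include(data: list[Row], current: list[int], cond: Cond) -> list[int]:
    """Add observations that satisfy an algebraic condition to the current sample."""
    return sorted(set(current) | {i for i, row in enumerate(data) if cond(row)})


def reject(data: list[Row], current: list[int], cond: Cond) -> list[int]:
    """Drop observations that satisfy an algebraic condition from the current sample."""
    return [i for i in current if not cond(data[i])]


# Illustrative data: 10 observations spread over the years 1984-1988.
data = [{"year": 1984 + i % 5, "age": 25 + i} for i in range(10)]
current = period(data, 1985, 1986)                          # row indices 1, 2, 6, 7
current = reject(data, current, lambda r: r["age"] > 30)     # row indices 1, 2
current = include(data, current, lambda r: r["age"] == 34)   # adds row index 9
```

Each rule returns a new index list, so the rules can be applied in any order. Every estimator then works only with the rows in `current`.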

## Bootstrap and Jackknife

- Draw a specified number of observations from the current sample
- Draw with or without replacement
- Draw a specified number of groups from a panel data set (block bootstrap)
- Execute a procedure a specified number of times, drawing a new bootstrap sample with each repetition
- Execute using the jackknife procedure for sampling
- Regression, least absolute deviations and quantile Poisson use bootstrapping
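The kinds of draws listed above can be sketched in Python under two assumptions: the current sample is indexed 0 to n-1, and panel membership is given as one group id per row. `draw` and `block_draw` are hypothetical helper names, not the program's syntax. In the block bootstrap, whole groups are resampled with replacement, so every drawn group keeps all of its rows together.

```python
import random
from collections import defaultdict
from collections.abc import Hashable, Sequence
from typing import Optional


def draw(n_obs: int, m: int, replace: bool = True,
         rng: Optional[random.Random] = None) -> list[int]:
    """Draw m row indices from a current sample of n_obs rows."""
    rng = rng if rng is not None else random.Random()
    if replace:
        return [rng.randrange(n_obs) for _ in range(m)]
    return rng.sample(range(n_obs), m)  # without replacement; requires m <= n_obs


def block_draw(groups: Sequence[Hashable], n_groups: int,
               rng: Optional[random.Random] = None) -> list[int]:
    """Block bootstrap: draw n_groups panel groups with replacement and
    return the row indices of every observation in each drawn group."""
    rng = rng if rng is not None else random.Random()
    members: dict = defaultdict(list)
    for row, g in enumerate(groups):
        members[g].append(row)
    ids = list(members)
    picked = [rng.choice(ids) for _ in range(n_groups)]
    return [row for g in picked for row in members[g]]


rng = random.Random(42)
print(draw(10, 5, replace=False, rng=rng))
firms = ["a", "a", "b", "b", "b", "c"]  # hypothetical panel id for each row
print(block_draw(firms, n_groups=2, rng=rng))
```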

The EXECUTE command can bootstrap any estimator in the program, whether one of the supported procedures or one created by the user. The command names the matrix or scalar to be bootstrapped, which may be any quantity the program computes.
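The idea behind a generic repeat-and-resample command can be sketched in Python. This assumes the bootstrapped quantity is supplied as a function that maps a list of observations to a scalar. `execute` here is a hypothetical stand-in, not the program's EXECUTE syntax. The `method="jackknife"` branch shows the leave-one-out alternative, which produces exactly one replicate per observation.

```python
import random
from collections.abc import Sequence
from typing import Callable, Optional

Statistic = Callable[[list], float]


def execute(data: Sequence, statistic: Statistic, n_reps: int = 100,
            method: str = "bootstrap",
            rng: Optional[random.Random] = None) -> list[float]:
    """Re-run `statistic` on resampled copies of `data` and return the replicates.

    method="bootstrap": n_reps samples of len(data) rows drawn with replacement.
    method="jackknife": len(data) samples, each leaving one row out (n_reps ignored).
    """
    rows = list(data)
    n = len(rows)
    if method == "bootstrap":
        rng = rng if rng is not None else random.Random()
        return [statistic([rows[rng.randrange(n)] for _ in range(n)])
                for _ in range(n_reps)]
    if method == "jackknife":
        return [statistic(rows[:k] + rows[k + 1:]) for k in range(n)]
    raise ValueError(f"unknown method: {method!r}")


def mean(xs: list) -> float:
    return sum(xs) / len(xs)


values = [2.0, 4.0, 6.0, 8.0]
jack = execute(values, mean, method="jackknife")  # [6.0, 5.333..., 4.666..., 4.0]
boot = execute(values, mean, n_reps=200, rng=random.Random(7))
```

Because `statistic` is an arbitrary function, the same driver works for a built-in estimator or a user-written one. This mirrors the generality the text describes.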

## Application

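A simple application illustrates estimating a standard error for a correlation coefficient. The delta method is sometimes used for this purpose. The bootstrap instead estimates the sampling variability directly from resampled data, with no analytic derivative. The commands below compute the correlation between DocVis and HospVis in each of 100 bootstrap samples. They report the full-sample estimate of r together with the root mean squared deviation, skewness and kurtosis of the bootstrap estimates, and display a histogram of the bootstrap estimates.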

```
PROC $
CALC ; Cor = Cor(DocVis,HospVis) $
ENDPROC $
EXECUTE ; N = 100 ; Bootstrap = Cor ; Histogram $
```

```
Completed 100 bootstrap iterations.
+------------------------------------------+
| Results of bootstrap estimation of model.|
| Model has been reestimated 100 times.    |
| Statistics shown below are centered      |
| around the original estimate based on    |
| the original full sample of observations.|
| Result is COR = .17247                   |
| bootstrap samples have 3377 observations.|
| Estimate   RtMnSqDev  Skewness  Kurtosis |
|     .172        .043      .992     3.043 |
| Minimum = .100             Maximum = .294|
+------------------------------------------+
```
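The output box does not print the formulas it uses. One plausible reading treats "centered around the original estimate" as meaning that deviations are taken from the full-sample value $\hat\theta$ rather than from the mean of the $B$ bootstrap replicates $\hat\theta_b$. That reading is an assumption. Under it, the three statistics would be:

$$
\mathrm{RtMnSqDev}=\sqrt{\frac{1}{B}\sum_{b=1}^{B}\left(\hat\theta_b-\hat\theta\right)^2},\qquad
\mathrm{Skewness}=\frac{\frac{1}{B}\sum_{b=1}^{B}\left(\hat\theta_b-\hat\theta\right)^3}{\mathrm{RtMnSqDev}^3},\qquad
\mathrm{Kurtosis}=\frac{\frac{1}{B}\sum_{b=1}^{B}\left(\hat\theta_b-\hat\theta\right)^4}{\mathrm{RtMnSqDev}^4}
$$

Read this way, .043 serves as the bootstrap standard error of $r = .17247$. A normal-approximation 95% interval would then be $.17247 \pm 1.96 \times .043 \approx [.088, .257]$. The positive skewness (.992), however, suggests the sampling distribution of r is not symmetric here. A percentile interval taken directly from the bootstrap replicates may therefore be preferable.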

The bootstrapped quantity in the procedure can be anything that the program computes using any instruction, model estimator, or other procedure.
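The logic of the correlation example can be reproduced outside the program with the minimal Python sketch below. It resamples (x, y) pairs with replacement. It then summarizes the replicates by root mean squared deviation, skewness and kurtosis around the full-sample r, which is one reading of the "centered around the original estimate" note in the output, not documented behavior. The DocVis and HospVis data are not included here, so the sketch generates a hypothetical stand-in data set, and its numbers will not match the output box.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr(x: list[float], y: list[float], n_reps: int = 100,
                   seed: int = 12345) -> dict[str, float]:
    """Bootstrap r by resampling (x, y) pairs; summarize around the full-sample r."""
    rng = random.Random(seed)
    n = len(x)
    r_hat = pearson(x, y)
    reps = []
    for _ in range(n_reps):
        idx = [rng.randrange(n) for _ in range(n)]  # pairs drawn with replacement
        reps.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    dev = [r - r_hat for r in reps]  # deviations from the full-sample estimate
    rmsd = math.sqrt(sum(d ** 2 for d in dev) / n_reps)
    return {
        "estimate": r_hat,
        "rmsd": rmsd,
        "skewness": sum(d ** 3 for d in dev) / n_reps / rmsd ** 3,
        "kurtosis": sum(d ** 4 for d in dev) / n_reps / rmsd ** 4,
        "min": min(reps),
        "max": max(reps),
    }


# Hypothetical stand-in data: visit counts that are positively related.
gen = random.Random(0)
docvis = [float(gen.randint(0, 10)) for _ in range(500)]
hospvis = [1.0 if gen.random() < 0.05 + 0.02 * d else 0.0 for d in docvis]

summary = bootstrap_corr(docvis, hospvis, n_reps=100)
for name, value in summary.items():
    print(f"{name:>9s} {value: .3f}")
```

Resampling pairs, rather than each variable separately, preserves the dependence between the two variables that r measures.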