Data Description and Graphics: Descriptive Statistics for Cross Sections and Panels
Summary measures
- Means (arithmetic, geometric), standard deviations, minima, maxima
- Medians, sample quantiles (deciles, quartiles)
- Covariances
- Correlations (Pearson, rank)
- Coefficient of concordance for a set of ranks
- Autocorrelations
- Canonical correlations
- Principal components
- Condition number for data matrices
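Several of the measures above have compact definitions that are easy to check by hand. The sketch below uses Python with NumPy, which is our choice and not the program's internals. It covers three of them:

- Spearman's rank correlation, computed as the Pearson correlation of average ranks.
- Kendall's coefficient of concordance W = 12S / (m^2 (n^3 - n)) for m rankings of n items. Here S is the sum of squared deviations of the item rank totals from their mean, and no tie correction is applied.
- A condition number equal to the ratio of the largest to the smallest singular value after each column is scaled to unit length.

That column scaling is one common convention and is an assumption here; the program may scale differently.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend over a run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def kendalls_w(ranks):
    """Coefficient of concordance for an (m rankings x n items) array, no tie correction."""
    r = np.asarray(ranks, dtype=float)
    m, n = r.shape
    totals = r.sum(axis=0)  # rank total for each item
    s = float(((totals - totals.mean()) ** 2).sum())
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def condition_number(X):
    """Largest / smallest singular value of X after scaling its columns to unit length."""
    X = np.asarray(X, dtype=float)
    scaled = X / np.linalg.norm(X, axis=0)
    sv = np.linalg.svd(scaled, compute_uv=False)
    return float(sv.max() / sv.min())

# Hypothetical data: three judges ranking four items
judges = [[1, 2, 3, 4],
          [2, 1, 3, 4],
          [1, 3, 2, 4]]
print(kendalls_w(judges))
print(spearman([10, 20, 20, 30], [1.5, 2.0, 2.5, 9.0]))
print(condition_number([[1.0, 1.0], [1.0, 1.1], [1.0, 0.9]]))
```

W is 1 when every ranking agrees and 0 when the rank totals are all equal. A condition number near 1 indicates nearly orthogonal columns; large values signal near collinearity.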
Normality test
- Skewness, kurtosis
- Normal-quantile plot
- Chi-squared test
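The Chisq values in the FAMINC example below agree with the Jarque-Bera statistic n/6 [S^2 + (K - 3)^2 / 4], up to the rounding of the printed moments. Under normality this statistic is asymptotically chi-squared with 2 degrees of freedom. For KIDS = 1, for example, 79/6 (1.3629^2 + 2.314^2/4) is approximately 42.08. That match is inferred from the numbers, not stated in the output.

The moment-based skewness and kurtosis formulas in the Python sketch below are also an assumption; they use K = 3 for a normal distribution. The sketch also returns the (normal quantile, ordered observation) pairs behind a normal-quantile plot. These use plotting positions (i - 0.5)/n, which is one of several common choices.

```python
import math
from statistics import NormalDist

def skew_kurt(x):
    """Moment-based skewness m3/m2^1.5 and kurtosis m4/m2^2 (about 3 for normal data)."""
    n = len(x)
    mean = math.fsum(x) / n
    m2 = math.fsum((v - mean) ** 2 for v in x) / n
    m3 = math.fsum((v - mean) ** 3 for v in x) / n
    m4 = math.fsum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(n, skew, kurt):
    """Normality statistic, chi-squared with 2 d.f. under the null, from skewness and kurtosis."""
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def chi2_df2_pvalue(stat):
    """Upper-tail probability of a chi-squared[2] variable, which is exp(-stat / 2)."""
    return math.exp(-stat / 2.0)

def normal_quantile_pairs(x):
    """(theoretical normal quantile, sorted observation) pairs for a normal-quantile plot."""
    xs = sorted(x)
    n = len(xs)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), v) for i, v in enumerate(xs, start=1)]

# Rebuild the KIDS = 1 Chisq value from its reported skewness and kurtosis
stat = jarque_bera(79, 1.3629, 5.3140)
print(round(stat, 2), round(chi2_df2_pvalue(stat), 2))  # 42.08 0.0
```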
Example
The following output describes Mroz’s (1987) labor supply data. FAMINC, family income, is stratified by KIDS, which indicates whether there are children in the household (1 = no, 2 = yes).
All results based on nonmissing observations.
Stratification is based on KIDS
=========================================================================
Variable           Mean     Std.Dev.      Minimum      Maximum  Cases
=========================================================================
Stratum is KIDS = 1.000.  Obs.=   79.000, Sum of wts. =   79.000
FAMINC       21.2270000   11.1699737   3.30500000   63.2000000     79
--------Skewness=   1.3629  Kurtosis=   5.3140  Chisq=    42.08  Prob=  .00
Stratum is KIDS = 2.000.  Obs.=  171.000, Sum of wts. =  171.000
FAMINC       23.9105380   13.6056289   4.00000000   91.0440000    171
--------Skewness=   2.2943  Kurtosis=  10.7256  Chisq=   575.28  Prob=  .00
All observations in current sample
FAMINC       23.0625400   12.9239815   3.30500000   91.0440000    250
--------Skewness=   2.1517  Kurtosis=  10.2941  Chisq=   747.12  Prob=  .00
Order Statistics for Variables
Percentiles for FAMINC
Min.    3.3050   10th    10.195   20th    13.783
25th    15.500   30th    16.235   40th    18.900
Med.    21.136   60th    23.225   70th    26.020
75th    27.925   80th    29.100   90th    36.175
Max.    91.044
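The output does not say which rule produced these percentiles, and packages differ, particularly in small samples. As a hedged illustration, the Python sketch below shows two common rules:

- Linear interpolation between order statistics at 0-based position (n - 1)p/100, which is NumPy's default.
- The nearest-rank rule.

With an even number of observations, the interpolation rule gives the familiar median, the average of the two middle values. The sample is hypothetical, not the FAMINC data.

```python
import math

def percentile_linear(x, p):
    """p-th percentile (0 <= p <= 100): linear interpolation between order statistics
    at 0-based position (n - 1) * p / 100 (NumPy's default 'linear' rule)."""
    xs = sorted(x)
    h = (len(xs) - 1) * p / 100.0
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def percentile_nearest_rank(x, p):
    """Smallest observation with at least p percent of the sample at or below it."""
    xs = sorted(x)
    k = max(1, math.ceil(p * len(xs) / 100.0))
    return xs[k - 1]

# Hypothetical sample of ten incomes
sample = [3.3, 8.0, 12.5, 15.0, 16.2, 19.0, 21.1, 24.0, 29.1, 40.0]
for p in (10, 25, 50, 75, 90):
    print(p, round(percentile_linear(sample, p), 3), percentile_nearest_rank(sample, p))
```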
Data arrangements
- Stratified data
- Weights
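Stratified and weighted summaries compute the same statistics within each group and for the pooled sample. The Python sketch below treats the weights as frequency weights, so the variance divisor is the sum of weights minus one; that convention is an assumption. With unit weights it reduces to the ordinary n - 1 standard deviation, and the sum of weights equals the number of cases. This matches the "Obs." and "Sum of wts." lines in the KIDS example. The data are made up for illustration.

```python
import math
from collections import defaultdict

def summarize(pairs):
    """Weighted summary of (value, weight) pairs.
    Assumption: frequency weights, so the variance divisor is sum(w) - 1."""
    sw = math.fsum(w for _, w in pairs)
    mean = math.fsum(w * v for v, w in pairs) / sw
    ss = math.fsum(w * (v - mean) ** 2 for v, w in pairs)
    sd = math.sqrt(ss / (sw - 1.0)) if sw > 1.0 else float("nan")
    vals = [v for v, _ in pairs]
    return {"mean": mean, "sd": sd, "min": min(vals), "max": max(vals),
            "cases": len(pairs), "sum_wts": sw}

def stratified_summary(values, strata, weights=None):
    """Summaries within each stratum plus one for the pooled sample."""
    if weights is None:
        weights = [1.0] * len(values)
    groups = defaultdict(list)
    for v, s, w in zip(values, strata, weights):
        groups[s].append((v, w))
    by_stratum = {s: summarize(p) for s, p in sorted(groups.items())}
    pooled = summarize(list(zip(values, weights)))
    return by_stratum, pooled

# Hypothetical data: income split by a 1/2 indicator in the style of KIDS
income = [12.0, 18.5, 25.0, 9.5, 31.0, 22.0]
kids = [1, 1, 1, 2, 2, 2]
by_kids, everyone = stratified_summary(income, kids)
for k, stats in by_kids.items():
    print(k, stats)
print("All", everyone)
```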
Analysis of variance
- Balanced and unbalanced panels
Panel and stratified data
- Analysis of variance
- F test for effects
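For panel or stratified data, the F test for effects compares variation between groups (panel units or strata) with variation within them. The document does not give the program's exact model. The Python sketch below therefore assumes the standard one-way decomposition:

- SS_between = sum_g n_g (ybar_g - ybar)^2
- SS_within = sum_g sum_i (y_gi - ybar_g)^2
- F = [SS_between/(G - 1)] / [SS_within/(n - G)]

Weighting each group mean by its size n_g lets the same formulas handle unbalanced panels. The p-value comes from the F(G - 1, n - G) distribution. If SciPy is available, for example, it is scipy.stats.f.sf(F, G - 1, n - G).

```python
import math
from collections import defaultdict

def anova_f_test(values, groups):
    """One-way analysis of variance for group (e.g. panel-unit) effects.
    Each group mean is weighted by its size, so unbalanced groups are handled."""
    data = defaultdict(list)
    for v, g in zip(values, groups):
        data[g].append(v)
    n, k = len(values), len(data)
    grand_mean = math.fsum(values) / n
    means = {g: math.fsum(xs) / len(xs) for g, xs in data.items()}
    ss_between = math.fsum(len(xs) * (means[g] - grand_mean) ** 2 for g, xs in data.items())
    ss_within = math.fsum((v - means[g]) ** 2 for g, xs in data.items() for v in xs)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df": (df_between, df_within), "F": f_stat}

# Hypothetical unbalanced panel: unit "a" observed three times, unit "b" twice
result = anova_f_test([1.0, 2.0, 3.0, 5.0, 7.0], ["a", "a", "a", "b", "b"])
print(result)
```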
Crosstabs
- By row, column, total proportions
- Independence test
- Individual or frequency data
- Crosstabs saved as matrices
The following is Agresti's example of cross-classified data on political ideology and party affiliation (Analysis of Categorical Data).
+-----------------------------------------------------------------+
|Cross Tabulation                                                 |
|Row variable is NIJ (Out of range 0-49: 0)                       |
|Number of Rows = 3 (NIJ = 0 to 2)                                |
|Col variable is NIJ (Out of range 0-49: 0)                       |
|Number of Cols = 3 (NIJ = 0 to 2)                                |
|Chi-squared independence tests:                                  |
|Chi-squared[ 4] = 102.04903 Prob value = .00000                  |
|G-squared  [ 4] = 105.66216 Prob value = .00000                  |
+-----------------------------------------------------------------+
|Joint Frequencies for Row Variable NIJ Column Variable NIJ       |
+--------+--------+-----------------------------------------------+
|NIJ     |   Total|  DEMOCR  INDPND  REPUBL                       |
+--------+--------+-----------------------------------------------+
|LIBERAL |     399|     143     156     100                       |
|MODERATE|     470|     119     210     141                       |
|CONSRVTV|     214|      15      72     127                       |
+--------+--------+-----------------------------------------------+
|   Total|    1083|     277     438     368                       |
+--------+--------+-----------------------------------------------+
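The two statistics in the table are standard independence tests. The Pearson chi-squared is sum (O - E)^2 / E. The likelihood-ratio G-squared is 2 sum O ln(O/E). In both, E = (row total)(column total)/N is the expected count under independence, and the degrees of freedom are (rows - 1)(columns - 1) = 4 here. The Python sketch below is ours, not the program's code. It recomputes both statistics from the joint frequencies and should reproduce 102.04903 and 105.66216 up to rounding. It also computes the row, column, and total proportions listed above.

```python
import math

# Joint frequencies from the table above
# rows: liberal, moderate, conservative; columns: Democrat, Independent, Republican
counts = [[143, 156, 100],
          [119, 210, 141],
          [15, 72, 127]]

def independence_tests(table):
    """Pearson chi-squared and likelihood-ratio G-squared tests of row/column independence."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = g2 = 0.0
    for i, r in enumerate(row_tot):
        for j, c in enumerate(col_tot):
            observed = table[i][j]
            expected = r * c / n  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G-squared
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return chi2, g2, df

def proportions(table, by="total"):
    """Cell counts divided by row totals, column totals, or the grand total."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    if by == "row":
        return [[x / row_tot[i] for x in r] for i, r in enumerate(table)]
    if by == "column":
        return [[x / col_tot[j] for j, x in enumerate(r)] for r in table]
    return [[x / n for x in r] for r in table]

chi2, g2, df = independence_tests(counts)
print(f"Chi-squared[{df}] = {chi2:.5f}   G-squared[{df}] = {g2:.5f}")
print([round(p, 3) for p in proportions(counts, by="row")[2]])
```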
Embedded results - transfer to other programs
Program results such as descriptive statistics are displayed in embedded windows whose contents can be transferred to other programs.
Accurate computations
The following is one of the test examples in the National Institute of Standards and Technology (NIST) Statistical Reference Datasets, used to benchmark the accuracy of descriptive statistics computations.
Dataset Name: Maryland Pick-3 Lottery
Description: This is an observed/"real world" data set
consisting of 218 Maryland Pick-3 Lottery values
from September 3, 1989 to April 14, 1990 (32 weeks).
One 3-digit random number (from 000 to 999)
is drawn per day, 7 days per week for most
weeks, but fewer days per week for some weeks.
We here use this data to test accuracy
in summary statistics calculations.
Stat Category: Univariate: Summary Statistics
Reference: None
Data: "Real World"
1 Response : y = 3-digit random number
0 Predictors
218 Observations
Model: Lower Level of Difficulty
2 Parameters : mu, sigma
1 Response Variable : y
0 Predictor Variables
y = mu + e
Certified Values
Sample Mean ybar: 518.958715596330
Sample Standard Deviation (denom. = n-1) s: 291.699727470969
Sample Autocorrelation Coefficient (lag 1) r(1): -0.120948622967393
Number of Observations: 218
Data: Y
READ ; Nobs = 218 ; Nvar = 1 ; Names = y ; ByVariable $
162 671 933 414 788 730 817 33 536 875 670 236 473 167 877 980 316 950
456 92 517 557 956 954 104 178 794 278 147 773 437 435 502 610 582 780
689 562 964 791 28 97 848 281 858 538 660 972 671 613 867 448 738 966
139 636 847 659 754 243 122 455 195 968 793 59 730 361 574 522 97 762
431 158 429 414 22 629 788 999 187 215 810 782 47 34 108 986 25 644
829 630 315 567 919 331 207 412 242 607 668 944 749 168 864 442 533 805
372 63 458 777 416 340 436 140 919 350 510 572 905 900 85 389 473 758
444 169 625 692 140 897 672 288 312 860 724 226 884 508 976 741 476 417
831 15 318 432 241 114 799 955 833 358 935 146 630 830 440 642 356 373
271 715 367 393 190 669 8 861 108 795 269 590 326 866 64 523 862 840
219 382 998 4 628 305 747 247 34 747 729 645 856 974 24 568 24 694
608 480 410 729 947 293 53 930 223 203 677 227 62 455 387 318 562 242
428 968
DSTAT ; Rhs = y ; AR1 $
Descriptive Statistics
All results based on nonmissing observations.
=====================================================================
Variable           Mean     Std.Dev.      Minimum      Maximum  Cases
=====================================================================
Y            518.958716   291.699727   4.00000000   999.000000    218
Autocorrelation -.120948623
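The certified values can also be recomputed independently. The Python sketch below uses compensated summation (math.fsum) and a two-pass formula, forming deviations from the mean first. This is the usual way to avoid the cancellation that affects the one-pass shortcut (sum of squares minus n times the squared mean). It assumes the lag-1 autocorrelation is defined as r(1) = sum_{i=2..n} (y_i - ybar)(y_{i-1} - ybar) / sum_{i=1..n} (y_i - ybar)^2.

```python
import math

# The 218 Maryland Pick-3 values from the READ command above
y = [162, 671, 933, 414, 788, 730, 817, 33, 536, 875, 670, 236, 473, 167, 877, 980, 316, 950,
     456, 92, 517, 557, 956, 954, 104, 178, 794, 278, 147, 773, 437, 435, 502, 610, 582, 780,
     689, 562, 964, 791, 28, 97, 848, 281, 858, 538, 660, 972, 671, 613, 867, 448, 738, 966,
     139, 636, 847, 659, 754, 243, 122, 455, 195, 968, 793, 59, 730, 361, 574, 522, 97, 762,
     431, 158, 429, 414, 22, 629, 788, 999, 187, 215, 810, 782, 47, 34, 108, 986, 25, 644,
     829, 630, 315, 567, 919, 331, 207, 412, 242, 607, 668, 944, 749, 168, 864, 442, 533, 805,
     372, 63, 458, 777, 416, 340, 436, 140, 919, 350, 510, 572, 905, 900, 85, 389, 473, 758,
     444, 169, 625, 692, 140, 897, 672, 288, 312, 860, 724, 226, 884, 508, 976, 741, 476, 417,
     831, 15, 318, 432, 241, 114, 799, 955, 833, 358, 935, 146, 630, 830, 440, 642, 356, 373,
     271, 715, 367, 393, 190, 669, 8, 861, 108, 795, 269, 590, 326, 866, 64, 523, 862, 840,
     219, 382, 998, 4, 628, 305, 747, 247, 34, 747, 729, 645, 856, 974, 24, 568, 24, 694,
     608, 480, 410, 729, 947, 293, 53, 930, 223, 203, 677, 227, 62, 455, 387, 318, 562, 242,
     428, 968]

n = len(y)
mean = math.fsum(y) / n
dev = [v - mean for v in y]                      # first pass: deviations from the mean
ss = math.fsum(d * d for d in dev)
sd = math.sqrt(ss / (n - 1))                     # denominator n - 1, as in the certified value
r1 = math.fsum(dev[i] * dev[i - 1] for i in range(1, n)) / ss  # lag-1 autocorrelation

print(f"n = {n}  mean = {mean:.12f}  s = {sd:.12f}  r(1) = {r1:.12f}")
```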