Data Description and Graphics: Descriptive Statistics for Cross Sections and Panels

Summary measures

  • Means (arithmetic, geometric), standard deviations, minima, maxima
  • Medians, sample quantiles (deciles, quartiles)
  • Covariances
  • Correlations (Pearson, rank)
  • Coefficient of concordance for a set of ranks
  • Autocorrelations
  • Canonical correlations
  • Principal components
  • Condition number for data matrices

Normality tests

  • Skewness, kurtosis
  • Normal-quantile plot
  • Chi-squared test

Example

The following output describes family income (FAMINC) in Mroz’s (1987) labor supply data. The sample is stratified by KIDS, which indicates whether there are children in the household (1 = no, 2 = yes). Results are reported for each stratum and then for the full sample.

All results based on nonmissing observations.
Stratification is based on KIDS
=========================================================================
Variable        Mean       Std.Dev.      Minimum       Maximum    Cases
=========================================================================
Stratum is KIDS     =    1.000.  Obs.=    79.000, Sum of wts. =  79.000
FAMINC    21.2270000    11.1699737    3.30500000    63.2000000       79
--------Skewness=  1.3629   Kurtosis=  5.3140  Chisq=   42.08 Prob= .00
Stratum is KIDS     =    2.000.  Obs.=   171.000, Sum of wts. = 171.000
FAMINC    23.9105380    13.6056289    4.00000000    91.0440000      171
--------Skewness=  2.2943   Kurtosis= 10.7256  Chisq=  575.28 Prob= .00
All observations in current sample
FAMINC    23.0625400    12.9239815    3.30500000    91.0440000      250
--------Skewness=  2.1517   Kurtosis= 10.2941  Chisq=  747.12 Prob= .00

              Order Statistics for Variables
Percentiles for FAMINC
Min.        3.3050    10th       10.195    20th       13.783
25th        15.500    30th       16.235    40th       18.900
Med.        21.136    60th       23.225    70th       26.020
75th        27.925    80th       29.100    90th       36.175
Max.        91.044
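
The Chisq values printed under each FAMINC line can be reproduced from the reported skewness and kurtosis. The sketch below assumes the statistic is the Bowman-Shenton (Jarque-Bera) form, n[S^2/6 + (K-3)^2/24]. That formula is an inference from the printed numbers, not something stated in the output, and the function name is hypothetical.

```python
def normality_chisq(n, skewness, kurtosis):
    """Assumed form of the test: n * [S^2/6 + (K - 3)^2/24]."""
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)


# (n, skewness, kurtosis, Chisq) copied from the output above
reported = {
    "KIDS = 1": (79, 1.3629, 5.3140, 42.08),
    "KIDS = 2": (171, 2.2943, 10.7256, 575.28),
    "All obs.": (250, 2.1517, 10.2941, 747.12),
}

for label, (n, s, k, printed) in reported.items():
    computed = normality_chisq(n, s, k)
    print(f"{label:9s} computed = {computed:8.2f}   printed = {printed:8.2f}")
```

Under this assumption the statistic has two degrees of freedom, so the 5% critical value is about 5.99. All three values are far above it, which is consistent with the printed Prob = .00.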

Data arrangements

  • Stratified data
  • Weights

Analysis of variance

  • Balanced and unbalanced panels

Panel and stratified data

  • Analysis of variance
  • F test for effects
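
As an illustration of the analysis of variance for stratified data, the stratum results for FAMINC shown earlier are enough to rebuild the full-sample mean and standard deviation and to form an F statistic for the KIDS effect. This is a minimal sketch of the usual one-way decomposition, not the program's own code; the variable names are illustrative.

```python
import math

# (n, mean, std. dev.) for each KIDS stratum, copied from the FAMINC output
strata = [(79, 21.2270000, 11.1699737), (171, 23.9105380, 13.6056289)]

n_total = sum(n for n, _, _ in strata)
grand_mean = sum(n * m for n, m, _ in strata) / n_total

# Within-strata and between-strata sums of squares
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in strata)
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in strata)

# Total variation should reproduce the full-sample standard deviation
overall_sd = math.sqrt((ss_within + ss_between) / (n_total - 1))

# F statistic for stratum effects with (J - 1, N - J) degrees of freedom
j = len(strata)
f_stat = (ss_between / (j - 1)) / (ss_within / (n_total - j))

print(f"mean = {grand_mean:.7f}, std. dev. = {overall_sd:.7f}")
print(f"F[{j - 1}, {n_total - j}] = {f_stat:.4f}")
```

The recovered mean and standard deviation can be compared with the "All observations in current sample" line of the output.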

Crosstabs

  • By row, column, total proportions
  • Independence test
  • Individual or frequency data
  • Crosstabs saved as matrices

This is Agresti's example of cross-classified data on political ideology and party affiliation (Analysis of Categorical Data).

+-----------------------------------------------------------------+
|Cross Tabulation                                                 |
|Row variable is NIJ      (Out of range 0-49:      0)             |
|Number of Rows =  3      (NIJ      =  0 to  2)                   |
|Col variable is NIJ      (Out of range 0-49:      0)             |
|Number of Cols =  3      (NIJ      =  0 to  2)                   |
|Chi-squared independence tests:                                  |
|Chi-squared[   4] =  102.04903   Prob value =  .00000            |
|G-squared  [   4] =  105.66216   Prob value =  .00000            |
+-----------------------------------------------------------------+
|Joint Frequencies for Row Variable NIJ       Column Variable NIJ |
+--------+--------+-----------------------------------------------+
|NIJ     | Total  |DEMOCR INDPND REBUBL                           |
+--------+--------+-----------------------------------------------+
|LIBERAL |   399  |   143    156    100                           |
|MODERATE|   470  |   119    210    141                           |
|CONSRVTV|   214  |    15     72    127                           |
+--------+--------+-----------------------------------------------+
|   Total|  1083  |   277    438    368                           |
+--------+--------+-----------------------------------------------+
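
The two independence tests in the header can be checked directly from the joint frequencies. The sketch below computes the Pearson chi-squared and the likelihood-ratio G-squared statistics using expected counts under independence (row total times column total over the grand total). It is an illustration with hypothetical variable names, not the program's implementation.

```python
import math

# Rows: LIBERAL, MODERATE, CONSRVTV; columns in the order printed in the table
counts = [
    [143, 156, 100],
    [119, 210, 141],
    [15, 72, 127],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
total = sum(row_totals)

pearson = 0.0
g_squared = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        pearson += (observed - expected) ** 2 / expected
        # The log term needs observed > 0, which holds for every cell here
        g_squared += 2.0 * observed * math.log(observed / expected)

dof = (len(counts) - 1) * (len(counts[0]) - 1)

# Row proportions, one of the "by row, column, total" views
row_props = [[c / rt for c in row] for row, rt in zip(counts, row_totals)]

print(f"Chi-squared[{dof}] = {pearson:.5f}")
print(f"G-squared  [{dof}] = {g_squared:.5f}")
```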

Embedded results - transfer to other programs

Program results, such as descriptive statistics, are displayed in embedded windows whose contents can be transferred to other programs.

Accurate computations

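This is one of the National Institute of Standards and Technology (NIST) test examples for benchmarking the accuracy of descriptive statistics computations. The NIST description and certified values are shown first, followed by the commands and the program output. The computed mean, standard deviation, and lag-1 autocorrelation agree with the certified values to every digit displayed.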

Dataset Name:  Maryland Pick-3 Lottery
Description:   This is an observed/"real world" data set
               consisting of 218 Maryland Pick-3 Lottery values
               from September 3, 1989 to April 14, 1990 (32 weeks).
               One 3-digit random number (from 000 to 999)
               is drawn per day, 7 days per week for most
               weeks, but fewer days per week for some weeks.
               We here use this data to test accuracy
               in summary statistics calculations.
Stat Category: Univariate: Summary Statistics
Reference:     None
Data:          "Real World"
               1    Response          : y = 3-digit random number
               0    Predictors
               218  Observations
Model:         Lower Level of Difficulty
               2    Parameters        : mu, sigma
               1    Response Variable : y
               0    Predictor Variables
               y    = mu + e
                                                  Certified Values
Sample Mean                                ybar:  518.958715596330
Sample Standard Deviation (denom. = n-1)      s:  291.699727470969
Sample Autocorrelation Coefficient (lag 1) r(1):  -0.120948622967393
Number of Observations:                             218
Data: Y

READ ; Nobs = 218 ; Nvar = 1 ; Names = y ; ByVariable $
162 671 933 414 788 730 817  33 536 875 670 236 473 167 877 980 316 950
456  92 517 557 956 954 104 178 794 278 147 773 437 435 502 610 582 780
689 562 964 791  28  97 848 281 858 538 660 972 671 613 867 448 738 966
139 636 847 659 754 243 122 455 195 968 793  59 730 361 574 522  97 762
431 158 429 414  22 629 788 999 187 215 810 782  47  34 108 986  25 644 
829 630 315 567 919 331 207 412 242 607 668 944 749 168 864 442 533 805 
372  63 458 777 416 340 436 140 919 350 510 572 905 900  85 389 473 758 
444 169 625 692 140 897 672 288 312 860 724 226 884 508 976 741 476 417 
831  15 318 432 241 114 799 955 833 358 935 146 630 830 440 642 356 373 
271 715 367 393 190 669   8 861 108 795 269 590 326 866  64 523 862 840 
219 382 998   4 628 305 747 247  34 747 729 645 856 974  24 568  24 694
608 480 410 729 947 293  53 930 223 203 677 227  62 455 387 318 562 242 
428 968

DSTAT ; Rhs = y ; AR1 $
Descriptive Statistics
All results based on nonmissing observations.
=====================================================================
Variable      Mean       Std.Dev.      Minimum       Maximum    Cases
=====================================================================
Y       518.958716    291.699727    4.00000000    999.000000      218
Autocorrelation -.120948623
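
For readers who want to repeat the benchmark outside the program, the following sketch recomputes the three certified quantities from the data listed above. It assumes the standard definitions: an n-1 denominator for the standard deviation, and a lag-1 autocorrelation equal to the sum of lagged cross-products of deviations divided by the total sum of squared deviations. The two-pass approach and the use of `math.fsum` are choices made for this example, not a description of the program's internals.

```python
import math

# The 218 Maryland Pick-3 values, in the order listed above
y = [
    162, 671, 933, 414, 788, 730, 817, 33, 536, 875, 670, 236, 473, 167, 877, 980, 316, 950,
    456, 92, 517, 557, 956, 954, 104, 178, 794, 278, 147, 773, 437, 435, 502, 610, 582, 780,
    689, 562, 964, 791, 28, 97, 848, 281, 858, 538, 660, 972, 671, 613, 867, 448, 738, 966,
    139, 636, 847, 659, 754, 243, 122, 455, 195, 968, 793, 59, 730, 361, 574, 522, 97, 762,
    431, 158, 429, 414, 22, 629, 788, 999, 187, 215, 810, 782, 47, 34, 108, 986, 25, 644,
    829, 630, 315, 567, 919, 331, 207, 412, 242, 607, 668, 944, 749, 168, 864, 442, 533, 805,
    372, 63, 458, 777, 416, 340, 436, 140, 919, 350, 510, 572, 905, 900, 85, 389, 473, 758,
    444, 169, 625, 692, 140, 897, 672, 288, 312, 860, 724, 226, 884, 508, 976, 741, 476, 417,
    831, 15, 318, 432, 241, 114, 799, 955, 833, 358, 935, 146, 630, 830, 440, 642, 356, 373,
    271, 715, 367, 393, 190, 669, 8, 861, 108, 795, 269, 590, 326, 866, 64, 523, 862, 840,
    219, 382, 998, 4, 628, 305, 747, 247, 34, 747, 729, 645, 856, 974, 24, 568, 24, 694,
    608, 480, 410, 729, 947, 293, 53, 930, 223, 203, 677, 227, 62, 455, 387, 318, 562, 242,
    428, 968,
]

n = len(y)
mean = math.fsum(y) / n

# Two-pass computation: center the data first, then accumulate squares
dev = [v - mean for v in y]
ss = math.fsum(d * d for d in dev)
std = math.sqrt(ss / (n - 1))

# Lag-1 autocorrelation: lagged cross-products over the total sum of squares
r1 = math.fsum(dev[i] * dev[i - 1] for i in range(1, n)) / ss

print(f"n    = {n}")
print(f"ybar = {mean:.12f}")
print(f"s    = {std:.12f}")
print(f"r(1) = {r1:.15f}")
```

Centering before squaring avoids the cancellation error that the one-pass "sum of squares minus n times the squared mean" formula can suffer, which is the kind of inaccuracy these benchmarks are designed to expose.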