Accuracy: Linear Regression Computation

LIMDEP and NLOGIT's linear regression computations are extremely accurate. The 'Filippelli problem' is the most difficult of the NIST benchmark problems, and most programs cannot complete the computation at all. One assessment of another widely used package reads as follows:

'Filippelli test: XXXXX found the variables so collinear that it dropped two of them - that is, it set two coefficients and standard errors to zero. The resulting estimates still fit the data well. Most other statistical software packages have done the same thing and most authors have interpreted this result as acceptable for this test.'

We don't find this acceptable. First, the problem is solvable: LIMDEP and NLOGIT's solution below uses only the program defaults, just the basic regression instruction. Second, LIMDEP and NLOGIT would not, on their own, drop variables from any regression and leave behind some arbitrarily chosen set that provides a 'good fit.' If the regression can't be computed within the (very high) tolerance of the program, we just tell you so. For this problem, LIMDEP and NLOGIT compute the full regression and issue a warning that the data are badly conditioned. What you do next is up to you, not the program.
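
To give a rough sense of why a program might decide the regressors are 'too collinear,' the sketch below measures the correlation between adjacent powers of a single predictor, which is the structure of the degree-10 polynomial model in this benchmark. The predictor grid here is a hypothetical stand-in (82 evenly spaced points on an assumed interval), not the NIST data, and the helper name `pearson` is illustrative only.

```python
import math

def pearson(u, v):
    """Sample correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

# ASSUMPTION: a synthetic predictor, 82 evenly spaced points on [-9, -3].
# This is not the NIST data; it only mimics a one-signed, narrow-range x.
x = [-9.0 + 6.0 * i / 81 for i in range(82)]

# Regressor columns x**1, x**2, ..., x**10 of the polynomial model.
powers = {k: [xi ** k for xi in x] for k in range(1, 11)}

# Correlation between each pair of adjacent powers.
adjacent = {k: pearson(powers[k], powers[k + 1]) for k in range(1, 10)}

for k, r in adjacent.items():
    print(f"corr(x^{k}, x^{k + 1}) = {r:+.6f}")
```

On a grid like this every adjacent pair of columns is almost perfectly (negatively) correlated, which is what makes the cross-product matrix so hard to invert accurately.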

The NIST data and solution file

Dataset Name:  Filippelli (NIST-Filippelli.dat)
Procedure:     Linear Least Squares Regression
Reference:     Filippelli, A., NIST.
Data:          1 Response Variable (y)
               1 Predictor Variable (x)
               82 Observations
               Higher Level of Difficulty
Model:         Polynomial Class
               11 Parameters (B0,B1,...,B10)
               y = B0+B1*x+B2*(x**2) +...+B9*(x**9)+B10*(x**10)+e

Certified Regression Statistics
                                            Standard Deviation
     Parameter         Estimate                of Estimate
        B0        -1467.48961422980         298.084530995537
        B1        -2772.17959193342         559.779865474950
        B2        -2316.37108160893         466.477572127796
        B3        -1127.97394098372         227.204274477751
        B4        -354.478233703349         71.6478660875927
        B5        -75.1242017393757         15.2897178747400
        B6        -10.8753180355343         2.23691159816033
        B7        -1.06221498588947         0.221624321934227
        B8        -0.670191154593408E-01    0.142363763154724E-01
        B9        -0.246781078275479E-02    0.535617408889821E-03
        B10       -0.402962525080404E-04    0.896632837373868E-05
     Residual Standard Deviation   0.334801051324544E-02
     R-Squared                     0.996727416185620
     F Statistic                   2162.43954511489

Certified Analysis of Variance Table
Source of    Degrees of    Sums of                  Mean
Variation    Freedom       Squares                  Squares
Regression   10            0.242391619837339        0.242391619837339E-01
Residual     71            0.795851382172941E-03    0.112091743968020E-04
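
The certified summary statistics follow directly from the analysis of variance table, which makes a convenient consistency check on the numbers above. The short sketch below recomputes R-squared, the F statistic and the residual standard deviation from the two sums of squares and their degrees of freedom. The variable names are illustrative.

```python
import math

# Certified values copied from the NIST analysis of variance table.
ss_regression = 0.242391619837339
ss_residual = 0.795851382172941e-03
df_regression, df_residual = 10, 71

# Mean squares are the sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Summary statistics implied by the table.
r_squared = ss_regression / (ss_regression + ss_residual)
f_statistic = ms_regression / ms_residual
residual_sd = math.sqrt(ms_residual)

print(f"R-squared   = {r_squared:.12f}")
print(f"F statistic = {f_statistic:.8f}")
print(f"Residual SD = {residual_sd:.12e}")
```

The results should agree with the certified R-squared, F statistic and standard deviation to roughly the precision of the inputs.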

LIMDEP and NLOGIT solution

WARNING: Badly conditioned X. Condition value =    .2999482D+10 

-----------------------------------------------------------------------------
Ordinary     least squares regression ............
LHS=Y        Mean                 =         .84958
             Standard deviation   =         .05479
             No. of observations  =             82  Degrees of freedom
Regression   Sum of Squares       =        .242392          10
Residual     Sum of Squares       =    .795851E-03          71
Total        Sum of Squares       =        .243187          81
             Standard error of e  =         .00335
Fit          R-squared            =         .99673  R-bar squared =   .99627
Model test   F[ 10,    71]        =     2162.43959  Prob F > F*   =   .00000
Diagnostic   Log likelihood       =      356.90255  Akaike I.C.   =-11.27452
             Restricted (b=0)     =      122.29336  Bayes  I.C.   =-10.95167
             Chi squared [ 10]    =      469.21839  Prob C2 > C2* =   .00000
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
       Y|  Coefficient       Error       t    |t|>T*         Interval
--------+--------------------------------------------------------------------
Constant|   -1467.49***    298.0845    -4.92  .0000    -2051.72   -883.25
      X1|   -2772.18***    559.7799    -4.95  .0000    -3869.33  -1675.03
      X2|   -2316.37***    466.4776    -4.97  .0000    -3230.65  -1402.09
      X3|   -1127.97***    227.2043    -4.96  .0000    -1573.29   -682.66
      X4|   -354.478***    71.64787    -4.95  .0000    -494.905  -214.051
      X5|   -75.1242***    15.28972    -4.91  .0000   -105.0915  -45.1569
      X6|   -10.8753***     2.23691    -4.86  .0000    -15.2596   -6.4911
      X7|   -1.06222***      .22162    -4.79  .0000    -1.49659   -.62784
      X8|    -.06702***      .01424    -4.71  .0000     -.09492   -.03912
      X9|    -.00247***      .00054    -4.61  .0000     -.00352   -.00142
     X10|-.40296D-04***   .8966D-05    -4.49  .0000 -.57870D-04  -.22723D-04
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==>  Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
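
One common way to summarize agreement with a certified value is the log relative error (LRE), which is approximately the number of correct significant digits. The sketch below applies it to a few of the displayed results above. Because the printed output is rounded for display, the LRE computed this way is limited by the number of digits shown and may understate the accuracy of the underlying computation. The function name and the choice of which results to compare are illustrative.

```python
import math

def log_relative_error(reported, certified):
    """Approximate number of correct significant digits in `reported`."""
    if reported == certified:
        return math.inf
    return -math.log10(abs(reported - certified) / abs(certified))

# (value as displayed in the program output, NIST certified value)
pairs = {
    "F statistic": (2162.43959, 2162.43954511489),
    "Constant": (-1467.49, -1467.48961422980),
    "Std. error of Constant": (298.0845, 298.084530995537),
}

lre = {name: log_relative_error(r, c) for name, (r, c) in pairs.items()}

for name, value in lre.items():
    print(f"{name:<24s} LRE = {value:.2f}")
```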