Pt. 86, App. XVIII
Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks Certifying to the Provisions of Part 86, Subpart R
Residual normal deviates are used routinely and usefully to indicate outliers when analyzing regression data, but they suffer theoretical deficiencies if statistical significance tests are required. Consequently, the procedure for testing for outliers outlined by Snedecor and Cochran, Statistical Methods, 6th ed., pp. 157-158, will be used. The method is described generally, then by the appropriate formulae, and finally by a numerical example.
(a) Linearity is assumed (as in the rest of the deterioration factor calculation procedure), and each contaminant is treated separately. The procedure is as follows:
(1) Calculate the deterioration factor regression as usual, and determine the largest residual in absolute value. Then recalculate the regression with the suspected outlier omitted. From the new regression line, calculate the residual at the deleted point, denoted $(y_i - \hat{y}'_i)$. Obtain a t-statistic by dividing $(y_i - \hat{y}'_i)$ by the square root of its estimated variance. Find the two-tailed probability, $p$, of this statistic from the t-distribution with $n-3$ degrees of freedom, where $n$ is the original sample size.
(2) This probability, $p$, assumes the suspected outlier was selected at random, which is not true, because it was chosen for having the largest residual. Therefore, the outlier will be rejected only if $1 - (1 - p)^n < 0.05$.
(3) The procedure will be repeated for each contaminant individually until the above procedure indicates no outliers are present.
(4) When an outlier is found, the vehicle test-log will be examined. If an unusual vehicle malfunction is indicated, data for all contaminants at that test-point will be rejected; otherwise, only the identified outlier will be omitted in calculating the deterioration factor.
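The steps in paragraph (a) can be sketched in Python as follows. This is an illustrative sketch only and is not part of the appendix. The function names (`fit_line`, `t_two_tailed`, `outlier_test`, `remove_outliers`) are hypothetical, the variance of the deleted residual is assumed to be the standard least-squares prediction variance, and the t-distribution tail probability is approximated by numerical integration rather than read from a table. The test-log review in paragraph (a)(4) is a manual step and is not modeled.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*(x - xbar); returns (a, b, xbar)."""
    n = len(xs)
    xbar = sum(xs) / n
    a = sum(ys) / n  # with a centered x, the intercept is the mean of y
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * y for x, y in zip(xs, ys)) / sxx
    return a, b, xbar


def t_two_tailed(t, df, steps=4000):
    """Approximate P(|T| >= |t|) for Student's t with df degrees of freedom.

    Assumption: Simpson's rule on the t density stands in for a t-table.
    """
    t = abs(t)
    logc = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(logc - (df + 1) / 2 * math.log1p(u * u / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def outlier_test(xs, ys, alpha=0.05):
    """One pass of the test; returns (index, t, p, adjusted, is_outlier)."""
    n = len(xs)
    a, b, xbar = fit_line(xs, ys)
    resid = [y - (a + b * (x - xbar)) for x, y in zip(xs, ys)]
    i = max(range(n), key=lambda k: abs(resid[k]))  # largest |residual|

    # Refit with the suspected point deleted.
    xr, yr = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
    a2, b2, xbar2 = fit_line(xr, yr)
    sxx2 = sum((x - xbar2) ** 2 for x in xr)
    sse2 = sum((y - (a2 + b2 * (x - xbar2))) ** 2 for x, y in zip(xr, yr))
    s2 = sse2 / (n - 3)  # residual mean square, n-3 degrees of freedom

    # Assumed form: prediction variance at the deleted point.
    var = s2 * (1 + 1 / (n - 1) + (xs[i] - xbar2) ** 2 / sxx2)
    t = (ys[i] - (a2 + b2 * (xs[i] - xbar2))) / math.sqrt(var)
    p = t_two_tailed(t, n - 3)
    adjusted = 1 - (1 - p) ** n  # allows for having picked the extreme point
    return i, t, p, adjusted, adjusted < alpha


def remove_outliers(xs, ys, alpha=0.05):
    """Repeat the test until no outlier is indicated (paragraph (a)(3))."""
    xs, ys = list(xs), list(ys)
    removed = []
    while len(xs) > 3:  # n-3 degrees of freedom must stay positive
        i, _, _, _, is_outlier = outlier_test(xs, ys, alpha)
        if not is_outlier:
            break
        removed.append((xs.pop(i), ys.pop(i)))
    return xs, ys, removed
```

Each contaminant would be passed through `remove_outliers` separately, with `xs` holding the mileages and `ys` the emission results as plain Python lists. The sketch does not guard against a perfect fit after deletion, where the estimated variance is zero.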
(b) Procedure for the calculation of the t-Statistic for Deterioration Data Outlier Test.
(1) Given a set of $n$ points, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
$x_i$ is the mileage of the ith data point.
$y_i$ is the emission of the ith data point.
The assumed model is $y = \alpha + \beta(x - \bar{x}) + \varepsilon$.
(2)(i) Calculate the regression line.
$\hat{y} = a + b(x - \bar{x})$
(ii) Suppose the absolute value of the ith residual $(y_i - \hat{y}_i)$ is the largest.
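For reference, the coefficients of the regression line in this centered form are usually obtained from the ordinary least-squares estimates below. The appendix does not write these out, so they are given here as the standard textbook expressions, on the assumption that the usual deterioration factor regression is an unweighted least-squares fit.

```latex
\[
a = \bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j, \qquad
b = \frac{\sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})}
         {\sum_{j=1}^{n} (x_j - \bar{x})^2}, \qquad
\hat{y}_i = a + b\,(x_i - \bar{x})
\]
```

Because $x$ is centered on its mean, the intercept $a$ is simply the mean of the observed emissions.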
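(3)(i) Calculate the regression line with the ith point deleted.
$\hat{y}' = a' + b'(x - \bar{x})$
(ii) Calculate the t-statistic by dividing the residual at the deleted point by the square root of its estimated variance.
$t = (y_i - \hat{y}'_i) / \sqrt{\widehat{\mathrm{Var}}(y_i - \hat{y}'_i)}$
$y_i$ is the observed suspected outlier.
$\hat{y}'_i$ is the predicted value with the suspected outlier deleted.
($\bar{x}$ is calculated without the suspected outlier.)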
(iii) Find $p$ from the t-statistic table.
$p = \mathrm{Prob}(|t_{(n-3)}| \ge |t|)$
$t_{(n-3)}$ is a t-distributed variable with $n-3$ degrees of freedom.
(iv) $y_i$ is an outlier if $1 - (1 - p)^n < 0.05$.
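The two quantities used in step (3) can be written out more explicitly. The block below is a sketch under stated assumptions, not text from the appendix: the variance expression is the standard least-squares prediction variance for a point that was not used in the fit, with $\bar{x}$ and the sums taken over the $n-1$ retained points, and $s'^2$ is introduced here as a label for the residual mean square of the reduced regression.

```latex
\[
\widehat{\mathrm{Var}}(y_i - \hat{y}'_i)
  = s'^2 \left[ 1 + \frac{1}{n-1}
  + \frac{(x_i - \bar{x})^2}{\sum_{j \ne i} (x_j - \bar{x})^2} \right],
\qquad
s'^2 = \frac{\sum_{j \ne i} (y_j - \hat{y}'_j)^2}{n-3}
\]
\[
1 - (1 - p)^n < 0.05
\quad\Longleftrightarrow\quad
p < 1 - 0.95^{1/n}
\]
```

One way to read the second line: if each of $n$ independent points had probability $p$ of producing a statistic this extreme, the chance that at least one of them does so is $1 - (1 - p)^n$. As a hypothetical illustration, with $n = 12$ the single-point probability would need to be below about 0.0043, which is close to the simpler bound $0.05/12 \approx 0.0042$.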
[Table of example data not recoverable; surviving column heading: $y - \hat{y}$. Footnote 1: Suspected outlier.]
(4)(i) Assume model:
$y = \alpha + \beta(x - \bar{x}) + \varepsilon$
$\hat{y} = 45 - 1.013(x - \bar{x})$
(ii) Suspected point out of regression:
$\hat{y}' = 44.273 - 1.053(x - \bar{x})$
$\hat{y}'_i = 44.273 - 1.053(22 - 18.727) = 40.827$
$y_i - \hat{y}'_i = 12.173$
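The arithmetic in this step can be checked with a few lines of Python. The numbers are copied from the example. The variable names are hypothetical, and the observed value is not quoted in the text, so it is inferred here by adding the stated residual back to the prediction.

```python
# Values copied from the worked example; variable names are illustrative.
a_prime = 44.273           # intercept of the line fitted without the suspected point
b_prime = -1.053           # slope of that line
x_bar = 18.727             # mean of x without the suspected point
x_i = 22.0                 # x value of the suspected point
deleted_residual = 12.173  # stated value of y_i minus the prediction

# Prediction at the suspected point from the reduced regression.
predicted = a_prime + b_prime * (x_i - x_bar)

# Observed value implied by the stated residual (an inference, not quoted).
implied_observed = predicted + deleted_residual

print(f"predicted = {predicted:.3f}")
print(f"implied observed y_i = {implied_observed:.3f}")
```

The prediction reproduces the 40.827 given in the example, and the implied observation comes out at about 53.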