Dear Juana,
I appreciate the desire to identify multiple outliers with one test, but these tests target individual values because the "rejection" criterion for a data point depends on that point's relationship to the statistics of the whole data set. If you identify an "outlier" and remove it from your 'legitimate' data set, the statistics of that set change as well. The change may or may not move other data points into the 'outlier' region, and you cannot determine that until you have the 'new' data set to work with. So you can't identify multiple outliers with a single test; it requires sequential tests, one suspect data point at a time.
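To make that concrete, here is a minimal Python sketch. The data values are invented purely for illustration, and the helper names are my own. It computes only the Grubbs-type statistic (the distance of a point from the mean, in units of the sample standard deviation) before and after removing the most extreme point. It does not look up critical values, so it does not by itself decide whether any point is an outlier; for that you would compare each statistic with the tabulated critical value for your chosen significance level and the current number of points.

```python
from statistics import mean, stdev

def deviation_score(value, values):
    """Distance of `value` from the mean of `values`, in sample standard deviations."""
    return abs(value - mean(values)) / stdev(values)

def most_extreme(values):
    """Return the value that lies farthest from the mean of `values`."""
    m = mean(values)
    return max(values, key=lambda v: abs(v - m))

# Hypothetical measurements, invented for this illustration only.
data = [10.1, 10.2, 10.0, 10.3, 10.2, 11.0, 12.5]

# Round 1: find the most extreme point in the full data set.
suspect1 = most_extreme(data)
g1 = deviation_score(suspect1, data)

# Set the first suspect aside (the raw data list itself is left untouched).
remaining = list(data)
remaining.remove(suspect1)

# Round 2: the mean and standard deviation have changed, so re-evaluate.
suspect2 = most_extreme(remaining)
score_before = deviation_score(suspect2, data)       # in the full set
score_after = deviation_score(suspect2, remaining)   # after the removal

print(f"Round 1 suspect: {suspect1}, statistic {g1:.3f}")
print(f"Round 2 suspect: {suspect2}, statistic {score_before:.3f} before "
      f"and {score_after:.3f} after removing {suspect1}")
```

With these made-up numbers, the second suspect looks unremarkable while the first one is still in the set, and much more extreme once it has been set aside. That is why each suspect point has to be tested against the data set as it stands at that step.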
In practical terms, you are applying this to a relatively small data set. As an analytical chemist, if I find that even 2 out of 7-8 data points may be 'outliers', I would prefer to find out why and rerun my tests, so that I have greater confidence in the results. Reducing my 'valid' data set to only 5-6 data points also reduces my confidence in any "statistical" analyses that I may want to do with them.
I prefer Grubbs' test because it uses the standard deviation of the sample, a measure of how widely the data are spread about the mean, which I think is better than a simple arithmetic (linear approximation) ratio of the raw values.
In ALL cases, you should never simply 'remove' data from your records! Always retain the complete raw data and calculations, together with an explanation of why any data point was removed (including the results of these tests), for the most ethical reporting of any data and results. You may then proceed to your statistical analyses of the remaining 'valid' data.
Best regards,
Steven Cooke
Process Systems Consulting
Thank you, Steven! I have also learned from your brilliant explanation.
Cheers,
Saki