2 Replies Latest reply on Aug 13, 2014 11:03 AM by Saki Golafale

    Multiple outlier stats test suggestion

    Juana Zamora

       

      Hello,

      I was wondering if you could suggest a statistical test to detect more than one outlier in a data set with n = 7. I thought of using Dixon's Q test or Grubbs' test, but they seem to be designed for a single outlier.


      Thank you,

      Juana

       

        • Re: Multiple outlier stats test suggestion
          Steven Cooke

          Dear Juana,

          I appreciate the desire to identify multiple outliers with one test, but these tests target a single value because the rejection criterion depends on that value's relationship to the statistics of the whole data set. If you identify an outlier and remove it from your 'legitimate' data set, the statistics of that set change as well. The change may or may not move other data points into the 'outlier' region. You cannot determine that until you have the new data set to work with, so these single-outlier tests cannot flag several points in one pass; you need to test each suspect data point sequentially.
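
The sequential procedure described above could be sketched roughly as follows. This is only an illustration, not a validated implementation: the function name `sequential_grubbs`, the sample data, and the choice of a two-sided test at alpha = 0.05 are assumptions made for the example, and the critical values are the commonly tabulated ones, which you should check against your own reference table.

```python
from statistics import mean, stdev

# Two-sided Grubbs critical values at alpha = 0.05 for n = 3..10,
# as commonly tabulated (assumption: verify against your own reference).
GRUBBS_CRIT_05 = {3: 1.154, 4: 1.481, 5: 1.715, 6: 1.887,
                  7: 2.020, 8: 2.127, 9: 2.215, 10: 2.290}

def sequential_grubbs(values, crit=GRUBBS_CRIT_05):
    """Test the most extreme point, remove it if rejected, then retest.

    Returns (kept, flagged). Nothing is deleted from the caller's data;
    the original list is left untouched so the raw record is retained.
    """
    kept = list(values)
    flagged = []
    while len(kept) in crit:              # stop when n falls outside the table
        m, s = mean(kept), stdev(kept)    # statistics of the *current* set
        if s == 0:
            break                         # all remaining values are identical
        suspect = max(kept, key=lambda x: abs(x - m))
        g = abs(suspect - m) / s          # Grubbs statistic for the suspect
        if g <= crit[len(kept)]:
            break                         # not rejected: stop testing
        flagged.append(suspect)
        kept.remove(suspect)
    return kept, flagged

# Hypothetical measurements, invented purely for illustration (n = 7).
data = [10.1, 10.2, 10.0, 10.3, 10.1, 10.2, 14.0]
kept, flagged = sequential_grubbs(data)
print("kept:", kept)
print("flagged:", flagged)
```

Note that the mean and standard deviation are recomputed on every pass, which is exactly why the points have to be tested one at a time.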

           

          In practical terms, you are applying this to a relatively small data set.  As an analytical chemist, if I find that even 2 out of 7-8 data points may be 'outliers' I would prefer to find out why, and rerun my tests with greater confidence.  Reducing my 'valid' data set to only 5-6 data points also reduces my confidence in any "statistical" analyses that I may want to do with them.

           

          I prefer Grubbs' test because it measures the suspect value's distance from the sample mean in units of the sample standard deviation, so it uses every data point. I think that is better than Dixon's simple arithmetic ratio of raw values (the gap to the nearest neighbor divided by the range).
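
To make the difference between the two statistics concrete, here is a small sketch that computes both for the same sample. The helper names `grubbs_g` and `dixon_q` and the data are assumptions for illustration only, and the Q shown is the basic gap-over-range form.

```python
from statistics import mean, stdev

def grubbs_g(values):
    """Largest absolute deviation from the mean, in standard deviations."""
    m, s = mean(values), stdev(values)
    return max(abs(x - m) for x in values) / s

def dixon_q(values):
    """Gap between the more extreme end point and its neighbor, over the range."""
    xs = sorted(values)
    gap = max(xs[1] - xs[0], xs[-1] - xs[-2])
    return gap / (xs[-1] - xs[0])

# Hypothetical measurements, invented purely for illustration (n = 7).
sample = [10.1, 10.2, 10.0, 10.3, 10.1, 10.2, 14.0]
print("Grubbs G:", round(grubbs_g(sample), 3))
print("Dixon Q:", round(dixon_q(sample), 3))
```

Grubbs' G depends on all seven values through the mean and standard deviation, while Dixon's Q depends on only three of them (the two ends and one neighbor).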

           

          In ALL cases, you should never simply 'remove' data from your records!  For the most ethical reporting of any data and results, always retain the complete raw data and calculations, together with an explanation of why each data point was excluded (including the results of these tests).  You may then proceed to your statistical analyses of the remaining 'valid' data.

           

          Best regards,

          Steven Cooke

          Process Systems Consulting