Robust Statistics: The Maximum Bias Curve
A Useful Tool for Evaluating the Breaking Point
In the last article of this series, we introduced the problem of outliers in statistics, which can distort regressions and lead to mistaken conclusions. We also described the influence function and how it measures an estimator's robustness to a small fraction of outliers. However, it is also useful to consider the worst-case scenario; that is, how badly an estimator can perform under the most damaging outlier contamination.
The Breaking Point
The breaking point (more commonly called the breakdown point) is the largest fraction of outliers that an estimator can tolerate before its bias can become arbitrarily large.
Note that the breaking point ranges from 0 to 0.5: once outliers make up more than half of the data, the outliers and the clean data effectively swap roles.
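To make the idea concrete, here is a minimal sketch comparing the sample mean and the sample median as the outlier fraction grows. The helper `contaminate`, the sample size, and the outlier value `z = 1e6` are illustrative choices and not part of any library. The code assumes standard normal clean data and places every outlier at the same value.

```python
import random
import statistics

def contaminate(clean, fraction, z):
    """Return a copy of `clean` with the given fraction of points replaced by z.

    Hypothetical helper for illustration: every outlier sits at the same value z.
    """
    n_out = int(fraction * len(clean))
    return [z] * n_out + list(clean[n_out:])

random.seed(0)
# Clean data: standard normal sample (an assumption made for this sketch).
clean = [random.gauss(0.0, 1.0) for _ in range(1001)]

for fraction in (0.0, 0.01, 0.25, 0.49):
    data = contaminate(clean, fraction, z=1e6)
    print(f"fraction={fraction:.2f}  "
          f"mean={statistics.fmean(data):12.4g}  "
          f"median={statistics.median(data):8.4g}")
```

Under these assumptions, even 1% contamination drags the mean far from zero, which is consistent with the mean having a breaking point of 0. The median shifts but stays within the range of the clean data for every fraction below 0.5.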
The Maximum Bias Curve
The maximum bias curve plots the largest possible bias of an estimator as a function of the fraction of outliers.
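As a rough illustration, the sketch below evaluates the maximum bias curve of the median in one well-known special case: the clean data are assumed to be standard normal, and the worst case is taken to be all outliers placed arbitrarily far to one side. Under those assumptions the median of the contaminated distribution solves (1 - eps) * Phi(m) = 1/2, where Phi is the standard normal CDF. The function name `median_max_bias` is a hypothetical name chosen for this example.

```python
from statistics import NormalDist

def median_max_bias(eps):
    """Maximum bias of the median for outlier fraction eps.

    Assumes standard normal clean data with all outliers pushed far to
    one side, so the contaminated median m solves (1 - eps) * Phi(m) = 1/2.
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 0.5)")
    return NormalDist().inv_cdf(1.0 / (2.0 * (1.0 - eps)))

for eps in (0.0, 0.1, 0.25, 0.4, 0.49):
    print(f"eps={eps:.2f}  max bias of median={median_max_bias(eps):.3f}")
```

The curve starts at zero, grows slowly for small fractions, and diverges as the fraction approaches 0.5, which is where the median breaks down. By contrast, the mean's maximum bias is unbounded for any nonzero fraction, since a single far-away outlier value can move it arbitrarily.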
Let’s reintroduce the framework of the contamination distribution. Suppose we have a ‘normal’ distribution with thin tails, f. We contaminate this with an outlier distribution g, which is generally assumed to be a point mass at z. This gives a resultant distribution: