Robust Statistics: The Maximum Bias Curve

A Useful Tool for Evaluating the Breaking Point

3 min read · Mar 8, 2022

https://www.adelaide.edu.au/aiml/our-research/machine-learning/robust-statistics

In the last article of this series, we introduced the problem of statistics with outliers, which can lead to erroneous regressions or misleading conclusions. We also described the influence function and how it measures an estimator's robustness to a small fraction of outliers. However, it is also useful to consider the worst-case scenario: how badly can the estimator perform under the most damaging outlier contamination?

The Breaking Point

The breaking point (usually called the breakdown point in the literature) is the largest fraction of outliers that an estimator can tolerate before its value can be pushed arbitrarily far from the truth.

Note that the breaking point lies between 0 and 0.5: once more than half of the data are outliers, the outliers and the clean data effectively swap roles.

The Maximum Bias Curve

The maximum bias curve (MBC) plots the maximum possible bias of an estimator against the fraction of outliers.

Let’s reintroduce the framework of the contamination distribution. Suppose we have a ‘normal’ distribution with thin tails, f. We contaminate this with an outlier distribution g, which is generally assumed to be a point mass at z. This gives the resultant mixture distribution:

$$(1 - \varepsilon)\, f + \varepsilon\, g$$

where ε is the fraction of contamination.
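To make the contamination model concrete, here is a minimal sketch that draws samples from a clean distribution with weight 1 − ε and a point mass at z with weight ε. The choice of a standard normal for f, the value z = 50, and the helper name `sample_contaminated` are assumptions made for illustration only.

```python
import random
import statistics

def sample_contaminated(n, eps, z, rng):
    """Draw n points from (1 - eps) * N(0, 1) + eps * (point mass at z)."""
    # Each draw is an outlier at z with probability eps, otherwise a clean N(0, 1) sample.
    return [z if rng.random() < eps else rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
data = sample_contaminated(10_000, eps=0.1, z=50.0, rng=rng)

outlier_fraction = sum(x == 50.0 for x in data) / len(data)
sample_mean = statistics.mean(data)
sample_median = statistics.median(data)
print(outlier_fraction, sample_mean, sample_median)
```

With 10% contamination the sample mean is dragged to roughly ε·z, while the sample median stays close to the clean centre at 0.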

Now choose a property of a probability distribution to estimate, written θ(X), where X is any probability distribution. This property can be anything; for example, a measure of the spread of the distribution (e.g. the standard deviation).

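The maximum bias curve is then defined in terms of the estimator for θ:

$$\mathrm{MBC}(\varepsilon) = \sup_{g} \left| \theta\big((1 - \varepsilon)\, f + \varepsilon\, g\big) - \theta(f) \right|$$

In words, we take the largest difference between the estimator's value on a contaminated distribution and its value on the clean distribution, over all possible contamination distributions g, with the contamination fraction fixed at ε.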
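As a rough illustration of this definition, the sketch below computes the population bias of the mean and the median when the clean distribution is a standard normal and the contamination is a point mass at z. Both of those choices, and the function names, are assumptions for the example; restricting to point masses is a simplification of the full supremum over g.

```python
from statistics import NormalDist

def mean_bias(eps, z):
    """Bias of the mean under (1 - eps) * N(0, 1) + eps * (point mass at z)."""
    # The mixture mean is (1 - eps) * 0 + eps * z.
    return eps * abs(z)

def median_bias(eps, z):
    """Bias of the median under the same contamination, for eps < 0.5."""
    # The median follows z only until the clean part alone holds half the mass,
    # which happens at the quantile Phi^{-1}(0.5 / (1 - eps)).
    cap = NormalDist().inv_cdf(0.5 / (1.0 - eps))
    return min(abs(z), cap)

eps = 0.2
for z in (1.0, 10.0, 100.0, 1000.0):
    print(z, mean_bias(eps, z), round(median_bias(eps, z), 4))
```

As z grows, the bias of the mean grows without bound, whereas the bias of the median levels off at a finite value. Taking the largest bias over z gives one point on each estimator's curve under these assumptions.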

Example

Let’s define θ(X) as the location (average) of the distribution. This can be estimated using the mean, the median or a Huber M-estimator.
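For readers who have not met the Huber estimator, here is a minimal sketch of one common way to compute it, using iteratively reweighted averaging with a fixed robust scale. The tuning constant k = 1.345, the MAD-based scale, and the function name `huber_location` are assumptions for illustration, not something specified in this article.

```python
import statistics

def huber_location(data, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(data)  # robust starting point
    # Robust scale: median absolute deviation, rescaled to match sigma for Gaussian data.
    mad = statistics.median(abs(x - mu) for x in data)
    scale = 1.4826 * mad
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Points within k * scale of mu get weight 1; farther points are down-weighted.
        weights = [min(1.0, k * scale / abs(x - mu)) if x != mu else 1.0 for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
print(statistics.mean(sample), statistics.median(sample), huber_location(sample))
```

On this small sample the single outlier at 1000 pulls the mean far away from the bulk of the data, while the median and the Huber estimate both stay near the clean values.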

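A practical way to find the breaking point of these estimators is to take the smallest fraction of outliers at which the MBC becomes infinite:

$$\varepsilon^{*} = \inf \{ \varepsilon : \mathrm{MBC}(\varepsilon) = \infty \}$$

For the mean, a single outlier placed far enough away can move the estimate arbitrarily, so its breaking point is 0; for the median it is 0.5.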
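The idea of reading the breaking point off the MBC can be sketched numerically. The closed forms below assume a standard normal clean distribution: the mean has infinite maximum bias for any ε > 0, and the median's maximum bias is Φ⁻¹(1 / (2(1 − ε))) for ε < 0.5. The function names and the grid resolution are illustrative assumptions.

```python
import math
from statistics import NormalDist

def mbc_mean(eps):
    """Maximum bias of the mean: any positive contamination can move it arbitrarily."""
    return 0.0 if eps == 0 else math.inf

def mbc_median(eps):
    """Maximum bias of the median for a standard normal clean distribution."""
    if eps >= 0.5:
        return math.inf
    return NormalDist().inv_cdf(0.5 / (1.0 - eps))

def breaking_point(mbc, grid):
    """Smallest eps on the grid at which the maximum bias is infinite."""
    for eps in grid:
        if math.isinf(mbc(eps)):
            return eps
    return None

# Grid of contamination fractions from 0 to 0.5 in steps of 0.001.
grid = [i / 1000 for i in range(0, 501)]
bp_mean = breaking_point(mbc_mean, grid)
bp_median = breaking_point(mbc_median, grid)
print(bp_mean, bp_median)
```

Because the search runs over a grid, the mean's breaking point comes out as the first positive grid value, which is 0 up to the grid resolution; the median's comes out at 0.5.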

Written by Rohan Tangri