Outliers
The interquartile range (IQR) is the distance between the first and third
quartiles (the length of the box in the boxplot)
$\mathrm{IQR} = Q_3 - Q_1$
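As a rough illustration of the definition above, the IQR can be computed with Python's standard library. The data are the six practice values that appear later in these notes; the quartile convention (the default "exclusive" method of `statistics.quantiles`) is an assumption, and other software may give slightly different quartiles.

```python
import statistics

# Six practice values taken from later in these notes.
values = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182]

# With n=4, statistics.quantiles returns the three quartile cut points.
# Its default "exclusive" method is only one of several conventions.
q1, q2, q3 = statistics.quantiles(values, n=4)

# The IQR is the distance between the first and third quartiles.
iqr = q3 - q1

print(f"Q1 = {q1:.5f}, Q3 = {q3:.5f}, IQR = {iqr:.5f}")
```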
$G_{\text{exp}} = \dfrac{|\text{suspect value} - \bar{x}|}{s}$
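A minimal sketch of the G statistic, assuming (as is usual for this test) that the mean and the sample standard deviation s are both calculated with the suspect value included. The function name `grubbs_statistic` is hypothetical, and the data are the six practice values from later in these notes. Comparing the result with a tabulated critical value is left to the reader, since no table is given here.

```python
import statistics

def grubbs_statistic(values):
    """Return (suspect, G), taking the suspect as the value farthest from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    suspect = max(values, key=lambda v: abs(v - mean))
    return suspect, abs(suspect - mean) / s

# Six practice values taken from later in these notes.
values = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182]
suspect, g_exp = grubbs_statistic(values)
print(f"suspect = {suspect}, G_exp = {g_exp:.3f}")
```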
Example
The following values were obtained for the nitrate concentration (mg/L) in a
sample of river water:
If a suspect value occurs, ideally take more measurements, especially if only a few were made.
The additional values may make it clearer whether the suspect value should be rejected.
If the suspect value is kept, the additional values also reduce its effect.
if 3 further measurements...
Dixon's Q-Test
popular
for small samples (n = 3 to 10)
assumes a Normal population
if Q > critical value, then REJECT the suspect value
The following values were obtained for the nitrate concentration (mg/L) in a
sample of river water:
$Q = \dfrac{|\text{suspect value} - \text{nearest value}|}{\text{range}}$
You try:
0.189 0.167 0.187 0.183 0.186 0.182
$Q = \dfrac{|\text{suspect value} - \text{nearest value}|}{\text{range}}$
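The exercise above can be checked with a short sketch of the Q calculation. The function names `dixon_q` and `reject_suspect` are hypothetical, and the choice of suspect (whichever end of the sorted data has the larger gap to its neighbour) is an assumption about how the suspect is picked. No critical value is hard-coded: it must be looked up in a Q table for the sample size and confidence level in use.

```python
def dixon_q(values):
    """Return (suspect, Q) for the more extreme end of a small sample."""
    if len(values) < 3:
        raise ValueError("the Q-test needs at least 3 values")
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    low_gap = ordered[1] - ordered[0]      # smallest value vs its neighbour
    high_gap = ordered[-1] - ordered[-2]   # largest value vs its neighbour
    if low_gap >= high_gap:
        return ordered[0], low_gap / data_range
    return ordered[-1], high_gap / data_range

def reject_suspect(q, q_critical):
    """Apply the decision rule: reject the suspect if Q exceeds the critical value."""
    return q > q_critical

values = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182]
suspect, q = dixon_q(values)
print(f"suspect = {suspect}, Q = {q:.3f}")
```

To finish the exercise, pass `q` and the tabulated critical value for n = 6 to `reject_suspect`.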
DECIDE
Correct obvious errors for which data exists
Exclude obvious errors for which no data exists
Ignore? Run the analysis with and without the suspect value to see if it is influential
use a trimmed mean
Retain?
outliers are expected in large samples
some methods are robust to outliers
Replace
DISCLOSE
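The "with/without" check and the trimmed mean listed above can be sketched as follows. This is only an illustration: the function name `trimmed_mean`, the choice of trimming one value from each end (`k=1`), and the use of 0.167 as the suspect value in the practice data are all assumptions.

```python
import statistics

def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    return statistics.mean(ordered[k:len(ordered) - k])

# Practice data from these notes; 0.167 is assumed to be the suspect value.
values = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182]
suspect = 0.167
without_suspect = [v for v in values if v != suspect]

mean_with = statistics.mean(values)
mean_without = statistics.mean(without_suspect)
mean_trimmed = trimmed_mean(values, k=1)

# A large shift between these numbers suggests the suspect value is influential.
print(f"mean with suspect:    {mean_with:.4f}")
print(f"mean without suspect: {mean_without:.4f}")
print(f"trimmed mean (k=1):   {mean_trimmed:.4f}")
```

Whichever result is reported, the treatment of the suspect value should be disclosed alongside it.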