Anderson Darling Minimisation

Mathias Raschke
Co.M.Raschke / independent researcher

Revised manuscript
Abstract: We reveal that the minimum Anderson-Darling (MAD) estimator is a variant of the maximum likelihood method. Furthermore, it is shown that the MAD estimator offers excellent opportunities for parameter estimation if there is no explicit formulation for the distribution model. The computation time for the MAD estimator with an approximated cumulative distribution function is much shorter than that of the classical maximum likelihood method with an approximated probability density function. Additionally, we research the performance of the MAD estimator for the generalized Pareto distribution and demonstrate a further advantage of the MAD estimator with an issue of seismic hazard analysis.
Key words: minimum distance estimator, minimum Anderson-Darling estimator, likelihood function,
1 Introduction
The minimum-distance estimators which apply the empirical distribution function (EDF) are well-known inference methods (e.g., Wolfowitz, 1957; Drossos and Philippou, 1980; Parr and Schucany, 1980; Parr, 1981). The corresponding point estimation minimises a distance d between the parametric CDF and the EDF:

$$ d\left[F(x;\hat{\boldsymbol{\theta}}),\,F_n(x)\right] = \min_{\boldsymbol{\theta}\in\Theta} d\left[F(x;\boldsymbol{\theta}),\,F_n(x)\right], \tag{1} $$
wherein θ is the parameter vector of the cumulative distribution function (CDF) F, and Fn is the empirical distribution function of a sample of size n. One variant is the minimum Anderson-Darling (MAD) estimator of Boos (1982), which applies the Anderson-Darling distance A (Anderson and Darling, 1954):

$$ A(\boldsymbol{\theta}) = n \int_{-\infty}^{\infty} \frac{\big(F(x;\boldsymbol{\theta}) - F_n(x)\big)^2}{F(x;\boldsymbol{\theta})\big(1 - F(x;\boldsymbol{\theta})\big)}\, dF(x;\boldsymbol{\theta}). \tag{2} $$
This distance is also frequently applied in goodness-of-fit tests for different distribution types (Stephens, 1986). Its computing formula is

$$ A(\boldsymbol{\theta}) = -n - \frac{1}{n}\sum_{i=1}^{n}\Big[(2i-1)\ln F(X_i;\boldsymbol{\theta}) + \big(2(n-i)+1\big)\ln\big(1 - F(X_i;\boldsymbol{\theta})\big)\Big], \tag{3} $$
wherein Xi is a realisation of the ordered sample. The point estimation by the MAD estimator is

$$ A(\hat{\boldsymbol{\theta}}) = \min_{\boldsymbol{\theta}\in\Theta} A(\boldsymbol{\theta}). \tag{4} $$
The MAD estimator has a very good performance in the case of many location-scale distributions: according to Boos (1982, Tab. 1), the mean squared error $MSE = E\big[(\hat{\theta} - \theta_0)^2\big]$ is frequently only marginally higher than the MSE of the well-known maximum likelihood (ML) method.
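For a distribution with a closed-form CDF, the computing formula (3) and the minimisation (4) can be sketched directly. The following minimal Python illustration is our own; the exponential model, sample and optimizer settings are illustrative choices, not taken from the paper:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def anderson_darling(sigma, x_sorted):
    """Anderson-Darling distance A of Eq. (3) for an exponential
    CDF F(x; sigma) = 1 - exp(-x/sigma)."""
    n = len(x_sorted)
    F = np.clip(stats.expon.cdf(x_sorted, scale=sigma), 1e-12, 1 - 1e-12)
    i = np.arange(1, n + 1)
    return -n - np.mean((2*i - 1) * np.log(F)
                        + (2*(n - i) + 1) * np.log(1 - F))

# MAD point estimation, Eq. (4): minimise A over the parameter space
rng = np.random.default_rng(42)
sample = np.sort(rng.exponential(scale=2.0, size=500))
res = minimize_scalar(anderson_darling, bounds=(0.1, 10.0),
                      args=(sample,), method="bounded")
sigma_hat = res.x  # close to the true scale 2.0
```

For a scale parameter of an exponential sample of this size, the MAD estimate is close to the ML estimate (the sample mean), consistent with the near-ML efficiency reported by Boos (1982).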
There are more types of distance estimators; for example, Basu et al. (2011) deal primarily with estimators which apply the distance between probability density functions (PDFs). The minimum-distance estimators which are based on the EDF are currently not very popular in statistics, and publications are infrequent (e.g., Kozek, 1998). However, there is a certain interest in minimum-distance methods in the actuarial research community (e.g., Clemente et al., 2012; Skřivánková and Juhás, 2012), and Coronel-Brizio and Hernandez-Montoya (2005) use the Anderson-Darling distance for the threshold selection in extreme value analysis. Our attention to the MAD estimator arose from a special problem: the estimation of the parameters of a known mechanism which generates a random variable X without explicit formulations for the PDF and CDF of X. Our first idea was to apply an ML estimator with point estimation
$$ l(\hat{\boldsymbol{\theta}}) = \max_{\boldsymbol{\theta}\in\Theta} l(\boldsymbol{\theta}), \tag{5} $$

with the log-likelihood function

$$ l(\boldsymbol{\theta}) = \sum_{i=1}^{n} \ln f(X_i;\boldsymbol{\theta}). \tag{6} $$
The PDF can be approximated for a fixed parameter vector by a large sample generated by a Monte Carlo simulation and the kernel density estimation according to Silverman (1986). However, the computational burden would be high (the same applies to the computation of a PDF by a multidimensional integral), and there is the danger that l(θ) has more than one maximum due to the approximation. The second idea is to apply an estimator which uses the CDF. The latter can be approximated easily by the EDF of an ordered simulated sample X1 ≤ X2 ≤ … ≤ Xn* of size n*, with

$$ \hat{F}_{n^*}(x_i) = \frac{i}{n^* + 1}. \tag{7} $$
The MAD estimator would be suitable for the second approach. Nevertheless, we first focus on the ML method because it is the most important estimation method, with very good asymptotic behaviour. In the following section, we develop a special version of the ML method which applies the CDF; the resulting variant of the ML estimator is equivalent to the MAD estimator. In section 3, we identify a first advantage of the MAD estimator by comparing the computing speed of the MAD estimator with CDF approximated by the EDF of a large simulated sample against the ML method with PDF approximated by kernel smoothing. Then we briefly research the performance of the MAD estimator for the generalized Pareto distribution (GPD) in section 4, as actuarial science is interested in distance estimators and the GPD. Finally, we demonstrate a further advantage of the
MAD estimator by an example of earthquake engineering: the estimation of the variance of the individual random component.

2 The MAD estimator as a variant of the ML method

A Bernoulli distribution is a discrete, binary distribution with case B and its complement \B, where

$$ P(B) = p, \tag{8a} $$

$$ P(\backslash B) = 1 - p. \tag{8b} $$

Its log-likelihood function is

$$ l_B(p) = n_B \ln(p) + (n - n_B)\ln(1 - p), \tag{9} $$

with sample size n and the number nB of observations of case B. The corresponding point estimation is

$$ \hat{p} = \frac{n_B}{n}. \tag{10} $$
At each point x on the real line, case B of a Bernoulli distribution is defined by X with X ≤ x, and case \B means X > x. The parameter p of Eq. (8, 9) is therein determined by

$$ p = F(x). \tag{11} $$
Furthermore, the number nB(x) is the number of observations with X ≤ x, and Eq. (9) gives

$$ l_B(\boldsymbol{\theta}; x) = n_B(x)\ln F(x;\boldsymbol{\theta}) + \big(n - n_B(x)\big)\ln\big(1 - F(x;\boldsymbol{\theta})\big). \tag{12} $$

We want to estimate the actual parameter vector θ0 of F, which applies for every point x. This is carried out by mixing the Bernoulli likelihood functions over all points x, weighted with the actual PDF:

$$ l_B(\boldsymbol{\theta}) = \int_{-\infty}^{\infty} l_B(\boldsymbol{\theta}; x)\, f(x;\boldsymbol{\theta}_0)\, dx. \tag{13} $$

Now we replace f(x; θ0) by an empirical distribution function of the current sample, as we do not know the actual θ0. We write a Bernoulli likelihood function for the ordered sample X1 ≤ … ≤ Xi ≤ … ≤ Xn:

$$ l_B(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\Big[(i - 0.5)\ln F(X_i;\boldsymbol{\theta}) + (n - i + 0.5)\ln\big(1 - F(X_i;\boldsymbol{\theta})\big)\Big]. \tag{14} $$
The numbers nB(x) and n − nB(x) of observations are replaced therein by i − 0.5 and n − i + 0.5. The value 0.5 is applied because the observation Xi at the point x = Xi could be interpreted either as Xi ≤ x or as Xi ≥ x in the approximation. Now we have the MAD estimator as a special variant of the ML method, with

$$ A(\boldsymbol{\theta}) = -n - 2\, l_B(\boldsymbol{\theta}), \tag{15} $$

as lB(θ) of Eq. (14) obviously has its maximum where A(θ) of Eq. (3) has its minimum. This relationship of the MAD estimator to the ML method explains why the asymptotic MSE of the MAD is frequently only a bit higher than the MSE of the ML method according to Boos (1982), even though the issue of estimation error for the MAD estimator was not discussed by him. We only refer to the empirical variant of the well-known Fisher information matrix (see, e.g., Coles, 2001; Upton and Cook, 2008) regarding this issue. The jackknife method of Efron (1979) could also be applied to estimate the estimation error.
If the empirical distribution function (7) is used for the application of the MAD estimator, then the numerical stability of the estimation procedure can be ensured by a fixed seed for the initialisation of the random generator for every considered parameter vector θ. Furthermore, we point out that the MAD estimator has an advantage over the ML method: it can be applied directly, with much less modification, in the case of censored data. Only the limits of the counting variable i have to be changed in Eq. (14).
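The equivalence between the Bernoulli likelihood of Eq. (14) and the Anderson-Darling distance of Eq. (3), A(θ) = −n − 2·lB(θ), follows term by term, since 2i − 1 = 2(i − 0.5) and 2(n − i) + 1 = 2(n − i + 0.5). It can also be verified numerically; in this small check the normal model and the sample are arbitrary illustrations of our own:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=200))
n = len(x)
F = np.clip(stats.norm.cdf(x), 1e-12, 1 - 1e-12)
i = np.arange(1, n + 1)

# Anderson-Darling distance, Eq. (3)
A = -n - np.mean((2*i - 1) * np.log(F) + (2*(n - i) + 1) * np.log(1 - F))
# Bernoulli likelihood, Eq. (14)
l_B = np.mean((i - 0.5) * np.log(F) + (n - i + 0.5) * np.log(1 - F))

print(np.isclose(A, -n - 2 * l_B))  # prints True: A = -n - 2*l_B
```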
3 Computing speed with approximated distribution functions

Here, we compare the computing speed of the MAD estimator with approximated CDF against the conventional ML method with approximated PDF. The latter is provided by the kernel density estimation according to Silverman (1986). For the performance analysis, we construct a random variable

$$ X = \frac{Y_1 + \sqrt{Y_2} + \sqrt[4]{Y_3}}{Y_4\,\sqrt{Y_5}}, \quad X \ge 0, \tag{16} $$

wherein the Yi are independent and exponentially distributed with PDF

$$ f_Y(x) = \frac{1}{\sigma}\exp\left(-\frac{x}{\sigma}\right), \quad x \ge 0,\ \sigma > 0. \tag{17} $$

The PDF f of X is parametrized only by the σ of Eq. (17), with σ = 1 in our example. We know neither the PDF nor the CDF of X; the constructed example also demonstrates the limits of conventional estimation methods.
An interesting detail is that the approximated CDF F of X is very similar in its upper half to the well-known generalized Pareto distribution (see, e.g., Beirlant et al., 2004) with CDF

$$ F(x) = 1 - \left(1 + \gamma\,\frac{x}{\sigma}\right)^{-1/\gamma}, \quad \sigma > 0, \tag{18} $$

with σ = 5 and extreme value index γ = 1. This means that the expectation E(X) is infinite (cf. Beirlant et al., 2004, section 5.3), and a moment estimator would not work. The CDFs and the survival functions are shown in Figures 1a and b. The CDF of X is computed by the empirical distribution function according to Eq. (7) with a large sample of size n* = 1,000,000 being generated by a Monte Carlo simulation. The parameter σ of fY and f is estimated in the performance analysis for the sample shown in Fig. 1c.
Figure 1.
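The estimation scheme of this section can be sketched end to end: simulate X by Eq. (16), approximate its CDF by the EDF (7) of a large simulated sample, and minimise the Anderson-Darling distance (3). The following Python sketch mirrors the procedure under our own choices of sample sizes, seeds and optimizer (the paper's implementation is in VB.net); note the fixed seed per parameter value, as recommended in section 2:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def simulate_x(sigma, size, rng):
    """Generate X according to Eq. (16) from exponential Y_i of Eq. (17)."""
    y = rng.exponential(scale=sigma, size=(5, size))
    return (y[0] + np.sqrt(y[1]) + y[2] ** 0.25) / (y[3] * np.sqrt(y[4]))

def mad_objective(sigma, obs, n_star=100_000, seed=7):
    """A(sigma) of Eq. (3) with the CDF approximated by the EDF (7) of a
    large simulated sample; a fixed seed keeps the objective smooth in sigma."""
    big = np.sort(simulate_x(sigma, n_star, np.random.default_rng(seed)))
    # EDF values i/(n*+1) at the simulated order statistics, interpolated at obs
    F = np.interp(obs, big, np.arange(1, n_star + 1) / (n_star + 1))
    F = np.clip(F, 1e-9, 1 - 1e-9)
    n = len(obs)
    i = np.arange(1, n + 1)
    return -n - np.mean((2*i - 1) * np.log(F) + (2*(n - i) + 1) * np.log(1 - F))

rng = np.random.default_rng(1)
obs = np.sort(simulate_x(1.0, 100, rng))      # "observed" sample, true sigma = 1
res = minimize_scalar(mad_objective, bounds=(0.2, 5.0), args=(obs,), method="bounded")
sigma_hat = res.x                             # near the true value 1
```

No density, integral or closed-form CDF of X is needed at any point; each objective evaluation only requires simulation, sorting and interpolation.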
We apply a simple optimization algorithm (nearest neighbour; highest relative exactness 1‰; start value σ = 1) to maximize the logarithmized likelihood functions. The numerical procedures are programmed in VB.net, applying the interpolation tools from the mathematics library of Extreme Optimization. The bandwidth of the kernel density estimation of the PDF is computed with the sample variance of the observations (Fig. 1c) according to the optimal bandwidth for a Gauss distribution. We consider Gaussian, Epanechnikov and Biweight kernels.
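The PDF approximation used by the competing ML method can be sketched as follows. This is our own minimal Python illustration of Gaussian-kernel density estimation with the normal-reference ("Gauss-optimal") bandwidth of Silverman (1986); evaluating it at n observation points against a simulated sample of size n* costs on the order of n·n* kernel evaluations per likelihood call, which is consistent with the long ML computation times in Table 1:

```python
import numpy as np

def silverman_bandwidth(sample):
    """Normal-reference bandwidth h = (4/(3 n*))**(1/5) * s (Silverman, 1986)."""
    return (4.0 / (3.0 * len(sample))) ** 0.2 * np.std(sample, ddof=1)

def kde_pdf(x, sample, h):
    """Gaussian-kernel density estimate evaluated at the points x."""
    u = (np.asarray(x)[:, None] - np.asarray(sample)[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
big = rng.exponential(scale=1.0, size=5_000)   # simulated sample approximating f
h = silverman_bandwidth(big)
obs = rng.exponential(scale=1.0, size=100)
log_lik = np.sum(np.log(kde_pdf(obs, big, h))) # approximated log-likelihood l(theta)
```

For a heavy-tailed X such as the one of Eq. (16), the sample-variance-based bandwidth is itself delicate, which is a further practical drawback of the PDF approach; the exponential sample here is only an illustration.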
The required computation times on an ACPI x64-based PC are listed in Table 1. Obviously, the computation time for the Bernoulli (MAD) estimator with approximated CDF is much shorter than the computation time for the conventional ML estimation with approximated PDF. The interpolation method does not have much influence on the computation time or on the estimation results of the MAD method.
Tab.1.
4 Performance for the generalized Pareto distribution

The current interest in minimum distance estimators for the GPD is relatively high in actuarial science (e.g., Ruckdeschel and Horbenkom, 2010; Skřivánková and Juhás, 2012), and the performance of the MAD estimator was not researched by Boos (1982) for the GPD. Hence we compute the MSE of the parameter estimation for the finite sample sizes n = 50 and 100. These are the sample sizes which were already considered in the performance analysis of Hüsler et al. (2011, Figures 2-5), wherein the Moment-ML method was also introduced. The scale parameter is σ = 1 in all researched cases; the considered extreme value index γ is between -2 and 2. The MSE is quantified empirically by point estimations of 50,000 samples for each parameter variant, generated by Monte Carlo simulation. The results are depicted in Figure 2. The MSE of γ̂ of the MAD estimator is larger than that of the ML and Moment-ML methods, but the MSE of the scale parameter estimate σ̂ is the smallest for the MAD estimator if γ > 1. We underline that the ML method does not always have a solution for γ < -0.5 (cf. Grimshaw, 1993); that is why we only consider the ML method for γ ≥ -0.5.
Figure 2.
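The Monte Carlo study behind Figure 2 can be reproduced in outline. The following reduced sketch uses SciPy's GPD (`scipy.stats.genpareto`, whose shape parameter `c` equals the extreme value index γ), far fewer replications than the paper's 50,000, and a generic optimizer instead of the paper's procedure, so the numbers are only indicative:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def ad_distance(theta, x_sorted):
    """Anderson-Darling distance, Eq. (3), for the GPD of Eq. (18)."""
    gamma, sigma = theta
    if sigma <= 0:
        return np.inf
    F = np.clip(stats.genpareto.cdf(x_sorted, c=gamma, scale=sigma),
                1e-12, 1 - 1e-12)
    n = len(x_sorted)
    i = np.arange(1, n + 1)
    return -n - np.mean((2*i - 1) * np.log(F) + (2*(n - i) + 1) * np.log(1 - F))

def mad_fit(x):
    """MAD point estimation (gamma, sigma) by direct minimisation of A."""
    res = minimize(ad_distance, x0=[0.5, 1.0], args=(np.sort(x),),
                   method="Nelder-Mead")
    return res.x

rng = np.random.default_rng(3)
gamma0, sigma0, n, reps = 1.0, 1.0, 50, 100
est = np.array([mad_fit(stats.genpareto.rvs(c=gamma0, scale=sigma0,
                                            size=n, random_state=rng))
                for _ in range(reps)])
mse_gamma = np.mean((est[:, 0] - gamma0) ** 2)   # empirical MSE of gamma-hat
mse_sigma = np.mean((est[:, 1] - sigma0) ** 2)   # empirical MSE of sigma-hat
```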
5 Estimation of a variance in seismic hazard analysis

In seismic hazard analysis, the annual probability of exceedance of the local earthquake shaking intensity Y is estimated (cf. Raschke, 2013). The ground motion relation is an element of the seismic hazard model and describes the relation between the shaking intensity and concrete event parameters and the source distance. The random variable Y is the absolute peak of the ground motion acceleration or a similar quantity. The event-specific random component (a random variable; the residual in the sense of a regression model) is εb and has only one realization per event, while the individual random component εa has a realization per event and site. The values of the corresponding variances V(εa) and V(ln(εa)) can considerably influence the results of the hazard estimation for large return periods; hence the correct estimation of V(εa) is very important. Eq. (19) looks like a regression model, but V(εa) may not be estimated by regression analysis (residual variance) because of the area-equivalence according to Raschke (2013). The variance V(εa) has to be estimated by a special procedure. Therein, we use the fact that there are two horizontal component intensities Y1 and Y2 with random components εa,1 and εa,2. The shaking intensities Yi are the maxima of the absolute values of the time histories yi(t) of the earthquake ground acceleration. We can also consider the shaking intensities Y(w) = Y(w+π), which depend on the orientation angle w. The issue is illustrated by the polar plot of an earthquake time history and the resulting shaking intensities Y(w) in Figure 3a. Therein, the random components εa,1 and εa,2 determine the random intensities Y(w1) and Y(w2).
7
Revised manuscript
In practice, the concrete orientation angles w1 and w2 are frequently the geographic directions north-south and east-west. However, it is important that the components are perpendicular to each other. For the estimation of V(εa), we approximate the stochastic mechanism which generates εa by random impulses Zi with a uniformly distributed random direction vi according to Fig. 3b. The random element εa is then determined by the impulses Zi, their directions vi and the orientation wi according to Eq. (21). The natural distribution of the Zi is the Gumbel distribution (s. Johnson et al., 1994; and appendix), as each impulse works like a cluster peak. Furthermore, we assume a Poisson distributed number k of random impulses (s. appendix).
Figure 3.
We have no explicit formulations for the distributions of εa and of the difference Δ, and approximate these by the aforementioned Monte Carlo simulation and kernel smoothing or by the EDF. The ML and MAD methods can be applied to these approximations for the parameter estimation. Here, we combine them with the moment estimator to ensure numerically that E(εa) = 1 and that the empirical variance of the observed difference is equal to the modelled one. In this way, the parameters E(Z) and V(Z) are determined by the moment estimator, and only the intensity λ of the Poisson distributed number k (s. appendix) has to be estimated by the ML or MAD method. We have computed the likelihood functions for a sample of Δ from the data of peak ground accelerations of the SHARE project (Giardini, 2013; Share_Metafile_v3.2a.xls, free field observations with moment magnitudes) with n = 1,829 and V̂(Δ) = 0.119. We approximate the PDF and the CDF for the likelihood functions by a sample of size n* = 100,000 generated by a Monte Carlo simulation. Therein, we have generated 200 random impulses Z for every realization of εa,1, εa,2 and Δ, although the generated random integer number k of impulses is much smaller. This procedure ensures smoother likelihood function graphs. For the same reason, the random generator (Mersenne-Twister of the library Extreme Optimization) was restarted with the same seed for every considered value of the Poisson intensity λ.
Figure 4.
The likelihood functions are shown in Figure 4 for steps of 0.1 in the Poisson intensity λ. The advantage of the Bernoulli likelihood function of the MAD estimator is obvious: it has a smooth graph with only one maximum, in contrast to the classical likelihood function of the ML method. The disadvantage of the MAD estimator is the smaller slope of the Bernoulli likelihood function. The point estimations of λ are 7.9 and 8.0. The actual target parameter is the corresponding variance V(εa), which is estimated at 0.058 for both estimation methods. Additionally, we have tested the estimation by a comparison of the modelled distribution of the random difference Δ with the EDF of the observed sample. The results are shown in Figure 5. The symmetric distributions correspond very well.
Figure 5.
6 Conclusions

We have researched the opportunities of the MAD estimator. Its advantages are:

• The MSE is not much larger than that of the ML method (Boos, 1982). In some cases of finite samples it is even smaller, e.g., for the scale parameter of the GPD with γ > 1 (sect. 4).
• It can be applied with an approximated CDF. The computing speed is then much higher than for the ML method with an approximated PDF (sect. 3).
• The MAD estimator with approximated CDF is not affected by a larger number of local maxima of lB(θ) (minima of A(θ)), in contrast to the ML method with approximated PDF (sect. 5). Automatic estimation procedures can therefore be applied more easily.

In other words, the MAD estimator offers a good and practical solution for special estimation problems which can occur in the modelling of complex phenomena. The biggest disadvantage of the MAD estimator is that its MSE can be higher than the MSE of the ML method.
References
Anderson, T.W. and Darling, D.A. (1954). A Test of Goodness-of-Fit. Journal of the American Statistical
Association 49:765–769.
Basu, A., Shioya, H., Park, C. (2011). Statistical Inference: The Minimum Distance Approach. Monographs on
Statistics and Applied Probability, Chapman & Hall, Taylor & Francis Group, Boca Raton.
Beirlant, J., Goegebeur, Y., Teugels, J., Segers, J. (2004). Statistics of Extremes: Theory and Applications. Wiley
Boos, D. (1982). Minimum Anderson-Darling estimation. Communications in Statistics - Theory and Methods
11:2747-2774.
Clemente, G.P., Savelli, N., Zappa, D. (2012). Modelling and calibration for non-life underwriting risk: from empirical data to risk capital evaluation. Presentation, ASTIN Colloquium, 1-4 October 2012, Mexico City.
Coronel-Brizio, H.F., Hernandez-Montoya, A.R. (2005). On fitting the Pareto-Levy distribution to stock market index data: selecting a suitable cutoff value. Physica A 354:437-449.
Drossos, C. A., Philippou, A. N. (1980). A Note on Minimum Distance Estimates. Annals of the Institute of Statistical Mathematics.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Ann. Statist. 7:1-26.
Giardini, D., Woessner, J., Danciu, L. , Valensise, G., Grünthal, G., Cotton, F., Akkar, S., Basili, R., Stucchi, M.,
Rovida, A., Stromeyer, D. , Arvidsson, R., Meletti, F., Musson, R., Sesetyan, K., Demircioglu, M. B, Crowley,
H., Pinho, R. , Pitilakis, K. , Douglas, J. , Fonseca, J. , Erdik, M., Campos-Costa, A., Glavatovic, B. ,
Makropoulos, K., Lindholm, C., Camelbeeck, T. (2013). Seismic Hazard Harmonization in Europe (SHARE): Online Data Resource.
Grimshaw, S. (1993). Computing maximum likelihood estimates for the generalized Pareto distribution.
Technometrics 35:185-191.
Hüsler, J., Li, D. , Raschke, M. (2011). Estimation for the generalized Pareto distribution using maximum
likelihood and goodness-of-fit. Communication in Statistics – Theory and Methods 40: 2500 – 2510.
Johnson, N.L., Kemp, A.W., Kotz, S. (2005). Univariate discrete distributions, 3rd Ed. Wiley Series in Probability and Statistics, Wiley, Hoboken.
Johnson, N.L., Kotz, S., Balakrishnan, N. (1994). Continuous univariate distributions, vol 1. 2nd edn. New
York; Wiley.
Kozek, A.S. (1998). On minimum distance estimation using Kolmogorov-Lévy type metrics. Australian & New Zealand Journal of Statistics 40.
NIED (National research institute for earth science and disaster prevention) (2015). Strong-motion networks K-NET and KiK-net.
Parr, W. C., Schucany, W. R. (1980). Minimum Distance and Robust Estimation. Journal of the American Statistical Association.
Parr, W. C. (1981). Minimum Distance Estimation: A Bibliography. Communications in Statistics - Theory and Methods 10.
Raschke, M. (2013). Statistical modelling of ground motion relations for seismic hazard analysis. Journal of
Seismology 17:1157-1182.
Ruckdeschel, P., Horbenkom, N. (2010). Robustness properties of estimators in generalized Pareto models. Bericht 182, Fraunhofer ITWM, Kaiserslautern.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman & Hall/CRC.
Skřivánková, V., Juhás, M. (2012). EVT methods as risk management tools. Conference paper, 6th International
Scientific Conference Managing and Modelling of Financial Risks, 10-11 September 2012, Ostrava.
Stephens, M.A. (1986). Tests based on EDF statistics. In: D'Agostino, R.B., Stephens, M.A. (eds.), Goodness-of-Fit Techniques. Statistics: Textbooks and Monographs, Vol. 68, Marcel Dekker, New York.
Upton, G., Cook, I. (2008). A dictionary of statistics, 2nd edn. Oxford: Oxford University Press.
Wolfowitz, J. (1957). The minimum distance method. The Annals of Mathematical Statistics 28:75–88.
Appendix
The Gumbel distribution of a real-valued random variable X has CDF (cf. Johnson et al., 1995, sect. 22)

$$ F(x) = \exp\left(-\exp\left(-\frac{x - a}{b}\right)\right), \tag{A1} $$

with expectation

$$ E(x) = a + b\,\gamma_E, \tag{A2} $$

wherein γE ≈ 0.5772 is the Euler-Mascheroni constant, and with variance

$$ V(x) = \frac{b^2 \pi^2}{6}. \tag{A3} $$
The Poisson distribution for a discrete random variable K ≥ 0 is formulated with (cf. Johnson et al., 2005, sect. 4)

$$ P(K = x) = \frac{\lambda^x \exp(-\lambda)}{x!}. \tag{A4} $$
Tables
Table 1: Computation time of MAD estimator and ML method with approximated distributions

Sample size n*                      100,000                                1,000,000
Estimator                     ML                    MAD              ML                      MAD
Kernel/interpolation    Gauss  Epan.  Biwei.  Linear  Spline   Gauss   Epan.   Biwei.  Linear  Spline
Time [s], first computation
of likelihood function   0.59   1.08   1.57    0.02    0.05     4.99   10.36   15.38    0.16    0.20
Time [s], entire
estimation               8.87   8.77  13.29    0.80    0.73    83.00  109.36  432.55    9.07   10.31
Point estimation σ̂       0.79   1.00   1.00    1.17    1.08     0.99    0.99    0.99    1.16    1.16
Figures
Figure 1: Distribution of the researched example of X according to Eq. (16, 17) with σ = 1 and the generalized Pareto distribution according to Eq. (18) with σ = 5 and γ = 1: a) CDF, b) survival function, c) analysed sample of X.
Figure 2: MSE of different estimation methods for the GPD: a) MSE of γ̂ for n = 50, b) MSE of γ̂ for n = 100, c) MSE of σ̂ for n = 50, d) MSE of σ̂ for n = 100 (light blue line: Moment-ML method, broken red line: ML method, solid black line: MAD method; supporting points according to Hüsler et al., 2011).
Figure 3: Modelling of the random component εa: a) example of an earthquake time history [cm/s²] of station FKS013 in Japan (NIED, 2015; record time 2001/10/02 17:20:12) and shaking intensity Y(w), b) random impulse Zi, random angle vi and orientation wi of Eq. (21).
Figure 4: Likelihood functions of the ML and the MAD method for the parametrisation of the generation of Δ according to Eq. (22, 23): a) l(λ) of the ML method according to Eq. (6), b) lB(λ) of the MAD estimator according to Eq. (15), c) detail of a), d) detail of b).
Figure 5: Validation of the modelled distribution of the random difference Δ: a) CDFs, b) CDFs with logarithmic scale.