Beta Distribution Fitting: Reliability Function
CHAPTER 112
Beta Distribution Fitting
Introduction
This module fits the beta probability distribution to a complete set of individual or grouped data
values. It outputs various statistics and graphs that are useful in reliability and survival analysis.
The beta distribution is useful for fitting data that have an absolute minimum and maximum. It
finds some application as a lifetime distribution.
Technical Details
The four-parameter beta distribution is indexed by two shape parameters (P and Q) and two
parameters representing the minimum (A) and maximum (B). We will not estimate A and B, but
rather assume that they are known parameters.
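Using these symbols, the beta density function may be written as

$$f(t \mid P, Q, A, B) = \frac{1}{B(P,Q)} \, \frac{(t-A)^{P-1}\,(B-t)^{Q-1}}{(B-A)^{P+Q-1}}, \quad P > 0,\; Q > 0,\; A < t < B$$

where B(P,Q) = Γ(P)Γ(Q)/Γ(P+Q) is the beta function (not to be confused with the maximum, B).
The transformation x = (t − A)/(B − A)
results in the two-parameter beta distribution. This is also known as the standardized form of the
beta distribution. In this case, the density function is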

$$f(x \mid P, Q) = \frac{1}{B(P,Q)} \, x^{P-1}\,(1-x)^{Q-1}, \quad P > 0,\; Q > 0,\; 0 < x < 1$$

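As a minimal sketch of the density above, the following Python code evaluates the four-parameter form directly and checks it against SciPy's `scipy.stats.beta` with `loc=A` and `scale=B-A`. The function name `beta4_pdf` and the parameter values are illustrative assumptions, not part of the program described in this chapter.

```python
import numpy as np
from scipy import integrate, special, stats

def beta4_pdf(t, p, q, a=0.0, b=1.0):
    """Four-parameter beta density f(t | P, Q, A, B) for A < t < B."""
    t = np.asarray(t, dtype=float)
    x = (t - a) / (b - a)  # standardized value in (0, 1)
    log_pdf = ((p - 1) * np.log(x) + (q - 1) * np.log1p(-x)
               - special.betaln(p, q) - np.log(b - a))
    return np.exp(log_pdf)

# Illustrative values only (hypothetical, not from a fitted model)
p, q, a, b = 2.0, 3.0, 10.0, 50.0
t = np.array([15.0, 25.0, 40.0])

direct = beta4_pdf(t, p, q, a, b)
via_scipy = stats.beta.pdf(t, p, q, loc=a, scale=b - a)

# The density should integrate to one over (A, B)
area, _ = integrate.quad(beta4_pdf, a, b, args=(p, q, a, b))
```

Working on the log scale is a design choice that avoids overflow in the beta function for large shape parameters.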
Reliability Function
The reliability (or survivorship) function, R(t), gives the probability of surviving beyond time t. For
the beta distribution, the reliability function is

$$R(T) = 1 - \int_A^T f(t \mid P, Q, A, B)\, dt$$

where the integral is known as the incomplete beta function ratio.
The conditional reliability function, R(t,T), may also be of interest. This is the reliability of an item
given that it has not failed by time T. The formula for the conditional reliability is

$$R(t, T) = \frac{R(T + t)}{R(T)}$$

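The two reliability formulas above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and parameter values are hypothetical, and SciPy's survival function `stats.beta.sf` stands in for one minus the incomplete beta function ratio.

```python
from scipy import stats

def reliability(t, p, q, a, b):
    """R(t) = 1 - F(t) for the four-parameter beta distribution."""
    return stats.beta.sf(t, p, q, loc=a, scale=b - a)

def conditional_reliability(t, T, p, q, a, b):
    """R(t, T) = R(T + t) / R(T): survive a further t given survival to T."""
    return reliability(T + t, p, q, a, b) / reliability(T, p, q, a, b)

# Illustrative values only (hypothetical)
p, q, a, b = 2.0, 3.0, 0.0, 100.0
r40 = reliability(40.0, p, q, a, b)
r_cond = conditional_reliability(10.0, 30.0, p, q, a, b)
```

Because R(T) is at most one, the conditional reliability of reaching T + t is never smaller than the unconditional reliability at T + t.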
Hazard Function
The hazard function represents the instantaneous failure rate. For this distribution, the hazard
function is

$$h(t) = \frac{f(t)}{R(t)}$$

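A short Python sketch of the hazard function, assuming SciPy's `stats.beta` for the density and reliability. The function name `hazard` is hypothetical. The special case P = Q = 1 (a uniform distribution on [A, B]) gives a convenient check, since the hazard is then 1/(B − t).

```python
from scipy import stats

def hazard(t, p, q, a, b):
    """h(t) = f(t) / R(t) for the four-parameter beta distribution."""
    dist = stats.beta(p, q, loc=a, scale=b - a)
    return dist.pdf(t) / dist.sf(t)

# Uniform special case: P = Q = 1 on [0, 100]
h_uniform = hazard(40.0, 1.0, 1.0, 0.0, 100.0)
```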
Median (Exact)
The most popular method is to calculate the median rank for each data value. This is the median
rank of the jth sorted time value out of n values. The exact value of the median rank is calculated
using the formula

$$F(t_j) = \frac{1}{1 + \dfrac{n-j+1}{j}\, F_{0.50;\,2(n-j+1);\,2j}}$$

Mean j/(n+1)
The mean rank is sometimes recommended. In this case, the formula is

$$F(t_j) = \frac{j}{n+1}$$

White (j-3/8)/(n+1/4)
A formula proposed by White is sometimes recommended. The formula is

$$F(t_j) = \frac{j - 3/8}{n + 1/4}$$

(j-0.5)/n
The following formula is sometimes used:

$$F(t_j) = \frac{j - 0.5}{n}$$

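The four plotting-position formulas can be compared with a small Python sketch. The function name `plotting_positions` and the method labels are hypothetical. The exact median rank uses the median of the F distribution, obtained here from `scipy.stats.f.ppf`; it should agree with the median of a Beta(j, n − j + 1) distribution, which is a standard equivalent form.

```python
import numpy as np
from scipy import stats

def plotting_positions(n, method="median"):
    """Return F(t_j) for j = 1..n under the chosen plotting-position rule."""
    j = np.arange(1, n + 1)
    if method == "median":  # exact median rank via the F distribution
        f_med = stats.f.ppf(0.50, 2 * (n - j + 1), 2 * j)
        return 1.0 / (1.0 + (n - j + 1) / j * f_med)
    if method == "mean":
        return j / (n + 1)
    if method == "white":
        return (j - 3 / 8) / (n + 1 / 4)
    if method == "midpoint":
        return (j - 0.5) / n
    raise ValueError(f"unknown method: {method}")

median_ranks = plotting_positions(10, "median")
```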
Show Trend Line
This option controls whether the trend (least squares) line is calculated and displayed.
Show Residuals from Trend
This option controls whether the vertical deviations from the trend line are displayed. Displaying
these residuals may let you see departures from linearity more easily.
Lines Tab
These options specify the attributes of the lines used for each group in the hazard curves and
survival curves.
Line 1 - 15
These options specify the color, width, and pattern of the lines used in the plots of each group. The
first line is used by the first group, the second line by the second group, and so on. These line
attributes are provided to allow the various groups to be indicated on black-and-white printers.
Clicking on a line box (or the small button to the right of the line box) will bring up a window that
allows the color, width, and pattern of the line to be changed.
Symbols Tab
These options specify the attributes of the symbols used for each group in the probability plots.
Symbol 1 - 15
These options specify the symbols used in the plot of each group. The first symbol is used by the
first group, the second symbol by the second group, and so on. These symbols are provided to
allow the various groups to be easily identified, even on black-and-white printers.
Clicking on a symbol box (or the small button to the right of the symbol box) will bring up a
window that allows the attributes of the symbol to be changed.
Legend Tab
This section specifies the legend.
Show Legend
Specifies whether to display the legend.
Legend Text
Specifies the legend label. A {G} is replaced by the name of the group variable.
Template Tab
The options on this panel allow various sets of options to be loaded (File menu: Load Template) or
stored (File menu: Save Template). A template file contains all the settings for this procedure.
File Name
Designate the name of the template file either to be loaded or stored.
Template Files
A list of previously stored template files for this procedure.
Template Ids
A list of the Template Ids of the corresponding files. This id value is loaded in the box at the
bottom of the panel.
Tutorial
This section presents an example of how to fit a beta distribution. The data used were shown above
and are found in the BETA database.
To run this example, take the following steps:
1 Open the BETA dataset.
From the File menu of the NCSS Data window, select Open.
Select the Data subdirectory of the NCSS97 directory.
Click on the file BETA.S0.
Click Open.
2 Open the Beta Fitting window.
On the menus, select Analysis, then Survival / Reliability, then Beta Fitting. The Beta
Fitting procedure will be displayed.
On the menus, select File, then New Template. This will fill the procedure with the default
template.
3 Specify the variables.
On the Beta Fitting window, select the Variables tab.
Double-click in the Time Variable box. This will bring up the variable selection window.
Select Time from the list of variables and then click Ok.
Click in the Beta Maximum box. Enter 100 for the maximum value.
4 Run the procedure.
From the Run menu, select Run Procedure. Alternatively, just click the Run button (the
left-most button on the button bar at the top).
Data Summary Section
Type of Observation   Rows   Count   Minimum   Maximum   Average     Sigma
Failed                  10      10      23.5      95.3      70.8   21.2021
This report displays a summary of the data that were analyzed. Scan this report to determine if
there were any obvious data errors by double-checking the counts and the minimum and maximum.
Parameter Estimation Section
                 Method of    Maximum      MLE         MLE           MLE
                 Moments      Likelihood   Standard    95% Lower     95% Upper
Parameter        Estimate     Estimate     Error       Conf. Limit   Conf. Limit
Minimum (A)      0            0
Maximum (B)      100          100
P                2.548055     3.301583     1.485834    0.3894027     6.213764
Q                1.050893     1.414615     0.577846    0.2820573     2.547172
Log Likelihood                -3.403845
Mean             70.8         70.00519
Median           74.91825     73.002
Mode             96.81711     84.73547
Sigma            21.2021      19.16614
This report displays parameter estimates along with standard errors and confidence limits in the
maximum likelihood case.
Method of Moments Estimate
By equating the theoretical moments with the data moments, the following estimates are obtained.
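
$$\tilde{P} = \left[\frac{m_1 - A}{B - A}\right] \left\{ \frac{\left[\dfrac{m_1 - A}{B - A}\right]\left(1 - \left[\dfrac{m_1 - A}{B - A}\right]\right)}{\left[\dfrac{m_2}{B - A}\right]^2} - 1 \right\}$$

$$\tilde{Q} = \frac{\left[\dfrac{m_1 - A}{B - A}\right]\left(1 - \left[\dfrac{m_1 - A}{B - A}\right]\right)}{\left[\dfrac{m_2}{B - A}\right]^2} - 1 - \tilde{P}$$

where m_1 is the usual estimator of the mean and m_2 is the usual estimate of the standard deviation.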
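As a check on the moment formulas, the following Python sketch applies them to the Average (70.8) and Sigma (21.2021) shown in the Data Summary Section with A = 0 and B = 100. The function name `beta_moment_estimates` is hypothetical; the results should be close to the method of moments estimates reported for P and Q.

```python
def beta_moment_estimates(m1, m2, a, b):
    """Method of moments estimates of P and Q given known A and B."""
    mean_std = (m1 - a) / (b - a)  # standardized mean
    sd_std = m2 / (b - a)          # standardized standard deviation
    common = mean_std * (1.0 - mean_std) / sd_std**2 - 1.0  # equals P + Q
    p = mean_std * common
    q = common - p
    return p, q

# Summary values from the Data Summary Section of the tutorial
p_mom, q_mom = beta_moment_estimates(70.8, 21.2021, 0.0, 100.0)
```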
Maximum Likelihood Estimates of P and Q
These estimates maximize the likelihood function. The maximum likelihood equations are

$$\psi(\hat{P}) - \psi(\hat{P} + \hat{Q}) = \frac{1}{n} \sum_{j=1}^{n} \log\left(\frac{t_j - A}{B - A}\right)$$

$$\psi(\hat{Q}) - \psi(\hat{P} + \hat{Q}) = \frac{1}{n} \sum_{j=1}^{n} \log\left(\frac{B - t_j}{B - A}\right)$$

where ψ(x) is the digamma function.
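The likelihood equations have no closed-form solution, so they must be solved numerically. The sketch below is one way to do this in Python, assuming `scipy.optimize.root` as the solver (the chapter does not say which algorithm the program uses). The ten failure times are those listed in the Kaplan-Meier report of the tutorial, and the starting values are rough guesses near the method of moments estimates.

```python
import numpy as np
from scipy import optimize, special

# Failure times from the tutorial data; A and B are treated as known
times = np.array([23.5, 50.1, 65.3, 68.9, 70.4, 77.3, 81.6, 85.7, 89.9, 95.3])
a, b = 0.0, 100.0

mean_log_x = np.mean(np.log((times - a) / (b - a)))
mean_log_1mx = np.mean(np.log((b - times) / (b - a)))

def ml_equations(params):
    """Left side minus right side of the two likelihood equations."""
    p, q = params
    return [special.digamma(p) - special.digamma(p + q) - mean_log_x,
            special.digamma(q) - special.digamma(p + q) - mean_log_1mx]

solution = optimize.root(ml_equations, x0=[2.5, 1.0])
p_hat, q_hat = solution.x
```

The solution should be close to the maximum likelihood estimates of P and Q shown in the Parameter Estimation Section.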
The formulas for the standard errors and confidence limits come from the inverse of the Fisher
information matrix, {f(i,j)}. The standard errors are given as the square roots of the diagonal
elements f(1,1) and f(2,2). The confidence limits for P are
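
$$\hat{P}_{lower,\,1-\alpha/2} = \hat{P} - z_{1-\alpha/2} \sqrt{f(1,1)}$$

$$\hat{P}_{upper,\,1-\alpha/2} = \hat{P} + z_{1-\alpha/2} \sqrt{f(1,1)}$$

The confidence limits for Q are

$$\hat{Q}_{lower,\,1-\alpha/2} = \hat{Q} - z_{1-\alpha/2} \sqrt{f(2,2)}$$

$$\hat{Q}_{upper,\,1-\alpha/2} = \hat{Q} + z_{1-\alpha/2} \sqrt{f(2,2)}$$
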
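The confidence limits can be reproduced from the reported estimates and standard errors. In this Python sketch the function name `wald_limits` is hypothetical, and the standard normal quantile is taken from `scipy.stats.norm.ppf`; the inputs are the MLE estimates and standard errors from the Parameter Estimation Section.

```python
from scipy import stats

def wald_limits(estimate, std_error, alpha=0.05):
    """Two-sided limits: estimate -/+ z(1 - alpha/2) * standard error."""
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error

p_lower, p_upper = wald_limits(3.301583, 1.485834)
q_lower, q_upper = wald_limits(1.414615, 0.577846)
```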
Log Likelihood
This is the value of the log likelihood function. This is the value being maximized. It is often used as
a goodness-of-fit statistic. You can compare the log likelihood value from the fits of your data to
several distributions and select as the best fitting the one with the largest value.
Mean
This is the mean time to failure (MTTF). It is the mean of the random variable (failure time) being
studied given that the beta distribution provides a reasonable approximation to your data's actual
distribution.
The formula for the mean is

$$\text{Mean} = A + \frac{P\,(B - A)}{P + Q}$$

Median
The median of the beta distribution is the value of t where F(t)=0.5.

$$\text{Median} = A + (B - A)\, I^{-1}(0.5;\, P, Q)$$

where I^{-1}(0.5; P, Q) is the inverse of the incomplete beta function ratio evaluated at 0.5.
Mode
The mode of the beta distribution is given by

$$\text{Mode} = A + \frac{(P - 1)(B - A)}{P + Q - 2}$$

when P > 1 and Q > 1. Otherwise, the density does not have a unique interior mode.
Sigma
This is the standard deviation of the failure time. The formula for the standard deviation (sigma) of
a beta random variable is

$$\sigma = \sqrt{\frac{P\,Q\,(B - A)^2}{(P + Q)^2\,(P + Q + 1)}}$$

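The mean, median, mode, and sigma formulas can be checked against the maximum likelihood column of the Parameter Estimation Section. This Python sketch plugs in the reported MLE values of P and Q with A = 0 and B = 100; the median uses `scipy.stats.beta.ppf` as the inverse of the incomplete beta function ratio. The variable names are illustrative.

```python
from scipy import stats

# MLE values from the Parameter Estimation Section
p_hat, q_hat, a, b = 3.301583, 1.414615, 0.0, 100.0
dist = stats.beta(p_hat, q_hat, loc=a, scale=b - a)

mean = a + p_hat * (b - a) / (p_hat + q_hat)
median = dist.ppf(0.5)  # value of t where F(t) = 0.5
mode = a + (p_hat - 1.0) * (b - a) / (p_hat + q_hat - 2.0)  # needs P > 1, Q > 1
sigma = (p_hat * q_hat * (b - a) ** 2
         / ((p_hat + q_hat) ** 2 * (p_hat + q_hat + 1.0))) ** 0.5
```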
Inverse of Fisher Information Matrix
Parameter Scale Shape
Scale 2.207702 0.6725335
Shape 0.6725335 0.333906
This table gives the inverse of the Fisher information matrix for the two-parameter beta. In this
table, the rows and columns labeled Scale and Shape correspond to P and Q, respectively. These
values are used in creating the standard errors and confidence limits of the parameters and reliability
statistics. The inverse of the approximate Fisher information matrix is the 2-by-2 matrix whose
elements are
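
$$f(1,1) = \frac{\psi'(\hat{Q}) - \psi'(\hat{P} + \hat{Q})}{n \left[ \psi'(\hat{P})\,\psi'(\hat{Q}) - \psi'(\hat{P} + \hat{Q}) \left\{ \psi'(\hat{P}) + \psi'(\hat{Q}) \right\} \right]}$$

$$f(1,2) = f(2,1) = \frac{\psi'(\hat{P} + \hat{Q})}{n \left[ \psi'(\hat{P})\,\psi'(\hat{Q}) - \psi'(\hat{P} + \hat{Q}) \left\{ \psi'(\hat{P}) + \psi'(\hat{Q}) \right\} \right]}$$

$$f(2,2) = \frac{\psi'(\hat{P}) - \psi'(\hat{P} + \hat{Q})}{n \left[ \psi'(\hat{P})\,\psi'(\hat{Q}) - \psi'(\hat{P} + \hat{Q}) \left\{ \psi'(\hat{P}) + \psi'(\hat{Q}) \right\} \right]}$$

where ψ'(z) is the trigamma function and n represents the sample size.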
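As a numerical check, the sketch below builds the 2-by-2 information matrix for (P, Q) from trigamma values and inverts it with NumPy rather than using the closed-form elements. The function name `inverse_fisher` is hypothetical, and `scipy.special.polygamma(1, x)` supplies the trigamma function. The inputs are the MLE values and sample size from the tutorial.

```python
import numpy as np
from scipy import special

def inverse_fisher(p, q, n):
    """Inverse of the Fisher information matrix for (P, Q), with A and B known."""
    tp, tq, tpq = (float(special.polygamma(1, v)) for v in (p, q, p + q))
    info = n * np.array([[tp - tpq, -tpq],
                         [-tpq, tq - tpq]])
    return np.linalg.inv(info)

cov = inverse_fisher(3.301583, 1.414615, 10)
std_errors = np.sqrt(np.diag(cov))  # standard errors of P and Q
```

The entries of `cov` should be close to the table in this section, and `std_errors` should be close to the standard errors in the Parameter Estimation Section.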
Kaplan-Meier Product-Limit Survival Distribution
          ------------ Survival ------------   ------------- Hazard -------------
Failure   Lower                   Upper        Lower                   Upper        Sample
Time      95% C.L.    Estimated   95% C.L.     95% C.L.    Estimated   95% C.L.     Size
23.5      0.714061    0.900000    1.000000     0.000000    0.105361    0.336786     10
50.1      0.552082    0.800000    1.000000     0.000000    0.223144    0.594059     9
65.3      0.415974    0.700000    0.984026     0.016103    0.356675    0.877132     8
68.9      0.296364    0.600000    0.903636     0.101328    0.510826    1.216168     7
70.4      0.190102    0.500000    0.809898     0.210848    0.693147    1.660192     6
77.3      0.096364    0.400000    0.703636     0.351494    0.916291    2.339626     5
81.6      0.015974    0.300000    0.584026     0.537810    1.203973    4.136778     4
85.7      0.000000    0.200000    0.447918     0.803145    1.609438                 3
89.9      0.000000    0.100000    0.285939     1.251978    2.302585                 2
95.3      0.000000    0.000000    0.000000     0.000000    0.000000    0.000000     1
Confidence Limits Method: Linear (Greenwood)
This report displays the Kaplan-Meier product-limit survival distribution and hazard function along
with confidence limits. The formulas used were presented in the Technical Details section earlier in
this chapter. Note that these estimates do not use the beta distribution in any way. They are the
nonparametric estimates and are completely independent of the distribution that is being fit. We
include them for reference.
Note that the Sample Size is given for each time period. As time progresses, participants are
removed from the study, reducing the sample size. Hence, the survival results near the end of the
study are based on only a few participants and are therefore less reliable. This shows up in a
widening of the confidence limits.
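The survival columns of this report can be reproduced with a short Python sketch. It assumes the usual product-limit estimate for complete data (one failure at each time, no censoring) and linear confidence limits based on Greenwood's variance formula, truncated to the interval from zero to one. Those formulas are not restated in this section, so treat the code as an illustration that happens to match the reported values rather than as the program's own algorithm.

```python
import numpy as np
from scipy import stats

times = np.array([23.5, 50.1, 65.3, 68.9, 70.4, 77.3, 81.6, 85.7, 89.9, 95.3])
n = times.size
at_risk = n - np.arange(n)                  # 10, 9, ..., 1 items at risk
survival = np.cumprod(1.0 - 1.0 / at_risk)  # product-limit estimate

# Greenwood's formula; the last term divides by zero, so it is masked out
with np.errstate(divide="ignore", invalid="ignore"):
    greenwood = np.cumsum(1.0 / (at_risk * (at_risk - 1.0)))
    std_err = np.where(survival > 0, survival * np.sqrt(greenwood), 0.0)

z = stats.norm.ppf(0.975)
lower = np.clip(survival - z * std_err, 0.0, 1.0)
upper = np.clip(survival + z * std_err, 0.0, 1.0)
```

For rows with positive survival, the hazard columns in the report equal the negative logarithm of the corresponding survival columns, with the lower and upper limits exchanged.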
Reliability Section
             ProbPlot       MLE
             Estimated      Estimated
Fail Time    Reliability    Reliability
5.0          0.999474       0.999900
10.0         0.996931       0.999030
15.0         0.991393       0.996366
20.0         0.982123       0.990777
25.0         0.968503       0.981102
30.0         0.949995       0.966195
35.0         0.926119       0.944961
40.0         0.896446       0.916390
45.0         0.860585       0.879593
50.0         0.818187       0.833836
55.0         0.768940       0.778582
60.0         0.712568       0.713545
65.0         0.648840       0.638750
70.0         0.577573       0.554623
75.0         0.498648       0.462119
80.0         0.412033       0.362916
85.0         0.317839       0.259761
90.0         0.216436       0.157159
95.0         0.108834       0.063202
100.0        0.000000       0.000000
This report displays the estimated reliability (survivorship) at the time values that were specified in
the Times option of the Reports Tab. Reliability may be thought of as the probability that failure
occurs after the given failure time. Thus, (using the ML estimates) the probability is 0.944961 that
failure will not occur until after 35 hours.
Two reliability estimates are provided. The first uses the method of moments estimates and the
second uses the maximum likelihood estimates. Confidence limits are not available. The formulas
used are as follows.
Estimated Reliability
The reliability (survivorship) is calculated using the beta distribution as

$$\hat{R}(t) = \hat{S}(t) = 1 - I\left(\frac{t - A}{B - A};\, P, Q\right)$$

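The reliability formula can be sketched in Python using `scipy.stats.beta.cdf` for the incomplete beta function ratio. The function name `estimated_reliability` is hypothetical. The parameter values are the method of moments and maximum likelihood estimates from the Parameter Estimation Section, and the time grid matches the report above.

```python
import numpy as np
from scipy import stats

def estimated_reliability(t, p, q, a, b):
    """R(t) = 1 - I((t - A)/(B - A); P, Q)."""
    x = (np.asarray(t, dtype=float) - a) / (b - a)
    return 1.0 - stats.beta.cdf(x, p, q)

fail_times = np.arange(5.0, 105.0, 5.0)  # 5, 10, ..., 100
r_mom = estimated_reliability(fail_times, 2.548055, 1.050893, 0.0, 100.0)
r_mle = estimated_reliability(fail_times, 3.301583, 1.414615, 0.0, 100.0)
```

The entry of `r_mle` at time 35 should be close to the 0.944961 quoted in the text.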
Percentile Section
              MOM        MLE
              Failure    Failure
Percentile    Time       Time
0.050000      30.0       33.9
0.100000      39.5       42.4
0.150000      46.3       48.3
0.200000      51.9       53.2
0.250000      56.8       57.3
0.300000      61.0       61.0
0.350000      64.9       64.3
0.400000      68.5       67.4
0.450000      71.8       70.3
0.500000      74.9       73.0
0.550000      77.9       75.6
0.600000      80.7       78.2
0.650000      83.3       80.6
0.700000      85.9       83.1
0.750000      88.4       85.5
0.800000      90.8       87.9
0.850000      93.1       90.4
0.900000      95.4       92.9
0.950000      97.7       95.8
This report displays failure time percentiles using the method of moments and the maximum
likelihood estimates. No confidence limit formulas are available.
The formulas used are
Estimated Percentile
The time percentile at p (which ranges between zero and one) is calculated using

$$\hat{t}_p = A + I^{-1}(p;\, P, Q)\,(B - A)$$

where I^{-1}(p; P, Q) is the inverse of the incomplete beta function ratio.
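A minimal Python sketch of the percentile formula, using `scipy.stats.beta.ppf` as the inverse of the incomplete beta function ratio. The function name `estimated_percentile` is hypothetical; the parameter values are the estimates reported in the Parameter Estimation Section, and the probabilities match the rows of the report above.

```python
import numpy as np
from scipy import stats

def estimated_percentile(prob, p, q, a, b):
    """t_p = A + I^{-1}(p; P, Q) * (B - A)."""
    return a + stats.beta.ppf(prob, p, q) * (b - a)

probs = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
t_mom = estimated_percentile(probs, 2.548055, 1.050893, 0.0, 100.0)
t_mle = estimated_percentile(probs, 3.301583, 1.414615, 0.0, 100.0)
```

The entries at p = 0.5 should agree with the Median row of the Parameter Estimation Section.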
Product-Limit Survivorship Plot
[Figure: Survivorship: S(t) Plot. Vertical axis: Survivorship: S(t), 0.000 to 1.000. Horizontal axis: Time, 20.0 to 100.0.]
This plot shows the product-limit survivorship function for the data analyzed. If you have several
groups, a separate line is drawn for each group. The step nature of the plot reflects the
nonparametric product-limit survival curve.
Hazard Function Plot
[Figure: Hazard Fn Plot. Vertical axis: Hazard Fn, 0.000 to 2.500. Horizontal axis: Time, 20.0 to 100.0.]
This plot shows the cumulative hazard function for the data analyzed. If you have several groups,
then a separate line is drawn for each group. The shape of the hazard function is often used to
determine an appropriate survival distribution.
Beta Reliability Plot
[Figure: S(t) Beta Plot. Vertical axis: S(t) Beta, 0.000 to 1.000. Horizontal axis: Time, 20.0 to 100.0.]
This plot shows the product-limit survival function (the step function) and the beta distribution
overlaid. If you have several groups, a separate line is drawn for each group.
Beta Probability Plots
[Figure: Beta Prob Plot at MLE's. Vertical axis: Time, 20.000 to 100.000. Horizontal axis: Beta Quantile, 0.3 to 1.0.]
This is a beta probability plot for these data. The expected quantile of the theoretical distribution is
plotted on the horizontal axis. The time value is plotted on the vertical axis. Also note that for
grouped data, only one point is shown for each group.
This plot lets you investigate the goodness of fit of the beta distribution to your data. If the points
seem to fall along a straight line, the beta probability model may be useful. You have to decide
whether the beta distribution is a good fit to your data by looking at this plot and by comparing the
value of the log likelihood to that of other distributions.
Grouped Data
The case of grouped data causes special problems when creating a probability plot. Remember that
the horizontal axis represents the expected quantile from the beta distribution for each (sorted)
failure time. In the regular case, we used the rank of the observation in the overall dataset.
However, in the case of grouped data, we must use a modified rank. This modified rank, O_j, is
computed as follows

$$O_j = O_p + I_j$$

where

$$I_j = \frac{(n + 1) - O_p}{1 + c}$$

Here, I_j is the increment for the jth failure; n is the total number of data points; O_p is the order of
the previous failure; and c is the number of data points remaining in the data set, including the
current data. Implementation details of this procedure may be found in Dodson (1994).
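The recursion for the modified rank can be illustrated with a small Python sketch. It makes an assumption that goes beyond the text: the input is a list of individual items sorted by time, each flagged as a failure or not, and only failures receive a rank. The function name `modified_ranks` is hypothetical, and the handling of grouped counts in the program itself may differ (see Dodson, 1994).

```python
def modified_ranks(is_failure):
    """Modified ranks O_j for items sorted by time (True marks a failure)."""
    n = len(is_failure)
    ranks = []
    prev_order = 0.0  # O_p: order of the previous failure
    for idx, failed in enumerate(is_failure):
        if not failed:
            continue
        remaining = n - idx  # c: items remaining, including the current one
        increment = (n + 1 - prev_order) / (1 + remaining)  # I_j
        prev_order += increment  # O_j = O_p + I_j
        ranks.append(prev_order)
    return ranks

all_failures = modified_ranks([True] * 5)
one_skipped = modified_ranks([True, False, True, True, True])
```

When every item is a failure, the increment is always one and the modified ranks reduce to the ordinary ranks 1, 2, ..., n.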