Exp Assum START
Figure 1. Normal Distribution of Times to Failure

[Figure 2 plot: histogram of Times to Failure (0 to 800) vs. Frequency]
Figure 2. Exponential Distribution of Times to Failure

There are practical consequences of data fitting one or the other of these two different distributions. Normal lives are symmetric about 100 and concentrated in the range of 40 to 160 (three standard deviations on each side of the mean, which comprises about 99.7% of the population). Exponential lives, on the other hand, are right-skewed, with a relatively large proportion of device lives much smaller than 40 units and a small proportion of device lives larger than 200 units.

To highlight the consequences of choosing the wrong distribution, consider a sample of n = 10 data points (Table 1). We will obtain a 95% CI for the mean of these data using two different distribution assumptions: Exponential and Normal.

Table 1. Small Sample Data Set
5.950   119.077   366.074   155.848   30.534
20.615   15.135     3.590   103.713  120.859

We leave the details of obtaining these two specific statistics (or formulas) for another paper.

Since in reality the ten data points come from the Exponential, only the CI (55.11, 196.3) is correct, and its coverage probability (95%) is the one prescribed. Had we erroneously assumed Normality, the CI obtained under this assumption, for this small sample, would have been incorrect. Moreover, its true coverage probability (confidence) would be unknown, and every policy derived under such an unknown probability would be at risk.

This example illustrates how important it is to establish the validity (or at least the strong plausibility) of the underlying statistical distribution of the data.

Statistical Assumptions and their Implications

Every statistical model has its own assumptions that have to be verified and met to provide valid results. In the Exponential case, the CI for the mean life of a device requires two assumptions: that the lives of the tested devices are (1) independent and (2) Exponentially distributed. These two statistical assumptions must be met (and verified) for the corresponding CI to cover the true mean with the prescribed probability. If the data do not follow the assumed distribution, the actual coverage probability (or confidence) of the CI may be very different from the one prescribed.

Fortunately, the assumptions for all distribution models (e.g., Normal, Exponential, Weibull, Lognormal, etc.) have practical and useful implications. Hence, having some background information about a device may help us assess its life distribution.

A case in point is the assumption that the lives of a device are Exponentially distributed. An implication of the Exponential is that the device failure rate is constant. In practice, the presence of a constant failure rate may be confirmed
from observing the times between failures of a process where failures occur at random times.

In general, if we observe any process composed of events that occur at random times (say lightning strikes, coal mine accidents, earthquakes, fires, etc.), the times between these events will be Exponentially distributed. The probability that the next event occurs within a given interval does not depend on how much time has elapsed since the past event (the "memoryless" property). As a result, phrases such as "old is as good as new" have a valid meaning. [It is important to note that although failures may occur at random times, they do not occur for no reason. Every failure has an underlying cause.]

In what follows, we will use statistical properties derived from the implications of the Exponential distribution to validate the Exponential assumption.

Practical Methods to Verify Exponential Assumptions

Several empirical and practical methods can be used to establish the validity of the Exponential distribution. We will illustrate the process of validating the Exponential assumptions using the life test data in Table 2. This larger sample (n = 45) was generated following the same process used to generate the previous smaller sample (n = 10) presented in Table 1.

Table 2. Large Sample Life Data Set
 12.411    58.526    46.684    49.022    77.084     7.400
 21.491    28.637    16.263    53.533    93.241    43.911
 33.771    78.954   399.071   102.947   118.077    61.894
 72.435   108.561    46.252    40.479    95.291    10.291
 27.668   116.729   149.432    59.067   199.458    45.771
272.005    60.266   233.254    87.592   137.149    50.668
 89.601   313.879   150.011   173.580   220.413   182.737
  6.171   162.792    82.273

In this data set, two distribution assumptions need to be verified or assessed: (1) that the data are independent and (2) that they are identically distributed as an Exponential.

The assumption of independence implies that randomization (sampling) of the population of devices (and other influencing factors) must be performed before placing them on test. For example, device operators, times of operations, weather conditions, location of devices in warehouses, etc., should be randomly selected so they become representative of these characteristics and of the environment in which devices will normally be operated.

These empirical methods are practical for the engineer because they are largely intuitive and easy to implement.

To assess the data in Table 2 using this more practical approach, we first obtain their descriptive statistics (Table 3). Then we analyze and plot the raw data in several ways to check (empirically but efficiently) whether the Exponential assumption holds.

Table 3. Descriptive Statistics of Data in Table 2
Variable     n    Mean    Median   Std. Dev.
Exp. Data   45    99.9    77.1     85.6

Here, Mean is the average of the data, \bar{x}, and the Standard Deviation S is the square root of:

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

It is worth noticing that the sample mean and standard deviation are computed in the same way, irrespective of the underlying distribution. What changes are the properties of these values, a fact that can be used to help identify the distribution in question.

There are a number of useful and easy-to-implement procedures, based on well-known statistical properties of the Exponential distribution, which help us informally assess this assumption. These properties are summarized in Table 4.

Table 4. Some Properties of the Exponential Distribution
1. The theoretical mean and standard deviation are equal¹; hence, the sample values of mean and standard deviation should be close.
2. The histogram should show that the distribution is right-skewed (Median < Mean).
3. A plot of Cumulative-Failure vs. Cumulative-Time should be close to linear.
4. The regression slope of Cum-Failure vs. Cum-Time is close to the failure rate.
5. A plot of Cum-Rate vs. Cum-Failure should decrease/stabilize at the failure rate level.
6. Plots of the Exponential probability and its scores should also be close to linear.

¹ Although the Exponential is a one-parameter distribution, it still has a standard deviation (equal to its mean). Most distributions used in reliability work have a standard deviation; the Cauchy is a well-known exception.
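The informal checks in Table 4 lend themselves to a short script. The following is a minimal Python sketch (standard library only, Python 3.8+) that computes the Table 3 statistics and the quantities behind Properties 1, 2 and 4 to 6 for the Table 2 data. The helper names `ols` and `exponential_checks` are hypothetical. Two choices are our assumptions, not stated in the text: Cum-Time is taken as the running total of the failure times in the order printed in Table 2 (row by row), and Cum-Rate is taken as cumulative failures divided by cumulative time. With that ordering the fitted line should come out close to the one reported in Table 5, but treat the script as an illustration rather than a reproduction of the author's analysis.

```python
import math
import statistics

# Table 2 life data, transcribed row by row as printed (n = 45).
TABLE_2 = [
    12.411, 58.526, 46.684, 49.022, 77.084, 7.400,
    21.491, 28.637, 16.263, 53.533, 93.241, 43.911,
    33.771, 78.954, 399.071, 102.947, 118.077, 61.894,
    72.435, 108.561, 46.252, 40.479, 95.291, 10.291,
    27.668, 116.729, 149.432, 59.067, 199.458, 45.771,
    272.005, 60.266, 233.254, 87.592, 137.149, 50.668,
    89.601, 313.879, 150.011, 173.580, 220.413, 182.737,
    6.171, 162.792, 82.273,
]


def ols(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, R-squared)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy * sxy / (sxx * syy)


def exponential_checks(data):
    """Informal checks of the Exponential assumption (hypothetical helper)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # divisor n - 1, as in the S^2 formula
    median = statistics.median(data)

    # Properties 3-4: cumulative failures vs. cumulative time, data order kept.
    cum_time = []
    total = 0.0
    for t in data:
        total += t
        cum_time.append(total)
    cum_fail = list(range(1, n + 1))
    intercept, slope, r2 = ols(cum_time, cum_fail)

    # Property 5: cumulative rate = failures so far / time accumulated so far.
    cum_rate = [k / T for k, T in zip(cum_fail, cum_time)]

    # Property 6: Exponential scores -mean * ln(1 - i/(n + 1)) vs. sorted data.
    scores = [-mean * math.log(1 - i / (n + 1)) for i in range(1, n + 1)]
    _, _, r2_scores = ols(sorted(data), scores)

    return {
        "mean": mean, "sd": sd, "median": median,
        "sd_to_mean": sd / mean,          # Property 1: should be near 1
        "right_skewed": median < mean,    # Property 2
        "intercept": intercept, "slope": slope, "r2": r2,  # Properties 3-4
        "cum_rate": cum_rate,             # Property 5: should level off near the failure rate
        "scores": scores,
        # Both sequences increase with i, so the correlation is the positive root.
        "score_corr": math.sqrt(r2_scores),
    }


if __name__ == "__main__":
    res = exponential_checks(TABLE_2)
    print(f"mean={res['mean']:.1f}  median={res['median']:.1f}  sd={res['sd']:.1f}")
    print(f"Cum-Fail = {res['intercept']:.2f} + {res['slope']:.5f} Cum-Time  "
          f"(R-Sq = {res['r2']:.3f})")
    print(f"last Cum-Rate = {res['cum_rate'][-1]:.4f}, "
          f"first score = {res['scores'][0]:.2f}, "
          f"score correlation = {res['score_corr']:.3f}")
```

Each returned value maps to a row of Table 4 (for example, `sd_to_mean` near 1 for Property 1 and `right_skewed` for Property 2); deciding what counts as "close" remains the analyst's informal judgment, as with the graphical checks.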
The theoretical Exponential standard deviation σ is equal to the mean. Hence, the interval from one standard deviation below the mean to one standard deviation above it covers population values from 0 to 2σ, which comprises the majority (about 86%) of the population (see Figure 3). For reference, in the Normal distribution, the interval of one standard deviation above and below the mean comprises only about 68% of the population. The proportions of sample points falling in these ranges should be commensurate with these percentages and provide an indication of which distribution the data come from (especially in large samples).

[Figure 3: plot of the Failure Times (0 to 400), with the Median and Mean marked]

If we regress the Cumulative Failures on the Cumulative Test Time (Table 5), the result is a straight line (Figure 4), whose slope (0.00931) is close to the true (0.01) failure rate (Property 4).

Table 5. Cum-Fail vs. Cum-Time Regression Analysis
Predictor   Coeff.      Std. Dev.   T       P
Constant    5.8048      0.5702      10.18   0.000
Cum-Time    0.0093135   0.0002478   37.59   0.000

The regression equation is:
Cum-Fail = 5.80 + 0.00931 Cum-Time
S = 2.283   R-Sq = 97.0%

[Figure 4: Cumulative Number of Failures vs. Cumulative Test Time]

Notice that the regression in Table 5 is significant, as shown by the large T (= 37.59) test value for the regression coefficient and by the large Index of Fit (R² = 0.97). Both results suggest that a linear regression with slope equal to the failure rate is plausible.

[Figure 5 plot: Failure Rate vs. Cumulative No. of Failures (0 to 50)]
Figure 5. Plot of Cum-Rate vs. Cum-Fail Stabilizes (flat) and Converges to the Exponential Failure Rate (Close to Value 0.01) (Property No. 5)

The Probability Plot is one where the Exponential Probability (P_I) is plotted vs. I/(n + 1), where I is the data sequence order, i.e., I = 1, …, 45. Each P_I is obtained by calculating the Exponential probability of the corresponding failure data point, X_I, using the sample mean (see Figure 6). For example, the first sorted (smallest) data
where:

$$P_{99.9}(X_i) = 1 - e^{-X_i/99.9} \approx \frac{i}{n+1}$$

[Figure 6 plot: P_I (0.0 to 1.0) vs. I/(n + 1) (0.0 to 1.0); I = 1, …, n]
Figure 6. Plot of Exponential Probability (P_I) vs. I/(n + 1); I = 1, …, n, Is Close to Linear, as Expected When the Data Come from an Exponential Distribution (Property 6)

Substituting I/(n + 1) for I = 1 in the above formula and solving for X, we get the first exponential score:

$$X_1 = -99.9\,\ln\left(1 - \frac{1}{46}\right) = -99.9\,\ln(0.9783) = -99.9 \times (-0.022) = 2.2$$

The scores are then plotted vs. their corresponding sorted real data values (in the case above, 2.2 is plotted against 6.17, the smallest data point). When the data come from an Exponential distribution, this plot is close to a straight line (Property 6); see Figure 7.

[Figure 7: plot of the Exponential Scores (0 to 400) vs. the sorted data values]

All of the preceding empirical results support the plausibility of the assumption that the given life data are Exponential. If, at this point, a stronger case for the validity of the Exponential distribution is required, then a number of theoretical goodness-of-fit (GoF) tests can be carried out with the data.

…but real and complicated ones, whose settings are not perfect. Consequently, some statistical assumptions may not be met. This does not, however, necessarily preclude the use of statistical procedures.

In such cases, some assumptions may have to be relaxed, and some of the inferences (results) may have to be interpreted with care and used with special caution. The best criteria for establishing such relaxation and interpretation of the rules (e.g., which assumptions can be relaxed and by how much) often come from a thorough knowledge of the underlying engineering and statistical theories, from extensive professional experience, and from a deep understanding of the specific processes under consideration.

Summary

This START sheet discussed the important problem of (empirically) assessing the Exponential distribution assumptions. Several numerical and graphical examples were provided, together with some related theoretical and practical issues, some background information, and references for further reading.

Other very important reliability analysis topics were mentioned in this paper. Due to their complexity, these will be treated in more detail in separate, forthcoming START sheets.

Bibliography
1. Practical Statistical Tools for Reliability Engineers, Coppola, A., RAC, 1999.
2. A Practical Guide to Statistical Analysis of Material Property Data, Romeu, J.L. and Grethlein, C., AMPTIAC, 2000.
3. Mechanical Applications in Reliability Engineering, Sadlon, R.J., RAC, 1993.
4. Reliability and Life Testing Handbook (Vols. 1 & 2), Kececioglu, D., Editor, Prentice Hall, NJ, 1993.
About the Author

Dr. Jorge Luis Romeu has over thirty years of statistical and operations research experience in consulting, research, and teaching. He was a consultant for the petrochemical, construction, and agricultural industries. Dr. Romeu has also worked in statistical and simulation modeling and in data analysis of software and hardware reliability, software engineering, and ecological problems.

Dr. Romeu has taught undergraduate and graduate statistics, operations research, and computer science in several American and foreign universities. He teaches short, intensive professional training courses. He is currently an Adjunct Professor of Statistics and Operations Research for Syracuse University and a Practicing Faculty of that school's Institute for Manufacturing Enterprises.

For his work in education and research and for his publications and presentations, Dr. Romeu has been elected Chartered Statistician Fellow of the Royal Statistical Society, Full Member of the Operations Research Society of America, and Fellow of the Institute of Statisticians.

Romeu has received several international grants and awards, including a Fulbright Senior Lectureship and a Speaker Specialist Grant from the Department of State, in Mexico. He has extensive experience in international assignments in Spain and Latin America and is fluent in Spanish, English, and French.

Romeu is a senior technical advisor for reliability and advanced information technology research with IIT Research Institute (IITRI). Since joining IITRI in 1998, Romeu has provided consulting for several statistical and operations research projects. He has written a State of the Art Report on Statistical Analysis of Materials Data, designed and taught a three-day intensive statistics course for practicing engineers, and written a series of articles on statistics and data analysis for the AMPTIAC Newsletter and RAC Journal.

Other START Sheets Available

Many Selected Topics in Assurance Related Technologies (START) sheets have been published on subjects of interest in reliability, maintainability, quality, and supportability. START sheets are available on-line in their entirety at <http://rac.iitri.org/DATA/START>.

For further information on RAC START Sheets contact the:
Reliability Analysis Center
201 Mill Street
Rome, NY 13440-6916
Toll Free: (888) RAC-USER
Fax: (315) 337-9932
or visit our web site at:
<http://rac.iitri.org>