Renyan Jiang - Introduction To Quality and Reliability Engineering - Springer - 2015-107-120
6.1 Introduction
Most of the models presented in Chaps. 3 and 4 are univariate life distributions.
Such models are suitable for modeling an i.i.d. random variable (e.g., time to the
first failure), and represent the average behavior of the population’s reliability
characteristics.
A repairable system can fail several times since the failed system can be restored
to its operating condition through corrective maintenance actions. If the repair time
is neglected, the times to failure form a failure point process. The time between the
$(i-1)$th failure and the $i$th failure, $X_i$, is a continuous random variable. Depending
on the effect of the maintenance actions, the inter-failure times $X_i$ are generally not
i.i.d. As such, we need new models and methods for modeling the failure process.
This chapter focuses on such models and methods.
There are two categories of models for modeling a failure process. In the first
category of models, the underlying random variable is $N(t)$, which is the number of
failures by $t$; and in the second category of models, the underlying random variable
is $X_i$ or $T_i = \sum_{j=1}^{i} X_j$, which is the time to the $i$th failure. We call the first category
of models the discrete models (which are actually counting process models) and the
second category of models the continuous models (which are actually variable-
parameter distribution models).
The model and method for modeling a given failure process depend on whether
or not the inter-failure times have a trend. As such, the trend analysis for a failure
process plays a fundamental role in reliability analysis of repairable systems. When
the trend analysis indicates that there is no trend for a set of inter-failure times, a
further test for their randomness is needed.
This chapter is organized as follows. We first look at the failure counting
process models in Sect. 6.2, and then look at the distribution models in Sect. 6.3.
$$M(t) = F(t) + \int_0^t M(t - x) f(x)\,dx. \tag{6.1}$$
where $\mu$ and $\sigma$ are the mean and standard deviation of the inter-failure time. The
variance of $N(t)$ is given by
$$V(t) = \sum_{n=1}^{\infty} (2n - 1) F^{(n)}(t) - [M(t)]^2. \tag{6.3}$$
For a repairable system, a renewal process assumes that the system is returned to
an ‘‘as new’’ condition every time it is repaired. As such, the distribution of Xi is the
same as the distribution of $X_1$. For a multi-component series system, if each
component is replaced by a new one when it fails, then the system failure process is
a superposed renewal process. In general, a superposed renewal process is not a
renewal process. In fact, it is close to a minimal repair process when the number of
components is large.
If the times between failures are independent and identically exponentially
distributed, the renewal process reduces to a homogeneous Poisson process (also
termed a stationary Poisson process). In this case, $N(t)$ follows a Poisson
distribution with the Poisson parameter $\lambda t$, where $\lambda$ is the failure intensity.
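As a minimal sketch of this special case, the Poisson probability $P[N(t) = n] = e^{-\lambda t}(\lambda t)^n/n!$ can be evaluated directly. The function names below are illustrative and not taken from the text.

```python
import math

def hpp_pmf(n, lam, t):
    """P[N(t) = n] for a homogeneous Poisson process with intensity lam."""
    mean = lam * t  # Poisson parameter lambda * t
    return math.exp(-mean) * mean ** n / math.factorial(n)

def hpp_mean_count(lam, t):
    """Expected number of failures by time t, E[N(t)] = lambda * t."""
    return lam * t

# Hypothetical intensity of 0.5 failures per unit time, observed up to t = 2.
p_no_failure = hpp_pmf(0, 0.5, 2.0)
expected_failures = hpp_mean_count(0.5, 2.0)
```

Because the intensity is constant, the expected count grows linearly in $t$, which is why a stationary process shows a straight-line MCF.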
where $M(t) = E[N(t)]$ is the mean cumulative function. In this model, $\beta$ provides
the following information:
• if $\beta = 1$, the failure arrivals follow a homogeneous Poisson process;
• if $\beta > 1$, the system deteriorates with time; and
• if $\beta < 1$, the system improves with time.
92 6 Reliability Modeling of Repairable Systems
Depending on the time origin and observation window, the power-law model
can have two variants. If we begin the failure counting process at $t = d$ (either
known or unknown) and set this time as the time origin, then Eq. (6.5) can be
revised as
$$M(t) = \left(\frac{t + d}{\eta}\right)^{\beta} - \left(\frac{d}{\eta}\right)^{\beta}. \tag{6.6}$$
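The following sketch evaluates the power-law MCF and its shifted-origin variant. It assumes that Eq. (6.5) has the form $M(t) = (t/\eta)^{\beta}$, which is what the shifted form gives at $d = 0$. The function names and the tolerance used to classify $\beta$ are illustrative choices.

```python
def power_law_mcf(t, beta, eta, d=0.0):
    """Power-law MCF; d > 0 gives the shifted-origin variant of Eq. (6.6)."""
    return ((t + d) / eta) ** beta - (d / eta) ** beta

def power_law_intensity(t, beta, eta):
    """Derivative of (t/eta)**beta with respect to t (failure intensity)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def trend_from_beta(beta, tol=1e-9):
    """Qualitative reading of the shape parameter beta."""
    if abs(beta - 1.0) <= tol:
        return "stationary (HPP)"
    return "deteriorating" if beta > 1.0 else "improving"
```

With $\beta = 1$ the intensity is the constant $1/\eta$, so the model collapses to the homogeneous Poisson case.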
Suppose we have several failure point processes that come from nominally identical
systems with different observation windows $((0, T_i),\ 1 \le i \le n)$. Arrange all the
failure data in ascending order. The ordered data are denoted as
where the $t_j$ are failure times (i.e., not including censored times) and $s_j$ is the number
of the systems under observation at $t_j$. The nonparametric estimate of the MCF is
given by
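A minimal sketch of such an estimate follows. It assumes the commonly used recursive form in which each failure at $t_j$ adds $1/s_j$ to the running MCF, starting from zero. This form is an assumption of the sketch, and the function name is hypothetical.

```python
def empirical_mcf(failure_times, at_risk):
    """Nonparametric MCF: cumulative sum of 1/s_j over the ordered failure times.

    failure_times: ordered failure times t_j pooled over all systems
    at_risk:       number of systems s_j under observation at each t_j
    """
    points = []
    total = 0.0
    for t_j, s_j in zip(failure_times, at_risk):
        total += 1.0 / s_j  # each failure contributes 1/s_j
        points.append((t_j, total))
    return points

# Hypothetical pooled data: two systems observed at first, one later.
mcf_points = empirical_mcf([1.0, 2.0, 3.0], [2, 2, 1])
```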
For a given theoretical model $M_\theta(t)$ such as the power-law model, the parameter
set $\theta$ can be estimated by the MLM or LSM. The LSM is simple and estimates the
parameters by minimizing the sum of squared errors given by:
$$SSE = \sum_{j=1}^{m} [M_\theta(t_j) - M(t_j)]^2. \tag{6.11}$$
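One simple way to carry out this minimization for the power-law model is sketched below. It assumes $M_\theta(t) = (t/\eta)^{\beta}$ and uses a grid search over $\beta$ with a closed-form scale for each $\beta$. This is an illustrative choice, not a method prescribed by the text, and the function name and grid are hypothetical.

```python
def fit_power_law_lsm(times, mcf_values, beta_grid):
    """Return (beta, eta, sse) minimizing the SSE of Eq. (6.11) over beta_grid."""
    best = None
    for beta in beta_grid:
        # For fixed beta, M(t) = a * t**beta with a = eta**(-beta) is linear
        # in a, so the least-squares value of a has a closed form.
        num = sum((t ** beta) * m for t, m in zip(times, mcf_values))
        den = sum(t ** (2.0 * beta) for t in times)
        a = num / den
        eta = a ** (-1.0 / beta)
        sse = sum((a * t ** beta - m) ** 2 for t, m in zip(times, mcf_values))
        if best is None or sse < best[2]:
            best = (beta, eta, sse)
    return best

# Hypothetical empirical MCF values generated from beta = 1.5, eta = 2.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
mcf_values = [(t / 2.0) ** 1.5 for t in times]
beta_grid = [0.5 + 0.01 * k for k in range(201)]
beta_fit, eta_fit, sse_fit = fit_power_law_lsm(times, mcf_values, beta_grid)
```

A general-purpose optimizer could replace the grid search. The grid keeps the sketch dependency-free and makes the search range explicit.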
We consider three categories of models that can be used for modeling failure
processes in different situations. They are:
• Ordinary life distribution models;
• Imperfect maintenance models; and
• Distribution models with the parameters varying with the numbers of failures or
system age.
We briefly discuss them below.
Ordinary life distribution models can be used to model the renewal process and
minimal repair process. When each failure is corrected by a replacement or perfect
repair, times to failure form a renewal process, whose inter-failure times are i.i.d.
random variables and hence can be modeled by an ordinary life distribution.
When each failure is corrected by a minimal repair, times to failure form a
minimal repair process. After a minimal repair completed at age t, the time to the
next failure follows the conditional distribution of the underlying distribution (i.e.,
the distribution of $X_1 = T_1$). This implies that the distribution of inter-failure times
can be expressed in terms of the underlying distribution though they are not i.i.d.
random variables.
When each failure is corrected by either a replacement or a minimal repair, the
inter-failure times can be modeled by a statistical distribution. Brown and Proschan
[2] develop such a model. Here, the item is returned to the good-as-new state with
probability $p$ and to the bad-as-old state with probability $q = 1 - p$. The parameter
$p$ can be constant or time-varying. The process reduces to the renewal process
when $p = 1$ and to the minimal repair process when $p = 0$.
When each failure is corrected by an imperfect maintenance, the time to the next
failure depends on the effects of prior maintenance actions. As such, the ordinary
life distribution is no longer applicable, and a category of imperfect maintenance
models can be used for modeling subsequent failures.
Preventive maintenance (PM) aims to maintain a working item in a satisfactory
condition. PM is often imperfect: its effect lies between that of perfect
maintenance and that of minimal maintenance. As such, the effect of PM can be
represented by an imperfect maintenance model.
There are a large number of imperfect maintenance models in the literature.
Pham and Wang [11] present a review of imperfect maintenance models, and Wu
[16] provides a comprehensive review of the PM models (which are actually
imperfect maintenance models). Several typical imperfect maintenance models will
be presented in Chap. 16.
This category of models assumes that the $X_i$ can be represented by the same life
distribution family $F(x; \theta_i)$ with the parameter set $\theta_i$ being functions of $i$ or $t_i$.
Clearly, when $\theta_i$ is independent of $i$ or $t_i$, the model reduces to an ordinary
distribution model.
For the bus-motor data shown in Table 5.2, Jiang [4] presents a normal variable-
parameter model, whose parameters vary with i; and Jiang [6] presents a Weibull
variable-parameter model, whose parameters are also functions of i. The main
advantage of these models is that they can be used to infer the life distribution after
a future failure.
6.4.1 An Illustration
Example 6.1 The data shown in Table 6.1 come from Ref. [12] and deal with
failure times (in 1000 h) of a repairable component in a manufacturing system.
6.4 A Procedure for Modeling Failure Processes 95
Under the assumption that the times to failure form an RP with the underlying
distribution being the Weibull distribution, we obtained the MLEs of the parameters
shown in the second column of Table 6.2.
Under the NHPP assumption with the MCF given by Eq. (6.5) (i.e., the power-
law model), we obtained the MLEs of the parameters shown in the third column of
Table 6.2 (for the MLE of the power-law model, see Sect. 11.5.3.1). The empirical
and fitted MCFs are shown in Fig. 6.1.
From Table 6.2 and Fig. 6.1, we have the following observations:
• the parameters of the fitted models are significantly different, but
• the plots of $M(t)$ are close to each other.
A question is which model we should use. The answer to this question depends
on the appropriateness of the assumption for the failure process. This deals with
testing whether the failure process is stationary and whether the inter-failure times
are i.i.d. Such tests are called the test for trend and the test for randomness, respectively. As
a result, a procedure is needed to integrate these tests into the modeling process.
Modeling a failure point process involves a multi-step procedure. Specific steps are
outlined as follows.
Step 1: Draw the plot of the MCF of data and other plots (e.g., running arith-
metic average plot, which will be presented later). If the plots indicate that the trend
is obvious, implement Step 3; otherwise, implement Step 2.
[Fig. 6.1 Empirical and fitted MCFs: M(t) versus t; plotted curves labeled NHPP and Asymptotic RF]
Step 2: If the trend is not very obvious, carry out one or more tests for
stationarity to further check for trend. If no trend is confirmed, a further test of the i.i.d.
assumption needs to be carried out. If the i.i.d. assumption is confirmed, the data can be
modeled by an appropriate life distribution model.
Step 3: This step is implemented when the inter-failure times have a trend or
they are not i.i.d. In this case, the data should be modeled using nonstationary
models such as the power-law model, variable-parameter models, or the like.
that there exists a trend in the process. However, when we cannot reject the null
hypothesis at the given level of significance, it does not necessarily imply that we
accept the null hypothesis unless the test has a particularly high power (which is the
probability of correctly rejecting the null hypothesis given that it is false [10]). This
is because the conclusion is made based on the assumption that the null hypothesis
is true and depends on the significance level (which is the probability of rejecting the
null hypothesis given that it is true [10]), whose value is commonly
small (0.05 or 0.01).
In this section, we present several tests for stationarity. We will use the data
shown in Table 6.1 to illustrate each test.
A plot of data helps get a rough impression for trend before conducting a quanti-
tative trend test. Such a plot is the empirical MCF. If the process is stationary, the
plot of empirical MCF is approximately a straight line through the origin.
Another useful plot is the plot of the running arithmetic average. Consider a set
of inter-failure times $(x_i,\ 1 \le i \le n)$. Let $t_i = \sum_{j=1}^{i} x_j$. The running arithmetic
average is defined as below:
If the running arithmetic average increases as the failure number increases, the
time between failures is increasing, implying that the system’s reliability gets
improved with time. Conversely, if the running arithmetic average decreases with
the failure number, the average time between failures is decreasing, implying that
the system’s reliability deteriorates with time. In other words, if the process is
stationary, the plot of running arithmetic average is approximately a horizontal line.
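A short sketch of this computation is given below. It assumes the running arithmetic average is $r(i) = t_i / i$, i.e., the mean of the first $i$ inter-failure times. The function name and data are hypothetical.

```python
from itertools import accumulate

def running_average(inter_failure_times):
    """r(i) = t_i / i, where t_i is the cumulative time to the i-th failure."""
    cumulative = accumulate(inter_failure_times)
    return [t_i / i for i, t_i in enumerate(cumulative, start=1)]

# Hypothetical inter-failure times that lengthen over time (improving system).
r_values = running_average([2.0, 4.0, 6.0])
```

An increasing sequence of values, as in this hypothetical data set, corresponds to the improving case described above.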
Figure 6.2 shows the plot of running arithmetic average for the data in Table 6.1.
As seen, the reliability gets improved at the beginning and then becomes stationary.
For this case, one could implement the second step or directly go to the third step.
Tests with HPP null hypothesis include Crow test, Laplace test, and Anderson-
Darling test.
[Fig. 6.2 Running arithmetic average r(i) versus failure number i for the data in Table 6.1]
This test is developed by Crow [3] and is based on the power-law model given by
Eq. (6.5). When $\beta = 1$, the failure process follows an HPP. As such, the test
examines whether an estimate of $\beta$ is significantly different from 1. The null
hypothesis is $\beta = 1$ and the alternative hypothesis is $\beta \ne 1$.
For one system on test, the maximum likelihood estimate of b is
$$\hat{\beta} = n \Big/ \sum_{i=1}^{n} \ln(T/t_i) \tag{6.13}$$
where $n$ is the number of observed failures and $T$ is the censored time, which can be
larger than or equal to $t_n$. The test statistic $2n/\hat{\beta}$ follows a chi-squared distribution
with $2n$ degrees of freedom. The rejection criterion for null hypothesis $H_0$ is
given by
$$2n/\hat{\beta} < \chi^2_{2n,\,1-\alpha/2} \quad \text{or} \quad 2n/\hat{\beta} > \chi^2_{2n,\,\alpha/2} \tag{6.14}$$
where $\chi^2_{k,p}$ is the inverse of the one-tailed probability of the chi-squared distribution
associated with probability $p$ and degree of freedom $k$.
Example 6.2 Test the stationarity of the data in Table 6.1 using the Crow test.
From Eq. (6.13), we have $\hat{\beta} = 0.9440$ and $2n/\hat{\beta} = 25.423$. For significance level
$\alpha = 0.05$, $\chi^2_{2n,\alpha/2} = 39.364$ and $\chi^2_{2n,1-\alpha/2} = 12.401$. As a result, we cannot reject $H_0$.
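The Crow test can be sketched as follows. The critical values are passed in by the caller (for example, 12.401 and 39.364 for $2n = 24$ and $\alpha = 0.05$, as quoted above), so the sketch needs only the standard library. The function name and the data are hypothetical, since Table 6.1 is not reproduced here.

```python
import math

def crow_test(failure_times, T, chi2_lower, chi2_upper):
    """Crow test per Eqs. (6.13)-(6.14).

    chi2_lower, chi2_upper: critical values for 2n degrees of freedom,
    supplied by the caller (e.g., from a chi-squared table).
    Requires at least one failure time strictly below T.
    """
    n = len(failure_times)
    beta_hat = n / sum(math.log(T / t_i) for t_i in failure_times)
    statistic = 2.0 * n / beta_hat
    reject = statistic < chi2_lower or statistic > chi2_upper
    return beta_hat, statistic, reject

# Hypothetical data: 3 failures observed up to T = 8; the critical values
# 1.237 and 14.449 are the tabulated chi-squared values for 6 degrees of
# freedom at alpha = 0.05.
beta_hat, statistic, reject = crow_test([1.0, 2.0, 4.0], 8.0, 1.237, 14.449)
```

Note that $2n/\hat{\beta}$ equals $2\sum_i \ln(T/t_i)$, so the statistic can also be computed without forming $\hat{\beta}$ first.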
$$U = \sum_{i=1}^{n-1} t_i. \tag{6.15}$$
The test statistic is the standard normal score given by $Z = (U - \mu_U)/\sigma_U$. For large
$n$, $Z$ approximately follows a standard normal distribution. The rejection criterion
for H0 is given by
Example 6.2 (continued) Test the stationarity of the data in Table 6.1 using the
Laplace test.
From Eqs. (6.15) and (6.16), we have $U = 132.687$, $\mu_U = 121.682$, $\sigma_U = 21.182$,
and $Z = 0.5280$. For $\alpha = 0.05$, $z_{\alpha/2} = -z_{1-\alpha/2} = 1.96$. As a result, we
cannot reject H0 .
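A sketch of the Laplace test is given below. It assumes the usual failure-truncated forms $\mu_U = (n-1)t_n/2$ and $\sigma_U = t_n\sqrt{(n-1)/12}$ for the mean and standard deviation of $U$ under the HPP hypothesis, and a two-sided rejection rule $|Z| > z_{\alpha/2}$. Both are assumptions of this sketch, and the function name and data are hypothetical.

```python
import math

def laplace_test(failure_times, z_crit=1.96):
    """Laplace trend test with U defined as in Eq. (6.15).

    Assumes observation stops at the last failure time t_n and uses the
    moments of U under an HPP (assumed forms, see the lead-in).
    """
    n = len(failure_times)
    t_n = failure_times[-1]
    u = sum(failure_times[:-1])  # sum of the first n-1 failure times
    mu_u = (n - 1) * t_n / 2.0
    sigma_u = t_n * math.sqrt((n - 1) / 12.0)
    z = (u - mu_u) / sigma_u
    return u, mu_u, sigma_u, z, abs(z) > z_crit

# Hypothetical, evenly spaced failure times: no trend expected.
u, mu_u, sigma_u, z, reject = laplace_test([1.0, 2.0, 3.0, 4.0])
```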
The Anderson–Darling test for trend is based on the Anderson–Darling test statistic
given by (see Ref. [8]):
$$AD = -n' - \frac{1}{n'} \sum_{i=1}^{n'} (2i - 1)\left[\ln\frac{t_i}{T} + \ln\left(1 - \frac{t_{n'+1-i}}{T}\right)\right] \tag{6.18}$$
From Eq. (6.18), we have $AD = 0.2981$ and hence the null hypothesis is not
rejected for a ¼ 0:05.
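The statistic of Eq. (6.18) can be computed as sketched below. The sketch takes $n'$ to be the number of failure times passed in and requires every $t_i$ to be strictly less than $T$. Both points are assumptions made for illustration, and the function name and data are hypothetical.

```python
import math

def anderson_darling_trend(failure_times, T):
    """Anderson-Darling trend statistic of Eq. (6.18).

    failure_times: ordered failure times, each strictly below T
    n' is taken as len(failure_times) (an assumption of this sketch).
    """
    n = len(failure_times)
    u = [t_i / T for t_i in failure_times]  # scaled failure times in (0, 1)
    total = 0.0
    for i in range(1, n + 1):
        # u[i - 1] is t_i / T; u[n - i] is t_{n'+1-i} / T (1-based indices).
        total += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - total / n

# Hypothetical data: two failures in an observation window of length 1.
ad = anderson_darling_trend([0.25, 0.75], 1.0)
```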
Tests with RP null hypothesis include Mann test, Lewis–Robinson test, and gen-
eralized Anderson–Darling test.
This test is presented in Ref. [1] and is sometimes called reverse arrangement test or
pairwise comparison nonparametric test (see Refs. [14, 15]). The null hypothesis is
renewal process and the alternative hypothesis is nonrenewal process. The test
needs to compare all the interarrival times $x_j$ and $x_i$ for $j > i$. Let $u_{ij} = 1$ if $x_j > x_i$;
otherwise $u_{ij} = 0$. The number of reversals of the data is given by
$$U = \sum_{i<j} u_{ij}. \tag{6.19}$$
Too many reversals indicate an increasing trend, too few reversals imply a
decreasing trend, and there is no trend if the number of reversals is neither large nor
small.
Under H0 , the mean and variance of U are given, respectively, by
The test statistic is the standard normal score given by $Z = (U - \mu_U)/\sigma_U$. For large
$n$ (e.g., $n \ge 10$), $Z$ approximately follows a standard normal distribution. The
rejection criterion for H0 is given by Eq. (6.17).
Example 6.2 (continued) Test the stationarity of the data in Table 6.1 using the
Mann test.
From Eqs. (6.19) and (6.20), we have $U = 44$, $\mu_U = 33$, $\sigma_U = 7.2915$ and
$Z = 1.5086$. As a result, we cannot reject $H_0$ for $\alpha = 0.05$.
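The Mann test can be sketched as follows. The sketch assumes the standard moments $\mu_U = n(n-1)/4$ and $\sigma_U^2 = n(n-1)(2n+5)/72$, which give $\mu_U = 33$ and $\sigma_U \approx 7.2915$ for $n = 12$, in agreement with the values quoted above. It also assumes a two-sided rejection rule $|Z| > z_{\alpha/2}$. The function name and data are hypothetical.

```python
import math

def mann_test(x, z_crit=1.96):
    """Mann (reverse arrangement) test on inter-failure times x."""
    n = len(x)
    # Count pairs (i, j) with i < j and x[j] > x[i], as in Eq. (6.19).
    u = sum(1 for i in range(n) for j in range(i + 1, n) if x[j] > x[i])
    mu_u = n * (n - 1) / 4.0
    sigma_u = math.sqrt(n * (n - 1) * (2 * n + 5) / 72.0)
    z = (u - mu_u) / sigma_u
    return u, mu_u, sigma_u, z, abs(z) > z_crit

# Hypothetical strictly increasing inter-failure times (n = 12): every pair
# counts, so U takes its maximum value n(n-1)/2.
u, mu_u, sigma_u, z, reject = mann_test([float(k) for k in range(1, 13)])
```

The pairwise count is quadratic in $n$, which is adequate for the small samples typical of failure data.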
Laplace test statistic and CV is the coefficient of variation for the observed inter-
arrival times. The critical value for rejecting H0 is shown in the third column of
Table 6.3.
Example 6.2 (continued) Test the stationarity of the data in Table 6.1 using the
Lewis–Robinson test.
Using the approach outlined above, we have $Z = 0.5280$, $CV = 0.4051$ and
$LR = 1.3034$. As a result, we still cannot reject $H_0$ for $\alpha = 0.05$.
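The numbers in this example are consistent with $LR = Z/CV$, since $0.5280/0.4051 \approx 1.3034$. A minimal sketch under that assumption follows. The use of the sample standard deviation (divisor $n-1$) for $CV$ is a further assumption, and the function names are hypothetical.

```python
import statistics

def coefficient_of_variation(x):
    """CV of the inter-arrival times: sample standard deviation over mean."""
    return statistics.stdev(x) / statistics.mean(x)

def lewis_robinson_statistic(z_laplace, cv):
    """LR statistic, taken here as the Laplace score divided by CV."""
    return z_laplace / cv

# Values quoted in the example above.
lr = lewis_robinson_statistic(0.5280, 0.4051)
```

Dividing by $CV$ rescales the Laplace score so that the null hypothesis becomes a renewal process rather than an HPP. For exponential inter-arrival times $CV$ is close to 1 and the two statistics nearly coincide.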
The test statistic of the generalized Anderson–Darling test is given by (see Ref. [8])
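$$GAD = \frac{(n - 4)\,\bar{x}^2}{\sigma^2} \sum_{i=1}^{n} \left[ q_i^2 \ln\frac{i}{i-1} + (q_i + r_i)^2 \ln\left(1 + \frac{1}{n-i}\right) - \frac{r_i^2}{n} \right] \tag{6.21}$$
where
$$q_i = (t_i - i x_i)/t_n, \quad r_i = \frac{n x_i}{t_n} - 1, \quad \text{and} \quad \sigma^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2$$
with
$$q_i^2 \ln\frac{i}{i-1}\bigg|_{i=1} = 0, \qquad (q_i + r_i)^2 \ln\left(1 + \frac{1}{n-i}\right)\bigg|_{i=n} = 0.$$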
It is one-sided and the null hypothesis is rejected if GAD is greater than the critical
value, which is shown in the last column of Table 6.3.
Example 6.2 (continued) Test the stationarity of the data in Table 6.1 using the
generalized Anderson–Darling test.
From Eq. (6.21), we have $GAD = 1.3826$ and hence cannot reject the null
hypothesis for $\alpha = 0.05$.
The performances of the tests discussed above have been studied (see Refs. [8, 9,
15]), and the results are summarized in Table 6.4. It is noted that no test provides
“very good” performance for the decreasing case.
Randomness means that the data are not deterministic and/or periodic. Tests for
randomness fall into two categories: nonparametric methods and parametric
methods. In this section, we focus on nonparametric methods.