Reliability Engineering Methods and Applications
Reliability Engineering
Theory and Applications
Edited by Ilia Vonta and Mangey Ram
Reliability Engineering
Methods and Applications
Edited by Mangey Ram
For more information about this series, please visit: https://www.crcpress.com/Reliability-Engineering-Theory-and-Applications/Vonta-Ram/p/book/9780815355175
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, trans-
mitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereaf-
ter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.
com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the
CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Chapter 5 Markov Chains and Stochastic Petri Nets for Availability and Reliability Modeling
Paulo Romero Martins Maciel, Jamilson Ramalho Dantas, and Rubens de Souza Matos Júnior
Chapter 15 Vulnerability Discovery and Patch Modeling: State of the Art
Avinash K. Shrivastava, P. K. Kapur, and Misbah Anjum
Index
Preface
The theory, methods, and applications of reliability analysis have been developed
significantly over the last 60 years and have been recognized in many publications.
Therefore, awareness about the importance of each reliability measure of the system
and its fields is very important to a reliability specialist.
This book Reliability Engineering: Methods and Applications is a collection of
different models, methods, and unique approaches to deal with the different techno-
logical aspects of reliability engineering. A deep study of the earlier approaches and
models has been done to bring out better and advanced system reliability techniques
for different phases of the working of the components. Scope for future develop-
ments and research has been suggested.
The main areas studied follow under different chapters:
Chapter 1 provides the review and analysis of preventive maintenance modeling
issues. The discussed preventive maintenance models are classified into two main
groups for one-unit and multi-unit systems.
Chapter 2 provides a literature review of the most commonly used optimal
inspection maintenance models. It considers how an appropriate inspection strategy
is chosen according to the complexity of the system (single-stage or multi-stage) and
the requirements of quality, production, minimum cost, and reduced frequency of
failures.
Chapter 3 presents the application of stochastic processes in degradation modeling
to assess product/system performances. Among the continuous stochastic processes,
the Wiener, Gamma, and inverse Gaussian processes are discussed and applied for
degradation modeling of engineering systems using accelerated degradation data.
Chapter 4 presents a novel approach for analysis of Failure Modes and Effect
Analysis (FMEA)-related documents through a semi-automatic procedure involving
semantic tools. The aim of this work is reducing the time of analysis and improving
the level of detail of the analysis through the introduction of an increased number of
considered features and relations among them.
Chapter 5 studies the reliability and availability modeling of a system through
Markov chains and stochastic Petri nets.
Chapter 6 talks about the fault tree analysis technique for the calculation of
reliability and risk measurement in the transportation of radioactive materials.
This study aims at reducing the risk of environmental contamination caused by
human errors.
Chapter 7 surveys the failure rate functions of replacement times, random, and
periodic replacement models and their properties for an understanding of the com-
plex maintenance models theoretically.
Chapter 8 highlights the design of accelerated life tests with competing failure
modes which give rise to competing risk analysis. This design helps in the prediction
of the product reliability accurately, quickly, and economically.
Mangey Ram
Graphic Era (Deemed to be University), India
Acknowledgments
The Editor acknowledges CRC Press for this opportunity and professional sup-
port. My special thanks to Ms. Cindy Renee Carelli, Executive Editor, CRC Press/
Taylor & Francis Group for the excellent support she provided me to complete this
book. Thanks to Ms. Erin Harris, Editorial Assistant to Mrs. Cindy Renee Carelli,
for her follow up and aid. Also, I would like to thank all the chapter authors and
reviewers for their availability for this work.
Mangey Ram
Graphic Era (Deemed to be University), India
Editor
Dr. Mangey Ram received a PhD degree with a major in Mathematics and a minor in
Computer Science from G. B. Pant University of Agriculture and Technology,
Pantnagar, India. He has been a Faculty Member for over 11 years and has taught
several core courses in pure and applied mathematics at undergraduate, postgradu-
ate, and doctorate levels. He is currently a Professor at Graphic Era (Deemed to be
University), Dehradun, India. Before joining Graphic Era, he was a Deputy Manager
(Probationary Officer) with Syndicate Bank for a short period. He is Editor-in-Chief
of International Journal of Mathematical, Engineering and Management Sciences
and the guest editor and member of the editorial board of various journals. He is
a regular reviewer for international journals, including IEEE, Elsevier, Springer,
Emerald, John Wiley, Taylor & Francis, and many other publishers. He has pub-
lished 150-plus research publications in IEEE, Taylor & Francis, Springer, Elsevier,
Emerald, World Scientific, and many other national and international journals of
repute and presented his works at national and international conferences. His fields
of research are reliability theory and applied mathematics. Dr. Ram is a Senior
Member of the IEEE, life member of Operational Research Society of India, Society
for Reliability Engineering, Quality and Operations Management in India, Indian
Society of Industrial and Applied Mathematics, member of International Association
of Engineers in Hong Kong, and Emerald Literati Network in the UK. He has been
a member of the organizing committee of a number of international and national
conferences, seminars, and workshops. He was conferred with the Young Scientist
Award by the Uttarakhand State Council for Science and Technology, Dehradun,
in 2009. He was awarded the Best Faculty Award in 2011; the Research Excellence
Award in 2015; and the Outstanding Researcher Award in 2018 for his significant
contribution in academics and research at Graphic Era (Deemed to be University)
in Dehradun, India.
Contributors
Misbah Anjum
Amity Institute of Information Technology
Amity University
Noida, India

Laurent Bordes
Laboratory of Mathematics and its Applications - IPRA, UMR 5142
University of Pau and Pays Adour - CNRS - E2S UPPA
Pau, France

Jose Carpio
Department of Electrical, Electronic and Control Engineering
Spanish National Distance Education University
Madrid, Spain

Jamilson Ramalho Dantas
Departamento de Ciência da Computação
Centro de Informática da UFPE - CIN
Recife, Pernambuco, Brasil
and
Departamento de Ciência da Computação
Universidade Federal do Vale do São Francisco - UNIVASF
Campus Salgueiro
Salgueiro, Pernambuco, Brasil

Maritza Rodriguez Gual
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

Kanchan Jain
Department of Statistics
Panjab University
Chandigarh, India

Rivero Oliva Jesús
Departamento de Engenharia Nuclear
Universidade Federal do Rio de Janeiro (UFRJ)
Rio de Janeiro, Brazil

Salomón Llanes Jesús
GAMMA SA
La Habana, Cuba

P. K. Kapur
Amity Centre for Interdisciplinary Research
Amity University
Noida, India

Akshay Kumar
Department of Mathematics
Graphic Era Hill University
Dehradun, India

Shah Limon
Industrial & Manufacturing Engineering
North Dakota State University
Fargo, North Dakota

Claudio Cunha Lopes
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil
CONTENTS
1.1 Introduction
1.2 Preventive Maintenance Modeling for Single-Unit Systems
1.3 Preventive Maintenance Modeling for Multi-unit Systems
1.4 Conclusions and Directions for Further Research
References
1.1 INTRODUCTION
Preventive maintenance (PM) is an important part of facilities management in many
of today’s companies. The goal of a successful PM program is to establish consistent
practices designed to improve the performance and safety of the operated equipment.
Recently, this type of maintenance strategy has been applied widely in many technical
systems such as production, transport, or critical infrastructure systems.
Many studies have been devoted to PM modeling since the 1960s. One of the first
surveys of maintenance policies for stochastically failing equipment—where PM
models are under investigation—is given in [1]. In this work, the author investigated
PM for known and uncertain distributions of time to failure. Pierskalla and Voelker [2]
prepared another excellent survey of maintenance models for proper scheduling and
optimizing maintenance actions, which Valdez-Flores and Feldman [3] updated later.
Other valuable surveys summarize the research and practice in this area in different
ways (e.g., [4–18]). In turn, the comparison between time-based maintenance and
condition-based maintenance is addressed, e.g., in [19,20].
In this chapter, the author reviews and summarizes recent PM policies developed
and presented in the literature. The adopted classification of maintenance models
is based on developments given in [15–18]. It includes two main groups of
maintenance strategies: those for single-unit systems and those for multi-unit
systems. The main classification scheme of PM models for technical systems is
presented in Figure 1.1.
FIGURE 1.1 The classification for preventive maintenance models for technical system.
(Own contribution based on Wang, H., European Journal of Operational Research, 139,
469–489, 2002; Werbińska-Wojciechowska, S., Technical System Maintenance, Delay-time-
based modeling, Springer, London, UK, 2019; Werbińska-Wojciechowska, S., Multicomponent
technical systems maintenance models: State of art (in Polish), in Siergiejczyk, M. (ed.),
Technical Systems Maintenance Problems: Monograph (in Polish), Publication House of
Warsaw University of Technology, Warsaw, Poland, pp. 25–57, 2014.)
\[
C(T) = \frac{c_r F(T) + c_p \bar{F}(T)}{\int_0^T \bar{F}(t)\,dt} \tag{1.1}
\]
where:
C(T) is the long-run expected cost per unit time
c_p is the cost of preventive replacement of a unit
c_r is the cost of failed unit replacement
F(t) is the probability distribution function of system/unit lifetime, with \bar{F}(t) = 1 - F(t)
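
As a rough numerical illustration of Equation (1.1), the cost rate can be evaluated on a grid of candidate replacement ages. The sketch below is not taken from the cited works: the Weibull lifetime distribution, the cost values, and the function names are assumptions chosen only to make the example concrete, and the integral is approximated with the trapezoidal rule.

```python
import math

def weibull_cdf(t, shape, scale):
    """F(t) for a Weibull lifetime (an assumed distribution, for illustration only)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def arp_cost_rate(T, c_r, c_p, shape, scale, steps=2000):
    """Long-run expected cost per unit time C(T), following Equation (1.1)."""
    h = T / steps
    # Trapezoidal rule for the integral of the survival function over [0, T].
    survival = [1.0 - weibull_cdf(i * h, shape, scale) for i in range(steps + 1)]
    mean_cycle = h * (sum(survival) - 0.5 * (survival[0] + survival[-1]))
    F_T = weibull_cdf(T, shape, scale)
    return (c_r * F_T + c_p * (1.0 - F_T)) / mean_cycle

def best_age(c_r, c_p, shape, scale, grid):
    """Grid search for the replacement age with the lowest cost rate."""
    return min(grid, key=lambda T: arp_cost_rate(T, c_r, c_p, shape, scale))
```

With a shape parameter of 1 the Weibull lifetime is exponential, the failure rate is constant, and the cost rate decreases monotonically in T, so preventive replacement brings no benefit. An interior optimum appears only for an increasing failure rate (shape greater than 1) and a failure cost c_r larger than c_p.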
[Figure: classification of PM models for single-unit systems. Panels: ARP models, BRP models, sequential PM models, and limit PM models for single-unit systems. Features listed across the panels: minimal repair implementation; perfect/imperfect repair; shock modeling; cost/availability/reliability constraints; inspection policy; finite/infinite time horizon; new/used unit maintenance modeling; negligible/non-negligible downtime; hybrid models; dynamic reliability models; mixed PM models.]
Analytical equations of the expected cost rate with numerical solutions are provided.
The authors also present the comparison of given replacement policies.
Another extension of ARP modeling is given in [53], where the authors investigate
the problem of PM uncertainty by assuming that the quality of PM actions is a random
variable with a defined probability distribution. Following this, the authors analyze an
age reduction PM model and a failure rate PM model. Under the age reduction PM
model, it is assumed that each PM reduces operational stress to the existing time units
previous to the PM intervention, where the restoration interval is less than or equal to
the PM interval. The optimization criterion is also based on the maintenance cost structure.
The issues of warranty policy are investigated in [54]. The author in this work
investigates a general age-replacement model that incorporates minimal repair,
planned replacement, and unplanned replacement for a product under a renewing
free-replacement warranty policy. The main assumptions of the ARP are compatible
with [43,44]. The authors assume that all the product failures that cause minimal repair
can be detected instantly and repaired instantaneously by a user. Thus, it is assumed
in this study that the user of the product should be responsible for all minimal repairs
before and after the warranty expires. Following this, for the product with an increas-
ing failure rate function, the authors show that a unique optimal replacement age exists
such that the long-run expected cost rate is minimized. The authors also compare
analytically the optimal replacement ages for products with and without warranty.
The warranty policy problem is analyzed in [55], where the authors propose
an age-dependent failure-repair model to analyze the warranty costs of products.
In this paper, the authors consider four typical warranty policies (fixed warranty,
renewing warranty, mixture of minimal and age-reducing repairs, and partial rebate
warranty).
The last group of ARP models applies to PM strategies based on the implementation
of shock models. A simple age-based policy with a shock model is presented
in [56]. In this work, the authors introduce three main cumulative damage models:
(1) a model in which a unit is subjected to shocks and suffers some damage from each
shock, (2) a model that includes periodic inspections, and (3) a model in which the
amount of damage increases linearly with time. For the defined shock models, optimal
replacement policies are derived that minimize the expected cost rate.
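
The idea of a cumulative damage model can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model of [56]: each shock is assumed to add an independent, exponentially distributed amount of damage, and the unit is assumed to fail when the accumulated damage first exceeds a fixed threshold. The names `threshold` and `mean_damage` are hypothetical.

```python
import random

def shocks_to_failure(threshold, mean_damage, rng):
    """Count shocks until the accumulated damage first exceeds the threshold."""
    total, count = 0.0, 0
    while total <= threshold:
        # Assumption: damage per shock is exponential with the given mean.
        total += rng.expovariate(1.0 / mean_damage)
        count += 1
    return count

def mean_shocks_to_failure(threshold, mean_damage, runs=20000, seed=1):
    """Monte Carlo estimate of the expected number of shocks to failure."""
    rng = random.Random(seed)
    total = sum(shocks_to_failure(threshold, mean_damage, rng) for _ in range(runs))
    return total / runs
```

Under the exponential damage assumption, the expected number of shocks to failure is threshold / mean_damage + 1, which gives a simple check on the simulation.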
The extension of the given models is presented in [57], where the authors study
the mean residual life of a technical object as a measure used in the age replacement
model assessment. The analytical solution is supplied with a new U-statistic test pro-
cedure for testing the hypothesis that the life is exponentially distributed against the
alternative that the life distribution has a renewal-increasing mean residual property.
Another development of general replacement models of systems subject to shocks
is presented in [58], where the authors introduce fatal and nonfatal shocks.
A fatal shock causes total breakdown of the system, which is then replaced,
whereas a nonfatal shock weakens the system and makes it more expensive to run.
Following this, the authors focus on finding the optimal T that minimizes the
long-run expected cost per unit time.
Another extension of the ARP with shock models is to introduce the minimal repair
performance. Following this, in [59] the authors extend the generalized replacement
policy given in [58] by introducing minimal repair of minor failures. Moreover, in the
given PM model, the cost of minimal repair of the system is age dependent.
Later, in [60], the authors introduce an extended ARP policy with minimal repairs
and a cumulative damage model implementation. Under the developed maintenance
policy, the fatal shocks are removed by minimal repairs and the minor shocks increase
the system failure rate by a certain amount. Without external shocks, the failure rate
of the system also increases with age due to the aging process. The optimality criteria
also are focused on the long-run expected cost per unit time. This model is extended
later in [61], where the authors consider the ARP with minimal repair for an extended
cumulative damage model with maintenance at each shock. According to the devel-
oped PM policy, when the total damage does not exceed a predetermined failure level,
the system undergoes maintenance at each shock. When the total damage has reached
a given failure level, the system fails and undergoes minimal repair at each failure.
The system is replaced at periodic times T or at Nth failure, whichever occurs first.
To sum up, many authors usually discuss ARPs of single-unit systems analyti-
cally. The main models that address this maintenance strategy also should be sup-
plemented by works that investigate the problem of ARP modeling with the use of
semi-Markov processes (see, e.g., [62,63]), TTT-plotting (see, e.g., [64]), heuristic
models (see, e.g., [65]), or approximate methods implementation (see, e.g., [66]).
The authors in [67] introduce the new stochastic order for ARP based on the com-
parison of the Laplace transform of the time to failure for two different lifetime
distributions. The comparison of ARP models for a finite horizon case based on a
renewal process application and a negative exponential and Weibull failure-time dis-
tribution is presented in [68]. The additional interesting problems in ARP modeling
may be connected with spare provisioning policy implementation (see, e.g., [69]) or
multi-state systems investigation (see, e.g., [62,70,71]).
The quick overview of the given ARPs is presented in Table 1.1.
Another popular PM policy for single-unit systems is the block replacement policy
(BRP). Under this policy, all units in a system are replaced at periodic times kT,
where k = 1, 2, 3, and so on, regardless of their individual ages. The maintenance
problem usually is aimed at finding the optimal cycle length T either to minimize
total maintenance and operational costs or to maximize system availability.
The simple BRP, in which the maintenance times are negligible, is based on the
optimization of the expected long-run maintenance cost per unit time as a function
of T, given as [72]:
\[
C(T) = \frac{c_r N(T) + c_p}{T} \tag{1.2}
\]
where:
N(t) is the expected number of failures/renewals in the time interval (0, t)
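
Evaluating Equation (1.2) requires the renewal function N(T), which rarely has a closed form. The following sketch approximates it by discretizing the renewal equation N(t) = F(t) + ∫ N(t − x) dF(x) on a uniform grid. The scheme effectively rounds each lifetime up to the next grid point, so it slightly underestimates N(T). The function names and the exponential lifetime used as a check are assumptions made for illustration.

```python
import math

def exponential_cdf(t, rate=1.0):
    """F(t) for an exponential lifetime (assumed here only as a test case)."""
    return 1.0 - math.exp(-rate * t)

def renewal_function(cdf, T, steps=500):
    """Approximate N(t) on a grid over [0, T] from the discretized renewal equation."""
    h = T / steps
    F = [cdf(i * h) for i in range(steps + 1)]
    N = [0.0] * (steps + 1)
    for i in range(1, steps + 1):
        # Convolution term: renewals after a first failure in the j-th grid cell.
        conv = sum(N[i - j] * (F[j] - F[j - 1]) for j in range(1, i + 1))
        N[i] = F[i] + conv
    return N

def brp_cost_rate(cdf, T, c_r, c_p, steps=500):
    """Expected long-run cost per unit time C(T), following Equation (1.2)."""
    return (c_r * renewal_function(cdf, T, steps)[-1] + c_p) / T
```

For an exponential lifetime with rate 1 the exact renewal function is N(t) = t, so the approximation at T = 1 should be close to 1 and the cost rate close to c_r + c_p.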
TABLE 1.1
Summary of PM Policies for Single-Unit Systems
Type of Maintenance Policy | Planning Horizon | Optimality Criterion | Modeling Method | Typical References
ARP | Infinite (∞) | The long-run expected cost per time unit | Bayesian approach | [47]
ARP | Infinite (∞) | The long-run expected cost per unit time, availability function | Analytical | [38]
ARP | Infinite (∞) | The long-run expected cost per time unit | Analytical | [39,40–42,44,53,54,60,118]
ARP | Infinite (∞) | The expected cost rate | Analytical | [45,49,51,56,59,61,66,119]
Sequential PM policy | Infinite (∞) | Expected costs per unit time | Analytical | [90]
Sequential PM policy | Infinite (∞) | Total expected maintenance costs | Genetic algorithm | [92]
Sequential PM policy | Infinite (∞) | Mean cost rate | Bayesian approach | [93]
Sequential PM policy | Infinite (∞)/finite | Expected cost rate till replacement | Analytical | [89]
The main advantage of this policy is its simplicity. However, the main drawback
of simple block replacement policy is that at planned replacement times practically
new items might be replaced and a major portion of the useful life of these units is
wasted. Thus, to overcome this disadvantage, various modifications have been intro-
duced in the literature. The main extensions for the simple BRP include minimal
repair implementation, finite/infinite time horizon, shock modeling use, and inspec-
tion maintenance performance.
The introduction of minimal repair performance was analyzed first in the 1970s
(see, e.g., [41,73]). Later, in [74], the author considers a BRP with minimal repair at
failure for a used unit of age Tax. In the given model, the item is preventively replaced
by new ones at times kT, k = 1, 2, 3, and so on. If the system fails in [(k−1)T, kT−Δδ],
then the item either is replaced by new ones or is repaired minimally. If the failure
occurs in [kT−Δδ , kT ], then the item either is replaced by used ones with age vary-
ing from Δδ to T or is repaired minimally. The choice is random with age-dependent
probability. The cost structure also is age-dependent. For the given assumptions, the
author defines the expected long-run cost per unit time function. This maintenance
model is extended later in [75] for single and multi-unit cases.
An interesting model is introduced in [76], where the authors investigate an optimal
maintenance model for repairable systems under two types of failures with different
maintenance costs. The model assumes that periodic visual inspections are performed
to detect potential failures of type I. For the given assumptions, the
total expected costs are estimated.
The presented models are developed for an infinite time span. In [7], finite
replacement models are considered. Given that the working time of a unit is
specified by a value Two, the long-run expected costs per unit time
are estimated.
Another extension of the simple BRP applies to shock modeling implementation.
For example, in [77] the authors investigate the system subjected to shocks, which occur
independently and according to a Poisson process with intensity rate λs. The occurred
shocks either may be nonlethal with probability ps or lethal with probability (1−ps).
Later, the extension of the given model is presented in [78]. In the given paper, the author
analyzes a system subject to shocks that arrive according to a Non-Homogeneous
Poisson (NHP) process. As shocks occur, the system has two types of failures:
To sum up, many authors discuss BRPs of single-unit systems due to their sim-
plicity. The main models that address this maintenance strategy also should be
supplemented by works that investigate the problem of imperfect maintenance (see,
e.g., [81,82]), joint preventive maintenance with production inventory control policy
(see, e.g., [83]), risk at failure investigation (see, e.g., [84]), or estimation issues (see,
e.g., [72]). The examples of BRP implementation apply to transportation systems
maintenance (see, e.g., [85]), aircraft component maintenance (see, e.g., [86]), or
preventive maintenance for milling assemblies (see, e.g., [87]). The quick overview
of the given BRPs is presented in Table 1.1.
Another PM policy applied in the area of maintenance of single-unit systems
is sequential PM policy. Under this PM policy a unit is preventively maintained at
unequal time intervals. The unequal time interval usually is related to the age of the
system or is predetermined as in periodic maintenance policies [15].
One of the first works where the author considers sequential PM policy is [88].
In this work, the sequential preventive maintenance for a system with minimal repair
at failure is investigated. The policy assumes that the system is replaced at constant
time intervals and at the Nth failure. This model is later investigated in [7], where the
author proposes the simple sequential PM policy with imperfect maintenance for a
finite time span.
Another interesting model of the sequential PM policy is presented in [89], where
the authors introduce a shock model and a cumulative damage model. In this article,
two replacement policies are developed—a periodic PM and a sequential PM pol-
icy with minimal repair at failure and imperfect PM. The solutions are obtained for
finite and infinite time spans. These problems are investigated later in [90], where
the authors adopt improvement factors in the hazard rate function for modeling the
imperfect PM performance. The model is presented for an infinite time-horizon.
The main characteristic of the given model is connected with considering the age-
dependent minimal repair cost and the stochastic failure type.
In [91], the authors present a sequential imperfect PM policy for a degradation
system. This model extends assumptions given in [88]. The developed model is based
on maximal/equal cumulative-hazard rate constraints. The optimization is obtained
using a genetic algorithm. Later, the random adjustment-reduction maintenance
model with imperfect maintenance policy for a finite time span is presented in [92].
The authors also use the genetic algorithm implementation.
The Bayesian approach implementation in the sequential PM problem is presented
in [93]. The authors determine the optimal PM schedules for a hybrid sequential PM
policy, where the age reduction PM model and the hazard rate PM model are com-
bined. Under such a hybrid PM model, each PM action reduces the effective age of
the system to a certain value and also adjusts the slope of the hazard rate (slows down
the degradation process of the maintained system).
Sequential PM policies are practical for most units that need more frequent main-
tenance with increasing age. The quick overview of the main known sequential PM
models is given in Table 1.1.
The last group of PM policies applies to predefined limit level policies. One such
policy, which depends on the failure model assumed for operated units, is the
failure limit policy. Under this policy, PM is performed only when the defined state
variable, which describes the state of the unit at age T (e.g., failure rate), reaches a
predetermined level, and failures that occur are repaired.
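
To make the failure limit rule concrete, the sketch below finds the age at which an increasing failure rate reaches a predetermined level. The Weibull hazard and the parameter names are assumptions made for illustration. The cited models use their own failure models and also account for the effect of each PM action, which is not represented here.

```python
def weibull_hazard(t, shape, scale):
    """Failure rate h(t) of a Weibull lifetime (an assumed failure model)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def age_at_hazard_limit(limit, shape, scale):
    """Age at which an increasing Weibull hazard reaches the given limit."""
    if shape <= 1.0:
        # With shape <= 1 the hazard never increases, so the rule is not meaningful.
        raise ValueError("the failure limit rule needs an increasing hazard (shape > 1)")
    # Solve (shape / scale) * (t / scale) ** (shape - 1) = limit for t.
    return scale * (limit * scale / shape) ** (1.0 / (shape - 1.0))
```

For example, with shape 2 and scale 1 the hazard is h(t) = 2t, so a limit of 1 is reached at age 0.5.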
One of the first works that investigates the optimal replacement model with the use
of the failure limit policy is [94]. The author in this work presents the replacement
policy based on the failure model defined for an operating unit. In this model, a unit
state at age T is defined by a random variable. The replacement is performed either at
failure or when the unit state reaches or exceeds a given level, whichever occurs first.
Model optimization is based on the average long-run cost per unit time estimation.
This problem is investigated later in [95]. The author in his work introduces a PM
model with the monotone hazard function affected by system degradation. The author
develops a hazard model and achieves a cost optimization of system operation.
The imperfect repair in failure limit policy is introduced in [96]. The authors in
their work consider two types of PM (simple PM and preventive replacement) and
two types of corrective maintenance (minimal repair and corrective replacement).
The developed cost-rate model is based on adjustment of the failure rate after simple
PM with the use of a concept of improvement factor. The expected costs are the sum
of average costs of both types of PM and average cost of downtime. This problem is
continued in [97]. The authors in their work propose a cost model for two
types of PM (as in [96]) and one type of corrective maintenance (corrective replace-
ment) that considers inflationary trends over a finite time horizon.
The PM scheduling for a system with deteriorated components also is analyzed
in [98]. The authors consider a PM policy compatible with those presented in [97],
but the degraded behavior of maintained components is modeled by a dynamic reli-
ability equation. The optimal solution, based on unit-cost life estimation, is obtained
with the use of genetic algorithms.
Another example of PM modeling under the failure limit policy is presented
in [99], where the authors focus on system availability optimization. In the presented
model, the system failure rate is reduced after each PM and depends on age and on the
number of performed PM actions.
Maintenance models under the failure limit policy are summarized in
Table 1.1.
The second group of PM policies based on predefined limit levels comprises repair limit
policies. In the known literature, there are two types of repair limit policies: a repair
cost limit policy and a repair time limit policy [13]. Under the repair cost limit policy,
when a unit fails, a repair cost is estimated and repair is undertaken if the estimated
cost is less than a predetermined limit. Otherwise, the unit is replaced. For the repair
time limit policy, the decision variable is the time of repair. If the time of corrective
repair is greater than the specified time Trmax, the unit is replaced. Otherwise, the unit
is repaired [15,100].
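
The two decision rules can be written down directly. The sketch below is only a restatement of the rules as described in this section. The function names and the string results are hypothetical, and the treatment of the boundary case (a value exactly equal to the limit) follows the wording above.

```python
def repair_cost_limit_decision(estimated_repair_cost, cost_limit):
    """Repair if the estimated cost is below the limit; otherwise replace."""
    return "repair" if estimated_repair_cost < cost_limit else "replace"

def repair_time_limit_decision(repair_time, time_limit):
    """Replace if the repair time exceeds the limit; otherwise repair."""
    return "replace" if repair_time > time_limit else "repair"

def replacement_fraction(estimated_costs, cost_limit):
    """Share of failures that end in replacement under the repair cost limit rule."""
    decisions = [repair_cost_limit_decision(c, cost_limit) for c in estimated_costs]
    return decisions.count("replace") / len(decisions)
```

In an optimization model the limit itself is the decision variable: a lower limit means more replacements and fewer expensive or lengthy repairs.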
The first models on repair limit policies are presented in [100,101]. The modeling
methods are based on Markov renewal process use. Later, in [102], the authors dis-
cuss the optimal repair limit replacement policy based on a graphical approach with
the use of the Total Time on Test (TTT) concept. This graphical approach is used
in [103] to determine the optimal repair limit replacement policy.
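
For readers unfamiliar with the TTT concept, the sketch below computes the standard empirical scaled TTT statistic from a sample of positive lifetimes or repair times. It shows only the general TTT plot construction, not the specific graphical optimization procedure of [102,103]. The function name is hypothetical.

```python
def scaled_ttt(sample):
    """Return the points (i/n, TTT_i / TTT_n), i = 0..n, of the scaled TTT plot."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    ttt, total, prev = [0.0], 0.0, 0.0
    for j, x in enumerate(xs, start=1):
        # Between consecutive order statistics, n - j + 1 units are still on test.
        total += (n - j + 1) * (x - prev)
        ttt.append(total)
        prev = x
    return [(i / n, value / total) for i, value in enumerate(ttt)]
```

For the sample [3, 1, 2] the total time on test after each ordered observation is 3, 5, and 6, so the scaled values are 0.5, 5/6, and 1.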
Another extension of the simple repair time limit policy is imperfect maintenance
implementation. Known models of this kind are presented in [104–107].
The implemented modeling methods are based on using the TTT concept and Lorenz
statistics.
The second type of repair limit policy is based on estimating the repair cost at a
system failure and is known as the repair-cost limit policy. One of the first studies
that investigates a general maintenance model with replacements and minimal repair as a
base for repair limit replacement policy is [108]. The author presents three basic
maintenance policies (based on age-dependent PM and periodic PM) and two basic
repair limit replacement policies. In the first repair-cost limit replacement policy, the
author assumes that a system is replaced by the new one if the random repair cost
exceeds a given repair cost limit; otherwise, it is minimally repaired. This problem
is later investigated in [109], in which the minimal repairs follow a Non-Homogeneous
Poisson Process (NHPP).
The problem of imperfect maintenance is introduced in [110], whereas in [111]
the authors investigate the problem of imperfect estimation of repair cost (imperfect
inspection case).
The implementation of a graphical method (TTT concept) in the repair-cost limit
replacement problem with imperfect repair is presented in [112]. In the presented
model, the authors introduce the imperfect repair (according to [110]) and a lead time
for failed unit replacement. The solution is based on the assumption of negligible
replacement time and uses the renewal reward process.
The cumulative damage model for systems subjected to shocks is presented
in [113]. The author introduces a periodical replacement policy with the concept of
repair cost limit under a cumulative damage model and solves it analytically for an
infinite time span.
Another interesting approach to the repair-cost limit replacement policies is pre-
sented in [114]. The author proposes the total repair-cost limit replacement policy,
where a system is replaced by a new one as soon as its total repair cost reaches
or exceeds a given level. The presented problem is later investigated and extended
in [115,116], where the authors introduce two types of failures (repairable and non-
repairable) and propose a mixed maintenance policy similar to the one presented
in [117].
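The total repair-cost limit rule described above amounts to tracking a running sum of repair costs. A minimal sketch follows; the function name and the example cost sequence are hypothetical.

```python
from itertools import accumulate


def failures_until_replacement(repair_costs, total_limit):
    """Return the 1-based failure number at which the cumulative repair cost
    first reaches or exceeds total_limit, or None if it never does."""
    for k, total in enumerate(accumulate(repair_costs), start=1):
        if total >= total_limit:
            return k
    return None
```

For example, with repair costs 2.0, 1.5, 3.0, 0.5 and a total limit of 6.0, the running totals are 2.0, 3.5, 6.5, so the system would be replaced at the third failure.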
The current repair limit policies and their extensions are summarized in Table 1.1.
First, the group maintenance policies may be used. Under such a policy, a group of
items is replaced at the same time to take advantage of economies of scale.
Opportunity-based replacement models are based on the rule that replacement is
performed when an opportunity arises, such as scheduled downtime,
a planned shutdown of the machines, or failure of a system in close proximity to the
item of interest.
When one machine is inoperative due to a lack of components and, at the same
time, one or more other machines are inoperative due to a lack of different
components, maintenance personnel may cannibalize operative components
from one or more machines to repair the others. This practice is common in
systems composed of sufficiently identical component parts (see, e.g., [34]).
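The effect of cannibalization on fleet readiness can be illustrated with a short sketch. It assumes identical machines, fully interchangeable components of each type, and instantaneous, cost-free swaps; the function and the example fleet are hypothetical and are not taken from [34].

```python
def operative_machines(fleet, cannibalize=True):
    """Count operative machines in a fleet of identical machines.

    fleet: list of dicts mapping component type -> True (working) or
    False (failed); every machine needs one working component of each type.
    """
    if not fleet:
        return 0
    if not cannibalize:
        # A machine runs only if all of its own components work.
        return sum(all(machine.values()) for machine in fleet)
    # With free interchange, the scarcest component type limits the fleet.
    component_types = fleet[0].keys()
    return min(sum(machine[c] for machine in fleet) for c in component_types)
```

With three machines, one lacking a working engine and another lacking a working pump, only one machine runs without cannibalization, whereas moving the good pump (or engine) between the two failed machines restores a second one.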
The main classification of these types of PM models is given in Figure 1.3.
The following is a detailed review of the most commonly used maintenance policies.
First, maintenance policies for multi-unit systems without component dependence
are reviewed. In these systems, two PM policies are usually used: ARP and BRP.
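As a reminder of the criterion most of these models optimize, the classical single-unit age replacement cost rate is C(T) = [c_f F(T) + c_p R(T)] / ∫_0^T R(t) dt, where F is the lifetime distribution, R = 1 − F, c_f is the cost of a failure replacement, and c_p is the cost of a planned replacement. The sketch below evaluates it numerically; the Weibull lifetime, the parameter names, and the trapezoidal integration are assumptions chosen for illustration, and the multi-unit models reviewed here extend this basic form.

```python
import math


def arp_cost_rate(T, beta, eta, c_failure, c_planned, steps=20000):
    """Expected long-run cost per unit time of age replacement at age T,
    assuming a Weibull(beta, eta) lifetime (illustrative sketch)."""
    def reliability(t):
        return math.exp(-((t / eta) ** beta))

    # Trapezoidal rule for the mean cycle length, the integral of R on [0, T].
    h = T / steps
    area = 0.5 * (reliability(0.0) + reliability(T))
    area += sum(reliability(i * h) for i in range(1, steps))
    mean_cycle = area * h

    r_T = reliability(T)
    expected_cycle_cost = c_failure * (1.0 - r_T) + c_planned * r_T
    return expected_cycle_cost / mean_cycle
```

Evaluating the function over a range of T values shows the usual behavior: for an increasing failure rate (beta > 1) and c_f > c_p, a moderate replacement age can give a lower cost rate than running to failure.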
One of the first works to apply the simple age replacement policy
is [133]. The author proposes the simple ARP model for an nk-out-of-n
warm stand-by system, where the lifetime of components is exponentially distrib-
uted. The optimal maintenance policy for n failure-independent but non-identical
machines in series is given in [134]. The solution is obtained with the use of nonlin-
ear programming models.
The maintenance models with the use of ARP for multi-unit systems mostly
implement minimal repair, a shock-modeling approach, and hybrid PM.
Minimal repair is introduced in [135]. In this paper, the model assumes that a
system is replaced at age T. When the system fails before age T, it is either replaced
or minimally repaired, depending on the random repair cost at failure. The model
considers finite and infinite time spans and is solved using a Bayesian approach.
[Figure 1.3: Classification of PM models for multi-unit systems into ARP models, BRP models, opportunistic maintenance models, group maintenance models, and cannibalization maintenance. Sub-classes shown include minimal repair implementation, perfect/imperfect repair, shock modeling, and hybrid models (mixed PM) for the ARP and BRP branches; age-based, failure-based, and condition-based maintenance for opportunistic models; static and dynamic models for group maintenance; and reliability-based, simulation, and inventory-based models for cannibalization. The figure also lists cost/availability constraints and hybrid models (mixed PM, economic dependence occurrence).]
TABLE 1.2
Summary of Age and Block Replacement Policies for Multi-unit Systems
Type of Maintenance Policy | Planning Horizon | Optimality Criterion | Modeling Method | Typical References
ARP | Infinite (∞) | The expected long-run costs per unit time | Analytical | [133,138–141,144]
ARP | Infinite (∞) | The expected long-run costs per unit time | Nonlinear programming | [134]
ARP | Infinite (∞) | The expected cost rate | Analytical | [136,137,143]
ARP | Infinite (∞) | Average loss rate | Renewal process/geometric process/Markov process | [142]
ARP | Infinite (∞)/finite | The expected long-run costs per unit time | Renewal reward theory/Bayesian approach | [119]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical/simulation | [145,149]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical (hybrid PM) | [152,157]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical (expected and critical value models) | [155]
BRP | Infinite (∞) | The expected long-run cost per unit time | Markov processes | [158]
BRP | Infinite (∞) | The expected long-run cost per unit time | Embedded Markov chain | [153]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical | [75,146–151]
BRP | Infinite (∞) | The expected long-run cost per unit time, system availability | Analytical | [160]
BRP | Infinite (∞) | System availability | Analytical | [150,161]
set of machines for a given determined T. The study presents a completely deterministic
approach for deciding, for each period t ∈ T, which machine to service (if any) such that
total servicing costs and operating costs are minimized. The solution is obtained with
a branch-and-price algorithm. Another interesting maintenance problem concerns the
investigation of uncertain lifetimes of system units (see [155]), introduction of repairable
dynamic programming
 | Static (T-age policy) | The long-run expected cost per unit of time | Analytical | [172,173]
 | Static (T-age policy, m-failure policy, (m, T)-policy) | The expected cost per unit time | | [174,175]
 | Static | The long-run expected cost per unit of time | Bayesian approach | [164,176,177]
 | | The long-run average maintenance cost per unit time | Markov processes | [35]
 | | | Discrete-time Markov decision chains/simulation | [178]
 | | Total maintenance possession time and cost | Petri-net and GA-based approach | [165]
 | | Total maintenance costs | Random-key genetic algorithm | [166]
Finite rolling horizon | Dynamic | The long-term tentative plan | Dynamic programming | [179]
 | | The economic profit of group | Heuristic approach based on genetic algorithm and MULTIFIT algorithm | [180]
 | | The economic profit of group | Heuristic approach based on GA | [181]
 | | Penalty cost function, total maintenance cost savings over the scheduling interval | Analytical | [163]
• Reliability-based models
• Inventory-based maintenance models
• Simulation (queuing) maintenance models
TABLE 1.5
Summary of Cannibalization Maintenance Policies for Deteriorating Multi-unit Systems
Optimality Criterion | Approach | Modeling Method | Typical References
System minimum condition | Reliability-based | Analytical (allocation model) | [226]
Cannibalized structure function | | Analytical (allocation model) | [227]
Four measures: expected system state, defectives per failed machine, MTTCFa, total cannibalizations | | Analytical (allocation model)/simulation | [228]
The survival function of number of units of equipment available for use at the end of given time period | | Analytical | [229]
System reliability for mission | | Nonlinear programming | [225]
Total profit resulting from a component reusing | | Simulation | [230]
Reasons for product returns | | Case study | [223]
Expected number of inoperative machines | | Markov process | [34]
The average total maintenance investments | Simulation-based | A closed-network, discrete-event simulation | [222]
Average total maintenance costs/average fleet readiness | | A closed-network, discrete-event simulation | [231]
NORS rate | Inventory-based | NORS model | [232]
Optimal portfolio, optimal stock level | | Allocation problem – heuristic approach | [233]
The expected availability objective function | | DRIVE model | [224]
Aircraft availability | | Analytical (AAM model) | [234]
Cannibalization rates | | Analytical | [235]
Cannibalization rates | | Performance indicators analysis | [236]
Product cannibalization | | Statistical data analysis | [237]
e.g., Inter-Squadron cannibalization | | Balanced Scorecard | [238]
Moreover, the literature overview leads to the following main
conclusions:
REFERENCES
1. McCall, J. J. (1965). Maintenance policies for stochastically failing equipment: A
survey. Management Science 11(5): 493–524.
2. Pierskalla, W. P. and Voelker, J. A. (1976). A survey of maintenance models: The con-
trol and surveillance of deteriorating systems. Naval Research Logistics Quarterly 23:
353–388.
3. Valdez-Flores, C. and Feldman, R. (1989). A survey of preventive maintenance mod-
els for stochastically deteriorating single-unit systems. Naval Research Logistics 36:
419–446.
4. Cho, I. D. and Parlar, M. (1991). A survey of maintenance models for multi-unit sys-
tems. European Journal of Operational Research 51(1): 1–23.
5. Dekker, R., Wildeman, R. E., and Van Der Duyn Schouten, F. A. (1997). A review
of multi-component maintenance models with economic dependence. Mathematical
Methods of Operations Research 45: 411–435.
6. Mazzuchi, T. A., Van Noortwijk, J. M., and Kallen, M. J. (2007). Maintenance optimi-
zation. Technical Report, TR-2007-9.
7. Nakagawa, T. and Mizutani, S. (2009). A summary of maintenance policies for a
finite interval. Reliability Engineering and System Safety 94: 89–96. doi:10.1016/
j.ress.2007.04.004.
8. Nicolai, R. P. and Dekker, R. (2007). A review of multi-component maintenance models.
In: Aven, T. and Vinnem, J. M. (eds.) Risk, Reliability and Societal Safety: Proceedings
of European Safety and Reliability Conference ESREL 2007, Stavanger, Norway, June
25–27, 2007, Leiden, the Netherlands: Taylor & Francis Group: pp. 289–296.
Preventive Maintenance Modeling 27
28. Frostig, E. (2003). Comparison of maintenance policies with monotone failure rate dis-
tributions. Applied Stochastic Models in Business and Industry 19: 51–65. doi:10.1002/
asmb.485.
29. Langberg, N. A. (1988). Comparisons of replacement policies. Journal of Applied
Probability 25: 780–788.
30. Thomas, L. C. (1986). A survey of maintenance and replacement models for
maintainability and reliability of multi-item systems. Reliability Engineering 16(4): 297–309.
31. Aboulfath, F. (1995). Optimal maintenance schedules for a fleet of vehicles under the
constraint of the single repair facility. MSc Thesis. Toronto, ON: University of Toronto.
32. Nicolai, R. P. and Dekker, R. (2006). Optimal maintenance of multicomponent sys-
tems: A review. Economic Institute Report.
33. Lamberts, S. W. J. and Nicolai, R. P. (2008). Maintenance Models for Systems Sub-
ject to Measurable Deterioration. Rotterdam, the Netherlands: Rozenberg Publishers,
University Dissertations.
34. Fisher, W. W. (1990). Markov process modelling of a maintenance system with spares,
repair, cannibalization and manpower constraints. Mathematical and Computer Modelling
13(7): 119–125.
35. Gurler, U. and Kaya, A. (2002). A maintenance policy for a system with multi-state
components: An approximate solution. Reliability Engineering and System Safety 76:
117–127. doi:10.1016/S0951-8320(01)00125-9.
36. Block, H. W., Langberg, N. A., and Savits, T. H. (1993). Repair replacement policies.
Journal of Applied Probability 30: 194–206. doi:10.2307/3214632.
37. Park, M. and Pham, H. (2016). Cost models for age replacement policies and block
replacement policies under warranty. Applied Mathematical Modelling 40(9–10):
5689–5702. doi:10.1016/j.apm.2016.01.022.
38. Scarf, P. A., Dwight, R., and Al-Musrati, A. (2005). On reliability criteria and the
implied cost of failure for a maintained component. Reliability Engineering and System
Safety 89: 199–207. doi:10.1016/j.ress.2004.08.019.
39. Chowdhury, C. H. (1988). A systematic survey of the maintenance models. Periodica
Polytechnica. Mechanical Engineering 32(3–4): 253–274.
40. Glasser, G. J. (1967). The age replacement problem. Technometrics 9(1): 83–91.
41. Rakoczy, A. and Żółtowski, J. (1977). About the issues on technical object renewal
principles definition (in Polish). In: Proceedings of Winter School on Reliability,
Szczyrk, Poland: pp. 175–191.
42. Yun, W. Y. (1989). An age replacement policy with increasing minimal repair cost.
Microelectronics Reliability 29(2): 153–157.
43. Sheu, S.-H. (1991). A general age replacement model with minimal repair and general
random repair cost. Microelectronics Reliability 31(5): 1009–1017.
44. Sheu, S.-H. and Liou, C.-T. (1992). An age replacement policy with minimal repair and
general random repair cost. Microelectronics Reliability 32(9): 1283–1289.
45. Sheu, S.-H. (1993). A generalized model for determining optimal number of minimal
repairs before replacement. European Journal of Operational Research 69: 38–49.
46. Lim, J. H., Qu, J., and Zuo, M. J. (2016). Age replacement policy based on imperfect
repair with random probability. Reliability Engineering and System Safety 149: 24–33.
doi:10.1016/j.ress.2015.10.020.
47. Mazzuchi, T. A. and Soyer, R. (1996). A Bayesian perspective on some replacement
strategies. Reliability Engineering and System Safety 51: 295–303.
48. Cha, J. H. and Kim, J. J. (2002). On the existence of the steady state availability of
imperfect repair model. Sankhya: The Indian Journal of Statistics 64, series B. Pt.
1: 76–81.
49. Dagpunar, J. S. (1994). Some necessary and sufficient conditions for age replacement
with non-zero downtimes. Journal of the Operational Research Society 45(2): 225–229.
50. Vaurio, J. K. (1999). Availability and cost functions for periodically inspected pre-
ventively maintained units. Reliability Engineering and System Safety 63: 133–140.
doi:10.1016/S0951-8320(98)00030-1.
51. Nakagawa, T., Zhao, X., and Yun, W. Y. (2011). Optimal age replacement and inspection
policies with random failure and replacement times. International Journal of Reliability,
Quality and Safety Engineering 18(5): 405–416. doi:10.1142/S0218539311004159.
52. Zhao, X., Mizutani, S., and Nakagawa, T. (2015). Which is better for replacement poli-
cies with continuous or discrete scheduled times? European Journal of Operational
Research 242: 477–486. doi:10.1016/j.ejor.2014.11.018.
53. Wu, S. and Clements-Croome, D. (2005). Preventive maintenance models with random
maintenance quantity. Reliability Engineering and System Safety 90: 99–105.
54. Chien, Y.-H. (2008). A general age-replacement model with minimal repair under
renewing free-replacement warranty. European Journal of Operational Research 186:
1046–1058. doi:10.1016/j.ejor.2007.02.030.
55. Dimitrov, B., Chukova, S., and Khalil, Z. (2004). Warranty costs: An age-dependent
failure/repair model. Naval Research Logistics 51(7): 959–976. doi:10.1002/nav.20037.
56. Ito, K. and Nakagawa, T. (2011). Comparison of three cumulative damage models.
Quality Technology and Quantitative Management 8(1): 57–66. doi:10.1080/16843703
.2011.11673246.
57. Sepehrifar, M. B., Khorshidian, K., and Jamshidian, A. R. (2015). On renewal
increasing mean residual life distributions: An age replacement model with hypoth-
esis testing application. Statistics and Probability Letters 96: 117–122. doi:10.1016/
j.spl.2014.09.009.
58. Sheu, S.-H. (1992). A general replacement of a system subject to shocks. Microelectronics
Reliability 32(5): 657–662.
59. Sheu, S.-H., Griffith, W. S., and Nakagawa, T. (1995). Extended optimal replace-
ment model with random repair cost. European Journal of Operational Research 85:
636–649.
60. Lai, M.-T. and Leu, B.-Y. (1996). An economic discrete replacement policy for a shock
damage model with minimal repairs. Microelectronics Reliability 36(10): 1347–1355.
61. Qian, C., Nakamura, S., and Nakagawa, T. (2003). Replacement and minimal repair pol-
icies for a cumulative damage model with maintenance. Computers and Mathematics
with Applications 46: 1111–1118.
62. Lam, C. T. and Yeh, R. H. (1994). Optimal replacement policies for multi-state deterio-
rating systems. Naval Research Logistics 41(3): 303–315.
63. Segawa, Y., Ohnishi, M., and Ibaraki, T. (1992). Optimal minimal-repair and replace-
ment problem with age dependent cost structure. Computers and Mathematics with
Applications 24(1/2): 91–101.
64. Kumar, D. and Westberg, U. (1997). Maintenance scheduling under age replacement
policy using proportional hazards model and TTT-plotting. European Journal of
Operational Research 99: 507–515.
65. Mahdavi, M. and Mahdavi, M. (2009). Optimization of age replacement policy using
reliability based heuristic model. Journal of Scientific and Industrial Research 68:
668–673.
66. Zhao, X., Al-Khalifa, K. N., and Nakagawa, T. (2015). Approximate method for opti-
mal replacement, maintenance, and inspection policies. Reliability Engineering and
System Safety 144: 68–73. doi:10.1016/j.ress.2015.07.005.
67. Kayid, M., Izadkhah, S., and Alshami, S. (2016). Laplace transform ordering of time
to failure in age replacement models. Journal of the Korean Statistical Society 45(1):
101–113.
68. Christer, A. H. (1986). Comments on finite-period applications of age-based replace-
ment models. IMA Journal of Mathematics in Management 1: 111–124.
89. Nakagawa, T. and Mizutani, S. (2008). Periodic and sequential imperfect preven-
tive maintenance policies for cumulative damage models. In: Pham, H. (ed.) Recent
Advances in Reliability and Quality in Design, London, UK: Springer.
90. Sheu, S.-H., Chang, C. C., and Chen, Y.-L. (2012). An extended sequential imper-
fect preventive maintenance model with improvement factors. Communications in
Statistics: Theory and Methods 41(7): 1269–1283. doi:10.1080/03610926.2010.542852.
91. Liu, Y., Li, Y., Huang, H.-Z., and Kuang, Y. (2011). An optimal sequential preventive
maintenance policy under stochastic maintenance quality. Structure and Infrastructure
Engineering: Maintenance, Management, Life-Cycle Design and Performance 7(4):
315–322.
92. Peng, W., Liu, Y., Zhang, X., and Huang, H.-Z. (2015). Sequential preventive main-
tenance policies with consideration of random adjustment-reduction features.
Eksploatacja i Niezawodnosc: Maintenance and Reliability 17(2): 306–313.
93. Kim, H. S., Sub Kwon, Y., and Park, D. H. (2006). Bayesian method on sequential
preventive maintenance problem. The Korean Communications in Statistics 13(1):
191–204.
94. Bergman, B. (1978). Optimal replacement under a general failure model. Advances in
Applied Probability 10: 431–451.
95. Canfield, R. V. (1986). Cost optimization of periodic preventive maintenance. IEEE
Transactions on Reliability R-35(1): 78–81. doi:10.1109/TR.1986.4335355.
96. Lie, C. H. and Chun, Y. H. (1986). An algorithm for preventive maintenance policy.
IEEE Transactions on Reliability R-35(1): 71–75.
97. Jayabalan, V. and Chaudhuri, D. (1992). Cost optimization of maintenance scheduling
for a system with assured reliability. IEEE Transactions on Reliability 41(1): 21–25.
doi:10.1109/24.126665.
98. Tsai, Y.-T., Wang, K.-S., and Teng, H.-Y. (2001). Optimizing preventive maintenance
for mechanical components using genetic algorithms. Reliability Engineering and
System Safety 74: 89–97. doi:10.1016/S0951-8320(01)00065-5.
99. Chan, J.-K. and Shaw, L. (1993). Modeling repairable systems with failure rates that
depend on age and maintenance. IEEE Transactions on Reliability 42(4): 566–571.
doi:10.1109/24.273583.
100. Nakagawa, T. and Osaki, S. (1974). The optimum repair limit replacement policies.
Operational Research Quarterly 25(2): 311–317.
101. Okumoto, K. and Osaki, S. (1976). Repair limit replacement policies with lead time.
Zeitschrift für Operations Research 20: 133–142.
102. Koshimae, H., Dohi, T., Kaio, N., and Osaki, S. (1996). Graphical/statistical approach to
repair limit replacement policies. Journal of the Operations Research Society of Japan 39(2): 230–246.
103. Dohi, T., Kaio, N., and Osaki, S. (2000). A graphical method to repair-cost limit
replacement policies with imperfect repair. Mathematical and Computer Modelling 31:
99–106. doi:10.1016/S0895-7177(00)00076-5.
104. Dohi, T., Ashioka, A., Kaio, N., and Osaki, S. (2006). Statistical estimation algorithms
for repairs-time limit replacement scheduling under earning rate criteria. Computers
and Mathematics with Applications 51: 345–356. doi:10.1016/j.camwa.2005.11.004.
105. Dohi, T., Ashioka, A., Kaio, N., and Osaki, S. (2003). The optimal repair-time limit
replacement policy with imperfect repair: Lorenz transform approach. Mathematical
and Computer Modelling 38: 1169–1176. doi:10.1016/S0895-7177(03)90117-8.
106. Dohi, T., Kaio, N., and Osaki, S. (2003). A new graphical method to estimate the
optimal repair-time limit with incomplete repair and discounting. Computers and
Mathematics with Applications 46: 999–1007. doi:10.1016/S0898-1221(03)90114-3.
107. Dohi, T., Matsushima, N., Kaio, N., and Osaki, S. (1996). Nonparametric repair-limit
replacement policies with imperfect repair. European Journal of Operational Research
96: 260–273.
108. Beichelt, F. (1992). A general maintenance model and its application to repair
limit replacement policies. Microelectronics Reliability 32(8): 1185–1196.
doi:10.1016/0026-2714(92)90036-K.
109. Bai, D. S. and Yun, W. Y. (1986). An age replacement policy with minimal repair cost
limit. IEEE Transactions on Reliability R-35(4): 452–454.
110. Yun, W. Y. and Bai, D. S. (1987). Cost limit replacement policy under imperfect repair.
Reliability Engineering 19: 23–28.
111. Yun, W. Y. and Bai, D. S. (1988). Repair cost limit replacement policy under imperfect
inspection. Reliability Engineering and System Safety 23: 59–64.
112. Dohi, T., Takeita, K., and Osaki, S. (2000). Graphical method for determining/estimating
optimal repair-limit replacement policies. International Journal of Reliability, Quality
and Safety Engineering 7(1): 43–60.
113. Lai, M.-T. (2014). Optimal replacement period with repair cost limit and cumulative
damage model. Eksploatacja i Niezawodnosc: Maintenance and Reliability 16(2):
246–252.
114. Beichelt, F. (1999). A general approach to total repair cost limit replacement policies.
ORiON 15(1/2): 67–75.
115. Chang, C.-C., Sheu, S.-H., and Chen, Y.-L. (2013). Optimal replacement model with
age-dependent failure type based on a cumulative repair-cost limit policy. Applied
Mathematical Modelling 37: 308–317. doi:10.1016/j.apm.2012.02.031.
116. Chang, C.-C., Sheu, S.-H., and Chen, Y.-L. (2010). Optimal number of minimal repairs
before replacement based on a cumulative repair-cost limit policy. Computers and
Industrial Engineering 59: 603–610. doi:10.1016/j.cie.2010.07.005.
117. Kapur, P. K. and Garg, R. B. (1989). Optimal number of minimal repairs before
replacement with repair cost limit. Reliability Engineering and System Safety 26: 35–46.
118. Chien, Y.-H. and Sheu, S.-H. (2006). Extended optimal age-replacement policy with
minimal repair of a system subject to shocks. European Journal of Operational
Research 174: 169–181. doi:10.1016/j.ejor.2005.01.032.
119. Sheu, S.-H. (1999). Extended optimal replacement model for deteriorating systems.
European Journal of Operational Research 112: 503–516.
120. Chang, C.-C. (2014). Optimum preventive maintenance policies for systems subject to
random working times, replacement, and minimal repair. Computers and Industrial
Engineering 67: 185–194. doi:10.1016/j.cie.2013.11.011.
121. Martorell, S., Sanchez, A., and Serradell, V. (1999). Age-dependent reliability model
considering effects of maintenance and working conditions. Reliability Engineering
and System Safety 64: 19–31.
122. Jiang, R. and Ji, P. (2002). Age replacement policy: A multi-attribute value
model. Reliability Engineering and System Safety 76: 311–318. doi:10.1016/
S0951-8320(02)00021-2.
123. Sheu, S.-H. and Chien, Y.-H. (2004). Optimal age-replacement policy of a system sub-
ject to shocks with random lead-time. European Journal of Operational Research 159:
132–144. doi:10.1016/S0377-2217(03)00409-0.
124. Legat, V., Zaludowa, A. H., Cervenka, V., and Jurca, V. (1996). Contribution to opti-
mization of preventive replacement. Reliability Engineering and System Safety 51:
259–266.
125. Nakagawa, T. and Kowada, M. (1983). Analysis of a system with minimal repair and its
application to replacement policy. European Journal of Operational Research 12(2):
176–182.
126. Park, D. H., Jung, G. M., and Yum, J. K. (2000). Cost minimization for periodic main-
tenance policy of a system subject to slow degradation. Reliability Engineering and
System Safety 68(2): 105–112. doi:10.1016/S0951-8320(00)00012-0.
127. Sheu, S.-H., Chen, Y.-L., Chang, C.-C., and Zhang, Z. G. (2016). A note on a
two variable block replacement policy for a system subject to non-homogeneous pure
birth shocks. Applied Mathematical Modelling 40(5–6): 3703–3712. doi:10.1016/
j.apm.2015.10.001.
128. Bukowski, L. (1980). Optimization of technical systems maintenance policy (case
study of metallurgical production line) (in Polish). In: Proceedings of Winter School on
Reliability. Katowice, Poland: Centre for Technical Progress: pp. 47–62.
129. Zhao, Y. X. (2003). On preventive maintenance policy of a critical reliability level for
system subject to degradation. Reliability Engineering and System Safety 79: 301–308.
doi:10.1016/S0951-8320(02)00201-6.
130. Jiang, X., Cheng, K., and Makis, V. (1998). On the optimality of repair-cost-limit poli-
cies. Journal of Applied Probability 35: 936–949.
131. Segawa, Y. and Ohnishi, M. (2000). The average optimality of a repair-limit replace-
ment policy. Mathematical and Computer Modelling 31: 327–334.
132. Murthy, D. N. P. and Nguyen, D. G. (1988). An optimal repair cost limit policy for ser-
vicing warranty. Mathematical and Computer Modelling 11: 595–599.
133. Frees, E. W. (1986). Optimizing costs on age replacement policies. Stochastic Processes
and their Applications 21: 195–212.
134. Maillart, L. M. and Fang, X. (2006). Optimal maintenance policies for serial, multi-
machine systems with non-instantaneous repairs. Naval Research Logistics 53(8):
804–813.
135. Sheu, S.-H., Yeh, R. H., Lin, Y.-B., and Juang, M.-G. (1999). A Bayesian perspective
on age replacement with minimal repair. Reliability Engineering and System Safety 65:
55–64.
136. Sheu, S.-H., Sung, C. H.-K., Hsu, T.-S., and Chen, Y.-C. H. (2013a). Age replacement
policy for a two-unit system subject to non-homogeneous pure birth shocks. Applied
Mathematical Modelling 37: 7027–7036. doi:10.1016/j.apm.2013.02.022.
137. Sheu, S.-H., Zhang, Z. G., Chien, Y.-H., and Huang, T.-H. (2013). Age replacement pol-
icy with lead-time for a system subject to non-homogeneous pure birth shocks. Applied
Mathematical Modelling 37: 7717–7725. doi:10.1016/j.apm.2013.03.017.
138. Dekker, R. and Dijkstra, M. C. (1992). Opportunity-based age replacement:
Exponentially distributed times between opportunities. Naval Research Logistics 39:
175–190.
139. Iskandar, B. P. and Sandoh, H. (2000). An extended opportunity-based age replacement
policy. RAIRO Operations Research 34: 145–154.
140. Jhang, J. P. and Sheu, S. H. (1999). Opportunity-based age replacement policy with
minimal repair. Reliability Engineering and System Safety 64: 339–344.
141. Satow, T. and Osaki, S. (2003). Opportunity-based age replacement with different
intensity rates. Mathematical and Computer Modelling 38: 1419–1426. doi:10.1016/
S0895-7177(03)90145-2.
142. Leung, F. K. N., Zhang, Y. L., and Lai, K. K. (2011). Analysis for a two-dissimilar-
component cold standby repairable system with repair priority. Reliability Engineering
and System Safety 96: 1542–1551. doi:10.1016/j.ress.2011.06.004.
143. Armstrong, M. J. (2002). Age repair policies for the machine repair problem. European
Journal of Operational Research 138: 127–141. doi:10.1016/S0377-2217(01)00135-7.
144. Van Dijkhuizen, G. C. and Van Harten, A. (1998). Two-stage generalized age mainte-
nance of a queue-like production system. European Journal of Operational Research
108: 363–378.
145. Scarf, P. A. and Deara, M. (2003). Block replacement policies for a two-component
system with failure dependence. Naval Research Logistics 50: 70–87. doi:10.1002/
nav.10051.
146. Yusuf, I. and Ali, U. A. (2012). Structural dependence replacement model for parallel
system of two units. Journal of Basic and Applied Science 20(4): 324–326.
147. Lai, M.-T. and Yuan, J. (1991). Periodic replacement model for a parallel system subject
to independent and common cause shock failures. Reliability Engineering and System
Safety 31(3): 355–367.
148. Yasui, K., Nakagawa, T., and Osaki, S. (1988). A summary of optimum replacement
policies for a parallel redundant system. Microelectronics Reliability 28(4): 635–641.
149. Jodejko, A. (2008). Maintenance problems of technical systems composed of hetero-
geneous elements. In: Proceedings of Summer Safety and Reliability Seminars, June
22–28, 2008, Gdańsk-Sopot, Poland: pp. 187–194.
150. Sheu, S.-H., Lin, Y.-B., and Liao, G.-L. (2006). Optimum policies for a system with
general imperfect maintenance. Reliability Engineering and System Safety 91(3): 362–
369. doi:10.1016/j.ress.2005.01.015.
151. Sheu, S.-H. (1990). Periodic replacement when minimal repair costs depend on the age
and the number of minimal repairs for a multi-unit system. Microelectronics Reliability
30(4): 713–718.
152. Zequeira, R. I. and Berenguer, C. (2005). A block replacement policy for a periodically
inspected two-unit parallel standby safety system. In: Kołowrocki, K. (ed.) Advances in
Safety and Reliability: Proceedings of the European Safety and Reliability Conference
(ESREL 2005), Gdynia-Sopot-Gdańsk, Poland, June 27–30, 2005, Leiden, the
Netherlands: A. A. Balkema: pp. 2091–2098.
153. Park, J. H., Lee, S. C., Hong, J. W., and Lie, C. H. (2009). An optimal Block pre-
ventive maintenance policy for a multi-unit system considering imperfect mainte-
nance. Asia-Pacific Journal of Operational Research 26(6): 831–847. doi:10.1142/
S021759590900250X.
154. Grigoriev, A., Van De Klundert, J., and Spieksma, F. C. R. (2006). Modeling and solv-
ing the periodic maintenance problem. European Journal of Operational Research
172: 783–797. doi:10.1016/j.ejor.2004.11.013.
155. Ke, H. and Yao, K. (2016). Block replacement policy with uncertain lifetimes. Reliability
Engineering and System Safety 148: 119–124. doi:10.1016/j.ress.2015.12.008.
156. Wells, C. H. E. (2014). Reliability analysis of a single warm-standby system subject
to repairable and non-repairable failures. European Journal of Operational Research
235: 180–186. doi:10.1016/j.ejor.2013.12.027.
157. Scarf, P. A. and Cavalcante, C. A. V. (2010). Hybrid block replacement and inspection
policies for a multi-component system with heterogeneous component lives. European
Journal of Operational Research 206: 384–394. doi:10.1016/j.ejor.2010.02.024.
158. Anisimov, V. V. (2005). Asymptotic analysis of stochastic block replacement policies
for multi-component systems in a Markov environment. Operations Research Letters
33: 26–34. doi:10.1016/j.orl.2004.03.009.
159. Caldeira, D. J., Taborda, C. J., and Trigo, T. P. (2012). An optimal preventive main-
tenance policy of parallel-series systems. Journal of Polish Safety and Reliability
Association Summer Safety and Reliability Seminars 3(1): 29–34.
160. Duarte, A. C., Craveiro Taborda, J. C., Craveiro, A., and Trigo, T. P. (2005). Optimization
of the preventive maintenance plan of a series components system. In: Kołowrocki,
K. (ed.) Advances in Safety and Reliability: Proceedings of the European Safety and
Reliability Conference (ESREL 2005), Gdynia-Sopot-Gdańsk, Poland, June 27–30,
2005, Leiden, the Netherlands: A.A. Balkema.
161. Chelbi, A., Ait-Kadi, D., and Aloui, H. (2007). Availability optimization for multi-
component systems subjected to periodic replacement. In: Aven, T. and Vinnem, J. M.
(eds.) Risk, Reliability and Societal Safety: Proceedings of European Safety and
Reliability Conference ESREL 2007, Stavanger, Norway, June 25–27, 2007, Leiden,
the Netherlands: Taylor & Francis Group.
Preventive Maintenance Modeling 35
181. Vu, H. C., Do, P., Barros, A., and Berenguer, C. H. (2014). Maintenance grouping strat-
egy for multi-component systems with dynamic contexts. Reliability Engineering and
System Safety 132: 233–249. doi:10.1016/j.ress.2014.08.002.
182. Zhang, X. and Zeng, J. (2015) A general modelling method for opportunistic mainte-
nance modelling of multi-unit systems. Reliability Engineering and System Safety 140:
176–190. doi:10.1016/j.ress.2015.03.030.
183. Zequeira, R. I., Valdes, J. E., and Berenguer, C. (2008). Optimal buffer inventory
and opportunistic preventive maintenance under random production capacity avail-
ability. International Journal of Production Economics 111: 686–696. doi:10.1016/
j.ijpe.2007.02.037.
184. Laggoune, R., Chateauneuf, A., and Aissani, D. (2009). Opportunistic policy for opti-
mal preventive maintenance of a multi-component system in continuous operating
units. Computers and Chemical Engineering 33: 1499–1510.
185. Hou, W. and Jiang, Z. (2013). An opportunistic maintenance policy of multi-unit series
production system with consideration of imperfect maintenance. Applied Mathematics
and Information Sciences 7(1L): 283–290.
186. Shafiee, M., Finkelstein, M., and Berenguer, C. H. (2015). An opportunistic condition-
based maintenance policy for offshore wind turbine blades subjected to degradation
and environmental shocks. Reliability Engineering and System Safety 142: 463–471.
doi:10.1016/j.ress.2015.05.001.
187. Xia, T., Jin, X., Xi, L., and Ni, J. (2015). Production-driven opportunistic mainte-
nance for batch production based on MAM-APB scheduling. European Journal of
Operational Research 240: 781–790. doi:10.1016/j.ejor.2014.08.004.
188. Cavalcante, C. A. V. and Lopes, R. S. (2015). Multi-criteria model to support the defini-
tion of opportunistic maintenance policy: A study in a cogeneration system. Energy 80:
32–80.
189. Bedford, T., Dewan, I., Meilijson, I., and Zitrou, A. (2011). The signal model: A model
for competing risks of opportunistic maintenance. European Journal of Operational
Research 214: 665–673. doi:10.1016/j.ejor.2011.05.016.
190. Hu, J. and Zhang, L. (2014). Risk based opportunistic maintenance model for complex
mechanical systems. Expert Systems with Applications 41(6): 3105–3115. doi:10.1016/j.
eswa.2013.10.041.
191. Zhou, X., Xi, L., and Lee, J. (2006). A dynamic opportunistic maintenance policy
for continuously monitored systems. Journal of Quality in Maintenance Engineering
12(3): 294–305. doi:10.1108/13552510610685129.
192. Shi, H. and Zeng, J. (2016). Real-time prediction of remaining useful life and p reventive
opportunistic maintenance strategy for multi-component systems considering stochas-
tic dependence. Computers and Industrial Engineering 93: 192–204. doi:10.1016/
j.cie.2015.12.016.
193. Zhou, X., Lu, Z.-Q., Xi, L.-F., and Lee, J. (2010). Opportunistic preventive maintenance
optimization for multi-unit series systems with combing multi-preventive maintenance
techniques. Journal of Shanghai Jiaotong University 15(5): 513–518.
194. Gustavsson, E., Patriksson, M., Stromberg, A.-B., Wojciechowski, A., and Onnheim,
M. (2014). Preventive maintenance scheduling of multi-component systems with
interval costs. Computers and Industrial Engineering 76: 390–400. doi:10.1016/
j.cie.2014.02.009.
195. Haque, S. A., Zohrul Kabir, A. B. M., and Sarker, R. A. (2003). Optimization model for
opportunistic replacement policy using genetic algorithm with fuzzy logic controller.
Proceedings of the Congress on Evolutionary Computation 4: 2837–2843.
196. Samhouri, M. S., Al-Ghandoor, A., Fouad, R. H., and Alhaj Ali, S. M. (2009). An intel-
ligent opportunistic maintenance (OM) system: A genetic algorithm approach. Jordan
Journal of Mechanical and Industrial Engineering 3(4): 246–251.
Preventive Maintenance Modeling 37
197. Kececioglu, D. and Sun, F.-B. (1995). A general discrete-time dynamic programming
model for the opportunistic replacement policy and its application to ball-bearing sys-
tems. Reliability Engineering and System Safety 47: 175–185.
198. Iung, B., Levrat, E., and Thomas, E. (2007). Odds algorithm-based opportunistic main-
tenance task execution for preserving product conditions. Annals of the CIRP 56/1:
13–16.
199. Derigent, W., Thomas, E., Levrat, E., and Iung, B. (2009). Opportunistic maintenance
based on fuzzy modelling of component proximity. CIRP Annals – Manufacturing
Technology 58: 29–32.
200. Assid, M., Gharbi, A., and Hajji, A. (2015). Production planning and opportunistic pre-
ventive maintenance for unreliable one-machine two-products manufacturing systems.
IFAC-PapersOnLine 48–43: 478–483. doi:10.1016/j.ifacol.2015.06.127.
201. Hu, J., Zhang, L., and Liang, W. (2012). Opportunistic predictive maintenance for
complex multi-component systems based on DBN-HAZOP model. Process Safety and
Environmental Protection 90: 376–386.
202. Bedford, T. and Alkabi, B. M. (2009). Modelling competing risks and opportunis-
tic maintenance with expert judgement. In: Martorell, S., Guedes Soares, C. and
Barnett, J. Safety, Reliability and Risk Analysis: Theory, Methods and Applications:
Proceedings of European Safety and Reliability Conference ESREL 2008, Valencia,
Spain, September 22–25, 2008, Leiden, the Netherlands: Taylor & Francis Group:
pp. 515–521.
203. Radner, R. and Jorgenson, D. W. (1963). Opportunistic replacement of a single part in
the presence of several monitored parts. Management Science 10(1): 70–84.
204. Epstain, S. and Wilamowsky, Y. (1985). Opportunistic replacement in a deterministic
environment. Computers and Operations Research 12(3): 311–322.
205. Van Der Duyn Schouten, D. A., and Vanneste, S. G. (1990). Analysis and computation
of (n, N)-strategies for maintenance of a two-component system. European Journal of
Operational Research 48: 260–274.
206. Ding, S.-H. and Kamaruddin, S. (2012). Selection of optimal maintenance policy
by using fuzzy multi criteria decision making method. In: Proceedings of the 2012
International Conference on Industrial Engineering and Operations Management,
July 3–6, 2012, Istanbul, Turkey: pp. 435–443.
207. Sarker, B. R. and Ibn Faiz, T. (2016). Minimizing maintenance cost for offshore wind
turbines following multi-level opportunistic preventive strategy. Renewable Energy 85:
104–113. doi:10.1016/j.renene.2015.06.030.
208. Laggoune, R., Chateauneuf, A., and Aissani, D. (2010). Impact of few failure data
on the opportunistic replacement policy for multi-component systems. Reliability
Engineering and System Safety 95: 108–119. doi:10.1016/j.ress.2009.08.007.
209. Gunn, E. A. and Diallo, C. (2015). Optimal opportunistic indirect grouping of preven-
tive replacements in multicomponent systems. Computers and Industrial Engineering
90: 281–291. doi:10.1016/j.cie.2015.09.013.
210. Zhou, X., Huang, K., Xi, L., and Lee, J. (2015). Preventive maintenance modeling
for multi-component systems with considering stochastic failures and disassembly
sequence. Reliability Engineering and System Safety 142: 231–237. doi:10.1016/
j.ress.2015.05.005.
211. Hopp, W. J. and Kuo, Y.-L. (1998). Heuristics for multicomponent joint replacement:
Applications to aircraft engine maintenance. Naval Research Logistics 45: 435–458.
212. Fard, N. and Zheng, X. (1991). An approximate method for non-repairable systems
based on opportunistic replacement policy. Reliability Engineering and System Safety
33: 277–288.
213. Zheng, X. and Fard, N. (1991). A maintenance policy for repairable systems based on
opportunistic failure-rate tolerance. IEEE Transactions on Reliability 40(2): 237–244.
38 Reliability Engineering
214. Pham, H. and Wang, H. (1999). Optimal (τ,T) opportunistic maintenance of a k-out-
of-n:G system with imperfect PM and partial failure. Naval Research Logistics 47:
223–239.
215. Cui, L. and Li, H. (2006). Opportunistic maintenance for multi-component shock
models. Mathematical Methods of Operations Research 63(3): 493–511. doi:10.1007/
s00186-005-0058-9.
216. Tambe, P. P. and Kularni, M. S. (2013). An opportunistic maintenance decision of a
multi-component system considering the effect of failures on quality. In: Proceedings
of the World Congress on Engineering 2013, Vol. 1, July 3–5, 2013, London, UK: WCE
2013: pp. 1–6.
217. Tambe, P. P., Mohite, S., and Kularni, M. S. (2013). Optimisation of opportunistic main-
tenance of a multi-component system considering the effect of failures on quality and
production schedule: A case study. International Journal of Advanced Manufacturing
Technology 69(5): 1743–1756.
218. Huynh, T. K., Barros, A., and Berenguer, C.H. (2013). A reliability-based opportu-
nistic predictive maintenance model for k-out-of-n deteriorating systems. Chemical
Engineering Transactions 33: 493–498.
219. Cheng, Z., Yang, Z., Tan, L., and Guo, B. (2011). Optimal inspection and maintenance
policy for the multi-unit series system. In: Proceedings of 9th International Conference
on Reliability, Maintainability and Safety (ICRMS) 2011, June 12–15, 2011, Guiyang,
China: pp. 811–814.
220. Cheng, Z., Yang, Z., and Guo, B. (2013). Optimal opportunistic maintenance model of
multi-unit systems. Journal of Systems Engineering and Electronics 24(5): 811–817.
doi:10.1109/JSEE.2013.00094.
221. Taghipour, S. and Banjevic, D. (2012). Optimal inspection of a complex system sub-
ject to periodic and opportunistic inspections and preventive replacements. European
Journal of Operational Research 220: 649–660. doi:10.1016/j.ejor.2012.02.002.
222. Ormon, S. W. and Cassady, C. R. (2004). Cannibalization policies for a set of paral-
lel machines. In: Reliability and Maintainability, 2004 Annual Symposium: RAMS,
January 26–29, 2004, Colorado Springs, CO: pp. 540–545.
223. Nowakowski, T. and Plewa, M. (2009). Cannibalization: Technical system maintenance
method (in Polish). In: Proceedings of XXXVII Winter School on Reliability, Warsaw,
Poland: Szczyrk, Publication House of Warsaw University of Technology: pp. 230–238.
224. Sherbrooke, C. C. (2004). Optimal Modeling Inventory of Systems. Multi-echelon
Techniques. Boston, MA: Kluwer Academic Publishers.
225. Lv, X.-Z., Fan, B.-X., Gu, Y., and Zhao, X.-H. (2013), Selective maintenance model
considering cannibalization and its solving algorithm. In: Proceedings of 2013
International conference on Quality, Reliability, Risk, Maintenance, and Safety
Engineering (WR2MSE), IEEE: pp. 717–723.
226. Simon, R. M. (1970). Cannibalization policies for multicomponent systems. SIAM
Journal on Applied Mathematics 19(4): 700–711.
227. Baxter, L. A. (1988). On the theory of cannibalization. Journal of Mathematical
Analysis and Applications 136: 290–297. doi:10.1016/0022-247X(88)90131-X.
228. Khalifa, D., Hottenstein, M., and Aggarwal, S. (1977). Technical note: Cannibalization
policies for multistate systems. Operations Research 25(6): 1032–1039.
229. Byrkett, D. L. (1985). Units of equipment available using cannibalization for repair-part
support. IEEE Transactions on Reliability R-34(1): 25–28.
230. Jodejko-Pietruczuk, A. and Plewa, M. (2012). The model of reverse logistics, based on
reliability theory with elements’ rejuvenation. Logistics and Transport 2(15): 27–35.
231. Salman, S., Cassady, C. R., Pohl, E. A., and Ormon, S. W. (2007). Evaluating the
impact of cannibalization on fleet performance. Quality and Reliability Engineering
International 23: 445–457. doi:10.1002/qre.826.
Preventive Maintenance Modeling 39
232. Sherbrooke, C. C. (1971). An evaluator for the number of operationally ready aircraft in
a multilevel supply system. Operations Research 19(3): 618–635.
233. Shah, J. and Avittathur, B. (2007). The retailer multi-item inventory problem with
demand cannibalization and substitution. International Journal of Production
Economics 106: 104–114. doi:10.1016/j.ijpe.2006.04.004.
234. Gaver, D. P., Isaacson, K. E., and Abell, J. B. (1993). Estimating aircraft recoverable
spares requirements with cannibalization of designated items. Santa Monica, CA:
RAND Corporation. https://www.rand.org/pubs/reports/R4213.html.
235. Hoover, J., Jondrow, J. M., Trost, R. S., and Ye, M. (2002). A model to study:
Cannibalization, FMC, and customer waiting time. Alexandria, VA: CNA.
236. Albright, T. L., Geber, C. A., and Juras, P. (2014). How naval aviation uses the Balanced
Scorecard. Strategic Finance 10: 21–28.
237. Meenu, G. (2011). Identification of factors affecting product cannibalization in Indian
automobile sector. IJCEM International Journal of Computational Engineering and
Management 12: 2230–7893.
238. Curtin, N. P. (2001). Military Aircraft: Cannibalizations Adversely Affect Personnel
and Maintenance. Washington, DC: US General Accounting Office.
239. Cheng, Y.-H. and Tsao, H.-L. (2010). Rolling stock maintenance strategy selection,
spares parts’ estimation, and replacements’ interval calculation. International Journal
of Production Economics 128: 404–412. doi:10.1016/j.ijpe.2010.07.038.
240. Garg, J. (2013). Maintenance: Spare Parts Optimization. M2 Research Intern Theses,
Ecole Centrale de Paris, Capgemini Consulting.
241. Ondemir, O. and Gupta, S. M. (2014). A multi-criteria decision making model for
advanced repair-to-order and disassembly-to-order system. European Journal of
Operational Research 233: 408–419. doi:10.1016/j.ejor.2013.09.003.
242. Silver, E. A. and Fiechter, C.-N. (1995). Preventive maintenance with limited historical
data. European Journal of Operational Research 82: 125–144.
243. Nguyen, K.-A., Do, P., and Grall, A. (2015). Multi-level predictive maintenance for
multi-component systems. Reliability Engineering and System Safety 144: 83–94.
doi:10.1016/j.ress.2015.07.017.
244. Predictive maintenance 4.0. Predict the unpredictable. PWC, Mainnovation,
Pricewaterhouse Coopers B.V. 2017.
245. Predictive maintenance and the smart factory. Deloitte Development LLC. 2017.
2 Inspection Maintenance Modeling for Technical Systems: An Overview
Sylwia Werbińska-Wojciechowska
CONTENTS
2.1 Introduction
2.2 Inspection Maintenance Modeling for Single-Unit Systems
2.2.1 Inspection Maintenance for Two-State Systems
2.2.2 Inspection Maintenance for Multi-state Systems
2.3 Inspection Maintenance Modeling for Multi-unit Systems
2.3.1 Inspection Maintenance for Standby Systems
2.3.2 Inspection Maintenance for Operating Systems
2.4 Hybrid Inspection Models
2.5 Other Inspection Maintenance Models
2.6 Conclusions and Directions for Further Research
References
2.1 INTRODUCTION
All equipment breaks down from time to time. Each breakdown requires materials
and tradespeople to repair it and has negative consequences, such as lost
production or transportation delays. To reduce the number of these breakdowns,
planned maintenance actions are implemented. One of the most familiar planned
maintenance actions is inspection.
Currently, inspection and inspection policy development play an important role in
various technical systems, and they therefore attract a lot of attention in the
literature. In many situations there are no apparent symptoms indicating a
forthcoming failure. In such systems with non-self-announcing failures (also called
unrevealed faults or latent faults), the typical preventive maintenance policies
cannot be used [1]. Inspection actions are therefore introduced into the maintenance
of such systems. Examples of these systems include protective devices, emergency
devices, and standby units (see [1,2]).
The main purpose of an inspection is to determine the state of equipment
based on the chosen indicators, such as bearing wear, gauge readings, and quality
of a product [3]. Following this, the main definition of inspection can be derived.
FIGURE 2.1 Inspection maintenance models for technical systems – the main classifica-
tion. (Own contribution based on Tang, T., Failure finding interval optimization for peri-
odically inspected repairable systems, PhD Thesis, University of Toronto, 2012; Beichelt, F.,
Nav. Res. Logist. Q., 28, 375–381, 1981; Cazorla, D.M. and R. Perez-Ocon, Eur. J. Oper. Res.,
190, 494–508, 2008; Boland, P.J. and E. El-Neweihi, Comput. Oper. Res., 22, 383–390, 1995.)
For the given assumptions, the expected total cost is obtained according to the formula:
$$
C(T_{in}) = \sum_{n=0}^{\infty} \int_{t_{in}^{n}}^{t_{in}^{n+1}} \left[ c_{in1}(n+1) + c_{in2}\left(t_{in}^{n+1} - x\right) \right] dF(x) \tag{2.1}
$$
where:
$C(T_{in})$: Long-run expected cost per unit time
$c_{in1}$: Cost of first inspection action performance
$c_{in2}$: Cost of second (and subsequent) inspection action performance
$F(x)$: Probability distribution function of system/unit lifetime
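Equation (2.1) can be evaluated numerically once an inspection schedule and a lifetime distribution are fixed. The sketch below is illustrative only: the function names, the equally spaced inspection times, the exponential lifetime, the parameter values, and the truncation of the infinite sum are assumptions made here and are not part of the source model. For that special case the sum has a closed form, which the code uses as a cross-check.

```python
import math


def expected_cost(times, pdf, c_in1, c_in2, steps=200):
    """Truncated evaluation of Eq. (2.1) by composite Simpson integration.

    times : increasing inspection times t^0 < t^1 < ... (t^0 is the lower
            limit of the first integral, here 0)
    pdf   : lifetime density f(x), so that dF(x) = f(x) dx
    steps : even number of Simpson sub-intervals per inspection interval
    """
    if steps % 2:
        raise ValueError("steps must be even")
    total = 0.0
    for n in range(len(times) - 1):
        lower, upper = times[n], times[n + 1]
        h = (upper - lower) / steps

        def integrand(x):
            # Bracketed term of Eq. (2.1) multiplied by the density.
            return (c_in1 * (n + 1) + c_in2 * (upper - x)) * pdf(x)

        acc = integrand(lower) + integrand(upper)
        for k in range(1, steps):
            acc += (4 if k % 2 else 2) * integrand(lower + k * h)
        total += acc * h / 3.0
    return total


def closed_form_periodic_exponential(rate, period, c_in1, c_in2):
    """Eq. (2.1) for t^n = n * period and F(x) = 1 - exp(-rate * x)."""
    return (c_in1 + c_in2 * period) / (1.0 - math.exp(-rate * period)) - c_in2 / rate


# Hypothetical parameter values chosen only for illustration.
RATE, PERIOD, C1, C2 = 0.1, 5.0, 1.0, 2.0
schedule = [n * PERIOD for n in range(201)]  # truncates the infinite sum
numeric = expected_cost(schedule, lambda x: RATE * math.exp(-RATE * x), C1, C2)
exact = closed_form_periodic_exponential(RATE, PERIOD, C1, C2)
print(f"numeric={numeric:.6f} closed_form={exact:.6f}")
```

Passing a non-uniform `times` list evaluates the same formula for a sequential (aperiodic) schedule; only the closed-form check is specific to the periodic, exponential case.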
damages the system. This model is investigated and extended later in [59,60].
The new inspection policy considers random shock magnitudes and random times between
shock arrivals and focuses on optimization of the availability criterion.
Another extension of the model presented in [58] is given in [61]. The authors in
this work incorporate a more general deterioration process that includes both shock
degradation and graceful degradation (continuous accumulation of damage). With
the use of regenerative arguments and considering a constant rate of graceful deg-
radation occurrence, an expression for the limiting average availability is derived.
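The regenerative argument behind a limiting average availability can be illustrated on a much simpler case than the model in [61]. The sketch below assumes, purely for illustration, an exponentially distributed lifetime, failures that are detected only at periodic inspections, and instantaneous perfect renewal at the detecting inspection; shocks and graceful degradation are not modeled, and all names and parameter values are hypothetical. Each renewal cycle runs from one renewal to the inspection that detects the next failure, so the limiting availability is the expected uptime per cycle divided by the expected cycle length.

```python
import math
import random


def simulate_availability(rate, period, cycles=200_000, seed=1):
    """Estimate availability as total uptime / total cycle length."""
    rng = random.Random(seed)
    uptime = 0.0
    elapsed = 0.0
    for _ in range(cycles):
        life = rng.expovariate(rate)                    # time to the hidden failure
        detected = math.ceil(life / period) * period    # first inspection after it
        uptime += life
        elapsed += detected
    return uptime / elapsed


def limiting_availability(rate, period):
    """Closed form for the simplified case: E[uptime] / E[cycle length]."""
    return (1.0 - math.exp(-rate * period)) / (rate * period)


# Hypothetical values: failure rate 0.1 per time unit, inspection every 5 units.
estimate = simulate_availability(0.1, 5.0)
reference = limiting_availability(0.1, 5.0)
print(f"simulated={estimate:.4f} closed_form={reference:.4f}")
```

Shortening the inspection period raises availability in this simplified setting because failures stay hidden for less time; inspection cost and inspection duration, which would counteract that effect, are deliberately left out.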
The maintenance models for systems with two failure modes—type I failure rela-
tive to non-maintainable failure mode, and type II failure relative to periodically
maintainable failure mode—are developed in [62–65].
In 2006, a model with three types of inspections is introduced in [66]. In this
article, the authors assume that a system can fail because of three competing failure
types: I, II, and III. Partial inspections detect type I failures without error. Failures
of type II can be detected by imperfect inspections. Type III failures are detectable
only by perfect inspections. If the system is found to have failed in an inspection, a
perfect repair is made.
A summary of the main models published in the recent literature is
presented in Table 2.1. The author considers a few main criteria for summarizing
this review:
TABLE 2.1
Summary of Inspection Policies for Two-State, Single-Unit Systems
Problem Category | Planning Horizon | Quality of Performed Inspections | Failure Modes | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Original algorithm | Infinite | Perfect | n/a | Expected cost per unit of time | Analytical | [67] | 1980
Original algorithm | Infinite | Perfect | n/a | Expected cost per unit of time | Analytical/optimal | [68] | 1984
Original algorithm | Infinite | Perfect | n/a | Expected profit per unit of time | Analytical/heuristic approach | [32] | 1996
Original algorithm | Infinite | Perfect | n/a | Expected cost function | Heuristic approach | [33] | 1992
Original algorithm | Infinite | Perfect | n/a | Expected total cost | Analytical | [69] | 2005
Original algorithm | Finite | Perfect | n/a | Expected costs of loss | Discrete dynamic programming | [30] | 1980
One-parameter optimization model | Infinite | Perfect | n/a | Average total cost per time unit | | [60] | 1998
Model with unknown or partially unknown system lifetime probability^a | Infinite | Perfect | n/a | Expected loss cost per time unit | Analytical | [34] | 1981
Model with known or unknown slp^a | Infinite | Perfect | n/a | Total expected cost | Analytical | [36] | 2006
Model with unknown slp^a | Infinite | Perfect/imperfect | n/a | Total expected cost | Analytical | [35] | 2001
Model with known slp^a | Infinite | Imperfect | n/a | Cost per unit of time | Analytical | [70] | 2002
Model with known slp^a | Infinite | Imperfect | n/a | Long-run expected cost per unit time/availability function | Renewal reward process/nonlinear programming | [71] | 1995
Model with known slp^a | Infinite | Imperfect | n/a | Long-run expected cost per unit time | Renewal reward process | [72] | 2003
(Continued)
TABLE 2.1 (Continued)
Summary of Inspection Policies for Two-State, Single-Unit Systems
Problem Category | Planning Horizon | Quality of Performed Inspections | Failure Modes | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Model with known slp^a | Infinite | Imperfect | n/a | Long-run expected cost per unit time | Renewal theory, Wiener process | [37] | 2016
Model with known slp^a | Infinite | Imperfect | n/a | Total cost over a lifetime | Continuous-time Markovian decision process | [38] | 1982
Model with known slp^a | Infinite | Imperfect | n/a | Expected cost per time unit | Markovian model | [73] | 1998
Model with known slp^a | Infinite | Fallible/error-free tests | n/a | Long-run cost per unit time | Dynamic programming | [43] | 1993
Model with known slp^a | Infinite | Fallible tests | n/a | Long-run cost per unit time | Analytical | [44] | 1993
Model with known slp^a | Infinite | Fallible tests | n/a | Mean loss per unit time | Analytical | [41] | 1979
Model with known slp^a | Infinite | Fallible tests | n/a | Expected lifetime of the unit | Markov decision process | [42] | 1979
Model with known slp^a | Infinite/finite | Failure detection zone | n/a | Long-run cost per unit time | Analytical | [39] | 2015
Model with known slp^a | Finite | Imperfect | n/a | Expected sum of discounted cost | Markov decision process + quasi-Bayes approach + dynamic programming | [74] | 2008
(Continued)
to retirement
Shock model | Infinite | Perfect | Random shocks arriving according to a Poisson process | Time-stationary availability | Analytical (renewal process) | [58–60] | 1994, 1998, 2000
Shock model | Infinite | Perfect | Random shocks (a Poisson process) and graceful degradation | Limiting average availability | Analytical (renewal process) | [61] | 2002
(Continued)
TABLE 2.2
Summary of Inspection Policies for Multi-state, Single-Unit Systems
Problem Category | Planning Horizon | Quality of Performed Inspections | Failure Modes | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Optimization model | Infinite | Perfect | n/a | Discounted and average cost | Discrete-time Markov process | [81] | 1976
Optimization model | Infinite | Perfect | n/a | Discounted and average cost | Markov decision process | [88] | 1978
Optimization model | Infinite | Perfect | n/a | Total expected cost per time unit | Markovian model | [79] | 1976
Optimization model | Infinite | Perfect | n/a | Long-run expected average cost per unit time | Markov renewal theory | [85] | 1997
Optimization model | Infinite | Perfect | n/a | Expected long-run discounted cost | Semi-Markov decision process | [83] | 1992
Optimization model | Infinite | Imperfect | n/a | Long-run expected cost per unit time | Analytical | [76] | 2014
Optimization model | Infinite | Imperfect | n/a | Expected total discounted cost | Discrete-time Markov chain | [86] | 1986
Optimization model | Infinite | Imperfect | n/a | Reliability function | Semi-Markov processes | [77] | 1962
Inspection with CBM modeling | Infinite | Imperfect | n/a | Operational reliability | Analytical | [50] | 2013
Optimization model | Finite | Perfect | n/a | Average cost | Semi-Markov decision model | [82] | 1984
(Continued)
TABLE 2.2 (Continued)
Summary of Inspection Policies for Multi-state, Single-Unit Systems
Problem Category | Planning Horizon | Quality of Performed Inspections | Failure Modes | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Shock model | Infinite | Perfect | Cumulative damage attributed to shocks occurrence (Poisson process) | Long-run average cost per unit time | Analytical (renewal reward theorem) | [89] | 1980
Shock model | Infinite | Perfect | Deterioration level assumed as increasing pure jump Markov process | Long-run average cost per unit time | Markov process/control-limit policy | [90] | 1987
Shock model | Infinite | Perfect | Cumulative damage caused by gradual damage | Expected long-run cost rate | Analytical (renewal reward theorem) | [91] | 1997
Shock model | Infinite | Perfect | Poisson shock process | Limiting average availability | Continuous-time Markov chain | [94] | 2006
Shock model | Infinite | Perfect | Fatal shocks occurrence | Expected long-run cost rate | Continuous-time Markov process | [93] | 2001
Shock model | Infinite | Perfect/imperfect | Internal and external failures occurrence | Total costs per unit time | Generalized Markov process | [92] | 2008
First, models for protective devices and standby units are investigated.
two- and three-component systems using discrete Markov chains. The first model
applies to active redundancy without component repair, the second model includes
active redundancy with component repair, and the third and fourth models analyze
standby redundancy without and with component repair, respectively.
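The discrete Markov chain approach mentioned for the first model (active redundancy without component repair) can be sketched for a two-component parallel system. The per-step failure probability `p`, the assumption of identical and independent components, and the function names are introduced here for illustration only; the source does not give the transition probabilities. The chain tracks the number of working components (2, 1, or 0), and state 0 is absorbing because there is no repair.

```python
def step(dist, p):
    """Advance the state distribution (two up, one up, none up) by one step."""
    two_up, one_up, none_up = dist
    q = 1.0 - p  # per-step survival probability of a single component
    return (
        two_up * q * q,                            # both components survive
        two_up * 2.0 * p * q + one_up * q,         # exactly one component left
        two_up * p * p + one_up * p + none_up,     # system failed (absorbing)
    )


def reliability(p, steps):
    """Probability that at least one component still works after `steps` steps."""
    dist = (1.0, 0.0, 0.0)  # both components work at time 0
    for _ in range(steps):
        dist = step(dist, p)
    return dist[0] + dist[1]


def reliability_closed_form(p, steps):
    """Independent-component check: 1 - P(both components have failed)."""
    survive = (1.0 - p) ** steps
    return 1.0 - (1.0 - survive) ** 2


# Hypothetical values: 10% failure probability per step, 10 steps.
print(reliability(0.1, 10), reliability_closed_form(0.1, 10))
```

Adding component repair, as in the second model, would introduce transitions from the one-up state back to the two-up state; those probabilities would have to come from the cited work.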
TABLE 2.3
Summary of Inspection Policies for Multi-unit Systems
System Type | Standby Unit Type | Planning Horizon | Quality of Performed Inspections | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Standby system | Cold standby | Infinite | Perfect | Main unreliability characteristics | Analytical (regenerative point technique) | [104] | 1997
Standby system | Cold standby | Infinite | Perfect | Reliability function, MTTF | Analytical (renewal theory) | [95] | 1970
Standby system | Cold standby | Infinite | Perfect | Expected loss due to system unavailability per time unit, the average system unavailability per cycle | Analytical (renewal theory) | [107] | 2012
Standby system | Cold standby | Infinite | Perfect | Main reliability characteristics, the expected total profit per unit of time | Semi-Markov process and regenerative point technique | [102] | 2016
Standby system | Cold standby | Infinite | Perfect | Main reliability characteristics, the profit function | Semi-Markov process and regenerative point technique | [100] | 2011
Standby system | Cold standby | Infinite | Perfect | Main reliability characteristics, the expected total profit per unit of time | Regenerative point technique, MC simulation, Bayesian setup | [101] | 2012
(Continued)
TABLE 2.3 (Continued)
Summary of Inspection Policies for Multi-unit Systems
System Type | Standby Unit Type | Planning Horizon | Quality of Performed Inspections | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Standby system | Warm standby | Infinite | Perfect | Main reliability characteristics, the total cost of a system per unit of time | Generalized Markov process | [99] | 2008
Standby system | Warm standby | Infinite | Perfect/imperfect | Total cost per unit of time | Analytical (renewal theory) | [129] | 2002
Standby system | Cold/warm standby | Infinite | Perfect | Limiting average availability, the expected cost rate | Analytical (renewal theory), Markov jump process | [105] | 2009
Standby system | Cold standby | Finite/infinite | Perfect | Main reliability characteristics | Analytical (regenerative point technique) | [97] | 1995
Standby system | Warm standby | Finite/infinite | Perfect | Main reliability characteristics, the | Analytical (regenerative point technique) | [98] | 1995
when searching for the keyword “inspection maintenance” in Google search, there
were about 260 million hits. In the ScienceDirect database, this keyword had about
98,500 hits. After the search results were screened against the main required criteria
(periodic inspection, maintenance optimization, and technical system), 122
inspection models published from 1962 to 2016 (see Figure 2.2) were selected as the
focus of this chapter.
Due to the plethora of available publications on inspection maintenance, it
was not possible to present all the known models from this research area. The most
investigated ones that are not included in this chapter apply to:
This literature overview lets the author draw the following main conclusions:
REFERENCES
1. Tang T (2012) Failure finding interval optimization for periodically inspected repairable systems. PhD Thesis, University of Toronto.
2. Keller JB (1982) Optimum inspection policies. Management Science 28(4): 447–450.
3. Sherif YS (1982) Reliability analysis: Optimal inspection & maintenance schedules of failing equipment. Microelectronics Reliability 22(1): 59–115.
4. PN-EN 13306:2018 Maintenance - Maintenance terminology, The Polish Committee for Standardization, Warsaw.
5. Gulati R, Kahn J, Baldwin R (2010) The professional’s guide to maintenance and reliability terminology. Reliabilityweb.com.
6. Peters R (2014) Reliable Maintenance Planning, Estimating, and Scheduling. Gulf Professional Publishing.
7. Barlow RE, Hunter LC, Proschan F (1963) Optimum checking procedures. Journal of the Society for Industrial and Applied Mathematics 11(4): 1078–1095. https://www.jstor.org/stable/2946496.
8. Beichelt F, Tittmann P (eds.) (2012) Reliability and Maintenance. Networks and Systems. CRC Press.
9. Radner R, Jorgenson DW (1962) Optimal replacement and inspection of stochastically failing equipment. In: Arrow KJ, Karlin S, Scarf H (eds.) Studies in Applied Probability and Management Science, Stanford University Press: 184–206.
10. Jorgenson DW, McCall JJ (1963) Optimal scheduling of replacement and inspection. Operations Research 11(5): 732–746.
11. Pierskalla WP, Voelker JA (1976) A survey of maintenance models: The control and surveillance of deteriorating systems. Naval Research Logistics Quarterly 23: 353–388.
12. Valdez-Flores C, Feldman R (1989) A survey of preventive maintenance models for stochastically deteriorating single-unit systems. Naval Research Logistics 36: 419–446.
13. Cho ID, Parlar M (1991) A survey of maintenance models for multi-unit systems. European Journal of Operational Research 51(1): 1–23.
14. Thomas LC, Gaver DP, Jacobs PA (1991) Inspection models and their application. IMA Journal of Mathematics Applied in Business and Industry 3: 283–303.
15. Parmigiani G (1991) Scheduling inspections in reliability. Institute of Statistics and Decision Sciences Discussion Paper no. 92–A11:1–21, Duke University. https://stat.duke.edu/research/papers/1992-11 (accessed 17 October 2018).
16. Osaki S (ed.) (2002) Stochastic Models in Reliability and Maintenance, Springer-Verlag, Berlin, Germany.
17. Nakagawa T (2005) Maintenance Theory of Reliability. Springer.
18. Jardine AKS, Tsang AHC (2013) Maintenance, replacement and reliability. Theory and Applications. CRC Press.
19. Werbińska-Wojciechowska S (2019) Technical System Maintenance. Delay-Time-Based Modeling. Springer.
20. Kaio N, Osaki S (1989) Comparison of inspection policies. Journal of the Operational Research Society 40(5): 499–503. Palgrave Macmillan Journals.
21. Kaio N, Osaki S (1988) Inspection policies: Comparisons and modifications. Revue française d’automatique, d’informatique et de recherche opérationnelle. Recherche opérationnelle 22(4): 387–400.
22. Munford AG (1981) Comparison among certain inspection policies. Management Science 27(3): 260–267.
23. Jiang R, Jardine AKS (2005) Two optimization models of the optimum inspection problem. The Journal of the Operational Research Society 56(10): 1176–1183. doi:10.1057/palgrave.jors.2601885.
24. Boland PJ, El-Neweihi E (1995) Expected cost comparisons for inspection and repair policies. Computers and Operations Research 22(4): 383–390. doi:10.1016/0305-0548(94)00047-C.
25. Hu T, Wei Y (2001) Multivariate stochastic comparisons of inspection and repair policies. Statistics and Probability Letters 51: 315–324.
26. McCall JJ (1963) Operating characteristics of opportunistic replacement and inspection policies. Management Science 10(1): 85–97.
27. Choi KM (1997) Semi-Markov and delay time models of maintenance. PhD thesis, University of Salford, UK.
28. Chelbi A, Ait-Kadi D (2009) Inspection strategies for randomly failing systems. In: Ben-Daya M, Duffuaa SO, Raouf A, Knezevic J, Ait-Kadi D (eds.) Handbook of Maintenance Management and Engineering. Springer, London, UK.
29. Lee C (1999) Applications of delay time theory to maintenance practice of complex plant. PhD thesis, University of Salford, UK.
30. Bobrowski D (1980) Optimisation of technical object maintenance with inspections (in Polish). In: Proceedings of Winter School on Reliability, Center for Technical Progress, Katowice, Poland: 31–46.
31. Viscolani B (1991) A note on checking schedules with finite horizon. Operations Research 25(2): 203–208. doi:10.1051/ro/1991250202031.
32. Hariga MA (1996) A maintenance inspection model for a single machine with general failure distribution. Microelectronics Reliability 36(3): 353–358.
33. Klatzky RL, Messick DM, Loftus J (1992) Heuristics for determining the optimal interval between checkups. Psychological Science 3(5): 279–284.
34. Beichelt F (1981) Minimax inspection strategies for single unit systems. Naval Research Logistics Quarterly 28(3): 375–381.
35. Leung FKN (2001) Inspection schedules when the lifetime distribution of a single-unit system is completely unknown. European Journal of Operational Research 132: 106–115. doi:10.1016/S0377-2217(00)00115-6.
36. Okumura S (2006) Determination of inspection schedules of equipment by variational method. Mathematical Problems in Engineering, Hindawi Publishing Corporation, Article ID 95843: 1–16.
37. Liu B, Zhao X, Yeh R-H, Kuo W (2016) Imperfect inspection policy for systems with multiple correlated degradation processes. IFAC-PapersOnLine 49–12: 1377–1382.
38. Sengupta B (1982) An exponential riddle. Journal of Applied Probability 19(3): 737–740.
39. Guo H, Szidarovszky F, Gerokostopoulos A, Niu P (2015) On determining optimal inspection interval for minimizing maintenance cost. In: Proceedings of 2015 Annual Reliability and Maintainability Symposium (RAMS), IEEE: 1–7.
80. Becker G, Camarinopoulos L, Ziouas G (1994) A Markov type model for systems
with tolerable down times. The Journal of the Operational Research Society 45(10):
1168–1178. doi:10.2307/2584479.
81. Rosenfield D (1976) Markovian deterioration with uncertain information. Operations
Research 24(1): 141–155.
82. Tijms HC, Van Der Duyn Schouten FA (1984) A Markov decision algorithm for optimal
inspections and revisions in a maintenance system with partial information. European
Journal of Operational Research 21: 245–253. Elsevier.
83. Kawai H, Koyanagi J (1992) An optimal maintenance policy of a discrete time Markovian
deterioration system. Computers Mathematics with Applications 24(1/2): 103–108.
84. Weiss GH (1962) A problem in equipment maintenance. Management Science 8(3):
266–277.
85. Fung J, Makis V (1997) An inspection model with generally distributed restoration and
repair times. Microelectronics Reliability 37(3): 381–389.
86. Ohnishi M, Kawai H, Mine H (1986) An optimal inspection and replacement policy for
a deteriorating system. Journal of Applied Probability 23(4): 973–988.
87. Wang GJ, Zhang YL (2014) Geometric process model for a system with inspections
and preventive repair. Computers and Industrial Engineering 75: 13–19. doi:10.1016/
j.cie.2014.06.007.
88. White III ChC (1978) Optimal inspection and repair of a production process subject to
deterioration. The Journal of the Operational Research Society 29(3): 235–243.
89. Zuckerman D (1980) Inspection and replacement policies. Journal of Applied
Probability 17(1): 168–177.
90. Abdel-Hameed M (1987) Inspection and maintenance policies of devices subject to
deterioration. Advances in Applied Probability 19(4): 917–931.
91. Kong MB, Park KS (1997) Optimal replacement of an item subject to cumulative dam-
age under periodic inspections. Microelectronics Reliability 37(3): 467–472.
92. Delia M-C, Rafael P-O (2008) A maintenance model with failures and inspection
following Markovian arrival processes and two repair modes. European Journal of
Operational Research 186: 694–707. doi:10.1016/j.ejor.2007.02.009.
93. Chiang JH, Yuan J (2001) Optimal maintenance policy for a Markovian system
under periodic inspection. Reliability Engineering and System Safety 71: 165–172.
doi:10.1016/S0951-8320(00)00093-4.
94. Kharoufer JP, Finkelstein DE, Mixon DG (2006) Availability of periodically inspected
systems with Markovian wear and shocks. Journal of Applied Probability 43(2): 303–
317. doi:10.1239/jap/1152413724.
95. Mazumdar M (1970) Reliability of two-unit redundant repairable systems when failures
are revealed by inspections. SIAM Journal on Applied Mathematics 19(4): 637–647.
96. Osaki S, Asakura T (1970) A two-unit standby redundant system with repair and pre-
ventive maintenance. Journal of Applied Probability 7(3): 641–648.
97. Mahmoud MAW, Mohie El-Din MM, El-Said Moshref M (1995) Reliability analysis of
a two-unit cold standby system with inspection, replacement, proviso of rest, two types
of repair and preparation time. Microelectronics Reliability 35(7): 1063–1072.
98. Pandey D, Tyagi SK, Jacob M (1995) Profit evaluation of a two-unit system with inter-
nal and external repairs, inspection and post repair. Microelectronics Reliability 35(2):
259–264.
99. Cazorla DM, Perez-Ocon R (2008) An LDQBD process under degradation, inspection,
and two types of repair. European Journal of Operational Research 190: 494–508.
doi:10.1016/j.ejor.2007.04.056.
100. Kumar J (2011) Cost-benefit analysis of a redundant system with inspection and priority
subject to degradation. IJCSI International Journal of Computer Science Issues 8(6/2):
314–321.
74 Reliability Engineering
101. Kishan R, Jain D (2012) A two non-identical unit standby system model with repair,
inspection and post-repair under classical and Bayesian viewpoints. Journal of
Reliability and Statistical Studies 5(2): 85–103.
102. Bhatti J, Chitkara AK, Kakkar MK (2016) Stochastic analysis of dis-similar standby
system with discrete failure, inspection and replacement policy. Demonstratio
Mathematica 49(2): 224–235.
103. Vaurio JK (1999) Availability and cost functions for periodically inspected preventively
maintained units. Reliability Engineering and System Safety 63: 133–140. doi:10.1016/
S0951-8320(98)00030-1.
104. Vaurio JK (1997) On time-dependent availability and maintenance optimization of
standby units under various maintenance policies. Reliability Engineering and System
Safety 56: 79–89. doi:10.1016/S0951-8320(96)00132-9.
105. Kenzin M, Frostig E (2009) M out of n inspected systems subject to shocks in random
environment. Reliability Engineering and System Safety 94: 1322–1330. doi:10.1016/j.
ress.2009.02.005.
106. Zequeira RI, Berenguer C (2005) On the inspection policy of a two-component parallel
system with failure interaction. Reliability Engineering and System Safety 88: 99–107.
doi:10.1016/j.ress.2004.07.009.
107. Lee BL, Wang M (2012) Approximately optimal testing policy for two-unit parallel
standby systems. International Journal of Applied Science and Engineering 10(3):
263–272.
108. Mendes AA, Coit DW, Duarte Ribeiro JL (2014) Establishment of the optimal time
interval between periodic inspections for redundant systems. Reliability Engineering
and System Safety 131: 148–165. doi:10.1016/j.ress.2014.06.021.
109. Greenberg H (1964) Optimum test procedure under stress. Operations Research 12(5):
689–692.
110. Anily S, Glass CA, Hassin R (1998) The scheduling of maintenance service. Discrete
Applied Mathematics 82(1–3): 27–42. doi:10.1016/S0166-218X(97)00119-4.
111. Sheils E, O’connor A, Breysse D, Schoefs F, Yotte S (2010) Development of a two-
stage inspection process for the assessment of deteriorating infrastructure. Reliability
Engineering and System Safety 95: 182–194. doi:10.1016/j.ress.2009.09.008.
112. Duffuaa S, Al-Najjar HJ (1995) An optimal complete inspection plan for critical
multicharacteristic components. Journal of the Operational Research Society 46(8):
930–942.
113. Duffuaa S, Khan M (2002) An optimal repeat inspection plan with several classifi-
cations. Journal of the Operational Research Society 53(9): 1016–1026. doi:10.1057/
palgrave.jors.2601392.
114. Duffuaa S, Khan M (2008) A general repeat inspection plan for dependent multicharac-
teristic critical components. European Journal of Operational Research 191: 374–385.
doi:10.1016/j.ejor.2007.02.033.
115. Sahraoui Y, Khelif R, Chateauneuf A (2013) Maintenance planning under imperfect
inspections of corroded pipelines. International Journal of Pressure Vessels and
Piping 104: 76–82. doi:10.1016/j.ijpvp.2013.01.009.
116. Srivastava MS, Wu Y (1993) Estimation and testing in an imperfect-inspection model.
IEEE Transactions on Reliability 42(2): 280–286. IEEE, doi: 10.1109/24.229501.
117. Godziszewski J (2001) The impact of errors of the first and second types made dur-
ing inspections on the costs of maintenance of a homogeneous equipment park (in
Polish). In: Proceedings of XIX Winter School on Reliability—Computer Aided
Dependability Analysis, Publishing House of Institute for Sustainable Technologies,
Radom: 89–100.
118. Aven T (1987) Optimal inspection and replacement of a coherent system.
Microelectronics Reliability 27(3): 447–450. doi:10.1016/0026-2714(87)90460-4.
Inspection Maintenance Modeling for Technical Systems 75
119. Zuckerman D (1989) Optimal inspection policy for a multi-unit machine. Journal of
Applied Probability 26: 543–551.
120. Qiu Y (1991) A note on optimal inspection policy for stochastically deteriorating series
systems. Journal of Applied Probability 28: 934–939.
121. Dieulle L (1999) Reliability of a system with Poisson inspection times. Journal of
Applied Probability 36(4): 1140–1154.
122. Dieulle L (2002) Reliability of several component sets with inspections at random
times. European Journal of Operational Research 139: 96–114.
123. Rezaei E (2017) A new model for the optimization of periodic inspection intervals
with failure interaction: A case study for a turbine rotor. Case Studies in Engineering
Failure Analysis 9: 148–156. doi:10.1016/j.csefa.2015.10.001.
124. Tolentino D, Ruiz SE (2014) Influence of structural deterioration over time on the opti-
mal time interval for inspection and maintenance of structures. Engineering Structures
61: 22–30. doi:10.1016/j.engstruct.2014.01.012.
125. Lu Z, Chen M, Zhou D (2015) Periodic inspection maintenance policy with a general
repair for multi-state systems. In: Proceedings of Chinese Automation Congress (CAC):
2116–2121.
126. Zhang J, Huang X, Fang Y, Zhou J, Zhang H, Li J (2016) Optimal inspection-based pre-
ventive maintenance policy for three-state mechanical components under competing
failure modes. Reliability Engineering and System Safety 152: 95–103. doi:10.1016/j.
ress.2016.02.007.
127. Carvalho M, Nunes E, Telhada J (2009) Optimal periodic inspection of series sys-
tems with revealed and unrevealed failures. In: Safety, Reliability and Risk Analysis:
Theory, Methods and Applications—Proceedings of the Joint Esrel and SRA-Europe
Conference, CRC Press: 587–592.
128. Bris R, Chatelet E, Yalaoui F (2003) New method to minimize the preventive main-
tenance cost of series-parallel systems. Reliability Engineering and System Safety 82:
247–255. doi:10.1016/S0951-8320(03)00166-2.
129. Badia FG, Berrade MD, Campos CA (2002) Maintenance policy for multivariate
standby/operating units. Applied Stochastic Models in Business and Industry 18:
147–155.
130. Huang J, Song Y, Ren Y, Gao Q (2014) An optimization method of aircraft periodic inspec-
tion and maintenance based on the zero-failure data analysis. In: Proceedings of 2014
IEEE Chinese Guidance, Navigation and Control Conference, Yantai, China: 319–323.
131. Azadeh A, Sangari MS, Amiri AS (2012) A particle swarm algorithm for inspection
optimization in serial multi-stage process. Applied Mathematical Modelling 36: 1455–
1464. doi:10.1016/j.apm.2011.09.037.
132. Dzwigarek M, Hryniewicz O (2011) Frequency of periodical inspections of safety-
related control systems of machinery—Practical recommendations for determining
methods. In: Proceedings of Summer Safety and Reliability Seminars, SSARS 2011,
Gdańsk-Sopot, Poland: 17–26.
133. Alfares H (1999) A simulation model for determining inspection frequency. Computers
and Indus-trial Engineering 36: 685–696. doi:doi.org/10.1016/S0360-8352(99)00159-X.
134. Bai Y, Bai Q (2014) Subsea Pipeline Integrity and Risk Management. Elsevier.
doi:10.1016/C2011-0-00113-8.
135. Bai Y, Jin W-L (2015) Marine Structural Design. Elsevier.
136. Zhaoyang T, Jianfeng L, Zongzhi W, Jianhu Z, Weifeng H (2011) An evaluation of main-
tenance strategy using risk-based inspection. Safety Science 49: 852–860. doi:10.1016/j.
ssci.2011.01.015.
137. Hagemeijer PM, Kerkveld G (1998) A methodology for risk-based inspection of pres-
surized systems. Proceedings of the Institution of Mechanical Engineers, Part E:
Journal of Process Mechanical Engineering 212(1): 37–47. SAGE Journals.
76 Reliability Engineering
138. Hagemeijer PM, Kerkveld G (1998) Application of risk-based inspection for pressurized
HC production systems in a Brunei petroleum company. Proceedings of the Institution of
Mechanical Engineers, Part E: Journal of Process Mechanical Engineering 212(1):
49–54.
139. Wang J, Matellini B, Wall A, Phipps J (2012) Risk-based verification of large off-
shore systems. Proceedings of the Institution of Mechanical Engineers, Part
M: Journal of Engineering for the Maritime Environment 226(3): 273–298.
doi:10.1177/1475090211430302.
140. Jovanovic A (2003) Risk-based inspection and maintenance in power and process
plants in Europe. Nuclear Engineering and Design 226: 165–182.
141. Kallen MJ, Van Noortwijk JM (2005) Optimal maintenance decisions under imper-
fect inspection. Reliability Engineering and System Safety 90: 177–185. doi:10.1016/j.
ress.2004.10.004.
142. You J-S, Kuo H-T, Wu W-F (2006) Case studies of risk-informed inservice inspection
of nuclear piping systems. Nuclear Engineering and Design 236: 35–46.
143. Podofillini L, Zio E, Vatn J (2006) Risk-informed optimisation of railway tracks
inspection and maintenance procedures. Reliability Engineering and System Safety 91:
20–35. doi:10.1016/j.ress.2004.11.009.
144. Nakagawa T (1980) Replacement models with inspection and preventive maintenance.
Microelectronics and Reliability 20: 427–433.
145. Yeh L (2003) An inspection-repair-replacement model for a deteriorating system with
unobservable state. Journal of Applied Probability 40: 1031–1042.
146. Park JH, Lee SC, Hong JW, Lie CH (2009) An optimal block preventive maintenance
policy for a multi-unit system considering imperfect maintenance. Asia-Pacific Journal
of Operational Research 26(6): 831–847.
147. Sheu S-H, Lin Y-B, Liao G-L (2006) Optimum policies for a system with general imper-
fect maintenance. Reliability Engineering and System Safety 91(3): 362–369.
148. Taghipour S, Banjevic D (2012) Optimum inspection interval for a system under peri-
odic and opportunistic inspections. IIEE Transactions 44: 932–948. doi:10.1080/07408
17X.2011.618176.
149. Taghipour S, Banjevic D (2012) Optimal inspection of a complex system subject to
periodic and opportunistic inspections and preventive replacements. European Journal
of Operational Research 220: 649–660. doi:10.1016/j.ejor.2012.02.002.
150. Babishin V, Taghipour S (2016) Joint optimal maintenance and inspection for a k-out-
of-n system. International Journal of Advanced Manufacturing Technology 87(5):
1739–1749. doi:10.1109/RAMS.2016.7448039.
151. Bjarnason ETS, Taghipour S (2014) Optimizing simultaneously inspection interval and
inventory levels (s, S) for a k-out-of-n system. In: 2014 Reliability and Maintainability
Symposium, Colorado Springs, CO: 1–6. doi:10.1109/RAMS.2014.6798463.
152. Chen C-T, Chen Y-W, Yuan J (2003) On dynamic preventive maintenance policy
for a system under inspection. Reliability Engineering and System Safety 80: 41–47.
doi:10.1016/S0951-8320(02)00238-7.
153. Chen Y-C (2013) An optimal production and inspection strategy with preventive main-
tenance error and rework. Journal of Manufacturing Systems 32: 99–106. doi:10.1016/j.
jmsy.2012.07.010.
154. Duffuaa S, El-Ga’aly A (2013) A multi-objective mathematical optimization model for
process targeting using 100% inspection policy. Applied Mathematical Modelling 37:
1545–1552. doi:10.1016/j.apm.2012.04.008.
155. Wang H, Wang W, Peng R (2017) A two-phase inspection model for a single compo-
nent system with three-stage degradation. Reliability Engineering and System Safety
158: 31–40.
Inspection Maintenance Modeling for Technical Systems 77
156. Feng Q, Peng H, Coit DW (2010) A degradation-based model for joint optimiza-
tion of burn-in, quality inspection, and maintenance: A light display device applica-
tion. International Journal of Advanced Manufacturing Technology 50: 801–808.
doi:10.1007/s00170-010-2532-7.
157. Tsai H-N, Sheu S-H, Zhang ZG (2016) A trivariate optimal replacement policy for
a deteriorating system based on cumulative damage and inspections. Reliability
Engineering and System Safety 160: 122–135. doi:10.1016/j.ress.2016.10.031.
158. Bjarnason ETS, Taghipour S, Banjevic D (2014) Joint optimal inspection and inven-
tory for a k-out-of-n system. Reliability Engineering and System Safety 131: 203–215.
doi:10.1016/j.ress.2014.06.018.
159. Panagiotidou S (2014) Joint optimization of spare parts ordering and maintenance
policies for multiple identical items subject to silent failures. European Journal of
Operational Research 235: 300–314. doi:10.1016/j.ejor.2013.10.065.
160. Bukowski JV (2001) Modeling and analyzing the effects of periodic inspection on
the performance of safety-critical systems. IEEE Transactions on Reliability 50(3):
321–329. doi:10.1109/24.974130.
161. Ellingwood BR, Mori Y (1997) Reliability-based service life assessment of con-
crete structures in nuclear power plants: Optimum inspection and repair. Nuclear
Engineering and Design 175: 247–258.
162. Estes AC, Frangopol DM (2000) An optimized lifetime reliability-based inspection
program for deteriorating structures. In: Proceedings of the 8th ASCE Joint Specialty
Conference on Probabilistic Mechanics and Structural Reliability, Notre Dame, IN.
163. Faber MH, Sorensen JD (2002) Indicators for inspection and maintenance planning of
concrete structures. Structural Safety 24: 377–396. doi:10.1016/S0167-4730(02)00033-4.
164. Onoufriou T, Frangopol DM (2002) Reliability-based inspection optimization of com-
plex structures: A brief retrospective. Computers and Structures 80: 1133–1144.
165. Woodcock K (2014) Model of safety inspection. Safety Science 62: 145–156.
166. Ten Wolde M, Ghobbar AA (2013) Optimizing inspection intervals—Reliability and
availability in terms of a cost model: A case study on railway carriers. Reliability
Engineering and System Safety 114: 137–147. doi:10.1016/j.ress.2012.12.013.
167. Ali SA, Bagchi G (1998) Risk-informed in service inspection. Nuclear Engineering
and Design 181: 221–224.
168. Garnero M-A, Beaudouin F, Delbos J-P (1998) Optimization of bearing-inspection
intervals. IEEE Proceedings of Annual Reliability and Maintainability Symposium:
332–338.
169. Aoki K, Yamamoto K, Kobayashi K (2007) Optimal inspection and replacement policy
using stochastic method for deterioration prediction, In: Proceedings of 11th World
Conference on Transport Research, Berkeley CA:1–13.
170. Sandoh H, Igaki N (2003) Optimal inspection policies for a scale. Computers and
Mathematics with Applications 46: 1119–1127.
171. Sandoh H, Igaki N (2001) Inspection policies for a scale. Journal of Quality in
Maintenance Engineering 7(3): 220–231.
172. Guduru RKR, Shaik SH, Yaramala S (2018) A dynamic optimization model for multi-
objective maintenance of sewing machine. International Journal of Pure and Applied
Mathematics 118(20): 33–43.
173. Gravito FM, Dos Santos Filho N (2003) Inspection and maintenance of wooden poles
structures. Global ESMO 2003, Orlando, Florida: 151–155.
174. Jazwinski J, Zurek J (2000) Principles of determining the maintenance set of the condi-
tion of the transport system with the use of expert opinions (in Polish). In: Proceeding of
XXVIII Winter School on Reliability—Decision Problems in Dependability Engineering,
Publishing House of Institute for Sustainable Technologies, Radom, Poland: 118–125.
78 Reliability Engineering
CONTENTS
3.1 Introduction
3.2 Continuous State Stochastic Processes
3.2.1 Wiener Process
3.2.2 Gamma Process
3.2.3 Inverse Gaussian Process
3.2.4 Case Example: Degradation Analysis with a Continuous State Stochastic Process
3.2.5 Selection of Appropriate Continuous State Stochastic Process
3.3 Discrete State Stochastic Processes
3.3.1 Markovian Structure
3.3.2 Semi-Markov Process
3.4 Summary and Conclusions
References
3.1 INTRODUCTION
Most engineering systems experience aging during their life cycle, and operating
conditions and external stresses further accelerate it. The aging process reflects
the propagation of a failure mechanism, which ultimately results in a decline of
product performance and finally in product failure. To reduce downtime and ensure
safe operation, it is desirable to estimate the product's lifetime and reliability
measures accurately so that appropriate maintenance policies can be executed.
Knowledge of the product's deterioration characteristics and fundamental root causes
is therefore a valuable source of information for assessing product performance and
reliability through degradation modeling (Limon et al. 2017a; Shahraki et al. 2017).
In degradation modeling, a predefined threshold value is used to identify the
time-to-failure. Further, the degradation
result of the small and independent degradation increments. Besides capturing the
temporal variation of the degradation processes, these members of the Lévy
processes also have well-established mathematical properties useful for explaining
the degradation behavior. Further, the members of the Lévy processes have the
Markov property, which can be expressed as:
$$\Pr\left(X_{t_i} \mid X_{t_{i-1}}, X_{t_{i-2}}, X_{t_{i-3}}, \ldots, X_{t_1}\right) = \Pr\left(X_{t_i} \mid X_{t_{i-1}}\right)$$
This implies that the next degradation increment depends only on the current
state of the degradation and is independent of the past degradation increments.
This property is also intuitive and practical for many deterioration processes.
The following sections provide the details of each stochastic process for
degradation modeling.
3.2.1 Wiener Process
The basic Wiener process can be expressed as:
$$Y(t) = \mu\Lambda(t) + \sigma B(\Lambda(t)) \tag{3.1}$$
Here B(·) is the standard Brownian motion, µ and σ are the drift and volatility
parameters, respectively, Λ(·) is the timescale function, and Y(t) is the
characteristic indicator that represents the system behavior. If a random variable
Y(t) follows the Wiener process, it has the following mathematical properties:
1. $y(0) = 0$
2. $y(t)$ follows a normal distribution $N\left(\mu\Lambda(t), \sigma^{2}\Lambda(t)\right)$
3. $y(t)$ has independent increments over every time interval $\Delta t$ ($\Delta t = t_i - t_{i-1}$)
4. The independent increment $\Delta y(t) = y_i - y_{i-1}$ follows the normal distribution
$N\left(\mu\Delta\Lambda(t), \sigma^{2}\Delta\Lambda(t)\right)$ with probability density function (PDF):
$$f(\Delta y(t)) = \frac{1}{\sigma\sqrt{2\pi\Delta\Lambda(t)}} \exp\left[-\frac{\left(\Delta y - \mu\Delta\Lambda(t)\right)^{2}}{2\sigma^{2}\Delta\Lambda(t)}\right] \tag{3.2}$$
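The four properties above translate directly into a simulation recipe: start at zero and add independent normal increments whose mean and variance scale with the increment of the timescale function. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the power law timescale Λ(t) = t^c used later in this section, and the function names and parameter values (`simulate_wiener_path`, `wiener_increment_pdf`, µ = 0.02, σ = 0.01, c = 0.5) are hypothetical choices for illustration.

```python
import math
import random


def simulate_wiener_path(times, mu, sigma, c=1.0, seed=None):
    """Simulate Y(t) = mu*Lambda(t) + sigma*B(Lambda(t)) with Lambda(t) = t**c.

    `times` must be increasing and start at 0 so that Y(0) = 0.
    """
    rng = random.Random(seed)
    path = [0.0]
    for t_prev, t_curr in zip(times, times[1:]):
        d_lambda = t_curr ** c - t_prev ** c          # increment of the timescale
        mean = mu * d_lambda                          # drift part of the increment
        std = sigma * math.sqrt(d_lambda)             # diffusion part of the increment
        path.append(path[-1] + rng.gauss(mean, std))  # independent normal increment
    return path


def wiener_increment_pdf(dy, d_lambda, mu, sigma):
    """Normal density of one increment, following Equation 3.2."""
    var = sigma ** 2 * d_lambda
    return math.exp(-(dy - mu * d_lambda) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical parameter values, chosen only to produce a path.
path = simulate_wiener_path([0, 50, 100, 150, 200, 250], mu=0.02, sigma=0.01, c=0.5, seed=1)
```

Because the increments are normal, a simulated path can decrease between inspections, which is one practical difference from the monotone gamma and inverse Gaussian processes.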
The Wiener process is also known as Brownian motion, the random movement of
particles suspended in a fluid that results from their collisions. This random
movement of small particles is closely analogous to the random increments of a
deterioration path. The Wiener process also has other properties that suit it to
degradation modeling. For example, the degradation process can be viewed as the
cumulative effect of many small environmental effects, and by the central limit
theorem the sum of these small effects can be approximated by a normal
distribution. Environmental effects such as temperature, shocks, and humidity are
most often independent, and the resulting degradation increments are also
independent over disjoint time intervals. For these reasons, the Wiener process is
a versatile model for describing many degradation phenomena. In a Wiener process,
the drift parameter µ represents the degradation rate and the timescale function
Λ(·) captures the nonlinearity in the degradation process.
Manufacturers often use the accelerated degradation test (ADT) to analyze
reliability metrics quickly during the product design stages. In an ADT, product
samples are subjected to stress levels higher than the normal operating conditions
to expedite the degradation process. The effect of stress on product degradation
and lifetime can be explained by several existing physics-based or empirical
reaction rate models. For example, the effect of temperature or any thermal stress
on product deterioration can be captured by the Arrhenius model. Following are
well-established reaction rate models, where d(s) represents the rate of
degradation at stress level s, and a1 and a2 are constant coefficients that depend
on the material or product type (Nelson 2004):
$$d(s) = \begin{cases} a_1 e^{-a_2/T}, & \text{Arrhenius model } (s = T) \\ a_1 V^{a_2}, & \text{Power law model } (s = V) \end{cases} \tag{3.3}$$
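The two reaction rate models in Equation 3.3 are simple enough to evaluate directly. The sketch below is illustrative only: the function names and the coefficient values are hypothetical, and the temperature is assumed to be an absolute temperature, as is usual for the Arrhenius model.

```python
import math


def arrhenius_rate(temperature, a1, a2):
    """d(s) = a1 * exp(-a2 / T), with s = T (absolute temperature assumed)."""
    return a1 * math.exp(-a2 / temperature)


def power_law_rate(voltage, a1, a2):
    """d(s) = a1 * V**a2, with s = V."""
    return a1 * voltage ** a2


# Hypothetical coefficients: compare the rate at a test stress with the rate at use stress.
rate_use = arrhenius_rate(300.0, a1=1.0e3, a2=4000.0)
rate_test = arrhenius_rate(350.0, a1=1.0e3, a2=4000.0)
acceleration_factor = rate_test / rate_use  # how much faster degradation proceeds in the test
```

The ratio of the two rates does not depend on a1, which is why acceleration between two stress levels is governed by a2 alone in both models.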
Since the magnitudes of the stresses may differ significantly in a multi-stress
scenario because of their measurement units, it is important to use standardized
(transformed) stresses to remove the influence of the units. The transformed
stress level is given as (Park and Yum 1997):
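$$S_k = \begin{cases} \dfrac{1/S_0' - 1/S_k'}{1/S_0' - 1/S_M'}, & \text{for Arrhenius model} \\[2ex] \dfrac{\log\left(S_k'\right) - \log\left(S_0'\right)}{\log\left(S_M'\right) - \log\left(S_0'\right)}, & \text{for Power law model} \\[2ex] \dfrac{S_k' - S_0'}{S_M' - S_0'}, & \text{for Exponential law model} \end{cases} \tag{3.4}$$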
where $S_0'$, $S_k'$, and $S_M'$ represent the operational, applied accelerated, and
maximum stress levels in their original units, and $S_k$ represents the
corresponding transformed stress. A multiple-stress degradation test with possible
interaction effects between the stresses is considered. The nonlinear behavior of
the degradation is described by the power law function $\Lambda(t) = t^{c}$, where c is
a constant. Assuming that both Wiener parameters are stress dependent, the
log-likelihood function can be written as:
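$$L(\theta) = \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{p}\left[-\frac{1}{2}\log\left(2\pi\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right) - \frac{1}{2}\log\left(\sigma^{2}\right) - \frac{\left(\Delta y_{ijk} - \mu(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)^{2}}{2\sigma^{2}\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)}\right] \tag{3.5}$$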
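As an illustration of how Equation 3.5 is evaluated, the sketch below sums the log of the normal increment density over the measurements of one sample path and then over several paths. It is a simplified sketch, not the authors' code: the drift and volatility are passed in as plain numbers, so any stress dependence µ(s) would have to be computed by the caller, and the function names are hypothetical.

```python
import math


def wiener_log_likelihood(times, values, mu, sigma, c):
    """Log-likelihood of one degradation path under the Wiener model with Lambda(t) = t**c."""
    total = 0.0
    for i in range(1, len(times)):
        d_lambda = times[i] ** c - times[i - 1] ** c   # t_i^c - t_(i-1)^c
        dy = values[i] - values[i - 1]                 # observed degradation increment
        total += (
            -0.5 * math.log(2.0 * math.pi * d_lambda)
            - 0.5 * math.log(sigma ** 2)
            - (dy - mu * d_lambda) ** 2 / (2.0 * sigma ** 2 * d_lambda)
        )
    return total


def total_log_likelihood(paths, mu, sigma, c):
    """Add the contributions of several independent sample paths.

    `paths` is a list of (times, values) pairs.
    """
    return sum(wiener_log_likelihood(t, y, mu, sigma, c) for t, y in paths)
```

A numerical optimizer would then search over the model parameters for the values that maximize `total_log_likelihood`.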
Application of Stochastic Processes in Degradation Modeling 83
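$$f_{IG}(y; a, b) = \left(\frac{b}{2\pi y^{3}}\right)^{1/2} \exp\left[-\frac{b\,(y - a)^{2}}{2a^{2}y}\right] \tag{3.6}$$
Here, a and b are the IG distribution parameters. The mean time to failure can then
be written as:
$$\xi_w = \left(\frac{D - y_0}{\mu(s)}\right)^{1/c} \tag{3.7}$$
and the reliability function can be approximated as:
$$R(t) \approx \Phi\left(\frac{D - y_0 - \mu(s)\,t^{c}}{\sqrt{\sigma^{2}(s)\,t^{c}}}\right) \tag{3.8}$$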
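Equations 3.7 and 3.8 can be evaluated with nothing more than a standard normal CDF. The sketch below assumes that µ and σ have already been evaluated at the stress level of interest; the function names and the numerical values in the usage comment are hypothetical.

```python
from statistics import NormalDist


def wiener_mean_lifetime(threshold, y0, mu, c):
    """Equation 3.7: time at which the mean path mu * t**c covers the distance D - y0."""
    return ((threshold - y0) / mu) ** (1.0 / c)


def wiener_reliability(t, threshold, y0, mu, sigma, c):
    """Equation 3.8: normal approximation of the reliability at time t > 0."""
    lam = t ** c                                      # timescale Lambda(t) = t**c
    z = (threshold - y0 - mu * lam) / (sigma * lam ** 0.5)
    return NormalDist().cdf(z)
```

For example, with the hypothetical values D = 0.5, y0 = 0, µ = 0.05 and c = 0.5, `wiener_mean_lifetime` returns a value of about 100, and the approximate reliability at that time is about 0.5 because the mean path has just reached the threshold.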
3.2.2 Gamma Process
The gamma process represents degradation in the form of cumulative damage, where
deterioration occurs gradually over time. If a random variable Y(t) represents the
deterioration, then the gamma process, which is a continuous-time stochastic
process, has the following mathematical properties (O’Connor 2012):
1. $y(0) = 0$
2. $y(t)$ follows a gamma distribution $Ga(\alpha t, \beta)$
3. $y(t)$ has independent increments over a time interval $\Delta t$ ($\Delta t = t_i - t_{i-1}$)
4. The independent increment $\Delta y(t) = y_i - y_{i-1}$ also follows the gamma
distribution $Ga(\alpha\Delta t, \beta)$ with PDF:
$$f(\Delta y(t)) = \frac{\beta^{\alpha\left(t_i^{c} - t_{i-1}^{c}\right)}}{\Gamma\left(\alpha\left(t_i^{c} - t_{i-1}^{c}\right)\right)}\,\Delta y^{\alpha\left(t_i^{c} - t_{i-1}^{c}\right) - 1} e^{-\beta\Delta y} \tag{3.9}$$
where α > 0 and β > 0 represent the gamma shape and scale parameters, respectively
(β enters Equation 3.9 as a rate, that is, as the inverse of the usual scale),
c is a nonlinearity parameter, and Γ(·) is the gamma function with
$\Gamma(a) = \int_{0}^{\infty} x^{a-1} e^{-x}\,dx$.
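The gamma process properties listed above can be sketched as a simulation in the same way as for the Wiener process, with gamma-distributed increments in place of normal ones. This is a minimal sketch under stated assumptions: β is treated as a rate, as it appears in Equation 3.9, so the scale argument passed to Python's `random.gammavariate` is 1/β; the function name and parameter values are hypothetical.

```python
import random


def simulate_gamma_path(times, alpha, beta, c=1.0, seed=None):
    """Simulate a gamma process with shape function alpha * t**c and rate beta.

    `times` must be increasing and start at 0 so that Y(0) = 0.
    """
    rng = random.Random(seed)
    path = [0.0]
    for t_prev, t_curr in zip(times, times[1:]):
        shape = alpha * (t_curr ** c - t_prev ** c)   # shape of this increment
        # random.gammavariate takes (shape, scale); the scale is 1 / rate.
        path.append(path[-1] + rng.gammavariate(shape, 1.0 / beta))
    return path


# Hypothetical parameter values, chosen only to produce a path.
gamma_path = simulate_gamma_path([0, 50, 100, 150, 200, 250], alpha=0.5, beta=40.0, c=0.5, seed=7)
```

Every increment is positive, so the simulated path never decreases, which matches the interpretation of the gamma process as cumulative damage.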
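Now, considering the accelerated test with both gamma parameters dependent on the
stresses, including their interaction effect, the likelihood function can be
written as:
$$L(\theta) = \prod_{i=1}^{n}\prod_{j=1}^{m}\prod_{k=1}^{p} \frac{\left[\beta(s)\right]^{\alpha(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)}}{\Gamma\left(\alpha(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)}\,\Delta y_{ijk}^{\alpha(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right) - 1} e^{-\Delta y_{ijk}\,\beta(s)} \tag{3.10}$$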
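In practice the logarithm of the product in Equation 3.10 is maximized, because the sum of log-densities is numerically better behaved than the product. The sketch below evaluates that logarithm for one sample path using the gamma increment density of Equation 3.9. It is an illustration only: α and β are passed in as numbers already evaluated at the stress level of interest, and the function name is hypothetical.

```python
import math


def gamma_log_likelihood(times, values, alpha, beta, c):
    """Log-likelihood of one path under the gamma process with shape alpha * t**c and rate beta.

    All increments of `values` must be strictly positive.
    """
    total = 0.0
    for i in range(1, len(times)):
        shape = alpha * (times[i] ** c - times[i - 1] ** c)
        dy = values[i] - values[i - 1]
        total += (
            shape * math.log(beta)           # log of beta**shape
            - math.lgamma(shape)             # log of the gamma function in the denominator
            + (shape - 1.0) * math.log(dy)   # log of dy**(shape - 1)
            - beta * dy                      # log of exp(-beta * dy)
        )
    return total
```

Using `math.lgamma` instead of `math.gamma` avoids overflow when the shape of an increment is large.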
Maximum likelihood estimation (MLE) with numerical optimization software can be
used to maximize the likelihood in Equation 3.10. Assuming that a failure occurs
when the degradation path reaches the threshold D, the time to failure ξ is defined
as the time when the degradation path crosses the threshold D, and the reliability
function at time t is:
$$R(t) = P(t < t_D) = 1 - \frac{\Gamma\left(\alpha t^{c}, D\beta\right)}{\Gamma\left(\alpha t^{c}\right)} \tag{3.11}$$
The cumulative distribution function (CDF) of the time to failure is:
$$F(t) = \frac{\Gamma\left(\alpha t^{c}, D\beta\right)}{\Gamma\left(\alpha t^{c}\right)} \tag{3.12}$$
Because of the gamma function, the evaluation of the CDF becomes mathematically
intractable. To deal with this issue, Park and Padgett (2005) proposed
approximating the time-to-failure ξ with a Birnbaum-Saunders (BS) distribution
having the following CDF:
$$F_{BS}(t) \approx \Phi\left[\frac{1}{a}\left(\sqrt{\frac{t^{c}}{b}} - \sqrt{\frac{b}{t^{c}}}\right)\right] \tag{3.13}$$
where a = 1/√(ωβ) and b = ωβ/α. With the BS approximation, the expected failure
time can be estimated as:
$$\xi_G = \left(\frac{\omega}{\alpha}\beta + \frac{1}{2\alpha}\right)^{1/c} \tag{3.14}$$
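The Birnbaum-Saunders approximation in Equations 3.13 and 3.14 avoids the incomplete gamma function and needs only the standard normal CDF. The sketch below evaluates both expressions as written, with a = 1/√(ωβ) and b = ωβ/α. The symbol ω is not defined in this part of the text, so it is simply passed in as a number; the function names and the values used in the note below are hypothetical.

```python
from statistics import NormalDist


def bs_failure_cdf(t, alpha, beta, omega, c):
    """Equation 3.13: Birnbaum-Saunders approximation of the failure-time CDF for t > 0."""
    a = 1.0 / (omega * beta) ** 0.5
    b = omega * beta / alpha
    lam = t ** c                                       # timescale Lambda(t) = t**c
    z = ((lam / b) ** 0.5 - (b / lam) ** 0.5) / a
    return NormalDist().cdf(z)


def bs_expected_failure_time(alpha, beta, omega, c):
    """Equation 3.14: expected failure time under the approximation."""
    return (omega * beta / alpha + 1.0 / (2.0 * alpha)) ** (1.0 / c)
```

With the hypothetical values α = 2, β = 4, ω = 0.5 and c = 1, b equals 1, so the approximate CDF at t = 1 is 0.5, and the expected failure time is 1.25.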
Here µ and λ denote the mean and scale parameters, and Λ(t) represents the shape
function. The mean of Y(t) is µΛ(t) and the variance is µ³Λ(t)/λ. The shape
function is nonlinear, and a power law, $\Lambda(t) = t^{c}$, is chosen in this work to
represent the nonstationary process. By the properties of the IG process and
Equation 3.15, the likelihood function of the degradation increments can be given as:
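$$L(\theta) = \prod_{i=1}^{n}\prod_{j=1}^{m}\prod_{k=1}^{p} \sqrt{\frac{\lambda_{ijk}\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)^{2}}{2\pi\,\Delta y_{ijk}^{3}}}\; \exp\left[-\frac{\lambda_{ijk}\left(\Delta y_{ijk} - \mu_{ijk}\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)^{2}}{2\mu_{ijk}^{2}\,\Delta y_{ijk}}\right] \tag{3.16}$$
The CDF of the time to failure $\xi_D$, the time at which the degradation path
crosses the threshold D, is:
$$F\left(\xi_D \mid D, \mu t^{c}, \lambda (t^{c})^{2}\right) = \Phi\left[\sqrt{\frac{\lambda}{D - y_0}}\left(t^{c} - \frac{D - y_0}{\mu}\right)\right] - e^{2\lambda t^{c}/\mu}\,\Phi\left[-\sqrt{\frac{\lambda}{D - y_0}}\left(t^{c} + \frac{D - y_0}{\mu}\right)\right] \tag{3.17}$$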
where Φ(·) is the CDF of the standard normal distribution. However, when µΛ(t) and
t are large, Y(t) can be approximated by the normal distribution with mean µΛ(t) and
variance µ³Λ(t)/λ. Therefore, the CDF of $\xi_D$ can also be approximated by the
following equation (Ye and Chen 2014):
$$F\left(\xi_{IG} \mid D, \mu t^{c}, \lambda (t^{c})^{2}\right) = \Phi\left(\frac{D - \mu(s)\,t^{c}}{\sqrt{\mu(s)^{3}\,t^{c}/\lambda}}\right) \tag{3.18}$$
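The first-passage CDF of the IG process in Equation 3.17 involves only the standard normal CDF and one exponential term, so it can be evaluated directly. The sketch below implements Equation 3.17 with the power law shape function Λ(t) = t^c; the normal approximation of Equation 3.18 is not implemented here. The function name and the parameter values mentioned below are hypothetical.

```python
import math
from statistics import NormalDist


def ig_failure_cdf(t, threshold, y0, mu, lam, c):
    """Equation 3.17: probability that the IG degradation path has crossed D by time t > 0."""
    phi = NormalDist().cdf
    d = threshold - y0                   # distance from the initial level to the threshold
    shape = t ** c                       # shape function Lambda(t) = t**c
    root = math.sqrt(lam / d)
    first = phi(root * (shape - d / mu))
    # For very large lam * t**c / mu the exponential can overflow; moderate values are assumed.
    second = math.exp(2.0 * lam * shape / mu) * phi(-root * (shape + d / mu))
    return first - second
```

With the hypothetical values µ = 1, λ = 1, D − y0 = 1 and c = 1, the function returns about 0.33 at t = 1 and about 0.77 at t = 2, so the failure probability grows with time as expected.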
TABLE 3.1
Accelerated Degradation Test Dataset of LEDs

                         Degradation Measurement (lux) at time (hrs)
Stress Level   Sample    0      50      100     150     200     250
40 mA          1         1      0.866   0.787   0.76    0.716   0.68
40 mA          2         1      0.821   0.714   0.654   0.617   0.58
40 mA          3         1      0.827   0.703   0.64    0.613   0.593
40 mA          4         1      0.798   0.683   0.623   0.6     0.59
40 mA          5         1      0.751   0.667   0.628   0.59    0.54
40 mA          6         1      0.837   0.74    0.674   0.63    0.613
40 mA          7         1      0.73    0.65    0.607   0.583   0.58
40 mA          8         1      0.862   0.676   0.627   0.6     0.597
40 mA          9         1      0.812   0.65    0.606   0.593   0.573
40 mA          10        1      0.668   0.633   0.593   0.573   0.565
40 mA          11        1      0.661   0.642   0.594   0.58    0.553
40 mA          12        1      0.765   0.617   0.613   0.597   0.56
35 mA          1         1      0.951   0.86    0.776   0.7     0.667
35 mA          2         1      0.933   0.871   0.797   0.743   0.73
35 mA          3         1      0.983   0.924   0.89    0.843   0.83
35 mA          4         1      0.966   0.882   0.851   0.814   0.786
35 mA          5         1      0.958   0.89    0.84    0.81    0.8
35 mA          6         1      0.94    0.824   0.774   0.717   0.706
35 mA          7         1      0.882   0.787   0.75    0.7     0.693
35 mA          8         1      0.867   0.78    0.733   0.687   0.673
35 mA          9         1      0.89    0.8     0.763   0.723   0.713
35 mA          10        1      0.962   0.865   0.814   0.745   0.742
35 mA          11        1      0.975   0.845   0.81    0.75    0.741
35 mA          12        1      0.924   0.854   0.8     0.733   0.715

Source: Chaluvadi, V.N.H., Accelerated life testing of electronic revenue meters, PhD dissertation,
Clemson University, Clemson, SC, 2008.
Application of Stochastic Processes in Degradation Modeling 87
and degradation data from the test. Two different combinations of constant accelerated
stresses were used to accelerate the lumen degradation of LEDs. At each stress
level, twelve samples were tested, and the light intensity of each sample LED was
measured at room temperature every 50 hours up to 250 hours. The operating stress
is defined as 30 mA, and a 50 percent degradation of the initial light intensity is
considered to be the failure threshold value.
Figure 3.1 shows the nonlinear nature of the LED degradation paths, which
justifies our assumption of a nonstationary continuous-state stochastic process.
The nonlinear likelihood function with multiple model parameters makes
parameter estimation challenging. The MLE method, implemented in the
statistical software R, has been used to solve these complex equations.
The built-in "mle" function, which calls "optim" (here with the Nelder-Mead
algorithm) to optimize the likelihood function, is used to estimate the model
parameters. After the model parameters for each stochastic process have been
estimated, the lifetime and reliability under any given set of operating
conditions can be estimated. The parameter and lifetime estimates for the
different stochastic process models are provided in Table 3.2.
The results show that the Wiener process yields a markedly larger lifetime estimate
than the Gamma and IG processes. Figure 3.2 illustrates the reliability estimates
obtained with the different stochastic process models. As with the lifetimes, the
reliability plots also show a higher estimate for the Wiener process.
TABLE 3.2
Parameter and Lifetime Estimates with Different Degradation Models
Model    γ0        γ1       δ0        δ1         c        Lifetime
Wiener   −4.3516   0.9483   −3.8413   0.1570     0.4569   3002.26
Gamma    −0.7636   0.0954   4.2685    −0.08528   0.5802   1812.28
IG       −5.1956   0.9481   −6.1025   0.1185     0.6097   1611.15
TABLE 3.3
Goodness-of-fit Statistics for Stochastic Processes
Goodness-of-fit statistic  Wiener     Gamma     Inverse Gaussian
KS statistic               0.1802     0.0708    0.1590
CVM statistic              1.27821    0.1159    0.5977
AD statistic               7.1927     0.6224    2.9771
AIC                        −315.2034  −407.427  −389.4947
BIC                        −309.6285  −401.852  −383.9197
Wiener is the least suitable model for the LED degradation data. This result explains
the large discrepancy between the lifetime and reliability estimates of the Wiener
process and those of the other two degradation models. The physical degradation
phenomenon is also consistent with these goodness-of-fit criteria. Because LEDs degrade
monotonically over time, the data agree best with the assumptions of the monotonic,
nonnegative Gamma process, followed by the IG process. Because of the
clear monotonic behavior of the LED data, the degradation does not follow
the Wiener process. All the goodness-of-fit statistics and criteria also indicate that
the Wiener process fits the LED data poorly. Further, the poorly
fitted Wiener process also resulted in a much lower estimate of the nonlinear constant c
(see Table 3.2), which represents a slower degradation rate than the actual one. This
misrepresentation of the degradation increments, together with the underestimated
degradation rate, causes the Wiener degradation model to overestimate the lifetime and
reliability. This case example clearly shows the importance of choosing
the right stochastic process for assessing the system's degradation behavior.
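The model comparison can be reproduced directly from the values in Table 3.3. The short sketch below ranks the three processes under each criterion, using the fact that smaller values indicate a better fit for the KS, CVM, and AD statistics as well as for AIC and BIC. The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Goodness-of-fit values transcribed from Table 3.3 (smaller is better for all five)
fit_stats = {
    "Wiener": {"KS": 0.1802, "CVM": 1.27821, "AD": 7.1927,
               "AIC": -315.2034, "BIC": -309.6285},
    "Gamma": {"KS": 0.0708, "CVM": 0.1159, "AD": 0.6224,
              "AIC": -407.427, "BIC": -401.852},
    "Inverse Gaussian": {"KS": 0.1590, "CVM": 0.5977, "AD": 2.9771,
                         "AIC": -389.4947, "BIC": -383.9197},
}

def rank_models(stats, criterion):
    """Return model names ordered from best to worst fit for one criterion."""
    return sorted(stats, key=lambda model: stats[model][criterion])

criteria = ["KS", "CVM", "AD", "AIC", "BIC"]
rankings = {crit: rank_models(fit_stats, crit) for crit in criteria}
```

Every criterion produces the same ordering, with the Gamma process first and the Wiener process last.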
The discrete-state stochastic processes used to model degradation can
be divided into different categories depending on the continuous or discrete nature
of the time variable and on the Markovian or non-Markovian property (Moghaddass and
Zuo 2014).
From a time viewpoint, the multistate degradation process can evolve according
to a discrete-time stochastic process or a continuous-time stochastic process. In the
discrete-time type, transitions between states occur only at specific
times; in the continuous-time type, transitions can occur at any time.
With respect to the dependency of degradation transitions on the history
of the degradation process, the multistate degradation process can be divided into
Markovian and non-Markovian degradation processes. When the
degradation transition between two states depends only on the current state, that is,
the degradation process is independent of the history of the process, the degradation
model follows the Markovian structure. On the other hand, in a multistate degradation
process with a non-Markovian structure, the transition between two states may
depend on other factors such as previous states, the age of the system, and how long
the system has been in its current state. The following sections provide a detailed
discussion of the Markovian structure and the semi-Markov process with suitable examples.
Pr ( X n = xn | X 0 = x0 , X 1 = x1, …, X n −1 = xn −1 ) = Pr ( X n = xn | X n −1 = xn −1 ) (3.21)
If the state of the Markov chain at time step n is x_n, we denote it as X_n = x_n.
Equation 3.21 implies that the future behavior of the chain depends only on its current
state and is independent of its behavior in the past. Therefore, the probability that
the Markov chain goes from state i to state j in one step, which is called the one-step
transition probability, is p_ij = Pr(X_n = j | X_{n−1} = i). For a time-homogeneous Markov
chain, the transition probability between two states does not depend on n, i.e.,
p_ij = Pr(X_n = j | X_{n−1} = i) = Pr(X_1 = j | X_0 = i) = constant. The one-step transition
probabilities can be condensed into a transition probability matrix for a discrete-time
Markov chain with M + 1 states as follows:
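$$
P = \begin{bmatrix}
p_{00} & p_{01} & \cdots & p_{0M} \\
p_{10} & p_{11} & \cdots & p_{1M} \\
\vdots & \vdots & \ddots & \vdots \\
p_{M0} & p_{M1} & \cdots & p_{MM}
\end{bmatrix}
\tag{3.22}
$$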
The sum of each row in P is one and all elements are non-negative. As the
discrete-time Markov chain is used to model the degradation process of an
item, the transition probability matrix P is in upper-triangular form ( pij = 0 for
i > j ) to reflect the system deterioration without considering maintenance or repair.
Moreover, for the failure state M, which is also known as an absorbing state,
pMM = 1 and pMj = 0 for j = 0,1,…, M − 1.
Given the transition probability matrix P and the initial conditions
of the Markov chain, p(0) = [p_0(0), p_1(0), …, p_M(0)], we can compute the state
probabilities at step n, p(n) = [p_0(n), p_1(n), …, p_M(n)], where p_j(n) = Pr{X_n = j},
j = 0, 1, …, M, is the probability that the chain is in state j after n transitions. For many
applications such as reliability estimation and prognostics, the state probabilities are of
utmost interest.
Based on the Chapman-Kolmogorov equation, the probability of a process moving
from state i to state j after n steps (transitions) can be calculated by raising
the matrix P to the nth power (Ross 1995). Thus, assuming that p(0) is the initial
state vector, the row vector of the state probabilities after the nth step is given as:
$$
p(n) = p(0)\,P^{n} \tag{3.23}
$$
For most of the systems, as the system is in the perfect condition at the beginning of
its mission, the initial state vector is given as p(0) = [1, 0, 0,…, 0].
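Equation 3.23 can be sketched in plain Python by applying P to the row vector n times. The four-state matrix below is hypothetical: it is upper-triangular with an absorbing last state, as the text requires, but its entries are chosen only for illustration.

```python
def vec_mat(vec, mat):
    """Multiply a row vector by a matrix."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def state_probabilities(p0, P, n):
    """Return p(n) = p(0) P^n by applying P to the row vector n times."""
    p = list(p0)
    for _ in range(n):
        p = vec_mat(p, P)
    return p

# Hypothetical upper-triangular transition matrix; state 3 is absorbing (failure)
P = [
    [0.90, 0.07, 0.02, 0.01],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],
]
p0 = [1.0, 0.0, 0.0, 0.0]          # system starts in the perfect state

p10 = state_probabilities(p0, P, 10)
reliability_10 = 1.0 - p10[3]      # probability of not being in the failure state
```

Repeated vector-matrix products avoid forming the matrix power explicitly, which is sufficient when only the state probabilities are needed.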
When the transition from the current state i to a lower state j takes place at any
instant of the time, the continuous-time Markov chain is used to model the degra-
dation process. In analogy with discrete-time Markov chains, a stochastic process
Pr ( X (t n ) = xn | X (t0 ) = x0 , …, X (t n −1 ) = xn −1 ) = Pr ( X (t n ) = xn | X (t n −1 ) = xn −1 ) (3.24)
Equation 3.24 is analogous to Equation 3.21. Thus, most of the properties of the
continuous-time Markov process are similar to those of the discrete-time Markov
process. The probability that the continuous-time Markov chain goes from state i to
state j during ∆t, which is called the transition probability, is
Pr(X(t + ∆t) = j | X(t) = i) = π_ij(t, ∆t). These probabilities satisfy π_ij(t, ∆t) ≥ 0 and
$\sum_{j=0}^{M} \pi_{ij}(t, \Delta t) = 1$.
For a time-homogeneous continuous-time Markov chain, the transition probability
between two states does not depend on t but only on the length of the
time interval ∆t. Moreover, the transition rate λ_ij(t) from state i to state j (i ≠ j) at
time t is defined as
$$
\lambda_{ij}(t) = \lim_{\Delta t \to 0} \frac{\pi_{ij}(t, \Delta t)}{\Delta t},
$$
which does not depend on t and is constant for a homogeneous Markov process.
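As in the discrete-time case, it is important to obtain the state probabilities for
calculating the availability and reliability measures of the system. The state probabilities
of X(t) are:
$$
p_j(t) = \Pr\{X(t) = j\}, \quad j = 0, 1, \ldots, M, \qquad \sum_{j=0}^{M} p_j(t) = 1 \tag{3.25}
$$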
Knowing the initial condition and based on the theorem of total probability and
Chapman-Kolmogorov equation, the state probabilities are obtained using the sys-
tem of differential equations as (Trivedi 2002; Ross 1995):
$$
p'_j(t) = \frac{dp_j(t)}{dt}
= \sum_{\substack{i=0 \\ i \neq j}}^{M} p_i(t)\,\lambda_{ij}
- p_j(t) \sum_{\substack{i=0 \\ i \neq j}}^{M} \lambda_{ji},
\quad j = 0, 1, \ldots, M \tag{3.26}
$$
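$$
\frac{d\mathbf{p}(t)}{dt} = \mathbf{p}(t)\,\boldsymbol{\lambda}, \quad
\mathbf{p}(t) = \left[p_0(t), p_1(t), \ldots, p_M(t)\right], \quad
\boldsymbol{\lambda} = \begin{bmatrix}
\lambda_{00} & \lambda_{01} & \cdots & \lambda_{0M} \\
\lambda_{10} & \lambda_{11} & \cdots & \lambda_{1M} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{M0} & \lambda_{M1} & \cdots & \lambda_{MM}
\end{bmatrix}
\tag{3.27}
$$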
In the transition rate matrix, $\lambda_{jj} = -\sum_{i \neq j} \lambda_{ji}$ and
$\sum_{j=0}^{M} \lambda_{ij} = 0$ for 0 ≤ i ≤ M. As the
continuous-time Markov chain is used to model the degradation process, the
transition rate matrix λ is in upper-triangular form (λ_ij = 0 for i > j) to reflect the
degradation process without considering maintenance or repair. Since state M
is an absorbing state, all the transition rates from this state are equal to zero,
λ_Mj = 0 for j = 0, 1, …, M − 1.
The system of Equation 3.27 can be solved by several numerical and analytical
methods, such as the enumerative method
(Liu and Kapur 2007), the recursive approach (Sheu and Zhang 2013), and the
Laplace-Stieltjes transform (Lisnianski and Levitin 2003).
Example 3.3.1.1
Consider a system that can have four possible states, S = {0,1,2,3}, where state
0 indicates that the system is in as good as new condition, states 1 and 2 are inter-
mediate degraded conditions, and state 3 is the failure state. The system has only
minor failures; i.e., there is no jump between different states without passing all
intermediate states. The transition rate matrix is given as:
The entry λ33 = 0 shows that state 3 is an absorbing state. If the system is in the best
state at the beginning (p(0) = [p0(0), p1(0), p2(0), p3(0)] = [1, 0, 0, 0]), the goal is to
compute the system reliability at time t > 0.
Solution 3.3.1.1: For the multi-state systems, the reliability measure can be
based on the ability of the system to meet the customer demand W (required
performance level). Therefore, the state space can be divided into two subsets
of acceptable states in which their performance level is higher than or equal to
the demand level and unacceptable states. The reliability of the system at time
t is the summation of probabilities of all acceptable states. All the unacceptable
states can be regarded as failed states, and the failure probability is a sum of
probabilities of all the unacceptable states.
First, find the state probabilities at time t for each state solving the following
differential equations:
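$$
\begin{aligned}
\frac{dp_0(t)}{dt} &= -\lambda_{01}\,p_0(t) \\
\frac{dp_1(t)}{dt} &= \lambda_{01}\,p_0(t) - \lambda_{12}\,p_1(t) \\
\frac{dp_2(t)}{dt} &= \lambda_{12}\,p_1(t) - \lambda_{23}\,p_2(t) \\
\frac{dp_3(t)}{dt} &= \lambda_{23}\,p_2(t)
\end{aligned}
$$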
Using the Laplace-Stieltjes transforms and inverse Laplace-Stieltjes transforms
(Lisnianski et al. 2010), the state probabilities at time t are found as:
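$$
\begin{aligned}
p_0(t) &= e^{-\lambda_{01} t} \\
p_1(t) &= \frac{\lambda_{01}}{\lambda_{01} - \lambda_{12}}\left(e^{-\lambda_{12} t} - e^{-\lambda_{01} t}\right) \\
p_2(t) &= -\frac{\lambda_{12}\lambda_{01}\left[(\lambda_{01} - \lambda_{12})e^{-\lambda_{23} t} + (\lambda_{23} - \lambda_{01})e^{-\lambda_{12} t} + (\lambda_{12} - \lambda_{23})e^{-\lambda_{01} t}\right]}{(\lambda_{12} - \lambda_{23})(\lambda_{01} - \lambda_{12})(\lambda_{23} - \lambda_{01})} \\
p_3(t) &= 1 - p_2(t) - p_1(t) - p_0(t)
\end{aligned}
$$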
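The closed-form state probabilities can be checked numerically. The sketch below evaluates them and compares the result with a fourth-order Runge-Kutta integration of the balance equations dp0/dt = −λ01 p0, dp1/dt = λ01 p0 − λ12 p1, dp2/dt = λ12 p1 − λ23 p2, dp3/dt = λ23 p2. The rate values are hypothetical (and distinct, as the closed form requires), and the reliability line assumes that state 3 is the only unacceptable state.

```python
import math

def closed_form(t, l01, l12, l23):
    """State probabilities [p0, p1, p2, p3] from the closed-form solution."""
    p0 = math.exp(-l01 * t)
    p1 = l01 / (l01 - l12) * (math.exp(-l12 * t) - math.exp(-l01 * t))
    num = ((l01 - l12) * math.exp(-l23 * t)
           + (l23 - l01) * math.exp(-l12 * t)
           + (l12 - l23) * math.exp(-l01 * t))
    den = (l12 - l23) * (l01 - l12) * (l23 - l01)
    p2 = -l12 * l01 * num / den
    return [p0, p1, p2, 1.0 - p0 - p1 - p2]

def derivative(p, l01, l12, l23):
    """Right-hand side of the balance equations for the four states."""
    return [-l01 * p[0],
            l01 * p[0] - l12 * p[1],
            l12 * p[1] - l23 * p[2],
            l23 * p[2]]

def runge_kutta(t_end, steps, l01, l12, l23):
    """Integrate the balance equations from p(0) = [1, 0, 0, 0] with RK4."""
    p = [1.0, 0.0, 0.0, 0.0]
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(p, l01, l12, l23)
        k2 = derivative([p[i] + 0.5 * h * k1[i] for i in range(4)], l01, l12, l23)
        k3 = derivative([p[i] + 0.5 * h * k2[i] for i in range(4)], l01, l12, l23)
        k4 = derivative([p[i] + h * k3[i] for i in range(4)], l01, l12, l23)
        p = [p[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i])
             for i in range(4)]
    return p

rates = (0.5, 0.3, 0.2)            # hypothetical lambda_01, lambda_12, lambda_23
exact = closed_form(2.0, *rates)
numeric = runge_kutta(2.0, 200, *rates)
reliability = 1.0 - exact[3]       # assumes only state 3 is unacceptable
```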
The plots of the system reliability for all three cases are shown in Figure 3.7.
Let τ_i denote the time that the degradation process spends in state i. According to
the Markov property in Equation 3.24, τ_i does not depend on the past states of the
process, so the following equation holds:
$$
\Pr(\tau_i > t + \Delta t \mid \tau_i > t) = h(\Delta t) \tag{3.28}
$$
Function h(∆t ) in Equation 3.28 only depends on ∆t , and not on the past time t.
The only continuous probability distribution that satisfies Equation 3.28 is the
exponential distribution. In the discrete time case, requirement in Equation 3.28
leads to the geometric distribution.
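A short derivation makes this concrete. It assumes that Equation 3.28 is the usual memoryless requirement, Pr(τ_i > t + ∆t | τ_i > t) = h(∆t), and that the sojourn time has rate λ in the continuous case and per-step exit probability p in the discrete case (both symbols are introduced here only for illustration):

```latex
% Continuous time: exponential sojourn time with rate \lambda
\Pr(\tau_i > t + \Delta t \mid \tau_i > t)
  = \frac{\Pr(\tau_i > t + \Delta t)}{\Pr(\tau_i > t)}
  = \frac{e^{-\lambda (t + \Delta t)}}{e^{-\lambda t}}
  = e^{-\lambda \Delta t} = h(\Delta t)

% Discrete time: geometric sojourn time with exit probability p per step
\Pr(\tau_i > n + k \mid \tau_i > n)
  = \frac{(1 - p)^{n + k}}{(1 - p)^{n}}
  = (1 - p)^{k} = h(k)
```

In both cases the elapsed time cancels, so the conditional survival probability depends only on the length of the additional interval.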
In a Markovian degradation structure, the transition between two states at time t
depends only on the two states involved and is independent of the history of the
process before time t (memoryless property). The fixed transition probabilities/rates and
the geometric/exponential sojourn time distribution limit the use of a Markov chain to
model the degradation process of real systems. For the degradation process of some
systems, the probability of making the transition from one state to a more degraded
state may increase with age, and the probability of staying in the
current state will decrease. That is, $p_{ii}(t + \Delta t) \le p_{ii}(t)$ and
$\sum_{j=i+1}^{n} p_{ij}(t + \Delta t) \ge \sum_{j=i+1}^{n} p_{ij}(t)$.
Therefore, the transition probabilities and transition rates are not constant over
time, and an extension of the Markovian model, called the aging Markovian
deterioration model, is used to include this aging effect.
For the discrete-time aging Markovian model, P(t) is the one-step transition
probability matrix at time t and p_ij(t) represents the transition probability from state i to
state j at time t. As shown in Chen and Wu (2007), each row of P(t) represents a
state probability distribution, given the current state i, that forms a bell-shaped
distribution. Let N_i satisfy $p_{i,N_i}(t) = \max_j \{ p_{i,j}(t),\ j = 0, 1, \ldots, M \}$,
where N_i is the state at which the transition probability peaks in the bell-shaped
distribution. Then:
$$
P_i^{L}(t) \equiv \sum_{j=1}^{N_i} p_{ij}(t); \qquad
P_i^{R}(t) \equiv \sum_{j=N_i+1}^{M} p_{ij}(t) \tag{3.29}
$$
and for j > N_i, p_ij(t + 1) ≥ p_ij(t). As the system becomes older, P_i^L decreases while
P_i^R increases; therefore:
$$
P_i^{L}(t) \ge P_i^{L}(t + 1); \qquad P_i^{R}(t) \le P_i^{R}(t + 1) \tag{3.30}
$$
$$
p_{ij}(t + 1) \equiv p_{ij}(t)\,\frac{P_i^{L}(t + 1)}{P_i^{L}(t)} \quad \forall j \le N_i; \qquad
p_{ij}(t + 1) \equiv p_{ij}(t)\,\frac{P_i^{R}(t + 1)}{P_i^{R}(t)} \quad \forall j > N_i \tag{3.31}
$$
can be estimated from historical data. Therefore, Equation 3.31 is represented as:
$$
p_{ij}(t + 1) \equiv p_{ij}(t) \cdot \left(1 - \frac{P_i^{R}(t + 1)}{P_i^{L}(t)}\right) \quad \forall j \le N_i; \qquad
p_{ij}(t + 1) \equiv p_{ij}(t) \cdot (1 + \delta) \quad \forall j > N_i \tag{3.32}
$$
Starting with the initial transition probability matrix P(0), the matrices P(t),
which change over time, can be calculated according to Equation 3.32.
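One aging step for a single row of P(t) can be sketched as follows. This is only one reading of the update: the probabilities to the right of the peak are multiplied by (1 + δ), as in the second part of Equation 3.32, and the remaining probabilities are rescaled by the ratio P_i^L(t + 1)/P_i^L(t) of Equation 3.31 so that the row still sums to one. The sketch assumes that P_i^L + P_i^R = 1 (the left sum includes j = 0) and that N_i is the index of the largest entry; the function name and the numbers are hypothetical.

```python
def age_one_step(row, delta):
    """Return row i of P(t + 1) given row i of P(t) and the aging factor delta."""
    peak = max(range(len(row)), key=lambda j: row[j])    # N_i, index of the peak
    p_left = sum(row[: peak + 1])                        # P_i^L(t), assumed to include j = 0
    p_right = sum(row[peak + 1:])                        # P_i^R(t)
    new_right = (1.0 + delta) * p_right                  # P_i^R(t + 1)
    left_scale = (1.0 - new_right) / p_left              # P_i^L(t + 1) / P_i^L(t)
    return [row[j] * left_scale if j <= peak else row[j] * (1.0 + delta)
            for j in range(len(row))]

# Hypothetical row for current state i = 1 in a four-state model
aged_row = age_one_step([0.0, 0.85, 0.10, 0.05], delta=0.1)
```

Applying the function repeatedly to every row of P(0) yields the sequence P(1), P(2), and so on, provided δ is small enough that the right-hand mass stays below one.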
For the continuous-time aging Markovian model, which is called the
non-homogeneous continuous-time Markov process, the amount of time that the
system spends in each state before proceeding to a more degraded state does not follow
the exponential distribution. Usually, the transition times are assumed to follow a
Weibull distribution because of its flexibility, which allows hazard
functions that either increase or decrease over time, at different speeds.
To get the state probabilities at each time t, we have to solve the Chapman-
Kolmogorov equations as:
$$
\frac{dp_j(t)}{dt}
= \sum_{\substack{i=0 \\ i \neq j}}^{M} p_i(t)\,\lambda_{ij}(t)
- p_j(t) \sum_{\substack{i=0 \\ i \neq j}}^{M} \lambda_{ji}(t),
\quad j = 0, 1, \ldots, M \tag{3.33}
$$
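$$
\frac{d\mathbf{p}(t)}{dt} = \mathbf{p}(t)\,\boldsymbol{\lambda}(t), \quad
\mathbf{p}(t) = \left[p_0(t), \ldots, p_M(t)\right], \quad
\boldsymbol{\lambda}(t) = \begin{bmatrix}
\lambda_{00}(t) & \lambda_{01}(t) & \cdots & \lambda_{0M}(t) \\
\lambda_{10}(t) & \lambda_{11}(t) & \cdots & \lambda_{1M}(t) \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{M0}(t) & \lambda_{M1}(t) & \cdots & \lambda_{MM}(t)
\end{bmatrix}
\tag{3.34}
$$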
The transition rate matrix λ (t ) has the same properties as the transition matrix
in Equation 3.27. To find the state probabilities at time t, many methods have
been used to solve Equation 3.34 such as state–state integration method (Liu and
Kapur 2007) and recursive approach (Sheu and Zhang 2013). Equation 3.34 can
be recursively solved from state 0 to state M as follows:
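$$
p_0(t) = \exp\!\left(\int_0^{t} \lambda_{00}(s)\,ds\right) \tag{3.35}
$$
$$
p_j(t) = \sum_{i=0}^{j-1} \int_0^{t} p_i(\tau_{i+1})\,\lambda_{ij}(\tau_{i+1})
\exp\!\left(\int_{\tau_{i+1}}^{t} \lambda_{jj}(s)\,ds\right) d\tau_{i+1},
\quad j = 1, \ldots, M - 1 \tag{3.36}
$$
$$
p_M(t) = 1 - \sum_{j=0}^{M-1} p_j(t) \tag{3.37}
$$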
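The recursion of Equations 3.35 through 3.37 can be sketched for the smallest nontrivial case, a chain 0 → 1 → 2. The sketch assumes Weibull-type transition rates of the form a·b·t^(b−1) with shape b ≥ 1, uses λ00(t) = −λ01(t) and λ11(t) = −λ12(t), and evaluates the integral of Equation 3.36 with the trapezoidal rule. All names and parameter values are hypothetical.

```python
import math

def hazard(a, b, t):
    """Weibull-type transition rate a*b*t**(b-1); assumes shape b >= 1."""
    return a * b * t ** (b - 1.0)

def cumulative_hazard(a, b, t0, t1):
    """Integral of hazard(a, b, s) for s from t0 to t1."""
    return a * (t1 ** b - t0 ** b)

def state_probabilities(t, a01, b01, a12, b12, n=2000):
    """State probabilities [p0, p1, p2] of a 0 -> 1 -> 2 degradation chain."""
    # Eq. 3.35 with lambda_00(s) = -lambda_01(s)
    p0 = math.exp(-cumulative_hazard(a01, b01, 0.0, t))

    # Eq. 3.36 for j = 1 with lambda_11(s) = -lambda_12(s)
    def integrand(tau):
        p0_tau = math.exp(-cumulative_hazard(a01, b01, 0.0, tau))
        stay_in_1 = math.exp(-cumulative_hazard(a12, b12, tau, t))
        return p0_tau * hazard(a01, b01, tau) * stay_in_1

    h = t / n
    total = 0.5 * (integrand(0.0) + integrand(t))
    total += sum(integrand(k * h) for k in range(1, n))
    p1 = h * total                     # trapezoidal rule

    # Eq. 3.37
    return [p0, p1, 1.0 - p0 - p1]

# Hypothetical rates: lambda_01(t) = 0.42 t^2 and lambda_12(t) = 0.20 t
probs = state_probabilities(2.0, a01=0.14, b01=3.0, a12=0.10, b12=2.0)
```

With shape b = 1 the rates are constant and the result reduces to the homogeneous closed-form solution, which gives a convenient check of the quadrature.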
Example 3.3.1.2
(Sheu and Zhang 2013; Sheu et al. 2015) Assume that a system degrades through
five possible states, S = {0, 1, 2, 3, 4}, where state 0 is the best state and state
4 is the worst state. The time T_ij spent in each state i before moving to the next state
j follows the Weibull distribution T_ij ~ Weibull(1/(i − 0.5j), 3) with scale parameter
demand level, the states 3 and 4 are unacceptable states. The goal is to compute
the system reliability at time t (0 < t < 4), with the initial conditions
p_0(0) = 1, p_j(0) = 0, j = 1, 2, …, M.
The state probabilities can be obtained using Equations 3.36 and 3.37 as:
$$
p_0(t) = e^{-0.14 t^{3}}
$$
$$
p_4(t) = 1 - p_0(t) - p_1(t) - p_2(t) - p_3(t)
$$
3.3.2 Semi-Markov Process
The semi-Markov process can be applied to model the degradation process of
some systems whose degradation process cannot be captured by a Markov process.
For example, Ng and Moses (1998) used the semi-Markov process to model bridge
degradation behavior. They described the semi-Markov process in terms of a transi-
tion matrix and a holding time or sojourn time matrix. A transition matrix has a set
of transition probabilities between states that describe the embedded Markov chain.
The holding time matrix has a set of probabilities obtained from the probability den-
sity function of the holding times between states.
For Markov models, the transition probability of going from one state to another
does not depend on how the item arrived at the current state or how long it has been
there. However, semi-Markov models relax this condition to allow the time spent in
a state to follow an arbitrary probability distribution. Therefore, the process stays in
a particular state for a random duration that depends on the current state and on the
next state to be visited (Ross 1995).
To describe the semi-Markov process X ≡ {X(t) : t ≥ 0}, consider the degradation
process of a system with finite state space S = {0, 1, 2, …, M} (M + 1 is the total
number of possible states). The process visits some state i ∈ S and spends a random
amount of time there that depends on the next state it will visit, j ∈ S, i ≠ j. Let T_n
denote the time of the nth transition of the process, and let X(T_n) be the state of the
process after the nth transition. The process transitions from state i to state j ≠ i
with the probability p_ij = Pr(X(T_{n+1}) = j | X(T_n) = i). Given that the next state is j, the
sojourn time from state i to state j has a CDF, F_ij. For a semi-Markov process, the
sojourn times can follow any distribution, and p_ij is also the transition
probability of the embedded Markov chain.
The one-step transition probability of the semi-Markov process transiting to
state j within a time interval less than or equal to t, provided that it starts from
state i, is expressed as (Cinlar 1975):
$$
Q_{ij}(t) = \Pr\left(X(T_{n+1}) = j,\ T_{n+1} - T_n \le t \mid X(T_n) = i\right), \quad t \ge 0 \tag{3.38}
$$
The random time between two successive transitions, T_{n+1} − T_n (the sojourn time), has the CDF:
$$
F_{ij}(t) = \Pr\left(T_{n+1} - T_n \le t \mid X(T_{n+1}) = j,\ X(T_n) = i\right) \tag{3.39}
$$
If the sojourn time in a state depends only on the currently visited state, then the
unconditional sojourn time distribution in state i is $F_{ij}(t) = F_i(t) = \sum_{j \in S} Q_{ij}(t)$.
The matrix of transition probabilities of the semi-Markov process, Q(t) = [Q_ij(t)], i, j ∈ S,
which is called the semi-Markov kernel, is the essential quantity of a semi-Markov process
and satisfies the relation:
$$
Q_{ij}(t) = p_{ij}\,F_{ij}(t) \tag{3.40}
$$
Equation 3.40 indicates that the transition of the semi-Markov model has two steps.
Figure 3.10 shows a sample degradation path of a system. The system is in the state i
at the initial time instance and transits to the next worse state j with transition prob-
ability pij . As the process is a monotone non-increasing function without considering
the maintenance, j = i +1 with probability one. Before moving into the next state j,
the process will wait for a random time with CDF Fij (t ). This process continues until
the process enters the state M that is an absorbing state. For this example the transi-
tion probability matrix is given as:
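$$
P = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\tag{3.41}
$$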
When the semi-Markov process is used to model the degradation process, the initial
state of the process, the transition probability matrix P, and matrix F(t ) must be
known. Another way of defining the semi-Markov process is knowing the kernel
matrix and the initial state probabilities.
Like previous models, it is important to find the state probabilities of the semi-
Markov process. The probability that a semi-Markov process will be in state j at
time t ≥ 0 given that it entered state i at time zero, π ij ( t ) ≡ Pr { X ( t ) = j | X ( 0 ) = i},
is found as follows (Howard 1960; Kulkarni 1995):
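$$
\pi_{ij}(t) = \delta_{ij}\left[1 - F_i(t)\right] + \sum_{k \in S} \int_0^{t} q_{ik}(\vartheta)\,\pi_{kj}(t - \vartheta)\,d\vartheta \tag{3.42}
$$
where
$$
q_{ik}(\vartheta) = \frac{dQ_{ik}(\vartheta)}{d\vartheta} \tag{3.43}
$$
$$
\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{3.44}
$$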
In general, it is difficult to obtain the transition functions, even when the kernel
matrix is known. Equation 3.42 can be solved using numerical methods such as
quadrature method (Blasi et al. 2004; Corradi et al. 2004) and Laplace and inverse
Laplace transforms (Dui et al. 2015) or simulation methods (Sánchez-Silva and
Klutke 2016).
Moreover, the stationary distribution π = (π_j; j ∈ S) of the semi-Markov process
is defined, when it exists, as:
$$
\pi_j := \lim_{t \to \infty} \pi_{ij}(t) = \frac{\upsilon_j w_j}{\sum_{i=0}^{M} \upsilon_i w_i} \tag{3.45}
$$
where υ_j for j ∈ S denotes the stationary probability of the embedded Markov chain,
satisfying $\upsilon_j = \sum_{i=0}^{M} \upsilon_i p_{ij}$ and $\sum_{i=0}^{M} \upsilon_i = 1$,
and w_j for j ∈ S is the expected sojourn time in state j.
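Equation 3.45 can be illustrated with a small numerical sketch. Because a pure degradation model with an absorbing failure state has a degenerate stationary distribution, the example below assumes a hypothetical three-state system in which repair returns the system to better states, so that the embedded chain is irreducible and aperiodic. The embedded stationary vector υ is obtained by power iteration and then weighted by hypothetical mean sojourn times w.

```python
def embedded_stationary(P, iterations=500):
    """Stationary vector of the embedded Markov chain by power iteration."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return v

def semi_markov_stationary(P, sojourn_means):
    """Stationary distribution of the semi-Markov process (form of Eq. 3.45)."""
    v = embedded_stationary(P)
    weights = [v[j] * sojourn_means[j] for j in range(len(v))]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical embedded chain with repair (1 -> 0 and 2 -> 0) and mean sojourn times
P = [
    [0.0, 1.0, 0.0],
    [0.3, 0.0, 0.7],
    [1.0, 0.0, 0.0],
]
w = [10.0, 5.0, 2.0]
pi = semi_markov_stationary(P, w)
```

States with long expected sojourn times receive more weight than their visit frequency in the embedded chain alone would suggest.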
For some systems, degradation transitions between two states may depend on
the states involved in the transitions, the time spent at the current state (t), the time
at which the system reached the current state (s), and/or the total age of the system (t + s).
As another extension, a nonhomogeneous semi-Markov process is used for modeling
the degradation of such systems, in which degradation transitions can follow an
arbitrary distribution.
The associated non-homogeneous semi-Markov kernel is defined by:
$$
Q_{ij}(s, t) = \Pr\left(X(T_{n+1}) = j,\ T_{n+1} \le t \mid X(T_n) = i,\ T_n = s\right), \quad t \ge 0 \tag{3.46}
$$
In the non-homogeneous semi-Markov process, the state probabilities are defined and
obtained using the following equation:
$$
\pi_{ij}(t) = \Pr\{X(t) = j \mid X(0) = i\}
= \delta_{ij}\left[1 - F_i(t, s)\right] + \sum_{k \in S} \int_s^{t} q_{ik}(s, \vartheta)\,\pi_{kj}(t - \vartheta)\,d\vartheta \tag{3.47}
$$
The obtained state probabilities can be used to find different availability and reli-
ability indexes.
Example 3.3.2
Consider a system (or a component) whose possible states during its evolution in
time are S = {0,1, 2}. Denote by U = {0,1} the subset of working states of the system
and by D = {2} the failure state. In this system, both minor and major failures are
possible. The state transition diagram is shown in Figure 3.11.
The holding times are normally distributed, i.e., F_ij ~ N(µ_ij, σ_ij). Therefore, the
CDF of the holding time from state i to state j is:
$$
F_{ij}(t) = \frac{1}{\sqrt{2\pi\sigma_{ij}^{2}}} \int_0^{t}
\exp\!\left[-\frac{(u - \mu_{ij})^{2}}{2\sigma_{ij}^{2}}\right] du \quad \forall i, j \in S
$$
The goal is to find the system reliability at time t, given that the best state is the initial
state of the system.
Solution 3.3.2: As the system is in state 0 at the beginning, the unreliability of the
system at time t is the probability that the system, having started in state 0, is in the
failure state 2 at time t, π02(t); the reliability is therefore 1 − π02(t).
First, we find the kernel matrix of the semi-Markov process Q ( t ) = [Qij (t )], i, j∈ S:
$$
Q(t) = \begin{bmatrix}
0 & Q_{01}(t) & Q_{02}(t) \\
0 & 0 & Q_{12}(t) \\
0 & 0 & 0
\end{bmatrix}
$$
Q01(t) is the probability that the process transitions from state 0 to state 1 within a time
interval less than or equal to t. It can be determined as the probability that the
time of transition from state 0 to 1 (T01) is less than or equal to t and that the
transition from state 0 to 2 (at time T02) has not occurred before T01:
$$
Q_{01}(t) = \Pr(T_{01} \le t \text{ and } T_{02} > T_{01}) = \int_0^{t} \left[1 - F_{02}(u)\right] dF_{01}(u)
$$
$$
Q_{02}(t) = \Pr(T_{02} \le t \text{ and } T_{01} > T_{02}) = \int_0^{t} \left[1 - F_{01}(u)\right] dF_{02}(u)
$$
$$
Q_{12}(t) = \Pr(T_{12} \le t) = F_{12}(t)
$$
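Applying Equation 3.42 with this kernel, and noting that state 2 is absorbing, gives:
$$
\pi_{02}(t) = \int_0^{t} q_{01}(\vartheta)\,\pi_{12}(t - \vartheta)\,d\vartheta + \int_0^{t} q_{02}(\vartheta)\,\pi_{22}(t - \vartheta)\,d\vartheta
$$
$$
\pi_{12}(t) = \int_0^{t} q_{12}(\vartheta)\,\pi_{22}(t - \vartheta)\,d\vartheta
$$
$$
\pi_{22}(t) = 1
$$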
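The convolution integrals of this example can be evaluated numerically. The sketch below follows Equation 3.42 for the three-state kernel: π12(t) reduces to F12(t), and π02(t) collects both the path through state 1 (kernel density q01) and the direct jump to failure (kernel density q02), with q01(u) = f01(u)[1 − F02(u)] and q02(u) = f02(u)[1 − F01(u)]. The trapezoidal rule, the function names, and the normal parameters (µ, σ) are assumptions made only for illustration.

```python
import math

def normal_pdf(u, mu, sigma):
    """Density of the normal distribution N(mu, sigma)."""
    return math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf_from_zero(t, mu, sigma):
    """Integral of the normal density from 0 to t, as in the example's F_ij(t)."""
    def phi(x):
        return 0.5 * math.erfc(-x / math.sqrt(2.0))
    return phi((t - mu) / sigma) - phi(-mu / sigma)

def failure_probability(t, pdf01, cdf01, pdf02, cdf02, cdf12, n=2000):
    """pi_02(t): probability of being in state 2 at time t, starting from state 0."""
    if t <= 0.0:
        return 0.0

    def integrand(u):
        q01 = pdf01(u) * (1.0 - cdf02(u))     # jump 0 -> 1 at time u
        q02 = pdf02(u) * (1.0 - cdf01(u))     # direct jump 0 -> 2 at time u
        return q01 * cdf12(t - u) + q02       # pi_12(t - u) = F_12(t - u), pi_22 = 1

    h = t / n
    total = 0.5 * (integrand(0.0) + integrand(t))
    total += sum(integrand(k * h) for k in range(1, n))
    return h * total                           # trapezoidal rule

# Hypothetical holding-time parameters (mu_ij, sigma_ij) in hours
params = {"01": (100.0, 20.0), "02": (250.0, 40.0), "12": (80.0, 15.0)}
pi_02 = failure_probability(
    150.0,
    lambda u: normal_pdf(u, *params["01"]),
    lambda u: normal_cdf_from_zero(u, *params["01"]),
    lambda u: normal_pdf(u, *params["02"]),
    lambda u: normal_cdf_from_zero(u, *params["02"]),
    lambda u: normal_cdf_from_zero(u, *params["12"]),
)
reliability_150 = 1.0 - pi_02
```

Because the holding-time distributions are passed in as functions, the same routine can be checked against exponential holding times, for which the state probabilities have a closed form.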
All the models presented are based on the assumption that the degradation
process is directly observable. However, in many cases, the degradation level is
not directly observable because of the complexity of the degradation process or the
nature of the product type. Therefore, to deal with indirectly observed states,
models such as hidden Markov models (HMM) and hidden semi-Markov models
(HSMM) have been developed. The HMM deals with two different stochastic
processes: the unobservable degradation process and the measurable characteristics
(which depend on the actual degradation process). In HMMs, finding a stochastic
relationship between the unobservable degradation process and the output
signals of the observation process is a critical prerequisite for condition monitoring
and reliability analysis. The details of HMMs are beyond the scope of
this chapter; interested readers can refer to Shahraki et al. (2017) and Si et al. (2011)
for more details.
REFERENCES
Blasi, A., Janssen, J. and Manca, R., 2004. Numerical treatment of homogeneous and non-
homogeneous semi-Markov reliability models. Communications in Statistics, Theory
and Methods 33(3): 697–714.
Chaluvadi, V. N. H., 2008. Accelerated life testing of electronic revenue meters. PhD disser-
tation, Clemson, SC: Clemson University.
Chen, A. and Wu, G.S., 2007. Real-time health prognosis and dynamic preventive main-
tenance policy for equipment under aging Markovian deterioration. International
Journal of Production Research 45(15): 3351–3379.
Cinlar, E., 1975. Introduction to Stochastic Processes. Englewood Cliffs, NJ: Prentice-Hall.
Corradi, G., Janssen, J. and Manca, R., 2004. Numerical treatment of homogeneous semi-
Markov processes in transient case—a straightforward approach. Methodology and
Computing in Applied Probability 6(2): 233–246.
Dui, H., Si, S., Zuo, M. J. and Sun, S., 2015. Semi-Markov process-based integrated impor-
tance measure for multi-state systems. IEEE Transactions on Reliability 64(2): 754–765.
Howard, R., 1960. Dynamic Programming and Markov Processes, Cambridge, MA: MIT
Press.
Kulkarni, V. G., 1995. Modeling and Analysis of Stochastic Systems, London, UK: Chapman
and Hall.
Limon, S., Yadav, O. P. and Liao, H., 2017a. A literature review on planning and analysis
of accelerated testing for reliability assessment. Quality and Reliability Engineering
International 33(8): 2361–2383.
Limon, S., Yadav, O. P. and Nepal, B., 2017b. Estimation of product lifetime considering
gamma degradation process with multi-stress accelerated test data. IISE Annual
Conference Proceedings, pp. 1387–1392.
Limon, S., Yadav, O. P. and Nepal, B., 2018. Remaining useful life prediction using ADT data
with Inverse Gaussian process model. IISE Annual Conference Proceedings, pp. 1–6.
Lisnianski, A., Frenkel, I. and Ding, Y., 2010. Multi-state System Reliability Analysis and
Optimization for Engineers and Industrial Managers, Berlin, Germany: Springer
Science & Business Media.
Lisnianski, A. and Levitin, G., 2003. Multi-state System Reliability: Assessment,
Optimization, and Applications, Singapore: World Scientific.
Liu, Y. W. and Kapur, K. K. C., 2007. Customer’s cumulative experience measures for reli-
ability of non-repairable aging multi-state systems. Quality Technology & Quantitative
Management 4(2): 225–234.
Moghaddass, R. and Zuo, M. J., 2014. An integrated framework for online diagnostic and
prognostic health monitoring using a multistate deterioration process. Reliability
Engineering & System Safety 124: 92–104.
Narendran, N. and Gu, Y., 2005. Life of LED-based white light sources. Journal of Display
Technology 1: 167–171.
Nelson, W., 2004. Accelerated Testing: Statistical Models, Test Plans and Data Analysis (2nd
ed.), New York: John Wiley & Sons.
Ng, S. K. and Moses, F., 1998. Bridge deterioration modeling using semi-Markov theory.
A. A. Balkema Uitgevers B. V, Structural Safety and Reliability 1: 113–120.
O’Connor, P. D. D. T. and Kleyner, A., 2012. Practical Reliability Engineering (5th ed.),
Chichester, UK: Wiley.
Park, C. and Padgett, W. J., 2005. Accelerated degradation models for failure based on geo-
metric Brownian motion and gamma processes. Lifetime Data Analysis 11: 511–527.
Park, J. I. and Yum, B. J., 1997. Optimal design of accelerated degradation tests for estimating
mean lifetime at the use condition. Engineering Optimization 28: 199–230.
Ross, S., 1995. Stochastic Processes, New York: Wiley.
Sánchez-Silva, M. and Klutke, G. A., 2016. Reliability and Life-cycle Analysis of
Deteriorating Systems (Vol. 182). Cham, Switzerland: Springer International
Publishing.
Shahraki, A. F. and Yadav, O. P., 2018. Selective maintenance optimization for multi-
state systems operating in dynamic environments. In 2018 Annual Reliability and
Maintainability Symposium (RAMS). IEEE: pp. 1–6.
Shahraki, A. F., Yadav, O. P. and Liao, H., 2017. A review on degradation modelling and its
engineering applications. International Journal of Performability Engineering 13(3): 299.
Sheu, S. H. and Zhang, Z. G., 2013. An optimal age replacement policy for multi-state
systems. IEEE Transactions on Reliability 62(3): 722–735.
Sheu, S. H., Chang, C. C., Chen, Y. L. and Zhang, Z. G., 2015. Optimal preventive
maintenance and repair policies for multi-state systems. Reliability Engineering & System
Safety 140: 78–87.
Si, X. S., Wang, W., Hu, C. H. and Zhou, D. H., 2011. Remaining useful life estimation:
A review on the statistical data driven approaches. European Journal of Operational
Research 213(1): 1–14.
Trivedi, K., 2002. Probability and Statistics with Reliability, Queuing and Computer Science
Applications, New York: Wiley.
Wang, X. and Xu, D., 2010. An inverse Gaussian process model for degradation data.
Technometrics 52: 188–197.
Ye, Z. S. and Chen, N., 2014. The inverse Gaussian process as a degradation model.
Technometrics 56: 302–311.
Ye, Z. S., Wang, Y., Tsui, K. L. and Pecht, M., 2013. Degradation data analysis using Wiener
processes with measurement errors. IEEE Transactions on Reliability 62: 772–780.
4 Building a Semi-automatic Design for Reliability Survey with Semantic Pattern Recognition
Christian Spreafico and Davide Russo
CONTENTS
4.1 Introduction
4.2 Research Methodology and Pool Definition
4.2.1 Definition of the Electronic Pool
4.2.2 Definition of the Features of Analysis
4.2.2.1 Goals
4.2.2.2 Strategies (FMEA Interventions)
4.2.2.3 Integrations
4.3 Semi-automatic Analysis
4.4 Results and Discussion
4.5 Conclusions
References
4.1 INTRODUCTION
Almost 70 years after its introduction, Failure Modes and Effects Analysis (FMEA)
has been applied in a wide range of cases from different sectors, such as automotive,
electronics, construction, and services, and has become a standard procedure in many
companies for quality control and for the design of new products. FMEA also has a
large following in the scientific community, as testified by the multitude of related
documents in the scientific and patent literature; to date, more than 3,600 papers
in the Scopus database and 146 patents in the Espacenet database come up when
searching for FMEA alone, without synonyms, with a trend of constant growth over the years.
The majority of those contributions deal with FMEA modifications involving
the procedure and its integration with new methods and tools, in order to enlarge the
field of application and to improve the efficiency of the analysis, for example by
reducing the required time and by finding more results.
The surveys proposed in the literature, which have been performed according to
different criteria of data gathering and classification, can play a fundamental role in
orienting the reader among the many contributions.
108 Reliability Engineering
In [1] the authors analyzed scientific papers about the description and review of
basic principles, the types, the improvements, the computer automation codes, the
combination with other techniques, and specific applications of FMEA.
The literature survey in [2] analyzes FMEA applications for enhancing service
reliability by determining how FMEA is focused on profit and supply chain-oriented
service business practices. Its main contribution consists in comparing what
was previously mentioned about FMEA research opportunities and in observing how
FMEA relates to the enhancement of the Risk Priority Number (RPN), to reprioritization,
to the versatility of its application in the service supply chain framework and the
non-profit service sector, and to its combination with other quality control tools,
all of which are proposed for further investigation.
In [3], the authors studied 62 risk analysis methodologies by separating them
into three different phases (identification, evaluation, and hierarchization)
and by studying their inputs (plan or diagram, process and reaction, products,
probability and frequency, policy, environment, text, and historical knowledge),
the techniques implemented to analyze risk (qualitative, quantitative, deterministic,
and probabilistic), and their outputs (management, list, probabilistic, and
hierarchization).
In [4], the authors analyzed the innovative approaches proposed to overcome
the limitations of the conventional RPN method within 75 FMEA papers published
between 1992 and 2012, identifying which shortcomings attract the most attention,
which approaches are the most popular, and where the approaches are inadequate.
Other authors focused on analyzing specific applications of the FMEA approach.
In [5], the authors studied how 78 companies in the United Kingdom motor industry
apply FMEA, identifying some common difficulties such as time constraints, poor
organizational understanding of the importance of FMEA, inadequate training, and
lack of management commitment.
However, despite the results achieved by these surveys, no overview considers
all the proposals presented, including patents, and analyzes them at a higher level
than “simple” document counting within the cataloging classes and tools used.
To fulfill this aim, a previous survey [6] considerably increased the number
of analyzed documents by also including patents. In addition, the analysis of the
content was improved by carrying it out on two related levels: the strategies of
intervention followed (e.g., reduce time of application) and the integrated tools
(e.g., fuzzy logic). Although the results achieved are remarkable, the main limitations
of this analysis are the onerous amount of time required along with the number
of correlations between different aspects (e.g., problems and solutions, methods and
tools, etc.).
This chapter proposes a semi-automatic semantic analysis of documents
related to FMEA modifications, followed by a manual review that summarizes
each of them through a simple sentence made of a causal chain including the
declaration of the goals, the followed strategies (FMEA modifications), and the
integrations with methods/tools.
This chapter is organized as follows. Section 4.2 presents the research methodology
and the pool definition, Section 4.3 describes the semi-automatic analysis, Section 4.4
presents and discusses the results, and Section 4.5 draws conclusions.
Building a Semi-automatic Design for Reliability Survey 109
FIGURE 4.1 (a) Time distribution (priority date) of the collected documents and (b) composition
of the final set of documents (papers vs. patents and academia vs. industry).
4.2.2.1 Goals
These features deal with the targets that the authors proposing the analyzed
FMEA modifications want to achieve through them. All of them focus on improving
the main aspects related to the applicability of the method (e.g., reducing the required
input, improving the expected output, ameliorating the approach of the involved actors):
4.2.2.3 Integrations
The following kinds of integrations have been collected:
• Templates (e.g., tables and matrices) to organize and manage the bill of
material, the list of functions and faults, and the related risk.
• Databases (DB) containing information about product parts, functions,
historical failures, risk, and the related economic quantifications. They are
used to automatically or manually gather the content for the analysis.
• Tools for fault analysis (Fault A.) including Fault Tree Analysis (FTA),
Fishbone diagram and Root Cause Analysis (RCA) ([17], [38]).
• Interactive graphical interfaces or software that directly involve user
interactions through graphical elements and representations (e.g., plant schemes
and infographics) for data entry and visualization.
• Artificial Intelligence (AI) based tools involving Semantic Recognition and
Bayesian Networks ([12], [67], [102], [125], [127], [129], [133]).
Other considered integrations are function analysis (FA), fuzzy logic, Monte Carlo
method, quality function deployment (QFD), hazard and operability study (HAZOP),
ontologies, theory of inventive problem solving (TRIZ), guidelines, automatic
measurements (AM) methods, brainstorming techniques, and cognitive maps (C Map).
TABLE 4.1
Keywords Used to Explain the Features Through the Queries
Category                 Keywords
Generic terms (Name)     FMEA, Human, Approach, Design, Production, Maintenance, Time, Costs, Problem
Generic terms (Verbs)    Improve, Anticipate, Ameliorate, Automatize, Analyze, Reduce, Eliminate, Solve
FMEA Terms               Failures, Modes, Effects, Cause, Risk, Solving, Decision making
Methods/Tool             Fuzzy, TRIZ, Database, Artificial Intelligence, QFD, Function Analysis, etc.
TABLE 4.2
Example of the Strategy Used to Build the Triads
Investigated feature: Ameliorate Human Approach
  Used keyword: Improve
  Syntactic parser: Improve + Human Approach
  Related sentence (considered document): The objective of this paper is to propose a new
    approach for simplifying FMEA by determining the failures in a more practical way by
    better involving the problem solver in a more pro-active and creative approach
  Triad (Subject + Verb + Object): The improved Failure Modes Determination ameliorates
    human approach
Investigated feature: Improve Failure Modes determination
  Used keyword: Improve
  Syntactic parser: Improve + Failure Modes
  Related sentence (considered document): Perturbed Functional Analysis is proposed in
    order to improve the capability of determine Failure Modes
  Triad (Subject + Verb + Object): Perturbed Function Analysis improves Failure Modes
    determination
Investigated feature: Introduce TRIZ
  Used keyword: TRIZ
  Syntactic parser: TRIZ + Perturbed Function Analysis (Modifier)
  Related sentence (considered document): Specifically, an inedited version of TRIZ
    function analysis, called “Perturbed Function Analysis” is proposed
  Triad (Subject + Verb + Object): The authors propose the Perturbed Function Analysis
Source: Spreafico, C. and Russo, D., Can TRIZ functional analysis improve FMEA? Advances in
Systematic Creativity Creating and Managing Innovations, Palgrave Macmillan, Cham,
Switzerland, pp. 87–100, 2019.
TABLE 4.3
An Extract from the Table of Comparison of the Documents and the Triads
Feature group    Feature                                  Document [7]                   …
Goal             Ameliorate Human Approach                The improved failure modes     …
Goal             …                                        …                              …
Strategy         Improve Failures Determination           Perturbed Function Analysis    …
Strategy         …                                        …                              …
Methods/Tools    Introduce Perturbed Function Analysis    The authors                    …
Methods/Tools    …                                        …                              …
[Figure: causal chain of three nodes (Node N+1, Node N, Node N-1) linked by WHY? and HOW? relations.]
modes) related to a determined feature that has been redefined by using the verb and
the object of the triad (e.g., ameliorates human approach).
Therefore, the identified subjects are used as links to build the causal chains,
starting from the last ones, which relate to the integrations with methods and tools.
For example, the causal chain resulting from the previous example (Table 4.3) is:
the authors introduce the Perturbed Function Analysis (METHOD/TOOL) IN ORDER
TO Improve the failure identification (STRATEGY) IN ORDER TO Ameliorate
Human Approach (GOAL).
Read in this manner, the causal chain follows this logic: each node explains why
the previous one exists (WHY?) and represents a way to obtain the next one (HOW?).
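The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the `Triad` class, the `build_chain` and `render` helpers, and the rule that each subject has already been normalized (manually) to the object of the triad it links to are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triad:
    """Subject + verb + object, tagged with the part of the chain it belongs to."""
    subject: str
    verb: str
    obj: str
    level: str  # "GOAL", "STRATEGY" or "METHOD/TOOL"


# Triads of the Table 4.3 example. Assumption: subjects were normalized by hand
# so that they match the object of the triad they link to.
TRIADS = [
    Triad("Failure Modes Determination", "ameliorate", "Human Approach", "GOAL"),
    Triad("Perturbed Function Analysis", "improve",
          "Failure Modes Determination", "STRATEGY"),
    Triad("The authors", "introduce", "Perturbed Function Analysis", "METHOD/TOOL"),
]


def build_chain(triads):
    """Start from the triad whose subject is the authors, then follow the links."""
    by_subject = {t.subject.lower(): t for t in triads}
    current = by_subject["the authors"]
    chain = [current]
    # The length guard stops the walk if the links accidentally form a cycle.
    while current.obj.lower() in by_subject and len(chain) < len(triads):
        current = by_subject[current.obj.lower()]
        chain.append(current)
    return chain


def render(chain):
    """Write the chain as a single sentence joined by IN ORDER TO."""
    first = chain[0]
    parts = [f"{first.subject} {first.verb} {first.obj} ({first.level})"]
    parts += [f"{t.verb} {t.obj} ({t.level})" for t in chain[1:]]
    return " IN ORDER TO ".join(parts)


CHAIN = build_chain(TRIADS)
SENTENCE = render(CHAIN)
```

Reading `CHAIN` from left to right answers WHY? (each triad motivates the previous one), while reading it from right to left answers HOW?.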
Figure 4.2 shows the simplest causal chain that can be built. It consists of only
three nodes arranged in sequence: one goal (i.e., Ameliorate Human Approach), one
strategy (i.e., Improve Failure Determination), and one integration with methods or
tools (i.e., the Perturbed Function Analysis).
However, the structure of the causal chain can be more complex because the number
of nodes can increase and their mutual arrangement can change from series to
parallel, or to a mix of both.
In the first case (nodes in series), each intermediate node is preceded (on the left)
by another node expressing its motivation (WHY? relation) and is followed by
another representing a way to realize it (HOW? relation). More goals can be connected
in the same way, through their hierarchization: e.g., the goal “reduce the
number of experts” can be preceded by the more generic goal “reduce FMEA costs.”
The same reasoning is valid for the strategies and the integrations with methods/
tools. In particular, in this case, we stratified them into four hierarchical levels: (1)
theories and logics (e.g., fuzzy logic), (2) methods (e.g., TRIZ), (3) tools, which can be
included in the methods (e.g., FA is part of TRIZ), and (4) knowledge sources (e.g.,
costs DB).
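As a minimal sketch of this stratification, the four levels can be encoded as an ordered enumeration and used to arrange integration nodes in series. The `Level` class and the `INTEGRATIONS` mapping are illustrative names, and placing the four examples from the text in one chain is an assumption made only to show the ordering.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hierarchical levels of the integrations, from most to least general."""
    THEORY_OR_LOGIC = 1
    METHOD = 2
    TOOL = 3
    KNOWLEDGE_SOURCE = 4


# Examples quoted in the text, each assigned to its level.
INTEGRATIONS = {
    "costs DB": Level.KNOWLEDGE_SOURCE,
    "FA": Level.TOOL,
    "TRIZ": Level.METHOD,
    "fuzzy logic": Level.THEORY_OR_LOGIC,
}

# Nodes in series: higher-level integrations come first (closer to the strategies).
ORDERED = sorted(INTEGRATIONS, key=INTEGRATIONS.get)
```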
FIGURE 4.3 Example of a complex causal chain obtained from the patent. (From Ming, X.
et al., System capable of achieving failure mode and effects analysis (FMEA) data
multi-dimension processing, CN202887188, filed June 4, 2012, and issued April 17, 2013.
Representation is courtesy of the authors.)
In the second case (nodes in parallel), two or more nodes can concurrently provide
a motivation for a previous node or offer alternative ways to realize the subsequent
node.
As an example of a more complex causal chain, consider the Chinese patent [8].
Table 4.4 represents an extract from the table of comparison relative to this document:
as can be seen, the resulting relations between the included subjects and the
features are more complex and interlaced in comparison to the example shown in
Table 4.3.
Figure 4.3 represents the causal chain obtained for this document. In this case, the
two nodes Reduce FMEA Time/Costs and Analyze Complex Systems represent the two
main independent goals pursued by this contribution. The two nodes Automate
Failure Determination and Automate Risk Analysis are the two strategies followed
both to Reduce FMEA Time/Costs and to Analyze Complex Systems. Finally, the
node Fuzzy Logic represents a high-level integration used to realize the two strategies,
while a failure DB and a risk DB provide the knowledge for the fuzzy logic-based
reasoning in two different ways: the first is used to Automate Failure Determination
(through fuzzy logic) and the second to Automate Risk Analysis (through fuzzy logic).
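A chain with nodes in parallel can be represented as a small directed graph. The sketch below encodes the relations described for patent [8] as a mapping from each node to the nodes that realize it (HOW?) and derives the inverse WHY? relation. The names `HOW`, `invert`, and `paths_to_goals` are illustrative. Note that this plain adjacency form is a simplification: it does not record that the failure DB serves only the failure determination and the risk DB only the risk analysis.

```python
from collections import defaultdict

# HOW? relation: node -> nodes that realize it (read from the description of [8]).
HOW = {
    "Reduce FMEA Time/Costs": ["Automate Failure Determination", "Automate Risk Analysis"],
    "Analyze Complex Systems": ["Automate Failure Determination", "Automate Risk Analysis"],
    "Automate Failure Determination": ["Fuzzy Logic"],
    "Automate Risk Analysis": ["Fuzzy Logic"],
    "Fuzzy Logic": ["Failure DB", "Risk DB"],
}


def invert(how):
    """Derive the WHY? relation: node -> nodes that it motivates or serves."""
    why = defaultdict(list)
    for node, realizers in how.items():
        for realizer in realizers:
            why[realizer].append(node)
    return dict(why)


WHY = invert(HOW)

# Goals are the nodes that no other node motivates.
GOALS = sorted(node for node in HOW if node not in WHY)


def paths_to_goals(node, why):
    """Enumerate every series of nodes leading from `node` up to a goal."""
    if node not in why:
        return [[node]]
    return [[node] + path for parent in why[node] for path in paths_to_goals(parent, why)]


FUZZY_PATHS = paths_to_goals("Fuzzy Logic", WHY)
```

Two parallel strategies serving two parallel goals give four distinct series through the Fuzzy Logic node, which is why this chain cannot be written as a single IN ORDER TO sentence.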
TABLE 4.4
An Extract from the Table of Comparison of the Documents and the Triads, Line of the Document
Feature group    Feature                           Document [8]
Goal             Reduce FMEA Time/Costs            Automate Failure Determination; Automate Risk Analysis
Goal             Analyze Complex Systems           Automate Failure Determination; Automate Risk Analysis
Strategy         Automate Failure Determination    Fuzzy logic
Strategy         Automate Risk Analysis            Fuzzy logic
Methods/Tools    Introduce Fuzzy Logic             Failure DB; Risk DB
Methods/Tools    Introduce Failure DB              The authors
Methods/Tools    Introduce Risk DB                 The authors
Source: Ming, X. et al., System capable of achieving failure mode and effects analysis (FMEA) data
multi-dimension processing, CN202887188, filed June 4, 2012, and issued April 17, 2013.
consists of more than four nodes, including at least one for each part (goal, strategy,
and integration). The total number of causal chains is the same as the number of
analyzed documents (127), since their correspondence is one-to-one: for each document
there was only one causal chain and vice versa.
In general, the most pursued goals are Improve Design and Improve Human
Approach, which together are contained within 61 percent of the triads, while the
most considered strategies are related to the failure determination (automate and
improve), followed by Automate Risk Analysis.
Among the integrations with methods and tools, fuzzy logic and databases are
the most widespread, with 37 and 28 occurrences within the causal chains, respectively,
followed by the interface with 23 occurrences.
More detailed considerations are possible by analyzing the relations between goals
and strategies. In fact, the two most widespread strategies are considered differently:
those for failure determination are implemented to realize all the goals, while those
for Improving Risk Analysis are especially considered to Improve Human Approach
but practically ignored for achieving other purposes (i.e., Improve Design and
Analyze Complex Systems).
Other considerations can be made by comparing the couplings between multiple
goals, strategies, and tools.
Among the combinations of goals, the most frequent are Improve Design +
Improve Human Approach (8 occurrences), Improve Design + Analyze Complex
Systems (7 occurrences), and Improve Human Approach + Reduce Production
Time/Costs (7 occurrences).
Among the combinations of strategies, the most frequent are Automate Failure
Determination + Automate Risk Analysis (12 occurrences) and Automate Failure
Determination + Improve Risk Analysis (7 occurrences).
Finally, the analysis of the multiple integrations revealed that the most common
coupling is between fuzzy logic and DBs, with 6 occurrences.
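Counts of this kind can be obtained by enumerating the pairs of nodes that appear together in each causal chain. The following sketch shows one way to do it. The three chains in `CHAINS` are invented sample data (they do not reproduce the 127 analyzed documents), and `pair_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each causal chain reduced to the nodes of its three parts.
CHAINS = [
    {"goals": ["Improve Design", "Improve Human Approach"],
     "strategies": ["Automate Failure Determination", "Automate Risk Analysis"],
     "integrations": ["Fuzzy logic", "DB"]},
    {"goals": ["Improve Design", "Improve Human Approach"],
     "strategies": ["Automate Failure Determination"],
     "integrations": ["DB"]},
    {"goals": ["Improve Design", "Analyze Complex Systems"],
     "strategies": ["Automate Failure Determination", "Automate Risk Analysis"],
     "integrations": ["Fuzzy logic", "DB"]},
]


def pair_counts(chains, part):
    """Count how many chains contain each unordered pair of nodes of one part."""
    counts = Counter()
    for chain in chains:
        # Sorting makes (A, B) and (B, A) fall on the same key.
        for pair in combinations(sorted(set(chain[part])), 2):
            counts[pair] += 1
    return counts


GOAL_PAIRS = pair_counts(CHAINS, "goals")
STRATEGY_PAIRS = pair_counts(CHAINS, "strategies")
INTEGRATION_PAIRS = pair_counts(CHAINS, "integrations")
```

Calling `GOAL_PAIRS.most_common()` then ranks the couplings by frequency, in the same way the occurrences are ranked in the text.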
A deeper analysis can be carried out by considering the causal chains. Among the
different possibilities, the most significant deals with the comparison of the common
triads, i.e., the combinations of three nodes: goal, strategy, and integration. In this way,
a synthetic but sufficiently significant indication is obtained to understand how the
authors are working to improve FMEA.
Figure 4.4 shows the tree map of the common triads. The five main areas are the
goals; their internal (colored) subdivisions represent the strategies, which are in turn
divided among the integrations, where the document indexes are reported (please
refer to the legend).
For example, the graph shows that the three documents [11,97,101] propose
modified versions of FMEA based on the same common triad: they aim to improve the
design phase (Improve Design) by improving the determination of the failures
through the introduction of databases (DB). Other goals, strategies, or integrations
differentiate the three contributions.
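The nesting behind such a tree map (goal, then strategy, then integration, then document indexes) can be sketched as a nested dictionary. In the example below, the triad shared by [11], [97], and [101] comes from the text, while document "X" is a hypothetical placeholder added only to show a second branch. `build_tree` and `common_triads` are illustrative names.

```python
from collections import defaultdict

# Document index -> (goal, strategy, integration) triads found in its causal chain.
DOC_TRIADS = {
    "11": [("Improve Design", "Improve Failure Determination", "DB")],
    "97": [("Improve Design", "Improve Failure Determination", "DB")],
    "101": [("Improve Design", "Improve Failure Determination", "DB")],
    # Hypothetical document, not one of the analyzed references.
    "X": [("Improve Human Approach", "Improve Risk Analysis", "Fuzzy logic")],
}


def build_tree(doc_triads):
    """Nest the documents as goal -> strategy -> integration -> [document indexes]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for doc_id, triads in doc_triads.items():
        for goal, strategy, integration in triads:
            tree[goal][strategy][integration].append(doc_id)
    return tree


def common_triads(tree, min_docs=2):
    """Keep only the triads shared by at least `min_docs` documents."""
    return {
        (goal, strategy, integration): docs
        for goal, strategies in tree.items()
        for strategy, integrations in strategies.items()
        for integration, docs in integrations.items()
        if len(docs) >= min_docs
    }


TREE = build_tree(DOC_TRIADS)
COMMON = common_triads(TREE)
```

The size of each tree map cell would then correspond to the length of the document list stored at the matching leaf.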
Analysis of the common triads shows that the most widespread ones involve the
goal Improve Human Approach: Improve Human Approach—Improve Risk
Analysis—Fuzzy (8 documents), Improve Human Approach—Improve Function
FIGURE 4.4 Main solutions proposed in papers and patents to improve FMEA, represented
through triads (goal, strategy, and method/tool).
4.5 CONCLUSIONS
In this chapter, a method for performing semi-automatic semantic analysis of
FMEA documents has been presented and applied to a pool of 127 documents,
consisting of papers and patents, selected from international journals, conference
proceedings, and international patents.
As a result, each document has been summarized through a specific causal chain
including its considered goals (i.e., Improve Design, Improve Human Approach,
Reduce FMEA Time/Costs, Reduce Production Time/Costs, Analyze Complex
Systems), its strategies of intervention (Improve/Automate BoM, Function, Failures
Determination, Risk Analysis and Problem solving) and the integrated methods,
tools, and knowledge sources.
The main output of this work is summarized in an infographic, based on a Treemap
diagram style, that compares all the considered documents on the basis of the common
elements in their causal chains. It highlights the more popular directions of
intervention at different levels of detail (i.e., strategies, methods, and tools) in
relation to the objective to pursue.
The considerable reduction in the required time, together with the number of
sources analyzed and the depth of the analysis, represented by the ability to
determine the relationships between the different parameters of the analysis within
the causal chain, are elements of novelty compared to previous surveys, which could
positively impact scientific research in the sector.
The main limitations of the approach consist of the complexity of the manual
operations required to define the electronic pool and to create part of the relations
within the causal chains, which will be partly addressed by automating the method in
future developments.
REFERENCES
1. Bouti, A., and Kadi, D. A. 1994. A state-of-the-art review of FMEA/FMECA.
International Journal of Reliability Quality and Safety Engineering 1(04): 515–543.
2. Sutrisno, A., and Lee, T. J. 2011. Service reliability assessment using failure mode and
effect analysis (FMEA): Survey and opportunity roadmap. International Journal of
Engineering Science and Technology 3(7): 25–38.
3. Tixier, J., Dusserre, G., Salvi, O., and Gaston, D. 2002. Review of 62 risk analysis meth-
odologies of industrial plants. Journal of Loss Prevention in the Process Industries
15(4): 291–303.
4. Liu, H. C., Liu, L., and Liu, N. 2013. Risk evaluation approaches in failure mode and
effects analysis: A literature review. Expert Systems with Applications 40(2): 828–838.
5. Dale, B. G., and Shaw, P. 1990. Failure mode and effects analysis in the UK motor
industry: A state‐of‐the‐art study. Quality and Reliability Engineering International
6(3): 179–188.
6. Spreafico, C., Russo, D., and Rizzi, C. 2017. A state-of-the-art review of FMEA/FMECA
including patents. Computer Science Review 25: 19–28.
7. Spreafico, C., and Russo, D. 2019. Case: Can TRIZ functional analysis improve FMEA?
In Advances in Systematic Creativity, pp. 87–100. Palgrave Macmillan, Cham.
8. Ming, X., Zhu, B., Liang, Q., Wu, Z., Song, W., Xia, R., and Kong, F. 2013. System
capable of achieving failure mode and effects analysis (FMEA) data multi-dimension
processing. CN202887188, filed June 4, 2012, and issued April 17, 2013.
9. Ahmadi, M., Behzadian, K., Ardeshir, A., and Kapelan, Z. 2017. Comprehensive risk
management using fuzzy FMEA and MCDA techniques in highway construction
projects. Journal of Civil Engineering and Management 23(2): 300–310.
10. Almannai, B., Greenough, R., and Kay, J. 2008. A decision support tool based on QFD
and FMEA for the selection of manufacturing automation technologies. Robotics and
Computer-Integrated Manufacturing 24(4): 501–507.
11. Arcidiacono, G., and Campatelli, G. 2004. Reliability improvement of a diesel engine
using the FMETA approach. Quality and Reliability Engineering International 20(2):
143–154.
12. Augustine, M., Yadav, O. P., Jain, R., and Rathore, A. 2009. Modeling physical systems
for failure analysis with rate cognitive maps. Industrial Engineering and Engineering
Management. IEEM 2009 IEEE International Conference 1758–1762.
13. Lai, J., Zhang, H., and Huang, B. 2011. The object-FMA based test case generation
approach for GUI software exception testing. Proceedings of the 2011 9th International
Conference on Reliability, Maintainability and Safety, pp. 717–723. IEEE.
14. Banghart, M., and Fuller, K. 2014. Utilizing confidence bounds in Failure Mode Effects
Analysis (FMEA) hazard risk assessment. Aerospace Conference, 2014 IEEE 1–6.
15. Bertelli, C. R., and Loureiro, G. 2015. Quality problems in complex systems even con-
sidering the application of quality initiatives during product development. ISPE CE
40–51.
16. Bevilacqua, M., Braglia, M., and Gabbrielli, R. 2000. Monte Carlo simulation
approach for a modified FMECA in a power plant. Quality and Reliability Engineering
International 16(4): 313–324.
17. Bluvband, Z., Polak, R., and Grabov, P. 2005. Bouncing failure analysis (BFA):
The unified FTA-FMEA methodology. Reliability and Maintainability Symposium
Proceedings Annual 463–467.
18. Bowles, J. B., and Peláez, C. E. 1995. Fuzzy logic prioritization of failures in a system
failure mode, effects and criticality analysis. Reliability Engineering & System Safety
50(2): 203–213.
19. Braglia, M., Fantoni, G., and Frosolini, M. 2007. The house of reliability. International
Journal of Quality & Reliability Management 24(4): 420–440.
20. Braglia, M., Frosolini, M., and Montanari, R. 2003. Fuzzy TOPSIS approach for
failure mode, effects and criticality analysis. Quality and Reliability Engineering
International 19(5): 425–443.
21. Doskocil, D. C., and Offt, A. M. 1993. Method for fault diagnosis by assessment
of confidence measure. CA2077772, filed September 9, 1992, and issued April 25,
1993.
22. Draber, S. 2000. Method for determining the reliability of technical systems.
CA2300546, filed March 7, 2000, and issued September 8, 2000.
23. Chang, K. H., and Wen, T. C. 2010. A novel efficient approach for DFMEA combining
2–tuple and the OWA operator. Expert Systems with Applications 37(3): 2362–2370.
24. Chen, L. H., and Ko, W. C. 2009. Fuzzy linear programming models for new product
design using QFD with FMEA. Applied Mathematical Modelling 33(2): 633–647.
25. Chin, K. S., Chan, A., and Yang, J. B. 2008. Development of a fuzzy FMEA based prod-
uct design system. The International Journal of Advanced Manufacturing Technology
36(7–8): 633–649.
26. Zhang, L., Liang, W., and Hu, J. 2011. Modeling method of early warning model of
mixed failures and early warning model of mixed failures. CN102262690, filed June 7,
2011, and issued November 30, 2011.
27. Pan, L., Chin, X., Liu, X., Wang, W., Chen, C., Luo, J., Peng, X. et al., 2012. Intelligent
integrated fault diagnosis method and device in industrial production process.
CN102637019, filed February 10, 2011, and issued August 15, 2012.
28. Ming, X., Zhu, B., Liang, Q., Wu, Z., Song, W., Xia, R., and Kong, F. 2012. System
for implementing multidimensional processing on failure mode and effect analysis
(FMEA) data, and processing method of system. CN102810112, filed June 4, 2012, and
issued December 5, 2012.
29. Li, G., Zhang, J., and Cui, C. 2012. FMEA (Failure Mode and Effects Analysis) pro-
cess auxiliary and information management method based on template model and text
matching. CN102831152, filed June 28, 2012, and issued December 19, 2012.
30. Li, R., Xu, P., and Xu, Y. 2012. Accidence safety analysis method for nuclear fuel repro-
cessing plant. CN102841600, filed August 24, 2012, and issued December 26, 2012.
31. Jia, Y., Shen, G., Jia, Z., Zhang, Y., Wang, Z., and Chen, B. 2013. Reliability com-
prehensive design method of three kinds of functional parts. CN103020378, filed
December 26, 2012, and issued April 3, 2013.
32. Chen, Y., Zhang, X., Gao, L., and Kang, R. 2014. Newly-developed aviation electronic
product hardware comprehensive FMECA method. CN103760886, filed December 2,
2013, and issued April 30, 2014.
33. Liu, Y., Deng, Z., Liu, S., Chen, X., Pang, B., Zhou, N., and Chen, Y. 2014. Method
for evaluating risk of simulation system based on fuzzy FMEA. CN103902845, filed
April 25, 2014, and issued July 2, 2014.
34. He, C., Zhao, H., Liu, X., Zong, Z., Li, L., Jiang, J., and Zhu, J. 2014. Data mining-based
hardware circuit FMEA (Failure Mode and Effects Analysis) method. CN104198912,
filed July 24, 2014, and issued December 10, 2014.
35. Xu, H., Wang, Z., Ren, Y., Yang, D., and Liu, L. 2015. Failure knowledge storage and
push method for FMEA (failure mode and effects analysis) process. CN104361026,
filed October 22, 2014, and issued February 18, 2015.
36. Tang, Y., Sun, Q., and Lü, Z. 2015. Failure diagnosis modeling method based on design-
ing data analysis. CN104504248, filed December 5, 2014, and issued April 8, 2015.
37. David, P., Idasiak, V., and Kratz, F. 2010. Reliability study of complex physical systems
using SysML. Reliability Engineering & System Safety 95(4): 431–450.
38. Demichela, M., Piccinini, N., Ciarambino, I., and Contini, S. 2004. How to avoid the
generation of logic loops in the construction of fault trees. Reliability Engineering &
System Safety 84(2): 197–207.
39. Deshpande, V. S., and Modak, J. P. 2002. Application of RCM to a medium scale
industry. Reliability Engineering & System Safety 77(1): 31–43.
40. Doble, M. 2005. Six Sigma and chemical process safety. International Journal of Six
Sigma and Competitive Advantage 1(2): 229–244.
41. Van Bossuyt, D., Hoyle, C., Tumer, I. Y., and Dong, A. 2012. Risk attitudes in risk-
based design: Considering risk attitude using utility theory in risk-based design. AI
EDAM 26(4): 393–406.
42. Ebrahimipour, V., Rezaie, K., and Shokravi, S. 2010. An ontology approach to support
FMEA studies. Expert Systems with Applications 37(1): 671–677.
43. Draber, C. D. 2000. Method for determining the reliability of technical systems.
EP1035454, filed March 8, 1999, and issued September 8, 2000.
44. Eubanks, C. F., Kmenta, S., and Ishii, K. 1996. System behavior modeling as a basis
for advanced failure modes and effects analysis. ASME Computers in Engineering
Conference, Irvine, CA, pp. 1–8.
45. Eubanks, C. F., Kmenta, S., and Ishii, K. 1997. Advanced failure modes and effects
analysis using behavior modeling. ASME Design Engineering Technical Conferences,
Sacramento, CA, pp. 14–17.
46. Gandhi, O. P., and Agrawal, V. P. 1992. FMEA—A diagraph and matrix approach.
Reliability Engineering & System Safety 35(2): 147–158.
47. Hartini, S., Nugroho, W. P., and Subekti, K. R. 2010. Design of Equipment Rack with
TRIZ Method to Reduce Searching Time in Change Over Activity (Case Study: PT.
Jans2en Indonesia). Proceedings of the Apchi Ergo Future.
48. Hassan, A., Siadat, A., Dantan, J. Y., and Martin, P. 2010. Conceptual process plan-
ning–an improvement approach using QFD, FMEA, and ABC methods. Robotics and
Computer-Integrated Manufacturing 26(4): 392–401.
49. Hu, C. M., Lin, C. A., Chang, C. H., Cheng, Y. J., and Tseng, P. Y. 2014. Integration with
QFDs, TRIZ and FMEA for control valve design. Advanced Materials Research Trans
Tech Publications 1021: 167–180.
50. Jenab, K., Khoury, S., and Rodriguez, S. 2015. Effective FMEA analysis or not.
Strategic Management Quarterly 3(2): 25–36.
51. Jong, C. H., Tay, K. M., and Lim, C. P. 2013. Application of the fuzzy failure mode and
effect analysis methodology to edible bird nest processing. Computers and Electronics
in Agriculture 96: 90–108.
52. Koizumi, A., Shimokawa, K., and Isaki, Y. 2003. Fmea system. JP2003036278, filed
July 25, 2001, and issued February 7, 2003.
53. Wada, T., Miyamoto, Y., Murakami, S., Sugaya, A., Ozaki, Y., Sawai, T., Matsumoto, S.
et al., 2003. Diagnosis rule structuring method based on failure mode analysis, diagnosis
rule creating program, and failure diagnosis device. JP2003228485, filed February 6,
2002, and issued August 15, 2003.
54. Yatake, H., Konishi, H., and Onishi, T. 2009. Fmea sheet creation support system and
creation support program. JP2011008355, filed June 23, 2009.
55. Suzuki, K., Hayata, A., and Yoshioka, M. 2009. Reliability analysis device and method.
JP2011113217, issued November 25, 2009.
56. Kawai, M., Hirai, K., and Aryoshi, T. 1990. Fmea simulation method for analyzing
circuit. JPH0216471, filed July 4, 1988, and issued January 19, 1990.
57. Sonoda, Y., and Kageyama, T. 1992. Plant diagnostic device. JPH086635, filed May 9,
1990, and issued January 23, 1992.
58. Kim, J. H., Kim, I. S., Lee, H. W., and Park, B. O. 2012. A Study on the Role of TRIZ
in DFSS. SAE International Journal of Passenger Cars-Mechanical Systems 5(2012–
01–0068): 22–29.
59. Kimura, F., Hata, T., and Kobayashi, N. 2002. Reliability-centered maintenance plan-
ning based on computer-aided FMEA. Proceeding of the 35th CIRP-International
Seminar on Manufacturing Systems 506–511.
60. Kmenta, S., and Ishii, K. 2000. Scenario-based FMEA: A life cycle cost perspective.
Proceedings of ASME Design Engineering Technical Conference, Baltimore, MD.
61. Kmenta, S., and Ishii, K. 2004. Scenario-based failure modes and effects analysis using
expected cost. Journal of Mechanical Design 126(6): 1027–1035.
62. Kmenta, S., and Ishii, K. 1998. Advanced FMEA using meta behavior modeling for
concurrent design of products and controls. Proceedings of the 1998 ASME Design
Engineering Technical Conferences.
63. Kmenta, S., Cheldelin, B., and Ishii, K. 2003. Assembly FMEA: A simplified method
for identifying assembly errors. ASME 2003 International Mechanical Engineering
Congress and Exposition 315–323.
64. Lee, M. S., and Lee, S. H. 2013. Real-time collaborated enterprise asset management
system based on condition-based maintenance and method thereof. KR20130065800,
filed November 30, 2011, and issued June 24, 2013.
65. Choi, S. H., Kim, G. H., Cho, C. H., and Kim, Y. G. 2013. Reliability centered
maintenance method for power generation facilities. KR20130118644, filed April 20, 2012,
and issued December 12, 2013.
66. Lim, S. S., and Lee, J. Y. 2014. Intelligent failure asset management system for railway
car. KR20140036375, filed September 12, 2012, and issued March 3, 2014.
67. Ku, C., Chen, Y. S., and Chung, Y. K. 2008. An intelligent FMEA system implemented
with a hierarchy of back-propagation neural networks. Cybernetics and Intelligent
Systems IEEE Conference 203–208.
68. Kutlu, A. C., and Ekmekçioğlu, M. 2012. Fuzzy failure modes and effects analysis by
using fuzzy TOPSIS-based fuzzy AHP. Expert Systems with Applications 39(1): 61–67.
69. Laaroussi, A., Fiès, B., Vankeisbelckt, R., and Hans, J. 2007. Ontology-aided
FMEA for construction products. Bringing ITC knowledge to work. Proceedings of
W78 Conference 26(29): 6.
70. Lee, B. H. 2001. Using FMEA models and ontologies to build diagnostic models. AI
EDAM 15(4): 281–293.
71. Lindahl, M. 1999. E-FMEA—a new promising tool for efficient design for environment.
Proceedings of Environmentally Conscious Design and Inverse Manufacturing 734–739.
72. Liu, H. T. 2009. The extension of fuzzy QFD: From product planning to part deploy-
ment. Expert Systems with Applications 36(8): 11131–11144.
73. Liu, J., Martínez, L., Wang, H., Rodríguez, R. M., and Novozhilov, V. 2010. Computing
with words in risk assessment. International Journal of Computational Intelligence
Systems 3(4): 396–419.
74. Liu, H. C., Liu, L., Liu, N., and Mao, L. X. 2013. Risk evaluation in failure mode
and effects analysis with extended VIKOR method under fuzzy environment. Expert
Systems with Applications 40(2): 828–838.
75. Grantham, K. (2007). Detailed risk analysis for failure prevention in conceptual design:
RED (Risk in early design) based probabilistic risk assessments.
76. Mader, R., Armengaud, E., Grießnig, G., Kreiner, C., Steger, C., and Weiß, R. 2013.
OASIS: An automotive analysis and safety engineering instrument. Reliability
Engineering & System Safety 120: 150–162.
77. Mandal, S., and Maiti, J. 2014. Risk analysis using FMEA: Fuzzy similarity value and
possibility theory-based approach. Expert Systems with Applications 41(7): 3527–3537.
124 Reliability Engineering
96. Su, C. T., and Chou, C. J. 2008. A systematic methodology for the creation of Six Sigma
projects: A case study of semiconductor foundry. Expert Systems with Applications
34(4): 2693–2703.
97. Suganthi, S., and Kumar, D. 2010. FMEA without fear AND tear. In Management of
Innovation and Technology (ICMIT)IEEE International Conference 1118–1123.
98. Ming Tan, C. 2003. Customer-focused build-in reliability: A case study. International
Journal of Quality & Reliability Management 20(3): 378–397.
99. Meng Tay, K., and Peng Lim, C. 2006. Fuzzy FMEA with a guided rules reduction
system for prioritization of failures. International Journal of Quality & Reliability
Management 23(8): 1047–1066.
100. Teng, S. H., and Ho, S. Y. 1996. Failure mode and effects analysis: An integrated
approach for product design and process control. International Journal of Quality &
Reliability Management 13(5): 8–26.
101. Teoh, P. C., and Case, K. 2004. Failure modes and effects analysis through knowledge
modelling. Journal of Materials Processing Technology153: 253–260.
102. Throop, D. R., Malin, J. T., and Fleming, L. D. 2001. Automated incremental design
FMEA. IEEE Aerospace Conference. Proceedings 7: 7–3458.
103. Johnson, T., Azzaro, S., and Cleary, D., 2004. Method, system and computer prod-
uct for integrating case-based reasoning data and failure modes, effects and corrective
action data. US2004103121, filed November 25, 2002, and issued May 27, 2004.
104. Johnson, T. L., Cuddihy, P. E., and Azzaro, S. H. 2004. Method, system and computer
product for performing failure mode and effects analysis throughout the product life
cycle. US2004225475, filed November 25, 2002, and issued November 11, 2004.
105. Chandler, F. T., Valentino, W. D., Philippart, M. F., Relvini, K. M., Bessette, C. I.
and Shedd, N. P. 2004. Human factors process failure modes and effects analysis
(hf pfmea) software tool. US2004256718, filed April 15, 2004, and issued December 23,
2004.
106. Liddy, R., Maeroff, B., Craig, D., Brockers, T., Oettershagen, U., and Davis, T.
2005. Method to facilitate failure modes and effects analysis. US2005138477, filed
November 25, 2003, and issued June 23, 2005.
107. Lonh, K. J., Tyler, D. A., Simpson, T. A., and Jones, N. A. 2006. Method for predict-
ing performance of a future product. US2006271346, filed May 31, 2005, and issued
November 30, 2006.
108. Mosleh, A., Wang, C., and Groen, F. J. 2007. System and methods for assessing risk
using hybrid causal logic. US2007011113, filed March 17, 2006, and issued July 11,
2007.
109. Coburn, J. A., and Weddle, G. B. 2009. Facility risk assessment systems and methods.
US20090138306, filed September 25, 2008, and issued May 28, 2009.
110. Singh, S., Holland, S. W. and Bandyopadhyay, P. 2012. Graph matching system for
comparing and merging fault models. US2012151290, filed December 9, 2010, and
issued June 14, 2012.
111. Harsh, J. K., Walsh, D. E., and Miller, E., M. 2012. Risk reports for product qual-
ity planning and management. US2012254044, filed March 30, 2012, and issued
October 4, 2012.
112. Abhulimen, K. E. 2012. Design of computer-based risk and safety management sys-
tem of complex production and multifunctional process facilities-application to fpso’s,
US2012317058, filed June 13, 2011, and issued December 13, 2012.
113. Oh, K., P. 2013. Spreadsheet-based templates for supporting the systems engineering
process. US2013013993, filed August 24, 2011, and issued January 10, 2013.
114. Chang, Y. 2014. Product quality improvement feedback method. US20140081442, filed
September 18, 2012, and issued March 20, 2014.
126 Reliability Engineering
115. Barnard, R. F., Dohanich, S. L., and Heinlein, P., D. 1996. System for failure mode and
effects analysis. US5586252, filed May 24, 1994, and issued December 17, 1996.
116. Williams, E., and Rudoff, A. 2006. System and method for performing automated sys-
tem management. US7120559, filed June 29, 2004, and issued October 10, 2006.
117. Williams, E., and Rudoff, A. 2008. System and method for automated problem diagno-
sis. US7379846, filed June 29, 2004, and issued May 27, 2008.
118. Williams, E., and Rudoff A., 2009. System and method for providing a data structure
representative of a fault tree. US7516025, filed June 29, 2004, and issued April 7, 2009.
119. Dreimann, M., Ehlers, P., Goerisch, A., Maeckel, O., Sporer, R., and Sturm, A. 2007.
Method for analyzing risks in a technical project. US8744893, filed April 11, 2006, and
issued November 1, 2007.
120. Vahdani, B., Salimi, M., and Charkhchian, M. 2015. A new FMEA method by inte-
grating fuzzy belief structure and TOPSIS to improve risk evaluation process.
The International Journal of Advanced Manufacturing Technology 77(1–4): 357–368.
121. Wang, C. S., and Chang, T. R. 2010. Systematic strategies in design process for inno-
vative product development. Industrial Engineering and Engineering Management
Proceedings: 898–902.
122. Wang, M. H. 2011. A cost-based FMEA decision tool for product quality design and
management. IEEE Intelligence and Security Informatics Proceedings 297–302.
123. Wirth, R., Berthold, B., Krämer, A., and Peter, G. 1996. Knowledge-based support of
system analysis for the analysis of failure modes and effects. Engineering Applications
of Artificial Intelligence 9(3): 219–229.
124. Selvage, C. 2007. Look-across system. WO2007016360, filed July 28, 2006, and issued
February 28, 2007.
125. Bovey, R. L., and Senalp, E., T. 2010. Assisting with updating a model for diagnosing
failures in a system, WO2010038063, filed September 30, 2009, and issued April 8,
2010.
126. Snooke, N. A. 2010. Assisting failure mode and effects analysis of a system,
WO2010142977, filed June 4, 2010, and issued December 16, 2010.
127. Snooke, N. A. 2012. Automated method for generating symptoms data for diagnostic
systems, WO2012146908, filed April 12, 2012, and issued November 1, 2012.
128. Xiao, N., Huang, H. Z., Li, Y., He, L., and Jin, T. 2011. Multiple failure modes analysis
and weighted risk priority number evaluation in FMEA. Engineering Failure Analysis
18(4): 1162–1170.
129. Yang, C., Letourneau, S., Zaluski, M., and Scarlett, E. 2010. APU FMEA validation
and its application to fault identification. ASME International Design Engineering
Technical Conferences and Computers and Information in Engineering Conference
959–967.
130. Zafiropoulos, E. P., and Dialynas, E. N. 2005. Reliability prediction and failure mode
effects and criticality analysis (FMECA) of electronic devices using fuzzy logic.
International Journal of Quality & Reliability Management 22(2): 183–200.
131. Yang, Z., Bonsall, S., and Wang, J. 2008. Fuzzy rule-based Bayesian reasoning approach
for prioritization of failures in FMEA. IEEE Transactions on Reliability 57(3), 517–528.
132. Zhao, X., and Zhu, Y. 2010. Research of FMEA knowledge sharing method based on
ontology and the application in manufacturing process. Database Technology and
Applications (DBTA), 2nd International Workshop 1–4.
133. Zhou, J., and Stalhaane, T. 2004. Using FMEA for early robustness analysis of Web-
based systems. In Computer Software and Applications Conference Proceedings (2):
28–29.
5 Markov Chains and Stochastic Petri Nets for Availability and Reliability Modeling
Paulo Romero Martins Maciel, Jamilson Ramalho Dantas, and Rubens de Souza Matos Júnior
CONTENTS
5.1 Introduction
5.2 A Glance at History
5.3 Background
5.3.1 Markov Chains
5.3.2 Stochastic Petri Nets
5.4 Availability and Reliability Models for Computer Systems
5.4.1 Common Structures for Computational Systems Modeling
5.4.1.1 Cold, Warm, and Hot Standby Redundancy
5.4.1.2 Active-Active and k-out-of-n Redundancy Mechanisms
5.4.2 Examples of Models for Computational Systems
5.4.2.1 Markov Chains
5.4.2.2 SPN Models
5.5 Final Comments
Acknowledgment
References
5.1 INTRODUCTION
Due to the ubiquitous provision of services on the internet, dependability has become
an attribute of prime concern in hardware/software development, deployment, and
operation. Providing fault-tolerant services is inherently related to the adoption of
redundancy, which can be exploited either in time or in space. Replication of
services is usually provided through hosts distributed across the world, so that
whenever the service, the underlying host, or the network fails, another replica is ready
to take over. The dependability of a system can be understood as its ability to deliver a specified
functionality that can be justifiably trusted. Functionality might be a set of roles or
used the term “Markov chain” [8]. In the 1910s, A. K. Erlang studied telephone traf-
fic planning for reliable service provisioning [10].
The first generation of electronic computers was highly undependable; hence,
many techniques were investigated for improving their reliability, covering both
design strategies and evaluation methods. The methods proposed for improving
system dependability included error control codes, replication of components,
comparison monitoring, and diagnostic routines. The leading researchers during
that period were Shannon [13], Von Neumann [14], and Moore [15], who proposed
and developed theories for building reliable systems from redundant, less reliable
components. These theories were the forerunners of the statistical and probabilistic
techniques that form the groundwork of modern dependability theory [17].
In the 1950s, reliability became a subject of great interest because of the
Cold War efforts, failures of American and Soviet rockets, and failures of the first
commercial jet, the British de Havilland Comet [18,19]. Epstein and Sobel’s 1953
paper on the exponential distribution was a landmark contribution [20]. In 1954, the
first Symposium on Reliability and Quality Control (it is now the IEEE Transactions
on Reliability) was held in the United States, and in 1958 the First All-Union
Conference on Reliability was held in Moscow [7,21]. In 1957, S. J. Einhorn and
F. B. Thiess applied Markov chains for modeling system intermittence [22], and in
1960 P. M. Anselone employed Markov chains for evaluating the availability of radar
systems [23]. In 1961, Birnbaum, Esary, and Saunders published a pioneering paper
introducing coherent structures [24].
Reliability models may be classified as combinatorial (non-state-space) models
and state-space models. Reliability Block Diagrams (RBD) and Fault Trees
(FT) are combinatorial models and the most widely adopted models in reliability
evaluation. RBD is probably the oldest combinatorial technique for reliability
analysis. Fault Tree Analysis (FTA) was initially developed in 1962 at Bell
Laboratories by H. A. Watson to analyze the Minuteman I Intercontinental Ballistic
Missile Launch Control System. Afterward, in 1962, Boeing and AVCO expanded
the use of FTA to the entire Minuteman II [25]. In 1965, W. H. Pierce unified the
Shannon, Von Neumann, and Moore theories of masking and redundancy as the
concept of failure tolerance [26]. In 1967, A. Avizienis combined masking methods
with error detection, fault diagnosis, and recovery into the concept of fault-tolerant
systems [27].
The formation of the IEEE Computer Society Technical Committee on Fault-Tolerant
Computing (now Dependable Computing and Fault Tolerance TC) in 1970 and
of IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance in 1980
was an essential means of defining a consistent set of concepts and terminology. In the
early 1980s, Laprie coined the term dependability to cover concepts such as reliability,
availability, safety, confidentiality, maintainability, security, and integrity [1,29].
In the late 1970s, several works proposed mapping Petri nets to Markov
chains [30,32,47]. These models have been extensively adopted as high-level models
for the automatic generation of Markov chains and for discrete event simulation.
Natkin was the first to apply what is now generally called stochastic Petri nets (SPNs)
to the dependability evaluation of systems [33].
5.3 BACKGROUND
This section provides a very brief introduction to Continuous Time Markov Chains
(CTMCs) and SPNs, which are the formalisms adopted to model availability and
reliability in this chapter.
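\[
Q = \begin{pmatrix}
-\alpha & \alpha & 0 \\
\beta & -(\beta + \gamma) & \gamma \\
0 & \lambda & -\lambda
\end{pmatrix}
\]
For time homogeneous CTMCs:
\[
\frac{d\Pi(t)}{dt} = \Pi(t) \cdot Q, \tag{5.1}
\]
that has the following solution [12,16]:
\[
\Pi(t) = \Pi(0)\, e^{Qt} = \Pi(0) \left( I + \sum_{k=1}^{\infty} \frac{(Qt)^k}{k!} \right). \tag{5.2}
\]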
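As a minimal sketch of Equation 5.2, the Python code below evaluates Π(t) by truncating the series after a fixed number of terms. The numerical rates, the helper names (`mat_mul`, `transient`), and the truncation at 60 terms are assumptions made for illustration only. Direct truncation is adequate only while the entries of Qt are small; tools usually rely on more robust schemes such as uniformization.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transient(pi0, q, t, terms=60):
    """Approximate Pi(t) = Pi(0) * (I + sum_{k>=1} (Qt)^k / k!)."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]           # (Qt)^0 / 0! = I
    for k in range(1, terms + 1):
        term = mat_mul(term, qt)
        term = [[x / k for x in row] for row in term]   # (Qt)^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    # Row vector Pi(0) times the truncated matrix exponential.
    return [sum(pi0[i] * total[i][j] for i in range(n)) for j in range(n)]


# Hypothetical rates for the three-state generator Q of this section.
alpha, beta, gamma, lam = 0.5, 1.0, 0.25, 2.0
Q = [[-alpha, alpha, 0.0],
     [beta, -(beta + gamma), gamma],
     [0.0, lam, -lam]]

pi_t = transient([1.0, 0.0, 0.0], Q, t=2.0)
print(pi_t)
```

The components of `pi_t` are the state probabilities at t = 2 for a chain that starts in the first state; they should sum to one up to rounding error.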
In many cases, however, the instantaneous behavior, Π(t), of the Markov chain is
more than is needed, and it is sufficient to compute the steady-state probabilities,
that is, $\Pi = \lim_{t \to \infty} \Pi(t)$. Hence, consider the system of
differential equations presented in Equation 5.1. If the steady-state distribution
exists, then:
\[
\frac{d\Pi(t)}{dt} = 0
\]
Consequently, to calculate the steady-state probabilities, it is only necessary to
solve the system:
\[
\Pi \cdot Q = 0, \qquad \sum_{\forall i} \pi_i = 1. \tag{5.3}
\]
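The system in Equation 5.3 can be solved with any linear solver. The sketch below, which is only one possible approach, replaces one redundant balance equation by the normalization condition and applies Gauss-Jordan elimination. The function name `steady_state` and the numerical rates are assumptions introduced for this example.

```python
def steady_state(q):
    """Solve Pi * Q = 0 with sum(Pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    # Transpose Q so the unknowns form a column vector: Q^T * Pi^T = 0.
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    # Replace the last (redundant) balance equation by the normalization.
    a[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical rates for the three-state generator Q of this section.
alpha, beta, gamma, lam = 0.5, 1.0, 0.25, 2.0
Q = [[-alpha, alpha, 0.0],
     [beta, -(beta + gamma), gamma],
     [0.0, lam, -lam]]

pi = steady_state(Q)
print(pi)
```

For these assumed rates the balance equations give π = (0.64, 0.32, 0.04), which can be checked by hand from α·π₁ = β·π₂ and γ·π₂ = λ·π₃.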
and
where MD = {true, false} is a set that specifies whether the arc between p and t is
marking dependent. If the arc is marking dependent, the arc weight depends on the
current marking M ∈ RS_SPN, where RS_SPN is the reachability set of the net SPN.
Otherwise, it is constant.
is a matrix of inhibitor arcs. These arcs may also be marking dependent, that is, the
arc weight may depend on the current marking: h_{p,t}: MD × RS_SPN → ℕ, where
MD = {true, false} is a set that specifies whether the arc between p and t is marking
dependent. If the arc is marking dependent, the arc weight depends on the current
marking M ∈ RS_SPN. Otherwise, it is constant.
with prd are discarded and new values are generated in the new marking.
The timers of transitions with prs hold the present values.
• Concurrency: T − Tim → {sss, iss} is a function that assigns to each timed
transition a timing semantics, where sss denotes single server semantics
and iss is infinite server semantics.
SPNs are usually evaluated through numerical methods. However, if the state space
is too large or infinite, or if non-phase-type distributions must be represented,
simulation may be the only evaluation option. With simulation, there are no
fundamental restrictions on the models that can be evaluated. Nevertheless, simulation
does have practical constraints, since the amount of computer time and memory
needed to run a simulation can be prohibitively large. Therefore, the general advice is to
pursue an analytical model wherever possible, even if simplifications and/or
decomposition are required.
For a detailed introduction to SPNs, refer to [43,45].
from Markovian assumptions requires the adoption of simulation for a model
solution [57,59–61]. It is also possible to adapt transitions to represent other
distributions by employing phase approximation or moment matching, as shown in [36,52].
The use of such techniques allows the modeling of events described by distributions
such as Weibull, hypoexponential, hyperexponential, Erlang, and Cox [13,16].
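\[
A(t) = \pi_U(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)t} \tag{5.4}
\]
and
\[
UA(t) = \pi_D(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)t}, \tag{5.5}
\]
\[
A = \pi_U = \frac{\mu}{\lambda + \mu} \tag{5.6}
\]
and
\[
UA = \pi_D = \frac{\lambda}{\lambda + \mu}, \tag{5.7}
\]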
FIGURE 5.1 Single component system: (a) Availability model and (b) Matrix rate.
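To illustrate Equations 5.4 through 5.7 numerically, the sketch below evaluates the instantaneous and steady-state availability of a single component. The MTTF of 1,000 h, the MTTR of 10 h, and the function names are hypothetical values chosen only for this example.

```python
import math


def availability(t, lam, mu):
    """Instantaneous availability A(t) of Equation 5.4 (component up at t = 0)."""
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)


def unavailability(t, lam, mu):
    """Instantaneous unavailability UA(t) of Equation 5.5."""
    return lam / (lam + mu) - lam / (lam + mu) * math.exp(-(lam + mu) * t)


def steady_state_availability(lam, mu):
    """Steady-state availability A of Equation 5.6."""
    return mu / (lam + mu)


# Hypothetical component: MTTF = 1,000 h and MTTR = 10 h.
lam, mu = 1 / 1000, 1 / 10

a_start = availability(0.0, lam, mu)         # component is up at t = 0
a_limit = steady_state_availability(lam, mu)
for t in (0.0, 10.0, 100.0):
    print(t, availability(t, lam, mu), unavailability(t, lam, mu))
print("steady state:", a_limit)
```

For every t, A(t) + UA(t) = 1, and A(t) decays from 1 toward µ/(λ + µ) at rate λ + µ.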
\[
\Pi \cdot Q = 0, \qquad \pi_U + \pi_D = 1,
\]
\[
\frac{d\Pi(t)}{dt} = \Pi(t) \cdot Q,
\]
\[
R(t) = \pi_U(t) = e^{-\lambda t} \tag{5.8}
\]
\[
UR(t) = \pi_D(t) = 1 - e^{-\lambda t}. \tag{5.9}
\]
It is worth mentioning that UR(t) = F(t), where F(t) is the cumulative distribution function
of the time to failure. Consequently, as $\mathrm{MTTF} = \int_0^\infty R(t)\,dt$, we have
$\mathrm{MTTF} = \int_0^\infty e^{-\lambda t}\,dt = 1/\lambda$.
The mean time to failure (MTTF) also can be computed from the rate matrix
Q [56,65].
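One common way to do this, sketched here under the assumptions that the failure state is absorbing and that the states are ordered with the up states first, is to partition the rate matrix and solve a linear system. The symbols $Q_{UU}$ (rates among up states), $\tau$ (expected total time spent in each up state before failure), and $\mathbf{1}$ (a column vector of ones) are notation introduced only for this sketch.

```latex
Q = \begin{pmatrix} Q_{UU} & Q_{UD} \\ 0 & 0 \end{pmatrix},
\qquad
\tau \, Q_{UU} = -\Pi_U(0),
\qquad
\mathrm{MTTF} = \tau \, \mathbf{1} = -\Pi_U(0)\, Q_{UU}^{-1}\, \mathbf{1}.

% Single-component check: Q_{UU} = (-\lambda) and \Pi_U(0) = (1), so
\tau = \frac{1}{\lambda}
\quad\Longrightarrow\quad
\mathrm{MTTF} = \frac{1}{\lambda}.
```

The single-component case reproduces the value obtained above by integrating R(t).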
Figure 5.2 depicts an example SPN for a cold-standby server system, comprising two
servers (S1 and S2). Two places (S1_Up and S1_Down) represent the
operational status of the primary server, indicating whether it is working or has failed,
respectively. Three places (S2_Up, S2_Down, and S2_Waiting) represent the operational
status of the spare server, indicating whether it is working, failed, or waiting for
activation in case of a primary server failure.
Notice that in the initial state of the cold-standby model, places S1_Up and
S2_Waiting each hold one token, denoting that the primary server is up and the spare
server is in standby mode. The activation of the spare server occurs after the transition
S1_Fail fires, consuming the token from S1_Up. Once the place S1_Up is empty, the
transition S2_Switch_On becomes enabled, due to the inhibitor arc that connects it to
S1_Up. Hence, S2_Switch_On fires, removing the token from S2_Waiting and putting
one token in place S2_Up. This represents the switchover from
the primary server to the secondary server, which takes an activation time specified
by the S2_Switch_On firing delay.
The repair of the primary server is represented by firing the S1_Repair transition.
The places S1_Down and S2_Up become empty, and S1_Up receives one token again.
As previously mentioned, the times to failure of the primary and secondary servers
differ because the spare server is preserved from the effects of wear and tear while
it is shut off or in standby mode. The availability can be obtained numerically
from the expression:
\[
A = P\big( (\#S1\_Up = 1) \lor (\#S2\_Up = 1) \big)
\]
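The enabling and firing rules described above can be sketched as an untimed token game. The Python code below is an illustration only: it ignores firing delays and the S2_Fail and S2_Repair transitions, all arc weights are taken as 1, and the return of a token to S2_Waiting when S1_Repair fires is an assumption, since the text only states that S1_Down and S2_Up are emptied.

```python
# Arcs follow the textual description; the S2_Waiting output of S1_Repair
# is an ASSUMPTION made so that the spare returns to standby.
TRANSITIONS = {
    "S1_Fail": {"inputs": ["S1_Up"], "outputs": ["S1_Down"],
                "inhibitors": []},
    "S2_Switch_On": {"inputs": ["S2_Waiting"], "outputs": ["S2_Up"],
                     "inhibitors": ["S1_Up"]},
    "S1_Repair": {"inputs": ["S1_Down", "S2_Up"],
                  "outputs": ["S1_Up", "S2_Waiting"], "inhibitors": []},
}


def enabled(marking, name):
    """Enabled if every input place holds a token and every inhibitor
    place is empty (all arc weights are 1 in this sketch)."""
    t = TRANSITIONS[name]
    return (all(marking[p] >= 1 for p in t["inputs"])
            and all(marking[p] == 0 for p in t["inhibitors"]))


def fire(marking, name):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(marking, name):
        raise ValueError(f"{name} is not enabled")
    t = TRANSITIONS[name]
    new = dict(marking)
    for p in t["inputs"]:
        new[p] -= 1
    for p in t["outputs"]:
        new[p] += 1
    return new


def system_up(marking):
    """Predicate inside A = P((#S1_Up = 1) OR (#S2_Up = 1))."""
    return marking["S1_Up"] == 1 or marking["S2_Up"] == 1


m0 = {"S1_Up": 1, "S1_Down": 0, "S2_Up": 0, "S2_Down": 0, "S2_Waiting": 1}
m1 = fire(m0, "S1_Fail")        # primary fails; system is down
m2 = fire(m1, "S2_Switch_On")   # spare is activated; system is up again
m3 = fire(m2, "S1_Repair")      # primary repaired; spare back to standby
print(system_up(m0), system_up(m1), system_up(m2), system_up(m3))
```

In the initial marking S2_Switch_On is blocked by the inhibitor arc from S1_Up; it becomes enabled only after S1_Fail has emptied that place.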
[Figure 5.2: SPN for the cold-standby server system. Recoverable labels: S1_Up, S1_Down, S1_Fail, S1_Repair, S2_Fail, S2_Repair, S2_Switch_On.]
Figure 5.3 depicts an example CTMC for a warm-standby server system, originally
shown in [49]. This model has many similarities to the SPN model for the
cold-standby system, despite the distinct semantics and notation. It might be
interesting to verify that both approaches can be used interchangeably, mainly when the
state-space size is not a major concern.
The CTMC has five states, UW, UF, FF, FU, and FW, and considers one primary
and one spare server. The first letter in each state name indicates the primary server
status, and the second letter indicates the secondary server status. The letter U
stands for up and active, F means failed, and W indicates the waiting condition (i.e.,
the server is up but in standby, waiting for activation). The shaded states indicate
that the system has failed (i.e., it is no longer operational). In state UW, the
primary server (S1) is functional and the secondary server (S2) is in standby.
When S1 fails, the system goes to state FW, where the secondary server has not yet
detected the S1 failure. FU represents the state where S2 has left the waiting
condition and assumed the active role while S1 is failed. If S2 fails before taking the
active role, or before the repair of S1, the system goes to state FF, in which both
servers have failed. For this model, we consider a setup where the primary server
repair has priority over the secondary server repair. Therefore, when both servers
have failed (state FF), there is only one possible outgoing transition: from FF
to UF. If S2 fails while S1 is up, the system goes to state UF and returns to state
UW when the S2 repair is accomplished. If, instead, S1 also fails before that repair,
the system transitions to state FF. The failure rate of S1 and S2 when they are active is
denoted by λ1. The rate λ2 denotes the failure rate of the secondary server when it
is inactive. The repair rate assigned to both S1 and S2 is µ. The rate α represents
the switchover rate (i.e., the reciprocal of the mean time to activate the secondary
server after a failure of S1).
The warm standby system availability is computed from the CTMC model by
summing up the steady-state probabilities for UW, UF, and FU states, which denote
the cases where the system is operational. Therefore, A = πUW + πUF + πFU. System
unavailability might be computed as U = 1 − A, but also as U = πFF + πFW .
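As a hedged illustration of this computation, the sketch below builds a generator matrix for the five states and solves it for the steady-state probabilities. Figure 5.3 is not reproduced here, so the transitions FU → UW and FW → UW (repair of S1 at rate µ) are assumptions, as are the numerical rates and the helper names `build_generator` and `steady_state`.

```python
STATES = ["UW", "UF", "FF", "FU", "FW"]


def build_generator(lam1, lam2, mu, alpha):
    """Generator matrix Q of the warm-standby CTMC (partly assumed)."""
    rates = {
        ("UW", "FW"): lam1,    # active primary fails
        ("UW", "UF"): lam2,    # inactive spare fails
        ("UF", "UW"): mu,      # spare repaired
        ("UF", "FF"): lam1,    # primary fails while the spare is down
        ("FW", "FU"): alpha,   # switchover completes
        ("FW", "FF"): lam2,    # spare fails before taking the active role
        ("FU", "FF"): lam1,    # active spare fails before S1 is repaired
        ("FF", "UF"): mu,      # primary repair has priority
        ("FU", "UW"): mu,      # ASSUMPTION: S1 repaired, S2 back to standby
        ("FW", "UW"): mu,      # ASSUMPTION: S1 repaired before switchover
    }
    idx = {s: i for i, s in enumerate(STATES)}
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate
    return q


def steady_state(q):
    """Solve Pi * Q = 0 with sum(Pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    a[n - 1] = [1.0] * n + [1.0]   # normalization replaces one equation
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Hypothetical rates (per hour): active MTTF 1,000 h, inactive MTTF 1,500 h,
# MTTR 10 h, and a mean switchover time of 5 minutes.
Q = build_generator(lam1=1 / 1000, lam2=1 / 1500, mu=1 / 10, alpha=12.0)
pi = dict(zip(STATES, steady_state(Q)))

A = pi["UW"] + pi["UF"] + pi["FU"]
U = pi["FF"] + pi["FW"]
print("A =", A, "U =", U)
```

Removing an entry from `rates` or changing `alpha` and `lam2` is enough to experiment with the cold and hot standby variants discussed next.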
A CTMC model for a cold standby system can be created with minor adjustments
to the warm standby model, described as follows. The switchover rate (α) must be
reduced to reflect a longer activation time. The transition from UW to
UF should be removed if the spare server is not assumed to fail while inactive.
If such a failure is possible, the failure rate (λ2) should be adjusted to match the
longer mean time to failure expected for a spare server that is partially or entirely
turned off.
A CTMC model for a hot standby system can also be derived from the warm
standby model by increasing the switchover rate (α) to reflect a shorter
activation time, or even by removing state FW to allow a direct transition from UW to FU
if the switching time from the primary to the spare server is negligible. In either
case, the failure rate of the spare server (λ2) should be replaced by the failure rate
of the primary server (λ1), since the mean time to failure is expected to be the same for
both components.
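\[
A = 1 - \frac{60\lambda^3}{60\lambda^3 + 20\lambda^2\mu + 5\lambda\mu^2 + \mu^3}
\]
\[
COA = \frac{\lambda \left( 60\lambda^2 + 16\lambda\mu + 3\mu^2 \right)}{60\lambda^3 + 20\lambda^2\mu + 5\lambda\mu^2 + \mu^3}
\]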
[Figure 5.5: system architecture. Recoverable labels: Clients, Switcher/Router, Servers (S1 and S2).]
[Figure 5.6: CTMC model. Recoverable states: (S1,S2,SR), (D,S2,SR), (D,D,SR), (S1,S2,D), (D,S2,D), and (D,D,D); recoverable rates: λ_S1, λ_S2, λ_SR, µ_S1, µ_S2, µ_SR, and µ_S.]
Each state name comprises three parts. The first represents server one (S1),
the second denotes server two (S2), and the third describes the switch/router
component (SR). The label S1 denotes that S1 is running and operational, S2 denotes
that S2 is running and operational, and SR denotes that the switch/router is running
and operational. The letter D represents the failure state of the corresponding
component. In the initial state (S1,S2,SR), the primary server (S1) is running and
operational, the secondary server (S2) is the spare server, and the switch/router (SR)
is functional. When S1 fails, the system goes from (S1,S2,SR) to (D,S2,SR);
when S1 is repaired, the system returns to the initial state. Once in state (D,S2,SR), the
system may go to state (D,S2,D) through an SR failure, or to state (D,D,SR)
through an S2 failure. In both cases, the system may return to the previous
state at the SR repair rate or the S2 repair rate, respectively. As soon as state
(D,D,SR) is reached, the system may go to state (D,D,D) with an SR failure, or
return to the initial state (S1,S2,SR) when the repair is accomplished (i.e., the repair
of both servers S1 and S2). The failure rates of S1, S2, and SR are denoted by λ_S1,
λ_S2, and λ_SR, respectively, and the repair rates of each component by µ_S1,
µ_S2, and µ_SR. The rate µ_S denotes the repair rate when the two servers are in a
failure state.
The CTMC that represents the architecture enables obtaining a closed-form
equation to compute the availability (see Equation 5.12). It is important to stress that the
repair rates µ_S1, µ_S2, and µ_SR are all equal to µ, and that the failure rates
λ_S1 and λ_S2 are both equal to λ.
\[
A = \frac{\mu \left( \mu(\mu + \mu_S) + \lambda(\mu + 2\mu_S) \right)}{(\lambda_{SR} + \mu)\left( \lambda^2 + \mu(\mu + \mu_S) + \lambda(\mu + 2\mu_S) \right)} \tag{5.12}
\]
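As a quick numerical check of Equation 5.12, the sketch below evaluates it with the rates listed in Table 5.1. The function name is hypothetical, and two conventions are assumed rather than taken from the text: the number of nines is computed as −log10(1 − A), and the annual downtime uses 8,760 h per year. Under these assumptions the values agree closely with those reported in Table 5.2.

```python
import math


def availability_eq_5_12(lam, lam_sr, mu, mu_s):
    """Closed-form steady-state availability of Equation 5.12."""
    common = mu * (mu + mu_s) + lam * (mu + 2 * mu_s)
    return mu * common / ((lam_sr + mu) * (lam ** 2 + common))


# Rates (per hour) from Table 5.1.
A = availability_eq_5_12(lam=1 / 15000, lam_sr=1 / 20000,
                         mu=1 / 24, mu_s=1 / 48)

nines = -math.log10(1 - A)        # assumed definition of "number of nines"
downtime = (1 - A) * 8760         # assumed 8,760 h per year

print(f"A = {A:.7f}, nines = {nines:.4f}, downtime = {downtime:.3f} h/yr")
```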
5.4.2.1.2 Reliability CTMC Model
Figure 5.7 depicts the CTMC reliability model for this architecture. The main
characteristic of a reliability model is the absence of repair from the system
failure state, i.e., once the system reaches the failure state, repair is not considered.
This makes it easier to compute the system mean time to failure and, subsequently,
the reliability metric. The reliability model has three states: (S1,S2,SR),
(D,S2,SR), and Down. The initial state (S1,S2,SR) represents all components
running. If S1 fails, the system goes to state (D,S2,SR), which represents
that, even with the failure of server S1, the system continues operating
with the secondary server (S2). When S1 is repaired, the system returns to the
initial state. The transition from (S1,S2,SR) to Down, taken when SR fails,
represents the system failure; the system is then offline and cannot provide the service.
Once in state (D,S2,SR), the system may go to the Down state at the S2 failure rate
or the SR failure rate. The Down state is absorbing, which makes it possible
to obtain the reliability metric. The up states of the system are
(S1,S2,SR) and (D,S2,SR).
[Figure 5.7: CTMC reliability model. Recoverable labels: states (S1,S2,SR), (D,S2,SR), and Down; rates λ_S1, µ_S1, λ_SR, and λ_SR + λ_S2.]
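The reliability at a given time can be obtained from the transient solution of this three-state model. The sketch below uses uniformization, a standard numerical technique that is not named in the text and is swapped in here for illustration; the function name, tolerance, and iteration cap are assumptions. The rates are those of Table 5.1, and the result at 4,000 h is close to the value reported in Table 5.2.

```python
import math


def uniformization(pi0, q, t, tol=1e-12, max_steps=100000):
    """Transient distribution Pi(t) of a CTMC via uniformization."""
    n = len(q)
    rate = max(-q[i][i] for i in range(n))         # uniformization rate
    # One-step matrix of the uniformized chain: P = I + Q / rate.
    p = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)                   # Poisson weight for k = 0
    vec = list(pi0)
    result = [weight * v for v in vec]
    acc, k = weight, 0
    # Add Poisson-weighted terms until almost all probability mass is used.
    while 1.0 - acc > tol and k < max_steps:
        k += 1
        vec = [sum(vec[i] * p[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        acc += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Rates (per hour) from Table 5.1.
lam_s, lam_sr, mu = 1 / 15000, 1 / 20000, 1 / 24

# State order: (S1,S2,SR), (D,S2,SR), Down (absorbing).
Q = [[-(lam_s + lam_sr), lam_s, lam_sr],
     [mu, -(mu + lam_s + lam_sr), lam_s + lam_sr],
     [0.0, 0.0, 0.0]]

pi = uniformization([1.0, 0.0, 0.0], Q, t=4000.0)
reliability = pi[0] + pi[1]        # probability of still being in an up state
unreliability = pi[2]
print(f"R(4000) = {reliability:.7f}, UR(4000) = {unreliability:.7f}")
```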
5.4.2.1.3 Results
Table 5.1 presents the values of the failure and repair rates, which are the reciprocals
of the MTTF and mean time to repair (MTTR) of each component represented in
Figures 5.6 and 5.7. Those values were estimated and were used to compute the
availability and reliability metrics.
It is important to stress that µ_S corresponds to twice the repair time of µ_S1
(i.e., half its rate, as Table 5.1 shows), considering just one maintenance team. The
availability and reliability measures were computed for the architecture described in
Figure 5.5, using the mentioned input parameters. The results are shown in Table 5.2,
including steady-state availability, number of nines, annual downtime, reliability,
and unreliability, considering 4,000 h of activity.
The downtime provides a view of how much time the system is unavailable to its
users over 1 year. The downtime value of 10.514278 h indicates that the system can be
improved: it corresponds to about 10.5 hours of total outage per year. At 4,000 h of
activity, the system has a reliability a little over 80 percent.
TABLE 5.1
Input Parameters
Variable                  Value (h^-1)
λ_SR                      1/20,000
λ_S1 = λ_S2               1/15,000
µ_S1 = µ_S2 = µ_SR        1/24
µ_S                       1/48

TABLE 5.2
CTMC Results
Measure                   Value
Availability              0.9987997
Number of nines           2.9207247
Downtime (h/yr)           10.514278
Reliability (4,000 h)     0.8183847
Unreliability (4,000 h)   0.1816152
\[
A = P\big( (\#SR\_OK = 1) \ \mathrm{AND}\ ( (\#S1\_OK = 1) \ \mathrm{OR}\ (\#S2\_OK = 1) ) \big)
\]
\[
UA = 1 - P\big( (\#SR\_OK = 1) \ \mathrm{AND}\ ( (\#S1\_OK = 1) \ \mathrm{OR}\ (\#S2\_OK = 1) ) \big)
\]
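Once the steady-state probability of each reachable marking is known, such an expression is evaluated by summing the probabilities of the markings that satisfy the predicate. The sketch below assumes the marking probabilities are already available from an SPN tool; the four markings and their probabilities are made-up numbers used only to show the mechanics.

```python
def availability(marking_probs):
    """Sum the probabilities of the markings that satisfy
    (#SR_OK = 1) AND ((#S1_OK = 1) OR (#S2_OK = 1))."""
    total = 0.0
    for marking, prob in marking_probs:
        up = marking["SR_OK"] == 1 and (marking["S1_OK"] == 1
                                         or marking["S2_OK"] == 1)
        if up:
            total += prob
    return total


# Hypothetical steady-state distribution over four markings (made-up values).
marking_probs = [
    ({"SR_OK": 1, "S1_OK": 1, "S2_OK": 0}, 0.9970),
    ({"SR_OK": 1, "S1_OK": 0, "S2_OK": 1}, 0.0015),
    ({"SR_OK": 1, "S1_OK": 0, "S2_OK": 0}, 0.0003),
    ({"SR_OK": 0, "S1_OK": 1, "S2_OK": 0}, 0.0012),
]

A = availability(marking_probs)
UA = 1 - A
print(A, UA)
```

Only the first two markings satisfy the predicate, so A is the sum of their probabilities.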
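[Figure: SPN model. Recoverable labels: SR_OK, SR_F, SRF, SRR, S1_OK, S1_F, S1F, S1R, S2_OK, S2_F, S2F, S2R, S2_OFF, S2_SwitchingOn.]
[Figure: SPN model. Recoverable labels: SR_OK, SR_F, SRF, S1_OK, S1_F, S1F, S1R, S2_OK, S2_F, S2F, S2R, Failure_sys, System_OFF.]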
place System_OFF with a token. The following expressions are adopted for
estimating reliability and unreliability, respectively:
5.4.2.2.3 Results
Table 5.3 presents the values of mean time to failure (MTTF) and mean time to
repair (MTTR) used for computing availability and reliability metrics for the SPN
models. We computed the availability and reliability measures using the mentioned
input parameters. The results are shown in Table 5.4, including steady-state avail-
ability, number of nines, annual downtime, reliability, and unreliability, considering
4,000 h of activity. The switching time considered is 10 minutes, which is enough
for the system startup and software loading.
This SPN model enables the computation of the reliability function of this sys-
tem over time, which is plotted in Figure 5.10, considering the baseline setup of
parameters shown in Table 5.3, and also a scenario with improved values for the
switch/router MTTF (30,000 h) and both servers MTTR (8 h). It is noticeable that, in
the baseline setup, the system reliability reaches 0.50 at around 15,000 h, and after
60,000 h (about 7 years), the system reliability is almost zero. When the improved
version of the system is considered, the reliability has a smoother decay, reaching
0.50 only at around 25,000 h, and approaching zero only near 100,000 h. For the
sake of comparison, the reliability at 4,000 h is 0.8840773 in the improved setup,
whereas in the baseline setup it is 0.818385. Such an analysis might be valuable for systems administrators to
TABLE 5.3
Input Parameters for SPN Models
Transition            Value (h)    Description
SRF                   20,000       Switch/Router MTTF
S1F = S2F             15,000       Servers MTTF
SRR = S1R = S2R       24           MTTR
S2_SwitchingOn        0.17         MTA

TABLE 5.4
SPN Results
Measure                   Value
Availability              0.998799
Number of nines           2.920421
Downtime (h/yr)           10.521636
Reliability (4,000 h)     0.818385
Unreliability (4,000 h)   0.181615
[Figure 5.10: Reliability versus time (h), from 0 to 100,000 h, for the baseline setup and the improved setup.]
ACKNOWLEDGMENT
This work was supported by a grant of contract number W911NF1810413 from the
U.S. Army Research Office (ARO).
REFERENCES
1. Laprie, J.C. Dependable Computing and Fault Tolerance: Concepts and terminology.
Proceedings of the 15th IEEE International Symposium on Fault-Tolerant Computing.
1985.
2. Schaffer, S. Babbage’s Intelligence: Calculating Engines and the Factory System.
Critical Inquiry. 1994, Vol. 21, No. 1, 203–227.
3. Blischke, W.R., Murthy, D.P. [ed.]. Case Studies in Reliability and Maintenance.
Hoboken, NJ: John Wiley & Sons, 2003. p. 661.
4. Stott, H.G. Time-Limit Relays and Duplication of Electrical Apparatus to Secure
Reliability of Services at New York. Transactions of the American Institute of Electrical
Engineers, 1905, Vol. 24, 281–282.
5. Stuart, H.R. Time-Limit Relays and Duplication of Electrical Apparatus to Secure
Reliability of Services at Pittsburg. Transactions of the American Institute of Electrical
Engineers, 1905, Vol. 24, 281–282.
6. Board of Directors of the American Institute of Electrical Engineers. Answers to
Questions Relative to High Tension Transmission. s.l.: IEEE, 1902.
7. Ushakov, Igor. Is Reliability Theory Still Alive? Reliability: Theory & Applications.
2007, Vol. 2, No. 1, Mar. 2017, p. 10.
8. Bernstein, S. Sur l’extension du théorème limite du calcul des probabilités aux sommes
de quantités dépendantes. Mathematische Annalen. 1927, Vol. 97, 1–59.
http://eudml.org/doc/182666.
9. Basharin, G.P., Langville, A.N., Naumov, V.A. The Life and Work of A.A. Markov.
Linear Algebra and Its Applications. 2004, Vol. 386, 3–26. doi:10.1016/j.laa.2003.12.041.
10. Principal Works of A. K. Erlang—The Theory of Probabilities and Telephone
Conversations. First published in Nyt Tidsskrift for Matematik B. 1909, Vol. 20, 33–39.
11. Kotz, S., Nadarajah, S. (2000). Extreme Value Distributions: Theory and Applications.
Imperial College Press. doi:10.1142/9781860944024.
12. Kolmogoroff, A. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung [in
German] [Springer-Verlag]. Mathematische Annalen. 1931, Vol. 104, 415–458.
doi:10.1007/BF01457949.
13. Shannon, C.E. A Mathematical Theory of Communication. The Bell System Technical
Journal. 1948, Vol. 27, 379–423, 623–656.
14. Neumann, J.V. Probabilistic Logics and the Synthesis of Reliable Organisms from
Unreliable Components. Automata studies, 1956, Vol. 34, 43–98.
15. Moore, E.F. Gedanken-Experiments on Sequential Machines. The Journal of Symbolic
Logic. 1958, Vol. 23, No. 1, 60.
16. Cox, D. A Use of Complex Probabilities in the Theory of Stochastic Processes.
Mathematical Proceedings of the Cambridge Philosophical Society. 1955, Vol. 51,
No. 2, 313–319. doi:10.1017/S0305004100030231.
17. Avizienis, A. Toward Systematic Design of Fault-Tolerant Systems. IEEE Computer.
1997, Vol. 30, No. 4, 51–58.
18. Barlow, R.E. Mathematical Theory of Reliability. New York: John Wiley & Sons, 1967.
SIAM series in applied mathematics.
19. Barlow, R.E., Mathematical Reliability Theory: From the Beginning to the Present
Time. Proceedings of the Third International Conference on Mathematical Methods
In Reliability, Methodology and Practice. Trondheim, Norway, 2002.
20. Epstein, B., Sobel, M. Life Testing. Journal of the American Statistical Association.
1953, Vol. 48, No. 263, 486–502.
21. Gnedenko, B., Ushakov, I. A., & Ushakov, I. (1995). Probabilistic reliability engineering.
John Wiley & Sons.
44. German, R., Lindemann, C. Analysis of Stochastic Petri Nets by the Method of
Supplementary Variables. Performance Evaluation. 1994, Vol. 20, No. 1, 317–335.
45. German, R. Performance Analysis of Communication Systems with NonMarkovian
Stochastic Petri Nets. New York: John Wiley & Sons, 2000.
46. Lindemann, C. (1998). Performance modelling with deterministic and stochastic Petri
nets. ACM sigmetrics performance evaluation review, 26(2), 3.
47. Molloy, M.K. Performance Analysis Using Stochastic Petri Nets. IEEE Transactions
on Computers. 1982, Vol. 9, 913–917.
48. Muppala, J., Ciardo, G., Trivedi, K.S. Stochastic Reward Nets for Reliability Prediction.
Communications in Reliability, Maintainability and Serviceability. 1994, Vol. 1, 9–20.
49. Matos, R., Dantas, J., Araujo, J., Trivedi, K.S., Maciel, P. Redundant Eucalyptus Private
Clouds: Availability Modeling and Sensitivity Analysis. Journal Grid Computing.
2017, Vol. 15, No. 1, 1–23.
50. Malhotra, M., Trivedi, K.S. Power-hierarchy of Dependability-Model Types. IEEE
Transactions on Reliability. 1994, Vol. 43, No. 3, 493–502.
51. Shooman, M.L. The Equivalence of Reliability Diagrams and Fault-Tree Analysis.
IEEE Transactions on Reliability. 1970, Vol. 19, No. 2, 74–75.
52. Watson, J.R., Desrochers, A.A. Applying Generalized Stochastic Petri Nets to
Manufacturing Systems Containing Nonexponential Transition Functions. IEEE
Transactions on Systems, Man, and Cybernetics. 1991, Vol. 21, No. 5, 1008–1017.
53. O’Connor P, Kleyner A. Practical Reliability Engineering. John Wiley & Sons; 2012
Jan 30.
54. Beaudry, M.D. Performance-Related Reliability Measures for Computing Systems.
IEEE Transactions on Computers. 1978, Vol. 6, 540–547.
55. Dantas, J., Matos, R., Araujo, J., Maciel, P. Eucalyptus-based Private Clouds:
Availability Modeling and Comparison to the Cost of a Public Cloud. Computing. 2015,
Vol. 97, No. 11, 1121–1140.
56. Buzacott, J.A. Markov Approach to Finding Failure Times of Repairable Systems.
IEEE Transactions on Reliability. 1970, Vol. 19, No. 4, 128–134.
57. Maciel, P., Matos, R., Silva, B., Figueiredo, J., Oliveira, D., Fe, I., Maciel, R.,
Dantas, J. Mercury: Performance and Dependability Evaluation of Systems with
Exponential, Expolynomial, and General Distributions. In: The 22nd IEEE Pacific Rim
International Symposium on Dependable Computing (PRDC 2017). January 22–25,
2017. Christchurch, New Zealand.
58. Guedes, E., Endo, P., Maciel, P. An Availability Model for Service Function Chains
with VM Live Migration and Rejuvenation. Journal of Convergence Information
Technology. Volume 14 Issue 2, April, 2019. Pages 42–53.
59. Silva, B., Matos, R., Callou, G., Figueiredo, J., Oliveira, D., Ferreira, J., Dantas, J.,
Junior, A.L., Alves, V., Maciel, P. Mercury: An Integrated Environment for Performance
and Dependability Evaluation of General Systems. Proceedings of Industrial Track at
45th Dependable Systems and Networks Conference (DSN-2015). 2015. Rio de Janeiro,
Brazil.
60. Silva, B., Maciel, P., Tavares, E., Araujo, C., Callou, G., Souza, E., Rosa, N. et al.
ASTRO: A Tool for Dependability Evaluation of Data Center Infrastructures. IEEE
International Conference on Systems, Man, and Cybernetics, 2010, Istanbul, Turkey.
IEEE Proceeding of SMC, 2010.
61. Silva, B., Callou, G., Tavares, E., Maciel, P., Figueiredo, J., Sousa, E., Araujo, C.,
Magnani, F., Neves, F. Astro: An Integrated Environment for Dependability and
Sustainability Evaluation. Sustainable Computing: Informatics and Systems. 2013
Mar 1;3(1):1–7.
62. Kuo, W., Zuo, M.J. Optimal Reliability Modeling—Principles and Applications. Wiley,
2003. p. 544.
63. Heimann, D., Mittal, N., Trivedi, K. Dependability Modeling for Computer systems.
Proceedings Annual Reliability and Maintainability Symposium, 1991. IEEE, Orlando,
FL, pp. 120–128.
64. Matos, R., Maciel, P.R.M., Machida, F., Kim, D.S., Trivedi, K.S. Sensitivity Analysis
of Server Virtualized System Availability. IEEE Transactions on Reliability. 2012,
Vol. 61, No. 4, 994–1006.
65. Reinecke, P., Bodrog, L., Danilkina, A. Phase-Type Distributions. Resilience
Assessment and Evaluation of Computing Systems, Berlin, Germany: Springer, 2012.
6 An Overview of Fault
Tree Analysis and Its
Application in Dual
Purposed Cask Reliability
in an Accident Scenario
Maritza Rodriguez Gual, Rogerio Pimenta
Morão, Luiz Leite da Silva, Edson Ribeiro,
Claudio Cunha Lopes, and Vagner de Oliveira
CONTENTS
6.1 Introduction
6.2 Overview of Fault Tree Analysis
6.3 Minimal Cut Sets
6.4 Description of Dual Purpose Cask
6.5 Construct the Fault Tree of Dual Purpose Cask
6.6 Results
6.7 Conclusion
References
6.1 INTRODUCTION
Spent nuclear fuel is generated from the operation of nuclear reactors and must be
safely managed following its removal from reactor cores. The Nuclear Technology
Development Center (Centro de Desenvolvimento da Tecnologia Nuclear–CDTN),
Belo Horizonte, Brazil constructed a dual-purpose metal cask in scale 1:2 for the
transport and dry storage of spent nuclear fuel (SNF) that will be generated by
research reactors, both plate-type material testing reactor (MTR) and TRIGA fuel
rods. The CDTN is connected to the Brazilian National Nuclear Energy Commission
(Comissão Nacional de Energia Nuclear—CNEN).
The dual purpose cask (DPC) development was supported by International
Atomic Energy Agency (IAEA) Projects RLA4018, RLA4020, and RLA3008.
The project began in 2001 and finished in 2006. Five Latin American countries
participated—Argentina, Brazil, Chile, Peru, and Mexico. The cask is classified as a
Type B package according to IAEA Regulations for the Safe Transport of Radioactive
Materials (IAEA, TS-R-1, 2009). The RLA/4018 cask was designed and constructed
in compliance with IAEA Transport Regulations. The IAEA established the stan-
dards for the packages used in the transport of radioactive materials under both nor-
mal and accident conditions.
The general safety requirements concern, among other issues, package tie-down,
lifting, decontamination, securing and closing devices, and material resistance to
radiation, thermal, and pressure conditions likely to be encountered during transportation.
The regulations establish requirements that guarantee that fissile material is packaged
and shipped in such a manner that it remains subcritical under the conditions
prevailing during routine transport and in accidents.
TABLE 6.1
Fundamental Laws of Boolean Algebra
Law            AND Form Representation      OR Form Representation
Commutative    x·y = y·x                    x + y = y + x
Associative    x·(y·z) = (x·y)·z            x + (y + z) = (x + y) + z
Distributive   x·(y + z) = x·y + x·z        x + y·z = (x + y)·(x + z)
Idempotent     x·x = x                      x + x = x
Absorption     x·(x + y) = x                x + x·y = x
(Vesely et al., 1981). The FTA begins by identifying multiple-cause combinations
for each fatality. These multiple-cause combinations can be connected by an
AND-gate (the output occurs only if all inputs occur), indicating that these two or
three events contributed simultaneously to these fatal falls, or by an OR-gate (the
output occurs if any input occurs). Fundamental laws of Boolean algebra (Whitesitt,
1995) (see Table 6.1) were applied to reduce all possible cause combinations to
the smallest cut sets (Vesely et al., 1981) that could cause the top event to occur.
Eventually, all cause combinations associated with each basic event can be simplified
and presented in a fault tree diagram.
The fault tree gates are systematically substituted by their inputs, applying the
Boolean algebra laws in several stages until the Boolean expression for the top event
contains only basic events. The final form of the Boolean equation is an irreducible
logical union of minimal sets of events that are necessary and sufficient to cause the
top event, called minimal cut sets (MCSs). The original fault tree is thus mathematically
transformed into an equivalent MCS fault tree. The transformation process also ensures
that any single event that appears repeatedly in various branches of the fault tree is
not counted twice.
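As a small worked illustration of this reduction, consider a hypothetical top event T fed by an AND gate whose two inputs are the OR gates (A + B) and (A + C). The events A, B, and C are illustrative and are not taken from the cask fault tree. Applying the distributive, idempotent, and absorption laws of Table 6.1 gives:

\[
\begin{aligned}
T &= (A + B)\cdot(A + C) \\
  &= A\cdot A + A\cdot C + B\cdot A + B\cdot C \\
  &= A + A\cdot C + A\cdot B + B\cdot C \\
  &= A + B\cdot C .
\end{aligned}
\]

Under these assumptions the MCSs are {A} (first order) and {B, C} (second order); the repeated event A is counted only once.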
Fault trees graphically represent the interaction of failures and other events within
a system. Basic events at the bottom of the fault tree are linked via logic symbols
(known as gates) to one or more top events. These top events represent identified
hazards or system failure modes.
A fault tree diagram (FTD) is a logic block diagram that displays the state of
a system (top event) in terms of the states of its components or subsystems (basic
events). The basic events are determined through a top-down approach. The relationships
between causes and the top event (failure or fatalities) are represented by
standard logic symbols (AND, OR, etc.).
FTA involves five steps to obtain an understanding of the system:
FTA is a simple, clear, and direct-vision method for effectively analyzing and esti-
mating possible accidents or incidents and causes. FTA is useful to prioritize the
preventive maintenance of the equipment that is contributing the most to the failure.
Also, it is a quality assurance (QA) tool. The overall success of the FTA depends on
the skill and experience of the analyst.
Qualitative analysis by FTA is an alternative for reliability assessment when historical
data on undesirable events (fatalities or failures) are incomplete or unavailable
for probabilistic (quantitative) calculation.
FTA can be used for quantitative assessment of reliability if probability values
are available.
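A minimal sketch of such a quantitative assessment is given below, assuming statistically independent basic events. The cut sets and probability values are hypothetical and are not data from this chapter. The sketch compares the exact top event probability, obtained by enumerating all event states, with the common rare-event approximation (the sum of the cut set probabilities).

```python
from itertools import product
from math import prod

def top_event_probability(mcs, p):
    """Exact P(top event) for independent basic events, by state enumeration."""
    events = sorted({e for cs in mcs for e in cs})
    total = 0.0
    for state in product([False, True], repeat=len(events)):
        occurred = {e for e, s in zip(events, state) if s}
        # The top event occurs if every event of at least one cut set occurred
        if any(cs <= occurred for cs in mcs):
            total += prod(p[e] if s else 1.0 - p[e]
                          for e, s in zip(events, state))
    return total

def rare_event_approximation(mcs, p):
    """Sum of cut set probabilities; an upper bound on the exact value."""
    return sum(prod(p[e] for e in cs) for cs in mcs)

# Hypothetical minimal cut sets and basic event probabilities
mcs = [frozenset({"A"}), frozenset({"B", "C"})]
p = {"A": 0.01, "B": 0.1, "C": 0.2}

exact = top_event_probability(mcs, p)
approx = rare_event_approximation(mcs, p)
print(f"exact = {exact:.4f}, rare-event approximation = {approx:.4f}")
```

State enumeration grows as 2^n in the number of basic events, so it is only practical for small trees; the approximation is the usual shortcut when all probabilities are small.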
For a large or very complex system that includes many pieces of equipment and
components, FTA can be time consuming, and a complex fault tree must be analyzed with
a specialized computer program. However, there are still several practical cases in
which fault trees are convenient, as in the case study solved here.
This methodology (Vesely et al., 1981) is applicable to all fault trees, regardless of
size or complexity, that satisfy the following conditions:
The biggest advantage of using FTA, however, is that it starts from a top event that is
selected by the user for a specific interest, and the tree developed will identify the
root cause.
The MCSs can be ordered by number and by order (i.e., cut set size).
The order of a cut set is the number of elements it contains. A first-order MCS can be
obtained directly, and a second-order MCS is obtained by the logical operation
“OR.” An “AND” gate increases the order of the MCSs, whereas an “OR” gate
increases the number of MCSs.
The lower the order, the more critical the cut set; a low-order cut set is acceptable
only if the failure has a very low probability.
FIGURE 6.2 Photograph of the RLA4018 spent fuel transport and storage cask designed by CDTN.
The cask is provided with four lifting trunnions; two in the top half and two in its
bottom half so that the cask can be easily rotated. The cask is vertically held down
by four bottom screwed trunnions.
The process of loading spent fuel consists of submerging the transport cask in
the reactor pool while spent fuel is transferred into the basket. The water is drained
and the cask is dried to eliminate residual amounts of water in the cavity to ensure
sub-criticality conditions.
The cask has one draining port for vacuum drying, while its primary lid is provided
with another one for helium gas filling. After the water is drained, the primary lid
is installed on the cask body. Next, a vacuum drying system is connected to
the cask to remove the moisture from it.
The shock absorbers protect the whole cask during the 9-meter
drop test prescribed for this type of package. They consist of a thin external
stainless-steel shell encasing an energy absorbing material. Different materials
have been used by cask designers for this purpose, the most common being
polyurethane foam, solid wood and wood composites, aluminum honeycomb, and
aluminum foam. The currently selected cushioning material is high density rigid
polyurethane foam.
It is important to note that the accelerometer base is not in the final cask. It is used
only to measure the acceleration range during the impact tests.
Type B packages are designed to withstand very severe accidents in all the modes
of transport without any unacceptable loss of containment or shielding.
The transport regulations and storage safety requirements to consider in the
DPC package design (IAEA, 2014), under routine conditions of transport (RCT),
normal conditions of transport (NCT), and accident conditions of transport
(ACT) are:
Aging effects in DPCs are considered because the casks are expected to be used for spent
fuel interim storage for up to 20 years.
The objective of the regulations is to protect people and the environment from the
effects of radiation during the transport of radioactive material.
Normal conditions that a spent fuel transport package must be able to resist
include hot and cold environments, changes in pressure, vibration, water spray,
impact, puncture, and compression.
To show that it can resist accident conditions, a package must pass impact, fire,
and water immersion tests.
Reports from the United States (Nuclear Monitor 773, 2013) and the United
Kingdom (Jones and Harvey, 2014) include descriptions of various accidents and
incidents involving the transport of radioactive materials, which occurred until 2014,
but none resulted in a release of radioactive material or a fatality due to radiation
exposure. For this reason, this study is important.
The root, called the top event (TE), is the undesired event of a tree.
A rectangle represents the top event and intermediary events.
A circle represents a basic event.
A logic OR gate, which is equivalent to the Boolean symbol +, represents a situation
in which any one of the events below the gate (input events) is alone enough to
contribute to the system fault above the gate (output event). OR gates increase the
number of cut sets, but often lead to single-component sets.
A logic AND gate, which is equivalent to the Boolean symbol ·, represents a situation
in which all the events shown below the gate (input events) are required for the
system fault shown above the gate (output event). AND gates increase the number of
components (order) in a cut set.
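The effect of the two gate types on cut sets can be sketched with a short top-down expansion. The two toy trees below are hypothetical and are not the cask fault tree; the dictionary layout and the function name `cut_sets` are illustrative choices. No minimization is applied, so the output is the raw list of cut sets.

```python
from itertools import product

def cut_sets(node, tree):
    """Return the cut sets (frozensets of basic events) of a node.

    `tree` maps a gate name to (gate_type, inputs); any name that is
    not a key of `tree` is treated as a basic event.
    """
    if node not in tree:
        return [frozenset([node])]
    gate, inputs = tree[node]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":
        # Each input contributes its own cut sets: more sets, same order
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        # One merged set per combination of inputs: higher order
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate type: {gate}")

# TOP = OR(AND(A, B), C): the OR gate yields two cut sets
or_tree = {"TOP": ("OR", ["G1", "C"]), "G1": ("AND", ["A", "B"])}
or_sets = cut_sets("TOP", or_tree)

# TOP = AND(OR(A, B), OR(C, D)): the AND gate yields second-order sets
and_tree = {"TOP": ("AND", ["G1", "G2"]),
            "G1": ("OR", ["A", "B"]),
            "G2": ("OR", ["C", "D"])}
and_sets = cut_sets("TOP", and_tree)

print([sorted(cs) for cs in or_sets])
print([sorted(cs) for cs in and_sets])
```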
The analysis was performed according to the following steps:
• Definition of the system failure event of interest, known as the top event, as
environmental contamination.
• Identification of contributing events (basic or intermediate), which might
directly cause the top event to occur.
6.6 RESULTS
The specific case study analyzed with FTA has Environmental contamination as its
top event.
The fault tree was constructed by a multidisciplinary team that included nuclear,
electrical, and mechanical engineers. Working in a multidisciplinary team makes it
possible to analyze the weak points of the design.
The fault tree diagram is shown in Figure 6.3.
The basic events that led to the top event, Environmental contamination, are
shown in Table 6.2 with the symbols given.
Table 6.3 describes the symbols for the Intermediary Events on the FTD.
The Boolean algebra analysis of the fault tree is shown in Table 6.4.
The MCSs are listed in Table 6.5.
Events B1, B2, B3, B4, B5, B7, B8, B9, and B10 are associated with human errors.
Hence, B6 is susceptible to human error.
Boolean algebra laws reduced the amount of cause combinations and the redun-
dancy of basic events.
MCSs can be used to understand the structural vulnerability of a system. If the
order of an MCS is high, the system (or the top event in the fault tree) is less
vulnerable to that combination of events. In addition, a large number of MCSs indicates
that the system has higher vulnerability. Cut sets can also be used to detect single point
failures (one independent element of a system that causes an immediate hazard to
occur and/or causes the whole system to fail).
Two first-order and five second-order MCS were found.
• 1st order: The occurrence of a BE implies the occurrence of the top or
undesired event.
• 2nd order: The simultaneous occurrences of BEs result in the loss of conti-
nuity of operation of the system.
TABLE 6.2
Description of Symbols for the Basic Events on the Fault Tree Diagram
Number Basic Events Symbols
1 Containment failure B1
2 Failure in inspection, control, in testing program B2
3 Vehicle collision B3
4 Fire of oil B4
5 Deficiencies in component B5
6 Operator errors B6
7 Contaminated water in reactor pool B7
8 Improper equipment to closure of screw B8
9 Error in tightening torque calculation B9
10 Material aging B10
TABLE 6.3
Description of Symbols for the Intermediary Events on the FTD
Number Intermediary Events Symbols
1 Contamination outside of cask G2
2 Vehicle fire G3
3 Internal lid screws with incorrect torque G4
4 Collision and other accidents G5
5 Containment failure G6
6 Deficiencies in decontamination equipment and/or contamination detection F1
7 Failure in screw closure F2
TABLE 6.4
Minimal Cut Set Determination Steps
Step Boolean Expression for Top Event (G1) of Figure 6.3
1 G1 = G2 + G6 + G3 + G4 + (G5·G3)
2 G1 = (B7·B6) + (B2·B6) + (B2·B6) + (B2·B6) + (B2·B6) + B8 + B9 + (B1·B2) + (B10·B2) +
(B3·B4) + (B2·B6) + (B5·B6·B3·B4)
3 G1 = (B7·B6) + (B2·B6) + B8 + B9 + (B1·B2) + (B10·B2) + (B3·B4)
TABLE 6.5
List of Minimal Cut Sets
Number Minimal Cut Sets Cause
1 B8 Improper equipment to closure of screw
2 B9 Error in tightening torque calculation
3 (B6,B7) Operator errors and contaminated water in reactor pool
4 (B3,B4) Vehicle collision and fire of oil
5 (B2,B6) Failure in inspection, control, in testing program and operator errors
6 (B1,B2) Containment failure and failure in inspection, control, in testing program
7 (B10,B2) Failure in inspection, control, in testing program and material aging
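The reduction from Step 2 to Step 3 of Table 6.4 can be checked with a short sketch. The product terms are copied from Step 2; the function name `minimal_cut_sets` and the set-based encoding are illustrative choices and are not the tool used by the authors. Duplicate terms are removed by the idempotent law, and any term that strictly contains another term is removed by the absorption law.

```python
# Product terms of Step 2 in Table 6.4, each written as a set of basic events
step2 = [
    {"B7", "B6"}, {"B2", "B6"}, {"B2", "B6"}, {"B2", "B6"}, {"B2", "B6"},
    {"B8"}, {"B9"}, {"B1", "B2"}, {"B10", "B2"},
    {"B3", "B4"}, {"B2", "B6"}, {"B5", "B6", "B3", "B4"},
]

def minimal_cut_sets(terms):
    """Reduce a sum of product terms to its minimal cut sets."""
    unique = {frozenset(t) for t in terms}               # idempotent law
    return {t for t in unique
            if not any(other < t for other in unique)}   # absorption law

mcs = minimal_cut_sets(step2)

# Group the minimal cut sets by order (number of basic events)
by_order = {}
for cs in mcs:
    by_order.setdefault(len(cs), []).append(sorted(cs))

for order in sorted(by_order):
    print(order, sorted(by_order[order]))
```

The term (B5·B6·B3·B4) is absorbed by (B3·B4), which leaves the seven cut sets of Table 6.5.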
Based on this, the occurrence of the top event must be prevented early, giving priority
to the most critical causes, that is, those that represent the first (lowest) order
MCSs (B8 and B9). The system is relatively safe because the first-order MCSs are few.
It is still relatively dangerous, however, because either of these single events is
enough to cause the top event.
For this tree, seven root causes were found and, according to the MCSs, two of
these causes are critical; they can happen independently of the others and cause the
top event.
Human error in inspection, control, in testing program, decontamination, in contam-
ination detection, manufacturing, in tightening torque calculation, in use of improper
The diagrams created in the fault tree methods, in general, are more easily under-
stood by non-probabilistic safety analysis (PSA) specialists, and therefore they can
greatly assist in the documentation of the event model (IAEA-TECDOC-1267, 2002).
A PSA fault tree is a powerful tool that can be used to confirm assumptions that
are commonly made in deterministic calculations about the availability of systems,
for example, to determine the potential for common cause failures or the minimum
system requirements, to identify important single failures, and to determine
the adequacy of technical specifications (IAEA-SSG-2, 2009).
The risk assessment has been seriously addressed within the IAEA staff in the
Safety Analysis Report (SAR) and an assessment of PSA (IAEA-GSR-4, 2016) is
included in the SAR.
The risk assessment for spent nuclear fuel transportation and storage is part of
the SAR of the CDTN. The constructed DPC is not yet licensed in Brazil. The SAR is
an important document for the entire licensing process.
This study will form part of a future SAR of the CDTN and a safety operation
manual for the DPC because it provides pertinent information.
6.7 CONCLUSION
The FTA of the DPC was established on the basis of the environmental contamina-
tion scenario of the DPC in this chapter.
Some main causes include the use of improper equipment for closure of screws
and errors in calculation of the tightening torque. Appropriate precautionary measures
can be taken to decrease the probability of their occurrence.
The results revealed that a large proportion of undesired events were the result of
human errors. Proposed corrective actions have been implemented to minimize the incident.
This evaluation system predicted the weak points existing in the DPC, as well as
provided theoretical support to avoid the loss of DPC integrity.
Despite all the advantages previously discussed, it is important to note that this
study is an initial work that must continue because other possible undesired events
must be studied.
This is the first work in CDTN about FTA for DPCs that will contribute to many
future studies in this system, and will involve quantitative derivation of probabilities.
This study provides an organized record of the basic events that contribute to
environmental contamination by a DPC. It also provides information pertinent to future
SARs of nuclear installations of CDTN (in Portuguese, RASIN) and to an operation
manual for DPCs.
REFERENCES
Gual MR, Rival RR, Oliveira V, Cunha CL. 2017. Prevention of human factors and Reliability
analysis in operating of sipping device on IPR-R1 TRIGA reactor, a case study. In.
Human Factors and Reliability Engineering for Safety and Security in Critical
Infrastructures, eds. Felice F. and Petrillo A., pp. 155–170, Springer Series in Reliability
Engineering, Piscataway, NJ.
Gual MR, Perdomo OM, Salomon J, Wellesley J, Lora A. 2014. ASeC software applica-
tion based on FMEA in a mechanical sample positioning system on a radial chan-
nel for irradiations in a nuclear research reactor with continuous full-power operation.
International Journal of Ecosystems and Ecology Science 4(1):81–88.
International Atomic Energy Agency—IAEA. 2009. Regulations for the Safe Transport of
Radioactive Material. Safety Requirements No. TS-R-1, Vienna, Austria.
IAEA-TECDOC-1267. 2002. Procedures for Conducting Probabilistic Safety Assessment
for Non-reactor Nuclear Facilities. International Atomic Energy Agency Vienna
International Centre, Vienna, Austria.
IAEA Specific Safety Guide No. SSG-2. 2009. Deterministic Safety Analysis for Nuclear
Power Plants Safety. International Atomic Energy Agency Vienna International
Centre, Vienna, Austria.
IAEA General Safety Requirements GSR-4. 2016. Safety Assessment for Facilities and
Activities, IAEA-GSR-4, Vienna, Austria.
Jones AL, Harvey MP. 2014. Radiological Consequences Resulting from Accidents and
Incidents Involving the Transport of Radioactive Materials in the UK: 2012 Review,
PHE-CRCE-014. Education Public Health England, August, HPA-RPD-034.
Nuclear Monitor 773. 2013. Nuclear transport accidents and incidents. https://www.
wiseinternational.org/nuclear-monitor/773/nuclear-transport-accidents-and-incidents
(accessed May 30, 2018).
Perdomo OM, Salomon LJ. 2016. Expanded failure mode and effects analysis: Advanced
approach for reliability assessments. Revista Cubana de Ingeniería VII (2):5–14.
Rivero JJ, Salomón LJ, Perdomo OM, Torres VA. 2018. Advanced combinatorial method for
solving complex fault trees. Annals of Nuclear Energy 120:666–681.
Stamatelatos M, Vesely WE, Dugan J, Fragola J. 2002. Fault Tree Handbook with Aerospace
Applications. NASA Office of Safety and Mission Assurance. NASA Headquarters.
Washington, DC. August.
Troncoso M. 2018a. Estudio LOPA para la Terminal de Petrolíferos Veracruz (In Spanish)
Internal Task: 9642-18-2017 (408).
Troncoso M. 2018b. Estudio HAZOP para la Terminal de Petrolíferos Puebla. (In Spanish)
Internal Task: 9642-18-2015 (408).
Vesely W, Goldberg F, Roberts N, Haasl D. 1981. Fault tree handbook. Technical Report
NUREG-0492, Office of Nuclear Regulatory Research U.S. Nuclear Regulatory
Commission (NRC). Washington, DC.
Whitesitt J. Eldon. 1995. Boolean Algebra and Its Applications. Courier Corporation, 182 p.,
Dover Publications, Inc., New York.
7 An Overview on
Failure Rates in
Maintenance Policies
Xufeng Zhao and Toshio Nakagawa
CONTENTS
7.1 Introduction
7.2 Inequalities of Failure Rates
7.3 Age and Random Replacement Policies
7.4 Periodic and Random Replacement Policies
7.5 Periodic Replacement Policies with Failure Numbers
7.6 Conclusions
Appendices
Acknowledgment
References
7.1 INTRODUCTION
Aging describes how an operating unit improves or deteriorates with age and is
usually measured by the failure rate function [1,2]. The failure rate is
the most important quantity in reliability theory, and its properties were investigated
in [2–4]. In an age replacement model, an operating unit is replaced preventively
at time $T$ ($0 < T \le \infty$) or correctively at failure time $X$ ($X > 0$),
whichever occurs first, in which the random variable $X$ has a general distribution
$F(t) \equiv \Pr\{X \le t\}$ for $t \ge 0$ with survival function
$\overline{F}(t) \equiv 1 - F(t)$ and finite mean
$\mu \equiv \int_0^\infty \overline{F}(t)\,dt$. The expected
cost rate for the age replacement policy was given in [2,4]:
\[
C(T) = \frac{c_T + (c_F - c_T)F(T)}{\int_0^T \overline{F}(t)\,dt}, \tag{7.1}
\]
where:
$c_T$ = preventive replacement cost at time $T$,
$c_F$ = corrective replacement cost at failure time $X$,
$c_F > c_T$.
The failure rate of $F(t)$, with density function $f(t) \equiv dF(t)/dt$, is
\[
h(t) \equiv \frac{f(t)}{\overline{F}(t)}, \tag{7.2}
\]
where $h(t)\Delta t \equiv \Pr\{t < X \le t + \Delta t \mid X > t\}$ for small $\Delta t > 0$
represents the probability that an operating unit with age $t$ will fail during the
interval $(t, t + \Delta t)$. Therefore, the optimum $T^*$ to minimize $C(T)$ is a solution of:
\[
\int_0^T \overline{F}(t)\,[h(T) - h(t)]\,dt = \frac{c_T}{c_F - c_T}. \tag{7.3}
\]
It has been shown [4] that if $h(t)$ increases strictly to $h(\infty) \equiv \lim_{t \to \infty} h(t)$ and
$h(\infty) > c_F / [\mu (c_F - c_T)]$, then a finite and unique $T^*$ exists, and the optimum cost rate
is given by the failure rate $h(t)$ as:
\[
C(T^*) = (c_F - c_T)\,h(T^*). \tag{7.4}
\]
Equations (7.3) and (7.4) indicate that the optimum time $T^*$ decreases while the expected
cost rate $C(T^*)$ increases with the failure rate $h(t)$. This result means that if we know
more about the properties of the failure rate, we can make better replacement decisions
for an operating unit in an economical way.
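A minimal numerical sketch of Equations (7.1) and (7.3) follows. It assumes a Weibull failure distribution with shape 2 and scale 1 and illustrative costs $c_T = 1$ and $c_F = 5$; none of these values come from the chapter. The left-hand side of (7.3) is evaluated as $h(T)\int_0^T \overline{F}(t)\,dt - F(T)$, which is the same integral written out, and $T^*$ is found by bisection because that expression increases with $T$ when $h(t)$ increases.

```python
import math

def survival(t, shape=2.0, scale=1.0):
    """Weibull survival function (assumed example distribution)."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape=2.0, scale=1.0):
    """Weibull failure rate h(t)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def integral(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3.0

def cost_rate(T, c_T=1.0, c_F=5.0):
    """Expected cost rate C(T) of Equation (7.1)."""
    F_T = 1.0 - survival(T)
    return (c_T + (c_F - c_T) * F_T) / integral(survival, 0.0, T)

def lhs_7_3(T):
    """Left-hand side of (7.3): h(T) * int_0^T Fbar(t) dt - F(T)."""
    return hazard(T) * integral(survival, 0.0, T) - (1.0 - survival(T))

def solve_T_star(c_T=1.0, c_F=5.0, lo=1e-6, hi=10.0):
    """Bisection on (7.3); the bracket [lo, hi] is an assumption."""
    target = c_T / (c_F - c_T)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lhs_7_3(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

T_star = solve_T_star()
print(f"T* = {T_star:.4f}, C(T*) = {cost_rate(T_star):.4f}")
```

Substituting (7.3) into (7.1) shows that the cost rate at the solution equals $(c_F - c_T)h(T^*)$, which gives a simple consistency check on the numerical result.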
We recently proposed several new replacement models, such as random replacement,
replacement first, replacement last, replacement overtime, and replacement
middle [5–8]. In these models, extended types of failure rates appear and
play important roles in obtaining optimum replacement times analytically.
It is therefore of interest to survey the reliability properties of these failure rates
and their further applications to recent maintenance models.
The standard failure rate $h(t)$ has been defined in Equation (7.2). We formulate
several extended failure rates in inequality forms by integrating $h(t)$ with a
replacement policy at time $T$, and give examples in which these failure rates appear
in replacement models. In Section 7.3, where the replacement time $T$ and work number
$N$ become the decision variables, we introduce the failure rates that are found
in age and random replacement models. In Section 7.4, the failure rates in periodic
replacement models with minimal repairs are given and shown in periodic and random
replacement models. In Section 7.5, the failure rates and their inequalities are
shown for the model where replacement is done at failure number $K$.
The recent models of replacement first, replacement last, and replacement overtime
are surveyed for these failure rates in the following sections. In addition, the
appendix gives the proofs for these extended failure rates.
\[
\overline{F}(t) = \exp\left[-\int_0^t h(u)\,du\right] = e^{-H(t)}, \quad \text{i.e.,} \quad H(t) = -\log \overline{F}(t). \tag{7.5}
\]
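Inequality I:
\[
\begin{aligned}
\frac{\int_0^T F(t)\,dt}{\int_0^T \left[\int_0^t \overline{F}(u)\,du\right] dt}
&\le \frac{F(T)}{\int_0^T \overline{F}(t)\,dt}
\le \frac{H(T)}{T}
\le h(T) \\
&\le \frac{\overline{F}(T)}{\int_T^\infty \overline{F}(t)\,dt}
\le \frac{\int_T^\infty \overline{F}(t)\,dt}{\int_T^\infty \left[\int_t^\infty \overline{F}(u)\,du\right] dt}.
\end{aligned} \tag{7.6}
\]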
Inequality II:
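\[
\frac{F(T)}{\int_0^T \overline{F}(t)\,dt}
\le \frac{\int_T^\infty F(t)\,dt}{\int_T^\infty \left[\int_0^t \overline{F}(u)\,du\right] dt}
\le \frac{1}{\mu}
\le \frac{\int_0^T \overline{F}(t)\,dt}{\int_0^T \left[\int_t^\infty \overline{F}(u)\,du\right] dt}
\le \frac{\overline{F}(T)}{\int_T^\infty \overline{F}(t)\,dt}. \tag{7.7}
\]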
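Inequality III:
\[
\begin{aligned}
\frac{\int_0^T \int_0^{t_n} \cdots \left[\int_0^{t_1} f(u)\,du\right] \cdots dt_{n-1}\,dt_n}
     {\int_0^T \int_0^{t_n} \cdots \left[\int_0^{t_1} \overline{F}(u)\,du\right] \cdots dt_{n-1}\,dt_n}
&\le \cdots \le \frac{\int_0^T f(t)\,dt}{\int_0^T \overline{F}(t)\,dt} \le h(T) \\
&\le \frac{\int_T^\infty f(t)\,dt}{\int_T^\infty \overline{F}(t)\,dt}
\le \cdots \le
\frac{\int_T^\infty \int_{t_n}^\infty \cdots \left[\int_{t_1}^\infty f(u)\,du\right] \cdots dt_{n-1}\,dt_n}
     {\int_T^\infty \int_{t_n}^\infty \cdots \left[\int_{t_1}^\infty \overline{F}(u)\,du\right] \cdots dt_{n-1}\,dt_n}
\quad (n = 1, 2, \ldots).
\end{aligned} \tag{7.8}
\]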
All of the above functions increase with $T$ and become equal to $h(T) = \lambda$ for $T \ge 0$ when
$F(t) = 1 - e^{-\lambda t}$.
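The ordering in (7.6) can be checked numerically. The sketch below is an illustration only: it assumes a Weibull distribution with survival function $e^{-t^2}$ (increasing failure rate $2t$) and an exponential distribution with $\lambda = 0.5$, replaces the infinite upper limit by an assumed cutoff `T + tail`, and reduces each double integral to a single integral by exchanging the order of integration.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3.0

def inequality_I_terms(sf, hazard, T, tail=60.0):
    """Six terms of (7.6), left to right; sf is the survival function."""
    up = T + tail  # assumed cutoff standing in for the infinite limit
    int_sf_0T = simpson(sf, 0.0, T)
    int_sf_Tinf = simpson(sf, T, up)
    # int_0^T [int_0^t sf(u) du] dt = int_0^T (T - u) sf(u) du
    dbl_0T = simpson(lambda u: (T - u) * sf(u), 0.0, T)
    # int_T^inf [int_t^inf sf(u) du] dt = int_T^inf (u - T) sf(u) du
    dbl_Tinf = simpson(lambda u: (u - T) * sf(u), T, up)
    return [
        (T - int_sf_0T) / dbl_0T,       # int_0^T F(t) dt over double integral
        (1.0 - sf(T)) / int_sf_0T,      # F(T) / int_0^T sf
        -math.log(sf(T)) / T,           # H(T) / T
        hazard(T),                      # h(T)
        sf(T) / int_sf_Tinf,            # sf(T) / int_T^inf sf
        int_sf_Tinf / dbl_Tinf,         # int_T^inf sf over double integral
    ]

weibull = inequality_I_terms(lambda t: math.exp(-t * t), lambda t: 2.0 * t, 1.0)
expo = inequality_I_terms(lambda t: math.exp(-0.5 * t), lambda t: 0.5, 1.0)

print([round(x, 3) for x in weibull])  # nondecreasing from left to right
print([round(x, 3) for x in expo])     # all terms equal lambda
```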
We next give some other applications of the failure rates in Equations (7.2), (7.6),
and (7.7) to replacement policies planned at time $T$ when $h(t)$ increases with $t$ from
$h(0)$ to $h(\infty)$.
Example 7.1
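[4, p. 8] Suppose the unit only produces profit per unit of time when it is operating
without failure, and it is replaced preventively at time $T$ ($0 < T < \infty$). Then the
average time for operating profit during $[0, T]$ is:
\[
l_0(T) = T\,\overline{F}(T).
\]
Differentiating $l_0(T)$ with respect to $T$ and setting the result equal to zero, the
optimum time satisfies:
\[
h(T) = \frac{1}{T}. \tag{7.9}
\]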
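As a worked instance of (7.9), assume a Weibull unit with survival function $e^{-(\lambda t)^2}$, so that $h(t) = 2\lambda^2 t$; this distribution is an illustrative assumption and is not taken from [4]. Condition (7.9) then becomes:

\[
2\lambda^2 T = \frac{1}{T}
\quad\Longrightarrow\quad
T = \frac{1}{\sqrt{2}\,\lambda} .
\]

Under this assumption the replacement time is inversely proportional to $\lambda$, so a unit that ages faster is replaced earlier.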
Example 7.2
[4, p. 8] Suppose there is one spare unit available for replacement, the operating
unit is replaced preventively with the spare one at time T (0 < T ≤ ∞), and the spare
unit should operate until failure. When both units have an identical failure distri-
bution F(t ) with mean µ , the mean time to failure of either unit is:
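\[
l_1(T) = \int_0^T t\,dF(t) + \overline{F}(T)(T + \mu) = \int_0^T \overline{F}(t)\,dt + \mu \overline{F}(T).
\]
Differentiating l_1(T) with respect to T and setting it equal to zero gives:
\[
h(T) = \frac{1}{\mu}. \tag{7.10}
\]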
Next, suppose there are unlimited spare units available for replacement and each
unit has an identical failure distribution F(t ). When preventive replacement is
planned at time T (0 < T ≤ ∞), the mean time to failure of any unit is:
\[
l(T) = \int_0^T t\,dF(t) + \overline{F}(T)\,[T + l(T)], \quad \text{i.e.,} \quad
\frac{1}{l(T)} = \frac{F(T)}{\int_0^T \overline{F}(t)\,dt}, \tag{7.11}
\]
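As a hedged illustration of Example 7.2, the sketch below evaluates l_1(T) for a Weibull distribution with shape 2 and unit scale (an assumption made only for this example) and checks that the time solving h(T) = 1/µ gives a larger mean time to failure than nearby times. The function names are hypothetical.

```python
import math

def integrate(g, a, b, n=20000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * step)
    return total * step

def surv(t):
    """Survival function of a Weibull with shape 2, scale 1 (assumed)."""
    return math.exp(-t * t)

# Mean of this Weibull distribution: Gamma(1 + 1/2).
mu = math.gamma(1.5)

def l1(T):
    """Mean time to failure with one spare and preventive replacement at T."""
    return integrate(surv, 0.0, T) + mu * surv(T)

# For this distribution h(T) = 2T, so h(T) = 1/mu gives T = 1/(2*mu).
T_star = 1.0 / (2.0 * mu)

print("T* =", T_star, " l1(T*) =", l1(T_star), " mu =", mu)
```

Because l_1(T) tends to µ both as T goes to 0 and as T goes to infinity, the value at T* should exceed µ whenever the failure rate is increasing.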
Example 7.3
[4, p. 8] The failure distribution of an operating unit with age T (0 ≤ T < ∞) is:
\[
F(t; T) \equiv \Pr\{T < X \le T + t \mid X > T\} = \frac{F(t + T) - F(T)}{\overline{F}(T)}, \tag{7.12}
\]
∞ ∞
1 1
F(T ) ∫ [F(T ) − F(t + T )]dt = F(T ) ∫ F(t)dt,
0 T
(7.13)
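\[
\frac{\int_0^T (\theta t)^N e^{-\theta t}\,dF(t)}{\int_0^T (\theta t)^N e^{-\theta t}\,\overline{F}(t)\,dt}
\le \frac{\int_0^T t^N\,dF(t)}{\int_0^T t^N\,\overline{F}(t)\,dt}
\le h(T)
\le \frac{\int_T^\infty (\theta t)^N e^{-\theta t}\,dF(t)}{\int_T^\infty (\theta t)^N e^{-\theta t}\,\overline{F}(t)\,dt}
\le \frac{\int_T^\infty t^N\,dF(t)}{\int_T^\infty t^N\,\overline{F}(t)\,dt}. \tag{7.14}
\]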
Inequality V:
T T
∫ e −θ t dF (t )
∫ t dF (t ) ≤ h(T ) ≤ F (T ) N
F (T )
0
T ≤ ≤ T
0
T ∞
∫e0
−θ t
F (t )dt
∫ F (t )dt ∫ t F (t )dt
0 ∫ F (t )dt 0
N
∞ ∞ (7.15)
≤
∫ T
∞
t N dF (t )
≤
∫ T
∞
e−θ t dF (t )
.
∫t ∫e
N −θ t
F (t )dt F (t )dt
T T
Inequality VI:
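\[
\frac{\int_0^T (\theta t)^N \left[\int_0^t e^{-\theta u}\,dF(u)\right] dt}
     {\int_0^T (\theta t)^N \left[\int_0^t e^{-\theta u}\,\overline{F}(u)\,du\right] dt}
\le \frac{\int_0^T e^{-\theta t}\,dF(t)}{\int_0^T e^{-\theta t}\,\overline{F}(t)\,dt}
\le \frac{\int_T^\infty (\theta t)^N \left[\int_0^t e^{-\theta u}\,dF(u)\right] dt}
         {\int_T^\infty (\theta t)^N \left[\int_0^t e^{-\theta u}\,\overline{F}(u)\,du\right] dt}
\]
\[
\le \frac{\int_0^T (\theta t)^N \left[\int_t^\infty e^{-\theta u}\,dF(u)\right] dt}
         {\int_0^T (\theta t)^N \left[\int_t^\infty e^{-\theta u}\,\overline{F}(u)\,du\right] dt}
\le \frac{\int_T^\infty e^{-\theta t}\,dF(t)}{\int_T^\infty e^{-\theta t}\,\overline{F}(t)\,dt} \tag{7.16}
\]
\[
\le \frac{\int_T^\infty (\theta t)^N \left[\int_t^\infty e^{-\theta u}\,dF(u)\right] dt}
         {\int_T^\infty (\theta t)^N \left[\int_t^\infty e^{-\theta u}\,\overline{F}(u)\,du\right] dt}.
\]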
t −θ u T
∫ ∫
T T
0 ∫ ≤ T t
0 0
e dF (u) dt
e −θ t F (t )dt
≤
∫e
0
T
−θ t
dF (t )
−θ t −θ u
T t
∫ 0
e F (u)du dt
0 ∫ e F (u)du dt
0 0 ∫ ∫ ∫e
0
−θ t
F ( t ) dt
∫e
−θ t
dF (t )
F (T ) F (T )
≤ T ≤ h(T ) ≤ T
∞ ≤ ∞
∫ F (t )dt
0 ∫e T
−θ t
F (t )dt
∫ T
F (t )dt
∞ ∞ ∞
≤
∫ ∫ T
∞
[
t
∞
e −θ u dF (u)]dt
≤
∫e ∞
T
−θ t
∞
F ( t ) dt
, (7.17)
∫ [∫ e T t
−θ u
F (u)du]dt
∫ e [∫ T
−θ t
t
F (u)du]dt
Example 7.4
[5, p. 30] Suppose the random time Y has an exponential distribution
Pr{Y ≤ t} = 1 − e^{−θt}, and the unit is replaced preventively at time T (0 < T ≤ ∞) or at time
Y, whichever occurs first. Then the expected cost rate is:
\[
C(T) = \frac{c_T + (c_F - c_T)\int_0^T e^{-\theta t}\,dF(t)}{\int_0^T e^{-\theta t}\,\overline{F}(t)\,dt}, \tag{7.18}
\]
where:
cT = replacement cost at time T or at time Y,
cF = replacement cost at failure with cF > cT .
\[
\int_0^T e^{-\theta t}\,\overline{F}(t)\,[h(T) - h(t)]\,dt = \frac{c_T}{c_F - c_T}. \tag{7.19}
\]
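To make Example 7.4 concrete, the sketch below solves Equation (7.19) by bisection and compares the result with the cost rate of Equation (7.18). It is only an illustration: the Weibull failure distribution with shape 2, the values of θ, c_T and c_F, and all function names are assumptions, not taken from the source.

```python
import math

THETA, C_T, C_F = 1.0, 1.0, 5.0  # illustrative values only

def integrate(g, a, b, n=2000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * step)
    return total * step

def surv(t):
    """Survival function of a Weibull with shape 2, scale 1 (assumed)."""
    return math.exp(-t * t)

def h(t):
    """Failure rate of the same Weibull distribution."""
    return 2.0 * t

def cost_rate(T):
    """Expected cost rate C(T) of Equation (7.18), using dF = h * surv dt."""
    failures = integrate(lambda t: math.exp(-THETA * t) * h(t) * surv(t), 0.0, T)
    uptime = integrate(lambda t: math.exp(-THETA * t) * surv(t), 0.0, T)
    return (C_T + (C_F - C_T) * failures) / uptime

def condition(T):
    """Left side minus right side of Equation (7.19)."""
    lhs = integrate(lambda t: math.exp(-THETA * t) * surv(t) * (h(T) - h(t)), 0.0, T)
    return lhs - C_T / (C_F - C_T)

def bisect(g, lo, hi, tol=1e-8):
    """Bisection for an increasing function g with g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

T_star = bisect(condition, 0.01, 5.0)
print("T* =", T_star, " C(T*) =", cost_rate(T_star))
```

At the root, the first-order condition implies C(T*) = (c_F − c_T) h(T*), which gives a quick consistency check on the numerical solution.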
Example 7.5
[5, p. 44] Suppose the unit is replaced preventively at time T (0 < T ≤ ∞) or at
working number N (N = 1, 2, ...), i.e., at Y_1 + Y_2 + ... + Y_N, whichever occurs first.
Denoting G^(j)(t) ≡ Pr{Y_1 + Y_2 + ... + Y_j ≤ t} (j = 1, 2, ...) and G^(0)(t) ≡ 1 for t > 0,
the expected cost rate is:
\[
C(T, N) = \frac{c_T + (c_F - c_T)\int_0^T \left[1 - G^{(N)}(t)\right] dF(t)}{\int_0^T \left[1 - G^{(N)}(t)\right] \overline{F}(t)\,dt}. \tag{7.20}
\]
When G(t) = 1 − e^{−θt}, \(G^{(N)}(t) = \sum_{j=N}^{\infty} [(\theta t)^j / j!]\,e^{-\theta t}\) (N = 0, 1, 2, ...). Optimum time T to
minimize C(T, N) satisfies:
\[
\sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,\overline{F}(t)\,[h(T) - h(t)]\,dt = \frac{c_T}{c_F - c_T}, \tag{7.21}
\]
\[
\frac{\int_0^T (\theta t)^N e^{-\theta t}\,dF(t)}{\int_0^T (\theta t)^N e^{-\theta t}\,\overline{F}(t)\,dt}
\sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,\overline{F}(t)\,dt
- \sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,dF(t)
\ge \frac{c_T}{c_F - c_T}. \tag{7.22}
\]
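The cost rate of Equation (7.20) can be evaluated directly once 1 − G^(N)(t) is written as a finite Poisson sum. The sketch below does this under illustrative assumptions (Weibull failures with shape 2, exponential work times with rate θ, arbitrary costs) and checks the two limiting cases N = 1 and large N. All names and parameter values are hypothetical.

```python
import math

THETA, C_T, C_F = 1.0, 1.0, 5.0  # illustrative values only

def integrate(g, a, b, n=2000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * step)
    return total * step

def surv(t):
    """Survival function of a Weibull with shape 2, scale 1 (assumed)."""
    return math.exp(-t * t)

def h(t):
    """Failure rate of the same Weibull distribution."""
    return 2.0 * t

def work_not_done(t, N):
    """1 - G^(N)(t): fewer than N exponential works are finished by time t."""
    return sum((THETA * t) ** j / math.factorial(j) for j in range(N)) * math.exp(-THETA * t)

def cost_rate(T, N):
    """Expected cost rate C(T, N) of Equation (7.20)."""
    failures = integrate(lambda t: work_not_done(t, N) * h(t) * surv(t), 0.0, T)
    uptime = integrate(lambda t: work_not_done(t, N) * surv(t), 0.0, T)
    return (C_T + (C_F - C_T) * failures) / uptime

def random_cost_rate(T):
    """Equation (7.18): the case N = 1, replacement at T or at the first work."""
    failures = integrate(lambda t: math.exp(-THETA * t) * h(t) * surv(t), 0.0, T)
    uptime = integrate(lambda t: math.exp(-THETA * t) * surv(t), 0.0, T)
    return (C_T + (C_F - C_T) * failures) / uptime

def age_cost_rate(T):
    """Classical age replacement, approached as N grows without bound."""
    return (C_T + (C_F - C_T) * (1.0 - surv(T))) / integrate(surv, 0.0, T)

for N in (1, 2, 5, 60):
    print(N, cost_rate(1.0, N))
```

With N = 1 the result coincides with the random replacement model of Example 7.4, and for large N it approaches the age replacement cost rate, since the work-number limit is then almost never reached before T.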
Example 7.6
[5, p. 46] Suppose the unit is replaced preventively at time T (0 < T ≤ ∞) or at
working number N (N = 1, 2, ...), whichever occurs last. Then the expected cost
rate is:
\[
C(T, N) = \frac{c_T + (c_F - c_T)\left\{F(T) + \int_T^\infty \left[1 - G^{(N)}(t)\right] dF(t)\right\}}
               {\int_0^T \overline{F}(t)\,dt + \int_T^\infty \left[1 - G^{(N)}(t)\right] \overline{F}(t)\,dt}. \tag{7.23}
\]
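\[
\int_0^T \overline{F}(t)\,[h(T) - h(t)]\,dt
- \sum_{j=0}^{N-1} \int_T^\infty \frac{(\theta t)^j}{j!} e^{-\theta t}\,\overline{F}(t)\,[h(t) - h(T)]\,dt
= \frac{c_T}{c_F - c_T}, \tag{7.24}
\]
\[
\frac{\int_T^\infty (\theta t)^N e^{-\theta t}\,dF(t)}{\int_T^\infty (\theta t)^N e^{-\theta t}\,\overline{F}(t)\,dt}
\left[\int_0^T \overline{F}(t)\,dt + \sum_{j=0}^{N-1} \int_T^\infty \frac{(\theta t)^j}{j!} e^{-\theta t}\,\overline{F}(t)\,dt\right]
- F(T) - \sum_{j=0}^{N-1} \int_T^\infty \frac{(\theta t)^j}{j!} e^{-\theta t}\,dF(t)
\ge \frac{c_T}{c_F - c_T}. \tag{7.25}
\]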
Example 7.7
[5, p. 34] Suppose the unit is replaced preventively at the next working time over time
T, e.g., at time Y_{j+1} for Y_j < T ≤ Y_{j+1}. When G(t) = 1 − e^{−θt}, the expected cost rate is:
\[
C(T) = \frac{c_F - (c_F - c_T)\int_T^\infty \theta e^{-\theta(t - T)}\,\overline{F}(t)\,dt}
            {\int_0^T \overline{F}(t)\,dt + \int_T^\infty e^{-\theta(t - T)}\,\overline{F}(t)\,dt}, \tag{7.26}
\]
where:
c_T = replacement cost over time T,
c_T < c_F.
\[
\frac{\int_T^\infty e^{-\theta t}\,dF(t)}{\int_T^\infty e^{-\theta t}\,\overline{F}(t)\,dt}
\int_0^T \overline{F}(t)\,dt - F(T) = \frac{c_T}{c_F - c_T}. \tag{7.27}
\]
Example 7.8
[5, p. 9] Suppose the unit is replaced preventively at the end of the next working
number over time T or at working number N (N = 1, 2, ...), whichever occurs first.
Then the expected cost rate is:
C (T , N) =
N −1 N −1
∑
∞
∑[(θ T )
∞
cF − (cF − cT ) − [(θ T ) j / j!]e −θ T
∫ θ e −θ t F(t )dt − / j!]e −θ T
∫ θe
j −θ t
F(t )dt
j =0
T T
j =0
N −1 .
∑∫
T ∞
j =0
{ [(θ t ) j / j!]e −θ t F(t )dt + [(θ T ) j / j!]e −θ T
0 ∫
T
e −θ t F(t )dt }
(7.28)
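\[
\frac{\int_T^\infty e^{-\theta t}\,dF(t)}{\int_T^\infty e^{-\theta t}\,\overline{F}(t)\,dt}
\sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,\overline{F}(t)\,dt
- \sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,dF(t)
= \frac{c_T}{c_F - c_T}, \tag{7.29}
\]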
T
∞
∫ 0 t ∫
(θ t )N −1 e −θ udF(u) dt
N −1
T ∞
∫ 0
−θ u
(θ t ) e F(u)du dt
t ∫
N −1
(7.30)
(θ T ) j (θ t ) j −θ t
∑
∞ T
∫e ∫
−θ t
× F(t )dt + e F(t )dt
j =0 j! T 0 j!
N −1
(θ T ) j (θ t ) j −θ t
∑
∞ T
cT
−
j =0
j! ∫ T
e −θ t dF(t ) +
∫
0 j!
e dF(t ) ≥
cF − cT
.
Example 7.9
[6, p. 13] Suppose the unit is replaced preventively at the end of the next working
number over time T or at working number N (N = 1, 2, ...), whichever occurs last.
Then the expected cost rate is:
C (T , N) =
∞ ∞
∑[(θ T )
∞
∫ F(t)dt + ∑∫ [(θ t)
∞
∑[(θ T )
T ∞
∫e
−θ t −θ T −θ t
j
/ j!]e F(t )dt + j
/ j!]e F(t )dt
0 T T
j =0 j =N
(7.31)
∫e −θ t
dF(t ) N −1
(θ t ) j −θ t
∑∫
∞
T
T
∞ ∫ F(t )dt +
j!
e F(t )dt
∫e F(t )dt
−θ t 0 T
j =0
T (7.32)
N −1
(θ t ) −θ t
∑∫
∞ j
cT
− F(T ) − e dF(t ) = ,
T j! cF − cT
j =0
∫ (θ t) [∫ e N −θ u
dF(u)]dt N −1
(θ t ) j −θ t
∑∫
T ∞
T
∞
t
∞
F(u)du]dt
∫ F(t )dt +
j!
e F(t )dt
∫ (θ t) [∫ e N −θ u 0 T
j =0
T t
(θ T )
∞
+∑
j ∞
(7.33)
e −θ T
∫e F(t )dt
−θ t
j! T
j =N
∞
(θ T ) j −θ T (θ t ) j −θ t
∑
∞ ∞
cT
−
j =N
j!
e
∫ T
e −θ t dF(t ) −
∫ T j!
e dF(t ) ≥
cF − cT
.
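\[
\frac{\int_0^T e^{-\theta t} h(t)\,dt}{\int_0^T e^{-\theta t}\,dt}
\le \frac{\int_0^T (\theta t)^N e^{-\theta t} h(t)\,dt}{\int_0^T (\theta t)^N e^{-\theta t}\,dt}
\le \frac{\int_0^T t^N h(t)\,dt}{\int_0^T t^N\,dt}
\le h(T)
\le \int_0^\infty \theta e^{-\theta t} h(t + T)\,dt
\le \frac{\int_T^\infty (\theta t)^N e^{-\theta t} h(t)\,dt}{\int_T^\infty (\theta t)^N e^{-\theta t}\,dt}. \tag{7.34}
\]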
Inequality IX:
T
t ∞
t
∫ ∫ ∫ ∫ e h(u)du dt
T
(θ t ) N e −θ u h(u)du dt
∫
−θ u
e −θ t h(t )dt (θ t ) N
0 0 ≤ 0 ≤
T 0
T
T
t
∞
t
∫
0 0 ∫
(θ t ) N e −θ u du dt
0
−
e dtθ t
T
(θ t ) N ∫ ∫ ∫ e du dt
0
−θ u
T
∞
∫ ∫
(θ t ) N e −θ u h(u)du dt ∞
t ≤ e −θ t h(t + T )dt
∫
0
≤ (7.35)
T
∞
∫ ∫
0
(θ t ) N e−θ u du dt
0 t
∞
∞
≤
∫
T t ∫
(θ t ) N e −θ u h(u)du dt
.
N
∞ ∞
∫ ∫
−θ u
(θ t ) e du dt
T t
Note that all these functions increase with T and N .
Example 7.10
[5, p. 65] Suppose the unit is replaced at time T (0 < T ≤ ∞) or at time Y , whichever
occurs first. Then the expected cost rate is:
\[
C(T) = \frac{c_T + c_M \int_0^T e^{-\theta t} h(t)\,dt}{\int_0^T e^{-\theta t}\,dt}, \tag{7.36}
\]
where:
c_T = replacement cost at time T or at time Y,
c_M = cost of minimal repair at each failure.
\[
\int_0^T e^{-\theta t}\,[h(T) - h(t)]\,dt = \frac{c_T}{c_M}. \tag{7.37}
\]
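For a linear failure rate the integrals in Equations (7.36) and (7.37) have closed forms, which makes Example 7.10 easy to check by hand or by machine. The following sketch assumes h(t) = 2t and arbitrary values of θ, c_T and c_M; these choices and the function names are illustrative only.

```python
import math

THETA, C_T, C_M = 1.0, 1.0, 1.0  # illustrative values only

def weighted_time(T):
    """Integral of exp(-THETA * t) over [0, T]."""
    return (1.0 - math.exp(-THETA * T)) / THETA

def weighted_failures(T):
    """Integral of exp(-THETA * t) * h(t) over [0, T] for h(t) = 2t."""
    return 2.0 * (1.0 - math.exp(-THETA * T) * (1.0 + THETA * T)) / THETA ** 2

def cost_rate(T):
    """Expected cost rate C(T) of Equation (7.36)."""
    return (C_T + C_M * weighted_failures(T)) / weighted_time(T)

def condition(T):
    """Left side minus right side of Equation (7.37) with h(T) = 2T."""
    return 2.0 * T * weighted_time(T) - weighted_failures(T) - C_T / C_M

def bisect(g, lo, hi, tol=1e-10):
    """Bisection for an increasing function g with g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

T_star = bisect(condition, 1e-6, 20.0)
print("T* =", T_star, " C(T*) =", cost_rate(T_star), " c_M h(T*) =", C_M * 2.0 * T_star)
```

The last two printed values should agree, because Equation (7.37) is equivalent to C(T*) = c_M h(T*) at the optimum.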
Example 7.11
[5, p. 77] Suppose the unit is replaced at time T (0 < T ≤ ∞) or at working number
N (N = 1, 2, ...), whichever occurs first. Then, the expected cost rate is:
\[
C(T, N) = \frac{c_T + c_M \int_0^T \left[1 - G^{(N)}(t)\right] h(t)\,dt}{\int_0^T \left[1 - G^{(N)}(t)\right] dt}. \tag{7.38}
\]
\[
\sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,[h(T) - h(t)]\,dt = \frac{c_T}{c_M}, \tag{7.39}
\]
\[
\frac{\int_0^T (\theta t)^N e^{-\theta t} h(t)\,dt}{\int_0^T (\theta t)^N e^{-\theta t}\,dt}
\sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t}\,dt
- \sum_{j=0}^{N-1} \int_0^T \frac{(\theta t)^j}{j!} e^{-\theta t} h(t)\,dt
\ge \frac{c_T}{c_M}. \tag{7.40}
\]
Example 7.12
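\[
C(T, N) = \frac{c_T + c_M \left\{H(T) + \int_T^\infty \left[1 - G^{(N)}(t)\right] h(t)\,dt\right\}}
               {T + \int_T^\infty \left[1 - G^{(N)}(t)\right] dt}. \tag{7.41}
\]
\[
\int_0^T [h(T) - h(t)]\,dt
- \sum_{j=0}^{N-1} \int_T^\infty \frac{(\theta t)^j}{j!} e^{-\theta t}\,[h(t) - h(T)]\,dt
= \frac{c_T}{c_M}, \tag{7.42}
\]
\[
\frac{\int_T^\infty (\theta t)^N e^{-\theta t} h(t)\,dt}{\int_T^\infty (\theta t)^N e^{-\theta t}\,dt}
\left[T + \sum_{j=0}^{N-1} \int_T^\infty \frac{(\theta t)^j}{j!} e^{-\theta t}\,dt\right]
- H(T) - \sum_{j=0}^{N-1} \int_T^\infty \frac{(\theta t)^j}{j!} e^{-\theta t} h(t)\,dt
\ge \frac{c_T}{c_M}. \tag{7.43}
\]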
Example 7.13
[6, p. 39] Suppose the unit is replaced at the end of the next working number over
time T. When G(t) = 1 − e^{−θt}, the expected cost rate is:
\[
C(T) = \frac{c_T + c_M \left[H(T) + \int_0^\infty e^{-\theta t} h(t + T)\,dt\right]}{T + 1/\theta}. \tag{7.44}
\]
T
cT
∫ θe
−θ t
T [h(t + T ) − h(t )]dt = . (7.45)
0 cM
Example 7.14
[6, p. 41] Suppose the unit is replaced at the next working number over time
T (0 < T ≤ ∞) or at working number N (N = 1, 2, ...), whichever occurs first. Then,
the expected cost rate is:
N −1
∑∫ ∫
T T ∞
cT + cM
∫ 0
[1− G (N) (t )]h(t )dt +
j =0
0 T
u]dG ( j ) (t )
G(u − t )h(u)du
[
C (T , N) = T
. (7.46)
∫ 0
[1− G (N) (t )]dt + (1/ θ )[1− G (N) (T )]
N −1
(θ t ) j −θ t
∑∫
T ∞
c
∫ θe −θ u
e [h(u + T ) − h(t )]du dt = T , (7.47)
j =0
0 j! 0 cM
T
∞
∫ (θ t) ∫ e h(u)du dt
N −1 −θ u
N −1
θ (θ t ) j −θ t (θ T ) j −θ T
∑ ∫
T
0 t
T
e dt + e
j! j!
∫ (θ t) e dt
N −1 −θ t 0
j =0
0
(7.48)
N −1
(θ t ) j −θ t (θ T ) j c
∑∫
T ∞
−
j =0
0 j!
e h(t )dt +
j! ∫ T
e −θ t h(t )dt ≥ T .
cM
Example 7.15
[6, p. 44] Suppose the unit is replaced at the next working number over time
T (0 ≤ T ≤ ∞) or at working number N (N = 0, 1, 2, ...), whichever occurs last. Then,
the expected cost rate is:
∞ ∞
∑∫ ∫ G(u − t)h(u)du dG
T ∞
∫
( j)
cT + cM H(T ) + [1− G (N) (t )]h(t )dt + (t )
T
j =N
0 T
C (T , N) = ∞
. (7.49)
T+
∫ [1− G
T
(N )
(t )]dt + (1/ θ )G (N) (T )
T
∞
∫ ∫ θ e
−θ u
[h(u + T ) − h(t )]du dt
0 0
N −1
(7.50)
(θ t ) j −θ t
∑∫
∞ ∞
c
+
j =0
T j!
e
∫ 0
θ e −θ u[h(u + T ) − h(t )]du dt = T ,
cM
∫ ∫e −θ u
(θ t )N −1[ h(u)du]dt
(θ T ) j −θ T
N −1
T
∞
t 1+ θ T +
∑ j!
e − H(T )
∫ T
(θ t )N −1e −θ t dt j =0
(7.51)
N −1
∞ (θ t ) −θ t
j
(θ T ) c
∑∫
∞ j ∞
∫e ∫
−θ t
− h(t + T )dt − e h(t )dt − e −θ t h(t + T )dt ≥ T .
0
j =0 T j! j! 0 cM
\[
p_k(t) \equiv \frac{H(t)^k}{k!}\,e^{-H(t)} \quad (k = 0, 1, 2, \ldots),
\]
and the probability that k or more failures occur in [0, t] is \(P_k(t) = \sum_{j=k}^{\infty} p_j(t)\). Note
that \(\overline{P}_k(t) \equiv 1 - P_k(t) = \sum_{j=0}^{k-1} p_j(t)\). Suppose the unit undergoes minimal repair at
failures and is replaced at time T or at failure number K. We give the following
inequalities of the extended failure rates for 0 < T < ∞ and K = 0, 1, 2, ...:
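The counting probabilities used in this section are Poisson probabilities with mean H(t). The sketch below, offered only as an illustration, computes them and verifies two standard facts for a constant failure rate λ (an assumed special case): the integral of p_K(t) over [0, ∞) equals 1/λ, and the integral of the probability of fewer than K failures equals K/λ, the mean time to the Kth failure. Function names are hypothetical.

```python
import math

def integrate(g, a, b, n=40000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * step)
    return total * step

def p(k, cum_hazard):
    """Probability of exactly k failures when the cumulative hazard is cum_hazard."""
    return cum_hazard ** k / math.factorial(k) * math.exp(-cum_hazard)

def fewer_than(k, cum_hazard):
    """Probability of fewer than k failures (the complement of P_k)."""
    return sum(p(j, cum_hazard) for j in range(k))

def at_least(k, cum_hazard):
    """P_k: probability of k or more failures, computed as a complement."""
    return 1.0 - fewer_than(k, cum_hazard)

LAM, K = 0.5, 3  # illustrative constant failure rate and failure number

def H(t):
    """Cumulative hazard for a constant failure rate LAM."""
    return LAM * t

# The upper limit 200 truncates the infinite range; exp(-100) is negligible.
mean_gap = integrate(lambda t: p(K, H(t)), 0.0, 200.0)
mean_time_to_kth = integrate(lambda t: fewer_than(K, H(t)), 0.0, 200.0)

print("integral of p_K:", mean_gap, " expected 1/lambda =", 1.0 / LAM)
print("mean time to Kth failure:", mean_time_to_kth, " expected K/lambda =", K / LAM)
```

For a non-constant failure rate the same functions apply with a different H(t); only the closed-form comparison values change.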
Inequality X:
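\[
\frac{F(T)}{\int_0^T \overline{F}(t)\,dt}
\le \frac{\int_0^T p_K(t) h(t)\,dt}{\int_0^T p_K(t)\,dt}
\le h(T)
\le \frac{\overline{F}(T)}{\int_T^\infty \overline{F}(t)\,dt}
\le \frac{\int_T^\infty p_K(t) h(t)\,dt}{\int_T^\infty p_K(t)\,dt}. \tag{7.52}
\]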
Inequality XI:
T T
∫ 0
pK (t )h(t )dt
T
≤ T
∫ p (t)h(t)dt 0
≤
K
F (T )
∞ ∞
∫ 0
pK (t )dt
∫ p (t)[∫ F(u)du / F(t)]dt ∫ F(t)dt
0
K
t T
∞ ∞
≤
∫ T
pK (t )h(t )dt
∞
≤ ∞
∫ T
pK (t )h(t )dt
∞
. (7.53)
∫ T
pK (t )dt
∫ T
pK (t )[
∫
t
F(u)du / F(t )]dt
Inequality XII:
\[
\frac{\int_0^T p_K(t) h(t)\,dt}{\int_0^T p_K(t)\,dt}
\le \frac{1}{\int_0^\infty p_K(t)\,dt}
\le \frac{\int_T^\infty p_K(t) h(t)\,dt}{\int_T^\infty p_K(t)\,dt}, \tag{7.54}
\]
1
≤
∫ p (t )h(t )dt0
K
≤ ∞
1
µ T
∞
∫ p (t ) ∫ F (u)du / F (t ) dt ∫
0
K
t 0
pK +1(t )dt
∞
(7.55)
≤
∫ T
pK (t )h(t )dt
.
∞
∞
∫T
pK ( t )
∫ t
F (u)du / F (t ) dt
Example 7.16
[2, p. 104] Suppose the unit is replaced at failure number K (K = 1, 2, ...). Then, the
expected cost rate is:
\[
C(K) = \frac{c_K + c_M K}{\int_0^\infty \overline{P}_K(t)\,dt}, \tag{7.56}
\]
Example 7.17
\[
C(T, K) = \frac{c_T + c_M \int_0^T \overline{P}_K(t) h(t)\,dt}{\int_0^T \overline{P}_K(t)\,dt}, \tag{7.58}
\]
\[
h(T) \int_0^T \overline{P}_K(t)\,dt - \int_0^T \overline{P}_K(t) h(t)\,dt = \frac{c_T}{c_M}, \tag{7.59}
\]
∫ ∫
T
0
T K K . (7.60)
c
∫ p (t)dt
0 0 M
K
0
Example 7.18
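\[
C(T, K) = \frac{c_T + c_M \left[H(T) + \int_T^\infty \overline{P}_K(t) h(t)\,dt\right]}{T + \int_T^\infty \overline{P}_K(t)\,dt}. \tag{7.61}
\]
\[
h(T) \left[T + \int_T^\infty \overline{P}_K(t)\,dt\right] - H(T) - \int_T^\infty \overline{P}_K(t) h(t)\,dt = \frac{c_T}{c_M}, \tag{7.62}
\]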
∫ ∫
T
T
∞ K K . (7.63)
c
∫ p (t)dt T T M
K
T
Example 7.19
[6, p. 47] Suppose the unit is replaced at the first failure over time T (0 ≤ T < ∞).
Then, the expected cost rate is:
\[
C(T) = \frac{c_T + c_M [H(T) + 1]}{T + \int_T^\infty e^{-[H(t) - H(T)]}\,dt}, \tag{7.64}
\]
\[
T Q(T) - H(T) = \frac{c_T}{c_M}, \tag{7.65}
\]
where:
\[
Q(T) \equiv \frac{\overline{F}(T)}{\int_T^\infty \overline{F}(t)\,dt}.
\]
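Equation (7.65) can be solved numerically once Q(T) is available. The sketch below is a hedged illustration for a Weibull distribution with shape 2 and unit scale, for which the tail integral of the survival function can be written with the complementary error function; the cost values and function names are assumptions made for the example.

```python
import math

C_T, C_M = 2.0, 1.0  # illustrative values only

def H(T):
    """Cumulative hazard of a Weibull with shape 2, scale 1 (assumed)."""
    return T * T

def Q(T):
    """Q(T) = surv(T) / integral of surv over [T, inf) for surv(t) = exp(-t^2)."""
    tail = 0.5 * math.sqrt(math.pi) * math.erfc(T)
    return math.exp(-T * T) / tail

def cost_rate(T):
    """C(T) of Equation (7.64).

    The integral in the denominator equals 1 / Q(T), because
    exp(-[H(t) - H(T)]) is surv(t) / surv(T).
    """
    return (C_T + C_M * (H(T) + 1.0)) / (T + 1.0 / Q(T))

def condition(T):
    """Left side minus right side of Equation (7.65)."""
    return T * Q(T) - H(T) - C_T / C_M

def bisect(g, lo, hi, tol=1e-10):
    """Bisection for an increasing function g with g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

T_star = bisect(condition, 1e-6, 5.0)
print("T* =", T_star, " C(T*) =", cost_rate(T_star), " c_M Q(T*) =", C_M * Q(T_star))
```

Substituting Equation (7.65) into Equation (7.64) gives C(T*) = c_M Q(T*), so the last two printed values should coincide.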
Example 7.20
[6, p. 47] Suppose the unit is replaced at failure number K (K = 1, 2, ...) or at the
first failure over time T (0 ≤ T < ∞), whichever occurs first. Then, the expected
cost rate is:
C (T , K ) =
∫ P (t)h(t)dt + P (T )] .
cT + cM[
T
0
K
∞
K
(7.66)
∫ P (t)dt + P (T )∫ e
−[H (t ) − H (T )]
K dt K
0 T
0
T
(7.68)
cT
−
∫0
P K (t )h(t )dt − P K (T ) ≥
cM
.
Example 7.21
[6, p. 50] Suppose the unit is replaced at failure number K (K = 0, 1, 2, ...) or at the
first failure over time T (0 ≤ T < ∞), whichever occurs last. Then, the expected cost
rate is:
∞
C (T , K ) =
∫ P (t)h(t)dt + P (T )] .
cT + cM[H(T ) +
∞
T
K
∞
K
(7.69)
∫ ∫
−[H (t ) − H (T )]dt
T + P (t )dt + P (T ) e K K
T T
∞
∞
cT
Q(T ) T +
∫
T
P K (t )dt − H(T ) +
∫ P (t)h(t)dt = c
T
K
M
, (7.70)
∫p K −1 (t )h(t )dt
∞ ∞
∫ ∫e
−[H (t ) − H (T )]
T + P K (t )dt + PK (T )
T
∞
dt
∫ pK −1(t )[h(t ) / Q(t )]dt
T T
T
(7.71)
∞
c
− H(T ) +
∫
T
P K (t )h(t )dt + PK (T ) ≥ T .
cM
7.6 CONCLUSIONS
We surveyed several extended failure rates that appeared in the recent age, random,
and periodic replacement models. The reliability properties of these extended failure
rates would be helpful in obtaining optimum maintenance times for complex sys-
tems. We also gave the inequalities of the failure rates, which would help greatly to
compare their optimum replacement policies.
There are some examples for which we cannot give inequalities. For example:
T
1. T
F (T )
and
∫ (θ t ) e
0
T
N −θ t
dF ( t )
.
∫
F (t )dt
0 ∫ (θ t ) e
0
N −θ t
F ( t ) dt
∫ (θ t ) e N −θ t
dF ( t )
F (T ) T
2. ∞ and ∞ .
∫ ∫ (θ t ) e −θ t
N
F (t )dt F ( t ) dt
T T
1
h(T ) and
3. ∞ .
∫0
pK (t )dt
∫
1. (θt ) N e − θt dF (t ) /
∫ (θt ) e − θt F (t )dt increases with N from:
N
0 0
∫e 0
T
−θ t
dF (t )
≤
F (T )
T to h(T ) ≥
F (T )
T .
∫e
0
−θ t
F (t )dt
∫0
F (t )dt
∫
0
F (t )dt
∞ ∞
∫
2. (θ t ) N e −θ t dF (t ) /
∫ (θ t ) e −θ t F (t )dt increases with N from:
N
T T
∫e −θ t
dF (t )
F (T ) F (T )
T
∞ ≤ ∞ to h(T ) ≥ ∞ .
∫e
T
−θ t
F (t )dt
∫
T
F (t )dt
∫
T
F (t )dt
∞
3.
∫
1/ pK (t )dt increases with K from 1/µ ≥ h(0) to h(∞).
0
APPENDICES
Assuming that the failure rate h(t ) increases with t from h(0) to h(∞), we complete
the following proofs.
APPENDIX 7.1
Prove that for 0 < T < ∞:
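\[
\frac{H(T)}{T} = \frac{1}{T} \int_0^T h(t)\,dt
\]
\[
\frac{F(T)}{\int_0^T \overline{F}(t)\,dt} \le \frac{H(T)}{T} \le h(T). \tag{A7.1}
\]
\[
\lim_{T \to \infty} \frac{H(T)}{T} = \lim_{T \to \infty} \frac{1}{T} \int_0^T h(t)\,dt = h(\infty),
\]
and
\[
\frac{d[H(T)/T]}{dT} = \frac{1}{T^2}\,[T h(T) - H(T)] = \frac{1}{T^2} \int_0^T [h(T) - h(t)]\,dt \ge 0.
\]
\[
L(T) \equiv H(T) \int_0^T \overline{F}(t)\,dt - T F(T),
\]
\[
\frac{dL(T)}{dT} = h(T) \int_0^T \overline{F}(t)\,dt + H(T)\,\overline{F}(T) - F(T) - T f(T)
= \int_0^T [h(T) - h(t)]\,[F(T) - F(t)]\,dt \ge 0,
\]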
APPENDIX 7.2
Prove that for 0 < T < ∞:
∞
T∫ F (t )dt (A7.2)
∞
t
∫ ∫ F (u)du dt
T 0
∫ F (t )dt (A7.3)
0
T
∞
∫ ∫ F (u)du dt
0 t
∫ T
≤
∫ F (t )dt . (A7.4)
F (t )dt
0
∞ t
T ∞
lim ∞
∫ F (t )dt = lim F (T ) = 1 ,
T
t T
µ
∫ [∫ F (u)du]dt ∫ F (t )dt
T →∞ T →∞
T 0 0
lim T
∫ F (t )dt = lim F (T ) = 1 .
0
∞ ∞
µ
∫ [∫ F (u)du]dt ∫ F (t )dt
T →0 T →0
0 t T
T
Furthermore, because F (T ) /
∫ F (t )dt increases with T:
0
∫ F (t )dt ≥ F (T ) .
∞
T
t T
∫ [∫ F (u)du]dt ∫ F (t )dt
T 0 0
∞
Similarly, because F (T ) /
∫ T
F (t )dt increases with T:
∫ F (t )dt ≤ F (T ) .
T
0
∞ ∞
∫ [∫ F (u)du]dt ∫ F (t )dt
0 t T
∞ ∞ t
Differentiating
∫ F (t )dt / ∫ [∫ F (u)du]dt with respect to T:
T T 0
∞
t ∞ T
− F (T )
∫ ∫ T
0 F ( u ) du dt +
∫ T
F ( t ) dt
∫ F ( t ) dt
0
∞
T ∞
t
∫ F (t )dt − F (T ) ≥ 0,
=
∫ F (t )dt ∫ ∫ 0 F (u)du dt ∞
T
t T
∫ [∫ F (u)du]dt ∫ F (t )dt
0 T
T 0 0
APPENDIX 7.3
For 0 < T < ∞ and N = 0,1,2,:
T
∫ t dF (t ) (A7.5) N
0
T
∫ t F (t )dt 0
N
∫
increases with T from h(0) and increases with N from F (T ) / F (t )dt to h(T ):
0
∫ (θt ) e N − θt
dF (t )
0
T
(A7.6)
∫ (θt ) e
0
N − θt
F (t )dt
T T
∫ e −θ t dF (t ) /
∫e −θ t
increases with T from h(0) and increases with N from F (t )dt to
0 0
h(T ):
T T
∫ ∫ (θt ) e − θt
t N dF ( t ) N
dF (t )
0
T ≥ 0
T . (A7.7)
∫t0
N
F (t )dt
∫ (θt ) e 0
N − θt
F (t )dt
∫ ∫ (θt ) e
− θt
t N dF ( t ) N
dF (t )
lim 0
T = lim 0
T = h(0),
∫t ∫ (θt ) e
T →0 N T →0 N −θt
F (t )dt F (t )dt
0 0
T T
∫ ∫ (θt ) e
− θt
t N dF ( t ) N
dF (t )
lim 0
T = lim 0
T = h(T ).
∫t ∫ (θt ) e
N →∞ N N →∞ N −θt
F (t )dt F (t )dt
0 0
T T
∫ t N dF ( t ) /
∫t N
Differentiating F (t )dt with respect to T :
0 0
T T
T N f (T )
∫ t N F (t )dt − T N F (T )
∫t N
dF ( t )
0 0
T
= T N F (T )
∫ 0
t N F (t )[h(T ) − h(t )]dt ≥ 0,
T T
∫ t N +1dF (t )
∫ t dF ( t ) , N
0
T − 0
T
∫t ∫ t F ( t ) dt
N +1 N
F ( t ) dt
0 0
and denoting:
T T T T
∫ t N +1dF (t )
∫ ∫ ∫t N +1
L(T ) ≡ t N F (t )dt − t N dF (t ) F (t )dt ,
0 0 0 0
T T
− T N f (T )
∫t N +1
F (t )dt − T N +1 F (T )
∫t N
dF ( t )
0 0
= T N F (T )
∫t 0
N
F (t )(T − t )[h(T ) − h(t )]dt ≥ 0,
∫ t N dF ( t )
∫ (θt ) e − θt F (t )dt
N
L(T ) ≡
0 0
T T
−
∫ 0
t N F (t )dt
∫ (θt )
0
N
e − θt dF (t ),
T T
− T N F (T )
∫ (θ t ) N e −θ t dF (t ) − (θ T ) N e −θ T f (T )
∫t N
F (t )dt
0 0
T
= T N F (T )
∫ (θ t ) F (t )(e −θ t − e −θ T )[h(T ) − h(t )]dt ≥ 0,
N
0
which completes the proof of Equation (A7.7).
APPENDIX 7.4
For 0 < T < ∞ and N = 0,1,2,:
∞
∫ t dF ( t )
T
∞
N
∫ t F (t )dt
N
∞
increases with T to h(∞) and increases with N from F (T ) /
∫T
F (t )dt to h(∞):
∫ (θt ) e N − θt
dF (t )
T
∞
∫ (θt ) e − θt
N
F (t )dt
T
∞ ∞
increases with T to h(∞) and increases with N from
h(∞), and:
∫e
T
− θt
dF ( t ) /
∫e
T
− θt
F (t )dt to
∞ ∞
∫ ∫ (θt ) e − θt
t N dF (t ) N
dF (t )
T
∞ ≥ T
∞ .
∫t ∫ (θt ) e − θt
N N
F (t )dt F (t )dt
T T
Proof. Appendix 7.4 can be proved by using the similar discussions of Appendix 7.3.
APPENDIX 7.5
Prove that:
T ∞
∫ ∫e − θu
(θt ) N [ dF (u)]dt
0 t
T ∞ (A7.8)
∫ (θt ) [∫ e
0
N
t
− θu
F (u)du]dt
∞ ∞
∫ e − θt dF (t ) /
∫e − θt
increases with T from F (t )dt and increases with N to:
∞ ∞ 0 0
∫e ∫e
− θt − θt
dF ( t ) / F (t )dt ,
T T
∞ t
∫ T
∞
∫ (θt ) N [ e − θu dF (u)]dt
t
0
(A7.9)
∫ (θt ) [∫ e − θu
N
F (u)du]dt
T 0
∞ ∞
e − θt dF (t ) /
∫ ∫e − θt
increases with T to F (t )dt and increases with N to
∞ ∞
∫ ∫
− θt − θt 0 0
e dF (t ) / e F (t )dt , and:
0 0
T ∞ ∞ t
∫ 0
T
∫ (θt ) N [
t
∞
e − θu dF (u)]dt
≥
∫ T
∞
∫(θt ) N [ e − θu dF (u)]dt
t
0
. (A7.10)
∫ (θt ) [∫ e N − θu
∫ (θt ) [∫ e − θu
N
F (u)du]dt F (u)du]dt
0 t T 0
T ∞ ∞
∫ ∫ e − θu dF (u)]dt
∫e − θt
(θt ) N [ dF ( t )
0 t 0
lim T ∞ = ∞ ,
∫ (θt ) [∫ e ∫e
T →0 N − θu − θt
F (u)du]dt F (t )dt
0 t 0
T ∞ ∞
lim
∫ 0
T
∫ (θt ) N [
t
∞
e − θu dF (u)]dt
=
∫e T
∞
− θt
dF ( t )
,
∫ (θt ) [∫ e ∫e
N →∞ N − θu − θt
F (u)du]dt F (t )dt
0 t T
and:
∞
t ∞
t
lim ∞
T ∫ ∫
(θt ) N e − θu dF (u) dt
0 = lim
∫ T 0 ∫
(θt ) N e − θu dF (u) dt
t
∞
t
∫ ∫ ∫ ∫
T →∞ − θu N →∞
N
(θt ) e F (u)du dt (θt ) N e − θu F (u)du dt
T 0 T 0
∞
∫e
− θt
dF ( t )
0
= ∞ .
∫e 0
− θt
F (t )dt
∞ ∞
∫ e − θt dF (t ) /
∫e − θt
Furthermore, because F (t )dt increases with T :
T T
T ∞ ∞
∫ (θt ) [∫ e ∫e
N − θu − θt
dF (u)]dt F (t )
dF
0
T
t
∞ ≤ T
∞ .
∫ (θt ) [∫ e − θu
∫e
N − θt
F (u)du]dt F (t )dt
0 t T
T ∞ T ∞
∫ ∫ e − θu dF (u)]dt /
∫ ∫e − θu
Differentiating (θt ) N [ (θt ) N [ F (u)du]dt with respect to T:
0 t 0 t
∞ T
∞
∫ e − θt dF (t )
∫ ∫e − θu
(θT ) N (θt ) N F (u)du dt
T 0 t
∞ T
∞
− (θT ) N
∫ T
e − θt F (t )dt
∫ 0
(θt ) N
∫e t
− θu
dF (u) dt
∞ T
∞
= (θT ) N
∫ T
e − θt F (t )dt
∫0
(θt ) N
∫e t
− θu
F ( u ) du dt
∞ T ∞
∫ ∫ ∫e
− θu
e − θt dF (t ) (θt ) N [ dF (u)]dt
× T
∞ − 0
T
t
∞ ≥ 0,
∫e T
− θt
F (t )dt
∫ (θt ) [∫ e 0
N
t
− θu
F (u)du]dt
∞ ∞
∫ e − θt dF (t ) /
∫e − θt
which follows that Equation (A7.8) increases with T from F (t )dt.
0 0
Forming:
T ∞ T ∞
t
− θu
F (u)du]dt
∫ (θt ) [∫ e 0
N
t
− θu
F (u)du]dt
and denoting:
T
∞
T
∞
∫ (θt ) N +1
∫ e − θu dF (u) dt
∫ ∫e − θu
L(T ) ≡ (θt ) N F (u)du dt
0 t 0 t
T
∞
T
∞
∫ ∫ e − θu dF (u) dt
∫ (θt ) N +1
∫e
− θu
− (θt ) N F (u)du dt ,
0 t 0 t
dL(T ) ∞ T
∞
= (θT ) N +1
∫ e − θt dF (t )
∫ ∫e − θu
(θt ) N F (u)du dt
dT T 0 t
∞ T
∞
+ (θT ) N
∫T
e − θt F (t )dt
∫ 0
(θt ) N +1
∫e t
− θu
dF (u) dt
∞ T
∞
− (θT ) N
∫ T
e− θt dF (t )
∫ 0
(θt ) N +1
∫e t
− θu
F (u)du dt
∞ T
∞
− (θT ) N +1
∫ T
e − θt F (t )dt
∫ 0
(θt ) N
∫e t
− θu
dF (u) dt
∞ T
∞
=
∫T
e − θt F (t )dt
∫ 0
N N
(θT ) (θt ) (θT − θt )
∫et
− θu
F (u)du
∞ ∞
∫ ∫e
− θu
e − θu dF (u) dF ( u )
× T
∞ − t
∞ dt ≥ 0,
∫e
T
− θu
F (u)duu
∫e t
− θu
F ( u ) du
∞ ∞
∫ e − θt dF (t ) /
∫e − θt
which follows that Equation (A7.8) increases with N to F (t )dt .
T ∞ T ∞
∫ ∫e
− θt − θt
Similarly, we can prove Equation (A7.9) increases with T to e dF ( t ) /
∞ ∞ 0 0
∫e ∫e
− θt − θt
F (t )dt and increases with N to dF ( t ) / F (t )dt and complete the proof of
0 0
Equation (A7.10).
APPENDIX 7.6
Prove that:
T t
∫ 0
T
∫ (θt ) N [ e − θu dF (u)]dt
t
0
(A7.11)
∫ (θt ) [∫ e
0
N
0
− θu
F (u)du]dt
T T
∫e ∫e
− θt − θt
increases with T from h(0) and increases with N to dF ( t ) / F (t )dt , and:
0 0
∞ ∞
∫ ∫e − θu
(θt ) N [ dF (u)]dt
T t
∞ ∞ (A7.12)
∫ (θt ) [∫ e − θu
N
F (u)du]dt
T t
∫ ∫ ∫e ∫e
− θu − θt − θt
(θt ) e F (u)du dt F (t )dt F (t )dt
0 0 0 T
∞
∞
≤ ∞
T ∫ t ∫
(θtt ) N e − θu dF (u) dt
.
N
∞
∫ ∫
− θu
(θt ) e F (u)du dt
T t
Using the similar discussions of Appendices 7.5 and 7.6 can be proved.
Similarly, we can prove that the failure rates in VIII and IX increase with T and
N and obtain the inequalities for 0 < T < ∞ and N = 0,1,2,.
APPENDIX 7.7
Prove that for 0 < T < ∞ and K = 0,1,2,:
T
∫ p (t )h(t )dt (A7.13)
0
T
K
∫ p ( t ) dt
0
K
∞ T
increases with T from h(0) to1/
to h(T ), and:
∫ 0
pK (t )dt and increases with K from F (T ) /
∫ F (t )dt
0
T
F (T )
≤
∫ p (t )h(t )dt ≤ h(T ) ≤ F (T ) , (A7.14)
0
T
K
∞
∫0
F (t )dt
∫ p (t )dt 0
K ∫ F (t )dt T
∫ p (t )h(t )dt ≤
0
K
T ∞
1
. (A7.15)
∫ p (t )dt ∫
0
K
0
pK (t )dt
lim
∫
0
pK (t )h(t )dt
T = h(0), lim
∫ p (t )h(t )dt =
0
T
K
∞
1
,
∫ p ( t ) dt ∫ p (t )dt ∫
T →0 T →∞
K K
pK (t )dt
0 0 0
T T
lim
∫ 0
pK (t )h(t )dt
T = T
F (T )
, lim
∫ p (t )h(t )dt = h(T ).
0
T
K
∫ p ( t ) dt ∫ F (t )dt ∫ p ( t ) dt
K →0 K →∞
K K
0 0 0
T T
Differentiating
∫ 0
pK (t )h(t )dt /
∫ p ( t ) dt
0
K with respect to T :
T T
pK (T )h(T )
∫ 0
pK (t )dt − pK (T )
∫ p (t )h(t )dt
0
K
T
= pK (T )
∫ p (t )[h(T ) − h(t )]dt ≥ 0,
0
K
T T T T
L(T ) ≡
∫ 0
pK +1 (t )h(t )dt
∫ 0
pK (t )dt −
∫ 0
pK (t )h(t )dt
∫p
0
K +1 (t )dt ,
T
L′(T ) = pK +1 (T )
∫ p (t )[h(T ) − h(t )]dt
0
K
T
− pK (T )
∫p 0
K +1
(t )[h(T ) − h(t )]dt
H (T ) K − H (T ) T
=
( K + 1)!
e
∫ p (t )[h(T ) − h(t )][H (T ) − H (t )]dt ≥ 0,
0
K
APPENDIX 7.8
Prove that for 0 ≤ T < ∞ and K = 0,1,2,:
∫ T
pK (t )h(t )dt
∞ (A7.16)
∫ T
pK (t )dt
∞ ∞
increases with T from1/
to h(∞), and:
∫
0
pK (t )dt to h(∞) and increases with K from F (T ) /
∫
T
F (t )dt
∞ ∞
∞
F (T )
≤
∫ T
pK (t )h(t )dt
∞ , ∞
1
≤
∫ T
pK (t )h(t )dt
∞ . (A7.17)
∫T
F (t )dt
∫ T
pK (t )dt
∫
0
pK (t )dt
∫ T
pK (t )dt
lim
∫ T
pK (t )h(t )dt
∞ = ∞
1
, lim
∫ T
pK (t )h(t )dt
∞ = h(∞),
∫ ∫ ∫
T →0 T →∞
pK (t )dt pK (t )dt pK (t )dt
T 0 T
∞ ∞
lim
∫ T
pK (t )h(t )dt
∞ = ∞
F (T )
, lim
∫ T
pK (t )h(t )dt
∞ = h(∞).
∫ ∫ ∫
T →0 K →∞
pK (t )dt F (t )dt pK (t )dt
T T T
By similar methods used in Appendix 7.7, we can easily prove Appendix 7.8.
APPENDIX 7.9
Prove that for 0 < T < ∞ and K = 0,1,2,:
T
T
∫ p (t )h(t )dt
0
K
(A7.18)
∫ 0
pK (t )[h(t ) / Q(t )]dt
increases with T from 1/µ to 1/ ∫ ∞ pK +1(t )dt and increases with K from:
0
T ∞ ∞
F (T ) / ∫0 h(t )[ ∫t F (u)du]dt to F (T ) / ∫T F (t )dt , and:
1
≤ T
∫ p (t )h(t )dt
0
K
≤ ∞
1
, (A7.19)
µ
∫ 0
pK (t )[h(t ) / Q(t )]dt
∫ 0
pK +1(t )dt
T
∫ p (t )h(t )dt
0
K
≤ ∞
F (T )
, (A7.20)
∫
0
pK (t )[h(t ) / Q(t )]dt
∫ T
F (t )dt
lim T
∫ p (t )h(t )dt
0
K
= lim ∞
F (T )
=
1
,
µ
∫ p (t )[h(t ) / Q(t )]dt ∫
T →0 T →0
K
F (t )dt
0 T
lim T
∫ p (t )h(t )dt
0
K
= ∞
1
= ∞
1
,
∫ ∫ ∫
T →∞
pK (t )[h(t ) / Q(t )]dt pK (t )[h(t ) / Q(t )]dt pK +1(t )dt
0 0 0
lim T
∫ p (t )h(t )dt
0
K
= T
F (T )
∞ ,
∫ p (t )[h(t ) / Q(t )]dt ∫ h(t )[∫
K →0
K
F (u)du]dt
0 0 t
lim T
∫ p (t )h(t )dt
0
K
=
F (T )
∞ .
∫ ∫
K →∞
pK (t )[h(t ) / Q(t )]dt F (t )dt
0 T
T T
Differentiating
∫0
pK (t )h(t )dt /
∫ p (t )[h(t ) / Q(t )]dt
0
K
with respect to T :
T T
h(t ) h(T )
pK (T )h(T )
∫ 0
pK (t )
Q (t )
dt − pK (T )
Q(T ) ∫ p (t )h(t )dt
0
K
T
h(T ) h(t )
= pK (T )
Q(T ) ∫ p (t ) Q(t ) [Q(T ) − Q(t )]dt ≥ 0,
0
K
T T
h(t )
L(T ) ≡
∫ 0
pK +1 (t )h(t )dt
∫ p (t ) Q(t ) dt
0
K
T T
h(t )
−
∫ p (t )h(t )dt ∫
0
K
0
pK +1 (t )
Q (t )
dt ,
H (T ) K − H (T ) T
1 1
L′(T ) =
( K + 1)!
e h(T )
∫ p (T ) Q(t ) − Q(T ) [H (T ) − H (t )]dt ≥ 0,
0
K
APPENDIX 7.10
Prove that for 0 ≤ T < ∞ and K = 0,1,2,:
∞
∞
∫
T
pK (t )h(t )dt
(A7.21)
∫T
pK (t )[h(t ) / Q(t )]dt
∞
increases with T from 1/ ∫0 pK +1(t )dt to h(∞) and increases with K from:
∞ ∞
F (T ) / ∫T [ ∫t F (u)du]dt to h(∞), and:
∞ ∞
∫T
pK (t )h(t )dt
∞ ≤ ∞
∫ T
pK (t )h(t )dt
∞ ,
∫ T
pK (t )dt
∫ T
pK (t )h(t )[
∫ t
F (u)du / F (t )]dt
∞
1
≤ ∞
∫T
pK (t )h(t )dt
∞ .
∫ 0
pK +1 (t )dt
∫ T
pK (t )h(t )[
∫t
F (u)du / F (t )]dt
lim ∞
∫ T
pK (t )h(t )dt
= ∞
1
,
∫ ∫
T →0
pK (t )[h(t ) / Q(t )]dt pK +1 (t )dt
T 0
lim ∞
∫ T
pK (t )h(t )dt
= h(∞),
∫
T →∞
pK (t )[h(t ) / Q(t )]dt
T
lim ∞
∫ T
pK (t )h(t )dt
= ∞
F (T )
∞ ,
∫ ∫ ∫
K →0
pK (t )[h(t ) / Q(t )]dt h(t )[ F (u)du]dt
T T t
lim ∞
∫ T
pK (t )h(t )dt
= limQ(T ) = h(∞).
∫
K →∞ T →∞
pK (t )[h(t ) / Q(t )]dt
T
Using h(t ) / Q(t ) ≤ 1 and similar methods in Appendix 7.9, we can easily prove
Appendix 7.10.
ACKNOWLEDGMENT
This work is supported by National Natural Science Foundation of China
(NO. 71801126), Natural Science Foundation of Jiangsu Province (NO. BK20180412),
Aeronautical Science Foundation of China (NO. 2018ZG52080), and Fundamental
Research Funds for the Central Universities (NO. NR2018003).
REFERENCES
1. Lai, C.D., Xie, M. Concepts and applications of stochastic aging in reliability.
In Pham, H. (ed.), Handbook of Reliability Engineering. London, UK: Springer, 2003:
pp. 165–180.
2. Barlow, R.E., Proschan, F. Mathematical Theory of Reliability. New York: John
Wiley & Sons, 1965.
3. Finkelstein, M. Failure Rate Modeling for Reliability and Risk. London, UK: Springer,
2008.
4. Nakagawa, T. Maintenance Theory of Reliability. London, UK: Springer, 2005.
5. Nakagawa, T. Random Maintenance Policies. London, UK: Springer, 2014.
6. Nakagawa, T., Zhao, X. Maintenance Overtime Policies in Reliability Theory. Cham,
Switzerland: Springer, 2015.
7. Zhao, X., Al-Khalifa, K.N., Hamouda, A.M.S., Nakagawa, T. What is middle mainte-
nance policy? Quality and Reliability Engineering International, 2016, 32, 2403–2414.
8. Zhao, X., Al-Khalifa, K.N., Hamouda, A.M.S., Nakagawa, T. Age replacement models:
A summary with new perspectives and methods. Reliability Engineering and System
Safety, 2017, 161, 95–105.
9. Nakagawa, T. Stochastic Processes with Applications to Reliability Theory. London,
UK: Springer, 2011.
10. Zhao, X., Qian, C., Nakagawa, T. Comparisons of maintenance policies with peri-
odic times and repair numbers. Reliability Engineering and System Safety, 2017, 168,
161–170.
8 Accelerated Life Tests with Competing Failure Modes: An Overview
Kanchan Jain and Preeti Wanti Srivastava
CONTENTS
8.1 Introduction
8.1.1 Accelerated Life Test Models
8.1.2 An Accelerated Life Test Procedure
8.1.3 Competing Failure Modes
8.2 Accelerated Life Tests with Independent Causes of Failures
8.2.1 Constant-Stress Accelerated Life Test with Independent Causes of Failures
8.2.1.1 Model Illustration
8.2.2 Step-Stress Accelerated Life Test with Independent Causes of Failures
8.2.2.1 Model Illustration
8.2.3 Modified Ramp-Stress Accelerated Life Test with Independent Causes of Failures
8.2.3.1 Model Illustration
8.3 Accelerated Life Tests with Dependent Causes of Failures
8.3.1 Copulas and Their Properties
8.3.2 Constant-Stress Accelerated Life Test Based on Copulas
8.3.2.1 Model Illustration
8.3.3 Constant-Stress Partially Accelerated Life Test Based on Copulas
8.3.3.1 Model Illustration
8.3.4 Step-Stress Accelerated Life Test Based on Copulas
8.4 Bayesian Approach to Accelerated Life Test with Competing Failure Mode
8.5 Conclusion
References
8.1 INTRODUCTION
Testing systems or components with long expected lifetimes under normal operating
conditions requires long test periods and many units, which is costly and
impractical. In such situations, accelerated life test (ALT) methods are used to
induce failure or degradation of systems or components in shorter time periods.
Failure data can then be obtained within a reasonable period without changing the
failure mechanisms.
ALTs were introduced by Chernoff (1962) and Bessler et al. (1962). They are used
during the Design and Development, Design Verification, and Process Validation
stages of a product life cycle. Designing optimal test plans is a critical step in
ensuring that ALTs predict product reliability accurately, quickly, and
economically.
In ALT, systems or components are:
For accurate prediction of the reliability, the types of stresses to which systems/
components are subjected and the failure mechanisms must be understood.
Different Types of Stress are:
1. Constant
2. Step
3. Ramp-step
4. Triangular-cyclic
5. Ramp-soak-cyclic
6. Sinusoidal-cyclic
• Partially accelerated life test model: Degroot and Goel (1979) introduced
Partially Accelerated Life Test (PALT) models wherein the items are run at
normal as well as accelerated conditions.
A PALT model consists of a life distribution and an acceleration factor for
extrapolating accelerated data results to normal operating condition when
the life-stress relationship cannot be specified. The acceleration factor—the
Some of the stress-life relationships used in the literature (Nelson 1990; Yang 2007;
Elsayed 2012; Srivastava 2017) are:
In these examples, the assessment of each risk factor in the presence of other risk
factors is necessary and gives rise to competing risks analysis. For such an analysis,
each complete observation must be composed of the failure time and the correspond-
ing cause of failure. The causes of failure can be independent or dependent upon
each other.
The procedure underlying an ALT is shown in the following flowchart (Figure 8.1).
\[
F(t) = 1 - \prod_{j=1}^{r} \left(1 - G_j(t)\right), \tag{8.1}
\]
and PDF:
\[
f(t) = \left[\sum_{j=1}^{r} h_j(t)\right] \prod_{j=1}^{r} \left(1 - G_j(t)\right), \tag{8.2}
\]
where $h_j(t)$, the hazard rate corresponding to the jth risk factor, is defined as:
$$h_j(t) = \frac{g_j(t)}{1 - G_j(t)}. \tag{8.3}$$
Let C be the indicator variable for the cause of failure, then the joint distribution of
(T, C) is given by:
fT ,C (t , j ) = g j (t ). (8.4)
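As a rough numerical illustration of Equations 8.1 through 8.3, the following Python sketch builds the overall CDF and PDF from the cause-specific CDFs and PDFs of independent risks. The function names and the two exponential rates are hypothetical choices made only for this example; they do not come from the chapter.

```python
import math

def system_cdf(t, cause_cdfs):
    """Equation 8.1: F(t) = 1 - prod_j (1 - G_j(t)) for independent causes."""
    survival = 1.0
    for G in cause_cdfs:
        survival *= 1.0 - G(t)
    return 1.0 - survival

def system_pdf(t, cause_cdfs, cause_pdfs):
    """Equation 8.2: sum of cause hazards (Equation 8.3) times overall survival."""
    hazards = [g(t) / (1.0 - G(t)) for g, G in zip(cause_pdfs, cause_cdfs)]
    survival = 1.0 - system_cdf(t, cause_cdfs)
    return sum(hazards) * survival

# Hypothetical example: two independent exponential causes of failure.
rates = [0.1, 0.2]
cdfs = [lambda t, r=r: 1.0 - math.exp(-r * t) for r in rates]
pdfs = [lambda t, r=r: r * math.exp(-r * t) for r in rates]

print(system_cdf(5.0, cdfs))        # overall probability of failure by t = 5
print(system_pdf(5.0, cdfs, pdfs))  # overall failure density at t = 5
```

For exponential causes the overall lifetime is again exponential with rate equal to the sum of the cause rates, which gives a simple way to check the sketch.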
8.2.1 Constant-Stress Accelerated Life Test with Independent
Causes of Failures
In a constant-stress ALT (CSALT) set-up, sub-groups of test specimens are allocated
to different test chambers and, in each test chamber, the test units are subjected to
different but fixed stress levels. The experiment is terminated according to a pre-
specified censoring scheme. Each unit is tested under the same temperature for a
fixed duration of time. For example, 10 units are tested for 100 hours at 310 K, 10
different units are tested for 100 hours at 320 K, and another 10 different units are
tested for 100 hours at 330 K.
Figure 8.2 exhibits the constant-stress patterns.
McCool (1978) presented a technique for finding interval estimates for Weibull
parameters of a primary failure mode when there is a secondary failure mode with
the same (but unknown) Weibull shape parameter. Moeschberger and David (1971)
and David and Moeschberger (1978) gave an expression for the likelihood of com-
peting risk data under censoring and fixed experimental conditions. Large sample
properties of maximum likelihood estimators (MLEs) were discussed for Weibull
and log-normal distributions. Herman and Patell (1971) discussed the MLEs under
competing causes of failure.
Klein and Basu (1981, 1982a) analyzed ALT for more than one failure mode.
For independent competing failure modes for each stress level, the authors found
MLEs with life times as exponential or Weibull, with common or different shape
parameters under Type-I, Type-II, or progressively censored data. Using a general
stress function, Klein and Basu (1982b) obtained estimates of model parameters
under various censoring schemes. A dependent competing risk model was proposed
by considering a bivariate Weibull distribution as the joint distribution of two com-
peting risks.
Nelson (1990) and Craiu and Lee (2005) analyzed ALTs under competing causes
of failure for semiconductor devices, ball bearing assemblies, and insulation sys-
tems. Kim and Bai (2002) analyzed ALT data with two competing risks taking a
mixture of two Weibull distributions and location parameters as linear functions of
stress.
Pascual (2007) considered the problem of planning ALT when the respective
times to failure of competing risks are independently distributed as Weibull with a
commonly known shape parameter.
Shi et al. (2013) proposed a CSALT with competing risks for failure from expo-
nential distribution under progressive Type-II hybrid censoring. They obtained the
MLE and Bayes estimators of the parameter and proved their equivalence under
certain circumstances. A Monte Carlo simulation demonstrated the accuracy and
effectiveness of the estimations.
Yu et al. (2014) proposed an accelerated testing plan with high and low tempera-
tures as multiple failure modes for a complicated device. They gave the reliability
function of the product and established the efficiency of the plan through a numerical
example.
Wu and Huang (2017) considered planning of two or more level CSALTs with
competing risk data from Type-II progressive censoring assuming exponential
distribution.
It is assumed that at the lth stress level, the mean life time of a test unit is a log-linear
function of standardized stress:
$$\log\frac{1}{\lambda_{jl}} = \beta_{0j} + \beta_{1j} s_l, \tag{8.5}$$
where $-\infty < \beta_{0j} < \infty$ and $\beta_{1j} < 0$ are unknown design parameters,
$$s_l = \frac{y_l - y_D}{y_L - y_D}, \quad 0 \le s_l \le 1, \quad l = 1, 2, \ldots, L,$$
$y_1 < y_2 < \cdots < y_L$ are L ordered stress levels, and $y_D$ is the stress at normal operating
condition. The log-linear function is a common choice of life-stress relationship
because it includes the power law and the Arrhenius law as special cases.
The failure density and failure distribution of the ith unit under jth risk are,
respectively:
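$$F(x_{il}, l) = \frac{\lambda_{jl}}{\lambda_{+l}} \left(1 - e^{-\lambda_{+l} x_{il}}\right), \quad x_{il} > 0, \quad \lambda_{+l} = \sum_{j=1}^{J} \lambda_{jl}. \tag{8.7}$$
$$F(x_{il}) = 1 - e^{-\lambda_{+l} x_{il}}, \quad x_{il} > 0. \tag{8.8}$$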
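The relationships in Equations 8.5, 8.7, and 8.8 can be sketched in a few lines of Python. The stress values and the β coefficients below are hypothetical and serve only to show how the standardized stress, the cause-specific rates, and the two CDFs fit together.

```python
import math

def standardized_stress(y, y_design, y_max):
    """s_l = (y_l - y_D) / (y_L - y_D), so s = 0 at design stress and s = 1 at the highest level."""
    return (y - y_design) / (y_max - y_design)

def failure_rate(beta0, beta1, s):
    """Equation 8.5 solved for the rate: lambda_jl = exp(-beta0j - beta1j * s_l)."""
    return math.exp(-beta0 - beta1 * s)

def cause_specific_cdf(x, rates, j):
    """Equation 8.7: probability of failing by time x due to cause j."""
    total = sum(rates)
    return rates[j] / total * (1.0 - math.exp(-total * x))

def overall_cdf(x, rates):
    """Equation 8.8: probability of failing by time x from any cause."""
    return 1.0 - math.exp(-sum(rates) * x)

# Hypothetical inputs: stresses on an arbitrary (possibly transformed) scale,
# and made-up (beta0j, beta1j) pairs for two causes, with beta1j < 0.
s = standardized_stress(220.0, 180.0, 260.0)          # 0.5
betas = [(5.0, -2.0), (6.0, -3.0)]
rates = [failure_rate(b0, b1, s) for b0, b1 in betas]

print([cause_specific_cdf(100.0, rates, j) for j in range(len(rates))])
print(overall_cdf(100.0, rates))
```

Summing Equation 8.7 over all causes recovers Equation 8.8, which is a convenient sanity check on any implementation.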
The authors used a progressive Type-II censoring scheme. Under this scheme,
$n_l$ units are tested at stress level $s_l$, with $\sum_{l=1}^{L} n_l = n$. For each stress level l, $m_l$ failures
are observed. The data are collected as follows:
When the first failure time, $X_{(1)l}$, and its cause of failure, $\delta_{1l}$, are observed, $r_{1l}$ of
the surviving units are selected randomly and removed. When the second failure
time, $X_{(2)l}$, and its cause of failure, $\delta_{2l}$, are observed, $r_{2l}$ of the surviving units are
selected randomly and removed. For simplicity, $X_{il}$ is used instead of $X_{(i)l}$. Type-II
progressive censored data with competing risks at stress level $s_l$ are:
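$$L = \prod_{l=1}^{L} \prod_{i=1}^{m_l} \left[ \prod_{j=1}^{J} \lambda_{jl}^{I_{ijl}} \right] e^{-\lambda_{+l} \{ x_{il} (r_{il} + 1) \}}, \tag{8.9}$$
where
$$\log\frac{1}{\lambda_{jl}} = \beta_{0j} + \beta_{1j} s_l, \quad \lambda_{jl} = e^{-\beta_{0j} - \beta_{1j} s_l}, \quad \text{and} \quad \lambda_{+l} = \sum_{j=1}^{J} e^{-\beta_{0j} - \beta_{1j} s_l}.$$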
Using the likelihood function, the authors have used D-optimality, variance optimal-
ity, and A-optimality criteria to obtain the optimal stress level as well as the optimal
sample allocation at each stress level. They used the real data set from Nelson (1990)
on times to failure of the Class-H insulation system in motors to explain the pro-
posed method. The design temperature is 180°C. The insulation systems are tested
at high temperatures of 190°C, 220°C, 240°C, and 260°C. Turn, Phase, and Ground
are three causes of failure.
$$G(w) = \begin{cases} G_1(w), & \tau_0 \le w < \tau_1, \\ G_i(w - \tau_{i-1} + s_{i-1}), & \tau_{i-1} \le w < \tau_i, \; i = 2, \ldots, k-1, \\ G_k(w - \tau_{k-1} + s_{k-1}), & \tau_{k-1} \le w < \infty. \end{cases} \tag{8.10}$$
[Figure: stress plotted against time.]
Khamis and Higgins (1998) formulated the Weibull CEM, which is based on the
time transformation of exponential CEM. Bai and Chun (1991) studied optimum
simple step-stress accelerated life tests (SSALTs) with competing causes of failure
when the distributions of each failure cause were independent and exponential.
Balakrishnan and Han (2008) and Han and Balakrishnan (2010) considered an
exponential SSALT with competing risks using Type-II and Type-I censored data,
respectively. Donghoon and Balakrishnan (2010) studied the inferential problem for
the exponential distribution under a time constraint. Using time-censored data, Liu and
Qiu (2011) devised a multiple-step SSALT with independent competing risks.
Srivastava et al. (2014) considered simple SSALT under Type-I censoring using
the Khamis-Higgins model (an alternative to the Weibull CEM) with competing
causes of failure. The Khamis-Higgins model is based on time transformation of the
exponential model. The life distribution of each failure cause, which is independent
of the others, is assumed to be Weibull with the log of characteristic life as a linear
function of the stress level.
Haghighi (2014) studied a step-stress test under competing risks and degradation
measurements and estimated the reliability function.
$$G_j(w) = 1 - \exp\left(\frac{-w^{\delta}}{\theta_{ij}}\right), \quad 0 \le w < \infty. \tag{8.11}$$
The characteristic life, which is the 63.2th percentile of the distribution, of each of
the two potential failure times is a log-linear function of stress:
$$\log \theta_{ij}^{1/\delta} = \alpha_j + \beta_j x_i; \quad i = 0, 1, 2; \; j = 1, 2. \tag{8.12}$$
Here $\alpha_j$ and $\beta_j$ (< 0) are unknown parameters depending on the nature of the product and
the test method, and δ is known. That $\theta_{ij}^{1/\delta}$ is the characteristic life of
expression 8.11 can be shown as follows:
$$G_j(\xi_p) = p \;\Rightarrow\; \xi_p = \bigl(-\theta_{ij} \log(1 - p)\bigr)^{1/\delta} \;\Rightarrow\; \theta_{ij}^{1/\delta} \approx \xi_{0.632}, \quad i, j = 1, 2. \tag{8.13}$$
For each failure cause, Weibull CEM is assumed. Failure times and failure causes of
test units are observed jointly and continuously.
From the CEM and Weibull distributed life assumptions, the CDF of failure
cause, j = 1, 2, under a simple time step-stress test is the Khamis-Higgins model
given by:
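$$G_j(w) = G_j(w; \theta_{1j}, \theta_{2j}) = \begin{cases} 1 - \exp\left(-\dfrac{w^{\delta}}{\theta_{1j}}\right) & \text{if } 0 < w < \tau, \\[2ex] 1 - \exp\left(-\dfrac{\tau^{\delta}}{\theta_{1j}} - \dfrac{w^{\delta} - \tau^{\delta}}{\theta_{2j}}\right) & \text{if } \tau \le w < \infty. \end{cases} \tag{8.14}$$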
Since only the smaller of W1 and W2 is observed, let the overall failure time of a test
unit be W = min{W1, W2}. Its CDF and PDF are, respectively:
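$$F(w) = F(w;\theta) = 1 - (1 - G_1(w))(1 - G_2(w)) = \begin{cases} 1 - \exp\left\{-\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) w^{\delta}\right\} & \text{if } 0 < w < \tau, \\[2ex] 1 - \exp\left\{-\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) \tau^{\delta} - \left(\dfrac{1}{\theta_{21}} + \dfrac{1}{\theta_{22}}\right) (w^{\delta} - \tau^{\delta})\right\} & \text{if } \tau \le w < \infty. \end{cases} \tag{8.15}$$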
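A minimal Python sketch of the Khamis-Higgins CDF of Equation 8.14 and the overall CDF of Equation 8.15 is given here. The parameter values (δ, τ, and the θ values) are hypothetical and chosen only to exercise the formulas.

```python
import math

def kh_cdf(w, theta1, theta2, delta, tau):
    """Equation 8.14: CDF of one failure cause under a simple step-stress test."""
    if w <= 0.0:
        return 0.0
    if w < tau:
        return 1.0 - math.exp(-w ** delta / theta1)
    return 1.0 - math.exp(-tau ** delta / theta1 - (w ** delta - tau ** delta) / theta2)

def overall_cdf(w, theta1s, theta2s, delta, tau):
    """Equation 8.15: CDF of W = min(W1, W2) for independent causes."""
    survival = 1.0
    for theta1, theta2 in zip(theta1s, theta2s):
        survival *= 1.0 - kh_cdf(w, theta1, theta2, delta, tau)
    return 1.0 - survival

# Hypothetical parameters: (theta_11, theta_12) before tau, (theta_21, theta_22) after.
delta, tau = 1.5, 3.0
theta1s = (10.0, 20.0)
theta2s = (4.0, 8.0)

print(overall_cdf(2.0, theta1s, theta2s, delta, tau))  # before the stress change
print(overall_cdf(5.0, theta1s, theta2s, delta, tau))  # after the stress change
```

The two branches of Equation 8.14 agree at w = τ, so the CDF is continuous at the stress change point; the product form reproduces the closed-form branches of Equation 8.15.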
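$$f(w) = f(w;\theta) = \begin{cases} \delta w^{\delta-1}\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) \exp\left\{-\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) w^{\delta}\right\} & \text{if } 0 < w < \tau, \\[2ex] \delta w^{\delta-1}\left(\dfrac{1}{\theta_{21}} + \dfrac{1}{\theta_{22}}\right) \exp\left\{-\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) \tau^{\delta} - \left(\dfrac{1}{\theta_{21}} + \dfrac{1}{\theta_{22}}\right)(w^{\delta} - \tau^{\delta})\right\} & \text{if } \tau \le w < \infty. \end{cases} \tag{8.16}$$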
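$$f_{W,C}(w, j) = g_j(w)\,(1 - G_{j'}(w)) = \begin{cases} \dfrac{\delta w^{\delta-1}}{\theta_{1j}} \exp\left\{-\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) w^{\delta}\right\} & \text{if } 0 < w < \tau, \\[2ex] \dfrac{\delta w^{\delta-1}}{\theta_{2j}} \exp\left\{-\left(\dfrac{1}{\theta_{11}} + \dfrac{1}{\theta_{12}}\right) \tau^{\delta} - \left(\dfrac{1}{\theta_{21}} + \dfrac{1}{\theta_{22}}\right)(w^{\delta} - \tau^{\delta})\right\} & \text{if } \tau \le w < \infty. \end{cases} \tag{8.17}$$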
The relative risk imposed on a test unit before τ and due to failure cause j is
denoted by
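$$\pi_{1j} = \Pr[C = j \mid 0 < W < \tau] = \frac{\theta_{1j}^{-1}}{\theta_{11}^{-1} + \theta_{12}^{-1}}, \quad j = 1, 2. \tag{8.18}$$
$$\pi_{2j} = \Pr[C = j \mid W \ge \tau] = \frac{\theta_{2j}^{-1}}{\theta_{21}^{-1} + \theta_{22}^{-1}}, \quad j = 1, 2. \tag{8.19}$$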
These equations are simply the proportion of failure rates in the given time frame.
It follows from Equations 8.11 through 8.13 that W and C are independent given the
time frame in which a failure has occurred. For j = 1,2, let
Under the assumption of the CEM, the likelihood function of θ based on the Type-I
censored sample is:
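$$\begin{aligned} L(\theta) = {} & \prod_{i=1}^{n_{11}} \frac{\delta w_i^{\delta-1}}{\theta_{11}} e^{-w_i^{\delta}/\theta_{1\bullet}} \prod_{i=1}^{n_{12}} \frac{\delta w_i^{\delta-1}}{\theta_{12}} e^{-w_i^{\delta}/\theta_{1\bullet}} \\ & \times \prod_{i=1}^{n_{21}} \frac{\delta w_i^{\delta-1}}{\theta_{21}} e^{-\frac{\tau^{\delta}}{\theta_{1\bullet}} - \frac{w_i^{\delta} - \tau^{\delta}}{\theta_{2\bullet}}} \prod_{i=1}^{n_{22}} \frac{\delta w_i^{\delta-1}}{\theta_{22}} e^{-\frac{\tau^{\delta}}{\theta_{1\bullet}} - \frac{w_i^{\delta} - \tau^{\delta}}{\theta_{2\bullet}}} \\ & \times e^{-\left[\frac{n_c (T^{\delta} - \tau^{\delta})}{\theta_{2\bullet}} + \frac{n_c \tau^{\delta}}{\theta_{1\bullet}}\right]}, \end{aligned} \tag{8.20}$$
where
$$\frac{1}{\theta_{1\bullet}} = \frac{1}{\theta_{11}} + \frac{1}{\theta_{12}}, \qquad \frac{1}{\theta_{2\bullet}} = \frac{1}{\theta_{21}} + \frac{1}{\theta_{22}},$$
$$n_{1\bullet} = n_{11} + n_{12}, \qquad n_{2\bullet} = n_{21} + n_{22}, \qquad n = n_{1\bullet} + n_{2\bullet} + n_c.$$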
The authors estimated the model parameters and obtained the optimum plan for
time-censored SSALTs, which minimizes the sum, over all causes of failure, of the
asymptotic variances of the MLEs of the log characteristic life at design stress. The
inferential procedures involving design parameters were also studied.
$$F_0(t; s, x) = G\left(\frac{t}{\alpha(s, x)}\right), \tag{8.21}$$
where the scale parameter is set equal to unity in the assumed CDF, G(⋅).
⇒ F0 (t I ) = G(ε ), (8.22)
where
t I = ∆1 + ∆ 2 + ... + ∆ i + ... + ∆ I (8.23)
is the time after I steps in step-stress testing with step i at stress level i for a time:
∆ i = ti − ti − 1, (8.24)
with t0 = 0, and
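$$\varepsilon = \frac{\Delta_1}{\alpha(s_1, x)} + \frac{\Delta_2}{\alpha(s_2, x)} + \cdots + \frac{\Delta_i}{\alpha(s_i, x)} + \cdots + \frac{\Delta_I}{\alpha(s_I, x)}. \tag{8.25}$$
$$\varepsilon(t) = \lim_{\Delta_i \to 0} \left[ \frac{\Delta_1}{\alpha(s_1, x)} + \frac{\Delta_2}{\alpha(s_2, x)} + \cdots + \frac{\Delta_i}{\alpha(s_i, x)} + \cdots + \frac{\Delta_I}{\alpha(s_I, x)} \right] = \int_0^t \frac{dt}{\alpha(s(t), x)}. \tag{8.26}$$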
$$\Rightarrow F_0(t; s(t), x) = G(\varepsilon(t)). \tag{8.27}$$
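$$\begin{aligned} \varepsilon(t) &= \int_0^t \frac{1}{\theta(s(y))}\,dy = \int_0^t \frac{1}{e^{\gamma_{0j}} \left(\dfrac{s_0}{s(y)}\right)^{\gamma_{1j}}}\,dy = \int_0^t \frac{1}{e^{\gamma_{0j}} \left(\dfrac{s_0}{s_0 + \beta_1 y}\right)^{\gamma_{1j}}}\,dy \\ &= \frac{e^{-\gamma_{0j}} s_0^{-\gamma_{1j}} \left( (s_0 + \beta_1 t)^{1+\gamma_{1j}} - s_0^{1+\gamma_{1j}} \right)}{\beta_1 (1 + \gamma_{1j})} = W_{1j}(t), \quad 0 < t \le \tau_1 \end{aligned} \tag{8.28}$$
and
$$\begin{aligned} \varepsilon(t) &= \int_0^t \frac{1}{\theta(s(y))}\,dy = \varepsilon(\tau_1) + \int_{\tau_1}^t \frac{1}{e^{\gamma_{0j}} \left(\dfrac{s_0}{s_1 + \beta_2 (y - \tau_1)}\right)^{\gamma_{1j}}}\,dy \\ &= \varepsilon(\tau_1) + \frac{e^{-\gamma_{0j}} s_0^{-\gamma_{1j}} \left( (s_1 + \beta_2 (t - \tau_1))^{1+\gamma_{1j}} - s_1^{1+\gamma_{1j}} \right)}{\beta_2 (1 + \gamma_{1j})} \\ &= \varepsilon(\tau_1) + W_{2j}(t), \quad \tau_1 < t \le \eta. \end{aligned} \tag{8.29}$$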
Then the CDF of failure cause j (j = 1,2) under modified ramp-stress is:
G j (t ) = J (ε (t )), (8.30)
where:
J (⋅) is the exponential CDF with mean θ set equal to one and
ε (t ) is the cumulative exposure (damage) function.
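The closed form W1j(t) of Equation 8.28 can be checked against a direct numerical integration of the cumulative exposure. The sketch below assumes the scale function θ(s) = exp(γ0j)(s0/s)^γ1j read off from Equation 8.28; all parameter values are hypothetical.

```python
import math

def theta(s, gamma0, gamma1, s0):
    """Scale parameter at stress s, as read off from Equation 8.28 (assumed form)."""
    return math.exp(gamma0) * (s0 / s) ** gamma1

def w1(t, gamma0, gamma1, s0, beta1):
    """Closed-form cumulative exposure W_1j(t) on the first ramp, 0 < t <= tau_1."""
    num = (s0 + beta1 * t) ** (1.0 + gamma1) - s0 ** (1.0 + gamma1)
    return math.exp(-gamma0) * s0 ** (-gamma1) * num / (beta1 * (1.0 + gamma1))

def exposure_numeric(t, gamma0, gamma1, s0, beta1, steps=20000):
    """Midpoint-rule integral of 1 / theta(s(y)) with s(y) = s0 + beta1 * y."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += h / theta(s0 + beta1 * y, gamma0, gamma1, s0)
    return total

def ramp_cdf(t, gamma0, gamma1, s0, beta1):
    """Equation 8.30 on the first ramp: unit-mean exponential CDF of the exposure."""
    return 1.0 - math.exp(-w1(t, gamma0, gamma1, s0, beta1))

# Hypothetical parameter values for one failure cause.
params = dict(gamma0=2.0, gamma1=1.5, s0=1.0, beta1=0.5)
print(w1(3.0, **params), exposure_numeric(3.0, **params))
print(ramp_cdf(3.0, **params))
```

The same pattern applies to W2j(t) on the second ramp, with s1 and β2 in place of s0 and β1 inside the integral.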
G j (t ) ≡ G j (t ; γ 0 j , γ 1 j )
g j (t ) ≡ g j (t ; γ 0 j , γ 1 j )
Let T = min {T1, T2} denote the overall failure time of a test unit, then its CDF and
PDF, respectively, are
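$$F(t) \equiv F(t; \gamma_{0j}, \gamma_{1j}) = 1 - (1 - G_1(t))(1 - G_2(t)) = \begin{cases} 1 - \exp\{-W_{11}(t) - W_{12}(t)\}, & \text{if } 0 < t < \tau_1, \\ 1 - \exp\{-W_{11}(\tau_1) - W_{21}(t) - W_{12}(\tau_1) - W_{22}(t)\}, & \text{if } \tau_1 \le t < \infty. \end{cases} \tag{8.33}$$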
f (t ) ≡ f (t; γ 0 j , γ 1j )
Furthermore, let the indicator for the cause of failure be denoted by C. Then, under
the above assumptions, for j, j′ = 1, 2 and j′ ≠ j, the joint PDF of (T, C) is given by:
fT ,C (t , j ) = g j (t )(1 − G j′ (t ))
The relative risk imposed on a test unit before τ1 due to failure cause j for j = 1,2 is
denoted by:
τ1
Similarly, the relative risk after τ1 due to the cause j for j = 1,2 is denoted by:
$$\pi_{2j} = \Pr[C = j \mid T \ge \tau_1] = \frac{\displaystyle\int_{\tau_1}^{\infty} \exp\{-W_{1j}(\tau_1) - W_{2j}(t)\} \exp\{-W_{1j'}(\tau_1) - W_{2j'}(t)\}\, W'_{2j}(t)\,dt}{\exp\{-W_{11}(\tau_1) - W_{12}(\tau_1)\}}. \tag{8.37}$$
n1j is the number of units that fail before τ1 due to the failure cause j,
n2j is the number of units that fail after τ1 due to the failure cause j.
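$$\begin{aligned} L(\gamma_{0j}, \gamma_{1j}) = {} & \prod_{j=1}^{2} \prod_{i=1}^{n_{1j}} f(t_i, c_i) \prod_{j=1}^{2} \prod_{i=1}^{n_{2j}} f(t_i, c_i)\, (1 - F(\eta))^{n_c} \\ = {} & \prod_{i=1}^{n_{11}} \left[ \exp\{-W_{11}(t_i)\} \exp\{-W_{12}(t_i)\}\, W'_{11}(t_i) \right] \\ & \prod_{i=1}^{n_{12}} \left[ \exp\{-W_{12}(t_i)\} \exp\{-W_{11}(t_i)\}\, W'_{12}(t_i) \right] \\ & \prod_{i=1}^{n_{21}} \left[ \exp\{-W_{11}(\tau_1) - W_{21}(t_i)\} \exp\{-W_{12}(\tau_1) - W_{22}(t_i)\}\, W'_{21}(t_i) \right] \\ & \prod_{i=1}^{n_{22}} \left[ \exp\{-W_{12}(\tau_1) - W_{22}(t_i)\} \exp\{-W_{11}(\tau_1) - W_{21}(t_i)\}\, W'_{22}(t_i) \right] \\ & e^{-n_c \{W_{11}(\tau_1) + W_{21}(\eta) + W_{12}(\tau_1) + W_{22}(\eta)\}}. \end{aligned} \tag{8.38}$$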
The model parameters have been estimated, and the optimal plan, which specifies
the relevant experimental variables, namely the stress rate and the stress rate change
point(s), has been obtained using the D-optimality criterion. This criterion selects the
stress rate and the stress rate change point that maximize the base-10 logarithm of
the determinant of the Fisher information matrix. It is motivated by the
fact that the volume of the joint confidence region of model parameters is inversely
proportional to the square root of the determinant of the Fisher information matrix.
The method developed has been explained using a numerical example. The results
of sensitivity analysis show that the plan is robust to small deviations from the true
values of baseline parameters.
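The mechanics of a D-optimality search can be illustrated with a deliberately simple toy problem. The sketch below does not use the Fisher information matrix of the ramp-stress model discussed above; it assumes a hypothetical two-parameter linear model observed at two standardized stress levels, and searches a grid for the allocation proportion that maximizes the base-10 logarithm of the determinant of the information matrix.

```python
import math

def information_matrix(p, s_low, s_high, n):
    """Toy 2x2 information matrix for an intercept-slope model (assumed, illustrative only).

    A proportion p of the n units is placed at s_low and the rest at s_high.
    """
    n_low, n_high = n * p, n * (1.0 - p)
    m00 = n_low + n_high
    m01 = n_low * s_low + n_high * s_high
    m11 = n_low * s_low ** 2 + n_high * s_high ** 2
    return ((m00, m01), (m01, m11))

def log10_det(m):
    """Base-10 logarithm of the determinant of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.log10(det)

def d_optimal_allocation(s_low, s_high, n, grid=99):
    """Grid search over p in (0, 1) for the largest log10 determinant."""
    candidates = [k / (grid + 1) for k in range(1, grid + 1)]
    return max(candidates,
               key=lambda p: log10_det(information_matrix(p, s_low, s_high, n)))

best_p = d_optimal_allocation(s_low=0.2, s_high=1.0, n=100)
print(best_p)
```

In this toy case the determinant is proportional to p(1 − p), so the search settles on an equal split; in a real ALT plan the information matrix depends on the life distribution, the censoring scheme, and the stress profile, and the optimum is generally not symmetric.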
Srivastava and Gupta (2018) also formulated the triangular cyclic-stress ALT
plan with independent competing failure modes.
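Survival Copula
Writing $\bar{H}(x, y)$ for the joint survival function and $\bar{F}_i = 1 - F_i$ for the marginal survival functions:
$$\begin{aligned} \bar{H}(x, y) &= \bar{F}_1(x) + \bar{F}_2(y) - 1 + C(F_1(x), F_2(y)) \\ &= \bar{F}_1(x) + \bar{F}_2(y) - 1 + C(1 - \bar{F}_1(x), 1 - \bar{F}_2(y)) \\ &= \hat{C}(\bar{F}_1(x), \bar{F}_2(y)) \end{aligned} \tag{8.39}$$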
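$$C(u, v) = \exp\left[-\left( (-\log_e u)^{\theta} + (-\log_e v)^{\theta} \right)^{1/\theta}\right] \tag{8.40}$$
$$S_i(t) = C\left(e^{-\lambda_{i1} t}, e^{-\lambda_{i2} t}\right) = e^{-\left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right)^{1/\theta} t}.$$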
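The step from the Gumbel-Hougaard copula of Equation 8.40 to the closed-form survival function with exponential marginals can be verified numerically. The rates and the dependence parameter below are hypothetical.

```python
import math

def gumbel_hougaard(u, v, theta):
    """Equation 8.40: Gumbel-Hougaard copula, for 0 < u, v <= 1 and theta >= 1."""
    inner = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-inner ** (1.0 / theta))

def joint_survival(t, lam1, lam2, theta):
    """Copula evaluated at the exponential marginal survival functions."""
    return gumbel_hougaard(math.exp(-lam1 * t), math.exp(-lam2 * t), theta)

def joint_survival_closed_form(t, lam1, lam2, theta):
    """exp(-(lam1^theta + lam2^theta)^(1/theta) * t)."""
    return math.exp(-(lam1 ** theta + lam2 ** theta) ** (1.0 / theta) * t)

# Hypothetical values for illustration.
lam1, lam2, theta, t = 0.02, 0.05, 2.0, 10.0
print(joint_survival(t, lam1, lam2, theta))
print(joint_survival_closed_form(t, lam1, lam2, theta))
```

Setting θ = 1 reduces the copula to the product uv, which recovers the independent competing risks case with total rate λ1 + λ2.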
Under stress level si, the stress-life relationship is modeled using the log-linear
equation:
$$\log(\mu_{ij}) = \alpha_j + \beta_j \phi(s_i), \tag{8.41}$$
where:
$\mu_{ij} = 1/\lambda_{ij}$; $\alpha_j$ and $\beta_j$ are unknown parameters,
$\phi(s)$ is a given function of stress s.
This is a general formulation which contains the Arrhenius and inverse power law
models as special cases. Define:
$$\delta_j(c_{il}) = \begin{cases} 1 & \text{if } c_{il} = j, \\ 0 & \text{if } c_{il} \ne j, \end{cases} \quad j = 1, 2. \tag{8.42}$$
Then the likelihood function due to failure mode 1 under stress level si is
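$$\begin{aligned} L_{i1} = {} & \prod_{l=1}^{r_i} \left\{ \lim_{\Delta t \to 0} \frac{P\left[(T_1 < T_2) \cap (t_{il} \le T_1 < t_{il} + \Delta t)\right]}{\Delta t} \right\}^{\delta_1(c_{il})} \\ & \times \left\{ \lim_{\Delta t \to 0} \frac{P\left[(T_2 < T_1) \cap (t_{il} \le T_2 < t_{il} + \Delta t)\right]}{\Delta t} \right\}^{1 - \delta_1(c_{il})} \times \left\{ P\left[T_1 > t_{ir_i}, T_2 > t_{ir_i}\right] \right\}^{n_i - r_i} \\ = {} & \left(\frac{\lambda_{i1}}{\lambda_{i2}}\right)^{\theta g_{i1}} \lambda_{i2}^{\theta r_i} \left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right)^{r_i \left(\frac{1}{\theta} - 1\right)} \exp\left\{ -\left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right)^{\frac{1}{\theta}} \left[ \sum_{l=1}^{r_i} t_{il} + (n_i - r_i)\, t_{ir_i} \right] \right\} \end{aligned} \tag{8.43}$$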
and that due to failure mode 2 under stress level si is
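$$L_{i2} = \left(\frac{\lambda_{i2}}{\lambda_{i1}}\right)^{\theta g_{i2}} \lambda_{i1}^{\theta r_i} \left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right)^{r_i \left(\frac{1}{\theta} - 1\right)} \exp\left\{ -\left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right)^{\frac{1}{\theta}} \left[ \sum_{l=1}^{r_i} t_{il} + (n_i - r_i)\, t_{ir_i} \right] \right\} \tag{8.44}$$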
where:
$g_{ij} = \sum_{l=1}^{r_i} \delta_j(c_{il})$
$g_{i1} = r_i - g_{i2}$
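$$\Rightarrow \log L_i = 2\left\{ \theta g_{i1} \log \lambda_{i1} + \theta (r_i - g_{i1}) \log \lambda_{i2} + r_i\left(\frac{1}{\theta} - 1\right) \log\left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right) - \left(\lambda_{i1}^{\theta} + \lambda_{i2}^{\theta}\right)^{\frac{1}{\theta}} TTT_i \right\} \tag{8.45}$$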
$$h_j(y) = A_j h_{j-1}(y) = \prod_{i=1}^{j} A_i \, h(y) \quad \text{at the } j\text{th stress level}, \; j = 1, 2, \ldots, m$$
$$G_j(t) = \begin{cases} 1 - e^{-\int_0^t h(u)\,du} = 1 - e^{-\lambda_j t}, & \text{under normal operating condition} \\ 1 - e^{-\int_0^t A h(u)\,du} = 1 - e^{-A \lambda_j t}, & \text{under accelerated condition} \end{cases} \tag{8.47}$$
Under the tampered failure rate model and the Gumbel-Hougaard copula with expo-
nential marginals, S(t) is given as
The probabilities of failure of a unit under different failure modes over different
intervals are required for the formulation of the likelihood function and:
∂ C(u, v)
= g j (t ) dt (8.50)
∂u u = G ( t ), v = G
1 2 (t )
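$$L_1 = \prod_{i=1}^{n\Phi_1} \left\{ \left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta) - 1} \lambda_1^{\theta}\, e^{-\left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta)} t_i} \right\}^{\delta_{11}} \left\{ \left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta) - 1} \lambda_2^{\theta}\, e^{-\left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta)} t_i} \right\}^{\delta_{12}} \left\{ e^{-\left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta)} \eta} \right\}^{1 - \delta_{11} - \delta_{12}}$$
$$L_2 = \prod_{i=1}^{n\Phi_2} \left\{ A \left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta) - 1} \lambda_1^{\theta}\, e^{-A\left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta)} t_i} \right\}^{\delta_{21}} \left\{ A \left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta) - 1} \lambda_2^{\theta}\, e^{-A\left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta)} t_i} \right\}^{\delta_{22}} \left\{ e^{-A\left(\lambda_1^{\theta} + \lambda_2^{\theta}\right)^{(1/\theta)} \eta} \right\}^{1 - \delta_{21} - \delta_{22}}$$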
where:
Define Φm as the proportion of units that are allocated to chamber m, m = 1, 2, with Φ1 + Φ2 = 1.
The authors estimated the model parameters and obtained the optimal plan, which
consists of finding the optimal allocation, n1 = nΦ1, to the first test chamber in normal
conditions using the D-optimality criterion. The method was explained
using a numerical example, and a sensitivity analysis was carried out.
8.5 CONCLUSION
This chapter is a brief review on formulation of ALT models with competing failure
modes—independent or dependent. The stress loading factors used in the literature are
constant, step-stress, modified ramp-stress, and triangular cyclic. In case of dependent
failure modes, dependence is described through copulas. In the literature, ALT models
have been designed by various authors using the classical approach or the Bayesian
approach. Various authors carried out data analysis using different censoring schemes
such as time-censoring, failure censoring, progressive censoring, and determined opti-
mal plans. The methods developed also were explained using numerical examples.
REFERENCES
Ancha, X. and Yincai, T. (2012). Statistical analysis of competing failure modes in acceler-
ated life testing based on assumed copulas. Chinese Journal of Applied Probability and
Statistics, 28, 51–62.
Bai, D.S. and Chun, Y.R. (1991). Optimum simple step-stress accelerated life tests with com-
peting causes of failure. IEEE Transactions on Reliability, 40 (5), 622–627.
Bai, X., Shi, Y., Liu, Y., and Liu, B. (2018). Statistical analysis of dependent competing risks
model in constant stress accelerated life testing with progressive censoring based on
copula function. Statistical Theory and Related Fields, 2 (1), 48–57.
Balakrishnan, N. and Han, D. (2008). Exact inference for simple step-stress model with com-
peting risks for failure from exponential distribution under Type-II censoring. Journal
of Statistical Planning and Inference, 138, 4172–4186.
Bessler, S., Chernoff, H., and Marshall, A.W. (1962). An optimal sequential accelerated life
test. Technometrics, 4 (3), 367–379.
Bhattacharya, G.K. and Soejoeti, Z. (1989). A tampered failure rate model for step-stress
accelerated life test. Communications in Statistics—Theory and Methods, 18 (5),
1627–1643.
Bunea, C. and Mazzuchi, T.A. (2005). Bayesian accelerated life testing under competing
failure modes. Proceedings of Annual Reliability and Maintainability Symposium,
Alexandria, VA, 152–157.
Bunea, C. and Mazzuchi, T.A. (2006). Competing failure modes in accelerated life testing.
Journal of Statistical Planning and Inference, 136, 1608–1620.
Bunea, C. and Mazzuchi, T.A. (2014). Accelerated Life Tests: Analysis with Competing
Failure Modes. Wiley Stats, Reference: Statistics Reference Online, pp. 1–12.
Carriere, J. (1994). Dependent decrement theory. Transactions, Society of Actuaries, XLVI, 45–65.
Chernoff, H. (1962). Optimal accelerated life designs for estimation, accelerated life test.
Technometrics, 4 (3), 381–408.
Craiu, R.V. and Lee, T.C.M. (2005). Model selection for the competing-risks model with and
without masking. Technometrics, 47 (4), 457–467.
David, H.A. and Moeschberger, M.L. (1978). The Theory of Competing Risks. Griffin,
London, UK.
DeGroot, M.H. and Goel, P.K. (1979). Bayesian estimation and optimal designs in partially
accelerated life testing. Naval Research Logistic Quarterly, 26 (20), 223–235.
Donghoon, H. and Balakrishnan, N. (2010). Inference for a simple step-stress model with
competing risks for failure from the exponential distribution under time constraint.
Computational Statistics & Data Analysis, 54 (9), 2066–2081.
Elsayed, A.E. (2012). Reliability Engineering. John Wiley & Sons, Hoboken, NJ.
Escarela, G. and Carriere, J. (2003). Fitting competing risks with an assumed copula.
Statistical Methods in Medical Research, 12 (4), 333–349.
Haghighi, F. (2014). Accelerated test planning with independent competing risks and concave
degradation path. International Journal of Performability Engineering, 10 (1), 15–22.
Han, D. and Balakrishnan, N. (2010). Inference for a simple step-stress model with competing
risks for failure from the exponential distribution under time constraint. Computational
Statistics and Data Analysis, 54, 2066–2081.
Herman, R.J. and Patell Rusi, K.N. (1971). Maximum likelihood estimation for multi-risk
model. Technometrics, 13 (2), 385–396. doi:10.1080/00401706.1971.10488792.
Khamis, I.H. and Higgins, J.J. (1998). New model for step-stress testing. IEEE Transactions
on Reliability, 47 (2), 131–134.
Kim, C.M. and Bai, D.S. (2002). Analysis of accelerated life test data under two failure modes.
International Journal of Reliability, Quality and Safety Engineering, 9, 111–125.
Klein, J.P. and Basu, A.P. (1981). Weibull accelerated life tests when there are competing causes
of failure. Communications in Statistics Theory and Methods, 10 (20), 2073–2100.
Klein, J.P. and Basu, A.P. (1982a). Accelerated life testing under competing exponential fail-
ure distributions. IAPQR Transactions, 7 (1), 1–20.
Klein, J.P. and Basu, A.P. (1982b). Accelerated life tests under competing Weibull causes of
failure. Communications in Statistics—Theory and Methods, 11 (20), 2271–2286.
Liu, X. and Qiu, W.S. (2011). Modeling and planning of step-stress accelerated life tests with
independent competing risks. IEEE Transactions on Reliability, 60 (4), 712–720.
McCool, J. (1978). Competing risk and multiple comparison analysis for bearing fatigue tests.
Tribology Transactions, 21, 271–284.
Moeschberger, M.L. and David, H.A. (1971). Life tests under competing causes of failure and
the theory of competing risks. Biometrics, 27 (4), 909–933.
Nelsen, R.B. (2006). An Introduction to Copulas, 2nd ed. Springer Science + Business Media,
New York.
Nelson, W.B. (1990). Accelerated Testing: Statistical Models, Test Plans, and Data Analysis.
John Wiley & Sons, Hoboken, NJ.
Pascual, F.G. (2007). Accelerated life test planning with independent Weibull competing
risks with known shape parameter. IEEE Transactions on Reliability, 56 (1), 85–93.
Shi, Y., Jin, L., Wei, C., and Yue, H. (2013). Constant-stress accelerated life test with compet-
ing risks under progressive type-II hybrid censoring. Advanced Materials Research,
712–715, 2080–2083.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de
l'Institut de statistique de l'Université de Paris, 8, 229–231.
Srivastava, P.W. (2017). Optimum Accelerated Life Testing Models with Time-varying
Stresses. World Scientific Publishing Europe, London, UK.
Srivastava, P.W. and Gupta, T. (2015). Optimum time-censored modified ramp-stress ALT
for the Burr Type XII distribution with warranty: A goal programming approach.
International Journal of Reliability, Quality and Safety Engineering, 22 (3), 23.
Srivastava, P.W. and Gupta, T. (2017). Optimum modified ramp-stress ALT plan with compet-
ing causes of failure. International Journal of Quality and Reliability Management, 34
(5), 733–746.
Srivastava, P.W. and Gupta, T. (2018). Optimum triangular cyclic-stress ALT plan with
independent competing causes of failure. International Journal of Reliability and
Applications, 19 (1), 43–58.
Srivastava, P.W. and Gupta, T. (2019). Copula based constant-stress PALT using tampered
failure rate model with dependent competing risks. International Journal of Quality
and Reliability Management, 36 (4), 510–525.
Srivastava, P.W. and Sharma, D. (2014). Optimum time-censored constant-stress PALTSP for
the Burr Type XII distribution using tampered failure rate model. Journal of Quality
and Reliability Engineering, 2014, 564049, 13. doi:10.1155/2014/564049.
Srivastava, P.W., Shukla, R., and Sen, K. (2014). Optimum simple step-stress test with
competing risks for failure using Khamis-Higgins model under Type-I censoring.
International Journal of Operational Research/Nepal, 3, 75–88.
Tan, Y., Zhang, C., and Cen, X. (2009). Bayesian analysis of incomplete data from
accelerated life testing with competing failure modes. 8th International
Conference on Reliability, Maintainability and Safety, pp. 1268–1272. doi:10.1109/
ICRMS.2009.5270049.
Wu, S.-J. and Huang, S.-R. (2017). Planning two or more level constant-stress accelerated life
tests with competing risks. Reliability Engineering and System Safety, 158, 1–8.
Yang, G. (2007). Life Cycle Reliability Engineering. John Wiley & Sons, Hoboken, NJ.
Yu, Z., Ren, Z., Tao, J., and Chen, X. (2014). Accelerated testing with multiple failure modes
under several temperature conditions. Mathematical Problems in Engineering, 839042,
8. doi:10.1155/2014/839042.
Zhang, Z. and Mao, S. (1998). Bayesian estimator for the exponential distribution with the
competing causes of failure under accelerated life test. Chinese Journal of Applied
Probability and Statistics, 14 (1), 91–98.
Zhou, Y., Lu, Z., Shi, Y, and Cheng, K. (2018). The copula-based method for statistical
analysis of step-stress accelerated life test with dependent competing failure modes.
Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and
Reliability, 1–18. doi:10.1177/1748006X18793251.
9 European Reliability
Standards
Miguel Angel Navas, Carlos Sancho,
and Jose Carpio
CONTENTS
9.1 Introduction
9.2 Classification of the Dependability Standards of the International Electrotechnical Commission
9.3 Management Procedures
9.4 Establishment of Requirements
9.5 Test Methods
9.6 Method Selection
9.7 Reliability Evaluation Methods
9.8 Statistical Methods for the Evaluation of Reliability
9.9 Conclusions
References
9.1 INTRODUCTION
The International Electrotechnical Commission (IEC) is a standardization organization
in the fields of electrical, electronic, and related technologies. It is made up
of the national standardization bodies of its member countries. The IEC has
85 member countries, including those of the European Union, Japan, and the United States,
among others.
The IEC has a Technical Committee, TC 56, whose current name is Dependability.
The purpose of TC 56 is to prepare international standards for reliability (in its
broadest sense), applicable in all technological areas. Reliability can be expressed in
terms of essential supporting attributes such as availability and maintainability.
The standards provide systematic methods and tools for evaluating the reliability and
management of equipment, services, and systems throughout their life cycles. As of
June 2018, TC 56 has 57 published standards in this area.
The standards cover generic aspects of administration of the reliability and main-
tenance program, tests and analytical techniques, software and system reliability, life
cycle costs, technical risk analysis, and project risk management. This list includes
standards related to product issues, from component reliability to guidance for
systems engineering reliability; standards related to process issues, from technological
risk analysis to integrated logistics support; and standards related to management
issues, from reliability program management to obsolescence management.
TABLE 9.1
Classification of the Dependability Standards Issued by IEC

Cluster                               Number of Standards
Management procedures                 19
Establishment of requirements         8
Test methods                          11
Method selection                      5
Reliability evaluation methods        9
Statistical methods for reliability   5
TABLE 9.2
Classification of the Management Procedures Standards Issued by IEC

Cluster                   Number of Standards
Maintenance strategies    8
Data processing           2
Risk                      3
Logistics                 2
Improvement processes     1
Life cycle                3
The most relevant aspects of the two standards of data processing are summarized
as follows:
TABLE 9.3
Attributes of the Collection of Dependability Data from the Field

Attribute            Values
Respect to time      Continuous, discontinuous, etc.
Number of data       Complete or limited
Type of population   Finite, infinite, or hypothetical
Sample size          No sampling, random sampling, or stratified sampling
Type of data         Qualitative or quantitative
Data censorship      Uncensored, lateral censorship, or censorship by interval
Data validation      In origin, by supervisor, etc.
Data screening       Without screening or with screening standards
The most important aspects of the three standards dedicated to risk management are
summarized as follows:
And finally, the three standards developed for the life cycle are:
outcomes. It helps an organization define the activities and tasks that need
to be undertaken to achieve dependability objectives in an open system,
including dependability related communication, dependability assessment,
and evaluation of dependability throughout system life cycles.
and procedures for electronic components. It is intended for use by (1) component
manufacturers as a guideline, (2) component users as a guideline to
negotiate with component manufacturers on stress screening requirements or to
plan an in-house stress screening process to meet reliability requirements, and
(3) subcontractors who provide stress screening as a service.
• IEC 61164:2004: Reliability growth—Statistical test and estimation meth-
ods (Edition 2.0) gives models and numerical methods for reliability growth
assessments based on failure data, which were generated in a reliability
improvement program. These procedures deal with growth, estimation,
confidence intervals for product reliability, and goodness-of-fit tests.
In Table 9.4, the types of model developed are classified.
• IEC 62309:2004: Dependability of products containing reused
parts—Requirements for functionality and tests (Edition 1.0) introduces the
concept to check the reliability and functionality of reused parts and their
usage within new products. It also provides information and criteria about
the tests/analysis required for products containing such reused parts, which
are declared “qualified-as-good-as-new” relative to the designed life of the
product. The purpose of this standard is to ensure by tests and analysis that
the reliability and functionality of a new product containing reused parts is
comparable to a product with only new parts.
• IEC 62429:2007: Reliability growth—Stress testing for early failures in
unique complex systems (Edition 1.0). This International Standard gives
guidance for reliability growth during final testing or acceptance testing of
unique complex systems. It gives guidance on accelerated test conditions
and criteria for stopping these tests.
• IEC 62506:2013: Methods for product accelerated testing (Edition 1.0)
provides guidance on the application of various accelerated test techniques
for measurement or improvement of product reliability. Identification
of potential failure modes that could be experienced in the use of a
product/item and their mitigation is instrumental to ensure dependability
of an item. The object of the methods is to either identify potential design
weakness or provide information on item dependability, or to achieve nec-
essary reliability/availability improvement, all within a compressed or
accelerated period of time. This standard addresses accelerated testing of
non-repairable and repairable systems.
TABLE 9.4
Attributes of the Collection of Dependability
Data from the Field
Type of Model Continuous Time Discrete Time
Classic design Section 6.1 —
Bayesian design Section 6.2 —
Classic tests Section 7.1 Section 7.2
Bayesian tests — —
The 12 methods included are briefly explained in Annex A of the standard, with a
reference to the IEC standard that develops each method, if any. This standard
includes a guide for selecting the appropriate analysis method, taking into
account the characteristics of the system or equipment:
This standard establishes the methods and conditions for reliability tests and prin-
ciples for the performance of statistical tests. It includes a detailed guide for the
selection of the statistical methods used to analyze the data coming from reliability
tests of repairable or non-repairable elements.
The requirements for a correct specification of the reliability test to be executed
are established so that all the variables that may affect the test are determined and
bounded prior to the application of the statistical test and contrast methods.
The following standard focuses on the analysis of test data. For non-repairable
elements, parametric methods are proposed: fitting the exponential distribution
when the failure rate λ(t) is constant, and the Weibull distribution when λ(t)
shows a trend.
The statistical nature of failures in repairable elements is described as a
stochastic point process (SPP). The failure intensity z(t) refers exclusively to
repairable elements. The failure intensity of a single repairable element can be
estimated from the successive times between failures, as the number of failures
per unit of time or another variable.
In this case, the failures of each element happen sequentially, which is why the
process is known as an SPP. It is important to maintain the traceability of the
sequence of times between failures. If the times between failures are
exponentially distributed, then the failure intensity is constant, and the number
of failures per unit of time can be modeled by a homogeneous Poisson process (HPP).
If there is a trend (an increasing or decreasing failure intensity), a
non-homogeneous Poisson process (NHPP) can be applied. In many such cases the
power-law process (PLP) is suitable; it leads to a model from which the trend can
be estimated. See the classification in Table 9.5.
Attached is a list of standards for the estimation of reliability in non-repairable
elements according to IEC 60300-3-5:
TABLE 9.5
Appropriate Models for Data Analysis According to IEC
60300-3-5
Item Trend Appropriate Model
Non-repairable Constant Exponential distribution
Non-repairable Non-constant Weibull distribution
Repairable Constant Homogeneous Poisson process (HPP)
Repairable Non-constant Non-homogeneous Poisson process (NHPP)
• Non-repairable items
• Repairable items with zero time to restoration
• Repairable items with non-zero time to restoration
For each of these three classes of items, the standard develops and formulates
mathematical expressions for:
• Reliability; R(t)
• Instantaneous failure rate; λ(t) (non-repairable items)
• Instantaneous failure intensity; z(t) (repairable items)
• Average failure rate; λ(t1, t2) (non-repairable items)
• Average failure intensity; z(t1, t2) (repairable items)
• Mean Time To Failure: MTTF (non-repairable items)
• Mean Up Time: MUT (repairable items)
• Mean Time Between Failures: MTBF (repairable items)
Likewise, for repairable items with non-zero time to restoration, mathematical
expressions are included for the calculation of instantaneous, average, and
asymptotic availability and unavailability, as well as maintainability, average
repair rate, and average repair time.
• IEC 62308:2006: Equipment reliability—Reliability assessment methods
(Edition 1.0). This International Standard describes early reliability assessment
methods for items based on field data and test data for components and mod-
ules. It is applicable to mission, safety and business critical, high integrity,
and complex items. It contains information on why early reliability estimates
are required and how and where the assessment would be used.
TABLE 9.6
Symbols That Are Used in the Representation of
the FTA Method
FTA Symbols Event or Gate
Basic event
Undeveloped event
Transfer gate
OR gate
AND gate
Block diagrams are among the first tasks that are completed during the definition
of the product. They should be built as part of the initial conceptual development.
They should be started as soon as the program definition exists, completed as part of
the requirements analysis, and continuously extended to a greater level of detail, as
the data becomes available to make decisions and perform cost-benefit studies.
To construct an RBD, several techniques of qualitative analysis can be used:
You can evaluate more complex models in which the same block appears more than
once in the diagram by using:
The Markov model is a probabilistic method that allows the statistical dependence
of the failure or repair characteristics of the individual components to be
adapted to the state of the system. The Markov model can therefore consider the
effects of order-dependent component failures and of variable transition rates
that change as a result of stress or other factors. For this reason, Markov
analysis is an adequate method for evaluating the reliability of functionally
complex system structures and complex repair and maintenance strategies.
The method is based on the theory of Markov chains. For reliability applications,
the normal reference model is the time-homogeneous Markov model, which requires
the transition rates (failure and repair) to be constant. At the expense of an
increase in the state space, non-exponential transitions can be approximated by a
sequence of exponential transitions. General and efficient numerical techniques
are available for this model; the only limitation on their application is
the dimension of the state space.
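As a minimal sketch of the time-homogeneous case, consider a hypothetical single repairable item with two states (up and down), a constant failure rate and a constant repair rate. The function names, parameter names, and numerical values below are illustrative assumptions and are not taken from the standard. For this two-state model the availability has a simple closed form:

```python
import math

def availability(t, lam, mu):
    """Probability that a two-state item (up/down) is up at time t,
    starting in the up state, with constant failure rate lam and
    constant repair rate mu (time-homogeneous Markov model)."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def steady_state_availability(lam, mu):
    """Limit of availability(t, lam, mu) as t grows large."""
    return mu / (lam + mu)

# Hypothetical rates per hour: one failure per 1000 h, one repair per 10 h.
lam, mu = 1e-3, 1e-1
for t in (0.0, 10.0, 100.0):
    print(t, round(availability(t, lam, mu), 6))
print("steady state:", round(steady_state_availability(lam, mu), 6))
```

Larger models with many states generally need a numerical solution of the state equations, which is where the dimension of the state space becomes the limiting factor.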
In Figure 9.2, the white circles represent operational states, while the gray
circles represent non-operational states. λx are the failure transition rates from
one state to another and μx are the repair transition rates from one state to another.
The stress models described herein are generic and can be used as a basis for conver-
sion of failure rate data given at these reference conditions to actual operating condi-
tions when needed and this simplifies the prediction approach. Conversion of failure
rate data is only possible within the specified functional limits of the components.
This document also gives guidance on how a database of component failure data can
be constructed to provide failure rates that can be used with the included stress models.
Reference conditions for failure rate data are specified so that data from differ-
ent sources can be compared on a uniform basis. If failure rate data are given in
accordance with this document, then additional information on the specified condi-
tions can be dispensed with. This document does not provide base failure rates for
components—rather it provides models that allow failure rates obtained by other
means to be converted from one operating condition to another operating condition.
The prediction methodology described in this document assumes that the parts are
being used within their useful life.
This international standard is intended for the prediction of reliability of compo-
nents used in equipment and focuses on organizations with their own data, describing
how to establish and use such data to make predictions of reliability. The failure rate
of a component under operating conditions is calculated as follows:
$$\lambda = \lambda_{ref}\,\pi_U\,\pi_I\,\pi_T\,\pi_E\,\pi_S\,\pi_{ES} \tag{9.1}$$
with:
λref is the failure rate under the reference conditions
πU is the dependence factor with voltage
πI is the dependence factor with current
πT is the dependence factor with temperature
πE is the environmental application factor
πS is the dependence factor with switching frequency
πES is the dependence factor with electrical stress
Therefore, the failure rate for a set of n components under operating conditions
is calculated by aggregation as follows:
$$\lambda_{Equip} = \sum_{i=1}^{n} \lambda_i \tag{9.2}$$
The standard develops specific stress models and values of the π factors
applicable to the different types of components, which must be used to convert
the reference failure rates to failure rates under the operating conditions. The
π factors are modifiers of the failure rate associated with a specific condition
or stress. They provide a measure of the modification of the failure rate as a
consequence of changes in the stress or condition.
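The conversion in Equation 9.1 and the aggregation in Equation 9.2 can be sketched as follows. This is only an illustration: the function names are hypothetical, every numerical value is made up, and real π factors would come from the stress models of the standard. Factors that are not supplied for a component are treated as 1.

```python
from math import prod

def operating_failure_rate(lambda_ref, pi_factors):
    """Equation 9.1: reference failure rate multiplied by the pi factors.
    pi_factors maps a factor name (e.g. "pi_T") to its value; any factor
    that is not listed is implicitly equal to 1."""
    return lambda_ref * prod(pi_factors.values())

def equipment_failure_rate(component_rates):
    """Equation 9.2: sum of the component failure rates."""
    return sum(component_rates)

# Hypothetical components; all numbers are invented for illustration.
resistor = operating_failure_rate(2e-9, {"pi_T": 1.5, "pi_E": 2.0})
capacitor = operating_failure_rate(5e-9, {"pi_U": 1.2, "pi_T": 2.0, "pi_E": 2.0})
equipment = equipment_failure_rate([resistor, capacitor])
print(resistor, capacitor, equipment)
```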
A Petri net is a graphical tool for the representation and analysis of complex
logical interactions between the components or events of a system. The typical
complex interactions that are naturally included in the language of the Petri net
are concurrency, conflict, synchronization, mutual exclusion, and resource limitation.
[Figure residue: branch labels Prob. YES = 0,a; Failure; Prob. NOT = 0,d; State 2, Prob. = 0,a × 0,d; Failure; Prob. NOT = 0,b; State 3, Prob. = 0,b]
A condition is valid in a given situation if the corresponding node is marked; that
is, it contains at least one “•” mark (drawn as a black dot). The dynamics of the
system are represented by the movement of the marks in the graph. A transition is
enabled if each of its input nodes contains at least one mark.
An enabled transition can fire; firing removes a mark from each input node and
places a mark on each output node. The distribution of the marks in the nodes is
called a marking.
Starting from an initial marking, the application of the enabling and firing
rules produces all the possible markings, which constitute the reachability set of
the Petri net. This reachability set provides all the states that the system can
reach from the initial state.
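The enabling and firing rules, and the enumeration of the markings reachable from an initial marking, can be sketched in a few lines. The net below is a hypothetical example (two items sharing one repair crew) chosen to show resource limitation; the node numbering, function names, and the `limit` safeguard are assumptions of this sketch rather than part of any standard.

```python
from collections import deque

def enabled(marking, transition):
    """A transition is enabled if every input node holds at least one mark."""
    inputs, _ = transition
    return all(marking[p] >= 1 for p in inputs)

def fire(marking, transition):
    """Remove one mark from each input node, add one to each output node."""
    inputs, outputs = transition
    new = list(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] += 1
    return tuple(new)

def reachable_markings(initial, transitions, limit=10_000):
    """Breadth-first enumeration of the markings reachable from `initial`.
    `limit` stops the search for nets whose reachable set is unbounded."""
    seen = {initial}
    queue = deque([initial])
    while queue and len(seen) < limit:
        m = queue.popleft()
        for t in transitions:
            if enabled(m, t):
                nxt = fire(m, t)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

# Nodes: 0 A up, 1 A failed, 2 A in repair,
#        3 B up, 4 B failed, 5 B in repair, 6 repair crew idle.
transitions = [
    ((0,), (1,)),      # A fails
    ((1, 6), (2,)),    # crew starts repairing A
    ((2,), (0, 6)),    # A repaired, crew released
    ((3,), (4,)),      # B fails
    ((4, 6), (5,)),    # crew starts repairing B
    ((5,), (3, 6)),    # B repaired, crew released
]
initial = (1, 0, 0, 1, 0, 0, 1)
markings = reachable_markings(initial, transitions)
print(len(markings), "reachable markings")
```

Because there is only one crew mark, no reachable marking has both items in repair at the same time, which is the resource-limitation behavior described above.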
Standard Petri nets do not contemplate the notion of time. However, many
extensions have appeared in which timing aspects are superimposed on the Petri
net. If a constant firing rate is assigned to each transition, then the dynamics
of the Petri net can be analyzed by a continuous-time Markov chain whose state
space is isomorphic with the reachability set of the corresponding Petri net.
The Petri net can be used as a high-level language to generate Markov models, and
some tools used for reliability analysis are based on this methodology. Petri nets
also provide a natural environment for simulation.
The use of Petri nets is recommended when complex logical interactions must be
considered (concurrency, conflict, synchronization, mutual exclusion, and resource
limitation). In addition, a Petri net is usually an easier and more natural
language to use in describing a Markov model.
The key element of a Petri net analysis is the description of the structure of the
system and its dynamic behavior in terms of primary elements (nodes, transitions,
arcs, and marks) typical of the Petri net language. This step requires the use of ad
hoc software tools.
• IEC 62740:2015: Root cause analysis (RCA) (Edition 1.0) describes the
basic principles of root cause analysis (RCA) and specifies the steps that a
process for RCA should include. This standard identifies a number of attri-
butes for RCA techniques that assist with the selection of an appropriate
technique. It describes each RCA technique and its relative strengths and
weaknesses. RCA is used to analyze the root causes of focus events with
both positive and negative outcomes, but it is most commonly used for the
analysis of failures and incidents.
Causes for such events can be varied in nature, including design processes and
techniques, organizational characteristics, human aspects, and external events.
An RCA can be used for investigating the causes of non-conformances in quality
(and other) management systems as well as for failure analysis (e.g., in maintenance
or equipment testing). An RCA is used to analyze focus events that have occurred;
therefore, this standard only covers a posteriori analyses.
It is recognized that some of the RCA techniques with adaptation can be used
proactively in the design and development of items and for causal analysis during
risk assessment; however, this standard focuses on the analysis of events that have
occurred. The intent of this standard is to describe a process for performing RCA and
to explain the techniques for identifying root causes. These techniques are not designed
to assign responsibility or liability, which is outside the scope of this standard.
The standard develops the tests to check the hypothesis of constant failure rate λ(t)
for non-repairable elements, and the tests to check the hypothesis of constant failure
intensity z(t) for repairable elements.
In Section 6.2 of the standard, the U-test (Laplace test) is developed to analyze
whether the non-repairable equipment under study shows a trend in its failure rate.
The standard also includes three graphical methods of trend testing in Sections 6.3
through 6.5, which help the analyst assess whether the non-repairable elements
under study can be assumed to have a trend.
In Section 7.2 of the standard, the procedure is developed to check if a repairable
element has a constant failure intensity z(t), based on the calculation of the U-test
(Laplace test).
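For testing completed by time:
$$U = \frac{\sum_{i=1}^{r} T_i - r\,\frac{T^*}{2}}{T^* \sqrt{\frac{r}{12}}} \tag{9.3}$$
For testing completed by failure:
$$U = \frac{\sum_{i=1}^{r-1} T_i - (r-1)\,\frac{T_r}{2}}{T_r \sqrt{\frac{r-1}{12}}} \tag{9.4}$$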
with:
r is the total number of failures
T* is the total time of the test completed by time
Tr is the total time of the test completed by failure
Ti is the cumulative time of the test in the ith failure
Under the zero-growth hypothesis (i.e., the failure times follow an HPP), the U
statistic is approximately distributed as a standard normal distribution, with
mean 0 and standard deviation 1. The U-test can be used to test whether there is
evidence of reliability growth, positive or negative, independent of the
reliability growth model.
A two-sided test for positive or negative growth with significance level α has
critical values u1−α/2 and −u1−α/2, where u1−α/2 is the (1 − α/2)·100th percentile
of the standard normal distribution. If −u1−α/2 < U < u1−α/2, then there is no
evidence of positive or negative reliability growth at significance level α. In
this case, the hypothesis of exponentially distributed times between successive
failures (an HPP) is not rejected at significance level α.
For the significance levels required in each test, the appropriate critical values
should be taken from the percentile table of the standard normal distribution, as
listed in Table 9.7.
TABLE 9.7
Critical Values for a Level
of Significance α
α Critical Value u1−α/2
0.025 2.24
0.050 1.96
0.100 1.64
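The single-item U statistic of Equations 9.3 and 9.4 and the two-sided decision rule can be sketched as below. The function names and sample data are hypothetical. In the failure-terminated case this sketch takes the sum over the first r − 1 failure times, which is the usual form of the Laplace test, and it obtains the critical value from the standard normal distribution instead of a table.

```python
import math
from statistics import NormalDist

def laplace_u(failure_times, total_time=None):
    """Laplace U statistic for one repairable item.
    failure_times: cumulative test times at each failure.
    total_time given -> test terminated by time (Equation 9.3).
    total_time None  -> test terminated at the last failure (Equation 9.4)."""
    times = sorted(failure_times)
    r = len(times)
    if total_time is not None:
        return (sum(times) - r * total_time / 2) / (total_time * math.sqrt(r / 12))
    t_r = times[-1]
    return (sum(times[:-1]) - (r - 1) * t_r / 2) / (t_r * math.sqrt((r - 1) / 12))

def critical_value(alpha):
    """Two-sided critical value u_(1 - alpha/2) of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def has_trend(u, alpha=0.05):
    """True if the constant-intensity (HPP) hypothesis is rejected."""
    return abs(u) >= critical_value(alpha)

# Hypothetical data: failures bunching toward the end of a 1000 h test.
times = [400.0, 650.0, 800.0, 900.0, 960.0, 990.0]
u = laplace_u(times, total_time=1000.0)
print(round(u, 3), has_trend(u, alpha=0.05))
```

A positive U indicates failures concentrated late in the test (increasing failure intensity), and a negative U indicates the opposite.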
$$U = \frac{\sum_{i=1}^{k}\sum_{j=1}^{r_i} T_{ij} - 0.5\left(r_1 T_1^* + r_2 T_2^* + \cdots + r_k T_k^*\right)}{\sqrt{\frac{1}{12}\left(r_1 T_1^{*2} + r_2 T_2^{*2} + \cdots + r_k T_k^{*2}\right)}} \tag{9.6}$$
with:
ri is the total number of failures to consider from the ith item
Ti* is the total time of the test for the ith item
Tij is the time accumulated at the jth failure of the ith item
k is the total number of items
As in the case of Section 7.2 of the standard, a two-sided test for positive or
negative growth with significance level α has critical values u1−α/2 and −u1−α/2,
where u1−α/2 is the (1 − α/2)·100th percentile of the standard normal distribution.
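The pooled statistic of Equation 9.6 for k items, each observed over its own total test time, can be sketched as follows. The function name and the data layout (a list of pairs of failure times and total time) are assumptions of this sketch.

```python
import math

def pooled_laplace_u(items):
    """Equation 9.6. `items` is a list of (failure_times, total_time) pairs,
    one pair per item; failure_times are cumulative times T_ij for that item."""
    sum_times = sum(sum(times) for times, _ in items)
    half_weighted = 0.5 * sum(len(times) * total for times, total in items)
    variance = sum(len(times) * total ** 2 for times, total in items) / 12
    return (sum_times - half_weighted) / math.sqrt(variance)

# Hypothetical data for two items with different observation periods.
items = [
    ([120.0, 300.0, 410.0], 500.0),
    ([200.0, 650.0, 700.0, 780.0], 800.0),
]
print(round(pooled_laplace_u(items), 3))
```

With a single item, the expression reduces to the time-terminated statistic of Equation 9.3.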
In Section 7.4 of the standard, the graphical procedure M(t) plot is developed to
check whether one or a set of repairable elements of the same characteristics has
constant failure intensity. It is a more qualitative than quantitative test.
This standard develops the statistical procedure for the exponential distribution
and allows estimating the value of the constant failure rate λ(t) for
non-repairable elements and of the constant failure intensity z(t) for repairable
elements. It also includes the formulation for the calculation of confidence
intervals, tolerance intervals, and so on.
This standard must be applied as a complement to IEC 60605-6: if the U-test does
not reject the hypothesis of exponentially distributed times between successive
failures (an HPP), the value of the constant failure rate λ(t) or constant failure
intensity z(t) can be calculated directly.
For testing completed by time and non-repairable items, the point estimate of the
failure rate:
$$\hat{\lambda} = \frac{r}{T^*} \tag{9.7}$$
For test terminated by failure:
$$\hat{\lambda} = \frac{r}{T^*} \tag{9.8}$$
with:
r is the total number of failures in test
T* is the total time of the test completed by time or by failure
For testing completed by time and repairable elements, the point estimate of the
failure intensity:
$$\hat{z} = \frac{r}{T^*} \tag{9.9}$$
For test terminated by failure:
$$\hat{z} = \frac{r}{T^*} \tag{9.10}$$
with:
r is the total number of failures in test
T* is the total time of the test completed by time or by failure
$$z_{L2} = \lambda_{L2} = \frac{\chi^2_{\alpha/2}(2r)}{2T^*} \tag{9.11}$$
$$z_{U2} = \lambda_{U2} = \frac{\chi^2_{1-\alpha/2}(2r+2)}{2T^*} \tag{9.12}$$
with:
χ² is the fractile table value of the χ² distribution for the 90 percent confidence
interval.
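The point estimate and the limits of Equations 9.11 and 9.12 can be sketched as below, reading them as the lower and upper limits of a two-sided interval. Two assumptions are made here and should be kept in mind: the function names are hypothetical, and the χ² fractiles are computed with the Wilson-Hilferty approximation so that only the Python standard library is needed. In practice the fractiles would be read from a table or taken from a statistics library.

```python
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Approximate p-fractile of the chi-squared distribution with `dof`
    degrees of freedom (Wilson-Hilferty approximation, not an exact value)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3

def failure_rate_estimate(r, total_time, confidence=0.90):
    """Return (lower, point, upper) for a constant failure rate or intensity.
    r: number of failures (r >= 1); total_time: accumulated test time T*."""
    alpha = 1.0 - confidence
    point = r / total_time
    lower = chi2_quantile(alpha / 2, 2 * r) / (2 * total_time)
    upper = chi2_quantile(1 - alpha / 2, 2 * r + 2) / (2 * total_time)
    return lower, point, upper

# Hypothetical test: 10 failures in 1000 h of accumulated test time.
lower, point, upper = failure_rate_estimate(10, 1000.0)
print(lower, point, upper)
```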
In addition, the standard allows for prediction intervals for failures for a
future period in Section 9.6 and a procedure for assigning tolerance intervals in
Section 9.7.
• IEC 61649:2008: Weibull analysis (Edition 2.0) provides methods for ana-
lyzing data from a Weibull distribution using continuous parameters such as
time to failure, cycles to failure, mechanical stress, and so on. This standard
is applicable whenever data on strength parameters such as times to fail-
ure, cycles, and stress are available for a random sample of items operating
under test conditions or in-service to estimate measures of reliability per-
formance of the population from which these items were drawn. The main
changes with respect to the previous edition are as follows: the title has been
shortened and simplified to read “Weibull analysis” and provision of meth-
ods for both analytical and graphical solutions has been added.
In non-repairable items, when the failure rate λ(t) does not have a constant behavior
over time, usually the Weibull distribution is tried:
$$f(t) = \beta\alpha\,(\alpha t)^{\beta-1}\, e^{-(\alpha t)^{\beta}} \tag{9.13}$$
$$R(t) = e^{-(\alpha t)^{\beta}} \tag{9.14}$$
$$\lambda(t) = \beta\alpha\,(\alpha t)^{\beta-1} \tag{9.15}$$
where:
α is the scale parameter
β is the shape parameter
f(t) is the probability density function of the failure
R(t) is the reliability function
The Weibull distribution can model data regardless of whether the failure rate is
increasing, decreasing, or constant. It is flexible and can be adapted to a wide
variety of data.
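Equations 9.13 through 9.15 translate directly into code. In the sketch below the function names are hypothetical and the parameter values are invented. It uses the same parameterization as the equations, in which α multiplies t, and it shows how the shape parameter β controls the trend of the failure rate.

```python
import math

def weibull_pdf(t, alpha, beta):
    """Equation 9.13: probability density function f(t)."""
    return beta * alpha * (alpha * t) ** (beta - 1) * math.exp(-((alpha * t) ** beta))

def weibull_reliability(t, alpha, beta):
    """Equation 9.14: reliability function R(t)."""
    return math.exp(-((alpha * t) ** beta))

def weibull_failure_rate(t, alpha, beta):
    """Equation 9.15: instantaneous failure rate, equal to f(t) / R(t)."""
    return beta * alpha * (alpha * t) ** (beta - 1)

# beta < 1: decreasing rate, beta = 1: constant rate, beta > 1: increasing rate.
for beta in (0.5, 1.0, 2.0):
    rates = [weibull_failure_rate(t, 0.01, beta) for t in (50.0, 100.0, 200.0)]
    print(beta, [round(x, 5) for x in rates])
```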
The standard contemplates the Weibull distribution with two and three param-
eters, graphical methods, and goodness of fit. It also includes a section for the inter-
pretation of the resulting probability graph.
It develops computational methods for the point estimation of parameters by
means of maximum likelihood estimation (MLE), confidence intervals, as well as
the Weibayes approach, and the “sudden death” method.
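• IEC 61710:2013: Power law model: Goodness-of-fit tests and estimation methods
(Edition 2.0) covers estimation methods and goodness-of-fit tests for the
application of the power law model to data from repairable items. It is assumed
that the time to failure data have been collected from an item or some identical
items operating under the same conditions (e.g., environment and load).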
This standard develops the statistical procedure for an NHPP by means of PLP and
allows estimating the value of the failure intensity z(t) for tests of one or more repair-
able items in tests terminated by time or by failure. It also allows the estimation of
the z(t) in tests for groups of failures in time intervals.
This standard must be applied as a complement to IEC 60605-6: if the U-test
rejects the hypothesis of constant failure intensity, there is a trend (an
increasing or decreasing failure intensity) and the PLP may be applicable:
$$E[N(t)] = \lambda t^{\beta} \tag{9.16}$$
The methods of estimating z(t) differ according to the type of test carried out:
• One or more repairable items observed over the same period of time (the
statistics of Section 7.2.1 of the standard are applied)
• Multiple repairable items observed in different time intervals (the statistics
of Section 7.2.2 of the standard are applied)
• Groups of failures in time intervals (the statistics of Section 7.2.3 of the
standard are applied)
For one or multiple repairable items observed over the same period of time
(Section 7.2.1 of the standard), the following summation is calculated:
$$S_1 = \sum_{j=1}^{N} \ln\frac{T^*}{t_j}; \quad \text{for tests terminated by time} \tag{9.18}$$
$$S_2 = \sum_{j=1}^{N} \ln\frac{t_N}{t_j}; \quad \text{for tests terminated by failure} \tag{9.19}$$
with:
T* is the total time of the test completed by time
tN is the total time of the test completed by failure
tj is the cumulative time of test in jth failure
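$$\hat{\beta} = \frac{N-1}{S_1}; \quad \text{for tests terminated by time} \tag{9.20}$$
$$\hat{\beta} = \frac{N-2}{S_2}; \quad \text{for tests terminated by failure} \tag{9.21}$$
$$\hat{\lambda} = \frac{N}{k\,(T^*)^{\hat{\beta}}}; \quad \text{for tests terminated by time} \tag{9.22}$$
$$\hat{\lambda} = \frac{N}{k\,(t_N)^{\hat{\beta}}}; \quad \text{for tests terminated by failure} \tag{9.23}$$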
with:
N is the total number of failures accumulated in test
k is the total number of test items
$$\hat{z}(t) = \hat{\lambda}\,\hat{\beta}\, t^{\hat{\beta}-1} \tag{9.24}$$
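The estimation steps of Equations 9.18 through 9.24 can be sketched as a single function. The function names and the sample failure times are hypothetical, and the sketch assumes all k items are observed over the same period, with the failure times pooled on a common time scale.

```python
import math

def plp_estimate(failure_times, k=1, total_time=None):
    """Power-law process estimates (lambda_hat, beta_hat).
    failure_times: pooled cumulative test times t_j of the N failures.
    k: number of items under test.
    total_time given -> test terminated by time (Eqs. 9.18, 9.20, 9.22).
    total_time None  -> test terminated at the last failure (Eqs. 9.19, 9.21, 9.23)."""
    times = sorted(failure_times)
    n = len(times)
    if total_time is not None:
        s1 = sum(math.log(total_time / t) for t in times)
        beta = (n - 1) / s1
        lam = n / (k * total_time ** beta)
    else:
        t_n = times[-1]
        s2 = sum(math.log(t_n / t) for t in times)  # the last term is ln(1) = 0
        beta = (n - 2) / s2
        lam = n / (k * t_n ** beta)
    return lam, beta

def failure_intensity(t, lam, beta):
    """Equation 9.24: estimated failure intensity at time t."""
    return lam * beta * t ** (beta - 1)

# Hypothetical single item, test stopped at 1000 h.
times = [35.0, 110.0, 240.0, 420.0, 700.0, 950.0]
lam_hat, beta_hat = plp_estimate(times, k=1, total_time=1000.0)
print(lam_hat, beta_hat, failure_intensity(1000.0, lam_hat, beta_hat))
```

An estimate of β below 1 corresponds to a decreasing failure intensity (reliability growth), and an estimate above 1 to an increasing one.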
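$$\frac{N}{\hat{\beta}} + \sum_{i=1}^{N} \ln t_i - N\,\frac{\sum_{j=1}^{k} T_j^{\hat{\beta}} \ln T_j}{\sum_{j=1}^{k} T_j^{\hat{\beta}}} = 0 \tag{9.25}$$
$$\hat{\lambda} = \frac{N}{\sum_{j=1}^{k} T_j^{\hat{\beta}}} \tag{9.26}$$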
with:
N is the total number of failures accumulated in the test
k is the total number of items
ti is time to the ith failure (i = 1, 2, …, N)
Tj is the total time of observation for item j = 1, 2, …, k
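Equation 9.25 has no closed-form solution for the shape parameter when the items have different observation times, so it is solved numerically. The sketch below uses plain bisection, which is one possible choice and not necessarily the procedure of the standard; the function name, tolerance, and search bounds are assumptions. The weights are scaled by the largest observation time to avoid overflow, which leaves the ratio in Equation 9.25 unchanged.

```python
import math

def plp_mle_multi(failure_times, observation_times, tol=1e-10):
    """Solve Equation 9.25 for beta by bisection, then apply Equation 9.26.
    failure_times: all N failure times t_i, pooled over the items.
    observation_times: total observation time T_j of each of the k items."""
    n = len(failure_times)
    sum_log_t = sum(math.log(t) for t in failure_times)
    t_max = max(observation_times)

    def g(beta):
        # Left-hand side of Equation 9.25; it decreases as beta increases.
        w = [(T / t_max) ** beta for T in observation_times]
        mean_log = sum(wi * math.log(T) for wi, T in zip(w, observation_times)) / sum(w)
        return n / beta + sum_log_t - n * mean_log

    lo, hi = 1e-6, 1.0
    while g(hi) > 0:  # widen the bracket until the sign changes
        hi *= 2.0
        if hi > 1e3:
            raise ValueError("no root found; check the input data")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    lam = n / sum(T ** beta for T in observation_times)
    return lam, beta

# Hypothetical data: two items observed for 500 h and 800 h.
lam_hat, beta_hat = plp_mle_multi([120.0, 300.0, 410.0, 200.0, 650.0, 780.0], [500.0, 800.0])
print(lam_hat, beta_hat)
```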
The goodness-of-fit test given in IEC 61710 (2013) is the Cramér–von Mises statistic
C², with M = N and T = T* for tests terminated by time, and M = N − 1 and
T = tN for tests terminated by failure:
$$C^2 = \frac{1}{12M} + \sum_{j=1}^{M}\left[\left(\frac{t_j}{T}\right)^{\hat{\beta}} - \frac{2j-1}{2M}\right]^2 \tag{9.27}$$
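The statistic of Equation 9.27 can be computed as sketched below. The function name is hypothetical, and the estimate of β is passed in as an argument. The critical values against which C² is compared come from the tables of the standard and are not reproduced here.

```python
def cramer_von_mises(failure_times, beta, total_time=None):
    """Equation 9.27.
    total_time given -> test terminated by time: M = N, T = T*.
    total_time None  -> test terminated by failure: M = N - 1, T = t_N."""
    times = sorted(failure_times)
    if total_time is not None:
        m, big_t = len(times), total_time
    else:
        m, big_t = len(times) - 1, times[-1]
    return 1.0 / (12 * m) + sum(
        ((times[j - 1] / big_t) ** beta - (2 * j - 1) / (2 * m)) ** 2
        for j in range(1, m + 1)
    )

# Hypothetical data and shape estimate, test stopped at 1000 h.
times = [35.0, 110.0, 240.0, 420.0, 700.0, 950.0]
print(round(cramer_von_mises(times, beta=0.75, total_time=1000.0), 4))
```

The smallest possible value, 1/(12M), is reached when each transformed time falls exactly on (2j − 1)/(2M); larger values indicate a poorer fit of the power law model.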
9.9 CONCLUSIONS
The IEC standards published in the field of reliability provide maintenance
engineers with tools, procedures, and methods to deal with a large part of the
management and control activities that they have to carry out, in a standardized
and auditable manner, with the support of official, business, and scientific
community organizations.
REFERENCES
IEC/ISO 31010:2009 Edition 1.0, Risk Management: Risk Assessment Techniques,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 60300-1:2014 Edition 3.0, Dependability Management: Part 1: Guidance for
Management and Application, International Electrotechnical Commission (IEC),
Geneva, Switzerland.
IEC 60300-3-1:2003 Edition 2.0, Dependability Management: Part 3-1: Application Guide—
Analysis Techniques for Dependability—Guide on Methodology, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 60300-3-2:2004 Edition 2.0, Dependability Management: Part 3-2: Application Guide—
Collection of Dependability Data from the Field, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 61025:2006 Edition 2.0, Fault Tree Analysis (FTA), International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 61070:1991 Edition 1.0, Compliance Test Procedures for Steady-State Availability,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61078:2016 Edition 3.0, Reliability Block Diagrams, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 61123:1991 Edition 1.0, Reliability Testing: Compliance Test Plans for Success Ratio,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61124:2012 Edition 3.0, Reliability Testing: Compliance Tests for Constant Failure Rate
and Constant Failure Intensity, International Electrotechnical Commission (IEC),
Geneva, Switzerland.
IEC 61160:2005 Edition 2.0, Design Review, International Electrotechnical Commission
(IEC), Geneva, Switzerland.
IEC 61163-1:2006 Edition 2.0, Reliability Stress Screening: Part 1: Repairable Assemblies
Manufactured in Lots, International Electrotechnical Commission (IEC), Geneva,
Switzerland.
IEC 61163-2:1998 Edition 1.0, Reliability Stress Screening: Part 2: Electronic Components,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61164:2004 Edition 2.0, Reliability Growth: Statistical Test and Estimation Methods,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61165:2006 Edition 2.0, Application of Markov Techniques, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 61649:2008 Edition 2.0, Weibull Analysis, International Electrotechnical Commission
(IEC), Geneva, Switzerland.
IEC 61650:1997 Edition 1.0, Reliability Data Analysis Techniques: Procedures for
Comparison of Two Constant Failure Rates and Two Constant Failure (event)
Intensities, International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61703:2016 Edition 2.0, Mathematical Expressions for Reliability, Availability,
Maintainability and Maintenance Support Terms, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 61709:2017 Edition 3.0, Electric Components: Reliability: Reference Conditions for
Failure Rates and Stress Models for Conversion, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 61710:2013 Edition 2.0, Power Law Model: Goodness-of-fit Tests and Estimation
Methods, International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61882:2016 Edition 2.0, Hazard and Operability Studies (HAZOP studies): Application
Guide, International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 61907:2009 Edition 1.0, Communication Network Dependability Engineering,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62198:2013 Edition 2.0, Managing Risk in Projects: Application Guidelines, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62308:2006 Edition 1.0, Equipment Reliability: Reliability Assessment Methods,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62309:2004 Edition 1.0, Dependability of Products Containing Reused Parts:
Requirements for Functionality and Tests, International Electrotechnical Commission
(IEC), Geneva, Switzerland.
IEC 62347:2006 Edition 1.0, Guidance on System Dependability Specifications, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62402:2007 Edition 1.0, Obsolescence Management: Application Guide, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62429:2007 Edition 1.0, Reliability Growth: Stress Testing for Early Failures in Unique
Complex Systems, International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62502:2010 Edition 1.0, Analysis Techniques for Dependability: Event Tree Analysis
(ETA), International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62506:2013 Edition 1.0, Methods for Product Accelerated Testing, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62508:2010 Edition 1.0, Guidance on Human Aspects of Dependability, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62550:2017 Edition 1.0, Spare Parts Provisioning, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 62551:2012 Edition 1.0, Analysis Techniques for Dependability: Petri Net Techniques,
International Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62628:2012 Edition 1.0, Guidance on Software Aspects of Dependability, International
Electrotechnical Commission (IEC), Geneva, Switzerland.
IEC 62673:2013 Edition 1.0, Methodology for Communication Network Dependability
Assessment and Assurance, International Electrotechnical Commission (IEC), Geneva,
Switzerland.
IEC 62740:2015 Edition 1.0, Root Cause Analysis (RCA), International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 62741:2015 Edition 1.0, Demonstration of Dependability Requirements:
The Dependability Case, International Electrotechnical Commission (IEC), Geneva,
Switzerland.
IEC TS 62775:2016 Edition 1.0, Application Guidelines: Technical and Financial Processes
for Implementing Asset Management Systems, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC 62853:2018 Edition 1.0, Open Systems Dependability, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
IEC TR 63039:2016 Edition 1.0, Probabilistic Risk Analysis of Technological Systems:
Estimation of Final Event Rate at a Given Initial State, International Electrotechnical
Commission (IEC), Geneva, Switzerland.
10 Time-Variant Reliability
Analysis Methods for
Dynamic Structures
Zhonglai Wang and Shui Yu
CONTENTS
10.1 Introduction
10.2 Time-Variant Reliability
10.3 The Proposed Three Time-Variant Reliability Analysis Methods
10.3.1 Failure Processes Decomposition Method
10.3.1.1 The FMTP Search for the Time-Variant Limit State Function
10.3.1.2 Failure Processes Decomposition Based on Taylor Expansion
10.3.1.3 Failure Processes Decomposition Based on Case Classification
10.3.1.4 Kernel Density Estimation Method for the Decomposed Model
10.3.2 The Combination of the Extreme Value Moment and Improved Maximum Entropy (EVM-IME) Method
10.3.2.1 Determination of the Extreme Value Moments by the Sparse Grid Technique
10.3.2.2 The Improved Maximum Entropy Method Based on the Raw Moments
10.3.3 Probability Density Function of the First-Passage Time Point Method
10.3.3.1 Time-Variant Reliability Model Based on PDF of F-PTP
10.3.3.2 Establish f(τ) by Using the Maximum Entropy Method Combined with the Moment Method
10.4 Examples and Discussions
10.4.1 Numerical Example
10.4.2 A Corroded Simple Supported Beam Under Random Loadings
10.5 Conclusions
Appendix: Discretization of Random Processes
References
10.1 INTRODUCTION
Reliability analysis aims to estimate the probability that products perform their
intended function under the specified working conditions during their lifecycle.
For highly reliable products, it is difficult to collect enough data to conduct
reliability analysis using statistics-based methods. Because it builds on the
failure mechanisms of products, the physics-based method is a suitable choice for
reliability analysis with insufficient data. Traditional physics-based static
(time-invariant) reliability analysis methods have been developed extensively,
such as the First Order Reliability Method (FORM) [1], the Second Order
Reliability Method (SORM) [2], the moment-based method [3], and surrogate
models [4]. These methods only consider the static performance or simplify the
dynamic performance to a static one. For most products, the performance is
usually dynamic because of various time-varying loadings, working conditions, and
inherent motion. Time-invariant reliability analysis methods have shown poor
capability in satisfying the accuracy requirements for time-varying and highly
nonlinear performance functions of products [5]. Such engineering requirements
have therefore fostered the development of several time-variant reliability
analysis methods.
Time-variant reliability analysis aims to estimate the probability that products
successfully complete their intended function during a given time interval.
There are typically two categories of time-variant reliability analysis methods:
simulation and analytical. Typical analytical methods include the Gamma process
method [6], the extreme value method [7], the composite limit state method [8],
the compound random processes method [9], and crossing-rate based methods [10,11].
In the Gamma process, extreme value, and compound random processes methods,
system parameters or performance functions are usually assumed to follow a certain
distribution, and this model approximation can produce a high model error.
When handling highly nonlinear limit state functions, the computational accuracy
of the composite limit state method may be unsatisfactory. After the crossing-rate
method was first proposed [10,11], many crossing-based methods were developed:
e.g., the differential Gaussian process method [12], the rectangular wave renewal
process method [13], the Laplace integration method [14], the PHI2 method [15], and
the PHI2+ method [16]. The differential Gaussian process method, the rectangular
wave renewal process method, and the Laplace integration method are suitable mainly
for the crossing-rate calculation of some specific random processes. The PHI2 and
PHI2+ methods use the parallel reliability framework to improve the computational
accuracy and further broaden the application range of the crossing-rate methods.
However, the PHI2 and PHI2+ methods show lower computational accuracy when dealing
with the time-variant reliability analysis of non-monotonic systems [16].
The other branch of the time-variant reliability analysis is the simulation
methods. The typical simulation methods are MCS, importance sampling (IS),
and subset simulation (SS) methods. MCS is a direct and easy-to-use method,
regardless of the dimensions and nonlinearity of limit state functions, but the
$$R_t(t_{lb}, t_{ub}) = \Pr\left\{ g\left(\mathbf{d}, \mathbf{X}, \mathbf{Y}(t), t\right) > 0, \ \forall t \in [t_{lb}, t_{ub}] \right\} \tag{10.1}$$
where:
g(•) is the time-variant limit state function for a certain structure
d denotes the vector of deterministic design variables
X defines the vector of random design variables and parameters
Y(t) is the vector of time-variant random design variables and parameters,
i.e., stochastic processes
tlb and tub are lower and upper boundaries of the time interval
where Z = [ X, N ].
$$T = \frac{t - t_{lb}}{t_{ub} - t_{lb}} \tag{10.3}$$
Since t ∈ [tlb , tub ], T belongs to [ 0,1] in Equation 10.3. With the normalization in
Equation 10.3, the expression of the time-variant reliability in Equation (10.2) can
be rewritten as:
$$R_T(0,1) = \Pr\left\{ g^T(\mathbf{d}, \mathbf{Z}, T) > 0, \ \forall T \in [0,1] \right\} \tag{10.4}$$
$$R_T(0,1) = \Pr\left\{ g^T_{\min}(\mathbf{d}, \mathbf{Z}, T) > 0, \ T \in [0,1] \right\} \tag{10.5}$$
$$\begin{aligned}
&\text{find: } T \\
&\text{minimize: } g^T(\mathbf{d}, \mathbf{Z}, T) \\
&\text{subject to: } T \in [0,1]
\end{aligned} \tag{10.6}$$
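$$\begin{aligned}
g^T(\mathbf{d}, \mathbf{Z}, T) &\approx g^T(\mathbf{d}, \mathbf{Z}, T^*) + \left.\frac{\partial g^T}{\partial T}\right|_{T=T^*}(T - T^*) + \frac{1}{2}\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*}(T - T^*)^2 \\
&= aT^2 + bT + c
\end{aligned} \tag{10.7}$$
where
$$a = \frac{1}{2}\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*}$$
$$b = \left.\frac{\partial g^T}{\partial T}\right|_{T=T^*} - T^*\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*}$$
$$c = g^T(\mathbf{d}, \mathbf{Z}, T^*) - T^*\left.\frac{\partial g^T}{\partial T}\right|_{T=T^*} + \frac{T^{*2}}{2}\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*}$$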
When the second derivative of the approximate limit state function to T equals 0,
the approximate limit state function is a monotonic function of T. Therefore,
$g^T_{\min}(\mathbf{d}, \mathbf{Z}, T) = g^T(\mathbf{d}, \mathbf{Z}, 0)$ or
$g^T_{\min}(\mathbf{d}, \mathbf{Z}, T) = g^T(\mathbf{d}, \mathbf{Z}, 1)$, and the reliability can be
obtained for this case:
$$R_T(0,1) = \Pr\left\{ \min\left[ g^T(\mathbf{d}, \mathbf{Z}, 0), g^T(\mathbf{d}, \mathbf{Z}, 1) \right] > 0 \right\} \tag{10.8}$$
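$$\begin{aligned}
C_1 &= \left\{ T_s < 0, \ g^T(\mathbf{d}, \mathbf{Z}, 0) > 0, \ g^T(\mathbf{d}, \mathbf{Z}, 1) > 0 \right\} \\
&= \left\{ -\frac{b}{2a} < 0, \ c > 0, \ a + b + c > 0 \right\}
\end{aligned} \tag{10.9}$$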
FIGURE 10.2 The geometrical relationship between g T ( d, Z, T ) and T. (a) case 1 of the
safety situation, (b) case 2 of the safety situation, (c) case 3 of the safety situation.
TABLE 10.1
Properties for the Three Cases
Cases     Position of Ts     Location of Minimum Point
Case 1    Ts < 0             T = 0 or 1
Case 2    0 ≤ Ts < 1         T = 0 or Ts or 1
Case 3    Ts ≥ 1             T = 0 or 1
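The quadratic approximation of Equation 10.7 and the minimum locations listed in Table 10.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the derivatives at T* are replaced by central finite differences, which is a choice of this sketch rather than something prescribed by the chapter.

```python
def quadratic_coefficients(g, t_star, h=1e-3):
    """Return (a, b, c) of the second-order Taylor expansion of g about t_star.

    Assumption of this sketch: derivatives are approximated by central
    finite differences with step h.
    """
    g0 = g(t_star)
    d1 = (g(t_star + h) - g(t_star - h)) / (2.0 * h)          # dg/dT at T*
    d2 = (g(t_star + h) - 2.0 * g0 + g(t_star - h)) / h ** 2  # d2g/dT2 at T*
    a = 0.5 * d2
    b = d1 - t_star * d2
    c = g0 - t_star * d1 + 0.5 * t_star ** 2 * d2
    return a, b, c


def quadratic_minimum_on_unit_interval(a, b, c):
    """Minimum of a*T^2 + b*T + c over T in [0, 1]."""
    candidates = [c, a + b + c]          # end points T = 0 and T = 1
    if a > 0.0:
        t_s = -b / (2.0 * a)             # stationary point T_s
        if 0.0 < t_s < 1.0:              # interior minimum (Case 2)
            candidates.append(c - b * b / (4.0 * a))
    return min(candidates)


if __name__ == "__main__":
    # Hypothetical limit state in normalized time, used only for illustration.
    a, b, c = quadratic_coefficients(lambda T: 2.0 * T ** 2 - 3.0 * T + 5.0, t_star=0.5)
    print(a, b, c, quadratic_minimum_on_unit_interval(a, b, c))
```

The limit state is positive over the whole interval exactly when this minimum is positive, which is the condition that the events C1 to C3 encode case by case.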
$$C_2 = \left\{ 0 < -\frac{b}{2a} < 1, \ c - \frac{b^2}{4a} > 0, \ c > 0, \ a + b + c > 0 \right\} \tag{10.10}$$
$$C_3 = \left\{ -\frac{b}{2a} > 1, \ c > 0, \ a + b + c > 0 \right\} \tag{10.11}$$
Because the three events C1 ∼ C3 are mutually exclusive, the PDF of the system time-
invariant reliability transformed from the time-variant reliability can be expressed by:
f (ς ) = f1 (ς ) + f 2 (ς ) + f3 (ς ) (10.12)
where f1 (ς ), f 2 (ς ), and f3 (ς ) denote the PDF of Case 1, Case 2, and Case 3 occur-
ring, respectively. Therefore, the time-variant reliability can be calculated by the
numerical integration:
$$R_T(0,1) = \int_0^{+\infty} \left[ f_1(\varsigma) + f_2(\varsigma) + f_3(\varsigma) \right] d\varsigma \tag{10.13}$$
The KDE method will be employed to calculate the PDF for each case.
$$\begin{aligned}
g_1 &= \frac{b}{2a} \\
g_2 &= a + b + c \\
g_3 &= c \\
g_4 &= c - \frac{b^2}{4a} \\
g_5 &= 1 + \frac{b}{2a}
\end{aligned} \tag{10.14}$$
Because of the similar procedure for calculating the system reliability for each case,
Case 1 is taken for an example. The limit state function for the event C1 is:
$$G_{C_1}(\mathbf{Z}) = \min\{g_1, g_2, g_3\} = \min\left\{ \frac{b}{2a}, \ a + b + c, \ c \right\} \tag{10.15}$$
M samples are directly drawn from the limit state function $G_{C_1}(\mathbf{Z})$, and the vector
of samples can be obtained as $\mathbf{G}_{C_1} = \{\zeta_1^{C_1}, \zeta_2^{C_1}, \cdots, \zeta_M^{C_1}\}$. Then the PDF
$f_{C_1}(\zeta)$ for the event $C_1$ is:
$$f_{C_1}(\zeta) = \frac{1}{M h_{C_1}} \sum_{i=1}^{M} K\left( \frac{\zeta - \zeta_i^{C_1}}{h_{C_1}} \right) \tag{10.16}$$
where:
$K(\bullet)$ is the kernel function, and the Gaussian kernel function is considered in this
model: i.e., $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{u^2}{2} \right)$
$h$ is the bandwidth of the kernel function
The bandwidth of the kernel function is important for the prediction accuracy, and
the optimal value of h is:
$$h_{opt} = \left( \frac{4}{3M} \right)^{0.2} \sigma(\mathbf{G}) \tag{10.17}$$
$$f_1(\varsigma) = f_{C_1}(\zeta) = \frac{1}{M h_{C_1}} \sum_{i=1}^{M} K\left( \frac{\zeta - \zeta_i^{C_1}}{h_{C_1}} \right) \tag{10.18}$$
With the same procedure, PDFs are estimated for events C2 and C3 based on the KDE
method. Using the estimated PDFs, the time-invariant system reliability is obtained:
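$$\begin{aligned}
R_T(0,1) &= \int_0^{+\infty} \left[ f_1(\varsigma) + f_2(\varsigma) + f_3(\varsigma) \right] d\varsigma \\
&= \frac{1}{M} \sum_{k=1}^{3} \sum_{i=1}^{M} \Phi\left( \frac{\zeta_i^{C_k}}{h_{C_k}} \right)
\end{aligned} \tag{10.19}$$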
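The kernel density step of Equations 10.16, 10.17, and 10.19 can be illustrated with a small Python sketch for a single event. The function names and the toy Gaussian samples are assumptions made for illustration; in the method itself the samples would be the values of the limit state function of each event C1 to C3, and the three contributions would be summed.

```python
import math
import random
import statistics


def normal_cdf(u):
    """Standard normal CDF Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def optimal_bandwidth(samples):
    """h_opt = (4 / (3 M))^0.2 * sigma(G), as in Equation 10.17."""
    return (4.0 / (3.0 * len(samples))) ** 0.2 * statistics.stdev(samples)


def kde_pdf(samples, h, zeta):
    """Gaussian kernel density estimate at zeta, as in Equation 10.16."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((zeta - s) / h) ** 2) for s in samples)


def kde_mass_above_zero(samples, h):
    """Integral of the Gaussian KDE over (0, +inf): (1/M) * sum Phi(zeta_i / h)."""
    return sum(normal_cdf(s / h) for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy limit state samples G ~ N(1, 1); the exact Pr{G > 0} is Phi(1).
    g_samples = [rng.gauss(1.0, 1.0) for _ in range(5000)]
    h = optimal_bandwidth(g_samples)
    print(kde_mass_above_zero(g_samples, h), normal_cdf(1.0))
```

Because the kernel is Gaussian, the integral over the positive half-line has the closed form used in `kde_mass_above_zero`, so no numerical integration is needed for the reliability itself.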
$$R_T(0,1) = \Pr\left\{ g^N(\mathbf{d}, \mathbf{N}, T) > 0, \ \forall T \in [0,1] \right\} \tag{10.20}$$
+∞ +∞ l
k −1 mi1 mik
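where the abscissas and weights are $x_{j_i}^{i_i} = \sqrt{2}\,\xi_{j_i}^{i_i}$ and $p_{j_i}^{i_i} = \frac{1}{\sqrt{\pi}}\,\zeta_{j_i}^{i_i}$, and $\xi_{j_i}^{i_i}$ and $\zeta_{j_i}^{i_i}$ are
the abscissas and weights in the Gauss-Hermite quadrature formula; $j_i = 1, \cdots, m_{i_i}$; the
multi-index $\mathbf{i} = (i_1, \cdots, i_k) \in \mathbb{N}_+^k$; and the set $H(q, k)$ is defined by:
$$H(q, k) = \left\{ \mathbf{i} = (i_1, \cdots, i_k) \in \mathbb{N}_+^k, \ \mathbf{i} \geq \mathbf{1} : q + 1 \leq \sum_{r=1}^{k} i_r \leq q + k \right\} \tag{10.23}$$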
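To make the index set of Equation 10.23 concrete, the following Python sketch enumerates H(q, k) by brute force. The function name is hypothetical, and the enumeration is only meant for small k and q; it does not build the quadrature nodes or weights.

```python
from itertools import product


def index_set(q, k):
    """Multi-indices i = (i_1, ..., i_k), i_r >= 1, with q + 1 <= sum(i) <= q + k."""
    # No component can exceed q + 1, because the other k - 1 components are >= 1.
    return [i for i in product(range(1, q + 2), repeat=k)
            if q + 1 <= sum(i) <= q + k]


if __name__ == "__main__":
    for multi_index in index_set(2, 2):
        print(multi_index)
```

Each multi-index selects one tensor product of one-dimensional Gauss-Hermite rules, so the size of this set drives the number of function calls of the sparse grid technique.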
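$$\begin{aligned}
&\text{find: } T \\
&\text{minimize: } g^N\left(\mathbf{d}, x_{j_1}^{i_1}, \cdots, x_{j_k}^{i_k}, T\right) \\
&\text{subject to: } T \in [0,1]
\end{aligned} \tag{10.24}$$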
$$\begin{aligned}
&\text{find: } p(x) \\
&\text{maximize: } H = -\int p(x) \ln p(x)\,dx \\
&\text{subject to: } \int x^i p(x)\,dx = M_i, \quad i = 0, 1, \cdots, l
\end{aligned} \tag{10.25}$$
where:
p( x ) is the PDF of the time-invariant limit state function g ( X )
H is the entropy of the PDF p( x )
M i is the ith raw moment and M0 = 1
l is the number of the given moment constraints, which is defined to be 4 here
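$\frac{\delta L}{\delta p(x)} = 0$ is satisfied for calculating the optimal solution, and therefore the analytical
expression of $p(x)$ can be easily obtained by:
$$p(x) = \exp\left( -\sum_{i=0}^{4} \lambda_i x^i \right). \tag{10.27}$$
$$I(\boldsymbol{\lambda}) = \lambda_0 + \sum_{i=1}^{4} \lambda_i M_i \tag{10.28}$$
where $\lambda_0 = \ln\left[ \int \exp\left( -\sum_{i=1}^{4} \lambda_i x^i \right) dx \right]$. The optimization with equality constraints in
Equation 10.25 can be converted into an unconstrained optimization:
$$\begin{aligned}
&\text{find: } \lambda_1, \lambda_2, \lambda_3, \lambda_4 \\
&\text{minimize: } I = \ln\left[ \int \exp\left( -\sum_{i=1}^{4} \lambda_i x^i \right) dx \right] + \sum_{i=1}^{4} \lambda_i M_i
\end{aligned} \tag{10.29}$$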
With the obtained raw moments, the PDF p ( x ) of the limit state function g ( X ) can
be acquired from the optimization model in Equation 10.29:
$$p(x) = \exp\left( -\ln\left[ \int \exp\left( -\sum_{i=1}^{4} \lambda_i x^i \right) dx \right] - \sum_{i=1}^{4} \lambda_i x^i \right) \tag{10.30}$$
Reliability is then calculated based on the PDF p ( x ) from the maximum entropy
method:
$$R = \int_0^{+\infty} p(x)\,dx \tag{10.31}$$
The reliability results from this method may not be accurate due to the truncation
error of the integral. To address this issue, a monotonic scaling function
is introduced to improve the computational accuracy of the maximum entropy
method. The truncation error of the numerical integration can be greatly reduced
by changing the definition domain of the PDF from an infinite interval to a limited
interval.
The scaling function is expressed as:
$$g_s(\mathbf{X}) = -\frac{1 - \exp\left( \frac{g(\mathbf{X})}{c} \right)}{1 + \exp\left( \frac{g(\mathbf{X})}{c} \right)} \tag{10.32}$$
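$$\begin{aligned}
&\text{find: } \lambda_1, \lambda_2, \lambda_3, \lambda_4 \\
&\text{minimize: } I = \ln\left[ \int_{-1}^{1} \exp\left( -\sum_{i=1}^{4} \lambda_i x^i \right) dx \right] + \sum_{i=1}^{4} \lambda_i M_i^s
\end{aligned} \tag{10.33}$$
$$R = \int_0^1 p(x)\,dx \tag{10.34}$$
where $M_i^s$ is the ith raw moment of $g_s(\mathbf{X})$, and $p(x)$ is the PDF obtained from $M_i^s$.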
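The unconstrained problem of Equation 10.33 and the reliability of Equation 10.34 can be sketched with the Python standard library only. Several points are assumptions of this sketch and not part of the chapter: the function names, the trapezoid quadrature on [-1, 1], and the damped Newton iteration used as the optimizer (any unconstrained optimizer could be used instead). The target moments in the usage example are made up for illustration.

```python
import math


def trapezoid(f, a, b, n=1000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + j * h) for j in range(1, n)))


def kernel(lam, x):
    """Unnormalized density exp(-sum_{i>=1} lam_i x^i)."""
    return math.exp(-sum(l * x ** (i + 1) for i, l in enumerate(lam)))


def raw_moments(lam, order, lo=-1.0, hi=1.0):
    """Raw moments 1..order of the normalized density on [lo, hi]."""
    z = trapezoid(lambda x: kernel(lam, x), lo, hi)
    return [trapezoid(lambda x, k=k: x ** k * kernel(lam, x), lo, hi) / z
            for k in range(1, order + 1)]


def dual(lam, moments, lo=-1.0, hi=1.0):
    """Objective I of Equation 10.33."""
    z = trapezoid(lambda x: kernel(lam, x), lo, hi)
    return math.log(z) + sum(l * m for l, m in zip(lam, moments))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_maxent(moments, lo=-1.0, hi=1.0, tol=1e-9, max_iter=50):
    """Minimize the dual with a damped Newton iteration (solver choice is an assumption)."""
    m = len(moments)
    lam = [0.0] * m
    for _ in range(max_iter):
        mu = raw_moments(lam, 2 * m, lo, hi)
        grad = [moments[i] - mu[i] for i in range(m)]          # dI/dlam_i
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian of I: covariance matrix of (x, x^2, ..., x^m) under p.
        hess = [[mu[i + j + 1] - mu[i] * mu[j] for j in range(m)] for i in range(m)]
        step = solve(hess, [-g for g in grad])
        base, t = dual(lam, moments, lo, hi), 1.0
        while True:                                            # backtracking
            trial = [l + t * s for l, s in zip(lam, step)]
            try:
                ok = dual(trial, moments, lo, hi) <= base + 1e-12
            except OverflowError:
                ok = False
            if ok or t < 1e-8:
                break
            t *= 0.5
        lam = trial
    return lam


def reliability(lam, lo=-1.0, hi=1.0):
    """R = integral of p(x) over [0, hi], as in Equation 10.34."""
    return (trapezoid(lambda x: kernel(lam, x), 0.0, hi)
            / trapezoid(lambda x: kernel(lam, x), lo, hi))


if __name__ == "__main__":
    lam = fit_maxent([0.0, 0.2, 0.0, 0.1])   # illustrative scaled raw moments
    print(lam, reliability(lam))
```

Working on the bounded interval is what makes the plain quadrature viable here: every integral is over [-1, 1], so no truncation of an infinite domain is involved.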
For the time-variant reliability analysis, the scaling function can be expressed by:
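$$g_s(\mathbf{d}, \mathbf{N}, T) = -\frac{1 - \exp\left( \frac{g^T(\mathbf{d}, \mathbf{N}, T)}{c} \right)}{1 + \exp\left( \frac{g^T(\mathbf{d}, \mathbf{N}, T)}{c} \right)} \tag{10.35}$$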
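$$\begin{aligned}
&\text{find: } c \\
&\text{minimize: } c = \mu_g(\mathbf{d}, \mathbf{N}, T) \\
&\text{subject to: } T \in [0,1]
\end{aligned} \tag{10.36}$$
$$P(0,1) = \Pr\left\{ t_1 \in [0,1] \right\} \tag{10.37}$$
$$\mu(T) = \int g^T(\mathbf{d}, \mathbf{Z}, T)\, p(\mathbf{Z})\, d\mathbf{Z} \tag{10.39}$$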
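$$\begin{aligned}
g^T(\mathbf{d}, \mathbf{Z}, T) &\approx g^T(\mathbf{d}, \mathbf{Z}, T^*) + \left.\frac{\partial g^T}{\partial T}\right|_{T=T^*}(T - T^*) + \frac{1}{2}\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*}(T - T^*)^2 \\
&= AT^2 + BT + C
\end{aligned} \tag{10.40}$$
where:
$$A = \frac{1}{2}\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*},$$
$$B = \left.\frac{\partial g^T}{\partial T}\right|_{T=T^*} - T^*\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*},$$
$$C = g^T(\mathbf{d}, \mathbf{Z}, T^*) - T^*\left.\frac{\partial g^T}{\partial T}\right|_{T=T^*} + \frac{(T^*)^2}{2}\left.\frac{\partial^2 g^T}{\partial T^2}\right|_{T=T^*}.$$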
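$$\begin{aligned}
&\text{find: } f(\tau) \\
&\text{maximize: } H = -\int f(\tau) \ln f(\tau)\,d\tau \\
&\text{subject to: } \int \tau^i f(\tau)\,d\tau = M_i^F, \quad i = 0, 1, \cdots, l
\end{aligned} \tag{10.41}$$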
$$Err = \frac{\left| P - P_{MCS} \right|}{P_{MCS}} \times 100\% \tag{10.42}$$
where $P$ is the failure probability from the three proposed methods, and $P_{MCS}$ is the
failure probability from the MCS method.
10.4.1 Numerical Example
The time-variant limit state function is:
$$g(\mathbf{d}, \mathbf{X}, t) = x_1^2 x_2 - 5 x_3 t + (x_4 + 1)\exp\left( x_5 t^2 \right) - \Delta \tag{10.43}$$
TABLE 10.2
Failure Probability Results for Example 1
∆ MCS FPD EVM-IME F-PTP Err1 (%) Err2 (%) Err3 (%)
70 0.00823 0.00897 0.00854 0.00893 8.99 3.77 8.51
72 0.01149 0.01099 0.01126 0.01117 4.35 2.00 2.79
74 0.01449 0.01506 0.01458 0.01626 3.93 0.6211 12.22
76 0.01798 0.01900 0.01880 0.01936 5.67 4.56 7.68
78 0.02312 0.02453 0.02386 0.02103 6.88 3.20 9.04
EVM-IME method. In the F-PTP method, 101 function calls are used. Therefore,
the order of the computational efficiency is the F-PTP method > the EVM-IME
method > the FPD method.
p = σ b0 h0 ( N/m) (10.44)
where:
σ = 78,500 ( N/m).
$$F(t) = 3{,}500 + 700 \sum_{i=1}^{r} \frac{N_i}{\sqrt{\tau_i}}\, \boldsymbol{\phi}_i^T \boldsymbol{\rho}(t) \tag{10.45}$$
where:
$\tau_i$, $\boldsymbol{\phi}_i^T$, and $\boldsymbol{\rho}(t)$ can be obtained within different time intervals according to the
appendix in this chapter
$N_i$ are independent standard normal random variables
With the effect of F ( t ) and p, the bending moment at the mid-span section is:
$$M(t) = \frac{F(t)L}{4} + \frac{pL^2}{8}. \tag{10.46}$$
A ( t ) = b ( t ) × h ( t ) (10.47)
$$M_u(t) = \frac{A(t)\,h(t)\,f_y}{4}. \tag{10.48}$$
g ( X, Y ( t ) , t ) = M u ( t ) − M ( t ) (10.49)
In this case, the time intervals [0,15] and [0,20] years are considered. The related
information of random parameters is given in Table 10.3.
The reliability results for the four methods are provided in Table 10.4. From
Table 10.4, it is seen that the order of the accuracy remains the same as that in exam-
ple 1. Since the expression of the limit state function has little impact on the compu-
tational efficiency, the computational efficiency keeps the same order.
TABLE 10.3
Information of Random Parameters in Example 2
Parameter Distribution Type Mean Standard Deviation
f Lognormal 240 24
b0 Lognormal 0.2 0.01
h0 Lognormal 0.04 0.004
F(t) Gaussian process 3,500 700
TABLE 10.4
Time-Variant Reliability Results in Example 2
Time Interval    MCS        FPD        EVM-IME    F-PTP      Err 1    Err 2    Err 3
[0,20]           0.00178    0.00182    0.00175    0.00174    2.28     1.69     2.25
[0,15]           0.00121    0.00119    0.00122    0.00116    1.65     0.826    4.13
10.5 CONCLUSIONS
In this chapter, three time-variant reliability analysis methods, the FPD method,
the EVM-IME method, and the F-PTP method, are discussed. From the procedure and
examples, the following conclusions can be reached: (1) the three methods have
high computational accuracy, which satisfies the engineering requirements; (2) the
three methods have high computational efficiency, which makes them feasible for
solving complex engineering problems; (3) the order of the computational
accuracy is approximately the EVM-IME method > the FPD method > the F-PTP
method; and (4) the order of the computational efficiency is the F-PTP method > the
EVM-IME method > the FPD method.
In further research, intelligent techniques will be used for time-variant reliability
analysis to further improve the computational efficiency while maintaining high
computational accuracy.
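$$Y(t) = m(t) + \sigma(t) \sum_{i=1}^{r} \frac{\xi_i}{\sqrt{\tau_i}}\, \boldsymbol{\phi}_i^T \boldsymbol{\rho}(t) \tag{A10.1}$$
$$e(t) = 1 - \sum_{i=1}^{r} \frac{\left( \boldsymbol{\phi}_i^T \boldsymbol{\rho}(t) \right)^2}{\tau_i} \tag{A10.2}$$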
REFERENCES
1. Du X. Unified uncertainty analysis by the first order reliability method. Journal of
Mechanical Design, 2008, 130(9): 091401.
2. Wang Z, Huang HZ, Liu Y. A unified framework for integrated optimization under
uncertainty. Journal of Mechanical Design, 2010, 132(5): 051008.
3. Zhao YG, Ono T. Moment methods for structural reliability. Structural Safety, 2001,
23(1): 47–75.
4. Xiao NC, Zuo MJ, Zhou C. A new adaptive sequential sampling method to construct
surrogate models for efficient reliability analysis. Reliability Engineering & System
Safety, 2018, 169: 330–338.
CONTENTS
11.1 Introduction
11.2 Latent Variable Model for Handling Incomplete Data
11.2.1 Right Censoring
11.2.2 Partly Observed Current Status Data
11.2.3 Competing Risks Models
11.3 Latent Variable Model for Handling Heterogeneity
11.3.1 Frailty Models
11.3.2 Finite Mixture Models
11.3.3 Cure Models
11.3.4 Excess Hazard Rate Models
11.4 Latent Variable or Process Models for Handling Specific Phenomena
11.4.1 Gamma Degradation Model with Random Initial Time
11.4.2 Gamma Degradation Model with Frailty Scale Parameter
11.4.3 Bivariate Gamma Degradation Models
11.5 Concluding Remarks
References
11.1 INTRODUCTION
A latent variable is a variable that is not directly observable and is assumed to
affect the response variables. There are many statistical models that involve latent
variables. Such models are called latent variable models. Surprisingly, there are few
monographs specifically dedicated to latent variable models (see, e.g., [1–4]). Latent
variables are typically encountered in econometric, reliability, and survival statistical
models, with different aims. A latent variable may represent the effect of unobservable
covariates or factors, in which case it accounts for the unobserved heterogeneity
between subjects. It may also account for measurement errors, assuming that the
latent variables represent the “true” outcomes and the manifest variables represent
their “disturbed” versions. It may also summarize different measurements of the
same (directly) unobservable characteristics (e.g., quality of life), so that sample units
may be easily ordered or classified based on these traits (represented by the latent
variables). Hence, latent variable models now have a wide range of applications,
especially in the presence of repeated observations, longitudinal/panel data, and
multilevel data.
In this chapter, we select a few latent variable models that have proved
to be useful in the domain of reliability. We do not claim to give an exhaustive
view of such models, but we try to show that these models lead to various estimation
methodologies that require various mathematical tools if we want to derive large
sample properties. Basic mathematical tools are based on empirical processes theory
(see [5–7]), martingale methods for counting processes theory (see [8]), or
Expectation-Maximization (EM) algorithms for parametric models (see [9]).
This chapter is organized into three parts. The first part, Section 11.2, is devoted
to incomplete data including right censored data, partly right and left
censored data, and competing risk data. In the second part, Section 11.3,
we consider models that account for heterogeneity in data, including
frailty models, finite mixture models, cure models, and excess risk models.
The last part, Section 11.4, deals with models for time-dependent phenomena.
We consider degradation processes for which the latent variable is either
a random duration, as for the Gamma degradation processes with random initial
time, or a frailty scale parameter. We also consider bivariate degradation
processes obtained from a trivariate construction that requires a third latent
Gamma process.
H δ ( x ) = Pr( X ≤ x; ∆ = δ ),
and:
H ( x ) ≡ H0 ( x ) + H1 ( x ) = Pr( X ≤ x ) = 1 − ST ( x )SC ( x ).
Latent Variable Models in Reliability 283
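$$H_{1n}(x) \equiv \frac{1}{n} \sum_{i=1}^{n} 1(X_i \leq x; \Delta_i = 1)$$
and:
$$\bar{H}_n(x) \equiv \frac{1}{n} \sum_{i=1}^{n} 1(X_i \geq x).$$
$$\hat{\Lambda}_T(x) = \int_0^x \frac{dH_{1n}(y)}{\bar{H}_n(y)} = \frac{1}{n} \sum_{i=1}^{n} \frac{\Delta_i 1(X_i \leq x)}{\bar{H}_n(X_i)}.$$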
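The estimator above (the Nelson-Aalen form of the cumulative hazard) is explicit and can be computed directly from the observed pairs. The sketch below is a direct, quadratic-time transcription; the function name and the toy data are assumptions made for illustration.

```python
def nelson_aalen(times, events, x):
    """Cumulative hazard estimate at x from right censored data.

    times  : observed X_i = min(T_i, C_i)
    events : Delta_i (1 if the duration of interest was observed, else 0)
    """
    total = 0.0
    for xi, di in zip(times, events):
        if di == 1 and xi <= x:
            at_risk = sum(1 for xj in times if xj >= xi)   # n * H_bar_n(X_i)
            total += 1.0 / at_risk
    return total


if __name__ == "__main__":
    print(nelson_aalen([1, 2, 3, 4], [1, 0, 1, 1], 3))
```

Each uncensored observation contributes one over the number of individuals still at risk at that time, which is exactly the jump of the integral with respect to the empirical sub-distribution function.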
X = T if A=0
X < T if A =1
X > T if A = 2
X = T and A = 0 if T ≤ C and ∆ = 1
X = C and A = 1 if C<T
X = C and A = 2 if T ≤ C and ∆ = 0
and for purposes of identification it is assumed that the random variables $T$, $C$, and
$\Delta$ are independent. As in the previous section, distribution functionals of $T$ and $C$
are indexed by $T$ and $C$, respectively, while $\Pr(\Delta = 1) = p \in [0,1]$. Note that $p = 1$
corresponds to the right censoring case and that $p = 0$ corresponds to current status
data. However, for identification it is assumed that $p \in (0,1]$, which guarantees that
a proportion of durations of interest will be observed. For the sake of simplicity we
assume that $T$ and $C$ admit PDFs. Thus, defining:
Ha ( x ) = Pr( X ≤ x; A = a),
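from which we obtain the following representation for the hazard rate function $\lambda_T$,
with $\bar{H}_a(x) = \Pr(X \geq x; A = a)$:
$$\lambda_T(x)\,dx = \frac{dH_0(x)}{\bar{H}_0(x) + p\bar{H}_1(x)}.$$
In addition we have:
$$p = \frac{\bar{H}_0(0)}{\bar{H}_0(0) + \bar{H}_2(0)} = \Pr(\Delta = 1 \mid T \leq C) \equiv \frac{\Pr(A = 0)}{\Pr(A = 0) + \Pr(A = 2)}.$$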
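$$\hat{\Lambda}_T(x) = \int_{[0,x]} \frac{dH_{0n}(y)}{\bar{H}_{0n}(y) + \hat{p}\,\bar{H}_{1n}(y)}$$
where:
$$H_{an}(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \leq x; A_i = a) \quad \text{and} \quad \bar{H}_{an}(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \geq x; A_i = a)$$
and:
$$\hat{p} = \frac{\sum_{i=1}^{n} 1(A_i = 0)}{\sum_{i=1}^{n} 1(A_i \neq 1)}.$$
Alternatively the estimator $\hat{\Lambda}_T(x)$ can be written:
$$\hat{\Lambda}_T(x) = \sum_{i=1}^{n} \frac{1(A_i = 0)\,1(X_i \leq x)}{\sum_{j=1}^{n} \left\{ 1(X_j \geq X_i; A_j = 0) + \hat{p}\,1(X_j \geq X_i; A_j = 1) \right\}}.$$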
In addition to the fact that this estimator is explicit, it is easily seen that it
can be written as a functional of the three-dimensional empirical process
$x \mapsto (H_{0n}(x), H_{1n}(x), H_{2n}(x))$, which allows us to derive its asymptotic behavior by
standard empirical process tools.
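Since the estimator is explicit, it can be evaluated with a few lines of Python. The sketch below assumes the coding A = 0 (duration observed exactly), A = 1 (right censored), and A = 2 (left censored) used in this section; the function name and the toy data are hypothetical.

```python
def cumulative_hazard_partly_observed(xs, labels, x):
    """Cumulative hazard estimate at x for partly observed current status data.

    xs     : observed X_i
    labels : A_i in {0, 1, 2}
    Assumes at least one observation has a label different from 1.
    """
    n_exact = sum(1 for a in labels if a == 0)
    n_not_right = sum(1 for a in labels if a != 1)
    p_hat = n_exact / n_not_right
    total = 0.0
    for xi, ai in zip(xs, labels):
        if ai == 0 and xi <= x:
            # Exact observations count 1, right censored ones count p_hat.
            denom = sum((1.0 if aj == 0 else p_hat)
                        for xj, aj in zip(xs, labels)
                        if xj >= xi and aj in (0, 1))
            total += 1.0 / denom
    return total


if __name__ == "__main__":
    print(cumulative_hazard_partly_observed([1, 2, 3], [0, 2, 1], 5))
```

When no observation is left censored, the estimated proportion equals one and the function reduces to the usual right censoring estimator.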
the lifetime of the ith component, the lifetime of the whole system is nothing but
$X = \min_{1 \leq j \leq p} X_j = X_1 \wedge \cdots \wedge X_p$ and we denote by $S_X$ its reliability function. Let us
consider several model assumptions (A1, A2, and so on):
A1. $X_1, \ldots, X_p$ are i.i.d. and write $S$ for the common reliability function of these
random variables. Because the reliability function $S_X$ of $X$ verifies:
$$S_X(x) = \Pr(X \geq x) = S(x)^p,$$
$$\bar{H}_n(x) = \frac{1}{n} \sum_{j=1}^{n} 1(X_j \geq x)$$
$$\hat{S}(x) = \bar{H}_n(x)^{1/p},$$
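$$H_{\delta,n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \leq x; \Delta_i = \delta) \quad \text{and} \quad \bar{H}_{\delta,n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \geq x; \Delta_i = \delta)$$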
the j-th cumulative hazard rate function can be consistently estimated by:
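$$\hat{\Lambda}_j(x) = \int_0^x \frac{dH_{j,n}(y)}{\bar{H}_n(y)},$$
$$H_{d,n}(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \leq x; D_i = d),$$
$$\bar{H}_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \geq x).$$
$$\hat{\Lambda}_j(x) = \frac{1}{\alpha} \int_{[0,x]} \frac{dH_{j,n}(y)}{\bar{H}_n(y)} = \frac{1}{\alpha} \sum_{i=1}^{n} \frac{1(X_i \leq x; D_i = j)}{\sum_{k=1}^{n} 1(X_k \geq X_i)}.$$
$$\hat{\Lambda}_0(x) = \frac{1}{1 - \alpha} \int_{[0,x]} \frac{dH_{0,n}(y)}{\bar{H}_n(y)} = \frac{1}{1 - \alpha} \sum_{i=1}^{n} \frac{1(X_i \leq x; D_i = 0)}{\sum_{k=1}^{n} 1(X_k \geq X_i)}.$$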
To this end, they define $\mathcal{H}$ as the set of $p \times (p + 1)$ real valued matrices such that
$Ha = a^*$ for all $a^* = (a_1, \ldots, a_p)^T \in \mathbb{R}^p$ and $a = (a^{*T}, \sum_{j=1}^{p} a_j)^T \in \mathbb{R}^{p+1}$.
Then, for a consistent estimator $\hat{\Sigma}(x)$ of $\Sigma(x) \equiv \Sigma(x, x)$, the authors define:
$$\hat{H}(x) = \arg\min_{H \in \mathcal{H}} \operatorname{trace}\left( H \hat{\Sigma}(x) H^T \right)$$
where a closed-form expression is available for $\hat{H}(x)$ and where $\hat{H}(x)$ has to
be calculated at points $X_i \in [0, \tau]$ such that $D_i > 0$. Then $\tilde{\Lambda}(x) = \hat{H}(x)\hat{\Lambda}^*(x)$
is a new estimator of $\Lambda(x)$, asymptotically T-optimal in the sense that
among all the estimators obtained by linear transformation of $\hat{\Lambda}^*$, this one
has the smallest asymptotic variance trace.
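$$f(z) \equiv f_{\Gamma(\alpha,\beta)}(z) = \frac{\beta^\alpha z^{\alpha - 1}}{\Gamma(\alpha)} e^{-\beta z}\, 1(z > 0).$$
Using the Bayes inversion formula it is easy to show that conditionally on $X = x$, the
frailty $Z$ is distributed according to $\Gamma(\alpha + 1, \alpha + \Lambda_T(x))$, while conditionally on
$X > x$ it is distributed according to $\Gamma(\alpha, \alpha + \Lambda_T(x))$. We also derive the unconditional
PDF $f_X$ of $X$ since: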
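$$f_X(x) = \int_0^{+\infty} \lambda_{X|Z}(x \mid z)\, S_{X|Z}(x \mid z)\, f_Z(z)\,dz = \frac{\alpha^{\alpha + 1} \lambda_T(x)}{(\alpha + \Lambda_T(x))^{\alpha + 1}}.$$
$$(\hat{\alpha}, \hat{\theta}) = \arg\max_{(\alpha, \theta) \in (0, +\infty) \times \Theta} \prod_{i=1}^{n} \frac{\alpha^{\alpha + 1} \lambda(X_i \mid \theta)}{(\alpha + \Lambda(X_i \mid \theta))^{\alpha + 1}}.$$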
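The closed form of the unconditional PDF can be checked numerically. The sketch below assumes the usual multiplicative frailty model, that is a conditional hazard z λ_T(x) and a conditional reliability exp(-z Λ_T(x)), with Z following a Gamma distribution with shape and rate both equal to α; this is the setting implied by the closed form above. The function names, the trapezoid quadrature, and the exponential baseline in the usage example are assumptions made for illustration.

```python
import math


def gamma_pdf(z, alpha, beta):
    """Density of the Gamma(alpha, beta) distribution (shape alpha, rate beta)."""
    if z <= 0.0:
        return 0.0
    return beta ** alpha * z ** (alpha - 1.0) * math.exp(-beta * z) / math.gamma(alpha)


def marginal_pdf_closed_form(x, alpha, hazard, cum_hazard):
    """alpha^(alpha+1) * lambda_T(x) / (alpha + Lambda_T(x))^(alpha+1)."""
    return alpha ** (alpha + 1.0) * hazard(x) / (alpha + cum_hazard(x)) ** (alpha + 1.0)


def marginal_pdf_numeric(x, alpha, hazard, cum_hazard, z_max=60.0, n=60000):
    """Trapezoid approximation of the integral over the frailty z on [0, z_max]."""
    h = z_max / n

    def integrand(z):
        return z * hazard(x) * math.exp(-z * cum_hazard(x)) * gamma_pdf(z, alpha, alpha)

    inner = sum(integrand(j * h) for j in range(1, n))
    return h * (0.5 * (integrand(0.0) + integrand(z_max)) + inner)


if __name__ == "__main__":
    # Exponential baseline with unit rate: lambda_T(x) = 1 and Lambda_T(x) = x.
    args = (1.0, 2.0, lambda x: 1.0, lambda x: x)
    print(marginal_pdf_closed_form(*args), marginal_pdf_numeric(*args))
```

The same closed form is the building block of the likelihood maximized above, so a check of this kind is a cheap safeguard before coding the estimator.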
The asymptotic properties of the estimators of $\alpha$ and $\theta$ are studied in [8] using the
theory of martingales for counting processes in the right censoring setup. The
semi-parametric joint estimation of $(\alpha, \Lambda_T)$ has been studied in [22,23]. Frailty models are
interesting ways to consider population heterogeneity. By introducing a known
correlation structure between the frailty random variables, it is possible to construct some
homogeneity tests based on an approximation of the score function (see, e.g., [24]).
In the case where $Z$ is a positive discrete random variable belonging to
$\{z_1, \ldots, z_d\}$, with $(z_1, \ldots, z_d) \in (0, +\infty)^d$ for some integer $d \geq 2$, and
$\Pr(Z = z_i) = p_i \in (0,1)$, then the reliability function $S_X$ of $X$ is defined by:
$$S_X(x) = \sum_{i=1}^{d} p_i\, S_T(x)^{z_i}.$$
It means that the PDF $f_X$ is a convex linear combination of $d$ PDFs that are nothing
but the conditional PDFs of $X$ given $Z = z_i$. This model is a special case of the finite
mixture models that we discuss in the next section.
where the $p_j$s are non-negative and sum to one and the $f_j$s are PDFs. A latent variable
representation of $T$ is possible in the sense that if $T_1, \ldots, T_d$ and $Z$ are $d + 1$ random
variables such that $T_j$ has PDF $f_j$ for $1 \leq j \leq d$, $Z \in \{1, \ldots, d\}$ with $\Pr(Z = j) = p_j$ for
$1 \leq j \leq d$, then if $Z$ and $(T_1, \ldots, T_d)$ are independent, $T$ and $T_Z$ have the same PDF and
thus the same distribution. $T$ can be seen as the lifetime of an individual chosen at
random within $d$ populations where the proportion of individuals coming from the
ith population is $p_i$ and the lifetimes coming from the ith population are homogeneous
with PDF $f_i$. Sometimes we are interested in estimating the distributions of
the $d$ sub-populations, that is, the distributions of the $T_i$s.
Of course, if the latent variable $Z$ is observed and if $S_j$ denotes the reliability
function of $T_j$ for some $1 \leq j \leq d$, then based on $n$ i.i.d. copies $\{(T_i, Z_i)\}_{1 \leq i \leq n}$ of $(T, Z)$:
$$\hat{S}_j(x) = \frac{\sum_{i=1}^{n} 1(T_i \geq x; Z_i = j)}{\sum_{i=1}^{n} 1(Z_i = j)} \quad \text{and} \quad \hat{p}_j = \frac{\sum_{i=1}^{n} 1(Z_i = j)}{n},$$
fT = pf1 + (1 − p) f2
which shows that the semi-parametric identifiability fails. It is not possible to obtain
identifiability in the semi-parametric setup without additional constraints on the sub-
distribution functions fi . See [25] for the discussion of this problem in the setup of
right-censored data. In the case of mixture of parametric lifetime distributions, that
is when:
$$f_T(x; p, \theta) = \sum_{j=1}^{d} p_j f(x \mid \theta_j)$$
In the case of right censoring and left truncation, instead of observing $T$ we
observe $(L, X, \Delta)$ where $X = T_Z \wedge C = T \wedge C \geq L$ and $\Delta = 1(T \leq C)$, with $C$ a
right censoring time and $L$ a left truncation time, both independent of the label
random variable $Z$ and the lifetime $T$. The authors in [29] have shown that it is possible
to use the EM algorithm to estimate the unknown model parameters based on $n$
i.i.d. realizations of $(L, X, \Delta)$. However, in the discussion of this paper, [30]
mentioned that the EM algorithm may be trapped by a local maximum and proposed,
as in [31], to use the stochastic EM algorithm as an alternative estimation method.
Here we recall the stochastic EM algorithm principle and we show that it can be
easily extended to the case of parametric mixtures when data are right censored
and left truncated. Let us write $\mathbf{l} = (l_1, \ldots, l_n)$, $\mathbf{x} = (x_1, \ldots, x_n)$, and $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)$
where $(\mathbf{l}, \mathbf{x}, \boldsymbol{\delta}) = ((l_1, x_1, \delta_1), \ldots, (l_n, x_n, \delta_n))$ are $n$ i.i.d. realizations of $(L, X, \Delta)$ and for
$1 \leq i \leq n$ we have $x_i = t_i \wedge c_i$. Let us write $\mathbf{t} = (t_1, \ldots, t_n)$. For the sake of simplicity we
denote, for $1 \leq k \leq d$, by $S(\cdot \mid \theta_k)$ the reliability function of $T_k$ and by $\lambda(\cdot \mid \theta_k)$ its hazard rate
function; then it is not difficult to check that for $1 \leq k \leq d$ we have:
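$$h(k, l, x, \delta; p, \theta) = \Pr(Z = k \mid (L, X, \Delta) = (l, x, \delta)) = \frac{p_k \left( \lambda(x \mid \theta_k) \right)^{\delta} S(x \mid \theta_k) / S(l \mid \theta_k)}{\sum_{j=1}^{d} p_j \left( \lambda(x \mid \theta_j) \right)^{\delta} S(x \mid \theta_j) / S(l \mid \theta_j)}.$$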
It is important to note that the above probability does not depend on the distribution
of $L$ and $C$; thus it is possible to estimate both $p$ and $\theta$ following the method of [25].
Like the EM algorithm, the stochastic EM algorithm is an iterative algorithm
which requires an initial value $\theta^0$ for the unknown parameter $\theta$,
and which allows us to obtain iterates $(p^s, \theta^s)_{s \geq 1}$. Indeed, let $(p^s, \theta^s)$ be the current
value of the unknown parameters; the next value $(p^{s+1}, \theta^{s+1})$ is derived in the
following way:
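$$p_{ij}^s = h(j, l_i, x_i, \delta_i; p^s, \theta^s).$$
$$p_j^{s+1} = \frac{\mathrm{Card}(X_j^s)}{n},$$
where:
$$\ell_j(\theta \mid (\mathbf{l}, \mathbf{x}, \boldsymbol{\delta})) = \sum_{i \in X_j^s} \left[ \delta_i \log \lambda(x_i \mid \theta) - \int_{l_i}^{x_i} \lambda(x \mid \theta)\,dx \right].$$
$$\theta_j^{s+1} = \frac{\sum_{i \in X_j^s} \delta_i}{\sum_{i \in X_j^s} (x_i - l_i)}.$$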
Obtaining an initial guess $\theta^0$ may be a tricky problem; see [25] for discussion and
comments about initialization of the stochastic EM algorithm. There are several
ways to construct the final estimate based on $K$ iterations of the algorithm. The most
classical one, because the sequence $(p^s, \theta^s)_{s \geq 1}$ is a Markov chain, consists in taking
the ergodic mean of iterates, that is:
$$\hat{p} = \frac{1}{K} \sum_{s=1}^{K} p^s \quad \text{and} \quad \hat{\theta} = \frac{1}{K} \sum_{s=1}^{K} \theta^s.$$
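The iteration described above can be sketched for a mixture of exponential lifetimes, for which the hazard of component k is the constant θ_k and the ratio S(x | θ_k)/S(l | θ_k) equals exp(-θ_k (x - l)). This is a minimal illustration under stated assumptions, not the implementation of [29] or [31]: the function names are hypothetical, empty pseudo-samples simply keep their previous rate, and the simulated data in the usage example have no left truncation.

```python
import math
import random


def posterior(p, theta, l, x, delta):
    """h(k, l, x, delta; p, theta) for exponential components, k = 1..d."""
    w = [pk * th ** delta * math.exp(-th * (x - l)) for pk, th in zip(p, theta)]
    total = sum(w)
    return [v / total for v in w]


def stochastic_em(data, p0, theta0, n_iter=200, seed=0):
    """data: list of (l_i, x_i, delta_i). Returns the ergodic means of the iterates."""
    rng = random.Random(seed)
    d, n = len(p0), len(data)
    p, theta = list(p0), list(theta0)
    p_mean, theta_mean = [0.0] * d, [0.0] * d
    for _ in range(n_iter):
        # Stochastic step: draw a component label for every observation.
        groups = [[] for _ in range(d)]
        for l, x, delta in data:
            probs = posterior(p, theta, l, x, delta)
            j = rng.choices(range(d), weights=probs)[0]
            groups[j].append((l, x, delta))
        # Maximization step on each pseudo-sample.
        for j, grp in enumerate(groups):
            p[j] = len(grp) / n
            events = sum(delta for _, _, delta in grp)
            exposure = sum(x - l for l, x, _ in grp)
            if events > 0 and exposure > 0.0:
                theta[j] = events / exposure   # otherwise keep the previous value
        for j in range(d):
            p_mean[j] += p[j] / n_iter
            theta_mean[j] += theta[j] / n_iter
    return p_mean, theta_mean


if __name__ == "__main__":
    rng = random.Random(42)
    data = []
    for _ in range(300):
        rate = 0.2 if rng.random() < 0.5 else 5.0   # two well separated components
        t, c = rng.expovariate(rate), rng.expovariate(0.05)
        data.append((0.0, min(t, c), 1 if t <= c else 0))
    print(stochastic_em(data, [0.5, 0.5], [0.1, 1.0], n_iter=100))
```

The exponential case is convenient because the maximization step has the closed form given above (number of observed failures divided by total exposure); other parametric families would need a numerical maximization of the component log-likelihood at each iteration.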
11.3.3 Cure Models
Cure models are special cases of duration models; Boag [33] was among the first
to consider a population of patients containing a cured fraction. He used a mixture
model to fit a data set from a follow-up study of breast cancer patients and estimated
the cured fraction by the maximum likelihood method. As previously stated, the
specificity of cure models comes from the fact that a fraction of subjects in the
population will never experience the event of interest. This is the reason why most
cure models are special cases of mixture models where the time of interest T has the
following distribution:
T ∼ (1 − p) P0 + pδ ∞ ,
Because Pr(C < +∞) = 1, the event {T = +∞} will never be observed since X ≤ C
with probability one. Concerning the probability of being cured a logistic regression
model is generally assumed (see [34]):
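$$p(z \mid \gamma_0, \gamma) = \frac{\exp(\gamma_0 + \gamma^T z)}{1 + \exp(\gamma_0 + \gamma^T z)}.$$
$$q_\theta(x, \delta, z) \equiv P_\theta\left( Y = 1 \mid (X, \Delta, Z) = (x, \delta, z) \right) = \frac{p(z \mid \gamma_0, \gamma)(1 - \delta)}{p(z \mid \gamma_0, \gamma) + \left( 1 - p(z \mid \gamma_0, \gamma) \right) S_0(x \mid z)}.$$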
It is important to note that this conditional probability does not depend on the distri-
bution of the censoring variable. This fact is essential because it allows considering
the distribution of C as a nuisance parameter in the model.
Example 11.1
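$$q_\theta(x, \delta, z) = \frac{(1 - \delta)\exp(\gamma_0 + \gamma_1 z)}{\exp(\gamma_0 + \gamma_1 z) + \exp\left( -x e^{\beta_0 + \beta_1 z} \right)}.$$
Thus, given $\theta^{(k)} = (\gamma_0^{(k)}, \gamma_1^{(k)}, \beta_0^{(k)}, \beta_1^{(k)})$, the kth iterate of $\theta$, for the simulation Step 2a
we have for $1 \leq i \leq n$:
$$y_i^{(k)} \sim B\left( q_{\theta^{(k)}}(x_i, \delta_i, z_i) \right),$$
$$\gamma^{(k+1)} = \arg\max_{\gamma \in \mathbb{R}^2} \sum_{i=1}^{n} \left( y_i^{(k)} (1 - \delta_i) \log\left( p(z_i \mid \gamma) \right) + \left( 1 - y_i^{(k)} \right) \log\left( 1 - p(z_i \mid \gamma) \right) \right),$$
and:
$$\beta^{(k+1)} = \arg\max_{\beta \in \mathbb{R}^2} \sum_{i=1}^{n} \left( \left( 1 - y_i^{(k)} \right) \delta_i (\beta_0 + \beta_1 z_i) - \left( 1 - y_i^{(k)} \right) x_i e^{\beta_0 + \beta_1 z_i} \right).$$
Assuming that $K$ iterates have been obtained, the final estimate of Step 3 may be
obtained by averaging the iterates, that is $\hat{\theta} = K^{-1} \sum_{k=1}^{K} \theta^{(k)}$.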
does not die from the disease is generally not null, resulting in an improper excess
risk function $\lambda_{exc}$ connected to $p$ through the relationship:
$$p = \exp\left( -\int_0^{+\infty} \lambda_{exc}(s)\,ds \right).$$
Of course, in such a model the population risk and the excess risk may depend on
covariates and data are generally incomplete including, for instance, right cen-
soring. For example, a proportional hazards model on the excess risk function
allows us to include covariates effects (see [36] for an efficient semi-parametric
estimator).
Let us see that it is possible to obtain a latent variable representation for a time to
event $T$ whose hazard rate function is $\lambda_{obs}$. Indeed, let us introduce the random
variable $A$ corresponding to the age at which the individual is diagnosed. Then let
$Z$ be a Bernoulli random variable with probability of success $p \in [0,1]$, $T_\infty = +\infty$, $T_p$
a positive random variable with hazard rate function $\lambda_{pop}$, and $T_0$ a positive random
variable with hazard rate function $\lambda_0$. Assume, in addition, that conditionally on $A$
the random variables $Z$, $T_p$, and $T_0$ are independent; then conditionally on $A = a$, the
hazard rate of the random variable:
$$\{Z \times T_\infty + (1 - Z) \times T_0\} \wedge \{T_p - A\}$$
is $\lambda_{obs}$ whenever we have for all $t \geq 0$:
$$\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds = -\log\left( \frac{e^{-\Lambda_{exc}(t)} - p}{1 - p} \right),$$
where $\Lambda_{exc}(t) = \int_0^t \lambda_{exc}(s)\,ds$. It is interesting to note that the excess hazard rate model is
close to the competing risk model. Indeed, if $T_1 = Z \times T_\infty + (1 - Z) \times T_0$ and $T_2 = T_p - A$,
we observe the smallest lifetime $T = T_1 \wedge T_2$, and the lack of information about the
component failure (here $1(T_1 \leq T_2)$ is not observed) is compensated by the assumption
that conditionally on $A$, the distribution of $T_2$ is known.
There is a large amount of literature about parametric, semi-parametric, and
non-parametric estimation of these models. In addition, a major difficulty comes
from the heterogeneity of the observed $T_p$, which generally depends on covariates
that include the age at diagnosis. See, for example, [37] for a recent discussion about
this issue.
Here, for simplicity, we consider that $\lambda_{pop}$ is homogeneous; more precisely,
it does not depend on the age at diagnosis. Let $S_{obs}$ (resp. $S_{exc}$ and $S_{pop}$) be
the survival function associated to the hazard rate function $\lambda_{obs}$ (resp. $\lambda_{exc}$ and $\lambda_{pop}$).
It is straightforward to check that if $A = a$:
Hence, based on $n$ i.i.d. copies $(T^{(i)})_{i=1,\dots,n}$ of $T$ and assuming that all the
individuals are diagnosed at the same age $a$, the empirical estimator of $S_{obs}$ is defined by:

$$\hat S_{obs}(t) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t),$$

$$\hat S_{exc}(t) = \frac{\hat S_{obs}(t)}{S_{pop}(a + t)}.$$
In this very simple case the asymptotic properties of $\hat S_{exc}$ are easy to obtain.
Suppose now that the age at diagnosis varies from one individual to another, and let us
write $a_i$ for the age at diagnosis of the $i$th individual. It is well known (see [8])
that the intensity process of the counting process $N(t) = \sum_{i=1}^{n} 1(T^{(i)} \le t)$ is:

$$\sum_{i=1}^{n} Y_i(t)\left(\lambda_{exc}(t) + \lambda_{pop}(a_i + t)\right).$$
$$\hat\Lambda_{exc}(t) = \sum_{i=1}^{n}\int_0^t \frac{dN_i(s) - Y_i(s)\,\lambda_{pop}(a_i + s)\,ds}{\sum_{j=1}^{n} Y_j(s)},$$
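The estimator of the cumulative excess hazard can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: lifetimes are fully observed (no censoring), so Y_i(s) = 1(T_i ≥ s); the population cumulative hazard is supplied as a function `cum_pop_hazard` (a hypothetical argument); and the estimator stops once nobody is at risk.

```python
from typing import Callable, Sequence


def excess_cumulative_hazard(
    times: Sequence[float],
    ages: Sequence[float],
    cum_pop_hazard: Callable[[float], float],
    t: float,
) -> float:
    """Estimate Lambda_exc(t) from uncensored lifetimes and ages at diagnosis."""
    # Breakpoints at which the at-risk set changes, truncated at t.
    cuts = sorted({s for s in times if s < t} | {t})
    estimate, left = 0.0, 0.0
    for right in cuts:
        # Individuals with Y_i(s) = 1 on (left, right]: those with T_i >= right.
        at_risk = [a for T, a in zip(times, ages) if T >= right]
        if not at_risk:
            break
        # Integral of Y_i(s) * lambda_pop(a_i + s) ds over (left, right],
        # written with the population cumulative hazard.
        expected = sum(
            cum_pop_hazard(a + right) - cum_pop_hazard(a + left) for a in at_risk
        )
        # Jumps dN_i(s) located at s = right.
        events = sum(1 for T in times if T == right)
        estimate += (events - expected) / len(at_risk)
        left = right
    return estimate


if __name__ == "__main__":
    # Illustrative data and a constant population hazard of 0.1 per time unit.
    print(excess_cumulative_hazard([1.0, 2.0, 3.0], [50.0, 60.0, 70.0],
                                   lambda x: 0.1 * x, 2.5))
```

When the population hazard is zero, the function reduces to the Nelson-Aalen estimator [10,11], which gives a quick sanity check.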
$$f_{\Gamma(\alpha,\beta)}(x) = \frac{\beta^{\alpha} x^{\alpha - 1}\exp(-\beta x)}{\Gamma(\alpha)}\,1(x \ge 0).$$
Note that if the shape function satisfies $a(t) = at$, then the Gamma process $X$ is
homogeneous, since for $s \ge 0$ and $t \ge 0$ the distribution of $X_{t+s} - X_t$ is the
$\Gamma(as, b)$ distribution, which does not depend on $t$.
$$f_{Y_t}(y;\theta) = \left(1 - F_T(t;\theta_2)\right)\delta_0(y) + \int_0^t \frac{b^{a(t-s;\theta_1)}\, y^{a(t-s;\theta_1)-1}\exp(-by)}{\Gamma(a(t-s;\theta_1))}\, f_T(s;\theta_2)\,ds\,dy$$
with respect to the sum of the Dirac measure $\delta_0$ at 0 and the Lebesgue measure
$dy$ on $\mathbb{R}_+$, where $\theta = (\theta_1, \theta_2, b)$. When $N$ i.i.d. copies
$(Y^{(k)})_{k=1,\dots,N}$ of the delayed degradation process $Y = (Y_t)_{t \ge 0}$ are observed
at times $0 = t_{k0} < t_{k1} < \cdots < t_{kn_k}$ for $k = 1,\dots,N$, it is possible to derive
the joint distribution of $(Y^{(k)}_{t_{k1}},\dots,Y^{(k)}_{t_{kn_k}})$ and to apply the maximum
likelihood principle. However, due to numerical instabilities, the maximization of the
associated log-likelihood function is a tricky problem. An alternative is the estimation
method based on the pseudo-likelihood (or composite likelihood); see, e.g., [40]. It
simply consists in maximizing:
$$\ell(\theta) = \sum_{k=1}^{N}\sum_{i=1}^{n_k} \log\left(f_{Y_{t_{ki}}}(y_{ki};\theta)\right),$$
where, for $1 \le k \le N$ and $1 \le i \le n_k$, $y_{ki}$ is the observation of $Y^{(k)}_{t_{ki}}$.
In other words, the pseudo-likelihood method treats the random variables $Y^{(k)}_{t_{ki}}$ as
if they were independent; this simplifies the calculation of the log-likelihood at the
price of a loss of efficiency. See [39] for an application to competing degradation processes.
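$$f_{\Delta X_1,\dots,\Delta X_n \mid B}(\delta x_1,\dots,\delta x_n \mid b) = \prod_{i=1}^{n} \frac{(\delta x_i)^{\Delta a_i - 1}\, b^{\Delta a_i}\exp(-b\,\delta x_i)}{\Gamma(\Delta a_i)},$$

$$f_{\Delta X_1,\dots,\Delta X_n}(\delta x_1,\dots,\delta x_n) = \prod_{i=1}^{n}\int_0^{+\infty} \frac{(\delta x_i)^{\Delta a_i - 1}\, b^{\Delta a_i}\exp(-b\,\delta x_i)}{\Gamma(\Delta a_i)}\, f_B(b)\,db.$$

$$f_{\Delta X_1,\dots,\Delta X_n}(\delta x_1,\dots,\delta x_n) = \prod_{i=1}^{n} \left(\frac{\delta x_i}{\delta x_i + \beta}\right)^{\Delta a_i - 1} \left(\frac{\beta}{\delta x_i + \beta}\right)^{\alpha} \frac{\Gamma(\Delta a_i + \alpha)}{\Gamma(\Delta a_i)\Gamma(\alpha)}.$$

$$\mathcal{L}(\theta,\alpha,\beta) = \prod_{j=1}^{N}\prod_{i=1}^{n} \left(\frac{\delta x_{ij}}{\delta x_{ij} + \beta}\right)^{\Delta a_i(\theta) - 1} \left(\frac{\beta}{\delta x_{ij} + \beta}\right)^{\alpha} \frac{\Gamma(\Delta a_i(\theta) + \alpha)}{\Gamma(\Delta a_i(\theta))\Gamma(\alpha)},$$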
process on the one hand and, on the other hand, that the components of the bivariate
process share a common latent Gamma process, which induces correlation between the
two marginal processes.
Now let us consider three independent Gamma processes $X^{(i)}$ for $0 \le i \le 2$ with
scale parameter one and shape functions $\alpha_i : \mathbb{R}_+ \to \mathbb{R}_+$. The bivariate
Gamma process $Y$ is defined by:

$$Y_t^{(1)} = \left(X_t^{(0)} + X_t^{(1)}\right)/b_1,$$

$$Y_t^{(2)} = \left(X_t^{(0)} + X_t^{(2)}\right)/b_2,$$
where $b_1$ and $b_2$ are two positive scale parameters. As a consequence, $Y$ has
independent increments and, for $i = 1,2$, the marginal process $(Y_t^{(i)})_{t \ge 0}$ is a
Gamma process with scale parameter $b_i$ and shape function $\alpha_0 + \alpha_i$. In addition,
it is straightforward to check that we have, for $i = 1,2$:

$$\mathbb{E}\left(Y_t^{(i)}\right) = \frac{\alpha_0(t) + \alpha_i(t)}{b_i} \quad\text{and}\quad \mathrm{var}\left(Y_t^{(i)}\right) = \frac{\alpha_0(t) + \alpha_i(t)}{b_i^2},$$

and:

$$\mathrm{cov}\left(Y_t^{(1)}, Y_t^{(2)}\right) = \frac{\alpha_0(t)}{b_1 b_2}.$$
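The construction by a shared latent process can be illustrated with a short simulation. The Python sketch below is only an illustration under assumptions that are not in the text: the shape functions are taken linear, α_i(t) = α_i·t, and the function names are hypothetical. It uses `random.gammavariate(shape, scale)` from the standard library with unit scale, then rescales by 1/b1 and 1/b2.

```python
import random
from typing import List, Sequence, Tuple


def simulate_bivariate_gamma(
    alphas: Sequence[float],
    b1: float,
    b2: float,
    times: Sequence[float],
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """Simulate (Y^(1), Y^(2)) at increasing positive times, starting from 0.

    alphas = (alpha_0, alpha_1, alpha_2) are the slopes of the assumed
    linear shape functions of X^(0), X^(1), X^(2).
    """
    y1, y2 = [0.0], [0.0]
    previous = 0.0
    for t in times:
        dt = t - previous
        # Independent unit-scale Gamma increments of X^(0), X^(1), X^(2).
        dx = [rng.gammavariate(a * dt, 1.0) for a in alphas]
        y1.append(y1[-1] + (dx[0] + dx[1]) / b1)
        y2.append(y2[-1] + (dx[0] + dx[2]) / b2)
        previous = t
    return y1, y2


def theoretical_moments(alphas: Sequence[float], b1: float, b2: float, t: float):
    """Means, variances and covariance of (Y_t^(1), Y_t^(2)) for linear shapes."""
    a0, a1, a2 = (a * t for a in alphas)
    return {
        "mean": ((a0 + a1) / b1, (a0 + a2) / b2),
        "var": ((a0 + a1) / b1 ** 2, (a0 + a2) / b2 ** 2),
        "cov": a0 / (b1 * b2),
    }


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed for reproducibility
    path1, path2 = simulate_bivariate_gamma((1.0, 2.0, 3.0), 2.0, 4.0,
                                            [0.5, 1.0, 1.5, 2.0], rng)
    print(path1, path2)
    print(theoretical_moments((1.0, 2.0, 3.0), 2.0, 4.0, 2.0))
```

Both simulated paths are non-decreasing, and the common term dx[0] is what makes the two components positively correlated.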
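$$\mu_i = \frac{\mathbb{E}\left(\Delta Y_j^{(i)}\right)}{\Delta t_j} = \frac{\alpha_0 + \alpha_i}{b_i} \quad\text{for } i = 1,2,$$

$$\sigma_i^2 = \frac{\mathrm{var}\left(\Delta Y_j^{(i)}\right)}{\Delta t_j} = \frac{\alpha_0 + \alpha_i}{b_i^2} \quad\text{for } i = 1,2,$$

$$\hat\mu_i = \frac{\sum_{j=1}^{n}\left(Y_{t_j}^{(i)} - Y_{t_{j-1}}^{(i)}\right)}{\sum_{j=1}^{n}\Delta t_j} \quad\text{for } i = 1,2,$$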
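$$\hat\sigma_i^2 = \frac{\sum_{j=1}^{n}\left(Y_{t_j}^{(i)} - Y_{t_{j-1}}^{(i)} - \hat\mu_i\,\Delta t_j\right)^2}{\sum_{j=1}^{n}\Delta t_j - \dfrac{\sum_{j=1}^{n}(\Delta t_j)^2}{\sum_{j=1}^{n}\Delta t_j}} \quad\text{for } i = 1,2,$$

$$\hat\rho = \frac{\sum_{j=1}^{n}\left(\Delta Y_j^{(1)} - \hat\mu_1\,\Delta t_j\right)\left(\Delta Y_j^{(2)} - \hat\mu_2\,\Delta t_j\right)}{\sum_{j=1}^{n}\Delta t_j - \dfrac{\sum_{j=1}^{n}(\Delta t_j)^2}{\sum_{j=1}^{n}\Delta t_j}},$$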
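$$\alpha_0 = \rho\,\frac{\mu_1\mu_2}{\sigma_1^2\sigma_2^2}, \qquad b_i = \frac{\mu_i}{\sigma_i^2} \quad\text{for } i = 1,2, \qquad \alpha_i = \frac{\mu_i^2}{\sigma_i^2} - \rho\,\frac{\mu_1\mu_2}{\sigma_1^2\sigma_2^2} \quad\text{for } i = 1,2,$$

we obtain:

$$\hat\alpha_0 = \hat\rho\,\frac{\hat\mu_1\hat\mu_2}{\hat\sigma_1^2\hat\sigma_2^2}, \qquad \hat b_i = \frac{\hat\mu_i}{\hat\sigma_i^2} \quad\text{for } i = 1,2, \qquad \hat\alpha_i = \frac{\hat\mu_i^2}{\hat\sigma_i^2} - \hat\rho\,\frac{\hat\mu_1\hat\mu_2}{\hat\sigma_1^2\hat\sigma_2^2} \quad\text{for } i = 1,2.$$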
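The moment estimators can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name and the returned dictionary are hypothetical, the shape functions are assumed linear, and ρ̂ is read as an estimate of the covariance of the two increments per unit time. Under that reading, the covariance identity cov = α0·t/(b1·b2) gives α̂0 = ρ̂·b̂1·b̂2 and α̂i = μ̂i·b̂i − α̂0.

```python
from typing import Dict, Sequence


def moment_estimates(
    times: Sequence[float], y1: Sequence[float], y2: Sequence[float]
) -> Dict[str, float]:
    """Method-of-moments estimates for a bivariate Gamma process.

    times, y1, y2 have the same length; times[0] is the initial instant.
    """
    dts = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    d1 = [b - a for a, b in zip(y1, y1[1:])]
    d2 = [b - a for a, b in zip(y2, y2[1:])]
    total = sum(dts)
    # Common denominator of the variance and covariance estimators.
    denom = total - sum(dt * dt for dt in dts) / total

    mu1, mu2 = sum(d1) / total, sum(d2) / total
    s1 = sum((d - mu1 * dt) ** 2 for d, dt in zip(d1, dts)) / denom
    s2 = sum((d - mu2 * dt) ** 2 for d, dt in zip(d2, dts)) / denom
    rho = sum(
        (a - mu1 * dt) * (b - mu2 * dt) for a, b, dt in zip(d1, d2, dts)
    ) / denom

    b1, b2 = mu1 / s1, mu2 / s2      # b_i = mu_i / sigma_i^2
    alpha0 = rho * b1 * b2           # from cov rate = alpha_0 / (b1 * b2)
    return {
        "mu1": mu1, "mu2": mu2, "sigma2_1": s1, "sigma2_2": s2, "rho": rho,
        "b1": b1, "b2": b2, "alpha0": alpha0,
        "alpha1": mu1 * b1 - alpha0, "alpha2": mu2 * b2 - alpha0,
    }


if __name__ == "__main__":
    # Tiny illustrative data set (made up for the example).
    print(moment_estimates([0.0, 1.0, 2.0, 4.0], [0.0, 1.0, 4.0, 8.0],
                           [0.0, 0.0, 2.0, 4.0]))
```

With very few observations the plug-in values of α1 or α2 may come out negative, so in practice the estimates should be checked before they are used.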
REFERENCES
1. B. Everett. An Introduction to Latent Variable Models. Springer Monographs on
Statistics and Applied Probability, Chapman & Hall, London, UK, 2011.
2. D. Bartholomew, M. Knott, and I. Moustaki. Latent Variable Models and Factor
Analysis: A Unified Approach. Wiley Series in Probability and Statistics, 3rd ed. John
Wiley & Sons, Chichester, UK, 2011.
3. A.A. Beaujean. Latent Variable Modeling Using R: A Step-by-Step Guide. Taylor &
Francis Group, New York, 2014.
4. J.C. Loehlin and A.A. Beaujean. Latent Variable Models: An Introduction to Factor,
Path, and Structural Equation Analysis, 5th ed. Taylor & Francis Group, New York,
2016.
5. A.W. van der Vaart and J.A. Wellner. Weak Convergence and Empirical Processes.
Springer Series in Statistics, New York, 1996.
6. A.W. van der Vaart. Asymptotic Statistics. Cambridge University Press, New York,
1998.
7. M. Kosorok. Introduction to Empirical Processes and Semiparametric Inference.
Springer Series in Statistics, New York, 2006.
8. P.K. Andersen, O. Borgan, R.D. Gill, and N. Keiding. Statistical Models Based on
Counting Processes. Springer Series in Statistics, New York, 1993.
9. G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York,
2000.
10. W. Nelson. Theory and applications of hazard plotting for censored failure data.
Technometrics, 14(4):945–966, 1972.
11. O. Aalen. Nonparametric inference for a family of counting processes. The Annals of
Statistics, 6(4):701–726, 1978.
12. E.L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations.
Journal of the American Statistical Association, 53(282):457–481, 1958.
13. B.W. Turnbull. The empirical distribution function with arbitrary grouped, censored
and truncated data. Journal of the Royal Statistical Society, Series B, 38:290–295,
1976.
14. J. Huang. Asymptotic properties of nonparametric estimation based on partly interval–
censored data. Statistica Sinica, 9:501–519, 1999.
15. V. Patilea and J.M. Rolin. Product-limit estimators of the survival function for two
modified forms of current-status data. Bernoulli, 12:801–819, 2006.
16. L. Bordes, J.Y. Dauxois, and P. Joly. Semiparametric inference of competing risks data
with additive hazards and missing cause of failure under MCAR or MAR assumptions.
Electronic Journal of Statistics, 8:41–95, 2014.
17. J.W. Vaupel, K.G. Manton, and E. Stallard. The impact of heterogeneity in individual
frailty on the dynamics of mortality. Demography, 16(3):439–454, 1979.
18. P. Hougaard. Analysis of Multivariate Survival Data. Springer, New York, 2000.
19. L. Duchateau and P. Janssen. The Frailty Model. Statistics for Biology and Health,
Springer, New York, 2008.
20. A. Wienke. Frailty Models in Survival Analysis. CRC Press, Boca Raton, FL, 2010.
21. P. Hougaard. Frailty models for survival data. Lifetime Data Analysis, 1:255–273, 1995.
43. F.A. Buijs, J.W. Hall, J.M. van Noortwijk, and P.B. Sayers. Time-dependent reliability
analysis of flood defences using gamma processes. In G. Augusti, G.I. Schueller, and
M. Ciampoli, editors, Safety and Reliability of Engineering Systems and Structures,
pp. 2209–2216; Proceedings of the Ninth International Conference on Structural Safety
and Reliability (ICOSSAR), Rome, Italy, June 19–23, 2005, Mill-Press, Rotterdam, the
Netherlands.
12 Expanded Failure Modes
and Effects Analysis
A Different Approach for
System Reliability Assessment
Perdomo Ojeda Manuel, Rivero Oliva Jesús,
and Salomón Llanes Jesús
CONTENTS
12.1 Background for Developing the Expanded Failure Modes and Effects Analysis
12.2 Some Distinctive Features of the FMEAe Methodology
12.3 Criticality Analysis of Failure Modes by Applying the Component Reliability Model Approach
  12.3.1 Indexes Used in the Criticality Analysis of Components-Failure Modes
    12.3.1.1 Component Risk Index
    12.3.1.2 System Risk Index
    12.3.1.3 Index of Relative Importance of the Component-Failure Mode i
  12.3.2 Treatment of Redundant Components
12.4 Procedure for Treating the Common Cause Failures in FMEAe
  12.4.1 List of Components with Potential to Generate Common Cause Failures
  12.4.2 Classification of Common Cause Failures into Groups by Their Degree of Dependency
  12.4.3 Assignment of Postulated Generic β Factors
  12.4.4 Correction of the β Factor of the Common Cause Failure Events, According to the Degree of Redundancy
12.5 Analysis of Importance by Component Type
12.6 Reliability Assessment of a Generic Fire Quenching System Applying FMEAe
  12.6.1 General Assumptions and Other Considerations for the Analysis
  12.6.2 Preparing the Worksheet for the Analysis and Reliability Assessment in FMEAe
• Not all the necessary information for a correct decision is always available
• An important part of the information may be available, but not organized
and processed in an appropriate manner
• Gathering the raw data of the facility, processing them adequately, and pre-
paring a database oriented to reliability and safety, so that a specialized
computer tool of reliability and risk analysis can use it in a proper manner
• Training and qualifying specialists and managers in the use of these data-
bases and specialized computer programs so that the data can be used cor-
rectly and decisions can produce the expected results
Training and qualifying the staff of an industry in the use of specialized programs in
this field is not a major problem, or at least its solution can be ready in the short term,
because there is currently a significant amount of experience in that field.
Nevertheless, collecting and handling the data and developing computerized
databases ready to be used in risk and reliability studies are time-consuming tasks.
In addition, the sample of available data should be sufficiently representative
of the processes that are going to be modeled (e.g., failure rates of component-failure
modes and average repair times).
Moreover, inaccuracies in the definition of the component boundaries and in the
way the raw data are described, among other aspects, introduce uncertainties into
the data to be processed. The degree of uncertainty can be so high that, for example,
the generic databases available for use in Probabilistic Safety Assessment (PSA)
show differences of up to 2 orders of magnitude in the failure rates of the same
failure mode and type of equipment [2,3].
TABLE 12.1
Comparative Matrix of Reliability and Risk Analysis Techniques

Items to Compare                                HAZOP  FMEA  Checklist  What If?  SR  PreHA  ETA  FTA
Completeness                                    ++     ++    −          −         −   −      +    ++
Structured approach                             ++     ++    −          −         −   −      ++   ++
Flexibility of application                      −      ++    ++         ++        ++  ++     +    +
Objectivity                                     +      +     −          −         −   −      +    +
Independence on quantitative data(a)            +      +     ++         ++        ++  ++     ++   −
Capability of modeling dependences              −      −     −          −         −   −      ++   ++
Independence on the analysis team expertise(b)  −      +     ++         +         +   −      −−   −−
Quickness in obtaining results                  −      −     +          +         +   +      +    −

Legend: ++ High; + Moderate; − Low; −− Very low or none
one of the most frequent causes of accidents in complex industrial facilities has been
the common cause failures and human errors, hence the importance of being able to
treat them adequately in these studies.
An important part of the insufficiencies or limitations found in the qualitative
techniques presented in Table 12.1 have been resolved in FMEAe. The most sig-
nificant improvements in FMEAe methodology, in comparison with the traditional
FMEA1 were introduced through several procedures for:
In FMEAe, these analyses are carried out through algorithms of identification and
comparison of strings. These strings include the information in the fields of the
traditional FMEA worksheet, together with others that have been added to enlarge
and complete the information about the design and functioning of the components
involved in the analysis [27,28].
Figure 12.1 presents the FMEAe worksheet in the ASeC computer code, showing
the CCF modes included at the end of the list (those whose code begins with CM),
which form part of the criticality analysis. These events are generated automatically by
algorithms that handle the information the analyst enters in the first two tables of the
worksheet: Datos Técnicos (Engineering-Related Data) and Datos de Fallos y Efectos
(Failures and Effects Data). The third table of the worksheet, Observaciones (Remarks),
serves to complete ideas or descriptions about the failure mode, mode of operation, mode
of control of the components, or any consideration made for the analysis.
1 Not included in the computer codes considered in the state of the art of this methodology.
In the lower left corner of the worksheet, a panel shows the quantity and code of
the component-failure modes that are the precursors of the CCFs, classified by their
degree of dependence (G1 to G3, in increasing order of dependence). In this
example, ten precursors having a G3 dependence degree were determined, where
G3 represents the highest degree of dependence, G2 an intermediate degree, and G1
the lowest degree. As can be observed, there were no precursors of the G1 degree in
this example because all five pairs of precursors share the attributes of the G3
failure. Later, in Section 12.4, the methodology developed for the automated
determination of CCFs in FMEAe will be described in more detail.
Once the worksheet fields have been filled in, and if data are available, it is
convenient to start the criticality analysis by determining the CCFs so that their
influence on the results is not missed. This issue is especially important in the case of
redundant systems, as can be verified later (Section 12.6) through the example of
application of the FMEAe methodology to a fire quenching system.
After determining the CCF, the criticality analysis is carried out by one of two
approaches: the Component Reliability Model or the Risk Matrix. The first approach
uses models similar to those included in the FTA technique to estimate the probability
of basic events and it is discussed in this chapter. Here, the calculated reliability param-
eter (probability of loss of functional capacity) is one of the factors used to determine
the criticality of the failure modes, together with the degree of severity of their effects.
There are three types of effects, each requiring a separate analysis: the
environmental effects (EA), the effects on safety or health (ES), and the effects
on the system availability (ED). The example shown in Figure 12.1 presents a case in
which the failure modes affected only the system availability.
Another distinctive feature of the FMEAe worksheet, which can be observed in
Figure 12.1, is the way the information is presented to the analyst. As can be seen,
the worksheet contains three tables that present all the relevant information for the
analysis on a single screen, so that the user can access all the aspects at once
without the need to scroll to another page.
To model the reliability behavior of components through the loss of their functional
capacity, two parameters are considered. One is the probability of failure (p), which
characterizes the reliability of components that must operate during a given mission
time; the other is the unavailability (q), for standby components that must change
their position or state at the time of the demand.
Together with the reliability parameter, the effects caused by the failure modes
are considered to form a matrix (probability of occurrence vs. severity of the effects
of each failure mode). This matrix is affected, in turn, by a weighting (quality) factor
that considers the way the equipment is commanded, that is, auto-actuated or in manual
mode from either a remote panel or locally (in the field).
The rest of the characteristics that influence the functional capacity of the
component, such as the degree of redundancy and the control mode (periodic testing,
continuous monitoring, or non-controlled component), are already included implicitly in
the reliability model of each component-failure mode and in the severity of the effect,
based on the information entered in the worksheet by the analyst.
There are five degrees of severity for the three kinds of effects considered in
FMEAe, which are described in Tables 12.2 through 12.4.
This approach assumes that once the failure mode has occurred, the effect will
take place. In the case that more than one effect of the same kind (environment-
related, safety/health-related, or facility’s availability-related) can occur, the one
with the highest severity is chosen.
TABLE 12.2
Severity of the Environmental Effect of a Failure Mode (EA)

Qualitative Classification  Associated
of the Effect               Value       Meaning
Low                         1           There are no impacts on facility’s site. It considers only internal
                                        minor effects. Corrective measures are not required.
Serious                     2           There are minor impacts out of the facility boundaries, which demand
                                        some cleaning procedures, considering a recovery time of 1 week or
                                        less. There is presence of smoke, noise, and bad smells. The local
                                        traffic is affected by the evacuation.
Severe                      3           There are minor impacts outside the facility boundaries, which require
                                        some cleaning processes with a recovery time of at least 1 month.
                                        Possible wounded or injured people.
Very severe                 4           There are serious impacts outside the facility boundaries. Reversible
                                        damages are considered with a recovery time of up to 6 months.
                                        Moderated impacts on animal and vegetal life. Temporary disabilities
                                        of people.
Catastrophic                5           There are significant impacts outside the facility boundaries, with a
                                        recovery time of more than 6 months. Irreversible damages on animal
                                        and vegetal life are considered. Possible deaths or permanent
                                        disabilities of people are considered.
TABLE 12.3
Severity of the Effect of a Failure Mode on the Safety or Health (ES)

Qualitative Classification  Associated
of the Effect               Value       Meaning
Low                         1           Local minor effects (including first aid procedures). There are no
                                        disabling damages.
Serious                     2           Appreciable internal effects. Temporary injuries and disabilities.
Severe                      3           Important internal effects. Some permanently injured and disabled
                                        people. The occurrence of up to 1 death is considered possible.
Very severe                 4           Very important internal effects. Several permanently injured and
                                        disabled people. Up to 4 deaths are possible.
Catastrophic                5           Catastrophic internal effects. Multiple permanent affectations.
                                        Numerous deaths (5 cases or more).
TABLE 12.4
Severity of the Effect of a Failure Mode on the Facility’s Availability (ED)

Qualitative Classification  Associated
of the Effect               Value       Meaning
Low                         1           There is no effect on production/functioning. Additional maintenance
                                        tasks during shutdown could be required.
Serious                     2           Loss of important redundancy/reserve. An unplanned shutdown within
                                        72 hours could be required. Recovery time of up to 1 month is
                                        considered.
Severe                      3           Immediate shutdown is required. Recovery time of 1–3 months is
                                        considered.
Very severe                 4           Immediate shutdown is required. Recovery time of 3–6 months is
                                        considered.
Catastrophic                5           Immediate shutdown is required. Recovery time of more than 6 months.
Any clarification needed in support of the analysis (for example, analysis
hypotheses, the basis of causes and effects, or the assignment of parameters whose
certainty is not proven) is made in the Remarks table of the worksheet for each
failure mode analyzed. Finally, the corrective measures derived from all the
information collected and from the criticality analysis are incorporated in the
Recommendations field of the Results sheet.
The criticality analysis in FMEAe uses some factors related to the following subjects:
Next, a set of semi-quantitative indices is defined for the criticality analysis of the
component-failure modes.
where:
qi is the probability of failure or the unavailability of the component-failure mode i
EDi, ESi, and EAi, are the severity degrees of the three kinds of effects
(availability-related, safety-related, and environmental, respectively)
induced by the component-failure mode i
FPmi is the weighting (quality) factor that considers the way the equipment
experiencing failure mode i is commanded when it is demanded for operation,
that is, auto-actuated or in manual mode, from either a remote panel or
locally in the field
FPmi takes the following values: 1 for auto-actuated components; 3 for components
commanded in remote manual mode (from a control room); and 5 for components
commanded manually in the field (by a hand switch located near the equipment).
In this way, the criticality scale starts with the target minimal value of IRC = 1.0E-03
and increases by a factor of 5 at each step until reaching the postulated upper limit
of IRC = 1.2E-01, above which the criticality of the component-failure mode is
considered extreme (extremely critical component-failure mode). The scale is
as follows:
$$IRS = \frac{\sum_{i=1}^{n} IRC_i}{n} \qquad (12.4)$$
where:
IRCi is the risk index of the component-failure mode i
n is the total of component-failure modes of the analyzed system
Following similar criteria to those defined for the IRC, it is postulated that the IRS
target value is 1.0E-3. From this goal, a scale like that proposed for the IRC is estab-
lished, but with more intermediate ranges for a finer classification of the system reli-
ability. The scale is as follows:
$$IIR_i = \frac{IRC_i}{IRS} \qquad (12.5)$$
where:
IRCi is the risk index of the component-failure mode i
IRS is the risk index of the system
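Expressions 12.4 and 12.5 translate directly into code. The Python sketch below is only an illustration: the function names are hypothetical, and the IRC values are taken as given inputs.

```python
from typing import List, Sequence


def system_risk_index(irc: Sequence[float]) -> float:
    """IRS (expression 12.4): average of the component risk indexes."""
    if not irc:
        raise ValueError("at least one component-failure mode is required")
    return sum(irc) / len(irc)


def relative_importance(irc: Sequence[float]) -> List[float]:
    """IIR_i (expression 12.5): each IRC_i divided by the IRS."""
    irs = system_risk_index(irc)
    return [value / irs for value in irc]


if __name__ == "__main__":
    irc_values = [2.0e-3, 1.0e-3, 3.0e-3]  # illustrative values only
    print(system_risk_index(irc_values))
    print(relative_importance(irc_values))
```

Because the IRS is the mean of the IRC values, the IIR values always average to 1, so an IIR above 1 marks a component-failure mode that contributes more than the average.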
In this way, those component-failure modes that are very critical or critical (according
to their IRCi values), with too important or important deviations in excess, will receive
the highest priority when proposing corrective measures to diminish their criticality.
Thus, each group of redundant components is identified with a unique integer value
in the “C” cell of the respective component-failure mode, and the following attri-
butes must coincide for that group, which represent table fields in the worksheet of
the FMEAe (see Figure 12.1):
After the group of redundant components has been identified, their unavailability
values or probabilities of failure (represented by $q_i$) are weighted (penalized)
as follows:

$$qp(n) = \frac{q^{n}}{n} \qquad (12.6)$$
where:
qp(n) is the weighted unavailability/probability of failure mode of redundant
components with redundancy degree n
n is the degree of redundancy of the redundant component group
q is the original unavailability/probability of failure mode of the redundant
component group with degree of redundancy n

$$qp(n) = \frac{\prod_{i=1}^{n} q_i}{n} \qquad (12.7)$$
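The two weighting expressions can be written as small functions. This Python sketch is an illustration with hypothetical function names: the first handles a group whose members share the same q (expression 12.6), and the second takes the individual q values of the group (expression 12.7).

```python
import math
from typing import Sequence


def weighted_q_identical(q: float, n: int) -> float:
    """Expression 12.6: qp(n) = q**n / n for n redundant components with equal q."""
    if n < 1:
        raise ValueError("the degree of redundancy n must be at least 1")
    return q ** n / n


def weighted_q(qs: Sequence[float]) -> float:
    """Expression 12.7: product of the individual q_i divided by n."""
    if not qs:
        raise ValueError("the redundant group must contain at least one component")
    return math.prod(qs) / len(qs)


if __name__ == "__main__":
    print(weighted_q_identical(1e-3, 2))   # two redundant components, q = 1E-3
    print(weighted_q([1e-3, 2e-3, 5e-4]))  # illustrative unequal values
```

When all q_i are equal, the two functions give the same result, so expression 12.6 is the special case of expression 12.7 for identical components.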
$$\begin{aligned} Q_s &= q(A \times B) + q(C) \\ &= [q(A) \times q(B)] + [q(C)] \\ &= 1\text{E-}6 + 1\text{E-}3 \\ &= 1.001\text{E-}3 \end{aligned}$$
where:
IRC[A], IRC[B], and IRC[C] are the component risk indexes of components
A, B, and C, respectively, assuming no effects
N is the total number of component-failure modes analyzed (assumed three
for this example)
These results reveal an inconsistency: the redundant components A and B each
contribute to the IRS as much as component C. Unlike the failure of C, the occurrence
of failure mode A or failure mode B alone is not a sufficient condition for the
system to fail. The previous procedure (without weighting the q values of redundant
components) therefore assigns an excessively conservative contribution to
components A and B.
The previous problem is solved by applying the weighting procedure in estimat-
ing the q values of the failure modes, in the case of redundant components, as estab-
lished by expression 12.7. Thus, the contribution of the risk index of each of the
failure modes of the previous example to the IRS, is estimated as follows:
$$qp(A,B) = (1\text{E-}3)^2 / 2 = 5\text{E-}7$$
Now the IRS can be estimated again, but using the new values of q for A and B,
by expression 12.4:
As can be observed, the new IRS value (3.34E-4) is smaller than the value formerly
estimated without weighting the q values of the redundant components (1.0E-3).
The new value is considered more realistic because it reflects the expected effect
of the redundancy on the system reliability.
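The comparison can be reproduced with a few lines of Python. The sketch below is an illustration under an explicit assumption: since the example is evaluated "assuming no effects", each IRC is taken equal to the corresponding q value. All three components have q = 1E-3, as in the example, and the variable names are hypothetical.

```python
# Original q values of the example: A and B are redundant, C is a single component.
q = {"A": 1e-3, "B": 1e-3, "C": 1e-3}

# Without weighting: IRC_i is taken equal to q_i (assumption: no effect factors).
irs_unweighted = sum(q.values()) / len(q)

# With weighting: A and B form a redundant group of degree n = 2,
# so each receives qp = (q_A * q_B) / 2; C keeps its original value.
qp_ab = q["A"] * q["B"] / 2
q_weighted = {"A": qp_ab, "B": qp_ab, "C": q["C"]}
irs_weighted = sum(q_weighted.values()) / len(q_weighted)

# Relative importance of each component-failure mode after weighting.
iir_weighted = {name: value / irs_weighted for name, value in q_weighted.items()}

if __name__ == "__main__":
    print(f"IRS without weighting: {irs_unweighted:.2E}")
    print(f"IRS with weighting:    {irs_weighted:.2E}")
    print({name: round(value, 4) for name, value in iir_weighted.items()})
```

Under this assumption the weighted IRS is (5E-7 + 5E-7 + 1E-3)/3, which rounds to the 3.34E-4 quoted in the text, and component C dominates the relative importance.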
The new IIR indexes for the component-failure modes A, B, and C are now
estimated again, but substituting the modified values of q and IRS, applying
expression 12.5:
Finally, it should be noted that the new values obtained for IRS, IIR[A], IIR[B],
and IIR[C] are more representative of the system reliability, with a lower IRS value
and a more realistic relative importance, or contribution to the IRS, of the three
component-failure modes (IIR[A] = IIR[B] << IIR[C]). This result demonstrates the
usefulness of the weighting procedure for treating redundant components included in FMEAe.
In this way, the objective of treating the CCFs within a structured method like
FMEAe is to prevent their effects on the results from going unnoticed in cases where
their occurrence is possible, even when no specific quantitative data reflecting
their occurrence are available. Thus, FMEAe employs generic data as a first
approximation, although the analysts can later update them if experience warrants it
or depending on the existence of local defense measures.
The set of procedures developed to include the CCFs in an automated way
within the FMEAe worksheet summarizes criteria and steps of other procedures
collected in the specialized literature [29–34], as well as the efforts to update
and improve the methodology and general approaches of concern [34]. To achieve
this, the typical structure of the traditional FMEA worksheet had to be modified,
and some new fields were included in the tables, as shown in Figure 12.1, to
facilitate the analysis and comparison of the pertinent information.
The algorithms of these procedures include the following general tasks/steps:
where:
βk is the beta factor to characterize the failure of k components from the
same generic group of size m
β2 is the generic beta factor to characterize the failure of two components
from the same group (assumed here as β2 = 0.1, which is the average
of the values of β factors estimated for the component-failure modes
involved in previous CCF studies [30,31])
• Finally, the resulting CCFs are added in the worksheet, so that they are
part of the criticality analysis together with the rest of the single failure
modes.
Sections 12.4.1 through 12.4.4 give some details of these tasks.
• BT: Battery
• DG: Emergency diesel generator
• KA: Circuit breaker general
• KB: Circuit breaker bus bar
• KG: Circuit breaker generator
• MA: Motor electrical
• PD: Diesel driven pump
• PM: Motor driven pump
• PT: Turbine driven pump
• QB: Blower fan
• QC: Compressor
• RT: Relay time delay
The dependency degrees are used in FMEAe to qualify the depth with which the
dependency mechanisms defined in [30,31,34] could act. Therefore, the ratio of fail-
ures due to common causes among the totality of causes is a value directly propor-
tional to that dependency degree. Following is a general description of the procedure
for classifying CCFs by their degree of dependency:
• G1: β2 = 0.1
• G2: β2 = 0.15
• G3: β2 = 0.2
• After the IRCi values have been calculated, they are grouped by component
type, according to their generic code (field Cod G. in the Datos de Fallos y
Efectos table of the FMEAe worksheet).
$$IIC[k] = \frac{\sum_{i=1}^{N_k} IRC_i^{k}}{N_k} \qquad (12.9)$$
where:
IIC[k] is the importance index of component type k (average IRC value
within a given group k of generic components)
Nk is the total number of component-failure modes belonging to the group k
IRCik is the risk index of the component-failure mode i belonging to group k
• Thus, the component types that engender the most critical failure modes are
determined; that is, the types of components that contribute most to the risk
become known and, therefore, common corrective actions for similar components
can be standardized or, otherwise, important design changes can be proposed.
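The grouping and averaging of expression 12.9 can be sketched as follows. This Python fragment is an illustration only: the function name and the input layout (pairs of generic code and IRC value) are assumptions, the generic codes PM, MV, and KA are used merely as labels, and the IRC values are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def importance_by_type(records: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (generic code, IIC[k]) pairs sorted from most to least important.

    IIC[k] is the average IRC of the component-failure modes in group k
    (expression 12.9).
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for generic_code, irc in records:
        groups[generic_code].append(irc)
    iic = {code: sum(values) / len(values) for code, values in groups.items()}
    return sorted(iic.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical worksheet extract: (generic code, IRC of the failure mode).
    worksheet = [
        ("PM", 4.0e-3), ("PM", 2.0e-3),
        ("MV", 1.0e-3), ("MV", 3.0e-3), ("MV", 2.0e-3),
        ("KA", 5.0e-4),
    ]
    for code, value in importance_by_type(worksheet):
        print(f"{code}: {value:.2E}")
```

Sorting in decreasing order places the component types that contribute most to the risk at the top, which is where corrective actions would be considered first.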
TABLE 12.5
State/Position of the Components of the System of Figure 12.3

No.  Component ID  Component Description                                 Standby State/Position  Demand State/Position             Control
1.   TK            Water storage tank                                    Full level              Empty after mission fulfilled     Continuously monitored
2.   V1            Manual operated valve for isolating the odd train     Normally open           Normally open                     Periodically tested
3.   V2            Manual operated valve for isolating the even train    Normally open           Normally open                     Periodically tested
4.   PM1           Motor driven pump. Odd train                          Automatic               Running for 4 hours               Periodically tested
5.   PM2           Motor driven pump. Even train                         Standby                 Running for 4 hours if PM1 fails  Periodically tested
6.   VC1           Check valve. Odd train                                Normally closed         Open while PM1 is running         Periodically tested
7.   VC2           Check valve. Even train                               Normally closed         Open while PM2 is running         Periodically tested
8.   MV1           Motor operated valve at discharge of the odd train    Normally closed         Full open                         Periodically tested
9.   MV2           Motor operated valve at discharge of the even train   Normally closed         Full open if odd train is failed  Periodically tested
10.  MV3           Motor operated valve for testing the odd train        Normally open           Full closed                       Periodically tested
11.  MV4           Motor operated valve for testing the even train       Normally open           Full closed if PM2 is running     Periodically tested
12.  SP            Sprinkler                                             Empty                   Cooling water flowing             Non-controlled
13.  Power6KV      Support system for power supply of both PM1 and PM2   Energized               Energized                         Continuously monitored
14.  Power380V     Support system for power supply of all MOVs           Energized               Energized                         Continuously monitored
15.  Odd-IC        I&C circuit for auto-activation of the odd train      Energized               Energized                         Continuously monitored
16.  Even-IC       I&C circuit for auto-activation of the even train     Energized               Energized                         Continuously monitored
4. Under signal of fire event, if the flow is not established, a signal for activa-
tion of the even train is produced, which starts PM2 (set in standby position
of its HS), closes MV4, and opens MV2.
5. The odd train is tested every 720 hours by recirculating the cooling water
through the MV3 valve to the water storage tank TK; 15 days later, the even
train is tested by starting PM2 and recirculating cooling water through MV4
to the water storage tank TK.
6. The motor operated valves (MOVs) MV3 and MV4 are tested monthly by clos-
ing them, following the same procedure used for testing each train. When the full
closed position is verified, the valves are opened again and stay in that position.
7. Similarly, the MOVs MV1 and MV2 are tested monthly by opening them. When
the full open position is verified, the valves are closed again and stay in
that position.
8. The motor driven pumps, PM1 and PM2, are powered from the same 6000
volts alternating current (6 kV AC) bus bar.
9. All MOVs, MV1, MV2, MV3 and MV4 are powered from the same 380 V
AC bus bar.
1. Only two types of support systems were considered: power supply and I&C
circuits for auto-activating the fire quenching system.
2. The position of the HS of the active components in the case of the odd train
is set to automatic. This means that under a real demand condition, the gen-
erated signal will act on the components of the odd train.
3. The position of HS for the active components of the even train is set in
standby, which means that they will act only on the condition of coinci-
dence of PM1 failed and fire alarm signal present.
4. The only human errors considered for this example refer to “V1 in wrong
position on demand” (it fails to remain open on demand) due to human error
after maintenance of the odd train, and “V2 in wrong position on demand”
(it fails to remain open on demand) due to human error after maintenance
of the even train. Both human actions are considered independent events.
5. The pumps and valves are in the same room.
6. The boundary of pumps and MOVs includes the respective circuit breakers
so that the interface for power supply is considered the BB-6KV bus bar for
PM1 and PM2; and BB-380V bus bar for all MOVs from MV1 to MV4.
7. The interfaces for I&C circuit are assumed to be RC-101 for odd-IC circuit
and RC-102 for even-IC circuit.
8. All quantitative data for component-failure modes were taken from generic
databases starting from [2,3,39].
9. For simplicity, only one failure mode for each component was considered,
except for the motor driven pumps. In the case of the manually operated
valves, V1 and V2, the hardware cause for the “fail to remain in position”
failure mode was neglected and only the human error was considered instead.
10. The sprinklers also were excluded from the assessment because they are
passive components with very low failure rates.
Figure 12.3 presents the FMEAe worksheet with all the essential information filled
in the respective cells, before the CCF events are determined.
It can be observed from Figure 12.3 that 17 component-failure modes are included
in the analysis, according to the system drawing of Figure 12.2, the assumptions, and
other information of interest.
After the fields of the tables are filled, the next step is to proceed with the
criticality analysis. To show how the risk profile of the system is modified by the
inclusion of CCF events in the reliability assessment by means of FMEAe, the results
of both cases, which are presented in Figures 12.5 and 12.6, are compared. Figure 12.4
shows the modified FMEAe worksheet after the CCFs were determined.
The worksheet in Figure 12.4 shows an increase in the number of component-failure
modes with respect to Figure 12.3. After the CCF events are determined, the list of
component-failure modes that participate in the criticality analysis comprises 22
elements, because five CCFs were added (those whose code begins with CM-). Since the
degree of redundancy is two, five double-failure CCFs were added, as indicated in
the panel located in the lower-left corner of the worksheet. They were classified as
degree G3 because they all share the complete set of attributes to be considered
(internal and external attributes).
FIGURE 12.3 FMEAe worksheet showing the last five component-failure modes of the list
(before determining the CCF events).
FIGURE 12.4 FMEAe worksheet modified after determining CCF events showing the last
5 out of 22 component-failure modes.
FIGURE 12.6 List of component importance ranked by the F-V importance measure
estimated by the ARCON code without CCF contributions.
Then, the analysts must verify all the information in the worksheet before the
criticality analysis is made, to avoid inconsistent results. The data
accompanying the single failure modes that generate the CCFs are transferred to
the latter. Some of them, like failure rates, need to be recalculated, which is done
automatically by the FMEAe algorithms. However, the analysts still need to enter
some data in the worksheet, such as the expected effects of each CCF-related failure
mode; in this case, these are the effects related to system availability (ED), whose
degrees of severity are indicated in Table 12.3. For this example, the effects
of each of the five CCF-related failure modes were assigned a severe degree of
severity (3), which in the FMEAe approach means that immediate shutdown is required
and a recovery time of 1–3 months is considered. After doing this, the criticality
analysis can be performed.
Based on the data entered by the analysts, several indexes are estimated by the
algorithms of criticality analysis (through the corresponding expressions of
Sections 12.3 and 12.4). Figure 12.5 shows the FMEAe Result sheet, which gives the
value of IRS and the criticality ranking of the component-failure modes without
CCF contribution.
Despite the low value of IRS (9.39E-5), which means that according to the data
used the system presents high degrees of reliability/low degrees of risk, Figure 12.5
shows the component-failure modes which dominate the system reliability by means
of the ranking made by the IIRi values. These include, in decreasing order of impor-
tance, the single failure of both pumps to start under demand (PM1.S and PM2.S),
the failure of both support systems of power supply, and the human errors on the
valves V1 and V2 (V1.M and V2.M).
Figure 12.6 represents the results of the FTA for the system using the same
set of data and assumptions made for FMEAe analysis by means of the ARCON
code [40,41], which is used here as a way of comparison between the FTA and
FMEAe approaches. The system failure probability estimated is Ps = 1.61E-3,
which can be considered within the reliability target that should be established
for safety systems at industrial facilities with high requirements for safety and
availability. The reliability profile is shown in Figure 12.6 ranked by Fussell-
Vesely (F-V) importance measure that represents the relative contribution of each
component-failure mode to the system’s probability of not fulfilling its safety
function.
From Figures 12.5 and 12.6, it can be seen that the same group of component-failure
modes dominates the reliability profile. The distinctive feature between
both approaches lies in the fact that FMEAe adds the failure effects, which, in
turn, account for the redundancy. Therefore, the single events Power380V and
Power6KV are placed at a higher level of the ranking made by FMEAe. The F-V measure
does the same in the FTA performed with ARCON. In that sense, the approach
of FMEAe can be used to follow the regulatory issues more closely than that of FTA.
Nevertheless, the list of the 12 most important failure modes coincides in both
approaches.
When the CCFs are included in the analysis, whatever the method used, the reli-
ability profiles change dramatically, as a function of the system’s redundancy degree.
In the case of the example used herein, the global values were not affected because
of the relatively low values of unavailability used for the system’s components, and
the system redundancy itself, but the importance profile of the component-failure
modes changes slightly, as shown in Figures 12.7 and 12.8.
The approach of minimal cut sets (MCSs) is responsible for the major difference
between the results obtained by FMEAe and the ARCON code and, once again, the
inclusion of the effects in the former strengthens that difference even more.
The value of IRS increases five times when the CCFs are included in the analy-
sis (see Figures 12.5 and 12.7), while the system failure probability estimated by
ARCON after the inclusion of CCFs is P = 4.55E-3; that is, it increases 2.8 times.
The reliability profiles coincide in both cases with slight differences that are based on
the same criteria explained previously. In this case, the most dominant failure modes
in both approaches were the CCFs of motor-operated pumps to start, followed in the
case of FMEAe by another two CCF events involving the failure to open on demand
of both MV1 and MV2, and the failure to close on demand of MV3 and MV4 as
shown in Figure 12.7.
On the other hand, the ranking of values estimated by FTA gives more priority
to the single failures of the pumps PM1 and PM2 to start on demand over the CCFs
of the MOVs to open and to close on demand, as shown in Figure 12.8, and as was
stated before. This is due to the MCS approach, with respect to the failure mode
and effect approach. Nevertheless, both approaches coincide in estimating the most
important contributors to the system reliability, and therefore, they can be equally
useful for decision making, despite their known major differences.
Finally, to complete the results from the FMEAe approach in evaluating the system
reliability, an analysis of importance by component type can be done as was
indicated in Section 12.5. The results of that kind of analysis for this example
system are presented in Figure 12.9.
Figure 12.9 shows that the component type of greatest importance was the PM type
(motor driven pumps), which resulted in a medium value of importance according
to the FMEAe postulated scale. This classification is quite logical because this type
of component was the one with the highest values of criticality among all the system
components, due to either their single or their common cause failures. This result
supports the measures to be taken to improve the system reliability profile, even
though the system reliability can be considered acceptable.
FIGURE 12.8 List of component importance ranked by the F-V importance measure
estimated by the ARCON code with CCF contributions.
REFERENCES
1. IAEA. INSAG-12. Basic Safety Principles for Nuclear Power Plants. IAEA Safety
Series No. 75-INSAG-3, Rev. 1. IAEA, Vienna, Austria, 1999.
2. IAEA. IAEA-TECDOC-478. Component Reliability Data for Use in Probabilistic
Safety Assessment. IAEA, Vienna, Austria, 1988.
3. IAEA. IAEA-TECDOC-508. Survey of Ranges of Component Reliability Data for Use
in Probabilistic Safety Assessment. IAEA, Vienna, Austria, 1989.
4. PHA5-Pro Software. Trial Version. DYADEM International LTD, USA, 1994–2000.
5. Hazard Review Leader. Trial Version 4.0.106. ABS Consulting, 2000–2003.
6. Dinámica Heurística, S.A. de C.V. Software SCRI-HAZOP. SCRI-FMEA, SCRI-What/
If. NL, México, 2004.
7. FMEA Pro 6. FMEA-PRO 6 World’s Most Powerful FMEA Tool Risknowlogy Risk,
Safety & Reliability. Dyadem International LTD, 2003.
8. Relex FMECA. Relex Software Corporation, 2003.
30. EPRI NP-3967. Classification and Analysis of Reactor Operating Experience Involving
Dependent Events. EPRI, Palo Alto, CA, 1985.
31. EPRI TR-100382. A Data Base of Common-Cause Events for Risk and Reliability
Applications. EPRI, Palo Alto, CA, 1992.
32. US NRC. NUREG/CR-5460. A Cause-Defense Approach to the Understanding and
Analysis of Common Cause Failures. SNLs, JBF Associates, NUS Corporation,
Washington, DC, 1990.
33. US NRC. NUREG/CR-5801. Procedure for Analysis of Common-Cause Failures in
Probabilistic Safety Analysis. USNRC, Washington, DC, 1993.
34. US NRC. NUREG/CR-6268, Rev. 1. Common-Cause Failure Database and Analysis
System: Event Data Collection, Classification, and Coding. INL, Washington, DC,
2007.
35. US NRC. NUREG/CR-6819, Vol. 1. Common-Cause Failure Event Insights: Diesel
Generators. INEEL, Washington, DC, 2003.
36. US NRC. NUREG/CR-6819, Vol. 2. Common-Cause Failure Event Insights: Motor
Operated Valves. INEEL, Washington, DC, 2003.
37. US NRC. NUREG/CR-6819, Vol. 3. Common-Cause Failure Event Insights: Pumps.
INEEL, Washington, DC, 2003.
38. US NRC. NUREG/CR-6819, Vol. 4. Common-Cause Failure Event Insights: Circuit
Breakers. INEEL, Washington, DC, 2003.
39. OREDA. Offshore Reliability Data, 4th ed. Det Norske Veritas, Høvik, Norway, 2002.
40. Mosquera, G., Rivero, J., Salomón, J. et al. 1995. Disponibilidad y Confiabilidad
de Sistemas Industriales. El sistema ARCON. Anexo B, pp. 137–140. Ediciones
Universitarias UGMA, Barcelona, Venezuela, Mayo.
41. Salomón, J. Manual de Usuario Práctico del Código ARCONWIN Ver 7.2. Registro de
autor, CENDA, La Habana, Cuba, 2015.
13 Reliability Assessment and Probabilistic Data Analysis of Vehicle Components and Systems
Zhigang Wei
CONTENTS
13.1 Introduction
13.2 Reliability of Vehicle Components and Systems
13.3 Fatigue S-N Curve Transformation Technique
13.4 Representation of Reliability Testing Methods in the Damage-Cycle Diagram
13.5 Probabilistic Data Analysis
13.5.1 Binomial Reliability Demonstration
13.5.2 Life Testing
13.5.3 Bayesian Statistics for Sample Size Reduction
13.6 System Reliability
13.6.1 Series System Model
13.6.2 Parallel System Model
13.6.3 Mixtures Model
13.7 Conclusions
References
13.1 INTRODUCTION
Fatigue-related durability and reliability performance is a major concern for the
design of vehicle components and systems [1]. Durability describes the ability of
a product to sustain required performance over time or cycles without undesirable
failure. Reliability is defined as the ability of a system or component to perform its
required functions under stated conditions for a specified period. Both the load/stress
experienced by a vehicle component or system and the strength of the components
or systems being studied are random variables and normally follow stochastic
and probabilistic processes. Probability distribution functions can therefore
be used to characterize the load/stress and strength of the vehicle components and
systems for a given cycle. The stress-strength interference model is a fundamental
probability-based method for reliability analysis [2] (Figure 13.1a), and it can be
applied to fatigue-related reliability assessment if both the stress/load distribution
and the fatigue strength distribution at a given common cycle are known. Another
approach, which is similar to the stress-strength interference model but is more
commonly used in practical reliability assessment, is the life-based demand-capability
model (Figure 13.1b). In contrast to the stress-strength interference model, the life
distributions of demand and capability at a given stress or load level must be known
in advance.
FIGURE 13.1 Reliability assessment based on (a) stress-strength interference model and
(b) demand-capability in terms of fatigue cycle. (Adapted from Wei, Z. et al., Reliability
analysis based on stress-strength interface model, Wiley Encyclopedia of Electrical and
Electronics Engineering, Chichester, UK, Wiley, 2018.)
To make the stress-strength interference model applicable, the stress
distribution, with probability density function (PDF) $f_P(P)$, and the strength
distribution $f_S(S)$ must be available. Similarly, to make the life-based
demand-capability model applicable, the demand distribution $f_N(D)$ and the
capability distribution $f_N(C)$ must be provided in advance. How to obtain a
representative stress distribution and a life demand distribution is a challenging
topic. A simplified method is often used in practice. For example, instead of using
the whole set of life demand information, a single life demand point is set as a
target, which represents a chosen percentile of usage (e.g., the 95th percentile).
Corresponding to the life demand point, a single capability point, which represents
a certain reliability and confidence (RC) level, for example, R90C90 (90% reliability
and 90% confidence), is identified from the life capability distribution and compared
with the demand point [1]. A safety factor can then be defined as the ratio of the
life capability to the life demand. The stress-strength interference model can be
simplified in a similar way. How to obtain a fatigue life distribution and a fatigue
strength distribution from a given set of stress-cycle (S-N) fatigue data is one of
the main focuses of this chapter. The relationship between these two distributions
for a given set of fatigue data is a key to accomplishing reliability assessments;
however, this relationship is often unclear. To reveal it, a new fatigue S-N curve
transformation technique, which is based on the fundamental statistics definition
and some reasonable assumptions, is introduced in this chapter.
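As a minimal illustration of the stress-strength interference idea, the following Python sketch assumes that the stress P and the strength S are independent and normally distributed. Under that assumption the reliability P(S > P) has the closed form Φ((μ_S − μ_P)/√(σ_S² + σ_P²)). The function name and all numerical values are hypothetical and are not taken from this chapter.

```python
import math


def interference_reliability(mu_s, sigma_s, mu_p, sigma_p):
    """Reliability P(S > P) for independent normal strength S and stress P.

    Assumption (not from the chapter): both variables are normal, so the
    margin S - P is normal with mean mu_s - mu_p and variance
    sigma_s**2 + sigma_p**2.
    """
    z = (mu_s - mu_p) / math.sqrt(sigma_s**2 + sigma_p**2)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values: identical distributions give a 50% chance that
# strength exceeds stress.
r_equal = interference_reliability(100.0, 10.0, 100.0, 10.0)

# Hypothetical values: a margin of three combined standard deviations.
r_margin = interference_reliability(500.0, 40.0, 350.0, 30.0)

print(f"equal distributions: R = {r_equal:.4f}")
print(f"three-sigma margin:  R = {r_margin:.5f}")
```

The same structure applies to the life-based demand-capability model if the logarithms of demand and capability cycles are treated as the two normal variables, which is again an assumption of this sketch.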
Numerous testing methods are available for product durability validation and reli-
ability demonstration, and such methods include life testing (test-to-failure), binomial
testing (pass or fail), and degradation testing [1,3]. The test-to-failure method tests
a component to the occurrence of failure under a specified loading. The binomial
(Bogey) testing method is used often in reliability demonstration in which the cus-
tomers’ specifications must be met for acceptance into service. The degradation
testing is used to test a product to a certain damage level, which is often at a level
far below complete failure. Additionally, the associated accelerated testing meth-
ods [3,4] (i.e., accelerated life testing, accelerated binomial testing, and accelerated
degradation testing) are used often to shorten the development time and reduce the
associated cost while not significantly sacrificing the accuracy of the assessment.
All these methods are treated separately, and their relationships are not clear, which
impedes the wide and proper applications of these methods and their combinations.
In this chapter, a unified framework of the reliability assessment method is pre-
sented in a damage-cycle (D-N) diagram [5], which consists of the following major
constituents: (1) test data, either test-to-failure, binomial, degradation, or combined,
for estimating the continuous probabilistic distribution function, (2) damage accu-
mulation rules, such as the linear or nonlinear damage accumulation rules, for data
interpolation and extrapolation, and (3) a variable transformation technique, which
converts a probabilistic distribution of a variable into a probabilistic distribution of
another variable.
In addition to these two transformation techniques, the probabilistic analysis of
data with large sample size using two- and three-parameter Weibull distribution
functions, the uncertainty for data with small sample size, and the sample size
reduction approaches based on Bayesian statistics are also investigated. Furthermore,
the basic assumptions and theories in assessing the reliability of systems are
provided to complement these two basic transformation techniques. It should be noted
that the software reliability of modern vehicle components and systems is very
important [3], especially now that vehicle-to-vehicle (V2V), vehicle-to-infrastructure
(V2I), and autonomous vehicles are mainstream topics in the automotive industry.
However, only fatigue-related reliability is considered in this chapter because of
space limitations.
This chapter is organized as follows:
Section 13.2 provides a brief and general background about the reliability
assessments of vehicle components and systems with an emphasis on vehicle exhaust
components and systems. Section 13.3 presents a fatigue S-N curve transformation
technique in which distributions of load/stress and life can be properly selected
based on data pattern and converted to each other when necessary. Section 13.4
introduces a variable transformation technique in a damage-cycle (D-N) diagram,
which is a tool that can effectively interpret the commonly used fatigue-testing
methods and reveal the interrelationship among these testing methods.
Section 13.5 provides some basic methods for processing data with probabilistic
distributions, with special attention to the differences between the two-parameter
and three-parameter Weibull distribution functions in terms of predictability and
applicability. Uncertainty analysis on data with small sample size and the potential
capability of Bayesian statistics in sample size reduction are also discussed in
Section 13.5. Section 13.6 provides the basic concepts of the reliability assessment
of systems. Pertinent examples are provided in each section to demonstrate the
concepts and techniques developed. Finally, Section 13.7 summarizes this chapter
with several key observations.
be performed with the help of the linear Miner’s damage rule. Miner’s rule predicts
that failure occurs when damage is greater than or equal to 1 [8].
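The linear Miner's rule mentioned above can be sketched in a few lines of Python. The sketch uses the standard form of the rule, in which the damage is the sum of the ratios of applied cycles to cycles to failure at each load level. The load blocks below are hypothetical illustration values, not data from this chapter.

```python
def miner_damage(blocks):
    """Linear Miner damage: sum of n_i / N_i over all load blocks.

    `blocks` is a list of (applied_cycles, cycles_to_failure) pairs,
    one pair per load level.
    """
    return sum(n_applied / n_fail for n_applied, n_fail in blocks)


# Hypothetical duty cycle with three load levels.
blocks = [
    (20_000, 100_000),  # low load: 20% of the life at this level is used
    (5_000, 50_000),    # medium load: 10% used
    (1_000, 4_000),     # high load: 25% used
]

damage = miner_damage(blocks)
# Miner's rule predicts failure when the damage reaches or exceeds 1.
predicted_failure = damage >= 1.0

print(f"accumulated damage = {damage:.2f}, failure predicted: {predicted_failure}")
```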
As the name implies, in hot-testing all parts of the RLDA, calibration, and
bench testing are conducted in service or equivalent high temperature conditions.
The fatigue life can be assessed in service condition and no temperature correc-
tion factor is required in the fatigue life assessment. The hot-testing method is still
evolving [8] and, without losing generality, only the cold-testing related topics will
be addressed in this chapter.
FIGURE 13.3 Cycle and load based probabilistic distributions for the same set of fatigue
S-N data.
The life distribution is used much more commonly than the strength distribution
in fatigue data analysis. However, the strength distribution has many unique charac-
teristics and important applications, such as:
In all these cases, a new technique is required to transform the distribution of life
to the distribution of strength or vice versa. The following is such a technique to
accomplish this goal.
$$f[y(x)] = K f[x(y)] \tag{13.1}$$
Equation 13.1 indicates implicitly that the peak of the PDF of the strength distribu-
tion corresponds to that of the PDF of life distribution, and the valley of the PDF
of the strength distribution corresponds to that of the PDF of life distribution for
single-mode probabilistic distributions (Figure 13.4). This assumption makes sense
intuitively based on the observations of a wide variety of fatigue data. The
following lognormal (normal) distributions are provided to demonstrate the
transformation technique.
The selection of probabilistic distribution functions is a critical issue in reliability
assessment. The real distribution of fatigue life at a given stress level is essentially
unknown. However, the two-parameter Weibull and log-normal distribution functions
are commonly used in probabilistic fatigue life assessments [11]. In the automotive
industry, the two-parameter Weibull is often preferred in fatigue life assessments
because of its simplicity and the seemingly meaningful interpretation of its shape
parameter. Years of experience and data collection show that both functions empirically
fit the fatigue data equally well as far as the mean behavior is concerned [11]. The
pairs of fit parameters for the two distribution functions are, respectively, µ (mean)/σ
(standard deviation) and η (scale)/β (shape). The bell-shaped normal PDF and the
corresponding cumulative distribution function (CDF) are expressed in Equations 13.3a
and 13.3b, respectively:
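$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right] \tag{13.3a}$$
$$F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \tag{13.3b}$$
$$f(x) = \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{x}{\eta}\right)^{\beta}\right] \tag{13.4a}$$
$$F(x) = 1 - \exp\left[-\left(\frac{x}{\eta}\right)^{\beta}\right] \tag{13.4b}$$
$$f(x) = \frac{\beta}{\eta}\left(\frac{x-a}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{x-a}{\eta}\right)^{\beta}\right] \tag{13.5a}$$
$$F(x) = 1 - \exp\left[-\left(\frac{x-a}{\eta}\right)^{\beta}\right]; \quad 0 < a \le x < \infty,\ \eta, \beta > 0 \tag{13.5b}$$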
The normal distribution is used in this section to show the fatigue S-N curve trans-
formation technique. Based on Equations 13.1 through 13.3a, the PDF of the normal
distribution function f ( y ) as a function of y can be written as Equation 13.6:
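$$\int_{-\infty}^{+\infty} K f[x(y)]\,dy = K \int_{-\infty}^{+\infty} \frac{1}{\sigma_x(y)\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{x-\mu_x(y)}{\sigma_x(y)}\right]^{2}\right\} dy = 1 \tag{13.6}$$
$$\mu_x = a + by \tag{13.7}$$
$$K \int_{-\infty}^{+\infty} \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{y-(a-x)/(-b)}{-\sigma_x/b}\right]^{2}\right\} dy = 1 \tag{13.8}$$
$$K = -b \tag{13.9}$$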
For a linear fatigue S-N curve in a log-log plot with x = log( N ) and y = log( S ),
assume that the distribution of the cycles to failure at a stress level follows a normal
distribution in a log-log plot, then Equation 13.8 simply becomes Equation 13.10:
$$f[S(N)] = \frac{1}{\sqrt{2\pi}\,(-\sigma/b)} \exp\left\{-\frac{1}{2}\left[\frac{\log(S) - \dfrac{a-\log(N)}{-b}}{-\sigma/b}\right]^{2}\right\} \tag{13.10}$$
Table 13.1 lists a set of fatigue S-N data of welded exhaust components made of
steel. Tests are conducted by controlling the applied load, and only two load levels
are tested, with six data points at each load level. The fatigue data show a wide
scatter band because many factors, such as material inhomogeneity and welding
quality, are involved in the failure of the exhaust components. Since the data in
Figure 13.5 belong to the “standard horizontal pattern” [12], the horizontal offsets
method, which is the ASTM standard recommended method [10], should provide
a reasonable fit curve. The fit curves obtained with the horizontal offsets method
and with the vertical offsets method are plotted in Figure 13.5, and the fit
parameters are listed in Table 13.2. The results of the horizontal offsets method are
very different from those of the vertical offsets method, which provides a poor fit
to the set of data. Based on the linear assumption in the log-log plot and the
estimated mean curve from the horizontal offsets method, the standard deviation of
the strength can be calculated, and the results are listed in Table 13.2. This
example belongs to type (f) listed in Section 13.3. To accurately assess the
reliability of a system, the reliability of each constituent component must be
accurately assessed as well. However, the reliability assessment of each component
is often conducted with limited sample size and under certain testing conditions
because of budget and testing constraints, which brings significant uncertainty to
the test results and their interpretation. Example 13.1 indicates that the obtained
results (the mean and the standard deviation) could be inaccurate and even
misleading if the load/stress distribution is obtained directly from fitting the
data with the vertical offsets method. By contrast, the load/stress distribution
obtained by transforming the life distribution, which in turn is obtained by fitting
the data with the horizontal offsets method, is logically sound and meaningful, and
it should therefore lead to a more accurate system reliability assessment.
TABLE 13.1
Fatigue Cycles to Failure at the Two Stress Levels
Load, lbs   No. 1    No. 2    No. 3    No. 4    No. 5    No. 6
520         86188    130708   153282   168718   177465   304998
620         45823    55775    73715    89524    108583   135140
FIGURE 13.5 Vertical and horizontal offsets methods for fatigue data of an automotive
exhaust component.
TABLE 13.2
Calculated Fit Parameters with a = −log(C)/h and b = 1/h for the Power Law S = CN^h
Method   C (a)            h (b)              STD_N   STD_S (Equation 13.11b)
Vert.    2213.1 (28.6)    −0.117 (−8.547)    0.262   n/a
Hori.    10889.3 (15.9)   −0.254 (−3.937)    0.178   0.045
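The horizontal offsets row of Table 13.2 can be approximately reproduced from the data of Table 13.1 with the short Python sketch below. It regresses x = log(N) on y = log(S), that is, it fits µ_x = a + by as in Equation 13.7, and then scales the life standard deviation by −1/b to obtain the strength standard deviation. Two choices here are assumptions of this sketch rather than statements from the chapter: base-10 logarithms are used, and the standard deviation is computed with n − 2 degrees of freedom.

```python
import math

# Cycles to failure from Table 13.1, keyed by load in lbs.
cycles_by_load = {
    520: [86188, 130708, 153282, 168718, 177465, 304998],
    620: [45823, 55775, 73715, 89524, 108583, 135140],
}

# y = log10(load) is the controlled variable, x = log10(cycles) the response.
y, x = [], []
for load, cycles in cycles_by_load.items():
    for n_cycles in cycles:
        y.append(math.log10(load))
        x.append(math.log10(n_cycles))

n_pts = len(x)
x_bar = sum(x) / n_pts
y_bar = sum(y) / n_pts
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Horizontal offsets: least squares on the life residuals, x = a + b*y.
b = s_xy / s_yy
a = x_bar - b * y_bar

# Standard deviation of log-life about the fitted line (n - 2 degrees of freedom).
sse = sum((xi - (a + b * yi)) ** 2 for xi, yi in zip(x, y))
std_n = math.sqrt(sse / (n_pts - 2))

# Transformed standard deviation of log-strength: sigma_y = -sigma_x / b (b < 0).
std_s = -std_n / b

print(f"a = {a:.1f}, b = {b:.3f}, STD_N = {std_n:.3f}, STD_S = {std_s:.3f}")
```

Under these assumptions the fitted values agree with the tabulated ones to the precision shown in the table, which suggests that the table was produced in a similar way, although the chapter does not state the details.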
$$F_Y(y) = P(Y \le y) = P[\varphi(X) \le y] \tag{13.12}$$
$$f_Y(y) = \frac{dF_Y(y)}{dy} = f_X[\psi(y)]\,\frac{d\psi(y)}{dy} \tag{13.14}$$
$$f_Y(y) = \frac{dF_Y(y)}{dy} = -f_X[\psi(y)]\,\frac{d\psi(y)}{dy} \tag{13.16}$$
$$f_Y(y) = \frac{dF_Y(y)}{dy} = f_X[\psi(y)]\,\left|\frac{d\psi(y)}{dy}\right| \tag{13.17}$$
The variable transformation technique shown in Equations 13.14 through 13.17 is the
essential part of the unified framework for representing these three reliability testing
methods. With this technique, the distribution of cycles to failure can be calculated
easily from the damage distribution at a given cycle or the cycle distribution at a
given damage with the help of a damage evolution equation, which can be either
linear or nonlinear. Conversely, if the final life distribution is known, then the
distribution of damage at any given cycle and the cycle distribution at any given
damage level can be calculated in the same manner. In practice, the PDF
$f[N_f(D = 1)]$ and the CDF $F[N_f(D = 1)]$ can be estimated by fitting the life
testing data. It is noted that Equation 13.17 is obtained by assuming that the
transformation functions are either monotonically increasing or monotonically
decreasing, which is the case for fatigue-based reliability analysis. For complex
cases where the monotonicity assumption is not valid, a more general theoretical
framework as provided in [13] can be followed.
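The change-of-variable rule can be checked numerically. The Python sketch below assumes a linear damage rule D = N/N_f and a two-parameter Weibull life N_f, so that at a fixed cycle count N the inverse transformation is ψ(D) = N/D, which is monotonically decreasing. The density obtained from the transformation is compared with a finite-difference derivative of the damage CDF. All names and numerical values are hypothetical.

```python
import math


def life_pdf(n_f, eta, beta):
    """Two-parameter Weibull PDF of the cycles to failure N_f."""
    z = n_f / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def damage_pdf(d, n, eta, beta):
    """PDF of damage D = n / N_f at a fixed cycle count n.

    Uses f_D(d) = f_X(psi(d)) * |d psi / dd| with psi(d) = n / d.
    """
    psi = n / d
    dpsi_dd = -n / d ** 2  # negative: psi decreases as d increases
    return life_pdf(psi, eta, beta) * abs(dpsi_dd)


def damage_cdf(d, n, eta, beta):
    """CDF of damage: P(D <= d) = P(N_f >= n / d) = exp(-(n / (d * eta))**beta)."""
    return math.exp(-((n / (d * eta)) ** beta))


# Hypothetical values: characteristic life 1e5 cycles, inspection at 5e4 cycles.
n, eta, beta = 5.0e4, 1.0e5, 2.0
d = 0.6
step = 1.0e-6

analytic = damage_pdf(d, n, eta, beta)
numeric = (damage_cdf(d + step, n, eta, beta) - damage_cdf(d - step, n, eta, beta)) / (2.0 * step)

print(f"transformed PDF = {analytic:.6f}, finite-difference PDF = {numeric:.6f}")
```

The two values agree to within the finite-difference error, which is the behavior expected when the absolute value of the derivative is used for a decreasing transformation.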
Corresponding to the three commonly used testing methods, there are three cor-
responding accelerated testing methods: accelerated life testing, accelerated bino-
mial testing, and accelerated degradation testing methods. For example, accelerated
fatigue life testing can be achieved through increasing stress/load levels. At least
two higher stress levels (lower and upper levels) often are introduced to conduct
accelerated fatigue life testing. Then the design parameters at service stress level
are estimated from the accelerated testing through extrapolation. With probabilistic
distribution functions (i.e., $f[N_F(S_U)]$ and $f[N_F(S_L)]$) at the two higher stress
levels, $S_U$ and $S_L$, the probabilistic distribution $f[N_F(S_S)]$ of the life at the
service stress level, $S_S$, can be obtained appropriately by extrapolating data obtained from
the higher stress levels. It should be noted that in accelerated testing data analysis,
the farther the accelerated stress level is from the normal stress level, the larger the
uncertainty in the extrapolation [4]. All these testing methods can be interpreted in
a single D-N diagram for one-stress level testing (Figure 13.6) and the S-N curve for
multiple-stress level testing.
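$$f(D) = \frac{\beta N}{\eta D^{2}} \left(\frac{N}{D\eta}\right)^{\beta-1} \exp\left[-\left(\frac{N}{D\eta}\right)^{\beta}\right], \quad D \ge 0 \tag{13.18a}$$
$$F(D) = \exp\left[-\left(\frac{N}{D\eta}\right)^{\beta}\right], \quad D \ge 0 \tag{13.18b}$$
$$f(N) = \frac{1}{D}\,\frac{\beta}{\eta} \left(\frac{N}{D\eta}\right)^{\beta-1} \exp\left[-\left(\frac{N}{D\eta}\right)^{\beta}\right], \quad N \ge 0 \tag{13.19a}$$
$$F(N) = 1 - \exp\left[-\left(\frac{N}{D\eta}\right)^{\beta}\right], \quad N \ge 0 \tag{13.19b}$$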
$$C = 1 - \sum_{i=0}^{r} \frac{n!}{i!\,(n-i)!}\, R^{\,n-i} (1-R)^{i} \tag{13.20}$$
where:
R is reliability
C is confidence level
r is the number of failed items
When r = 0 (no failure), Equation 13.20 reduces to the simple equation for
successful run testing (Equation 13.21):
$$C = 1 - R^{n} \tag{13.21}$$
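Equations 13.20 and 13.21 translate directly into code. The Python sketch below evaluates the binomial confidence for n tested units with r failures and solves Equation 13.21 for the smallest sample size that meets a reliability and confidence target with no failures. The function names are hypothetical, and the R90C90 and R95C90 targets are used only as illustrations.

```python
import math


def binomial_confidence(n, r, reliability):
    """Confidence C from Equation 13.20 for n units tested with r failures."""
    tail = sum(
        math.comb(n, i) * reliability ** (n - i) * (1.0 - reliability) ** i
        for i in range(r + 1)
    )
    return 1.0 - tail


def success_run_sample_size(reliability, confidence):
    """Smallest n with 1 - R**n >= C, i.e., n >= ln(1 - C) / ln(R)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


n_r90c90 = success_run_sample_size(0.90, 0.90)
n_r95c90 = success_run_sample_size(0.95, 0.90)

print(f"R90C90 needs {n_r90c90} units with zero failures")
print(f"R95C90 needs {n_r95c90} units with zero failures")
print(f"confidence achieved for R90 with {n_r90c90} units: "
      f"{binomial_confidence(n_r90c90, 0, 0.90):.4f}")
```

The steep growth of n as R approaches 1 is the reason the sample sizes required for high reliability and confidence targets quickly become impractical.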
The binomial test methods have been used widely in the automotive industry.
However, the sample size required for achieving high confidence and reliability is
large. Based on the assumption that the probabilistic distribution follows the
two-parameter Weibull distribution (Equation 13.4), a general accelerated testing
procedure (Equation 13.22) can be developed by following the Lipson equality [14]:
$$C = 1 - R^{\,n (L_1 L_2)^{\beta}} \tag{13.22a}$$
$$R = (1 - C)^{1/\left[n (L_1 L_2)^{\beta}\right]} \tag{13.22b}$$
$$n = \frac{\ln(1 - C)}{(L_1 L_2)^{\beta} \ln R} \tag{13.22c}$$
where:
$L_1 = t_2/t_1$ is the life test ratio
$L_2 = \eta_1/\eta_2$ is the load test ratio, indicating that the change in the
characteristic life is caused by the change in load
Based on Equation 13.22c, $(L_1 L_2)^{\beta}$ times fewer test units are needed than would
be required by using the conventional successful run approach. In addition, the larger
the value of the shape parameter β, the greater the effect of the ratios on sample size
reduction. The ratio η1/η2 can be estimated from historical data or expert opinions.
Equation 13.22 reduces to the extended test method when the effect of L2 is ignored
(i.e., η1 = η2) [3].
From Equation 13.22, with the same confidence and reliability, the sample size
reduction can be achieved in three ways:
Way-1: extend or increase the test time at the same stress/load level [3]
Way-2: increase the load/stress level and eventually reduce the characteristic
life η
Way-3: combine Way-1 and Way-2
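The three ways of reducing the sample size can be compared numerically with a small Python sketch of Equations 13.22a and 13.22c. The function names and the ratio and shape values used below are hypothetical and serve only to illustrate the trend.

```python
import math

def accelerated_sample_size(reliability, confidence, life_ratio, load_ratio, beta):
    """Eq. 13.22c: required n, rounded up, for ratios L1 and L2 and shape beta."""
    acceleration = (life_ratio * load_ratio) ** beta
    n = math.log(1.0 - confidence) / (acceleration * math.log(reliability))
    return math.ceil(n)

def demonstrated_confidence(n, reliability, life_ratio, load_ratio, beta):
    """Eq. 13.22a: confidence reached with n failure-free test units."""
    return 1.0 - reliability ** (n * (life_ratio * load_ratio) ** beta)

# Hypothetical R90C90 target with an assumed Weibull shape parameter of 2.
baseline = accelerated_sample_size(0.90, 0.90, 1.0, 1.0, 2.0)  # conventional run
way_1 = accelerated_sample_size(0.90, 0.90, 2.0, 1.0, 2.0)     # test time doubled
way_2 = accelerated_sample_size(0.90, 0.90, 1.0, 1.5, 2.0)     # load ratio of 1.5
way_3 = accelerated_sample_size(0.90, 0.90, 2.0, 1.5, 2.0)     # both combined
print(baseline, way_1, way_2, way_3)
```

With L_1 = L_2 = 1 the function falls back to the conventional successful run result, so the same routine covers the unaccelerated case.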
13.5.2 Life Testing
The two-parameter Weibull distribution function is often used in life testing and
almost exclusively used in extended time testing, which can be considered an
accelerated testing method in which the testing time is appropriately extended and the
number of test samples is significantly reduced, as shown in Equation 13.22c in Section 13.5.1.
However, fatigue data from a wide variety of sources indicate that the three-parameter
Weibull distribution function, with a threshold parameter at the left tail,
is more appropriate for fatigue life data with large sample sizes [14]. The uncertainties
introduced by the assumptions about the underlying probabilistic distribution
significantly affect the interpretation of the test data and the assessment of the
performance of the accelerated binomial testing methods; therefore, the selection
of a probabilistic model is critically important. For product validation and reliability
demonstration, designs targeting the low percentiles of the fatigue life at the left tail
are required [11]. Therefore, when the left tail of a distribution is a concern, the
characteristics of the left tail of a selected model need to be thoroughly examined using
test data with a large sample size and checked against the physical mechanisms. For test data with
a small sample size, the benefit of using the three-parameter Weibull distribution is
not clear because the third fit parameter (threshold) brings significant uncertainty to the
data analysis and often results in abnormal values of the fit parameters. However,
meaningful results can be obtained for data with even very small sample sizes if
Bayesian statistics are used and historical data are available. The following three
examples demonstrate these three aspects, respectively.
FIGURE 13.8 Schematic of accelerated binomial (Bogey) testing procedure through (a)
extended testing time and (b) increased load/stress level as represented in the (S-N) diagram
with D = 1.
Example 13.4 Fatigue Data of 2024-T4 with Relatively Large Sample Size
FIGURE 13.9 Probability plots of (a) two-parameter Weibull distribution and (b) three-
parameter Weibull distribution for a set of 2024-T4.
TABLE 13.3
The Values of the Fit Parameters of Two-Parameter (2P) and Three-Parameter (3P) Weibull Distribution Functions for the High-Cycle Fatigue Data of 2024-T4
Distribution Function   Parameter      Value
2P-Weibull              Shape          1.74758
                        Scale          2092213
                        AD statistic   1.246
3P-Weibull              β              0.908975
                        η              1510218
                        δ              452578
                        AD statistic   0.526
[Figure: Weibull probability plot with 90% lower bound (complete data, LSXY estimates). Horizontal axis: cycles to failure (10000 to 100000); vertical axis: percent. Table of statistics: Shape 2.37219; Scale 75035.5; Mean 66504.0; StDev 29826.6; Median 64293.5; IQR 41734.1; Failure 6; Censor 0; AD* 2.133; Correlation 0.972.]
lower bound is shown with a large scatter band indicating the uncertain nature
of the calculated values of the fit parameters caused by the small sample size.
The value of the test data at any given reliability and confidence levels (RxxCyy)
can be obtained readily. The smaller the sample size, the wider the scatter band
and the larger the uncertainty. Clearly, obtaining accurate parameter estimates
from a small sample size is a big challenge.
It should be noted that even though the suitability of the three-parameter
Weibull distribution in fatigue life testing and associated product validation is
obvious from Figure 13.9 for data with a relatively large sample size, its
application to data with a small sample size is not recommended because of the
high possibility of unstable solutions caused by the third fit parameter in the
three-parameter Weibull distribution. Instead, a two-parameter Weibull distribution
is recommended because, although it cannot provide accurate information
about the tails of the distribution, it does provide reliable information about the mean,
which is often useful. To obtain accurate parameter estimates for three-parameter or
multi-parameter Weibull distributions with limited sample sizes, Bayesian statistics, which
use historical data, can be considered.
\[ p(\theta \mid x) = \frac{l(\theta; x)\, p(\theta)}{\int_{0}^{1} l(\theta; x)\, p(\theta)\, d\theta} \qquad (13.23) \]
where p(θ | x) is the posterior PDF for the parameter θ given the data x, and p(θ) is the prior
PDF for the parameter θ. l(θ; x) is the likelihood function, which is defined as
l(θ; x) = \prod_{k=1}^{n} f(θ; x_k), where x_k is the kth experimental observation and f(θ; x_k) is the
PDF of cycles to failure. The denominator in Equation 13.23 is simply a normalizing
factor which ensures that the posterior PDF integrates to one. The Bayesian process
is schematically shown in Figure 13.11. The posterior distribution usually is narrower
than the prior distribution, and results with improved confidence and accuracy
can be obtained by analyzing the posterior distribution.
Two key steps to realize Bayesian statistics in constructing a reliability
RxxCyy are (1) prior distributions obtained from the historical data and (2) efficient
numerical algorithms to implement Equation 13.23.
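To make the role of Equation 13.23 concrete, the following Python sketch performs a Bayesian update on a discretized parameter grid. It is a deliberately simple grid approximation and not the acceptance-rejection or Monte Carlo procedure referred to in this chapter. The unknown parameter is taken to be the Weibull scale with the shape held fixed, the prior is flat, the integration range is the grid rather than the interval [0, 1], and the six lives in DATA are made-up values; all of these are assumptions for illustration only.

```python
import math

def weibull_log_pdf(x, shape, scale):
    """Log of the two-parameter Weibull PDF of cycles to failure."""
    z = x / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape

def grid_posterior(grid, prior, data, shape):
    """Discretized Eq. 13.23 on an evenly spaced grid of scale values."""
    log_like = [sum(weibull_log_pdf(x, shape, th) for x in data) for th in grid]
    peak = max(log_like)  # subtract the peak so that exp() does not underflow
    unnorm = [math.exp(ll - peak) * p for ll, p in zip(log_like, prior)]
    step = grid[1] - grid[0]
    norm = sum(unnorm) * step  # rectangle-rule estimate of the denominator
    return [u / norm for u in unnorm]

def mean_and_std(grid, density):
    """Mean and standard deviation of a density tabulated on the grid."""
    step = grid[1] - grid[0]
    mean = sum(t * d for t, d in zip(grid, density)) * step
    var = sum((t - mean) ** 2 * d for t, d in zip(grid, density)) * step
    return mean, math.sqrt(var)

# Hypothetical inputs: flat prior on the scale parameter and six made-up lives.
SHAPE = 2.0
GRID = [20000.0 + 100.0 * k for k in range(1801)]  # 20,000 to 200,000 cycles
PRIOR = [1.0 / (len(GRID) * 100.0)] * len(GRID)
DATA = [31000.0, 47000.0, 58000.0, 66000.0, 82000.0, 115000.0]

POSTERIOR = grid_posterior(GRID, PRIOR, DATA, SHAPE)
PRIOR_STD = mean_and_std(GRID, PRIOR)[1]
POST_MEAN, POST_STD = mean_and_std(GRID, POSTERIOR)
print(PRIOR_STD, POST_MEAN, POST_STD)
```

In this sketch the posterior standard deviation comes out smaller than the prior one, which mirrors the narrowing of the posterior described above; an informative prior built from historical data would replace the flat PRIOR list.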
A large amount of reliable historical fatigue test data for welded structures has
been systematically collected and analyzed, and the associated probabilistic
distributions of the mean and standard deviation of the failure cycles have been
successfully obtained [15]. An advanced acceptance-rejection resampling algorithm
and a Monte Carlo simulation procedure have been implemented.
Figure 13.12a shows a mean life-standard deviation (Mean-STD) plot (in log-log scale)
of 110 sets of fatigue failure data of a type of welded exhaust component. Based
on the data pattern shown in Figure 13.12a, a probabilistic distribution and the values
of the corresponding fit parameters have been obtained. With the probabilistic
distribution of the historical data, Bayes' rule (Equation 13.23), and an advanced
numerical algorithm, a design curve can be constructed with only one data point at each of
the two stress levels, as shown in Figure 13.12b. It should be
noted that a design curve cannot be constructed with the traditional probability
plot even with two data points at each stress level. The advantage of Bayesian
statistics is clearly demonstrated by this example.
FIGURE 13.12 (a) The STD-Mean plot based on historical weld fatigue data and (b) the
R90C90 design curve constructed with only one data point at each of the two stress levels by
using a Bayesian statistics procedure.
or, equivalently:
\[ R(x) = \prod_{i=1}^{n} R_i(x) \qquad (13.24b) \]
\[ F(x) = \prod_{i=1}^{n} F_i(x) \qquad (13.25a) \]
or, equivalently:
The parallel system model is statistically the polar opposite of the series
system model, with F(x) and R(x) interchanged. Like the product rule of reliability,
Equation 13.25a can be referred to as the product rule of unreliability since it
establishes that the unreliability of a parallel system is the product of the individual
component unreliabilities. A parallel system model also is called a dominant modal
model [4] if bi-modal or multiple failure mechanisms are involved.
or, equivalently:
\[ R(x) = \sum_{i=1}^{n} p_i R_i(x); \quad \sum_{i=1}^{n} p_i = 1; \quad 0 \le p_i \le 1 \qquad (13.26b) \]
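The system models of Equations 13.24b, 13.25a, and 13.26b can be evaluated at a fixed x with the Python sketch below. The function names and the component reliability values are hypothetical; the sketch only shows how the product and weighted-sum rules combine component values.

```python
import math

def series_reliability(component_reliabilities):
    """Eq. 13.24b: product rule of reliability for a series system."""
    return math.prod(component_reliabilities)

def parallel_reliability(component_reliabilities):
    """Eq. 13.25a: system unreliability is the product of component unreliabilities."""
    unreliability = math.prod(1.0 - r for r in component_reliabilities)
    return 1.0 - unreliability

def weighted_reliability(weights, component_reliabilities):
    """Eq. 13.26b: weighted sum with weights p_i in [0, 1] that sum to one."""
    if any(p < 0.0 or p > 1.0 for p in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return sum(p * r for p, r in zip(weights, component_reliabilities))

# Hypothetical component reliabilities at one value of x.
print(series_reliability([0.9, 0.9]))
print(parallel_reliability([0.9, 0.9]))
print(weighted_reliability([0.5, 0.5], [0.9, 0.7]))
```

For the same components, the series value is never larger than the weakest component and the parallel value is never smaller than the strongest one.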
13.7 CONCLUSIONS
This chapter introduces several recently developed methodologies for fatigue-related
reliability assessment of vehicle components and systems. The two most
important of these methodologies are the fatigue S-N curve transformation
technique and a variable transformation technique. In principle, these methodologies
can be applied to the reliability assessment of other similar engineering components
and systems. With these new methodologies, the current S-N data analysis and reliability
testing methods can be interpreted in a new unified probabilistic framework.
The importance of selecting between two-parameter and three-parameter Weibull
distributions in a probabilistic analysis of data with large sample size has been illustrated
with examples. The uncertainty introduced in test data with small sample size and
the benefits of using a Bayesian statistics approach for cost reduction also have been
demonstrated with examples.
REFERENCES
1. Lee YL, Pan J, Hathaway R, Barkey M. Fatigue Testing and Analysis: Theory and
Practice. Boston, MA: Elsevier Butterworth-Heinemann; 2005.
2. Wei Z, Hamilton J, Ling J, Pan J. Reliability analysis based on stress-strength
interference model. Wiley Encyclopedia of Electrical and Electronics Engineering. Chichester,
UK: Wiley; 2018.
3. O’Connor PDT, Kleyner A. Practical Reliability Engineering, 5th ed. Chichester, UK:
Wiley; 2012.
4. Nelson WB. Accelerated Testing: Statistical Models, Test Plans, and Data Analysis.
Hoboken, NJ: John Wiley & Sons; 2004.
5. Wei Z, Start M, Hamilton J, Luo L. A unified framework for representing product
validation testing methods and conducting reliability analysis. SAE Technical Paper
2016-01-0269.
6. Wei Z, Kotrba A, Goehring T, Mioduszewski M, Luo L, Rybarz M, Ellinghaus K,
Pieszkalla M. Chapter 18: Failure mechanisms and modes analysis of automotive
exhaust components and systems, pp. 392–432, Handbook of Materials Failure
Analysis, Abdel Salam Hamdy Makhlouf (Ed.). Amsterdam, the Netherlands: Elsevier;
2015.
7. Wei Z, Luo L, Voltenburg R, Seitz M, Hamilton J, Rebandt R. Consideration of tem-
perature effects in thermal-fatigue performance assessment of components with stress
raisers, SAE Technical Paper 2017-01-0352.
8. Seitz M, Hamilton J, Voltenburg R, Luo L, Wei Z, Rebandt R. Practical and techni-
cal challenges of the exhaust system fatigue life assessment process at elevated tem-
perature, ASTM Selected Technical Papers (STP) 1598, Zhigang Wei, Kamran Nikbin,
D. Gary Harlow, Peter C. McKeighan (Eds.). ASTM International; 2016.
9. Shen CL. The statistical analysis of fatigue data. PhD dissertation. Tucson, AZ:
The University of Arizona; 1994.
10. Standard practice for statistical analysis of linear or linearized stress-life (S-N) and
strain-life (ε-N) fatigue data, ASTM Designation: E739-10.
11. Wei Z, Luo L, Yang F, Lin B, Konson D. Product durability/reliability design and vali-
dation based on test data analysis, pp. 379–413, Quality and Reliability Management
and Its Applications, Hoang Pham (Ed.). Springer; 2016.
12. Wei Z, Yang F, Cheng H, Maleki S, Nikbin K. Engineering failure data analysis:
Revisiting the standard linear approach. Engineering Failure Analysis, 2013; 30:
27–42.
13. Elishakoff I. Probabilistic Theory of Structures, 2nd ed. Mineola, NY: Dover
Publications; 1999.
14. Wei Z, Mandapati R, Nayaki R, Hamilton J. Accelerated reliability demonstra-
tion methods based on three-parameter Weibull distribution. SAE Technical Paper
2017-01-0202.
15. Wei Z, Zhu G, Gao L, Luo L. Failure modes effect and fatigue data analysis of
welded components and its applications in product validation. SAE Technical Paper,
2016-01-0374.
14 Maintenance Policy Analysis of a Marine Power Generating Multi-state System
CONTENTS
14.1 Introduction
14.2 Reliability Assessment and Multi-state Systems
14.3 Description of Ship’s Electric Power Generation System
14.4 Semi-Markov Model Development
14.5 Multi-state System Analysis
14.6 Maintenance Policy and Implications
14.7 Conclusions
Acknowledgment
Appendix
References
14.1 INTRODUCTION
This study is an attempt to analyze the reliability performance of a marine power
generation system with the auxiliary systems attached and to develop an alternative
maintenance policy. The main scope of this study is to analyze the methodology
and to conduct a reliability analysis of the marine electric power system, focusing
on the mathematical modeling rather than on the purely electrical
and mechanical aspects of the systems and their technical details. This approach leads to
generic inferences that are applicable to most systems, providing the big picture of the problem
and its solution. Nevertheless, the authors refer to certain technical issues to
help the reader understand the basic principles of a marine electrical power generating
system with the attached auxiliary systems.
This chapter is organized as follows. This section gives a short description of and
general information about the marine power generating system as a part of the ship,
along with some related references. Section 14.2 briefly presents reliability assessment
and multi-state systems. Section 14.3 describes a typical electric
power generation system and its reliability characteristics. Section 14.4 presents the
development of the semi-Markov model. Section 14.5 describes the auxiliary diesel
engine system driving the electric power generators and gives a reliability analysis of the
multi-state system, including the probabilities related to its operation. Section 14.6
presents the basic outlines of maintenance policy, its implications, and ideas on how
stochastic analysis and its inferences could contribute to real-world management issues.
In addition, it gives empirical results concerning the availability of the
power generating system under different system configurations. Finally, Section 14.7
sums up the maintenance policy and suggests some ideas for further research.
The design of a vessel follows basic principles issued as guidelines by organizations
such as the International Maritime Organization (IMO) and the Marine Technology Society
(MTS), covering all sectors of a shipbuilding project and all systems of the vessel.
Consequently, such guidelines exist for the electrical power generating system as a part
of the whole vessel, both as a design philosophy (MTS DP Technical Committee; MSC/Circular 1994)
and for all essential calculations (IMO MEPC 1-CIRC 866 2014). Currently,
and due to issues related to environment and modern economics, major challenges arise
concerning the ship’s technology (MUNIN D6.7 2014). There is an increasing pressure for
more efficiency in energy, environmental effect, and safety. IMO has developed certain
regulations (IMO 2016) concerning a ship’s efficiency quantification providing guidelines
for all essential calculations (MEPC 61/inf.18 2010; MEPC.1-Circ.681–2 2009; MEPC.1-
Circ.684 2009). One major problem in ship technology and design is
system efficiency. The ship's energy sector in particular poses continuous challenges.
Climate change and the problem of greenhouse gas emissions
drive research toward more efficient energy systems on ships and intensify the demand
for improved safety levels and environmental protection as a condition of competitiveness.
The quantitative analysis of this effort can be summarized using indices such as the
Energy Efficiency Design Index (EEDI) and the Energy Efficiency Operational Indicator
(EEOI). Presumably, diesel-engine-driven electric power generating systems are subject to
these regulations. Previous research (Prousalidis et al. 2011) has shown that the evolution
of ship technology leads to new trends. Concerning energy efficiency, vessel use
and energy management call for research on optimizing routes and vessel speed,
which implies optimizing power systems and their management and, finally, realizing
advantages through extensive electrification of ship systems. All these challenges and
trends could increase the complexity of the systems and the requirements concerning
the technical background and skills of crew members. Unfortunately, the improvements
mentioned do not assure full ship safety, and there is always a probability that
unpredictable incidents will happen (Mindykowski and Tarasiuk 2015). Since electric power
is a basic and essential factor of the normal operation of a ship, the electric power system
of a ship is dedicated to meet its electric load requirements according to the type of
mission during the different phases of its operation, such as overseas voyage, charging
and discharging, berthing, etc. According to international regulations in the case of an
electrical system failure, the usual and anticipated consequence is a blackout (Brocken
2016), which leads to a deadship condition initiating event. The meaning of the term
“deadship” is a condition under which the main propulsion plant, boilers, and auxiliaries
are not in operation and in restoring the propulsion, no stored energy for starting the pro-
pulsion plant, the main source of electrical power, and other essential auxiliaries should
be assumed available. It is assumed that the means are available to start the emergency
generator at all times (IMO 2005).
The research about blackout incidents shows that there are many different factors
causing a blackout in a ship, such as human error, control equipment failure,
automation failure, electrical failure, lack of fuel, mechanical failure, and other causes
(Miller 2012) leading to certain questions such as:
• Do the available electric power generators meet the ship’s power requirements?
• What is the probability of a total system failure?
• What would be the financial cost of the system failure?
All these questions are closely related to the issue of electric power system reliability,
which in the case of a vessel is manageable by following strategies on its
architecture such as the use of multiple power sources, sectioning of the distribution
grid (Stevens et al. 2015), and use of auxiliary safety subsystems, such as earthing
and protection systems (Maes 2013). More specifically, the primary and standby
generators are driven by diesel engines with different technical characteristics and
attributes related to the requirements and the mission of the vessel such as:
• Excitation system
• Lubrication system
• Cooling system
• Facilities for alarms, monitoring, and protection
• Neutral earthing
The importance of the marine electric power system and its components can be
understood easily by considering electric failures that led to marine accidents, such
as that of RMS Queen Mary 2 (MAIB 2011). This importance follows from the main tasks
of the system, which could be summarized as follows (Patel 2012):
Since a ship operates autonomously at sea and usually also when moored, the
design of the power system faces major challenges in meeting the established standards
and other requirements. Ship designers must consider the electrical power
requirements during each phase of the ship's operation. A major concept affecting the
use of electric energy on a ship is the quality of power. According to the established
standards, the term "power quality" refers to "a wide variety of electromagnetic
phenomena that characterize the voltage and
current at a given time and at a given location on the power system" (IEEE 1159–1995).
Poor electric power supply quality on a ship has several direct and indirect
consequences, leading to problems and distortions that can
result in system failures and a reduced level of reliability.
These problems could be summarized as follows (Prousalidis et al. 2008):
• Harmonics
• Short duration voltage events
• Voltage unbalance
According to other research, the operation of the electric power generating system
could be summarized by two major groups of parameters:
• Parameters of voltage and currents in all the points of the analyzed system
• Parameters describing a risk of loss of power supply continuity
Attempting to evaluate the levels of quality and to deal with these problems,
researchers have developed quality indices concerning voltage and frequency
deviations (Prousalidis et al. 2011). The importance of those indices is obvious if
their limit values and the established standards (Table 14.1) concerning
electric power quality assessment in ship networks are considered. The usual
causes of power quality problems on ships are human factors, the assigned loads,
overloading, and technical failures (Mindykowski 2014). It should be noted that
dealing with the quality of electric power involves two stages: assessment and improvement.
The improvement stage is possible through technical solutions and investment
in staff and human capital (Mindykowski 2016). Technical solutions refer
to new distribution systems such as the Zonal Electrical Distribution System (ZEDS)
or hybrid technology solutions (Shagar et al. 2017). The needs for electrical power
differ from phase to phase of operation, depending on the devices and systems that
are necessary for the normal operation of the ship. According to expert opinions,
the phases of charging and discharging are the most demanding and stressful for
the electrical power generating system of a ship. Thus, reliability modeling and
analysis of the system in these phases provides valuable inferences about the
safety of a ship.
TABLE 14.1
Standards Concerning Power Quality of a Ship
#   Standard                                                                    Range
1   IEEE Std. 45:2002                                                           IEEE Recommended Practice for Electrical Installations on Shipboard
2   IEC 60092-101:2002                                                          Electrical installations in ships. Definitions and general requirements
3   STANAG 1008:2004                                                            Characteristics of Shipboard Electrical Power Systems in Warships of the North Atlantic Treaty Navies, NATO, Edition 9, 2004
4   American Bureau of Shipping, ABS, 2008                                      Rules of building and classing, steel vessels
5   Rules of international ship classification societies, e.g., PRS/25/P/2006   Technical Requirements for Shipboard Power Electronic Systems
Source: Mindykowski, J., Power quality on ships: Today and tomorrow’s challenges, International
Conference and Exposition on Electrical and Power Engineering (EPE 2014), Iasi,
Romania, 2014.
where:
s_{ji} is the state representing a specific level of performance of the subsystem j
i ∈ {1, 2, ..., k} is the set of the states of each subsystem
Introducing the factor of time in the model, the state of the MSS over time is a ran-
dom variable representing a stochastic process (Lisnianski et al. 2010) with its major
parameters such as mean and variance. The function describing the reliability of the
MSS can be defined as:
R(t , w ) = P {S (t ) ≥ w} (14.2)
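At a single time instant, Equation 14.2 amounts to summing the probabilities of the states whose performance meets the required level. The Python sketch below illustrates this under stated assumptions: w is read as a demand level in kW, the performance levels reuse generator outputs quoted later in this chapter, and the state probabilities are made-up numbers, not results of this study.

```python
def mss_reliability(state_probabilities, state_performance, demand):
    """Eq. 14.2 at one time instant: P{S(t) >= w} summed over acceptable states."""
    if abs(sum(state_probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(
        p for p, s in zip(state_probabilities, state_performance) if s >= demand
    )

# Hypothetical snapshot: output levels in kW and made-up state probabilities.
PERFORMANCE_KW = [0.0, 200.0, 875.0, 1750.0]
PROBABILITIES = [0.001, 0.004, 0.045, 0.950]

print(mss_reliability(PROBABILITIES, PERFORMANCE_KW, 875.0))
print(mss_reliability(PROBABILITIES, PERFORMANCE_KW, 1750.0))
```

Raising the demand from one generator's output to two removes the intermediate state from the acceptable set, so the reliability drops accordingly.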
Based on literature findings, one of the major research fields of MSS is reliability
assessment, and more specifically that of electric and electronic systems such as
power generation and communication systems (Lisnianski et al. 2012). To assess
the expected performance of a complex or composite system, it is necessary to
determine the states of the system and the sojourn time of each state (Barbu and
Karagrigoriou 2018). This aspect implies the use of the semi-Markov methodology
to take advantage of the flexibility it provides compared with ordinary binary
(two-state) Markov systems (operation or failure). The trade-off of the flexibility is the
complexity of the system and the implied difficulties in understanding and performance
evaluation (Yingkui and Jing 2012). There are more advantages concerning
the flexibility of MSS. Since the focus is on the acceptable and non-acceptable states,
the analysis is closer to real-world problems (Liu and Kapur 2006) than that of ordinary
simple systems that focus on "time to failure." This advantage leads to more accurate
assessments (Lisnianski et al. 2012) and reduces the time needed to analyze
the model (Billinton and Li 2007).
using the existing automatic control switching (ACS) system. A typical example of an
electric power system in a ship consists of operating components for power genera-
tion, energy transmission, and energy distribution for all energy consuming devices.
Usually, there are ships with a configuration of three main generators and one emer-
gency unit (Wärtsilä 2014) where the main system consists of two primary and one
secondary and the switchboard (Mindykowski 2016), or a set of four, consisting of
two main generators (primary) and two standby ones (Mennis and Platis 2013).
Considering the standards of IEEE (IEEE 45-2002) as shown in Figure 14.1, a
typical example of the electrical power generation system of a large cargo ship consists
of four generator units dedicated to serving the ordinary loads during the different
phases of the ship’s use. An emergency generator unit exists in case of a total failure
(blackout) of all four main generators. The capacity of the emergency
generator is lower than that of the main ones, since it serves only the basic loads,
such as emergency lighting and basic instruments and devices of the ship, including
internal communication and basic electronic systems (Patel 2012). In addition, many
batteries exist to serve the ship in case of a total blackout. The case of four generators
is the generic one covering more complex systems.
FIGURE 14.1 Large cargo ship power system with emergency generator and battery backup
based on Standard IEEE 45-2002. (Based on Patel, M.R., Shipboard Electrical Power
Systems, CRC Press, Boca Raton, FL, 2012.)
Starting the description of the system, we examine the case of the four-generator
system assuming that it consists of two primary generators and two standby genera-
tors as shown in Figure 14.1 and in Table 14.2 (Patel 2012). When generators are in
automatic startup, they need specific time to acquire their operational parameters
such as the voltage and the frequency of their output current. All generators are
controlled by an automatic control system which activates the standby generator or
generators when necessary.
According to the same standard (IEEE 45-2002), we assume that when a genera-
tor startup failure takes place there are two ways of activation: automatic switching
by the automatic control system and manual switching by the crew. The automatic
switching time to activate the standby generators is 45 seconds.
The switching time for the manual activation depends on the current position of
the crew members in the ship and for the current analysis we assume it is 5 minutes
as the time to proceed to the machinery room from anywhere in the ship. Concerning
the nominal power of the generators (e.g., Wärtsilä 2014), we assume the output of
main and standby generators is 875 kW and the output of the emergency generator
is 200 kW (Table 14.3). According to the ordinary use of the marine power generating
system, the standby generators remain in cold mode to operate in case of a
primary generator failure. In fact, the standby generators are not in running mode
and only some of their essential subsystems are running to respond whenever it is
necessary. All these generators are driven by auxiliary diesel engines that also are
subject to failures, repairs, and maintenance. In this case, there are certain failure
modes (shown in Table 14.4) for each subsystem describing the type of occurrence
and its effect on the normal operation of the whole system. Due to the standby status
of the secondary systems, their failures are likely to remain latent and would
be revealed only when a main generator fails; the timing of
such a failure, combined with the status of the whole system, would be critical, especially
when the specific generator is the last one available because all other main and standby
ones have failed.
TABLE 14.2
Basic Parts of Ship’s Electric Power Generating System (Four Generators)
Number of main generators         2
Number of standby generators      2
Number of emergency generators    1
Automatic control system (ACS)    1
TABLE 14.3
Output Power of the Generators
Main generator #1       875 kW
Main generator #2       875 kW
Standby generator #1    875 kW
Standby generator #2    875 kW
Emergency generator     200 kW
TABLE 14.4
Failure Modes of Standby Generator
Occurrence Type   Effect: Prevents the Operation   Effect: Does Not Prevent the Operation
Monitored         Monitored Critical               Monitored Non-critical
Latent            Latent Critical                  Latent Non-critical
Note that when the ship is at anchorage without additional electric systems
in operation, one generator meets all load requirements. During additional
operations such as cargo charging and discharging, one more generator is considered
necessary (Mennis and Platis 2013). A general block diagram of the whole system
is shown in Figure 14.2, where in case of a primary generator failure the automatic
control system will switch normally to one of the two standby generators or the
emergency one in case of failure of all main generators. We assume that according
to the switching sequence, the ACS activates the first available secondary generator
anticipating a failure with the probability (γ).
Considering the block diagram of Figure 14.2, the next step is to construct the
Markov model diagrams for each phase of the vessel’s operation. Since it is an ordi-
nary electric power structure, we can use the same guidelines from previous research
(Mennis and Platis 2013) adapting to the requirements of the current analysis.
[Block diagram. Blocks: Primary #1, Primary #2, Standby #1, Standby #2, Emergency, Automatic Control, Output.]
FIGURE 14.2 Block diagram of the power generating system. Use of primary #2 generator
depends on the phase of the operation (e.g., it is necessary only during port phase).
[Figure: phases of the ship's operation cycle. Labels: Journey, 7 days; Port, 3 days; Maintenance, 2 days; Idle.]
\[ v = vP \qquad (14.4) \]
\[ \sum_{i=1}^{13} v_i = 1 \qquad (14.6) \]
\[ h_i = \frac{1}{\sum_{i} \lambda_i + \sum_{j} \mu_j} \qquad (14.8) \]
where λ and μ are the failure and repair rates, respectively. Since the manual time
and automatic repair time are considered constant, the mean sojourn time is:
\[ h_i = t_{man} \quad \text{and} \quad h_i = t_{aut} \qquad (14.9) \]
The expression of formula (14.8) is a general one that implies that the transition of
the system from one state to another depends on the combination of all probable failures
and repairs between the two states. The state probabilities of the semi-Markov
model are given by the following formula:
\[ \pi_i = \frac{v_i h_i}{\sum_{j} v_j h_j} \qquad (14.10) \]
The matrix equation is:
\[ V \cdot P_{semi} = U \Leftrightarrow V = U \cdot P_{semi}^{-1} \qquad (14.11) \]
and V is the matrix that will be combined with the set of mean sojourn times to
calculate the final steady-state probabilities. Considering the general model of
Figure 14.3, the one-step transition probability matrix is given by Table A14.4 of the
Appendix to this chapter. A typical operation cycle of a ship, as previously
mentioned, consists of three phases: the system runs for 7 days in the journey
phase, 3 days in the port phase, and 2 days in the maintenance phase, for a total
of 12 days per cycle and approximately 30 cycles per year. Failures of the
operating components of the system occur at random; thus, they can be assumed to
follow a Poisson process with mean failure rate λ, whereas the time to repair
follows the exponential distribution with mean 1/μ, so that the repair rate is μ.
A series of state diagrams can describe the system, and the number of possible
states depends on its complexity. The main electric power generation
system, as mentioned previously, consists of two primary generators and two
secondary (or standby) ones. Their output is identical, at 875 kW each. There is also a
fifth (emergency) generator that provides a lower power level of 200 kW
for auxiliary loads (Wärtsilä 2014), thus providing
a certain level of reliability in the case of a total blackout of all main generators.
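Formulas (14.4), (14.6), and (14.10) can be illustrated with a minimal Python sketch. The three-state embedded chain `P`, the sojourn times `h`, and the function names are hypothetical and chosen only for illustration; they are not data from this chapter. The damped fixed-point iteration is one simple way to solve v = vP and is not necessarily the method used by the authors.

```python
def embedded_stationary(P, tol=1e-13, max_iter=100_000):
    """Solve v = vP with sum(v) = 1 by damped fixed-point iteration.

    The update v <- (v + vP) / 2 has the same fixed point as v = vP and
    also converges when the irreducible embedded chain is periodic.
    """
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        vP = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        nxt = [(a + b) / 2.0 for a, b in zip(v, vP)]
        converged = max(abs(a - b) for a, b in zip(nxt, v)) < tol
        v = nxt
        if converged:
            break
    total = sum(v)
    return [x / total for x in v]


def semi_markov_steady_state(P, h):
    """State probabilities pi_i = v_i h_i / sum_j v_j h_j (formula 14.10)."""
    v = embedded_stationary(P)
    weights = [vi * hi for vi, hi in zip(v, h)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: 0 = generator running, 1 = switching, 2 = repair.
P = [
    [0.0, 1.0, 0.0],  # a failure always triggers a switching attempt
    [0.9, 0.0, 0.1],  # switching succeeds (0.9) or ends in repair (0.1)
    [1.0, 0.0, 0.0],  # repair returns the system to the running state
]
h = [1000.0, 0.0125, 58.0]  # assumed mean sojourn times in hours

pi = semi_markov_steady_state(P, h)
print([round(p, 6) for p in pi])
```

The sojourn times act as weights: a state that is visited often but left quickly (such as the switching state here) receives a small steady-state probability.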
Maintenance Policy Analysis of a Marine Power Generating MSS 373
TABLE 14.5
Failure and Repair Rates
System           MTTF (Hrs)   Failure Rate (per 10^6 Hrs)   MTTR (man-hours)   Failure Rate (per hour)   Repair Rate (per hour)
Prim. Gen. 1     2,208.04     452.89                        58.00              0.000453                  0.017241
Prim. Gen. 2     2,208.04     452.89                        58.00              0.000453                  0.017241
Standby Gen. 1   2,208.04     452.89                        58.00              0.000453                  0.017241
Standby Gen. 2   2,208.04     452.89                        58.00              0.000453                  0.017241
Autom. Control   2,828.97     353.49                        0.0833             0.000353                  N/A
According to the available data (OREDA 2002), the failure rates and the repair
times are given in three forms: min, mean, and max. Examining the worst-case
scenario, we assume the highest failure rate and the longest repair time, expressed in
man-hours, for each case. The failure rates and times to repair for all five generators
are shown in Table 14.5. The failure rates are expressed in failures per 10^6 hours and
the repair rates per hour, considering a basic crew of six in the engine room.
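The hourly rates in Table 14.5 are the reciprocals of the corresponding mean times. A short Python check, using only the MTTF and MTTR values printed in the table (the function and variable names are illustrative):

```python
def rate_per_hour(mean_time_hours):
    """Constant rate of an exponential distribution with the given mean."""
    return 1.0 / mean_time_hours


MTTF_GENERATOR_HRS = 2208.04  # Table 14.5, each main generator
MTTR_GENERATOR_HRS = 58.00    # Table 14.5, MTTR column
MTTF_ACS_HRS = 2828.97        # Table 14.5, automatic control system

generator_failure_rate = rate_per_hour(MTTF_GENERATOR_HRS)
generator_repair_rate = rate_per_hour(MTTR_GENERATOR_HRS)
acs_failure_rate = rate_per_hour(MTTF_ACS_HRS)

print(f"generator failure rate: {generator_failure_rate:.6f} per hour")
print(f"generator repair rate:  {generator_repair_rate:.6f} per hour")
print(f"ACS failure rate:       {acs_failure_rate * 1e6:.2f} per 10^6 hours")
```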
The automatic control system is responsible for the activation of a standby
generator when a primary generator fails. In this study, we assume that the automatic
system (Wu et al. 2013) connected to the marine generators has the reliability
parameters and characteristics shown in Table A14.1 of the Appendix. The system
used in our study consists of three serial subsystems (Figure 14.4): Sys1, the
main switch, with a failure rate λSYS1 = 59.9998 × 10^−6/hr; Sys2, the excitation system,
with a failure rate λSYS2 = 18.7 × 10^−6/hr; and Sys3, the main switching system, with
a failure rate λSYS3 = 353.4859 × 10^−6/hr. Because the system is serial, its failure rate is
the sum of its components’ failure rates, λSYSTEM = 432.1857 × 10^−6/hr.
Since it is an electronic system, the repair of a failure consists of the replacement
of a module or the rearrangement of the cables and contacts in order to start up a secondary
generator when a primary one fails. The time the crew needs to repair the system
manually is taken as the mean time to repair, MTTR = 5 minutes or 0.0833 hours.
Considering the structure of the whole power system, the automatic control system is
vital for its normal operation. Consequently, it is necessary to calculate the probability
of switching successfully from a failed generator to a standby one, even though the
probability of a switching failure (γ) is close to zero.
The probability of a successful switch is identical with the availability (A) of the control system and is
expressed by the following formula:
$$A = \frac{MTBF - MTTR}{MTBF} \Leftrightarrow A = 1 - \gamma = 0.999964 \qquad (14.13)$$
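The value in formula (14.13) can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the Sys3 rate is taken as 353.4859 per 10^6 hours (the total of Table A14.1), which is the value consistent with the stated system total of 432.1857 per 10^6 hours. Variable names are illustrative.

```python
# Failure rates of the three serial subsystems, in failures per 10^6 hours.
# Sys3 is assumed to be 353.4859 (total of Table A14.1), see the note above.
subsystem_rates_per_1e6_hr = {"Sys1": 59.9998, "Sys2": 18.7, "Sys3": 353.4859}

# A serial system fails when any subsystem fails, so the rates add up.
system_rate_per_1e6_hr = sum(subsystem_rates_per_1e6_hr.values())
system_rate_per_hr = system_rate_per_1e6_hr * 1e-6

mtbf_hr = 1.0 / system_rate_per_hr
mttr_hr = 0.0833  # 5 minutes of manual repair

availability = (mtbf_hr - mttr_hr) / mtbf_hr  # formula (14.13)
gamma = 1.0 - availability                    # probability of a switching failure

print(f"system failure rate: {system_rate_per_1e6_hr:.4f} per 10^6 hours")
print(f"A = {availability:.6f}, gamma = {gamma:.2e}")
```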
Equation (14.14) represents the weighted average of the switching time either manu-
ally or automatically.
1 Information given by the marine engineer expert based on major engine manufacturer’s data.
is adopted for other elements such as the centrifugal oil filter and the compressed air
system. A major factor affects the maintenance of certain subsystems in
the auxiliary engines: due to crew and other restrictions, overhaul maintenance
and all minor inspections (daily, weekly, and monthly) take place during the
maintenance phase. Each inspection or maintenance action covers the respective ones of lower
levels; for example, when the monthly inspection takes place, the respective
weekly and daily inspections are omitted. Engine crew members repair failures of
auxiliary engines when they appear.
Concerning the detailed analysis of the Markov
model for each operational phase, each state represents specific conditions of the
system’s operation. The label of each state encodes the operational phase
(1st character), the number of active primary generators (2nd character), the
number of active secondary generators (3rd character), and whether one generator,
primary or secondary, is under maintenance (4th character). Transitions and
their rates for all states of the model are provided in Table A14.4 of the Appendix to
this chapter. Starting with the maintenance phase, shown in Figure 14.5, the
system enters this phase on leaving the port phase. The possible states are
all those with one primary generator active (M,1,3,0 – M,1,2,0 – M,1,1,0 – M,1,0,0).
These states represent the preparation of the maintenance process. The rate of
maintenance is four generators per 48 hours (2 days of maintenance). During this
phase, if a primary generator fails, a secondary generator is activated either
automatically (by the ACS) or manually by the crew. This situation corresponds to the states
M,0,3,0 – M,0,2,0 – M,0,1,0 – M,0,2,1, and M,0,1,1.
If the failure happens while scheduled maintenance is in progress, the crew
first completes the maintenance, because it takes less time than a
repair. If a primary and a secondary generator are operating normally, the crew
starts repairing the failed generator. If a secondary generator fails while
maintenance is in progress, the crew follows the same steps as for a primary generator failure.
When all generators fail, the crew repairs one to recover normal power for
maintenance. In this phase, the system is considered to be in normal operation when at least
one primary generator is operating, that is, in states M,1,3,0 – M,1,2,0 –
M,1,1,0 – M,1,0,0 – M,1,2,1 – M,1,1,1 – M,1,0,1, and failed when it is in any of the
other states.
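The four-character state labels and the normal-operation rule of the maintenance phase can be expressed compactly in code. The following Python sketch is illustrative only; the type `State` and the helper names are hypothetical and do not come from the chapter.

```python
from typing import NamedTuple


class State(NamedTuple):
    phase: str            # "M" (maintenance), "J" (journey) or "P" (port)
    primaries: int        # number of active primary generators
    secondaries: int      # number of active secondary generators
    in_maintenance: int   # 1 if one generator is under maintenance, else 0


def parse_state(label):
    """Split a label such as 'M,1,2,1' into its four fields."""
    phase, primaries, secondaries, maintenance = (p.strip() for p in label.split(","))
    return State(phase, int(primaries), int(secondaries), int(maintenance))


def is_normal_in_maintenance_phase(state):
    """Normal operation in phase M: at least one primary generator is active."""
    return state.phase == "M" and state.primaries >= 1


labels = ["M,1,3,0", "M,1,2,1", "M,0,3,0", "M,0,1,1"]
flags = [is_normal_in_maintenance_phase(parse_state(label)) for label in labels]
for label, flag in zip(labels, flags):
    print(label, "normal" if flag else "failed")
```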
Next is the journey phase, shown in Figure 14.6, which the system enters
on leaving the maintenance phase. The strategy of the crew to repair or
maintain the generators is the same as in the maintenance phase, except
that no generator is under maintenance. The possible states
are all those with one primary generator active (J,1,3,0 – J,1,2,0 – J,1,1,0 – J,1,0,0).
The activation of a secondary generator after a primary generator fails follows the
same steps through the ACS, and normal operation includes the states J,1,3,0 –
J,1,2,0 – J,1,1,0 – J,1,0,0. This phase has an additional characteristic. Whereas
the journey phase requires at least one primary generator, the transition to the next
phase, the port phase, requires at least two primary generators. Thus, there are three
additional states in the journey phase (J,2,2,0 – J,2,1,0 – J,2,0,0), which assure the
activation of the second primary generator to prepare the system for the next phase,
which requires increased power.
FIGURE 14.6 Markov model of the electric power generating system (4-Gen): journey phase.
The next and last phase is the port phase, shown in Figure 14.7, which the system
enters after the journey phase. The possible states are P,2,2,0 – P,2,1,0 –
P,2,0,0. The repair strategy and the activation of secondary generators through the ACS are the
same as in the journey phase.
FIGURE 14.7 Markov model of the electric power generating system (4-Gen): port phase.
Following the semi-Markov methodology described in formulas (14.3) through
(14.13), we can construct the transition matrix using Table A14.4 of the
Appendix to this chapter, followed by the one-step probability matrix and the matrix
of mean sojourn times. The V vector after calculations and the mean sojourn times
are shown in Table A14.6 of the Appendix to this chapter and the final
steady-state probabilities in Table A14.8. As previously mentioned, systems with
four generators are usual in large vessels. The analysis of the model with four
generators shows that the availability of the system is high and varies little
when the number of crew members changes, which suggests that investment in backup
systems can reduce the need for crew members.
At this point it would be worthwhile to investigate the sensitivity of the system’s
structure to the number of crew members and backup systems. One test is to assume
fewer generators for the system (e.g., three generators), as shown in Figures 14.8
through 14.10.
FIGURE 14.8 Markov model of the electric power generating system (3-Gen)—
maintenance phase.
FIGURE 14.9 Markov model of the electric power generating system (3-Gen)—journey phase.
FIGURE 14.10 Markov model of the electric power generating system (3-Gen)—port phase.
Given that the power needs are the same for each phase in both configurations
(three and four generators), the systems differ only in the number of secondary
generators. Following the same semi-Markov methodology as in the four-generator
configuration, we obtain simpler diagrams. The Markov models of the
phases for the system with three generators are shown in Figures 14.8 through 14.10, and
all transitions are shown in Table A14.5 of the Appendix to this chapter. Concerning
the maintenance phase (Figure 14.8), five out of ten states are in normal operation
(M,1,2,0 – M,1,1,0 – M,1,0,0 – M,1,1,1 – M,1,0,1), while all others are considered
failure states. The next phase, the journey phase, shown in Figure 14.9, is entered
on leaving the maintenance phase. The strategy of the crew to
repair or maintain the generators is the same as in the maintenance phase, except
that no generator is under maintenance. The possible
states are all those with one primary generator active (J,1,2,0 – J,1,1,0 – J,1,0,0).
The activation of a secondary generator after a primary generator fails follows the
same steps through the ACS, and normal operation includes the states J,1,2,0 –
J,1,1,0 – J,1,0,0. As in the four-generator configuration, the journey phase includes
preparation states (J,2,1,0 and J,2,0,0) through which the system passes
to the port phase (Figure 14.10). Implementing the semi-Markov methodology, we
can construct the transition matrix using the transitions of Table A14.5 in the Appendix
to this chapter, followed by the one-step probability matrix and the matrix of mean
sojourn times.
TABLE 14.6
Steady-State Probabilities of the Semi-Markov Model
State of          Configuration
                  4 Gen          3 Gen
Unavailability    2.893424E-06   1.024447E-05
The V vector after calculations and the mean sojourn times are shown in
Table A14.7 of the Appendix to this chapter and the steady-state
probabilities in Table A14.9. A summary of the probabilities of normal
operation and of unavailability (as shown in Table A14.14 of the Appendix to
this chapter) for both system configurations (three and four generators) and a crew of six is
shown in Table 14.6.
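The unavailability of the four-generator configuration in Table 14.6 can be cross-checked against the state probabilities of Table A14.8. The Python sketch below sums the probabilities of all states that do not meet the minimum number of active primary generators described in the text (one in the maintenance and journey phases, two in the port phase). The probabilities are copied from Table A14.8; the helper names are illustrative.

```python
# Steady-state probabilities, four generators, crew of six (Table A14.8).
TABLE_A14_8 = {
    "M,1,3,0": 1.4147949e-01, "M,1,2,0": 1.2390664e-03,
    "M,1,1,0": 1.0848216e-05, "M,1,0,0": 2.5590879e-08,
    "M,1,2,1": 1.4147978e-01, "M,1,1,1": 1.2387772e-03,
    "M,1,0,1": 1.0802759e-05, "M,0,3,0": 8.0109202e-07,
    "M,0,2,0": 7.0204378e-09, "M,0,1,0": 6.1465043e-11,
    "M,0,0,0": 1.1230434e-10, "M,0,2,1": 8.0109365e-07,
    "M,0,1,1": 7.0188000e-09, "M,0,0,1": 4.7324521e-08,
    "J,1,3,0": 8.8442730e-04, "J,1,2,0": 7.7457435e-06,
    "J,1,1,0": 6.7974990e-08, "J,0,3,0": 5.0078470e-09,
    "J,0,2,0": 4.3886679e-11, "J,0,1,0": 3.8513977e-13,
    "J,1,0,0": 1.2428462e-07, "J,0,0,0": 5.4411190e-10,
    "J,2,2,0": 4.9517823e-01, "J,2,1,0": 4.3367313e-03,
    "J,2,0,0": 3.8058227e-05, "P,2,2,0": 2.1221906e-01,
    "P,2,1,0": 1.8575841e-03, "P,2,0,0": 1.6281384e-05,
    "P,1,2,0": 1.1966432e-06, "P,1,1,0": 1.0487945e-08,
    "P,1,0,0": 1.6892712e-08, "P,0,2,0": 6.7756912e-12,
    "P,0,1,0": 5.9423717e-14, "P,0,0,0": 7.3956051e-11,
}

# Minimum number of active primary generators required in each phase.
REQUIRED_PRIMARIES = {"M": 1, "J": 1, "P": 2}


def is_failed(label):
    """A state is a failure state if too few primary generators are active."""
    phase, primaries, _secondaries, _maintenance = label.split(",")
    return int(primaries) < REQUIRED_PRIMARIES[phase]


unavailability = sum(p for label, p in TABLE_A14_8.items() if is_failed(label))
print(f"unavailability (4 generators, crew of 6): {unavailability:.6e}")
```

The sum agrees with the 4 Gen entry of Table 14.6 to the printed precision, which supports reading the port phase as requiring two active primary generators.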
The differences between probabilities in state of normal operation and failure
are in line with the general reliability theory concerning the use of backup systems.
Considering the first group, this method is applied once a failure takes place.
Following this strategy, it can be handled as a stochastic renewal process and the
implied cost can be expressed by the following formula:
$$C_{RR} = C_R + C_D \qquad (14.15)$$
where:
$C_{RR}$ is the total cost of maintenance
$C_R$ is the repair cost
$C_D$ is the indirect cost while the system is not operative
Compared with the alternative of planned maintenance, the repair on failure policy
is preferred when:
$$C_{RR} \le C_{PM} \qquad (14.16)$$
$$T_0\, m(T_0) - M(T_0) = \frac{C_2}{C_1} \qquad (14.17)$$
where:
$T_0$ is the optimum time interval
$M(T_0)$ is the renewal function
$m(T_0)$ is the renewal density
$C_1$ is the expected cost of failure
$C_2$ is the expected cost for exchanging non-failed item
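Formula (14.17) has the form of the standard optimality condition for a block replacement policy in renewal theory. As a sketch, and assuming a cost model that is not stated explicitly in the surviving text (failed items are replaced at expected cost $C_1$ each, and all items are exchanged every $T$ time units at expected cost $C_2$), the expected cost per unit time $C(T)$ and its stationarity condition are:

```latex
C(T) = \frac{C_1\, M(T) + C_2}{T},
\qquad
\frac{dC}{dT} = \frac{C_1\, T\, m(T) - C_1\, M(T) - C_2}{T^{2}} = 0
\;\Longrightarrow\;
T_0\, m(T_0) - M(T_0) = \frac{C_2}{C_1},
\qquad m(T) = \frac{dM(T)}{dT}.
```

Under this assumption, the optimum interval $T_0$ depends only on the renewal function and on the ratio $C_2 / C_1$, not on the absolute cost levels.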
The study of the maintenance problem is also related to the MSS methodology.
In general, two major categories of maintenance are followed (Liu and Huang 2010).
Corrective maintenance is conducted when a system failure takes place, and preventive
maintenance is conducted to keep the system’s performance within
the desired limits during specific periods of operation. In the
shipping industry, corrective maintenance refers to onboard
repair activities, whereas preventive maintenance refers to repairs in shipyards during
major overhauls. Due to the restrictions on repairing failed systems on board,
corrective maintenance presents inherent difficulties. One major challenge is
to optimize the maintenance policy by combining both types of maintenance
so that the ship operates unobstructed. Depending on the management
policy of the ship’s owner, the maintenance policy (consisting of corrective and
preventive maintenance) of the ship’s subsystems should consider the expected time
between failures and organize the transportation assignments of each ship so as to
minimize the cost. Thus, minor failures that the crew members can repair
would not affect the ship’s transportation capability, whereas the repair of
major failures should be scheduled during the overhaul inspections and repairs in
the shipyard.
One important parameter that affects the development of the maintenance pol-
icy is the time horizon. This horizon determines the strategy of the maintenance
policy management. In the case of a long or medium term, the maintenance policy
could focus more on planning and spare parts’ inventory management, while the
short-term horizon focuses more on monitoring and control (Ben-Daya et al. 2000).
If the complexity of the system is high, the replacement policy should focus on
block replacement rather than on other policies such as age replacement, since
the former is preferred to the latter in that case (Barlow and Proschan 1996). The basic
analysis of the maintenance policy refers to the failures of the
parts, subsystems, and systems and to their distributions. Nevertheless, it does not provide universal answers
concerning maintenance, because further questions could arise, such as what the
optimal maintenance policy is under the specific conditions of the system.
The term “optimal” refers not only to cost minimization but also to maximization
of the availability (Barlow and Proschan 1996). Both objectives follow the same
principles as optimization problems with more than two decision
variables.
Considering the findings from previous sections, we can propose certain basic
principles concerning a ship’s operation. The maintenance of the
electrical power system serves as an example, and the approach of this study could be
extended to other subsystems and finally to the whole ship. The electric system
of a ship is a complex system consisting of many different parts that are subject to
deterioration and possible gradual degradation. It can therefore be treated as
an MSS operating at different output levels. As described in previous
sections, a marine electric power generating system consists of several
main generators and an emergency generator. The system follows a typical configuration of
four main generators and one emergency generator, where the four main generators comprise
two primary and two standby generators. Considering the generating system as
an MSS, whenever its performance falls below the threshold of
acceptance it is assumed to have failed, and maintenance actions should take place (Liu
and Huang 2010). Since there is always a probability of transition of the MSS
from one state to another, the restoration of the system is subject to
randomness, because one subsystem could fail during the restoration of a previously
failed one.
As maintenance policy depends on the strategic goals of the decision makers, it is
closely related to the minimization of the maintenance cost. Although cost is a single
concept, it can be described from different aspects that lead to the same result. For a
ship, a sufficient level of maintenance implies direct and indirect cost savings.
Direct cost savings refer to reduced needs for repairs, reduced man-hours, and reduced
losses due to not using the equipment. Indirect cost savings refer to meeting the
requirements of contracts and avoiding penalties due to delays. Another valid assessment of
this cost is the reliability associated cost (RAC), which is expressed (Lisnianski
et al. 2010) by the formula:
RAC = OC + RC + PC (14.18)
where:
OC is the operational cost, including the fuel cost in the case of power systems
driven by auxiliary diesel engines
RC is the repair cost, including the repair and maintenance cost in man-hours and
spare parts
PC is the penalty cost when the system’s failure leads to delays of the operation
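Formula (14.18) can be turned into a small calculation. In the Python sketch below, all monetary values are hypothetical, and the penalty model (expected downtime hours per year multiplied by an hourly penalty) is an assumption made for illustration; the chapter does not specify how PC is computed. Only the unavailability figures are taken from the chapter (crew of six, Table 14.6).

```python
HOURS_PER_YEAR = 8760.0


def penalty_cost(unavailability, penalty_per_hour):
    """Assumed model: expected downtime hours per year times an hourly penalty."""
    return unavailability * HOURS_PER_YEAR * penalty_per_hour


def reliability_associated_cost(oc, rc, pc):
    """RAC = OC + RC + PC, formula (14.18)."""
    return oc + rc + pc


# Unavailability for a crew of six (Table 14.6).
UNAVAILABILITY_4GEN = 2.893424e-06
UNAVAILABILITY_3GEN = 1.024447e-05

# Hypothetical cost figures in arbitrary currency units per year.
PENALTY_PER_HOUR = 10_000.0
pc_4gen = penalty_cost(UNAVAILABILITY_4GEN, PENALTY_PER_HOUR)
pc_3gen = penalty_cost(UNAVAILABILITY_3GEN, PENALTY_PER_HOUR)

rac_4gen = reliability_associated_cost(oc=500_000.0, rc=40_000.0, pc=pc_4gen)
rac_3gen = reliability_associated_cost(oc=480_000.0, rc=30_000.0, pc=pc_3gen)

print(f"PC (4 gen) = {pc_4gen:.2f}, RAC (4 gen) = {rac_4gen:.2f}")
print(f"PC (3 gen) = {pc_3gen:.2f}, RAC (3 gen) = {rac_3gen:.2f}")
```

With unavailabilities this small, the penalty term stays minor unless the hourly penalty is very large, so in such a model the comparison between configurations is driven mainly by OC and RC.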
TABLE 14.7
Probabilities of Operation at Acceptable Level (min power requirements)
Crew Members   Unavailability (4 Generators)   Unavailability (3 Generators)
1 1.400095E-05 2.204020E-04
2 4.306027E-06 5.963794E-05
3 3.288975E-06 2.927190E-05
4 3.031658E-06 1.834814E-05
5 2.936747E-06 1.314772E-05
6 2.893424E-06 1.024447E-05
7 2.870731E-06 8.447592E-06
8 2.857641E-06 7.252148E-06
9 2.849532E-06 6.413197E-06
10 2.844226E-06 5.799750E-06
11 2.840598E-06 5.336362E-06
12 2.838030E-06 4.976965E-06
To understand the sensitivity of availability to changes in crew size, all models of the
previous sections were developed for crews of 1 to 12 members. According to our
findings, the probability that the system reaches a non-acceptable level of operation
varies with the crew from 1.400095E-05 (one member) to 2.838030E-06 (12 members) in the
four-generator configuration and from 2.204020E-04 to 4.976965E-06 in the
three-generator configuration (Table 14.7).
The availabilities of both configurations are also shown in Figure 14.11.
The difference between the two systems is largest for small crews and narrows as the
crew grows, which indicates an interaction between backup systems and manpower.
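The interaction between backup systems and crew size can be quantified directly from Table 14.7. The Python sketch below computes, for each crew size, the ratio of the three-generator to the four-generator unavailability; the data are copied from the table and the variable names are illustrative.

```python
CREW_SIZES = list(range(1, 13))

# Unavailability by crew size (Table 14.7).
UNAVAILABILITY_4GEN = [
    1.400095e-05, 4.306027e-06, 3.288975e-06, 3.031658e-06,
    2.936747e-06, 2.893424e-06, 2.870731e-06, 2.857641e-06,
    2.849532e-06, 2.844226e-06, 2.840598e-06, 2.838030e-06,
]
UNAVAILABILITY_3GEN = [
    2.204020e-04, 5.963794e-05, 2.927190e-05, 1.834814e-05,
    1.314772e-05, 1.024447e-05, 8.447592e-06, 7.252148e-06,
    6.413197e-06, 5.799750e-06, 5.336362e-06, 4.976965e-06,
]

# How many times less available the three-generator system is.
ratios = [u3 / u4 for u3, u4 in zip(UNAVAILABILITY_3GEN, UNAVAILABILITY_4GEN)]

for crew, ratio in zip(CREW_SIZES, ratios):
    print(f"crew {crew:2d}: 3-gen / 4-gen unavailability = {ratio:.2f}")
```

The ratio falls steadily from roughly 16 for a single crew member to below 2 for a crew of twelve, so additional crew compensates for much, but not all, of the missing standby generator.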
[Figure 14.11: availability of the 3 GEN and 4 GEN configurations for crews of 1 to 12 members; vertical axis from 9.9965E-01 to 1.0001E+00]
[Figure: cost diagram with the labels Reliability Cost, Man-hours Cost, Supplies and Transportation]
14.7 CONCLUSIONS
This study evaluated, using MSS theory, the reliability performance of a marine
electric power generation unit that consists of four generators (one or two primary
and two or three secondary) driven by auxiliary diesel engines. An
alternative analysis of a system with three generators (one or two primary and
one or two secondary) was conducted to investigate the differences between the two
system configurations and to identify possible alternatives for decision
making.
One main characteristic of the system is that it uses a single type of generator.
This strategy has a managerial benefit: it simplifies the system’s
management, in particular the scheduling of supplies and maintenance.
The analysis of the generators as an MSS shows that the probability of operation at a
non-acceptable level of output is low, implying high availability, which is in line with
the operational requirements of a ship. This is made possible by the increased
number of generators (primary, secondary, and emergency) and by the expected failure rates
of the subsystems (generators), which tend to decrease with
technological progress. Due to technological limitations, however, the continuous lowering of
failure rates could be difficult. To achieve continuous improvement
of the systems there are two options. The first is to add backup
systems (configuration with four generators), and the second is to increase repair
rates, that is, maintainability (configuration with three generators),
by using additional highly qualified personnel in the engine room.
The analysis of the system shows that, besides the ordinary aspect of maintenance
focusing on materials, there is another aspect focusing on manpower and human capital.
Considering the reliability parameters of the systems and the modeling
process, the empirical results show that the availability of the systems is very high
(unavailability on the order of 10^-6 to 10^-4, Table 14.7).
Comparison of the two system configurations shows that, as the engine crew increases,
the probability of normal operation tends to be almost the same for both options.
This implies an interaction between system and crew. The finding
that the level of reliability increases with the number of crew members is
valid within limits that depend on the ratio of penalty cost to wage cost. Obviously,
these limits differ depending on the specifications and operational requirements of
each system. The analysis conducted in this chapter is a theoretical approach to
the maintenance optimization problem in a specific sector of the ship. The model
could focus on specific devices and subsystems or could be expanded appropriately
to include more systems or integrated blocks. All the above findings could
be a starting point for further research that combines additional mathematical methods,
such as Monte Carlo simulation, to reduce the uncertainty of the models.
ACKNOWLEDGMENT
We would like to thank Dr. J. Dagkinis, whose expert knowledge on marine engi-
neering issues was most helpful.
APPENDIX
TABLE A14.1
Failure Rates for Automatic Control System EEA-22
Type of Failure   MTBF (Days)   Failure Rate (per 10^6 Hrs)
5C147 1,818.38 22.9142
Input 5C15 8,998.89 4.6302
Power 5C3 301.02 138.4184
Start 5C21 2,324.97 17.9214
Stop 5C27 3,929.78 10.6028
General control 6C109 2,270.22 18.3536
Stop cascade 6C103 10,659.16 3.9090
Voltage monitor 5C153 671.80 62.0226
V/f monitor 6C39 4,030.83 10.3370
Additional reference value 6C67 5,374.96 7.7520
Closing pulse EVG23 8,726.73 4.7746
Power SNT23 19,007.65 2.1921
Relay output RAG23 2,102.36 19.8190
Frequency presetting FAG23 9,780.68 4.2601
Active power measuring WMG23 3,499.05 11.9080
Frequency controller FRG23 9,063.09 4.5974
Load distribution LAG23 4,592.13 9.0735
Total Rate 117.87 353.4859
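The entries of Table A14.1 are linked by two simple relations: a failure rate per 10^6 hours is the reciprocal of the MTBF converted from days to hours, and the total rate of the serial control system is the sum of the component rates. A short Python check using the printed values (function and variable names are illustrative):

```python
HOURS_PER_DAY = 24.0


def rate_per_1e6_hours(mtbf_days):
    """Failure rate per 10^6 hours for a given MTBF in days."""
    return 1e6 / (mtbf_days * HOURS_PER_DAY)


# Component failure rates per 10^6 hours, in the row order of Table A14.1.
component_rates = [
    22.9142, 4.6302, 138.4184, 17.9214, 10.6028, 18.3536,
    3.9090, 62.0226, 10.3370, 7.7520, 4.7746, 2.1921,
    19.8190, 4.2601, 11.9080, 4.5974, 9.0735,
]

total_rate = sum(component_rates)                      # serial system: rates add
total_mtbf_days = 1e6 / (total_rate * HOURS_PER_DAY)   # back to an MTBF in days

print(f"first row rate: {rate_per_1e6_hours(1818.38):.4f} per 10^6 hours")
print(f"total rate:     {total_rate:.4f} per 10^6 hours")
print(f"total MTBF:     {total_mtbf_days:.2f} days")
```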
TABLE A14.2
Failure and Repair Rates for Engine Drivers of Power Generators
Type of Failure   MTBF (Days)   Failure Rate (per 10^6 Hrs)   Time to Repair (Hrs)   Repair Rate (per 10^6 Hrs)
Fuel oil filters 20 2,083.33 1.5 27,777.78
Oil filters 20 2,083.33 1.5 27,777.78
Air filters 20 2,083.33 1.5 27,777.78
Water filters 30 1,388.89 1.5 27,777.78
Fuel injector 60 694.44 1.5 27,777.78
Leaking of gasket 90 462.96 1.5 27,777.78
Piping system 60 694.44 1.5 27,777.78
Water pump 60 694.44 3.0 13,888.89
Fuel pump 30 1,388.89 3.0 13,888.89
Fuel injector 180 231.48 3.0 13,888.89
Dirty water cooler 60 694.44 3.0 13,888.89
Dirty oil cooler 60 694.44 3.0 13,888.89
Cover gasket damage 365 114.16 3.0 13,888.89
Exhaust inlet valve 365 114.16 7.0 5,952.38
Cracking of cyl. heads 240 173.61 7.0 5,952.38
Piston ring damage 365 114.16 7.0 5,952.38
Turbo charger 365 114.16 7.0 5,952.38
TABLE A14.3
General Plan of Maintenance
Overhaul Maintenance Interval (Hours of Operation)
Action—Description Daily Weekly Monthly 1,000 2,000 4,000 8,000 12,000 16,000 20,000 24,000 28,000 30,000
Major fasteners—retightening X X X X
Major bearing—inspection X X X
Resilient mounts—inspect-retighten X
Cylinder and rod—inspection X X
Crankshaft—gears—inspection X X X
Valve mechanism X
Control system X X X
Fuel system X X
Lubricating oil system X X X
Cooling water system X X X
Compressed air system X X X
Supercharging system X X X
TABLE A14.4
States of the Marine Power Generating System (4 Generators)a
State FR TO Rate State FR TO Rate State FR TO Rate
a The transition to another state is either through failure or repair of a generator or automatic control
system.
TABLE A14.5
States of the Marine Power Generating System (3 Generators)a
State     From   To    Rate
M,1,2,0   S1     S2    λs
                 S4    m
                 S6    λ
                 S11   λm→j
M,1,1,0   S2     S1    μ
                 S3    λs
                 S4    m
                 S7    λ
                 S12   λm→j
M,1,0,0   S3     S2    μ
                 S8    λ
                 S15   λm→j
M,1,1,1   S4     S1    1/M
                 S5    λs
                 S9    λ
M,1,0,1   S5     S2    1/M
                 S10   λ
M,0,2,0   S6     S2    1/tswitch
                 S7    λs
M,0,1,0   S7     S3    1/tswitch
                 S3    μ
M,0,0,0   S8     S5    1/tswitch
                 S10   λs
M,0,1,1   S9     S3    1/M
                 S2    λs
M,0,0,1   S10    S4    m
J,1,2,0   S11    S12   λs
                 S13   λ
                 S17   1/tswitch
J,1,1,0   S12    S11   μ
                 S14   λ
                 S15   λs
                 S18   1/tswitch
J,0,2,0   S13    S12   1/tswitch
                 S14   λs
J,0,1,0   S14    S15   1/tswitch
                 S16   λs
J,1,0,0   S15    S12   μ
                 S16   λ
J,0,0,0   S16    S15   μ
J,2,1,0   S17    S19   λj→p
J,2,0,0   S18    S20   λj→p
P,2,1,0   S19    S1    λp→m
                 S20   λs
                 S21   λ
P,2,0,0   S20    S2    λp→m
                 S19   μ
                 S22   λ
P,1,1,0   S21    S2    λp→m
                 S20   1/tswitch
                 S22   λs
                 S23   λ
P,1,0,0   S22    S3    λp→m
                 S20   μ
                 S24   λ
P,0,1,0   S33    S22   1/tswitch
                 S24   λs
P,0,0,0   S34    S22   μ
a The transition to another state is either through failure or repair of a generator or automatic control
system.
TABLE A14.6
V-Vector and Mean Sojourn Times (4 Generators)
State vi hi State vi hi
1 2.9110E-01 1.4128E+00 18 1.1644E-06 1.2502E-02
2 1.6476E-03 1.0933E+00 19 5.1062E-09 1.2502E-02
3 7.2131E-06 1.0938E+00 20 2.2407E-11 1.2502E-02
4 1.0464E-08 1.4137E+00 21 7.4578E-09 4.8228E+00
5 8.5464E-02 4.8123E+00 22 1.6289E-11 4.8333E+00
6 3.7416E-04 4.8123E+00 23 2.0564E-01 7.0000E+00
7 1.6319E-06 4.8228E+00 24 9.0063E-04 7.0000E+00
8 1.8626E-04 1.2502E-02 25 3.9522E-06 7.0000E+00
9 8.1683E-07 1.2502E-02 26 2.0620E-01 2.9919E+00
10 3.5778E-09 1.2502E-02 27 1.4607E-03 1.8480E+00
11 6.7201E-12 4.8333E+00 28 6.3973E-06 1.8495E+00
12 1.8626E-04 1.2502E-02 29 2.7939E-04 1.2451E-02
13 8.1650E-07 1.2502E-02 30 1.2257E-06 1.2451E-02
14 3.5691E-09 4.8333E+00 31 5.3700E-09 1.8495E+00
15 2.0564E-01 1.2502E-02 32 1.5754E-09 1.2502E-02
16 9.0297E-04 1.2470E-02 33 6.9200E-12 1.2502E-02
17 3.9625E-06 1.2470E-02 34 4.4978E-12 4.8333E+00
TABLE A14.7
V-Vector and Mean Sojourn Times (3 Generators)
State vi hi State vi hi
1 2.9111E-01 1.4128E+00 13 1.1644E-06 1.2502E-02
2 1.6468E-03 1.0933E+00 14 5.1164E-09 1.2502E-02
3 3.2053E-06 1.4137E+00 15 2.2809E-06 4.8228E+00
4 8.5465E-02 4.8123E+00 16 4.9820E-09 4.8333E+00
5 3.7253E-04 4.8228E+00 17 2.0564E-01 7.0000E+00
6 1.8626E-04 1.2502E-02 18 9.0243E-04 7.0000E+00
7 8.1641E-07 1.2503E-02 19 2.0620E-01 2.9919E+00
8 2.0523E-09 4.8333E+00 20 1.4605E-03 1.8495E+00
9 1.8626E-04 1.2502E-02 21 2.7940E-04 1.2451E-02
10 8.1473E-07 4.8333E+00 22 1.2276E-06 1.8495E+00
11 2.0564E-01 1.2502E-02 23 1.5754E-09 1.2502E-02
12 9.0477E-04 1.2470E-02 24 1.0282E-09 1.0000E+00
TABLE A14.8
Steady-State Probabilities of Electric Power Generating System
(4 Generators)
Probability
Phase State Crew of 6
Maintenance S1 M,1,3,0 1.4147949E-01
S2 M,1,2,0 1.2390664E-03
S3 M,1,1,0 1.0848216E-05
S4 M,1,0,0 2.5590879E-08
S5 M,1,2,1 1.4147978E-01
S6 M,1,1,1 1.2387772E-03
S7 M,1,0,1 1.0802759E-05
S8 M,0,3,0 8.0109202E-07
S9 M,0,2,0 7.0204378E-09
S10 M,0,1,0 6.1465043E-11
S11 M,0,0,0 1.1230434E-10
S12 M,0,2,1 8.0109365E-07
S13 M,0,1,1 7.0188000E-09
S14 M,0,0,1 4.7324521E-08
Journey S15 J,1,3,0 8.8442730E-04
S16 J,1,2,0 7.7457435E-06
S17 J,1,1,0 6.7974990E-08
S18 J,0,3,0 5.0078470E-09
S19 J,0,2,0 4.3886679E-11
S20 J,0,1,0 3.8513977E-13
S21 J,1,0,0 1.2428462E-07
S22 J,0,0,0 5.4411190E-10
S23 J,2,2,0 4.9517823E-01
S24 J,2,1,0 4.3367313E-03
S25 J,2,0,0 3.8058227E-05
Port S26 P,2,2,0 2.1221906E-01
S27 P,2,1,0 1.8575841E-03
S28 P,2,0,0 1.6281384E-05
S29 P,1,2,0 1.1966432E-06
S30 P,1,1,0 1.0487945E-08
S31 P,1,0,0 1.6892712E-08
S32 P,0,2,0 6.7756912E-12
S33 P,0,1,0 5.9423717E-14
S34 P,0,0,0 7.3956051E-11
TABLE A14.9
Steady-State Probabilities of Electric Power Generating System (3 Generators)
Probability
Phase State Crew of 6
Maintenance S1 M,1,2,0 1.4148529E-01
S2 M,1,1,0 1.2381783E-03
S3 M,1,0,0 3.8518151E-06
S4 M,1,1,1 1.4148464E-01
S5 M,1,0,1 1.2334183E-03
S6 M,0,2,0 8.0112483E-07
S7 M,0,1,0 7.0154487E-09
S8 M,0,0,0 1.6863003E-08
S9 M,0,1,1 8.0112119E-07
S10 M,0,0,1 5.4033344E-06
Journey S11 J,1,2,0 8.8446355E-04
S12 J,1,1,0 7.7642464E-06
S13 J,0,2,0 5.0080522E-09
S14 J,0,1,0 4.3991448E-11
S15 J,1,0,0 1.8685111E-05
S16 J,0,0,0 8.1802426E-08
S17 J,2,1,0 4.9519852E-01
S18 J,2,0,0 4.3470908E-03
Port S19 P,2,1,0 2.1222833E-01
S20 P,2,0,0 1.8595168E-03
S21 P,1,1,0 1.1966955E-06
S22 P,1,0,0 1.9305769E-06
S23 P,0,1,0 6.7759872E-12
S24 P,0,0,0 8.7434201E-10
TABLE A14.10
Steady-State Probabilities for Different Crews (4 Generators)
Probabilities (1 of 2)
State Crew 1 2 3 4 5 6
M,1,3,0 1 1.35245E-01 1.38982E-01 1.40230E-01 1.40855E-01 1.41230E-01 1.41479E-01
M,1,2,0 2 7.10546E-03 3.65104E-03 2.45599E-03 1.85026E-03 1.48420E-03 1.23907E-03
M,1,1,0 3 3.73045E-04 9.58536E-05 4.29911E-05 2.42938E-05 1.55917E-05 1.08482E-05
M,1,0,0 4 1.11179E-06 2.72044E-07 1.16200E-07 6.26382E-08 3.84186E-08 2.55909E-08
M,1,2,1 5 1.35246E-01 1.38982E-01 1.40231E-01 1.40855E-01 1.41230E-01 1.41480E-01
M,1,1,1 6 7.10492E-03 3.65068E-03 2.45566E-03 1.84996E-03 1.48391E-03 1.23878E-03
M,1,0,1 7 3.63724E-04 9.46614E-05 4.26364E-05 2.41436E-05 1.55141E-05 1.08028E-05
(Continued)
TABLE A14.11
Steady-State Probabilities for Different Crews (4 Generators)
Probabilities (2 of 2)
State Crew 7 8 9 10 11 12
M,1,3,0 1 1.41658E-01 1.41792E-01 1.41896E-01 1.41979E-01 1.42048E-01 1.42104E-01
M,1,2,0 2 1.06343E-03 9.31411E-04 8.28553E-04 7.46157E-04 6.78668E-04 6.22377E-04
M,1,1,0 3 7.98124E-06 6.11720E-06 4.83750E-06 3.92115E-06 3.24256E-06 2.72606E-06
M,1,0,0 4 1.80554E-08 1.32918E-08 1.01108E-08 7.89429E-09 6.29624E-09 5.11149E-09
M,1,2,1 5 1.41658E-01 1.41792E-01 1.41896E-01 1.41980E-01 1.42048E-01 1.42105E-01
(Continued)
TABLE A14.12
Steady-State Probabilities for Different Crews (3 Generators)
Probabilities (1 of 2)
State Crew 1 2 3 4 5 6
M,1,2,0 1 1.35479E-01 1.39041E-01 1.40256E-01 1.40869E-01 1.41238E-01 1.41485E-01
M,1,1,0 2 7.11002E-03 3.64893E-03 2.45415E-03 1.84884E-03 1.48309E-03 1.23818E-03
M,1,0,0 3 2.74263E-05 1.34534E-05 8.65137E-06 6.24126E-06 4.80228E-06 3.85182E-06
M,1,1,1 4 1.35472E-01 1.39037E-01 1.40254E-01 1.40867E-01 1.41237E-01 1.41485E-01
M,1,0,1 5 6.93487E-03 3.60483E-03 2.43476E-03 1.83805E-03 1.47623E-03 1.23342E-03
M,0,2,0 6 7.67116E-07 7.87283E-07 7.94163E-07 7.97634E-07 7.99726E-07 8.01125E-07
M,0,1,0 7 4.02633E-08 2.06657E-08 1.39006E-08 1.04732E-08 8.40221E-09 7.01545E-09
M,0,0,0 8 7.20424E-07 1.76694E-07 7.57503E-08 4.09858E-08 2.52289E-08 1.68630E-08
M,0,1,1 9 7.67076E-07 7.87264E-07 7.94152E-07 7.97626E-07 7.99721E-07 8.01121E-07
M,0,0,1 10 1.82183E-04 4.73554E-05 2.13254E-05 1.20755E-05 7.75960E-06 5.40333E-06
J,1,2,0 11 8.46916E-04 8.69182E-04 8.76778E-04 8.80609E-04 8.82919E-04 8.84464E-04
J,1,1,0 12 4.46181E-05 2.28945E-05 1.53956E-05 1.15966E-05 9.30121E-06 7.76425E-06
J,0,2,0 13 4.79545E-09 4.92152E-09 4.96453E-09 4.98623E-09 4.99931E-09 5.00805E-09
J,0,1,0 14 2.52666E-10 1.29662E-10 8.72020E-11 6.56912E-11 5.26941E-11 4.39914E-11
J,1,0,0 15 7.97707E-04 1.95675E-04 8.38996E-05 4.54015E-05 2.79510E-05 1.86851E-05
J,0,0,0 16 2.09539E-05 2.56996E-06 7.34615E-07 2.98148E-07 1.46841E-07 8.18024E-08
J,2,1,0 17 4.74176E-01 4.86642E-01 4.90895E-01 4.93041E-01 4.94334E-01 4.95199E-01
J,2,0,0 18 2.49811E-02 1.28183E-02 8.61979E-03 6.49278E-03 5.20762E-03 4.34709E-03
P,2,1,0 19 2.03219E-01 2.08562E-01 2.10384E-01 2.11304E-01 2.11858E-01 2.12228E-01
P,2,0,0 20 1.06905E-02 5.48495E-03 3.68807E-03 2.77777E-03 2.22777E-03 1.85952E-03
P,1,1,0 21 1.14590E-06 1.17602E-06 1.18630E-06 1.19148E-06 1.19461E-06 1.19670E-06
P,1,0,0 22 1.38134E-05 6.75649E-06 4.34056E-06 3.12977E-06 2.40744E-06 1.93058E-06
P,0,1,0 23 6.48835E-12 6.65892E-12 6.71711E-12 6.74646E-12 6.76416E-12 6.77599E-12
P,0,0,0 24 6.25597E-09 3.05995E-09 1.96580E-09 1.41745E-09 1.09031E-09 8.74342E-10
TABLE A14.13
Steady-State Probabilities for Different Crews (3 Generators)
Probabilities (2 of 2)
State Crew 7 8 9 10 11 12
M,1,2,0 1 1.41662E-01 1.41795E-01 1.41898E-01 1.41981E-01 1.42049E-01 1.42106E-01
M,1,1,0 2 1.06271E-03 9.30808E-04 8.28044E-04 7.45722E-04 6.78293E-04 6.22051E-04
M,1,0,0 3 3.18093E-06 2.68457E-06 2.30418E-06 2.00458E-06 1.76339E-06 1.56571E-06
M,1,1,1 4 1.41662E-01 1.41795E-01 1.41898E-01 1.41981E-01 1.42049E-01 1.42105E-01
M,1,0,1 5 1.05920E-03 9.28102E-04 8.25882E-04 7.43944E-04 6.76797E-04 6.20768E-04
M,0,2,0 6 8.02126E-07 8.02878E-07 8.03464E-07 8.03933E-07 8.04317E-07 8.04637E-07
M,0,1,0 7 6.02188E-09 5.27504E-09 4.69317E-09 4.22704E-09 3.84524E-09 3.52678E-09
(Continued)
TABLE A14.14
States of Operation and Unavailability for 4 and 3 Generators
State 4 Generators State 3 Generators
M,1,3,0 1 Operation M,1,2,0 1 Operation
M,1,2,0 2 Operation M,1,1,0 2 Operation
M,1,1,0 3 Operation M,1,0,0 3 Operation
M,1,0,0 4 Operation M,1,1,1 4 Operation
M,1,2,1 5 Operation M,1,0,1 5 Operation
M,1,1,1 6 Operation M,0,2,0 6 Unavailability
M,1,0,1 7 Operation M,0,1,0 7 Unavailability
M,0,3,0 8 Unavailability M,0,0,0 8 Unavailability
M,0,2,0 9 Unavailability M,0,1,1 9 Unavailability
M,0,1,0 10 Unavailability M,0,0,1 10 Unavailability
M,0,0,0 11 Unavailability J,1,2,0 11 Operation
M,0,2,1 12 Unavailability J,1,1,0 12 Operation
M,0,1,1 13 Unavailability J,0,2,0 13 Unavailability
M,0,0,1 14 Unavailability J,0,1,0 14 Unavailability
J,1,3,0 15 Operation J,1,0,0 15 Operation
J,1,2,0 16 Operation J,0,0,0 16 Unavailability
(Continued)
REFERENCES
Alzbutas, R. (2003). Diesel generators reliability data analysis and testing interval optimiza-
tion. Energetika 4:27–33.
Barbu, V.S., Karagrigoriou, A. (2018). Modeling and inference for multi-state systems,
In: Lisnianski A., Frenkel I., Karagrigoriou A. (eds) Recent Advances in Multi-state
Systems Reliability: Springer Series in Reliability Engineering. Springer, Cham,
Switzerland. doi:10.1007/978–3-319–63423-4_16.
Barlow, R., Proschan, F. (1996). Mathematical Theory of Reliability. John Wiley & Sons,
New York.
Ben-Daya, M., Duffuaa, S., Raouf, A. (2000). Maintenance Modelling and Optimization.
Springer Science and Media, New York.
Billinton, R., Li, Y. (2007). Incorporating multi state unit models in composite system ade-
quacy assessment. European Transactions on Electrical Power 17:375–386.
Brocken, E.M. (2016). Improving the Reliability of Ship Machinery: A Step Towards
Unmanned Shipping. Delft University of Technology, Delft, the Netherlands.
Chowdhury, C. (1988). A systematic survey of the maintenance models. Periodica
Polytechnica Engineering Mechanical Engineering 32(3–4):253–274.
Det Norske Veritas DNV (2011). Machinery Systems General, in Rules for Classification of
Ships, Høvik, Norway.
Eryilmaz, S. (2015). Assessment of a multi-state system under a shock model. Applied
Mathematics and Computation 269:1–8.
IEEE 1159–1995: IEEE Recommended Practice for Monitoring Electric Power Quality, 1995.
IEEE 45–2002: IEEE Recommended Practice for Electrical Installations on Shipboard, 2002.
IMO (2005). Unified Interpretations to SOLAS Chapters II-1 and XII and to the Technical
Provisions for Means of Access for Inspections, London, UK. http://imo.udhb.gov.tr/
dosyam/EKLER/SOLAS__BOLUM_II_1_EK(21).pdf.
IMO Study on the optimization of energy consumption as part of implementation of a Ship
Energy Efficiency Management Plan (SEEMP) 2016.
IMO MEPC 1-CIRC 866 (E). (2014). Guidelines on the Method of Calculation of the Attained
Energy Efficiency Design Index (EEDI) For New Ships, As Amended (Resolution
Mepc.245(66), As Amended By Resolutions Mepc.263(68) And Mepc.281(70), January
2017.
Levitin, G., Lisnianski, A. (1999). Joint redundancy and maintenance optimization for multi-
state series-parallel systems. Reliability Engineering & System Safety 64(1):33–42.
Levitin, G., Xing, L. (2018). Dynamic performance of series parallel multi-state sys-
tems with standby subsystems or repairable binary elements, In: Lisnianski A.,
Frenkel I., Karagrigoriou A. (eds) Recent Advances in Multi-state Systems
Reliability: Springer Series in Reliability Engineering. Springer, Cham, Switzerland.
doi:10.1007/978–3-319–63423-4_16.
Lisnianski, A., Frenkel, I., Ding, Y. (2010). Multi State Systems Reliability and Optimization
for Engineers and Industrial Managers. Springer, London, UK.
Lisnianski, A., Elmakias, D., Laredo, D., Haim, H.B. (2012). A multi-state Markov model
for a short-term reliability analysis of a power generating unit. Reliability Engineering
Systems Safety 98:1–6.
Liu, Y., Huang, H.Z. (2010). Optimal replacement policy for multi-state system under imper-
fect maintenance. IEEE Transactions on Reliability 59(3):483–495.
Liu, Y.W., Kapur, K.C. (2006). Reliability measures for dynamic multi state non repairable
systems and their applications to system performance evaluation. IIE Transaction
38(6):511–520.
Maes, W. (2013). Marine Electrical Knowledge. Antwerp Maritime Academy, Antwerp,
Belgium.
MAIB (2011). Report on the investigation of the catastrophic failure of a capacitor in the aft
harmonic filter room on board RMS Queen Mary 2 while approaching Barcelona on
23 September 2010. Marine Accident Investigation Branch. http://www.maib.gov.uk/
publications/investigation_reports/2011/qm2.cfm (last accessed May 7, 2018).
Markopoulos, T., Platis, A. (2018). Reliability analysis of a modified IEEE 6 BUS RBTS,
In: Lisnianski A., Frenkel I., Karagrigoriou A. (eds) Recent Advances in Multi-state
Systems Reliability. Springer Series in Reliability Engineering. Springer, Cham,
Switzerland. doi:10.1007/978–3-319–63423-4_16.
Mennis, E., Platis, A. (2013). Availability assessment of diesel generator system of a ship:
A case study. International Journal of Performability Engineering 9(5):561–567.
MEPC 61/inf.18: Reduction of GHG Emissions from Ships—Marginal abatement costs and
cost-effectiveness of energy-efficiency measures, October 2010.
MEPC.1-Circ.681–2: Interim Guidelines on the Method of Calculation of the Energy
Efficiency Design Index for New Ships, August 2009.
MEPC.1-Circ.684: Guidelines for Voluntary Use of the Ship Energy Efficiency Operational
Indicator (EEOI), August 2009.
Miller, T. (2012). Risk focus: Loss of power. http://www.ukpandi.com/fileadmin/uploads/
uk-pi/Documents/Brochures/Risk%20Focus%20-%20Loss%20of% 20 Power.pdf.
Mindykowski, J. (2014). Power quality on ships: Today and tomorrow’s challenges.
International Conference and Exposition on Electrical and Power Engineering (EPE
2014), Iasi, Romania.
Mindykowski, J. (2016). Case study—Based overview of some contemporary challenges to
power quality in ship systems. Inventions 1(2):12.
Mindykowski, J., Tarasiuk, T. (2015). Problems of power quality in the wake of ship
technology development. Ocean Engineering 107:108–117.
MSC/Circular.645-Guidelines for Vessels with Dynamic Positioning Systems-(adopted on 6
June 1994).
MTS DP Technical Committee. DP Vessel Design Philosophy Guidelines Part II.
MUNIN. D6.7: Maintenance indicators and maintenance management principles for autono-
mous engine room, 2014.
Nakagawa, T. (2006). Maintenance Theory of Reliability. Springer Science & Business
Media, London, UK.
OREDA (2002). Offshore Reliability Data Handbook, 4th ed. OREDA, Trondheim, Norway.
Patel, M.R. (2012). Shipboard Electrical Power Systems. CRC Press, Boca Raton, FL.
Prousalidis, J., Styvaktakis, E., Hatzilau, I.K., Kanellos, F., Perros, S., Sofras, E. (2008).
Electric power supply quality in ship systems: An overview. International Journal of
Ocean Systems Management 1(1):68–83.
Prousalidis, J.M., Tsekouras, J.G., Kanellos, F. (2011). New challenges emerged from the
development of more efficient electric energy generation units. From Electric Ship
Technologies Symposium (ESTS), IEEE. doi:10.1109/ESTS.2011.5770901.
Shagar, V., Jayasinghe, S.G., Enshaei, H. (2017). Effect of load changes on hybrid shipboard
power systems and energy storage as a potential solution: A review. Inventions 2:21.
Stevens, B., Dubey, A., Santoso, S. (2015). On improving reliability of shipboard power sys-
tem. IEEE Transactions on Power Systems 30(4):1905–1906.
Trivedi, K.S. (2002). Probability and Statistics with Reliability, Queuing and Computer
Science Applications. Wiley, New York.
Wärtsilä (2014). WSD 42111K, Aframax Tanker for Oil and Products. Wärtsilä Corporation,
Helsinki, Finland.
Wu, Z., Yao, Y., Wang, D. (2013). The reliability modeling of marine power station. Applied
Mechanics and Materials 427–429:404–407.
Yingkui, G., Jing, L. (2012). Multi state system reliability: A new and systematic review.
Procedia Engineering 29:531–536.
15 Vulnerability Discovery
and Patch Modeling
State of the Art
Avinash K. Shrivastava, P. K. Kapur,
and Misbah Anjum
CONTENTS
15.1 Introduction
15.1.1 Vulnerability
15.2 Literature Review
15.2.1 Anderson Thermodynamic Model
15.2.2 Alhazmi Malaiya Logistic Model
15.2.3 Rescorla Quadratic and Rescorla Exponential Models
15.2.3.1 Rescorla Quadratic Model
15.2.3.2 Rescorla Exponential Model
15.2.4 Vulnerability Discovery Model Using Stochastic Differential Equation
15.2.5 Effort-Based Vulnerability Discovery Model
15.2.6 User-Dependent Vulnerability Discovery Model
15.2.7 Vulnerability Discovery Model for Open and Closed Source
15.2.8 Coverage Based Vulnerability Discovery Modeling
15.2.9 Vulnerability Patching Model
15.2.9.1 One-Dimension Vulnerability Patching Model
15.2.9.2 Two-Dimensional Vulnerability Patch Modeling
15.2.10 Vulnerability Discovery and Patching Model
15.3 Vulnerability Discovery in Multi-version Software Systems
15.3.1 User Dependent Multi-version Vulnerability Discovery Modeling
15.4 Conclusion and Future Directions
References
15.1 INTRODUCTION
With the continual evolution of information technology (IT) infrastructures, the
related vulnerabilities and exploitations are increasing because of the security issues
raised during the operational phase. Today, no software system is free from
weaknesses or vulnerabilities, whether it is a system for personal use or for a
large-scale organization. According to the National Vulnerability Database (NVD), a
total of 16,555 security vulnerabilities were reported in 2018 (the highest figure thus far).
This statistic indicates that vulnerability assessment is the most ignored security
technology today. Thus, there is a need to quantify the discovered software
vulnerabilities with mathematical models so that security can be improved without
increasing penetration costs. Considerable work has been done on modeling
vulnerabilities with respect to time (Alhazmi & Malaiya 2005a, 2005b; Kimura 2006;
Kim et al. 2007; Okamura et al. 2013; Joh and Malaiya 2014; Kapur et al. 2015;
Sharma et al. 2016; Kansal et al. 2017a, 2017b; Movahedi et al. 2018). Section 15.1.1
briefly discusses the vulnerability life cycle. Section 15.2 reviews the literature on
vulnerability discovery models (VDMs) and describes the modeling frameworks of VDMs
based on different sets of assumptions, followed by vulnerability patching models
(VPMs). Section 15.3 discusses the modeling of multi-version vulnerability discovery,
and Section 15.4 presents the conclusion and future research directions.
15.1.1 Vulnerability
One of the best definitions of software vulnerability is given by Schultz et al. (1990)
who defined it as follows: “A vulnerability is defined as a defect which enables an
attacker to bypass security measures.” To assess the value of vulnerability finding,
we must examine the events surrounding discovery and disclosure. Schneier (2000)
described the lifecycle of a vulnerability in six phases: Introduction, Discovery,
Private Exploitation, Disclosure, Public Exploitation, and Fix Release. These events
do not necessarily occur strictly in this order. Disclosure and Fix Release often occur
together, especially when a manufacturer discovers a vulnerability and releases the
announcement along with a patch (Figure 15.1).
A more secure software system requires longer testing, which results in higher cost,
a delayed release, and an increased selling price. However, because of strong market
competition, the release time cannot be delayed, nor can the price of the software
be increased. Therefore, a trade-off between testing and launch time is required.
Several authors have proposed quantitative models in the existing literature.
These models can help developers allocate resources for security testing,
scheduling, and development of security patches
(Alhazmi & Malaiya 2005a, 2005b; Kimura 2006; Kim et al. 2007; Okamura et al.
2013; Joh & Malaiya 2014; Kapur et al. 2015; Sharma et al. 2016; Younis et al. 2016;
Kansal et al. 2017a, 2017b). In addition, developers can use VDMs to assess risk and
estimate the redundancy needed in resources and procedures to deal with potential
breaches. These measures help to determine the resources needed to test a specific
part of the software. The prime objective of this study is to understand the
mathematical models pertaining to the vulnerability discovery and patching phenomena.
the Weibull distribution and is known as the Joh-Weibull (JW) model. The model
represents the asymmetric nature of the vulnerability discovery rate because of the
skewness present in the probability density function. This model also depends
exclusively on discovery time. Bass et al. (1969) scrutinized the factors
that motivate vulnerability discoverers to spend effort on finding vulnerabilities.
According to that study, discoverers are attracted mainly by bug bounty programs,
which have become their main source of encouragement. However, the study does not
model the vulnerability discovery process. Massacci and Nguyen (2014) proposed a
methodology to validate the performance of empirical VDMs. The methodology focuses
on two quantitative metrics: quality and prediction capability. Quality is
measured on the basis of good fit and inconclusive fit, while predictive accuracy is
measured over current and future horizons. However, they do not propose any
mathematical model. Joh and Malaiya (2014) found a relationship between the
performance of S-shaped vulnerability discovery models and the skewness in some
vulnerability data sets and applied the Weibull, Beta, Gamma, and Normal distributions.
Anand et al. (2017) proposed an approach to quantify the discovered vulnerabilities
across various software versions and examined it using the Windows and
Windows Server operating systems. Zhu et al. (2017) proposed a mathematical
model that predicts software vulnerabilities and used the estimated parameters to
develop a new risk model. The authors also determined the severity of vulnerabilities
using the logistic function and the binomial distribution, respectively. This model
also depends exclusively on discovery time. Wai et al. (2018) proposed two new
algorithms, mean fit and trend fit, to predict the vulnerability discovery rate using
past vulnerability data. Recently, Movahedi et al. (2018) used a clustering approach
to group vulnerabilities into clusters, applied NHPP-based software reliability
models to predict the number of vulnerabilities in each cluster, and then
combined the predictions to find the total number of vulnerabilities in the system.
In the next section, we briefly discuss the VDMs proposed in the literature so far.
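\[
p(t) = \frac{k}{\gamma t} \tag{15.1}
\]
where:
k is a constant
γ is a value that accounts for the lower failure rate during beta testing by the users in
comparison with alpha testing
Integrating Equation 15.1 gives the cumulative number of vulnerabilities, where C is the
integration constant:
\[
\Omega(t) = \frac{k}{\gamma} \ln(Ct) \tag{15.2}
\]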
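As a quick numerical illustration of the rate in Equation 15.1 and its integral, the following minimal sketch evaluates both and checks that the slope of the cumulative curve (k/γ) ln(Ct) reproduces the rate. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def at_rate(t, k, gamma):
    """Discovery rate p(t) = k / (gamma * t) from Equation 15.1."""
    return k / (gamma * t)


def at_cumulative(t, k, gamma, c):
    """Cumulative count (k / gamma) * ln(c * t), the integral of at_rate."""
    return (k / gamma) * math.log(c * t)


# Hypothetical parameter values, chosen only for illustration.
K, GAMMA, C = 30.0, 1.5, 2.0
T, STEP = 10.0, 1e-5

# A central finite difference of the cumulative curve should match the rate.
numeric_rate = (at_cumulative(T + STEP, K, GAMMA, C)
                - at_cumulative(T - STEP, K, GAMMA, C)) / (2 * STEP)
print(at_rate(T, K, GAMMA), numeric_rate)
```

The rate decays as 1/t, so the cumulative curve grows only logarithmically and never saturates.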
\[
\frac{d\Omega}{dt} = A\,\Omega\,(B - \Omega) \tag{15.3}
\]
\[
\Omega(t) = \frac{B}{B C e^{-ABt} + 1} \tag{15.4}
\]
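The logistic curve of Equation 15.4 can be checked against the rate law of Equation 15.3 numerically. The sketch below is only an illustration; the function names and the parameter values for A, B, and C are hypothetical.

```python
import math


def aml_cumulative(t, a, b, c):
    """Equation 15.4: B / (B * C * exp(-A * B * t) + 1)."""
    return b / (b * c * math.exp(-a * b * t) + 1.0)


def aml_rate(t, a, b, c):
    """Right-hand side of Equation 15.3 evaluated on the curve of Equation 15.4."""
    omega = aml_cumulative(t, a, b, c)
    return a * omega * (b - omega)


# Hypothetical parameter values, chosen only for illustration.
A, B, C = 0.002, 100.0, 0.5
T, STEP = 20.0, 1e-4

# The slope of the cumulative curve should agree with A * Omega * (B - Omega).
numeric_rate = (aml_cumulative(T + STEP, A, B, C)
                - aml_cumulative(T - STEP, A, B, C)) / (2 * STEP)
print(aml_rate(T, A, B, C), numeric_rate)
```

The curve starts at B/(BC + 1) at t = 0 and saturates at B, which is what gives the model its S shape.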
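\[
\omega(t) = Bt + K \tag{15.5}
\]
\[
\Omega(t) = \frac{B t^{2}}{2} + K t \tag{15.6}
\]
\[
\omega(t) = B \lambda e^{-\lambda t} \tag{15.7}
\]
where B represents the total number of vulnerabilities in the system and λ is the rate
constant. On integrating Equation 15.7, we get the cumulative number of
vulnerabilities as:
\[
\Omega(t) = B \left( 1 - e^{-\lambda t} \right) \tag{15.8}
\]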
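The two Rescorla forms can be evaluated side by side with a few lines of code. This is a minimal sketch: the exponential cumulative form B(1 − e^{−λt}) is the integral of Equation 15.7 with Ω(0) = 0, and the function names and parameter values are hypothetical.

```python
import math


def rq_cumulative(t, b, k):
    """Rescorla quadratic model, Equation 15.6: B * t^2 / 2 + K * t."""
    return 0.5 * b * t**2 + k * t


def re_cumulative(t, b_total, lam):
    """Rescorla exponential model: integral of B * lam * exp(-lam * t) from 0 to t."""
    return b_total * (1.0 - math.exp(-lam * t))


# Hypothetical parameter values, chosen only for illustration.
print(rq_cumulative(4.0, 0.5, 3.0))
print(re_cumulative(10.0, 100.0, 0.1))
```

The quadratic form grows without bound, whereas the exponential form approaches the total B.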
Kapur et al. (2015) applied two SRGMs (i.e., the Kapur & Garg (1992) model
and the two-stage Erlang logistic model) to vulnerability data sets and compared
their results with the AML model. They claimed that the results are equivalent to
those obtained from the AML model. Shrivastava et al. (2015) applied a stochastic
differential equation to develop a stochastic VDM based on the AML model and found
that the results of their model are better than those of the AML model. The
formulation of the model follows.
\[
\frac{dN(t)}{dt} = b(t) \left[ B - N(t) \right] \tag{15.9}
\]
Now assuming irregular variations in b(t), Equation 15.9 can be extended as the
following SDE:
\[
\frac{dN(t)}{dt} = \{ b(t) + \sigma \gamma(t) \} \{ B - N(t) \} \tag{15.10}
\]
We extend the previous equation to the following SDE of an Itô type:
\[
dN(t) = \left\{ b(t) - \frac{1}{2} \sigma^{2} \right\} \{ B - N(t) \}\, dt + \sigma \{ B - N(t) \}\, dW(t) \tag{15.11}
\]
where W(t) is called a Brownian or Wiener process. After solving Equation 15.11 using
the Itô formula, we get:
\[
\Omega(t) = E[N(t)] = B \left[ 1 - \frac{\dfrac{B - k}{k}\, e^{-\left( Bbt - \frac{1}{2} \sigma^{2} t \right)}}{1 + \dfrac{B - k}{k}\, e^{-Bbt}} \right] \tag{15.13}
\]
Using the previous equation, we can predict the number of vulnerabilities in the
software.
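To illustrate how the noise term enters the expected value, the following sketch makes a simplifying assumption that is not part of the chapter: the rate b(t) is a constant b. Under that assumption B − N(t) in Equation 15.11 is a geometric Brownian motion, so N(t) = B[1 − exp(−bt − σW(t))] and E[N(t)] = B[1 − exp(−bt + σ²t/2)]. The code compares a Monte Carlo average with this expectation; all names and values are hypothetical.

```python
import math
import random


def sample_n(t, b_total, b_rate, sigma, rng):
    """One draw of N(t) = B * (1 - exp(-b*t - sigma*W(t))), with W(t) ~ Normal(0, t)."""
    w = rng.gauss(0.0, math.sqrt(t))
    return b_total * (1.0 - math.exp(-b_rate * t - sigma * w))


def expected_n(t, b_total, b_rate, sigma):
    """Closed-form mean for constant b: B * (1 - exp(-b*t + sigma^2 * t / 2))."""
    return b_total * (1.0 - math.exp(-b_rate * t + 0.5 * sigma**2 * t))


# Hypothetical parameter values; constant b is an assumption of this sketch.
B_TOTAL, B_RATE, SIGMA, T = 100.0, 0.2, 0.1, 5.0
rng = random.Random(42)
samples = [sample_n(T, B_TOTAL, B_RATE, SIGMA, rng) for _ in range(100_000)]
mc_mean = sum(samples) / len(samples)
print(mc_mean, expected_n(T, B_TOTAL, B_RATE, SIGMA))
```

The σ²t/2 term plays the same role here as the ½σ²t term in the exponent of Equation 15.13: the noise lowers the expected number of discovered vulnerabilities relative to the deterministic curve.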
\[
E = \sum_{i=0}^{n} \left( U_i \times P_i \right) \tag{15.14}
\]
Here U_i denotes the number of users working on all systems at the time period i and
P_i is the percentage of the users using the system, so that each product U_i × P_i is
the number of users of the system in period i. Assuming that the vulnerability
detection rate is proportional to the effort and the remaining number of
vulnerabilities, the effort-based VDM is given as follows:
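As a numerical illustration of the equivalent effort in Equation 15.14, the sketch below sums the products U_i × P_i (with P_i expressed as a fraction) and feeds the result into an exponential saturation curve B(1 − e^{−λE}). That curve is an assumption of this sketch: it is what follows when the detection rate per unit effort is proportional to the remaining vulnerabilities. The symbol λ, the function names, and the data are hypothetical.

```python
import math


def equivalent_effort(users, shares):
    """Equation 15.14: sum over periods of U_i * P_i (P_i given as a fraction)."""
    return sum(u * p for u, p in zip(users, shares))


def effort_based_vdm(effort, b_total, lam):
    """Assumed exponential form B * (1 - exp(-lam * E)); lam is a hypothetical rate."""
    return b_total * (1.0 - math.exp(-lam * effort))


# Hypothetical data: total users per period and the share using the system.
users = [1000.0, 1200.0, 1500.0]
shares = [0.10, 0.20, 0.25]

effort = equivalent_effort(users, shares)
print(effort, effort_based_vdm(effort, b_total=50.0, lam=0.001))
```

Using effort rather than calendar time as the independent variable means that periods with few users contribute little to the predicted discoveries.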
Notations   Description
S̄           Actual number of software buyers
S(t)        Cumulative number of potential software users at time t
\[
\frac{d\Omega}{dt} = \frac{d\Omega}{dI} \cdot \frac{dI}{dS} \cdot \frac{dS}{dt} \tag{15.16}
\]
\[
\frac{d\Omega}{dI} = \left[ x + y \cdot \frac{\Omega}{B} \right] \cdot (B - \Omega) \tag{15.17}
\]
where:
(B − Ω) are the remaining vulnerabilities residing in the software
x is the rate with which unique vulnerabilities are detected
y is the rate with which the dependent vulnerabilities are detected through the
support rate of Ω/B
\[
\frac{dI}{dS} = k \tag{15.18}
\]
3. The rate at which people buy the software is given by:
\[
\frac{dS}{dt} = \left[ \alpha + \beta \cdot \frac{S}{\bar{S}} \right] \cdot (\bar{S} - S) \tag{15.19}
\]
where:
(S̄ − S) is the remaining number of users who have yet to buy the software
α and β are the rates at which innovators and imitators buy the software
\[
S(t) = \bar{S} \cdot \frac{1 - \exp(-(\alpha + \beta) \cdot t)}{1 + \dfrac{\beta}{\alpha} \cdot \exp(-(\alpha + \beta) \cdot t)} \tag{15.20}
\]
Now from Equations 15.17 through 15.20, the vulnerability discovery rate becomes:
\[
\frac{d\Omega}{dt} = \left[ x + y \cdot \frac{\Omega}{B} \right] \cdot (B - \Omega) \cdot k \cdot \frac{dS}{dt} \tag{15.21}
\]
\[
\Omega(t) = B \cdot \frac{\left( 1 + h \cdot \exp(-(x + y) \cdot S(t)) \right)^{k} - (1 + h) \cdot \exp(-(x + y) \cdot S(t) \cdot k)}{\left( 1 + h \cdot \exp(-(x + y) \cdot S(t)) \right)^{k}} \tag{15.22}
\]
where h = y/x and A = x + y.
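The user-dependent model chains a Bass-type adoption curve (Equation 15.20) into the discovery curve (Equation 15.22). The sketch below evaluates both, taking h = y/x; that reading is an assumption, adopted because it makes the k = 1 case coincide with the solution of Equation 15.17, B(1 − e^{−AS})/(1 + h e^{−AS}). Function names and parameter values are hypothetical.

```python
import math


def bass_adopters(t, s_bar, alpha, beta):
    """Equation 15.20: cumulative number of users S(t) out of s_bar buyers."""
    e = math.exp(-(alpha + beta) * t)
    return s_bar * (1.0 - e) / (1.0 + (beta / alpha) * e)


def user_dependent_vdm(t, b_total, x, y, k, s_bar, alpha, beta):
    """Equation 15.22 evaluated as written, with h = y / x (assumed)."""
    s = bass_adopters(t, s_bar, alpha, beta)
    h = y / x
    base = 1.0 + h * math.exp(-(x + y) * s)
    return b_total * (base**k - (1.0 + h) * math.exp(-(x + y) * s * k)) / base**k


# Hypothetical parameter values, chosen only for illustration.
S_BAR, ALPHA, BETA = 1000.0, 0.01, 0.3
B_TOTAL, X, Y = 80.0, 0.001, 0.004

for t in (0.0, 5.0, 10.0, 20.0):
    print(t, bass_adopters(t, S_BAR, ALPHA, BETA),
          user_dependent_vdm(t, B_TOTAL, X, Y, 1.0, S_BAR, ALPHA, BETA))
```

Because discovery is driven by S(t) rather than by t directly, slow early adoption delays the discovery curve even when x and y are large.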
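\[
f(t) = \frac{1}{\Gamma(\alpha)\, \beta} \left( \frac{t}{\beta} \right)^{\alpha - 1} e^{-t/\beta}; \quad t \ge 0,\ \alpha, \beta > 0 \tag{15.23}
\]
where α and β denote the shape and scale parameters, respectively; α controls the
shape of the distribution. The cumulative distribution function of the Gamma
distribution used for vulnerability prediction is given by:
\[
F(t; \alpha, \beta) = \int_{0}^{t} f(u; \alpha, \beta)\, du = \frac{\gamma(\alpha, t/\beta)}{\Gamma(\alpha)} \tag{15.24}
\]
where γ(·, ·) is the lower incomplete gamma function. So,
\[
\Omega(t) = B \cdot F(t; \alpha, \beta) \tag{15.25}
\]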
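The Gamma-based prediction can be sketched without external libraries by evaluating the regularized lower incomplete gamma function with its standard power series. The sketch treats β as the scale parameter, as stated for Equation 15.23, so the CDF is evaluated at t/β. The function names and parameter values are hypothetical, and the series is only suitable for moderate arguments.

```python
import math


def regularized_lower_gamma(a, x, max_terms=500):
    """P(a, x) = gamma(a, x) / Gamma(a), using the power series
    gamma(a, x) = exp(-x) * x**a * sum_n x**n / (a * (a+1) * ... * (a+n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_vdm(t, b_total, alpha, beta):
    """Omega(t) = B * F(t; alpha, beta), with beta as the scale parameter."""
    return b_total * regularized_lower_gamma(alpha, t / beta)


# Hypothetical parameter values, chosen only for illustration.
for t in (1.0, 3.0, 6.0, 12.0):
    print(t, gamma_vdm(t, b_total=50.0, alpha=2.0, beta=2.0))
```

With α = 1 the curve reduces to the exponential form B(1 − e^{−t/β}); larger α delays the peak of the discovery rate.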
\[
\frac{d\Omega}{dt} = \frac{d\Omega}{dC} \cdot \frac{dC}{dI} \cdot \frac{dI}{dX} \cdot \frac{dX}{dt} \tag{15.26}
\]
where C, I, and X are, respectively, the operational coverage, executed instructions,
and operational effort. The four components on the right-hand side are defined as:
1. Component 1: Here it was assumed that the vulnerability discovery rate is
directly proportional to the operational coverage rate of the remaining
vulnerabilities and inversely proportional to the uncovered proportion of the
software, and is given by:
\[
\frac{d\Omega}{dC} = A_1 \cdot \frac{c'}{p - c} \cdot (B - \Omega) \tag{15.27}
\]
2. Component 2: The rate of operational coverage per executed instruction is
assumed to be constant and given by:
\[
\frac{dC}{dI} = \phi_1 \tag{15.28}
\]
3. Component 3: The rate at which instructions are executed per operational
effort is assumed to be constant and given by:
\[
\frac{dI}{dX} = \phi_2 \tag{15.29}
\]
4. Component 4: The rate of operational effort is directly proportional to the
remaining resources, where vulnerability discoverers and time are the resources
that are considered as operational effort spent on vulnerability discovery,
and it is given by:
\[
\frac{dX}{dt} = \beta(t) \cdot (\alpha - X(t)) \tag{15.30}
\]
where β(t) is the time-dependent rate at which operational resources are
consumed and α is the total amount of effort required for vulnerability discovery.
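\[
\frac{d\Omega}{dt} = A_1 \cdot \left( \frac{c'}{p - c} \right) \cdot (B - \Omega) \cdot \phi_1 \cdot (\phi_2) \cdot \frac{dX}{dt} \tag{15.31}
\]
\[
\Omega(X(t)) = B \cdot \left[ 1 - \left( 1 - \frac{c(X(t))}{p} \right)^{A_1 \cdot \phi_1 \cdot \phi_2} \right] \tag{15.32}
\]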
In the previously described model, the authors considered various effort functions
X(t) to obtain the final model for vulnerability prediction; they used the Weibull
and the logistic effort functions. They also considered various operational coverage
functions. For example, if operational effort is assumed to follow the
Weibull distribution, then the mean value function (MVF), or VDM, becomes:
\[
\Omega(t) = B \cdot \left[ 1 - \left( e^{-A_2 \cdot \left( \alpha \cdot \left( 1 - e^{-\beta \cdot t^{k}} \right) \right)^{h}} \right)^{A_1 \cdot \phi_1 \cdot \phi_2} \right] \tag{15.33}
\]
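The composition behind Equations 15.32 and 15.33 can be sketched in code under stated assumptions. The Weibull effort function X(t) = α(1 − e^{−βt^k}) solves Equation 15.30 for β(t) = βk t^{k−1}; the coverage function c(X) = p(1 − e^{−A₂X^h}) is a hypothetical choice made here so that 1 − c/p is an exponential, and it is not confirmed by the text. All names and values are illustrative.

```python
import math


def weibull_effort(t, alpha, beta, k):
    """Assumed Weibull effort function X(t) = alpha * (1 - exp(-beta * t**k))."""
    return alpha * (1.0 - math.exp(-beta * t**k))


def coverage(x, p, a2, h):
    """Hypothetical coverage function c(X) = p * (1 - exp(-a2 * X**h))."""
    return p * (1.0 - math.exp(-a2 * x**h))


def coverage_vdm(t, b_total, a1, phi1, phi2, p, a2, h, alpha, beta, k):
    """Equation 15.32 with the effort and coverage functions assumed above."""
    x = weibull_effort(t, alpha, beta, k)
    c = coverage(x, p, a2, h)
    return b_total * (1.0 - (1.0 - c / p) ** (a1 * phi1 * phi2))


# Hypothetical parameter values, chosen only for illustration.
PARAMS = dict(b_total=120.0, a1=0.8, phi1=1.2, phi2=0.9, p=1.0,
              a2=0.05, h=1.0, alpha=50.0, beta=0.02, k=1.5)

for t in (0.0, 2.0, 5.0, 10.0):
    print(t, coverage_vdm(t, **PARAMS))
```

Swapping in a logistic effort function or a different coverage function only requires replacing the two helper functions; Equation 15.32 itself is unchanged.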
Notation Description
ρ ( r ) Expected number of patches released with respect to patching resources r
A Vulnerability patching rate
r Patching resources
t Patching time or patch release time
v Vulnerabilities reported/discovered
d Vulnerabilities disclosed
B Actual potential number of patches released
C Integration constant
∆, δ Intermediate variables
\[
\frac{d\rho}{dt} = A \cdot (B - \rho) + C \cdot (1 - \sigma) \cdot \frac{\rho}{B} \cdot (B - \rho) - \sigma \cdot \rho \tag{15.34}
\]
where A represents the proportion of patches that are released or installed
successfully without disruption, while C represents the proportion of patches that are
released because of the reports submitted to vendors about vulnerabilities.
Under the initial condition ρ(t = 0) = 0, solving the above equation gives:
\[
\rho = \frac{B \cdot \left( 1 - e^{-(A + C) \cdot t} \right)}{1 + \dfrac{C}{A} \cdot e^{-(A + C) \cdot t}} \tag{15.35}
\]
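The patch-release curve of Equation 15.35 can be evaluated and sanity-checked numerically. In the sketch below, the finite-difference check uses the rate (A + C·ρ/B)(B − ρ), which is the rate equation that Equation 15.35 satisfies exactly; treating it as the relevant special case of Equation 15.34 is an assumption of this sketch. Function names and parameter values are hypothetical.

```python
import math


def patches_released(t, b_total, a, c):
    """Equation 15.35: B * (1 - e^{-(A+C)t}) / (1 + (C/A) * e^{-(A+C)t})."""
    e = math.exp(-(a + c) * t)
    return b_total * (1.0 - e) / (1.0 + (c / a) * e)


def patch_rate(rho, b_total, a, c):
    """Assumed rate law (A + C * rho / B) * (B - rho) satisfied by Equation 15.35."""
    return (a + c * rho / b_total) * (b_total - rho)


# Hypothetical parameter values, chosen only for illustration.
B_TOTAL, A, C = 60.0, 0.05, 0.4
T, STEP = 5.0, 1e-4

numeric_rate = (patches_released(T + STEP, B_TOTAL, A, C)
                - patches_released(T - STEP, B_TOTAL, A, C)) / (2 * STEP)
print(numeric_rate, patch_rate(patches_released(T, B_TOTAL, A, C), B_TOTAL, A, C))
```

The ratio C/A controls how pronounced the S shape is: a larger share of report-driven patches delays the inflection point.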
\[
r \cong v^{\alpha} \cdot t^{1 - \alpha}, \quad 0 \le \alpha \le 1 \tag{15.36}
\]
where r refers to the patching resources, v refers to the quantifiable vulnerabilities,
t refers to the patching time, and α is the degree of impact on the vulnerability
patching process. The model development is similar to that of the one-dimensional
model in Section 15.2.9.1; the only change is to replace t with r in Equation 15.35
to obtain the final equation as:
\[
\rho(r) = \frac{B \cdot \left( 1 - e^{-(A + C) \cdot r} \right)}{1 + \dfrac{C}{A} \cdot e^{-(A + C) \cdot r}} \tag{15.37}
\]
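The substitution of resources for time can be illustrated as follows. The sketch computes r from Equation 15.36 and then evaluates the one-dimensional patching curve of Equation 15.35 with r in place of t, which is how the text describes the two-dimensional model. Function names and parameter values are hypothetical.

```python
import math


def patching_resources(v, t, alpha):
    """Equation 15.36: r = v**alpha * t**(1 - alpha), with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return v**alpha * t ** (1.0 - alpha)


def patches_vs_resources(r, b_total, a, c):
    """Equation 15.35 with t replaced by r, as described in the text."""
    e = math.exp(-(a + c) * r)
    return b_total * (1.0 - e) / (1.0 + (c / a) * e)


# Hypothetical parameter values, chosen only for illustration.
r = patching_resources(v=4.0, t=9.0, alpha=0.5)
print(r, patches_vs_resources(r, b_total=60.0, a=0.05, c=0.4))
```

With α = 0 the resource measure collapses to time alone and the one-dimensional model is recovered; with α = 1 only the number of vulnerabilities matters.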
\[
\frac{d\Omega(t)}{dt} = \frac{v(t)}{1 - V(t)} \left[ B - \Omega(t) \right] \tag{15.38}
\]
\[
\Omega(t) = B \cdot V(t) \tag{15.39}
\]
After accounting for the number of vulnerabilities discovered, the next step taken by
developers is to develop patches. Hence, we have considered the vulnerability
patching time in our model under the vulnerability discovery process. The intensity
with which discovered vulnerabilities are patched can be calculated as:
\[
\frac{d\rho(t)}{dt} = \frac{[v * p](t)}{1 - [V \otimes P](t)} \left[ B - \rho(t) \right] \tag{15.40}
\]
where
\[
\frac{[v * p](t)}{1 - [V \otimes P](t)} \tag{15.41}
\]
is the vulnerability patching rate, wherein d[V ⊗ P](t)/dt = [v ∗ p](t).
The symbol [v ∗ p](t) denotes the convolution of v and p, and [V ⊗ P](t) denotes the
Stieltjes convolution of V and P. Solving Equation 15.40 under the initial condition
ρ(t = 0) = 0, we get:
\[
\rho(t) = B \cdot (V \otimes P)(t) \tag{15.42}
\]
\[
V(t) = 1 - \exp(-A \cdot t) \tag{15.43}
\]
\[
\Omega(t) = B \cdot (1 - \exp(-A \cdot t))
\]
\[
P(t) = \frac{1 - \exp(-A \cdot t)}{1 + C \cdot \exp(-A \cdot t)} \tag{15.44}
\]
where A represents the vulnerability patching rate with learning and C represents the
shape parameter.
To obtain the simple mathematical form for the proposed model, we have assumed
that the discovery rate A as in Equation 15.43 is same as the patching rate with learn-
ing as in Equation 15.44. In other words, we have considered that the discovery rate
and patching rate are the same.
\[
V(t) \otimes P(t) = \int P(t - x)\, dV(x) \tag{15.45}
\]
Equation 15.45 shows the time delay between vulnerability discovery and the
patching process, wherein the vulnerability discovery time is denoted as x and the
vulnerability patching time is denoted as t − x. The model also shows that the
numbers of vulnerabilities discovered and patched need not always be the same.
However, as time tends to infinity, the numbers may become similar.
Thus, from Equations 15.43 and 15.44, Equation 15.45 can be rewritten as:
\[
V(t) \otimes P(t) = \int_{0}^{t} \frac{1 - \exp(-A \cdot (t - x))}{1 + C \cdot \exp(-A \cdot (t - x))} \cdot \left( A \cdot \exp(-A \cdot x) \right) dx \tag{15.46}
\]
On solving Equation 15.46, we get:
\[
\rho(t) = B \cdot \left[ 1 - \exp(-A \cdot t) + (1 + C) \cdot \exp(-A \cdot t) \cdot \ln\left( \frac{(1 + C) \cdot \exp(-A \cdot t)}{1 + C \cdot \exp(-A \cdot t)} \right) \right] \tag{15.47}
\]
Equation 15.47 is used further for predicting the number of vulnerabilities discov-
ered and patched.
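The closed form of Equation 15.47 can be cross-checked against a direct numerical evaluation of the convolution integral in Equation 15.46. The sketch below uses composite Simpson quadrature for the integral; the function names, the number of subintervals, and the parameter values are hypothetical choices made for illustration.

```python
import math


def patched_closed_form(t, b_total, a, c):
    """Equation 15.47: expected number of vulnerabilities discovered and patched."""
    e = math.exp(-a * t)
    return b_total * (1.0 - e + (1.0 + c) * e * math.log((1.0 + c) * e / (1.0 + c * e)))


def patched_by_quadrature(t, b_total, a, c, n=2000):
    """B times the integral in Equation 15.46, via composite Simpson (n even)."""
    def integrand(x):
        e = math.exp(-a * (t - x))
        return (1.0 - e) / (1.0 + c * e) * a * math.exp(-a * x)

    step = t / n
    total = integrand(0.0) + integrand(t)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return b_total * total * step / 3.0


# Hypothetical parameter values, chosen only for illustration.
B_TOTAL, A, C = 100.0, 0.3, 2.0
for t in (1.0, 5.0, 10.0):
    discovered = B_TOTAL * (1.0 - math.exp(-A * t))  # Omega(t) from Equation 15.43
    print(t, discovered, patched_closed_form(t, B_TOTAL, A, C),
          patched_by_quadrature(t, B_TOTAL, A, C))
```

The logarithm in Equation 15.47 is never positive, so the patched count stays at or below the discovered count B(1 − e^{−At}) and the gap between the two curves reflects the patching delay.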
\[
\Omega(t) = \sum_{i=1}^{n} \alpha_i \frac{B_i'}{B_i' C_i' e^{-A_i' B_i' (t - \varepsilon_i)} + 1} \tag{15.49}
\]
Following the assumptions of Kim et al. (2007), Anand et al. (2017) developed a
framework for predicting the number of vulnerabilities in multi-versions of software
and proposed a similar model and showed that the results are equivalent to those
obtained from the model proposed by Kim et al. (2007).
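The multi-version sum of Equation 15.49 is a weighted superposition of time-shifted logistic curves, which the following sketch evaluates directly. The tuple layout (α_i, A_i′, B_i′, C_i′, ε_i), the function names, and the parameter values are hypothetical; the formula is evaluated as written, so a version also contributes a small amount before its shift ε_i.

```python
import math


def shifted_logistic(t, a, b, c, eps):
    """One term of Equation 15.49: B' / (B' * C' * exp(-A' * B' * (t - eps)) + 1)."""
    return b / (b * c * math.exp(-a * b * (t - eps)) + 1.0)


def multi_version_vdm(t, versions):
    """Equation 15.49: sum of alpha_i times the shifted logistic of each version.

    versions is an iterable of (alpha_i, a_i, b_i, c_i, eps_i) tuples.
    """
    return sum(alpha * shifted_logistic(t, a, b, c, eps)
               for alpha, a, b, c, eps in versions)


# Hypothetical parameters for two versions; the second is released at t = 12.
VERSIONS = [
    (1.0, 0.002, 100.0, 0.5, 0.0),
    (0.6, 0.003, 80.0, 0.4, 12.0),
]

for t in (0.0, 12.0, 24.0, 48.0):
    print(t, multi_version_vdm(t, VERSIONS))
```

The weights α_i scale each version's contribution, for example to reflect the fraction of code it shares with the version under study.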
where F1(S1(t)) represents the user-dependent vulnerability discovery function.
To predict the number of vulnerabilities in the next version, Narang et al. (2018)
considered that the vulnerabilities of the previous version that were removed in the
current version should be counted in the newer version. The mathematical form for
the vulnerabilities of the next version is given by:
where B1(1 − F1(S1(t1))) are the leftover vulnerabilities of the first version, and
F1(S1(t)) and F2(S2(t)) are the vulnerability discovery rates of the older and newer
versions.
REFERENCES
Alhazmi, O. (2007). Assessing vulnerabilities in software systems: A quantitative approach.
Thesis, Colorado State University.
Alhazmi, O.H., & Malaiya, Y.K. (2005a). Modeling the vulnerability discovery pro-
cess. In 16th IEEE International Symposium on Software Reliability Engineering
(ISSRE’05) (pp. 10–pp). IEEE.
Alhazmi, O.H., & Malaiya, Y.K. (2005b). Quantitative vulnerability assessment of systems
software. IEEE, pp. 615–620.
Alhazmi, O.H., & Malaiya, Y.K. (2008). Application of vulnerability discovery models to
major operating systems. IEEE Transactions on Reliability, 57(1), 14–22.
Anand, A., Das, S., Aggrawal, D., & Klochkov, Y. (2017). Vulnerability discovery modelling
for software with multi-versions. In Advances in Reliability and System Engineering
(pp. 255–265). Cham, Switzerland: Springer International Publishing.
Anderson, R. (2002). Security in open versus closed systems: The dance of Boltzmann, Coase
and Moore. Technical report, Cambridge University.
Bass, F.M. (1969), A new-product growth model for consumer durables. Management Science,
15, 215–227.
Joh, H., & Malaiya, Y.K. (2014). Modeling skewness in vulnerability discovery.
Quality and Reliability Engineering International, 30(8), 1445–1459.
Kansal, Y., & Kapur P.K. (2019). Two-dimensional vulnerability patching model. In: Kapur,
P., Klochkov, Y., Verma, A., Singh, G. (Eds.), System Performance and Management
Analytics: Asset Analytics (Performance and Safety Management) (pp. 321–331).
Singapore: Springer.
Kansal, Y., Kapur, P.K., & Kumar, U. (2018). Coverage based vulnerability discovery model-
ing to optimize disclosure time using multi-attribute approach. Quality and Reliability
Engineering International, 35(1), 62–73. doi:10.1002/qre.2380.
Kansal, Y., Kapur, P.K., Kumar, U., & Kumar, D. (2017a). User-dependent vulnerability
discovery model and its interdisciplinary nature. International Journal of Life Cycle
Reliability and Safety Engineering, 6(1), 23–29.
Kansal, Y., Kapur, P.K., Kumar, U., & Kumar, D. (2017b). Effort and coverage dependent
vulnerability discovery modeling In: IEEE Xplore, International Conference on
Telecommunication and Networking (TELNET), Noida.
Kansal, Y., Kumar, D., & Kapur, P.K. (2016a). Assessing optimal patch release time for vul-
nerable software systems. In IEEE Xplore, International Conference on Innovation
and Challenges in Cyber Security (ICICCS-INBUSH), Noida, pp. 308–314.
Kansal, Y., Kumar, D., & Kapur, P.K. (2016b). Vulnerability patch modeling. International
Journal of Reliability, Quality and Safety Engineering, 23(6), 1640013.
Kapur, P.K., & Garg, R.B. (1992). A software reliability growth model for an error-removal
phenomenon. Software Engineering Journal, 7(4), 291–294.
Kapur, P.K., Pham, H., Gupta, A., & Jha, P.C. (2011). Software Reliability Assessment with
OR Applications. London, UK: Springer.
Kapur, P.K., Yadavalli, V.S.S., & Shrivastava, A.K. (2015). A comparative study of vulnerabil-
ity discovery modeling and software reliability growth modeling. In The IEEE Xplore
Proceedings of International Conference on Futuristic Trends in Computational
Analysis and Knowledge Management, Amity University, Greater Noida, February
25–27, pp. 246–251.
Kim, J., Malaiya, Y.K., & Ray, I. (2007). Vulnerability discovery in multi-version software
systems. In 10th IEEE High Assurance Systems Engineering Symposium. HASE’07,
pp. 141–148.
Kimura, M. (2006). Software vulnerability: Definition, modelling, and practical evaluation
for e-mail transfer software. International Journal of Pressure Vessels and Piping,
83(4), 256–261.
Massacci, F., & Nguyen, V.H. (2014). An empirical methodology to evaluate vulnerability
discovery models. IEEE Transactions on Software Engineering, 40(12), 1147–1162.
Movahedi, Y., Cukier, M., Andongabo, A., & Gashi, I. (2018). Cluster-based vulnerability
assessment of operating systems and web browsers. Computing, 1–22. doi:10.1007/
s00607-018-0663-0.
Narang, S., Kapur, P.K., Damodaran, D., & Shrivastava, A.K. (2017). User-based
multi-upgradation vulnerability discovery model. In 6th International Conference on
Reliability, Infocom Technologies and Optimization (ICRITO 2017) (Trends and Future
Directions), September 20–22, 2017, Amity University Uttar Pradesh.
Narang, S., Kapur, P.K., Damodaran, D., & Shrivastava, A.K. (2018). Bi-criterion problem to
determine optimal vulnerability discovery and patching time. International Journal of
Quality Reliability and Safety Engineering, 25(1), 1850002.
Okamura, H., Tokuzane, M., & Dohi, T. (2013). Quantitative security evaluation for software
system from vulnerability database. International Journal of Software Engineering &
Applications, 6(3), 15.
Ozment, A., & Schechter, S.E. (2006). Milk or wine: Does software security improve with
age? Proceedings of the 15th Conference on Usenix Security Symposium, Berkeley,
CA.
Ozment, J.A. (2007). Vulnerability discovery & software security. PhD thesis, University of
Cambridge.
Rescorla, E. (2003). Security holes. Who cares? In Proceedings of the 12th Conference on
USENIX Security Symposium, pp. 75–90.
Rescorla, E. (2005). Is finding security holes a good idea? IEEE Security & Privacy, 3(1),
14–19.
Schneier, B. (2000). Full disclosure and the window of vulnerability, Crypto-Gram
(September 15, 2000). www.counterpane.com/cryptogram-0009.html#1.
Schultz, E.E., Brown, D.S., & Longstaff, T.A. (1990). Responding to Computer Security
Incidents, Lawrence Livermore National Laboratory, 165. http://ftp.cert.dfn.de/pub/
docs/csir/ ihg.ps.gz, July 23.
Sharma, R., Sibbal, R., & Shrivastava, A.K. (2016). Vulnerability discovery modeling for open
and closed source software. International Journal of Secure Software Engineering,
7(4), 19–38.
Shrivastava, A.K., Sharma, R., & Kapur, P.K. (2015). Vulnerability discovery model
for a software system using stochastic differential equation. In The IEEE Xplore
Proceedings of International Conference on Futuristic Trends in Computational
Analysis and Knowledge Management, Amity University, Greater Noida, February
25–27, pp. 199–205.
Wai, F.K., Yong, L.W., Divakaran, D.M. & Thing, V.L.L. (2018). Predicting vulnerability dis-
covery rate using past versions of a software. In 2018 IEEE International Conference
on Service Operations and Logistics, and Informatics (SOLI), Singapore, pp. 220–225.
Woo, S., Alhazmi, O., & Malaiya, Y. (2006). Assessing vulnerabilities in apache and IIS
HTTP servers. In 2006 2nd IEEE International Symposium on Dependable, Autonomic
and Secure Computing IEEE, pp. 103–110.
Younis, A., Joh, H., & Malaiya, Y. (2011). Modeling learningless vulnerability discovery
using a folded distribution. Proceedings of SAM, 11, 617–623.
Younis, A., Malaiya, Y.K., & Ray, I. (2016). Assessing vulnerability exploitability risk using
software properties. Software Quality Journal, 24, 159–202.
16 Signature Reliability Evaluations
An Overview of Different Systems
Akshay Kumar, Mangey Ram, and S. B. Singh
CONTENTS
16.1 Introduction
16.2 Algorithms Used in Signature Reliability
16.2.1 Algorithm for Computing the Signature Using Reliability Function
16.2.2 The Algorithm to Assess the Expected Lifetime of the System by Using Minimum Signature
16.2.3 Algorithm for Obtaining the Barlow-Proschan Index for the System
16.2.4 Algorithm to Determine the Expected Value of the System
16.2.5 Algorithm for Obtaining the Reliability of the Sliding Window System
16.3 Illustrations
16.4 Conclusion
References
16.1 INTRODUCTION
In recent years, substantial efforts have been made in the development of reliability
theory, including signature and fuzzy reliability theories, and their applications to
various real-life problems. Barlow and Proschan (1975) discussed an importance
measure for the elements of a coherent system and expressed its fundamental
characteristics in the fault tree. This importance measure is a useful tool for
evaluating the minimum cut sets, system reliability, and minimum cost of the fault
tree system using the Monte Carlo method and life distributions. They also discussed a
method for computing the importance in terms of the hazard rate for series-parallel
and complex systems. Owen (1975) discussed multilinear extensions of compound
games and evaluated the Banzhaf value by differentiating the multilinear extension
at the midpoint of the unit cube. The proposed algorithm was applied to the
presidential election game and the Electoral College.
Samaniego (1985) studied the failure rate of an arbitrary coherent system whose
elements have independent and identically distributed (i.i.d.) lifetimes with a
common continuous distribution F. Various examples of coherent systems were given,
including a closure theorem for k-out-of-n:F systems with i.i.d. elements, and
various characteristics of the s-coherent system were obtained. Owen (1988) presented
the theory of multilinear extensions of games and discussed its properties in
real-life situations, building on the Shapley value. This study showed that game
theory is a very useful tool for solving many real-life problems. Shapley introduced
his value for n-person games in 1953, with which players can evaluate a game on
their utility scales. Boland et al. (1990) considered a consecutive k-out-of-n:F
system, which consists of n ordered elements and fails if at least k consecutive
elements fail. They presented several examples of consecutive k-out-of-n:F systems in
oil pipelines, telecommunications, and circuitry. They also computed the reliability
of consecutive k-out-of-n:F systems with mutually independent elements, developed a
system with positive dependence between adjoining elements, and showed that the
reliability of the system was lower for k ≥ (n + 1)/2. Yu et al. (1994) investigated
multi-state coherent systems (MSCS) under the assumption that the states of the
system and of its elements form totally ordered sets. They discussed a new MSCS, the
generalized multi-state coherent system, analyzed some properties of the generalized
model, and defined a new approach for computing the signature of an MSCS.
Ushakov (1986, 1994) discussed the key role that reliability engineering plays in
real life. He reviewed system reliability, applied it to engineering systems, and
introduced basic techniques of probabilistic and statistical reliability, together
with various applications of reliability theory to real-life systems. Kochar et al.
(1999) discussed different techniques for comparing coherent systems with i.i.d.
element lifetimes. All comparisons rely on the representation of the system lifetime
distribution as a function of the system's signature, whose ith entry is the
probability that the system fails upon the ith element failure. They compared
systems through their signatures using stochastic ordering, hazard rate ordering, and
likelihood ratio ordering. Levitin (2001) considered a redundancy optimization
problem for a multi-state system whose elements need a fixed amount of resources for
their work, with the resources generated by a subsystem.
The suggested algorithm evaluated the optimal system structure and system
availability. The system productivity, availability, and cost were evaluated from
the performance of each element. The main goal of the study was to minimize the
investment cost while meeting the total demand, with the demand curve described
through system probabilities. A genetic algorithm was used to solve the universal
generating function (UGF) based problem and to compute the system availability and
the optimal structure when the working elements of the subsystem have a maximum
performance rate under a given demand distribution. Boland (2001) studied the
characteristics of signatures of coherent systems with i.i.d. element lifetimes. He
concluded that the signature is a widely useful tool for comparing the properties of
different systems and discussed the characteristics of simple and indirect majority
systems. Based on the signature, the system lifetime was described through the
probabilities that it equals the ith order statistic of the element lifetimes, and
the signature was computed from the path sets and ordered cut sets of the system.
Levitin (2002) proposed a new system, the linear multi-state sliding window system,
which generalizes the consecutive k-out-of-r-from-n:F system to the multi-state case.
The considered system consists of n linearly ordered multi-state elements. Each
element can have two states: total failure or completely working. The system fails
if the sum of the performance rates of any r consecutive elements is lower than the
total allocated weight. The author evaluated various characteristics of the linear
multi-state sliding window system and suggested an algorithm to find the order of
elements that maximizes system reliability. A genetic algorithm was used for the
optimization, with a UGF technique for the reliability computation. Levitin (2003a)
introduced a two-state linear multi-state sliding window system consisting of n
linearly ordered multi-state elements, whose performance rate is measured against a
given performance weight. The author presented an approach for calculating the
reliability of the sliding window system (SWS) subject to common supply failures
(CSFs) in common supply groups (CSGs). He also described a method for finding the
optimal distribution of elements among the CSGs with respect to system reliability.
The study computed the optimization results with the help of the UGF technique and a
genetic algorithm. Levitin (2003b) proposed a multi-state system that generalizes the
consecutive k-out-of-r-from-n:F system. The considered linear multi-state SWS
consists of n ordered multi-state elements, and every element can have two states.
In this study, he evaluated the system reliability, mean time to failure (MTTF), and
cost of the considered system using the extended universal moment generating
function. Boland and Samaniego (2004) described a characteristic of a system called
its “signature.” They related a system's signature to other well-known reliability
concepts and found that the signature is useful for comparing different systems.
They provided different stochastic comparisons between systems and signature-based
comparisons of coherent systems. They investigated the signatures of different
systems with i.i.d. element lifetimes and evaluated the expected lifetime and
expected cost using the system reliability function and order statistics. Belzunce
and Shaked (2004) reviewed and studied the properties of failure profiles of
coherent systems.
In this study, the authors presented system reliability based on path sets and cut
sets and discussed the relationship between the elements and the properties of
failure profiles. They derived an expression for the density function of the
lifetime distribution of a coherent system with independent elements. They also
compared the lifetimes of two systems in the likelihood ratio order using failure
profiles and obtained likelihood ratio bounds on the lifetimes of coherent systems
with independent but not necessarily identically distributed element lifetimes.
Navarro and Rychlik (2007) studied the reliability functions and the MTTF of
coherent systems with exchangeable elements, whose lifetime distribution depends on
the signature. They considered exchangeable elements with an absolutely continuous
joint distribution and represented the system lifetime distribution as a mixture of
the distributions of order statistics, with weights equal to the signature of the
coherent system. They assessed expectation bounds for exchangeable exponential
elements and expressed reliability bounds in terms of the parent marginal
reliability function for all coherent systems with three and four exchangeable
elements with exponential lifetimes. The results were discussed with the help of
the Samaniego signature, and system characteristics were defined for coherent
systems. The multivariate Pareto distribution was used to evaluate the results for
systems with exchangeable elements. Eryilmaz (2010) expressed the reliability
functions of consecutive systems with exchangeable element lifetimes as mixtures of
the reliability functions of order statistics. He also showed that reliability and
stochastic ordering results for consecutive k-systems can be obtained from these
mixture representations. Consecutive k-systems are applied in oil pipelines, vacuum
systems in accelerators, telecom networks, and spacecraft relay stations. Navarro
and Rychlik (2010) discussed the expected lifetimes of coherent and mixed systems
with independently distributed element lifetimes and compared bounds on them. They
obtained sharper inequalities that depend on a concentration measure connected to
the Gini dispersion index in the i.i.d. case. Bounds on the expected lifetimes of
series systems of small size were derived in terms of the expected lifetime of one
unit in the i.i.d. case. Da Costa Bueno (2011) determined the importance of an
element of a coherent system under its signature representation and described the
properties of the dynamic system signature, the Barlow-Proschan importance, and the
element importance under compensator transforms, in the case of deterministic
compensators with i.i.d. element lifetimes.
Eryilmaz et al. (2011) discussed the reliability properties of
m-consecutive-k-out-of-n:F systems with exchangeable elements and derived exact
recurrence relations for the signature of the system. They used order statistics and
the lifetime distribution to describe system reliability metrics. They also computed
the minimal and maximal signatures of the system with i.i.d. elements and the MTTF,
and obtained stochastic ordering results for the m-consecutive-k-out-of-n:F system.
Lisnianski and Frenkel (2011) studied multi-state system (MSS) reliability
evaluation on the basis of signature, optimization, and statistical inference. They
discussed the advanced role of the signature in dynamic reliability and in
non-parametric inference for lifetime distributions. The authors described the role
of coherent systems in various engineering problems and dynamic reliability based on
the signature. They also presented various methods for the signature-based
representation of a coherent system using order statistics, Markov processes, and
multiple-valued logic, and computed MSS reliability, expected lifetime, and cost.
Mahmoudi and Asadi (2011) evaluated the properties of the dynamic signature of a
coherent system. They reviewed the concept of the signature and its advantages for
stochastic comparisons of coherent systems. They considered a coherent system,
described its various characteristics and measures in real-life situations,
evaluated its reliability based on partial information, and obtained the failure
probability of the system lifetime. Triantafyllou and Koutras (2011) proposed a
2-within consecutive k-out-of-n:F system consisting of exchangeable elements. Based
on the signature of the system, they gave stochastic comparisons between the
reliability function and the element lifetimes and presented several stochastic
orderings for the 2-within consecutive k-out-of-n:F system. They also discussed the
preservation of the increasing failure rate (IFR) property by the proposed system.
A 2-within consecutive k-out-of-n:F system is used in telecommunication, oil
pipeline, and vacuum systems in accelerators. Balakrishnan et al. (2012) presented a
review of the existing theory of signatures and its use in the study of dynamic
reliability, systems with i.i.d. elements, and non-parametric inference for element
lifetime distributions. They introduced various properties of the signature of a
coherent system. The authors discussed various methods and algorithms for obtaining
the system reliability, expected lifetime, Barlow-Proschan index, and expected cost
rate using order statistics and the reliability functions of coherent systems.
Eryilmaz (2012) investigated the number of elements that have failed at the time of
system failure. The author discussed coherent systems such as the linear consecutive
k-within-m-out-of-n:F and m-consecutive-k-out-of-n:F systems and obtained their
expected lifetime, expected number of failed elements E(X), and system reliability
using lifetime distributions and order statistics. Da Costa Bueno (2013) introduced
the multi-state monotone system using decomposition methods and evaluated the
signature of a coherent system in the classical case through exchangeability
properties. The system reliability function was obtained for monotone systems with
i.i.d. elements through the Samaniego signature. The work also studied the
signatures of binary systems and MSS with the help of the proposed theorem. Marichal
and Mathonet (2013) showed that the Samaniego signature of a coherent system with
i.i.d. element lifetimes can be computed from the structure function through
Boland's formula.
They expressed the signature of the coherent system, as well as the tail signature
and the Barlow-Proschan index, through derivatives of the reliability function. For
computing the signature of a coherent system from its structure function, they used
Owen's method. Various engineering problems from real-life situations were
discussed, and various methods and algorithms for determining the system signature
were provided. Da et al. (2014) studied the signature of a k-out-of-n coherent
system consisting of n elements. They computed the minimal signature and the
signature of binary coherent systems and of their combinations of elements. The
authors gave several numerical examples illustrating the characteristics of a
coherent system with i.i.d. elements based on the minimal path sets, along with
applications in engineering fields. They also obtained the signature from order
statistics and the suggested algorithms. Eryilmaz (2014) discussed the signature as
an effective tool not only for the investigation of binary coherent systems but also
for applications in network systems. For evaluating the signatures of series and
parallel systems consisting of modules, he derived a simple method based on the
signatures and minimal signatures of the modules with the help of system structure
functions. A simple statistical approach was given for comparing system signatures,
which depended on the coherent system and on the computation for the series and
parallel system modules. Eryilmaz (2015) defined mixture representations for 3-state
systems and studied the reliability modeling of 3-state systems consisting of
3-state s-independent elements.
The system and its elements can have three states: perfect functioning, partial
performance, and complete failure. The study showed that the survival functions of
the system are associated with different subsets of states. A Markov process was
used for analyzing the signature of the 3-state consecutive-k-out-of-n:G system
consisting of s-independent elements. Lindqvist and Samaniego (2015) noted that the
signature of a coherent system is a very useful tool in the study of systems with
i.i.d. element lifetimes. The signature of a coherent system with n elements is a
vector whose kth entry is the probability that the kth element failure causes the
system failure. They evaluated the dynamic signature of binary and complex systems
with minimal repair, called the system conditional dynamic signature, with the help
of stochastic methods and minimal path sets. Eryilmaz and Tuncel (2015) studied a
generalized k-out-of-n system consisting of n ordered elements (linear and
circular). They discussed the signature with the help of simulation for systems with
various numbers of elements. After obtaining expressions based on the signature for
the structure function, MTTF, and mean number of failed elements, they provided
various applications in engineering fields. Franko and Tütüncü (2016) computed the
reliability of the weighted k-out-of-n:G system with repairable i.i.d. elements
based on the signature. They studied the reliability and some reliability indices of
the repairable weighted k-out-of-n:G system and addressed several uncertainties via
the signature. The proposed system is widely used in engineering, for example in
solar fields and military systems. They computed the signature of the considered
system, which depends on the weights of the elements, using stochastic methods and
path sets, and calculated the Birnbaum and Barlow-Proschan element importance
measures through the suggested algorithms. Chahkandi et al. (2016) discussed a
repairable coherent system and extended Samaniego's signature notion for i.i.d.
element lifetimes to this setting. The failures of the elements were modeled by
Poisson processes with the same intensity function, in the same way that Samaniego's
signature assumes i.i.d. random variables. The authors supposed that the reliability
function of a coherent system is a mixture whose probabilities depend on the number
of repairs of the elements. They determined the reliability function of the series
system using a stochastic order statistic algorithm. Samaniego and Navarro (2016)
studied coherent systems and their properties for comparing systems with
heterogeneous elements. They used various methods for comparing coherent systems
with both independent and dependent elements. In the independent case, the survival
signature of Coolen and Coolen-Maturi was used for the computation. Kumar and Singh
(2017a, 2017b, 2017c) evaluated the signature, expected cost, MTTF, and
Barlow-Proschan index of various engineering systems with the help of reliability
functions and UGF techniques. Bisht and Singh (2019) discussed the signature of
complex bridge networks with binary state nodes using UGF techniques. They computed
the signature of each node in series, parallel, and complex forms of the network
system.
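16.2 ALGORITHMS USED IN SIGNATURE RELIABILITY
16.2.1 Algorithm for Computing the Signature Using Reliability Function
Step 1: Compute the signature $A = (A_1, \ldots, A_s)$ of the system from the
structure function $\varphi$ by Boland's formula:
\[
A_a = \frac{1}{\binom{s}{s-a+1}} \sum_{\substack{H \subseteq [s] \\ |H| = s-a+1}} \varphi(H)
    - \frac{1}{\binom{s}{s-a}} \sum_{\substack{H \subseteq [s] \\ |H| = s-a}} \varphi(H)
\tag{16.1}
\]
Step 2: Evaluate the tail signature of the system, i.e., the $(s+1)$-tuple
$V = (V_0, \ldots, V_s)$ with
\[
V_a = \sum_{i=a+1}^{s} A_i
    = \frac{1}{\binom{s}{s-a}} \sum_{\substack{H \subseteq [s] \\ |H| = s-a}} \varphi(H)
\tag{16.2}
\]
Step 3: Write the reliability function $H$ in the polynomial form $P(v)$, which is
then expanded in a Taylor series about $v = 1$:
\[
P(v) = v^{s} H\!\left(\frac{1}{v}\right)
\tag{16.3}
\]
Step 4: Compute the tail signature of the system from the polynomial of
Equation 16.3 by (see Marichal and Mathonet, 2013):
\[
V_a = \frac{(s-a)!}{s!}\, D^{a} P(1), \quad a = 0, 1, \ldots, s
\tag{16.4}
\]
Step 5: Obtain the signature of the system from the tail signature of Equation 16.2 by:
\[
A_a = V_{a-1} - V_a, \quad a = 1, \ldots, s
\tag{16.5}
\]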
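Steps 3 to 5 can be sketched in a few lines of Python. The sketch is illustrative only: the function names `tail_signature` and `signature` are hypothetical, exact arithmetic is done with `fractions.Fraction`, and the normalising factor is taken as (s − a)!/s!, the form given by Marichal and Mathonet (2013). The input is assumed to be the list of coefficients of H(p) in increasing powers of p, here for a five-element parallel system with i.i.d. elements.

```python
from fractions import Fraction
from math import factorial


def tail_signature(h_coeffs):
    """Tail signature V_0..V_s from H(p) = sum_k h_coeffs[k] * p**k (i.i.d. case)."""
    s = len(h_coeffs) - 1
    # P(v) = v**s * H(1/v): the coefficient list is simply reversed.
    p = [Fraction(c) for c in reversed(h_coeffs)]
    tail = []
    for a in range(s + 1):
        # a-th derivative of P at v = 1: sum_k k!/(k-a)! * p[k]
        deriv_at_1 = sum(
            Fraction(factorial(k), factorial(k - a)) * p[k] for k in range(a, s + 1)
        )
        tail.append(Fraction(factorial(s - a), factorial(s)) * deriv_at_1)
    return tail


def signature(tail):
    """Signature A_1..A_s as successive differences A_a = V_{a-1} - V_a."""
    return [tail[a - 1] - tail[a] for a in range(1, len(tail))]


# Five-element parallel system: H(p) = 5p - 10p^2 + 10p^3 - 5p^4 + p^5.
parallel_h = [0, 5, -10, 10, -5, 1]
V = tail_signature(parallel_h)
A = signature(V)
print([str(x) for x in V])
print([str(x) for x in A])
```

For this input the sketch returns the tail signature (1, 1, 1, 1, 1, 0) and the signature (0, 0, 0, 0, 1), i.e., the parallel system fails with the fifth element failure.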
16.2.2 The Algorithm to Assess the Expected Lifetime of the System by Using Minimum Signature
Step 1: Determine the MTTF of the elements of the system, which are i.i.d. and
exponentially distributed with mean $\mu = 1$.
Step 2: Evaluate the expected lifetime $E(T)$ of the system with i.i.d. elements
(see Navarro, 2009):
\[
E(T) = \mu \sum_{i=1}^{n} \frac{C_i}{i}
\tag{16.6}
\]
where $C = (C_1, C_2, \ldots, C_n)$ is the vector of coefficients obtained from the
minimal signature.
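Equation 16.6 reduces to a single weighted sum. The following is a minimal sketch, assuming exponential elements with mean μ and a minimal signature given as the list of coefficients of H(P) in increasing powers of P, starting from P^1; the function name `expected_lifetime` is hypothetical.

```python
from fractions import Fraction


def expected_lifetime(minimal_signature, mu=1):
    """E(T) = mu * sum_i C_i / i for i.i.d. exponential elements with mean mu."""
    return mu * sum(
        Fraction(c, i) for i, c in enumerate(minimal_signature, start=1)
    )


# Five-element parallel system: H(P) = 5P - 10P^2 + 10P^3 - 5P^4 + P^5.
parallel_et = expected_lifetime([5, -10, 10, -5, 1])
# Minimal signature (0, 1, 1, -1), i.e., H(P) = P^2 + P^3 - P^4.
sws_et = expected_lifetime([0, 1, 1, -1])
print(float(parallel_et), float(sws_et))
```

The exact values are 137/60 and 7/12, which round to 2.28 and 0.58.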
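16.2.3 Algorithm for Obtaining the Barlow-Proschan Index for the System
Compute the Barlow-Proschan index of the i.i.d. elements with the help of the
reliability function as (see Shapley, 1953; Owen, 1975, 1988):
\[
I_{BP}^{(K)} = \int_{0}^{1} (\partial_K H)(v)\, dv, \quad K = 1, 2, \ldots, n
\tag{16.7}
\]
where $\partial_K H$ is the partial derivative of the multilinear reliability
function $H$ with respect to the reliability of element $K$, evaluated at
$(v, \ldots, v)$.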
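For small systems the Barlow-Proschan index can also be obtained without symbolic integration. The sketch below assumes i.i.d. elements and uses the fact that integrating the partial derivative of the multilinear reliability function over [0, 1] gives a weighted count of the states in which element j is critical, with weight |H|!(n − |H| − 1)!/n! (the Shapley value form). The function name `barlow_proschan` and the representation of the structure function as a Python callable on sets of working elements are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def barlow_proschan(n, phi):
    """Barlow-Proschan index of each of n i.i.d. elements.

    phi maps a frozenset of working elements (labels 1..n) to 0 or 1.
    """
    index = []
    for j in range(1, n + 1):
        others = [e for e in range(1, n + 1) if e != j]
        total = Fraction(0)
        for size in range(n):
            # Integral of v**size * (1 - v)**(n - 1 - size) over [0, 1].
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                working = frozenset(subset)
                # Element j is critical when adding it turns the system on.
                total += weight * (phi(working | {j}) - phi(working))
        index.append(total)
    return index


# Five-element series system: works only if all five elements work.
series_bp = barlow_proschan(5, lambda w: int(len(w) == 5))
# Five-element parallel system: works if at least one element works.
parallel_bp = barlow_proschan(5, lambda w: int(len(w) >= 1))
print([str(x) for x in series_bp])
```

Both symmetric systems give an index of 1/5 for every element, and the indices of any coherent system sum to 1.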
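16.2.4 Algorithm to Determine the Expected Value of the System
Step 1: Calculate the expected value $E(X)$ of the system elements, i.e., the
expected number of element failures at the time of system failure, using the
signature:
\[
E(X) = \sum_{i=1}^{n} i\, A_i
\]
where $A_i$, $i = 1, 2, \ldots, n$, is the ith entry of the signature.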
16.3 ILLUSTRATIONS
Case 1: Consider a series system of five elements, as shown in Figure 16.1.
The structure function of the series system from Figure 16.1 is:
\[
R(P) = \prod_{j=1}^{n} R_j
\]
\[
R(P) = R_1 R_2 R_3 R_4 R_5
\tag{16.8}
\]
In this case, when the elements are identically distributed ($R_j = R$), the
reliability function $R(P)$ of the series system with i.i.d. elements is:
\[
R(P) = P^{5},
\]
\[
H(v) = v^{5}.
\tag{16.9}
\]
With the help of Equations 16.3 and 16.9, the reliability function can be
written as:
\[
P(v) = v^{5} H\!\left(\frac{1}{v}\right) = 1.
\]
Tail signature of the series system:
\[
V_0 = 1,\ V_1 = 0,\ V_2 = 0,\ V_3 = 0,\ V_4 = 0,\ V_5 = 0,
\]
\[
V = (1, 0, 0, 0, 0, 0).
\]
Signature of the series system:
\[
A = (1, 0, 0, 0, 0).
\]
Barlow-Proschan index of the first element:
\[
I_{BP}^{(1)} = \int_{0}^{1} (\partial_1 H)(v)\, dv = \int_{0}^{1} v^{4}\, dv = \frac{1}{5}.
\]
Similarly, we obtain the Barlow-Proschan index $I_{BP}^{(K)}$ of all elements,
$K = 1, 2, \ldots, 5$, given as:
\[
I_{BP} = \left( \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5} \right).
\]
\[
E(X) = 1.
\tag{16.11}
\]
Using Equations 16.10 and 16.11, the expected cost rate is defined as:
\[
\text{Expected cost rate} = E(X)/E(t) = 1.
\]
Case 2: Consider a parallel system of five elements, as shown in Figure 16.2.
The reliability function of the parallel system from Figure 16.2 is:
\[
R(P) = 1 - \prod_{j=1}^{n} (1 - R_j)
\]
For i.i.d. elements:
\[
R(P) = 5R - 10R^{2} + 10R^{3} - 5R^{4} + R^{5}.
\]
\[
H(P) = 5P - 10P^{2} + 10P^{3} - 5P^{4} + P^{5}.
\tag{16.13}
\]
\[
H(P) = 5P - 10P^{2} + 10P^{3} - 5P^{4} + P^{5}.
\tag{16.14}
\]
\[
P(v) = v^{5} H\!\left(\frac{1}{v}\right) = 1 - 5v + 10v^{2} - 10v^{3} + 5v^{4}.
\]
Tail signature of the parallel system:
\[
V_0 = 1,\ V_1 = 1,\ V_2 = 1,\ V_3 = 1,\ V_4 = 1,\ V_5 = 0,
\]
\[
V = (1, 1, 1, 1, 1, 0).
\]
Signature of the parallel system:
\[
A = (0, 0, 0, 0, 1).
\]
Barlow-Proschan index of the parallel system:
\[
I_{BP} = \left( \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5} \right).
\]
\[
E(t) = 2.28
\tag{16.15}
\]
\[
E(X) = 5
\tag{16.16}
\]
\[
\text{Expected cost rate} = E(X)/E(t) = 2.19298.
\]
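The figures for the parallel system can be reproduced with a short sketch. It assumes the signature (0, 0, 0, 0, 1), the minimal signature (5, −10, 10, −5, 1) read off the coefficients of H(P), and μ = 1; the variable names are hypothetical. It also shows the effect of rounding E(t) to two decimals before dividing, which is how the value 2.19298 arises.

```python
from fractions import Fraction

signature = [0, 0, 0, 0, 1]              # five-element parallel system
minimal_signature = [5, -10, 10, -5, 1]  # coefficients of P^1 .. P^5 in H(P)

# E(X) = sum_i i * A_i, the expected number of failures at system failure.
expected_x = sum(i * a for i, a in enumerate(signature, start=1))
# E(t) = sum_i C_i / i for exponential elements with mean 1.
expected_t = sum(Fraction(c, i) for i, c in enumerate(minimal_signature, start=1))

exact_rate = expected_x / expected_t                     # no intermediate rounding
rounded_rate = expected_x / round(float(expected_t), 2)  # E(t) rounded to 2.28 first
print(expected_x, float(expected_t), float(exact_rate), rounded_rate)
```

Without intermediate rounding the ratio is 300/137, about 2.1898; dividing by the rounded value 2.28 gives 2.19298.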
Case 3: Consider an SWS that has four window elements with n = 4, r = 3, and
W = 4, as shown in Figure 16.3. Each window element has two states: complete
success and complete failure. Suppose the performance rates of the window
elements 1 to 4 are 1, 2, 3, 4, respectively.
The UGF of element j of the proposed system from Figure 16.3 is given as:
\[
U_j(z) = P_j z^{j} + (1 - P_j) z^{0}
\]
where j = 1, 2, 3, 4, $P_j$ is the probability that element j works, and the
exponents $j$ and $0$ of $z$ are the performance rates of the working and the
failed state, respectively.
Therefore, the UGFs $U_j(z)$ $(j = 1, 2, 3, 4)$ of the system are given by:
\[
\begin{aligned}
U_1(z) &= P_1 z^{1} + (1 - P_1) z^{0} \\
U_2(z) &= P_2 z^{2} + (1 - P_2) z^{0} \\
U_3(z) &= P_3 z^{3} + (1 - P_3) z^{0} \\
U_4(z) &= P_4 z^{4} + (1 - P_4) z^{0}.
\end{aligned}
\]
From Algorithm 16.2.5 of the SWS, we obtain the beginning element of the
SWS as:
For i = 1
\[
\begin{aligned}
U_0(z) &= \varphi(U_{-1}(z), U_1(z)) \\
       &= P_1 z^{(0,0,0,1)} + (1 - P_1) z^{(0,0,0,0)}
\end{aligned}
\]
For i = 2
\[
\begin{aligned}
U_1(z) &= \varphi(U_0(z), U_2(z)) \\
       &= \varphi\big(P_1 z^{(0,0,0,1)} + (1 - P_1) z^{(0,0,0,0)},\ P_2 z^{2} + (1 - P_2) z^{0}\big)
\end{aligned}
\]
For i = 3
\[
\begin{aligned}
U_2(z) &= \varphi(U_1(z), U_3(z)) \\
       &= P_1 P_2 P_3 z^{(0,1,2,3)} + P_1 (1 - P_2) P_3 z^{(0,1,0,3)} + P_2 (1 - P_1) P_3 z^{(0,0,2,3)} \\
       &\quad + (1 - P_1)(1 - P_2) P_3 z^{(0,0,0,3)} + P_1 P_2 (1 - P_3) z^{(0,1,2,0)} \\
       &\quad + P_1 (1 - P_2)(1 - P_3) z^{(0,1,0,0)} + P_2 (1 - P_1)(1 - P_3) z^{(0,0,2,0)} \\
       &\quad + (1 - P_1)(1 - P_2)(1 - P_3) z^{(0,0,0,0)}
\end{aligned}
\]
\[
\begin{aligned}
F &= (1 - P_1)(1 - P_2) P_3 + P_1 P_2 (1 - P_3) + P_1 (1 - P_2)(1 - P_3) \\
  &\quad + P_2 (1 - P_1)(1 - P_3) + (1 - P_1)(1 - P_2)(1 - P_3)
\end{aligned}
\tag{16.17}
\]
For i = 4
\[
\begin{aligned}
U_3(z) &= \varphi(U_2(z), U_4(z)) \\
       &= \varphi\big(P_1 P_2 P_3 z^{(0,1,2,3)} + P_1 (1 - P_2) P_3 z^{(0,1,0,3)} + (1 - P_1) P_2 P_3 z^{(0,0,2,3)},\ P_4 z^{4} + (1 - P_4) z^{0}\big) \\
       &= P_1 P_2 P_3 P_4 z^{(1,2,3,4)} + P_1 (1 - P_2) P_3 P_4 z^{(1,0,3,4)} + (1 - P_1) P_2 P_3 P_4 z^{(0,2,3,4)} \\
       &\quad + P_1 P_2 P_3 (1 - P_4) z^{(1,2,3,0)} + P_1 (1 - P_2) P_3 (1 - P_4) z^{(1,0,3,0)} + (1 - P_1) P_2 P_3 (1 - P_4) z^{(0,2,3,0)}
\end{aligned}
\]
\[
F = P_1 (1 - P_2) P_3 (1 - P_4)
\tag{16.18}
\]
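The UGF recursion above can be cross-checked by brute-force enumeration of the 2^4 element states. The sketch below is not the UGF algorithm itself but a direct enumeration, assuming the failure criterion stated for the SWS (a window of r = 3 consecutive elements fails when its total performance is below W = 4) and performance rates 1, 2, 3, 4. All function names are hypothetical.

```python
from itertools import product
from math import comb


def sws_works(states, rates=(1, 2, 3, 4), r=3, weight=4):
    """True if every window of r consecutive elements has total performance >= weight."""
    perf = [rate if up else 0 for rate, up in zip(rates, states)]
    return all(sum(perf[i:i + r]) >= weight for i in range(len(perf) - r + 1))


def path_counts(n=4):
    """counts[k] = number of working system states with exactly k working elements."""
    counts = [0] * (n + 1)
    for states in product((0, 1), repeat=n):
        if sws_works(states):
            counts[sum(states)] += 1
    return counts


def reliability_coeffs(counts):
    """Coefficients of H(p) = sum_k counts[k] * p^k * (1-p)^(n-k) in powers of p."""
    n = len(counts) - 1
    coeffs = [0] * (n + 1)
    for k, n_k in enumerate(counts):
        for m in range(n - k + 1):
            # Expand (1 - p)^(n - k) with the binomial theorem.
            coeffs[k + m] += n_k * comb(n - k, m) * (-1) ** m
    return coeffs


counts = path_counts()
coeffs = reliability_coeffs(counts)
print(counts, coeffs)
```

The enumeration finds one working state with two working elements, three with three, and one with four, which for i.i.d. elements gives H(p) = p^2 + p^3 − p^4.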
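For i.i.d. elements, the reliability function of the SWS is:
\[
R(P) = P^{2} + P^{3} - P^{4}.
\]
\[
H(v) = v^{2} + v^{3} - v^{4}
\tag{16.20}
\]
\[
P(v) = v^{4} H\!\left(\frac{1}{v}\right) = -1 + v + v^{2}.
\]
Now calculate the tail signature V of the SWS using step 4 of
Algorithm 16.2.1 as:
\[
V_0 = 1,\ V_1 = \frac{3}{4},\ V_2 = \frac{1}{6},\ V_3 = 0,\ V_4 = 0.
\]
\[
V = \left( 1, \frac{3}{4}, \frac{1}{6}, 0, 0 \right).
\]
Now, the signature of the SWS from step 5 of Algorithm 16.2.1 is:
\[
A = \left( \frac{1}{4}, \frac{7}{12}, \frac{1}{6}, 0 \right).
\]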
Barlow-Proschan index of the sliding window system
From Equation 16.20 and Algorithm 16.2.3, we obtain the Barlow-
Proschan index of the SWS by:
\[
I_{BP}^{(1)} = \int_{0}^{1} (v^{2} - v^{3})\, dv = \frac{1}{12}.
\]
\[
I_{BP} = \left( \frac{1}{12}, \frac{1}{4}, \frac{7}{12}, \frac{1}{12} \right).
\]
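The four indices can be traced step by step. The derivation below is a sketch that assumes the multilinear reliability function obtained by summing the non-failed terms of the UGF for i = 4, namely H(P_1, P_2, P_3, P_4) = P_2 P_3 + P_1 P_3 P_4 − P_1 P_2 P_3 P_4, which reduces to Equation 16.20 when all P_j = v. Each partial derivative is evaluated at P_1 = ... = P_4 = v and integrated over [0, 1].

```latex
\begin{align*}
\partial_1 H &= P_3 P_4 - P_2 P_3 P_4
  &&\Rightarrow\quad I_{BP}^{(1)} = \int_0^1 (v^2 - v^3)\,dv = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12},\\
\partial_2 H &= P_3 - P_1 P_3 P_4
  &&\Rightarrow\quad I_{BP}^{(2)} = \int_0^1 (v - v^3)\,dv = \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{1}{4},\\
\partial_3 H &= P_2 + P_1 P_4 - P_1 P_2 P_4
  &&\Rightarrow\quad I_{BP}^{(3)} = \int_0^1 (v + v^2 - v^3)\,dv = \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{7}{12},\\
\partial_4 H &= P_1 P_3 - P_1 P_2 P_3
  &&\Rightarrow\quad I_{BP}^{(4)} = \int_0^1 (v^2 - v^3)\,dv = \tfrac{1}{12}.
\end{align*}
```

The indices sum to 1, and element 3, which belongs to every minimal path set of the system, has the largest index.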
Minimal signature of the SWS: $(0, 1, 1, -1)$.
\[
E(t) = 0.58.
\tag{16.21}
\]
\[
E(X) = 2
\tag{16.22}
\]
\[
\text{Expected cost rate} = E(X)/E(t) = 3.4483.
\]
16.4 CONCLUSION
In this chapter, we discussed the properties of the signature and related measures
such as the tail signature, expected cost rate, mean time to failure, and
Barlow-Proschan index, computed with the help of the reliability function and Owen's
method. We also evaluated the reliability function by using the UGF. Further, the
signatures of different systems, namely series, parallel, and SWS, were computed
with the help of the given algorithms.
REFERENCES
Balakrishnan, N., Navarro, J., & Samaniego, F. J. (2012). Signature representation and preser-
vation results for engineered systems and applications to statistical inference. In Recent
Advances in System Reliability, Springer, London, UK, pp. 1–22.
Barlow, R. E., & Proschan, F. (1975). Importance of system elements and fault tree events.
Stochastic Processes and Their Applications, 3(2), 153–173.
Belzunce, F., & Shaked, M. (2004). Failure profiles of coherent systems. Naval Research
Logistics (NRL), 51(4), 477–490.
Bhattacharya, D., & Samaniego, F. J. (2008). On the optimal allocation of elements within
coherent systems. Statistics & Probability Letters, 78(7), 938–943.
Bisht, S., & Singh S. B. (2019). Signature reliability of binary state node in complex bridge
network using universal generating function. International Journal of Quality &
Reliability Management, 36(2), 186–201.
Boland, P. J. (2001). Signatures of indirect majority systems. Journal of Applied Probability,
38(2), 597–603.
Boland, P. J., Proschan, F., & Tong, Y. L. (1990). Linear dependence in consecutive k-out-
of-n: F systems. Probability in the Engineering and Informational Sciences, 4(3),
391–397.
Boland, P. J., & Samaniego, F. J. (2004). The signature of a coherent system and its applications in
reliability. In Mathematical Reliability: An Expository Perspective, Springer US, pp. 3–30.
Chahkandi, M., Ruggeri, F., & Suárez-Llorens, A. (2016). A generalized signature of repair-
able coherent systems. IEEE Transactions on Reliability, 65(1), 434–445.
Da Costa Bueno, V. (2011). A coherent system element importance under its signatures repre-
sentation. American Journal of Operations Research, 1(3), 172.
Da Costa Bueno, V. (2013). A multistate monotone system signature. Statistics & Probability
Letters, 83(11), 2583–2591.
Da, G., Xia, L., & Hu, T. (2014). On computing signatures of k-out-of-n systems consisting of
modules. Methodology and Computing in Applied Probability, 16(1), 223–233.
Eryılmaz, S. (2010). Mixture representations for the reliability of consecutive-k systems.
Mathematical and Computer Modelling, 51(5), 405–412.
Eryilmaz, S. (2012). The number of failed elements in a coherent system with exchangeable
elements. IEEE Transactions on Reliability, 61(1), 203–207.
Eryilmaz, S. (2014). On signatures of series and parallel systems consisting of modules with
arbitrary structures. Communications in Statistics-Simulation and Computation,
43(5), 1202–1211.
Eryilmaz, S. (2015). Mixture representations for three-state systems with three-state
elements. IEEE Transactions on Reliability, 64(2), 829–834.
Eryilmaz, S., Kan, C., & Akici, F. (2009). Consecutive k-within-m-out-of-n: F system with
exchangeable elements. Naval Research Logistics (NRL), 56(6), 503–510.
Eryilmaz, S., Koutras, M. V., & Triantafyllou, I. S. (2011). Signature based analysis of
m-consecutive k-out-of-n: F systems with exchangeable elements. Naval Research Logistics
(NRL), 58(4), 344–354.
Eryilmaz, S., & Tuncel, A. (2015). Computing the signature of a generalized k-out-of-n
system. IEEE Transactions on Reliability, 64(2), 766–771.
Franko, C., & Tütüncü, G. Y. (2016). Signature based reliability analysis of repairable
weighted k-out-of-n: G systems. IEEE Transactions on Reliability, 65(2), 843–850.
Kochar, S., Mukerjee, H., & Samaniego, F. J. (1999). The signature of a coherent system and
its application to comparisons among systems. Naval Research Logistics (NRL), 46(5),
507–523.
Kumar, A., & Singh, S. B. (2017a). Signature reliability of linear multi-state sliding window
system. International Journal of Quality & Reliability Management, 35(10), 2403–2413.
Kumar, A., & Singh, S. B. (2017b). Computations of signature reliability of coherent system.
International Journal of Quality & Reliability Management, 34(6), 785–797.
Kumar, A., & Singh, S. B. (2017c). Signature reliability of sliding window coherent system.
In Mathematics Applied to Engineering, Elsevier International Publisher, London, UK,
pp. 83–95.
Levitin, G. (2001). Redundancy optimization for multi-state system with fixed
resource-requirements and unreliable sources. IEEE Transactions on Reliability, 50(1), 52–59.
Levitin, G. (2002). Optimal allocation of elements in a linear multi-state sliding window
system. Reliability Engineering & System Safety, 76(3), 245–254.
Levitin, G. (2003a). Common supply failures in linear multi-state sliding window systems.
Reliability Engineering & System Safety, 82(1), 55–62.
Levitin, G. (2003b). Linear multi-state sliding-window systems. IEEE Transactions on
Reliability, 52(2), 263–269.
Levitin, G. (2005). The Universal Generating Function in Reliability Analysis and
Optimization, Springer, London, UK, p. 442. doi:10.1007/1-84628-245-4.
Li, X., & Zhang, Z. (2008). Some stochastic comparisons of conditional coherent systems.
Applied Stochastic Models in Business and Industry, 24(6), 541–549.
Lindqvist, B. H., & Samaniego, F. J. (2015). On the signature of a system under minimal
repair. Applied Stochastic Models in Business and Industry, 31(3), 297–306.
Lisnianski, A., & Frenkel, I. (Eds.). (2011). Recent Advances in System Reliability: Signatures,
Multi-state Systems and Statistical Inference. Springer Science & Business Media,
London, UK.
Mahmoudi, M., & Asadi, M. (2011). The dynamic signature of coherent systems. IEEE
Transactions on Reliability, 60(4), 817–822.
Marichal, J. L., & Mathonet, P. (2013). Computing system signatures through reliability
functions. Statistics & Probability Letters, 83(3), 710–717.
Navarro, J., & Rubio, R. (2009). Computations of signatures of coherent systems with five
elements. Communications in Statistics-Simulation and Computation, 39(1), 68–84.
Navarro, J., Ruiz, J. M., & Sandoval, C. J. (2007a). Properties of coherent systems with
dependent elements. Communications in Statistics: Theory and Methods, 36(1), 175–191.
Navarro, J., & Rychlik, T. (2007). Reliability and expectation bounds for coherent systems
with exchangeable elements. Journal of Multivariate Analysis, 98(1), 102–113.
Navarro, J., & Rychlik, T. (2010). Comparisons and bounds for expected lifetimes of
reliability systems. European Journal of Operational Research, 207(1), 309–317.
Navarro, J., Rychlik, T., & Shaked, M. (2007b). Are the order statistics ordered? A survey of
recent results. Communications in Statistics: Theory and Methods, 36(7), 1273–1290.
Navarro, J., Samaniego, F. J., Balakrishnan, N., & Bhattacharya, D. (2008). On the
application and extension of system signatures in engineering reliability. Naval Research
Logistics (NRL), 55(4), 313–327.
Owen, G. (1975). Multilinear extensions and the Banzhaf value. Naval Research Logistics
Quarterly, 22(4), 741–750.
Owen, G. (1988). Multilinear extensions of games. In The Shapley Value: Essays in Honor of
Lloyd S. Shapley, Cambridge University Press, New York, pp. 139–151.
Samaniego, F. J. (1985). On closure of the IFR class under formation of coherent systems.
IEEE Transactions on Reliability, 34(1), 69–72.
Samaniego, F. J. (2007). System Signatures and Their Applications in Engineering Reliability.
Springer Science & Business Media, London, UK, p. 110.
Samaniego, F. J., & Navarro, J. (2016). On comparing coherent systems with heterogeneous
elements. Advances in Applied Probability, 48(1), 88–111.
Shapley, L. S. (1953). A value for n-person games. In Contributions to the Theory of Games,
Vol. 2, Annals of Mathematics Studies, Vol. 28, Princeton University Press,
Princeton, NJ, pp. 307–317.
Triantafyllou, I. S., & Koutras, M. V. (2011). Signature and IFR preservation of
2-within-consecutive k-out-of-n: F systems. IEEE Transactions on Reliability, 60(1), 315–322.
Ushakov, I. (1986). Universal generating function. Journal of Computer Science and Systems
Biology, 24, 118–129.
Ushakov, I. A. (Ed.). (1994). Handbook of Reliability Engineering. John Wiley & Sons,
New York.
Yu, K., Koren, I., & Guo, Y. (1994). Generalized multistate monotone coherent systems. IEEE
Transactions on Reliability, 43(2), 242–250.
Index
A
accelerated, 86, 197–201, 204, 208, 213–214, 216, 219
age replacement policies, 17
Alhazmi Malaiya Logistic (AML) model, 405
Anderson Thermodynamic (AT) model, 404
A-optimality, 204
approximation, 84, 133–134, 260, 268, 289, 320, 341
Arrhenius law, 203
automatic analysis, 111
automatic control system, 386
auxiliary engines, 375, 383–384
availability, 127, 133

B
Barlow-Proschan index, 429–430, 432, 435
Bayes, 355–356
Bayesian approach, 219
Bayesian statistics, 356
binomial reliability demonstration, 351
bivariate Gamma degradation models, 298
block replacement policy, 17

C
cannibalization maintenance policy, 24
case studies, 59, 67
censoring, 202, 204–205, 214, 218, 282
  schemes, 201–202, 204
cold standby redundancy, 135
common cause failure analysis, 319, 321–322, 324
competing, 197, 199, 219
complexity, 69, 104, 119, 147, 156, 234, 262, 365–366, 370, 372, 380, 382
component diversity, 316
component-failure modes, 315, 321
component redundancy, 316
Component Risk Index (IRC), 313
composite system, 366
computational systems modeling, 134
computer systems, 133
constant, 201, 214, 216
  stress, 214, 216
continuous state, 86, 88
continuous time Markov chain, 53, 55, 59, 92–93, 104, 130, 245
conversion coefficient, 271–272
copulas, 213, 216, 219
cost reduction, 358
coverage based vulnerability discovery modeling, 410
criticality analysis, 310, 313, 331
CTMC model, 138–140, 142–143
Cumulative Exposure Model (CEM), 204
cure models, 292
current, 283

D
damage-cycle diagram, 347
damage distribution, 349
data, 233, 235, 282–283, 345, 353–354
database, 25, 65, 68, 111–112, 117, 119, 242, 306
degradation, 79, 87, 297–298
  analysis, 86
delay-time modeling, 68
dependability, 224, 228, 233
design curve construction, 356
diesel engine, 362, 368, 374, 382, 385
discrete state, 90
D-optimality, 204, 213, 218

E
effort-based vulnerability discovery model, 407
electric load, 362
evaluation, 237, 246, 421
excess hazard rate models, 294
expected cost rate, 431, 433, 436
expected lifetime, 428, 431–432, 436
expected value, 429
exponential model, 405–406
extreme value moment, 267–268

F
failure limit policy, 9, 12–13
failure numbers, 178
failure processes decomposition (FPD), 262, 264
failure rate, 165, 167, 386
failures analysis, 321
fatigue data, 345, 353–354
first order reliability method, 260
first-passage point, 272
FMEA, 110, 308, 319, 325, 328, 331
FMECA, 154, 237, 308, 316, 334
frailty, 288, 298
FTA, 239
functional dependence analysis, 308
function analysis, 113
fuzzy logic, 115

G
Gamma, 83, 297–298
  process, 83
generator configuration, 379, 383
group maintenance policy, 2, 14

H
hazard rate models, 294
hot standby redundancy, 135
human approach, 114
hybrid inspection models, 65

I
improve design, 110, 112, 117, 119
improvement, 12–13, 108, 229, 231, 233, 238, 309, 362, 365, 385, 402, 417
incomplete data, 282
independent causes of failure, 200–201, 204, 208
independent identically distributed (i.i.d.) elements, 283, 422
inequalities of failure rates, 167
inspection maintenance, 41, 44, 47, 56, 58, 67
inspection modeling, 42, 46, 68
inspection policy, 3, 41–47, 53, 57–59, 65–68
inverse Gaussian process, 84

K
kernel density estimation, 266
k-out-of-n redundancy, 138
Kullback–Leibler divergence, 270

maintenance, 41, 44, 47, 56, 58, 67, 165, 361, 370, 380, 387
  policy, 7–10, 361, 380
  strategy, 12, 23, 45
management, 225
marine electric power, 361, 363, 365, 382, 385
marine power generator, 361, 368, 389
Markov, 91, 99, 127, 130, 140, 371
Markov chains, 127, 130, 140
Markovian structure, 91
mathematical model, 1, 6, 12, 23, 45
maximum entropy, 267, 269, 273
methods, 232, 237, 246, 259, 347
minimum signature, 428
mixtures model, 358
model, 21–22, 65, 67, 133, 140, 204–205, 210, 215, 217, 285, 288–289, 292, 296–298, 357–358, 404–411, 413
  classification, 1
modeling, 1, 3, 14, 41, 44, 56, 79, 127, 134, 401, 413, 416
  methods, 8, 10, 17, 21–22
modes, 197, 199, 219, 305, 310, 313, 369
modified ramp-stress, 208
MTTF, 59–60, 63, 135, 140, 143, 146, 232, 236, 373, 423, 425, 427–428
MTTR, 373
multi-dimensional integration, 268
multiple failure modes, 202, 213, 262
multiple power sources, 363
multi-state system, 47, 361, 365, 374
multi-unit systems, 14, 17, 19, 21–22, 24, 56, 60–64

N
non-repairable, 18, 233, 235–236, 246–249, 373

O