Lillie 2015


Microelectronics Reliability 55 (2015) 969–979

Contents lists available at ScienceDirect

Microelectronics Reliability
journal homepage: www.elsevier.com/locate/microrel

Assessing the value of a lead-free solder control plan using cost-based FMEA
Edwin Lillie a, Peter Sandborn a,⇑, David Humphrey b
a CALCE Electronic Products and Systems Center, Department of Mechanical Engineering, University of Maryland, College Park, MD, USA
b New Aspen Consulting, LLC, Tucson, AZ, USA

Article info

Article history:
Received 24 June 2014
Received in revised form 1 February 2015
Accepted 17 February 2015
Available online 26 March 2015

Keywords:
Lead-free solder
Cost modeling
Reliability
Control plan

Abstract

While the transition to lead-free electronics, which began nearly a decade ago, is complete for most commercial products, many safety, mission and infrastructure critical systems that were originally exempt from RoHS and WEEE are only now transitioning. For these types of products qualification is very expensive and the consequences of failure can be catastrophic; therefore carefully engineered control plans are needed when technology or process changes are required. A control plan is a set of activities that a manufacturer can choose or be required to perform to ensure product performance. This paper uses cost-based FMEA to determine the projected cost of failure consequence for a technology insertion control plan for the adoption of lead-free solder for the assembly of electronic systems in critical applications that previously used tin–lead solder. A case study of the lead-free implementation of a power supply demonstrates the return on investment of the control plan for the same product under two different risk scenarios.
© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

On July 1, 2006 the European Union’s Restriction of Hazardous Substances (RoHS) Directive and Waste Electrical and Electronic Equipment (WEEE) Directive went into effect banning the use of lead in electronics and electrical equipment. The environmental and technological issues associated with using lead-free solder in electronic assemblies are discussed elsewhere, e.g. [1,2], and will not be addressed here, but as a result of RoHS and WEEE, product developers must qualify the products (and processes) that they use to replace tin–lead solder with lead-free solder. Changing solder may affect the reliability of a product, and less data exists on the performance of lead-free solder than tin–lead solder.

The performance and reliability of electronic parts is of great concern in many safety-critical applications such as the aerospace industry, and the problem of transitioning from tin–lead solder to lead-free solder is particularly difficult for aerospace applications for a number of reasons. First, avionics and other electronic systems in aerospace applications often operate in extreme environments, exposed to temperature extremes, high altitudes, vibration and mechanical shock [3]. Also, unlike consumer electronics that have service lives of months or a few years, aerospace systems are operated for decades [3]. Finally, the consequences of failure in aerospace systems are potentially dire, including loss of life and large financial losses.

Given the high stakes of electronic performance in aerospace systems, the Aerospace Industries Association created the Pb-Free Electronics Risk Management (PERM) Consortium to provide guidance and leadership to the aerospace industry and respond to the challenges posed by the use of lead-free solder in aerospace and defense applications. One of the PERM Consortium’s contributions has been the creation of a performance standard for what a control plan for lead-free solder must include [4]. According to the PERM standard, a lead-free control plan must address the reliability objectives of a system, outline all the risks that are threats to achieving those requirements, and define the processes that will be performed to ensure the stakeholders’ reliability requirements are met. In the context of the study presented in this paper, the lead-free control plan defines a set of activities that the user may implement with the goal of improving the reliability of the system so that it meets all stakeholders’ reliability requirements. Also, some activities may be required by industry standards, the customer or the law, but the user may have a choice as to the level of rigor at which they are performed and whether to perform other activities that are not required.

While many qualitative discussions of the cost impacts of lead-free electronics exist, e.g. [5,6], only a few quantitative models have appeared. Palesko [7] analyzed the cost differences between process flows for assembling tin–lead and lead-free electronics.

⇑ Corresponding author.

http://dx.doi.org/10.1016/j.microrel.2015.02.022
0026-2714/© 2015 Elsevier Ltd. All rights reserved.
970 E. Lillie et al. / Microelectronics Reliability 55 (2015) 969–979

Sandborn and Jafreen [8] develop a cost model for assessing the cost ramifications on an organization of transitioning from tin–lead to lead-free parts. Neither [7] nor [8] addresses qualification or risk control activities, or the impact of these activities on the cost of a system implemented with lead-free solder. This paper describes a model that analyzes the risk and cost implications, good or bad, of adopting activities in the risk control plan for lead-free solder. We are not addressing the tradeoffs associated with conversion from tin–lead to lead-free solder (the models in [7,8] can be used for this) – we implicitly assume that a conversion decision has already been made or mandated. The purpose of the analysis described in this paper is to quantify the ramifications of the conversion and establish the value of activities designed to mitigate the conversion risks. We also wish to understand how the cost-effectiveness of adopting lead-free solder changes when the application (i.e. risk environment) changes; although application changes do not necessarily change the consequence of failure, for the particular applications considered in this paper the application changes the consequences of failure significantly.

Section 2 of this paper describes the technology insertion model used to assess the cost of risk of using lead-free solder in systems. Section 3 applies the cost of risk model to a power supply implemented in two different risk scenarios and demonstrates that the optimum control plan differs depending on the usage scenario. Finally, Section 4 discusses the results and suggests analysis extensions.

2. Technology insertion cost of risk model

2.1. Review of relevant literature

Barringer [9] defines the cost of reliability as those costs that are used to keep the system free from failure. Models that estimate the cost of reliability based on Barringer’s definition include [10,11]. Models based on the risk of failure where failures are ranked based on severity and likelihood of occurrence have also been developed. Hauge and Johnston [12] define risk as “the product of the severity of a failure and the probability of that failure’s occurrence”. In [12], the severity and occurrence ratings are multiplied together to give a total magnitude of the risk due to the failure. Perera and Holsomback [13] describe a NASA risk management approach, which prioritizes risks based on likelihood and severity, with equal weight given to both factors. Perera and Holsomback identified risks from “fault-tree analysis results, failure modes and effects analysis (FMEA) results, test data, expert opinion, brainstorming, hazard analysis, lessons learned from other project/programs, technical analysis or trade studies and other resources”. Sun et al. [14] describe a software cost of reliability model that incorporates the severity level of failures. Sun et al. claim that the risk from a defect in software depends on both the failure rate of the defect and the severity level of the defect. According to Sun et al., the risk of a defect is defined as “the expected loss if [the defect] remains in the released software”. Another concept introduced in the literature is the cost of risk. Liu and Boggs [15], in their paper on cable life, define the cost of risk as “the cost to a [electric] utility associated with early cable failure” and the cost of failure as “the cost to replace the cable”. Liu and Boggs define the cost of risk as the cost of failures that occur before the end of the service life of the product.

Rhee and Ishii [16] introduced a cost-based failure modes and effects analysis (FMEA) approach to measure the cost of risk and apply it to the selection of design alternatives. Kmenta and Ishii [17] use scenario-based FMEA to evaluate risk using probability and cost. Scenario-based FMEA uses predicted failure costs to make decisions about investments in reliability improvement versus maintenance. Taubel [18] implements a similar approach for the calculation of a total “mishap cost” by relating the known costs associated with mishaps to the probability of mishap for different severities of mishap. In Taubel’s model, the definition of mishap derives from the Department of Defense’s Military Standard 882C [19]: “an unplanned event or series of events resulting in death, injury, occupational illness, or damage to or loss of equipment or property, or damage to the environment”.

The models developed in [16–18] form the basis for the model used in this paper, which is described in the remainder of this section. We have extended these models so that technologies can be inserted at various levels of rigor, and so that uncertainty in the life-cycle cost of the system and in the effectiveness of the technologies in reducing failures is included. The model in this paper also replaces the FMEA probability of occurrence with discrete event simulation based reliability sampling. The model presented here predicts relative costs (cost differences between cases) rather than absolute costs, and our model is directed toward the activities necessary to implement and qualify a technology insertion (specifically a lead-free control plan).

2.2. Multiple severity model

In order to assess the cost of risk associated with technology insertion (lead-free solder in our case) we will determine the difference in failure consequence costs between the system with and without the technology change. Note that the method described in this section does not calculate the actual life-cycle cost of the system, but rather the cost difference between the resolution and consequences of failure for the two cases while assuming that other life-cycle cost contributions are a “wash”. This is referred to as a “relative accuracy” cost model in [20].

Systems can fail in different ways, and not all failures necessarily have the same financial consequences. A system failure that requires maintenance (repair) might cost less than a failure that requires the system owner to replace the system. Ideally the system owner needs to predict the cost of all the failure events that are expected to occur over the life of the fleet of systems, taking into account that those systems can fail multiple times, in multiple ways, and with different financial consequences of failure depending on how the systems fail.

Taubel [18] calculates a total mishap cost by plotting the known costs associated with mishaps versus the probability of mishap for different severities of mishap (e.g. Fig. 1). In the model, each severity level has a distinct cost and an associated probability of occurrence. The area under the curve is the expected total mishap cost. A mitigation activity is a process that may reduce the overall expected number of mishaps at specific severity levels. Each mitigation activity is assumed to affect a specified set of severity levels and does not change the probability of a failure for the other severity levels.

[Figure 1: log–log plot of cost (vertical axis, $1,000 to $10,000,000) versus probability (horizontal axis, 1.00E-06 to 1.00E-01), with points for Severity Levels 1 to 4.]
Fig. 1. Multiple severity model (after [18]).
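As a rough illustration of the multiple severity idea, the sketch below uses the simpler discrete form (probability times cost, summed over severity levels) rather than the area under the curve in Fig. 1. The costs and probabilities are hypothetical placeholders, not values from [18], and `apply_mitigation` is an illustrative helper name rather than part of any published model.

```python
# Hypothetical severity levels: cost per mishap (USD) and probability of mishap.
# These numbers are placeholders for illustration, not data from [18].
SEVERITY_COST = {1: 10_000_000, 2: 100_000, 3: 10_000, 4: 1_000}
BASE_PROBABILITY = {1: 1e-6, 2: 1e-4, 3: 1e-3, 4: 1e-2}


def expected_mishap_cost(probability, cost=SEVERITY_COST):
    """Discrete expected cost: sum of probability x cost over severity levels."""
    return sum(probability[level] * cost[level] for level in cost)


def apply_mitigation(probability, affected_levels, factor):
    """Scale the probability at the affected severity levels only.

    `factor` is an assumed multiplier (0.5 halves the probability); levels
    outside `affected_levels` are left unchanged, as the model assumes.
    """
    return {
        level: p * factor if level in affected_levels else p
        for level, p in probability.items()
    }


baseline = expected_mishap_cost(BASE_PROBABILITY)
mitigated_probability = apply_mitigation(BASE_PROBABILITY, {1, 2}, 0.5)
mitigated = expected_mishap_cost(mitigated_probability)
print(f"baseline: {baseline:.2f}, mitigated: {mitigated:.2f}")
```

The point of the sketch is the second function: a mitigation touches only the severity levels it is assumed to affect, so the expected cost at the remaining levels is unchanged.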

The model described in this section determines the expected number of failures at each severity level rather than calculating the probability of failure at each severity level. This is done because some failures may occur more than once during the life of the product, hence the cost of (multiple) failures is accounted for. We refer to this as the Projected Cost of Failure Consequences (PCFC) for the fleet (population) of products.¹ An overview of the steps in the model is shown in Fig. 2.

The first step in the model is to identify and describe each relevant failure by determining the part affected by the failure, and the failure mode, cause, and mechanism associated with an occurrence of that failure. Additionally, each failure is defined by an application-specific severity level. The severity level determines the cost associated with an occurrence of the failure.

Next, the number of failures expected to occur over the service life of the product at each severity level is determined. This is an application-specific calculation (see the case study in Section 3 for the methodology used). The collective expected number of failures for each severity level is called the severity level profile. The calculation of the expected number of failures per product per unit lifetime for each distinct severity level is given by:

$$f_i = \sum_{j=1}^{n} f_j \qquad (1)$$

where $f_i$ is the expected number of failures of severity i per product per unit lifetime; and n is the number of ways a product can experience failure at severity level i.

Fig. 2. Modeling steps.
1. Determine all relevant failure modes
2. Determine the expected number of occurrences and costs per occurrence for each failure mode
3. Determine the total cost of failure
4. Select a set of mitigating activities
5. Determine the cost of performing the mitigating activities and repeat steps 2 and 3 with the mitigation activities used
6. Determine the ROI of performing the activities

2.3. Using FMMEA data to determine the initial PCFC

Assuming a repairable system, each failure experienced by the system is described by two characteristics: the severity of failure and the frequency of occurrence of that failure. Severity correlates to the cost of the actions that the system or product owner will have to take to correct or compensate for the effects of a failure after it has occurred. One possible source of data for determining a PCFC is a Failure Modes, Mechanisms, and Effects Analysis (FMMEA) report (e.g. [21]).²

Most FMMEAs in use today qualitatively describe severity and frequency of failure, whereas to be used in this model each failure’s severity and frequency must be quantitatively defined. Each failure’s severity and frequency will be used to determine: (1) the expected cost that the system owner will incur for every instance of the occurrence of that failure, and (2) the number of times the failure is expected to occur over the service life of the system.

For example, in the FMMEA used for the case study in this paper, severity of failure is rated on a scale of 1–5, with a severity 5 failure defined as a minor nuisance and a severity 1 failure defined as a catastrophic failure. Each of these severities must be assigned an expected cost associated with the consequences of the occurrence of a failure of that severity.³

The transformation of FMMEA ratings to numerical values of cost and expected number of failures is application specific. The cost associated with a certain severity of failure and expected number of failures for a given frequency rating could vary based on several factors including: operating conditions, the context the system is being used in, and the length of the service life.

Using an expected number of occurrences for each failure severity, and a cost associated with each occurrence at every failure, the PCFC for the system can be determined. Fig. 3 shows a plot of the expected number of failures and cost associated with each failure for five severity levels. The vertical axis is the number of failures expected to occur per product per service life. The service life is the required life of the system, expressed in years or temperature cycles. The horizontal axis is the cost per failure event.⁴

The cost and number of failures for each severity level are connected and form a curve as shown in Fig. 3. The area under this curve is the PCFC for the system.

$$\mathrm{PCFC}_{initial} = \int_{E_1}^{E_m} C(x)\,dx \qquad (2)$$

where $E_1$ is the expected number of severity level 1 failures ($E_m$ is the expected number of severity level m failures); m is the number of severity levels under consideration and C(x) is the cost of a failure event occurring at severity level x.

In practice the areas of the discrete trapezoids formed by the points in the curve are determined and summed using,

$$\mathrm{PCFC}_{initial} = \sum_{i=1}^{m} \left[E(i+1) + 0.5E(i)\right]\left[C(i+1) - C(i)\right] \qquad (3)$$

where E(x) is the expected number of failures per product per unit lifetime of point (severity level) x on the curve.

¹ To clarify, the models used in [18] and in this paper (although not exactly the same – see Section 2.3) are continuous risk models, i.e. they assume that probabilities are continuous, therefore the PCFC is defined as the area under the curve. However, some risk models assume the probabilities are discrete, in which case the cost of failure would be calculated as the sum of the probability of failure at each discrete severity level multiplied by the cost of failure resolution at the corresponding severity level. Both approaches are valid; continuous risk is assumed in this paper.

² A FMMEA categorizes failure events and assigns each event a rating for its severity and likelihood of occurrence. Alternatively, a Failure Modes and Effects Analysis (FMEA) or a Failure Modes Effects and Criticality Analysis (FMECA) could also be used as a source of data on the severities and frequencies of the ways a system could fail. A FMEA is very similar to a FMMEA, except that a FMEA does not analyze the mechanisms associated with each failure. Additionally, a FMECA is an extension of a FMMEA that includes a criticality analysis. Criticality analysis is a method of prioritizing failures after each failure is assigned a severity and occurrence rating, where the highest priority failures (those to be dealt with first) are those with the highest aggregate severity and occurrence ratings.

³ It should be noted that FMMEAs also describe the frequency of failure on a qualitative scale (this is usually called the “probability of occurrence”). Kmenta and Ishii [17] use the probability of occurrence; however, in the model presented in this paper, the expected number of failures per product per service life is determined from reliability distributions, not generated from the FMMEA.

⁴ The model described in this paper assumes that the cost of failure decreases linearly between severity levels. The assumed linear decrease appears as shown in Figs. 3 and 4 when graphed on a log–log plot. For the plots in the case study, the lines between severity levels are represented by straight lines (on the log–log plots) for graphical convenience.
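The severity level profile of Eq. (1) and the area calculation behind Eqs. (2) and (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the failure entries are hypothetical, the points are joined by straight lines in linear space, and the standard trapezoid rule is used with cost on the horizontal axis as in Fig. 3, which may differ in indexing or weighting from Eq. (3) as printed. The function names `severity_profile` and `pcfc` are illustrative.

```python
from collections import defaultdict


def severity_profile(failures):
    """Eq. (1): sum expected failures over all failure entries at each severity.

    `failures` is an iterable of (severity_level, expected_failures) pairs.
    """
    profile = defaultdict(float)
    for severity, expected in failures:
        profile[severity] += expected
    return dict(profile)


def pcfc(profile, cost_per_failure):
    """Area under the expected-failures versus cost-per-failure curve.

    Points are sorted by cost and joined by straight lines in linear space
    (an assumption of this sketch); each trapezoid contributes
    0.5 * (E_a + E_b) * (C_b - C_a).
    """
    points = sorted(
        (cost_per_failure[s], profile.get(s, 0.0)) for s in cost_per_failure
    )
    area = 0.0
    for (c_a, e_a), (c_b, e_b) in zip(points, points[1:]):
        area += 0.5 * (e_a + e_b) * (c_b - c_a)
    return area


# Hypothetical entries: (severity, expected failures per product per service life).
entries = [(5, 0.10), (5, 0.05), (4, 0.02), (3, 0.004), (2, 0.001), (1, 0.0001)]
# Desktop PC failure costs per severity level, as listed in Table 4.
pc_cost = {5: 10, 4: 75, 3: 150, 2: 750, 1: 1500}

profile = severity_profile(entries)
pcfc_value = pcfc(profile, pc_cost)
print(profile)
print(f"PCFC per product: {pcfc_value:.4f}")
```

Two entries share severity 5 here, so Eq. (1) adds them into a single point before the area is computed.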

[Figure 3: expected number of failures per product per service life (vertical axis, 1E-10 to 1) versus cost per failure (horizontal axis, 10 to 1000), with points for Severity Levels 5 to 1.]
Fig. 3. Expected number of failures versus cost per failure.

2.4. Activities that modify the expected number of failures

An activity is a sub-process, process, or group of processes that when performed (or applied) changes the expected number of failures over the service life of the product. Activities can be performed at multiple levels of rigor; rigor is the detail or depth at which the activity is performed. Performing an activity at a higher level of rigor has the potential for a greater reduction in the number of expected failures, but it will cost more.

Activities can affect specific failure modes, failure mechanisms, failure causes, and parts. If an activity affects the mode, mechanism, cause, or part that corresponded to a failure in the FMMEA used to create the initial severity level profile, then if that activity is performed, the expected number of failures will change. Eq. (4) shows the calculation of the new expected number of failures after activities are performed.

$$N_{ff} = N_{fi} \prod_{i=1}^{q} P_R(i, R) \qquad (4)$$

where $N_{ff}$ is the number of failures expected to occur over the service life of the product for a particular failure listed in the FMMEA after considering activities; $N_{fi}$ is the number of failures expected to occur over the service life of the product for a particular failure listed in the FMMEA before considering activities; $P_R(i, R)$ is the fractional reduction in the expected number of failures to occur over the service life of the product due to performing activity i; q is the number of activities performed that affect the failure under consideration; and R is the level of rigor at which activity i is performed.

An activity is defined by the change in failures over the service life of the product, the non-recurring (NRE) cost for each level of rigor, and the particular failure modes, failure mechanisms, failure causes, and parts the activity will impact if performed.

The cost of performing all activities, called the Total Implementation Cost ($C_{Total}$), is calculated according to,

$$C_{Total} = \sum_{i=1}^{q} C_{NRE}(i, R) \qquad (5)$$

where $C_{NRE}(i, R)$ is the cost of performing activity i at level of rigor R.

Performing an activity at level of rigor R may reduce the number of times a failure is expected to occur. The model determines which failures listed in the FMMEA each activity affects by checking if a failure’s mode, mechanism, cause, and part are impacted by the activity. The model performs the calculation for each activity on every failure listed in the FMMEA whose mode, cause, mechanism, and part are all impacted by the activity.

Once a set of activities has been chosen, the model calculates the modified PCFC for the system. First the model calculates the number of failures expected to occur at each severity level using Eq. (1) and generates a modified severity level profile. Next, the model uses the new expected number of failures (determined via a discrete event simulation that samples cycles to failure distributions through the support life of the product – see the case study) to calculate the expected PCFC of the system using,

$$\mathrm{PCFC}_{modified} = \int_{E_{1f}}^{E_{mf}} C(x)\,dx \qquad (6)$$

where $E_{1f}$ is the expected number of severity level 1 failures after activities are considered and $E_{mf}$ is the expected number of severity level m failures after activities are considered.

The difference between the initial PCFC and the modified PCFC, called the Reduction in Failure Cost, is calculated as,

$$\text{Reduction in Failure Cost} = \mathrm{PCFC}_{Initial} - \mathrm{PCFC}_{Modified} \qquad (7)$$

The Reduction in Failure Cost can be graphically represented as the difference in the areas under the curves in Fig. 4. The top curve is the expected number of failures versus PCFC before activities are considered, and the bottom curve is the expected number of failures versus PCFC after activities are considered.

2.5. Calculating return on investment

The final step in the model is to calculate the Return on Investment or ROI. The ROI is defined as the difference between return and investment divided by investment. In this model, the investment is the money spent on performing activities, the Total Implementation Cost, and the return is the PCFC that will be avoided because activities have been performed, the Reduction in Failure Cost,

$$\text{Return on Investment (ROI)} = \frac{\text{Reduction in Failure Cost} - C_{Total}}{C_{Total}} \qquad (8)$$
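Eqs. (4), (5), (7) and (8) chain together into a short calculation, sketched below with hypothetical inputs. One assumption is worth flagging: the code applies each $P_R(i, R)$ as a multiplier on the expected number of failures, following the product form of Eq. (4) as written. The function names are illustrative, and the PCFC values are placeholders standing in for the areas of Eqs. (2) and (6).

```python
from math import prod


def modified_failures(n_initial, multipliers):
    """Eq. (4): scale the initial expected failures by each activity's P_R(i, R).

    Each multiplier is treated as the factor applied to the expected number
    of failures, following the product form of Eq. (4) as written.
    """
    return n_initial * prod(multipliers)


def total_implementation_cost(nre_costs):
    """Eq. (5): sum of the NRE cost of each activity at its chosen rigor."""
    return sum(nre_costs)


def return_on_investment(pcfc_initial, pcfc_modified, c_total):
    """Eqs. (7) and (8): (reduction in failure cost - C_total) / C_total."""
    reduction = pcfc_initial - pcfc_modified
    return (reduction - c_total) / c_total


# Hypothetical inputs for illustration only.
n_ff = modified_failures(0.02, [0.8, 0.5])  # two activities affect this failure
c_total = total_implementation_cost([40_000, 25_000])
roi = return_on_investment(
    pcfc_initial=500_000, pcfc_modified=300_000, c_total=c_total
)
print(n_ff, c_total, roi)
```

An ROI of zero is the break-even point, where the avoided failure cost equals the cost of the activities; a negative ROI means the activities cost more than the failure cost they avoid.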
[Figure 4: expected number of failures per product per service life (vertical axis, 1E-10 to 1) versus cost per failure (horizontal axis, 10 to 1000), with points for Severity Levels 5 to 1.]
Fig. 4. The blue (dashed, top) curve represents the number of failures per product per unit lifetime at each severity level before activities are considered, and the red (solid, bottom) line represents the expected number of failures with the activities performed. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3. Cost implications of implementing a lead-free solder control plan – a power supply case study

In this section, the model described in Section 2 will be used to project the cost implications of implementing a lead-free solder control plan on a power supply whose manufacturer has recently changed from using tin–lead solder to lead-free solder. The case study will analyze the system under two sets of conditions: in one situation the power supply is used in desktop computers and

in the other situation the power supply is used in a commercial aircraft.

3.1. Power supply description

This case study uses a Dell power supply (Model: NPS-250 KB) with a variable 100–120 V – 9.0 A/200–240 V – 4.5 input and 5 V – 22.0 A/12 V – 14.0 A output. Fig. 5 shows the power supply’s components attached to the printed circuit board (PCB) using both through hole and surface mount. A detailed list of solder connections in the power supply is given in Table 1.

Fig. 5. Components on the PCB.

Table 1
Quantity of solder connections on the power supply PCB.
Solder joint | Capacitor | Wires | IC | Resistor
Through hole – large | 4 | 16 | 0 | 0
Through hole – medium | 0 | 3 | 0 | 0
Through hole – small | 24 | 0 | 48 | 36
Surface mount – 2 lead | 24 | 0 | 0 | 86
Surface mount – 3 lead | 0 | 0 | 0 | 3
Surface mount – 8 lead | 0 | 0 | 0 | 1

Previous analyses of power supplies in [21,22] were used to construct a full FMMEA for the power supply.

We assume that the manufacturer of the power supply has already made the decision to transition to lead-free, or that its transition is required by outside factors (e.g. RoHS legislation). It is important to note that the goal of the case study is to analyze and optimize the lead-free solder control plan activities, not to analyze the decision to convert the power supply to lead-free solder. Since lead-free control plan activities only affect solder, we are not considering all failures in the FMMEA, i.e. we will only consider the failures associated with the solder connections of the parts to the PCB – note, tin whisker mitigation activities apply to both the leads and the solder. Table 2 shows the relevant portion of the FMMEA for this case study and categorizes the solder connections on the PCB by type of connection and the parts connected to the PCB.

In the FMMEA, solder connection failures are classified based on the type of part, the size of the connection, and the type of solder connection (through hole or surface mount). For example, in the FMMEA, large through-hole solder joints connecting capacitors are one distinct “part” and there are 4 instances of this in the power supply. Open circuit and intermittent open circuit failure modes are associated with a failure cause of temperature cycling and a failure mechanism of fatigue. The short circuit mode is associated with a failure cause of conductive bridge and failure mechanism of tin whisker. The difference between an intermittent

Table 2
Solder connections portion of the full FMMEA for the power supply.

Number of parts | Part | Failure mode | Failure cause | Failure mechanism
4 | PCB – capacitor through-hole solder joint – large | Open circuit/cracked solder joint | Temperature cycling | Fatigue (a)
4 | PCB – capacitor through-hole solder joint – large | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
4 | PCB – capacitor through-hole solder joint – large | Short circuit | Conductive bridge | Tin whisker
16 | PCB – wire through-hole solder joint – large | Open circuit/cracked solder joint | Temperature cycling | Fatigue
16 | PCB – wire through-hole solder joint – large | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
16 | PCB – wire through-hole solder joint – large | Short circuit | Conductive bridge | Tin whisker
24 | PCB – capacitor through-hole solder joint – small | Open circuit/cracked solder joint | Temperature cycling | Fatigue
24 | Capacitor through-hole solder joint – small | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
24 | Capacitor through-hole solder joint – small | Short circuit | Conductive bridge | Tin whisker
22 | PCB – surface mount capacitor – 2 lead connection | Open circuit/cracked solder joint | Temperature cycling | Fatigue
22 | PCB – surface mount capacitor – 2 lead connection | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
22 | PCB – surface mount capacitor – 2 lead connection | Short circuit | Conductive bridge | Tin whisker
86 | PCB – surface mount resistor – 2 lead connection | Open circuit/cracked solder joint | Temperature cycling | Fatigue
86 | PCB – surface mount resistor – 2 lead connection | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
86 | PCB – surface mount resistor – 2 lead connection | Short circuit | Conductive bridge | Tin whisker
3 | PCB – surface mount resistor – 3 lead connection | Open circuit/cracked solder joint | Temperature cycling | Fatigue
3 | PCB – surface mount resistor – 3 lead connection | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
3 | PCB – surface mount resistor – 3 lead connection | Short circuit | Conductive bridge | Tin whisker
1 | PCB – surface mount resistor – 8 lead connection | Open circuit/cracked solder joint | Temperature cycling | Fatigue
1 | PCB – surface mount resistor – 8 lead connection | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
1 | PCB – surface mount resistor – 8 lead connection | Short circuit | Conductive bridge | Tin whisker
2 | PCB-IC – 14 lead connection | Open circuit/cracked solder joint | Temperature cycling | Fatigue
2 | PCB-IC – 14 lead connection | Intermittent open circuit/cracked solder joint | Temperature cycling | Fatigue
2 | PCB-IC – 14 lead connection | Short circuit | Conductive bridge | Tin whisker

(a) Only thermal fatigue is considered in the present model. Fatigue due to mechanical overstress (i.e. drop shock) is not included.
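Section 2.4 states that an activity applies to a failure only when the failure’s mode, mechanism, cause, and part are all impacted by the activity. A minimal sketch of that matching step is given below, using the first three rows of Table 2. The `FailureEntry` and `Activity` records, the convention that an empty set means "any value", and the example activity are all assumptions made for illustration, not the authors’ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEntry:
    """One FMMEA row: part count, part, failure mode, cause, and mechanism."""
    count: int
    part: str
    mode: str
    cause: str
    mechanism: str


@dataclass(frozen=True)
class Activity:
    """Hypothetical activity record; an empty set means 'impacts any value'."""
    name: str
    parts: frozenset = frozenset()
    modes: frozenset = frozenset()
    causes: frozenset = frozenset()
    mechanisms: frozenset = frozenset()

    def affects(self, entry: FailureEntry) -> bool:
        # The entry is affected only when all four attributes are impacted.
        pairs = (
            (self.parts, entry.part),
            (self.modes, entry.mode),
            (self.causes, entry.cause),
            (self.mechanisms, entry.mechanism),
        )
        return all(not allowed or value in allowed for allowed, value in pairs)


# First three rows of Table 2.
PART = "PCB – capacitor through-hole solder joint – large"
fmmea = [
    FailureEntry(4, PART, "Open circuit/cracked solder joint",
                 "Temperature cycling", "Fatigue"),
    FailureEntry(4, PART, "Intermittent open circuit/cracked solder joint",
                 "Temperature cycling", "Fatigue"),
    FailureEntry(4, PART, "Short circuit", "Conductive bridge", "Tin whisker"),
]

# Illustrative activity that impacts only the tin whisker mechanism.
whisker_activity = Activity(
    "tin whisker mitigation", mechanisms=frozenset({"Tin whisker"})
)
affected = [entry for entry in fmmea if whisker_activity.affects(entry)]
print([entry.mode for entry in affected])
```

Keeping the four attributes as separate sets makes it straightforward to describe an activity that targets, say, one mechanism across every part, or one part type across every mechanism.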

open circuit and a "non-intermittent" open circuit is that an intermittent open circuit will close after a period of time, and that a "non-intermittent" open circuit stays open until maintenance is performed. In this case study it is assumed that failures associated with intermittent open circuits are less severe than failures associated with permanent open circuits (this is not necessarily always true depending on the application). Note that shock has been omitted from the case study: while shock can be a significant cause of failure in this type of system, it has been omitted to simplify the example.

3.2. Environmental and operating conditions, and consequences of failure

This case study analyzes the cost implications of implementing a lead-free control plan for the power supply for two cases: one where the power supply is used in a desktop computer; and the other where the power supply is used in a commercial aircraft. A desktop computer is in an environment that is assumed to have stable temperatures, pressure, and humidity, while in an aircraft we are assuming that the power supply is operating in the unpressurized, non-climate controlled tail of the aircraft, colloquially known as the "hell hole." The conditions in the hell hole are assumed to be those defined by [23].

The power supply may also have different service lives and rates of use depending on its application. For this case study, we assume that a typical commercial aircraft has an expected service life of 20 years.5 Alternatively, we assume that when the power supply is used in a PC it has an expected service life of 5 years. When used in an aircraft, we assume the power supply will experience one temperature cycle per flight, that the aircraft is making an average of 6 flights per day, and that it operates 300 days per year. Similarly, we will assume that the PC will encounter 1800 temperature cycles per year (on/off and sleep cycles). Table 3 summarizes the operational expectations of the power supply when used in both applications.

Table 3
Usage conditions for the power supply.
 | PC | Commercial aircraft
Temperature cycles (per lifetime) (c) | 9000 | 36,000
Service life (years) | 5 | 20
Temperature cycles per year | 1800 | 1800
Number of units in service | 100,000 | 500

The consequences of failure for a power supply in an aircraft can be far greater than for a power supply in a desktop computer. In the context of this paper, we consider the consequences of failure in terms of the financial loss to the entity or entities responsible for the performance of the system. For the PC case, we assume the entity responsible for failure costs is the manufacturer, and the PC is under warranty for the service life (five years). In the commercial aircraft case, we assume the entity responsible for failure costs is an airline (the system operator). Table 4 shows the assumed consequences and likelihoods of varying severity and occurrence ratings of failure when the power supply is used in a desktop PC and a commercial airplane.

Table 4
Consequences and likelihoods of varying severity and occurrence ratings of failure for the power supply used in a desktop PC and a commercial aircraft.
Severity of failure | Desktop PC: failure event associated with | Desktop PC: failure cost | Commercial aircraft: failure event associated with | Commercial aircraft: failure cost
5 | Minor nuisance | $10 | Minor nuisance | $100
4 | Repair of power supply | $75 | Repair of power supply | $2500
3 | Replacement of power supply | $150 | Replacement of power supply | $5000
2 | Replacement of power supply, collateral damage to PC | $750 | Repair or replace, interrupting flight schedule | $25,000
1 | Loss of entire PC | $1500 | Repair or replace, causes collateral damage | $250,000

In this case study we assume that all repair and replace maintenance actions associated with a part result in a good-as-new part in the product. Also, the "minor nuisance" failure event could be a "no fault found" failure event. This case study assumes that the financial consequences of a no fault found event (severity level 5) do not change if multiple no fault found events occur on the same board.6

In the case study in this paper the lead-free solder was assumed to be SAC305 solder and its cycles to failure are modeled with a Weibull distribution,

F(c) = 1 - e^{-\left(\frac{c-\gamma}{\eta}\right)^{\beta}}    (9)

where F(c) is the cumulative distribution of failure; c is the number of temperature cycles (Table 3); β is the Weibull shape parameter; η is the Weibull scale parameter; and γ is the Weibull location parameter.7 A value of 2.9 was assumed for β based on testing done by [24,25]; however, the characteristic life (η) of SAC305 solder for the conditions in the case study is not well known, so a range of values from 25,000 to 75,000 cycles was assumed in the results that follow. The location parameter (γ) was assumed to be zero in all cases.8

A discrete event simulator that sampled the respective cycles to failure distributions for each of the product's parts was used to determine the sequence of failure events (a Monte Carlo approach was used). The discrete event simulator was run through the entire service life of the product to determine the total failure counts for each part in the product. One hundred independent time histories of the products analyzed in the case study were run to build the results provided.

3.3. Lead-free control plan activities

The lead-free control plan is a set of activities that a manufacturer can choose or be required to perform to ensure product performance.

The activities considered in this paper, summarized in Table 5, are from the PERM working group [4]. For this study, the activities are applied to both applications of the power supply (in a desktop computer or commercial aircraft). Note: the standard was written for the aerospace high-reliability electronics industry. It is not in general use in the commercial computer industry.

5 We have assumed that the power supply is not flight-critical hardware. However, if the failure of the power supply is severe enough to render the power supply non-operational, then the aircraft is not allowed to fly again until corrective action is taken. The wait time for corrective action could result in significant financial consequences for the aircraft operator.
6 Some organizations have policies limiting the number of no fault found events. These policies may require that if a board has encountered a specific number of no fault found events (for example three), and the underlying cause is not discovered, the board is thrown away. Adapting these policies within the model would require that a third severity level 5 event on a particular board would actually be a severity level 3 event; however, this case has not been considered in this case study.
7 We have assumed for this study only that all the relevant failure mechanisms are driven by temperature cycling.
8 In reality, the Weibull parameters would be expected to differ for the SMT versus through-hole solder joints, and for the same solder joints in the two different environments considered in the case study; however, insufficient data exists to represent this differentiation.
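The usage arithmetic, the Weibull model of Eq. (9), and the socket renewal logic of the discrete event simulator described in Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' simulator: the function names are hypothetical, it treats a single solder joint (socket) in isolation, and it ignores failure severity and cost. Only the shape parameter (2.9), the zero location parameter, the range of characteristic lives, and the usage conditions come from the case study.

```python
import math
import random

BETA = 2.9   # Weibull shape parameter assumed in the case study
GAMMA = 0.0  # Weibull location parameter assumed in the case study


def cycles_per_year(flights_per_day, operating_days_per_year):
    """One temperature cycle per flight (aircraft usage assumption)."""
    return flights_per_day * operating_days_per_year


def weibull_cdf(c, eta, beta=BETA, gamma=GAMMA):
    """Eq. (9): probability that a joint has failed by c temperature cycles."""
    if c <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((c - gamma) / eta) ** beta))


def sample_cycles_to_failure(rng, eta, beta=BETA, gamma=GAMMA):
    """Inverse-transform sample of Eq. (9): solve F(c) = u for c."""
    u = rng.random()
    return gamma + eta * (-math.log(1.0 - u)) ** (1.0 / beta)


def socket_failures(rng, service_cycles, eta):
    """Count failures in one socket over the service life.

    After each failure the part is assumed good-as-new, so the socket
    resamples until the cumulative lives cover the service life.
    """
    failures = 0
    cumulative = sample_cycles_to_failure(rng, eta)
    while cumulative < service_cycles:
        failures += 1
        cumulative += sample_cycles_to_failure(rng, eta)
    return failures


def mean_failures(service_cycles, eta, trials=100, seed=0):
    """Average failure count per socket over a number of time histories."""
    rng = random.Random(seed)
    total = sum(socket_failures(rng, service_cycles, eta) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    aircraft_cycles = cycles_per_year(6, 300) * 20  # 36,000 cycles in 20 years
    pc_cycles = 1800 * 5                            # 9000 cycles in 5 years
    for eta in (25_000, 50_000, 75_000):
        print(eta, mean_failures(pc_cycles, eta), mean_failures(aircraft_cycles, eta))
```

Under these assumptions the aircraft socket, which must survive four times as many cycles as the PC socket, accumulates more failures for any given characteristic life, which is the effect the case study results depend on.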

Table 5
Lead-free control plan activities.
Activity name | Brief description | Failure modes impacted | Failure causes impacted | Failure mechanisms impacted
Risk and limitations of use | Processes that identify and report limitations on system operation, to avoid unacceptable levels of risk to performance, reliability, safety, or airworthiness due to the use of lead-free solder or finishes. Include limitations on incompatible materials, environmental conditions, maintenance, rework, and repair and other risks | Solder joint intermittent and open circuits | Fatigue stresses, over loads, and poor quality | Material fatigue due to temperature cycling
Deleterious effects of tin whiskers | Plan to mitigate the deleterious effects of tin whiskers, prepared and approved and implemented in compliance to the requirements of GEIA-STD-0005-2 | Short circuits | Conductive bridge between conductors | Tin whisker
Repair rework maintenance and support | Are the requirements of this standard applied equally to original equipment manufacturing and repair, rework, maintenance and support activities? | Solder joints intermittent and open circuits; short circuits | Fatigue stresses, over loads, and poor quality and conductive bridge between conductors | Material fatigue due to temperature cycling and tin whisker
System reliability | Are the effects of lead-free solder and termination finishes on solder joint infant mortality, failure rates and wear out monitored and the impact to product and system level safety, reliability and maintainability determined? When performance is degraded and/or when failure trends dictate detailed investigation, specific attention shall be given to the effectiveness of mitigation of tin whisker growth and subsequent impact on reliability performance | Solder joints intermittent and open circuits; short circuits | Fatigue stresses, over loads, and poor quality and conductive bridge between conductors | Material fatigue due to temperature cycling and tin whisker
Product and system level reliability | Qualification of the lead-free solder and termination finishes may include additional evaluation of reliability and durability at the product/system level. The evaluation is performed to obtain additional data on how the electrical and mechanical characteristics of the assembled product affect the transfer of thermal and mechanical environmental stresses from the product level to the solder joint level | Solder joints intermittent and open circuits; short circuits | Fatigue stresses, over loads, and poor quality and conductive bridge between conductors | Material fatigue due to temperature cycling and tin whisker
Environmental and operating conditions | The life-cycle environmental and operating conditions for the given application (for the individual assembly) known, and used in assessing the reliability of the given materials and processes in the given application? | Solder joints intermittent and open circuits; short circuits | Fatigue stresses, over loads, and poor quality and conductive bridge between conductors | Material fatigue due to temperature cycling and tin whisker

Table 6
Cost and benefit data for various levels of rigor of performing the activity "risk and limitations of use" (NRE = non-recurring).
Level of rigor | Fractional change in failures over the product service life | Mode | Low | High
1 | 1.00 | – | – | –
2 | 1.00 | – | – | –
3 | Triangular distribution | 0.85 | 0.70 | 1.00
4 | Triangular distribution | 0.50 | 0.40 | 0.60
5 | Triangular distribution | 0.25 | 0.15 | 0.35
Level of rigor | NRE cost | Mode | Low | High
1 | Uniform distribution | $1,000,000 | $500,000 | $1,500,000
2 | Uniform distribution | $2,000,000 | $1,500,000 | $2,500,000
3 | Uniform distribution | $3,000,000 | $2,500,000 | $3,500,000
4 | Uniform distribution | $4,000,000 | $3,500,000 | $4,500,000
5 | Uniform distribution | $5,000,000 | $4,500,000 | $5,500,000

Fig. 6. Results for the PC, η = 25,000 cycles. (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)
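To make the use of Table 6 concrete, the sketch below samples the fractional change in failures and the NRE cost for a chosen level of rigor and turns them into a return on investment for one trial. It rests on stated assumptions rather than on the authors' implementation: the function names are hypothetical, ROI is taken as (return - investment) / investment, a single activity is applied, and every failure contributing to the initial PCFC is assumed to be affected by that activity.

```python
import random

# Table 6: (mode, low, high) of the triangular fractional change in failures.
# Levels 1 and 2 leave the number of failures unchanged (fraction 1.00).
FRACTION = {
    1: None,
    2: None,
    3: (0.85, 0.70, 1.00),
    4: (0.50, 0.40, 0.60),
    5: (0.25, 0.15, 0.35),
}

# Table 6: (low, high) of the uniform NRE cost in dollars.
NRE_COST = {
    1: (500_000, 1_500_000),
    2: (1_500_000, 2_500_000),
    3: (2_500_000, 3_500_000),
    4: (3_500_000, 4_500_000),
    5: (4_500_000, 5_500_000),
}


def sample_fraction(rng, level):
    """Fraction of failures remaining after the activity is performed."""
    params = FRACTION[level]
    if params is None:
        return 1.0
    mode, low, high = params
    return rng.triangular(low, high, mode)


def sample_cost(rng, level):
    """Investment cost of performing the activity at this level of rigor."""
    low, high = NRE_COST[level]
    return rng.uniform(low, high)


def trial_roi(rng, initial_pcfc, level):
    """ROI of one trial, assuming all of the initial PCFC is affected."""
    investment = sample_cost(rng, level)
    remaining = sample_fraction(rng, level)
    benefit = initial_pcfc * (1.0 - remaining)  # reduction in PCFC
    return (benefit - investment) / investment


if __name__ == "__main__":
    rng = random.Random(0)
    initial_pcfc = 20_000_000.0  # hypothetical value, not from the case study
    rois = sorted(trial_roi(rng, initial_pcfc, level=3) for _ in range(100))
    print(rois[0], rois[len(rois) // 2], rois[-1])
```

With this definition, a trial in which the activity produces no reduction in PCFC has an ROI of -100%, since the whole investment is lost.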

The system implementer may have the choice to perform or not perform each activity in the control plan, and each activity can be performed at various levels of rigor. For example, the cost and benefit details for the activity "Risk and limitations of use" assumed in this case study are given in Table 6.

3.4. Case study results

Each application of the power supply was run for 100 trials (life histories). Each trial calculates the initial PCFC by sampling the cycles to failure distributions for each part in the product (see analysis description after Eq. (9)). In this paper, each solder joint on the power supply represents a socket (a place where a part goes). Each socket must complete the number of cycles defined by the service life. If a solder joint does not last for its service life then it is assumed that corrective action is taken (repair or replace) and the socket samples the cycles to failure distribution again until the cumulative lives (in cycles) of the parts in the socket are greater than or equal to the service life (also in cycles). Then each trial calculates: an investment cost (the cost of performing activities) by sampling the cost distribution defining the cost of performing the activity at the level of rigor chosen; a return (the reduction in PCFC after performing activities) by sampling the distribution defining the fractional reduction in failures for each activity performed and applying the fractional reduction in failures to the failures in the FMMEA that the activity affects; and an ROI. Thus, for each trial, the initial PCFC, investment cost, and return could be different because the parameters that determine them are defined as distributions that are sampled for each trial.

3.4.1. Desktop computer results

In this section, the lead-free control plan is applied to the power supply used in a desktop computer. The results of the case study are shown in Figs. 6 through 9, where the blue (dashed) lines represent the system before the lead-free control plan activities are performed, and the red (solid) lines represent the system after the lead-free control plan activities are performed. Note that while 100 trials were performed, Figs. 6 and 8 only show the results of 15 of the trials. Figs. 7 and 9 show histograms of all 100 ROIs that were calculated.

In Fig. 6 and the similar figures that follow, 15 trials (randomly selected from the 100 generated) are shown. Each trial represents one possible future for the system. In Fig. 6 the system without the lead-free control plan activities (the blue dashed line) generally has a higher expected number of failures than the system after control plan activities are performed; however, this is not universally true: in a few trials performing control plan activities leads to a worse result. Note that performing control plan activities does not make the cost of failure lower (we assume that the same failure always costs the same to resolve). Any positive return on investment is the result of changes in the number of expected failure events or changes in the severity of the failure events.

Fig. 7. Histogram of ROIs for the PC, η = 25,000 cycles. (Axes: count versus return on investment (ROI).)

Fig. 8. Results for the PC, η = 50,000 cycles. (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)

Fig. 9. Histogram of ROIs for the PC, η = 50,000 cycles. (Axes: count versus return on investment (ROI).)

Fig. 10. Results for the commercial aircraft, η = 25,000 cycles. (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)

Table 7
ROI values for all cases considered.
Application | Statistic | η = 25,000 cycles (%) | η = 50,000 cycles (%) | η = 75,000 cycles (%)
PC | Median ROI | 189 | -78 | -100
PC | Minimum ROI | 4 | -100 | -100
PC | Maximum ROI | 523 | 48 | 28
Commercial aircraft | Median ROI | 956 | 240 | 130
Commercial aircraft | Minimum ROI | 344 | 70 | -73
Commercial aircraft | Maximum ROI | 1633 | 928 | 897

When η = 50,000 cycles the median ROI is negative, because the cost of performing activities is so high that it is greater than the benefit of performing activities in 90% of the trials. But, when
η = 25,000 cycles, the initial PCFC is large enough that paying for activities to reduce it is cost effective. Results for η = 75,000 cycles also appear in the summary (Table 7).

3.4.2. Commercial aircraft results

Next we perform the case study again for the power supply in a commercial aircraft. All parameters are the same in this case study as for the PC, except that the PCFC associated with each severity level of failure is much greater because the power supply is being used in an airplane, and the service life of the aircraft is 20 years. As in the PC case study, the study is run for varying values of the characteristic life of the solder. Figs. 10–13 show the results.

Fig. 11. Histogram of ROIs for the commercial aircraft, η = 25,000 cycles. (Axes: count versus return on investment (ROI).)

Fig. 12. Results for the commercial aircraft, η = 50,000 cycles. (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)

Fig. 13. Histogram of ROIs for the commercial aircraft, η = 50,000 cycles. (Axes: count versus return on investment (ROI).)

3.4.3. Case study summary

Table 7 shows the minimum, median, and maximum values of ROI for each of the cases considered.

The model indicates that performing activities provides more value in the commercial aircraft scenario than in the PC scenario. Performing activities is only viable for the PC scenario when the solder has the lowest value of η. When the power supply is used in an aircraft, there is a good case for performing activities in all three scenarios, but when η = 75,000, there is a chance that performing activities will result in a negative ROI.

Fig. 14. Results for commercial aircraft, η = 50,000 cycles, activities performed at the highest level of rigor (level 5). (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)

Fig. 15. Histogram of ROIs for the commercial aircraft, η = 50,000 cycles, activities performed at the highest level of rigor (level 5). (Axes: count versus return on investment (ROI).)

3.4.4. Performing activities at the highest level of rigor

If the lead-free control plan activities are performed at level 5 rigor for the commercial aircraft case (instead of level 3 assumed
in the results shown in Figs. 12 and 13), the results in Figs. 14 and 15 are obtained.

The median ROI when all activities are performed at level of rigor 3 is 240% and the median ROI when activities are performed at level of rigor 5 is 265%. There is not much change in ROI because the additional benefits of performing activities at a higher level of rigor cost more to attain.

3.4.5. Activities that are not independent

In all the cases considered so far, we have assumed that activities are independent, implying that if multiple activities that affect the same mode, mechanism, cause, or part are performed, the full benefit of each activity is realized. Now we assume that performing multiple activities reduces the benefit of performing other activities that affect the same mode, mechanism, cause, or part. In this scenario, we assume that the user performs all activities (because they are required to do so by regulation or the customer), but after performing one activity that affects a particular mode, mechanism, cause, or part, performing additional activities has no effect on that mode, mechanism, cause, or part. Two simulations were run, one with activities performed at level 3 of rigor and the other with activities performed at level 5 of rigor. The results are shown in Figs. 16 through 19.

Fig. 16. Results for the commercial aircraft, η = 50,000 cycles, activities performed at level of rigor 3, activities are not independent. (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)

Fig. 17. Histogram of ROIs for the commercial aircraft, η = 50,000 cycles, activities performed at level of rigor 3, activities are not independent. (Axes: count versus return on investment (ROI).)

Fig. 18. Results for the commercial aircraft, η = 50,000 cycles, activities performed at level 5 of rigor, activities are not independent. (Axes: expected number of failures per product per service life versus cost of each failure; curves: system prior to and after performing lead-free control plan risk mitigation activities.)

Fig. 19. Histogram of ROIs for the commercial aircraft, η = 50,000 cycles, activities performed at level 5 of rigor, activities are not independent. (Axes: count versus return on investment (ROI).)

The median ROI when activities are not independent and performed at level of rigor 3 is 26%, and when activities are performed at level of rigor 5 is 175%. Clearly, when activities are not independent, their effectiveness is reduced.

4. Summary and discussion

Adoption and insertion of new technologies and processes into systems is inherently risky. An assessment of the cost of risk may be a necessary part of planning or building a business case to change a system. A cost-based FMEA model that forecasts the cost of risk associated with inserting a new technology into a system has been used to assess a lead-free control plan for the same product in various risk scenarios. In the model the projected cost of failure consequences (PCFC) is defined as the cost of all failure events (of varying severity) that are expected to occur over the service life of the system. The PCFC is uncertain, and the potential positive impact of adopting new technologies into the system is to reduce the cost of risk and/or reduce its uncertainty.

The case study presented assesses the adoption of a lead-free solder control plan (required by customers) into a system that previously used tin–lead solder. The case study applied the model
to two applications: a power supply in a personal computer (PC) and in a commercial aircraft. This case study was performed to show that if one had accurate data on the PCFC for a system, the cost of performing various activities, and the benefit of performing the same activities, a judgment could be made, with a quantifiable level of certainty, as to the cost-effectiveness of performing some or all of the activities in the control plan. In the case study performed for this paper, performing activities was far more cost effective when the power supply was used in a commercial aircraft than when used in a PC, because the power supply had a greater service life requirement and higher financial consequences of failure when used in an aircraft. The power supply is projected to fail more often over its service life in an aircraft and the entities responsible for supporting the power supply incur more cost when the power supply fails; hence there is more benefit to spending money to reduce the expected number of failures.

The basic model presented in this paper assumes that the risk mitigation activities are independent of each other, that is, performing one activity does not affect the benefit associated with performing another.9 This will not always be the case, since multiple activities may impact the same failure mechanism. The current architecture of the model can accommodate narrowly defining the application of activities to specific parts, specific modes, specific mechanisms, and/or specific causes; however, the current model can only assume either the best case (independence of activity impacts), or the worst case (once one activity is performed on a specific mode, mechanism, or cause of failure for a specific part, performing additional activities that affect the same mode, mechanism, or cause on the part results in no additional benefit). Additionally, some activities may be effectively "grouped", i.e. one activity may be a pre-requisite for another. While correlating inputs in the model may accommodate some of the possible dependencies, fundamental changes to the architecture of the model could be needed.

Redundancy in systems potentially needs to be accommodated in the model. In the aircraft case study in this paper, the power supply is assumed to be non-flight critical and therefore redundancy has been ignored. If the power supply was a flight-critical item, it could be redundant with another identical power supply, and if the power supply being modeled fails the redundant power supply immediately takes over. We model various failure events where the power supply is repaired or replaced during scheduled maintenance, and we also model failures that are severe enough that the aircraft cannot take off because the power supply needs to be replaced before scheduled maintenance. However, the present model does not consider the situation where one power supply fails, the redundant power supply takes over, and then the redundant power supply also fails in the same flight (and the power supply is flight critical). This is a situation that has very low probability, but one that should be considered if all potential failure events were to be modeled. This could be modeled by a discrete event simulator that models every flight an airplane takes over its service life and checks if the second redundant power supply fails after it takes over. Treatment of flight-critical systems would potentially also require modeling common-mode failures and single point of failure analysis. For flight-critical systems, redundancy is one of the primary system architecture design strategies to achieve and certify the legally required probability of a severe or critical failure of 10^-9 or better.

9 An example of the non-independence of activities is considered in Section 3.4.5 (Figs. 16–19); however, the non-independence of activities is potentially much more complex than the example provided.

Acknowledgements

The authors also wish to thank the members of the PERM Consortium and the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland for funding this work.

References

[1] Ganesan S, Pecht MG. Lead-free electronics. Wiley; 2006.
[2] Titas J. Was lead-free solder worth the effort? Electron Compo News 2011 [December 28].
[3] Pinsky DA, Rafanelli AJ, Condra LW, Amick PJ, Anderson V. How the aerospace industry is facing the lead-free challenge. LEAP-WG White Paper. Aerospace Industries Association; 2006.
[4] ANSI-GEIA-STD-0005-1. Performance standard for aerospace and military electronic systems containing lead-free solder. APMC-Avionics Process Management & G12-Solid State Devices Committee, Revision/Edition: A Chg: Date: 03/00/12; 2012.
[5] Bradley E, Handwerker CA, Bath J, Parker RD, Gedney RW. Lead-free electronics: iNEMI projects lead to successful manufacturing. Wiley-IEEE Press; 2007.
[6] Puttlitz KJ, Stalter KA. Handbook of lead-free solder technology for microelectronic assemblies. Monticello (NY): Marcel Dekker, Inc.; 2004.
[7] Palesko A. Breaking down the cost impact of lead-free manufacturing. In: Proceedings of the international conference on electronic packaging, Tokyo Japan; April 18–20, 2007. p. 187–92.
[8] Sandborn P, Jafreen R. Cost of accommodating the transition to lead-free electronics. In: Proceedings of the 2007 Aging Aircraft Conference, Palm Springs, CA; April 2007.
[9] Barringer HP. Life cycle cost & reliability for process equipment. In: Proceedings energy week conference & exhibition, Houston, TX; 1997.
[10] Sears Jr RW. A model for managing the cost of reliability. In: Proceedings of the annual reliability and maintainability symposium, Orlando, FL, USA; January 29–31, 1991. p. 64–9.
[11] Jiang ZH, Shu LH, Benhabib B. Reliability analysis of non-constant-size part populations in design for remanufacturing. ASME J Mech Des 2000;122:172–8.
[12] Hauge BS, Johnston DC. Reliability centered maintenance and risk assessment. In: Proceedings of the annual reliability and maintainability symposium, Philadelphia, PA, USA; January 22–25, 2001. p. 36–40.
[13] Perera J, Holsomback J. An integrated risk management tool and process. In: Proceedings of the IEEE aerospace conference, Big Sky, MT, USA; March 5–12, 2005. p. 129–36.
[14] Sun B, Shu G, Podgurski A, Ray S. CARIAL: Cost-aware software reliability improvement with active learning. In: Proceedings of the IEEE international conference on software testing, verification and validation (ICST), Montreal, Canada; April 17–21, 2012. p. 360–9.
[15] Liu R, Boggs S. Cable life and the cost of risk. IEEE Electr Insul Mag 2009;25:13–9.
[16] Rhee S, Ishii K. Using cost based FMEA to enhance reliability and serviceability. Adv Eng Inform 2003;17:179–88.
[17] Kmenta S, Ishii K. Scenario-based failure modes and effects analysis using expected cost. ASME J Mech Des 2004;126:1027–35.
[18] Taubel J. Use of the multiple severity method to determine mishap costs and life cycle cost savings. In: Proceedings from the 29th international system safety conference, Las Vegas, NV, USA; August 8–12, 2011.
[19] MIL-STD-882C U.S. Department of Defense; 1993.
[20] Sandborn P. Cost analysis of electronic systems. Singapore: World Scientific; 2013.
[21] Matthew S, Alam M, Pecht M. Identification of failure mechanisms to enhance prognostic outcomes. J Fail Anal Prev 2012;12:66–73.
[22] Oh H, Azarian MH, Pecht M. Physics-of-failure approach for fan PHM in electronics applications. In: Proceedings of the prognostics & system health management conference, Macau, China; January 12–14, 2010.
[23] Das D. Use of thermal analysis information in avionics equipment development. Electron Cool 1999;5:28–34.
[24] George E, Osterman M, Pecht M, Coyle R. Effects of extended dwell time on thermal fatigue life of ceramic chip resistors. In: Proceedings of the 45th international symposium on microelectronics, San Diego, CA, USA; September 9–13, 2012.
[25] Wang W, Osterman M, Das D, Pecht M. Solder joint reliability of SnAgCu solder refinished components under temperature cycling test. IEEE Trans Compo, Pack Manuf Technol 2011;1:798–808.
