What Is Wrong With The Existing Reliability Prediction Methods?
251-257 (1990)
SUMMARY
Inaccurate reliability predictions could lead to disasters such as in the case of the U.S. Space Shuttle
failure. The question is: ‘what is wrong with the existing reliability prediction methods?’ This paper
examines the methods for predicting reliability of electronics. Based on information in the literature,
measured and predicted reliability could differ by a factor of five to twenty. Reliability calculated
using the five most commonly used handbooks showed that there could be a 100 times variation. The
root cause for the prediction inaccuracy is that many of the first-order effect factors are not explicitly
included in the prediction methods. These factors include thermal cycling, temperature change rate,
mechanical shock, vibration, power on/off, supplier quality difference, reliability improvement with
respect to calendar years and ageing. As the data provided in this paper indicate, neglecting any one of
these factors could change the predicted reliability by several times. The reliability vs
ageing-hour curve showed that there was a 10 times change in reliability from 1000 ageing-hours to
10,000 ageing-hours. Therefore, in order to increase the accuracy of reliability prediction the factors must
be incorporated into the prediction methods.
KEY WORDS Reliability prediction; Mil-Hdbk-217; Environmental factors; Environmental stress screening; Ageing
[Figure: plot showing a point estimate with upper and lower confidence limits; caption and axis titles not recovered]
(a) RPP: Bellcore reliability prediction procedure
(b) 217D: Mil-Hdbk-217D
(c) BT: British Telecom Handbook
(d) CNET: French National Center of Telecommunications failure data compilation
(e) NTT: Nippon Telegraph and Telephone Reliability Table
(f) Supplier A or B: Memory board supplier's own procedure

The predictions in Table II clearly indicate that each method came up with its own reliability number
different from the others. No wonder that the predictions simply do not match the experienced numbers.
Part of the reason for the inaccuracy lies in the inaccuracy of the values assigned to the various
input parameters which in turn affect the failure rates. Expansion of the failure database to correlate
the parameters to the failure rates can improve the accuracy in this area. The most important reason
for the inaccuracy lies in the fact that many critical factors that influence reliability in the first
order manner are not even considered in the existing models. These critical factors will be discussed
individually in the remainder of this paper.
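
The two numeric columns of Table II (FITS and percentage per year) are related by a unit conversion, which the sketch below makes explicit. It assumes that one FIT means one failure per 10^9 device-hours and that a year has 8760 hours; the function name and the year length are illustrative choices, not taken from the paper. Under those assumptions the computed values agree closely with the tabulated percentages, which appear to be truncated rather than rounded.

```python
# Assumption: 1 FIT = 1 failure per 1e9 device-hours, 8760 hours per year.
HOURS_PER_YEAR = 8760


def fits_to_percent_per_year(fits: float) -> float:
    """Convert a failure rate in FITs to expected failures per 100 boards per year."""
    return fits * 1e-9 * HOURS_PER_YEAR * 100


# FITS values as listed in Table II.
table_ii_fits = {
    "RPP": 38_500,
    "217D": 4_240_460,
    "BT": 700,
    "CNET": 37_870,
    "NTT": 37_940,
    "Supplier A": 56_280,
    "Supplier B": 19_600,
}

for procedure, fits in table_ii_fits.items():
    percent = fits_to_percent_per_year(fits)
    print(f"{procedure:<12}{fits:>12,}{percent:>10.1f}")
```

A value above 100 per cent, as for 217D, is best read as an expected number of failures per board-year under a constant failure rate, not as a probability.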
WHAT IS WRONG? 253
[Table header recovered without its rows. Column headings: Equipment type; Number of failures; 1985 cumulative period hours; 1985 cumulative demonstrated MTBF; 90 per cent confidence; Predicted MTBF; Cumulative field MTBF divided by predicted MTBF; Range of quarterly MTBF values (Low, High)]
Table II. Predicted reliability of a large memory board

                 Board failure rate
Procedure        FITS           Percentage per year
RPP              38,500         33
217D             4,240,460      3713
BT               700            0.6
CNET             37,870         33
NTT              37,940         33
Supplier A       56,280         49
Supplier B       19,600         17

THE REAL ENVIRONMENTS

The so-called environmental factors used in the existing reliability prediction models are usually
multipliers for modifying the basic failure rates to reflect the influence of the particular environments
on generation of failures. These factors are usually classified into broad categories such as 'ground
fixed', 'ground mobile', 'airborne fighter, uninhabited', and 'missile launch'. Temperature is usually
considered separately. Thus, except for temperature, these general environmental factors include
many other kinds of real environments: e.g. vibration, mechanical shock, thermal cycling, thermal
shock, thermal transient, electrical transient, power on/off, moisture, humidity, altitude, sand and
dust, and chemical contamination. Many of these real environments depend on equipment usage. The
following paragraphs discuss a number of the critical but neglected real environments.

Vibration and mechanical shock

Each of the general environmental factors used in the existing models implies certain vibration and
shock environment. For example, for the factor of 'airborne fighter uninhabited', vibration and shock
spectrum and magnitude expected from the uninhabited compartment of a fighter are assumed.
Unfortunately, knowing the general vibration level of an area is far from knowing the true vibration
level in a unit. Each unit would have its own environment depending where it is installed and the
exact mounting structure. Figure 2, from Reference 4, shows the response spectrum on a printed circuit
board (PCB) subjected to U.S. NAVMAT P-9492 standard environmental stress screening vibration
input. From this Figure one can discern that the amplitudes of the vibration were dampened at some
frequencies and amplified at others. Each unit would have its own response spectrum and, hence, its own
failure rate when subjected to the same vibration input. Figure 3, from Reference 5, shows time to
failure of an air-to-air missile with respect to vibration levels. The scattering of points in this Figure
indicates that there was a very large variation in the response from missile to missile. Furthermore,
the part type dominating the failure distribution changes with vibration intensity as shown in Figure
4, from Reference 5. This is because different failure
254 K . L. WONG
Figure 4. Change of failure mode distribution with vibration level for air-to-air missile

Figure 5. IC failures as function of temperature cycles for different temperature ranges (curves for -55°C/+125°C and 0°C/+55°C) © 1968 IEEE
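
The multiplier structure described under THE REAL ENVIRONMENTS can be sketched as follows. The factor values, the dictionary name and the function name are hypothetical and are not taken from any handbook; the sketch only illustrates that a single category-wide multiplier assigns the same predicted failure rate to every unit in that category, whatever its own vibration response.

```python
# Hypothetical environmental multipliers for illustration only;
# these numbers are NOT taken from any handbook.
ENV_FACTOR = {
    "ground fixed": 1.0,
    "ground mobile": 4.0,
    "airborne fighter, uninhabited": 8.0,
    "missile launch": 12.0,
}


def predicted_failure_rate(base_rate_fits: float, environment: str) -> float:
    """Handbook-style prediction: base failure rate times one category multiplier."""
    return base_rate_fits * ENV_FACTOR[environment]


# Two units in the same broad category receive the same prediction,
# even if their mounting gives them different vibration responses.
unit_a = predicted_failure_rate(100.0, "airborne fighter, uninhabited")
unit_b = predicted_failure_rate(100.0, "airborne fighter, uninhabited")
print(unit_a, unit_b)
```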
Figure 8. Electronics failure rate versus average age in hours (horizontal axis: average age per LRU, hours) © 1979 IEEE

RELIABILITY IMPROVEMENT VS CALENDAR YEAR

Figure 10, from Reference 14, shows how the failure rate of a part decreases with calendar year. This was
due to the continuous upgrading of the manufacturing processes by all of the manufacturers as well as
the specific part supplier. In the computer reliability prediction by function regression equation in
Mil-Hdbk-338, the coefficient for calendar year is the second most significant factor in the equation (the
first being add/subtract time). This is because the equation was developed at a time when integrated
circuits reliability was being improved at a tremendous rate and large quantities of integrated circuits
were being used in computers. The reliability could easily double every three years. Thus, if the true
reliability of a system built at a particular time frame is desired, this factor must be included.
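
The remark that reliability could double every three years can be turned into a rough calendar-year correction. The sketch below assumes that "reliability doubles" means the failure rate halves, and that the improvement is smooth and exponential; the function name, the reference years and the 1000 FIT starting value are hypothetical and are not the Mil-Hdbk-338 regression equation.

```python
def failure_rate_by_build_year(
    rate_at_ref: float,
    ref_year: float,
    build_year: float,
    doubling_years: float = 3.0,
) -> float:
    """Scale a failure rate to another build year.

    Assumption: the failure rate halves every `doubling_years`
    (one reading of "reliability doubles every three years").
    """
    return rate_at_ref * 0.5 ** ((build_year - ref_year) / doubling_years)


# Hypothetical example: a part rated at 1000 FITs for a 1980 build.
print(failure_rate_by_build_year(1000.0, 1980, 1983))  # one halving
print(failure_rate_by_build_year(1000.0, 1980, 1986))  # two halvings
```

Under the same assumption, nine years of improvement changes the failure rate eightfold, which shows how far a prediction based on older data could drift for newer parts.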
Figure 9. Unsmoothed failure ratios of spacecraft operation in orbit (horizontal axis: half year intervals) © 1987 IEEE

that ageing is not limited to the operating period. A system would age, although very slowly, when
not operating. Any reliability prediction method must take into account this ageing effect. Otherwise
one could be off by a factor of 10.

CONCLUSIONS

It has been shown in this paper that the various electronics reliability prediction methods do not
provide consistent results, and the measured reliabilities do not agree with the predicted numbers. The
root cause for the prediction inaccuracy lies in the fact that many of the first-order effect factors are
not explicitly included in the prediction methods. These factors include thermal cycling, temperature
change rate, mechanical shock, vibration, power on/off, supplier quality difference, reliability
improvement with respect to calendar years and ageing trends. Any one of these factors neglected could
cause a variation in the predicted reliability by several times. In order to make the predictions more
accurate these factors must be incorporated into the prediction methods.