09 RTS Redundancy

Reliability of Technical Systems
Main Topics
1. Short Introduction, Reliability Parameters: Failure Rate, Failure
Probability, etc.
2. Some Important Reliability Distributions
3. Component Reliability
4. Introduction, Key Terms, Framing the Problem
5. System Reliability I: Reliability Block Diagram, Structure Analysis (Fault
Trees), State Model.
6. System Reliability II: State Analysis (Markovian chains)
7. System Reliability III: Dependent Failure Analysis
8. Data Collection, Bayes Theorem, Static and Dynamic Redundancy
9. Advanced Methods for Systems Modeling and Simulation (Petri Nets,
network theory, object-oriented modeling)
10. Software Reliability, Fault Tolerance
11. Human Reliability Analysis
12. Case study: Building a Reliable System
HS 10 / ETH Zürich Reliability of technical Systems 2

Data Collection
• Specific data
Data available for the very unit that is the subject of the analysis; its
validity is therefore assured.
This kind of data is ideal for a reliability analysis. In practice,
however, it is often lacking.
• Generic data
Such data are often given in publications for "similar" units; their validity
is not assured per se.
Applying them to other units is questionable, but they conveniently
enlarge the data basis.
• Expert judgement
Subjective judgement of an expert regarding the behavior of the unit.
Rather inappropriate for a reliability analysis, but often the only
available data source.
Data Collection
Assumptions
Characterizing a unit / component
• Ensure statistical similarity between database and analysis
- Construction
- Conditions, i.e. process parameters (pressure, temperature), medium,
environment, etc.
- Operational conditions, e.g. active versus stand-by
• Definition of a failure
• Definition of an observation period
• Definition of the boundary elements.
[Figure: Boundaries. Motor, pump, pipelines; connections: flange, weld, etc.]
Data Collection
Plant specific data sources
The basic documents currently available are business documents (BU), i.e.
damage reports, repair orders, etc.
• Types of damage, causes, and impacts are rarely recorded
• BU are usually not designed to serve as a reliability data source, yet
they must represent at least 90% of all failures (events).

Bayes Theorem
Conditional Probability
• It is often important to compute the probability of an event A given that
another event B has occurred, which is called the conditional probability of
A given B:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$

where P(A|B) gives the probability of event A not on the entire possible
sample space Ω, but on the sample space restricted to the occurrence of B.
• Event A is said to be statistically independent of event B if
P(A|B) = P(A).
• Statistical independence should not be confused with mutual exclusivity
($X_A X_B = 0$), which represents a logical dependence: knowing that A has
occurred guarantees that B cannot occur.

Bayes Theorem
Conditional Probability : Exercise Example
There are two streams flowing past an industrial plant. The dissolved
oxygen, DO, level in the water downstream is an indication of the
degree of pollution caused by the waste dumped from the plant. Let A
denote the event that stream a is polluted, and B denote the event that
stream b is polluted. From measurements taken on the DO level of each
stream over the last year, it was determined that on a given day
P(A) = 2/5 and P(B) = 3/4

and the probability that at least one stream will be polluted on any given
day is P(A ∪ B) = 4/5.
Q1: Determine the probability that stream a is also polluted given that
stream b is polluted.
Q2: Determine the probability that stream b is also polluted given that
stream a is polluted.

Bayes Theorem
Conditional Probability : Exercise - Solution
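We have P(A) = 2/5, P(B) = 3/4, and P(A ∪ B) = 4/5.
The probability that both streams are polluted is
$$P(A \cap B) = P(A) + P(B) - P(A \cup B) = \frac{2}{5} + \frac{3}{4} - \frac{4}{5} = \frac{7}{20}$$
For Q1:
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{7/20}{3/4} = \frac{7}{15}$$
For Q2:
$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{7/20}{2/5} = \frac{7}{8}$$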
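The stream exercise can be checked numerically. The following is a minimal sketch using exact fractions; the variable names are illustrative and not part of the original slides.

```python
from fractions import Fraction

# Values given in the exercise.
p_a = Fraction(2, 5)        # P(A): stream a is polluted
p_b = Fraction(3, 4)        # P(B): stream b is polluted
p_a_or_b = Fraction(4, 5)   # P(A U B): at least one stream is polluted

# Inclusion-exclusion gives the joint probability P(A and B).
p_a_and_b = p_a + p_b - p_a_or_b

# Conditional probabilities by definition.
p_a_given_b = p_a_and_b / p_b   # Q1
p_b_given_a = p_a_and_b / p_a   # Q2

print(p_a_and_b, p_a_given_b, p_b_given_a)  # 7/20 7/15 7/8
```

Since P(A|B) = 7/15 differs from P(A) = 2/5, the two pollution events are not statistically independent.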

Bayes Theorem
Theorem of Total Probability
• Consider a partition of the sample space Ω into n mutually exclusive
and exhaustive events E_j, j = 1, 2, …, n.
• Given any event A in Ω, its probability can be computed in terms of the
partitioning events E_j (j = 1, 2, …, n) and the conditional probabilities
of A on these events:

$$P(A) = \sum_{j=1}^{n} P(A|E_j)\, P(E_j)$$

Bayes Theorem
Bayes Theorem
• What is the probability that event E_j has occurred, given the evidence
that event A has occurred?

$$P(E_j|A) = \frac{P(A|E_j)\, P(E_j)}{P(A)} = \frac{P(A|E_j)\, P(E_j)}{\sum_{i=1}^{n} P(A|E_i)\, P(E_i)}$$

• The equation above updates the prior probability P(E_j) of event E_j to
the posterior probability P(E_j|A), where P(A) can be computed by
applying the theorem of total probability.

Bayes Theorem
Bayes Theorem : Exercise Example
Identical components are purchased from 3 suppliers (S1, S2, S3) in
quantities of 1000, 600, and 400 pieces, respectively. The probabilities for
one component to be defective are 0.006 for S1, 0.02 for S2, and 0.03 for S3.
All the components are stored in a common container regardless of their
source.
Q1. What is the probability that one component randomly selected from the
stock is defective?
Q2. Suppose the component selected in the previous question is defective.
What is the probability that it is from S1?

Bayes Theorem
Bayes Theorem : Exercise Solution
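Q1: With P(S1) = 1000/2000 = 0.5, P(S2) = 600/2000 = 0.3, and
P(S3) = 400/2000 = 0.2, the theorem of total probability gives
Pr(the selected component is defective) = 0.5 · 0.006 + 0.3 · 0.02 + 0.2 · 0.03 = 0.015
Q2: Pr(component from S1 | component is defective)
Using Bayes' theorem:
Pr(component from S1 | component is defective) = (0.5 · 0.006) / 0.015 = 0.2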
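The supplier exercise is a direct application of the theorem of total probability followed by Bayes' theorem. The sketch below recomputes both answers; the helper names `total_probability` and `bayes_posterior` are hypothetical and chosen only for this illustration.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum_j P(A|E_j) * P(E_j) over a partition E_1..E_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posterior(priors, likelihoods):
    """Posterior P(E_j|A) for every j, via Bayes' theorem."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Data from the exercise: pieces per supplier and defect probabilities.
quantities = [1000, 600, 400]
defect_prob = [0.006, 0.02, 0.03]

# Prior P(S_j): share of each supplier in the common container.
total = sum(quantities)
priors = [q / total for q in quantities]

p_defective = total_probability(priors, defect_prob)   # Q1
posterior = bayes_posterior(priors, defect_prob)        # Q2 uses posterior[0]

print(round(p_defective, 6))   # 0.015
print(round(posterior[0], 6))  # 0.2
```

Although S1 supplies half of the stock, only 20% of defective parts come from it, because its defect probability is the lowest of the three.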

Redundancy
Redundancy
Existence of more than one means for performing a required function in an
item.
• For hardware, a distinction is made between active (hot, parallel), warm
(lightly loaded), and standby (cold) redundancy.
• Redundancy does not necessarily imply a duplication of hardware; it can
be implemented, for example, by coding or by software.
• To avoid common mode failures, redundant elements should be realized
independently of each other.

Redundancy
The properties of redundancy characterize various aspects of redundancy
rather than distinguishing different types of redundancy:
• The extension by extra components in the structure and functions model.
• The extension by extra functions in the structure and functions model.
These extra functions can differ from the already existing ones
(additional functions) or satisfy the same specification by a different
implementation (diversity).
• The additional information to be stored, transferred, and processed.
• The additional time requirements.
Redundancy is either used from the beginning of system operation
(active / hot), or activated on fault occurrence (standby / cold), or used in
a combination thereof (lightly loaded).
Redundancy
Suppose a system consists of components connected in series, each of which
survives (does not fail) with a probability of 99% (p = 0.99). The
probability that the entire system does not fail, R_S = p^n, then changes
with the number of components n as follows:
10 components lead to a survival probability of 0.99^10 ≈ 90.44%
20 components lead to a survival probability of 0.99^20 ≈ 81.79%
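The series figures follow from multiplying the component survival probabilities, assuming independent and identical components. A minimal sketch (the function name `series_reliability` is illustrative):

```python
def series_reliability(p, n):
    """Survival probability of n independent, identical components in series."""
    return p ** n


# Each component survives with probability p = 0.99.
for n in (10, 20):
    print(n, f"{series_reliability(0.99, n):.2%}")
# 10 90.44%
# 20 81.79%
```

Every additional series component lowers the system reliability, which is the motivation for adding redundancy.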

Static Redundancy: n-out-of-m system
• The system is fault-free if at least n out of the m existing components
are fault-free.
• If n = 1, complete redundancy occurs (parallel system), and if n = m, the m
components are, in effect, in series.
• The reliability may be obtained from the binomial probability distribution.
• If R is the reliability of each component (independent trial), then the
probability of n or more successes among the m components is:

$$R_S = \sum_{k=n}^{m} \binom{m}{k} R^k (1-R)^{m-k}$$
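The binomial sum for an n-out-of-m system, R_S = sum over k = n..m of C(m, k) R^k (1 - R)^(m - k), is easy to evaluate directly. The sketch below assumes independent components with identical reliability R; the function name is hypothetical.

```python
from math import comb


def n_out_of_m_reliability(n, m, r):
    """Probability that at least n of m independent components (each with
    reliability r) are working, from the binomial distribution."""
    if not 0 <= n <= m:
        raise ValueError("require 0 <= n <= m")
    return sum(comb(m, k) * r**k * (1 - r) ** (m - k) for k in range(n, m + 1))


# 2-out-of-3 system with component reliability 0.9.
print(round(n_out_of_m_reliability(2, 3, 0.9), 6))  # 0.972
```

The limiting cases behave as stated in the bullets: n = 1 reproduces the parallel system, 1 - (1 - R)^m, and n = m reproduces the series system, R^m.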

Parallel System: 1-out-of-m system
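[Figure: reliability block diagram with components K1, …, Ki, …, Km in parallel between the terminals E and A]
Reliability of the parallel system, with R_i the reliability of component Ki:
$$R_P = 1 - \prod_{i=1}^{m} (1 - R_i)$$
Compare with the reliability of the series system (no redundancy):
$$R_S = \prod_{i=1}^{m} R_i$$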
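To make the comparison concrete, the sketch below evaluates the standard formulas for independent components: R_P = 1 - prod(1 - R_i) for the parallel (1-out-of-m) system and R_S = prod(R_i) for the series system. The function names and the example reliabilities are assumptions made for illustration.

```python
from math import prod


def parallel_reliability(reliabilities):
    """1-out-of-m system: fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def series_reliability_of(reliabilities):
    """Series system: works only if every component works."""
    return prod(reliabilities)


# Hypothetical example: three components, each with reliability 0.9.
components = [0.9, 0.9, 0.9]
print(f"parallel: {parallel_reliability(components):.3f}")   # parallel: 0.999
print(f"series:   {series_reliability_of(components):.3f}")  # series:   0.729
```

With the same three components, the parallel arrangement is more reliable than any single component, while the series arrangement is less reliable than any single component.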


Case Study
Learning from Deficits: Gulf Oil Spill and Breach of Basic Principles
March – April 2010

• The oil rig was being prepared to move to another job
• The well was to be temporarily plugged and capped with cement
• A rise in pressure from the well suggested that the cement was not holding
• The first test showed a large abnormality; the second test was misread and the well was declared safe
April 20
• Jump in pressure from oil and gas rising in the well
• Methane spread over the rig without any warning being given
• All equipment remained in operation, including devices capable of igniting the methane
• Explosion on the rig; chaotic evacuation conditions, no clear directives
• Closing of the blowout preventer failed
• Consequences
• 11 fatalities, 17 injured
• ~780 × 10³ m³ of oil spilled into the ocean (about 2 supertankers)
InTech, August 10

Learning from Deficits: Gulf Oil Spill and Breach of Basic Principles
[Figure: blowout preventer (BOP) with annotated deficits]
• A dead battery in the BOP's "brain", which gives pressure readings and
controls other functions in the giant stack of valves.
• A leak in the hydraulic system that sends emergency power to rams, valves
that are supposed to close off the space around the pipe.
• The shear ram, the BOP's valve of last resort, wasn't strong enough to cut
through joints in the pipe. Those joints account for about 10 percent of
the pipe's length.
• Several "unexpected" modifications to the BOP, including a test ram in
place of a real one. Schematics didn't match the actual device.
• The cement seal around the casing pipes in the well failed pressure tests
before the explosion; gas may have been building up in the well.
Washington Post, May 13

References:
• Zio, Enrico (2007). An Introduction to the Basics of Reliability and Risk Analysis. World Scientific Publishing Co.
• Birolini, Alessandro (2007). Reliability Engineering: Theory and Practice (5th edition). Springer-Verlag: Berlin.

Malfunctions of a unit (failure modes)

Function        Types of failure
Closing         Fails open; only partly closed
Opening         Fails closed; only partly opened
Remain closed   Opens completely; partly opens
Remain open     Closes completely; partly closes

Techniques to speed up the experiment
• Sequential test
• Accelerated test
• Extrapolation

Sequential Test (I)
If, with high probability w1, the actual failure rate λ does not exceed the limit λ1
(λ < λ1), the components are to be accepted.
On the other hand, if, with high probability w2, λ does not fall below the limit λ2
(λ > λ2), the components are to be rejected.
Given: λ1, sample size n, acceptance threshold k, time of experiment t.
Let X be the number of failures (Poisson distributed) within the time interval [0, t].

Sequential Test (II)
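For given error probabilities α and β, the following holds:

                            For λ = λ1    For λ = λ2
Probability of acceptance   1 − α         β
Probability of rejection    α             1 − β

With i the number of failures observed in the cumulative testing time n·t:

Accept decision if:
$$i \le \frac{\ln\frac{\beta}{1-\alpha}}{\ln\frac{\lambda_2}{\lambda_1}} + \frac{\lambda_2 - \lambda_1}{\ln\frac{\lambda_2}{\lambda_1}} \cdot n \cdot t$$
Reject decision if:
$$i \ge \frac{\ln\frac{1-\beta}{\alpha}}{\ln\frac{\lambda_2}{\lambda_1}} + \frac{\lambda_2 - \lambda_1}{\ln\frac{\lambda_2}{\lambda_1}} \cdot n \cdot t$$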
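The accept and reject conditions are two parallel straight lines in the plane of failure count i versus cumulative testing time n·t, with slope (λ2 − λ1)/ln(λ2/λ1) and intercepts ln(β/(1 − α))/ln(λ2/λ1) and ln((1 − β)/α)/ln(λ2/λ1). The sketch below evaluates them for one observation; the function name and the numerical values are hypothetical and serve only as an illustration.

```python
from math import log


def sequential_decision(i, n, t, lam1, lam2, alpha, beta):
    """Return 'accept', 'reject' or 'continue' after observing i failures
    in the cumulative testing time n * t."""
    if not 0 < lam1 < lam2:
        raise ValueError("require 0 < lam1 < lam2")
    log_ratio = log(lam2 / lam1)
    slope = (lam2 - lam1) / log_ratio            # common slope of both lines
    accept_limit = log(beta / (1 - alpha)) / log_ratio + slope * n * t
    reject_limit = log((1 - beta) / alpha) / log_ratio + slope * n * t
    if i <= accept_limit:
        return "accept"
    if i >= reject_limit:
        return "reject"
    return "continue"


# Hypothetical limits: lam1 = 1e-4 /h, lam2 = 5e-4 /h, alpha = beta = 0.1.
params = dict(lam1=1e-4, lam2=5e-4, alpha=0.1, beta=0.1)
print(sequential_decision(i=0, n=1, t=0, **params))      # continue
print(sequential_decision(i=2, n=1, t=0, **params))      # reject
print(sequential_decision(i=0, n=100, t=100, **params))  # accept
```

Between the two lines no decision is possible yet, so the test continues until the next failure or until more testing time has accumulated.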

Sequential Test (III): Illustration
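[Figure: number of failures versus cumulative testing time, showing the reject-decision and accept-decision regions]
The minimum testing time (nt)_min for an accept decision and the minimum number of failures i_min for a reject decision follow from setting i = 0 in the accept condition and n·t = 0 in the reject condition:
$$(nt)_{min} = \frac{\ln\frac{1-\alpha}{\beta}}{\lambda_2 - \lambda_1}, \qquad i_{min} = \frac{\ln\frac{1-\beta}{\alpha}}{\ln\frac{\lambda_2}{\lambda_1}}$$
where α and β are the error probabilities of the test.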

Sequential Test (IV): Algorithm
1. Test until the first failure.
2. Evaluate the failure rate:
- failure rate is high → reject
- failure rate is low → accept
- failure rate is neither high nor low → test until the next failure and repeat step 2.

Extrapolation (I)
Failure prediction by extrapolation:
• Shorter testing time
• Test is non-destructive
Conditions:
• Just drift failures
• Failure criteria are known

Extrapolation (II): Illustration
[Figure: parameter y versus time t, extrapolated from t_exp up to the failure criterion, which is reached at t_pred]
t_exp: testing time
t_pred: predicted lifetime (forecast through extrapolation)
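One simple way to carry out such an extrapolation is to fit a straight line to the measured drift and solve for the time at which it reaches the failure criterion. The slides do not state which drift model is used, so the linear model, the function name, and the data below are all assumptions made for this sketch.

```python
def predict_lifetime(times, values, failure_limit):
    """Least-squares straight line through (t, y), extrapolated to the time
    at which y reaches failure_limit (assumes linear drift)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("no drift observed; lifetime cannot be extrapolated")
    intercept = mean_y - slope * mean_t
    return (failure_limit - intercept) / slope


# Hypothetical drift measurements up to t_exp = 300 h, failure criterion y = 1.5.
times = [0, 100, 200, 300]
values = [1.00, 1.05, 1.10, 1.15]
t_pred = predict_lifetime(times, values, failure_limit=1.5)
print(round(t_pred, 3))  # 1000.0
```

The prediction is only as good as the drift model: if the parameter does not keep drifting at the observed rate beyond t_exp, the forecast t_pred will be off.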

Accelerated Test (I)

Accelerated Test (II): Algorithm
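Acceleration factor:
$$F_{1,2} = \frac{L_1}{L_2} = e^{\frac{E}{K}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)}$$
Measure the mean lifetime L2 of the component at T2 > T1, where T1 is a low temperature.
If the quotient E/K is unknown, repeat the test at T3 > T1 and find it from
$$\frac{L_2}{L_3} = e^{\frac{E}{K}\left(\frac{1}{T_2} - \frac{1}{T_3}\right)}$$
Then calculate
$$L_1 = L_2 \cdot e^{\frac{E}{K}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)}$$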
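The two-step procedure can be sketched as follows: first solve L2/L3 = exp((E/K)(1/T2 − 1/T3)) for E/K, then scale L2 to the operating temperature T1 via L1 = L2 · exp((E/K)(1/T1 − 1/T2)). The function names and the measured lifetimes are hypothetical, and temperatures are assumed to be absolute (kelvin).

```python
from math import exp, log


def e_over_k(l2, l3, t2, t3):
    """Solve L2/L3 = exp((E/K) * (1/T2 - 1/T3)) for the quotient E/K."""
    return log(l2 / l3) / (1.0 / t2 - 1.0 / t3)


def life_at_t1(l2, e_k, t1, t2):
    """L1 = L2 * exp((E/K) * (1/T1 - 1/T2))."""
    return l2 * exp(e_k * (1.0 / t1 - 1.0 / t2))


# Hypothetical measurements: mean lifetimes at two elevated temperatures.
t1, t2, t3 = 300.0, 350.0, 400.0   # kelvin; T1 is the operating temperature
l2, l3 = 1000.0, 200.0             # hours, measured at T2 and T3

e_k = e_over_k(l2, l3, t2, t3)
l1 = life_at_t1(l2, e_k, t1, t2)
print(f"E/K = {e_k:.0f} K, L1 = {l1:.0f} h, F12 = {l1 / l2:.2f}")
```

Because T1 < T2, the exponent is positive and the predicted lifetime L1 is longer than the measured L2; the ratio L1/L2 is the acceleration factor F1,2.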

Accelerated Test (III): Illustration
