Reliability
DEFINITION
Reliability is the probability that an item performs its intended function under stated operating conditions for a given period of time. The definition stresses four significant elements: probability, intended function, time, and operating conditions.

Probability
Consideration of variation makes reliability a probability. The frequency distribution of an item's life can be identified, which permits prediction of the life of the item. For example, a probability of 0.85 of functioning for 60 hours means that in 85 cases out of 100 we would expect the item to function for the full 60 hours.

Intended Function
For an item to be reliable, it must perform its specified functions satisfactorily when called upon to do so.
Monday, 22 April 2013 LOVELY PROFESSIONAL UNIVERSITY By- Himanshu Gupta 1
Time
Time is the most important factor in the assessment of reliability, since it represents the period during which a certain degree of performance can be expected from an item.

Stated Conditions
The application and operating circumstances under which an item is put to use are an important component of reliability. As the operating conditions change, the reliability of the item also changes. Operating conditions such as temperature, humidity, torque, and corrosive atmosphere all have a definite effect on performance.
Failure
Failure of an item represents unreliability. Thus, to compute the reliability of an item, it is necessary to understand the concept of failure. A deviation in the properties of an item from the prescribed conditions is considered a fault, and a state of fault is denoted as failure. An item is considered to have failed under any one of the following conditions:
- It becomes completely inoperable.
- It is still operable but no longer able to perform a required function.
- Serious deterioration makes it unsafe for continued use.
Causes of Failure
Some of the causes of failure are:
- Deficiencies in design.
- Improper selection of process and manufacturing technique.
- Lack of knowledge and experience.
- Errors of assembly.
- Improper service conditions.
- Inadequate maintenance.
- Variation in environmental and operating conditions.
- Human errors.
Nature/Modes of Failures
The different modes of failure are:
1) Catastrophic Failures: A normally operating item suddenly becomes inoperative. Example: blowing of a fuse or an electric bulb.
2) Degradation (Creeping) Failures: These failures occur gradually because some parameter changes with time. Example: a change in resistance affects the performance of a resistor.
3) Independent Failures: These failures occur independently and do not depend on the failure of other items.
4) Secondary Failures: A secondary failure occurs as a result of some primary failure. Example: a tsunami caused by an earthquake.
5) Failures due to improper handling and misuse: These are caused mainly by factors such as overloading and stressing beyond capacity.
Bath Tub Curve (Phases of Failures)
Analysis of failure data has shown that failures can generally be grouped into different phases depending upon the nature of the failure. When a large number of units are put into operation, there are likely to be many failures initially. These are called initial failures or infant mortality. After the initial failures, there is a long period of operation during which fewer failures are reported, but their causes are difficult to determine. The failures during this period are often called random or catastrophic failures. This is the period of normal operation. As time passes, the units wear out and begin to deteriorate. In this period, failures are due to wear and tear and ageing. This region is called the wear-out region. The bathtub curve is also called the life cycle curve.
[Figure: bathtub curve. Vertical axis: failure rate. Labeled regions: early life (burn-in, break-in, or infant mortality period) and wear-out life.]
Early Failures
These failures occur at the beginning of life because of defects in design, manufacturing, or assembly, or because of inadequate quality control during manufacturing. They are eliminated by a debugging or burn-in process. The weak and substandard components that fail during the early hours of system operation are replaced by good, tested components. Debugging is a method of accelerating the completion of early failures by operating the system continuously for a number of hours, correcting the failures, and then releasing the system for actual use. Debugging is generally done prior to dispatch to the user to ensure the detection and elimination of early failures. Warranty is based on the concept of early failures.
Catastrophic (Chance) Failures
These failures are predominant during the actual working life of the system. They occur randomly and unexpectedly, and the failure rate is fairly constant. They are caused by sudden stress accumulation beyond the design strength of the material. This phase is called the useful life of the component. Failures at this stage can be minimised by introducing redundancy in the system.

Wear-Out Failures
In this phase the item is more likely to fail because of wear and tear, and the number of failures is high. This is a typical ageing problem. Proper care and maintenance reduce the failures at this stage.
Measures of Reliability / Quantification of Reliability
1) Failure Rate
Failure rate is expressed in terms of failures per unit time, i.e. as failures per hour, or failures per 100 or 1000 hours. It is the ratio of the number of failures during a specified test interval to the total test time of the items undergoing test:
$\lambda = f / T$
where $\lambda$ = failure rate, $f$ = number of failures during the test interval, and $T$ = total test time.
When the design is new, the failure rate is high; when the design has matured, the failure rate is fairly constant. The smaller the failure rate, the higher the reliability of the system.
2) Mean Time Between Failures (MTBF)
MTBF is the average time of satisfactory operation of the system. The larger the MTBF, the higher the reliability of the system. It is applicable to repairable systems and is expressed in hours. For example, if an item fails 8 times over a period of 40,000 hours of operation, the MTBF is 40,000 / 8 = 5,000 hours. During the operating period, the failure rate is fairly constant. MTBF is the reciprocal of the constant failure rate $\lambda$, or the ratio of test time $T$ to number of failures $f$:
$\text{MTBF} = 1 / \lambda = T / f$
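The failure rate and MTBF definitions can be sketched in a few lines of Python. This is an illustration only and not part of the original slides; the function names and the test figures (5 failures over 10,000 hours) are hypothetical.

```python
def failure_rate(failures: int, total_test_time: float) -> float:
    """Failures per unit of test time (lambda = f / T)."""
    if total_test_time <= 0:
        raise ValueError("total_test_time must be positive")
    return failures / total_test_time


def mtbf(failures: int, total_test_time: float) -> float:
    """Mean time between failures (T / f), assuming a constant failure rate."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_test_time / failures


# Hypothetical test data: 5 failures observed over 10,000 hours of operation.
lam = failure_rate(5, 10_000.0)
print(lam)                  # failures per hour
print(mtbf(5, 10_000.0))    # hours; the reciprocal of lam
```

Keeping the two functions separate makes the reciprocal relationship between failure rate and MTBF easy to check on real test data.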
3) Mean Time To Failure (MTTF)
This is applicable to non-repairable systems. The mean time to failure is the average time an item is expected to function before failure. If we have life test information on n items with failure times $t_1, t_2, \ldots, t_n$, then the mean time to failure is defined as
$\text{MTTF} = \frac{1}{n} \sum_{i=1}^{n} t_i$

The Exponential Formula for Reliability
The distribution of time between failures indicates the chance of failure-free operation for a specific time interval. When the failure rate $\lambda$ is constant, the probability of survival (reliability) over an operating time $t$ is given by:
$P_s = R = e^{-\lambda t}$
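As a rough sketch of how MTTF and the exponential reliability formula might be evaluated, the Python below uses hypothetical failure times. Estimating the failure rate as 1/MTTF is an assumption that holds only under a constant failure rate; the function names are illustrative.

```python
import math


def mttf(failure_times):
    """Mean time to failure: the average of the observed failure times."""
    if not failure_times:
        raise ValueError("at least one failure time is required")
    return sum(failure_times) / len(failure_times)


def reliability(failure_rate: float, t: float) -> float:
    """Probability of survival to time t under a constant failure rate."""
    return math.exp(-failure_rate * t)


# Hypothetical life-test data (hours to failure for five items).
times = [120.0, 150.0, 90.0, 200.0, 140.0]
mean_life = mttf(times)      # average of the five failure times
lam = 1.0 / mean_life        # assumes a constant failure rate
print(mean_life, reliability(lam, 100.0))
```

Under these assumptions the reliability at t = MTTF is always e^-1, roughly 0.37, which is a useful sanity check.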
System Reliability
System
A system is a collection of components, subsystems, and/or assemblies arranged in a specific design in order to achieve desired functions with acceptable performance and reliability. The overall system determines how the product functions, and how well the system performs depends on the quality of its design. The basic steps for establishing system reliability are as follows:
1) Identify the components and subsystems that constitute the system, and estimate or compute their individual reliabilities.
2) Draw a block diagram, known as a reliability block diagram, to represent the configuration of the system's sub-components.
3) Establish the condition for successful operation of the system, which determines which components must function.
4) Apply the combination rules of probability theory to estimate the system's reliability.
Types of a System

Repairable System
A repairable system can be restored to an operating condition after a failure by repair or replacement of one or more components. The types of repairable systems are:
- Continuously Operating System: Once put into operation, this type of system continues to operate until it fails or is stopped for planned maintenance. Examples: nuclear reactors, satellites.
- On and Off Operating System: Such a system is operated whenever the consumer desires and can be operated again when required. Examples: televisions, mobile phones.
- Intermittently Operating System: This type of system is always ready to perform but operates intermittently. Examples: cars, planes.
Non-Repairable System
A non-repairable system, also known as an instantaneous operating system, has only one cycle of performance. Examples: fuses, flash bulbs.

Reliability Block Diagram
A reliability block diagram represents how the components of a product, shown as blocks, are arranged and related reliability-wise in a larger system. This arrangement is often, but not necessarily, the same as the way the components are physically connected.
Models of a System

Series Structure
[Figure: reliability block diagram of n components connected in series.]
In a series structure, all n components must work in order for the whole system to work.
[Figure: a personal computer as an example of a series system: Power Supply, Motherboard, Processor, Hard Drive.]
Parallel Structures
[Figure: reliability block diagram of components connected in parallel.]
In a parallel system, the system works as long as at least one component works.

Combination of Series and Parallel Structures
[Figure: block diagram combining series and parallel structures.]
For a pure series system, the system reliability is equal to the product of the reliabilities of its constituent components:
$R_s = R_1 \times R_2 \times \cdots \times R_n$
For a pure parallel system, the overall system unreliability is equal to the product of the component unreliabilities. Thus, the reliability of the parallel system is given by:
$R_s = 1 - [(1 - R_1)(1 - R_2) \cdots (1 - R_n)]$
Example:
Consider a system with three components. Units 1 and 2 are connected in series, and Unit 3 is connected in parallel with the first two. Find the reliability of the system.
[Figure: Units 1 and 2 in series, in parallel with Unit 3. R1 = 0.99, R2 = 0.98, R3 = 0.97.]
Solution:
First, the reliability of the segment consisting of Units 1 and 2 is calculated:
$R_{12} = R_1 \times R_2 = 0.99 \times 0.98 = 0.9702$
This segment is in parallel with Unit 3 ($R_3 = 0.97$), so the system reliability is:
$R_s = 1 - (1 - R_{12})(1 - R_3) = 1 - (0.0298)(0.03) = 0.999106$
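The series and parallel combination rules, applied to the three-unit example, can be sketched in Python as follows. The function names are illustrative and the code assumes that component failures are independent.

```python
from math import prod


def series_reliability(reliabilities):
    """All components must work: product of the component reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one component must work: 1 minus the product of unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Units 1 and 2 in series, that segment in parallel with Unit 3.
r12 = series_reliability([0.99, 0.98])
r_system = parallel_reliability([r12, 0.97])
print(round(r12, 4), round(r_system, 6))  # 0.9702 0.999106
```

Because each function takes a list and returns a single reliability, nested series and parallel structures can be evaluated by composing the two calls from the innermost segment outwards.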
Standby Redundancy
In standby redundancy, only one component operates while one or more components wait in standby mode, ready to take over if the operating component fails. Standby components accumulate no operating time until they are switched in. In a parallel network, all the units in the configuration are active, whereas in standby redundancy they are not. For n identical units with constant failure rate $\lambda$ and perfect switching, where one unit is operating and n-1 units are on standby, ready to take over when the operating unit fails, the reliability is given by
$R(t) = e^{-\lambda t} \sum_{i=0}^{n-1} \frac{(\lambda t)^i}{i!}$
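A minimal sketch of standby redundancy follows. It assumes the standard textbook result for n identical units with constant failure rate lambda and perfect (failure-free, instantaneous) switching, R(t) = e^(-lambda t) * sum over i from 0 to n-1 of (lambda t)^i / i!. These assumptions, the function names, and the numbers used are illustrative and do not come from the slides.

```python
import math


def standby_reliability(n_units: int, failure_rate: float, t: float) -> float:
    """Reliability of 1 operating unit plus n_units - 1 cold standbys.

    Assumes identical units, a constant failure rate, and perfect switching.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    lam_t = failure_rate * t
    return math.exp(-lam_t) * sum(lam_t ** i / math.factorial(i) for i in range(n_units))


def active_parallel_reliability(n_units: int, failure_rate: float, t: float) -> float:
    """Reliability of n identical units that are all active in parallel."""
    return 1.0 - (1.0 - math.exp(-failure_rate * t)) ** n_units


# Hypothetical figures: two units, 0.01 failures per hour, 50-hour mission.
print(standby_reliability(2, 0.01, 50.0))
print(active_parallel_reliability(2, 0.01, 50.0))
```

Under these assumptions the standby arrangement comes out ahead of the active parallel one, since the standby unit does not age while it waits.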
Reliability Improvement
A high degree of reliability is an absolute necessity for complex, modern systems used for industrial, military, and scientific purposes. The reliability of a component or system can be enhanced in several ways:
1) Design and Safety Factor: To design reliability into products, the reasons for product failures must be analyzed thoroughly. Generally, a product fails prematurely because of inadequate design features, manufacturing and part defects, abnormal induced stresses, environmental conditions, or human error. Higher reliability can be achieved through mature design.
2) Parts and Material Selection: The designer has to choose between standard parts and specially manufactured parts with higher reliability and greater tolerances. The trade-off is usually in cost, but ease of parts availability, ease of repair, energy requirements, weight, and size may also be considered.
3) Redundancy: When it is not possible to manufacture a highly reliable component, or the cost of doing so is too high, system reliability can be improved by redundancy. Redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system. The approaches to introducing redundancy are:
a) Provide a duplicate or additional path for the entire system. This is known as system or unit redundancy.
b) Provide a redundant path for each component individually. This is called component redundancy.
c) Use a combination of the above methods, depending upon the configuration. This is called mixed redundancy.
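To illustrate the difference between unit redundancy and component redundancy, the sketch below compares the two for a hypothetical two-component series system in which each component has reliability 0.9. It assumes independent failures and active (parallel) duplicates; the function names are illustrative.

```python
from math import prod


def unit_redundancy(reliabilities, copies: int = 2) -> float:
    """Duplicate the whole series system and place the copies in parallel."""
    system = prod(reliabilities)
    return 1.0 - (1.0 - system) ** copies


def component_redundancy(reliabilities, copies: int = 2) -> float:
    """Duplicate each component in parallel, then connect the pairs in series."""
    return prod(1.0 - (1.0 - r) ** copies for r in reliabilities)


# Hypothetical series system of two components, each with reliability 0.9.
components = [0.9, 0.9]
print(round(unit_redundancy(components), 4))       # 0.9639
print(round(component_redundancy(components), 4))  # 0.9801
```

With the same number of extra parts, redundancy applied at the component level gives the higher system reliability in this example, which is one reason the level at which redundancy is introduced is a design decision.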
Failure Analysis
Failure analysis techniques provide a methodical way to examine a proposed design for possible ways in which a failure can occur. Two techniques are used for failure analysis in reliability engineering:
1. Failure Mode and Effect Analysis (FMEA) / Failure Mode, Effects and Criticality Analysis (FMECA).
2. Fault Tree Analysis (FTA).

Fault Tree Analysis
FTA is one of the many symbolic analytical logic techniques found in operations research and system reliability. Fault tree diagrams (FTDs) are logic block diagrams that display the state of the system (top event) in terms of the states of its components (basic events). FTDs provide an alternative to reliability block diagrams (RBDs). An FTD is built top-down in terms of events rather than blocks.
System Safety and Fault Tree Analysis
Reliability and product safety are closely related. Safety can be broadly defined as the avoidance of conditions that can cause injury, loss of life, or severe damage. Therefore the focus here is on failures that may create safety hazards. The objective is to determine during design how these failures are likely to occur, to estimate their probability of occurrence, and to take corrective action.

Fault Tree Analysis
Fault tree analysis is a useful tool in performing a system safety analysis. It is a graphical design technique that provides an alternative to reliability block diagrams. It is a top-down, deductive analysis structured in terms of events rather than components. The perspective is on faults rather than reliability. All failures are faults, but not all faults are failures.
There are four major steps to a fault tree analysis:
i. Define the system, its boundaries, and the top event.
ii. Construct the fault tree, which symbolically represents the system and its relevant events.
iii. Perform a qualitative evaluation by identifying those combinations of events that will cause the top event.
iv. Perform a quantitative evaluation by assigning failure probabilities or unavailabilities to the basic events and computing the probability of the top event.
Fault tree main symbols
[Figure: main fault tree symbols, including the AND gate, an external event, and a basic event.]
An Event / Fault: This can be an intermediate event or a top event. It is the result of a logical combination of lower-level events. E.g. both transmitters fail; runaway reaction.
OR Gate: Any one of the bottom events results in occurrence of the top event. E.g. either one of the root valves is closed, or the process signal to the transmitter fails.
AND Gate: For the top event to occur, all the bottom events must occur. E.g. fuel, oxygen, and an ignition source must all be present for a fire.
External Event:
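Simple Examples
Example 1 (OR gate): Top event "Transmitter Failed", probability 0.28. Inputs: "Transmitter 1 Failed" (probability 0.1) and "Transmitter 2 Failed" (probability 0.2).
Example 2 (AND gate): Top event "Valve Failed". Inputs: "Valve 1 Failed" (probability 0.001) and "Valve 2 Failed" (probability 0.002).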
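The gate probabilities in these two examples can be reproduced with a short Python sketch. It assumes the basic events are statistically independent, which is consistent with the 0.28 shown for the OR gate (1 - 0.9 x 0.8); the function names are illustrative.

```python
from math import prod


def or_gate(probabilities):
    """Top event occurs if any input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """Top event occurs only if all inputs occur (independent inputs assumed)."""
    return prod(probabilities)


# Example 1: either transmitter failing fails the transmitter function.
print(round(or_gate([0.1, 0.2]), 6))   # 0.28

# Example 2: both valves must fail for the valve function to fail.
print(and_gate([0.001, 0.002]))        # roughly 0.000002
```

The AND gate shows why redundancy helps: two fairly reliable valves in a redundant arrangement give a top-event probability several orders of magnitude lower than either valve alone.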
Procedure
Steps to get the final Boolean equation:
[Figure: fault tree in which TOP is an OR gate over IE1 and IE2, IE1 is an AND gate over A and B, and IE2 is an AND gate over C and D.]
1. Replace AND gates with the product of their inputs.
IE1 = A.B
IE2 = C.D
2. Replace OR gates with the sum of their inputs.
TOP = IE1 + IE2 = A.B + C.D
3. Continue this replacement until all intermediate event gates have been replaced and only the basic events remain in the equation.
TOP = A.B + C.D
Boolean Algebra Reduction Example:
[Figure: fault tree with top event TOP and intermediate events IE1, IE2, IE3, and IE4.]
TOP = IE1 + IE2
= (A.B) + (A + IE3)
= A.B + A + (C.D.IE4)
= A.B + A + (C.D.D.B)
= A + A.B + B.C.D.D    (D.D = D)
= A + A.B + B.C.D      (A + A.B = A)
= A + B.C.D
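The reduction can be checked by brute force over all truth assignments of the basic events. The Python sketch below assumes the gate structure implied by the derivation (IE1 = A.B, IE2 = A + IE3, IE3 = C.D.IE4, IE4 = D.B); that structure is inferred from the equations, not taken from the original figure.

```python
from itertools import product


def top_unreduced(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Evaluate the tree gate by gate (structure inferred from the derivation)."""
    ie4 = d and b
    ie3 = c and d and ie4
    ie2 = a or ie3
    ie1 = a and b
    return ie1 or ie2


def top_reduced(a: bool, b: bool, c: bool, d: bool) -> bool:
    """The reduced expression TOP = A + B.C.D."""
    return a or (b and c and d)


# Compare both forms on all 16 combinations of the basic events.
equivalent = all(
    top_unreduced(*values) == top_reduced(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)  # True
```

An exhaustive check like this is practical only for small trees, but it is a quick way to catch algebra slips in a hand reduction.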
Faults can be classified as primary, secondary, and command. A fault is primary if the component or part is functioning within its design parameters when an inherent failure occurs. A secondary fault occurs when an environmental stress or an excessive operational stress causes the failure. A command fault is one that occurs as a result of a correct action being carried out at the wrong time or place. For example, a command fault may occur when power is turned on prematurely or a cooling subsystem is turned off before the system has been shut down.
A Failure Mode is:
- The way in which a component, subassembly, product, input, or process could fail to perform its intended function. Failure modes may be the result of upstream operations or may cause downstream operations to fail.
- Things that could go wrong.
FMEA
A structured approach to:
- Identifying the ways in which a product or process can fail
- Estimating the risk associated with specific causes
- Prioritizing the actions that should be taken to reduce risk
- Evaluating the design validation plan (design FMEA) or the current control plan (process FMEA)
Types of FMEAs
Design
- Analyzes product design before release to production, with a focus on product function
- Analyzes systems and subsystems in early concept and design stages
Process
- Used to analyze manufacturing and assembly processes after they are implemented
FMEA Procedure
1. For each process input (start with high-value inputs), determine the ways in which the input can go wrong (failure modes).
2. For each failure mode, determine the effects. Select a severity level for each effect.
3. Identify potential causes of each failure mode. Select an occurrence level for each cause.
4. List current controls for each cause. Select a detection level for each cause.
5. Calculate the Risk Priority Number (RPN), the product of the severity, occurrence, and detection ratings.
6. Develop recommended actions, assign responsible persons, and take actions. Give priority to high RPNs. Severities rated 10 must always be reviewed.
7. Assign the predicted severity, occurrence, and detection levels and compare RPNs.
Rating Scales
Severity: 1 = Not Severe, 10 = Very Severe
Occurrence: 1 = Not Likely, 10 = Very Likely
Detection: 1 = Easy to Detect, 10 = Not Easy to Detect
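The RPN calculation and prioritization can be sketched as follows, using the conventional definition RPN = Severity x Occurrence x Detection on the 1 to 10 scales. The failure modes and ratings are hypothetical, and placing severity-10 items first is one possible reading of the rule that such severities must be looked at.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 = not severe, 10 = very severe
    occurrence: int  # 1 = not likely, 10 = very likely
    detection: int   # 1 = easy to detect, 10 = not easy to detect

    def __post_init__(self):
        # Reject ratings outside the 1 to 10 scale.
        for field in ("severity", "occurrence", "detection"):
            if not 1 <= getattr(self, field) <= 10:
                raise ValueError(f"{field} must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Severity-10 items first, then the rest by descending RPN."""
    return sorted(modes, key=lambda m: (m.severity != 10, -m.rpn))


# Hypothetical failure modes for illustration only.
modes = [
    FailureMode("Seal leak", 7, 4, 3),
    FailureMode("Sensor drift", 5, 6, 8),
    FailureMode("Overpressure", 10, 2, 2),
]
for mode in prioritize(modes):
    print(mode.name, mode.rpn)
```

After recommended actions are taken, the same calculation can be repeated with the predicted ratings to compare the new RPNs against the originals.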
RPN = Severity × Occurrence × Detection
Sequential Reliability Testing: The cumulative number of failures observed in the sample is plotted against the accumulated test time of the items. Based on the acceptable mean life $\theta_0$, an associated producer's risk $\alpha$, a minimum mean life $\theta_1$, and an associated consumer's risk $\beta$, equations for the acceptance line and the rejection line are found. If the plot stays between the two lines, testing continues; if the plot falls in the acceptance region, the test is terminated and the lot is accepted; if the plot falls in the rejection region, the test is terminated and the lot is rejected.
Availability
Availability is the probability that a component or system is operational at a given time t (i.e. it has not failed, or it has been restored after failure). The availability of an item is the probability that it is operating satisfactorily at any point in time when used under stated conditions, where the total time considered includes operating time, active repair time, and logistic time. For example, if a vehicle has 99.9% availability, then one time out of a thousand someone who needs the vehicle will find it not operational because some part is either damaged or in the process of being replaced.
Maintainability
Maintainability is defined as the probability of performing a successful repair action within a given time. It is a measure of the ease and speed with which a system can be restored to operational status after a failure occurs.
$M(t) = 1 - e^{-\mu t}$
where $\mu$ is the repair rate, and the Mean Time To Repair is
$\text{MTTR} = 1 / \mu$
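The maintainability function and MTTR can be evaluated with a small Python sketch. The repair rate used (0.5 repairs per hour) is a hypothetical value chosen for illustration, and the function names are not from the slides.

```python
import math


def maintainability(repair_rate: float, t: float) -> float:
    """Probability that a repair is completed within time t (constant repair rate)."""
    return 1.0 - math.exp(-repair_rate * t)


def mttr(repair_rate: float) -> float:
    """Mean time to repair: the reciprocal of the repair rate."""
    if repair_rate <= 0:
        raise ValueError("repair_rate must be positive")
    return 1.0 / repair_rate


# Hypothetical repair rate of 0.5 repairs per hour.
mu = 0.5
print(mttr(mu))                  # hours
print(maintainability(mu, 2.0))  # probability of finishing a repair within 2 hours
```

Under a constant repair rate, the probability of completing a repair within one MTTR is 1 - e^-1, roughly 0.63, so MTTR is not a time by which most repairs are guaranteed to finish.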