Establishment of Models and Data Tracking For Small UAV Reliability
2004-06
Dermentzoudis, Marinos
Monterey California. Naval Postgraduate School
http://hdl.handle.net/10945/1157
NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA
THESIS
by
Marinos Dermentzoudis
June 2004
11. SUPPLEMENTARY NOTES: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
12a. DISTRIBUTION / AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited
12b. DISTRIBUTION CODE: A
This thesis surveys existing reliability management and improvement techniques, and describes
how they can be applied to small unmanned aerial vehicles (SUAVs). These vehicles are currently
unreliable, and lack systems to improve their reliability. Selection of those systems, in turn, drives data
collection requirements for SUAVs, which we also present, with proposed solutions.
This thesis lays the foundation for a Navy-wide SUAV reliability program.
14. SUBJECT TERMS: reliability improvement, FMECA, FRACAS, reliability growth
15. NUMBER OF PAGES: 247
Approved for public release; distribution is unlimited
Marinos Dermentzoudis
Commander, Hellenic Navy
B.S., Naval Academy of Greece, 1986
from the
Russell Gottfried
Second Reader
James N. Eagle
Chairman, Department of Operations Research
TABLE OF CONTENTS
I. INTRODUCTION........................................................................................................1
A. BACKGROUND (UAVS, SUAVS).................................................................1
1. UAV – Small UAV ...............................................................................1
2. The Pioneer RQ-2 ................................................................................3
a. The Predator RQ-1....................................................................4
b. The Global Hawk RQ-4 ............................................................5
c. The Dark Star RQ-3..................................................................5
3. RQ-5 Hunter.........................................................................................6
4. RQ-7 Shadow 200.................................................................................6
5. RQ-8 Fire Scout....................................................................................6
6. Residual UAV Systems........................................................................7
7. Conceptual Research UAV Systems...................................................7
8. DARPA UAV Programs ......................................................................8
9. Other Nations’ UAVs...........................................................................9
10. NASA.....................................................................................................9
11. What Is a UAV? .................................................................................10
12. Military UAV Categories ..................................................................11
13. Battlefield UAVs.................................................................................12
a. Story 1. Training at Fort Bragg..............................................12
b. Story 2. Desert Shield/Storm Anecdote ..................................13
14. Battlefield Missions............................................................................14
a. Combat Surveillance UAVs ....................................................15
b. Tactical Reconnaissance UAVs..............................................15
B. PROBLEM DEFINITION ............................................................................16
1. UAVs Mishaps....................................................................................16
2. What is the Problem? ........................................................................17
3. What is the Importance of the Problem?.........................................19
4. How Will the Development Teams Solve the Problem without
the Thesis? ..........................................................................................20
5. How Will This Thesis Help?..............................................................20
6. How Will We Know That We Have Succeeded?...............................20
7. Improving Reliability.........................................................................20
8. Area of Research ................................................................................21
II. RELATED RESEARCH ...........................................................................................23
A. EXISTING METHODS.................................................................................23
1. General: FMEA, FMECA and FTA.................................................23
a. Introduction to Failure Mode and Effect Analysis
(FMEA) ...................................................................................23
b. Discussion................................................................................24
c. FMEA: General Overview ......................................................26
d. When is the FMEA Started?...................................................26
e. Explanation of the FMEA ......................................................27
f. The Eight Steps Method for Implementing FMEA ...............28
g. FMEA Team............................................................................31
h. Limitations Applying FMEA ..................................................31
i. FMEA Types ...........................................................................32
j. System and Design FMEA......................................................32
k. Analysis of Design FMEA ......................................................33
l. FMEA Conclusion ..................................................................36
m. Other Tools..............................................................................36
2. Manned Aviation Specific: RCM, MSG-3 .......................................42
a. Introduction to RCM...............................................................42
b. The Seven Questions...............................................................43
c. RCM-2 .....................................................................................43
d. SAE STANDARD JA 1011 .....................................................44
e. MSG-3......................................................................................45
f. MSG-3 Revision ......................................................................47
g. General Development of Scheduled Maintenance ................48
h. Divisions of MSG-3 Document...............................................49
i. MSI Selection ..........................................................................49
j. Analysis Procedure .................................................................51
k. Logic Diagram.........................................................................51
l. Procedure ................................................................................55
m. Fault Tolerant Systems Analysis ............................................55
n. Consequences of Failure in the First level ............................56
o. Failure Effect Categories in the First Level ..........................57
p. Task Development in the Second level ...................................58
3. Comparison of Existing Methods .....................................................61
a. RCM.........................................................................................61
b. Conducting RCM Analysis .....................................................62
c. Nuclear Industry & RCM .......................................................62
d. RCM in NAVAIR ....................................................................63
e. RCM in Industries Other Than Aviation and Nuclear
Power .......................................................................................64
f. FMEA and RCM.....................................................................66
g. FMECA ...................................................................................67
h. FTA, FMEA, FMECA ............................................................67
i. FTA..........................................................................................68
j. RCM Revisited.........................................................................68
k. UAVs, SUAVs versus Manned Aircraft .................................70
l. Conclusions-Three Main Considerations about UAV-
RCM.........................................................................................71
B. SMALL UAV RELIABILITY MODELING ..............................................73
1. System’s High Level Functional Architecture ................................73
2. System Overview................................................................................77
3. System Definition ...............................................................................78
4. System Critical Functions Analysis..................................................80
5. System Functions ...............................................................................82
6. Fault Tree Analysis ............................................................................82
7. Loss of Mission ...................................................................................83
8. Loss of Platform .................................................................................85
9. Loss of GCS ........................................................................................87
10. Loss of Platform’s Structural Integrity ...........................................89
11. Loss of Lift ..........................................................................................91
12. Loss of Thrust.....................................................................................93
13. Loss of Platform Control...................................................................95
14. Loss of Platform Position ..................................................................97
15. Loss of Control Channel....................................................................98
16. Engine Control Failure....................................................................100
17. Engine Failure ..................................................................................101
18. Failure of Fuel System .....................................................................103
19. Loss of Platform Power ...................................................................105
20. Loss of GCS Power ..........................................................................107
21. Operator Error.................................................................................109
22. Mechanical Engine Failure .............................................................111
23. Engine Vibrations ............................................................................113
24. Overheating ......................................................................................115
25. Inappropriate Engine Operation....................................................117
26. Follow-on Analysis for the Model...................................................119
27. Criticality Analysis...........................................................................125
28. Interpretation of Results .................................................................131
III. DATA COLLECTION SYSTEMS ........................................................................133
A. RELIABILITY GROWTH AND CONTINUOUS IMPROVEMENT
PROCESS .........................................................................................133
1. Failure Reporting and Corrective Action System (FRACAS).....133
a. Failure Observation ..............................................................134
b. Failure Documentation ........................................................135
c. Failure Verification ..............................................................135
d. Failure Isolation ...................................................................135
e. Replacement of Problematic Part(s).....................................135
f. Problematic Part(s) Verification ..........................................135
g. Data Search ...........................................................................135
h. Failure Analysis ....................................................................136
i. Root-Cause Analysis .............................................................136
j. Determine Corrective Action ................................................136
k. Incorporate Corrective Action and Operational
Performance Test ..................................................................136
l. Determine Effectiveness of Corrective Action .....................137
m. Incorporate Corrective Action into All Systems...................137
2. FRACAS Basics................................................................................137
3. FRACAS Forms ...............................................................................140
4. Discussion of the Form Terms........................................................141
5. Reliability Growth Testing .............................................................150
6. Reliability Growth Testing Implementation .................................152
B. RELIABILITY IMPROVEMENT PROCESS .........................................152
1. UAVs Considerations ......................................................................152
2. UAVs and Reliability .......................................................................154
a. Pilot Not on Board ................................................................154
b. Weather Considerations........................................................154
c. Gusts and Turbulence...........................................................156
d. Non Developmental Items (NDI) or Commercial Off-the-
shelf (COTS)..........................................................................156
e. Cost Considerations ..............................................................156
f. Man in the Loop....................................................................158
g. Collision Avoidance ..............................................................159
h. Landing..................................................................................159
i. Losing and Regaining Flight Control ..................................160
j. Multiple Platforms Control...................................................160
k. Reliability, Availability, Maintainability of UAVs ...............161
3. Reliability Improvement for Hunter ..............................................162
4. Measures of Performance (MOP) for SUAVs ...............................163
5. Reliability Improvement Program on SUAVs ..............................165
6. Steps for Improving Reliability on SUAVs....................................166
IV. EXAMPLE................................................................................................................171
A. RQ-2 PIONEER 86 THROUGH 95 ...........................................................171
V. CONCLUSION ........................................................................................................179
A. SUMMARY ..................................................................................................179
B. RECOMMENDATIONS FOR FUTURE RESEARCHERS...................181
APPENDIX A: DEFINITION OF FMEA FORM TERMS............................................183
1. First Part of the Analysis of Design FMEA ...................................183
2. The Second Part of the Analysis of Design FMEA .......................183
3. Third Part of the Analysis of Design FMEA .................................188
APPENDIX B: THE MRB PROCESS..............................................................................189
APPENDIX C: FAILURES ...............................................................................................191
1. Functions...........................................................................................191
2. Performance Standards...................................................................191
3. Different Types of Functions...........................................................191
4. Functional Failure............................................................................192
5. Performance Standards and Failures ............................................192
6. Failure Modes...................................................................................194
7. Failure Effects ..................................................................................194
8. Failure Consequences ......................................................................195
APPENDIX D: RELIABILITY .........................................................................................197
1. Introduction to Reliability...............................................................197
2. What is Reliability?..........................................................................198
3. System Approach .............................................................................198
4. Reliability Modeling.........................................................................199
a. System Failures .....................................................................199
b. Independent vs Dependent Failures.....................................200
c. Black-Box Modeling .............................................................200
d. White-Box Modeling .............................................................201
e. Reliability Measures..............................................................202
f. Structure Functions ..............................................................208
g. Series System Reliability Function and MTTF ...................210
h. Quantitative Measures of Availability..................................210
APPENDIX E: LIST OF ACRONYMS AND DEFINITIONS.......................................213
LIST OF REFERENCES ....................................................................................................217
INITIAL DISTRIBUTION LIST .......................................................................................225
LIST OF FIGURES
Figure 42. Condition Variable Versus Time.(From Hoyland, page 18)..........................202
Figure 43. Distribution and Probability Density Functions (From Hoyland, page 18)...203
Figure 44. Typical Distribution and Reliability Function ...............................................204
Figure 45. The Bathtub Curve.........................................................................................206
Figure 46. MTTF, MTTR, MTBF. (From Hoyland, page 25) ........................................208
LIST OF TABLES
ACKNOWLEDGMENTS
The author would like to acknowledge the assistance of the VC-6 Team that
operates the XPV 1B TERN SUAV system for providing valuable insight information
about their system.
In addition, I would like to thank the NPS Dudley Knox Library staff for their
high level of professionalism and their continuous support and help during my effort.
My sincere thanks also go to LCDR Russell Gottfried for his untiring support and
motivation of my research effort.
I could not have accomplished this thesis without the technical help and directions
patiently and expertly provided by my thesis advisor, Professor David Olwell. His
contribution to this thesis was enormous and decisive. I feel that he has also influenced
me to like reliability, which is a new and interesting field for me.
To my one-year old son, Stephanos, who brought joy to my life, I would like to
express my gratitude because he made me realize through his everyday achievements that
I had to bring this research effort to an end.
Last, but not least, I would like to express my undying and true love to my
beautiful wife, Vana and dedicate this thesis to her. Without her support, encouragement,
patience, and understanding, I could not have performed this research effort.
EXECUTIVE SUMMARY
Small UAVs will be used with growing frequency in the near future for military
operations. As SUAVs progress from being novelties and toys to becoming full members
of the military arsenal, their reliability and availability must begin to approach the levels
expected of military systems. They currently miss those levels by a wide margin.
The military has wide experience with the need for reliability improvement in
systems, and in fact developed or funded the development of many of the methods
discussed in this thesis. These methods have not yet been applied to SUAVs.
With the crude data that exist on one UAV system, I was able to perform a
preliminary analysis using a reliability growth model based on Duane’s postulate. With
good data, the Navy will be able to do much more, as outlined in this thesis.
I. INTRODUCTION
The UAV puts eyes out there in places we don’t want to risk
having a manned vehicle operate. Sometimes it’s very dull, but necessary
work—flying a pattern for surveillance or reconnaissance. UAVs can go
into a dirty environment where there’s the threat of exposure to nuclear,
chemical or biological warfare. They are also sent into dangerous
environments—battle zones: Dull, Dirty, Dangerous. The primary reason
for the UAV is the Three D’s.2
The Lightning Bug was based on the earlier Firebee. It operated from 1964 until
April 1975, performing a total of 3,435 flight hours in RECCE missions that were too
dangerous for manned aircraft, especially during the Vietnam War. Some of its most
valuable contributions were photographing prisoner camps in Hanoi and Cuba, providing
photographic evidence of SA-2 missiles in North Vietnam, providing low-altitude battle
assessment after B-52 raids, and acting as a tactical air launched decoy.6
In 1962, Lockheed Martin began developing the D-21 supersonic RECCE drone,
the Tagboard. It was designed to be launched from either the back of a two-seat A-12,
which was under development at the same time, or from the wing of a B-52H. The drone
could fly at speeds greater than Mach 3.3 at altitudes above 90,000 feet, and had a range
of 3,000 miles. The project was canceled in 1971, together with the A-12 development,
due to numerous failures, high operating costs, and poor management.7
4 The Global Aircraft Organization, US Reconnaissance, “U-2 Dragon Lady,” Internet, February 2004.
Available at: http://www.globalaircraft.org/planes/u-2_dragon_lady.pl
5 Clark, Richard M., Lt Col, USAF, “Uninhabited Combat Aerial Vehicles, Airpower by the People,
For the People, But Not with the People,” CADRE Paper No. 8, Air University Press, Maxwell Air Force
Base, Alabama, August 2000, Internet, February 2004. Available at: http://www.maxwell.af.mil
/au/aupress/CADRE_Papers/PDF_Bin/clark.pdf
6 Ibid.
In addition to the RECCE role, Teledyne Ryan experimented with strike versions
of the BQM-34 drone, the Tomcat. They investigated the possibility of arming the
Lightning Bug with Maverick electro-optical-seeking missiles or with electro-optically
guided Stubby Hobo bombs. Favorable results were demonstrated in early 1972, but the
armed drones were never used during the Vietnam War. Interest in UAVs was fading by the
end of the Vietnam War.8
In the 1973 Yom Kippur War, the Israelis used UAVs effectively as decoys to
draw antiaircraft fire away from attacking manned aircraft. In 1982, UAVs were used to
obtain the exact location of air defenses and gather electronic intelligence information in
Lebanon and Syria. The Israelis also used UAVs to monitor airfield activities, changing
strike plans accordingly.9
2. The Pioneer RQ-2
The US renewed its interest in UAVs in the late 1980s and early 90s, with the
start of the Gulf War. Instead of developing one from scratch, the US acquired and
improved the Scout, which was used by the Israelis in 1982 against the Syrians. The
outcome was the Pioneer, which was bought by the Navy to provide cheap unmanned
over the horizon targeting (OTHT), RECCE, and battle assessment. The Army and
Marines bought the Pioneer for similar roles and six Pioneer systems were deployed to
SW Asia for Desert Storm.
Compared to the Lightning Bug, the Pioneer is slower, larger, and lighter, but
cheaper. The average cost of the platform was only $850K, which was inexpensive
relative to the cost of a manned RECCE aircraft.11 With its better sensor technology, the
Pioneer can deliver real-time battlefield assessment in a video stream, a huge improvement
over the film processing required for the Lightning Bugs.
7 Carmichael.
8 Ibid.
9 Ibid.
10 The material of this section is taken (in some places verbatim) from GlobalSecurity.org, “Pioneer
Short Range (SR) UAV,” maintained by John Pike, last modified: November 20, 2002, Internet, May 2004.
Available at: http://www.globalsecurity.org/intell/systems/pioneer.htm
11 National Air and Space Museum, Smithsonian Institution, “Pioneer RQ-2A,” 1998-2000, revised
9/14/01, Connor R. and Lee R. E., Internet, May 2004. Available at: http://www.nasm.si.edu/research
/aero/aircraft/pioneer.htm
By 2000, after 15 years of operations, the Pioneer had logged more than 20,000
flight hours. Apart from Desert Storm it was used in Desert Shield, in Bosnia, Haiti,
Somalia, and for other peacekeeping missions. The Navy used the Pioneer to monitor the
Kuwait and Iraqi coastline and to provide spotting services for every 16-inch round fired
by its battleships.
Three different platforms compose the endurance UAV family: Predator, Global
Hawk, and Dark Star.
a. The Predator RQ-1
Predator is a by-product of the CIA-developed Gnat 750, also known as
the Tier II or medium altitude endurance (MAE) UAV. It is manufactured by General
Atomics Aeronautical Systems and costs about $3.2M to $4.5M per platform.13 Its
endurance was designed to be greater than 40 hours with a cruising speed of 110 knots
and operational speed of 75 knots using a reciprocating engine with a 25,000-foot ceiling
and 450-pound payload. Predator can carry electro-optical (EO) and infrared (IR)
sensors. It also collects full-rate video imagery and transmits it in near real-time via
satellite, other UAVs, manned aircraft or line-of-sight (LOS) data link. More importantly,
Predator is highly programmable. It can go from autonomous flight to manual control by
a remote pilot.
12 The material for this section is taken (in some places verbatim) from: Carmichael.
13 Ciufo, Chris A., “UAVs: New Tools for the Military Toolbox,” [66] COTS Journal, June 2003,
Internet, May 2004. Available at: http://www.cotsjournalonline.com/2003/66
Except for Pioneer, Predator is the most tested and commonly used UAV. It was
first deployed to Bosnia in 1994, next in the Afghan War of 2001, and then in the Iraqi
war of 2003.
Used as a low altitude UAV, Predator can perform almost the same tasks as
Pioneer: surveillance, RECCE, combat assessment, force protection, and close air
support. It can also be equipped with two laser-guided Hellfire missiles for direct hits at
moving or stationary targets. During operation Enduring Freedom in Afghanistan,
Predators were considered invaluable to the troops for scouting around the next bend of
the road or over the hill for hidden Taliban forces.
Used as a high altitude UAV, the Predator can perform surveillance over a wide
area for up to 30 to 45 hours. In Operation Iraqi Freedom, Predators were deployed near
Baghdad to attract hostile fire from the city’s anti-air defense systems. Once the locations
of these defense systems were revealed, manned airplanes eliminated the targets.
b. The Global Hawk RQ-4
A Tier II+ aircraft, the Global Hawk is a conventional high-altitude endurance
(CHAE) UAV by Teledyne Ryan Aeronautical. A higher-performance vehicle, it was
designed to fulfill a post-Desert Storm requirement for high resolution RECCE of a
40,000 square nautical mile area in 24 hours. It can fly for more than 40 hours and over
3,000 miles away from its launch and recovery base carrying a synthetic aperture radar
(SAR) and an EO/IR payload of 2,000 pounds at altitudes above 60,000 feet at a speed of
340 knots. The cost of a Global Hawk is about $57M per unit.15
c. The Dark Star RQ-3
The Tier III stealth or low observable high altitude endurance (LOHAE)
RQ-3 UAV was the Lockheed-Martin/Boeing Dark Star. Its primary purpose was to
image well-protected, high-value targets. Capable of operating for more than eight hours
at altitudes above 45,000 feet and a distance of 500 miles from its launch base, it was
designed to meet a $10M per-platform unit cost. Its first flight occurred in March 1996;
however, the second flight, in April 1996, ended in a crash caused by incorrect
aerodynamic modeling in the vehicle flight-control laws. The project was cancelled in 1999.17
14 The material for this section is taken (in some places verbatim) from: Carmichael.
15 Ciufo.
16 The material for this section is taken (in some places verbatim) from: Carmichael.
In the designation RQ-3, the "R" is the Department of Defense code for
reconnaissance and the "Q" denotes an unmanned aircraft system. The "3" indicates the
third in the series of purpose-built unmanned reconnaissance aircraft systems.18
3. RQ-5 Hunter
Initially engaged to serve as the Army’s short range UAV system for division and
corps commanders at a cost of $1.2M per unit,20 the RQ-5 Hunter can carry a 200 lb load
for more than 11 hours. It uses an electro-optical infrared (EO/IR) sensor, and relays its
video images in real-time via a second airborne Hunter over a line-of-sight (LOS) data
link. It deployed to Kosovo in 1999 to support NATO operations. Production was
cancelled in 1999 but the remaining low-rate initial production (LRIP) platforms remain
in service for training and experimental purposes. Hunter is to be replaced by the Shadow
200 or RQ-7 tactical UAV (TUAV).
4. RQ-7 Shadow 200
The Army selected the RQ-7 Shadow 200 in December 1999 as the close range
UAV for support to ground maneuver commanders. It is launched from a catapult rail,
recovered with the aid of arresting gear, and can remain on station for at least four hours
with a 60-lb payload.
5. RQ-8 Fire Scout
The RQ-8 Fire Scout is a vertical take-off and landing (VTOL) tactical UAV
(VTUAV). It can remain on station for at least three hours at 110 knots with a payload of
200 lb. Its scouting equipment consists of an EO/IR sensor with an integral laser
designator/rangefinder. Data are relayed to its ground or ship control station in real time
over a LOS data link, with a UHF backup; the system can operate from all air-capable ships.
17 GlobalSecurity.org, “RQ-3 Dark Star Tier III Minus,” maintained by John Pike, last modified:
November 20, 2002, Internet, May 2004. Available at: http://www.globalsecurity.org
/intell/systems/darkstar.htm
18 Ibid.
19 The material for this section is taken (in some places verbatim) from: Office of the Secretary of
Defense (OSD), “Unmanned Aerial Vehicles Roadmap 2000-2025,” April 2001, page 4.
20 Ciufo.
21 The material for this section is taken (in some places verbatim) from: OSD 2001, page 5.
22 The material for this section is taken (in some places verbatim) from: OSD 2001, page 5.
6. Residual UAV Systems23
The US military maintains residual assets from several UAV programs that are no
longer in development but have recently deployed with operational units and trained
operators. The BQM-147 Exdrone is an 80-lb delta-wing communications jammer that
was deployed during the Gulf War. From 1997 to 1998 some were rebuilt, renamed
Dragon Drone, and deployed with Marine Expeditionary Units. Air Force Special
Operations Command and the Army Air Maneuver Battle Lab are also conducting
experiments with Exdrones.
23 Ibid, page 6.
24 Ibid, pages 7-8.
Sponsored by the Defense Threat Reduction Agency, the Counterproliferation
(CP) Advance Concept Technology Demonstrations (ACTD) envisions deploying several
mini-UAVs like the Finder from a larger Predator UAV to detect chemical agents and
relay the results back through Predator.
Besides the Dragon Eye and Finder mentioned above, the Naval Research
Laboratory (NRL) has built and flown several small and micro-UAVs; definitions of these airframes follow below. The Naval Air Warfare Center Aircraft Division
(NAWC/AD) maintains a small UAV test and development team and also operates
various types of small UAVs.
8. DARPA UAV Programs26
The Defense Advanced Research Projects Agency (DARPA) is sponsoring five major UAV development programs:
a. The Air Force X-45 UCAV, awarded to Boeing in 1999. The mission for the UCAV is Suppression of Enemy Air Defenses (SEAD). The platform will cost one third as much as a Joint Strike Fighter (JSF) to acquire and one quarter as much to operate and support (O&S). The X-45A, built with radar-absorbing materials and designed to carry two 500-kg bombs at a maximum speed of 1000 km/h, first flew in June 2002.
Grumman successfully flew in March 2003 using modified GPS coordinates for
navigation.
(1) The Dragon Fly Canard Rotor Wing, which will demonstrate vertical take-off-and-landing (VTOL) capability and then transition to fixed-wing flight for cruise.
Derivatives of the Israeli designs are the Crecerelle, used by the French Army; the Canadair CL-289, used by the German and French Armies; and the British Phoenix. The Russians use the VR-3 Reys and the Tu-300, and the Italians the Mirach 150.27
10. NASA
In the civilian sector, NASA has been the main agency concerned with
developing medium and high-altitude long endurance UAVs. The agency has been
involved with two main programs, “Mission to Planet Earth” and “Earth Science Enterprise,” for environmental monitoring of the effects of global climate change. During the late 1980s, NASA started to operate high-altitude manned aircraft, but later decided to develop a UAV for high-altitude operations. NASA constructed the propeller-driven
27 Petrie, G., Geo Informatics, Article “Robotic Aerial Platforms for Remote Sensing,” Department of
Geography &Topographic Science, University of Glasgow, May 2001, Internet, February 2004. Available
at: http://web.geog.gla.ac.uk/~gpetrie/12_17_petrie.pdf
Perseus between 1991 and 1994 and Theseus, which was a larger version of Perseus, in
1996.
The development of solar-powered UAVs is also being supported and funded by NASA. The idea, development, and construction were initiated by the Aerovironment company, which has been involved in the construction of solar-powered aircraft for 20 years. Solar Challenger, HALSOL, Talon, Pathfinder, Centurion, and Helios, with a wingspan of 247 feet, were among the solar-powered aircraft produced during those efforts.28
a. UAVs are designed to be recovered at the end of their flight while cruise
missiles are not.
We also classify military UAVs in three main categories, with ceiling as the driving characteristic: Tactical UAVs (TUAVs), Medium-Altitude Endurance
UAVs (MAE UAVs), and High-Altitude Endurance UAVs (HAE UAVs).32
31 Ibid.
32 Tozer, Tim, and others, “UAVs and HAPs-Potential Convergence for Military Communications,”
University of York, DERA Defford, undated, Internet, February 2004. Available at: http://www.elec.york
.ac.uk/comms/papers/tozer00_ieecol.pdf
33 Pike, John, Intelligence Resource Program, “Unmanned Aerial Vehicles (UAVS),” Internet, March
2004. Available at: http://www.fas.org/irp/program/collect/uav
34 The material for this part of the section is taken (in some places verbatim) from: OSD 2002, Section 2,
“Current UAV programs.”
(2) Mini UAVs have a span up to four feet. They provide the
company/platoon/squad level with an organic RSTA capability out to 10 Km. The
Aerovironment Dragon Eye is an example of this category.35
(3) Small UAVs (SUAVs) are greater than four feet in length. “SUAV is a low-cost and user-friendly UAV system.” It is a highly mobile air
vehicle system that among other potentials allows the small warfighting unit to set the
foundation to exploit battlefield information superiority.36
b. Tier II or MAE UAVs are larger than TUAVs, more expensive, with an average cost of $1M (FY00), and have enhanced performance. Their payload can reach 300 kg, their endurance is 12 or more hours, and their ceiling is up to 20,000 feet. Predator is a typical example of a MAE UAV.
c. Tier II Plus or HAE UAVs can be large craft with an endurance of more than 24 hours, payload capacities of more than 800 kg, and a ceiling of more than 30,000 feet. Their average cost is about $10M (FY00). Global Hawk is a typical example of an HAE UAV.
d. Tier III Minus or LOHAE UAVs can be large craft with an endurance of more than 12 hours, payload capacities of more than 300 kg, and a ceiling of more than 65,000 feet. Dark Star was a typical example of an LOHAE UAV.
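The tier boundaries quoted above lend themselves to a compact summary. The following sketch encodes them as a simple decision rule; the function name and comparison order are my own illustration, and real tier assignment also reflects cost, mission, and program designation:

```python
def classify_tier(ceiling_ft, endurance_hr, payload_kg):
    """Rough tier classification using the thresholds quoted in the text.

    Illustrative only: real tier assignment also depends on cost,
    mission, and program designation.
    """
    if ceiling_ft > 65_000 and endurance_hr > 12 and payload_kg > 300:
        return "Tier III Minus (LOHAE)"   # e.g., Dark Star
    if ceiling_ft > 30_000 and endurance_hr > 24 and payload_kg > 800:
        return "Tier II Plus (HAE)"       # e.g., Global Hawk
    if ceiling_ft <= 20_000 and endurance_hr >= 12 and payload_kg <= 300:
        return "Tier II (MAE)"            # e.g., Predator
    return "Tactical (TUAV) or smaller"

print(classify_tier(66_000, 13, 350))   # Tier III Minus (LOHAE)
print(classify_tier(15_000, 24, 200))   # Tier II (MAE)
```

Checking the highest-ceiling tier first keeps the rules from shadowing one another, since an HAE platform also clears the MAE thresholds.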
13. Battlefield UAVs
Here are two descriptions of the use of UAVs in training and combat.
a. Story 1. Training at Fort Bragg
“FDC this is FO adjust fire, over”. “FO this is FDC adjust fire,
out”. “FDC grid 304765, over”. “FO grid 304765, out”. “FDC two tanks
in the open, over”. “FO that’s two tanks in the open, out”. Then about 30
seconds later, “FO shot, over”. “FDC shot, out”. “FO splash, over”. “FDC
splash, out”. Fort Bragg, N.C. (April 5, 2001).
35 Ibid.
36 NAVAIR, “Small Unmanned Aerial Vehicles,” undated, Internet, February 2004. Available at:
http://uav.navair.navy.mil/smuav/smuav_home.htm
Thunder, the 3rd Battalion, 14th Marines used a different type of forward
observer.
For this mission, the UAV, which was flying at around 6,000 to
8,000 feet, was used to identify targets. They then looked at that data and
turned it into a fire mission, which was sent to the Marines on the gun line.
Once the Marines on the gun line blasted their round toward the target, the
UAV was used to adjust fire. “After using the UAV, I think it is equal to,
if not better than, a forward observer,” said Zoganas. “A forward observer
has a limited view depending on where he is at, but a UAV, being in the
air, has the ability to cover a lot more area,” said Zoganas. “I think the
UAV’s capabilities are underestimated, it is a great weapon to have on the
modern battlefield.”37
37 Zachany, Bathon A., Marine Forces Reserve, “Unmanned Aerial Vehicles Help 3/14 Call For and
Adjust Fire,” Story ID Number: 2001411104010, April 5, 2001, Internet, February 2004. Available at:
http://www.13meu.usmc.mil/marinelink/mcn2000.nsf/Open document
handkerchiefs, undershirts, and bed sheets, they signalled their desire to
surrender. Imagine the consternation of the Pioneer aircrew who called the
commanding officer of Wisconsin and asked plaintively, “Sir, they want to
surrender, what should I do with them?”38
UAVs have been used for the above missions since their inception. They can also
be used for target acquisition, target designation and battle damage assessment (BDA).
Due to their small size, they can operate more discreetly than their manned counterparts,
allowing target acquisition to occur with less chance of counter-detection. “The
surveillance UAV can be used to designate the target for a precision air and/or artillery or
missile strike while providing near real-time battle damage assessment to the force or
mission commander.”40 In that way, unnecessary repeat attacks on a target, and the attendant waste of munitions, can be avoided.
Battlefield UAVs are suited to all of the above missions. In the early 1950s, UAVs like the Northrop Falconer were developed for battlefield reconnaissance but saw little or no combat service. Later, the Israelis became the early developers of operational battlefield UAVs, in the southern Lebanon operations of the early 1980s. Their successes with battlefield UAVs drew international attention.41
38 The Warfighter’s Encyclopedia, Aircraft, UAVs, “RQ-2 Pioneer,” August 14, 2003, Internet,
February 2004. Available at: http://www.wrc.chinalake.navy.mil/warfighter_enc/aircraft/UAVs/pioneer
.htm
39 Ashworth, Peter, LCDR, Royal Australian Navy, Sea Power Centre, Working Paper No6, “UAVs
and the Future Navy”, May 2001, Internet, February 2004. Available at: http://www.navy.gov.au
/spc/workingpapers/Working%20Paper%206.pdf
40 The material for the above part of section is taken (in some places verbatim) from: Ashworth.
41 Goebel, Greg,/ In the Public Domain, “[6.0] US Battlefield UAVs (1),” Jan 1, 2003, Internet,
February 2004. Available at: http://www.vectorsite.net/twuav6.html
We can distinguish two broad categories of battlefield UAVs: the “combat surveillance” UAV and the “tactical reconnaissance” UAV.
a. Combat Surveillance UAVs42
The function of combat surveillance UAVs is to observe everything on a
battlefield in real-time, flying over the battle area, and relaying intelligence to a ground-
control station. In general, they are powered by a small internal-combustion two-stroke piston engine, known as a “chain saw” because of its characteristic noise. An autopilot system, with a radio-control (RC) back-up for manual operations, flies the platform through sets of waypoints programmed before takeoff. In most cases, the program is set up by displaying a map on a workstation, entering the coordinates, and downloading the program into the UAV. Navigation is always verified by GPS and often by an INS as well.
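The workflow just described (enter coordinates at a workstation, download the program to the UAV, verify navigation against GPS/INS) can be sketched with a minimal data structure. The names and validation rules below are illustrative, not any vendor's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Waypoint:
    lat: float     # latitude, degrees
    lon: float     # longitude, degrees
    alt_ft: float  # altitude, feet

def build_program(points):
    """Validate a waypoint list before downloading it to the autopilot."""
    for wp in points:
        if not (-90 <= wp.lat <= 90 and -180 <= wp.lon <= 180 and wp.alt_ft > 0):
            raise ValueError(f"invalid waypoint: {wp}")
    return points

# Illustrative two-leg program (coordinates in the Camp Roberts area):
program = build_program([
    Waypoint(35.76, -120.74, 1500.0),
    Waypoint(35.80, -120.70, 2000.0),
])
print(len(program))  # 2
```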
Sensors are generally housed in a turret underneath the platform and/or are
integrated into the platform’s fuselage. They usually feature day-night imagers and, in many cases, a laser designator, SIGINT packages, or Synthetic Aperture Radar (SAR).
Larger UAVs have fixed landing gear for takeoff and landing on small airstrips; they can also be launched by special rail-launcher boosters and recovered by parachute, parasail, or by flying into a net. Smaller UAVs may be launched by catapult and recovered in the same ways, or by landing on flat terrain without any landing gear.
b. Tactical Reconnaissance UAVs43
Tactical Reconnaissance (TR) UAVs are usually larger and in some cases
jet powered with extended range and speed. Like the combat surveillance UAVs, they are
equipped with an autopilot with RC backup. Their primary mission is to fly over
42 The material for this section is taken (in some places verbatim) from: Goebel.
43 Ibid.
predefined targets out of line of sight, and take pictures or relay near real-time data to the
ground-control station via satellite links.
A UAV of this type can usually carry day-night cameras and/or Synthetic
Aperture Radar (SAR). The necessary communication equipment is usually located on
the upper part of the platform’s fuselage. A TR UAV can be launched from runways or small airstrips, from an aircraft, or by special rail-launcher boosters, and be recovered by parachute.
The exact distinction between the two types of battlefield UAVs and other
types of UAVs is not clear. Some types are capable of both missions. A small combat
surveillance UAV may be the size of “a large hobbyist RC model plane.” It can be “used
to support military forces at the brigade or battalion level and sometimes they are called
‘mini UAVs.’ Their low cost makes them suitable for ‘expendable’ missions.”
B. PROBLEM DEFINITION
1. UAV Mishaps
According to the Office of the Secretary of Defense “UAV Roadmap,” the mishap rate for UAVs is difficult to define.
To get a view of the problem, we see that the 2002 crash rate for Predator was 32.8 crashes per 100,000 flight hours, and through May 2003 it was 49.6. The accident rate for the Global Hawk was 167.7 per 100,000 flight hours as of May 2003.46
Nevertheless, commanders can take greater risks with UAVs without worrying about loss of life; these risks would not be taken with manned aircraft. For example, the recently updated mishap rate (MR) for the F-16 was 3.5 per 100,000 flight hours. According to DoD data, the MR for the RQ-2A Pioneer was 363, while the MR for the RQ-2B dropped to 139. For the RQ-5 Hunter it was 255 for pre-1996 platforms, and has dropped to 16 since then. For the Predator, it was 43 for the RQ-1A and 31 for the RQ-1B.47
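All of the rates quoted above are normalized per 100,000 flight hours, so they can be reproduced from raw counts. A minimal sketch (the counts below are invented for illustration, not DoD data):

```python
def mishap_rate(mishaps, flight_hours):
    """Mishap rate normalized per 100,000 flight hours."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return mishaps / flight_hours * 100_000

# Hypothetical fleet: 7 mishaps over 20,000 flight hours.
print(mishap_rate(7, 20_000))  # 35.0
```

The normalization is what makes a small experimental fleet with few hours comparable, however roughly, to a mature platform such as the F-16.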
2. What is the Problem?
Currently a network experiment series named Surveillance and Tactical
Acquisition Network (STAN) is being conducted by the Naval Postgraduate School
(NPS) at Camp Roberts, with SUAVs as the sensor platforms and the primary source of
information. SUAV programs are currently of great interest to the Fleet, Special Forces,
and other interested parties, and are receiving large amounts of funding. There is a great deal of concern about the reliability of SUAVs because many problems have emerged in testing. Reliability must be improved.
This thesis documents these problems. At the CIRPAS site at McMillan Field in
Camp Roberts on September 11 and 12, 2003, I observed flight, communication, search
and detection, and target acquisition tests, using two different types of SUAV platforms,
XPV-1B TERN and Silverfox, an experimental program funded by the Office of Naval Research. Incidents regarding reliability that occurred during that time include:
46 Peck, Michael, National Defense Magazine, May 2003, Feature Article, “Pentagon Unhappy About
Drone Aircraft Reliability, Rising Mishap Rates of Unmanned Vehicles Attributed to Rushed
Deployments,” page 1, Internet, February 2004. Available at: http://www.nationaldefensemagazine.org
/article.cfm?Id=1105
47 Peck.
a. During the pre-takeoff checks at the runway end, an engine air-intake filter failed (its lock-wire support hole had broken). The problem was evidently due to engine vibrations. There was no spare filter or any other means of repairing the failure, so the filter was replaced with one from another TERN platform.
b. During the engine-start procedure, the starting device failed. The failure was due to a loose bolt, which prevented the device from starting the engine. After a ten-minute delay, the bolt was tightened.
c. After two and a half hours of flight operation on a TERN platform, the engine stalled in flight at 500 feet; the SUAV had run out of fuel.
Result: At least twenty minutes of cleaning was needed after such landings.
The next step of the STAN experiment series took place at the CIRPAS site at McMillan Field in Camp Roberts from May 2 to May 6, 2004. I observed flight, communication, search and detection, and networking tests, using the XPV-1B TERN on May 2 and 3. Incidents regarding reliability that occurred during that time include:
a. During the assembly checks in the hangar on May 2, a major software problem was detected. The team members were not able to repair it.
b. During the test flight of the next platform the same day, the engine stalled at 1000 feet, leading to a platform crash.
c. On May 3, after one hour of flight operation on the third platform, an in-flight autopilot software malfunction occurred that led to an automatic hard landing.48
d. On May 5, during the landing of the next TERN platform after two hours of flight operation, the front tire delaminated.49 The damage, probably due to operator error, could not be repaired by the team members.
e. On May 6, after one hour of flight operation, an in-flight right-wing servo failure occurred that resulted in loss of platform control and then a platform crash.50
48 Gottfried, Russell, LCDR (USN), Unmanned Vehicle Integration TACMEMO, 5-6 May Recap, e-
mail May 7, 2004.
49 Ibid.
50 Ibid.
reliability, operability, and reusability. In that way, they can become dependable systems
and be used in the battlefield with other systems.
4. How Will the Development Teams Solve the Problem without the
Thesis?
Trial and error and/or test, analyze and fix (TAAF) are the methods being used to overcome failures in the Silverfox system. Since the system is in the experimental phase, this is the easiest, though most time-consuming, approach.
For the other system (TERN), which has been operational for almost two years, an extended trial period is presently being conducted. From it, conclusions can be drawn for future system improvements and operational usage. Other experimental systems can also contribute to quantitative assessments of readiness and availability.
5. How Will This Thesis Help?
This thesis provides a tool for considering reliability issues by developing a system for tracking data that could improve reliability for SUAV systems.
6. How Will We Know That We Have Succeeded?
Verification and validation of the proposed solutions and methods by NAVAIR
and the other interested parties will indicate the accuracy and the effectiveness of the
framework suggested by this thesis.
7. Improving Reliability
UAV reliability is the main issue preventing the FAA from relaxing its restrictions on UAVs flying in civilian airspace, and foreign governments from allowing overflight and landing rights. Improved reliability, or simply knowing actual mishap rates and causes, will enable risk mitigation and eventual flight clearance.
Efforts toward improving UAV reliability are required, but how can this best be accomplished? The short answer is by spending money, but we can be more specific. More redundancy in flight-control systems may increase reliability, but there is a trade-off. The absence of components needed for manned aircraft makes UAVs cheaper, but it also degrades their reliability. If reliability is sacrificed, then high attrition will increase the number of UAVs needed, and the cost will rise again.
By focusing on flight-control systems, propulsion, and operator training, which account for approximately 80% of UAV mishaps, we can increase reliability.51 Redundancy in on-board systems is not easily added, especially to small UAVs: weight and volume restrictions are very tight, which can lead to expensive solutions. But if we make UAVs too expensive, we cannot afford to lose them.
51 Peck.
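The redundancy trade-off can be made concrete: for n independent, identical channels in parallel, the system fails only if every channel fails, so reliability is 1 - (1 - r)^n. Each added channel buys reliability at the price of weight, volume, and cost. A generic sketch, not a model of any particular SUAV:

```python
def parallel_reliability(r, n):
    """Reliability of n independent identical channels in parallel:
    the system works unless all n channels fail."""
    return 1 - (1 - r) ** n

# Hypothetical flight-control channel with per-sortie reliability 0.95:
print(parallel_reliability(0.95, 1))  # ~0.95
print(parallel_reliability(0.95, 2))  # ~0.9975: failure probability cut 20-fold
```

The same arithmetic shows why redundancy is so attractive where weight allows it: duplexing a 0.95 channel reduces the failure probability from 0.05 to 0.0025.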
II. RELATED RESEARCH
A. EXISTING METHODS
The following section presents and analyzes existing general methods of failure tracking and analysis, as well as the reliability-centered maintenance method that has been used by the civil aviation industry. A comparison between them, focusing on small UAV (SUAV) application, is also presented.
1. General: FMEA, FMECA and FTA
a. Introduction to Failure Mode and Effect Analysis (FMEA)
Well-managed companies are interested in preventing or at least
minimizing risk in their operations, through risk management analysis. “The risk analysis
has a fundamental purpose of answering the following two questions:
FMEA is one of the first systematic techniques for failure analysis. “An
FMEA is often the first step of a system’s reliability study.”54 It incorporates reviewing
components, assemblies and subsystems to identify failure modes, causes and effects of
such failures. FMEA is a systematic method of identifying and preventing product and
process failures before they occur. It is focused on preventing defects, enhancing safety,
and increasing customer satisfaction.
52 Stamatis, D. H., Failure Mode and Effect Analysis: FMEA from Theory to Execution, American
Society for Quality (ASQ), 1995, page xx. The above part of section is a summary and paraphrase (in some
places verbatim) of “Introduction.”
53 Ibid, page xxi.
54 Hoyland, A., and Rausand, M., System Reliability Theory: Models and Statistical Methods, New York: John Wiley and Sons, 1994, page 73.
The purpose of FMEA is to prevent process and product problems before they occur.55 Used in the design and manufacturing process, FMEAs reduce cost and effort by identifying product and process improvements early in the development phase, when it is easier, faster, and less costly to make changes. Formal FMEAs were first conducted in the aerospace industry in the mid-1960s to address safety issues. Industry in general (particularly automotive) later adapted the FMEA for use as a quality-improvement tool.
55 McDermott, E. R., Mikulak, J. R, and Beauregard, R. M., The Basics of FMEA, Productivity Inc.,
1996, page 4.
56 Stamatis, page xxi.
57 Stamatis, page xxii. The above part of section is a summary and paraphrase (in some places
verbatim) of “Introduction.”
58 Hoyland, page 74.
carried out according to the bottom-up approach. However, for some systems adopting
the top-down approach can save time and effort.59
(6) What inherent provisions are provided in the design to compensate for
the failures?”61
There are at least four prerequisites we must understand and consider while conducting an FMEA:
(1) Not all problems are the same, nor are they equally important.
Definitions of terms related to failure and failure modes are presented in
Appendix C.
c. FMEA: General Overview
For a system, an FMEA “is an engineering technique used to define, identify and eliminate known and/or potential failures” before they reach the end user.63 An FMEA may take two courses of action. First, historical data for similar products or systems may be analyzed. Second, inferential statistics, mathematical modeling, simulation, and reliability analysis may be used concurrently to identify and define the failures. An FMEA, if conducted properly and appropriately, provides the practitioner with useful information that can reduce the risk load in the system. It is one of the most important early preventive actions in a system, because it can prevent failures from occurring and reaching the user. “FMEA is a systematic way of examining all the possible ways in which a failure may occur. For each failure, an estimate is made of its effect on the system, of its seriousness, of its occurrence, and of its detection.” As a result, the corrective actions required to prevent failures from reaching the end user are identified, thereby assuring the highest durability, quality, and reliability possible in the system.64
d. When is the FMEA Started?65
As a methodology used to maximize the end user’s satisfaction by eliminating and/or reducing known or potential problems, FMEA must begin as early as possible, even if all the facts and information are not yet known. Once begun, an FMEA becomes a living document that is never really complete; it uses new information to improve the system and is continually updated as necessary. Therefore, an FMEA should be maintained throughout the entire system life.
66 The material from this section is taken (in some places verbatim) from: Stamatis, page 33, “Interpretation of FMEA.”
67 Stamatis, page 33.
68 Stamatis, page 35.
A failure’s priority is represented by the risk priority number (RPN), which is the product of occurrence times severity times detection. The value of the RPN is used only to rank-order the concerns of the system. If two or more failures have the same RPN, we first address the failure with the higher severity and then the one with the higher detection rating. Severity comes first because it concerns the effects of the failure; detection is next because user dependency is more important than failure frequency.
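The ranking rule just stated (order by RPN, break ties by severity first and detection second) can be written down directly. The failure modes and ratings below are hypothetical:

```python
def rank_failures(failures):
    """Sort failure modes by RPN (severity * occurrence * detection),
    breaking ties by severity, then by detection, all descending."""
    return sorted(
        failures,
        key=lambda f: (f["sev"] * f["occ"] * f["det"], f["sev"], f["det"]),
        reverse=True,
    )

modes = [
    {"name": "engine stall",      "sev": 9, "occ": 4, "det": 2},  # RPN 72
    {"name": "servo failure",     "sev": 8, "occ": 3, "det": 3},  # RPN 72 (tie)
    {"name": "tire delamination", "sev": 4, "occ": 2, "det": 5},  # RPN 40
]
# The tie at RPN 72 is broken in favor of the higher-severity engine stall.
print([f["name"] for f in rank_failures(modes)])
```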
(2) Do the functional block diagram: The first step for every
attempt to solve any problem is to become familiar with the subject to ensure that
everyone on the FMEA team has the same understanding of the process and the
production phases. A blueprint, an engineering drawing, or a flowchart review is
necessary. If it is not available, the team needs to create one. Team members should see
the product or a prototype and walk through the production process exactly. A block
diagram of the system provides an overview and a working model of the relationships
and interactions of the system’s subsystems and components.
69 McDermott, page 25. The above part of section is a summary and paraphrase (in some places
verbatim) of “Product/Design.”
70 The material from this section is taken (in some places verbatim) from Stamatis, pages 42-44, “The
Process of Conducting an FMEA,” and McDermott, pages 28-42, “The FMEA Worksheet.”
(3) Collect data: The team begins to collect and categorize data, then starts filling in the FMEA forms. The failures identified are the failure modes of the FMEA.
(c) By assigning the detection rating, we estimate how likely we are to detect a failure or its effect. We start by identifying controls that may detect a failure or its effect. If there are no controls, the likelihood of detection will be low, and the item receives a high rating (9-10).
(6) Results: Results are derived from the analysis. RPNs are calculated and all FMEA forms completed. The RPN is the product of severity times occurrence times detection for each item; the total RPN is the sum of all RPNs. This number is used as a metric for comparing the revised total RPN against the original RPN once the recommended actions have been introduced. We can now prioritize the failure modes from the highest RPN to the lowest. A Pareto chart or other diagram helps to visualize the differences between the various ratings and enables decisions about which items to work on. It is usually useful to set a threshold RPN such that everything above that point is addressed.
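The results step can be sketched end to end: compute each RPN, sum a total RPN as the baseline metric, and flag everything above a chosen threshold. Failure modes, ratings, and the threshold value below are hypothetical:

```python
def rpn(sev, occ, det):
    """Risk priority number: severity times occurrence times detection."""
    return sev * occ * det

ratings = {  # hypothetical modes: (severity, occurrence, detection)
    "engine stall":        (9, 4, 2),
    "autopilot fault":     (8, 3, 4),
    "air-filter cracking": (3, 5, 2),
}
rpns = {name: rpn(*r) for name, r in ratings.items()}
total_rpn = sum(rpns.values())  # baseline for comparison after corrective actions
threshold = 50
to_address = sorted(name for name, value in rpns.items() if value > threshold)
print(total_rpn, to_address)  # 198 ['autopilot fault', 'engine stall']
```

After corrective actions, the same computation is repeated with the revised ratings, and the new total RPN is compared against this baseline.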
(7) Confirm, evaluate and measure: After the results have been recorded, the success or failure of the actions is confirmed, evaluated, and measured. Using an organized process, we identify and implement actions to eliminate or reduce high-risk failure modes. It is very common to achieve a reduction in a high-risk failure mode; after doing so, we revisit the severity, occurrence, and detection ratings. Often the easiest way to improve a process or product is to increase the detectability of the failure, thus lowering the detection rating. This is not the best approach, however, because increasing detectability only makes it easier to detect failures once they occur. Reducing severity is important, especially in situations leading to injuries. The best improvement comes from reducing the likelihood of the failure’s occurrence: if a failure is highly unlikely to occur, there is less need for detection measures. Evaluation answers the question: “Is the situation better, worse, or the same as before?”
(8) Do it all over again: The team must pursue improvement until
the failures are completely eliminated, regardless of the answer from Step 7, because
FMEA is a process of continual improvement. The long-term goal is to eliminate or
mitigate every failure completely. The short-term goal is to minimize the effects of the
most serious failures, if not eliminate them. Once action has been taken to improve the
product, new ratings should be determined and a resulting RPN calculated. For the failure
modes that have been corrected, there should be a reduction in the RPN. Resulting RPNs
and total RPNs can be organized in diagrams and compared with the original RPNs.
There is no target RPN for FMEAs. It is up to the organization to decide how far the team should pursue improvements. Failures happen sooner or later; the question is how much relative risk the team is willing to accept. The answer, again, depends on management and the seriousness of the failure.
g. FMEA Team
“A team is a group of individuals who are committed to achieving common organizational objectives.” Team members meet regularly to identify and solve problems and to improve processes. They work and interact openly and effectively together and produce the desired results for the organization. “Synergy,” meaning that “the sum of the total is greater than the sum of the individuals,” is the defining characteristic of a team.71
71 Stamatis, pages 85-88. The above part of section is a summary and paraphrase (in some places
verbatim) of “What Is a Team?” and “Why Use a Team?”
72 McDermott, page 15. The above part of section is a summary and paraphrase (in some places
verbatim) of “The FMEA Team.”
73 Hoyland, page 80. The above part of section is a summary of “Applications.”
(2) FMEA gives inadequate attention to human errors because the
focus is on hardware failures.
74 The material from this section is taken (in some places verbatim) from: Stamatis, pages 101-129,
“System FMEA,” “Design FMEA.”
The first step in conducting the system FMEA is a feasibility study to find
solutions to a problem. The outcome of the system FMEA is an initial design with a
baseline configuration and operational specifications.
The form and the rating guidelines for the design FMEA (or any kind of FMEA) are not standardized. Each practitioner makes his own forms and rating guidelines, corresponding to the project’s special requirements and characteristics, as well as the designer’s vision and experience.
There are two ways the rating guidelines can be formulated: the qualitative method and the quantitative method. In both cases, the numerical values can run from 1 to 5 or from 1 to 10, the latter being most common.
[Design FMEA form: header fields are (1) Subsystem Name, (2A) Head of the System Design Team, (4) Supplier Involvement, (6) Engineering Release Date, (8) FMEA Date, and (10) Part Name. Worksheet columns are (11) Design Function; (12) Potential Failure Mode; (13) Potential Effect(s) of Failure; (14) Critical Characteristics; (15) Severity (SEV); (16) Potential Cause(s) of Failure; (17) Occurrence (OCC); (18) Detection Method; (19) Detection (DET); (20) Risk Priority Number (RPN); (21) Recommended Action; (22) Responsible Area or Person and Completion Date; and Action Results: (23) Action Taken and (24) revised SEV, OCC, DET, and RPN.]
l. FMEA Conclusion
Today’s technology can produce complex systems. UAVs are an example of the increased automation built into a complex system. To be able to develop these
systems efficiently, a number of appropriate system development processes can be used.
Implementing such a process from the early stages of design is important for overall development cost and time.
The objective of a FMEA is to look for all the ways a system or product
can fail. Failure occurs when a product or system does not function as it should, or when
the user makes a mistake. Failure modes are ways in which a product or process can fail.
Each failure mode has a potential effect. Some effects are more likely to occur than
others. Each effect has a risk associated with it. The FMEA process is a way to identify failure modes, effects, and risks within a process or product, and to eliminate or reduce them.
76 The material from this section is taken (in some places verbatim) from Stamatis, pages 51-67,
“Relationships of FMEA and Other Tools.”
This thesis develops an FTA for SUAVs. An outline of the FTA process follows:
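The quantitative core of an FTA, combining basic-event probabilities through OR and AND gates up to the top event, can be sketched in a few lines. The events and probabilities below are hypothetical, and the basic events are assumed independent:

```python
def p_or(*ps):
    """OR gate: the output occurs if any independent input occurs."""
    q = 1.0
    for p in ps:
        q *= 1 - p
    return 1 - q

def p_and(*ps):
    """AND gate: the output occurs only if all independent inputs occur."""
    q = 1.0
    for p in ps:
        q *= p
    return q

# Hypothetical top event: loss of platform in flight.
fuel_exhaustion = 0.02
engine_stall = 0.03
servo_failure = 0.01
backup_servo_failure = 0.01  # redundant channel, AND-ed with the primary

p_top = p_or(fuel_exhaustion, engine_stall,
             p_and(servo_failure, backup_servo_failure))
print(round(p_top, 6))  # 0.049495
```

The AND gate shows how redundancy appears in a fault tree: the servo branch contributes only 0.0001 to the top event, two orders of magnitude below the unprotected branches.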
(7) RCM.79 Reliability-centered maintenance (RCM) has its roots in the aviation industry.80 Airlines and airplane manufacturers developed the RCM process in the late 1960s; the initial development work was started by the North American civil aviation industry. The airlines at that time began to realize that existing maintenance philosophies were not only too expensive but also dangerous. In 1980, an international civil aviation group developed an inclusive basis for different maintenance strategies, known as Maintenance Steering Group-3 (MSG-3), for the aviation industry.81
The earliest view of failure, in the 1930s, was that products became more likely to fail as they aged, due to wear and tear, so the best way to optimize system reliability and availability was to provide maintenance on a routine basis. During World War II, awareness of infant mortality led to the widespread belief in the “bathtub curve.” Under that model, overhauls or component replacements should be done at
fixed time intervals to optimize system reliability and availability. This is based on the
assumption that most systems operate reliably for a period of “X” and then wear out.
Keeping records on failures enables us to determine “X” and take preventive actions just
before deterioration starts. This model is true for certain types of simple systems and
some complex ones with age-related failure modes. However, after 1960, due to
complexity of the systems, research revealed that six failure patterns actually occur in
practice. Data collection and analysis will enable NAVAIR to determine which apply to
SUAVs.
79 The material from this subsection is taken (in some places verbatim) from: Aladon Ltd, Specialists
in the application of Reliability-Centered Maintenance, “Reliability Centred Maintenance-An
Introduction,” Internet, February 2004. Available at: www.aladon.co.uk/10intro.html
80 Hoyland, page 79.
81 Aladon Ltd, Specialists in the application of Reliability-Centered Maintenance, “About RCM,”
Internet, February 2004. Available at: www.aladon.co.uk/02rcm.html
(b) Constant or slowly increasing conditional probability of
failure, ending in a wear-out zone.
(f) A high infant mortality during the early period and then
constant or slowly decreasing conditional probability of failure.
The above six failure patterns are illustrated in the next figure.
[Figure: six panels, (a) through (f), each plotting failure rate against age]
Figure 1. The Six Failure Patterns
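Several of these patterns can be generated from simple hazard-rate models. As a sketch, the Weibull hazard function reproduces the three basic shapes (decreasing, constant, and increasing failure rate), and summing one of each approximates the bathtub curve; all parameters below are arbitrary illustrative values:

```python
# Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1).
#   beta < 1: decreasing failure rate (infant mortality)
#   beta = 1: constant failure rate (random failures)
#   beta > 1: increasing failure rate (wear-out)
# A bathtub curve can be approximated by summing one of each.

def weibull_hazard(t, beta, eta):
    return (beta / eta) * (t / eta) ** (beta - 1)

def bathtub_hazard(t):
    # Parameter values are illustrative only.
    infant  = weibull_hazard(t, beta=0.5, eta=200.0)
    random_ = weibull_hazard(t, beta=1.0, eta=1000.0)
    wearout = weibull_hazard(t, beta=4.0, eta=2000.0)
    return infant + random_ + wearout

# The combined hazard falls early in life, flattens, then rises again.
early = bathtub_hazard(10.0)
mid   = bathtub_hazard(500.0)
late  = bathtub_hazard(3000.0)
print(early, mid, late)
```

Fitting such models to recorded failure times is one way the data NAVAIR collects could reveal which of the six patterns a given SUAV component follows.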
The idea of RCM is based on the realization that what users want depends
on the operating context of the system. So RCM is “a process used to determine what
must be done to ensure that any physical asset continues to do what its users want it to do
in its present operating context.” The RCM process asks seven questions about the
system under review. Any RCM process should ensure that all of the following seven
questions are answered satisfactorily in the sequence shown below:
• Actions are taken in the design and/or production phase to ensure that
failures do not recur.
• Tests are done at high level since improvements at that level have the
maximum effect on system reliability.
82 Blischke, R. W., and Murthy, D. N. Prabhakar, Reliability Modeling, Prediction, and Optimization, John Wiley & Sons, 2000, pages 547-548.
83 Pecht, M., Product Reliability Maintainability and Supportability Handbook, CRC Press, 1995,
page 322.
84 Ibid, page 324.
ensure that any physical asset continues to do what its users want it to do in its present
operating context.”
b. The Seven Questions
The RCM process answers seven questions about the system under review.
Any RCM process should ensure that all of the following seven questions are answered satisfactorily, in the sequence shown below:
85 Moubray, John, summarized by Sandy Dunn, Plant Maintenance Resource Center, “Maintenance Task Selection-Part 3,” Revised September 18, 2002, Internet, May 2004. Available at: http://www.plant-maintenance.com/articles/maintenance_tak_selection_part2.shtml
86 The material from this part of section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction.”
87 The material from this section is taken (in some places verbatim) from: Aladon Ltd, “About RCM.”
Nowlan and Heap’s report and MSG-3 have been used as a basis for
various military RCM standards and for non-aviation derivatives. Of these, by far the
most widely used is RCM-2.
RCM-2 is a process used to decide what must be done to ensure that any
physical asset, system or process continues to perform exactly as its user wants it to. The
process defines what users expect from their assets in terms of
The second step in the RCM-2 process is to identify the ways the system
can fail, followed by an FMEA to associate all the events that are likely to cause each
failure.
The last step is to identify a suitable failure management policy for dealing
with each failure mode. These policy options may include predictive maintenance,
preventive maintenance, failure finding, or changing the design and/or configuration of
the system.
The RCM-2 process provides rules for choosing which of the failure
management policies is technically appropriate and presents criteria for deciding the
frequency of the various routine tasks.
d. SAE STANDARD JA 1011
RCM-2 complies with SAE Standard JA 1011 or “Evaluation Criteria for
Reliability-Centered Maintenance (RCM) Process.” It was published in August 1999 by
the Society of Automotive Engineers (SAE). It is a brief document setting out the
minimum criteria that any process must include to be called an RCM process when
applied to any particular asset or system.88
88 The material from this section is taken (in some places verbatim) from: Aladon Ltd, “About RCM.”
The standard says that, in order to be called an “RCM” process, a process must obtain satisfactory answers to the seven questions above, asked in that particular order. The rest of the standard identifies the information that must be gathered, and the decisions that must be made, in order to answer each of these questions satisfactorily.89
e. MSG-390
In July 1968, Handbook MSG-1, “Maintenance Evaluation and Program
Development,” was developed by representatives of various airlines and aircraft manufacturers.
Decision logic and airline/manufacturer procedures for scheduled maintenance
development for the new Boeing 747 were the main part of the document.
89 The material from the above part of section is taken (in some places verbatim) from: Athos
Corporation, Reliability-Centered Maintenance Consulting, “SAE RCM Standard: JA 1011, Evaluation
Criteria for RCM Process,” Internet, February 2004. Available at: http://www.athoscorp.com/SAE-
RCMStandard.html
90 The material from this section is taken (in some places verbatim) from: Air Transport Association of
America, “ATA MSG-3, Operator/Manufacturer Scheduled Maintenance Development, Revision 2002.1,”
Nov 30, 2001, pages 6-8.
Some of the major improvements presented by MSG-3 as compared to
MSG-2 were
(3) “MSG-3 recognized the new damage tolerance rules and the
supplemental inspection programs and provided a method by which their purpose could
be adapted to the Maintenance Review Board (MRB) process instead of relying on type
data certificate restraints.” The MRB is discussed in Appendix B.
(6) Treatment of hidden functional failures was more thorough
because of their distinct separation from the evident functional failures.
The analysis process identifies all scheduled tasks and intervals based on
the aircraft’s certificated operating capabilities.
91 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 9-13.
92 ATA MSG-3, page 87.
An SSI is any “element or assembly,” related to significant flight, ground,
pressure or control loads. An SSI failure could affect the structural integrity of the
aircraft.93
• “To restore safety and reliability to their inherent levels when deterioration
has occurred;”
• “To obtain the information needed for design improvement of those items
whose inherent reliability proves insufficient;”
Scheduled maintenance consists of two groups of tasks:
A progressive logic diagram is the evaluation technique applied to each maintenance significant item (MSI), using the technical data available. An MSI may be a system, subsystem, module, component, accessory, unit, or part. In general, the evaluations are based on the item's functional failures and the causes of those failures.
(1) The manufacturer divides the aircraft into the main functional
areas, Air Transport Association (ATA) systems, and subsystems. This division continues
“until all the aircraft’s replaceable components have been identified.”
(b) For those items for which all four questions are
answered with a “no,” MSG-3 analysis is not required. “The lower level items should be
listed to identify those that will not be further assessed.” This list must be reviewed and
approved by the Industry Steering Committee (ISC).
(5) The resulting list for the highest manageable level items is
considered the “candidate MSI list” and is presented by the manufacturer to the ISC. The
ISC reviews and approves this list, which is passed to the working groups (WGs).
(6) The WGs review the candidate MSI list in order “to verify that
no significant items have been overlooked, and that the right level for the analysis has
been chosen.” By applying MSG-3 analysis, the WGs can “validate the selected highest
manageable level or propose modification of the MSI list to the ISC.”
j. Analysis Procedure97
For each MSI, the following must be identified:
(2) Level 2 then takes the failure cause(s) for each functional
failure into account for selecting the specific type of task(s).”
Default logic concerns paths that do not affect safety. If there is no “adequate information to a clear ‘yes’ or ‘no’ to the questions in the second level, then default logic dictates a ‘no’ answer.” In most cases, a “no” answer leads to a more conservative and/or costly task.
[Flowchart: Level 1 routes each functional failure by asking whether it is evident to the operating crew during the performance of normal duties, whether it or its secondary damage has a direct adverse effect on operating safety, and whether it has a direct adverse effect on operating capability, directing it into the Safety, Operational, or Economic effects branch. Level 2 then asks, in order, whether a lubrication/servicing task, an inspection or functional check, a restoration task, or a discard task is applicable and effective; the safety branch ends by asking whether any task or combination of tasks is applicable and effective, with redesign indicated if not.]
Figure 2. Systems Powerplant Logic Diagram Part1 (After ATA MSG-3, page 18)
[Flowchart: for a hidden functional failure, Level 1 asks whether the combination of the hidden failure and one additional failure of a related system or backup function has an adverse effect on operating safety. Each branch of Level 2 then asks, in order, whether a lubrication/servicing task, an inspection or functional check, a restoration task, a discard task, or a task or combination of tasks is applicable and effective; if none is, redesign is indicated.]
Figure 3. Systems Powerplant Logic Diagram Part2 (After ATA MSG-3, page 20)
l. Procedure
This procedure requires consideration of the functional failures,
failure causes, and the applicability or effectiveness of each task. Each
functional failure processed through the logic will be directed into one of
five failure effect categories: 99
• Safety
• Operational
• Economic
• Hidden safety
• Hidden non-safety100
m. Fault Tolerant Systems Analysis101
“In MSG-3 analysis, a fault tolerant system is one that has redundant
elements that can fail without impacting safety or operating capability.” These faults are
not readily noticeable to the operating crew, and the aircraft's safety and airworthiness are not impaired. So, “functional failures, in fault tolerant systems, are hidden non-safety.” The
“fault-tolerant” faults can be “detected by interrogation of the system.”
The method for analyzing MSIs that include fault-tolerant functions has
the following steps:
• “The manufacturer identifies and lists all functions, highlighting those that
are fault-tolerant.”
n. Consequences of Failure in the First level102
There are four first-level questions.
(1) Evident or Hidden Functional Failure. Question: “Is the
occurrence of a functional failure evident to the operating crew during the performance of
normal duties?”
The intention for this question is to separate the evident from the
hidden functional failures. The operating crew is the pilots and air crew on duty. The
ground crew is not part of the operating crew. A “yes” answer indicates the functional
failure is evident and leads to Question 2. A “no” answer indicates the functional failure
is hidden and leads to Question 3.
(2) Direct Adverse Effect on Safety. Question: “Does the
functional failure or secondary damage resulting from the functional failure have a direct
unfavorable effect on operating safety?”
A direct functional failure or resulting secondary damage
“achieves its effect by itself, not in combination with other functional failures.” If the
consequences of the failure condition would “prevent the continued safe flight and
landing of the aircraft and/or might cause serious or fatal injury to human occupants,”
then safety should be considered as unfavorably affected. A “yes” answer indicates that
this functional failure must be considered within “the Safety Effects category” and task(s)
must be developed accordingly. A “no” answer indicates the effect is either “operational
or economic” and leads to question 4.
(3) Hidden Functional Failure Safety Effect. Question: “Does the
combination of a hidden functional failure and one additional failure of a system related
or back-up function have an adverse effect on operating safety?”
This question is asked of each hidden functional failure, identified
in Question 1. A “yes” answer indicates that there is a “safety effect and task
development must proceed in accordance” with the hidden-function safety-effects
category. A “no” answer indicates that there is a “non-safety effect and will be handled in
accordance” with hidden-function non-safety effects category.
102 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 26-30.
(4) Operational Effect. Question: “Does the functional failure
have a direct unfavorable effect on operating capabilities?”
In this question, considerations must be taken concerning the
operating restrictions, correction prior to further dispatch, and abnormal or emergency
procedures from the flight crew. A “yes” as an answer means that the effect of the
functional failure has an unfavorable effect on operating capability, and task selection
will be handled in evident operational effects category. A “no” as an answer means that
there is an economic effect and should be handled in accordance with evident economic
effects category.
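The four first-level questions amount to a short decision routine. The sketch below encodes the routing just described; the function and argument names are ours, not MSG-3's:

```python
# MSG-3 first-level logic: route each functional failure into one of
# the five effect categories from the answers to the four questions.

def msg3_category(evident, direct_safety_effect=False,
                  operational_effect=False, hidden_safety_effect=False):
    """evident: Q1 -- is the failure evident to the operating crew?
    direct_safety_effect: Q2 (asked of evident failures only)
    hidden_safety_effect: Q3 (asked of hidden failures only)
    operational_effect:   Q4 (asked of evident, non-safety failures)
    """
    if evident:
        if direct_safety_effect:
            return "evident safety"
        return "evident operational" if operational_effect \
            else "evident economic"
    return "hidden safety" if hidden_safety_effect \
        else "hidden non-safety"

print(msg3_category(evident=True, direct_safety_effect=True))
print(msg3_category(evident=True, operational_effect=True))
print(msg3_category(evident=False))
```

Each returned category then determines which second-level task questions are asked, as described in the next section.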
o. Failure Effect Categories in the First Level103
After the analysts have answered the applicable first-level questions, “they
are directed to one of the five effect categories.”
(1) Evident Safety: The Evident Safety Effect category concerns
the safety operation assurance tasks. “All questions in this category must be asked.” In
case no effective task(s) results from this category analysis, “redesign is mandatory.”
(2) Evident Operational: In this category, a task is “desirable if it
reduces the risk of failure to an acceptable level.” Analysis requires the first question
(LU/SV) to be answered and regardless of the answer, to proceed to the next level
question. From that point a “yes” as an answer completes the analysis and “the resultant
task(s) will satisfy the requirements.” If all answers are “no,” then no task has been
generated and if operational penalties are severe, redesign may be desirable.
(3) Evident Economic: In that category, a task(s) is desirable if its
cost is less than the repair cost. Analysis has the same logic as the operational category. If
all answers are “no,” then no task has been generated and if economic penalties are
severe, a redesign may be desirable.
(4) Hidden Safety: “The hidden function safety effect requires a
task(s) to assure the availability necessary to avoid the safety effect of multiple failures.”
All questions must be asked and “if there are no tasks found effective, then redesign is
mandatory.”
103 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 31-38.
(5) Hidden non-Safety: “The hidden function non-safety category
indicates that a task(s) may be desirable to assure the availability necessary to avoid the
economic effects of multiple failures.” Analysis has the same logic as the operational
category. If all answers are “no,” no task has been generated and if economic penalties
are severe, a redesign may be desirable.
p. Task Development in the Second level104
For each of the five effect categories, task development proceeds in a similar manner. “It is necessary to apply the failure causes for the functional failure to the second level of the logic diagram” for task resolution, as in Table 2. There are six possible task follow-on questions in the effect categories.
(1) Lubrication/servicing (in all categories). Question: “Is the
lubrication or servicing task applicable and effective?”
104 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 31-47.
(4) Restoration (All categories). Question: Is a restoration task to
reduce the failure rate applicable and effective?
Table 2. Task Applicability and Effectiveness Criteria (After ATA MSG-3)

Lubrication or Servicing
• Applicability: The replenishment of the consumable must reduce the rate of functional deterioration.
• Safety effectiveness: The task must reduce the risk of failure.
• Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
• Economic effectiveness: The task must be cost effective (i.e., the cost of the task must be less than the cost of the failure prevented).

Operational or Visual Check
• Applicability: Identification of failure must be possible.
• Safety effectiveness: The task must ensure adequate availability of the hidden function to reduce the risk of a multiple failure.
• Operational effectiveness: Not applicable.
• Economic effectiveness: The task must ensure adequate availability of the hidden function to avoid the economic effects of multiple failures, and must be cost effective.

Inspection or Functional Check
• Applicability: Reduced resistance to failure must be detectable, and there must exist a reasonably consistent interval between a deterioration condition and functional failure.
• Safety effectiveness: The task must reduce the risk of failure to assure safe operation.
• Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
• Economic effectiveness: The task must be cost effective.

Restoration
• Applicability: The item must show functional degradation characteristics at an identifiable age, and a large proportion of units must survive to that age. It must be possible to restore the item to a specific standard of failure resistance.
• Safety effectiveness: The task must reduce the risk of failure to assure safe operation.
• Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
• Economic effectiveness: The task must be cost effective.

Discard
• Applicability: The item must show functional degradation characteristics at an identifiable age, and a large proportion of units must survive to that age.
• Safety effectiveness: The safe life limit must reduce the risk of failure to assure safe operation.
• Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
• Economic effectiveness: An economic life limit must be cost effective.
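Because the second-level task questions are asked in a fixed order, the selection logic can be sketched as a filter over candidate tasks. The applicability and effectiveness judgments themselves are engineering calls; in this sketch they are supplied as precomputed yes/no answers for a hypothetical failure cause:

```python
# MSG-3 second-level logic, sketched: ask the task questions in the
# fixed order shown in the logic diagrams.  In the safety categories
# all questions are asked and every qualifying task is kept; if none
# qualifies, redesign is mandatory.  In the non-safety categories the
# lubrication/servicing question is always asked, and afterwards the
# first "yes" completes the analysis.

TASK_ORDER = ["lubrication/servicing", "inspection/functional check",
              "restoration", "discard"]

def select_tasks(answers, safety_category):
    """answers: dict task -> bool ('applicable and effective?')."""
    if safety_category:
        selected = [t for t in TASK_ORDER if answers.get(t, False)]
        return selected if selected else ["redesign (mandatory)"]
    result = []
    if answers.get(TASK_ORDER[0], False):
        result.append(TASK_ORDER[0])
    for task in TASK_ORDER[1:]:
        if answers.get(task, False):
            result.append(task)
            return result
    return result if result else ["redesign (may be desirable)"]

# Hypothetical failure cause: servicing does not help, restoration does.
print(select_tasks({"restoration": True}, safety_category=False))
```

The criteria in Table 2 are what an analyst would consult before answering each "applicable and effective?" question with yes or no.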
3. Comparison of Existing Methods
a. RCM
It is clear that maintenance activity must help ensure that the inherent
levels of safety and reliability of the aircraft are maintained.
Simply put, the goals of RCM are to:
• Assign the least expensive but adequate maintenance task to prevent each
failure.107
105 Nakata, Dave, White paper, “Can Safe Aircraft and MSG-3 Coexist in an Airline Maintenance
Program?”, Sinex Aviation Technologies, 2002, Internet, May 2004. Available at: http://www.sinex.com/
products/Infonet/q8.htm
106 The above part is taken (in some places verbatim) from: Nakata.
107 The above part is taken (in some places verbatim) from: National Aeronautics and Space
Administration (NASA), “Reliability Centered Maintenance & Commissioning,” slide 5, February 16,
2000, Internet, May 2004. Available at: http://www.hq.nasa.gov/office/codej/codejx/Intro2.pdf
b. Conducting RCM Analysis
Some managers, who see RCM as a quick, cheap and easy route to
obtaining the particular maintenance policies they are seeking, frequently overrule junior
staff taking part in RCM analysis. This is a poor approach to the conduct of any analysis.
RCM is better conducted by a review group, which may involve senior staff alongside
more junior staff. An experienced analyst with a developed background in RCM and in
managing groups should lead it. If the group functions poorly, it is improper to blame RCM for what project management has failed to achieve.108
c. Nuclear Industry & RCM109
The initial maintenance programs in US nuclear power plants were
developed in conventional fashion, mainly depending on vendor recommendations.
“Continuing efforts to enhance safety and reliability” resulted in “utility management at
some plants” questioning if the overall outcome was a “significant degree of over-
maintenance.” By the early 80s, the nuclear power industry seemed to be “faced with a
choice of either generating power or doing the prescribed planned maintenance (PM).”
They were seeking a way to reduce the PM workloads without impairing safety or
reliability. This is the same type of question applicable to SUAV maintenance.
“The most abbreviated approach,” recommended by EPRI in TR-105365 in September 1995, “modified the RCM process by setting up a list of simple functional questions” without further functional analysis; the question is whether the component failure leads to:
NAVAIR, which was one of the sponsors of the original Nowlan & Heap
report, found that some vendors were using all sorts of unique and custom-made
processes, which they described as “RCM processes,” to develop maintenance programs
for equipment that they were selling to NAVAIR. “In this age of ‘do more with less,’
there is a problem that has infected the discipline of physical asset management. In the
interest of saving time and money, corrupted versions of RCM, versions that
110 The material from this section is taken (in some places verbatim) from: Regan, Nancy, RCM Team
Leader, Naval Air Warfare Center, Aircraft Division, “US Naval Aviation Implements RCM,” undated,
Internet, February 2004. Available at: http://www.mt-online.com/articles/0302_navalrcm.cfm
irresponsibly shorten the process, continue to flood the market. These tools are
incorrectly called RCM.”
(1) The head internal sponsor of the effort “quit the organization or
moved to a different position before the new ways of thinking embodied in the RCM
process” could be absorbed.
(2) The internal sponsor and/or the consultant, who was the acting
change agent, “could not generate sufficient enthusiasm for the process,” so it was not
applied in a way which would yield results.
111 The material from the above part of section is taken (in some places verbatim) from: Aladon Ltd,
“About RCM.”
112 The material from this section is taken (in some places verbatim) from: Moubray, page 5.
Of course, the other two-thirds have been successful. There is “a high
correlation between the success rate of RCM-2 (MSG-3) applications and the change
management capabilities of the consultants involved.” For example, the (British) Royal
Navy (RN), which is a major user of SAE-compliant RCM, “has come to understand that
the capabilities of individual consultants are as important as the track record of their
employers.” So the “RN now insists on interviewing at great length every RCM
consultant that is at their disposal” to verify the commercial sincerity of the employers.
When discussing RCM, both the economic benefits and the question of
risk are considerations. For the economic benefits in some cases, “the payback period has
been measured in days and sometimes one or two years.” The normal period is weeks to
months. “These economic benefits flow from improved plant performance” mostly,
although in some cases users (especially military) have achieved very substantial
“reductions in direct maintenance costs”.
It is often said that RCM “is a good tool for developing maintenance
programs in ‘high risk’ situations” and that “some equipment items have such low impact
on business risk that the effort required to perform RCM analysis on them is greater than
the potential benefits.” The truth is that “no physical asset or system can be deemed to be
‘low risk’ unless it has been subjected at the very least to a zero-based FMECA” that
proves it is in fact low risk.
About the supposedly “low risk” industries: automobile and food plants
are frequently said to be “low risk,” and therefore not worth strict and rigorous analysis.
The truth is that you cannot characterize these industries as low risk, as the following examples indicate:
(2) The failure of the Firestone tires on Ford Explorers, which has been attributed to the design, the operating pressure, and manufacturing process failures. These failures put the existence of Firestone as a company at risk,
113 The material from this section is taken (in some places verbatim) from: NASA, slide 13.
• Corrective/preventive measures
“If there are many different areas of concern and all of them need to be
revealed, then a FMECA is more effective because it has a greater chance of finding the
critical failure modes.” If only a single event or a few events that can be clearly defined
are of crucial concern, then FTA is favored.
The desire for either a qualitative and/or a quantitative analysis is not the
distinguishing factor for selecting a FTA or a FMECA/FMEA. Either approach can give
qualitative or quantitative results. The following table gives guidance for choosing
between FTA and FMECA/FMEA.
114 The material from this section is taken (in some places verbatim) from: Reliability Analysis Center
(RAC), Fault Tree Analysis (FTA) Application Guide, 1990, pages 8-10.
FTA vs. FMECA/FMEA selection criteria:

FTA preferred when:
• Safety of personnel or public is the primary concern
• There is a small number of explicitly defined “top events”
• Mission completion is of critical importance
• “Human errors” contributions are of concern
• “Software errors” contributions are of concern
• A numerical “risk evaluation” is the primary concern
• The system is highly complex and interconnected
• The system is not repairable

FMECA/FMEA preferred when:
• There is an inability to clearly define a small number of “top events”
• Success is measured over any number of missions
• “All possible” failure modes are of concern
• The system has a linear architecture with little human or software intervention
i. FTA115
For any reliability program, FTA is an effective tool. It is a quick way of
“understanding the causes of a system’s inherent problems” and also a way to “identify
potential safety hazards during the design phase.”
Tailoring the FTA to the specific type of analysis required involves two decisions: first, selecting the “top event” on which the FTA is to focus; and second, deciding whether the analysis is to yield qualitative results, quantitative results, or both.
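For independent basic events, the quantitative half of an FTA reduces to AND gates multiplying probabilities and OR gates combining complements. The tree structure and event probabilities below are invented for a hypothetical SUAV top event:

```python
# Fault tree gates for independent basic events:
#   AND: all inputs must occur -> product of probabilities
#   OR:  any input may occur   -> 1 - product of (1 - p)
from functools import reduce

def p_and(*ps):
    return reduce(lambda a, b: a * b, ps, 1.0)

def p_or(*ps):
    return 1.0 - reduce(lambda a, b: a * (1.0 - b), ps, 1.0)

# Hypothetical tree: loss of platform = engine failure OR
# (primary data link lost AND backup autopilot fails to recover).
# All probabilities are illustrative, not measured values.
p_engine    = 0.01
p_link      = 0.05
p_autopilot = 0.10

p_top = p_or(p_engine, p_and(p_link, p_autopilot))
print(round(p_top, 6))
```

Evaluating the tree bottom-up in this way is what turns a qualitative fault tree into a numerical risk estimate for the chosen top event.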
j. RCM Revisited
“RCM is better in the operating and support phase of the life cycle of a system.” This is true when considering how and why RCM was created. For example, airplanes have been in use since the beginning of the previous century, and the general concept of the airplane has been known for many years. Legal requirements and special regulations controlling manned aviation have also been in place for many years. Thus, in this case,
115 The material from this section is taken (in some places verbatim) from: RAC FTA, pages 9-11.
RCM provided solutions to certain manned-aviation problems, mainly related to operations and maintenance issues, with safety and economics as background considerations. Similarly,
many other industries also employed RCM to solve such problems.
k. UAVs, SUAVs versus Manned Aircraft
The primary difference between manned aircraft and UAVs is that piloted aircraft rely on the presence of humans to detect (sense) and respond to changes in the vehicle's operation. The human can sense the condition of the aircraft, say through unusual vibration that may indicate structural damage or impending engine failure.
Humans can sense events within and outside the vehicle, gaining what is known as
“situational awareness.”
During the last few years, commanders no longer want their SUAVs to be “toys” that expand their capabilities only uncertainly. Commanders want their SUAVs to be
operationally effective assets to help win battles. “Operationally, the same case may be
made for ensuring the missions are completed if we rely on UAVs to accomplish mission
critical tasks once done using manned assets.”116
116 Clough, Bruce, “UAVS-You Want Affordability and Capability? Get Autonomy!” Air Force
Research Laboratory, 2003.
There are some facts about SUAV systems that require consideration:
(6) Most SUAVs are not maritime systems; they are in the design phase or in operational testing.
(10) Human factors for the GCS are critical since they are the linkage
between the system and its effective employment.
l. Conclusions-Three Main Considerations about UAV- RCM
The reliability tracking and improvement system for SUAVs must be
inexpensive, easily and quickly adapted, and implemented by a few, relatively
inexperienced personnel. It must also cover the entire system’s issues of hardware,
software and human factors. The safety requirements for personnel apply only to the
ground operators and maintainers and the main source of data for hidden failures during
flight can only be provided by telemetry. Finally, because sensor technology is rapidly
developing and easily implemented due to low cost, the reliability tracking and
improvement system for SUAVs must be easily adaptable to changes.
From the above we can construct the following table which summarizes
the basic differences between the MSG-3 and FMEA/FMECA methods with respect to
SUAVs:
[Table: summary comparison of the FMEA/FMECA and RCM (MSG-3) methods with respect to SUAVs]
So, the main considerations about RCM implementation for SUAVs are:
(2) In the RCM process, the key factor for the initial identification
of the hidden failure is the flight crew. In the UAV case, there is no crew aboard and so
there is no chance for a crew to sense hidden failures. The only indications that might be available are the platform's control-sensor readings in flight and the system's performance when a platform is tested on the ground prior to take-off.
From the above it is clear that RCM MSG-3 is not suitable for SUAVs. This leaves fault tree analysis and FMEA as the remaining methods. We develop both in detail for the SUAV in the subsequent chapters of this thesis.
B. SMALL UAV RELIABILITY MODELING
During recent urban operations in Iraq and Afghanistan, SUAVs that provide
over-the-hill or around-the-corner information were invaluable for operating teams. Some
systems have been tested with very good results, but controversy surrounds the
capabilities of such systems. A generic SUAV system must provide military forces with
real-time around the clock surveillance, target acquisition, and battle assessment. Such a
system must be capable of detecting any desired tactical information in a designated
sector.
Each service component (Navy, Army, and Marines) requires versatile, easy to
handle, and user-friendly systems that enable the commander to conduct reconnaissance
on the battlefield in real time. SUAVs are being seriously considered for this role, which
may entail a small-scale operation over a city block or more extensive surveillance missions.
Requirements of the system include locating and identifying targets, then relaying the
information to a higher command. The detection accuracy should be sufficient to select
and to deploy weapons, and then to maintain contact after engagement with such
weapons. The system must be able to survey a large area rapidly using multiple platforms
simultaneously. The configuration of the system should enhance the fighting capabilities
of the force, minimizing the time for precise control movements and maximizing
mobility, robustness, and functionality. Based on previous experience with similar systems,
reliability and interoperability are the most important considerations.
1. System’s High Level Functional Architecture
As illustrated in Figure 4, the high-level architecture of a SUAV battlefield system
consists of the following:
(1). Platform(s)
(c) Onboard computer (OBC)
(d) Payload with the appropriate sensors for the type of mission
(2). Ground control station (GCS) with command, monitor and support
capabilities.117 This may be shipboard or land-based.
[Figure 4: SUAV high-level functional architecture: the platform comprises navigation (GPS, INS), flight control through the onboard computer (hardware, software, peripherals; remote manual, semi-auto, or autonomous operation), and the payload sensors; the platform is linked to the ground control station through communication channels.]
117 Hsiao, Fei-Bin, et al., "The Development of a Low Cost Autonomous UAV System," Proceedings
of ICAS 2002, 23rd International Congress of Aeronautical Sciences, Toronto, Canada, 8 to 13 September
2002, Institute of Aeronautics, National Cheng Kung University, Tainan, Taiwan, ROC.
[Figure 5: SUAV system breakdown: the platform (structure, fuel tank, GPS, INS, engine, sensors, antennas) communicates over line-of-sight (LOS) channels with the ground control station (GCS), which includes antennas, power supply, battery charger, screen output, auto/semi-auto/manual control, start-up device, auto launching and landing recovery devices, and a spare parts unit. Environmental considerations include lightning, fog, altitude, icing conditions, wind speed, and proximity to sea, desert, or inhabited areas.]
For the platform’s configuration, weight and volume are critical factors because
of the limited size and the flight characteristics of the platform. The system is a complex
one and reliability plays an important role for the operational effectiveness of the system.
In general, there are two ways to increase reliability: Fault tolerance and fault
avoidance.118
The SUAV system cannot implement fault tolerance, at least for the platforms, so
fault avoidance is the better approach. To achieve this, we must first conduct an FMEA
to define each subsystem function and to identify the failure modes associated with each
functional output.
118 Reliability Analysis Center (RAC), Reliability Toolkit: Commercial Practices Edition. A Practical
Guide for Commercial Products and Military Systems Under Acquisition Reform, 2004, page 115.
To perform these analyses, we will use the qualitative approach due to lack of
failure rate data and a lack of the appropriate level of detail for part configuration.119
2. System Overview
The airborne system comprises the aerial platform and an onboard system. The
ground system comprises a PC and a modem to communicate with the airborne system.
All the onboard hardware is packed in a suitable model platform powered by a
1.5-kilowatt (kW) aviation-fuel (JP-5) engine, with a wingspan of 1.5 meters (m) and a
fuselage diameter of 12 centimeters (cm). The sensor payload is about two kilograms
(kg).
The GCS PC is the equivalent of a pilot’s cockpit. It can display in near real time
the status of the flying UAV or UAVs including:
• Speed
• Altitude
• Course
• Output from the mission sensors, such as near-real-time imagery from
various types of cameras: CCD, infrared (IR), and others.
119 Reliability Analysis Center (RAC), Failure Mode, Effects and Criticality Analysis (FMECA),
1993, pages 9-13.
3. System Definition
[Figure 6: SUAV system definition: the platform comprises the OBC, engine, battery power, flight controls with proper software, autopilot, structure, fuel tank, landing gear, payload (cameras and other sensors), GPS, INS, other avionics, and communication receivers, transmitters, and antennas. The platform communicates over line-of-sight (LOS) with the ground control station (GCS), which includes an OBC, auto control, receivers, pilot, battery, and start-up device. Environmental considerations in field operation: (1) temperature, (2) humidity-precipitation, (3) cloudiness, (4) lightning, (5) fog, (6) altitude, (7) icing conditions, (8) wind speed, (9) proximity to sea, (10) proximity to desert, (11) proximity to inhabited area.]
Using the diagram in Figure 6, we give the following functional definitions to
each element in the diagram.
Platform Structure: The flying physical asset responsible for integration of all
the necessary equipment for the mission profile.
Payload, Cameras and Other Sensors: The actual physical assets for the type of
desired mission, consisting mainly of cameras and other special sensors such as NBC
agent detectors, magnetic disturbance detectors, and others.
GPS: The primary navigation system based on a satellite network known as the
Global Positioning System.
INS: The support navigation system based on the inertial calculations of current
speed and course in order to provide an accurate platform fix that will be used for piloting
the platform and for target tracking.
Battery: The electric power supply asset for the entire platform’s equipment
service.
Flight Controls: The necessary flight sensors (such as pitot tubes), hardware
(ailerons, elevators, rudder, and the relevant servo units), and the flight controller,
together with the right software for manual, semi-auto, and autonomous flight.
Landing Gear: Responsible for platform mobility on the ground during takeoff and
landing. Not mandatory for use.
GCS: The manned shipboard or land-based component of the system, responsible
for command, control, communications, and support.
GCS Flight Controls: The GCS hardware and software for flight controls.
GCS Antennas: Conducts the transmitted and received signals to and from the
platform and other centers related to the mission and passes them to/from transmitters or
receivers and the appropriate communication hardware and software.
GCS Proper Software: The necessary software for GCS mission control.
Start-up Device: Responsible for the initial start-up of the platform's engine prior
to takeoff.
Spare Parts: Necessary items for operating and supporting the system.
Power Supply: Generator and batteries that provide the GCS electric power.
Personnel: A pilot, a load/sensor operator, and maintainers who man the system
for one shift.
Landing Auto Recovery Unit: Provides auto guidance to the platforms for auto-
landings.
4. System Critical Functions Analysis
The SUAV essential functions analysis can be seen in Table 6.
[Table 6. SUAV Essential Functions by Mission Phase (Stand-by, Launch, Cruise to Area of Interest, On Station, Cruise Back to Base, Off Station, Land). Flight functions: (1) provide structural integrity; (2) provide lift and thrust; provide controlled flight by (3) manual control, (4) semi-auto, or (5) auto; (6) navigate; (7) provide power to control and navigation equipment; (8) withstand environmental factors (mainly wind). Mission functions: (9) start systems; (10) system's backup; (11) communications; (12) line of sight; (13) provide power to sensors and communications; (14) detect, locate and identify targets; (15) provide data; (16) provide video image; (17) monitor system's functions. Each function is marked against the phases in which it is essential.]
5. System Functions
The mission phase consists of the following functions:
• Detect targets
• Identify targets
• Classify targets
• Track targets
• Service the platform at a certain time and set it ready for the next mission
These functions are the primary drivers for software development and are among the
factors in hardware selection.
6. Fault Tree Analysis
In the following fault-tree analysis of a SUAV system, a top-down approach is
used to reveal the failure causes. A sub-analysis ends with a circle, which means that
further analysis is needed at a more detailed level, or with a diamond, which means that
the analysis stops there. Due to a lack of data, only the mechanical engine failure has
been analyzed at more than one level. Using that analysis, we formulate a model to use as
an example for further analysis.
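The top-down structure translates directly into code. The sketch below, with hypothetical event names, evaluates whether a top event occurs for a given set of basic events; it is a generic illustration of the fault-tree logic, not a full encoding of the trees that follow.

```python
# A gate is a ("OR"|"AND", children) pair; a basic event is just a string.
def OR(*children):
    return ("OR", children)

def AND(*children):
    return ("AND", children)

def top_event_occurs(node, occurred):
    """Evaluate a fault tree: does the top event occur, given the set of
    basic events that have occurred?"""
    if isinstance(node, str):          # leaf: a basic event
        return node in occurred
    kind, children = node
    results = [top_event_occurs(child, occurred) for child in children]
    return any(results) if kind == "OR" else all(results)

# A sketch of the top of the loss-of-mission tree (OR gates only):
loss_of_mission = OR("loss of platform", "loss of GCS",
                     OR("sensor failure", "inappropriate sensors"))

print(top_event_occurs(loss_of_mission, {"sensor failure"}))  # True
print(top_event_occurs(loss_of_mission, set()))               # False
```

Because the SUAV trees use only OR gates, any single basic event is enough to trigger the top event in this representation.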
7. Loss of Mission
The first fault tree to construct is the loss-of-mission tree. The reasons for mission
loss may be:
[Figure 7: loss-of-mission fault tree (OR gate) with branches: loss of platform (M1), loss of GCS (L5), unable to locate platform, unable to launch platform (M3), unable to communicate with platform (M4), loss of operator/pilot (O2), loss of OBC, sensor failure (S2), inappropriate sensors (A1), out-of-system reasons (S1).]
8. Loss of Platform
The reasons for loss of platform may be:
[Figure 8: loss-of-platform (M1) fault tree (OR gate) with branches: loss of structural integrity (L1), loss of lift (L2), loss of thrust (L3), loss of control (L4), loss of GCS (L5), loss of platform's position (L6).]
9. Loss of GCS
The reasons for loss of GCS may be:
(7) Fire
[Figure 9: loss-of-GCS (L5) fault tree (OR gate) with branches: software failure, loss of OBC, environmental reasons, loss of GCS power, fire, loss of GCS communication, loss of GCS personnel.]
10. Loss of Platform’s Structural Integrity
The reasons for loss of platform’s structural integrity include fuselage, wing, or
empennage related problems, which could be due to:
(1) Fracture
Figure 10 contains the fault-tree analysis for loss of platform’s structural integrity.
[Figure 10: loss-of-structural-integrity (L1) fault tree (OR gate) with fuselage, wings, and empennage branches; the fuselage and empennage branches are the same as the wings branch, which includes operator error (O1).]
11. Loss of Lift
Reasons for loss of lift may be:
(3) Loss of wing surface, which could be due to loss of right or left wing
surface, which in turn could be due to:
[Figure 11: loss-of-lift (L2) fault tree (OR gate) with branches: loss of wing surface (right wing, with the left wing the same: fracture, pressure removal, thermal weakening, overload, delamination/fiber buckling, connection failure, operator error), operator error (O1), loss of thrust (L3).]
12. Loss of Thrust
Reasons for loss of thrust may be:
[Figure 12: loss-of-thrust (L3) fault tree (OR gate) with branches: loss of engine (F1, E2), loss of engine control (E2), loss of propeller, operator error (O1).]
13. Loss of Platform Control
Reasons for loss of control may be:
(a) Loss of left-wing aileron force, which could be due to:
(b) Loss of right-wing aileron force, for the same reasons as the left-wing
aileron
(5) Loss of rudder force, for the same reasons as the left-wing aileron
(6) Loss of elevator force, for the same reasons as the left-wing aileron
[Figure 13: loss-of-control (L4) fault tree (OR gate) with branches: loss of aileron forces (left aileron: loss of aileron surface, disruption of control cables, loss of servo unit, loss of OBC; right aileron the same), loss of rudder force, loss of elevator force, loss of control channel (C1), loss of power (P1, P2), loss of lift (L2).]
14. Loss of Platform Position
Reasons for loss of platform position may be:
[Figure 14: loss-of-platform-position (L7) fault tree (OR gate) with branches: loss of GPS unit, loss of GPS antenna, loss of GPS signal, loss of INS backup, loss of LOS, platform failure to transmit (T1).]
15. Loss of Control Channel
The reasons for loss-of-control channel may be:
(7) Loss of the GCS control antenna, for the same reasons as loss of the
platform control antenna
[Figure 15: loss-of-control-channel (C1) fault tree (OR gate) with branches: operator (pilot) control panel failure, loss of GCS control antenna (same as the platform control antenna), loss of platform control antenna (structural damage, antenna disconnection, antenna short-circuit, antenna failure), failure of GCS control transmitter, failure of control receiver, loss of LOS, loss of platform power (P1), loss of GCS power (P2).]
16. Engine Control Failure
Engine control failure may be caused by:
The fault-tree analysis for engine control failure can be seen in Figure 16.
[Figure 16: engine-control-failure (E2) fault tree (OR gate) with branches: disruption of control cables, loss of controller, loss of servo unit, carburetor failure, engine failure (E1), loss of LOS.]
17. Engine Failure
The reasons for engine failure may be:
The fault-tree analysis for engine failure can be seen in Figure 17.
[Figure 17: engine-failure (E1) fault tree (OR gate) with branches: mechanical engine failure (E3), engine fire, loss of lubrication, improper fuel, improper fuel/air mixture, improper gas/lubricant mixture, improper engine lubricant, excessive vibrations, excessive engine temperature rise.]
18. Failure of Fuel System
The reasons for fuel system failure may be:
(c) Fire
The fault-tree analysis for fuel system failure can be seen in Figure 18.
[Figure 18: fuel-system-failure (F1) fault tree (OR gates) with branches: failure of engine fuel system (fuel pump failure, fuel line failure, carburetor failure, fuel tank fire, penetration of fuel lines) and loss of fuel supply (fuel depletion, penetration of fuel tank, hydrodynamic ram, fire/explosion).]
19. Loss of Platform Power
The reasons for loss of platform power may be:
[Figure 19: loss-of-platform-power (P1) fault tree (OR gates) with branches: battery failure (battery exhaustion, battery disconnection), fuse failure, wiring short-circuit.]
20. Loss of GCS Power
Reasons for loss of GCS power may be:
[Figure 20: loss-of-GCS-power (P2) fault tree (OR gates) with branches: loss of GCS generator, main and auxiliary (G1: power failure, power disconnection, circuit problem), wiring short-circuit, fuse failure (improper fuse).]
21. Operator Error
Reasons for operator error may be:
(8) Poor workload balance resulting in task saturation with resulting loss
of situational awareness
[Figure 21: operator-error fault tree (OR gate) with branches: personnel fatigue, poor workload balance, personnel lack of experience, inadequate training, ergonomics of GCS, operator's wrong reaction to failure, environmental reasons, poor documentation of procedures, misjudgment, inadequate man-machine interface.]
22. Mechanical Engine Failure
Reasons for mechanical engine failure may be:
(d) Piston(s)
(f) Bearings
(g) Crankshaft
(8) Overheating
[Figure 22: mechanical-engine-failure (E3) fault tree (OR gate) with branches: engine vibrations (E5), carburetor failure, crash damage, overheating (E4), operator error (L5), inappropriate engine operation (E6), normal engine wear, bad material, bad manufacture, bad design, insufficient or bad maintenance.]
23. Engine Vibrations
Reasons for engine vibrations may be:
(c) Piston(s)
(e) Bearings
(f) Crankshaft
[Figure 23: engine-vibrations (E5) fault tree (OR gate) with branches: broken piston, broken piston rings, bearing failure, improper engine mounting, lack of propeller balancing, bad design, bad manufacture.]
24. Overheating
Reasons for engine overheating may be:
(c) Piston(s)
(e) Bearings
(f) Crankshaft
[Figure 24: overheating (E4) fault tree (OR gates) with branches: engine operating too fast, bearing failure, broken piston rings, bad material, bad lubricant, dirty cooling surfaces, improper propeller size, inappropriate fuel, improper engine adjustments, bad design, bad manufacture.]
25. Inappropriate Engine Operation
Reasons for inappropriate engine operation may be:
(8) Inappropriate lean runs (starting after a long period of storage without
any precautions), leading to rusted bearings, a seized connecting rod or piston, or dry piston rings
The fault-tree analysis for inappropriate engine operation can be seen in Figure
25.
[Figure 25: inappropriate-engine-operation (E6) fault tree (OR gate) with branches: improper engine adjustments, propeller stops while turning, inappropriate fuel, inappropriate lubricant, inappropriate fuel/lubricant mixture, inappropriate engine cleaning, engine stall, inappropriate engine storage, bad carburetor adjustments.]
26. Follow-on Analysis for the Model
The occurrence of the top event is due to different combinations of basic events.
A fault tree provides useful information about these combinations. In this approach, we
introduce the concept of the “cut set.” A cut set is “a set of basic events” whose
occurrence results in the top event. A cut set is said to be a “minimal cut set” if removing
any basic event from it leaves a set that no longer forms a cut set.120
For example, Figure 26 shows that the set {1, 2, 3, and 4} is a cut set because if
all of the four basic events occur, then the top event occurs.
[Figure 26: a top AND gate whose inputs are basic events 3 and 4 and an OR gate over basic events 1 and 2.]
Figure 26. Example for Cut Set. (After Kececioglu, page 223)
120 Kececioglu, D., Reliability Engineering Handbook Volume 2, Prentice Hall Inc., 1991, page 222.
This is not the minimal cut set, however, because if the basic event 1 or basic
event 2 is removed from this set, the remaining basic events {1, 3 and 4} and {2, 3 and 4}
still form cut sets. These two sets are the minimal cut sets in that example.
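These definitions can be checked mechanically for the small example tree: the sketch below brute-forces every subset of the four basic events, keeps those that trigger the top event (the cut sets), and then filters out the non-minimal ones. The event names e1 through e4 stand in for basic events 1 through 4 of Figure 26.

```python
from itertools import chain, combinations

# Figure 26 tree: top event = AND( OR(1, 2), 3, 4 ), basic events e1..e4.
def top_event(events):
    return (("e1" in events) or ("e2" in events)) and \
           ("e3" in events) and ("e4" in events)

basic_events = ["e1", "e2", "e3", "e4"]
all_subsets = chain.from_iterable(
    combinations(basic_events, r) for r in range(1, len(basic_events) + 1))

# A cut set is any set of basic events whose occurrence triggers the top event.
cut_sets = [set(s) for s in all_subsets if top_event(set(s))]

# A cut set is minimal if no proper subset of it is also a cut set.
minimal = [c for c in cut_sets if not any(other < c for other in cut_sets)]
print(sorted(sorted(m) for m in minimal))  # [['e1', 'e3', 'e4'], ['e2', 'e3', 'e4']]
```

Brute force is exponential in the number of basic events, so it only suits tiny examples like this one; MOCUS, described next, scales better.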
In the SUAV case, there is an absence of AND gates; only OR gates are present.
For example, to find the minimal cut sets for engine failure, the gates in the following
diagrams are involved:
Naming the gates G1, G2, up to G8, we number each basic event related to each of
the gates. For example, in the engine-failure diagram we have gate G1 with the following
basic events:
Working in the same way we end up with the diagram in Figure 27.
[Figure 27: the engine-failure fault tree with its OR gates named G1 through G8 and the basic events of each gate numbered per gate (1G1, 2G1, ..., 3G8), covering engine vibrations (E5), mechanical engine failure (E3), loss of lubrication, overheating (E4), operator error (L5), inappropriate engine operation (E6), crash damage, and engine operating too fast.]
According to the MOCUS algorithm, which generates the minimal cut sets for a
fault tree in which only AND and OR gates exist, an OR gate increases the number of cut
sets while an AND gate increases the size of a cut set.121 The MOCUS “algorithm is best
explained by an example.”122 In the following paragraphs, the steps of the MOCUS
algorithm are followed to determine the minimal cut sets.
Locating the uppermost gate, which is the OR gate G1, we replace it with a
vertical arrangement of its inputs. Had it been an AND gate, we would have replaced it
with a horizontal arrangement of its inputs. Continuing to locate the gates at the next
level, and replacing them in the same way, yields Table 7.
[Table 7: MOCUS expansion for the engine-failure fault tree: starting from G1, each OR gate is replaced in turn by a vertical arrangement of its inputs; the last column contains only basic events.]
In the last column of Table 7, we have the set of minimal cut sets for the engine
failure: ({1G1}, {2G1}, {3G1}, {4G1}, {1G2}, {2G2}, ..., {1G8}, {2G8}, {3G8}). The
minimal cut sets are all single-element sets because the tree contains only OR gates and
no AND gates.
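The MOCUS expansion itself is mechanical and easy to script. The sketch below is a minimal implementation of the rule just described (OR gates expand vertically, AND gates horizontally), applied to a two-gate fragment with made-up event labels in the same nGm style; it is an illustration, not the full engine-failure tree.

```python
def mocus(gates, top):
    """Minimal MOCUS: `gates` maps a gate name to ("OR"|"AND", inputs);
    any input not in `gates` is a basic event.  OR gates increase the
    number of cut sets (vertical expansion); AND gates increase the
    size of a cut set (horizontal expansion)."""
    rows = [[top]]
    expanded = True
    while expanded:
        expanded = False
        new_rows = []
        for row in rows:
            gate = next((x for x in row if x in gates), None)
            if gate is None:
                new_rows.append(row)        # row contains only basic events
                continue
            expanded = True
            kind, inputs = gates[gate]
            rest = [x for x in row if x != gate]
            if kind == "AND":
                new_rows.append(rest + list(inputs))         # widen the row
            else:
                new_rows.extend(rest + [i] for i in inputs)  # add new rows
        rows = new_rows
    cut_sets = [frozenset(r) for r in rows]
    # Keep only the minimal cut sets.
    return {c for c in cut_sets if not any(o < c for o in cut_sets)}

# With OR gates only, every minimal cut set is a singleton:
gates = {"G1": ("OR", ["1G1", "2G1", "G2"]),
         "G2": ("OR", ["1G2", "2G2"])}
print(sorted(min(c) for c in mocus(gates, "G1")))  # ['1G1', '1G2', '2G1', '2G2']
```

Running the same routine over the full G1-G8 gate table would reproduce the singleton cut sets listed above.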
An equivalent approach to the MOCUS algorithm starts from the lowermost
gates. It replaces an OR gate with the union (+) sign and an AND gate with the
intersection (*) sign and, after all the expressions are obtained, continues the procedure
with the gates one step above the lowermost gates. It continues in this way until the
expression for the top event is obtained.123
Following this algorithm, we arrive at the same result as MOCUS, expressed as
unions and intersections. In our case, we end up with: E1 = 1G1 + 2G1 + 3G1 + 4G1 +
1G2 + 2G2 + ... + 1G8 + 2G8 + 3G8. The diagram equivalent to that expression is given
in Figure 28.
[Figure 28: engine failure (E1) drawn as a single OR gate over all the basic events 1G1 through 3G8.]
Converting to the equivalent block-diagram representation, we end up with the
“chain-like” structure shown in Figure 29. A fault-tree representation of a system can be
converted into a block-diagram representation by replacing the AND gates with parallel
boxes and the OR gates with boxes in series.124
[Figure 29: series (chain-like) block diagram of the basic events leading to engine failure.]
In a series structure, the component with the lowest reliability is the most
important one. We can compare that with a chain. A chain is never stronger than its
weakest link. So the most important element for reliability improvement is the one with
the lowest reliability.125 Reliability for a series system can be also explained by the use of
Structural Functions, which is summarized in Appendix D.
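In a series structure the system reliability is the product of the component reliabilities, so the overall figure is dominated by the weakest component. A small numeric sketch follows; the reliability values are purely illustrative placeholders, not measured SUAV data.

```python
# Series-system reliability is the product of the component reliabilities.
# The values below are illustrative placeholders, not measured SUAV data.
components = {
    "fuel system": 0.98,
    "carburetor":  0.95,
    "bearings":    0.93,
    "lubrication": 0.99,
}

r_series = 1.0
for r in components.values():
    r_series *= r

# The chain is never stronger than its weakest link: the lowest-reliability
# component is the first candidate for reliability improvement.
weakest = min(components, key=components.get)
print(round(r_series, 4), weakest)  # 0.8572 bearings
```

Note how the series product (about 0.86) falls well below even the weakest single component (0.93), which is why long OR-only chains drive reliability down quickly.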
27. Criticality Analysis
For the criticality matrix, we need a metric for the severity-of-failure effect, so we
can use the designations in Table 8.
Due to the absence of historical data, it is appropriate to use a qualitative
approach to classify failures according to their occurrence level, which is the overall
probability of failure during the item's operating time interval, as illustrated in
Table 9.126
From our previous analysis for engine failure using FTA, we ended up with the
following reasons:
c. Improper fuel
d. Engine fire
g. Improper lubricant
i. Personnel fatigue
m. Environmental reasons
r. Bad material
t. Bad manufacture
u. Bad design
v. Insufficient maintenance
w. Carburetor failure
x. Broken piston
y. Bearing failure
ff. Improper engine adjustments
kk. Inappropriate lean runs such as rusted bearings, seized connecting rod
or piston
From the above, we can derive the following issues about an engine failure
criticality analysis, initially based on our own experience and judgment due to lack of
tracking by current operators:
Number  Issue  ID  Probability of occurrence  Severity of failure effect
1 Excessive engine vibrations L1 D II
2 Engine fire L2 D I
3 Fuel type L3 D III
4 Lubricant type L4 D III
5 Fuel/air mixture adjustment L5 C III
6 Gas and lubricant mixture L6 D III
7 Personnel training P1 C II
8 Operator’s frustration P2 C II
9 Personnel experience P3 B III
10 Poor documentation of procedures P4 C II
11 Poor workload balance P5 C II
12 Ergonomics of GCS P6 C II
13 Misjudgment P7 B II
14 Environmental reasons P8 C II
15 Man machine interface P9 D III
16 Maintenance P10 D II
17 Engine adjustments P11 C III
18 Usage P12 B II
19 Manufacture P13 D III
20 Software failure S D II
21 Material M1 D I
22 Hardware failure M2 E III
23 Design M3 D II
24 Engine wear M4 D II
25 Carburetor M5 C II
26 Piston M6 E II
27 Bearing M7 C I
28 Piston rings M8 E I
29 Propeller size PR E II
30 Engine temperature T1 D II
31 Cooling areas T2 D II
Our next step is to construct the criticality matrix based on the previous
qualitative analysis table:
[Criticality matrix: probability of occurrence (A, most probable, down to E) on the vertical axis versus severity classification (IV to I) on the horizontal axis, with the issue IDs from the previous table plotted in the cells; for example, M7 (bearing) falls at occurrence C, severity I, and M5 (carburetor) at occurrence C, severity II.]
Figure 30. Engine Failure Criticality Matrix. (After RAC FMECA, page 33)
“The criticality matrix provides a visual representation of the critical areas” of our
engine-failure analysis.127 Items in the uppermost right corner of the matrix require the
most immediate action and attention because they have a high probability of occurrence
and a catastrophic or critical severity of effect. Moving diagonally toward the lower left
corner of the matrix, criticality and severity decrease. When the same severity and
criticality exist for different items, safety and cost are the driving factors of the analysis.
For SUAVs, safety carries less weight because we are dealing with unmanned systems,
but we do have to consider cost.
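One way to turn the matrix into an ordered work list is to score each item from its occurrence letter and severity class. The additive score below is just one possible convention, and the entries are a handful of items taken from the engine-failure table above.

```python
# Qualitative scales from the analysis: occurrence A (most probable) to E,
# severity I (catastrophic) to IV (minor).  Lower score = more critical.
OCCURRENCE = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
SEVERITY = {"I": 1, "II": 2, "III": 3, "IV": 4}

items = {  # a few (occurrence, severity) entries from the analysis table
    "Bearing (M7)":     ("C", "I"),
    "Carburetor (M5)":  ("C", "II"),
    "Engine fire (L2)": ("D", "I"),
    "Usage (P12)":      ("B", "II"),
}

def criticality_score(occ, sev):
    """Additive score; one possible convention for ordering matrix cells."""
    return OCCURRENCE[occ] + SEVERITY[sev]

ranked = sorted(items, key=lambda name: criticality_score(*items[name]))
for name in ranked:
    print(name, criticality_score(*items[name]))
```

Ties (here, the bearing and usage items both score 4) are where the safety-and-cost tiebreak discussed above comes into play.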
Table 11. Results from Engine Failure Criticality Analysis. The most critical issues are
highlighted.
The importance of the bearing and carburetor are clearly shown. Those two parts
are the most critical among all the parts composing the engine, according to our analysis.
Finally, environmental factors rank among the most critical of the issues that could
result in an engine failure.
III. DATA COLLECTION SYSTEMS
128 The material for this section is taken (in some places verbatim) from: RAC Toolkit, pages 284-289,
and: National Aeronautics and Space Administration (NASA), “Preferred Reliability Practices: Problem
Reporting and Corrective Action System (PRACAS),” Practice No. PD-ED-1255, Internet, February 2004.
Available at: http://klabs.org/DEI/References/design_guidelines/design_series/1255ksc.pdf
[Figure 31: closed-loop FRACAS cycle: (1) failure observation, (2) failure documentation, (3) failure verification, (4) failure isolation, (5) problematic item replacement, (6-7) problematic item verification, (8) data search, (9) failure and root-cause analysis, (10) determine corrective action, (11) incorporate corrective action and do operational performance test, (12) determine effectiveness of corrective action, (13) incorporate corrective action into all products.]
b. Failure Documentation
We record all relevant data describing the conditions in which the failure
has occurred. A detailed description of the failure incident as well as supporting data and
equipment operating hours is needed.
c. Failure Verification
If the failure is permanent, we verify the incident by performing tests
for failure identification. If the failure is not permanent, we verify the incident by
uncovering the conditions in which the failure occurred. Finally, if the failure cannot
be verified, we pay close attention to any recurrence of the failure.
d. Failure Isolation
For failures that were verified, we perform testing and troubleshooting to
isolate their causes. Isolating the failure can identify a defective part or parts of the
system, or it can relate the incident to other causes, such as operator error, test equipment
failure, improper procedures, or lack of personnel training.
e. Replacement of Problematic Part(s)
For the above failures, we replace the problematic part or parts with
known good ones and replicate the conditions under which the failure occurred. By
testing, we confirm that the correct part (or parts) was replaced. If the failure reappears,
we repeat failure isolation in order to determine the cause of failure correctly. We have to
tag the replaced part or parts, including all relevant documentation and data.
f. Problematic Part(s) Verification
We have to verify the problematic part(s) independently of the system. If the
failure cannot be confirmed, we review failure verification and isolation to identify the
right failed part(s). Isolating the failure at the lowest possible level of the system's
decomposition is the key to revealing the root failure cause.
g. Data Search
In this step, it is necessary to look up historical databases and reports for
similar or identical failure documentations. Databases could be from the implementation
of FRACAS methodology itself or could be from a FMEA or other technical reports.
Failure tendencies or patterns, if any, must be evaluated because they may reveal
defective lots of parts, or bad design, or bad manufacturing, or even bad usage. This is
obviously absent for SUAV systems.
h. Failure Analysis
A failure analysis to determine the root failure cause follows next. The
depth and extent of the failure analysis depend on the criticality of the mission, the
impact on the system's reliability, and the related cost. The outcome of the failure
analysis should specify failure causes and identify any external causes.
i. Root-Cause Analysis
This answers the question, “what could have been done to prevent
failure?” It focuses more on the true nature of failure, which could be due to:
• Overstress conditions
• Design error
• Manufacturing defect
• Unfavorable environmental conditions
• Operator or procedural error, etc
j. Determine Corrective Action
In this phase, we have to develop a corrective action. We have to rely on
the failure analysis and root-cause analysis results and our solution should prevent
reappearance of the failure in the long term in order to be effective. Corrective actions
could be:
• System redesign
• Part(s) redesign
• Selection of different parts or suppliers
• Improvements in processes
• Improvements in manufacturing etc
k. Incorporate Corrective Action and Operational Performance Test
Now, we can incorporate the identified corrective action in the failed
system and perform initial baseline tests to verify the desired performance. After the first
successful results, our tests should become operational tests, including the conditions
under which the failure had occurred. After documenting all test results, we can compare
them with the pre-failure test results to identify alterations in the baseline data. Testing
should be sufficient to give us confidence that the original failure mode has been
eliminated and will not recur. Before large-scale incorporation of a corrective action, the
action must first be verified in order to avoid unnecessary delays and expenses.
l. Determine Effectiveness of Corrective Action
We have to verify that our corrective action:
• Has successfully corrected the failure
• Has not created or induced other failures
• Has not degraded performance below acceptable levels
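The closed-loop character of the process (the last step feeds back into new observations) can be captured in a few lines. The step names below paraphrase subsections a through l; the sketch simply walks the loop in order.

```python
# Closed-loop FRACAS: after the last step, the loop returns to observation.
FRACAS_STEPS = [
    "failure observation",
    "failure documentation",
    "failure verification",
    "failure isolation",
    "replacement of problematic part(s)",
    "problematic part(s) verification",
    "data search",
    "failure analysis",
    "root-cause analysis",
    "determine corrective action",
    "incorporate corrective action and test",
    "determine effectiveness of corrective action",
]

def next_step(current):
    """Advance the FRACAS loop one step, wrapping back to observation."""
    i = FRACAS_STEPS.index(current)
    return FRACAS_STEPS[(i + 1) % len(FRACAS_STEPS)]

print(next_step("determine effectiveness of corrective action"))
# -> "failure observation": the loop closes
```

Encoding the loop explicitly is also a convenient hook for the coordination role discussed below: each open failure report can carry its current step as a field.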
d. How can we prevent such failures from reoccurring?
From all the above, we can simplify the procedure into the checklist shown in
Figures 32 and 33.
[Figure 32: FRACAS checklist, page 1 of 2: start, then failure observation, then failure documentation; if the failure is permanent, do tests to verify failure identification, otherwise check whether the failure repeats; tag the problematic part(s); if the failure of the part(s) is verified, go to page 2.]
[Figure 33: FRACAS checklist, page 2 of 2: data search (inputs from historical databases, FRACAS implementation, FMEA, technical reports), then root-cause analysis (what could have been done to prevent failure?), then determine corrective action (document it), then incorporate the corrective action and do operational performance tests (document results, verify desired performance); if the original failure reoccurs, repeat FRACAS; otherwise incorporate the corrective action into all systems of the same kind and continue implementation of FRACAS.]
3. FRACAS Forms
I have developed forms to implement the FRACAS methodology for SUAVs.
These forms were presented to a VC-6 team for use during the STAN experiment in May
2004. The effort to implement the forms was not successful. The primary reason was
lack of personnel training in the FRACAS system itself, in completing the forms, and in
the general concept of reliability. The secondary reason was lack of coordination and
control in filling out the forms. It was obvious that a member of the operating team
needed to be assigned the extra task of coordinating and controlling proper data entry on
the forms.
“It is preferable to attempt to communicate the ‘big picture,’ so that each team
member is sensitive to failure detection” and identification, and “the appropriate
corrective action process.” 131 Nevertheless, it is typical especially in military
applications to have overall control, so a centralized FRACAS administration within a
team or teams is needed.
The forms cover all aspects of SUAV design, development, production, and
operation, with emphasis on experienced operating or test teams. All forms are addressed
to the operating or test team. The Failure Analysis Report form is also addressed to the
design and development team.
Using the forms, we can collect information and data at the level of detail
necessary to identify design and/or process deficiencies that should be eliminated,
preferably before the SUAV is released to its users in the battlefield. For that reason, the
forms can be used for other systems as well as SUAVs.
There are no known forms of FRACAS or any other reliability tracking system
that have been used for SUAV testing or operation in the past.
4. Discussion for the Forms Terms
Most of the terms in those forms are self-explanatory. Some discussion follows
for some of them.
like temperature, wind speed, humidity/precipitation, cloudiness, lightning, fog, icing
conditions, and proximity to sea, desert, or an inhabited area. Position (16b), System
Parameters/Conditions, lists system conditions and parameters such as flight altitude,
platform speed, engine RPM, fuel level, battery status, communication status, and LOS
availability.
(1) History, in position (31), is a complete description of the observed failure and all the events that followed.
In all forms except the Log-Sheet form there is a term for Comments. It covers any other detail that the operator or the tester considers relevant to the failure and worth mentioning.
Initial Failure Report form fields:
1. No | Form type: Initial Failure Report | 2. Page 1 of __
3. Project ID | 4. System | 5. Serial No | 6. Detected During | 7. Failure Date, Time | 8. Total Operating Hours | 9. Current Mission Hours
10. Reported by | 11. Verified by | 12. System Operated by | 13. Type of System’s Mission | 14. Type of Failure (permanent/recoverable)
(16b) System Parameters/Conditions
SUBSYSTEMS AFFECTED: 18. Name | 19. Reference Drawings | 20. Part No | 21. Manufacturer | 22. Serial No
23. Name | 24. Reference Drawings | 25. Part No | 26. Manufacturer | 27. Serial No
30. Comments
36. Checked (engineering) | 37. Date | 38. Checked (program) | 39. Date | 40. Distribution
Failure Report (Continued) form fields:
1. No | Form type: Failure Report (Continued) | 2. Page 1 of __
3. Project ID | 4. System | 5. Serial No | 6. Detected During | 7. Failure Date, Time | 8. Total Operating Hours | 9. Current Mission Hours
10. Reported by | 11. Verified by | 12. System Operated by | 13. Type of System’s Mission | 14. Number of Failure
PROBLEMATIC PART(S): 19. Name | 20. Reference Drawings | 21. Part No | 22. Manufacturer | 23. Serial No
24. Tagged by | 25. Failure Verified by (reliability) | 26. Failure Verified by (engineering) | 27. Failure Verified by (program) | 28. System Condition after Replacement
30. Background
31. Comments
32. Prepared by | 33. Date | 34. Checked (reliability) | 35. Date | 36. Problem No
37. Checked (engineering) | 38. Date | 39. Checked (program) | 40. Date | 41. Distribution
Failure Analysis Report form fields:
1. No | Form type: Failure Analysis Report | 2. Page 1 of __
3. Project ID | 4. System | 5. Serial No | 6. Test Level | 7. Failure Date | 8. Operating Hours | 9. Reported by
MAJOR COMPONENT OR UNIT: 10. Name | 11. Reference Drawings | 12. Part No | 13. Manufacturer | 14. Serial No
SUB ASSEMBLY: 15. Name | 16. Reference Drawings | 17. Part No | 18. Manufacturer | 19. Serial No
20. Name | 21. Reference Drawings | 22. Part No | 23. Manufacturer | 24. Serial No
PART(S): 25. Name | 26. Reference Drawings | 27. Part No | 28. Manufacturer | 29. Serial No
31. History
32. Analysis
33. Conclusions
43. Approval (engineering) | 44. Date | 45. Approval (program) | 46. Date | 47. Distribution
Table 14. Failure Analysis Report Form (From RAC Toolkit, page 290)
Corrective Action Verification Report form fields:
1. No | Form type: Corrective Action Verification Report | 2. Page 1 of __
3. Project ID | 4. System | 5. Serial No | 6. Test Level | 7. Failure Date | 8. Total Operating Hours | 9. Reported by
10. Initial Failure Report form Number | 11. Failure Report (Continued) form Number | 12. Failure Analysis Report Number | 13. Current Mission Hours Before Failure | 14. Operation Hours after Previous Failure | 15. Number of Corrective Actions Taken
(b) System Condition
22. Comments
23. Corrective Action Taken by | 24. Date | 25. Document No | 26. Corrective Action Effectiveness
27. Prepared by | 28. Date | 29. Approval (reliability) | 30. Date | 31. Problem No
32. Approval (engineering) | 33. Date | 34. Approval (program) | 35. Date | 36. Distribution
Tag to Problematic Part form fields:
1. No | Form type: Tag to Problematic Part | 2. Page 1 of __
3. Project ID | 4. System | 5. Serial No | 6. Detected During | 7. Failure Date | 8. System’s Total Operating Hours | 9. Reported by
10. Initial Failure Report form Number | 11. Failure Report (Continued) form Number | 12. Failure Analysis Report Number | 13. Corrective Action Verification Report Number | 14. Operation Hours after Previous Failure | 15. Total Number of Failures
18. History
PROBLEMATIC PART: 19. Name | 20. Reference Drawings | 21. Part No | 22. Manufacturer | 23. Serial No
24. Tagged by | 25. Failure Verified by (reliability) | 26. Failure Verified by (engineering) | 27. Failure Verified by (program) | 28. System Condition after Replacement
29. Comments
39. Approval (engineering) | 40. Date | 41. Approval (program) | 42. Date | 43. Distribution
Failure Log-Sheet form fields (one row per failure):
1. Number | 2. Date | 3. Time | 4. Operator | 5. Failure Description (brief) | 6. Reported? | 7. Initial Report Number | 8. Initials
Use of these forms will allow detailed analysis of the causes of failure and
detailed modeling of reliability by subsequent analysts.
5. Reliability Growth Testing 132
It is almost certain that prototypes or new designs will not initially meet their
reliability goals. Implementation of a reliability enhancement methodology such as
FRACAS is the only way to overcome the initial problems that may surface in the first prototype performance tests and later. Failures are identified, and actions are taken to correct them. As the procedure continues, corrective actions become less frequent. After a reasonable amount of time, one must check whether reliability has improved and estimate how much additional testing is needed.
Duane observed that there is a relationship between the total operation time (T)
accumulated on a prototype or new design and the number of failures (n(T)) since the
beginning of operation.133 If we plot the cumulative failure rate n(T)/T (whose reciprocal is the cumulative mean time between failures, MTBFc) versus T on a log-log scaled graph, the observed data tend to be linear regardless of the type of equipment under consideration.
Duane’s plots provide a rough estimate of the growth of the time between failures. The time between failures at the early stages of development is expected to be short, but soon after the first corrective actions it gradually becomes longer. As a consequence, Duane’s plots show rapid reliability improvement in the early stages of development and less rapid improvement after the first corrective actions. After each corrective action we can see whether there is a reliability improvement or not, which gives a measure of the effectiveness of our corrective actions and corresponds to the growth of reliability.
132 The material for this section is taken (in some places verbatim) from: Lewis, E. E., Introduction to
Reliability Engineering, Second Edition, John Wiley & Sons, 1996, pages 211-212.
133 Duane, J. T., “Learning Curve Approach to Reliability Monitoring,” IEEE Transactions on Aerospace, vol. 2, pp. 563-566, 1964.
Figure 33. Duane Data Plot: Cumulative Failure Rate n(T)/T versus T (log-log scale) for α>0, α=0, α<0, and α=-1
Figure 33 illustrates a Duane data plot for a hypothetical system. Because the plotted data fall on a straight line, we get ln[n(T)/T] = α⋅ln(T) + b. Exponentiating gives n(T)/T = e^b⋅T^α = K⋅T^α, and finally n(T) = K⋅T^(1+α). Alpha (α) is the growth rate, the change in MTBF per time interval over which the change occurred, and K = e^b is a constant related to the initial MTBF.
a. If α=0, then the cumulative failure rate n(T)/T = K is constant, so the expected failures occur at a constant rate as T increases, and reliability neither improves nor degrades.
b. If α<0, then the cumulative failure rate decreases, and the expected failures become less frequent as T increases. Therefore reliability increases.
c. If α=-1, n(T ) = K = eb = constant . Therefore the number of failures is
independent of time T. We can assume that α=-1 is the theoretical upper limit for
reliability growth.
d. If α>0, then the cumulative failure rate increases, and the expected
failures become more frequent as T increases. Therefore reliability decreases.
e. Miscellaneous.
Each failure can be assigned to one of the above categories and therefore we have
to keep track of five different reliability tendencies.
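The straight-line fit behind a Duane plot is an ordinary least-squares regression of ln[n(T)/T] on ln(T). A minimal sketch of that fit (the function name and data layout are illustrative, not taken from any standard reliability library):

```python
import math

def duane_fit(cum_failures, cum_hours):
    """Least-squares fit of ln(n(T)/T) = alpha*ln(T) + b (the Duane model).

    cum_failures[i] is the cumulative failure count after cum_hours[i]
    hours of operation.  Returns (alpha, b); alpha < 0 indicates
    reliability growth.
    """
    xs = [math.log(t) for t in cum_hours]
    ys = [math.log(n / t) for n, t in zip(cum_failures, cum_hours)]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    b = my - alpha * mx
    return alpha, b
```

Plotting exp(α⋅ln T + b) against the observed n(T)/T on log-log axes reproduces plots like Figure 33.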
b. There are many classified and unclassified reports published on many
different types of SUAVs.
c. Many systems have been tested and there are plans for future tests in
battlefield environments and in deployments with the fleet.
(3). The Sea ALL (Sea Airborne Lead Line) SUAV system, which is a variant of the USMC Dragon Eye UAV.136
134 Morris Jefferson, Aerospace Daily, December 8, 2003, “Navy To Use Wasp Micro Air Vehicle To
Conduct Littoral Surveillance.”
135 Message from COMNAVAIRSYSCOM to HQ USSOCOM MACDILL AFB FL, March 26,
2004, “UAV Interim Flight Clearance for XPV-1B TERN UAV System, Land Based Concept of Operation
Flights.”
136 Sullivan Carol, Kellogg James, Peddicord Eric, Naval Research Lab, January 2002, Draft of
“Initial Sea All Shipboard Experimentation.”
137 Undated message from Commander, Cruiser Destroyer Group 12 to Commander, Second Fleet,
“Urgent Requirement for UAVs in Support of Enterprise Battle Group Recognized Maritime Picture.”
138 UAV Rolling News, “UAV Roadmap defines reliability objectives,” March 18, 2003, Internet,
February 2004. Available at: http://www.uavworld.com/_disc1/0000002
2. UAVs and Reliability
The U.S. military UAV fleet (consisting of Pioneers, Hunters, and Predators)
reached 100,000 cumulative flight hours in 2002. This milestone is a good point at which
to assess the reliability of these UAVs. Reliability is an important measure of effectiveness for achieving routine airspace access, reducing acquisition system cost, and improving UAV mission effectiveness. UAV reliability is important because it supports affordability, availability, and acceptance.139
UAV reliability is closely tied to affordability, primarily because UAVs are expected to be less expensive than manned aircraft with similar capabilities. Savings are based on the smaller size of UAVs and the omission of pilot and aircrew systems.
a. Pilot Not on Board140
With the removal of the pilot and the drive to produce cheaper UAVs, redundancy was minimized and component quality was degraded. As a result, UAVs became more prone to in-flight loss and more dependent on maintenance, and their reliability and mission availability decreased significantly. Being unmanned, they
cannot provide flight cues to the user such as:
• Acceleration sensation,
• Vibration response,
• Buffet response,
Ground testing and instrumentation data analysis are the only source for
such cues.
b. Weather Considerations141
For the platform, the most important weather conditions are wind speed and direction at the surface (the lowest 100 meters of the atmosphere) and at upper levels. Other weather conditions are important but do not affect the flight unless they are extreme. Surface winds affect air platforms during takeoff and landing, and also during preflight and postflight ground handling. Light winds are most favorable for routine operation and
testing. High winds during flight can cause significant platform drift, which results in
poor platform position controllability. This can render a mission profile infeasible and
result in flight cancellation.
141 Teets, Edward H., Casey J. Donohue, Ken Underwood, and Jeffrey E. Bauer, National Aeronautics
and Space Administration (NASA), NASA/TM-1998-206541, “Atmospheric Considerations for UAV
Flight Test Planning,” January 1998, Internet, February 2004. Available at:
http://www.dfrc.nasa.gov/DTRS /1998/PDF/H-2220.pdf
c. Gusts and Turbulence
The high susceptibility of the platform to gusts and turbulence makes stabilizing flight operating points very difficult. The platform’s low wing loading and low inertia are the main reasons for this behavior: gusts and turbulence impose high loads relative to the platform’s weight.142
During the development test and evaluation period (DT&E), an SUAV can
be tested in aerodynamic/wind tunnels to establish its general flight characteristics. A
basic flight manual can be produced during DT&E that will be tested and refined during
the operational test and evaluation period (OT&E). An advantage of SUAVs is that the actual airframe can be tested in the wind tunnel, with no scaling factors involved in the calculations, because the original platform (not a miniaturized model) is being tested.
d. Non Developmental Items (NDI) or Commercial Off-the-shelf
(COTS)
One of the factors in lack of reliability of inexpensive UAVs is the
use of NDI/COTS components that were never meant for an aviation
environment. In many cases, it would have been better to buy the more
expensive aviation-grade components to begin with than to retrofit the
system once constructed. Do not assume COTS components/systems will
work for an application they were not designed for. In other words, they
have to be COTS for that specific use.143
Using NDI/COTS items may save money, but such items require testing to ensure compatibility and to reduce uncertainty in mission efficiency.144
e. Cost Considerations145
By using COTS technology, distributed sensors, communications
and navigation, it is also proposed that the total system reliability may be
increased. It must be noted however that this approach does not currently
account for issues of airworthiness certification.
146 Clough.
[Figure: notional life cycle cost versus reliability (0 to 100%), showing an acquisition-cost curve and an ownership-cost curve trading off as reliability increases.]
147 Carmichael, Bruce W., and others, “Strikestar 2025,” Chapter 4, “Developmental Considerations,
Man-in-the-Loop,” August 1996, Department of Defense , Internet, February 2004. Available at:
http://www.au.af.mil/au/2025/volume3/chap13/v3c13-4.htm
(1) Collision avoidance
148 Finley, Barfield, Automated Air Collision Avoidance Program, Air Force Research Laboratory,
AFRL/VACC, WPAFB,“Autonomous Collision Avoidance: the Technical Requirements,” 0-7803-6262-
4/00/$10.00(c)2000 IEEE.
149 Coker, David, Kuhlmann, Geoffrey, “Tactical-Unmanned Aerial Vehicle ‘Shadow 200’
(T_UAV),” Internet, February 2004. Available at: http://www.isye.gatech.edu/~tg/cources/6219/assign
/fall2002 /TUAVRedesign/
150 Lopez, Ramon, American Institute of Aeronautics and Astronautics (AIAA), “Avoiding Collisions
in the Age of UAVs,” Aerospace America, June 2002, Internet, February 2004. Available at:
http://www.aiaa.org /aerospace/Article.cfm?issuetocid=223&ArchiveIssueID=27
• Landing in sea water
• Using a parachute
The most common problems with recovery are lack of experience by the remote pilot and low-altitude winds, even for VTOL UAVs. To resolve or mitigate this problem, automated recovery systems can be used. These systems have been developed to improve the precision, ease, and safety of UAV recoveries, on land and at sea, in a variety of weather conditions.151
i. Losing and Regaining Flight Control
Uninterrupted communication between the operator in the GCS and the platform is a critical capability.152 An interruption of that link is always possible due to loss of Line-of-Sight (LOS), a communication failure in the platform or the GCS, or electromagnetic interference (EMI). The only way to overcome this problem is autonomy, with a dependable autopilot and mission control software.153
The SUAV operators are part of a battle team, and their primary skill and training is to fight; operating the SUAV comes second. They operate SUAVs from a distance, yet in proximity to the battlefield. Care must therefore be taken to avoid excessive workload demands on SUAV operators. Instead, by making platform control and operation more user-friendly, we can maximize the benefits of SUAV capabilities. When the operators can stand far enough from the battlefield, user-friendly control of SUAVs is advantageous, and control of multiple platforms becomes a more realistic capability if SUAV autonomy is high.
k. Reliability, Availability, Maintainability of UAVs
Reliability is the probability that a UAV system or component will operate without failure for a specified time (the mission duration plus the preflight test duration). This probability is related to the mean time between failures (MTBF) and to availability.
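Under the common constant-failure-rate assumption, these quantities follow the standard exponential-reliability and inherent-availability formulas. A small sketch (the 4-hour mean time to repair below is a hypothetical value, not from the thesis):

```python
import math

def mission_reliability(mtbf_hours, mission_hours):
    """R(t) = exp(-t / MTBF): probability of completing the mission with
    no failure, assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)

def inherent_availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR): long-run fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Example: a 20-hour MTBF (the Hunter figure quoted below) on a 2-hour
# mission, with a hypothetical 4-hour mean time to repair.
print(round(mission_reliability(20, 2), 3))    # 0.905
print(round(inherent_availability(20, 4), 3))  # 0.833
```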
155 Dixon, Stephen R., and Christopher D. Wickens, “Control of multiple UAVs: A Workload
Analysis,” University of Illinois, Aviation Human Factors Division, Presented to 12th International
Symposium on Aviation Psychology, Dayton, Ohio 2003.
Volume, weight, and cost are also important for a UAV system’s operational usage and real system needs. There is a trade-off, as indicated in Figure 36.156
• Quad redundancy: reliability 1-10^-9; airliner/satellite class ($150M to $500M)
• Triple redundancy: reliability 1-10^-7; fighter class ($50M to $150M)
• Dual/triple redundancy: reliability 1-10^-5; moderate-cost UAVs ($10M to $50M)
• Reliability 1-10^-3; reusable UAVs ($300K to $10M)
Figure 36. Redundancy Level, Reliability, and System Cost Trade-off
156 Sakamoto, Norm, presentation: “UAVs, Past Present and Future,” Naval Postgraduate School,
February 26, 2004.
157 Clough.
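The redundancy bands in Figure 36 can be related to unit reliability through the textbook parallel-redundancy formula, sketched below. Real systems gain less than this formula suggests (common-mode failures, imperfect failure detection and switching), so the printed values are illustrative only:

```python
def parallel_reliability(r_unit, n_units):
    """Reliability of n identical, independent units in parallel:
    the system fails only if every unit fails."""
    return 1.0 - (1.0 - r_unit) ** n_units

# With 99.9%-reliable channels, each added redundant channel buys
# roughly three more "nines" under the independence assumption.
for n in (1, 2, 3, 4):
    print(n, parallel_reliability(0.999, n))
```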
The Army’s acquisition of the Hunter RQ-5 system is an example of reliability
improvement after the implementation of a reliability improvement program. In 1995,
during acceptance testing, three Hunter platforms crashed within a three-week period. As
a result, full rate production was canceled. The Program Management Office and the
prime contractor Thompson Ramo Wooldridge (TRW) performed a Failure Mode Effect
and Criticality Analysis (FMECA) for the whole system. Failures were identified and
design changes were made after failure analyses and corrective actions were
implemented. As a result, Hunter’s Mean Time Between Failures (MTBF) for its servo
actuators, which were the main cause for many crashes, increased from 7,800 hours to
57,300 hours.
Hunter returned to flight status three months after its last crash. Over the next two
years, the system’s MTBF doubled from four to eight hours and today stands close to 20
hours. Prior to 1995, Hunter’s mishap rate was 255 per 100,000 hours; afterwards
(1996-2001) the rate was 16 per 100,000 hours. Initially canceled because of its
reliability problems, Hunter has become the standard to which other UAVs are compared
in reliability.158
4. Measures of Performance (MOP) for SUAVs
In manned aviation, the usual Measures Of Performance (MOPs) used for
reliability tracking are
In the Vietnam War, the MOPs used for the Lightning Bug were
The frequency of mishaps is the primary factor for choosing a MOP. In the SUAV
case, we can use the following MOPs for reliability tracking:
a. Crash Rate (CR): The total number of crashes divided by the total
number of flight hours. A crash results in loss of platform.
c. Mishap Rate (MR): The total number of mishaps divided by the total number of flight hours. This thesis defines a mishap for a SUAV as significant platform damage or total platform loss. Depending on the platform’s condition after the mishap, the repair a mishap requires ranges up to that of a crash.
e. Current Crash Rate (CCR): The total number of crashes from the last
system modification divided by the total number of flight hours from the last system
modification.
f. Operational CCR: The total number of crashes from the last system
modification divided by the total number of operating flight hours since the last
modification.
g. Current Mishap Rate (CMR): The total number of mishaps from the last
system modification divided by the total number of flight hours from the last
modification.
160 Carmichael, Bruce W., Col (Sel), and others, “Strikestar 2025,” Appendix A,B & C, “Unmanned
Aerial Vehicle Reliability,” Appendix A, Table 4August 1996, Department of Defense, Internet, February
2004. Available at: http://www.au.af.mil/au/2025/volume3/chap13/v3c13-8.htm
h. Operational CMR: The total number of mishaps from the last system
modification divided by the total number of operating flight hours from the last
modification.
i. Crash Rate “X” (CRX): The crash rate for the last “X” hours of
operational flight hours, as in “CR50” which is the CR for the last 50 flight hours.
j. Mishap rate “X”: The MR for the last “X” hours of operational flight
hours, as in “MR50” which is the MR for the last 50 flight hours.
l. Percent Sorties Loss: The total number of sorties lost (for any reason)
divided by the total number of sorties assigned.
SUAVs are generally low-cost systems, with prices from $15K to $300K. For that reason, there is no official data collection system in effect that is detailed enough to provide reliability data. Usually, only the number of flight hours and the number of crashes are known. The most suitable reliability MOPs currently are therefore CR, CCR, and CRX.
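For a set of sortie records, the hour-based MOPs above reduce to simple counting. A sketch with an invented flight log (the hours and outcomes are made up for illustration; a crash is counted as a mishap too, per the definitions above):

```python
# Hypothetical sortie log: (flight_hours, outcome), outcome in
# {"ok", "mishap", "crash"}; a crash means the platform was lost.
log = [(1.5, "ok"), (2.0, "mishap"), (1.0, "ok"), (0.5, "crash"),
       (2.5, "ok"), (1.0, "ok"), (1.5, "mishap"), (2.0, "ok")]

total_hours = sum(h for h, _ in log)
crashes = sum(1 for _, o in log if o == "crash")
mishaps = sum(1 for _, o in log if o in ("mishap", "crash"))

cr = crashes / total_hours  # Crash Rate (CR)
mr = mishaps / total_hours  # Mishap Rate (MR)

def rate_last_x_hours(log, x, kinds):
    """CRX/MRX: event rate over the most recent x flight hours,
    approximated at whole-sortie granularity."""
    hours, events = 0.0, 0
    for h, o in reversed(log):
        hours += h
        events += o in kinds
        if hours >= x:
            break
    return events / hours

mr5 = rate_last_x_hours(log, 5.0, ("mishap", "crash"))  # "MR5" for this log
```

CCR and CMR are the same computations restricted to sorties flown after the last system modification.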
5. Reliability Improvement Program on SUAVs
A reliability improvement program seeks to achieve reliability goals by improving
product design. The objective of an improvement program is to identify, locate, and correct faulty and weak aspects of the design, the manufacturing process, and the operating procedures. For the SUAV, we first applied existing techniques for improving system reliability.
Starting with FMEA, the basis of the most common methodologies for improving reliability, we also discussed FMECA and FTA. Reliability-centered maintenance, specifically MSG-3, was then presented as the prevailing methodology for enhancing civil aviation reliability and preserving maintenance standards. We showed that MSG-3 is not suitable for UAV applications because of its dependence on an on-board operator. We highlighted the need for a data collection system and presented FRACAS, which is best suited to SUAVs, especially during their initial phases of development or operational testing. Finally, a method is needed to keep track of reliability growth; Duane’s plots were presented and recommended for their simplicity.
6. Steps for Improving Reliability on SUAVs
We can consider a FRACAS system as a part of a generic reliability improvement
program. The first step of such a program is environmental stress screening (ESS). ESS is a process that uses random vibration within certain operational limits, together with temperature cycling, to accelerate part and workmanship imperfections. Infant-mortality failures can thus be identified relatively quickly and easily.
161 Hoivik.
162 Department of Defense, MIL-STD-1629A, “Procedures For Performing a Failure Mode Effects
and Criticality Analysis,” Task 101 FMEA sheet, November 24, 1980.
UAVs FMEA Form (header: System Name; FMEA Date; Page __ of __ Pages)
Columns: ID Number | Item/Functional ID | Design Function | Failure Modes and Causes | Operational Phase | Failure Effects (Local, Next Higher Level, End) | Failure Detection Method | Fault Acceptance | Severity Classification | Remarks
The cell definitions are: 163
(1). ID Number, given to each entry on the FMEA form for record-
keeping purposes.
(4). Failure Modes and Causes, a brief statement about the way(s)
in which the item may fail. In the case of the carburetor, the failure modes are improper
adjustment, plugged needle valve, jammed leverage, servo failure, excess vibrations,
throttle failure, insufficient fastening to the frame, etc.
(7). Next Higher Level, about the effect of the local failure on the
next higher functional system level; in the case of the carburetor, we can state “Loss of
engine.”
163 The material in the following part of this section is taken (in some places verbatim) from: RAC
FMECA, pages 60-66.
(10). Fault Acceptance, statement of the ways that the system can
overcome or bypass the effects of failure. In the case of the carburetor the system design
does not provide any alternatives so the word “None” can be placed under fault
acceptance.
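The FMEA form’s columns map naturally onto a record type. The sketch below encodes the carburetor example from the cell definitions above; values such as the operational phase, detection method, end effect, and severity rank are illustrative assumptions, not taken from a completed thesis form:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FMEAEntry:
    """One row of the UAVs FMEA form; attribute names follow its columns."""
    id_number: int
    item: str
    design_function: str
    failure_modes_and_causes: List[str]
    operational_phase: str
    local_effect: str
    next_higher_level_effect: str
    end_effect: str
    detection_method: str
    fault_acceptance: str
    severity: int
    remarks: str = ""

carburetor = FMEAEntry(
    id_number=1,
    item="Carburetor",
    design_function="Meter the fuel/air mixture to the engine",
    failure_modes_and_causes=[
        "improper adjustment", "plugged needle valve", "jammed leverage",
        "servo failure", "excess vibrations", "throttle failure",
        "insufficient fastening to the frame",
    ],
    operational_phase="Flight",                 # assumed
    local_effect="Loss of fuel metering",       # assumed
    next_higher_level_effect="Loss of engine",  # from the text
    end_effect="Loss of platform",              # assumed
    detection_method="Engine RPM telemetry",    # assumed
    fault_acceptance="None",                    # from the text
    severity=4,                                 # assumed rank
)
```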
f. Complete a reliability improvement plan. This plan must be completed,
approved and coordinated by the manufacturer’s engineers and reliability manager in
cooperation with the military personnel who operate the systems. The following need to
be addressed in the plan:
• Resources,
• Personnel,
• Test environment,
• Procedures,
The plan follows these steps in order:
a. Verify/calibrate instruments
b. Set initial weather restrictions
c. Conduct FMEA
d. Establish FRACAS
e. Track reliability
f. Complete reliability improvement plan
IV. EXAMPLE
Year | Mishaps | Flight hours
86 | 5 | 96.3
87 | 9 | 447.1
88 | 24 | 1050.9
89 | 21 | 1310.5
90 | 21 | 1407.9
91 | 28 | 2156.6
92 | 20 | 1179.3
93 | 8 | 1275.6
94 | 16 | 1568
95 | 16 | 1752
[Chart: operating hours versus time, 1986 to 1995]
Year | Cumulative mishap rate | Current Mishap Rate (CMR)
88 | 0.023835 | 0.022837568
89 | 0.020311 | 0.016024418
90 | 0.01855 | 0.014915832
91 | 0.016694 | 0.0129834
92 | 0.016735 | 0.016959213
93 | 0.015239 | 0.006271558
94 | 0.014487 | 0.010204082
95 | 0.013721 | 0.00913242
[Chart: Current Mishap Rate (CMR) versus year]
165 Carmichael, Bruce W., Col (Sel), and others, “Strikestar 2025,” Appendix A, B & C, “Unmanned
Aerial Vehicle Reliability,” August 1996, Department of Defense, Internet, February 2004.
Available at: http://www.au.af.mil/au/2025/volume3/chap13/v3c13-8.htm
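The two rate columns in the table above can be regenerated directly from the yearly mishap and flight-hour data:

```python
years = [86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
mishaps = [5, 9, 24, 21, 21, 28, 20, 8, 16, 16]
hours = [96.3, 447.1, 1050.9, 1310.5, 1407.9, 2156.6,
         1179.3, 1275.6, 1568, 1752]

rows = []
cum_m, cum_h = 0, 0.0
for y, m, h in zip(years, mishaps, hours):
    cum_m += m
    cum_h += h
    # Cumulative rate: all mishaps to date over all hours to date;
    # CMR: that year's mishaps over that year's hours.
    rows.append((y, round(cum_m / cum_h, 6), round(m / h, 6)))

for row in rows[2:]:  # the table starts at 1988
    print(*row)
```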
It is obvious that both MOPs show rapid improvement during the first two years, followed by a much slower rate of improvement.
1. We follow Duane’s theory and analyze the data as seen in Table 21. We assume
that reliability improvement efforts have been implemented every year on all similar
systems.
N (cumulative mishaps) | T (cumulative flight hours) | N/T | ln(T) | ln(N/T) | Regression | exp(Regression)
5 96.3 0.051921 4.567468 -2.95803 -3.0499928 0.047359265
14 543.4 0.025764 6.297846 -3.658788 -3.4840591 0.030682613
38 1594.3 0.023835 7.37419 -3.736604 -3.7540609 0.023422437
59 2904.8 0.020311 7.97412 -3.896582 -3.9045536 0.020149947
80 4312.7 0.01855 8.369319 -3.987293 -4.0036897 0.018248183
108 6469.3 0.016694 8.774823 -4.092692 -4.1054106 0.016483249
128 7648.6 0.016735 8.942278 -4.090248 -4.1474168 0.015805192
136 8924.2 0.015239 9.096522 -4.183867 -4.186109 0.015205334
152 10492.2 0.014487 9.258387 -4.234507 -4.226713 0.014600302
168 12244.2 0.013721 9.412808 -4.288844 -4.2654495 0.014045553
Regression Statistics: Multiple R 0.984226284; R Square 0.968701379; Adjusted R Square 0.964789051; Standard Error 0.073881376; Observations 10
[Chart: residuals versus fit]
ANOVA
df SS MS F Significance F
Regression 1 1.351526748 1.351526748 247.6023 2.65748E-07
Residual 8 0.043667661 0.005458458
Total 9 1.39519441
In that case α is -0.25 for the total 12,244.2 hours of operations. In the next figure,
we can see Duane’s regression and failure rate versus time plots.
Figure 38. Duane’s Regression and Failure Rate versus Time for 1986 to 1995
From the residuals and the Duane plots, we see a steeper descent of the failure rate in the first years, followed by a short period of nearly constant failure rate. The descent in the last years is not as steep as in the first years.
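Table 21’s regression can be reproduced with a least-squares fit of ln(N/T) on ln(T). A sketch using the cumulative data above:

```python
import math

T = [96.3, 543.4, 1594.3, 2904.8, 4312.7, 6469.3,
     7648.6, 8924.2, 10492.2, 12244.2]              # cumulative flight hours
N = [5, 14, 38, 59, 80, 108, 128, 136, 152, 168]    # cumulative mishaps

xs = [math.log(t) for t in T]
ys = [math.log(n / t) for n, t in zip(N, T)]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
b = my - alpha * mx
print(round(alpha, 2))  # -0.25, matching the thesis's fitted slope
```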
2. Using the same data set, we concentrate on the last six years, from 1990 to
1995.
Year | Mishaps | Flight hours
90 | 21 | 1407.9
91 | 28 | 2156.6
92 | 20 | 1179.3
93 | 8 | 1275.6
94 | 16 | 1568
95 | 16 | 1752
[Chart: operating hours versus time, 1990 to 1995]
We follow Duane’s theory and analyze the data as seen in the next table.
N (cumulative mishaps) | T (cumulative flight hours) | N/T | ln(T) | ln(N/T) | Regression | exp(Regression)
21 1407.9 0.014916 7.249855 -4.205332 -4.173563 0.015397302
49 3564.5 0.013747 8.178779 -4.286959 -4.2891601 0.013716442
69 4743.8 0.014545 8.464594 -4.230487 -4.3247274 0.013237158
77 6019.4 0.012792 8.702743 -4.358937 -4.3543631 0.012850622
93 7587.4 0.012257 8.934244 -4.401645 -4.3831715 0.012485697
109 9339.4 0.011671 9.141997 -4.450649 -4.4090247 0.012167039
Regression Statistics: Multiple R 0.864537515; R Square 0.747425115; Adjusted R Square 0.684281394; Standard Error 0.054749695; Observations 6
[Chart: residuals versus fit]
ANOVA
df SS MS F Significance F
Regression 1 0.035481414 0.035481 11.83689 0.026282253
Residual 4 0.011990116 0.002998
Total 5 0.047471531
Now the parameter α is -0.12 for the last 9,339.4 hours of operations, which means reliability growth was less rapid over the last six years. Figure 39 depicts Duane’s regression and failure rate versus time plots:
Figure 39. Duane’s Regression and Failure Rate versus Time for 1990 to 1995
Comparing the two time periods, the rate of reliability growth for the last six years, 1990 to 1995 (factor -0.12), decreased compared with the overall factor of -0.25 for the whole ten-year period, 1986 to 1995.
3. Using the same data set, we concentrate on the first six years, from 1986 to 1991.
Year | Mishaps | Flight hours
86 | 5 | 96.3
87 | 9 | 447.1
88 | 24 | 1050.9
89 | 21 | 1310.5
90 | 21 | 1407.9
91 | 28 | 2156.6
[Chart: operating hours versus time, 1986 to 1991]
We follow Duane’s theory and analyze the data as seen in the next table:
N (cumulative mishaps) | T (cumulative flight hours) | N/T | ln(T) | ln(N/T) | Regression | exp(Regression)
5 96.3 0.051921 4.567468 -2.95803 -3.0470131 0.047500591
14 543.4 0.025764 6.297846 -3.658788 -3.4860799 0.030620672
38 1594.3 0.023835 7.37419 -3.736604 -3.7591921 0.023302559
59 2904.8 0.020311 7.97412 -3.896582 -3.9114186 0.020012093
80 4312.7 0.01855 8.369319 -3.987293 -4.0116967 0.018102654
108 6469.3 0.016694 8.774823 -4.092692 -4.1145894 0.016332645
[Chart: residuals versus fit]
ANOVA
df SS MS F Significance F
Regression 1 0.786578144 0.786578144 79.54973601 0.000873629
Residual 4 0.039551515 0.009887879
Total 5 0.826129659
Now α is -0.25 for the first 6,469.3 hours of operations. In the next figure, we see Duane’s regression and failure rate versus time plots:
Figure 40. Duane’s Regression and Failure Rate versus Time for 1986 to 1991
If we compare the first six years with the last six years, reliability growth in the last six years has slowed: the growth factor is -0.12, compared with -0.25 for the first six years. We do not know why the reliability growth rate has decreased, but it has.
4. We can use the Duane curve to predict the future failure rate. From the previous discussion of Duane’s plots in III.B.4, the cumulative failure rate is n(T)/T = K⋅T^α, where K = e^b. Using the results for the last six years, α is -0.1244 and b is -3.2714, so the equation of the curve is n(T)/T = e^(-3.2714)⋅T^(-0.1244). This curve can be used as the prediction curve. For example, at 12,000 hours of operation after 1990, the predicted cumulative failure rate is 0.011793 failures per hour of operation, or about 12 failures per 1,000 hours (a cumulative MTBF of roughly 85 hours).
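The prediction step is a one-line evaluation of the fitted curve. A sketch using the six-year parameters (α = -0.1244, b = -3.2714):

```python
import math

alpha, b = -0.1244, -3.2714  # Duane fit to the 1990-1995 data

def predicted_cum_failure_rate(total_hours):
    """lambda_c(T) = e**b * T**alpha, the fitted cumulative failure rate."""
    return math.exp(b) * total_hours ** alpha

rate = predicted_cum_failure_rate(12000)
print(rate)      # ~0.0118 failures per hour, i.e. ~12 per 1,000 hours
print(1 / rate)  # a cumulative MTBF of roughly 85 hours
```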
[Chart: predicted cumulative failure rate versus operating hours, 9,000 to 21,000 hours]
V. CONCLUSION
A. SUMMARY
From the material presented in this thesis, we can conclude the following:
2. RCM (or MSG-3) is a system suitable for civil and military manned
aviation and other industry fields in which experience is prevalent, hidden failures can be
easily identified by personnel, and safety considerations are the primary factor. For small
UAV systems in military applications, safety is not the primary factor, experience has not reached manned aviation levels, and hidden failures in unmanned systems are very difficult to observe. Therefore MSG-3 is not a suitable standard for SUAVs.
3. FMEA may be used for almost any kind of reliability analysis that
focuses on finding the causes of failure. A good and complete knowledge of the system is
necessary prior to proceeding with the FMEA. FMEA is an appropriate method for
SUAVs. This thesis has developed FMEA forms for the SUAV.
7. For SUAVs we have to use fault avoidance due to size and weight
limitations. Redundancy cannot be easily implemented, especially because of platform cost and size constraints.
• Communication
• Miscellaneous
and keep track of the reliability of each subsystem separately. The forms that we have developed can be used as a data source for separating reliability by subsystem.
• Calibrate and verify the instruments for the field tests or field operations,
• Establish a FRACAS,
• Track reliability improvement,
13. Reliability costs and benefits are like an investment: one truly gets what one pays for.
This thesis is a qualitative approach to the issue of reliability and UAVs. In order
to obtain further benefit and value from that research effort, we must have data. For a
specific type of UAV, we can start implementing FRACAS and collecting data. A
database can be created easily after the implementation of FRACAS, and we can start
analyzing and interpreting reliability improvement, if any, quite soon.
2. Some experts believe that difficult problems can be solved with better
software, but software is not free. In the near network-centric future, software will
probably be one of the most expensive parts of a UAV system. Additionally, software is a
dynamic part of the system. It must be constantly upgraded to meet new expectations, or
to integrate new equipment technologies. For that reason, software reliability is another critical issue that will become more pressing in the near future. The emerging question is how to find the best means of maintaining software reliability at acceptable levels.
3. Similar to the above issue, micro-technologies are quickly evolving.
New ones are rapidly being inserted into UAV systems. In what way can our reliability
tracking methodology cope with new subsystems?
Data collected using the methods developed in this thesis will provide the
material with which to answer these essential questions.
APPENDIX A: DEFINITION OF FMEA FORM TERMS
(2) Design Responsibility: Name the system design team and for (2A)
name the head of the system design team.
(5) Model/product: Name the model and/or the product using the system.
(9) FMEA Date, revision: Record the date of the latest revision.
(12) Potential Failure Mode: The defect refers to the loss of a design
function or a specific failure. “For each design function identified in Item 11 the
corresponding failure of the function must be listed. There can be more than one failure
from one function.” To identify the failure mode ask the question: “How could this
design fail?” or “Can the design break, wear, bind and so on?” Another way to identify a
failure mode is through an FTA. In an FTA the top level is the loss of the part function and
the lower levels are the corresponding failure modes.
166 The material from this section is taken (in some places verbatim) from: Stamatis, pages 130-132.
167 Ibid, pages 132-149.
None (1): No effect.
Very slight (2): User not annoyed. Very slight effect on product performance. Non-essential fault noticed occasionally.
Slight (3): User slightly annoyed. Slight effect on product performance. Non-essential fault noticed frequently.
Minor (4): User’s annoyance is minor. Minor effect on product performance. Non-essential faults almost always noticed. Fault does not require repair.
Moderate (5): User has some dissatisfaction. Moderate effect on product performance. Fault requires repair.
Significant (6): User is inconvenienced. Degradation of product performance, but safe and operable. Non-essential parts inoperable.
Major (7): User is dissatisfied. Major degradation of product performance, but safe and operable. Some subsystems are inoperable.
Extreme (8): User is severely dissatisfied. Product is safe but inoperable. System is inoperable.
Serious (9): Safe operation and compliance with regulations are in jeopardy.
Hazardous (10): Unsafe for operation, non-compliant with regulations, completely unsatisfactory.
Table 29. Example of Severity Guideline Table for Design FMEA (After Stamatis, page
138)
(16) Potential Cause of Failure: This identifies the cause of a failure mode.
A failure mode may have a single cause or numerous causes; in the latter case the listed
causes are symptoms of one root cause. A good understanding of the system’s functional
analysis is needed at this stage. Asking “Why?” five times is the rule of thumb for drilling
down to the root cause of a failure mode. It is essential to identify all potential causes
while performing the FMEA. There is not always a linear, one-to-one relationship between
cause and failure mode. Listing as many causes as possible makes the FMEA easier and
less error prone. If the severity of a failure is rated 8 to 10, then an effort should be made
to identify as many root causes as possible.
Table 30. Example of Occurrence Guideline Table for Design FMEA (After Stamatis,
page 142)
(19) Detection: The “likelihood that the proposed design controls will
detect” the root cause of a failure mode before it reaches the end user. The detection
rating estimates the ability of each of the controls in (18) to detect failures before they
reach the customer. A typical detection guideline is shown in Table 31.
Table 31. Example of Detection Guideline Table for Design FMEA (After Stamatis, page
147)
(22) Responsible Area or Person and Completion Date: Name the
responsible person/area and the completion date for the recommended action.
(24) Revised RPN: This is the reevaluation of the RPN after the corrective
actions have been implemented. A revised RPN lower than the original indicates an
improvement.
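The comparison of original and revised RPNs can be illustrated with a minimal Python sketch. In a design FMEA the Risk Priority Number is the product of the severity, occurrence, and detection ratings (each on the 1-10 scales of Tables 29-31); the specific ratings below are hypothetical values for an example SUAV failure mode, not data from the thesis.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number for a design FMEA: each factor on a 1-10 scale."""
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return severity * occurrence * detection

# Hypothetical SUAV failure mode: engine stall in flight.
original = rpn(severity=9, occurrence=5, detection=6)   # 270
# After an assumed corrective action, occurrence and detection improve;
# severity stays fixed because the consequence of the failure is unchanged.
revised = rpn(severity=9, occurrence=2, detection=3)    # 54
print(original, revised, revised < original)  # → 270 54 True
```

The drop from 270 to 54 is what item (24) means by an improvement.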
3. Third Part of the Analysis of Design FMEA168
(25) Approval signatures: Name the authority to conduct the FMEA.
(26) Concurrence signatures: Name those responsible for carrying out the
FMEA.
APPENDIX B: THE MRB PROCESS
The Maintenance Review Board (MRB) process “is broadly defined as all of the
activities necessary to produce and maintain a Maintenance Review Board Report
(MRBR).” The process involves three major objectives, which are to ensure that:
“MRBRs are developed as a joint exercise involving the air operators, the type
certificate applicant,” the ATA, and other regulatory authorities. The MRB process
The MRB chairperson reviews the proposed MRBR, which is then published as
the MRBR.170
169 Transport Canada Civil Aviation (TCCA), Maintenance Instruction Development Process, TP
13850, Part B, “The Maintenance Review Board (MRB) Process (TP 13850), Chapter 1. General,” last
updated: April 19, 2003, Internet, February 2004. Available at: http://www.tc.gc.ca/civilaviation/maintenance/aarpd/tp13850/partB.htm
170 TCCA, Chapter 2.
APPENDIX C: FAILURES
1. Functions171
A function statement should consist of a verb, an object, and a desired standard of
performance. For example: a SUAV platform flies up to 4,000 feet at a sustained speed
of at least 55 knots. The verb is “flies,” the object is “a SUAV platform,” and the
standard is “up to 4,000 feet at a sustained speed of at least 55 knots.”
2. Performance Standards172
In our example, one process that degrades the SUAV platform (in other words,
one failure mode for the SUAV) is engine failure. Engine failure happens for many
reasons. The question is how much an engine failure impairs the ability of the UAV to
fly at the desired altitude and the designated sustained speed.
In order to avoid degradation, the SUAV must be able to perform better than the
minimum standard of performance desired by the user. What the asset is able to deliver is
known as its “initial capability,” say 4,500 feet at 60 knots sustained speed. This leads
one to define performance as:
• Desired performance, which is what the user wants the asset to do (4,000
feet at 55 knots sustained speed in our case).
• Built-in capability, which is what the asset can really deliver (4,500 feet at
60 knots sustained speed in our case).
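The distinction between desired performance and built-in capability reduces to a simple check: a functional failure occurs whenever achieved performance falls below the user's minimum standard on any dimension. A minimal Python sketch, using the numbers from the example above:

```python
# Desired standard vs. built-in capability for the SUAV example in the text:
# 4,000 ft / 55 kt desired, 4,500 ft / 60 kt built in.
DESIRED = {"altitude_ft": 4000, "speed_kt": 55}
CAPABILITY = {"altitude_ft": 4500, "speed_kt": 60}

def functional_failure(achieved: dict, desired: dict = DESIRED) -> bool:
    """True when the asset cannot meet the user's minimum standard of
    performance on at least one dimension."""
    return any(achieved[key] < desired[key] for key in desired)

print(functional_failure({"altitude_ft": 4200, "speed_kt": 58}))  # → False
print(functional_failure({"altitude_ft": 4200, "speed_kt": 50}))  # → True
```

The margin between CAPABILITY and DESIRED is the room the asset has to degrade before a functional failure occurs.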
3. Different Types of Functions173
Every physical asset usually has more than one function. If the objective of
maintenance is to ensure that the asset can continue to fulfill these functions, then they
must all be identified together with their current standards of performance.
Functions are divided into two main categories: primary and secondary functions.
171 Moubray, John, an excerpt of the first chapter of the book “Reliability-centered Maintenance,”
Plant Maintenance Resource Center, “Introduction to Reliability-centered Maintenance,” Revised
December 3, 2002, Internet, May 2004, Available at: http://www.plant-maintenance.com/RCM-intro.shtml
172 Moubray, “Introduction.”
173 The material from this section is taken (in some places verbatim) from: Moubray, “Introduction.”
a. Primary functions are fairly easy to recognize, and the names of most
industrial assets are based on their primary functions. For example, the primary function
of a “printer” is to print documents, and of a “crusher” is to crush something, etc. In the
SUAV example the primary function is to provide lift and thrust so that the platform can
fly up to 4,000 feet at a sustained speed of at least 55 knots.
However, each asset has more than one function, and each function often has
more than one desired standard of performance. The asset can fail on each function, so it
can fail in different ways. Therefore, failure must be defined more precisely in terms of
the loss of specific functions rather than the failure of the asset as a whole.
174 The material from this section is taken (in some places verbatim) from: Moubray, “Introduction.”
175 The material from this section is taken (in some places verbatim) from: Hoyland, pages 11-12.
A functional failure is defined as the inability of any “asset to fulfill a function to a
standard of performance, which is acceptable to the user.” 176
176 The material from this part of section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction.”
e. Another classification is according to the severity of effects:
Failure mode is “the effect by which a failure is observed on the failed item.”
Technical items are designed to perform one or more functions, so a failure mode can be
defined as nonperformance of one of these functions. Failure modes may generally be
subdivided into “demanded change of state is not achieved” and “change of condition.”
For example, an automatic valve may show one of the following failure modes:
The first two failure modes are “demanded change of state is not achieved” while
the third is “change of condition.”
7. Failure Effects179
177 The material from this section is taken (in some places verbatim) from: Hoyland, page 10.
178 The material from this part of section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction,” page 5.
179 The material from this section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction,” page 5.
The fourth of the seven questions in the RCM process, as previously mentioned in
Section IIA2b of this thesis, asks “What happens when each failure occurs?” The answers
are known as “failure effects.”
Failure effects describe what happens when a failure occurs. While describing the
effects of a failure, the following should be recorded:
A proactive task is worth doing if it reduces the consequences of the failure mode
and justifies the direct and indirect costs of doing the task.
180 The material from this section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction,” page 5.
b. Operational, if the failure affects the operation, production output,
quality, cost or customer satisfaction.
d. Hidden, when failures have no direct impact, but they expose the
organization to multiple failures with serious and often catastrophic consequences.
APPENDIX D: RELIABILITY
1. Introduction to Reliability181
Reliability is a concept that has dominated systems design, performance and
operation for the last 60 years. It appeared after WWI, when it was used to compare
operational safety of one, two, three, and four-engine airplanes. At that time reliability
was measured as the number of accidents per flight hour.
To avoid low system reliability, engineers in the USA at that time tried to
improve the individual components of a system. They used “better” materials and “better”
designs for their products. The result was higher system reliability, but no broader or
deeper analysis of the problem was performed.
By the end of the 1950s and early 1960s, interest in the USA focused on the
production of intercontinental ballistic missiles and on space research such as the Mercury
and Gemini programs. In the race to put a man on the moon, a reliable program was very
important. The first association for engineers working on reliability issues was
established, and IEEE Transactions on Reliability, the first journal on the subject,
appeared in 1963. After that, a number of textbooks were published, and in the 1970s
many countries in Europe and Asia began dealing with the same issues. Soon it became
clear that a low reliability level cannot be compensated for by extensive maintenance.
181 The material from this section is taken (in some places verbatim) from: Hoyland, pages 1-2.
2. What is Reliability?
“Until the 1960s, reliability was defined as the probability that an item will
perform a required function under stated conditions for a stated period of time.”
According to the International Standard Organization (ISO) 8402 and British Standard
(BS) 4778, “reliability is the ability of an item to perform a required function, under
given environmental and operational conditions and for a stated period of time.” The term
“item” is used to denote any component, subsystem, or entire system. A “required
function” may be a single function or a combination of functions necessary to provide a
certain service.182
f. The probability that the item does not fail in a time interval.
3. System Approach
A system is a group of elements, parts, or components that work together for a
specified purpose. A failure of the system is related to the failure of at least one of its
parts, elements, or components. A part starts in its working state and, for various reasons,
changes to a failed state after a certain time. The time to failure is considered a random
variable that we can model by a failure-distribution function.184
182 Hoyland, page 3.
183 Hoivik, slide 6.
Failure occurs due to a complex set of interactions between the material properties
and/or physical properties of the part and/or stresses that act on the part. The failure
process is complex and is different for different types of parts or elements or
components.185
Even though failure mechanisms vary, they basically fall into two categories:
overstress and wear-out. Overstress failures are those due to fracture, yielding, buckling,
large elastic deformation, electrical overstress, and thermal breakdown. Wear-out failures
are those due to wear, corrosion, metal migration, inter-diffusion, fatigue-crack
propagation, diffusion, radiation, fatigue-crack initiation, and creep.186
For multi-component systems like a SUAV the number of parts may be very large
and a multilevel decomposition of such a system is necessary.
4. Reliability Modeling
a. System Failures187
System failures for a multi-component system can be modeled in several
ways. A system failure is due to the failure of at least one of its components, so analysis
of failures at the component level is the starting point of a system failure analysis. “Henley
and Kumamoto (1981) propose the following classification of failures:
188 The material from this section is taken (in some places verbatim) from: Blischke, page 205.
189 Ibid.
time is very small relative to the mean time between failures, then it can be ignored, and
we can model the system’s failures as a function reflecting the effect of age. In other
words, the model function can be viewed as the failure rate of the system through time.
After overhauls, major repairs, or design alterations, the failure rate of the
system can be significantly reduced; usually it becomes smaller than it was before.
“The linking of the system performance to failures at the part level can be
done either qualitatively or quantitatively.” In the qualitative case, we are interested in the
causal relations between failures and system performance. In the quantitative case, we
can use many measures of system effectiveness, like reliability, in terms of component
reliabilities.
consists of both those machines, the probability that both machines fail on the same day
is 1/100 squared, or 1/10,000.
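The arithmetic behind this example is a one-line check in Python, assuming the two machines fail independently; exact fractions avoid floating-point noise.

```python
from fractions import Fraction

# Two machines that each fail on a given day with probability 1/100,
# independently, both fail on the same day with probability (1/100)^2.
p = Fraction(1, 100)
p_both = p * p
print(p_both)  # → 1/10000
```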
e. Reliability Measures191
As a basic step toward understanding the reliability measures, we must
define “time-to-failure.” The time-to-failure of a system (or component, part, unit, or
element) is the time elapsing from when the system is put into operation until the first
failure. Let t = 0 be the operation starting time. The time-to-failure is subject to many
variables; consequently, we can represent it as a random variable T. We can describe the
condition or state of the system at time t by the state variable X(t), where X(t) = 1 if the
system is functioning at time t, and X(t) = 0 if the system is in a failed state at time t.
[Figure: the state variable X(t) drops from 1 (working condition) to 0 at the moment of
failure; the elapsed time T is the time-to-failure.]
The time-to-failure may not always be measured in time but can also be
measured in numbers of repetitions of operation, or distance of operation, or number of
rotations of a bearing, etc. We can assume that the time-to-failure T is continuously
distributed with a probability density f(t) and distribution function

F(t) = P(T ≤ t) = ∫₀ᵗ f(u) du, for t > 0.

The density is obtained from the distribution function as

f(t) = (d/dt) F(t) = lim∆t→0 [F(t + ∆t) − F(t)]/∆t = lim∆t→0 P(t < T ≤ t + ∆t)/∆t.

If ∆t is small, then f(t)·∆t ≈ P(t < T ≤ t + ∆t).

191 The material from this section is taken (in some places verbatim) from: Hoyland, pages 18-25.
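The approximation f(t)·∆t ≈ P(t < T ≤ t + ∆t) can be checked by simulation. A short Python sketch for an exponentially distributed time-to-failure; the rate λ, the time t, and ∆t are arbitrary choices for the sketch.

```python
import math
import random

# Check f(t)·∆t ≈ P(t < T ≤ t + ∆t) for an exponential time-to-failure.
lam, t, dt = 0.5, 1.0, 0.01
f_t = lam * math.exp(-lam * t)        # density of Exp(λ) evaluated at t

random.seed(0)
n = 200_000
samples = [random.expovariate(lam) for _ in range(n)]
# Empirical probability that T falls in (t, t + ∆t]:
p_hat = sum(t < s <= t + dt for s in samples) / n

print(abs(f_t * dt - p_hat) < 1e-3)   # the two agree to Monte Carlo error
```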
[Figure 43 plots the distribution function F(t) and the density f(t) against time.]
Figure 43. Distribution and Probability Density Functions (From Hoyland, page 18)
(1). Reliability or Survivor Function R(t). The reliability function
of a system is defined as

R(t) = 1 − F(t) = P(T > t), for t > 0. (A)

[Figure plots F(t) and R(t) against time; at every t the two curves sum to one.]

The probability that the system fails in the interval (t, t + ∆t], given that it has survived
up to time t, is

P(t < T ≤ t + ∆t | T > t) = P(t < T ≤ t + ∆t)/P(T > t) = [F(t + ∆t) − F(t)]/R(t).
Failure rate z(t) is the limit, as ∆t → 0, of the probability that a system fails in the interval
(t, t + ∆t], given that it is in operating condition at time t, per unit length of time:

z(t) = lim∆t→0 P(t < T ≤ t + ∆t | T > t)/∆t = lim∆t→0 [F(t + ∆t) − F(t)]/[∆t · R(t)],

so z(t) = f(t)/R(t), (B)

because it is known that f(t) = lim∆t→0 [F(t + ∆t) − F(t)]/∆t, or equivalently

f(t) = (d/dt) F(t). (C)

From (A) and (C) we get f(t) = (d/dt)(1 − R(t)) = −R′(t).

So (B) becomes z(t) = −R′(t)/R(t) = −(d/dt) ln R(t). Since R(0) = 1, integrating gives
∫₀ᵗ z(u) du = −ln R(t), so

R(t) = exp(−∫₀ᵗ z(u) du).

Finally we have:

f(t) = −R′(t) = −(d/dt) exp(−∫₀ᵗ z(u) du) = z(t) · exp(−∫₀ᵗ z(u) du), t > 0.
Each of the four functions F(t), f(t), R(t), and z(t) determines the other three:

F(t) = ∫₀ᵗ f(u) du = 1 − R(t) = 1 − exp(−∫₀ᵗ z(u) du)

f(t) = (d/dt) F(t) = −(d/dt) R(t) = z(t) · exp(−∫₀ᵗ z(u) du)

R(t) = 1 − F(t) = ∫ₜ^∞ f(u) du = exp(−∫₀ᵗ z(u) du)

z(t) = [dF(t)/dt]/[1 − F(t)] = f(t)/∫ₜ^∞ f(u) du = −(d/dt) ln R(t)

Table 32. Relationships Between Functions F(t), R(t), f(t), z(t) (From Hoyland, page 22)
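The identities in Table 32 can be verified numerically. A Python sketch for a Weibull time-to-failure; the shape and scale parameters are arbitrary illustrative choices, not SUAV data.

```python
import math

# Numerical check of the Table 32 identities for a Weibull time-to-failure
# with shape k and scale c (illustrative values).
k, c = 2.0, 10.0

def R(t): return math.exp(-(t / c) ** k)               # survivor function
def f(t): return (k / c) * (t / c) ** (k - 1) * R(t)   # density
def z(t): return (k / c) * (t / c) ** (k - 1)          # failure rate

t = 3.0
print(abs(z(t) - f(t) / R(t)) < 1e-12)    # z(t) = f(t)/R(t)

# R(t) = exp(-∫₀ᵗ z(u) du), checked with a midpoint Riemann sum:
n = 100_000
integral = sum(z((i + 0.5) * t / n) for i in range(n)) * (t / n)
print(abs(R(t) - math.exp(-integral)) < 1e-6)
```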
For most mechanical and electronic systems, the failure rate
over the life of the system has three distinct periods, characterized by the well-known
“Bathtub Curve,” shown in Figure 45.193

[Figure 45: the bathtub curve, with break points a and b on the time axis separating the
three periods; the figure also marks a reliability measure and a durability measure.]
Infant mortality is the first phase of the bathtub curve, where the
failure rate is high because of loose early manufacturing tolerances and immature
manufacturing skills. The failure rate decreases through time as the design and the
manufacturing process mature. Useful life is the second phase, characterized by a
relatively constant failure rate. Wear-out is the last phase, where components deteriorate
to such a degree that they reach the end of their useful life. The bathtub failure rate can be
modeled either piecewise or as the sum of three failure-rate functions:

z(t) = z₁(t) for t < a, z₂(t) for a ≤ t ≤ b, z₃(t) for t > b; or z(t) = Σᵢ₌₁³ zᵢ(t).
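The sum-of-three-rates form can be sketched directly: a decreasing infant-mortality term, a constant useful-life term, and an increasing wear-out term together trace the bathtub shape. The parameter values below are illustrative only.

```python
import math

# Bathtub hazard as the sum of three components (the z_i in the text).
def z_infant(t):  return 0.5 * math.exp(-t / 2.0)   # decreasing with time
def z_useful(t):  return 0.02                        # constant
def z_wearout(t): return 0.001 * t                   # increasing with time

def z(t):
    return z_infant(t) + z_useful(t) + z_wearout(t)

# Early, middle, and late failure rates show the characteristic shape:
early, middle, late = z(0.1), z(20.0), z(100.0)
print(early > middle)   # → True: high infant mortality at the start
print(late > middle)    # → True: wear-out dominates at the end
```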
[Figure: the condition variable X(t) alternating between 1 (system up) and 0 (system
down); each up interval has mean length MTTF, each down interval mean length MTTR,
and consecutive failures are separated on average by MTBF.]

If MTTF < ∞, which is what happens in reality, then −[t·R(t)]₀^∞ = 0, and so

MTTF = ∫₀^∞ R(t) dt also. (E)
f. Structure Functions
The system and each component may only be in one of two states,
operable or failed. Let xᵢ indicate the state of component i, for 1 ≤ i ≤ n:

xᵢ = 1 if component i works, and xᵢ = 0 if component i has failed,

where x = (x₁, x₂, ..., xₙ) is the component state vector. The system state is given by the
structure function

Φ = Φ(x) = 1 if the system works, and Φ(x) = 0 if the system has failed.
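The structure function can be written directly from the component state vector. A Python sketch of the two basic layouts; the parallel case is included for contrast, although only the series case is developed in the text below.

```python
from math import prod

# x is the component state vector: x[i] = 1 if component i works, 0 if failed.
def phi_series(x):
    """Series system: works only if every component works."""
    return prod(x)

def phi_parallel(x):
    """Parallel system: works if at least one component works."""
    return 1 - prod(1 - xi for xi in x)

print(phi_series([1, 1, 1]), phi_series([1, 0, 1]))      # → 1 0
print(phi_parallel([0, 0, 1]), phi_parallel([0, 0, 0]))  # → 1 0
```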
A series system with n components works if and only if each of its n
components works, and fails whenever any of its components fails. The structure function
for a series system is

Φ = Φ(x) = x₁ · x₂ · ... · xₙ = ∏ᵢ₌₁ⁿ xᵢ.195

Correspondingly, R(t) = ∏ᵢ₌₁ⁿ rᵢ(t), where R(t) is the
system’s reliability and rᵢ(t) is the ith component’s reliability for a series system.196
g. Series System Reliability Function and MTTF197
From Table 32, the failure rate function for the series system is

z(t) = −(d/dt) ln R(t) = −(d/dt) ln(r₁(t) · r₂(t) · ... · rₙ(t)),

which gives z(t) = z₁(t) + z₂(t) + ... + zₙ(t).

So the failure rate of a series system equals the sum of the failure rates of
all its components. As a result, the failure rate of the system is greater than the failure rate
of any of its components, and the whole system is driven by its worst component, the one
with the largest failure rate or the lowest reliability.
For example, and to simplify, we may assume that each of the components
in our system has an exponential lifetime distribution. Then the system also has an
exponential lifetime distribution. If zᵢ(t) = λᵢ is the failure rate for component i, then the
failure rate for the system is z(t) = λₛ = Σᵢ₌₁ⁿ λᵢ, and the reliability function of the system
becomes R(t) = e^(−λₛ·t). Then (E) becomes MTTFₛ = ∫₀^∞ e^(−λₛ·t) dt = 1/λₛ.
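The exponential series-system result can be checked numerically. In the Python sketch below the component failure rates are assumed illustrative values, not measured SUAV data.

```python
import math

# Series system of exponential components: the system failure rate is the
# sum of the component rates, and MTTF_s = 1/λ_s.
rates = [0.001, 0.004, 0.0002]    # λ_i per flight hour (assumed values)
lam_s = sum(rates)                # λ_s = Σ λ_i
mttf = 1.0 / lam_s                # MTTF_s = 1/λ_s

def R_system(t):
    """R(t) = exp(-λ_s t) for the series system."""
    return math.exp(-lam_s * t)

def R_product(t):
    """The same reliability as the product of component reliabilities."""
    return math.prod(math.exp(-lam * t) for lam in rates)

t = 100.0
print(round(mttf, 1))                            # → 192.3 (flight hours)
print(abs(R_system(t) - R_product(t)) < 1e-12)   # the two forms agree
```

Note how the worst component (λ = 0.004) dominates λₛ, as the text observes.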
Inherent availability: Ai = MTBF/(MTBF + MTTR). Assures operation under declared
conditions in an ideal customer service environment; usually not a field-measured
requirement.

Achieved availability: Aa = MTBM/(MTBM + MTTRactive). Similar to Ai.

Operational availability: Ao = MTBM/(MTBM + MDT). Extends Ai to include delays;
reflects the real-world operating environment; not specified as a manufacturer-
controllable requirement.

MTBF = Mean Time Between Failures
MTTR = Mean Time to Repair
MTBM = Mean Time Between Maintenance
MTTRactive = Mean Time to Repair (corrective maintenance only)
MDT = Mean Downtime

Table 33. The Quantitative Measures of Availability (After RAC Toolkit, page 12)
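The three availability ratios of Table 33 compute directly. In the Python sketch below the MTBF, MTTR, MTBM, and MDT values are illustrative, not field data.

```python
# The three availability measures from Table 33, as simple ratios.
def inherent_availability(mtbf, mttr):
    return mtbf / (mtbf + mttr)

def achieved_availability(mtbm, mttr_active):
    return mtbm / (mtbm + mttr_active)

def operational_availability(mtbm, mdt):
    return mtbm / (mtbm + mdt)

Ai = inherent_availability(mtbf=100.0, mttr=2.0)
Ao = operational_availability(mtbm=80.0, mdt=10.0)
print(round(Ai, 3))  # → 0.98
print(round(Ao, 3))  # → 0.889
print(Ao < Ai)       # → True: delays pull operational availability down
```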
APPENDIX E: LIST OF ACRONYMS AND DEFINITIONS
BS - British Standard
CP - Counter-proliferation
DS - Discard
EO - Electro-Optical
FTA - Fault Tree Analysis
IR - Infrared
LOS - Line-Of-Sight
LU/SV - Lubrication/Servicing
MR - Mishap Rate
MTTR - Mean Time to Repair
PM - Planned Maintenance
RC - Radio Control
RS - Restoration
SIGINT - Signals Intelligence
TR - Tactical Reconnaissance
VR - Vendor Recommendations
WG - Working Group
LIST OF REFERENCES
Ashworth, Peter, LCDR, Royal Australian Navy, Sea Power Centre, Working
Paper No. 6, “UAVs and the Future Navy,” May 2001, Internet, February 2004. Available
at: http://www.navy.gov.au/spc/workingpapers/Working%20Paper%206.pdf
Carmichael, Bruce W., Col (Sel), Troy E. DeVine, Maj, Robert J. Kaufman, Maj,
Patrick E. Pence, Maj, and Richard S. Wilcox, Maj, “Strikestar 2025,” August 1996,
Department of Defense, Internet, February 2004. Available at: http://www.au.af.mil/au/2025/volume3/chap13/v3c13-2.htm
Ciufo, Chris A., “UAVs: New Tools for the Military Toolbox,” COTS
Journal, June 2003, Internet, May 2004. Available at: http://www.cotsjournalonline.com/2003/66
Clade, Lt Col, USAF, “Unmanned Aerial Vehicles: Implications for Military
Operations,” July 2000, Occasional Paper No. 16 Center for Strategy and Technology,
Air War College, Air University, Maxwell Air Force Base.
Clarke, Phill, “Letter to the Editor of New Engineer Magazine regarding Professor
David Sherwin at ICOMS 2000,” question 10, August 2000, Internet, May 2004.
Available at: http://www.assetpartnership.com/downloads.htm
Fei-Bin, Hsiao, Meng-Tse Lee, Wen-Ying Chang, Cheng-Chen Yang, Kuo-Wei
Lin, Yi-Feng Tsai, and Chun-Ron Wy, ICAS 2002, 23rd International Congress of
Aeronautical Sciences, proceedings, Toronto, Canada, 8-13 September 2002, Article:
“The Development of a Low Cost Autonomous UAV System,” Institute of Aeronautics,
National Cheng Kung University, Tainan, Taiwan, ROC.
GlobalSecurity.org, “RQ-3 Dark Star Tier III Minus,” maintained by John Pike,
last modified: November 20, 2002, Internet, May 2004. Available at:
http://www.globalsecurity.org/intell/systems/darkstar.htm
Goebel, Greg, In the Public Domain, “[6.0] US Battlefield UAVs (1),” January 1,
2003, Internet, February 2004. Available at: http://www.vectorsite.net/twuav6.html
Hoivik, Thomas H., OA-4603 Test and Evaluation Lecture Notes, Version 5.5,
“The Role of Test and Evaluation,” presented at NPS, winter quarter 2004.
Hoyland, A., and Rausand, M., System Reliability Theory: Models and Statistics
Methods, New York: John Wiley and Sons, 1994.
Kuo, W., and Zuo, J. M., Optimal Reliability Modeling, John Wiley & Sons,
2003.
February 2004. Available at: http://www.aiaa.org/aerospace/Article.cfm?issuetocid=223&ArchiveIssueID=27
Meeker, Q. W., and Escobar, A. L., Statistical Methods for Reliability Data, John
Wiley & Sons Inc., 1998.
Munro, Cameron and Petter Krus, AIAA’s 1st Technical Conference & Workshop
on Unmanned Aerospace Vehicles, Systems, Technologies and Operations; a Collection
of Technical Papers, AIAA 2002-3451,“A Design Approach for Low cost ‘Expendable’
UAV system,” undated.
Nakata, Dave, White paper, “Can Safe Aircraft and MSG-3 Coexist in an Airline
Maintenance Program?”, Sinex Aviation Technologies, 2002, Internet, May 2004.
Available at: http://www.sinex.com/ products/Infonet/q8.htm
PD-ED-1255, Internet, February 2004. Available at: http://klabs.org/DEI/References/design_guidelines/design_series/1255ksc.pdf
Petrie, G., Geo Informatics, Article “Robotic Aerial Platforms for Remote
Sensing,” Department of Geography & Topographic Science, University of Glasgow,
May 2001, Internet, February 2004. Available at: http://web.geog.gla.ac.uk/~gpetrie/12_17_petrie.pdf
Puscov, Johan, “Flight System Implementation,” Sommaren-Hosten 2002, Royal
Institute of Technology (KTH), Internet, February 2004. Available at:
http://www.particle.kth.se/group_docs/admin/2002/Johan_2t.pdf
Regan, Nancy, RCM Team Leader, Naval Air Warfare Center, Aircraft Division,
“US Naval Aviation Implements RCM,” undated, Internet, February 2004. Available at:
http://www.mt-online.com/articles/0302_navalrcm.cfm
Reliability Analysis Center (RAC), Failure Mode, Effects and Criticality Analysis
(FMECA), 1993.
Reliability Analysis Center (RAC), Fault Tree Analysis (FTA) Application Guide,
1990.
Riebeling, Sandy, Redstone Rocket Article, Volume 51, No.28, “Unmanned Aerial
Vehicles,” July 17, 2002, Col. Burke John, Unmanned Aerial Vehicle Systems project
manager, Internet, February 2004. Available at: http://www.tuav.redstone.army.mil
/rsa_article.htm
Stamatis, D. H., Failure Mode and Effect Analysis: FMEA from Theory to
Execution, American Society for Quality (ASQ), 1995.
Sullivan, Carol, Kellogg, James, Peddicord, Eric, Naval Research Lab, January
2002, Draft of “Initial Sea All Shipboard Experimentation.”
Teets, Edward H., Casey J. Donohue, Ken Underwood, and Jeffrey E. Bauer,
National Aeronautics and Space Administration (NASA), NASA/TM-1998-206541,
“Atmospheric Considerations for UAV Flight Test Planning,” January 1998, Internet,
February 2004. Available at: http://www.dfrc.nasa.gov /DTRS/1998/PDF/H-2220.pdf
Tozer, Tim, David Grace, John Thompson, and Peter Baynham, “UAVs and
HAPs-Potential Convergence for Military Communications,” University of York, DERA
Defford, undated, Internet, February 2004. Available at: http://www.elec.york.ac.uk/comms/papers/tozer00_ieecol.pdf
UAV Rolling News, “New UAV work for Dryden in 2004,” June 12, 2003,
Internet, February 2004. Available at: http://www.uavworld.com/_disc1/00000068.htm
UAV Rolling News, “UAV Roadmap defines reliability objectives,” March 18,
2003, Internet, February 2004. Available at: http://www.uavworld.com/_disc1/0000002
Williams, Warren, and Michael Harris, “The Challenges of Flight-Testing
Unmanned Air Vehicles,” Systems Engineering, Test & Evaluation Conference, Sydney,
Australia, October 2002.
Zachany, Bathon A., Marine Forces Reserve, “Unmanned Aerial Vehicles Help
3/14 Call For and Adjust Fire,” Story ID Number: 2001411104010, April 5, 2001,
Internet, February 2004. Available at: http://www.13meu.usmc.mil/marinelink/mcn2000.nsf/Opendocument.
INITIAL DISTRIBUTION LIST