Establishment of Models and Data Tracking For Small UAV Reliability


NAVAL
POSTGRADUATE
SCHOOL
MONTEREY, CALIFORNIA

THESIS

ESTABLISHMENT OF MODELS AND DATA


TRACKING FOR SMALL UAV RELIABILITY

by

Marinos Dermentzoudis

June 2004

Thesis Advisor: David Olwell


Second Reader: Russell Gottfried

Approved for public release; distribution is unlimited


REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including
the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and
completing and reviewing the collection of information. Send comments regarding this burden estimate or any
other aspect of this collection of information, including suggestions for reducing this burden, to Washington
headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite
1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project
(0704-0188) Washington DC 20503.
1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: June 2004
3. REPORT TYPE AND DATES COVERED: Master’s Thesis
4. TITLE AND SUBTITLE: Establishment of Models and Data Tracking for Small UAV Reliability
5. FUNDING NUMBERS
6. AUTHOR(S): Marinos Dermentzoudis
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Postgraduate School, Monterey, CA 93943-5000
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): N/A
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
12a. DISTRIBUTION / AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited
12b. DISTRIBUTION CODE: A
13. ABSTRACT (maximum 200 words): This thesis surveys existing reliability management and improvement techniques, and describes how they can be applied to small unmanned aerial vehicles (SUAVs). These vehicles are currently unreliable, and lack systems to improve their reliability. Selection of those systems, in turn, drives data collection requirements for SUAVs, which we also present, with proposed solutions. This thesis lays the foundation for a Navy-wide SUAV reliability program.
14. SUBJECT TERMS: reliability improvement, FMECA, FRACAS, reliability growth
15. NUMBER OF PAGES: 247
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: UL
NSN 7540-01-280-5500                                   Standard Form 298 (Rev. 2-89)
                                                       Prescribed by ANSI Std. 239-18

Approved for public release; distribution is unlimited

ESTABLISHMENT OF MODELS AND DATA TRACKING FOR SMALL UAV


RELIABILITY

Marinos Dermentzoudis
Commander, Hellenic Navy
B.S., Naval Academy of Greece, 1986

Submitted in partial fulfillment of the


requirements for the degree of

MASTER OF SCIENCE IN OPERATIONS RESEARCH


MASTER OF SCIENCE IN SYSTEMS ENGINEERING

from the

NAVAL POSTGRADUATE SCHOOL


June 2004

Author: Marinos Dermentzoudis

Approved by: David Olwell


Thesis Advisor

Russell Gottfried
Second Reader

James N. Eagle
Chairman, Department of Operations Research

ABSTRACT

This thesis surveys existing reliability management and improvement techniques,


and describes how they can be applied to small unmanned aerial vehicles (SUAVs).
These vehicles are currently unreliable, and lack systems to improve their reliability.
Selection of those systems, in turn, drives data collection requirements for SUAVs, which
we also present, with proposed solutions.
This thesis lays the foundation for a Navy-wide SUAV reliability program.

TABLE OF CONTENTS

I. INTRODUCTION........................................................................................................1
A. BACKGROUND (UAVS, SUAVS).................................................................1
1. UAV – Small UAV ...............................................................................1
2. The Pioneer RQ-2 ................................................................................3
a. The Predator RQ-1....................................................................4
b. The Global Hawk RQ-4 ............................................................5
c. The Dark Star RQ-3..................................................................5
3. RQ-5 Hunter.........................................................................................6
4. RQ-7 Shadow 200.................................................................................6
5. RQ-8 Fire Scout....................................................................................6
6. Residual UAV Systems ........................................................7
7. Conceptual Research UAV Systems...................................................7
8. DARPA UAV Programs ......................................................................8
9. Other Nations’ UAVs...........................................................9
10. NASA.....................................................................................................9
11. What Is a UAV? .................................................................................10
12. Military UAV Categories ..................................................................11
13. Battlefield UAVs.................................................................................12
a. Story 1. Training at Fort Bragg..............................................12
b. Story 2. Desert Shield/Storm Anecdote ..................13
14. Battlefield Missions............................................................................14
a. Combat Surveillance UAVs ....................................................15
b. Tactical Reconnaissance UAVs..............................................15
B. PROBLEM DEFINITION ............................................................................16
1. UAVs Mishaps....................................................................................16
2. What is the Problem? ........................................................................17
3. What is the Importance of the Problem?.........................................19
4. How Will the Development Teams Solve the Problem without
the Thesis? ..........................................................................................20
5. How Will This Thesis Help?..............................................................20
6. How Will We Know That We Have Succeeded...............................20
7. Improving Reliability.........................................................................20
8. Area of Research ................................................................................21
II. RELATED RESEARCH ...........................................................................................23
A. EXISTING METHODS.................................................................................23
1. General: FMEA, FMECA and FTA.................................................23
a. Introduction to Failure Mode and Effect Analysis
(FMEA) ...................................................................................23
b. Discussion................................................................................24
c. FMEA: General Overview ......................................................26

d. When is the FMEA Started?...................................................26
e. Explanation of the FMEA ......................................................27
f. The Eight Steps Method for Implementing FMEA ...............28
g. FMEA Team............................................................................31
h. Limitations Applying FMEA ..................................................31
i. FMEA Types ...........................................................................32
j. System and Design FMEA......................................................32
k. Analysis of Design FMEA ......................................................33
l. FMEA Conclusion ..................................................................36
m. Other Tools..............................................................................36
2. Manned Aviation Specific: RCM, MSG-3 .......................................42
a. Introduction to RCM...............................................................42
b. The Seven Questions...............................................................43
c. RCM-2 .....................................................................................43
d. SAE STANDARD JA 1011 .....................................................44
e. MSG-3......................................................................................45
f. MSG-3 Revision ......................................................................47
g. General Development of Scheduled Maintenance ................48
h. Divisions of MSG-3 Document...............................................49
i. MSI Selection ..........................................................................49
j. Analysis Procedure .................................................................51
k. Logic Diagram.........................................................................51
l. Procedure ................................................................................55
m. Fault Tolerant Systems Analysis ............................................55
n. Consequences of Failure in the First level ............................56
o. Failure Effect Categories in the First Level ..........................57
p. Task Development in the Second level ...................................58
3. Comparison of Existing Methods .....................................................61
a. RCM.........................................................................................61
b. Conducting RCM Analysis .....................................................62
c. Nuclear Industry & RCM .......................................................62
d. RCM in NAVAIR ....................................................................63
e. RCM in Industries Other Than Aviation and Nuclear
Power .......................................................................................64
f. FMEA and RCM.....................................................................66
g. FMECA ...................................................................................67
h. FTA, FMEA, FMECA ............................................................67
i. FTA..........................................................................................68
j. RCM Revisited.........................................................................68
k. UAVs, SUAVs versus Manned Aircraft .................................70
l. Conclusions-Three Main Considerations about UAV-
RCM.........................................................................................71
B. SMALL UAV RELIABILITY MODELING ..............................................73
1. System’s High Level Functional Architecture ................................73
2. System Overview................................................................................77

3. System Definition ...............................................................................78
4. System Critical Functions Analysis..................................................80
5. System Functions ...............................................................................82
6. Fault Tree Analysis ............................................................................82
7. Loss of Mission ...................................................................................83
8. Loss of Platform .................................................................................85
9. Loss of GCS ........................................................................................87
10. Loss of Platform’s Structural Integrity ...........................................89
11. Loss of Lift ..........................................................................................91
12. Loss of Thrust.....................................................................................93
13. Loss of Platform Control...................................................................95
14. Loss of Platform Position ..................................................................97
15. Loss of Control Channel....................................................................98
16. Engine Control Failure....................................................................100
17. Engine Failure ..................................................................................101
18. Failure of Fuel System .....................................................................103
19. Loss of Platform Power ...................................................................105
20. Loss of GCS Power ..........................................................................107
21. Operator Error.................................................................................109
22. Mechanical Engine Failure .............................................................111
23. Engine Vibrations ............................................................................113
24. Overheating ......................................................................................115
25. Inappropriate Engine Operation....................................................117
26. Follow-on Analysis for the Model...................................................119
27. Criticality Analysis...........................................................................125
28. Interpretation of Results .................................................................131
III. DATA COLLECTION SYSTEMS ......................................................133
A. RELIABILITY GROWTH AND CONTINUOUS IMPROVEMENT
PROCESS .........................................................................................133
1. Failure Reporting and Corrective Action System (FRACAS).....133
a. Failure Observation ..............................................................134
b. Failure Documentation ........................................................135
c. Failure Verification ..............................................................135
d. Failure Isolation ...................................................................135
e. Replacement of Problematic Part(s).....................................135
f. Problematic Part(s) Verification ..........................................135
g. Data Search ...........................................................................135
h. Failure Analysis ....................................................................136
i. Root-Cause Analysis .............................................................136
j. Determine Corrective Action ................................................136
k. Incorporate Corrective Action and Operational
Performance Test ..................................................................136
l. Determine Effectiveness of Corrective Action .....................137
m. Incorporate Corrective Action into All Systems...................137
2. FRACAS Basics................................................................................137
3. FRACAS Forms ...............................................................................140
4. Discussion for the Forms Terms.....................................................141
5. Reliability Growth Testing .............................................................150
6. Reliability Growth Testing Implementation .................................152
B. RELIABILITY IMPROVEMENT PROCESS .........................................152
1. UAVs Considerations ......................................................................152
2. UAVs and Reliability .......................................................................154
a. Pilot Not on Board ................................................................154
b. Weather Considerations........................................................154
c. Gusts and Turbulence...........................................................156
d. Non Developmental Items (NDI) or Commercial Off-the-
shelf (COTS)..........................................................................156
e. Cost Considerations ..............................................................156
f. Man in the Loop....................................................................158
g. Collision Avoidance ..............................................................159
h. Landing..................................................................................159
i. Losing and Regaining Flight Control ..................................160
j. Multiple Platforms Control...................................................160
k. Reliability, Availability, Maintainability of UAVs ...............161
3. Reliability Improvement for Hunter ..............................................162
4. Measures of Performance (MOP) for SUAVs ...............................163
5. Reliability Improvement Program on SUAVs ..............................165
6. Steps for Improving Reliability on SUAVs....................................166
IV. EXAMPLE................................................................................................................171
A. RQ-2 PIONEER 86 THROUGH 95 ...........................................................171
V. CONCLUSION ........................................................................................................179
A. SUMMARY ..................................................................................................179
B. RECOMMENDATIONS FOR FUTURE RESEARCHERS...................181
APPENDIX A: DEFINITION OF FMEA FORM TERMS............................................183
1. First Part of the Analysis of Design FMEA ...................................183
2. The Second Part of the Analysis of Design FMEA .......................183
3. Third Part of the Analysis of Design FMEA .................................188
APPENDIX B: THE MRB PROCESS..............................................................................189
APPENDIX C: FAILURES ...............................................................................................191
1. Functions...........................................................................................191
2. Performance Standards...................................................................191
3. Different Types of Functions...........................................................191
4. Functional Failure............................................................................192
5. Performance Standards and Failures ............................................192
6. Failure Modes...................................................................................194
7. Failure Effects ..................................................................................194
8. Failure Consequences ......................................................................195
APPENDIX D: RELIABILITY .........................................................................................197
1. Introduction to Reliability...............................................................197
2. What is Reliability?..........................................................................198
3. System Approach .............................................................................198
4. Reliability Modeling.........................................................................199
a. System Failures .....................................................................199
b. Independent vs Dependent Failures.....................................200
c. Black-Box Modeling .............................................................200
d. White-Box Modeling .............................................................201
e. Reliability Measures..............................................................202
f. Structure Functions ..............................................................208
g. Series System Reliability Function and MTTF ...................210
h. Quantitative Measures of Availability..................................210
APPENDIX E: LIST OF ACRONYMS AND DEFINITIONS.......................................213
LIST OF REFERENCES ....................................................................................................217
INITIAL DISTRIBUTION LIST .......................................................................................225

LIST OF FIGURES

Figure 1. The Six Failure Patterns...................................................................................40


Figure 2. Systems Powerplant Logic Diagram Part1 (After ATA MSG-3, page 18) .....53
Figure 3. Systems Powerplant Logic Diagram Part2 (After ATA MSG-3, page 20) .....54
Figure 4. High Level Architecture of a SUAV System (After Fei-Bin) .........................74
Figure 5. Simple Block Diagram of a SUAV System.....................................................75
Figure 6. Simple Block Functional Diagram of a SUAV System...................................78
Figure 7. Loss of Mission................................................................................................84
Figure 8. Loss of Platform...............................................................................................86
Figure 9. Loss of GCS.....................................................................................................88
Figure 10. Loss of Structural Integrity ..............................................................................90
Figure 11. Loss of Lift.......................................................................................................92
Figure 12. Loss of Thrust ..................................................................................................94
Figure 13. Loss of Platform’s Control...............................................................................96
Figure 14. Loss of Platform’s Position..............................................................................97
Figure 15. Loss of Control Channel ..................................................................................99
Figure 16. Engine Control Failure...................................................................................100
Figure 17. Engine Failure................................................................................................102
Figure 18. Fuel System Failure .......................................................................................104
Figure 19. Loss of Platform Power .................................................................................106
Figure 20. Loss of GCS Power........................................................................................108
Figure 21. Operator Error................................................................................................110
Figure 22. Mechanical Engine Failure ............................................................................112
Figure 23. Engine Vibrations ..........................................................................................114
Figure 24. Overheating....................................................................................................116
Figure 25. Inappropriate Engine Operation.....................................................................118
Figure 26. Example for Cut Set. (After Kececioglu, page 223)......................................119
Figure 27. Engine Failure Combined Diagram ...............................................................121
Figure 28. Equivalent Diagram .......................................................................................124
Figure 29. Equivalent Block Diagram.............................................................................125
Figure 30. Engine Failure Criticality Matrix. (After RAC FMECA, page 33)................130
Figure 31. Closed-loop for FRACAS (After NASA, PRACAS, page 2)........................134
Figure 32. FRACAS Methodology Checklist page 1/2...................................................138
Figure 33. FRACAS Methodology Checklist page 2/2...................................................139
Figure 34. Duane’s Data Plotted on a Log-log Scale. .....................................................151
Figure 35. Generic Cost Relationship. (After Munro) ....................................................158
Figure 36. Reliability Trade-Offs. (After Sakamoto, slide 8) .........................................162
Figure 37. Reliability Improving Process on SUAVs .....................................................170
Figure 38. Duane’s Regression and Failure Rate versus Time .......................................173
Figure 39. Duane’s Regression and Failure Rate versus Time for 1990 to 1995............175
Figure 40. Duane’s Regression and Failure Rate versus Time for 1986 to 1991............177
Figure 41. Prediction Plot Curve.....................................................................................177

Figure 42. Condition Variable Versus Time.(From Hoyland, page 18)..........................202
Figure 43. Distribution and Probability Density Functions (From Hoyland, page 18)...203
Figure 44. Typical Distribution and Reliability Function ...............................................204
Figure 45. The Bathtub Curve.........................................................................................206
Figure 46. MTTF, MTTR, MTBF. (From Hoyland, page 25) ........................................208

LIST OF TABLES

Table 1. An Example of Design FMEA (From Stamatis, page 131) .............................35


Table 2. Task Selection Criteria (After ATA MSG-3, page 46)....................................60
Table 3. FTA and FMECA/FMEA (After RAC FTA, page 10) ....................................68
Table 4. MSG-3 and FMECA/FMEA............................................................................69
Table 5. Comparing RCM MSG-3 and FMEA/FMECA for SUAVs............................72
Table 6. System’s Essential Functions Analysis............................................................81
Table 7. Cut Set Analysis. (After Kececioglu, page 229)............................................123
Table 8. Classification of Failures According To Severity (After RAC FMECA,
page 26)..........................................................................................................125
Table 9. Classification of Failures According To Occurrence.....................................126
Table 10. Qualitative Occurrence and Severity Table ...................................................129
Table 11. Results from Engine Failure Criticality Analysis. The most critical issues
are highlighted. ..............................................................................................131
Table 12. Initial Failure Report Form ............................................................................144
Table 13. Failure Report Continuation Form.................................................................145
Table 14. Failure Analysis Report Form (From RAC Toolkit, page 290) .....................146
Table 15. Correction Action Verification Report Form.................................................147
Table 16. Tag to Problematic Part Form........................................................................148
Table 17. Failure Log-Sheet...........................................................................................149
Table 18. UAVs FMEA Form (After MIL-STD-1629A, Figure 101.3)........................167
Table 19. RQ-2 Pioneer data..........................................................................................171
Table 20. MR and CMR.................................................................................................171
Table 21. Duane’s Theory Data Analysis ......................................................................172
Table 22. Regression Results .........................................................................................172
Table 23. RQ-2 Pioneer Data, 1990 to 1995..................................................................173
Table 24. Duane’s Theory Data Analysis for 1990 to 1995 ..........................................174
Table 25. Regression Results for 1990 to 1995 .............................................................174
Table 26. RQ-2 Pioneer Data, 1986 to 1991..................................................................175
Table 27. Duane’s Theory Data Analysis for 1986 to 1991 ..........................................176
Table 28. Regression Results for 1986 to 1991 .............................................................176
Table 29. Example of Severity Guideline Table for Design FMEA (After Stamatis,
page 138)........................................................................................................185
Table 30. Example of Occurrence Guideline Table for Design FMEA (After
Stamatis, page 142)........................................................................................186
Table 31. Example of Detection Guideline Table for Design FMEA (After Stamatis,
page 147)........................................................................................................187
Table 32. Relationships Between Functions F(t), R(t), f(t), z(t) (From Hoyland, page
22) 206
Table 33. The Quantitative Measures of Availability (After RAC Toolkit, page 12)....211

ACKNOWLEDGMENTS

The author would like to acknowledge the assistance of the VC-6 Team that
operates the XPV 1B TERN SUAV system for providing valuable insight into their
system.

I also wish to express my appreciation to Professor David Netzer for sponsoring


my trips to Camp Roberts. His financial support was invaluable for gaining knowledge
regarding real UAV operations and the successful completion of this effort.

In addition, I would like to thank the NPS Dudley Knox Library staff for their
high level of professionalism and their continuous support and help during my effort.

To my Professors Thomas Hoivik and Michael McCauley, I would like to express


my gratitude for their support in collecting valuable information for this thesis. I enjoyed
their assistance very much and I was also highly encouraged by Professor McCauley’s
presence at Camp Roberts during my two visits there.

My sincere thanks also go to LCDR Russell Gottfried for his untiring support and
motivation of my research effort.

I could not have accomplished this thesis without the technical help and directions
patiently and expertly provided by my thesis advisor, Professor David Olwell. His
contribution to this thesis was enormous and decisive. I feel that he has also influenced
me to like reliability, which is a new and interesting field for me.

To my one-year-old son, Stephanos, who brought joy to my life, I would like to
express my gratitude because he made me realize through his everyday achievements that
I had to bring this research effort to an end.

Last, but not least, I would like to express my undying and true love to my
beautiful wife, Vana, and dedicate this thesis to her. Without her support, encouragement,
patience, and understanding, I could not have performed this research effort.

EXECUTIVE SUMMARY

Small UAVs will be used with growing frequency in the near future for military
operations. As SUAVs progress from being novelties and toys to becoming full members
of the military arsenal, their reliability and availability must begin to approach the levels
expected of military systems. They currently fall short of those levels by a wide margin.

The military has wide experience with the need for reliability improvement in
systems, and in fact developed or funded the development of many of the methods
discussed in this thesis. These methods have not yet been applied to SUAVs.

The projection of reliability experience from manned aviation to UAVs has led to
overestimation of UAV reliability. Real and urgent operational demands in the Persian
Gulf, Kosovo, and Afghanistan have highlighted the very low reliability of UAVs
compared to manned air vehicles.

To make a decision, one needs analytical support. Analytical support requires


models. Models require good data. Good data requires systems to collect and archive it
for easy retrieval. When I began this thesis, I thought that good data on SUAV reliability
would be easily available for analysis. I was mistaken. That is why the majority of this
thesis has discussed data collection systems and argued that some (but not all) need to be
applied to SUAVs. For ease of implementation, we adapted forms from commercial use
for FMECA and FRACAS systems for SUAVs, and constructed a very detailed FTA for
a typical SUAV. This work is more typical of a reliability engineering thesis, but was
necessary to enable any operational analysis.

With the existing crude data on one UAV system, I was able to perform a crude
analysis using a reliability growth model based on Duane’s postulate. With good data, the
Navy will be able to do much more, as outlined in the thesis.
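As a rough illustration of the kind of analysis just described, the minimal Python sketch below fits Duane's postulate by regressing the logarithm of cumulative failure rate against the logarithm of cumulative flight hours. It is not the thesis's actual computation: the hour and failure counts are hypothetical placeholders (not the RQ-2 Pioneer data of Chapter IV), and NumPy is assumed to be available.

    # Minimal sketch of a Duane-postulate reliability growth fit.
    # Hypothetical placeholder data, not the Pioneer figures analyzed in Chapter IV.
    import numpy as np

    cum_hours = np.array([500.0, 1400.0, 2600.0, 4100.0, 6000.0, 8300.0])  # cumulative flight hours
    cum_failures = np.array([12.0, 25.0, 36.0, 45.0, 52.0, 58.0])          # cumulative failures

    cum_rate = cum_failures / cum_hours            # lambda_c(T) = N(T) / T
    slope, intercept = np.polyfit(np.log10(cum_hours), np.log10(cum_rate), 1)

    alpha = -slope          # growth rate; 0 < alpha < 1 indicates reliability growth
    k = 10.0 ** intercept   # scale parameter in lambda_c(T) = k * T**(-alpha)
    print(f"alpha = {alpha:.3f}, k = {k:.4f}")

    # Projected cumulative MTBF if the fleet accumulates 12,000 total flight hours
    T = 12000.0
    print(f"Projected cumulative MTBF at {T:.0f} hours: {T**alpha / k:.1f} hours")

Under Duane's postulate the cumulative failure rate falls on an approximately straight line on log-log axes, so the fitted slope (the growth rate) summarizes how quickly reliability is improving; this is the kind of quantity the Pioneer example in Chapter IV estimates.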

The DoD Reliability Primer is currently under extensive revision. In the


meantime, this thesis can serve as a survey of the reliability methods that are applicable
to SUAVs and as a template for the implementation of FMECA, FTA, and FRACAS
methods for reliability improvement for SUAVs. As with all surveys, it has depended on
the work of the original authors, which I have borrowed liberally and documented
extensively. The adaptation of these methods for SUAVs is the original contribution of
this thesis.

I observed developmental tests of SUAVs in the course of writing this thesis. I


can personally attest that no appropriate methods of data collection, archival, or analysis
are currently being used, and that these methods are desperately needed by the SUAV
community if it is to progress beyond the novelty stage. I strongly recommend their
adoption by NAVAIR.
This thesis makes an initial examination of the real problem of SUAV reliability.
It is primarily a qualitative approach, which illuminates some aspects of the problem.
Collecting real data from SUAV systems will populate reliability databases.
Quantitative reliability analysis may then follow and result in detailed information about
reliability improvement, but only if the collection systems outlined here are implemented
to provide the data for analysis.

I. INTRODUCTION

A. BACKGROUND (UAVS, SUAVS)


1. UAV – Small UAV
One hundred years after the Wright brothers’ first successful airplane flight,
aircraft have proven invaluable in combat. Unfortunately, airplanes have also
contributed to the loss of operator life. Many pilots have been killed attempting to
accomplish their missions, to become better pilots, and to test new technologies. The
development of uninhabited or unmanned aerial vehicles (UAVs) raises the possibility of
more efficient, secure, and cost-effective military operations.1

The UAV puts eyes out there in places we don’t want to risk
having a manned vehicle operate. Sometimes it’s very dull, but necessary
work—flying a pattern for surveillance or reconnaissance. UAVs can go
into a dirty environment where there’s the threat of exposure to nuclear,
chemical or biological warfare. They are also sent into dangerous
environments—battle zones: Dull, Dirty, Dangerous. The primary reason
for the UAV is the Three D’s.2

The history of UAVs started in 1883 when Douglas Archibald attached an


anemometer to the line of a kite. Archibald managed to obtain differential measures of
wind velocity at altitudes up to 1,200 feet. In 1888, Arthur Batut made the first aerial
photograph in France, after installing a camera on a kite. The first use of UAVs built for
military purposes was by the Germans during WWII. The well-known V-1 and V-2 flying
bombs showed that unmanned aircraft could be launched against targets to destructive
effect. In the 1950s, the US developed the Snark, an unmanned intercontinental-range
aircraft designed to supplement Strategic Air Command’s manned bombers against the
Soviet Union. The Snark, V-1, and V-2 destroyed themselves when they hit their targets;
in fact, these were early versions of today’s cruise and ballistic missiles.3
1 Clade, Lt Col, USAF, “Unmanned Aerial Vehicles: Implications for Military Operations,” July 2000,
Occasional Paper No. 16 Center for Strategy and Technology, Air War College, Air University, Maxwell
Air Force Base.
2 Riebeling, Sandy, Redstone Rocket Article, Volume 51, No.28, “Unmanned Aerial Vehicles,” July
17, 2002, Col. Burke John, Unmanned Aerial Vehicle Systems project manager, Internet, February 2004.
Available at: http://www.tuav.redstone.army.mil/rsa_article.htm
3 Carmichael, Bruce W., Col (Sel), and others, “Strikestar 2025,” Chapter 2, “Historical Development
and Employment,” August 1996, Department of Defense, Internet, February 2004. Available at:
http://www.au.af.mil/au/2025/volume3/chap13/v3c13-2.htm
In the US, the need to perform reconnaissance (RECCE) missions by UAVs came
after the realization that these missions are extremely dangerous and mentally fatiguing
for the pilot. The U-2 Dragon Lady planes were the state-of-the-art platforms for
RECCE missions. They were slow, with a maximum speed of Mach 0.6, and cruised at
altitudes between 70,000 and 90,000 feet.4 In May 1960 the Soviets shot down a U-2.
The pilot, Gary Powers, confessed to the spy plane program, created by President
Eisenhower to monitor the development of Soviet intercontinental ballistic missiles after
the launch of Sputnik-I. The U-2 flights over Russia were suspended, and spy satellites
filled the gap. In 1962, another U-2 was hit by a Soviet anti-air missile while on a RECCE
mission over Cuba, and the pilot was killed in the crash. As a result of these incidents, the
first unmanned RECCE “drone,” the AQM-34 Lightning Bug, was built by the Ryan
Aeronautical Company in 1964. The term “drone” became slang among military
personnel for early unmanned vehicles; it derived from the DH.82B Queen Bee,
a dummy target used for anti-aircraft gunner training.5

The Lightning Bug was based on the earlier Fire Bee. It operated from 1964 until
April 1975, performing a total of 3,435 flight hours in RECCE missions that were too
dangerous for manned aircraft, especially during the Vietnam War. Some of its most
valuable contributions were photographing prisoner camps in Hanoi and Cuba, providing
photographic evidence of SA-2 missiles in North Vietnam, providing low-altitude battle
assessment after B-52 raids, and acting as a tactical air launched decoy.6

In 1962, Lockheed began developing the D-21 supersonic RECCE drone,
the Tagboard. It was designed to be launched from either the back of a two-seat A-12,
which was under development at the same time, or from the wing of a B-52H. The drone
could fly at speeds greater than 3.3 Mach, at altitudes above 90,000 feet and had a range

4 The Global Aircraft Organization, US Reconnaissance, “U-2 Dragon Lady,” Internet, February 2004.
Available at: http://www.globalaircraft.org/planes/u-2_dragon_lady.pl
5 Clark, Richard M., Lt Col, USAF, “Uninhabited Combat Aerial Vehicles, Airpower by the People,
For the People, But Not with the People,” CADRE Paper No. 8, Air University Press, Maxwell Air Force
Base, Alabama, August 2000, Internet, February 2004. Available at: http://www.maxwell.af.mil
/au/aupress/CADRE_Papers/PDF_Bin/clark.pdf
6 Ibid.

of 3,000 miles. The project was canceled in 1971 together with the A-12 development
due to numerous failures, high cost of operations, and bad management.7

In addition to the RECCE role, Teledyne Ryan experimented with strike versions
of the BQM-34 drone, the Tomcat. They investigated the possibility of arming the
Lightning Bug with Maverick electro-optical-seeking missiles or the electro-optically
guided Stubby Hobo bomb. Favorable results were demonstrated in early 1972, but the
armed drones were never used during the Vietnam War. Interest in UAVs faded by the
end of the war.8

In the 1973 Yom Kippur War, the Israelis used UAVs effectively as decoys to
draw antiaircraft fire away from attacking manned aircraft. In 1982, UAVs were used to
obtain the exact location of air defenses and gather electronic intelligence information in
Lebanon and Syria. The Israelis also used UAVs to monitor airfield activities, changing
strike plans accordingly.9
2. The Pioneer RQ-2
The US renewed its interest in UAVs in the late 1980s and early 90s, with the
start of the Gulf War. Instead of developing one from scratch, the US acquired and
improved the Scout, which was used by the Israelis in 1982 against the Syrians. The
outcome was the Pioneer, which was bought by the Navy to provide cheap unmanned
over the horizon targeting (OTHT), RECCE, and battle assessment. The Army and
Marines bought the Pioneer for similar roles and six Pioneer systems were deployed to
SW Asia for Desert Storm.

Compared to the Lightning Bug, the Pioneer is slower, larger, and lighter, but
cheaper. The average cost of the platform was only $850K, which was inexpensive
relative to the cost of a manned RECCE aircraft. 11 With its better sensor technology, the
7 Carmichael.
8 Ibid.
9 Ibid.
10 The material of this section is taken (in some places verbatim) from GlobalSecurity.org, “Pioneer
Short Range (SR) UAV,” maintained by John Pike, last modified: November 20, 2002, Internet, May 2004.
Available at: http://www.globalssecurity.org/intell/systems/pioneer.htm
11 National Air and Space Museum, Smithsonian Institution, “Pioneer RQ-2A,” 1998-2000, revised
9/14/01 Connor R. and Lee R. E., Internet, May 2004. Available at: http://www.nasm.si.edu/research
/aero/aircraft/pioneer.htm
Pioneer can deliver real-time battlefield assessment in video stream, a huge improvement
compared to the film processing required for the Lightning Bugs.

By 2000, after 15 years of operations, the Pioneer had logged more than 20,000
flight hours. Apart from Desert Storm it was used in Desert Shield, in Bosnia, Haiti,
Somalia, and for other peacekeeping missions. The Navy used the Pioneer to monitor the
Kuwait and Iraqi coastline and to provide spotting services for every 16-inch round fired
by its battleships.

Pioneer can give detailed information about a local position to a battalion


commander. Joint force commanders wanted to see a bigger, continuous picture of the
battlefield, but space-based and manned-airborne RECCE platforms could not satisfy
their demand for continuous situational awareness information. In response to that need
and in addition to tactical UAVs (TUAVs) like the Pioneer, the US began to develop a
family of endurance UAVs.

Three different platforms compose the endurance UAV family: Predator, Global
Hawk, and Dark Star.
a. The Predator RQ-1
Predator is a by-product of the CIA-developed Gnat 750, also known as
the Tier II or medium altitude endurance (MAE) UAV. It is manufactured by General
Atomics Aeronautical Systems and costs about $3.2M to $4.5M per platform.13 It was
designed for an endurance of more than 40 hours, a cruising speed of 110 knots, an
operational speed of 75 knots, a 25,000-foot ceiling, and a 450-pound payload, using a
reciprocating engine. Predator can carry electro-optical (EO) and infrared (IR)
sensors. It also collects full-rate video imagery and transmits it in near real-time via
satellite, other UAVs, manned aircraft or line-of-sight (LOS) data link. More importantly,
Predator is highly programmable. It can go from autonomous flight to manual control by
a remote pilot.

12 The material for this section is taken (in some places verbatim) from: Carmichael.
13 Ciufo, Chris A., “UAVs:New Tools for the Military Toolbox,” [66] COTS Journal, June 2003,
Internet, May 2004. Available at: http://www.cotsjournalonline.com/2003/66
Apart from the Pioneer, the Predator is the most tested and most commonly used UAV. It
was first deployed to Bosnia in 1994, next in the Afghan War of 2001, and then in the
Iraq war of 2003.

Used as a low altitude UAV, Predator can perform almost the same tasks as
Pioneer: surveillance, RECCE, combat assessment, force protection, and close air
support. It can also be equipped with two laser-guided Hellfire missiles for direct hits at
moving or stationary targets. During operation Enduring Freedom in Afghanistan,
Predators were considered invaluable to the troops for scouting around the next bend of
the road or over the hill for hidden Taliban forces.

Used as a high altitude UAV, the Predator can perform surveillance over a wide
area for up to 30 to 45 hours. In Operation Iraqi Freedom, Predators were deployed near
Baghdad to attract hostile fire from the city’s anti-air defense systems. Once the locations
of these defense systems were revealed, manned airplanes eliminated the targets.
b. The Global Hawk RQ-4
A Tier II+ aircraft, the Global Hawk is a conventional high-altitude endurance
(CHAE) UAV by Teledyne Ryan Aeronautical. A higher performance vehicle, it was
designed to fulfill a post-Desert Storm requirement for high resolution RECCE of a
40,000 square nautical mile area in 24 hours. It can fly for more than 40 hours and over
3,000 miles away from its launch and recovery base carrying a synthetic aperture radar
(SAR) and an EO/IR payload of 2,000 pounds at altitudes above 60,000 feet at a speed of
340 knots. The cost of a Global Hawk is about $57M per unit.15
c. The Dark Star RQ-3
The Tier III stealth or low observable high altitude endurance (LOHAE)
RQ-3 UAV was the Lockheed-Martin/Boeing Dark Star. Its primary purpose was to
image well-protected, high-value targets. Capable of operating for more than eight hours
at altitudes above 45,000 feet and a distance of 500 miles from its launch base, it was
designed to meet a $10M per platform unit cost. Its first flight occurred in March 1996;

14 The material for this section is taken (in some places verbatim) from: Carmichael.
15 Ciufo.
16 The material for this section is taken (in some places verbatim) from: Carmichael.

however, a second flight in April 1996 crashed due to incorrect aerodynamic modeling of
the vehicle flight-control laws. The project was cancelled in 1999.17

In the designation RQ-3, the "R" is the Department of Defense designation for
reconnaissance and the "Q" means unmanned aircraft system. The "3" refers to its being
the third in a series of purpose-built unmanned reconnaissance aircraft systems.18
3. RQ-5 Hunter
Initially intended to serve as the Army’s short-range UAV system for division and
corps commanders at a cost of $1.2M per unit,20 the RQ-5 Hunter can carry a 200 lb load
for more than 11 hours. It uses an electro-optical/infrared (EO/IR) sensor, and relays its
video images in real time via a second airborne Hunter over a line-of-sight (LOS) data
link. It deployed to Kosovo in 1999 to support NATO operations. Production was
cancelled in 1999 but the remaining low-rate initial production (LRIP) platforms remain
in service for training and experimental purposes. Hunter is to be replaced by the Shadow
200 or RQ-7 tactical UAV (TUAV).
4. RQ-7 Shadow 200
The Army selected the RQ-7 Shadow 200 in December 1999 as the close-range
UAV for support to ground maneuver commanders. It can be launched from a catapult
rail, recovered with the aid of arresting gear, and remain on station for at least four hours
with a payload of 60 lbs.
5. RQ-8 Fire Scout
The RQ-8 Fire Scout is a vertical take-off and landing (VTOL) tactical UAV
(VTUAV). It can remain on station for at least three hours at 110 knots with a payload of
200 lb. Its scouting equipment consists of an EO/IR sensor with an integral laser

17 GlobalSecurity.org, “RQ-3 Dark Star Tier III Minus,” maintained by John Pike, last modified:
November 20, 2002, Internet, May 2004. Available at: http://www.globalsecurity.org
/intell/systems/darkstar.htm
18 Ibid.
19 The material for this section is taken (in some places verbatim) from: Office of the Secretary of
Defense (OSD), “Unmanned Aerial Vehicles Roadmap 2000-2025,” April 2001, page 4.
20 Ciufo.
21 The material for this section is taken (in some places verbatim) from: OSD 2001, page 5.
22 The material for this section is taken (in some places verbatim) from: OSD 2001, page 5.

designator/rangefinder. Data is relayed to its ground or ship control station in real time
over a LOS data link with a UHF backup, and the system can operate from all air-capable ships.
6. Residual UAV Systems
The US military maintains the residual assets of several UAV programs that are no
longer in development but have recently deployed with operational units and trained
operators. The BQM-147 Exdrone is an 80-lb delta-wing communications jammer that
was deployed during the Gulf War. From 1997 to 1998, some were rebuilt, renamed
Dragon Drone, and deployed with Marine Expeditionary Units. Air Force Special
Operations Command and the Army Air Maneuver Battle Lab are also conducting
experiments with Exdrones.

Some hand-launched, battery powered FQM-151 Pointers have been acquired by


the Marines and the Army since 1989 and were employed in the Gulf War. Pointers
performed as test platforms for various miniaturized sensors and have performed
demonstrations with the Drug Enforcement Agency, National Guard and Special
Operations Forces.
7. Conceptual Research UAV Systems
The various service laboratories have developed a number of UAVs to research
special operational needs and concepts. The Marine Corps Warfighting Laboratory is
exploring three such concepts. The Dragon Warrior, or Cypher II, is intended to fly over
the shore in fixed-wing mode and then, after its wings are removed, convert into a
hovering land platform designed for urban operations.

The Marines have converted a K-Max helicopter to a UAV in order to explore Broad
Area Unmanned Responsive Resupply Operations, a concept for ship-to-shore resupply
by UAVs.

Battery-powered Dragon Eye is a mini-UAV (2.4 foot wingspan and 4 lbs)


developed as the Navy’s version for the Over-The-Hill RECCE Initiative and the
Marines’ Interim Small Unit Remote Scouting System requirement. The Dragon Eye can
be carried in a backpack, and hence is known as the Backpack UAV.

23 Ibid, page 6.
24 Ibid, pages 7-8.

Sponsored by the Defense Threat Reduction Agency, the Counterproliferation
(CP) Advance Concept Technology Demonstrations (ACTD) envisions deploying several
mini-UAVs like the Finder from a larger Predator UAV to detect chemical agents and
relay the results back through Predator.

The CP ACTD is designed to address the growing need to provide


a military capability for “precision engagement” of weapons of mass
destruction (WMD) related facilities. In order to accomplish this objective,
the CP ACTD will develop, integrate, demonstrate and transition to the
warfighters, operationally mature technologies that potentially address the
unique requirements to enhance the joint counterforce mission to hold
WMD-related facilities at risk. The driving CP counterforce requirements
include enhancing the ability to predict and to control collateral effects
and to provide prompt response and reliable kill.25

Besides the Dragon Eye and Finder mentioned above, the Naval Research
Laboratory (NRL) has built and flown several small and micro-UAVs. Definitions for
these airframes follow later. The Naval Air Warfare Center Aircraft Division
(NAWC/AD) maintains a small UAV test and development team and also operates
various types of small UAVs.
8. DARPA UAV Programs
The Defense Advanced Research Projects Agency (DARPA) is sponsoring five
major developmental UAV programs:

a. The Air Force X-45 UCAV, which was awarded to Boeing in 1999. The
mission for the UCAV is Suppression of Enemy Air Defenses (SEAD). The platform is
intended to cost one third as much as a Joint Strike Fighter (JSF) to acquire and one
quarter as much to operate and support (O&S). The X-45A, built with radar-absorbing
materials and designed to carry two 500 kg bombs, has a maximum speed of 1000 km/h
and was first flown in June 2002.

b. The UCAV-Navy X-46/X-47 is a similar program for the equivalent


Navy version of a UCAV that can be carrier-based. Apart from SEAD missions, RECCE
and strike will be among the platform’s capabilities. The X-47A Pegasus by Northrop
25 Department of Defense, Director of Operational Test & Evaluation, “Missile Defense and Related
Programs FY 1997 Annual Report,” February 1998, Internet, February 2004. Available at:
http://www.fas.org/spp/starwars/program/dote97/97cp.html
26 The material for this section is taken (in some places verbatim) from: OSD 2001, pages 8-9.

Grumman successfully flew in March 2003 using modified GPS coordinates for
navigation.

c. The Advanced Air Vehicle (AAV) program includes two rotorcraft


projects:

(1) The Dragonfly Canard Rotor/Wing, which will demonstrate
vertical take-off-and-landing (VTOL) capability and then transition to fixed-wing flight
for cruise.

(2) The A160 Hummingbird, which uses a hingeless rigid rotor to


perform high endurance flight of more than 24 hours at a high altitude of more than
30,000 feet.

d. DARPA is exploring various designs of micro-air vehicles (MAVs),


which are less than six inches in any dimension. The Lutronix Kolibri and the Microcraft
Ducted Fan rely on an enclosed rotor for vertical flight, while the Lockheed Martin
Sanders Microstar and the AeroVironment Black Widow and E-Wasp are fixed-wing
horizontal fliers.
9. Other Nations’ UAVs
In FY00, some 32 nations manufactured more than 150 models of UAVs, and 55
countries operated some 80 types, primarily for RECCE missions.

Derivatives of the Israeli designs are the Crecerelle used by the French Army, the
Canadair CL-289 used by the German and French Armies and the British Phoenix. The
Russians use the VR-3 Reys and the Tu-300 and the Italians the Mirach 150.27
10. NASA
In the civilian sector, NASA has been the main agency concerned with
developing medium and high-altitude long endurance UAVs. The agency has been
involved with two main programs “Mission to Planet Earth” and “Earth Science
Enterprise” for environmental monitoring of the effects of global climatic change. During
the late 80s, NASA started to operate high-altitude manned aircraft, but later decided to
develop a UAV for high-altitude operations. NASA constructed the propeller driven
27 Petrie, G., Geo Informatics, Article “Robotic Aerial Platforms for Remote Sensing,” Department of
Geography &Topographic Science, University of Glasgow, May 2001, Internet, February 2004. Available
at: http://web.geog.gla.ac.uk/~gpetrie/12_17_petrie.pdf
Perseus between 1991 and 1994 and Theseus, which was a larger version of Perseus, in
1996.

In 1994 NASA started its Environmental Research Aircraft and Sensor


Technology (ERAST) program. As a result, NASA has operated the Altus and Altus II
since 1998. Their operating ceilings are 45,000 to 65,000 feet using turbocharged
engines.

The development of solar-powered UAVs is also being supported and funded by
NASA. The idea, development, and construction were initiated by the AeroVironment
company, which has been involved in the construction of solar-powered aircraft for 20
years. Solar Challenger, HALSOL, Talon, Pathfinder, Centurion, and Helios, with a
wingspan of 247 feet, were among the solar-powered aircraft produced in those efforts.28

New technologies such as regenerative fuel cells are under development. These
would allow UAVs to fly for weeks or months, reducing mission costs so as to deliver a
maximum return on investment per flight. NASA will also support the development of
such technology.29
11. What Is a UAV?30
The distinction between cruise missile weapons and UAV weapon systems is
sometimes confusing. Their main differences are:

a. UAVs are designed to be recovered at the end of their flight while cruise
missiles are not.

b. A warhead is tailored and integrated into a missile’s airframe while any


munitions carried by UAVs are external loads.

According to the DoD Dictionary (Joint Publication 1-02), a UAV is

A powered, aerial vehicle that does not carry a human operator,


uses aerodynamic forces to provide vehicle lift, can fly autonomously or
be piloted remotely, can be expendable or recoverable, and can carry a
28 Ibid.
29 UAV Rolling News, “New UAV work for Dryden in 2004,” June 12, 2003, Internet, February 2004.
Available at: http://www.uavworld.com/_disc1/00000068.htm
30 The material for this section is taken (in some places verbatim) from: Office of the Secretary of
Defense (OSD), “Unmanned Aerial Vehicles Roadmap 2002-2027,” December 2002, Section 1,
“Introduction.”
lethal or non-lethal payload. Ballistic or semi ballistic vehicles, cruise
missiles, and artillery projectiles are not considered unmanned aerial
vehicles.31

12. Military UAV Categories


UAVs can be classified according to different criteria such as mission type, sensor
type, performance, and control system. Remote Piloted Vehicles (RPVs) and autonomous
UAVs are two distinct groups based on their different control systems. They have many
common features but the main difference is that an RPV follows the data-link commands
of a remote station for the specific air mission. In other words, it is a “dumb” vehicle,
which can carry sensors and relay data. UAVs can be further classified according to their
mission as Reconnaissance Surveillance and Target Acquisition (RSTA) UAVs, Combat
UAVs (UCAVs), and others. According to the way they are launched, they can be
classified as hand-launched, rail-launched, rocket-launched and airfield-launched.

We also classify military UAVs in three main categories, considering their ceiling
as the driving characteristic: Tactical UAVs (TUAVs), Medium-Altitude Endurance
UAVs (MAE UAVs), and High-Altitude Endurance UAVs (HAE UAVs).32

a. Tier I or TUAVs are inexpensive with an average cost of 100K$FY00,


with a limited payload of around 50 kg, a range limited to line of sight (LOS) of the ground control
station, and endurance of approximately four hours. In general, they are rather small with
an average length of two meters and their maximum ceiling is around 5,000 feet. Pioneer
is a typical example. This category is also referred to as “Battlefield UAVs” and can be
divided in three subcategories:

(1) Micro UAVs (MUAVs) are very small UAVs in sizes 6 to 12


inches.33 The Aerovironment Wasp is an example of this category.34

31 Ibid.
32 Tozer, Tim, and others, “UAVs and HAPs-Potential Convergence for Military Communications,”
University of York, DERA Defford, undated, Internet, February 2004. Available at: http://www.elec.york
.ac.uk/comms/papers/tozer00_ieecol.pdf
33 Pike, John, Intelligence Resource Program, “Unmanned Aerial Vehicles (UAVS),” Internet, March
2004. Available at: http://www.fas.org/irp/program/collect/uav
34 The material for this part of section is taken (in some places verbatim) from: OSD 2002, Section2,
“Current UAV programs.”
(2) Mini UAVs have a span up to four feet. They provide the
company/platoon/squad level with an organic RSTA capability out to 10 km. The
Aerovironment Dragon Eye is an example of this category.35

(3) Small UAVs (SUAVs) are greater than four feet in
length. "SUAV is a low-cost and user-friendly UAV system." It is a highly mobile air
vehicle system that, among other capabilities, allows the small warfighting unit to set the
foundation to exploit battlefield information superiority.36

b. Tier II or MAE UAVs are larger than TUAVs, more expensive, with an
average cost of 1M$FY00, and have enhanced performance. Their payload can reach 300
kg, their endurance is 12 or more hours, and their ceiling is up to 20,000 feet. Predator is
a typical example of a MAE UAV.

c. Tier II Plus or HAE UAVs can be large craft with an endurance of more
than 24 hours, payload capacities of more than 800 kg, and a ceiling of more than 30,000
feet. Their average cost is about 10M$FY00. Global Hawk is a typical example of HAE
UAV.

d. Tier III Minus or LOHAE UAVs can be large craft with an endurance
of more than 12 hours, payload capacities of more than 300 kg, and a ceiling of more
than 65,000 feet. Dark Star was a typical example of LOHAE UAV.
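
To make the ceiling-driven grouping above concrete, the short sketch below encodes rough tier cut-offs taken loosely from the figures in this section; the thresholds and the function name are illustrative assumptions, not official definitions.

```python
# Rough tier classification by operating ceiling, the driving characteristic
# described above.  The thresholds approximate the figures in the text and
# are not official definitions.
def uav_tier(ceiling_ft):
    if ceiling_ft <= 5_000:
        return "Tier I (TUAV / battlefield UAV)"
    elif ceiling_ft <= 20_000:
        return "Tier II (MAE UAV)"
    elif ceiling_ft <= 65_000:
        return "Tier II Plus (HAE UAV)"
    else:
        return "Tier III Minus (LOHAE UAV)"

print(uav_tier(4_000))    # Pioneer-class ceiling
print(uav_tier(15_000))   # Predator-class ceiling
print(uav_tier(60_000))   # Global Hawk-class ceiling
```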
13. Battlefield UAVs
Here are two descriptions of the use of UAVs in training and combat.
a. Story 1. Training at Fort Bragg
“FDC this is FO adjust fire, over”. “FO this is FDC adjust fire,
out”. “FDC grid 304765, over”. “FO grid 304765, out”. “FDC two tanks
in the open, over”. “FO that’s two tanks in the open, out”. Then about 30
seconds later, “FO shot, over”. “FDC shot, out”. “FO splash, over”. “FDC
splash, out”. Fort Bragg, N.C. (April 5, 2001).

Communications like these can normally be heard during a live-fire


training exercise between the forward observers (FO) and the Marines at
the fire direction control centre (FDC), but during exercise Rolling

35 Ibid.
36 NAVAIR, “Small Unmanned Aerial Vehicles,” undated, Internet, February 2004. Available at:
http://uav.navair.navy.mil/smuav/smuav_home.htm
Thunder, the 3rd Battalion, 14th Marines used a different type of forward
observer.

Instead of a few Marines dug in a forward position, a UAV


controlled by the Marines from the Marine fixed-Wing Unmanned Vehicle
Squadron 2 (VMMU-2), Cherry Point, N.C., gave the calls for fire.

The UAV is a remote-controlled, single-propeller plane with a


wing span of 17 feet and an overall length of 14 feet. Inside the body of
the plane is a camera that allows the pilots to see and to identify targets,
according to Cpl. Tim Humbert, team non-commissioned officer, VMU-2.

“This was an excellent training opportunity for us,” said Capt.


Konstantine Zoganas, battalion fire direction officer, 3rd Bn., 14th Marines,
Philadelphia, Pa. “There aren’t many units who get the opportunity to train
with this equipment.”

For this mission, the UAV, which was flying at around 6,000 to
8,000 feet, was used to identify targets. They then looked at that data and
turned it into a fire mission, which was sent to the Marines on the gun line.
Once the Marines on the gun line blasted their round toward the target, the
UAV was used to adjust fire. “After using the UAV, I think it is equal to,
if not better than, a forward observer,” said Zoganas. “A forward observer
has a limited view depending on where he is at, but a UAV, being in the
air, has the ability to cover a lot more area,” said Zoganas. “I think the
UAV’s capabilities are underestimated, it is a great weapon to have on the
modern battlefield.”37

b. Story 2. Desert Shield/Storm Anecdote


Surrenders of Iraqi troops to an unmanned aerial vehicle actually
happened. All of the UAV units at various times had individuals or groups
attempt to signal the Pioneer, possibly to indicate a willingness to
surrender. However, the most famous incident occurred when USS
Missouri (BB 63), using her Pioneer to spot 16-inch gunfire, devastated
the defences of Faylaka Island off the coast of Kuwait City. Shortly
thereafter, while still over the horizon and invisible to the defenders, the
USS Wisconsin (BB 64) sent her Pioneer over the island at low altitude.
When the UAV came over the island, the defenders heard the obnoxious
sound of the two-stroke engine since the air vehicle was intentionally
flown low to let the Iraqis know that they were being targeted.
Recognizing that with the “vulture” overhead, there would soon be more
of those 2,000-pound naval gunfire rounds landing on their positions with
the same accuracy, the Iraqis made the right choice and, using

37 Zachany, Bathon A., Marine Forces Reserve, “Unmanned Aerial Vehicles Help 3/14 Call For and
Adjust Fire,” Story ID Number: 2001411104010, April 5, 2001, Internet, February 2004. Available at:
http://www.13meu.usmc.mil/marinelink/mcn2000.nsf/Open document
handkerchiefs, undershirts, and bed sheets, they signalled their desire to
surrender. Imagine the consternation of the Pioneer aircrew who called the
commanding officer of Wisconsin and asked plaintively, “Sir, they want to
surrender, what should I do with them?”38

14. Battlefield Missions


Reconnaissance is a “mission undertaken to obtain, by visual or other detection
methods, information about the activities and resources of an enemy; or to secure data
concerning the meteorological, hydrographic, or geographic characteristics of a particular
area.” This task is about gathering general information about an enemy or an area.
Surveillance is the “specific and systematic observation of a particular area or target for a
short or extended period of time.”39

UAVs have been used for the above missions since their inception. They can also
be used for target acquisition, target designation and battle damage assessment (BDA).
Due to their small size, they can operate more discreetly than their manned counterparts,
allowing target acquisition to occur with less chance of counter-detection. “The
surveillance UAV can be used to designate the target for a precision air and/or artillery or
missile strike while providing near real-time battle damage assessment to the force or
mission commander.”40 In that way, needless repeat attacks on a target, and the attendant
waste of munitions, can be avoided.

Battlefield UAVs are appropriate for all of the above missions. In the early
1950s, UAVs like the Northrop Falconer were developed for battlefield reconnaissance
but saw little or no combat service. Later, the Israelis pioneered the operational use of
battlefield UAVs in the early 1980s in southern Lebanon operations. Their successes with
battlefield UAVs drew international attention.41

38 The Warfighter’s Encyclopedia, Aircraft, UAVs, “RQ-2 Pioneer,” August 14, 2003, Internet,
February 2004. Available at: http://www.wrc.chinalake.navy.mil/warfighter_enc/aircraft/UAVs/pioneer
.htm
39 Ashworth, Peter, LCDR, Royal Australian Navy, Sea Power Centre, Working Paper No6, “UAVs
and the Future Navy”, May 2001, Internet, February 2004. Available at: http://www.navy.gov.au
/spc/workingpapers/Working%20Paper%206.pdf
40 The material for the above part of section is taken (in some places verbatim) from: Ashworth.
41 Goebel, Greg,/ In the Public Domain, “[6.0] US Battlefield UAVs (1),” Jan 1, 2003, Internet,
February 2004. Available at: http://www.vectorsite.net/twuav6.html
We can distinguish two broad categories of battlefield UAVs: the “combat
surveillance” UAV and the “tactical reconnaissance” UAV.
a. Combat Surveillance UAVs42
The function of combat surveillance UAVs is to observe everything on a
battlefield in real-time, flying over the battle area, and relaying intelligence to a ground-
control station. In general, they are powered by a small internal combustion two-stroke
piston engine, known as a “chain saw” because of its characteristic noise. An autopilot
system with a radio control (RC) backup for manual operation directs the platform
through sets of waypoints programmed before takeoff. In most cases, the program is set up by
displaying a map on a workstation, entering the coordinates, and downloading the
program into the UAV. Navigation is always verified by GPS and often by an INS
as well.
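
As a sketch of the pre-takeoff mission programming just described, the following shows one way a waypoint program might be represented before being downloaded to the vehicle; the coordinates, altitudes, and field layout are invented for illustration and do not correspond to any particular autopilot's format.

```python
# Minimal sketch of a pre-programmed waypoint plan for a combat surveillance UAV.
# Field names and values are illustrative only; real autopilots define their own formats.
waypoint_plan = [
    # (latitude_deg, longitude_deg, altitude_ft, loiter_min)
    (35.7170, -120.7630, 3000, 0),   # climb-out point after launch
    (35.7420, -120.7890, 5000, 0),   # transit waypoint
    (35.7655, -120.8020, 5000, 20),  # on-station loiter over the operating area
    (35.7170, -120.7630, 1500, 0),   # return for recovery
]

def total_loiter_minutes(plan):
    """Sum the planned loiter time across all waypoints."""
    return sum(wp[3] for wp in plan)

print("Planned loiter time:", total_loiter_minutes(waypoint_plan), "minutes")
```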

Combat surveillance UAVs normally use the autopilot to get on station


(above the operating area) and then operate in manual mode by RC to find or detect
potential targets. As a result, only LOS ranges are permitted, due to the limitations of the
RC transmitter signals.

Sensors are generally housed in a turret underneath the platform and/or are
integrated into the platform’s fuselage. They usually feature day-night imagers and in
many cases a laser designator, SIGINT packages, or Synthetic Aperture Radar (SAR).

Larger UAVs have fixed landing gear that are used for takeoff and landing
purposes on small airstrips. Larger UAVs can also be launched by special rail launcher
boosters and recovered by parachute, parasail, or by flying into a net. Smaller UAVs may
be launched by a catapult and recovered in the same way, or by landing on open terrain
without any use of landing gear.
b. Tactical Reconnaissance UAVs43
Tactical Reconnaissance (TR) UAVs are usually larger and in some cases
jet powered with extended range and speed. Like the combat surveillance UAVs, they are
equipped with an autopilot with RC backup. Their primary mission is to fly over

42 The material for this section is taken (in some places verbatim) from: Goebel.
43 Ibid.

predefined targets out of line of sight, and take pictures or relay near real-time data to the
ground-control station via satellite links.

A UAV of this type can usually carry day-night cameras and/or Synthetic
Aperture Radar (SAR). The necessary communication equipment is usually located on
the upper part of the platform’s fuselage. A TR UAV can also be launched from runways
or small airstrips, an aircraft, and/or by special rail launcher boosters, and be recovered
by parachute.

The exact distinction between the two types of battlefield UAVs and other
types of UAVs is not clear. Some types are capable of both missions. A small combat
surveillance UAV may be the size of “a large hobbyist RC model plane.” It can be “used
to support military forces at the brigade or battalion level and sometimes they are called
‘mini UAVs.’ Their low cost makes them suitable for ‘expendable’ missions.”

B. PROBLEM DEFINITION
1. UAV Mishaps
According to the Office of the Secretary of Defense “UAV Roadmap,” the mishap rate
for UAVs is difficult to define:

Class A mishap rate (MR) is the number of significant vehicle


damages or total losses occurring per 100,000 hours of fleet flight time. As
no single U.S. UAV fleet has accumulated this amount of flying time,
each fleet’s MR represents its extrapolated losses to the 100,000-hour
mark. It is expressed as mishaps per 100,000 hours. It is important to note
that this extrapolation does not reflect improvements that should result
from operational learning or improvement in component technology.44

A Pentagon report said that crashes and component failures are
increasing the cost of UAVs and restricting their availability for military
operations.45

44 OSD 2002, Appendix J, page 186.


45 Peck, Michael, National Defense Magazine, May 2003, Feature Article, “Pentagon Unhappy About
Drone Aircraft Reliability, Rising Mishap Rates of Unmanned Vehicles Attributed to Rushed
Deployments,” Internet, February 2004. Available at: http://www.nationaldefensemagazine.org/article.
cfm?Id=1105
The reliability issue has sparked controversy and concern that UAVs are
becoming too expensive. There is a widespread notion that UAVs are simply cheap,
expendable vehicles, something like diapers that are used once and discarded.
The truth is that they are costly components of expensive systems.

To get a view of the problem, we see that the 2002 crash rate for Predator was
32.8 crashes per 100,000 flight hours, and for 2003 it was 49.6 through May. The accident
rate for the Global Hawk was 167.7 per 100,000 flight hours as of May 2003.46
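
The extrapolation behind such figures is straightforward: divide the number of Class A mishaps by the fleet's flight hours and scale to 100,000 hours. A minimal sketch follows, using invented numbers rather than the official data quoted above.

```python
# Sketch of the Class A mishap-rate extrapolation described above:
#   MR = (Class A mishaps / fleet flight hours) * 100,000.
# The fleet and figures below are illustrative assumptions, not official data.
def class_a_mishap_rate(mishaps, fleet_flight_hours):
    """Mishaps per 100,000 fleet flight hours."""
    return mishaps / fleet_flight_hours * 100_000

# Example: a hypothetical SUAV fleet with 12 Class A mishaps in 4,800 flight hours.
print(round(class_a_mishap_rate(12, 4_800), 1))   # 250.0 mishaps per 100,000 hours
```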

Nevertheless, commanders can take greater risks with UAVs without worrying
about loss of life. These risks would not be taken with manned aircraft. For example, the
recently updated MR for the F-16 was 3.5 per 100,000 flight hours. According to DoD
data, the MR for the RQ-2A Pioneer was 363, while the MR for the RQ-2B dropped to
139. For the RQ-5 Hunter it was 255 for pre-1996 platforms, and has dropped to 16 since
then. For the Predator RQ-1A it was 43, and for the RQ-1B it was 31.47
2. What is the Problem?
Currently a network experiment series named Surveillance and Tactical
Acquisition Network (STAN) is being conducted by the Naval Postgraduate School
(NPS) at Camp Roberts, with SUAVs as the sensor platforms and the primary source of
information. SUAV programs are currently of great interest to the Fleet, Special Forces,
and other interested parties and are receiving large amounts of funding. There is a great
deal of concern about the reliability of SUAVs because a lot of problems have emerged
in testing. Reliability must be improved.

This thesis documents these problems. At the CIRPAS site at McMillan Field in
Camp Roberts on September 11 and 12, 2003, I observed flight, communication, search
and detection, and target acquisition tests, using two different types of SUAV platforms,
XPV-1B TERN and Silverfox, an experimental program funded by the Office of Naval
Research. Incidents regarding reliability that occurred during that time include:

46 Peck, Michael, National Defense Magazine, May 2003, Feature Article, “Pentagon Unhappy About
Drone Aircraft Reliability, Rising Mishap Rates of Unmanned Vehicles Attributed to Rushed
Deployments,” page 1, Internet, February 2004. Available at: http://www.nationaldefensemagazine.org
/article.cfm?Id=1105
47 Peck.

a. During the pre-takeoff checks at the runway end, an engine air-intake
filter failed (a support lock-wire hole broke), apparently because of engine vibration.
There was no spare filter or any other means to repair the failure,
so it was replaced with the air filter from another TERN platform.

Result: the mission was delayed for thirty minutes.

b. During the engine start procedure, a starting device failed. The failure
was due to a loose bolt, and the starting device could not start the engine. After a ten-
minute delay, the bolt was tightened.

Result: the procedure was delayed for ten minutes.

c. After two and a half hours of flight operation on a TERN platform, the
engine stalled in flight at 500 feet; the SUAV had run out of fuel.

Result: loss of one TERN platform.

d. At the pre-takeoff checks on a Silverfox platform, recalibration of the
engine's rpm was necessary (probably because this was the initial flight after
the old engines had been replaced with new ones).

Result: five-minute delay.

e. During operations on Silverfox platforms, many bad sensor signals
were received (especially from the CCD camera), probably due to the ground-control
station antennas or to LOS constraints.

Result: missions lost their search and detection capability.

f. After Silverfox landings (calculated crashes) in the field (not on a
runway), extensive cleaning of the interior of the platform was needed because weeds,
soil, and debris entered the vehicle through the front engine opening.

Result: at least twenty minutes of cleaning was needed after such landings.

The next step of the STAN experiment took place at the CIRPAS site at McMillan Field in
Camp Roberts from May 2 to May 6, 2004. I observed flight, communication, search and
detection, and networking tests, using the XPV-1B TERN on May 2 and 3. Incidents
regarding reliability that occurred during that time include:
a. During the assembly checks in the hangar on May 2, a major software
problem was detected. The team members could not repair it.

Result: the platform was unable to operate at all.

b. During the test flight of the next platform the same day, the
engine stalled at 1,000 feet, leading to a platform crash.

Result: loss of platform.

c. On May 3, after one hour of flight operation on the third platform, an
in-flight autopilot software malfunction occurred that led to an automatic hard landing
on the ground.48

Result: loss of one more TERN platform.

d. On May 5, during the landing of the next TERN platform after two hours of flight
operation, the front tire delaminated.49 The damage, probably due to operator error,
could not be repaired by the team members.

Result: loss of platform.

e. On May 6, after one hour of flight operation, an in-flight right-
wing servo failure occurred that resulted in loss of platform control and then a platform
crash.50

Result: loss of platform.
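
Incidents such as these are exactly what a SUAV reliability tracking system needs to capture in a consistent, analyzable form. The sketch below shows one possible minimal failure-record layout; the field names are hypothetical rather than an existing NAVAIR or FRACAS schema, and the single record shown is transcribed from the May 3 incident above.

```python
# Minimal sketch of a structured failure record for SUAV incidents.
# Field names are hypothetical; a fielded FRACAS would define its own schema.
from dataclasses import dataclass

@dataclass
class FailureRecord:
    date: str            # e.g. "2004-05-03"
    platform: str        # e.g. "XPV-1B TERN"
    flight_hours: float  # hours flown when the failure occurred
    phase: str           # "pre-takeoff", "takeoff", "in-flight", "landing"
    failure_mode: str    # e.g. "autopilot software malfunction"
    effect: str          # e.g. "hard landing, platform lost"
    corrective_action: str

records = [
    FailureRecord("2004-05-03", "XPV-1B TERN", 1.0, "in-flight",
                  "autopilot software malfunction", "hard landing, platform lost",
                  "none available to field team"),
]

# Simple roll-up: count recorded failures by flight phase.
by_phase = {}
for r in records:
    by_phase[r.phase] = by_phase.get(r.phase, 0) + 1
print(by_phase)
```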


3. What is the Importance of the Problem?
It is most notable that SUAVs are not technologically sophisticated enough to
warn the operator that the vehicle is under attack and/or experiencing a critical failure
(such as fuel exhaustion), cannot operate in unfavorable weather conditions, and have a low level of
reliability, which degrades their role in military operations. Even though SUAVs cost
very little compared to other systems, such as observers, helicopters, planes and satellites,
it is essential that small UAV missions be carried out with an acceptable level of

48 Gottfried, Russell, LCDR (USN), Unmanned Vehicle Integration TACMEMO, 5-6 May Recap, e-
mail May 7, 2004.
49 Ibid.
50 Ibid.

reliability, operability, and reusability. In that way, they can become dependable systems
and be used on the battlefield alongside other systems.
4. How Will the Development Teams Solve the Problem without the
Thesis?
Trial and error and/or test, analyze, and fix (TAAF) are the methods being used to
overcome failures in the Silverfox system. For a system in the experimental phase, this is the
easiest but most time-consuming approach.

For the other system (TERN) that has been operational for almost two years, an
extended trial period is presently being conducted. From it, conclusions can be made for
future system improvements and operational usage. Other experimental systems can also
contribute to quantitative assessments of readiness and availability.
5. How Will This Thesis Help?
This thesis provides a tool for considering reliability issues by developing a system
for tracking data that could improve reliability for SUAV systems.
6. How Will We Know That We Have Succeeded?
Verification and validation of the proposed solutions and methods by NAVAIR
and the other interested parties will indicate the accuracy and the effectiveness of the
framework suggested by this thesis.
7. Improving Reliability
UAV reliability is the main issue preventing the FAA from relaxing its
restrictions on UAVs flying in civilian airspace and preventing foreign governments from
allowing overflight and landing rights. Improved reliability, or simply knowing actual mishap rates
and causes, will enable risk mitigation and eventual flight clearance.

Efforts toward improving UAV reliability are required, but how can this best be
accomplished? The answer is by spending money, but we can be more specific. More
redundancy of flight control-systems may increase reliability, but there is another trade
off. The absence of components needed for manned aircraft makes UAVs cheaper, but
this also degrades their reliability. If reliability is sacrificed, then high attrition will
increase the number of UAVs needed and so the cost will rise again.

By focusing on flight control systems, propulsion, and operator training, which
account for approximately 80% of UAV mishaps, we can increase reliability.51
Redundancy in on-board systems is not easily added, especially to small UAVs. Weight
and volume restrictions are very tight and that can lead to expensive solutions. But then if
we make UAVs too expensive, we cannot afford to lose them.

We can categorize UAVs by their volume, by their usage, by their endurance or


by their capabilities and type of operations, but we can also view each UAV system as a
unique case. We can analyze the system according to its functional components and do a
Failure Mode and Effect Analysis (FMEA). That is the first step for further
implementation of a reliability tracking and improvement method such as FRACAS,
Failure Mode Effect and Criticality Analysis (FMECA) or even an implementation of
MSG-3, if it is more suitable. I discuss these methods in detail later.

Reliability by itself is a measure of effectiveness (MOE). In order to keep track of
reliability, I develop some measures of performance (MOPs); by using them, we can
determine the results of our reliability corrective actions, if any. We can also keep track
of our system’s ability to be maintained, and if we consider the operational requirements
and logistic data, then we can evaluate its availability as well. Definitions and a
discussion of reliability are included in Appendix D.
8. Area of Research
This study provides a basis for conducting reliability tracking for SUAVs and for
improving techniques and methodologies that increase SUAV readiness. To achieve this,
existing methodologies for controlling reliability, FMEA and reliability centered
maintenance (RCM) with Maintenance Steering Group-3 (MSG-3), are analyzed and
compared. Finally, a criticality analysis provides a method for SUAV operators to
account for and to mitigate risk during operations.

51 Peck.

II. RELATED RESEARCH

A. EXISTING METHODS
The following section presents and analyzes existing general methods of failure
tracking and analysis, as well as the existing reliability centered maintenance method that
has been used by the civil aviation industry. A comparison between them, focusing on
small UAV (SUAV) application, is also presented.
1. General: FMEA, FMECA and FTA
a. Introduction to Failure Mode and Effect Analysis (FMEA)
Well-managed companies are interested in preventing or at least
minimizing risk in their operations, through risk management analysis. “The risk analysis
has a fundamental purpose of answering the following two questions:

• What can go wrong?

• If something does go wrong, what is the probability of it happening and


what are the consequences?”52

To answer these questions, forensic techniques were previously used.
Today the focus has changed: “The focus is on prevention.”53

FMEA is one of the first systematic techniques for failure analysis. “An
FMEA is often the first step of a system’s reliability study.”54 It incorporates reviewing
components, assemblies and subsystems to identify failure modes, causes and effects of
such failures. FMEA is a systematic method of identifying and preventing product and
process failures before they occur. It is focused on preventing defects, enhancing safety,
and increasing customer satisfaction.

52 Stamatis, D. H., Failure Mode and Effect Analysis: FMEA from Theory to Execution, American
Society for Quality (ASQ), 1995, page xx. The above part of section is a summary and paraphrase (in some
places verbatim) of “Introduction.”
53 Ibid, page xxi.
54 Hoyland, A., and Rausand, M., System Reliability Theory: Models and Statistics Methods, New
York: John Wiley and Sons, 1994, page 73.
The purpose of FMEA is preventing process and product problems before
they occur.55 Used in the design and manufacturing process, FMEAs reduce cost and
effort by identifying product and process improvements early in the development phase,
when it is easier, faster, and less costly to make changes. Formal FMEAs were first
conducted in the aerospace industry in the mid-1960s, when looking at safety issues.
Industry in general (automotive particularly) adapted the FMEA for use as a quality
improvement tool.

“FMEA is a specific methodology to evaluate a system, design, process or


service, for possible ways in which failures (problems, errors, risks, and concerns) can
occur.”56 For each of the failures identified, an estimate is made for its occurrence,
severity, and detection. Then an evaluation is made for the necessary action to be taken,
planned, or ignored. The effort focuses on minimizing the probability of failure or the
effect of failure. This approach can be technical or nontechnical. Technical is the
quantitative way, in other words, the way in which we determine, express, and measure
the quantity of something. Nontechnical is the qualitative way, which is relative to, or
involves the quality of something. For both ways, the focus is on the risk one is willing to
take. In that way, FMEA becomes a systematic technique using engineering knowledge,
reliability, and organizational development techniques.57
b. Discussion
FMEA, as a qualitative analysis, is better carried out during the design
stages of the system. “The purpose is to identify design areas where improvements are
needed to meet reliability requirements.”58 It provides an important basis for design
reviews and inspections. It can be carried out using the bottom-up or the top-down
approach. With the bottom-up approach or hardware approach, FMEA starts at the
component level and expands upward. When the expansion is from the system level
downwards, then the top-down or functional approach is being used. Most FMEAs are

55 McDermott, E. R., Mikulak, J. R, and Beauregard, R. M., The Basics of FMEA, Productivity Inc.,
1996, page 4.
56 Stamatis, page xxi.
57 Stamatis, page xxii. The above part of section is a summary and paraphrase (in some places
verbatim) of “Introduction.”
58 Hoyland, page 74.

carried out according to the bottom-up approach. However, for some systems adopting
the top-down approach can save time and effort.59

In order to have a formal FMEA process, accurate data is key. Given


accurate data, one can make the proper assumptions and calculations, producing an
accurate FMEA process. Accurate data presume a comprehensive quality system
implementation. Without accurate data “on a product or process, the FMEA becomes a
guessing game, based on opinions rather than actual facts”. Implementing a quality
system assures standard procedures and proper documentation and thus yields reliable
data.60

“The basic questions to be answered by FMEA are

(1) How can each part of the system possibly fail?

(2) What mechanisms might produce these modes of failure?

(3) What could the effects be if the failures did occur?

(4) Is the failure in the safe or unsafe direction?

(5) How is the failure detected?

(6) What inherent provisions are provided in the design to compensate for
the failures?”61

There are at least four prerequisites we must understand and must consider
while conducting FMEA:

(1) All problems are not the same and not equally important.

(2) Know the customer (end user).

(3) Identify the function’s purpose and objective.

(4) When doing an FMEA, it must be prevention oriented.62


59 Hoyland, page 76. The above part of section is a summary of “Bottom-up versus Top-down
Approach.”
60 McDermott, page 4. The above part of section is a summary of “Part of a Comprehensive Quality
System.”
61 Hoyland, page 76.
62 Stamatis, page xxii-xxiii.

Definitions of terms related to failure and failure modes are presented in
Appendix C.
c. FMEA: General Overview
For a system, a FMEA “is an engineering technique used to define,
identify and eliminate known and/or potential failures” before they reach the end user.63
A FMEA may take two courses of action. First, using historical data there may be an
analysis of data for similar products or systems. Second, inferential statistics,
mathematical modeling, simulations, and reliability analysis may be used concurrently to
identify and define the failures. A FMEA, if conducted properly and appropriately, will
provide the practitioner with useful information that can reduce the risk load in the
system. It is one of the most important early preventive actions in a system, which can
prevent failures from occurring and reaching the user. “FMEA is a systematic way of
examining all the possible ways in which a failure may occur. For each failure, an
estimate is made of its effect on the system, of its seriousness of its occurrence, and its
detection.” As a result, corrective actions required to prevent failures from reaching the
end user will be identified, thereby assuring the highest durability, quality and reliability
possible in the system. 64
d. When is the FMEA Started?65
As a methodology used to maximize the end user’s satisfaction by
eliminating and/or reducing known or potential problems, FMEA must begin as early as
possible, even if all the facts and information are not yet known. After FMEA begins, it
becomes a living document and is never really complete. It uses information to improve
the system and it is continually updated as necessary. Therefore, an FMEA should be
available for the entire system life.

63 Stamatis, page 25.


64 Stamatis, page 26. The above part of section is a summary and paraphrase (in some places verbatim)
of “FMEA: A General Overview.”
65 The material from this section is taken (in some places verbatim) from: Stamatis, page 29, “When is
the FMEA Started?”
e. Explanation of the FMEA66
Identification and prevention of known and potential problems from
reaching the end user is the essence of an FMEA system. One of the assumptions that
must be made is that problems have different priorities. Finding or setting priorities is
important because that is the main issue, which drives the methodology. Three
components help define the priority of failures: occurrence, severity and detection.

Occurrence is the frequency of the failure. Severity is the


seriousness (effects) of the failure. Detection is the ability to detect the
failure before it reaches the customer. To define the value of these
components, the usual way is to use numerical scales called risk-criteria
guidelines. These guidelines can be qualitative and/or quantitative.67

If the guideline is qualitative, then it must follow the theoretical expected


behavior of the potential component. For occurrence the expected behavior follows a
normal distribution because frequencies tend to be like that over time. For severity, the
expected behavior is lognormal. This is due to the fact that failures, which do occur,
should cause annoyance, and they are not usually critical or catastrophic. So the guideline
should follow a right-skewed distribution. For detection, the expected behavior is that of
a discrete distribution. This is expected due to the fact that there is more concern if the
failure is found by the end user than finding it during the manufacturing phase in the
production facilities. So the guideline should follow a distribution with a gap between
values.

If the guideline is quantitative, it must be specific. It is not necessary for


the guideline to follow a theoretical distribution.

Ranking for the criteria usually has a value based on 1 to 10 scales. It


provides ease of interpretation, accuracy, and some precision in the quantification of the
ranking. Ranking using scales from 1 to 5, if used, offers convenience but does not give
an accurate “quantification because it reflects a uniform distribution.”68

66 The material from this section is taken (in some places verbatim) from: Stamatis, page 33,
”Interpretation of FMEA.”
67 Stamatis, page 33.
68 Stamatis, page 35.

The failure’s priority is represented through the risk priority number
(RPN), which is the product of occurrence times severity times detection. The value of
RPN is used only to rank order the concerns of the system. If two or more
failures have the same RPN, we first address the failure with the higher severity and then
the one with the higher detection. Severity comes first because it has to do with the effects of the
failure. Detection is next because user dependency is more important than the failure
frequencies.
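
A minimal sketch of this prioritization rule follows. The three failure modes and their ratings are invented (loosely inspired by the field incidents in Chapter I) and are deliberately chosen to tie at the same RPN, so the severity-then-detection tie-break described above decides the order.

```python
# Sketch of RPN prioritization: RPN = severity * occurrence * detection,
# with ties broken by higher severity and then higher detection.
# Ratings are invented; all three modes tie at RPN = 144 on purpose.
failure_modes = [
    # (name, severity, occurrence, detection) on 1-10 scales
    ("engine stall (fuel exhaustion)", 9, 4, 4),
    ("air-intake filter cracks",       4, 6, 6),
    ("wing servo failure",             9, 2, 8),
]

def rpn(sev, occ, det):
    return sev * occ * det

ranked = sorted(failure_modes,
                key=lambda fm: (rpn(fm[1], fm[2], fm[3]), fm[1], fm[3]),
                reverse=True)

for name, sev, occ, det in ranked:
    print(f"{name:35s} RPN = {rpn(sev, occ, det)}")
```

With these ratings the wing servo failure is addressed first, because among the tied modes it has the highest severity and then the highest detection rating.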

The objective for product/design FMEAs is to reveal product problems


that will result in safety hazards, malfunctions or shortened product life. FMEAs can be
conducted at each phase in the design process (initial design, prototype, final design) or at
the production process while it is occurring. “How can the product fail?” is the basic
question asked in design FMEAs.69
f. The Eight Steps Method for Implementing FMEA70
The eight steps of the method are:

(1) Select the team: The team should be “cross-functional and


multidisciplinary and the team members must be willing to contribute.” After the team
has been identified, it prioritizes the opportunities for improvement.

(2) Do the functional block diagram: The first step for every
attempt to solve any problem is to become familiar with the subject to ensure that
everyone on the FMEA team has the same understanding of the process and the
production phases. A blueprint, an engineering drawing, or a flowchart review is
necessary. If it is not available, the team needs to create one. Team members should see
the product or a prototype and walk through the production process exactly. A block
diagram of the system provides an overview and a working model of the relationships
and interactions of the system’s subsystems and components.

69 McDermott, page 25. The above part of section is a summary and paraphrase (in some places
verbatim) of “Product/Design.”
70 The material from this section is taken (in some places verbatim) from Stamatis, pages 42-44, “The
Process of Conducting an FMEA,” and McDermott, pages 28-42, “The FMEA Worksheet.”
(3) Collect data: The team begins to collect and categorize data.
Then they should start filling the FMEA forms. The failures identified are the failure
modes of the FMEA.

(4) Brainstorm and prioritize potential failure modes: Important


issues of the problem are recognized by the team. The team can now begin thinking about
potential failure modes that could affect the product function, quality or manufacturing.
Brainstorm sessions place all ideas out on the table. The objective is to create dozens of
ideas. The ideas should be organized by grouping them into similar categories. Grouping
can be done by the type of failure, (e.g. mechanical, electrical, communication etc) or the
seriousness of the failure. At that step, the FMEA team reviews the failure modes and
identifies the potential effects of any failure. This step is like an “if-then statement”
process. If that failure occurs, then what are the consequences?

(5) Analysis: Assign a severity, occurrence and detection rating for


each effect and failure mode. The sequence from data to information to knowledge to
decision is followed. The analysis could be qualitative or quantitative and anything may
be used (cause and effect analysis, mathematical modeling, simulation, reliability
analysis etc). At this step, severity, occurrence, and detection ratings must be estimated.
Those ratings are based on a 10-point scale, with number 1 being the lowest and 10 the
highest in importance. Establishing clear and concise descriptions for the points on each
of the scales is important so that all team members have the same understanding of the
ratings.

(a) The severity rating estimates how serious the effect


would be if a given failure did occur. Each effect should be given its own severity rating,
even if there are several effects for a single failure mode.

(b) The most accurate way to determine the occurrence


rating is by using actual failure data from the product. When this is not possible, failure
mode occurrence must be estimated. Knowing the potential cause of failure can produce a
better estimate. Once the potential causes have been identified for all of the failure
modes, an occurrence rating can be assigned, even without failure data.

(c) By assigning the detection rating, we estimate how
likely we are to detect a failure or the effect of a failure. We start by identifying controls
that may detect a failure or the effect of a failure. In case there are no controls, the
likelihood of detection will be low and the item would receive a high rating (9-10).

(6) Results: Results are derived from the analysis. RPNs must be
calculated and all FMEA forms are completed. The RPN is the product of severity, times
occurrence, times detection for all of the items. The total RPN is the sum of all RPNs.
This number is used as a metric to compare the revised total RPN against the original
RPN, once the recommended actions have been introduced. From the highest RPN to the
smallest, we can now prioritize the failure modes. A Pareto Chart or other diagram helps
to visualize the differences between the various ratings and enables decision regarding on
which items to work. Usually it is useful to set a threshold RPN such that everything
above that point is addressed.

(7) Confirm, evaluate and measure: After the results have been
recorded, confirmation, evaluation, and measurements of the success or failure are done.
Using an organized process, we can identify and implement actions to eliminate or reduce
the problem of high-risk failure modes. It is very common to manage a reduction on a
high-risk failure mode. After doing that, we refer back to the severity occurrence and
detection ratings. Often the easiest approach to make a process or product improvement is
to increase detectability of the failure, thus lowering the detection rating. This is not the
best approach because increasing failure-detectability only makes it easier to detect
failures once they occur. Reducing severity is important, especially in situations leading
to injuries. The best way for improvement is by reducing the likelihood of the occurrence
of the failure. And if it is highly unlikely that a failure will occur, there is less need for
detection measures. Evaluation answers the question: “Is the situation better, worse or the
same as before?”

(8) Do it all over again: The team must pursue improvement until
the failures are completely eliminated, regardless of the answer from Step 7, because
FMEA is a process of continual improvement. The long-term goal is to eliminate or
mitigate every failure completely. The short-term goal is to minimize the effects of the

most serious failures, if not eliminate them. Once action has been taken to improve the
product, new ratings should be determined and a resulting RPN calculated. For the failure
modes that have been corrected, there should be a reduction in the RPN. Resulting RPNs
and total RPNs can be organized in diagrams and compared with the original RPNs.
There is no target RPN for FMEAs. It is up to the organization to decide on how far the
team should pursue improvements. Failures happen sooner or later. The question is how
much relative risk the team is willing to take. The answer, again, depends on
management and the seriousness of failure.
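
To make the RPN bookkeeping of Steps 6 and 8 concrete, the sketch below computes a total RPN before and after hypothetical corrective actions and applies a simple working threshold; all ratings are invented.

```python
# Sketch of Step 6 and Step 8 bookkeeping: total RPN before and after
# corrective action, plus a simple working threshold (ratings are invented).
original = {"engine stall": (9, 4, 4), "filter cracks": (4, 6, 6)}   # (S, O, D)
revised  = {"engine stall": (9, 2, 4), "filter cracks": (4, 3, 6)}   # after actions

def total_rpn(table):
    return sum(s * o * d for s, o, d in table.values())

THRESHOLD = 100  # address every failure mode whose RPN exceeds this value
to_work = [name for name, (s, o, d) in original.items() if s * o * d > THRESHOLD]

print("Failure modes above threshold:", to_work)
print("Total RPN before:", total_rpn(original), "after:", total_rpn(revised))
```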
g. FMEA Team
“A team is a group of individuals who are committed to achieving
common organizational objectives.” They meet regularly to identify, to solve problems,
and to improve processes. They work and interact openly and effectively together and
produce the desired results for the organization. “Synergy,” which means that “the sum of
the total is greater than the sum of the individuals,” is the characteristic of a team. 71

“One person typically is responsible for coordinating the FMEA process


but all FMEAs are team-based.” Team members “bring a variety of perspectives and
experiences to the project.” They are “formed when needed and disbanded” after the
FMEA is completed. 72 The first priority for the team is to define the scope of FMEA. A
clear definition of the product or process to be studied should be written and understood
by all team members.
h. Limitations Applying FMEA73
(1) “FMEA analysis may be very effective when applied to a
system in which system failures most probably are the results of single-component
failures.” In that way, “each failure is considered individually as an independent
occurrence.” So, an FMEA is not the best approach for analyzing systems with a fair
degree of redundancy (dependency). For such systems, a Fault Tree Analysis (FTA) is a
better alternative.

71 Stamatis, pages 85-88. The above part of section is a summary and paraphrase (in some places
verbatim) of “What Is a Team?” and “Why Use a Team?”
72 McDermott, page 15. The above part of section is a summary and paraphrase (in some places
verbatim) of “The FMEA Team.”
73 Hoyland, page 80, the above part of section is a summary of “Applications.”

(2) FMEA gives inadequate attention to human errors because the
focus is on hardware failures.

(3) The amount of insignificant work that must be done is also a


disadvantage. Component failures, including those with insignificant consequences, are
examined and documented. For large complex systems with a high degree of redundancy,
the amount of trivial and unnecessary work is huge.
i. FMEA Types
Generally there are four types of FMEA: system, design, process, and
service. In the SUAV case, we deal with system and design FMEAs. Failure modes are
caused by system deficiencies in the functions of the system. Deficiencies include
interactions among subsystems and elements of the system.
j. System and Design FMEA74
We focus on system/design FMEA once we begin to analyze the reliability
for SUAVs. A system FMEA is usually accomplished in steps, which “include
conceptual design, detailed design and development, and testing and evaluation.”
Establishing a system FMEA, uses a system engineering process as well as a product
development methodology, or research and development, or a combination of all these.
During the early stages of development, the main focus is to

• Turn an operational need into a demand for system performance


parameters and system configuration through “the use of an interactive
process.”

• “Integrate related technical parameters and assure compatibility of


physical, functional, and program interfaces” optimizing the total system.

• “Integrate reliability, maintainability, engineering support, human factors,


safety, liability, security, and other related specialties into the total
engineering effort.”

74 The material from this section is taken (in some places verbatim) from: Stamatis, pages 101-129,
“System FMEA,” “Design FMEA.”
The first step in conducting the system FMEA is a feasibility study to find
solutions to a problem. The outcome of the system FMEA is an initial design with a
baseline configuration and operational specifications.

Design FMEA is a method of “identifying potential or known failure


modes and providing corrective actions” before the production line starts. Initial sample
runs or prototype runs and trial runs are excluded. The milestone for the first production
run is important because after that point any modification and/or change in the design
would be a major problem due to the amount of effort, time, and cost required to make
changes at that stage. The design FMEA is a “dynamic process” involving the
implementation of numerous “technologies and methods to produce an effective design.”
This result will be an input for the process, and/or the service FMEA.

The first step in conducting the design FMEA should be a “feasibility


study and/or a risk-benefit analysis.” The objective of this early stage is to optimize the
system, which means to maximize the system quality, reliability and maintainability, and
minimize cost. The outcome of the design FMEA is a preliminary design, which can be
used as baseline configuration and functional specifications.
k. Analysis of Design FMEA75
There are two main methods of design: design-to-cost and design-to-
customer requirements. In the first approach, the main goal of the design is to keep costs
within a certain budget. This is also called value-engineering analysis and it is suitable
for commercial products with minimum safety standards. In the design-to-customer
requirements approach, the primary designer’s concern is to satisfy the customer’s
requirements and safety and regulatory obligations. This is common for products related
to military applications and with high safety standards.

A design FMEA starts with two requirements:

• Identifying the appropriate form, and

• Identifying the rating guidelines

75 Stamatis, page 129-130.

The form and the rating guidelines for the design FMEA (or any kind of
FMEA) are not standardized. Each one performing FMEA makes his own forms and
rating guidelines, which correspond to the project’s special requirements and
characteristics, as well as the designer’s vision and experience.

There are also two ways that the rating guidelines can be formulated: The
qualitative method and the quantitative method. In both cases, the numerical values can
be from 1 to 5 or 1 to 10, which is most common.

An example of a design FMEA form is shown in Table 1. The form is divided into
three parts. The first part, with item numbers 1 to 10, is the introduction. The
second part of the form includes items 11 to 24, which are the body items of any design
FMEA. The third part, items 25 and 26, concerns the authority and responsibility of the FMEA
team. Definitions of terms are in Appendix A.

Header items:
(1) Subsystem Name; (2) Design Responsibility; (2A) The Head of the System Design Team;
(3) Involvement of Others; (4) Supplier Involvement; (5) Model/Product;
(6) Engineering Release Date; (7) Prepared by; (8) FMEA Date; (9) FMEA Revision Date;
(10) Part Name; Page ___ of ___ Pages

Body columns:
(11) Design Function; (12) Potential Failure Mode; (13) Potential Effect(s) of Failure;
(14) Critical Characteristics; (15) Severity (SEV); (16) Potential Cause(s) of Failure;
(17) Occurrence (OCC); (18) Detection Method; (19) Detection (DET); (20) RPN;
(21) Recommended Action; (22) Responsible Area or Person and Completion Date;
Action Results: (23) Action Taken; (24) Revised SEV, OCC, DET, and RPN

Signature items:
(25) Approval Signatures; (26) Concurring Signatures

Table 1. An Example of Design FMEA (From Stamatis, page 131)

l. FMEA Conclusion
Technology can develop complex systems today. UAVs are an example of
the increased automation built into a complex system. To be able to develop these
systems efficiently, a number of appropriate system development processes can be used.
Implementing such a process from the early stages of design is important for total
development, cost, and time.

The objective of a FMEA is to look for all the ways a system or product
can fail. Failure occurs when a product or system does not function as it should, or when
the user makes a mistake. Failure modes are ways in which a product or process can fail.
Each failure mode has a potential effect. Some effects are more likely to occur than
others. Each effect has a risk associated with it. The FMEA process is a way to identify
failure modes effects and risks within a process or product, and eliminate or reduce them.

The most important reason for conducting an FMEA is the need to


improve. FMEAs have a positive impact because of their preventive role. The purpose of
FMEA is preventing system and product problems before they occur. Used in the design
and manufacturing process, they reduce cost and efforts by identifying product and
system improvements early in the development phase when it is easier, faster and cheaper
to make changes.
m. Other Tools76
(1) Fault Tree Analysis (FTA). This is a reasoned-conclusion
“analytical technique for reliability and safety analysis used for complex dynamic
systems.” It provides an “objective basis” for further analysis and changes. It was
developed in 1961 by Bell Telephone Company and is widely used in many applications
in industry. FTA is a logical tree in which the “various combinations of possible events”
are represented graphically. It shows the “cause and effect relationships” between a
single failure and its causes. At the top of the tree is the failure, and the various
contributing causes are at the bottom branches of the tree. “The FTA always supplements
the FMEA.”

76 The material from this section is taken (in some places verbatim) from Stamatis, pages 51-67,
“Relationships of FMEA and Other Tools.”
This thesis develops an FTA for SUAVs. The FTA process outline
follows:

(a) Identify the system fault state(s) or undesired events.


The top event must be quantifiable, definable, noticeable, controllable, and inclusive of
the lower events.

(b) Proceed with fault tree construction. Determine the
level to which the examination should be conducted and fully describe all events that
immediately cause the top event. For each lower-level fault, describe its immediate causes
until a component-level failure or human error is exposed.

(c) Fault tree analysis is the last step, in which we must
determine the minimal cut sets for tree simplification and the probability of each input
event. For the AND logic gates, the probability of the output is the product of the input
probabilities, while for the OR logic gates it is the sum of the input probabilities if and
only if the events are mutually exclusive. Finally, we must determine the top event
probability; a numerical sketch follows.
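
A small numerical sketch of this last step follows, assuming a hypothetical two-level tree whose basic events are independent; for independent (rather than mutually exclusive) events, the OR gate is evaluated as one minus the product of the complements, which reduces to the simple sum only when the probabilities are small.

```python
# Sketch of fault tree evaluation for a hypothetical SUAV top event,
# "loss of platform in flight".  Probabilities are invented per-flight values
# and the basic events are assumed independent.
def and_gate(probs):
    """AND gate: all inputs must occur; product of input probabilities."""
    p = 1.0
    for q in probs:
        p *= q
    return p

def or_gate(probs):
    """OR gate for independent inputs: 1 - product of (1 - p_i)."""
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

# Basic events (per-flight probabilities, illustrative only).
p_fuel_exhaustion = 0.010
p_servo_failure   = 0.005
p_autopilot_fault = 0.008
p_rc_backup_fails = 0.200   # manual RC backup does not recover the vehicle

# An autopilot fault causes loss only if the RC backup also fails (AND gate).
p_uncontrolled_autopilot = and_gate([p_autopilot_fault, p_rc_backup_fails])

# Top event: any of the branches occurs (OR gate).
p_loss = or_gate([p_fuel_exhaustion, p_servo_failure, p_uncontrolled_autopilot])
print(f"P(loss of platform per flight) = {p_loss:.4f}")
```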

(2) Functional flow diagrams or block diagrams “illustrate the


physical or functional relationships” within a system under analysis. They are used to
give a quick and comprehensive view of the system design requirements illustrating
series and parallel relationships, hierarchy and other relationships among the system’s
functions. The types of block diagrams used in FMEA are:

(a) System Diagrams, used for identifying relationships


between major components and other system components in large systems composed of
several assemblies or subsystems,

(b) Detail Diagrams, used for identifying relationships


between each part within an assembly or subsystem, and

(c) Reliability Diagrams, used for identifying the series


dependence or independence of major components, subsystems or detail parts in
achieving required functions.

(3) FMA. “Failure mode analysis (FMA) is a systematic approach


to quantify failure modes, failure rate and root causes of known failures.” FMA is based
only on historical field and process data. It is a diagnostic tool because it concerns itself
with only known and/or occurred failures. “Both FMA and FMEA deal with failure
modes and causes.” FMA may be conducted first and then the outcome becomes input for
the FMEA.

(4) FMECA, FAMECA. An FMEA becomes a Failure Mode,
Effects, and Criticality Analysis (FMECA or FAMECA) if criticalities are assigned to
the failure mode effects.77 An analysis like that identifies any faulty components in the
system so their reliability, or safety of operation, can be improved early enough so the
designer can make corrections and set limitations in the design. FMECA results may also
be useful when modifying the system and for maintenance planning. In a complex system
all components cannot be redesigned. The most critical components are scientifically
selected, and only these should be improved. FMECA is usually conducted during the
design phase of a system.

(5) FMCA. “Failure mode and critical analysis (FMCA) is a


systematic approach to quantify failure modes, rates and root causes from a criticality
perspective.78” It is similar to the FMEA in all other details. An FMCA analysis is used
“where the identification of critical, major and minor characteristics is important.” By
focusing on criticality one can identify the single-point failure modes, which are a human
error or hardware failure that can result in an accident.

(6) QFD. Quality function deployment (QFD) is a systematic


methodology that unites the various working groups within a corporation and guides
them to focus on customer’s choices, demands and expectations. QFD “encourages a
comprehensive, holistic approach to product development.” It is a tool that interprets the
customer’s requirements, through specific characteristics, manufacturing operations and
production requirements. QFD and FMEA have much in common. They both target
continual improvement by eliminating failures and looking for customer satisfaction.
Usually, QFD occurs first and based on the results FMEA follows.

77 Hoyland, page 74.


78 Stamatis, page 62.

(7) RCM. 79 Reliability-centered maintenance (RCM) has its roots
in the aviation industry.80 Airlines and airplane manufacturers developed the RCM
process in the late 1960s. The initial development work was started by the North American
civil aviation industry. The airlines at that time began to realize that existing maintenance
philosophies were not only too expensive but very dangerous as well. In 1980, an
international civil aviation group developed an inclusive basis for different maintenance
strategies. This basis is known as the Maintenance Steering Group-3 (MSG-3) for the
aviation industry.81

The earliest view of failure in the 1930s was that as products aged,
due to wear and tear, they were more likely to fail. So the best way to optimize system
reliability and availability was by providing maintenance on a routine basis. During
World War II, awareness about infant mortality led to the widespread belief in the
“bathtub curve”. In that case, overhauls or component replacements should be done at
fixed time intervals to optimize system reliability and availability. This is based on the
assumption that most systems operate reliably for a period of “X” and then wear out.
Keeping records on failures enables us to determine “X” and take preventive actions just
before deterioration starts. This model is true for certain types of simple systems and
some complex ones with age-related failure modes. However, after 1960, as systems grew
more complex, research revealed that six failure patterns actually occur in practice. Data
collection and analysis will enable NAVAIR to determine which of these apply to SUAVs
(a brief illustrative sketch follows Figure 1).

(a) The bathtub curve. It begins with high


incidence of failure (infant mortality), followed by a constant or gradually increasing
conditional probability of failure, and ends in a wear-out zone due to age.

79 The material from this subsection is taken (in some places verbatim) from: Aladon Ltd, Specialists
in the application of Reliability-Centered Maintenance, “Reliability Centred Maintenance-An
Introduction,” Internet, February 2004. Available at: www.aladon.co.uk/10intro.html
80 Hoyland, page 79.
81 Aladon Ltd, Specialists in the application of Reliability-Centered Maintenance, “About RCM,”
Internet, February 2004. Available at: www.aladon.co.uk/02rcm.html
(b) Constant or slowly increasing conditional probability of
failure, ending in a wear-out zone.

(c) Slowly increasing conditional probability of failure, but


no recognizable wear-out zone.

(d) A low conditional probability of failure when the system


is new and then a rapid increase to a constant level.

(e) A constant conditional probability of failure at all ages.

(f) A high infant mortality during the early period and then
constant or slowly decreasing conditional probability of failure.

The above six failure patterns are illustrated in the next figure.

[Six panels, (a) through (f), each plotting failure rate against age.]
Figure 1. The Six Failure Patterns
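
Which of these patterns governs a particular SUAV component is an empirical question. As one illustration only (this particular technique is not prescribed by the RCM or MSG-3 sources cited here), fitting a two-parameter Weibull distribution to recorded times-to-failure gives a first indication of whether a decreasing, roughly constant, or increasing hazard dominates. The function name, thresholds, and flight-hour data below are hypothetical.

```python
# Hedged sketch: classify a component's failure behavior from field data by
# fitting a two-parameter Weibull distribution and inspecting the shape
# parameter (beta).  beta < 1 suggests infant mortality, beta near 1 suggests
# a roughly constant failure rate, beta > 1 suggests wear-out.  The thresholds
# and example data are illustrative only.
from scipy.stats import weibull_min

def classify_failure_pattern(times_to_failure):
    beta, _, eta = weibull_min.fit(times_to_failure, floc=0)  # shape, loc, scale
    if beta < 0.9:
        pattern = "decreasing hazard (infant mortality dominant)"
    elif beta <= 1.1:
        pattern = "roughly constant hazard (random failures)"
    else:
        pattern = "increasing hazard (wear-out dominant)"
    return beta, eta, pattern

# Notional flight hours to failure for one SUAV component
hours = [12.0, 35.5, 41.2, 58.0, 63.3, 77.1, 90.4, 102.8]
beta, eta, pattern = classify_failure_pattern(hours)
print(f"shape={beta:.2f}, scale={eta:.1f} hr -> {pattern}")
```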

The idea of RCM is based on the realization that what users want depends
on the operating context of the system. So RCM is “a process used to determine what
must be done to ensure that any physical asset continues to do what its users want it to do
in its present operating context.” The RCM process asks seven questions about the
system under review. Any RCM process should ensure that all of the following seven
questions are answered satisfactorily in the sequence shown below:

• What are the functions and associated desired performance standards of


the system in its present operating context? (Functions).

• In what ways can it fail to fulfill its functions? (Functional failures).

• What causes each functional failure? (Failure modes)

• What happens when each failure occurs? (Failure effects).

• In what way does each failure matter? (Failure ramifications)

• What should be done to predict or prevent each failure? (Proactive tasks


and task intervals).

• What if a convenient solution cannot be found? (Default actions)

Definitions of terms related to functions, functional failures, failure


modes, and failure effects are presented in appendix C.

(8) TAAF. The Test-Analyze And Fix (TAAF) philosophy is


accomplished in an iterative manner by conducting tests, collecting data, analyzing data,
making the appropriate modifications and starting the tests again. The process starts by
conducting tests on the prototypes. The failure data are collected and the causes are
sought. Corrective actions are then taken to reduce the occurrence of future failures. The
same process is repeated until the test results are acceptable.

Some characteristics of the TAAF process are

• All failures are fully analyzed.

• Actions are taken in the design and/or production phase to ensure that
failures do not recur.
• Tests are done at a high level of assembly, since improvements at that level have the
maximum effect on system reliability.

• Corrective actions must be taken as soon as possible on all components in


the development program.

In general, TAAF is a time consuming and costly reliability growth


process, which resembles the spiral method of project development.82
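
One common way to check whether a TAAF program is actually producing reliability growth is to fit a growth model to the cumulative failure times observed during testing. The thesis does not mandate a particular model; the sketch below uses the Crow-AMSAA (power-law NHPP) point estimates for a time-terminated test, and the failure times shown are notional.

```python
# Hedged sketch: Crow-AMSAA (power-law NHPP) point estimates often used to
# track reliability growth during test-analyze-and-fix programs.  Growth is
# indicated when the estimated shape parameter beta is less than 1.
import math

def crow_amsaa(failure_times, total_test_time):
    """MLE for a time-terminated test: returns (beta, lambda, demonstrated MTBF)."""
    n = len(failure_times)
    beta = n / (n * math.log(total_test_time) - sum(math.log(t) for t in failure_times))
    lam = n / total_test_time ** beta
    intensity = lam * beta * total_test_time ** (beta - 1)  # failure intensity at end of test
    return beta, lam, 1.0 / intensity

# Notional cumulative test hours at which failures occurred during TAAF
times = [4.3, 10.1, 17.8, 49.0, 92.5, 151.0, 288.0]
beta, lam, mtbf = crow_amsaa(times, total_test_time=400.0)
print(f"beta={beta:.2f} (growth if < 1), demonstrated MTBF ~ {mtbf:.0f} hours")
```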

(9) FRACAS. A Failure Reporting Analysis and Corrective Action


system (FRACAS)83 or Data Reporting Analysis and Corrective Action System
(DRACAS) is commonly referred to as a “closed loop reporting system.” Implemented for a
program during production, integration, testing, and field deployment phases, it allows
for the collection and analyses of reliability and maintainability data for the hardware and
software items. For a successful reliability improvement program, all failures should be
considered. Every hardware and software failure, including the most simplistic ones, such
as those caused by loose nuts and bolts or loose cables, should be investigated. Corrective
action for each one should be developed. The manufacturer can use FRACAS results to
“incorporate the corrective actions into the product.” 84 We develop a FRACAS for
SUAVs in this thesis.
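
As a preview of the closed-loop idea, the sketch below shows one minimal way a failure report could move from reporting through analysis and corrective action to closure. The actual data elements of the SUAV FRACAS are developed later in this thesis; the class, field, and status names here are illustrative placeholders only.

```python
# Hedged sketch of a minimal closed-loop FRACAS record; field names are
# placeholders, not the data elements defined later in the thesis.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FailureReport:
    report_id: str
    subsystem: str              # e.g., air vehicle, GCS, payload
    component: str
    flight_hours: float
    description: str
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    status: str = "OPEN"        # OPEN -> ANALYZED -> CORRECTED -> CLOSED

    def analyze(self, root_cause: str) -> None:
        self.root_cause = root_cause
        self.status = "ANALYZED"

    def correct(self, action: str) -> None:
        self.corrective_action = action
        self.status = "CORRECTED"

    def close(self) -> None:
        # The loop is "closed" only after a corrective action has been recorded.
        if self.corrective_action:
            self.status = "CLOSED"

report = FailureReport("FR-0001", "air vehicle", "elevator servo", 42.5,
                       "loss of elevator control during climb")
report.analyze("connector backed out due to vibration")
report.correct("add thread-locking compound; add preflight connector check")
report.close()
print(report.status)
```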
2. Manned Aviation Specific: RCM, MSG-3
a. Introduction to RCM
Reliability centered maintenance (RCM) originated in the aviation
industry in the late 60s. In the mid 70s, the US Department of Defense wanted to know
more about aviation maintenance. As a result, Stanley Nowlan and Howard Heap of
United Airlines wrote a report titled “Reliability Centered Maintenance.” It was
published in 1978, and it is still one of the most important documents in the history of
physical asset management.85 RCM is “a process used to determine what must be done to

82 Blischke, R. W., and Murthy D. N. Prabhakar, Reliability Modeling, Prediction, and Optimization,
John Wiley & Sons, 2000, page 547-548.
83 Pecht, M., Product Reliability Maintainability and Supportability Handbook, CRC Press, 1995,
page 322.
84 Ibid, page 324.

ensure that any physical asset continues to do what its users want it to do in its present
operating context.”
b. The Seven Questions
The RCM process answers seven questions about the system under review.
Any RCM process should ensure that all of the following seven questions are answered
satisfactorily, in the sequence shown below:

(1) What are the functions and associated desired performance


standards of the system in its present operating context? (Functions)

(2) In what ways can it fail to fulfill its functions? (Functional


failures)

(3) What causes each functional failure? (Failure modes)

(4) What happens when each failure occurs? (Failure effects)

(5) In what way does each failure matter? (Failure ramifications)

(6) What should be done to predict or prevent each failure? (Proactive tasks and task intervals)

(7) What if a preventive approach cannot be found? (Default


actions)

While defining the functions and desired standards of performance of a


system, the objectives of maintenance are defined. Defining functional failures enables
an exact explanation of what is meant by failure. The functions and functional failures are
addressed by the first two questions of the RCM process. The next two questions identify
the failure modes that are most likely to cause each functional failure and the failure
effects associated with each failure mode. This is done by performing an FMEA for each
functional failure.86
c. RCM-287

85 Moubray, John summarized by Sandy Dunn, Plant Maintenance Resource Center, “Maintenance
Task Selection-Part 3,” Revised September 18, 2002, Internet, May 2004. Available at: http://www.plant-
maintenance.com/articles /maintenance_tak_selection_part2.shtml
86 The material from this part of section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction.”
87 The material from this section is taken (in some places verbatim) from: Aladon Ltd, “About RCM.”

Nowlan and Heap’s report and MSG-3 have been used as a basis for
various military RCM standards and for non-aviation derivatives. Of these, by far the
most widely used is RCM-2.

RCM-2 is a process used to decide what must be done to ensure that any
physical asset, system or process continues to perform exactly as its user wants it to. The
process defines what users expect from their assets in terms of

(1) Primary performance parameters such as output, throughput,


speed, range and carrying capacity, and

(2) Risk (safety and environmental integrity), quality (precision,


accuracy, consistency and stability), control, comfort, containment, economy, customer
service and so on.

The second step in the RCM-2 process is to identify the ways the system
can fail, followed by an FMEA to associate all the events that are likely to cause each
failure.

The last step is to identify a suitable failure management policy for dealing
with each failure mode. These policy options may include predictive maintenance,
preventive maintenance, failure finding, or changing the design and/or configuration of
the system.

The RCM-2 process provides rules for choosing which of the failure
management policies is technically appropriate and presents criteria for deciding the
frequency of the various routine tasks.
d. SAE STANDARD JA 1011
RCM-2 complies with SAE Standard JA 1011 or “Evaluation Criteria for
Reliability-Centered Maintenance (RCM) Process.” It was published in August 1999 by
the Society of Automotive Engineers (SAE). It is a brief document setting out the
minimum criteria that any process must include to be called an RCM process when
applied to any particular asset or system.88

88 The material from this section is taken (in some places verbatim) from: Aladon Ltd, “About RCM.”

The standard says that in order to be called an “RCM” process, a process
must obtain satisfactory answers to the seven questions above, asked in that particular
order. The rest of the standard identifies the information that must be gathered,
and the decisions that must be made in order to answer each of these questions
satisfactorily. 89
e. MSG-390
In July 1968, Handbook MSG-1, “Maintenance Evaluation and Program
Development,” was developed by various airlines and air manufacturers’ representatives.
Decision logic and airline/manufacturer procedures for scheduled maintenance
development for the new Boeing 747 were the main part of the document.

In the 1970’s the “Airline/Manufacturer Maintenance Program Planning


Document” or MSG-2 was released. It was a universal document that updated the
decision logic for the latest aircraft.

In 1979, after a decade of MSG-2 implementation, “experience and events


indicated” that MSG procedures needed updating. In addition, new generation aircraft
maintenance requirements, new regulations on maintenance programs, the high price of
fuel and spare parts greatly influenced maintenance program development. Areas that
were “most likely candidates for improvement” included the difficulty of the decision
logic, the clarity of the distinction between economic and safety issues, and the
effectiveness of the solutions for hidden functional failures.

The MSG-3 document was created through the participation and combined efforts of the
Federal Aviation Administration (FAA), the UK Civil Aviation Authority (CAA/UK), the
American Engineering Association (AEA), US and European aircraft engine
manufacturers, airlines, and the US Navy.

89 The material from the above part of section is taken (in some places verbatim) from: Athos
Corporation, Reliability-Centered Maintenance Consulting, “SAE RCM Standard: JA 1011, Evaluation
Criteria for RCM Process,” Internet, February 2004. Available at: http://www.athoscorp.com/SAE-
RCMStandard.html
90 The material from this section is taken (in some places verbatim) from: Air Transport Association of
America, “ATA MSG-3, Operator/Manufacturer Scheduled Maintenance Development, Revision 2002.1,”
Nov 30, 2001, pages 6-8.
Some of the major improvements presented by MSG-3 as compared to
MSG-2 were

(1) For systems and powerplant treatment:

(a) MSG-3 provides a “more rational procedure for task


definition” and “linear progression through the decision logic.”

(b) “MSG-3 logic took a top-down or consequence of


failure approach.” At the beginning, the functional failure was evaluated for the
consequences of failure and was assigned one of two basic categories, safety or
economic.

(c) Further classification established sub-categories based


on “whether the failure was evident to or hidden from the operating crew.”

(d) “Task selection questions were arranged in a sequence”


so that the “most easily accomplished task, was considered first.” If the task was not
applicable or effective, then “the next task in sequence was considered, down to and
including possible redesign.”

(2) Structures treatment, “fatigue, corrosion, accidental damage,


age exploration” and other considerations were incorporated in the logic diagram.

(3) “MSG-3 recognized the new damage tolerance rules and the
supplemental inspection programs and provided a method by which their purpose could
be adapted to the Maintenance Review Board (MRB) process instead of relying on type
data certificate restrains.” The MRB is discussed in Appendix B.

(4) MSG-3 logic was “task-oriented and not maintenance process


oriented.” With the task-oriented concept, “one would be able to view the MRB
document and identify the initial scheduled maintenance for a given item.” Definitions
for the MRB are in appendix B.

(5) Servicing/lubrication was included as part of the logic diagram


to emphasize its importance.

(6) Treatment of hidden functional failures was more thorough
because of their distinct separation from the evident functional failures.

(7) “The effect of concurrent or multiple failures was considered.”

(8) “Structures decision logic no longer contained a specific


numerical rating system.”
f. MSG-3 Revision91
In 1987, after seven years of MSG-3 experience, the first revision was
undertaken and released; revision 2 followed in 1993. Revision 2001 was incorporated in
2001, and revision 2002, issued in 2002, is now in effect.

MSG-3 is intended to facilitate the development of initial


scheduled maintenance. The remaining maintenance (that is non-
scheduled or non-routine maintenance) consists of maintenance actions to
correct discrepancies noted during scheduled maintenance tasks, other
non-scheduled maintenance, normal operation or data analysis.

The analysis process identifies all scheduled tasks and intervals based on
the aircraft’s certificated operating capabilities.

“The management of the scheduled maintenance development activities”


should be accomplished by an Industry Steering Committee (ISC), which consists of
members from representatives of operators, and prime airframe and engine
manufacturers. “The ISC should see that the MSG-3 process identifies 100%
accountability for all Maintenance Significant Items (MSI’s) and Structural Significant
Items (SSI’s).”

An MSI is an item that has been identified by the manufacturer whose


failure

• can affect ground or flight safety, and/or

• is undetectable during operation time, and/or

• could have significant operational and/or economic impact.92

91 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 9-13.
92 ATA MSG-3, page 87.

An SSI is any “element or assembly,” related to significant flight, ground,
pressure or control loads. An SSI failure could affect the structural integrity of the
aircraft.93

“One or more working groups, composed of specialist representatives


from the participating operators, the prime manufacturer and the Regulatory Authority,
may be constituted.” The ISC will approve analyses, technical data and information,
which will be “consolidated into a final report for presentation to the Regulatory
Authority.”
g. General Development of Scheduled Maintenance94
For each new type of aircraft, it is necessary to develop scheduled
maintenance prior to its introduction into airline service. The MSG-3 (revision 2002)
document has the primary purpose “to develop a proposal to assist the Regulatory
Authority in establishing initial scheduled maintenance tasks and intervals for new types
of aircraft and/or powerplants.” The intention is to maintain and to enhance the inherent
“safety and reliability levels of the aircraft.” As operating experience is gained, the
operator may make additional adjustments to maintain and to enhance safety and
reliability.

The objectives of efficient aircraft scheduled maintenance are

• To ensure the inherent safety and reliability levels of the aircraft;

• “To restore safety and reliability to their inherent levels when deterioration
has occurred;”

• “To obtain the information needed for design improvement of those items
whose inherent reliability proves insufficient;”

• To achieve the above goals at a minimum total cost.

From the above objectives, obviously, scheduled maintenance can only


prevent deterioration of inherent levels. If the inherent levels are unsatisfactory, then
redesign is necessary to achieve the desired safety and reliability levels.
93 Ibid, page 89.
94 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 14-16.

Scheduled maintenance consists of two groups of tasks:

(1) “A group of scheduled tasks to be accomplished at specified


intervals. The objectives of these tasks are to prevent deterioration of the inherent safety
and reliability levels of the aircraft.” They may include lubrication/servicing (LU/SV),
operational/visual check (OP/VC), inspection/functional check (IN/FC), restoration (RS)
and discard (DS).

(2) A group of non-scheduled tasks that result from the scheduled


tasks accomplished at specified intervals, and reports of malfunctions usually created by
the operating crew and data analysis. The objectives of these tasks are to bring the aircraft
to a desired condition.

An efficient program schedules only those tasks necessary to meet the


fixed objectives. Additional tasks, which will increase cost without any significant
improvement in reliability, are not scheduled. The MSG-3 document “describes the
method for developing the scheduled maintenance” using a “guided logic approach.” The
logic flow of analysis is “failure-effect oriented” while the result must be a task-oriented
program. Items with no scheduled task specified may be monitored by an operator’s
reliability program. Finally, assumptions that can result in a change must be documented.
h. Divisions of MSG-3 Document95
The working portions of MSG-3 are contained in four sections. They are a
section for System/Powerplant, including components and Auxiliary Power Units
(APU’s); a section for aircraft structure; a section for zonal inspection; and finally a
section for lightning/high intensity radiated field (L/HIRF) analysis. “Each section
contains its own explanatory material and decision logic diagram, and it may be used
independently of other MSG-3 sections.”

In the following sections (i through p), Aircraft Systems/Powerplant


Analysis is further discussed because it obviously has the closest potential relationship
with SUAVs applications.
i. MSI Selection96
95 The material from this section is taken (in some places verbatim) from: ATA MSG-3, page 16.
96 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 22-23.

A progressive logic diagram is the evaluation technique applied to each
maintenance significant item (MSI) using the technical data available. An MSI may be a
system, a subsystem, module, component, accessory, unit or part. In general, the
evaluations are based on the item’s functional failures and causes of the failure.

Before MSG-3 logic can be applied to an item, the aircraft’s significant


systems and components must be identified. Then, using the top-down approach, MSIs
are identified. The MSI selection process is as follows (a minimal screening sketch
appears after this list):

(1) The manufacturer divides the aircraft into the main functional
areas, Air Transport Association (ATA) systems, and subsystems. This division continues
“until all the aircraft’s replaceable components have been identified.”

(2) “The manufacturer establishes the list of items to which MSI


selection questions will be applied.”

(3) Those questions applied to the items in the lists are

(a) “Could failure be undetectable or not likely to be


detected by the operating crew during normal duties?” (Detectability)

(b) Could failure affect safety on ground or in flight?


(Safety part of severity)

(c) “Could failure have a significant operational impact?”


(Operational part of severity)

(d) “Could failure have a significant economic impact?”


(Economic part of severity)

(4) Subsequent analysis.

(a) If at least one of the above four questions is answered


with “yes,” MSG-3 analysis is required. “An MSI is usually a system or subsystem,” and
in most cases is “one level above the lowest level identified” on (1). “This level is
considered the highest manageable level; i.e. one that is high enough to avoid
unnecessary analysis, but low enough to be properly analyzed.”

(b) For those items for which all four questions are
answered with a “no,” MSG-3 analysis is not required. “The lower level items should be
listed to identify those that will not be further assessed.” This list must be reviewed and
approved by the Industry Steering Committee (ISC).

(5) The resulting list for the highest manageable level items is
considered the “candidate MSI list” and is presented by the manufacturer to the ISC. The
ISC reviews and approves this list, which is passed to the working groups (WGs).

(6) The WGs review the candidate MSI list in order “to verify that
no significant items have been overlooked, and that the right level for the analysis has
been chosen.” By applying MSG-3 analysis, the WGs can “validate the selected highest
manageable level or propose modification of the MSI list to the ISC.”
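
A minimal sketch of the four-question screen described in step (3) is shown below: an item becomes a candidate MSI if at least one answer is “yes,” while items with four “no” answers are listed for ISC review instead. The function and argument names are illustrative and are not part of the ATA MSG-3 document.

```python
# Hedged sketch of the four-question MSI screen described above; names are
# illustrative only.
def msi_screen(item, undetectable, affects_safety, operational_impact, economic_impact):
    answers = {
        "failure could be undetectable by the operating crew": undetectable,
        "failure could affect safety on ground or in flight": affects_safety,
        "failure could have a significant operational impact": operational_impact,
        "failure could have a significant economic impact": economic_impact,
    }
    return {"item": item, "candidate MSI": any(answers.values()), "answers": answers}

result = msi_screen("fuel pump", undetectable=False, affects_safety=True,
                    operational_impact=True, economic_impact=False)
print(result["candidate MSI"])  # True -> MSG-3 analysis is required
```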
j. Analysis Procedure97
For each MSI, the following must be identified:

• Function(s), the “normal characteristic actions of an item”

• Functional Failure(s), the failure of an item to perform its planned


function(s)

• Failure Effect(s), the result of a functional failure

• Failure Cause(s), the reason for the functional failure occurrence

Analysis should take special care to “identify the functions of all


protective devices,” and include economic and safety related tasks in order to “produce
initial scheduled maintenance tasks and intervals.” Vendor recommendations (VR) that
are available should be “considered and discussed in the WGs meetings and accepted if
they are applicable and effective.”

A preliminary work sheet, prior to applying the MSG-3 logic diagram to


an item, clearly defines the MSI, its function(s), functional failure(s), failure cause(s) and
additional data for each item.
k. Logic Diagram98
97 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 23-24.
98 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 24-25.
The decision logic diagram, illustrated in Figure 2 and 3, assists in
analyzing systems in general and powerplant items in particular. The logic flow follows a
top-down approach and answers the “yes” or “no” questions giving the direction of the
analysis flow after each answer.

There are two levels in the decision analysis:

“(1) Level 1 requires the evaluation of each functional failure in


order to determine the failure effect category; i.e. safety, operational, economic, hidden
safety or hidden non-safety.

(2) Level 2 then takes the failure cause(s) for each functional
failure into account for selecting the specific type of task(s).”

In Level 2, regardless of the answer to the first question about


lubrication/servicing (LU/SV), the next task selection question must always be asked.
When following the hidden or evident safety effects path, all successive questions must
be asked. In the remaining categories that follow the first question, a “yes” answer
permits exiting the logic.

Default logic concerns areas paths that do not affect safety. If there is no
“adequate information to a clear ‘yes’ or ‘no’ to the questions in the second level, then
default logic dictates a ‘no’ answer.” “No,” as an answer in most cases, provides a more
conservative and/or costly task.

[Decision logic for evident functional failures: Level 1 routes the failure to safety, operational, or economic effects; Level 2 asks, in sequence, whether lubrication/servicing, inspection/functional check, restoration, discard, or a combination of tasks is applicable and effective, with redesign desirable or (for safety effects) mandatory when no effective task is found.]
Figure 2. Systems Powerplant Logic Diagram Part 1 (After ATA MSG-3, page 18)

[Decision logic for hidden functional failures: Level 1 asks whether the combination of the hidden functional failure and one additional failure has an adverse effect on operating safety; Level 2 asks, in sequence, whether lubrication/servicing, operational/visual check, inspection/functional check, restoration, discard, or a combination of tasks is applicable and effective, with redesign desirable or (for safety effects) mandatory when no effective task is found.]
Figure 3. Systems Powerplant Logic Diagram Part 2 (After ATA MSG-3, page 20)

l. Procedure
This procedure requires consideration of the functional failures,
failure causes, and the applicability or effectiveness of each task. Each
functional failure processed through the logic will be directed into one of
five failure effect categories: 99

• Safety

• Operational

• Economic

• Hidden safety

• Hidden non-safety100
m. Fault Tolerant Systems Analysis101
“In MSG-3 analysis, a fault tolerant system is one that has redundant
elements that can fail without impacting safety or operating capability.” Such failures are
not evident to the operating crew, and the aircraft’s safety and airworthiness are not
impaired. So, “functional failures, in fault tolerant systems, are hidden non-safety.” These
“fault-tolerant” faults can be “detected by interrogation of the system.”

The method for analyzing MSIs that include fault-tolerant functions has
the following steps:

• “The manufacturer identifies and lists all functions, highlighting those that
are fault-tolerant.”

• The basis for identifying fault-tolerant functions must be provided.

• “For non-fault-tolerant functions, the standard analysis process must be


used.”

• “For fault-tolerant functions, the WGs must determine and select an


applicable and effective task and interval, based on the available data from
the manufacturer.”

99 ATA MSG-3, page 25.


100 Ibid, page 21.
101 The material from this section is taken (in some places verbatim) from: ATA MSG-3, page 26.

n. Consequences of Failure in the First level102
There are four first-level questions; a brief routing sketch follows their descriptions.
(1) Evident or Hidden Functional Failure. Question: “Is the
occurrence of a functional failure evident to the operating crew during the performance of
normal duties?”
The intention for this question is to separate the evident from the
hidden functional failures. The operating crew is the pilots and air crew on duty. The
ground crew is not part of the operating crew. A “yes” answer indicates the functional
failure is evident and leads to Question 2. A “no” answer indicates the functional failure
is hidden and leads to Question 3.
(2) Direct Adverse Effect on Safety. Question: “Does the
functional failure or secondary damage resulting from the functional failure have a direct
unfavorable effect on operating safety?”
A direct functional failure or resulting secondary damage
“achieves its effect by itself, not in combination with other functional failures.” If the
consequences of the failure condition would “prevent the continued safe flight and
landing of the aircraft and/or might cause serious or fatal injury to human occupants,”
then safety should be considered as unfavorably affected. A “yes” answer indicates that
this functional failure must be considered within “the Safety Effects category” and task(s)
must be developed accordingly. A “no” answer indicates the effect is either “operational
or economic” and leads to question 4.
(3) Hidden Functional Failure Safety Effect. Question: “Does the
combination of a hidden functional failure and one additional failure of a system related
or back-up function have an adverse effect on operating safety?”
This question is asked of each hidden functional failure, identified
in Question 1. A “yes” answer indicates that there is a “safety effect and task
development must proceed in accordance” with the hidden-function safety-effects
category. A “no” answer indicates that there is a “non-safety effect and will be handled in
accordance” with hidden-function non-safety effects category.

102 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 26-30.

(4) Operational Effect. Question: “Does the functional failure
have a direct unfavorable effect on operating capabilities?”
In this question, considerations must be taken concerning the
operating restrictions, correction prior to further dispatch, and abnormal or emergency
procedures from the flight crew. A “yes” as an answer means that the effect of the
functional failure has an unfavorable effect on operating capability, and task selection
will be handled in evident operational effects category. A “no” as an answer means that
there is an economic effect and should be handled in accordance with evident economic
effects category.
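
The routing implied by these four questions can be summarized compactly. The sketch below is a simplified paraphrase of the Level 1 logic, with illustrative parameter names; it is not text from ATA MSG-3.

```python
# Hedged sketch of the MSG-3 Level 1 routing described above: the answers to
# the first-level questions place each functional failure in one of the five
# effect categories.  Parameter names are illustrative.
def level1_category(evident_to_crew, direct_safety_effect=False,
                    hidden_plus_one_safety_effect=False, operational_effect=False):
    if evident_to_crew:
        if direct_safety_effect:
            return "evident safety"
        return "evident operational" if operational_effect else "evident economic"
    # Hidden functional failure: judged in combination with one additional failure
    return "hidden safety" if hidden_plus_one_safety_effect else "hidden non-safety"

print(level1_category(evident_to_crew=False, hidden_plus_one_safety_effect=True))
# -> hidden safety
```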
o. Failure Effect Categories in the First Level103
After the analysts have answered the applicable first-level questions, “they
are directed to one of the five effect categories.”
(1) Evident Safety: The Evident Safety Effect category concerns
the safety operation assurance tasks. “All questions in this category must be asked.” In
case no effective task(s) results from this category analysis, “redesign is mandatory.”
(2) Evident Operational: In this category, a task is “desirable if it
reduces the risk of failure to an acceptable level.” Analysis requires the first question
(LU/SV) to be answered and regardless of the answer, to proceed to the next level
question. From that point a “yes” as an answer completes the analysis and “the resultant
task(s) will satisfy the requirements.” If all answers are “no,” then no task has been
generated and if operational penalties are severe, redesign may be desirable.
(3) Evident Economic: In this category, a task(s) is desirable if its
cost is less than the repair cost. Analysis has the same logic as the operational category. If
all answers are “no,” then no task has been generated and if economic penalties are
severe, a redesign may be desirable.
(4) Hidden Safety: “The hidden function safety effect requires a
task(s) to assure the availability necessary to avoid the safety effect of multiple failures.”
All questions must be asked and “if there are no tasks found effective, then redesign is
mandatory.”

103 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 31-38.

(5) Hidden Non-Safety: “The hidden function non-safety category
indicates that a task(s) may be desirable to assure the availability necessary to avoid the
economic effects of multiple failures.” Analysis has the same logic as the operational
category. If all answers are “no,” no task has been generated and if economic penalties
are severe, a redesign may be desirable.
p. Task Development in the Second level104
For each of the five-effect categories, task development is used in a similar
manner. “It is necessary to apply the failure causes for the functional failure to the second
level of the logic diagram” for the task resolution, as in Table 2. There are six possible
task follow-on questions in the effect categories (a simplified selection sketch follows
their descriptions).
(1) Lubrication/servicing (in all categories). Question: “Is the
lubrication or servicing task applicable and effective?”

“Any act of lubrication or servicing for the purpose of maintaining


the inherent design capabilities” is considered.
(2) Operational/visual check (hidden functional failure categories
only). Question: “Is a check to verify operation applicable and effective?”

“The operational check is a task to determine that an item is


fulfilling its intended purpose.” It is a failure-finding task and does not require
quantitative tolerances. “A visual check is an observation to determine that an item is
fulfilling its intended purpose.” It is also a failure-finding task and does not require
quantitative tolerances.
(3) Inspection/functional check (All categories). Question: “Is an
inspection or functional check to detect degradation of function applicable and
effective?”

An inspection could be general visual; detailed, with surface cleaning or elaborate access
procedures; or special detailed, with extensive surface cleaning and substantial access and
disassembly procedures. “A functional check is a quantitative
check to determine if one or more functions of an item performs within specified limits.”

104 The material from this section is taken (in some places verbatim) from: ATA MSG-3, pages 31-47.

(4) Restoration (All categories). Question: Is a restoration task to
reduce the failure rate applicable and effective?

Restoration is the “work necessary to return the item to a specific


standard.” The scope of each assigned restoration task has to be clearly specified.
(5) Discard (All categories). Question: Is a discard task to avoid
failures or reduce the failure rate applicable and effective?

Discard is the “removal from service of an item at a specified life


limit.” It is a typical task applied to single celled parts such as cartridges, canisters,
filters, engine disks, etc.
(6) Combination (Safety categories only). Question: Is there a
task or combination of tasks applicable and effective?

All possible paths must be analyzed since this is a safety category


question.
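
A simplified sketch of this task-selection sequence follows. It compresses the Level 2 logic described above: the lubrication/servicing question is always asked, non-safety paths may exit on the first effective task, and safety paths consider every question, with redesign mandatory when nothing effective is found. Task and parameter names are illustrative.

```python
# Hedged, simplified sketch of MSG-3 Level 2 task selection; not a verbatim
# encoding of the ATA MSG-3 logic diagram.
TASK_SEQUENCE = ["lubrication/servicing", "operational/visual check",
                 "inspection/functional check", "restoration", "discard"]

def select_tasks(effective, safety_category=False):
    """effective: dict mapping task name -> True if applicable and effective."""
    chosen = []
    # The lubrication/servicing question is always asked; the next question is
    # asked regardless of its answer.
    if effective.get(TASK_SEQUENCE[0], False):
        chosen.append(TASK_SEQUENCE[0])
    for task in TASK_SEQUENCE[1:]:
        if effective.get(task, False):
            chosen.append(task)
            if not safety_category:          # non-safety paths may exit on a "yes"
                return chosen
        # safety paths keep asking every question
    if not chosen:
        return ["redesign (mandatory)" if safety_category
                else "no task (redesign may be desirable)"]
    return chosen

print(select_tasks({"restoration": True, "discard": True}))   # ['restoration']
print(select_tasks({}, safety_category=True))                 # ['redesign (mandatory)']
```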

Lubrication or Servicing
Applicability: The replenishment of the consumable must reduce the rate of functional deterioration.
Safety effectiveness: The task must reduce the risk of failure.
Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
Economic effectiveness: The task must be cost effective (i.e., the cost of the task must be less than the cost of the failure prevented).

Operational or Visual Check
Applicability: Identification of failure must be possible.
Safety effectiveness: The task must ensure adequate availability of the hidden function to reduce the risk of a multiple failure.
Operational effectiveness: Not applicable.
Economic effectiveness: The task must ensure adequate availability of the hidden function, to avoid economic effects of multiple failures, and must be cost effective.

Inspection or Functional Check
Applicability: Reduced resistance to failure must be detectable, and there exists a reasonably consistent interval between a deterioration condition and functional failure.
Safety effectiveness: The task must reduce the risk of failure to assure safe operation.
Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
Economic effectiveness: The task must be cost effective.

Restoration
Applicability: The item must show functional degradation characteristics at an identifiable age, and a large proportion of units must survive to that age. It must be possible to restore the item to a specific standard of failure resistance.
Safety effectiveness: The task must reduce the risk of failure to assure safe operation.
Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
Economic effectiveness: The task must be cost effective.

Discard
Applicability: The item must show functional degradation characteristics at an identifiable age, and a large proportion of units must survive to that age.
Safety effectiveness: The safe life limit must reduce the risk of failure to assure safe operation.
Operational effectiveness: The task must reduce the risk of failure to an acceptable level.
Economic effectiveness: An economic life limit must be cost effective.

Table 2. Task Selection Criteria (After ATA MSG-3, page 46)

3. Comparison of Existing Methods
a. RCM
It is clear that maintenance activity must help ensure that the inherent
levels of safety and reliability of the aircraft are maintained.

The days of doing maintenance just for the sake of maintenance or


because it makes us “feel good” are past. Studies have revealed that
technicians performing maintenance based on “trivial knowledge” rather
than the air carrier’s approved maintenance program have generated
errors. In other cases, technicians performing approved maintenance that
was not necessary have also generated maintenance errors. Each time we
provide technicians access to an aircraft, we also provide the potential for
that technician to inadvertently induce an error.105

In simple terms, the RCM goals are to:

• Ensure realization of the equipment’s inherent safety and reliability.

• Restore equipment’s safety and reliability to required levels when


deterioration occurs.

• Obtain the information necessary for design improvements where inherent


reliability is insufficient.

• Accomplish these goals at a minimum total life-cycle cost. 106

The RCM logic is simply to:

• Determine the function of the system/component;

• Find out what the functional failures are;

• Evaluate the consequences of each failure; and,

• Assign the least expensive but adequate maintenance task to prevent each
failure.107

105 Nakata, Dave, White paper, “Can Safe Aircraft and MSG-3 Coexist in an Airline Maintenance
Program?”, Sinex Aviation Technologies, 2002, Internet, May 2004. Available at: http://www.sinex.com/
products/Infonet/q8.htm
106 The above part is taken (in some places verbatim) from: Nakata.
107 The above part is taken (in some places verbatim) from: National Aeronautics and Space
Administration (NASA), “Reliability Centered Maintenance & Commissioning,” slide 5, February 16,
2000, Internet, May 2004. Available at: http://www.hq.nasa.gov/office/codej/codejx/Intro2.pdf
b. Conducting RCM Analysis
Some managers, who see RCM as a quick, cheap and easy route to
obtaining the particular maintenance policies they are seeking, frequently overrule junior
staff taking part in RCM analysis. This is a poor approach to the conduct of any analysis.
RCM is better conducted by a review group, which may involve senior staff alongside
more junior staff. An experienced analyst with a developed background in RCM and in
managing groups should lead it. If the group functions poorly, it is improper to blame
RCM for what project management has failed to achieve.108
c. Nuclear Industry & RCM109
The initial maintenance programs in US nuclear power plants were
developed in conventional fashion, mainly depending on vendor recommendations.
“Continuing efforts to enhance safety and reliability” resulted in “utility management at
some plants” questioning whether the overall outcome was a “significant degree of over-
maintenance.” By the early 80s, the nuclear power industry seemed to be “faced with a
choice of either generating power or doing the prescribed planned maintenance (PM).”
They were seeking a way to reduce the PM workloads without impairing safety or
reliability. This is the same type of question applicable to SUAV maintenance.

The Electric Power Research Institute (EPRI) became aware of the


Nowlan & Heap report on RCM published in 1978. However, after the initial applications
of RCM, many plants developed their own methods for maintenance optimization, which
deviated from RCM principles. “They took the view that high levels of redundancy in
their safety systems, high levels of regulations imposing failure-finding tasks, and the
fairly simple mission of the power generating systems at such plants could validly
support certain simplifications of the methodology.” They also took the view that in older
plants the existing experience had found all potential failure modes, and there was a very
detailed record keeping conducted by the nuclear power industry. So “they felt that the
function analysis and the FMEA steps embodied in the RCM process could be
simplified.”
108 The above part is taken (in some places verbatim) from: Clarke Phill, “Letter to the Editor of New
Engineer Magazine regarding Professor David Sherwin at ICOMS 2000,” question 10, August 2000,
Internet, May 2004. Available at: http://www.assetpartnership.com/downloads.htm-13k
109 The material from this section is taken (in some places verbatim) from: Moubray, page 3.

“The most abbreviated approach,” recommended by EPRI in TR-105365
in September 1995, “modified the RCM process by setting up a list of simple functional
questions.” Without further functional analysis, the question is simply whether the
component failure leads to:

(1) plant trip (shutdown),

(2) power reduction of more than 5% (degradation),

(3) loss of a safety function,

(4) plant transient (recoverable),

(5) personnel hazard, or

(6) delay in start-up (mission delay)?

“These processes achieved their limited objectives in the nuclear industry”


and they led to a very substantial reduction in the PM workload without impairing safety
or reliability. Effects (1), (2), (3), and (5) are noticeable in SUAV operations. Event (4) is
rarely tracked. Event (6) occurs routinely but is rarely recorded.
d. RCM in NAVAIR110
“As reported by US Naval Air Command (NAVAIR), current operation
and support (O&S) costs for naval aviation weapon systems consume 50 to 60 percent of
the Navy’s total operating account” with a tendency to increase every year by a rate of 5
percent.

NAVAIR, which was one of the sponsors of the original Nowlan & Heap
report, found that some vendors were using all sorts of unique and custom-made
processes, which they described as “RCM processes,” to develop maintenance programs
for equipment that they were selling to NAVAIR. “In this age of ‘do more with less,’
there is a problem that has infected the discipline of physical asset management. In the
interest of saving time and money, corrupted versions of RCM, versions that

110 The material from this section is taken (in some places verbatim) from: Regan, Nancy, RCM Team
Leader, Naval Air Warfare Center, Aircraft Division, “US Naval Aviation Implements RCM,” undated,
Internet, February 2004. Available at: http://www.mt-online.com/articles/0302_navalrcm.cfm
irresponsibly shorten the process, continue to flood the market. These tools are
incorrectly called RCM.”

These wayward RCM processes led NAVAIR to approach the Society of


Automotive Engineers (SAE) as a recognized standard-setting institution with close
relations to the US Military and to the aerospace industry, and SAE JA 1011 was
published in August 1999. It is a brief document setting out all the minimum criteria that
any process must include to be called an RCM process when applied to any particular
asset or system.111

When NAVAIR initially implemented RCM in some systems, the


economic savings included, on average:

• Scheduled maintenance reduced by 75 percent per year.

• Consumable usage decreased 88 percent per year.

• Disposal of hazardous material decreased 84 percent per year.


e. RCM in Industries Other Than Aviation and Nuclear Power112
RCM has been applied in many industrial sites in many countries. “These
applications have embodied the performance of several thousand RCM analyses.” RCM
applications have not been successful in every case. It can be said to have failed in about
one-third of the cases. None of the initiatives that failed was due to technical reasons but
for organizational ones. The two most common reasons for failure are

(1) The head internal sponsor of the effort “quit the organization or
moved to a different position before the new ways of thinking embodied in the RCM
process” could be absorbed.

(2) The internal sponsor and/or the consultant, who was the acting
change agent, “could not generate sufficient enthusiasm for the process,” so it was not
applied in a way which would yield results.

111 The material from the above part of section is taken (in some places verbatim) from: Aladon Ltd,
“About RCM.”
112 The material from this section is taken (in some places verbatim) from: Moubray, page 5.

Of course, the other two-thirds have been successful. There is “a high
correlation between the success rate of RCM-2 (MSG-3) applications and the change
management capabilities of the consultants involved.” For example, the (British) Royal
Navy (RN), which is a major user of SAE-compliant RCM, “has come to understand that
the capabilities of individual consultants are as important as the track record of their
employers.” So the “RN now insists on interviewing at great length every RCM
consultant that is at their disposal” to verify the commercial sincerity of the employers.

When discussing RCM, both the economic benefits and the question of
risk are considerations. For the economic benefits in some cases, “the payback period has
been measured in days and sometimes one or two years.” The normal period is weeks to
months. “These economic benefits flow from improved plant performance” mostly,
although in some cases users (especially military) have achieved very substantial
“reductions in direct maintenance costs”.

It is often said that RCM “is a good tool for developing maintenance
programs in ‘high risk’ situations” and that “some equipment items have such low impact
on business risk that the effort required to perform RCM analysis on them is greater than
the potential benefits.” The truth is that “no physical asset or system can be deemed to be
‘low risk’ unless it has been subjected at the very least to a zero-based FMECA” that
proves it is in fact low risk.

(1) From the results of thousands of RCM-2 (MSG-3) analyses that


are being performed around the world, and from incidents in supposedly “low risk”
industries, some industries have avoided very serious business consequences.

(2) On average about 4% of the failure modes have direct safety or


environmental implications. Frequently, findings showed that as many as 25% of the
failure modes are not currently receiving any form of preventive maintenance. Most of
those failure modes concern protective devices that had not been receiving proper
attention prior to the RCM-2 analysis.

About the supposedly “low risk” industries: automobile and food plants
are frequently said to be “low risk,” and therefore not worth strict and rigorous analysis.

The truth is that you cannot characterize these industries as low risk as the following
examples indicate:

(1) The boiler that exploded during a maintenance inspection at


Ford’s River Rouge plant in Detroit in February 1999, killing six and shutting the plant
down for 10 days,

(2) The failure of the Firestone tires on Ford Explorers, which has
been attributed to the design, the operating pressure, and manufacturing process failures.
These failures put the existence of Firestone as a company at risk,

(3) The failure of a filter used in the Perrier water bottling in


France, leading to the recall of thousands of Perrier products and an enormous cost to the
company.

Although such events are rare, it is wrong to characterize a task, a component, or a
failure as “low risk,” especially if all failure modes have not been considered.
f. FMEA and RCM113
“An FMEA, usually conducted in the design phase of an equipment or
system, can also be used as a tool for analysis in RCM.” While defining the functions and
desired standards of performance of an asset, the objectives of maintenance with respect
of that asset are defined. Defining functional failures enables us to explain what we mean
by “failed.” These two issues were addressed by the first two questions of the RCM
process. The next two questions seek to identify the failure modes that are reasonably
likely to cause each functional failure and to find out the failure effects associated with
each failure mode. This is done by performing an FMEA for each functional failure. An
FMEA contains (a minimal worksheet sketch follows this list):

• Description and detection for each failure mode

• Cause and effects of each failure

• Probability of failure (occurrence)

• Criticality of failure (severity)

113 The material from this section is taken (in some places verbatim) from: NASA, slide 13.

• Corrective/preventive measures
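
One minimal way to record these elements is sketched below. Ranking severity, occurrence, and detection on 1-to-10 scales and multiplying them into a risk priority number (RPN) is a common FMEA convention rather than something this thesis or its sources mandate; the field names and example values are illustrative.

```python
# Hedged sketch of one FMEA worksheet row built from the elements listed
# above; the RPN convention and the example values are illustrative only.
from dataclasses import dataclass

@dataclass
class FmeaRow:
    function: str
    failure_mode: str
    cause: str
    effect: str
    severity: int      # 1 (negligible) .. 10 (hazardous)
    occurrence: int    # 1 (remote) .. 10 (very frequent)
    detection: int     # 1 (almost certain to be detected) .. 10 (undetectable)
    corrective_action: str = ""

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

row = FmeaRow("provide pitch control", "elevator servo jams", "FOD in gear train",
              "loss of air vehicle", severity=9, occurrence=4, detection=7)
print(row.rpn)  # 252 -> high priority for corrective or preventive action
```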

FMEA is the key to a successful commissioning program. For newly


developed systems, where the developing parties have little accumulated experience,
oversight is limited, and many potential circumstances are unknown, requirements are
neither standard nor certain. Requirements for systems under development, such as UAVs,
are a matter of research, experience, and technology advances.
g. FMECA
Performing an FMEA on a new system under development, such as a
SUAV, is not an easy task because many details keep changing. Instead, an FMECA is
better, since critical issues are considered first. FMECA is a practical first step in such a
case.
h. FTA, FMEA, FMECA114
“The question to be addressed when considering the most appropriate
system analysis tool is whether to conduct a FMECA/FMEA or a FTA.” The most
obvious answer to that decision making question is “it depends”. The “criticality of a
mission and/or personnel safety” is the primary driving concern and the usual reason for
a FTA. A FTA’s target is “finely focused” on a single point, whereas a FMECA addresses
a broader area rather than one point.

“If there are many different areas of concern and all of them need to be
revealed, then a FMECA is more effective because it has a greater chance of finding the
critical failure modes.” If only a single event or a few events that can be clearly defined
are of crucial concern, then FTA is favored.

The desire for either a qualitative and/or a quantitative analysis is not the
distinguishing factor for selecting a FTA or a FMECA/FMEA. Either approach can give
qualitative or quantitative results. The following table gives guidance for choosing
between FTA and FMECA/FMEA.

114 The material from this section is taken (in some places verbatim) from: Reliability Analysis Center
(RAC), Fault Tree Analysis (FTA) Application Guide, 1990, pages 8-10.
FTA preferred when:
• Safety of personnel or public is the primary concern
• There is a small number of explicitly defined “top events”
• Mission completion is of critical importance
• “Human errors” contributions are of concern
• “Software errors” contributions are of concern
• A numerical “risk evaluation” is the primary concern
• The system is highly complex and interconnected

FMECA/FMEA preferred when:
• It is not possible to clearly define a small number of “top events”
• Any number of successful missions is of concern
• “All possible” failure modes are of concern
• The system has a linear architecture with little human or software intervention
• The system is not repairable

Table 3. FTA and FMECA/FMEA (After RAC FTA, page 10)

i. FTA115
For any reliability program, FTA is an effective tool. It is a quick way of
“understanding the causes of a system’s inherent problems” and also a way to “identify
potential safety hazards during the design phase.”

Tailoring a FTA to the specific type of analysis needed requires two decisions. The first is
selecting the “top event,” the target on which the FTA is to focus; the second is deciding
whether the analysis should yield qualitative results, quantitative results, or both.
j. RCM Revisited
“RCM is better in the operating and support phase of the life cycle of a
system.” This is true when considering how and why RCM was created. For example,
airplanes have been in use since the beginning of the previous century. The general concept of
the airplane has been known for many years. Legal requirements and special regulations
controlling manned-aviation have also been in place for many years. Thus, in this case,

115 The material from this section is taken (in some places verbatim) from: RAC FTA, pages 9-11.

RCM provided solutions to certain manned-aviation problems mainly related to
operations and maintenance issues with safety and economics as backgrounds. Similarly,
many other industries also employed RCM to solve such problems.

By definition, RCM is a methodology for determining the most cost-


effective maintenance strategy for a given item of equipment taking into account its
operating environment. When a product is in the design phase, the designers have little
historical experience, so the whole effort is focused on developing something that works,
not on cost-effective maintenance strategies. The following table gives guidance for
choosing between MSG-3 and FMEA/FMECA.

FMEA/FMECA vs MSG-3 Selection Criteria (columns: MSG-3 Preferred, FMEA/FMECA Preferred)
Safety of personnel or public as the primary concern X
Top-down approach of failure analysis X
Bottom-up approach of failure analysis X
System is highly complex and interconnected X
Early design and development phase X
Implementation cost X
Implementation timescale X
Economy issues are of critical importance X
“All possible” failure modes are of concern X X
“Human errors” contributions are of concern X X
“Software errors” contributions are of concern X
Systems with little human and a lot of software intervention X
First tool for initial failure analysis X
Available for the entire system life-cycle (long-term) X
Available for the entire system life-cycle (short-term) X
Implementation effort X
Operational phase X
Conducted by experienced personnel X
Training requirements X
Extensive and conclusive X
System with linear architecture and little human or software X
intervention

Table 4. MSG-3 and FMECA/FMEA

k. UAVs, SUAVs versus Manned Aircraft
The primary difference between manned piloted aircraft and UAVs is that
piloted aircraft rely on the presence of humans to detect (sense) and respond to changes
in the vehicle’s operation. The human can sense the condition of the aircraft, for example
unusual vibration that may indicate structural damage or impending engine failure.
Humans can sense events within and outside the vehicle, gaining what is known as
“situational awareness.”

For manned military aviation the philosophy is pilot and aircraft-oriented.


The valuable life of the pilot who spends so much time in studies, training, and gaining
the experience of hundreds of flight hours, is the number one factor. The expensive, state-
of-the-art multi-mission-capable aircraft is the number two factor. For UAVs, the
philosophy is mission and cost-oriented. Different missions require different systems,
different platforms with different capabilities. It is also desired that the cost should
remain as low as possible. Technology helps to achieve both those goals for UAVs.
Better, cheaper technologies can be adapted very easily and very quickly to UAVs.

UAVs can be remotely piloted (“controlled”) from the ground. It is difficult for the pilot (operator) to feel and sense, and so to have the same or better situational awareness than if he were piloting a manned aircraft. For SUAVs specifically, volume, weight, cost, duration of flight, and sensor capabilities are the primary factors of interest. Personnel safety is approached differently than in manned aviation. With costs from $15K up to $300K per platform, SUAVs are considered expendable but reusable, and are treated accordingly. Thus SUAV reliability is low, since they are designed to be inexpensive and have a relatively short life cycle.

During the last few years, commanders no longer want their SUAVs to be “toys” that extend their capabilities only uncertainly. Commanders want their SUAVs to be operationally effective assets that help win battles. “Operationally, the same case may be made for ensuring the missions are completed if we rely on UAVs to accomplish mission critical tasks once done using manned assets.”116

116 Clough, Bruce, “UAVS-You Want Affordability and Capability? Get Autonomy!” Air Force
Research Laboratory, 2003.
There are some facts about SUAV systems that require consideration:

(1) They are potentially valuable on battlefields.

(2) Unreliability creates operational ineffectiveness.

(3) SUAV design philosophy remains mission and cost oriented.

(4) Software and hardware reliability improvement is desirable.

(5) Tracking reliability of SUAVs provides insight into operational availability. Currently, there is no system in use to track SUAV reliability.

(6) Most SUAVs are not maritime systems; they are in the design phase or in operational testing.

(7) Sensor and miniaturization technology for SUAVs changes rapidly.

(8) Systems are not highly complex.

(9) The new unmanned aviation “community” has started to develop;


experience operating SUAVs has just started to accumulate.

(10) Human factors for the GCS are critical since they are the linkage
between the system and its effective employment.
l. Conclusions: Three Main Considerations about UAV RCM
The reliability tracking and improvement system for SUAVs must be inexpensive, easily and quickly adapted, and implementable by a few, relatively inexperienced personnel. It must also cover the entire system: hardware, software, and human factors. The safety requirements for personnel apply only to the ground operators and maintainers, and the main source of data for hidden failures during flight can only be telemetry. Finally, because sensor technology is rapidly developing and easily implemented due to low cost, the reliability tracking and improvement system for SUAVs must be easily adaptable to changes.

From the above we can construct the following table which summarizes
the basic differences between the MSG-3 and FMEA/FMECA methods with respect to
SUAVs:

SUAVs                                                           RCM MSG-3          FMEA/FMECA
1 Reliability improvement needed X X


2 Mission and cost oriented X
3 Operational testing and development phase X
4 Rapid changes in technology X
5 Inexpensive and easily adapted methodology X
6 Telemetry is used a lot (Hidden failure difficult to identify) X
7 Safety for operating personnel is not a critical issue X
8 Experienced personnel difficult to find X
9 Human factors for GCS is critical X X

Table 5. Comparing RCM MSG-3 and FMEA/FMECA for SUAVs.

So, the main considerations about RCM implementation for SUAVs are:

(1) Safety has an important role in RCM methodology because of


the nature of civil aviation. The primary goal for civil aviation is to transport people and goods safely. Safety standards and strict rules are the top priority, and so they become a priority in RCM analysis. For industries where RCM has been applied, safety has almost the same role as in the aviation case because of strict regulations and standards for operators and employees. In the UAV case, however, there are no people onboard, so safety for travelers and crew is not as critical an issue.

(2) In the RCM process, the key factor for the initial identification of a hidden failure is the flight crew. In the UAV case, there is no crew aboard, and so there is no chance for a crew to sense hidden failures. The only indications that might be available are the platform’s control sensor readings while in flight and the system’s performance while a platform is tested on the ground prior to takeoff.

(3) Experience gained in civil aviation cannot be applied directly to


UAVs.

From the above it is clear that RCM MSG-3 is not suitable for SUAVs. This leaves fault tree analysis and FMEA as the remaining methods. We develop both in detail for the SUAV in the subsequent chapters of this thesis.
B. SMALL UAV RELIABILITY MODELING
During recent urban operations in Iraq and Afghanistan, SUAVs that provide
over-the-hill or around-the-corner information were invaluable for operating teams. Some
systems have been tested with very good results, but controversy surrounds the
capabilities of such systems. A generic SUAV system must provide military forces with
real-time around the clock surveillance, target acquisition, and battle assessment. Such a
system must be capable of detecting any desired tactical information in a designated
sector.

Each service component (Navy, Army, and Marines) requires versatile, easy to
handle, and user-friendly systems that enable the commander to conduct reconnaissance
on the battlefield in real-time. SUAVs are being seriously considered for this role. This
entails a small-scale operation over a city block, or more extensive surveillance missions.
Requirements of the system include locating and identifying targets, then relaying the
information to a higher command. The detection accuracy should be sufficient to select
and to deploy weapons, and then to maintain contact after engagement with such
weapons. The system must be able to survey a large area rapidly using multiple platforms
simultaneously. The configuration of the system should enhance the fighting capabilities
of the force, minimizing the time for precise control movements and maximizing
mobility, robustness, and functionality. Based on previous experience with similar systems, reliability and interoperability are the most important considerations.
1. System’s High Level Functional Architecture
As illustrated in Figure 4, a SUAV battlefield system’s high-level architecture consists of the following:

(1). Platform(s)

(a) Navigation with Global Positioning System (GPS) and Inertial


Navigation System (INS)

(b) Flight control with remote manual, semi-auto, and full-auto


(autonomous) mode of operation

(c) Onboard computer (OBC)

(d) Payload with the appropriate sensors for the type of mission

(2). Ground control station (GCS) with command, monitor and support
capabilities.117 This may be shipboard or land-based.

[Figure 4 block diagram: the platform (navigation with GPS and INS; flight control with remote manual, semi-auto, and autonomous operation; onboard computer hardware, software, and peripherals; payload sensors) linked by communication channels to the ground control station (command and monitor).]
Figure 4. High Level Architecture of a SUAV System (After Fei-Bin)

For more detailed system architecture, refer to Figure 5:

117 Fei-Bin, Hsiao, and others, ICAS 2002, 23rd International Congress of Aeronautical Sciences,
proceedings, Toronto Canada, 8 to 13 September, 2002, Article: “The Development of a Low Cost
Autonomous UAV System”, Institute of Aeronautics National Cheng Kung University Tainan, TAIWAN
ROC.
[Figure 5 block diagram: the platform (structure, engine, fuel tank, batteries, onboard computer, autopilot, flight controls, servo units, landing gear, payload with cameras and other sensors, GPS, INS, and communication receivers, transmitters, and antennas); line-of-sight (LOS) communication channels; the ground control station (antennas, receivers, transmitters, onboard computer, flight controls with auto, semi-auto, and manual control, sensor control, screen output, power supply, battery charger, start-up device, launching device, landing auto recovery unit, and spare parts); personnel (pilot, load/sensor operator, maintenance team); and environmental considerations in field operation (temperature, humidity/precipitation, cloudiness, lightning, fog, altitude, icing conditions, wind speed, and proximity to sea, desert, or inhabited areas).]
Figure 5. Simple Block Diagram of a SUAV System
For the platform’s configuration, weight and volume are critical factors because of the limited size and the flight characteristics of the platform. The system is a complex one, and reliability plays an important role in its operational effectiveness.
In general, there are two ways to increase reliability: Fault tolerance and fault
avoidance.118

(1) Fault tolerance can be accomplished through redundancy in hardware


and/or in software. The disadvantage is that it increases the complexity of an already
complex system, as well as increasing equipment costs, volume, weight, and power
consumption.

(2) Fault avoidance can be accomplished by improving reliability of


certain components that constitute the system. In general, those components that
contribute the most to reliability degradation are the most critical for fault avoidance.
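To make the trade concrete, the following minimal sketch (with illustrative reliability values, not measured SUAV data) compares the two approaches for a single component using the standard series/parallel reliability relations:

    # Illustrative sketch: fault tolerance (redundancy) versus fault avoidance
    # (a better single component). Reliability values are assumed examples only.

    def parallel(reliabilities):
        # Redundant components: the function fails only if every copy fails.
        failure_prob = 1.0
        for r in reliabilities:
            failure_prob *= (1.0 - r)
        return 1.0 - failure_prob

    r_baseline = 0.90                        # assumed single-component reliability
    r_redundant = parallel([0.90, 0.90])     # fault tolerance: duplicate the component
    r_improved = 0.97                        # fault avoidance: improve the component itself

    print(f"baseline:        {r_baseline:.3f}")
    print(f"fault tolerance: {r_redundant:.3f}")  # 0.990, but adds weight, volume, power, and cost
    print(f"fault avoidance: {r_improved:.3f}")   # no added equipment, but requires better parts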

The SUAV system cannot implement fault tolerance, at least for the platforms, so fault avoidance is the better approach. To achieve this, we must first conduct an FMEA in order to define and identify each subsystem function and its associated failure modes for each functional output.

In order to proceed in the FMEA, as analysts we will need the following:

(1) System definition and functional breakdown,

(2) Block diagram of the system,

(3) Theory of operation,

(4) Ground rules and assumptions,

(5) Software specifications.

As a second step, we conduct a criticality analysis in order to identify those


mission critical elements that cause potential failures and weaknesses.

118 Reliability Analysis Center (RAC), Reliability Toolkit: Commercial Practices Edition. A Practical
Guide for Commercial Products and Military Systems Under Acquisition Reform, 2004, page 115.
To perform these analyses, we will use a qualitative approach due to the lack of failure rate data and the lack of the appropriate level of detail on part configuration.119
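As a minimal sketch of how such a qualitative FMEA entry could be recorded for later criticality ranking (the field names and the sample entry are illustrative assumptions, not part of any standard worksheet):

    # Sketch of a qualitative FMEA record; field names and the example are illustrative.
    from dataclasses import dataclass

    @dataclass
    class FmeaEntry:
        item: str            # subsystem or component from the functional breakdown
        function: str        # what the item is supposed to do
        failure_mode: str    # how the functional output can fail
        failure_effect: str  # effect on the mission or next higher level
        severity: str        # qualitative class I-IV (see Table 8)
        occurrence: str      # qualitative level A-E (see Table 9)

    example = FmeaEntry(
        item="Engine",
        function="Provide mechanical power for thrust",
        failure_mode="Engine stalls in flight",
        failure_effect="Loss of thrust and possible loss of platform",
        severity="I",
        occurrence="D",
    )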
2. System Overview
The airborne system comprises the aerial platform and an onboard system. The
ground system comprises a PC and a modem to communicate with the airborne system.
All the onboard hardware is packed in a suitable model platform powered by a 1.5-kilowatt (kW) aviation fuel (JP-5) engine, with a wingspan of 1.5 meters (m) and a fuselage diameter of 12 centimeters (cm). The sensor payload is about two kilograms (kg).

The onboard computing system is being developed on a PC-based single-board computer. The onboard computer (OBC) runs a multi-tasking, real-time operating system.
The OBC can obtain data from the GPS, the INS, the communication system and the
onboard flight and mission sensors. It computes the flight control and navigation
algorithms, commands the sensor payload, and stores and downlinks data to the GCS in
near real-time operation.

The GCS PC is the equivalent of a pilot’s cockpit. It can display in near real time
the status of the flying UAV or UAVs including:

• UAV(s) position and GCS position

• Speed

• Altitude

• Course

• Attitude and system health in visual pilot-like instruments

• The actual position can also be displayed on an electronic moving map.

• Output from the mission sensors such as near real time imagery displayed
from various types of cameras like CCD, infrared (IR) and others.

119 Reliability Analysis Center (RAC), Failure Mode, Effects and Criticality Analysis (FMECA),
1993, pages 9-13.
3. System Definition
[Figure 6 block functional diagram: the same platform, communication, ground control station, personnel, and environmental elements as Figure 5, arranged by function.]
Figure 6. Simple Block Functional Diagram of a SUAV System
Using the diagram in Figure 6, we give the following functional definitions to
each element in the diagram.

Platform Structure: The flying physical asset responsible for integration of all
the necessary equipment for the mission profile.

Antennas: Responsible for conducting the transmitted and received signals


to/from the GCS and passing them from/to transmitters or receivers and the appropriate
communication hardware and software.

Payload, Cameras and Other Sensors: The actual physical assets for the type of desired mission, consisting mainly of cameras and other special sensors such as NBC agent detectors, magnetic disturbance detectors, and more.

GPS: The primary navigation system based on a satellite network known as the
Global Positioning System.

INS: The support navigation system based on the inertial calculations of current
speed and course in order to provide an accurate platform fix that will be used for piloting
the platform and for target tracking.

Engine: The unit responsible for providing mechanical power to be used in conjunction with the propeller to provide thrust to the platform.

Battery: The electric power supply asset for the entire platform’s equipment
service.

Flight Controls: The necessary flight sensors, such as pitot tubes, the hardware (ailerons, elevators, rudder), the relevant servo units, and the flight controller, together with the right software for manual, semi-auto, and autonomous flight.

Proper Software: The necessary software for platform mission control.

Landing Gear: Responsible for platform mobility on the ground during takeoff and landing. Its use is not mandatory.

Fuel Tank: Storage of fuel necessary for engine operation.

GCS: The manned shipboard or land-based component of the system, serving as the command, control, communication, and support center.

GCS Flight Controls: The GCS hardware and software for flight controls.

Sensors Control: Main factor responsible for mission performance. Manually


operated with auto capabilities.

Screen Output: The outcome of the systems’ performance presented on a


monitor with all the relevant information for the mission and the system.

GCS Antennas: Conducts the transmitted and received signals to and from the
platform and other centers related to the mission and passes them to/from transmitters or
receivers and the appropriate communication hardware and software.

GCS Proper Software: The necessary software for GCS mission control.

Battery Charger: Charges the platform battery.

Start-up Device: Responsible for the initial start-up of the platform’s engine prior to takeoff.

Spare Parts: Necessary items for operating and supporting the system.

Power Supply: Generator and batteries that provide the GCS electric power.

Personnel: A pilot, a load/sensor operator, and maintainers who man the system
for one shift.

Launching Device: Launches the platform.

Landing Auto Recovery Unit: Provides auto guidance to the platforms for auto-
landings.
4. System Critical Functions Analysis
The SUAV essential functions analysis can be seen in Table 6.

Mission phases (columns): Stand-by, Launch, Cruise to Area of Interest, On Station, Off Station, Cruise Back to Base, Land. An x marks the phases in which each essential function is required.

Item  Essential Functions
Flight
1 Provide structural integrity x x x x x x x
2 Provide lift and thrust x x x x x
Provide controlled flight
3 Manual control x x x x x
4 Semi auto x x x x x
5 Auto x x x x x
6 Navigate x x x x
7 Provide power to control x x x x x x
and navigation equipment
8 Withstand environmental x x x x
factors (mainly wind)
Mission
9 Start systems x
10 System’s backup x x
11 Communications x x x x
12 Line of sight x x x x
13 Provide power to sensors x x x x
and communications
14 Detect, locate and identify x x x
targets
15 Provide data x x x
16 Provide video image x x x
17 Monitor system’s functions x x x x x x x

Table 6. System’s Essential Functions Analysis

5. System Functions
The mission phase consists of the following functions:

• Launch the platform

• Fly the platform

• Control, Command and Communicate with the platform

• Control, Command and Communicate with the platform sensors

• Perform surveillance and reconnaissance

• Detect targets

• Identify targets

• Classify targets

• Track targets

• Perform battle assessment

• Know platform’s position

• Sustain flight mission for a certain time at a certain altitude at a certain


speed and on a certain course

• Return to base and land safely

• Service the platform at a certain time and set it ready for the next mission

These functions are the primary drivers for software development and are among the factors for hardware selection.
6. Fault Tree Analysis
In the following fault-tree analysis of a SUAV system, a top-down analysis has been used to reveal the failure causes. The sub-analyses end with a circle, which means that further analysis is needed at a more detailed level, or end with a diamond, which means that the analysis stops there. Due to a lack of data, only the mechanical engine failure has been analyzed at more than one level. Using that analysis, we formulate a model to use as an example for further analysis.

7. Loss of Mission
The first attempt for the fault-tree analysis should be the loss of the mission tree.
The reasons for mission loss may be:

(1) Loss of platform

(2) Loss of GCS

(3) Unable to locate platform (loss of platform’s position)

(4) Inappropriate mission for the sensors (wrong choice of sensors)

(5) Sensor(s) failure

(6) Unable to launch platforms for various reasons, such as weather or


launching device failure

(7) Unable to communicate with the platform

(8) Loss of the operator(s)

(9) Loss of the onboard platform’s or GCS’s computer

(10) Out-of-system reasons, such as weather conditions or situational factors.

Figure 7 illustrates the tree analysis for loss of mission.

Figure 7. Loss of Mission
8. Loss of Platform
The reasons for loss of platform may be:

(1) Loss of platform’s structural integrity

(2) Loss of platform’s lift

(3) Loss of thrust

(4) Loss of platform’s control

(5) Loss of GCS

(6) Loss of platform’s position

Figure 8 illustrates the tree analysis for loss of platform.

Figure 8. Loss of Platform
9. Loss of GCS
The reasons for loss of GCS may be:

(1) GCS software failure

(2) Loss of OBC

(3) Loss of GCS power

(4) Loss of GCS communication

(5) Loss of GCS personnel

(6) Environmental reasons (e.g. heavy weather conditions, earthquake)

(7) Fire

Figure 9 presents the tree analysis for loss of GCS.

Figure 9. Loss of GCS
10. Loss of Platform’s Structural Integrity
The reasons for loss of platform’s structural integrity include fuselage, wing, or
empennage related problems, which could be due to:

(1) Fracture

(2) Pressure overload

(3) Thermal weakening

(4) Delamination or fiber buckling

(5) Structural connection failure or

(6) Operator error.

Figure 10 contains the fault-tree analysis for loss of platform’s structural integrity.

Figure 10. Loss of Structural Integrity
11. Loss of Lift
Reasons for loss of lift may be:

(1) Loss of thrust

(2) Operator error, or

(3) Loss of wing surface, which could be due to loss of right or left wing
surface, which in turn could be due to:

(a) Fracture removal

(b) Pressure overload

(c) Thermal weakening

(d) Delamination or fiber buckling

(e) Structural connection failure or

(f) Operator error

Figure 11 shows the fault-tree analysis for loss of lift.

Figure 11. Loss of Lift
12. Loss of Thrust
Reasons for loss of thrust may be:

(1) Loss of engine control

(2) Operator error

(3) Loss of propeller that could be due to:

(a) Propeller structural failure

(b) Propeller disconnection

(c) Operator error

(4) Loss of engine, which could be due to:

(a) Engine failure

(b) Engine stalling, which could be due to:

((1)) Failure of fuel system

((2)) Operator error

((3)) Air filter failure

((4)) Air filter clogged

((5)) Engine control failure

Figure 12 shows the tree analysis for loss of thrust.

Figure 12. Loss of Thrust
13. Loss of Platform Control
Reasons for loss of control may be:

(1) Loss of lift

(2) Loss of control channel

(3) Loss of power, which could be due to:

(a) Total loss of platform’s power

(b) Loss of control unit power

(4) Loss of aileron forces that could be due to:

(a) Loss of left wing aileron force that could be due to:

((1)) Loss of onboard computer (OBC)

((2)) Disruption of control cables

((3)) Loss of servo unit

((4)) Loss of aileron surface

(b) Loss of right-wing aileron force, for the same reasons as the left-wing aileron

(5) Loss of rudder force, for the same reasons as the left-wing aileron

(6) Loss of elevator force, for the same reasons as the left-wing aileron

Figure 13 illustrates the tree analysis for loss of platform’s control.

Figure 13. Loss of Platform’s Control
14. Loss of Platform Position
Reasons for loss of platform position may be:

(1) Loss of line of sight (LOS)

(2) Loss of INS backup

(3) Loss of GPS unit

(4) Loss of GPS antenna

(5) Loss of GPS signal

(6) Platform failure to transmit

Figure 14 shows the tree analysis for loss of platform’s position:

Figure 14. Loss of Platform’s Position
15. Loss of Control Channel
The reasons for loss-of-control channel may be:

(1) Operator or pilot control panel failure

(2) Loss of LOS

(3) Failure of control receiver

(4) Failure of GCS control transmitter

(5) Loss of power, which could be due to:

(a) Loss of platform’s power

(b) Loss of GCS power

(6) Loss of platform control antenna, which could be due to:

(a) Antenna disconnection

(b) Short-circuit in antenna

(c) Antenna failure

(d) Structural damage

(7) Loss of GCS control antenna, for the same reasons as loss of the platform control antenna

Figure 15 illustrates the tree analysis for loss of control channel.

Figure 15. Loss of Control Channel
16. Engine Control Failure
Engine control failure may be caused by:

(1) Disruption of control cables

(2) Loss of OBC

(3) Loss of LOS

(4) Loss of servo unit

(5) Carburetor failure

(6) Engine failure

The fault-tree analysis for engine control failure can be seen in Figure 16.

Figure 16. Engine Control Failure
17. Engine Failure
The reasons for engine failure may be:

(1) Mechanical engine failure

(2) Excessive engine vibration

(3) Fuel/air improper mixture

(4) Improper fuel

(5) Engine fire

(6) Loss of lubrication, which could be due to:

(a) Gas and lubricant improper mixture

(b) Excessive engine temperature rise

(c) Improper lubricant

The fault-tree analysis for engine failure can be seen in Figure 17.

Figure 17. Engine Failure
18. Failure of Fuel System
The reasons for fuel system failure may be:

(1) Failure of engine fuel system, which could be due to:

(a) Fuel pump line failure

(b) Fuel pump failure

(c) Fire

(d) Penetration of fuel lines

(e) Carburetor failure

(2) Loss of fuel supply, which could be due to:

(a) Fuel tank lines failure

(b) Fire and/or explosion

(c) Fuel depletion

(d) Penetration of fuel lines

(e) Penetration of fuel tank

(f) Hydrodynamic ram

The fault-tree analysis for fuel system failure can be seen in Figure 18.

Figure 18. Fuel System Failure
19. Loss of Platform Power
The reasons for loss of platform power may be:

(1) Wiring short-circuit

(2) Fuse failure that could be due to:

(a) Circuit problem

(b) Improper fuse

(3) Battery failure that could be due to:

(a) Battery discharge

(b) Improper battery

(c) Battery disconnection

(d) Battery short-circuit

(e) Battery exhaustion

(f) Battery not fully charged

Figure 19 illustrates the fault-tree analysis for loss of platform power.

Figure 19. Loss of Platform Power
20. Loss of GCS Power
Reasons for loss of GCS power may be:

(1) Wiring short-circuit

(2) Fuse failure that could be due to:

(a) Circuit problem

(b) Improper fuse

(3) Main and auxiliary power failure

(4) Power disconnection

(5) Loss of GCS generator

Figure 20 shows the fault-tree analysis for loss of GCS power.

Figure 20. Loss of GCS Power
21. Operator Error
Reasons for operator error may be:

(1) Inadequate personnel training

(2) Personnel fatigue

(3) Personnel frustration and lack of experience

(4) Inadequate man machine interface

(5) Operator’s wrong reaction to failure

(6) Misjudgment due to environmental reasons (mainly weather)

(7) Poor documentation of procedures

(8) Poor workload balance resulting in task saturation with resulting loss
of situational awareness

(9) Ergonomics (Human factors) of GCS

Figure 21 illustrates the fault-tree analysis for operator error.

Figure 21. Operator Error
22. Mechanical Engine Failure
Reasons for mechanical engine failure may be:

(1) Bad material of engine parts:

(a) Engine block

(b) Cylinder head

(c) Connecting rod(s)

(d) Piston(s)

(e) Piston rings

(f) Bearings

(g) Crankshaft

(2) Normal engine wear

(3) Bad manufacture of engine parts

(4) Bad design of the whole engine or engine parts

(5) Insufficient or bad maintenance

(6) Carburetor failure

(7) Inappropriate engine operation

(8) Overheating

(9) Crash damage, which is due to operator’s error

(10) Engine vibrations.

Figure 22 shows the fault-tree analysis for mechanical engine failure.

Figure 22. Mechanical Engine Failure
23. Engine Vibrations
Reasons for engine vibrations may be:

(1) Broken piston

(2) Bearing failure

(3) Broken piston rings

(4) Bad manufacture of engine parts like:

(a) Cylinder head

(b) Connecting rod(s)

(c) Piston(s)

(d) Piston rings

(e) Bearings

(f) Crankshaft

(5) Bad design of the whole engine or engine parts

(6) Improper engine mounting

(7) Lack of propeller balancing

Figure 23 shows the fault-tree analysis for engine vibrations.

Figure 23. Engine Vibrations
24. Overheating
Reasons for engine overheating may be:

(1) Broken piston rings

(2) Bearing failure

(3) Bad manufacture of engine parts like:

(a) Cylinder head

(b) Connecting rod(s)

(c) Piston(s)

(d) Piston rings

(e) Bearings

(f) Crankshaft

(4) Bad design of the whole engine or engine parts

(5) Dirty cooling surfaces

(6) Bad lubricant

(7) Engine operating too fast due to:

(a) Improper propeller size

(b) Improper engine adjustments

(c) Inappropriate fuel

(8) Bad material of engine parts

Figure 24 illustrates the fault-tree analysis for engine overheating.

Figure 24. Overheating
25. Inappropriate Engine Operation
Reasons for inappropriate engine operation may be:

(1) Improper engine adjustment, mounting, disassembly

(2) Inappropriate fuel/lubricant mixture

(3) Improper propeller size

(4) Inappropriate fuel and/or lubricant

(5) Engine stall (during flight)

(6) Bad carburetor adjustments

(7) Inappropriate engine cleaning and/or storage after flights

(8) Inappropriate lean runs (starting after a long period of storage without
any precautions) such as rusted bearings, seized connecting rod or piston, dry piston rings

(9) Propeller stops abruptly (due to external reason) while turning.

The fault-tree analysis for inappropriate engine operation can be seen in Figure
25.

Figure 25. Inappropriate Engine Operation
26. Follow-on Analysis for the Model
The occurrence of the top event is due to different combinations of basic events.
A fault tree provides useful information about these combinations. In this approach, we
introduce the concept of the “cut set.” A cut set is “a set of basic events” whose occurrence results in the top event. A cut set is said to be a “minimal cut set” if, whenever any basic event is removed from the set, the remaining events no longer form a cut set.120

For example, Figure 26 shows that the set {1, 2, 3, and 4} is a cut set because if
all of the four basic events occur, then the top event occurs.

[Figure 26: the top example event is an AND gate over basic events 3 and 4 and an OR gate over basic events 1 and 2.]
Figure 26. Example for Cut Set. (After Kececioglu, page 223)

120 Kececioglu, D., Reliability Engineering Handbook Volume 2, Prentice Hall Inc., 1991, page 222.

This is not the minimal cut set, however, because if the basic event 1 or basic
event 2 is removed from this set, the remaining basic events {1, 3 and 4} and {2, 3 and 4}
still form cut sets. These two sets are the minimal cut sets in that example.
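In Boolean form, using the union (+) and intersection (*) notation adopted later for the MOCUS discussion, the Figure 26 example can be written as T = (X1 + X2) * X3 * X4 = X1*X3*X4 + X2*X3*X4, which exhibits the two minimal cut sets {1, 3, 4} and {2, 3, 4} directly.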

In the SUAV case, there is an absence of AND gates. Only OR gates are present.
For example, trying to find the minimal cuts for engine failure, gates in the following
diagrams are involved:

(1) Engine failure diagram (E1)

(2) Mechanical engine failure (E3)

(3) Engine vibrations (E5)

(4) Operator error (L5)

(5) Overheating (E4)

(6) Inappropriate engine operation (E6)

Naming the gates G1, G2, up to G8, we number each basic event related to each of the gates. For example, in the engine failure diagram we have gate G1 with the following basic events:

(1) Mechanical engine failure that corresponds to gate G2 in Diagram E3.

(2) 1G1, engine fire

(3) 2G1, improper fuel

(4) 3G1, fuel/air improper mixture

(5) 4G1, excessive engine vibrations

(6) G3, the gate that corresponds to loss of lubrication

(a) 1G3, improper lubricant

(b) 2G3, excessive engine temperature rise

(c) 3G3, gas/lubricant improper mixture

Working in the same way we end up with the diagram in Figure 27.

Figure 27. Engine Failure Combined Diagram
According to the MOCUS algorithm, which generates the minimal cut sets for a fault tree in which only AND and OR gates exist, an OR gate increases the number of cut sets while an AND gate increases the size of a cut set.121 The MOCUS “algorithm is best explained by an example.”122 In the following paragraphs, the steps of the MOCUS algorithm were followed to determine the minimal cut sets.

Locating the uppermost gate, which is the OR gate G1, we replace the G1 gate with a vertical arrangement of the inputs to that gate. Were it an AND gate, we would have replaced it with a horizontal arrangement of the inputs to that gate. Continuing to the next level, locating the gates and replacing them in the above-prescribed way, yields Table 7.
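The substitution rule can also be expressed as a short program. The following sketch uses an assumed toy gate structure (not the full Figure 27 tree) purely to illustrate how OR gates multiply the number of cut sets while AND gates enlarge them:

    # Minimal MOCUS sketch: OR gates add new cut sets, AND gates add events to a
    # cut set. The gate dictionary is an assumed toy example, not the Figure 27 tree.

    gates = {
        "G1": ("OR",  ["1G1", "2G1", "G2", "G3"]),
        "G2": ("AND", ["1G2", "2G2"]),
        "G3": ("OR",  ["1G3", "2G3"]),
    }

    def mocus(top, gates):
        cut_sets = [[top]]
        expanded = True
        while expanded:
            expanded = False
            new_sets = []
            for cs in cut_sets:
                gate = next((e for e in cs if e in gates), None)
                if gate is None:
                    new_sets.append(cs)
                    continue
                expanded = True
                kind, inputs = gates[gate]
                rest = [e for e in cs if e != gate]
                if kind == "OR":                      # vertical arrangement: one cut set per input
                    new_sets.extend(rest + [i] for i in inputs)
                else:                                 # horizontal arrangement: inputs share one cut set
                    new_sets.append(rest + inputs)
            cut_sets = new_sets
        minimal = [set(cs) for cs in cut_sets]
        return [cs for cs in minimal if not any(other < cs for other in minimal)]

    print(mocus("G1", gates))
    # Minimal cut sets: {1G1}, {2G1}, {1G2, 2G2}, {1G3}, {2G3}. With only OR gates,
    # as in the SUAV engine tree, every minimal cut set is a single basic event.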

121 Kececioglu, page 222.


122 Hoyland, page 88.

1 2 3 4 5 …… …… Last
G1 1G1 1G1 1G1 1G1
2G1 2G1 2G1 ……
3G1 3G1 3G1 4G1
4G1 4G1 4G1 1G2
G2 1G2 1G2 ……
G3 2G2 2G2 6G2
3G2 3G2 1G3
4G2 4G2 ……
5G2 5G2 ……
6G2 6G2 ……
G4 1G4 ……
G5 ….. ……
G6 7G4 ……
G7 1G5 ……
1G3 ……
2G3 9G5
3G3 1G6
……
7G6
G8
1G7
......
11G7
1G3 ……
2G3 ……
3G3 3G8

Table 7. Cut Set Analysis. (After Kececioglu, page 229)

In the last column of Table 7, we have the set of minimal cut sets for the engine failure, which is ({1G1},{2G1},{3G1},{4G1},{1G2},{2G2},…,{1G8},{2G8},{3G8}). The reason the minimal cut sets are all one-element sets is the presence of only OR gates and the absence of AND gates.

An equivalent approach to the MOCUS algorithm starts from the lowermost gates. It replaces an OR gate with the union (+) sign and an AND gate with the intersection (*) sign and, after all the expressions are obtained, continues the procedure with the gates one step above the lowermost gates. It continues in this way until the expression for the top event is obtained.123

Following this algorithm, we end up with the same result as the MOCUS algorithm, given as an expression of intersections and unions. In our case, we end up with E1 = 1G1 + 2G1 + 3G1 + 4G1 + 1G2 + 2G2 + … + 1G8 + 2G8 + 3G8. The diagram equivalent to that expression is given in Figure 28.

Figure 28. Equivalent Diagram

123 Kececioglu, page 230.

Converting to the equivalent block-diagram representation, we end up with the “chain-like” representation that can be seen in Figure 29. A fault-tree representation of a system can be converted into a block-diagram representation by replacing the AND gates with parallel boxes and the OR gates with boxes in series.124

Engine Failure

1G1 2G1 3G1 4G1 1G2 2G2 …. 1G8 2G8 3G8

Figure 29. Equivalent Block Diagram

In a series structure, the component with the lowest reliability is the most
important one. We can compare that with a chain. A chain is never stronger than its
weakest link. So the most important element for reliability improvement is the one with
the lowest reliability.125 Reliability for a series system can also be explained by the use of structural functions, which are summarized in Appendix D.
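For such a series (chain) structure, the system reliability is the product of the element reliabilities, so the weakest element dominates. A small sketch with assumed placeholder values:

    # Series ("chain") structure: system reliability is the product of element
    # reliabilities, so the weakest link drives improvement. Values are assumed
    # placeholders, not measured SUAV data.
    from math import prod

    element_reliability = {"1G1": 0.999, "2G1": 0.995, "3G1": 0.98, "4G1": 0.90}

    r_series = prod(element_reliability.values())
    weakest = min(element_reliability, key=element_reliability.get)

    print(f"series reliability: {r_series:.4f}")  # lower than the weakest element alone
    print(f"improve first:      {weakest}")       # the lowest-reliability element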
27. Criticality Analysis
For the criticality matrix, we need a metric for the severity-of-failure effect, so we
can use the designations in Table 8.

Description Classification Mishap definition


Catastrophic I System or platform loss
Critical II Major system damage
Marginal III Minor system damage
Minor IV Less than minor system
damage.

Table 8. Classification of Failures According To Severity (After RAC FMECA, page 26)

124 Blischke, page 220.


125 Hoyland, page 197.

Due to the absence of historical data, it is appropriate to use a qualitative approach for the classification of failures according to their occurrence level, which is the overall probability of failure during the item’s operating time interval, as illustrated in Table 9.126

Level   Occurrence              Description                              Occurrence number
A       Frequent                High probability of occurrence           > 0.20
B       Reasonably probable     Moderate probability of occurrence       > 0.10 and < 0.20
C       Occasional              Occasional probability of occurrence     > 0.01 and < 0.10
D       Remote                  Unlikely probability of occurrence       > 0.001 and < 0.01
E       Extremely unlikely      Essentially zero                         < 0.001

Table 9. Classification of Failures According To Occurrence
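The qualitative levels of Table 9 can be assigned mechanically once a rough probability estimate exists; the following sketch simply encodes the Table 9 thresholds (the five-percent example value is an assumption):

    # Map an estimated probability of failure over the operating interval to the
    # qualitative occurrence level of Table 9. Thresholds follow Table 9.
    def occurrence_level(p):
        if p > 0.20:
            return "A"   # frequent
        if p > 0.10:
            return "B"   # reasonably probable
        if p > 0.01:
            return "C"   # occasional
        if p > 0.001:
            return "D"   # remote
        return "E"       # extremely unlikely

    print(occurrence_level(0.05))   # assumed example estimate -> "C"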

From our previous analysis for engine failure using FTA, we ended up with the
following reasons:

a. Excessive engine vibrations

b. Fuel/air improper mixture

c. Improper fuel

d. Engine fire

e. Gas and lubricant improper mixture

f. Excessive engine temperature rise

g. Improper lubricant

h. Inadequate personnel training

126 RAC FMECA, page 60.

i. Personnel fatigue

j. Operator’s frustration and lack of experience

k. Inadequate man machine interface

l. Operator’s wrong reaction to failure

m. Environmental reasons

n. Misjudgment due to environmental reasons (mainly weather)

o. Poor documentation of procedures

p. Poor workload balance resulting in task saturation with resulting loss of


situational awareness

q. Ergonomics (Human factors) of GCS

r. Bad material

s. Normal engine wear

t. Bad manufacture

u. Bad design

v. Insufficient maintenance

w. Carburetor failure

x. Broken piston

y. Bearing failure

z. Improper engine mounting

aa. Lack of propeller balancing

bb. Broken piston rings

cc. Bearing failure

dd. Dirty cooling areas

ee. Improper propeller size

ff. Improper engine adjustments

gg. Broken piston rings

hh. Engine stalls (during flight)

ii. Bad carburetor adjustments

jj. Inappropriate engine cleaning and/or storage after flights

kk. Inappropriate lean runs such as rusted bearings, seized connecting rod
or piston

ll. Propeller stops while turning.

From the above, we can derive the following issues about an engine failure
criticality analysis, initially based on our own experience and judgment due to lack of
tracking by current operators:

Number  Issue                                ID    Probability of occurrence    Severity of failure effect
1 Excessive engine vibrations L1 D II
2 Engine fire L2 D I
3 Fuel type L3 D III
4 Lubricant type L4 D III
5 Fuel/air mixture adjustment L5 C III
6 Gas and lubricant mixture L6 D III
7 Personnel training P1 C II
8 Operator’s frustration P2 C II
9 Personnel experience P3 B III
10 Poor documentation of procedures P4 C II
11 Poor workload balance P5 C II
12 Ergonomics of GCS P6 C II
13 Misjudgment P7 B II
14 Environmental reasons P8 C II
15 Man machine interface P9 D III
16 Maintenance P10 D II
17 Engine adjustments P11 C III
18 Usage P12 B II
19 Manufacture P13 D III
20 Software failure S D II
21 Material M1 D I
22 Hardware failure M2 E III
23 Design M3 D II
24 Engine wear M4 D II
25 Carburetor M5 C II
26 Piston M6 E II
27 Bearing M7 C I
28 Piston rings M8 E I
29 Propeller size PR E II
30 Engine temperature T1 D II
31 Cooling areas T2 D II

Table 10. Qualitative Occurrence and Severity Table

Our next step is to construct the criticality matrix based on the previous
qualitative analysis table:

[Figure 30 plots the Table 10 issues on a criticality matrix: probability of occurrence (levels E through A, increasing upward) against severity classification (IV through I, increasing to the right).]
Figure 30. Engine Failure Criticality Matrix. (After RAC FMECA, page 33)

“The criticality matrix provides a visual representation of the critical areas” of our engine failure analysis.127 Items in the uppermost right corner of the matrix require the most immediate action and attention because they have a high probability of occurrence and a catastrophic or critical severity of effect. Moving diagonally toward the lower left corner of the matrix, criticality and severity decrease. When the same severity and occurrence

127 RAC FMECA, pages 33-34.

level apply to different items, safety and cost are the driving factors of the analysis. For SUAVs, we do not give safety great consideration because we are dealing with unmanned systems, but we do have to consider cost.
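One way to automate this reading of the matrix is to rank the issues by occurrence level and severity class together; the sketch below does so for a few Table 10 entries (the combined ordering rule is an illustrative assumption consistent with the matrix reading above):

    # Rank FMECA issues by occurrence level and severity class, mimicking the
    # upper-right-to-lower-left reading of the criticality matrix. Sample entries
    # come from Table 10; the combined ordering rule is an assumption.
    occurrence_rank = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
    severity_rank = {"I": 4, "II": 3, "III": 2, "IV": 1}

    issues = [
        ("Misjudgment",        "P7", "B", "II"),
        ("Bearing",            "M7", "C", "I"),
        ("Personnel training", "P1", "C", "II"),
        ("Fuel type",          "L3", "D", "III"),
    ]

    ranked = sorted(issues,
                    key=lambda i: (occurrence_rank[i[2]], severity_rank[i[3]]),
                    reverse=True)
    for name, ident, occ, sev in ranked:
        print(f"{ident:4} {name:20} occurrence={occ} severity={sev}")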

Table 11 shows the results from our analysis:

Number  Issue                                ID    Probability of occurrence    Severity of failure effect
1 Misjudgment P7 B II
2 Usage P12 B II
3 Bearing M7 C I
4 Personnel training P1 C II
5 Operator’s frustration P2 C II
6 Personnel experience P3 B III
7 Poor documentation of procedures P4 C II
8 Poor workload balance P5 C II
9 Ergonomics of GCS P6 C II
10 Environmental reasons P8 C II
11 Carburetor M5 C II
12 Fuel/air mixture adjustment L5 C III
13 Engine adjustments P11 C III

Table 11. Results from Engine Failure Criticality Analysis. The most critical issues are
highlighted.

28. Interpretation of Results


From the above, it is obvious how important the human factor is. The way the user operates the system (the ability to make the right decisions, frustration, training, experience, workload balance among the operators, and the quality of procedure documentation) is among the most critical factors for our engine failure mode. The way the user maintains the system, which is also related to training and experience, and the ability to adjust the engine and the fuel/air mixture properly are also among the critical contributors to the engine failure mode.

The importance of the bearing and the carburetor is clearly shown. Those two parts are the most critical among all the parts composing the engine, according to our analysis.

131
Finally, environmental factors complete the list of the most critical issues that could result in an engine failure.
III. DATA COLLECTION SYSTEMS

A. RELIABILITY GROWTH AND CONTINUOUS IMPROVEMENT


PROCESS
SUAVs do not have a FRACAS system. In this section of the thesis we construct one. The FRACAS system is addressed to the Program Manager of any SUAV type during the design, development, or operation phase.
1. Failure Reporting and Corrective Action System (FRACAS)128
“The basic measure of FRACAS effectiveness is its ability to function as a
closed-loop coordinated system” in identifying and repairing product and/or process
failure modes, and identifying, implementing and verifying a corrective action to prevent
repetition of the failure. “As a result, early elimination of causes of failure or trends,”
greatly improves reliability.

At each stage of product development, the closed-loop FRACAS should collect


and evaluate information for each failure incident, as shown in Figure 31.

128 The material for this section is taken (in some places verbatim) from: RAC Toolkit, pages 284-289,
and: National Aeronautics and Space Administration (NASA), “Preferred Reliability Practices: Problem
Reporting and Corrective Action System (PRACAS),” practice NO. PD-ED-1255, Internet, February 2004.
Available at: http://klabs.org/DEI/References/design_guidelines/design_series/1255ksc.pdf
[Figure 31: closed-loop FRACAS cycle, running from failure observation and documentation through failure verification, failure isolation, problematic item replacement and verification, data search, failure analysis, and root cause analysis, to determining the corrective action, incorporating it and performing operational performance tests, determining its effectiveness, and incorporating the corrective action into all products.]
Figure 31. Closed-loop for FRACAS (After NASA, PRACAS, page 2)

In order to conduct FRACAS, we need to follow a FRACAS flow and evaluation


checklist:
a. Failure Observation
In the first step, we identify that a failure incident has occurred and we
notify all required personnel about the failure.

b. Failure Documentation
We record all relevant data describing the conditions in which the failure
has occurred. A detailed description of the failure incident as well as supporting data and
equipment operating hours is needed.
c. Failure Verification
If the failure is permanent, then we verify the incident by performing tests
for failure identification. If the failure is not permanent, then we verify the incident by
uncovering the conditions in which the failure has occurred. Finally, if the failure cannot
be verified, we pay close attention to the reoccurrence of failure.
d. Failure Isolation
For failures that were verified, we perform testing and troubleshooting to
isolate their causes. Isolating failure can identify a defective part or parts of the system,
or it can relate the incident to other reasons, like operator’s error, test equipment failure,
improper procedures, lack of personnel training, etc.
e. Replacement of Problematic Part(s)
For the above failures, we replace the problematic part or parts with a
known good one and replicate the conditions under which the failure occurred. By testing, we confirm that the correct part (or parts) has been replaced. If the failure reappears, we repeat failure isolation in order to determine the cause of failure correctly. We have to
tag the replaced part or parts, including all relevant documentation and data.
f. Problematic Part(s) Verification
We have to verify the problematic part(s) independent of the system. If the
failure cannot be confirmed then we have to review failure verification and isolation to
determine the right failure part(s). The isolation of the failure to the lowest possible level
of the system’s decomposition is the key to reveal the root failure cause.
g. Data Search
In this step, it is necessary to look up historical databases and reports for
similar or identical failure documentations. Databases could be from the implementation
of FRACAS methodology itself or could be from a FMEA or other technical reports.
Failure tendencies or patterns, if any, must be evaluated because they may reveal

defective lots of parts, or bad design, or bad manufacturing, or even bad usage. This is
obviously absent for SUAV systems.
h. Failure Analysis
A failure analysis to determine the root failure cause follows next. The
depth and the extension of failure analysis depend on the criticality of the mission, the
system’s reliability impact, and the related cost. The outcome of failure analysis should specify failure causes and identify any external causes.
i. Root-Cause Analysis
This answers the question, “what could have been done to prevent
failure?” It focuses more on the true nature of failure, which could be due to:
• Overstress conditions
• Design error
• Manufacturing defect
• Unfavorable environmental conditions
• Operator or procedural error, etc
j. Determine Corrective Action
In this phase, we have to develop a corrective action. We have to rely on
the failure analysis and root-cause analysis results and our solution should prevent
reappearance of the failure in the long term in order to be effective. Corrective actions
could be:
• System redesign
• Part(s) redesign
• Selection of different parts or suppliers
• Improvements in processes
• Improvements in manufacturing etc
k. Incorporate Corrective Action and Operational Performance Test
Now, we can incorporate the identified corrective action in the failed
system and perform initial baseline tests as a start in order to verify the desired
performance. After the first successful results, our tests should become operational tests
including conditions under which the failure had occurred. After the documentation of all
test results, we can compare them with the pre-failure test results to identify alterations in baseline data. Testing should be sufficient to give us confidence that the original failure mode has been eliminated and will not reoccur. Before large-scale incorporation of a corrective action, the action must first be verified, to avoid unnecessary delays and expenses.
l. Determine Effectiveness of Corrective Action
We have to verify that our corrective action:
• Has successfully corrected the failure
• Has not created or induced other failures
• Has not degraded performance below acceptable levels

If the original failure reoccurs, we have to repeat the FRACAS process


from the beginning to determine the correct root cause.
m. Incorporate Corrective Action into All Systems
After verifying our corrective action in one system, we can implement our
solution to all similar systems. We have to keep the FRACAS procedure running in order
to track, document, report, and determine the correct root cause and the corrective action
necessary for all failure modes that appear. Corrective actions involve changes to
procedures, alterations to processes and personnel training, so tracking is necessary to
assure that the new versions were implemented correctly and not confused with old ones.
2. FRACAS Basics
Basically, the system must provide exact information on:

a. What was the failure?

b. How did the failure occur?

c. Why did the failure occur?


(1) Was it an equipment or part design error?
(2) Was it an equipment or part manufacturer workmanship error?
(3) Was it a software error?
(4) Was it a test operator error?
(5) Was it a test procedure or equipment error?
(6) Was it an induced failure?

d. How can we prevent such failures from reoccurring?

From all the above we can simplify the procedure to the next checklist shown in
Figures 32 and 33.
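Before turning to the checklist, the sketch below shows one possible electronic record for tracking a single incident through the closed loop; the field names, cause categories, and status handling are illustrative assumptions rather than part of the forms that follow.

    # Sketch of a closed-loop FRACAS record: each incident carries the what/how/why
    # answers and is closed only after its corrective action is verified.
    # Field names and status handling are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import List, Optional

    CAUSE_CATEGORIES = [
        "design error", "workmanship error", "software error",
        "test operator error", "test procedure or equipment error", "induced failure",
    ]

    @dataclass
    class FracasRecord:
        report_number: str
        what_failed: str                        # what was the failure?
        how_it_occurred: str                    # how did the failure occur?
        why_it_occurred: Optional[str] = None   # root cause, e.g. one of CAUSE_CATEGORIES
        corrective_action: Optional[str] = None
        action_verified: bool = False
        history: List[str] = field(default_factory=list)

        def close(self):
            # The loop is closed only when a verified corrective action exists.
            if self.corrective_action and self.action_verified:
                self.history.append("closed")
            else:
                self.history.append("still open: corrective action not verified")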

Figure 32. FRACAS Methodology Checklist page 1/2
Figure 33. FRACAS Methodology Checklist page 2/2
3. FRACAS Forms
I have developed forms to implement the FRACAS methodology for SUAVs:

a. Failure report as shown in Tables 12 and 13

b. Failure analysis report as shown in Table 14129

c. Corrective action verification report as shown in Table 15

d. Tag to problematic part as shown in Table 16.

e. Failure Log-Sheet as shown in Table 17.

During recent operations experimenting with the Surveillance and Tactical


Acquisition Network (STAN) project at Camp Roberts, observers identified reliability
and operational availability issues for SUAVs. As a result, I developed these forms for
use during upcoming operations with the XPV 1-B TERN SUAV system.130

These forms were presented to a VC-6 team for use during the STAN experiment in May 2004. The effort to implement these forms was not successful. The primary reason was lack of personnel training related to the FRACAS system itself, the filling out of the forms, and the general concept of reliability. The secondary reason was lack of coordination and control in filling out the forms. It was obvious that a member of the operating team needed to be assigned the additional task of coordinating and controlling proper data entry for the forms.

“It is preferable to attempt to communicate the ‘big picture,’ so that each team
member is sensitive to failure detection” and identification, and “the appropriate
corrective action process.” 131 Nevertheless, it is typical especially in military
applications to have overall control, so a centralized FRACAS administration within a
team or teams is needed.

The forms cover all aspects of SUAV design, development, production, and operation, with emphasis on experienced operating or test teams. All forms are addressed
129 RAC Toolkit, page 290.


130 Gottfried.
131 RAC Toolkit, page 284.

to the operating or test team. The Failure Analysis Report form is also addressed to the
design and development team.

Using the forms, we can collect information and data to the level of detail necessary to identify design and/or process deficiencies that should be eliminated, preferably before the SUAV is released to its users on the battlefield. For that reason, the forms can be used for other systems as well as SUAVs.

The characteristics of the forms are:

a. Simple and easy to implement even by one or two persons

b. Brief in content and implementation time (time oriented)

c. Suitable for cheap systems like SUAV (cost oriented)

d. Focused on elimination of fault reoccurrence

e. Generates a data collection that can be used as a database source

There are no known forms of FRACAS or any other reliability tracking system
that have been used for SUAV testing or operation in the past.
4. Discussion of the Form Terms
Most of the terms in these forms are self-explanatory. Discussion of some of them follows.

a. For the Initial Failure Report in Table 12:

(1) Total Operating Hours, in position (8), is the cumulative


operation hours for the SUAV system

(2) Current Mission Hours, in position (9), is the operating hours from the beginning of the current mission in which the fault was detected

(3) Description of Failure, in position (15), is the full description of


the observed failure.

(4) Supporting data, in position (16) all available telemetry data


related to the failure time must be listed. In position (16a) Environmental
Parameters/Conditions, is the list of environmental conditions and parameters available

141
like temperature, wind speed, humidity/precipitation, cloudiness, lightning, and fog, icing
condition, proximity to sea or desert or inhabited area. In position (16b) System
Parameters/Conditions, is a list of system conditions and parameters like, flight altitude,
platform speed, engine RPM, fuel level, battery status, communication status, LOS
availability.

(5) Actions for Failure Verification, in position (17), is a


description of the operators’ actions that verify the failure.

(6) Affected Subsystems, in positions (18) to (22) and (23) to (27)


are references for the effected subsystems of the SUAV system, during the failure
incidence.

(7) System Condition after Failure, in position (29) is a description


of the system general condition after the failure. For example, “platform crashed due to
loss of control.”

b. For the Failure Report continued in Table 13:

(1) Problematic Parts Recognized, in position (16) is a list of the


affected parts that have been recognized after the failure.

(2) Problematic Parts Replaced, in position (17) is a list of the parts


that have been replaced after the failure in an effort to isolate the failure cause.

(3) Root Failure Cause, in position (18) is an estimate or the


outcome of the previous efforts to isolate the failure cause.

(4) Previous Similar or Same Cases (if any), in position (29) is a


reference to similar or same failure cases based on historical data or other accurate
sources.

(5) Background, in position (30) is all background information


related to the failure. For example it could be an explanation of a sensor subsystem, or a
software function.

c. For the Failure Analysis Report in Table 14:

142
(1) History, in position (31) is a complete description of the
observed failure and all the events that followed.

(2) Analysis, in position (32) is the failure analysis based on data,


drawings and blueprints, manuals, and opinion from experts, designers, and operators.

(3) Conclusions, in position (33) are the analysis outcome.

(4) Corrective Action/Recommendation, in position (34) is the


result of that form. This is the recommended solution to the problem.

d. For the Corrective Action Verification Report:

(1) Operating Hours after Previous Failure, in position (13), is the


cumulative operation hours after the previous failure which resulted in corrective action
taken for the SUAV system.

(2) Tests for Corrective Action Verification Made, in position (17),


is a reference to all tests that have been made to verify that the recommended solution is
correct.

(3) Alterations from Baseline Data, in position (22), is a list of all


alteration from the initial data readings, after the implementation of the recommended
solution.

(4) Corrective Action Taken, in position (21), is a statement about


the corrective actions that have been taken in order to solve the problem.

e. For the Failure Log-Sheet:

All entries in the Log-Sheet, like Date, Time, Number, Initial


Report Number and Failure Description, must be consistent to the relevant entries in the
other forms. In that way we can easily track the failure cases when is needed.

In all forms, except the Log-Sheet form, there is a term for Comments. It covers
any other detail that the operator or the tester estimates that is relevant to the failure and
warrants mention.

143
1. No Form type: 2. Page 1 of
Initial Failure Report _____
3. Project ID 4. System 5. Serial 6. Detected 7. Failure 8.Total 9. Current
No During Date, Time Operating Hours Mission Hours

10. Reported by 11. Verified by 12. System 13. Type of System’s 14. Type of Failure
Operated by: Mission (permanent/recoverable)

15. Description of Failure

16. Supporting Data:


a. Environmental Parameters/Conditions

b. System Parameters/Conditions

17. Actions for Failure Verification

18. Name 19. Reference Drawings 20. Part No 21. Manufacturer 22.Serial No
SUBSYSTEMS
AFFECTED

23. Name 24. Reference Drawings 25. Part No 26. Manufacturer 27.Serial No

28. Quick Failure Assessment (if any)

29. System Condition after Failure

30. Comments

31. Prepared by 32 Date 33 Checked (reliability) 34.Date 35. Problem No

36. Checked (engineering) 37. Date 38. Checked (program) 39. .Date 40. Distribution

Table 12. Initial Failure Report Form

144
1. No Form type: 2. Page 1 of
Failure Report (Continued) _____
3. Project ID 4. System 5. Serial 6. Detected 7. Failure Date, 8.Total 9. Current
No During Time Operating Hours Mission Hours

10. Reported by 11. Verified by 12. System 13. Type of System’s 14. Number of Failure
Operated by: Mission

15. Description of Failure (brief)

16. Problematic Parts Recognized:

17. Problematic Parts Replaced:

18. Root Failure Cause

19. Name 20. Reference Drawings 21. Part No 22. Manufacturer 23.Serial No
PROBLEMATIC PART(S)

24. Tagged 25. Failure Verified 26. Failure Verified by 27. Failure Verified by 28. System Condition
by: by(reliability) : (engineering) : (program) : after Replacement:

29. Previous Similar or Same Cases (if any)

30. Background

31. Comments

321. Prepared by 33. Date 34. Checked (reliability) 35. Date 36. Problem No

37. Checked (engineering) 38. Date 39. Checked (program) 40. Date 41. Distribution

Table 13. Failure Report Continuation Form

145
1. No Form type: 2. Page 1 of
Failure Analysis Report _____
3. Project ID 4. System 5. Serial 6. Test Level 7. Failure Date 8. Operating Hours 9. Reported
No by

MAJOR 10. Name 11. Reference 12. Part No 13. Manufacturer 14.Serial No
COMPONENT OR Drawings
UNIT
15. Name 16. Reference Drawings 17. Part No 18. Manufacturer 19.Serial No
SUB ASSEMBLY

20. Name 21. Reference Drawings 22. Part No 23. Manufacturer 24.Serial No

25. Name 26. Reference Drawings 27. Part No 28. Manufacturer 29.Serial No
PART(S)

30. Related MRs and PINs

31. History

32. Analysis

33. Conclusions

34. Corrective Action/Recommendation

35. Corrective Action by 36. Document No 37. Corrective Action


Effectiveness

38.Prepared by 39. Date 40. Approval (reliability) 41.Date 42. Problem No

43. Approval (engineering) 44. Date 45. Approval (program) 46. .Date 47. Distribution

Table 14. Failure Analysis Report Form (From RAC Toolkit, page 290)

146
1. No Form type: 2. Page 1 of
Corrective Action Verification Report _____
3. Project ID 4. System 5. Serial 6. Test Level 7. Failure Date 8. Total Operating 9. Reported
No Hours by

10. Initial Failure 11. Failure Report 11. Failure 12. Current 13. Operation 14.Number of
Report form Number Continue form Number Analysis Mission Hours Hours after Corrective
Report Before Failure Previous Failure Action Taken

16. Related Drawings, Documents, Other Data

17. Tests for Corrective Action Verification Made

18. Test Conditions


a. Environmental Conditions

b. System Condition

19. Test Results

20. Alterations from Baseline Data.

21. Corrective Action Taken

22. Comments

23. Corrective Action Taken by 24. Date 25. Document No 26. Corrective Action
Effectiveness
27. Prepared by 28. Date 29. Approval (reliability) 30. Date 31. Problem No

32. Approval (engineering) 33. Date 34. Approval (program) 35. Date 36. Distribution

Table 15. Correction Action Verification Report Form

147
1. No Form type: 2. Page 1 of
Tag to Problematic Part _____
3. Project ID 4. System 5. Serial 6. Detected 7. Failure Date 8. System’s Total 9. Reported
No During Operating Hours by

10. Initial Failure 11. Failure Report 12. Failure 13. Corrective 14. Operation 15. Total
Report form Number Continue form Number Analysis Action Verification Hours after Number of
Report Report Previous Failures.
Failure

16. Failure Description

17. Failure Relevant Documentation

18. History

19. Name 20. Reference drawings 21. Part No 22. Manufacturer 23.Serial No
PROBLEMATIC PART

24. Tagged 25. Failure Verified 26. Failure Verified by 27. Failure Verified by 28. System
by: by(reliability) : (engineering) : (program): Condition after
Replacement:

29. Comments

30. Verified by 31. Date 32. Document No 33. Corrective Action


Effectiveness
34. Prepared by 35. Date 36. Approval (reliability) 37. Date 38. Problem No

39. Approval (engineering) 40. Date 41. Approval (program) 42. Date 43. Distribution

Table 16. Tag to Problematic Part Form

148
Form type:
Failure Log-Sheet
4. Operator 5. Failure Description (brief) 7. Initial 8. Initials

6. Reported?
Report
1. Number

Number
3. Time
2. Date

9. Checked by 10. Date 11. Mission Description

Table 17. Failure Log-Sheet

Use of these forms will allow detailed analysis of the causes of failure and
detailed modeling of reliability by subsequent analysts.

149
5. Reliability Growth Testing 132
It is almost certain that prototypes or new designs will not initially meet their
reliability goals. Implementation of a reliability enhancement methodology such as
FRACAS is the only way to overcome the initial problems that may surface in the first
prototype performance tests and later. Therefore, failures are identified, and actions taken
to correct them. As the procedure continues, corrective actions become less frequent.
After a reasonable amount of time, one must check whether reliability has improved, and
estimate how much additional testing is needed.

Duane observed that there is a relationship between the total operation time (T)
accumulated on a prototype or new design and the number of failures (n(T)) since the
beginning of operation. 133 If we plot the cumulative failure rate (or cumulative mean
time between failures MTBFc ) n(T)/T versus T in a log-log scaled graph, the observed
data tends to be a linear regardless of the type of equipment under consideration.

Duane’s plots provide a rough estimate of the increment of the time between
failures. It is expected that time between failures at the early stages of development will
be short. But soon after the first corrective actions they will gradually become longer. As
a consequence Duane’s plots will show a rapid reliability improvement in the early stages
of development. After the first corrective actions the reliability improvement would be
less rapid. After a corrective action we can see whether there is a reliability improvement
or not. So we can have a measure of effectiveness of our corrective actions, which
corresponds to the growth of reliability.

132 The material for this section is taken (in some places verbatim) from: Lewis, E. E., Introduction to
Reliability Engineering, Second Edition, John Wiley & Sons, 1996, pages 211-212.
133 Duane, J. J., “Learning Curve Approach to Reliability Modeling,” Institute of Electrical and
Electronic Engineers Transactions on Aerospace and Electronic Systems (IEEE. Trans. Aerospace) 2563,
1964.
150
1000.0

100.0
Cumulative Failure Rate n(T)/T

10.0

a>0
a=0
1.0 a<0

a=- 1
0.1

10 100 1,000 10,000 100,000 1,000,000

Cumulative Operating Hours T

Figure 34. Duane’s Data Plotted on a Log-log Scale.

Figure 33 illustrates a Duane’s data plot for a hypothetical system. Because of the
straight line we get: ln[n(T ) / T ] = α ⋅ ln(T ) + b and then:

eln[n (T ) / T ] = e[α ⋅ln (T ) +b ] ⇔ n(T ) / T = eb ⋅ T α ⇔ n(T ) = eb ⋅ T (1+α ) = K ⋅ T (1+α ) if K = eb , and so

finally we have n(T ) = K ⋅ T (1+α ) . Alpha (α) is the growth rate or the change in MTBF per
time interval over which change occurred and K is a constant related with the initial
MTBF.

a. If α=0, there is no improvement in reliability because the straight line is


parallel to the cumulative operating hours axis, which means that there is no change in
the cumulative failure rate.

b. If α<0, then the cumulative failure rate decreases, and the expected
failures become less frequent as T increases. Therefore reliability increases.

151
c. If α=-1, n(T ) = K = eb = constant . Therefore the number of failures is
independent of time T. We can assume that α=-1 is the theoretical upper limit for
reliability growth.

d. If α>0, then the cumulative failure rate increases, and the expected
failures become more frequent as T increases. Therefore reliability decreases.

From n(T ) = K ⋅ T (1+α ) we have: n(T ) / T = K ⋅ T α which is the reciprocal of


cumulative MTBF. And so the testing time required to achieve a given failure rate
−1
(MTBF), is ( K ⋅ MTBF ) α .

6. Reliability Growth Testing Implementation


In order to implement the above-mentioned methodology, we may consider the
system as an entity and as a set of entities. In the first case, we just count all systems
failures and the operational hours related to each failure. In the second case, we may
consider that the system is the composition of:

a. Propulsion and power

b. Flight control and navigation

c. Communication and sensors

d. GCS (Human in the loop)

e. Miscellaneous.

Each failure can be assigned to one of the above categories and therefore we have
to keep track of five different reliability tendencies.

B. RELIABILITY IMPROVEMENT PROCESS


1. UAVs Considerations
For a reliability improvement process application in SUAVs we can consider the
following:

a. There is no officially accepted future system concept of operations for


SUAVs.

152
b. There are many classified and unclassified reports published on many
different types of SUAVs.

c. Many systems have been tested and there are plans for future tests in
battlefield environments and in deployments with the fleet.

(1). The EWASP SUAV system.134

(2). The XPV-1B TERN UAV system135

(3). The Sea ALL (Sea Airborne Lead Line) SUAV system which is
a variety of the USMC Dragon Eye UAV.136

d. There is a real operational need for SUAVs during deployments of the


fleet. For example, due to an urgent requirement to maintain a continuous recognized
maritime picture of the Carrier Strike Group vital area, small UAVs are needed to assist
the limited existing maritime patrol aircrafts. For that reason, a request for the SUAV
Archangel to be used onboard USS Enterprise CSG has been released.137

e. There is a real problem regarding the reliability of those systems. UAVs


in general have roughly up to 100 times the failure rate of manned aircrafts, and SUAVs
are even more failure prone than larger ones. The US Office of the Secretary of Defense’s
UAV Roadmap, which was released in May 2003, recommends that more research be
made into low Reynolds-number flight regimes, investigations be carried out for
enhancing UAV reliability and therefore availability. It also recommends the
incorporation and development of all-weather practices into UAV designs.138

134 Morris Jefferson, Aerospace Daily, December 8, 2003, “Navy To Use Wasp Micro Air Vehicle To
Conduct Littoral Surveillance.”
135 Message from COMMMNAVAIRSYSCOM to HQ USSOCOM MACDILL AFB FL, March 26,
2004, “UAV Interim Flight Clearance for XPV-1B TERN UAV System, Land Based Concept of Operation
Flights.”
136 Sullivan Carol, Kellogg James, Peddicord Eric, Naval Research Lab, January 2002, Draft of
“Initial Sea All Shipboard Experimentation.”
137 Undated message from Commander, Cruiser Destroyer Group 12 to Commander, Second Fleet,
“Urgent Requirement for UAVs in Support of Enterprise Battle Group Recognized Maritime Picture.”
138 UAV Rolling News, “UAV Roadmap defines reliability objectives,” March 18, 2003, Internet,
February 2004. Available at: http://www.uavworld.com/_disc1/0000002
153
2. UAVs and Reliability
The U.S. military UAV fleet (consisting of Pioneers, Hunters, and Predators)
reached 100,000 cumulative flight hours in 2002. This milestone is a good point at which
to assess the reliability of these UAVs. Reliability is an important measure of
effectiveness for achieving routine airspace access, reducing acquisition system cost, and
improving UAVs mission effectiveness. UAV reliability is important because it supports
their affordability, availability, and acceptance.139

UAV reliability is closely tied to their affordability primarily because UAVs are
expected to be less expensive than manned aircraft with similar capabilities. Savings are
based on the smaller size of the UAVs and the omission of pilot or aircrew systems.
a. Pilot Not on Board140
With the removal of the pilot and the tendency to produce a cheaper UAV,
redundancy was minimized and component quality was degraded. Yet UAVs became
more prone to in-flight loss and more dependent on maintenance. Therefore, their
reliability and mission availability were decreased significantly. Being unmanned, they
cannot provide flight cues to the user such as:

• Acceleration sensation,

• Vibration response,

• Buffet response,

• Control stick force feedback,

• Any higher longitudinal, directional and lateral control sensitivities.

• Direct feeling of the failure, in general.

Ground testing and instrumentation data analysis are the only source for
such cues.
b. Weather Considerations141

139 OSD 2002, Appendix J, page 186.


140 The material for this section is taken from: Williams Warren, Michael Harris, “The Challenges of
Flight –Testing Unmanned Air Vehicles,” Systems Engineering, Test & Evaluation Conference, Sydney,
Australia, October 2002.
154
Experience has shown that the most important operational consideration
for flight is the weather, regardless of other technical characteristic, such as engine type,
power or wingspan. Meteorological conditions affect both the platform and the GCS.
Factors include winds, turbulence, cold temperatures at designated altitudes, icing, rain,
fog, low cloudiness, humidity in general and lightning strikes. Meteorological conditions
affect the GCS include extreme ambient temperatures, icing, rain, fog, low cloudiness,
humidity and lightning strikes. These considerations can be mitigated because of the
relaxed constraints of ground units compared to the restricted constraints for small aerial
units.

For the platform the most important weather condition is wind speed and
direction at surface (the lowest 100 meters of the atmosphere) and upper levels. Other
weather conditions are important but do not affect the flight unless they are extreme.
Surface winds affect air-platforms during takeoff and landings, but also during preflight
and post flight ground handling. Light winds are most favorable for routine operation and
testing. High winds during flight can cause significant platform drift, which results in
poor platform position controllability. This can render a mission profile infeasible and
result in flight cancellation.

Prior to deploying any UAV system, a study must be made of the


prevailing meteorological conditions. If conditions are extreme (such as very high winds,
extreme cold, or high altitude), then the UAV system may not be mission capable, and a
different asset may be better suited. Alternate UAVs or manned systems should be
considered in this case.

141 Teets, Edward H., Casey J. Donohue, Ken Underwood, and Jeffrey E. Bauer, National Aeronautics
and Space Administration (NASA), NASA/TM-1998-206541, “Atmospheric Considerations for UAV
Flight Test Planning,” January 1998, Internet, February 2004. Available at:
http://www.dfrc.nasa.gov/DTRS /1998/PDF/H-2220.pdf
155
c. Gusts and Turbulence
The high susceptibility of the platform to gusts and turbulence makes
stabilizing flight operation points very difficult. The platform’s low-wing loading can
lead to high-power loading due to gusts, and turbulence and the low inertia are the main
reasons for that behavior142.

During the development test and evaluation period (DT&E), an SUAV can
be tested in aerodynamic/wind tunnels to establish its general flight characteristics. A
basic flight manual can be produced during DT&E that will be tested and refined during
the operational test and evaluation period (OT&E). The advantage of SUAVs is that the
actual airframe can be tested in the wind tunnel, without any analogy or other factor
involved in the calculations because the original platform (and not any miniaturized
model) is being tested.
d. Non Developmental Items (NDI) or Commercial Off-the-shelf
(COTS)
One of the factors in lack of reliability of inexpensive UAVs is the
use of NDI/COTS components that were never meant for an aviation
environment. In many cases, it would have been better to buy the more
expensive aviation-grade components to begin with than to retrofit the
system once constructed. Do not assume COTS components/systems will
work for an application they were not designed for. In other words, they
have to be COTS for that specific use.143

Using NDI/COTS items may save money but require testing in order to
ensure compatibility and to reduce uncertainty in mission efficiency.144
e. Cost Considerations145
By using COTS technology, distributed sensors, communications
and navigation, it is also proposed that the total system reliability may be
increased. It must be noted however that this approach does not currently
account for issues of airworthiness certification.

142 NASA 1998.


143 Clough.
144 Hoivik, Thomas H., OA-4603 Test and Evaluation Lecture Notes, Version 5.5, “The Role of Test
and Evaluation,” presented at NPS, winter quarter 2004.
145 The material for this section is taken (in some places verbatim) from: Munro Cameron, and Petter
Krus, AIAA’s 1st Technical Conference & Workshop on Unmanned Aerospace Vehicles, Systems,
Technologies and Operations; a Collection of Technical Papers, AIAA 2002-3451,“A Design Approach for
Low cost ‘Expendable’ UAV system,” undated.
156
It is a fact that the primary cost item in UAVs is not the vehicles but the
guidance, navigation, control and sensor packages that they carry. Typically all those
technology “miracles” can represent 70% of the system’s cost. Although sensors continue
to decrease in cost, size and power consumption, the demands for more capabilities and
mission types are increasing. As a result, cost is increasing.

We can assume that acquisition cost is proportional to reliability, and wear


out is not proportional to reliability. Then, a generic reliability trade-off can be seen as in
Figure 35. We can conclude that a highly reliable UAV does not coincide with an overall
low system cost.

Another point of interest related to cost and reliability is that reliability is


low for SUAVs because SUAVs are designed to be inexpensive. This statement is true
because reliability is expensive and one truly gets what one pays for.146

146 Clough.

157
Life Cycle Cost

Wear Out Rate


st
Co
n
W
Cost

tio
ea

isi
r

qu
Ou

Ac
t

0 Reliability 100%

Figure 35. Generic Cost Relationship. (After Munro)

f. Man in the Loop


The man-in-the-loop can be accomplished “through nearly all of the
potential controlling equipment available.” UAV control equipment is the link between
man and machine together with the data display mechanisms. Controlling equipment can
be remotely piloted, semi-autonomous with a combination of programmed and remote
piloted, and fully autonomous (full-auto) with pre-flight and/or in-flight programmed.147

Another point of interface between man and machine is maintenance and


pre-flight and after-flight servicing. Piloting a UAV resembles an instrumented manned
flight. For that reason there are four main considerations:

147 Carmichael, Bruce W., and others, “Strikestar 2025,” Chapter 4, “Developmental Considerations,
Man-in-the-Loop,” August 1996, Department of Defense , Internet, February 2004. Available at:
http://www.au.af.mil/au/2025/volume3/chap13/v3c13-4.htm
158
(1) Collision avoidance

(2) Multiple platforms control

(3) Landing (recovery)

(4) Loss of flight control and regain of it.


g. Collision Avoidance
For UAVs, a system is needed that can weigh tasks and put priorities only
on the flight requirements or mission requirements. It is essential to have the capability,
like the pilot does, to sense and to avoid obstacles that most of the time the remote pilot
cannot see.148

If an accurate collision avoidance system were developed, UAVs could


become more responsive to the demanding needs of the battle commander.149 “NASA,
the U.S military, and the aerospace community have joined forces to develop detect, see,
and avoid (DSA) technologies for UAVs.”150 These technologies will also increase safety
operations above residential areas and allow UAVs to join the piloted aerial vehicles in
national airspace.
h. Landing
A lot of UAV mishaps are related to landing. The usual ways for UAVs to
land are:

• Using landing gear on runways or airstrips

• Using landing gear and arresting gear on ship flight-decks

• Making a calculated crash landing without using landing gear

• Recovering in an arresting net

148 Finley, Barfield, Automated Air Collision Avoidance Program, Air Force Research Laboratory,
AFRL/VACC, WPAFB,“Autonomous Collision Avoidance: the Technical Requirements,” 0-7803-6262-
4/00/$10.00(c)2000 IEEE.
149 Coker, David, Kuhlmann, Geoffrey, “Tactical-Unmanned Aerial Vehicle ‘Shadow 200’
(T_UAV),” Internet, February 2004. Available at: http://www.isye.gatech.edu/~tg/cources/6219/assign
/fall2002 /TUAVRedesign/
150 Lopez, Ramon, American Institute of Aeronautics and Astronautics (AIAA), “Avoiding Collisions
in the Age of UAVs,” Aerospace America, June 2002, Internet, February 2004. Available at:
http://www.aiaa.org /aerospace/Article.cfm?issuetocid=223&ArchiveIssueID=27
159
• Landing in sea water

• Using a parachute

• Vertical take-off-and-landing (VTOL)

The most common problems with recovery are lack of experience by the
remote pilot and low altitude winds, even for the VTOL UAVs. To resolve or mitigate
this problem, automated recovery systems can be used. Those systems have been
developed to improve precision, ease and safety of UAV recoveries, on land and sea, and
in a variety of weather conditions.151
i. Losing and Regaining Flight Control
The need for uninterrupted communication between the operator in the
GCS and the platform is a critical capability.152 An interruption of that link is always
possible due to loss of Line-of-Sight (LOS), communication failure related to platform or
GCS, and electromagnetic interference (EMI). The only way to overcome this problem is
autonomy with dependable autopilot and mission control software.153

Autonomy for a UAV platform is based on an onboard computer, which is


responsible for most of the platform’s performance and “behavior”. Subprograms for
time-related loss of communications, regaining communications, points of regaining
communication efforts, and other functions related with mission effectiveness are very
common among UAV software. Additionally, emission control applications help allocate
bandwidth for different uses and may decrease the EMI hazard. For UAVs, which use
different sensor configurations in the same type of platform, there is also a need for
reconfigurable multi-mission processing.154
j. Multiple Platforms Control
151 UAV Annual Report FY 1997: Subsystems, Key subsystem program, “UAV common recovery
system (UCARS),” Internet, February 2004. Available at: http://www.fas.org/irp/agency/daro/uav97
/page36.html
152 Coker.
153 Puscov, Johan, “Flight System Implementation,” Sommaren-Hosten 2002, Royal Institute of
Technology (KTH), Internet, February 2004. Available at: http://www.particle.kth.se/group_docs/admin
/2002/Johan_2t.pdf
154 Robinson, John, Technical Specialist Mercury Computers, COTS Journal, “UAV Multi-Mission
Payloads Demand a Flexible Common Processor,” June 2003, Internet, February 2004. Available at:
http://www.mc.com/literature/literature_files/COTSJ_UAVs_6-03.pdf
160
Demands for piloting a UAV require two operators in general. The aviator
operator (AVO) is responsible for aviating and navigating, and the mission payload
operator (MPO), or Sensor Operator (SENSO), is responsible for target search and
system parameters monitoring. In smaller UAVs there may be only one operator who
does both tasks. Requiring two operators limits the number of operators available for
other missions. Is it possible for those two operators to control two or more platforms
simultaneously?155 Is it also possible for the single operator for the smaller UAV to do
the same?

The SUAV operators are part of a battle team and their primary skill and
training is to fight and then to operate the SUAVs. They operate SUAVs from a distance
yet in the proximity of the battlefield. So, care must be taken in making excessive
workload demands on the SUAV operators. Instead, by making the platform control and
operation more user-friendly, we can optimize the benefits of SUAVs capabilities. When
the operators can stand far enough from the battlefield, user-friendly control of SUAVs is
advantageous, and multiple platform control can become a more realistic capability if
SUAV autonomy is high.
k. Reliability, Availability, Maintainability of UAVs
Reliability is the probability that a UAV system or component will operate
without failures for a specified time (the mission duration) as well as the preflight tests
duration. This probability is related to the mean time between failures (MTBF) and
availability.

Availability is defined as the ability of a system to be ready for use when


needed at an unknown (random) time. It is the natural interpretation of reliability of our
everyday life. Availability is a function of reliability and maintainability.

As discussed earlier, redundancy plays an important role to keep reliability


high. Keeping redundancy at a high level increases system complexity and cost, however.

155 Dixon, Stephen R., and Christopher D. Wickens, “Control of multiple UAVs: A Workload
Analysis,” University of Illinois, Aviation Human Factors Division, Presented to 12th International
Symposium on Aviation Psychology, Dayton, Ohio 2003.
161
Volume, weight, and cost are also important for UAVs system’s operational usage and
real system needs. There is a trade off as indicated in Figure 36.156

Reliability Complexity Cost Redundancy Availability Example

-9
1-10
Quad y Airliner/Satelite
(150M-500M)

-7
1-10
Triple y Fighter
(50M-150M)
-5 y Moderate Cost
1-10 UAVs
(10M-50M)
Dual/Triple
-3
1-10 y Reusable UAVs
(300K-10M)

No /Dual y Reusable UAVs


Expendables
0 (<300K)

Figure 36. Reliability Trade-Offs. (After Sakamoto, slide 8)

Where redundancy is difficult to implement, fault avoidance or parts


quality is the solution to improve reliability. In some cases adding redundancy in critical
subsystems, like navigation aids, is unavoidable. Thus, cost and complexity increases.157
Maintainability is a system effectiveness concept that measures the ease and rapidity with
which a system or equipment is restored to its operational state after failing. Reliability,
availability, and maintainability are discussed in Appendix D.
3. Reliability Improvement for Hunter

156 Sakamoto, Norm, presentation: “UAVs, Past Present and Future,” Naval Postgraduate School,
February 26, 2004.
157 Clough.

162
The Army’s acquisition of the Hunter RQ-5 system is an example of reliability
improvement after the implementation of a reliability improvement program. In 1995,
during acceptance testing, three Hunter platforms crashed within a three week period. As
a result, full rate production was canceled. The Program Management Office and the
prime contractor Thompson Ramo Wooldridge (TRW) performed a Failure Mode Effect
and Criticality Analysis (FMECA) for the whole system. Failures were identified and
design changes were made after failure analyses and corrective actions were
implemented. As a result, Hunter’s Mean Time Between Failures (MTBF) for its servo
actuators, which were the main cause for many crashes, increased from 7,800 hours to
57,300 hours.

Hunter returned to flight status three months after its last crash. Over the next two
years, the system’s MTBF doubled from four to eight hours and today stands close to 20
hours. Prior to the 1995, Hunters mishap rate was 255 per 100,000 hours; afterwards
(1996-2001) the rate was 16 per 100,000 hours. Initially canceled because of its
reliability problems, Hunter has become the standard to which other UAVs are compared
in reliability.158
4. Measures of Performance (MOP) for SUAVs
In manned aviation, the usual Measures Of Performance (MOPs) used for
reliability tracking are

• Accidents per 100,000 hours of flight

• Accidents per 1,000,000 miles flown

• Accidents per 100,000 departures159

In the Vietnam War, the MOPs used for the Lightning Bug were

• The percent of platforms returned from a mission, calculated as the


number of platforms recovered from similar successful missions divided

158 OSD 2002, Appendix J.


159 National Transportation Safety Board (NTSB), Aviation Accident Statistics,”Table 6. Accidents,
Fatalities, and Rates, 1984 through 2003, for U.S. Air Carriers Operating Under 14 CFR 121, Scheduled
Service (Airline), Internet, April 2004. Available at: http://www.ntsb.gov/aviation/Table6.htm
163
by the number of platforms launched for that mission in a certain time
period.

• Missions accomplished per platform per mission type in a certain time


period.160

The frequency of mishaps is the primary factor for choosing a MOP. In the SUAV
case, we can use the following MOPs for reliability tracking:

a. Crash Rate (CR): The total number of crashes divided by the total
number of flight hours. A crash results in loss of platform.

b. Operational CR: The total number of crashes divided by the total


number of operating flight hours.

c. Mishap Rate (MR): The total number of mishaps divided by the total
number of flight hours. This thesis defines a mishap for a SUAV as significant platform
damage or a total platform loss. A mishap requires repair less than or equal to a crash
depending on the condition of the platform after the mishap.

d. Operational MR: The total number of mishaps divided by the total


number of operating flight hours.

e. Current Crash Rate (CCR): The total number of crashes from the last
system modification divided by the total number of flight hours from the last system
modification.

f. Operational CCR: The total number of crashes from the last system
modification divided by the total number of operating flight hours since the last
modification.

g. Current Mishap Rate (CMR): The total number of mishaps from the last
system modification divided by the total number of flight hours from the last
modification.

160 Carmichael, Bruce W., Col (Sel), and others, “Strikestar 2025,” Appendix A,B & C, “Unmanned
Aerial Vehicle Reliability,” Appendix A, Table 4August 1996, Department of Defense, Internet, February
2004. Available at: http://www.au.af.mil/au/2025/volume3/chap13/v3c13-8.htm
164
h. Operational CMR: The total number of mishaps from the last system
modification divided by the total number of operating flight hours from the last
modification.

i. Crash Rate “X” (CRX): The crash rate for the last “X” hours of
operational flight hours, as in “CR50” which is the CR for the last 50 flight hours.

j. Mishap rate “X”: The MR for the last “X” hours of operational flight
hours, as in “MR50” which is the MR for the last 50 flight hours.

k. Achieved Availability (AA): The total operating time (OT) divided by


the sum of OT, plus the total corrective maintenance time, plus the total preventive
maintenance time.

l. Percent Sorties Loss: The total number of sorties lost (for any reason)
divided by the total number of sorties assigned.

m. Percent Sorties Mishap: The total number of sorties with a mishap


divided by the total number of sorties assigned.

SUAVs are generally low cost systems with prices from $15K to $300K. For that
reason there is no official data collecting system in effect detailed enough to provide
reliability data. Usually, only the number of flight hours and the number of crashes is
known. For that reason, the most suitable reliability MOPs currently are CR, CCR and
CRX.
5. Reliability Improvement Program on SUAVs
A reliability improvement program seeks to achieve reliability goals by improving
product design. The objective of an improvement program is to identify, locate and
correct, faulty and weak aspects of the design, manufacturing process, and operating
procedures. For the SUAV, we first applied existing techniques for improving system
reliability.

Starting with the FMEA, which is the basis for the most common methodologies
for improving reliability; we also discussed FMECA and FTA. After that, reliability
centered maintenance, specifically MSG-3, was presented as the prevailing methodology
for enhancing civil aviation reliability and maintenance preservation methodology. We
165
showed that MSG-3 is not suitable for UAVs applications because of its dependence on
an in-board operator. We highlighted the need for a data collection system and presented
FRACAS. FRACAS is best suited for SUAVs especially during their initial phases of
development or operational test development. Finally, a method or technique is needed to
keep track of reliability growth. Duane’s plots presented and recommended for their
simplicity.
6. Steps for Improving Reliability on SUAVs
We can consider a FRACAS system as a part of a generic reliability improvement
program. The first step of such a program is an environmental stress screening (ESS).
ESS is a process that uses random vibration within certain operational limits, and
temperature cycling to accelerate part and workmanship imperfections. Identification of
infant mortality failures can be identified in a short time and relatively easily.

In addition to ESS, the next actions should be taken:

a. Verify/calibrate the instruments for the field tests or field operations.


With calibrated instruments we can substantially reduce instrumentation errors. A rule of
thumb is to use another instrument that is at least 10 times more accurate than the
instrument we want to calibrate.161

b. Set the initial weather restrictions for UAVs flights.

c. Conduct a FMEA of the system and/or perform an FTA if it is necessary


when we want to focus on a certain failure. For that purpose, we have tailored a form as
in Table 18.162

161 Hoivik.
162 Department of Defense, MIL-STD-1629A, “Procedures For Performing a Failure Mode Effects
and Criticality Analysis,” Task 101 FMEA sheet, November 24, 1980.
166
FMEA Date
UAVs FMEA Form
System Name Page of Pages

Part Name Prepared by

Reference Drawing Approved by

Mission Revisited by/Revision Date

ID Item/ Design Function Failure Operational Failure Effects Failure Fault Severity Remarks
Number functional Modes and Phase Detection Acceptance Classification
ID Causes Local Next Higher Level End Method
Effects

Table 18. UAVs FMEA Form (After MIL-STD-1629A, Figure 101.3)

167
The cell definitions are: 163

(1). ID Number, given to each entry on the FMEA form for record-
keeping purposes.

(2). Item/Functional Identification, for the item or the functional


block or subsystem, such as the carburetor or the fuel tank, for example.

(3). Design Function, a brief statement about the item’s design


function. State that the carburetor mixes fuel and air in order to feed the engine with the
proper fuel-air density, for example.

(4). Failure Modes and Causes, a brief statement about the way(s)
in which the item may fail. In the case of the carburetor, the failure modes are improper
adjustment, plugged needle valve, jammed leverage, servo failure, excess vibrations,
throttle failure, insufficient fastening to the frame, etc.

(5). Operational Phase, a brief statement about the item’s objective


or task must be written; in the case of the carburetor, it controls engine running speed.

(6). Local Failure Effects, explaining the immediate consequences


of the item’s identified failure mode. In the case of the carburetor, we can state “Engine
cannot be controlled.”

(7). Next Higher Level, about the effect of the local failure on the
next higher functional system level; in the case of the carburetor, we can state “Loss of
engine.”

(8). End Effects, explaining the effects of the indicated failure


mode on the whole system. In the case of the carburetor, we can state “Loss of thrust.”

(9). Failure detection method, explaining the way(s) by which a


failure can be detected. In the case of the carburetor, it could be detected by the operator
or by the control system itself.

163 The material from the following part of section is taken (in some places verbatim) from: RAC
FMECA, pages 60-66.
168
(10). Fault Acceptance, statement of the ways that the system can
overcome or bypass the effects of failure. In the case of the carburetor the system design
does not provide any alternatives so the word “None” can be placed under fault
acceptance.

(11). Severity Classification, representing the degree of damage


that will be caused by the occurrence of the failure mode. It could be any of the following
categories:

(a) Classification I, for complete loss of system

(b) Classification II, for degraded operation of the system

(c) Classification III, for a failure status that still needs to be


investigated

(d) Classification IV, for no effect on systems functions.

The failure effect for the carburetor can be classified as a Category I


severity.

(12). Remarks, relating details about the evaluation of the given


failure mode.

d. Establish a FRACAS. Implementation of FRACAS through the


system’s life cycle, even for the ESS tests, should continue for all failures occurring
during developmental and operational testing.

For a reliability improvement program, FRACAS is the most critical


facet.164 Failures must be identified and isolated to the root failure mode. After the failure
analysis is complete, corrective actions are identified, documentation is completed and
data is entered into FRACAS. The system’s manufacturer can use the information in
FRACAS to incorporate the corrective actions into the product. We can use the same
FRACAS forms we presented in the previous subsection.

e. Track of reliability improvement by using Duane’s theory, MTBFs


and/or achieved availability of the system.
164 Pecht, page 323.

169
f. Complete a reliability improvement plan. This plan must be completed,
approved and coordinated by the manufacturer’s engineers and reliability manager in
cooperation with the military personnel who operate the systems. The following need to
be addressed in the plan:

• Resources,

• Test schedule and test equipment,

• Personnel,

• Test environment,

• Procedures,

• Data base establishment, and

• Corrective action implementation program.

Figure 37 outlines the reliability improving process for SUAVs.

Verify/Calibrate
a
Instruments

Set Initial
b Weather
Restrictions

c Conduct FMEA

Establish
d
FRACAS

e Track Reliability

Complete
f Reliability
Improvement Plan

Figure 37. Reliability Improving Process on SUAVs

170
IV. EXAMPLE

A. RQ-2 PIONEER 86 THROUGH 95


From the US Navy’s Airborne Reconnaissance Office, 15 March 1996, come the
following data regarding the RQ-2 Pioneer battlefield UAV mishaps from 1986 until
1995.165

Flight
Operating hours vs time
Year Mishaps hours 14000
86 5 96.3 12000

Operating hours
87 9 447.1 10000
88 24 1050.9 8000
89 21 1310.5 Operating
6000
90 21 1407.9 hours vs
4000 time
91 28 2156.6
2000
92 20 1179.3
0
93 8 1275.6
84 86 88 90 92 94 96
94 16 1568 Year
95 16 1752

Table 19. RQ-2 Pioneer data


As discussed in Chapter 3, we can calculate only the Mishap Rate (MR) and the
Current Mishap Rate (CMR) because we have data only for mishaps and total flight
hours. Assuming that each year we have modifications in the system, we calculate the
following:

Mishap Current Mishap


Year Rate (MR) Rate (CMR) 0.06 MR and CMR plots
86 0.051921 0.05192108 0.05 Mishap Rate (MR)
87 0.025764 0.020129725
MR and CMR

0.04
88 0.023835 0.022837568 Current Mishap Rate (CMR)
89 0.020311 0.016024418 0.03
90 0.01855 0.014915832 0.02
91 0.016694 0.0129834 0.01
92 0.016735 0.016959213
0
93 0.015239 0.006271558 84 86 88 90 92 94 96
94 0.014487 0.010204082 Year
95 0.013721 0.00913242

Table 20. MR and CMR

165 Carmichael, Bruce W., Col (Sel), and others, “Strikestar 2025,” Appendix A, B & C, “Unmanned
Aerial Vehicle Reliability,” August 1996, Department of Defense school, Internet, February 2004.
Available at: http://www.au.af.mil/au/2025/volume3/chap13/v3c13-8.htm
171
It is obvious that both MOPs provide the notion of rapid improvement during the
first two years followed by a much slower rate of improvement.

1. We follow Duane’s theory and analyze the data as seen in Table 21. We assume
that reliability improvement efforts have been implemented every year on all similar
systems.

N T
Cum Mish Cum flight hours N/T ln(T) ln(N/T) Regression exp(regression)
5 96.3 0.051921 4.567468 -2.95803 -3.0499928 0.047359265
14 543.4 0.025764 6.297846 -3.658788 -3.4840591 0.030682613
38 1594.3 0.023835 7.37419 -3.736604 -3.7540609 0.023422437
59 2904.8 0.020311 7.97412 -3.896582 -3.9045536 0.020149947
80 4312.7 0.01855 8.369319 -3.987293 -4.0036897 0.018248183
108 6469.3 0.016694 8.774823 -4.092692 -4.1054106 0.016483249
128 7648.6 0.016735 8.942278 -4.090248 -4.1474168 0.015805192
136 8924.2 0.015239 9.096522 -4.183867 -4.186109 0.015205334
152 10492.2 0.014487 9.258387 -4.234507 -4.226713 0.014600302
168 12244.2 0.013721 9.412808 -4.288844 -4.2654495 0.014045553

Table 21. Duane’s Theory Data Analysis

The results from the regression analysis are the following:

0.1 Residuals vs fit


SUMMARY OUTPUT
0.05
Residuals

Regression Statistics 0
Multiple R 0.984226284 -0.05 10 100 1000 10000 100000
R Square 0.968701379 -0.1
Adjusted R Square 0.964789051
-0.15 Residuals vs fit
Standard Error 0.073881376
Observations 10 -0.2
Operating hours
ANOVA
df SS MS F Significance F
Regression 1 1.351526748 1.351526748 247.6023 2.65748E-07
Residual 8 0.043667661 0.005458458
Total 9 1.39519441

Coefficients Standard Error t Stat P-value


Intercept -1.90424026 0.129763159 -14.6747372 4.57E-07
ln(T) -0.25085068 0.015941821 -15.7353841 2.66E-07

Table 22. Regression Results

172
In that case α is -0.25 for the total 12,244.2 hours of operations. In the next figure,
we can see Duane’s regression and failure rate versus time plots.

0.1
failure rate vs
time
failure rate Duane's
regression

0.01
1 100 10000

Operating hours

Figure 38. Duane’s Regression and Failure Rate versus Time

From the residual and the Duane’s plots we see a steeper descent for the failure
rate in the first years followed by a short period of constant failure rate. The last year’s
failure rate is not as steep as the first year’s.
2. Using the same data set, we concentrate on the last six years, from 1990 to
1995.

Year Mishaps Fllight hours Operating hours vs time


10000
Operating hours

90 21 1407.9
8000
91 28 2156.6
6000
92 20 1179.3 Operating
4000 hours vs time
93 8 1275.6
2000
94 16 1568
0
95 16 1752
90 92 94 96
year

Table 23. RQ-2 Pioneer Data, 1990 to 1995

We follow Duane’s theory and analyze the data as seen in the next table.

173
N T
Cum Mish Cum flight hours N/T ln(T) ln(N/T) Regression exp(regression)
21 1407.9 0.014916 7.249855 -4.205332 -4.173563 0.015397302
49 3564.5 0.013747 8.178779 -4.286959 -4.2891601 0.013716442
69 4743.8 0.014545 8.464594 -4.230487 -4.3247274 0.013237158
77 6019.4 0.012792 8.702743 -4.358937 -4.3543631 0.012850622
93 7587.4 0.012257 8.934244 -4.401645 -4.3831715 0.012485697
109 9339.4 0.011671 9.141997 -4.450649 -4.4090247 0.012167039

Table 24. Duane’s Theory Data Analysis for 1990 to 1995

Now the results from the regression analysis follow:

Residuals vs fit
SUMMARY OUTPUT 0.1
residuals vs fit
Regression Statistics
Residuals

0.05
Multiple R 0.864537515
R Square 0.747425115
Adjusted R Square 0.684281394 0
Standard Error 0.054749695 1000 10000
Observations 6 -0.05
Operating hours
ANOVA
df SS MS F Significance F
Regression 1 0.035481414 0.035481 11.83689 0.026282253
Residual 4 0.011990116 0.002998
Total 5 0.047471531

Coefficients Standard Error t Stat P-value


Intercept -3.271377559 0.306285094 -10.68083 0.000435
ln(T) -0.124441862 0.036169936 -3.440478 0.026282

Table 25. Regression Results for 1990 to 1995

Now the parameter α is -0.12 for the last 9,339.4 hours of operations. That means
we have less rapid reliability growth the last six years. Figure 39 depicts Duane’s
regression and failure rate versus time plot:

174
0.1
failure rate vs
time

failure rate
Duane's
regression line

0.01
1000 10000
Operating hours

Figure 39. Duane’s Regression and Failure Rate versus Time for 1990 to 1995

Comparing the two time periods, we can say that rate of reliability growth for the
last six years (factor of -0.12) from 1990 to 1995 decreased compared to the overall
factor -0.25 for the whole ten-year period from 1986 to 1995.

3. Using the same data set, we concentrate in the first six years from 1986 to
1991.

Year Mishaps Fllight hours Operating hours vs time


8000
Operating hours

86 5 96.3 Operating
87 9 447.1 6000 hours vs time
88 24 1050.9 4000
89 21 1310.5 2000
90 21 1407.9
0
91 28 2156.6
86 87 88 89 90 91
year

Table 26. RQ-2 Pioneer Data, 1986 to 1991

We follow Duane’s theory and analyze the data as seen in the next table:

175
N T
Cum Mish Cum flight hours N/T ln(T) ln(N/T) Regression exp(regression)
5 96.3 0.051921 4.567468 -2.95803 -3.0470131 0.047500591
14 543.4 0.025764 6.297846 -3.658788 -3.4860799 0.030620672
38 1594.3 0.023835 7.37419 -3.736604 -3.7591921 0.023302559
59 2904.8 0.020311 7.97412 -3.896582 -3.9114186 0.020012093
80 4312.7 0.01855 8.369319 -3.987293 -4.0116967 0.018102654
108 6469.3 0.016694 8.774823 -4.092692 -4.1145894 0.016332645

Table 27. Duane’s Theory Data Analysis for 1986 to 1991

Now the results from the regression analysis follow:

Residuals vs fit
0.1
SUMMARY OUTPUT
0.05

Regression Statistics Residuals 0


Multiple R 0.975768582 -0.05 10 100 1000 10000
R Square 0.952124325 -0.1
Adjusted R Square 0.940155406
-0.15
Standard Error 0.099437813 Residuals vs fit
Observations 6 -0.2
Operating hours

ANOVA
df SS MS F Significance F
Regression 1 0.786578144 0.786578144 79.54973601 0.000873629
Residual 4 0.039551515 0.009887879
Total 5 0.826129659

Coefficients Standard Error t Stat P-value


Intercept -1.888061479 0.209552206 -9.009981407 0.000840246
ln(T) -0.25374049 0.028449223 -8.919065871 0.000873629

Table 28. Regression Results for 1986 to 1991

Now α is -0.25 for the first 6469.3 hours of operations. In the next figure, we see
Duane’s regression and failure rate versus time plots:

176
0.1
failure rate vs time

Duane's regression line


Failure rate

0.01
1000 10000
Operating hours

Figure 40. Duane’s Regression and Failure Rate versus Time for 1986 to 1991

If we compare the first six years with the last six years, we can say that reliability
growth for the last six years has increased according to the factor of -0.12, instead of the
factor -0.25, which related to the first six years. We do not know why the reliability
growth rate has decreased, but it has.

4. We can use the Duane curve to predict the MTBF for the future. From the
previous discussion on Duane’s plots on IIIB4, MTBF is K ⋅ T α where K = eb . Using the
results for the last six years we have a is -0.1244 and b is -3.2714. So the equation for the
curve is MTBF = e −3.2714 ⋅ T −0.1244 . This curve can be used as the prediction curve for the
MTBF. For example, in 12,000 hours of operation after 1990, the MTBF will be
0.011793 failures per hour of operation or 12 failures per 1,000 hours of operation.

Prediction of MTBF vs Operating hours


0.0122
0.012 MTBF vs Operating
0.0118 hours
MTBF

0.0116
0.0114
0.0112
0.011
0.0108
9000 12000 15000 18000 21000
Operating hours

Figure 41. Prediction Plot Curve

177
THIS PAGE INTENTIONALLY LEFT BLANK

178
V. CONCLUSION

A. SUMMARY
From the material presented in this thesis, we can conclude the following:

1. There is a real need for reliability improvement in Small UAV systems.

2. RCM (or MSG-3) is a system suitable for civil and military manned
aviation and other industry fields in which experience is prevalent, hidden failures can be
easily identified by personnel, and safety considerations are the primary factor. For small
UAV systems in military applications, safety is not the primary factor. Experience has not
reached the manned aviation levels and hidden failures for unmanned systems are very
difficult to be observed. Therefore MSG-3 is not a suitable standard for SUAVs.

3. FMEA may be used for almost any kind of reliability analysis that
focuses on finding the causes of failure. A good and complete knowledge of the system is
necessary prior to proceeding with the FMEA. FMEA is an appropriate method for
SUAVs. This thesis has developed FMEA forms for the SUAV.

4. FTA is another useful method of analysis based on the top-down


approach and can be used to focus only on the weak points that need enhancing. It is
appropriate for SUAVs, and can be used to focus on engine, control, and navigation
subsystems that are among the most critical elements. We developed FTA diagrams for
the SUAV in this thesis.

5. Functional flow diagrams or block diagrams are used to give a quick


and comprehensive view of the system design requirements illustrating series and parallel
relationships, hierarchy and other relationships among system’s functions. Since a SUAV
is essentially a series system it is less useful.

6. FRACAS, a failure reporting analysis and corrective action system,


should be implemented for a program during production, integration, test, and field
deployment phases to allow for the collection and analyses of reliability and
maintainability data for the hardware and software items. For a successful reliability
improvement program, all failures should be considered. SUAVs need FRACAS system.
179
This research effort developed the framework of one aircraft, including the necessary
forms.

7. For SUAVs we have to use fault avoidance due to size and weight
limitations. Redundancy cannot be easily implemented, especially due to platform cost
and size constrains.

8. In a series structure, like SUAVs, the component with the lowest


reliability is the most important one for reliability improvement. Currently, there is no
bans for estimating the reliability of a system for operational planners.

9. We can track the overall reliability of a system for SUAVs under


experimental development by implementing a method that records failure data. By
analyzing the data we can calculate and predict reliability growth.

10. Similarly we can track the reliability of subsystems of a SUAV


system. We divide the system into:

• Propulsion and power

• Flight control and navigation

• Communication

• GCS (Human in the loop)

• Miscellaneous

and keep track of the reliability for each subsystem separately. The forms that we have
developed can be used as data source for subsystem reliability separation.

12. For a reliability improvement program, we need to:

• Conduct an Environmental Stress Screening (ESS),

• Calibrate and verify the instruments for the field tests or field operations,

• Set the initial weather restrictions for UAVs flights,

• Execute a FMEA of the system and/or perform an FTA,

• Establish a FRACAS,
180
• Track of reliability improvement,

• Complete a reliability improvement plan.

13. Reliability costs, and benefits, are like an investment. One truly gets
what one pays for.

This thesis is a qualitative approach to the issue of reliability and UAVs. In order
to obtain further benefit and value from that research effort, we must have data. For a
specific type of UAV, we can start implementing FRACAS and collecting data. A
database can be created easily after the implementation of FRACAS, and we can start
analyzing and interpreting reliability improvement, if any, quite soon.

B. RECOMMENDATIONS FOR FUTURE RESEARCHERS


This thesis outlines methods of improving SUAV reliability. Methods must be
defined for better data collections. Real data from SUAV systems must be collected in
order to formulate reliability databases. The quantitative reliability analysis follows and
detailed information about reliability improvement results.

Researching many issues would be worthwhile.

1. SUAVs are considered expendables since no pilot is onboard. As we


increase their reliability, their cost, and their importance in the battlefield operations, we
have to start considering their survivability. Being small in size may be is not enough to
cope with enemy-fires. Researching survivability issues for SUAVs is another field of
interest with many extensions and relations to design philosophy and cost.

2. Some experts believe that difficult problems can be solved with better
software, but software is not free. In the near network-centric future, software will
probably be one of the most expensive parts of a UAV system. Additionally, software is a
dynamic part of the system. It must be constantly upgraded to meet new expectations, or
to integrate new equipment technologies. For that reason, software reliability is another
critical issue that will become more pressing in the near future. The emerging question is
how we can find the best means to maintain software reliability at acceptable levels.

3. Similar to the above issue, micro-technologies are quickly evolving.
New ones are rapidly being inserted into UAV systems. In what way can our reliability
tracking methodology cope with new subsystems?

4. If there is a need to achieve a certain level of reliability, what would the


economic consequences be?

5. Generally, it would be of great interest to research the potential


mechanisms for incorporating new equipment into a reliability improvement program.

6. What is the best number of maintenance personnel to keep the system at


a given level of availability?

7. What should the spares policy be for SUAVs?

8. What fraction of failures is due to software rather than hardware?

Data collected using the methods developed in this thesis will provide the
material with which to answer these essential questions.

APPENDIX A: DEFINITION OF FMEA FORM TERMS

1. First Part of the Analysis of Design FMEA166


(1) Subsystem Identification: Name the subsystem or identification title of
the FMEA.

(2) Design Responsibility: Name the system design team and for (2A)
name the head of the system design team.

(3) Involvement of Others: Name other people or activities within the


company that affect the design of the system.

(4) Supplier Involvement: Name other people, suppliers and/or outside


organizations that affect the design of the system.

(5) Model/product: Name the model and/or the product using the system.

(6) Engineering Release Date: This is the product release date.

(7) Prepared by: The name of the FMEA design engineer.

(8) FMEA Date: Record the date of the FMEA initiation.

(9) FMEA Date, revision: Record the date of the latest revision.

(10) Part Name: Identify the part name or number.


2. The Second Part of the Analysis of Design FMEA167
(11) Design Function: This is the objective function of the design. The
function should be described in specific terms. Active verbs defining functions and
appropriate nouns should be used.

(12) Potential Failure Mode: The defect refers to the loss of a design
function or a specific failure. “For each design function identified in Item 11 the
corresponding failure of the function must be listed. There can be more than one failure
from one function. ” To identify the failure mode ask the question: “How could this

166 The material from this section is taken (in some places verbatim) from: Stamatis, pages 130-132.
167 Ibid, pages 132-149.

design fail?” or “Can the design break, wear, bind and so on?” Another way to identify a
failure mode is through a FTA. In a FTA the top level is the loss of the part function and
the lower levels are the corresponding failure modes.

(13) Potential Effect(s) of failure: This is the ramification of the failure on


the design. The questions usually asked are “What does the user experience as a result of
that failure?” or “What are the consequences for the design?” To identify the potential
effects, documents like historical data, warranty documents, field-service data, reliability
data and others may be reviewed. If safety is an issue, then an appropriate notation should
be made.

(14) Critical Characteristics: Examples of critical items may be


dimensions, specifications, tests, processes etc. These characteristics affect safety and/or
compliance with rules and regulations and are necessary for special actions or controls.
An item is indicated as critical when its severity is rated 9 to 10 and its occurrence and
detection ratings are higher than 3.

(15) Severity of Effect: Indicates the seriousness of a potential failure. For


critical effects severity is high while for minor effects severity is very low. Usually there
is a rating table, which is used for evaluation purposes. This table is made in such a way
that all designing issues have been taken into consideration. The severity rating should be
based on the worst effect of the failure mode. An example of the severity guideline table
for design FMEA is in Table 29.

Effect        Rank  Criteria
None            1   No effect.
Very slight     2   User not annoyed. Very slight effect on the product performance. Non-essential fault noticed occasionally.
Slight          3   User slightly annoyed. Slight effect on the product performance. Non-essential fault noticed frequently.
Minor           4   User’s annoyance is minor. Minor effect on the product performance. Non-essential faults almost always noticed. Fault does not require repair.
Moderate        5   User has some dissatisfaction. Moderate effect on the product performance. Fault requires repair.
Significant     6   User is inconvenienced. Degradation of the product’s performance, but safe and operable. Non-essential parts inoperable.
Major           7   User is dissatisfied. Major degradation of the product’s performance, but safe and operable. Some subsystems are inoperable.
Extreme         8   User is severely dissatisfied. Product is safe but inoperable. System is inoperable.
Serious         9   Safe operation and compliance with regulations are in jeopardy.
Hazardous      10   Unsafe for operation, non-compliance with regulations, completely unsatisfactory.

Table 29. Example of Severity Guideline Table for Design FMEA (After Stamatis, page
138)

(16) Potential Cause of Failure: This identifies the cause of a failure mode.
For a failure mode there may be a single cause, or numerous causes that are in that case
symptoms of one root cause. A good understanding of the system’s functional analysis
is needed at that stage. Tracing the real cause behind each symptom identifies the root cause; asking
“Why?” five times is the rule of thumb for finding the cause of a failure mode. It is
essential to identify all potential failures while performing the FMEA. There is not
always a linear or “one-to-one relationship” between the cause and failure mode. Listing
as many causes as possible makes FMEA easier and less error prone. If the severity of a
failure is rated 8 to 10, then an effort should be made to identify as many root causes as
possible.

(17) Occurrence: This is the value that corresponds to the estimated


frequency of failures for a given cause over the life of the design. To identify the
frequency for each cause, we need reliability mathematics, expected frequencies or the
cumulative number of component failures per 100 or 1000 components (CF/100 or
CF/1000). If expected frequencies and/or the cumulative number of failures cannot be
estimated, then alternative systems or components could be examined for similar data that
could be used as a surrogate. Usually, the assumption of a single-point failure is used in
design FMEA: a component failure that could cause the system to fail and is not
compensated for by an alternative design feature. Occurrence therefore refers to a single-cause failure. A
guideline for occurrence is shown in Table 30.

Occurrence          Rank  Criteria                                          CF/1000
Almost impossible     1   Failure unlikely; historical data indicate no failures   <0.00058
Indifferent           2   Rare number of failures likely                    0.0068
Very slight           3   Very few failures likely                          0.0063
Slight                4   Few failures likely                               0.46
Low                   5   Occasional number of failures likely              2.7
Medium                6   Medium number of failures likely                  12.4
Moderately high       7   Moderately high number of failures likely         46
High                  8   High number of failures likely                    134
Very high             9   Very high number of failures likely               316
Almost certain       10   Failure almost certain                            >316

Table 30. Example of Occurrence Guideline Table for Design FMEA (After Stamatis,
page 142)

(18) Detection Method: This is a procedure, test, design or analysis used to


detect a failure in a design or part. It can be very simple or very difficult to identify
problems before they reach the end user. If there is no method, then “None identified at
this time” is the answer. Two of the leading questions are “How can this failure be
discovered?” and “In what way can this failure be recognized?” A checklist may be
helpful. Nevertheless, some of the most effective ways to detect a failure are simulation
techniques, mathematical modeling, prototype testing, specific design tolerance studies
and design and material review. The design review is an important way to revisit the
suitability of the system or design. A design review can be quantitative or qualitative,
using a systematic methodology of questioning and design.

(19) Detection: This is the “likelihood that the proposed design controls will
detect” the root cause of a failure mode before it reaches the end user. The detection
rating estimates the ability of each of the controls in (18) to detect failures before they
reach the customer. A typical detection guideline is shown in Table 31.

Effect              Rank  Criteria
Almost certain        1   Has the highest effectiveness
Very high             2   Has very high effectiveness
High                  3   Has high effectiveness
Moderately high       4   Has moderately high effectiveness
Medium                5   Has medium effectiveness
Low                   6   Has low effectiveness
Slight                7   Has very low effectiveness
Very slight           8   Has the lowest effectiveness
Indifferent           9   Is unproven or unreliable; effectiveness unknown
Almost impossible    10   No design technique is available or known

Table 31. Example of Detection Guideline Table for Design FMEA (After Stamatis, page
147)

(20) Risk Priority Number (RPN): This is the product of severity,


occurrence, and detection. RPN is just a number that represents the priority of the failure.
Reducing the RPN is the FMEA’s goal, and it results from reducing severity
and/or occurrence and/or detection. So, by changing the design, one can reduce the severity
rating. By improving the requirements and engineering specifications while focusing on
“preventing causes or reducing their frequencies,” one can reduce the occurrence rating.
Adding detection equipment and tools or “improving the design evaluation technique”
can reduce the detection rating.
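
A small worked illustration follows; the failure modes and ratings are invented for the example and use the 1-10 scales of Tables 29 through 31.

```python
# Hedged sketch: Risk Priority Number (RPN) = severity x occurrence x detection.
# Failure modes and ratings below are invented illustration values on 1-10 scales.
failure_modes = {
    "engine stops in flight": {"severity": 9, "occurrence": 6, "detection": 7},
    "telemetry dropout":      {"severity": 6, "occurrence": 5, "detection": 4},
    "GCS display freeze":     {"severity": 4, "occurrence": 3, "detection": 2},
}

def rpn(ratings):
    return ratings["severity"] * ratings["occurrence"] * ratings["detection"]

# Rank failure modes by RPN to prioritize recommended actions.
for name, ratings in sorted(failure_modes.items(), key=lambda kv: rpn(kv[1]), reverse=True):
    print(f"{name:25s} RPN = {rpn(ratings)}")

# A corrective action that improves detection (hypothetically, adding a fuel-flow sensor)
# shows up as a revised, lower RPN for the same failure mode:
revised = {"severity": 9, "occurrence": 6, "detection": 3}
print("Revised RPN for 'engine stops in flight':", rpn(revised))
```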

(21) Recommended Actions: These may be specific actions or suggestions


for further study. Recommended actions are intended to reduce the RPN for the different failure
modes. Prioritization of failure modes according to their RPN, severity and occurrence, is
needed while conducting a FMEA.

(22) Responsible Area or Person and Completion Date: Name the
responsible person/area and the completion date for the recommended action.

(23) Action Taken: This records the follow-up actions taken.

(24) Revised RPN: This is the reevaluation of RPN after the corrective
actions have been implemented. If the revised RPN is less than the original then that
indicates an improvement.
3. Third Part of the Analysis of Design FMEA168
(25) Approval signatures: Name the authority to conduct the FMEA.

(26) Concurrence signatures: Names those responsible for carrying out the
FMEA.

168 Stamatis, page 149.

APPENDIX B: THE MRB PROCESS

The Maintenance Review Board process (MRB process) “is broadly defined as all of the
activities necessary to produce and maintain a Maintenance Review Board Report
(MRBR).” The process involves three major objectives, which are to ensure that:

1. Scheduled maintenance instructions (tasks and intervals) which are


developed for a specific aircraft, contribute to the continuing airworthiness and
environmental requirements of the Regulatory Authorities and the Standards and
Recommended Practices (SARPs) as published by the International Civil Aviation
Organization (ICAO).

2. The tasks are realistic and capable of being performed.

3. The developed scheduled maintenance instructions may be performed


with a minimum of maintenance expense.169

“MRBRs are developed as a joint exercise involving the air operators, the type
certificate applicant,” ATA, and other Regulatory Authorities. The MRB process

consists of a number of specialist working groups who use an


analytical logic plan to develop and propose maintenance/inspection tasks
for a specific aircraft type. The proposed tasks are presented to an Industry
Steering Committee (ISC) who, after considering the working group
proposals, prepares a proposal for the MRBR.

The MRB chairperson reviews the proposed MRBR, which is then published as
the MRBR.170

169 Transport Canada Civil Aviation (TCCA), Maintenance Instruction Development Process, TP
13850, Part B, “The Maintenance Review Board (MRB) Process(TP 13850), Chapter 1. General,” last
updated: April 19, 2003, Internet, February 2004. Available at: http://www.tc.gc.ca/civilaviation
/maintenance/aarpd/tp13850/partB.htm
170 TCCA, Chapter 2.

APPENDIX C: FAILURES

1. Functions171
A function statement should consist of a verb, an object and a desired standard of
performance. For example: A SUAV platform flies up to 4,000 feet at a sustained speed of
at least 55 knots. The verb is “fly,” the object is “a SUAV platform,” and the
standard is “up to 4,000 feet at a sustained speed of at least 55 knots.”
2. Performance Standards172
In our example, one process that degrades the SUAV platform, in other words
one failure mode for the SUAV, is engine failure. Engine failure can happen for many
reasons. The question is how much an engine failure can impair the ability of the UAV to
fly at the desired altitude at the designated sustained speed.

In order to avoid degradation, the SUAV must be able to perform better than the
minimum standard of performance desired by the user. What the asset is able to deliver is
known as its “initial capability,” say 4,500 feet at 60 knots sustained speed. This leads
one to define performance as:

• Desired performance, which is what the user wants the asset to do (4,000
feet at 55 knots sustained speed in our case).

• Built-in capacity, which is what the asset can actually do (4,500 feet at 60 knots
sustained speed in our case).
3. Different Types of Functions173
Every physical asset usually has more than one function. If the objective of
maintenance is to ensure that the asset can continue to fulfill these functions, then they
must all be identified together with their current standards of performance.

Functions are divided in two main categories: primary and secondary functions.
171 Moubray, John, an excerpt of the first chapter of the book “Reliability-centered Maintenance,”
Plant Maintenance Resource Center, “Introduction to Reliability-centered Maintenance,” Revised
December 3, 2002, Internet, May 2004, Available at: http://www.plant-maintenance.com/RCM-intro.shtml
172 Moubray, “Introduction.”
173 The material from this section is taken (in some places verbatim) from: Moubray, “Introduction.”

a. Primary functions are fairly easy to recognize and most industrial assets
are based on their primary functions. For example, the primary function of a “printer” is
to print documents, and of a “crusher” is to crush something, etc. In the SUAV example
the primary function is to provide lift and thrust so as the platform flies up to 4,000 feet at
a sustained speed of at least 55 knots.

b. In addition to their primary functions, most assets are expected to fulfill


one or more additional functions, which are the secondary functions. For example, the
primary function of the SUAV platform in the example is to provide thrust and lift so as
to fly up to 4,000 feet at a sustained speed of at least 55 knots. A secondary function could be
to use an auto-recovery system. Secondary functions could include environmental
expectations, safety, control, containment, comfort aspects, appearance, protection,
economy, efficiency, and other extra functions.
4. Functional Failure174
If, for any reason, the asset is unable to do what the user wants, the user will
consider it to have failed. “Failure is defined as the inability of any asset to do what its
users want it to do.” This definition treats the concept of failure as if it applies to an asset
as a whole.

However, each asset has more than one function, and each function often has
more than one desired standard of performance. It is possible for the asset to fail for each
function, so the asset can fail in different states. Therefore, it is required that failure can
be defined more accurately in terms of loss of specific functions rather than the failure of
an asset as a whole.

According to British Standard (BS) 4778 failure is defined as “The termination of


an item’s ability to perform a required function.”
5. Performance Standards and Failures175
The limit between satisfactory performance and failure is specified by a
performance standard. Failure can be defined by defining a functional failure as follows:

174 The material from this section is taken (in some places verbatim) from: Moubray, “Introduction.”
175 The material from this section is taken (in some places verbatim) from: Hoyland, pages 11-12.

A functional failure is defined as the inability of any “asset to fulfill a function to a
standard of performance, which is acceptable to the user.” 176

A failure could have different aspects of functional failure:

• Partial and total failure

• Upper and lower limits

• Gauges and indicators

• The operating context

Failures may be classified in many different ways:

a. Sudden versus gradual failures

b. Hidden versus evident failures

c. According to effects of severity

(1). Critical failure: A failure that is sudden and causes termination


of one or more primary functions.

(2). Degraded failure: A failure that is gradual and/or partial.

(3). Incipient failure: A deficiency in the condition of an item such
that a critical or degraded failure can be expected unless corrective action is taken.

d. Another classification according to the effects of severity by US Mil-Std


882, “System Safety Program Requirements”:

(1) Catastrophic, which results in loss of life and/or loss of system.

(2) Critical, which results in severe injury and/or illness and/or


severe system damage.

(3) Marginal, which results in minor injury and/or illness and/or


minor system damage.

(4) Negligible with less than minor results.

176 The material from this part of section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction.”
e. Another classification, according to the cause of the failure:

(1) Primary failure due to aging.

(2) Secondary failure due to excessive stresses.

(3) Command fault or transient failures due to improper control


signal or noise.
6. Failure Modes177
“Once each functional failure has been identified, the next step is to try to identify
all the events that are reasonably likely to cause each failed state. These events are known
as failure modes.” Failure modes are those that have occurred on the same or similar
equipment operating with the same parameters and conditions, failures that can be
prevented by existing maintenance policies, and failures that have not yet happened but
are considered likely to happen.178

Failure mode is “the effect by which a failure is observed on the failed item.”
Technical items are designed to perform one or more functions. So a failure mode can be
defined as nonperformance of one of these functions. Failure modes may generally be
subdivided as “demanded change of state is not achieved” and “change of conditions.”

For example, an automatic valve may show one of the following failure modes:

a. Fail to open on command

b. Fail to close on command

c. Leakage in closed position

The first two failure modes are “demanded change of state is not achieved” while
the third one is “change of condition.”
7. Failure Effects179

177 The material from this section is taken (in some places verbatim) from: Hoyland, page 10.
178 The material from this part of section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction,” page 5.
179 The material from this section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction,” page 5.
The fourth of the seven questions in the RCM process, as previously mentioned in
IIA2b of this thesis, is listing “What happens when each failure occurs?” These are
known as “failure effects.”

Failure effects describe what happens when a failure occurs. While describing the
effects of a failure, the following should be recorded:

a. What is the evidence that the failure has happened?

b. In what way does it pose a threat to safety or the environment?

c. In what way does it affect production or operation?

d. What physical damage is caused by the failure?

e. What must be done to repair the failure?


8. Failure Consequences180
Failures affect output, but other factors such as product quality, customer service,
safety or environment also influence output. The nature and severity of these effects
govern the consequences of the failure. The failure effects tell us what happens when
a failure occurs. The consequences describe how, and how much, it matters. For example,
if we can reduce the occurrence (frequency) and/or severity of failure effects, then we can
reduce the consequences.

Therefore, if a failure matters very much, efforts will be made to mitigate or


eliminate the consequences. On the contrary, if the failure is of minor consequence, no
proactive action may be needed.

A proactive task is worth doing if it reduces the consequences of the failure mode
and justifies the direct and indirect costs of doing the task.

Failure consequences could be classified as:

a. Environmental and safety consequences, when the asset cannot fulfill local,
national, or international environmental standards, or when the failure causes injury or death.

180 The material from this section is taken (in some places verbatim) from: Aladon Ltd,
“Introduction,” page 5.
b. Operational, if the failure affects the operation, production output,
quality, cost or customer satisfaction.

c. Non-operational, when only maintenance and/or repair is involved, without
affecting the environment, safety, or production.

d. Hidden, when failures have no direct impact, but they expose the
organization to multiple failures with serious and often catastrophic consequences.

APPENDIX D: RELIABILITY

1. Introduction to Reliability181
Reliability is a concept that has dominated systems design, performance and
operation for the last 60 years. It appeared after WWI, when it was used to compare
operational safety of one, two, three, and four-engine airplanes. At that time reliability
was measured as the number of accidents per flight hour.

During WWII, a group of scientists, under Wernher von Braun in Germany,


developed the V-1 missile. After the war it was reported that the first ten V-1 missiles
were all failures: all of them either exploded on the launching rail,
or landed earlier than planned, in the English Channel. It was the mathematician Robert
Lusser who analyzed the missile system and derived the “product probability law of
series components.” The theorem states that a system functions only if all of its
components function, and it is valid under special assumptions. It simply says that
the reliability of the system is equal to the product of the system’s individual components
reliabilities. If the system has many components, then its reliability is rather low, even
though the individual components have high reliabilities.

In order to avoid low system reliability, engineers in the USA at that time tried to
improve the individual system components. They used “better” materials and “better”
designs for the products. The result was higher system reliability, but a broader, more
thorough analysis of the problem was not performed.

By the end of 1950s and early 1960s, interest in the USA focused on production
of the intercontinental ballistic missile and space research like the Mercury and Gemini
programs. In the race to put a man on the moon, a reliable program was very important.
The first association for engineers working with reliability issues was established. IEEE-
Transactions on Reliability was the first journal published on the subject in 1963. After
that, a number of textbooks were published and in the 1970s many countries from Europe

181 The material from this section is taken (in some places verbatim) from: Hoyland, pages 1-2.

and Asia began dealing with the same issues. Soon it became clear that a low reliability
level cannot be compensated for by extensive maintenance.
2. What is Reliability?
“Until the 1960s, reliability was defined as the probability that an item will
perform a required function under stated conditions for a stated period of time.”
According to the International Standard Organization (ISO) 8402 and British Standard
(BS) 4778, “reliability is the ability of an item to perform a required function, under
given environmental and operational conditions and for a stated period of time.” The term
“item” is used to denote any component, subsystem or an entity system. A “required
function” may be a single function or a combination of functions necessary to provide a
certain service.182

For a defense acquisition system, reliability is a measure of effectiveness.183 It is


one of the “ilities” that a system needs to comply with, in order to be operationally
suitable.

We can keep track of reliability by measuring or calculating some measures of


performance such as:

a. The probability of completing a mission

b. The number of hours without a critical failure under specified mission


conditions or mean time between critical failures (MTBCF)

c. The probability of success as the number of successes divided by the


total number of attempts

d. The mean time to failure (MTTF)

e. The failure rate (failures per unit time)

f. The probability that the item does not fail in a time interval.
3. System Approach
A system is a group of elements, parts, or components that work together for a
specified purpose. A failure of the system is related to the failure of at least one of its parts, elements,
182Hoyland, page 3.
183 Hoivik, slide 6.

or components. A part starts in its working state and, for various reasons, changes
to a failed state after a certain time. The time to failure is considered a random variable
that we can model by a failure-distribution function.184

Failure occurs due to a complex set of interactions between the material properties
and/or physical properties of the part and/or stresses that act on the part. The failure
process is complex and is different for different types of parts or elements or
components.185

The strength or endurance of a part may be significantly and unpredictably varied


because of manufacturing variability. So that strength, say “X”, must be modeled as a
random variable. When the system is being used it is subjected to a stress, say “Y”. If
“X” is less than “Y”, then the part fails immediately because its strength is not enough to
withstand the magnitude of stress “Y”. If “Y” is less than “X”, then the strength of that
part is enough to withstand the stress and the part is functional.

Even though the failure mechanisms vary, they are basically divided into two
categories, the overstress and the wear-out. The overstress failures are those due to
fracture, yielding, buckling, large elastic deformation, electrical overstress, and thermal
breakdown. Wear-out failures are those due to wear, corrosion, metal migration, inter-
diffusion, fatigue-crack propagation, diffusion, radiation, fatigue-crack initiation and
creep.186

For multi-component systems like a SUAV the number of parts may be very large
and a multilevel decomposition of such a system is necessary.
4. Reliability Modeling
a. System Failures187
System failures for a multi-component system can be modeled in several
ways. A system failure is due to the failure of at least one of its components. So analysis

184 Hoyland, page 18.


185 Pecht, page 93.
186 Ibid, page 96.
187 The material from this section is taken (in some places verbatim) from: Blischke, pages 204-205.

of failures at the component level is the initial point of a failure system analysis. “Henley
and Kumamoto (1981) propose the following classification of failures:

(1) Primary failure

(2) Secondary failure

(3) Command fault”

Primary or “natural” is when the component fails due to natural causes


like aging. In that case, replacement of the aging component is the remedy.

Secondary or “induced” is the failure of a component due to excessive


stress resulting from the primary failure of some other component(s) and/or
environmental factors and/or user actions.

“Command fault occurs when a component is in a non-working state because


of improper control signals or noise.” This can be due to a user’s faulty operation or a
logic controller’s faulty operation signal.
b. Independent vs Dependent Failures188
The failure times of components are often influenced by environmental
conditions. As the environment becomes “harsher, the time it takes to reach a failure
decreases.” Thus if the system’s components share the same environment their failure
times are statistically dependent. If the dependence is weak, it can be ignored and failure
times can be treated as statistically independent. In that way failure times can be modeled
separately using univariate failure-distribution functions. But in case of significant
dependence, multivariate failure distributions must be used and modeling becomes much
more complicated.
c. Black-Box Modeling189
A system failure is due to the failure of one or more of its components.
“The number of failed components that must be restored to their working state is usually
small relative to the total number” of the system’s components. Replacing or repairing
the defective component(s) restores the system to its operational state. If the restoration

188 The material from this section is taken (in some places verbatim) from: Blischke, page 205.
189 Ibid.

time is very small relative to the mean time between failures, then it can be ignored, and
we can model the failure system as a function reflecting the effect of age. In other words,
the model function can be viewed as the failure rate of the system through time.

After overhauls or major repairs or design alterations the failure rate of the
system can be significantly reduced. Usually, it becomes smaller than the failure rate
before.

Therefore, in black-box modeling we can collect data through the life


cycle time of a system and find a function that is the failure rate through time. A lot of
data is needed in order for the function to be precisely estimated, so black-box modeling
is not recommended for the design and development phase of a system because of the
changes that continuously alter the failure rate.
d. White-Box Modeling190
“In a white-box modeling, system failure is modeled in terms of the
failures of the components of the system.” We can reach system failures from component
failures using the bottom-up (or forward) approach or the top-down (or backwards)
approach. In the forward approach, we start with part-level failures, and then we proceed
to the system level to evaluate the consequences of such failures on the system’s
performance. FMEA uses this approach. In the backward approach, we start at the system
level failures, and then we proceed downward to the part level to relate system
performance to part-level failures. FTA uses this approach.

“The linking of the system performance to failures at the part level can be
done either qualitatively or quantitatively.” In the qualitative case, we are interested in the
causal relations between failures and system performance. In the quantitative case, we
can use many measures of system effectiveness, like reliability, in terms of component
reliabilities.

For an example assuming independent failures, if a machine has a failure


rate of 1 failure every 100 days then the probability of having a failure on any day is
1/100. If a second redundant machine has the same failure rate, then a system that

190 Blischke, pages 206-207.

consists of both those machines has a probability that both machines fail on the same day
as 1/100 squared or 1/10,000.
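
The same arithmetic can be written out directly. The sketch below simply restates this example (daily failure probability of 1/100 per machine, assumed independent) and contrasts the redundant pair with a series pair.

```python
# Hedged sketch: independent failures with daily failure probability p = 1/100 per machine.
p = 1.0 / 100.0

both_fail_same_day = p * p                    # redundant (parallel) pair fails only if both fail
at_least_one_fails = 1.0 - (1.0 - p) ** 2     # a series pair fails if either machine fails

print(f"Redundant pair, both fail on the same day: {both_fail_same_day:.6f}")   # 0.000100
print(f"Series pair, at least one fails that day:  {at_least_one_fails:.6f}")   # 0.019900
```
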
e. Reliability Measures191
In order to understand the reliability measures, we must determine the
“time-to-failure” as a basic step. Time-to-failure of a system or component or part or unit
or element (system) is the time elapsing from when the system is put into operation until
the first failure. Let t = 0 be the operation starting time. The time to failure is subject to many
variables. Consequently, we can represent time-to-failure as a random variable T. We can
describe the condition or state of the system at time t by the condition random variable
X(t), where

X(t) = \begin{cases} 1, & \text{if the system is functioning at } t \\ 0, & \text{if the system is in a failed condition at } t \end{cases}

The graphical representation of X(t) versus time t is shown in Figure 42.


[Figure: the condition variable X(t) versus time t; X(t) = 1 (working condition) until the time-to-failure T, after which X(t) = 0 (failure).]

Figure 42. Condition Variable Versus Time.(From Hoyland, page 18)

The time-to-failure may not always be measured in time but can also be
measured in numbers of repetitions of operation, or distance of operation, or number of
rotations of a bearing, etc. We can assume that the time-to-failure T is continuously
distributed with a probability density f(t) and distribution function :

191 The material from this section is taken (in some places verbatim) from: Hoyland, pages 18-25.

F(t) = P(T \le t) = \int_0^t f(u)\,du, \quad t > 0.

The probability density f(t) is defined as :

f(t) = \frac{d}{dt} F(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t)}{\Delta t}.

If ∆t is small then:

f (t ) ⋅ ∆t = P(t < T ≤ t + ∆t ) .

A typical distribution function F(t) and the corresponding density function


f(t) are shown in Figure 43.

[Figure: the distribution function F(t), rising from 0 toward 1, and the corresponding density function f(t), plotted against time t.]

Figure 43. Distribution and Probability Density Functions (From Hoyland, page 18)

There are three important measures of reliability:

(1) The reliability or survivor function R(t)

(2) The failure rate z(t)

(3) The mean time to failure (MTTF)

(1). Reliability or Survivor Function R(t). The reliability function
of a system is defined as:

R(t) = 1 - F(t) = P(T > t) for t > 0.   (A)

So R(t) is the probability that the system has operated without


failure in the time interval (0,t]. Equivalently we can say that R(t) is the probability that
the unit survives in the time interval (0,t]. The reliability function R(t) is also called the
“survivor function”. A typical reliability function that corresponds to the distribution
function of Figure 43 can be seen in Figure 44.

[Figure: the distribution function F(t) and the reliability (survivor) function R(t) = 1 - F(t) plotted against time t; F(t) rises toward 1 while R(t) decreases from 1 toward 0.]

Figure 44. Typical Distribution and Reliability Function

(2). Failure-rate or Hazard Function. The probability that a system


will fail in the time interval (t, t+∆t], given that it is in operating condition at time t, is

P(t < T \le t + \Delta t \mid T > t) = \frac{P(t < T \le t + \Delta t)}{P(T > t)} = \frac{F(t + \Delta t) - F(t)}{R(t)}.

Failure-rate z(t) is the limit, as ∆t → 0, of the probability that a system will fail in the interval
(t, t+∆t], given that it is in operating condition at time t, per unit length of time. This gives the
following expression for the failure rate:

z(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} \cdot \frac{1}{R(t)} = \frac{f(t)}{R(t)}   (B)

because it is known that f(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}, or equivalently

f(t) = \frac{d}{dt} F(t).   (C)

From the above it is implied that, when ∆t is small,

P(t < T \le t + \Delta t \mid T > t) \approx z(t) \cdot \Delta t,

so the conditional probability is approximately equal to the failure rate z(t) at time t, times the
length of the interval ∆t.

From (A) and (C) we get f(t) = \frac{d}{dt}(1 - R(t)) = -R'(t). So (B) becomes

z(t) = \frac{-R'(t)}{R(t)} = -\frac{d}{dt} \ln R(t).

Since R(0) = 1, integrating gives \int_0^t z(u)\,du = -\ln R(t), so

R(t) = \exp\left(-\int_0^t z(u)\,du\right).

Finally we have

f(t) = -R'(t) = -\frac{d}{dt} \exp\left(-\int_0^t z(u)\,du\right) = z(t) \exp\left(-\int_0^t z(u)\,du\right), \quad t > 0.

So the failure-rate or hazard function is very useful for modeling,


because everything else can be derived from that.

In the following table the relationships between the distribution


function F(t), the density function f(t), the reliability or survivor function R(t), and the
failure-rate or hazard function z(t) are presented.192

192 Hoyland, page 22.

           In terms of F(t)            In terms of f(t)             In terms of R(t)      In terms of z(t)

F(t) =     —                           ∫_0^t f(u) du                1 − R(t)              1 − exp(−∫_0^t z(u) du)

f(t) =     dF(t)/dt                    —                            −dR(t)/dt             z(t) exp(−∫_0^t z(u) du)

R(t) =     1 − F(t)                    ∫_t^∞ f(u) du                —                     exp(−∫_0^t z(u) du)

z(t) =     (dF(t)/dt) / (1 − F(t))     f(t) / ∫_t^∞ f(u) du         −(d/dt) ln R(t)       —
Table 32. Relationships Between Functions F(t), R(t), f(t), z(t) (From Hoyland, page 22)
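
The relationships in Table 32 can be checked numerically. The sketch below assumes a Weibull hazard function purely for illustration, rebuilds R(t), F(t), and f(t) from it, and confirms that the hazard integral reproduces the closed-form survivor function and that f(t)/R(t) returns z(t).

```python
import math

# Hedged sketch: numerical check of the Table 32 relationships for an assumed
# Weibull hazard z(t) = (beta/eta) * (t/eta)**(beta - 1); all values are illustrative.
BETA, ETA = 1.5, 100.0           # assumed shape and scale (flight hours)

def z(t):                        # failure-rate (hazard) function
    return (BETA / ETA) * (t / ETA) ** (BETA - 1)

def R(t, steps=20_000):          # survivor function R(t) = exp(-integral_0^t z(u) du)
    if t <= 0:
        return 1.0
    h = t / steps
    area = 0.5 * (z(0.0) + z(t)) + sum(z(i * h) for i in range(1, steps))
    return math.exp(-area * h)   # trapezoidal approximation of the cumulative hazard

def F(t):                        # distribution function F(t) = 1 - R(t)
    return 1.0 - R(t)

def f(t, dt=1e-3):               # density f(t) = dF(t)/dt via a central difference
    return (F(t + dt) - F(t - dt)) / (2 * dt)

t = 80.0
closed_form_R = math.exp(-((t / ETA) ** BETA))    # known Weibull survivor function
print(f"R({t}) from the hazard integral: {R(t):.5f}   closed form: {closed_form_R:.5f}")
print(f"f(t)/R(t) = {f(t) / R(t):.6f}   vs   z(t) = {z(t):.6f}")    # Table 32: z = f/R
```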

For most mechanical and electronic systems, the failure rate
over the life of the system has three distinct periods, characterized by the well-known
“Bathtub Curve,” shown in Figure 45.193

[Figure: the bathtub curve of failure rate z(t) versus time, divided into infant mortality (t < a), useful life (a < t < b), and wear-out (t > b) regions; annotations indicate the reliability measure and the durability measure.]

Figure 45. The Bathtub Curve

193 RAC Toolkit, page 38.

Infant mortality is the first phase of the bathtub curve where the
failure rate is high because of early manufacturing tolerances and inadequate
manufacturing skills. The failure rate is decreasing through time because of the maturity
of the design and the manufacturing process. Useful life is the second phase, which is
characterized by a relatively constant failure rate. Wear-out is the last phase, where
components start to deteriorate to such a degree that they have reached the end of their
useful life. This can be modeled either piecewise or as the sum of three failure-rate or
hazard functions, one for each phase. Then R(t) = \exp\left(-\int_0^t z(u)\,du\right) and

z(t) = \begin{cases} z_1(t), & t < a \\ z_2(t), & a < t < b \\ z_3(t), & t > b \end{cases} \qquad \text{or} \qquad z(t) = \sum_{i=1}^{3} z_i(t).

These concepts are illustrated in Figure 45.


(3). Mean-Time-to-Failure (MTTF). The MTTF of a system is the
expected value of T, which is given by the density function f(t) and is defined as:

MTTF = E(T) = \int_0^\infty t\,f(t)\,dt.   (D)

If the time needed to repair or replace a failed system is very short


relative to MTTF, then the mean time between failures (MTBF) is represented by MTTF.
If the repair time is comparable to MTTF, then the MTBF also includes the mean time to
repair (MTTR). These concepts are illustrated in Figure 46.

[Figure: the condition variable X(t) (1 = system up, 0 = system down) versus time t, showing alternating operating intervals (MTTF) and repair intervals (MTTR); each MTTF plus the following MTTR makes up one MTBF.]

Figure 46. MTTF, MTTR, MTBF. (From Hoyland, page 25)

Because f(t) = -R'(t), (D) becomes

MTTF = -\int_0^\infty t\,R'(t)\,dt = -\left[t\,R(t)\right]_0^\infty + \int_0^\infty R(t)\,dt

by partial integration, and if MTTF < \infty, which is what happens in reality, then \left[t\,R(t)\right]_0^\infty = 0, and so

MTTF = \int_0^\infty R(t)\,dt also.   (E)

f. Structure Functions
The system and each component may only be in one of two states,
operable or failed. Let xi indicate the state of component i, for 1 ≤ i ≤ n , and
x_i = \begin{cases} 1, & \text{if component } i \text{ works} \\ 0, & \text{if component } i \text{ has failed} \end{cases}

where x = (x_1, x_2, \ldots, x_n) is the component state vector.

The state of the system is also a binary random variable, which is


determined by the states of its components.

\Phi = \Phi(x) = \text{system state} = \begin{cases} 1, & \text{if the system works} \\ 0, & \text{if the system has failed} \end{cases}

and \Phi = \Phi(x) = \Phi(x_1, x_2, \ldots, x_n) is the structure function of the system.194


194 Kuo, W., and Zuo, J. M., Optimal Reliability Modeling, John Wiley & Sons, 2003, page 87.

A series system with n components works if and only if each of its n
components work, and fails whenever any of its components fails. The structure function
for a series system is
\Phi = \Phi(x) = x_1 \cdot x_2 \cdots x_n = \prod_{i=1}^{n} x_i.195

It cannot usually be predicted with certainty whether or not a given


component will be in a failed state after t time units. So we interpret the state variables of
the n components at time t as random variables, and we denote them
as X 1 (t ), X 2 (t ), ... X n (t ) .

Now we focus on the following probabilities:

P(X_i(t) = 1) = p_i(t) for i = 1, 2, \ldots, n, which is component i’s reliability,

and P (Φ( X (t )) = 1) = ps (t ) , which is the system’s reliability.

For the state variables X i (t ) for i = 1, 2,...n , we have

pi (t ) = Ε[ X i (t )] = 0 ⋅ P( X i (t ) = 0) + 1⋅ P( X i (t ) = 1), for i = 1, 2,...n

For the system reliability at time t, we have:

ps (t ) = Ε[Φ (X (t ))] where X(t ) = ( X 1 (t ), X 2 (t ), ... Xn(t )) ,

Assuming that X 1 (t ), X 2 (t ), ... X n (t ) are independent, the system


reliability is p_s(t) = \prod_{i=1}^{n} p_i(t), or R(t) = r_1(t) \cdot r_2(t) \cdots r_n(t) = \prod_{i=1}^{n} r_i(t), where R(t) is the

system’s reliability and ri(t) is the ith component’s reliability for a series system.196

195 Hoyland, page 99.


196 Ibid, page 127-129.

g. Series System Reliability Function and MTTF197
From Table 32 we find the failure rate function for the system is

z(t) = -\frac{d}{dt} \ln R(t) = -\frac{d}{dt} \ln\left(r_1(t) \cdot r_2(t) \cdots r_n(t)\right),

which gives z(t) = z_1(t) + z_2(t) + \cdots + z_n(t).

So the failure rate for a series system equals the sum of the failure rates of
all its components. As a result, the failure rate of the system is greater than the failure rate
of any of its components, and the whole system is driven by the worst component, which
is the one with the largest failure rate or the lowest reliability.

From the above, we can conclude that if we want to optimize a series


system’s reliability, we must reduce the number of components, and if that is not
possible, then we must enhance the reliability of the worst component.

For example, and to simplify, we may assume that each of the components
in our system has an exponential lifetime distribution. Then the system also has an
exponential lifetime distribution. If z_i(t) = \lambda_i is the failure rate for component i, then the
failure rate for the system is z(t) = \lambda_s = \sum_{i=1}^{n} \lambda_i, and the reliability function of the system
becomes R(t) = e^{-\lambda_s t}. Then (E) becomes MTTF_s = \int_0^\infty e^{-\lambda_s t}\,dt = 1/\lambda_s.
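
A short sketch of this calculation is given below; the component failure rates are assumed placeholders, not estimates for any particular SUAV.

```python
import math

# Hedged sketch: series system with exponentially distributed component lifetimes.
# Component failure rates (failures per flight hour) are assumed placeholders.
component_failure_rates = {
    "engine": 1 / 200.0,
    "autopilot": 1 / 500.0,
    "data link": 1 / 300.0,
    "payload": 1 / 1000.0,
}

lambda_s = sum(component_failure_rates.values())    # series system: failure rates add
mttf_hours = 1.0 / lambda_s                         # MTTF_s = 1 / lambda_s

mission_hours = 2.0
mission_reliability = math.exp(-lambda_s * mission_hours)   # R(t) = exp(-lambda_s * t)

print(f"System failure rate: {lambda_s:.4f} failures per flight hour")
print(f"System MTTF: {mttf_hours:.1f} flight hours")
print(f"Reliability over a {mission_hours:.0f}-hour mission: {mission_reliability:.3f}")
```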

h. Quantitative Measures of Availability


The quantitative measures of availability are listed in the following
table.198

197 Kuo, pages 107-108.


198 RAC Toolkit, page 12.

Measure                    Equation                           Reliability & Maintainability considerations

Inherent Availability      Ai = MTBF / (MTBF + MTTR)          Assures operation under declared conditions in an ideal customer service environment. It is usually not a field-measured requirement.

Achieved Availability      Aa = MTBM / (MTBM + MTTRactive)    Similar to Ai.

Operational Availability   Ao = MTBM / (MTBM + MDT)           Extends Ai to include delays. Reflects the real-world operating environment. Not specified as a manufacturer-controllable requirement.

MTBF = Mean Time Between Failure
MTTR = Mean Time to Repair (corrective maintenance only)
MTBM = Mean Time Between Maintenance
MTTRactive = Mean Active Time to Repair
MDT = Mean Downtime

Table 33. The Quantitative Measures of Availability (After RAC Toolkit, page 12)
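
A brief numerical sketch of the three availability measures follows; every input value is assumed purely for illustration.

```python
# Hedged sketch: inherent, achieved, and operational availability (Table 33).
# All input values are assumed illustration numbers, in hours.
MTBF = 120.0          # mean time between failures
MTTR = 2.0            # mean corrective repair time
MTBM = 80.0           # mean time between maintenance (corrective and preventive)
MTTR_ACTIVE = 1.5     # mean active maintenance time
MDT = 10.0            # mean downtime, including logistics and administrative delays

Ai = MTBF / (MTBF + MTTR)
Aa = MTBM / (MTBM + MTTR_ACTIVE)
Ao = MTBM / (MTBM + MDT)

print(f"Inherent availability    Ai = {Ai:.3f}")
print(f"Achieved availability    Aa = {Aa:.3f}")
print(f"Operational availability Ao = {Ao:.3f}")
```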

APPENDIX E: LIST OF ACRONYMS AND DEFINITIONS

AAV - Advanced Air Vehicle

ACTD - Advanced Concept Technology Demonstrations

AEA - American Engineering Association

APU - Auxiliary Power Units

ATA - Air Transport Association

BDA - Battle Damage Assessment

BS - British Standard

CAA/UK - Civil Aviation Administration from the UK

CHAE - Conventional High-Altitude Endurance

CP – Counter-proliferation

DARPA - Defense Advanced Research Projects Agency

DS - Discard

EO - Electro-Optical

EPRI - Electric Power Research Institute

ERAST - Environmental Research Aircraft and Sensor Technology

FAA - Federal Aviation Authority

FMA - Failure Mode Analysis

FMCA - Failure Mode and Critical Analysis

FMEA - Failure Mode and Effect Analysis

FMECA - Failure Mode Effect and Criticality Analysis

FRACAS - Failure Reporting, Analysis, and Corrective Action System

FTA - Fault Tree Analysis

GCS – Ground Control Station

GPS - Global Positioning System

HAE - High-Altitude Endurance

ICAO - International Civil Aviation Organization

IN/FC - Inspection/Functional Check

INS – Inertial Navigation System

IR - Infrared

ISC - Industry Steering Committee

ISO - International Standard Organization

JSF - Joint Strike Fighter

L/HIRF - Lightning/High Intensity Radiated Field

LOS - Line-Of-Sight

LU/SV - Lubrication/Servicing

MAV - Micro-Air Vehicle

MDT - Mean Downtime

MR - Mishap Rate

MRB - Maintenance Review Board

MRBR - Maintenance Review Board Report

MSG-3 - Maintenance Steering Group-3

MSI - Maintenance Significant Items

MTBCF - Mean Time Between Critical Failure

MTBF - Mean Time Between Failure

MTBM - Mean Time between Maintenance

MTTR - Mean Time to Repair

MUAV - Micro UAV

NASA - National Aeronautics and Space Administration

NAWC/AD - Naval Air Warfare Center Aircraft Division

NPS – Naval Postgraduate School

NRL - Naval Research Laboratory

O&S - Operation and Support

OBC - Onboard Computer

OP/VC - Operational/Visual Check

OTHT - Over The Horizon Targeting

PM - Planned Maintenance

QFD - Quality Function Deployment

RC - Radio Control

RCM - Reliability Centered Maintenance

RECCE - Reconnaissance mission

RPN - Risk Priority Number

RPV - Remote Piloted Vehicles

RS - Restoration

RSTA - Reconnaissance Surveillance and Target Acquisition

SAE - Society of the Automotive Engineers

SAR - Synthetic Aperture Radar

SARP - Standards and Recommended Practices

SEAD - Suppression of the Enemy Air Defences

SIGINT – Signal Intelligence

SSI - Structural Significant Items

STAN - Surveillance and Tactical Acquisition Network

SUAV - Small Unmanned Aerial Vehicle

TAAF - Test, Analyze and Fix

TR - Tactical Reconnaissance

TUAV - Tactical UAV

UCAV - Unmanned Combat Aerial Vehicle

UHF - Ultra High Frequency

VR - Vendor Recommendations

VTOL - Vertical Take-Off and Landing

WG - Working Group

LIST OF REFERENCES

Air Transport Association of America, “ATA MSG-3, Operator/Manufacturer


Scheduled Maintenance Development, Revision 2002.1,” November 30, 2001.

Aladon Ltd, Specialists in the application of Reliability-Centered Maintenance,


“Reliability Centred Maintenance-An Introduction,” Internet, February 2004. Available
at: www.aladon.co.uk/10intro.html

Aladon Ltd, Specialists in the application of Reliability-Centered Maintenance,


“About RCM,” Internet, February 2004. Available at: www.aladon.co.uk/02rcm.html

Ashworth, Peter, LCDR, Royal Australian Navy, Sea Power Centre, Working
Paper No6, “UAVs and the Future Navy”, May 2001, Internet, February 2004. Available
at: http://www.navy.gov.au/spc/workingpapers/Working%20Paper%206.pdf

Athos Corporation, Reliability-Centered Maintenance Consulting, “SAE RCM


Standard: JA 1011, Evaluation Criteria for RCM Process,” Internet, February 2004.
Available at: http://www.athoscorp.com/SAE-RCMStandard.html

Barfield, Finley, Automated Air Collision Avoidance Program, Air Force


Research Laboratory, AFRL/VACC, WPAFB,“Autonomous Collision Avoidance: the
Technical Requirements,” 0-7803-6262-4/00/$10.00(c)2000 IEEE.

Blischke, R. W., and Murthy D. N. Prabhakar, Reliability Modeling, Prediction,


and Optimization, John Wiley & Sons, 2000.

Carmichael, Bruce W., Col (Sel), Troy E. DeVine, Maj, Robert J. Kaufman, Maj,
Patrick E. Pence, Maj, and Richard S. Wilcox, Maj, “Strikestar 2025,” August 1996,
Department of Defense, Internet, February 2004. Available at: http://www.au.af.mil
/au/2025/volume3 /chap13/v3c13-2.htm

Ciufo, Chris A., “UAVs:New Tools for the Military Toolbox,” [66] COTS
Journal, June 2003, Internet, May 2004. Available at: http://www.cotsjournalonline.com
/2003/66

Clade, Lt Col, USAF, “Unmanned Aerial Vehicles: Implications for Military
Operations,” July 2000, Occasional Paper No. 16 Center for Strategy and Technology,
Air War College, Air University, Maxwell Air Force Base.

Clark, Richard M., Lt Col, USAF, “Uninhabited Combat Aerial Vehicles,


Airpower by the People, For the People, But Not with the People,” CADRE Paper No. 8,
Air University Press, Maxwell Air Force Base, Alabama, August 2000, Internet, February
2004. Available at: http://www.maxwell.af.mil/au/aupress/CADRE_Papers/PDF_Bin
/clark.pdf

Clarke, Phill, “Letter to the Editor of New Engineer Magazine regarding Professor
David Sherwin at ICOMS 2000,” question 10, August 2000, Internet, May 2004.
Available at: http://www.assetpartnership.com /downloads.htm-13k

Clough, Bruce, “UAVS-You Want Affordability and Capability? Get


Autonomy!” Air Force Research Laboratory, 2003.

Coker, David, and Geoffrey Kuhlmann, “Tactical-Unmanned Aerial Vehicle


‘Shadow 200’ (T_UAV),” Internet, February 2004. Available at:
http://www.isye.gatech.edu/~tg /cources/6219/assign /fall2002/TUAVRedesign/

Department of Defense, Director of Operational Test & Evaluation, “Missile


Defense and Related Programs FY 1997 Annual Report,” February 1998, Internet,
February 2004. Available at: http://www.fas.org/spp/starwars/program/dote97/97cp.html

Department of Defense, MIL-STD-1629A, “Procedures For Performing a Failure


Mode Effects and Criticality Analysis,” Task 101 FMEA sheet, November 24, 1980.

Dixon, Stephen R., and Christopher D. Wickens, “Control of multiple UAVs: A


Workload Analysis,” University of Illinois, Aviation Human Factors Division, Presented
to 12th International Symposium on Aviation Psychology, Dayton, Ohio 2003.

Duane, J. J., “Learning Curve Approach to Reliability Modeling,” Institute of


Electrical and Electronic Engineers Transactions on Aerospace and Electronic Systems
(IEEE Trans. Aerospace, 2, 563), 1964.

Fei-Bin, Hsiao, Meng-Tse Lee, Wen-Ying Chang, Cheng-Chen Yang, Kuo-Wei
Lin, Yi-Feng Tsai, and Chun-Ron Wy, ICAS 2002, 23rd International Congress of
Aeronautical Sciences, proceedings, Toronto Canada, 8 to 13 September 2002, Article:
“The Development of a Low Cost Autonomous UAV System”, Institute of Aeronautics
National Cheng Kung University Tainan, TAIWAN ROC.

GlobalSecurity.org, “Pioneer Short Range (SR) UAV,” maintained by John Pike,


last modified: November 20, 2002, Internet, May 2004. Available at: http://www.
globalssecurity.org/intell/systems/pioneer.htm

GlobalSecurity.org, “RQ-3 Dark Star Tier III Minus,” maintained by John Pike,
last modified: November 20, 2002, Internet, May 2004. Available at:
http://www.globalsecurity.org /intell/systems/darkstar.htm

Goebel, Greg,/ In the Public Domain, “[6.0] US Battlefield UAVs (1),” January 1,
2003, Internet, February 2004. Available at: http://www.vectorsite. net/twuav6.html

Gottfried, Russell, LCDR (USN), Unmanned Vehicle Integration TACMEMO, 5-6


May Recap, e-mail May 7, 2004.

Hoivik, Thomas H., OA-4603 Test and Evaluation Lecture Notes, Version 5.5,
“The Role of Test and Evaluation,” presented at NPS, winter quarter 2004.

Hoyland, A., and Rausand, M., System Reliability Theory: Models and Statistics
Methods, New York: John Wiley and Sons, 1994.

Kececioglu, D., Reliability Engineering Handbook Volume 2, Prentice Hall Inc.,


1991.

Kuo, W., and Zuo, J. M., Optimal Reliability Modeling, John Wiley & Sons,
2003.

Lewis E. E., Introduction to Reliability Engineering, Second Edition, John Wiley


& Sons, 1996, pages 211-212.

Lopez, Ramon, American Institute of Aeronautics and Astronautics (AIAA),


“Avoiding Collisions in the Age of UAVs,” Aerospace America, June 2002, Internet,

February 2004. Available at: http://www.aiaa.org/aerospace /Article.cfm?
issuetocid=223&ArchiveIssueID=27

McDermott, E. R., Mikulak, J. R, and Beauregard, R. M., The Basics of FMEA,


Productivity Inc., 1996.

Meeker, Q. W., and Escobar, A. L., Statistical Methods for Reliability Data, John
Wiley & Sons Inc., 1998.

Message from COMNAVAIRSYSCOM to HQ USSOCOM MACDILL AFB


FL, March 26, 2004, “UAV Interim Flight Clearance for XPV-1B TERN UAV System,
Land Based Concept of Operation Flights.”

Morris, Jefferson, Aerospace Daily, December 8, 2003, “Navy To Use Wasp


Micro Air Vehicle To Conduct Littoral Surveillance.”

Moubray, John, summarized by Sandy Dunn, Plant Maintenance Resource Center,


“Maintenance Task Selection-Part 3,” Revised September 18, 2002, Internet, May 2004.
Available at: http://www.plant-maintenance.com/articles/maintenance_tak_selection_
part2.shtml

Moubray, John, an excerpt of the first chapter of the book “Reliability-centered


Maintenance,” Plant Maintenance Resource Center, “Introduction to Reliability-centered
Maintenance,” Revised December 3, 2002, Internet, May 2004, Available at:
http://www.plant-maintenance.com/RCM-intro.shtml

Munro, Cameron and Petter Krus, AIAA’s 1st Technical Conference & Workshop
on Unmanned Aerospace Vehicles, Systems, Technologies and Operations; a Collection
of Technical Papers, AIAA 2002-3451,“A Design Approach for Low cost ‘Expendable’
UAV system,” undated.

Nakata, Dave, White paper, “Can Safe Aircraft and MSG-3 Coexist in an Airline
Maintenance Program?”, Sinex Aviation Technologies, 2002, Internet, May 2004.
Available at: http://www.sinex.com/ products/Infonet/q8.htm

National Aeronautics and Space Administration (NASA), “Preferred Reliability


Practices: Problem Reporting and Corrective Action System (PRACAS),” practice NO.

PD-ED-1255, Internet, February 2004. Available at: http://klabs.org/DEI /References
/design_guidelines/design_series/1255ksc .pdf

National Aeronautics and Space Administration (NASA), “Reliability Centered


Maintenance & Commissioning,” February 16, 2000, Internet, May 2004. Available at:
http://www.hq.nasa.gov/office/codej/codejx/Intro2.pdf

National Air and Space Museum, Smithsonian Institution, “Pioneer RQ-2A,”


1998-2000, revised 9/14/01 Connor R. and Lee R. E., Internet, May 2004. Available at:
http://www.nasm.si.edu /research/aero/aircraft/pioneer.htm

NAVAIR, “Small Unmanned Aerial Vehicles,” undated, Internet, February 2004.


Available at: http://uav.navair.navy.mil/smuav/smuav_home.htm

Office of the Secretary of Defense (OSD), “Unmanned Aerial Vehicles Roadmap


2000-2025,” April 2001.

Office of the Secretary of Defense (OSD), “Unmanned Aerial Vehicles Roadmap


2002-2027,” December 2002.

Pecht, M., Product Reliability Maintainability and Supportability Handbook,


CRC Press, 1995.

Peck, Michael, National Defense Magazine, May 2003, Feature Article,


“Pentagon Unhappy About Drone Aircraft Reliability, Rising Mishap Rates of Unmanned
Vehicles Attributed to Rushed Deployments,” Internet, February 2004. Available at:
http://www. nationaldefensemagazine.org/article. cfm?Id=1105

Petrie, G., Geo Informatics, Article “Robotic Aerial Platforms for Remote
Sensing,” Department of Geography &Topographic Science, University of Glasgow,
May 2001, Internet, February 2004. Available at: http://web.geog.gla.ac.uk /~gpetrie
/12_17_petrie.pdf

Pike, John, Intelligence Resource Program, “Unmanned Aerial Vehicles


(UAVS),” Internet, March 2004. Available at: http://www.fas.org/irp/program/collect/ua

Puscov, Johan, “Flight System Implementation,” Sommaren-Hosten 2002, Royal
Institute of Technology (KTH), Internet, February 2004. Available at: http://www.
particle.kth.se/group_docs/admin /2002/Johan_2t.pdf

Regan, Nancy, RCM Team Leader, Naval Air Warfare Center, Aircraft Division,
“US Naval Aviation Implements RCM,” undated, Internet, February 2004. Available at:
http://www.mt-online.com/articles/0302_navalrcm.cfm

Reliability Analysis Center (RAC), Failure Mode, Effects and Criticality Analysis
(FMECA), 1993.

Reliability Analysis Center (RAC), Fault Tree Analysis (FTA) Application Guide,
1990.

Reliability Analysis Center (RAC), Reliability Toolkit: Commercial Practices


Edition. A Practical Guide for Commercial Products and Military Systems Under
Acquisition Reform, 2004.

Riebeling, Sandy, Redstone Rocket Article, Volume 51, No.28, “Unmanned Aerial
Vehicles,” July 17, 2002, Col. Burke John, Unmanned Aerial Vehicle Systems project
manager, Internet, February 2004. Available at: http://www.tuav.redstone.army.mil
/rsa_article.htm

Robinson, John, Technical Specialist Mercury Computers, COTS Journal, “UAV


Multi-Mission Payloads Demand a Flexible Common Processor,” June 2003, Internet,
February 2004. Available at: http://www.mc.com/literature/literature_files/COTSJ
_UAVs_6-03.pdf

Sakamoto, Norm, presentation: “UAVs, Past Present and Future,” Naval


Postgraduate School, February 26, 2004.

Stamatis, D. H., Failure Mode and Effect Analysis: FMEA from Theory to
Execution, American Society for Quality (ASQ), 1995.

Sullivan, Carol, Kellogg, James, Peddicord, Eric, Naval Research Lab, January
2002, Draft of “Initial Sea All Shipboard Experimentation.”

Teets, Edward H., Casey J. Donohue, Ken Underwood, and Jeffrey E. Bauer,
National Aeronautics and Space Administration (NASA), NASA/TM-1998-206541,
“Atmospheric Considerations for UAV Flight Test Planning,” January 1998, Internet,
February 2004. Available at: http://www.dfrc.nasa.gov /DTRS/1998/PDF/H-2220.pdf

The Global Aircraft Organization, US Reconnaissance, “U-2 Dragon Lady,”


Internet, February 2004. Available at: http://www.globalaircraft.org/planes/u-2
_dragon_lady.pl

The Warfighter’s Encyclopedia, Aircraft, UAVs, “RQ-2 Pioneer,” August 14,


2003, Internet, February 2004. Available at: http://www.wrc.chinalake. navy.mil
/warfighter_enc/aircraft/UAVs/pioneer .htm

Tozer, Tim, David Grace, John Thompson, and Peter Baynham, “UAVs and
HAPs-Potential Convergence for Military Communications,” University of York, DERA
Defford, undated, Internet, February 2004. Available at: http://www.elec.york.ac.uk
/comms/papers/tozer00_ieecol.pdf

Transport Canada Civil Aviation (TCCA), Maintenance Instruction Development


Process, TP 13850, Part B, “The Maintenance Review Board (MRB) Process (TP 13850),
Chapter 1. General,” last updated: April 19, 2003, Internet, February 2004. Available at:
http://www.tc.gc.ca/civilaviation /maintenance/aarpd/tp13850/partB.htm

UAV Annual Report FY 1997: Subsystems, Key subsystem program, “UAV


common recovery system (UCARS),” Internet, February 2004. Available at:
http://www.fas.org/irp/agency/daro/uav97 /page36.html

UAV Rolling News, “New UAV work for Dryden in 2004,” June 12, 2003,
Internet, February 2004. Available at: http://www.uavworld.com /_disc1/00000068.htm

UAV Rolling News, “UAV Roadmap defines reliability objectives,” March 18,
2003, Internet, February 2004. Available at: http://www.uavworld.com/_disc1 /0000002

Undated message from Commander, Cruiser Destroyer Group 12 to Commander,


Second Fleet, “Urgent Requirement for UAVs in Support of Enterprise Battle Group
Recognized Maritime Picture.”

Williams, Warren, and Michael Harris, “The Challenges of Flight-Testing
Unmanned Air Vehicles,” Systems Engineering, Test & Evaluation Conference, Sydney,
Australia, October 2002.

Zachany, Bathon A., Marine Forces Reserve, “Unmanned Aerial Vehicles Help
3/14 Call For and Adjust Fire,” Story ID Number: 2001411104010, April 5, 2001,
Internet, February 2004. Available at: http://www.13meu.usmc. mil/marinelink
/mcn2000.nsf/Open document.

INITIAL DISTRIBUTION LIST

1. Defense Technical Information Center


Ft. Belvoir, VA

2. Dudley Knox Library


Naval Postgraduate School
Monterey, CA

3. Dr. Dave Olwell (2)


NPS, (Code OR-OL)
Monterey, CA

4. Dr. David W. Netzer


NPS, (Code 09)
Monterey, CA

5. Dr. Mike McCauley


NPS, (Code OR)
Monterey, CA

6. Dr. Thomas Hoivik


NPS, (Code OR)
Monterey, CA

7. Gottfried Russell, LCDR


NPS, (Code OR)
Monterey, CA

8. CDR Don Varner, Air 4.1.1.5


UAV Class Desk, PMA-263, NAVAIR
Patuxent River, MD

9. Brad Johnson, Code 4.1.1.5


PMA-263, NAVAIR
Patuxent River, MD

10. Brannan Joe D., Code 4.1.1.5


Fire scout Class Desk
Patuxent River, MD

11. Sorensen Dennis, Capt,


3029 Jeannie Anna Ct,
Oak Hill, Va 20171
