Applied Reliability Centered Maintenance
front matter i-xxiv.qxd 3/3/00 2:28 PM Page ii
Applied
Reliability-Centered
Maintenance
by Jim August, PE
OME, Inc.
Copyright 1999 by
PennWell
1421 S. Sheridan/P.O. Box 1260
Tulsa, Oklahoma 74101
03 02 01 00 99 1 2 3 4 5
Table of Contents
Figures
Tables
Acronyms
Acknowledgements
Preface
Chapter 1 Applied RCM: An Overview
Chapter 2 Maintenance
Chapter 3 RCM Performance
Chapter 4 Plant Needs
Chapter 5 Applications
Chapter 6 Lessons
Chapter 7 Fast Track
Chapter 8 Maintenance Software
Chapter 9 Measures
Chapter 10 Conclusion
Appendices
Glossary
Further Readings
RCM Software Applications
References
Index
Figures
Tables
1-1 Common Outage Intervals
2-1 Example of Time Base Intervals
3-1 Equipment for Standard Templates
3-2 Strategy Options
4-1 Value
4-2 PM Basis Format
4-3 PM vs. CM and WO classes
4-4 Component, Function, Part Failure
6-1 Tick Summary
6-2 Critical Instruments
7-1 Areas Not Worked On-Line
Acronyms List
Acknowledgements
...a chaise breaks down, but doesn't wear out
new term? RCM has been given a black eye by analysis. Some would
make it a religion or take it into non-measurable, esoteric philosophy.
I believe that RCM's most appropriate application is fundamentally as
a technology. There's little new except complexity, failure behavior,
numbers, exploration, and RCM's integrating perspective. Since no
one has offered a completely comprehensive text emphasizing
maintenance technology, I offer this work.
The most tedious and difficult aspects of RCM are selecting task
limits and intervals. Fortunately, RCM also offers techniques that
provide answers, even with incomplete information, along with powerful
alternatives that help us to manage risk. Technical competence is
taken for granted; it may or may not be available.
ARCM is fundamentally about using RCM for value: to understand
which paths to pursue, what ax to grind, and where to focus.
ARCM is finding those things that provide value in a specific setting,
and doing them. Just do it!
Know your equipment. Know how it ages. Know that it is aging,
and how to restore it. And when it's broken, just fix it! This last
missive is the hardest. A chaise breaks down but never wears out!
Preface
Things that matter most must never be at the mercy of those that matter least.
-Goethe
There hasn't been a maintenance best seller since Zen and the
Art of Motorcycle Maintenance, a book on philosophy. The author's
theme was his love/hate relationship with technology, expressed as his
motorcycle.
Maintenance is a tough subject to write about. It's so dry! Yet it's
a subject we all immediately relate to, both professionally and as people.
The last significant new book on technical maintenance introduced
us to RCM, in 1978. Since then, a few new maintenance terms
have been added, such as total productive maintenance (TPM) and
total quality management (TQM). Asset management is the latest twist
on the subject, as I write this. The original United Airlines (UAL)
work, Reliability Centered Maintenance, by Nowlan and Heap, published
in 1978, has led to 20 years of implementation history in the
nuclear generation industry. Several re-interpretations have been
published. My purpose in this book is to provide new fundamental
interpretations of ARCM based upon the original theory, and to discuss
them in terms of real-world problems and experience. These problems
at one time or another demanded hours of analytical thought,
planning, and performance; their learning was in some instances
bitterly bought.
RCM offered a fresh perspective on maintenance, focusing on the
theory and methodology of traditional, non-aeronautics RCM.
However, the original RCM field, aeronautics, differs radically from
power generation. RCM texts currently available could easily leave
the impression that applying RCM requires the skills of engineers and
mathematicians.
In fact, RCM summarizes practical experience, putting it on a firm
engineering and mathematical foundation. RCM provides a fresh
perspective on maintenance to enable us to make better, more informed
grams. On another level, there are only a few unique and new
dimensions in RCM. One of these is inherent reliability. The single most
important maintenance lesson is that when you have a maintenance
problem, fix it. No benefit results from deferring or ignoring
maintenance. The likely results are complications: compounded problems,
secondary failures, and greater expenses.
This book encourages small, simple steps to help implement RCM
and to help guide big projects in organizations with formal
maintenance foundations. These methods counter prevailing wisdom, and I
eagerly look forward to the discussions I hope to generate.
Examples cited here all have a basis in fact and are taken from a
historical perspective. Some were faced as long as 20 years ago! In
some instances the plants and even the companies no longer exist. All
are provided to stimulate thought and provoke reader reflection. I
believe that had we known of these methods at the time, and had
broader recognition of their validity, we would have been more
effective and might have avoided expensive consequences. On the other
hand, where we decided to be effective at the time, we typically were.
So, what key points of ARCM are covered in this text?
chapter 1 1-22.qxd 3/14/00 5:08 PM Page 1
Chapter 1
Applied RCM: An Overview
The problem with twin-engine planes is that they double your chance of engine failure.
-(Aviation anecdote)
Precursors to RCM
Early in the 20th century, Frederick Taylor, William Shewhart, W.E.
Development of RCM
The term comes from the title of the work, Reliability Centered
Maintenance, by Stanley Nowlan and Howard Heap. It was published
by United Airlines in conjunction with the U.S. Defense Department. It
remains available through the National Technical Information Service
of the Department of Commerce.
RCM fills a void between reliability (R) engineering, which focuses on
the theory and mathematics of R, and the workplace, where maintaining
production is key. Applied conscientiously over time, RCM provides
production focus. While there are other tools (and no single one is
perfect), and although tools and processes overlap in approaches
(adjunct tools include training, technology, and software), RCM is
particularly suited to American culture and needs.
Consultants sell versions of RCM. At least 10 different software
packages purport to allow users to perform RCM. Two-to-four page
magazine ads in maintenance periodicals promise to teach RCM in three
days. (I wish these guys had been around when I took integral calculus.
Perhaps I could have learned that in three days!) Some companies
practice more than one version. If for no other reason than to engage in
small talk at industry conferences, it's useful to know what RCM is
and what it is not.
RCM has other names. PMO is one. Common sense is another.
There are certainly competing versions of RCM, as well. An RCM
process standard has been drafted. Questions outnumber answers:
Origin
Maintenance came into its own as a concept with the industrial
revolution. Before that time, machines were designed, built, and
maintained by their users. Watt, Edison, Westinghouse, the Wrights, Sikorsky,
and a long list of other brilliant people conceptualized, developed,
tested, and debugged their own designs. They had few peers, for design is
the realm of sheer genius. Design-build-operate information exchange
wasn't necessary; the roles were integrated in one and the same person.
The industrial revolution differentiated processes. Product users
became separate from product makers. As production became dependent
upon machines, specialties (operators and maintainers) emerged
and evolved into different jobs. Scientific work analysis (espoused
early in the 20th century by Frederick Taylor) found that there were
benefits in specialization. The assembly line, dedicating low-skilled
workers to specific assembly tasks, took this position to an extreme degree.
Operations diverged from maintenance. Managers didn't want opera-
Figure 1-1: Idarado Ball and Rod Mill, Pandora, Colorado. Informal on-the-job training,
remote owners, and lack of operating strategy led to sporadic operations, high costs,
and eventual shutdown. This plant, employing 350 people, kept the otherwise
nondescript town of Telluride from becoming yet another western ghost town in the 1960s.
RCM
When Stan Nowlan and Howard Heap coined the term R-centered
maintenance in their 1978 publication, they summarized early jet engine
R development by the commercial airline industry and the FAA.
Ultimately, RCM was applied to jumbo airliners (beginning with the
Boeing 747) to capture practical R lessons in a highly visible field. This
work provides many of the concise RCM terms:
condition-monitoring
maintenance task
hard-time
logic tree analysis
on-condition
effectiveness
age exploration
failure-finding
time-based
R studies in the late 1950s were driven by the large lead the Soviets
apparently held in missile technology. Spurred on by congressional
funding, R studies in defense and aerospace took many paths. Spin-off
benefits included development of:
Post-World War II
Soon after the war, television added a whole new dimension to
American life. Jet engines, rockets, and nuclear reactors were
introduced. Designs were refined and matured. The steam locomotive,
beneficiary of a hundred years of evolution, was outflanked by
submarine diesel engines that had been modified for locomotive use. Post-war
production shifted to consumer products. By the 1950s, a new
paradigm presented American products and technology as the best in the
world, though new products provided new problems amid the
technical advances. Technology growth led to a second preventive
maintenance paradigm: predictive maintenance (PdM).
PdM suited the rapid advances in diagnostics and equipment taking
place at the time. Using our insight into the mechanics of failures, we
would be able to predict when things were going awry, and then head
them off before they did. The Department of Defense applied PdM on
F-105s, fast-attack submarines, and M1 Abrams tanks. Maintenance
practitioners and managers embraced PdM applications such as
vibration monitoring, oil sample analysis, multi-channel analyzers, and
remote telemetered data. Regulators also saw the appeal in these
philosophies, so much so that they sometimes mandated their use. Areas
of vital public interest, such as nuclear power and air transportation,
were early PdM proponents. Military procurement contracts specified
PdM use. Industrial safety and environmental protection followed.
Over time, however, requirements became more prescriptive.
Computerized maintenance management systems (CMMS) delivered
information with ease; suddenly, organizations were buried in
maintenance demands. More parties took interest in the maintenance process
and had resources to pursue their interests. The vast resources of the
federal government could be applied where the public interest was
concerned. The PdM experience bogged down and stalled.
PdM acknowledged time-based PM but emphasized that you
couldn't prevent all failures with TBM. You could do something nearly
as good and possibly more useful, however: you could know when
things were starting to fail. All you needed was the right diagnostic tools
and the ability to interpret them. All it took was a little savvy and the
right technology, and Americans had both! The model held great
appeal. So much so, that thousands of predictive maintenance programs
Airlines and Air Force statistical failure studies supported a new and
fundamentally different interpretation of failures.
Advancing rapidly along several paths, failure pattern recognition
developed into the identification and study of failure modes and their
effects. Emphasis shifted from performing repairs (the historical focus
of maintenance) to understanding the causes of failure. The
assumption that maintenance was always effective was challenged. Systems
theory and evaluation of the Pratt & Whitney JT-4 engine maintenance
results laid the foundations of what has come to be called RCM.
Key aspects of the initial findings included:
systems focus
recognition of complexity as an important attribute in modern equipment
failure classification by modes
assessment of failure mode effects on systems
numerical and statistical data evaluation of large equipment populations
Key:
PM -- Preventive Maintenance
CM -- Corrective Maintenance
TBM -- Time Based Maintenance
CDM -- Condition-Directed Maintenance
OCM -- On-Condition Maintenance
OCMFF -- (OCM) Failure Finding
NSM -- No Scheduled Maintenance
CNM -- Condition Monitoring
Figure 1-3: Maintenance Terms Map
Applied RCM
Because it evolved in a highly regulated environment, traditional
RCM (TRCM) includes a rigorous task selection methodology that
follows detailed flow paths needed to document decision-making. This is
The R in RCM
Reliability defined
Mathematically defined, reliability (R) is a conditional probability:
the ratio of acceptable outcomes to total trials. More exactly, R is the
probability that components, equipment, and systems will perform their
design functions without failure. It's based upon:
While it's not the purpose of this book to develop R theory, we need
to understand basic R concepts to appreciate the R in RCM. Intuitively,
we should have some benchmark R numbers in mind when we look at
any equipment. For example, is a feedpump R of 0.99995 satisfactory?
In what context? How about overall feedwater system R? Two 50%
pump combinations? Three? What are the benchmark comparison
standards? How can we relate these numbers to conditions that utility
managers more closely follow, such as equivalent availability, capacity,
and cost?
R = 1 - Unreliability
0.99995 = 1 - 0.00005
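The questions raised earlier about one pump versus two or three 50% pumps can be explored numerically. The Python sketch below is illustrative only. It assumes independent, identical pumps, reuses the 0.99995 feedpump figure quoted in the text, and introduces a hypothetical helper, k_of_n_reliability, that is not part of the original work.

```python
from math import comb

def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent, identical units work.

    Assumption (not from the text): units fail independently and each
    has the same reliability r over the period of interest.
    """
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

pump_r = 0.99995  # feedpump R quoted in the text

# One 100% pump: the system works only if that pump works.
single = k_of_n_reliability(1, 1, pump_r)

# Two 50% pumps: full output needs both, so there is no redundancy.
two_half = k_of_n_reliability(2, 2, pump_r)

# Three 50% pumps: any two carry full load, so one pump can fail.
three_half = k_of_n_reliability(2, 3, pump_r)

for label, r in [("1 x 100%", single), ("2 x 50%", two_half), ("3 x 50%", three_half)]:
    print(f"{label}: R = {r:.10f}, unreliability = {1 - r:.3e}")
```

Under these assumptions, two 50% pumps are less reliable for full output than a single pump, while a third 50% pump more than recovers the difference. Real plants rarely satisfy the independence assumption exactly, so treat the numbers as a benchmark rather than a prediction.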
R engineering
R engineering applies R theory to solve engineering problems. This
is done by projecting a system's overall R and applying engineering
methods to assure that the projected goals are achieved. When R is
allocated among constituent components, successful mission completion
can be established for new designs with relative confidence. For existing
facilities, sources of unreliability can be identified and traced back to
causes: design, operation, maintenance, or a combination thereof.
Unlike military R applications that focus on individual mission
events, power plants look at operating periods. These could be:
it's easy to cut back on capital investments that have little or no
short-term payback, even if they could affect mean-time-to-repair for
essential equipment. Utilities often establish a project budget to manage
costs without considering long-term operational consequences. This
provides an opportunity to apply R engineering.
That being said, in commercial generation designs, R engineering
uses general thumb rules (standards and guidelines) to achieve
client contract goals. R is built upon incremental advances in
production methods and facilities, standardized redundancy, layout planning,
and common design packages. Designers use experience and similar
designs to project plant R. Probabilistic risk assessments are reserved for
nuclear plants, where special requirements (such as the NRC's
maintenance rule) override simple economics.
There are two ways to evaluate R: a priori (before the facts) and a
posteriori (afterwards). Production R engineering looks at a facility's a
posteriori performance, examining sources of unreliability and their
causes. By allocating unreliability downward to systems, equipment,
and components, engineers identify those areas with the greatest
opportunity for improvement. They can then allocate resources to where they
will do the most good.
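Allocating unreliability downward can be pictured as a simple ranking of contributors. The sketch below is a minimal illustration, not the author's procedure. The system names and forced-outage hours are invented, and rank_contributors is a hypothetical helper.

```python
# Hypothetical forced-outage hours by system for one unit-year.
# These numbers are invented for illustration; they are not from the text.
outage_hours = {
    "boiler tubes": 310.0,
    "feedwater": 95.0,
    "turbine": 60.0,
    "I&C": 25.0,
    "other": 10.0,
}

def rank_contributors(hours_by_system):
    """Return (system, hours, share, cumulative share), largest first."""
    total = sum(hours_by_system.values())
    ranked = sorted(hours_by_system.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0.0
    for system, hours in ranked:
        running += hours
        rows.append((system, hours, hours / total, running / total))
    return rows

rows = rank_contributors(outage_hours)
for system, hours, share, cumulative in rows:
    print(f"{system:<14}{hours:>7.1f} h {share:>6.1%} {cumulative:>7.1%}")
```

The same ranking can be repeated one level down (equipment within the top system, then components within that equipment) to decide where resources are likely to do the most good.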
A priori calculations require the use of probability theory and
assumptions. This can be illustrated by tossing a coin 1,000 times. If you
get 493 heads, the a posteriori probability of a head is 0.493 or
49.3%. Probability theory tells us that for the toss of a fair coin, the a
priori probability of a head is 50%, exactly. (Strictly speaking, the mean
value probability approaches 0.50 after many tosses.)
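The coin example can be checked with a few lines of arithmetic. This sketch compares the a posteriori estimate with the a priori value. It uses a normal approximation to the binomial distribution, which is a choice made here for illustration and is not taken from the text.

```python
from math import sqrt

tosses = 1000
heads = 493            # observed count from the example in the text
a_priori = 0.5         # assumes a fair coin

a_posteriori = heads / tosses

# Normal approximation to the binomial; z = 1.96 gives a ~95% interval.
std_err = sqrt(a_posteriori * (1 - a_posteriori) / tosses)
low, high = a_posteriori - 1.96 * std_err, a_posteriori + 1.96 * std_err

consistent = low <= a_priori <= high
print(f"a posteriori estimate: {a_posteriori:.3f}")
print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
print(f"consistent with a fair coin: {consistent}")
```

The interval comfortably includes 0.50, so 493 heads gives no reason to doubt the fair-coin assumption. The same comparison, applied to plant data, is one way to test whether a projected R is consistent with observed performance.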
The key assumption is that we have a fair coin. Overlooking or failing
to appreciate such a simple, common assumption in a real-world
problem can be painful to the owner of a manufacturing plant stuck
with a costly retrofit, significantly different production costs, or both.
When they assess facility R projections, owners must carefully evaluate
how they were developed, that is, their basis. Numerical results providing R,
whether casual R estimates or a formal failure modes, effects, and
criticality analysis (FMECA), are rarely provided with designs. In their
absence, the owner must rely on:
Process R
TQM and statistical process control (SPC) address production
process R. Each process has different inherent design capabilities. This
concept of process capability has been thoroughly developed by
manufacturing process engineers and statisticians. In addition, Deming, Shewhart,
Juran, Gryna, and others provided many insights into the statistical basis
for production process improvement. While some companies are very
capable at improving production processes, most find it a struggle.
Yet a goal of this book is to provide tools for generation engineers
seeking to improve plant process R to support higher unit, plant, and
system R goals. Like a body-builder developing muscle mass, however,
building intrinsic process R is a laboriously slow process.
Initially, there's lots of training and other investment with no
immediate payback. It takes time to generate results and earnings. Fast-track
methods can provide quicker paybacks, but once advocates and
supporters of a process improvement project move on, it's often back to
Implementation
Implementing RCM process results is tough, and the reasons why
provide insight into generation production challenges. Some are simple,
others complex, but RCM analysis without implementation has no value.
An organization considering ARCM should first examine its basic
work management processes, to uncover implicit processes that may
be understood but not well defined. Managers and other parties to
existing processes may not know how work is actually performed; those
who perceive that potential gains would cause them to lose in any way
could block RCM applications.
Maintenance has traditionally been a craft process whose workers
usually have had great latitude to work flexibly, using their own
methods, standards, and pace. For this reason, maintenance culture and
practices should be reviewed for RCM alignment. Some organizational
features align more naturally with RCM processes than others. They
must be discussed and emphasized to support the RCM effort and so
avoid later implementation pitfalls.
Many organizations find maintenance process commitments
substantial, and (understandably) are reluctant to take them on. However,
once an RCM-based maintenance paradigm takes hold, RCM thinking
can provide compound returns. Simplified projects can achieve RCM
benefits quickly, even within the budget year. The discipline can
greatly focus efforts. This offers the added benefit of demonstrating change
success. If value can be demonstrated, most organizations have
powerful incentives to improve.
As the pace of industry deregulation and reorganization continues
around competitive structures, companies will have to invest in
maintenance infrastructure to remain competitive. New emphasis will be
placed on R and process improvement offered through RCM.
Operations, maintenance, and engineering support will all benefit as
companies discover this fertile area of improvement.
Value added
If the megawatts aren't available when the buyer demands them,
then no sale occurs. Availability creates the sales opportunity, since
electricity storage is limited. Producing at a cost lower than a competitor's
assures sales in a competitive market. This means that availability is a
significant value-adder and R is the practical indicator of availability
performance.
Cost is another factor. A plant with a high product cost, even
though its product is available, will not be as attractive in an
economic dispatch model. Because total generation costs include operating
and maintenance production costs, plants that turn to RCM favorably
influence availability and cost to benefit their bottom lines.
Maintenance strategy
Nine times out of ten, operators initiate maintenance, but their
operating success rests on the maintenance product delivered.
Maintenance can't correct fundamental design problems. In the past,
only combined group efforts identified design as the fundamental
problem and eliminated maintenance as a solution. RCM enables us to
identify misapplied maintenance quickly, facilitating designer involvement
more effectively. This helps maintenance focus on things it can correct
and stick to a winning strategy.
Design change (DC) maintenance (design changes initiated
where a maintenance solution is available) is expensive. Maintenance
organizations request and implement design changes either for
non-problems or for problems that have simple maintenance solutions,
because:
chapter 2 23-70.qxd 3/14/00 5:10 PM Page 23
Chapter 2
Maintenance
How come dumb stuff seems so smart when you're doing it?
-Dennis the Menace
sis events like purple hearts and some corporate cultures reward those
who promote and manage crisis, rather than stable productive
workplaces. I advocate stable, predictable operations. We need to get the job
done, minimizing crisis responses! And everyone needs to go home at
the end of the day.
Maintenance Options
On a continuum, maintenance varies from purely reactive (failure
response) to purely preventive (time-based) (Fig. 2-1). Looking at
maintenance across such a spectrum, there's less tendency to view any
particular maintenance approach as either "good" or "bad." They're
just approaches.
I don't come to this discussion totally unbiased: I believe in
planned maintenance. Competence stems from knowing which method
is most effective, and when. Even response-based maintenance can be
planned! Different equipment with different design capabilities opti-
Establish a process with rules and then ensure that everyone (even
cowboys) plays by the rules. There's almost never a good reason to
shorten a PM interval just to get a price break on parts. The opposite
should occur: if you become aware of a premium part, analyze its
cost/benefit; if you find it's cost-effective to use it, buy it. Extend
the service-life interval only on the strength of the better part.
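The cost/benefit comparison described above can be reduced to a cost per year of service. The following is a minimal sketch, assuming the only costs are the part price and the labor to install it and that the service life is known. All prices, service lives, and the helper name cost_per_service_year are hypothetical.

```python
def cost_per_service_year(part_cost, labor_cost, service_life_years):
    """Installed cost spread over the interval the part actually lasts."""
    if service_life_years <= 0:
        raise ValueError("service life must be positive")
    return (part_cost + labor_cost) / service_life_years

# Hypothetical figures, not taken from the text.
standard = cost_per_service_year(part_cost=400.0, labor_cost=1200.0, service_life_years=2.0)
premium = cost_per_service_year(part_cost=900.0, labor_cost=1200.0, service_life_years=4.0)

print(f"standard part: ${standard:.2f} per year")
print(f"premium part:  ${premium:.2f} per year")
print("premium is cost-effective" if premium < standard else "keep the standard part")
```

Because installation labor is paid every time the part is changed, a part that costs more than twice as much can still be cheaper per year if it doubles the service interval. A fuller comparison would also count the cost of the outage time needed for each change.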
Doing part-lifetime analysis work is not trivial. Unfortunately, many
people think that it is, which is why there's such a large market for
low-quality parts. In a more rigorous, informed cost environment, many
cost-based part suppliers couldn't survive.
When you've analyzed, compared, and tested your components,
you're ready to build them into sub-assemblies, skids, and systems. The
overall integration determines the failures that ultimately cause overall
functional failure. Two things can happen. Equipment with a
life-limiting part can fail. It can also last indefinitely, with internal failures,
while preserving function. (This is the complexity principle.) If there is
a predominant age-based failure, it establishes an aging and failure
profile. The composite of all component failure modes over the expected
life of the equipment or assembly, and their redundancy in design,
establishes the overall composite failure characteristic and behavior.
This locates the equipment on a failure spectrum (Fig. 2-3).
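One common way to picture a composite failure characteristic is a competing-risks model, in which each failure mode has its own Weibull life distribution and the equipment survives only if no mode has occurred. This is a textbook technique swapped in for illustration, not necessarily the author's method. It assumes independent modes, ignores the design redundancy the text mentions, and uses invented parameters.

```python
from math import exp

# (shape beta, characteristic life eta in years) for each hypothetical mode.
# beta < 1: infant mortality; beta = 1: random; beta > 1: wear-out.
modes = {
    "infant mortality": (0.6, 40.0),
    "random": (1.0, 25.0),
    "wear-out": (4.0, 10.0),
}

def mode_reliability(t, beta, eta):
    """Weibull survival function R(t) = exp(-(t/eta)**beta)."""
    return exp(-((t / eta) ** beta))

def composite_reliability(t, modes):
    """Series (competing-risks) model: survive every mode independently."""
    r = 1.0
    for beta, eta in modes.values():
        r *= mode_reliability(t, beta, eta)
    return r

for years in (1, 5, 10, 15):
    print(f"t = {years:>2} y: composite R = {composite_reliability(years, modes):.3f}")
```

When one wear-out mode dominates, as the hypothetical third mode does here after about ten years, the composite curve takes on an age-related shape. When no mode dominates, the composite looks closer to random failure, which is the behavior the complexity principle describes.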
Thus, the failure spectrum enables you to consider alternative
strategies and how the maintenance strategy must change when
components change. Ideally, only those changes that increase product lifetimes
would occur but, unfortunately, low-quality parts and/or services
compromise lifetimes with the opposite effect. In some cases the systematic
downgrade of constituent parts leads to equipment capability loss; that
is, as equipment becomes less capable, useful, and maintainable, utility
decreases and aging accelerates.
Between major replacement programs, our ability to maintain
equipment drops as many small problems gradually sap the overall
equipment utility, its capability to be maintained, and its operating
margins. When the operating margin is gone, failures occur. Taken
together, all across a plant, they raise overall costs. This is why planned
maintenance programs can maintain nearly complete performance capacity.
But, how? A policy of conscious age-exploration and learning, a
[Figure residue, likely from the failure spectrum figure. Recoverable labels: low MTBF; random failures]
then only operator monitoring will be effective. The more the
maintenance strategy is oriented towards no scheduled maintenance (NSM),
the more dependent the strategy is on operators to identify failing
equipment. The best operators are literally integrated into the
man-machine process. Experienced, skilled operators can compensate for
most failures and identify developing problems through CNM. They
require little guidance. A facility with such operators who make
well-designed rounds in a well-designed plant with a responsive maintenance
process is an ideal plant: functional failures are rare to non-existent!
Many plants meet this implemented ARCM definition today.
The failure spectrum suggests that to be effective we must really
manage risk. The effectiveness depends on plant design, combined R
factors and redundancies, and how equipment is operated. The key to
managing risk is education. We must master knowledge of:
Consistency
Fossil generating-station maintenance processes and strategies are
implicit; nuclear plant processes are defined (though nuclear plant
processes are functionally similar to fossil). For effective RCM
applications, information exchange must occur on several levels, no matter
what kind of plant is involved. These processes are unique to
maintenance optimization, continuous cost reduction, and performance
improvement but are not routine for many reasons.
Corporate cost information is often unavailable or inaccurate. Many
utilities are just learning cost management. Predictable costs are a chal-
Statistics
Informality (the lack of a maintenance strategy that is universally
understood and applied) introduces random factors into work
performance. This dilutes planned maintenance effectiveness and
increases the frequency and dispersion of failures. Maintenance plans must
address equipment to control failures, but this has been difficult to do,
except at the worker level.
Statistics tell us that around 85% of the tasks in a typical large
generating facility are CNM and CDM, a large fraction of which should (or
needs to) be implemented by operators. Because the monitoring
interval for operator tasks is short (hours to days), what does this say for
plants that are characterized by:
Figure 2-4: Fossil Unit Forced Outage Rate (from NERC GADS)
(cals) can easily become buried beneath the trivial many. Screening
fossil I&C for unnecessary cals and other work is highly effective in
improving overall program results and assuring completion of those cals
that do make a difference.
Maintenance Process
Overview
Engineers appreciate highly complex chemical, mechanical, and
other engineering processes that can be analyzed objectively. This often
stands in contrast with organizational process awareness. Most man-
agers of production facilities are engineers, but there has been less
recognition of the soft processes as they apply to operating efficiency.
After years in the utility industry, as both engineer and manager, I
attribute this to a combination of cost-plus mentality and a lack of
profound maintenance process awareness endemic to American industry.
Maintenance is not static. The constant introduction and improve-
ment of materials and processes has transformed the maintenance
process. Like other processes, the environment has influenced the pace
of change. Where 40 years ago, small simple-cycle plants and diesel
operations were replaced by huge vertically integrated utilities, today
the opposite occurs (Fig. 2-6).
Maintenance is one of many complex organizational processes that
benefit greatly from process improvement techniques. For example,
quality process theories found in manufacturing can be applied to main-
tenance performance. Maintenance can be viewed as a process that
delivers available equipment (products) in an operating facility on
a budget (Fig. 2-7). Traditional maintenance organizations have done an
outstanding job delivering maintenance, but the rules are changing.
Maintenance organizations need to deliver operating equipment more
of the time at lower cost and take on more than the old maintenance
department has done. Some independent power producers (IPPs)
have replaced traditional maintenance staffs and annual unit outages
with flexibly scheduled overhauls at lower cost. Workers literally wear
all hats (operator, mechanic, and technician) to develop the
jack-of-all-trades utility worker.
Figure 2-6: Change! Although this engine captured 100 years of steam design
learning, engineers could not overcome the inherent advantages of diesel locomotives.
Infrastructure requirements, high operating costs, and labor agreements made steam
no match for simpler, reliable diesels. Anachronisms lingered for 40 more years, but
operating steam locomotives disappeared forever on America's railroads between 1955
and 1960. No tear has turned back the inevitable march of technological progress.
PM models
A PM is any scheduled preventative task intended to reduce the
probability of failure. Key ideas are:
scheduled
intended failure prevention
effectiveness
PM perspectives
Organizations view PM performance differently. A PM activity
issued may be considered as good as complete at some facilities. Others
treat "work complete" more formally, allowing equipment interpretation
based upon their last performance.
Other organizations are PM intense, performing every
vendor-recommended task. This approach initiates effective monitoring
and time-based PM processes but can also break down if operators or the
craft discover they can skip task intervals with little or no failure
consequence.
When equipment doesn't fail, people tend to continue with an
existing program, even though it is over-conservative and performed
too often. The only way to find out what the equipment can support (in
terms of lifetime and PM replacement intervals) is to perform age
exploration.
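Age exploration can be pictured as a simple feedback rule. The sketch below is one assumed approach, not a method prescribed in this text: after each clean inspection the interval is stretched by a fixed step, and it is pulled back when degradation is found. The function name `next_interval`, the 25% step, the one-month floor, and the sample findings are all hypothetical.

```python
def next_interval(current_months, found_degradation, step=0.25, floor_months=1.0):
    """Return the next PM interval after one inspection (assumed rule).

    Clean inspection: stretch the interval by `step`.
    Degradation found: pull the interval back by the same factor,
    but never below `floor_months`.
    """
    factor = 1.0 + step
    if found_degradation:
        return max(floor_months, current_months / factor)
    return current_months * factor


# Hypothetical inspection findings for one PM task (True = degradation found)
interval = 12.0
for finding in [False, False, True]:
    interval = next_interval(interval, finding)
    print(f"next interval: {interval:.2f} months")
```

In practice the step size, and the evidence required before each extension, remain engineering judgements.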
And craft workers don't uniformly perform all PMs to completion.
If essential PMs slip and failures occur, then program credibility is cast
into doubt and everyone's effectiveness is diminished. The systematic
clean-up of casual PM programs is a significant first step on the path
towards effective RCM implementation.
In reality, programs based on manufacturers' recommendations can
typically extend task intervals greatly with little risk of failure conse-
quences because of conservative vendor recommendations.
So, what's best? Craft and operators need to critique and adjust
recommended PM task intervals. Craft feedback on intervals is required
for the dual purpose of finding the best intervals and maintaining com-
mitment to actual task-monitoring performance. Craft worker feedback
is also essential to the CMMS (and to engineering) concerning how well
parts perform, in service, to continually manage and reduce parts costs.
(In ARCM, this is a formal, continuous process.) Fostering close opera-
tions-maintenance ties, whether by intent or accident, yields more effec-
tive PM programs.
Operators, in fact, provide first-level monitoring in plant PM systems.
Like maintenance PMs, operators' rounds (routinely scheduled
checks that monitor broad areas and systems) should be based on value.
Traditional rounds put operators into the plant on a non-specific, just-
in-case basis, but rounds can be based on the frequency and risk of
failures. Operators may extend certain rounds with no consequence but
they bear the responsibility to support their decision. For equipment
that requires no action until an alarm goes off, a monitoring and main-
tenance strategy must be based upon that. Actively recruiting operators
to develop, review, and tune rounds is a continuous, high-value
process.
Support the craft workers doing what they know needs to be done
by means of a task list. Once the PM task list is developed, work
processes determine how much gets done. Some organizations have a
catch-as-catch-can approach. Others have a work-all approach. Some
leave work scope to the discretion of the workers. Others try to work
equipment that is available. Few systematically measure the degree to
which they adhere to and complete their plan.
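Measuring adherence to the plan can start with a completion ratio per period. The sketch below is illustrative only: the `PMTask` record, its `scheduled` and `completed` fields, and the sample task names and counts are hypothetical, not taken from any particular CMMS.

```python
from dataclasses import dataclass


@dataclass
class PMTask:
    name: str
    scheduled: int   # times the task came due in the period
    completed: int   # times it was actually worked to completion


def pm_compliance(tasks):
    """Return completed/scheduled across all tasks, or None if nothing was due."""
    due = sum(t.scheduled for t in tasks)
    # Cap credit at the scheduled count so extra work cannot mask missed PMs.
    done = sum(min(t.completed, t.scheduled) for t in tasks)
    return done / due if due else None


# Hypothetical one-year sample
tasks = [
    PMTask("intake filter change", scheduled=4, completed=1),
    PMTask("lube oil sample", scheduled=12, completed=11),
]
print(f"PM compliance: {pm_compliance(tasks):.0%}")
```

Tracked period over period, a number like this gives the organization a measurable plan to question, rather than relying on the collective memory of the workforce.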
In the absence of a measurable plan, there's reason to question
maintenance effectiveness. Good programs are carried forward by
knowledgeable and committed craft workers. Workers still lack
information that points in the direction of improvement, unaware of the
degree to which they are dependent upon the collective memory of the
workforce to accomplish PM work. In an environment with turnover,
their success is diminished.
focus
group work organization and delivery capacity
effective tasks
blade cleaning
root tip crack inspections
rotor inspections
blade erosion inspection
gasket replacement
rotor re-balancing
bore inspections
lube oil purification
cooler inspections
instrumentation bypass line inspections
casing bypass flow erosion inspection
generator winding examination
balancing
stop valves inspection
stem blush removal
weld repair
Costs
Operation and maintenance practices based on reactive (not proac-
tive) philosophies are very costly. Direct costs include rework, increased
scheduling, and increased risk. Risk ultimately translates into operat-
ing events that impact equipment and employees. High risk organiza-
tions mean higher cost operations, just as speeding drivers mean higher
costs for insurers.
Analyzing failures and costs confirms this intuitive knowledge:
maintenance performance correlates with insurance claim losses.
Insurers periodically inspect client facilities to assess their insurance risk
and help clients better manage that risk.
There are wide variations in electricity cost, in part because some
producers are more expensive, based on their plant outage profiles,
while for others it's routine maintenance practices. If, in fact,
competition re-invigorates the generating industry, it will happen because
companies will be forced to re-evaluate and improve processes that have
been slow to change.
Case Examples
SBAC
A 7000-acfm soot-blowing air compressor (SBAC) experienced
premature filter pluggage, filter element tear-out, and prolonged operation
with unfiltered intake air. The rotary compressor uses five high-speed
compressor stages, each driven off a common bull gear. The compressor
required overhaul two years after the previous performance (about two
years short of its early projected overhaul date but much shorter than the
previous stage overhauls). Installed as one out of three compressors (two
in continuous service), this unit achieved more than five years' service
until its first overhaul. The nominal life for the compressors was placed at
four years, aside from the new operating period when all units ran
almost eight years. On these staged compressors, the high-velocity fifth
stage ordinarily wore out first, establishing the overhaul need.
The diminished compressor life (between overhauls) was two years.
At an overhaul cost of between $250,000 and $300,000, the shorter life
cost an additional $150,000 (around $75,000 annualized). The missed
PMs cost three hours every quarter, or $1,000 annualized, including
filters. Cost benefit is at least 75-to-1 based on maintenance costs. Such
a PM cannot be missed without adding substantial maintenance costs.
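The text does not spell out how the $150,000 figure was derived. One reading that reproduces the quoted numbers is to prorate the overhaul cost over the nominal life and charge the forfeited share to the missed PM. The sketch below works that arithmetic under that assumption, using the upper end of the quoted overhaul cost; the function name `lost_life_cost` is hypothetical.

```python
def lost_life_cost(overhaul_cost, nominal_life_yr, actual_life_yr):
    """Value of overhaul life forfeited when a unit is overhauled early.

    Assumes the overhaul cost is spread evenly over the nominal life.
    """
    lost_fraction = (nominal_life_yr - actual_life_yr) / nominal_life_yr
    return overhaul_cost * lost_fraction


overhaul_cost = 300_000    # upper end of the quoted $250,000-$300,000 range
nominal_life = 4           # years between overhauls (from the text)
actual_life = 2            # years actually achieved (from the text)
pm_cost_per_year = 1_000   # missed quarterly PM, annualized (from the text)

extra = lost_life_cost(overhaul_cost, nominal_life, actual_life)
extra_per_year = extra / actual_life
ratio = extra_per_year / pm_cost_per_year

print(f"additional cost:     ${extra:,.0f}")
print(f"annualized:          ${extra_per_year:,.0f}")
print(f"cost-benefit ratio:  {ratio:.0f}-to-1")
```

Under these assumptions the ratio counts maintenance cost only; it leaves out the overtime and boiler plugging effects described next, which is consistent with the text's "at least".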
While the compressor was down, boiler convection passage plugging
increased. Because operating staff had to be pulled aside for the
compressor overhaul (a two-man, one-month job, with contracted help),
normal schedules were interrupted. Because overtime was required in
this union shop, the whole plant was authorized overtime, further
driving up costs.
During operations, big-ticket failures mean major unscheduled
events. Non-routine, non-turbine or boiler costs can be tracked by fre-
quency and cost category. Major unpredictable equipment failures can
cost up to hundreds of thousands of dollars. Such failures are an obvi-
ous target for reduction! They can be identified, counted, costed (annu-
alized), understood-by-cause(s), and corrected. Taking on unplanned
but statistically predictable big ticket events in a systematic manner
results in gradual improvement in equipment online cost performance.
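The identify, count, cost, and understand-by-cause steps lend themselves to a simple tally. The sketch below is a minimal illustration, assuming an event log of (cause, cost) pairs over a known observation window; the causes, dollar figures, five-year window, and the function name `annualized_cost_by_cause` are all hypothetical.

```python
from collections import defaultdict


def annualized_cost_by_cause(events, years_observed):
    """Count events and annualize their cost per cause, costliest first."""
    totals = defaultdict(lambda: {"count": 0, "cost": 0.0})
    for cause, cost in events:
        totals[cause]["count"] += 1
        totals[cause]["cost"] += cost
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["cost"], reverse=True)
    return [(cause, t["count"], t["cost"] / years_observed) for cause, t in ranked]


# Hypothetical big-ticket event log over a 5-year window: (cause, cost in dollars)
events = [
    ("missed filter PM", 150_000),
    ("bearing lube loss", 40_000),
    ("missed filter PM", 120_000),
    ("operator error", 25_000),
]

summary = annualized_cost_by_cause(events, years_observed=5)
for cause, count, per_year in summary:
    print(f"{cause}: {count} events, ${per_year:,.0f}/yr")
```

Ranking by annualized cost puts the obvious targets for reduction at the top of the list.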
Ultimately, all equipment wears between service intervals.
Achieving maximum predictable service intervals is a goal of a PM pro-
Chapter 3
RCM Performance
Figure 3-1: Modern Blending Coal-fired Power Plant: Apparently simple, looks are
deceiving. This zero discharge plant ranks with the last nuclear units for complexity.
The plant is running at full load.
gas-fired boilers
coal-fired boilers
CTs
hydro
nuclear
safety
environment
production
license technical specification or other formal commitments agreed upon as conditions for operations
practical support requirements
major equipment trains and redundancies
essential instrumentation
large equipment
equipment covered by existing PM, calibration, and test programs
equipment of regulatory, insurance or cost concern
major redundant equipment
Equipment
Hierarchy level
Equipment fits into the system hierarchy between systems and
components. Equipment is integrated into system support functions.
It is often redundant, or supports a redundancy feature.
Instrumentation also fills this role. Equipment is often identified in
trains (Fig. 3-4). Equipment can be alternately viewed as subsystems.
That being said, the way in which we identify equipment is often
arbitrary. It is convenient to view equipment as a combination of com-
Failure descriptions
System failures are described functionally. These are general and
non-specific with regard to component and performance. System fail-
ures are ultimately caused by discrete component failure but can be
identified much more easily at the system level than at a discrete com-
ponent level.
For example, a system that provides hydraulic valve-position con-
trol might functionally fail to control. Any number of other things
components fail
actionable tasks must address component failure
failure mechanism = failure mode and cause
typically, there are fewer than three common failure mechanisms
for a component type. Statistically, there's often one
ARCM perspective is statistical, not absolute; we worry about
the common modes overall
if a specific application has a known specific failure or failure
mode, we can address that
root cause does not have to be addressed for PM to be effective
the objective is to manage risk
nance program requires a plan that the organization knows it will
perform repetitively. Herein lies another ARCM benefit: the ability to
reduce a substantial amount of maintenance to a production perform-
ance basis.
NSM. The choice to not perform scheduled maintenance is a pro-
found one. When there are no applicable, effective tasks, and no safe-
ty or environmental issues, the default is to select NSM. While this
selection seems the obvious choice, the role of the PM gatekeeper is
probably the toughest in maintenance. Ineffective or inapplicable PMs
arise from many sources. Regulators, managers, and engineers all feel
uniquely qualified to make maintenance decisions. Systems engineers at
nuclear plants have this in their formal job descriptions. The perception
is that PM is free: write up a repetitive work order, and it just happens.
Properly developing PMs is as tough as laying out a facility design
(tougher, where the concept of engineered PM programs hasn't been
sold). Organizations without gatekeepers perform lengthy lists of PM
activity, only a small percentage of which gets done. Nuclear plants
spend inordinate sums on PMs that drive up their costs yet do not ben-
efit operations or safety.
Every organization needs a gatekeeper-type R engineer with author-
ity on the same level as the chief engineer at an architect-engineering
firm. Such an individual controls PM scopes and helps to achieve imple-
mentation on those PMs that matter.
Hidden failure. Hidden failures are those not evident to the oper-
ating crew under normal conditions. They usually result from instru-
mentation and/or control failures, where a component identifies a func-
tional failure not otherwise evident. Some relate to failure of redundant
and/or standby systems. For all critical functions (those involved
with safety, that would not otherwise be evident to the operating crew),
an instrument is typically provided to make the equipment failure
evident. These can further be hard-wired to arm pre-set trips for critical
functions where the trip response time is essential for safety or eco-
nomics.
Nuclear units have many more hard critical trips than fossil
plants. In both cases, however, if the instrument, trip or alarm is the
operator's sole line of defense against a critical safety failure, then the
tions, there are repetitive needs to assess, and the decision to be made,
whether to rework it or contract services to achieve in-specification con-
ditions. A combination of the two is needed at most facilities, and must
be factored into the RCM program.
Blocking tasks
After applicable and effective tasks have been selected, they must
be blocked for effective performance. (Fig. 2-2, page 26)
Blocking starts at the task level. For instance, achieving performance
effectiveness for a large turbine overhaul requires selecting and per-
forming between 20 and 50 major TBM and CDM rework/repair tasks.
Many of these in turn will be performed hundreds of times. We incor-
porate these into the disassemble/reassemble schedule, as a project to
assure task completion and coordination. This theory applies across the
board, even at the instrument calibration level. (We would never send a
technician out to calibrate just one instrument in a rack.) A good meas-
ure of the effectiveness of PM programs is the degree to which they
achieve blocking to conserve performance trip time. Blocking also
reduces equipment outage duration (Fig. 3-5).
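At the instrument calibration level, blocking amounts to grouping tasks that share a location so a single trip covers them all. The sketch below is a minimal illustration of that grouping; the rack names, instrument tags, and the function name `block_tasks` are hypothetical.

```python
from collections import defaultdict


def block_tasks(tasks):
    """Group PM tasks that share a location so one trip covers them all."""
    blocks = defaultdict(list)
    for location, task in tasks:
        blocks[location].append(task)
    return dict(blocks)


# Hypothetical calibration tasks coming due: (location, task)
tasks = [
    ("rack A", "calibrate PT-101"),
    ("rack B", "calibrate LT-201"),
    ("rack A", "calibrate PT-102"),
    ("rack A", "calibrate FT-103"),
]

blocks = block_tasks(tasks)
trips_saved = len(tasks) - len(blocks)

for location, block in blocks.items():
    print(f"{location}: {len(block)} task(s) in one trip")
print(f"trips saved by blocking: {trips_saved}")
```

The same grouping key could just as well be an equipment train or an outage window; location is used here only because it matches the rack example.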
skill requirement
the task
the interval
tive program. Taken literally, and in full force, they often assure exces-
sive and redundant monitoring or overly intrusive maintenance with
excessive parts replacement. Furthermore, vendors can't anticipate a
user's exact application. The application determines which of the
common failure modes become dominant. Fortunately, almost every vendor
calls for adjusting program intervals and recommends performance
based on experience: the universal out.
In most PM programs, vendor-prescribed tasks are adequate. From
an analytical perspective, and when cost-effectiveness and managing
risk are involved (e.g., R engineering), we need to go further. Ideally,
our statistical frequency-of-failure information identifies dominant fail-
ures, their occurrence frequencies, and the risk they pose in each appli-
cation, so that our overall strategy can be tuned to manage risk. One
needs failures to do this from a R engineering perspective, as RCM is
essentially a R engineering derivative.
Basis history
The link among failure mechanisms and tasks, and our selection
criteria, is what's called the selected task's basis. Using it, over time,
we can track and understand why a given maintenance program is in
effect at any point. This is important from a regulatory perspective:
the maintenance rule requires that a basis be carried forward for all PM
tasks at nuclear power plants. A basis is desirable to maintain a living
program in any plant, however.
It's much easier to assess changes made to a program when you can
trace its origins. The lion's share of changes are based on an assessment
of the current state and attendant equipment needs, but why a given
program was in place is almost never specified, even in the nuclear
plants. At best, change-histories justify an existing program and provide
the basis for it, but in a regulated or LCM maintenance program, a
documented basis has value. A basis is an important step to developing an
effective, living maintenance plan.
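A basis can be carried forward as a small record that ties each task to its failure mechanism and selection criterion, with a dated history of interval changes. The sketch below is one assumed layout, not a format required by any regulation or CMMS; the class names, field names, dates, and sample entries are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BasisEntry:
    effective: date
    interval_months: int
    reason: str          # why the interval was set or changed


@dataclass
class PMTaskBasis:
    task: str
    failure_mechanism: str      # failure mode and cause the task addresses
    selection_criterion: str    # why this task was judged applicable and effective
    history: list = field(default_factory=list)

    def revise(self, effective, interval_months, reason):
        """Record a change, keeping earlier entries so origins stay traceable."""
        self.history.append(BasisEntry(effective, interval_months, reason))

    def current_interval(self):
        return self.history[-1].interval_months if self.history else None


# Hypothetical example
basis = PMTaskBasis(
    task="replace compressor intake filter",
    failure_mechanism="filter pluggage from airborne dust",
    selection_criterion="time-based task, low cost against overhaul risk",
)
basis.revise(date(1998, 1, 1), 3, "vendor recommendation")
basis.revise(date(1999, 1, 1), 4, "filters found clean at three months")

print(basis.current_interval(), "months;", len(basis.history), "basis entries")
```

Because entries are appended rather than overwritten, the record answers both questions raised above: what the program is now, and why an earlier program was in place.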
PM work packages
At the equipment level, we can package multiple PM tasks to facil-
itate efficient work performance. Work can be organized by the per-
Information sources
Many sources of information identify viable PM tasks and their
associated failures. Vendors provide more information about PM
activities than failures, but it's easy to infer associated failures from
their recommended tasks by analogy, comparison, and experience. Manuals
from OEMs provide maintenance, diagnostic, PM, and calibration
guides. Performance interval information they provide is typically not
directly applicable, so the user needs experience and judgement to
support intervals or, better yet, a diagnostic capability with an age
exploration program. Users who presume continuous service, and lean
toward literal interval applications, design overly conservative intervals.
In addition to vendor operations and maintenance (O&M) guid-
ance, there are:
requires that you know the manufacturers' product lines and document
your experience. Manufacturer representatives usually do a wonderful
job helping to specify suitable products. These manufacturers cost
more, but they very often warrant the extra costs. They provide
valuable selection criteria service.
Criticality
Risk is mathematically defined as:
Probability x Consequence
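As a worked illustration of this definition, the sketch below multiplies an annual failure probability by a consequence cost and ranks equipment by the result. The equipment names, probabilities, and dollar consequences are hypothetical values chosen only to show the ranking.

```python
def risk(probability_per_year, consequence_cost):
    """Risk as probability times consequence, in dollars per year."""
    return probability_per_year * consequence_cost


# Hypothetical inputs: (annual failure probability, consequence in dollars)
items = {
    "boiler feed pump": (0.10, 500_000),
    "soot-blowing air compressor": (0.25, 150_000),
    "sump pump": (0.50, 5_000),
}

ranked = sorted(items, key=lambda name: risk(*items[name]), reverse=True)

for name in ranked:
    print(f"{name}: ${risk(*items[name]):,.0f}/yr")
```

Note that the most failure-prone item in this sample ranks last: frequent failures with small consequences can carry less risk than rare failures with large ones.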
basic purpose
calculation ranking
failure focus
unit
system
equipment/subsystem
sub-tier subsystem(s) (if any)
component
part
failure(s)
cause(s)
Functional Reviews
Engineers find new functions in familiar areas, and their design ele-
ments, major O&M systems, and equipment. These reveal functional
History
Production plants develop a failure history quickly. Industry
experience and history provide high value information for failure risk
analysis. Nuclear plants list functions subject to significant risk and
supporting equipment under the NRC's maintenance rule. In fossil, person-
Standards
In practice, there are two, possibly three broad categories of equip-
ment and components:
Comparison Analysis
TRCM always performs comparison analysis as a final key project
step. These before-and-after snapshots hold limited value when com-
pared to the value achieved with RCM-based PM reviews.
Comparison analysis requires a high level of bookkeeping to maintain
a spreadsheet documentation of project accomplishments.
Documentation is suspect if plant staff delays PM changes, misses
review meetings, forgets to prepare reviews for meetings and neglects
rework analysis.
Summary
Detailed TRCM analysis has great value primarily as an analytical
learning tool. Standards, applied quickly and reapplied many times,
speed PM assessment and implementation. People working to
standards develop processes and production methods to perform repetitive
tasks with consistency, speed, and simplicity to make the standardized
application of formal RCM techniques effective.
However, dedicated equipment applications, failure consequences,
and the ways in which we decide equipment importance necessitate
adjustment that limits the depth of TRCM analysis. Fortunately,
benchmark and composite references mean detailed RCM analysis isn't always
needed. When we identify important components, develop appropriate
programs and standards, then devise the appropriate PM task and
measure the results, we can further adjust individual component PM
programs to overall standards requirements, where needed. Simplify.
Standardize. Implement the best ARCM-PM program.
Maintenance Process
Traditionally, work orders are based on noted problems. This is the
CM maintenance model. A second model, scheduled maintenance,
supplements and extends the fundamental model. But identification of
problems, a posteriori, is how traditional maintenance works.
Operators know the problems because they know the system, the equip-
ment, capabilities, and what they need it to do. Response-based main-
tenance was the first improvement over disposable equipment due to its
significant capacity to reduce cost.
Response-based maintenance is very cost-effective when compared to
the alternative: nothing! It's the first basic step in any maintenance
program. The next step is an intuitively harder one: scheduled maintenance.
design
materials
Figure 3-13: Turbine Failures. Although turbine failures following overhauls
follow an infant mortality curve overall, the composition of individual failures is not so
clear. Limited extension intervals suggest that extended lifetimes between overhauls are
feasible for many turbines. Ultimate age-based turbine failures appear to be a compo-
sition of blade deposit and erosion failures for many machines. These cause stage effi-
ciency to fall. Overhauls are then a question of economic production tradeoffs.
Assessing numerically small failure numbers means interpolating between few fail-
ure events using judgement. This is at best a risky proposition. The comparison of
many machines with many failure modes at the other extreme is also fraught with risk.
In the final analysis, an engineering inference supported by detailed parts examinations
and performance tests is the most useful approach.
Figure 3-14: Cumulative Turbine Failures. With a fleet, turbine failure periodicity
approximates the overhaul interval. While this suggested wearout, closer examination
showed most failures occurred following start-ups and reflected infant mortality prob-
lems. In fact, that best explains the timing! Data like this suggests that turbine over-
haul intervals may be extended with minor risk. Until age exploration establishes a
wearout interval with more exact failure experience, the predominant risk is under-
utilizing the asset. OEMs complicate issues by providing traditional time-based over-
haul interval recommendations.
Defining efficiency and load loss failure further complicates the problem.
Some companies have vague standards for end-of-period performance that provide
the basis for overall intervals. Without an exact efficiency standard, failure to
achieve performance is a subjective call. Although the issues are complex, there are
simple measures and solutions.
Lastly, the data support the idea of random limit. For this fleet, some failures
persist throughout the overhaul cycle.
Chapter 4
Plant Needs
Delivery
The value of any plant improvement process is limited by the abili-
ty to deliver the benefits to the customer. When RCM methods drive
PMO for equipment maintenance programs, LCM, and overall plant
work scheduling and coordination, it's based upon an intrinsic belief
that they represent better ways.
The next question is: how can a plant get there? Two basic process-
es are required. If absent, they must be developed. They are:
a PM process
an LCM scheduling process
There is no such thing as a completely effective PM process. An
effective process must:
Training
Reassessment and updating of operator rounds for improved
monitoring
Identification and ranking of problems with redesign opportunities
Extension of service intervals on equipment through a systematic
application of age-exploration
System definition
Operating and engineering personnel use the plant architect engineer's
system structure to understand work and develop operating pro-
System monitoring
Operations monitoring combines an operator's skills, knowledge,
experience and senses (sight, smell, sound, taste, feel). It also
System cost
Changes in system costs provide an early warning of items worth
further investigation. They also provide key measures for benchmarking
in competitive studies. How many generators know their air costs?
How, then, would a fossil generator evaluate a proposal by an air com-
pressor vendor to provide air at a unit volume price?
When system and service costs are measured, they provide serious
numbers for thought. In a competitive environment, cost oversights
raise unit cost. Loss of 1% generation in a year, and the associated gen-
eration R loss, is opportunity (and revenues) lost. Ask any IPP operator.
Assessing PM Programs
PM programs must pass the same muster as any other: They have to
contribute to the bottom line.
This means measures have to be in place to assess PM costs and
delivered benefits. Integrated effectiveness measures (the statistical
and cost picture) are the key measures. PM activities must meet the
bottom-line acid tests of technical and cost adequacy. Failing either test
means the PM is probably an unnecessary expense. As with all expenses
(time and otherwise), getting employees tuned to look for low-value
or non-value-adding expense material is a key to long-term financial
success.
Process Improvement
Maintenance performance improvement must address two aspects:
The two are parts of the same puzzle. Confusing the issue can be the
problem of establishing a basic maintenance PM process when another
system is already in place, even if it's not performing well. A pilot proj-
Establishing a process
Since most large facilities have a corporate CMMS, it's necessary to
build a CMMS-based PM system. Many companies treat their CMMS
separately from their basic maintenance processes, yet it's the process by
which they determine, plan, and carry out their work and the process
they use to develop and maintain a maintenance strategy. Ideally, the
CMMS is designed around a working PM process (Fig. 4-2). Many
legacy systems had PM added as an afterthought. With so many facets
of efficient PM performance and so much equipment in a typical, mod-
ern plant, concurrent PM process development and CMMS implemen-
tation is not feasible. Getting a basic PM process instituted around a
small system or core group of equipment is a necessary first step to a
comprehensive site-wide program. Tying CMMS support processes into
the program follows. More PM process development follows addition-
al CMMS tuning.
CMMSs may lack someone to manage the program. In a crisis-oriented
plant, the PM portions of the CMMS are implemented incompletely,
so an effective PM process never develops. For these plants,
attaining a fully implemented PM process is an especially high-value
activity.
An effective PM process has several essential functions: perform
ongoing PM tasks; rank and prioritize CNM results for time-based,
CDM work; incorporate improvements. These elements are based upon
identified failure mechanisms, costs, availability, and other
improvements to PM program processes. Plants that presume their PM
program process is adequate often find, after performing an RCM effort,
that essential elements are missing: PM elements, maintenance
performance elements, support elements.
Among case histories of failed TRCM efforts are those which failed
because of underlying assumptions. An ARCM focus creates the most
essential PM elements, quickly, where they are missing.
Processes. Developing a maintenance process model sounds silly
to those who have been doing it for years. On the other hand, why is
it that some organizations do maintenance creatively and uniquely (as
evidenced by their WOs, equipment, and other process aspects) and
some do not? It's precisely because maintenance processes are so often
taken for granted as a part of plant needs that occasionally we may need
to confirm our model (Table 4-1).
Like the maintenance process model, the basic PM process has
many different interpretations. Maintenance outcomes are also
influenced by organizations' different cultures and personalities. Some
get many miles out of equipment, some get less, but as long as the
organization extracts what it considers to be fair value from its assets,
and it makes a profit, it makes no difference how quickly the equipment
is used up. In evolving industries, a facility's useful life may be five
years. Typically, the high-tech and information-technology industries
are radically restructured that quickly. Competitors adapt quickly or die.
In microprocessor and memory electronics manufacturing facilities,
plants are rebuilt or product lines replaced far more frequently than the
generation industry is used to. In the electronics environment,
extracting value from a facility in five years makes economic sense: it
may be obsolete at the end of the period. Based on unit product cost,
the least expensive alternative may be to entirely replace the facility
with new when that happens.
Utilities and petrochemicals lie at the other end of the useful-life
spectrum. Generators that cranked out MWs in 1910 are running
Companies' roles
More than ever, generation companies need units that operate within
predictable costs. Since total costs include payments to co-generators
and unplanned power purchases, in-house generation costs essentially
control costs. Factors affecting random, forced outages vary from
company to company, but for net power purchasers, the unplanned loss of
a single unit means substantial costs. High electricity cost, whether
from nuclear units or base-load coal generation, is the factor
driving large end-users to clamor for deregulation.
Some companies are electing to get out of generation. Those choosing
to remain find a tough environment. State public utility commissions
(PUCs) aren't granting rate increases for those who remain regulated,
and most are planning some form of deregulation. Companies whose
benchmarks prove that they aren't competitive find it even more
difficult to restructure for competition. In light of this new generating
environment, the traditional arm's-length relationship between plants
and parent companies is likely to give way to closer interest in and
concern for plant performance, if it hasn't already.
Nuclear generation
As an entity, the nuclear generation industry answers to regulatory
masters at the NRC. Despite outstanding operating records achieved by
these plants, costs are high and hard to reduce. Nuclear processes,
complex and slow to change, place burdens on plants that need innovative
and cost-conscious improvements. Gradual attrition of high-cost
nuclear units will continue as competitive pressure increases.
Although overseeing a mature technology, the NRC generates
new regulations and findings, unabated. This maintains a regulatory
focus, not a productivity-improvement one. Nuclear plants face
challenges to simplify their processes just as fossil plants do, but fear of
NRC scrutiny gives rise to a conservatism that limits innovative jumps
and raises costs. Nuclear units need to be allowed to explore safe ways
to manage costs and risk in the public interest.
PM Bases
Justification is the concept of a basis. In fossil work, the basis is the
cost-benefit calculation. It should include safety, environment, codes,
insurance, and other compliance and general concerns. It can be
explicit, but more often it's implied and never documented. In fossil
generation the focus is to do things. Nuclear has no such luxury.
Documented justifications are expected to support changes.
Documenting a PM basis could be setting up changes to be blocked.
This is particularly true where it's unclear why a task was even started
in the first place! In nuclear generation, a PM change history usually
provides such a basis. It merely needs to be collected and occasional
gaps completed before it is grandfathered to the original PM program.
Should something go awry, there's an opportunity to check the intent
and results, and to see how things got off track. Developing and
retaining a basis (why a PM is needed, how it was selected, and at what
interval) is valuable information for history and review in either nuclear
or fossil work (Table 4-2).
Nuclear generation, with a great many prescribed PM requirements,
requires the change-out of EQ components as specified in their aging
design basis documents and compliance with all vendor-directed main-
PM tasks! This also means that nuclear and fossil operators rarely need
to develop cost/benefit bases for doing many PMs, but can apply tem-
plates that implicitly include cost bases.
Conservatism
Nuclear and fossil plants are alike in that both are ultra-conservative
in selecting PM task performance intervals. Both suffer limited
access to expert analysis and support, and so depend heavily on vendors
for analytical support. Part of what's driving this is conservatism.
Until one checks component performance, in service, first-hand,
there's a tendency to grossly underestimate component capabilities.
Combined with equipment's inherent fault tolerance, this leaves many
unrealized opportunities to extend component life and service intervals.
Use data that's available to you, if only to avoid severely penalizing a
maintenance plan!
Conservatism offers traditionalist operators tools such as conditional
overhauls and age exploration, both of which force them far outside
their comfort zones. But this is also where the significant savings are.
Conditional overhaul is not more work. It is the directed rework of
a component focused to restore original performance. While not intu-
itive, conditional overhauls have been demonstrated to be statistically
effective for jet engine overhauls. Few generating companies have a
formal repair policy of conditional overhaul, however.
Age exploration and PM interval extensions are a second opportunity.
Virtually all companies that use age exploration extend intervals by
minimums of 10-30%. Benefits from such minor extensions take a while
to add up to real cost savings. A substantial lesson from ARCM:
aggressive use of age exploration can significantly extend PM intervals.
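The arithmetic behind "take a while to add up" can be sketched briefly. Assuming a task is simply repeated at a fixed interval over a fixed planning horizon (a simplification made here, not a model from the text), lengthening the interval by a fraction e removes e/(1+e) of the task performances. A 10% extension therefore avoids only about 9% of the work and a 30% extension about 23%, while doubling the interval halves it.

```python
def fraction_of_tasks_avoided(extension: float) -> float:
    """Share of task performances avoided over a fixed horizon.

    extension is the fractional increase in interval (0.10 = 10% longer).
    Tasks per horizon scale as 1 / interval, so the share avoided is
    1 - 1 / (1 + extension) = extension / (1 + extension).
    """
    if extension < 0:
        raise ValueError("extension must be non-negative")
    return extension / (1.0 + extension)


# Illustrative extensions only; the 100% case stands in for an
# aggressive age-exploration result.
for ext in (0.10, 0.30, 1.00):
    saved = fraction_of_tasks_avoided(ext)
    print(f"{ext:.0%} longer interval: {saved:.1%} fewer task performances")
```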
Over-conservatism
Task interval conservatism is a requirement for any PM program.
That is, OCM intervals must be short enough to identify diminished
failure resistance, but long enough to realize an item's useful life, so
that on-condition maintenance tasks may be effective.
To identify appropriate intervals may require an actuarial analysis.
Those charged with adjusting intervals are not trained actuaries, a
profession that requires advanced training in mathematics and many
years of special tests. A tendency in any program (including
maintenance) is to utilize overly conservative requirements. Margin is
hung on margins, until many basic intervals reach their constraining
limit: the annual boiler or refueling outage. Of all PM intervals used in
power plants, the scheduled outage interval often carries the greatest
number of PMs. This represents the minimum interval that can be
selected with no production interruption. In the absence of hard R and
actuarial engineering analysis, these intervals have become accepted,
implicit standards. They also must be challenged.
An interval that represents half the appropriate (or capable) design
life of a piece of equipment puts severe restrictions on scheduled outage
work and greatly increases expense. Statistically, annual or
refueling-interval PMs disproportionately populate PM systems, and
reviewing annual outage work is a highly profitable task. You may find
that plant support staff selects outage replacement intervals in spite of
performance, vendor guidelines, and other recommendations that
support longer intervals. CNM performed with inadequate specifications
identifies fault conditions early and leads to premature overhauls. Overly
conservative work can load up an outage with thousands of extra work
hours. Properly selecting intervals represents an immediate opportunity
for many plants.
There are other examples of over-conservatism. The asbestos
inspection requirement is one year in the absence of a monitoring
program (three years otherwise), so a typical plant can inspect at three
years! Yet almost none do so. Others use annual replacements for parts
faulted for a single failure. A nuclear plant automatically reduces its EQ
service lifetime by 25%, just in case an EQ hard-time replacement
PM gets missed. Conservatism adds up. Costs for replaced parts are
high, as are infant mortality failure rates. Actuarial studies show that
overhaul activities are ineffective at improving R, yet they remain a
mainstay of the traditional generation industry.
Alternatives to regular and capricious applications of conservatism
will address any number of oversights. Competent R engineering help,
setting exact intervals, and age exploration standards represent excel-
lent opportunities to advance along the same maintenance cost-man-
agement learning curve in generation that occurred in the commercial
aviation industry.
If PM scheduling is a process problem, go after the process; don't
introduce common-cause conservatism. It can't correct the fundamental
root-cause flaw an ineffective scheduling process presents. PM
restrictions defeat the purpose. Process errors that occur because of
complexity make it highly unlikely that more complexity in any
process (just like our failure process itself) will reduce the error rate.
Based on documented parts performance, and provided the
environment is maintained, quality parts usually exceed expectations.
When environments must be maintained (e.g., protected from water,
excessive temperatures, caustic atmospheres, acid runoff, or excessive
wetting/drying cycles), then use the best materials available, perform
age exploration, and condition-monitor.
When fossil environmental control equipment (ventilation and
cooling) is abandoned due to maintenance priorities or difficulty in
using it, it's very much like abandoning chemistry specifications that are
too difficult to maintain. There's no obvious immediate effect, but
long-term consequences can be serious. In a number of cases, restoring
equipment to service was easily justifiable but hard to achieve.
Numbers can help define the objective failure story, numbers that
most traditional generation RCM analyses lack. Some RCM analysts go
so far as to discount failure statistics and numbers. In my opinion, this
is a serious oversight. Implicitly or not, we live by frequency and
consequences. However, while numbers don't tell the whole story, those
who ignore them often chase the trivial few. This has given RCM a black
eye. While some analysts bog down in endless pursuit of rare or
imaginary events (things that don't happen in this world), my approach
reflects interest in measurement. But I also review large quantities of
data to identify failures, summarize statistics by failure categories, and
make estimates (Fig. 4-3).
The numbers I work with aren't exact but they are in the right
ballpark. I view them like dose rate estimates: they're order-of-magnitude
significant and they identify sensitivity to costs. Costs need to be
understood at a 10%, 100%, or 1000% payback during the period of
interest. A 10% payback on a turbine overhaul may be worth chasing but
probably not for a $20 filter replacement. A 500% savings on a $20 task
clearly outweighs the same for a $2 task, so we want to structure our
programs to capture that value. Practically, this means when it comes to
a trade-off (and it will), we must give up the $2 tasks to make room for
the $20s.
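The comparison above amounts to ranking tasks by absolute dollars rather than by percentage payback. A minimal sketch follows; the $1,000,000 turbine overhaul cost is a hypothetical figure chosen for illustration, while the $20 and $2 tasks and the 10% and 500% paybacks come from the text.

```python
def absolute_savings(task_cost: float, payback: float) -> float:
    """Dollar value of a payback expressed as a fraction of task cost."""
    return task_cost * payback


# (name, task cost in dollars, payback as a fraction of cost)
candidates = [
    ("$2 task at 500%", 2.0, 5.0),
    ("$20 task at 500%", 20.0, 5.0),
    ("$20 filter at 10%", 20.0, 0.10),
    ("turbine overhaul at 10% (hypothetical $1,000,000)", 1_000_000.0, 0.10),
]

# Rank by absolute dollars, largest first; equal percentages do not
# imply equal value.
ranked = sorted(candidates, key=lambda c: absolute_savings(c[1], c[2]), reverse=True)
for name, cost, payback in ranked:
    print(f"{name}: ${absolute_savings(cost, payback):,.2f}")
```

On these numbers the same 500% payback is worth $100 on the $20 task and $10 on the $2 task, which is why the $2 tasks are the ones to give up.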
Ultimately, activities should reflect on-site statistical data and failure
experience. Environments (including the work environment) are
unique to each plant and influence what fails (and when). The cultural
Failure frequency
After leaving nuclear and working in fossil power plants for nearly
a decade, I developed several strong impressions. For those who haven't
worked both environments, there's a great deal of information sharing
that is possible. Each has very focused strengths that are applicable to
the other.
Nuclear is focused on identifying technical failures that add both
clarity and certainty to help those working in unfamiliar terrain.
Embracing failure reality (versus abstract considerations of imaginary
problems) can greatly improve nuclear competitiveness at virtually
no risk to the general public.
Fossil focus means using inherent design availability that is
built into plants to perform work as needed. The advantage is the
ability to perform real-time maintenance; the risk is potential functional
failure because margins are expended. Fossil units' easier start-up and
load cycle, for the most part, minimize production losses incurred from
a forced outage.
The ability to mobilize personnel and systems to get a job done
is another fossil capability. Paperwork and organizational systems are
compact, focused, and anchored in a vested and accountable individual
or group. This focus supports the performance of CBM. However,
because fossil maintenance is less formal, operating limits are occasion-
ally stretched or overlooked and reactive failures or forced outages can
result. Defining clear operating limits to trigger condition-directed
maintenance is a fossil generation need. The opportunity for fossil
(unlike nuclear) is the authority to make individual plant interpretations
of risk and benefit when engineered limits are reached. This can pro-
vide great operating flexibility. There is absolutely no benefit when lim-
its are blown over and failures result.
On many occasions, fossil plant staff clearly understand key
operating limits from a technical perspective, but organizationally they
fail to act in a timely manner. Expectations were not made clear, or
managers failed to support operator decision-making. Again, the point of
CNM is to identify and perform CDM prior to final failure. To do this
well, those who perform monitoring must be expected and empowered
to act.
Complex failures
Complex failures in this definition include interdependent and
logical-sequencing faults involving equipment and control interactions,
multiple failures, intermittent failures, secondary failures, loss of
redundancy, and drift. They're difficult to identify, troubleshoot, and
correct. Analytical difficulty arises because many variable facets present
themselves in concert. Each emulates the problems of a plant startup, in
which defining and solving coincident problems takes a thorough test
plan, expert assistance, and persistence. Teams and specialists are
needed to ferret out complex failures.
Avoiding complex failures lowers costs and increases production. A
goal-setting
documentation
procedures
training
measurement
feedback
thought processes
applied effort
providing each in enough volume to offset the inherent disorder
of the system
Culture
Maintenance delivery
It's my belief that American maintenance performance is disorganized.
We cope with high rework rates, ignoring statistical (and other)
tools to identify, measure, and reduce (or eliminate) rework. Substantial
maintenance coordination and improvement opportunities need to
include ongoing:
continuous improvement
innovative jumps
Maintenance performance
Maintenance normally occurs within an uncertain environment.
Maintenance organizations often share information verbally, which has
limitations. Workers cope with equipment problems with varying
degrees of engineering support. There's little documented, easily
retrievable information concerning equipment failure and many ways to
approach it. PM programs are implicitly defined and rarely have a basis.
Available PM information only implies the failures it addresses while
prescribing monitoring or corrective tasks. Few vendors specify (or
perhaps even know) exactly how to perform organizational
maintenance or address appropriate maintenance intervals.
Times and characteristics of equipment failure mode attributes
(actual or idealized) are very uncertain. Yet it's from them that we
obtain mean life, conditional probability of failure curves, distributions
of failure type, and mean life variation. Conservatism, built into
maintenance task performance to compensate, could come from
institutionalized monitoring frequencies that are too tight. There's also
a lack of trust in supporting systems and processes, including the
computer maintenance management/information systems. Many facilities
compensate by over-performing maintenance.
Some conservatism arises from the very nature of large industrial
maintenance and the crafts' inherent desire to do good work. Part of it
stems from the lack of effective CMMS PM systems. However, a huge
part of the problem arises from the uncertainty of equipment lifetimes
and use. Combined with a TBM model (the traditional PM model,
assuming it was performed), this means huge amounts of conservatism
have to be built in.
Maintenance doesn't need to be random, nor must there be so
much of it. In disorganized facilities, factors working together to help
control failures include conservatism, craft, design, and monitoring.
The tendency to maintain wide margins for error (on the assumption
that things will be missed) adds tremendous conservatism to
part-lifetime calculations, randomness of failure assumptions, and other
maintenance program features.
Craft workers in a stable working environment learn equipment
costs drop
production improves
waste drops
R improves
Equipment groups
Work association (also known as work blocking) can speed and
streamline maintenance performance. Associations can occur at the
task, equipment, or systems levels among equipment, function, or
boundary groups. Such opportunities arise when it's convenient or
mandatory to work on elements within an equipment group.
The basic objective, similar to the objective throughout industrial
CNM
Most CNM-initiated maintenance originates in operations. CNM
(monitoring without specific failure criteria) can be hard to rank,
prioritize, and perform due to its generality. In the absence of time-based
and on-condition WO categories, an organization can measure its
CNM-originated work based on the work fraction coming from operations.
If operations originates 70% of the WOs unrelated to operational tests,
then about 70% of them are NSM. Scheduling and planning, and
engineering, initiate most of the balance of the outage, PM, and
modification WOs.
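The work-fraction measure described above is easy to compute from a WO export. The sketch below assumes each WO record carries an originating group and a flag marking operational tests. Both field names are hypothetical, since CMMS schemas differ from site to site.

```python
def operations_fraction(work_orders) -> float:
    """Fraction of non-test WOs that were originated by operations.

    Each work order is a dict with two hypothetical fields:
    "originator" (group name) and "operational_test" (bool).
    """
    eligible = [wo for wo in work_orders if not wo["operational_test"]]
    if not eligible:
        return 0.0
    from_ops = sum(1 for wo in eligible if wo["originator"] == "operations")
    return from_ops / len(eligible)


# Made-up sample: 7 operations WOs, 3 from other groups, plus 2
# operational-test WOs that are excluded from the count.
sample = (
    [{"originator": "operations", "operational_test": False}] * 7
    + [{"originator": "planning", "operational_test": False}] * 2
    + [{"originator": "engineering", "operational_test": False}] * 1
    + [{"originator": "operations", "operational_test": True}] * 2
)
print(f"{operations_fraction(sample):.0%} of non-test WOs came from operations")
```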
TBM comprises the planned maintenance that is traditional, and
time-based rework/replace task work. If a plant can distinguish
condition-based from time-based WOs, it can measure the RCM
maintenance workload as shown in Table 4-3.
A small fraction of CNM identifies functional failures. Measuring
that fraction involves (1) reading WOs or (2) checking logs. Few
CMMSs have fields to record functional failures (FF) and few opera-
tors discriminate functional from other failures. Logs typically record
functional failures.
A quick way to re-align CMMSs to measure RCM-based work strat-
egy is to relate CDM to on-condition WOs. You can also perform all
condition-directed work as part of the original on-condition WO.
This establishes three basic WO classes:
This approach provides a quick way to measure existing processes.
shorts, and trips the pump off, or an operator smells smoke and
transfers pumps, shutting down the offending pump.) Functions affect
work performance. Failures translate as lost functions. The operator's
component is a black box that he or she may not understand. Operators
only need to see the functional outputs or note their absence and act
appropriately (Fig. 4-5).
We require functions while operating plants. When functions
break (i.e., are lost) we diagnose failures, locate the source, and fix
parts. Operators' perspective is inherently functional. But while
identifying a functional problem is one step, tracing that back to its
physical source is another matter. Success in managing failures depends
on organizational diagnostic skills (Table 4-4).
Holding a functional perspective simplifies the operator's required
equipment knowledge. Operators need only assure function availability,
which involves using the senses and interpreting instruments. Facility
instrumentation supports function monitoring, and the specific
function, measurement requirements, and equipment redundancy
determine the instrumentation needed.
The function-part failure dichotomy is important when we talk
about failure and operate to failure. There are few (if any) cases in
which plants intentionally operate to system functional failure. This
simply makes no sense. We provide robust equipment redundancy
specifically to avoid it. In the hierarchy of systems, subsystems, and
their functionality, however, redundant or incidental functionality is
provided at subsystem (or lower) levels that can tolerate failure, to some
degree. Risk accompanies function failures, but it can be managed.
NSM is meaningful on components where a failure will be evident,
can be managed, or has no functional impact. Redundant instrument
packages, inexpensive components, and even spare trains and
equipment support this approach. If redundant equipment can be run to
failure while maintaining system functions, the deciding factor is cost.
Sophisticated microprocessors and sensors can identify and shut down
deteriorating equipment, limiting damage. The cost is loss of the
equipment until maintenance is completed. This strategy is viable for
wearout failures where there is installed redundancy (Fig. 4-6).
Consider a boiler feedpump in a 50% redundant train (three 50%
pumps, any two of which provide 100% rated flow) (Fig. 4-7). This
configuration is a standard plant feedwater design, and meets boiler head
requirements for four to seven years of service. This approach is viable
and effective, provided the standby-train pump can start and load
reliably. Such assurance can be provided by periodic testing. When
in-service feedpump failure is identified, capacity is shifted to standby.
The worn-out, failing pump is removed from service and repaired. This
could be online or off-line, during a scheduled outage. Although
equipment must be restored, the system's functions are maintained
(Fig. 4-8).
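The value of the third pump can be estimated with a standard k-out-of-n calculation. The sketch below assumes each pump is independently available with the same probability p and that the standby pump starts on demand. Both are simplifying assumptions, and the value p = 0.90 is illustrative only.

```python
from math import comb


def k_of_n_availability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n identical, independent units are available."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


p = 0.90  # illustrative single-pump availability, not plant data

# Three 50% pumps, any two of which give 100% rated flow.
with_standby = k_of_n_availability(2, 3, p)
# Two 50% pumps with no standby: both must run for full flow.
without_standby = k_of_n_availability(2, 2, p)

print(f"2-of-3 full-flow availability: {with_standby:.3f}")
print(f"2-of-2 full-flow availability: {without_standby:.3f}")
```

For two of three units the sum reduces to 3p^2 - 2p^3, so on these assumptions full-flow availability rises from 0.81 to about 0.97.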
OTF as described here is rational. We need to remember that the
failure considered here is an abstract engineering proximate function
failure (Fig. 4-9). For many traditional engineers, this is not their
perception of failure. The function-equipment-failure seesaw makes OTF
confusing for some. For an operator, maintenance is of no consequence
so long as they always have necessary (or backup) equipment available.
OTF means little as long as black box system functions work.
This approach may not sit well with the mechanic, however. OTF
must conserve equipment or economic consequences make it
unreasonable. Catastrophic failure fears explain why many mechanics
object. In fact, a great deal of equipment is designed to support an OTF
strategy. Internal sensing devices initiate shutdown on fault conditions,
causing function loss. This limits equipment damage, but sacrifices
functionality. Cases can arise where sacrificing equipment for extended
functionality is preferred. Operators make the choice.
This function-to-physical failure mode relationship is summarized
in Figure 4-10. Functional failures observed using a system perspective
are the result of physical part deterioration. Functions can only be
Chapter 5
Applications
equipment that are planned, developed, and utilized. The real potential
of I&C lies in improved availability. Cost reductions aren't an
especially promising or even desirable goal (except perhaps at nuclear
units). The tedious, time-slugging work of disassembly, rework, and
reassembly of major equipment (the traditional mechanical maintenance
role) is absent, so I&C hours could greatly increase or decrease with
small impacts on overall costs. R comes from reliable instruments and
controls, and for plant R, I&C holds great value. Direct I&C influence
on other areas is slight.
Understanding the factors that cause trips, and improving
instrumentation until it plays no role, is the major I&C goal.
Instrumentation R and availability are significant concerns for
operations. Operations and I&C must work closely.
Other players. Traditional mechanical, electrical, and I&C mainte-
nance is supplemented by welders, insulators, and specialists such as
non-destructive evaluation technicians, vibration analysts, direct-sup-
port engineers, and janitorial staff. As it fulfills its primary role of imple-
menting time-based and condition-directed programs, maintenance
must also coordinate with specialist and contract maintenance groups
brought in for special jobs and outages.
Maintenance holds the greatest influence over costs, through
planned and outage maintenance programs and budgets. Because of the
time-intensity of any major disassemble/reassemble work, maintenance
has tremendous leverage over operating O&M cost. In a forced outage,
or a delayed return-to-power situation, traditional maintenance costs can
increase with few questions because the value of lost generation is great.
Engineering
Operations: organizational relationships
Engineering, operations, and maintenance have historically been
distant cousins. Engineering performs design-build roles. O&M run
facilities. Their interactions were usually limited to day-to-day
operating issues. Engineering provides project management support for large
modification projects but it routinely works alongside operations in
plant support. Fossil plants may have two or three onsite engineers who
Plant modification
Meaningful improvements will not come cheaply because they
depend upon plant design modifications, and such mods are expensive.
In my experience, those initiated within the plant tend to cost much
more than estimates suggest. Minor modifications managed on-site
often have the lowest level of control and stand as the worst offenders.
In concert they add up to a burden on operating budgets and available
staff. When the final numbers are in, such projects can cost more than
what's budgeted: about 10 times more, throughout my utility work
experience, based upon final-cost figures for many minor design
changes using a cost-accounting system that traced charge numbers to
jobs. Given that original cost-benefit justifications (where utilized) were
based on estimates that were a factor of 10 low, it stands to reason that
there must be a significant volume of design work of marginal value
or, more likely, of no tangible value when the goal is reducing unit
operating costs or increasing generation.
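The effect of a tenfold overrun on a cost-benefit justification is simple arithmetic, sketched below. The dollar figures are hypothetical. Only the factor of 10 comes from the experience described above.

```python
def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Benefit divided by cost; a ratio below 1.0 means the change loses money."""
    return benefit / cost


# Hypothetical minor modification, with illustrative dollar figures.
expected_benefit = 150_000.0
estimated_cost = 50_000.0
overrun_factor = 10.0  # final cost relative to estimate, per the text

estimated_ratio = benefit_cost_ratio(expected_benefit, estimated_cost)
actual_ratio = benefit_cost_ratio(expected_benefit, estimated_cost * overrun_factor)

print(f"ratio at estimated cost: {estimated_ratio:.1f}")
print(f"ratio at final cost:     {actual_ratio:.1f}")
```

On these assumptions a project justified at 3-to-1 returns only 30 cents on the dollar, so any change justified at less than 10-to-1 on paper would fail a break-even test at final cost.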
Improving the design change screening process will thus have great
paybacks. ARCM can do just that. In RCM task selection logic, design
modifications are the last choice, used when there is no effective PM
that can be done and failure can't be tolerated. In fact, these cases are
rare.
Effective PM translates to technically effective, a case agreed to
by experts addressing a failure mode. It points to fundamental design
flaws uncommon in production components and equipment. More
commonly in these cases, maintenance fundamentally misses the mark;
i.e., the task performed has no applicable (much less cost-effective)
basis.
Until it's proven that a design change is required, redesign is a
costly proposition. If a maintenance solution is at hand, however, savings
and benefits will be substantial. To make this point, you must have done
your homework, and there must always be analysis on which to base
design changes and value. Formal RCM analysis provides the basis
specifications for redesign.
Another common organizational weakness is the failure to pass
design-developed equipment assumptions (and support requirements)
to the facility operating and maintenance staffs in a manner useful to
them. After problems arise and designs are reviewed, it often becomes
apparent that plant management and engineering staffs never connect-
ed on procedures, training, drawings, or other key aspects of what was
supposed to be a joint effort. From my experience, in about half the
cases engineering did provide the product, but it got lost at the plant
level because the plant lacked the infrastructure to use the material pro-
vided.
It's hard to recall faulty designs, and so developing thorough
failure-based maintenance plans effectively identifies areas that can
truly benefit from design changes. Such reviews ensure that operating
and maintenance problems at the plant level get corrected at the plant
level, with little or no engineering assistance, before going to the
design engineer. In this manner, step improvements occur in O&M.
Operating groups improve their understanding of plant design
specifications and limitations. Plant operators better grasp design and
operation factors required for success. RCM considerations assure that
design requests are those that design personnel and processes can and
should legitimately address.
There is also value in having engineering staff work on product
Engineering tools
A number of engineering tools provide R analysis for generating
units.
Hand-calculated until just a few years ago, R analysis was generally
not applied to complete designs. Instead, thumb rules, benchmarks,
and standard solutions were applied. Today, personal computers (PCs)
and specialty software offer greater capability to evaluate detailed
designs for availability, R, risks, and other life cycle R aspects. Many
analyses tie directly into plant operations. RCM evaluations support
implementation of the unit planned maintenance program. Other
products provide similar services. For instance, Markov analysis can be
used to evaluate conditional probability of failure when important
equipment is OOS for maintenance.
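The simplest case of such an evaluation can be sketched as follows. It assumes a redundant pair with one unit OOS, and a constant failure rate (1/MTBF) for the unit left running, which is the two-state Markov model. The MTBF and outage duration are hypothetical values, not figures from any plant.

```python
import math


def prob_loss_during_outage(mtbf_hours: float, outage_hours: float) -> float:
    """Probability that the one remaining unit fails while its partner is OOS.

    Assumes a constant failure rate of 1/MTBF for the running unit.
    """
    failure_rate = 1.0 / mtbf_hours
    return 1.0 - math.exp(-failure_rate * outage_hours)


# Hypothetical example: running pump MTBF of one year, partner OOS for 3 days.
p_loss = prob_loss_during_outage(mtbf_hours=8760.0, outage_hours=72.0)
print(f"Probability of losing the function during the outage: {p_loss:.4f}")
```

A real study would add repair states and common-cause failures; this sketch only shows how the exposure grows with outage length.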
In a deregulated environment, with many new plant and equipment
designs emerging, capital investments are put at greater and greater risk.
This increases the need for R tools for these design assessments. Here
are some of the best.
FMECA. A complete RCM analysis begins with a failure modes
and effects criticality analysis (FMECA). ARCM limits analysis to the
major hitters that can be identified and used, based upon experience.
ARCM/RCM for an existing facility is an a posteriori assessment:
experience limits the scope of the review and focuses on value. New
facilities can be reviewed using a priori RCM, utilizing a variety of for-
mal R engineering tools, including FMECA. Projections of likely prob-
lems, availability, and maintenance costs can be generated based solidly
on analysis.
Analytical FMECAs have been used for years in aerospace applica-
tions to zero in on risk contributors and manage overall risk on a budg-
Operations roles
Failure identification. An operations staff is primarily responsible
for plant condition. However, operating staffs own the plants, to
varying degrees, and so failure identification is a legitimate
responsibility. Recognizing failure requires knowledge, experience,
skill, tools, and failure standards, a perfect fit with operators'
plant-monitoring assignments. Failure identification is too often
assumed, overlooked, or taken for granted. Again, operators have the
abilities and the obligation.
Operations spends more time than anyone else in the plant: reading
instruments, operating equipment, feeling vibration levels, smelling
fluid leaks, hearing noises, and seeing how things do (or don't) perform.
They are naturally suited to recognize changes, identify faults, and initi-
ate correction.
Successfully identifying failures depends on experience and skill.
Some operators receive excellent training, either during career develop-
ment or prior to hire, while what others receive is very limited. Effective
operators in a competitive environment need higher-than-average skill
levels. Turnover increases training requirements. Nuclear plants have
excellent training programs because of license and industry standards
while fossil plant training is more on-the-job, hands-on, learn-as-you-go.
Both methods have their place. Training needs to be cost-effective; in
fact, measurement for cost-effectiveness is a training need in itself.
About 80% of all failure-identification tasks originate with
operators, based upon RCM failure analysis. That is, fully 80% of all
RCM-based maintenance involves operator monitoring! In a CNM program,
then, maintenance starts with operations. Because operations
monitoring is so pervasive, failure recognition (a key feature of
effective maintenance) begins with operator training.
Two primary operations tasks are CNM and functional testing.
CNM uses the senses and instrumentation to identify equipment
failure and failure trends. Functional testing of hidden failure
functions (alarms, trips, and other protective or standby devices)
assures that function is preserved.
Nuclear plants won't discover large, available benefits from
increased functional (surveillance) testing because they already have
extensive surveillance plan requirements based upon their licenses, and
they generally have excellent availability. Fossil plants, however, may
find major gaps in their testing and equipment protection plans. Many
fossil surveillance plans are informally controlled and miss critical
and essential instruments and alarms. If implemented, these can assure
design conditions are met.
The second aspect of the operations monitoring program is the
testing program. Essential alarms and trips are typically tested on the
largest equipment in both nuclear and fossil plants. These include
turbine trips and vibration trips for large ID and FD fans. But other,
lesser alarms don't get tested: chemistry out-of-spec condition alarms,
for example. Some critical alarms occur in remote locations and may not
go to the main control room. Calibration and testing programs for these
alarms imply value and perceived importance. "Hard" (unit trip)
calibration limits are frequently neglected. Operations and engineering
personnel often interpret alarm values substantially differently, as if the
two aren't reading the same set of guidelines.
Some utilities intentionally minimize "hard" trips: when an equipment
supplier provides a hard-wired trip or status alarm (on high
vibration, perhaps), the company uses a status alarm. The operator
then acts on the alarm appropriately. Such vibration "status"
instrumentation is installed on virtually all the main turbines at one
Midwest utility I know of. This approach undermines effective instrument
maintenance. Their position was, "We don't want any trips to occur due
to sporadic alarms." They expected their operators to interpret
ambiguous readings from the same erratic instrumentation that no one
wanted hard-wired for trips. There was an R problem with the trip
instrumentation that the company was unwilling to address.
How an operating company addresses instrumentation indicates
much about its operations philosophy. In the case of critical
instrumentation, "critical" has two connotations: RCM and common usage.
In RCM it is a direct safety consideration; in common use it is
subjective, inexact intuition. Ambiguous instrumentation guidelines
indicate unclear management philosophy. An RCM-based instrumentation
review can help management select the instrumentation and limits for
clear action.
OOS, uncalibrated, or otherwise unessential instrumentation
abounds in a typical plant. A vast majority of instrumentation provides
non-critical, non-essential status. Such instrumentation can readily have
non-scheduled maintenance (run to failure or self-identify) and be
maintained as operators recognize their need for, or dependence on, its
use. A few instruments provide early warnings of impending high-cost
failure: large-machine vibration monitors, for instance. These need
attention.
Although concern that hardwired instrument trips will lower unit R
is legitimate, there are more fundamental worries. Focus on essential
instrumentation improves unit R and safety. Clear instrumentation
maintenance standards support safe operation. Usually, instrumentation
and personnel protection for large equipment go hand-in-hand.
Concern that hard requirements get carried away (in the fossil world,
anyhow) is driven by fear and culture, not careful analysis. The oppor-
tunity to establish operations-administered guidance can improve safe-
ty and performance.
Once identified equipment failures are entered into a plant's
maintenance system, operators describe symptoms and provide other
insights. Getting the right maintenance starts with identifying
problems correctly, and that means clear WO problem descriptions. Even
someone with limited writing skills can quickly grasp WO
specifications. The more defined the WO problem and the more
diagnostics completed, the easier it is to troubleshoot, define scope,
and perform work.
Operator monitoring. Operators monitor plant performance
remotely and locally: in the control room and on rounds. Automated
DCS plants trend by CRTs or by automated round-logging devices.
Monitoring via DCS CRT or control room panel requires a "big picture"
perspective and the capacity to anticipate. A DCS makes monitoring the
plant easier, simplifies work, and improves alarm response. It also
simplifies round monitoring requirements because remotely monitored
points can be trended and need not be replicated in rounds. Invariably
there are instruments that aren't monitored, or that need a physical
presence to visually review, or that can't be downloaded because
appropriate drops aren't available. In these cases, a round is still necessary.
DCSs, like all other instrumentation, need oversight to control
information going into the system and the alarms safeguarding it.
Because DCS has the capacity to tie together large amounts of informa-
tion, scope-of-monitoring is even more important. RCM helps prioritize
and rank information value. Critical alarms can be emphasized and sta-
tus alarms de-emphasized. On a DCS upgrade, an RCM filter can eval-
uate alarms and instruments for monitoring, and limit the scope of mon-
itored equipment, hardware, and software. This substantially reduces
the amount of instrumentation required, and saves money. Savings con-
tinue over the life of the facility because the scope of monitoring and
maintenance has been limited.
Rounds optimization. Rounds consume the major portion of
operators' time. Ideally, time not spent reconfiguring the plant is devoted to
rounds and monitoring. Rounds are CNM tasks that incorporate fail-
Parts
Age exploration. Actuarial failure statistics from commercial avia-
tion studies show most in-service parts (93%) never reach their design
end-of-life. Components are replaced on a hard-time basis though only
partly consumed (Fig. 5-1). This stems from conservatism and from
untested assumptions about wear-out and overhaul. Commercial
aviation experience and conclusions transfer directly to the generating
industry, supported by appropriate information control and
management. Age exploration (service and wear monitoring on parts in
service as they are replaced) provides information that can extend
lifetimes.
To improve part utilization, involving the craft in assessing
parts' in-service performance is both necessary and logical.
They remove, service, and replace virtually all parts, and so their
assessments of parts performance are essential for aging study. When
skilled workers ask the question, "How much remaining serviceable life
is there?" it orients them towards assessment of failure modes,
criticality, and part service performance. CMMSs offer the ability to
track failures and replaced-parts performance information with less
effort. However, no information trail begins without a skilled craft
assessment and data entry.
Evaluation of parts performance in-service is every plant person's
job. The savings potential is simply too large for such work to be
ignored, and the many facets of parts service require that all be involved.
These facets range from warehousing lifetimes to nuclear
environmental and usability issues. Sometimes savings come where least expected.
While most CMMSs have the ability to develop age exploration
processes, sensitizing the craft to age exploration as a routine practice
is more challenging. A simple assessment of a part as it is replaced is
more than adequate. Fancy material-failure analyses that are within the
capabilities of some companies are, for the most part, not needed. Parts
management subroutines in new CMMSs will enhance parts use and
tracking, but even good guesses are helpful!
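The kind of simple record described above can be sketched as follows. The record fields, part names, and hours are hypothetical; no particular CMMS stores data this way. Projecting life from a craft estimate of remaining life assumes roughly linear wear, which is itself only a rough guess.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReplacedPart:
    """One craft assessment made as a part comes out of service."""
    part_id: str
    hours_in_service: float
    remaining_life_fraction: float  # craft estimate: 0.0 means worn out

    def projected_life_hours(self) -> float:
        # Linear-wear assumption: life used so far divided by fraction consumed.
        consumed = 1.0 - self.remaining_life_fraction
        return self.hours_in_service / consumed


# Hypothetical assessments of three seals replaced at an 8,000-hour hard time.
records = [
    ReplacedPart("seal-A", 8000.0, 0.50),
    ReplacedPart("seal-B", 8000.0, 0.20),
    ReplacedPart("seal-C", 8000.0, 0.60),
]

projected = [r.projected_life_hours() for r in records]
median_life = median(projected)
print(f"Median projected life: {median_life:.0f} hours")
```

Even with guesses this coarse, a median projection well above the hard-time interval flags a candidate for interval extension.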
Component monitoring and age exploration have been practiced as
Failure numbers
Numbers are the best way to tell the objective failure story, yet
they're missing from most traditional generation RCM analysis reports.
Some RCM books go so far as to discount failure statistics and numbers
altogether. In my opinion, this is a serious oversight. Implicitly or not,
we use frequency and consequences to draw conclusions, and numbers
tell that story. Those who don't understand this and live by the numbers
wind up chasing the "trivial few." It gives RCM a black eye when
analysts bog down in endless pursuit of rare or imaginary events:
things that don't happen in the real world.
My approach reflects my predisposition towards measurement:
I'm an engineer. In reviewing large quantities of failure data, identifying
failures, summarizing statistics by failure categories, and making
estimates, I work with numbers that aren't exact, but they're in the right
ballpark. I view them like health physics numbers: order-of-magnitude
significance. They identify sensitivity to costs that need to be
understood at 10%, 100%, or 1000% payback levels over the period of
interest. We need to structure our programs for value. A 10% payback on a
turbine overhaul may be worth chasing, but a 10% payback on a $20
filter replacement probably is not. A 500% savings on a $20 task clearly
outweighs the same for a $2 task. Practically, this means when it comes
to a tradeoff (and it will), we must give up the $2 tasks to make room
for the $20 tasks.
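The arithmetic behind this ranking is simple enough to sketch. The $20 and $2 task costs and the 10% and 500% payback levels come from the text; the turbine overhaul cost is a hypothetical round number used only for scale.

```python
def absolute_savings(task_cost: float, payback_pct: float) -> float:
    """Dollar value of a percentage payback on a task of a given cost."""
    return task_cost * payback_pct / 100.0


# (name, task cost in dollars, payback in percent)
# The turbine overhaul cost is an assumed, illustrative figure.
tasks = [
    ("turbine overhaul, 10% payback", 2_000_000.0, 10.0),
    ("$20 filter replacement, 10% payback", 20.0, 10.0),
    ("$20 task, 500% payback", 20.0, 500.0),
    ("$2 task, 500% payback", 2.0, 500.0),
]

# Rank by absolute dollars, not by percentage.
ranked = sorted(tasks, key=lambda t: absolute_savings(t[1], t[2]), reverse=True)
ranked_names = [name for name, _, _ in ranked]

for name, cost, pct in ranked:
    print(f"{name}: ${absolute_savings(cost, pct):,.2f}")
```

Sorting by absolute dollars rather than percentage is the design choice that matters here: it is what puts the overhaul first and the $2 task near the bottom.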
Activities should reflect on-site statistical data and failure experi-
ence. Work environments are unique and influence what fails and when.
The cultural environment influences what failures are recognized.
Available levels of skill, knowledge, and other intangibles can be
inferred, but are hard to measure. Just as two randomly-selected indi-
viduals will experience different success rates with the same make and
model of automobile (as measured in longevity and life cycle cost), two
similar plants experience distinctly unique operational outcomes. These
can only be explained in process terms.
So-called "rare events" (the second aspect) pose an actuarial
problem. Rare events represent the highest-value RCM learning and
benefit opportunities. Most heavy production and financial losses arise
from them. They are certainly worth understanding.
After many years examining major losses, I find that in most cases
a chain of events presents a history. The progression towards ultimate
failure depends on systematic process weaknesses; rare events occur
more frequently in the absence of process awareness and controls. They
reflect random, individual, repetitive occurrences. Individually, these
occurrences rarely convert to an operating event, but if they happen
frequently, that event will most probably occur and, ultimately,
statistics tell the story. Rare events can be managed with
conscientious, complete operations. These rules, well known in theory,
are well practiced by professional operating organizations.
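The statistics behind this can be sketched with a simple probability model. It assumes each precursor occurrence independently has the same small chance of converting to an operating event; the 1% conversion chance is a hypothetical value, and real precursors are rarely fully independent.

```python
def prob_at_least_one_event(p_convert: float, n_occurrences: int) -> float:
    """Chance that at least one of n independent occurrences becomes an event."""
    return 1.0 - (1.0 - p_convert) ** n_occurrences


# Hypothetical: each lapse has a 1% chance of converting to an operating event.
p_convert = 0.01
for n in (1, 10, 100, 500):
    p_event = prob_at_least_one_event(p_convert, n)
    print(f"{n:4d} occurrences -> event probability {p_event:.3f}")
```

Under these assumptions, an occurrence that almost never matters on its own becomes more likely than not to produce an event once it has been repeated about a hundred times.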
Safety
Direct consequences
Generating plant safety presents two challenges: maximizing safety
practices and minimizing costs.
Potential consequences
A second, equally significant improvement in safety can be derived
stant attention. Ice builds up, louvers tear off, fill comes down, and
screens plug. Work can be miserable. Fan electrical problems, lubri-
cating oil leaks, vibrations, and a host of other random problems make
towers tough maintenance areas.
During the summer, the units are often on the raw edge of load
reduction as the tower cell fans, spray patterns, distribution, fill condi-
tion, and other lesser problems make towers the key determinants of
load. Every last ounce of capacity may need to be coaxed from an old
and tired tower. Balancing cells on a shifting, deteriorated tower can be
almost impossible.
Towers have blown over, fallen down, burned down, and rotted
away. The last case is the most frequent. Deterioration of cell basin lev-
els, or leaky distributor shutoff valves, makes cell balancing difficult or
impossible. Structural sags can affect the hot water basins so that
balance cannot be achieved. As towers age, their inability to balance,
maintain basin temperature, and maintain condenser vacuum in sum-
mertime (during load peaks) makes replacement inevitable.
The more dramatic tower episodes in my career involved lesser
subsystems that weren't appreciated until they became problems or failed
outright. Before fiberglass return lines and spargers became standard,
redwood staved-distribution piping was common. At one plant the
staving failed, sprinkling a waterfall out away from the tower basin.
Flooding resulted, and the basin went low. After the basin emptied, the
circulating water pumps tripped. The unit went down on a combination
of low vacuum and no cooling water flow. A similar event involved a
tunnel access manhole cover bolt failure on the discharge side of the
circulating water pumps. The condenser tripped on low vacuum after the
basin emptied. This latter case destroyed a contractor's onsite trailer.
Circulating water pump head is 30 to 60 feet at rated flow, nominally
enough for an impressive waterspout!
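To put that head in more familiar terms, a short conversion sketch follows. It assumes fresh water at about 62.4 pounds per cubic foot; the constant and function names are illustrative, not taken from any plant calculation.

```python
# Assumed weight density of fresh water near ambient temperature.
WATER_LB_PER_FT3 = 62.4
SQ_IN_PER_SQ_FT = 144.0


def head_ft_to_psi(head_ft: float) -> float:
    """Convert a static water head in feet to pressure in psi."""
    return head_ft * WATER_LB_PER_FT3 / SQ_IN_PER_SQ_FT


low_psi = head_ft_to_psi(30.0)
high_psi = head_ft_to_psi(60.0)
print(f"30 ft of head is about {low_psi:.0f} psi")
print(f"60 ft of head is about {high_psi:.0f} psi")
```

Under that assumption the quoted range works out to roughly 13 to 26 psi at the pump discharge, acting on whatever area the failed cover exposes.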
Tower fan problems are the stuff of legends. Fans throw blades
when ice damage occurs. Deicing practices aggravate this tendency.
Gearbox failures, due to water lube-oil contamination, are typical as age
increases. Corrective measures for throwing blades have involved cre-
ative modifications like enclosing the diffuser assemblies with heavy
wire mesh. (Consider the costs for this modification for a 16-cell tower
and you see the potential for RCM-based modification review! What
an opportunity for root-cause analysis, too!)
Motor failures are common as towers age in service. Most new
units use weather-enclosed motors. There's no maintenance work to
speak of, and they are essentially consumable. Sizes range from 5 to 75
horsepower (HP).
Secondary failures from tower fill and structural debris have been
quite damaging in unique cases. We once rebuilt major tower sections
(with a contractor) and wood scrap debris was left in the basin. Startup
transported this debris to the water boxes and waterbox isolation valves
(seats), and ultimately into the condenser tubes. Screens removed a lot
of debris, but the volume and size of the debris, together with repetitive
screen cleaning during adverse winter weather conditions, allowed large
amounts into the condensers, where it accumulated. Silt accumulated
around the wood debris packed in the waterboxes, and flow stagnated in the
partially blocked tubes. Local corrosion cells were established. The
resulting tube damage prematurely required condenser retubing due to
severe water conditions and the inability to control localized secondary
pitting corrosion. An admiralty brass condenser that had a design life
of 30 years was limping badly at 13. Production losses ran into the mil-
lions. Retubing ultimately cost around $5 million. Granted, the water
was aggressive, but the condenser had been performing well until the
wood debris episode.
This example reiterates the importance of such simple features as
screens, and the need to do simple PM tasks (screen cleaning) very well.
It must be timely and conscientious, even in adverse weather. Our
willingness to start the unit in this state of unreadiness indirectly reflected
our standards. When standards are compromised at high levels, the
trickle-down effect can be significant. Ultimately, workers care when
they see that managers care. Standards must begin at the top.
Secondary damage of this nature is an expensive consequence of
low-quality work and otherwise inconsequential failures. It is very
preventable, if you're aware of the risks. Of course, this event was an
infant mortality failure, but a very predictable one in light of the
station's other problem areas.
Chapter 6
Lessons
Task Intervals
PM task intervals are based on failure rate and mean life variability.
For random-failing components, special strategies may apply (within
the context of the design). For a predominantly random failure mode,
for instance, functional monitoring can effectively identify
instrumentation failure. A check made at a fraction of the MTBF can
identify instrumentation failure while minimizing overall failure risk.
An instrument can be tolerated in a failed state for limited intervals,
if effectively redundant.
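One common reliability-engineering approximation (an assumption here, not a method stated in the text) makes "a fraction of the MTBF" concrete. For a randomly failing hidden function checked every T hours, the average time spent failed is about T / (2 x MTBF), provided T is much shorter than the MTBF. The MTBF and the tolerable unavailability below are hypothetical.

```python
def mean_unavailability(check_interval_hours: float, mtbf_hours: float) -> float:
    """Approximate fraction of time a hidden function sits failed between checks.

    Valid only when the check interval is much shorter than the MTBF.
    """
    return check_interval_hours / (2.0 * mtbf_hours)


def check_interval_for(target_unavailability: float, mtbf_hours: float) -> float:
    """Longest check interval that holds mean unavailability to the target."""
    return 2.0 * target_unavailability * mtbf_hours


# Hypothetical trip instrument: 50,000-hour MTBF, 1% tolerable unavailability.
interval_hours = check_interval_for(target_unavailability=0.01, mtbf_hours=50_000.0)
print(f"Check at least every {interval_hours:.0f} hours")
```

In this sketch the check falls at about 2% of the MTBF; tightening the tolerable unavailability shortens the interval in direct proportion.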
Operations is charged with identifying random equipment failures
in the plant. An effective round ensures that operators get through the
facility often enough to identify failures (particularly random ones)
without excessive monitoring. Usually, four- to eight-hour round
intervals are effective. If rounds must be made more frequently, designs
should be evaluated; if they can be made less frequently, it raises
questions of risk and staffing levels. Equipment in terminal failure may
require additional monitoring. Requirements to monitor terminally
failing equipment can be quite substantial.
Selecting task intervals is part science, part art. Failure data alone
may be inadequate for infrequent failure modes. With limited failure
information, manufacturer recommendations, failure mode physics,
mode type assessment, or an expert opinion may be needed to establish
appropriate task intervals. In many instances, too, an exact interval is
not critical. With inexact information, intervals can be over-specified
(made too frequent), particularly by an unskilled analyst. An age
exploration-based monitoring program can adjust intervals based on
failure type and age exploration findings.
When craft participates in interval selection, it's a significant
organizational growth step. Craft develops ownership of monitoring and
parts-in-service assessment that in turn supports identification of
appropriate intervals.
For expensive, age-based failures, intervals need to be conservative.
Generator rotor cracking probably has an exact MTBF in excess of 30
years. Inspection on a disassemble-overhaul basis (typically, every 5-10
years) is appropriate since the equipment is at risk. Instruments moni-
toring for high cost equipment failures must be maintained and their
failure prevented. Brief outage periods are acceptable but entail risk.
Prolonged unavailability is not an option. VM on high-inertia rotating
equipment must be maintained constantly, with hard-wired trips, and
must have an operating limit. Anything above the limit is an automatic
trip.
The problem in establishing intervals for instrumentation is
ambiguity. Manufacturers resist hard-wired trips to avoid spurious or
undesired events. They assume that an operator can discern inappropriate
demand trips on an instrument provided for "status," and avoid
bringing down the unit. It's right in theory but wrong in practice:
operations learns to ignore unreliable instruments. Regular, spurious
instrument trips and alarms (as there can be at "status only"
instrumented plants) mean that instrument value plummets. "Status only"
instruments can lead to maintenance deferral when their importance is
diminished. Well-maintained, high-quality critical instruments are essential.
Identifying critical instruments is helpful, of course! Chasing a
faulty alarm is frustrating and expensive. Sitting in the hot seat after
making the wrong call in an ambiguous situation is equally trying. You
don't get too many chances before the instruments are discounted and
ignored. In RCM (or ARCM) a spuriously alarming instrument is
considered "failed," one of the truly great contributions to instrument
maintenance programs and one that operators had been demanding for
years.
It all sounds overwhelming. But in reviewing thousands of compo-
nents and tens of thousands of WOs, one quickly becomes comfortable
estimating task intervals. It takes experience to develop a feel for
failures, but, with some quick R training, most experienced people can
draw on their years of observations to make excellent judgements about
parts aging, particularly when wear, abrasion, or erosion processes are
at play. A failure model helps integrate a picture of the failure process
with plant culture and strategy.
In practice, MTBFs are often grossly underestimated. Plant staff
base their life estimates (and PM intervals) essentially on a small
fractional sample of failed parts. While this is suitable for setting
safe-life interval limits, it isn't obvious that it grossly
underestimates the average life. Predominantly, PMs are based on
economics. What this says is that informally developed economic PM
intervals are almost always grossly conservative.
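A small sketch shows how large the underestimate can be. The population, failure ages, and observation period are hypothetical, and the estimate of MTBF as total operating hours divided by failures assumes a roughly constant failure rate.

```python
# Hypothetical population: 20 identical parts, each watched for 10,000 hours.
failure_ages_hours = [2000.0, 3500.0, 5000.0]  # the three parts that failed
survivor_count = 17                            # parts still running at the end
observation_hours = 10_000.0

# Informal estimate: average age of the failed parts only.
naive_life = sum(failure_ages_hours) / len(failure_ages_hours)

# Estimate that also credits the hours run by parts that did not fail.
total_operating_hours = sum(failure_ages_hours) + survivor_count * observation_hours
mtbf_estimate = total_operating_hours / len(failure_ages_hours)

print(f"Average age of failed parts: {naive_life:,.0f} hours")
print(f"MTBF counting survivors:     {mtbf_estimate:,.0f} hours")
```

With these assumed numbers the failed-parts-only figure is low by a factor of about 17, because it ignores every hour run by the parts that never failed.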
Estimating organic failures is vexing. Aging is expected, however:
rubbers, cloths, elastomers, and similar materials deteriorate with time
and temperature. Even when visual aging evidence is missing, it's risky
to assume there's been no aging. The Arrhenius temperature
characterizes organic aging: below it, little or no aging occurs; above
it, aging increases quickly. Visual evidence can be absent in the
transition range. Calculating an age (especially for components in
critical applications) helps avoid gross errors. When large organic
expansion joints made of reinforced cloth, rubber, and binders reach
their manufacturer's specified life, life extension is risky. During
installation, the absence (or presence) of offset, vibration pulsations,
and other synergistic aging phenomena complicates the picture. Only
experience can determine actual in-service aging.
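One way to calculate such an age is the Arrhenius rate relation, in which aging rate rises exponentially with absolute temperature. This is a sketch under stated assumptions: the text does not specify a calculation method, and the activation energy and temperatures below are hypothetical values that must come from material qualification data in any real application.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def life_multiplier(activation_energy_ev: float,
                    rated_temp_c: float,
                    service_temp_c: float) -> float:
    """Ratio of expected life at service temperature to life at rated temperature.

    Uses the Arrhenius relation: life scales with exp(Ea / (k * T)).
    """
    rated_k = rated_temp_c + 273.15
    service_k = service_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / service_k - 1.0 / rated_k
    )
    return math.exp(exponent)


# Hypothetical elastomer: Ea = 0.8 eV, life rated at 60 C, actual service at 50 C.
multiplier = life_multiplier(0.8, rated_temp_c=60.0, service_temp_c=50.0)
print(f"Life multiplier at the cooler service temperature: {multiplier:.2f}")
```

With these assumed values, running 10 degrees cooler a little more than doubles the calculated life; running hotter shortens it by the same kind of factor, which is why a calculated age is a check on visual evidence rather than a substitute for experience.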
Remaining aware of failure processes and how they work (knowing
what to look for) greatly improves setting task intervals. Fortunately, a
few fundamental aging processes repeat over and over in most plant
applications. Learn these, and you have the basic tools to evaluate most
aging mechanisms. Developing failure mode data is an R engineering
exercise. Fortunately, experience, thumb rules, and training go far.
Age Exploration
Definition
Age exploration is the systematic examination of the lifetime a
component or part can support in an application in service. It's crucial to
setting task intervals. The term means literally to explore component
aging, and find out what service the component can provide.
It used to be assumed that all components had finite lifetimes:
equipment wore out and needed replacement or overhaul. When this
assumption was first examined in air transport, it was discovered that
it doesn't hold true for many components. Though powerful and
intuitive, the assumption had no basis. A statistically large number of
components showed virtually
no deterioration during the period of use, based upon early jet engine
overhauls in the late 1950s and early '60s. These included actuarial analysis
of failure studies. On 90% of replaced components, life remained at
the end of their specified life.
Lifetimes have always been based on the best available
information. Aircraft turbine development, transitioning from prescribed
overhauls to age study-based monitoring, summarizes the experience.
When overhaul limits were eliminated and age exploration undertaken,
equipment lifetimes increased significantly, resulting in quick economic
benefits and lower risk. Considerable actuarial analysis (detailed
mathematical failure analysis to quantify lifetimes and conditional
probability distributions) supported this change. The concept of
conditional overhaul gradually emerged. The results can be applied to
most other industrial maintenance applications.
Conditional overhauls only address immediate failure causes and
correct other necessary parts to achieve specified performance. The
paradox is that conditional overhauls yield overhauled equipment
that statistically performs the same as traditionally overhauled
equipment. By literally running fault-tolerant equipment with NSM until
failures develop, we can use the concept of age exploration, merged
with design and conditional overhauls, to give credence to the term NSM.
Early equipment manufacturers and maintenance experts did their
best to specify age-based replacements, recognizing the potential to do
better through age exploration. Extending useful equipment life
requires understanding how items age in service and how effective we
are at discovering it, and then formulating how best to use this
knowledge. Profound understanding of statistics and actuarial lessons
(lessons learned from those aircraft engine overhauls, failures, and
actuarial analysis) enables this conceptual leap. Evaluation of
in-service performance on an ongoing basis (particularly for new
equipment and components as they enter service) enables us to manage risks.
In the course of understanding jet engine aging and failures, the
aircraft industry discovered that even the very best maintenance and
engineering experts couldn't predict future engine performance based on
overhaul data. Experts predicted the imminent failure of apparently
worn-out equipment, only to have the equipment perform (statistically)
Value
For major equipment overhauls (boilers and turbines), extending an
interval by even a few days can have value. Other small savings, in
aggregate, also add up. Age exploration achieves its greatest potential
value when a plant shutdown can be deferred.
For a nuclear BWR, replacing solenoid valves to meet an EQ often
falls in the 4-6 year range. Extending the qualified life for control rod
pilot solenoids (at 4 per control rod, or 137 control rods per unit) pro-
vides substantial savings. (This requires re-qualification testing, of
Figure 6-1: Best Value? Every nuclear plant spends time on the NRC's watch list, or
so it seems. Impressive operating records lead regulators to suspect production is
emphasized over safety. Maintenance expectations differ in the highly regulated
industries despite the same equipment. The challenge of deregulation is to use
industry-wide best practices to achieve outstanding operations at low cost.
Systematic application
Age exploration in new equipment begins by removing parts from
service for examination. New failure mechanisms, premature aging, and
other unanticipated failure-mode evidence require immediate attention.
Over the long term, age exploration provides the basis for predicting
how much service a given component can support. You can better
extend life when you realize the ultimate service life limit. Done sys-
tematically, it can provide the basis for improving many plant equip-
ment maintenance decisions.
Such age exploration principles have been known and used for
years, but haven't found regular application in electric power
generation. Perhaps this simply reflects the traditional nature of power plant
maintenance. Improving part-cost performance hasn't been
a need in generating plants, until now. Legitimizing age exploration
neatly resolves this cultural problem. Utilities should develop formal
age exploration methods and hand the decision process back to those
who actually use the parts.
Effective age exploration requires:
This last element is vital. Employees must expect that plants, equip-
ment, and systems will continually improve, and that performance will
increase and costs decrease through improved utilization. Without
learning, a part-aging program won't be effective.
The benefits (both obvious and subtle) afforded by age exploration
include:
Engineering Focus
When it's included as part of an overall corporate strategy, age
exploration focuses corporate engineering on what matters. Issues of
redesign, statistics, cost analysis, and new CMMS tools can help put
engineering resources where they will add the greatest value.
Old and new facilities differ in their engineering design improve-
ment needs. In the past, rapid advances in design, lowered unit costs,
and load growth meant that engineering focused on construction. Plant
lifetimes were short. "Disposable" plants were expected to be
technically obsolete after 40 years in service. Such was the design standard.
Today, plant replacement capital simply isn't available to old-line
utilities. Traditional, vertically integrated utility generating units, whether
fossil or nuclear, are beset with high costs and complex processes, put-
ting their continued existence at risk. The challenge is to redesign and
re-deploy assets for competitive survival. When engineering groups lack
the experience needed to effectively improve plant operations, utility
engineering must turn to others, and this should not be the case.
Failure spectrums
A complete RCM review of a complex system (establishing a
maintenance spectrum) quantifies the optimum maintenance mix. At the
extreme are systems that support heavy monitoring: they have a high
number of random, low-consequence failures that don't (cost-effectively)
support fixed-time maintenance. Personal computers (many of
them controlling many complex subsystems) fit this profile. Overall,
they fail randomly, but a quality machine's average age at failure (its
MTBF) is several years longer than its prescribed useful life. It's highly
probable the device will be technically obsolete (and taken OOS)
before this point is reached. The MTBF is large: 20,000 hours or more.
Most failures are, in fact, randomly introduced software glitches or
random operational losses. Hardware failure incidence is low. No
tasks will cost-effectively prevent failure, so an effective
strategy is one that addresses failure identification and data preservation
instead.
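To put a number on the MTBF argument, here is a minimal sketch. It assumes purely random failures, that is, a constant failure rate, so that time to failure follows an exponential distribution. The 20,000-hour MTBF is the figure quoted in the text; the duty cycle and service life are illustrative assumptions, not plant data.

```python
import math

def prob_failure_before(hours_in_service: float, mtbf_hours: float) -> float:
    """Chance that a unit with a constant failure rate fails before a given age."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - math.exp(-hours_in_service / mtbf_hours)

MTBF_HOURS = 20_000        # figure quoted in the text
HOURS_PER_YEAR = 2_000     # assumed duty cycle (roughly one shift per working day)
SERVICE_LIFE_YEARS = 4     # assumed retirement horizon (technical obsolescence)

hours_used = HOURS_PER_YEAR * SERVICE_LIFE_YEARS
p_fail = prob_failure_before(hours_used, MTBF_HOURS)
print(f"Chance of a random hardware failure before retirement: {p_fail:.0%}")
```

Under the constant-rate assumption, a unit that has survived so far is no more likely to fail in the next hour than a new one, so age-based replacement buys nothing. That is consistent with the emphasis on failure identification and data preservation.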
The philosophy behind how equipment is designed also determines
its approach with respect to operator intervention, monitoring, and the
value placed on monitoring time. When system failure rates are low, it
demonstrates integrated man-machine design success. Time (in man-
hours) required to achieve failure rates may differ radically from one
design to the next. If labor is valued low and capital requirements are
high, an overall optimum low-cost solution is labor-intensive: a
CNM-intensive maintenance solution. If the cost of capital is low and
manpower high, the optimum mix is little manpower and more capital.
Here, the limiting case requires no man-hours for maintenance at all:
the OTF case. Different cultures approach equipment maintenance
differently but often apply one of these two methods.
I think of the former model as "German" and the latter as
"American" because the designs of a Porsche and a Chevrolet
reflect this different thinking. European maintenance strategies lean
towards more monitoring while Americans tend towards less. If equip-
ment is capable of extended life, we should seek that approach and
apply it for the optimum maintenance cost.
Operating costs can be reduced through reductions in capital
expenses, provided such reductions (or an increase in unreliability) don't
increase O&M. (One unplanned outage and all savings can be wiped
out.) The purpose of many capital expenses is performance improvement.
Programs to extend life must assure against trade-offs or, worse,
bottom line losses. Invariably, production losses carry high penalties but
are abstract and harder to quantify than PM. (How can we measure the
cost of opportunities lost when sales are missed?) Industry faces the
same opportunities, and risks, on a much larger scale. Every decision
in industry is a roll of the dice, and they roll hundreds or even thousands
of times a day. When we do, ineffective or incorrect strategies show up
in fairly short order. (Twelve to 24 months are needed to measure the
impact of a strategy change for average plant cases.)
Many technically advanced products carry specifications that assure
a specific design life at a specified level of performance. Boiler tubes will
last 40 years at design firing rates with specific water chemistry. (Firing
rate and chemistry specifications, the technical limits, assure design life.)
Exceeding these specifications causes immediate proximate failure.
Some books refer to this as engineering failure, or root cause failure. For
long-lived capital equipment, understanding this relationship reflects
business profound knowledge. Many companies do not (or cannot)
make this relational tie. Where equipment records indicate secondary
failures can be attributed to exceeding specifications, improved per-
formance monitoring can significantly improve economics. Where
PM Implementation Models
Do your best
When some companies build or acquire facilities and maintain a
laissez-faire approach to facility maintenance, it's because they
discovered they can run them much longer than vendor-specified intervals
with no apparent loss. Ultraconservative vendor intervals partly explain
ho-hum approaches to TBM.
Imagine, on the other hand, that missed PM intervals had (relative-
ly) immediate and severe consequences. Time-based monitoring pro-
grams and vendors would gain credibility! I believe this would happen
if manufacturers discarded the volumes of trivial, over-conservative
Typical PM implementation
In the unregulated maintenance arena, only a small number of PM
tasks are performed. The tasks themselves may not specify work to be
done. Review provides only vague notions about work to be done. From
years of plant experience, I can infer things worth doing, but auditing
work turns up many different results. The common trait is that the value
of many PM tasks can't be assessed or calculated. The plant's entire
maintenance strategy may be suspect. At fossil generating stations I've
audited, 15% of the work on the PM list is performed, on the high end.
Typically it's 7-10%. Low is 3-5%.
Nuclear plants have more aggressive lists and better measurement
because of regulatory requirements. Completion rates are regularly
between 80% and 95%. I watched a BWR achieve more than 90% of
scheduled PMs worked to completion, month in and month out. Those not worked
were rescheduled. Failing to perform scheduled PMs had to be justified
in advance. Returning a PM to a backlog list was unacceptable. The
result? This unit did not suffer a plant trip in five years! Not that this
was solely due to PM completion; there were many other expectations
and practices that supported operations. But the culture was one of
commitment, competence, and maintenance across all groups,
inspired by the regulatory environment.
Clearly, the contextual meaning of the PM program was radically
different in these two environments. Reaching this latter level of PM
program performance requires management commitment and PM work
credibility.
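The completion figures quoted for the two environments are simple ratios of PMs worked to PMs scheduled. A minimal sketch of that measure follows; the function name and the monthly counts are hypothetical, chosen only so the results fall in the ranges the text reports.

```python
def pm_completion_rate(scheduled: int, completed: int) -> float:
    """Fraction of scheduled PM work orders worked to completion in a period."""
    if scheduled <= 0:
        raise ValueError("need at least one scheduled PM")
    if not 0 <= completed <= scheduled:
        raise ValueError("completed must be between 0 and scheduled")
    return completed / scheduled

# Hypothetical monthly counts, not audit data
fossil_rate = pm_completion_rate(scheduled=200, completed=16)
bwr_rate = pm_completion_rate(scheduled=200, completed=184)
print(f"fossil: {fossil_rate:.0%}, BWR: {bwr_rate:.0%}")
```

The ratio only means something if the PM list itself is credible: a high rate against a list of vague tasks says little.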
Fossil plant R is a tribute to design: the units run so well with so little.
But if most fossil units run well without a complete maintenance plan,
what's the upper performance limit? Would a more detailed plan bring
down performance and raise costs? What about other aspects of PM
performance, such as outages? My opinion is that more complete
strategies can raise production and lower costs.
Total PM performance
Some companies develop and execute maintenance strategies based
upon what I call total PM performance strategies. They aim to
enhance production and profitability goals by centering on facility
utilization. To ensure high production, profitability, and performance,
facility utilization rates must be heavy and planned. The key to such projects
is that maintenance must support operating and facility use plans, not
nance unflinchingly in the '80s.) In this case, it's called corrective
maintenance. Contract all repair work to specialists so that all that remains
are PMs, performed in-house as the total responsibility of the remaining
workforce. This could be a multidisciplinary group (electrician,
technician, and mechanic skills) supplemented with operations. This
approach clarifies, identifies, and prioritizes maintenance organization
work. The inherent conflict-of-interest between PMs and corrective
work for overtime, enjoyment, or other motivations is gone. PM is now
the only game in town. Selection of PM as core work is espoused by
some merchant generators.
Merchant co-generators tried this approach as an interim measure
because they lacked trained, skilled crafts. Onsite plant staff performed
all routine operations and light maintenance while outage and heavy
work was contracted out. Plant staff, clearly focused on the plant
condition, used CNM as their primary tool to identify, diagnose, prioritize,
and plan outages and restorations. It was effective!
Vendor Perspective
The vendor's dilemma is twofold. He must provide a good product
while generating sales. Ideally, he receives follow-up sales and service
calls (for training, service, parts, and so forth) for each customer. When
the client receives value and satisfaction from the equipment, the
vendor's interest is best served when a product has a finite life. His best
situation is technical, functional, or economic obsolescence before the end
of useful facility life occurs. The client retires the product in service to
buy another, unless the vendor can convince him to upgrade to something
better.
Vendors are also repositories for product development knowledge.
In the course of their work, they must identify, understand, and remove
design, production, and operating impediments that cause failures.
They generally retain this information, conveying it selectively to users.
Unfortunately, vendors can't provide complete failure data to
equipment owner/operators nor fully disclose product development and
applications, as they must protect competitive positions. They need to
exercise discretion in the event of legal action. In addition, plant tech-
nical staff must understand and translate the operating data they receive
from the vendor. Generally, users don't need or require details about
the product. They might be intimidated by too much information
(weak points and total costs), or might comparison shop, or be steered
towards a competitor. Lastly, and most important, vendors don't
have complete information on actual in-service failures and aging
performance. They cannot possibly understand all conceivable
environmental and aging factors, applications, and uses imposed by the users
and their environments.
So, our dilemma is that vendor information, while good, is incom-
plete. It generally gives a fair assessment of common failures and
expected maintenance needs for anticipated service applications over
an intended period of use but doesn't provide environmental or
applications information that summarizes a product's stretch capacity,
always the most exciting and challenging area for users. However,
vendors are always a first source to identify both expectations and
reported experiences with new products.
If you can connect with a vendor's engineering staff (assuming they
have one), you can resolve most questions with unpublished accounts
and experience for many product use applications and most failure his-
tory. Vendor engineers are more likely to offer critical failure informa-
tion over the phone than on paper.
Vendor recommendations
Vendor recommendations represent the best guide to maintenance
strategies that are appropriate for the equipment they offer. The
quality of vendor-recommended maintenance varies greatly. Some is truly
outstanding. Many vendors don't provide any information at all. The vast
majority provide useful but incomplete, or sometimes inaccurate, guidance.
At best, vendors provide a starting point, and so their O&M manuals,
sales literature, and drawings should be reviewed while developing any
maintenance strategy.
In a highly regulated environment, a vendor's guidance may carry
the force of law. If the vendor specifies that a certain filter must be
changed every three months, you must change it. Rarely, however, are
vendors so direct, or consistent with plant time measures, in their guid-
Failure Footprints
CMMS barriers
When performing ARCM, simple techniques can often significant-
ly improve analytical results. Typically, RCM analysis is highly abstract,
using esoteric, redundant, or even arcane statements. Many analysts
focus exclusively on expert interviews to flush out failure modes of
interest. While interviews are good, numbers tell the story, as we've seen.
In my experience, the difficulty most people have (engineers
included) is penetrating corporate maintenance management systems.
CMMSs are difficult to learn and harder to interpret. In the role of
maintenance manager, at the mercy of others to develop and interpret
CMMS reports, I finally forced myself to learn how they work. Having
waded through the process several additional times at several compa-
nies and plants, I highly recommend anyone involved with maintenance
not yet fluent with these systems, reports, and numbers to learn at least
one. If you want to evangelize, you have to learn the language.
quate. These two fields provide the most basic failure information
necessary for these studies. For CMMS users (particularly schedulers,
planners, and workers) the maintenance process flow model and CMMS
must complement one another. With the widespread application of file
transfer protocols (and products like Microsoft [MS] Excel), it's
relatively painless to export, sort, and reformat CMMS data for analysis
and presentation. (Day-to-day users are restricted by their unique
systems and the processes they support.)
Information displayed in a condensed report format (emphasizing
work initiation and completion fields) enables a quick review of
components by type. An RCM failure-sample survey (by system and by
component type) provides both failure types and the work performed.
Several thousand WO reports are representative for an average sys-
tem. Typically, this encompasses one to three years of WOs for a fossil
plant and at least 30 for a nuclear facility.
I read the reports, placing tick marks for each failure group, and
then identify dominant failure modes. Based on the time period under
review, I make some rough estimates of MTBF and MTTR. In this way,
I can knock off a relatively large system in a day or so. My research into
system problems comes next, and takes more time, but the CMMSs can
be very effective when I download and sort vast quantities of
information: exactly the uses CMMS promoters championed 20 years ago
(Table 6-1)!
These reports tell a story. They indicate aging-based failures, ran-
dom failures, and indeterminate areas that need more review. They
enable me to effectively structure questions for operators, maintenance
personnel, and engineering support personnel. Done well, these reports
can summarize failures in visual ways for storybook analysis and prob-
lem discussions. They provide a relevant basis for types of PM tasks and
their intervals (Fig. 6-2).
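The tick-mark review described above can be sketched in a few lines once WOs are exported from the CMMS. This is a rough illustration, not the author's procedure. The record fields, the sample data, and the simplifications (every corrective WO counts as one failure, the units run continuously, repair hours stand in for MTTR) are all assumptions.

```python
from collections import Counter

def summarize_failures(work_orders, n_units, period_hours):
    """Tally failure groups and make rough MTBF/MTTR estimates from corrective WOs."""
    tally = Counter(wo["failure_group"] for wo in work_orders)
    n_failures = len(work_orders)
    if n_failures == 0:
        return tally, float("inf"), 0.0
    mtbf = n_units * period_hours / n_failures           # unit-hours per failure
    mttr = sum(wo["repair_hours"] for wo in work_orders) / n_failures
    return tally, mtbf, mttr

# Hypothetical export: two feed pumps reviewed over two years
work_orders = [
    {"component": "BFP-1A", "failure_group": "seal leak", "repair_hours": 6.0},
    {"component": "BFP-1A", "failure_group": "bearing noise", "repair_hours": 14.0},
    {"component": "BFP-1B", "failure_group": "seal leak", "repair_hours": 5.0},
    {"component": "BFP-1A", "failure_group": "seal leak", "repair_hours": 7.0},
]
tally, mtbf, mttr = summarize_failures(work_orders, n_units=2, period_hours=2 * 8760)

print("Dominant failure mode:", tally.most_common(1)[0])
print(f"Rough MTBF: {mtbf:.0f} h, rough MTTR: {mttr:.1f} h")
```

Sorting the tally shows the dominant failure modes at a glance, which is the starting point for structuring questions to operators and maintenance staff.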
ways and ladders, which you need to do for safety. One maintains
current plant drawings: no engineering job is complete until the drawings
are revised. The other never updates any post-construction drawings.
One plant finds time and money to maintain little extras, like ventilation
and air conditioning. The other can't seem to keep them up, except for
the administrative offices.
Each plant's policies offer powerful cues about management
expectations and equipment standards. The commitment to develop an O&M
plan, in contrast to a "catch as catch can" approach, delivers greater
confidence. Developing and maintaining standards takes fortitude. A line
is drawn in the sand. If you skip back and forth across the line too many
times, it becomes indistinct. Companies that set high standards out-per-
form those that do not, as reported in business books.
OTF in RCM
OTF simply means no scheduled maintenance tasks. The terms
"run to failure" and "operate to failure" are similar but generate
negative interpretations, and that's the least of our problems.
Regulators have come to expect that everything has a planned
maintenance program, and it must be understood that OTF is a planned
maintenance program. The plan is "no scheduled maintenance," like the
mathematical null solution. The work elements and failure modes can
still be virtually complete. Some nuclear plants must document their
null maintenance plan, literally. The inherent robustness of design cited
in the RCM classic by Nowlan and Heap is, largely, beyond the grasp of
the general population, regulators, and particularly the media. Those
not versed in RCM (and most people aren't) simply don't understand
this distinction or its basis, nor do they need to.
For the NSM option, there is no time-based WO to kick out, but
operating staff and those in the plant must identify and respond to
symptoms. NSM programs depend on their personal and informal diag-
nostic skill and knowledge though the tasks are non-specific or not
scheduled. This also illustrates why RCM-based, non-specific tasks
need to be ruthlessly purged from the CMMS task list. For operators,
knowing the plan depends on their initial condition assessment.
Removing redundant CMMS tasks encourages more thoroughness in
ules should occur only very infrequently. Routine, repetitive jobs need
standard corrective plans developed, planned, and shelved to work
on demand. Sophistication improves as people gain insights into plant
design depth, a depth developed with routine ARCM applications.
The term self-identifying maintenance also sheds light on both
NSM and OTF. The bottom line of that plan is that we don't schedule
formal tasks to perform maintenance, nothing more or less.
Equipment maintenance needs are self-identified by the equipment.
This is acceptable because of inherent R, the absence of known,
effective age-based PM measures, and limited consequences of failure. No
one in their right mind would do any maintenance to a car headlamp
other than replace it upon failure. On failure, however, it's very
important to replace it promptly. It's the same concept.
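The conditions under which self-identifying maintenance is acceptable can be written as a simple screen. The sketch below is an illustration only. The field names are hypothetical, each condition is reduced to a yes/no judgment an analyst would have to supply, and inherent reliability is not modeled separately.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    failure_is_evident: bool     # the equipment "self-identifies" its failure
    effective_pm_known: bool     # a known, effective age-based PM task exists
    consequences_limited: bool   # failure has no safety impact and tolerable cost

def nsm_candidate(item: Item) -> bool:
    """True when no scheduled maintenance looks acceptable for the item."""
    return (item.failure_is_evident
            and not item.effective_pm_known
            and item.consequences_limited)

headlamp = Item("car headlamp", failure_is_evident=True,
                effective_pm_known=False, consequences_limited=True)
relief_valve = Item("safety relief valve", failure_is_evident=False,
                    effective_pm_known=True, consequences_limited=False)

for item in (headlamp, relief_valve):
    print(item.name, "-> NSM candidate:", nsm_candidate(item))
```

An item that passes the screen still carries a plan: replace promptly on failure.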
Legitimate failure
Actual failures occur when we violate performance standards. If
standards are established, it's unnecessary to discuss failure criteria. In
their absence we have nothing but a discussion about what constitutes
a failure, a discussion beneficial only to the degree that it leads to
common failure definitions. The exercise is pointless except to develop
failure standards. Without standards, people cannot agree on what is
important and what is failed. For better or worse, nuclear plants have
many guidance standards in place. Failures are well known. For fossil
plants, the concept is abstract. Failure tends to follow a free-form defi-
nition, literally linked to a primary function failure. At plants with spec-
ifications, failures are specification-based and incremental.
An RCM-based approach to failure definition forces people to
think about goals and limits, which in turn leads to earlier action. Goals
can be obscure to operating staffs. Obvious limits on measured vari-
ables such as opacity and emissions, material thickness and production,
can be missed. It could be argued that exceeded specifications and lim-
its are in themselves arbitrary failures, since in most cases violation does
not cause sudden, discrete events. Rather, an engineering limit has been
exceeded. Real failure comes some time later and with continued loss of
margin. Design specifications have margins that, if properly
followed, safeguard us from real proximate failure areas. Real
proximate failure consequences include:
major accidents
excessive accident rates
excessive environmental releases
Complexity in failures
Failures have contexts, simple or complex. Simple failures are
easier to manage, so we naturally prefer them to complex incidents.
Complex failures involve multiple failed items and interactions,
interactions that make them hard to diagnose. Multiple hidden failures are
harder to recognize, interpret, and correct. These include multiple coin-
Bootstrapping
ARCM provides us with many ways to approach plant operations
improvement. Each can add value quickly. Once specific equipment
and subsystem analysis is complete, results can be transferred to similar
units with little effort. Applying previous learning to similar units and
systems without any formal detailed analysis is a form of bootstrap-
ping.
Contrary to TRCM analysis, power plant design is highly standard-
ized. All steam turbines use a Rankine cycle, for example. Suppliers are
the same in a given region of the world. Large generating facilities share
considerable standardization of equipment, systems, and layouts. A few
configurations have been developed and see many repetitions. Even
informal standards have proven their utility over time. Thus, once a
basic repertoire of RCM equipment and systems understanding is in
place, it forms the core for many common applications.
Systems. Common system configurations abound. In fossil plants,
feedwater, condensate, sootblowing air, and circulating water are very
similar from unit to unit. In nuclear plants, the GE BWR and the
Combustion Engineering (CE), Westinghouse, and Babcock & Wilcox
(B&W) PWRs share common design elements. This supports stan-
dardized RCM analysis.
Occasionally new systems are integrated into traditional ones, and
this requires system re-analysis.
Equipment. Most common equipment in power generating facili-
ties is supplied by two or three primary suppliers. Even where there are
many suppliers (as in the cases of valves and motors) there's so much
Critical?
Traditional RCM definition
Traditionally, commercial air transport RCM reserves the term
"critical" for those failures that have an immediate and direct safety impact.
"A critical failure is any failure that could have a direct effect on
safety." (Nowlan & Heap) Note the word "direct" imposes specific qualifying
criteria not imposed elsewhere. This qualification excludes failures that
aren't immediately evident, or weren't single failures. For non-evident
failures, the absence of evidence means there is no direct impact on
safety. The "immediate" requirement screens out multiple,
train-redundant failures.
The first major permutation of RCM's definition of "critical"
occurred in the transition from aerospace to nuclear power. NRC
definitions for what are now called essential components (those whose
failure could affect fission product environmental releases) gave
"critical" a new dimension. Since many nuclear components occupy this
category, critical applications grew by default. The tendency to associate
the term with specific components, rather than failures, compounded
confusion.
The final application of the term "critical" to non-nuclear units
produced a flow process that divided analysis into critical and
non-critical. Dividing components in this way (a la nuclear) proved confusing as
fossil plants struggled to abandon their historical critical interpreta-
Casual use
Many of us occasionally use "critical" casually. Those SRCM
methods that divide equipment into two broad categories, critical and
non-critical, depend upon the equipment involved; the category determines
whether the review is a thorough failure review or a
"quick and dirty" (cost-based) sanity check. After evaluating thousands
of components, I've found that the RCM process that identifies
components for PM is not that important. Most analysts can quickly
determine a component's suitability for PM, and the likelihood the PM is
effective. Whether a non-technical person can follow their analysis is
another matter! If your interest is solely the final product, an effective
PM program, you probably don't need the extra information. If you
must maintain it, you do!
One plant in which I worked developed an automated prioritization
CMMS that identified all equipment as critical (or non-critical) and elec-
tronically pre-assigned priority to the equipment WO. This system ulti-
mately produced a disproportionate number of critical, high-priority,
work-today WOs. Because no one had the inclination to override the
default rank and rank any priority low, the predictable result was that
virtually everything (except PMs) was "critical." This was not only
not useful, but effectively eliminated meaningful priority.
The primary work prioritization/screening tool depended on two
attributes: is it an emergency, or deferrable? The tasks most likely to relieve
the workload long term, PMs, rarely made the cut.
The primary purpose of a priority system is to rank the importance
of work, quickly. If there is no discernible priority attribute, or it's
skewed, then the system, no matter how ingenious at the software level,
has little value. This system had no value. When truly critical equip-
ment fails, it causes unsafe conditions. Economically critical equip-
ment failures cause unit outages or other obvious design-intent and sys-
tem-function failures that relate to bottom lines, not safety.
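A skewed priority system like the one described above is easy to detect from the WO backlog itself. The following is a minimal sketch under stated assumptions: the `priority` field, the sample backlog, and the 25% ceiling for top-priority work are all hypothetical choices for illustration, not a standard.

```python
from collections import Counter

def priority_shares(work_orders):
    """Share of the backlog carried by each priority code."""
    counts = Counter(wo["priority"] for wo in work_orders)
    total = sum(counts.values())
    return {priority: n / total for priority, n in counts.items()}

def is_skewed(shares, top_priority="critical", max_share=0.25):
    """Flag a backlog where the top priority has lost its ranking power."""
    return shares.get(top_priority, 0.0) > max_share

# Hypothetical backlog: defaults left everything but a PM and one job "critical"
backlog = (
    [{"id": i, "priority": "critical"} for i in range(8)]
    + [{"id": 8, "priority": "routine"}, {"id": 9, "priority": "pm"}]
)
shares = priority_shares(backlog)
print(shares)
print("Priority field skewed:", is_skewed(shares))
```

When most of the backlog sits at the top rank, the field no longer ranks anything, which is the failure the text describes.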
Truly critical failure modes. People don't think in terms of failure
modes. Understanding failures does not come easily. Like any skill, it
takes years of experience and missed calls to get right, most of the
time. Systems understanding comes first. Many people in responsible
positions haven't had the time to hone this skill but are charged with
understanding plant risk and making safe, effective operate/shutdown
decisions. Such managers rarely make good candidates for failure
engineers: they like black and white cases, and clear-cut calls. (And,
indeed, someone has to make a call.)
It's extremely important to clarify distinctions among safety,
production, and economics, to best allocate scheduled maintenance
resources. After 15 years of discussion and analysis, I believe that
producers are inherently safer than non-producers. Plants that operate, just
like cars that put on miles with no breakdowns, have to be in good
hands to be able to do so. It's rare to find top performers that don't also
put operational safety at the top of the list.
Just as equipment groups are similar, so unique (different)
critical failure modes are relatively scarce. What they share in common is
the fact that most equipment potentially presents life-threatening failure
modes, eventually. When we buy it, get good service out of it, and
experience its performance deterioration so that legions of engineers have to
scratch their heads arguing over whether the time for overhaul (or
replacement, as the case may be) has come, that's the way we like it!
Operations at this point wants a new one (pump, compressor, valves,
belt, whatever) but understands the old beast well and so gets more
mileage from it.
Wearout. Wearout is the desired end-state for every component in
an operating organization. When equipment is worn out, evenly and
well, meeting a manufacturer's promised life, it represents an ideal. It's
a matter of gradual, predictable performance loss, providing lots of
latitude to schedule replacement and manage risks by scheduled
maintenance. Ideally, turbines age this way between overhauls, as full turbine
load gradually trends towards valves wide open (VWO) position.
Centrifugal pumps show gradual loss of head, as the rotating elements
deteriorate and seals wear.
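Gradual, predictable loss is what makes wearout schedulable. As a rough sketch of that idea, the code below fits a straight line to periodic pump head readings and extrapolates to an action limit. The readings, the 90% limit, and the linear trend itself are illustrative assumptions; real degradation may not be linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def months_until_limit(months, head_pct, limit_pct):
    """Months from the last reading until the fitted trend reaches the limit."""
    slope, intercept = fit_line(months, head_pct)
    if slope >= 0:
        return None  # no downward trend, nothing to schedule from this data
    crossing = (limit_pct - intercept) / slope
    return max(0.0, crossing - months[-1])

# Hypothetical test readings: pump head as a percentage of the as-new value
months = [0, 6, 12, 18, 24]
head_pct = [100.0, 99.0, 98.0, 97.0, 96.0]
remaining = months_until_limit(months, head_pct, limit_pct=90.0)
print(f"Estimated months until the 90% limit: {remaining:.0f}")
```

The value of such a trend is the lead time it gives to plan the replacement or overhaul into a scheduled outage.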
Importance
Given that "critical" will probably endure as a named condition,
though in confusing permutations, let's consider one more attempt to
clarify and simplify its use.
Safety and cost drive all PM while safety and economics drive all
plant operations. Ignoring economics fails to adequately address our
sole operations purpose. For this reason, "economically critical" is an
acceptable concept, provided we restrict its application to failures. The
primary consequence of most safety equipment failures is operational;
we must terminate operations to address a key safety function. Using
our functional definition of failures, we could agree to use another term
for equipment classification based upon economics, then reserve the
term critical for safety functions and their failures.
We would then have two classes of equipment, important and
non-important, and a sole criterion for classification: whether we plan to
consider scheduled maintenance for the item or not. We could just as
well identify these as scheduled maintenance and non-scheduled
maintenance. Once past this barrier, we can review equipment for
applicable and effective PM tasks.
Practically, a reviewer looks at the following identifiers to discern
equipment size
failure reporting frequency
work frequency (including PM)
vendor recommendations
general industry practice
shop practice
equipment register
costs.
Work frequency. Systems that are costly to operate and maintain
demand that much extra work, and they require PM programs, WOs, or
both. Outage and contracted work not captured in plant WO systems
should be reviewed for scope that reflects maintenance tasks. Acid
tests for large work scope should be considered: Is a basis
established? Does the plant know why the work is done, and what its
performance benefits are? Occasionally, capital modification work
performed in off-budget areas is charged as a maintenance expense,
especially when engineering groups perform maintenance support
functions. This skews costs.
Vendor recommendations. Vendor literature should be reviewed to
assure that vendor-identified work has been assessed. Vendors
generally understand their equipment's economics better than anyone
else. They also appreciate safety considerations, although they may
not understand specific applications. Vendor recommendations often
turn up interesting insights (and oversights) that influence large
equipment costs. Small equipment that lacks obvious, integrated
functions is usually NSM by definition, but reviewing this against the
vendor recommendations can identify tasks the vendor thought were cost
effective to perform, even for generic equipment.
Comparing recommendations from similar vendors is an effective
way to establish appropriate tasks and performance intervals. Many
times upgrades and enhancements influence maintenance frequency; a
superior synthetic lubricant that extends a lubrication performance
interval, for instance.
Industry practice. Benchmarking equipment for standard industry
practice is another effective way to establish appropriate levels of effort.
Comparing one industry to another, when both run the same equip-
ment, can provide insights. For example, how mines handle coal in dif-
ferent locations supports cost-effective improvements for utility coal
handling operations. As with all benchmarking, understanding the
methods and practices behind the numbers is essential to making
appropriate choices.
Shop practice. Every shop develops techniques to manage work
performance. Often some of them are unique and effective tasks that
Area Checks
When operations personnel perform area checks, they're engaged
in NSM. Area checks require operator rounds to check the condition
of equipment installed in the plant. This was the original intent of
the hourly round.
In fossil generation, an operator's complete round can take two
hours or more, even at a brisk pace. And brisk is not the point; you
must slow down to read instruments. Often, lighting and cleanliness
can make equipment monitoring additionally time consuming (especially
when gauges are dusted in coal or oil mist). When I identify hourly
rounds sheets (except for the control room), I'm immediately skeptical
of their effectiveness and applicability.
Yet, rounds are important. In complex plants, many failures are
random, and actual system functional failures are rare. Because of
this, it's essential that the operator on a round identifies failing
(and failed) equipment. A log entry and a CMMS trouble report are
techniques to identify failures. After a complete ARCM review at a
nuclear plant (reviewing approximately 100,000 components), we found
that the overwhelming default PM activity, numerically, was NSM. This
plant had
An area check is a general survey that integrates the senses and
non-specifically identifies failures. Like functional tests at the
system level, area checks integrate and pick up major failure
indicators. They're like the "area check" that driver education
courses suggest you perform before you get in your car and drive away.
Airline personnel make them every time a plane takes off, confirming
the absence of functional problems for various basic, yet critical
components. They also enable the quick discovery of problems that
could have serious consequences over time.
Area checks in plants are also cost-effective ways to identify random
and general deterioration failures. Clean, well-maintained equipment
provides unambiguous results. If standards drop, however, and dirty
Instrumentation
The typical plant has a tremendous amount of instrumentation and
control equipment. Much of it was installed or packaged with equipment
skids, provided by the manufacturer/assembler. Many instruments have
as their primary function the setup and performance of pre-operational
or operational testing (lube oil skids, for example). Equipment
suppliers often provide remote panels for pre-operational testing and
operations. A typical plant's equipment, augmented by skid I&C,
provides so much instrumentation that to tackle TBM of all of it would
be a Herculean task!
Major process control loops feed a plant's DCS. In modern plants,
this is done through "drops." The typical DCS runs two redundant
independent buses with self-checking diagnostics and the capability to
swap buses, should a problem occur. Each has a fully redundant backup
with the same capability. These robust systems haven't been the focus
of a detailed RCM assessment by many clients, and I&C technicians and
engineers, by and large, effectively maintain DCS controls. This
suggests low value-added benefit here at the typical plant
installation. Risk management, maintaining redundancy, and depth have
thus far been very effective. However, there are I&C opportunities.
The first is to select and identify candidates for NSM. I&C PMs
include cals, channel checks (CCs), and functional tests (FTs).
Self-diagnostic equipment can reduce or eliminate the need to perform
FT. Self-calibration routines can eliminate the need to calibrate.
Typically, a trouble alarm sounds if a plant DCS loses a drop or
channel. Periodic checking of the channel alarm is all that's required.
It's tempting to feed every instrumentation point in a plant into a
new DCS during an upgrade project. However, large fossil units could
have 5,000 points fed into the DCS. For important equipment (like
boiler feedpumps), this enables dropping more points than original
plant data logging could accommodate. More information is available
on-line than previously available, or available locally at the
feedpump skid (such as local hydraulic and control oil pressure and
temperature), but the downside is that we must maintain instruments,
including the low value instruments. Points fed into the DCS need
conservative selection with an operating monitoring strategy in mind.
Selection should not replicate an existing monitoring strategy.
Extraneous nice to have
Spurious alarms
Consider the new car whose seat belt monitor tells you to buckle
up, over and over, as you cruise down the highway. Most people can
take about five minutes of that before they pull the plug, assuming
they can find it. While searching for it, they're a safety hazard
(unless they pull over). Nuisance alarms are more than a nuisance;
they're a distraction and a potential safety problem.
In the context of RCM instrument function (providing a clear,
unambiguous picture), a nuisance alarm is a failed alarm! Critical
alarms should be corrected. If non-critical (e.g., you can safely
tolerate their absence for long periods), remove them.
Screening I&C calibration intervals for extension based on drift
experience and importance can reduce and simplify I&C work. In fossil
plants, calibration frequency can be adjusted (extended) by installing
new controls, but it should be considered in all plants. Many times,
overly conservative intervals persist long after they've been
identified. Excepting very old pneumatic control devices, there are
many opportunities to calibrate less often. Newer sensors and
instruments can reduce maintenance requirements by significantly
reducing drift. Reducing efforts expended on non-essential control
loop calibration enables more consistent focus on key control loops
and essential alarms.
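A drift-based screen of this kind can be sketched in a few lines. This is only an illustration under stated assumptions, not a plant procedure: the record fields, the 50% drift threshold, the doubling factor, and the three-calibration minimum history are all hypothetical values chosen for the example, and important instruments are simply left at their current interval for separate review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CalRecord:
    tag: str                      # instrument tag (hypothetical naming)
    important: bool               # key control loop or essential alarm
    interval_months: int          # current calibration interval
    tolerance: float              # allowed as-found error, % of span
    as_found_errors: List[float]  # as-found errors from past cals, % of span


def propose_interval(rec, drift_fraction=0.5, factor=2, min_history=3):
    """Return a proposed interval in months.

    The interval is extended only for non-important instruments whose
    worst recorded as-found error stayed within drift_fraction of the
    tolerance over at least min_history calibrations (assumed rule).
    """
    if rec.important or len(rec.as_found_errors) < min_history:
        return rec.interval_months
    worst = max(abs(e) for e in rec.as_found_errors)
    if worst <= drift_fraction * rec.tolerance:
        return rec.interval_months * factor
    return rec.interval_months


if __name__ == "__main__":
    gauge = CalRecord("PT-101", False, 12, 1.0, [0.1, -0.2, 0.3])
    print(gauge.tag, "->", propose_interval(gauge), "months")
```

The point of the sketch is that the screen needs only two inputs the plant already has: as-found calibration data and an importance classification.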
Critical instruments
I qualify my reservations about using the term "critical" with one
exception: instrumentation. The reason for this is very simple. Key,
essential safety and monitoring control instrumentation really has one
single function: to provide operators with an unambiguous window on
the plant world. When this is not the case, the instrumentation has
failed. For this instrumentation alone, because its sole function is
unambiguous safety information, its failure alone is enough to justify
a plant trip.
It's like the driver with broken windshield wipers: if you can't see,
it's hard to proceed safely. Or a train with no clear signals. To
proceed with no window on the world's critical features violates the
basic precepts of safety. It's like flying blind. This is why I
classify essential monitoring and control equipment as "critical."
Generally, if there is an active control loop, its role is already
captured as important. No control, no operations. This interpretation
is largely limited to I&C with safety status functions, and is
consistent with Nowlan and Heap.
Note that the vast majority of instrumentation doesn't meet these
criteria; the share that does is well under 1%, and maybe 0.1%. And
we're not talking about a little drift here or there in an operating
event. Although drift also has limits and boundaries, critical
instrumentation that has drifted out of range is failed.
The I&C equipment spectrum extends from non-essential to convenient
controls to generation control loops to safety I&C. Equipment not
directly supporting generation control provides service, convenience,
time-savings, or another support function. It is non-critical.
machine missile
mechanical faults
Redundancy
Costs and layers
Where redundancy cost can be managed, redundancy adds value.
Large mechanical and electrical equipment costs are substantial. In the
absence of cost containment emphasis, costs quickly escalate. With
generation simply another competitive product, uneconomic costs are
an added burden.
Many American families have a spare car (some two). Its value is
mobility when the primary vehicle breaks down or goes into the shop.
There is a cost to maintain a spare. Typically it's less than a primary
vehicle because operation is limited, but there are fixed costs.
Consider the pros and cons of a spare vehicle. Space, time-cost, and
other less-obvious costs are incurred to have one. There are fixed and
variable costs, all of which are endured for the assurance and
convenience of the spare. Cost-savings are achievable if the R of one
car is high enough to eliminate the spare. Herein lies the problem of
redundancy: How much, before too much becomes a cost and
organizational burden?
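The spare-car trade-off can be put into rough numbers. The sketch below is illustrative only. It assumes that R can be read as the fraction of days a vehicle is available, that the two vehicles fail independently, and that each unavailable day carries a fixed cost; the function names and dollar figures are hypothetical.

```python
def availability_with_spare(r_primary, r_spare):
    """Probability that at least one of two independent vehicles is available."""
    return 1.0 - (1.0 - r_primary) * (1.0 - r_spare)


def spare_is_worth_it(r_primary, r_spare, annual_spare_cost,
                      cost_per_unavailable_day, days=365):
    """True if the avoided loss from added availability exceeds the spare's annual cost."""
    gain = availability_with_spare(r_primary, r_spare) - r_primary
    avoided_loss = gain * days * cost_per_unavailable_day
    return avoided_loss > annual_spare_cost


if __name__ == "__main__":
    # A very reliable primary car leaves little for the spare to add.
    print(spare_is_worth_it(0.99, 0.90, 3000, 100))
    # A less reliable primary car makes the same spare pay for itself.
    print(spare_is_worth_it(0.90, 0.90, 3000, 100))
```

Under these assumptions the conclusion matches the text: once the R of the primary unit is high enough, the redundant unit's fixed costs outweigh what it buys.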
A Midwestern utility developed a system-wide blackout recov-
Oh my God!
Rare failures occur with random predictability. When a plant
experiences a rare event, the loss is sobering. We hope that equipment
damage is the only consequence but occasionally, people are hurt.
Major equipment losses are rare, averaging less than one per year per
unit (based on NERC, my own experience, and other statistics) even at
functionally run-to-failure facilities. Individual, rare failure
events occur several times in the 40-year-plus life of a unit.
Scarcity is their problem: statistically, they're like auto speeding;
many separate events are required before an event registers. But
plants with events account disproportionately for overall losses.
Precursor events are risk factors. Control them, and you have made a
substantial impact on practical risk.
For large equipment event protection, instrumentation extends
the human senses for failure modes (and events) that otherwise can't
be detected. Fossil units are at a significant disadvantage to their
nuclear
Case Example
In 1998, a New Zealand transmission/distribution utility ran into
the worst utility nightmare: an inability to supply loads to a major
load center's customers. Most primary feeders to the downtown district
in Auckland were lost. Personnel were unable to restore service in a
timely manner. New overhead catenary had to be strung as an emergency
measure, requiring approximately five weeks but ending the crisis. The
event involved the predictable loss of two aging, deteriorated feeders
and the additional loss of two more in rapid succession. A few facts
bear scrutiny:
There were other issues. When the final failures developed, the util-
ity acted slowly to save their remaining good cables. The will within the
company to challenge its organizational path was absent. (What
chapter 7 255-300.qxd 3/3/00 2:52 PM Page 255
Chapter 7
Fast Track
from load reductions, extended outages, and unit trips translate into
millions in lost sales. This can be tolerated in the regulated
environment with capital investment guaranteed returns. In competitive
industries (refining, chemicals, fibers, process, and manufacturing),
maintenance losses cannot be tolerated.
Culturally, maintenance is an unglamorous backshop where status
quo has been accepted. It hasn't achieved recognition as a strategic
function supporting production. Traditional accounting treats
maintenance as a variable cost of production. It is required simply to
keep a facility in operating condition. But if investment in
production is necessary to support revenues, then maintenance is a
viable production investment. Industries squeezed by cash flow have
tried to cut maintenance but have discovered their competitive
position erodes as production capacity, services, and processes
decline.
CMMS Strategy
To implement any maintenance plan, there must be a strategy
developed on the plant's CMMS. Most CMMSs use a PM/CM work model
even if they use an RCM maintenance philosophy. Work originates on
a CMMS as a timed event or work request. Both are internally generated
and correspond to routine (pre-scheduled, timed event) and response
(requested, demand event) WOs. Pre-developed, pre-requested, timed
WOs are called PM. Everything else traditionally is CM. This includes
on-demand maintenance we prefer not to develop as routine,
pre-scheduled work.
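The PM/CM work model described above reduces to a one-line rule on how a WO originated. The sketch below only illustrates that rule; the `Origin` enum and function names are hypothetical and do not correspond to any particular CMMS product.

```python
from collections import Counter
from enum import Enum


class Origin(Enum):
    TIMED_EVENT = "timed event"    # routine, pre-scheduled WO
    WORK_REQUEST = "work request"  # response, demand WO


def traditional_label(origin):
    """Label a WO as the PM/CM model does: timed WOs are PM, everything else is CM."""
    return "PM" if origin is Origin.TIMED_EVENT else "CM"


def tally(origins):
    """Count WOs by their traditional PM/CM label."""
    return Counter(traditional_label(o) for o in origins)


if __name__ == "__main__":
    backlog = [Origin.TIMED_EVENT, Origin.WORK_REQUEST, Origin.WORK_REQUEST]
    print(tally(backlog))
```

Note that the label depends only on how the work entered the system, not on whether the work itself is preventive in nature.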
For instance, work is developed from scratch lists kept by engineers
and planners and put into CMMSs several months, weeks, or even days
prior to the scheduled outage as CM demand work requests. (Such
lists may have been maintained for years in hard copy despite CMMS
availability and capability.) This work looks, acts, and gets identified as
CM when in fact, most outage work is preventive in nature. Equipment
is operated into an outage; work is intentionally deferred into the
scheduled work period. When outage lists are prepared as WOs at the
last moment, potentially plannable/planned work becomes demand
work, and the benefits of planning-standardization, coordination, repet-
Key:
PM -- Preventive Maintenance
CM -- Corrective Maintenance
TBM -- Time Based Maintenance
CDM -- Condition-Directed Maintenance
OCM -- On-Condition Maintenance
OCMFF -- (OCM) Failure Finding
NSM -- No Scheduled Maintenance
CNM -- Condition Monitoring
Figure 7-1: Maintenance Terms Map
existing software and processes and then perform high value analysis
focused on implementation. (Fig. 7-1) As this occurs, the organization
develops a new maintenance paradigm focused on scheduled mainte-
nance.
When an organization acquires another CMMS they are forced to
adopt a new maintenance model. The selection of new CMMS software
facilitates the transition. This time is an opportunity to introduce
ARCM-based organization, planning, and scheduling methods.
Maintenance Infrastructure
Performing RCM requires a maintenance infrastructure, just as per-
forming planned work does. Someone needs the skills, time, and com-
mitment to do the work. The organization must have the confidence to
use the results. Processes and systems grow slowly, with nurturing care.
Even with focus, commitment, and expert help, learning is required.
The work force grasps most RCM concepts quickly, once they perceive
an organizational commitment to improve skills and manage costs. This
is infrastructure and it takes a dedicated period to develop.
In some instances, infrastructure development requires new
capabilities and measures. In others it requires processes: getting PM
WO change control processes, and creating PM owner responsibility.
Building infrastructure (awareness and sensitivity to a maintenance
plan) requires time and nurturing.
Traditional PM Programs
Consider VM, a traditional PM program. Immediate payback
comes from screening VM to limit and control scope. Only a few plant
areas benefit cost-effectively from VM. Although this might at first
seem like a complex task, it's by no means that hard, particularly with
several benchmark VM cost/benefit studies. Developing and applying
benchmark cases can quickly establish where VM will be beneficial.
Using this template to quickly screen all equipment for VM can
eliminate large amounts of non-productive effort for better PM
paybacks elsewhere.
Is a PM list maintained?
What's the percentage completion rate of PMs on the list?
Who gets PM completion rate reports?
Who decides how to defer PMs?
Is there engineering responsibility for PM selection?
Is there an exception report for overdue PMs?
How are PM priorities ranked with regard to other work?
How are outage PMs maintained?
What is the process to add or remove PMs from the list?
Are the PM basic processes defined?
Who is responsible to maintain the station's PM program?
How does the PM program integrate (or fail to) with the CM program?
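Two of the questions above, the PM completion rate and the overdue exception report, are simple calculations once a PM list exists. The following is a minimal sketch assuming a hypothetical `PMTask` record with a due date and an optional completion date; real CMMS exports will use different field names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PMTask:
    pm_id: str                        # hypothetical PM identifier
    due: date                         # scheduled due date
    completed: Optional[date] = None  # None means the PM is still open


def completion_rate(tasks, as_of):
    """Percent of PMs due on or before as_of that were completed by as_of."""
    due = [t for t in tasks if t.due <= as_of]
    if not due:
        return 100.0
    done = [t for t in due if t.completed is not None and t.completed <= as_of]
    return 100.0 * len(done) / len(due)


def overdue_report(tasks, as_of):
    """PMs past their due date and not completed by as_of, oldest first."""
    late = [t for t in tasks
            if t.due < as_of and (t.completed is None or t.completed > as_of)]
    return sorted(late, key=lambda t: t.due)


if __name__ == "__main__":
    pms = [
        PMTask("A", date(2000, 1, 1), date(2000, 1, 2)),
        PMTask("B", date(2000, 1, 10)),
        PMTask("C", date(2000, 3, 1)),
    ]
    today = date(2000, 2, 1)
    print(completion_rate(pms, today))
    print([t.pm_id for t in overdue_report(pms, today)])
```

Neither number answers the ownership questions on the list, but a plant that cannot produce them has effectively answered the first question already.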
Many plants run "random" PM programs. That is, they have laundry
lists of things to do as time becomes available. PM task selection,
performance, and reporting are hit-or-miss. Unfortunately, it's
virtually certain that plants with this PM approach will suffer R and
availability losses. There is simply no credibility to this PM
approach in a complex
Scheduling
Once an analysis for scheduled maintenance is complete, the
realities of making work happen take over. Complex plants struggle to
inefficiently planned
hard to coordinate
indeterminate or questionable value
not ranked by some common value scale
priority scale. The cost/benefit values of these PMs are much higher
than can be gained by fixing broken equipment. From a
return-on-investment perspective, returns on the most effective PM
tasks range from 50 to 200 times cost. Fixing broken equipment, on the
other hand, has no improvement return. It merely restores status quo.
From an investment perspective, then, which is more important: a 1/1,
5/1, or 50/1 benefit? Common plant priority systems are structured as
shown below.
Priority   Meaning
E          Work immediately
1          Work next day
2          Work next week
3          Work when convenient
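The investment argument above suggests an alternative ordering: keep emergencies first, then rank everything else by benefit-to-cost ratio. The sketch below is a hypothetical illustration of that idea, not a scheme taken from any plant's priority system; the `WorkItem` fields and the example figures are assumed.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    name: str
    cost: float              # estimated cost to perform (must be > 0)
    benefit: float           # estimated value of the loss avoided
    emergency: bool = False  # priority "E" work always goes first


def rank(items):
    """Order work: emergencies first, then highest benefit/cost ratio."""
    return sorted(items, key=lambda w: (not w.emergency, -(w.benefit / w.cost)))


if __name__ == "__main__":
    backlog = [
        WorkItem("repair broken pump", 1000, 1000),    # 1/1: restores status quo
        WorkItem("lubrication PM", 100, 5000),         # 50/1
        WorkItem("unit trip response", 500, 500, True),
    ]
    for item in rank(backlog):
        print(item.name)
```

With the figures shown, the 50/1 PM moves ahead of the 1/1 repair, which is the outcome a calendar-only priority code cannot express.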
operator monitoring
calibrated instrumentation
maintenance work turnaround capacity
Scheduling Methods
Expedite
Traditional maintenance environments depend heavily on day-to-
day expedited work. With greater coordination required by the NRC,
nuclear generation has developed more routine scheduling methods.
They also have more detailed PM programs, and generally, a much high-
er degree of PM program implementation.
An operate-to-failure (OTF) perspective simplifies scheduling.
Priorities are more obvious with failed equipment. Absence of daily
and weekly generation look-ahead scheduling reflects this acceptance,
and the inherent R of fossil designs. There are fewer imposed
"engineering specification" failures to contend with (in contrast to
nuclear). There is capacity in the traditional large generating
station for operations with compromised equipment, too. As a choice,
OTF offers greater opportunity to perform real-time maintenance on
demand, as needed. So long as functional failures don't compromise
production, and costs are managed, the option to use OTF is a powerful
one. It also requires
Long term
Except for unit outages, most plants have short windows, and short
memories. A 12-week schedule fills the middle ground between the
weekly look-ahead and the outage schedule.
The 12-week schedule was derived from surveillance tests at nuclear
deteriorates much slower than you'd ever expect, unless you had
measured it. Vendors specify common lubricants, for instance, whereas
premium ones increase lifetimes. Vendor information and ARCM can
jointly identify, select, and perform the right PMs, but we need both.
Work grouping should be intuitive to optimize performance.
Minimizing overall work (the number of times an area must be entered,
a tag out hung, or an area cleaned) is one example of why project
management is effective. Power plants often suffer the frustration of
work not coordinated, equipment not cleared when crews are ready to
work, or other crews working when job site space is limited. Work
conflict is a very real and persistent problem. Re-entering areas for
equipment rework is another time-wasting frustration. The fewer
surprises, the fewer oversights, the fewer items falling in cracks,
the less rework there is.
Organizations that measure rework are often surprised at its level as
a percentage of the total. Studies I've developed and seen have
indicated rework approaching 50% of all WO hours at some plants.
Unless it's measured, you'll never know the loss exactly. If we accept
that rework is important to manage, then there's substantial
opportunity to reduce this wasted effort.
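Measuring rework as a share of WO hours takes very little once rework is flagged on the WO. The sketch below assumes each WO is reduced to a hypothetical `(hours, is_rework)` pair; how a plant decides that a WO counts as rework is outside the example.

```python
def rework_fraction(work_orders):
    """Fraction of total WO hours spent on rework.

    work_orders is an iterable of (hours, is_rework) pairs, an assumed
    simplification of whatever the CMMS actually records.
    """
    orders = list(work_orders)
    total = sum(hours for hours, _ in orders)
    if total == 0:
        return 0.0
    rework = sum(hours for hours, is_rework in orders if is_rework)
    return rework / total


if __name__ == "__main__":
    sample = [(8, False), (4, True), (4, True)]
    print(f"{rework_fraction(sample):.0%} of WO hours were rework")
```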
EGs are very effective at doing this. EGs (each a logical assembly of
equipment identified and scheduled for work as a unit) vary in their
group basis but commonly include:
single tag out and return
standard tag out boundaries
one clearance (including draining or otherwise prepping)
one post-maintenance test and calibration for all work done
multiple tasks performance while in the area
coordinated scheduling of all items in the group to reduce volume of work items
establishment of standard PM work plans and schedule intervals for major trains
coordination of work within the group
coordination of LCO type license, safety, and insurance compensatory measures
safety
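The grouping idea in the list above can be sketched as a lookup from component to EG. This is an illustration only: the component tags, EG names, and the assumption that each grouped task would otherwise need its own tag out are hypothetical.

```python
from collections import defaultdict


def group_by_eg(tasks, eg_of_component):
    """Group task IDs by the EG of the component they apply to.

    tasks is a list of (task_id, component) pairs. Components with no
    EG assigned are collected under the key None.
    """
    groups = defaultdict(list)
    for task_id, component in tasks:
        groups[eg_of_component.get(component)].append(task_id)
    return dict(groups)


def tagouts_saved(groups):
    """Tag outs avoided if each EG is worked under a single tag out.

    Assumes every ungrouped task (key None) still needs its own tag out.
    """
    return sum(len(ids) - 1 for eg, ids in groups.items() if eg is not None)


if __name__ == "__main__":
    tasks = [("PM-1", "P-101A"), ("PM-2", "MOV-12"),
             ("MWR-7", "P-101A"), ("PM-3", "FAN-3")]
    eg_map = {"P-101A": "EG-FW-A", "MOV-12": "EG-FW-A"}
    groups = group_by_eg(tasks, eg_map)
    print(groups)
    print("tag outs saved:", tagouts_saved(groups))
```

Because the association lives in a lookup table rather than in a procedure, regrouping is a data change, not a document revision.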
common tag out boundary, this does not always have to be the case.
Another use of an EG could be to associate many of the same equipment
types into a round for common assessment under one trip. Another EG
could be for common lubrications. Another could be for VM, or
inspecting fire doors. There are as many group possibilities as
possibilities to associate work. Newer CMMSs provide the ability to
establish parent-child relationships, and therefore facilitate
equipment work grouping.
EGs work best when developed as a joint operating agreement
among all entities in the plant. They provide an agreed-upon standard
for work performance; i.e., EGs have greater value if operations can
support them with a standard tag out boundary, and release the
equipment for work based on the group. Any plant considering work
performance at power (previously done only during scheduled outages)
needs to absolutely minimize the risk of trips. This makes groups a
powerful tool, especially as plants realize the value of doing more
work on-line instead of in traditional outages. (OCM and CNM must be
performed on-line.) Once a group is established, the risk of doing
work on-line can be assessed, managed, and controlled with greater
focus and certainty.
Groups fill the gap between systems and the component; i.e., at
the top we have hundreds, or even thousands, of components per system,
based upon functions and identified in the form of system drawings,
descriptions, and component lists. Below the system level, there are
often equipment trains, logical subsystems, and the unique
organizational structures and work practices that require different
groups.
Groups can change dynamically over time as well. For example, an
organization could move from hard time lubrications to OCM, and
back to hard time for a given class of equipment, such as a coal belt (or
eliminate them entirely with sealed bearings). Both types of monitoring
can run concurrently. Groups simplify and standardize this practice.
Why not associate tasks within a given functional work area in a
procedure and eliminate the EGs? This fixed grouping suffers from
its frozen nature and relative difficulty in modifying procedures. Tasks
are typically harder to internally reorganize. EGs retain work individu-
ally, and carry the flexibility to make new associations electronically.
They minimize the need to rework hard copy or text files, which makes
them easier to use. Grouping and ARCM complement each other, since
RCM identifies PM tasks to be done by functions, failures, and
descriptions, all potential grouping attributes. As PM scope grows,
coordination requirements are likewise greater, and this motivates
grouping.
Groups also facilitate work planning. All planned work PMs and
MWRs are identified and associated with a given EG by component
number and coordinated for most efficient performance. Once a group
window is established on the rolling 12-week schedule, time can be
allotted for work based upon experience, risk, and scheduled work
time. Groups can tag many CBM tasks onto one activity to improve the
capability to schedule and work CBM in a controlled way.
Outage
Outage planning and scheduling benefit from the rigor and
simplification introduced by ARCM methods. Outages affect unit
availability and constitute the most expensive maintenance budget
period. RCM reviews simplify and standardize outage work to minimize
outages and maximize benefits.
Getting outage workscopes, and formally reviewing them for
applicability, effectiveness, and cost/benefit value, greatly benefits
outage scope and budget management. For units that maintain extensive
outage work scopes on a routine basis, the RCM screen is virtually
identical to that performed for existing PM programs. For those that
develop scopes just prior to coming down, there is opportunity to cut
scope. Given that most outages slightly to moderately run over
scheduled durations (based on my experience), there's great
opportunity to achieve substantial returns with unit outage scope
reviews.
Outages are sometimes only partially planned. Consistency and
predictability of outage workloads come with thorough, failure-based
work review for value.
Working to schedules
The great challenge for everyone is working to schedules. As nuclear
plants discovered, adding schedulers doesn't guarantee success without
somehow structurally simplifying the work. Operating groups reschedule
easily to support the operating plan, but this has severe implications
for work managers. An eleventh-hour operations outage "housecleaning"
designed to maintain a schedule led to the elimination of nearly 100
PMs on one job. The limited availability of work windows, combined
with the long duration between plant outages, forced a significant
amount of PM work into grace periods and mandated schedule
realignment. This short-perspective schedule change hurts long-term
operating goals.
Once PMs are set up and the schedule is aligned, seemingly super-
ficial changes can have long term consequences. Where PM programs
are mandatory, this can force unscheduled plant outages. Accepting
PM as a priority is a difficult lesson for any organization. Operations,
discovering that they incurred an unplanned plant outage by cut-and-
slash deferral of scheduled PM work, learns a sharp lesson. As
workscopes shift towards a maintenance strategy, schedules become
increasingly important.
Overhaul Intervals
Basis for overhauls
Overhauls associate individual, time-based rework tasks around a
common disassembly activity. Because much time can be invested in
disassembling and reassembling a large machine, as much work as
possible is performed with any disassembly task. The consequence of
any single component wearing out before the next scheduled overhaul
period is so great, it's considered cost-effective to simply replace
all components, regardless of condition or cost. This is the basic
overhaul strategy. Take it apart, replace as much as possible, and
button it up.
Many traditional mechanics, as well as instrument technicians and
others, follow an overhaul work philosophy. This has two
disadvantages. First, costs are higher than necessary when serviceable
parts are replaced. Second, parts are replaced without being examined,
yet it is by examining parts that workers learn to support a plant- or
company-wide age exploration program. Without examining parts and
asking the serviceability question, R engineers never get feedback on
inservice performance unrelated to failures, and lose the opportunity
to pursue systematic life extension based upon age exploration.
Overhaul intervals are typically based on accepted standards that
represent a composite wearout picture for many components. Once
established, these intervals have gone unchallenged for long periods.
Only the recent drive for cost-competitiveness, and the demonstration
by a few IPPs that overhaul envelopes can be stretched, has changed
this perspective. Unfortunately, executive committees are too often the
ones setting new turbine and boiler overhaul intervals in almost com-
plete absence of field engineering information on equipment capability.
With today's emphasis on CNM and life-extension, plants are
extending intervals, supplementing known aging problems with specif-
Optimizing strategy
Many secondary considerations go into the scheduling of a heavy
outage, such as a turbine overhaul. These include availability of other
units, overall load, R, scheduling of replacement power and services,
and the value of the deferral in present-value terms. With the
recognition that nominal outage intervals may have been conservative,
methods to extend outage intervals (while managing risk) are being
considered. Methods using conditional probability have been available
for years.
From an RCM perspective, one large potential saving comes from
systematically examining the risk of incrementally extending an outage
interval from a known benchmark. The five-year turbine standard was
considered reasonably safe, but companies are shedding the known safety
of this interval to take overhauls out to seven, nine, and even longer
nominal intervals. As they do this, they seek to manage their risk with
increased use of CNM. Extending large machine outage intervals
systematically is an obvious RCM capability.
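The conditional-probability idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that time to a major failure follows a Weibull distribution. The shape and scale values are hypothetical and are not taken from plant data or from any specific method the text alludes to. The function returns the probability of failure during the extension, given that the machine has survived to the current interval.

```python
import math

def weibull_survival(t_years, shape, scale_years):
    """Probability of surviving past t_years (assumed Weibull life model)."""
    return math.exp(-((t_years / scale_years) ** shape))

def conditional_failure_probability(t_now, extension, shape, scale_years):
    """P(failure in (t_now, t_now + extension] | survived to t_now)."""
    s_now = weibull_survival(t_now, shape, scale_years)
    s_later = weibull_survival(t_now + extension, shape, scale_years)
    return 1.0 - s_later / s_now

# Hypothetical wearout parameters: shape > 1 means risk grows with age.
SHAPE, SCALE = 3.0, 15.0
for extension in (2, 4):  # five-year benchmark stretched to seven and nine years
    risk = conditional_failure_probability(5.0, extension, SHAPE, SCALE)
    print(f"5 -> {5 + extension} years: added risk {risk:.1%}")
```

In practice the parameters would have to come from fleet or plant experience, and CNM findings would update the estimate as the interval is stretched.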
Planning
Planned work. Efficient preplanning requires that work be
anticipated, either because it gets performed over and over (like PM
inspections) or because equipment failure modes follow statistical patterns.
For example, limit switches get loose and sticky in certain
environments. Even in clean environments, the contact may oxidize. If
your plant's primary experience with limit switches is that they come
loose and need to be reset, that job can be anticipated and preplanned.
That standard job should be planned and ready to go on a moment's notice.
A job within the skill of the craft still requires a job plan, just
not necessarily one on paper. (The shop practice and methods guide is
the generic standard plan.) The planner could use a preplanned job, the
standard, or none, at his discretion. But having a standard written job
plan facilitates training, establishes a standard, and enables learning
and revision as methods change. With electronic CMMSs, maintaining
standard job plans is as simple as Windows cut and paste, and it offers
the opportunity to use standard electronic plans.
Preplanned corrective maintenance. Planners and others sometimes
conclude that because a failure mode occurs randomly, the work can't be
preplanned. If the failure mode is predictable and recurring, the job
can be preplanned. Using the 80/20 rule, all high-frequency CM WOs can
be preplanned and filed away (electronically) for immediate recall.
This makes the electrician who gets called in on the backshift for an
unpleasant job a little happier: he doesn't have to wait for the job
plan once he gets in, or plan it himself on the fly. He has something
to work from. Maintenance gets more consistent performance. When these
simple aids were made available, workers found them useful.
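As a minimal sketch of the 80/20 screen, the following counts corrective WOs by failure mode and returns the few modes that account for most of the work. The record layout, the field name failure_mode, and the sample history are hypothetical, not a real CMMS schema.

```python
from collections import Counter

def preplan_candidates(work_orders, coverage=0.80):
    """Return the most frequent failure modes that together account for
    roughly `coverage` of all corrective work orders."""
    counts = Counter(wo["failure_mode"] for wo in work_orders)
    total = sum(counts.values())
    selected, covered = [], 0
    for mode, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(mode)
        covered += n
    return selected

# Hypothetical CM history; field names are illustrative only.
history = (
    [{"failure_mode": "limit switch loose"}] * 6
    + [{"failure_mode": "valve packing leak"}] * 3
    + [{"failure_mode": "breaker trip"}] * 1
    + [{"failure_mode": "coupling wear"}] * 1
)
print(preplan_candidates(history))
```

The modes returned are the ones worth a filed, ready-to-issue job plan; the long tail can stay unplanned until it recurs.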
Plant outages are the second level of PM. Any PM that directly
prevents a plant outage fits this criterion. Boiler tube inspections,
condenser tube inspections, boiler chemistry monitoring, DCS
two-channel backup configuration checks, boiler camera checks, and main
steam safeties liftoff tests (where these are split, say, three valves
at 35% each for 105% total relief capacity) are examples.
These tasks assure key redundancies, backup equipment, and/or
other capabilities are present. If we lose these devices or equipment,
and anything else happens, we go down. Plant DCS displays are another
example. By themselves, operator display consoles to the DCS control
provide instrumentation. A plant can continue (and has continued) to
operate with no active display monitors. (It's never supposed to
happen, but it has, at least once!) If anything else goes wrong, the
plant probably goes down because we can't respond. Redundancies for
critical instruments also fit this category. Important equipment goes
here. Power pops that protect code safeties fit here also.
At the third level is PM for purely economic reasons. This
includes PMs that avoid reactive maintenance or large equipment
replacement costs. The traditional work hours and materials B/C PM
slides in here. There's no production impact at this level, but work
tasks at this level are not all equal.
Standards
Every facility has highly repetitive maintenance tasks, either because
of the number of identical components or because the maintenance itself
is repetitive. Developing standardized methods to perform work improves
maintenance efficiency. Work standards should include planned
NSM/OTF and CBM jobs, in addition to traditional time-based PMs.
However, convincing craft people that they'll benefit from
troubleshooting procedures and diagnostic guides is a challenge. Once
they've developed them, they support their use. Nuclear units have
procedures that provide a high degree of work conformity. Even fossil
plant checklists and guides improve work uniformity, consistency, and
performance time.
Establishing maintenance standards that address common equipment
classes is a preliminary step to build maintenance programs. For a
fossil plant with 20 coal belts, the major components (gearboxes,
take-ups, belts, drive motor, and soft start gyrol) have nearly the
same needs. Likewise, a nuclear unit with 200 Limitorque motor-operated
valves (MOVs) needs an MOV standard as the first step towards
overall work rationalization. A standard will need tuning for details,
such as environmental conditions, equipment class, importance and
usage, but the standard is a start. A failure review of each type ulti-
mately identifies specific performance issues. A maintenance plan stan-
dard establishes an efficient, relevant method to examine plant equip-
ment needs at the big picture level.
Standards take many conflicting issues into account. At the
equipment level, in a single plant, failures and wear are similar.
Environment and usage factors will emphasize some failure modes while
suppressing others. Yet usage and environment will be similar within a
plant. These influence failure modes.
Maintenance standards summarize experience and establish a plant
baseline, and they can be revised with lessons learned at any time.
Standards provide a foundation for maintenance checkout lists or
procedures, accommodating different requirements or classes of
equipment. Taken together, maintenance standards provide the basis for
a plant's planned maintenance program.
PM Reviews
PM backlog review
All plants must carry backlogged work. Very often, this is a large
and mostly inactive file. WOs that have been on the list for more than
a year, for example, will never be worked unless a change occurs. A
quick way to establish work value is to review and screen backlogged
WOs (both PM and CM) to identify the high value work. This requires
equipment familiarity: knowing failure modes, the manufacturer's
guidance, and industry standards. When an experienced analyst performs
the review, the plant can eliminate low value WOs and retain valuable
ones (Fig. 7-4).
Large backlogs may mask high-risk work that fell into a crack, for
example, feedwater heater tube inspections that were skipped, or
missed lubrications, filters, and inspections. A CMMS review can
simplify backlogs while pulling out high value work. An R engineer
regularly reviewing the lists can keep backlogs short.
Reviews divide corrective maintenance into CBM and failure main-
tenance. The difference depends on whether an in-service failure
occurs. Failures occur in all plans, even in RCM-based plans. CBM
PM list
Plants start up with OEM-based PM worklists based on specific
equipment preservation. Many vendor plans assume a continuous serv-
ice operating assessment. Most plant equipment is not in continuous
service and sees far fewer operating cycles than estimated in vendor
manuals. Adjusting vendor recommendations for these operating dif-
ferences generates the first large reduction in vendor-based PMs.
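One simple way to picture the adjustment is to rescale a vendor interval by the fraction of time the equipment actually runs. This is only a sketch under stated assumptions: it treats the vendor interval as wear-driven and based on continuous service, and the optional calendar cap (for degradation that proceeds whether or not the equipment runs) is an added assumption, not something the vendor manuals or this text prescribe.

```python
def adjusted_interval_months(vendor_interval_months, duty_fraction, cap_months=None):
    """Stretch a vendor interval that assumed continuous service.

    duty_fraction is the share of calendar time the equipment actually runs
    (0 < duty_fraction <= 1). cap_months optionally limits the result for
    calendar-driven degradation.
    """
    if not 0 < duty_fraction <= 1:
        raise ValueError("duty_fraction must be in (0, 1]")
    interval = vendor_interval_months / duty_fraction
    if cap_months is not None:
        interval = min(interval, cap_months)
    return interval

# Hypothetical case: a 6-month vendor PM on equipment that runs 25% of the time.
print(adjusted_interval_months(6, 0.25))
print(adjusted_interval_months(6, 0.25, cap_months=18))
```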
Other enhancements can extend vendor intervals for continuous or
difficult-to-service equipment: less frequent filter changes or
lubrications, higher quality parts, lubricants, and filters, and minor
modifications to improve service. These adjustments are fundamental for a
Event analysis
The best time to perform RCM failure mode identification and
analysis in an operating plant is concurrent with any major failure.
Staff is analyzing the failure; everyone's energized and focused. Now
is the time to capture the lessons for the future. I've found that RCM
analysis, done concurrently in computer database format, can help focus
the failure event analysis on facts, as well as document other
potential hypothetical and real failure modes discovered during
investigation. It's learning that will carry into the future.
One of the frustrations in determining root cause failure is the
presumption of a single failure mode. Root cause failure analysis
(RCFA) doesn't work well for truly complex, synergistic failures:
failures with complex physical, and perhaps even organizational,
interactions. These types can be decomposed and represented as
independent failures on an Ishikawa drawing. Interactions for which you
lack adequate information to determine the cause that developed into
failure are irrelevant in RCM-type process improvement analysis. The
focus is learning all the mechanisms that could have led to failure
(and if you find the actual cause, so much the better!). You need to
separately address each independent failure in your prevention
strategy. In this regard, an RCM review of an event can be more
proactive and less fault-finding in nature than traditional failure
analysis. This is exactly why Ishikawa diagrams help to understand
failure patterns: they seek not only the exclusive cause of a
particular event, but also demonstrate the interrelationships that can
lead to failure. With this, and with process understanding, frequencies
of occurrence can be measured and action can be adjusted based on risk,
Equipment Groups
EGs provide a method to make PM task performance efficient. The
concept derives from work blocking by Nowlan & Heap, and is especially
critical when trip time may be a significant portion of the total job
time, or a significant effort must be made to make equipment available.
For a plant, it could mean the time required to tag out and return
equipment to service. When equipment is available, all potential
required work ideally will be performed. In nuclear applications, the
NRC maintenance rule requires plants to monitor the unavailability of
risk-significant systems and minimize unnecessary work. This impetus
has always had value (Fig. 7-5).
The practical utility of developing and implementing EGs with a
CMMS is that when a group is scheduled down, you do all planned
work in the group. This requires a process of aligning all PMs in the
group in such a way that they occur in the scheduled downtime slots.
The other CMMS benefit is that when a group must come down, all the
backlogged work for that group is immediately retrievable (by sort) for
quick assessment. This greatly simplifies the job of the workscope
planner and outage scheduler. EGs make it far less likely that an item
will fall into a crack and be lost. The greatest benefit is work perform-
ance consistency.
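The retrieval-by-sort benefit can be sketched in a few lines. The work order records, the eg field, and the group names below are hypothetical; the point is only that a unique EG identifier on every WO lets a planner pull all waiting work the moment a group comes down.

```python
from collections import defaultdict

def work_by_group(work_orders):
    """Index open work orders by their equipment group (EG) identifier."""
    groups = defaultdict(list)
    for wo in work_orders:
        groups[wo["eg"]].append(wo["id"])
    return groups

# Hypothetical open work; "eg" is the unique EG identifier stored in the CMMS.
open_work = [
    {"id": "PM-101", "eg": "BFP-A", "type": "PM"},
    {"id": "CM-207", "eg": "BFP-A", "type": "CM"},
    {"id": "PM-115", "eg": "BFP-B", "type": "PM"},
    {"id": "CM-311", "eg": "BFP-A", "type": "CM"},
]

# When the BFP-A group is scheduled down, pull everything waiting on it.
print(work_by_group(open_work)["BFP-A"])
```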
An operator's job includes large amounts of travel between
relatively short periods of equipment monitoring. As much as 60% of an
operator's time may involve travel. Likewise, when maintenance crews
must spend a large fraction of their work time getting to and returning
from the job site, they need to coordinate trips for maximum time
utilization. (In practice, when work isn't effectively blocked,
feedback from crews is usually quick and critical. The tragedy is that
this doesn't always become incorporated into work as improved job
planning.) PM tasks, like project activity blocks, are most effective
when thought out and coordinated. The concept is like taking your car
to the garage: ideally, you identify all the work and get everything
taken care of with one trip. Obviously, power plants are much more
complex than cars, but the principle is the same (Fig. 7-6).
very few work tasks are new. The bread-and-butter work of the
organization is repetitive. Because of this, it's highly effective to
spend a great deal of effort to plan and coordinate the repetitive
tasks. This means EGs and blocking of some sort.
The more planned maintenance an organization works, the more
important EGs become. In a purely reactive maintenance environment,
things break; as they break, they come down. All that's minimally
necessary is to quickly get them back into service. As a transition to
planned maintenance occurs, more work originates as:
TBM WOs and tasks
OCM derived work tasks
CNM/OTF operator-identified degradation work tasks
Figure 7-8: Can It Be Worked On-Line? Can 40% coal-burning efficiency hold the
line? What level of reliability must back this up? These plants require staffs of 100.
Forty maintenance workers stretch to get all the work done. Yet plants in Australia
use two-shift operations with idle shutdown periods on automatic startup sequencing
unheard of in the US. Although unit outages will always be required, systems and
equipment can support more online work performance -- provided maintenance is
carefully coordinated with operations. In fact, as more maintenance work is initiated
by on-condition maintenance/condition monitoring, online work fraction increases.
This increases revenue.
EG development steps
EGs can be developed in two fundamental ways. One is based on
design: P&IDs, trains, and equipment layout. The second is based on
existing operation tag out boundaries. Each has its advantages. The
designer's intended EGs are often described in A-E system operating
procedures, plant operating modes, and system descriptions. These
materials can occasionally provide surprises, like the fact that
designers anticipated isolating equipment for maintenance on-line that
plant managers never envisioned possible.
In essence, every designer had EGs in mind as they laid out their
plant. This is often the reason, for example, for the check and
isolation valves in standby lines. It's obvious that any redundant
standby train must incorporate the means both to bring it on-line and
to isolate it for work. Designers have learned standard layouts and
methods for incorporating redundancy into designs and have used them
for years.
The problems occur once the design leaves the designer's control.
Although many design intentions get faithfully reproduced in the real
plant, as-built systems occasionally don't function as expected, either
due to design oversight or quality problems. If contractors use
substandard materials or undersized components to manage costs,
equipment doesn't perform in service as expected. That's not the
designer's fault.
The greater problem occurs, however, when plant operators and
maintenance staff do not fully own a design at the time they become
responsible for it. Designers usually provide training, but there's no
assurance operators will comply with the designer's intentions. The
design is too often compromised as it's incorporated into operating
rounds, PMs, monitoring strategies, and allocations for maintenance.
Existing operating cultures have powerful influence on planned
operation regardless of equipment or system capabilities. Given these
facts, and the haphazard methods by which we bring new systems on-line,
it's inevitable that some design compromises occur.
Because the primary reason for forming EGs is to perform
maintenance, EGs don't need to be based upon a physical boundary. An EG
should provide a unique identifier in the plant CMMS for sorting.
operator rounds
VM round
calibration round
loop calibration
thermography round
lubrication round
fire door inspections
fire alarm inspections
EG types
One method to develop EGs is using plant A-E P&IDs to identify
multiple trains, logical sub-groupings, and other sub-units that so
naturally form work organization skeletons. These EGs are a physical
type, based upon the tangible layout for installed equipment. Such
groupings are most often based upon designer intent, i.e., the check
valves in a feedwater pump loop were installed to facilitate on-line
work. A second common grouping is the maintenance round. Thermography,
VM, even fire door inspections, logically fit into this kind of special
EG, a "do it all at once" group. The key to this group is the equipment
availability on-line.
The first EG type includes a physical, energy, or pressure
boundary: areas where steam, water, voltage, air, hydraulic, or special
gases (CO2, H2, He) are common. Usually, these can be isolated on-line
by train, using block and/or root valves, breakers, and other isolation
devices. The boundary points for EGs on a P&ID are also typically tag
out points. The second group type is a convenience group. These
facilitate the use of a checklist to perform a PM round on a routine basis.
A combination of the boundary and convenience groups results
when a system has separate trains that can be conveniently grouped for
simple work, even though trains can be crossed. Reactor water cleanup
at a BWR involves two pump trains on the front end and two filter
demineralizer trains at the back end, separated by common regenerative
and non-regenerative heat exchangers. These can be grouped for
simplicity and convenience, even though the A/B pumps and A/B
demineralizers could be cross-connected. We can thus construct an
arbitrary group, but a simplifying and convenient one, within the
pressure boundary. Similarly, the BWR augmented off gas system has two
similar yet independent front- and rear-end trains, which grouping
simplifies.
Implicit grouping results when routine work within a system is
aligned on a 12-week schedule. Once equipment PMs are placed on
the schedule, and worked as scheduled, the work groupings naturally
stay together as time goes on because they're aligned to the same work
window. This grouping, developed as a natural outcome of alignment,
minimizes work periods, system outage time, and the other negative
work impacts. On a system with non-intrusive PMs (such as lubrica-
Operations
Operations copes day in and day out with residual, random
equipment problems. They run complex facilities, and a general trend is
for complex equipment to fail randomly. Outstanding operations reduce
randomness. Mediocre operations introduce it. What factors separate
outstanding from mediocre operations in controlling randomness?
Factors that have been identified by risk analysis and good
operating practices for years include poka-yoke (mistake-proofing)
methods and devices. Some are:
simplicity
procedures
standards
marking
lighting
cleanliness
training
Modification reviews
Ranking unit modification capital allocation requests is an annual
budget exercise that is becoming more complex. Safety and environmental
improvements often are handled as separate line items, leaving a
limited amount of money to be divvied up among many needed
improvements. Ideally, improvements have high paybacks, certainly
high enough to pay their way. Instrumentation upgrades, such as
fossil's transition to distributed controls, may be needed to continue
economic operations. Every modification should have a projected benefit.
RCM thinking has identified modifications and upgrades that made
no sense, and, once identified and cast in an R perspective, could be
deferred or eliminated entirely. After several of these discoveries,
ARCM has paid its way.
A single-unit PRB coal plant was originally sited for two units, with
coal handling service sized and built to allow a two-unit operation. All
major belts except the transfer and tripper belts, along with the
crusher and dust collection system, had completely redundant spares.
When one of the long, inclined yard belts became an aging concern, the
coal handling people requested to replace the belt as a part of the
annual capital budget request. The belt replacement (at a cost of more
than $100,000) made the cut, even though there was no generation
improvement to be derived from the upgrade.
This improvement could easily be deferred by simply spacing out
the aged belt, using it as a backup for the otherwise redundant paired
belt. This was possible at virtually no production risk. Of course, long
term, the operating strategy in this case requires that everyone
understand the modified approach and accept the marginal increase in risk.
In another case, a $1 million capital rail loop construction
project was replaced by $50,000 of capital improvements and a more
extensive operations monitoring program. This capacity to focus capital
improvements is an obvious ARCM benefit.
chapter 8 301-320.qxd 3/3/00 2:54 PM Page 301
Chapter 8
Maintenance Software
There's a right way, a wrong way, and a Navy way. We do it the Navy way.
-Master Chief, USN
Goals
Why do we need software to help us perform maintenance? People
and organizations have performed maintenance for years without it. On
the other hand, the primary purpose of software is to improve
maintenance productivity and work performance. With so many prescribed
statutory rules, it's hard to imagine organizations doing work without
software, and many have now used it for 20 years. Some do without,
however, often very effectively.
It's instructive to remember the promise of maintenance
information system (MIS) software as we review the changes necessary
to successfully implement RCM. The software's original objective was
to vastly improve the use of maintenance information. Did this
promised benefit occur? In many situations, it didn't. Access to
computers was functionally limited to the front offices. Software never
Hierarchy
Software was developed to facilitate the whole of maintenance
planning, including the entire equipment hierarchy.
Systems are the highest unit level; parts that can be removed from
service and replaced singly are at the lowest. In between we have
equipment trains, logical equipment groupings, major equipment
assemblies, and components, all of which can be tagged, isolated, and
worked at one time. (I think of equipment as component assemblies providing
Coding levels
Equipment can be coded to any detail level. The key requirement
is to be able to uniquely identify equipment with no ambiguity for main-
tenance. Inconsistently coded equipment systems pose problems. The
primary reason to code and identify equipment in a CMMS is to facili-
Standardize
Equipment coding structure reflects how you do maintenance.
Coding must fit the CMMS and tie into any equipment grouping
scheme the plant intends to use. Equipment grouping is a powerful tool
that enables the free association of equipment for the primary purpose
of accomplishing maintenance. A CMMS should facilitate the develop-
ment and use of arbitrary EGs.
Applications
The value of EGs increases where equipment coding systems
are detailed. (At fossil plants, a boiler feedpump could be the smallest
coded unit in a group. A nuclear plant might have 100 coded identifiers
for the same equipment.) This concept supports natural grouping. The
trick is to uniquely locate the skid-identified components. For a plant
coded to the component level, group associations coordinate work.
Newer CMMS systems that support hierarchies automatically provide
grouping logic.
Unique CMMS
Companies used to maintain large information services groups.
These groups are vanishing, victims of outsourcing and cost/benefit
assessment. Their legacy is custom software products, the most
important aspect of the CMMS. These large legacy mainframe codes must
be mastered to extract data, generate reports, and otherwise interact
with and manage the day-to-day work of the organization. Few engineers
and managers have learned these systems. CMMSs are largely the
software tool of planners, schedulers, and maintenance staff. However,
all the functions of maintenance at most stations, for better or
worse, are computerized, with most permanent records maintained this
way. CMMSs offer operators great power to access and interrogate
information. It's kept maintenance managers either in control (or in
the dark, as the case may be) for the past 20 years. Those who could
access and generate their own reports had an inherent advantage.
The advent of second- or even third-generation CMMS products
offers greater flexibility, though at the expense of tailored
applications. These products, with their graphical user interface
(GUI), Windows-based environments, are truly exciting. From a
standardization perspective, you will likely see the kinds of
evolutionary paths that accompanied the adoption of Word and
WordPerfect as document software standards in the PC world, i.e., much
greater exchange of documents and other information developed in the
same Windows-based applications.
As this transition occurs, maintenance organizations must learn to
adapt systems they grew up with. Just as industry-wide application of
word processors has developed some incredibly powerful and common
routines, CMMS capabilities depend on having and learning the soft-
ware. Since a CMMS installation for a modest-sized utility runs well
into millions of dollars, some won't be able to afford the transition until
market pressures force it. These organizations will continue to struggle
with their specific software application. In addition, newer systems are
capable of more efficient import and export routines, which will facili-
Age exploration
To perform age exploration, there must be a convenient way to
capture the performance data of in-service parts. For instrumentation
and controls, age exploration is as simple as looking at calibration
data after some period of service, and estimating the allowable drift
before the instrument goes out of range. Obviously, many assumptions
and a lot of skill are needed, not the least of which is familiarity
with the equipment. The principle is the same with mechanical parts,
but there may be multi-dimensional requirements to consider. Opinion
and judgment are still a key factor.
The tendency with age exploration is to underestimate mean life.
An interpreter sees the first few instruments drift or datapoints go
out of range and estimates the mean life at this age. In this manner,
extremely short mean life estimates result.
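A small numerical sketch shows why the early estimate runs short. It compares the naive average of the first few failure ages with an estimate that also credits the service time of units still in tolerance. The second estimate is the standard one for a constant failure rate with units still in service, and that constant-rate model is an assumption here, as are the sample ages.

```python
def naive_mean_life(failure_ages):
    """Average age of the units that have already drifted out of range."""
    return sum(failure_ages) / len(failure_ages)

def mean_life_with_survivors(failure_ages, survivor_ages):
    """Total accumulated service time divided by the number of failures.

    Assumes a constant failure rate; survivors contribute their service
    time even though they have not failed (right-censored data).
    """
    total_time = sum(failure_ages) + sum(survivor_ages)
    return total_time / len(failure_ages)

# Hypothetical population of ten instruments.
failures = [2.0, 3.0]    # years in service when two instruments drifted out
survivors = [4.0] * 8    # eight instruments still in tolerance at four years
print(naive_mean_life(failures))                      # 2.5
print(mean_life_with_survivors(failures, survivors))  # 18.5
```

The gap between the two numbers is the underestimate described above: the first failures say little about the mean until the survivors are counted.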
are highly skilled and motivated. Even when there are lapses, the sim-
plicity is unmistakable.
The basis-to-PM relationship is analogous to an architect providing
the basis for the builder's construction. Most builders would not
consider building without a plan, but this wasn't always the case. Is
it possible to build a house without a plan? It sure is; my granddad
built at least two that way. Is it effective? Perhaps marginally. Is it
competitive? No. Existing buildings constructed on an "as you go" basis
cost more.
Maintenance programs built as you go can likewise be expensive,
anywhere from 20% to 40% more, and they don't develop as much
inherent equipment R, based on post-implementation PMO/RCM
reviews. Existing facilities must have some plan. Making adjustments
to the plan as you go is often the most cost-effective approach. On the
other hand, for a new facility, LCM costs can be a major opportunity to
achieve long term production benefits and lower cost. Is there value to
increase production from the same capital asset? Ask any banker.
Most current RCM software is user-friendly and saves effort.
Value doesn't come from documentation of every plant component's
PM case, however, although such software is available for nuclear
plants. Rather, value comes from building effective maintenance
strategies. Building and maintaining standards that reproduce failure
mechanisms and identify effective preventive tasks is most cost
effective in the long run. The organization that can retain strategies
over time to support a living maintenance program will maintain lower
production costs.
Ultimately, there's a tie between basis software and a plant's CMMS.
Basis software provides the justification for the tasks performed. Since
PM tasks (as a part of a work performance package) address individual
failures, PMs get scatter-gunned, as well. Software that facilitates
grouping is helpful. Groups statistically increase the odds of PM on-
time performance. More advanced RCM packages provide task group-
ing and sort capabilities. In a seamless way, the things you do and how
you do them can be imported into CMMSs to provide the PM tasks and
frequencies. No single software has this functionality at this time.
Ideally RCM software should be simple and intuitive to use. Some
general goals include:
Basis
The basis for any PM task is the reason why the PM task makes
sense to do (or not do, for the NSM case). In nuclear applications, there
is great emphasis on developing and maintaining a basis. In non-nuclear
applications, the justification for a basis is underlying task value.
Implicit basis requirements can be lost over time. Plants have
abandoned very good PM tasks because they forgot why they did them. The
task's very success may have been its downfall: tasks effective at
preventing failures weren't recognized as adding value and were
dropped, and a painful learning process began anew.
This occurred once at a two-unit coal plant in a dramatic way. The
plant manager eliminated virtually the entire sootblower PM program,
based on low failure rate. For the better part of a year his costs
dropped with no apparent consequences. Then sootblower cutting of
sidewall boiler tubes developed in a major way. Within three months
they'd experienced two cut boiler tube leak outages. In a year they'd
had six. This was eventually traced back to corrective maintenance by
untrained mechanics. After two years it was clear that the net gain had
actually become a significant loss: more than five times the promised
operating savings, at the plant's wholesale production rate. One outage
saved could have paid for a whole year of sootblower PMs.
When you don't know why you're performing an activity, there's
temptation to change. For PMs, you must know why you're doing them
and what the underlying value is. From the perspective of a living
program, that's when a task basis is most valuable. Changes can be
evaluated with more focus, clarity, and with less regulatory or safety risk.
Developing a basis for an activity is like keeping a log. Every time you
change or note something, you log it, and file the result where it's
retrievable for later use.
For large plants and companies, this means reliance on computers
and their networks. The value of a basis is underscored in this age of
cost-trimming, when many young Masters of Business Administration
(MBAs) with virtually no practical experience are tempted to make
career-serving cost reductions. These only reveal their true impact later.
A documented basis gives staff more armor to defeat such moves.
Analysis
Software enables us to perform detailed failure and cost analysis
quickly. Several software products have this capability. Where genera-
tion could be lost, and expensive corrective action is a necessity or
option, cost-calculating software can establish B/C ranges and total
costs to enable us to focus on high value activity. Exact costs aren't
necessary, but we need ranges to know if we're talking B/C ratios of 1/1,
100/1, or 1000/1. Ideally, we would like to bound our approximate upper
and lower PM and failure cost estimates.
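The bounding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a feature of any particular software product: the function names and dollar figures are hypothetical, and the benefit of a PM is assumed to equal the failure cost it avoids over the same period.

```python
def bc_ratio_range(pm_cost_low, pm_cost_high, failure_cost_low, failure_cost_high):
    """Return (lowest, highest) benefit/cost ratio for a PM task.

    Assumption: the benefit is the avoided failure cost over the same
    period as the PM cost. All estimates must be positive.
    """
    if min(pm_cost_low, pm_cost_high, failure_cost_low, failure_cost_high) <= 0:
        raise ValueError("costs must be positive")
    if pm_cost_low > pm_cost_high or failure_cost_low > failure_cost_high:
        raise ValueError("low estimate exceeds high estimate")
    worst = failure_cost_low / pm_cost_high   # smallest benefit, largest cost
    best = failure_cost_high / pm_cost_low    # largest benefit, smallest cost
    return worst, best


def order_of_magnitude(ratio):
    """Classify a B/C ratio into coarse screening bands."""
    if ratio < 1:
        return "below 1/1"
    if ratio < 10:
        return "about 1/1"
    if ratio < 100:
        return "about 10/1"
    if ratio < 1000:
        return "about 100/1"
    return "1000/1 or more"


# Hypothetical figures: the PM costs $2,000 to $5,000 a year; the failure
# it prevents would cost $200,000 to $1,000,000.
low, high = bc_ratio_range(2_000, 5_000, 200_000, 1_000_000)
print(order_of_magnitude(low), "to", order_of_magnitude(high))
```

Even with wide cost ranges, the band usually settles the priority question; only cases that straddle 1/1 need closer costing.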
In non-regulated environments, cost is the primary focus of all PM
activity and naturally forms the primary basis for any activity.
Practically, even with regulation, cost is the basis for most PM hours.
You don't have to cost out every PM case in detail. Rather, you need
enough benchmark cases (on the order of 5 to 10) that you can quickly
assess any new case by comparison, for cost analysis and priority ranking
purposes. Standard development has an obvious cost tie.
The development of benchmark cases aids in regulated PM activi-
ties, as well. Regulated PMs are usually easy to identify and because a
law or license specifically requires PM tasks, these tasks often record
their legal source in the basis. Rules for reports concerning continuous
emissions-monitoring performance are a straightforward example.
Under the older discharge permits or licenses, PM requirements were
non-prescriptive except at the brush-stroke level. They once went no
further than general mandates for appropriate PM programs.
(What's an "appropriate" program? An outcome-based answer is obvious:
one that's applicable and effective!) Nonetheless, there's consid-
tests) that somehow just disappear! This discipline diffuses through the
PM program so that it gains credibility.
Air transport, hazardous material handling, and nuclear generation
are areas of public concern that will need to continue to document and
justify maintenance programs from a regulatory perspective. This, too,
is an opportunity for software heavyweights who develop an interest
in maintenance performance models. A CMMS ultimately manages
work (any work) that must be performed. A CMMS that provides a
seamless tie to an RCM-based work development system will always
have an inherent advantage. This transition to RCM software will fol-
low the implementation of standard CMMS programs.
Backing off an existing program (one with too high a frequency or
where regulation is relaxed) requires a basis document. PMO or RCM
software that provides basis maintenance as a feature has a leg up on
other methods.
In contrast, a non-regulated market environment really doesn't care
why a change occurs. It is more than sufficient to justify where
you are at a point in time, based on cost. In this case, a well-prepared
basis document must present a case for a PM activity for economic
reasons. RCM development software and its related CMMS database
must have work efficiencies as a goal whether the environment is
regulatory, economic, or whatever.
Many PM programs developed informally and were subsequently
grandfathered by regulators. A basis is implicit. We presume that at
one time there was a good reason to do all the PM work specified.
Justifying a specific change is referred to as a partial basis. It documents
the purpose of one change. A complete PM program justification, on
the other hand, is comprehensive and includes all relevant program
documentation. These are known as full bases. Many programs sur-
vive on partial basis PM changes.
Documentation
Basis development in a non-regulated environment only needs to
suit the company and economic conditions. At a very basic level, all PM
is based on cost. For regulation-mandated PMs, the potential
cost is the risk of being shut down for not doing legally required PM
tasks or, worse, the costs associated with injury to the public or
employees from an event.
Typically these involve rules, agreements, and understandings with
state health departments, boiler inspectors, and federal agencies: FAA (for
stacks), EPA (emissions), OSHA (industrial safety), Department of
Transportation (DOT) (gas transport), and occasionally others. It's
common to have a regulatory body endorse a professional group's
standard, like the NRC and state endorsements of the ASME's Boiler and
Pressure Vessel (BPV) Code. Occasionally, rules or standards conflict,
like the mixed-waste jurisdiction issue only recently settled between
the NRC and EPA. Overall, these standards are the first levels of
compliance that need to be assured in a PM program. Their source
documents are voluminous.
Many companies also separately endorse building codes, the
National Electrical Code (NEC), and many of the supporting ANSI,
American Society for Quality Control (ASQC), ASME, IEEE,
American Society of Civil Engineers (ASCE), SAE, ASCLE, and other
technical body codes. These codes are impressive in size. Codes
represent the best effort of a group of knowledgeable and interested people
to provide guidance on how to do something. They're often general,
vague, confusing, and subject to interpretation and change, but they're
also the best source of information on any subject for which you aren't
already an expert. Occasionally they're dated, or organized based on
changes.
Insurers develop and maintain inspection standards in addition to
codes. Boiler and fire protection requirements are two examples, but
there are many others. Fossil boiler insurers and their representatives
often want to know specific ways that an owner implements a code
requirement. Occasionally an authority designates an implementing
agency for a code requirement. (I worked in a plant that had this
arrangement with the state boiler inspector. The state recognized that the
industrial insurance agent's and his engineers' recommendations had the
authority of the state boiler inspector behind them.)
Industrial insurers may identify risk areas they would like
addressed. Sometimes this has to do with a facility. More often it's for
Products
The products of RCM basis documentation are the tasks done to
avoid failures. Organizations that have traditionally performed them,
as well as the products themselves, can summarize what these tasks
are. Most nuclear organizations would recognize the on-condition
failure-finding RCM task as the rough equivalent of their SP. (The SP is
based on the literal technical specifications of the nuclear plant, so the
correspondence isn't exact.)
This illustrates why the RCM paradigm is so useful. It's helpful in
cutting through the organizational muck and getting to the basics:
sound engineering and sound maintenance, appropriate to the situation.
Round checklists
Checklists for rounds are the staple of the roving operator. They
provide guidance on what to check, how often, limits, and other inci-
dental information. For the control operator, they are summarized in
software as screen pop-ups that require entry from a DCS.
Rounds are being modernized with hand-held wands and monitors.
Portable monitoring devices can provide a seamless tie from the rover,
reading non-DCS data, into the CMMS or even DCS through a
download. This in turn supports trending.
Obviously out-of-limit equipment is a candidate for immediate
PM tasks
Scheduled work activity that includes tasks implemented by WOs
using personnel assigned to maintenance is the traditional scheduled
PM program. Two primary task groupings comprise the TBM plan: the
traditional PM and on-condition maintenance. While OCM has no
exact equivalent in a traditional program, it does reflect the intentions
of the traditional program PdM. The distinction is the level of
diagnostic task scheduling (OCM) and follow-up task performance (CDM).
Organization
Organization of PM activity into useful sub-categories is a prime
benefit of RCM. The distinction between a non-specific CNM task and
an on-condition one is the simplification of routine scheduling and
work priority this provides. In so many words, the mature program
schedules failing-condition equipment for maintenance before indeter-
minate-condition equipment.
CMMS integration
The best RCM systems integrate cleanly with the CMMS. On the
front end, they use similar coding and system definitions to organize
strategies. They tie in instrument plans. They easily upload completed
grouped plans into the CMMS.
They should not require repetitive entry of CMMS WO plans for
the maintenance work plans. They should allow the later addition of
work plan detail from reference documents. They should support
standard work plans for repetitive work, which is the case for virtually all PM.
RCM/CMMS idealization
What would the ideal RCM-based CMMS/RCM maintenance strat-
egy development and implementation system look like?
It would probably have a very different emphasis than traditional
CMMS systems that are based on the concept of broken equipment.
Configurations Simulation
What maintenance work is best in a given situation? With a simu-
lation model, we can take the maintenance plan, enter all the factors,
perform a Monte Carlo simulation, and see what R answers result.
Simulation is available today on PC to help facilities test strategies for
redundant trains and instruments to see which, in fact, is best. This can
help avoid learning lessons the hard way. Very often minor design
changes can have significant R paybacks. Maintenance routines are also
likely to benefit.
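A toy version of such a simulation can be sketched in Python. This is a sketch under loud assumptions, not a description of any commercial simulation package: each train is taken to be independently available with the same probability, each trial is a single random snapshot (no repair times, no common-cause failures), and the 95% train availability is a hypothetical figure.

```python
import random


def simulate_availability(train_availability, n_trains, n_required,
                          trials=100_000, seed=1):
    """Estimate the chance that at least n_required of n_trains are available.

    Assumptions: trains are independent and identical, and one random
    snapshot per trial is an adequate model.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    up = 0
    for _ in range(trials):
        available = sum(rng.random() < train_availability
                        for _ in range(n_trains))
        if available >= n_required:
            up += 1
    return up / trials


# Hypothetical pump trains, each 95% available.
single = simulate_availability(0.95, n_trains=1, n_required=1)
one_of_two = simulate_availability(0.95, n_trains=2, n_required=1)
two_of_three = simulate_availability(0.95, n_trains=3, n_required=2)
print(f"1 of 1: {single:.4f}  1 of 2: {one_of_two:.4f}  2 of 3: {two_of_three:.4f}")
```

For this simple case the answers can be checked by hand (for example, 1 - 0.05 squared for one of two trains). The simulation earns its keep once repair times, test intervals, and shared support systems enter the model.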
Software can simulate the availability impact of proposed
modifications before they're made. In some cases, modifications were
abandoned only after the plant saw their impact in a plant
trip. Plant trips at even a modest-sized plant cost well upwards of tens
of thousands of dollars. For large base-loaded facilities, they may
approach six figures quickly. The age of trial-and-error design changes
in the utility industry is rapidly coming to a close. It's simply too
expensive!
Simplicity
One risk of simplified, computerized RCM analysis capability is
analysis paralysis: documenting every potential failure possible.
Policy
Some corporations may find it useful to establish company-wide
RCM standards. After an analysis has been completed, it represents a
significant amount of learning, and it's logical to apply this at similar
facilities.
In the final analysis, the system that makes the work elegantly sim-
ple is the best one. Where learning is transferable with existing
processes, it should be transferred.
Chapter 9
Measures
Measurement
Global
Accepting the challenge of an RCM/ARCM program is an example
of a process shift. When a process shift occurs, what precisely happens?
At the plant or company level, it doesn't mean that we instantaneously
have a new process, with new results. Any major organizational change
requires time, effort, and resource dedication to implement. But what
if we could change a system or its inputs instantaneously? We check a
control system response by feeding in a new signal (Fig. 9-1). Treating
a system in the same way, we should see a new response (with some
dynamic delay). If we could model in this way, what would the response
level be?
Theoretically, output would start to respond once the change
occurred. When control system input takes a step change, the process
output instantaneously has a new equilibrium value. It just takes time
for the process to get to the new value. Taking the system to be the
maintenance process, the input as maintenance selection, what are
some suitable output measures? Based on theory and our projections,
what do we expect to change? Our outputs are maintenance costs, unit
production cost, and R. To see change, we must measure their
response. Ideally, we achieve an appropriate level (Fig. 9-2).
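The step-response picture can be made concrete with a short Python sketch. The text does not specify the process dynamics, so this assumes the simplest case, a first-order lag; the cost index values and the 12-month time constant are hypothetical.

```python
import math


def first_order_response(t, start, target, time_constant):
    """Output at time t after a step input, assuming first-order dynamics.

    The output moves from `start` toward the new equilibrium `target`,
    closing about 63% of the gap in one time constant.
    """
    if time_constant <= 0:
        raise ValueError("time_constant must be positive")
    if t < 0:
        return start  # before the step, nothing has changed
    return target + (start - target) * math.exp(-t / time_constant)


# Hypothetical example: a maintenance cost index headed from 100 to a new
# equilibrium of 80, with a 12-month time constant.
for month in (0, 6, 12, 24, 48):
    value = first_order_response(month, 100.0, 80.0, 12.0)
    print(f"month {month:2d}: cost index {value:.1f}")
```

The point of the sketch is the shape: the new equilibrium exists from the moment the plan changes, but a measurement taken too early will understate the improvement.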
We change a maintenance or operating plan to increase production,
reduce costs, or both. R is a byproduct; it's difficult to measure
until we drop to the more sensitive system level. At the system level,
responses are easier to see, if we have system level measures. We seek
No change can occur until the maintenance plan changes. Thus, the
initial effort after an ARCM effort must be implementation of the plan.
This takes time, because ARCM is implemented at the system level. To
speed the results and identification process, those systems with the
largest potential for improvement need to be selected. Problem system
selection usually isn't difficult, but depending on the level of plant
measurement awareness and sophistication, it can require some time to
identify the potential value-adders. Sometimes secondary failures aren't
accurately reflected by their root causes.
The kinds of system problems reported in NERC statistics are
generally the same for a given class of units. For example, coal-fired
boilers tend to have a high rate of boiler tube failures induced by fly ash
erosion. Boiler and turbine losses typically represent the top two loss
contributors. To understand the spread of losses at a particular unit
requires understanding root cause losses in depth. This starts with
loss reports, but doesn't end until the loss drivers are understood.
This identifies the low-hanging fruit. Reviews require an up-front
assessment, an intermediate step to assure the effort focuses on the best
improvement targets.
When a maintenance management system provides accurate numbers
up front, they're most helpful. Many CMMSs can identify these
statistics, but only to the degree data allows. Sometimes a surrogate
statistic must be sought when a key field or other indicator isn't available,
or is available but unreliable. For example, CMMSs that record hours
independent from time cards are suspect for time accounting data
accuracy. CMMS reports of emergency WOs may be suspect, depending on
the uniformity and control of the category "emergency."
Systems with performance problems often have multiple problems.
Systems with low availability are also typically high cost systems. More
hours are worked on these systems, much of it overtime, on short notice.
By focusing on the half-dozen highest cost systems, the effort has
much greater probability of success. There are often masked secondary
failures in the measurement, so analytical review of system costs is
required. Three factors that trend together are availability, R (evidenced
by forced outage rate), and costs/work hours. By simple thumb rules of
estimating, non-labor cost and work hours tend to roughly approximate
each other in total cost terms; i.e., a staff with an annual payroll of
approximately $5 million spends $5 million on parts and services.
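Picking the half-dozen highest cost systems is a simple ranking exercise once system costs are available. The Python sketch below is illustrative only: the system names and dollar figures are hypothetical, and the second function merely restates the thumb rule above.

```python
def top_cost_systems(system_costs, count=6):
    """Return the `count` highest-cost systems and their share of total cost."""
    total = sum(system_costs.values())
    ranked = sorted(system_costs.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:count]
    share = sum(cost for _, cost in top) / total if total else 0.0
    return top, share


def estimated_total_cost(annual_payroll):
    """Thumb rule from the text: non-labor cost roughly equals labor cost."""
    return 2 * annual_payroll


# Hypothetical annual system costs, in thousands of dollars.
costs = {
    "boiler": 2400, "turbine": 1800, "coal handling": 900,
    "sootblowing": 700, "feedwater": 650, "ash handling": 500,
    "circulating water": 300, "service air": 150, "domestic water": 50,
}
top, share = top_cost_systems(costs)
for name, cost in top:
    print(f"{name:18s} {cost:6d}")
print(f"top six carry {share:.0%} of total system cost")
```

In this made-up profile, six of nine systems carry most of the cost, which is the kind of stratification that justifies a focused effort.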
Focused
Every system has inherent cost profiles, based upon designs show-
ing their inherent R with regard to cost, availability, and man-hours
needed to support a given level of production. Benchmark comparison
figures for similar plants are very helpful to understand where nominal
levels should be. After selecting improvement areas for focused effort,
several change iterations may be needed to achieve the desired result.
A detailed performance measurement system is necessary to meas-
remove iron or debris from the coal feed stream. All such material was
fed into the crushers, bunkers, feeders, and mills, where it was either
pulverized or caused random trips that required isolated, unplanned
entry to remove it from the equipment. All the while generation was
lost: up to 1/2% availability due to these events alone. At the wholesale
generation value-added rate, this added up to a cool $486,000 annually
for that base-loaded unit. Year after year the capital budget request for
tramp iron removal equipment and metal detection upgrades (budgeted
between $100,000 and $200,000) was edged out by more glamorous projects.
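The arithmetic behind that comparison is worth writing down. The Python sketch below uses only the figures quoted above and assumes, optimistically, that the upgrade would eliminate the entire loss; it ignores discounting and the cost of operating the new equipment.

```python
ANNUAL_LOSS = 486_000        # from the text: value of the lost availability
PROJECT_COST_LOW = 100_000   # from the text: low end of the budget request
PROJECT_COST_HIGH = 200_000  # from the text: high end of the budget request


def payback_months(project_cost, annual_saving):
    """Simple payback period in months (no discounting).

    Assumption: the project recovers the full annual saving.
    """
    if annual_saving <= 0:
        raise ValueError("annual_saving must be positive")
    return 12 * project_cost / annual_saving


fast = payback_months(PROJECT_COST_LOW, ANNUAL_LOSS)
slow = payback_months(PROJECT_COST_HIGH, ANNUAL_LOSS)
print(f"simple payback: {fast:.1f} to {slow:.1f} months")
```

On these numbers the project pays for itself in well under half a year, even at the high end of the budget range.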
Focus can be redirected once interrelationships are understood.
But it shows, from an RCM perspective, how design basis system
functions have gradually eroded and even vanished over a unit's in-service
life, to the unit's cost-competitive detriment. Nuclear units do not
suffer design memory loss, but they pay dearly to maintain that memory.
Training and documentation expenses are correspondingly higher.
All measures start with operating goals. Awareness of the competitive
profile, as well as industry standards and capabilities, is helpful in
establishing meaningful goals. The pursuit of meaningful goals is
exhaustively covered in literature on TQM process methods.
The key is to find a parameter that provides improvement focus.
Even in those rare cases where systems or equipment don't show obvious
stratification, there are ample opportunities to focus on performance.
Consider traditional system level unit cost measures. Units can also
be measured at the systems level with an FERC-like cost reporting sys-
tem. FERC categories are:
and what fails, helps interpret risk patterns. Operating risks can be
managed as long as they're known.
At an operating level, economic factors (corporate wage rates,
historical costs, and trends) are known. Total systems/equipment hours
worked, maintenance strategy mixes, equipment CM/PM (by hours),
emergencies, and total costs are also known. To interpret these
numbers, you must also develop system/equipment risk profiles. There's
nothing inherently good or bad about working 60% corrective/40%
preventive work on a system, unless you compare it to a known
benchmark. Knowing that a system's ratio of reactive maintenance and its
competitive operating costs are high suggests finding out how
competitors operate similar systems or performing general benchmark studies.
One of the most exciting aspects of ARCM implementation is the
opportunity to view performance data from an entirely new perspective.
Historically, plants followed costs and work hours. Some further broke
these down into CM and PM. But if maintenance is better off
performed on demand, PM/CM categories have no meaning. New RCM
maintenance categories enable consideration of measurement, and of what
can be measured. Most of these categories can't be measured directly
with existing CMMSs, but new CMMSs can measure them.
System measures
System level cost measurement is the minimum level to ensure that
unit performance expectations are measurable. Usually a system must
meet minimum safety standards and support pre-set production levels.
There are two broad systems categories. The first directly supports
production; the second provides production support service. A loss of
service system functionality has a delay before production halts. Only a few
systems directly support production.
Examples of production systems include:
fuel system
primary coolant system
main steam
reheat steam
feedwater
circulating water
flue gas (boiler)
electric conversion
waste-water
ash handling
service air
coal handling
domestic water
failure, then the boiler is charged for the tube failure. This bias
misdirects our effort toward the wrong systems. Inconsistent failure reporting
makes this doubly difficult. There's just no way to avoid becoming
familiar with the actual performance numbers.
One refrain heard many times over from station managers has been,
"That failure will never happen here again." It won't, at their unit,
for five to ten years. After that the lessons learned are forgotten and the
potential for failure recurs. Monitoring one company's fleet of 20
large generating units, I found that fleet problems such as winding
failures do recur. Unless they change some aspect of basic operation,
overall risk levels remain the same.
Statistical failures can be compared to traffic tickets. Speeding
tickets, accidents, and insurance losses correlate. Eliminating risks from an
insurer's portfolio begins with elimination of speeders, or with charging
that risk category a higher premium to cover the higher risk. At the plant
level, few stations keep "speeding tickets." Major loss precursors often
go unnoticed, but at the system level, it's much easier to identify and
track precursor "near misses" and use them to predict future risk and
system level performance.
In the absence of a near-miss program, how can you identify system
level risk performance? One method is to track two measures that
correlate with system risk: system equipment emergency WOs and
overtime. These indicate the degree to which unplanned events influence
system performance. These indicators can serve as red flags. Of course,
the absence of a system management plan, a system owner, and opera-
tional awareness are the big warning signs. With one or more of these
factors present, loss factors decrease.
Cost measures including total man-hours worked, total costs, and
how these are allocated between and among various work categories
need to be followed at the system level. Remember that a PM hour is
an effective work hour (the work is planned and predictable, and the value
added has been pre-identified), while an emergency hour is most
ineffective. Given alternatives, the PM hour is preferred.
After system outage work is controlled, emergency work should be
addressed. A system profile of planned CNM, TBM, and OCM offers
the lowest cost. Superficially similar systems can have very different
Failures
Another exciting aspect of RCM is the new perspective it provides
on failures. This is true both in regulated and unregulated
environments. In place of the easy and conservative (and 100% bulletproof)
position of, "When in doubt, or questioned, call it inoperable!" we have
proactive, pre-planned assessment engineering and failure descriptions.
A confident pre-assessment can be provided to those who must make
shutdown decisions.
Failures influence economics, and are therefore worth measuring.
Some failures have license or emissions impacts, as well as production
loss value. There needs to be a consistent basis for measuring failures
at the system level. The NRC imposed this by regulation with the
Maintenance Rule at nuclear facilities. Nuclear plants must track sys-
tem availability and MPFFs at the system level for risk-significant sys-
tems. Risk-significant functional failures can be identified from operat-
ing logs, and E WOs.
Two difficult failure types involve redundant equipment. The first
is redundant trains, like a standby feedwater pump train. The second is
protective devices. Their function is redundant in the sense that they
serve to alert or prevent another primary function failure. Protective
devices whose function isn't required (nor typically desired) until an
event occurs spend the majority of their lifetime in standby waiting for
an event. Like redundant equipment, no backup need occurs until we
lose the primary. An unintended transfer constitutes a control transfer
functional failure. In the case of the redundant train, inadvertent
transfer doesn't constitute functional failure; it merely swaps trains. But for
a protective device, the component functional failure as an inadvertent
activation very often creates an unplanned event, and a system level
functional failure results. So, an unplanned and incorrect feedwater level
trip is an unintended event, and a failure. A spurious high-vibration alert
trip on a steam-driven boiler feedpump is a failure.
Sometimes a functional test, a near miss, or a demand event
uncovers a protective device failure. Because safety devices have multiple
backups, demands rarely utilize all devices and trains. Events typically
are near misses, and most of them reveal loss of protective device
redundancy and greater risk of functional failure. Nuclear power plants
are effective at testing and identifying device functionality under
nuclear technical specifications, safety programs, and general public
safety considerations. Fossil unit rules aren't as structured.
System functional failures provide an objective measure to track
failure performance. Though subject to interpretation (more so where
the functional requirements haven't been formally identified), they are
still very useful. Total WO numbers are arbitrary as a system
performance measure, but system work hours and costs are not. For nuclear
plants, MPFFs remain a convenient measure. For fossil plants, large
unplanned expenses may provide the failure measure. Because failures
themselves are hard to track, I developed two measures that correlate
with failures and are simpler to use: system emergency work orders
as a total number and percentage system overtime. These are easy to
extract from CMMSs at most plants. Both are indicators of functional
failures.
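Both indicators fall out of a plain work order extract. The Python sketch below assumes a hypothetical export format (the field names `system`, `priority`, `regular_hours`, and `overtime_hours` are invented for illustration, with priority "E" marking an emergency WO); a real CMMS will use its own field names and codes.

```python
from collections import defaultdict


def system_risk_indicators(work_orders):
    """Summarize emergency WO count and overtime percentage per system.

    Each work order is a dict with hypothetical keys: 'system',
    'priority' ('E' marks an emergency), 'regular_hours', 'overtime_hours'.
    """
    totals = defaultdict(lambda: {"emergency_wos": 0, "hours": 0.0, "overtime": 0.0})
    for wo in work_orders:
        entry = totals[wo["system"]]
        if wo["priority"] == "E":
            entry["emergency_wos"] += 1
        entry["hours"] += wo["regular_hours"] + wo["overtime_hours"]
        entry["overtime"] += wo["overtime_hours"]

    report = {}
    for system, entry in totals.items():
        pct = 100.0 * entry["overtime"] / entry["hours"] if entry["hours"] else 0.0
        report[system] = {"emergency_wos": entry["emergency_wos"],
                          "overtime_pct": pct}
    return report


# Hypothetical sample extract.
sample = [
    {"system": "feedwater", "priority": "E", "regular_hours": 4, "overtime_hours": 6},
    {"system": "feedwater", "priority": "3", "regular_hours": 10, "overtime_hours": 0},
    {"system": "service air", "priority": "3", "regular_hours": 8, "overtime_hours": 0},
]
indicators = system_risk_indicators(sample)
for system, values in indicators.items():
    print(system, values)
```

The results are only as good as the control over the emergency category and the overtime charges, so treat them as red flags for follow-up rather than as failure counts.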
Costs
System operating costs are the obvious summary performance
measure. Some systems are more cost intensive than others. A
sootblowing air system is an inherently high cost system in a PRB-fired boiler.
Cost performance per standard cubic foot per minute of air produced
is one measure of this system's performance. Unit efficiency and
boiler pluggage events are others.
Because CMMSs don't always allow cost monitoring below
the plant or unit level, system costs (combined labor and material)
provide meaningful cost data. As important as total system costs are,
other numbers can tell more. The cost of overtime hours worked per
system, or costs of irregular part expense, are examples.
At one plant, we arbitrarily selected unplanned failures costing
above $25,000 to measure for overall performance. This selective analy-
sis required manually tracking CMMS entries (subject to interpretation)
but the results told a subtle story. Quantified in this way, eight major
failures a year dropped to five because of our efforts. Stratifying measures
must be done with great care and with the advice of a statistician.
PM Hours
Real hours
Although in an ideal world all planned PMs get worked, in the real
world this is never the case. Auditing PM jobs (like other
CMMS-reported jobs) provides insights into how completely a program is
implemented.
Many legacy CMMSs ran redundant time accounting systems
separate from the time cards submitted for pay. WO completers
could report any amount of time on WOs; the CMMS couldn't check
or enforce simple time accounting rules. One aspect of this was that
workers could work (according to CMMS time reports) any amount of
time. In my experience, workers are biased by work planning time
estimates and management expectations. Without an independent time
accounting system, they aren't grounded by real world limitations.
Measurement helps to provide this!
For accuracy, a system should charge time concurrently from time
cards to WOs (or jobs, as appropriate). Fractional time charges down
to the decimal hour are needed for PM time measurement. Many PMs
are brief jobs, short enough that several may be worked in a morning
or an afternoon. Accurate cost accounting is necessary to understand
where time costs are allocated. In a typical plant, only 40-45% of work
time gets charged against work jobs. The challenge for maintenance is
to put in "wrench time." Other things (like training) are important but all
compete for limited available time. When wrench time drops below
40% the plant needs to worry about time usage competitors. These
include safety meetings, extracurricular duties, and other supernumerary
tasks. Mechanics who don't turn wrenches have little value.
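The wrench-time check is simple enough to automate. This Python sketch assumes that paid hours come from time cards and job-charged hours from WOs; the crew size and hours are hypothetical, and the 40% threshold is the figure given above.

```python
def wrench_time_fraction(hours_charged_to_jobs, paid_hours):
    """Fraction of paid time that was charged to work jobs."""
    if paid_hours <= 0:
        raise ValueError("paid_hours must be positive")
    return hours_charged_to_jobs / paid_hours


def needs_review(fraction, threshold=0.40):
    """Flag a crew or period whose wrench time falls below the threshold."""
    return fraction < threshold


# Hypothetical week: a crew of 20 paid 40 hours each (800 paid hours).
good_week = wrench_time_fraction(336, 800)
poor_week = wrench_time_fraction(300, 800)
print(f"{good_week:.1%} review needed: {needs_review(good_week)}")
print(f"{poor_week:.1%} review needed: {needs_review(poor_week)}")
```

The calculation only means something when both numbers come from the same controlled time accounting system, which is the point of the preceding paragraphs.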
WO time accounting must be controlled like checks. Paid time
must be managed. Major time charges should be organized by Pareto
chart, tracked, and audited. Routine time consumers (rework, tool
and parts shagging delays, tag-out delays, engineering support delays,
and job planning by the workman) should be given charge numbers,
so detailed time charges can be allocated. Bottlenecks, delays, and
other losses can be identified for organizational review. The goal must
be to continually increase work time charged to jobs in spite of the
Responsiveness
As maintenance programs evolve towards CNM, measurement bal-
ance needs to be sought. Appropriate levels of CNM depend on system
type and strategy. Many strategies can achieve the same operating
objectives though at different cost and complexity levels. A strategy
must fit an organization.
Failure measurement is not possible without accepted failure
standards. A traditional program lacks explicit definitions. Even nuclear
maintenance programs lack function-based failure criteria, as relatively
minor events (the charging motor run-on in a 4160 breaker) get
described as failures. Major failures can go entirely unrecognized. The
secondary failure that results is often the focus of investigation. A
breaker fails to trip on overload, causing a fire, or an alarm fails to
annunciate an unsafe condition, like methane or carbon monoxide gas.
Events that should have been excluded by operation in fact become the
focus of investigation.
Total hours/system
Total work hours per system, broken down by PM (TBM +
OCM), CM (CBM + OTF), and functional failures, are a meaningful
RCM-based measure. The key ratios are the percentage of each ARCM
category. These profile the system. The continuous improvement goal
is to reduce required hours and costs. Where systems lack total cost
measurement capacity, tracking total work hours is a second useful
measure.
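A system profile of this kind is just a set of percentages. The Python sketch below uses the PM and CM groupings given above; the hours are hypothetical, and the category labels are assumed to be available as WO codes in the CMMS.

```python
def hours_profile(hours_by_category):
    """Return each category's percentage of total system work hours."""
    total = sum(hours_by_category.values())
    if total <= 0:
        raise ValueError("total hours must be positive")
    return {category: 100.0 * hours / total
            for category, hours in hours_by_category.items()}


# Hypothetical annual work hours for one system.
feedwater_hours = {
    "TBM": 300, "OCM": 200,        # PM grouping per the text
    "CBM": 250, "OTF": 150,        # CM grouping per the text
    "functional failure": 100,
}
profile = hours_profile(feedwater_hours)
pm_pct = profile["TBM"] + profile["OCM"]
cm_pct = profile["CBM"] + profile["OTF"]
print(f"PM {pm_pct:.0f}%  CM {cm_pct:.0f}%  "
      f"functional failures {profile['functional failure']:.0f}%")
```

Tracked year over year, the profile shows whether hours are shifting toward planned categories while the total declines.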
Trends
System downtime and functional failures are important performance
measures. Identifying functional failures is a challenge when
functional-failure definition and perspective are absent. Having these
measures requires that a company has engaged in goal setting for the unit.
This establishes relevant failures. Many haven't.
Aging studies
WO populations should show continuous progression to completion.
If we examine progressively aged WO group snapshots over time, we
should see incomplete WO numbers decline. Mathematically, WOs are
worked proportional to number and age. When this doesn't happen, the
system's in trouble. Regular aging reports are useful for telling
management whether their maintenance system fundamentally works.
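An aging report can be sketched as a count of open WOs by age bucket. In this Python sketch the 30/60/90-day bucket edges and the sample dates are hypothetical, and the declining check is one simple reading of "incomplete WO numbers decline": each older bucket should hold no more WOs than the younger one before it.

```python
from datetime import date


def aging_report(open_dates, as_of, bucket_days=(30, 60, 90)):
    """Count open work orders by age bucket as of a given date.

    Returns a list of (label, count) pairs, youngest bucket first.
    """
    edges = list(bucket_days)
    labels = []
    lower = 0
    for edge in edges:
        labels.append(f"{lower}-{edge - 1} days")
        lower = edge
    labels.append(f"{lower}+ days")

    counts = [0] * len(labels)
    for opened in open_dates:
        age = (as_of - opened).days
        index = sum(age >= edge for edge in edges)  # number of edges passed
        counts[index] += 1
    return list(zip(labels, counts))


def is_declining(counts):
    """True when no older bucket holds more WOs than the one before it."""
    return all(later <= earlier for earlier, later in zip(counts, counts[1:]))


# Hypothetical open WO dates, reported as of 1 March 2000.
open_wos = [date(2000, 2, 20), date(2000, 2, 1), date(2000, 1, 15), date(1999, 12, 1)]
report = aging_report(open_wos, as_of=date(2000, 3, 1))
counts = [count for _, count in report]
for label, count in report:
    print(f"{label:12s} {count}")
print("backlog ages out normally:", is_declining(counts))
```

Here the single WO stuck in the oldest bucket trips the check, which is exactly the kind of exception an aging report should surface.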
Costs
Change should generate improved performance, lower unit costs,
more flexible operations, or all of the above. Historically, plants have
never had income benefits allocated at the unit level and so higher
income generating units have requests buried in with peer units. Utility
cost and income aggregation are to blame.
Merchant-independents stand alone from a financial perspective.
Even then, unit costs allocate downward to systems and equipment by
tedious manual methods. The before/after snapshot of any significant
process change can be obscured. Utilities, as vertically integrated struc-
tures, suffer incomplete cost accounting at the unit level.
New CMMS systems greatly improve cost tracking. They're dependent on
data entry, but use hierarchies that are easier to interact with and
extract data from. They promise better information capture and
presentation.
It's difficult to tackle more than one plant system improvement at a
time. System cost trends (total, routine, outage, emergency, and
modification costs) are major interest categories. Some cost
expenditures are most important. PM time and expense are among them.
These need tracking categories. For the cost-driving systems, these
cost trends will be important.
Ratios
Maintenance ratios tell a story. The emergency-to-routine maintenance
ratio, by hours, reveals how a system's work is managed. One CMMS coded
work priority on a continuum from E to 3 (E-1-2-3). E's were
unscheduled, unplanned WOs; 3's were planned and scheduled. High E/3
ratios reflected reactive maintenance. (High and low
Rework
Rework is a significant cost-contributor because maintenance is
expensive. If you can track rework causes, you can reduce them with
significant benefits. Manufacturers follow rework and scrap cost in
depth. Manufacturers neither want to make scrap nor send it out,
incurring warranty or other cost charge-backs. Maintenance is a
process, but tracking rework is like tracking scrap. WOs need to
identify reworked equipment and jobs for trending and root-cause
assessment.
Workers generally identify rework on jobs readily. With their
participation, rework maintenance can be measured. Things you can
measure can be improved. Sources of rework should be identified for
process improvement.
Chapter 10
Conclusions
Learn from the mistakes of others. You won't get to make them all yourself.
-Eleanor Roosevelt
Just because something's old doesn't mean you throw it away.
- Scotty, "Relics," Star Trek
God integrates empirically.
-Albert Einstein
Organizational Entropy
Entropy explains why our natural order trends with time towards
disorder. Entropy explains why thermodynamic cycles have limits, why
heat flows in a single direction, and why temperature and time have
meaning. Entropy is a powerful concept and the subject of one of the
three laws of thermodynamics (paraphrased):
Figure 10-1: The Fort Saint Vrain power plant in Platteville, CO, was operated as a
conventional nuclear plant until its sporadic operations and high cost caused its
shutdown and decommissioning. Today, it's operated as a combined-cycle gas generator.
Recent organizational models apply thermodynamic principles to
organizations and processes. Entropy helps to explain the apparent
confusion and disorder among some large organizations as they do so.
Entropy can help us understand operating environments. We might
view a "situation normal all fouled up" (SNAFU) as an individual
fault (a person messing up the job), but an entropy model suggests it
is the nature of the system. Things will malfunction without continuous
addition of energy and intelligence to the process (Fig. 10-1).
This explains accepted aspects of operations that have never been
theoretically considered before. Operations demand intelligence and
energy. Every conscientious worker in an operating environment knows
this. Outstanding operations demand more! The assumption that
order is the normal state of affairs is simply founded on idealism.
Complex operations need information and control to produce value.
This only happens if intelligence offsets the system's natural
tendency to unwind.
Management provides the framework that provides order.
operating goals
operating plans
training
staff selection
work processes
standards
What is ARCM?
The general focus of RCM is identifying and preventing functional
failures. ARCM goes one step further. It throws out the dogmatic styles
in favor of pragmatics. In effect, "If it works, use it." ARCM retains the
unique, fundamental principles introduced by Nowlan and Heap,
Matteson, and all the other RCM pioneers: the factual basis, applied
statistics, applied engineering methods, benchmarks, and applications
based upon basic logic principles. Basics that validate NSM in a
complex equipment strategy. Basics that can effectively control
high-impact, large equipment outages with on-condition/CDM, and use
CNM as a general operations strategy. Basics that faithfully apply and
operationalize "on-condition" limits that so uniquely delineate Nowlan
and Heap's published work from others. Methods to schedule time-based
and OCM equitably with the balance of non-specific work, with
assurances that CDM is worked. Methods that accept and apply risk
management to substantially improve performance, and lower risks and
cost (Fig. 10-2).
The bottom line is improved R (reducing functional failures),
reduced costs, improved quality, and well-supported corporate missions.
ARCM shares basic similarities with TQM. Both are founded on
statistics. TQM summarizes generalized lessons from early SPC appli-
cations that became tools for present day managers. Some conclusions
still apply. Others provide insights, but must be taken in context.
Embracing ARCM (developing strategies to reduce costs) is what
facility operations are all about. A facility or company can practice
RCM and still not know about LTA or other detailed TRCM methods.
Does an understanding of RCM methodology help? Absolutely!
Years ago, corporate cost-cutters (accountants) trimmed plant
budgets and cleaned out the shops. Experienced people left; planned
maintenance and training programs were cut. The opportunity to
improve processes was lost. Workers obviously disliked the top-down
cutting, but no one understood the value of the losses, cuts, or future
costs. Several years later, R was down, production down, unit costs up,
and maintenance was more reactive than ever.
Maintenance is an inherent cost of production, a fundamental con-
Statistical maintenance
Maintenance itself is inexact, with many strategies, no single one of
which is right. Many work. Statistically, we must:
nance program. Operators, through their A-Es, will have the founda-
tion for a cost-effective maintenance strategy from startup. This in turn
will support better staffing, life cycle decision-making, and, ultimately,
lower overall facility maintenance costs. Performance levels will be
known for operators to benchmark. In short, a step-advance in facility
O&M performance is dawning.
Maintenance is to some degree an art. There are no panaceas that
suddenly make all maintenance decisions simple and clear. Even with
the very best RCM plans in hand, hard diagnostics, interpretations, and
choices are and will be required. But, armed with a maintenance plan,
operators will have better tools to interpret equipment, plan for main-
tenance, and perform work in a reduced cost fashion. This has been the
lesson of the commercial aviation industry. The challenge is to intro-
duce the appropriate degree of rigor into an ambiguous environment to
improve managing risk and cost.
A general, repetitive RCM lesson has been identifying a basic essential
instrumentation strategy to help manage operating risks. The
opportunity suggested here is obvious: if readers gain an appreciation
of the need to quantify, understand, maintain, and manage essential
instrumentation from reading this book, they will gain great value.
The flip side of the coin (learning to manage non-essential
instruments) is a corollary. In this world of expanding hardware
capacity, it is particularly important to control the "vital few" versus
the "trivial many." Operators must learn to discriminate and act on
essential instruments. This is the "low-hanging fruit" that many
maintenance managers should grasp. Unknown or inadequate instrument
maintenance plans cost dearly in production, cost, and (rarely) in
employee and public safety.
The general lesson of RCM is CNM. Organizations with a CNM
philosophy are reliable. It's possible to go overboard, but generally the
other case occurs. Little or no CNM, and absence of follow-through on
the insights provided by monitoring, are the trademarks of unreliable
and unsafe operators. Like the person who feels ill but is afraid to visit
the doctor for fear of having a worst fear confirmed, failure to act on
CNM adds risk. Understanding problems and alternatives enables us to
select options. Rarely are we saddled by Hobson's choice: take what's
Appendices
1. Glossary of Terms
2. Further Readings
3. RCM Software Applications
4. References
Glossary
80/20 Rule
A rule attributed to the Italian statistician Pareto based upon his
study of 19th century economic wealth distribution in Italy. For our
purposes, it attributes 80% of the problems to 20% of the equipment.
Generally the Pareto rule can be found in many skewed distributions.
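One way to test whether a given failure history actually follows the 80/20 pattern is to rank equipment by failure count and see how few items are needed to reach 80% of the total. The sketch below does this; the equipment tags and counts are hypothetical, and the function name is illustrative only.

```python
def vital_few(failures_by_item, threshold_pct=80):
    """Top-ranked items whose combined failures reach threshold_pct of the total."""
    total = sum(failures_by_item.values())
    ranked = sorted(failures_by_item.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0
    for item, count in ranked:
        # Stop once the items already selected cover the threshold share.
        if running * 100 >= threshold_pct * total:
            break
        selected.append(item)
        running += count
    return selected, running


# Hypothetical failure counts per equipment tag over one reporting period.
failures = {
    "P-101": 45, "F-201": 35, "V-301": 5, "M-401": 4, "C-501": 3,
    "T-601": 3, "B-701": 2, "H-801": 1, "X-901": 1, "Z-001": 1,
}

items, covered = vital_few(failures)
share_of_items = 100 * len(items) / len(failures)
share_of_failures = 100 * covered / sum(failures.values())
print(items, f"{share_of_items:.0f}% of items", f"{share_of_failures:.0f}% of failures")
```

Real data rarely splits exactly 80/20; the point of the check is to see how skewed the distribution is before deciding where to focus.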
Abandoned-in-place
Equipment removed from service and left in its location in a plant
because the cost of removal exceeds the scrap value. Equipment that
adds marginal value to a process compared to cost may be left
unmaintained in place with no cost or production impact.
Acceptance criteria
Specific limits for acceptance. A term with general meaning, but
nuclear origin. Also used for operationalized failure criteria, generally
describing a test. Time-based PM will sometimes have acceptance
criteria. Large turbine bearings will be reused "as is," i.e., unless
wear exceeds so many thousandths.
Actuarial
Mathematical statistical failure analysis using practices accepted by the
insurance industry and Society of Actuaries (SOA) or other profession-
al groups (as appropriate) to measure aging, risk, and mortality (based
on study for human populations).
A-E
Architect-engineer. The facility designer usually hired as a contractor to
provide a facility design. Occasionally also the constructor
(design-construct).
After-market
The parts and services supply market for equipment other than through
the OEM. After-market parts can be superior to OEM parts, but there's
much greater risk in the after-market for parts reliability and quality.
This risk is generally borne by the buyer.
Age
Time correlation factor. The accumulated time since the equipment was
placed in service. In general, time equivalent for the aging process in
question. This could be tonnage for wearing parts, for example.
Age exploration
A systematic process of using conditional overhauls and opportunity
samples, with formal cost analysis, to evaluate and improve designs.
Age parameter
The measure of aging, which correlates with resistance to failure for
parts that age (dependent on the part, application, and use). Used as
the general time basis for maintenance.
Aging
A process that can reduce a part or component's resistance to failure
over time. To grow old or show signs of growing old. Synonyms:
deteriorate, fatigue, waste, tire, exhaust, flag, droop, ply-out, drained,
spent, depleted, obsolete, erode, consume, fret, rub, fray, weather,
corrode, oxidized, rust, disintegrate, spoil, decay, decompose, break.
ALARA
As low as reasonably achievable: a program of minimizing radiation
exposure required for all nuclear license holders. The basis is to avoid
unnecessary radiation exposure and its long-term damaging somatic
and genetic effects.
Align
Block: put multiple PM tasks together for performance at one time
when an equipment train, unit, or even plant is available. Aligning
intrusive PMs into a unit outage window is an obvious example. For
scheduled work, once work is aligned, it facilitates the performance of
scheduled maintenance.
ANI
American Nuclear Insurers: a consortium of insurers providing nuclear
insurance backed by law.
ANSI
American National Standards Institute: a standards organization that cer-
tifies and maintains standards. These include many industrial and power
generation standards. Common ones for nuclear plants are ANSI N45.2
and N18.7, for procedure use and maintenance management.
Applicability
In traditional RCM use, the requirement that assures a PM activity is
technically and statistically effective, i.e., it actually prevents failures.
Applicable
Prevents failure.
ARCM
Applied RCM: one abbreviated version of RCM that retains the
fundamental salient elements of RCM (as described in the document by
Nowlan and Heap). Includes maintenance strategies TBM, OCM, CDM,
OCMFF, and NSM, but simplifies analysis.
Area inspection
A general walk-around inspection that checks for random or other
failures. For an airplane or car, a pre-service visual check for obvious
problems.
ASCE
American Society of Civil Engineers.
ASME
American Society of Mechanical Engineers.
ASQC
American Society for Quality Control.
ASTM
American Society for Testing and Materials
Availability
Defined exactly by NERC. The period of time a unit is available to be
dispatched for generation, whether it is or not. Expressed as a fraction
of calendar time.
Base load
Loading a unit to full rating between scheduled down periods. Typical for
nuclear and low-cost generators. The opposite load term is peak load.
Basic interval
Prime interval: the most fundamental interval when aligning a PM
task that fits the frequency, and provides reasonable multiples for the
overall maintenance program. A car requiring PMs at 12, 24, 36, 48 and
60 months has a basic interval of 12. Often the interval that also carries
a fundamental condition-directed maintenance program inspection
task. Missing a basic interval PM in a completed program carries a seri-
ous consequence.
Basis
The justification: the reason why. Usually at least partially implicit.
B/C
Benefit/cost: benefit-to-cost ratio, commonly represented backwards as
cost/benefit ratio. For instance: "replacing the oil has a cost/benefit
ratio of 10/1." It is really the other way around, but most people
understand and state it this way.
Benchmark
Comparing costs for similar processes, facilities, or equipment. Often
performed within an industry to validate competitive position, and out-
side to identify world-class performers.
Bit map
An image in a computerized format literally imported as a map of image
bits.
Block
Aggregate, group, or align for performance.
Blower
A low head fan that supplies gas (usually air).
Blue blush
Deep blue-hued carbon deposits on high-pressure, high-temperature
steam valves. Through time, these build up to where the valve may bind
during stroke. These require periodic removal.
Bootstrap
In computer jargon, a preliminary short software routine that allows
loading the main operating system. A system that gets the machine up
to minimum smarts to run.
Breakdown
Fail suddenly, with little or no warning. A failure that impacts opera-
tions and production schedules.
Breaker
Circuit breaker
Burn-up
Slang: in nuclear work, reaching the radiation exposure administrative
limit. It pulls a person off the available worker list; they are referred to
as "burned up" based on reaching their maximum weekly or monthly
radiation exposure limit.
Bus
Electrical bus
Bus bar
The output bus that connects the generator (stepped-up) output to the
transmission grid. Often used in the context of "bus bar cost": the
cost to generate at the grid connection.
B&W
Babcock and Wilcox: a large industry supplier of boiler and nuclear
steam supply systems.
BWR
Boiling water reactor
Calibrate
To adjust an instrument for zero and span due to drift. A basic PM
activity that is often time-based.
Call out
Calling out a person for work after normal work hours end, usually in
response to a plant need.
CBM
Condition-based maintenance: the same as condition-directed mainte-
nance. Sometimes used to maintain a distinction between condition-
monitoring program derived maintenance tasks and formal on-condi-
tion-derived condition-directed maintenance.
CDM
Condition-directed maintenance: maintenance directed by condition.
Obligatory maintenance performed when defined failure limits are
exceeded. CDM is the fundamental differentiator of a firmly RCM-based
PM program. To be used effectively, it must be reserved for those
on-condition tasks with explicitly defined failure limits.
CDM (FF)
Failure-finding condition-directed maintenance. A special type of
condition-directed maintenance in which the acceptance criterion is
satisfactory test performance. For instance, a diesel generator in
standby mode could be required to demonstrate that it can meet its design
specifications by starting and loading to 650 kWe within 10 seconds as
the test criterion.
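The diesel generator example can be written as an explicit pass/fail check. The sketch below uses the 650 kWe and 10 second figures from the definition; the record layout and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StartTest:
    """Result of one standby start test (hypothetical record layout)."""
    seconds_to_load: float  # time from start signal to reaching load_kwe
    load_kwe: float         # load achieved during the test


def passes_acceptance(test, required_kwe=650.0, limit_s=10.0):
    """True if the unit reached the required load within the time limit."""
    return test.load_kwe >= required_kwe and test.seconds_to_load <= limit_s


print(passes_acceptance(StartTest(seconds_to_load=8.4, load_kwe=650.0)))
print(passes_acceptance(StartTest(seconds_to_load=11.2, load_kwe=650.0)))
```

A failed test is the trigger for condition-directed work: the hidden failure has been found and the limit is unambiguous.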
CE
Combustion Engineering: a supplier of power generating plants and
equipment. Now ABB CE.
CFR
Code of Federal Regulations.
Chargeable loss
A loss that can be charged to a specific cause. Used in particular for
forced outage measurements. A restriction due to water chemistry dur-
ing startup is a chargeable loss to water chemistry.
CIC
Component identification code: a unique equipment identification
code.
Clearance
Tag out: a method to control equipment for work. A tag out or clear-
ance is required to isolate equipment from energy sources for personnel
safety.
CNM
Condition monitoring: operations-implemented general equipment
monitoring for failure. Tasks for which no exact on-condition limit
can be determined. When there's agreement that a benefit exists these
CMMS
Computerized maintenance management system (from 1990s onward).
Called maintenance information system (MIS) in the 1980s. A comput-
erized WO initiation, tracking, planning, scheduling, approval, and
archive system. A sophisticated computer software system that's at the
core of maintenance management in complex operating environments.
CO
Conditional overhaul: an overhaul that corrects the proximate cause of
failure, secondary failures, and restores equipment to performance speci-
fication. CO does not completely disassemble nor replace all replaceable
parts, although it does replace any aging components prior to the next
scheduled overhaul period. CO zero-times the equipment. (See also
Control Operator).
Cogen
Cogeneration: a type of generation authorized by law to allow non-util-
ity participation in the generating market. Being phased out in favor of
independent power producers and separation of generating and trans-
mission and distribution assets.
Complex item
An item that fails to exhibit dominant failure modes and thus fails ran-
domly in service with no known age. Practically, an item that fails to
show aging.
Conditional Overhaul
An overhaul that conditionally addresses only the observed discrepancies,
returning the cycle-dependent time measure to zero. An overhaul that
addresses all proximate failure causes, as well as any aging, to restore
the unit's post-overhaul aging parameter to zero. A maintenance activity
that corrects and zero-times a piece of equipment.
Condition-based
Condition-directed, including the non-specific results of condition
monitoring, which are separate from condition-directed in that no com-
monly agreed failure resistance has necessarily been exceeded.
Conservatism
The tendency to prefer an existing situation to change; safe. In
engineering, the provision of design margins to accommodate uncertainty.
Control Operator
The operator who manages the control room boards or DCS CRT.
Practically, the operator who is running the plant.
Constructor
The facility builder. Occasionally the same as the A-E.
Corrective maintenance
In former days, work on demand to correct failed equipment. A term
gradually falling out of favor due to its limitations and bias. Most
corrective maintenance is a combination of condition-directed,
condition-based, and no-scheduled maintenance but has never been
differentiated. Old MIS systems categorized work two ways: preventive
and corrective.
Cost
Maintenance cost: combination of hourly cost, material cost, services
cost, and overhead. Typically five to six times the hourly cost. Excludes
opportunity cost of lost generation.
Cost effective
Worthwhile based upon cost/benefit perspective in a general sense.
Since PM implicitly considers the time difference from performance
cost incurred to benefit received, PM cost effectiveness implies using
the time value of money.
Critical
Immediate and direct safety consequences, usually unacceptable.
"Critical" is used in common language to identify any important failure.
Readers of the literature, and particularly of RCM, should skim to
discern the author's context for the use of "critical."
Critical failure
A failure with an immediate and direct safety consequence. A failure
whose risk is unacceptable based upon accepted safety standards.
Usually an immediate and direct threat to personnel, public, or (in rare
cases) the environment.
Critical few
The statistical few that predominantly drive the totals presented in
Pareto format by category. This comes from statistical analysis of data
in TQM and Process Improvement Technology. As an example: one
finds that the combination of bearing failures and insulation resistance
failures accounts for most motor failures, statistically.
Critical instruments
Instruments that identify critical failures, such as excessive vibration for
a large rotating machine.
Criticality
Criticality analysis is analysis done in some streamlined RCM approach-
es to identify how important a failure mode really is. It is subjective,
based on interview and conjecture, and therefore of limited use for
assessment and management of experiential risk.
CT
Combustion turbine
DCS
Distributed control system: a plant-wide control system common in
many non-nuclear applications. The state of the art in control system
technology at this writing.
Design
Involving the specification, selection, and layout of equipment and
materials, in contrast with maintenance. Maintenance and design often
overlap in indistinct ways. Maintenance in fossil environments often
performs light design roles, sometimes without being aware of the
design role.
Direct cost
Cost at the point of application, in contrast with indirect cost or
overhead. Direct labor, material, and services costs show up as plant
expenses in traditional utilities. Indirect costs (in my experience)
don't, although they ultimately affect the bus bar generation cost to
the consumer.
Discard
A type of PM task where a part is replaced based on time.
DOE
Department of Energy
DOT
Department of Transportation
Economic dispatch
Dispatch of the next unit of available generation based upon marginal
cost as the system load increases. Economic dispatch calculates the cost
of the next available MWe of generation and dispatches the unit that
provides it. Most PUCs require public utilities operating distribution
systems to follow such an economic dispatch model. The automatic
generation control (AGC) will identify which units are to be loaded
next (or removed first) as the load varies over the course of the shift and
day. In reverse, as the load drops, the most expensive generation is shut
down first.
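The loading order described above can be sketched as a simple merit-order stack: sort units by marginal cost and load the cheapest first until demand is met. This is a deliberately simplified illustration that assumes one flat marginal cost per unit and ignores minimum loads, ramp rates, and transmission limits. Unit names, capacities, and costs are hypothetical.

```python
def dispatch(units, load_mw):
    """Load units in order of marginal cost until load_mw is served.

    units: list of (name, capacity_mw, marginal_cost) tuples.
    Returns a dict of name -> MW dispatched.
    """
    schedule = {}
    remaining = load_mw
    for name, capacity_mw, _cost in sorted(units, key=lambda u: u[2]):
        take = min(capacity_mw, remaining)
        if take > 0:
            schedule[name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError("load exceeds total available capacity")
    return schedule


# Hypothetical fleet: (name, capacity in MW, marginal cost per MWh).
units = [
    ("CT 3", 150, 45.0),
    ("nuclear 1", 800, 8.0),
    ("coal 2", 500, 18.0),
]

print(dispatch(units, 1000))
print(dispatch(units, 1400))
```

As load falls from 1,400 MW back to 1,000 MW in this example, the combustion turbine drops out first, matching the rule that the most expensive generation is shut down first.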
EEI
Edison Electric Institute: a generating utility trade group.
Effective
Used casually in two contexts: technically effective and cost-effective.
RCM reserves the term to address cost-effectiveness and uses
"applicable" for technical effectiveness.
EFOR
Equivalent forced outage rate: forced outage rate adjusted to reflect the
equivalent effect of forced load restrictions. Defined exactly by NERC.
Empirical
Based on trial and error; experiential.
Engineering cause
The local cause responsible for failure. A bearing experiencing a wipe
with no lubrication has fretting, spalling, or wipe as the engineering
cause. A plugged filter, mixed-up round, or broken supply line is not
the engineering cause, even though any one of them could have been
a root cause. Engineers often refer to engineering failure causes as
root causes. I call them proximate causes. The confusion is due to the
multitude of different definitions in various standards.
EO
Equipment operator: one level up from a tender. The roving operator who
starts and stops most heavy equipment requiring local monitoring.
Typically a senior experienced operator with 10 or more years of experience.
EPA
Environmental Protection Agency
EPRI
(The) Electric Power Research Institute: a voluntary,
industry-sponsored research organization that performs most research
for the generating industry.
EQ
Environmentally qualified: a special class of (essential) nuclear equip-
ment required to perform shutdown and monitoring activity following
a hypothetical design basis accident. Typically, equipment with organic
and elastomeric materials that is susceptible to temperature aging.
These require special scheduled maintenance programs by law.
Event tree
A logic tree that shows the pathways from a primary event upward to a
final outcome. Used to identify contributors to overall outcome risk.
Evident
Evident failure: a failure in which the failed item should be evident to a
qualified operator performing their normal duties. Contrast with hid-
den failure: one that no one would be aware of except by monitoring or
performing special checks and tests.
FAA
Federal Aviation Administration
Fail safe
Fails in the safe direction or position, e.g., a fail-safe air-operated
control valve would "fail shut" if that were the safe direction. This is
how the feedwater-regulating valve in some fossil plants fails since it
avoids flooding the steam drum. Fail-safe valve positions include FAI
(fails as is), FC (fails closed), and FO (fails open), for example.
Failure Mode
Repetitive manner of failure intrinsic to design and application.
Failure finding
Testing to identify a hidden failure. Startup test of a redundant train,
Failure
Failure maintenance
Failure-based maintenance: maintenance based on a failed condition. A
type of corrective maintenance.
Failure mechanism
Failure mode and a cause.
Failure substitution
Substitution of a major failure with a minor one. Redesign to lower the
consequences of failure.
Fault tree
Logic tree of outcomes traced to primary events through system logic
modeling that enables the calculation of failure probabilities for certain
events.
Feeder
Equipment that feeds a continuous, controllable stream of product into
a process. A coal feeder, for example.
FERC
Federal Energy Regulatory Commission: a federal commission charged
with the regulation of interstate power sales.
Fishbone
Fishbone diagram: Ishikawa diagram.
Flash
For computer applications, memory set permanently in EPROM. Also
short for flashover.
Flashover
Electric plasma arc due to failed equipment or protective device. A
high-energy arc that can cause injury or fire. Commonly occurs in
switchyards, on switchgear, or in large motors and controllers, which
use medium voltage (2000-8000 V) electric equipment. Requires high
voltage due to the electric resistance of air. Once initiated, however,
requires another circuit interruption to terminate.
FMEA
Failure modes and effects analysis: systematic review of the ways that
equipment is expected to fail, and consequences of the failure. A qual-
itative enumeration of failure modes. Developed as a discipline in the
early 1960s as a technique to improve reliability. Made into a Mil spec
standard, and required as a part of some defense design proposals.
FMECA
Failure modes, effects, and criticality analysis: FMEA with criticality
calculation based on standard published component reliability figures.
FMECA extends FMEA to a numerical basis. This in turn allows tech-
niques such as reliability allocation to be used as a design tool.
FOR
Forced outage rate: forced outage hours / (forced outage hours +
service hours), expressed as a percentage. A NERC reliability measure.
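As a worked example of the forced outage rate arithmetic, taking the rate as forced outage hours divided by the sum of forced outage hours and service hours, the sketch below computes it for one unit. The hour figures and function name are hypothetical.

```python
def forced_outage_rate(forced_outage_hours, service_hours):
    """Forced outage rate in percent: FOH / (FOH + SH) * 100."""
    exposure = forced_outage_hours + service_hours
    if exposure <= 0:
        raise ValueError("unit has no forced outage or service hours")
    return 100.0 * forced_outage_hours / exposure


# Hypothetical year: 200 forced outage hours against 7,800 service hours.
rate = forced_outage_rate(200.0, 7800.0)
print(f"FOR = {rate:.1f}%")
```

Planned outage and reserve shutdown hours do not enter the calculation, so a unit that runs little can show a high rate from only a few forced outage hours.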
Fossil
Fossil-fired boiler or combustion turbine fueled with fossil fuel.
Function
Output(s) provided by a system.
Functional failure
Loss of one or more system outputs. Loss of a system purpose.
GADS
Generation availability data system: a statistical reporting system oper-
ated by NERC that summarizes broad categories of electric generating
plant performance.
Gaitronix
Plant public address and personal communication system.(A trade name.)
GE
General Electric. Supplier of power plants, turbines, and electrical
equipment.
Graybeard
Experienced personnel who know the ropes. Usually with 20 or more
years of experience.
Gun Deck
Perform superficially while completing the paperwork to perfection.
GUI
Graphical user interface: a screen interface that uses a mouse to isolate
and execute commands.
Gyrol
(slang) A soft-start single speed transmission between a motor and a
load. Used on large coal belts. Named for a manufacturer.
Hard time
Time-based, not condition-based. Sometimes used for emphasis to
indicate activity that could be worked as OCM, but is left at hard time
due to program maturity, or intent.
Hard wired
Unchangeable, unmodifiable. (Slang) Impossible to mess up. Contrast
with jumpers.
Heat rate
Heat required to generate a kWh of load. Literally, the efficiency of
the plant to convert fuel to generation. The inverse of efficiency.
Typical fossil range is 6,000-12,000 BTU/kWh.
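The inverse relationship between heat rate and efficiency can be made concrete. The sketch below assumes heat rate is expressed in BTU per kWh, the conventional unit, and uses the thermal equivalent of 1 kWh, about 3,412 BTU. The sample heat rates and function name are illustrative only.

```python
BTU_PER_KWH = 3412.14  # thermal energy equivalent of one kWh


def efficiency_from_heat_rate(heat_rate_btu_per_kwh):
    """Thermal efficiency as a fraction, given heat rate in BTU/kWh."""
    if heat_rate_btu_per_kwh <= 0:
        raise ValueError("heat rate must be positive")
    return BTU_PER_KWH / heat_rate_btu_per_kwh


# Illustrative heat rates spanning the typical fossil range.
for heat_rate in (6000.0, 10000.0, 12000.0):
    eff = efficiency_from_heat_rate(heat_rate)
    print(f"{heat_rate:7.0f} BTU/kWh -> {100 * eff:.1f}% efficient")
```

A lower heat rate means less fuel burned per unit of generation, so improvements show up as the number going down.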
HEU
Hydraulic equipment units: a hydraulic equipment skid.
Hidden
Not evident to operators performing their normal routine duties.
Hidden failure
Opposite of evident. Describes a failure not normally evident to the
operator without special instrument and test.
House
Main plant.
HRSG
Heat recovery steam generator: steam generator used for combustion
turbine heat recovery.
HTGR
High temperature gas reactor: a nuclear reactor cooled by helium. The
retired Peach Bottom 1 and Fort St. Vrain plants were these types of
plants.
Hydro
Hydroelectric
I&C
Instrument & control
IEEE
Institute of Electrical and Electronics Engineers
Ignitor
Device used to ignite a fuel stream in a boiler, usually oil or pulverized
coal. An electric ignition source (spark plug) and fuel (usually gas or
oil) that assure a flame for combustion.
Important
Worthy of consideration for scheduled maintenance. A classification
used by EPRI for streamlined RCM.
In service
A device during its useful life. Typically used to describe equipment,
train, or plant that is operating, or capable of being operated.
Infant mortality
Failures shortly after entry into service due to quality, defect, and other
latent causes that drop as service life increases. For electronics, this used
to be the basis for burn-in of equipment.
Inherent capability
The equipment's intrinsic service capability related to design.
Inherent reliability
The reliability level supported by the intrinsic design. An upper limit
to the performance reliability of a plant or equipment.
In-op
Inoperable
Inoperable
(Nuclear terminology) Not capable of performing design-basis functions.
A system can be operating, yet be inoperable because certain accident
or other scenario design assumptions can't be met. Based upon
technical specifications, inoperable status can force a plant to shut down
until operability is reviewed and assured. (Secondary) Unevaluated for
operability; standing by until a qualified person (usually an engineer)
evaluates and declares the equipment capable of performing its design
function. A real or virtual equipment status category.
INPO
Institute of Nuclear Power Operations: a self-regulated nuclear industry
oversight body originated by the ANI after TMI. A quasi-regulatory
nuclear body performing many self-oversight functions.
Instruments
Devices that make a transducer conversion of condition to human-read-
able format. Commonly visual, but occasionally acoustic and other for-
mats.
Interlock
A device that prevents one device from operating until other requirements
are met. These are usually based on personnel protection, machine
protection, or both. Sometimes the protection of the general public is a
factor. For example, mid-1970s vintage cars had seat belt ignition
interlocks. The engine wouldn't crank until the seat belt was fastened.
These were eventually eliminated based upon public outcry.
Interlocks
Devices that prevent undesirable equipment operations. For instance,
IPP
Independent power producer. A producer outside the generating and
transmission company's jurisdiction but who has the right to sell power
to the generating company's grid on an economic dispatch model.
ISO 9000
An international standard, widely adopted in the European common
market, requiring process mapping and certification. A certification
standard that assures basic process controls are in place for production
manufacturing.
Jumpers
Temporary power or control cables that defeat the purpose of a control
device, including interlock. The purpose of a jumper is to temporarily
defeat a control or interlock to facilitate maintenance, or bypass a fail-
ure.
KISS
Keep it simple, stupid: a military term.
LAN
Local area network: a shared computer or microprocessor network.
LCM
Life-cycle maintenance
LCO
Limiting condition for operation: the technical specification limit that
requires shutdown or entry into a grace period when exceeded. A grace
period, when expired, must be followed by appropriate action (often
shutdown) if the condition can't be (or hasn't been) corrected. A risk
management tool for nuclear power plants.
Learning
A process that reduces the time and cost to perform an activity. Used in
manufacturing to represent the general improvement in design and cost
as products enter and proceed through a production life cycle.
Legacy systems
Existing, company-developed systems.
Life-cycle
The progression of a product from introduction through production
and into obsolescence. It occurs over many years.
Life-cycle maintenance
A maintenance plan with the overall product life-cycle strategy in mind.
Life-cycle cost
Total cost throughout the product life cycle including disposal costs.
Often the initial cost is the driving factor in a purchase decision. Like
the buyer of a new European sports car, the owner may find that the
total operating costs far outweigh the initial cost.
Like-for-like
Like-for-like replacement: exact replacement. In contrast with replace-
ment by an improved or superior part. A replacement that minimally
maintains performance.
Living
Ongoing, changing, and evolving.
Loaded cost
Costs including overhead charges that may not be applied at the plant
level. Overhead charges pay for staff and corporate services not ordi-
narily charged directly at the plant level.
LTA
Logic tree analysis: an RCM decision analysis for classifying failures by
type. This in turn influences the maintenance strategy selected. One of
the more confusing aspects of traditional RCM for new users.
Lubrication
A process of replenishment of aging lubricants that are a part of the
equipment.
LWR
Light water reactor: the types of commercial nuclear plants licensed in
the United States. They contrast with heavy water reactors, such as
the Canadian CANDU reactors.
Maintain
To preserve in an orderly state.
Maintainability
The capacity to maintain equipment. The design consideration of main-
tenance to provide access, turnaround, tools, and other support
requirements to facilitate maintenance.
Maintenance
(as defined by 10CFR50.65) The aggregate of those functions required
to preserve or restore safety, reliability, and availability of plant struc-
tures, systems, and components.
Maintenance rule
(10CFR50.65) An NRC (federal) rule that requires nuclear plants to
monitor maintenance performance for in-scope structures, systems, and
components and their safety functions, and to take appropriate corrective
action. Informally called the maintenance rule.
Maintenance strategy
A plan for the maintenance of a component or equipment in an RCM
format using a combination of CNM, OCM, TBM, OCMFF, and the
resulting CDM. NSM is the null strategy.
Markov analysis
A type of conditional probability analysis used for the prediction of suc-
cessful events with preconditions. One use is the likelihood of starting
emergency diesel generators after faults.
Mechanism
See: Failure mechanism.
Metal clad
Metallic cladding applied to some nuclear fuel types for protection. For
many fuels, the primary fission product barrier that restrains fission
gases from release to the environment.
MIL spec
Military specification. A military standard derived from U.S. DOD
equipment procurement specifications. MIL-STD-2173 (AS), for example,
addresses provision for FMECA reliability analysis for procurement.
Mill
Coal mill. A machine that pulverizes coal to a fine combustible dust,
mixing it with air in the process.
MIS
Maintenance information system (See: CMMS).
Mixed waste
Waste that includes both radioactive material under the jurisdiction of
the NRC and hazardous material under the jurisdiction of the EPA.
Mod
Design modification: a change to a plant's fundamental design.
Sometimes as simple as a part upgrade, or as complex as replacing a
precipitator with a bag house. Most fall somewhere between these
extremes, and many times won't be recognized as a design change (in
non-nuclear applications).
Morpholine
A volatile organic chemical used for treating feedwater where inorganic
chemicals aren't acceptable.
MORT
Management oversight risk tree
MPFF
Maintenance preventable functional failure
mREM
One thousandth of a REM: the common practical measure of exposure.
Typical radiation jobs incur several mREM of exposure. Big jobs incur
tens of mREM; bigger jobs, correspondingly more. Typical limits are 40
mREM/week.
MSG-3
Maintenance steering group standard-3: commercial aviation mainte-
nance standard for RCM-based maintenance programs maintained by
the Air Transport Association (ATA).
MTBF
Mean time between failure: the average period between failures for a
failure mode.
MTTR
Mean time to repair: the average time to restore a failed component to
service for a given failure mode.
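MTBF and MTTR can be estimated from simple failure records, as the sketch below suggests. The steady-state availability relation, MTBF / (MTBF + MTTR), is the standard textbook formula rather than something this glossary defines, and all function names are hypothetical.

```python
from typing import Sequence


def mean_time_between_failures(operating_hours: float, failure_count: int) -> float:
    """Average operating time between failures for one failure mode."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return operating_hours / failure_count


def mean_time_to_repair(repair_hours: Sequence[float]) -> float:
    """Average time to restore a failed component to service."""
    if not repair_hours:
        raise ValueError("at least one repair record is needed")
    return sum(repair_hours) / len(repair_hours)


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability implied by MTBF and MTTR (same time units)."""
    return mtbf / (mtbf + mttr)
```

For example, four failures in 1,000 operating hours give an MTBF of 250 hours; with an MTTR of 20 hours, the implied availability is about 93%.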
Multiple failure
More than one concurrent failure. In contrast with single failure.
MW
Megawatt: 1,000 kilowatts. A city of one million people has an electric
demand of about 1,000 megawatts, or 1,000 watts per capita. This is the
output of a relatively large two-unit generating station of 500 MW each.
This is a common standard.
NDE
Non-destructive examination: evaluation of a material condition such as
welds without destructive examination. Uses methods such as radiography,
ultrasonic inspection, and replication to assess condition.
Near miss
An event which breaches several levels of protection, usually leaving
one remaining fault barrier.
NEC
National Electrical Code
NERC
North American Electric Reliability Council. A voluntary organization
that sets standards and rules for interconnected transmission system
generating station requirements to assure reliability of the transmission
system. The country is divided into 10 contiguous interconnected
regions. NERC regional committee members take responsibility for
meeting requirements that assure transmission system reliability, such as
establishing and maintaining rotating reserve and standby reserve
requirements. These are generating units immediately available or
available on short notice to come online to meet contingencies. NERC
also measures the overall reliability of member units. Whenever a unit
is brought online or removed from service, the plant records the nature
of the status change and cause.
NFPA
National Fire Protection Association
NRC
Nuclear Regulatory Commission
NSM
No scheduled maintenance. A plan of using condition monitoring to
wait for a maintenance requirement to become evident. Legitimacy is
based on theoretical and actuarial studies that form the basis for relia-
bility centered maintenance.
O&M
Operations and maintenance
OCM
On-condition maintenance. The first check/inspect part of an on-con-
dition/condition-directed maintenance pair. A combination of task,
limit, and performance interval.
OEM
Original equipment manufacturer: the original supplier. Contrasts with
the secondary market or after-market supplier.
Old hat
Graybeard. A very experienced person.
On-condition
A scheduled maintenance activity with a specific monitoring method
and failure resistance limit identified that experts have agreed detects
resistance to failure. Upon exceeding this limit, resistance to failure has
declined so that failure will occur. Equipment is removed from service
for maintenance at this point. It can be as simple as measuring the
thickness of remaining tread on a tire or as complex as modal analysis
for vibrations. The key concepts are resistance to failure and explicit,
repeatable limits that trigger condition-directed maintenance.
On-condition/condition-directed pair
A two-part maintenance activity that is unique to RCM.
OOS
Out of service.
Operate
To run, maintain, and dispatch production from a facility. To exercise
discretion over the asset to generate income.
Operationalize
Make useable in an operating environment. For example, a standard
must always be operationalized. This process develops the infrastructure
and ensures that the utility can make an activity work in a production
environment.
Operator
The owner-operator entity. More commonly, a person charged to mon-
itor, configure, and report plant conditions, working on shift.
OSHA
Occupational Safety and Health Administration
OTF
Operate to failure. A characteristic of RCM over-emphasized in aftermarket
books. "No scheduled maintenance" or "maintenance required
on an interval exceeding the asset's useful life" is a more appropriate
term in the author's opinion. See also NSM.
Outage
A scheduled production down period to facilitate maintenance. Outage
maintenance is comprised of on-condition, time-based, and condition-
based maintenance. For nuclear units, this also provides the window
to refuel the reactor for American BWRs and PWRs.
Overhaul
To rebuild by teardown, reassembling with new consumable parts, and
reworking all parts and components to a like new condition.
Terminology applied to any large complex piece of equipment, from
diesel engine to turbine disassemble/reassemble work.
Pareto
Vilfredo Pareto, Italian mathematician and statistician. Pareto demonstrated
the statistical presentation of data in block chart format by frequency
and expressed an early version of the now trite 80/20 rule that
summarizes skewness often present in statistical data.
Pareto chart
Data presentation in block chart format ordered by frequency, most
frequent to least.
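The ordering behind a Pareto chart can be sketched in a few lines. This is an illustrative example only; the function name and the sample failure causes are hypothetical, and the cumulative percentage column is a common addition rather than part of the definition above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pareto_table(failure_causes: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (cause, count, cumulative percent) rows, most frequent first."""
    counts = Counter(failure_causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # most_common() yields causes ordered from most to least frequent
    for cause, count in counts.most_common():
        cumulative += count
        rows.append((cause, count, 100.0 * cumulative / total))
    return rows


# Hypothetical work order history: one entry per recorded failure
history = ["tube leak"] * 5 + ["vibration"] * 3 + ["breaker trip"] * 2
for cause, count, cumulative_pct in pareto_table(history):
    print(f"{cause:14s} {count:3d} {cumulative_pct:6.1f}%")
```

Plotting the counts as bars in this order gives the block chart; the cumulative column shows how quickly a few causes account for most failures.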
PC
Primary containment
PCRV
Pre-stressed concrete reactor vessel
PdM
Predictive maintenance
Peak load
Load added only as demand requires. Since demand varies by hour,
day, week, and season, some units will be started and stopped with
demand according to economic dispatch rules. These units are loose-
ly termed peakers and largely comprise hydro, combustion turbine,
gas-fired, and a few coal-fired boilers. Nuclear units are not peakers.
Peaker
A plant used to supply system peak load periods. Gas turbines, gas-
fired boilers (sometimes re-powered coal-fired units), pumped storage,
and diesel may comprise units in peaking service. Usually not econom-
ically dispatched until all base load is available due to high fuel cost.
Permissive
Logic permissive: a control scheme that must be completed for an
action to be permitted.
P&ID
Process and instrumentation drawings: design drawings supplied with
the plant (along with vendor O&M manuals) by the A-E to aid in performing
plant maintenance and modification.
Pillow block
A bearing housing shaped literally like a pillow. Often installed as a sep-
arate assembly for large equipment like coal belts.
Planned maintenance
Prepared maintenance plans for equipment that requires repetitive
maintenance. May include standard clearance points, parts, tools, and
resources such as labor and contractors. Planned maintenance is made
up of scheduled, on-condition, condition-directed, and some condition-
based maintenance.
PM
Preventive maintenance: planned scheduled maintenance activity. Also,
slang for PM work orders. Also, a scheduled maintenance program.
Sometimes used to refer to the discretionary part of the scheduled maintenance
program in a regulatory environment.
PMO
Preventive maintenance optimization: a maintenance optimization
process that streamlines RCM. Primary advantage is simplification of
paperwork and coding based on the LTA of TRCM.
Poka-yoke
Make user-friendly and simple: a Japanese term summarizing a technique
that stresses task simplification to remove or diminish the possibility
of error. Poka-yoke devices are devices that serve the same purpose.
Population
Statistical population
Pot
Potentiometer: a variable resistor often installed in instrument loops to
facilitate calibration.
Potential failure
A failure that is imminent based upon exceedance of a failure resistance
standard. Examples: High-pressure boiler tube wall thickness less than
0.025 inches, machine vibration amplitude in excess of that machine's
specified limit (say 5 mils at 1800 RPM).
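The idea of an explicit, repeatable limit that flags a potential failure can be sketched as below. This is a minimal illustration using the two example limits from the entry; the class and field names are hypothetical and are not drawn from any particular CMMS.

```python
from dataclasses import dataclass


@dataclass
class OnConditionLimit:
    """A failure resistance limit for one monitored parameter."""
    name: str
    limit: float
    fails_when: str  # "below" or "above" the limit

    def is_potential_failure(self, reading: float) -> bool:
        """True when the reading has crossed the limit."""
        if self.fails_when == "below":
            return reading < self.limit
        if self.fails_when == "above":
            return reading > self.limit
        raise ValueError("fails_when must be 'below' or 'above'")


# Limits taken from the examples in the entry above
tube_wall = OnConditionLimit("boiler tube wall thickness (in)", 0.025, "below")
vibration = OnConditionLimit("vibration amplitude (mils)", 5.0, "above")

print(tube_wall.is_potential_failure(0.020))  # thinner than the limit
print(vibration.is_potential_failure(4.0))    # within the limit
```

A reading that crosses the limit would then trigger the condition-directed half of the maintenance pair.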
PRB
Powder River Basin (coal): a distinctive Western low-sulfur coal characterized
by low heating value, high volatility, and dust. Widely used based
on low sulfur content and price.
Precursor
Precursor event. An event that predicts susceptibility for ("precurses")
a more severe event. An event that predicts future events of the same
nature, but more severe in consequence. A leading risk indicator.
Predictive maintenance
Maintenance to diagnose conditions and predict future maintenance
requirements. Gradually being supplanted by condition-directed and
on-condition terminology.
Premature failure
Failure prior to planned end-of-life.
Premature removal
Removal from service ahead of schedule due to unsatisfactory service,
or selection as part of an age-exploration sample for new equipment.
Present value
Present value of money: total cost adjusted by discount factor and time.
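The adjustment by discount factor and time can be sketched as follows. This is a minimal example assuming annual compounding at a constant discount rate, with costs falling at year-end; the function names are hypothetical.

```python
from typing import Sequence


def present_value(future_cost: float, discount_rate: float, years: int) -> float:
    """Discount a single future cost back to today (annual compounding)."""
    return future_cost / (1.0 + discount_rate) ** years


def life_cycle_present_value(
    initial_cost: float, annual_costs: Sequence[float], discount_rate: float
) -> float:
    """Initial cost plus the discounted sum of costs in years 1, 2, 3, ..."""
    discounted = sum(
        present_value(cost, discount_rate, year)
        for year, cost in enumerate(annual_costs, start=1)
    )
    return initial_cost + discounted
```

At a 10% discount rate, a cost of 110 incurred one year from now has a present value of 100. The same discounting applies when comparing life-cycle costs of alternatives with different initial and operating costs.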
Preventive maintenance
Traditional term for scheduled maintenance (10CFR50.65). Predictive,
periodic, and planned maintenance actions taken prior to failure to
maintain SSC within design operating conditions by controlling degra-
dation or failure.
Primary failure
The immediate failure. The first failure. A tire blowout causing an acci-
dent is the primary failure. See also secondary failure.
Process
A defined way in which something is transformed. A process must have
specified inputs/outputs and processing technology.
Profound knowledge
Intrinsic, hard-to-replicate knowledge of a process. Almost always pro-
prietary, whether by formal intent or functional cost to extract. Often
the basis for competitive advantage. Term coined by W. E. Deming.
Proximate cause
The immediate, local cause. Not necessarily a root cause (based on
RCFA), but known in engineering circles as root cause. The apparent
cause of failure. The cause evident at the failure location.
PUC
Public Utility Commission: the government entity that regulates the tra-
ditional utility environment in the public interest. Also known as PRC,
RC, and other acronyms.
Puff
Low-pressure explosion, large enough to damage large low-pressure
boiler walls, ducts, and mills. Overpressure on the order of several
inches of water. Because of the limited extent of overpressurization,
commonly called a "puff" to designate its minor nature.
Pulverizer
Coal mill
PWR
Pressurized water reactor
RAM
Reliability, availability, and maintainability. A type of analysis that looks
at the total reliability of a system and factors in maintenance turnaround
time. Developed for aerospace and other high-cost applications such as
weapons programs to establish theoretical baselines for performance
expectations.
RCFA
Root cause failure analysis. Analysis to uncover root causes of prob-
lems. There are approximately 10 different root cause techniques.
They can be further delineated into stochastic and statistical groups.
It's important to understand the RCFA context. Nuclear units, for
example, don't use statistical RCFA.
Redundant
(Webster's) More than enough. Excess. Important safety, operational,
instrumentation, and other features are provided with redundancy in
engineering designs to assure their availability. Anyone who's ever
gazed into the cockpit of a commercial airliner has seen the four-fold
redundancy of critical instruments such as altimeter, direction, and roll.
Redundant equipment is provided in duplicate and triplicate precisely
because the function is critically required.
Regulate
Broadly speaking, from a process control perspective: control. For
transmission and distribution control, regulate describes the remote
operation of a generator to provide instantaneous load following. Some
power plants are reserved for base loading, principally nuclear units
and very large fossil units, which are hard to start up and shut down or
which don't follow load well. Typically, hydro, small fossil, and combustion
turbine peakers are used for regulation. They follow the load
through the course of the day, adjusting for instantaneous load changes.
Reliability
The ratio of successful missions to total trials. The degree to which an
operating unit meets the expectations of the operating entity (usually
owner) between scheduled down periods. The expectation that the
SSC will perform its function upon demand at any future instant
(10CFR50.65).
REM
Roentgen equivalent man: the primary measure of radiation dose exposure
used in the nuclear industry in the 1980s. Superseded by dose
equivalents in sieverts (Sv), where 100 REM = 1 Sv.
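The unit conversion can be sketched as below, using the relation 100 REM = 1 Sv. The function names are hypothetical.

```python
REM_PER_SIEVERT = 100.0  # 100 REM = 1 Sv


def rem_to_sievert(rem: float) -> float:
    """Convert a dose equivalent from REM to sieverts."""
    return rem / REM_PER_SIEVERT


def mrem_to_millisievert(mrem: float) -> float:
    """Convert millirem to millisieverts (the same factor of 100 applies)."""
    return mrem / REM_PER_SIEVERT
```

For example, 40 mREM corresponds to 0.4 mSv.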
Repair
Restore to specifications using welding or other processes. More than
rework/replace. Typically involves certified personnel and testing to
assure specification.
Replace
A rework task where a like-for-like part replacement is performed.
Commonly applied to filters, lubricating fluids, greases, and other con-
sumables that require repetitious service during equipment operation.
Restore
Return to exact specification. Most PM work is technically of a restora-
tion nature.
Rework
Unnecessarily perform work again. A job may require rework due to
infant mortality failure (relatively common in power applications),
unserviceable parts, failed performance tests, or (occasionally) lack of
appropriate documents and certifications (nuclear and aerospace
work). For fossil, bad welds are typical of a rework requirement.
Risk
What can happen (scenario), its likelihood (probability) and its level of
damage (consequences).
RO
Reverse osmosis: a common type of water makeup train purifier.
Root cause
The basic cause. In traditional RCFA, a cause that, once removed, pre-
vents recurrence (a stochastic perspective). This context is different
from a root cause on an Ishikawa diagram, which takes a probabilistic
perspective.
Round
A scheduled activity that checks a large part of a facility. Generally
comprised of a series of area checks with intermittent on-condition
checks. Also, a scheduled review of screens on DCS systems.
Round sheet
A round logsheet, updated in real time by round logging devices. A
sequence of readings currently being superseded by round logging devices.
Run to failure
A misnomer: a term intended to summarize the "no scheduled maintenance"
aspect of many planned maintenance tasks. Misleading because
virtually none of these tasks result in functional failure. See also NSM.
Schedule
Enter an activity into a scheduling system.
Secondary failure
Indirect failure. Failure that results from a primary failure. A boiler
tube leak caused by steam cutting from another tube leak is a secondary
failure.
Service
Maintain.
Shifter
Shift supervisor, called operating engineer at some plants.
Significant
Equipment that has either a safety or economic impact, thereby war-
ranting review for potential PM benefits.
Simple item
An item characterized by very few failure modes. A relative term.
Review of the failure history of a simple item results in very few repeti-
tive failure modes recurring with great frequency. A filter and a journal
bearing are two examples of simple items. The building block for complex
items. Contrast with complex item.
Single failure
One concurrent failure. A simple failure to diagnose and correct, in
contrast with a multiple failure.
Six Sigma
A quality goal based on the reduction of failure frequency to no more
than 3.4 defects per million opportunities.
Smoke
To destroy something. Occasionally from overload, continued use in
failure, or abuse. Smoking a motor, breaker, or starter are examples.
SNAFU
Situation normal all fouled up: a fiasco on a large scale. An organiza-
tional mix-up.
SP
Surveillance program: a planned functional test program made up
largely of on-condition and on-condition failure finding tasks used at
nuclear power plants to verify the functional capability of standby, protective,
and alarm equipment.
SPC
Statistical Process Control. A statistical study of processes that provides
measures of process capability and control. Widely used in manufac-
turing of high-quality products. Advocated for floor-up quality control.
SRCM
Streamlined RCM: an abbreviated version of RCM that simplifies RCM
using a two-path critical/non-critical approach. Developed by EPRI.
SSC
(10CFR50.65) Structures, systems, and components.
Startup
Plant startup. A special period lasting from several minutes (for fast-start
combustion turbines) to several days (for large baseload coal and
nuclear plants) that requires manual intervention, reconfiguration, and
direct support to place the plant in an operating phase.
Stroke
Operate, or test operation of, as in "stroke a valve."
Substitution
Replacement of an OEM part with an aftermarket one.
Super session
A super-session resembles MS Windows in which multiple applications
can be kept running so the user can jump between applications without
need to formally shut down and restart applications. Early applications
could only run one per terminal, like some DOS-based PCs even
today. This is a significant productivity tool.
SWOT analysis
Strengths, weaknesses, opportunities, threats: a type of subjective risk analysis.
Synthetic
Synthetic oil: chemically constructed lubricants, in contrast with dis-
tilled column fractionated oil common in traditional lubricants. Such
lubricants generally possess superior qualities, but at a price.
Synthetics cost four or more times as much as specified traditional
lubricants.
System
A defined equipment group that performs a specific set of functions.
Usually, the A-E's plant documents provide a list of all plant systems,
their major equipment, functions, and expected operating conditions.
Very often all the related CIC lists and vendor O&M manuals are pro-
vided in binders (1970-1980s vintage units) that organize all informa-
tion about a plant. They are retained along with A-E design drawings
in document centers or shops for reference during maintenance.
Tagno
Tag number; same as CIC.
Tag outs
An equipment control technique that facilitates work on equipment in
an operating plant. Also known as clearance. A controlled technique
to separate energy from equipment to perform work.
Task
A single activity with a failure prevention aspect. The basic building
block of a PMWO. Usually, a PM consists of enough tasks to make
effective use of operator trip time to and from the work location. Many
simple tasks require 10 to 15 minutes to perform and are listed in ven-
dor manuals. Invariably, tasks must be organized into larger work activ-
ities or rounds to be done cost effectively.
TBM
Time-based maintenance. Roughly equivalent to hard time maintenance
with one slight distinction: the on-condition part of a two-part
on-condition/condition-directed maintenance pair can be considered
as time-based. It's scheduled off the same software system as the
TBM task activities and looks virtually the same from a scheduler per-
spective.
Tech spec
Technical specifications. All equipment has technical specifications
used for reference in performance testing for deterioration. Nuclear
plants also have technical specifications that provide a basis for operat-
ing licenses. They must operate within these specifications or shut
down.
Tender
Job title for the lower seniority operators who roam the plant's service
and outside areas, monitoring, servicing, and configuring equipment.
Time card
A charge for time, typically made against an activity or account. In some
CMMS systems, a time card and a work order are combined for mainte-
nance workers.
TMI
Three Mile Island. The Pennsylvania nuclear plant whose trip and shutdown
in March 1979 set the nuclear industry spinning from the adverse
publicity and cost. The most serious commercial nuclear power plant
event in North America.
Total cost
Total life cycle cost. The total cost of operations, as distinct from oper-
ating and maintenance (O&M) cost, or startup or installation cost.
TPM
Total productive maintenance
TQM
Total quality management: a field of quality management that received
a great deal of promotion in the 1980s as traditional manufacturing
faced competitive pressure from overseas suppliers.
TRCM
Traditional RCM
Trip
Automatic or manual shutdown of a piece of equipment. An operator
can manually trip a turbine, or a breaker could trip on a ground fault
protection relay.
Tripper
Tripper belt. The coal unloading belt, the last of a series of belts that
moves coal from a railroad unloading point (often a rotary dumper) to
the house (the plant).
UAL
United Airlines
Unit
Generating unit. One increment of generating capacity at a plant.
Useful life
Economically useful life. The period of time when an item can be
expected to operate with predictable cost and performance.
VAR
Volt-ampere reactive: in power flows, this portion of power provides voltage
but does no work. It's necessary to support the voltage in transmission
and distribution systems.
Violation
A citation for violation of an article of law. Common jargon used in the
nuclear industry for citations under 10CFR50 and related parts of the
Code of Federal Regulations. Very much subject to interpretation and established
precedent. Becoming common usage at fossil plants as EPA, OSHA,
DOT, and other agencies spread their wings.
VOM
Volt-ohm meter
VWO
Valves wide open: for a turbine, the maximum practical load that can
be placed on a machine.
Walk around
Area check: A tour looking for general failure evidence or environmen-
tal factors. Sometimes performed by management or non-routine per-
formers at plants.
Wear
To impair, consume or diminish by constant use, handling or friction; to
tire or exhaust.
Wearout
Fail gradually, with degrading performance allowing a long period to
evaluate alternatives and options. Similar to non-failures except that
performance specifications and expectations aren't met. Gradual performance
deterioration until further service is no longer cost-effective.
Weibull
A specific mathematical distribution named after Waloddi Weibull, who
first used it extensively to model failure with age, infant mortality, and
randomness characteristics.
Weibull analysis
An analysis of failure data to fit the observed measurements to a Weibull
distribution. This can be done with specific Weibull analysis paper or
using software.
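How one distribution covers infant mortality, random, and wearout behavior can be sketched with the two-parameter Weibull form. This is an illustration under the usual textbook parameterization (shape and scale); the function names are hypothetical, and fitting the parameters to failure data is left to Weibull paper or software as noted above.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving to age t: R(t) = exp(-(t/scale)**shape)."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be non-negative; shape and scale positive")
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at age t (t must be positive)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape, and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


# Shape below 1: failure rate falls with age (infant mortality).
# Shape equal to 1: constant failure rate (random failures).
# Shape above 1: failure rate rises with age (wearout).
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, shape, 1000.0)
    late = weibull_hazard(900.0, shape, 1000.0)
    print(f"shape={shape}: hazard at 100 h = {early:.6f}, at 900 h = {late:.6f}")
```

The shape parameter is what an analyst reads off a fitted plot to decide which of the three failure patterns the data most resemble.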
WO
Work order: a work authorization that conveys not only information
WSCC
Western Systems Coordinating Council: the region of NERC covering the
Western states. One of 10 NERC regions.
Xerox-style benchmarking
Process benchmarking. More complex than traditional benchmarking
because the process is also examined and compared.
Zerks
Grease fittings. After the manufacturer trade name.
Zero time
To reset the component aging clock to zero after an overhaul. To make
the item statistically indistinguishable from new based on mission
goals, performance, and failure criteria.
Zonal inspection
Inspection of an area or zone. Typically includes environmental con-
ditions, leakage, and other non-specific conditions that an experienced
person is expected to know. A pre-flight walk-around aircraft check is
a zonal inspection.
Further Readings
No Silver Bullets
RCM is not a silver bullet. Ultimately, improved performance comes
from better maintenance selection, timing, and performance. RCM helps
with selection and timing and provides tools to raise awareness. Both timing
and performance benefit from heightened equipment awareness. Timing
improves first; maintenance performance, later.
As maintenance programs improve, two things become evident.
Crises decrease, but maintenance costs run higher. As crises decrease,
overtime, low productivity work, material parts expense, and service
expenses fall. After a year (long enough to capture secondary cost
factors), production unit costs start to fall. More "mega-wiggles" are
produced, so unit costs drop. This decrease in unit production cost due
to increased availability is a major benefit.
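The link between availability and unit production cost can be illustrated with a short calculation. This is a sketch only: it assumes the unit runs at full capacity whenever it is available, and the function name and any figures used with it are hypothetical rather than plant data.

```python
HOURS_PER_YEAR = 8760.0


def unit_production_cost(
    total_cost: float,
    capacity_mw: float,
    availability: float,
    period_hours: float = HOURS_PER_YEAR,
) -> float:
    """Cost per MWh, assuming full-capacity output whenever available."""
    if not 0 < availability <= 1:
        raise ValueError("availability must be in the range (0, 1]")
    energy_mwh = capacity_mw * availability * period_hours
    return total_cost / energy_mwh
```

With total cost held fixed, raising availability from 80% to 90% spreads the same cost over more megawatt-hours, so the cost per MWh falls by about 11%.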
Long-term effects are an increase in worker productivity and a
decrease in maintenance costs. These changes take approximately 2-5
years. This time period allows the benefits of reduced overtime, better
quality work, and improved productivity to build up. The delay comes
from the need for measurements to roll up, as well as the time it takes
fundamental changes to influence machine performance. Expect 1-2
years to see an improvement in maintenance unit costs. For highly
reactive environments with high maintenance costs (such as those solely
focused on in-service failures), improvement can take place more
quickly, in as little as six months! Improvement is seen as a decline in
unbudgeted maintenance expense and a decrease in both total hours
and cost by system. In order to see the decrease, you must first have sys-
tem-level performance measurement.
Aggressive RCM implementation can increase short-term costs.
Implementation makes the staff aware of the weaknesses in measure-
ment and administrative systems. A measurement infrastructure must
then be developed. The need for increased infrastructure begins to
grow in other areas. Requests for productivity tools and cost-effective
modifications rise. Training needs are recognized and their requests
Missed PM
A nuclear plant declared a reactor core isolation cooling (RCIC)
pump inoperable based on a low lube oil level. On his rounds, an
operator dipped the sump and the level came up just below the
low-level mark. The obvious and simple thing to do was to add oil.
However, the plant was operating, and HP (Health Physics) was leery
about exposure risk for a simple PM. They held up the work order, put-
ting Operations in the awkward position of having to perform an oper-
ability assessment. Asked for my opinion, my initial reaction was, You
ALARA
An HTGR nuclear plant had intermittent control rod problems. As
the reactor engineer, I proposed PM to restore several of the nine inop-
erable spares to service level. We did not have any spare control rod
assemblies. Unfortunately, control rods are not only contaminated, but
activated, and potentially cause substantial exposure. Working on the
lower activated areas resulted in exposures of up to 10 millirem per
hour. The neutron absorbers were far too hot to work with directly.
Fortunately, assembly work was many feet away. For three years, Health
Physics (HP) held the PM work orders based upon ALARA (as low as
reasonably achievable radiation exposure). We could not perform
control rod assembly maintenance. Then an event occurred. The plant
scrammed during startup, and six control rod drives failed to insert.
The alert went up the corporate ladder to the president, since the plant
was under shutdown order. Suddenly, it was an all-out sprint to devel-
op and implement control rod-drive PM plans. Now HP was receptive.
We knew what needed to be done, but were woefully short of spare
parts. We puttered around developing work plans and failure information.
It was months before we could start work. When we did, the
exposures from control rod overhauls were between 20 and 500 millirem
per drive. The total overhaul project, as I recall, required a grand
total of around 100 man-REM over the course of one and a half years.
HP learned to tolerate control rod PM-related exposures. But HP was
still a maintenance work barrier. All work orders in this plant went
through HP, even work orders in the switchyard! On a good day,
walking a WO around for sign-offs took four hours. The fact that the
plant was the radioactively cleanest in the country and less than 1% of
work hours involved contamination or radiation spaces could not
influence this turnaround.
At one critical point, while rebuilding control rods, we reattached
the highly-radioactive neutron absorbers to the drive assembly. We
would figuratively burn up mechanics on their weekly administrative
radiation exposure limits. A few subtle points help explain why the
plant was eventually forced to shut down. First, HP and ALARA fun-
damentally did not recognize PM in their plan. They discounted any
work that was not a crisis. HP was much more receptive to broken
equipment maintenance directly supported by the plant manager. This
reflected the prevailing culture at the plant.
Second, most of the control rod drive failures were secondary failures.
The absence of a startup maintenance strategy on control rod
drives (and a host of other equipment) necessitated earlier overhaul
performance. For radioactive equipment, the longer an overhaul interval
can be stretched, the lower the man-REM exposure. HP ALARA, as
practiced, ultimately increased the life-cycle exposure for the plant.
Theoretically, PM warrants ALARA recognition. Most HP administrators
and technicians know little about maintenance. They do not trust
the maintenance supervisors and workers. After all, they cause 95% of
the HP workload, including contamination events! Work groups
reflect prevailing culture. This HP department approved work based
on a call from the plant manager. This was ultimately not competitive
for a commercial nuclear plant.
Optimizing radiation exposure and maintenance costs remains a
challenge at nuclear units today. Recently, HP concerns were again a
barrier to full-scope PM plan implementation. The nuclear world has
not improved in 20 years. One solution is to put the life-cycle mainte-
nance strategy on an RCM-based foundation and pre-approve planned
work. Condition-monitoring programs, on-condition maintenance,
fixed time maintenance, and condition-directed maintenance activities
can be reviewed and pre-approved by HP. ALARA should not be a bar-
rier to PM. ALARA is, in so many ways, simply another cost of doing
nuclear maintenance that has to be optimized in an overall plant con-
text. As with safety, ALARA concerns must be placed on a common
playing field. An activity avoided today that will incur a future expo-
sure ten times as great does not implement ALARA.
Instruments
Stages
Conditional Overhauls
Conditional overhauls specifically correct equipment failure, its cause,
any secondary failures, and nothing more. A conditional overhaul is an
opportunity for traditional workshops. The basic idea of only fixing the
obvious primary and secondary damage from a failure is widely used in the
commercial world. Aircraft turbines receive conditional overhauls. The
practice extends to automobiles, diesels, and other large
equipment such as power turbines. We conditionally overhaul equipment
at home. When an automobile engine fails due to a main bearing, we
evaluate the remaining life on the engine. Then we either perform a selective
complete bearing replacement (on a relatively new engine), or do bearings
and cylinders (on an older engine). If there is any ring or valve damage, we
fix that too, based on the equipment's age and our inspection. To apply
conditional overhauls, one must understand the time-based needs for the
equipment. Then, when a failure occurs, one must evaluate the failure
in the context of the equipment's age. Effort shifts to fixing the
failed equipment. This contrasts with a traditional shop practice of
tearing failed equipment down to be rebuilt up from the frame.
We lost a large compressor at a plant. We knew we had finishing stage
problems, but we lacked adequate staff to follow up on the job. Although
the nominal overhaul interval was four years, only 18 months had
elapsed. The assigned mechanic did a complete compressor teardown.
When it was complete, the only problem noted had been the premature
failure of the fifth stage compressor wheel. It was an expensive over-
haul at $250,000; on the other hand, fifth stage replacement ran around
$30,000.
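The compressor figures above can be put side by side in a few lines. This is a minimal worked comparison using only the two costs quoted in the text; the variable names are illustrative.

```python
# Worked comparison of the two compressor repair options quoted in the text.
full_teardown_cost = 250_000           # complete overhaul, as performed
fifth_stage_replacement_cost = 30_000  # conditional overhaul of the failed stage

avoidable_cost = full_teardown_cost - fifth_stage_replacement_cost
cost_ratio = full_teardown_cost / fifth_stage_replacement_cost

print(f"Avoidable cost: ${avoidable_cost:,}")
print(f"Full teardown cost is about {cost_ratio:.1f} times the conditional repair")
```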
In another example, a gearbox was lost due to a failed retainer. A
grinding noise had resulted in the early gearset shutdown. The mechanic
tore into the gearbox with no specific instructions other than to correct the
problems. We found a missing retainer immediately after opening the
gearbox, 10 minutes into a six-hour job. We proceeded through the entire
disassembly from one end to the other, although we suspected the missing
retainer was the sole problem. On completion, we confirmed this. Guide
bearing misalignment from the missing retainer was the sole problem. It
could have been replaced, the cover installed, and the entire assembly run
successfully even before we had finished our exploratory surgery. RCM
states that this partial rebuild strategy is sound and that we should use it.
This approach should be built into a facility's maintenance work process. Jet
engine actuarial studies showed no statistically significant difference in
performance between a conditionally overhauled machine and a completely
overhauled one.
This lesson is counter-intuitive to most shop thinking. The feeling is
that it cannot be good until you have looked the entire machine over.
Given that many shops implicitly direct mechanics to perform work as
they see necessary, conditional overhauls are a prime opportunity to
reduce low-value work. To practically implement conditional overhauls,
however, mechanics must recognize the difference between a conditional
and a full overhaul, committing to perform conditional overhauls when
appropriate. Many mechanics enjoy equipment work, especially tear-
down maintenance. This is why they are mechanics; they excel in their
jobs. Everyone in maintenance has to remember that performing
maintenance is not, in itself, the purpose. The purpose is to keep the
equipment functional with as few resources as necessary.
failure prevention tasks, the lesson is that intervals are extended too
conservatively. The use of condition monitoring, age exploration, and
other hedges can reduce the tendency to incrementally extend equip-
ment inspection intervals. A database of equipment components and
their failure modes is also helpful, as are benchmark intervals. A char-
acteristic of modern equipment is the combination of one or several
dominant age-based failure modes and an underlying complexity. The
composite exhibits mixed characteristics. A strategy of managing the
known aging failures with on-condition or time-based maintenance, as
appropriate based on certainty of aging and organizational capability,
combined with condition monitoring, maintains this equipment very
well. The challenge for operating organizations is to develop simple
standard applications of this strategy.
People lacking confidence and experience are uncertain. Sunday
analysts are squeamish as opposed to hands-on staff or experienced ana-
lysts. Databases and experience provide confidence. For economic fail-
ures, rapidly getting intervals to the age-based failure range is the only
way to learn what aging equipment failure modes are present, how long
they take to develop, and what applications they can support.
Maintenance Budgeting
To focus on managing maintenance costs, a plant or company must
have a motivating driver. The traditional utility environment has not
provided this focus. Some companies do not invest in productivity
growth. Operating budgets can influence how productivity grows.
In my experience, changes are made to the previous year's budget,
adjusted for corporate goals (minus 5-10%, whatever the corporate
accounting office desires). Take existing staff salaries and add an overtime
percentage (about 5%), services, and historical material costs. Budget
for non-routine events such as major scheduled outages. Adjust for
historical trends such as inflation. The resulting budget is last year's plus a
percentage. Then hope for the best!
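The cost-plus budgeting recipe just described can be sketched as a small function. This is an illustration only: the function name, the order in which the adjustments are applied, and all dollar amounts and rates other than the 5% overtime figure are assumptions.

```python
# Sketch of the cost-plus budget build-up described above.
# The order of adjustments and the sample figures are assumptions.

def cost_plus_budget(salaries, services, materials, outage_costs=0.0,
                     overtime_rate=0.05, inflation=0.03, corporate_cut=0.0):
    """Build a maintenance budget from historical costs plus adjustments."""
    labor = salaries * (1 + overtime_rate)       # salaries plus overtime
    base = labor + services + materials          # routine historical costs
    with_outages = base + outage_costs           # non-routine scheduled events
    inflated = with_outages * (1 + inflation)    # historical trend adjustment
    return inflated * (1 - corporate_cut)        # corporate goal (minus 5-10%)

budget = cost_plus_budget(salaries=1_000_000, services=200_000,
                          materials=300_000, outage_costs=500_000,
                          inflation=0.03, corporate_cut=0.05)
print(f"Budget: ${budget:,.0f}")
```

Nothing in this calculation rewards lower cost or better performance; it only carries history forward.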
In my years as manager, we wound up chronically over our mainte-
nance budget. A catastrophic year, such as one involving a turbine fail-
ure, could double budget expenses. How could budget overruns be
sustained year in and out? How could they be tolerated with no cor-
porate response? Companies have bought the farm on maintenance.
They accept budgets and expenditures as they occur since no one ever
figured out an alternative. Corporate offices presume that historical
performance adjusted for predictable events is the most reliable
predictor of budget performance. Corporate staffs manage catastrophic plant
outage risk by spreading it over all of the plants in the system,
effectively self-insuring. If a company owns 20 major generating units and
one major unbudgeted failure occurs per year (such as a $10-million
generator rewinding), it is buried in the $150 million of budgeted main-
tenance expenses for generation. At the fleet level, costs, including
catastrophes, are relatively predictable.
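The self-insuring effect can be checked with the figures in the example above. This is a minimal sketch; the equal split of the fleet budget across the 20 units is an assumption made only for illustration.

```python
# Fleet-level versus unit-level view of one major unbudgeted failure.
# Figures come from the example in the text; the equal split is assumed.
units = 20
fleet_budget = 150e6   # budgeted maintenance expense for generation
failure_cost = 10e6    # one generator rewinding

fleet_overrun = failure_cost / fleet_budget   # share of the fleet budget
avg_unit_budget = fleet_budget / units        # assumes equal budgets per unit
unit_overrun = failure_cost / avg_unit_budget # share of one unit's budget

print(f"Fleet overrun: {fleet_overrun:.1%}")
print(f"Single-unit overrun: {unit_overrun:.0%}")
```

At the fleet level the failure is under 7% of the budget; at the affected unit it more than doubles expenses.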
Corporate accounting departments can project likely expenditures
with certainty beyond formal budget submittals. This approach pro-
vides a simple cost-plus expense budgeting plan. It has no bias to
reduce costs. The approach presents barriers to promising new main-
tenance processes. It is averse to changes and to risks. It does not pro-
vide a return to plants for better production or maintenance perform-
ance. Emphasis is on existing staff and contract expenses, not innova-
tion. It fails to allocate money for long term improvements. In an envi-
ronment like this one, struggles over money for RCM, or any non-rou-
tine activity, will always exist. What are the ways to overcome these
mindset barriers?
innovation
standard measures
value focus
contingency budget reallocation
cate part of the plant loss contingency budgets for developing cost-
effective, innovating technology. They should also fairly address any
work loss from productivity changes with affected workers. Success
with RCM means this problem will be encountered. The competitive
generation market will stimulate change. How much and how quickly is
a matter of conjecture. The beauty of RCM is the profound maintenance
process knowledge it provides. For those willing to master the subject,
maintenance need not be a mysterious, budget-busting free agent.
Bigger Opportunities
Most plants adequately manage turbines and boilers. RCM benefits
come from in-between systems that are high in cost and potential pro-
duction impact, but without chronic outage causes. These systems often
include:
sootblowers
sootblowing air
flue gas and overfire air
ash handling
coal handling
coal milling
Savings are found at the knee of the Pareto system cost curve. (See
figs. A-1 and A-2. These are discussed earlier in the book.)
Prioritization rules
Change
Prioritization simplified
E MWRs
Overtime
hours
CNM
work hours
shift to planned work
percentage of work tasks completed as preplanned OCM/CDM pairs
preplanned CDM work tasks
increase.
Review List
Purpose: This checklist helps identify the availability of an effective
time-based maintenance scheduling system. This scheduling system sup-
ports the basic PM program foundation.
PM Health
Maintenance Process
Traditional CMMS work requests are based on noted problems.
This is the corrective maintenance model. Maintenance begins
design
materials
construction
environment
operation
Condition Monitoring
Most condition monitoring (CNM) maintenance is initiated
through operations. Condition monitoring is monitoring without
specific failure criteria. This is a double-edged sword. It can be hard
to rank, prioritize, and perform condition monitoring due to its
generality. In the absence of time-based and on-condition work order
categories, an organization can still measure its CNM-originated work,
based on the percentage of work coming from Operations. If Operations
originates 70% of the work orders, then about 70% of the work is no
scheduled maintenance (NSM). Scheduling, planning, and engineering
initiate most of the balance of outage, PM, and modification work.
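The originator-based measure described above amounts to a simple tally of work orders by originating group. The sketch below assumes a hypothetical export in which each work order is a dictionary with an "originator" field; no particular CMMS format is implied.

```python
# Sketch: estimate the CNM-originated share of work from work order originators.
# The record layout ("originator" key) is a hypothetical CMMS export format.
from collections import Counter

def share_by_originator(work_orders):
    """Return the fraction of work orders originated by each group."""
    counts = Counter(wo["originator"] for wo in work_orders)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

work_orders = (
    [{"originator": "Operations"}] * 7
    + [{"originator": "Planning"}] * 2
    + [{"originator": "Engineering"}]
)
shares = share_by_originator(work_orders)
print(f"Operations-originated share: {shares['Operations']:.0%}")
```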
Time-based maintenance comprises the traditional on-condition,
failure finding, and time-based rework/replace planned maintenance.
If a plant can tag originated work from time-based work orders, then
they can measure the RCM maintenance workload as follows:
Summarizing,
PM (time-based): rework/replace TBM
Condition Monitoring or
Condition-Directed Maintenance?
Morpholine has a pungent smell. Once you smell morpholine, as
with smoldering coal, you remember it. It is used as a volatile feedwater
treatment at some steam plants. At a fossil plant, on a Main Steam
maintenance optimization project, we were working on steam leak
detection tasks. Valve packing, turbine steam seals, and pipe cracks
cause steam leaks. The question is, "What is an appropriate maintenance
task to identify steam leaks?"
Large leaks, noise, steam release, and increased makeup signal that
there is a problem needing to be identified. Noise usually accompanies
steam leaks. Saturated steam leaks exhibit vapor. Inability to maintain
makeup is a sign that a system is not secure. Changes in make-up trends
are one clue to the presence of a small leak. Leaks in inaccessible areas
of the boiler must be inferred. For accessible leaks, local inspection,
vapor tests, and ultrasonic tests are the best identifiers. Valve packing
is the most common source of steam leaks. Checking a valve's lantern
ring compression is a good time-based packing measure. Most operators
and mechanics learn this on the job. Steam piping is all lagged so
What is the difference between CDM and CNM? CDM has explic-
it thresholds and is scheduled. CNM is informal, although the two are
very close in performance. For operating tasks, it becomes somewhat
arbitrary as to the category in which an activity fits. CNM generally
requires more experience and skill to apply. Interpretation is subject to
opinion. Some operators note everything, while others see very little.
Experienced, skilled operating staffs use CNM with high degrees of
Think a moment about the life-cycle costs. She put at least 300,000
miles on various cars over the years. (She worked and commuted 50-100
miles each workday over most of her life.) At around $200 per brake
job, a competitive 1990s rate ($300 is probably more like it), we have
20 (100,000/5,000) jobs per 100,000 miles, or around 60 total jobs. In
today's dollars these add up to 60 × $200, or $12,000, conservatively.
Probably more like $18,000 considering secondary damage when the
brake lining work got missed. Then throw in the present value costs
over the years and you are up around $20,000. And we
haven't started to value anyone's time! What would the training cost
have been? A few hours, at perhaps $50/hour for a skilled driver
(using today's rates). Benefit to cost is $10,000/$100
(round terms), or 100/1 or better, conservatively.
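The brake-job arithmetic above can be reproduced step by step. This sketch uses the figures quoted in the text; the two training hours are an assumption standing in for "a few hours."

```python
# Worked version of the brake-job life-cycle cost estimate in the text.
lifetime_miles = 300_000
miles_per_brake_job = 5_000
cost_per_job = 200          # dollars, the conservative rate quoted

total_jobs = lifetime_miles // miles_per_brake_job   # 60 jobs
lifetime_brake_cost = total_jobs * cost_per_job      # $12,000

training_hours = 2          # assumption: "a few hours" taken as two
training_rate = 50          # dollars per hour
training_cost = training_hours * training_rate       # $100

benefit_to_cost = lifetime_brake_cost / training_cost
print(f"{total_jobs} jobs, ${lifetime_brake_cost:,} total, "
      f"benefit/cost about {benefit_to_cost:.0f} to 1")
```

Even before adding secondary damage or present value, the ratio exceeds the round 100/1 figure.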
Missing this kind of opportunity on a personal level is expensive; in
business it is uneconomic. Unfortunately, for all the same reasons, busi-
nesses regularly miss the opportunity to train employees, especially
operators, in the optimum use of equipment to manage costs. I hesitate
to use "correct" because that presumes there is a correct way, and
there is not one. There are only costs.
Strategically, one distinction between excellent (low-cost, high-reli-
ability) companies and also-rans is the ability to train people cost-
effectively. Why are there so many also-ran companies in business? One
reason is protected markets, such as the traditional utility industry.
Another is market inefficiency. Many American companies see their
product benchmark costs as unfavorable overall, while their training
costs are negligible, and they cannot make the connection. They lack
the profound business knowledge to relate training costs to final
product costs. A former boss of mine, R. O. Williams, used to jokingly ask,
"Who's the most expensive person on the payroll?" At the time he was
the highest-compensated executive in the company. His answer: "The
worker who is not trained."
Failure Complexity
Failures can be classified as simple or complex. Simple failures
involve single faults and modes without interactions or secondary
failures. Aging failures, in which a specification is exceeded, such as
High flame scanner, ignitor failures and control drift costs were the
secondary failures; they had a common root cause in the absence of design
environmental conditions.
Coal handling equipment that monitored the tramp iron, belts, and
alarms went out of service for a variety of reasons. Coal handling did not
warrant resources beyond the emergency level. At the unit age of 15
years coal handling system costs had taken a number three position behind
the boiler and the turbine. It appeared a matter of time before direct coal
handling outages impacted production. Coal handling equipment:
One of the drains plugged up, the tray overflowed, and the leaky fluid
dripped down onto exposed reheat steam safety valve hardware two levels
below. These started smoldering. Because of the non-conventional plant
design, the exposed parts of the safety valves were slightly above the flash
point of the fireproof fluid. The smoldering fluid ignited, and the little
fire eventually triggered the fire detection system. An operator responded
and extinguished the fire. Subsequent flashover after the flame was extin-
guished extensively damaged an area of intense cable, instrumentation,
and control equipment adjacent to a cable spreading room. Damage repair
took a focused effort of nearly 90 days and a special release from the NRC
to restart the unit.
The area of the fire was congested, dirty, and poorly lit, and the facility had
a history of problems with the hydraulic valves, especially oil leakage, that
were a root cause of the blaze. Direct costs of repairs were between $10
million and $20 million.
comparison, such as a synthetic. We may find that the other lasts twice as
long in service. This approach is exact, engineering-based, and controlled.
Complex equipment provides a different challenge. It may not exhibit
a dominant failure mode. We may not have enough experience to see
how it performs in service. However, we need a maintenance plan. If we
use the OEM's recommended interval and observe no failures over the
service period, how should we go about extending the interval? Here is
where RCM provides a useful tool. First, we need to quantify the failure
mode in question. It must not have a safe-life limit. We must be assured
that the failure will not create a personnel, public, or other hazard. If it
does, we should have, through the supplier and other agencies, a great deal
of information on which to fall back. If it does not (as is the case in 90%
of the PM activity in a typical plant), our next task is to reasonably extend
the interval. The RCM approach says that actuarially we have a solid basis
to extend the interval a substantial amount: about 50%! This is usually
a shock. For me, this is still like leaping off a cliff. Based upon studies
performed for "no predominant failure mode" complex equipment, we will
do very well with large service interval extensions by fitting a "no
experience" template. These large extensions are exactly what we need to
identify dominant failure mode characteristics in complex equipment.
This type of extension either very quickly extends parts out to where
a lifetime can be identified, or very quickly achieves substantial reductions
in PM hours performed and associated cost. In the context of an ARCM-
based approach, it can be done with very little economic risk. In this way,
we greatly accelerate the rate at which we learn the dominant failure
modes and their appropriate PM intervals.
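The interval extension logic can be sketched as a simple loop: extend by about 50% after each failure-free service period and stop at the first sign of an age-related failure. This is an illustration under stated assumptions, not a prescribed procedure. The function name, the `failure_observed` callback, and the 30-month wear-out point are all hypothetical, and the approach applies only to failure modes without safety consequences.

```python
# Sketch of age exploration by stepwise interval extension (about 50% per step).
# Applies only to non-safety failure modes; all names and numbers are hypothetical.

def explore_intervals(initial_interval, failure_observed, step=0.5, max_rounds=10):
    """Extend a PM interval by `step` after each failure-free period.

    failure_observed(interval) should return True if running to that
    interval revealed an age-related failure. Returns the list of
    intervals that were run without failure.
    """
    interval = initial_interval
    history = [interval]
    for _ in range(max_rounds):
        candidate = interval * (1 + step)
        if failure_observed(candidate):
            break                 # dominant failure age found; stop extending
        interval = candidate
        history.append(interval)
    return history

# Hypothetical component that shows wear-out somewhere beyond 30 months.
history = explore_intervals(12, lambda months: months > 30)
print(history)
```

Starting from a 12-month interval, two extensions reach 27 months before the third step exposes the wear-out age, which is far faster than small incremental extensions.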
Before RCM, few would perform substantial part life extensions.
Now we can extend intervals with some comfort. Not only are large exten-
sions possible, but they are statistically justified. In fact, there is very little
statistical justification for initial intervals for most equipment. Typically,
the first few failures are assumed to approximate mean life. We wind up
with greatly conservative service intervals from the onset. A corollary con-
cerns cases where PMs have been missed with no adverse failures devel-
oping. This experience justifies extending intervals to the discovery limit.
These add legitimacy to interval extension. With work performers includ-
ed in age exploration of parts performance, we can advance quickly to
more accurate realizations of potential equipment lifetimes.
Key Points
redundancy
low failure impact
acceptable risk (for random failures, for example)
inherent reliability
No Direct System Impact: The item must not impact any essential
system functions.
Engineering-Specified Failure
Cost
CNM
Key:
PM -- Preventive Maintenance
CM -- Corrective Maintenance
TBM -- Time Based Maintenance
CDM -- Condition-Directed Maintenance
OCM -- On-Condition Maintenance
OCMFF -- (OCM) Failure Finding
NSM -- No Scheduled Maintenance
CNM -- Condition Monitoring
Figure A-4: Maintenance Terms Map
Identification
Typical Candidates
Examples:
Home
1. light bulbs
2. small TVs, other consumer electronics
3. small appliances
4. watches
Plant
Maintenance Discipline
When maintenance programs struggle with PM, it may reflect a
problem with discipline. Developing and following a work plan reflects
maintenance discipline. Discipline means the ability to comply with
standards, no matter what their source. Correctly initiating work
orders, working to procedures, working to schedules, meeting deadlines,
writing work summaries on completed work orders, signing completed
work: all can be reduced to basic work habits that demonstrate
commitment to standards. Work habits are hard to learn and easy to
compromise.
The Navy relaxed standards in the early 1970s. Candy, food, and
beverages increased the food residue in sleep areas. In short order, on
some ships, shipboard spaces looked like dumps. Cockroaches became
shipmates.
Discipline requires standards, training, and reinforcement.
Unfortunately, reinforcing behaviors is not the strength of traditional
maintenance. Unaccountability can prevail. Lawsuits have been filed
against companies stemming from the most trivial attempts to exercise
standards and authority. Submitting signed, accurate time cards, keep-
ing tools stored, cleaning work areas, even wearing shoes to work were
continued doing liftoffs until the plant was permanently shut down for
high cost. Developing effective on-condition/condition directed main-
tenance pairs is challenging work. It can depend greatly on the plant
regulatory and cultural environment for its success. Static, regulated
environments will not be conducive to any new techniques or methods.
They are far too demanding!
Often overall top event reliability is known, but the individual relia-
bilities are not. Individual reliabilities may be taken from generic tables.
In any event, the Fault Tree identifies the fault paths of interest, and their
logic, providing the opportunity to focus on the critical few that matter.
In the example, we build the fault tree in FaultTree+, assigning failure
data as we go. We have several failure models and sources of
information to select from. Once complete, we can run the analysis and
see whether the frequency of occurrence of the top event (here,
engine failure to load) fits our experience. Very often it doesn't, but
we now have specific guidance on where to look for additional data.
The fault tree thus supports the continuous evolution of a maintenance
and operating plan based on facts. For users who have never used fault
trees, they help in understanding the complexity of multiple failure data
and help focus efforts in selective areas for maximum results. It's common
for a fault tree model of a problem to draw out risk areas not previously
appreciated.
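How a fault tree rolls basic event probabilities up to the top event can be sketched in a few lines. This is not FaultTree+ or its interface; it is a minimal illustration that assumes independent basic events, and the tree structure and probabilities for "engine failure to load" are hypothetical.

```python
# Minimal fault tree sketch: AND/OR gates over independent basic events.
# The tree structure and all probabilities are hypothetical.
from math import prod

def and_gate(probabilities):
    """All inputs must fail: product of the failure probabilities."""
    return prod(probabilities)

def or_gate(probabilities):
    """Any input failing is enough: 1 minus the product of survivals."""
    return 1 - prod(1 - p for p in probabilities)

# Basic events (assumed values, e.g. from generic tables)
fuel_pump_fails = 0.01
fuel_filter_clogs = 0.02
starter_a_fails = 0.05
starter_b_fails = 0.05   # redundant with starter A

fuel_supply_fails = or_gate([fuel_pump_fails, fuel_filter_clogs])
starting_fails = and_gate([starter_a_fails, starter_b_fails])
engine_fails_to_load = or_gate([fuel_supply_fails, starting_fails])

print(f"Top event probability: {engine_fails_to_load:.4f}")
```

If the computed top event figure does not match plant experience, the branch contributions show where better failure data is needed; here the fuel supply branch dominates.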
oped from which costs (like criticality) can be developed. The common
file structure in Item Software's RCM Cost and related products (Fail
Mode, Fault Tree+...) means that designers can develop a FMECA,
perform fault tree analysis, and review critical failure modes together
during design. Alternative maintenance strategies can then be
developed, explored, simulated, and optimized. Any combination can help
to develop a product manufacturer's initial installation and
recommended scheduled maintenance program. Clearly, the same software
can be used by the end users (operating organizations) to review and
evaluate their maintenance practices, costs, and risks, and optimize their
maintenance programs. There are at least seven RCM software prod-
ucts available. Some support one or more applications more easily than
others. Software users must understand their own needs and then
explore the market alternatives.
Splash Menu
Splash Menu. The startup or splash menu shows the major functions
offered by the CMMS and graphically suggests use of a mouse,
the trademark of a GUI. Since many non-routine CMMS users
are not typists, the GUI is essential for speed and convenience.
Note that most CMMS systems even today use traditional terminology
since this is what users know. (We could equally call PM Management
"Scheduled Maintenance.") Users can view and update different areas
by controlled authorization. Most systems offer generous "view only"
data privileges but restrict updates to specific work areas.
Equipment Hierarchy. The hierarchy provides a convenient way for
plant workers and staff to quickly locate any equipment of interest for
the purpose of identifying, selecting, or reviewing work and related fail-
ures, resources, and costs. The hierarchy (in a GUI environment) dou-
ated as TRs. Success with TRs hinges on ranking the nature and
importance of the failures. Documented failures and planned actions
for important equipment in the plant's CMMS register
facilitate sorting through TRs quickly to (1) extract high-impact failures
that warrant high priority, and (2) allow the option to pre-plan work,
and to perform pre-planned work, on many NSM-type failures. A TR should
clearly identify a problem in the title, not just a piece of equipment.
Since the TR title carries over into supplemental documents like a work
order, the TR's title, initiator, plant impact and priority, reported date,
and work start target date need identification. A planner and
scheduler will have to review and adjust the initiator's request against the
overall plant schedule and resources.
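As a rough illustration of the TR fields listed above, the following Python sketch holds a few TRs and ranks them for a planner's review. The class name, the impact scale, the convention that priority 1 is most urgent, and the sample equipment are all assumptions made for this example, not the layout of any particular CMMS.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical impact scale, most severe first; a real CMMS defines its own.
IMPACT_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class TroubleReport:
    title: str          # names the problem, not just the equipment
    initiator: str
    plant_impact: str   # key into IMPACT_RANK
    priority: int       # assumed convention: 1 is most urgent
    reported: date
    target_start: date


def rank_trs(trs):
    """Sort by plant impact, then priority, then oldest report first."""
    return sorted(
        trs,
        key=lambda tr: (IMPACT_RANK[tr.plant_impact], tr.priority, tr.reported),
    )


trs = [
    TroubleReport("2B coal mill: lube oil leak at gearbox seal", "J. Smith",
                  "medium", 2, date(1999, 3, 1), date(1999, 3, 8)),
    TroubleReport("1A boiler feed pump: high bearing vibration", "R. Jones",
                  "high", 1, date(1999, 3, 2), date(1999, 3, 3)),
    TroubleReport("Shop air dryer: drain trap stuck open", "J. Smith",
                  "low", 3, date(1999, 2, 20), date(1999, 3, 15)),
]

for tr in rank_trs(trs):
    print(tr.plant_impact, tr.priority, tr.title)
```

The ranking only orders the initiators' requests; the planner and scheduler would still adjust target dates against plant schedule and resources.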
Scheduling: WO Scheduler
PM WO - Coal Mills WO
this allow anyone to quickly confirm (1) that work is in fact in progress,
and (2) that it has had so many hours of time charged and presumably
worked. Work that is stalled or parked is also visually clear. Key information
for selected WOs is displayed on the same page without jumping
around.
WO/PM Lists. Lists can be generated in many sort orders to support
any of a variety of standard plant work review activities: daily work,
outage work, scheduled work, skill category or department work.
These lists must provide the key information (WO number, title, priority
importance, crew ID, and scheduled completion date) in any sort
order. For PM masters the priority is supplemented by WO type, which
should roughly translate into RCM scheduling options: Hard Time:
TBM; Scheduled Tests & Checks: OCM; On Demand and No
Scheduled Maintenance (preplanned): CDM. Other categories such as
Overhaul allow convenient grouping into scheduled work categories for
outage. Double-clicking the top field of any column re-sorts the list by
that column, and a second double-click reverses the sort order (top-to-bottom
becomes bottom-to-top).
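The list behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names are hypothetical, and the type table simply restates the WO type to RCM option pairing given in the text. It is not the implementation of the CMMS shown in the figures.

```python
from dataclasses import dataclass
from datetime import date

# WO type to RCM scheduling option, as paired in the text above.
WO_TYPE_TO_RCM = {
    "Hard Time": "TBM",
    "Scheduled Tests & Checks": "OCM",
    "On Demand": "CDM",
    "No Scheduled Maintenance (preplanned)": "CDM",
}


@dataclass(frozen=True)
class WorkOrder:
    number: int
    title: str
    priority: int
    crew_id: str
    due: date       # scheduled completion date
    wo_type: str    # key into WO_TYPE_TO_RCM


class WorkOrderList:
    """A list that re-sorts on a column header and reverses on a repeat click."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._column = None
        self._descending = False

    def click_header(self, column):
        if column == self._column:
            # Same column again: flip the direction.
            self._descending = not self._descending
        else:
            # New column: start ascending.
            self._column, self._descending = column, False
        self.rows.sort(key=lambda wo: getattr(wo, column),
                       reverse=self._descending)
        return [wo.number for wo in self.rows]


wo_list = WorkOrderList([
    WorkOrder(1003, "Inspect mill rollers", 2, "MECH-1",
              date(1999, 4, 12), "Hard Time"),
    WorkOrder(1001, "Test trip relay", 1, "ELEC-2",
              date(1999, 4, 20), "Scheduled Tests & Checks"),
    WorkOrder(1002, "Replace feeder belt", 3, "MECH-1",
              date(1999, 4, 5), "On Demand"),
])

first = wo_list.click_header("due")    # ascending by completion date
second = wo_list.click_header("due")   # same column again: reversed
```

Any field of the record can serve as the sort column, which is the point of the requirement that the key information be available in any sort order.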
Overhaul. Overhauls are special groups of work activity that are
issued as large groups at the same time. Overhauls such as
six-year turbine tear-down and inspections can be developed as many
separate individual WOs for the many activities that must be performed
separately. These can then be tagged as a specific outage, such as an
18-month boiler inspection, and issued as one single group (or be
issued based on selected items from the group). The overhaul group
then gets issued as a single clump of work. While the benefits of this
are not immediately apparent, let me personally attest that many hours
were spent in former days issuing the individual WOs that
made up large power plant outages, as many as 1,000! The time savings
and simplicity of this feature are tremendous. Typically, these activities
are the same ones the plant wants to download to its Project
Management Software, to be able to schedule the outage in rote detail. The
Routes: PM WO Route
done in a large facility, you find it is highly repetitious (e.g., a few failure
modes dominate), and the advantages of this feature to pre-plan
even on-demand CDM and NSM-type failure work are tremendous!
The beauty of this feature is the capacity to standardize work
plans and revise the work plan for tens or even hundreds of equipment
PMs and WOs with one standard change.
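One way to picture the "one standard change" idea is a shared work plan that many PMs point to rather than copy. The Python sketch below assumes hypothetical plan IDs, equipment tags, and step wording; it illustrates the principle only and is not the data model of any specific product.

```python
from dataclasses import dataclass


@dataclass
class WorkPlan:
    plan_id: str
    steps: list


@dataclass
class PM:
    equipment_id: str
    plan: WorkPlan   # a shared reference, not a private copy


# One standard plan (hypothetical) referenced by forty fan PMs.
bearing_plan = WorkPlan(
    "STD-BRG-01",
    ["Isolate and tag out", "Replace bearing", "Return to service"],
)
pms = [PM(f"FAN-{n:03d}", bearing_plan) for n in range(1, 41)]

# A single revision to the standard plan is seen by every PM that uses it.
bearing_plan.steps.insert(2, "Check alignment")

print(pms[0].plan.steps)
print(pms[39].plan.steps)
```

Had each PM carried its own copy of the steps, the same revision would have required forty separate edits.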
Startup Menu. The startup menu defines the major RCM software
functions. Since users are often not typists, a GUI improves
speed and convenience. Different users can view or update different
areas. Since use is restricted to a small group of engineers and analysts,
control requirements are simpler than for a CMMS. View-only use
includes interrogating the database for known hardware failures and
failure data, the failure bases (i.e., the plural of failure basis), and strategies
addressing known failures. Systems should be at the highest level in the
database and be identifiable by general category such as fossil, nuclear,
or chemical process, and/or other general classification. Broad system
classes such as control, service, and power conversion should be available
for later analysis and sorting.
Startup Menu
Pull-down Menus and Finders: Browse Lists and Finders System Equipment
Computers Hierarchy
Operators Rounds
Reports
codes, EPA Title 5 emissions, and other requirements should they so
desire. When documented, users can see those aspects of their scheduled
maintenance program that are based on the force of law. Every
PM task identifies the resources necessary to perform it.
These include work classification, department, work hours, and travel and
slack time. PM tasks can be grouped at the component and part-failure
level to arrange activity that can be performed as convenient
work packages.
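The resource fields named above lend themselves to a simple record, and grouping by component then yields candidate work packages with their total hours. In this Python sketch the field names, the optional legal-basis field, and the sample tasks and hours are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PMTask:
    component: str
    description: str
    work_class: str
    department: str
    work_hours: float
    travel_hours: float
    slack_hours: float
    legal_basis: str = ""   # hypothetical field: citation if required by law

    @property
    def total_hours(self):
        return self.work_hours + self.travel_hours + self.slack_hours


def package_by_component(tasks):
    """Group PM tasks into candidate work packages, one per component."""
    packages = defaultdict(list)
    for task in tasks:
        packages[task.component].append(task)
    return dict(packages)


def package_hours(tasks):
    return sum(task.total_hours for task in tasks)


tasks = [
    PMTask("Mill gearbox", "Sample lube oil", "mechanical",
           "Maintenance", 1.0, 0.5, 0.25),
    PMTask("Mill gearbox", "Check coupling alignment", "mechanical",
           "Maintenance", 3.0, 0.5, 0.5),
    PMTask("Mill motor", "Megger test windings", "electrical",
           "Maintenance", 2.0, 0.5, 0.5),
]

packages = package_by_component(tasks)
gearbox_hours = package_hours(packages["Mill gearbox"])
print(gearbox_hours)
```

Grouping by part failure instead of component would use the same function with a different key.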
Data Copying. Data copying subroutines allow the user to select,
extract, and apply failure data at many levels and copy large chunks of
pre-existing systems (up to and including the entire system itself)
into new or existing models. This avoids recreating basic
engineering and failure data available elsewhere for the multitude of
replicated components, equipment, and even systems present in a large
industrial facility, so plans can be developed quickly. Data copying
allows rapid similarity modeling at multiple levels.
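A minimal sketch of data copying, assuming the failure data is held as nested records: an existing equipment model is cloned into a new model and then adjusted without disturbing the original. The tags, field names, and the helper copy_into are hypothetical and are not taken from any RCM product.

```python
import copy

# Existing equipment record with its components and failure data (hypothetical).
pump_a = {
    "tag": "CWP-A",
    "components": [
        {
            "name": "thrust bearing",
            "failure_modes": [{"mode": "wear", "task": "vibration check"}],
        },
    ],
}


def copy_into(model, source, new_tag):
    """Clone a whole subtree into a model under a new equipment tag."""
    clone = copy.deepcopy(source)   # deep copy: the clone shares nothing
    clone["tag"] = new_tag
    model.setdefault("equipment", []).append(clone)
    return clone


new_model = {"name": "Unit 2 circulating water"}
pump_b = copy_into(new_model, pump_a, "CWP-B")

# Tailor the copy; the source record is left unchanged.
pump_b["components"][0]["failure_modes"][0]["task"] = "oil analysis"

print(pump_a["components"][0]["failure_modes"][0]["task"])
print(pump_b["components"][0]["failure_modes"][0]["task"])
```

The deep copy matters here: a shallow copy would leave the two pumps sharing the same failure-mode records, so tailoring one would silently alter the other.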
Reports
Index
A
Accuracy (instrumentation), 243-248
Acronyms, xi-xiv
Age exploration, 6, 178-180, 199-205, 307, 338, 426-427:
definition, 199-201;
value, 201-204;
systematic application, 204-205
Aging analysis, 47, 270, 338, 405-407
Alarms, 242-243
Alternative solutions, 349-350
Ambiguity, 146-147
Analysis software, 312-315
Applicability criterion, 31-32, 122-124
Applications software, 304
Applications (RCM), 12-14, 161-193, 344-350:
overview, 161-165;
engineering, 165-173;
integration of functions, 173-185;
safety, 185-187;
case histories, 187-193;
statistical maintenance, 346-349;
alternatives, 349-350
Area checks, 238-240
Areas not worked online, 292-293
As low as reasonably achievable, 396-397
Assessing programs, 122-125:
applicability, 122-124;
cost effectiveness, 124-125
Assumptions, 31-37:
applicability, 31-32;
effectiveness, 32-33;
PM, 33;
statistics and regulators, 33-37
Availability simulation, 172
B
Backlogs, 279-280, 282-284:
work order, 282-284
Basis history (equipment), 91
Black-box model, 156
Blocking tasks, 89-90
Bootstrapping, 229-231
Budgeting, 407-409
C
Case histories, 63-70, 187-193, 252-253:
maintenance practices, 63-70;
soot-blowing air compressor, 64-65;
turbine blade, 65-66;
generator retaining ring, 66-68;
coal belt fire, 69-70;
circulating water tower, 187-190;
maintenance options, 191-193
Casual use, 232-235:
critical failure modes, 233;
wearout, 233-234;
confusion implications, 234-235
Changes and measures (output/response), 327-333:
measure types, 328-329;
system measures, 329-332;
failure, 332-333;
costs, 333
Checklist (maintenance health), 412
Checklist (round), 317-318
Circulating water tower, 187-190
Clock-based PM, 56-57
D
Delivery (maintenance), 148-149
Delivery (operations), 115-116
Department goal balancing, 163:
value added, 163
Design basis, 116-118
Design-change maintenance, 21
Development steps, 294-295
Development (RCM), 2-4
Development (systems approach), 116
Do your best (strategy), 209-211
Documentation (software), 315-317
E
Effectiveness measures, 6, 335, 337
Emergency maintenance, 61, 335
Engineering applications, 165-173:
operations-organizational relationship, 165-167;
plant support roles, 167-169;
plant modification, 169-171;
tools, 171-173
Engineering focus, 205-209:
failure spectrum, 206-209
Engineering maintenance, 43
Engineering reliability, 15-18
Engineering support role, 163-164
Engineering tools, 171-173:
failure modes and effects criticality analysis, 171-172;
fault trees, 172;
availability simulation, 172;
Weibull analysis, 172-173
Engineering-specified failure, 429
Entropy (organizational), 341-343
Environment maintenance, 422-426
Equipment groups, 151-153, 268-272, 287-299:
development steps, 294-295;
types, 296-298;
operations, 298-299;
modification reviews, 299
Equipment hierarchy, 81-99:
level, 81-83;
failure descriptions, 83-88;
blocking tasks, 89-90;
PM tasks/vendors, 90-91;
basis history, 91;
PM work packages, 91-92;
information sources, 92-94;
F
Failure analysis, 6, 10-11, 85, 312-315, 332-333
Failure complexity, 421-426
Failure description, 83-88, 159
Failure footprints, 218-221:
CMMS, 218-221
Failure frequency, 140-142:
coded components, 141;
complexity, 141-142
Failure identification, 6, 173-176, 430-432
Failure management, 157-158
Failure modes, 6, 94-97, 171-172, 233, 438-439, 465, 467-468
Failure modes and effects analysis, 6, 94
Failure modes and effects criticality analysis, 94, 96-97, 171-172, 438-439
Failure numbers, 184-185
Failure perspectives, 154-160
Failure reports, 236-237
Failure spectrum, 28-29, 206-209
Failures, 34, 61, 83-84, 218-226, 249-250, 332-333. SEE ALSO Case histories.
Fast track maintenance, 255-299:
CMMS, 256-258;
maintenance infrastructure, 258;
traditional programs, 258-260;
scheduling, 260-266;
scheduling methods, 266-272;
project management techniques, 272-274;
overhaul intervals, 274-279;
PM reviews, 279-284;
outage work review, 284-287;
equipment groups, 287-299
Fault tree analysis, 6, 98, 172, 437-438
Focused measurement, 324-327
Function integration, 173-185
Functional elements (PM), 56-63:
time-based (clocks), 56-57;
operational-based (surveillance), 57-58;
operate to failure, 58;
preplanned failure, 58;
no scheduled maintenance, 58;
measurement, 58-59;
overhaul, 59-61;
emergency maintenance, 61;
overtime, 61;
failures, 61;
maintenance rule, 62-63
Functional failure, 156, 417-418:
measurement, 417-418
Functional reviews (RCM), 99-102:
history, 100-102
G
Generating units, 74-76
Generator retaining ring, 66-68
H
Hard time, 6
Hardcopy documents, 458
Hierarchies, 81-99:
level, 81-83
Hierarchy (software), 302-304:
coding levels, 303-304;
standardize, 304;
applications, 304
Hierarchy and boundary, 98-99
I
Implementation, 20-22, 209-216:
value added, 20-21;
maintenance strategy, 21-22;
models, 209-216
Implementation models, 209-216:
do your best, 209-211;
trust us, 212-213;
typical implementation, 213-214;
total performance, 214-216
Importance (criteria), 235-238:
equipment size, 236;
failure reports, 236-237;
work frequency, 237;
vendor recommendations, 237;
industry practice, 237;
shop practice, 237-238;
equipment registers, 238
K
Keep it simple stupid, 146
L
Legitimate failure, 225-226
Lessons learned, 195-253:
task intervals, 197-199;
age exploration, 199-205;
engineering focus, 205-209;
implementation models, 209-216;
vendor perspective, 216-218;
failure footprints, 218-221;
operate to failure, 221-231;
no planned maintenance, 221-231;
critical failure, 231-235;
importance criteria, 235-238;
M
Maintenance budgeting, 407-409
Maintenance cost, 63, 323-327, 335-336:
maintenance hour, 335-336
Maintenance delivery, 148-149
Maintenance discipline, 432-433
Maintenance infrastructure, 258
Maintenance options (RCM based), 191-193, 349-350:
no scheduled maintenance, 191-193;
on-condition maintenance, 191;
time-based maintenance, 191-192
Maintenance performance, 41-42, 126, 150-151, 214-216
Maintenance perspective (RCM), 1-22:
precursors, 1-2;
development, 2-4;
origin, 4-5;
reliability perspective, 5-7, 14-19;
post World War II, 8-10;
traditional RCM, 10-12;
applied RCM, 12-14;
implementation, 20-22
Maintenance practices, 23-70:
options, 24-29;
consistency, 29-37;
maintenance process, 37-44;
PM, 44-63;
costs, 63;
case examples, 63-70. SEE ALSO Case histories.
Maintenance process, 6, 37-44, 105-111, 127-129, 297, 304-305,
335, 337-340, 412-415:
plan, 40-41;
schedule, 41;
performance, 41-42;
training, 42-43;
engineering, 43;
definition, 43-44;
model, 127-129;
software, 304-305
Maintenance process measures, 335, 337-340:
effectiveness, 335, 337;
responsiveness, 337;
total hours/system, 337;
trends, 337;
aging studies, 338;
costs, 338;
ratios, 338-339;
rework, 339;
screening for effectiveness, 339-340
Maintenance process model, 127-129
Maintenance process software, 301-320:
goals, 301-302;
hierarchy, 302-304;
CMMS software, 304-307;
RCM software development, 307-312;
analysis, 312-315;
documentation, 315-317;
products, 317-319;
configurations simulation, 319;
simplicity, 319-320;
policy, 320
Maintenance ratios, 338-339
Maintenance rule, 62-63
Maintenance strategy, 21-22
Management oversight risk tree, 6
Mean time between failures, 25
Measure types, 328-329
Measurement, 58-59, 321-327:
PM functions, 58-59;
output/response, 321-327;
global, 321-324;
focused, 324-327
Measures (output/response), 321-340:
measurement, 321-327;
changes and, 327-333;
PM hours, 334-335;
maintenance process, 335, 337-340
Mechanisms of failure, 94-95
Missed PM, 394-395
Models (PM), 46-48
Modification reviews, 299
Monitoring, 120-121, 176-178, 208, 238-240, 392-393, 418-420
N
Needs awareness, 116-118
No planned maintenance, 221-231:
failure, 221-223;
RCM environment, 223-225;
legitimate failure, 225-226;
condition-based maintenance, 226-228;
complexity in failures, 228-229;
bootstrapping, 229-231
No scheduled maintenance, 58, 191-193
Nuclear energy generation, 131
O
Obsolescence, 8
On-condition maintenance, 6, 191, 265
Operate to failure, 8, 58, 221-231, 397-399, 428-432:
failure, 221-223;
RCM environment, 223-225;
legitimate failure, 225-226;
condition-based maintenance, 226-228;
complexity in failures, 228-229;
bootstrapping, 229-231
Operational-based PM, 57-58
Operations (equipment groups), 298-299
Operations overview, 161-163
Operations roles, 173-178:
failure identification, 173-176;
operator monitoring, 176;
rounds optimization, 176-178
Operations-organizational relationships, 165-167
Operator monitoring, 176
Operator training, 420-421
Options (maintenance), 24-29
Organization of activity, 318
Organizational relationships, 165-167
Origin (RCM), 4-5
Outage, 16, 31, 51, 120, 272, 284-287:
intervals, 16;
work review, 284-287;
parts and, 286-287
Outage work review, 284-287:
event analysis, 285-286;
parts and outages, 286-287;
strategy development, 287
Over-conservatism, 135-139
Overhaul, 59-61, 110, 200, 274-279, 401-405, 452, 454:
basis, 274-275;
intervals, 274-279;
conditional, 404-405
Overhaul basis, 274-275
Overhaul intervals, 274-279
Overhaul schedules, 274-279:
basis, 274-275;
optimizing strategy, 275;
planning, 275-276;
costs and rank, 276-278;
standards, 278-279
Overtime, 61
P
Pareto system cost analysis, 117, 409-411
Part aging dispersion, 405-407
Part failure, 156, 286-287
Parts, 156, 178-184, 286-287, 405-407:
failure, 156, 286-287;
age exploration, 178-180;
integration, 178-184;
stocking levels, 180-181;
consistency, 181-182;
problems, 182-184;
troubleshooting, 183-184;
and outages, 286-287;
aging dispersion, 405-407
Parts and outages, 286-287
Parts integration, 178-184
Performance (RCM), 41-42, 71-111, 126, 150-151, 214-216:
equipment selection, 73-81;
equipment hierarchy, 81-99;
functional reviews, 99-102;
standards, 102-104;
comparison analysis, 104-107;
maintenance process, 105-111
Q
Query features, 458
R
Random failure, 25-26, 429
Rare failures, 249-250
Ratio measurement, 338-339
RCM analysis, 307-312, 439-440:
software development, 307-312;
software, 439-440
RCM definition, 231-231
RCM environment, 223-225
RCM software development, 307-312, 439-440:
process standardization, 307-311;
task basis, 311-312;
example, 439-440
RCMtrim (tm), 460-475
RCM/CMMS idealization, 318-319
Readings in RCM, 391-436
Real hours, 334-335
Redundancy, 35, 248-250, 417-418:
costs and layers, 248-249;
rare failures, 249-250
Reference materials, 477-479
Regulatory agencies, 33-37, 315-317
Reliability concepts, 5-8, 14-19:
definition, 14-15;
engineering, 15-18;
process, 18-19
Reliability perspective, 5-8, 14-19
Responsiveness measures, 337
Rework measures, 339
Risk management, 95, 399-401
Root cause analysis, 119
Round checklist, 317-318, 470, 473
Rounds optimization, 176-178
Routes (groups), 458, 474-475
Run-in period, 109
S
Safety, 185-187:
direct consequences, 185-186;
potential consequences, 186-187
Scheduling, 41, 260-274, 281, 289, 449-451:
methods, 266-272
Scheduling methods, 266-272:
expedite, 266-267;
short term (weekly), 267;
long term, 267-268;
equipment groups, 268-272;
outage, 272
Screening for effectiveness, 339-340
Shop practice, 237-238
Short term schedule, 267
Simplicity criterion (software), 319-320
Software, 2, 179, 203, 304-312, 317-319, 437-475:
products, 317-319
Software applications, 437-475:
fault tree analysis, 437-438;
failure modes and effects criticality analysis, 438-439;
RCM analysis, 439-440;
CMMS example (Power FM), 441-459;
RCMtrim, 460-475
Soot-blowing air compressor, 64-65
Spurious alarms, 242-243
Standardization, 13, 80-81, 102-104, 278-279, 304, 307-311, 320:
software, 304
Standards, 102-104, 278-279
Statistical analysis, 331
Statistical maintenance, 347-349
Statistical process control, 18, 97, 435-436
Statistics and regulators, 33-37
Stocking levels, 180-181
Strategy, 275, 287, 344-346:
development, 287
Surveillance-based PM, 57-58
System cost, 122
System definition, 118-119
System failure, 83-84
System hierarchy, 81-83
System level measurement, 306
System measures, 329-332
System monitoring, 120-121
System performance measurement, 119-120
Systematic application, 204-205
Systems approach, 6-7, 116-122:
development, 116;
training, 116-118;
design basis, 116-118;
needs awareness, 116-118;
system definition, 118-119;
system performance measurement, 119-120;
system monitoring, 120-121;
system cost, 122
Systems approach, 116-122
T
Task basis, 311-312
Task intervals, 197-199
Terminology, xi-xiv, 12, 257, 353-389, 430
Time accounting, 334-335
Time-based maintenance, xvi, 6, 49, 56-57, 191-192
Tools (engineering), 171-173
Total hours/system measures, 337
Total PM performance, 214-216
Total quality maintenance, 38-39
Traditional maintenance programs, 10-13, 258-260
Training, 42-43, 116-118
Trend analysis, 337
V
Value, 20-21, 128, 163, 201-204
Vendor perspective, 216-218:
recommendations, 217-218
W
Wearout, 233-234
Weibull analysis, 172-173
Work descriptions (tasks), 454-455, 457
Work frequency, 237
Work grouping, 269
Work orders, 282-284, 444-459:
backlog, 282-284
Work review, 284-287
Work screening/prioritization, 392-393
Working to schedules, 273-274
Worklists, 280-291
World War II, 9-10