Human Error Identification in Human Reliability Assessment. Part 1: Overview of Approaches
Barry Kirwan
This paper reviews a number of approaches which can be used to identify human
errors in human-machine systems. These approaches range from simple error
classifications to sophisticated software packages based on models of human
performance. However, the prediction of human behaviour in complex environments is far from an easy task, and there is significant scope for improvement in human error identification 'technology'. This first paper in a series of two reviews the available techniques for human error identification, highlighting their strengths and weaknesses. The second paper will review the validation of such approaches, and likely future trends in this area of human reliability assessment.
Keywords: Human error; risk assessment; human reliability assessment
Introduction
The assessment of what can go wrong with large-scale systems such as nuclear power plants is of considerable current interest, given the past decade's record of accidents attributable to "human error". Such assessments are formal and technically complex evaluations of the potential risks of systems, and are called probabilistic risk assessments (PRAs). Today, many PRAs consider not just hardware failures and environmental events which can impact upon risk, but also human error contributions: accidents such as Bhopal, and near-misses such as the Davis-Besse incident [1], justify this inclusion of human error.
The assessment of the human error contribution to
risk is via the use of human reliability assessment
(HRA) tools. Most papers on HRA are concerned with
the quantification of human error potential, as most
research in HRA has focused on this area. This paper,
however, attempts to bring together a representative
sample of techniques for the identification of human
errors, so that the state of the art and trends/development needs in this area can be assessed. This is
necessary because human error identification (HEI), as
will be argued, is at least as critical to assessing risk
accurately as the quantification of error likelihoods,
and yet is relatively under-developed.
This paper therefore assesses the state of the art of
HEI. Before moving on to the actual techniques,
[Figure: the HRA process, running from problem definition, task analysis and human error analysis (with branches for error avoidance, not studied further, and for factors influencing performance and error causes or mechanisms), through screening, quantification and error reduction assessment, to quality assurance and documentation.]
not done;
repeated;
less than;
sooner than;
more than;
later than;
as well as;
mis-ordered;
other than;
part of.
An example of the output from a human-error-oriented HAZOP exercise [14] (from the offshore drilling sector) is shown in Table 1. It may be noted that in this exercise additional guidewords were utilized (eg "calculation error") to facilitate error identification, and a tabular task analysis format was added to the HAZOP process to investigate operator errors and actions more formally.
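Mechanically, this guideword-driven prompting amounts to crossing each step of a tabular task analysis with the guideword set and letting the analyst judge which prompts describe credible deviations. A minimal sketch follows; the task steps and helper name are hypothetical, and only the guidewords follow the list above:

```python
# Illustrative sketch: crossing task-analysis steps with human-error HAZOP
# guidewords to generate candidate deviations for analyst review.
# The guideword list follows the paper; the task steps are hypothetical.

GUIDEWORDS = [
    "not done", "repeated", "less than", "sooner than", "more than",
    "later than", "as well as", "mis-ordered", "other than", "part of",
    "calculation error",  # example of an additional, domain-specific guideword
]

task_steps = [
    "5A Lift and set aside kelly",
    "5D Ensure hook is free to rotate",
]

def candidate_deviations(steps, guidewords):
    """Yield (step, guideword) prompts; the analyst keeps only credible ones."""
    for step in steps:
        for word in guidewords:
            yield step, word

for step, word in candidate_deviations(task_steps, GUIDEWORDS):
    print(f"{step}  --  deviation: {word}")
```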
More generally, analysts' judgement is a useful
resource in error identification. Many risk analysis
practitioners build up experience of identifying errors
in safety studies, either from their operational experience or from involvement in many different safety
analyses. Whilst this is perhaps an 'art' rather than a
science and as such is less accessible to the novice than
a more formal method, its value must not be overlooked. Owing to the specificity of every new safety
analysis carried out, the human reliability practitioner
"standing on the outside" may often be more disadvantaged in terms of error identification than the
Table 1 Driller's HAZOP example [14]: pipe-handling equipment in tripping operations. Operating sequence: tripping out. 5A/5D. Lift and set aside kelly. Ensure hook is free to rotate.
[Table flattened in extraction. Columns: study reference number; deviation; indication; causes; consequences; actions/recommendations; follow-up questions; notes/comments. Recoverable entries: study refs 5.1 (no movement) and 5.2 (reverse movement); cause: forget to open hook to ensure free rotation; consequences: with the stabilizer in the hole, the drill string will rotate when pulled out, which may cause damage to the hook, damaged hoses and piping etc, and injuries to operators; note: a new (mud-saver) valve was discussed, a pre-set pressure valve which prevents flow for pressures below 200 psi, ie a mud column equal to the length of the kelly assembly.]
[Figure: decision flowchart for identifying psychological error mechanisms. The analyst asks in turn whether the situation is a routine one for which the operator has highly skilled routines (and whether the operator realizes this), whether it is covered by normal work know-how or planned procedures, whether the operator responds to the proper task-defining information, recalls the procedure correctly, correctly collects the information available for the analysis, and whether functional analysis and deduction are properly performed. Negative answers lead to mechanisms such as stereotype fixation, manual variability, topographic misorientation, familiar pattern not recognized, responding to a familiar cue which is an incomplete part of the available information (familiar association short cut), assumption, and 'other, specify'.]
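The flowchart lends itself to a simple question-driven screen. The sketch below is a loose illustration only: the question wording comes from the recoverable parts of the figure, but the pairing of each question with a candidate mechanism is an assumption of the sketch rather than the figure's actual routing.

```python
# Loose sketch of a question-driven screen for psychological error mechanisms.
# Question wording comes from the flowchart above; the pairing of each
# question with a candidate mechanism is an assumption of this sketch.

SCREENING_QUESTIONS = [
    ("Is this a routine situation for which the operator has highly skilled "
     "routines, and does the operator realize this?", "stereotype fixation"),
    ("Is the situation covered by normal work know-how or planned procedures?",
     "manual variability / topographic misorientation"),
    ("Does the operator respond to the proper task-defining information?",
     "familiar association short cut"),
    ("Does the operator recall the procedure correctly?",
     "familiar pattern not recognized"),
    ("Does the operator correctly collect the information available for the "
     "analysis?", "incomplete use of available information"),
    ("Are functional analysis and deduction properly performed?",
     "other - specify"),
]

def candidate_mechanisms(answers: dict[str, bool]) -> list[str]:
    """Return mechanisms whose screening question was answered 'no'."""
    return [mechanism for question, mechanism in SCREENING_QUESTIONS
            if not answers.get(question, True)]

# Example: the operator did not recall the procedure correctly.
print(candidate_mechanisms({"Does the operator recall the procedure correctly?": False}))
```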
SHERPA error modes [16]:
Action omitted
Action too early
Action too late
Action too much
Action too little
Action too long
Action too short
Action in wrong direction
Right action on wrong object
Wrong action on right object
Misalignment error
Information not obtained/transmitted
Wrong information obtained/transmitted
Check omitted
Check on wrong object
Wrong check
Check mistimed

Table 3 SHERPA HEA table [2]: terminate supply and isolate tanker (sequence of remotely operated valve operations).
[Table flattened in extraction. Columns: task step; error type; recovery step; psychological mechanism; causes, consequences and comments; recommendations (procedures, training, equipment). Error types include action omitted, action too early and action too late, with psychological mechanisms of place-losing error and slip of memory, and recovery either at a later step (eg 2.2, 2.4, 2.6) or unavailable. Causes/consequences entries include: overfill of the tanker resulting in a dangerous circumstance (operator estimates time/records amount loaded); feedback when attempting to close a closed valve, otherwise an alarm when liquid is vented to the vent line; possible overpressure of the tanker (see step 2.3); automatic closure on loss of instrument air; latent error. Recommendations include an interlock on the tanker vent valve, a mimic of the valve configuration, explaining the meaning of the audio feedback, and adding a check on final valve positions before proceeding to the next step.]
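A SHERPA-style pass over a task analysis can be sketched in a few lines: each step is classified (for example as an action, checking or information-handling step) and tested against the error modes applicable to that class, with the surviving candidates written out as rows of an HEA table. The step data and the mode-to-task-type mapping below are assumptions made for the sketch, not taken from the tanker example.

```python
# Illustrative SHERPA-style pass: apply error modes to classified task steps
# and emit candidate human error analysis (HEA) rows. Step data and the
# mapping of modes to task types are assumptions made for this sketch.

ACTION_MODES = [
    "Action omitted", "Action too early", "Action too late", "Action too much",
    "Action too little", "Action too long", "Action too short",
    "Action in wrong direction", "Right action on wrong object",
    "Wrong action on right object", "Misalignment error",
]
CHECK_MODES = ["Check omitted", "Check on wrong object", "Wrong check", "Check mistimed"]
INFO_MODES = ["Information not obtained/transmitted", "Wrong information obtained/transmitted"]

MODES_BY_TASK_TYPE = {"action": ACTION_MODES, "check": CHECK_MODES, "information": INFO_MODES}

task_steps = [  # hypothetical steps, loosely echoing the tanker scenario
    ("2.1", "Close remotely operated supply valve", "action"),
    ("2.2", "Check valve position indication", "check"),
]

def hea_rows(steps):
    """Cross each step with the error modes applicable to its task type."""
    for step_id, description, task_type in steps:
        for mode in MODES_BY_TASK_TYPE[task_type]:
            yield {"step": step_id, "task": description, "error mode": mode}

for row in hea_rows(task_steps):
    print(row)
```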
[Figure: hierarchy of error mechanisms (M1-M8) and error causes (C1-C33), including: stressors (doubling, tunnelling, hyperactivity, unplanned response, freeze, mind set, indecision, short cuts, persistence); heterogeneous; misdiagnosis; reduced capabilities; disturbance/interruption (forget stage, stereotype mismatch); system interface (action prevented, identification prevented, perception prevented); endogenous (conscious vs subconscious, random fluctuations, motor coordination); absent minded (mental blocks, substitution, unintentional activation, forget, intrusions); and risk taking (overestimate abilities, rule contravention, risk recognition, risk tolerance).]
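Such a mechanism/cause structure is essentially a two-level taxonomy, which could be held as a simple mapping and used to prompt the analyst for candidate causes once a mechanism is suspected. The grouping below reflects only the labels recoverable from the figure and is indicative rather than exact.

```python
# Indicative two-level taxonomy of error mechanisms and their error causes,
# using labels recoverable from the figure above; the groupings are
# approximate, not a faithful transcription of the M/C codes.

ERROR_CAUSES = {
    "stressors": ["doubling", "tunnelling", "hyperactivity", "unplanned response",
                  "freeze", "mind set", "indecision", "short cuts", "persistence"],
    "disturbance/interruption": ["forget stage", "stereotype mismatch"],
    "system interface": ["action prevented", "identification prevented",
                         "perception prevented"],
    "endogenous": ["conscious vs subconscious", "random fluctuations",
                   "motor coordination"],
    "absent minded": ["mental blocks", "substitution", "unintentional activation",
                      "forget", "intrusions"],
    "risk taking": ["overestimate abilities", "rule contravention",
                    "risk recognition", "risk tolerance"],
    # further mechanisms (eg misdiagnosis, reduced capabilities) appear in the
    # figure without a recoverable cause list
}

def prompt_causes(mechanism: str) -> list[str]:
    """Return the candidate-cause checklist for a suspected mechanism."""
    return ERROR_CAUSES.get(mechanism, [])

print(prompt_causes("system interface"))
```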
Murphy diagrams
Developed by Pew et al [22] as part of a study commissioned by the Electric Power Research Institute in the USA, Murphy diagrams are named after the well-known axiom "if anything can go wrong, it will". Essentially they are pictorial representations, following the pattern of logic trees, which show the error modes and underlying causes associated with cognitive decision-making tasks (based to an extent on the SRK model). Each stage of the decision-making process, from initial activation/detection of the event through to procedure selection and procedure execution, is represented separately.
[Figure: example Murphy diagram for the activation/detection stage. Outcomes: 'detect signal' and 'fail to detect signal'. Proximal sources include the signal being intermittent or non-specific, the signal being partially or totally obscured, and the monitoring procedure not being followed; distal sources include equipment malfunction, display design deficiency, control room and control board design deficiencies, an inexperienced operator, and training deficiency.]
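Although Murphy diagrams are pictorial, their content reduces to a small data structure: for each decision-making stage, a failure outcome is linked to its proximal sources, each of which carries one or more distal sources. A minimal sketch, populated from the activation/detection example above, follows; the field names, and the exact pairing of distal with proximal sources, are choices made for the sketch.

```python
# Minimal representation of a Murphy diagram: stage -> failure outcome ->
# proximal sources -> distal sources. Populated from the activation/detection
# example; field names and exact pairings are choices made for this sketch.

from dataclasses import dataclass, field

@dataclass
class ProximalSource:
    description: str
    distal_sources: list[str] = field(default_factory=list)

@dataclass
class MurphyDiagram:
    stage: str
    failure_outcome: str
    proximal_sources: list[ProximalSource] = field(default_factory=list)

activation = MurphyDiagram(
    stage="activation/detection",
    failure_outcome="fail to detect signal",
    proximal_sources=[
        ProximalSource("signal intermittent or non-specific",
                       ["equipment malfunction", "display design deficiency"]),
        ProximalSource("signal partially or totally obscured",
                       ["control room design deficiency",
                        "control board design deficiency",
                        "display design deficiency"]),
        ProximalSource("monitoring procedure not followed",
                       ["inexperienced operator", "training deficiency"]),
    ],
)

for source in activation.proximal_sources:
    print(source.description, "<-", ", ".join(source.distal_sources))
```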
process. An example from a CADA analysis questionnaire is shown in Table 6, and an example of CADA tabular results in Table 7, taken from ref 24. CADA is interesting in that it is rooted in the SRK domain but gives the analyst more guidance as to the

Table 6 Example from a CADA analysis questionnaire [24]
Possible failure mode: operator(s) could acquire insufficient data to characterize the plant.
You should consider this failure as probable if you answer "yes" to any of the following questions (Y/N):
a. Is it likely that the data acquired by the operator is not relevant to establishing the plant state?
Possible reasons for this failure include: operator is inexperienced; a procedure is not followed; operator searches for a common pattern of data; more than one abnormal plant condition exists.
Relevant questions are: Q10, Q11. Relevant conditions are: C1, C6.

Table 7 Example of CADA tabular results [24]
[Table flattened in extraction. Columns: decision-making stage; failure modes; possible reasons; CADA questions; CADA conditions; remarks; incident. For the activation stage, the failure modes are 'fail to act on alarms' and 'fail to notice or delayed detection', with possible reasons of failure to follow procedure (questions 1-5, condition 1), alarms intermittent (question 6, condition 2) and alarms obscured (question 7, condition 3); further rows address the task definition stage, and the incident column notes that alarms/indicators warned that systems were unavailable.]
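In code, a CADA-style screen amounts to little more than evaluating a set of yes/no questions per failure mode and flagging the mode as probable if any answer is "yes". The sketch below uses the question from the Table 6 example; the data layout and function name are assumptions of the sketch.

```python
# Sketch of a CADA-style screening pass: a failure mode is flagged as
# probable if any of its screening questions is answered "yes".
# Question text follows the Table 6 example; the data layout is assumed.

FAILURE_MODES = {
    "Operator(s) could acquire insufficient data to characterize the plant": [
        "Is it likely that the data acquired by the operator is not relevant "
        "to establishing the plant state?",
        # ...further questions (b, c, ...) would follow in the full questionnaire
    ],
}

def probable_failure_modes(answers: dict[str, bool]) -> list[str]:
    """Return failure modes for which at least one screening question is 'yes'."""
    return [mode for mode, questions in FAILURE_MODES.items()
            if any(answers.get(q, False) for q in questions)]

example_answers = {
    "Is it likely that the data acquired by the operator is not relevant "
    "to establishing the plant state?": True,
}
print(probable_failure_modes(example_answers))
```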
[Figure: structure of a computerized HRA system, comprising a task analysis module (task analysis, task classification, sequence-driven scenario); a human error identification module (human error analysis, cognitive error potential, cognitive error analysis, human error analysis table, generic accident sequence event tree, generic logic tree library); representation, supported by representation guidelines; a quantification module (quantification, sensitivity analysis, re-evaluation); error reduction; and documentation and quality assurance.]
[Example of computerized human error analysis output for a fan failure scenario (cooling fan system, 007 FAN). For each task step an error number is recorded (eg step 1, error no. 01090; step 2, error no. 07090; step 3.2, error no. 13010) together with its error mechanism, recovery, dependency/exclusivity, screening status and comments. Comments include: hardwired alarm very salient, 'copied' to SS console, subsumed by error number 01090 (the alarm quantification module quantifies the total response to the alarm set); and an error used to refer to the failure to identify the recovery route of preventing boiling by achieving recirculation via changing valve status.]
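Each entry in this kind of output is essentially a structured record; a minimal sketch of such a record type is given below. The field names mirror the output above, while the example values and the pairing of the comment with a particular step are assumptions of the sketch.

```python
# Minimal record type for a computerized human error analysis output entry.
# Field names mirror the output above; the example values and the pairing of
# the comment with a particular step are assumptions of this sketch.

from dataclasses import dataclass

@dataclass
class HEAEntry:
    step_no: str
    error_no: str
    error_mechanism: str = ""
    recovery: str = ""
    dependency_exclusivity: str = ""
    screening: str = ""
    comments: str = ""

entry = HEAEntry(
    step_no="3.2",
    error_no="13010",
    comments=("Refers to the failure to identify the recovery route of preventing "
              "boiling by achieving recirculation via changing valve status"),
)

print(entry.step_no, entry.error_no, "-", entry.comments)
```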
[Event tree structure: column headings run from 'prior to event sequence' and 'event sequence commences', through 'event detection', 'response identification', 'response implementation' and 'situation worsens (caused by new errors, events or failures)', to 'final recovery actions' and 'outcome'. Branches cover latent errors, operator initiators, operator actions that worsen the event, response-inadequate action errors, misdiagnosis or decision errors, failing to deal with the new situation, failing to resolve the situation in time, and recovery (or no recovery) from action errors; outcomes are success, partial success or failure.]
Figure 6 Generic accident sequence event tree. L: latent errors of maintenance, calibration etc, especially with respect to
instrumentation and safety systems, can affect every node in the tree. 1: if misdiagnosis can lead to more severe consequences, a
second event tree must be developed accordingly. 2: resolution to a safe state, even though responses have largely been
inappropriate or incomplete, is still feasible depending on the system, eg if rapid (safe) shutdown is possible throughout the
scenario as on some chemical plants
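One way to make the generic tree concrete is to treat it as an ordered set of nodes which a scenario either passes or fails, with latent errors able to degrade any node. The sketch below is a deliberate simplification of the figure's branching logic: node names follow the column headings, while the success/partial-success/failure rules are assumptions made for illustration.

```python
# Simplified walk through a generic accident sequence event tree: the scenario
# succeeds only if every node succeeds, and latent errors are modelled crudely
# here as defeating every node. Node names follow the figure; the outcome
# rules are illustrative assumptions.

NODES = [
    "event detection",
    "response identification",
    "response implementation",
    "recovery from action errors / new events",
    "final recovery actions",
]

def walk_tree(node_success: dict[str, bool], latent_error: bool = False) -> str:
    """Return 'success', 'partial success' or 'failure' for one scenario path."""
    for node in NODES:
        ok = node_success.get(node, True) and not latent_error
        if not ok:
            # a failure at an intermediate node may still be partially recovered
            remaining = NODES[NODES.index(node) + 1:]
            if any(node_success.get(n, False) for n in remaining):
                return "partial success"
            return "failure"
    return "success"

print(walk_tree({"response identification": False, "final recovery actions": True}))
```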
[Figure: example linking plant deviations to their indications for a nitrogen-blanketed hopper and slurry-mix plant. Deviations include a manual or automatic valve closed (or not closed correctly), nitrogen supply exhausted or interrupted, air leakage past a loose or burst rotary-valve rubber sleeve, loose clips on the flexible hose, the FDC filter blocked and filling or completely blocked and full, and no material transferring from the hopper to the slurry-mix vessel. Indications include the VDU displays, line pressure, the alarm log, the level indicator on the tank, a high oxygen reading from the oxygen analyser (digital display), a mass discrepancy noted on the printout, and no mass change.]
Confusion matrices
Potash et al [26] proposed the use of confusion matrices
in order to help analysts to identify potential misdiagnoses. The approach entails, first, generating a list
of the likely abnormal events anticipated for a system
(eg a nuclear power plant). These scenarios are then
assessed by experienced personnel (eg experienced
operators or simulator trainers) in terms of their
relative similarity of indications. Each scenario is
therefore rated in terms of confusability with respect to
each other scenario. These ratings are then represented
in a confusion matrix (see Table 9). The human
reliability analyst can then predict likely misdiagnoses
for each scenario.
The approach is conceptually simple, although in
[Table 9: confusion matrix for 15 initiating events, listed in descending order of frequency and including reactor trip, partial loss of secondary heat removal, reduction in feedwater flow, turbine trip and excessive feedwater flow. Each pair of events is rated for level of scenario confusion (L = low, M = medium, H = high) and for the impact of that confusion on subsequent operator actions (R = impact, N = negligible impact).]
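Assembling and querying such a matrix is straightforward in code; in the sketch below the scenario names and ratings are placeholders rather than the plant-specific data of Table 9.

```python
# Sketch: assemble a scenario confusion matrix from pairwise expert ratings
# and list the pairings most likely to lead to misdiagnosis. Scenario names
# and ratings are placeholders, not the plant-specific data of Table 9.

from itertools import combinations

scenarios = ["reactor trip", "turbine trip", "excessive feedwater flow"]

# Each pair gets (confusion level, impact): L/M/H and R (affects subsequent
# operator actions) or N (negligible impact).
ratings = {
    ("reactor trip", "turbine trip"): ("H", "R"),
    ("reactor trip", "excessive feedwater flow"): ("L", "N"),
    ("turbine trip", "excessive feedwater flow"): ("M", "R"),
}

# Check every scenario pair has been rated by the expert panel.
assert set(ratings) == set(combinations(scenarios, 2))

def likely_misdiagnoses(ratings, min_level="M"):
    """Return pairs rated at or above min_level whose confusion impacts actions."""
    order = {"L": 0, "M": 1, "H": 2}
    return [pair for pair, (level, impact) in ratings.items()
            if order[level] >= order[min_level] and impact == "R"]

for a, b in likely_misdiagnoses(ratings):
    print(f"{a} may be confused with {b}")
```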
Summary
This paper has described twelve approaches to HEI, some basic and some sophisticated, one no longer available (PHECA), and some still at the prototype stage of development (eg CES). An initial assessment of each has been made in terms of defined basic criteria of comprehensiveness, usefulness for error prediction and reduction, and documentability. Part 2, to follow, will compare the techniques against a more detailed set of relevant attributes, providing guidance for tool selection in HRA, and will report on an HEI tool validation study.
References
1 USNRC 'Loss of main and auxiliary feedwater at the Davis-Besse Plant on June 9, 1985' NUREG 1154 (1989)
2 Green, A E Safety systems reliability Wiley (1983)
3 Reason, J T 'A framework for classifying errors' in Rasmussen, J, Duncan, K D and Leplat, J (eds) New technology and human error Wiley (1987)
4 Kirwan, B, Embrey, D E and Rea, K 'The human reliability assessor's guide' Report RTS 88/95Q, NCSR, UKAEA, Culcheth, Cheshire (1988)