Guarnera et al. (2017) - Why Do Forensic Experts Disagree
Recently, the National Research Council, Committee on Identifying the Needs of the Forensic Science Community (2009) and President’s Council of Advisors on Science and Technology (PCAST; 2016) identified significant concerns about unreliability and bias in the forensic sciences. Two broad categories of problems also appear applicable to forensic psychology: (1) unknown or insufficient field reliability of forensic procedures, and (2) experts’ lack of independence from those requesting their services. We overview and integrate research documenting sources of disagreement and bias in forensic psychology evaluations, including limited training and certification for forensic evaluators, unstandardized methods, individual evaluator differences, and adversarial allegiance. Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. We present recommendations for translating these research findings into policy and practice reforms intended to improve reliability and reduce bias in forensic psychology. We also recommend avenues for future research to continue to monitor progress and suggest new reforms.
Lucy A. Guarnera, Department of Psychology, University of Virginia; Daniel C. Murrie, Institute of Law, Psychiatry, and Public Policy, University of Virginia School of Medicine; Marcus T. Boccaccini, Department of Psychology, Sam Houston State University.

Correspondence concerning this article should be addressed to Lucy A. Guarnera, Department of Psychology, University of Virginia, P.O. Box 400400, Charlottesville, VA 22904-4400. E-mail: [email protected]

Imagine you are a criminal defendant or civil litigant undergoing a forensic evaluation by a psychologist, psychiatrist, or other clinician. The forensic evaluator has been tasked with answering a difficult psycholegal question about you and your case. For example, “Were you sane or insane at the time of the offense? How likely is it that you will be violent in the future? Are you psychologically stable enough to fulfill your job duties?” The forensic evaluator interviews you, reads records about your history, speaks to some sources close to you, and perhaps administers some psychological tests. The evaluator then forms a forensic opinion about your case—and the opinion is not in your favor. You might wonder whether most forensic clinicians would have reached this same opinion. Would a second (or third,
The National Research Council, Committee on Identifying the Needs of the Forensic Science Community (2009) and President’s Council of Advisors on Science and Technology (PCAST; 2016) reviewed the state of forensic science, covering a wide range of disciplines including analyses of DNA, fingerprints, hair, tire treads, bite marks, and ballistics. Both governmental councils concluded that the error rates of many forensic techniques are unknown, and that forensic scientists are prone to a variety of contextual biases. Consistent with the National Research Council (NRC) and PCAST’s concerns, research has documented subjectivity and bias even in the forensic science procedures that courts have considered most reliable, such as analyses of DNA (Dror & Hampikian, 2011) and fingerprints (Dror & Rosenthal, 2008).

While forensic evaluators strive for objectivity and seek to avoid conflicts of interest (American Psychological Association, 2013), a forensic opinion may be influenced by multiple sources of variability and bias that can be powerful enough to cause independent evaluators to form different opinions about the same defendant (see Figure 1). The purpose of this review is to summarize and integrate research documenting various sources of disagreement in forensic evaluations, as well as suggest promising avenues of future research. We also present recommendations for translating these research findings into policy and practice reforms intended to improve the reliability of forensic evaluations.

The NRC and PCAST reports identified two broad categories of problems in forensic science that appear applicable to forensic psychology: (1) unknown or insufficient field reliability of forensic procedures, and (2) experts’ lack of independence from those requesting their services. We address both of these areas in turn.

Nezworski, & Stejskal, 1996). In general, the field reliability of forensic opinions is either unknown or far from perfect. For example, a recent meta-analysis concluded that for evaluations of adjudicative competency—one of the most common forensic psychology procedures—pairs of independent evaluators assessing the same defendant disagreed in approximately 15%–30% of cases (Guarnera & Murrie, in press). This corresponds to rater agreement coefficients (i.e., Cohen’s kappa)1 in the range of .30–.65, which indicates fair to moderate agreement according to most kappa interpretation schemes (e.g., Landis & Koch, 1977). Field reliability rates for other common forensic opinions are similar although generally somewhat lower; pairs of independent evaluators tend to disagree in approximately 25%–35% of sanity cases (κ ≈ .25–.65; Guarnera & Murrie, in press) and almost half (45%) of conditional release cases (κ = .19; Acklin, Fuger, & Gowensmith, 2015). As a related issue, the interrater reliability of forensic assessment instruments scored under routine practice conditions in the field is often poorer than what has been documented in controlled validation studies and reported in test manuals (C. S. Miller, Kimonis, Otto, Kline, & Wasserman, 2012).

1 See, generally, Gwet (2014) for a more in-depth definition and discussion of interrater reliability.
We discuss many possible reasons for these less-than-ideal field reliability rates, but one key foundational explanation is that forming a forensic opinion is an extraordinarily difficult task. For example, evaluations of legal sanity require clinicians to use limited and often contradictory information to draw conclusions about the mental state of a defendant at the time they committed the crime, which may have been months or even years ago. A survey of a variety of medical and psychological procedures confirms that complex decision tasks involving the integration of multiple sources of data, such as rating child behavior problems or classifying stroke severity, tend to settle at fair to moderate reliability rates (kappa or intraclass correlation [ICC] ≈ .30–.75; Meyer, Mihura, & Smith, 2005). This is in contrast to simple object counts (e.g., counting decayed or missing teeth) or physical measurements (e.g., measuring organ size on an ultrasound), where reliability tends to be higher, with rater agreement coefficients greater than .90 (Meyer et al., 2005). Along these lines, Mossman (2013) recently performed mathematical simulations of competency evaluations and concluded that fair to moderate reliability estimates were about as good as could reasonably be expected given the inherent difficulty of the task.

Limited Training and Certification for Forensic Evaluators

Besides the unreliability that may be intrinsic to a complex, ambiguous task such as forensic evaluation, research has identified multiple extrinsic sources of expert disagreement. One such source is limited training and certification for forensic evaluators. While specialized training programs and board certifications have become far more commonplace and rigorous since the early days of the field in the 1970s and 1980s, the training and certification of typical clinicians conducting forensic evaluations today remains variable and often poor (DeMatteo, Marczyk, Krauss, & Burl, 2009). For example, only about one third to one half of states have any state-level certification in forensic mental health assessment, and those that do may have weak standards (e.g., attend one brief, initial training session or have previous clinical experience; Gowensmith, Pinals, & Karas, 2015).

Thus, many states continue to have the bulk of their forensic evaluations performed by “occasional experts,” general clinicians without specialized forensic training (Grisso, 1987, p. 833).2 Unsurprisingly, studies assessing the thoroughness, relevance, and accuracy of the reports forensic clinicians submit to the court routinely find them deficient (Fuger, Acklin, Nguyen, Ignacio, & Gowensmith, 2014). For example, Skeem and colleagues (1998) found that competency evaluators’ reports in Utah failed to incorporate legally relevant aspects of competency and failed to adequately describe the reasoning underlying their final forensic opinion.

2 Occasional experts are likely more common in rural or other underresourced areas where forensic mental health assessments are needed, but no highly trained, board-certified forensic clinicians are available. Thus, the court’s only option may be a general clinician without specialized training in forensic assessment.
This training gap is important because empirical research suggests that evaluators with greater training produce more reliable forensic opinions. A compelling recent study conducted in Hawaii examined interrater reliability rates for three types of common forensic opinions (adjudicative competency, legal sanity, and violence risk assessment) both before and after the state adopted more stringent certification standards in 2014 (Gowensmith, Sledd, & Sessarego, 2014). These new standards included a mandatory 3-day training, written test, submission of a mock report, peer review process, and continuing education. Postcertification, reliability rates improved for all three types of evaluations (competency: 13% increase, p = .08; sanity: 17% increase, p = .04; risk: 29% increase, p = .001). Gowensmith and colleagues’ (2014) results provide the first direct evidence that more stringent state-level certification standards can improve the reliability of forensic opinions.

.78 and .74, respectively) than less structured instruments like the Psychopathy Checklist—Revised (PCL-R; Hare, 2003), which showed an ICC_1 of .60 in the field. Even within the PCL-R, more objective items with explicit scoring rules (e.g., criminal versatility, juvenile delinquency, revocation of conditional release; ICC_A1 = .75–.80) tend to show greater field reliability than more subjective items requiring impressionistic judgments (e.g., impulsivity, glibness, callousness; ICC_A1 = .23–.36; Sturup
on agreeableness may have been less willing to assume that equivocal data from the case files indicated psychopathy.

Regarding attitudes, early studies found that evaluators’ personal attitudes toward the insanity defense predicted whether they reached an insanity opinion in case vignettes (Homant & Kennedy, 1987). Vignette-based research and practitioner surveys have found that evaluators with pro-death-penalty attitudes are more likely to find hypothetical defendants competent for execution (Palker-Corell, 2007) or accept referrals for death penalty evaluations (Neal, 2016). Furthermore, evaluators themselves appear to acknowledge the potential influence of attitudes on their forensic work. In a recent qualitative study, many forensic evaluators identified preexisting personal, moral, or political values as influences on their forensic opinions (Neal & Brodsky, 2016).

Forensic Psychologists’ Lack of Independence From the Retaining Party

Upon these concerns about unknown or less-than-ideal field reliability of forensic psychology procedures, we now add concerns about forensic experts’ lack of independence from those requesting their services (NRC, 2009). As far back as the 1800s, legal experts have lamented the apparent frequency of scientific experts espousing the views of the side that hired them (perhaps for financial gain), leading one judge to comment, “[T]he vicious method of the Law, which permits and requires each of the opposing parties to summon the witnesses on the party’s own account[,] . . . naturally makes the witness himself a partisan” (Wigmore, 1924). More modern surveys continue to identify partisan bias as judges’ main concern about expert testimony, citing experts who appear to “abandon objectivity” and “become advocates” for the retaining party (Krafka, Dunn, Johnson, Cecil, & Miletich, 2002, p. 328).

Research on forensic psychologists working within adversarial settings appears to validate some of these concerns about adversarial allegiance, the tendency for experts to reach conclusions that support the party who retained them (Murrie et al., 2009). Some early studies suggested that clinicians drifted toward opinions favorable to the retaining party in real-life civil litigation following a mining disaster (Zusman & Simon, 1983) and in case vignettes simulating sanity evaluations (Otto, 1989).

More recently, using scores from structured risk instruments (e.g., PCL-R, Static-99R) as a convenient way to quantify differences in expert opinion, researchers examining archival data found large scoring differences according to side of retention—prosecution-retained evaluators produced higher risk scores that made the examinee look more dangerous, while defense-retained evaluators produced lower risk scores that made the examinee look more benign. For example, Murrie et al. (2009) found an average difference of 5.8 points on the PCL-R (score range: 0–40) between opposing sexually violent predator (SVP) evaluators in Texas, a difference twice the standard error of measurement reported in the test manual (Hare, 2003).3

Recent surveys also suggest that evaluators tend to interpret risk scores in a way that favors the side that retained them (Boccaccini, Chevalier, Murrie, & Varela, 2015; Chevalier, Boccaccini, Murrie, & Varela, 2015). For example, Chevalier et al. (2015) found that 94% of state-retained SVP evaluators reported using high-risk/need norms for the Static-99R (a way of interpreting scores that makes the examinee seem more risky, as compared to routine sample norms), but only 33% of respondent-retained evaluators reported using high-risk/need norms. Thus, two opposing evaluators who arrive at the same numerical score on a risk assessment instrument might still draw biased conclusions that favor the retaining side through differing norm selection.

3 SVP refers to sexually violent predator provisions, which allow for sexual offenders to be civilly committed after completing their criminal sentence. While SVP proceedings are technically civil, they still involve an adversarial arrangement, with different forensic psychologists testifying for the state and for the respondent (i.e., the individual being considered for commitment).
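To convey the size of the 5.8-point gap reported by Murrie et al. (2009), a brief calculation may help. The passage implies a standard error of measurement (SEM) of roughly 2.9 points (half of 5.8), and under classical test theory the difference between two independent, unbiased scorings of the same examinee has standard deviation SEM × √2. The sketch below is ours; the SEM value is inferred from the passage rather than quoted from the test manual.

    import math

    sem = 2.9                      # SEM implied by the passage (half of 5.8); illustrative
    sd_diff = sem * math.sqrt(2)   # SD of the difference between two independent scorings

    z = 5.8 / sd_diff              # how extreme a 5.8-point gap is, in SD units
    p_gap = math.erfc(z / math.sqrt(2))   # two-tailed normal tail probability
    print(f"z = {z:.2f}; P(|difference| >= 5.8) = {p_gap:.2f}")
    # -> z of about 1.41, p of about .16: if scoring error were purely random,
    #    gaps this large would arise in only a minority of pairs, yet 5.8 points
    #    was the average gap between opposing SVP evaluators.

Under these assumptions, random measurement error alone is an implausible explanation for an average between-side difference of that magnitude.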
These surveys and field studies of adversarial allegiance cannot rule out the possibility of selection effects creating the observed scoring differences (Murrie & Boccaccini, 2015). Attorneys may preselect evaluators whom they know to be sympathetic to their point of view, or gather preliminary opinions from multiple evaluators and ultimately retain only the most favorable opinion. Furthermore, evaluators may self-select according to preexisting attitudes or preferences, choosing to accept or decline particular types of cases or cases from particular referral sources (Neal, 2016). To eliminate the possible influence of selection effects, Murrie and colleagues (2013) conducted an experiment where practicing forensic evaluators were randomly assigned to believe they were working for the prosecution or the defense on a real-world case consultation. Even with random assignment, evaluators still tended to score cases in the direction of allegiance. Unsurprisingly, allegiance effects were larger for the PCL-R (medium to large effect sizes; d = 0.55–0.85) than for the more structured and objective Static-99R (small effect sizes; d = 0.20–0.42).4 While the Murrie et al. (2013) experiment used sex offender case files scored with popular risk assessment instruments, other types of forensic evaluations and instruments likely show the same vulnerability to adversarial allegiance.

4 These effect sizes held true for three out of four cases included in the study. One case, involving an individual with exceptionally low risk, did not show evidence of adversarial allegiance. All evaluators rated this individual as similarly low risk, regardless of side of retention.
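For readers who think in raw scores rather than effect sizes, Cohen’s d can be translated back into instrument points by multiplying by the pooled standard deviation. The sketch below is ours, and the pooled standard deviation of 8 PCL-R points is an assumed value chosen purely for illustration (the study’s actual score dispersion is not reported in this passage).

    # Cohen's d = (mean_1 - mean_2) / pooled_sd, so a given d implies a raw-score
    # gap of d * pooled_sd. The SD of 8 points is an assumption for illustration.
    pooled_sd = 8.0
    for d in (0.20, 0.55, 0.85):
        print(f"d = {d:.2f} -> about {d * pooled_sd:.1f} PCL-R points between sides")
    # -> 1.6, 4.4, and 6.8 points: under this assumed SD, the "medium to large"
    #    allegiance effects amount to several points on a 0-40 scale, even with
    #    evaluators randomly assigned to a side.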
Future Directions for Research, Practice, and Policy

The research overviewed here points to the growing realization that some portion of every forensic opinion—perhaps a larger portion than we might now acknowledge—has more to do with the examiner than the examinee. This is a serious problem that risks arbitrary or unjust outcomes for those undergoing forensic evaluations, as well as diminishing the legal system’s confidence in psychological expertise. Unreliable evaluations can also put the community at risk (e.g., assigning a low risk score to a truly high-risk individual likely to offend again). At the same time, some degree of unreliability and bias on complex human decision tasks is unavoidable in light of our “bounded rationality” (Gigerenzer & Goldstein, 1996). Given this tension, what next steps are possible to prevent forensic psychology from becoming the NRC or PCAST’s next target?

Just as research has helped uncover these problems, further research can continue to define the scope of the problem and suggest solutions. As a much-needed first step, foundational research should establish field reliability rates for various types of forensic evaluations in order to assess the current situation and gauge progress toward improvement. Only a handful of field reliability studies exist for a few types of forensic evaluations (i.e., adjudicative competency, legal sanity, conditional release), and virtually nothing is known about the field reliability of other types of evaluations, particularly civil evaluations. If error rates of forensic psychology procedures were widely known, legal decision makers might be able to weight their confidence in psychological testimony according to the reliability of the procedure in question (Butler, 2013). In addition, by carefully cataloguing variables specific to the examiner, examinee, and evaluation context from which reliability figures are drawn, field reliability research can also shed light on factors associated with better or worse reliability, suggesting further avenues for improvement (Guarnera & Murrie, in press).

Given that increased standardization of forensic methods has the potential to ameliorate multiple sources of unreliability and bias described here, more investigation of forensic instruments, checklists, practice guidelines, and other methods of standardization is a second research priority (Ægisdóttir et al., 2006). Some of this research should continue to focus on creating standardized tools for forensic evaluations and populations for which none are currently available, particularly civil evaluations such as guardianship, child protection, fitness for duty, and civil torts like emotional injury (Heilbrun & Brooks, 2010). Future research can also continue to seek improvements to the currently modest predictive accuracy of risk assessment instruments (Fazel, Singh, Doll, & Grann, 2012). However, given the current gap between the availability of forensic instruments and their limited use by forensic evaluators in the field, perhaps more pressing is research on the implementation of forensic instruments in routine practice. More qualitative (e.g., Pinals, Tillbrook, & Mumley, 2006) and quantitative (e.g., Neal & Grisso, 2014) investigations of how instruments are administered in routine practice, why instruments are or are not used, and what practical obstacles evaluators encounter are needed. Without greater understanding of how instruments are (or are not) implemented in practice—particularly in rural or other underresourced areas—continuing to develop new tools may not translate to their increased use in the field.
Third, a clear recommendation for improving evaluator reliability is that states without standards for the training and certification of forensic experts should adopt them, and states with weak standards (e.g., mere workshop attendance) should strengthen them. What is less clear, however, is what kinds and doses of training can improve reliability with the greatest efficiency. Drawing from extensive research in industrial and organizational psychology, credentialing requirements that mimic the type of work evaluators do as part of their job (e.g., mock reports, peer review, apprenticing) may foster professional competency better than requirements dissimilar to job duties (e.g., written tests; Phillips, 1998). Given that both evaluators and certifying bodies have limited time and resources, research into the most potent ingredients of successful forensic credentialing is a third research priority.

Even while this important research remains to be done, practicing forensic evaluators still have many options to reduce the impact of unreliability and bias in their own work. While many clinicians cite introspection (i.e., looking inward in order to identify one’s own biases) as a primary method to counteract personal ideology, idiosyncratic responses to examinees, and other individual differences (Neal & Brodsky, 2016), research suggests that introspection is ineffective and may even be counterproductive (Pronin, Lin, & Ross, 2002). Thus, more disciplined changes to personal practice are needed. For example, when conducting evaluations for which well-validated structured tools exist, evaluators could commit to using such tools as a personal standard of practice. This would entail justifying to themselves (or preferably colleagues) why they did or did not use an available tool for a particular case. Practicing forensic evaluators could also use simple debiasing methods to counteract confirmation bias, such as the “consider-the-opposite” technique in which evaluators ask themselves, “What are some reasons my initial judgment might be wrong?” (Mussweiler, Strack, & Pfeiffer, 2000). To increase personal accountability, evaluators could keep organized records of their own forensic opinions and instrument scores, or even help organize larger databases for evaluators within their own institution or locality (Lerner & Tetlock, 1999). Using these personal data sets, evaluators might look for mean differences in their own instrument scores when retained by the prosecution versus the defense, or compare their own base rates of incompetency and insanity findings to those of their colleagues.
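As a minimal sketch of what such record-keeping could support (the log entries below are hypothetical, not drawn from any cited study), an evaluator’s personal data set can be screened for a retention-side gap in a few lines:

    # Hypothetical personal log: (side of retention, PCL-R total score) pairs.
    my_cases = [
        ("prosecution", 28), ("prosecution", 31), ("prosecution", 24),
        ("defense", 22), ("defense", 19), ("defense", 25),
    ]

    def mean_score(cases, side):
        """Average instrument score across the cases retained by one side."""
        scores = [score for s, score in cases if s == side]
        return sum(scores) / len(scores)

    gap = mean_score(my_cases, "prosecution") - mean_score(my_cases, "defense")
    print(f"prosecution minus defense mean: {gap:+.1f} PCL-R points")
    # A persistent positive gap across many cases would be a warning sign of
    # allegiance. The same check applies to base rates of incompetency or
    # insanity findings (proportions of cases rather than mean scores).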
Ambitious evaluators could even experiment with blinding themselves to the source of referral in order to counteract adversarial allegiance (Robertson & Kesselheim, 2016). For example, evaluators could try using a case manager, an individual who communicates with attorneys and controls the inflow and outflow of information, in order to prevent irrelevant biasing information (such as the identity of the retaining party) from reaching the evaluator (Dror, 2013). Evaluators may soon be able to market (to attorneys or the court) their willingness to serve as blinded experts, since research suggests that mock jurors view the testimony of blinded experts as more credible (Robertson & Yokum, 2012).

Although individual evaluators can make many voluntary changes today in order to reduce the impact of unreliability and bias on their forensic opinions, other reforms require wider-ranging structural transformation. For example, state-level legislative action is needed to mandate more than one independent forensic opinion. Requiring more than one independent opinion is a powerful way to combat unreliability and bias by reducing the impact of any one evaluator’s error (Larrick, 2004). For example, by statute, Hawaii mandates three independent, nonadversarial forensic opinions for all felony defendants being evaluated for adjudicative competency and legal sanity (Hawaii Revised Statutes, 2003, sections 704-404 and 704-406). Only nine other states require more than one competency evaluator, and 14 states allow (but do not require) more than one evaluator (Gowensmith et al., 2015). For more states to join these ranks, state legislators would need to prioritize funding for multiple forensic evaluations per defendant, likely a substantial outlay. Similarly, more stringent state-level certification standards would require considerable financial investment in the infrastructure necessary to organize trainings, vet certification materials, maintain records, and enforce compliance.
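A back-of-the-envelope calculation illustrates why multiple independent opinions help. Assuming, purely for illustration, that each evaluator errs on a given case with probability .20 and that errors are uncorrelated, a majority-of-three opinion errs roughly half as often:

    # P(majority of 3 wrong) = P(exactly 2 wrong) + P(all 3 wrong),
    # with independent errors at an assumed per-evaluator rate of .20.
    p = 0.20
    p_majority = 3 * p**2 * (1 - p) + p**3
    print(f"single evaluator: {p:.3f}, majority of three: {p_majority:.3f}")
    # -> 0.200 vs 0.104. Correlated errors (shared records, shared biases)
    #    would shrink this gain, which is why the statute's requirement that
    #    the opinions be genuinely independent matters.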
Even slower to change than state legislation and infrastructure might be existing legal norms, such as judges’ current willingness to admit nonblinded, partisan experts. While authoritative calls to action like the NRC and PCAST reports may have some influence, most legal change only happens by the accretion of legal precedent, which is a slow and unpredictable process. Thus, radical changes regarding the roles and expectations of forensic experts—such as “hot tubbing,” a system pioneered in Australia where opposing experts are questioned simultaneously and can also question each other (Edmund, 2009)—seem unlikely to take root any time soon in the American legal system. Regardless, we hope the growing awareness of problems of unreliability and bias in the forensic sciences—in the wake of the NRC and PCAST reports—can spur on legal reforms, as well as create urgency to prioritize some of these larger structural and funding changes within forensic psychology.
References

Acklin, M. W., Fuger, K., & Gowensmith, W. N. (2015). Examiner agreement and judicial consensus in forensic mental health evaluations. Journal of Forensic Psychology Practice, 15, 318–343. http://dx.doi.org/10.1080/15228932.2015.1051447

Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., . . . Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. Counseling Psychologist, 34, 341–382. http://dx.doi.org/10.1177/0011000005285875

American Psychological Association. (2013). Specialty guidelines for forensic psychology. American Psychologist, 68, 7–19. http://dx.doi.org/10.1037/a0029889

Boccaccini, M. T., Chevalier, C. S., Murrie, D. C., & Varela, J. G. (2015). Psychopathy Checklist–Revised use and reporting practices in sexually violent predator evaluations. Sexual Abuse. Advance online publication. http://dx.doi.org/10.1177/1079063215612443

Boccaccini, M. T., Turner, D. B., & Murrie, D. C. (2008). Do some evaluators report consistently higher or lower PCL-R scores than others? Findings from a statewide sample of sexually violent predator evaluations. Psychology, Public Policy, and Law, 14, 262–283. http://dx.doi.org/10.1037/a0014523

Butler, H. A. (2013). Debiasing juror perceptions of the infallibility of forensic identification evidence: The utility of educational and perspective-taking debiasing methods (Unpublished doctoral dissertation). Claremont Graduate University, Claremont, CA.

Chevalier, C. S., Boccaccini, M. T., Murrie, D. C., & Varela, J. G. (2015). Static-99R reporting practices in sexually violent predator cases: Does norm selection reflect adversarial allegiance? Law and Human Behavior, 39, 209–218. http://dx.doi.org/10.1037/lhb0000114

DeMatteo, D., Marczyk, G., Krauss, D. A., & Burl, J. (2009). Educational and training models in forensic psychology. Training and Education in Professional Psychology, 3, 184–191. http://dx.doi.org/10.1037/a0014582

Dror, I. E. (2013). Practical solutions to cognitive and human factor challenges in forensic science. Forensic Science Policy & Management, 4, 105–113. http://dx.doi.org/10.1080/19409044.2014.901437

Dror, I. E., & Hampikian, G. (2011). Subjectivity and bias in forensic DNA mixture interpretation. Science & Justice, 51, 204–208. http://dx.doi.org/10.1016/j.scijus.2011.08.004

Dror, I., & Rosenthal, R. (2008). Meta-analytically quantifying the reliability and biasability of forensic experts. Journal of Forensic Sciences, 53, 900–903. http://dx.doi.org/10.1111/j.1556-4029.2008.00762.x

Edmund, G. (2009). Merton and the hot tub: Scientific conventions and expert evidence in Australian civil procedure. Law and Contemporary Problems, 72, 159–189. http://www.jstor.org/stable/40647170

Epperson, D. L., Kaul, J. D., Goldman, R., Hout, S. J., Hesselton, D., & Alexander, W. (1998). Minnesota Sex Offender Screening Tool—Revised (MnSOST-R). St. Paul, MN: Minnesota Department of Corrections.

Fazel, S., Singh, J. P., Doll, H., & Grann, M. (2012). Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24,827 people: Systematic review and meta-analysis. British Medical Journal, 345, e4692. http://dx.doi.org/10.1136/bmj.e4692

Fuger, K. D., Acklin, M. W., Nguyen, A. H., Ignacio, L. A., & Gowensmith, W. N. (2014). Quality of criminal responsibility reports submitted to the Hawaii judiciary. International Journal of Law and Psychiatry, 37, 272–280. http://dx.doi.org/10.1016/j.ijlp.2013.11.020

Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669. http://dx.doi.org/10.1037/0033-295X.103.4.650

Gowensmith, W. N., Pinals, D. A., & Karas, A. C. (2015). States’ standards for training and certifying evaluators of competency to stand trial. Journal of Forensic Psychology Practice, 15, 295–317. http://dx.doi.org/10.1080/15228932.2015.1046798

Gowensmith, W. N., Sledd, M., & Sessarego, S. (2014). The impact of stringent certification standards on forensic evaluator reliability. Paper presented at the annual meeting of the American Psychological Association, Washington, DC.

Grisso, T. (1987). The economic and scientific future of forensic psychological assessment. American Psychologist, 42, 831–839. http://dx.doi.org/10.1037/0003-066X.42.9.831

Guarnera, L. A., & Murrie, D. C. (in press). Field reliability of competency and sanity opinions: A systematic review and meta-analysis. Psychological Assessment.

Gwet, K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Gaithersburg, MD: Advanced Analytics.

Hare, R. D. (2003). The Hare Psychopathy Checklist–Revised (2nd ed.). Toronto, Ontario, Canada: Multi-Health Systems.

Hawaii Revised Statutes, Vol. 14, 704-404 (2003).

Heilbrun, K., & Brooks, S. (2010). Forensic psychology and forensic science: A proposed agenda for the next decade. Psychology, Public Policy, and Law, 16, 219–253. http://dx.doi.org/10.1037/a0019138

Helmus, L., Thornton, D., Hanson, R. K., & Babchishin, K. M. (2012). Improving the predictive accuracy of Static-99 and Static-2002 with older sex offenders: Revised age weights. Sexual Abuse, 24, 64–101.

Homant, R. J., & Kennedy, D. B. (1987). Subjective factors in clinicians’ judgments of insanity: Comparison of a hypothetical case and an actual case. Professional Psychology: Research and Practice, 18, 439–446. http://dx.doi.org/10.1037/0735-7028.18.5.439

Krafka, C., Dunn, M. A., Johnson, M. T., Cecil, J. S., & Miletich, D. (2002). Judge and attorney experiences, practices, and concerns regarding expert testimony in federal civil trials. Psychology, Public Policy, and Law, 8, 309–332. http://dx.doi.org/10.1037/1076-8971.8.3.309

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174. http://dx.doi.org/10.2307/2529310

Larrick, R. P. (2004). Debiasing. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 316–338). Oxford, UK: Blackwell. http://dx.doi.org/10.1002/9780470752937.ch16

Lerner, J. S., & Tetlock, P. E. (1999). Accounting for the effects of accountability. Psychological Bulletin, 125, 255–275. http://dx.doi.org/10.1037/0033-2909.125.2.255

Meyer, G. J., Mihura, J. L., & Smith, B. L. (2005). The interclinician reliability of Rorschach interpretation in four data sets. Journal of Personality Assessment, 84, 296–314. http://dx.doi.org/10.1207/s15327752jpa8403_09

Miller, A. K., Rufino, K. A., Boccaccini, M. T., Jackson, R. L., & Murrie, D. C. (2011). On individual differences in person perception: Raters’ personality traits relate to their Psychopathy Checklist—Revised scoring tendencies. Assessment, 18, 253–260. http://dx.doi.org/10.1177/1073191111402460

Miller, C. S., Kimonis, E. R., Otto, R. K., Kline, S. M., & Wasserman, A. L. (2012). Reliability of risk assessment measures used in sexually violent predator proceedings. Psychological Assessment, 24, 944–953. http://dx.doi.org/10.1037/a0028411

Mossman, D. (2013). When forensic examiners disagree: Bias, or just inaccuracy? Psychology, Public Policy, and Law, 19, 40–55. http://dx.doi.org/10.1037/a0029242

Murrie, D. C., & Boccaccini, M. T. (2015). Adversarial allegiance among forensic experts. Annual Review of Law and Social Science, 11, 37–55. http://dx.doi.org/10.1146/annurev-lawsocsci-120814-121714

Murrie, D. C., Boccaccini, M. T., Guarnera, L. A., & Rufino, K. A. (2013). Are forensic experts biased by the side that retained them? Psychological Science, 24, 1889–1897. http://dx.doi.org/10.1177/0956797613481812

Murrie, D. C., Boccaccini, M. T., Turner, D. B., Meeks, M., Woods, C., & Tussey, C. (2009). Rater (dis)agreement on risk assessment measures in sexually violent predator proceedings: Evidence of adversarial allegiance in forensic evaluation? Psychology, Public Policy, and Law, 15, 19–53. http://dx.doi.org/10.1037/a0014897

Murrie, D. C., Boccaccini, M. T., Zapf, P. A., Warren, J. I., & Henderson, C. E. (2008). Clinician variation in findings of competence to stand trial. Psychology, Public Policy, and Law, 14, 177–193. http://dx.doi.org/10.1037/a0013578

Murrie, D. C., & Warren, J. I. (2005). Clinician variation in rates of legal sanity opinions: Implications for self-monitoring. Professional Psychology: Research and Practice, 36, 519–524. http://dx.doi.org/10.1037/0735-7028.36.5.519

Mussweiler, T., Strack, F., & Pfeiffer, T. (2000). Overcoming the inevitable anchoring effect: Considering the opposite compensates for selective accessibility. Personality and Social Psychology Bulletin, 26, 1142–1150. http://dx.doi.org/10.1177/01461672002611010

National Research Council, Committee on Identifying the Needs of the Forensic Science Community. (2009). Strengthening forensic science in the United States: A path forward. Washington, DC: National Academies Press. Retrieved from https://www.ncjrs.gov/pdffiles1/nij/grants/228091.pdf

Neal, T. M. (2016). Are forensic experts already biased before adversarial legal parties hire them? PLoS ONE, 11, e0154434. http://dx.doi.org/10.1371/journal.pone.0154434

Neal, T., & Brodsky, S. L. (2016). Forensic psychologists’ perceptions of bias and potential correction strategies in forensic mental health evaluations. Psychology, Public Policy, and Law, 22, 58–76. http://dx.doi.org/10.1037/law0000077

Neal, T., & Grisso, T. (2014). Assessment practices and expert judgment methods in forensic psychology and psychiatry: An international snapshot.

Pronin, E., Lin, D. Y., & Ross, L. (2002). The bias blind spot: Perception of bias in self versus others. Personality and Social Psychology Bulletin, 28, 369–381. http://dx.doi.org/10.1177/0146167202286008

Robertson, C. T., & Kesselheim, A. S. (2016). Blinding as a solution to bias: Strengthening biomedical science, forensic science, and law. San Diego, CA: Elsevier.

Robertson, C. T., & Yokum, D. V. (2012). The effect of blinded experts on juror verdicts. Journal of Empirical Legal Studies, 9, 765–794. http://dx.doi