SYSTEM FACTORS. Michael, 2019. How the Order of Questions Affects


Memory

ISSN: 0965-8211 (Print) 1464-0686 (Online) Journal homepage: https://www.tandfonline.com/loi/pmem20

How do ordered questions bias eyewitnesses?

Robert B. Michael & Maryanne Garry

To cite this article: Robert B. Michael & Maryanne Garry (2019): How do ordered questions bias
eyewitnesses?, Memory, DOI: 10.1080/09658211.2019.1607388

To link to this article: https://doi.org/10.1080/09658211.2019.1607388

Published online: 17 Apr 2019.


Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=pmem20

How do ordered questions bias eyewitnesses?


Robert B. Michaelᵃ* and Maryanne Garryᵇ*

ᵃDepartment of Psychology, University of Louisiana Lafayette, Louisiana, USA; ᵇPsychology Department, The University of Waikato, Hamilton, New Zealand

ABSTRACT

Background: Suggestive techniques can distort eyewitness memory (Wells & Loftus, 2003, Eyewitness memory for people and events. In A. M. Goldstein (Ed.), Handbook of psychology: Forensic Psychology, Vol. 11 (pp. 149–160). Hoboken, NY: John Wiley & Sons Inc). Recently, we found that suggestion is unnecessary: Simply reversing the arrangement of questions put to eyewitnesses changed what they believed (Michael & Garry, 2016, Ordered questions bias eyewitnesses and jurors. Psychonomic Bulletin & Review, 23, 601–608. doi:10.3758/s13423-015-0933-1). But why? One explanation might be that early questions set an anchor that eyewitnesses then adjust away from insufficiently. Methods: We tracked how eyewitness beliefs changed over the course of questioning. We then investigated the influence of people’s need to engage in and enjoy effortful cognition. This factor, “Need for Cognition,” (NFC) affects the degree to which people adjust (Cacioppo, Petty, & Feng Kao, 1984, The efficient assessment of need for cognition. Journal of Personality Assessment, 48, 306–307. doi:10.1207/s15327752jpa4803_13; Epley & Gilovich, 2006, The anchoring-and-adjustment heuristic: Why the adjustments are insufficient. Psychological Science, 17, 311–318. doi:10.1111/j.1467-9280.2006.01704.x). Results: In our first two experiments we found results consistent with an anchoring-and-adjustment account. But in Experiments 3 and 4 we found that NFC provided only partial support for that account. Conclusions: Taken together, these findings have implications for understanding how people form beliefs about the accuracy of their memory.

ARTICLE HISTORY
Received 20 July 2018
Accepted 5 April 2019

KEYWORDS
Eyewitness; memory; question order; anchoring-and-adjustment



Eyewitnesses play a critical role in the criminal justice system. But more than 40 years of psychological research shows that suggestive techniques such as leading questions or post-identification feedback can distort eyewitness memory and confidence (Douglass & Steblay, 2006; Frenda, Nichols, & Loftus, 2011; Loftus, 2005; Loftus, Donders, Hoffman, & Schooler, 1989). When eyewitnesses are unknowingly wrong, their confidence becomes a problem because jurors find it persuasive (Cutler, Penrod, & Dexter, 1990; Douglass, Neuschatz, Imrich, & Wilkinson, 2010). That persuasiveness is especially troublesome in the context of eyewitness identifications. We now know that about 70% of wrongful convictions involve eyewitnesses who mistakenly identify the wrong perpetrator (Innocence Project, 2018). It is therefore important to investigate the factors that affect eyewitnesses’ beliefs about their memory.

Recently, we discovered that eyewitnesses’ beliefs about their memory can be manipulated without the use of suggestive techniques: All it took was a simple change to the order of a set of questions. We asked people to watch a simulated crime, and then we asked them questions about what they had seen. When we arranged those questions from the easiest to the most difficult (easy-to-difficult), people believed they had answered more questions correctly and were more confident about what they remembered, compared with their counterparts for whom we had arranged those same questions the other way around (difficult-to-easy; Michael & Garry, 2016).

These findings are consistent with research investigating the influence of ordered questions within an educational context (Jackson & Greene, 2014; Weinstein & Roediger, 2010, 2012). But despite an accumulating body of research, we know little about the mechanisms underlying these effects (Jackson & Greene, 2014; Michael & Garry, 2016; Michael & Weinstein, 2018; Weinstein & Roediger, 2010, 2012). More specifically, the following question remains largely unanswered: How do ordered questions influence people’s beliefs?

CONTACT Robert B. Michael [email protected] Department of Psychology, University of Louisiana Lafayette, PO Box 43644, Lafayette, LA,
USA
*Both authors contributed to the conception of the research and its design. Robert B. Michael collected and analyzed data. Both authors interpreted the data
and contributed to the writing of the article, including drafts and critical revisions. Robert B. Michael and Maryanne Garry approved the final version of the
article.
Supplemental data for this article can be accessed at doi:10.1080/09658211.2019.1607388
© 2019 Informa UK Limited, trading as Taylor & Francis Group

We know from research that at least two theories provide explanations that are unlikely. The first of these theories – the affect heuristic – proposes that people’s feelings can quickly and automatically influence their subsequent information processing (for a review, see Slovic, Finucane, Peters, & MacGregor, 2007). Within the context of ordered questions, early easy questions should produce positive affect, while early difficult questions should produce negative affect. These affective states could then influence people’s interpretation of the later questions. If this explanation were true, then we should expect people’s confidence in their answers to questions to vary depending on where in a sequence those questions appear. For example, if the first few questions were difficult, then people’s confidence for a subsequent easy question should be lower than if that same easy question had appeared early on. But it is not. Results from both the educational and eyewitness domains show that confidence ratings for specific questions are similar, regardless of when those questions appear in a sequence (Michael & Garry, 2016; Weinstein & Roediger, 2010, 2012).

The second theory – the availability heuristic – proposes instead that people rely on the information that most easily springs to mind when making decisions and evaluations (Tversky & Kahneman, 1973). Within the context of ordered questions, early questions suffer less from a buildup of interference and can be rehearsed more than later questions (Rundus, 1971). Therefore, when people are later asked to estimate their performance on the whole test, we might expect that what springs to mind are the early parts of the test. If this explanation were true, we should see that people can most easily remember the early test questions. But that is not what we see. In fact, what little research there is instead finds that people tend to remember the later questions best (Franco, 2015; Jones & Roediger, 1995). Moreover, other work shows that differences in beliefs develop while people take the test, and not solely afterward as a result of remembering the experience (Weinstein & Roediger, 2012).

The affect and availability heuristic explanations seem inadequate. Where, then, does that leave us? One promising alternative theory – the anchoring-and-adjustment heuristic – proposes that in situations of uncertainty, people rely on an initial piece of information as a starting point when providing estimated answers to questions (Tversky & Kahneman, 1974). This “anchor” need not be given explicitly; it can be self-generated. For example, when asked to estimate the freezing point of vodka, what springs to mind for most people who are uncertain of the true answer is the freezing point of water – an anchor that people’s estimates are skewed towards (Epley & Gilovich, 2006). But why are adjustments away from these self-generated anchors typically insufficient? Research suggests that the adjustment process is effortful and stops once people reach a plausible value (Epley & Gilovich, 2006). Because a plausible range of values will often include values between the anchor and the true answer, insufficient adjustment becomes a likely outcome.

How would the anchoring-and-adjustment heuristic explain the influence of ordered questions on people’s beliefs? We hypothesise that people generate their initial beliefs about test performance and memory confidence based on the ease or difficulty of early test questions. More specifically, that easy-to-difficult subjects hold initial beliefs of relatively good test performance and high confidence in their memory, while difficult-to-easy subjects hold initial beliefs of relatively poor test performance and low confidence in their memory. We further hypothesise that people adjust these beliefs as the test becomes progressively easier or more difficult, but only to the degree that the adjustment is plausible. The result? People’s final beliefs about how they performed on the test, or their confidence in their memory, are skewed toward their initial anchor. In other words, despite both question arrangement groups answering the same overall set of questions, their resulting beliefs are not the same.

Some evidence from the existing research fits with the anchoring-and-adjustment theory. As noted earlier, differences in people’s beliefs emerge as the test progresses, and not only at the end (Weinstein & Roediger, 2012). But we are still missing a finer-grained examination of how these biases develop. For example, one important but unanswered question is: How do these differences emerge between people who answer easy-to-difficult questions and those who answer difficult-to-easy questions? Is it all over after the very first question, or is some minimum number necessary before these groups start to diverge? In addition, we know nothing about how or why people adjust their beliefs over the course of questioning. One possibility – consistent with the anchoring-and-adjustment explanation – is that people who answer easy-to-difficult questions develop an initial impression that the test is easy and they’re performing well, then adjust this belief as the test becomes progressively more difficult. People who answer difficult-to-easy questions might do exactly the opposite. The problem is that we do not know if the theory is correct and that this approach is really how people behave.

To address this problem, we first conducted two experiments (Experiments 1 and 2) in which we asked people to predict, after every test question, how many of the 30 total questions they would answer correctly. Across both experiments, we found initial support for an anchoring-and-adjustment explanation. Then, in an effort to add nuance to this theoretical account, we conducted two additional experiments (Experiments 3 and 4). Specifically, we know that people with a relatively strong desire to engage in effortful thinking tend to make more sufficient adjustments than people with a relatively weak desire (Epley & Gilovich, 2006). We hypothesised that if people’s beliefs are indeed the product of an anchoring-and-adjustment heuristic, then the desire to engage in effortful thinking, or Need For Cognition (NFC; Cacioppo et al., 1984), should

influence the magnitude of those beliefs. Across both experiments, however, we found only partial support for this explanation.

Experiment 1

If the anchoring-and-adjustment explanation is correct, then subjects who answer questions arranged from the easiest to most difficult should initially believe they are doing well, but should then adjust their estimates downward over the course of the test. Conversely, subjects who answer questions arranged from the most difficult to the easiest should show the opposite pattern, initially believing they are doing poorly, but adjusting their estimates upward over the course of the test. In addition, subjects should make insufficient adjustments to these estimates, resulting in group differences even at the end of the test (Epley & Gilovich, 2006). To investigate these predictions, we tracked how subjects’ beliefs about their performance changed over the course of questioning. We repeatedly asked subjects to predict how many of the 30 total questions they would answer correctly.

Method

Subjects
In our earlier work, we populated each cell of the experimental design with a minimum of 50 subjects (Michael & Garry, 2016). In line with Cumming’s (2012) recommendations, we aimed to boost precision in the current experiment with a sample size of 100 per cell (200 total). We ultimately recruited a total of 218 Amazon Mechanical Turk workers, because Mechanical Turk and Qualtrics – our experimental software – interact such that it is possible to unintentionally collect more data points than requested.

Design
We manipulated Question Order (easy-to-difficult, difficult-to-easy) between subjects.

Procedure
The experiment had four phases. First, we told subjects the study was examining learning styles. Subjects then watched a video of a tradesman who stole items from the unoccupied house in which he was working (Takarangi, Parker, & Garry, 2006).

The second phase began when the video ended. Subjects solved Sudoku number puzzles for 10 min as a filler task.

In the third phase, subjects took a surprise memory test consisting of 30 two-alternative forced choice (2AFC) questions about the video. These questions, drawn from and normed in our earlier work, were arranged sequentially from those that people answer with the lowest confidence to those that people answer with the highest confidence (difficult-to-easy) or vice versa (easy-to-difficult); these arrangements are highly related to reported question difficulty (see Michael & Garry, 2016).¹ Subjects were randomly assigned one of these test versions. For each test question, subjects used a scale from 1 (Not at all confident) to 5 (Very confident) to report their confidence they had selected the correct answer. This item-confidence measure served primarily as a manipulation check. Critically, between each test question we asked subjects, “This test consists of 30 questions total. How many of those questions do you think you will get correct?” Subjects responded with a number between 0 and 30.

The fourth phase followed the test. Subjects answered two randomly ordered questions. One question asked: “The memory test about Eric the Electrician consisted of 30 questions. How many of those questions do you think you answered correctly?” Subjects responded with a number between 0 and 30. This question, in combination with those asked in the third phase, results in 30 estimates of performance for each subject, staggered across the test. The other question asked: “How confident are you about the accuracy of your memory for the video?” Subjects responded on a scale from 1 (Not at all confident) to 5 (Very confident).

Results and Discussion

We first carried out a manipulation check by examining mean confidence ratings for individual test questions. These data appear in the bottom panel of Figure 1 and show that our manipulation was successful: the difficult-to-easy subjects were increasingly confident in their test answers (M_1 = 1.75, SD_1 = 1.07; M_30 = 4.76, SD_30 = 0.66, r = .52, 95% CI [.50, .55], p < .01), and easy-to-difficult subjects were the opposite (M_1 = 4.77, SD_1 = 0.77; M_30 = 1.83, SD_30 = 1.09, r = −.57, 95% CI [−.59, −.54], p < .01). We also found that the order of questions had no meaningful effect on overall test performance, M_difficult-to-easy = 20.72, SD_difficult-to-easy = 3.12; M_easy-to-difficult = 20.54, SD_easy-to-difficult = 2.52; M_diff = 0.17, 95% CI [−0.58, 0.93], t(216) = 0.45, p = .65.

Figure 1. Top panel: Mean estimated total test scores reported after each test question as a function of question arrangement. Bottom panel: Mean confidence of a correct answer for each test question as a function of question arrangement. Error bars represent 95% confidence intervals of means. Data are from Experiment 1.

We next examined responses to the questions asked in the fourth phase, to determine the extent to which the order of test questions affected how well subjects believed they performed on the test and how confident they felt about the accuracy of their memory for the video. These data showed that difficult-to-easy subjects believed they performed more poorly on the test than easy-to-difficult subjects, M_difficult-to-easy = 15.08, SD_difficult-to-easy = 5.04; M_easy-to-difficult = 18.49, SD_easy-to-difficult = 5.38; M_diff = 3.42, 95% CI [2.02, 4.81], t(216) = 4.83, p < .01. Surprisingly, however, these differences did not extend to subjects’ reported confidence in the accuracy of their memory for the video, M_difficult-to-easy = 3.00, SD_difficult-to-easy = 0.91; M_easy-to-difficult = 3.00, SD_easy-to-difficult = 0.98; M_diff = 0.00, 95% CI [−0.25, 0.25], t(216) = 0.00, p = 1.00.

What are we to make of these results? On the one hand, the findings partially replicate our earlier experiments, showing that the arrangement of questions influences people’s beliefs about their test performance. But on the other hand, we did not replicate our earlier findings with respect to memory confidence. One possible explanation is that people believe their test performance reflects the ease or difficulty of the test questions themselves, rather than reflecting the quality of their memory. That potential difference in attribution fits with research showing that people rely on anchors less as their compatibility with target judgments decreases (Chapman & Johnson, 2002). If this explanation is true, then it is plausible that the arrangement of questions influences estimates of test performance, but does little to influence judgments of memory confidence. Another explanation is that the arrangement of questions has a smaller true influence on confidence than we estimated in our earlier work; these results might therefore reflect ordinary sampling variability.

We now turn to our primary question: How do people adjust their beliefs over the course of questioning? To answer this question, we examined the mean predicted test scores people reported after each test question; these data appear in the top panel of Figure 1.
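As a rough arithmetic check on the Experiment 1 comparison of final test estimates (reported as t(216) = 4.83), the test statistic can be recomputed from the published means and standard deviations. The sketch below is illustrative only and rests on two assumptions that the article does not state: that the 218 subjects were split evenly (109 per group), and that the reported test is a pooled-variance independent-samples t-test, which the reported 216 degrees of freedom suggest. The function name `pooled_t_from_summary` and the critical value `T_CRIT` are introduced here for illustration.

```python
import math


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Return (mean difference, standard error, t, df) for a
    pooled-variance independent-samples t-test, computed from
    group means, standard deviations, and sample sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return diff, se, diff / se, df


# ASSUMPTION: equal group sizes; the article reports only the total N of 218.
N_PER_GROUP = 109
# ASSUMPTION: approximate two-tailed .05 critical value of t for df = 216.
T_CRIT = 1.971

# Final test estimates from Experiment 1 (easy-to-difficult vs difficult-to-easy).
diff, se, t_stat, df = pooled_t_from_summary(
    18.49, 5.38, N_PER_GROUP,   # easy-to-difficult: M, SD, n
    15.08, 5.04, N_PER_GROUP,   # difficult-to-easy: M, SD, n
)
ci_low, ci_high = diff - T_CRIT * se, diff + T_CRIT * se

print(f"diff = {diff:.2f}, t({df}) = {t_stat:.2f}, "
      f"95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumptions the recomputed statistic and interval land close to the reported values; small discrepancies in the last decimal place are expected because the inputs are rounded summary statistics rather than raw data.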

As the figure shows, the influence of a question depended on the difficulty of that question and when it appeared. After the first question, easy-to-difficult subjects made predictions that were high (M_1 = 23.83, SD_1 = 4.79) and descended over the course of the test (M_30 = 18.49, SD_30 = 5.38). Put another way, regression analyses showed that their adjustments fit to a straight line²: estimate = 25.03 − 0.19 * Time point; R² = .08, F(1, 3358) = 291.26, p < .01. But the difficult-to-easy subjects did not do the opposite; instead, even after the first question their predictions were lower than those of their easy-to-difficult counterparts (M_1 = 17.95, SD_1 = 5.36). These predictions continued to drop, reaching their lowest point after the ninth question (M_9 = 10.00, SD_9 = 6.87), at which point they ascended over the remainder of the test (M_30 = 15.08, SD_30 = 5.04). Put another way, regression analyses showed that their adjustments fit to a cubic curve: estimate = 6.28 + 0.27 * Time point + 0.02 * (Time point − 15.5)² − 0.002 * (Time point − 15.5)³; R² = .08, F(3, 3176) = 86.28, p < .01.

In addition, a repeated-measures analysis of variance (ANOVA) revealed an interaction between Time point and Question Order, F(29, 188) = 12.80, p < .01. Follow-up Bonferroni-corrected comparisons (i.e., α = .05 / 30 = 0.00167) revealed statistically significant differences between the two groups at every time point. The maximum difference in predictions occurred after the 9th test question, M_difficult-to-easy = 10.00, SD_difficult-to-easy = 6.87; M_easy-to-difficult = 23.62, SD_easy-to-difficult = 5.26; M_diff = 13.62, 95% CI [11.99, 15.25], t(216) = 16.48, p < .0001, and the minimum difference in predictions occurred after the final test question, M_difficult-to-easy = 15.08, SD_difficult-to-easy = 5.04; M_easy-to-difficult = 18.49, SD_easy-to-difficult = 5.38; M_diff = 3.42, 95% CI [2.02, 4.81], t(216) = 4.83, p < .0001.

Taken together, these findings show that people adjust their beliefs about performance during questioning. Moreover, the narrowing gap between estimates from the easy-to-difficult and difficult-to-easy subjects is consistent with an anchoring-and-adjustment explanation. To determine the extent to which these patterns would replicate and generalise to a different question format, we conducted Experiment 2.

Experiment 2

Method

Subjects
We recruited 200 Mechanical Turk workers. Two subjects were excluded due to missing data.

Design
The design was the same as Experiment 1.

Procedure
The procedure was the same as Experiment 1, except we converted each 2AFC question into a cued-recall question.

Results and Discussion

We first carried out a manipulation check by examining mean confidence ratings for individual test questions. These data appear in the bottom panel of Figure 2 and show that our manipulation worked: difficult-to-easy subjects were increasingly confident in their test answers (M_1 = 1.99, SD_1 = 1.19; M_30 = 4.35, SD_30 = 1.29, r = 0.41, 95% CI [.38, .44], p < .01), and easy-to-difficult subjects were the opposite (M_1 = 4.39, SD_1 = 0.92; M_30 = 1.72, SD_30 = 1.06, r = −.45, 95% CI [−.48, −.42], p < .01).

We next scored subjects’ responses to the questions by a computerised keyword search. For example, if a subject’s response to the question, “How many toothbrushes were in the bathroom?” included either “six” or “6” it was marked correct. In our prior work using the same scoring criteria, we found a high correlation with a blind rater’s hand-scores (r = .96, p < .01; Michael & Garry, 2016). As in Experiment 1, the order of questions had no meaningful influence on overall test performance, M_difficult-to-easy = 11.95, SD_difficult-to-easy = 3.71; M_easy-to-difficult = 11.13, SD_easy-to-difficult = 3.83; M_diff = 0.82, 95% CI [−0.24, 1.88], t(196) = 1.53, p = .13.

We next examined the extent to which the order of test questions affected how well subjects believed they performed on the test and how confident they felt about the accuracy of their memory for the video. These data showed that difficult-to-easy subjects believed they performed more poorly on the test than easy-to-difficult subjects, M_difficult-to-easy = 10.09, SD_difficult-to-easy = 4.37; M_easy-to-difficult = 13.55, SD_easy-to-difficult = 6.54; M_diff = 3.46, 95% CI [1.89, 5.03], t(196) = 4.33, p < .01. But these differences did not extend to subjects’ reported confidence in the accuracy of their memory for the video. We found only a trivial difference in subjects’ post-test confidence ratings, M_difficult-to-easy = 2.59, SD_difficult-to-easy = 0.95; M_easy-to-difficult = 2.48, SD_easy-to-difficult = 0.98; M_diff = 0.11, 95% CI [−0.16, 0.38], t(196) = 0.83, p = .41. Taken together, these findings are consistent with Experiment 1.

Next, we examined the mean predicted test scores people reported after each test question; these data appear in the top panel of Figure 2. This pattern looks remarkably similar to the pattern in Figure 1. After just one question, the easy-to-difficult subjects made predictions that were high (M_1 = 21.85, SD_1 = 5.66) and then descended over the course of the test (M_30 = 13.55, SD_30 = 6.54). Put another way, regression analyses showed that their adjustments fit to a straight line: estimate = 22.79 − 0.28 * Time point; R² = .12, F(1, 3088) = 405.44, p < .01. But the difficult-to-easy subjects – after just one question – made predictions that were lower than their easy-to-difficult counterparts (M_1 = 15.31, SD_1 = 5.66). These predictions continued to drop, reaching their lowest point after the eleventh question (M_11 = 6.82, SD_11 = 5.97), and then ascended over the remainder of the test (M_30 = 10.09, SD_30 = 4.37). Put another way, regression analyses showed that their adjustments fit to a cubic curve: estimate = 4.70 + 0.16 * Time point + 0.02 * (Time point − 15.5)² − 0.001 * (Time point − 15.5)³; R² = .08, F(3, 2846) = 81.83, p < .01.

Figure 2. Top panel: Mean estimated total test scores reported after each test question as a function of question arrangement. Bottom panel: Mean confidence of a correct answer for each test question as a function of question arrangement. Error bars represent 95% confidence intervals of means. Data are from Experiment 2.

In addition, a repeated-measures ANOVA revealed an interaction between Time point and Question Order, F(29, 168) = 11.58, p < .01. Follow-up Bonferroni-corrected comparisons (i.e., α = .05 / 30 = 0.00167) revealed statistically significant differences between the two groups at every time point. The maximum difference in predictions occurred after the 6th test question, M_difficult-to-easy = 7.74, SD_difficult-to-easy = 6.78; M_easy-to-difficult = 20.97, SD_easy-to-difficult = 6.23; M_diff = 13.23, 95% CI [11.41, 15.06], t(196) = 14.31, p < .0001, and the minimum difference in predictions occurred after the final test question, M_difficult-to-easy = 10.09, SD_difficult-to-easy = 4.37; M_easy-to-difficult = 13.55, SD_easy-to-difficult = 6.54; M_diff = 3.46, 95% CI [1.89, 5.03], t(196) = 4.33, p < .0001.

Taken together, Experiments 1 and 2 are consistent with the idea that people developed beliefs using an anchoring-and-adjustment heuristic. The results suggest that the ease or difficulty with which people experienced the first test question provided an anchoring point that constrained adjustments across the remainder of the test. The end result was a difference in what people believed about their performance – even though everyone answered the

same set of questions, and their actual performance was the same.

But the way people adjusted over the course of the test was more interesting than we predicted. Specifically, we predicted that easy-to-difficult subjects would initially believe they were performing well, and would insufficiently adjust their beliefs downward over the course of the test. By contrast, we predicted that difficult-to-easy subjects would show the inverse. Instead, we found a more complex pattern, in which easy-to-difficult subjects behaved as expected, but difficult-to-easy subjects first adjusted down before slowly adjusting back up. Moreover, these patterns were consistent across Experiments 1 and 2. In summary, the two test arrangements produce markedly different experiences.

There are at least two possible explanations for these different experiences. First, given the ambiguous difficulty of upcoming test questions, easy-to-difficult subjects might be somewhat cautious in their initial optimism. Perhaps these subjects anticipated the unlikely situation that they would answer every question correctly; therefore, they rapidly hit a subjective ceiling. The only reasonable adjustment these subjects could then have made early on was downward, or none at all. Second, difficult-to-easy subjects may have been unwilling to initially anchor at a sufficiently low value – perhaps sensibly, given the ambiguous difficulty of upcoming test questions. But as these subjects accrued evidence over the early questions that they were performing poorly, they continued to adjust downward, before eventually recognising that the questions were getting easier. These two explanations are not mutually exclusive.

What other evidence might we look for to support or refute an anchoring-and-adjustment explanation? We suspected that one promising approach would be to examine people’s Need For Cognition, because people who enjoy effortful thinking adjust more sufficiently than people who do not (Cacioppo et al., 1984; Epley & Gilovich, 2006). In Experiment 3, we set out to replicate Experiment 1 while considering the role of people’s NFC. We hypothesised – in accord with the theoretical account – that people with high NFC would make larger adjustments over the course of questioning than people with low NFC. More specifically, we predicted that: (1) easy-to-difficult subjects with high NFC would initially adjust upward beyond the subjective ceiling of their low NFC counterparts, before making larger downward adjustments; (2) difficult-to-easy subjects with high NFC would initially adjust downward beyond their low NFC counterparts, before making larger upward adjustments.

Experiment 3

Method

Subjects
We recruited 400 Mechanical Turk workers.

Design
The design was the same as Experiment 1, except that we additionally split the sample into two groups based on NFC scores (High NFC, Low NFC).

Procedure
The procedure was the same as Experiment 1, except as follows.

First, we removed the confidence ratings subjects reported for their answers to each test question. We removed this manipulation check because (a) we have already established across multiple experiments – both here and in our prior work – that the manipulation is effective, and (b) it is possible the confidence judgments about individual questions were confounding the predictions subjects repeatedly provided in Experiments 1 and 2. If so, then we might expect to see a different pattern of developing beliefs when this confound is removed.

Second, we included an 18-item short form of the Need for Cognition Scale just prior to the end of the experiment. Subjects rated their agreement with each item on a scale from 1 (Strongly disagree) to 7 (Strongly agree). An example item is “The idea of relying on thought to make my way to the top appeals to me.” This form of the Need for Cognition Scale demonstrates good reliability, θ = .90 (here, θ represents a maximised Cronbach’s alpha coefficient; Cacioppo et al., 1984).

Results and Discussion

In the analyses that follow, we found virtually identical results when treating NFC as a continuous or categorical variable, so for simplicity, we first split our sample into two groups based on the median NFC score of 4.50 (high: M = 5.41, SD = 0.56, n = 198; low: M = 3.69, SD = 0.70, n = 202; overall: M = 4.54, SD = 1.07, n = 400). We again found that the order of questions had no meaningful influence on overall test performance, M_difficult-to-easy = 19.64, SD_difficult-to-easy = 3.59; M_easy-to-difficult = 20.24, SD_easy-to-difficult = 3.26; M_diff = 0.61, 95% CI [−0.07, 1.28], t(398) = 1.76, p = .08. Interestingly, however, people with high NFC answered slightly more questions correctly than their low NFC counterparts, M_high = 20.83, SD_high = 3.11; M_low = 19.05, SD_low = 3.54; M_diff = 1.78, 95% CI [1.12, 2.43], t(398) = 5.33, p < .01. We found no statistically significant interaction between the order of questions and NFC, F(1, 396) = 0.44, p = .51.

Next, we examined subjects’ final test estimates and post-test reports of confidence in the accuracy of their memory. For test estimates, we replicated the typical finding in which difficult-to-easy subjects believed they performed more poorly on the test than easy-to-difficult subjects, M_difficult-to-easy = 14.52, SD_difficult-to-easy = 6.12; M_easy-to-difficult = 18.55, SD_easy-to-difficult = 6.08; M_diff = 4.04, 95% CI [2.84, 5.24], t(398) = 6.61, p < .01. We found no statistically significant interaction between the order of
questions and NFC, F(1, 396) = 0.03, p = .85, nor a main effect of NFC, F(1, 396) = 1.80, p = .18. These results remained virtually unchanged when we controlled for the slight difference in test accuracy between people with low and high NFC. For subjects’ confidence in the accuracy of their memory, we replicated the null findings from Experiments 1 and 2, finding no statistically significant differences in subjects’ post-test confidence ratings, all ps > .08.

We now turn to our primary question: How does the desire to engage in effortful cognition influence the adjustments eyewitnesses make to their developing beliefs about performance? To answer this question, we examined the mean predicted test scores people reported after each test question; these data appear in Figure 3.

As the figure shows, the influence of a question again depended on the difficulty of that question and when it appeared. But the figure also reveals that the influence of NFC was more complicated than we predicted. We had anticipated that people with low NFC would make smaller adjustments than their high NFC counterparts. That is, we expected that the lines or curves in Figure 3 for low NFC subjects would look “flatter” than those of the high NFC subjects. But they do not. In fact, in the difficult-to-easy conditions, low and high NFC subjects look virtually identical, adjusting similarly across the test. Put another way, regression analyses showed that both groups’ adjustments fit to quadratic curves: estimatelow = 12.35 − 0.01 * Time point + 0.01 * (Time point − 15.5)²; R² = .02, F(2, 3237) = 27.77, p < .01; estimatehigh = 11.57 + 0.002 * Time point + 0.02 * (Time point − 15.5)²; R² = .04, F(2, 2967) = 54.76, p < .01. The same cannot be said about the easy-to-difficult conditions. Here, NFC mattered. Specifically, people with high NFC consistently reported higher estimates across the test than their low NFC counterparts. Put another way, regression analyses showed that the low NFC group’s adjustments fit to a simple line, but the high NFC group’s adjustments fit to a quadratic curve: estimatelow = 22.90 − 0.14 * Time point; R² = .02, F(1, 2818) = 72.23, p < .01; estimatehigh = 26.63 − 0.17 * Time point − 0.01 * (Time point − 15.5)²; R² = .02, F(2, 2967) = 144.00, p < .01.

In addition, a repeated-measures ANOVA revealed a three-way interaction, F(28, 369) = 1.93, p < .01. We decomposed this interaction with two additional repeated-measures ANOVAs, examining the influence of NFC within each question arrangement condition. For the difficult-to-easy subjects, this analysis revealed only a main effect of Time point, F(28, 178) = 7.27, p < .01. But for the easy-to-difficult subjects, we found a statistically significant interaction between Time point and NFC, F(28, 164) = 2.02, p < .01. Follow-up Bonferroni-corrected comparisons (i.e., α = .05 / 30 = 0.00167) revealed statistically significant differences between the easy-to-difficult low and high NFC groups after questions 9, 16, and 19 only – although we note that the mean is always numerically greater for people with high NFC. The maximum difference in predictions between high and low NFC subjects occurred after the 9th test question, Mhigh = 24.66, SDhigh = 5.31; Mlow = 21.34, SDlow = 7.87; Mdiff = 3.32, 95% CI [1.42, 5.21], t(191) = 3.45, p = .0007, and the minimum difference in predictions occurred after the 28th test question, Mhigh = 20.08, SDhigh = 5.50; Mlow = 18.93, SDlow = 7.09; Mdiff = 1.16, 95% CI [−0.64, 2.95], t(191) = 1.27, p = .21.

How are we to explain these results? On the one hand, the patterns are consistent with Experiments 1 and 2, in that the overall shape of developing beliefs fits with an anchoring-and-adjustment explanation. Moreover, the consistently higher predictions from people with high NFC in the easy-to-difficult condition fit with our earlier idea about people hitting a subjective ceiling – one that is slightly higher for people with high NFC, who are more capable adjusters. But on the other hand, we did not anticipate the lack of any meaningful differences according to NFC in the difficult-to-easy conditions, and it is difficult to reconcile that finding with an anchoring-and-adjustment explanation.

One possible problem with interpreting these data is that in asking people to repeatedly predict their test performance, we altered their behaviour from how it would unfold in the absence of these repeated requests. Specifically, the repeated requests for predictions might have encouraged people to more carefully monitor and think effortfully about their ongoing performance, reducing their reliance on the anchoring-and-adjustment heuristic (Simmons, LeBoeuf, & Nelson, 2010). To address this issue, we conducted Experiment 4 in an effort to examine the influence of NFC when people are asked to provide only one final estimate of their test performance. We hypothesised – in accord with the theoretical account – that people with high NFC would adjust more sufficiently than their low NFC counterparts. We therefore predicted that: (1) in the easy-to-difficult condition, people with high NFC would report a smaller final test estimate than people with low NFC; (2) in the difficult-to-easy condition, people with high NFC would report a larger final test estimate than people with low NFC.

Experiment 4

Method

Subjects
We aimed to recruit 400 Mechanical Turk workers, and ultimately recruited 408.

Design
The design was the same as Experiment 3.

Procedure
The procedure was the same as Experiment 3, except that we no longer asked people to predict their test performance after every test question. Instead – as in our earlier
Figure 3. Mean estimated total test scores reported after each test question as a function of question arrangement and Need For Cognition (NFC). Error bars
represent 95% confidence intervals of means. Data are from Experiment 3.
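
The curves in Figure 3 are summarised in the Experiment 3 results by four regression fits. As a rough illustration only, the sketch below evaluates those fits so the predicted estimates and the high-minus-low NFC gap can be inspected at any time point. It assumes that time points run from 1 to 30 (inferred from the centring constant of 15.5 and the 30 Bonferroni comparisons, not stated outright) and it uses the rounded coefficients as printed, so its outputs are approximate. The names `FITS`, `predicted_estimate`, and `nfc_gap` are hypothetical, introduced here for illustration.

```python
# Evaluate the four Experiment 3 regression fits quoted in the text.
# Assumption: time points run from 1 to 30 (inferred, not stated), and the
# coefficients are the rounded values as printed, so results are approximate.

CENTRE = 15.5  # centring constant in the reported quadratic terms

# (intercept, linear slope, quadratic coefficient) for each condition.
# These names are hypothetical, chosen for this sketch.
FITS = {
    ("difficult-to-easy", "low"): (12.35, -0.01, 0.01),
    ("difficult-to-easy", "high"): (11.57, 0.002, 0.02),
    ("easy-to-difficult", "low"): (22.90, -0.14, 0.0),  # simple line
    ("easy-to-difficult", "high"): (26.63, -0.17, -0.01),
}


def predicted_estimate(order: str, nfc: str, time_point: float) -> float:
    """Return the fitted test-score estimate for one condition and time point."""
    intercept, slope, quad = FITS[(order, nfc)]
    return intercept + slope * time_point + quad * (time_point - CENTRE) ** 2


def nfc_gap(order: str, time_point: float) -> float:
    """Return the high-NFC minus low-NFC fitted estimate."""
    return (predicted_estimate(order, "high", time_point)
            - predicted_estimate(order, "low", time_point))


if __name__ == "__main__":
    # Print the fitted NFC gap for both question orders at a few time points.
    for t in (1, 9, 16, 28, 30):
        print(t,
              round(nfc_gap("easy-to-difficult", t), 2),
              round(nfc_gap("difficult-to-easy", t), 2))
```

Under these assumptions, the high-NFC easy-to-difficult curve gives about 24.68 at time point 9, close to the reported mean of 24.66, and the fitted easy-to-difficult gap stays positive at every time point from 1 to 30, in line with the observation that the high-NFC mean was always numerically greater.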

work – we asked people to estimate their test performance only once, at the end of the test. We know from this earlier work that reliable differences emerge in beliefs about test performance as a function of question arrangement (Michael & Garry, 2016).

Results and Discussion

Recall that, as in Experiment 3, our primary question of interest is the extent to which NFC influences the use of an anchoring-and-adjustment heuristic in producing the question arrangement effect. To answer that question, we first split our sample into two groups based on the median NFC score of 4.61 (high: M = 5.45, SD = 0.57, n = 201; low: M = 3.79, SD = 0.71, n = 207; overall: M = 4.61, SD = 1.05, n = 408). Consistent with Experiments 1–3, we found that the order of questions had no meaningful influence on overall test performance, Mdifficult-to-easy = 20.25, SDdifficult-to-easy = 2.94; Measy-to-difficult = 20.64, SDeasy-to-difficult = 3.23; Mdiff = 0.39, 95% CI [−0.20, 1.00], t(406) = 1.29, p = .20. As in Experiment 3, however, people with high NFC answered slightly more questions correctly than their low NFC counterparts, Mhigh = 21.01, SDhigh = 2.97; Mlow = 19.90, SDlow = 3.12; Mdiff = 1.12, 95% CI [0.52, 1.71], t(406) = 3.70, p < .01. We found no statistically significant interaction between the order of questions and NFC, F(1, 404) = 0.64, p = .42.

Next, we examined subjects’ final test estimates and post-test reports of confidence in the accuracy of their memory. Overall, subjects behaved similarly, regardless of differences in NFC. More specifically, for test estimates, we replicated only the typical finding wherein difficult-to-easy subjects believed they performed more poorly on the test than easy-to-difficult subjects, Mdifficult-to-easy = 14.25, SDdifficult-to-easy = 5.54; Measy-to-difficult = 17.61, SDeasy-to-difficult = 5.03; Mdiff = 3.37, 95% CI [2.34, 4.40], t(406) = 6.43, p < .01. For confidence in the accuracy of their memory, difficult-to-easy subjects were also less confident than easy-to-difficult subjects, Mdifficult-to-easy = 2.73, SDdifficult-to-easy = 0.88; Measy-to-difficult = 3.10, SDeasy-to-difficult = 0.88; Mdiff = 0.37, 95% CI [0.20, 0.54], t(406) = 4.25, p < .01.

In other words, for subjects’ final test estimates we found no statistically significant interaction between the order of questions and NFC, F(1, 404) = 0.01, p = .91, nor a main effect of NFC, F(1, 404) = 1.49, p = .22. As in Experiment 3, these results remained virtually unchanged when we controlled for the slight difference in test accuracy between people with low and high NFC. For subjects’ reports of confidence in the accuracy of their memory, we found no statistically significant interaction, F(1, 404) = 1.16, p = .28, nor a main effect of NFC, F(1, 404) = 0.02, p = .90.

Overall, these results are consistent with our earlier work and show that the biasing influence of question arrangement happens both when people make repeated predictions during testing, and when they make a single post-test prediction (Michael & Garry, 2016). The patterns depicted in Figures 1–3 may therefore represent how people’s beliefs develop implicitly. But importantly, we found no meaningful moderation in the size of the question arrangement effect due to NFC. This unexpected result is, as in Experiment 3, difficult to reconcile with an anchoring-and-adjustment explanation. Finally, the difference in post-test memory confidence could suggest that question arrangement only influences this judgment when people are not making explicit, repeated predictions about their performance. Of course, the alternative
explanation – that the bouncing around of this small effect reflects ordinary sampling variability – is still viable.

General Discussion

Across four experiments, we aimed to determine what drives the finding that the order in which we ask eyewitnesses questions about an event can shape how well those eyewitnesses believe they answered those questions. To achieve this aim, in Experiments 1 and 2 we repeatedly asked subjects to report how well they thought they would perform on an eyewitness memory test, tracking how this belief changes over the course of questioning. We found that even with two different test formats, flipping the order of questions does not simply flip the pattern of beliefs people develop. Instead, the two orders produce markedly different experiences.

In Experiments 3 and 4, we further aimed to identify the role of Need For Cognition, an individual difference measure known to affect the extent to which people make adjustments to numerical estimates (Cacioppo et al., 1984; Epley & Gilovich, 2006). We anticipated that people high in NFC would make greater adjustments to their estimates than their low NFC counterparts, both when people repeatedly provided estimates over the course of the test (Experiment 3) and when people provided only one estimate after the test (Experiment 4). Such findings, if present, would fit with the idea that people rely on an anchoring-and-adjustment heuristic when forming beliefs about their performance. But instead, both experiments produced results that are difficult to reconcile with an anchoring-and-adjustment explanation.

In Experiment 3, people with high NFC adjusted differently compared to people with low NFC only when the test was arranged from the easiest to most difficult question. And, in Experiment 4, we found no evidence that NFC affected people’s single, post-test estimates of performance – estimates that were now free of the potential influence of repeated test score predictions. Across both experiments, we had anticipated instead that people with high NFC would adjust more than their low NFC counterparts, reducing the difference in final test estimates between the question arrangement conditions (see, e.g., Epley & Gilovich, 2006). Overall, the results from these two experiments suggest that effortful thinking may not protect people from the influence of ordered questions. But we state this suggestion only tentatively, because an alternative explanation is that there are, in fact, small differences in adjustment due to NFC that require greater precision to detect.

Considered as a package, a critic might wonder if these four experiments have value, given that they do not support firm conclusions about the mechanisms responsible for the influence of ordered questions. On the contrary, we think they do. In particular, the patterns of developing beliefs in Experiments 1 and 2 raise an important question: Why do these beliefs develop in a qualitatively different way, when everyone ultimately sees the same set of questions? Put another way, why is it that difficult questions dramatically change people’s beliefs about test performance when encountered first, but those exact same questions produce almost no change in beliefs about test performance when encountered last? Our results also add to the small but growing body of literature investigating explanations for the influence of question arrangement. The available evidence to date suggests that a number of other explanations are unlikely, including the possibility that people remember the first test questions best (Franco, 2015); their affect changes across the test (Weinstein & Roediger, 2010, 2012); and their attention declines across the test (Michael & Garry, 2016).

In line with our prior work, we consistently found that eyewitnesses who first answered easy questions believed they answered more questions correctly than eyewitnesses who first answered difficult questions. That finding replicated across all four experiments, and fits with research investigating the influence of question arrangement in an educational paradigm (Jackson & Greene, 2014; Weinstein & Roediger, 2010, 2012). But in contrast to our previous work, we found in three of the four experiments that eyewitnesses who first answered easy questions were just as confident in the accuracy of their memory as eyewitnesses who first answered difficult questions. This finding is at odds with our previous work (Michael & Garry, 2016).

How are we to explain this disconnection between judgments of test performance and memory confidence? We suspect that it may be due to different attributions people make across these two judgments. More specifically, test performance is a consequence of both the quality of memory and the nature of the test questions. If initially asked difficult questions that virtually no one could answer correctly, people might develop an impression that their test performance is poor – but not because of a shaky memory. Instead, that poor performance can be attributed to some unfairly difficult questions. A similar difference in attribution could arise if initially asked easy questions that virtually everyone could answer correctly. One way to test this speculative explanation would be to ask people to explain their test performance and memory confidence judgments. If our hypothesised explanation is correct, we would expect that people attribute their test performance to the ease or difficulty of the test, rather than the quality of their memory. As we acknowledged earlier, however, an alternative explanation – one that is simpler, but perhaps less interesting – is that the true size of this effect is smaller than we estimated in our prior work (Michael & Garry, 2016).

Our research adds nuance to the literature because it shows that a seemingly trivial and non-suggestive manipulation can influence eyewitness metacognition (Wells &
Loftus, 2003). Moreover, the results have implications for the mechanisms responsible for the effects that occur when people answer questions arranged in certain orders (Weinstein & Roediger, 2012). As a whole, the theory of effortful adjustment seems an inadequate explanation for our results (Epley & Gilovich, 2006). But very recently, a new paper appeared providing empirical support for an alternative theory that may prove fruitful in future investigations. This theory proposes that anchoring effects are the result of an aversion to extreme adjustments (Lewis, Gaertig, & Simmons, 2018).

It is also worth noting a methodological difference between the work presented here and other investigations of the anchoring phenomenon. In our paradigm, people provide an estimate of their performance after a series of questions. In other work, people typically provide an estimate in response to a single question (Epley & Gilovich, 2006; Tversky & Kahneman, 1974). Perhaps the serial nature of our paradigm reduces the reliance on an anchoring-and-adjustment heuristic because it provides people with multiple retrieval cues that can lead to recall of event details, reducing the necessity of relying on other information – like how easy or difficult it feels to answer questions (Greifeneder, Bless, & Pham, 2011).

What recommendations could we make – if any – for applied contexts, such as eyewitness interviewing? We know that best practice interviewing techniques often recommend an initial rapport-building phase that could be construed as a set of easy questions before the “real,” more difficult questioning begins (Collins, Lincoln, & Frank, 2002). So it is plausible that question arrangement may have some influence when interviewing eyewitnesses. But we state this possibility cautiously, because a rapport-building technique differs in a number of ways from the serially ordered question manipulation we used, and thus might not meaningfully bias eyewitnesses at all. Furthermore, we also know that best practice techniques typically recommend that the types of questions we asked should be used only toward the end of interviewing, after extensive free report procedures (Paulo, Albuquerque, & Bull, 2013). We therefore also don’t know, yet, whether question arrangement would make any appreciable difference in people’s beliefs if those people have already had an opportunity to engage in extensive recall. Finally, it is difficult to see how forensic interviewers could possibly know a priori the difficulty of their questions. Perhaps the only reasonable conclusion to draw, then, is that we may need to think more carefully about how the experience of difficulty changes for eyewitnesses over the course of questioning, because that experience can plausibly distort what people believe.

Notes

1. In our prior work we established that reported confidence closely aligns with reported difficulty, suggesting that confidence is a good proxy for subjective difficulty (r = −.82, 95% CI [−.66, −.91]; Michael & Garry, 2016).
2. We present these line and curve data because they are intuitively understandable. But the careful reader will note they are statistically problematic due to autocorrelation. We therefore ran additional regression analyses that included a lag variable of the estimates, and in each case this approach improved model fit and successfully removed autocorrelation. These data can be found in Table 1 of the Supplementary Materials.

Disclosure of interest
The authors report no conflict of interest.

Data availability statement
The data for all four experiments reported in this manuscript are available from the Open Science Framework at the following address: https://osf.io/8hkmj/

Disclosure statement
No potential conflict of interest was reported by the authors.

ORCID
Robert B. Michael http://orcid.org/0000-0001-5275-7636

References

Cacioppo, J. T., Petty, R. E., & Feng Kao, C. (1984). The efficient assessment of need for cognition. Journal of Personality Assessment, 48, 306–307. doi:10.1207/s15327752jpa4803_13
Chapman, G. B., & Johnson, E. J. (2002). Incorporating the irrelevant: Anchors in judgments of belief and value. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 120–138). Cambridge, UK: Cambridge University Press.
Collins, R., Lincoln, R., & Frank, M. G. (2002). The effect of rapport in forensic interviewing. Psychiatry, Psychology and Law, 9, 69–78. doi:10.1375/pplt.2002.9.1.69
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.
Cutler, B. L., Penrod, S. D., & Dexter, H. R. (1990). Juror sensitivity to eyewitness identification evidence. Law and Human Behavior, 14, 185–191. doi:10.1007/Bf01062972
Douglass, A. B., Neuschatz, J. S., Imrich, J., & Wilkinson, M. (2010). Does post-identification feedback affect evaluations of eyewitness testimony and identification procedures? Law and Human Behavior, 34, 282–294. doi:10.1007/s10979-009-9189-5
Douglass, A. B., & Steblay, N. (2006). Memory distortion in eyewitnesses: A meta-analysis of the post-identification feedback effect. Applied Cognitive Psychology, 20, 859–869. doi:10.1002/acp.1237
Epley, N., & Gilovich, T. (2006). The anchoring-and-adjustment heuristic: Why the adjustments are insufficient. Psychological Science, 17, 311–318. doi:10.1111/j.1467-9280.2006.01704.x
Franco, G. (2015). The order of questions on a test affects how well students believe they performed (Unpublished doctoral thesis). Victoria University of Wellington, Wellington, New Zealand.
Frenda, S. J., Nichols, R. M., & Loftus, E. F. (2011). Current issues and advances in misinformation research. Current Directions in Psychological Science, 20, 20–23. doi:10.1177/0963721410396620
Greifeneder, R., Bless, H., & Pham, M. T. (2011). When do people rely on affective and cognitive feelings in judgment? A review. Personality
and Social Psychology Review, 15(2), 107–141. doi:10.1177/1088868310367640
Innocence Project. (2018). The Causes of Wrongful Conviction. Retrieved from https://www.innocenceproject.org/causes/eyewitness-misidentification/.
Jackson, A., & Greene, R. L. (2014). Impression formation of tests: Retrospective judgments of performance are higher when easier questions come first. Memory & Cognition, 42, 1325–1332. doi:10.3758/s13421-014-0439-5
Jones, T. C., & Roediger, H. L. (1995). The experiential basis of serial position effects. European Journal of Cognitive Psychology, 7, 65–80. doi:10.1080/09541449508520158
Lewis, J., Gaertig, C., & Simmons, J. P. (2018). Extremeness aversion is a cause of anchoring. Psychological Science, 30, 1–15. doi:10.1177/0956797618799305
Loftus, E. F. (2005). Planting misinformation in the human mind: A 30-year investigation of the malleability of memory. Learning & Memory, 12, 361–366. doi:10.1101/lm.94705
Loftus, E. F., Donders, K., Hoffman, H. G., & Schooler, J. W. (1989). Creating new memories that are quickly accessed and confidently held. Memory & Cognition, 17, 607–616. doi:10.3758/Bf03197083
Michael, R. B., & Garry, M. (2016). Ordered questions bias eyewitnesses and jurors. Psychonomic Bulletin & Review, 23, 601–608. doi:10.3758/s13423-015-0933-1
Michael, R. B., & Weinstein, Y. (2018). The influence of ordered question difficulty: A meta-analysis of two paradigms. Manuscript in preparation.
Paulo, R. M., Albuquerque, P. B., & Bull, R. (2013). The enhanced cognitive interview: Towards a better use and understanding of this procedure. International Journal of Police Science and Management, 15, 190–199. doi:10.1350/ijps.2013.15.3.311
Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of Experimental Psychology, 89, 63–77. doi:10.1037/h0031185
Simmons, J. P., LeBoeuf, R. A., & Nelson, L. D. (2010). The effect of accuracy motivation on anchoring and adjustment: Do people adjust from provided anchors? Journal of Personality and Social Psychology, 99, 917–932. doi:10.1037/a002140
Slovic, P., Finucane, M. L., Peters, E., & MacGregor, D. G. (2007). The affect heuristic. European Journal of Operational Research, 177, 1333–1352. doi:10.1016/j.ejor.2005.04.006
Takarangi, M. K., Parker, S., & Garry, M. (2006). Modernising the misinformation effect: The development of a new stimulus set. Applied Cognitive Psychology, 20, 583–590. doi:10.1002/acp.1209
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232. doi:10.1016/0010-0285(73)90033-9
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131. doi:10.1126/science.185.4157.1124
Weinstein, Y., & Roediger, H. L. (2010). Retrospective bias in test performance: Providing easy items at the beginning of a test makes students believe they did better on it. Memory & Cognition, 38, 366–376. doi:10.3758/MC.38.3.366
Weinstein, Y., & Roediger, H. L. (2012). The effect of question order on evaluations of test performance: How does the bias evolve? Memory & Cognition, 40, 727–735. doi:10.3758/s13421-012-0187-3
Wells, G. L., & Loftus, E. F. (2003). Eyewitness memory for people and events. In A. M. Goldstein (Ed.), Handbook of psychology: Forensic Psychology, Vol. 11 (pp. 149–160). Hoboken, NJ: John Wiley & Sons Inc.
