An Update of The Benton Facial Recognition Test
Ebony Murray, Rachel Bennetts, Jeremy Tree, Sarah Bate
https://doi.org/10.3758/s13428-021-01727-x
Abstract
The Benton Facial Recognition Test (BFRT) is a paper-and-pen task that is traditionally used to assess face perception skills in neurological, clinical and psychiatric conditions. Despite criticisms of its stimuli, the task enjoys a simple procedure and is rapid to administer. Further, it has recently been computerised (BFRT-c), allowing reliable measurement of completion times and meeting the need for online testing. Here, in response to calls for repeat screening for the accurate detection of face processing deficits, we present the BFRT-Revised (BFRT-r): a new version of the BFRT-c that maintains the task’s basic paradigm, but employs new, higher-quality stimuli that reflect recent theoretical advances in the field. An initial validation study with typical participants indicated that the BFRT-r has good internal reliability and content validity. A second investigation indicated that while younger and older participants had comparable accuracy, completion times were longer in the latter, highlighting the need for age-matched norms. Administration of the BFRT-r and BFRT-c to 32 individuals with developmental prosopagnosia resulted in improved sensitivity in diagnostic screening for the BFRT-r compared to the BFRT-c. These findings are discussed in relation to current diagnostic screening protocols for face perception deficits. The BFRT-r is stored in an open repository and is freely available to other researchers.
Keywords Face perception · Face matching · Face recognition · Prosopagnosia · Benton · Response times
Sarah Bate is supported by a British Academy Mid-Career Fellowship (MD170004) and a Leverhulme Research Fellowship (RF-2020-105).
* Ebony Murray, [email protected]
1 Department of Psychological Sciences, School of Natural and Social Sciences, University of Gloucestershire, Cheltenham GL50 4AZ, UK
2 Department of Life Sciences, Brunel University London, Uxbridge, UK
3 Department of Psychology, Swansea University, Swansea, UK
4 Department of Psychology, Bournemouth University, Poole, UK
The Benton Facial Recognition Test (BFRT: Benton & Van Allen, 1968; see Benton et al. (1983) for the formal reference of the test) is a face matching task that is traditionally administered face-to-face using hard copy materials. Participants are simultaneously presented with a target face above an array of six test faces. In the first six trials, one face in the array matches the identity of the target face, and in the final 16 trials, three faces in the array match the identity of the target. The task was originally developed for the assessment of individuals believed to have acquired prosopagnosia (a severe deficit in recognising familiar people from their face) following brain injury (Barton, 2008; Bate & Bennetts, 2015; Van Belle et al., 2011), but has since been widely used to assess face perception skills in a number of neurological, clinical and psychiatric conditions (Annaz et al., 2009; Rabin et al., 2005; Sachse et al., 2014).
Yet, the popularity of the BFRT has reduced in recent years, particularly for the assessment of individuals suspected to have prosopagnosia. At the turn of the century, many more people presented to researchers believing they experience a developmental form of prosopagnosia (Bate et al., 2008; De Luca et al., 2019; Geskin & Behrmann, 2017), prompting a wider individual differences perspective on human face recognition, and the belief that developmental face recognition difficulties may reside on a continuum (Barton & Corrow, 2016; Bate & Tree, 2017). These larger samples of cases have reignited long-standing questions of whether perceptual and mnemonic difficulties are dissociable (De Renzi et al., 1991), and whether subtypes of developmental prosopagnosia (DP) map onto this framework (note that the term “congenital” or “hereditary”
Behavior Research Methods (2022) 54:2318–2333 2319
prosopagnosia has been used somewhat interchangeably with DP in the literature: e.g. Behrmann et al., 2005; Hasson et al., 2003; Kennerknecht et al., 2006; Palermo et al., 2011). Clearly, to address all of these questions, reliable face perception tasks are required. However, Duchaine and Weidenfeld (2003) reported that when the inner features of the faces in the BFRT were obscured, most typical participants could still achieve a typical score using the hairline and eyebrows alone. Further, Duchaine and Nakayama (2004) found that seven out of 11 DP participants achieved typical scores on the task, again suggesting that external facial cues may be used to aid performance.
Unfortunately, there is also deliberation over alternate tests of face perception, and the field still lacks a reliable task. The most widely used face perception test for the diagnosis of DP is the Cambridge Face Perception Test (CFPT: Duchaine et al., 2007), which presents participants with six morphed faces that are to be organised in order of similarity to a simultaneously presented target face. The task requires proficient use of a computer mouse within a strict time period, and the instructions are complex for online administration, particularly with clinical and older participants (Bate et al., 2018; Bate, Frowd, Bennetts, et al., 2019c; Bowles et al., 2009). Others query whether morphed faces are unnaturally similar (White et al., 2017), and whether the requirement for similarity judgements initiates higher-level cognitive processes than required for the simplistic identity matching of simultaneously presented naturalistic facial images (Rossion & Michel, 2018). Such simpler face matching tasks are typically found in the forensic face recognition literature (e.g. the Glasgow Face Matching Test: Burton et al., 2010; the Pairs Matching Test: Bate et al., 2018; Bate, Frowd, Bennetts, et al., 2019c), but are seldom used for the detection of DP due to their low sensitivity to poor performance (although see Stantic et al. (2021) for a recent face matching test that is presented as a suitable task for the entire face processing spectrum of abilities). Indeed, the chance of responding correctly on any given trial is 50%, a score that is within the range achieved by typical perceivers, many of whom find these tasks particularly challenging (e.g. Robertson et al., 2016; Shah et al., 2015). In addition, White et al. (2017) reported a response bias in DP participants, where the tendency to respond “different” in a simple same/different face matching task artificially inflated their score on these trials.
Such criticisms led Rossion and Michel (2018) to return to the BFRT, citing advantages in its original paradigm. Despite the external cues to recognition that were highlighted by Duchaine and colleagues (Duchaine & Nakayama, 2006; Duchaine & Weidenfeld, 2003), the BFRT has traditionally been regarded as a difficult test with no ceiling effect (Benton & Van Allen, 1972), which is quick to administer with simple instructions. Importantly, Rossion and Michel (2018) point out that decisional response biases are avoided by the task’s forced choice procedure (the number of target faces is constant across test sections), making it substantially easier to interpret test scores. Further, Rossion and Michel (2018) highlighted the importance of recording task completion time in addition to accuracy (via a computerised version of the test: the BFRT-c), as a means to detect typical scores that are achieved by compensatory mechanisms. Previous work has adopted this approach when screening for face perception deficits in acquired prosopagnosia, where apparently typical accuracy scores were found to be accompanied by the use of atypical, laboured feature-by-feature matching strategies (e.g. Bukach et al., 2006; Busigny & Rossion, 2010; Delvenne et al., 2004; Farah, 1990; Young et al., 1993).
Another way to address this issue is to administer multiple versions of a face perception task, using rather different facial stimuli. This should prohibit, or at least reduce, the transfer of compensatory strategies that are useful in one version of a task. Indeed, some DPs report the use of particular facial features when images are captured within the same photography session (e.g. no change in skin tone or appearance of the hairline, even when images are cropped), or even consider pictorial cues such as lighting conditions or the quality of the images themselves (Adams et al., 2019). Further, very recent findings also highlight the importance of repeat testing on key measures of face recognition performance when screening for DP (Bate, Bennetts, Gregory, et al., 2019a; Murray & Bate, 2020), given issues with task reliability, the occurrence of borderline scores that are difficult to reconcile, and the possibility that a particular score simply occurred by chance performance (Young et al., 1993).
Yet, no known alternate version of the BFRT exists, and an update of the task using new stimuli is certainly overdue. While the basic paradigm (with the monitoring of completion times) offers a sound means of assessing face perception, the age of the test unsurprisingly lends itself to very low image quality. Whilst face recognition can be successful even when images are of low spatial frequency (e.g. Liu et al., 2000), unfamiliar face processing, as is being assessed with the BFRT, benefits from high-quality images (Burton et al., 1999). Moreover, the findings of Duchaine and colleagues (Duchaine & Nakayama, 2004; Duchaine & Weidenfeld, 2003) indicate that extra-facial cues can be used to achieve a typical score on the existing version of the test. While more recent face-processing tasks in the neuropsychological literature have responded to Duchaine and colleagues’ criticisms of the BFRT by using tightly controlled images that are captured on the same day and heavily cropped to exclude the external features (e.g. Biotti et al., 2017; Duchaine & Nakayama, 2006; Esins et al., 2016), it has been argued that this procedure actually distances the task from real-world face recognition (Burton, 2013). Rather, variability in facial
appearance is a critical feature of everyday face recognition, and should be embraced in, rather than removed from, laboratory tests (Young & Burton, 2017, 2018). In fact, even typical participants struggle to match faces of the same identity when pictured in more “ambient” images that retain the external features of the face, given image-based cues cannot be used as compensatory cues for successful performance (for further discussion, see Burton, 2013).
Here, we introduce a new version of the BFRT-c, the BFRT-revised (BFRT-r), which maintains the format of the original task but employs new, more varied, naturalistic facial images. In Experiment 1, we examine the validity of the BFRT-r in typical participants and provide norming data for comparison to clinical cases. In Experiment 2, we assess the test’s diagnostic utility alongside the BFRT-c in DP.
Experiment 1
A new face matching task (the BFRT-r) was created that follows the original BFRT paradigm, but is computerised (akin to the BFRT-c) and uses new, more ambient facial images. We initially assessed the psychometric properties of the task and collected norming data from young typical adults. The reliability and validity of the BFRT-r were investigated by comparing performance on this task to the BFRT-c. In addition, content validity was assessed by comparing performance to (a) the Cambridge Face Memory Test (CFMT) and (b) that of a new group of participants who completed an inverted version of the BFRT-r.
Method
Participants
A total of 165 participants took part in Experiment 1. One hundred and nine participants aged between 18 and 35 years (mean age = 24.7 years, SD = 3.5; 55 female) completed the full string of tests in their upright format. To avoid re-exposure effects, 56 different participants aged between 18 and 35 years (mean age = 24.9 years, SD = 3.5; 27 male) completed only the inverted version of the BFRT-r. All participants were recruited via the online participant recruitment website Prolific, in exchange for a small financial incentive. All were Caucasian and lived within the UK, reported no history of socio-emotional, neurological or psychiatric disorder, and had normal or corrected-to-normal vision. This project was approved by the institutional Research Ethics Committee.
Materials
BFRT-c (Rossion & Michel, 2018): The BFRT-c is the original version of the BFRT, in a computerised format. The test contains a total of 22 trials in which an unfamiliar Caucasian target face (shown from a frontal viewpoint with a neutral expression) has to be found among a simultaneously presented array of six Caucasian probe faces, also showing neutral expressions. For the first six trials (half male), the target face has to be found only once within each array, where all faces are shown from a frontal viewpoint, such that the corresponding probe image is very similar to the target image. For the remaining 16 trials (half male), the target face is again presented from a frontal viewpoint. The participant is required to find three images within the six-image array that match the identity of the target. The six faces in each array vary either in terms of head orientation (the second section of the test: eight trials, half female) or lighting (the third section of the test: eight items, half female). Some target faces are repeated: four of the seven female targets appear in two separate sections, one of the seven male targets appears in all three sections, and three male targets are used in two sections. All target identities are also used as distractors in at least one trial of the task.
In each trial, target faces are presented at a slightly different size than those in the array (target faces were 156 x 232 pixels; faces in the array were 201 x 234 pixels, in order to minimise successful matching based on low-level, image-based visual cues: Rossion & Michel, 2018). All images are greyscale and display the overall shape of the face, but are cropped below the chin and beyond the hairline. As in the original version of the task, the order of the trials is not randomised and participants have an unlimited length of time to complete each trial. There is an inter-stimulus interval of 800 ms. Information screens at the beginning of each section instruct the participant how many responses to make for each trial, and inform them that response time is recorded.
Participants are required to select their responses by clicking on the appropriate face(s) in each array. For trials that require three responses, participants are able to select faces in any order, but cannot change a response once a face has been selected. The maximum score on the task is 54. Participants can receive one point in each of the six trials that compose Section 1 (where one response is required per trial), and between 0 and 3 points for each of the trials in Section 2 (where three responses are required per trial). Trial completion times are measured to aid data processing (see below), and overall task completion times are monitored for analysis.
BFRT‑r The basic paradigm of the BFRT-c is retained, with the same number of trials. However, the facial stimuli are replaced throughout. As gender biases have been shown for the recognition of female but not male faces (e.g. Herlitz & Lovén, 2013; Lovén et al., 2011), we followed the precedent of more recent tests by only using male faces (e.g. the CFMT: Duchaine & Nakayama, 2006; the CFPT: Duchaine
et al., 2007). We initially acquired facial images from a total of 130 Caucasian males (aged 18–34 years: M = 21.9 years, SD = 3.2) in exchange for course credit or a small financial incentive. Images were captured within the laboratory, and/or were existing photographs provided by the participant that had been taken within the space of a single year. Thus, different images of the same person had been captured on different days, often months apart, and in many cases, using different cameras. However, images of the same person had all been captured within the same year, preventing any major ageing effects. Blemishes, skin tone and hairstyle varied from image to image, as well as lighting conditions. No image had been manipulated, and all were of sufficiently high quality (no less than 96 DPI). They displayed the target without spectacles or very heavy facial hair to the extent that the faces were obscured. There were variations in viewpoint due to their capture in naturalistic settings.
A unique target was used in each of the 22 trials, and no target was re-used as a distractor. Ten distractor identities were repeated over the 22 trials, but different images of each individual were used where possible; only two images were repeated twice through the test. No distractor identity was repeated in the same array. Distractors were allocated to each trial based on their perceived similarity to the target, as judged by a member of the research team. Pilot testing supported these judgements: the trials included in the final BFRT-r did not elicit ceiling nor floor effects. In total, the test used images from 76 different individuals.
All images were presented in greyscale. This decision was made based on initial pilot testing/materials analyses, which indicated that ceiling effects in the typical population could be achieved when images were in colour. To prevent low-level image matching, target faces were not cropped to exclude any part of the head, hair or ears, but array images were cropped around the hairline (see Fig. 1). Target images were larger (166 x 232 pixels) than those in the array (approximately 153 x 200 pixels). As in the BFRT-c, only one of the array faces matched the identity of the target in the first six trials, and three in the remaining 16 trials. In the first 12 trials of the task, all faces are displayed from frontal viewpoints. In the final ten trials, faces are displayed from frontal, but more naturalistic, viewpoints. The rotation of most faces is small; approximately 10–30 degrees to the left or right. A small number of images (N = 7) are displayed at a larger rotation (less than 45 degrees), but the whole face can be viewed in every photograph (i.e. both eyes are clearly visible; see Fig. 1).
As for the BFRT-c, trials were presented in the same order for each participant, with an inter-stimulus interval of 800 ms, and responses were made and scored in the same manner. Instructions were identical to the BFRT-c, but additionally informed participants that some images were taken some time apart, and some aspects of the target’s appearance (e.g. hairstyle) may have changed during this time. The BFRT-r test materials are available in an open repository: https://osf.io/vza3m/?view_only=404f6d1971924759b126d46cba1d25b7. A fully programmed version can also be shared with researchers on request, via Testable. The test and its materials are protected by a Creative Commons Attribution-Non-Commercial license.
BFRT‑r inverted The BFRT-r was also prepared in an inverted format to assess the content-validity of the test. All stimuli and parameters were identical, with the exception that all images were rotated 180 degrees.
Fig. 1 Example trials from the BFRT-r. Note. Panel A shows the trial format for trials 1–6 where all faces are presented from a frontal viewpoint. There is only one correct response (4). Panel B shows the trial format for trials 7–12 where all faces are again shown from a frontal viewpoint, but there are three correct responses (1, 2, 4). Panel C shows the format for trials 13–22, where the target face is shown from a frontal viewpoint and the probe images show some rotation. There are three correct responses (2, 5, 6)
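
The scoring rules described for the BFRT-c, which the BFRT-r retains, can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the `SECTIONS` layout, and the choice to count a duplicated selection only once are assumptions made purely for illustration. Under those assumptions, the sketch recovers the maximum score of 54 and shows that random responding yields an expected score of 25 out of 54 (about 46.30%), the chance level quoted later in the paper.

```python
from fractions import Fraction

# Task structure as described in the text: six faces per array,
# 6 trials with one match, then 16 trials with three matches.
# The dictionary layout is a hypothetical convenience, not part of the test.
N_ARRAY = 6
SECTIONS = [
    {"trials": 6, "responses": 1, "matches": 1},
    {"trials": 16, "responses": 3, "matches": 3},
]


def score_trial(selected, correct):
    """One point per distinct selected face that matches the target.

    Assumption: a duplicated selection earns nothing extra.
    """
    return len(set(selected) & set(correct))


def max_score(sections=SECTIONS):
    """Highest attainable score: one point per required response."""
    return sum(s["trials"] * s["responses"] for s in sections)


def chance_score(sections=SECTIONS, n_array=N_ARRAY):
    """Expected score when every response is a random pick from the array."""
    total = Fraction(0)
    for s in sections:
        # Each response is correct with probability matches / n_array.
        total += s["trials"] * s["responses"] * Fraction(s["matches"], n_array)
    return total


def percent(score, sections=SECTIONS):
    """Convert a raw score to a percentage of the maximum score."""
    return float(100 * Fraction(score) / max_score(sections))


if __name__ == "__main__":
    print(max_score())                           # raw maximum
    print(chance_score())                        # expected raw score at chance
    print(round(percent(chance_score()), 2))     # chance as a percentage
    print(score_trial([1, 3, 4], [1, 2, 4]))     # two of three matches found
```

Using exact fractions keeps the chance calculation free of rounding error until the final conversion to a percentage.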
CFMT (Duchaine & Nakayama, 2006) The CFMT is an unfamiliar face memory test. The overall objective of the CFMT is to introduce unfamiliar, young male faces to the participant and then test their recognition of those faces. It contains three test stages which increase in difficulty as the test progresses: (a) Learn: Participants view a target face from three viewpoints for 3 s per image. They then choose which of three faces, presented simultaneously, is the target. This is repeated for six faces, resulting in a maximum score of 18. (b) Test stage: Thirty triads of faces are presented, where one face is a novel image of a target identity intermixed with two distractors. (c) Noise stage: Twenty-four new triads are displayed with added visual noise. Again, each trial contains any one of the targets and two distractors. The entire test is scored out of 72, and chance is 24. More information about this test can be found in the associated publication. We include it here as a means to check the content validity of the BFRT-r (i.e. that it correlates well with a dominant face-processing task that is already known to have high content validity).
Procedure
All tasks were completed online, using the Testable platform (www.testable.org; see Rezlescu et al., 2020). Participants were required to initially calibrate the tests for screen size, ensuring uniform presentation. The 109 participants that completed the main string of tests completed the CFMT, BFRT-r, and BFRT-c, in that order. This enabled us to collect accurate norming data for the new task without introducing practice effects from the repeated use of the same paradigm. The 56 participants who only took part in the inverted version of the BFRT-r did not complete any other tests.
Data processing
As data were collected online, responses were initially screened for task engagement. Each individual’s mean response time (and SD) was calculated for each group of trials in both the BFRT-c and BFRT-r (i.e. the trials which required one response, and the trials which required three responses). Any responses which were greater than 3 SDs above the mean were removed, as were any responses that were quicker than 150 ms. In addition, trials that required three unique responses were screened to ensure correct completion (i.e. to remove trials that had received duplicate responses). If participants made more than two duplications on only one of the tests, their data were removed from the analysis. Participants who made a similar number of duplications (no more than four; equivalent of 10% of responses) on both tests were retained. As a duplication was scored as 0, we did this to ensure that scores on both tasks were similarly affected by duplications, without artificially affecting the mean scores. Overall accuracy scores were also screened for outliers across the dataset, using a three SDs from the mean criterion.
For the participants that completed both the BFRT-r and BFRT-c, 15 were removed for failing to complete enough trials (i.e. too many duplications), and one for achieving accuracy scores on both tasks that surpassed three SDs from the mean score. No participant responded quicker than 150 ms on any trial, and no participant was excluded for giving too many abnormally slow responses. Of the remaining participants, none were identified as outliers on the CFMT. The final sample consisted of 93 participants aged 18–30 years (49 female; mean age = 24.84 years, SD = 3.44). The same screening procedures were applied to the participants who only completed the inverted version of the BFRT-r, resulting in the exclusion of one individual. A final sample of 55 participants aged 18–30 years (27 male; mean age = 24.9 years, SD = 3.6) therefore proceeded to the analysis phase.
Results
Mean accuracy performance on the upright version of the BFRT-r was 78.83% (SD = 8.72). Given different numbers of responses were required in different sections of each test, we followed the precedent of Rossion and Michel (2018) in analysing overall task completion times, rather than the average response time per trial. The mean overall task completion time for the BFRT-r was 253.75 s (SD = 130.97). Mean accuracy performance on the BFRT-c was 80.82% (SD = 8.17), and mean task completion time was 168.34 s (SD = 69.82 s). The performance of this sample on the BFRT-c is therefore comparable to the norms presented by Rossion and Michel (2018), who reported a mean accuracy of 82.98% (SD = 6.37), and a mean task completion time of 180.85 s (SD = 59.86).
To fully compare the two tasks, a mixed 2 (test: BFRT-c, BFRT-r) x 2 (gender: male, female) ANOVA was carried out. There was a main effect of test, F(1,91) = 6.760, p = .011, ηp² = .069, in that individuals performed significantly better on the BFRT-c (M = 80.82%, SD = 8.17) than the BFRT-r (M = 78.83%, SD = 8.72; see Table 1). There was also a significant main effect of gender over the two tests, F(1,91) = 4.469, p = .037, ηp² = .047 (see Fig. 2), where females (M = 81.39%, SD = 8.14) outperformed males (M = 78.09%, SD = 8.50). However, there was no significant interaction between test and gender, F(1,91) = 1.173, p = .282.
A 2 (test: BFRT-c, BFRT-r) x 2 (gender) ANOVA on task completion times revealed a main effect of test, F(1,91) = 87.616, p < .001, ηp² = .491, with participants taking longer to complete the BFRT-r (M = 253.75 s, SD = 130.98) than the BFRT-c (M = 168.34 s, SD = 69.82; see Table 1). There was no main effect of gender, F(1,91)
Table 1 Descriptive data (means and standard deviations) for the upright versions of the BFRT-c and BFRT-r for younger controls (Experiment 1), and older controls and DPs (Experiment 2); accuracy is presented as a percentage, and completion times in seconds
Group                                        BFRT-c accuracy   BFRT-c completion times   BFRT-r accuracy   BFRT-r completion times
Younger controls (Mean age = 24.8; N = 93)   80.82 (8.17)      168.34 (69.82)            78.83 (8.72)      253.75 (130.97)
Older controls (Mean age = 48.4; N = 218)    82.71 (8.76)      247.55 (108.30)           78.98 (10.46)     348.22 (165.20)
DPs (Mean age = 52.0; N = 32)                74.83 (7.83)      341.16 (139.87)           67.48 (8.23)      489.13 (202.29)
Fig. 2 Violin plots indicating scores on the BFRT-r and BFRT-c for males and females separately, and the overall sample
= 0.081, p = .776 (see Fig. 3), nor an interaction between test and gender, F(1,91) = 0.003, p = .954. Completion times on the BFRT-r strongly correlated with the BFRT-c (r = .787, p < .001).
Following the precedent of Rossion and Michel (2018), the task’s internal reliability was assessed by correlating performance on even versus odd items, considering only the second part of the test in which three responses are made per trial. The inter-item correlation was significant for accuracy rates (mean score for the eight even items = 20.41/24, SD = 2.34; mean score for the eight odd items = 18.78/24, SD = 2.18; rSB [Spearman–Brown] = .735, p < .001). The inter-item correlation was even higher for trial completion times (mean trial completion times for the eight even items = 89.50 s, SD = 46.05 s; mean trial completion times for the eight odd items = 98.91 s, SD = 55.48 s; rSB = .963, p < .001).
To further explore the reliability and validity of the BFRT-r, scores were correlated against performance on the CFMT (see Table 2 for the full correlation matrix). Accuracy performance on the BFRT-r strongly correlated with the BFRT-c (r = .636, p < .001). Both the BFRT-c and BFRT-r also correlated significantly and moderately (BFRT-c) or strongly (BFRT-r) with the CFMT.
Finally, comparison between overall accuracy scores on the upright (M = 78.83%, SD = 8.72) and inverted (M = 56.66%, SD = 10.64) versions of the BFRT-r revealed a substantial inversion effect, t(146) = 13.746, p < .001, d = 2.28. However, there was no significant difference between upright BFRT-r (M = 253.75 s, SD = 130.97) task completion times and inverted task completion times (M = 249.90 s, SD = 146.88), t(146) = 0.165, p = .869.
Summary
Here, we present the BFRT-r: a new test of face perception that adopts the same paradigm as the original BFRT (as per the BFRT-c) but uses more naturalistic images to accommodate within-person variation in facial images. As the BFRT-r follows the procedure of the BFRT-c, the test continues to be simple and quick to administer, with an approximate completion time of four minutes in typical young adults. Initial analyses reveal that the BFRT-r has good internal
reliability with strong inter-item correlations. It also has a strong inversion effect according to accuracy (although not completion times), suggesting that it taps face- rather than image-processing mechanisms. A strong correlation with the CFMT further supports this.
Fig. 3 Violin plots indicating completion times on the BFRT-r and BFRT-c for males and females separately, and overall sample
Table 2 Correlation matrix for overall scores on the three tests (r (p)). Cut-offs for significance are Bonferroni corrected. N = 93
          BFRT-r           CFMT
BFRT-c    .636 (< .001)    .432 (.001)
BFRT-r                     .510 (< .001)
Comparison of performance on the two tasks indicates that the BFRT-r is slightly more difficult than the original version. More importantly, typical participants are able to score well above chance on the BFRT-r (the lowest score was 55.56%; chance performance is 46.30%). Further, the norming data reported here (M = 78.24%, SD = 9.20) would enable clinical participants to score two SDs below the mean without performing at chance level. Indeed, those with DP often show impaired face perception skills, but these skills are not completely abolished to the point that they are scoring at chance level (e.g. Bate et al., 2019a, b, c; Biotti et al., 2019; Righart & de Gelder, 2007). Thus, the task is suitably calibrated to detect variations in performance between chance and the control mean.
It is of note that a gender difference was found for accuracy (but not completion time) across both versions of the BFRT. A similar effect has previously been reported on the BFRT-c (Rossion & Michel, 2018). Previous work suggests a small gender difference in face recognition, with a meta-analysis confirming that females outperform males (Herlitz & Lovén, 2013), even when only male faces are presented. The authors of the meta-analysis observed that the female advantage may be accentuated in tasks which involve generalisation across different viewpoints or images (as is the case in both BFRT tests). Thus, the gender effect observed here was not surprising; we continued to explore whether it persisted in our second experiment using older adult control participants.
Experiment 2
Having explored the validity of the BFRT-r in younger participants, we next sought to examine the diagnostic utility of the updated version in individuals with DP. In particular, we examine (a) the additional benefit of evaluating response times as well as accuracy in atypical participants, and (b) whether there is a case for administration of multiple versions of the same task when screening for face perception deficits.
Method
Participants
Thirty-two participants with a prior diagnosis of DP took part in this study. They had previously taken part in an objective screening session and scored atypically on at least two of three diagnostic tests: the CFMT (Duchaine & Nakayama, 2006), the CFPT (Duchaine et al., 2007), and a famous faces test (e.g. Bate, Bennetts, Gregory, et al., 2019a;
Fig. 4 Violin plots indicating scores on the BFRT-c and BFRT-r for younger control participants, older control participants, and the DP group
Bennetts et al., 2015; Murray & Bate, 2019), following existing diagnostic protocols (Dalrymple & Palermo, 2015; Bate & Tree, 2017: see supplementary material for their diagnostic results). Eight were male, and they were aged between 40 and 59 years (M = 52.0 years, SD = 5.6).
Because our DP sample were older than the younger adults reported in Experiment 1, a new set of 243 older control participants (M age = 48.4 years, SD = 5.9; 119 females) were recruited for age-matched comparison. These individuals were again recruited via the Prolific online recruitment platform, in exchange for a small financial incentive. All DP and control participants were Caucasian, and reported no history of socio-emotional, neurological or psychiatric disorder (including mild cognitive impairment) and had normal or corrected-to-normal vision.
Following the same data-processing strategies as in Experiment 1, data from 25 control participants were removed: 23 provided too many duplications in their responses, and an additional two participants took an abnormally long time to complete both the BFRT-r and BFRT-c. This resulted in a final sample of 218 (116 male) control participants, aged between 40 and 60 years (M = 48.4 years, SD = 5.9). The same exclusion criteria were applied to the DP data as for the control participants; no DP data were removed from the analysis.
All participants completed the upright version of the BFRT-r and BFRT-c in that order, online, via the testing platform Testable.

Results

Age and gender

For the older controls, high correlations were observed between performance on the two versions of the Benton on both the accuracy (r = .743, p < .001) and task completion time (r = .844, p < .001) measures (as seen in Experiment 1 in the younger control data). Further, BFRT-r accuracy performance did not differ between the new set of older control participants (M = 78.98%, SD = 10.46) and the younger sample reported in Experiment 1 (M = 78.83%, SD = 8.7), t(309) = – .115, p = .908. However, overall task completion times were slower in older (M = 348.22 s, SD = 165.20) compared to younger (M = 253.75 s, SD = 130.97) controls, t(309) = – 4.896, p < .001, d = .63 (see Fig. 4). The same pattern emerged for the BFRT-c: younger (M = 80.82%, SD = 8.17) and older (M = 82.71%, SD = 8.76) controls performed similarly in terms of accuracy, t(309) = – 1.776, p = .077, but younger controls (M = 168.34 s, SD = 69.81) completed the test significantly faster than older controls (M = 247.55 s, SD = 108.30), t(309) = 6.497, p < .001, d = .87 (see Fig. 5). Thus, subsequent analyses only compared the performance of DPs to the older control group. No gender effects were found on either the BFRT-r or BFRT-c in this age group (ps > .05).
A mixed 2 (test: BFRT-c, BFRT-r) x 2 (group: DP, older controls) ANOVA was conducted to explore overall group differences in accuracy scores (see Table 1). There was a
Fig. 5 Violin plots indicating completion times on the BFRT-c and BFRT-r for younger control participants, older control participants, and the
DP group
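The age-group comparison of completion times reported in the Results can be re-derived from the summary statistics alone. The following is a minimal sketch, not the authors' analysis script. It assumes an independent-samples t-test with pooled variance, a younger-control sample size of 93 (inferred here from df = 309 and the 218 older controls, and not stated in this section), and a Cohen's d standardised by the root mean square of the two SDs. The function names are illustrative.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def cohens_d_rms(m1, sd1, m2, sd2):
    """Cohen's d using the root mean square of the two SDs as the standardiser."""
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

N_OLDER = 218    # final older-control sample reported in the text
N_YOUNGER = 93   # ASSUMPTION: inferred from df = 309, not reported in this section

# BFRT-r completion times in seconds: older vs younger controls
t_r, df_r = pooled_t(348.22, 165.20, N_OLDER, 253.75, 130.97, N_YOUNGER)
d_r = cohens_d_rms(348.22, 165.20, 253.75, 130.97)

# BFRT-c completion times in seconds: older vs younger controls
t_c, df_c = pooled_t(247.55, 108.30, N_OLDER, 168.34, 69.81, N_YOUNGER)
d_c = cohens_d_rms(247.55, 108.30, 168.34, 69.81)

print(f"BFRT-r: t({df_r}) = {t_r:.3f}, d = {d_r:.2f}")
print(f"BFRT-c: t({df_c}) = {t_c:.3f}, d = {d_c:.2f}")
```

Under these assumptions the sketch gives values close to the reported statistics (sign aside, since the sign depends only on which group is entered first). Other standardisers for d, such as the pooled SD, would give slightly different effect sizes.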
significant main effect of group, whereby DP participants scored significantly poorer (M = 71.15%, SD = 8.03) than control participants (M = 80.84%, SD = 9.61), F(1,248) = 34.207, p < .001, ηp² = .121 (see Fig. 4). There was also a main effect of test, F(1,248) = 66.702, p < .001, ηp² = .212: scores on the BFRT-c (M = 87.70%, SD = 9.02) were higher than those on the BFRT-r (M = 77.50%, SD = 10.89). There was also a significant interaction between test and group, F(1,248) = 7.079, p = .008, ηp² = .028. Pairwise comparisons revealed that DPs scored significantly lower than controls on both the BFRT-c (M = 74.83%, SD = 8.64 and M = 82.71%, SD = 8.65 respectively: p < .001) and BFRT-r (M = 67.48%, SD = 10.21 and M = 78.98%, SD = 10.20 respectively: p < .001).
The difference in BFRT-r and BFRT-c scores was larger for DPs (mean difference = 7.35%, SD = 7.85) than for control participants (mean difference = 3.74%, SD = 7.07), t(248) = 2.661, p = .008, d = .48.
To investigate any differences in task completion times, a 2 (test: BFRT-c, BFRT-r) x 2 (group: DP, controls) ANOVA was conducted (see Table 1). A significant main effect of group indicated that DPs took longer to complete the tests (M = 415.14 s, SD = 171.08) than controls (M = 297.88 s, SD = 136.75), F(1,248) = 20.812, p < .001, ηp² = .077 (see Fig. 5). A significant main effect of test indicated that participants took longer to complete the BFRT-r (M = 366.26 s, SD = 176.36) than the BFRT-c (M = 259.53 s, SD = 116.79), F(1,248) = 178.426, p < .001, ηp² = .418. There was also a significant interaction between test and group, F(1,248) = 6.455, p = .012, ηp² = .025. Pairwise comparisons revealed that DPs took significantly longer than controls to complete both the BFRT-c (M = 341.16, SD = 112.74 and M = 247.55, SD = 112.73 respectively: p < .001) and BFRT-r (M = 489.13, SD = 170.28 and M = 348.22, SD = 170.28 respectively: p < .001).
The difference in BFRT-r and BFRT-c completion times was larger for DPs (mean difference = 147.67 s, SD = 124.55) than for control participants (mean difference = 100.67 s, SD = 93.98), t(248) = 2.54, p = .012, d = .43.
Because we also held CFPT upright accuracy scores for all our DPs as part of their background diagnostic profiles (see supplementary material), we were able to investigate whether accuracy or completion time on either version of the BFRT was associated with this indicator. However, no significant correlations were observed between CFPT performance and any of the four BFRT measures (all ps > .490), a finding which has recently been reported elsewhere (Mishra et al., 2020). As such, this may not be a surprising finding, especially given the BFRT and CFPT have considerable differences in their paradigms. On the other hand, accuracy (r = 0.538, p = .002) and completion time (r = 0.788, p = .001; sequential Bonferroni correction for multiple correlations applied) on the two versions of the BFRT were highly correlated in DP participants, as was found for control participants in both experiments. Because of the lack of association between CFPT and BFRT scores, we did not proceed to use CFPT scores to further interpret individual patterns of performance on either version of the BFRT (see below). Indeed, it is not possible to infer from the current methodology whether either the CFPT or BFRT offers a “true” indicator of face perception, and we instead focus on
consistency of individual performance across the two versions of the BFRT.

Single-case analyses

To examine the importance of assessing both accuracy and response times on face matching tests, each DP’s performance on the BFRT-r was evaluated on both parameters on a case-by-case basis (see Table 3). As all participants were over the age of 40, their scores and completion times were compared to those of the older control group (see Table 1). The z-score used as a cut-off for typicality varies within the DP literature, with some authors using two SDs from the mean (Bate, Bennetts, Tree, et al., 2019b; Biotti et al., 2019; Bowles et al., 2009) and others 1.7 SDs (DeGutis et al., 2012, 2014; Palermo et al., 2017; White et al., 2017). Here, to allow for recording error, and to err on the conservative side when determining whether face perception is intact (given it is currently assumed that the process is impaired in most DPs: Bate, Bennetts, Gregory, et al., 2019a; Biotti et al., 2019), we present the findings in terms of a 1.7 SD cut-off.
Fifteen of the 32 DPs (46.88%) performed within the typical range on both the BFRT-r and BFRT-c according to both accuracy and completion time measures (see Table 3).
Table 3 Normalised accuracy scores and task completion times for the 32 DP participants on the BFRT-r and BFRT-c
Columns: Participant ID; BFRT-r Accuracy; BFRT-r Completion Time; BFRT-c Accuracy; BFRT-c Completion Time
Negative z-scores represent poorer performance for accuracy, and positive scores indicate slower completion times. * denotes an atypical z-score (+/– 1.7)
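The single-case classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring script. The norms are the older-control means and SDs given in the Results, and the 1.7 SD cut-off is the one adopted in the text. Treating a z-score exactly at the cut-off as atypical is an assumption, and the names `Norm`, `NORMS`, `z_score` and `is_atypical`, as well as the example scores, are illustrative.

```python
from dataclasses import dataclass

CUTOFF = 1.7  # SD cut-off adopted in the text

@dataclass(frozen=True)
class Norm:
    mean: float
    sd: float

# Older-control norms as reported in the Results (accuracy in %, time in s)
NORMS = {
    ("BFRT-r", "accuracy"): Norm(78.98, 10.46),
    ("BFRT-r", "time"): Norm(348.22, 165.20),
    ("BFRT-c", "accuracy"): Norm(82.71, 8.76),
    ("BFRT-c", "time"): Norm(247.55, 108.30),
}

def z_score(value: float, norm: Norm) -> float:
    """Standardise a raw score against the control mean and SD."""
    return (value - norm.mean) / norm.sd

def is_atypical(test: str, measure: str, value: float, cutoff: float = CUTOFF) -> bool:
    """Low accuracy or long completion time beyond the cut-off counts as atypical.

    ASSUMPTION: a z-score exactly at the cut-off is classed as atypical.
    """
    z = z_score(value, NORMS[(test, measure)])
    return z <= -cutoff if measure == "accuracy" else z >= cutoff

# Illustrative values only, not taken from Table 3
print(is_atypical("BFRT-r", "accuracy", 55.56))
print(is_atypical("BFRT-r", "time", 700.0))
```

Accuracy and completion time are checked in opposite directions because a negative z-score signals poorer accuracy, whereas a positive z-score signals slower completion.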
Notably few borderline scores were detected: the closest score to a cut-off was a z-score of –1.62 on the BFRT-c accuracy (DPF01; DPF12) and –1.53 on the BFRT-r accuracy (DPM08; DPF14), with the vast majority of other scores occurring within 1.25 SDs of the control mean. Of the 17 DPs who showed at least some impairment, seven exceeded cut-off on both tasks, according to at least one measure. An additional eight participants were only impaired on the BFRT-r, and two participants were only impaired on the BFRT-c. Interestingly, only five DPs displayed impairments on accuracy alone, whereas nine DPs only showed impairments on the completion time measure. Thus, task completion time was the primary indicator of impairment on both versions of the test.
(17/32 classified as typical performers) for the BFRT-r (see Figs. 6 and 7).
To examine how well the BFRT-c and BFRT-r discriminate between control participants and individuals with DP, we calculated d’ for each test. d’ is a bias-free measure of sensitivity that combines information about hit rates (in this case, the number of older control participants who were correctly classified as “typical” performers on each test) and false alarms (participants with DP who were classified as “typical” performers on each test) (Macmillan & Creelman, 2005). A d’ of 0 would indicate chance discrimination, and a d’ of 5.00 would represent perfect discrimination between control and DP participants in this sample. The d’ for the BFRT-c is 0.60, whereas the d’ for the BFRT-r is 1.03.
Fig. 6 Scatterplot indicating the individual scores, in percentage, for DPs and older control participants (x-axis: BFRT-c Accuracy; y-axis: BFRT-r Accuracy; legend: Controls, DPs, Cut-off Point). Note. The dashed line indicates the cut-offs for DP criteria (1.7 SDs from the mean). Thus, those below the dashed line on the y-axis were atypical on the BFRT-r and those to the left of the dashed line on the x-axis were atypical on the BFRT-c. Note that two of the DP data plots represent two DPs each; they scored the same on both tests (two scored 68.52 and 66.67%, and two scored 75.93 and 55.56%, on the BFRT-c and BFRT-r, respectively)
Fig. 7 Scatterplot indicating the individual completion times for DPs and older control participants. Note. The dashed line indicates the cut-offs for DP criteria (1.7 SDs from the mean). Thus, those above the dashed line on the y-axis were atypical on the BFRT-r and those to the right of the dashed line on the x-axis were atypical on the BFRT-c
that further investigation is required to confirm whether gender differences persist on the task.
As a group, the DPs performed more poorly than controls on both the BFRT-r and BFRT-c according to both accuracy and completion time measures (see Table 1). However, akin to previous work (Bate, Bennetts, Gregory, et al., 2019a; Burns et al., 2017; Le Grand et al., 2006; Minnebusch et al., 2007; Stantic et al., 2021), case-by-case analyses indicated considerable heterogeneity in DPs’ face perception performance, with just under half of the sample displaying intact face perception skills on both tests. Consistent deficits in face perception were noted across both versions of the test in seven DPs. However, only the BFRT-r detected impairment for eight DPs, and only the BFRT-c for the remaining two, suggesting some cases may be missed by administration of a single face perception task (see also Murray & Bate, 2020).

General discussion

In this paper, we introduce a new version of the BFRT (the BFRT-r), with updated stimuli that address recent theoretical progress in the face recognition literature. We sought to examine the validity of the BFRT-r in typical participants, and provide norming data for younger (aged 18–35 years) and older (aged 40–60 years) adult populations. Although our control samples did not include individuals aged 36–39 years, patterns observed in previous work (e.g. Bate, Bennetts, Gregory, et al., 2019a; Bowles et al., 2009) suggest clinical cases within this age range should be compared to the younger control group. That is, individuals aged 36–39 years perform similarly to those aged 35 and younger on face processing tasks. Finally, we investigated the test’s utility in identifying face perception difficulties in DP.
First, we replicated several known advantages of the BFRT. As the BFRT-r procedure is identical to the computerised version of the original BFRT (the BFRT-c: Rossion & Michel, 2018), the test is known to follow a simple procedure and is quick to administer. Here, we found that the BFRT-r takes approximately 4–6 min for typical participants to complete (with longer completion times in older adults), and approximately 8 min for older adults with DP. Further, we noted the particular importance of monitoring task completion times for accurate diagnosis of face perception impairments. This is clearly facilitated by the use of a computerised rather than pencil-and-paper format (see Rossion & Michel, 2018), and also lends the task to online administration – a particularly important concern in very recent times. Notably, online administration was used here in both experiments, and the resulting strong internal reliability and inter-item correlations directly support this mode of implementation. Moreover, the BFRT-r elicits a strong inversion effect and strongly correlates with the BFRT-c and CFMT, evidencing content validity akin to other tests of face processing (Busigny & Rossion, 2010; Duchaine et al., 2007; Duchaine & Nakayama, 2006; McKone et al., 2011). Most strikingly, the sensitivity of the new BFRT-r in identifying face perception impairments was substantially higher than that of the BFRT-c.
It should be noted that no measure of BFRT performance was found to be associated with CFPT accuracy scores in the DP sample. This finding replicates a similar recent
report for the BFRT-c (Mishra et al., 2020), and also fits with other findings of low correlations between four different face matching tests (Fysh et al., 2020). Further investigation is required to decipher whether these poor associations between tasks that are thought to operate under the umbrella of “face perception” represent differences in process or content, or even participant inconsistency (see Bate et al., 2018; Bate, Frowd, Bennetts, et al., 2019c). This could potentially have important implications for DP screening protocols as it appears from the present data that these tasks should not be used interchangeably, and, pending further investigation, it may be prudent for researchers to administer all three tests to their DP participants. Indeed, investigations into (acquired) prosopagnosia traditionally investigated face processing abilities using several tasks which assessed the same sub-process (e.g. De Renzi & di Pellegrino, 1998; Rossion et al., 2003; Takahashi et al., 1995; Wada & Yamamoto, 2001) and, more recently, researchers are urging the use of multiple assessments to examine the consistency of performance in DP (see Murray and Bate (2019) for a discussion). At the same time, it is often encouraged that time-effective tests are used to do this and, whilst the CFPT can take up to 16 minutes to complete, the BFRT-r and BFRT-c are typically much quicker to administer and, consequently, offer a more practical way to assess the consistency of one’s face perception skills. That is, face perception could be thoroughly assessed (in the vast majority of cases) in less than 30 min.
The main adaptation of the new BFRT-r concerns the new images, both in terms of visual quality, and in addressing important theoretical concerns within the field. While the original BFRT images were highly constrained and were presumably captured in the same setting on the same day and using the same camera, our images embraced the natural variability which typically occurs when viewing the same person on different occasions in everyday life. The photographs were captured over different days (sometimes months apart), using a variety of cameras, showing the person from varying viewpoints and distances from the camera, and in different lighting conditions. By moving away from the tightly controlled conditions that prevail in existing tests of face perception, it is likely that we move closer towards the circumstances of everyday face perception, providing a more ecologically valid diagnostic test (Burton, 2013). In addition, the use of more ambient facial images also overcomes previous concerns that extra-facial or distinguishing features could be used by clinical participants to achieve typical scores on the BFRT (Duchaine & Weidenfeld, 2003). Taken together, the new stimuli have likely gone some way towards addressing this issue.
Importantly, our data also indicated that face perception skills are heterogeneous in DP – a factor that has been highlighted in previous work (Bate, Bennetts, Tree, et al., 2019b; Burns et al., 2017; Le Grand et al., 2006; Minnebusch et al., 2007; Stantic et al., 2021). Here, we found that just under half of our DP sample presented with no impairments on either version of the BFRT. While seven of the remaining 17 DPs consistently displayed deficits on both versions of the BFRT, eight were detected only by the BFRT-r and two only by the BFRT-c (note that we did not attempt to further clarify these patterns using CFPT scores given the lack of association between the two paradigms). Together, these patterns of performance highlight the importance of administering more than one task when screening for face perception deficits. This is particularly true for a condition such as DP, where face recognition difficulties appear to mostly be lifelong and do not accompany any other form of dysfunction. This allows many people with DP to develop elaborate compensatory strategies that may help them with particular facial stimuli or task paradigms, allowing them to obscure their difficulties (Adams et al., 2019). The case for repeat-testing aligns with our recent demonstration of the importance of repeat-screening for face memory deficits in DP, given the possibility that “typical” scores can be achieved by chance or due to low task reliability (Murray & Bate, 2020).
One further way to address this issue, particularly in tasks of face perception, is to place more emphasis on completion times, given accurate scores may be obtained by spending a long time on a task. Consistent with existing work (e.g. Bukach et al., 2006; Busigny & Rossion, 2010; Delvenne et al., 2004; Jansari et al., 2015; Rossion & Michel, 2018), the finding reported here that nine DPs were only impaired on completion time (but not accuracy) on either or both tests highlights the importance of assessing both measures. Indeed, longer completion times may reflect the use of laboured face processing strategies and methods which ultimately lead to a correct response. However, it is important to note that our older adult controls took longer to complete the task than younger adults, although the same effect did not emerge for accuracy. Thus, we strongly suggest that age-matched norms are used for identifying impaired performance on this task. Additionally, with this finding in mind, the BFRT-r is likely to be a suitable task for examining age-related changes in face processing within the typical population.
In conclusion, this paper presents an updated version of the BFRT with new theoretically motivated stimuli. As researchers continue to recommend repeat assessment of face processing deficits to explore consistency of impairment, the field will continue to benefit from more tasks with which to assess sub-processes of face processing. As such, the BFRT-r offers an opportunity for repeat screening for consistency of performance that improves substantially on the sensitivity offered by the BFRT-c. The task can be shared with other researchers on request.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.3758/s13428-021-01727-x.

References

Adams, A., Hills, P., Bennetts, R., & Bate, S. (2019). Coping strategies for developmental prosopagnosia. Neuropsychological Rehabilitation, 1–20. https://doi.org/10.1080/09602011.2019.1623824
Annaz, D., Karmiloff-Smith, A., Johnson, M., & Thomas, M. (2009). A cross-syndrome study of the development of holistic face recognition in children with autism, Down syndrome, and Williams syndrome. Journal of Experimental Child Psychology, 102(4), 456–486. https://doi.org/10.1016/j.jecp.2008.11.005
Barton, J. (2008). Structure and function in acquired prosopagnosia: Lessons from a series of 10 patients with brain damage. Journal of Neuropsychology, 2(1), 197–225. https://doi.org/10.1348/174866407x214172
Barton, J., & Corrow, S. (2016). The problem of being bad at faces. Neuropsychologia, 89, 119–124. https://doi.org/10.1016/j.neuropsychologia.2016.06.008
Bate, S., & Bennetts, R. (2015). The independence of expression and identity in face-processing: evidence from neuropsychological case studies. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00770
Bate, S., & Tree, J. (2017). The Definition and Diagnosis of Developmental Prosopagnosia. Quarterly Journal of Experimental Psychology, 70(2), 193–200. https://doi.org/10.1080/17470218.2016.1195414
Bate, S., Haslam, C., Tree, J., & Hodgson, T. (2008). Evidence of an eye movement-based memory effect in congenital prosopagnosia. Cortex, 44(7), 806–819. https://doi.org/10.1016/j.cortex.2007.02.004
Bate, S., Frowd, C., Bennetts, R., Hasshim, N., Murray, E., Bobak, A.K., Wills, H., & Richards, S. (2018). Applied screening tests for the detection of superior face recognition. Cognitive Research: Principles and Implications, 3(1). https://doi.org/10.1186/s41235-018-0116-5
Bate, S., Bennetts, R., Gregory, N., Tree, J., Murray, E., Adams, A., Bobak, A.K., Penton, T., Yang, T., & Banissy, M.J. (2019a). Objective Patterns of Face Recognition Deficits in 165 Adults with Self-Reported Developmental Prosopagnosia. Brain Sciences, 9(6), 133. https://doi.org/10.3390/brainsci9060133
Bate, S., Bennetts, R., Tree, J., Adams, A., & Murray, E. (2019b). The domain-specificity of face matching impairments in 40 cases of developmental prosopagnosia. Cognition, 192, 104031. https://doi.org/10.1016/j.cognition.2019.104031
Bate, S., Frowd, C., Bennetts, R., Hasshim, N., Portch, E., Murray, E., & Dudfield, G. (2019c). The consistency of superior face recognition skills in police officers. Applied Cognitive Psychology, 33(5), 828–842. https://doi.org/10.1002/acp.3525
Behrmann, M., Avidan, G., Marotta, J., & Kimchi, R. (2005). Detailed Exploration of Face-related Processing in Congenital Prosopagnosia: 1. Behavioral Findings. Journal of Cognitive Neuroscience, 17(7), 1130–1149. https://doi.org/10.1162/0898929054475154
Bennetts, R., Butcher, N., Lander, K., Udale, R., & Bate, S. (2015). Movement cues aid face recognition in developmental prosopagnosia. Neuropsychology, 29(6), 855–860. https://doi.org/10.1037/neu0000187
Benton, A. L., & Van Allen, M. W. (1968). Impairment in facial recognition in patients with cerebral disease. Transactions of the American Neurological Association, 93, 38–42.
Benton, A., & Van Allen, M. (1972). Prosopagnosia and facial discrimination. Journal Of The Neurological Sciences, 15(2), 167–172. https://doi.org/10.1016/0022-510x(72)90004-4
Benton, A. L., Sivan, A. B., Hamsher, K. D. S., Varney, N. R., & Spreen, O. (1983). Facial recognition: Stimulus and multiple-choice pictures. In A. L. Benton, A. B. Sivan, K. D. S. Hamsher, N. R. Varney, & O. Spreen (Eds.), Contribution to neuropsychological assessment (pp. 30–40). Oxford University Press.
Biotti, F., Wu, E., Yang, H., Jiahui, G., Duchaine, B., & Cook, R. (2017). Normal composite face effects in developmental prosopagnosia. Cortex, 95, 63–76. https://doi.org/10.1016/j.cortex.2017.07.018
Biotti, F., Gray, K., & Cook, R. (2019). Is developmental prosopagnosia best characterised as an apperceptive or mnemonic condition? Neuropsychologia, 124, 285–298. https://doi.org/10.1016/j.neuropsychologia.2018.11.014
Bowles, D., McKone, E., Dawel, A., Duchaine, B., Palermo, R., Schmalzl, L., Rivolta, D., Wilson, E., & Yovel, G. (2009). Diagnosing prosopagnosia: Effects of ageing, sex, and participant–stimulus ethnic match on the Cambridge Face Memory Test and Cambridge Face Perception Test. Cognitive Neuropsychology, 26(5), 423–455. https://doi.org/10.1080/02643290903343149
Bukach, C. M., Bub, D. N., Gauthier, I., & Tarr, M. J. (2006). Perceptual expertise effects are not all or none: Spatially limited perceptual expertise for faces in a case of prosopagnosia. Journal of Cognitive Neuroscience, 18, 48–63. https://doi.org/10.1162/089892906775250094
Burns, E., Martin, J., Chan, A., & Xu, H. (2017). Impaired processing of facial happiness, with or without awareness, in developmental prosopagnosia. Neuropsychologia, 102, 217–228. https://doi.org/10.1016/j.neuropsychologia.2017.06.020
Burton, M.A. (2013). Why has research in face recognition progressed so slowly? The importance of variability. Quarterly Journal of Experimental Psychology, 66(8), 1467–1485. https://doi.org/10.1080/17470218.2013.800125
Burton, A., Bruce, V., & Hancock, P. (1999). From Pixels to People: A Model of Familiar Face Recognition. Cognitive Science, 23(1), 1–31. https://doi.org/10.1207/s15516709cog2301_1
Burton, M.A., White, D., & McNeill, A. (2010). The Glasgow Face Matching Test. Behavior Research Methods, 42, 286–291. https://doi.org/10.3758/BRM.42.1.286
Busigny, T., & Rossion, B. (2010). Acquired prosopagnosia abolishes the face inversion effect. Cortex, 46, 965–981. https://doi.org/10.1016/j.cortex.2009.07.004
Dalrymple, K., & Palermo, R. (2015). Guidelines for studying developmental prosopagnosia in adults and children. Wiley Interdisciplinary Reviews: Cognitive Science, 7(1), 73–87. https://doi.org/10.1002/wcs.1374
De Luca, M., Pizzamiglio, M., Di Vita, A., Palermo, L., Tanzilli, A., Dacquino, C., & Piccardi, L. (2019). First the nose, last the eyes in congenital prosopagnosia: Look like your father looks. Neuropsychology, 33(6), 855–861. https://doi.org/10.1037/neu0000556
De Renzi, E., & di Pellegrino, G. (1998). Prosopagnosia and Alexia Without Object Agnosia. Cortex, 34(3), 403–415. https://doi.org/10.1016/s0010-9452(08)70763-9
De Renzi, E., Faglioni, P., Grossi, D., & Nichelli, P. (1991). Apperceptive and Associative Forms of Prosopagnosia. Cortex, 27(2), 213–221. https://doi.org/10.1016/s0010-9452(13)80125-6
DeGutis, J., Chatterjee, G., Mercado, R., & Nakayama, K. (2012). Face gender recognition in developmental prosopagnosia: Evidence for holistic processing and use of configural information. Visual Cognition, 20(10), 1242–1253. https://doi.org/10.1080/13506285.2012.744788
DeGutis, J., Cohan, S., & Nakayama, K. (2014). Holistic face training enhances face processing in developmental prosopagnosia. Brain, 137(6), 1781–1798. https://doi.org/10.1093/brain/awu062
Delvenne, J.F., Seron, X., Coyette, F., & Rossion, B. (2004). Evidence for perceptual deficits in associative visual (prosop)agnosia: A
single-case study. Neuropsychologia, 42, 597–612. https://doi.org/10.1016/j.neuropsychologia.2003.10.008
Duchaine, B., & Nakayama, K. (2004). Developmental prosopagnosia and the Benton Facial Recognition Test. Neurology, 62(7), 1219–1220. https://doi.org/10.1212/01.wnl.0000118297.03161.b3
Duchaine, B., & Nakayama, K. (2006). The Cambridge Face Memory Test: Results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia, 44(4), 576–585. https://doi.org/10.1016/j.neuropsychologia.2005.07.001
Duchaine, B., & Weidenfeld, A. (2003). An evaluation of two commonly used tests of unfamiliar face recognition. Neuropsychologia, 41(6), 713–720. https://doi.org/10.1016/s0028-3932(02)00222-1
Duchaine, B., Germine, L., & Nakayama, K. (2007). Family resemblance: Ten family members with prosopagnosia and within-class object agnosia. Cognitive Neuropsychology, 24(4), 419–430. https://doi.org/10.1080/02643290701380491
Esins, J., Schultz, J., Stemper, C., Kennerknecht, I., & Bülthoff, I. (2016). Face Perception and Test Reliabilities in Congenital Prosopagnosia in Seven Tests. I-Perception, 7(1), 204166951562579. https://doi.org/10.1177/2041669515625797
Farah, M. J. (1990). Visual Agnosia: Disorders of object recognition and what they tell us about normal vision. MIT Press.
Fysh, M., Stacchi, L., & Ramon, M. (2020). Differences between and within individuals, and subprocesses of face cognition: implications for theory, research and personnel selection. Royal Society Open Science, 7(9), 200233. https://doi.org/10.1098/rsos.200233
Geskin, J., & Behrmann, M. (2017). Congenital prosopagnosia without object agnosia? A literature review. Cognitive Neuropsychology, 35(1–2), 4–54. https://doi.org/10.1080/02643294.2017.1392295
Hasson, U., Avidan, G., Deouell, L., Bentin, S., & Malach, R. (2003). Face-selective activation in a Congenital Prosopagnosic subject. Journal Of Cognitive Neuroscience, 15(3), 419–431. https://doi.org/10.1162/089892903321593135
Herlitz, A., & Lovén, J. (2013). Sex differences and the own-gender bias in face recognition: A meta-analytic review. Visual Cognition, 21(9–10), 1306–1336. https://doi.org/10.1080/13506285.2013.823140
Jansari, A., Miller, S., Pearce, L., Cobb, S., Sagiv, N., Williams, A. L., Tree, J., & Hanley, J. R. (2015). The man who mistook his neuropsychologist for a popstar: When configural processing fails in acquired prosopagnosia. Frontiers in Human Neuroscience, 9, 390. https://doi.org/10.3389/fnhum.2015.00390
Kennerknecht, I., Plümpe, N., Edwards, S., & Raman, R. (2006). Hereditary prosopagnosia (HPA): the first report outside the Caucasian population. Journal of Human Genetics, 52(3), 230–236. https://doi.org/10.1007/s10038-006-0101-6
Le Grand, R., Cooper, P., Mondloch, C., Lewis, T., Sagiv, N., de Gelder, B., & Maurer, D. (2006). What aspects of face processing are impaired in developmental prosopagnosia? Brain and Cognition, 61(2), 139–158. https://doi.org/10.1016/j.bandc.2005.11.005
Neuropsychology, 28(2), 109–146. https://doi.org/10.1080/02643294.2011.616880
Minnebusch, D., Suchan, B., Ramon, M., & Daum, I. (2007). Event-related potentials reflect heterogeneity of developmental prosopagnosia. European Journal of Neuroscience, 25(7), 2234–2247. https://doi.org/10.1111/j.1460-9568.2007.05451.x
Mishra, M., Fry, R., Saad, E., Arizpe, J., Ohashi, Y., & DeGutis, J. (2020). Comparing the sensitivity of face matching assessments to detect face perception deficits. PsyArXiv. https://doi.org/10.31234/osf.io/68gbm
Murray, E., & Bate, S. (2019). Self-ratings of face recognition ability are influenced by gender but not prosopagnosia severity. Psychological Assessment, 31(6), 828–832. https://doi.org/10.1037/pas0000707
Murray, E., & Bate, S. (2020). Diagnosing developmental prosopagnosia: repeat assessment using the Cambridge Face Memory Test. Royal Society Open Science, 7(9), 200884. https://doi.org/10.1098/rsos.200884
Palermo, R., Willis, M., Rivolta, D., McKone, E., Wilson, C., & Calder, A. (2011). Impaired holistic coding of facial expression and facial identity in congenital prosopagnosia. Neuropsychologia, 49(5), 1226–1235. https://doi.org/10.1016/j.neuropsychologia.2011.02.021
Palermo, R., Rossion, B., Rhodes, G., Laguesse, R., Tez, T., Hall, B., Albonico, A., Malaspina, M., Daini, R., Irons, J., Al-Janabi, S., Taylor, L.C., Rivolta, D., & McKone, E. (2017). Do People Have Insight into their Face Recognition Abilities? Quarterly Journal of Experimental Psychology, 70(2), 218–233. https://doi.org/10.1080/17470218.2016.1161058
Rabin, L., Barr, W., & Burton, L. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20(1), 33–65. https://doi.org/10.1016/j.acn.2004.02.005
Rezlescu, C., Danaila, I., Miron, A., & Amariei, C. (2020). More time for science: Using Testable to create and share behavioral experiments faster, recruit better participants, and engage students in hands-on research. Progress in Brain Research, 253, 243–262. https://doi.org/10.1016/bs.pbr.2020.06.005
Righart, R., & de Gelder, B. (2007). Impaired face and body perception in developmental prosopagnosia. Proceedings of The National Academy of Sciences, 104(43), 17234–17238. https://doi.org/10.1073/pnas.0707753104
Robertson, D., Noyes, E., Dowsett, A., Jenkins, R., & Burton, A. (2016). Face Recognition by Metropolitan Police Super-Recognisers. PLOS ONE, 11(2), e0150036. https://doi.org/10.1371/journal.pone.0150036
Rossion, B., & Michel, C. (2018). Normative accuracy and response time data for the computerized Benton Facial Recognition Test (BFRT-c). Behavior Research Methods, 50(6), 2442–2460. https://doi.org/10.3758/s13428-018-1023-x
Rossion, B., Caldara, R., Seghier, M., Schuller, A., Lazeyras, F., &
Liu, C., Collin, C., Rainville, S., & Chaudhuri, A. (2000). The effects of Mayer, E. (2003). A network of occipito-temporal face-sensitive
spatial frequency overlap on face recognition. Journal Of Experi- areas besides the right middle fusiform gyrus is necessary for
mental Psychology: Human Perception And Performance, 26(3), normal face processing. Brain, 126(11), 2381–2395. https://doi.
956–979. https://doi.org/10.1037/0096-1523.26.3.956 org/10.1093/brain/awg241
Lovén, J., Herlitz, A., & Rehnman, J. (2011). Women’s own-gender Sachse, M., Schlitt, S., Hainz, D., Ciaramidaro, A., Walter, H.,
bias in face recognition memory. The role of attention at encoding. Poustka, F., Bolte, S., & Freitag, C.M. (2014). Facial emotion
Experimental Psychology, 58, 333–340. https://doi.org/10.1027/ recognition in paranoid schizophrenia and autism spectrum dis-
1618-3169/a000100 order. Schizophrenia Research, 159(2–3), 509–514. https://doi.
McKone, E., Hall, A., Pidcock, M., Palermo, R., Wilkinson, R., org/10.1016/j.schres.2014.08.030
Rivolta, D., Yovel, G., David, J.M., & O’Connor, K.B. (2011). Shah, P., Sowden, S., Gaule, A., Catmur, C., & Bird, G. (2015).
Face ethnicity and measurement reliability affect face recogni- The 20-item prosopagnosia index (PI20): relationship with the
tion performance in developmental prosopagnosia: Evidence Glasgow face-matching test. Royal Society Open Science, 2(11),
from the Cambridge Face Memory Test–Australian. Cognitive 150305. https://doi.org/10.1098/rsos.150305
Stantic, M., Brewer, R., Duchaine, B., Banissy, M.J., Bate, S., Susilo, T., Catmur, C., & Bird, G. (2021). The Oxford Face Matching Test: A non-biased test of the full range of individual differences in face perception. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01609-2
Takahashi, N., Kawamura, M., Hirayama, K., Shiota, J., & Isono, O. (1995). Prosopagnosia: A Clinical and Anatomical Study of Four Patients. Cortex, 31(2), 317–329. https://doi.org/10.1016/s0010-9452(13)80365-6
Van Belle, G., Busigny, T., Lefèvre, P., Joubert, S., Felician, O., Gentile, F., & Rossion, B. (2011). Impairment of holistic face perception following right occipito-temporal damage in prosopagnosia: Converging evidence from gaze-contingency. Neuropsychologia, 49(11), 3145–3150. https://doi.org/10.1016/j.neuropsychologia.2011.07.010
Wada, Y., & Yamamoto, T. (2001). Selective impairment of facial recognition due to a haematoma restricted to the right fusiform and lateral occipital region. Journal Of Neurology, Neurosurgery & Psychiatry, 71(2), 254–257. https://doi.org/10.1136/jnnp.71.2.254
White, D., Rivolta, D.A., Burton, M., Al-Janabi, S., & Palermo, R. (2017). Face Matching Impairment in Developmental Prosopagnosia. Quarterly Journal of Experimental Psychology, 70(2), 287–297. https://doi.org/10.1080/17470218.2016.1173076
Young, A., & Burton, A. (2017). Recognizing Faces. Current Directions in Psychological Science, 26(3), 212–217. https://doi.org/10.1177/0963721416688114
Young, A. W., Newcombe, F., de Haan, E. H., Small, M., & Hay, D. C. (1993). Face perception after brain injury. Selective impairments affecting identity and expression. Brain, 116, 941–959. https://doi.org/10.1093/brain/116.4.941

Open Practices Statement None of the experiments were pre-registered. The data are available as supplementary material. The BFRT-r stimuli and dataset are available via the Open Science Framework, and can be accessed here: https://osf.io/vza3m/?view_only=404f6d1971924759b126d46cba1d25b7

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.