Social Indicators Research Series 69

Bruno D. Zumbo
Anita M. Hubley
Editors

Understanding and Investigating Response Processes in Validation Research

Editors
Bruno D. Zumbo and Anita M. Hubley
Measurement, Evaluation, and Research Methodology (MERM) Program, Department of Educational and Counselling Psychology, and Special Education (ECPS), The University of British Columbia, Vancouver, BC, Canada

ISSN 1387-6570     ISSN 2215-0099 (electronic)
Social Indicators Research Series
ISBN 978-3-319-56128-8     ISBN 978-3-319-56129-5 (eBook)
DOI 10.1007/978-3-319-56129-5

Library of Congress Control Number: 2017939937

© Springer International Publishing AG 2017


Chapter 12
Cognitive Interviewing and Think Aloud Methods

José-Luis Padilla and Jacqueline P. Leighton

J.-L. Padilla
University of Granada, 18071 Granada, Spain
e-mail: [email protected]

J.P. Leighton
Center for Research in Applied Measurement and Evaluation (CRAME), Department of Educational Psychology, Faculty of Education, University of Alberta, 6-119D Education North Building, Edmonton, AB T6G 2G5, Canada
e-mail: [email protected]

Although a case could be made that the need for explanations of item responses has
been around since the origins of validity theory, the 1999 edition of the Standards
for Educational and Psychological Testing (AERA, APA, & NCME, 1999) can be considered the official birth certificate of response processes as a source of validity evidence. Previous relevant references can be traced to
a recommendation by Messick (1990) to look at how subjects cope with items and
tasks to identify processes underlying item responses, efforts by Embretson (1983)
linking cognitive psychology to item response theory, or even the earliest definitions
of validity, if we just take an interest in knowing “what the test measures.”
At the same time that professionals, researchers, and testing organizations have
been incorporating research on response processes into their test development and
evaluation practices, the current Standards (AERA, APA, & NCME, 2014) main-
tains response processes among the five sources of validity evidence. On the down-
side, the latest Standards does not go beyond the previous edition in indicating how to obtain solid validity evidence based on response processes. Systematic reviews of validation studies reveal that few studies are conducted to obtain validity evidence based on response processes. Cizek, Rosenberg, and Koons (2007) found that validity evidence based on participants' response processes was examined in only 1.8% of the papers. Zumbo and Shear (Shear & Zumbo,
2014; Zumbo & Shear, 2011) showed a higher presence but still a minority com-
pared with the other sources of validity evidence; for instance, in the medical outcomes field, only 14% of the validation studies were aimed at obtaining evidence of the response processes.
The lack of experience, consolidation of best practices, and recommendations on
how to obtain evidence of response processes can lead to missed opportunities pro-
vided by new conceptual and methodological developments in validity theories and
validation methods. Among the various validation methods that can provide evi-
dence of response processes, this chapter is devoted to cognitive interviewing (CI)
and think aloud methods.
The target audience of this chapter is professionals and researchers looking for
methodological guidance to perform validation studies by using CI and think aloud
methods. In the chapter, we (a) describe the state-of-the-art in conducting think
aloud and CI studies, (b) describe similarities and differences between the methods,
and (c) demonstrate how both methods can provide validity evidence of response
processes.
CI and think aloud methods are described in the context of educational and psy-
chological testing. Both methods are often applied in survey research too, mainly as
pre-testing methods to fix problems and improve survey questions. In fact, as we discuss in the following sections, both methods share common origins and are not as distant in their development as it might seem. We intend to provide arguments for distinguishing between the two methods to help researchers make informed decisions about which method can be more useful given the aims of the validation study.
We think that such validity evidence can be understood within frameworks ranging from a de-constructed view of validity (e.g., Kane, 2013; Sireci, 2012) to a more contextualized and pragmatic explanation validity framework (Stone & Zumbo, 2016; Zumbo, 2009).
Throughout the chapter, we will also present studies to illustrate the content and
how to apply think aloud and CI methods.

Introduction and State-of-the-Art in Conducting Cognitive Interviewing (CI)

CI History and Overview

Before starting with a short history of the CI method, we should present a definition
and a clear description of how the method is usually applied. The need for a defini-
tion is evident given that the term is also common in fields far from educational
testing and psychological assessment, like law enforcement, where CI is a police
resource to check witness reliability. What is more, CI emerged as a question evalu-
ation method in the survey research field. Therefore, readers should pay attention to
translating definitions of CI into the educational testing and psychological assess-
ment context.
Although there is no universally accepted definition of CI, a wide consensus
exists about what Beatty and Willis (2007) think CI involves: “the administration of
draft survey questions while collecting additional verbal information about the sur-
vey responses, which is used to evaluate the quality of the response or to help deter-
mine whether the question is generating the information that its author intends”
(p. 287). The first task for readers is to change ‘survey question’ to ‘test items’ or
‘scale items’. A couple of years later, Willis (2009) stated that CI “… is a
psychologically-oriented method for empirically studying the way in which indi-
viduals mentally process and respond to survey questionnaires. Cognitive inter-
views can be conducted for the general purpose of enhancing our understanding of
how respondents carry out the task of answering survey questions” (p. 106).
Highlighting the core elements in both definitions of CI allows us to recognize
potential benefits from CI in validation studies of test score interpretations: (a) CI is
a psychologically-oriented method for investigating respondents’ mental processes
while answering test and scale items, (b) CI data can be useful for examining the
quality of item responses, and (c) CI can help determine whether items are captur-
ing the intended behaviors. The next section presents studies that illustrate these
core elements.
Commonly, CI pre-testing evaluation studies in survey research consist of con-
ducting in-depth interviews following an interview protocol with a small, purposive
sample of 10–30 respondents. First, respondents answer the target survey questions;
that is, the questions to be pre-tested, and then they respond to a series of follow-up
probes that vary from general and open probes, like "What were you thinking?" or "How did you come up with that?" to much more scripted and specific follow-up
probes, such as “What does the term/word (...) mean to you?” or “How did you cal-
culate (…)?” Problems with the ‘question-and-answer’ process are usually identi-
fied and analyzed from the respondents’ narratives in the cognitive interviews.
As Miller (2014) points out, CI, by asking respondents to describe how and why
they answered survey questions as they did, provides evidence not just to fix ques-
tions but also to find out the ways respondents interpret questions and apply them to
their own lives, experiences, and perceptions. Miller’s (2014) interpretative view of
CI methodology, developed in survey research, coincides more than one might expect with the broadest
conceptions of validity theory in educational and psychological testing. For exam-
ple, the contextualized and pragmatic explanation validity framework (Zumbo,
2009) expands opportunities for CI as a validation method to obtain evidence of
response processes and to examine equivalence and sources of bias in cross-cultural
research (Benítez & Padilla, 2014). However, we need to briefly summarize the
evolution of CI before going into details of that proposal.
Almost all manuals and introductory articles on the CI method point out the
Cognitive Aspects of Survey Methodology (CASM) conference (Jabine, Straf,
Tanur, & Tourangeau, 1984) as a critical event in the history of CI. Presser et al.
(2004) also identified the influential contribution of Loftus' (1984) post-conference analysis of how respondents answer questions about past events. That analysis relied on the think-aloud technique for studying problem solving
developed by Ericsson and Simon (1980). So influential was Loftus’ (1984) work
that, since then, the think-aloud technique has been closely linked to CI either as a
theoretical basis for the CI method (e.g., Willis, 2005), or as a data collection pro-
cedure along with verbal probing to conduct CI (e.g., Beatty & Willis, 2007).
After the CASM conference, and relying heavily on cognitive theory, cognitive
laboratories devoted to testing and evaluating survey questions were established
first at several U.S. federal agencies (e.g., the National Center for Health Statistics,
the U.S. Census Bureau), and then at Statistics Canada and equivalent official sta-
tistics institutes like Statistics Netherlands, Statistics Sweden, etc. (Presser et al.,
2004). As we discuss in the next section, the role of federal agencies and official
statistics institutes can explain why CI methodology is still mainly seen as a pre-
testing method aimed at fixing problems with questions to reduce response errors
(Willis, 2005). Federal agencies and official statistics institutes have shaped CI
methodology in survey research similarly to the way that testing companies have
modeled research on item bias and differential item functioning (DIF).
Nowadays, CI practitioners and researchers build on the advances that the CASM conference brought to the study of measurement errors in survey research. The CASM movement established the idea that respondents' thought processes must be understood to assess validity (Schwarz, 2007). Later, the addition of motivational elements to information-processing perspectives produced a major evolution. For
example, Krosnick (1999) introduced the construct of “satisficing” to account for
the tendency of most respondents to choose the first satisfactory or acceptable
response option rather than options reflecting full cognitive effort. More compre-
hensive models of the question-and-answer process are emerging that take contextual, social, and cultural elements into account, support the rationale behind the method,
and expand the range of validation research questions that could be addressed by CI
(e.g., Shulruf, Hattie, & Dixon, 2008).

CI Approaches and Theories

From the short introduction above to CI, it should be clear that CI is a qualitative
method used to examine the question-and-answer process carried out by respon-
dents when answering survey questions. Even though distinguishing between dif-
ferent purposes for conducting CI in survey research can be difficult, such a division
can help us find out the ways in which CI can provide evidence of response pro-
cesses for validation studies in testing and psychological assessment. Willis (2015)
differentiates between two apparently contrasting objectives: reparative vs. descrip-
tive cognitive interviews. With slight changes in the labels, the distinction can be
easily found in the literature when the purpose of CI is under debate (e.g., Chepp &
Gray, 2014; Miller, 2011). The reparative approach corresponds to the original need
for identifying problems in survey questions and repairing them. Traditionally in
cognitive laboratories and official statistics institutes, it has been the practice of CI
projects to answer, let us say, quantitative questions within a qualitative method:
“How many problems does the target question have?” or “Which percentage of CI
participants reveal such a problem?" In contrast, the descriptive approach represents
CI projects whose aims are to find out how respondents mentally process and answer
survey questions instead of just uncovering response errors. Advocates of this
approach argue that CI should be planned to discover what a survey question is truly
capturing, that is, how survey questions function as measures of a particular construct (e.g., Chepp & Gray, 2014; Ridolfo & Schoua-Glusberg, 2011).
The descriptive approach is in line with our proposal to rely on the CI method to
obtain validity evidence related to response processes associated with test and scale
items. There is a solid argument for the parallelism between the more comprehen-
sive objective of discovering what the survey question is truly capturing and the
2014 Standards definition for validity evidence of response processes: “Some con-
struct interpretations involve more or less explicit assumptions about the cognitive
process engaged in by test takers. Theoretical and empirical analysis of the response
processes of test takers can provide evidence concerning the fit between the con-
struct and the detailed nature of the performance or response actually engaged in by
test takers" (p. 15). The Standards' subsequent suggestion to question test takers about their
performance strategies or responses to particular items opens the door to applying
CI methodology from a descriptive approach to obtain validity evidence of response
processes.
The question now is whether there is a theory to support CI methodology. Willis (2015)
proposes to distinguish between what he calls a theory of the phenomenon, that is,
how people respond to survey questions, and a theory of the method, a theory that
supports the use of CI to test and investigate survey response processes. Starting
with the theory of the phenomenon, the CASM view, as expressed in the
four-stage cognitive model by Tourangeau (1984), has been and still is the most
cited cognitive theoretical framework of response processes to survey questions.
The model presents a linear sequence from when the survey questions are presented
to the respondent to the selection of a response: (a) comprehension of the question,
(b) retrieval of relevant information, (c) judgment/estimation processes, and (d)
response. More recently, elements of disciplines like linguistics, anthropology, or
sociology have been incorporated to account for the effects of context, social, and
cultural factors, etc., on response processes (e.g., Chepp & Gray, 2014).
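
As a purely illustrative sketch (the enum and the tagging helper below are our own invention, not part of Tourangeau's model or of any CI software), the four stages can be written down as coding categories an analyst might use to label segments of a respondent's narrative:

```python
from enum import Enum

class ResponseStage(Enum):
    """Tourangeau's (1984) four stages of answering a survey question."""
    COMPREHENSION = 1  # understanding what the question asks
    RETRIEVAL = 2      # recalling relevant information from memory
    JUDGMENT = 3       # estimating or integrating what was retrieved
    RESPONSE = 4       # mapping the judgment onto the answer format

def tag_segment(segment: str, stage: ResponseStage) -> dict:
    """Attach a stage label to a fragment of a cognitive-interview narrative."""
    return {"text": segment, "stage": stage.name}

# Example: a comprehension problem reported by a respondent.
print(tag_segment("I wasn't sure whether 'family' includes my in-laws.",
                  ResponseStage.COMPREHENSION))
```
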
Regarding the theory of the method, Willis (2015) thinks that CI still relies on
Ericsson and Simon’s (1980) defense of think-aloud interviews to obtain access to
the functioning of cognitive processes. For Willis (2015), the idea that persons who
spontaneously verbalize their thoughts provide a 'window into the mind' remains the theoretical basis for CI, which blurs the borders between think-aloud and CI methods and explains why the CI method is sometimes referred to as 'think-aloud interviews'. Due to the lack of empirical evidence on the veracity of verbal reports provided by CI, current contributions from other social science disciplines (e.g., ethnography, sociology), and the growing application of CI in cross-cultural research,
CI is starting to be viewed as a qualitative method and something more than just
‘cognitive’ (e.g., Willis & Miller, 2011).
Among the qualitative approaches to CI, one of the most promising is the interpre-
tive perspective within the framework of Grounded Theory (e.g., Ridolfo & Schoua-Glusberg, 2011). The rationale behind this approach is the production of a full
range of themes in CI data and the need to study the CI topic (in this case, the response
processes to a survey question) until saturation is reached. Briefly, from an interpre-
tive perspective, the topic of interest is what the item means for the respondent, and that meaning is
socially constructed by the respondent in a particular moment and a particular social
location. A detailed treatment of the interpretative perspective, in the context of CI,
can be found in Chepp and Gray (2014). Miller, Willson, Chepp, and Padilla (2014)
present an exhaustive description of the main phases and aspects of CI methodology
from an interpretative perspective. The next section of the chapter presents examples
of studies conducting CI as a validation method in the context of educational testing
and psychological assessment from an interpretative perspective.

Conducting a CI Validation Study: Main Phases, Procedural Issues, and Examples

To help researchers make informed decisions on the appropriate method, either CI or think-aloud, to obtain validity evidence of response processes, this section
presents the main phases and some practical issues on CI. Fortunately, readers inter-
ested in all procedural details of the CI method can be referred to a set of books
published in recent years: Collins (2015), Miller, Chepp, Willson, and Padilla (2014), Willis (2005), and Willis (2015). Although they take different approaches to CI, these books were mainly written for an audience of survey researchers. Considering the
aim of the chapter and our experience, we have selected and adapted the contents
that can be more useful for a validation study in educational and psychological test-
ing. The three main phases to be considered when planning a CI validation study are
discussed in the following.
Fitting the CI Study into the Overall Validation Project The introduction to the
evidence based on response processes in the validity chapter of the latest Standards
(AERA et al., 2014) includes indications that can help answer the question: when should a validation study based on response processes be conducted? Obviously, the question is relevant for both CI and think-aloud methods, and should be answered before considering applying either of them. The indications point out
the validity research questions for which both methods can be appropriate:
“Evidence of response processes can contribute to answering questions about dif-
ference in meaning or interpretations of test scores across relevant subgroups of test
takers” (AERA et al., 2014, p. 15).
Benítez and Padilla (2014) propose three general propositions which could be
examined by evidence provided by CI: (a) the performance of test takers reflects the
psychological processes delineated in test or scale specifications, (b) the processes
of judges or observers while evaluating test takers' products are consistent with the
intended interpretations of scores, and (c) relevant subpopulations of test takers
defined by demographic, linguistic, or cultural groups do not differ in the response
processes to test and scale items.

CI fits into different moments of overall validation projects and can be integrated
with other validation methods. Supporting test uses or propositions involved in a
validity argument can require multiple strands of quantitative or qualitative validity
evidence. For example, Castillo and Padilla (2013) conducted CI to interpret differ-
ences in the factor structure of a psychological scale intended to measure the con-
struct of family support. Therefore, the integration of different validation methods,
among them the CI method, should be addressed in a systematic way from the
beginning of the validation project. A mixed methods research framework, intro-
duced by Padilla and Benítez (in this book), offers a path to reaching such
integration.
Planning CI In contrast to CI practice in survey research, in which single survey questions are the "target," we intend to obtain evidence of the response processes of multi-item tests or scales. Of course, researchers can focus on particular items, but test takers respond to tests and scales as a whole. Conrad and Blair (2009) stated the conditions under which CI can provide evidence of non-automatic processing of scale items. Unsurprisingly, such conditions boil down to the requirement that test takers be aware of their response processes and able to communicate about them during the interview. Planning CI involves taking care of many procedural issues. Next, we address the most important aspects of planning in the context of educational and psychological
testing.
Developing the Interview Protocol A movie script can come to the reader’s mind as
an example of an interview ‘protocol’. At the end of the day, a CI is an interview
with two main characters: interviewer and respondent. To some extent, the compari-
son conveys the key role of the interview protocol. It consists of the introduction of
the study to the respondents (e.g., statements of the research aims, main topics,
responsible organization, confidentiality), information about the expected role of the respondent, and the probes. However, as a validation method, the interview protocol is much more than a script. The content, structure, and even the length of the interview protocol reflect the researcher's approach to the CI method. Opting for a reparative versus a descriptive approach to CI leads to very different interview protocols. A CI study from an interpretative approach develops an interview protocol that allows researchers to capture the socially constructed meaning of the items for the respondent, whereas, from a problem-solving perspective, the protocol is intended to facilitate the question evaluation task. Table 12.1 outlines the bi-directional conditioning
effects between the roles of the respondents and interviewers, and the kind of probes
mostly included in the interview protocol.
Willson and Miller (2014) presented what we can call two oppositions to charac-
terize the expected role of the respondents and the interviewers that condition the
kind of probes included in the interview protocol. The respondents act as ‘evalua-
tors’ when they are asked to evaluate parts of the question: stem, response options,
or their own cognitive processes; they act as 'storytellers' when "they are asked to generate a narrative that depicts 'why they answered the question in the way that they did'" (Willson & Miller, 2014, p. 26). The second opposition sets a parallel-
ism with the expected role of the interviewer as a ‘data collector’ or as a ‘researcher’.
Table 12.1 Relation between interviewer and respondent

Respondent role | Probes                       | Interviewer role
Evaluator       | Standardized and structured  | Data collector
Storyteller     | Spontaneous                  | Researcher

Table 12.2 Examples of probes used in interview protocols

GENERAL PROBE:
P.1. Let's start talking about how you answered the first questions. The first questions were about how important aspects like "work," "family," "friends," etc., are for you. How did you answer these questions? What did you take into account when responding?

SPECIFIC PROBES:
P.2. One of the aspects was "family": what came to your mind while responding? What persons did you think of?
P.3. Another aspect was "friends and acquaintances." You answered _______ (see and read the alternative selected by the participant in the statement). Tell me more about your answer: why did you answer that?

If the interviewer is instructed to ask the same probes in the same way to every
respondent, we have data collectors that do their best to avoid interviewer biases and
preserve CI data accuracy. In contrast, the interviewer is a qualitative researcher
when they “assess the information that he or she is collecting and examine the
emerging information to identify any gaps, contradictions, or incongruences in the
respondent’s narrative” (Willson & Miller, 2014, p. 30). In this case, the interview
protocol is open to what Willis (2005) called spontaneous or free-form probes to
help interviewers lead the interview.
Benítez, He, van de Vijver, and Padilla (2016) conducted a CI study to obtain valid-
ity evidence of the response processes to some quality-of-life questions and scale
items used in international studies, comparing Spanish and Dutch respondents.
Table  12.2 presents a sample of the interview protocol for questions intended to
capture how important aspects like family, work, friends, etc., are for participants.
The sample includes a general probe and two specific probes. Interviewers were
instructed to resort to the specific follow-up probes when interviewees’ comments
did not provide a full narrative of what items meant for them and how they had
constructed their responses.
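
As a minimal sketch of how such a protocol section might be represented (the field names and the fallback rule are hypothetical illustrations, not the actual protocol used by Benítez et al., 2016), a general probe can be paired with scripted specific probes that the interviewer uses only when the narrative is incomplete:

```python
from typing import Optional

# Hypothetical sketch of one protocol section: a general probe plus scripted
# specific probes used only when the respondent's narrative is incomplete.
protocol_section = {
    "items": ["work", "family", "friends"],
    "general_probe": ("Let's start talking about how you answered the first "
                      "questions. How did you answer them? What did you take "
                      "into account when responding?"),
    "specific_probes": [
        "One of the aspects was 'family': what came to your mind while responding?",
        "Another aspect was 'friends and acquaintances': tell me more about your answer.",
    ],
}

def next_probe(narrative_is_complete: bool, specific_probes_asked: int) -> Optional[str]:
    """Return the next scripted probe, or None when no further probing is needed."""
    if narrative_is_complete:
        return None
    probes = protocol_section["specific_probes"]
    return probes[specific_probes_asked] if specific_probes_asked < len(probes) else None

print(next_probe(narrative_is_complete=False, specific_probes_asked=0))
```
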
The books by Willis (2005) and Miller, Willson, et al. (2014) provide detailed
descriptions of the different kinds of probes, and how they determine not just interviewer and respondent roles but also, inevitably, the CI data
analysis.
Recruitment How many interviews and who should be the respondents are perma-
nent concerns when researchers decide to conduct a CI validation study. Researchers
should not forget that CI is a qualitative method. Thus, sampling is not a primarily
numerical matter, but a purposive one. Learning from the survey research field, we
can base sampling on demographic diversity or the topic covered by the items. For
example, if we want to obtain validity evidence of the response processes to a quality of life scale for people with disability, CI sampling should include people with
different disabilities. The AERA et al. (2014) Standards reiterates the idea of com-
paring response processes “about difference in meaning or interpretation of test
scores across relevant subgroups of test takers” (p. 15).
As a qualitative validation method, CI can benefit from criteria to respond to the
question of sample size and composition: theoretical saturation and theoretical rel-
evance (Willson & Miller, 2014). In the context of an educational testing or psycho-
logical assessment validation study, theoretical saturation implies that one keeps
interviewing until the researchers reach a full understanding of how and why respondents answer the items and have identified potential differences across groups of respondents. With respect to theoretical relevance, along with respondents belonging to the relevant groups defined in the intended score interpretations, researchers should consider including participants that can provide as much diversity as possible regarding response processes to the test items. Although it is hard to avoid giving a number, in our experience both criteria can be met with between 20 and 50 interviews.
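
To make the saturation criterion concrete, the following toy sketch operationalizes it as "no new themes over the last few interviews"; this stopping rule is our own illustration, not a prescription from Willson and Miller (2014):

```python
from typing import List, Set

def reached_saturation(themes_per_interview: List[Set[str]], window: int = 5) -> bool:
    """True if the last `window` interviews added no theme not already seen earlier."""
    if len(themes_per_interview) <= window:
        return False
    earlier: Set[str] = set().union(*themes_per_interview[:-window])
    latest: Set[str] = set().union(*themes_per_interview[-window:])
    return not (latest - earlier)

# Invented example: interviews 4-8 only repeat themes first seen in interviews 1-3.
themes = [{"immediate family"}, {"relatives", "friends"}, {"work duties"},
          {"friends"}, {"relatives"}, {"immediate family"}, {"work duties"}, {"friends"}]
print(reached_saturation(themes, window=5))  # True
```
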
Interviewer Training There is no simple answer to what competencies a good inter-
viewer should have. Obviously, there are technical abilities and interpersonal skills
that can make a difference when conducting the CI method. Willis (2005) described
the technical background that can be helpful: (a) some type of social science educa-
tion, (b) knowledge and experience in questionnaire design, (c) some exposure to
the subject matter of the questionnaire, and (d) experience in conducting CI. The
more experience interviewers have, the more capable they will be of managing and leading interviews. Willis (2005) also paid attention to the non-technical skills the
interviewers should have: “the ability to be flexible, spontaneous, and cool under
duress” (p. 130).
Analyzing CI data All major manuals of the CI method include a chapter devoted
to CI data analysis. As readers may guess, different approaches to CI correspond
with different types of analytic processes. Willis (2015) published a state-of-the-art
book titled Analysis of the Cognitive Interview in Questionnaire Design, where the
different analytical strategies, models, and critical issues in current analytic prac-
tices can be found.
We summarize here the analytic process from an interpretative approach to CI as
a method to obtain validity evidence of response processes. Miller, Willson, et al.
(2014) outline five incremental steps by which the reduction and synthesis process
of CI data can be conceptualized: (1) conducting interviews to produce the inter-
view text; (2) synthesizing interview text into summaries; (3) comparing summaries across respondents to produce a thematic schema; (4) comparing identified themes across subgroups to produce an advanced schema; and (5) drawing conclusions to produce a final study conclusion. From this perspective, analysis starts with the interview itself, given that the interviewer, acting as a researcher, makes analytic decisions along the way: identifying contradictions, following up on respondents' first responses, etc. Lastly, the main steps described follow an iterative process in practice: analysts go forward and backward through the analytic process (Miller, Willson, et al., 2014).
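
The five steps can be pictured as a data-reduction pipeline. The sketch below is purely illustrative: the function bodies are placeholders for judgment-based analyst work, and none of the names correspond to an actual tool described by Miller, Willson, et al. (2014):

```python
from typing import Dict, List

# Placeholders for the five analytic steps; in practice each step is
# judgment-based analyst work, not an automated procedure.
def summarize(transcript: str) -> str:
    """Step 2: condense one interview transcript into a short summary."""
    return transcript.strip()[:200]

def build_thematic_schema(summaries: List[str]) -> Dict[str, List[str]]:
    """Step 3: group summaries that express the same interpretation of the items."""
    return {"family includes extended relatives and friends": summaries}

def compare_subgroups(schema: Dict[str, List[str]],
                      group_of: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Step 4: count, for each theme, how many summaries come from each subgroup."""
    counts: Dict[str, Dict[str, int]] = {}
    for theme, members in schema.items():
        counts[theme] = {}
        for summary in members:
            group = group_of.get(summary, "unknown")
            counts[theme][group] = counts[theme].get(group, 0) + 1
    return counts

# Step 1 yields the transcripts; step 5 is the analyst's written study conclusion.
transcripts = {"R1 (Spanish)": "For me, family also means cousins and close friends ...",
               "R2 (Dutch)": "Family is my partner and my children ..."}
summaries = {rid: summarize(text) for rid, text in transcripts.items()}
schema = build_thematic_schema(list(summaries.values()))
group_of = {summaries["R1 (Spanish)"]: "Spanish", summaries["R2 (Dutch)"]: "Dutch"}
print(compare_subgroups(schema, group_of))
```
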

Benítez et al. (2016) followed the interpretative approach to analyze CI data obtained to compare response processes to quality-of-life questions and scale items between Spanish and Dutch respondents. For example, the researchers found a different interpretation pattern for the family concept. In contrast to Dutch participants, Spanish participants included within the family concept not just the immediate family, but also relatives and friends.

Introduction and State-of-the-Art in Conducting Think Aloud Interviews

History and Overview of Think Aloud Interviews

The think aloud interview is a psychological method used to collect data about
human information processing, namely, problem solving. Problem solving has been
defined as the goal-driven process of finding a solution to a complex state of affairs
(Newell & Simon, 1972). Problem solving requires the manipulation of information
to create something new and, therefore, is normally involved in higher-level skills
found in Bloom’s taxonomy (Bloom, Engerhart, Furst, Hill, & Krathwohl, 1956).
The think-aloud interview can be a useful tool in determining whether test items or
tasks elicit problem-solving processes. The think-aloud interview technique needs
to be distinguished from cognitive labs, which are used to measure a wider array of
response processes, especially comprehension (see Leighton, 2017a, 2017b).
Cognitive labs are not the focus of this section and are not discussed further.
The think-aloud interview has historical roots in experimental self-observation,
a method used by Wilhelm Wundt (1832–1920) to systematically document the
mental experiences of trained human participants to a variety of sensory stimuli.
Unlike introspection, experimental self-observation was standardized to provide a
structured account of the unobservable but systematic human mental experience.
However, beginning in the 1920s, behaviorism became the dominant paradigm for
studying psychological phenomena and only observable behavior was viewed as
worthy of measurement. In the 1950s, the cognitive revolution, instigated by schol-
ars such as Noam Chomsky and psychologists such as George Miller, Allen Newell, Jean Piaget, and Herbert Simon, effectively replaced behaviorism as the dominant paradigm, and methods for scientifically studying mental experiences as accounts of human behavior again became a focus of interest (Leahey, 1992).
The think aloud interview as it is currently conceived was developed by two
cognitive scientists, K. Anders Ericsson and Herbert Simon. In 1993, Ericsson and
Simon wrote their seminal book Protocol Analysis: Verbal Reports as Data based
upon a decade of their own research into the scientific study of human mental pro-
cessing (e.g., Ericsson & Simon, 1980) and a review of previous research that was
focused on the study of human mental processing. The 1993 book continues to be
the major reference in the field. A careful reading of their book makes the following
unequivocal – inferences or claims about human problem solving are supported by data collected from think aloud interviews only when the interviews are conducted
in a highly structured and systematic manner. In particular, the following conditions
must hold: (a) the content of the interview must involve a problem-solving task, (b)
the problem-solving task must require what is called controlled processing (i.e.,
processing that is not automatic but, rather in the participant’s awareness and open
to verbalization) for its solution, and (c) the interview probes must be minimal and
non-directive, without requests for elaboration and explanation, to allow the partici-
pant to verbalize concurrently. These three conditions must be met if the objective
of the think-aloud interview is to collect evidence about human problem solving. If
these conditions do not hold, claims or inferences about human problem solving are
suspect at best and unwarranted at worst (Ericsson & Simon, 1993; Fox, Ericsson,
& Best, 2011; Leighton, 2004). Hence, in the validity arguments created to bolster
claims about test items measuring problem solving processes (e.g., in mathematical
or scientific domains), the data from think aloud interviews can only serve as evi-
dence of such claims if the data have been collected according to specific proce-
dures, as discussed next.

Interview Sessions for Conducting Think-Aloud

There are normally two sessions or parts to include in the think-aloud interview: the
concurrent session and the retrospective session. Both involve unique interview
probes. The details of these have been elaborated at length in past publications (e.g.,
Ericsson & Simon, 1993; see also Leighton, 2004, 2013, 2017b for instructions),
but a summary bears repeating here. First, the concurrent session of the interview is
most important and characterized by requesting the participant (or examinee) to
verbalize his or her thoughts aloud in response to a problem-solving task. The
objective is to have the participant (or examinee) solve the task and simultaneously
verbalize the mental processes being used, in real time, to solve it. During this part
of the interview, the interviewer should not interrupt with any questions (e.g., Can
you elaborate on why you are drawing a diagram to solve the problem?) that would
disrupt the flow of problem solving and thus verbalization or lead the participant to
consider a distinct problem solving route (e.g., Why not consider a diagram in solv-
ing the problem?) not previously contemplated. The only probes the interviewer
should use during this session are non-directed reminders to the examinee to verbal-
ize thoughts as he or she is solving the problem. For example, permissible non-directive
probes would include, “Please keep talking” or “Please remember to verbalize.”
The interviewer should avoid directive probes such as “What are you thinking”
because this probe is a question that takes focus away from the task and requires the
examinee to respond to the interviewer. If these protocol or procedural details seem
overly specified, it is deliberate. True-to-life problem-solving processes are not nec-
essarily robust to measurement–meaning that they are difficult to measure accu-
rately. This is because these processes take place in working memory and the
contents of working memory are fleeting (see Ericsson & Simon, 1993). The data
produced from this concurrent phase comprise a verbal report.
The second part of the think aloud interview is the retrospective session, and it is
secondary in importance. It is characterized by having the examinee recount how he
or she solved the problem-solving task. The retrospective session follows directly
after the concurrent session and is initiated by the interviewer's request to the examinee: "Please tell me how you remember solving the task." During the ses-
sion, the interviewer may ask for elaboration and explanation of how the examinee
remembers solving the task (e.g., Why did you decide to draw the diagram?). These
elaborative questions are designed to help contextualize the verbal report the exam-
inee provided during the concurrent session. The verbalizations an examinee pro-
vides during the retrospective session are not considered to be the primary evidence
for supporting claims about problem solving (see Ericsson & Simon, 1993). This is
because the retrospective session relies heavily on an examinee’s memory and does
not capture the problem-solving process in vivo. One of the main weaknesses of
verbal reports as evidence of problem-solving processes is the failure to follow pro-
tocol, namely, failing to collect the reports properly during the concurrent session of the inter-
view (see Fox et al., 2011; Leighton, 2004, 2013; Wilson, 1994). These failures will
undermine the utility of verbal reports in validity arguments.

Conducting a Think-Aloud Validation Study: Main Phases, Procedural Issues, and Examples

There are five phases for conducting think aloud interviews. The phases include: (1)
cognitive model development; (2) instructions; (3) data collection using concurrent
and retrospective probes; (4) coding of verbal reports using protocol analysis; and
(5) generating inferences about participants’ response processes based on the data.
Each of these phases involves specific methods or procedures. It is beyond the scope
of the chapter to delve into these details, but interested readers are referred to
Leighton (2017b) for a fuller exposition. At this point, it is important to repeat that
the phases of the think-aloud interview differ from those used in ‘cognitive labs’, a
variant interview of the think-aloud method that is used to measure comprehension
rather than problem solving (the reader is again referred to Leighton, 2017b for a
full exposition on the differences between think aloud interviews and cognitive
labs). In this section, the main phases of the think-aloud interview are summarized, with a brief presentation of procedural issues and examples.
Cognitive Model Development Think-aloud interviews can yield a significant
amount of verbal report data to analyze. Often researchers can become overwhelmed
with the extent of the report data and what to focus on and evaluate as evidence of
response processes. This is one reason why the first step in conducting a think-aloud
is to develop a cognitive model, or some type of flowchart that outlines the knowl-
edge and skills expected to underlie performance. The cognitive model does not
have to be complicated. However, it should illustrate the response processing
expected as it will serve as a roadmap for identifying the knowledge and skills of
interest in the verbal reports. If the model fails to fully or partially illustrate what is
observed in the reports, then the model is refined based on the data. Leighton, Cui,
and Cor (2009) provide an example of an expert-based cognitive model. It is a coarse-grained model developed by an expert for 15 algebra multiple
choice SAT items; finer-grained models can be developed but can present chal-
lenges for inter-rater reliability. The cognitive model is the first step in structuring
the measurement of response processes.
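
For readers who find a concrete representation helpful, a coarse-grained cognitive model can be written down as little more than a set of knowledge/skill attributes with prerequisite relations; the attribute labels below are invented for illustration and are not the SAT algebra model of Leighton, Cui, and Cor (2009):

```python
# Hypothetical coarse-grained cognitive model for an algebra item, expressed as
# attributes (knowledge/skills) and prerequisite relations among them.
cognitive_model = {
    "attributes": {
        "A1": "comprehend the problem statement",
        "A2": "translate the statement into an equation",
        "A3": "apply algebraic manipulation to isolate the unknown",
        "A4": "check the solution against the problem constraints",
    },
    # Each attribute lists the attributes it presupposes.
    "prerequisites": {"A1": [], "A2": ["A1"], "A3": ["A2"], "A4": ["A3"]},
}

def coding_categories(model: dict) -> list:
    """Derive coding-manual categories from the model, one per attribute."""
    return [f"{key}: {label}" for key, label in model["attributes"].items()]

for category in coding_categories(cognitive_model):
    print(category)
```
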
Think-Aloud Instructions Think aloud interviews, as originally conceived by
Ericsson and Simon (1993), are used primarily to measure problem solving pro-
cesses. The instructions used to initiate the interview must therefore be adminis-
tered to ensure (a) participant comfort with verbalizing problem solving processes
(a practice phase), (b) the minimization of participant response bias (indicate non-
evaluation), and (c) participant focus on concurrent verbalization (concurrent
probes). Because participants can easily become self-conscious about problem solv-
ing in front of an interviewer, it is suggested that participants be given time to prac-
tice projecting their voice. Often, participants will express and show comfort
verbalizing with practice tasks, but when they begin the actual task of interest, will
go silent. This often occurs because simultaneously thinking through the task information
and verbalizing burdens working memory resources. However, participants need to
be reminded to verbalize as they think through the task as this is the target of what
is being measured, even if this means slowing down how they solve the task.
Data Collection Using Concurrent and Retrospective Probes As mentioned
previously, there are two parts to the think-aloud interview–a concurrent session and
a retrospective session. Each session has unique probes to ensure that the target
response processing, namely problem solving, is being measured as intended. As
explained in Leighton (2017b), only minimal, non-obtrusive and non-directed
probes are permissible in the concurrent session, where the actual problem solving
of interest is being observed and measured. Permissible concurrent probes include
“Please keep talking” and “Remember to continue talking.” Elaborative probes that
involve “why” or “how” questions are not permissible as they are often directive,
obtrusive, and may bias the problem solving in which the participant is engaging.
For example, probes such as “Why did you do this?” or “How did you decide to
select this option?” can function as a source of feedback and influence the direction
of problem solving. Elaborative probes are permissible during the retrospective ses-
sion given that this session is designed to provide complementary but secondary
evidence in relation to the problem-solving response processing (see Leighton,
2017b).
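
A toy check can make the probe rule concrete; the heuristic below (flagging "why"/"how"/"what" questions as elaborative) is our own simplification, not a published checklist:

```python
CONCURRENT_PROBES = ["Please keep talking.", "Remember to continue talking."]

def is_elaborative(probe: str) -> bool:
    """Heuristic flag: 'why'/'how'/'what' questions are directive and belong in
    the retrospective session, not the concurrent session."""
    lowered = probe.lower()
    return lowered.startswith(("why", "how", "what")) or "?" in probe

for probe in CONCURRENT_PROBES + ["Why did you decide to draw the diagram?"]:
    session = "retrospective" if is_elaborative(probe) else "concurrent"
    print(f"{session:13s} | {probe}")
```
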
Coding of Verbal Reports from Think Aloud Interviews When verbal reports
are collected as part of a validation project, the integrity of the data alongside
interpretations or observations made from the data must be carefully considered and
verified. For this reason, the coding of verbal reports should follow a rigorous and
standardized process that includes multiple raters and computation of inter-rater
reliability (see Leighton, 2017b for details). First, a coding manual needs to be cre-
ated based on the cognitive model developed for the task of interest. Second, the
coding manual should include the set of knowledge and skills expected and exam-
ples of types of verbalizations that would present as evidence of these knowledge
and skills. Third, at least two raters need to be independently trained to use the
manual to categorize a proportion of the verbalizations of interest (e.g., 15–25%).
Fourth, the raters need to be naïve to the objectives of the think aloud interviews, in
terms of task difficulty, discrimination, potential differential item functioning, etc.
Fifth, the initial agreement between the raters needs to be computed and, if low,
further training undertaken to increase reliability of verbal report interpretation.
Sixth, once inter-rater agreement is acceptable (e.g., Kappa of .60 or greater; see
Landis & Koch, 1977), one rater can proceed to code the remainder of the verbaliza-
tions in the reports.
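
Inter-rater agreement at this step is often summarized with Cohen's kappa. The following minimal sketch computes kappa from two raters' category codes; the example codes are invented for illustration:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    chance = sum((counts1[c] / n) * (counts2[c] / n) for c in categories)
    return (observed - chance) / (1 - chance)

# Invented example: two raters code ten verbalizations into model attributes A1-A3.
r1 = ["A1", "A1", "A2", "A3", "A2", "A1", "A3", "A2", "A1", "A2"]
r2 = ["A1", "A1", "A2", "A3", "A1", "A1", "A3", "A2", "A2", "A2"]
print(round(cohens_kappa(r1, r2), 2))  # about 0.69, above the .60 benchmark
```
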
Generating Inferences About Response Processes Related to Problem
Solving As already noted, when verbal reports become part of the validity argu-
ment for claiming that test-takers are engaging in specific problem solving pro-
cesses, the integrity of verbal report data and observations made about the data must
be verified. Claims made about problem solving processes cannot be made using
any type of interview method (see Ericsson & Simon, 1993; Leighton, 2017a;
Wilson, 1994). Although the think-aloud interview provides a tool for measuring
problem solving, procedural deviations can undermine the data collected and claims
(see Leighton, 2017b for cognitive labs and the target of measurement). Thus, gen-
erating inferences about response processes requires not only collecting verbal
report data but also demonstrating that the procedures used to collect and interpret
those data minimize bias, and are not subject to alternative interpretations and idio-
syncratic conclusions. These issues are elaborated in Leighton (2017b).

Conclusion: Strengths and Weaknesses of Verbal Reports for Validity Arguments

As indicated previously, the Standards (AERA et al., 2014) maintain the need to
include evidence of response processes when generating validity arguments to sup-
port claims about skills, competencies, attitudes, beliefs, etc. that are difficult to
observe or measure directly. While the Standards emphasize the need for evidence
of response processes, the Standards do not describe how this evidence should be
gathered or identify best practices for gathering it. Clearly, it can be
assumed that evidence used to validate claims needs to be sound. The good news is
that there is a solid base of past research on the conditions for gathering this evi-
dence using different interview methods–cognitive interviews and think-aloud to
name the two to which this chapter is devoted–and a growing body of research
specifically in the domain of educational testing and increasingly so in psychologi-
cal assessment, cross-cultural testing, etc.

A fundamental step in including interviews in validation arguments is to be clear
about what type of response processing is being measured using the interview.
Toward this end, at least for think-aloud interviews, it is necessary to identify the knowledge and skills expected before planning the interviews and admin-
istering tasks or items to examinees. Next, the integrity of the data has to be verified.
It bears repeating that the strength of verbal report evidence is contingent on how
the data were collected and how the data are interpreted. Coding manuals, raters,
and checks on rater evaluations are key.
We expect that readers have obtained a clear picture of the main characteristics of CI and think-aloud methods. Even though their origins and theoretical bases are closer than one might expect (both methods rely on the foundations developed by Ericsson and Simon, 1980), their evolution and current practices, as described in this chapter, allow the two methods to be delimited. For example, while in think-aloud the interview probes must be minimal and non-intrusive, CI is in general much more direct and intrusive, requesting elaborations from respondents, encouraging interviewers to look for contradictions in respondents' narratives, etc. Furthermore, the think-aloud interview is focused on the human problem solving domain, whereas CI not only comes from survey research but is also increasingly used in psychological assessment, cross-cultural research, etc.
Within any validity theoretical framework, researchers should be aware that the
most important decisions to be made before collecting verbal reports using the think
aloud interview are determining which response processes the test items are expected to measure and which procedures are appropriate for collecting this
evidence without biasing the data and the subsequent inferences. In contrast, CI
could be conducted from, let us say, a more exploratory perspective to uncover what
questions and scales are capturing.
Verbal report data, regardless of the method used to obtain them, are no different
than any other data; quality rests with the methods used to minimize bias and avoid
idiosyncrasies in interpretation. Both methods require that data are collected accord-
ing to specific procedures. The difference is how think-aloud and CI understand and
deal with such "bias." While think-aloud relies on rigorous and standardized evaluation of inter-rater agreement, CI, as a qualitative method, trusts that transparency establishes credibility and validity throughout all phases. CI researchers should docu-
ment any decision made while conducting the method, especially during CI data
analysis, in order to achieve transparency.

References

American Educational Research Association, American Psychological Association, & National


Council on Measurement in Education [AERA, APA, & NCME]. (1999). Standards for educa-
tional and psychological testing. Washington, DC: American Educational Research Association.
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education [AERA, APA, & NCME] (2014). The standards for
educational and psychological testing. Washington DC: Author.
Beatty, P., & Willis, G. (2007). Research synthesis: The practice of cognitive interviewing. Public
Opinion Quarterly, 71(2), 287–311.
Benítez, I., He, J., van de Vijver, F. J. R., & Padilla, J. L. (2016). Linking extreme response styles
to response processes: A cross-cultural mixed methods approach. International Journal of
Psychology, 51, 464–473.
Benítez, I., & Padilla, J. L. (2014). Analysis of non-equivalent assessments across different linguis-
tic groups using a mixed methods approach: Understanding the causes of differential item
functioning by cognitive interviewing. Journal of Mixed Methods Research, 8, 52–68.
Bloom, B. S., Engerhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of
educational objectives: The classification of educational goals. Handbook I: Cognitive domain.
New York, NY: David McKay.
Castillo, M., & Padilla, J. L. (2013). How cognitive interviewing can provide validity evidence of
the response processes to scale items. Social Indicators Research, 114, 963–975.
Chepp, V., & Gray, C. (2014). Foundations and new directions. In K. Miller, S. Willson, V. Chepp,
& J.  L. Padilla (Eds.), Cognitive interviewing methodology (pp.  7–14). Hoboken, NJ: John
Wiley & Sons.
Cizek, G. J., Rosenberg, S. L., & Koons, H. H. (2007). Sources of validity evidence for educational
and psychological tests. Educational and Psychological Measurement, 68, 397–412.
Collins, D. (2015). Cognitive interviewing practice. London, UK: Sage.
Conrad, F.  G., & Blair, J.  (2009). Sources of error in cognitive interviews. Public Opinion
Quarterly, 73, 32–55.
Embretson, S. (1983). Construct validity: Construct representation versus nomothetic span.
Psychological Bulletin, 93, 179–197.
Ericsson, K.  A., & Simon, H.  A. (1980). Verbal reports as data. Psychological Review, 87,
215–251.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge,
MA: The MIT Press.
Fox, M. C., Ericsson, A., & Best, R. (2011). Do procedures for verbal reporting of thinking have
to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological
Bulletin, 137, 316–344.
Kane, M.  T. (2013). Validation as a pragmatic, scientific activity. Journal of Educational
Measurement, 50, 115–122.
Jabine, T., Straf, M., Tanur, J., & Tourangeau, R. (Eds.). (1984). Cognitive aspects of survey design: Building a bridge between disciplines. Washington, DC: National Academy Press.
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data.
Biometrics, 33, 159–174.
Leahey, T. H. (1992). A history of psychology (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection
of verbal reports in educational achievement testing. Educational Measurement: Issues and
Practice, 23, 6–15.
Leighton, J. P. (2013). Item difficulty and interviewer knowledge effects on the accuracy and con-
sistency of examinee response processes in verbal reports. Applied Measurement in Education,
26, 136–157.
Leighton, J.  P. (2017a). Collecting, analyzing and interpreting verbal response process data. In
K. Ercikan & J. Pellegrino (Eds.), Validation of score meaning in the next generation of assess-
ments. Routledge.
Leighton, J. P. (2017b). Using think aloud interviews and cognitive labs in educational research.
Oxford, UK: Oxford University Press.
Leighton, J. P., Cui, Y., & Cor, M. K. (2009). Testing expert-based and student-based cognitive
models: An application of the attribute hierarchy method and hierarchical consistency index.
Applied Measurement in Education, 22, 229–254.
Loftus, E. (1984). Protocol analysis of response to survey recall questions. In T. Jabine, M. Straf,
J.  Tanur, & R.  Tourangeau (Eds.), Cognitive aspects of survey design: Building a bridge
between disciplines (pp. 61–64). Washington, DC: National Academy Press.
Messick, S. (1990). Validity of test interpretation and use, Research report No. 90–11. Princeton,
NJ: Education Testing Service.
Miller, K. (2011). Cognitive interviewing. In K. Miller, J. Madans, A. Maitland, & G. Willis (Eds.),
Question evaluation methods: Contributing to the science of data quality (pp.  51–75).
New York, NY: Wiley.
Miller, K. (2014). Introduction. In K. Miller, S. Willson, V. Chepp, & J. L. Padilla (Eds.), Cognitive
interviewing methodology (pp. 1–6). New York, NY: Wiley.
Miller, K., Chepp, V., Willson, S., & Padilla, J. L. (Eds.). (2014). Cognitive interviewing methodol-
ogy. New York, NY: Wiley.
Miller, K., Willson, S., Chepp, V., & Ryan, J.  M. (2014). Analyses. In K.  Miller, S.  Willson,
V. Chepp, & J. L. Padilla (Eds.), Cognitive interviewing methodology (pp. 35–50). New York,
NY: Wiley.
Newell, A., & Simon, H.  A. (1972). Human problem solving. Englewood Cliffs, NJ:
Prentice-Hall.
Padilla, J. L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26,
136–144.
Presser, S., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., Rothgeb, J. M., & Singer, E. (2004).
Methods for testing and evaluating survey questions. Public Opinion Quarterly, 68, 109–130.
Ridolfo, H., & Schoua-Glusberg, A. (2011). Analyzing cognitive interview data using the constant
comparative method of analysis to understand cross-cultural patterns in survey data. Field
Methods, 23, 420–438.
Schwarz, N. (2007). Cognitive aspects of survey methodology. Applied Cognitive Psychology, 21,
277–287.
Shear, B. R., & Zumbo, B. D. (2014). What counts as evidence: A review of validity studies in
educational and psychological measurement. In B. D. Zumbo & E. K. H. Chan (Eds.), Validity
and validation in social, behavioral, and health sciences (pp.  91–111). New  York, NY:
Springer.
Shulruf, B., Hattie, J., & Dixon, R. (2008). Factors affecting responses to Likert type question-
naires: Introduction of the ImpExp, a new comprehensive model. Social Psychology of
Education, 11, 59–78.
Sireci, S. G. (2012, April). “De-constructing” test validation. Paper presented at the annual con-
ference of the National Council on Measurement in Education as part of the symposium
“Beyond Consensus: The Changing Face of Validity” (P. Newton, Chair), Vancouver, BC.
Stone, J., & Zumbo, B. D. (2016). Validity as a pragmatist project: A global concern with local
application. In V.  Aryadoust & J.  Fox (Eds.), Trends in language assessment research and
practice (pp. 555–573). Newcastle, UK: Cambridge Scholars.
Tourangeau, R. (1984). Cognitive science and survey methods: A cognitive perspective. In
T.  Jabine, M.  Straf, J.  Tanur, & R.  Tourangeau (Eds.), Cognitive aspects of survey design:
Building a bridge between disciplines (pp.  73–100). Washington, DC: National Academy
Press.
Willis, G. B. (2005). Cognitive interviewing. Thousand Oaks, CA: Sage.
Willis, G. B. (2009). Cognitive interviewing. In P. Lavrakas (Ed.), Encyclopedia of survey research
methods (Vol. 2, pp. 106–109). Thousand Oaks, CA: SAGE.
Willis, G. B. (2015). Analysis of the cognitive interview in questionnaire design. New York, NY:
Oxford University Press.
Willis, G., & Miller, K. (2011). Cross-cultural cognitive interviewing: Seeking comparability and
enhancing understanding. Field Methods, 23, 331–341.
Wilson, T.  D. (1994). The proper protocol: Validity and completeness of verbal reports.
Psychological Science, 5, 249–252.
Willson, S., & Miller, K. (2014). Data collection. In K. Miller, S. Willson, V. Chepp, & J. L. Padilla
(Eds.), Cognitive interviewing methodology (pp. 15–33). New York, NY: Wiley.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications
for validation practice. In R. W. Lissitz (Ed.), The concept of validity (pp. 65–83). Charlotte,
NC: Information Age Publishing, Inc.
Zumbo, B. D., & Shear, B. R. (2011). The concept of validity and some novel validation methods.
In Northeastern Educational Research Association (p. 56). Rocky Hill, CT.
