Cognitive Interviewing and Think Aloud
José-Luis Padilla and Jacqueline P. Leighton
Although a case could be made that the need for explanations of item responses has
been around since the origins of validity theory, the 1999 edition of the Standards
for Educational and Psychological Testing (AERA, APA, & NCME, 1999) can be
considered the official birth certificate of validity evidence based on response pro-
cesses as a source of validity evidence. Previous relevant references can be traced to
a recommendation by Messick (1990) to look at how subjects cope with items and
tasks to identify processes underlying item responses, efforts by Embretson (1983)
linking cognitive psychology to item response theory, or even the earliest definitions
of validity, if we just take an interest in knowing “what the test measures.”
At the same time that professionals, researchers, and testing organizations have been incorporating research on response processes into their test development and evaluation practices, the current Standards (AERA, APA, & NCME, 2014) retains response processes among the five sources of validity evidence. On the downside, the latest Standards does not go beyond the previous edition in indicating how to obtain solid validity evidence based on response processes. Systematic reviews of validation studies reveal that few studies are conducted to obtain validity evidence based on response processes. Cizek, Rosenberg, and Koons (2007) found that validity evidence based on participants’ response processes was examined in only 1.8% of the papers. Zumbo and Shear (Shear & Zumbo, 2014; Zumbo & Shear, 2011) found a higher presence, but still a minority compared with the other sources of validity evidence; for instance, in the medical
outcomes field, only 14% of the validation studies were aimed at obtaining evidence
of the response processes.
The lack of experience, of consolidated best practices, and of recommendations on how to obtain evidence of response processes can lead researchers to miss the opportunities offered by new conceptual and methodological developments in validity theory and validation methods. Among the various validation methods that can provide evi-
dence of response processes, this chapter is devoted to cognitive interviewing (CI)
and think aloud methods.
The target audience of this chapter is professionals and researchers looking for
methodological guidance to perform validation studies by using CI and think aloud
methods. In the chapter, we (a) describe the state-of-the-art in conducting think
aloud and CI studies, (b) describe similarities and differences between the methods,
and (c) demonstrate how both methods can provide validity evidence of response
processes.
CI and think aloud methods are described in the context of educational and psy-
chological testing. Both methods are often applied in survey research too, mainly as
pre-testing methods to fix problems and improve survey questions. In fact, as we discuss in the following sections, both methods have common origins and are not as distant in their development as it might seem. We intend to provide arguments for distinguishing between the two methods to help researchers make informed decisions about which method is more useful given the aims of the validation study.
We think that such validity evidence can be understood within frameworks ranging from a de-constructed view of validity (e.g., Kane, 2013; Sireci, 2012) to a more contextualized and pragmatic explanation framework of validity (Stone & Zumbo, 2016; Zumbo, 2009). Throughout the chapter, we also present studies that illustrate these ideas and show how to apply think aloud and CI methods.
CI History and Overview
Before starting with a short history of the CI method, we should present a definition
and a clear description of how the method is usually applied. The need for a defini-
tion is evident given that the term is also common in fields far from educational
testing and psychological assessment, like law enforcement, where the ‘cognitive interview’ is a technique used by police to enhance eyewitness recall. What is more, CI emerged as a question evaluation method in the survey research field. Therefore, readers should take care when translating definitions of CI into the educational testing and psychological assessment context.
Although there is no universally accepted definition of CI, a wide consensus
exists about what Beatty and Willis (2007) think CI involves: “the administration of
draft survey questions while collecting additional verbal information about the sur-
vey responses, which is used to evaluate the quality of the response or to help deter-
mine whether the question is generating the information that its author intends”
(p. 287). The first task for readers is to change ‘survey question’ to ‘test items’ or
‘scale items’. A couple of years later, Willis (2009) stated that CI “… is a
psychologically-oriented method for empirically studying the way in which indi-
viduals mentally process and respond to survey questionnaires. Cognitive inter-
views can be conducted for the general purpose of enhancing our understanding of
how respondents carry out the task of answering survey questions” (p. 106).
Highlighting the core elements in both definitions of CI allows us to recognize
potential benefits from CI in validation studies of test score interpretations: (a) CI is
a psychologically-oriented method for investigating respondents’ mental processes
while answering test and scale items, (b) CI data can be useful for examining the
quality of item responses, and (c) CI can help determine whether items are captur-
ing the intended behaviors. The next section presents studies that illustrate these
core elements.
Commonly, CI pre-testing evaluation studies in survey research consist of con-
ducting in-depth interviews following an interview protocol with a small, purposive
sample of 10–30 respondents. First, respondents answer the target survey questions;
that is, the questions to be pre-tested, and then they respond to a series of follow-up
probes that vary from general and open probes, like “What were you thinking?” or
“How did you come up with that?” to much more scripted and specific follow-up
probes, such as “What does the term/word (...) mean to you?” or “How did you cal-
culate (…)?” Problems with the ‘question-and-answer’ process are usually identi-
fied and analyzed from the respondents’ narratives in the cognitive interviews.
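For readers who want to operationalize this flow in their own study materials, the sketch below (ours, not part of any published CI protocol) represents a probe script for one target item as a small Python structure; the probe wording is taken from the examples above, while the item placeholder and the names of the fields and function are purely illustrative.

```python
# Illustrative sketch only: a probe script for one target item, moving from
# general, open probes to more scripted, specific follow-up probes.
probe_script = {
    "target_item": "<draft test or scale item administered first>",  # placeholder
    "general_probes": [
        "What were you thinking?",
        "How did you come up with that?",
    ],
    "specific_probes": [
        "What does the term/word (...) mean to you?",
        "How did you calculate (...)?",
    ],
}

def interview_steps(script):
    """Yield the ordered steps of one cognitive interview for a single item."""
    yield ("administer", script["target_item"])
    for probe in script["general_probes"] + script["specific_probes"]:
        yield ("probe", probe)

for kind, text in interview_steps(probe_script):
    print(f"{kind}: {text}")
```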
As Miller (2014) points out, CI, by asking respondents to describe how and why
they answered survey questions as they did, provides evidence not just to fix ques-
tions but also to find out the ways respondents interpret questions and apply them to
their own lives, experiences, and perceptions. Miller’s (2014) interpretative view of CI methodology from survey research coincides more than one might expect with the broadest conceptions of validity theory in educational and psychological testing. For exam-
ple, the contextualized and pragmatic explanation validity framework (Zumbo,
2009) expands opportunities for CI as a validation method to obtain evidence of
response processes and to examine equivalence and sources of bias in cross-cultural
research (Benítez & Padilla, 2014). However, we need to briefly summarize the
evolution of CI before going into details of that proposal.
Almost all manuals and introductory articles on the CI method point to the
Cognitive Aspects of Survey Methodology (CASM) conference (Jabine, Straf,
Tanur, & Tourangeau, 1984) as a critical event in the history of CI. Presser et al.
(2004) also identified the influential contribution of Loftus’ (1984) post-conference analysis of how respondents answer questions about past events. That analysis relied on the think-aloud technique for studying problem solving developed by Ericsson and Simon (1980). So influential was Loftus’ (1984) work
that, since then, the think-aloud technique has been closely linked to CI either as a
theoretical basis for the CI method (e.g., Willis, 2005), or as a data collection pro-
cedure along with verbal probing to conduct CI (e.g., Beatty & Willis, 2007).
After the CASM conference, and relying heavily on cognitive theory, cognitive
laboratories devoted to testing and evaluating survey questions were established
first at several U.S. federal agencies (e.g., the National Center for Health Statistics,
the U.S. Census Bureau), and then at Statistics Canada and equivalent official sta-
tistics institutes like Statistics Netherlands, Statistics Sweden, etc. (Presser et al.,
2004). As we discuss in the next section, the role of federal agencies and official
statistics institutes can explain why CI methodology is still mainly seen as a pre-
testing method aimed at fixing problems with questions to reduce response errors
(Willis, 2005). Federal agencies and official statistics institutes have shaped CI
methodology in survey research similarly to the way that testing companies have
modeled research on item bias and differential item functioning (DIF).
Nowadays, CI practitioners and researchers build on the advances that the CASM conference brought to the study of measurement error in survey research. The CASM movement established the idea that respondents’ thought processes must be understood to assess validity (Schwarz, 2007). Later, the inclusion of motivational
elements to information-processing perspectives produced a major evolution. For
example, Krosnick (1999) introduced the construct of “satisficing” to account for
the tendency of most respondents to choose the first satisfactory or acceptable
response option rather than options reflecting full cognitive effort. More comprehensive models of the question-and-answer process are emerging that take contextual, social, and cultural elements into account, support the rationale behind the method, and expand the range of validation research questions that could be addressed by CI
(e.g., Shulruf, Hattie, & Dixon, 2008).
CI Approaches and Theories
From the short introduction above to CI, it should be clear that CI is a qualitative
method used to examine the question-and-answer process carried out by respon-
dents when answering survey questions. Even though distinguishing between dif-
ferent purposes for conducting CI in survey research can be difficult, such a division
can help us find out the ways in which CI can provide evidence of response pro-
cesses for validation studies in testing and psychological assessment. Willis (2015)
differentiates between two apparently contrasting objectives: reparative vs. descrip-
tive cognitive interviews. With slight changes in the labels, the distinction can be
easily found in the literature when the purpose of CI is under debate (e.g., Chepp &
Gray, 2014; Miller, 2011). The reparative approach corresponds to the original need
for identifying problems in survey questions and repairing them. Traditionally, in cognitive laboratories and official statistics institutes, CI projects have been designed to answer what are, so to speak, quantitative questions within a qualitative method: “How many problems does the target question have?” or “What percentage of CI participants reveal such a problem?” In contrast, the descriptive approach represents
CI projects whose aims are to find out how respondents mentally process and answer
survey questions instead of just uncovering response errors. Advocates of this
approach argue that CI should be planned to discover what a survey question is truly
capturing, that is, how survey questions function as measures of a particular construct (e.g., Chepp & Gray, 2014; Ridolfo & Schoua-Glusberg, 2011).
The descriptive approach is in line with our proposal to rely on the CI method to
obtain validity evidence related to response processes associated with test and scale
items. There is a solid argument for the parallelism between the more comprehen-
sive objective of discovering what the survey question is truly capturing and the
2014 Standards definition for validity evidence of response processes: “Some con-
struct interpretations involve more or less explicit assumptions about the cognitive
process engaged in by test takers. Theoretical and empirical analysis of the response
processes of test takers can provide evidence concerning the fit between the con-
struct and the detailed nature of the performance or response actually engaged in by
test takers” (p. 15). The Standards’ subsequent suggestion to question test takers about their performance strategies or responses to particular items opens the door to applying
CI methodology from a descriptive approach to obtain validity evidence of response
processes.
The question now is whether there is a theory to support CI methodology. Willis (2015)
proposes to distinguish between what he calls a theory of the phenomenon, that is,
how people respond to survey questions, and a theory of the method, a theory that
supports the use of CI to test and investigate survey response processes. Starting
with the theory of the phenomenon, the CASM view, as expressed by the four-stage cognitive model of Tourangeau (1984), has been and still is the most
cited cognitive theoretical framework of response processes to survey questions.
The model presents a linear sequence from when the survey questions are presented
to the respondent to the selection of a response: (a) comprehension of the question,
(b) retrieval of relevant information, (c) judgment/estimation processes, and (d)
response. More recently, elements of disciplines like linguistics, anthropology, or
sociology have been incorporated to account for the effects of contextual, social, and cultural factors on response processes (e.g., Chepp & Gray, 2014).
Regarding the theory of the method, Willis (2015) holds that CI still relies on Ericsson and Simon’s (1980) defense of think-aloud interviews as a way of gaining access to the functioning of cognitive processes. For Willis (2015), the idea that a person’s spontaneous verbalization of his or her thoughts provides a ‘window into the mind’ remains the theoretical basis for CI, which blurs the borders between think aloud and CI methods and explains why the CI method is sometimes referred to as ‘think-aloud interviews’. Given the lack of empirical evidence on the veracity of verbal reports elicited by CI, current contributions from other social science disciplines (e.g., ethnography, sociology), and the growing application of CI in cross-cultural research, CI is starting to be viewed as a qualitative method and something more than just ‘cognitive’ (e.g., Willis & Miller, 2011).
Among the qualitative approaches to CI, one of the most promising is the interpretative perspective within the framework of Grounded Theory (e.g., Ridolfo & Schoua-Glusberg, 2011). The rationale behind this approach is the production of a full range of themes in CI data and the need to study the CI topic (in this case, the response processes to a survey question) until saturation is reached. Briefly, from an interpretative perspective, the focus is on what the item means to the respondent, and that meaning is socially constructed by the respondent at a particular moment and in a particular social location. A detailed treatment of the interpretative perspective, in the context of CI,
can be found in Chepp and Gray (2014). Miller, Willson, Chepp, and Padilla (2014)
present an exhaustive description of the main phases and aspects of CI methodology
from an interpretative perspective. The next section of the chapter presents examples
of studies conducting CI as a validation method in the context of educational testing
and psychological assessment from an interpretative perspective.
CI fits into different stages of an overall validation project and can be integrated with other validation methods. Supporting test uses or the propositions involved in a validity argument can require multiple strands of quantitative or qualitative validity
evidence. For example, Castillo and Padilla (2013) conducted CI to interpret differ-
ences in the factor structure of a psychological scale intended to measure the con-
struct of family support. Therefore, the integration of different validation methods,
among them the CI method, should be addressed in a systematic way from the
beginning of the validation project. A mixed methods research framework, intro-
duced by Padilla and Benítez (in this book), offers a path to reaching such
integration.
Planning CI In contrast to CI practice in survey research, in which single survey questions are the “target,” we intend to obtain evidence of response processes for multi-item tests or scales. Of course, researchers can focus on particular items, but test takers respond to tests and scales as a whole. Conrad and Blair (2009) stated the conditions under which CI can provide evidence of non-automatic processing of scale items. To sum up those conditions: test takers should be aware of their response processes and able to communicate them during the interview. Planning CI involves taking care of many procedural issues. Next, we address the most important aspects of planning in the context of educational and psychological testing.
Developing the Interview Protocol A movie script can come to the reader’s mind as
an example of an interview ‘protocol’. At the end of the day, a CI is an interview
with two main characters: interviewer and respondent. To some extent, the compari-
son conveys the key role of the interview protocol. The protocol consists of the introduction of the study to the respondents (e.g., statements of the research aims, main topics, responsible organization, confidentiality), information about the expected role of the respondent, and the probes. However, as a validation method, the interview protocol is much more than a script. The content, structure, and even the length of the interview protocol reflect the researcher’s approach to the CI method. Opting for a reparative versus a descriptive approach to CI leads to very different interview protocols. A CI study from an interpretative approach develops an interview protocol that allows researchers to capture the socially constructed meaning of the items for the respondent, whereas, from a problem-solving perspective, the protocol is intended to facilitate the question evaluation task. Table 12.1 outlines the bi-directional conditioning effects between the roles of the respondents and interviewers and the kinds of probes most often included in the interview protocol.
Willson and Miller (2014) presented what we can call two oppositions that characterize the expected roles of the respondents and the interviewers and condition the kinds of probes included in the interview protocol. Respondents act as ‘evaluators’ when they are asked to evaluate parts of the question (stem, response options) or their own cognitive processes, whereas as ‘storytellers’ “they are asked to generate a narrative that depicts ‘why they answered the question in the way that they did’” (Willson & Miller, 2014, p. 26). The second opposition, parallel to the first, concerns the expected role of the interviewer as a ‘data collector’ or as a ‘researcher’.
If the interviewer is instructed to ask the same probes in the same way to every
respondent, we have data collectors who do their best to avoid interviewer biases and
preserve CI data accuracy. In contrast, the interviewer is a qualitative researcher
when they “assess the information that he or she is collecting and examine the
emerging information to identify any gaps, contradictions, or incongruences in the
respondent’s narrative” (Willson & Miller, 2014, p. 30). In this case, the interview
protocol is open to what Willis (2005) called spontaneous or free-form probes to
help interviewers lead the interview.
Benítez, He, van de Vijver, and Padilla (2016) conducted a CI study to obtain valid-
ity evidence of the response processes to some quality-of-life questions and scale
items used in international studies, comparing Spanish and Dutch respondents.
Table 12.2 presents a sample of the interview protocol for questions intended to
capture how important aspects like family, work, friends, etc., are for participants.
The sample includes a general probe and two specific probes. Interviewers were
instructed to resort to the specific follow-up probes when interviewees’ comments
did not provide a full narrative of what items meant for them and how they had
constructed their responses.
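Because Table 12.2 is not reproduced here, the probe wording below is a hypothetical stand-in; the sketch only illustrates the conditional logic just described, in which interviewers fall back on the specific follow-up probes when the respondent’s comments do not yield a full narrative.

```python
# Hypothetical protocol entry mirroring the structure described above:
# one general probe plus two specific follow-up probes.
protocol_entry = {
    "general_probe": "<general probe about how important this aspect is and why>",
    "specific_probes": [
        "<probe about what the item term meant to the respondent>",
        "<probe about how the respondent constructed the answer>",
    ],
}

def probes_to_ask(narrative_is_complete: bool, entry: dict) -> list:
    """Apply the conditional rule: use the specific probes only when the
    respondent's narrative is incomplete."""
    probes = [entry["general_probe"]]
    if not narrative_is_complete:
        probes.extend(entry["specific_probes"])
    return probes

print(probes_to_ask(narrative_is_complete=False, entry=protocol_entry))
```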
The books by Willis (2005) and Miller, Willson, et al. (2014) provide detailed descriptions of the different kinds of probes and how they determine not just interviewer and respondent roles but also, inevitably, the CI data analysis.
Recruitment How many interviews to conduct and who the respondents should be are perennial concerns when researchers decide to conduct a CI validation study. Researchers should not forget that CI is a qualitative method. Thus, sampling is not primarily a numerical matter but a purposive one. Learning from the survey research field, we can base sampling on demographic diversity or on the topics covered by the items.
Benítez et al. (2016) followed the interpretative approach to analyze CI data obtained to compare response processes to quality-of-life questions and scale items between Spanish and Dutch respondents. For example, the researchers found different patterns of interpretation of the family concept: in contrast to Dutch participants, Spanish participants included within the family concept not just the immediate family but also relatives and friends.
The think aloud interview is a psychological method used to collect data about
human information processing, namely, problem solving. Problem solving has been
defined as the goal-driven process of finding a solution to a complex state of affairs
(Newell & Simon, 1972). Problem solving requires the manipulation of information
to create something new and, therefore, is normally involved in higher-level skills
found in Bloom’s taxonomy (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956).
The think-aloud interview can be a useful tool in determining whether test items or
tasks elicit problem-solving processes. The think-aloud interview technique needs
to be distinguished from cognitive labs, which are used to measure a wider array of
response processes, especially comprehension (see Leighton, 2017a, 2017b).
Cognitive labs are not the focus of this section and are not discussed further.
The think-aloud interview has historical roots in experimental self-observation,
a method used by Wilhelm Wundt (1832–1920) to systematically document the
mental experiences of trained human participants in response to a variety of sensory stimuli.
Unlike introspection, experimental self-observation was standardized to provide a
structured account of the unobservable but systematic human mental experience.
However, beginning in the 1920s, behaviorism became the dominant paradigm for
studying psychological phenomena and only observable behavior was viewed as
worthy of measurement. In the 1950s, the cognitive revolution, instigated by scholars such as Noam Chomsky and psychologists such as George Miller, Allen Newell, Jean Piaget, and Herbert Simon, effectively replaced behaviorism as the dominant paradigm, and methods for scientifically studying mental experiences as accounts of human behavior again became a focus of interest (Leahey, 1992).
The think aloud interview as it is currently conceived was developed by two
cognitive scientists, K. Anders Ericsson and Herbert Simon. In 1993, Ericsson and
Simon wrote their seminal book Protocol Analysis: Verbal Reports as Data based
upon a decade of their own research into the scientific study of human mental pro-
cessing (e.g., Ericsson & Simon, 1980) and a review of previous research that was
focused on the study of human mental processing. The 1993 book continues to be
the major reference in the field.
There are normally two sessions or parts to include in the think-aloud interview: the
concurrent session and the retrospective session. Both involve unique interview
probes. The details of these have been elaborated at length in past publications (e.g.,
Ericsson & Simon, 1993; see also Leighton, 2004, 2013, 2017b for instructions),
but a summary bears repeating here. First, the concurrent session of the interview is
most important and characterized by requesting the participant (or examinee) to
verbalize his or her thoughts aloud in response to a problem-solving task. The
objective is to have the participant (or examinee) solve the task and simultaneously
verbalize the mental processes being used, in real time, to solve it. During this part
of the interview, the interviewer should not interrupt with any questions (e.g., Can
you elaborate on why you are drawing a diagram to solve the problem?) that would
disrupt the flow of problem solving and thus verbalization or lead the participant to
consider a distinct problem solving route (e.g., Why not consider a diagram in solv-
ing the problem?) not previously contemplated. The only probes the interviewer
should use during this session are non-directed reminders to the examinee to verbal-
ize thoughts as he or she is solving the problem. For example, permissible non-directive
probes would include, “Please keep talking” or “Please remember to verbalize.”
The interviewer should avoid directive probes such as “What are you thinking?”
because this probe is a question that takes focus away from the task and requires the
examinee to respond to the interviewer. If these protocol or procedural details seem
overly specified, it is deliberate. True-to-life problem-solving processes are not nec-
essarily robust to measurement, meaning that they are difficult to measure accu-
rately. This is because these processes take place in working memory and the
contents of working memory are fleeting (see Ericsson & Simon, 1993). The data
produced from this concurrent phase comprise a verbal report.
The second part of the think aloud interview is the retrospective session, and it is
secondary in importance. It is characterized by having the examinee recount how he
or she solved the problem-solving task. The retrospective session follows directly
after the concurrent session and is initiated by the interviewer asking the examinee, “Please tell me how you remember solving the task.” During the ses-
sion, the interviewer may ask for elaboration and explanation of how the examinee
remembers solving the task (e.g., Why did you decide to draw the diagram?). These
elaborative questions are designed to help contextualize the verbal report the exam-
inee provided during the concurrent session. The verbalizations an examinee pro-
vides during the retrospective session are not considered to be the primary evidence
for supporting claims about problem solving (see Ericsson & Simon, 1993). This is
because the retrospective session relies heavily on an examinee’s memory and does
not capture the problem-solving process in vivo. One of the main weaknesses of
verbal reports as evidence of problem-solving processes is the failure to follow pro-
tocol, namely, failing to properly collect the reports during the concurrent session of the inter-
view (see Fox et al., 2011; Leighton, 2004, 2013; Wilson, 1994). These failures will
undermine the utility of verbal reports in validity arguments.
There are five phases for conducting think aloud interviews. The phases include: (1)
cognitive model development; (2) instructions; (3) data collection using concurrent
and retrospective probes; (4) coding of verbal reports using protocol analysis; and
(5) generating inferences about participants’ response processes based on the data.
Each of these phases involves specific methods or procedures. It is beyond the scope
of the chapter to delve into these details, but interested readers are referred to
Leighton (2017b) for a fuller exposition. At this point, it is important to repeat that
the phases of the think-aloud interview differ from those used in ‘cognitive labs’, a variant of the think-aloud method that is used to measure comprehension rather than problem solving (the reader is again referred to Leighton, 2017b for a full exposition of the differences between think aloud interviews and cognitive labs). In this section, the main phases of the think aloud are summarized, with a brief presentation of procedural issues and examples.
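Although the chapter does not prescribe any particular analytic tooling for phase 4, one common practice is for two or more raters to code segments of each verbal report against the categories implied by the cognitive model and then to check coder agreement with a chance-corrected index such as Cohen’s kappa (interpreted, for instance, with the Landis and Koch, 1977, benchmarks listed in the references). A minimal sketch, using entirely hypothetical segment codes, follows before we turn to the first phase below.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders over the same segments."""
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two raters to ten verbal-report segments,
# using made-up category labels from an assumed coding scheme.
rater_1 = ["parse", "set_up", "set_up", "solve", "check",
           "parse", "set_up", "solve", "solve", "check"]
rater_2 = ["parse", "set_up", "solve", "solve", "check",
           "parse", "set_up", "solve", "check", "check"]

print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.73, "substantial" per Landis & Koch
```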
Cognitive Model Development Think-aloud interviews can yield a significant
amount of verbal report data to analyze. Often researchers can become overwhelmed
by the extent of the report data and unsure of what to focus on and evaluate as evidence of
response processes. This is one reason why the first step in conducting a think-aloud
is to develop a cognitive model, or some type of flowchart that outlines the knowl-
edge and skills expected to underlie performance.
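As a rough illustration only (the chapter does not prescribe any particular representation), the flowchart of knowledge and skills mentioned above might be sketched as a small prerequisite graph, so that coded verbalizations can later be checked against the expected ordering; the item, skill names, and helper functions here are entirely hypothetical.

```python
# Hypothetical cognitive model for a single algebra item, expressed as a
# prerequisite graph: each skill maps to the skills assumed to precede it.
cognitive_model = {
    "parse_problem": [],
    "set_up_equation": ["parse_problem"],
    "solve_equation": ["set_up_equation"],
    "verify_solution": ["solve_equation"],
}

def expected_order(model):
    """Return the skills in an order consistent with the prerequisite structure
    (assumes the graph has no cycles)."""
    ordered, placed = [], set()
    while len(ordered) < len(model):
        for skill, prereqs in model.items():
            if skill not in placed and all(p in placed for p in prereqs):
                ordered.append(skill)
                placed.add(skill)
    return ordered

def consistent_with_model(observed_codes, model):
    """Check whether coded verbalizations respect the expected skill ordering."""
    rank = {skill: i for i, skill in enumerate(expected_order(model))}
    positions = [rank[code] for code in observed_codes if code in rank]
    return positions == sorted(positions)

# Hypothetical coded segments from one examinee's concurrent verbal report.
report_codes = ["parse_problem", "set_up_equation", "solve_equation", "verify_solution"]
print(consistent_with_model(report_codes, cognitive_model))  # True
```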
As indicated previously, the Standards (AERA et al., 2014) maintain the need to
include evidence of response processes when generating validity arguments to sup-
port claims about skills, competencies, attitudes, beliefs, etc. that are difficult to
observe or measure directly. While the Standards emphasize the need for evidence
of response processes, the Standards do not describe how this evidence should be gathered or the best practices for gathering it. Clearly, it can be assumed that evidence used to validate claims needs to be sound. The good news is that there is a solid base of past research on the conditions for gathering this evidence using different interview methods (cognitive interviews and think-aloud, to name the two to which this chapter is devoted) and a growing body of research
specifically in the domain of educational testing and increasingly so in psychologi-
cal assessment, cross-cultural testing, etc.
References
Beatty, P., & Willis, G. (2007). Research synthesis: The practice of cognitive interviewing. Public
Opinion Quarterly, 71(2), 287–311.
Benítez, I., He, J., van de Vijver, F. J. R., & Padilla, J. L. (2016). Linking extreme response styles
to response processes: A cross-cultural mixed methods approach. International Journal of
Psychology, 51, 464–473.
Benítez, I., & Padilla, J. L. (2014). Analysis of non-equivalent assessments across different linguis-
tic groups using a mixed methods approach: Understanding the causes of differential item
functioning by cognitive interviewing. Journal of Mixed Methods Research, 8, 52–68.
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of
educational objectives: The classification of educational goals. Handbook I: Cognitive domain.
New York, NY: David McKay.
Castillo, M., & Padilla, J. L. (2013). How cognitive interviewing can provide validity evidence of
the response processes to scale items. Social Indicators Research, 114, 963–975.
Chepp, V., & Gray, C. (2014). Foundations and new directions. In K. Miller, S. Willson, V. Chepp,
& J. L. Padilla (Eds.), Cognitive interviewing methodology (pp. 7–14). New York, NY: Wiley.
Cizek, G. J., Rosenberg, S. L., & Koons, H. H. (2007). Sources of validity evidence for educational
and psychological tests. Educational and Psychological Measurement, 68, 397–412.
Collins, D. (2015). Cognitive interviewing practice. London, UK: Sage.
Conrad, F. G., & Blair, J. (2009). Sources of error in cognitive interviews. Public Opinion
Quarterly, 73, 32–55.
Embretson, S. (1983). Construct validity: Construct representation versus nomothetic span.
Psychological Bulletin, 93, 179–197.
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87,
215–251.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge,
MA: The MIT Press.
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of thinking have
to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological
Bulletin, 137, 316–344.
Kane, M. T. (2013). Validation as a pragmatic, scientific activity. Journal of Educational
Measurement, 50, 115–122.
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.
Jabine, T., Straf, M., Tanur, J., & Tourangeau, R. (Eds.). (1984). Cognitive aspects of survey
design: Building a bridge between disciplines. Washington, DC: National Academy Press.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data.
Biometrics, 33, 159–174.
Leahey, T. H. (1992). A history of psychology (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Leighton, J. P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection
of verbal reports in educational achievement testing. Educational Measurement: Issues and
Practice, 23, 6–15.
Leighton, J. P. (2013). Item difficulty and interviewer knowledge effects on the accuracy and con-
sistency of examinee response processes in verbal reports. Applied Measurement in Education,
26, 136–157.
Leighton, J. P. (2017a). Collecting, analyzing and interpreting verbal response process data. In
K. Ercikan & J. Pellegrino (Eds.), Validation of score meaning for the next generation of assessments. New York, NY: Routledge.
Leighton, J. P. (2017b). Using think aloud interviews and cognitive labs in educational research.
Oxford, UK: Oxford University Press.
Leighton, J. P., Cui, Y., & Cor, M. K. (2009). Testing expert-based and student-based cognitive
models: An application of the attribute hierarchy method and hierarchical consistency index.
Applied Measurement in Education, 22, 229–254.
Loftus, E. (1984). Protocol analysis of response to survey recall questions. In T. Jabine, M. Straf,
J. Tanur, & R. Tourangeau (Eds.), Cognitive aspects of survey design: Building a bridge
between disciplines (pp. 61–64). Washington, DC: National Academy Press.
Messick, S. (1990). Validity of test interpretation and use (Research Report No. 90–11). Princeton, NJ: Educational Testing Service.
Miller, K. (2011). Cognitive interviewing. In K. Miller, J. Madans, A. Maitland, & G. Willis (Eds.),
Question evaluation methods: Contributing to the science of data quality (pp. 51–75).
New York, NY: Wiley.
Miller, K. (2014). Introduction. In K. Miller, S. Willson, V. Chepp, & J. L. Padilla (Eds.), Cognitive
interviewing methodology (pp. 1–6). New York, NY: Wiley.
Miller, K., Willson, S., Chepp, V., & Padilla, J. L. (Eds.). (2014). Cognitive interviewing methodology. New York, NY: Wiley.
Miller, K., Willson, S., Chepp, V., & Ryan, J. M. (2014). Analyses. In K. Miller, S. Willson,
V. Chepp, & J. L. Padilla (Eds.), Cognitive interviewing methodology (pp. 35–50). New York,
NY: Wiley.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ:
Prentice-Hall.
Padilla, J. L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26,
136–144.
Presser, S., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., Rothgeb, J. M., & Singer, E. (2004).
Methods for testing and evaluating survey questions. Public Opinion Quarterly, 68, 109–130.
Ridolfo, H., & Schoua-Glusberg, A. (2011). Analyzing cognitive interview data using the constant
comparative method of analysis to understand cross-cultural patterns in survey data. Field
Methods, 23, 420–438.
Schwarz, N. (2007). Cognitive aspects of survey methodology. Applied Cognitive Psychology, 21,
277–287.
Shear, B. R., & Zumbo, B. D. (2014). What counts as evidence: A review of validity studies in
educational and psychological measurement. In B. D. Zumbo & E. K. H. Chan (Eds.), Validity
and validation in social, behavioral, and health sciences (pp. 91–111). New York, NY:
Springer.
Shulruf, B., Hattie, J., & Dixon, R. (2008). Factors affecting responses to Likert type question-
naires: Introduction of the ImpExp, a new comprehensive model. Social Psychology of
Education, 11, 59–78.
Sireci, S. G. (2012, April). “De-constructing” test validation. Paper presented at the annual con-
ference of the National Council on Measurement in Education as part of the symposium
“Beyond Consensus: The Changing Face of Validity” (P. Newton, Chair), Vancouver, BC.
Stone, J., & Zumbo, B. D. (2016). Validity as a pragmatist project: A global concern with local
application. In V. Aryadoust & J. Fox (Eds.), Trends in language assessment research and
practice (pp. 555–573). Newcastle, UK: Cambridge Scholars.
Tourangeau, R. (1984). Cognitive science and survey methods: A cognitive perspective. In
T. Jabine, M. Straf, J. Tanur, & R. Tourangeau (Eds.), Cognitive aspects of survey design:
Building a bridge between disciplines (pp. 73–100). Washington, DC: National Academy
Press.
Willis, G. B. (2005). Cognitive interviewing. Thousand Oaks, CA: Sage.
Willis, G. B. (2009). Cognitive interviewing. In P. Lavrakas (Ed.), Encyclopedia of survey research
methods (Vol. 2, pp. 106–109). Thousand Oaks, CA: Sage.
Willis, G. B. (2015). Analysis of the cognitive interview in questionnaire design. New York, NY:
Oxford University Press.
Willis, G., & Miller, K. (2011). Cross-cultural cognitive interviewing: Seeking comparability and
enhancing understanding. Field Methods, 23, 331–341.
Wilson, T. D. (1994). The proper protocol: Validity and completeness of verbal reports.
Psychological Science, 5, 249–252.
Willson, S., & Miller, K. (2014). Data collection. In K. Miller, S. Willson, V. Chepp, & J. L. Padilla
(Eds.), Cognitive interviewing methodology (pp. 15–33). New York, NY: Wiley.
Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications
for validation practice. In R. W. Lissitz (Ed.), The concept of validity (pp. 65–83). Charlotte,
NC: Information Age Publishing, Inc.
Zumbo, B. D., & Shear, B. R. (2011). The concept of validity and some novel validation methods.
In Northeastern Educational Research Association (p. 56). Rocky Hill, CT.