Evaluacion Aparaxia PDF
Tutorial
Background: With respect to the clinical criteria for diagnosing childhood apraxia of speech (commonly defined as a disorder of speech motor planning and/or programming), research has made important progress in recent years. Three segmental and suprasegmental speech characteristics (error inconsistency, lengthened and disrupted coarticulation, and inappropriate prosody) have gained wide acceptance in the literature for purposes of participant selection. However, little research has sought to empirically test the diagnostic validity of these features. One major obstacle to such empirical study is the fact that none of these features is stated in operationalized terms.
Purpose: This tutorial provides a structured overview of perceptual, acoustic, and articulatory measurement procedures that have been used or could be used to operationalize and assess these 3 core characteristics. Methodological details are reviewed for each procedure, along with a short overview of research results reported in the literature.
Conclusion: The 3 types of measurement procedures should be seen as complementary. Some characteristics are better suited to be described at the perceptual level (especially phonemic errors and prosody), others at the acoustic level (especially phonetic distortions, coarticulation, and prosody), and still others at the kinematic level (especially coarticulation, stability, and gestural coordination). The type of data collected determines, to a large extent, the interpretation that can be given regarding the underlying deficit. Comprehensive studies are needed that include more than 1 diagnostic feature and more than 1 type of measurement procedure.
a Utrecht Institute of Linguistics-OTS, Utrecht University, the Netherlands
b Oral Dynamics Laboratory, Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
c Department of Communication Sciences and Disorders, Temple University, Philadelphia, PA
d Department of Communicative Disorders and Sciences, University at Buffalo, NY
e Moss Rehabilitation Research Institute, Moss Rehabilitation Hospital, Elkins Park, PA
f HAN University of Applied Sciences, Nijmegen, the Netherlands
g Department of Rehabilitation, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
h Center for Language and Cognition, Research School of Behavioral and Cognitive Neurosciences, University of Groningen, The Netherlands
Correspondence to Hayo Terband: [email protected]
Editor-in-Chief: Julie Liss
Received May 11, 2019
Accepted May 18, 2019
https://doi.org/10.1044/2019_JSLHR-S-CSMC7-19-0214
Publisher Note: This article is part of the Special Issue: Select Papers From the 7th International Conference on Speech Motor Control.
Disclosure: The authors have declared that no competing interests existed at the time of publication.
Journal of Speech, Language, and Hearing Research • Vol. 62 • 2999–3032 • August 2019 • Copyright © 2019 American Speech-Language-Hearing Association

From a historical perspective, childhood apraxia of speech (CAS) is a controversial clinical entity, with respect to both clinical signs and underlying deficit. In 1981, Guyette and Diedrich had concluded that “…No pathognomonic symptoms or necessary and sufficient conditions were found for the diagnosis…” (p. 44) and critically termed CAS as “a label in search of a population” (p. 39). Despite clinical studies to further characterize CAS (e.g., Aram & Horwitz, 1983; Ekelman & Aram, 1984; Marion, Sussman, & Marquardt, 1993; Pollock & Hall, 1991; B. Smith, Marquardt, Cannito, & Davis, 1994; Walton & Pollock, 1993), this situation had not changed much by the time of 1994, when Shriberg (1994) concluded that development in this field was moving endlessly sideways.

Since then, a large body of research has been dedicated to characterize the speech impairment and underlying functional and neuromotor deficit of CAS, and this endeavor has been successful in some respects. There is an agreement that, from a functional point of view, CAS is a disorder of motor planning and/or motor programming (American Speech-Language-Hearing Association [ASHA], 2007) or,
in other words, an inability to transform an abstract phonological code into motor speech commands (cf. Maassen, Nijland, & Terband, 2010). More specifically, ASHA defined CAS as “a neurological childhood (pediatric) speech sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits (e.g., abnormal reflexes, abnormal tone)…. The core impairment in planning and/or programming spatiotemporal parameters of movement sequences results in errors in speech sound production and prosody.” (ASHA, 2007, pp. 3–4). Since then, this definition has been adopted widely in the CAS research literature (e.g., Grigos & Kolenda, 2010; Iuzzini-Seigel, Hogan, Guarino, & Green, 2015; Maas & Farinella, 2012; Murray, McCabe, Heard, & Ballard, 2015; Namasivayam et al., 2015; Preston et al., 2014; Terband, Maassen, Guenther, & Brumberg, 2009, 2014).

With respect to the clinical criteria for diagnosing CAS, research has also made important progress in recent years. Although ASHA (2007, p. 4) noted that “there is no validated list of diagnostic features of CAS that differentiates this symptom complex from other types of childhood speech sound disorders,” the CAS Technical Report proposed three segmental and suprasegmental speech characteristics that were considered to be consistent with a deficit in speech motor planning and programming and thus as being specific to CAS:

1. inconsistent errors on consonants and vowels in repeated productions of syllables or words;
2. lengthened and disrupted coarticulatory transitions between sounds and syllables; and
3. inappropriate prosody, especially in the realization of lexical or phrasal stress.

These features have gained wide acceptance in the subsequent literature for purposes of participant selection, but little research has sought to empirically test the diagnostic validity of these features. One major obstacle to such empirical study is the fact that none of these proposed features was stated in operationalized terms. This lack of operationalization also hinders comparability of participants across studies, because researchers often either do not provide operationalized criteria for the CAS diagnoses of their participants or use different criteria. The purpose of this tutorial is to provide a structured overview of measurement procedures that have been used or could be used to operationalize and assess these three core characteristics. The hope is that this will facilitate a more replicable evidence base and, eventually, a consensus on how best to capture these features for future research and clinical application.

To be clear, we do not address whether a “feature checklist” is ultimately the optimal approach to diagnosis (e.g., see Shriberg et al., 2017, for a discussion of problems with this approach), nor do we suggest that these specific features are the most important or discriminative ones (see Murray, Iuzzini-Seigel, Maas, Terband, & Ballard, 2018, for a systematic review of the differential diagnostic value of these features). Alternative approaches such as developing psycholinguistic profiles derived from process-oriented diagnostics have been proposed elsewhere (e.g., Terband, Maassen, & Maas, 2016, 2019). The goal of the current article is to provide a structured overview of measurement procedures that have been used or may be used to assess the three core characteristics of CAS as formulated in the ASHA Technical Report (ASHA, 2007), without going into the issue of differential diagnosis itself.

This review is organized by each feature characterizing CAS and within each feature by level of analysis (perceptual/transcription, acoustic, articulatory analysis). We review methodological details for each procedure and provide a short overview of research results that have been reported in the literature. In terms of methodological details, for each approach, we identify four critical parameters that must be specified for operationalization and determining cutoff scores for diagnosis: (a) the response target to be produced by the child (sounds, words, nonwords, etc.), (b) the task used to elicit these responses (e.g., imitation, picture naming), (c) the conditions under which the responses are elicited (e.g., quiet, with time pressure), and (d) the measures obtained from these responses (e.g., error consistency scores, formant ratios). For each method, we further summarize the scientific basis, specifically, (e) whether administration is standardized, (f) whether validity and reliability data are available, and (g) whether norm or reference data for children are available (we make a distinction between norm data, i.e., norm-referenced cutoff scores, and reference data, i.e., numbers reported by other studies that may serve as reference values). Finally, we discuss issues that need to be taken into consideration when choosing a suitable technique and identify research needs in terms of the development of (more objective) measures as well as their validation and standardization.

Inconsistent Errors on Consonants and Vowels in Repeated Productions of Syllables or Words

Background
Inconsistency of Speech
Disordered or atypical “inconsistency” is variability in speech production in the absence of contextual variations (e.g., phonetic context, pragmatic influences, maturation or cognitive–linguistic influences), such as during repeated productions of the same exemplar across multiple trials (Dodd, Hua, Crosbie, Holm, & Ozanne, 2009; Marquardt, Jacks, & Davis, 2004). The measurement of inconsistent speech production includes not just quantity of different productions and control of context but also the quality of those alterations. Qualitative differences, such as the number and type of (multiple) substitutes for phonemes within and across all positions, assist in the differentiation of atypical/disordered “inconsistency” from “normal” variability as found in typically developing (TD) children (Iuzzini-Seigel, 2012; Iuzzini-Seigel & Forrest, 2010). In the
(1) Stimuli or targets being analyzed Six multisyllabic words (elephant, umbrella, strawberries, helicopter, thermometer, and spaghetti; Preston & Koenig, 2011)
(2) Tasks used to elicit those targets Picture naming (Preston & Koenig, 2011)
Spontaneously elicited connected speech samples using age-appropriate materials (Marquardt et al., 2004)
(3) Conditions in which responses are elicited Quiet, with time pressure (rapid picture naming; Preston & Koenig, 2011)
Quiet, no time pressure (Marquardt et al., 2004)
(4) The measures obtained from those responses Total token variability: (number of variants − 1) / (number of tokens − 1) (Marquardt et al., 2004)
Error token variability: (number of incorrect variants − 1) / (number of incorrect tokens − 1) (Marquardt et al., 2004)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: No
Reliability: Broad transcription reliability from spontaneous speech (10% of samples) = 86.22% (range: 75%–96.26%; Marquardt et al., 2004)
Interrater reliability of total token variability scores based on phonetic transcription of rapid naming task with r = .55 (Preston & Koenig, 2011)
(7) Norm or reference data available? No
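
The two token variability formulas in the table above can be illustrated with a minimal Python sketch. The function names, the choice to return None when fewer than two incorrect tokens exist, and the example transcriptions are our own hypothetical illustrations, not part of the procedure described by Marquardt et al. (2004).

```python
def token_variability(productions):
    """Total token variability: (number of variants - 1) / (number of tokens - 1)."""
    tokens = len(productions)
    if tokens < 2:
        raise ValueError("at least two tokens are needed")
    variants = len(set(productions))
    return (variants - 1) / (tokens - 1)


def error_token_variability(productions, target):
    """Same ratio, restricted to the incorrect tokens.

    Assumption: a token counts as incorrect when its transcription
    differs from the target transcription.
    """
    errors = [p for p in productions if p != target]
    if len(errors) < 2:
        return None  # undefined with fewer than two incorrect tokens
    return (len(set(errors)) - 1) / (len(errors) - 1)


# Hypothetical broad transcriptions of five attempts at one word.
attempts = ["pəsgɛti", "sgɛti", "pəsgɛti", "spəgɛti", "bəgɛti"]
total = token_variability(attempts)                               # 4 variants, 5 tokens
errors_only = error_token_variability(attempts, target="spəgɛti")  # 3 variants, 4 error tokens
```

Both scores range from 0 (every token identical) to 1 (every token different), so they can be compared across words with different numbers of tokens.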
Token-to-Token Inconsistency assessment is a nominal measurement, and children with phonological disorders are classified as inconsistent or consistent, depending on whether or not they produced the same words consistently across three repetitions (> 40% = inconsistent). If inconsistency scores are greater than 40% (but see Iuzzini-Seigel, 2012, for higher cutoff > 50%), along with the presence of other features, such as poor oromotor performance, poorer productions during imitation than spontaneous speech, consonant and vowel distortions, and atypical prosody, then a CAS diagnosis may be suspected (Dodd et al., 2002; see Table 2).

ECI
With respect to inconsistency measures at the segmental level, the ECI has been applied in a number of studies (Preston & Koenig, 2011; Tyler & Lewis, 2005; Tyler, Lewis, & Welch, 2003; see Table 3). The ECI is a raw score calculated as the sum of the total number of different error forms across all consonants and all word positions. A higher ECI score indicates a greater number of different error forms across a larger number of consonants, and a lower ECI score indicates fewer different error forms across a smaller number of consonants (Tyler & Lewis, 2005). The ECI measure is moderately–strongly correlated with token-to-token variability of repeated productions at word level and with measures of speech severity, such as percent consonants correct (PCC; Preston & Koenig, 2011). Generally, correlations between PCC and ECI scores have been reported in the range of r = −.58 to −.88 in children with speech and language disorders (Tyler & Lewis, 2005; Tyler et al., 2003). Importantly, and as mentioned earlier (see the Error Inconsistency in CAS section), there are several studies that provide support for the notion that variability/consistency measurements using such methods (e.g., ECI) may represent severity of the problem rather than disorder category (Betz & Stoel-Gammon, 2005; Forrest, Dinnsen, & Elbert, 1997; Forrest, Elbert, & Dinnsen, 2000; Tyler et al., 2006). With regard to reliability and validity, ECI score calculation has a high degree of reliability (99%; Tyler et al., 2003) and possibly addresses the same construct as other measures of speech severity (e.g., PCC; Tyler & Lewis, 2005; see Table 3).

TTR of Consonant Substitutions
TTR analysis is a measure of the number of types of productions relative to the total number of tokens produced (see Table 4). It indicates the number of different ways (i.e., inconsistency) a target form is produced by the child. Two variations of TTR analysis have been applied in both diagnostic and therapeutic contexts in the SSD and CAS populations. The segmental-level TTR measure, called CSIP, calculates a percentage based on the number of different error substitutes across all targets divided by the total number of erred productions across the whole inventory (Forrest & Iuzzini-Seigel, 2008; Iuzzini-Seigel, 2012). The ISP (Iuzzini-Seigel & Forrest, 2010) is derived from CSIP by modifying the denominator (of CSIP) from the total number of erred productions to the number of target opportunities. Validity of the CSIP/ISP measure has been demonstrated in few studies. The segmental-level ISP measure is correlated with the broader lexical-level word inconsistency scores (r > .70; Iuzzini-Seigel, 2012), which demonstrates construct validity. Interrater percent agreement scores for narrow transcriptions, as used in TTR analysis, are reported to be > 90% (Heisler, Goffman, & Younger, 2010; Iuzzini-Seigel, 2012; see Table 4).
(1) Stimuli or targets being analyzed 25 words (ranging from one to four syllables)
(2) Tasks used to elicit those targets Picture naming
(3) Conditions in which responses are elicited Quiet, no time pressure, production of each target word in three separate trials, each trial separated by an intervening task (subsection of oral motor screen) or a short break (5 min) with conversation
(4) The measures obtained from those responses Percentage of target words produced differently (word inconsistency score)
Scientific basis
(5) Standardized measurement protocol? Yes
(6) Validity and reliability of outcome measures? Validity: Not specified in the DEAP test manual
Reliability: Percent interrater agreement for Word Inconsistency Assessment based on whole-word narrow transcriptions from video/audio recordings was 91.64% (SD = 5.76%; Iuzzini-Seigel, 2012)
(7) Norm or reference data available? Reference data: n > 40% = inconsistent phonological disorder (Dodd, 2005; Tyler & Lewis, 2005)
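
As a rough illustration of the word inconsistency score in the table above, the sketch below computes the percentage of target words that were not produced identically across all trials and applies the > 40% reference cutoff. The function names, the exact-match comparison of transcriptions, and the example data are hypothetical; the test manual remains the authority on scoring rules.

```python
def word_inconsistency_score(trials_by_word):
    """Percentage of target words produced differently across trials.

    trials_by_word maps each target word to the list of transcriptions
    obtained in the separate trials (three trials in the protocol above).
    Assumption: a word counts as variable if any two transcriptions differ.
    """
    if not trials_by_word:
        raise ValueError("no words were scored")
    variable = sum(1 for trials in trials_by_word.values() if len(set(trials)) > 1)
    return 100.0 * variable / len(trials_by_word)


def classify(score, cutoff=40.0):
    """Apply the reference cutoff from the table (> 40% = inconsistent)."""
    return "inconsistent" if score > cutoff else "consistent"


# Hypothetical transcriptions for five of the target words.
sample = {
    "elephant": ["ɛləfənt", "ɛfələnt", "ɛləfənt"],
    "umbrella": ["əmbwɛlə", "əmbwɛlə", "əmbwɛlə"],
    "helicopter": ["hɛlikɑtə", "hɛkɑptə", "ɛlikɑptə"],
    "zebra": ["zibrə", "dibrə", "zibrə"],
    "fish": ["fɪs", "fɪs", "fɪs"],
}
score = word_inconsistency_score(sample)  # 3 of 5 words vary
label = classify(score)
```

The cutoff is left as a parameter because a higher value (> 50%) has also been proposed (Iuzzini-Seigel, 2012).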
(1) Stimuli or targets being analyzed 64 words (included every English consonant at least twice—except /h/; Preston &
Koenig, 2011)
(2) Tasks used to elicit those targets Picture naming (Preston & Koenig, 2011)
(3) Conditions in which responses are elicited Quiet, no time pressure (Preston & Koenig, 2011)
(4) The measures obtained from those responses ECI: Sum of all different error forms for all consonant phonemes combined
(Preston & Koenig, 2011; Tyler & Lewis, 2005; Tyler et al., 2003)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: Point-by-point consonant agreement = 87.3% (range: 81.5%–92.3%)
Interrater reliability of ECI scores, r = .98 (Preston & Koenig, 2011)
Reliability: Intra- and interreliability of error consistency scores derived from
transcriptions = 99% (Tyler et al., 2003)
(7) Norm or reference data available? Reference data: ECI range in preschool-age children with speech and language
disorders: 12–70
ECI cutoff scores for children with speech and language disorders: variable
group, upper quartile > 44.75; consistent group, lower quartile < 22.25 (Tyler &
Lewis, 2005)
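
The ECI definition in the table above (the sum of all different error forms for all consonant phonemes combined) can be sketched as follows. The input format, the use of "∅" to code deletions, and the decision not to distinguish word positions are our own simplifying assumptions rather than details taken from Tyler and Lewis (2005).

```python
from collections import defaultdict


def error_consistency_index(observations):
    """Sum, over target consonants, of the number of distinct error forms.

    observations is an iterable of (target_consonant, produced_form) pairs.
    Correct productions (produced == target) do not contribute to the score.
    """
    error_forms = defaultdict(set)
    for target, produced in observations:
        if produced != target:
            error_forms[target].add(produced)
    return sum(len(forms) for forms in error_forms.values())


# Hypothetical consonant-level scoring: /s/ has two error forms, /k/ two, /r/ one.
observations = [
    ("s", "t"), ("s", "s"), ("s", "θ"), ("s", "t"),
    ("k", "t"), ("k", "∅"),
    ("r", "w"),
]
eci = error_consistency_index(observations)
```

Because the ECI is a raw count, it grows with the number of consonants sampled, so scores are only comparable across children tested with the same word list.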
these procedures reports any reliability scores related to segmentation of acoustic recordings or peak-picking algorithms (see Table 5).

& Whalen, 2017). However, most studies report outcome measures obtained with high reliability (Iuzzini-Seigel, Hogan, Rong, et al., 2015; Lundeborg et al., 2015; see Table 6).
(1) Stimuli or targets being analyzed 200–240 word probe list that provides 340–440 opportunities to produce all of the
American English consonants in all naturally occurring word positions (Iuzzini-
Seigel, 2012; Iuzzini-Seigel & Forrest, 2010)
Stimuli also derived from the Goldman-Fristoe Test of Articulation 2 (GFTA-2) and
the first trial of Word Inconsistency Assessment (Dodd et al., 2009)
(2) Tasks used to elicit those targets Picture-naming task (if child is unable, then semantic cue or delayed imitation is
carried out)
(3) Conditions in which responses are elicited Quiet, no time pressure
(4) The measures obtained from those responses CSIP: percentage based on the number of different error substitutes across all targets
divided by the total number of erred productions across the whole inventory
(Iuzzini-Seigel, 2012; Iuzzini-Seigel & Forrest, 2010)
ISP: percentage based on the number of different error substitutes across all targets
divided by total number of productions (Iuzzini-Seigel, 2012; Iuzzini-Seigel &
Forrest, 2010)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: Construct validity: high correlation between ISP (r > .70) and lexical-level
word inconsistency scores (Iuzzini-Seigel, 2012)
Reliability: Interrater percent agreement for narrow transcription > 90% (Heisler et al.,
2010; Iuzzini-Seigel, 2012)
(7) Norm or reference data available? Reference data: ISP score cutoff for CAS > 17% (Iuzzini-Seigel, 2012)
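
The difference between CSIP and ISP lies only in the denominator, which the following sketch makes explicit. It assumes that "different error substitutes across all targets" means distinct (target, substitute) pairs; this reading, the function name, and the toy data are hypothetical.

```python
def csip_and_isp(observations):
    """Return (CSIP, ISP) as percentages.

    observations is an iterable of (target_consonant, produced_form) pairs,
    one pair per target opportunity.
    CSIP = distinct error substitutes / erred productions * 100
    ISP  = distinct error substitutes / target opportunities * 100
    """
    distinct_substitutes = set()
    erred = 0
    opportunities = 0
    for target, produced in observations:
        opportunities += 1
        if produced != target:
            erred += 1
            distinct_substitutes.add((target, produced))
    if opportunities == 0:
        raise ValueError("no target opportunities were scored")
    isp = 100.0 * len(distinct_substitutes) / opportunities
    csip = 100.0 * len(distinct_substitutes) / erred if erred else None
    return csip, isp


# Toy data: 10 opportunities, 5 erred productions, 4 distinct substitutes.
observations = [
    ("s", "t"), ("s", "t"), ("s", "θ"), ("s", "s"), ("s", "s"),
    ("k", "t"), ("k", "k"), ("k", "k"),
    ("r", "w"), ("r", "r"),
]
csip, isp = csip_and_isp(observations)
```

With only 10 opportunities the toy percentages are far larger than values obtained from the 340–440 opportunities of the full probe list, so they should not be compared with the reported cutoff.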
et al., 2011). Optical motion capture systems utilize small reflective markers (approximately 3 mm) that are placed on the child’s upper and lower lips, right/left/mid jaw, and lip corners to track speech-related movements. Other markers are placed on the forehead and nasion, which are used as reference to correct for head rotation/movements. An alternative to optical motion capture systems is EMA. In EMA, the position and motion of sensor coils attached to speech articulators are tracked within a magnetic field. The sensor coils, typically around 4 × 4 × 3 mm in size, are usually glued on the bridge of the nose, the maxillary gum ridge on the upper and lower lips, the mandibular gum ridge, and two or three points on the tongue. As the sensor coils are wired and directly glued on the articulators, this methodology is relatively invasive and might not be tolerated well by young children or infants. In comparison, the passive reflective markers used with optical motion tracking systems are unobtrusive, light, and well tolerated by young children and offer a more relaxed and naturalistic setting for data collection. The limitation of optical motion capture systems is that they require a direct line of sight between the camera and the reflective marker and hence are only suited for the measurement of externally visible structures such as the jaw and lips. The operational principles of the optical motion capture and EMA systems have been elaborated elsewhere and are beyond the scope of this review (e.g., see Feng & Max, 2014; Yunusova, Green, & Mefferd, 2009).

Kinematic Spatiotemporal Variability Indices
For the STI, a sum of 50 SDs at 2% intervals is calculated over amplitude- and time-normalized repeated movement trajectories (e.g., of the jaw or the lower lip) or individual movement cycles (cyclic STI; van Lieshout & Moussa, 2000; see Table 7). A lower STI value represents less variability, suggesting a robust and well-learned underlying movement template (Kleinow & Smith, 2000). With regard to stimuli and elicitation procedures, camera-based motion tracking of speech articulators in children has been limited to visible structures such as the jaw and lips and to words that comprise bilabial consonants (e.g., pop, puppet, and puppypop: Moss & Grigos, 2012; buy bobby a puppy: A. Smith & Goffman, 1998). Stimuli with bilabial productions are also chosen with EMA systems for easier segmentation of position data (Terband et al., 2011). To acquire adequate data for measurement of articulatory variability (e.g., STI/cyclic STI), about 10–15 productions of the target stimuli are elicited. Most speech kinematic studies in children have elicited productions using picture naming, a cloze sentence procedure (within a story retell game), or direct/immediate word/sentence imitation tasks with auditory models (Grigos et al., 2015; Moss & Grigos, 2012; Sadagopan & Smith, 2008; Terband et al., 2011; see Table 7).

Covariance Measures
Moss and Grigos (2012) examined spatial coupling (calculated as absolute peak correlation coefficient [PC] between articulator pairs; i.e., between jaw and lower lip [J–LL], jaw and upper lip [J–UL], and upper and lower lip [UL–LL]) and temporal coupling (time required for peak spatial coupling; i.e., lag) as a function of word length (e.g., “pop,” “puppet,” and “puppypop”; see Table 8). A pair of articulators with a high degree of spatial and temporal coordination would yield high correlation coefficients
(1) Stimuli or targets being analyzed 20–25 repetitions of a phrase of which typically 10 are used for analysis: “Buy Bobby a
puppy” (E-STI; Howell et al., 2009); “Well we’ll will them” (FDA; Anderson et al., 2008);
“Tony knew you were lying in bed” (FDA/UUV; Cummins et al., 2014)
(2) Tasks used to elicit those targets Phrase repetition
(3) Conditions in which responses are elicited Quiet, self-selected comfortable/habitual speaking rate, twice as fast or half as fast as
habitual speaking rate
(4) The measures obtained from those responses Independent or combined temporal and spatial variability (E-STI/FDA/UUV) from audio
recordings
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: Results comparable to kinematic STI and negatively correlated with speech
intelligibility ratings (Cummins et al., 2014; van Brenk & Lowit, 2012)
Reliability: No
(7) Norm or reference data available? No
Note. E-STI = envelope-based spatiotemporal index; FDA = functional data analysis; UUV = utterance-to-utterance variability.
and low lag values. Moss and Grigos analyzed these measures in 3- to 6-year-old TD children and those with CAS and speech delay (n = 6 per group). There was no effect of group or Group × Word interactions for PC and lag. Green, Moore, Higashikawa, and Steeve (2000) analyzed PC and lag in 1-, 2-, and 6-year-old TD children and adults. In general, 1- and 2-year-old children demonstrated greater spatial coupling between the UL–LL than between the lips and jaw pairs. The PC values indexing lip and jaw coupling (J–UL, J–LL) for 1-year-old children were very low, indicating weak coupling (values centered near zero). Spatial coupling values increased with age. With regard to lag-to-peak coefficient values, all articulatory movements (across pairs of articulators) were tightly coupled with mean lag values not > 29 ms for any age group (see Table 8).

Coefficient of Variation of Spatial and Temporal Coupling
Coefficient of variation of the PC (PCcov) and lag values (Lcov) from the Covariance Measures section were analyzed by Moss and Grigos (2012) for the following articulatory pairs: J–LL, J–UL, and UL–LL in 3- to 6-year-old TD children, those with speech delay, and children diagnosed with CAS (n = 6 per group; see Table 9). Significant main effects for group were found for PCcov and Lcov. The CAS group had significantly higher average PCcov and Lcov across utterances for J–LL coupling than the speech delay group (see Table 9).

Lengthened and Disrupted Coarticulatory Transitions Between Sounds and Syllables

Background
Coarticulation
Coarticulation refers to the phenomenon that the specific properties of articulatory movements are context dependent as articulatory movements overlap in time and interact with one another. Acoustically, this manifests itself as the realizations of consecutive speech segments affecting each other mutually. The effect is bidirectional. Influences of a segment on a following segment are called perseveratory or carryover coarticulation, and influences of an upcoming segment on a preceding segment are known as anticipatory coarticulation. Furthermore, coarticulation is not limited to adjacent segments and can occur across syllables.

Coarticulation is the consequence of the inertia of the articulatory organs caused by their biomechanical characteristics and an economy of effort in articulatory planning influenced by biomechanical constraints (e.g., Recasens, 2004; Recasens, Pallarès, & Fontdevila, 1997), prosodic conditions (Cho, 2004; De Jong, 1995; Edwards, Beckman, & Fletcher, 1991), and syllable structure (e.g., Modarresi, Sussman, Lindblom, & Burlingame, 2004; Nittrouer, Munhall, Kelso, Tuller, & Harris, 1988; Sussman, Bessell, Dalston, & Majors, 1997). Furthermore, the amount of coarticulation depends on lexical frequency and, relatedly, the specific demands of the communication task (e.g., Farnetani & Recasens, 1997; Kühnert & Nolan, 1999). Perseveratory coarticulation has been found to reflect predominantly biomechanical constraints, whereas anticipatory coarticulation mainly reflects higher level phonetic processing (e.g., Daniloff & Hammarberg, 1973; Hertrich & Ackermann, 1995, 1999; Kent & Minifie, 1977; Whalen, 1990). Comparisons between carryover and anticipatory coarticulation effects are highly complicated, as both effects co-occur at multiple levels at approximately the same time. Moreover, the specific biomechanical constraints and syllabic position of the speech sounds involved play a role that is not straightforward and appears to be language specific, that is, some studies report stronger perseveratory as compared to anticipatory coarticulation whereas other studies report opposite effects (Beddor, Harnsberger, & Lindemann, 2002; Graetzer, Fletcher, & Hajek, 2015;
(1) Stimuli or targets being analyzed Five repetitions of CVC pseudowords (pVb), which sampled corner vowels (e.g., /pib/,
/pub/; Iuzzini-Seigel, Hogan, Guarino, et al., 2015)
115 Repetitions of monosyllabic /pa/ (Yu et al., 2014)
Five repetitions of 12 CVC target words with plosive consonants in syllable initial
position (e.g., pea, bee, tea; Whiteside et al., 2003)
Three repetitions of six minimal pairs (e.g., pil–bil, tennis–dennis; Lundeborg et al., 2015)
(2) Tasks used to elicit those targets Imitation of recorded speech sample (Iuzzini-Seigel, Hogan, Guarino, et al., 2015)
Cued (white circle on monitor) repetition task (Yu et al., 2014)
Picture naming (Iuzzini-Seigel, Hogan, Guarino, et al., 2015)
In carrier phrase “say ___ now” (Iuzzini-Seigel, Hogan, Guarino, et al., 2015) or “say ___
again” (Whiteside et al., 2003)
(3) Conditions in which responses are elicited Quiet room, no time pressure
(4) The measures obtained from those responses Duration of VOT in milliseconds, described in terms of mean, SD, median,
median difference scores for voiced–voiceless cognates, COV, and skewness
(Iuzzini-Seigel, Hogan, Guarino, et al., 2015)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: No
Reliability: Intrarater reliability: ICC = .98–.99 (absolute error = 2.0–4.3 ms; Iuzzini-Seigel,
Hogan, Guarino, et al., 2015); Cronbach’s alpha = .97 (Lundeborg et al., 2015).
Interrater reliability: Pearson r = .97 (Whiteside et al., 2003); mean difference between
raters = 17.19 ms (SD = 6.89 ms), Pearson r = .93 (Yu et al., 2014)
(7) Norm or reference data available? Reference data: Mean COV values (in %) for voiced plosives approximately 20%–30%
for typically developing children between 5;8 and 13;2 (years;months). Mean COV
values (in %) for voiceless plosives approximately 15%–25% for typically developing
children between 5;8 and 13;2 (Whiteside et al., 2003)
Typically developing 5-year-olds: Mean COVs of 74% for /b/ and 51% for /d/. Mean
COVs of 42% for /p/ and 34% for /t/
3- to 5-year-old children with CAS: Mean (SD) of COV = 56% (29) for /p/ and 52% (28) for /t/
3- to 5-year-old children with phonological delay: Mean (SD) of COV = 38% (19) for /p/
and 42% (25) for /t/ (Iuzzini-Seigel, 2012)
Note. COV = coefficients of variation; CAS = childhood apraxia of speech; ICC = intraclass correlation coefficient.
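
The COV values reported as reference data in the table above express the SD of a child's VOT tokens as a percentage of their mean. A minimal sketch is given below; the use of the sample SD, the function name, and the example VOT values are assumptions, as the cited studies may have computed the statistic differently.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """COV in percent: sample SD divided by the mean, times 100.

    Assumption: the mean is nonzero and positive, as for voiceless plosives;
    prevoiced tokens with negative VOT would need separate handling.
    """
    if len(values) < 2:
        raise ValueError("at least two tokens are needed")
    m = mean(values)
    if m == 0:
        raise ValueError("COV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical VOT durations (ms) for five repetitions of a /p/-initial word.
vot_p = [60, 80, 70, 90, 50]
cov_p = coefficient_of_variation(vot_p)  # mean 70 ms, sample SD about 15.8 ms
```

Because the COV scales variability by the mean, it allows comparison between voiced and voiceless plosives even though their mean VOT differs considerably.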
Modarresi et al., 2004; Recasens & Pallarès, 2001; Sharf & Ohde, 1981).

Typical Development of Coarticulation
In typical development, coarticulatory patterns change as children become more adultlike in their speech production and improve spatiotemporal control. However, precisely how coarticulation changes during development has proved to be rather complex. Studies agree on the fact that coarticulation is more variable in the speech of children as compared to adults, but some studies report stronger coarticulation in children while other studies report that children exhibit less coarticulation than adults. At first glance, these results appear to be conflicting, but studies differ in experimental methodologies, procedures, language, stimuli, and age of participants. When examined closely, the results show a pattern in which “coarticulation that reflects poor temporal control or poor differentiation of structures decreases, whereas coarticulation that reflects language-specific efficiency increases” (ASHA, 2007, p. 8). More specifically, coarticulation decreases in general, as coordinative structures/functional motor synergies develop (e.g., Barbier et al., 2013; Noiray, Abakarova, Rubertus, Krüger, & Tiede, 2018; Noiray, Ménard, & Iskarous, 2013; Sussman, Minifie, Buder, Stoel-Gammon, & Smith, 1996; Zharkova, Hewlett, & Hardcastle, 2011, 2012) and children move from more global to more segmental planning (Katz & Bharadwaj, 2001; Nijland et al., 2002; Nittrouer, Studdert-Kennedy, & McGowan, 1989; Noiray et al., 2018; Siren & Wilcox, 1995). However, coarticulation increases (relatively) in certain contexts that are language specific, that is, depending on, for example, the phonological and articulatory specification of the segments involved (e.g., underspecified vowels exhibit more coarticulation; Nijland et al., 2002), prosodic patterns (e.g., stressed vowels exhibit less coarticulation; Nijland et al., 2002), and morphological structure or lexical frequency (e.g., higher frequency utterances show more coarticulation in adults but not in children; Song, Demuth, Evans, & Shattuck-Hufnagel, 2013). Furthermore, differences between anticipatory and perseveratory coarticulation in their developmental trajectories seem likely due to their differences in etiology, but the development of anticipatory and perseveratory coarticulation has not yet been compared directly in a single experimental design. In fact, little is known about the development of perseveratory coarticulation in general, with the vast majority of studies focusing on anticipatory coarticulation (but see Song et al., 2013).
3008 Journal of Speech, Language, and Hearing Research • Vol. 62 • 2999–3032 • August 2019
(1) Stimuli or targets being analyzed Eight to 15 productions of /papa/ and /baba/ produced with equal stress (Grigos, 2009)
10–15 productions of “pop,” “puppet,” and “puppypop” (Grigos et al., 2015; Moss &
Grigos, 2012)
Dutch words /paːs/ and /spaː/ repeated for 5–12 s (three to six movement cycles per trial;
Terband et al., 2011)
(2) Tasks used to elicit those targets Object naming (Grigos, 2009)
Closed-sentence procedure or respond to a “who”-question cued by a picture probe
(Grigos et al., 2015; Moss & Grigos, 2012)
Reiterated speech task–auditory model provided as needed (Terband et al., 2011)
(3) Conditions in which responses are elicited No time pressure, play scenario (Grigos, 2009)
Naturalistic productions embedded in a story retell game (Grigos et al., 2015; Moss
& Grigos, 2012)
Syllable repeated at self-chosen normal, comfortable pace (Terband et al., 2011)
(4) The measures obtained from those responses Jaw, lower lip, and upper lip displacement trajectories (Grigos, 2009; Grigos et al., 2015)
Lip aperture STI and lower lip–jaw STI (Moss & Grigos, 2012)
cSTI for tongue tip, lower lip, and jaw (Terband et al., 2011)
Scientific basis
(5) Standardized measurement protocol? No
Segmentation based on zero crossing of jaw velocity trace (Grigos, 2009)
Movement cycles (peaks/valleys in the position and velocity signals) were identified by
automated algorithm using relative amplitude (10% of maximum amplitude) and time
(a minimum interval of 0.5 s between successive events) criteria. Errors in automated
peak/valley assignment were corrected manually (Terband et al., 2011)
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? Reference data: lower lip STI data on typically developing children and young adults for
“buy bobby a puppy” phrase: M (SD) = 24.1 (4) for 4-year-old children, 18.5 (5.7)
for 7-year-old children, 13.6 (2.5) for 20- to 27-year-old young adults (A. Smith &
Goffman, 1998)
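The STI values listed in rows (4) and (7) are not defined in the table itself. The following is a minimal sketch, assuming the procedure as it is commonly described for the spatiotemporal index: each repetition's displacement record is linearly time-normalized and amplitude-normalized (z-scored), standard deviations across repetitions are taken at 2% intervals of relative time (50 points), and those standard deviations are summed. The function names, the order of the two normalization steps, and the choice of 50 points are assumptions made for illustration, not details taken from the studies cited above.

```python
from statistics import mean, pstdev, stdev

def time_normalize(record, n_points=50):
    """Linearly interpolate one displacement record onto n_points samples."""
    last = len(record) - 1
    resampled = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the record
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(record[lo] * (1 - frac) + record[hi] * frac)
    return resampled

def amplitude_normalize(record):
    """Convert a record to z-scores (zero mean, unit SD)."""
    m, s = mean(record), pstdev(record)
    return [(x - m) / s for x in record]

def spatiotemporal_index(records, n_points=50):
    """Sum the across-repetition SDs at each normalized time point."""
    normalized = [amplitude_normalize(time_normalize(r, n_points)) for r in records]
    return sum(stdev(values) for values in zip(*normalized))
```

Under these assumptions, repetitions that differ only in overall duration or amplitude yield an index near zero, whereas repetitions that differ in movement pattern yield larger values.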
In summary, the literature indicates that development does not involve a global increase or decrease in coarticulation. Speech motor development rather moves toward “flexible patterns of coarticulation” (Noiray et al., 2018, p. 1363; see also Noiray, Wieling, Abakarova, Rubertus, & Tiede, in press), which can differ depending on the phonetic and linguistic context. The point we want to make here, therefore, is that one should consider what the possible different outcomes would signify when assessing coarticulation, that is, would more or less coarticulation in a specific case indicate impaired, delayed, or more adultlike speech motor planning and programming?
Coarticulation in Children With CAS
As formulated in the CAS Technical Report, the speech of children with CAS is characterized by “lengthened and disrupted coarticulatory transitions between sounds and syllables” (ASHA, 2007, p. 4). First and foremost, children with CAS show coarticulation patterns that are not consistent, not typically immature, and highly idiosyncratic. Coarticulation effects usually change the characteristics of a speech sound in the direction of the neighboring speech sound. For 5- to 7-year-old children with CAS, however, coarticulation has been found to be both stronger and more extended, as well as the opposite, more segmental (or hyperarticulation), as compared to their TD peers (Maas & Mailend, 2017; Maassen, Nijland, & Van der Meulen, 2001; Nijland et al., 2002; Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003; Sussman, Marquardt, & Doyle, 2000).
One factor that could be held responsible for this paradox is reduced phonological distinctiveness. The less distinctly speech sounds are produced, the weaker their possible coarticulatory influence on surrounding speech sounds. Children with CAS demonstrated weaker coarticulation in studies where they also showed a decreased differentiation of speech sounds as compared to their TD peers (stop consonants [Sussman et al., 2000] and vowels [Nijland et al., 2002; Nijland, Maassen, & Van der Meulen, 2003]). It is unclear why these studies found a decreased differentiation of speech sounds, as not all studies do. Possibly, the decreased distinctiveness actually reflects coarticulatory effects in the opposite direction. In studies that feature similar phonological distinctiveness in the speech of children with CAS in comparison with TD children, coarticulation was found to be stronger and more extended (Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003). In a recent study, Terband (2017) investigated anticipatory coarticulation in [ə] as context-dependent F2 ratio relative to the size of the produced phonetic contrast in the data set that was collected previously as part of the studies by Nijland and colleagues (Nijland et al., 2002; Nijland, Maassen, & Van der Meulen, 2003), thus taking the potential coarticulatory influence of the following speech sounds into account. The results showed increased coarticulation in the group of children with CAS (n = 16)
(1) Stimuli or targets being analyzed One-, two-, and three-syllable words (“pop,” “puppet,” and “puppypop”) repeated
10–15 times in random order (Moss & Grigos, 2012)
“Baba,” “papa,” and “mama,” 15 repetitions in pseudorandom order (Green et al., 2000)
(2) Tasks used to elicit those targets Closed-sentence procedure or respond to a “who”-question cued by a picture probe
(Moss & Grigos, 2012)
Reading for older children and imitation for younger children (Green et al., 2000)
(3) Conditions in which responses are elicited No time pressure, naturalistic productions embedded in a story retell game (Grigos et al.,
2015; Moss & Grigos, 2012)
(4) The measures obtained from those responses Peak correlation coefficient (PC) between articulator pairs and lag (time required for peak
spatial coupling; Green et al., 2000; Moss & Grigos, 2012)
Scientific basis
(5) Standardized measurement protocol? No
Cross-correlation functions computed on the displacement traces
(6) Validity and reliability of outcome measures? Validity: No
Reliability: 10% of data set was reanalyzed by the same experimenter for three
coordinative indices (i.e., contribution to oral closure, coefficient, and lag). The mean
absolute difference between first and second measurements of coefficient and lag
was 0.012 and 3 ms, respectively. Pearson correlations between the first and second
measurements ranged from 0.96 to 0.99. These findings suggest that the difference
between the two measurements was negligible (i.e., good reliability; Green et al.,
2000)
(7) Norm or reference data available? Reference data: Mean (SD) of PC values and lag data from 3- to 6-year-old typically
developing children for “puppypop” phrase: J–LL: PC: 0.62 (0.13), lag: 18.87 (2.77);
J–UL: PC: 0.46 (0.08), lag: 27.86 (3.04); UL–LL: PC: 0.53 (0.06), lag: 26.78 (1.38;
Moss & Grigos, 2012)
Typically developing children (only data for 2- and 6-year-old typically developing children
provided below due to space limitations; exact raw data unavailable; ~ = approximate
values): J–LL: PC: ~0.3 to ~0.7, lag: ~ −0.02 to ~ −01; J–UL: PC: ~0.2 to ~0.4, lag:
~ −02; UL–LL: PC:~0.6, lag: ~ −02 to ~ −01 (Green et al., 2000)
Note: PC values close to one indicate a high degree of spatial coupling, while lag values
close to zero indicate high levels of temporal coupling
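Rows (4) and (5) describe a peak correlation coefficient and a lag obtained from cross-correlation functions computed on displacement traces. A minimal sketch of that idea follows, assuming two equally long, equally sampled traces, a Pearson correlation computed over the overlapping samples at each integer lag, and a sign convention in which a positive lag means the second trace trails the first. The function names, the `max_lag` search window, and the sign convention are assumptions for illustration; the cited studies may differ on each point.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def peak_coupling(trace_a, trace_b, max_lag, sample_rate_hz):
    """Return (peak correlation, lag in ms) over lags of -max_lag..max_lag samples.

    A positive lag means trace_b trails trace_a (assumed convention).
    """
    n = len(trace_a)
    best_r, best_lag = -2.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = trace_a[: n - lag], trace_b[lag:]
        else:
            a, b = trace_a[-lag:], trace_b[: n + lag]
        r = pearson(a, b)
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, 1000.0 * best_lag / sample_rate_hz
```

In this sketch, a peak coefficient near one at a lag near zero corresponds to the high spatial and temporal coupling described in the table note.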
compared to TD children (n = 8), but this effect was limited to certain articulatory contexts. While TD children showed a differentiation in coarticulation between consonant contexts, the children with CAS did not. The results did not show any evidence of decreased coarticulation in CAS.
A second factor that is often put forward to explain the paradoxical findings is syllabic structure. The manipulation of syllable boundary or syllable shape revealed differences in the adjustment of the durational structure as a function of syllabic organization in children with CAS as compared to normally developing children (Maassen et al., 2001; Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003; see also Marquardt, Sussman, Snow, & Jacks, 2002). More specifically, the children with CAS did not show systematic durational adjustments to syllabic structure, and consistent intra- and intersyllabic temporal structures were missing (Maassen et al., 2001; Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003; see also Marquardt et al., 2002). However, the differential effects of syllable structure on coarticulation are less clear. Children with CAS did not show a significant coarticulation effect across syllable boundaries, while TD children showed stronger intersyllabic coarticulation as compared to adults. However, this lack of a group-level effect could very well be due to the large variability in the children with CAS, both within groups and within subjects (Nijland et al., 2002). In direct comparison, no differences were found between inter- and intrasyllabic coarticulation, either in the children with CAS or in their TD peers (Maassen et al., 2001; Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003). Although these studies did not contain an adult control group, such an effect has been reported for adults in the literature (e.g., Modarresi et al., 2004; Nittrouer et al., 1988; Sussman et al., 1997). However, the location of syllable boundary did have an effect, and intersyllabic coarticulation was found to be stronger in V/CC (e.g., /zə sxit/; “ze schiet”) than in VC/C (e.g., /zəs xit/; “zus giet”) sequences for both groups of children (Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003). In summary, whereas syllabic structure has been found to have a different effect on temporal organization (the durations of the speech sounds) in 5- to 7-year-old children with CAS compared to their TD peers, it does not have a differential effect in terms of coarticulation.
Perceptual Measures
Identification of Gated Stimuli
Due to the transient nature of the acoustic signal, speech characteristics involving fine-grained phonetic detail
(1) Stimuli or targets being analyzed One-, two-, and three-syllable words (“pop,” “puppet,” and “puppypop”) repeated
10–15 times in random order
(2) Tasks used to elicit those targets Closed-sentence procedure or respond to a “who”-question cued by a picture probe
(Moss & Grigos, 2012)
(3) Conditions in which responses are elicited No time pressure, naturalistic productions embedded in a story retell game (Grigos
et al., 2015; Moss & Grigos, 2012)
(4) The measures obtained from those responses Coefficient of variation of peak correlation coefficient (PCcov) between articulator pairs
and coefficient of variation for lag (time required for peak spatial coupling; Lcov; Moss &
Grigos, 2012)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? Reference data: Mean (SD) of jaw–lower lip PCcov and Lcov data of 3- to 6-year-old
typically developing (TD), CAS and children with speech delay for the phrase
“puppypop”: TD: PCcov: 0.36 (0.15), Lcov: 0.65 (0.27); speech delay: PCcov:
0.25 (0.10), Lcov: 0.35 (0.14); CAS: PCcov: 0.54 (0.22), Lcov: 0.73 (0.30; Moss &
Grigos, 2012)
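The PCcov and Lcov values in rows (4) and (7) are coefficients of variation, that is, a standard deviation expressed relative to the mean. A minimal sketch is given below, assuming the sample standard deviation and division by the absolute value of the mean (so that negative mean lags do not yield a negative coefficient). Both choices, and the function name, are assumptions; the table does not state which variant was used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample SD divided by the absolute mean (dimensionless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / abs(m)

# Hypothetical per-repetition peak correlation coefficients for one child.
pc_values = [0.55, 0.70, 0.62, 0.48, 0.66]
pc_cov = coefficient_of_variation(pc_values)
```

Because the measure is scaled by the mean, it becomes unstable when the mean is close to zero, which may matter for lag values in particular.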
such as coarticulation are very difficult to assess perceptually (see Table 10). Ziegler and von Cramon (1985) used a vowel identification task in which a panel of nine trained listeners were presented with gated speech segments containing parts of increasing length of three test words with the form /gətVːtə/ with target vowels (/i, y, u/) and were asked which test word the segment was the beginning of (see Table 10). The percentage of correct identification is indicative of the amount of coarticulatory information that is contained in the stimulus and can be analyzed as a function of stimulus length and compared between speakers with and without speech disorder. Examining the productions of a patient with AOS compared to three control speakers, Ziegler and von Cramon found that the onset of the vowel gesture was delayed in /i/ and /y/, whereas for /u/ the differences with the control speakers were not as pronounced. These results indicate a reduced anticipation of the upcoming articulatory movement (lip spreading in the case of /i/ and lip rounding in the case of /y/) in the patient with AOS. Using a similar gating technique, Southwood, Dagenais, Sutphin, and Garcia (1997) replicated this finding of reduced anticipatory coarticulation in another apraxic patient.
This measure has not been used in children and only sparsely in populations with speech disorders in general. Its potential for use in clinical settings is limited as the procedure yields 90 stimuli per speaker and requires an elaborate perception experiment with a panel of trained listeners.
Acoustic Measures
Background
There is a large body of studies involving acoustic measurements of coarticulation, typically comparing specific spectral characteristics of the acoustic signal across different contexts. Measurements can focus on the acoustic spatial domain (how much the acoustics are influenced) or the temporal domain (how far the influence reaches).
Acoustic outcome measures to assess coarticulation are stimulus specific, and which measure is appropriate depends on the speech sounds that are involved. In vowels, coarticulation can be calculated with mean formant frequencies measured over a short time window (10–30 ms) at different parts of the speech sound, typically comprising onset, midpoint, and offset. While primarily formant frequencies at midpoint are indicative of realized vowel quality and articulatory positioning, other parts of the vowel can be used to investigate the range of the coarticulatory influence. Exact definitions of onset and offset vary between studies but are usually at about 20%–30% and 70%–80% of the vowel, respectively. Few studies have focused on sonorants and liquids, but coarticulation in these speech sounds can be measured similarly to vowels. The same principle applies to fricatives, provided that the calculations are not based on formant analysis but on the spectral moment of the frication noise. When little spectral information is available, such as in the case of plosives, place of articulation should be derived from the formant trajectories in the consonant-to-vowel or vowel-to-consonant transition.
Acoustic measurements of coarticulation typically involve the first three formants, with F2 as the most prominent measure of interest. Under the assumption of an idealized vocal tract model, changes in vocal tract shapes during coarticulation might be obtained from tracing the formant contours over time. The most prominent relationships in the context of coarticulation are the following. First formant frequencies are inversely related to tongue height, that is, high vowels have low F1 values and low vowels have high F1 values. Second formant frequencies are related to tongue advancement, that is, front vowels have high F2 values and back vowels have low F2 values. Third formant frequencies have been found to be related to lip rounding in front vowels, with low F3 values present in rounded vowels and high F3 values present in unrounded vowels (Harrington, 2010). With respect to
(1) Stimuli or targets being analyzed Six repetitions of three words /gətVːtə/ with target vowels (/i, y, u/), from each of which
five gating segments of increasing length were extracted
(2) Tasks used to elicit those targets Imitation (model produced by experimenter)
(3) Conditions in which responses are elicited Quiet, no time pressure; items in carrier phrase (“Ich habe /…/ gehört,” “I have
heard /…/”)
(4) The measures obtained from those responses Percentage /i, y, u/ responses per gating segment in an identification task by a
panel of trained listeners
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? No
voiced consonants, transitions of F2 have been found to be a relatively reliable indicator of place of articulation, ranging from increasing F2 trajectories for labial consonants to decreasing F2 trajectories for dorsal consonants (e.g., Kewley-Port, 1982; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). As such, F2 has been found in general to be more sensitive to coarticulation than F1 and F3 (Öhman, 1966).
With regard to stimuli and elicitation procedures, many studies have used schwa–CV(C) sequences. When the interest is in consonant production, the unspecified, neutral vowel limits systematic carryover coarticulation, and schwa proves to be very sensitive to anticipatory coarticulation, making it a very suitable object of study itself (Nijland et al., 2002; Nittrouer, 1993). Corner vowels are often included in the assessment materials, as they are most distinctive within the F1–F2 space. When studying vowel-to-vowel coarticulation, consonant context is important to consider as recent results have suggested that deviant coarticulation in children with CAS compared to TD children might be limited to certain articulatory contexts (Terband, 2017).
A further consideration is that measuring formants in children can be difficult due to their relatively high fundamental frequencies, which generate widely spaced harmonics, leading to an undersampling of the vocal tract transfer function, and may cause first and second formants to blend (Lee, Potamianos, & Narayanan, 1999; Nijland et al., 2002; Story & Bunton, 2016). This has been found to be particularly problematic in earlier studies using speech processing programs with limited linear predictive coding and visualization capabilities (Bennett, 1981; Bickley, 1986; Nittrouer et al., 1989). Solutions to this measurement problem, while becoming less urgent with modern speech processing software, are still researched, for example, by extracting the spectral envelope through improved spectral filtering techniques (Story & Bunton, 2016).
Children with CAS might display reduced articulatory rate and reduced size or amplitude of articulatory movements, which may complicate interpretations of coarticulatory effects: Both reduced articulation rate and reduced speech movements may contribute to the appearance of reduced coarticulation. These factors require appropriate attention when designing and analyzing speech tasks employed to assess coarticulation in CAS (Hardcastle & Tjaden, 2008).
The three most prominent acoustic techniques to evaluate coarticulation are F2 ratios, first moment coefficients, and F2 locus equations. Since F2 ratios and first moment coefficients are usually reported side by side, these outcome measures will be discussed jointly, followed by a separate subsection on F2 locus equations.
F2 Ratios and First Moment Ratios
Coarticulation in children’s speech has mainly been quantified by using the center of gravity (also named spectral centroid or first moment of the spectral distribution) and fricative F2 frequencies as outcome measures (Nittrouer et al., 1989; see Table 11). Typically, stimuli with varying fricative spectral distributions and vowels with lip-spreading and lip-rounding features are used, for example, /sisi/, /ʃiʃi/, /susu/, and /ʃuʃu/. Coarticulation is usually quantified by calculating F2 ratios: dividing mean F2 values in /i/ utterances by mean F2 values in /u/ utterances averaged across a series of repetitions (see Table 11). The F2 ratios provide a measure to distinguish the utterances. High F2 ratios in the vowels indicate large distinctions between vowels, and the F2 ratios in the measurement points preceding the vowel reflect the coarticulation effect of the upcoming vowel (Nittrouer et al., 1989). It has been found, however, that centroids tend to be a relatively poor measure of fricative–vowel coarticulation but are rather a measure of anticipatory lip rounding (Nittrouer et al., 1989; Soli, 1981).
Despite the fact that lengthened and disrupted coarticulatory transitions have been identified as one of the main criteria in CAS, the literature on coarticulation is, as of yet, relatively modest in size, compared to the literature investigating coarticulation in neurotypical children and adults (Hardcastle & Tjaden, 2008). A number of studies have used acoustic measures of coarticulation in the assessment of children with CAS. As of yet, no coherent picture can be drawn with respect to coarticulatory behavior in CAS. Compared to their TD peers, children with CAS have been found to display earlier and stronger anticipatory
(1) Stimuli or targets being analyzed Eight repetitions of four reduplicated syllables (/CVCV/) consisting of a fricative (/s/, or /ʃ/)
followed by a vowel context (/i/, or /u/; Nittrouer et al., 1989)
Six repetitions of 12 /dəˈCV/ syllables consisting of an initial consonant (/b/, /d/, /s/, and /x/) followed
by three final vowel contexts (/i, a, u/; Nijland et al., 2002)
Six repetitions of 12 /CVb/ syllables consisting of an initial fricative (/s, z, ʃ/) followed by three
final vowel contexts (/i, ɑ, u/; Maas & Mailend, 2017)
(2) Tasks used to elicit those targets Imitation (model produced by experimenter)
Accompanied by a picture (Nittrouer et al., 1989)
(3) Conditions in which responses are elicited Quiet, no time pressure
Items in isolation (Nittrouer et al., 1989)
Items in carrier phrase (“Hé /dəˈCV/ weer” [he…wIːr], “hey…again”; Nijland et al., 2002)
Items in carrier phrase (“It’s the /CVb/ again”; Maas & Mailend, 2017)
(4) The measures obtained from those responses Ratio of F2 frequencies in different vowel contexts (Nittrouer et al., 1989)
Ratio of F2 frequencies in different vowel contexts at /ə/ midpoint, /ə/ end, C onset, CV
transition onset, CV transition end, and V midpoint (Nijland et al., 2002)
Ratio of first spectral moment (Maas & Mailend, 2017)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: No
Reliability: Interinvestigator differences in segmentation: 12.2 ms; correlation between
segmentation markers: r > .78 (Nijland et al., 2002)
Interinvestigator differences in segmentation: 1.2 ms (onset) and 1.4 ms (offset); correlation
between segmentation markers: r > .99 (Maas & Mailend, 2017)
Validity and reliability of F2 values by a postprocessing procedure of outlier removal
(Nijland et al., 2002)
(7) Norm or reference data available? Reference data: F2 frequencies, fricative ratios, and vowel context ratios for /si/, /ʃi/, /su/, and
/ʃu/ are reported for eight participants per age group for adults (four males, four females)
and 3-, 4-, 5-, and 7-year-old TD children (Nittrouer et al., 1989)
Mean midpoints and width of ranges of F1 and F2 and variability of F2 of schwa and vowels
for children with CAS, TD children, and adult females are reported (Nijland et al., 2002)
F ratios and V ratios for /si/, /ʃi/, /su/, and /ʃu/ are reported for adults, TD children, and children
with SSD (Maas & Mailend, 2017)
Note. TD = typically developing; CAS = childhood apraxia of speech; SSD = speech sound disorder.
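The F2 ratio in row (4) is simple arithmetic: the mean F2 across repetitions in the /i/ context divided by the mean F2 across repetitions in the /u/ context, computed separately at each measurement point. The sketch below illustrates this with made-up values; the function name and all numbers are hypothetical and are not taken from the cited studies.

```python
from statistics import mean

def f2_ratio(f2_in_i_context_hz, f2_in_u_context_hz):
    """Mean F2 in the /i/ context divided by mean F2 in the /u/ context."""
    return mean(f2_in_i_context_hz) / mean(f2_in_u_context_hz)

# Hypothetical F2 values (Hz) across three repetitions at two measurement points.
fricative_ratio = f2_ratio([1900, 2000, 2100], [1600, 1500, 1700])  # during the consonant
vowel_ratio = f2_ratio([2700, 2800, 2900], [1100, 1000, 1200])      # at vowel midpoint
```

In this toy example, the ratio at vowel midpoint reflects the size of the vowel contrast, while a ratio above 1 during the preceding consonant would be read as an anticipatory effect of the upcoming vowel. A ratio of exactly 1 at that point would mean that F2 does not differ between the two vowel contexts.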
coarticulatory vowel effects during a preceding consonant (Maassen et al., 2001), display higher variability in the amount of coarticulation, and display reduced distinctions between different vowels (Nijland et al., 2002). Findings of reduced contrasts have been reproduced when studying fricative productions in children with SSD, independent of SSD subtype (Maas & Mailend, 2017). Abnormal (greater and reduced) coarticulation was observed only in children diagnosed with CAS (Maas & Mailend, 2017).
Locus Equation Metric
The locus equation metric was originally conceived by Lindblom (1963), as cited in Sussman, McCaffrey, and Matthews (1991), in the search for an invariant cue of place of articulation in stop consonants, independent of vowel context (Sussman et al., 1991; see Table 12). While initially based on voiced stops, it has been found to be an effective descriptor of place of articulation for consonants with other manners of articulation as well (Fowler, 1994; Sussman, 1994; Sussman & Shore, 1996; but see also Brancazio & Fowler, 1998) and has been shown to be stable across languages (Krull, 1988; Sussman, Hoemeke, & Ahmed, 1993). Furthermore, the measure has been shown to work in adults and in children as young as 1.5 years old (Chang, Ohde, & Conture, 2002; Gibson & Ohde, 2007; Sussman, Hoemeke, & McCaffrey, 1992; Sussman et al., 1996).
Locus equations are based on the correlation between the values of F2 at vowel onset and vowel midpoint in CV sequences for a given consonant across vowel contexts. Lindblom (1963) found that the relationship between F2 at onset and F2 at vowel midpoint can be described by a linear regression equation: F2_onset = k × F2_midpoint + c, where k is the slope of the regression line and c is the y intercept (the value where the regression line crosses the y-axis at x = 0; Lindblom, 1963, as cited in Sussman et al., 1991). Regression slope and y intercept can then be used to quantify anticipatory coarticulation in CV utterances, where a steeper slope (i.e., a larger value of k) and a lower y intercept (a smaller value of c) indicate more coarticulation (Krull, 1989). In general, regression slope and y-intercept values show a strong correlation. Alveolar and dental productions, for example, typically feature shallower slopes and higher y intercepts, while bilabials typically feature steeper slopes and lower y intercepts. Approximants, however, form an exception and typically feature slopes near zero with varying F2 onset loci exclusively described by varying y intercepts (Sussman, 1994; Sussman & Shore, 1996).
Table 12. Methodological details: locus equation metric (Sussman et al., 1992).
(1) Stimuli or targets being analyzed Three to six repetitions of 18 /CVt/ syllables consisting of an initial stop (/b/, /d/, and /g/) in
six vowel contexts (/i/, /I/, /æ/, /a/, /ʌ/, and /u/)
(2) Tasks used to elicit those targets Imitation (model produced by experimenter)
(3) Conditions in which responses are elicited Quiet, no time pressure; items in carrier phrase (“It’s a /CVt/ again”; Sussman et al., 1992)
(4) The measures obtained from those responses Regression slope and y intercept of the linear relationship between the frequencies F2 at
onset and F2 at vowel midpoint
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: No
Reliability: Interinvestigator differences in F2 frequencies: 97.2 Hz; correlation between
F2 measurements: r > .95
(7) Norm or reference data available? No
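The slope and y intercept in row (4) come from an ordinary least-squares fit of F2 at onset on F2 at vowel midpoint, pooled over the vowel contexts for one consonant. A minimal sketch follows; the function name and the F2 values are hypothetical, and the toy data are constructed to lie exactly on a line so that the result is easy to check.

```python
def locus_equation(f2_midpoint_hz, f2_onset_hz):
    """Least-squares fit of F2 onset = k * F2 midpoint + c; returns (k, c)."""
    n = len(f2_midpoint_hz)
    mean_x = sum(f2_midpoint_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_midpoint_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_midpoint_hz, f2_onset_hz))
    k = sxy / sxx            # regression slope
    c = mean_y - k * mean_x  # y intercept
    return k, c

# Hypothetical F2 midpoints (Hz) for one stop across six vowel contexts,
# with onsets generated from a known line (slope 0.5, intercept 900 Hz).
midpoints = [2300, 2000, 1700, 1300, 1200, 1000]
onsets = [0.5 * m + 900 for m in midpoints]
k, c = locus_equation(midpoints, onsets)
```

With real measurements the points scatter around the line, so a measure of fit would normally be reported alongside k and c; that step is omitted here.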
(1) Stimuli or targets being analyzed /CC…/ sequences comprising real words (e.g., “clock”)
(2) Tasks used to elicit those targets Read at self-chosen, habitual, rate; 10 iterations of each target word
(3) Conditions in which responses are elicited Quiet, no time pressure; items in carrier phrase (“I see…”; Kühnert et al., 2006) or
preceded by an article (“a…”; Timmins et al., 2008)
(4) The measures obtained from those responses Gestural overlap/lag = ((t_offsetC1 − t_onsetC2) / (t_offsetC2 − t_onsetC1)) × 100
(Kühnert et al., 2006)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? No
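The overlap/lag measure in row (4) can be sketched as follows, assuming that the flattened formula is a fraction of two differences: the interval between C2 onset and C1 offset, divided by the interval from C1 onset to C2 offset, expressed as a percentage. That reading, the function name, and the example times are assumptions for illustration; the original publication should be consulted for the exact definition.

```python
def gestural_overlap_percent(c1_onset, c1_offset, c2_onset, c2_offset):
    """Overlap (positive) or lag (negative) of C1 and C2 as a percentage
    of the total interval from C1 onset to C2 offset (assumed reading)."""
    total = c2_offset - c1_onset
    if total <= 0:
        raise ValueError("C2 offset must follow C1 onset")
    return 100.0 * (c1_offset - c2_onset) / total

# Hypothetical gesture times in ms for a /kl/ cluster.
overlapping = gestural_overlap_percent(0.0, 100.0, 80.0, 200.0)  # C2 starts before C1 ends
lagging = gestural_overlap_percent(0.0, 100.0, 120.0, 200.0)     # C2 starts after C1 ends
```

Under this reading, positive values indicate that the second consonant gesture begins before the first has ended, and negative values indicate a gap between the two.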
Ménard, Cathiard, Abry, & Savariaux, 2004; see Table 14). Two parameters indicative of lip rounding have been investigated in this respect, lip protrusion and lip constriction, of which the latter has been consistently shown to be more reliable (Noiray et al., 2010; Noiray, Cathiard, Ménard, & Abry, 2011; Ménard, Cathiard, Dupont, & Tiede, 2013). Where earlier studies used a combination of three-dimensional optical (infrared light) and video recordings, later studies rely on video-based registration only (e.g., Ménard et al., 2013).
In this technique, lip constriction is measured as between-lips area based on the labial contours. Speakers’ lips are marked with a blue lipstick to maximize visual contrast, and purpose-designed video analysis software automatically tracks and processes labial shapes. The time resolution depends on the camera, but the software doubles the frame rate of the camera (which means that, with modern ordinary equipment, rates of ≥ 60 Hz are easily attainable). The operationalization of anticipatory coarticulation is strongly intertwined with the stimuli, consisting of V1CnV2 sequences (/i/Cn/y/ or /i/Cn/u/), in which Cn varied from zero to three consonants. In these sequences, anticipatory vowel behavior is assessed through the relation between the total duration of the rounding gesture in the final vowel and the duration of the obstruence interval or, in other words, is measured by how early in the utterance lip rounding starts. The duration of the constriction gesture is based on the video data, with the onset marked by the point at which lip area shows a 10% decrease following the maximum area and the offset by the point of a 10% increase following the minimum lip area. The duration of the obstruence interval is based on the acoustic signal with V1 offset and V2 offset determined from the spectrogram.
The stimuli used by Noiray et al. (2004, 2010) contain intervocalic consonant sequences of increasing length, a design specific to their study, which tested theoretical hypotheses about the temporal expansion of lip rounding as a function of intervocalic obstruence interval duration (see Table 14). For children with CAS, some of these complex intervocalic consonant clusters might be (too) difficult to produce. In principle, however, any intervocalic consonant sequence could be used, as long as the consonants are phonologically neutral with respect to rounding and the clusters are phonologically legal in the testing language.
Articulatory Positioning: Mean Distance Across Set/Context
A first type of measure of coarticulatory influences on articulatory positioning is the absolute distance between the position of an articulator during the production of a speech sound in different contexts and has been mainly used to investigate lingual coarticulation (see Table 15). Distance measures can be based on tongue contour as a whole or on
Table 14. Methodological details: anticipatory lip rounding (Noiray et al., 2004, 2010).
(1) Stimuli or targets being analyzed 10–12 repetitions of /iCny/ sequences in which Cn varied from zero to three intervocalic
consonants (in French forming the names [iy], [isy], [iky], [iksy], [ikry], [itkry], [iskry],
and [iksty])
(2) Tasks used to elicit those targets Imitation, prompted by the experimenter
(3) Conditions in which responses are elicited Quiet, no time pressure; items embedded in carrier sentences
(4) The measures obtained from those responses Anticipatory lip rounding = total duration of the rounding gesture in the final vowel / duration of the obstruence interval
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? No
3016 Journal of Speech, Language, and Hearing Research • Vol. 62 • 2999–3032 • August 2019
(1) Stimuli or targets being analyzed 10 repetitions of CV syllables consisting of a fricative (/s/ or /ʃ/) followed by a vowel
context (/i/, /a/, and /u/; Zharkova et al., 2011, 2012)
Five repetitions of /ə/ preceded or followed by four CVC words consisting of two voiced
alveolar plosives /d/ with the vowels (/i/, /u/, /ɑ/ and /æ/; Kim et al., 2018)
(2) Tasks used to elicit those targets Reading/picture naming (text + image on screen; Kim et al., 2018; Zharkova et al., 2011,
2012)
(3) Conditions in which responses are elicited Quiet, no time pressure; items in carrier phrase (“It’s a…Pam”: Zharkova et al., 2011, 2012;
“Get…a puppy” and “Put a…here”: Kim et al., 2018)
(4) The measures obtained from those responses Mean across set/context distance (in millimeters) of tongue contours (Zharkova et al.,
2011, 2012) or tongue body position (Kim et al., 2018)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? Reference data: mean across set/context distance of tongue contours for /si/, /su/, /sa/
and /ʃi/, /ʃu/, /ʃa/ are reported for 10 participants per age group for adults (gender not
reported) and 6- to 10-year-old TD children (Zharkova et al., 2011, 2012)
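The mean nearest neighbor distance listed under measure (4) can be illustrated with a minimal sketch. It assumes that each tongue contour is available as a list of (x, y) points in millimeters; the function name and the one-directional averaging (from the first curve to the second) are illustrative assumptions and not the published analysis scripts.

```python
import math


def mean_nearest_neighbor_distance(curve_a, curve_b):
    """Mean Euclidean distance from each point of curve_a to its nearest point on curve_b.

    Both curves are sequences of (x, y) coordinates in millimeters.
    The direction (curve_a -> curve_b) is an assumption of this sketch.
    """
    if not curve_a or not curve_b:
        raise ValueError("both curves need at least one point")
    total = 0.0
    for xa, ya in curve_a:
        # Distance from this point to the closest point on the comparison curve.
        total += min(math.hypot(xa - xb, ya - yb) for xb, yb in curve_b)
    return total / len(curve_a)


# Hypothetical contours for the same fricative in two vowel contexts.
contour_si = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
contour_su = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
distance_mm = mean_nearest_neighbor_distance(contour_si, contour_su)
```

For single-point data such as an EMA coil position, each "curve" would contain one point, and the measure reduces to the plain Euclidean distance between the two positions.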
specific parts of the tongue (e.g., flesh-point markers on tongue tip, body and dorsum [EMA], highest point in tongue body [ultrasound], center point of contact [EPG]). Using ultrasound tongue imaging, Zharkova et al. (2011, 2012) quantified coarticulation as the mean nearest neighbor distance between tongue curves at midpoint of the production of the initial fricatives /s/ and /ʃ/ in two vowel contexts, calculated as the Euclidean distance from each point in one curve to the nearest point in the second, comparison curve. Coarticulatory distance between single points instead of contours, such as EMA coil position or EPG center point of contact, could be calculated in the same way.
A similar but slightly different approach was used by Kim, Coalson, and Berry (2018) in investigating articulatory measures of anticipatory and carryover lingual coarticulation in (/ə/)CVC(/ə/) sequences with EMA (see Table 15). Instead of comparing tongue position in two contexts, they compared each /ə/ production with the speaker-specific average over all repetitions at the temporal midpoint. The advantage of this approach is that it generates a data point for each utterance individually instead of each context pair and thus provides a context-independent measure of coarticulation. Coarticulation was measured at two positions in /ə/, at /ə/ midpoint and at /ə/ boundary, defined as onset (anticipatory) or offset (carryover) of /ə/, which were acoustically identified as the first or last glottal pulse. The two yielded the same pattern of results, although a direct comparison of the two versions of the measure in terms of sensitivity was not possible due to the small sample size (N = 7 female adult speakers; Kim et al., 2018).
Articulatory Positioning: Tongue Shape Ratio
Instead of a distance measure based on tongue contours or flesh points, Zharkova, Gibbon, and Hardcastle (2015) quantified coarticulation as the vowel context ratios of five different measures of tongue shape (curvature degree, curvature position, Dorsum Excursion Index, Tongue Constraint Position Index, and LOCa-i, a tongue bunch location index, which is further explained below; see also Ménard, Aubin, Thibeault, & Richard, 2012; Zharkova, 2013; see Table 16). The main purpose of their study was to compare ultrasound data collection with and without head stabilization (i.e., the ultrasound scanner mounted on a headset or handheld). The results indicate that the tongue shape measure LOCa-i is the most robust, as it was the only measure that was not affected by the absence of stabilization. LOCa-i captures the extent of tongue front and tongue back excursion and is calculated as the ratio of tongue height at 1/3 and 2/3 of the length of the tongue curve (measured from the tip). Higher values correspond to a more /i/-like tongue shape, and lower values correspond to a more /a/-like tongue shape (Zharkova et al., 2015).
The LOCa-i tongue shape ratio measure can be seen as the articulatory equivalent of acoustic F2/second moment ratios (see the F2 ratios and First Moment Ratios section) and is suitable for consonant–vowel (CV) or vowel-to-vowel (əCV) anticipatory coarticulation, albeit specifically designed for /i/ and /a/ vowel contexts. Task and elicitation procedures are similar to the mean distance across set/context measure (Zharkova et al., 2011, 2012; see Table 16).
With respect to the comparison between head-mounted or handheld ultrasound recording, the results from Zharkova et al. (2015) indicated that it was possible to collect reliable data without head mount in adolescents (N = 10; 13-year-olds). As the authors note, however, this might not hold for younger children. Until handheld recording has been conclusively shown to be reliable in that population, it is advisable to collect data with head stabilization when investigating coarticulation in younger children.
Articulatory Positioning: Coarticulation Degree
Another measure of coarticulation that has been used in recent ultrasound studies with children is coarticulation
(1) Stimuli or targets being analyzed Six repetitions of CV syllables consisting of a consonant (/p/, /t/, /s/, and /ʃ/) followed
by a vowel context (/i/ and /a/)
(2) Tasks used to elicit those targets Reading/picture naming (text + image on screen)
(3) Conditions in which responses are elicited Quiet, no time pressure; items in carrier phrase (“It’s a…Pam”)
(4) The measures obtained from those responses i/a ratio on the LOCa-i measure of tongue shape
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? No
Note. CV = consonant–vowel.
degree (Noiray et al., 2018; Rubertus & Noiray, 2018), which can be seen as the articulatory variant of the locus equations metric (see the Locus Equation Metric section; see Table 17). Similarly, coarticulation degree captures whether the positioning of an articulator during the production of a speech sound varies systematically depending on its position in the vowel context by means of a regression analysis. Unlike the acoustics-based equivalent, however, the articulatory measure was used not only for consonant–vowel (CV) anticipatory (Noiray et al., 2018) but also for vowel-to-vowel (VCə) carryover coarticulation (Rubertus & Noiray, 2018). Articulatory positioning was based on the highest point of the tongue body (horizontally) at the (acoustically determined) temporal midpoint of the segments of interest. Specifically, they measured whether tongue body height in the consonant and /ə/ varied systematically depending on the vowel by regressing the horizontal position of the highest point of the tongue body at C and V midpoint and /ə/ and V midpoint, respectively (see Table 17). Differences in coarticulation degree were expressed in regression coefficients, where a larger value (i.e., a steeper slope) indicates more coarticulation.
Inappropriate Prosody, Especially in the Realization of Lexical or Phrasal Stress
Background
Prosody; Lexical and Phrasal Stress
Prosody is difficult to define and may encompass different aspects of speech for different researchers and clinicians. For present purposes, we will not discuss the many different views of prosody but instead attempt to delineate the aspects of prosody that have received attention in the literature on AOS. To help delineate this domain, we will follow Shriberg and Kent (2013) in using the term prosody to refer to suprasegmental aspects of the speech signal that affect the linguistic or communicative structure of an utterance, such as stress, intonation, and pauses (see also Gerken & McGregor, 1998). Excluded from this definition and discussion are paralinguistic, suprasegmental aspects of speech that primarily provide information about the speaker or the speaking context, such as voice quality and overall loudness.
It is important to recognize that there is significant cross-linguistic variation in prosodic structure and the
Table 17. Methodological details: coarticulation degree (Noiray et al., 2018; Rubertus & Noiray, 2018).
(1) Stimuli or targets being analyzed Six repetitions of C1VC2/ə/ pseudowords, V consisting of the tense long vowels
/i:/, /y:/, /e:/, /u:/, and /o:/ and C consisting of /b/, /d/, /g/, and /z/ with C1V a fully
crossed set and C2 different from C1
(2) Tasks used to elicit those targets Imitation of prerecorded model
(3) Conditions in which responses are elicited Quiet, no time pressure; items preceded by an article (“eine…”)
(4) The measures obtained from those responses Coarticulation degree: mean within stimulus distance in tongue body position in
V and C1 midpoint (Noiray et al., 2018) and V and /ə/ midpoint (Rubertus &
Noiray, 2018)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? No
(7) Norm or reference data available? Reference data: Graphic displays of regression slope estimates are available for
vowel-to-/ə/ carryover coarticulation in consonant contexts /b/, /d/, and /g/ for
3-year-olds (n = 19), 4-year-olds (n = 14), 5-year-olds (n = 14), 7-year-olds
(n = 15), and adults (n = 13; mean age 23 years; seven females and six males) with typical
development (Rubertus & Noiray, 2018)
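As an illustration of how coarticulation degree can be expressed as a regression slope, the sketch below fits an ordinary least-squares line to the horizontal position of the highest tongue body point. It assumes one pair of values per token (position at vowel midpoint as predictor, position at consonant or /ə/ midpoint as outcome); the function name and this simple per-speaker fit are assumptions for illustration, and the cited studies may have used more elaborate statistical models.

```python
def coarticulation_degree(vowel_positions, segment_positions):
    """Ordinary least-squares fit of segment position on vowel position.

    vowel_positions:   horizontal tongue body position at V midpoint, one per token
    segment_positions: horizontal tongue body position at C (or schwa) midpoint
    Returns (slope, intercept); a steeper slope indicates more coarticulation.
    """
    n = len(vowel_positions)
    if n != len(segment_positions) or n < 2:
        raise ValueError("need two equally long lists with at least two tokens")
    mean_v = sum(vowel_positions) / n
    mean_s = sum(segment_positions) / n
    sxx = sum((v - mean_v) ** 2 for v in vowel_positions)
    if sxx == 0:
        raise ValueError("vowel positions must vary across tokens")
    sxy = sum((v - mean_v) * (s - mean_s)
              for v, s in zip(vowel_positions, segment_positions))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_v
    return slope, intercept


# Hypothetical positions (arbitrary units) for four tokens of one consonant.
slope, intercept = coarticulation_degree([10.0, 20.0, 30.0, 40.0],
                                         [15.0, 20.0, 25.0, 30.0])
```

In this made-up example, the consonant position moves half as far as the vowel position does, so the slope is 0.5. A slope near 0 would indicate that the consonant is produced in the same place regardless of the vowel.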
(1) Stimuli or targets being analyzed 50 Multisyllabic real words: three (n = 37), four (n = 12), five (n = 1) syllables
(2) Tasks used to elicit those targets Picture naming
(3) Conditions in which responses are elicited Quiet, no time pressure
(4) The measures obtained from those responses Percent stress matches (based on binary judgments of match/mismatch)
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: Percent stress matches among the two most discriminative measures in
distinguishing children with CAS from children without CAS (77%–80% accuracy;
Murray et al., 2015)
Reliability: Interrater reliability = 91.2% (Murray et al., 2015)
(7) Norm or reference data available? No reference data from children with typical speech. Reference data of children
without CAS but with other speech disorders (N = 15), M = 67.3%, SD = 22.4%,
and children with CAS (N = 28), M = 9.8%, SD = 9.1% (Murray et al., 2015)
studies use similar tasks and stimuli as perceptual studies, but some considerations are particularly pertinent in the context of the acoustic approach. We will first consider the effect of phonetic context in stressed and unstressed syllables. Listeners in a perceptual study have the advantage of using all the different cues of stress simultaneously. This allows listeners to weigh different cues differently, depending on the phonetic context. For example, in English, vowels are produced with a longer duration in a syllable with a voiced coda compared to a voiceless one. The acoustic measures of duration, amplitude, and F0 are obtained in isolation. Duration differences that result from phonetic context and from stress interact with one another and cannot be separated. In order to interpret a duration change as an effect of stress production, it is important that the phonetic context be controlled for a reliable comparison of stressed and unstressed syllables. In addition, the acoustic measures typically depend on a reliable identification of the nucleus of a syllable (the vowel or a syllabic consonant). The nuclei that are surrounded by stop consonants can be more reliably identified compared to those surrounded by liquids or glides.
In addition to phonetic context, acoustic analysis is particularly sensitive to variables related to phrase- or sentence-level prosodic factors such as phrase-final lengthening and citation intonation. Like phonetic context, these variables affect the same acoustic variables as stress and therefore interact with stress. For example, the last stressed syllable in the final foot of a phrase is subject to phrase-final lengthening. This lengthening can mask the effect of stress on duration in the case of trochees and inflate the effect in the case of iambs. Using a carrier phrase (e.g., “It’s a [stimulus] again.”) may help to circumvent this issue. Carrier phrases will also help to avoid citation intonation that people often use in picture-naming or single-word reading tasks (Ballard, Djaja, Arciuli, James, & van Doorn, 2012), where people raise their F0 at the end of the utterance as if requesting feedback. Shriberg et al. (2003) also reported that children were playfully varying the duration of the last syllable in iambs and spondees, which led the authors to exclude these items from analysis.
The acoustic measures are typically obtained from the nucleus of a syllable, and they include the duration of the segment, peak intensity, and peak F0. Sometimes, measures that relate to the timing of the peak F0 and/or amplitude are also included. The magnitude of stress is reflected in the comparisons of these measures between stressed and unstressed syllables: a greater difference reflects a more pronounced production of stress. Several different techniques have been developed to compare the acoustic measures of stressed and unstressed syllables.
Some studies have compared the raw values of duration, amplitude, and F0 between stressed and unstressed syllables (Nijland, Maassen, Van der Meulen, Gabreëls, et al., 2003; Skinder et al., 2000). For example, Nijland, Maassen, Van der Meulen, Gabreëls, et al. (2003) found a difference in the duration of unstressed syllables in iambic feet when children with CAS were compared to TD children. More specifically, while the duration of stressed syllables was comparable between the two groups, the authors found that children with CAS did not have shorter durations for unstressed syllables in these utterances. The shortcoming of comparing raw values of acoustic measures, such as syllable duration, is that this approach does not take into account individual variation in these measures between different groups. Children with CAS, for example, may have a decreased speaking rate compared to TD children. This systematic difference may interact with the duration differences that are related to stress.
Lexical Stress Ratio
Shriberg et al. (2003) proposed the lexical stress ratio (LSR), an approach to quantify stress production that takes advantage of acoustic correlates of stress (see Table 19). The LSR combines duration, intensity, and F0 into one composite score of stress. More specifically, the LSR is the sum of the ratios of three acoustic measures (frequency area under pitch contour trace, amplitude area under
(1) Stimuli or targets being analyzed Eight bisyllabic real-word trochees (eight iambs and eight spondees excluded
from analysis; see Shriberg et al., 2003)
(2) Tasks used to elicit those targets Imitation (from recorded audio model)
(3) Conditions in which responses are elicited Quiet, no time pressure; items in isolation
(4) The measures obtained from those responses Weighted average of ratios of frequency area, amplitude area, and vowel duration
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: No clear pattern of correspondence between LSR values and clinical
perceptual judgments of abnormal stress
Reliability: Interjudge differences: amplitude = 0.9–1.3 dB; F0 = 10.9–14.0 Hz;
duration = 16–18 ms
(7) Norm or reference data available? No reference data of children with typical speech. Range of LSR values for
children with (non-CAS) speech delay: 0.65–1.14 (Shriberg et al., 2003)
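The structure of the LSR as a weighted sum of three stressed-to-unstressed ratios can be sketched as follows. The weighting constants are not given in this section, so they are left as a required argument; the function name, the dictionary keys, and the example weights below are hypothetical and serve only to show the arithmetic.

```python
def lexical_stress_ratio(stressed, unstressed, weights):
    """Weighted sum of stressed/unstressed ratios for three acoustic measures.

    stressed, unstressed: dicts with keys 'f0_area', 'amp_area', 'duration'
    weights:              dict with the same keys (constants supplied by the caller)
    """
    keys = ("f0_area", "amp_area", "duration")
    total = 0.0
    for key in keys:
        if unstressed[key] <= 0:
            raise ValueError("unstressed value for %s must be positive" % key)
        # Ratio of the first (stressed) to the second (unstressed) syllable.
        total += weights[key] * (stressed[key] / unstressed[key])
    return total


# Hypothetical measurements for one trochee and placeholder weights.
first_syllable = {"f0_area": 30.0, "amp_area": 20.0, "duration": 300.0}
second_syllable = {"f0_area": 15.0, "amp_area": 10.0, "duration": 150.0}
placeholder_weights = {"f0_area": 0.5, "amp_area": 0.5, "duration": 0.5}
lsr = lexical_stress_ratio(first_syllable, second_syllable, placeholder_weights)
```

Because the weights are fixed per measure, the same acoustic difference always contributes the same amount to the score, whatever the phonetic or intonational context.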
rectified waveform contour trace, and duration) weighted by a constant. Although Hosom, Shriberg, and Green (2004) automated the calculation of LSR using automatic speech recognition, this indicator has not been widely used. While the LSR assigns different weights to the various acoustic domains and combines them into one indicator of stress, much like a human listener, it lacks a listener's flexibility. Human listeners may weigh different perceptual cues of stress differently, depending on the phonetic and intonational context, while the weights are constant in LSR.
Pairwise Variability Index
Finally, another measure that has been used to quantify stress production in children with and without CAS (Ballard et al., 2012, 2010; Shriberg, Jakielski, & El-Shanti, 2008) is the pairwise variability index (PVI; Low, Grabe, & Nolan, 2000; see Table 20). This index is calculated for each acoustic measure related to stress assignment (duration, intensity, and F0) separately, and it normalizes for the individual variability of speakers for these measures. PVI is calculated by the following formula (from Ballard et al., 2010): $\mathrm{PVI}_{\mathrm{dur}} = 100 \times \dfrac{d_k - d_{k+1}}{(d_k + d_{k+1})/2}$, where $d_k$ is the duration of the kth syllable (see Table 20). This formula illustrates the calculation of PVI for duration; the same formula can be used to calculate the PVI for other acoustic measures by replacing the duration with the measure of interest (e.g., intensity or F0).
Findings to date using the PVI indicate that this measure can reveal differences between speakers with and without AOS (in children and adults). For example, Shriberg et al. (2008) used the PVI to investigate timing and stress characteristics in the speech of three siblings with CAS and found a significantly poorer score in one of the three affected speakers, compared to their age-matched controls. With respect to adults with AOS, Vergis et al. (2014) analyzed lexical stress contrastiveness in polysyllabic words produced in isolation and in a carrier sentence by individuals with AOS + aphasia (AOS; n = 9), aphasia only (n = 8), and unaffected speakers (n = 8). The PVI was used to measure normalized relative vowel duration and peak intensity over the first two syllables of the polysyllabic words. The results showed that speakers with AOS had lower PVI vowel duration values for words with weak–strong stress produced in the sentence condition, compared to controls and individuals with aphasia, which was primarily attributed to disproportionately long vowels in the word-initial weak syllable for AOS participants. Similar findings were reported by Courson et al. (2012). Together, these findings demonstrate that the PVI might be a promising acoustic diagnostic tool in assessing dysprosody in AOS. Ballard et al. (2010) have further demonstrated that the PVI is strongly correlated with perceptual ratings of prosody.
The PVI has several advantages over other approaches of quantifying stress production in addition to normalizing for individual differences of the measures of interest. First, a study by Ballard et al. (2012) provides reference data for the PVI of duration, amplitude, and F0 in a cohort of 73 TD 3- to 7-year-old children. The authors used a picture-naming task to elicit polysyllabic words with Sw and wS stress patterns. According to the results, stress production was adultlike for words with a Sw stress pattern (e.g., “butterfly”) already by the age of 3 years. In contrast, even the older children in this cohort differed from adults in their stress production of words with a wS stress pattern (e.g., “potato”), at least with respect to duration and amplitude. These results are crucial for interpreting the stress production differences in children with CAS as they suggest that not all differences in stress production of wS words are a reason for concern in a 5-year-old while differences in words with Sw pattern may be reflective of a delay or disorder.
Second, Ballard et al. (2010) argue that analyzing different correlates of stress separately from one another is a strength of the PVI approach because it may provide a clinician with indications as to which aspect of stress production is most impaired in children with CAS and/or which aspect of stress is the best target in therapy. For example, Ballard et al. examined the effects of therapy, which emphasized only durational contrasts in stress production. While all
(1) Stimuli or targets being analyzed Multisyllabic nonwords (Ballard et al., 2010) or words (Ballard et al., 2012)
(2) Tasks used to elicit those targets Reading (Ballard et al., 2010) or picture naming (Ballard et al., 2012)
(3) Conditions in which responses are elicited Quiet, no time pressure, in isolation
(4) The measures obtained from those responses PVI for duration, amplitude, and/or F0
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: Strong correlations with perceptual ratings of prosody for three children with
CAS (Ballard et al., 2010)
Reliability: Intraclass correlation coefficients: .905 (duration), .996 (F0; Ballard et al.,
2012); interjudge Pearson r = .98 (duration); average difference, 1.22 ms (Ballard
et al., 2010)
(7) Norm or reference data available? Reference data available for ages 3–7 years and adults (Ballard et al., 2012). No
cutoff values specified, but means and SD available
participants with CAS (n = 3) showed improvement on the duration contrast, the contrast between other variables, such as intensity and F0, also improved.
With respect to the validity and reliability of the PVI, the following should be noted. At present, validation of the PVI as a measure of dysprosody in CAS is limited to the strong correlations between PVI and perceptual ratings of prosody in three children with CAS, as reported by Ballard et al. (2010). Clearly, further validation using (much) larger samples is needed, in particular, also to validate the PVI as a potential diagnostic marker for CAS (e.g., validation against other measures such as standardized maximum performance tasks). Also, in order to obtain PVI values, it is necessary to divide the stimuli into vocalic and intervocalic intervals based on acoustic information available from the waveform and spectrogram. As noted by White, Liss, and Dellwo (2011), variations exist in the approach of researchers to determine vocalic and consonantal information, although it is not clear whether or to what extent these measurement differences have an effect on the final result (Liss et al., 2009). In terms of reliability, Ballard and colleagues (Ballard et al., 2012, 2010) reported high Pearson and intraclass correlation coefficients and small interrater differences for their (small) sample, suggesting that PVI measures can be reliably obtained (see Table 20).
Articulatory Measures
Kinematic Pairwise Variability Index
Articulatory measures of prosody may include kinematic measures of movement amplitude and movement duration (e.g., Goffman, 1999, 2004; Grigos & Patel, 2007, 2010; Kopera & Grigos, 2019; see Table 21). Similar to acoustic measures, ratios of duration or amplitude and PVI can be computed to express the degree of differentiation between stressed and unstressed syllables. The same caveats and considerations discussed previously with respect to kinematic measures apply here as well. To date,
Table 21. Methodological details: kinematic pairwise variability index (PVI; Kopera & Grigos, 2019).
(1) Stimuli or targets being analyzed Two-syllable sequence puppy extracted from multisyllabic nonword puppypop
(2) Tasks used to elicit those targets Cloze sentence or response to question
(3) Conditions in which responses are elicited In story context with prop, no time pressure, in isolation
(4) The measures obtained from those responses PVI for jaw movement duration and amplitude
Scientific basis
(5) Standardized measurement protocol? No
(6) Validity and reliability of outcome measures? Validity: Not reported, except to the extent that the PVI movement duration distinguished
children with CAS from TD children, and children were designated as having CAS
based on perceived presence of prosodic abnormalities.
Reliability: Not reported
(7) Norm or reference data available? Only group means and standard deviations: PVI movement duration: TD = 33.1 (29.3),
SSD = 21.9 (18.3), CAS = 18.0 (30.4)
PVI movement amplitude: TD = 86.9 (62.0), SSD = 66.3 (45.5), CAS = 73.4 (65.8)
Note. CAS = childhood apraxia of speech; TD = typically developing; SSD = speech sound disorder.
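The PVI computation, whether applied to acoustic values or to kinematic values such as jaw movement duration and amplitude, can be sketched in a few lines. The sketch assumes the conventional normalized form, in which the difference between two adjacent syllables is divided by their mean and multiplied by 100; the function name and the example durations are hypothetical.

```python
def pairwise_variability_index(first, second):
    """Normalized PVI for two adjacent syllables.

    first, second: the same measure (e.g., vowel duration in ms, peak intensity,
    F0, or movement amplitude) for syllable k and syllable k + 1.
    Positive values mean the first syllable has the larger value.
    """
    mean_value = (first + second) / 2.0
    if mean_value == 0:
        raise ValueError("the mean of the two values must be nonzero")
    return 100.0 * (first - second) / mean_value


# Hypothetical vowel durations (ms) for a strong-weak and a weak-strong word.
pvi_sw = pairwise_variability_index(200.0, 100.0)  # strong then weak
pvi_ws = pairwise_variability_index(100.0, 200.0)  # weak then strong
pvi_flat = pairwise_variability_index(150.0, 150.0)  # no contrast
```

Because the difference is scaled by the mean of the pair, a speaker with a slow overall rate and a speaker with a fast rate obtain the same PVI when their relative stress contrast is the same.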