
Module 5:

Bilingual Speech Processing


Part 1
Speech Perception and
Comprehension
 In our daily lives, listening to speech is a skill that takes place quickly
and efficiently.

 This skill can also be referred to as processing and comprehending
spoken language.

 A human being is said to process 7 to 10 sounds per second, and about
175 words per minute, even in noisy surroundings.

 From perceiving the acoustic wave to building an all-encompassing
mental representation, the process of listening is complex in nature.
Architecture of the basic components of speech
processing

(Grosjean & Li, 2013: p. 30)


 The figure shows the basic components needed for speech
processing to take place.
 First, speech input (speech wave)
 Produced by the speaker.
 Also known as “bottom-up” information; hence it is at the bottom of the figure,
with an arrow pointing upwards.

 Second, “linguistic knowledge” and “processing mechanism” in the
middle
 “Linguistic knowledge”: knowledge that the listener has of the language
being spoken (grammar and lexicon)
 “Processing mechanism”: needed to process the input.
 Finally, “other sources of information”
 Used by the listener to perceive and understand what is being said.

 Includes extralinguistic information such as: the context in which the speech
situation is taking place, information regarding what has been heard so far,
knowledge about the world, etc.

 Also called “top-down” information; hence it is at the top of the figure with an
arrow pointing downwards.

 The final outcome of perception and comprehension is referred to as the Mental
(interpretative) Representation = the enriched meaning of what is being said.
Processing mechanism
 The processing mechanism includes the “processing levels” involved – from the
acoustic wave to the mental representation.

 These levels include:


 Speech perception and prosodic analysis
 Word recognition
 Syntactic and semantic processing
 Pragmatic processing

 Let’s understand each of these levels using an example:


“The librarian gave the book to Mary!”
 Speech perception and prosodic analysis:
 First, the phonetic units that make up the utterance are identified.

 So, “th” (/ð/) of “the”, “e” (/ə/), the “l” (/l/) of “librarian”, “i” (/aɪ/), “b” (/b/) and
so on.

 Then, the speech variables not included in phonetic segments, also known as
suprasegmentals, come under prosodic analysis.

 These include pitch, loudness, length, rhythm, etc.

 The prosodic analysis helps categorize the utterance as statement,


exclamation, or question.

 In this sentence, there is a stronger stress on “gave”, as well as a pitch that


shows surprise throughout the utterance.
 Word recognition:
 This process begins as soon as the speech sounds are identified by the
listener.

 As the word is uttered, it activates the listener’s lexicon, which includes all
words that match the speech signal.

 As the information compiles little by little, the mental lexicon removes the
unlikely candidates, and narrows down to the final possibilities among which it
selects the word in question.

 Once the word is mapped in the lexicon, information regarding its meaning,
morphology, grammatical category, and syntactic and semantic
structures becomes available to the listener.
 Syntactic and semantic processing:
 Here, the syntactic structure of the sentence is determined. This process is
known as parsing.
 Next comes the assignment of the thematic roles. This involves assigning the
semantic roles to the phrases in relation to the verb.

 Here, AGENT give THEME to the RECIPIENT.


 “The librarian” is the agent.
 “the book” is the theme.
 “Mary” is the recipient.

 The outcome of this level of processing mechanism is the literal mental


representation of the utterance.
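The thematic-role assignment for the example sentence can be sketched with a hand-coded verb frame. The frame for “gave” and the pre-parsed phrases are assumptions standing in for the output of real parsing:

```python
# Toy sketch of thematic-role assignment. The verb frame below is
# hand-coded; a real parser would derive both the phrases and the frame.

VERB_FRAMES = {
    "gave": ("AGENT", "THEME", "RECIPIENT"),  # X gave Y to Z
}

def assign_roles(verb, *phrases):
    """Pair each parsed phrase with the verb's thematic roles, in order."""
    return dict(zip(VERB_FRAMES[verb], phrases))

roles = assign_roles("gave", "the librarian", "the book", "Mary")
# {'AGENT': 'the librarian', 'THEME': 'the book', 'RECIPIENT': 'Mary'}
```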
 Pragmatic processing:
 Uses the prior knowledge or information that the listener has about the
utterance to compute the utterance’s enriched representation.

 This knowledge includes:


 The context
 The speaker
 The people being referred to, as well as what has been said before
 Knowledge of the world
 Rules of communication, etc.

 In this example, to understand it fully, we need to know that Mary has
repeatedly tried to borrow the book from the library but was unsuccessful. The fact
that the librarian finally gave her the book means something happened to make
it possible, hence the tone of surprise.
Some points to remember at this
stage
 All of these processes are complex and are subjects of research in their own right.

 Though the processes are mentioned in a serial manner here, they actually occur
quasi-simultaneously; they are near-parallel processes. (Marslen-Wilson, 1975)

 These are online processes, i.e. they occur as the utterance is taking place.
 It is also generally agreed that these layers of different processes are
interactive, i.e. there is an exchange of information between the levels of
processes.

 These basic processing components and levels are found in both monolingual
and bilingual speech comprehension.

 However, there are some differences when it comes to bilingual speech.


The factors responsible for differences

 First, bilinguals process two languages, hence some of the central
components of the model will be multiplied by two.

 Thus, the phoneme inventories may not overlap entirely, word order
may be different, morphological processes differ, and so on.
 Second, in most cases bilinguals’ knowledge of their two languages differs in
proficiency, and this will impact their comprehension and perception.

 Some languages may be less well perceived in adverse conditions.

 For example, bilinguals do worse than monolinguals in perceiving speech in
noisy environments.

(Lecumberri, Cooke & Cutler, 2010)


 Third, based on the linguistic structure of each language, the processing
mechanism might differ.

 For example, the case of tones: if a bilingual speaks two languages, one of
which is tonal, the strategies will be different in each language. Lexical
processing in one will need to take tones into account but not in the other.

 Similarly, differences at other levels, like syntax, will also affect the relative
processing speed and strategy.
 Fourth, the utterances to be processed may be in a monolingual or a bilingual
mode.

 In bilingual mode, the input may contain elements of both languages, such as
code-switches and borrowings, and the listener must also process these
guest-language elements.

 This issue is unique to bilinguals.

 It adds processing cost for the bilingual.


 Finally, even when only one language is being processed, a bilingual
cannot entirely suppress the impact of the other language at various
levels.

 This is because of the co-activation of these competing nodes from the


two languages.
 Bilinguals perceive and comprehend two or more languages in their daily
lives.
 The figure shows the speech processing mechanism in bilingual adults
when they listen to one language.

(Grosjean & Li, 2013: p. 33)


Activation of the other language: role
of bottom up information
 In the diagram, one language is not being heard (the deactivated one).
 Researchers have taken two positions in this case: the other language is active
vs. it is not active.
 The majority attest that both languages are active, even when the bilingual
is listening to only one.
 This means speech processing is non-selective.
 Evidence for non-selective access already exists in written/visual language
processing.
 Of late, such evidence has been provided in the case of auditory processing too, using eye
tracking.
Spivey & Marian (1999)

 Participants: Russian-English bilinguals


 Eye tracking: tracked the eye movement of the participants while they heard
auditory input command in Russian
 The board in front of the participants had four objects: stamp (marku in Russian),
marker, ruler and another filler object.
 Input: poloji marku nije krestika (put the stamp below the cross)
 Result: the interlingual competitor ‘marker’ received significantly higher eye
movement compared to other fillers.
 Reason: the object ‘marker’ and the target object ‘marku’ have shared phonetic
property.
Ju & Luce 2004
 Participants: Spanish-English bilinguals

 Eye tracking study: participants were asked to click on the picture that
matched the auditory stimuli

 Stimuli: Playa (Spanish): beach

 Manipulation: changing the Voice Onset Time (VOT) of the consonant ‘p’, to
make it sound like English ‘p’.

 This made the participants look at ‘pliers’ (competitor)


Is it possible to neutralize such
activation?
 Chambers and Cook (2009)
 English–French bilinguals
 Eye tracking study
 Target word: poule (chicken). Competitor object: pool
 Preceded by two types of sentences: restrictive vs. non-restrictive
 Marie va nourrir la poule (Marie will feed the chicken) – restrictive
 Marie va décrire la poule (Marie will describe the chicken) – non-restrictive
 Task: dragging the target object into the middle square
 Result: in the restrictive condition, eye movement to the competitor was
greatly reduced.
Marian & Spivey 2003
 This work showed that the activation of the other language can be neutralized.

 This study tried to create a monolingual mode by having experimenters who spoke
either Russian or English (posing as monolinguals)

 Also other changes were made to the contextual factors, like conducting Russian
and English experiments in different sessions etc.

 This time, they reported a reduction in eye movement to the competitor to a
significant level.
 This way, selective processing can also be activated.
 Thus, the processing strategy depends on a number of bottom-up and
top-down factors.
Permanent influence of one language
on another
 The sound categories of the dominant language can influence the weaker language.

 For example, if the dominant language has only one category corresponding to two of
the weaker one, then the two might get merged.
 Thus English /æ/ and /ɛ/ get assimilated into Dutch /ɛ/.

 Other studies involving different phoneme inventories across languages, gender
marking, etc. also point out the same phenomenon.

(Pallier, Colomé & Sebastián-Gallés, 2001; Guillelmon & Grosjean, 2001)


Model of spoken word recognition
 General considerations:
 In the monolingual mode: only one language node is activated.

 In the bilingual mode: both languages are active, but the base-language network is more
strongly activated.

 The resting activation level of the non-target language can be modified by various other factors.

 And, if an input unit has similarities in the other language, this will lead to delayed
processing.
BIMOLA (Lewy & Grosjean 2008)
Bilingual Interactive Model of Lexical
Access:

This model proposes three levels of nodes:
features, phonemes and words.

Feature level: shared by the two languages.

Phoneme and word levels: independent sets
within a larger system.

Features activate phonemes, and phonemes
activate words.

Activation connections between words and
phonemes are bi-directional.

Activation from the feature level to the
phoneme level is unidirectional.
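A deliberately tiny feed-forward sketch of BIMOLA's layering: one shared feature level feeding language-specific phoneme and word sets. The feature codes, the two-word lexicons, and the base-language boost below are invented; the real model is a full interactive-activation network with resting levels and word-to-phoneme feedback, which this sketch omits:

```python
# Shared features feed language-specific phonemes, which feed words.
# All inventories below are invented for illustration.

FEATURES = {"voiced", "labial"}                      # features in the input

PHONEMES = {                                         # phoneme -> driving features
    "fr": {"b": {"voiced", "labial"}, "p": {"labial"}},
    "en": {"b": {"voiced", "labial"}, "p": {"labial"}},
}
WORDS = {                                            # word -> its phonemes
    "fr": {"bol": ["b"], "pont": ["p"]},
    "en": {"ball": ["b"], "pin": ["p"]},
}

def word_activation(lang, base_boost=0.0):
    """One feed-forward pass: features -> phonemes -> words."""
    phon = {p: len(f & FEATURES) for p, f in PHONEMES[lang].items()}
    return {w: sum(phon[p] for p in ps) + base_boost
            for w, ps in WORDS[lang].items()}

fr = word_activation("fr", base_boost=0.5)  # base language: head start
en = word_activation("en")                  # other language: still active
```

Both lexicons end up with non-zero activation, but the base-language words lead, mirroring the more strongly activated base-language network.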
BIMOLA AND BIA
 These two models both concern bilingual lexical access: BIA is based on visual
word recognition and BIMOLA on auditory word recognition.

 Both are computer models of processing.

 BIA is based on Dutch and English, whereas BIMOLA simulates French and English
words.

 There are some other important internal, feature-dependent differences too.

(Thomas & Van Heuven, 2005)


Speech processing in Bilingual
adults
 Previously, the bilingual population was considered problematic in terms of
linguistic research (Antoniou, 2018).

 However, later studies showed the key role bilinguals play in understanding
the effects of language experience on speech perception.

 Research on bilingual adults examines:

 The influence of the two languages on one another in the process of perception
 The direction of language influence: L1 to L2, or L2 to L1
 The degree of influence.
Factors affecting Bilinguals’
speech perception
 Common misconception: a “true bilingual” should have equal command
of both languages, like two monolinguals in one person.

 However, under certain circumstances, a bilingual’s two languages will
be activated and influence one another, making bilinguals deviate from
monolinguals.

 The degree of language influence is determined by the following factors:


1. Age of Acquisition
2. L1:L2 Usage patterns
3. Language dominance
4. Language Mode
1. Age of Acquisition
 It is the age at which an individual is first exposed to L2.

 Can be very early in infancy or late in life.

 For example:
 Simultaneous bilinguals: exposed to both languages within first year of life
 Heritage speakers: AOA corresponds to the age when they first attend preschool
or school.
 Immigrant populations: AOA is used interchangeably with “age of arrival in the host
country”. It might occur in childhood or adulthood.
Flege, MacKay and Meador (1999)
 Examined perception of English vowels by Italian-English bilinguals of varying
AOA in Canada.
 72 Participants of varying AOA:
 Early (7 years old)
 Mid (14 years old)
 Late (19 years old)

 Task: discriminate a range of vowel contrasts between

 two English vowels, like /ɪ/–/i/
 English and Italian vowels, like /æ/–/a/
 two Italian vowels, like /u/–/o/

 3 stimuli were presented on each trial. Participants had to indicate if the 3 stimuli
contained a different vowel or the same vowel.
 Result:
 English monolinguals discriminated English-English and English-Italian contrasts more
accurately than the bilingual groups.

 Strong effect of AOA observed for Italian-English bilinguals.

 Discrimination of English-English and English-Italian contrasts worsened as AOA


increased: early bilinguals showed best discrimination, followed by mid bilinguals,
and lastly, the late bilinguals.

 Conclusion: As the AOA increases, L2 vowels perceived less accurately.


Stölten, Abrahamsson, and Hyltenstam (2014)
 Investigated AOA on categorical perception of stop voicing contrast in
Swedish word initial stops

 15 native Swedish monolinguals, 41 Spanish-Swedish bilinguals


 31 early Spanish-Swedish bilinguals (AOA 1-11 years)
 10 late Spanish-Swedish bilinguals (AOA 13-19 years)

 Task: 3 continua were created for Swedish voiced vs. voiceless contrasts

 /b/–/p/, /d/–/t/, /g/–/k/
 The task was to categorize each continuum step as voiced or voiceless.
 Result:
 Native speakers showed clear category boundary for all 3 contrasts

 Late bilinguals’ categorization deviated most from native speakers

 Early bilinguals’ responses were somewhere in between

 Conclusion: the degree of deviation from native speakers’ categorization


increased with AOA
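The categorization pattern above can be illustrated with a logistic response curve over a voice onset time (VOT) continuum. The boundary location and slope values are illustrative only: a steep slope mimics the native speakers' sharp category boundary, a shallow slope the less categorical responses of the late bilinguals:

```python
import math

def p_voiceless(vot_ms, boundary=25.0, slope=1.0):
    """Probability of a 'voiceless' response at a given VOT (ms).
    Boundary and slope are illustrative, not measured values."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))

# A sharp (native-like) and a shallow (less categorical) listener:
sharp = [round(p_voiceless(v, slope=2.0), 2) for v in (0, 20, 25, 30, 50)]
shallow = [round(p_voiceless(v, slope=0.2), 2) for v in (0, 20, 25, 30, 50)]
# The sharp listener flips abruptly near 25 ms; the shallow one
# changes gradually, i.e. a less distinct category boundary.
```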
2. L1:L2 Usage patterns

 The L1:L2 usage ratio refers to the respective amount of communication that takes place
in an individual’s L1 and L2.

 High levels of continued L1 usage might reflect limited exposure to the L2, and vice
versa.

 In general, high usage of a language, be it L1 or L2, is associated with greater


proficiency and has subsequent consequences for speech perception.

 To understand L1-L2 usage pattern, it is necessary to compare bilinguals who


have the same AOA but differ in their L1:L2 usage ratios.
Flege and MacKay (2004)
 A series of 4 experiments were performed on Italian-English bilinguals of varying
L1:L2 usage patterns, namely:
 Native Italians living in Canada for up to 3 months
 Canadian English monolinguals and Italian-English bilinguals living in Canada for years
 Italian-English bilinguals who had either arrived in Canada early (3-13 years) or late (15-
28 years)

 Tasks included:
 Task 1, 2 & 3: discriminate several English vowel contrast
 Task 4: identify vowels in phrases they heard and read and indicate if they were
produced correctly. These contrasts typically did not occur in Italian.
 Results:

 Early bilinguals are more accurate in discriminating English vowel contrasts


than late bilinguals.

 Bilinguals reporting low L1 (Italian) use tended to discriminate English vowel


contrasts better than those with high L1 use.
3. Language Dominance

 “Bilinguals most often have a dominant, or stronger, language in which they have
attained an overall higher level of proficiency, which they use more frequently, and
across a wider range of domains.” [Silva-Corvalán and Treffers-Daller, 2016]

 Language dominance of bilinguals is not fixed but can change over time.

 As a bilingual’s environment changes, so will their need for particular language skills.
Amengual and Chamorro (2015)
 Aim: investigate role of language dominance and see if the Galician vowel
system becomes more Spanish-like due to extended exposure to Spanish.

 Participants: 54 Spanish-Galician early bilinguals.

 Task: Categorize Galician front and back mid vowel contrasts in an


identification task.

 Findings: Galician-dominant (but not Spanish-dominant) bilinguals have


established two separate vowel categories for front and back mid vowels. Thus,
language dominance strongly predicts perception abilities of bilinguals.
Casillas (2015)
 Aim: To compare the speech perception abilities of English monolinguals and
Spanish-English bilinguals of varying AOA.

 Participants:
 English monolinguals
 Spanish-English late bilinguals
 Early Spanish-English bilinguals

 Task: Categorize the 11 steps of vowel continuum of southwestern American


English tense-lax high front vowel contrast.
 Results:
 English monolinguals and English-dominant bilinguals more consistent in
assigning categorical labels.

 Spanish-dominant bilinguals showed a less sharp categorical boundary.

 The English monolinguals and English-dominant bilinguals placed greater weight


on vowel spectrum properties than vowel duration.

 On the other hand, Spanish-dominant bilinguals relied more on vowel duration.


4. Language mode

 So far we have covered patterns of language acquisition and usage.

 Language mode (Grosjean 2001, 2008) refers to the state of activation of the
languages of a bilingual, and the language processing mechanisms at a given point in
time.

 Bilinguals will be in monolingual mode:


 When interacting with a monolingual speaker of one of their languages
 The other language is said to be deactivated, though not completely.
 Bilinguals will be in bilingual mode:
 When interacting with bilingual speaker of the same two languages
 Both languages are activated, and code-switching can occur. However, one
language is used for processing and is thus more active than the other.

 Factors responsible for position of a bilingual on the language mode


continuum are:
 Person being spoken to
 Situation of the conversation
 Form and content of the message
 Function of communication
Garcia-Sierra et al. (2012)
 Aim: to investigate relationship between speech perception and language mode
using event-related potentials.

 Participants: Spanish-English bilinguals.

 Task: Participants had to identify deviant stimuli within a repeated sequence of
standard stimuli. The mismatch negativity (MMN) was used to observe the response.

 Results: The MMN changed as a function of the base language.

 It further showed that experimentally manipulating language mode causes
bilinguals to perceive the same physical stimulus as belonging to 2 distinct
categories.
Speech processing in Bilingual
children
 For children, learning a new language starts from listening to the sounds and
rhythms of the language.

 Young infants are called “universal listeners” because of their broad speech
perception, which helps them acquire any native language.

 Over time, listeners start to specialize in attending to just the sound contrasts of
their own languages, called perceptual narrowing.

 Speech perception is a complex process in bilingual children.

 It can occur in two ways:


 Simultaneous
 Sequential
Simultaneous Bilinguals
 Children tune their perceptual systems to 2 native languages from
birth.

 Have less exposure to each language than monolinguals since their time
is divided between the 2 languages

 Must mentally represent 2 languages, increasing number of perceptual


categories they must learn.
 Their exposure is “noisy”.
 E.g., bilingual children often have bilingual caregivers, who may speak with an
accent in one or more languages.

 Simultaneous bilinguals need to discriminate and separate their languages


and engage in language specific processing.
Sequential Bilinguals
 When they begin learning a second language, their first language is
mostly acquired, but not fully adult-like.

 They acquire new perceptual system on top of their still-developing L1


system.

 Children’s perceptual sensitivities begin to decline sharply in the first
year of life, and this decline continues for some years.

 Therefore, when children come across a new language, their perceptual


systems will neither be as open as infants’ nor as well-established as
adults’.
 The figure shows the process of perceptual narrowing:

Source: https://static1.squarespace.com/static/555906d9e4b0251d92d5635a/t/
5bb7d0c7104c7b66055d47b8/1538773192178/2018_Byers-Heinlein_Chapter8_SpeechPerception.pdf
Kuipers and Thierry (2012)
 Investigated whether English-Welsh bilinguals could discriminate between
words in each of their languages.

 Participants: 2-3 years old English-Welsh bilinguals

 Task:
 Children shown series of familiar pictures labeled in one or the other language.
 Several trials of one language followed by one trial in other language.

 Result:
 ERP showed that the bilinguals and the control group of English monolingual
toddlers detected language change
 However, response timings showed a difference, suggesting difference in
processing.
Gervain and Werker (2013)

Source: https://static1.squarespace.com/static/555906d9e4b0251d92d5635a/t/
5bb7d0c7104c7b66055d47b8/1538773192178/2018_Byers-Heinlein_Chapter8_SpeechPerception.pdf

 Head-turning preference procedure used to study 7-month old bilingual


infants learning 2 languages with different word orders, English and
Japanese.
 English is VO language, whereas Japanese is OV language.
 Eg. English: eat an apple (verb–object word order, VO)
 Japanese: ringo‐wo taberu “apple eat” (object–verb word order, OV)

 Task:
 Infants were first made to hear a stream of nonsense syllables consisting of
frequent and infrequent syllables.
 Then, they were played syllable streams characteristic of either an
OV or a VO parsing strategy.
 Results:
 When bilinguals heard the stream with OV prosody, they tended to look longer, indicating surprise.
 When they heard the stream with VO prosody, they showed the opposite pattern of
looking.
 Meanwhile, the monolinguals were not able to parse the stream with OV prosody, and
defaulted to a VO parsing strategy, even when they heard a stream with no prosody.

 These results together suggest that bilingual infants separate their languages
based on prosody, and use this information to form expectations about the
word order they are hearing.
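The two parsing strategies contrasted above can be sketched as frequency-based chunking of a syllable stream. The stream, the syllables, and the pairing rule are invented; they only illustrate the idea that a frequent (functor-like) syllable can be treated as unit-initial (VO-like) or unit-final (OV-like):

```python
from collections import Counter

# Invented stream: "ka" is the frequent, functor-like syllable,
# interleaved with rare, content-like syllables.
STREAM = ["ka", "bi", "ka", "do", "ka", "mu", "ka", "pe"]

def parse(stream, strategy):
    """Pair up syllables so the frequent one is unit-initial ("VO")
    or unit-final ("OV")."""
    frequent = Counter(stream).most_common(1)[0][0]
    # shift the chunking frame by one syllable if needed
    offset = 0 if (strategy == "VO") == (stream[0] == frequent) else 1
    return [tuple(stream[i:i + 2]) for i in range(offset, len(stream) - 1, 2)]

vo = parse(STREAM, "VO")  # [('ka', 'bi'), ('ka', 'do'), ('ka', 'mu'), ('ka', 'pe')]
ov = parse(STREAM, "OV")  # [('bi', 'ka'), ('do', 'ka'), ('mu', 'ka')]
```

The same flat stream yields different word-like units depending on which prosodic/frequency expectation the listener brings to it.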
Consonant Perception

 Different languages use different phonemes, consonant contrasts and


vowel contrasts.

 By the end of their first year, bilingual infants perceive many consonant
contrasts that are important in their native languages.

 Studies have found that bilinguals are better than monolinguals at


discriminating non-native contrasts.
 This could be because:
 Delayed perceptual development of bilinguals. Both monolinguals and
bilinguals follow same pattern of perceptual narrowing, bilinguals take longer
to reach mature stage of speech perception.

 Bilinguals have a perceptual advantage. They maintain enhanced sensitivity


to speech sound contrasts as adaptation to their environments.

 Now the question is, do bilinguals form one intermediate consonant


boundary or do they maintain different consonant boundaries for each
language?
Ferjan-Ramírez et al. (2016)
 Aim: To investigate how bilingual infant brains respond to /t/ and /d/.
 Participants: 11-month old Spanish-English bilinguals and English
monolinguals.
 Task: Infants were made to hear an intermediate sound repeatedly and their
response was measured using magnetoencephalography (MEG)

Source: https://static1.squarespace.com/static/555906d9e4b0251d92d5635a/t/
5bb7d0c7104c7b66055d47b8/1538773192178/2018_Byers-Heinlein_Chapter8_SpeechPerception.pdf
 Measurement was done in two time windows: 100-260 ms & 260-460 ms
after the stimulus.

 Results:
 For /t/ (the English contrast), no differences found between monolinguals and
bilinguals in either time window.
For /d/ (the Spanish contrast), bilinguals showed a stronger neural response
than the monolinguals in both time windows.

 Thus, infants’ brains responded most to the languages they were


acquiring.
Vowel Perception

 Just like consonants, different languages also use different vowel


contrasts.
Albareda‐Castellot, Pons, and Sebastián‐Gallés (2011)

 Aim: To investigate bilingual infants’ discrimination of the /e/–/ε/


contrast, using an anticipatory eye‐movement paradigm.

 Participants: 8-month-old Catalan-Spanish bilingual infants.

 Task:
 Infants watched an animated Elmo character disappear into a T‐shaped
tunnel, and then emerge on one side or the other.
 The /e/ sound predicted Elmo’s reappearance on the right side, while hearing
the /ε/ sound predicted Elmo’s reappearance on the left side (or vice versa).
 Result: Both monolingual Catalan and bilingual Catalan–Spanish infants
succeeded, which they could only do if they perceived the difference
between the two sounds.
Child L2 Learners
 Studies on phonological development in child L2 learners predate similar
studies on bilingual infants because:
 Children can follow instructions, push buttons, and answer verbally.
 Whereas infants’ phonological knowledge must be inferred indirectly from
brain responses and looking time.

 One of the earliest studies was a longitudinal one conducted by Snow
and Hoefnagel‐Höhle (1978) to test the critical period hypothesis.

 Participants: Native English‐speaking children (aged 3–15 years) and adults
who had recently moved to the Netherlands.
 Task: Many aspects of participants’ L2 knowledge tested, including
grammar, pronunciation, story‐telling, most importantly, auditory
discrimination.

 Results:
 All groups improved over the course of testing.
 Children aged 12–15 did the best of any group, and performed similarly to
native Dutch‐speaking children, even during the first testing session – a short
6 months after arriving in the Netherlands.

 The conclusion was that, contrary to the critical period hypothesis, second
language acquisition (SLA), including phonetic perception, was most rapid for
children aged 12–15 years.
Consonant Perception

 Studies of children’s L2 consonant perception have revealed various
developmental patterns:
 In some cases, children seem to acquire non‐native contrasts rather quickly,
while in other cases they show more difficulty.

 Evidence shows that children’s perception of L2 consonants improves rapidly,


particularly compared to adult L2 learners.
McCarthy et al. (2014)
 Studied a group of English‐learning children who had grown up in a
Sylheti‐speaking community in London (Sylheti is a language from
Bangladesh).

 Aim: English‐learners were compared to a group of native English‐


speaking monolingual children on their consonant perception of two
consonant pairs: /k/–/g/ and /b/–/p/.

 Participants: Children were around 4.5 years of age.

 Task:
 Children heard a word, across several trials, and had to point to a picture
representing the word they had heard.
 Children were presented with words across a range of voice onset time
values, so that researchers could pinpoint where they placed their boundaries.
 Results:
 The native English speakers and the L2 learners showed very similar
performances overall.

 Both groups located the consonant boundaries in approximately the same


place and became both more consistent and more adult‐like in their
categorization abilities over time.

 These results would suggest that children quickly acquire the perceptual skills
to process L2 consonants.
Aoyama et al. 2008
 This study compared Japanese children’s and adults’ learning of English
consonant sounds.

 Participants: Japanese speaking children, aged 6–14 and their parents.

 The consonants investigated were the sounds /s/ (as in “sink”) and /θ/ (as
in “think”).

 Task:
 The participants heard three sounds and had to indicate which of the three was
different from the other two.
 Both groups were tested twice: first, around six months after arrival in the
United States and then around 1.5 years after arrival.
 Results:
 Both the Japanese children and adults showed difficulty with this task
compared to native English‐speaking adults and children.

 Over time, they showed improvement; even then, the Japanese children
performed much worse than all the other groups.

 These results suggest that it may take some time for child L2 learners to
accurately perceive some L2 speech sounds.
Vowel perception

 In some cases, children accurately produce L2 sounds that they seem
to have difficulty perceiving.

 Let us look at some studies that have investigated the same.


Darcy and Krüger (2012)
 Examined the vowel perception and production in children growing up in
Germany who spoke Turkish at home.

 Participants: Children aged 9–12 years and had started learning German
in preschool.

 Task:
 Children tested on their discrimination of several different German vowels
previously classified as either difficult or easy for monolingual Turkish adults to
discriminate.
 Children heard three different robots each produce a nonsense word (e.g.,
“kak”, “kak”, “kek”), and had to decide which robot was saying something
different.
 Results:
 Bilingual children were less able to discriminate the difficult contrasts, but
equally able to discriminate the easy contrasts, as compared to the control
group of German monolingual children.

 This suggests that, despite their early and ongoing exposure to German over
many years, they still perceived speech sounds differently than monolinguals.

 When measured on their ability to produce the same vowels in real


German words, monolinguals and bilinguals showed highly similar
production.
 The bilinguals were able to produce vowel contrasts that they were not
able to perceive.
One possible explanation:
 Production task used familiar German words, whereas the perception
task used nonsense words, which are unfamiliar.
 If L2 learners have German categories that are present but less stable
than those of monolinguals, they may have more difficulty applying
them in the contexts of nonsense words than familiar words.
 Implying that, in everyday speech, these children’s vowels can “pass”
for native, even though their perceptual categories may not be identical
to those of native speakers.
Tsukada et al. (2005)
 Aim: To investigate differences in how learners perceive versus produce
vowels.

 Participants: Children aged 9–17 years and adults, who had been in an
English‐speaking country for either 3 or 5 years.

 Task: Production and perception of several different English vowel pairs


examined.

 Results:
 Bilingual children perceived the vowels more accurately than bilingual adults,
but were less accurate than monolingual English‐speaking children.
 Children who had been in the country for a longer duration did significantly
better than those who had been in the country for a shorter duration, while
this had no effect for adults.
 The study also revealed that unlike their performance on the perception
task, bilingual children’s productions of the same vowels were just as
accurate as monolingual children’s productions.
Base language effect in categorical
perception and speech production
 When interacting with another bilingual of the same 2 languages, a
bilingual will be in bilingual mode.

 Here, both languages will be activated, and code switching will take place.

 However, only one language serves as the main language of processing and communication.

 This language is known as Base Language.

 This base language will be more active than the other, which is known as the guest language.
 In bilingual mode bilinguals choose a base language and can bring in the
other language in a number of ways:
 They can shift to that language for a word, phrase, or sentence. That is, they
can code-switch.

 They can borrow a word or a short expression from the other, less activated, language and adapt it morphologically, and even phonologically, into the base language.
Bilingual speech production
 The bilingual’s language production system is dynamic.

 Activation states of the non-target language might differ based on factors like language mode.

 This can be seen with phoneme monitoring experiments.

 Colome (2001):

 Three experiments were conducted in which 83 highly fluent Catalan-Spanish bilinguals had to decide whether a certain phoneme was in the Catalan name of a picture.
The task
 Phonemes could be either part of the Catalan word, part of its Spanish translation, or
absent from both nouns

 Picture: taula (‘table’ in Catalan; Spanish translation: mesa)

 Phonemes :
 /t/ related
 /m/ cross language relation
 /p/ non related
 Participants took longer to reject the phoneme appearing in the Spanish word than the control one.
 The same pattern was replicated at different stimulus onset asynchronies (-2000, +200, +400), which led the authors to conclude that both the target language and the language not in use are simultaneously activated.
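The phoneme-monitoring logic above can be sketched as a simple reaction-time comparison (illustrative Python; the latencies and condition names are hypothetical, not Colome's data):

```python
from statistics import mean

# Hypothetical "no"-response latencies (ms) for one group of
# participants; real values would come from the monitoring task.
latencies = {
    "cross_language": [612, 598, 645, 630, 607],  # e.g. /m/ of Spanish "mesa"
    "control":        [560, 548, 575, 552, 566],  # e.g. unrelated /p/
}

def mean_rt(condition):
    """Mean reaction time (ms) for one rejection condition."""
    return mean(latencies[condition])

# Colome's signature result: rejecting the phoneme of the Spanish
# translation takes longer than rejecting an unrelated phoneme,
# evidence that the non-target language is also active.
interference = mean_rt("cross_language") - mean_rt("control")
print(f"cross-language cost: {interference:.1f} ms")
```

The cross-language "cost" (here 58.2 ms with the made-up numbers) is the quantity whose presence or absence distinguishes the studies discussed below.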

 This study was replicated by Hermans, Besselaar and Van Hell (2011).

 Their stimulus set differed slightly from the previous study.

 The filler object names were divided into cognate and non-cognate words.

 Language pair: Dutch-English bilinguals.


 Example:

 Picture: bottle
 Target phonemes: /b/ affirmative condition
 /f/ cross language condition
 /p/ non related

 The critical pictures were chosen in such a way that their English names had non-cognate translation equivalents in Dutch (like bottle, Dutch fles).

 In this first experiment, the filler pictures were those whose names were also non-cognates.

 The results showed no difference between the cross-language and non-related phoneme conditions in response latency or accuracy.
 Conclusion: non target language is NOT activated.
 Part 2 of the experiment:
 Here the filler pictures were cognates, like:
 moon/maan; mouse/muis, etc.

 This time, the two critical conditions differed in both response latency and accuracy.
 The cross-language condition took longer and was less accurate.
 Conclusion: in this condition, the non-target language WAS activated.

 Thus, the activation level of the non-target language can be manipulated. It is not static but dynamic.
 Similar findings have been reported by other studies too.

 Use of materials, interlocutors, code-switching behaviour, etc. have been found to have an impact on language production behaviour.

 (Grosjean 2008; Weil 1990; Caixeta 2003; Fokke et al. 2007)


Language interaction in terms of
phonological transfer
L1–L2 transfer

 Types:
 Segmental transfer
 Featural transfer
 Suprasegmental transfer
 Phonotactic transfer
 Segmental transfer:

 Among the first studies investigating this were Goto (1971) and Miyawaki et al. (1975).

 Difficulty of Japanese speakers in differentiating between English liquids /r/ and /l/.

 Goto used natural stimuli and tested both perception and production.
 Miyawaki used synthetic stimuli and tested perception.

 Both studies reported low accuracy scores for Japanese speakers in both perception and production.
 Similar studies have been carried out in recent times as well.
 Ingvalson, McClelland & Holt (2011) showed that even Japanese speakers living in an English-speaking environment for many years could not distinguish the two sounds.

 Similar findings are reported from other studies investigating L2-specific consonant contrasts.

 French native speakers (NS): /d/ vs. /ð/ and /r/ vs. /w/ contrasts in English

 English NS: Mandarin Chinese affricate–fricative contrast
 Spanish NS: Catalan contrast /e/ vs. /ɛ/, etc.

 Studies like this highlight the differences between L1 and L2 phonemic inventories and the resultant transfer of L1 phonemes to L2 during processing.

(Sundara, Polka & Genesee 2006; Halle, Best & Levitt 1999; Tsao, Liu & Kuhl 2006)
 Featural transfer

 Distinctive features like length, stress, etc. are used in some languages but are absent in others.

 L2 learners are typically found to have difficulty if these features are present in the L2 but not in the L1.

 Length:
 In some circumstances, speakers whose L1 does not make use of this temporal feature have been found to be able to perceive it in their L2
 Russian-Estonian bilinguals

(Meister & Meister 2011)


 In some other cases, the same was not observed.
 E.g. Catalan learners of English (Cebrian 2006)

 English learners of Japanese find it difficult to distinguish between Japanese long and short
consonants and vowels.

 Similarly, Finnish and Swedish ESL learners have difficulty perceiving the syllable-final fricative contrast /s/ vs. /z/ (peace vs. peas).

 Temporal inaccuracy, the location of the boundary between long and short segments, etc. have also been investigated.

(Callan et al. 2006; Han 1992; Hardison & Saigo 2010; Hirata 2004; Flege, Munro & Skelton 1992)
 For example, English-speaking learners of French produced longer VOTs than French NS (native speakers).

 Mandarin- and Korean-speaking ESL learners were found to produce longer vowel durations than English NS, and so on.

 Spanish ESL speakers show temporal inaccuracy in producing English vowels.

(Chen 2003; Ingram & Park 1997; Shah 2004)
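The VOT measure underlying these comparisons is simply the interval between the stop release (burst) and the onset of voicing. A minimal sketch with hypothetical timings (the values and function name are illustrative, not taken from the cited studies):

```python
def vot_ms(burst_time_ms, voicing_onset_ms):
    """Voice onset time: voicing onset minus stop release (burst).
    Positive values = voicing lags the burst (long-lag, e.g. English /t/);
    near-zero or negative = short-lag or prevoiced (e.g. French /t/)."""
    return voicing_onset_ms - burst_time_ms

# Hypothetical measurements (ms) illustrating the transfer pattern above:
english_t = vot_ms(100.0, 175.0)  # long-lag English /t/: 75 ms
french_t = vot_ms(100.0, 125.0)   # short-lag French /t/: 25 ms

# An English-speaking learner of French transferring L1 timing would
# produce French /t/ with a VOT closer to the English value.
assert english_t > french_t
print(english_t, french_t)  # 75.0 25.0
```

In production studies, these two time points are typically read off a waveform or spectrogram annotation, and mean VOT per speaker group is then compared.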


 Suprasegmental transfer:

 Word stress , tone etc are suprasegmental features.

 In this domain, advanced, highly proficient speakers of an L2 are often found to perform like NS.

 However, non-nativeness is also common.

 Stress errors have been found in the L2 production of Chinese, Japanese and Spanish ESL speakers.
 A lack of sensitivity to stress-related cues has also been reported.
 Perception and production of lexical tones: L1 influence is prominent

 Cantonese, Japanese and English NS were found to score poorly in identifying tonal contrasts in Mandarin.

 This difficulty persisted even after training.

 Even a tonal L1 may not help, as tones are categorized differently in different languages.
 E.g. Mandarin tone perception and production among Cantonese speakers was studied, and results showed that accuracy rates were poor in identifying all the tones.

(So & Best 2020; Wang, Spence, Jongman & Sereno 1999; Hao 2012)
 Phonotactic transfer:

 Difficulty found even among advanced learners in this domain.

 Consonant clusters, vowel epenthesis and so on have been investigated.

 English NS producing Polish words with consonant clusters that are illegal in English had accuracy rates of 11% to 63%, whereas accuracy for clusters that are also legal in English was 94%.
 Speakers of a language where consonant clusters are not allowed tend to insert a vowel between the two consonants (vowel epenthesis).

(Davidson, Jusczyk & Smolensky 2004; Dupoux et al. 1999)
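The epenthesis pattern above can be sketched as a toy repair rule (illustrative Python; the legal-cluster set and the choice of epenthetic vowel are simplified assumptions, not from the cited studies):

```python
# Toy model of vowel epenthesis: if the L1 disallows a consonant
# cluster, insert an epenthetic vowel between the two consonants.
LEGAL_CLUSTERS = {"st", "tr", "pl"}  # clusters this toy L1 allows
EPENTHETIC_VOWEL = "u"               # simplified assumption
CONSONANTS = set("bcdfghjklmnpqrstvwxz")

def repair(word):
    """Insert a vowel inside any consonant cluster the L1 disallows."""
    out = [word[0]]
    for prev, cur in zip(word, word[1:]):
        if (prev in CONSONANTS and cur in CONSONANTS
                and prev + cur not in LEGAL_CLUSTERS):
            out.append(EPENTHETIC_VOWEL)
        out.append(cur)
    return "".join(out)

# An illegal cluster gets repaired; a legal one passes through intact.
print(repair("zda"), repair("sta"))  # zuda sta
```

Real learners apply such repairs gradiently rather than categorically, which is why accuracy rates in the studies above vary by cluster type.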


L2 effect on L1
 Is the influence unidirectional or bidirectional? Is it possible for the L2 to impact L1 phonetic processing?
 The answer seems to be: YES

 Flege (1987) was one of the first to report this.

 L1 categories were affected by L2 features.

 English voiceless stops have longer VOTs than French.

 English NS living in Paris for more than 11 years were found to produce English
stops with shorter VOTs.
 Chang (2012) studied American students in a Korean immersion program.

 He found a general lowering of the F1 and F2 values of all English vowels, due to the lower corresponding values in Korean.

 Dimitrova et al. (2010) compared Russian monolinguals with Russian–English bilinguals living in the USA.
 Neutralization: devoicing of word final voicing contrasts involving stops and
fricatives.
 Russian word /kod/ (code) becomes /kot/ on devoicing
 This is common in Russian but not in English

 The bilinguals showed a lesser tendency toward devoicing in a reading test.


Models of L2 phonological processing
 Several theoretical frameworks have tried to account for phonological processing and
development in non-native languages.

 The Speech Learning Model (SLM)


 Deals with the issue of production and perception of L2 phonology at the segmental level.
 It has three basic assumptions:
 Perception and production are related. Accurate perception leads to accurate production.
 L1 and L2 sounds exist in a common phonological space.
 Adults have the same capability of learning native-like phonology in an L2 as children learning their L1.

(Flege 1995, 1999, 2002, 2003)


 The Perceptual Assimilation Model (PAM)

 How language-specific phonological skills are developed among infants while learning their L1.

 Hence this model is more focused on a specific phenomenon.

 (Catherine Best 1993, 1994, 1995)


More on bilingual language processing in the next module
References
 Albareda‐Castellot, Barbara, Ferran Pons, and Núria Sebastián‐Gallés. 2011. “The acquisition of phonetic
categories in bilingual infants: New data from an anticipatory eye movement paradigm.” Developmental
Science, 14 (2): 395–401. DOI 10.1111/j.1467‐7687.2010.00989.x
 Amengual, M., & Chamorro, P. (2015). The Effects of Language Dominance in the Perception and Production of
the Galician Mid Vowel Contrasts. Phonetica, 72(4), 207–236. https://doi.org/10.1159/000439406
 Aoyama, Katsura, Susan G. Guion, James Emil Flege, Tsuneo Yamada, and Reiko Akahane‐Yamada. 2008. “The
first years in an L2‐speaking environment: A comparison of Japanese children and adults learning American
English.” International Review of Applied Linguistics in Language Teaching, 46 (1): 61–90. DOI:
10.1515/IRAL.2008.003
 Casillas, J. V. (2015). Production and perception of the /i/-/I/ vowel contrast: The case of L2-dominant early
learners of English. Phonetica, 72(2-3), 185-205. https://doi.org/10.1159/000431101
 Darcy, Isabelle, and Franziska Krüger. 2012. “Vowel perception and production in Turkish children acquiring L2
German.” Journal of Phonetics, 40 (4): 568–581. DOI: 10.1016/ j.wocn.2012.05.001
 Ferjan Ramírez, Naja, Rey R. Ramírez, Maggie Clarke, Samu Taulu, and Patricia K. Kuhl. 2016. “Speech
discrimination in 11‐month‐old bilingual and monolingual infants: A magnetoencephalography study.”
Developmental Science. Advance online publication. DOI: 10.1111/desc.12427.
 Flege, J. E., & MacKay, I. R. A. (2004). Perceiving Vowels in a Second Language. Studies in Second Language
Acquisition, 26(1), 1–34. https://doi.org/10.1017/S0272263104261010
 Flege, J. E., MacKay, I. R. A., & Meador, D. (1999). Native Italian speakers' perception and production of English
vowels. Journal of the Acoustical Society of America, 106(5), 2973–2987. https://doi.org/10.1121/1.428116
 García-Sierra, A., Ramírez-Esparza, N., Silva-Pereyra, J., Siard, J., & Champlin, C. A. (2012). Assessing the double
phonemic representation in bilingual speakers of Spanish and English: An electrophysiological study. Brain and
Language, 121(3), 194–205. https://doi.org/10.1016/J.BANDL.2012.03.008
 Gervain, J., Werker, J. Prosody cues word order in 7-month-old bilingual infants. Nat Commun 4, 1490 (2013).
https://doi.org/10.1038/ncomms2430
 Kuipers, J. R., & Thierry, G. (2012). Event-related potential correlates of language change detection in
bilingual toddlers. Developmental cognitive neuroscience, 2(1), 97–102.
https://doi.org/10.1016/j.dcn.2011.08.002
 McCarthy, Kathleen M., Merle Mahon, Stuart Rosen, and Bronwen G. Evans. 2014. “Speech perception and
production by sequential bilingual children: A longitudinal study of voice onset time acquisition.” Child
Development, 85 (5): 1965–1980. DOI: 10.1111/cdev.12275.
 Snow, Catherine E., and Marian Hoefnagel‐Höhle. 1978. “The critical period for language acquisition:
Evidence from second language learning.” Child Development, 49 (4): 1114. DOI: 10.2307/1128751.
 Stölten, K., Abrahamsson, N., & Hyltenstam, K. (2014). Effects of Age of Learning on Voice Onset Time:
Categorical Perception of Swedish Stops by Near-native L2 Speakers. Language and Speech, 57, 425 - 450.
 Tsukada, Kimiko, David Birdsong, Ellen Bialystok, Molly Mack, Hyekyung Sung, and James Flege. 2005. “A
developmental study of English vowel production and perception by native Korean adults and children.”
Journal of Phonetics, 33 (3): 263–90. DOI: 10.1016/ j.wocn.2004.10.002.
