HSK 38.1
Handbücher zur Sprach- und Kommunikationswissenschaft
Handbooks of Linguistics and Communication Science
Manuels de linguistique et des sciences de communication
Band 38.1
De Gruyter Mouton
Body – Language –
Communication
An International Handbook on
Multimodality in Human Interaction
Edited by
Cornelia Müller
Alan Cienki
Ellen Fricke
Silva H. Ladewig
David McNeill
Sedinha Teßendorf
Volume 1
De Gruyter Mouton
ISBN 978-3-11-020962-4
e-ISBN 978-3-11-026131-8
ISSN 1861-5090
Volume 1
V. Methods
52. Experimental methods in co-speech gesture research (Judith Holler) 837
53. Documentation of gestures with motion capture (Thies Pfeiffer) 857
54. Documentation of gestures with data gloves (Thies Pfeiffer) 868
55. Reliability and validity of coding systems for bodily forms of communication (Augusto Gnisci, Fridanna Maricchiolo and Marino Bonaiuto) 879
56. Sequential notation and analysis for bodily forms of communication (Augusto Gnisci, Roger Bakeman and Fridanna Maricchiolo) 892
57. Decoding bodily forms of communication (Fridanna Maricchiolo, Angiola Di Conza, Augusto Gnisci and Marino Bonaiuto) 904
58. Analysing facial expression using the facial action coding system (FACS) (Bridget M. Waller and Marcia Smith Pasqualini) 917
59. Coding psychopathology in movement behavior: The movement psychodiagnostic inventory (Martha Davis) 932
60. Laban based analysis and notation of body movement (Antja Kennedy) 941
61. Kestenberg movement analysis (Sabine C. Koch and K. Mark Sossin) 958
62. Doing fieldwork on the body, language, and communication (N. J. Enfield) 974
One consequence of this development is the sudden increase in interest in the bodily aspects of language and communication, a phenomenon that is apparent from the escalating number of research projects, in fields ranging from artificial intelligence to media studies and conversation analysis, on topics subsumed under the term “multimodality”.
The handbook offers a perspective on the body as “part” and “partner” of language
and communication. In this way it contributes to some of the current key issues of the
humanities and the natural sciences: the multimodality of language and communication,
and the notion of embodiment as a resource for meaning-making and conceptualization
in language and communication. It overcomes the longstanding dichotomy represented
in the concepts of verbal and nonverbal communication, and promotes an incorporation
of the body as an integral part of language and communication. With this perspective, the
handbook documents the bodily and embodied nature of language as such. We should
underline that nonverbal communication studies are products of a fundamentally differ-
ent concept of how bodily and linguistic forms of communication cooperate in commu-
nication. Nonverbal communication studies focus on the social-psychological
dimensions of bodily communication and have basically separated the body from lan-
guage. Informed by Watzlawick’s dichotomy of analogic and digital forms of communi-
cation and the functional attributions of “social-relation” versus “linguistic content”,
nonverbal communication research has devoted most of its interest to research on social
and affective facets of bodily forms of communication (Watzlawick, Bavelas and Jack-
son 1967). The claim in this approach is that the verbal part of the message is what carries content, while the nonverbal part does not; it conveys only affective and social
meaning. This theory has inspired highly important strands of research, among them:
the very rich field of studying facial expressions of affect and emotion (Ekman and Da-
vidson 1994; Ekman and Rosenberg 1997), and in this context, the study of forms of
deceit and nonverbal leakage; movement analysis as a measure for psychic integration
and disintegration (Davis 1970; Davis and Hadiks 1990; Lausberg, von Wietersheim and Feiereis 1996; Lausberg 1998); and fields of study concerning issues of gender, culture,
and social status. It is not by accident that the analysis of hand movements or gestures
plays a minor role in nonverbal communication studies. Gestures were recognized as
being not non-verbal enough to be considered of interest for nonverbal communication
research (see the debate on the “verbal” or “non-verbal” status of gesture in Psychological Review: Butterworth and Hadar 1989; Feyereisen 1987; McNeill 1985, 1987,
1989). Indeed it is the close integration of gestures with speech (Beattie 2003; Cienki
1998, 2005; Cienki and Müller 2008a, 2008b; Duncan, Cassell and Levy 2007; Fricke
2007, 2012; Kendon 1972, 1980, 2004; McNeill 1992, 2005; Müller 1998; Müller 2009)
that has made those forms of bodily movements less interesting for the research con-
ducted in the spirit of nonverbal communication. And it is precisely this that makes ges-
tures such an interesting topic for students of language proper.
An obvious consequence of the particular orientation of nonverbal communication
studies was that relatively little was known about human gesticulation and its integra-
tion with language and communication until very recently. Only when the humanities
shifted more significantly towards cognitive science in the 1970s and 80s did gestures
begin very slowly to attract the interest of linguists, psychologists and anthropologists.
The grounds were laid early on with the pioneering writings of Adam Kendon and
David McNeill; they served as a basis on which a steadily increasing community of
scholars from various disciplines could build their research in the 90s on the hand move-
ments that people make when they talk. Since then a field of gesture studies has
emerged with its own journal, book series, society, and biennial international confer-
ences. The research carried out on human and non-human forms and uses of gestures
will be widely documented in this handbook. Hand-gestures are the “articulators” that
are closest to vocal language: they contribute to all levels of meaning, and they are syn-
tactically, pragmatically, and semantically integrated with speech, forming in Adam
Kendon’s terms gesture-speech ensembles, and constituting in David McNeill’s terms
the imagistic part of language, playing a crucial role in the cognitive processes of think-
ing for speaking. And as we have mentioned already, it is the hand movements that
under certain circumstances may turn into a full-fledged language. Note that this is
not true for the face, the torso, or the legs. The hands are, along with the vocal tract, the primary articulators that can become articulators of language. Despite the impor-
tance of gestures, however, we will underline in the handbook the fact that it is not only
the hands plus vocal tract which are used to communicate: we will highlight the integra-
tion of other concomitant forms of visible action as well, such as the face, gaze, posture,
and body movement and orientation. In this way, the handbook seeks to overcome Watzlawick’s dichotomy, which has obscured the close cooperation of visible and audible forms of communication.
To ensure cross-disciplinary transparency, the articles in this handbook are written and
conceptualized for an interdisciplinary audience.
The handbook may serve both as a resource for specific questions and as a means of gaining an overview of the topics, problems, and questions discussed in the field. It may thus offer guidance and orientation for anyone interested in this new field of scientific inquiry.
evolving within one family. Here a particular emphasis is put on how gestures relate to
signs in signed languages.
Chapter two outlines how various disciplines view the relation of the body to language and communication. Multimodal communication has attracted the interest of a wide range of disciplines, and this chapter gives accounts from: Psychology of Language, Psycholinguistics, Neuropsychology, Cognitive Linguistics, Linguistics, Conversation Analysis, Ethnography, Cognitive Anthropology, Social Psychology, Multimodal Interaction, and Literature.
Chapter three presents a documentation of historical and cross-cultural dimensions of research regarding the relation of body movements to language and speech. Starting from prehistoric gestures, Indian traditions of a grammar of gestures in dance, and Jewish traditions with their active gestural practices in religious life, it moves on to European scholarly treatments. It further includes articles on medieval practices of the body, on Renaissance ideas of gestures as a universal language, and on Enlightenment philosophy and the debate around gestures, language, and the origin of human understanding, and it ends with a sketch of 19th- and 20th-century research on body, language, and communication. The historical considerations of body movements as communication are concluded with contributions from the arts and philosophy – dance and the history of the notion of mimesis.
Chapter four offers an encompassing collection of contemporary approaches to how the relation between body motion and language in communication should be conceived. Notably, each author outlines his or her particular view on this subject matter, and we present here the views of eminent senior scholars as well as perspectives advanced by junior colleagues. These articles present theories of, or approaches to, the body in communication in a nutshell. Topics range from mirror systems and gestures as precursors of speech in evolution to the social interactive nature of gestures. Chapter five, finally, provides a valuable collection of methods for the analysis of multimodal communication. Again, the methods included here cover a wide range of disciplines, including quantitative as well as qualitative approaches to the analysis of body movement used with and without speech.
4. References
Beattie, Geoffrey 2003. Visible Thought: The New Psychology of Body Language. London:
Routledge.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96(1): 168–174.
Cienki, Alan 1998. Metaphoric gestures and some of their relations to verbal metaphorical expres-
sions. In: Jean-Pierre Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford, CA: Center for the Study of Language and Information.
Cienki, Alan 2005. Metaphor in the “Strict Father” and “Nurturant Parent” cognitive models:
Theoretical issues raised in an empirical study. Cognitive Linguistics 16(2): 279–312.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John
Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture, and thought. In: Raymond W.
Gibbs, Jr. (ed.), The Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge:
Cambridge University Press.
Davis, Martha 1970. Movement characteristics of hospitalized psychiatric patients. In: Claire
Schmais (ed.), Proceedings of the Fifth Annual Conference of the American Dance Therapy
Association, 25–45. Columbia: The Association.
Davis, Martha and D. Hadiks 1990. Nonverbal behavior and client state changes during psy-
chotherapy. Journal of Clinical Psychology 46(3): 340–351.
Donald, Merlin 1993. Origins of the Modern Mind. Cambridge, MA: Harvard University Press.
Duncan, Susan, Justine Cassell and Elena Levy (eds.) 2007. Gesture and the Dynamic Dimension
of Language. Amsterdam: John Benjamins.
Ekman, Paul and Richard J. Davidson (eds.) 1994. The Nature of Emotion. Oxford: Oxford
University Press.
Ekman, Paul and Erika Rosenberg (eds.) 1997. What the Face Reveals. Basic and Applied Studies
of Spontaneous Expression using the Facial Action Coding System (FACS). Oxford: Oxford
University Press.
Feyereisen, Pierre 1987. Gestures and speech, interactions and separations: A reply to McNeill.
Psychological Review 94(4): 493–498.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin:
De Gruyter Mouton.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aaron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–210. New York: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Lausberg, Hedda 1998. Does movement behavior have differential diagnostic potential? Ameri-
can Journal of Dance Therapy 20(2): 85–99.
Lausberg, Hedda, Jörn von Wietersheim and Hubert Feiereis 1996. Movement behaviour of
patients with eating disorders and inflammatory bowel disease. A controlled study. Psychother-
apy and Psychosomatics 65(6): 272–276.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1987. So you do think gestures are nonverbal! A reply to Feyereisen. Psycholog-
ical Review 94(4): 499–504.
McNeill, David 1989. A straight path – to where? Reply to Butterworth and Hadar. Psychological
Review 96(1): 175–179.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Müller, Cornelia 2008. Metaphors. Dead and Alive, Sleeping and Waking. A Dynamic View.
Chicago: University of Chicago Press.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguistics Encyclopedia, 214–217. Abingdon/New York: Routledge.
Tomasello, Michael 2000. The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Watzlawick, Paul, Janet H. Beavin Bavelas and Don D. Jackson 1967. Pragmatics of Human Communication. A Study of Interactional Patterns, Pathologies and Paradoxes. New York:
W.W. Norton.
Abstract
In this essay, I offer a survey of the main questions with which I have been engaged in
regard to “gesture,” or, as I prefer to call it, and as will be explained below, “utterance
visible action.” In doing so, I hope to make clear the approach I have employed over
a long period of time in which, to put it in the most general terms, visible bodily actions
used in the service of utterance are seen as a resource which can be used in many dif-
ferent ways and from which many different forms of expression can be fashioned, de-
pending upon the circumstances of use, the communicative purposes for which they are
intended, and how they may be used in relation to other media of expression that may
be available. Accordingly, I have sought to describe the diverse ways in which utterance
visible actions may be employed, their semiotic properties, and how they work as
components of utterance in relation to the other components that may also be being
employed.
Visible bodily action may also serve as a means of discourse, however. Either by themselves or in collaboration with speaking, visible bodily actions can be used as a means of saying something. For example, one may draw attention to something by pointing at it, or one may employ one’s hands to describe the appearance of something or to suggest
the form of a process or the structure of an idea. By means of visible bodily action one
can show that one is asking a question, pleading for an answer, is in disagreement, and a
host of other things, specific to the current linguistically managed interchange. There
are forms of visible bodily actions that can serve instead of words, and in some circum-
stances entire language systems are developed using only visible action. In short, there
are many different ways in which visible bodily action may be employed to accomplish
expressions that have semantic and pragmatic import similar to, or overlapping with,
the semantic and pragmatic import of spoken utterances.
This constitutes the utterance uses of visible bodily action. It is this that I shall call
utterance visible action, and it corresponds to what is often referred to by the word
“gesture.” However, because “gesture” is also sometimes used more widely to refer to any kind of purposive action, for example the component actions of practical action
sequences, or actions that may have symptomatic significance, such as self-touchings,
patting the hair, fiddling with a wedding ring, rubbing the back of the head, and the
like, because it is also used as a way of referring to the expressive significance of any
sort of action (for example, saying that sending flowers to someone is a “gesture of
affection”), and because, too, in some contexts the word “gesture” carries evaluative
implications not always positive, it seems better to find a new and more specific
term. I also think that doing so invites the undertaking, without prejudice, of a compar-
ative semiotic analysis of all of the different ways in which visible bodily action can
enter into the creation of utterances (Kendon 2008, 2010).
By “utterance” I mean any action or ensemble of actions that may be employed to
provide expression to something that is deemed by participants to be something that the
actor meant to express, that was expressed wilfully, that is. Goffman’s distinction
between information that is “given” and information that is “given off ” is helpful in
clarifying this (see Goffman 1963: 13–14). As he says, everything we do all of the
time “gives off ” information about our intentions, interests, attitudes, and the like.
However, some kinds of actions are taken to have been done with the intent to express
something, whether by words alone, by words combined with actions, or by visible
actions alone (as in sign languages). These actions are taken to “give” information,
they express what the person “meant” and the actor can be called to account for
them. Actions treated by co-participants in this manner are the actions of utterance,
and we may establish a domain of concern that attends to the different ways in
which visible bodily action can serve as utterance action, and how it may do so (see
Kendon 1978, 1981, 1990: Chapter 8; and, especially, Kendon 2004: Chapters 1–2).
It is important to stress that this domain cannot be established with sharp boundaries
nor can rigid criteria be established according to which an action is or is not admitted as
an utterance visible action. There is a core of actions about which there seems to be
widespread agreement that they comprise utterance visible actions. This includes waving, pointing, the use of symbolic gestures of any kind, manual actions made while speaking (“gesticulation”), as well as actions performed in the course of creating utter-
ances in sign language. There are always forms of action whose status is ambiguous,
however. If we compare actions that tend to be accepted as being done with what we
might call “semantic intent” with those that are not so regarded, we may discover a set
of features which actions may have less of or more of. The less they have of these fea-
tures, the more likely they are to be disregarded, not attended, or not counted as
“meant.” Sometimes this ambiguity is exploited. On occasion, someone wishing to con-
vey something to another by means of a visible action which they want to be understood
only by one specific other, and not by anyone who may be co-present, may alter the per-
formance of their action so that it seems casual or to have the character of a “comfort
movement” or some other sort of disattendable action (for examples, see de Jorio 2000:
179–180, 185, 188, 260–261; Morris et al. 1979: 67–68, 88–89).
Comfort movements (“self-adaptors” in the terminology of Ekman and Friesen
1969) and other kinds of “mere actions” may well be studied for what they reveal as
symptoms of a person’s motivational or affective state, thus attracting attention from
a psychological point of view (for early studies see Krout 1935; Mahl 1968). Actions
considered as “meant,” on the other hand, attract attention from a point of view that
is closer to that of students of language and discourse. Issues of interest here include
questions about the semiotic character of utterance visible actions and how they are
employed as components in utterance construction. These modes of expression also
raise issues for cognitive theories of language. For example, utterance visible actions
are treated by some authors as if they are image-like representations of meaning
(McNeill 1992, 2005). When deployed in relation to spoken language, their study
may suggest how the mental representation of utterance meaning is multi-levelled
and organised as a simultaneous configuration, aspects of which can be represented
through utterance visible action at the same time as other aspects can be represented
by means of the linear structures of spoken language. Old questions about the relation-
ship between language and the structure of thought, debated extensively in the eigh-
teenth century, may be re-opened in a new way through studies of utterance visible
action both in speakers and in signers (Woll 2007; see also Ricken 1994).
I now turn to discuss some of the main themes which have occupied me in my work
in this domain. I begin with aspects of how utterance visible action, speech and verbal
expression are related within the utterance. Then I discuss work on utterance visible
action when it is the sole vehicle of utterance. This includes a study of a primary
(deaf) sign language in Papua New Guinea, and a much larger study of alternate sign
languages in use among Australian Aborigines. I conclude with a short survey of what
I see as some of the broader implications of these studies.
of the body, especially of the head and face, patterned in relation to aspects of speech.
In consequence of this, and adopting methods I had learned from an association with
William Condon (Condon and Ogston 1966, 1967; Condon 1976), I undertook a de-
tailed analysis of the bodily action that could be observed in a two-minute film showing
a continuous discourse by a man who was engaged in an informal discussion of
“national character” in a London pub. In a paper published in 1972 (Kendon 1972b),
which reported this analysis, I described how, in association with each “tone unit” (Crys-
tal 1969) in the spoken discourse, one could observe a contrasting pattern of bodily
action. Patterns of action of shorter duration might be accompanied by other contrasting
patterns of longer duration – so one could say that the movement flow was organized at
multiple levels simultaneously. To a considerable degree these multiple levels in the
movement flow corresponded to the several different levels of organization in terms
of which the flow of speech could be analyzed. I was led to suggest – to quote from
the paper – “the speech production process is manifested in two forms of activity simul-
taneously: in the vocal organs but also in bodily movement, particularly in movements of
the hands and arms” (Kendon 1972b: 205).
In the aforementioned 1972 study, I attempted to deal with all observable movements –
the fingers and hands and arms, the shifts in positionings of the trunk, changes in ori-
entation of the head. From this it appeared that the larger segments of discourse were
bracketed by sustained changes in posture or new orientations of the head or repeated
patterns of head action, while shorter segments of discourse, down to the level of the
tone unit and even syllables within the tone unit, were associated with phrases of move-
ment of shorter duration. This was in accord with observations that Birdwhistell and his
colleague Albert Scheflen had summarised in earlier publications (see Scheflen 1965).
From this single study I concluded that the “utterance” manifested itself in two aspects
simultaneously – in speech and visible bodily action (Kendon 1980a).
In subsequent work, my attention focused more upon speakers’ hand actions. Such
actions, as is well known, had been in the past, as they have been subsequently, the prin-
cipal focus of interest in studies of “gesture.” There is good reason for this. After all, as
Quintilian noted some 2000 years ago, of all the body parts that speakers move, the
hands are closest to being instruments of speaking. In his discussion of the role of visible
bodily action in Delivery (Actio), he writes that while “other parts of the body merely help
the speaker … the hands may almost be said to speak” (see Quintilian Institutio
Oratoria Book XI, iii. 86–89 in Butler 1922).
Subsequent to my 1972 publication, I developed a terminology and a scheme for ana-
lyzing the organization of speakers’ hand movements and offered some general obser-
vations on how these relate to speech (Kendon 1980a). These suggestions were re-
stated and slightly revised later (Kendon 2004: Chapter 7). The slight modifications
in terminology given in this revision are reflected in what follows here. As a starting
point I noted how the forelimb movements of utterance visible actions are organised
as excursions – the hand or hands are lifted away from a position of rest (on the
body, on the arm of a chair, etc.), they move out into space and perform some action, thereafter returning to a position of rest, often very similar to the one from which they started. This entire excursion, from position of rest to position of rest, I called a Gesture
Unit. Within such an excursion the hand or hands might perform one or more actions –
pointing, outlining or sculpting a shape, performing a part of an action pattern, and
so on. This action was called the stroke. Whatever the hand or hands did to organize
themselves for this action was called the preparation. The preparation and stroke,
taken together, I referred to as a Gesture Phrase, with the stroke and any subsequent
sustained position of the hand considered as the nucleus of the Gesture Phrase. Once
the hand or hands began to relax, the Gesture Phrase was finished. The hand (or
hands) might then start upon a new preparation, in which case a new Gesture Phrase
begins, or it might go back to a position of complete rest, in which case the Gesture
Unit would be finished. The distinction between Gesture Unit, the more inclusive
unit, and Gesture Phrase, was necessary, for in this way a succession of Gesture Phrases
within the frame of a single Gesture Unit could be accommodated. As we had shown
(in Kendon 1972b), the nested hierarchical relationship between Gesture Unit and Ges-
ture Phrases corresponded to the nested hierarchical relationship between tone unit
groupings at various levels within the spoken discourse. Just as spoken discourse is
organized at multiple levels simultaneously, so this appears to be true of associated
utterance visible actions of the forelimbs.
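The terminological hierarchy just outlined can be summarized, purely for illustration, as a small data model of the kind one might use when annotating video. The sketch below (in Python) is not part of the original analysis; the class and field names, and the use of second-based time spans, are assumptions made for this example.

    # Illustrative sketch only: a minimal data model for the Gesture Unit /
    # Gesture Phrase hierarchy described above. Names are hypothetical.
    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class GesturePhrase:
        # Times are (start, end) in seconds.
        preparation: Optional[Tuple[float, float]] = None      # getting the hand(s) into position
        stroke: Tuple[float, float] = (0.0, 0.0)                # the obligatory expressive phase
        post_stroke_hold: Optional[Tuple[float, float]] = None  # sustained position after the stroke

        def nucleus(self) -> Tuple[float, float]:
            # Stroke plus any post-stroke hold, as described in the text.
            end = self.post_stroke_hold[1] if self.post_stroke_hold else self.stroke[1]
            return (self.stroke[0], end)

    @dataclass
    class GestureUnit:
        # One whole excursion: rest -> one or more Gesture Phrases -> rest.
        rest_departure: float = 0.0
        phrases: List[GesturePhrase] = field(default_factory=list)
        rest_return: float = 0.0

    # Example: one excursion containing two successive Gesture Phrases.
    unit = GestureUnit(
        rest_departure=1.20,
        phrases=[GesturePhrase(preparation=(1.20, 1.45), stroke=(1.45, 1.80)),
                 GesturePhrase(preparation=(1.80, 1.95), stroke=(1.95, 2.30),
                               post_stroke_hold=(2.30, 2.60))],
        rest_return=3.00,
    )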
Examination of how these Gesture Phrases were organized in relation to their con-
current tone units suggested that the stroke of the Gesture Phrase tended to anticipate
slightly, or to coincide with, the tonic center of the tone unit. Looking at the form of action in the stroke and what it seemed to express, it appeared that there was a close coor-
dination between the meanings attributed to the action of the stroke and the meaning
being expressed in the associated tone unit. This did not mean that the meanings attrib-
uted to the forms of action in the Gesture Phrases were always the same as the mean-
ings expressed in the associated speech. Rather, it meant that there was generally a
semantic coherence between them (McNeill 1992 has called this “co-expression”).
Sometimes these meanings seemed to parallel verbal meaning, but they often seemed
to complement it or add to it in various ways. Uttering, that is, could be done both ver-
bally and kinesically in coordination. This gave rise to the general observation that,
somehow, expression in words and expression in visible bodily action are intimately
related. These conclusions were, in part, confirmed in independent observations by
McNeill and were incorporated and re-stated by him in his book Hand and Mind
(McNeill 1992).
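The timing relation reported in this paragraph – the stroke tending slightly to anticipate, or to coincide with, the tonic centre of the tone unit – can be made concrete over time-stamped annotations. The following minimal sketch assumes second-based onset times and an arbitrary 50 ms tolerance; neither the function name nor the threshold comes from the studies discussed here.

    # Hypothetical helper: classify how a gesture stroke is timed relative to the
    # tonic (nuclear) syllable of its tone unit. The tolerance is an assumed value.
    def classify_stroke_timing(stroke_onset: float, tonic_onset: float,
                               tolerance: float = 0.05) -> str:
        # Returns "anticipates", "coincides", or "follows"; times are in seconds.
        delta = stroke_onset - tonic_onset
        if abs(delta) <= tolerance:
            return "coincides"
        return "anticipates" if delta < 0 else "follows"

    # Example: a stroke beginning 120 ms before the tonic syllable of its tone unit.
    print(classify_stroke_timing(stroke_onset=2.31, tonic_onset=2.43))  # -> "anticipates"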
Subsequent to this demonstration that a speaker, in using the hands in this way, does
so as an integral part of the utterance, I began to investigate the different ways in which
these hand actions could be deployed in relation to the speech component of the utter-
ance. From this it appeared that the utterer can be flexible in how this is done. The co-
ordinate use of the two modes of expression is orchestrated in relation to whatever
might be the speaker’s current rhetorical aim. Thus we described examples where the
speaker delayed speech so that a kinesic expression could be foregrounded or com-
pleted, examples in which the speaker delayed a kinesic expression so that it could be
placed appropriately in relation to what was said, and yet other examples showing
how the speaker, though repeating the same verbal expression, employed a different
kinesic expression with each repetition. These observations were presented in Chapter 8
of Kendon (2004). I took them as supporting the view that these manual actions “should
be looked upon as fully fashioned components of the finished utterance, produced as
an integral part of the ‘object’ that is created when an utterance is fashioned” (Kendon
2004: 157).
3.1. Referential
There are two ways in which visible actions can contribute to referential or proposi-
tional meaning. One way is by pointing. Here the actor, by pointing at something,
can establish what is pointed at as the referent to some deictic expression in the dis-
course. In a study of pointing (Kendon and Versante 2003; Kendon 2004: Chapter 11)
the different hand shapes used when pointing were described (six different forms were
identified), and the discourse contexts in which they were used were examined. It
emerged that different hand shapes are used in pointing, according to the way in
which speakers used the referent of the pointing in their discourse. For example, if it
was important that the speaker’s recipients distinguish one specific object pointed at
from another (“Over there you see St. Peters, then to the left you see the Old Vicarage”),
the extended index finger was the commonest hand shape. On the other hand, if the
speaker referred to something because it is an example of a category (“you see there
a fine example of a war memorial”), because the speaker makes a comment about it,
or because it is something which has features the speaker’s recipients are to take
note of (“you can see again the quality of the building in this particular case”), the
speaker is more likely to use a hand in which all fingers are extended and held together,
palm oriented vertically or upwards. That is, the shape and orientation of the hand
employed in pointing is chosen according to how the speaker is treating, in spoken
discourse, the referent of the pointing action.
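As a rough illustration of the form–function pairing just reported, the tendency observed for the two handshapes mentioned in this paragraph (only two of the six forms identified in the study) could be recorded in an annotation scheme as a simple lookup. The labels and wording below are invented for this sketch, and the pairings are tendencies, not rules.

    # Hypothetical lookup reflecting the tendency described above; labels invented.
    POINTING_HANDSHAPE_TENDENCIES = {
        "index_finger_extended":
            "referent individuated as one specific object distinguished from others",
        "open_hand_fingers_together_palm_vertical_or_up":
            "referent treated as exemplar of a category, topic of a comment, "
            "or bearer of features to be noted",
    }

    for handshape, discourse_treatment in POINTING_HANDSHAPE_TENDENCIES.items():
        print(f"{handshape}: {discourse_treatment}")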
This may reflect a more general feature of utterance visible actions, which is that,
very often, they are derived forms of actions made when operating upon or manipulat-
ing the objects to which they refer, whether these be literal or metaphorical. When we
talk about things, we conjure them up as objects in a virtual presence and with our
hands we may manipulate them in various ways, pushing them into position, touching
them as we speak of them, arranging them in relation to one another spatially, and
so on (for a view not unrelated to this, see Streeck 2009).
The other way for actors to use their hands in relation to the referential content of
their discourse is to use them to do something which itself has referential meaning.
These actions may be highly conventionalized, recognized as having quite specific or
restricted meanings (often directly expressible in a word or phrase that is regarded as
having an equivalent meaning), or they may be forms of action by which a sketch or
diagram of some object is provided, by which some pattern of action is depicted, or
which provides a movement image analogous to the dynamic character of a process
or mode of action.
In a survey of numerous recordings of unscripted conversations in various settings,
I distinguished six different ways in which visible actions could, in this manner, partic-
ipate in the referential meaning of the speaker’s discourse (Kendon 2004: 158–198).
These may be summarized as follows:
(i) A manual expression with a “narrow gloss” (“quotable gesture”) is used simulta-
neously with a word that has an identical or very similar meaning. In Naples, in
Italy, where I collected recordings of conversations, it was not uncommon to
observe how, from time to time, such expressions were used in the course of
talk, so that it was as if the speaker uttered the same word simultaneously in
speech and kinesically. A speaker explaining that nowadays in Naples there
were too many thieves uttered the word “ladri” (thieves) and used a manual
expression which is always glossed with this word. Again, as a speaker says
“money” he rubs the tip of his index finger against the tip of his thumb in an action
always glossed as “money.” Yet again, as a (British) speaker describes her job and
says “I do everything from the accounts, to the typing, to the telephone, to the
cleaning,” as she says “typing” and “telephone” and “cleaning” she does an action,
in each case a conventional form, often glossed with the same words that she utters
(see Kendon 2004: 178 for these examples). In such cases the semantic relationship
between the two modalities appears to be one of complete redundancy. However, a
study of the contexts in which this occurs, taking into consideration how the action
is performed, suggests that there are various effects speakers achieve by using such
narrow gloss expressions in this way. More attention to this kind of use of kinesic
expressions is needed.
(ii) Kinesic expressions with a narrow gloss may also be used in parallel with verbal
expressions in such a way that they are not semantically redundant but make a sig-
nificant addition to the content of what the speaker is saying. For example, a city
bus driver (in Salerno, Italy), describing the disgraceful behavior of boys on the
buses adds that they behave this way in front of girls, who are not in the least
upset, saying that they are even happy about it. As he says this, he holds both
hands out, index fingers extended, positioned so that the two index fingers are
held parallel to one another. In this way he adds the comment that boys and girls
are equal participants in this activity, using here a kinesic expression glossed as
“same” or “equal,” among other meanings given it (de Jorio 2000: 90). Kendon
(2004: 181–185) describes this and several other examples.
(iii) Kinesic expressions may be used to make more specific the meaning of something
that is being said in words. For example, it is common to observe how an enact-
ment, used in conjunction with a verb phrase, appears to make the meaning of
the verb phrase much more specific. Thus, a speaker speaks of how some-
one used to “throw ground rice” over ripening cheeses to dry off the cheeses’
“sweat.” As he says “throw” he shapes his hand as if it is holding powder and
does a double wrist extension as if doing what you would do if you were to scatter
a powder over a surface. In this way the actions referred to by the verb “throw” are
given a much more specific meaning (Kendon 2004: 185–190).
(iv) Hand actions may be used to create the representation of an object of some kind.
This may be deployed in relation to what is being said as if it is an exemplar or an
illustration of it. For example, a speaker is explaining how, in a new building being
discussed, a security arrangement will include “a bar across the double doors on
the inside.” As he says “bar” he lifts up his two hands and moves them apart
with a hand shape that suggests he is molding a wide horizontal elongate object.
As the speaker talks about an object, he uses his hands to create it, as if to bring
it forth as an exhibit or illustration (Kendon 2004: 190–191).
(v) Hand actions are often used either as a way of laying out the shape, size and spatial
characteristics or relationships of an object being referred to, or as a way of exhi-
biting patterns of action which provide either visual or motoric images of processes
(Kendon 2004: 191–194).
(vi) Hand actions can also be employed to create objects of reference for deictic ex-
pressions. For example, a speaker described a Christmas cake and said it was
“this sort of size,” using his extended index fingers to sketch out a rectangular
area over the table in front of him, thus enabling recipients to envisage a large
rectangular object lying on the table (Kendon 2004: 194–197).
3.2. Operational
In contrast to these kinds of uses, hand or head actions are common that function as an
operator in relation to the speaker’s spoken meaning. An obvious way in which this
may be observed is in the use of head or hand actions that add negation to what is
being said. This is not always a straightforward matter, however. For example, the
head shake, commonly interpreted as a way of saying “no” is of course used for this,
but it can also be used when a speaker is not saying “no” to anything directly, but saying
something which implies some kind of negation (Kendon 2002). Likewise, there is a
very widely used hand action in which the hand, held with all fingers extended and ad-
ducted (a so-called open hand), held on a pronated forearm (so the palm faces down-
wards), is moved horizontally and laterally. Such a hand action is commonly seen in
relation to negative statements or statements that imply a negative circumstance (as
in a shopkeeper using this action as she explains her supply of a cheese to a customer:
“That’s the finish of that particular brie”), but it may also be seen in relation to positive
absolute statements, as if the hand action serves to forestall any attempts to deny what
is being said, as in: “Neapolitan cooking is the best of all cooking,” the horizontal hand
action acting here as if to say that any contrary claim will be denied (see Kendon 2004:
264–265; see also Harrison 2010).
3.3. Modal
Utterance visible actions may also be used to provide an interpretative frame for a
stretch of speech. The use of the “quotation marks” gesture to indicate that the speaker
is putting what he is saying in quotes is a common example. In an example drawn from
one of my recordings (not published), a speaker is in a conversation with someone
about how he negotiated a good deal with a representative of a mobile phone company.
In describing his successful negotiation he repeats what he said to the representative in
accepting some offer. He says: “yes, I’ll have that” and as he does so he holds his hand up
to his ear in a Y hand shape, commonly used as a kinesic expression for “telephone.” In
this way he frames his words as quoted – as what he said to the representative – and
shows that he said this while talking on the telephone. In another example, also from
my recordings (made in Salerno in 1991), someone discussing a robbery puts forward
a speculation about what the robber might have done. As he describes what the robber
did he places a “finger bunch” hand against the side of his forehead and moves it away
and upward, expanding his fingers as he does so. This is an action that is widely accepted
(in Southern Italy) as a reference to imagination. Here it serves to frame his statement
as a hypothesis.
3.4. Performative
Hand actions are often used as a way of making manifest the speech act or illocutionary
force of what a speaker is saying. Many examples of this sort of usage were described
by Quintilian, and some of the forms he described are also used today (Dutsch
2002; Quintilian Book XI, iii: lines 14–15, 61–149 in Butler 1922). In my own work I
have described the use of ‘praying hands’ or mani giunte and also of the ‘finger
3.5. Parsing
Lastly, there is a punctuational parsing or discourse structure marking function of
speaker’s hand or head actions. For example, speakers not uncommonly, in giving a
list of items, place their head in a slightly different angular position in relation to each
item as they describe it. “Batonic” movements of the hand can be observed to occur
in apparent association with features of spoken discourse that are given prominence
(see Efron 1972; Ekman and Friesen 1969). However, there are also hand action sequences, such as the “finger-bunch-open-hand” sequence observed in Neapolitan speakers, that are coordinated with the topic-comment structure of the speaker’s discourse. A
version of this has also been described for Persian speakers in southern Iran (Seyfeddi-
nipur 2004). Also observed among Neapolitan speakers, but observed elsewhere as well,
is the thumb-tip-to-index-finger-tip “precision grip.” This is often used to mark a stretch
of speech which the speaker deems to be of central importance to what is being said, as
when the speaker is emphasizing something that is quite specific and important (see
Kendon 1995; Kendon 2004: 238–247). For an account of German uses of this hand
action see Neumann (2004). For uses by an American speaker see Lempert (2011).
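Taken together, sections 3.1 to 3.5 amount to a small functional tag set for utterance visible actions. Purely as an illustration of how such a scheme might be carried over into an annotation workflow, the sketch below encodes the five categories as an enumeration; the names are assumptions of this example, not an existing coding standard, and, as the discussion below stresses, a single action may carry several of these functions at once, so a record should allow multiple tags.

    # Hypothetical annotation tag set based on the functional distinctions drawn
    # in sections 3.1-3.5. A single gesture may legitimately carry several tags.
    from enum import Enum

    class UVAFunction(Enum):
        REFERENTIAL = "contributes to referential/propositional content (pointing, depiction)"
        OPERATIONAL = "operates on the spoken meaning (e.g. adds or reinforces negation)"
        MODAL = "frames a stretch of speech (e.g. as quotation or hypothesis)"
        PERFORMATIVE = "marks the speech act or illocutionary force"
        PARSING = "punctuates or marks discourse structure"

    # Example annotation record for one gesture phrase: a set, not a single value,
    # because the functions are not mutually exclusive.
    example_annotation = {UVAFunction.REFERENTIAL, UVAFunction.PARSING}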
3.6. Discussion
It should be stressed that this account of the different ways in which visible actions can contribute to the meaning of an utterance is only a beginning. More complete and more
systematic accounts have yet to be provided. Previous partial attempts similar to this
include Efron (1972), McNeill (1992) (and see also Streeck 2009 and Calbris 2011). Fur-
thermore, and it is important to stress this, it should be understood that these semantic
and pragmatic functions of utterance visible actions are not mutually exclusive. A given
action can serve in more than one way simultaneously, and a given form may function in
one way in one context and in a different way in another.
A second point must be made. We have spoken about different ways in which these
utterance visible actions can contribute to the meaning of the utterance, pointing out
how they may contribute to the propositional content of an utterance, or function in
various ways in relation to various aspects of its pragmatic meaning. The distinctions we have outlined have been arrived at by observers or
analysts, after they have reflected upon how the form of visible action, regarded as in
some way intended or meant as part of the speaker’s expression, can be related to
the semantic or pragmatic content that has been apprehended from the speech. Our
ability to do this is based upon our ability to grasp how these actions are intelligible.
The basis for this understanding remains obscure, however. Very little attention has
been paid to the problem of how the semantic “affiliation” claimed between words
and kinesic expressions is justified. Involved here is the question of the intelligibility
of utterance visible actions and how this interacts with the intelligibility of associated
spoken expression. The nature of this intelligibility and of this semantic interaction
deserves much more systematic attention (one recent relevant discussion is Lascarides
and Stone 2009).
Finally, how can we be sure whether, or to what extent, these utterance visible ac-
tions make a difference to how recipients grasp or understand the meanings of the utter-
ances they are a part of? We do know, both from everyday experience and from
numerous experimental studies (Hostetter 2011; Kendon 1994), that these visible ac-
tions do make a difference for recipients, but whether they always do so, and whether
they do so in the same way, this we cannot say, nor do we have a good understanding of
the circumstances in which they may or may not do so. (See Rimé and Schiaratura 1991:
272–275 for an interesting start in investigating this issue).
To conclude, the brief survey offered above should make clear the diverse ways in
which speakers employ utterance visible action. No simple statement can be made
about what these actions do or what they are for. For me, it seems, a consideration
of these different modes of use supports the view that these actions are to be regarded
as components of a speaker’s final product. That is, they are not (or are not only) symp-
toms of processes leading to verbal expression (as some approaches to them might sug-
gest). Rather, they are integral components of a person’s expression which, in the cases
we have been considering, are composed as an ensemble of different modalities of
expression.
employed and developed with semiotic features that are comparable to spoken lan-
guages. Depending on the community and the place of deaf persons within it, these
sign languages may also be used between deaf and hearing, as well as just among the
deaf. The literature on these sign languages is now very extensive. For a representative
survey see Brentari (2010).
In my own work on utterance visible action as the sole vehicle for utterance, I have
undertaken two pieces of research. One was a small scale study of material collected
in Papua New Guinea (Kendon 1980b, 1980c, 1980d), mainly from one deaf young
woman. The other was a large scale study of sign languages in Aboriginal Australia
(Kendon 1988). The work with the material collected in Papua New Guinea was (for
me) a pioneering and preliminary effort in many ways, and restricted in scope, since
it was based on limited material collected as a result of a chance encounter. While I was attempting to make films of certain kinds of social occasions among the Enga in the Papua New Guinea highlands, a young deaf woman appeared one day
near my residence. She talked in signs with great fluency. She was using a system that
was used by various families in the valley who had deaf members. The deafness in the
valley was said to be a consequence of an epidemic of meningitis of some years back. For-
tunately, my New Guinean field assistant was able to converse with her, since he also had
deaf relatives. He was later able to interpret for me much of what I was able to record, as
well as assisting in the recordings. I later undertook a detailed study of some of this mate-
rial. Despite its limitations, undertaking such a detailed study led me to confront some
fundamental issues regarding the way in which meanings may be encoded in the media
of visible bodily action (see the discussion in Kendon 1980c).
The fundamental process involved seems to be one in which the actor, by means of
a range of different techniques of representation, “brings forth” or “conjures” actions,
objects, movements, spatial relations, in this way representing concepts, ideas, and the
like, so they are understood as making reference to these things. This may take the
form of a kind of re-enactment of actions and their circumstances and of the actions
themselves in a fairly elaborated pantomimic manner. Very quickly, however, these
forms of action become reduced schematized and standardized in various ways as
they become a shared means by which meanings may be represented. This is a funda-
mental and general process that has been described many times by students of auton-
omous utterance visible actions. Although the terminology is various, the processes of
“sign formation” that have been described by Kuschel (1973), Tervoort (1961), Klima
and Bellugi (1979), Yau (1992), Eastman (1989), Cuxac (Cuxac and Sallandre 2007;
see also Fusellier-Souza 2006) – to name just a few of the authors who have written
about this – are all fundamentally similar. To represent a meaning for someone else
(and also, I think, to represent it for oneself), one resorts to a sort of re-creation, as if, by showing the other the thing that is meant, the other will come to grasp it in a
way that overlaps with the way it is grasped by oneself. As these representations
become socially shared, they rapidly undergo various processes of schematization. In
consequence they are no longer understood only because they are depictions of some-
thing but also because they are forms which contrast with other forms in the system,
acquiring the status of lexical items in a system, that is. In this process we seem able
to observe the processes of language system formation. This provides one of the
main reasons why primary sign languages (sign languages of the deaf, that is) have
become objects of such intense interest.
on a shelf or some other object which has a base on which it stands, one chooses the
verb zetten. However, if the object does not have a base or is something, such as a
book, that can be put down on its side, one chooses the verb leggen. In French, on
the other hand, one uses the same verb mettre, whatever the object or its placement ori-
entation might be. Gullberg found that Dutch speakers, if using hand actions as they
talked about putting objects somewhere, accompanied their verb phrases with different
hand actions, according to which placement verb they used. French speakers, on the
other hand, did not use hand actions that were differentiated in this way, regardless of
the nature of the object they were talking about. This suggests that where a language
makes semantic distinctions of this sort and manual expressions are also being employed,
the manual expressions may reflect those semantic distinctions. The language spoken, thus, may link directly
to the kinds of manual expressions that may be used, if these are used when speaking.
This is a further piece of evidence in favour of the view that, as Kendon (1980a) put it,
“gesticulation and speech are two aspects of the process of utterance.” Exactly how this
is to be understood is yet to be made clear. However, the detailed way in which Warlpiri
speakers or Warlmanpa speakers have created kinesic expressions for the semantic
units their spoken languages supply reinforces the view, also suggested by Gullberg’s
work (and suggested, too, by the phenomenon we described earlier, in which “narrow
gloss” kinesic expressions may be used conjointly with spoken expressions of the same
meaning), that word meanings are somehow linked to or grounded in schematic percep-
tuo-motor patterns so that, if the hands are also employed when speaking, we see these
patterns being drawn upon as a source for the hand actions. For the Warlpiri women,
who, of necessity, had to create kinesic representations of concepts provided by
their language, a strategy they followed was to draw upon repertoires of already existing
perceptuo-motor representations. If this is so, this might mean that the “imagery” that
McNeill (2005) suggests is opposed to the categorical expressions of words is not always
to be so sharply separated. Kinesic expressions can also be like words. Indeed, they are
often highly schematic in form and serve as devices to refer to conceptual categories in
ways very similar to words. Cogill-Koez (2000a, 2000b) shows this for “classifier predi-
cates” in sign languages, which have features in common with some kinds of manual ex-
pressions seen in speakers (see Kendon 2004: 316–324; Schembri, Jones, and Burnham
2005). Whatever it is that is made available through verbal expression can also be made
available by other means. The distinction between imagistic expression and verbal
expression may be much less sharp than has often been supposed.
5. Broader implications
In the foregoing I have touched upon some of the questions I have been concerned with
in my studies of utterance visible action. My purpose has been to illustrate the partic-
ular perspective in terms of which I have approached the study of this domain of human
action. What are some of the broader implications?
they may do so, suggests that in the process of utterance production the speaker forges
“utterance objects” out of materials of diverse semiotic properties. This makes it possi-
ble for a speaker to “escape” the constraints of the linearity of verbal expression, at
least to some degree. As has recently been pointed out, in sign languages use is
made of multiple articulators simultaneously. This means that, in these languages,
simultaneous as well as linear constructions must be envisaged as part of their grammar
(Vermeerbergen, Leeson, and Crasborn 2007). Once it is seen that speakers also can
make use of utterance visible actions as they construct utterances, it will be clear that a
similar kind of simultaneity of construction becomes possible. As the examples we have
mentioned make clear (and as is clear from the many others that have been described),
speakers do in fact exploit this possibility. For the most part, at least as far as is known,
the use of simultaneous constructions in spoken language through the combination
of speech and utterance visible action has not, in any community of speakers, become
stabilized and formalized as a shared practice to the point that it must be considered
as a part of the formal grammar of any spoken language. Such a manner of construct-
ing utterances is widespread nevertheless, and, from the point of view of describing
languaging, rather than language, it must be taken into consideration (Kendon 2011).
evolved as a system of vocal expression do not face this “switch” problem, none of them
pay very much attention to the intimate interrelations between speaking and visible
bodily action we have discussed here. The involvement of manual (and other) bodily
action in speaking needs to be accounted for in any proposal put forward to account
for the origin of language in evolutionary terms.
Most writers who advocate a “gesture first” theory of language origins draw atten-
tion to the commonly noted intimate association between gesture and speech as sup-
porting evidence (as, indeed, I did myself in Kendon 1975). However, given that
utterance visible action, when used in conjunction with speech, has a rather different
role in utterance and, accordingly, exhibits a different range of semiotic properties
than it does when it is employed as the sole vehicle of utterance (as in signing), it is
clear that it is not some kind of left-over from a non-speech kind of language and is
not appropriately so regarded. It is, rather, an integrated component of contemporary
languaging practice. Further, given modern developments in our understanding of the
neurological interrelations between speaking and hand actions (for one review see Will-
ems and Hagoort 2007), it seems much better to suppose that speaking and utterance
manual action evolved together.
According to a proposal that I am currently working on (expressed in a preliminary way in Kendon 2009), we might better approach the problem if we started out not by thinking about the actions of speaking and gesturing as being des-
cended with modification only from communicative or expressive actions, but by think-
ing of them as including modifications of the practical actions involved in manipulating
and altering the environment, especially as this is required in the acquisition of food,
and including the manipulation and alteration of the behavior of conspecifics, as in
mothering, grooming, mating and fighting. MacNeilage (2008) has suggested that the
complex oral actions that form the basis of speech have their origins in the oral manip-
ulatory actions that are involved in the management of food intake. Perhaps this could
be extended to actions of other parts of the body involved in feeding. If an animal is to
masticate its food, food has to be brought into the mouth in some way. Leroi-Gourhan
(1993) pointed out that an animal may do this by moving its whole body close enough
to foodstuffs so that it can grasp them with its mouth directly. Animals that do this tend
to be herbivores and all four of their limbs are specialized for body support and loco-
motion. They acquire food by grazing or cropping. On the other hand, many animals,
for example squirrels and raccoons, grasp and manipulate foodstuffs with their hands,
which they also use to carry food to the mouth. Such animals tend to be carnivores or
omnivores and their forelimbs are equipped as instruments of manipulation, each with
five mobile digits. In mammals of this sort, a system of forelimb-mouth co-ordination be-
comes established. This development is particularly marked in primates, of course, who,
perhaps, in adopting an arboreal style of life, have developed forelimbs that can serve in
environmental manipulation as well as in body support and locomotion. This sets the
stage for the development of oral-forelimb manipulatory action systems, and this may
explain the origin of co-involvement of hand and mouth in utterance production (see
Gentilucci and Corballis 2006).
This implies that the actions involved in speaking and in utterance visible action, two
forms of action that, as we have seen, are so intimately connected that they must some-
how be regarded as two aspects of the same process, are adaptations of oral and manual
environmental manipulatory systems employed in practical actions. The adaptations
that allow them to serve communication at a distance are adaptations that arose as practical actions came to function in situations of co-present interaction between conspeci-
fics, at first, perhaps, as “try-out” or “as if ” versions of true practical actions (Kendon
1991). On this view the actions of speaking and gesturing do not derive only from ear-
lier forms of expressive actions. We may expect, accordingly, that there will be compo-
nents of the executive systems involved in speech that will be closely related to those
involved in forelimb action and that these will be different from those components of
oral and laryngeal action that are part of the vocal-expression system. This view re-
ceives some support in the neuroscience literature, where it is reported that the actions
of the tongue and lips by which the oral articulatory gestures of speech are achieved,
controlled as they are in the pre-motor and motor cortex, can be separated from actions
involved in exhalation and in the activation of the larynx, which produce vocalization.
The control circuits for these actions involve sub-cortical structures instead. However,
in normal speech, the oral gestures of speech articulation are combined with vocal
expression, which provides the affective and motivational components of speaking
(see, for example, Ploog 2002).
Engaging in utterance, doing language, as we might say, is thus to be thought of as being derived from forms of action by which a creature intervenes in the world. Languaging, in consequence, because it derives from practical action, involves the mobilization of oral and manual practical action systems. It also involves the mobilization of vocal and kinesic expressive systems, as these come to be a part of social action.
Utterance visible actions, thus, are neither supplements nor add-ons. They are an inte-
gral part of what is involved in taking action in the virtual or fictional world that is
always conjured up whenever language is made use of. A theory of language that
takes this perspective, we suggest, will be better able to allow us to understand why
it is that visible bodily action is also mobilized when speakers speak and why, more generally, speaking (that is, using language in co-present interaction) is always a form of action that involves several different executive systems in co-ordination.
6. References
Arbib, Michael 2005. From monkey-like action to human language: An evolutionary framework
for neurolinguistics. Behavioral and Brain Sciences 28: 105–167.
Arbib, Michael 2012. How the Brain Got Language: The Mirror Neuron Hypothesis. Oxford:
Oxford University Press.
Armstrong, David F., William C. Stokoe and Sherman E. Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Bartlett, Frederick C. 1932. Remembering: A Study in Experimental and Social Psychology. Cam-
bridge: Cambridge University Press.
Birdwhistell, Ray L. 1970. Kinesics and Context: Essays in Body Motion Communication. Philadel-
phia: University of Pennsylvania Press.
Brentari, Diane (ed.) 2010. Sign Language. Cambridge: Cambridge University Press.
Brookes, Heather J. 2001. O clever “He’s streetwise.” When gestures become quotable: The case
of the clever gesture. Gesture 1: 167–184.
Brookes, Heather J. 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14: 186–224.
Brookes, Heather J. 2005. What gestures do: Some communicative functions of quotable gestures
in conversations among black urban South Africans. Journal of Pragmatics 37: 2044–2085.
Bruce, Scott G. 2007. Silence and Sign Language in Medieval Monasticism: The Cluniac Tradition
C.900–1200. Cambridge: Cambridge University Press.
Butler, Harold E. 1922. The Institutio Oratoria of Quintilian. With an English Translation by H. E.
Butler. London: William Heinemann.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Cogill-Koez, Dorothea 2000a. Signed language classifier predicates: Linguistic structures or sche-
matic visual representation? Sign Language and Linguistics 3: 153–207.
Cogill-Koez, Dorothea 2000b. A model of signed language “classifier predicates” as templated
visual representation. Sign Language and Linguistics 3: 209–236.
Condillac, Étienne Bonnot de 2001. Essay on the Origin of Human Knowledge. Translated and
edited by Hans Aarsleff. Cambridge: Cambridge University Press.
Condon, William S. 1976. An analysis of behavioral organization. Sign Language Studies 13: 285–
318.
Condon, William S. and Richard D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143: 338–347.
Condon, William S. and Richard D. Ogston 1967. A segmentation of behavior. Journal of Psychi-
atric Research 5: 221–235.
Corballis, Michael C. 2002. From Hand to Mouth: The Origins of Language. Princeton, NJ: Prin-
ceton University Press.
Crystal, David 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge Univer-
sity Press.
Cuxac, Christian and Marie-Anne Sallandre 2007. Iconicity and arbitrariness in French sign lan-
guage: Highly iconic structures, degenerated iconicity and diagrammatic iconicity. In: Elena
Pizzuto, Paola Pietrandrea and Raffaele Simone (eds.), Verbal and Signed Languages: Compar-
ing Structures, Concepts and Methodologies, 13–33. Berlin: De Gruyter Mouton.
Davis, Jeffrey E. 2010. Hand Talk: Sign Language among American Indian Nations. Cambridge:
Cambridge University Press.
de Jorio, Andrea 1819. Indicazione Del Più Rimarcabile in Napoli E Contorni. Naples: Simoniana
[dalla tipografia simoniana].
de Jorio, Andrea 2000. Gesture in Naples and Gesture in Classical Antiquity. A Translation of
“La Mimica Degli Antichi Investigata Nel Gestire Napoletano” by Andrea De Jorio (1832)
and with an Introduction and Notes by Adam Kendon. Bloomington: Indiana University Press.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Duncan, Susan 2005. Gesture in signing: A case study from Taiwan sign language. Language and
Linguistics 6: 279–318.
Dutsch, Dorota 2002. Towards a grammar of gesture: A comparison between the type of hand
movements of the orator and the actor in Quintilian’s Institutio Oratoria. Gesture 2: 259–281.
Eastman, Gilbert C. 1989. From Mime to Sign. Silver Spring, MD: T. J. Publishers.
Efron, David 1972. Gesture, Race and Culture, Second Edition. The Hague: Mouton. First pub-
lished [1941].
Ekman, Paul and Wallace Friesen 1969. The repertoire of nonverbal behavior: Categories, origins,
usage and coding. Semiotica 1: 49–98.
Emmorey, Karen (ed.) 2003. Perspectives on Classifier Constructions in Sign Languages. Mahwah,
NJ: Lawrence Erlbaum.
Farnell, Brenda 1995. Do You See What I Mean? Plains Indian Sign Talk and the Embodiment of
Action. Austin: University of Texas Press.
Fusellier-Souza, Ivani 2006. Emergence and development of signed languages: From a semio-
genic point of view. Sign Language Studies 7: 30–56.
Gentilucci, Maurizio and Michael C. Corballis 2006. From manual gesture to speech: A gradual
transition. Neuroscience and Biobehavioral Reviews 30: 949–960.
Goffman, Erving 1963. Behavior in Public Places. New York: Free Press of Glencoe.
Kendon, Adam 2009. Manual actions, speech and the nature of language. In: Daniele Gambarara
and Alfredo Givigliano (eds.), Origine e Sviluppo Del Linguaggio, Fra Teoria e Storia, 19–33.
Rome: Aracne Editrice.
Kendon, Adam 2010. Pointing and the problem of “gesture”: Some reflections. Revista Psicolin-
guistica Applicata 10: 19–30.
Kendon, Adam 2011. “Gesture first” or “speech first” in language origins? In: Donna Jo Napoli
and Gaurav Mathur (eds.), Deaf Around the World, 251–267. New York: Oxford University
Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan.” In: Sotaro Kita (ed.),
Pointing: Where Language, Culture and Cognition Meet, 109–137. Mahwah, NJ: Lawrence
Erlbaum.
Klima, Edward A. and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard
University Press.
Krout, Maurice H. 1935. Autistic gestures: An experimental study in symbolic movement. Psycho-
logical Monographs 208(46): 1–126.
Kuschel, Rolf 1973. The silent inventor: The creation of a sign language by the only deaf mute on a
Polynesian Island. Sign Language Studies 3: 1–27.
Lascarides, Alex and Matthew Stone 2009. Discourse coherence and gesture interpretation. Ges-
ture 9: 147–180.
Lempert, Michael 2011. Barack Obama, being sharp: Indexical order in the pragmatics of
precision-grip gesture. Gesture 11(3): 241–270.
Leroi-Gourhan, André 1993. Gesture and Speech. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Lewis, Jerome 2009. As well as words: Congo pygmy hunting, mimicry and play. In: Rudolf Botha
and Chris Knight (eds.), The Cradle of Language, 236–256. Oxford: Oxford University Press.
Liddell, Scott K. 2003. Grammar, Gesture and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacNeilage, Peter F. 2008. The Origin of Speech. Oxford: Oxford University Press.
Mahl, George F. 1968. Gestures and body movements in interviews. Research in Psychotherapy
(American Psychological Association) 3: 295–346.
Mallery, Garrick 1972. Sign Language Among North American Indians Compared With That
Among Other Peoples and Deaf Mutes. The Hague: Mouton.
Mayer, Carl Augusto 1948. Vita Popolare a Napoli Nell’ Età Romantica. Bari: Gius. Laterza & Figli.
McNeill, David 1992. Hand and Mind. Chicago: Chicago University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
Meissner, Martin and Stuart B. Philpott 1975. The sign language of sawmill workers in British
Columbia. Sign Language Studies 9: 291–308.
Meo-Zilio, Giovanni and Silvia Mejia 1980–1983. Diccionario De Gestos: España E Hispanoamér-
ica. Tomo I (1980), Tomo II (1983). Bogotá: Instituto Caro y Cuervo.
Morris, Desmond, Peter Collett, Peter Marsh and Maria O’Shaughnessy 1979. Gestures: Their Ori-
gins and Distribution. London: Jonathan Cape.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 233–256. Berlin: Weidler Buchverlag.
Neumann, Ragnhild 2004. The conventionalization of the ring gesture in German discourse. In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 216–224. Berlin: Weidler Buchverlag.
Ploog, Detlev 2002. Is the neural basis of vocalization different in non-human primates and Homo
Sapiens? In: Tim J. Crow (ed.), The Speciation of Modern Homo Sapiens, 121–135. Oxford:
Oxford University Press.
Ricken, Ulrich 1994. Linguistics, Anthropology and Philosophy in the French Enlightenment. Lon-
don: Routledge.
Rimé, Bernard and Laura Schiaratura 1991. Gesture and speech. In: Robert S. Feldman and Ber-
nard Rimè (eds.), Fundamentals of Nonverbal Behavior, 239–281. Cambridge: Cambridge Uni-
versity Press.
Rosenfeld, Sophia 2001. Language and Revolution in France: The Problem of Signs in Late Eigh-
teenth Century France. Stanford, CA: Stanford University Press.
Scheflen, Albert E. 1965. The significance of posture in communication systems. Psychiatry: Jour-
nal of Interpersonal Relations 27: 316–331.
Schembri, Adam, Caroline Jones and Denis Burnham 2005. Comparing action gestures and clas-
sifier verbs of motion: Evidence from Australian sign language, Taiwan sign language and non-
signers gestures without speech. Journal of Deaf Studies and Deaf Education 10: 272–290.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the “Pistol
Hand.” In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Every-
day Gestures, 205–216. Berlin: Weidler Buchverlag.
Sherzer, Joel 1991. The Brazilian thumbs-up gesture. Journal of Linguistic Anthropology 1: 189–197.
Stokoe, William C. 2001. Language in Hand: Why Sign Came Before Speech. Washington, DC:
Gallaudet University Press.
Streeck, Jürgen 2009. Gesturecraft: The Manu-Facturing of Meaning. Amsterdam: John Benjamins.
Tervoort, Bernard 1961. Esoteric symbolism in the communication behavior of young deaf chil-
dren. American Annals of the Deaf 106: 436–480.
Tomasello, Michael 2008. The Origins of Human Communication. Cambridge: Massachusetts
Institute of Technology Press.
Umiker-Sebeok, Jean and Thomas A. Sebeok (eds.) 1987. Monastic Sign Languages. Berlin: De
Gruyter Mouton.
Vermeerbergen, Myriam, Lorraine Leeson and Onno Crasborn (eds.) 2007. Simultaneity in Signed
Languages: Form and Function. Amsterdam: John Benjamins.
Willems, Roel M. and Peter Hagoort 2007. Neural evidence for the interplay between language,
gesture and action: A review. Brain and Language 101: 278–289.
Woll, Bencie 2007. Perspectives on linearity and simultaneity. In: Myriam Vermeerbergen, Lor-
raine Leeson and Onno Crasborn (eds.), Simultaneity in Signed Languages, 337–344. Amster-
dam: John Benjamins.
Yau, Shun-Chiu 1992. Création Gestuelle et Débuts du Langage: Création de Langues Gestuelles chez des Sourds Isolés. Paris: Éditions Langages Croisés.
2. Gesture as a window onto mind and brain
Abstract
This chapter provides an overview of what is currently known about gestures and speech from a psychological perspective. Spontaneous co-verbal gestures offer insights into imagistic and dynamic forms of thinking while speaking and gesturing. The chapter includes motion event studies, also from cross-cultural and developmental perspectives, as well as studies concerning speakers with language impairments.
“it’s like seeing someone’s thought” – Mitsuko Iriye, historian, on observing how to
code gestures.
1. Introduction
To see in gesture “someone’s thought,” as our motto remarks, we look at each case indi-
vidually and in close detail. Since they are unique in their context of occurrence, ges-
tures, for this purpose, are transcribed one by one, never accumulated, and, since
often it is the tiniest features through which thought peeks, we record in detail. Taking
gesture at this fine-grained scale, we cover a wide range – gestures in different types of language (the “S-type” and “V-type”, that is, satellite-framed and verb-framed languages), gestures of children, and gestures in neurological disturbances – and find in each domain that our “window” provides views of thinking as it takes place, thinking that differs across languages, ages, and neurological conditions.
Fig. 2.1: Iconic gesture depicting an event from the animated stimulus, Canary Row. In the car-
toon, Sylvester, the ever-frustrated cat, attempts to reach Tweety, his perpetual prey, by climbing
a drainspout conveniently attached to the side of the building where Tweety in his birdcage
is perched. In this instance, Sylvester is climbing the pipe on the inside, a stealth approach. Tweety
nonetheless spots him, rushes out of his cage and into the room behind, then reappears with
an impossibly large bowling ball, which he drops into the pipe. In the example, the speaker is
describing the bowling ball’s descent. Used with permission of Cambridge University Press.
It is important to consider the precise temporal details of a gesture. They suggest that
in the microgenesis of an utterance the gesture image and linguistic categorization con-
stitute one idea unit, and their timing is an inherent part of how this idea unit is cre-
ated. The start of the gesture preparation is the dawn of the idea unit, which is kept
intact and is unpacked, as a unit, into the full utterance. The phases fall out at the pre-
cise moment of their intellectual realizations. Timing the gesture phases is inherent to
this developing meaning.
forms in the codified system of English, and the meaning of the whole is composed out of these separately meaningful parts according to the plan of the intransitive phrase.
Due to synchrony, the gesture semiotic presents its content at the same time as the
linguistic semiotic, and this duality is an important key to what evolved. This mecha-
nism of combined semiotic opposites is one important spectacle we see through the
gesture window.
5.3.1. English
The above example of a speaker describing the aftermath of the bowling ball event divided the event into six path segments, each with its own path gesture (see Fig. 2.2).
The match between speech and gesture is nearly complete. The speaker’s visuospatial
cognition – in gesture – consists of a half dozen straight line segments, not the single
curved path that Sylvester actually followed (Fig. 2.2).
Fig. 2.2: English speaker’s segmentation of a curvilinear path. Computer art in this and subse-
quent illustrations by Fey Parrill. Used with permission of University of Chicago Press.
5.3.2. Spanish
In video recordings by Karl-Erik McCullough and Lisa Miotto, Spanish speakers,
in contrast, represent this scene without significant segmentation. Their gestures are
single, unbroken curvilinear trajectories. In speech, the entire path may be covered
by a single verb. The description considered here is by a monolingual speaker, recorded in Guadalajara, Mexico (see Fig. 2.3).
The accompanying gesture traces a single, unbroken arcing trajectory down and to
the side. What had been segmented in English becomes in Spanish one curvaceous ges-
ture that re-creates Sylvester’s path. In speech, the speaker made use of onomatopoeia,
which is a frequent verb substitute in our Spanish-language narrative materials
(Fig. 2.3).
To quantify this possible cross-linguistic difference, the table shows the number of
path segments that occur in Spanish and English gestures for the path running from
the encounter with the bowling ball inside the pipe to the denouement in the bowling
alley (Tab. 2.1). English speakers break this trajectory into 43 percent more segments
than Spanish speakers: 3.3 in English and 2.3 in Spanish. Extremes, moreover, favor English. Five English speakers divided the trajectory into six or more segments, compared to only one Spanish speaker. Thus Spanish speakers, even when they divide paths into segments, have fewer and broader segments.
Fig. 2.3: Spanish speaker's single continuous arc for Sylvester's circuitous trip down the pipe, along the street and into the bowling alley (scan images from left to right). Elapsed time is about 1 sec. This illustrates Spanish-style visuospatial cognition of a curved trajectory as a single, unsegmented path. Used with permission of University of Chicago Press.
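As a quick check, the 43 percent figure follows directly from the two reported means:

\[
\frac{3.3}{2.3} \approx 1.43,
\]

that is, English speakers produce roughly 43 percent more path segments per trajectory than Spanish speakers do.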
Gestural manner appeared in the second, third, and fourth lines of the Spanish speaker's description (see Fig. 2.4), despite the total absence of
spoken manner references. Thus, while manner may seem absent when speech alone is
considered, it can be present, even abundant, in the visuospatial thinking.
Fig. 2.4: Spanish speaker’s “manner fog,” while describing Sylvester’s inside ascent. She is at line
2, “[de entra][r / / se met][e por el]” (to enter refl goes-into through the). Her hands continually
rock back and forth (= climbing manner) while rising (= upward path) but without verbal mention
of manner.
The gesture contains manner and synchronizes with the manner verb, “rolls.” The con-
text highlighted manner as the point of differentiation. The content and co-occurrence
highlight manner and suggest that it was part of the psychological predicate.
This gesture, despite the presence of the same verb, “rolls,” skips the verb and presents
no manner content of its own. It shows path alone, and co-occurs with the satellite,
“down.” Both the timing and the shape of the gesture suggest that manner was not a
major element of the speaker’s intent and that “rolls,” while referentially accurate,
was de-emphasized and functioned as a verb of motion only, with the manner content
modulated (the speaker could as well have said “goes down,” but this would have
meant editing out the true reference to rolling).
(8) lao tai-tai [na -ge (9) da bang hao]-xiang gei (10) ta da-xia
old lady hold-CLASSIFIER big stick seem CAUSE him hit-down (verb-satellite)
‘The old lady apparently knocked him down with a big stick’
The gesture (a downward blow with her left hand, fist clenched around “the stick,” palm
facing center) that accompanied the spoken reference to the stick (da bang ‘big stick’)
was expressively the same as the verb and satellite, da-xia ‘hit-down’. However, the
speaker’s hand promptly relaxed, long before this verb phrase was reached in speech.
Chinese is what Li and Thompson (1976, 1981) termed a “topic prominent” lan-
guage. Wallace Chafe stated the sense of topicalization intended: “What the topics
appear to do is limit the applicability of the main predication to a certain restricted
domain […] the topic sets a spatial, temporal, or individual framework within which
the main predication holds” (Chafe 1976: 50).
In this instance, the domain is what was done with the big stick. English and Spanish,
in contrast, are “subject prominent.” Utterances in the latter languages are founded on
subject-predicate relations. In line with this typological distinction, we find cases like
the above, in which gesture provides one element and speech another element, and
they jointly create something like a topic frame. This may be again, therefore, the
impact of language on thinking for speaking.
In English, too, gestures occasionally occur that depict an event yet to appear in
speech (referring here to time lapses far longer than the previously discussed frac-
tion-of-a-second gesture anticipations). Such premature imagery is handled by the
speaker as an error, requiring repair. In the following, a gesture shows the result of
an action and it occurred with speech describing its cause. This is a semantically
appropriate pairing not unlike the Chinese example, but it involved separating the ges-
ture from the predicate describing the same event. It was repaired first by holding it
until the predicate arrived, and then repeating it in enlarged form:
The two gestures in the first clause depicted Sylvester moving down the street, an event
described only in the following clause. The difference between Chinese and English in
this situation is apparent in the second line, the point at which the target predication
emerged in speech. Unlike the Chinese speaker, whose hands were at rest by now,
this English speaker held the gesture (underlined text) and then repeated it in a larger,
more definite way when the possible growth point occurred.
The subsequent enhanced repeat indicates the relevance of the gesture to the pred-
icate. In other words, the speaker retained the imagery from the first clause for the
growth point of the second. She did not, as the Chinese speaker did, use it as a self-
contained framing unit when it first appeared.
(i) Gestural paths tend to be broken into straight line segments in English and into
unbroken curvilinear wholes in Spanish. Chinese also tends to break paths into
straight line segments.
(ii) Gestural manner tends to expand the encoding resources of Spanish and to
modulate them in English (the relationship in Chinese is not known).
(iii) Gestures can combine with linguistic segments to create discourse topics: this
occurs in Chinese, but not in English or Spanish.
In adult gestures, manner typically combines with path in a single gesture (the hand, for example, rotating as it goes down for rolling). Manner without path, however, rarely occurs. Children, like adults, have path-only gestures but, unlike adults, they also have large numbers of pure manner gestures and few path+manner gestures.
In other words, they “decompose” the motion event to pure manner or pure path, and tend not to have gestures that combine the semantic components.
Fig. 2.5: No decomposition with an English-speaking 2;6 year-old, who has path and manner in one gesture. The hand simultaneously swept to the right, moved up and down, and opened and closed. Computer art by Fey Parrill. Used with permission of the University of Chicago Press.
Decomposition, while seemingly regressive, is actually a step forward. The youngest
child from whom we have obtained any kind of recognizable narration of the animated
stimulus was a two-and-a-half year-old English-speaking child. The accompanying illus-
tration (Fig. 2.5) shows her version of Sylvester rolling down the street with the bowling
ball inside him (she reasons that it is under him).
The important observation is that she does not show the decomposition effect. In a
single gesture, the child combines path (her sweeping arc to the right) and manner (in
two forms – an undulating trajectory and an opening and closing of her hand as it
sweeps right, suggested by the up-and-down arrow).
Is this an adult-like combined manner-path gesture? I believe not. An alternative
possible interpretation is suggested by Werner and Kaplan (1963), who described a non-
representational mode of cognition in young children, a mode that could also be the
basis of this gesture. Werner and Kaplan said that the symbolic actions of young chil-
dren (in this case, the gesture) have “the character of ‘sharing’ experiences with the
Other rather than of ‘communicating’ messages to the Other” (1963: 42). Sharing
with, as opposed to communicating and representing to, could be what we see in the
two-and-a-half year-old's gesture. The double indication of manner is suggestive of sharing, since this redundancy would not be a relevant force, as it might have been in a communicative representation of this event in which the child was merely trying to re-create an interesting spectacle for her mother.
One of the first attempts by children to shape their gestures for combination with
language could be the phenomenon of path and manner decomposition. The mecha-
nism causing this could be that the decomposition effect creates in gesture what Karmil-
off-Smith (1979) has suggested for speech: When children begin to see elements of
meaning in a form, they tend to pull these elements out in their representations to
get a “better grip” on them. Bowerman (1982) added that the elements children select
tend to be those with “recurrent organizational significance” in the language. Manner
and path would be such elements, and their reduction in gesture to a single component
could be this kind of hyperanalytic response.
Three illustrations show the decomposition effect in English (age 4, Fig. 2.6), Man-
darin (age 3;11, Fig. 2.7), and Spanish (age 3;1, Fig. 2.8).
Fig. 2.6: English-speaking four-year-old with decomposition to manner alone. The child is describ-
ing Tweety escaping from his cage. The stimulus event combined a highly circular path with flying.
The child reduces it to pure manner – flapping wings, suggested by the two arrows – without path,
which was conspicuous in the stimulus and had been emphasized by the adult interlocutor (not
shown, but who demonstrated Tweety’s flight in a simultaneous path-manner gesture). The
embodiment of the bird, in other words, was reduced to pure manner, path excised. Computer
art by Fey Parrill. Used with permission of the University of Chicago Press.
6.2. Perspective
We gain insight into the decomposition effect and how it forms a step in the child’s
emerging imagery-language dialectic when we consider gestural viewpoint: the first-
person or character viewpoint (C-VPT) and the third-person or observer viewpoint
(O-VPT). In observer viewpoint, the speaker’s hands are a character or other entity
as a whole, the space is a stage or screen on which the action occurs, and the speaker’s
body is distanced from the event and is “observing” it. In character viewpoint, the
speaker’s hands are the character’s hands, her body is its body, her space its space,
etc. – the speaker enters the event and becomes the character in part. Unlike panto-
mime, character viewpoint is synchronized and co-expressive with speech and forms
psychological predicates (see Parrill 2011 for extensive discussion of viewpoint combi-
nations). Tab. 2.2 shows the viewpoints of path decomposed, manner decomposed and
fused path + manner gestures for three age groups; all are English speakers.
For adults, we see that most gestures are observer viewpoint, both those that fuse
manner and path and those with path alone. Few gestures in either viewpoint occur
with manner alone.
For children, both older and younger, we see something quite different. Not only do
we see the decomposition effect, but manner and path are sequestered into different
viewpoints. Path tends to be observer viewpoint and manner character viewpoint.
This sequestering enforces the path-manner decomposition: if one gesture cannot
have both viewpoints, it is impossible to combine the motion event components.
(Tab. 2.2, note: all figures are percentages. M = manner; P = path; C-VPT = character viewpoint; O-VPT = observer viewpoint.)
The decomposition effect and this viewpoint sequestering are very long lasting; children
up to twelve years old still show them. Longevity implies that the final break from
decomposition depends on some kind of late-to-emerge development that enables
the child (at last) to conceptualize manner in the observer viewpoint. Until this devel-
opment, whatever it may be, the difference in perspective locks in path-manner
decomposition.
6.3. Imitation
In an example we owe to Karl-Erik McCullough, the decomposition of manner and its
sequestering in character viewpoint is revealed in another way by imitation. Children
do not imitate model gestures with manner in observer viewpoint, even when the
model is directly in front of them and the imitation is concurrent. They change the
model to fit their own decompositional semantics, putting manner into the character
viewpoint and omitting path. In Fig. 2.9, a four-year-old is imitating an adult model.
The adult depicts manner (running) plus path in observer viewpoint, his hand moving
forward, fingers wiggling. The child watches intently; nonetheless, she transforms the gesture into manner with no path, in character viewpoint (in the gesture, she is Sylvester, her arms moving as if running).
Fig. 2.9: Decomposition to manner alone in imitation of model with combined path and manner. Computer art by Fey Parrill. Used with permission of the University of Chicago Press.
7. Neurogesture
I describe here a case of severe Broca’s agrammatic aphasia from Pedelty (1987), a case
of Wernicke’s aphasia, also from Pedelty, a case of split-brain gesture, collected in col-
laboration with Dalia Zaidel, a psychologist at the University of California, Los Angeles,
and the effects of right hemisphere injury on gesture, collected in collaboration with
Laura Pedelty. The first case demonstrates the presence of growth points in Broca’s
aphasia, the second the truncation of growth points in Wernicke’s aphasia, and the
split-brain case a role in the production of iconic gesture for the right hemisphere, a
role confirmed by the study of right-hemisphere injury itself.
Broca's aphasia does not eliminate the capacity to form growth points, but it impairs the ability to access constructions and to orchestrate sequences of speech and gesture movements.
The speaker in Fig. 2.10 had viewed the animated stimulus (the bowling ball scene).
She clearly was able to remember many details of the scene but suffered extreme
impairment of linguistic sequential organization: “cat – bird? – ‘nd cat – and uh – the
uh – she (unintell.) – ‘partment an’ t* – that (?) – [eh ///] – old uh – [mied //] – uh –
woman – and uh – [she] – like – er ap – [they ap – #] – cat [/] – [an’ uh bird /] – [is //] –
Fig. 2.10: Gestures by an agrammatic (Broca’s) aphasic speaker timed with “an’ down t’ t’ down.”
The speaker was attempting to describe Tweety’s bowling ball going down the drainpipe. Com-
puter art by Fey Parrill. Used with permission of the University of Chicago Press.
I uh – [ch- cheows] – [an’ down t’ t’ down]”. Gestures occurred at several points, indicated
with square brackets, and appeared to convey newsworthy content. The figure shows a
gesture synchronous with “an’ down t’ t’ down,” depicting the bowling ball’s downward
path. Plausibly, this combination of imagery and linguistic categorization was a growth
point. The gesture occurred at the same discourse juncture where gestures from normal
speakers also occur, implying that for the patient, as for normals, a psychological pred-
icate was being differentiated from a field of oppositions.
In the case of Wernicke's aphasia, gestures after injury, like speech, are garbled and lack intelligible pragmatic or semantic content. Strikingly, one gesture-speech combination (“to-it”, with the gesture in Fig. 2.11) seems to have become fixed in his memory and occurred repeatedly: a growth point that – very abnormally – would not switch off (normal growth points disintegrate after a second or two; see McNeill 1992: 240–244).
Fig. 2.11: Wernicke aphasic recurring imagery with the phrase “to-it.” Each panel shows a speech-
gesture combination created without meaningful context. The panels represent temporally widely
separated points, and show “getting to-it.” Computer art by Fey Parrill. Used with permission of
the University of Chicago Press.
Within traditional models of the brain, Wernicke’s area supplies linguistic categorial
content. It is known to be essential for speech comprehension, which is severely dis-
rupted after injury to the posterior speech area (Gardner 1974). However, it also
might play a role in speech production.
As inferred from the effect of injuries, Wernicke's area could help generate the categorial content of growth points; in turn, this content gives the imagery of the growth point a shape that accords with the linguistic meanings. Damage accordingly interferes with the growth point, as we see in the transcript and Fig. 2.11. The repetitiveness in the “to-it” example, whereby an initially meaningful speech-gesture combination (as it appears) became detached from context and sense, ensured that all ensuing growth points would be denied content (since they could not vary their linguistic categorial parts).
Right hemisphere damaged patients display reduced gesture output. Our sample of five such patients clumps at the low end of the distribution. Two other right-hemisphere patients had gesture outputs in the normal range. The difference between the groups presumably is due to details of the injured areas (hand dominance was not a factor). Since the two preserved patients and the five depleted patients were non-overlapping groups, it is misleading to combine them statistically. We therefore focus on the depletion phenomenon and limit our sample to that group. Tab. 2.3 compares Canary Row narrations by five right hemisphere patients to those by three normal speakers.
Injury has no impact on speech, as measured in the number of clauses and number of
words, or the length of time taken to recount the stimulus; if anything, right hemisphere
damaged speakers are more talkative on these measures (Tab. 2.4).
And right hemisphere damaged patients talk faster – more words and clauses, while
taking less time (Tab. 2.5).
Gesture imagery thus seems to be the specific target of right hemisphere damage.
I just saw the* # the cat running ar* run* white and
black cat [# running here or there t* hop to here*] here, there, everywhere.
The letters a, b, and c mark points within the bracketed stretch of speech; the hand hops forward four times:
a = onset of first hopping gesture
b = between second and third hopping gestures, and the approximate point
when her hand entered her field of vision
c = fourth hopping gesture
The speaker was describing Sylvester’s running and began with this verb, but, for rea-
sons unknown, her hand was undulating as it moved forward. As she caught sight of her
own motion, she started to say “hopping.” Kinetic experience also may have been a fac-
tor, but it was not sufficient since the change occurred only when her hopping hand
moved into her field of view.
The example illustrates an imagery-language looseness and release from ongoing
cohesive constraints that seems to be a result of right hemisphere damage. The imagery
with “running” was not constrained by the linguistic category “to run,” in contrast
to normal gesture imagery that is strongly adapted to the immediate linguistic environ-
ment. It also illustrates, however, that speech and gesture are still tightly bound after
right hemisphere damage, in that speech shifted when the undulating gesture came
into view.
LB and NG therefore jointly illustrate one of our main conclusions above – the right
hemisphere (available to NG, apparently minimally used by LB) is necessary for situ-
ating speech in context and imbuing imagery with linguistically categorized signifi-
cance; the left hemisphere (relied on by LB, available to NG) orchestrates well-
formed speech output but otherwise has minimal ability to apprehend and establish
discourse cohesion.
(i) The brain must be able to combine motor systems – manual and vocal/oral – in a
systematic, meaning-controlled way.
(ii) There must be a convergence of two cognitive modes – visuospatial and linguistic –
and a locus where they converge in a final motor sequence. Broca’s area is a log-
ical candidate for this place. It has the further advantage of orchestrating actions
that can be realized both manually and within the oral-vocal tract. MacNeilage
(2008) relates speech to cyclical open-close patterns of the mandible, and proposes
that speech could have evolved out of ingestive motor control. (See language ori-
gin theories in McNeill this volume a).
(iii) More than Broca’s and Wernicke’s areas underlie language – there is also the right
hemisphere and interactions between the right and left hemispheres, as well as
possibly the frontal cortex. A germane result is Federmeier and Kutas (1999),
who found through evoked potential recordings different information strategies
in the right and left sides of the brain – the left they characterized as “integrative,”
the right as “predictive.” These terms relate very well to the hypothesized roles
of the right and left hemispheres in the generation of growth points and unpack-
ing. The growth point is integrative, par excellence, and is assembled in the right
hemisphere, per the hypothesis of this chapter; unpacking is sequential orches-
tration, and orchestration would be involved in prediction, when that is the
experimental focus. And Kelly, Kravitz and Hopkins (2004) report evoked response effects (N400) in the right brain when subjects observe speech-gesture mismatches.
(iv) Wernicke’s area serves more than comprehension – it also provides categorization,
might initiate imagery and might also shape it.
(v) Imagery arises in the right hemisphere and needs Wernicke-originated categoriza-
tions to form growth points. Categorial content triggers and/or shapes the imagery
in the right hemisphere. At the same time, it is related to the context to which the
right hemisphere has access.
(vi) The growth point is unpacked in Broca’s area. Growth points may take form in the
right hemisphere, but they are dependent on multiple areas across the brain (fron-
tal, posterior left, as well as right and anterior left). In addition, the cerebellum
would be involved in the initiation and timing of gesture phases relative to speech
effort (see Spencer et al. 2003). However, this area is not necessarily a site
specifically influenced by the evolution of language ability.
(vii) Catchments and growth points specifically are shaped under multiple influences –
from Wernicke’s area, the right hemisphere, and the frontal area – and take form
in the right hemisphere. (For catchments, see McNeill this volume b).
Throughout the model, the concept is that information from the posterior left hemisphere, the right hemisphere, and the prefrontal cortex converges and is synthesized in the frontal left hemisphere motor areas of the brain – Broca's area and the adjacent
premotor areas. This circuit could be composed of many smaller circuits – “localized
operations [that] in themselves do not constitute an observable behavior […] [but]
form part of the neural ‘computations’ that, linked together in complex neural circuits,
are manifested in behaviors” (Lieberman 2002: 39). See Feyereisen (volume 2) for evidence from aproprioception for a thought-language-hand link in the brain.
Broca’s area in all this is the unique point of (a) convergence and (b) orchestration of
manual and vocal actions guided by growth points and semantically framed language
forms. The evolutionary model presented in McNeill (this volume a) specifically aims
at explaining orchestration of actions under other significances in this brain area and
how it could have been co-opted by language and thought.
9. References
Bowerman, Melissa 1982. Starting to talk worse: Clues to language acquisition from children’s late
speech errors. In: Sidney Strauss (ed.), U-Shaped Behavioral Growth, 101–145. New York: Aca-
demic Press.
Chafe, Wallace 1976. Givenness, contrastiveness, definiteness, subjects, topics, and point of view.
In: Charles N. Li (ed.), Subject and Topic, 25–55. New York: Academic Press.
Federmeier, Kara D. and Marta Kutas 1999. Right words and left words: Electrophysiological evidence for hemispheric differences in meaning processing. Cognitive Brain Research 8: 373–392.
Feyereisen, Pierre volume 2. Gesture and the neuropsychology of language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Gardner, Howard 1974. The Shattered Mind. New York: Vintage Books.
Gardner, Howard, Hiram H. Brownell, Wendy Wapner and Diane Michelow 1983. Missing the
point: The role of the right hemisphere in the processing of complex linguistic material. In:
Ellen Perecman (ed.), Cognitive Processing in the Right Hemisphere, 169–191. New York: Aca-
demic Press.
Gazzaniga, Michael S. 1970. The Bisected Brain. New York: Appleton-Century-Crofts.
Gazzaniga, Michael S. 1995. Consciousness and the cerebral hemispheres. In: Michael S. Gazza-
niga (ed.), The Cognitive Neurosciences, 1391–1400. Cambridge: Massachusetts Institute of
Technology Press.
Gleitman, Lila 1990. The structural sources of verb meanings. Language Acquisition 1(1): 3–55.
Karmiloff-Smith, Annette 1979. Micro- and macrodevelopmental changes in language acquisition
and other representational systems. Cognitive Science 3: 91–118.
Kelly, Spencer D., Corinne Kravitz and Michael Hopkins 2004. Neural correlates of bimodal
speech and gesture comprehension. Brain and Language 89: 253–260.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: May
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Lausberg, Hedda, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Split-brain patients neglect left
personal space during right-handed gestures. Neuropsychologia 41: 1317–1329.
Li, Charles N. and Sandra A. Thompson 1976. Subject and topic: A new typology of language. In:
Charles N. Li (ed.), Subject and Topic, 457–490. New York: Academic Press.
Li, Charles N. and Sandra A. Thompson 1981. Mandarin Chinese: A Functional Reference Gram-
mar. Berkeley: University of California Press.
Lieberman, Philip 2002. On the nature and evolution of the neural bases of human language. Year-
book of Physical Anthropology 45: 36–63.
Lucy, John A. 1992a. Grammatical Categories and Cognition: A Case Study of the Linguistic Rel-
ativity Hypothesis. Cambridge: Cambridge University Press.
Lucy, John A. 1992b. Language Diversity and Thought: A Reformulation of the Linguistic Relativ-
ity Hypothesis. Cambridge: Cambridge University Press.
MacNeilage, Peter F. 2008. The Origin of Speech. Oxford: Oxford University Press.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David this volume a. The co-evolution of gesture and speech, and its downstream conse-
quences. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body-Language-Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
McNeill, David this volume b. The growth point hypothesis of language and gesture as a dynamic
and integrated system. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body-Language-Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking for speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
McNeill, David and Laura Pedelty 1995. Right brain and gesture. In: Karen Emmorey and Judy
Snitzer Reilly (eds.), Sign, Gesture, and Space, 63–85. Hillsdale, NJ: Erlbaum.
Parrill, Fey 2011. The relation between the encoding of motion event information and viewpoint
in English-accompanying gestures. Gesture 11: 61–80.
Pedelty, Laura L. 1987. Gesture in Aphasia. Ph.D. dissertation, University of Chicago.
Pinker, Steven 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cam-
bridge, MA: Massachusetts Institute of Technology Press.
Slobin, Dan I. 1987. Thinking for speaking. In: Jon Aske, Natasha Beery, Laura Michaelis and
Hana Filip (eds.), Proceedings of the Thirteenth Annual Meeting of the Berkeley Linguistic
Society, 435–445. Berkeley, CA: Berkeley Linguistic Society.
Slobin, Dan I. 1996. From “thought and language” to “thinking for speaking.” In: John Joseph
Gumperz and Stephen C. Levinson (eds.), Rethinking Linguistic Relativity, 70–96. Cambridge:
Cambridge University Press.
Slobin, Dan I. 2004. The many ways to search for a frog: Linguistic typology and the expression of
motion events. In: Sven Strömqvist and Ludo Verhoeven (eds.), Relating Events in Narrative,
Volume 2: Typological and Contextual Perspectives, 219–257. Mahwah, NJ: Lawrence Erlbaum.
Slobin, Dan I. 2009. Review of M. Bowerman and P. Brown (eds.), Crosslinguistic Perspectives on Argument Structure: Implications for Learnability. Journal of Child Language 36: 697–704.
Spencer, Rebecca M.C., Howard N. Zelaznik, Jörn Diedrichsen and Richard B. Ivry 2003. Dis-
rupted timing of discontinuous but not continuous movements by cerebellar lesions. Science
300: 1437–1439.
Talmy, Leonard 1975. Syntax and semantics of motion. In: John P. Kimball (ed.), Syntax and
Semantics, Volume 4, 181–238. New York: Academic Press.
Talmy, Leonard 1985. Lexicalization patterns: Semantic structure in lexical forms. In: Timothy
Shopen (ed.), Language Typology and Syntactic Description, Volume III: Grammatical Cate-
gories and the Lexicon, 57–149. Cambridge: Cambridge University Press.
Talmy, Leonard 2000. Toward a Cognitive Semantics. Cambridge: Massachusetts Institute of Tech-
nology Press.
Vygotsky, Lev Semenovich 1987. Thought and Language. Edited and translated by Eugenia Hanf-
mann and Gertrude Vakar (revised and edited by Alex Kozulin). Cambridge: Massachusetts
Institute of Technology Press.
Werner, Heinz and Bernard Kaplan 1963. Symbol Formation. New York: John Wiley. [Reprinted
in 1984 by Erlbaum].
Whorf, Benjamin Lee 1956. Language, Thought, and Reality. Selected Writings of Benjamin Lee
Whorf. Edited by John B. Carroll. Cambridge: Massachusetts Institute of Technology Press.
Zaidel, Eran 1978. Concepts of cerebral dominance in the split brain. In: Pierre A. Buser and Arl-
ette Rougeul-Buser (eds.), Cerebral Correlates of Conscious Experience, 263–284. Amsterdam:
Elsevier.
3. Gestures and speech from a linguistic perspective: A new field and its history
Abstract
This chapter gives a brief overview of gesture research from a linguistic point of view.
It begins with a short sketch of the history of research on gestures as part of spoken
language and an attempt to understand the longstanding lack of linguistic interest in
considering gestures a relevant topic – or a relevant feature of language.
It then shows that a new field of gesture research has emerged over the past decades,
which regards gesture and speech as inherently intertwined. We have attempted to system-
atize the findings regarding the nature of gestures and their relation to language in use
according to the four aspects currently most widely researched: 1) form and meaning
of gestures, 2) gestures and their relation to utterance formation, 3) gestures, language,
and cognition, and 4) gestures as a communicative resource in interaction and discourse.
In doing this, an overview of the present state of the art of research on gesture as part
of spoken language is presented. The chapter is complemented by a comprehensive bibliography of current research on gestures and speech from a linguistic perspective.
As for the hands, without which all action would be crippled and enfeebled, it is scarcely
possible to describe the variety of their motions, since they are almost as expressive as
words. For other portions of the body merely help the speaker, whereas the hands may
almost be said to speak. Do we not use them to demand, promise, summon, dismiss,
threaten, supplicate, express aversion or fear, question or deny? Do we not employ
them to indicate joy, sorrow, hesitation, confession, penitence, measure, quantity, number
and time? Have they not power to excite and prohibit, to express approval, wonder or
shame? Do they not take the place of adverbs and pronouns when we point at places
and things? In fact, though the peoples and nations of the earth speak a multitude of ton-
gues, they share in common the universal language of the hands. The gestures of which I
have thus far spoken are such as naturally proceed from us simultaneously with our words
(Quintilian: Institutionis oratoriae XI 3, 85–88).
The idea of gestures as a universal language was present in the Renaissance (Bacon, Bulwer), played a prominent role in the philosophy of the Enlightenment (Condillac, Diderot), and was also discussed in Romanticism (Vico, Herder) (for more detail see Copple this volume; Müller 1998; Wollock 1997, 2002, this volume). Notably, this longstanding recognition of gesture's linguistic properties and their potential for language declined over the 19th and 20th centuries. Treatments like de Jorio's Mimica degli Antichi investigata nel gestire napoletano in the early 19th century (see de Jorio [1832] 2000, with an introduction by Adam Kendon) and Wundt's work on the gestures and signs of Neapolitans, Plains Indians and Deaf people (Wundt 1921) did not inspire scholarly reflection upon gesture as part of language (see also Kendon 2004: chapters 3 and 4).
Wollock (this volume) summarizes this development and its implications for contem-
porary reflections on gestures as follows:
Renaissance ideas on gesture foreshadow the 18th century, and to some extent even
Romanticism (see Vico, Herder). Important for us today is not so much the literal question
whether gesture is a universal language, as the fact that in this period gesture called atten-
tion to linguistic processes that are certainly universal – psychophysiological processes
common to verbal and nonverbal thought – but that were often overlooked, downplayed,
or even denied in 20th-century linguistics. (Wollock this volume)
Within 20th century linguistics gestures were not considered a relevant topic – or a
relevant feature of language. Under the auspices of Saussurean linguistics, hand-
movements that go along with speech were thrown into the wastebasket of parole or
of language use (Saussure, Bally, and Sechehaye 2001; Albrecht 2007). The idea of language as a social system (langue) underlying all forms of language use was critical in defining and establishing linguistics as a scholarly discipline in Europe, and it had an immense impact on the humanities there. Structuralism became one of the most influential schools of thought in the twentieth century.
Such a focus on language as a social system distanced the attention of linguists from
those phenomena that are characteristic of language use. This also holds for American
structuralism, which was strongly marked by the great challenge of documenting the
wealth of unwritten languages of Native America (see Bloomfield 1983; Z. Harris
1951). Notably, those were languages without a writing system: spoken languages characterized by their lack of a written tradition. Interestingly enough, American linguistic anthropology focused on the de-contextualized systematic features of these languages, not on their particular nature as spoken languages. However, given the goal of identifying the grammatical structures or the linguistic system “behind” the spoken words, this appeared to make perfect sense – concepts of emergent grammar were not discussed at the time (Hopper 1998).
An exceptional study that did empirically investigate forms and functions of gestures
that are used in conjunction with speech comes out of American anthropology: the doc-
toral dissertation of David Efron ([1941] 1972), a student of Franz Boas. Carried out
during the Second World War, it was not intended as a contribution to linguistic questions but as an empirical study within the “nature-nurture” debate, seeking an answer to the question: Is human behavior shaped by culture or by nature? To counter racist and eugenicist positions, David Efron set out to study the gestural behavior of
traditional Eastern-Jewish and Southern Italian immigrants in New York City and com-
pared their style of gesturing with that of the second-generation immigrants. What he
found in the second generation were hybrid gestural forms: gestures that blended “American” with “Italian” or “Jewish” forms of gesturing. This was taken as support for the nurture position, because it showed the influence of culture in shaping human communicative behavior. Efron's meticulous semiotic analysis of gestural forms was a prerequisite for defining and identifying these differences. But the study did not inspire scholars of language to look at gesture. His work and his classification system of gestures were later made widely known by the psychologist Paul Ekman and became a standard reference system in 20th-century research on
bodily behavior more generally and non-verbal communication research in particular
(Ekman and Friesen 1969).
With Chomskyan linguistics taking the lead in the middle of the twentieth century, linguistics made a turn towards cognitive science: the universal cognitive competence of humans for acquiring language came to be the topic of linguistics proper. "Language performance", and accordingly gesture, was not regarded as a relevant topic of inquiry within the field of linguistics so defined (Chomsky 1965; R. Harris 1995). At roughly the same time, however, there were some singular attempts to analyze body movements from a structuralist point of view: Ray Birdwhistell (1970) – a linguistic anthropologist – put forward an account of facial expressions, postures, and hand movements, for which he coined the term "Kinesics". He developed a structuralist framework for the description of body movements and proposed that units of gestures are structured very much like linguistic units:
The isolation of gestures and the attempt to understand them led to the most important
findings of kinesic research. This original study of gestures gave the first indication that ki-
nesic structure is parallel to language structure. By the study of gestures in context, it
became clear that the kinesic system has forms which are astonishingly like words in a
language. (Birdwhistell 1970: 80)
In the sixties, the anthropologist and linguist Kenneth Lee Pike put forward a theoretical framework for language as part of human behavior (Pike 1967). Extending the phonology/phonetics, or phoneme/phone, distinction of structural linguistics to human behavior in general, he introduced the differentiation between emic and etic aspects of human behavior. He proposed that emic aspects of human behavior concern meaning, while etic aspects address their material characteristics. Pike even argued that these behavioral units could form what nowadays would be termed a "multimodal syntactic structure", namely a sentence in which verbal and gestural forms are systematically integrated (for a detailed account of Pike's contribution to a multimodal grammar see Fricke 2012, this volume).
A pioneer in researching bodily behavior with speech is Adam Kendon (trained in biology and, in particular, ethology). In the sixties and seventies he researched the patterns of interactive bodily behavior (Kendon, Harris, and Key 1975; Kendon 1990). His analysis of the behavioral units and sequencing of greetings (Kendon and Ferber 1973) revealed that communicative bodily actions are highly structured, meaningful, and closely integrated with speech. In the early seventies Kendon provided the first systematic micro-analysis of gestural and vocal units of expression. At that time film recordings became available for scientific research, and the possibility of inspecting these sequences again and again made it possible to discover the fine-grained micro-structures of human bodily and verbal behavior. An important outcome of this development in technology was the first micro-analysis of speech and body motion. Kendon showed that units of speech and units of body motion possess a similar hierarchical structure: larger units of movement go along with larger units of speech, and smaller units of movement parallel smaller portions of speech (Kendon 1972). It was only about ten years later that he explicitly formulated the idea of gesture and language as being two sides of one process of utterance (Kendon 1980; for his current view see Kendon 2004, this volume).
In the seventies, and for most of the eighties, linguistics continued to be dominated by generative theory (then reformulated as Government and Binding Theory; see Chomsky 1981). Psychology adopted the concept of non-verbal communication (Argyle 1975; Feldmann and Rimé 1991; Hinde 1972; Ruesch and Kees 1970; Scherer and Ekman 1982; Watzlawick, Bavelas, and Jackson 1967), and gestures as part of speech were regarded as only marginally relevant for such a field of research. Instead, those body movements not related to speech and with functions different from language attracted most interest. One consequence of this was a marked increase in research on facial expression (see Ekman and Rosenberg 1997 for an overview). Such a scholarly climate made it difficult to pursue a linguistic perspective on gestures and language throughout the eighties. However, there were other positions: David McNeill (1979) – coming from psychology and linguistics – proposed a theory of language and gesture in which both modalities form one integrated system. Already at that time McNeill and Kendon concentrated on gestures as movements of the hands and on their particular relationship to speech. In contrast to nonverbal communication scholars, they were interested in the movements of the hands because these exhibit a particularly tight interrelatedness with language.
McNeill's idea of gesture as being part of the verbal utterance challenged the distinction between verbal and non-verbal behavior, which characterized the mainstream research on nonverbal communication at the time. It even triggered a public debate carried out in several articles in the journal Psychological Review. McNeill challenged the psycholinguistic belief in gestures as part of the NON-verbal dimensions of communication by raising the question "So you think gestures are nonverbal?" (McNeill 1985). Participants in this debate were Brian Butterworth, Pierre Feyereisen, and Uri Hadar on the one hand and David McNeill on the other (Butterworth and Hadar 1989; Feyereisen 1987; McNeill 1985, 1987, 1989). While McNeill criticized the idea of gestures as being non-verbal, Butterworth and Hadar presented psycholinguistic evidence for their assumption of gestures as something different from speech (see Hadar this volume).
In 1992 McNeill published his integrated theory of gestures and speech in what became a landmark book for a psychological and linguistic approach to gesture and speech: "Hand and Mind: What Gestures Reveal about Thought" (McNeill 1992). In this book McNeill develops his theory of language and gesture. He proposes that gesture and speech are different but integrated facets of language: gesture as imagistic, holistic, and synthetic, language as arbitrary, analytic, and linear. These two sides of language rest on two different modes of thought, one imagistic, the other propositional, and McNeill considers the dialectic tension between the two modes of thought as propelling thought and communication (for more detail, see McNeill this volume).
With its core idea of gestures as a "'window' onto thinking" (McNeill and Duncan 2000), as revelatory of imagistic forms of thought, the book matched a turn towards cognitive science in the humanities and generated a great deal of interest in psychological research on language and cognition (for an overview see McNeill 2000). But for linguistics proper, gesture remained a phenomenon at the margins of interest – if that. This holds even for cognitive linguistics [including cognitive grammar (Langacker 1987), metaphor theory (Ortony 1993), and blending theory (Fauconnier and Turner 2002)], which developed a counter-position to the modularism of generativism (including its further developments as Government and Binding Theory and the minimalist program; see Chomsky 1981, 1992). Cognitive linguistics argues that language rests on general cognitive principles and capacities, challenging the generativist position of linguistic competence as a particular and cognitively distinct module. A cognitive linguistic position quite naturally opens the gate for a concept of language which is not restricted to the oral mode alone and which allows for an integration of different modalities within one process of utterance (see Cienki 2010, 2012). Despite this theoretical pathway, cognitive linguists have for the most part relied on the analysis of invented sentences – not on data from language use, which might have drawn their attention to the work gestures do in conjunction with speech.
But over the past two decades the situation has changed, and we find an increasing number of publications within cognitive linguistics that do consider gestures as part of linguistic analysis (examples are Cienki 1998a, 1998b; Cienki and Müller 2008b; McNeill and Duncan 2000; Mittelberg 2006, 2010a, 2010b; Müller 1998; Müller and Tag 2010; Sweetser 1998; Núñez and Sweetser 2006). Moreover, an increasing number of publications with a linguistic perspective on gestures – or at least a perspective that is compatible with a linguistic analysis of gestures – has also appeared outside of cognitive linguistics. In 2004 Kendon's monograph "Gesture: Visible Action as Utterance" appeared, presenting an encompassing account of the manifold ways in
which gestures can become part of verbal utterances, including a detailed historical section on gesture classifications as well as a discussion of what makes a movement of
the hands a gesture. Other books relevant to a linguistic perspective on gestures
include: Calbris’ “Elements of meaning in gesture” (2011), Enfield’s “The anatomy
of meaning: speech, gesture, and composite utterances” (2009), Fricke’s “Origo,
Geste und Raum: Lokaldeixis im Deutschen” (2007) and “Grammatik multimodal”
(2012), McNeill’s edited volume “Language and Gesture” (2000) and his book on “Ges-
ture and Thought” (2005), Müller’s “Redebegleitende Gesten: Kulturgeschichte –
Theorie – Sprachvergleich” (1998), Müller and Posner’s edited volume “The semantics
and pragmatics of everyday gestures” (2004), and Streeck’s “Gesturecraft: The manu-
facture of meaning” (2009).
After the turn of the century, more and more scholars have begun to look at gestures from a linguistic perspective, focusing on a range of different aspects.
In the following sections we will present and discuss the present state of the art of a linguistic view on gestures in more detail. We will concentrate on four main areas of research: form and meaning of gestures; gestures and their relation to utterance formation; gestures, language, and cognition; and gestures as a dynamic communicative resource in discourse.
This gesture-symbol is global in that the whole is not composed out of separately meaning-
ful parts. Rather, the parts gain meaning because of the meaning of the whole. The
wiggling fingers mean running only because we know that the gesture, as a whole, depicts
someone running. It’s not that a gesture depicting someone running was composed out of
separately meaningful parts: wiggling + motion, for instance. The gesture also is synthetic.
It combines different meaning elements. The segments of the utterance, “he + running +
along the wire,” were combined in the gesture into a single depiction of Sylvester-running-
along-the-wire. (McNeill 1992: 20–21)
The ways in which gestures convey meaning across sequences of gestures are furthermore
characterized as being “non-combinatoric” and “non-hierarchical”: “two gestures pro-
duced together don’t combine to form a larger, more complex gesture. There is no hier-
archical structure of gestures made out of other gestures” (McNeill 1992: 21) and even
if several gestures are combined this does not result in a more complex gesture: “Even
[…] several gestures don’t combine into a more complex gesture. Each gesture depicts
the content from a different angle, bringing out a different aspect or temporal phase,
and each is a complete expression of meaning by itself.” (McNeill 1992: 21)
For McNeill's theory of language and gesture, this sharp distinction between the ways in which meaning is "carried" in language and how it is "conveyed" in gesture is of core importance. McNeill uses a structuralist account of language as a system of arbitrary
signs as a contrastive frame that brings out the particular articulatory properties of ges-
tures and that maximizes the differences between the two modes of expression. This
sharp distinction is a prerequisite for his theory of language, gesture and thought, in
which gestures are considered to reveal a fundamentally different type of thought,
one that is imagistic, global-synthetic, and holistic, whereas language forces thought
into the linearity of speech-sounds and the arbitrariness of linguistic signs. It is also con-
stitutive for his understanding of thinking and speaking as a dynamic process propelled
by an imagery-language dialectic, whose basic unit is the so-called “Growth-Point” (see
McNeill 1992: 219–239 and 2005: 92–97; McNeill and Duncan 2000): “It is this unstable
combination of opposites that fuels thought and speech.” (McNeill 2005: 92) Notably, in
his 2005 book McNeill brings in a phenomenological turn, now taking a non-dualistic
point of view with regard to the relation of gesture and mind. Rather than assuming
that gestures reveal inner images, he now argues with reference to the work of Merleau-
Ponty (1962) that gestures do not represent meaning but “inhabit” it (McNeill 2005:
91–92). Drawing on Heidegger, he proposes the H-Model of gestures, suggesting:
“To the speaker, gesture and speech are not only ‘messages’ or communications, but are
a way of cognitively existing, of cognitively being, at the moment of speaking.” (McNeill
2005: 99) This new concept of gestural meaning and its relation to form is brought
together in the concept of gestures as “material carriers” (inspired by Vygotsky 1986),
and advances a phenomenological understanding of the meaning of gestures.
McNeill's (1992) book set the stage for a view on the meaning of gestures as holistic and "global-synthetic", and it inspired a wealth of research in the domain of language and
thought (see McNeill 2000; Parrill 2008; Parrill and Sweetser 2002, 2004; for a discussion
see Kendon 2008).
In his 2009 book "Gesturecraft: The manu-facture of meaning", Jürgen Streeck proposes a praxeological account of the meaning of gestures, which is also strongly informed by phenomenology. However, Streeck focuses on the situatedness of meaning making in mundane practices of the world:
The point of departure for the research reported in this book, thus, are human beings in
their daily activities. The perspective on gesture is informed by the work of phenomenolo-
gical philosophers (Heidegger 1962; Merleau-Ponty 1962; Polanyi 1958) who have argued
that we must understand human understanding by finding it, in the first place, in concrete
practical, physical activity in the world, as well as by more recent work in anthropology
[…], philosophy and linguistics […], educational psychology […], and sociology, which is
defined by the view that the human mind – and the symbols that it relies upon – are
embodied. (Streeck 2009: 6, highlighting in the original)
We suggest, […], that the conjunction of the stroke with the informational centre of the
spoken phrase is something the speaker achieves. In creating an utterance that uses
both modes of expression, the speaker creates an ensemble in which gesture and speech
are employed together as partners in a single rhetorical enterprise. (Kendon 2004: 127)
Kendon underlines the flexible ongoing adjustment of the two modes of expression in
this intertwined process of verbo-gestural meaning construction in an ongoing conver-
sation (Kendon 2004: 127).
Enfield develops a concept of the meaning of gestures which takes the perspective of the interpreter as its point of departure (adopting a Peircian approach in this regard). In his book "The anatomy of meaning: speech, gesture, and composite utterances" (Enfield 2009: IX) he brings together semiotic (Peirce), pragmatic (Grice, Levinson), and interactive (Goffman, Sacks) approaches to the meaning of gestures and language as used in interaction. However, Enfield is also in line with Kendon's and Streeck's take on the meaning of gestures as situated interactional moves by conceiving of gestures as elements of composite utterances. His proposal opens up further important facets of Kendon's gesture-speech ensembles: "composite utterances [are defined] as a communicative move that incorporates multiple signs of multiple types." (Enfield 2009: 15) The meaning of such
of gestural forms, which they use recurrently. As McNeill suggests for the Palm Up Open Hand (2005: 48–53), recurrent gestures are in a process of becoming conventionalized: their form-meaning relation is motivated and the motivation is still transparent, but given their recurrent usages in a limited set of contexts, they appear to be best placed somewhere in the middle of a continuum between spontaneously created gestures on the one hand and fully conventionalized emblems on the other (for different gesture continua see McNeill 2000: 2–7).
Based on a discussion of recurrent gestural forms and recurrent gestural meanings, studies have furthermore documented that gestures can become semanticized as well as grammaticalized. Hence culture-specific gestures can be deployed as lexical or grammatical elements in co-occurrence with speech, or they may even enter sign linguistic systems as lexemes or grammatical morphemes. Accordingly, gestures have, for instance, been identified as markers of negation (Harrison 2009, 2010; Müller and Speckmann 2002; Bressem, Müller, and Fricke in preparation), of Aktionsart (Becker et al. 2011; Bressem 2012; Ladewig and Bressem forthcoming; Müller 2000), of topic-comment structure (Kendon 1995; Seyfeddinipur 2004), or as plural markers (Bressem 2012). Furthermore, pathways of grammaticalization from gesture to sign have been traced, for instance, for classifier constructions (Pfau and Steinbach 2006; Müller 2009), tense and modality (Janzen and Shaffer 2002; Wilcox 2004; Wilcox and Rossini 2010; Wilcox and Wilcox 1995), topic marking (Janzen and Shaffer 2002), or for the development of pronouns and auxiliaries (Pfau and Steinbach 2006, 2011).
Several studies have also shown that gestures and speech are intertwined on the level of syntax and semantics, each providing necessary information to the formation of an utterance. Gestures are obligatory elements for the use of particular verbal deictic expressions such as so, here, or there (e.g., de Ruiter 2000; Fricke 2007; Kita 2003; Streeck 2002; Stukenbrock 2008; inter alia) and may even differ in gestural form depending on the intended reference object of the deictic expression (e.g., Fricke 2007; Kendon 2004). Gestures also stand in close relation to aspects of verbal negation (e.g., Bressem, Müller, and Fricke in preparation; Calbris 1990, 2003, 2008; Harrison 2009; Kendon 2003, 2004; Streeck 2009; inter alia) and may go along with different types of negation, such as negative particles, morphological negation, and implicit negation, as well as with the grammatical aspects of scope and node of negation (e.g., Harrison 2009, 2010; Lapaire 2006). Furthermore, gestures seem to be closely related to grammatical categories of the verbal utterance, so that iconic gestures, for instance, often correlate with nouns, verbs, and adjectives (e.g., Hadar and Krauss 1999; Sowa 2005; Bergmann, Aksu, and Kopp 2011).
Various scholars have furthermore argued that gestures can be integrated into the syntactic structure of an utterance (e.g., Andrén 2010; Bohle 2007; Clark 1996; Clark and Gerrig 1990; Goodwin 1986, 2007; Enfield 2009; Langacker 2008; McNeill 2005, 2007; Müller and Tag 2010; Slama-Cazacu 1976; Streeck 1988, 1993, 2002, 2009; Wilcox 2002). Recent empirical studies have expanded those characterizations and suggest that gestures can take over syntactic functions either by accompanying or by substituting for speech. Fricke (2012), for instance, distinguishes two forms of integrability: gestures may be integrated by positioning, that is, either through occupying a syntactic gap or through temporal overlap; or they may be integrated cataphorically, that is, by using the deictic expressions son or solch ('such a'). These deictic expressions demand "a qualitative description that can be instantiated gesturally" (Fricke 2012, our translation). In doing so, gestures expand a verbal noun phrase and serve as an attribute. This phenomenon, also referred to as "multimodal attribution", furnishes evidence for the structural and functional integration of gestures into spoken language and laid the groundwork for developing the framework of a "multimodal grammar" (Fricke 2012).
Bressem (2012) and Ladewig (2012) expanded the notion of a multimodal grammar by showing that gestures either accompany or substitute for nouns and verbs of spoken utterances. Bressem could show that gestural repetitions can serve an attributive function when they co-occur with noun phrases and an adverbial function in cases of temporal overlap with verb phrases. The potential of gestures to take over syntactic functions by specifying the shape and size of objects or depicting the manner of the action is in those cases not bound to a cataphoric integration of the gestures into the verbal utterance by explicit linguistic devices (see Fricke 2012), but rather seems to be based on the temporal, semantic, and syntactic overlap of the gestures with speech. In her study on gestures in syntactic gaps exposed by interrupted utterances, Ladewig could show that gestures do not adopt all kinds of syntactic functions when substituting for speech, as was assumed by some authors (e.g., Slama-Cazacu 1976). Rather, when replacing speech, gestures preferably fulfill the function of objects and predicates. Based on her study she argued for a "continuum of integrability" (Ladewig 2012: 183) in which the link between gesture and speech can be conceived of as varyingly strong depending on three aspects: the type of integration, the distribution of information over the different modalities, and the order in which speech and gesture are deployed.
In her study, she furthermore found that referential gestures (e.g., gestures referring to
concrete or abstract actions, entities, events, properties) are the most frequently used
type of gesture with a substitutive function. This finding questions the widely accepted
assumption that the kinds of gestures typically used to replace speech are emblematic
or pantomimic gestures.
Gestures not only take over syntactic functions when forming multimodal utterances,
but they also contribute to the semantics of an utterance. Gestures may replace infor-
mation, illustrate and emphasize what is being uttered verbally, soften or slightly modify
the meaning expressed in speech or even create a discrepancy between the gestural and
verbal meaning (see Bavelas, Kenwood, and Phillips 2002; Bergmann, Aksu, and Kopp
2011; Bressem 2012; Calbris 1990; Engle 2000; Freedman 1977; Fricke 2012; Gut et al.
2002; Kendon 1987, 1998, 2004; Ladewig 2011, 2012; McNeill 1992, 2005; Scherer 1979;
Slama-Cazacu 1976). In exploring the semantic relation of gesture and speech, linguistic
studies have offered different approaches. Semantic information conveyed in both mod-
alities can be described in terms of image schematic structures (e.g., Cienki 1998b, 2005;
Ladewig 2010, 2011, 2012; Mittelberg 2006, 2010a; Williams 2008) or in terms of seman-
tic features (e.g., Beattie and Shovelton 1999, 2007; Bergmann, Aksu, and Kopp 2011;
Bressem 2012; Kopp, Bergmann, and Wachsmuth 2008; Ladewig 2012). Together with the temporal position of gesture and speech, the semantic relation between the two modalities as well as the semantic function of gestures can be captured: if gestures double the information expressed in speech, their relation can be described as co-expressive
(McNeill 1992, 2005) or redundant (e.g., Gut et al. 2002). If they add information to
that expressed in speech, the relation between both can be described as “complemen-
tary” or “supplementary”. In these cases gestures modify information expressed in
speech (e.g., Andrén 2010; Birdwhistell 1970; Bergmann, Aksu, and Kopp 2011; Bres-
sem 2012; Kendon 1986, 2004; Freedman 1977; Fricke 2012; Scherer 1979). Thereby
both modalities are considered as being “enriched” by their co-occurrence and by
the context in which they are embedded (Enfield 2009, this volume; see also Bressem
2012; Ladewig 2012). At the same time, the range of a gesture’s possible meanings is
reduced as the spoken modality provides necessary information to single out a refer-
ence object (Ladewig 2012). In doing so gestures “are not limited to primarily depicting
specific situations or individuals” but “can be used to depict types or kinds of things,
like prototypes” (Engle 2000: 39). Gestures may single out exemplar interpretations
in speech by picking out a specific individual from a collection mentioned in speech
(Engle 2000) and thus refer to a meaning or concept associated with a word, that is, a prototype, or to an intended object of reference. By being interpretant- or object-related (Fricke 2007, 2012, based on Peirce 1931), gestures are not always and only
tied to the representation of referents in the real world, but are also capable of
seemingly contradicting the intended object of reference (see Fricke 2012).
Gestures that replace speech fulfill a substitutive function and can form an utterance
on their own or provide the semantic center of a multimodal utterance (e.g., Bohle
2007; Clark 1996; Clark and Gerrig 1990; McNeill 2005, 2007; Müller and Tag 2010;
Slama-Cazacu 1976; Wilcox 2002). The unit created by both modalities has been
referred to as “gesture-speech ensemble” (Kendon 2004), “multimodal utterance”
(Goodwin 2006; Wachsmuth 1999), “composite utterance” (Enfield 2009, this volume),
“composite signal” (Clark 1996; Clark and Gerrig 1990; Engle 1998), “multimodal
package” (Streeck this volume), and as “hybrid utterance” (Goodwin 2007).
[…] the traditional view of mind is mistaken, because human cognition is fundamentally
shaped by various poetic or figurative processes. Metaphor, metonymy, irony, and other
tropes are not linguistic distortions of literal mental thought but constitute basic schemes
by which people conceptualize their experience and the external world. (Gibbs 1994: 3)
Research on gestures in relation to verbal metaphoric expressions has made even more
specific claims by proposing that gestures used in conjunction with speech may form
multimodal metaphors. In such cases, the gesture part of the metaphor very frequently
embodies the experiential source domain of the verbalized metaphoric expression (e.g.,
Calbris 1990, 2003, 2011; Cienki 1998a, 2008; Cienki and Müller 2008b; Kappelhoff and
Müller 2011; McNeill 1992; McNeill and Levy 1982; Müller 1998, 2004, 2008a, 2008b;
Müller and Cienki 2009; Núñez and Sweetser 2006; Sweetser 1998; Webb 1998; a
state of the art is given in Cienki and Müller 2008a). What is striking about the studies
on metaphor, gesture, and speech is the variable relation of both modalities in expres-
sing metaphoricity: either metaphoricity can be expressed monomodally, that is in
speech or gesture, or multimodally, that is in both speech and gesture (Cienki 2008;
Cienki and Müller 2008b; Müller and Cienki 2009). The observed distribution of meta-
phoric meaning across the different modalities led to an enhanced understanding of the
“modality-independent nature of metaphoricity” (Müller and Cienki 2009: 321; see also
Müller 2008a, 2008b). It has been suggested that metaphors are clearly delimited, countable units – apt for statistical analysis in corpus linguistics – yet when metaphors are studied in the context of multimodal discourse, that is, as a phenomenon of use, it turns out that very often they are not bound to single lexical items but rather evolve over time. This points to an understanding of metaphoricity as a process rather than a product, a process which can evolve dynamically over time in an interaction, through speech, gesture, and other modalities (Kappelhoff and Müller 2011; Kolter et al. 2012). In such multimodally orchestrated interactions, metaphoric gestures in conjunction with speech are used as a foregrounding strategy which activates the metaphoricity of sleeping metaphors (i.e., so-called "dead" metaphors; see Müller 2008a, 2008b and Müller and Tag 2010 for an extended version of this argument).
The notion of conceptual metonymy (e.g., Gibbs 1994; Lakoff and Johnson 1980) has also recently been receiving increasing attention in the field of gesture studies (Ishino 2001; Mittelberg 2006, 2008, 2010a, 2010b; Müller 1998, 2004, 2009). It is assumed to play a major role in gestural sign formation. Mittelberg (2006, 2008, 2010b), for instance, proposes that observers of gestures follow a metonymic path from a gesture to infer a conceived object. She suggests "that accounting for metonymy in gesture may illuminate links between habitual bodily acts, the abstractive power of the mind, and interpretative/inferential processes." (Mittelberg 2006: 292–293) By introducing the concepts of "internal and external metonymy" (Jakobson and Pomorska 1983; Mittelberg 2006, 2010b), different processes of abstraction involved in the creation and interpretation of gestures are disentangled. Accordingly, in the case of "internal metonymy", an observer of a gesture can infer a whole action or an object of which salient aspects are depicted gesturally. In the case of "external metonymy", objects that are manipulated by the hands can be inferred via a contiguity relation between the object and the hand.
Processes of abstraction in gestures are pertinent to the motivation of the form of
gestures and they contribute significantly to the meaning of gestures. They concern
the level of pre-conceptual structures such as image schemas (see above), action sche-
mas (e.g., Bressem, Müller and Fricke in preparation; Calbris 2011, this volume; Mittel-
berg 2008; Teßendorf 2008; Streeck 2008, 2009), mimetic schemas (Zlatev 2002), and
motor patterns (Mittelberg 2006; Ladewig and Teßendorf 2008, in preparation).
Conceptual blending must be regarded as a higher cognitive process since it concerns
the construction of complex forms of meaning in gestures (Parrill and Sweetser 2004;
Sweetser and Parrill volume 2) as well as in signs (Liddell 1998) and in the interactive
construction of multi-layered blends as for instance in the context of a school teacher’s
explanation of how a clock symbolizes time (Williams 2008).
Gestures in language use have also been subject to analyses addressing more specifically issues of a cognitive grammar: Bressem (2012) on repetitions in gestures, Harrison (2009) on gestures and negation, Ladewig (2012) on the semantic and syntactic integration of gestures, and Wilcox (2004) on the grammaticalization of gestures into signs of signed languages. Núñez (2008) and Streeck (2009) have pointed out that gestures embody what Leonard Talmy terms "fictive motion" (Talmy 1983), thus showing that abstract concepts lexicalized, for instance, as motion verbs (e.g., the road runs along the river) are conceived of as actual body motion. Mittelberg (2008) and Bressem (2012) both found that gestures appear to play a vital role in the establishment of so-called "reference points" (e.g., Langacker 1993). A reference point is a cognitively
salient item that provides mental contact with a less salient target. A gestural form
serves as a reference point by providing cognitive access to a concrete or abstract
object. In so doing, gestures may guide the hearer’s attention to particular aspects of
a conversation. In reference point relations, gestures may “serve as an index” providing
cognitive access to a construed object (Mittelberg 2008: 129).
Broadening the scope from the meaning and functions of single units to cognition and gesture in language use, Harrison (2009), Andrén (2010), and Bressem (2012) have proposed to conceive of this interplay as multimodal or embodied constructions.
Müller (2008a, 2008b), Müller and Tag (2010), and Ladewig (2012) have suggested
that gestures display the flow of attention, especially with regard to the foregrounding
and activating of metaphoricity (for an extension to audio-visual multimodal metaphor
see Kappelhoff and Müller 2011). Furthermore, Bressem (2012) has suggested that
repetitions in gesture follow the attentional flow.
A further aspect that has attracted considerable interest in gesture research over the past years concerns the gestural representation of motion events and its relation to grammatical aspects of the verbal utterance and to the information distributed across the modalities (e.g., Duncan 2005; Gullberg 2011; Kita 2000; Kita and Özyürek 2002;
McNeill 2000; McNeill and Duncan 2000; McNeill and Levy 1982; Müller 1998; Parrill
2008 inter alia). Numerous studies have shown that gestural representations of the same
motion event may differ across languages depending on whether the languages are
verb- or satellite-framed. Whereas speakers of English, for instance, might express
the notion of a ball rolling down a hill in one clause and one gesture, which represents
the motion and the direction at the same time, Japanese or Turkish speakers express the
same notion in two verbal clauses accompanied by two distinct gestures, one expressing
the motion and the other the direction or manner of motion (Kita and Özyürek 2002;
Kita et al. 2007). Thus, if meaning is distributed over two spoken clauses, the same
meaning is likely to be expressed in two gestures, each expressing similar meaning as
the spoken clause (Kita et al. 2007). Therefore, gestures reflect information considered
relevant for expression (what to say) as well as its linguistic encoding (how to say it),
with cross-linguistic consequences. Gestures thus reflect linguistic conceptualization
and cross-linguistic differences in such conceptualizations. (Gullberg 2011: 148)
To sum up: bringing together gesture studies and cognitive perspectives on language and language use contributes to the discussion of "embodied cognition", underlining
that cognitive processes and conceptual knowledge are deeply rooted in the body’s
interactions with the world.
when creating discourse, participating “in a real-time dialectic during discourse, and
thus propel and shape speech and thought as they occur moment to moment” (McNeill
2005: 3).
Another dynamic dimension introduced by McNeill “that reveals itself quite natu-
rally when extending one’s focus from single gesture-speech units to the unfolding of
discourse” (Müller 2007: 109f.) is that of “communicative dynamism” (Firbas 1971).
Following Firbas, communicative dynamism is regarded “as the extent to which the
message at a given point is pushing the communication forward.” (McNeill 1992:
207) McNeill observed that the quantity of gestures as well as the complexity of gestural
and spoken expressions would “increase at points of topic shift, such as new narrative
episodes or new conversational themes” (McNeill and Levy 1993: 365). Further,
when speech and gesture synchronize, i.e., when they are used in temporal overlap,
co-expressing “a single underlying meaning”, the “point of highest communicative
dynamism" is reached (McNeill 2007: 20). With this information revealed in speech and gesture, it can be traced what a speaker focuses on over the course of a narration. "As the
speaker moves between levels and event lines, at any given moment some element is
in focus and other elements recede in the background […] The focal element will
have the effect of pushing the communication forward” (McNeill 1992: 207).
McNeill's observations on communicative dynamism paved the way for Müller's observations of dynamic meaning activation going along with different attentional foci of the speaker (Müller 2007, 2008a, 2008b; Müller and Tag 2010). Adopting a discourse perspective on the analysis of multimodal communication, she found that meaning (and in particular metaphoric meaning) is not created on the spot but emerges over the flow of the discourse. Through the interplay of the different communicative resources participants in a conversation have at hand, meaning can be activated to different degrees and become foregrounded for both speaker and recipient. In their analysis of metaphoricity in multimodal communication, Müller and Tag (2010) identified three different foregrounding techniques in which gestures play a significant role. Accordingly, when metaphoricity is expressed in only one modality, that is, in speech or gesture, it is regarded as only minimally activated. When metaphoricity is elaborated or expressed in both speech and gesture, it is considered waking and highly activated. This dynamic foregrounding of different aspects of (metaphoric) meaning goes along with a moving focus of attention (Chafe 1994). In doing so, "participants in a conversation co-construct an interactively attainable salience structure, that they engage in a process of profiling metaphoric meaning by foregrounding it" (Müller and Tag 2010).
Recent work within the framework of “dynamic multimodal communication” (Mül-
ler 2008a, 2008b) focuses on the experiential grounding of metaphoric meaning. More
precisely, minute studies of face-to-face communication in therapeutic settings and in
the context of dance lessons revealed that bodily movements as well as their “felt qua-
lities” (Johnson 2005; see also Sheets-Johnstone 1999) provide the affective, embodied
grounds of metaphoricity (Kappelhoff and Müller 2011; Kolter et al. 2012). Metaphori-
city can be observed to emerge from bodily movement that is verbalized at a later point in the conversation, demonstrating the dynamic dimension of metaphoric meaning. These ob-
servations give empirical evidence of what has been referred to as “languageing of
movement” (Sheets-Johnstone 1999) – the translation of body movements into words
and, as such, the emergence of meaning from the body.
3. Conclusion
Regarding gestures and speech from a linguistic perspective means addressing the properties of gestures as a medium of expression, both in conjunction with speech and as a modality with its own particular characteristics. It starts from the assumption that the hands
possess the articulatory and functional properties to potentially develop a linguistic sys-
tem (Müller 1998, 2009, this volume; Müller, Bressem, and Ladewig this volume). That
the hands can indeed become language is visible in signed languages all over the world.
In the early days of sign linguistics the challenge was to prove that signed languages
are actually languages. In order to substantiate this claim, a sharp boundary had to be
drawn between gestures and signs. However, with the increasing recognition of signed
languages as full-fledged linguistic systems, the stage has opened up for gestures to
be studied as precursors of signs (Kendon 2004: chapter 15; Armstrong and Wilcox
2007).
This brings us back to claims concerning gestures as the universal language of mankind, especially as Quintilian formulated them. What we see in co-verbal gestures are prerequisites of embodied linguistic structures and patterns that can evolve into language when the oral mode of expression is not a viable form of communication. We would therefore like to suggest that studying gestures and their "grammar" allows us to gain some insights into processes of language evolution within the manual modality.
Despite the lack of reflection on gestures as part of language for most of the twentieth century, a linguistic view on the multimodality of language has by now proven to be a valuable "companion to other present foci, such as psychological or interactional approaches, by expanding the fields of investigations and approaches in gesture studies and thereby contributing to a more thorough understanding of the medium 'gesture' itself as well as the relation of speech and gesture." (Bressem and Ladewig 2011: 87) By allowing for a different point of view on phenomena observable in gestures and their relation with speech, a linguistic view not only further unravels the nature of how speech and gesture "arise from a single process of utterance formation" (McNeill 1992: 30) and are able to "appear together as manifestations of the same process of utterance" (Kendon 1980: 208), but moreover underpins the multimodal nature of language use and of language in general.
Acknowledgements
We are grateful to the Volkswagen Foundation for supporting this work with a grant for
the interdisciplinary project “Towards a grammar of gesture: Evolution, brain and
linguistic structures” (www.togog.org).
4. References
Albrecht, Jörn 2007. Europäischer Strukturalismus: Ein forschungsgeschichtlicher Überblick.
Tübingen: Gunter Narr.
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. dissertation, Centre for Lan-
guages and Literature, Lund University.
Argyle, Michael 1975. Bodily Communication. New York: International Universities Press.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. New York:
Oxford University Press.
Barnett, Dene 1990. The art of gesture. In: Volker Kapp (ed.), Die Sprache der Zeichen und
Bilder, Rhetorik und nonverbale Kommunikation in der frühen Neuzeit, 65–76. Marburg:
Hitzeroth.
Battison, Robin 1974. Phonological deletion in American sign language. Sign Language Studies 5:
1–19.
Bavelas, Janet Beavin, Trudy Johnson Kenwood and Bruce Phillips 2002. An experimental study
of when and how speakers use gesture to communicate. Gesture 2(1): 1–17.
Beattie, Geoffrey and Heather Shovelton 1999. Do iconic hand gestures really contribute anything
to the semantic information conveyed by speech? An experimental investigation. Semiotica
123(1/2): 1–30.
Beattie, Geoffrey and Heather Shovelton 2007. The role of iconic gesture in semantic communi-
cation and its theoretical and practical implications. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language, Volume 1, 221–241.
Philadelphia: John Benjamins.
Becker, Raymond, Alan Cienki, Austin Bennett, Christina Cudina, Camille Debras, Zuzanna
Fleischer, Michael Haaheim, Torsten Müller, Kashmiri Stec and Alessandra Zarcone 2011. Ak-
tionsarten, speech and gesture. Proceedings of the 2nd Workshop on Gesture and Speech in
Interaction – GESPIN, Bielefeld, Germany, 5–7 September.
Bergmann, Kirsten, Volkan Aksu and Stefan Kopp 2011. The relation of speech and gestures:
Temporal synchrony follows semantic synchrony. Paper presented at the 2nd Workshop on
Gesture and Speech in Interaction – GESPIN, Bielefeld, Germany, 5–7 September.
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Bloomfield, Leonard 1983. An Introduction to the Study of Language. Volume 3. Amsterdam: John
Benjamins.
Bohle, Ulrike 2007. Das Wort ergreifen – das Wort übergeben: Explorative Studie zur Rolle rede-
begleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bolinger, Dwight 1983. Intonation and gesture. American Speech 58(2): 156–174.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, European University Viadrina, Frankfurt (Oder).
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases – articulatory features of
gestural movement? Semiotica 184(1/4): 53–91.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic annotation system
for gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Bressem, Jana, Cornelia Müller and Ellen Fricke in preparation. “No, not, none of that” – cases of
exclusion and negation in gesture.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96(1): 168–174.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis. Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2008. From left to right…: Coverbal gestures and their symbolic use of space.
In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 27–53. Amsterdam: John
Benjamins.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Calbris, Geneviève this volume. Elements of meaning in gesture. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Lan-
guage – Communication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Chafe, Wallace L. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Chomsky, Noam 1981. Lectures on Government and Binding. Dordrecht, the Netherlands: Foris.
Chomsky, Noam 1992. A Minimalist Program for Linguistic Theory. Cambridge: Massachusetts
Institute of Technology Press.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphorical ex-
pressions. In: Jean-Pierre König (ed.), Discourse and Cognition: Bridging the Gap, 189–204.
Stanford, CA: Center for the Study of Language and Information.
Cienki, Alan 1998b. Straight: An image schema and its metaphorical extensions. Cognitive Lin-
guistics 9(2): 107–149.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter Mouton.
Cienki, Alan 2008. Why study metaphor and gesture. In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–25. Amsterdam: John Benjamins.
Cienki, Alan 2010. Gesture and (cognitive) linguistic theory. In: Rosario Caballero (ed.), Proceed-
ings of the XXVII AESLA International Conference ‘Ways and Modes of Human Communica-
tion’, 45–56. Ciudad Real, Spain: Universidad de Castilla-La Mancha.
Cienki, Alan 2012. Usage events of spoken language and the symbolic units (may) abstract from
them. In: Krzysztof Kosecki and Janusz Badio (eds.), Cognitive Processes in Language, 149–
158. Frankfurt: Peter Lang.
Cienki, Alan this volume. Cognitive Linguistics: Spoken language and gesture as expressions of
conceptualization. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture and thought. In: Raymond W. Gibbs
(ed.), Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge: Cambridge Uni-
versity Press.
Clark, Herbert H. 1996. Using Language. Volume 4. Cambridge: Cambridge University Press.
Clark, Herbert H. and Richard J. Gerrig 1990. Quotations as demonstrations. Language 66(4):
764–805.
Condon, William C. and Richard Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Condon, William C. and Richard Ogston 1967. A segmentation of behavior. Journal of Psychiatric
Research 5: 221–235.
Copple, Mary this volume. Enlightenment philosophy: Gestures, language, and the origin of
human understanding. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
De Jorio, Andrea 2000. Gesture in Naples and Gesture in Classical Antiquity. A translation of
La mimica degli antichi investigata nel gestire napoletano. With an introduction and notes by
Adam Kendon. Bloomington: Indiana University Press. First published Fibreno, Naples [1832].
De Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.), Lan-
guage and Gesture, 284–311. Cambridge: Cambridge University Press.
Duncan, Susan 2005. Gesture in signing: A case study in Taiwan Sign Language. Language and
Linguistics 6(2): 279–318.
Dutsch, Dorota this volume. The body in rhetorical delivery and in theatre – An overview of clas-
sical works. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1(1): 49–98.
Ekman, Paul and Erika Rosenberg (eds.) 1997. What the Face Reveals: Basic and Applied Studies
of Spontaneous Expression Using the Facial Action Coding System (FACS). New York: Oxford
University Press.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Enfield, N. J. this volume. A ‘Composite Utterances’ approach to meaning. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Engle, Randi A. 1998. Not channels but composite signals: Speech, gesture, diagrams and object
demonstrations are integrated in multimodal explanations. In: Morton Ann Gernsbacher and
Sharon J. Derry (eds.), Proceedings of the Twentieth Annual Conference of the Cognitive
Science Society, 321–326. Mahwah, NJ: Erlbaum.
Engle, Randi A. 2000. Toward a theory of multimodal communication combining speech, gestures,
diagrams, and demonstrations in instructional explanations. Ph.D. dissertation, Stanford University.
Esposito, Anna and Maria Marinaro 2007. What pauses can tell us about speech and gesture part-
nership. In: Anna Esposito, Maja Bratanic, Eric Keller and Maria Marinaro (eds.), Fundamentals
of Verbal and Nonverbal Communication and the Biometric Issue, 45–57. Amsterdam: IOS Press.
Fauconnier, Gilles and Mark Turner 2002. The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. New York: Basic Books.
Feldmann, Robert S. and Bernard Rimé (eds.) 1991. Fundamentals of Nonverbal Behavior. Cam-
bridge: Cambridge University Press.
Feyereisen, Pierre 1987. Gestures and speech, interactions and separations: A reply to McNeill.
Psychological Review 94(4): 493–498.
Firbas, Jan 1971. On the concept of communicative dynamism in the theory of functional sentence
perspective. Brno Studies in English 7: 12–47.
Freedman, Norbert 1977. Hands, words and mind: On the structuralization of body movements
during discourse and the capacity for verbal representation. In: Norbert Freedman and Stanley
Grand (eds.), Communicative Structures and Psychic Structures, 109–132. New York: Plenum.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2010. Phonaestheme, Kinaestheme und multimodale Grammatik: Wie Artikulationen zu Typen werden, die bedeuten können. Sprache und Literatur 41(1): 70–88.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin:
De Gruyter Mouton.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Gerwing, Jennifer and Janet Beavin Bavelas this volume. The social interactive nature of gestures:
theory, assumptions, methods, and findings. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Commu-
nication: An International Handbook on Multimodality in Human Interaction. (Handbooks of
Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Gibbs, Raymond W. 1994. The Poetics of Mind: Figurative Thought, Language, and Understanding.
Cambridge: Cambridge University Press.
Goodwin, Charles 1986. Gesture as a resource for the organization of mutual orientation. Semi-
otica 62(1/2): 29–49.
Goodwin, Charles 2006. Human sociality as mutual orientation in a rich interactive environment:
Multimodal utterances and pointing in Aphasia. In: N. J. Enfield and Stephen C. Levinson
(eds.), Roots of Human Sociality: Culture, Cognition and Interaction, 97–125. London: Berg.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell, and
Elena Levy (eds.), Gesture and Dynamic Dimensions of Language, 195–212. Amsterdam: John
Benjamins.
Graf, Fritz 1994. Gestures and conventions: The gestures of Roman actors and orators. In: Jan Bremmer
and Herman Roodenburg (eds.), A Cultural History of Gesture, 36–58. Cambridge: Polity Press.
Gullberg, Marianne 2011. Thinking, speaking and gesturing about motion in more than one lan-
guage. In: Aneta Pavlenko (ed.), Thinking and Speaking in Two Languages, 143–169. Bristol:
Multilingual Matters.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Univer-
sität Bielefeld, ModeLex Tech. Report, 1.
Hadar, Uri this volume. Coverbal gestures: Between communication and speech production. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin: De Gruyter Mouton.
Hadar, Uri and Robert Krauss 1999. Iconic gestures: The grammatical categories of lexical affili-
ates. Journal of Neurolinguistics 12(1): 1–12.
Harris, Randy A. 1995. The Linguistics Wars. New York: Oxford University Press, USA.
Harris, Zellig 1951. Methods in Structural Linguistics. Chicago: Chicago University Press.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux 3.
Harrison, Simon 2010. Evidence for node and scope of negation in coverbal gesture. Gesture 10(1):
29–51.
Heidegger, Martin 1962. Being and Time. Translated by John Macquarrie and Edward Robinson.
New York: Harper and Row.
Hinde, Robert A. (ed.) 1972. Nonverbal Communication. Cambridge: Cambridge University Press.
Hockett, Charles F. 1958. A Course in Modern Linguistics. New York: MacMillan.
Hopper, Paul 1998. Emergent grammar. In: Michael Tomasello (ed.), The New Psychology of Lan-
guage: Cognitive and Functional Approaches to Language Structure, volume 1, 155–175. Mah-
wah, NJ: Lawrence Erlbaum.
Hougaard, Anders and Gitte Rasmussen this volume. Fused bodies: on the interrelatedness of cog-
nition and interaction. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin: De Gruyter Mouton.
Ishino, Mika 2001. Conceptual metaphors and metonymies of metaphoric gestures of anger in dis-
course of native speakers of Japanese. In: Mary Andronis, Christopher Ball, Heidi Elston and Syl-
vain Neuvel (eds.), CLS 37: The Main Session, 259–273. Chicago: Chicago Linguistic Society.
Jakobson, Roman and Krystyna Pomorska 1983. Dialogues. Cambridge: Massachusetts Institute of
Technology Press.
Janzen, Terry and Barbara Shaffer 2002. Gesture as the substrate in the process of ASL grammati-
calization. In: Richard P. Meier, Kearsy Cormier and David Quinto-Pozos (eds.), Modality and
Structure in Signed and Spoken Languages, 199–223. Cambridge: Cambridge University Press.
Johnson, Mark 2005. The philosophical significance of image schemas. In: Beate Hampe (ed.), From
Perception to Meaning: Image Schemas in Cognitive Linguistics, 15–33. Berlin: De Gruyter Mouton.
Kappelhoff, Hermann and Cornelia Müller 2011. Embodied meaning construction. Multimodal
metaphor and expressive movement in speech, gesture, and feature film. Metaphor in the Social
World 1(2): 121–153.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communica-
tion, 177–210. New York: Elsevier.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
R. Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, Adam 1983. Gesture and speech: How they interact. In: John M. Wiemann (ed.), Nonverbal Interaction, 13–46. Beverly Hills, CA: Sage Publications.
Kendon, Adam 1986. Some reasons for studying gesture. Semiotica 62(1/2): 3–28.
Kendon, Adam 1987. On gesture: Its complementary relationship with speech. In: Aaron W. Sieg-
man and Stanley Feldstein (eds.), Nonverbal Behavior and Communication, 65–97. London:
Lawrence Erlbaum.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behaviour in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Italian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 1998. Die wechselseitige Einbettung von Geste und Rede. In: Caroline Schmauser
and Thomas Knoll (eds.), Körperbewegungen und ihre Bedeutungen, 9–19. Berlin: Arno Spitz.
Kendon, Adam 2003. Some uses of the head shake. Gesture 2(2): 147–182.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam 2008. Language’s matrix. Gesture 9: 355–372.
Kendon, Adam this volume. Exploring the utterance roles of visible bodily action: A personal
account. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Kendon, Adam and Andrew Ferber 1973. A description of some human greetings. In: Richard
Phillip Michael and John Hurrell Crook (eds.), Comparative Ecology and Behaviour of Pri-
mates, 591–668. London: Academic Press.
Kendon, Adam, Richard M. Harris and Mary Ritchie Key 1975. The Organization of Behavior in
Face-to-Face Interaction. The Hague: Mouton.
Kidwell, Mardi this volume. Framing, grounding and coordinating conversational interaction: Pos-
ture, gaze, facial expression, and movement in space. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture. Cambridge: Cambridge University Press.
Kita, Sotaro (ed.) 2003. Pointing: Where Language, Culture, and Cognition Meet. Mahwah, NJ: Lawrence Erlbaum.
Kita, Sotaro and Asli Özyürek 2002. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Kita, Sotaro, Asli Özyürek, Shanley Allen, Amanda Brown, Reyhan Furman and Tomoko Ishizuka
2007. Relations between syntactic encoding and co-speech gestures: Implications for a model of
speech and gesture production. Language and Cognitive Processes 22(8): 1212–1236.
Kolter, Astrid, Silva H. Ladewig, Michela Summa, Sabine Koch, Thomas Fuchs and Cornelia
Müller 2012. Body memory and emergence of metaphor in movement and speech. An interdis-
ciplinary case study. In: Sabine Koch, Thomas Fuchs, Michela Summa and Cornelia Müller
(eds.), Body Memory, Metaphor, and Movement, 201–226. Amsterdam: John Benjamins.
Kopp, Stefan, Kirsten Bergmann and Ipke Wachsmuth 2008. Multimodal communication from
multimodal thinking – towards an integrated model of speech and gesture production. Interna-
tional Journal of Semantic Computing 2(1): 115–136.
Ladewig, Silva H. 2007. The family of the cyclic gesture and its variants – systematic variation of
form and contexts. http://www.silvaladewig.de/publications/papers/Ladewig-cyclic_gesture_pdf;
accessed January 2008.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
In: Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cogni-
tive, and conceptual aspects. Ph.D. dissertation, European University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures based on the four parameters of sign language. Semiotica.
Ladewig, Silva H. and Sedinha Teßendorf in preparation. The brushing-aside and the cyclic
gesture – reconstructing their underlying patterns.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar: Theoretical Prerequisites. Stan-
ford, CA: Stanford University Press.
Langacker, Ronald W. 1993. Reference-point constructions. Cognitive Linguistics 4(1): 1–38.
Langacker, Ronald W. 2008. Cognitive Grammar: A Basic Introduction. Oxford: Oxford Univer-
sity Press.
Lapaire, Jean-Rémi 2006. Negation, reification and manipulation in a cognitive grammar of sub-
stance. In: Stephanie Bonnefille and Sebastian Salbayre (eds.), La Négation, 333–349. Tours:
Presses Universitaires François Rabelais.
Liddell, Scott 1998. Grounded blends, gestures, and conceptual shifts. Cognitive Linguistics 9(3):
283–314.
Loehr, Dan 2004. Gesture and intonation. Ph.D. dissertation, Georgetown University, Washington, DC.
Loehr, Dan 2007. Aspects of rhythm in gesture and speech. Gesture 7(2): 179–214.
Martinet, André 1960/1963. Grundzüge der Allgemeinen Sprachwissenschaft. Stuttgart: Kohlhammer.
McClave, Evelyn Z. 1991. Intonation and gesture. Ph.D. dissertation, Georgetown University,
Washington, DC.
McClave, Evelyn Z. 1994. Gestural beats: The rhythm hypothesis. Journal of Psycholinguistic
Research 23(1): 45–66.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Jour-
nal of Pragmatics 32(7): 855–878.
McNeill, David 1979. The Conceptual Basis of Language. Hillsdale, NJ: Erlbaum.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1987. So you do think gestures are nonverbal. Reply to Feyereisen (1987). Psycho-
logical Review 94(4): 499–504.
McNeill, David 1989. A straight path – to where? Reply to Butterworth and Hadar. Psychological
Review 96(1): 175–179.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David 2007. Gesture and thought. In: Anna Esposito, Maja Bratanić, Eric Keller and
Maria Marinaro (eds.), Fundamentals of Verbal and Nonverbal Communication and the
Biometric Issue, 20–33. Amsterdam: IOS Press.
McNeill, David this volume. The growth point hypothesis of language and gesture as a dynamic
and integrated system. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia submitted. How gestures mean – The construal of meaning in gestures with speech.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of gestures:
a form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a Grammar of
Gesture: Evolution, Brain, and Linguistic Structures. Berlin: Antrag im Rahmen der Förderini-
tiative “Schlüsselthemen der Geisteswissenschaften. Programm zur Förderung fachübergrei-
fender und internationaler Zusammenarbeit”.
Müller, Cornelia and Ingwer Paul 1999. Gestikulieren in Sprechpausen. Eine konversations-
syntaktische Fallstudie. In: Hartmut Eggert and Janusz Golec (eds.), … wortlos der Sprache
mächtig. Schweigen und Sprechen in Literatur und sprachlicher Kommunikation, 265–281.
Stuttgart: Metzler.
Müller, Cornelia and Roland Posner (eds.) 2004. The Semantics and Pragmatics of Everyday Ges-
tures. Berlin: Weidler.
Müller, Cornelia and Gerald Speckmann 2002. Gestos con una valoración negativa en la conver-
sación cubana. DeSignis 3: 91–103.
Müller, Cornelia and Susanne Tag 2010. The embodied dynamics of metaphoricity: Activating
metaphoricity in conversational interaction. Cognitive Semiotics 6: 85–120.
Núñez, Rafael 2008. A fresh look at the foundations of mathematics: Gesture and the psychological reality of conceptual metaphor. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 225–247. Amsterdam: John Benjamins.
Núñez, Rafael E. and Eve Sweetser 2006. With the future behind them: Convergent evidence from
Aymara language and gesture in the crosslinguistic comparison of spatial construals of time.
Cognitive Science 30(3): 401–450.
Ortony, Andrew (ed.) 1993. Metaphor and Thought. Cambridge: Cambridge University Press.
Parrill, Fey 2008. Form, meaning and convention: An experimental examination of metaphoric
gestures. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 225–247. Amster-
dam: John Benjamins.
Parrill, Fey and Eve Sweetser 2002. Representing meaning: Morphemic level analysis with a hol-
istic approach to gesture transcription. Paper presented at the First Congress of the Interna-
tional Society of Gesture Studies, The University of Texas, Austin.
Parrill, Fey and Eve Sweetser 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Peirce, Charles S. 1931. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Harvard Uni-
versity Press.
Pfau, Roland and Markus Steinbach 2006. Pluralization in sign and in speech: A cross-modal typo-
logical study. Linguistic Typology 10(2): 135–182.
Pfau, Roland and Markus Steinbach 2011. Grammaticalization in sign languages. In: Bernd Heine
and Heiko Narrog (eds.), Handbook of Grammaticalization, 681–693. Oxford: Oxford Univer-
sity Press.
Pike, Kenneth Lee 1967. Language in Relation to a Unified Theory of the Structure of Human
Behavior (second and revised edition). The Hague: Mouton.
Talmy, Leonard 1983. How language structures space. In: Herbert L. Pick and Linda P. Acredolo
(eds.), Spatial Orientation: Theory, Research, and Application, 225–282. New York: Plenum Press.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript, European University Viadrina, Frankfurt (Oder).
Teßendorf, Sedinha and Silva H. Ladewig 2008. The brushing-aside and the cyclic gesture – recon-
structing their underlying patterns, GCLA-08/DGKL-08. Leipzig, Germany.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1/2): 83–105.
Vygotsky, Lev 1986. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar, revised and edited by Alex Kozulin. Cambridge: Massachusetts Institute of
Technology Press.
Wachsmuth, Ipke 1999. Communicative rhythm in gesture and speech. In: Annelies Braffort, Ra-
chid Gherbi, Sylvie Gibet, James Richardson and Daniel Teil (eds.), Gesture-Based Communi-
cation in Human-Computer Interaction – Proceedings International Gesture Workshop GW’99,
277–289. Berlin: Springer.
Watzlawick, Paul, Janet Beavin Bavelas, and Don D. Jackson 1967. Pragmatics of Human Commu-
nication: A Study of Interactional Patterns, Pathologies and Paradoxes. New York: Norton.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. Unpublished Ph.D. dissertation,
University of Rochester, New York.
Webb, Rebecca 1998. The lexicon and componentiality of American metaphoric gestures. In:
Serge Santi, Isabelle Guaitella, Christian Cavé and Gabrielle Konopczynski (eds.), Oralité et
Gestualité: Communication Multimodale, Interaction, 387–391. Paris: L’Harmattan.
Wilcox, Sherman 2002. The iconic mapping of space and time in signed languages. In: Liliana Al-
bertazzi (ed.), Unfolding Perceptual Continua, 255–281. Amsterdam: John Benjamins.
Wilcox, Sherman 2004. Gesture and language. Gesture 4(1): 3–73.
Wilcox, Sherman and Paolo Rossini 2010. Grammaticalization in sign languages. In: Diane Bren-
tari (ed.), Sign Languages, 332–354. Cambridge: Cambridge University Press.
Wilcox, Sherman and Phyllis Wilcox 1995. The gestural expression of modality in ASL. In: Joan
Bybee and Suzanne Fleischman (eds.), Modality in Grammar and Discourse, 135–162. Amster-
dam: John Benjamins.
Williams, Robert F. 2008. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Wollock, Jeffrey 1997. The Noblest Animate Motion: Speech, Physiology, and Medicine in Pre-Car-
tesian Linguistic Thought. Amsterdam: John Benjamins.
Wollock, Jeffrey 2002. John Bulwer (1606–1656) and the significance of gesture in 17th century
theories of language and cognition. Gesture 2(2): 227–258.
Wollock, Jeffrey this volume. Renaissance philosophy: Gesture as universal language. In: Cornelia
Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf
(eds.), Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin:
De Gruyter Mouton.
Wundt, Wilhelm 1921. Völkerpsychologie: Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Erster Band. Die Sprache. Leipzig: Engelmann.
Zlatev, Jordan 2002. Mimesis: The “missing link” between signals and symbols in phylogeny and
ontogeny? In: Anneli Pajunen (ed.), Mimesis, Sign and Language Evolution, 93–122. Publica-
tions in General Linguistics 3. Turku: University of Turku Press.
4. Emblems, quotable gestures, or conventionalized body movements
Abstract
This article offers an account of the nature of emblems, the history of the concept, and its definitions. It sketches the most important theoretical approaches to emblems and their findings, presenting insights from cognitive, psychological, semiotic, ethnographic, and pragmatic perspectives on the subject. It gives a brief overview of mono-cultural and cross-cultural repertoires of emblems and addresses some of the cross-cultural findings concerning this conventional gesture type. At the end of the article, some of the most important characteristics of emblems are described.
1. Introduction
Emblems or quotable gestures are conventional body movements that have a precise meaning which can easily be understood without speech by a certain cultural or social group.
In this article, we will concentrate on emblematic hand gestures only. Examples of prototypical emblems are the so-called "thumbs up gesture", which, at least in Western cultures, is used to express something good or positive and can be glossed as "OK" (Morris et al. 1979; Sherzer 1991), or the "V for victory" sign (Brookes 2011; Morris et al. 1979; Schuler 1944), in which the index and middle fingers are stretched, the other fingers curled in, and the palm faces the interlocutor. These are the gestures likely to appear in newspaper photographs, advertisements, or ancient paintings; they have a clear message and often express the attitude of the gesturer.
At least since the beginning of modern gesture studies with the seminal study of David Efron ([1941] 1972), emblems have been regarded as a class of gestures distinct from spontaneous co-speech gestures. The majority of gesture researchers agree that emblems differ from spontaneous co-speech gestures, which are assumed to be created on the spot (see McNeill 1992, 2000, 2005; Müller 1998, 2010, this volume; Poggi 2002, inter alia), in that emblems have developed historically and therefore belong to the gestural repertoire of a certain culture or group. Emblems are conventional gestures that have a standard of well-formedness. It is widely accepted that they have a more or less defined meaning, are easily translatable into a word or a phrase, and can therefore be used as a substitute for speech (Ekman and Friesen 1969, 1972; Johnson, Ekman, and Friesen 1975; McNeill 1992, 2000, 2005; Morris 2002; Morris et al. 1979; Müller 1998, 2010; Payrató 1993; Poggi 2002, 2007, inter alia). Consequently, emblems have, at least in part, an illocutionary force (Kendon 1995; Payrató 1993, 2003; Poggi 2002).
This article will not treat other conventionalized body movements such as coded ges-
tures (e.g. the semaphore language of arm signals, see Morris 2002: 40) or technical ges-
tures (Morris 2002: 38) which are invented and used by a minority for technical
communication, e.g. the gestures of crane drivers or firemen. These gestures usually
do not enter the gestural repertoire of a wider group and are therefore not addressed
here (see Kendon 2004b: 291ff. for an overview).
Notes therefore of things, which without the helpe and mediation of Words signifie Things,
are of two sorts; whereof the first sort is significant of Congruitie, the other ad placitum. Of
the former are Hieroglyphiques and Gestures; […] As for Gestures they are, as it were,
Transitory Hieroglyphiques. […] This in the meane is plain, that Hieroglyphiques and Ges-
tures ever have some similitude with the thing signified, and are kind of Emblemes. (Bacon
1640: 258–259; Efron 1972: 94–95)
Although Bacon includes all gestures that work as signs without the "mediation" of language and compares them to hieroglyphs, because both kinds of sign are connected to their signified through similarity, Efron reserves the term symbolic or emblematic gestures for those "representing either a visual or a logical object by means of a pictorial or a non-pictorial form which has no morphological relationship to the thing represented" (Efron 1972: 96). He thus reserves the term emblem for arbitrary gestures and excludes those that have "some similitude with the thing signified". Nevertheless, in a footnote he notes that there are some symbolic gestures that are partially similar to their referent and calls them "hybrid movements" (Efron 1972: 96). But since these fall into two different categories and are therefore hard to classify, he refrains from considering them any further.
While Bacon in this very brief discussion "uses the term 'hieroglyphic' in a very generic way, for all iconic ideograms" (Jeffrey Wollock, personal communication; see also this volume), Efron reserves this term for those gestures that have no relation of similarity to their signified, but an arbitrary one. In her survey of the history of gesture
studies Müller (1998: 61–62, footnote 72) points out that with the discovery of the
Egyptian hieroglyphs in the 16th and 17th century a sudden interest in iconology
arose within the intellectual circles of Europe, leading to the development of pictorial
symbols, such as emblems, displaying proverbs, idioms and abstract notions. It was in
this context that gestures were considered ideograms or emblems. Müller alludes to
the fact that some emblematic gestures (in the Efronian sense) are in effect grounded
in the gestural representation of proverbs and idioms (Müller 1998: 62; see also Payrató
2008).
For David Efron, emblematic gestures are meaningful by virtue of the conventional symbolic connotation that they possess independently of the speech for which they "may, or may not, be an adjunct" (Efron 1972: 96), a characteristic which also holds for deictic or pictorial gestures. The matter of iconicity, arbitrariness, and conventionality has been discussed thoroughly by Barbara E. Hanna (1996, see below). Most emblem researchers have followed the definition of Ekman and Friesen (1972, a slightly adjusted version of the one presented in 1969), which shifted the focus from conventionality towards the emblem's relation to speech:
Emblems are those nonverbal acts (a) which have a direct verbal translation usually con-
sisting of a word or two, or a phrase, (b) for which this precise meaning is known by most or
all members of a group, class, subculture or culture, (c) which are most often deliberately
used with the conscious intent to send a particular message to other person(s), (d) for
which the person(s) who sees the emblem usually not only knows the emblem’s message
but also knows that it was deliberately sent to him, and (e) for which the sender usually
takes responsibility for having made that communication. A further touchstone of an
emblem is whether it can be replaced by a word or two, its message verbalized without sub-
stantially modifying the conversation. (Ekman and Friesen 1972: 357)
This definition focuses on the word-likeness of emblems and, according to Hanna (1996), has hindered the development of thorough studies on emblems as communicative signs in their own right, leading instead to a series of emblem repertoires (see also Payrató 1993 for a systematic discussion). Adam Kendon qualifies his own definition of emblems as "autonomous or quotable gestures" as a practical user's definition, thereby explicitly circumventing the difficulties of establishing coherent semiotic criteria, which are difficult to meet even within theoretical reasoning. The term therefore refers to gestures that "are standardized in form and which can be quoted and glossed apart from a context of spoken utterance" (Kendon 1986: 7–8). With this definition, Kendon captures those gestures which have already made their way "into an explicit list or vocabulary" (Kendon 2004b: 335), such as, for instance, the "thumbs up gesture", the "victory gesture" or the "fingers cross gesture" (see Morris et al. 1979 for examples).
3. Theoretical approaches
Emblems have been treated by almost all gesture researchers because they hold a prominent position between conventional and codified gestural systems, such as sign languages, and supposedly idiosyncratic and singular co-speech gestures. This idea is expressed in the so-called Kendon's continuum, introduced by David McNeill (1992: 37–38) and elaborated in McNeill (2000) and Kendon (2004b), which arranges gesture types on a scale from holistic, spontaneous, idiosyncratic, and co-speech-dependent gesticulations to the language-like, conventional signs of sign languages. In between are language-like gestures, pantomimes, and emblems, the last described as having a "segmentation, standards of well-formedness, a historical tradition, and a community of users" (McNeill 1992: 56). What has been examined when considering emblems depends heavily on the respective researcher's theoretical assumptions. In the following, we will sketch the most influential approaches.
interjections, an equivalent of a complete speech act, with a clear and unchangeable il-
locutionary force, whereas articulated emblems behave like components of a communi-
cative act. Comparable to words they participate in communicative acts, but their
performative character changes according to the context.
In short, both approaches can be characterized by their semantic focus and their
verbocentric point of view.
compared the gesture use of US-immigrants from Southern Italy with the gesture use of
Jewish immigrants from Eastern Europe, but he was the first to apply a variety of empir-
ical methods, for example direct observation combined with sketches, and – as a revo-
lutionary novelty – the compilation and interpretation of film material recorded on the
scene within natural communicative settings. Efron found that the use and especially
the repertoire of conventional gestures differed greatly between the two groups inves-
tigated. While the Italians had an extensive and diversified repertoire of conventional
gestures (151 gesture-words, not only emblematic, but also physiographic gestures),
the Jews hardly made any use of emblems at all; only six rather symbolic movements could be identified. The assimilated groups of both origins, though, had clearly adopted the US American standard displayed by their new status and/or social group and hardly used any emblematic gestures at all.
Adam Kendon's work starts out just where Efron's ended. With great expertise in alternate sign languages, gesture, and culture, his efforts have been directed towards the investigation of gestures in use, in their natural surroundings. He argued quite early (e.g. 1988) against a definitional division between so-called spontaneous or idiosyncratic gestures and conventional or quotable gestures. In a 1995 study, Kendon compared emblems with formally similar conventional gestures, for instance the emblematic gesture of the mano a borsa with the finger bunch, a recurrent gesture according to Ladewig (2011a). Both gestures are used pragmatically, the mano a borsa to indicate a certain speech act (request, negative comment), the finger bunch to mark the topic of the utterance. This suggests that a pragmatic use of gestures might be related to a process of conventionalization. In her study of the "pistol hand" in Iran, Seyfeddinipur (2004) obtained similar results.
In a comparative study of the gesturing of a Neapolitan and an Englishman, Kendon
(2004a) confirmed Efron’s findings about the elaborate repertoire and usage of conven-
tional gestures by the Italian. One possible explanation of the abundant gesture vocab-
ulary seems to lie in what he calls the ecology of interaction in Naples (Kendon 2004a,
2004b). In a review of Morris et al.’s book about the origin and distribution of emblems
in Europe (Morris et al. 1979, see below), Kendon (1981) resumed the functions of
these gestures on the basis of existing emblem repertoires. As we have noted above,
emblems are used to express communicative acts, rather than being used as a mere sub-
stitute for a word (see below for the functions). They are especially used for communi-
cative acts of interpersonal control, for announcing one’s own current state, and as
evaluative descriptions of the action or appearances of someone else.
Two contextual studies stand out in this line of research: Joel Sherzer (1991) has undertaken a careful context-of-use analysis of the omnipresent "thumbs up gesture" in urban Brazilian settings, and Heather Brookes has done the same for the "clever gesture" (2001) in the South African townships, followed by contextual studies of the "drinking", "clever" and "money gesture" (2005), and the "HIV gesture" (Brookes 2011) in the same community. Basing his analysis on the theories of Jakobson and Goffman, Sherzer shows that the "thumbs up" gesture combines the paradigmatic notion of "OK", "positive" with the syntagmatic or interactive function of "social obligation met". This combination accounts for a multifunctional use of this emblem covering almost all functions that Kendon (1981) had extracted from the different repertoires. According to Sherzer, the main reason for its abundant use is that the gesture expresses a key concept of Brazilian culture, representing a friendly and positive linkage between people, "a public self-image very important to Brazilians" (Sherzer 1991: 196), who actually live in a
socially and economically divided society. A quite similar approach is Brookes’ study
(2001) on the “clever gesture” in South African townships, which expresses the concept
of being clever in the sense of “streetwise” or “city slick”, an important cultural concept
in township life. The different functions of this gesture are connected through the
semantic core: a formal reference to seeing. The core, the situational context, and the
facial expression constitute the gesture’s functions, as a warning, as a comment, or
even as a greeting. In the case of the “HIV gesture” (Brookes 2011), we can actually
observe the emergence, frequent use, and decay of a gesture (see below). Here, the gesture's use and prominence are shown to result from a taboo connected to the sexual connotations and the severity of this widespread illness, together with social and communicational norms such as politeness.
The pragmatic linguist Lluís Payrató (1993, 2003, 2004, 2008, volume 2 of this handbook; Payrató, Alturo, and Payà 2004) has not only compiled a basic repertoire of Catalan emblems, used by a certain social class in Barcelona, but has also introduced solid methods from pragmatics, sociolinguistics, cognitive linguistics, and the ethnography of communication to emblem research. For Payrató, the determinant feature of emblems is their illocutionary force. Using the speech act classification of Searle (1979) to investigate emblematic functions more closely, he confirmed Kendon's results (1981) and, moreover, was able to show a tendency toward emblematization or conventionalization. Considering the data of the Catalan basic repertoire, it can be said that directive gestures, gestures for interpersonal control, and gestures based on interactive actions seem to be the ones most likely to undergo emblematization (Payrató 1993: 206). In questions that concern the structure of an emblem repertoire, Payrató (2003) used prototype theory, family resemblance, and relevance theory (Sperber and Wilson 1995) to account for the different relationships and meanings of single gestures or their variants. On different occasions (Payrató 1993, especially 2001, 2004) he has argued for the implementation of precise linguistic methods in gesture studies and for an opening of traditional linguistics towards the fundamental insights that gesture studies can contribute to the understanding of human communication; examples of such a fruitful integration can be seen throughout his work.
4. Emblem repertoires
The collection of emblems goes back to ancient times. Although Quintilian also ad-
dresses conventional gestures, among the first repertoires known is Bonifacio’s trea-
tise on the art of signs (L’arte de’ Cenni 1616, see Kendon 2004b: 23) together with
the works of John Bulwer (Chirologia and Chironomia [1644] 1972), both in the con-
text of gestures as the natural language of mankind. In the 19th century de Jorio’s
([1832] 2000) and Mallery’s ([1881] 2001) works stand out for their detailed descrip-
tion and ethnographic interest. Throughout the centuries, there has been great interest in collecting emblems as cultural gestures, and a detailed historical account would exceed the scope of this article, but Kendon (1981, 2004b) offers a good summary and Bremmer and Roodenburg (1992) present a diachronic view of gesture use. A good overview of emblem repertoires can be found in Kendon (1981, 1996, 2004b) and, with a detailed bibliography, in Payrató (1993); for the Hispanic tradition, see Payrató (2008).
5. Cross-cultural findings
Cross-cultural findings regarding emblems can be subdivided into issues of varying complexity: differences in the meaning(s) of individual gestures and in their spread and distribution; differences in the cultural key concepts expressed by emblems; and, finally, differences in the use, size, and diversity of a gestural repertoire. The fact that, on an individual level, emblems differ from one culture to another is proven by the mere existence of culture-specific dictionaries or repertoires, as listed above. It is, of course, difficult to know why a certain gesture exists in this form in one area and in a different form or with another meaning in another area. Gesturers rely on the iconic interpretation of signs, which has led to widespread and very popular speculations about the origins of emblems (see again Morris et al. 1979 for diverse etymological derivations).
Although emblems have been defined as having a clear-cut translation, they are not restricted to one meaning, either across cultures or within a single culture. As Adam Kendon (1981) observed, there seems to be a link between the range of meanings of a gesture and its spread. The most widespread gestures in Morris et al. (1979), like the "nose thumb", for instance, have only one or a few related meanings attributed to them, while the ones with a whole range of (unrelated) meanings are geographically entrenched. Reasons for the spread of emblems can be seen in culture contact, common history, common religion, beliefs, and traditions, a common language, a common climate, traveling, and the influence of modern media. None of these factors acts exclusively or predictably. When a certain emblem is tied to a specific idiom, an interjection, or the like, it might not cross linguistic borders. When an emblem is tied to religious beliefs, its spread will probably follow the spread of that religion. In trying to answer the question of what keeps gestures from spreading, Morris et al. (1979: 263–265) propose, among other things, cultural prejudice barriers, linguistic barriers, ideological and religious barriers, geographical barriers, and gesture taboos, as well as, on a somewhat different level, the semantic characteristics of the existing repertoire that prevent or shape the adoption of an emblem.
Close contextual and ethnographic studies such as those by Brookes (2001), Kendon
(1995) and Sherzer (1991) have shown that the frequent use of certain emblems in a
community may shed some light on important key concepts or concerns of this commu-
nity. Being positive and meeting social obligations in everyday interaction is an important characteristic in urban Brazil, just as "doing" being clever and streetwise, and belonging to the right group, is in the townships of South Africa; both are concepts that need to be negotiated in everyday communication and interaction. Brookes' (2004, 2005) collection of the gesture repertoire of young urban South African men has similar features. By investigating the gesturers and their gesture use in their everyday surroundings, distinguishing different forms and functions in various interactional contexts, she was able to get a very detailed hold on the characteristics of this special repertoire, which belongs to an overall communicative behavior in which gesturing is a skill to be mastered as an important part of male township identity.
In order to gain cross-cultural insights, though, other, comparable investigations are
required.
Cultural differences in gesture repertoires have been presented most notably by
David Efron (1972). He observed that the Italians in his study used more pictorial ges-
tures, and had a far bigger repertoire than the Eastern European Jews. To Efron it
seemed that the Italian repertoire could serve as an exclusive means of communication
while the Jews hardly used any emblems at all, and if they did, they were not inter-
preted consistently. De Jorio (2000), Kendon (1995, 2004a, 2004b), and others have con-
firmed and described the size and diversity of the (Southern) Italian repertoire (see also
Burke 1992). While de Jorio concentrated on the historical aspect of gestures, tracing
the gestures back to ancient times, Kendon developed a theory about the overall ecol-
ogy of Naples as a reason for the abundance of conventional gestures. The dominance
of a somewhat theatrical public life, the crowded streets, the overall noise, the interest
in display and a tradition of secrecy, according to Kendon, have their share in the
emergence of this refined communication system.
6. Characteristics of emblems
The following sections will touch upon some of the most important characteristics that
are at stake when discussing emblems: their semantic domains, the emergence and
origin of emblems, their compositionality, conventionality, and their relation to speech.
used for the identification of people, for commenting on them, and for threats and warnings. Brookes concludes that the functions of lexical gestures vary and that those emblems that are based on practical objects and actions fulfill a smaller range of functions than the others. Those lexical gestures seem to be close to what Kendon (2004b: chapter 10) has called narrow gloss gestures when they convey substantive rather than pragmatic information. The other lexical gestures seem to be used as interactional moves, just as described above; a detailed comparison of the functions of lexical emblems with those of other repertoires has not yet been undertaken.
As mentioned above, Payrató (1993) used Searle’s (1979) speech act classification in
order to describe the functions of the gestures in the overall Catalan repertoire, consist-
ing of emblems, pseudoemblems and other items, where the last two categories decrease
in conventionality and preciseness of meaning. Due to the fact that one gesture can
have multiple illocutionary values, the categories (assertives, directives, etc.) were not
seen as exclusive. The results reveal that most emblems have an assertive function, fol-
lowed by directive and expressive functions. What is even more interesting is that the
comparison of the three sets shows that the assertive function increases within the lesser
conventional sets, just as the directive function decreases. This suggests that there is a
clear correlation between gestural functions analyzed in strict linguistic terms and con-
ventionality: Gestures with a directive function tend to undergo emblematization more
easily, a trend which underlines once again the findings of Adam Kendon and others,
namely that emblems cluster around functions that are concerned “with the immediate
interaction situation” (Kendon 1981: 142).
and originates in the actual burning of one’s own hand as a bodily experience. Posner’s
semiotic and ethological analysis of the emergence of an emblem as a process of ritua-
lization is of a more general scope because it may hold for a wider range of emblems,
namely those that are based on body movements of different sorts, regardless of their
communicative function. The emergence of a historic emblem, the gesture of
“bound hands”, from an action in a ritual context is described as a modulation in
Goffman’s terms (see Goffman 1974) by Müller and Haferland (1997). Similar em-
blems like the “fingertip kiss” can be found in Morris et al. (1979). Having started
this section with the emergence of emblems out of relevant communicative and social
needs, we have come to the emergence of emblems from different bases, such as body
movements or ritual actions. Further bases of emblems are other (co-speech) gestures,
affect displays and expressions of feelings, adaptors, interpersonal actions, intention
movements, (symbolic) objects, idioms and other linguistic expressions, and abstract
entities (see Brookes 2011 for an overview, and Kendon 1981). Regarding the Catalan
repertoire, Payrató (1993) concludes that gestures based on interactive actions are more likely than others to become emblematized, which, again, seems to match the
overall assessment that emblems are concerned with the immediate interactional
situation.
6.3. Conventionality
Emblems are conventional gestures and therefore differ from spontaneous, singular or
creative co-speech gestures. The only study, to our knowledge, that treats the conven-
tionality of emblems in depth is the one by Barbara E. Hanna (1996). As we have
sketched above, according to Hanna, emblems are conventional signs and as such
they are strongly coded. Apparently, they have a standard of form and a notion of gen-
erality. For Hanna, an emblem is a replica of a type that is already known and that spe-
cifies the form and the meaning. Because of the strong coding, neither an analogous link
to the object represented nor a specific context is necessary. While for Hanna conven-
tion is essential to the functioning of every sign, what makes emblems specific is “that
the interpretation of emblems is governed by strong habits, that emblems are ruled by
strong conventions, thus being conventionalized to the point of generality” (1996: 346).
Emblems are a category of gestures with fuzzy edges, and conventionality is not exclusive to them.
Kendon's continuum or continua (McNeill 1992, 2000; see also Kendon 2004b) was a way to determine the characteristics of different gesture types on a continuum comprising their relationship to speech, their linguistic properties, their relationship to conventions, and the character of their semiosis. According to this tradition, emblems lie between the signs of a sign language and gesticulation or spontaneous idiosyncratic gestures. The relationship between gestures and signs has recently been reconsidered by Wilcox (2005, this volume) and Kendon (2008), inter alia, insofar as the interconnections are foregrounded rather than the divide. This line of research might open up new perspectives on the question of the conventionality of emblems. From the perspective of co-speech gestures, Kendon's work has been influential yet again. As mentioned above, Kendon's (1995) comparative study of emblems and apparently conventional co-speech gestures, which were used primarily with pragmatic functions, initiated the investigation of what has been called recurrent gestures (Ladewig 2011a, volume 2; Müller 2010, this volume).
Although further research is needed, it appears that there are fundamental overlaps
between emblems and recurrent gestures, like the “palm-up-open-hand-gesture” that
presents something on the open hand (see Müller 2004). An experimental study by
Fey Parrill (2008) comparing the "palm-up-open-hand-gesture" with the "OK" emblem reveals similarities and differences between these two types. While the emblem had a more restricted range of usages, both gestures were acknowledged to have formal variants. Interestingly, standards of well-formedness could not be confirmed for either gesture. More insights into the conventionality of emblems can be found in studies
about the process of emblematization such as the ones by Brookes (2011) and Payrató
(1993). Comparing the three sets of gestures in his repertoire, Payrató was able to con-
clude that “directive gestures, interpersonal control gestures, and gestures based on
interactive actions are the least restrained by the filters in the basic repertoire of Cat-
alan emblems; therefore, they seem to be more likely than any others to reach the
highest level of emblematization or conventionalization of body action” (Payrató
1993: 206).
6.4. Compositionality
Another characteristic of emblems is their basic compositionality, meaning that an emblem
can consist of more than one formal gestural component as, for example, when the
“thumbs up” gesture is moved repeatedly towards the interlocutor, combining a signifi-
cant hand configuration with a movement pattern (Calbris 1990, 2003; Kendon 1995,
2004b; McNeill 1992, 2000; Poggi 2002; Sparhawk 1978). The results of Sparhawk’s ana-
lysis show that although she could confirm a set of contrasting elements, even some min-
imal pairs, in the Persian data, they differ notably from the contrastive system of sign
languages. Rebecca Webb (1996) has undertaken a similar approach toward so-called me-
taphoric gestures. Her findings suggest a small set of “morpheme-like” components that
can be recombined with other components.
Compositionality can also mean that an emblem consists of a hand gesture and a facial expression (Calbris 1990; Payrató 2003; Poggi 2002; Poyatos 1981; Ricci Bitti 1992; Sparhawk 1978, inter alia). The importance of the facial component in emblems has been shown by Poggi (2002: 80). In order to decide whether an emblem represents a fixed communicative act or not, she performed different performative faces to see if they match or mismatch the gestural function. If variations are possible, it is an articulated emblem; if only one facial expression is valid, it is a holophrastic emblem.
In a third interpretation, compositionality might mean that two emblems combine into a new one (Calbris 1990; Johnson, Ekman, and Friesen 1975; Morris et al. 1979). Such cases are very rare, but Morris et al. report a combination of the "flat hand-chop threat emblem" with the "ring", and the combination of the "fig" or "horn gesture" with the "forearm jerk", so as to double the impact of the insult (Morris et al. 1979: 267). Somewhat differently, it may mean that an emblem is used with a sound, which can be paralinguistic, made by the mouth, by the hand or by another articulator, or an interjection, for instance (Calbris 1990; Meo-Zilio 1986; Posner 2002; Poyatos 1981). In the case of the "flapping hand gesture" presented by Posner, the original sound of blowing onto the burnt hand and of taking a deep breath develops towards linguistic articulation, ending in two interjections, each of them leading to a different interpretation of the overall gesture. While one refers to the danger of something, the other refers to its fascination.
7. Concluding remarks
Throughout this article, we have tried to sketch the characteristics of emblems as a pre-
sumed class of conventional gestures. What is fascinating about them is that they “act as
conveyors of meaning in their own right” as Kendon puts it (1981: 146). Regarding
them as mere word-substitutes has not only obscured their functions within communication but has also distracted attention from their versatility and dynamics. More recently, studies from different theoretical backgrounds seem to have overcome this constraint. In some areas, though, such as gesture acquisition, gesture processing, and most of the psycholinguistic tradition, emblems need to receive more attention. Besides
more ethnographic and contextual studies, what is essential for future research on em-
blems is the development of scientific standards that allow for a true comparison of
emblem repertoires.
Acknowledgements
I would like to thank Jeffrey Wollock for his insights on emblems in the Renaissance
and Cornelia Müller for helpful comments on earlier versions of this chapter.
8. References
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. thesis, Centre for Languages
and Literature, Lund University.
Bacon, Francis 1640. Of the Advancement and Proficience of Learning. Book VI. Oxford: Young
and Forrest.
Barakat, Robert A. 1973. Arabic gestures. Journal of Popular Culture 4: 749–793.
Bremmer, Jan and Herman Roodenburg (eds.) 1992. A Cultural History of Gesture. Ithaca, NY:
Cornell University Press.
Brookes, Heather 2001. The case of the clever gesture. Gesture 1(2): 167–184.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Brookes, Heather 2005. What gestures do: Some communicative functions of quotable gestures in
conversations among Black urban South Africans. Journal of Pragmatics 32: 2044–2085.
Brookes, Heather 2011. Amangama amathathu ‘The three letters’. The emergence of a quotable
gesture (emblem). Gesture 11(2): 194–217.
Brookes, Heather volume 2. Gestures and taboo. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communi-
cation: An International Handbook on Multimodality in Human Interaction. (Handbooks of
Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Bulwer, John 1974. Chirologia or the Natural Language of the Hand, etc. (and) Chironomia or the Art of Manual Rhetoric, etc. Carbondale: Southern Illinois University Press. First published [1644].
Burke, Peter 1992. The language of gesture in early modern Italy. In: Jan Bremmer and Herman
Roodenburg (eds.), A Cultural History of Gesture, 71–83. Ithaca, NY: Cornell University Press.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis: Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève and Jacques Montredon 1986. Des Gestes et des Mots Pour le Dire. Paris: Clé
International.
Castelfranchi, Cristiano and Domenico Parisi 1980. Linguaggio, Conoscenze e Scopi. Bologna:
Il Mulino.
Cestero, Ana María 1999. Repertorio Básico de Signos no Verbales del Español. Madrid: Arco Libros.
Creider, Chet A. 1977. Towards a description of East African Gestures. Sign Language Studies 14:
1–20.
De Jorio, Andrea 2000. Gesture in Naples and Gesture in Classical Antiquity. A translation of La
mimica degli antichi investigata nel gestire napoletano (Fibreno, Naples 1832), with an introduc-
tion and notes by Adam Kendon. Bloomington: Indiana University Press.
Diadori, Pierangela 1990. Senza Parole: 100 Gesti degli Italiani. Rome: Bonacci Editore.
Eco, Umberto 1976. A Theory of Semiotics. Bloomington: Indiana University Press.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton. First published [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage, and coding. Semiotica 1: 49–98.
Ekman, Paul and Wallace V. Friesen 1972. Hand movements. Journal of Communication 22: 353–374.
Fornés, Maria Antònia and Mercè Puig 2008. El Porqué de Nuestros Gestos. La Roma de Ayer en
la Gestualidad de Hoy. Palma: Edicions Universitat de les Illes Balears.
Gelabert, María José and Emma Martinell 1990. Diccionario de Gestos con sus Usos Más Usuales.
Madrid: Edelsa.
Goffman, Erving 1974. Frame Analysis. An Essay on the Organization of Experience. Cambridge,
MA: Harvard University Press.
Green, Jerald R. 1968. Gesture Inventory for the Teaching of Spanish. Philadelphia: Chilton Books.
Hanna, Barbara E. 1996. Defining the emblem. Semiotica 112(3/4): 289–358.
Johnson, Harold G., Paul Ekman and Wallace Friesen 1975. Communicative body movements:
American emblems. Semiotica 15(4): 335–353.
Kacem, Chaouki 2012. Gestenverhalten an Deutschen und Tunesischen Schulen. Ph.D. thesis,
Technical University, Berlin. URN: urn:nbn:de:kobv:83-opus-34158 URL: http://opus.kobv.
de/tuberlin/volltexte/2012/3415/
Kendon, Adam 1981. Geography of gesture. Semiotica 37(1–2): 129–163.
Kendon, Adam 1983. Gesture and speech: How they interact. In: John M. Wieman and Randall P.
Harrison (eds.), Nonverbal Interaction, 13–45. Beverly Hills, CA: Sage.
Kendon, Adam 1984. Did gesture have the happiness to escape the curse at the confusion of
Babel? In: Aaron Wolfgang (ed.), Nonverbal Behavior: Perspectives, Applications, Intercultural
Insights, 75–114. Lewiston, NY: C. J. Hogrefe.
Kendon, Adam 1986. Some reasons for studying gestures. Semiotica 62: 3–28.
Kendon, Adam 1988. How gestures can become like words. In: Fernando Poyatos (ed.), Cross-
Cultural Perspectives in Nonverbal Behavior, 131–141. Toronto: C. J. Hogrefe.
Kendon, Adam 1992. Some recent work from Italy on quotable gestures (emblems). Journal of
Linguistic Anthropology 2(1): 92–108.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 1996. An agenda for gesture studies. Semiotic Review of Books 7(3): 7–12.
Kendon, Adam 2004a. Contrasts in gesticulation. A British and a Neapolitan speaker compared.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference, April 1998, 173–193. Berlin: Weidler.
Kendon, Adam 2004b. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam 2008. Some reflections on the relationship between ‘gesture’ and ‘sign’. Gesture 8(3):
348–366.
Kreidlin, Grigori E. 2004. The Russian dictionary of gestures. In: Cornelia Müller and Roland
Posner (eds.), The Semantics and Pragmatics of Everyday Gestures. Proceedings of the Berlin
Conference, April 1998, 173–193. Berlin: Weidler.
Ladewig, Silva H. 2011a. Putting a recurrent gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2011b. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. thesis, European University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. volume 2. Recurrent gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communica-
tion: An International Handbook on Multimodality in Human Interaction. (Handbooks of
Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Mallery, Garrick 2001. Sign Language among North American Indians. New York: Dover. First
published [1881].
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought, 2nd edition. Chicago: University of Chicago Press.
McNeill, David 2000. Introduction. In: David McNeill (ed.), Language and Gesture, 1–10.
Chicago: University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Meo-Zilio, Giovanni 1986. Expresiones extralingüı́sticas concomitantes con expresiones gestuales
en el español de América. In: Sebastian Neumeister (ed.), Actas del IX Congreso de la Asocia-
ción Internacional de Hispanistas.
Meo-Zilio, Giovanni and Silvia Mejı́a 1980. Diccionario de Gestos: España e Hispanoamérica.
Bogota: Instituto Caro y Cuervo.
Monahan, Barbara 1983. A Dictionary of Russian Gestures. Ann Arbor, MI: Hermitage.
Morris, Desmond 2002. Peoplewatching. London: Vintage.
Morris, Desmond, Peter Collett, Peter Marsh and Marie O’Shaughnessy 1979. Gestures. Their Ori-
gins and Distributions. New York: Stein and Day.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2004. The Palm-Up-Open-Hand. A case of a gesture family? In: Cornelia Müller
and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gestures. Proceedings of
the Berlin Conference, April 1998, 233–256. Berlin: Weidler.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia this volume. Linguistics: Gestures as a medium of expression. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände. Zur Semiose performativer Ges-
ten. Mitteilungen des Deutschen Germanistenverbandes 44(3): 29–53.
Munari, Bruno 1963. Supplemento al Dizionario Italiano. Milan: Muggiani.
Nascimento Dominique, Nilma 2008. Inventario de emblemas gestuales españoles y brasileños.
Language Design 10: 5–75.
Parrill, Fey 2008. Form, meaning and convention: An experimental examination of metaphoric
gestures. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 225–247. Amster-
dam: John Benjamins.
Paura, Bruno and Marina Sorge 2002. Comme te L’aggia Dicere? Ovvero L’arte Gestuale a Napoli.
Naples: Intra Moenia.
Payrató, Lluı́s 1993. A pragmatic view on autonomous gestures: A first repertoire of Catalan
emblems. Journal of Pragmatics 20: 193–216.
Payrató, Lluı́s 2001. Methodological remarks on the study of emblems: The need for common eli-
citation procedures. In: Christian Cavé, Isabelle Guaitella and Serge Santi (eds.), Oralité et
Gestualité: Interactions et Comportements Multimodaux dans la Communication, 262–265.
Paris: Harmattan.
Payrató, Lluı́s 2003. What does ‘the same gesture’ mean? A reflection on emblems, their organi-
zation and their interpretation. In: Monica Rector, Isabella Poggi and Nadine Trigo (eds.), Ges-
tures, Meaning and Use, 73–81. Porto: Fernando Pessoa University Press.
Payrató, Lluı́s 2004. Notes on pragmatic and social aspects of everyday gestures. In: Cornelia Mül-
ler and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gestures. Proceedings
of the Berlin Conference, April 1998, 103–113. Berlin: Weidler.
Payrató, Lluı́s 2008. Past, present, and future research on emblems in the Hispanic tradition: Pre-
liminary and methodological considerations. Gesture 8(1): 5–21.
Payrató, Lluı́s volume 2. Emblems or quotable gestures: Structures, categories, and functions.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Jana Bressem (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.2.) Berlin: De Gruyter Mouton.
Payrató, Lluı́s, Núria Alturo and Marta Payà (eds.) 2004. Les Fronteres del Llenguatge. Lingüı́stica
i Comunicació No Verbal. Barcelona: Promociones y Publicaciones Universitarias.
Peirce, Charles Sanders 1960. Collected Papers of Charles Sanders Peirce (1931–1958), Volume I:
Principles of Philosophy, Volume II: Elements of Logic, edited by Charles Hartshorne and Paul
Weiss. Cambridge, MA: Belknap Press of Harvard University Press.
Pérez, Faustino 2000. Diccionario de Gestos Dominicanos. Santo Domingo, República Dominicana: Faustino Pérez.
Pike, Kenneth L. 1947. Phonemics: A Technique for Reducing Languages to Writing. Ann Arbor:
University of Michigan Press.
Poggi, Isabella 1983. La mano a borsa: Analisi semantica di un gesto emblematico olofrastico. In: Gra-
zia Attili and Pio Enrico Ricci Bitti (eds.), Comunicare Senza Parole, 219–238. Rome: Bulzoni.
Poggi, Isabella (ed.) 1987. Le Parole nella Testa: Guida a un’Educazione Linguistica Cognitivista.
Bologna: Il Mulino.
Poggi, Isabella 2002. Symbolic gestures. The case of the Italian gestionary. Gesture 2(1): 71–98.
Poggi, Isabella 2004. The Italian gestionary. Meaning representation, ambiguity, and context. In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference, April 1998. Berlin: Weidler.
Poggi, Isabella 2007. Mind, Hands, Face and Body: A Goal and Belief View of Multimodal Com-
munication. Berlin: Weidler.
Poggi, Isabella and Emanuela Magno Caldognetto 1997. Mani Che Parlano. Padova: Unipress.
Poggi, Isabella and Marina Zomparelli 1987. Lessico e grammatica nei gesti e nelle parole. In: Isa-
bella Poggi (ed.), Le Parole nella Testa: Guida a un’Educazione Linguistica Cognitivista, 291–328. Bologna:
Il Mulino.
Posner, Roland 2002. Everyday gestures as a result of ritualization. In: Monica Rector, Isabella
Poggi and Nadine Trigo (eds.), Gestures. Meaning and Use, 217–230. Porto: Fernando Pessoa
University Press.
Posner, Roland, Reinhard Krüger, Thomas Noll and Massimo Serenari in preparation. The Berlin
Dictionary of Everyday Gestures. Berlin: Weidler.
Poyatos, Fernando 1970. Kinésica del español actual. Hispania 53: 444–452.
Poyatos, Fernando 1981. Gesture inventories: Fieldwork methodology and problems. In: Adam
Kendon (ed.), Nonverbal Communication, Interaction, and Gesture. Selections from Semiotica,
371–400. The Hague: Mouton.
Rector, Monica and Salvato Trigo 2004. Body signs: Portuguese communication on three conti-
nents. In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Every-
day Gestures. Proceedings of the Berlin Conference, April 1998, 195–204. Berlin: Weidler.
Ricci Bitti, Pio Enrico 1992. Facial and manual components of Italian symbolic gestures. In:
Fernando Poyatos (ed.), Advances in Nonverbal Communication, 187–196. Amsterdam: John
Benjamins.
Safadi, Michaela and Carol Ann Valentine 1990. Contrastive analysis of American and Arab non-
verbal and paralinguistic communication. Semiotica 82(3–4): 269–292.
Saitz, Robert L. and Edward J. Cervenka 1972. Handbook of Gestures. The Hague: Mouton.
Schuler, Edgar A. 1944. V for victory: A study in symbolic social control. Journal of Social Psychology 19: 283–299.
Searle, John R. 1979. Expression and Meaning. Studies in the Theory of Speech Acts. Cambridge:
Cambridge University Press.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the ‘Pistol Hand’.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference, April 1998, 205–216. Berlin: Weidler.
Sherzer, Joel 1991. The Brazilian thumbs-up gesture. Journal of Linguistic Anthropology 1(2):
189–197.
Sparhawk, Carol M. 1976. Linguistics and gesture: An application of linguistic theory to the study
of Persian emblems. Ph.D. thesis, The University of Michigan.
5. Framing, grounding and coordinating conversational interaction

Abstract
This chapter examines several forms of embodied action in interaction. After discussing
the historical emergence of an interactionist approach to embodied action from early
figures in American anthropology, to the Palo Alto group, to present day conversation
analysts, it considers research on body posture, gaze, facial expression, and movement
in space for their distinct contributions to the moment-by-moment production and coordination of conversational interaction.
1. Introduction
The framing, grounding, and coordination of conversational interaction is a nuanced
and complex enterprise, one that is made possible in large part by the relative flexibility
of the human body. The head, eyes, mouth, face, torso, legs, arms, hands, fingers, and
even the feet comprise moveable elements of the human body that can be arranged
and mobilized in conjunction with talk in a potentially limitless variety of configura-
tions. These configurations convey participants’ readiness to interact; the nature and
quality of their relationships; the current and unfolding tenor of the immediate interac-
tion; as well as the moment-by-moment differentiation of their identities as speakers
and hearers, storytellers and story recipients, doctors and patients, and other such iden-
tities that are associated with a variety of interactional, interpersonal, and institutional
activities in interaction. These are activities that are constituted via the particulars of
participants’ speech and body movements as being recognizably about something, as
being directed toward some end, and they comprise the frameworks for meaning and
action in interaction.
2. Background
The study of the body and speech in interaction as a detailed, naturalistic endeavor
owes its beginnings to a confluence of figures from several disciplines and the emer-
gence of technologies – namely, film and video – capable of capturing behaviors that
appear one moment and disappear the next. By most accounts, the meeting of scholars
at Stanford University’s Center for Advanced Study in the Behavioral Sciences in 1955,
and extending via briefer meetings through the late 1960s, marks a pivotal point in this
area of study (Kendon 1990; Leeds-Hurwitz 1987), as does the work of sociologist Erving
Goffman, and later, of conversation analysts.
The Stanford group, sometimes referred to as the Palo Alto group, included such
early and mid-twentieth century figures as psychiatrists Frieda Fromm-Reichmann
and Henry Brosin; linguists Charles Hockett and Norman McQuown; and anthropolo-
gists Alfred Kroeber, Gregory Bateson and Ray Birdwhistell, among others (for a more
complete list, see Leeds-Hurwitz 1987). These figures had, in part, inherited their inter-
est in culture and communication, including gesture and body motion, and the desire to
study these phenomena closely through film, from an earlier generation of anthropolo-
gists who included Franz Boas and Edward Sapir, and, later, Margaret Mead (the latter
figures, Boas’ students); they were also influenced by figures in cybernetics and informa-
tion theory. The group’s initial goal was to use film to understand the role of nonverbal behavior in the treatment of psychiatric patients, but their work came to be associated with the collaborative project known as “The Natural History of an Interview” and, more broadly, with the close, film-based analysis of communicative behavior (Leeds-Hurwitz 1987).
3. Posture
When two or more people interact, they arrange their bodies to communicate their or-
ientations to engagement. The “ecological huddle” in Goffman’s terms (1961), or F-for-
mation in Kendon’s (1990), is the positioning of one’s body toward another (or others)
for interaction, and in ways that convey varying degrees of involvement in any number
of other, possibly competing, activities and events. With their body arrangements,
participants create a “frame” of engagement and visibly display their alignment toward
one another as interactants.
As a number of researchers have noted, the human body provides a segmentally or-
ganized hierarchy of resources for communicating participants’ engagement in interac-
tion (Goffman 1963; C. Goodwin 1981; M. H. Goodwin 1997; Kendon 1990; Robinson
1998; Schegloff 1998). The head, torso, and legs especially can be arranged to convey
different points of attentional focus: for example, the head can be oriented in one direc-
tion, the torso in another and the legs in yet another. When these body segments are
aligned in the same direction, a single dominant orientation is communicated; when
they are not, they communicate multiple simultaneous orientations that are ranked
in accord with the relative stability of each body segment. Put another way, the most
stable of these segments, the legs, communicates a person’s dominant orientation rela-
tive to the torso and the head, while the torso communicates a more dominant orien-
tation relative to the head. Schegloff (1998) writes that when these body segments
are arranged divergently, and as such communicate multiple simultaneous involve-
ments, they convey a postural instability that projects a resolution in terms of moving,
for example, the least stable segment, the head, back into alignment with the more sta-
ble segments, the torso and the legs. Thus, a person’s fleeting and transitory involve-
ments are communicated as such relative to their more primary and long term
involvements, and this has important consequences for the forwarding and, alternately, the holding off of interaction.
Schegloff (1998) finds, for example, that co-participants to a conversation treat the
unstable, or “torqued,” body posture of their interlocutors as cause for limiting expan-
sion of a sequence of talk, as when the co-participant turns her head but not her lower
body to engage in talk; alternately, he finds that the alignment of the lower body with
the torqued head can be cause for sequence expansion. As another case in point, in
medical consultations, Robinson (1998) reports that patients entering the consultation
room may find that the doctor, who is seated at his desk, has turned his head to
greet them, although his legs and torso remain directed forward, oriented to the med-
ical records on the desk in front of him. In this way, the doctor’s body, representing a
hierarchy of differentially aligned segments, projects his initial engagement with the
patient as fleeting, and a return to the business with the records as an impending and
dominant involvement – although in the activity context of this encounter, a return
to interaction with the patient is projectably imminent. Patients are sensitive to this
matter: when the doctor turns back to the medical records, they occupy themselves
with such activities as settling in (e.g., shutting the door and taking a seat). When the
doctor is ready to begin the business proper, he will typically turn and orient his entire
body toward the patient, that is, with head, torso, and lower body simultaneously
aligned, and produce a topic initiating utterance such as “what’s the problem?” or
“what can we do for you today?”
4. Gaze
Gaze, too, is an integral element in the communication of participants’ orientations to
engagement, and works in concert with body posture. Looking at another, and another’s
looking back, is a critical step in the move from “sheer and mere co-presence” to rati-
fied mutual engagement, and people may avoid others’ gazes, and/or avoid directing
their own gaze to others, to discourage interaction (Goffman 1963). The management of
speaker-recipient roles, once interaction has begun, has been taken up by a number of
researchers (e.g., Argyle and Cook 1976; Bavelas, Coates, and Johnson 2002; Egbert
1996; C. Goodwin 1981; M. H. Goodwin 1980; Hayashi 2005; Kendon 1967, 1990; Kid-
well 1997, 2006; Lerner 2003; Rossano, Brown, and Levinson 2009; Streeck 1993, 1994).
Speakers and recipients do not typically gaze at one another continuously, but intermit-
tently: recipients gaze toward speakers as an indication of their attentiveness to talk,
and speakers direct their gaze to recipients to show that talk is being addressed to
them; recipients typically gaze for a longer duration at speakers, and speakers for a
shorter duration (they tend to look away during long turns at talk, as when telling a
story; C. Goodwin 1981; Kendon 1967, 1990). When speakers do not have the gaze
of a recipient, they may produce cut-offs, re-starts, and other dysfluencies until they
secure the recipient’s gaze (C. Goodwin 1981). Speakers may also produce such actions
as tapping or touching the other, bringing their own face and eyes into the other’s line
of regard, and, in some cases, even taking hold of the other’s face and turning it toward
their own; these are actions that are linked with efforts to remediate an encounter with
a resistant and/or unwilling interactant (Kidwell 2006). Recipients, too, may take action
to get a speaker to begin talk or address ongoing talk to them, for example by directing
their gaze (i.e., a show of recipiency) to the would-be speaker, making a sudden body
movement, or contacting the other’s body via some manner of touching (Heath 1986;
Kidwell 1997).
5. Facial expression
The face, while an important topic of study in psychological approaches to body com-
munication (for example, in the work of Ekman), has often been overlooked
as an element in the coordination and management of conversational interaction. The
great mobility of the face, along with the speed (i.e., relative to other body parts) with
which it can be deployed in interaction, makes it an especially useful resource as both a
stand-in for, and elaborator of, talk. There is a rich line of research into the syntactic
and semantic functions of the face in conjunction with speech (e.g., Bavelas and Chovil
1997; Birdwhistell 1970; Chovil 1991; Ekman, Sorenson, and Friesen 1969). However,
the face can also be used as a means of regulating talk and other interactional activities.
Kendon (1990) describes a “kissing round” between a man and a woman sitting on a park bench, showing how the face, particularly the woman’s, regulates the approach and orientations of the male. While Kendon notes a number of types of facial
expressions (for example “dreamy look” and “innocent look”), he specifically notes
that a closed-lip smile by the woman invites kissing, while a teeth-exposed smile does
not. In this way, the woman’s face serves as a resource for projecting not only what
she will do next (i.e., kiss or not kiss), but also what she will allow the male to do. In
conversational openings, Pillet-Shore (2012) notes that the face, particularly smiling
in conjunction with greetings, is used to “do being warm” at the outset of an encounter,
and invites further interaction. The face may be displayed prominently in interaction,
particularly for the role it plays in the expression of positive affect, but it may also
be shielded, particularly in expressions of grief. Thus, par-
ticipants will shield their eyes with their hands or a tissue, turn away, or lower their
heads to prevent others from seeing their faces during emotionally painful moments
(Beach and LeBaron 2002; Kidwell 2006). The face itself, as Goffman noted, is one of
“the most delicate components of personal appearance” and integrally involved in the
interactional work by which participants show themselves via constant control of their
facial movements to be situationally present, or “in play” and alive to the obligations of
their involvements with others (Goffman 1963: 27).
In a more recent study of the face in interaction, Ruusuvuori and Peräkylä (2009)
have demonstrated that facial displays not only accompany specific elements of talk,
but can project and follow these elements both in redundant and non-redundant
ways, in effect, making use of the face to extend the temporal boundaries of an action
beyond a turn at talk. They examine the role of the face in storytelling assessments and
other types of tellings. As they report, the face may be used by the speaker to fore-
shadow a stance toward something being described, in this way preparing the listener
for how to respond. Following an utterance, the face may be used by a speaker to pur-
sue uptake by a listener who fails to respond, as when a speaker continues to smile after
completing talk. They also demonstrate that the listener may respond not only verbally
in a way that shows understanding and affiliation with a speaker’s stance, but also with a
like facial expression: in other words, listeners may reciprocate a speaker’s facial
expression as a means of producing a reciprocating stance. It has also been reported
that listeners may use facial actions in conjunction with acknowledgement tokens and
continuers such as “mh hm” and “okay”, or as stand-alone responses to another’s
talk (i.e., without accompanying verbalizations; cf. Bavelas and Chovil 1997).
6. Movement
Movement is not so much an overlooked element in the coordination and management
of conversational interaction as it is a taken-for-granted one. Someone’s approach
toward another, like gaze directed at another, is one of the most basic and pervasive
ways by which interaction is initiated and, with the person’s movement away,
terminated – a particularly powerful resource for even very young children, who are
in the pre- and early-verbal stages of language use (Kidwell and Zimmerman 2007).
Body movement as an interactional resource has been considered in other ways as
well. For example, police may strategically move their bodies toward a suspect in con-
junction with their talk to prompt a confession (LeBaron and Streeck 1997); in a public
place such as a museum, visitors are attracted to exhibits that others are attracted to,
and move into the spaces left by others when they move on to the next exhibit
(vom Lehn, Heath, and Hindmarsh 2001). Regarding a fundamental organization of body
movement, Sacks and Schegloff (2002) showed that moving bodies, including moving
hands and limbs, typically return to the place from which they started, that is, to a
“home position”.
During conversation, participants may exhibit “interactional synchrony”, that is, a
roughly similar flow of body movements such as postural shifts, positioning of limbs,
and head movements by which they make visible and regulate their involvement with
one another (Condon and Ogston 1966; Kendon 1970, 1990). Head movements have
been found to have quite diverse functions. As a semantic matter, the head can be
used with or without speech to signify an affirmative or negative response. McClave
(2000) reports on a number of additional semantic patterns: for example, in conjunction
with certain words or phrases, lateral head sweeps can be used to show inclusivity;
lateral head shakes can be used to show intensity, disbelief, and/or uncertainty (M. H.
Goodwin 1980). Head movements are produced with greater frequency by speakers,
and speakers’ head movements may trigger listeners’ head movements (McClave
2000: 874–875). Listeners also produce head movements as a demonstration of their
attention to talk. Head nods may be produced alone, or in conjunction with acknowl-
edgement tokens and continuers such as “mh hm” and “okay”. Stivers (2008) notes a
distinct difference between the use of head nods and verbal tokens. Specifically, head
nods that are placed in the mid-telling position of a story demonstrate an affiliative
stance toward the one displayed in the speaker’s formulation of story events, while verbal to-
kens demonstrate alignment. Listeners may also make more affective responses with
their heads, as when they make a sudden jerk back to show surprise, or a particular
sort of comprehension, what Goodwin has called “take” (M. H. Goodwin 1980: 309).
7.1. Openings
In interaction, participants must have some way of beginning an encounter, that is, of
indicating their interest in interacting, and their availability and willingness to do so.
Openings are critical to the initiation of interaction, not only in terms of coordinating
participants’ basic entry into an encounter, but also in terms of proposing something
about the nature of participants’ relationship to one another, the business at hand,
and, often, the tenor of the interaction to come.
People routinely find themselves in the presence of others in everyday settings, like waiting for a bus or sitting in a class. These sorts of scenarios represent, in Goffman’s terms, the realm of unfocused interaction, situations in which, although people
are co-present and attending consciously or unconsciously to any number of embodied
or otherwise unspoken communication phenomena by others, ratified social interaction
has yet (if at all) to take place (Goffman 1963).
7.1.3. Greetings
Greetings may be verbal (Hello! Hey! How’s it goin’?) and/or embodied actions (waves, head tosses, handshakes, hugs), and they typically also include participants’ orientation of their eyes and bodies toward one another and facial displays (e.g., smiles and eyebrow flashes). Greetings proffered and greetings returned are a way that parties acknowledge one another when they come into one another’s presence, a fundamental means of “person appreciation”, but they also open up the possibility of further interaction and are perhaps the most frequent way that participants begin interaction. Through their lexical
and intonational verbal production, in conjunction with their embodied components,
greetings reflect and propose something about the character of a relationship: for exam-
ple, are participants strangers or casual acquaintances, or are they good friends who
have not seen one another in a long time?
Kendon and Ferber (1973; Kendon 1990), in their classic paper on greetings, describe
a recurrent sequence of behaviors by which participants come to greet one another in
naturally occurring social gatherings. In the backyard birthday party example, guest and
host proceed through distinct phases, what are termed the “distance salutation” (made
when the guest first enters through the backyard gate), the “approach”, and the “close
salutation”. In the distance salutation phase, behaviors include sighting, in which parti-
cipants visually locate one another and typically wait for a return sighting, followed by
greeting displays (e.g., a hand wave, a head toss, and/or a “hello”) and accompanying
smiles. The approach phase may occur concurrently, or shortly thereafter. This phase is
characterized by participants looking away from one another, especially as they get
close; participants may also engage in self-grooms (e.g., smoothing their hair, adjusting
their clothing) and “body crosses” (crossing one or both arms or hands in front of the
body) as they approach. Once participants are near enough to begin the close salutation,
they again look at one another and produce another greeting, often followed by or pro-
duced simultaneously with such actions as a handshake, embrace, kiss, and/or other sort
of touching; this phase is also accompanied by smiles. The authors note that greeting in-
teractions are interrelated with participants’ roles as guests and hosts, their degree of
familiarity, and their relational status. For example, a host traveling far from the center
of the party to greet a guest creates a display of respect and enthusiasm at their arrival;
guests entering into the center before being greeted create a show of familiarity, while
those who wait on the fringe to be greeted first show relative unfamiliarity.
Indeed, the very first moves in face-to-face openings enable participants to discern
whether or not they are acquainted with someone and to design their greetings and
next moves accordingly. As Pillet-Shore (2011) writes, gaze is used to do critical iden-
tification/recognition work, that is, to discern in the very first moments of interaction
whether or not participants who are coming into one another’s presence already
know each other. Participants’ distinction between the acquainted and the unac-
quainted, and the consequences for subsequent interaction, is a major organizing fea-
ture of social behavior, as noted by Goffman (1963). Pillet-Shore (2012) documents the
systematicity by which participants, upon visually locating another as an acquaintance
or not, produce greetings that are recipient designed. Greetings between acquainted
parties are produced at a relatively louder volume than surrounding speech, and
make use of such features as a higher pitch, “smiley voice” intonation in conjunction
with smiles, continuing and rising final intonation, sound stretches, and multiple greet-
ing components (verbal and embodied); these latter two features enable greetings to be
produced in overlap. Hence, acquainted participants “do being warm” and index their
familiarity, in this way conveying that their identification/recognition work has been
successful and that they may move forward in the interaction.
7.2. Storytelling
One very common sort of activity that participants engage in in interaction is storytell-
ing. As Sacks (1972) and Jefferson (1978) noted, stories have a distinct structure that
consists of
(i) initiation,
(ii) delivery, and
(iii) reception by the story recipients.
The example discussed below comes from a conversation among three participants, A, T, and R, and begins when one of them, A, turns to another, T, and initiates a story (discussed in Kidwell 1997). Transcription conventions can be found in the appendix.
A’s actions at line 1, that is, her turn toward T in addition to slapping the table and di-
recting her gaze at her, are embodied techniques that, in conjunction with her talk, des-
ignate T as her primary addressed recipient. A’s actions, however, have the effect of
eliciting a display of recipiency from both T and R: they both shift their gaze to A
although it is only T (the addressed recipient) who answers the story initiation question
that A has posed, which she does with a negative head nod. Story initiations are de-
signed to separate knowing from unknowing recipients, prepare recipients for the
kind of story that is being offered, and set up an extended turn space for the teller to
deliver the story (C. Goodwin 1981; Jefferson 1978; Sacks 1972). T’s action (the negative
head nod) informs A that she hasn’t heard the story, and, thus, functions as a go-ahead for
A to tell the story.
Getting the go-ahead, A assumes a distinct teller’s posture by returning her gaze
back to the center of the table and adjusting her clothing; she then places her elbows
on the table, and rests her head in her hands as she speaks at line 6 (see also C. Good-
win 1984). She maintains this position until she once again shifts her gaze to T at line 8.
Of note, however, is that it is R, the unaddressed recipient, who responds with
continuers and head nods at lines 10 and 12.
The gaze shift by A at line 8 is done as part of a reference check: A wants to confirm
that T knows what she is talking about and, in addition to shifting her gaze to T, she
produces the word “queery” with a try-marked, rising intonation (Sacks and Schegloff
1979). Getting no indication of recognition from T, she continues with an explanation of
what “queery” means in lines 8, 9, and 11 while she looks at T. Although A has not di-
rected any of her gazes to R, and thus has not treated her as someone for whom the
story is being told, it is R who responds. R produces continuers that, along with her
head nods and gaze toward A, display – and claim – recipient status; T, for her
part, makes only head nods at line 11 (Schegloff 1982). Moreover, R’s continuers and
head nods, produced in overlap with A’s explanation rather than at turn construction
unit boundaries, display that she already knows what A is talking about, a way of
demonstrating that the story-in-progress is relevant for her, too.
In sum, A uses body positioning, movement, and gaze to designate T as her primary
addressed recipient. However, R challenges this framework with her embodied actions
and vocalizations. By positioning her head nods and continuers as she does, R shows
that not only is she a recipient, but that certain story elements are familiar to her,
too, and, therefore, that she is entitled to being addressed as a recipient. These
moves and other moves by R (not shown here) work to re-shape the participation
framework such that A subsequently (albeit briefly) accommodates her as a story
recipient.
8. Conclusion
As has been discussed here, the human body provides participants with a critical
resource for accomplishing and differentiating their work as particular sorts of partici-
pants in interaction (i.e., speaker and recipient, storyteller and story recipient, doctor
and patient, etc.), and in the variety of interactional, interpersonal, and institutional ac-
tivities that comprise encounters. The sensitivities of participants to these body behav-
ioral resources speak to the fundamental sociality of a social species in which even the
most minimal of movements of the body, face, eyes, hands, head and so on are of con-
sequence for what they understand about what others are doing, and what they them-
selves are expected to do, upon occasions of their coming together. Together with talk,
these resources are part of an extraordinarily powerful yet nuanced toolkit for going
about the complex business of being human.
9. References
Argyle, Michael and Mark Cook 1976. Gaze and Mutual Gaze. Cambridge: Cambridge University
Press.
Bavelas, Janet, Linda Coates and Trudy Johnson 2002. Listener responses as a collaborative pro-
cess: The role of gaze. Journal of Communication 52(3): 566–580.
Bavelas, Janet and Nicole Chovil 1997. Faces in dialogue. In: James A. Russell and José Miguel
Fernandez-Dols (eds.), The Psychology of Facial Expression, 334–346. Cambridge: Cambridge
University Press.
Beach, Wayne A. and Curtis D. LeBaron 2002. Body disclosures: Attending to personal problems
and reported sexual abuse during a medical encounter. Journal of Communication 52: 617–639.
Birdwhistell, Ray L. 1970. Kinesics and Context: Essays on Body Motion Communication. Phila-
delphia: University of Pennsylvania Press.
Chovil, Nicole 1991. Discourse-oriented facial displays in conversation. Research on Language and
Social Interaction 25: 163–194.
Condon, William S. and William D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Egbert, Maria 1996. Context sensitivity in conversation analysis: Eye gaze and the German repair
initiator “bitte.” Language in Society 25: 587–612.
Ekman, Paul, E. Richard Sorenson and Wallace V. Friesen 1969. Pan-cultural elements in facial
displays of emotions. Science 164(3875): 86–88.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. New York: Doubleday Anchor.
Goffman, Erving 1961. Encounters: Two Studies in the Sociology of Interaction. Indianapolis:
Bobbs-Merrill.
Goffman, Erving 1963. Behavior in Public Places. New York: Free Press.
Goffman, Erving 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Goffman, Erving 1983. The interaction order. American Sociological Review 48: 1–17.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
London: Academic Press.
Goodwin, Charles 1984. Notes on story structures and the organization of participation. In: John
Maxwell Atkinson and John Heritage (eds.), Structures of Social Action: Studies in Conversa-
tion Analysis, 225–246. Cambridge: Cambridge University Press.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated in the production of
description sequences. Sociological Inquiry 50: 303–317.
Goodwin, Marjorie Harness 1997. By-play: Negotiating evaluation in storytelling. In: Gregory R.
Guy, Crawford Feagin, Deborah Schiffrin and John Baugh (eds.), Toward a Social Science of
Language: Papers in Honor of William Labov, Volume 2, 77–102. Amsterdam: John Benjamins.
Hayashi, Makoto 2005. Joint turn construction through language and the body: Notes on embodi-
ment in coordinated participation in situated activities. Semiotica 156: 21–53.
Heath, Christian 1984. Talk and recipiency: Sequential organization in speech and body movement. In: John Maxwell Atkinson and John Heritage (eds.), Structures of Social Action: Studies in Conversation Analysis, 247–265. Cambridge: Cambridge University Press.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge University Press.
Heritage, John 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Jefferson, Gail 1978. Sequential aspects of story telling in conversation. In: Jim N. Schenkein (ed.),
Studies in the Organization of Conversational Interaction, 213–248. New York: Academic Press.
Kendon, Adam 1967. Some functions of gaze direction in social interaction. Acta Psychologica 26:
22–63.
Kendon, Adam 1970. Movement coordination in social interaction. Acta Psychologica 32: 1–25.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam and Andrew Ferber 1973. A description of some human greetings. In: Richard
Phillip Michael and John Hurrell Crook (eds.), Comparative Ecology and Behavior of Primates,
591–668. London: Academic Press.
Kidwell, Mardi 1997. Demonstrating recipiency: Resources for the unacknowledged recipient.
Issues in Applied Linguistics 8(2): 85–96.
Kidwell, Mardi 2000. Common ground in cross-cultural communication: Sequential and insti-
tutional contexts in front desk service encounters. Issues in Applied Linguistics 11(1):
17–37.
Kidwell, Mardi 2006. “Calm Down!”: The role of gaze in the interactional management of hysteria
by the police. Discourse Studies 8(6): 745–770.
Kidwell, Mardi and Don Zimmerman 2007. Joint attention as action. Journal of Pragmatics 39(3):
592–611.
LeBaron, Curtis D. and Jürgen Streeck 1997. Built space and the interactional framing of experience during a murder interrogation. Human Studies 20: 1–25.
Leeds-Hurwitz, Wendy 1987. The social history of the “Natural History of an Interview”: A multidisciplinary investigation of social communication. Research on Language and Social Interaction 20: 1–51.
vom Lehn, Dirk, Christian Heath and Jon Hindmarsh 2001. Exhibiting interaction: Conduct and collab-
oration in museums and galleries. Symbolic Interaction 24(2): 189–216.
Lerner, Gene H. 2003. Selecting next speaker: The context-sensitive operation of a context-free
organization. Language in Society 32: 177–201.
Lerner, Gene H., Don Zimmerman and Mardi Kidwell 2011. Formal structures of practical tasks:
A resource for action in the social lives of very young children. In: Charles Goodwin, Jürgen
Streeck and Curtis D. LeBaron (eds.), Multimodality and Human Activity: Research on Human
Behavior, Action, and Communication, 44–58. Cambridge: Cambridge University Press.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Journal
of Pragmatics 32: 855–878.
Mondada, Lorenza 2009. Emergent focused interactions in public places: A systematic analysis of
the multimodal achievement of a common interactional space. Journal of Pragmatics 41(10):
1977–1997.
Pillet-Shore, Danielle 2011. Doing introductions: The work involved in meeting someone new.
Communication Monographs 78(1): 73–95.
Pillet-Shore, Danielle 2012. Displaying stance through prosodic recipient design. Research on
Language and Social Interaction 45(4): 375–398.
Robinson, Jeffrey David 1998. Getting down to business: Talk, gaze, and body orientation during
openings of doctor-patient consultations. Human Communication Research 25: 97–123.
Rossano, Federico, Penelope Brown and Stephen C. Levinson 2009. Gaze, questioning and cul-
ture. In: Jack Sidnell (ed.), Conversation Analysis: Comparative Perspectives, 187–249. Cam-
bridge: Cambridge University Press.
Ruusuvuori, Johanna and Anssi Peräkylä 2009. Facial and verbal expressions in assessing stories
and topics. Research on Language and Social Interaction 42(4): 377–394.
Sacks, Harvey 1972. On the analyzability of stories by children. In: John J. Gumperz and Dell
Hymes (eds.), Directions in Sociolinguistics: The Ethnography of Communication, 325–345.
New York: Rinehart and Winston.
Sacks, Harvey and Emanuel A. Schegloff 1979. Two preferences in the organization of reference
to persons in conversation and their interaction. In: George Psathas (ed.), Everyday Language:
Studies in Ethnomethodology, 15–21. New York: Erlbaum.
Sacks, Harvey and Emanuel Schegloff 2002. Home position. Gesture 2: 133–146.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1979. Identification and recognition in telephone openings. In: George
Psathas (ed.), Everyday Language: Studies in Ethnomethodology, 24–78. New York: Erlbaum.
Schegloff, Emanuel A. 1982. Discourse as an interactional achievement: Some uses of “uh huh”
and other things that come between sentences. In: Deborah Tannen (ed.), Analyzing Dis-
course: Text and Talk, 71–93. Washington, DC: Georgetown University Press.
Schegloff, Emanuel A. 1998. Body torque. Social Research 65: 535–586.
Schegloff, Emanuel A. and Harvey Sacks 1973. Opening up closings. Semiotica 8: 289–327.
Stivers, Tanya 2008. Stance, alignment and affiliation during story telling: When nodding is a token
of preliminary affiliation. Research on Language and Social Interaction 41: 29–55.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60: 275–299.
Streeck, Jürgen 1994. Gesture as communication II: The audience as co-author. Research on Lan-
guage and Social Interaction 27: 239–267.
6. Homesign: When gesture is called upon to be language

Abstract
When people speak, they gesture, and young children are no exception. In fact, children
who are learning spoken language use gesture to take steps into language that they cannot
yet take in speech. But not all children are able to make use of the spoken input that sur-
rounds them. Deaf children whose profound hearing losses prevent them from acquiring
spoken language and whose hearing parents have not exposed them to sign language also
use gesture, called homesigns, to communicate. These homesigns take on the most basic
functions and forms of language – lexicon, morphology, sentential structure, grammatical
categories, sentential markers for negations, questions, past and future, and phrasal struc-
ture. As such, the deaf children’s homesign gestures are qualitatively different from the
co-speech gestures that surround them and, in this sense, represent first steps in the
process of language creation.
All children who learn a spoken language use gesture. But some children – deaf chil-
dren with profound hearing losses, for example – are unable to learn the spoken language
that surrounds them. If exposed to a conventional sign language, these deaf children will
acquire that language as naturally as hearing children acquire spoken language (Lillo-
Martin 1999; Newport and Meier 1985). If, however, deaf children with profound hearing
losses are not exposed to sign, they have only gesture to communicate with the hearing
individuals in their worlds.
The gestures used by deaf children in these circumstances are known as homesigns.
They are different in both form and function from the gestures that hearing children pro-
duce to communicate along with speech, and resemble more closely the signs that deaf
children of deaf parents and the words that hearing children of hearing parents learn
from their respective communities. We begin with a brief look at the gestures that hearing
children produce in the early stages of language learning, and then turn to the homesign
gestures that deaf children create to substitute for language.
Children also combine gesture and speech to convey more than one proposition (akin to a complex sentence, e.g., “I like it” + eat gesture) several months
before producing a complex sentence entirely in speech (“I like to eat it”). Gesture
thus continues to be at the cutting edge of early language development, providing
stepping-stones to increasingly complex linguistic constructions.
2.1. Lexicon
Like hearing children at the earliest stages of language-learning, deaf homesigners use
both pointing gestures and iconic gestures to communicate. Their gestures, rather than
being mime-like displays, are discrete units, each of which conveys a particular meaning.
Moreover, the gestures are non-situation-specific – a twist gesture, for instance, can be
used to request someone to twist open a jar, to indicate that a jar has been twisted open,
to comment that a jar cannot be twisted open, or to tell a story about twisting open a
jar that is not present in the room. In other words, the homesigner’s gestures are not
tied to a particular context, nor are they even tied to the here-and-now (Morford
and Goldin-Meadow 1997). In this sense, the gestures warrant the label sign.
Homesigners use their pointing gestures to refer to the same range of objects that
young hearing children refer to using, first, pointing gestures and, later, words – and
in the same distribution (Feldman, Goldin-Meadow, and Gleitman 1978). Both groups
of children refer most often to inanimate objects, followed by people and animals. They
also both refer to body parts, food, clothing, vehicles, furniture and places, but less
frequently.
Homesigners use iconic gestures more frequently than most hearing children learn-
ing spoken language. Their iconic gestures function like nouns, verbs, and adjectives in
conventional languages (Goldin-Meadow et al. 1994), although there are fundamental
differences between iconic gestures and words. The form of an iconic gesture captures
an aspect of its referent; the form of a word does not. Interestingly, although iconicity is
present in many of the signs of American Sign Language (ASL), deaf children learning
American Sign Language do not seem to notice. Most of their early signs are either not
iconic (Bonvillian, Orlansky, and Novack 1983) or, if iconic from an adult’s point of
view, not recognized as iconic by the child (Schlesinger 1978). In contrast, deaf indivi-
duals inventing their own homesigns are forced by their social situation to create ges-
tures that not only begin transparent but remain so. If they didn’t, no one in their
worlds would be able to take any meaning from the gestures they create. Homesigns
therefore have an iconic base.
Despite the fact that the gestures in a homesign system need to be iconic to be under-
stood, they form a stable lexicon. Homesigners could create each gesture anew every
time they use it, as hearing speakers seem to do with their gestures (McNeill 1992).
If so, we might still expect some consistency in the forms the gestures take simply
because the gestures are iconic and iconicity constrains the set of forms that can be
used to convey a meaning. However, we might also expect a great deal of variability
around a prototypical form – variability that would crop up simply because each situa-
tion is a little different, and a gesture created specifically for that situation is likely to
reflect that difference. In fact, it turns out that there is relatively little variability in
the set of forms a homesigner uses to convey a particular meaning. The child tends
to use the same form, say, two fists breaking apart in a short arc to mean “break”,
every single time that child gestures about breaking, no matter whether it’s a cup break-
ing, or a piece of chalk breaking, or a car breaking (Goldin-Meadow et al. 1994). Thus,
the homesigner’s gestures adhere to standards of form, just as a hearing child’s words or
a deaf child’s signs do (Singleton, Morford, and Goldin-Meadow 1993). The difference
is that the homesigner’s standards are idiosyncratic to the creator rather than shared by
a community of language users.
2.2. Morphology
Modern languages (both signed and spoken) build up words combinatorially from a
repertoire of a few dozen smaller meaningless units. We do not yet know whether
homesign has phonological structure (but see Brentari et al. 2012). However, there is
evidence that homesigns are composed of parts, each of which is associated with a par-
ticular meaning; that is, they have morphological structure (Goldin-Meadow, Mylander,
and Butcher 1995; Goldin-Meadow, Mylander, and Franklin 2007). The homesigners
could have faithfully reproduced in their gestures the actions that they actually perform.
They could have, for example, created gestures that capture the difference between
holding a balloon string and holding an umbrella. But they don’t. Instead, the children’s
gestures are composed of a limited set of handshape forms, each standing for a class of
objects, and a limited set of motion forms, each standing for a class of actions. These
handshape and motion components combine freely to create gestures, and the meanings
of these gestures are predictable from the meanings of their component parts. For
example, a hand shaped like an “O” with the fingers touching the thumb, that is, an
OTouch handshape form, combined with a Revolve motion form means “rotate an
object <2 inches wide around an axis”, a meaning that can be transparently derived
from the meanings of its two component parts (OTouch = handle an object <2 inches
wide; Revolve = rotate around an axis).
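The compositionality described here can be pictured as a small mapping from component forms to meanings. The sketch below is a hypothetical illustration only, not the coding scheme used in the homesign studies: apart from OTouch and Revolve, which come from the example in the text, the form names and meaning glosses are invented placeholders.

```python
# Illustrative sketch: a gesture's meaning predicted from independently
# meaningful handshape and motion components (morpheme-like parts).

HANDSHAPE_MEANINGS = {
    "OTouch": "handle an object <2 inches wide",      # from the text's example
    "CLarge": "handle an object of large diameter",   # assumed placeholder
}

MOTION_MEANINGS = {
    "Revolve": "rotate around an axis",               # from the text's example
    "ShortArc": "reposition along a short arc",       # assumed placeholder
}

def predict_meaning(handshape: str, motion: str) -> str:
    """Compose a gesture's predicted meaning from its two component parts."""
    return f"{MOTION_MEANINGS[motion]}, acting on: {HANDSHAPE_MEANINGS[handshape]}"

if __name__ == "__main__":
    # Approximates the text's example: OTouch + Revolve
    # ~ "rotate an object <2 inches wide around an axis".
    print(predict_meaning("OTouch", "Revolve"))
```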
Importantly in terms of arguing that a morphological system underlies the children’s
homesigns, we note that (1) the vast majority of gestures that each deaf child produces
conforms to the morphological description for that child, and (2) this morphological
description can be used to predict the forms and meanings of the new gestures that
the child produces. Thus, homesigns exhibit a simple morphology, one that is akin to
the morphologies found in conventional sign languages. Interestingly, it is much more
difficult to impose a coherent morphological description that can account for the ges-
tures that hearing speakers produce (Goldin-Meadow, Mylander, and Butcher 1995;
Goldin-Meadow, Mylander, and Franklin 2007), suggesting that morphological struc-
ture is not an inevitable outgrowth of the manual modality but is instead a linguistic
characteristic that deaf children impose on their communication systems.
The structure of homesign is strikingly similar across children, even though each child is developing his or her system alone without contact with other deaf chil-
dren and in different cultures (Goldin-Meadow and Mylander 1998). The homesigners
tend to produce gestures for patients in the first position of their sentences, before ges-
tures for verbs (cheese–eat) and before gestures for endpoints of a transferring action
(cheese–table). They also produce gestures for verbs before gestures for endpoints
(give–table). In addition, they produce gestures for intransitive actors before gestures
for verbs (mouse-run).
Homesigners’ third device for indicating thematic roles is to displace verb gestures
toward objects playing particular roles, as opposed to producing them in neutral
space (at chest level). These displacements are reminiscent of inflections in conven-
tional sign languages (Padden 1983, 1990). Homesigners tend to displace their gestures
toward objects that are acted upon and thus use their inflections to signal patients. For
example, displacing a twist gesture toward a jar signals that the jar (or one like it) is the
object to be acted upon (Goldin-Meadow et al. 1994). Thus, homesign sentences adhere
to simple syntactic patterns marking who does what to whom.
refer to objects. In other words, the larger unit substitutes for the smaller units and, in
this way, functions as a phrase.
Finally, the hearing mothers’ iconic gestures were not stable in form and meaning
over time while their deaf children’s homesigns were. Moreover, the hearing mothers
did not distinguish between gestures serving a noun role and gestures serving a verb
role; the deaf children did (Goldin-Meadow et al. 1994).
Did the deaf children learn to structure their homesign systems from their mothers?
Probably not – although it may have been necessary for the children to see hearing peo-
ple gesturing in communicative situations in order to get the idea that gesture can be
appropriated for the purposes of communication. But in terms of how the children
structure their homesigns, there is no evidence that this structure came from the chil-
dren’s hearing mothers. The hearing mothers’ gestures do not have structure when
looked at with tools used to describe the deaf children’s homesigns (although they
do when looked at with tools used to describe co-speech gestures, that is, when they
are described in relation to speech).
The hearing mothers interacted with their deaf children on a daily basis. We there-
fore might have expected that their gestures would eventually come to resemble their
children’s homesigns (or vice versa). But they did not. Why didn’t the hearing parents
display language-like properties in their gestures? The parents were interested in teach-
ing their deaf children to talk, not gesture. They therefore produced all of their gestures
with speech – in other words, their gestures were co-speech gestures and had to behave
accordingly. The gestures had to fit, both temporally and semantically, with the speech
they accompanied. As a result, the hearing parents’ gestures were not “free” to take on
language-like properties.
In contrast, the deaf homesigners had no such constraints. They had no productive
speech and thus always produced gesture on its own, without talk. Moreover, because
the manual modality was the only means of communication open to the children, it had
to take on the full burden of communication. The result was language-like structure.
Although the homesigners may have used their hearing parents’ gestures as a starting
point, it is clear that they went well beyond that point. They transformed the co-speech
gestures they saw into a system that looks very much like language.
Nicaraguan Sign Language emerged when deaf homesigners were brought together in schools in Managua in the late 1970s, and it has continued to develop across successive cohorts of signers. For example, the first cohort combines their signs as do homesigners, adhering
to consistent word orders to convey who does what to whom (Senghas et al. 1997).
But Nicaraguan Sign Language has not stopped there. Every year, new students
enter the school and learn to sign among their peers. This second cohort of signers
has as its input the sign system developed by the first cohort and, interestingly, changes
that input so that the product becomes more language-like. For example, second cohort
signers go beyond the small set of basic word orders used by the first cohort, introducing
new orders not seen previously in the language (Senghas et al. 1997). As a second exam-
ple, the second cohort begins to use spatial devices invented by the first cohort, but they
use these devices consistently and for contrastive purposes (Senghas et al. 1997;
Senghas and Coppola 2001).
The second cohort, in a sense, stands on the shoulders of the first. They do not need
to invent the properties of language found in homesign – those properties are already
present in their input. They can therefore take the transformation process one step fur-
ther. The Nicaraguan homesigners (and, indeed, all homesigners) take the first, and per-
haps the biggest, step: They transform their hearing parents’ gestures, which are not
structured in language-like ways, into a language-like system (Coppola et al. 1997;
see also Singleton, Goldin-Meadow, and McNeill 1995). The first and second cohorts
of Nicaraguan signers are then able to build on these properties, creating a system
that looks more and more like the natural languages of the world.
There is, however, another interesting wrinkle in the language-creation story – it
matters how old the creator is. Second cohort signers who began learning Nicaraguan
Sign Language relatively late in life (after age 10) do not exhibit these linguistic ad-
vances and, in fact, use sign systems that are no different from those used by late-learn-
ing first cohort signers (Senghas 1995; Senghas and Coppola 2001). It looks like the
creator may have to be a child to take full advantage of the input provided by the
first cohort to continue the process of language creation. Thus, we see in Nicaraguan
Sign Language that language creation depends not only on what the creator has to
work with, but also on who the creator is.
To summarize, when children are provided with a model for language, they use ges-
ture to take steps into language that they cannot yet take in speech. But when children
do not have a model for language, they use gesture to fill the void. They create a system
of homesigns that assumes the most basic forms and functions of language. These home-
sign gestures are qualitatively different from the co-speech gestures that serve as input
to the system and, in this sense, represent the first steps of language creation.
Acknowledgements
This research was supported by grant no. R01 DC00491 from NIDCD. Thanks to my
many collaborators for their invaluable help in uncovering the structure of homesign,
and to the children and their families for welcoming us into their homes.
5. References
Bates, Elizabeth, Laura Benigni, Inge Bretherton, Luigia Camaioni and Virginia Volterra 1979. The
Emergence of Symbols: Cognition and Communication in Infancy. New York: Academic Press.
Bekken, Kaaren 1989. Is there “Motherese” in gesture? Unpublished doctoral dissertation, Uni-
versity of Chicago.
Bonvillian, John D., Michael D. Orlansky and Lesley Lazin Novack 1983. Developmental mile-
stones: Sign language acquisition and motor development. Child Development 54: 1435–1445.
Brentari, Diane, Marie Coppola, Laura Mazzoni and Susan Goldin-Meadow 2012. When does a
system become phonological? Handshape production in gesturers, signers, and homesigners.
Natural Language and Linguistic Theory 30(1): 1–31.
Brown, Roger 1973. A First Language. Cambridge, MA: Harvard University Press.
Coppola, Marie and Elissa L. Newport 2005. Grammatical “subjects” in home sign: Abstract lin-
guistic structure in adult primary gesture systems without linguistic input. Proceedings of the
National Academy of Sciences 102: 19249–19253.
Coppola, Marie, Ann Senghas, Elissa L. Newport and Ted Supalla 1997. The emergence of gram-
mar: Evidence from family-based sign systems in Nicaragua. Paper presented at the Boston
University Conference on Language Development.
Feldman, Heidi, Susan Goldin-Meadow and Lila Gleitman 1978. Beyond Herodotus: The creation
of language by linguistically deprived deaf children. In: Andrew Lock (ed.), Action, Symbol,
and Gesture: The Emergence of Language. New York: Academic Press.
Franklin, Amy, Anastasia Giannakidou and Susan Goldin-Meadow 2011. Negation, questions, and
structure building in a homesign system. Cognition 118(3): 398–416.
Goldin-Meadow, Susan 1982. The resilience of recursion: A study of a communication system de-
veloped without a conventional language model. In: Eric Wanner and Lila R. Gleitman (eds.),
Language Acquisition: The State of the Art, 51–77. New York: Cambridge University Press.
Goldin-Meadow, Susan 1985. Language development under atypical learning conditions: Replica-
tion and implications of a study of deaf children of hearing parents. In: Keith Nelson (ed.),
Children’s Language, Volume 5, 197–245. Hillsdale, NJ: Lawrence Erlbaum and Associates.
Goldin-Meadow, Susan 1993. When does gesture become language? A study of gesture used as a
primary communication system by deaf children of hearing parents. In: Kathleen R. Gibson
and Tim Ingold (eds.), Tools, Language and Cognition in Human Evolution, 63–85. New
York: Cambridge University Press.
Goldin-Meadow, Susan 2003. The Resilience of Language. Philadelphia, PA: Taylor and Francis.
Goldin-Meadow, Susan and Cindy Butcher 2003. Pointing toward two-word speech in young chil-
dren. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet. Mahwah,
NJ: Erlbaum Associates.
Goldin-Meadow, Susan, Cindy Butcher, Carolyn Mylander and Mark Dodge 1994. Nouns and
verbs in a self-styled gesture system: What’s in a name? Cognitive Psychology 27: 259–319.
Goldin-Meadow, Susan and Marolyn Morford 1985. Gesture in early child language: Studies of
deaf and hearing children. Merrill-Palmer Quarterly 31: 145–176.
Goldin-Meadow, Susan and Carolyn Mylander 1983. Gestural communication in deaf children:
The non-effects of parental input on language development. Science 221: 372–374.
Goldin-Meadow, Susan and Carolyn Mylander 1984. Gestural communication in deaf children:
The effects and non-effects of parental input on early language development. Monographs
of the Society for Research in Child Development 49: 1–121.
Goldin-Meadow, Susan and Carolyn Mylander 1998. Spontaneous sign systems created by deaf
children in two cultures. Nature 391: 279–281.
Goldin-Meadow, Susan, Carolyn Mylander and Cindy Butcher 1995. The resilience of combinator-
ial structure at the word level: Morphology in self-styled gesture systems. Cognition 56:
195–262.
Goldin-Meadow, Susan, Carolyn Mylander and Amy Franklin 2007. How children make language
out of gesture: Morphological structure in gesture systems developed by American and Chi-
nese deaf children. Cognitive Psychology 55: 87–135.
Goodwyn, Susan, Linda Acredolo and Catherine Brown 2000. Impact of symbolic gesturing on
early language development. Journal of Nonverbal Behavior 24: 81–103.
Greenfield, Patricia M. and E. Sue Savage-Rumbaugh 1991. Imitation, grammatical development,
and the invention of protogrammar by an ape. In: Norman A. Krasnegor, Duane M.
Shatz, Marilyn 1982. On mechanisms of language acquisition: Can features of the communicative
environment account for development? In: Eric Wanner and Lila R. Gleitman (eds.), Lan-
guage Acquisition: The State of the Art, 102–127. New York: Cambridge University Press.
Singleton, Jenny L., Susan Goldin-Meadow and David McNeill 1995. The cataclysmic break
between gesticulation and sign: Evidence against an evolutionary continuum of manual com-
munication. In: Karen Emmorey and Judy Reilly (eds.), Language, Gesture, and Space, 287–
311. Hillsdale, NJ: Erlbaum Associates.
Singleton, Jenny L., Jill P. Morford and Susan Goldin-Meadow 1993. Once is not enough: Stan-
dards of well-formedness in manual communication created over three different timespans.
Language 69: 683–715.
Tervoort, Bernard T. 1961. Esoteric symbolism in the communication behavior of young deaf chil-
dren. American Annals of the Deaf 106: 436–480.
Thompson, Sandra A. 1988. A discourse approach to the cross-linguistic category “adjective”. In:
John A. Hawkins (ed.), Explaining Language Universals, 167–185. Cambridge, MA: Basil
Blackwell.
Abstract
For much of history, the relationship among spoken language, signed language, and ges-
ture has been a source of contention among language scholars and philosophers of
language. The chapter examines the history of this question, beginning with the infamous
Milan conference and the debate over whether deaf children should be permitted to sign
or be required to learn to speak. Framed in Cartesian mind-body dualism, the debate
determined the scientific world view for the next 100 years. Recently, however, three
developments in the science of communication have begun to form a unified view of
the nature of human semiotic capabilities: (1) the linguistic study of signed languages;
(2) the growth of gesture studies; and (3) the new approach to linguistic theory called cog-
nitive linguistics and cognitive grammar. Each development, and the integration among
them, is described. Finally, the emerging interdisciplinary synthesis of the three areas is
discussed.
language scholars. Language has consistently been equated with speech. Signed lan-
guages were rarely, if ever, recognized as language; rather, they were commonly seen
as nothing more than depictive gestures. Gesture was regarded as a universal language,
more closely related to nature than is spoken language.
For deaf people who use their hands and bodies to express their language, the nature
of the relationship between these three systems is far from merely a philosophical ques-
tion. It informed a centuries-long debate about whether deaf children could be edu-
cated and become integrated into the general society. In the mid- to late 1800s this
debate centered around whether deaf children should be permitted to use signed lan-
guage, or whether signed languages should be forbidden and deaf children should be
taught using only speech and articulation training. These two forces, those who sup-
ported a manual approach and those who supported oralism, came together at the
Milan Conference of 1880. It was here that the proponents of oralism most forcefully
made their case.
Marius Magnat, the director of an oral school in Geneva at the time, made the case
for why the oralist approach should be preferred over that of signed language:
The advantages of articulation training [i.e., speech] […] are that it restores the deaf to
society, allows moral and intellectual development, and proves useful in employment.
Moreover, it permits communication with the illiterate, facilitates the acquisition and
use of ideas, is better for the lungs, has more precision than signs, makes the pupil the
equal of his hearing counterpart, allows spontaneous, rapid, sure, and complete expression
of thought, and humanizes the user. Manually taught children are defiant and corruptible.
This arises from the disadvantages of sign language. It is doubtful that sign can engender
thought. It is concrete. It is not truly connected with feeling and thought. […] It lacks pre-
cision. […] Sign cannot convey number, gender, person, time, nouns, verbs, adverbs, adjec-
tives, he claims. […] It does not allow [the teacher] to raise the deaf-mute above his
sensations. […] Since signs strike the senses materially they cannot elicit reasoning, reflec-
tion, generalization, and above all abstraction as powerfully as can speech. (Lane 1984:
387–388)
The president of the conference, Giulio Tarra, approached the argument from a differ-
ent perspective. Not only did he argue that speech should be preferred over signs, but
he linked signs to gesture, and claimed that neither was the proper instrument for
developing the intellect and the mind of the deaf child:
Gesture is not the true language of man which suits the dignity of his nature. Gesture,
instead of addressing the mind, addresses the imagination and the senses. Moreover, it
is not and never will be the language of society […] Thus, for us it is an absolute necessity
to prohibit that language and to replace it with living speech, the only instrument of
human thought. […] Oral speech is the sole power that can rekindle the light God
breathed into man when, giving him a soul in a corporeal body, he gave him also a
means of understanding, of conceiving, and of expressing himself. […] While, on the
one hand, mimic signs are not sufficient to express the fullness of thought, on the
other they enhance and glorify fantasy and all the faculties of the sense of imagination.
[…] The fantastic language of signs exalts the senses and foments the passions, whereas
speech elevates the mind much more naturally, with calm and truth and avoids the danger
of exaggerating the sentiment expressed and provoking harmful mental impressions.
(Lane 1984: 391, 393–394)
The arguments made by Magnat and Tarra were not new. They reflected Cartesian
philosophical ideas common at the time. Descartes distinguished two modes of
conceptualization – understanding or reasoning, and imagination:
I believe that this power of imagining that is in me, insofar as it differs from the power of
understanding, is not a necessary element of my essence, that is, of the essence of my mind;
for although I might lack this power, nonetheless I would undoubtedly remain the same
person I am now. Thus it seems that the power of imagining depends upon something dif-
ferent from me. (Descartes 1980: 90)
Cartesian thought also established the duality that separated the mind and the body:
Although perhaps […] I have a body that is very closely joined to me, nevertheless,
because on the one hand I have a clear and distinct idea of myself – insofar as I am a
thing that thinks and not an extended thing – and because on the other hand I have a dis-
tinct idea of a body – insofar as it is merely an extended thing, and not a thing that thinks –
it is therefore certain that I am truly distinct from my body, and that I can exist without it.
(Descartes 1980: 93)
Both of those philosophical themes are present in the arguments presented in favor of
oralism, which reveal long-standing preconceptions, and indeed misconceptions, about
the relation between spoken language, signed language, and gesture (see Tab. 7.1).
Tab. 7.1: The mind-body duality and speech, sign, and gesture

Mind | Body
Language | Gesture
Speech | Sign
Acquisition of ideas | Concrete
Expression of thought, instrument of thought | Cannot engender thought
Those who use it are restored to society with calm, prudence, truth (human-like) | Those who use it are defiant, corruptible, undignified (animal-like)
Precision (grammar) | Lacks grammar (number, gender, nouns, verbs, etc.); mimic
Elicits and permits reasoning, reflection, abstraction, generalization, conceptualization, rationality | Associated with the senses, material; glorifies imagination and fantasy, foments the passions, encourages harmful mental impressions
The soul, the spirit, the breath of God (aspiration and speech), res cogitans | The corporeal body, the flesh, the material world, the sensual, res extensa
The impact of these ideas was tremendous because they established the scientific world
view for the next 100 years. Scholars were left with the following deeply entrenched
assumptions about the relationship between speech, sign, and gesture:
Three recent developments in the science of communication have not only challenged
these views, but have begun to synthesize to form a bold new unified view of the nature
of human semiotic capabilities. These are: (1) the linguistic study of signed languages; (2) the growth of gesture studies; and (3) the new approach to linguistic theory called cognitive linguistics and cognitive grammar.
If one considers a natural human sign language from inside not outside, its gSigns denote
the kinds of things that sSigns denote. […] Because the full language use of gSigns is not
generally known, it has been supposed that there must be an unbridgeable gap between
gSign as affect display and gSign as language element. Yet there may be a semiotic and
evolutionary continuum as we shall see. The habit of thinking of animal gSigns as simple
and human signs, both gestured and spoken, as complex is preposterous. […] The verte-
brates’ affect display gSign is immensely complex. Its reference is in fact to the whole
ethology, and it is moreover complex in a semiotically exploitable way. Its vehicle, deno-
tation, and denotatum are all rich material for evolutionary development. If it is
foolhardy to maintain this kind of gSign evolved directly into language, it is foolish to
dismiss it and seek elsewhere for the origins of more specialized gSigns and sSigns.
(Stokoe 1974: 36)
As we shall see, only recently has Stokoe’s original position been taken up by sign lin-
guists and gesture researchers. Even while spoken language researchers were exploring
the deep connections between language and gesture, the view remained that sign and
gesture could not be unified (Singleton, Goldin-Meadow, and McNeill 1995).
4. Cognitive Linguistics
The Cartesian idea that language is of the mind, while gesture is of the body, has deeply
influenced modern linguistic theory. While some philosophers, such as Condillac, suggested that language began as a gesture language or langage d'action, others, including Rousseau, Herder, and Humboldt, caricatured and ridiculed such a claim,
arguing that language could not have arisen from such natural, animalistic beginnings.
Humboldt contended that: “Language must be regarded as built-in man; for as the work
of his mind, in the clarity of consciousness, it is completely inexplicable. It is of no avail
to allow for thousands and thousands of years for its invention. Language could not be
invented if its archetype were not already present in the human mind” (Wells 1987).
The notion that language must be “built-in”, that there is in fact a faculty of language
distinct from other, general cognitive abilities, is a defining assumption of modern form-
alist or generative linguistic theory. Another defining assumption of formalist ap-
proaches is that grammar and syntax can be explained independently of meaning.
Chomsky, for example, writes derisively of the matter:
A great deal of effort has been expended in attempting to answer the question: “How can
you construct a grammar with no appeal to meaning?” The question itself, however, is
wrongly put, since the implication that obviously one can construct a grammar with appeal
to meaning is totally unsupported. One might with equal justification ask: “How can
you construct a grammar with no knowledge of the hair color of the speaker?” (Chomsky
1957: 93).
Cognitive linguistics presents a radical departure from such a view. Cognitive linguistics
is an approach to language that is based on our experience of the world and the way
we perceive and conceptualize it. Cognitive linguistics adopts three foundational
hypotheses (Croft and Cruse 2004):
– language is not an autonomous cognitive faculty;
– grammar is conceptualization; and
– knowledge of language emerges from language use.
Under the cognitive linguistic framework, language is therefore not dissociable from other facets of human cognition. Knowledge of language cannot be sharply delimited and distinguished from other kinds of knowledge and ability.
One approach to cognitive linguistics is known as cognitive grammar. In direct reply
to the more traditional assumption that grammar is a purely formal system devoid of
meaning, Langacker (2008: 3) presents the provocative claim that grammar is meaning-
ful. Grammar, according to Langacker, is symbolic in nature, where the term “symbolic”
means the pairing between a semantic structure and a phonological structure – between
a meaning and its physical manifestation. Within this theory, lexicon, morphology, and
syntax form a continuum of symbolic structures. All aspects of grammar, including
grammatical classes, grammatical markers, grammatical rules, and so forth, are symbolic
structures, having both semantic and phonological import.
According to cognitive grammar, one of the overriding aspects of grammar is that it
imposes a particular construal onto conceptual content. This ability to construe situa-
tions in myriad ways is based on imaginative and creative abilities. Cognitive linguistics
eschews purely propositional or truth-conditional accounts of meaning, and instead fa-
vors imagistic accounts. One type of such conceptual structure is a set of image schemas,
“schematized patterns of activity abstracted from everyday bodily experiences,
especially pertaining to vision, space, motion, and force” (Langacker 2008: 32).
While the cognitive linguistic approach does not rule out certain innate abilities,
cognitive grammar does insist that language be understood as an expression of general cognitive abilities that are grounded in our sensorimotor and kinesthetic experiences.
It is in the modern synthesis of these three developments, which is only now happen-
ing, that a truly interdisciplinary understanding is beginning to emerge about the body,
language, and communication. Initially, the integration followed two tracks. In one
track, linguists (Cienki and Müller 2008a; Parrill and Sweetser 2004) began to apply
cognitive linguistic concepts and analytical tools to the study of gesture, demonstrating
that gesture can reveal in a very transparent way the nature of conceptualization.
In the second track, cognitively-trained signed language linguists began to utilize
cognitive linguistic methods to study signed languages (S. Wilcox 2007; S. Wilcox and
P. Wilcox 2010). These studies resulted in new insights into problems that had seemed
impenetrable, such as the nature of metaphor in signed languages (P. Wilcox 2000;
S. Wilcox 2007; S. Wilcox, P. Wilcox, and Jarque 2003) and the significance of iconicity
in a visual language (Perniss 2007; Pietrandrea 2002; Taub 2001; S. Wilcox 2004a). Addi-
tionally, models unique to cognitive linguistics such as blending and conceptual integra-
tion (Dudis 2004; Liddell 1998, 2003; Wulf and Dudis 2005) were applied to signed
languages, resulting in new insights into the structure of these languages.
Recently, gesture researchers and linguists have begun to integrate all three develop-
ments. One of the first publications to explore this new, integrated approach was Ges-
ture and the Nature of Language (Armstrong, Stokoe, and S. Wilcox 1995). The
interface between gesture and signed language has become the focus of several linguists
(Müller 2008; S. Wilcox 2004b). One aspect of this research has been to document the
grammaticalization process by which non-linguistic gestures become incorporated into
the linguistic system of signed languages (Janzen and Shaffer 2002; S. Wilcox 2009).
Increasingly, the disciplinary boundaries between sign linguists, gesture scholars, and researchers in other fields of communication science, such as psychology and cognitive neuroscience, are beginning to fade. As they do, the fruits of this interdisciplinary uni-
fication are beginning to shed new light on the nature of human language as embodied
communication.
6. References
Armstrong, David F., William C. Stokoe and Sherman Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Birdwhistell, Ray L. 1970. Kinesics and Context: Essays on Body Motion Communication. Phila-
delphia: University of Pennsylvania Press.
Chomsky, Noam 1957. Syntactic Structures. The Hague: Mouton.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John
Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture, and thought. In: Raymond Gibbs
(ed.), The Cambridge Handbook of Metaphor and Thought, 483–502. Cambridge: Cambridge
University Press.
Condon, William S. and Louis W. Sander 1974. Synchrony demonstrated between movements of
the neonate and adult speech. Child Development 45(2): 456–462.
Croft, William and Alan D. Cruse 2004. Cognitive Linguistics. Cambridge: Cambridge University
Press.
Descartes, René 1980. Discourse on Method and Meditations on First Philosophy. Indianapolis:
Hackett.
Dudis, Paul G. 2004. Body partitioning and real-space blends. Cognitive Linguistics 15(2):
223–238.
Janzen, Terry and Barbara Shaffer 2002. Gesture as the substrate in the process of ASL gramma-
ticization. In: Richard Meier, David Quinto and Kearsy Cormier (eds.), Modality and Structure
in Signed and Spoken Languages, 199–223. Cambridge: Cambridge University Press.
Kendon, Adam 1980. Gesticulation and speech: two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man 26(2): 199–221.
Kendon, Adam 1994. Do gestures communicate? A review. Research on Language and Social
Interaction 27(3): 175–200.
Kendon, Adam 1997. Gesture. Annual Review of Anthropology 26: 109–128.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6 http://
cognitextes.revues.org/406. Accessed May 2012.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures on the basis of the four parameters of sign language. Semiotica.
Lane, Harlan 1984. When the Mind Hears: A History of the Deaf. New York: Random House.
Langacker, Ronald W. 2008. Cognitive Grammar: A Basic Introduction. Oxford: Oxford Univer-
sity Press.
Liddell, Scott K. 1998. Grounded blends, gestures, and conceptual shifts. Cognitive Linguistics
9(3): 283–314.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. New York:
Cambridge University Press.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal About Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. New York: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyvyan
Evans and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and
New Directions, 351–385. London: Equinox.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family. In:
Roland Posner and Cornelia Müller (eds.), The Semantics and Pragmatics of Everyday Ges-
tures, 234–256. Berlin: Weidler.
Müller, Cornelia 2008. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 219–248. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguis-
tics Encyclopedia, 510–518. London: Routledge.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Amsterdam: John Benjamins.
Parrill, Fey and Eve Sweetser 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Perniss, Pamela M. 2007. Space and Iconicity in German Sign Language (DGS). Max Planck Insti-
tute Series in Psycholinguistics 45. Nijmegen, the Netherlands: Max Planck Institute.
Pietrandrea, Paola 2002. Iconicity and arbitrariness in Italian Sign Language. Sign Language Stu-
dies 2(3): 296–321.
Pike, Kenneth L. 1967. Language in Relation to a Unified Theory of the Structure of Human
Behavior. The Hague: Mouton.
Singleton, Jenny L., Susan Goldin-Meadow and David McNeill 1995. The cataclysmic break
between gesticulation and sign: Evidence against a unified continuum of gestural communica-
tion. In: Karen Emmorey and Judy Reilly (eds.), Language, Gesture, and Space, 287–311. Hills-
dale, NJ: Lawrence Erlbaum.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Visual Communication Sys-
tems of the American Deaf. Buffalo, NY: University of Buffalo.
Stokoe, William C. 1974. Motor signs as the first form of language. In: Roger W. Wescott, Gordon
W. Hewes and William C. Stokoe (eds.), Language Origins, 35–50. Silver Spring, MD: Linstok
Press.
Taub, Sarah 2001. Language in the Body: Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Wells, George A. 1987. The Origin of Language: Aspects of the Discussion from Condillac to
Wundt. La Salle, IL: Open Court.
Wilcox, Phyllis P. 2000. Metaphor in American Sign Language. Washington, DC: Gallaudet Uni-
versity Press.
Wilcox, Phyllis P. 2007. Constructs of the mind: Cross-linguistic contrast of metaphor in verbal and
signed languages. In: Elena Pizzuto, Paola Pietrandrea and Raffaele Simone (eds.), Verbal and
Signed Languages: Comparing Structures, Constructs and Methodologies, 251–274. Berlin:
Mouton de Gruyter.
Wilcox, Sherman 2004a. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed
languages. Cognitive Linguistics 15(2): 119–147.
Wilcox, Sherman 2004b. Gesture and language: Cross-linguistic and historical data from signed
languages. Gesture 4(1): 43–75.
Wilcox, Sherman 2007. Signed languages. In: Dick Geeraerts and Hubert Cuyckens (eds.), The
Oxford Handbook of Cognitive Linguistics, 1113–1136. Oxford: Oxford University Press.
Wilcox, Sherman 2009. Symbol and symptom: Routes from gesture to signed language. Annual
Review of Cognitive Linguistics 7(1): 89–110.
Wilcox, Sherman and Phyllis Wilcox 2010. The analysis of signed languages. In: Bernd Heine and
Heiko Narrog (eds.), The Oxford Handbook of Linguistic Analysis, 739–760. Oxford: Oxford
University Press.
Wilcox, Sherman, Phyllis Wilcox and Maria J. Jarque 2003. Mappings in conceptual space: Meto-
nymy, metaphor, and iconicity in two signed languages. Jezikoslovlje 4(1): 139–156.
Wulf, Alyssa and Paul Dudis 2005. Body partitioning in ASL metaphoric binds. Sign Language
Studies 5(3): 317–332.
Abstract
This is a sketch of the theoretical background and basic assumptions for regarding language and gesture as an integrated system, along with an account of the empirical observations and findings that have inspired and lend support to this theory.
1. Introduction
A dynamic view emphasizes how language integrates itself with thought and context (discursive, interpersonal, material, etc.) – factors that are integral, not external, to language considered dynamically and that fuel the real-time microgenesis of speech and gesture.
The growth point (GP), featured here, is a hypothesis concerning this dynamic system.
– The static dimension is accessed through the synchronic method, in which language is
viewed as a totality at a single theoretical instant (hence “syn-” “chronic”). Saussure
([1959] 1966) argued that only in this way, with the whole of language laid out panor-
amically, could the contrasts that define linguistic values be discerned. Saussure
famously contrasted French “mouton” to the two English words, “sheep” and “mut-
ton”, to illustrate how values depend on “differences” (here, a difference in English,
none in French) as well as reference. The values of “mouton” and “sheep” can never
be the same, despite being mutual translations and having the same references. Also
see Saussure’s recently discovered notes (Saussure 2002) and comments by Harris
(2002, 2003). “In language”, he said, “there are only differences” (Saussure 1966:
652) and this is the essence of the static dimension – linguistic objects in contrast
(“static” does not mean “stasis” – we are not speaking of moments of repose between
bursts of activity: Every linguistic event can be regarded statically or dynamically, or
as here both).
– The dynamic dimension could be termed the “activity” of language but I will call it
(invoking Merleau-Ponty 1962) “inhabiting” language with thought and action. This
term has the advantage of invoking both langue, the static system, and something that
animates it, bringing in the dynamic dimension. A historical figure associated with
this tradition is Vygotsky (1987). On this dimension, utterances come and go, emerge
and disperse in real time, and this stands in contrast to the abstraction from time of
the linguistic objects in langue.
With careful observation these criteria can be tested for applicability, as will be
illustrated in multiple examples below.
3.3. An example
In Fig. 8.1, a speaker is shown recounting an episode of an animated cartoon (variously
termed the “inside ascent” or the “bowling ball episode”) in which one character,
Sylvester, attempts to reach a second character, Tweety, by climbing the inside of a
drainpipe attached to the side of a building – a pipe conveniently topping out just
where Tweety is perched in an upper story window. The cat Sylvester enters the
drainpipe at street level and starts to climb it but is thwarted when Tweety Bird
drops a bowling ball into the top. Sylvester and ball meet midway, and he swallows it.
This and other examples were collected in a standardized elicitation: A participant is
shown an approximately 8-minute long animated Tweety and Sylvester cartoon
(“Canary Row”, Warner Brothers, 1950). Immediately after viewing the cartoon, the
participant recounts the story to a second participant “as accurately and completely
as possible, as “BR” will be asked to retell the story based on your narration” (or
words to that effect). The performance is video recorded, with the speaker in full
Fig. 8.1: Gesture of "rising hollowness" with: "he go[es up through the pipe] this time"
view and at least the front half of the listener as well. The second participant is a
genuine listener, not one of the experimenters. The instructions emphasized that the
purpose of the experiment was to study storytelling and there is no mention of gesture.
(Computer art in Fig. 8.1 and later figures by Fey Parrill.)
Typographic conventions in the transcriptions of examples distinguish the phases of gestures (Kendon 1980):
– The gesture phrase, the larger unit, is enclosed within "[" and "]" brackets.
– The stroke, the image phase, is marked in boldface.
– Preparation is the hand getting into position for the stroke and is indicated by the span from the left bracket to the boldface (the onset of preparation is the first sign that the gesture is becoming organized).
– Holds, or cessations of movement, either prestroke (awaiting the stroke) or poststroke, preserve the position and hand shape of the stroke after movement ceases and are indicated with underlining (there was a poststroke hold over "the" in the example).
– Retraction is the hand retreating to rest and is indicated by the span to the right bracket (retraction extended through the rest of the noun phrase, the noun "pipe").
Only the stroke phase is obligatory. Without a stroke a gesture is not said to occur, but the other phases may or may not occur.
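For readers who record such transcriptions in scripts or annotation files, the phase structure can be mirrored in a simple record of spans. The sketch below is only illustrative: the class name, fields, and character offsets are mine, not part of any established transcription tool; it encodes the Fig. 8.1 example under the convention just stated, with the stroke as the sole obligatory phase.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (start, end) character offsets into the transcript, end-exclusive.
Span = Tuple[int, int]

@dataclass
class GesturePhrase:
    # Mirrors the bracket/boldface/underline conventions described above:
    # the record as a whole is the gesture phrase, the stroke is the image
    # phase, and every phase other than the stroke is optional.
    transcript: str
    stroke: Span                            # boldface span (obligatory)
    preparation: Optional[Span] = None      # left bracket up to the stroke
    prestroke_hold: Optional[Span] = None   # underlined, awaiting the stroke
    poststroke_hold: Optional[Span] = None  # underlined, after the stroke
    retraction: Optional[Span] = None       # stroke/hold end to right bracket

# Fig. 8.1 example: "and he go[es up thróugh the pipe] this time"
utterance = "and he goes up thróugh the pipe this time"
example = GesturePhrase(
    transcript=utterance,
    preparation=(9, 11),       # "es": hand getting into position
    stroke=(12, 22),           # "up thróugh": the image phase
    poststroke_hold=(23, 26),  # "the": position and hand shape preserved
    retraction=(27, 31),       # "pipe": hand retreats to rest
)
print(example.transcript[example.stroke[0]:example.stroke[1]])  # "up thróugh"
```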
In the boldfaced section of Fig. 8.1, the speaker raised her right hand upward, her
palm up and fingers and thumb spread apart: a kind of open basket shape moving
up, as illustrated (the clip is at the moment she is saying the vowel of “thróugh”).
This gesture depicts the path of motion (upward), the moving figure (the cat), and
the interiority of the path. The gesture synthesizes all this information in a single
symbol.
3.4. Co-expressiveness
The synchronous speech and gesture, “and he go[es up thróugh the pipe] this time”,
were co-expressive. The term “co-expressive” means that speech and gesture cover the
same idea unit. Co-expressiveness is distinct from “mismatch” or “supplement”, where
speech and gesture cover distinct meanings. Co-expressively with “up” the speaker’s
hand rose upward; co-expressively with “thróugh” her fingers spread outward to create
an interior space. The upward movement and the opening of the hand took place con-
currently, not sequentially, and these movements occurred synchronously with “up
thróugh,” the linguistic package that carries the same meanings. The contrastive empha-
sis on “thróugh,” highlighting interiority, is matched by the added complexity of the
gesture, the spreading of the upturned fingers. What makes speech and gesture co-
expressive is this joint realization of the idea of upward motion and interiority. Utter-
ances can occur without gestures and yet imagery can still be present. While absence of gesture
has multiple causes, one is that the less newsworthy the information, the less materia-
lized the speech and gesture. Absence of gesture is the endpoint of this continuum.
Speech, at the same endpoint, likewise becomes simple and hackneyed and may also
briefly cease. So even if gesture is absent, imagery can be present.
(the open shape). The gesture does not admit decomposition. It is a minimal Vygots-
kian psychological unit, and there are no subunits with their own meanings, no repea-
table significances for the outspread fingers and upward palm. Only upward motion can
be said to have an independent meaning – it means upward, but that is all and it is not
enough to generate the meaning of the gesture whole. And even this upward meaning
acquires significance as a part of the whole (it means rising hollowness, which comes
from the whole, not from "upward" alone). The gesture, in other words, is not composed
out of parts: The parts are composed out of it. Also, the gesture is more a unified whole
than just the combination of up and through. I have tried to convey this unity with the
expression “rising hollowness” but whatever the phrase the gesture has interiority, entity,
and upward motion in one, undecomposable symbolic form. The gesture synthesized
ideas that required separation in speech – the figure in motion, the direction, and the
idea of interiority were unified in the gesture while the same ideas were distributed
into “he”, “goes up” and “thróugh” in speech.
Fig. 8.2: Illustrating gesture combination. “Down the pipe” at different levels of detail.
Left panel, the speaker’s initial description, one hand showing Tweety’s “hand” shaped over the bowl-
ing ball as it is thrust into a drainpipe; the downward thrust occurred three times:
Speaker: [and throws a bowling ball] [down in the*] [the thing]
Right panel, the second description after the listener requested clarification, an elaborated gesture
together with an expanded verbal description (the downward thrust now occurring six times):
Listener: where does he throw the bowling ball?
Speaker: [it’s one of those*] [the] [gu][tter pipes] [an’ he t][hrows the ball into the top]
In the right panel, the left hand adds pictorial detail but the value is intrinsic to the imagery and does
not arise from the left hand-right hand combination, unlike the way that a direct object arises when
a noun is combined with a transitive verb in a verb phrase.
4. Context
Another source of dynamic dimension change is the inclusion of the immediate context
of speaking; it is not that a growth point “consults” the context or the context sets para-
meters for it; the growth point incorporates it. Growth point and context are dependent
on each other; they are mutually constitutive. These considerations are subsumed by the
Vygotskian concept of a psychological predicate.
A psychological predicate:
– marks a significant departure in the immediate context; and
– implies this context as a background.
Regarding the growth point as a psychological predicate suggests that the mechanism
of growth point formation is differentiation of a focus from a background. Such
differentiation is validated by the very close temporal connection of gesture strokes
with the peaks of acoustic output in speech. Shuichi Nobe (1996) has documented
this connection instrumentally:
The robust synchrony between gesture strokes and the peaks of acoustic aspects suggests
that the information the gesture stroke carries has an intrinsic relationship with the accom-
panying speech information prominently pronounced with these peaks. The manifestation
of the salient information seems to be realized through the synchronization of these two
modalities (Nobe 1996: 35).
I use the terms field of oppositions and significant (newsworthy) contrast to refer to the
constructed background and the differentiation of psychological predicates (growth
points) within it. All of this is meant to be a dynamic, continuously updated process
in which new fields of oppositions are formed and new growth points or psychological
predicates are differentiated in ongoing cycles of thinking for speaking.
which he clambers up on the outside, but the effort fails. Undaunted, he immediately
tries a second ascent, now on the inside in a kind of stealth approach, as we have
seen. This second ascent was crowned (literally) by the bowling ball episode.
The natural experiment is the following: Describing the first attempt, the field of
oppositions would have been something like WAYS OF USING A DRAINPIPE
(since this ascent was the first mention of the pipe), and the psychological predicate
something like CLIMB IT. With the second attempt, climbing the pipe is no longer
newsworthy. It is background, and the field of oppositions would be updated, something
like: WAYS OF CLIMBING A DRAINPIPE. Now interiority would be newsworthy:
ON THE INSIDE.
If a speaker recalls both attempts, in the correct outside-inside order, the psycholog-
ical predicate relating to the second attempt should focus on interiority. This follows
from the psychological predicate concept; in the updated field of oppositions, interiority
is the newsworthy feature.
However, if a speaker recalls only the inside attempt and fails to recall the outside
attempt, or recalls both attempts but reverses their order, interiority should not be
newsworthy; it should not be a feature of the psychological predicate, even though
the speaker has perceptually registered that Sylvester did climb the pipe on the inside.
This is so because interiority does not contrast with exteriority in an inside-only or
inside-outside context. The field of oppositions would only be about climbing and the
interiority, even when registered, would be non-differentiating and not form a psycho-
logical predicate (no one has ever recalled only the outside attempt but it would also be
expected to highlight only ascent). Susan Duncan (pers. comm.) discovered the logic of
this natural experiment. It is now the basis of a designed experiment she and Dan Loehr
are conducting, comparing gestures after participants have watched the standard
outside-inside order to the gestures of different participants who have watched it in
the reverse inside-outside order. They find interiority in gesture for the outside-inside
order but not for the inside-outside order, showing powerfully that interiority, while perceptually present, is not significant when the inside attempt comes first.
[Figure panel annotation: Right thumb (see arrow) lifts for ascent. No interiority. Den. continued with "Tweety drops a bowling ball down the drainpipe", again showing that the internal ascent had been registered.]
Fig. 8.4: “Inside” gestures after “outside” all highlight the newsworthy interiority feature.
Fig. 8.5: Highlighting “pipe” (not interiority) when the contrast is to the ground element
(misremembering the first ascent as on a ladder).
Thus the natural experiment shows that the growth point, as a psychological predicate,
captures exactly what is newsworthy in the immediate field of oppositions and ignores
the same features when, through narrative mischance, they are not contrastive.
“<uuhh> let’s see the next time / is he tries to* <uh> /tries to cliimb / [up / in / through /
the] [drain* /”
Fig. 8.6: Timing shift to highlight interiority.
Fig. 8.7: Catchments in the bowling ball and its aftermath description.
The catchments for (1) through (9) appear in hand use – right hand or left; one hand
or two; and, when two hands, same or different hand shape and/or hand position (other
catchments could be present). Each of the gesture features embodies a certain thematic
content and this content is what motivates it: C1 is about Sylvester as a moving entity
and its recurring gesture feature is a single moving hand; C2 is about the bowling ball
and what it does, and its recurring feature is a rounded shape (in gesture transcription
terms, “2 similar hands”); C3 is about the relative positions of Sylvester and bowling ball
in the drainpipe and its recurring feature involves the two hands in the appropriate
spatial configuration (“2 different hands”). The “it down” growth point was part of
the symmetrical 2-handed C2 (several later gestures were also part of it).
The occurrence of (2) in the symmetrical catchment shows that one of the factors
comprising its field of oppositions at this point was the various guises in which the bowl-
ing ball appeared in its role as an antagonist. The significant contrast in the example was
its downward motion. Because of the field of oppositions at this point, this downward
motion had significance as an antagonistic force against Sylvester. We can write this
field of oppositions and growth point, respectively, as:
This was the context and contrast respectively. Thus, the “it down” growth point,
unlikely though it may seem as a unit from a grammatical point of view, was the cog-
nitive core of the utterance in (2) – the “it” indexing the bowling ball, and the
“down” indexing the significant contrast itself in the field of oppositions.
6. Unpacking
The final contribution to the dynamic dimension is called here “unpacking”. In the fol-
lowing, it should be evident that we are not trying to derive linguistic structures from
imagery or any other underlying psychological mechanisms. Such would undercut the
entire dialectic concept. Growth point instability is resolved by unpacking when it is
cradled in one or more syntactic constructions. The effect is to stabilize the cognition
of the growth point. A construction is a resting point par excellence (Goldberg 1995).
How a construction is selected is the source of dynamism. Since the construction will
have its own context and add its own significances, selecting it involves the generation of
further meanings. The growth point and the unpacking can occur simultaneously but
they are functionally different – the core meaning in the growth point, a peripheral
meaning that presents this core in the unpacking.
The first context of “it down” we have already analyzed; it was the C2 theme in
which the bowling ball was an antagonistic force. A second context can be seen in
the fact that the two-handed gesture at (2) also contrasted with C1 – the preceding
one-handed gesture in (1) depicting Sylvester as a solo force. (1) and (2) comprised a par-
adigm of opposed forces: Sylvester versus Tweety (via the bowling ball). This contrast led
to the other parts of the utterance in (2) through a partial repetition of the utterance
structure of (1), a poetic framework within which the new contrasts were formed
(see Jakobson 1960). Contrasting verbal elements appeared in close to equivalent slots
(the match is as close as possible given that the verb in (2) is transitive while that in
(1) is intransitive):
7. Note on consciousness
Consciousness was once the central topic of psychology but it was swept away by behav-
iorist puritanism. Interest in it is returning after a long exile, and the growth point offers
its own perspective. Wundt ([1900] 1973) a century ago described the “sentence” as a
dynamic psychological phenomenon:
From a psychological point of view, the sentence is both a simultaneous and a sequential
structure. It is simultaneous because at each moment it is present in consciousness as a
totality even though the individual subordinate elements may occasionally disappear
from it. It is sequential because the configuration changes from moment to moment in
its cognitive condition as individual constituents move into the focus of attention and
out again one after another. (Translation by Blumenthal 1970: 21, a passage cited by
Zenzi Griffin)
Wundt’s insight is that two simultaneous phenomena occur as the “psychological struc-
ture” of the sentence, something that exists all at once and something else, the same
meaning but in another form, that is successive. Wundt had in effect noticed in these
wavering levels of consciousness the growth point and its unpacking. Growth points sur-
face in the unpacking construction as brief dynamic pulses (to use Susan Duncan’s 2006
term). The dialectic is the focus of this rhythmically situated point in the flow of speech
action (called the “L-center” in McNeill 2003). The unpacking is the sequential frame
housing it. The unpacked growth point and its construction are aligned structures that
laminate consciousness and are experienced in the dual way that Wundt described.
8. Conclusions
Growth points are the brief dynamic pulses wherein idea units take form. Housing op-
posed semiotic modes, they are intrinsically unstable and stability is sought in the form
of unpacking into grammatical constructions, which offer static dimension stability par
excellence. The sources of utterance dynamics sketched here are:
9. References
Blumenthal, Arthur (ed. and trans.) 1970. Language and Psychology: Historical Aspects of Psycho-
linguistics. New York: John Wiley & Sons.
Chase, William G. and K. Anders Ericsson 1981. Skilled memory. In: John Robert Anderson (ed.),
Cognitive Skills and Their Acquisition, 227–249. Hillsdale, NJ: Erlbaum.
Duncan, Susan 2006. Co-expressivity of speech and gesture: Manner of motion in Spanish,
English, and Chinese. In: Proceedings of the 27th Berkeley Linguistic Society Annual Meeting,
353–370. [Meeting in 2001.] Berkeley, CA: Berkeley Linguistics Society.
Duranti, Alessandro and Charles Goodwin 1992. Rethinking context: An introduction. In: Ales-
sandro Duranti and Charles Goodwin (eds.), Rethinking Context: Language as an Interactive
Phenomenon, 1–42. Cambridge: Cambridge University Press.
Feyereisen, Pierre volume 2. Gesture and the neuropsychology of language. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.),
Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin:
De Gruyter Mouton.
Furuyama, Nobuhiro and Kazuki Sekine 2007. Forgetful or strategic? The mystery of the system-
atic avoidance of reference in the cartoon story narrative. In: Susan D. Duncan, Justine Cassell
and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language, 75–82. Amster-
dam: John Benjamins.
Goldberg, Adele 1995. Constructions: A Construction Approach to Argument Structure. Chicago:
University of Chicago Press.
Harris, Roy 2002. Why words really do not stay still. Times Literary Supplement 5182. 26 July 2002.
Harris, Roy 2003. Saussure and His Interpreters, 2nd edition. Edinburgh: Edinburgh University Press.
Jakobson, Roman 1960. Closing statement: Linguistics and poetics. In: Thomas A. Sebeok (ed.),
Style in Language, 350–377. Cambridge: Massachusetts Institute of Technology Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2003. Aspects of aspect. Gesture 3: 1–17.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking for speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
McNeill, David, Susan D. Duncan, Amy Franklin, James Goss, Irene Kimbara, Fey Parrill, Ha-
leema Welji, Lei Chen, Mary Harper, Francis Quek, Travis Rose, and Ronald Tuttle 2009.
Mind merging. In: Ezequiel Morsella (ed.), Expressing Oneself / Expressing One’s Self: Com-
munication, Language, Cognition, and Identity, 143–164. London: Taylor and Francis.
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. Translated by C. Smith. London:
Routledge.
Nobe, Shuichi 1996. Representational gestures, cognitive rhythms, and acoustic aspects of speech:
A network/threshold model of gesture production. Ph.D. dissertation, University of Chicago.
Parrill, Fey 2008. Subjects in the hands of speakers: An experimental study of syntactic subject and
speech-gesture integration. Cognitive Linguistics 19(2): 283–299.
Quaeghebeur, Liesbet, Susan D. Duncan, Shaun Gallagher, Jonathan Cole and David McNeill
volume 2. Aproprioception, gesture, and cognitive being. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton
Quaeghebeur, Liesbet and Peter Reynaert 2010. Does the need for linguistic expression constitute
a problem to be solved? Phenomenology and the Cognitive Sciences 9(1): 15–36.
Saussure, Ferdinand de 1966. Course in General Linguistics. Edited by Charles Bally and Albert
Sechehaye, in collaboration with Albert Riedlinger; translated by W. Baskin. New York:
McGraw-Hill. First published [1959].
Saussure, Ferdinand de 2002. Écrits de linguistique générale. Compiled and edited by Simon
Bouquet and Rudolf Engler. Paris: Gallimard.
Talmy, Leonard 2000. Toward a Cognitive Semantics. Cambridge: Massachusetts Institute of Tech-
nology Press.
Vygotsky, Lev S. 1987. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar (revised and edited by Alex Kozulin). Cambridge: Massachusetts Institute of
Technology Press.
Wundt, Wilhelm 1973. The Language of Gesture. The Hague: Mouton. First published [1900].
Zinchenko, Vladimir P. 1985. Vygotsky’s ideas about units for the analysis of mind. In: James V.
Wertsch (ed.), Culture, Communication, and Cognition: Vygotskian Perspectives, 94–118. Cam-
bridge: Cambridge University Press.
Abstract
Since the beginnings of psycholinguistics, gestures have been considered significant parts of the multimodal messages that are exchanged during social interactions. The first models proposed to explain the production and comprehension of these messages took the form of various cognitive architectures. These information-processing models explicitly distinguished
several components and processing levels, either modality-specific or “abstract”, i.e.
shared by different modalities as input to production and output of comprehension.
The assumption of such abstract conceptual representations underlying uses of speech
and gesture was challenged in two directions. In a dynamic perspective, the notion of
fixed conceptual, lexical or motor representations was criticized and meaning was sup-
posed to emerge from temporary interactions among sub-symbolic units connected in
multi-layered networks. In another, pragmatic direction, a distinction was made between
monologues and dialogues. Speech and gestures in conversations are joint actions that
require collaborative partners sharing a common ground. These various models have in-
spired experimental or quasi-experimental manipulations of several factors: the charac-
teristics of the speakers, of the addressees, of the situational context, of the speech
content, of the task demands. In this way, multiple routes were drawn to link up the
verbal, motor, and social cognitive subsystems that are involved in multimodal
communication.
any spoken language and therefore is called “mentalese” or the “language of thought”
(Fodor, Bever, and Garrett 1974: 374–377).
Such a language is based on abstract symbols, or concepts, which also code for motor
actions and for outcomes of perception. These symbols combine within propositions
(i.e. elementary logical predicate-argument structures such as “the Earth is round”).
By their abstract nature, conceptual representations can interface words, objects, and
actions. Computation assumes serial processing stages, top-down in production and
bottom-up in comprehension. In both cases, the operations take place within “mod-
ules,” i.e. autonomous components specialized for definite functions, such as syntactic
parsing or object recognition, at an intermediate level between the conceptual system
and the sensory-motor processes. Fodor, Bever, and Garrett’s (1974) proposal was thor-
oughly revised in further elaborations (e.g. the sentence-production models of Garrett
(1988) and Levelt (1989)) but some basic ideas remain. These include: the notion of
abstract conceptual representations at the beginning of the message production and
at the end of its comprehension; the notion of modularity; and the sequential timing
of information processing.
Fig. 9.1: Information processing architecture for production and comprehension of speech and gesture: a synthetic model. (As in Levelt (1989), boxes represent processing components and ellipses represent knowledge stores. The figure links a central conceptualization component, driven by the communicative intention, to two knowledge stores (an action repertoire of hierarchical schemas and a lexicon of lemmas and forms) and to the processing components: gesture planning and execution, and grammatical and phonological encoding followed by articulation, on the production side; gesture recognition with visual processing, and sentence parsing, word recognition, and phonological analysis with auditory processing, on the comprehension side.)
than did abstract words. Following Barsalou (1999), they assumed that word meaning re-
activated sensory-motor states (“perceptual symbols”) that differ from the conceptual
symbols assumed by Fodor, Bever, and Garrett (1974).
Two of Krauss et al.’s main claims – that lexical gestures help word retrieval and play
only a minor role in communication – remain disputed on the basis of available empir-
ical evidence, but they have had a strong impact on experimental investigation of ges-
ture processing (Feyereisen 2006; see also Goldin-Meadow this volume; Hostetter
2011). de Ruiter (2000, 2007) proposed an alternative model he successively called
the Sketch Model and the Postcard Architecture. In this conception, thoughts and com-
municative intentions are expressed in separate and parallel channels like the two sides
of a postcard. This model is consistent with the views developed by Kendon (2004; see
also Hadar this volume). Kendon carefully described cases of mutual adaptations in the
use of speech and gestures: sometimes, execution of a movement is interrupted by a
holding phase to fit the structure of the spoken expression, whereas, on other occasions,
speech is interrupted by silent or filled pauses to accommodate the requirements of
gestural expression. In the Postcard Architecture speakers are assumed to conceive a pre-
liminary message during the conceptualization phase and to select different communica-
tive forms. Accordingly, information that is not formulated in speech may be conveyed in
gesture or vice versa, depending on the relevance and availability of the chosen vehicle. In
support of this model, Melinger and Levelt (2004) found that the choice of performing
gestures or keeping the hands immobile influenced the lexical choices used to describe
spatial layouts. By analogy with a lexicon, de Ruiter (2000) also assumed the existence of
a “Gestuary,” a store of gesture templates. This is consistent with the notion of gesture
families proposed by Kendon (1995, 2004).
The Interface Hypothesis was initially proposed by Kita and Özyürek (2003) on the
basis of cross-linguistic evidence and then supported by experimental findings (see also
Kita 2009; Kita et al. 2007). Languages such as English, Japanese, and Turkish do not
express motion events in similar ways. For instance, the verb “to swing” has no equiv-
alent in Turkish and must be translated by a paraphrase. Kita and Özyürek (2003) found
similar differences in the gesture modality. English speakers described the swing move-
ment much more often with an arc trajectory than with a straight trajectory, whereas
Japanese and Turkish speakers used both kinds of gestures at similar rates. Accordingly,
the Interface Hypothesis consists of splitting the conceptualization stage into several
components and creating an intermediate level in which a Message Generator informs
verbal formulation and an Action Generator informs motor control. In this model, bi-
directional links between these two Generators represent reciprocal adaptations of ges-
ture forms to speech contents and vice versa. Interactions are located at this level and
not in the more abstract conceptualization stage, as in de Ruiter’s (2000) model. Feed-
back from formulation to message generation also allows cross-linguistic differences in
the way motion events are represented.
Kita (2000) and Goldin-Meadow (2003) both considered an alternative to the facil-
itation hypothesis of Krauss and his co-workers. They suggested that gesture production
can help conceptualization rather than formulation. According to Kita’s (2000) Infor-
mation Packaging hypothesis, “the production of representational gestures helps speak-
ers organize rich spatio-motoric information into packages suitable for speaking” (Kita
2000: 163). Experimental findings support this hypothesis: gesture rates are higher in conceptually demanding tasks than in tasks whose difficulty lies mainly in accessing infrequent words (Hostetter, Alibali, and Kita 2007; Melinger and Kita 2007).
Goldin-Meadow (2003) has been particularly concerned with mismatches between ges-
tures and speech during cognitive development. At some ages, in transition between
two stages, children express one idea verbally and a different idea in their gestures,
suggesting that gestures may help them to learn a new problem solving strategy.
Goldin-Meadow (2003) also hypothesized that gestures facilitate cognitive processing
by alleviating the load on working memory. In working-memory tasks, recall scores for verbal and visuo-spatial materials are higher when the participants' discourse in the interval between presentation and recall is accompanied by gestures than when it is not (see, e.g. Wagner, Nusbaum, and Goldin-Meadow 2004). Goldin-Meadow assumed that the interposed verbal task was easier (or less disturbing) when its expression was assisted by gestures. Her views
have not been presented through box-and-arrow diagrams, but they are relevant
to the issue of production architecture, and de Ruiter (2007) included this model
under the rubric of Window Architecture to express the idea that gesture is a window
into the mind.
Most of these discussions concern the production of representational gestures within the framework of Levelt's (1989) model. Much less is known about the control mechan-
isms of other kinds of movements such as beat gestures in relation to speech rhythm and
variations of intonation (but see Krahmer and Swerts 2007; McClave 1994, 1998), head
movements and gaze orientation (e.g., McClave 2000), and use of facial gestures in
conversation (Bavelas and Chovil 2006). Likewise, on the comprehension side, the pro-
cesses underlying the integration of auditory and visual information have not been
made explicit (for an information-processing model of the listener, see Cutler and
Clifton 1999). Moreover, there are controversies about the reception stage at which
the gesture and speech comprehension systems interact. However, in various other di-
rections, developments in cognitive sciences have led to profound changes in the use of
processing metaphors for the mind.
over time (Smith and Samuelson 2003). These changes can be modelled by mathemat-
ical equations that enable the present state to be predicted from knowledge of previous
states of the system. These equations also allow predictions to be made about the results
of experiments in which critical parameters (e.g. cue strength and timing) are manipu-
lated. In dynamic accounts, behaviour does not rely on fixed representations but
emerges from embodied interactions in the context of a particular task (Barsalou,
Breazal, and Smith 2007; Smith 2005).
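As a purely illustrative sketch of this idea, the short Python fragment below iterates a single difference equation in which the next state is computed from the previous state and the current input cue. The decay and cue-strength parameters are arbitrary values chosen for the example and are not taken from any of the models cited above.

def step(state, cue, decay=0.8, cue_strength=0.5):
    # One discrete-time update: the previous activation decays and is
    # driven towards the current cue.
    return decay * state + cue_strength * cue

def simulate(cues, initial_state=0.0):
    # Each state depends only on the previous state and the current input,
    # so the whole trajectory is predictable from the initial condition
    # and the parameters, as in the dynamic accounts described above.
    states = [initial_state]
    for cue in cues:
        states.append(step(states[-1], cue))
    return states

print([round(s, 3) for s in simulate([1, 1, 0, 0, 1])])
# -> [0.0, 0.5, 0.9, 0.72, 0.576, 0.961]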
This anti-representationalist claim and non-modular view are similar to another
theory of the interaction of gesture and speech developed on different grounds by
McNeill (1987, 1992, 2000, 2005, this volume). Like dynamicists, McNeill deeply dis-
agreed with an information processing approach to gesture. According to him, there
is neither a language of thought nor a mental image serving as an input to the machin-
ery that translates the message into a spoken and bodily output. Thought is not defined
by its content, but as a process coming into existence through a progressive develop-
ment of grammatical and imagistic forms. To describe the formulation of the utterance,
McNeill used the term “unpacking”, which suggests the segmentation of a whole and a contrast between the communicative event and its context. A central concept in this theory
is the notion of “growth point,” i.e. the minimal psychological unit of inner speech from
which gestures and speech arise. This psychological predicate is a dynamic unit (a tem-
porary activation) in two senses. First, it results from the instability of an idea, which
can be conveyed in various modes, images and linguistic forms. Second, it emerges
from a contrast with the context, i.e. the inherent background from which the psycho-
logical predicate is differentiated. McNeill’s two main criticisms of the information-
processing models are the separation of language from imagery (as in the dual-code
hypothesis) and the use of context as an external parameter, outside the model. This
complex and original theoretical work also differs from connectionist models of
language processing, which represent input as a layer of feature nodes and do not incorporate the imagery-language dialectic as a major determinant of gesture and speech
production.
Unlike other dynamical systems approaches, McNeill’s theory is formulated in ver-
bal and not mathematical terms. However, it has inspired some computer scientists
who try to simulate human behaviour through an artificial conversational agent able
to convey information by means of speech and gesture (Sowa et al. 2008). Previous at-
tempts had been based on information processing models such as Kita and Özyürek’s
(2003) Interface Model (see e.g. Kopp, Bergmann, and Wachsmuth 2008) with separate
modules for visuo-spatial imagery and sentence formulation. In the revised model,
gestures are not produced by combining pre-defined features but are learned from
interactions with the context. Yet, this computational model did not capture the
imagery-linguistic duality assumed in the notion of a growth point.
Another source of disagreement among researchers concerns the relationship
between co-verbal gestures and action. According to McNeill (2005), “gestures arise
from the process of thinking and speaking and can arise separately (in part) from the
brain systems controlling instrumental actions” (McNeill 2005: 234). In a different
research tradition, however, hand and mouth movements are closely tied from birth to mechanisms involved in manipulation and feeding rather than in communication
(see e.g., Iverson and Thelen 1999). Synchrony of gestures and speech emerges from
this primitive coupling of vocal and manual repetitive movements. Connections exist
Bavelas and Chovil (2006) also assumed that the uses of language and gesture differ
in monologues and dialogues. Some gestures (called interactive gestures) integrate the
conversational partner into the process of message production and are absent in indi-
vidual narrative recall (see also Bavelas et al. 1992). These gestures refer to the inter-
locutor or to previous exchanges (through phrases such as “as you know” or “as we
said”) rather than to external elements (as the so-called topic gestures do). Bavelas
et al. (2008) designed an experiment to investigate the separate contributions of visi-
bility and dialogue by comparing the description of a picture of an eighteenth century
dress, out of sight of the addressee, in three conditions: face-to-face, by telephone, and
to a tape recorder. Analyses of speech and gestures revealed the effects of the two
manipulated factors. Gestures were more frequent in the dialogues (either face-to-face
or by telephone) than in monologues, but differed in form and content depending on
visibility conditions. Their amplitude was greater in face-to-face interaction (for
instance, by depicting the size of the dress with reference to the speaker’s own
body) and gestures were more often redundant with speech in the telephone condi-
tion. As might be expected, deictic expressions and pointing gestures were more fre-
quent when the addressee was visible. These differences related to the distinction
made by Clark (1996) between the three ways of communicating: describing by
means of symbols, indicating by pointing and deictic expressions, and demonstrating
through action.
In conclusion, psycholinguistic research is fuelled by several controversial topics
such as domain-specific modularity, format of thought (abstract, language-like, or sen-
sory-motor representations), directions of information flow (top-down and/or bottom-up), and communication mechanisms. In most cases, these conceptions are not really antagonistic, and hybrid models can be elaborated.
3. Empirical issues
Psycholinguistics is also an empirical discipline, and the various models can be tested
through experimental and quasi-experimental manipulation of several factors.
– The speakers. Individual differences in the use of bodily signals are massive but still
poorly explained. Other differences may relate to age, language proficiency, bilingu-
alism, or capacity to generate mental images (see, e.g. Feyereisen and Havard 1999;
Hostetter and Alibali 2007; Nicoladis 2007).
– The addressees. The knowledge that the speakers can assume in their partners plays
an important role in the use of gestures. Jacobs and Garnham (2007), for instance, compared the number of representational gestures in successive narratives in which speakers told either three different stories or the same story repeatedly, to either three different listeners or the same listener. Across trials, gesture frequency declined in the same/same condition but remained relatively high in the same/different and different/different conditions.
These results are inconsistent with the view that gestures are performed to facilitate
lexical retrieval rather than to influence the listener. There is now growing evidence
that common ground shared between the speaker and the listener influences the use
of gesture (e.g. Gerwing and Bavelas 2004).
– The situational context. Bavelas et al. (2008) have shown that the presence and
visibility of a listener influence the use of gesture through attribution of states of
4. Conclusion
The various psycholinguistic architectures suggested to account for the use of language
and bodily movements in communication have in common the idea that exchange of
information involves multiple levels of analyses and can take several routes. There
are no reflex-like connections between the ideas (intentions) and their expressions (signals), as is implied by the “mind-in-the-mouth assumption”, as Bock (1996) called it. Instead, making decisions about the best way to reach a goal or about the best inter-
pretation of the situation involves complex computations that integrate factors of var-
ious kinds. Bodily communication is at the intersection of at least three domains: social
cognition, language use, and sensory-motor interactions with the environment. It is no
surprise that it can be seen from multiple perspectives.
5. References
Alibali, Martha W. 2005. Gesture in spatial cognition: Expressing, communicating and thinking
about spatial information. Spatial Cognition and Computation 5: 307–331.
Bangerter, Adrian 2004. Using pointing and describing to achieve joint focus of attention in
dialogue. Psychological Science 15: 415–419.
Barsalou, Lawrence W. 1999. Perceptual symbol systems. Behavioural and Brain Sciences 22:
577–660.
Barsalou, Lawrence W. 2008. Grounded cognition. Annual Review of Psychology 59: 617–645.
Barsalou, Lawrence W., Cynthia Breazal and Linda B. Smith 2007. Cognition as coordinated non–
cognition. Cognitive Processes 8: 79–91.
Bavelas, Janet B. and Nicole Chovil 2006. Nonverbal and verbal communication: Hand gestures
and facial displays as part of language use in face-to-face dialogue. In: Valerie L. Manusov
and Miles L. Patterson (eds.), The Sage Handbook of Nonverbal Communication, 97–115.
Thousand Oaks, CA: Sage.
Bavelas, Janet B., Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gestures.
Discourse Processes 15: 469–489.
Bavelas, Janet, Jennifer Gerwing, Chantelle Sutton and Danielle Prevost 2008. Gesturing on the
telephone: Independent effects of dialogue and visibility. Journal of Memory and Language
58: 495–520.
Beattie, Geoffrey and Heather Shovelton 1999. Mapping the range of information contained in
the iconic hand gestures that accompany spontaneous speech. Journal of Language and Social
Psychology 18: 438–462.
Bernardis, Paolo and Maurizio Gentilucci 2006. Speech and gesture share the same communica-
tion system. Neuropsychologia 44: 178–190.
Bock, Kathryn 1996. Language production: Methods and methodologies. Psychonomic Bulletin &
Review 3: 395–421.
Christiansen, Morten H. and Nick Chater 2001. Connectionist psycholinguistics: Capturing the
empirical data. Trends in Cognitive Sciences 5: 82–88.
Clark, Herbert H. 1996. Using Language. New York: Cambridge University Press.
Clark, Herbert H. 1997. Dogmas of understanding. Discourse Processes 23: 567–598.
Clark, Herbert H. and Meredyth A. Krych 2004. Speaking while monitoring addressees for under-
standing. Journal of Memory and Language 50: 62–81.
Cutler, Anne and Charles Clifton Jr. 1999. Comprehending spoken language: A blueprint of the
listener. In: Colin M. Brown and Peter Hagoort (eds.), The Neurocognition of Language, 123–
166. New York: Oxford University Press.
de Ruiter, Jan P. 2000. The production of gesture and speech. In: David McNeill (ed.), Language
and Gesture, 284–311. Cambridge: Cambridge University Press.
de Ruiter, Jan P. 2007. Postcards from the mind: The relationship between speech, imagistic ges-
ture, and thought. Gesture 7: 21–38.
Elman, Jeffrey L. 1995. Language as a dynamic system. In: Robert F. Port and Timothy Van Gelder
(eds.), Mind as Motion, 195–225. Cambridge, MA: Massachusetts Institute of Technology Press.
Feyereisen, Pierre 1997. The competition between gesture and speech production in dual-task
paradigms. Journal of Memory and Language 36: 13–33.
Feyereisen, Pierre 2006. How could gesture facilitate lexical access? Advances in Speech-Lan-
guage Pathology 8: 128–133.
Feyereisen, Pierre 2007. How do gesture and speech production synchronise? Current Psychology
Letters: Behaviour, Brain and Cognition 22(2). Published online on 9 July 2007. URL: http://
cpl.revues.org/document1561.html.
Feyereisen, Pierre and Isabelle Havard 1999. Mental imagery and production of hand gestures
during speech by younger and older adults. Journal of Nonverbal Behavior 23: 153–171.
Fodor, Jerry A., Thomas C. Bever and Merrill F. Garrett 1974. The Psychology of Language:
Introduction to Psycholinguistics and Generative Grammar. New York: McGraw Hill.
Garrett, Merrill F. 1988. Processes in language production. In: Frederick J. Newmeyer (ed.),
Linguistics: The Cambridge Survey – III. Language: Psychological and Biological Aspects,
69–96. Cambridge: Cambridge University Press.
Gentilucci, Maurizio and Riccardo Dalla Volta 2008. Spoken language and arm gestures are con-
trolled by the same motor control system. Quarterly Journal of Experimental Psychology 61:
944–957.
Gerwing, Jennifer and Janet Bavelas 2004. Linguistic influences on gesture’s form. Gesture 4:
157–195.
Goldin-Meadow, Susan 2003. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA:
Belknap Press of Harvard University Press.
Goldin-Meadow, Susan this volume. How our gestures help us learn. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter
Mouton.
Hadar, Uri this volume. Coverbal gestures: Between communication and speech production. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Hostetter, Autumn B. 2011. When do gestures communicate? A meta-analysis. Psychological Bul-
letin 137: 297–315.
Hostetter, Autumn B. and Martha W. Alibali 2007. Raise your hand if you’re spatial: Relations
between verbal and spatial skills and gesture production. Gesture 7: 73–95.
Hostetter, Autumn B. and Martha W. Alibali 2008. Visible embodiment: Gestures as simulated
action. Psychonomic Bulletin & Review 15: 495–514.
Hostetter, Autumn B. and Martha W. Alibali 2010. Language, gesture, action! A test of the Ges-
ture as Simulated Action framework. Journal of Memory and Language 63: 245–257.
Hostetter, Autumn B., Martha W. Alibali and Sotaro Kita 2007. I see it in my hands’ eye: Represen-
tational gestures reflect conceptual demands. Language and Cognitive Processes 22: 313–336.
Iverson, Jana M. and Esther Thelen 1999. Hand, mouth and brain: The dynamic emergence of
speech and gesture. Journal of Consciousness Studies 6: 19–40.
Jacobs, Naomi and Alan Garnham 2007. The role of conversational hand gestures in a narrative
task. Journal of Memory and Language 56: 291–303.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kimbara, Irene 2008. Gesture form convergence in joint description. Journal of Nonverbal Behav-
ior 32: 123–131.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Language and Gesture, 162–185. Cambridge: Cambridge University Press.
Kita, Sotaro 2009. A model of speech-gesture production. In: Ezequiel Morsella (ed.), Expressing
One Self/Expressing One’s Self: Communication, Cognition, Language, and Identity, 9–22. Lon-
don: Taylor & Francis.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Kita, Sotaro, Asli Özyürek, Shanley Allen, Amanda Brown, Reyhan Furman and Tomoko Ishi-
zuka 2007. Relations between syntactic encoding and co-speech gestures: Implications for a
model of speech and gesture production. Language and Cognitive Processes 22: 1215–1236.
Kopp, Stefan, Kirsten Bergmann and Ipke Wachsmuth 2008. Multimodal communication from
multimodal thinking: Towards an integrated model of speech and gesture production. Interna-
tional Journal of Semantic Computing 2: 115–136.
Krahmer, Emiel and Marc Swerts 2007. The effects of visual beats on prosodic prominence:
Acoustic analyses, auditory perception and visual perception. Journal of Memory and
Language 57: 396–414.
Krauss, Robert M., Yihsiu Chen and Purmina Chawla 1996. Nonverbal behaviour and nonverbal
communication: What do conversational hand gestures tell us? In: Mark P. Zanna (ed.), Ad-
vances in Experimental Social Psychology, Volume 28, 389–450. San Diego, CA: Academic Press.
Krauss, Robert, Yihsiu Chen and Rebecca F. Gottesman 2000. Lexical gestures and lexical access:
A process model. In: David McNeill (ed.), Language and Gesture, 261–283. Cambridge:
Cambridge University Press.
Krauss, Robert M. and Susan R. Fussell 1996. Social psychological models of interpersonal com-
munication. In: E. Tony Higgins and Arie W. Kruglanski (eds.), Social Psychology: Handbook
of Basic Principles, 655–701. New York: Guilford.
Krauss, Robert M. and Uri Hadar 1999. The role of speech-related arm/hand gestures in word
retrieval. In: Lynn S. Messing and Ruth Campbell (eds.), Gesture, Speech, and Sign, 93–116.
New York: Oxford University Press.
Langton, Stephen R.H. and Vicky Bruce 2000. You must see the point: Automatic processing of
cues to the direction of social attention. Journal of Experimental Psychology: Human Percep-
tion and Performance 26: 747–757.
Levelt, Willem J.M. 1989. Speaking: From Intention to Articulation. Cambridge, MA: Massachu-
setts Institute of Technology Press.
McClave, Evelyn 1994. Gestural beats: The rhythm hypothesis. Journal of Psycholinguistic
Research 23: 45–66.
McClave, Evelyn 1998. Pitch and manual gestures. Journal of Psycholinguistic Research 27: 69–89.
McClave, Evelyn Z. 2000. Linguistic functions of head movements in the context of speech. Jour-
nal of Pragmatics 32: 855–878.
McNeill, David 1987. Psycholinguistics: A New Approach. New York: Harper and Row.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: Chicago
University Press.
McNeill, David 2000. Catchments and contexts: Non-modular factors in speech and gesture pro-
duction. In: David McNeill (ed.), Language and Gesture, 312–328. Cambridge: Cambridge
University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
McNeill, David this volume. The growth point hypothesis of language and gesture as a dynamic
and integrated system. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Melinger, Alissa and Sotaro Kita 2007. Conceptualisation load triggers gesture production.
Language and Cognitive Processes 22: 473–500.
Melinger, Alissa and Willem J. M. Levelt 2004. Gesture and the communicative intention of the
speaker. Gesture 4: 119–141.
Morsella, Ezequiel and Robert M. Krauss 2004. The role of gestures in spatial working memory
and speech. American Journal of Psychology 117: 411–424.
Morsella, Ezequiel and Robert M. Krauss 2005. Muscular activity in the arm during lexical retrieval:
Implications for gesture-speech theories. Journal of Psycholinguistic Research 34: 415–427.
Nicoladis, Elena 2007. The effect of bilingualism on the use of manual gestures. Applied Psycho-
linguistics 28: 441–454.
Pickering, Martin J. and Simon Garrod 2004. Toward a mechanistic psychology of dialogue.
Behavioral and Brain Sciences 27: 169–226.
Shannon, Claude E. and Warren Weaver 1949. The Mathematical Theory of Communication.
Urbana, IL: University of Illinois Press.
Smith, Linda B. 2005. Cognition as a dynamic system: Principles from embodiment. Developmen-
tal Review 25: 278–298.
Smith, Linda B. and Larissa K. Samuelson 2003. Different is good: Connectionism and dynamic
systems theory are complementary emergentist approaches to development. Developmental
Science 6(4): 434–439.
Sowa, Timo, Stefan Kopp, Susan Duncan, David McNeill and Ipke Wachsmuth 2008. Implement-
ing a non-modular theory of language production in an embodied conversational agent. In:
Ipke Wachsmuth, Manuela Lenzen and Guenther Knoblich (eds.), Embodied Communication
in Humans and Machines, 425–449. New York: Oxford University Press.
van Gelder, Tim and Robert F. Port 1995. It’s about time: An overview of the dynamical approach
to cognition. In: Robert F. Port and Tim van Gelder (eds.), Mind as Motion: Explorations in the
Dynamics of Cognition, 1–43. Cambridge, MA: Massachusetts Institute of Technology Press.
Wagner, Susan M., Howard Nusbaum and Susan Goldin-Meadow 2004. Probing the mental repre-
sentation of gesture: Is handwaving spatial? Journal of Memory and Language 50(4): 395–407.
Abstract
This chapter focuses on where in the brain – in terms of the right and left hemispheres –
co-speech gestures are generated. This question is not only of neurobiological relevance
but its investigation also provides an empirical basis to explore with what kind of cogni-
tive and emotional processes in the two hemispheres gesture generation may be asso-
ciated. First addressed are the methodological difficulties and the approaches currently
used to empirically investigate the question of hemispheric specialization in the produc-
tion of gestures. Second, an empirically grounded theory proposing a left hemispheric
generation of co-speech gestures is presented and critically discussed. The left hemisphere
proposition is contrasted by empirical data providing evidence for a right hemispheric
generation of gestures. The chapter concludes with a proposition on the distinct roles
of the right and left hemispheres in gesture production.
and Sperry 1967; Lausberg et al. 2003; Sperry 1968; Trope et al. 1987; Volpe et al. 1982).
As a result, the actions of the right and left hands reflect competence or incompetence
of the contralateral hemisphere. For example, since the left hemisphere is language dominant, these patients cannot execute verbal commands with the left hand, which is controlled by the right hemisphere (left hand apraxia). In contrast, the split-brain pa-
tient’s right hand performs worse than the left hand in copying figures (right hand con-
structional apraxia), because the right hand is disconnected from the right hemispheric
visuo-spatial competence (e.g. Bogen 1993; Lausberg et al. 2003). Therefore, studies on
spontaneous hand preferences in split-brain patients provide valuable information
about the neurobiological correlates of the production of different gesture types.
Likewise, in healthy subjects spontaneous hand preferences reflect the activation of
the contralateral hemisphere (Hampson and Kimura 1984; Verfaellie, Bowers, and
Heilman 1988). Hampson and Kimura (1984) observed in right-handed healthy subjects
a shift from right hand use in verbal tasks toward greater left hand use in spatial tasks.
They suggested that the problem-solving hemisphere preferentially uses the motor
pathways, which originate intrahemispherically. Consequently, the right hemisphere
that primarily solves the spatial task employs the contralateral left hand. Indeed, in
behavioral laterality experiments, when resources are sufficient for both decision and
response programming, there is an advantage to responding with the hand controlled
by the same hemisphere that performs the task (Zaidel et al. 1988). Along the same lines,
right-handers prefer the left hand for self-touch gestures (Lausberg, Sassenberg, and
Holle submitted). Self-touch gestures are displayed when individuals are stressed or
emotionally engaged (Freedman and Bucci 1981; Freedman et al. 1972; Freedman
and Hoffmann 1967; Freedman 1972; Lausberg 1995; Lausberg and Kryger 2011;
Sainsbury 1955; Sousa-Poza and Rohrberg 1977; Ulrich 1977; Ulrich and Harms
1985). Further, the right hemisphere is activated more than the left during emotionally
loaded or stressful situations (Ahern and Schwartz 1979; Berridge et al. 1999; Borod
et al. 1998; Grunwald and Weiss 2007; Killgore and Yurgelun-Todd 2007; Stalnaker,
Espana, and Berridge 2009). Thus, the left hand preference for self-touch reflects the
right hemispheric activation during emotional engagement. It is noteworthy that if
the body-focused activity includes the manipulation of body-attached objects, such as
playing with a necklace, there is a significant right-hand preference (Lausberg, Sassen-
berg, and Holle submitted). This concurs well with the left hemispheric dominance for
tool use (see 5. for detailed discussion).
However, in healthy subjects the intact corpus callosum enables each hemisphere, if required, to exert control over the ipsilateral hand. Therefore, in studies on healthy subjects, other factors that might require the use of a specific hand have to be ruled out. These are as follows:
(ii) a semantic purpose, such as when talking about the left or right of two objects
(Lausberg and Kita 2003);
(iii) cultural conventions, such as when Arrernte speakers in Central Australia use
the left hand to refer to targets that are on the left and vice versa (Wilkins and de
Ruiter 1999). Likewise, in explicit gesture production, right-handers make 68%
of reaches to left hemispace with the left hand (Bryden, Pryde, and Roy 2000);
(iv) an occupation of the right hand with some other physical activity, such as holding a
cup of coffee.
If these factors are controlled for, spontaneous hand preferences in empirical studies on healthy subjects are a good indicator of hemispheric specialization.
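Purely as an illustration of how such preferences can be quantified (the formula is the standard laterality quotient, not one introduced in this chapter), the brief Python sketch below computes LQ = (R − L)/(R + L) per gesture type; the gesture labels and counts are invented for the example.

def laterality_quotient(right_count, left_count):
    # Standard laterality quotient: -1 means exclusive left hand use,
    # +1 exclusive right hand use, 0 no preference.
    total = right_count + left_count
    return 0.0 if total == 0 else (right_count - left_count) / total

# Hypothetical counts of right- and left-hand executions per gesture type.
counts = {"pantomime": (18, 4), "baton": (9, 11), "self-touch": (5, 15)}
for gesture_type, (r, l) in counts.items():
    print("{}: LQ = {:+.2f}".format(gesture_type, laterality_quotient(r, l)))
# -> pantomime: LQ = +0.64, baton: LQ = -0.10, self-touch: LQ = -0.50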
the split-brain patients were unable to use the left hand on verbal command (left verbal
apraxia), they spontaneously preferred the left hand for communicative gestures. This
finding shows that spontaneous communicative gestures can be generated in the right hemisphere independently of left hemispheric speech production.
Hence, the question arises as to what determines the hand choice for spontaneous communicative gestures. Thus far, factors other than speech dominance or handedness have rarely been investigated systematically. In the following sections, I will argue that the hand choice for the different gesture types reflects the hemispheric lateralization of the different cognitive and emotional functions with which the generation of the specific gesture types is associated. The following review of empirical studies reveals distinct
patterns of hand preferences for the different gesture types.
in N.G. and U.H.. A context analysis was conducted only on U.H. (Lausberg, Davis, and
Rothenhäusler 2000). U.H.’s left hand pictorial gestures occurred in speech pauses and
reflected the ideational process (ideographics). In contrast, the right hand was used
exclusively for pictorial gestures, which matched the verbal utterance semantically
(physiographics) and temporally. U.H.’s unilateral shrugs of the left shoulder occurred
frequently and in a context of lack of knowledge and resignation, whereas the rare uni-
lateral shrugs of the right shoulder were performed when talking about the “right side”.
Furthermore, his right hand deictics only referred to the external space, whereas the left
hand deictics occurred when the patient referred to himself.
Analogous hand preferences for the specific gesture types are observed in healthy
subjects. Souza-Poza, Rohrberg and Mercure (1979) reported that in right-handers a
right hand preference was only significant for the representational gestures (includes
all of Efron’s types except for batons), but not for the nonrepresentational gestures
(Efron’s baton). Stephens (1983) found a significant right hand preference for iconics,
a non-significant right hand preference for metaphorics, as well as a non-significant
left hand preference for beats (Efron’s baton). In a study by Blonder et al. (1995), a
right-handed control group showed a trend towards more right hand use for symbolic
gestures (Efron’s emblematic), whereas the left hand was used more often for expres-
sive gestures (Efron’s baton). In Foundas et al.’s (1995) study, a right-handed control
group showed a significant right hand preference for content gestures (includes all of
Efron’s types except for batons and partly ideographic) and for emphasis gestures
(Efron’s baton) as well as a right hand trend for fillers (overlap with Efron’s ideo-
graphic). Kita, de Condappa and Mohr (2007) reported a significant right hand prefer-
ence for deictics (idem to Efron) and for depictive gestures (includes Efron’s
ideographic and physiographic), except for those depictive gestures that had a character
viewpoint in a metaphor condition. For deictics, a right hand preference has been re-
ported in healthy adults (Kita, de Condappa, and Mohr 2007; Wilkins and de Ruiter
1999) and in infants and toddlers (Bates et al. 1986; Vauclair and Imbault 2009). In a
recent study by Lausberg, Sassenberg and Holle (submitted), a distinct pattern of
hand preferences for different gesture types was found in 37 right-handed participants.
In order to collect a broad spectrum of data, the participants were examined in two dif-
ferent communicative settings, i.e., during narrations of everyday activities and during
semi-standardized interviews with personal topics. No hand preferences were found for
self-deictics, body-deictics, directions, iconographics, batons, back-tosses, palm-outs,
shrugs, and emblems. While there was a significant left hand preference for self-touch
(see 1.), a significant right-hand preference was found for pantomimes, positions (definition: the hand positions an imaginary object/subject at a specific location in an imaginary scene, which is projected into the gesture space), traces (definition: the hand traces an imaginary line or contour), deictics to external loci, kinetographs, and
body-attached object manipulation.
investigated (only the split-brain patient U.H. was a left-hander, but his left-handedness
did not substantially alter his pattern of hand preference for the different gesture types
as compared to the right-handed patients A.A. and N.G.). The current theoretical mod-
els of handedness as a multidimensional trait serve to explain the complementary func-
tions of the right and left hands during tool use and object manipulation, but they
provide little explanation for the hand choice in spontaneous gestures in communicative situations. The exception is that right-handers might prefer the right hand for those gesture types that require a high degree of fine motor coordination and modulation of speed or direction. However, the execution of the right hand preference ges-
ture types, such as deictics, pantomimes, positions, traces, kinetographs, or iconographs
(see 4.) does not require more fine motor coordination than the execution of the gesture
types with no hand preference, such as self-deictics, body-deictics, directions, ideo-
graphics, metaphorics, batons, back-tosses, palm-outs, shrugs, and emblems. In other
words, the fact that the right and left hands are used to execute the latter gesture
types is not due to kinesic simplicity. Thus, handedness cannot sufficiently explain the
hand preferences for the different types of hand gestures, which are spontaneously dis-
played in communicative situations. On this point, I partly agree with Kimura (1973a), who rejected handedness as an independent additive factor that could influence the hand choice for free movements and self-touch. However, I suggest that, especially in the case of kinesically complex gestures, handedness is a co-factor that influences the hand choice.
The following paragraphs focus on the relation between hand preferences for the dif-
ferent gesture types and hemispheric lateralization for different cognitive and emo-
tional functions, such as emotional processes, prosody, metaphorical thinking, and the
tool use competence. It is noteworthy that there is a group of gesture types, i.e., batons,
tosses, self-deictics, and shrugs, for which split-brain patients show a clear left hand pref-
erence and for which the right-handed healthy subjects show either a trend towards
more left hand use or no hand preference. This suggests that in right-handed healthy
subjects the effect of the right hemisphere generation of these gesture types, which
is strongly suggested by the split-brain data, is attenuated because the intact corpus
callosum potentially enables the right-handers to use the more dexterous right hand.
For batons, no hand preference or a trend towards more left hand use was found
(Blonder et al. 1995; Lausberg et al. 2007; Lausberg, Sassenberg, and Holle submitted;
McNeill 1992; Souza-Poza, Rohrberg, and Mercure 1979; Stephens 1983). The same
applies to back-tosses, which set rhythmical accents just like batons (Lausberg et al.
2007; Lausberg, Sassenberg, and Holle submitted). The exception was Foundas et al.
(1995), who reported a right-hand preference for emphasis gestures in 12 healthy sub-
jects. Thus, two assumptions concerning the neuropsychology of batons and tosses are put forward here:
(i) both hemispheres are equally competent to execute batons and tosses; or
(ii) the influence of right-handedness was attenuated by the right-hemispheric prosodic contribution to the generation of these gesture types.
Indeed, as batons and tosses emphasize prosody, it can be hypothesized that their pro-
duction is associated with the right hemispheric specialization for the production of
emotional prosody and a contribution to prosodic fundamental frequency (e.g. Schirmer
and in a context of lack of knowledge and resignation, whereas the rare unilateral
shrugs of the right shoulder were performed when talking about the “right side” (Laus-
berg, Davis, and Rothenhäusler 2000).
The significant right hand preference for pantomime gestures (Lausberg et al. 2007;
Lausberg, Sassenberg, and Holle submitted) is in line with lesion studies and functional
neuroimaging studies, which demonstrate that the left hemisphere plays a central role in
the generation of pantomime gestures on command in right-handers and left-handers.
Split-brain patients demonstrated a left hand callosal apraxia when pantomiming on
command to visual presentation of tools (Lausberg et al. 2003). Furthermore, patients
with left hemisphere damage were more impaired in pantomiming tool use on com-
mand than right hemisphere damaged patients (De Renzi, Faglioni, and Sorgato
1982; Hartmann et al. 2005; Liepmann and Maas 1907). Neuroimaging studies demon-
strated that independently of whether the right or left hand is used, pantomime is ac-
companied by left hemisphere activation (Choi et al. 2001; Hermsdörfer et al. 2007;
Johnson-Frey, Newman-Norlund, and Grafton 2005; Moll et al. 2000; Ohgami et al.
2004; Rumiati et al. 2004). Lausberg et al. (2003) suggested that the generation of
pantomime gestures relies on the specifically left hemispheric competence to link the
movement concept for tool use with the mental representation of the tool.
There are some reasons to assume a left hand preference for trace gestures, because visuo-constructive abilities are localized in the right hemisphere. Split-brain patients spontaneously choose the left hand for visuo-motor tasks (Graff-Radford, Welsh, and Godersky 1987; Lausberg et al. 2007; Sperry 1968) and they show better performance with the left hand than with the right in these kinds of tasks, e.g. when drawing the Taylor figure. Furthermore, in right-handed healthy subjects Hampson
and Kimura (1984) observed a shift from right hand use in verbal tasks toward greater
left hand use in spatial tasks. Likewise, for position gestures a right hemisphere advan-
tage could be assumed because the right hemisphere is specialized for the conceptu-
alization of imaginary scenes in the whole gesture space, while the left hemisphere
neglects the gesture space left of the subject’s body midline (Lausberg et al. 2003).
However, for both gesture types, trace and position, a significant right hand preference
was evidenced in the recent study by Lausberg, Sassenberg, and Holle (submitted).
With regard to gesture type phenomenology, it could be hypothesized that trace and
position gestures are derived from pantomime gestures. Trace gestures can be regarded as body-part-as-object pantomimes, i.e., as pantomiming “drawing” with the index finger used as if it were a pen (alternatively, in an evolutionary scenario it could be hypothesized that first the finger was used and then it was replaced by a pen [C. Müller, personal communication]). In the same vein, the position gesture could be the panto-
mime of placing something. Thus, trace and position gestures might originate from
pantomimes or even further from tool use or direct object manipulation. However,
in contrast to the actual pantomime gestures, in which the gesturer pretends to act,
e.g. “I am drawing” or “I am positioning”, the gestural message of trace and position ges-
tures focuses on the contour, which is created, e.g. “a square”, or on the position, which is
marked in an imaginary scene, e.g. “here is [the church], and behind it, there is [the super
market]”. The present data indicate that although the gestural information is primarily of a spatial nature, the origin of the gesture type, which is here the left hemispheric function of pantomiming, or in other words the “gestural mode of representation” (Müller 2001), overrides the impact of the right hemispheric spatial contribution.
The data comparison for pictorial gestures is complicated by the fact that researchers
here use quite different concepts (Efron: iconographics, kinetographics, ideographics;
McNeill: iconic, metaphoric). In general, for pictorial gestures, there is a significant
right hand preference (Foundas et al. 1995; McNeill 1992; Lausberg et al. 2007; Souza-
Poza, Rohrberg, and Mercure 1979; Stephens 1983). However, metaphoric use (Kita,
de Condappa, and Mohr 2007; Stephens 1983) and ideographic use (Lausberg et al.
2000; Lausberg et al. 2007) induce a shift toward more left hand use. This observation
concurs with the dominant role of the right hemisphere for the processing of
conventionalized metaphors (e.g. Ferstl et al. 2008; Mashal and Faust 2009).
6. Conclusion
The split-brain data provide evidence that spontaneous communicative gestures can be generated in the right hemisphere independently of left hemispheric speech production. Furthermore, split-brain patients as well as healthy subjects show distinct hand preferences for specific gesture types. While right-handers prefer the right hand for deictics, pantomimes, traces, positions, and concrete pictorial gestures, they prefer the left hand, or show no hand preference, for self-deictics, batons, tosses, shrugs, and ideographics/metaphorics. Neither handedness nor speech lateralization can explain the distinct pattern of
hand preferences. Instead, I argue that the hand preferences for the different gesture
types reflect the lateralization of cognitive and emotional functions in the left and
right hemispheres, which are associated with the production of these gesture types.
Some of the right hand preference gesture types are characterized by a close relation
to tool use, which is a primarily left hemisphere competence. In contrast, the left
hand preference gesture types are related to the right hemispheric competences for
prosody, emotional processes, and metaphorical thinking. The substantial right hemi-
spheric contribution to the generation of these gesture types attenuates the right-han-
ders’ right hand preference or even induces a left hand preference. Thus, I suggest that some
gestures, which are spontaneously displayed in communicative situations, directly
emerge from right hemispheric emotional processes, processes underlying prosody,
and metaphorical thinking.
7. References
Ahern, Geoffrey L. and Gary E. Schwartz 1979. Differential lateralization for positive versus neg-
ative emotion. Neuropsychologia 17: 693–698.
Bates, Elizabeth, Barbara O’Connel, Jyotsna Vaid, Paul Sledge and Lisa Oakes 1986. Language
and hand preference in early development. Developmental Neuropsychology 44: 178–190.
Berridge, Craig W., Elizabeth Mitton, William Clark and Robert H. Roth 1999. Engagement in a
non-escape (displacement) behavior elicits a selective and lateralized suppression of frontal
cortical dopaminergic utilization in stress. Synapse 32: 187–197.
Blonder, Lee Xenakis, Allan F. Burns, Dawn Bowers, Robert W. Moore and Kenneth M. Heilman
1995. Spontaneous gestures following right hemisphere infarct. Neuropsychologia 33: 203–213.
Bogen, Joseph E. 1993. The callosal syndromes. In: Kenneth M. Heilman and Edward Valenstein
(eds.), Clinical Neuropsychology, 337–408. New York: Oxford University Press.
Bogen, Joseph E. 2000. Split-brain basics: Relevance for the concept of one’s other mind. Journal
of the American Academy of Psychoanalysis 28: 341–369.
Borod, Joan C., Elissa Koff and Betsy White 1983. Facial asymmetry in posed and spontaneous
expressions of emotion. Brain and Cognition 2: 165–175.
Borod, Joan C., Elissa Koff, Sandra Yecker, Cornelia Santschi and Michael Schmidt 1998.
Facial asymmetry during emotional expression: Gender, valence, and measurement technique.
Neuropsychologia 36(11): 1209–1215.
Brown, Susan G., Eric A. Roy, Linda Rohr and Pamela J. Bryden 2006. Using hand performance
measures to predict handedness. Laterality 11(1): 1–14.
Bryden, Pamela J., Kelly M. Pryde and Eric A. Roy 2000. A performance measure of the degree of
hand preference. Brain and Cognition 44: 402–414.
Buxbaum, Laurel J., Myrna F. Schwartz, Branch H. Coslett and Tania G. Carew 1995. Naturalistic
action and praxis in callosal apraxia. Neurocase 1: 3–17.
Choi, Seong Hye, Duk L. Na, Eunjoo Kang, Kyung Min Lee, Soo Wha Lee and Dong Gyu Na
2001. Functional magnetic resonance imaging during pantomiming tool-use gestures. Experi-
mental Brain Research 139(311): 311–317.
Corey, David M., Megan M. Hurley and Anne L. Foundas 2001. Right and left handedness de-
fined: a multivariate approach using hand preference and performance. Neuropsychiatry, Neu-
ropsychology, Behavioural Neurology 14(3): 144–152.
Dalby, J. Thomas, David Gibson, Vittorio Grossi and Richard Schneider 1980. Lateralized hand
gesture during speech. Journal of Motor Behaviour 12: 292–297.
Darwin, Charles R. 2009. The Expression of the Emotions in Man and Animals, 2nd edition. London:
Penguin Group. First published [1890].
De Renzi, Ennio, Pietro Faglioni and P. Sorgato 1982. Modality-specific and supramodal mechan-
isms of apraxia. Brain 105: 301–312.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published [1941].
Ferstl, Evelyne C., Jane Neumann, Carsten Bogler and D. Yves von Cramon 2008. The extended
language network: a meta-analysis of neuroimaging studies on text comprehension. Human
Brain Mapping 29(5): 581–593.
Foundas, Anne L., Beth L. Macauley, Anastasia M. Raymer, Lynn M. Maher, Kenneth M. Heil-
man and Lesley J. G. Rothi 1995. Gesture laterality in aphasic and apraxic stroke patients.
Brain and Cognition 29: 204–213.
Freedman, Norbert 1972. The analysis of movement behavior during clinical interview. In: Aron
W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 153–175. New York:
Pergamon Press.
Freedman, Norbert and Wilma Bucci 1981. On kinetic filtering in associative monologue. Semio-
tica 34(3/4): 225–249.
Freedman, Norbert and Stanley P. Hoffmann 1967. Kinetic behaviour in altered clinical states:
Approach to objective analysis of motor behaviour during clinical interviews. Perceptual and
Motor Skills 24: 527–539.
Freedman, Norbert, James O’Hanlon, Philip Oltman and Herman A. Witkin 1972. The imprint of
psychological differentiation on kinetic behaviour in varying communicative contexts. Journal
of Abnormal Psychology 79(3): 239–258.
Gazzaniga, Michael S., Joseph E. Bogen and Roger W. Sperry 1967. Dyspraxia following division
of the cerebral commissures. Archives of Neurology 16: 606–612.
Geschwind, Daniel H., Marco Iacoboni, Michael S. Mega, Dahlia W. Zaidel, Timothy Cloughesy
and Eran Zaidel 1995. Alien hand syndrome: Interhemispheric motor disconnection due to a
lesion in the midbody of the corpus callosum. Neurology 45: 802–808.
Graff-Radford, Neill R., Kathleen Welsh and John Godersky 1987. Callosal apraxia. Neurology
37: 100–105.
Grunwald, Michael and Weiss, T. 2007. Emotional stress and facial self-touch gestures. Unpublished
paper presented at the Lindauer Psychotherapietage, April, 15–27, 2007, Lindau, Germany.
Hampson, Elizabeth and Doreen Kimura 1984. Hand movement asymmetries during verbal and
nonverbal tasks. Canadian Journal of Psychology 38: 102–125.
Hartmann, Karoline, Georg Goldenberg, Maike Daumüller and Joachim Hermsdörfer 2005. It
takes the whole brain to make a cup of coffee: The neuropsychology of naturalistic actions in-
volving technical devices. Neuropsychologia 43: 625–637.
Healey, Jane M., Jaqueline Liedermann and Norman Geschwind 1986. Handedness is not a uni-
dimensional trait. Cortex 22(1): 33–53.
Hermsdörfer, Joachim, Guido Terlinden, Mark Mühlau, Georg Goldenberg and Afra M.
Wohlschläger 2007. Neural representations of pantomimed an actual tool use: Evidence from
an event–related fMRI study. NeuroImage 36: 109–118.
Hubbard, Amy L., Stephen Wilson, Daniel Callan and Mirella Dapretto 2009. Giving speech a
hand: Gesture modulates activity in auditory cortex during speech perception. Human Brain
Mapping 30(3): 1028–1037.
Johnson, Harold G., Paul Ekman and Wallace V. Friesen 1975. Communicative body movements:
American emblems. Semiotica 15: 335–353.
Johnson-Frey, Scott H., Roger Newman-Norlund and Scott T. Grafton 2005. A distributed left hemi-
sphere network active during planning of everyday tool use skills. Cerebral Cortex 15(6): 681–695.
Killgore, William D. S. and Deborah A. Yurgelun-Todd 2007. The right-hemisphere and valence
hypotheses: Could they both be right (and sometimes left)? Social Cognitive and Affective
Neuroscience 2: 240–250.
Kimura, Doreen 1973a. Manual activity during speaking – I. Right–handers. Neuropsychologia 11:
45–50.
Kimura, Doreen 1973b. Manual activity during speaking – II. Left-handers. Neuropsychologia 11:
51–55.
Kita, Sotaro, Olivier de Condappa and Christine Mohr 2007. Metaphor explanation attenuates the
right-hand preference for depictive co-speech gestures that imitate actions. Brain and Lan-
guage 101: 185–197.
Lausberg, Hedda 1995. Bewegungsverhalten als Prozeßparameter in einer kontrollierten Studie
mit funktioneller Entspannung. Unpublished paper presented at the 42. Arbeitstagung des
Deutschen Kollegiums für Psychosomatische Medizin, March 2–4, 1995, Friedrich-Schiller-
Universität Jena, Germany.
Lausberg, Hedda, Robyn F. Cruz, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Pantomime to
visual presentation of objects: Left hand dyspraxia in patients with complete callosotomy.
Brain 126: 343–360.
Lausberg, Hedda, Martha Davis and Angela Rothenhäusler 2000. Hemispheric specialization in
spontaneous gesticulation in a patient with callosal disconnection. Neuropsychologia 38:
1654–1663.
Lausberg, Hedda, Reinhard Göttert, Udo Münßinger, Friedrich Boegner and P. Marx 1999. Cal-
losal disconnection syndrome in a left-handed patient due to infarction of the total length of
the corpus callosum. Neuropsychologia 37: 253–265.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86: 57–69.
Lausberg, Hedda, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Split-brain patients neglect left
personal space during right-handed gestures. Neuropsychologia 41: 1317–1329.
Lausberg, Hedda, Uta Sassenberg and Henning Holle submitted. Right-handers display distinct
hand preferences for different gesture types in communicative situations: Evidence for specific
right and left hemisphere contributions to implicit gesture production.
Lausberg, Hedda, Eran Zaidel, Robyn F. Cruz and Alain Ptito 2007. Speech-independent produc-
tion of communicative gestures: Evidence from patients with complete callosal disconnection.
Neuropsychologia 45: 3092–3104.
Lausberg, Hedda and Monika Kryger 2011. Gestisches Verhalten als Indikator therapeutischer
Prozesse in der verbalen Psychotherapie: Zur Funktion der Selbstberührungen und zur Reprä-
sentation von Objektbeziehungen in gestischen Darstellungen. Psychotherapie-Wissenschaft 1:
41–55.
Lavergne, Joanne and Doreen Kimura 1987. Hand movement asymmetry during speech: No effect
of speaking topic. Neuropsychologia 25: 689–693.
Liepmann, Hugo and Maas, O. 1907. Fall von linksseitiger Agraphie und Apraxie bei rechtsseitiger
Lähmung. Journal für Psychologie und Neurologie 10 (4/5): 214–227.
Lomas, Jonathan and Doreen Kimura 1976. Intrahemispheric interaction between speaking and
sequential manual activity. Neuropsychologia 14: 23–33.
Marangolo, Paola, Ennio De Renzi, Enrico Di Pace, Paola Ciurli and Alessandro Castriota-Skan-
denberg 1998. Let not thy left hand know what thy right hand knoweth. The case of a patient
with an infarct involving the callosal pathways. Brain 121: 1459–1467.
Mashal, Nira and Maria Faust 2009. Conventionalisation of novel metaphors: A shift in hemi-
spheric asymmetry. Laterality 14(6): 573–589.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David and Laura Pedelty 1995. Right brain and gesture. In: Karen Emmorey and Judy S.
Reilly (eds.), Language, Gesture, and Space, 63–85. Hillsdale, NJ: Lawrence Erlbaum.
Moll, Jorge D., Ricardo de Oliveira-Souza, Leigh J. Passman, Fernando Cimini Cunha, Fabiana
Souza-Lima and Pedro Angelo Andreiuolo 2000. Functional MRI correlates of real and ima-
gined tool-use pantomimes. Neurology 54: 1331–1336.
Moscovitch, Morris and Janet Olds 1982. Asymmetries in spontaneous facial expressions and their
possible relation to hemispheric specialization. Neuropsychologia 20: 71–81.
Müller, Cornelia 2001. Iconicity and gesture. In: Christian Cavè, Isabelle Guaitella and Serge Santi
(eds.), Oralité et Gestualité, 321–328. Aix-en-Provence, France: L’Harmattan.
Ohgami, Yuko, Kayako Matsuo, Nobuko Uchida and Toshiharu Nakai 2004. An fMRI study of
tool-use gestures: body part as object and pantomime. Cognitive Science and Neuropsychology
15(12): 1903–1906.
Rapcsak, Steven Z., Cynthia Ochipa, Pélagie M. Beeson, and Alan B. Rubens 1993. Praxis and the
right hemisphere. Brain and Cognition 23: 181–202.
Rumiati, Raffaella I., Peter H. Weiss, Tim Shallice, Giovanni Ottoboni, Johannes Noth, Karl Zilles
and Gereon Fink 2004. Neural basis of pantomiming the use of visually presented objects. Neu-
roImage 21: 1224–1231.
Sainsbury, Peter 1955. Gestural movement during psychiatric interview. Psychosomatic Medicine
17: 454–469.
Schirmer, Annet, Kai Alter, Sonja A. Kotz and Angela D. Friederici 2001. Lateralization of pros-
ody during language production: A lesion study. Brain and Language 76: 1–17.
Sousa-Poza, Joaquin F. and Robert Rohrberg 1977. Body movements in relation to type of infor-
mation (person- and non-person oriented) and cognitive style (field dependence). Human
Communication Research 4(1): 19–29.
Sousa-Poza, Joaquin F., Robert Rohrberg and André Mercure 1979. Effects of type of information
(abstract-concrete) and field dependence on asymmetry of hand movements during speech.
Perceptual and Motor Skills 48: 1323–1330.
Sperry, Roger Wolcott 1968. Hemisphere deconnection and unity in conscious awareness. Amer-
ican Psychologist 23: 723–733.
Stalnaker, Thomas A., Rodrigo A. Espana and Craig W. Berridge 2009. Coping behaviour causes
asymmetric changes in neuronal activation in the prefrontal cortex and amygdala. Synapse 63:
82–85.
Stephens, Debra 1983. Hemispheric language dominance and gesture hand preference. Ph.D. dis-
sertation, Department of Behavioral Sciences, University of Chicago.
Tanaka, Yasufumi, A. Yoshida, Nobuya Kawahata, Ryota Hashimoto and Taminori Obayashi
1996. Diagnostic dyspraxia – Clinical characteristics, responsible lesion and possible underlying
mechanism. Brain 119: 859–873.
Trope, Idit, Baruch Fishman, Ruben C. Gur, Neil M. Sussman and Raquel E. Gur 1987. Contral-
ateral and ipsilateral control of fingers following callosotomy. Neuropsychologia 25: 287–291.
Abstract
Speakers of oral languages use not only speech but also other kinds of bodily movements
when they communicate. From the perspective of cognitive linguistics, all of these beha-
viors can provide insight into speakers’ ongoing conceptualizations of the physical world
and of abstract ideas. Increasing attention is being paid in Cognitive Linguistic (CL)
research to gesture (in particular, manual gesture) with speech in relation to such topics
as: conceptual metaphor; conceptual metonymy; (image) schemas; perspective-taking and
construal; mental spaces; formal and conceptual integration/blending theory; and mental
simulation and simulation semantics. The inclusion of gesture in cognitive linguistic
research raises not only methodological issues concerning what constitutes linguistic
data and what means can be used to analyze it, but also significant theoretical questions
about the nature of language itself, especially in relation to its dynamic character and the
degree to which grammar is multimodal.
1. Introduction
Cognitive Linguistics is a collective label for several lines of research that stem from the
1970s and 80s. Their origin was motivated by a combination of factors. On the one hand,
they grew out of opposition to existing paradigms that dominated linguistics and the
philosophy of language in academia, particularly in the United States. The strongest of
these in linguistics was, and still is, the formalist tradition, developed from the theories
of Noam Chomsky (e.g., 1965, 1995). At the same time, there was a burgeoning of
research in cognitive psychology which exposed problematic aspects of tenets underly-
ing the formalist approach, such as assumptions about how people use categories, in-
cluding linguistic categories, in unconscious reasoning (see Lakoff 1987; Taylor [1989]
1995). This research also brought about work in linguistics in areas which had largely
been ignored in the past. Examples include studying metaphor as a means by which
we think and talk about one conceptual domain in terms of another (Lakoff and John-
son 1980), recognizing frames as a fundamental means by which we organize our
knowledge and express it in linguistic constructions (Fillmore 1982), theorizing about
grammar in terms of principles of cognitive processing (Langacker 1982, 1987, 1991b),
and understanding reference and discourse structure as the setting up of mental models
(Fauconnier [1985] 1994). In contrast to the emphasis on syntax in the Chomskyan
formalist tradition of linguistics, what these approaches have in common is that they
highlight semantics as a starting point for explaining linguistic structure, with meaning
understood as some form of conceptualization.
Some of the work in cognitive linguistics has focused on elements that have tradi-
tionally been studied in descriptive linguistics, falling within the categories of phonol-
ogy, morphology, syntax, etc. However, several of the principles behind this approach
have provided a basis for broadening the scope of investigation. One is the focus on
conceptualization, and since cognitive linguistics takes an encyclopedic view of meaning
(rather than assuming that word meanings are best characterized in terms of discrete
lexical entries), that very meaning may be expressed in various forms of behavior
in the context of using language. This leads us to a second important principle, which is
that cognitive linguistics is a usage-based approach to the study of language (Barlow
and Kemmer 2000; Langacker 1988). Grammar is assumed to be abstracted from recur-
ring pairings between forms of expression and meanings in usage events of language
(Langacker 2008: 457–458). The conclusion above, that the meanings conventionally
expressed in words can also appear in other forms of behavior, is also relevant here
in that there may be forms of expression beyond the words themselves which recur
in usage events to such an extent that they also gain the status of symbolic units.
Based on both of these lines of argumentation, gesture occurring with and around
talk is, and should be, included as an important source of data within cognitive linguistics
(Sweetser 2007).
This chapter will thus focus on what the field of cognitive linguistics has contributed
and could contribute to the study of bodily behavior in face-to-face communicative set-
tings and, in turn, what looking at gesture from a linguistic point of view means for the
study of language. The focus will be on spontaneous gesture used with and during talk,
concentrating primarily on manual gesture as the type that has received the most atten-
tion by cognitive linguistics researchers. The category of gesture covered here is meant
to include the range from “spontaneous movements of the hands and arms accompany-
ing speech” (McNeill 1992: 37), which Kendon (1980 and elsewhere) terms gesticula-
tion, to more symbolic, emblematic gestures (Efron [1941] 1972) for which there is a
more fixed correspondence between gestural form and conventionally accepted mean-
ing. The gestures may or may not have been used intentionally to communicate and
may or may not have been seen or perceived by the addressee as a signal. We will
see that considering the relevance of cognitive linguistics theories for the analysis of
gesture with speech not only provides new ways of interpreting the functions and
roles of gestures but also raises larger questions about what should fall within the
scope of linguistic inquiry.
A note is in order on the phrase “expressions of conceptualization” in the chapter’s
title. First, the term conceptualization rather than “concepts” is used in keeping with the
claim that this better highlights the dynamic nature of meaning in cognitive linguistics
(Langacker 2008: 30). Second, while cognitive linguistics takes meaning-expression as a
driving force behind why and how language is structured the way it is, there is not a uni-
form consensus within cognitive linguistics as to the scope of what should be included
under conceptualization with regard to language. Some take a very inclusive approach
toward this issue. In the theory of cognitive grammar, for example, “conceptualization
is broadly defined to encompass any facet of mental experience,” including sensory,
motor, and emotive experience, and apprehension of the context on various levels (lin-
guistic, physical, and social) (Langacker 2008: 30). Others qualify conceptualization in
relation to language in other ways. Slobin (1987, 1996), for example, focuses on “think-
ing for speaking” as the “special form of thought that is mobilized for communication”
(Slobin 1996: 76). Taking a step further, Evans (2006, 2009) emphasizes that semantic
structure is cognitively a different level of representation than conceptual structure in
general, with the former involving lexical concepts: “bundle[s] of varying sorts of
knowledge […] which are specialized for being encoded in language” (Evans 2009:
xi). In any case, what is agreed upon in cognitive linguistic approaches is that language
does not refer directly to the world but rather to conceptualization based on the language
producer’s (such as the speaker’s) construal of a given situation.
The notion of words mapping to conceptualization may not be controversial. How-
ever, when we speak of gestures as mapping onto or even referring to conceptualiza-
tion, rather than referring to entities and situations in the physical world, this may at
first seem counterintuitive. Doesn’t an instance of pointing to an object refer to that
very object? From the perspective of cognitive linguistics, the gesture refers to the
speaker’s conceptualization of the object, but our conceptualizations of the physical
world are grounded in our perceptual input from the world (Barsalou 1999; Glenberg
and Robertson 2000). Note, though, that since gestures are produced in physical space,
even gestural reference to abstract conceptualizations is grounded by default via the
physical forms, movements, and locations in which the gestures are produced. In this
regard, even abstract reference is grounded with gestures.
A second introductory note is in order concerning two ways of gesturing, since they
will be mentioned repeatedly in the explication below, namely: pointing and represent-
ing. By pointing, I will be referring to the movement of a body part “which appears to
be aimed in a clearly defined direction as if toward some specific target” (Kendon 2004:
200). Since the focus here is mainly on manual gestures as used with European lan-
guages, the relevant pointing is usually performed with an extended finger, although
open-hand pointing is also possible (Kendon and Versante 2003; Kita 2003a). (See
the chapter on pointing in volume 2 for further discussion.)
Another prominent way of gesturing involves representing. Gestural representation
can be accomplished in different manners:
(i) by re-enactment of an action which one would normally perform with the hand(s), such
as forming the hand into a grip and moving it horizontally, as if holding a pen and writing;
(ii) by using the hand(s) to stand for an entity by substituting for it and thus embodying
it, as when holding one’s hand next to one’s ear with the three middle fingers curled,
the thumb extended up, and the pinkie extended down, in order to represent a
telephone handset;
(iii) by placing the hands or fingers as if holding, or next to, a two- or three-
dimensional form (adjacent representation), such that a viewer could infer the shape
from the hand’s/hands’ contour (Leyton 1992: 121), e.g., as if holding a bowl
upright in the air;
(iv) by moving in such a way that the hand(s) trace(s) a form, either two-dimension-
ally with a fingertip, or three-dimensionally with more of the hand shape, espe-
cially including the palm, essentially drawing in the air; in this way a viewer
could recover the shape from the motion of the depiction (Leyton 1992: 145).
(This analysis is an adaptation of Müller’s 1998a and 1998b characterization of four ges-
tural modes of representation: acting, representing, molding, and drawing, discussed
further in Müller, volume 2.) One manner can sometimes lead into another, for exam-
ple if a tracing gesture finishes with the hand(s) held still in the air, in the static form of
adjacent representation mentioned above.
These ways of gesturing can be used in the service of different functions. Below we
will focus on two main types: reference and relation to the discourse itself. Reference
can be made by pointing (in a prototypical act of deictic indication) or by gestural rep-
resentation, in the ways described above (Cienki in press). Likewise discourse-related
gestures may involve pointing, as in pointing in space that is related to the structure
of the speakers’ ongoing discourse, what McNeill, Cassell, and Levy (1993) refer to
as abstract deixis, or elements of discourse themselves may be represented – for exam-
ple, as if held in one’s hand(s) (see McNeill’s 1992 discussion of this type as exemplify-
ing the metaphoric CONDUIT gesture for communication, presented in Reddy [1979]
1993).
As discussed below, the simple examples mentioned so far already involve many
basic theoretical notions that have been articulated in cognitive linguistics. However,
the bulk of cognitive linguistics research is based on analysis and theorizing about lan-
guage only on the verbal level. We will see here that many of the important ideas from
cognitive linguistics also relate integrally to the study of gesture. Indeed the relevance
of many of these directions in cognitive linguistics to co-verbal behavior raises ques-
tions about the degree to which, when, and in what contexts spoken language should
be analyzed as multimodal.
The following sections should be seen as a selection of some main areas from cogni-
tive linguistics which have found resonance in gesture studies. They treat the study of
metaphor, metonymy, schemas, construal and perspective, mental spaces, formal and
conceptual integration (otherwise known as types of blending), and mental simulation.
2. Metaphor
The study of metaphor was one of the areas of research which became a starting point
for cognitive linguistics as a field. The approach to metaphor which is specifically meant
here is one which treats it not just as a linguistic device, but as a means by which people
conceptualize (consciously or unconsciously) one domain in terms of another (Lakoff
1993; Lakoff and Johnson 1980, 1999; see Grady 2007 for an overview). Verbal meta-
phors are seen as linguistic expressions of underlying cross-domain mappings of con-
cepts. The conceptual mappings claimed to underlie sets of related verbal examples
are usually stated in such research in terms of a target domain (that captures what is
being talked about metaphorically) and a source domain (which generalizes over the
concept in terms of which the target is being understood). This is noted in a verbal state-
ment of the form TARGET IS SOURCE, e.g., SIMILARITY IS PROXIMITY (Grady 1997), where
the IS stands for the mapping. It follows logically from this approach, which has become
known as conceptual (or cognitive) metaphor theory (CMT), that conceptual metaphors
should not just be expressed in spoken or written words (that is: linguistic behavior) but
also in other forms of human behavior. This idea received early recognition in gesture
research in McNeill and Levy (1982), leading them to propose metaphoric gestures as
one of the four categories in a classification of manual gesture types; this reached a
wider audience with the publication of McNeill (1992).
As first discussed in Cienki (1998a), there are various ways in which metaphor in
spoken language and gesture may relate to each other. Perhaps the least surprising ex-
amples from the point of view of conceptual metaphor theory are those cases in which
the same imagery appears in words and gestures at the same time, as when one talks
about trying to take different factors into account equally as balancing all these things,
and simultaneously produces a two-handed gesture, with the flat hands, loosely open,
palm-up, one moving up while the other moves down and then vice versa. This gesture
has been interpreted (Cienki 1998a: 193–194) as reflecting a metaphorical mapping
from the source domain of weighing objects comparatively (on one’s hands as if on
the pans of a traditional pair of scales) to the target domain of considering their relative
importance. Such expression of the same metaphoric source domain in words and
gestures need not occur simultaneously, but may happen consecutively or with partial
overlap in time. Indeed, given the fact that gestures often slightly precede the verbal
utterances they accompany (Kendon 1980; McNeill 1992), it is not surprising that the
imagery of the source domain concept may be produced slightly earlier in the gesture
than it is expressed in words. This phenomenon can be seen as providing support for
the argument that at least some of the time, metaphors expressed verbally do involve
the conceptualization of both a target and a source domain. Some may argue that
the gesture may just be illustrating the metaphor that was expressed verbally, i.e., that
the gesture was reflecting the verbal metaphor (Bouissac 2008: 279). Perhaps more con-
vincing evidence, however, comes from the co-production of words and gestures which
highlight different metaphoric construals of a given target domain. For example, in
Cienki (2008: 14–15), a student talks about the moral distinction between wrong versus
right as black or white, but she gestures a different aspect of a scenario with an absolute
division, namely by putting the edge of her right hand, tense and flat in the vertical
plane, against the palm of her left hand, palm up and flat in the horizontal plane.
The gesture can be interpreted as reflecting a clear spatial division between two spaces
(on either side of the vertical palm); it renders the opposition of black and white in a
way which allows for iconic representation in gesture, but this ultimately involves the
use of different types of source domains (one based on the dark/light contrast, one
based on a spatial distinction), and so different metaphoric mappings.
Metaphors may appear in the words alone or in the gestures alone (Cienki 1998a).
Metaphor in spoken words but not in co-speech gesture may be found with verbal ref-
erence to a metaphoric source domain (e.g., a color, as in to be feeling blue meaning
“feeling slightly sad”) which cannot be represented iconically in the spatial medium of
gesture. In addition, we often use verbal metaphoric expressions in speech without
any co-occurring gesture depicting the source domain (metaphorically used words need
not be accompanied by metaphorically used gestures). Conversely, metaphor may
appear in the gestures alone, such as when a speaker of English gestures from left to
right while describing a process that occurred. The metaphor of an event transpiring
over time from left to right draws on a spatial metaphor that is common, at least in
European cultures, with the preceding state being oriented to the left and the subse-
quent state to the right (Calbris 1985). This is consistent with the directionality of the
writing systems in these languages and also the illustration of a time line in mathemat-
ics. Yet the metaphor is not used verbally in most, if in any, European languages: for
example people do not say in English that they did something to the left to mean
they did it before something else. Indeed, as Bouissac (2008), Gibbs (2008), and Ritchie
(2009) note, a topic that awaits further investigation concerns which conceptual meta-
phors are only expressed via gestures and do not have verbal equivalents. However, this
is a research area in which the analytic metalanguage of conceptual metaphor theory
may impede progress, in that not all metaphoric mappings can be analyzed in terms
of words or phrases which name the target and source domains (in the form X IS Y).
Finally, we also know that a metaphor may exist in the language but simply not be
used at the moment, such that the target domain is expressed in words (honest person)
while the speaker produces a gesture which reflects a possible source domain for that
concept in that culture (such as in this case: a rigid, flat hand gesture with the palm
in the vertical plane, reflecting honesty as straightness, Cienki 1998b) (Cienki
1998a: 199–200). These gestures can even be as subtle as small rhythmic beat gestures,
for example in a downward direction when talking about wanting to buy something
inexpensively (a mapping of the sort LESS COST IS DOWN), even without using spatial
language from the source domain (like low price) (Casasanto 2008).
Just as conceptual metaphor theory has been one of the most popular and widely-
applied approaches to the analysis of linguistic data within cognitive linguistics, so
has it been the sub-field of cognitive linguistics which has garnered the most interest
within the field of gesture studies. Other early work in this vein applied conceptual
3. Metonymy
Within cognitive linguistics, the study of metonymy has long played a secondary role in
relation to research on metaphor. Yet its potential for the field of cognitive linguistics
was already put forward in Lakoff and Johnson (1980) with claims such as: “Metonymic
concepts allow us to conceptualize one thing by means of its relation to something else”
(Lakoff and Johnson 1980: 39). Subsequent research on metonymy in cognitive linguis-
tics has, for the most part, maintained this broad approach to what constitutes meto-
nymy, subsuming synecdoche under it, as proposed previously in Jakobson ([1956]
1990). The growing role of conceptual metonymy as a topic of study in cognitive linguis-
tics can be seen in collections such as Barcelona (2000), Dirven and Pörings (2002), and
Panther and Radden (1999); see Panther and Thornburg (2007) for an overview. The
connection of metonymy to gesture research from an explicitly cognitive linguistics per-
spective is as yet just beginning (Cienki 2007; Cienki and Müller 2006; Mittelberg 2006,
2008; Mittelberg and Waugh 2009), and even work on metonymy and gesture from
other theoretical approaches is limited (e.g., Bouvet 2001). (See the chapter on gestures
and metonymy in volume 2 of the Handbook for more details.)
From a contemporary cognitive linguistic perspective, “Metonymy is a cognitive pro-
cess in which one conceptual entity, the vehicle, provides mental access to another con-
ceptual entity, the target, within the same cognitive model” (Radden and Kövecses
1999: 21, building on Croft 1993, Lakoff 1987, and Langacker 1993). This definition is
aptly neutral with respect to the types of expression of, or stimulus for, the relevant
conceptual entities, be they verbal, gestural, or other types of expressions.
However, it is also clear from the existing research that metonymy as expressed in
words and metonymy in gesture actually function quite differently. The verbal expres-
sion of the metonymic vehicle is a sign or combination of signs which is part of a con-
ventionalized system of form-meaning pairings in the given language. The reduced
nature of the iconicity through which most spoken and written linguistic signs refer
has been pointed out at least since de Saussure ([1916] 1959), even if that reduction
has been qualified and put into a more balanced perspective more recently in cognitive
linguistics (see van Langendonck 2007). By contrast, iconicity is fundamental to refer-
ential gestures (Müller 1998a). Even though it is usually quite schematic in nature, it
still plays a more essential role in the visuo-spatial medium of gesture (as it does in
sign languages, Taub 2001; Wilcox 2007) than it does in spoken or written words (via
their sound or graphic symbolism).
Metonymy is a building block of both pointing and representational gestures (Cienki
and Müller 2006). Thus, demonstrative pointing does not indicate the entire referent itself,
but rather a locatable index to the referent (Clark 2003: 254). To make a connection
with the cognitive linguistic terminology of Langacker (1993), the deictic gesture indi-
cates a perceptually conspicuous site as the reference point for identifying the target.
Note that this applies to reference to concrete entities as well as to the abstract deixis
mentioned in the introduction to this chapter. Speakers (at least in Europe and North
America) sometimes point at apparently empty space when referring either to a new
idea being presented or to one previously referred to by the given interlocutor. The bor-
ders of the referent space are fuzzy, so the metonymy here can be compared to pointing
at a cloud; but again, what is pointed to is a locatable index of the idea-as-space, which
in some cases can be a physical entity or space that is metonymically associated with
the idea (e.g., if you point to a space where someone recently stood when mentioning
something that s/he had uttered). Note that this metonymic pointing to the abstract in-
herently involves metaphor as well, as the idea is reified as a space (perhaps imagined as
an invisible object).
Representational gestures also inherently involve metonymy. All of the four man-
ners of representing discussed in the introduction (re-enactment, substitution, adjacent
representation, and tracing) represent only part or parts of some action or entity. This
applies to whether they are used for concrete reference or abstract (metaphoric) refer-
ence (Cienki 2007; Cienki and Müller 2006). For example, the gestural re-enactment of
writing by hand in the air actually involves no writing instrument or paper, but via me-
tonymy the action shows parts of a whole “writing scene.” A speaker’s open hands as if
tracing a three-dimensional form in the air, be it a bowl or a story as a complete whole
(e.g., in German eine runde Geschichte, a complete, literally “round”, story, Müller
1998b), actually only show parts of that form and the rest is inferred. This leads Mittel-
berg and Waugh (2009) to argue that metonymic mapping is, both semiotically and
cognitively, a prerequisite for metaphoric mapping in metaphoric gestures.
Therefore, metonymy in words and metonymy in gestures differ in how they accom-
plish reference in qualitative terms. Given the ubiquity of metonymy in referential ges-
tures, but not in all referential words, there is also a quantitative difference in the
expression of metonymy in words and gestures. One conclusion we can draw from
the above is that gesture provides a potentially rich source of data for research on
how humans employ conceptual metonymy both in their expressions and cognitively
during face-to-face interaction.
4. Schemas
The importance of schematicity in relation to how language represents meaning has
been recognized as fundamental in cognitive linguistics since the origins of the field.
See, for example, Talmy’s (1983) work on the schematic nature of the meaning of
closed-class words; Johnson’s (1987) and Lakoff’s (1987) articulation of image schemas
as basic patterns in our physical experience (such as PATH, CONTAINER, BALANCE) which
provide the basis for understanding more abstract domains; and Langacker’s (1987)
employment of schematicity at various levels within the theory of cognitive grammar
and pointing gestures show something (concrete, or abstract via metaphor) from a par-
ticular physical perspective. This perspective can vary with gesture according to differ-
ent factors, an important one being whether the gesturer uses what is called character
viewpoint or observer viewpoint (Cassell and McNeill 1990; McNeill 1992). For exam-
ple, McNeill (1992: Chapter 7) describes how speakers gestured when telling the story
from a cartoon they viewed in which a cat tried to swing on a rope from one building to
another. Some speakers narrated the event while clasping both hands together and
moving them from near one shoulder horizontally to their other side, as if holding a
rope and swinging on it themselves (character viewpoint). Other speakers would hold
up one hand in a loose fist and move it from one side to the other, depicting the cat
swinging (observer viewpoint on the action). The difference is apparent in their word-
ing, but only subtly. One from the character-viewpoint group said “and he tries to swing
from…”, actually highlighting his eventual lack of success, while one from the observer-
viewpoint group said, “and you see him swinging…” – putting the narrator “inside the
fictive world” (McNeill 1992: 193) of the story being told. For one who hears the speak-
ers and sees their gestures, the differing perspectives are much more readily apparent than
for one who just reads the transcripts of the words they uttered. (See the discussion of
mental simulation below.)
Viewpoint is not always so directly related to the physical perspective on a scene,
however. Another construal phenomenon discussed in cognitive linguistics is the
description of static scenes in terms of dynamic processes, what Talmy (1996) has called
“fictive motion.” The argument is that in sentences such as “The path runs from the
house down the hill to the river,” the use of a verb of motion to characterize the static
position of an extended object reflects a dynamic mental scanning of it from a particular
point of view. By way of contrast, one could have described it starting from the river
and going up to the house. Evidence supporting the psychological reality of these claims
of mental scanning has come from experimental studies (Matlock 2004a, 2004b) as well
as observational research, the latter of which includes studies on gesture. For example,
McNeill (1992) and Núñez (2008) discuss how mathematicians produce gestures that
metaphorically represent certain concepts in terms of motion events – a dynamicity
which is not inherent in the formal definitions and axioms for these concepts in math-
ematics. Gesture can thus provide evidence of conceptualization in terms of fictive
motion even when fictive motion might not be evident in the conventional verbal
and written expressions used in the domain in question.
Construal always takes place against the background of the speaker’s previous
knowledge and attitudes, and so his/her personal interpretation of the topic being
talked about can play a more or less prominent role. The field of clinical psychology
has taken bodily movements into consideration for some time in research on changes
in speaker attitude (e.g., Freedman 1971). These gestures often involve movement
of the torso or the whole body – for example, torso shifts reflecting the adoption of a
different point of view (literally and metaphorically a different “stance”) towards an
issue.
In sum, taking a multimodal perspective on meaning-expression with spoken lan-
guage in the face-to-face context, we see that gestures produced in various ways can
be used (intentionally or unwittingly) by speakers to signal their viewpoint or perspec-
tive on the subject of talk in the moment. This can include their understanding of the
developing discourse structure itself, as we will see in the following section.
7. Mental simulation
Since the late 1990s, the term mental simulation has rapidly gained currency in cognitive
psychology (e.g., Barsalou 1999; Glenberg and Kaschak 2002; Pecher, Zeelenberg, and
Barsalou 2004) and quickly spread to cognitive linguistics. There are slightly different uses
of the term in the literature, but here we will adhere to the characterization in Barsalou
(2003) that “conceptual processing uses reenactments of sensory-motor states – simula-
tions – to represent categories” (Barsalou 2003: 521). Barsalou elaborates that these simu-
lations are, technically speaking, partial reenactments, since although perception and
conception are similar, they are not identical. Building on this research, a view that is
increasingly gaining ground in cognitive linguistics is known as simulation semantics,
which claims that “[u]nderstanding a piece of language can entail performing mental per-
ceptual and motor simulations of its content” (Bergen 2005: 262; Bergen 2007; see also
adaptations of it in the Neural Theory of Language by Feldman and Narayanan 2004;
Lakoff 2008; and the theory of Lexical Concepts and Cognitive Models, Evans 2006, 2009).
Most of the initial work in this area has focused on written linguistic cues as prompts
for mental simulations (e.g., Glenberg and Kaschak 2002; Stanfield and Zwaan 2001).
However, more recent developments also take spoken language and gesture into
account. For example, Cook and Tanenhaus (2008) had some research participants
move a tower of disks from one peg to another, while others did the same task on a com-
puter, moving the images of the disks with a computer mouse. Both sets of participants
then described the task to new participants. Relevant differences were found in how the
people gestured while describing the task, depending on whether they had done it phys-
ically or virtually. The new participants then had to do the task – either using the same
apparatus that they heard described (physical or computer-based) or the other set-up.
The new participants were found to be influenced by the information that they had
gleaned from the gestures of the first set of participants as they tried to solve the task them-
selves, suggesting their mental simulation of what they had (consciously or unconsciously)
seen in the gestures. In addition, studies measuring event-related potentials (ERP) in the
brain provide supporting evidence that “iconic gestures activate image-specific informa-
tion about the concepts which they denote” (Wu and Coulson 2007: 244). It therefore ap-
pears that both words and gestures can serve as cues for addressees to simulate (at least
some aspects of) the content of talk when they can also see the speaker. (See also
Arbib this volume on the mirror neuron system in relation to gesture research.)
The premise that semantics should be approached in terms of the mental simulation
of the content of language offers a solution to some of the quandaries posed by remnants of
pre-cognitive linguistic theory which have maintained a foothold in cognitive linguistics.
A case in point is the adherence to words as the medium or metalanguage for many kinds
of semantic analysis based in cognitive linguistics. Simulation semantics is thus a direction
which is more consistent with the tenets of cognitive linguistics. Yet giving this approach
its due will mean taking more seriously the generally accepted view that “the conversa-
tional use of language is primary” in terms of “providing a basic model that other uses of
language mimic and adapt as needed” (Langacker 2008: 459). In line with this, the audio-
visual nature of face-to-face talk will have to be acknowledged further in research on
mental simulation, which will mean incorporating the study of gesture to a greater
degree.
8. Conclusions
The above overview highlights some of the ways in which theory from cognitive linguis-
tics has found fertile ground for development by those taking gesture with speech into
account. We also see that analysis of gestural data can raise some challenges for cogni-
tive linguistic theory, as it conventionally employs verbally-based metalanguage in
semantic analyses: the analog nature of meaning expressed in imagery, particularly in
moving images as we have with gesture, is inadequately captured in written words,
which are static, digital symbols. (Note the diagrams used for analyses in cognitive
grammar as an exception, although these are also static in nature.)
In addition, the studies surveyed here support an argument that language is at least
sometimes multimodal (cf. Müller 2009 for the more absolute claim that spoken lan-
guage is inherently multimodal). This idea has roots independent of cognitive linguis-
tics: witness Kendon’s (1980) approach to gesticulation and speech as two aspects of
the process of utterance and McNeill’s (1992: 2) claim that gesture and language are
one system. For arguments about grammar itself as multimodal, see the project Towards
a Grammar of Gesture (www.togog.org), Fricke (2008), and Harrison (2009). Building on
these works, an approach to language as sometimes multimodal seems to best account
for the varying degrees of systematicity of gesture use in contexts of face-to-face
communication.
This suggests that rather than thinking of language as a “classical” category (i.e., one
with clear boundaries) which contains verbal communication (in written, spoken, and
signed forms of language), perhaps a different concept of the category of language is
needed, namely one with a prototype structure. Research in cognitive linguistics (e.g., Lakoff
1987; Taylor’s 1995 overview) has endorsed the idea that linguistic categories on a num-
ber of levels of description (such as those of the phoneme, word, and syntactic construc-
tion) exhibit prototype effects. Rather than being definable in the classical way, in terms
of necessary and sufficient conditions, these categories exhibit fuzzy boundaries – the
difficulty of delimiting what constitutes “a word” in many languages being a prime
example (is a sometimes separable clitic a word or an affix?). In a similar way, perhaps
linguists should approach language itself as a category with conventional verbal symbols
being the prototypical manifestation but also recognize that what are often considered
paralinguistic features – including expressive forms like intonation and gesture with
spoken language – can also sometimes have conventionalized symbolic status. In
some cases, this may be redundant with the spoken words, but sometimes these other
forms of expression like gesture may not be co-expressive with words but function on
their own merits. Whether the communicative system so viewed is still best character-
ized with the label language, or rather as something like a variably multimodal semiotic
system, remains an issue to be explored.
Acknowledgements
Work on this chapter was supported by a 2009-10 Fellowship-in-Residence at NIAS, the
Netherlands Institute for Advanced Study in the Humanities and Social Sciences.
9. References
Arbib, Michael this volume. Mirror systems and the neurocognitive substrates of bodily commu-
nication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Barcelona, Antonio (ed.) 2000. Metaphor and Metonymy at the Crossroads: A Cognitive Perspec-
tive. Berlin: De Gruyter Mouton.
Barlow, Michael and Suzanne Kemmer (eds.) 2000. Usage-Based Models of Language. Stanford,
CA: Center for the Study of Language and Information.
Barsalou, Lawrence W. 1999. Perceptual symbol systems. Behavioral and Brain Sciences 22: 577–660.
Barsalou, Lawrence W. 2003. Situated simulation in the human conceptual system. Language and
Cognitive Processes 18: 513–562.
Bergen, Benjamin 2005. Mental simulation in literal and figurative language understanding. In:
Seana Coulson and Barbara Lewandowska-Tomaszczyk (eds.), The Literal and Nonliteral in
Language and Thought, 255–278. Frankfurt am Main: Peter Lang.
Bergen, Benjamin 2007. Experimental methods for simulation semantics. In: Monica Gonzalez-
Marquez, Irene Mittelberg, Seana Coulson and Michael Spivey (eds.), Methods in Cognitive
Linguistics, 277–301. Amsterdam: John Benjamins.
Bouissac, Paul 2008. The study of metaphor and gesture: A critique from the perspective of semi-
otics. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 277–282. Amsterdam:
John Benjamins.
Bouvet, Danielle 2001. La Dimension Corporelle de la Parole: Les Marques Posturomimo-
Gestuelles de la Parole, leurs Aspects Métonymiques et Métaphoriques, et leur Rôle au Cours
d’un Récit. Paris: Peeters.
Calbris, Geneviève 1985. Espace-Temps: Expression gestuelle de temps. Semiotica 55: 43–73.
Calbris, Geneviève 2003. From cutting an object to a clear-cut analysis: Gesture as the represen-
tation of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1):
19–46.
Casasanto, Daniel 2008. Conceptual affiliates of metaphorical gestures. Paper presented at the
conference Language, Communication, and Cognition, Brighton, August 2008.
Cassell, Justine and David McNeill 1990. Gesture and ground. In: Proceedings of the Sixteenth
Annual Meeting of the Berkeley Linguistics Society, 57–68. Berkeley, CA: Berkeley Linguistics
Society.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Chomsky, Noam 1995. The Minimalist Program. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphoric expres-
sions. In: Jean-Pierre Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford, CA: Center for the Study of Language and Information.
Cienki, Alan 1998b. STRAIGHT: An image schema and its metaphorical extensions. Cognitive Lin-
guistics 9: 107–149.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter Mouton.
Cienki, Alan 2007. Reference points and metonymic sources in gesture. Talk presented at the
ninth International Cognitive Linguistics Conference, Kraków, Poland, July 2007.
Cienki, Alan 2008. Why study metaphor and gesture? In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–25. Amsterdam: John Benjamins.
Cienki, Alan 2009. Mental space builders in speech and in co-speech gesture. In: Ewa Jarmoło-
wicz-Nowikow, Konrad Juszczyk, Zofia Malisz and Michał Szczyszek (eds.), GESPIN: Gesture
and Speech in Interaction [CD-ROM and http://issuu.com/cognitarian/docs/cienki/1]. Poznań.
Cienki, Alan in press. Gesture, space, grammar, and cognition. In: Peter Auer and Martin Hilpert
(eds.), Space in Language and Linguistics: Geographical, Interactional, and Cognitive Perspec-
tives. Berlin: Walter de Gruyter.
Cienki, Alan and Cornelia Müller 2006. How metonymic are metaphoric gestures? Talk presented
at the German Society for Cognitive Linguistics conference, Munich, October 2006.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, gesture, and thought. In: Raymond W. Gibbs
Jr. (ed.), The Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge: Cam-
bridge University Press.
Clark, Herbert H. 2003. Pointing and placing. In: Sotaro Kita (ed.), Pointing: Where Language,
Culture, and Cognition Meet, 243–268. Mahwah, NJ: Lawrence Erlbaum.
Cohen, Gerald 1987. Syntactic Blends in English Parole. Frankfurt am Main: Peter Lang.
Cook, Susan Wagner and Michael K. Tanenhaus 2008. Speakers communicate their perceptual-
motor experience to listeners nonverbally. In: Brad C. Love, Ken McRae and Vladimir M.
Sloutsky (eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society,
957–962. Austin, TX: Cognitive Science Society.
Cooperrider, Kensy and Rafael Núñez 2009. Across time, across the body: Transversal temporal
gestures. Gesture 9(2): 181–206.
Cornelissen, Joep, Jean Clarke and Alan Cienki 2012. Sensegiving in entrepreneurial contexts:
The use of metaphors in speech and gesture to gain and sustain support for novel business ven-
tures. International Small Business Journal 30: 213–241.
Croft, William 1993. The role of domains in the interpretation of metaphors and metonymies.
Cognitive Linguistics 4: 335–370.
Cutting, J. Cooper and Kathryn Bock 1997. That’s the way the cookie bounces: Syntactic and seman-
tic components of experimentally elicited idiom blends. Memory and Cognition 25(1): 57–71.
Dirven, René and Ralf Pörings (eds.) 2002. Metaphor and Metonymy in Comparison and Contrast.
Berlin: De Gruyter Mouton.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Efron, David 1972. Gesture, Race, and Culture. The Hague: Mouton. First published as Gesture
and Environment. New York: King’s Crown Press [1941].
Evans, Vyvyan 2006. Lexical concepts, cognitive models and meaning construction. Cognitive Lin-
guistics 17: 491–534.
Evans, Vyvyan 2009. How Words Mean: Lexical Concepts, Cognitive Models, and Meaning Con-
struction. Oxford: Oxford University Press.
Fauconnier, Gilles 1984. Espaces Mentaux. Paris: Minuit.
Fauconnier, Gilles 1994. Mental Spaces. Cambridge: Cambridge University Press. First published
Cambridge: Massachusetts Institute of Technology Press [1985].
Fauconnier, Gilles and Mark Turner 1998. Principles of conceptual integration. In: Jean-Pierre
Koenig (ed.), Discourse and Cognition: Bridging the Gap, 269–283. Stanford, CA: Center for
the Study of Language and Information.
Fauconnier, Gilles and Mark Turner 2002. The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. New York: Basic Books.
Feldman, Jerome and Srini Narayanan 2004. Embodied meaning in a neural theory of language.
Brain and Language 89: 385–392.
Fillmore, Charles 1982. Frame semantics. In: Linguistic Society of Korea (ed.), Linguistics in the
Morning Calm, 111–137. Seoul: Hanshin.
Forceville, Charles and Eduardo Urios-Aparisi (eds.) 2009. Multimodal Metaphor. Berlin: De
Gruyter Mouton.
Freedman, Norbert 1971. The analysis of movement behavior during the clinical interview. In:
Aaron Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 132–165. New
York: Pergamon.
Fricke, Ellen 2008. Grundlagen einer multimodalen Grammatik des Deutschen. Syntaktische
Strukturen und Funktionen. Habilitationsschrift Europa-Universität Viadrina, Frankfurt Oder.
Geeraerts, Dirk and Hubert Cuyckens (eds.) 2007. The Oxford Handbook of Cognitive Linguistics.
Oxford: Oxford University Press.
Gibbs, Raymond W. 2008. Metaphor and gesture: Some implications for psychology. In: Alan Cienki
and Cornelia Müller (eds.), Metaphor and Gesture, 291–301. Amsterdam: John Benjamins.
Glenberg, Arthur M. and Michael P. Kaschak 2002. Grounding language in action. Psychonomic
Bulletin and Review 9: 558–565.
Glenberg, Arthur M. and David A. Robertson 2000. Symbol grounding and meaning: A compar-
ison of high-dimensional and embodied theories of meaning. Journal of Memory and Language
43: 379–401.
Grady, Joseph E. 1997. Foundations of meaning: Primary metaphors and primary scenes. Ph.D.
dissertation, University of California at Berkeley.
Grady, Joseph E. 2007. Metaphor. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The Oxford
Handbook of Cognitive Linguistics, 188–213. Oxford: Oxford University Press.
Hampe, Beate (ed.) 2005. From Perception to Meaning: Image Schemas in Cognitive Linguistics.
Berlin: De Gruyter Mouton.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux, France.
Jakobson, Roman 1990. Two aspects of language and two types of aphasic disturbances. In: Linda
R. Waugh and Monique Monville-Burston (eds.), On Language: Roman Jakobson, 115–133.
Cambridge, MA: Harvard University Press. First published in: Roman Jakobson and Morris
Halle, Fundamentals of Language. The Hague: Mouton [1956].
Johnson, Mark 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Rea-
son. Chicago: University of Chicago Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relation between Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan.” In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–137. Mahwah, NJ: Lawrence Erlbaum.
Kita, Sotaro 2003a. Pointing: A foundational building block of human communication. In: Sotaro
Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 1–8. Mahwah, NJ: Law-
rence Erlbaum.
Kita, Sotaro (ed.) 2003b. Pointing: Where Language, Culture, and Cognition Meet. Mahwah, NJ:
Lawrence Erlbaum.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. Online
19 May 2011 [http://cognitextes.revues.org/406].
Lakoff, George 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the
Mind. Chicago: University of Chicago Press.
Lakoff, George 1993. The contemporary theory of metaphor. In: Andrew Ortony (ed.), Metaphor
and Thought, 2nd edition, 202–251. Cambridge: Cambridge University Press.
Lakoff, George 2008. The neuroscience of metaphoric gestures: Why they exist. In: Alan Cienki
and Cornelia Müller (eds.), Metaphor and Gesture, 283–289. Amsterdam: John Benjamins.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago
Press.
Lakoff, George and Mark Johnson 1999. Philosophy in the Flesh. New York: Basic Books.
Langacker, Ronald W. 1982. Space grammar, analysability, and the English passive. Language 58:
22–80.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar. Volume 1: Theoretical Prerequi-
sites. Stanford, CA: Stanford University Press.
Langacker, Ronald W. 1988. A usage-based model. In: Brygida Rudzka-Ostyn (ed.), Topics in
Cognitive Linguistics, 127–161. Amsterdam: John Benjamins.
Langacker, Ronald W. 1990. Subjectification. Cognitive Linguistics 1: 5–38.
Langacker, Ronald W. 1991a. Concept, Image, and Symbol: The Cognitive Basis of Grammar. Ber-
lin: De Gruyter Mouton.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände. Zur Semiose performativer
Gesten. Mitteilungen des Germanistenverbandes 3: 29–53.
Núñez, Rafael 2008. A fresh look at the foundations of mathematics: Gesture and the psycholog-
ical reality of conceptual metaphor. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and
Gesture, 93–114. Amsterdam: John Benjamins.
Núñez, Rafael and Eve Sweetser 2006. With the future behind them: Convergent evidence from
language and gesture in the crosslinguistic comparison of spatial construals of time. Cognitive
Science 30: 1–49.
Panther, Klaus-Uwe and Günter Radden (eds.) 1999. Metonymy in Language and Thought.
Amsterdam: John Benjamins.
Panther, Klaus-Uwe and Linda L. Thornburg 2007. Metonymy. In: Dirk Geeraerts and Hubert
Cuyckens (eds.), The Oxford Handbook of Cognitive Linguistics, 236–263. Oxford: Oxford
University Press.
Parrill, Fey and Eve Sweetser 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4: 197–219.
Pecher, Diane, René Zeelenberg and Lawrence W. Barsalou 2004. Sensorimotor simulations
underlie conceptual representations: Modality-specific effects of prior activation. Psychonomic
Bulletin and Review 11: 164–167.
Radden, Günter and Zoltán Kövecses 1999. Towards a theory of metonymy. In: Klaus-Uwe Pan-
ther and Günter Radden (eds.), Metonymy in Language and Thought, 17–59. Amsterdam: John
Benjamins.
Reddy, Michael J. 1993. The conduit metaphor: A case of frame conflict in our language about lan-
guage. In: Andrew Ortony (ed.), Metaphor and Thought, 164–201. Cambridge: Cambridge Uni-
versity Press. First published [1979].
Ritchie, L. David 2009. Review of Metaphor and Gesture ed. by Alan Cienki and Cornelia Müller
(2008). Metaphor and Symbol 24: 121–123.
Saussure, Ferdinand de 1959. Course in General Linguistics. New York: Philosophical Library.
First published Paris: Payot [1916].
Slobin, Dan I. 1987. Thinking for speaking. In: Proceedings of the Thirteenth Annual Meeting of the
Berkeley Linguistics Society, 435–445. Berkeley, CA: Berkeley Linguistics Society.
Slobin, Dan I. 1996. From “thought and language” to “thinking for speaking.” In: John Gumperz
and Stephen C. Levinson (eds.), Rethinking Linguistic Relativity, 70–96. Cambridge: Cam-
bridge University Press.
Stanfield, Robert A. and Rolf A. Zwaan 2001. The effect of implied orientation derived from ver-
bal context on picture recognition. Psychological Science 12: 153–156.
Streeck, Jürgen 1994. “Speech-handling”: The metaphorical representation of speech in gestures.
A cross-cultural study. Unpublished manuscript.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer
and Aldo Di Luzio (eds.), The Contextualization of Language, 135–157. Amsterdam: John
Benjamins.
Sweetser, Eve 1998. Regular metaphoricity in gesture: Bodily-based models of speech interaction.
Actes du 16e Congrès International des Linguistes (CD-ROM), Elsevier.
Sweetser, Eve 2007. Looking at space to study mental spaces: Co-speech gesture as a crucial data
source in cognitive linguistics. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson
and Michael Spivey (eds.), Methods in Cognitive Linguistics, 201–224. Amsterdam: John
Benjamins.
Talmy, Leonard 1983. How language structures space. In: Herbert L. Pick Jr. and Linda Acredolo
(eds.), Spatial Orientation: Theory, Research, and Application, 225–282. New York: Plenum
Press.
Talmy, Leonard 1996. Fictive motion in language and “ception.” In: Paul Bloom, Mary A. Peterson,
Lynn Nadel and Merrill F. Garrett (eds.), Language and Space, 211–276. Cambridge:
Massachusetts Institute of Technology Press.
Taub, Sarah 2001. Language from the Body: Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Taylor, John R. 1995. Linguistic Categorization: Prototypes in Linguistic Theory. Oxford: Oxford
University Press. First published [1989].
Teßendorf, Sedinha and Silva Ladewig 2008. The brushing-aside and the cyclic gesture: Recon-
structing their underlying patterns. Talk presented at the third conference of the German Cog-
nitive Linguistics Association, Leipzig, Germany, September 2008.
Traugott, Elizabeth Closs 1986. From polysemy to internal semantic reconstruction. In: Proceed-
ings of the Twelfth Annual Meeting of the Berkeley Linguistics Society, 539–550. Berkeley, CA:
Berkeley Linguistics Society.
Traugott, Elizabeth Closs 1988. Pragmatic strengthening and grammaticalization. In: Proceedings
of the Fourteenth Annual Meeting of the Berkeley Linguistics Society, 406–416. Berkeley, CA:
Berkeley Linguistics Society.
Tuggy, David 2007. Schematicity. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The Oxford
Handbook of Cognitive Linguistics, 82–116. Oxford: Oxford University Press.
Turner, Mark and Gilles Fauconnier 1995. Conceptual integration and formal expression. Meta-
phor and Symbolic Activity 10: 183–203.
Webb, Rebecca 1997. Linguistic features of metaphoric gestures. Ph.D. dissertation, University of
Rochester, New York.
Wilcox, Sherman 2007. Signed languages. In: Dirk Geeraerts and Hubert Cuyckens (eds.), The
Oxford Handbook of Cognitive Linguistics, 1113–1136. Oxford: Oxford University Press.
Williams, Robert F. 2008a. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Williams, Robert F. 2008b. Path schemas in gesturing for thinking and teaching. Talk presented at
the third conference of the German Cognitive Linguistics Association, Leipzig, Germany,
September 2008.
Wu, Ying Choon and Seana Coulson 2007. How iconic gestures enhance communication: An ERP
study. Brain and Language 101: 234–245.
Zlatev, Jordan 2005. What’s in a schema? Bodily mimesis and the grounding of language. In: Beate
Hampe (ed.), From Perception to Meaning: Image Schemas in Cognitive Linguistics, 313–342.
Berlin: De Gruyter Mouton.
Zlatev, Jordan 2007. Embodiment, language, and mimesis. In: Tom Ziemke, Jordan Zlatev and
Roslyn Frank (eds.), Body, Language and Mind, Volume 1: Embodiment, 297–337. Berlin:
De Gruyter Mouton.
Zlatev, Jordan 2008. The co-evolution of intersubjectivity and bodily mimesis. In: Jordan Zlatev,
Timothy R. Racine, Chris Sinha and Esa Itkonen (eds.), The Shared Mind: Perspectives on In-
tersubjectivity, 215–244. Amsterdam: John Benjamins.
Zlatev, Jordan, Tomas Persson and Peter Gärdenfors 2005a. Bodily mimesis as “the missing link”
in human cognitive evolution. Lund University Cognitive Studies 121: 1–40.
Zlatev, Jordan, Tomas Persson and Peter Gärdenfors 2005b. Triadic bodily mimesis is the differ-
ence. Behavioral and Brain Sciences 28: 720–721.
12. Gestures as a medium of expression: The linguistic potential of gestures
1. Introduction
2. Bühler’s organon model of language as use: A functional-linguistic framework for gesture
3. How gestures realize the three functions of the organon model: Expression, appeal,
representation
4. Gestures with a dominantly pragmatic function
5. Conclusion
6. References
Abstract
Departing from Karl Bühler’s psychological theory of language and his theory of expres-
sion, it is proposed that gestures exhibit a potential for language because they can be used
to fulfill the same basic functions as language. Gestures can express inner states and feel-
ings, they may regulate the behavior of others (they are designed for and directed towards
somebody) and they can represent objects and events in the world (Bühler [1934] 1982;
Müller 1998a, 1998b). Notably these functions are co-present in any speech-event and
they can be co-present in gestures, with one of them taking center stage. The functional
approach to gestures relates them to three basic aspects of any communicative encounter –
the speaker, the listener, and the world we talk about. With this systematics in place, we
can categorize a wide range of gestures according to their dominant function: expression,
appeal, or representation.
1. Introduction
Regarding gestures as a medium of expression is inspired by Wilhelm Wundt and Karl
Bühler, both psychologists with a great interest in language and expression, in commu-
nicative body movements, and in their relation to affective as well as linguistic
expression. In Bühler’s theory of expression, gestures figure as one expressive medium
which may in some cases take over representational functions comparable to words
(Bühler 1933, [1934] 1982, 2011). In Wundt’s Folk Psychology, gestures are considered
forms of expression that are at the roots of language evolution (Wundt 1921). This is
why Wundt discusses sign languages of the deaf and the signs of the North American
Plains Indians as much as the manifold gestures of the Neapolitans. Wundt regards ges-
tural forms of linguistic communication as precursors of vocal language. Looking at
signed languages today, we know that hand movements can develop into a fully fledged
language. Signed languages have a grammar and lexicon, and they vary across cultures.
French Sign Language differs significantly from British, German, Danish or Nicaraguan
Sign Language, and this means nothing less than that, in order for a sign language to
develop historically, gestures undergo processes of grammaticalization.
Now, how is it that movements of the hands are equipped to turn into language, if
necessary? What makes them apt to do so? What are their properties as an articulatory
medium that make them the only body part other than the vocal tract that can develop
into a full-fledged language? We know that hand and mouth movements are controlled
in the same cortical region in the brain (Wilson 1998), and it is assumed that language
only evolved in the human species after the change to bipedalism (Leroi-Gourhan 1984;
Wilson 1998). The latter assumption inspired a broad range of theories proposing a ges-
tural origin of language (Arbib and Corballis, this volume, are current proponents of
such a theory; see also Kendon 1991 and Armstrong and Wilcox 2007). Condillac’s
theory of a langage d’action (a language of action as a precursor of vocal language) is
an important example of a close intertwining of gestures and articulatory expressions
(see Copple, this volume). Kendon (2008) and McNeill (this volume) in turn propose
different theories about how vocal and gestural articulation evolved together. We
also know from praxeological accounts of gesture that the hands are the only organ
apart from the vocal tract that has developed a capacity for flexible and variable
movements with a high degree of articulatory precision (see Streeck 2009, this volume).
Notably, the hands are the techniques of the body par excellence (Mauss 1934); they are
the primary tools that ensure survival and afford the development of culture (Leroi-
Gourhan 1984). The medium “hand” appears therefore well equipped to develop into
a full-fledged language. There are at least two major reasons for this: a highly flexible articula-
tion (the capacity for a high differentiation of movements is a prerequisite for a com-
plex sign system) and a manifold instrumental use of the hands, which provides the
functional grounds (and infinite sources) for the creation of gestural meaning. Everyday
actions – such as opening, closing, holding, pushing, drawing lines in a soft surface or
touching objects – may serve as roots for gestural meaning (Müller 1998a, 1998b,
2010a, 2010b; Müller and Haferland 1997; Streeck 2002, 2009, this volume).
The hands’ movement properties in conjunction with their functions as instruments
to deal with the world make them a medium of expression with a potential for language.
With these properties, hands and mouth stand out against all other possible bodily
media of expression: neither the face nor the legs or feet have a comparable articulatory free-
dom and instrumental and functional richness. The linguistic view on gesture proposed
here takes these medial and functional properties as a point of departure; it regards ges-
tures as a medium of expression.
2. Bühler’s organon model of language as use: A functional-linguistic framework for gesture
Bühler’s organon model takes concrete language use – or in his terms the concrete speech event – as a starting point. Language
functions as an instrument which sets up a relation between three fundamental aspects
of any such speech event: the speaker, the addressee and the things. The speaker uses
language as an instrument to tell his/her addressee something about the things in the world.
By specifying the particular functions of a linguistic sign in the moment of speaking,
Bühler critically evaluates and re-formulates Plato’s simple model of language use.
Situating language in the basic structure of any communicative encounter – the speaker,
the addressee and the world talked about – he establishes three basic functions of the
concrete ‘sound phenomenon’ (Schallphänomen) or the uttered (complex) linguistic
sign (e.g. this could be a word, or a phrase): it expresses, appeals, and represents. A
word that is spoken represents entities and events in the world; at the same time it ex-
presses inner states and feelings, and as a communicative sign it appeals to somebody
(directing the behavior of somebody or addressing somebody). Here is Bühler’s
reformulation of Plato’s model (see Fig. 12.1):
The circle in the middle symbolizes the concrete acoustic phenomenon. Three variable fac-
tors in it go to give it the rank of a sign in three different manners. The sides of the in-
scribed triangle symbolize these three factors. In one way the triangle encloses less than
the circle (thus illustrating the principle of abstractive relevance). In another way it goes
beyond the circle to indicate that what is given to the senses always receives an appercep-
tive complement. The parallel lines symbolize the semantic functions of the (complex) lan-
guage sign. It is a symbol by virtue of its coordination to objects and states of affairs, a
symptom (Anzeichen, indicium: index) by virtue of its dependence on the sender, whose
inner states it expresses, and a signal by virtue of its appeal to the hearer, whose inner
or outer behaviour it directs as do other communicative signs.
This organon model, with its three largely independently variable semantic relations, was
first expounded completely in my paper on the sentence (Bühler 1918), which begins with the
words: “What human language does is threefold: profession, triggering and representation.”
Today I prefer the terms expression, appeal and representation […]. (Bühler 2011: 34–35)
Fig. 12.1: Bühler’s organon model of language (with the labels representation, expression, appeal, sender, and receiver).
an object or enact actions with the face or the head or the legs (if not they are actions of
the head or the feet; or if not, the feet are used as instruments for sketching out forms).
Hands can do all this without any effort; they can represent entities other than them-
selves, and in fact people use them all the time for these purposes. With their hands, peo-
ple mold geometric shapes, reenact instrumental actions of the hands and by doing so,
they refer either to the action or the object acted upon. Speakers use their hands also
as objects in motion: wiggling fingers as moving legs, a flat hand for a piece of paper, or
the fist for a ball (see Müller 1998a, 1998b, 2009, 2010a; for more on gestural modes of
representation, see Müller volume 2 of this handbook).
As far as pointing is concerned, we can do this quite well with different kinds of ar-
ticulators (eyes, mouth, chin, feet) but the depiction of entities or events in the world is
difficult to do with articulators other than the hand. This is what makes their properties
of expression so unique and this is what they share with the vocal tract: an extremely
high level of movement differentiation and a wide range of possible movements along
with the functional property to actually exploit the techniques of the hand for depictive
purposes. These articulatory and functional properties constitute the articulatory pre-
requisite for a complex sign system to emerge. Gestures fulfill all three functions formu-
lated in the organon model and they do so simultaneously. This is what makes gestures
so apt for language. They can talk about the world and express inner feelings and they
are directed to an addressee. The following section will illustrate how they do this.
flexibility to form manifold shapes, to move in a wide variety of manners, and to occupy
all kinds of possible places within a fairly large space, they are candidates for language.
Hand movements are highly articulate and they are humans’ most important instruments
to deal with the world, to shape it and to handle it (Streeck 1994, 2009, this volume). Free-
dom of form, movement, and space along with the complex functionality of hand move-
ments also provides the articulatory and functional grounds for the co-presence of the
three functions of expression, appeal, representation. Bühler argues that every word,
every phrase we utter realizes the three functions at the same time, with one of the func-
tions usually being dominant (Bühler [1934] 1982). Looking at gestures, this co-presence
is particularly clear: whenever we are depicting something, let’s say a rectangular object,
our hands will mold a rectangular shape, but the molding will be realized in a particular
manner, with a specific movement quality: we might move our hands carefully, almost
tenderly, or perform a vigorous, energetic movement. This movement quality is an expres-
sion of our affective stance towards the object we are depicting and it is co-present with
the depictive dimension of the gesture (Kappelhoff and Müller 2011). The same holds for
the appealing function. It, too, is present in every gesture with a dominantly representa-
tional function. Whenever gestures are performed they are oriented towards an attend-
ing interlocutor, they are addressed towards somebody. So, we do not simply mold a
rectangular shape in front of our body, but we tend to direct our hands towards an attend-
ing addressee (Streeck 1993, 1994). Of course, sometimes we seem to gesture for our-
selves, when sorting out complex spatial tasks, or when speaking on the phone, but
then we as speaker/gesturers are our own addressee. The following section provides
examples for gestures with dominantly representational, appealing, or expressive function.
Both hands reenact an unpacking movement, which is repeated once after the sentence is finished.
Both extended fingers trace the shape of a picture frame with a repeated stroke.
Both hands mold in front of the body the shape of a round bench.
Right hand, clenched fist, represents a car in motion, bashing against a wall, which is represented by the vertically oriented, extended left hand.
Fig. 12.3: Gestures with a dominantly representational function depicting a concrete object and
event (example 3: acting as if molding a round object and 4: right hand represents object in
motion, left hand represents flat, vertically oriented object).
expression. In the next example (Fig. 12.4, example 5), a woman is shown who enacts an
unpacking movement with both hands, very similar to the one that we have described in
Fig. 12.2, example 1. However, here the unpacking refers to the ‘unpacking’ of secrets
(Geheimnisse auspacken); figuratively, this German idiom refers to the telling of secrets.
What we see here is a typical case of a so-called metaphoric gesture (for an overview on
metaphor and gesture, see Cienki and Müller 2008a, 2008b).
Er packte aus.
(gesture phases: stroke, post-stroke hold, retraction)
‘He unpacked.’
Fig. 12.4: Gesture with a dominantly representational function referring to abstract action
(example 5: acting as if “unpacking”, e.g. revealing a secret).
Speakers also use their hands to outline abstract paths. In example 6 (Fig. 12.5), the
woman traces the ups and downs of her first love relationship, starting from the left-
hand side. The gesture is again used metaphorically: the invisible line depicts the
emotional and temporal course of her first relationship.
Left extended index finger goes up and traces a wavy line from left
to right to metaphorically depict the course of the relationship
Fig. 12.5: Gestures with a dominantly representational function depicting an abstract event: meta-
phoric gestures (example from Müller 2007, 2008a, 2008b) (example 6: tracing course of relationship).
Fig. 12.6: Gestures with a dominantly representational function depicting an abstract object and
event: metaphoric gestures (example 7: molding scope of time and 8: crashing enterprise).
Note, however, that the gestures described above are not simple reflections of something
out there in the real world. Instead, they are mediated by processes of conceptualiza-
tion, interaction, and the depictive possibilities of visible bodily movement. Notably
the four different examples of gestural depiction reflect the four basic techniques of
the hands for the creation of gestural signs, elsewhere termed gestural modes of repre-
sentation: acting, molding, tracing, representing (see Müller volume 2 of this handbook
and 1998a, 1998b, 2004, 2009, 2010a, 2010b).
Both arms raised, fists clenched, expressing joy and triumph.
Right hand closes eyes and covers face, expressing sadness and grief.
Clenched fists bang energetically downwards, expressing anger and even rage.
Fig. 12.7: Gestures with a dominant function of expressing emotions: raising arms expressing joy
and triumph (example 9), covering face expressing sadness and grief (example 10), banging fists
expressing anger and rage (example 11).
In the cases of banging the fist, raising the hands, and covering the face, the body movements are
symbolic – they are culturally shaped, conventionalized willful expressions of emotion.
They might have physiological and experiential roots – but they are not purely symptom-
atic forms of behavior. However, as we have mentioned above, all gestural movements
have an expressive dimension: gestures with a representational function and gestures
with a dominantly appealing function all have an expressive dimension, which resides
primarily in their movement qualities. Gestures with a dominant function of expressing
emotions also differ with regard to movement qualities; here they express differences
in the degree of joy, anger, or sadness. Their dominant function appears to be the
expression of basic emotions, while the movement qualities seem to lie more on the
side of what emotion psychology and neuro-cognitive science would characterize as
feelings (Damasio 2003).
Both hands, palm down, are moved downwards to calm the audience in the parliament.
Right index finger is placed across the lips to say “be quiet”.
‘Be quiet!’
With his flat hand oriented down and a stretched arm, a seller at the fish market waves the passers-by to come close and listen to his price offers.
Fig. 12.8: Gestures with a dominant function of appealing to interlocutors (conative or perlocu-
tionary function) (example 12: open hands moved down to calm an audience, example 13: index
across lips to request silence from interlocutor, and example 14: waving towards body asking
interlocutor to approach).
Therefore, and with this theoretical shift, we find that there is a group of rather fre-
quently used gestures that cross-cuts Bühler’s functional approach. Kendon has referred
to them as gestures used with a pragmatic function in a rather general sense as “any of the
ways in which gestures may relate to features of an utterance’s meaning that are not a
part of its referential meaning or propositional content” (Kendon 2004: 158). He distin-
guishes three main kinds of pragmatic functions: Müller’s modal and performative ges-
tures and those with a parsing function (Kendon 2004: 159). There is quite a body of
research on pragmatic gestures (see volume 2 of this handbook for an overview; Kendon
1995, 2004; Streeck 1994, 2009; Teßendorf 2008, just to mention a selection), but for now
we want to focus on one very prominent type: the performative gestures.
There are gestures whose primary function is to execute a speech-act. These gestures
function like performative verbs: when saying I swear, the action of swearing is realized;
when saying I bless you, the action of blessing is realized; when offending somebody by
expressing Fuck you, the offense is realized. Interestingly, most of these verbal perfor-
matives go along with a highly conventionalized performative gesture: swearing (with
an open palm raised vertically), blessing (sketching the Christian cross), offending
(showing somebody the extended middle finger, palm turned towards the speaker). Notably,
these gestures have a status as legal acts (swearing) or as actions that will, in certain
circumstances, be treated as an offense (the German police can charge someone with
showing the extended middle finger to another driver). Historically, some of these
verbal performatives might have been verbalizations of gestural performative acts in
the first place (see for an example of this the case of the medieval honoring gesture,
Müller and Haferland 1997, volume 2 of this handbook). Performative gestures in gen-
eral are extremely common and widespread and they are often fully conventionalized
speech acts (mostly characterized as emblems, see Teßendorf this volume).
This indicates once more that gestures are in principle functionally comparable to
verbal signs: they may re-present something other than themselves while at the
same time expressing some inner state, being addressed towards somebody, and ex-
ecuting speech acts and other communicative activities (Müller 1998b). From a more
strictly speech-act theoretical point of view (Austin 1962; Searle 1969), it is obvious
that every gesture is a communicative action: some gestures primarily express proposi-
tional content (gestures with a representational function or referential gestures), some
primarily realize illocutionary force (gestures with a performative function),
while others mainly achieve perlocutionary effects. These functional properties indicate
that gestures embody the functional and articulatory seeds of language.
5. Conclusion
Regarding gestures as a medium of expression advocates an embodied and functional
perspective on gestures. It opens up pathways to a linguistic understanding of gestures:
the understanding of gestures as linguistic derives in part from their functions in lan-
guage, and researchers can grasp it through a close analysis of gestural forms. In short,
the linguistic potential of gestures is grounded in their properties as a medium of
expression (see Bressem, Ladewig and Müller, this volume, and Müller, Ladewig and
Bressem, this volume, for more detail). To be clear, co-verbal gestures
themselves are not a language on their own; they are integrated with speech and are
part and parcel of multimodal utterances. They are embodied seeds of language, and
when developing into signed languages, this potential for language – which, of all
human articulators, only the mouth and the hands carry – comes to light.
Acknowledgment
We thank Mathias Roloff for providing the drawings (www.mathiasroloff.de) and the
Volkswagen Foundation for supporting this work with a grant for the interdisciplinary
project “Towards a grammar of gesture: evolution, brain and linguistic structures”
(www.togog.org).
A note on the presentation of the examples: we did not give a full form-meaning analysis
as proposed by the ToGoG Method of Gesture Analysis (see Müller, Ladewig and Bressem,
volume 2, and Bressem, Ladewig and Müller, this volume, for our Linguistic Annotation
System). Instead, for the illustration of the different gesture functions, we accounted for
the overall gestural form gestalt and the temporal placement of the gesture in relation
to speech.
6. References
Arbib, Michael A. this volume. Mirror systems and the neurocognitive substrates of bodily com-
munication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. New York:
Oxford University Press.
Auer, Peter 1999. Sprachliche Interaktion: Eine Einführung anhand von 22 Klassikern. Tübingen:
Niemeyer.
Austin, John L. 1962. How to Do Things With Words. Cambridge, MA: Harvard University Press.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic Annotation System
for Gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Bühler, Karl 1933. Ausdruckstheorie: Das System an der Geschichte aufgezeigt. Jena: Fischer.
Bühler, Karl 1982. Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart: Fischer. First
published [1934].
Bühler, Karl 2011. Theory of Language: The Representational Function of Language. Amsterdam:
John Benjamins.
Cienki, Alan and Cornelia Müller (eds.) 2008a. Metaphor and Gesture. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller 2008b. Metaphor, Gesture and Thought. In: Raymond W. Gibbs
(ed.), The Cambridge Handbook of Metaphor and Thought, 484–501. Cambridge: Cambridge
University Press.
Copple, Mary this volume. Enlightenment philosophy: Gestures, language, and the origin of
human understanding. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Corballis, Michael this volume. Mirror systems and the neurocognitive substrates of bodily com-
munication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig,
David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An Inter-
national Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and
Communication Science 38.1.) Berlin: De Gruyter Mouton.
Damasio, Antonio R. 2003. Looking for Spinoza. Joy, Sorrow, and the Feeling Brain. Orlando:
Harcourt.
Eschbach, Achim 1984. Bühler-Studien. Frankfurt am Main: Suhrkamp.
Jakobson, Roman 1960. Linguistics and Poetics. In: Krystyna Pomorska and Stephen Rudy (eds.),
Language and Literature, 62–94. Cambridge, MA: Harvard University Press.
Kappelhoff, Hermann and Cornelia Müller 2011. Embodied meaning construction. Multimodal
metaphor and expressive movement in speech, gesture and feature film. Metaphor in the Social
World 1(2): 121–135.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man (The Journal of
the Royal Anthropological Institute) 26(2): 199–221.
Kendon, Adam 1995. The Open Hand. Manuscript. Albuquerque.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam 2008. Signs for language origins? Public Journal of Semiotics 2(2): 2–29.
Krumhuber, Eva, Susanne Kaiser, Arvid Kappas and Klaus R. Scherer this volume. Body and
speech as expression of inner states. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguis-
tics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Leroi-Gourhan, André 1984. Hand und Wort: Die Evolution von Technik, Sprache und Kunst.
Frankfurt am Main: Suhrkamp.
Mauss, Marcel 1934. Les techniques du corps. Journal de Psychologie 32(3–4): 271–293.
McNeill, David this volume. Gesture as a window onto mind and brain, and the relationship to
linguistic relativity and ontogenesis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguis-
tics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia 1998a. Iconicity and Gesture. In: Serge Santi, Isabelle Guaitella, Christian Cavé
and Gabrielle Konopczynski (eds.), Oralité et Gestualité: Communication Multimodale, Interac-
tion, 321–328. Montréal: L’Harmattan.
Müller, Cornelia 1998b. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Müller, Cornelia 2004. Forms and Uses of the Palm Up Open Hand: A Case of a Gesture Family?
In: Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
234–256. Berlin: Weidler.
Müller, Cornelia 2007. A dynamic view on gesture, language and thought. In: Susan D. Duncan,
Justine Cassell and Elena T. Levy (eds.), Gesture and the dynamic dimension of language. Es-
says in honor of David McNeill. Amsterdam: John Benjamins.
Müller, Cornelia 2008a. Metaphors Dead and Alive, Sleeping and Waking. A Dynamic View. Chi-
cago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 219–245. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and Language. In: Kirsten Malmkjaer (ed.), The Linguistic Ency-
clopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Christiane Voss and Martin Vöh-
ler (eds.), Die Mimesis und ihre Künste, 149–187. Munich: Wilhelm Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia volume 2. Gestural modes of representation as techniques of depiction. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana
Bressem (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.2.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Harald Haferland 1997. Gefesselte Hände: Zur Semiose performativer
Gesten. Mitteilungen des Deutschen Germanistenverbandes 3: 29–53.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Towards a grammar of gesture:
A form based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
13. Conversation analysis
Abstract
This chapter describes the perspective of Conversation Analysis on embodied action as it
is intertwined with talk. Conversation Analysis works with naturalistic video data doc-
umenting the endogenous organization of social activities in their ordinary settings. Its
purpose is to study social interaction as it is organized collectively by the co-participants,
in a locally situated way, and as it is built incrementally through its temporal and sequen-
tial unfolding, by mobilizing a large range of vocal, verbal, visual and embodied re-
sources, which are publicly displayed and monitored in situ. This chapter explicates
some features of this approach. First, in order to analyze the organization of social inter-
action as an indexical, contingent, and emergent accomplishment, Conversation Analysis
adopts a methodology for collecting data which is based on fieldwork and video record-
ings in naturalistic settings. Second, these data are transcribed in a detailed way, taking
into consideration the embodied features as they are made relevant by the participants –
constituting the multimodal resources they use for the emergent and local organization
of their action. These multimodal resources include not only gesture but also body pos-
ture and body movement. Third, the analysis of these data focuses particularly on the sequen-
tial organization of action, which shapes the temporality of turns, sequences and
activities.
1. Introduction
Conversation Analysis (CA) looks at the endogenous organization of social activities in
their ordinary settings: it considers social interaction as it is organized collectively by
the co-participants, in a locally situated way, and as it is built incrementally through
its temporal and sequential unfolding, by mobilizing a large range of vocal, verbal,
visual and embodied resources, which are publicly displayed and monitored in situ.
The analysis of these features focuses on action as an indexical, contingent, and emer-
gent accomplishment. This has some consequences for how the empirical work is con-
ceived and the data are collected, i.e. on what is observed and how it is documented.
Conversation analysis’ naturalistic approach demands the study of “naturally occurring
activities” as they unfold in their ordinary social settings. This highlights the fact that for
a detailed analysis of their relevant endogenous order, recordings of actual situated
activities are necessary.
For studying co-present interaction with sound recording alone risked missing embodied
resources for interaction (gesture, posture, facial expression, physically implemented ongo-
ing activities, and the like), which we knew the interactants wove into both the production
and the interpretation of conduct, but which we as analysts would have no access to. With
the telephone data, the participants did not have access to one another’s bodies either, and
this disparity was no longer an issue.
Nevertheless, video began to be used very early on in the history of conversation ana-
lysis. As early as 1970, Charles and Marjorie Harness Goodwin in Philadelphia carried
out film recordings of everyday dinner conversations and other social encounters. After
1973, these recordings were used by Jefferson, Sacks and Schegloff in research seminars,
and then also in published papers. In 1975, at the Annual Meeting of the American
Anthropological Association, Schegloff presented a paper co-authored with Sacks,
who had been killed a few weeks earlier in a car accident, on “home position”. This
was an early attempt to describe bodily action systematically. In 1977, Charles Goodwin
presented his dissertation at the Annenberg School of Communications in Philadelphia
(later published as C. Goodwin 1981). The dissertation was based on approximately
50 hours of videotaped conversations in various settings (C. Goodwin 1981: 33).
Thus early work by both Emanuel Schegloff (1984) and Charles Goodwin, as well as
by Christian Heath (1986), used film materials in order to analyze and understand how,
in co-present interaction, humans mobilize orderly and situatedly a large range of verbal,
auditory and visual resources in order to produce intelligible – accountable – actions, as
well as to interpret publicly displayed and mutually available actions (Streeck 1993). In
an important way, this early work was convergent with some of the assumptions made
by pioneers in gesture studies, such as Kendon (1990) and McNeill (1981), who had ar-
gued that gesture and talk are not separate “modules” for communication but originate
from the very same linguistic, cognitive and social mechanisms. Further developments of
conversation analysis within the 1980s were characterized by an increasing interest in
data gathered in institutional settings, opening up a program of comparative studies of
speech exchange systems which differ from conversation and are characterized by distinc-
tive and restrictive speakers’ rights (cf. Drew and Heritage 1992: 19; Sacks, Schegloff, and
Jefferson 1974: 729). Studies of institutional settings from the 1980s onwards cover,
among other things, court hearings, medical interactions in primary care, and news inter-
views, partially recorded on video. After the identification of the fundamental structures
of turn-taking in ordinary conversation – considered as the basic form of intersubjective
relation in the social world, the primary form of interaction to which the child is exposed
and socialized, and thus the fundamental locus for the emergence, development and
change of language – these studies engaged in a comparison between conversation and
institutional talk, the latter being seen as presenting more formal, restricted, specialized,
and constrained sets of practices for the organization of action (Drew and Heritage 1992).
Within this group of work, Heath (1986) was one of the few authors explicitly to address
multimodal features of social interaction documented by video data – such as gaze, ges-
ture, body posture, movement, manipulation of artifacts, spatial arrangements, etc. These
features become central in the so-called workplace studies emerging in the 1990s, in
which video data are central for the documentation of complex professional activities
which are characterized by the use of technologies and artifacts and by a variety of spatial
distributions of the participants. Two projects play an important role in this context: the
first is the study of an airport initiated by L. Suchman at Xerox PARC, which includes
intensive videotaping of various locations, using sometimes as many as seven cameras
focusing on various persons at work, but also producing close-up views of the documents
and screens they were using (C. Goodwin and M.H. Goodwin 1996: 62; Suchman 1993).
The second is the study of the line control rooms of the London Underground initiated by
C. Heath (Heath and Luff 1992; Heath and Luff 2000), focusing on the coordination of
multiple activities, co-present and distant. Other settings studied included an archaeological
field site (C. Goodwin 1994, 1997) and a scientific ship (C. Goodwin 1995). Later on, numerous
studies followed, on surgery (Koschmann et al. 2007; Mondada 2003, 2007b), television
studios (Broth 2008, 2009), emergency calls and call centres (Fele 2008; Mondada 2008;
M. Whalen and Zimmerman 1987; J. Whalen 1995), pilots’ cockpits (Nevile 2004), etc.
This work implemented new ways of recording data, prompting a reflection about
the way in which data were collected (C. Goodwin 1993; Heath 1997; Heath, Hindmarsh,
and Luff 2010; Mondada 2006b), and about the complexity of embodied action, includ-
ing not only gesture but also object manipulations (Streeck 1996), use of documents, and
use of technologies, as well as complex forms of multi-party interaction, beyond mutu-
ally focused encounters and within peculiar ecologies and environments, all demanding
peculiar techniques of documentation and often more than one camera.
Some studies focus on one of these details and explore its systematic character: for
example, Schegloff (1984) studies gestures produced by speakers, C. Goodwin (1981)
absent vs. mutual gaze prompting re-starts at the beginning of the turn, Mondada
(2007a) pointing as displaying the imminent speaker’s self-selection, Stivers (2008)
nods as expressing affiliation in storytelling, and Peräkylä and Ruusuvuori (2006) facial
expressions as manifesting alignment and affiliation in assessing sequences.
Other studies consider the coherent and coordinated complexity of embodied con-
duct: for example, Heath (1989) considers together gaze, body posture and body ma-
nipulations, Streeck (1993) gesture and gaze, and Mondada and Schmitt (2010) and
Hausendorf, Mondada, and Schmitt (2012) the coordination of a range of multimodal
resources, from gesture to body position and the distribution of bodies in space. This
emphasis on global Gestalts also invites researchers to investigate the entire body
and its adjustments to other bodies in its environment, taking into account object ma-
nipulations and body movements within the environment (C. Goodwin 2000). Recently,
the consideration of mobility in interaction, comprising walking, driving, and flying,
has emphasized the importance of considering the entire body (Haddington, Mondada,
and Nevile 2013).
What emerges from these studies is the necessity of going beyond the study of sin-
gle “modalities” coordinated with talk, and to take into consideration the broader em-
bodied and environmentally situated organization of activities (Streeck, Goodwin, and
LeBaron 2011).
Charles Goodwin’s research stands as a prime example of a multimodal interactional
perspective: he has done pioneering work in conceptualizing social action in a multimo-
dal environment, and this work has later been developed in a rich diversity of terrains in
workplace studies. His early work shows how the coordination of gaze, as well as the
mutual availability of the participants, has configuring effects on turn and sequence con-
struction (C. Goodwin 1981). Later on, he shows that, in addition to talk and embodiment,
it is crucial to consider other “semiotic fields”, such as the environment and the material
artifacts that surround interactants, which constitute “contextual configurations”, and to
look at how these are used as resources for producing and recognizing actions and for ac-
complishing meaningful activities (C. Goodwin 2000). For example, C. Goodwin (2000)
shows how during a game of hopscotch a player can use the hopscotch grid as a multi-
modal resource for producing and challenging action, e.g. accusing the other player of
throwing a beanbag in the wrong square and thereby of having breached the rules. In
another example, C. Goodwin (2003) provides a detailed analysis of how a graphic struc-
ture in the soil, on which two archaeologists focus their attention, acts as a crucial
resource in the production of the complex embodied activity in which they are engaged.
C. Goodwin argues that the structure (in a similar way to the grid above) “provides orga-
nization for the precise location, shape and trajectory of the gesture” (C. Goodwin 2003:
20). Consequently, different modalities (language, gestures and the body) and the envi-
ronment elaborate upon each other, are mutually interdependent, and are relevant to
the organization of an activity in interaction.
construction of their turns, which are formatted online, in an emergent way, taking into
consideration the responses of the co-participants (C. Goodwin 1979) and the contingen-
cies of the interactional context. Turns unfold in a systematic way, based on the projec-
tions and the normative expectations characterizing the organization of the sequence,
first within the fundamental structure of the adjacency pair, constituted by a first pair
part making relevant and expectable a second pair part (as in the pair question/answer,
Schegloff and Sacks 1973), and second within possible pre-sequences and post-expansions
which make it more complex (Schegloff 2007). Participants mobilize all the resources at
hand for the intelligible, mutually accountable organization of the sequentiality of inter-
action; consequently, all the levels of sequential analysis have been explored, showing
that sequentiality is configured not only by talk but also by a range of embodied
resources.
Turn-construction and turn-taking basically rely on multimodal resources for the
organization of recognizable unit completions as well as transition-relevance places
(Lerner 2003). Speakers can display their imminent self-selection by a range of multi-
modal resources used as “turn-entry devices” – such as the “[a]-face”, the palm-up gesture
(Streeck and Hartge 1992), the pointing gesture (Mondada 2007a) or other complex
embodied manifestations (Schmitt 2005) – to show that the person is about to speak.
Speakers also multimodally display the completion of turn-constructional units
(TCUs), as well as of turns, projecting completion by the trajectory of their gesture
or its retraction (Mondada 2007a), which is used as a “turn-exit device”. More generally,
Schegloff (1984: 267) suggests that the pre-positioning of gestures is a way of creating a
“projection space” within an ongoing utterance. Turns can also be expanded, not only by
adding syntactically fitted materials, but also gesturally (C. Goodwin 1979, 1981). This in-
cludes gestures achieving the collaborative construction of turns, in a similar way to the
collaborative construction of utterances (Bolden 2003; Hayashi 2005; Iwasaki 2009).
Not only turn-construction but also sequence organization relies on multimodal
resources, as demonstrated by studies on assessments (C. Goodwin and M.H. Goodwin
1987; Haddington 2006; Lindström and Mondada 2009), on word searches (M.H. Goodwin
and C. Goodwin 1986; Hayashi 2003), and on repair (Fornel 1991; Greiffenhagen and
Watson 2007), not to speak of sequences co-constructed by aphasic speakers and their
partners (C. Goodwin 2004).
Opening and closing sequences, as well as transition sequences from one activity to
another, are typically organized in an embodied way. The opening of an interaction is
achieved not only by the first words spoken or the response to a summons, but, even
before participants begin to speak, by an adjustment and arrangement of their bodies
in the material environment, in such a way that an “F-formation” (Kendon 1990) appro-
priate for the imminent activity is constituted. This in turn prompts the participants to
progressively organize and assemble their bodies within the local environment, building
the relevant “interactional space” of the encounter (Mondada 2005, 2009; Hausendorf,
Mondada, and Schmitt 2012). The participants can even be engaged in different courses
of action – or in multi-activity – as displayed, for example, by “body-torqued” postures
(Schegloff 1998), in which the upper part of the body is oriented towards a particular
participation framework and the lower part to another set of relevances. Conversely,
towards the closings of the encounter, the interactional space dissolves (De Stefani
2006, 2010; Robinson 2001). The importance of bodily movements in transitions be-
tween one episode of interaction and another, often concomitant with the manipulation
of objects and artifacts, also displays the embodied orientation of the participants
towards the organization of the interaction (Heath 1986; Modaff 2003; Mondada
2006a).
More generally, turns and sequences are closely monitored in the interactive con-
struction of actions; for example, co-participants constantly orient to recipiency
(M.H. Goodwin 1980), mutual attention (Heath 1982), and the achievement of mutual
understanding (Koschmann 2011; Sidnell 2006). This in turn opens up different oppor-
tunities to participate (C. Goodwin and M.H. Goodwin 2004), both in everyday situa-
tions (see the analyses of by-play by M.H. Goodwin 1997, or of stance and participation
by C. Goodwin 2007) and in professional ones (see the analyses of embodied participa-
tion in meetings by Ford 2008; Markaki and Mondada 2011; and Deppermann, Schmitt,
and Mondada 2010).
5. Conclusion
Conversation analysis is engaged in the naturalistic study of social interaction. It focuses
on the local mobilization of multimodal resources (gesture, gaze, head movements,
facial expressions, body postures and body movements) by the participants, who
thereby organize the public accountability and finely-tuned coordination of their ac-
tions. Audible and visible resources, relying on language and the body, are exploited –
and also transformed in their exploitation – in both a situated and a systematic way
within sequentiality. Sequentiality concerns turn design, the building of sequences,
action formats and the organization of larger activities.
On the basis of video recordings of naturally occurring social activities, the conver-
sation-analytic multimodal approach to social interaction has included an increasingly
wide and complex range of resources in accounting for different levels of organization,
starting from gesture and gaze, then integrating head movements and facial expressions,
and finally taking more and more seriously the issues of position, arrangement and
movement of bodies in interaction. The analysis of these complex Gestalts deals not
only with everyday face-to-face conversation, but also with ordinary mobile interac-
tions – such as walking together – and interactions among larger groups of participants
like debates or public assemblies. It also describes interaction in activities mediated by
technologies and by the manipulation of objects, artifacts and tools.
In this sense, conversation analysis deals with the study of the complexity of human
activity, including the widest range of multimodal resources, as they are considered,
produced and interpreted locally by the participants engaged in action.
6. References
Bolden, Galina B. 2003. Multiple modalities in collaborative turn sequences. Gesture 3(2):
187–212.
Broth, Mathias 2008. The studio interaction as a contextual resource for TV-production. Journal of
Pragmatics 40(5): 904–926.
Broth, Mathias 2009. Seeing through screens, hearing through speakers: Managing distant studio
space in television control room interaction. Journal of Pragmatics 41: 1998–2016.
Deppermann, Arnulf, Reinhold Schmitt and Lorenza Mondada 2010. Agenda and emergence: Con-
tingent and planned activities in a meeting. Journal of Pragmatics 42: 1700–1712.
Greiffenhagen, Christian and Rod Watson 2007. Visual repairables: Analyzing the work of repair
in human-computer interaction. Visual Communication 8(1): 65–90.
Haddington, Pentti 2006. The organization of gaze and assessments as resources for stance taking.
Text & Talk 26(3): 281–328.
Haddington, Pentti, Lorenza Mondada and Maurice Nevile (eds.) 2013. Mobility and Interaction.
Berlin: De Gruyter.
Hausendorf, Heiko, Lorenza Mondada and Reinhold Schmitt (eds.) 2012. Raum als interaktive
Ressource. Tübingen: Narr.
Hayashi, Makoto 2003. Joint Utterance Construction in Japanese Conversation. Amsterdam: John
Benjamins.
Hayashi, Makoto 2005. Joint turn construction through language and the body: Notes on embodi-
ment in coordinated participation in situated activities. Semiotica 156(1/4): 21–53.
Heath, Christian 1982. The display of recipiency: An instance of sequential relationship between
speech and body movement. Semiotica 42: 147–167.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heath, Christian 1989. Pain talk: The expression of suffering in the medical consultation. Social
Psychology Quarterly 52(2): 113–125.
Heath, Christian 1997. Analysing work activities in face to face interaction using video. In: David
Silverman (ed.), Qualitative Research. Theory, Method and Practice, 266–282. London: Sage.
Heath, Christian, Jon Hindmarsh and Paul Luff 2010. Video in Qualitative Research. London:
Sage.
Heath, Christian and Paul Luff 1992. Collaboration and control: Crisis management and multime-
dia technology in London Underground Line Control Rooms. Journal of Computer Supported
Cooperative Work 1(1–2): 69–94.
Heath, Christian and Paul Luff 2000. Technology in Action. Cambridge: Cambridge University
Press.
Heritage, John C. 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Iwasaki, Shimako 2009. Initiating interactive turn spaces in Japanese conversation: Local projec-
tion and collaborative action. Discourse Processes 46: 226–246.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Koschmann, Timothy (ed.) 2011. Understanding Understanding. Special issue of Journal of Prag-
matics 43.
Koschmann, Timothy, Curtis LeBaron, Charles Goodwin, Alan Zemel and Gary Dunnington
2007. Formulating the triangle of doom. Gesture 7(1): 97–118.
Lerner, Gene H. 2003. Selecting next speaker: The context-sensitive operation of a context-free
organization. Language in Society 32: 177–201.
Lindström, Anna and Lorenza Mondada (eds.) 2009. Research on Language and Social Interaction
42(4). (Special issue on assessments.)
Markaki, Vassiliki and Lorenza Mondada 2011. Embodied orientations towards co-participants in
multinational meetings. Discourse Studies 13(6): 1–22.
McNeill, David 1981. Action, thought, and language. Cognition 10: 201–208.
Modaff, Daniel P. 2003. Body movement in the transition from opening to task in doctor-patient
interviews. In: Philipp Glenn, Curtis D. LeBaron and Jenny Mandelbaum (eds.), Studies in
Language and Social Interaction: In Honor of Robert Hopper, 411–422. Mahwah, NJ: Erlbaum.
Mondada, Lorenza 2003. Working with video: How surgeons produce video records of their ac-
tions. Visual Studies 18(1): 58–72.
Mondada, Lorenza 2005. La constitution de l’origo déictique comme travail interactionnel des
participants: Une approche praxéologique de la spatialité. Intellectica 2–3(41–42): 75–100.
Mondada, Lorenza 2006a. Participants’ online analysis and multimodal practices: Projecting the
end of the turn and the closing of the sequence. Discourse Studies 8: 117–129.
Suchman, Lucy 1993. Technologies of accountability: Of lizards and airplanes. In: Graham Button
(ed.), Technology in Working Order: Studies of Work, Interaction and Technology, 113–126.
London: Routledge.
Whalen, Jack 1995. Expert systems versus systems for experts: Computer-aided dispatch as a
support system in real-world environments. In: Peter J. Thomas (ed.), The Social and Interac-
tional Dimensions of Human-Computer Interfaces, 161–183. Cambridge: Cambridge University
Press.
Whalen, Marilyn and Don H. Zimmerman 1987. Sequential and institutional contexts in calls for
help. Social Psychology Quarterly 50: 172–185.
Abstract
This article gives an overview of studies that approach the body and its role for commu-
nication and cultural practices from an ethnographic perspective. First, the ethnographic
perspective itself is presented as a cultural practice that dates back to Antiquity and con-
sists in describing alien bodies and communicative practices. Secondly, the article presents
the early anthropological interest in cultural differences in the uses and representations of
the body that started with studies by Franz Boas in the US and Marcel Mauss in France.
Most innovatively, Gregory Bateson has begun to use videotaping as empirical starting
point, thus deviating from the literary orientation of classical ethnography. Thirdly, the
article provides a summary of newer developments in the field of anthropology that
address the body and its cultural production from the perspectives of ritual, medical and
political anthropology as well as from the anthropology of the senses. Fourthly, ethnogra-
phers have often reported that bodies have become the object of estheticization in art, but
also through bodily mutilation. Fifthly and finally, since bodies are extensively used for
communicative purposes, the article summarizes recent findings of cultural differences in
gestural communication detected through the detailed studies of video footage.
1. Ethnography as a method
As a rule, the term “ethnography” refers to the practices and products of description of
alien people and to their folkways and cultural achievements. Commonly, Herodotus of
Halicarnassus, who in his “Histories” described, amongst others, the appearances and customs of foreign peoples, is considered the first ethnographer.
The ethnographic descriptions that resulted from Hymes’ appeal addressed all kinds
of topics that a “speech event” might encompass. The methodological strategy of focusing
on speech events entails that it is neither actors, nor structures, nor situations, nor types of
behavior, but realized instances of verbal exchange that are viewed as the units of
inquiry. Gumperz has defined speech events as “units of verbal behavior bounded in
time and space” (Gumperz 1982: 165) and “longer strings of talk each of which is
marked by a beginning, middle and an end” (Gumperz 1992: 44).
In order to methodologically approach speech events, Hymes (1972: 58–65), in his
famous mnemonic S-P-E-A-K-I-N-G acronym, has named eight elements to be focused
on: Setting, Participants, Ends, Acts, Keys (in the Goffmanian sense as framing de-
vices), Instrumentalities, Norms, and Genres. Thus, the ethnography of speaking and
communication was projected in a broad fashion so as to encompass bodily practices
as well.
Most of the ethnographic descriptions produced in this tradition provide detailed ac-
counts of speech events and genres, as well as of models and taxonomies of speaking
concepts. But, as critical voices have objected, they failed to advance to the comparative
level of the Hymesian endeavor and lost themselves in providing detailed descriptions
(see Duranti 1988; Keating 2001). In contrast to conversation analysis, the ethnography
of speaking adopts a relativist stance, resisting attempts at generalization with counter-
examples from the people studied. The ethnography of speaking
has also been criticized for a non-empiricist stance insofar as its studies only seldom
actually quote unedited talk. As Moerman (1988: 10) states, for example, in Bauman
and Sherzer’s seminal volume (1974), only two chapters do so, one of them being
provided by conversation analyst Harvey Sacks.
Only in recent years, especially in a new approach called “micro-ethnogra-
phy,” have recordings and thorough analyses of situations unaffected by the researcher
also been included as data in ethnographic descriptions. The term “micro-ethnography”
had first been used by scholars who aimed at studying “ ‘big’ social issues through an
examination of ‘small’ communicative behaviour” (LeBaron 2005: 494; cf. Streeck
and Mehus 2005: 381), as present, for example, in the moment-by-moment behaviors
whereby social stratification was accomplished in the classroom (firstly Smith 1966).
That is, micro-ethnographers pay thorough attention to the “small means whereby
events were jointly accomplished” which they see as “the building blocks of micro-
cultures enacted and constituted collectively” (LeBaron 2005: 494). “Micro-ethnography”
today has adopted an ethnomethodological and praxeological orientation (Streeck and
Mehus 2005: 385–386, 387–393) and aims at identifying the methods of establishing
order and the means of co-constructing meaning in the minutiae of social interaction.
Thus, in contrast to classical ethnography, which has a long-standing tradition of in-
cluding the ethnographer’s subjectivities and literary skills in ethnographic descriptions,
the micro-ethnographic approach is radically empiricist. It tries to base any analysis on
concrete hard data provided in particular by video recordings and complemented by ob-
servations, interviews, and document research. Communication and interaction are con-
sidered as constitutive for social reality. The analysis of such processes, constituted by
the visible and audible behaviors of social actors and embedded in a specific social
and material environment, is therefore its chief goal. Human bodies in interaction
are viewed not as intentional actors or physical apparatuses that enclose inner states,
but as living persons who are enculturated and socialized, thus embodying cultural
the index-finger through the buttonhole of the coat of the interlocutor) along with
idiosyncratic “contactual” and other gestures (Efron 1972: 91–92).
In France, Durkheim (1912) emphasized that the human body is the central object
of society. According to him, society is only able to exist as a collective entity because
it is able to appropriate the sensing bodies of its individuals. This is mainly achieved
through rituals in which society conveys specific sentiments to the bodily individual
while the human individual gains experiences that convince it of the power and profit
of society. Durkheim’s student, Marcel Mauss ([1935] 1973: 73), even more explicitly
treated the human body as the “main locus of culture” and drew attention to the phe-
nomena of its cultural training, which he, preceding Bourdieu, called “habitus.” His
ideas became central for any ethnography of the body: Cultures develop, as Mauss
(1973: 75) views it, “techniques of the body,” using the body both as “man’s first and
most natural technical object, and at the same time technical means.” Every act, how-
ever asocial, or even antisocial, is culturally laden; “there is perhaps no ‘natural way’
for the adult” (1973: 74). For example, “[t]he positions of the arms and hands while
walking form a social idiosyncrasy, they are not simply a product of some purely
individual, almost completely psychical arrangements and mechanisms” (Mauss 1973:
72). Concerning embodied knowledge and abilities, as Mauss (1973: 83) states, the anthropologist reaches the limits of understanding: “Nothing makes me so dizzy as
watching a Kabyle going downstairs in Turkish slippers (babouches). How can he
keep his feet without the slippers coming off? I have tried to see, to do it, but I can’t
understand.”
Thus, with his new attention to the body as instrument of “cultivation,” Mauss iden-
tified a whole “new field of studies: masses of details which have not been observed, but
should be, constitute the physical education of all ages and both sexes” (1973: 78).
Mauss, however, focused on the “techniques of the body” only in singular activities and, apart from military marching in formation, did not pay attention to processes in which several bodies are coordinated in their activities. He largely neglected those practices in which
processes of intersubjectivation are generated and performed by the socialized and
enculturated body.
Another student of Durkheim’s, Robert Hertz (1907), conducted an early study – anticipating structuralist approaches – of the power of collective representations to unite groups. Drawing on funeral rituals, he developed the theory that these representations often refer to universal body schemata such as right and left. These obvious
physical orderings are subsequently transferred to culture-specific symbolic meanings
such as good and bad, right and wrong, virtuous and sinful, endowing them with an
apparent naturalness.
From a methodological perspective, it is noteworthy that ethnographic film was used in France much earlier than it was by Boas in the US. In fact, its ethnographic use is as old as cinema itself: in 1895, the same year the Lumière brothers held the world’s first public film screenings, the French physician Félix-Louis Regnault filmed the pottery-making techniques of a Wolof woman at the Exposition Ethnographique de l’Afrique Occidentale in
Paris. Subsequent films were devoted to the cross-cultural study of movement: climbing
a tree, squatting, walking, by Wolof, Fulani, and Diola men and women. Regnault also
published a methodological paper, which differentiated his aims from those of the Lu-
mière brothers (Regnault 1931). In contrast to them, Regnault regarded the camera as a
scientific instrument that was able to fix transient human events for further analysis.
He went so far as to claim that ethnography only attains the precision of a science
through the use of such instruments (MacDougall 1998: 179) and also proposed the for-
mation of anthropological film archives (Brigard 1975: 15–16). However, it took some
time before these ideas were realized.
In the US-American tradition, Boas’ initiative to study the body from a non-
biological perspective was continued, for one, by Erving Goffman and his interactionist
sociology (1963, 1967). Goffman, however, stood in an even closer relationship to the
traditions of the Chicago School (Park, Hughes), Durkheim, and Julian Huxley
when he studied the dramaturgies of the human persona and its body in public and private spaces, in total institutions and loose encounters, thereby focusing on face-to-face
interaction and applying ethnographic methods.
More directly related to Boas was the work of Gregory Bateson and Margaret Mead (1942): Mead had been a student of Boas, and Bateson had studied with Alfred Haddon, who was the first to use the technique of filming in the field during the famous Torres Straits Expedition (1898/1899) (Brigard 1975: 16). Bateson
and Mead were the first to fully exploit the potentials of visual methods in ethno-
graphic research in what they termed an “experimental innovation” (1942: xi).
They took thousands of photographs and shot footage for several ethnographic films in a Balinese village in order to use these materials as a primary research method and not, as had sometimes been done before, as visual illustrations for written accounts. By artfully arranging these photographs in their monograph and by examin-
ing their films frame by frame, they attempted to reveal the unspoken, even the
unspeakable aspects of culture so as eventually to gain knowledge of what it means
and how it feels to be a Balinese body, and to engage with the world through Bali-
nese senses (cf. Streeck n.d.). Some time earlier, Bateson (1936) had used the term
“ethos” for this tacit bodily sociality and culturality.
The experimental methodological endeavor in their joint book had grown out of
their dissatisfaction with the established descriptive ethnography that they found
“far too dependent upon idiosyncratic factors of style and literary skill” (Bateson
and Mead 1942: xi). Their own project, in contrast, was designed to help solve the difficulties of the ethnographic method “to communicate those intangible aspects of culture which had been vaguely referred to as its ethos” (Bateson and Mead 1942: xi; orig. emph.). As they found many cultural phenomena in Bali to be tacit (most prominently, perhaps, the practices of education, to which they dedicated a significant part of their book), Bateson and Mead felt that they had found in visual analysis a way to show how the Balinese, “through the way in which they, as living persons, moving, standing, eating, sleeping, dancing, and going into trance, embody culture, how gesture and posture are expressive of a people’s character” (Bateson and Mead 1942: xii).
Bateson was later an important part of what came to be known as the “Natural His-
tory of an Interview” (NHI) group. This group formed around the project of experimentally interpreting a recording of a psychiatric interview (provided by Bateson) from a range of different disciplinary perspectives, including psychiatry, linguistics, and anthropology. Its members paid attention not only to what was said in this interview, but particularly to how it was said, including bodily forms of communication.
A number of approaches that later on became quite influential, such as, for example,
Ray Birdwhistell’s “kinesics,” Albert Scheflen’s “context analysis,” Adam Kendon’s
3. Newer developments
A branch of research that – following Durkheim – took an interest in the human body and its practices relatively early is ritual studies. Particularly after the focus of ritual studies turned away from questions of meaning towards practice (see Bell 1992), students of ritual began to include bodily practices in their scope of
research. For example, bodily practices in initiation rituals (such as “ritualized homo-
sexuality” [Herdt 1993] in Melanesian societies), the dissolution of bodily integrity in
illness and trance healing (as, e.g., breath, pain, and emotion in the shamanistic prac-
tices of the Sherpa in the Nepal Himalayas, cf. Desjarlais 1992) or the transformation
of living bodies into dead matter (Connor 1995) were studied.
Ritual studies triggered the creation of the anthropology of performance in the 1960s, which, after decades in which mental life had been granted theoretical privilege, focused on the bodily-material aspects of culture, such as dance and movement, but also on
face-to-face interaction, verbal practices, as well as acting and theatrical performance
(Schechner 1988). Ethnographers of performance often study aesthetic languages of
bodies across cultural divides. Hahn (2007), in a study of traditional Japanese
dance, pays particular attention to the ways of body-to-body transmission of tacit
knowledge, and how they influence the sense of self. She argues that bodily knowledge contributes to the construction of “boundaries of existence” that define physical and social worlds. Sometimes, bodily practices are related to memory. Nelson (2008), for example, examines how contemporary Okinawan storytellers, musicians, and
dancers engage with the legacies of a cruel Japanese colonial era and the devastations
of World War II by recalling memories, restructuring bodily experiences and practices,
and rethinking actions as they work through the past in order to re-arrange the
present.
A second contemporary anthropological sub-discipline interested in bodily practices
is medical anthropology. Its main objective consists in describing culture-specific as-
sumptions about normality and the conceptualization of health and illness, thus arguing
against the universal reification of disease entities advanced by the natural sciences.
Medical anthropology ethnographically analyses ideas about health (often including
or even centering on psychic health) as well as the healthy body, its substances, and
its epistemic state (McCallum 1996). Scholars such as Turner (1968) insist that the effi-
cacy of healing rituals is grounded in the cultural framing of subjectively experienced
sensations. To this effect, Farmer, e.g., interprets “bad blood” and “spoiled milk” in
rural Haiti as “moral barometers that submit private problems to public scrutiny”
(Farmer 1988: 62). Health and illness are thus sometimes seen as extensions of the
individual body to society and its morals (consider, e.g., discussions about AIDS as
divine retribution).
A third topic addressed by the ethnography of the body concerns the ways in which human bodies are positioned in space. Especially in hierarchical societies,
these positionings are guided by norms that imply clear social roles endowed with rights
and duties of expression (see Duranti 1994; Wolfowitz 1991). Moreover, as Duranti
(1992) has stated, words, body movements, and lived space are sometimes – as in West-
ern Samoa – interconnected in interactional practice. The words used in ceremonial
greetings, as he says, cannot be fully understood without reference to sequences of
acts that include bodily movements in reference to symbolically laden space, as the per-
formance of ceremonies is located in and at the same time constitutive of the sociocul-
tural organization of space inside the house. According to Duranti’s theory, Samoan
practices of “sighting” embody language and space through “an interactional step
whereby participants not only gather information about each other and about the set-
ting but also engage in a negotiated process at the end of which they find themselves
physically located in the relevant social hierarchies and ready to assume particular
institutional roles” (Duranti 1992: 657).
Thus, by way of cross-cultural comparison, the anthropology of the body has identified variations in body concepts and practices and their relation to culture. Often its findings also entail a critique of Western epistemological categories, such as those inherent in the Cartesian dualism of body and soul, which have proved not to be universal (cf. Lock 1993).
Further and more recent studies on the body include such topics as the cultural con-
struction of the self and the other, the forms and functions of emotions, specific cultures
of subjectivation, concepts and constructions of gender, as well as modalities and forms
of power and resistance as they are embedded (or, to put it in the Foucaultian jargon of
these approaches, “inscribed”) in the body (cf., e.g., Butler 1993; Lock 1993). “Body politics” has thus developed into an important concept in these anthropological orientations, even though the methods applied in these fields often do not meet the standards expected by scholars who are used to dealing with detailed transcriptions of social events (cf. Antaki et al. 2003).
Part of the newer anthropology of the body is, furthermore, the anthropology of the
senses that has emerged since the 1990s. In reaction to media studies that claimed the
superiority of cultures that structurally emphasize the visual sense (cf. Classen 1997),
anthropological studies were conducted that explore the whole range of sensory practices in other cultures, focusing on the tactile, gustatory, and olfactory but also on the visual and auditory realm (cf. Howes 1991, 2004). They even study variations of the “sixth sense” (Howes 2009) and contrast the Aristotelian “five-sense sensorium”
with different cultural models of up to 17 senses (Synnott 1993: 155). Some of these stu-
dies also explore the usage of the senses within practical activities in a very detailed
manner (see, e.g., Goodwin 1994).
Most recently, and with great methodological rigor, developments in medical technology – for example in surgery or genetic manipulation – that have influenced the ways in which bodies are conceptualized have become an important topic for ethnographers of the body. Some of these studies address the ways bodies are dissociated and reconfigured in the operating room when part of their expressive functions is delegated to technologies (e.g. Mol 2001). Other studies deal with changes in body concepts that are
microethnography. The “multimodal turn” in the ethnography of the body, if it has taken place at all, came relatively late. Ethnography, as the most important method of practice research, is constantly confronted with two dangers: that of relying excessively on the knowledge of the researcher as an actor (which in the end amounts to using oneself as an informant), and that of restricting oneself to positive behavioral data without any access to the subjective meanings of the actors (which in the end leads back to a behavioristic position). Micro-ethnography today, through combining both methods, is therefore a rather complex, laborious, and time-consuming endeavor (cf. Farnell 1994). This might be the reason why it has been applied only very tentatively in research on other cultures, where the acquisition of a foreign language and of the corresponding body practices is additionally required, given that micro-ethnography is already intricate enough in the ethnographers’ own culture.
Even though there are a few studies which compile gestures in non-Western cultures (e.g., Baduel-Mathon 1971; Creider 1977), ethnographically contextualized studies of the inherently embedded nature of bodily forms of communication that are well grounded in empirical data gained from video recordings remain exceptional to this day. There are, however, some ethnographic, cross-cultural studies on gesture, for example on pointing gestures such as lip- and eye-click-pointing (Enfield 2001; Orie 2009; Sherzer 1973) or absolute and relative directional space (Haviland 1993, 2000), as well as on gesture taboos (Kita and Essegbey 2001), emblematic gestures (Brookes 2001, 2004; Sherzer 1991), co-speech gesturing (Eastman 1992; Enfield 2003), iconic gestures (Enfield 2005; Haviland 2007; Streeck 2009), and individual gestures (Streeck 2002). Most recently, ethnographic studies grounded in video data have begun to explore the interactional and practical consequences of culture-specific concepts of the senses (Meyer 2011).
6. References
Althoff, Gerd 2002. The variability of rituals in the middle ages. In: Gerd Althoff, Johannes Fried
and Patrick J. Geary (eds.), Medieval Concepts of the Past. Ritual, Memory, Historiography,
71–87. Cambridge: Cambridge University Press.
Antaki, Charles, Michael Billig, Derek Edwards and Jonathan Potter 2003. Discourse analysis
means doing analysis: A critique of six analytic shortcomings. Discourse Analysis Online 1
[http://www.shu.ac.uk/daol/articles/v1/n1/a1/antaki2002002-paper.html].
Asheri, David 2007. A Commentary on Herodotus: Books I–IV. Oxford: Oxford University Press.
Baduel-Mathon, Céline 1971. Le langage gestuel en Afrique Occidentale. Recherches Bibliogra-
phiques. Journal de la Société des Africanistes 41(2): 203–249.
Bateson, Gregory 1936. Naven. Cambridge: Cambridge University Press.
Bateson, Gregory and Margaret Mead 1942. Balinese Character: A Photographic Analysis. New
York: New York Academy of Sciences.
Bauman, Richard and Joel Sherzer (eds.) 1974. Explorations in the Ethnography of Speaking.
London: Cambridge University Press.
Beck, Rose-Marie 2001. Ambiguous signs: the role of the ‘kanga’ as a medium of communication.
Afrikanistische Arbeitspapiere 68: 157–169.
Bell, Catherine M. 1992. Ritual Theory, Ritual Practice. New York: Oxford University Press.
Boas, Franz 1897. The Social Organization and the Secret Societies of the Kwakiutl Indians. Wash-
ington: Government Printing Office.
Boas, Franz 1914. Kultur und Rasse. Leipzig: Veit.
Brigard, Emilie de 1975. The history of ethnographic film. In: Paul Hockings (ed.), Principles of
Visual Anthropology, 13–43. The Hague: De Gruyter Mouton.
Brookes, Heather J. 2001. O clever ‘He’s streetwise.’ When gestures become quotable. The case of
the clever gesture. Gesture 1(2): 167–184.
Brookes, Heather J. 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Butler, Judith 1993. Bodies That Matter: On the Discursive Limits of “Sex”. New York: Routledge.
Classen, Constance 1997. Foundations for an anthropology of the senses. International Social
Science Journal 153: 401–412.
Connor, Linda H. 1995. The action of the body on society: Washing a corpse in Bali. Journal of the
Royal Anthropological Institute 1(3): 537–559.
Creider, Chet A. 1977. Towards a description of east African gestures. Sign Language Studies 14:
1–20.
Desjarlais, Robert R. 1992. Body and Emotion: The Aesthetics of Illness and Healing in the Nepal
Himalayas. Philadelphia: University of Pennsylvania Press.
Duranti, Alessandro 1988. Ethnography of speaking: Toward a linguistics of the praxis. In: Frederick
J. Newmeyer (ed.), Linguistics: The Cambridge Survey, Volume 4: Language: The Socio-Cultural
Context, 210–228. Cambridge: Cambridge University Press.
Duranti, Alessandro 1992. Language and bodies in social space: Samoan ceremonial greetings.
American Anthropologist 94: 657–691.
Duranti, Alessandro 1994. From Grammar to Politics: Linguistic Anthropology in a Western
Samoan Village. Berkeley: University of California Press.
Durkheim, Emile 1912. Les Formes Élémentaires de la Vie Religieuse: Le Système Totémique en
Australie. Paris: Alcan.
Eastman, Carol M. 1992. Swahili interjections: Blurring language-use/gesture-use boundaries.
Journal of Pragmatics 18: 273–287.
Edmonds, Alexander 2011. Pretty Modern: Beauty, Sex, and Plastic Surgery in Brazil. Durham:
Duke University Press.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton. First published [1941].
Enfield, N. J. 2001. Lip pointing: A discussion of form and function with reference to data from
Laos. Gesture 1(2): 185–212.
Enfield, N. J. 2003. Producing and editing diagrams using co-speech gesture: Spatializing nonspa-
tial relations in explanations of kinship in Laos. Journal of Linguistic Anthropology 13(1): 7–50.
Enfield, N. J. 2005. The body as a cognitive artifact in kinship representations: Hand gesture dia-
grams by speakers of Lao. Current Anthropology 46(1): 51–81.
Farmer, Paul 1988. Bad blood, spoiled milk: Bodily fluids as moral barometers in rural Haiti.
American Ethnologist 15(1): 62–83.
Farnell, Brenda M. 1994. Ethno-graphics and the moving body. Man 29(4): 929–974.
Freese, John H. 1920. Photius: The Library. London: Society for Promoting Christian Knowledge.
Freyre, Gilberto 1986. The Masters and the Slaves. A Study in the Development of Brazilian Civilization. Berkeley: University of California Press. First published [1933].
Gell, Alfred 1998. Art and Agency. Oxford: Clarendon.
Goffman, Erving 1963. Behavior in Public Places. Notes on the Social Organization of Gatherings.
Glencoe: Free Press.
Goffman, Erving 1967. Interaction Ritual. Essays on Face-to-Face Behavior. New York: Doubleday
Anchor.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96: 606–633.
Gumperz, John J. 1982. Discourse Strategies. Cambridge: Cambridge University Press.
Gumperz, John J. 1992. Contextualization revisited. In: Peter Auer and Aldo Di Luzio (eds.), The
Contextualization of Language, 39–54. Amsterdam: John Benjamins.
Hahn, Tomie 2007. Sensational Knowledge: Embodying Culture through Japanese Dance. Middle-
town, CT: Wesleyan University Press.
Haraway, Donna J. 1991. Simians, Cyborgs, and Women: The Reinvention of Nature. New York:
Routledge.
Haviland, John B. 1993. Anchoring, iconicity, and orientation in Guugu Yimithirr pointing ges-
tures. Journal of Linguistic Anthropology 3: 3–45.
Haviland, John B. 2000. Pointing, gesture spaces, and mental maps. In: David McNeill (ed.),
Language and Gesture, 13–46. Cambridge: Cambridge University Press.
Haviland, John B. 2007. Gesture. In: Alessandro Duranti (ed.), A Companion to Linguistic
Anthropology, 197–221. Malden, MA: Blackwell.
Herdt, Gilbert H. (ed.) 1993. Ritualised Homosexuality in Melanesia. Berkeley: University of
California Press.
Hertz, Robert 1907. Contribution à une étude sur la représentation collective de la mort. Année Sociologique 10: 48–137.
Hewes, Gordon W. 1974. Gesture language in culture contact. Sign Language Studies 1: 1–34.
Hinsley, Curtis M. and Bill Holm 1976. A cannibal in the national museum: The early career of
Franz Boas in America. American Anthropologist 78: 306–316.
Hogle, Linda F. 2005. Enhancement technologies and the body. Annual Review of Anthropology
34: 695–716.
Howes, David (ed.) 1991. The Varieties of Sensory Experience: A Sourcebook in the Anthropology
of the Senses. Toronto: University of Toronto Press.
Howes, David (ed.) 2004. Empire of the Senses: The Sensual Culture Reader. Oxford: Berg.
Howes, David (ed.) 2009. The Sixth Sense Reader. Oxford: Berg.
Hymes, Dell H. 1962. The ethnography of speaking. In: Thomas Gladwin and William C. Sturte-
vant (eds.), Anthropology and Human Behavior, 13–53. Washington, DC: Anthropological Society of Washington.
Hymes, Dell H. 1972. Models of the interaction of language and social life. In: John J. Gumperz and
Dell Hymes (eds.), Directions in Sociolinguistics: The Ethnography of Communication, 35–71.
New York: Holt.
Keating, Elizabeth 2001. The ethnography of communication. In: Paul Atkinson, Amanda Coffey,
Sara Delamont, Lyn Lofland and John Lofland (eds.), Handbook of Ethnography, 285–301.
London: Sage.
Kita, Sotaro and James Essegbey 2001. Pointing left in Ghana: How a taboo on the use of the left
hand influences gestural practice. Gesture 1: 73–94.
LaTosky, Shauna 2006. Reflections on the lip-plates of Mursi women as a source of stigma and
self-esteem. In: Ivo Strecker and Jean Lydall (eds.), Perils of Face: Essays on Cultural Contact,
Respect and Self-Esteem in Southern Ethiopia, 382–397. Berlin: Lit.
LeBaron, Curtis 2005. Considering the social and material surround: Toward microethnographic
understandings of nonverbal behavior. In: Valerie Manusov (ed.), The Sourcebook of Non-
verbal Measures, 493–506. Mahwah, NJ: Erlbaum.
Leeds-Hurwitz, Wendy 1987. The social history of the natural history of an interview: A multidis-
ciplinary investigation of social communication. Research on Language and Social Interaction
20: 1–51.
Leenhardt, Maurice 1979. Do Kamo: Person and Myth in the Melanesian World. Chicago: Univer-
sity of Chicago Press.
Lock, Margaret 1993. Cultivating the body: Anthropology and epistemologies of bodily practice
and knowledge. Annual Review of Anthropology 22: 133–155.
MacDougall, David 1998. Transcultural Cinema. Princeton, NJ: Princeton University Press.
Madden, Raymond 2010. Being Ethnographic: A Guide to the Theory and Practice of Ethnogra-
phy. London: Sage.
Malinowski, Bronislaw 1922. Argonauts of the Western Pacific: An Account of Native Enterprise
and Adventure in the Archipelagoes of Melanesian New Guinea. London: Routledge.
Mauss, Marcel 1973. Techniques of the body. Economy and Society 2(1): 70–88. First published
[1935].
McCallum, Cecilia 1996. The body that knows: From Cashinahua epistemology to a med-
ical anthropology of lowland South America. Medical Anthropology Quarterly 10(3):
347–372.
Meyer, Christian 2011. Körper und Sinne bei den Wolof Nordwestsenegals. Eine mikroethnogra-
phische Perspektive. Paideuma 57: 97–120.
Moerman, Michael 1988. Talking Culture: Ethnography and Conversation Analysis. Philadelphia:
University of Pennsylvania Press.
Mol, Annemarie 2001. The Body Multiple: Atherosclerosis in Practice. Durham, NC: Duke Uni-
versity Press.
Nelson, Christopher 2008. Dancing with the Dead: Memory, Performance, and Everyday Life in
Postwar Okinawa. Durham, NC: Duke University Press.
Orie, Olanike Ola 2009. Pointing the Yoruba way. Gesture 9(2): 237–261.
Parkin, David 2004. Textile as commodity, dress as text: Swahili kanga and women’s statements.
In: Ruth Barnes (ed.), Textiles in Indian Ocean Societies, 47–67. London: Routledge.
Regnault, Félix-Louis 1931. Le rôle du cinéma en ethnographie. La Nature 59: 304–306.
Ruby, Jay 1980. Franz Boas and early camera study of behavior. The Kinesis Report 3:
6–11, 16.
Schechner, Richard 1988. Performance Theory. New York: Routledge.
Schegloff, Emanuel A. 1988. Goffman and the analysis of conversation. In: Paul Drew and An-
thony J. Wootton (eds.), Erving Goffman. Exploring the Interaction Order, 89–135. Cambridge:
Polity Press.
Sherzer, Joel 1973. Verbal and non-verbal deixis: The pointed lip gesture among the San Blas
Cuna. Language in Society 2: 117–131.
Sherzer, Joel 1991. The Brazilian thumbs-up gesture. Journal of Linguistic Anthropology 1(2): 189–197.
Sherzer, Joel and Regna Darnell 1972. Outline guide for the ethnographic study of speech use. In:
John Joseph Gumperz and Dell H. Hymes (eds.), Directions in Sociolinguistics, 548–554. New
York: Holt.
Smith, Louis M. 1966. The Micro-Ethnography of the Classroom. St. Louis: Central Midwestern
Regional Educational Laboratory.
Stollberg-Rilinger, Barbara 2011. Much ado about nothing? Rituals of politics in early modern
Europe and today. Bulletin of the German Historical Institute 48: 9–24.
Streeck, Jürgen 2002. A body and its gestures. Gesture 2(1): 19–44.
Streeck, Jürgen 2009. Gesturecraft. The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen (n.d.). Balinese Hands. http://jurgenstreeck.net/bali/
Streeck, Jürgen and Siri Mehus 2005. Microethnography: The study of practices. In: Kristine Fitch and Robert Sanders (eds.), Handbook of Language and Social Interaction, 381–404. Mahwah, NJ: Erlbaum.
Synnott, Andrew 1993. The Body Social. London: Routledge.
Turner, Terence 1980. The social skin. In: Jeremy Cherfas and Roger Lewin (eds.), Not Work
Alone: A Cross-Cultural View of Activities Superfluous to Survival, 112–140. Beverly Hills,
CA: Sage.
Turner, Victor W. 1968. The Drums of Affliction: A Study of Religious Processes among the
Ndembu of Zambia. Oxford: Clarendon Press.
Wolfowitz, Clare 1991. Language Style and Social Space: Stylistic Choice in Suriname Javanese.
Urbana: University of Illinois Press.
15. Cognitive Anthropology: Distributed cognition and gesture

Abstract
All human cognition is distributed. It is, of course, distributed across networks of neurons
in different areas of the brain, but also frequently across internal and external representa-
tional media, across participants in interaction, and across multiple spans of time. While
most research on distributed cognition has focused on complex activities in technology-
laden settings, the principles apply just as well to everyday cognitive activities. Studies of
distributed cognition reveal that bodily activity – especially gesture – plays a central role
in coordinating the functional systems through which cognitive work gets accomplished.
Gesture does more than externalize thought; it is often part of the cognitive process itself.
Gestures create representations in the air, enact representational states on and over other
media, and bring states in different media into coordination to produce functional out-
comes. Gesture also plays a central role in propagating functional systems – associated
cultural practices, cognitive models, and forms of coordination – across generations,
while adapting them to the particulars of new problem situations. In so doing, gesture
helps to sustain and enhance the cognitive sophistication of the human species.
1. Introduction
A child told to share candies with her sibling doles them out one at a time, saying “one
(for you), one (for me); two, two; three, three….” Or she spreads them across a table
and points at candies in succession while saying “one, two, three….” Or her mother
shows her how to slide candies across the table two at a time while counting “two,
four, six….” Each scenario involves hand actions – one doling out, one pointing, and one demonstrating.
Are these actions gestures?
What we call gestures and how we study them reflect particular theories of human
cognition and communication. The traditional view in cognitive science is that humans
think internally through propositional logic and/or mental imagery and express their
thoughts to others through language, spoken or written. Spoken messages are accompa-
nied by paralinguistic cues, such as vocal tone, facial expression, or body language, that
signal emotional state or stance toward what is being said. Head movements signal
agreement or disagreement, and hands support or supplement the content of spoken
messages. These signals help listeners unpack utterances and recover their propositional
and/or imagistic content. From this point of view, communicating is a matter of
encoding and decoding messages transmitted from sender to receiver: what since Reddy
(1979) has been called the “conduit metaphor” of communication. This view of cogni-
tion and communication served as the basis for most research in cognitive science from the mid-1950s to the late 20th century, until other views, including that of distributed
cognition, called this account increasingly into question.
Against this backdrop, gesture re-emerged as a topic of research due primarily to the
pioneering studies of Adam Kendon (1972, 1980) and David McNeill (1985, 1992).
Kendon and McNeill both study the expressive hand movements that accompany
speech, which Kendon (1980) calls “gesticulation,” and how they relate to spoken con-
tent. For both researchers, a primary unit of analysis is the utterance, a communicative
act consisting of a speech-gesture ensemble expressing related content and bounded as
a single intonation unit (as in Chafe 1994). For Kendon (2004), speech and gesture are
separate streams coordinated in the process of utterance, while for McNeill (2005),
speech and gesture arise together, in a dialectic of language and imagery, from a single
idea or “growth point.” Gesture, like speech, is a means for expressing or externalizing
thought in the mind of the speaker, although it can also mark or regulate aspects of the
discourse. Kendon’s data consist primarily of recordings of conversations and some
guided tours, while McNeill’s consist mostly of experimental participants narrating
events seen in a cartoon or film or recalled from a fairy tale. The gestures examined
in these studies are produced in the air in the space in front of the speaker or, in the
case of pointing gestures, directed toward objects or locations in the surrounding
space. The ground-breaking studies of Kendon and McNeill contrast in some respects
with the workplace studies typical of distributed cognition research, where gestures
over representational artifacts are common and where gesture and speech are directed
toward the accomplishment of joint activity as well as the development of mutual
understanding.
Studies of distributed cognition are closer to the work of Jürgen Streeck (2009) and
Charles Goodwin (2000), researchers from the tradition of conversation analysis who
study the communicative practices of people engaged in work and life activities in
the culturally rich settings they ordinarily inhabit. These researchers study gesture as
practice – as part of what people do and how they go about doing it – rather than as
expressions of interior mental life. They take a particular interest in times when people
coordinate with one another to develop a shared understanding of a problematic situ-
ation or to overcome snags in the flow of activity; here gesture comes to the fore. In
these studies, gesture is entwined with practical action, so that gestures are frequently
produced on or over objects in an “environmentally coupled” way (Goodwin 2007) or
with objects in hand (as in LeBaron and Streeck 2000). Gesture may not be singled out
for study but may instead be considered one of many factors shaping the construction of
meaning in situ, including the structure of the activity, aspects of the setting, the position
and orientation of participants’ bodies (including access to each other’s actions), mutual
orientation toward objects, shared knowledge or history of activity, content and struc-
ture of the preceding discourse, and, of course, the talk that participants produce
(Goodwin 2000). Attention is paid to conversational moves of various kinds (even inac-
tion), and meaning is seen as emergent from the relations between talk, gesture, arti-
facts, and situated aspects of the discourse rather than from the unpacking of utterances.
Along with a focus on practice, distributed cognition research shares with these stu-
dies the use of micro-ethnography as a method of inquiry. Data consist of recordings of
2. Distributed cognition
The term “distributed cognition” refers not to a type of cognition but to a perspective
for understanding cognition generally. As described by its leading proponent, Edwin
Hutchins, all human cognition is distributed. Some cognitive accomplishments rely
solely on interactions among neural networks in diverse areas of the brain, while others,
including the most significant human accomplishments, involve coordination of internal
structures and processes with structures in the world we engage with our bodies and
modify to suit our purposes. Through such functional couplings, we use our Stone Age
brains to lead Space Age lives.
Among the chief insights of distributed cognition is the benefit to be gained by not
defining the boundaries of the cognitive system too narrowly. If we consider cognitive
processes of reasoning, decision-making, and problem-solving to be those “that involve
the propagation and transformation of representations” (Hutchins 2001: 2068; see also
Hutchins 1995a: 49), then we must also consider that many of the most important repre-
sentations lie outside the heads of individuals, embedded in sociotechnical systems of
human activity. By incorporating relevant aspects of the material setting and social
organization into the unit of analysis, we can study how cognitive systems function
through “the propagation of representational state across a series of representational
media” and how representational states are propagated “by bringing the states of the
media into coordination with one another” (Hutchins 1995a: 117). Once we have an
for the distributed cognitive system to function: the speech system utters numeric labels
in succession while the manual system moves the hand with extended index finger (in
the handshape prototypical for pointing) from object to object, touching each object
at precisely the instant when a numeric label is uttered. An essential part of this coor-
dination is imposing a path along which the hand moves so that it touches every object
exactly once. This system for computing quantity requires the coordination of brain,
body, and world (Clark 1997): it combines conceptualization (object perception, a cog-
nitive model for a specific cultural practice, a conceptual path), bodily action (speaking
and touching), and environmental structure (a configuration of objects) into a func-
tional ensemble. The system can fail in several ways: missing number names, miscoor-
dinating uttering with touching, mistaking objects to be included, losing track of the
counting path, etc. Errors can be reduced through improvised adaptations: a child
counting dots in a circle, for example, held the tip of her left index finger at one dot
while she used the tip of her right index finger to touch each of the subsequent dots
around the circle while counting aloud; marking the start of the counting path in this
way made it easier to discern its end (Williams unpublished data). A common adapta-
tion is to modify the environment before counting: to rearrange objects into a line
or array in order to facilitate a simpler counting path. These manual actions before
counting reduce errors in the execution of the functional system, making it more
robust. These examples show how conventional cultural practices must be adapted to
the particulars of setting and situation when distributed cognitive functional systems
are instantiated in real human activity. Actions of the hands are critical to these
situation-specific adaptations.
Fig. 15.1: Three ways of counting objects – (a) sequential touching, (b) relocating objects, (c) using finger proxies.
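To make the structure of this functional system explicit, here is a toy sketch – added purely for illustration and not part of the original text, with all names hypothetical – that pairs a learned sequence of number labels with a path required to visit every object exactly once; the last label uttered stands for the total, and running out of number names is one of the failure modes listed above.

# Illustrative toy model of the counting functional system described above.
# All names are hypothetical; this is a sketch, not the chapter's own method.

NUMBER_LABELS = ["one", "two", "three", "four", "five", "six", "seven"]

def count_by_touching(objects, labels=NUMBER_LABELS):
    """Pair each object on the counting path with an uttered number label.

    Returns the last label uttered, which stands for the cardinality.
    Raises an error for one named failure mode: missing number names.
    """
    path = list(objects)  # imposing a path, e.g. by lining the objects up
    if len(path) > len(labels):
        raise ValueError("missing number names: label sequence too short for the path")
    touches = list(zip(labels, path))  # utter a label at the instant of each touch
    return touches[-1][0] if touches else "zero"

# Example: three candies spread across the table
print(count_by_touching(["candy_a", "candy_b", "candy_c"]))  # prints "three"

The other failure modes named above – miscoordinated uttering and touching, including the wrong objects, losing track of the path – correspond to the pairing or the path being constructed incorrectly, which is exactly what the improvised adaptations (marking the start of the path, lining objects up) guard against.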
In the form of counting practice described above, the functional system operates
through a series of touch-points synchronized with speech; this looks like gestural action
without intent to communicate. The gestural quality is even more apparent when the
manual actions are reduced to a series of points (with no contact) while number tags
are uttered. With further reduction, the system operates with no hand action at all, repla-
cing it with gaze shifting: looking at (fixating) objects in succession while uttering number
names subvocally. These are varied instantiations of a common counting practice realized
through different bodily actions, some more overtly gestural than others.
A second conventional way to count objects, shown in Fig. 15.1(b), involves changing
the location of objects while uttering number tags. Examples from Williams (2008c)
include picking up and placing traffic cones, sliding coins across a table two at a time,
and dropping buttons into a bag. Again coordination of the functional system is
achieved through manual action synchronized with speech, but here the movements
look more like practical actions than gestures: grasping, lifting, sliding, placing, and
dropping objects, all performed not to accomplish some practical end on or with an
object but rather to accomplish the cognitive goal of computing quantity.
A third way to count objects, shown in Fig. 15.1(c), again appears gestural in form:
raising fingers or touching fingers successively (for example, to a surface) while uttering
names of non-present entities. Here the fingers are proxies for objects being counted.
Examples from Williams (2008c) include raising fingers while reciting the alphabet to
identify the 18th letter and touching and raising fingers while naming family members
to determine the number of people for a dinner reservation. In this functional system,
the hand configuration is modified in coordination with object-naming; the configura-
tion produced when the final object is named represents the total number of objects.
This final configuration can be identified using associations from childhood counting
practice, or the finger-raising sequence can be repeated while reciting number tags
until the target configuration, held in visual working memory, reappears. The manual
actions in this case are neither practical nor communicative: they are cognitive actions
that encode representational states during the execution of a computational process.
If we call them gesture, as I believe we should, then they are gesture for cognition –
specifically, gesture for problem-solving rather than word-finding or thinking-for-speaking,
which are cognitive functions claimed for gesture when it is considered in relation to
speech.
The distributed cognitive functional systems described in this section all accomplish
computation through sequenced actions of the hands (or eyes) that coordinate spoken
labels with objects or their proxies. The manual actions that bring these systems into
coordination cross distinctions between practical action (physically moving objects),
communicative action (pointing), and cognitive action (counting on fingers). A single
form, an index-finger point, can serve different functions (cognitive, communicative,
or both simultaneously) while a single function, coordinating number tags with objects,
can be accomplished by different forms (looking, pointing, touching, sliding, picking up
and placing, etc.). Whether manipulative or gestural in appearance, the hand actions de-
scribed here serve the common purpose of coordinating elements in a functional system
to produce a computational outcome.
them as tools to act on other objects (to draw, write, carve, and so on). These are com-
monly regarded as practical actions, but they may equally be cognitive actions, helping
us perceive the affordances of objects (Kirsh and Maglio 1994) or prepare the environ-
ment for intelligent action (Kirsh 1995), as in the example of lining up objects before
counting them. Hands also interact with the world without modifying it: they bring
attention to objects, highlight their relevant features, and annotate or elaborate their
structure. These are environmentally coupled gestures (Goodwin 1994, 2007) whose sig-
nificance derives from the culturally constituted spaces in which they are performed.
Finally, hands depict directly, in the space in front of the speaker, using conventional
gestural practices: they enact schematic actions; they evoke imagined objects through
enactments or through schematic acts of drawing, outlining, or molding; and they
model objects and their interactions (see Müller 1998: 114–126 and Streeck 2008 for dis-
cussion of gestural modes of depiction). Acting on objects, gesturing over objects, and
gesturing in the air seem like different sorts of hand actions, but from the perspective of
distributed cognition, they often serve similar or closely related purposes. The purpose
of a given hand action may become apparent only when it is considered in terms of the
functional system being instantiated to accomplish a particular outcome.
That human hands modify environments, manufacture artifacts, and use tools to
achieve desired ends is well known and widely regarded as fundamental to human
life. More specific to human cognitive achievements are hands’ entrained abilities to
create representational states in physical media. Hands create representational states
through culturally shared practices of sketching, drawing, and writing, as well as
through more specialized practices like painting, sculpting, carving, or crafting. In
much of the world today, hands create representational states in electronic media
through historically recent practices such as keyboarding, mousing, and using touch-
pads and -screens. Where physical or electronic media are absent, or where the skills
to employ them are lacking, hands rely upon themselves to represent: they use their
own physicality to materialize conceptual content. Indeed, this ability may be integral
to all the others. Hutchins (2010) claims that “humans make material patterns into re-
presentations by enacting their meanings” (Hutchins 2010: 434), and hands are humans’
primary tools for enactment.
Given this array of potential means for representation, it is worth asking: What are
the affordances of hands that lead to their being employed for depiction when other
media might instead be chosen? Because hands are parts of our bodies, they are always
“ready at hand”: they can be brought into action quickly and can produce representa-
tional states faster than these could be engendered in other media. In contrast with writ-
ing and drawing, hands represent relations in three-dimensional space and can move
while representing, enacting the dynamics of processes. Multiple changing relations
are especially hard to visualize, requiring complicated physical models or clockworks,
flat (2-D) video recordings or animations, or high-technology systems for motion cap-
ture or 3-D visualization. Hands can conjure 3-D relations and dynamics directly in
space – in so far as a partial, schematic depiction annotated by speech is sufficient to
the demands of the situation and the complexity of what needs to be represented.
Using the hands depictively also brings processes of all sizes and scopes, from the
cosmic to the microscopic, into “human scale”: the scale at which we directly perceive
and act in the world (Fauconnier and Turner 2002: 312). Two types of human scale are
important in gesture research: one in which the gesturer inhabits a space and acts
subjectively within it, called “character viewpoint,” and another in which the gesturer
models objects and interactions in the space in front of his body, called “observer view-
point” (McNeill 1992: 118–125). A speaker adopts character viewpoint if she enacts
steering a car while describing an automobile accident; she adopts observer viewpoint
if she uses her hands to model two cars colliding, a depiction she views from outside the
space of action in mutual orientation with her interlocutor, who views it from another
angle. Observer viewpoint, in particular, enables us to take processes at any imaginable
scale and to portray them in the perceivable, reachable space in front of our bodies and
thus to “dominate” them (Latour 1986: 21). And because our bodies are mobile – able
to bend, reach, turn, walk, and so on – we can transport gestural representations into
and out of co-location with states in other media, thereby linking or coupling them.
This puts hands as representational tools squarely at the center of the coordinative
processes necessary for cognitive functional systems to achieve their outcomes.
Finally, what must be noted in this discussion of the affordances of gesture as a re-
presentational medium is the limited durability, the non-persistence, of gestural repre-
sentations. Gestures have a greater material presence than words, but while they can be
sustained briefly to support perception and reasoning, they vanish as soon as the hands
are put to other uses. This is their most significant contrast with other physical media,
yet the affordances of gesture enable it to be used in conjunction with durable media
to achieve outcomes more powerful than either could achieve on its own.
turn bearings or danger depth contours, etc.) are added in ink before the chart is em-
ployed in navigation. When the chart is in use, plotted lines of position and projected
future positions are marked in pencil. Finally, when navigators consider possible land-
marks for the next position fix, they trace projected lines of position on the chart with
their index fingers, leaving no marks. The significance of these gestural traces emerges
not simply from the composites of gesture and speech (the utterances) but from the
layering of the gestures, construed by speech, on the meaningful space of the chart in
the context of the mutually understood activity being jointly pursued. As Hutchins
(2006) notes: “The meanings of elements of multimodal interactions are not properties
of the elements themselves, but are emergent properties of the system of relations
among the elements” (Hutchins 2006: 381). These gestural traces of possible lines of
position are part of an embodied process of joint imagining: perceiving in a “hypothet-
ical mode” (Murphy 2004: 269) while acting in a “subjunctive mood” (Hutchins 2010:
438). The fleeting quality and lack of physical imprint of the gestures suit precisely
the nature of the cognitive task at hand: considering, but not committing to, possible
courses of action, and using these considerations as the basis for a decision that will
determine future action.
This two-handed gesture complex, in coordination with the accompanying speech, ac-
complishes the multifaceted purpose of representing molecular structure, making it
available for visual scrutiny, and focusing attention on a detail in that structure that
is critical to understanding how the molecule functions. In a further elaboration, the
speaker adds motion dynamics to the 3-D model. She says, “And so our new theory
is that thrombomodulin does something like this,” pausing briefly to contract and
expand her fingers, “or like this,” pausing again to rotate her fingers from side to
side. In this portion of the discourse, the speaker uses her hand-as-molecule to enact
and thereby simulate possible forms of molecular motion resulting from the binding
of thrombomodulin. These simulations are, like the lines-of-position example, not yet
committed to but hypothetical. In subsequent elaborations, she places the back of
her right hand against the back of the hand-as-molecule to enact the binding of throm-
bomodulin to the back side of thrombin, and then she uses rapid movement of her right
hand toward the interior of the hand-as-molecule to enact the rapid binding of another
protein to the active site. Throughout these depictions, her left hand models the throm-
bin molecule and its dynamics while her right hand alternates between indexing parts
of the molecule and modeling other molecules’ interactions with it.
Fig. 15.2: (a) thrombin diagram, (b) thrombin hand model, (c) thrombin hand model (six months later).
This example clearly illustrates how gestural depiction becomes a component of sci-
entific reasoning. The speaker’s hand-as-molecule gesture creates a stable, visually
accessible, dynamically reconfigurable, 3-D model of a functioning molecule at human
scale. Her gestural elaborations of that model serve to highlight invisible elements and
depict imperceptible processes, all in the hypothetical mode. By using her hand move-
ments to create these representations, the speaker also brings her own embodied expe-
rience with tangible objects, felt movements, and visuospatial perception into play,
providing a bodily basis for sensing connections or making discoveries about molecular
dynamics. The gestural model takes on the status of a cognitive artifact: a representa-
tional or computational tool that is part of a cognitive functional system – in this case,
a system for reasoning about molecular interactions. This gestural model proves crucial
to the work of the group, as evidenced by two observations: first, that other members of
the group reproduce the hand-as-molecule gesture during subsequent discussion, and
second, that the hand-as-molecule gesture is produced independently and spontaneously
by a lab member (not the original speaker) six months later in an interview when she de-
scribes the goals of the project, as shown in Fig. 15.2(c). In both the science laboratory
and ship navigation examples, gestural enactments serve as components of hypothetical
thinking as well as ways of sharing that thinking with others, demonstrating how hands
are used as tools for reasoning as well as communication.
(i) touches the brain image on the computer screen while identifying the location she
touches as the “center”;
(ii) rotates the hand-drawn chart to align it with the image and points to a location on
the chart she identifies as “right here”;
(iii) holds the chart up next to the computer screen while saying “and when we look at
this map it looks something like that”;
(iv) traces the outline of the primary visual area on the chart with her index and middle
finger while saying “so V1 is going to be in the center”; and then
(v) transposes her hand – maintaining the tracing handshape – to the brain image on
the computer screen where she executes a matching two-fingered trace, shown in
Fig. 15.3(a), six times in rapid succession while saying “it’s gonna be this pie shape;
it’s probably covering approximately this area” (Alač and Hutchins 2004: 646).
Here the coordinative function of gesture is quite apparent. Pointing and tracing highlight
structures in external representational media whose conceptual identity is invoked by
speech. Maintaining handshape while moving the hand from one culturally constructed
space (the chart) to another (the computer image) and repeating the gesture form in
the new space together establish a conceptual link between the highlighted states of
the two media; these states are construed, named, and related by the accompanying
speech (“this,” “here,” “the center,” “V1,” “like that,” “so,” etc.) to produce the relevant
object of knowledge, namely, seeing part of the colored image as V1, the primary visual
cortex. “Seeing-as” is a cognitive accomplishment, the outcome of a discursive process in
which gesture weaves conceptual content into material patterns in the physical world.
also coordinate with both through perceived similarities of form. Proximity and syn-
chrony are precise achievements of skilled human action that weaves media into mean-
ing. As hands enter culturally constituted spaces, they form shapes and perform
movements that take on meaning in relation to the structures and representational con-
ventions that govern those spaces. How the gestures are to be interpreted, whether in
relation to an artifact itself or to what that artifact represents, depends upon the con-
strual provided by speech and by shared knowledge of the situation. In Hutchins and
Palen (1997), for example, a pilot’s gestures on and over an instrument panel in the
flight deck are variously interpreted as actions taken on the panel and as events occur-
ring in the aircraft’s fuel system (Hutchins and Palen 1997: 37). Hands also transport
meaningful state from one constructed space – one semiotic field – to another, as
when the functional magnetic resonance imaging researcher transfers a traced outline
from the labeled chart to the brain image. Moving a representational state into a
new semiotic field transforms how that state is seen. Hutchins gives the example of a
navigator moving his dividers (a tool that can be set to span a particular interval)
from a line segment on a navigation chart to a printed scale where the distance traveled
in three minutes (1500 yards) can be read as a speed (15 knots) by ignoring the two
trailing zeroes (1995a: 151–152; further analyzed in 2010: 429–434). Whether what is
moved from one space to another is a physical tool or a configured hand makes little
difference; what matters is the semiotic shift. Of course, hands also create representa-
tions in their own semiotic space, in the air in front of the speaker, in relation to spoken
content, as when the scientist uses her hand to model the shape and movements of a
molecule. Through these processes, hands play a crucial role in producing and elabor-
ating “multimodal meaning complexes” (Alač and Hutchins 2004: 637) in the interac-
tions through which joint activities are accomplished and through which participants
come to share understanding.
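To make the navigator's reading concrete, here is a rough reconstruction of the arithmetic that the printed scale compresses; the only assumption added here is the conventional approximation of one nautical mile as roughly 2,000 yards, which is what licenses dropping the two trailing zeroes:

$$
\frac{1500\ \text{yd}}{3\ \text{min}}
= \frac{1500 \times 20\ \text{yd}}{60\ \text{min}}
= 30{,}000\ \text{yd/h}
\approx \frac{30{,}000}{2{,}000}\ \text{nmi/h}
= 15\ \text{knots}.
$$

The same span of the dividers can thus be read as a distance in one semiotic field and as a speed in the other; the chain of unit conversions is carried by the artifact rather than computed mentally.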
the hands or body with outstretched arms to simulate changes in the airplane’s orien-
tation or dynamics. The gestural enactments variably precede (with a hold), coincide
with, or quickly follow their lexical affiliates in the other’s speech. Their purpose
seems to be to display intersubjective understanding through demonstrated action (in
anticipation of or in response to the other’s verbalization) and/or to practice the proce-
dure being described, possibly as an aid to future recall. Listener gestures provide vis-
ible evidence that the listener inhabits a conceptual world in common with the speaker.
In so doing, they require a commitment to particulars of the situation not evident in
speech. Gestural enactments evoke an imagined setting (the flight deck of a jet aircraft),
a role (pilot flying), and a vantage point (in the right or left seat), including details such
as the location and operation of aircraft controls. Aspects of the setting “are brought
forth as implied elements in an imagined world of culturally meaningful action”
(Hutchins and Nomura 2011: 41), and gestures “are coupled to elements of that ima-
gined environment” (Hutchins and Nomura 2011: 42). The speaker also appears to
modify his on-going talk as a consequence of the other’s gestures, variously confirming
or correcting the apparent interpretations or omitting items that have already been es-
tablished, such that a lexical affiliate for the other’s gesture may never be spoken.
Hutchins and Nomura claim that “the participants are engaged simultaneously in two
kinds of projects: they are enacting conceptual objects of interest (what they are talking
about), and they are conducting a social interaction. While these objects are analytically
separable, in action, they are woven into the same fabric” (Hutchins and Nomura 2011:
40). In these examples, talking and gesturing in coordination with others appears to be a
way of establishing common ground for the discourse while simultaneously establishing
the objects of knowledge that define the work of the profession.
Finally, it is worth pointing out that the most significant phenomena described
here – the coupling of gesture with other media, and the coordinated production of
talk and gestures by separate participants – are precluded by an experimental method-
ology in which one participant who has seen a video narrates the events, without access
to any material resources, to another who has not. These phenomena are also less likely
to be observed in studies of conversations where participants tell stories about people,
happenings, and objects not present. This point is not meant to diminish the many in-
sights that are gained from such studies. It is, however, meant to press the claim that the
primordial home for gesture is in mutual, consequential activity in culturally con-
stituted settings. Gesture in such activity is likely to be performed in relation to
other media and in close interaction with other participants, and it is likely to serve a
functional role in cognition that goes beyond the expression of internal content.
We have already seen one illustration of the propagation of cultural practices in Alač
and Hutchins’ (2004) example of the experienced functional magnetic resonance imaging
researcher teaching the novice how to interpret brain images. Here the expert used point-
ing and tracing to highlight shapes in the visible media (the chart and the image) while
her speech profiled conceptual entities and relations manifested in those shapes. Keeping
a fixed handshape while moving her hand from one semiotic space (the chart) to another
(the image) and repeating the gestural form helped establish a conceptual link between
elements in the two spaces. By compressing analogy into identity (Fauconnier and Turner
2002: 314–315, 326), the novice learns to see the shape on the computer screen as a cor-
tical map in the participant’s brain. It may be that the expert routinely uses her hands to
accomplish this seeing on her own, perhaps by tracing outlines on the images she is ex-
amining, but when she teaches the novice, she performs these actions overtly, opening the
process to scrutiny. She also annotates her task-relevant actions with additional gestures,
speech, and shifts of gaze from work objects to her addressee, monitoring the novice’s
responses as she explicitly guides him in where to look (how to attend), what to see
(how to conceptualize what is being viewed), and what to do (how to act). The expert shifts
projects from accomplishing to teaching. This shift is evident in how she orients her body
and how she uses her hands and her talk to engage her interlocutor as well as the tools of
her trade. In her instructional discourse, the expert demonstrates how to find, interpret,
coordinate, and employ relevant states of representational media to accomplish the
work of a functional magnetic resonance imaging researcher.
A more commonplace example of shifting projects from doing to teaching, of opening
functional systems to scrutiny, and of guiding the conceptualization of novices can be found
in adults teaching children to tell time (Williams 2008a, 2008b). Expert time-tellers look at
an analog clock and read the time with gaze-fixing and slight gaze-shifting from one clock
hand to another as the only visible evidence of a cognitive process unfolding. It is doubtful
that any novice could learn to read the clock simply by watching an expert do it. Children
learn to read the clock because adults who are proficient time-tellers provide them with
active instruction: pointing to structures and tracing paths on the clock face, highlighting
elements, relations, and processes while construing them with speech, and shifting gaze
from the clock face to the learner to monitor attention and seek signs of confusion or
comprehension. While the child practices reading times on the clock, the adult monitors
and provides prompts, confirmation or correction, and additional instruction as needed.
Through this form of social interaction, children learn to see meaningful structure on
the clock face and to interpret that structure in relation to human activity and to a con-
ventional system of time measurement. Seeing time on the clock is another cognitive
accomplishment entrained by the gestural weaving of material and conceptual worlds.
As a brief example, consider the fragment of instruction shown in Fig. 15.4 (Williams
2008a). Here the teacher says “now another way that we say it, is we count by fives,
when we move this from number to number; there’s five minutes between each number”
while she enacts a hypothetical process of counting on the clock. If we break this fragment
into segments, we see the dynamic mapping of conceptual content to the clock face as
mediated by gesture. While saying “now another way that we say it,” the teacher moves
the minute hand to the 12, positioning the hand at the starting point for a clock-counting
process. When she says “is we count by fives,” she activates a cognitive model for counting
that is familiar to her first grade class: touching objects (sets of five elements) while utter-
ing “five, ten, fifteen….”. Accompanying her statement “when we move this” is a shift of
gaze to where the tip of her right index finger rests on the minute hand; this construes
the minute hand as the thing-to-be-moved, namely as the pointing finger that will touch
each object-set as it is counted. The next part of her utterance, “from number to num-
ber,” defines numbers as the object-sets to be counted; here she touches the large
numerals on the clock face in sequence, making it clear which numbers she is referring
to, while the form of her gestural movement enacts a canonical counting motion, boun-
cing from one number to the next clockwise around the dial. The gesture alone provides
the origin, direction/path, and manner of the counting motion, which is notably not the
continuous, steady movement of a clock hand but the intermittent, bouncing movement
of a human hand touching objects while counting. The same gesture continues during
the next statement, “there’s five minutes between each number,” a statement that acti-
vates a cognitive model for the conventional system of time measurement, in which an
hour is divided into 60 minutes, and maps an interval of five minutes to the space
between adjacent numbers on the clock. In this example, a single gesture in coordina-
tion with two verbal statements sets up mappings from two cognitive models: a mapping
from objects in the counting model to numbers on the clock face, and a mapping from
units of time in the time measurement model to intervals of space on the clock face. The
second mapping conjoins with the first to implicitly generate a third mapping: linking
units in the system of time measurement (minutes) to elements of the object-sets
being counted (five minutes per object-set). All of this is accomplished through the
coordination of gaze, gesture, speech, and a culturally constituted artifact, all carefully
orchestrated to guide the novice’s conceptualization (Williams 2008b).
Once these mappings are established, the teacher performs the counting process by
grasping the minute hand and moving it to the 5, the 10, and the 15, pausing momen-
tarily at each while saying “five, ten, fifteen….” If the children have succeeded in mak-
ing the correct conceptual mappings, they will see the clock hand as a counting finger
that touches each number in sequence while the elapsed minutes are counted. This,
in microcosm, is how conventional functional systems get propagated, sustaining the
cognitive accomplishments of the human species.
Given this instruction, the children must perform the activity with diminishing help
until they are able to instantiate the functional system successfully in appropriate
contexts with little effort; only then would we say that they have mastered the practice.
Once they become proficient and use the system repeatedly, they will come to recognize
the hand configurations and numeric labels as standing for particular five-minute times
(oh five, ten, fifteen, and so on), and they will shift strategies from counting to directly
naming these times, retaining counting as a backup strategy should memory fail them.
A new functional system will emerge, one that supports more efficient conduct of the
activity while it reduces the cognitive demands on the individual coordinating the sys-
tem to produce the intended outcome. The expert system will differ from the novice sys-
tem, but the counting-based practice will continue to be retained by our culture as a
stepping-stone because it enables the sustained successful performance through
which the memory-based ability arises.
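To make the contrast between the two functional systems concrete, the following minimal sketch renders the novice's counting-by-fives routine and the expert's direct naming (with counting retained as a backup) in procedural form. It is an illustration only, not a model of the cognition involved; the function names and the numeral-to-minutes table are introduced here for the example and are not part of Williams' analysis.

```python
# Toy contrast between the novice's and the expert's ways of reading the
# minute hand; illustrative only.

def minutes_by_counting(numeral: int) -> int:
    """Novice strategy: count by fives, once for each numeral passed."""
    minutes = 0
    for _ in range(numeral):          # "five, ten, fifteen, ..."
        minutes += 5
    return minutes

# Expert strategy: each numeral is directly recognized as a five-minute time.
DIRECT_NAMES = {n: n * 5 for n in range(12)}   # 1 -> 5, 2 -> 10, ..., 11 -> 55

def minutes_by_naming(numeral: int) -> int:
    """Expert strategy: retrieve the stored label, falling back to counting."""
    return DIRECT_NAMES.get(numeral, minutes_by_counting(numeral))

assert minutes_by_counting(3) == 15   # "five, ten, fifteen"
assert minutes_by_naming(3) == 15     # "fifteen", retrieved directly
```

Both routines yield the same reading; what differs is how the outcome is produced, which is precisely the sense in which the expert and novice functional systems differ while remaining parts of the same cultural practice.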
7. Conclusion
This article has presented evidence for gesture’s role in: (1) coordinating the functional
systems through which cognitive work gets done, and (2) propagating those systems
across generations. In purposeful human activity, participants gesture not simply to
express but to accomplish. The familiar conduit metaphor of communication proves
inadequate for studying meaning-making in situated activity because it obscures the
ways gesture operates in distributed systems for human cognition. Even where the
focus of study is exclusively on speech, the conduit metaphor tends to mislead because,
as Hutchins (2006) points out, “it is easier to establish a meaning for words embedded
with gestures that are performed in coordination with a meaningful shared world than it
is to establish meanings for words as isolated symbols” (395). That humans can commu-
nicate solely through words is clear, but that such communication should be regarded
as prototypical is clearly mistaken. Recognizing this, leading gesture researchers like
Kendon and McNeill have argued that gesture, like speech, is part of utterance.
Researchers who study distributed cognition find it more productive to treat gesture
as part of the functional systems through which cognitive outcomes are accomplished.
If we expand the unit of analysis to encompass aspects of the setting, of mutual orien-
tation and (inter-)action, and of shared knowledge and the unfolding of goal-directed
activity, then we stand a better chance of understanding and appreciating the critical
role that gesture plays in human cognition and communication.
8. References
Alač, Morana and Edwin Hutchins 2004. I see what you are saying: Action as cognition in fMRI
brain mapping practice. Journal of Cognition and Culture 4(3): 629–661.
Becvar, L. Amaya, James Hollan and Edwin Hutchins 2005. Representational gestures as cogni-
tive artifacts for developing theory in a scientific laboratory. Semiotica 156(1/3): 89–112.
Chafe, Wallace 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Clark, Andy 1997. Being There: Putting Brain, Body, and World Together Again. Cambridge: Mas-
sachusetts Institute of Technology Press.
Fauconnier, Gilles and Mark Turner 2002. The Way We Think: Conceptual Blending and the
Mind’s Hidden Complexities. New York: Basic Books.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96(3): 606–633.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan D. Duncan, Justine Cassell
and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language: Essays in Honor
of David McNeill, 195–212. Amsterdam: John Benjamins.
Halverson, Christine 1995. Inside the cognitive workplace: New technology and air traffic control.
Ph.D. dissertation, Department of Cognitive Science, University of California, San Diego.
Hazlehurst, Brian 1994. Fishing for cognition: An ethnography of fishing practice in a community
on the west coast of Sweden. Ph.D. dissertation, Departments of Anthropology and Cognitive
Science, University of California, San Diego.
Holder, Barbara 1999. Cognition in flight: Understanding cockpits as cognitive systems. Ph.D. dis-
sertation, Department of Cognitive Science, University of California, San Diego.
Hutchins, Edwin 1995a. Cognition in the Wild. Cambridge: Massachusetts Institute of Technology
Press.
Hutchins, Edwin 1995b. How a cockpit remembers its speeds. Cognitive Science 19: 265–288.
Hutchins, Edwin 2001. Distributed cognition. In: Neil J. Smelser and Paul B. Baltes (eds.), Inter-
national Encyclopedia of the Social & Behavioral Sciences, 2068–2072. Oxford: Elsevier.
Hutchins, Edwin 2003. Cognitive ethnography. Plenary address at the 25th meeting of the Cogni-
tive Science Society, Boston, MA, July 31–August 2.
Hutchins, Edwin 2006. The distributed cognition perspective on human interaction. In: Nick J. En-
field and Stephen C. Levinson (eds.), Roots of Human Sociality: Culture, Cognition and Inter-
action, 375–398. Oxford: Berg.
Hutchins, Edwin 2010. Enaction, imagination, and insight. In: John Stewart, Olivier Gapenne and
Ezequiel A. Di Paolo (eds.), Enaction: Towards a New Paradigm for Cognitive Science, 425–
450. Cambridge: Massachusetts Institute of Technology Press.
Hutchins, Edwin and Tove Klausen 1996. Distributed cognition in an airline cockpit. In: Yrjö En-
geström and David Middleton (eds.), Cognition and Communication at Work, 15–34. New
York: Cambridge University Press.
Hutchins, Edwin and Saeko Nomura 2011. Collaborative construction of multimodal utterances.
In: Jürgen Streeck, Charles Goodwin and Curtis LeBaron (eds.), Multimodality and Human
Activity: Research on Human Behavior, Action, and Communication, 29–43. Cambridge:
Cambridge University Press.
Hutchins, Edwin and Leysia Palen 1997. Constructing meaning from space, gesture, and speech.
In: Lauren B. Resnick, Roger Säljö, Clotilde Pontecorvo and Barbara Burge (eds.), Discourse,
Tools, and Reasoning: Essays on Situated Cognition, 23–40. Berlin: Springer-Verlag.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aaron Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–210. Elmsford, NY: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kirsh, David 1995. The intelligent use of space. Artificial Intelligence 73: 31–68.
Kirsh, David and Paul Maglio 1994. On distinguishing epistemic from pragmatic actions. Cognitive
Science 18(4): 513–549.
Latour, Bruno 1986. Visualization and cognition: Thinking with eyes and hands. Knowledge and
Society: Studies in the Sociology of Culture Past and Present 6: 1–40.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge, and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3):
350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Murphy, Keith M. 2004. Imagination as joint activity: The case of architectural interaction. Mind,
Culture, and Activity 11(4): 267–278.
Reddy, Michael J. 1979. The conduit metaphor: A case of frame conflict in our language about lan-
guage. In: Andrew Ortony (ed.), Metaphor and Thought, 284–297. Cambridge: Cambridge
University Press.
Streeck, Jürgen 2008. Depicting by gesture. Gesture 8(3): 285–301.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. (Gesture Studies 2.) Amster-
dam: John Benjamins.
Williams, Robert F. 2004. Making meaning from a clock: Material artifacts and conceptual blend-
ing in time-telling instruction. Ph.D. dissertation, Department of Cognitive Science, University
of California, San Diego.
Williams, Robert F. 2006. Using cognitive ethnography to study instruction. In: Sasha A. Barab,
Kenneth E. Hay and Daniel T. Hickey (eds.), Proceedings of the 7th International Conference
of the Learning Sciences, Volume 2, 838–844. International Society of the Learning Sciences
(distributed by Lawrence Erlbaum Associates).
Williams, Robert F. 2008a. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Williams, Robert F. 2008b. Guided conceptualization: Mental spaces in instructional discourse. In:
Todd Oakley and Anders Hougaard (eds.), Mental Spaces in Discourse and Interaction, 209–
234. Amsterdam: John Benjamins.
Williams, Robert F. 2008c. Situating cognition through conceptual integration. Paper presented at
the 9th conference on Conceptual Structure, Discourse, and Language, Case Western Reserve
University, Cleveland, OH, October 18–20.
Abstract
This contribution aims to provide an up-to-date picture of the theoretical models on com-
munication and bodily communication in social psychology, from the classical Encoding/
Decoding paradigm for explaining interpersonal communication in general to the con-
temporary Communication Accommodation Theory and Parallel Process Model for
1. Introduction
This chapter briefly reviews social psychology’s approach to the study of bodily and linguistic communication in social interaction, and it is organized into three main sections.
The second section describes the main broad theoretical paradigms used in studying
bodily and linguistic communication. The third section then describes the specific
social-psychological models; they are illustrated with the main topics covered by social-psychological studies, thus focusing on the role that body and/or language plays in each of the main processes involved in social interaction.
Social psychology, in focusing on psychological aspects of social life, views language
(or communication in general) and social life as intertwined: “just as language use
pervades social life, the elements of social life constitute an intrinsic part of the way lan-
guage is used” (Krauss and Chiu 1998: 41). Thus on the one hand, for social psychology,
language characterizes all the classical objects of social psychology: social perception,
attribution, attitude, identity, and both interpersonal and intergroup social interaction
(e.g., Kruglanski and Semin 2007). On the other hand, while linguists focus on language
as abstract structure, social psychology focuses on the social context (e.g., participants’
definition of the social situation, participants’ perceptions of others) affecting actual
language uses, and thus communication.
Social psychology recognizes that not all communication involves language. A broad
definition of communication, accepted at least within mainstream social psychology,
views it as a process involving exchanges of representations (Sperber and Wilson
1986). Language (either speech or writing) is one possible medium and form for repre-
sentations, but it is not the only one, since bodily features could also do the same work.
Following Krauss and Chiu (1998), a common case in human communication involves two persons as two information-processing devices: one person modifies the physical environment of the other (say, through perturbations of air molecules caused by speech) such that the second person constructs mental representations similar to the first person’s mental representations. Starting from
such a basic conception, four “paradigms” or “models” of interpersonal communication
can be identified in terms of their different characterizations of the process by which
representations are conveyed in the communication process (following Krauss
and Chiu 1998; see also Krauss and Fussell 1996): the Encoding/Decoding, Intentionalist, Perspective-taking, and Dialogic paradigms.
one hand, the same message can have different meanings for different recipients, and,
on the other hand, speakers consequently try to take their addressees’ perspectives into
account in formulating their messages (Krauss 1987).
among participants in the social interaction. The social construction of the world and
the mind is possible thanks to communication (e.g., from Berger and Luckmann
1966; to Gergen 1985; to Edwards and Potter 1992).
It must be noted that this view applies not simply to talk-in-interaction, but to any
form of communication, even when no interlocutors are present. In fact, a potential
addressee or audience is always at stake even when writing or when arguing with
oneself (Billig 1987).
Though this paradigm has long roots in the past, it has recently drawn new, perhaps unexpected, vitality from social neuroscience: results show the power of language and communication in shaping mental and social reality through specific brain
changes (e.g., Paquette et al. 2003; Siegel 1999).
the one hand, there are social judgments about others (through more or less auto-
matic processes), and on the other hand there are social behaviours (more or less
automatic too).
This model includes: (i) determinants; (ii) the social environment; (iii) cognitive-
affective mediators.
(i) Determinants are concerned with biological, cultural and personality aspects. The
first reflects the role of adaptive evolution, including communication patterns, such
as the positive and protective response towards babies’ faces that is beneficial
for their survival. Cultural and personality aspects may increase the variability
in communication: cross-cultural differences and personality styles influence and
characterize bodily communication.
(ii) The social environment concerns the interactional setting as well as the interlocutor. The environment determines the stimuli to be judged and reacted to. Sometimes, the communicator, with his/her own specificity, chooses the
communication partner and the setting. This selection process, combined with
the personal determinants, could affect both evaluation and behavioural processes,
yielding, through homogeneity and similarity between the interlocutors, more accurate social judgments about others as well as behavioural coordination in the interaction.
(iii) If social and environmental determinants provide a framework for the interaction,
cognitive-affective mediators are the processes that drive the evolution of commu-
nication. These include personal dispositions, goals, emotional states, interpersonal
expectancies and cognitive resources. The goals are perhaps the most important
mediators, because they are cognitive representations of desired states toward
which people tend, and they can influence the investment of cognitive resources
based on the type of information revealed and on the processing depth of this
information. Affective states and mood are a combination of temporary disposi-
tions and can affect both the formation of social judgments and the bodily style
adopted. Finally, cognitive resources refer to cognitive investments available to
the interaction, which can be variously distributed across oneself, one’s partners, the setting, the conversation topic, etc.
People evaluate others through bodily signals and react or act through bodily beha-
viors. In these activities, individuals are led by specific social goals. The dynamic rela-
tionship between the parallel social judgment and behavioural processes is constrained
by the influence of the determinants (biology, culture, and personality) and of the
social environment. Goals are critical in directing the two processes. Once goals are
activated by environmental stimuli, they consciously or unconsciously direct the course
of the two processes (social judgment and behaviour). In such a process, both cognitive
and affective mediators take part. The availability of cognitive resources and their
application in communication are important elements in this parallel processing.
When automatic processes are not available, or they are inappropriate or they do
not work, cognitive effort is then required in making social judgments and in managing
social behaviour. Furthermore, in these processes (judgments, behaviour and cognitive
mediation), people are often influenced by their current affective state, emotions or
mood.
evaluation of bodily behaviours also occurs in an automatic and unconscious way. The
origin of perception and the production of bodily communication could therefore be
based primarily on automatic associations among elements that are contiguous or similar in
nature rather than on the cognitive processing and analysis of benefits and likely con-
sequences of judgments and behaviours. Of course, some social-psychological variables
could work as mediators in the automaticity of these processes, such as motivation,
available cognitive resources and time, accessibility of conceptual content and meanings,
purposes, expectations and valences, as well as specific communication skills.
For example, seeing our interlocutor smile can automatically elicit a positive judgment and generate, through an impulsive system and in a non-deliberative way, bodily behaviours of approaching or replying to the smile. The automaticity of these
processes can be based on our motivational orientation regarding:
On the other hand, the involvement of variables such as, for example, motivation to
understand the intentions and expectations of others and/or cognitive resources avail-
able for decoding the smiling behaviour can generate deliberate processes of interpre-
tation and, therefore, moderate the automaticity of our positive judgment of the other.
For example, we will try to understand whether the other wants to obtain something from us, if we are motivated to do so and/or if we can relate her/his smile to the interactional situation or to other verbal behaviours. In addition, our
goals and expectations with respect to the communicative situation, as well as the ana-
lysis of the benefits and possible consequences, can activate the parallel intervention of
the reflective system in the planning and implementation of the bodily behaviour sche-
mata in reaction to the smile of the other (e.g., smile, reply, gaze, approach, contact or,
conversely, avoidance, withdrawal, escape).
In conclusion, the latest theoretical developments in the social-psychological expla-
nation of social-cognitive processes in general can also be applied to the understanding
and prediction of bodily behaviours. Understanding the deliberative and impulsive components and processes which lie at the origin of bodily behaviours and of their perception and evaluation is a key step for research on social-cognitive processes in social psychology, as well as for understanding the importance of bodily behaviours within any kind of interpersonal relationship in general. For example, within the gestural domain, it has been experimentally shown that some kinds of gestures (conversational and ideational gestures, as opposed to self-manipulations) have a more positive effect on the perceived competence and composure of the speaker as well as on positive features of the message (Maricchiolo et al. 2009). However, it remains untested whether these effects rest on automatic/impulsive and/or deliberative/reflective processes.
Future research should clarify this kind of issue.
processes. Cognitive and affective elaboration of social objects, such as attitudes, stereotypes, social perception and emotion, involves bodily states or movements as well as the
brain’s modality-specific systems for perception and action (Barsalou et al. 2003).
Such a process is called the embodiment of social cognition. Embodied cognition
theories evolved from advances in social cognition toward comprehensive accounts of
embodied phenomena that, traditionally, have been difficult to explain. Embodiment
underlies social information processing when the perceiver interacts with actual social
objects (online cognition) and when the perceiver represents social objects in their
absence (offline cognition; Niedenthal et al. 2005). Imitation of another person’s
happy facial expression is an example of online embodiment. On the other hand, under-
standing the word “happiness” or recalling a happy past experience by recruiting mod-
ality-specific systems in the present is an example of offline embodiment. Another
example is the importance of motor behaviour in attitudes. An attitude towards another person is often shown by spatial and bodily behaviours, such as distance, body orientation, and the posture assumed during the interaction. Moreover, bodily responses during interac-
tion with novel objects or persons influence later-reported attitudes and impressions
(Tom et al. 1991). According to this theoretical approach, bodily postures (relaxed/
tense) and movements (approach/avoidance) are associated with (positive/negative)
inclinations and action tendencies toward objects and persons. Furthermore, these incli-
nations and tendencies influence attitudes toward or perception about those objects and
persons (see Bonaiuto, De Dominicis, and Ganucci Cancellieri volume 2; Maricchiolo,
Bonaiuto, and Gnisci volume 2 for one example within the domain of social power and
one within that of political orientation, respectively). Thus, attitudes and impressions
would seem to be determined, at least in part, by embodied responses (Niedenthal
et al. 2005). This process would occur when people process symbolic entities, such as
words. Cognitive elaboration of concepts, such as recognition, memory, and understand-
ing, is maximally efficient when relevant conceptual information is consistent with cur-
rent embodiments (Chen and Bargh 1999). Likewise, mimicry, imitative movements, and/or postural synchrony during interaction are embodiments of a positive attitude toward, or perception of, the other (Bernieri and Rosenthal 1991; Chartrand and Bargh 1999); more-
over, these embodied processes would facilitate social perception, cooperation and
empathy (Neumann and Strack 2000). Finally, the mere simulation of an embodiment through body movements and poses can affect physiological and psychological states. A recent study (Carney, Cuddy, and Yap 2010) shows that high-power bodily displays, such as open, expansive postures, can produce changes in cortisol and testosterone secretion as well as in one’s own power self-perception and behaviours; for example, sit-
ting on a chair and leaning towards the left or the right would polarize our political ori-
entation towards, respectively, the left or the right of the political attitudes spectrum
(Oppenheimer and Trail 2010).
Such embodiment phenomena can originate in long-term processes, such as the Spa-
tial Agency Bias, according to which there is a link between the perception of social agency and the reading and writing direction of the culture to which the person belongs; in left-to-right writing cultures, people tend to attribute more agency to postures and movements oriented from left to right, while in right-to-left writing cultures the bias is reversed. The general idea is that agentic targets (i.e., those performing an action) are systematically associated with a left position, with recipient targets to their right. The resulting direction of the action is rightward. This idea, initially proposed by Chatterjee (2002), was then
5. References
Abele, Andrea E., Mirjam Uchronski, Caterina Suitner and Bogdan Wojciszke 2008. Towards an
operationalization of the fundamental dimensions of agency and communion: Trait content rat-
ings in five countries considering valence and frequency of word occurrence. European Journal
of Social Psychology 38: 1202–1217.
Argyle, Michael and Janet Dean 1965. Eye contact, distance and affiliation. Sociometry 28: 289–304.
Austin, John L. 1962. How to Do Things with Words. Oxford: Clarendon Press.
Bargh, John A. 1997. The automaticity of everyday life. In: Robert S. Wyer (ed.), Advances in
Social Cognition, 1–61. Mahwah, NJ: Lawrence Erlbaum.
Barsalou, Lawrence W., Paula Niedenthal, Aron Barbey, and Jennifer Ruppert 2003. Social
embodiment. Psychology of Learning and Motivation 43: 43–92.
Bell, Allan 1980. Language style as audience design. Language in Society 13: 145–204.
Berger, Peter and Thomas Luckmann 1966. The Social Construction of Reality. New York:
Doubleday.
Bernieri, Frank J. and Robert Rosenthal 1991. Interpersonal coordination: Behavioral matching
and interactional synchrony. In: Robert S. Feldman and Bernard Rimé (eds.), Foundations
of Nonverbal Behaviour, 401–432. Cambridge/New York: Cambridge University Press.
Billig, Michael 1987. Arguing and Thinking. A Rhetorical Approach to Social Psychology. Cam-
bridge: Cambridge University Press.
Bonaiuto, Marino, Stefano De Dominicis and Uberta Ganucci Cancellieri volume 2. Gestures,
postures, gaze, and movement in work and organization. In: Cornelia Müller, Alan Cienki,
Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body-Language-
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Burgoon, Judee K. 1978. A communication model of personal space violations: Explication and an
initial test. Human Communications Research 4: 129–142.
Burgoon, Judee K., Lesa Stern and Leesa Dillman 1995. Interpersonal Adaptation: Dyadic Inter-
action Patterns. New York: Cambridge University Press.
Carney, Dana R., Amy J. C. Cuddy and Andy J. Yap 2010. Power posing: Brief nonverbal displays
affect neuroendocrine levels and risk tolerance. Psychological Science 21: 1363–1368.
Chartrand, Tanya L. and John A. Bargh 1999. The chameleon effect: The perception-behavior link
and social interaction. Journal of Personality and Social Psychology 76: 893–910.
Chatterjee, Anjan 2002. Portrait profiles and the notion of agency. Empirical Studies of the Arts
20(1): 33–41.
Chen, Mark and John A. Bargh 1999. Consequences of automatic evaluation: Immediate behavior
predispositions to approach or avoid the stimulus. Personality and Social Psychology Bulletin
25: 215–224.
Stroebe (eds.), The Scope of Social Psychology: Theory and Applications, 107–120. New York:
Psychology Press.
Malatesta, Carol Zander and Jeannette M. Haviland 1982. Learning display rules: The socializa-
tion of emotion expression in infancy. Child Development 53: 991–1003.
Maricchiolo, Fridanna, Angiola di Conza, Augusto Gnisci and Marino Bonaiuto this volume. De-
coding bodily forms of communication. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body-Language-Communication: An
International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Maricchiolo, Fridanna, Augusto Gnisci, Marino Bonaiuto and Gianluca Ficca 2009. Effects of dif-
ferent types of hand gestures in persuasive speech on receivers’ evaluations. Language and
Cognitive Processes 24(2): 239–266.
Maricchiolo, Fridanna, Marino Bonaiuto and Augusto Gnisci volume 2. Body movements in polit-
ical discourse. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Jana Bressem (eds.), Body-Language-Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.2.) Berlin: De Gruyter Mouton.
Marková, Ivana and Klaus Foppa (eds.) 1991. Asymmetries in Dialogue. Hemel Hempstead,
England: Harvester Wheatsheaf.
Marková, Ivana, Carl Friedrich Graumann and Klaus Foppa (eds.) 1995. Mutualities in Dialogue.
Cambridge: Cambridge University Press.
Mead, George Herbert 1934. Mind, Self, and Society. Chicago: University of Chicago Press.
Neumann, Roland and Fritz Strack 2000. “Mood contagion”: The automatic transfer of mood
between persons. Journal of Personality and Social Psychology 79: 211–223.
Niedenthal, Paula M., Lawrence W. Barsalou, Piotr Winkielman, Silvia Krauth-Gruber and
Francois Ric 2005. Embodiment in attitudes, social perception, and emotion. Personality and
Social Psychology Review 9(3): 184–211.
Oppenheimer, Daniel M. and Thomas E. Trail 2010. Why leaning to the left makes you lean to the
left: Effect of spatial orientation on political attitudes. Social Cognition 28: 651–661.
Paquette, Vincent, Johanne Lévesque, Boualem Mensour, Jean-Maxime Leroux, Gilles Beaudoin,
Pierre Bourgouin and Mario Beauregard 2003. Change the mind and you change the brain: Ef-
fects of cognitive-behavioral therapy on the neural correlates of spider phobia. NeuroImage 18:
401–409.
Patterson, Miles L. 2001. Toward a comprehensive model of non-verbal communication. In: Wil-
liam Peter Robinson and Howard Giles (eds.), The New Handbook of Language and Social
Psychology, 159–176. Chichester: John Wiley & Sons.
Rimé, Bernard, Loris Schiaratura, Michel Hupet and Anne Ghysselinckx 1984. Effects of relative
immobilization on the speaker’s nonverbal behavior and on the dialogue imagery level. Moti-
vation and Emotion 8: 311–325.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neuros-
ciences 21: 188–194.
Rommetveit, Ragnar 1980. Prospective social psychological contributions to a truly interdisciplin-
ary understanding of ordinary language. Journal of Language and Social Psychology 2: 89–104.
Searle, John R. 1969. Speech Acts. Cambridge: Cambridge University Press.
Searle, John R. 1985. Expression and Meaning: Studies in the Theory of Speech Acts. Cambridge:
Cambridge University Press.
Shannon, Claude Elwood and Warren Weaver 1949. The Mathematical Theory of Communication.
Urbana-Champaign: Illinois University Press.
Siegel, Daniel J. 1999. The Developing Mind. New York: Guilford Press.
Siegman, Aron W. and Mark A. Reynolds 1982. Effects of mutual invisibility and topical inti-
macy on verbal fluency in dyadic communication. Journal of Psycholinguistic Research 12:
443–455.
Sperber, Dan and Deirdre Wilson 1986. Relevance: Communication and Cognition. Cambridge,
MA: Harvard University Press.
Strack, Fritz and Roland Deutsch 2004. Reflective and impulsive determinants of social behavior.
Personality and Social Psychology Review 8(3): 220–247.
Suitner, Caterina and Ann Maass 2008. The role of valence in the perception of agency and com-
munion. European Journal of Social Psychology 38: 1073–1082.
Tajfel, Henri (ed.) 1978. Differentiation between Social Groups: Studies in the Social Psychology
of Intergroup Relations. London: Academic Press.
Tom, Gail, Paul Pettersen, Teresa Lau, Trevor Burton and Jim Cook 1991. The role of overt head
movement in the formation of affect. Basic and Applied Social Psychology 12: 281–289.
Wiener, Norbert 1948. Cybernetics: Or Control and Communication in the Animal and the
Machine. Paris: Hermann.
Abstract
This article is an introduction to the theoretical and methodological backgrounds of
multimodal (inter)action theory. The aim of this theory is to explain the complexities
of (inter)action, connecting micro- and macro levels of analysis, focusing on the social
actor. The most important theoretical antecedent, mediated discourse analysis (see
Scollon 1998, 2001b), is presented with its key concepts mediated action and modes.
It is shown how action is used as the unit of analysis and how modes are understood
in multimodal (inter)action analysis – as complex cultural tools, as systems of
mediated action with rules and regularities and different levels of abstractness. Subse-
quently, methodological basics are introduced, such as lower-level, higher-level and fro-
zen action; modal density, which specifies the attention/awareness of the social actor; and
horizontal and vertical simultaneity of actions. Horizontal simultaneity can be plotted on
the heuristic model of the foreground-background continuum of attention/awareness. Vertical
simultaneity of actions comprises the central layer of discourse (immediate actions), the
intermediate layer (long-term actions) and the outer layer (institutional or societal con-
texts). In short, it is sketched how multimodal (inter)action analysis aims to answer ques-
tions about the interconnection of the different modes on a theoretical as well as on a
practical level.
1. Introduction
Multimodal (inter)action analysis is an interdisciplinary methodology that integrates
verbal and non-verbal actions (i.e.: spoken language and gesture, posture, or gaze) as
well as objects in the material world (i.e.: computers, cell phones, toys or pieces of fur-
niture) and the environment itself (i.e.: layout of a room, a city or a park). With this
methodology, we also integrate psychological notions such as feelings and levels of
attention/awareness as they reveal themselves phenomenologically in (inter)action.
Feelings may be displayed phenomenologically in a social actor’s facial expression,
and attention/awareness may be analysed through the modal intensity and/or complex-
ity of an action that is performed. However, before going into the details of how multi-
modal (inter)action analysis allows us to investigate these lower- or higher-level
actions and include the world of objects and the environment in the analysis, I would
like to begin by giving some historical and theoretical background. Every methodology
has an underlying theory, and while theories are often implicit when we write about
methodology, I would like to illustrate how theory and methodology have developed
and how they connect in multimodal (inter)action analysis to build a solid foundation
for multimodal inquiry.
However, even when doing this, he postulated that each practice (or repeated action) is
actually performed at a particular time in a particular place (or a site of engagement),
thus making a practice also an action.
Because of this dual property – a property of history and a property of the immediate –
the concepts become interlinked, theoretically linking the micro to the macro.
Thus, we can see that the mode of language is similar to the mode of walking in that
both possess rules and regularities (though vastly different), but we have to agree that
this is the only similarity.
The challenge for the area of multimodality is to account for the differences in modes
while simultaneously building an overarching theoretical umbrella that encompasses all
modes, building on their similarities.
A mode, as I have emphasized before, is not real, is not countable, and has to be defined
and re-defined depending upon the focus of study (Norris 2004, 2011c). However, in
loose terms, we can speak about modes (without clearly defining them) to gain a shared
(though vague) understanding of what we are referring to. For example, we can speak of
modes such as furniture, gesture, or images, and we all have some shared understanding
of what is meant. The reason we can speak of the mode of furniture, gesture, or images
is that there are certain rules and regularities that structure these modes. Yet all modes
differ greatly in their materiality and also in their structure.
A theory of mediated action allows us to account for the differences in all modes,
while also allowing us to integrate their similarities – if we add a new notion to
mediated discourse theory. Here, we want to keep in mind that most important for
the theory of mediated action is the notion that a mediated action is a social actor acting
with/through mediational means or cultural tools (throughout the chapter, the terms
cultural tool and mediational means are used interchangeably).
Taking this primary feature of mediated discourse analysis a step further, I define
modes as systems of mediated action. Further, modes may be concrete or abstract
systems of mediated action, where concrete modes are located on one end of a contin-
uum and abstract modes on the other and where most modes have some aspects of
abstractness as well as concreteness, allowing for a fuzzy distinction. The systems are
recognisable by the rules and regularities attached to the mediated actions.
As a mode, furniture is a system of mediated action because furniture is made by
social actors for social actors’ use. The mode of furniture does not exist without social
actors, and neither does any other mode. A system of mediated action (in this case the
mode of furniture) is always and only an ephemeral concept that allows us to talk about
(in this case) all or some kind of furniture. The actual pieces of furniture that social
actors use (or produce), however, are cultural tools.
Similarly, when speaking about the mode of gesture as a system of mediated action,
we refer to a concept. When speaking about a gesture that is actually being performed,
the gesture consists of a social actor using multiple mediational means.
By theorizing modes as systems of mediated action, the mere definition of the term
mode includes the irreducible tension between social actor and mediational means. This
tension is easily missed when defining modes as semiotic systems or as systems of rep-
resentation; but it is also missed when we define modes as mediational means or cul-
tural tools because in either case the system seems dislodged from social actors. Yet,
it is social actors that multimodal (inter)action analysis is interested in and who are
at the heart of communication, not semiotic systems or systems of representation nor
mediational means or cultural tools.
The term mode is thus defined as a concrete or abstract system of mediated action,
whereby most modes lie somewhere on a continuum between these two points of con-
crete and abstract. A more concrete system would be the mode of walking, while a
more abstract system would be the mode of language.
This point of view directly grows out of mediated discourse theory (Scollon 2001a,
2001b), which is based on the following three principles:
(i) The principle of social action: Discourse is best conceived as a matter of social ac-
tions, not systems of representation or thought or values.
(ii) The principle of communication: The meaning of the term “social” in the phrase
“social action” implies a common or shared system of meaning. To be social an
action must be communicated.
(iii) The principle of history: “Social” means “historical” in the sense that shared meaning
derives from common history or common past. (Scollon 2001b: 6–8 [italics original])
Defining a mode as a system of mediated action naturally follows from the first princi-
ple, which highlights the social action as primary. Further, it encompasses the second
and the third principles, which highlight the aspect of shared meaning, action, and his-
tory. Common meaning or common past is constructed through mediated actions that
are taken in the world.
However, it also deviates from mediated discourse theory, as a mediated action in
mediated discourse theory is always and only a one-time irreversible action taken in
the world. By speaking of modes as systems of mediated action, I superimpose a con-
cept onto the irreversible one-time notion of a mediated action: A system of mediated
action is a heuristic concept that allows us to express the idea of something overarching,
such as furniture or language, walking or gesture, and directly embeds the importance
of social actors within this concept.
In this view, then, language is a system of mediated action; images are conceived as a
system of mediated action; walking is seen as a system of mediated action, and so on.
Each of these systems of mediated action (or modes) are viewed as systems that have
come about through the historical conglomeration of mediated actions, which always
and irreducibly link social actors to the cultural tools used to perform social action in
the world. In other words, modes are built and re-built, shaped and changed by social
actors through use. They are not viewed as systems of representation that exist in and
by themselves, but rather are viewed as systems that are produced and re-produced by
social actors in (inter)action.
When investigating modes in use, we can see that each system is recognizable
through the rules and regularities attached to it. But in this view, where a mode is a sys-
tem of mediated action, we have explicitly unpacked the irreducible tension between
social actors and cultural tools, allowing us to theorize modes in quite different ways
than before. When defining a mode as a system of mediated action with rules and
regularities attached to it, making the irreducible tension between social actor and med-
iational means explicit, we can see that the rules and regularities of a mode may be
more attached to the mediational means or cultural tools or may be more attached
to the social actor. When thinking about the mode of language, we see that language
is an abstract system of mediated action, where we can find a great number of rules
and regularities that have become embedded in the cultural tools of semantics, syntax,
etc. Differently, when thinking about the mode of walking, we find that the rules and
regularities of the modes are largely (though not entirely) linked to the social actor’s
body.
While we can make many sentences – and even many nonsensical sentences – and
have all of them still recognizable by others as language, the ways in which we can
walk, so that our walking is recognizable as such by others, are largely constrained
by our bodies. Certainly, language is also constrained by our bodies, but for the average
social actor language is much less constrained by the body than, for example, walking is.
Feet have to be placed in a certain way, not only because that is what the mode affords,
but precisely because the body highly constrains the mode. Further, language is a mode
that allows us to communicate in highly abstract ways, while walking is a much more
concrete mode, which is not usually used in highly abstract communicative ways,
even though the action of walking always does communicate to others.
While language is very well studied and we can easily adopt what we know about
language for a multimodal analysis, many other modes – with the exception of
maybe gesture and some aspects of gaze – are much less understood. Our interest in
multimodal (inter)action analysis, however, is not only to gain a better understanding
of particular modes; our interest is to gain a better understanding of how modes play
together in human (inter)action.
the lower-level action of pointing to the pepper mill on the table to come about. All
actions are demarcated by their beginning and ending points, and when investigating
actions in this way, we soon discover that there are many levels of higher-level action
(Norris 2011a). When analysing a dinner, for example, we can investigate the dinner
from beginning to end, but we can also investigate parts of the dinner as demarcated
higher-level actions by analysing the beginning and ending points, such as the beginning
and ending of food being brought to the table, or the beginning and ending of a topic
being discussed, the beginning and ending of someone eating dessert, or the beginning
and ending of a person addressing someone at the table. Thus, the extent and level of a
higher-level action always depends upon the focus of study: if one investigates addresses
during dinner, one investigates this as a higher-level action, taking into consideration
not only the verbal, but also the non-verbal actions, the objects that are handled, the
food that is eaten and the table and chairs that are used during the addresses that
one studies; and if one investigates complete dinners, one investigates the complete din-
ner as a higher-level action, taking into consideration not only the verbal, but also the
non-verbal actions, the objects that are handled, the food that is eaten and the table and
chairs that are used during the dinner that one studies. The essence in multimodal
(inter)action analysis is always the linking of the multitude of modes that are used in
(inter)action when social actors act and communicate.
daughter in the conversation, also asking about their day. While much talk overlaps, the
father interacts with each other social actor at the table, while he is also eating dinner
himself.
When we analyse this dinner, we can see that the father is paying focused attention
to the conversation between himself and his partner: Modal density in the conversation
between these two social actors is high, which is produced primarily through intensity of
the mode of language. His spoken language is fast-paced with a slight high pitch, and
the conversation continues for a long stretch during the dinner. During the exchange,
he mostly looks at the baby, coordinating his feeding with the baby’s self-feeding and
babbling. Once in a while, he looks up and gazes at his daughter, who is telling her
story of the day that he pays some attention to while he is engaged with the mother
and the baby, and makes a remark or asks her a question. He also sometimes gazes
at his older son, who is listening, but is mostly focused upon the food in front of him.
What we find here, is that the father engages in five (inter)actions, or higher-level
actions, simultaneously: he pays focused attention to his partner, pays attention to a
lesser degree to the baby, to a still lesser degree to his daughter, and to an even lesser
degree to his older son, and eats his own dinner without paying much attention to it at
all. In order to produce all of these simultaneous actions, he uses the modes of gaze,
gesture, spoken language, posture, proxemics, object handling, and furniture. Some
(inter)actions are constructed more through the mode of language, others more through
the mode of gaze or the mode of object handling, but all are produced through all of the
modes in one way or another, so that the modes play together differently for each
(inter)action.
When thinking about (inter)action in this way, we can see that social actors are often
engaged in several higher-level actions at the very same time. As analysts, we can read
the levels of attention/awareness that the father pays to each of the other social actors
from the modal density that he employs as he engages with the others. While the father (as
any other social actor) can only give one higher-level action his focused attention at any one
time, he can give differentiated attention to quite a few higher-level actions. The (inter)
actions that he engages in can then be plotted on the heuristic model called the
foreground-background continuum of attention/awareness (Fig. 17.1).
Fig. 17.1: The foreground-background continuum of attention/awareness, plotting the father's simultaneous (inter)actions by decreasing modal density and thus decreasing attention/awareness: (inter)action with the mother, (inter)action with the baby, (inter)action with the daughter, (inter)action with the son, and eating.
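As an illustration only – and not part of Norris's own method – the ranking that Fig. 17.1 visualizes can be sketched as a small data structure in which each simultaneous higher-level action is given a rough, analyst-estimated modal density score; all names and numbers below are hypothetical placeholders.

```python
# Minimal sketch (hypothetical values): ordering simultaneous higher-level
# actions on a foreground-background continuum by estimated modal density.
# The scores are not measurements; they stand in for an analyst's judgement
# of modal intensity/complexity for each (inter)action.

modal_density = {
    "(inter)action with the mother": 0.9,    # high intensity of spoken language
    "(inter)action with the baby": 0.7,      # gaze, object handling (feeding)
    "(inter)action with the daughter": 0.4,  # intermittent gaze, brief remarks
    "(inter)action with the son": 0.2,       # occasional gaze only
    "eating own dinner": 0.1,                # backgrounded object handling
}

# Foreground-to-background ordering: highest modal density first.
continuum = sorted(modal_density.items(), key=lambda item: item[1], reverse=True)

for rank, (action, density) in enumerate(continuum, start=1):
    print(f"{rank}. {action} (estimated modal density: {density})")
```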
(i) the central layer of discourse (comprised of the immediate actions performed by a
social actor);
(ii) the intermediary layer of discourse (comprised of the long-term actions that social
actors produce within their network(s)); and
(iii) the outer layer of discourse (comprised of actions performed within the institu-
tional or societal contexts).
Often, all three vertical layers of discourse overlap, so that they are difficult to disen-
tangle. However, when it comes to misunderstandings or ruptures of some kind in
(inter)action, we often can find the answer to the problems in the different vertical
layers of discourse and their normative expectations. For example, let us think about
the father in our example above. There he is engaged with his partner in conversation,
in feeding the baby and, at times, in (inter)action with the two older children. On the
central layer of discourse, his actions all make sense, and on the intermediary layer of dis-
course, within his family network, his actions all run smoothly and are comprehensible to
all others involved.
But now imagine that the mother believes in a strict gender differentiation, where
she feeds the baby. In this instance, the father’s actions (his central layer of discourse)
would clash with his intermediary layer of discourse (his family or network discourse),
and instead of a smooth and happy dinner, a completely different and quite strained
(inter)action between the two adults would emerge. Thus, while the different layers
of discourse are often invisible in smooth (inter)actions because they overlap comple-
tely, they become visible when they diverge and the (inter)action becomes strained in
some way as a result. Of course, such ruptures in (inter)action can also come about when the
outer layer of discourse disconnects from any of the other layers of discourse.
4. Conclusions
Multimodal (inter)action analysis is a method that is theoretically strongly grounded
in mediated discourse analysis. With this method, we try to analyse the complexities
in (inter)action, neither shying away from a minute and detailed micro analysis of
lower-level actions that social actors perform, nor shying away from connecting these
micro analyses to the various layers of discourse from micro to macro.
When we define modes as systems of mediated action, we highlight that modes are
conceptual notions which grow out of and change within (inter)actions. Modes are pro-
duced by social actors for social actors’ use. They are learned, developed, and changed
through (inter)action, and embed rules and regularities. Rules and regularities may be
more attached to the social actor’s body, as is the case for the mode of walking; or may
be more attached to the cultural tool, as is the case for the mode of language. Where
exactly rules and regularities can be found within a mode is of great interest to a multi-
modal (inter)action analyst, because it shows that we cannot treat all modes in the same
way. Modes have different affordances and limitations, and these differences have to be
taken into account when investigating multimodal (inter)action.
In multimodal (inter)action analysis, we take the mediated action as our unit of ana-
lysis, as we believe that humans foremost act in the world. Several actions are often per-
formed simultaneously, rather than only consecutively, and simultaneity is a notion that
multimodal (inter)action analysis investigates not only on a horizontal, but also on a
vertical level.
However, there are many different elements in (inter)action that can be analysed
using multimodal (inter)action analysis, from transcription and the theorizing of modes,
modal hierarchies, and modal interconnections in (inter)action to the analysis of identity pro-
duction (Norris 2002, 2007, 2011a, 2011c). Especially for the analysis of identity produc-
tion, several methodological concepts have been developed (Norris 2011a). As an
example, we can speak of identity elements being produced as social actors perform
a higher-level action: the father in our example above produces a father
identity as he feeds the baby, or a partner identity as he discusses his day with the
mother. He produces an immediate identity element (as he performs
the actions on the central layer of discourse, i.e.: the lower-level and higher-level ac-
tions that he performs); he also produces a continuous identity element (as he performs
the actions within the intermediary layer of discourse, i.e.: the family discourse in our
example); and he performs a general identity element (as he performs the actions
on the outer layer of discourse, i.e.: institutional or societal gender discourse in our
example).
Multimodal (inter)action analysis tries to shed new light on human communication
from a vast array of settings. Researchers have used multimodal (inter)action analysis
to study computer mediated communication (Örnberg Berglund 2005); to analyse
aspects of advertising (White 2010, 2011) and interactions between and among journal-
ists and public relations professionals (Sissons 2011); or to study Aipan art making
(Frommherz 2011).
We have further studied interaction in an elementary school classroom (Norris
2003), and have used multimodal (inter)action analysis to study doctor-patient interac-
tions, and music lessons, as well as traffic police officers and workplace practices (Norris
2004). On a more theoretical level, multiparty interactions (Norris 2006) have been
studied, as well as rhythm in (inter)action (Norris 2009) and gesture in relation to
language (Norris 2011c).
While we have gained some knowledge about multimodal (inter)action, multimodal
(inter)action analysis is still a young methodology that opens up many old questions to
new scrutiny: How do social actors perform actions? Which lower-level actions are nec-
essary to construct a higher-level action? Which lower-level actions are required
because of the construction of a higher-level action? How many actions can a social
actor perform simultaneously?
But we will also wish to answer questions such as: How do the various modes play
together in interaction? Are there rules and regularities attached to modal aggregates?
If modes, as shown in Norris (2011c), fluctuate in hierarchies, what does this mean for
the mode of language, the mode of gesture, or the mode of object handling? Further,
we will want to think about the practical impact that we can have because of our
findings and ask ourselves: Can we teach social actors about the various layers of dis-
course and the impact that these have on their everyday actions? How can we use
our new knowledge in a constructive way to alleviate miscommunication and foster
better communication?
With these and many other questions, multimodal (inter)action analysis tries to find
answers by integrating the study of language with context and the social actor’s use of
embodied modes such as gesture, gaze, or posture, always taking into account and trying
to build on research not only in linguistics, but also in the area of non-verbal behaviour
and gesture (as for example by Argyle and Cook 1976; Birdwhistell 1970; Dittmann
1987; Ekman and Rosenberg 1997; Kendon 2004; McNeill 1992, 2000, 2005).
5. References
Argyle, Michael and Mark Cook 1976. Gaze and Mutual Gaze. Cambridge: Cambridge University
Press.
Birdwhistell, Ray L. 1970. Kinesics and Context. Essays on Body Motion Communication. Phila-
delphia: University of Pennsylvania Press.
Dittmann, Allen T. 1987. The role of body movement in communication. In: Aron W. Siegman
and Stanley Feldstein (eds.), Nonverbal Behavior and Communication, 2nd edition, 37–63.
Hillsdale, NJ: Lawrence Erlbaum.
Ekman, Paul and Erika Rosenberg (eds.) 1997. What the Face Reveals. Basic and Applied Studies
of Spontaneous Expression Using the Facial Action Coding System (FACS). Oxford: Oxford
University Press.
Frommherz, Gudrun 2011. Sacred time: Temporal rhythms in Aipan practice. In: Sigrid Norris
(ed.), Multimodality and Practice: Investigating Theory-in-Practice-through-Methodology, 66–
81. New York: Routledge.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. Garden City, NY: Doubleday.
Goffman, Erving 1963. Behavior in Public Places. New York: Free Press of Glencoe.
Goffman, Erving 1974. Frame Analysis. New York: Harper and Row.
Gumperz, John 1982. Discourse Strategies. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kress, Gunther and Theo Van Leeuwen 2001. Multimodal Discourse: The Modes and Media of
Contemporary Communication. London: Edward Arnold.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Norris, Sigrid 2002. The implication of visual research for discourse analysis: transcription beyond
language. Visual Communication 1(1): 97–121.
Norris, Sigrid 2003. Autonomy: multimodal literacy events in an immersion classroom. Annual
Meeting of the American Association of Applied Linguistics. Washington, DC, March
21–23.
Norris, Sigrid 2004. Analyzing Multimodal Interaction: A Methodological Framework. London:
Routledge.
Norris, Sigrid 2006. Multiparty interaction: A multimodal perspective on relevance. Discourse Stu-
dies 8(3): 401–421.
Norris, Sigrid 2007. The micropolitics of personal national and ethnicity identity. Discourse and
Society 18(5): 653–674.
Norris, Sigrid 2009. Tempo, Auftakt, levels of actions, and practice: Rhythms in ordinary interac-
tions. Journal of Applied Linguistics 6(3): 333–356.
Norris, Sigrid 2011a. Identity in Interaction: Introducing Multimodal Interaction Analysis. Berlin:
De Gruyter Mouton.
Norris, Sigrid (ed.) 2011b. Multimodality and Practice: Investigating Theory-in-Practice-through-
Methodology. New York: Routledge.
Norris, Sigrid 2011c. Three hierarchical positions of deictic gesture in relation to spoken language:
A multimodal interaction analysis. Visual Communication 10(2): 129–147.
Norris, Sigrid and Rodney H. Jones (eds.) 2005. Discourse in Action: Introducing Mediated Dis-
course Analysis. London: Routledge.
Örnberg Berglund, Therese 2005. Multimodality in a three-dimensional voice chat. In: Jens All-
wood, Beatriz Dorriots and Shirley Nicholson (eds.), Proceedings from the Second Nordic Con-
ference on Multimodal Communication, April 7–8: 303–316.
Scollon, Ron 1998. Mediated Discourse as Social Interaction. London: Longman.
Scollon, Ron 2001a. Action and text: Toward an integrated understanding of the place of text in
social (inter)action. In: Ruth Wodak and Michael Meyer (eds.), Methods in Critical Discourse
Analysis, 139–183. London: Sage.
Scollon, Ron 2001b. Mediated Discourse: The Nexus of Practice. London: Routledge.
Sissons, Helen 2011. Multi-modal exchanges and power relations in a public relations department.
In: Sigrid Norris (ed.), Multimodality and Practice: Investigating Theory-in-Practice-through-
Methodology, 35–49. New York: Routledge.
Tannen, Deborah 1984. Conversational Style: Analyzing Talk among Friends. Norwood, NJ:
Ablex.
Van Leeuwen, Theo 1999. Speech, Music, Sound. London: Macmillan Press.
Vygotsky, Lev S. 1987. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar (revised and edited by Alex Kozulin). Cambridge: Massachusetts Institute of
Technology Press.
Wertsch, James V. 1998. Mind as Action. Oxford: Oxford University Press.
White, Paul 2010. Grabbing attention: The importance of modal density in advertising. Visual
Communication 9(4): 371–397. London: Sage.
White, Paul 2011. Reception as social action: The case of marketing. In: Sigrid Norris (ed.), Multi-
modality and Practice: Investigating Theory-in-Practice-through-Methodology, 138–152. New
York: Routledge.
18. Body gestures, manners, and postures in literature

Abstract
Whether for research or for attaining full enjoyment of the experience of reading a novel,
for instance, it would be unrealistic and rather shortsighted to identify gestures, manners
and postures by themselves, without acknowledging the other co-occurring or alternating,
conscious or unconscious, communicative bodily sign systems, simply as they manifest
themselves in everyday interactive or noninteractive situations (i.e. proxemic behaviors,
bodily chemical, thermal and dermal activities, object-related behaviors); apart from
the fact that, at the very least, they occur as co-systems within the triple structure of
speech: verbal language (the words said) – paralanguage (how those words sound, and
other word-like utterances) – kinesics (movements and still positions co-occurring or
alternating with those words). This article, while referring to more extensive treatments
of its various topics, is meant therefore, with appropriate textual examples, as a reference
model, for researchers or serious readers, of all that ought to be regarded in literature as
kinesics and its varying sensory perception, its internal structuring, its qualifying charac-
teristics, and the different ways in which the sensitive reader, as recreator, should be able
to perceive its explicit and implicit occurrences in the writer’s created text, discussing also
the common phenomenon of our mute or sound “oralization” of the text as part of the
reading act. In addition, it suggests the application of this topic in the area of literary
anthropology.
In fact, kinesics is, above all, one of the three mutually inherent (and more often than
not mutually conditioning) components of interactive speech: verbal language (words) –
paralanguage (voice modifying features and independent word-like utterances) –
kinesics, circumstantially associated also with other somatic or extrasomatic signs like
tear-shedding, blushing, or clothes (Poyatos 2002a, 2002b), as well as to biophysicopsy-
chological, cultural, socioeconomic and educational conditioning elements (Poyatos
2002a: 124–130). Only keeping this in mind will make it possible to focus here, due
to editorial limitations, on bodily movements and positions in literature alone.
A literary work, most typically a novel, shows the following verbal and nonverbal
components: written verbal exchanges, paralinguistic descriptions and transcriptions,
kinesic descriptions, proxemic (interpersonal and person-environment spatial relation-
ships) descriptions, other described or evoked personal signs (chemical (smells, flavors),
thermal (temperature), shape, size, consistency and strength, weight, color), sensible de-
scriptions or implicit evocations of any natural, built or artifactual environment (sound
and silence, movement and stillness, chemical signs, temperature, spaces, volumes,
shapes, consistency, light, darkness) (Poyatos 2002a: 32–48; 2003c: 3–12, 16–24), not
to be dissociated from kinesics either in literary studies or in the actual reading act.
This multi-channel reality shows researchers and sensitive readers: that the literatures
of the different cultures are an invaluable source of data for the study of kinesics
and other nonverbal bodily signs in anthropology, sociology, psychology, etc. (Poyatos
2002d); and that serious research on narrative, dramatic or poetic texts requires an
interdisciplinary approach to all the sign systems just mentioned (Poyatos 2002c).
Expressiveness in real life, and in that other reality we recreate in our personal read-
ing act, depends greatly on our universal as well as culture-specific kinesic behaviors.
They constitute a sort of unique kinetic vocabulary (i.e., movement and associated still-
ness) – with possible grammatical functions in the verbal-nonverbal flow of speech –
that can express visually and nonvisually what would be ineffable otherwise (Poyatos
2002a: 104–112).
Nonverbal communication in literature as a research area began to receive system-
atic treatment in the 1970s by Poyatos (1972, 1977) and has proliferated since then (e.g.,
Korte 1993; Marmot Rein 1986; Portch 1985; and by the contributors to Poyatos’
1988, 1992 and 1997 edited volumes), more recently by Poyatos’ endeavors to establish
a realistic all-inclusive model for the analysis of the various nonverbal aspects of liter-
ature in its creative stages, its readers’ recreation, and its interlinguistic-intercultural
translation (2002c, 2002d, 2008). In order to deal with the specific areas and points
which should widen the readers’ perspectives and spur further interdisciplinary
research, this brief overview will concentrate on the novel.
kinesthetic perception, which, whether isolated or combined with the linguistic and
paralinguistic structures and with other somatic and object-manipulating behavioral
systems, possess intended or unintended communicative value (on kinesics, Poyatos
2002b: Chapter 5; 2002c: Chapter 4; 2002e).
Thus defined, its observation and study include more than what is usually acknowl-
edged. First of all, its three basic categories:
– gestures, both conscious and unconscious, mainly of the head, the face alone, includ-
ing gaze, and the extremities: “Miss Crane’s thin red nostrils quivered with indigna-
tion” (Wolfe LHA: XXV);
– manners, that is, how we perform a gesture or adopt a posture, but also “social man-
ners” (eating, smoking, shaking hands, donning or doffing a garment, kind of gait):
“Mr F.’s Aunt [after eating a piece of toast] then moistened her ten fingers in slow
succession at her lips, and wiped them in exactly the same order on the white hand-
kerchief ” (Dickens LD: II, IX), “Paulie swang, catching him unexpectedly in the
jaw from the side. The fellow staggered” (Farrell YMSL: I, V);
– postures, delimiting, and caused by, movements, with which they articulate in a com-
municative continuum to be acknowledged as we read, just as we must acknowledge
silences with respect to sounds: “Harran fell thoughtful, his hands in his pockets, frowning moodily
at the toe of his boot” (Norris O: I, V). There are two kinds: dynamic postures, when
a basically static posture contains a moving element, or the whole body moves in a
posture: “She stood on one foot, caressing the back of her leg with a bare instep”
(Steinbeck GW: XX); and contact postures, involving another person or object:
“NAPOLEON […] [He sits down at the table, with his jaws in his hands, and his
elbows propped on the map]” (Shaw MD).
– movements and positions of eyes and eyelids: “Mrs. Kearney rewarded his very flat
final syllable with a quick stare of contempt” (Joyce M);
– the hidden hands, whether still or moving: “Godfrey stood, still with his back to the
fire, uneasily moving his fingers among the contents of his side-pockets, and looking
at the floor” (Eliot SM: III);
– the heaving chest, an uncontrollable manner not necessarily dissociated from words,
paralanguage and other kinesic behaviors: “She [Ruth, in court] was pale, angry,
almost sullen, and her breast heaved” (Grey RT: X);
– a sudden stiffening of the body as thunder claps or something startles us: “She felt her
sister-in-law stiffen with nervousness and clasp her little bag tightly [in her excitement
when Morris gets up]” (Woolf Y: 1891);
– shuddering in disgust or horror, or shivering from cold or emotion: “CABOT [pats her
on the shoulder. She shudders]” (O’Neill DUE: III, iv);
– something as expressive as door-knocking (with personal and cultural characteristics)
or door-slamming: “[EBEN rushes out and slams the door – then the outside front door
[…]” (O’Neill DUE: I, ii);
– a person’s eloquent stride: “Elmer watched Jim plod away, shoulders depressed, a
man discouraged” (Lewis EG: XXX, V);
There are also less obvious, but relevant, occurrences, mainly: microkinesics, meaning-
ful movements and still positions of small magnitude: “Adam’s face was bent down, and
his jawbone jutted below his temples from clenching” (Steinbeck EE: XXIV, I), specif-
ically as micropostures, adopted by minor body parts (as in delicately feminine holding
of a cup with fingers of both hands loosely around it): “NAPOLEON [exasperated, clasps
his hands behind him, his fingers twitching […] This woman will drive me out of my
senses [To her] Begone” (Shaw MD).
As for our perception of body movements and positions, we can do it:
Holmes] could read that as he walked he grew more and more excited. That is shown by
the increased length of his strides” (Conan Doyle SS: I, IV).
(i) formative phase, initiated in different static positions to later continue its course
(sometimes a “manner,” as with how we fold or unfold our arms): “The serenity
of her expression [Sally’s] was altered by a slight line between the eyebrows; it
was the beginning of a frown” (Maugham OHB: XXI);
(ii) central phase, either dynamic (e.g. shaking the hand in the French “O, là, là!”) or
static (e.g. holding one’s temples with thumb and forefinger while trying to remem-
ber): “I mean it, sir. Please don’t worry about me.” I sort of put my hand on his
shoulder. ‘Okay?’ I said” (Salinger CR: II);
(iii) and the dissolving or disarticulating phase, before initiating the next behavior, as with
the residual smile after a laugh, or a hand receding after executing the “firm” ges-
ture in: “In the momentary firmness of the hand that was never still – a firmness
inspired by the utterance of these last words, and dying away with them – I saw
the confirmation of her earnest tones” (Dickens BH: LX).
(i) the interrupted kinesic behaviors, at times as eloquent as unuttered words: “John
wearily swung his leg over the pommel, but did not at once dismount. His clear
gray eyes were wondering riveted upon the hunter” (Grey MF: XIX);
(ii) the intrasystemic relationships, among different body parts: “ ‘I know, old man,’ he
[Burlap] said, laying his hand on the other’s shoulder […] ‘I know what being
hard up is’ […] another friendly pat and a smile. But the eyes expressed nothing”
(Huxley PCP: XIII);
(iii) the intersystemic relationships, as with kinesics and tears as in: “She [Jennie] rose./
‘Oh,’ she exclaimed, clasping her hands and stretching her arms out toward him.
There were tears of gratefulness in her eyes” (Dreiser JG: VII).
Kinesic behaviors are variously affected by five parakinesic qualifiers, profusely illu-
strated in literature, which reveal anthropological, sociopsychological and clinical
aspects and can modify the meaning of the message, besides revealing cultural or socio-
educational backgrounds:
– intensity, or muscular tension: “Robin stirred his coffee furiously” (Wilson ASA:
I, IV), “ ‘No!’ Viciously, Warren Trent stubbed out his cigar” (Hailey H: “Tuesday,”
2), often the intensity of different systems combining in one expression: “A
broad-shouldered man […] spatted his knee with his palm. ‘I know it […]!’ he
cried” (Steinbeck GW: XXIV);
– pressure, exerted in varying degrees (distinct from the cutaneous sensations of touch,
pain, heat and cold, all parts of the sense of touch) on people or inanimate objects:
“Nothing could be stronger, more dependable, more comforting, than the pressure
of his fingers on her arm […] They laughed intimately […] he picked up her hand”
(Lewis EG: VII, IV), “There was a pause intense and real […] Then Gerald’s fingers
gripped hard and communicative into Birkin’s shoulders, as he said:/ ‘No, I’ll see to
this job through, Rupert […]’ ” (Lawrence WL: XIV);
– range: “Rising from his seat, Dismukes made a wide, sweeping gesture, symbolical of
a limitless expanse” (Grey WW: VIII);
– velocity, or temporal dimension: “He drank up his tea. Some drops fell on his little
pointed beard. He took out his large silk handkerchief and wiped his chin impati-
ently” (Woolf Y: 1880), “It was a slow smile […] a very sensual smile and it made
her heart melt in her body” (Maugham PV: II);
– duration of the behavior, distinct from (though closely related to) speed: “a strange
recital. She [June] heard it flushing painfully, and, suddenly, with a curt handshake,
took her departure” (Galsworthy MP: II, IV).
These qualifiers lend movements and positions their specific meaning and affect also
what is being expressed verbally: “[after Rachel takes her leave of her cousin God-
frey] He waited a little by himself, with his head down, and his heel grinding a hole
slowly in the gravel walk; you never saw a man look more put out” (Collins M:
“First Period,” IX).
– simultaneous single-meaning gestures in the same or different body area, the elements
of that multiple expression complementing and even qualifying each other: “Miriam
looked up. Her mouth opened, her dark eyes blazed and winced, but she said nothing.
She swallowed her anger and her shame, bowing her dark head” (Lawrence SL: VII);
– simultaneous multiple-meaning gestures in the same or different body area, the simul-
taneity of gesture and words lending a special tone to the verbal expression by the
dominant superimposition of the former: “Mr Tigg, who with his arms folded on
his breast surveyed them, half in despondency and half in bitterness” (Dickens
MC: VII).
This multiplicity can imply either congruence or incongruence among the various bodily
components (which should sensitize us to the interaction processes on deeper levels), as
with the congruence seen in: “ ‘I won’t bear it. No, I won’t’ he said, clenching his hand
with a fierce frown” (Beecher Stowe UTC: III).
Directly related to this are inter-gesture masking behaviors, in which, more often
than we imagine, we try to conceal the kinesic behavior (which already conveys a spe-
cific emotion or thought) by camouflaging it or masking it, even as consciously as we
would the meaning of our words (cf. Ekman 1981, referring only to the face). We
may try to mask that feeling or emotion with another one we do not feel, incongruence
thus being quite obvious: “Ataity made a disparaging grimace; but through the mask of
contempt his brown eyes shone with pleasure” (Huxley EG: XV).
But we may also, with a neutral or indifferent countenance, unsuccessfully try to
mask what we feel, even betraying a complex mixture of feelings: “As soon as he
[Mr. Dawson] set eyes on the patient [Laura] I saw his face alter. He tried to hide it,
but he looked both confused and alarmed” (Collins WW: 390).
Naturally, we can add paralanguage and even verbal language, since how words are
chosen and said can have a strong bearing on these more or less subtle masking pro-
cesses, for the three components of speech, language-paralanguage-kinesics, are often
mutually complementary in dissimulation and feigning acts: “ ‘He has everything to
do with it as far as I am concerned,’ March answered, with a steadiness that he did
not feel” (Howells HNF: IV, VII). In fact, signs like coughing, breaking eye contact,
laughter, blowing out smoke, etc., can perform these intended masking functions.
But there are also anticipatory manners and postures: “She [Rachel] approached
Mr Godfrey at a most unlady-like rate of speed […] her face […] unbecomingly
flushed” (Collins M: “Second Period,” “First Narrative,” II); in fact, any type of bodily
signs (blushing, heaving, etc.) can announce and determine the tone of the subsequent
interaction.
Since we perceive our own kinesics, if we are conscious of it, we cannot ignore pos-
itive or negative hidden gestures, mainly facial and manual, often unseen by others,
which we want to conceal more or less consciously, though the gesture nevertheless
is still there: “Ruthie mushed her face at his back, pulled out her mouth with her fore-
finger, slobbered her tongue at him” (Steinbeck GW: XX), “He [Basil Ransom] ground
his teeth a little as he thought of the contrasts of the human lot” (James B: III), “he
wanted to hold her hand and tell her with quick little pressures that they were sharing
the English countryside” (Lewis B: IX), “[Soames] walked on faster, clenching his
gloved hands in the pockets of his coat” (Galsworthy IC: II, II).
Our repertoires of sound-producing gestures, which we can label phonic gestures, that is,
phonokinesics, and which acquire a language-like quality, are very distinctly differ-
entiated, both culturally and personality-wise: “ ‘Order, order, gentlemen,’ cried
Magnus, remembering the duties of his office and rapping his knuckles on the
table” (Norris O: II, IV), “ ‘Come into money, have you?’ he [the man at the
shop] cried, chuckling and slapping his thigh with a loud report” (Markandaya NS:
XXVIII).
We should also consider the many object-related gestures in which we manipulate
something as part of a compound kinesic behavior: “ ‘Rosedale – good heavens!’ ex-
claimed Van Alstyne, dropping his eye-glass” (Wharton HM: XIV), “Dr. Winskill
[…] sat in his consulting room, his elbows on his desk sliding a silver pencil backwards
and forwards from hand to hand” (Wilson ASA: I, IV).
– how they function as gestures, qualifying what is being said, blending with other
forms of expression and playing a crucial role in how people feel about one another:
“During the colloquy Jennie’s large blue eyes were wide with interest. Whenever he
looked at her she turned upon him such a frank, unsophisticated gaze, and smiled in
such a vague, sweet way, that he could not keep his eyes off of her for more than a
minute at a time” (Dreiser, JG, I), “She had a terrifically nice smile. She really did.
Most people have hardly any smile at all, or a lousy one” (Salinger, CR, VIII);
– how a smile can be seen in other parts of the face besides the lips, particularly the
eyes: “She smiled with her lips and with her eyes […] ‘Why not?’ asked his wife,
her blue eyes still pleasantly smiling” (Maugham CA: V);
– the fact that they can qualify whole stretches of “smiling speech”: “Mr Pecksniff
smilingly explained the cause of their common satisfaction” (Dickens MC: VI);
– that they can blend with other forms of expressions at first sight incongruously:
“Rose was unable to continue for a moment […] smiling through her tears, she
said” (Wilson ASA: I, II); and specifically with gaze: “His face [William de Mille’s]
was severe even in repose, and his mouth firm in preoccupation. But the lights
blazed behind the eyes and his lips were cross-hatched with lurking smiles” (de
Mille DP: V);
– that smiles can act as disclaimers: “ ‘Are you penitent?’ […]/ ‘Heart-broken!’ he
answered, with a rueful countenance – yet with a merry smile just lurking within
his eyes and about the corners of his mouth” (A. Brontë TWF: XXIV);
– that someone’s negative impression on us can change when that person smiles:
“Jonathan’s smile, which came quickly, accompanied by a warm light in the eyes,
relieved Helen of an unaccountable repugnance she had begun to feel toward the
borderman” (Grey LT: III);
– that there is also an “anatomical smile,” which can betray an unfelt feeling: “ ‘Poor
Goggler! How fiendish we were to him!’/ ‘That’s why I’ve always pretended I
didn’t know who he was,’ said Staithes, and smiled an anatomical smile of pity
and contempt” (Huxley EG: XX).
The smile, in sum, is perhaps the human gesture which affects us the most: “Those lips
are thine – thy own sweet smiles I see,/ The same that oft in childhood solac’d
me” (Wm. Cowper On the Receipt of My Mother’s Picture, 1.1).
(i) permanent (position, size and shape of brows, eyelids and eyelashes, nose, cheeks,
mouth, forehead, chin and mandible, to which can be added the long-term pres-
ence of a beard or moustache, conspicuous sideburns, or hairdo): “with fat little
A novelist may merely suggest a character’s facial signs, which, together with a not always
available kinesic configuration, are all that we have to go by in order to carry out our own
task as readers. But as we engage in bringing those characters to life, we tend to broaden
the concept of the “speaking face” to include, for instance, not only the eyes and lips as
objects of visual attention – and therefore qualifiers of personal interaction – but mainly
the characters’ hands, which in general act close to the face and in close articulation with
facial expressions and their functions: “your hands [Isabel’s] are your most fascinating
feature. They are so slim and elegant […] the infinite grace with which you use them
[…] They’re like flowers sometimes and sometimes like birds on the wing. They’re
more expressive than any words you can say” (Maugham RE: V, IV).
(i) mentally only, hearing in our mind their sounds as we articulate them internally;
(ii) half-articulating the sounds of words, still inaudibly to others, although partly vis-
ibly, since the lips are not parted;
(iii) articulating the sounds fully, but mutteringly, making them audible and visible
through the movements of our slightly parted lips;
(iv) uttering the words in a fully audible and visible way.
We can then speak, therefore, of mute oralization and sound oralization, the latter being
the common inclination in many readers when reading a compelling text, just as we may
catch ourselves mirroring the characters’ speech movements while watching the more
emotional scenes of a film; let alone while oralizing a letter by someone we do not
know personally, and never have seen, or perhaps have seen once or seldom, or some-
one we do know quite well and just “see” how he or she would say those words, partic-
ularly in a love letter, a piece of epistolary literature which so often is not only oralized
but reenacted, dramatized, emotionally uttered, with excruciating restraint if we are
forced to read it in front of witnesses such as airplane or train seatmates. But we
give free rein to our blissful identification in word and gesture with the beloved, re-
reading key words and phrases, pausing to let our imagination fly to him or her who
spoke and is speaking again, precisely like that, to us readers. That is, indeed, full ora-
lization at its best, accompanied by our own emotional reactions to the sender’s ima-
gined verbal and nonverbal speech: “She did not read the letter [Philip’s]; she heard
him utter it, and the voice shook her with its old strange power […] Philip’s letter
had stirred all the fibres that bound her to the calmer past” (Eliot MF: VII, V). Here
are two very especial instances of epistolary oralization: “I wake filled with thoughts
of you. Your portrait and the intoxicating evening we spent yesterday have left my
senses in turmoil. Sweet, incomparable Josephine, what a strange effect you have on
my heart! […] a thousand kisses; but give me none in return, for they set my blood
on fire” (Napoleon to his future wife Josephine, December 1795); “My angel, my all,
my very self […] My heart is full of many things to say to you – Ah! – there are mo-
ments when I feel that speech is nothing after all – cheer up– – remain my true, only
treasure, my all as I am yours” (Beethoven to an unknown woman, July 6, 1812).
Onomatopoeic words will trigger our own phonic articulation better: “There is a
grumbling sound and a clanking and jarring of keys. The iron-clamped door swung
heavily back” (Conan Doyle SF: V); “The silences between them were peculiar.
There would be the swift, slight ‘cluck’ of her needle, the sharp ‘pop’ of his lips as he
let out the smoke, the warm sizzle on the bars as he spat in the fire” (Lawrence SL:
III). Also, the combination of onomatopoeic and other types of words that trigger
our oralization: “[in a Kentucky tavern, 19th century] a jolly, crackling, rollicking fire,
going rejoicingly up a great wide chimney […] the calico window curtain flapping and
snapping in a good stiff breeze of damp raw air” (Beecher Stowe UTC: XI); “See
that steamer out there?[…]/ ‘Yes,’ said Suzanne with a little gasp. She inhaled her
breath as she pronounced this word which gave it an airy breathlessness which had a
touch of demure pathos in it. ‘Oh, it is perfect!’ ” (Dreiser G: III, VII).
and tapped their feet; and they smiled gently and they caught one another’s eyes and
nodded [as a greeting or approval]” (Steinbeck GW: XXIV);
– as ritualized activities, with crosscultural similarities and differences and even inter-
cultural borrowings: “ ‘So it is,’ cried Tigg, kissing his hand in honour of the sex”
(Dickens MC: XXVII), “ ‘But Huzoor!’ said Hari, touching the foreman’s black
boots with his hand and taking the touch of the beef hide to his forehead. ‘Be
kind’ ” (Anand C: 198–199);
– as social manners, varying socially and historically: “He sank into a chair, laid his hat
and gloves on the floor beside him in the old-fashioned way” (Wharton AI: X);
– as task-performing acts, as in eating, drinking, smoking, etc., and associated behaviors:
“Rap with the bottom of your pint for more liquor [at the inn]” (Hardy FMC: XLII),
“he smoked a cigarette which he held between his thumb and forefinger, palm up,
in the European style” (Doctorow R: VI);
– as somatic, random and emotional acts, as in: “the man blew his nose into the palm of
his hand and wiped his hand in his trousers” (Steinbeck GW: XVI).
It has been seen to what extent creative literature offers a wealth of explicit and implicit
instances of interactive and noninteractive nonverbal occurrences, and that pursuing
the study of visual body behaviors independently of the other somatic signs would be
quite unscholarly.
10. References
Anand, Mulk Raj 1972. Coolie. New Delhi: Orient Paperbacks. C
Beecher Stowe, Harriet 1998. Uncle Tom’s Cabin. New York: Signet. First published [1852]. UTC
Bhattacharya, Bhabani 1955. He Who Rides a Tiger. New Delhi: Hind Pocket Books. RT
Brontë, Anne 1979. The Tenant of Wildfell Hall. London: Penguin Books. First published [1848].
TWF
Collins, Wilkie 1974. The Woman in White. London: Penguin Books. First published [1860]. WW
Collins, Wilkie 1986. The Moonstone. London: Penguin Books. First published [1868]. M
Conan Doyle, Arthur 1930a. The Sign of Four. The Complete Sherlock Holmes by Sir Arthur
Conan Doyle, with a Preface by Christopher Morley, Volume I. Garden City, NY: Doubleday.
SF
Conan Doyle, Arthur 1930b. A Study in Scarlet. The Complete Sherlock Holmes by Sir Arthur
Conan Doyle, with a Preface by Christopher Morley, Volume I. Garden City, NY: Doubleday.
SS
Cowper, William 1905. On the Receipt of My Mother’s Picture. Poems. London. First published
[1798]. OR
Dickens, Charles 1836–1837. Pickwick Papers. New York: Dell. PP
Dickens, Charles 1968. Martin Chuzzlewit. Harmondsworth: Penguin. First published [1843–1844].
MC
Dickens, Charles 1985. Bleak House. London: Penguin Books. First published [1853]. BH
Dickens, Charles 1973. Little Dorrit. Harmondsworth: Penguin. First published [1856–1857]. LD
Doctorow, Edgar Lawrence 1985. Ragtime. New York: Ballantine Books, Fawcett Crest. First pub-
lished [1975]. R
Doctorow, Edgar Lawrence 1985. World’s Fair. New York: Fawcett Crest. WF
Dreiser, Theodore 1963. Jennie Gerhardt. New York: Laurel. First published [1911]. JG
Dreiser, Theodore 1967. The Genius. New York: New American Library of Canada, Signet Books.
First published [1915]. G
Ekman, Paul 1981. Mistakes when deceiving. Annals of the New York Academy of Sciences 364:
269–278.
Eliot, George 1860. The Mill on the Floss. New York: Dutton; London: Dent. Everyman’s
Library. MF
Eliot, George 1992. Silas Marner. New York: Bantam Books. First published [1861]. SM
Farrell, James T. 1935. The Young Manhood of Studs Lonigan. New York: Vanguard Press. YMSL
Galsworthy, John 1968. The Man of Property. New York: Charles Scribner’s Sons. First published
[1906]. MP
Galsworthy, John 1959. Justice. Contemporary Drama: 9 Plays. New York: Charles Scribner’s Sons.
First published [1910]. J
Galsworthy, John 1994. In Chancery. Hertfordshire: Wordsworth Editions. First published [1920]. IC
Grey, Zane 1945. The Last Trail. Philadelphia: Blakiston Company. First published [1909]. LT
Grey, Zane 1961. The Rainbow Trail. New York: Pocket Books. First published [1915]. RT
Grey, Zane 1990. The Man of the Forest. New York: Harper and Row. First published [1920]. MF
Grey, Zane 1923. Wanderer of the Wasteland. New York: Grosset and Dunlop. WW
Hailey, Arthur 1966. Hotel. New York: Bantam. First published [1965]. H
Hardy, Thomas 1971. Far from the Madding Crowd. London: Pan Books. First published [1874].
FMC
Howells, William Dean 1960. A Hazard of New Fortunes. New York: Bantam Books. First pub-
lished [1890]. HNF
Huxley, Aldous 1928. Point Counter Point. New York: Avon Books. PCP
Huxley, Aldous 1961. Eyeless in Gaza. New York: Bantam Books. First published [1936]. EG
Joyce, James 1947. A Mother, Dubliners. The Portable James Joyce. New York: Viking Press. First
published [1914]. M
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Korte, Barbara 1997. Body Language in Literature. Toronto: University of Toronto Press; Tübin-
gen, Germany: A. Francke.
Lawrence, D.H. 1960. Sons and Lovers. New York: New American Library, Signet Books. First
published [1913]. SL
Lawrence, D.H. 1950. Women in Love. New York: Random House, Modern Library. First pub-
lished [1921]. WL
Lewis, Sinclair 1961. Babbitt. New York: New American Library, Signet Classic. First published
[1922]. B
Lewis, Sinclair 1926. Mantrap. New York: Grosset and Dunlap. M
Lewis, Sinclair 1954. Elmer Gantry. New York: Dell. First published [1927]. EG
MacLennan, Hugh 1967. Two Solitudes. Toronto: Macmillan of Canada, Laurentian Library. First
published [1945]. TS
Markandaya, Kamala 1954. Nectar in a Sieve. New York: New American Library. NS
Maugham, William Somerset 1942. Of Human Bondage. New York: Random House, Modern
Library. First published [1915]. OHB
Maugham, William Somerset 1978. The Painted Veil. London: Pan Books. First published [1925].
PV
Maugham, William Somerset 1960. Cakes and Ales. Harmondsworth: Penguin. First published
[1930]. CA
Maugham, William Somerset 1943. The Razor’s Edge. Philadelphia: Blakiston Company. RE
Mille, Agnes de 1952. Dance to the Piper. New York: Bantam Books. DP
Norris, Frank 1971. The Octopus. New York: Bantam Books. First published [1901]. O
Norris, Frank 1956. The Pit. New York: Grove Press; London: Evergreen Books. First published
[1903]. P
O’Neill, Eugene 1974. Desire under the Elms. Masterpieces of the Drama. New York: Macmillan.
First published [1924]. DUE
Portch, Stephen R. 1985. Literature’s Silent Language. New York: Peter Lang.
Poyatos, Fernando 1972. Paralenguaje y kinésica del personaje novelesco: Nueva perspectiva en el
análisis de la narración. Revista de Occidente 113–114: 148–170.
Poyatos, Fernando 1977. Forms and functions of nonverbal communication in the novel: A new
perspective of the author-character-reader relationship. Semiotica 21(3/4): 295–337.
Poyatos, Fernando 1983. New Perspectives in Nonverbal Communication: Studies in Cultural
Anthropology, Social Psychology, Linguistics, Literature and Semiotics. Oxford: Pergamon
Press.
Poyatos, Fernando (ed.) 1988. Literary Anthropology: New Approaches to People, Signs and Lit-
erature. Amsterdam: John Benjamins.
Poyatos, Fernando (ed.) 1992. Advances in Nonverbal Communication: Sociocultural, Clinical,
Esthetic and Literary Perspectives. Amsterdam: John Benjamins.
Poyatos, Fernando 1997. Nonverbal Communication and Translation: New Perspectives and Chal-
lenges in Literature, Interpretation and the Media. Amsterdam: John Benjamins.
Poyatos, Fernando 2002a. Nonverbal Communication across Disciplines, Volume I: Culture, Sen-
sory Interaction, Speech, Conversation. Amsterdam: John Benjamins.
Poyatos, Fernando 2002b. Nonverbal Communication across Disciplines, Volume II: Paralanguage,
Kinesics, Silence, Personal and Environmental Interaction. Amsterdam: John Benjamins.
Poyatos, Fernando 2002c. Nonverbal Communication across Disciplines, Volume III: Narrative Lit-
erature, Theater, Cinema, Translation. Amsterdam: John Benjamins.
Poyatos, Fernando 2008. Textual Translation and Live Translation: The Total Experience of Non-
verbal Communication in Literature, Theater and Cinema. Amsterdam: John Benjamins.
Rain, A. Marmot 1986. La Communication Non-Verbale chez Maupassant. Paris: Nizet.
Rice, Elmer 1959. Street Scene. Contemporary Drama: Nine Plays. Edited by Ernest Bradlee Watson
and William Benfield Pressey. New York: Charles Scribner’s Sons. First published [1929]. SS
Salinger, Jerome David 1960. The Catcher in the Rye. New York: Signet Books. First published
[1951]. CR
Shaw, Bernard 1958. The Man of Destiny. Bernard Shaw: Seven One-Act Plays. Baltimore: Pen-
guin Books. First published [1896]. MD
Steinbeck, John 1964. The Grapes of Wrath. New York: Bantam Books. First published [1939].
GW
Steinbeck, John 1953. East of Eden. Harmondsworth: Penguin. EE
Wharton, Edith 1905. The House of Mirth. New York: New American Library, Signet Classic. HM
Wharton, Edith 1997. The Age of Innocence. Mineola, NY: Dover. First published [1920]. AI
Wilson, Angus 1956. Anglo-Saxon Attitudes. New York: New American Library, Signet. ASA
Wolfe, Thomas 1929. Look Homeward, Angel. New York: Modern Library, Random House. LHA
Woolf, Virginia 1973. The Years. Harmondsworth: Penguin. First published [1937]. Y
19. Prehistoric gestures: Evidence from artifacts and rock art

Abstract
All artifacts involve gestures not only as a prerequisite of their very production but also in
the ways they are used for technical or symbolic purposes. Prehistoric artifacts from stone
tools to decorated cliffs and caves are the only record we have of the gestures of early
humans. Most prehistoric gestures have to be inferred but some are occasionally repre-
sented in parietal paintings and small statues. However, the most direct evidence of pre-
historic gestures is the presence of printed or stenciled hands in caves, and on cliffs or
boulders. Whether these hands can be construed as indexical, iconic, or symbolic signs,
their abundance, ubiquity, and antiquity bear witness to irrefutable traces left by human
gestures in the deep time of Homo sapiens’s cognitive evolution.
1. Introduction
Gestures are by nature ephemeral, dynamic objects which could not be preserved by
any means until the modern technologies allowing visual recording were invented. At
most, two- or three-dimensional representations can reproduce postures or freeze
expressive gestures by selecting features combining directionality, amplitude, and finger
configurations. It can be safely assumed that anatomically modern humans and, presum-
ably, their Neanderthal cousins fostered rich gesture repertories in their social interac-
tions and in their manipulation of, confrontations with, and imitations of the objects,
both animate and inanimate, in their environment. We can only imagine the specific
gestures they were making to communicate feelings and attitudes toward each other,
and the forms of visual communication that prevailed during collective hunting, war-
fare, and the transmission of transformative techniques applied to vegetal, animal,
and mineral materials. Presupposing such gestures is broadly constrained, on the one
hand, by the range of movements that the human body makes possible and, on the
other hand, by the nature of the objects which must be transformed to be functional.
These necessarily included crushing, pounding, knapping, breaking, hitting, molding,
and tying, among many other goal oriented behaviors. Such plausible general assump-
tions remain nevertheless rather vague and hypothetical, and observing the daily life of
surviving hunter-gatherer cultures can only help figure out past gestures without
providing any hard evidence regarding the way it was, say thirty thousand years ago
or more, the time about which the archaeological record provides proof of socio-
cultural activities. There exist, however, material traces of these prehistoric gestures
in the form of artifacts which at least offer some objective clues about the manipulative
techniques which produced them.
necklace. Words most often come as multimodal utterances, and this provides a ground
for mutual understanding among groups which do not share the same language.
Finally, burials offer irrefutable glimpses of a family of gestures which are essentially
ritualistic: the carrying of the body to its resting place; its coloring with ochre and dec-
orating with beads; its disposition in a particular posture on the ground; the depositing
of symbolic artifacts within the burial space. All this provides strong evidence of ges-
tures which were more of a symbolic than technological nature (Vialou 1998: 116).
Inferring gestures from artifacts and evidence of rituals does not require stretching
the imagination beyond the realm of the highly probable, if not the absolutely certain.
raised thigh (White 2003: 57). There are also instances of rock engravings showing what
can be legitimately construed as evidence of meaningful gestures such as the upper
body of a man whose arms are extended in front of him forming a gently rising
curve with palms facing the ground (Vialou 1998: 112). In other, perhaps more recent
rock paintings, schematic leg and arm positions are evocative of (trance?) dancing with
bodies arced and arms thrown back (e.g., White 2003: 164).
These are mere glimpses, but future discoveries might further document the range of
gestures which formed the dynamic cultural repertories of these populations.
4. Prehistoric hands
But the most vivid evidence of prehistoric gestures is found in the numerous panels of
hand stencils and hand prints which are displayed, often in large numbers, on parietal
surfaces (e.g., Barrière 1976; Cummins and Midlo 1961). They are direct, indexical signs
of deliberate human actions, and they presuppose the complex technologies of stencil-
ing, painting, and printing. Replications have explored various methods to produce
these marks such as blowing a coloring liquid with the mouth on the hand resting on
the rock surface for stenciling. Some hands are also smeared with a painting material
and applied following a printing technique. However, the meaning and purpose of
such behaviors remain elusive, and the morphological characteristics of many of
these indexical hand signs are puzzling. The only sure thing is that prehistoric humans
in some areas which are well distributed across the whole planet have consistently pro-
duced sets of such hands, sometimes in large numbers and other times more sparingly in
combination with figurative and abstract signs (e.g., Manhire 1998). This brings us as close
as can be to the physical presence of prehistoric gestures: the preparation of the hand and
its deliberate application on a mineral support on which it remained permanently or at
least until today, several tens of thousands of years later. It should be noted that this practice
is still observed among Australian aborigines (Gunn 1998; Wright 1985).
But there is more. Not all hands on these panels look the same. First, there are
variations of techniques, as noted above, as well as chromatic diversity (red, white,
black, and ochre). Second, not all five fingers are visible, as if some had been cut at
the first or second joint, or – and this would be remarkable evidence of meaningful
gesture – as if some fingers had been selectively bent before applying the color. Several
hypotheses have been proposed to explain such concentrations of hand representations
and their morphological diversity. Each one assumes a complex set of gestures per-
formed in view of a rich socio-cultural context to explain the archaeological record.
But two gesture-relevant features must be distinguished: first, the fact that humans pro-
duced these clusters of stenciled and printed hands by using coloring material and press-
ing their palms against the rocks; secondly, the fact that in many cases some fingers or
parts of fingers are missing, thus creating series of incomplete hand images. Two
extreme theories have been proposed to explain the latter data: a naturalistic view
which claimed that diseases and accidents simply can account for the absent digits
(Barrière 1976) and a linguistic interpretation which contends that the variations
shown might bear witness to the existence of a kind of sign language (Bouissac 1997).
The first hypothesis is based on medical data listing all the known pathologies which
can affect human fingers, suggesting that at least parts of the populations which produced
such images of their hands were seriously crippled by diseases and accidents. The other
offers a vision of a fairly sophisticated social life in which alternate forms of language
could emerge and generate a kind of hand writing based on sign languages. The only
evidence that could support the first theory would be the discovery of a critical number
of skeletons with missing phalanges. As to the second hypothesis, only a systematic par-
sing of the data could decide whether the variations are random or reveal some form of
combinatorial order with the iteration of recurring sequences (Bouissac 1994). But, so
far, these hypotheses have remained mere virtual possibilities and have not been the
object of serious investigation.
There are, however, some cases in which simple observation reveals intriguing prop-
erties of hand clusters. In their first illustrated account of the Cosquer cave, Clottes and
Courtin (1996) describe seven types of stenciled hands they have recorded among a set
of decorated panels in the cave:
(i) whole left hand with all fingers intact (10 examples);
(ii) whole right hand with all fingers intact (3 examples);
(iii) left hand with little finger folded (2 examples);
(iv) left hand with little and ring fingers folded (15 examples);
(v) left hand with little, ring, and middle fingers folded (6 examples);
(vi) left hand with all fingers folded except the thumb (1 example);
(vii) right hand with all fingers folded except the thumb (1 example) (Clottes and
Courtin 1996: 77).
This limited set offers less diversity than what can be observed in the Gargas cave in
which 16 types have been described (Barrière 1976). Gargas has been thoroughly
described (e.g., Leroi-Gourhan 1967), but Clottes and Courtin’s account is not the
result of an exhaustive recording in the Cosquer cave, with its multiple galleries,
which had not yet been fully explored when their book was published. Describing
types is a first step. More significant would be to record the way in which these types
combine in the panels in which they are often mixed with animal figures. For instance,
Clottes and Courtin (1996: 73) reproduce the engraving of a horse whose front part is
surrounded and partially covered with seven hands which combine four different types
according to the following sequence: types 1, 4, 2, 3, 4, 4, 1. It is hardly plausible that
these stenciled hands belonged to differently crippled people. Is it not more probable
that they are evidence of a form of sign language?
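As a purely illustrative aside – not part of Bouissac’s or Clottes and Courtin’s work – the kind of “systematic parsing” called for above could begin as simply as tallying recurring sub-sequences of hand types across panels. The sketch below assumes nothing beyond the single sequence reported for the horse panel; the panel name and any further data are hypothetical placeholders meant only to show the mechanics.

```python
# Minimal sketch: tallying recurring n-grams in sequences of stenciled hand
# types, to ask whether combinations repeat more often than chance would
# suggest. A real analysis would require a large database of fully recorded
# panels, which does not yet exist.
from collections import Counter

panels = {
    # Sequence of hand types around the engraved horse (Clottes and Courtin 1996: 73)
    "Cosquer_horse_panel": [1, 4, 2, 3, 4, 4, 1],
    # further panels would be added here as they are recorded
}

def ngram_counts(sequence, n):
    """Count every contiguous run of n hand types in one panel's sequence."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

# Aggregate bigram counts over all panels in the (toy) database.
totals = Counter()
for sequence in panels.values():
    totals.update(ngram_counts(sequence, 2))

for bigram, count in totals.most_common():
    print(f"types {bigram}: {count} occurrence(s)")
```

Such counts would only become meaningful once compared against what random orderings of the same type inventory would produce, which is precisely the open question the chapter raises.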
But even if such extreme theories are discarded as improbable fancies, the fact remains
that humans have performed these complex cultural behaviors consisting of elaborating
techniques of printing and stenciling, and producing images on rock surfaces which
have variously persisted until today. Whether we can reliably infer from these facts the
ancient existence of more complex gesture systems will depend, on the one hand, on fur-
ther archaeological discoveries and, on the other hand, on the elaboration of investigative
methods based on the parsing of large databases which are still to be constructed.
5. References
Barrière, Claude 1976. L’Art Parietal de la Grotte de Gargas [par] Cl. Barrière; avec la Collabo-
ration de Ali Sahly et des Élèves de l’Institut d’Art Préhistorique de Toulouse. Translated
from the French by W.A. Drapkin. Oxford: British Archaeological Reports.
Bednarik, Robert G. 2001. Rock Art Science: The Scientific Study of Palaeoart. Turnhout:
Brepols.
Bouissac, Paul 1994. Art or script? A falsifiable semiotic hypothesis. Semiotica 100(2/4): 349–367.
Bouissac, Paul 1997. New epistemological perspectives for the archaeology of writing. In: Roger
Blench and Matthew Spriggs (eds.), Archaeology and Language I: Theoretical and Methodolog-
ical Orientations, 53–62. London: Routledge.
Clottes, Jean and Jean Courtin 1996. The Cave beneath the Sea: Palaeolithic Images at Cosquer,
translated by M. Garner. New York: Harry N. Abrams.
Cummins, Harold and Charles Midlo 1961. Finger Prints, Palms, and Soles. New York: Dover.
Delson, Eric, Ian Tattersall, John A. Van Couvering and Alison S. Brooks (eds.) 2000. Encyclope-
dia of Human Evolution and Prehistory. New York: Garland.
Gunn, Roger G. 1998. Patterned hand prints: A unique form from Central Australia. Rock Art
Research 15(2): 75–80.
Leroi-Gourhan, André 1967. Les mains de Gargas. Essai pour une étude d’ensemble. Bulletin de
la Société Préhistorique Française 63(1): 107–122.
Manhire, Anthony 1998. The role of hand prints in the rock art of the south-western cape. South
African Archaeological Bulletin 53: 98–108.
Marzke, Mary W. 1997. Precision grips, hand morphology, and tools. American Journal of Physical
Anthropology 102: 91–110.
Schick, Kathy D. and Nicholas Toth 1993. Making Silent Stones Speak: Human Evolution and the
Dawn of Technology. New York: Simon and Schuster.
Vialou, Denis 1998. Prehistoric Art and Civilization. Translated by Paul G. Bahn. New York:
Harry N. Abrams.
White, Randall 2003. Prehistoric Art: The Symbolic Journey of Humankind. New York: Abrams.
Wright, Bruce 1985. The significance of hand motif variations in the stenciled art of the Australian
aborigines. Rock Art Research 2(1): 3–19.
20. Indian traditions: A grammar of gestures in classical dance and dance theatre

Abstract
Extensive use of gestures is a widely practiced tradition in the dance and dance theatre forms of India. While articulation with various limbs, such as the hands, eyes, waist, hips, neck and head, is considered gesture and discussed in ancient treatises like the Natyasastra (Bharata, ca. 3rd century BC), it is the use of hands or the hasta, called
1. Introduction
Performing Arts in India are known for their extensive use of gestures, specifically in
the dance and dance theatre traditions. While articulation with various limbs such as the hands, eyes, waist, hips, neck and head has been considered gesture and discussed in ancient treatises like the Natyasastra (Bharata, ca. 3rd century BC), it is the use of the hands, the hasta, called Hastabhinaya, that became the most prominent form of physical expression, seen in dance traditions even today. These gestures fall into two groups: those that convey the meaning of an accompanying text or song, and those used in non-narrative dance, i.e. to ornament movement for aesthetic purposes. Forms or
hand shapes have become conventionalized through long practice and their description
in many ancient treatises, often leading to the notion that such practice is essentially
culturally specific and ritualistic. Also, the hand shape itself is often mistaken for the gesture that conveys the meaning, thus confusing form with content. The living traditions as actually practiced, however, reveal that gestures are more than just an outer representation of conventionalized forms. The movement technique used to bring forth the semantic intent the performer wants to achieve and transmit to the spectators depends on several factors. The hand form may derive its meaning from the content or context
it is used in, but conveys meaning essentially through its articulation, which incorpo-
rates the hand shape as well as the movement or placement of the hand in any given
context. A form-based approach from the perspective of movement studies and cur-
rent linguistic research on gestures that analyses the actual movement methodology
used in the process of meaning construction enables insight into qualitative aspects
of Indian performing arts hitherto not looked into. Laban/Bartenieff Movement Studies, also known as Laban Movement Analysis and Bartenieff Fundamentals, were developed by the dancer, choreographer and philosopher Rudolf von Laban (1879–1958) and his disciple Irmgard Bartenieff (1900–1981) in Germany, the UK and the USA. They have evolved into a multi-layered movement analytic system and provide tools for movement execution, observation and analysis. In the current linguistic research on gestures, the
form-based approach using gestural modes of representation provides a methodology for linguistic gesture analysis (Müller 1998, 2009: 514–515). This approach analyzes the criteria that lead to the execution of gestures and describes the origins of gestures in mundane, everyday practices of the hand. These two approaches are deemed appropriate for
the current analysis, which – due to the scope of this article – will be restricted to a
few hand gestures as used in the dance theatre styles Kuchipudi and Bharatanatyam
of South India. Indian performing arts employ hand gestures in the dynamics of mean-
ing-making through physical expression, which appear to be culturally specific and
prescriptive. The movement analytic and linguistic approach will look at the move-
ment principles employed in their execution, enabling a differentiation of content,
which in Indian dance traditions is culturally specific, from form, which could be
meta-cultural, following principles like conceptualization, cognition and embodiment.
Section 2 briefly explains the contents of the earliest treatise, the Natyasastra, and looks at some passages relevant for the present analysis. Gestures in non-narrative dance are described and analysed in section 3, and gestures in narrative dance in section 4. Section 5 briefly introduces how hand forms are used to represent gods, and section 6 illustrates the integration of hands and eyes with an example.
(i) gestures used in drama, i.e. for narrative purposes where samyutha (combined) and
asamyutha (non-combined) hands are used to convey a meaning and
(ii) gestures in dance, Nrittahastas, which are used for ornamenting a movement and
providing aesthetic experience.
Subsequent verses list the nomenclature given to the hands in each of these categories, describe their hand shape, movement and/or positioning, and note their multi-functional usage. These names, which continue to be used even today, are often the names of objects similar in shape to the form made by the hand, but the meaning derived from their usage need not always correspond semantically to the name or shape. In combined gestures and dance gestures, the names often denote movement and/or spatial aspects, as will be discussed in the analysis in the following sections.
Further, five different positions of the hands are identified: palm up, circular motion, ob-
lique, stable and palm down (9.176–177). Of particular interest for the present analysis are verses 155 to 159. Here, Bharata explicitly says that while he explains the application of gestures in association with different ideas, such application can be extended elsewhere based on the personal judgement of the actor, bearing in mind the form, the significance of the movement and the class, i.e. the social stratum, to which the character belongs. However, since every gesture can be used to convey an idea, and many commonly used gestures have meanings, he suggests that these can be used as one pleases to depict emotions and their activities, the suitability of their meaning being the important criterion. These verses support the idea that meaning construction is the basic function, that movement technique plays a significant role in making the semantic content accessible, and that gestures in Natya are abstracted from common usage found in the world. They also enable an understanding of the conceptually oriented approach to the use of gestures in Indian performing arts.
The primary text, Natyasastra, influenced later treatises that shaped regional dance
and drama styles in subsequent centuries. These styles evolved over time and continue
to be practiced in various forms such as dance, dance theatre or folk theatre, combin-
ing movement, gestures, song, spoken text, music and/or other features as governed by
context and creativity. One of these styles, which integrates all these features, is the Kuchipudi Yakshagana tradition, which has evolved since the 13th century in the north-eastern parts of the state of Andhra Pradesh. The 20th century saw its evolution into a
sophisticated and refined dance theatre style. The style known today as Bharatanatyam
evolved in the temples and courts of South India. It focuses on precise linearity of move-
ment and subtle expression. Its present repertoire dating back to the 18th century has
been revived and refined by various practitioners in the 20th century. While Kuchipudi
follows the Natyasastra in its entirety, Bharatanatyam primarily follows a secondary
text, the “Abhinayadarpana” of Nandikeswara dated around the 11th century AD. In
the following analysis, the Nrittahastas are treated partly on the basis of the definitions found in the primary text and their practice in Kuchipudi today, and partly on the basis of the activity of the hands in the pure dance of Bharatanatyam. The analysis of gestural expression with narrative content is based on the secondary text and on its use in Bharatanatyam.
a meaning, they are used in karanas. Karanas are defined in Chapter 6 (30–34) as the
combined movement of the hands and feet. They are basic units of nritta defined
above. Ghosh (1967/2007) translates this term as ‘dance hands’ and comments
that – as the meaning implies – they are used in dance and at times also in combina-
tion with the other two varieties of hand gestures for ornamental purposes. In the
Natyasastra, the nomenclature and description of these hands often specify all or
any of the following:
illustrated. Counter-tension is required for stability and equilibrium. These, again – like the breath mentioned above – are concepts that are integral to the Indian world view (Vatsyayan 1996: 56). It is also noteworthy that the spatial relationships of the hastas correspond to some extent to the classification of gesture space used in the analysis of
co-speech gestures (McNeill 1992: 328). For example, the initiation of movement in
pure dance of Bharatanatyam, as seen above, is almost always in front of the chest
corresponding to Centre Centre in gesture analysis.
Using Bartenieff Fundamentals for analysis, one can see how the use of hands in the
Indian context goes further. Clarity of hand shape coupled with Spatial Pulls, a clear
Spatial Intent, i.e. the directional intention of the movement, and a movement phrasing
that conventionally gets initiated at the hands influence the movement of the upper
body at the neuromuscular level. Initiating movement with hand shapes involving an
exact placement of each finger, as seen in the above example, integrates the upper
body at a very subtle and deep level, because deep connectivity patterns exist between
the fingers, the ulna and radius and the scapula, going further down to the tailbone
(Hackney 1998: 157–158, 231–244; Hartley 1989: 170–174; Myers 2004: 154–165).
Hand gestures – when executed as defined – hence enable a greater movement range
involving the whole upper body, seen in flexions, rotations and spiralling movements.
For example, in other variations of the movement illustrated in Fig. 20.1, the hands
could move towards other Spatial Pulls like side-low-front and side-high-back, invol-
ving a spiralling of the body. Considering that Bharatanatyam uses all spatial pulls
around the body, a further analysis of dance movements along these lines (Ramesh 2008), too elaborate to present here, suggests a correlation with the principles of ancient Indian architecture called Vastusastra, according to which natural forces, determined among other things by geographic directions, influence the wellbeing of man (Schmieke 2000). This would perhaps shed more light on what has been termed the auspicious relevance of nritta in the Natyasastra. The visible traditional precision of these movements in geo-
metric space, thus analyzed, clearly places the culturally specific notion of aesthetics
in a broader sense of kinaesthetic understanding, without denying its embeddedness
within the framework of a specific world view. Vatsyayan (1996, 1997) discusses the
role the human body plays in the Indian world view. The understanding and specifica-
tion of such a role is furthered by the analysis of Indian movement aesthetics based on
Laban/Bartenieff Movement Studies and linguistic gesture analysis.
Tab. 20.1: Analysis of some gestures used in the Bharatanatyam repertoire depicting actions, objects, concrete and abstract ideas, emotions, and more.

Name of Hasta | Acting | Modeling | Drawing | Representing
Kapittha | Milking, holding/playing cymbals, offering incense or light | Draping of robes | Eye make-up | Bird, Lotus in Goddess Lakshmi's hands
(Continued)
(i) In the case of the cymbals, the shape of the gesture with the forefinger placed over
the tip of the stretched thumb and the other fingers closed into a clasp refers to the
holding of cymbals. In actual performative use, the gesture is acting or enacting the
playing of the cymbals, by either striking the bent forefinger of the right kapittha
hand against the open left palm or striking both the kapittha hands against each
other. The illustrative exactness of the act of playing the cymbals re-creates the
actual movement when playing them, with the difference that a stylized gesture
is used to enhance its visual impact. When striking the gesture against the left
palm, the representation is of cymbals, which are small and rather heavy and
give rhythm for dance. Here the wrists are placed firmly in Centre Centre, per-
forming the striking action with quick, strong and bound movements as when hold-
ing and striking with something small and heavy to produce a specific sound.
Striking both the kapittha hands against each other is mostly used in the context
of cymbals accompanying a prayer. In such a case the movement is light and
free, suggesting the lightness of corresponding cymbals. In a broader metonymic
sense, the act of playing them could also stand for dance or prayer.
(ii) For milking the cow, the forefingers of both the kapittha hands are raised and low-
ered alternately with the acting hands moving downwards vertically, so as to illus-
trate a grasp and downward pull, thus enacting the milking process. One may ask why the act of milking is depicted at all, since not all mundane activities find expression in this way. Popular narratives are the cowherd stories involving Krishna. The
story of his stealing butter is usually an example of elaboration of a theme, which
shows the whole process of butter-making. Natyasastra, however, does not refer to
the stories of Krishna; the first play performed using natya is mentioned as the
churning of the milk ocean. This narrative refers to Vishnu, the sustainer of the
world who rests on the milk ocean. An example that reveals the use of the linguistic principle of contiguity is the depiction of the milk ocean, where the action of milking, enacted with the hands, precedes a gesture depicting waves.
(iii) An example that is commonly used in actual practice and finds no reference for
such a usage in the treatise is the use of the kapittha hand to illustrate a bird.
Here, the hand shape represents the bird, implying a usage based on similarity in shape. The position of the hand in gesture space can vary according to the context in which the bird is depicted: higher up, in far reach space, it shows, for example, a bird perched on a tree. A precise placement of the wrist at the same
time enhances the meaningful execution of the gesture. Closer to the body it
could be used to express the emotion of compassion with a corresponding patting
or feeding movement with the other hand.
(iv) In denoting the goddess Lakshmi (Fig. 20.2) both hands in kapittha are placed with
firmly grounded wrists at the level of the shoulders. The hands here depict the
holding of lotus flowers, in a strict sense as an acting gesture. They also depict
the lotus flowers in a metonymic sense and when placed as mentioned above,
the whole stance represents Lakshmi in an iconographic representation. The
hands stand for the lotuses Lakshmi holds and the positioning of the hands stands
for Lakshmi. When using only one hand, it is derived from the social context of
ladies or queens holding a flower and is used to portray a woman.
(i) When used to show small or tender objects like pearls, jasmine flowers or the wick of a lamp, it often serves as an acting gesture that illustrates the depicted object through the context of an action, such as plucking a flower.
(ii) To illustrate a painting the gesture can be used as a sketching gesture by tracing
the rims of the frame or as an acting gesture in the actual movement of painting,
drawing or writing.
(iii) When depicting concepts like truth or time, the movement sketches a vertical line
downwards (Fig. 20.3).
(i) The act of threatening is executed by shaking the outstretched index finger of this
hand shape at the level of the shoulder with the corresponding expression in the
face, coming from a social context. A threat can also be expressed by pointing
the finger towards the person or persons.
(ii) It can denote the world by tracing spiralling patterns with the index finger in a moulding gesture above the head, while the eyes are used in a deictic sense, referring to the vast expanse of the universe. The extended index finger of the suci hand is also used to show the world or earth in a large horizontal circular movement at the level of the shoulder. The imagery of a spiralling universe or a round earth depicted here offers insight into the nature of the world. To be more specific and denote the concept of the three worlds, another gesture called trishula, with its three outstretched fingers denoting the number three, is used, either as a preceding gesture or by holding it with the left hand while the right hand executes the spiralling movement.
(iii) As deixis, to point at something: for instance, if a city or location is held with the alapadma of the left hand, the right suci hand points at it.
(iv) To represent death, when held at Centre Centre with the index finger pointing up. If this
placement is combined with the other hand palm down in alapadma over the
head it represents an umbrella. The same gesture, however with the alapadma
hand held in a lateral position, represents the sun.
(v) To represent beauty, the index finger is used sketching the shape of the eyebrows.
At this point, it is important to mention that the technique of placing gestures at precise positions in the gesture space in front of the body corresponds to observations made on co-speech gestures (McNeill 1992: 378). The fact that such placement serves to attract the spectator's attention to the gesture is fully exploited in Indian performing arts. The above examples also reveal that the gestures are not just hand forms symbolic of what is being represented, but the actual enacting of what is depicted. At times, the gestures are used with an over-layering of modes, similar to pat-
terns of gesture production seen in co-speech gestures. For example, the gesture for
eyes both sketches the shape of the eye and represents it at the same time. Also, as
seen in some of the above examples, where gestures are executed simultaneously or
sequentially creating a phrase, the semantic intent is made available through contiguity. Such a meaning-giving proximity is called sthithibheda in Indian linguistic understanding. Also, in these phrases the left hand is often used to denote the object and the
right hand to denote the action. In the case of pure dance gestures or nrittahastas,
the movement is mirrored on the other side equalizing the body halves and establishing
the experience of verticality, a concept also crucial in the Indian worldview (Vatsyayan
1996: 51–53, 1997: 10–11).
upon whether the performer is describing the child from the mother’s perspective or
becomes that child. The challenge in the performative context is the progression
from a learnt and thus conventionalized gesture language to producing gestures as if
they were in a spontaneous co-speech context – a reversal, of sorts, of Kendon's
Continuum (McNeill 2005: 5–7).
7. Conclusion
Presented above are but a few examples from the rich repertoire of gesture use in
Indian performing arts. The small cross-section of selected gestures illustrated here
allows us to assume that the gestures of the hand in the Indian context are as rich
and varied as the capacity of the human mind to use imagery for expression. There is
also a process of stylization seen in the use of gestures to express and communicate
abstract ideas of spatial cognition. Though treatises describe mythological origins of
the gestures, the examples above also reveal that there has been an abstraction of ges-
tures from real life contexts. The analysis reveals that a cultural specificity seen in the
usage of the gestures is due to the contents depicted in Indian dance rather than due to
form. The actual execution follows principles that are cognitive in nature, involving spa-
tial relationships, visualization and imagery, which underlie conceptualization. In this respect, the function that becomes evident is that gestures abstract and embody a movement event that establishes outer expression, external spatial relationships and internal connectivity patterns. They depict a speech act and allow a thought, a feeling, or an emotion to materialize – to become obtainable and transmittable in the act of performance – as the term abhinaya suggests. This also reveals that abhinaya, even in its traditional origins, as mentioned in texts and seen in practice, has always been a conscious and structured methodology for mirroring the feelings and concepts in the mind.
8. References
Bharatamuni 1987. The Natya Sastra. English Translation by a Board of Scholars. Delhi: Sri Sat-
guru Publications.
Ghosh, Manomohan 1967/2007. Natyasastra (A Treatise on Ancient Indian Dramaturgy and Histri-
onics), Volume 1. Varanasi: Chowkhamba Sanskrit Series Office.
Ghosh, Manomohan 1975. Nandikesvara’s Abhinayadarpanam. Calcutta: Manisha.
Groff, Ed 1990. Laban Movement Analysis: An historical, philosophical and theoretical perspec-
tive. Unpublished Manuscript.
Hackney, Peggy 1998. Making Connections. Total Body Integration through Bartenieff Fundamen-
tals. New York: Routledge.
Hartley, Linda 1989/1995. Wisdom of the Body Moving. An Introduction to Body-Mind Centering.
Berkeley, CA: North Atlantic Books.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Laban, Rudolf 1966. The Language of Movement. A Guidebook to Choreutics. Boston: Plays, Inc.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago/London:
University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Berlin Verlag.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), Routledge Linguistics
Encyclopedia, 214–217. Abingdon: Routledge.
Myers, Thomas W. 2004. Anatomy Trains. Myofasziale Meridiane. Munich: Urban & Fischer.
Ramesh, Rajyashree 1982. Krishna zeig Dein Gesicht! Erzählender Tanz aus Indien. In: Johannes
Merkel and Michael Nagel (eds.), Erzählen. Die Wiederentdeckung einer Vergessenen Kunst.
Geschichten und Anregungen: Ein Handbuch, 68–82. Reinbek, Germany: Rowohlt.
Ramesh, Rajyashree 2008. Culture and cognition in Bharatanatyam. Integrated Movement Stu-
dies Certification Program Application Project. Unpublished Manuscript.
Schmieke, Marcus 2000. Die Kraft Lebendiger Räume. Das Große Vastu-Buch. Aarau, Switzer-
land: AT Verlag.
Vatsyayan, Kapila 1996. Bharata. The Natyasastra. Delhi: Sahitya Akademi.
Vatsyayan, Kapila 1997. The Square and the Circle of the Indian Arts. New Delhi: Abhinav
Publications.
21. Jewish traditions: Active gestural practices in religious life

Abstract
Out of the whole richness of Jewish gestural culture, this entry focuses on a number of currently active traditional bodily practices, which appear to be the beating heart of this culture. The practices under discussion are: the laying of phylacteries, mezuzah kissing, Torah scribing, gestures that accompany Torah reading, touching/kissing the Western Wall, prostrating on the graves of Tzadiks, handclapping and hand-waving in prayer, and some others. In these practices, cultural knowledge becomes available for the work of cultural retention and advancement, thus uniting their symbolic and pragmatic functions. The technical and anthropological descriptions of the practices are followed by a cultural-rhetorical analysis, which reveals a metaphorical mechanism at their basis. A metaphorical identification is established between a real person and the ideal figure of the perfect worshipper, or between a physical body and a sacred one (the Book, the Word), or between a real space and a sacred one (the Holy Land). Some of these practices, being firmly embedded in complex cognitive-physical-lingual activities, enter everyday communication and oratorical behavior as body techniques, as observed in the closing part of the entry.
1. Introduction
The prominent role played by language(s) in Jewish culture, especially in Jewish reli-
gious culture, is well known. One should not forget, however, that in the course of hundreds of years the Jewish tradition integrated numerous gestural practices, some of which have been retained and included in actual religious activities (performed both by men and, to a lesser extent, by women). These practices are closely tied to the lingual ones and to language itself as the basis of communication.
The present entry discusses living gestural practices associated with communicative
traditions in Jewish culture. Everyday gesticulation and purely ritual gestures will not
be considered, mainly for methodological reasons. Ritual gestures in the strict and
narrow sense constitute an immense store of cultural knowledge, but their communica-
tive function is limited to rigid ceremonial occasions in which each participant transmits
his predictable knowledge as a kind of meta-addresser who addresses a kind of meta-
addressee. Communicativity is here stretched to the limit of its ability. A good example
of this kind of gesture is the ritual of the Priests’ Blessing. The priests perform the bles-
sing gesture with their heads and hands covered by their prayer shawls while the con-
gregation stands with downturned eyes. Uri Ehrlich has done a study of gestures in
canonical prayer customs, their meanings and their origins in Jewish law (halacha)
and culture (Ehrlich 1998).
One reason why gesticulation is an important element in everyday communication is
that it has become detached from its cultural-symbolic roots and inserted into an imme-
diate communicative situation, to whose operative needs it is subordinated: definition of
self in others’ eyes, definition and management of the situation, influence and control,
an aid to speech (Goffman 1959). Naturally, one finds unique elements of everyday ges-
ticulation among different national, local and ethnic groups, but such elements do not of
necessity reflect a national or a cultural character. David Efron’s classical study (which
showed how the New York Jews’ gestures are re-shaped in accordance with the environ-
ment) is still relevant for this issue today (Efron 1972).
Traditional gestural practices are “trans-communicative systems”, vital non-verbal
information systems which are personal and cultural, pragmatic and symbolic all at
once. Cultural information is here made available to everyday cultural activity which
simultaneously both preserves cultural memory and advances the skills of its actualiza-
tion. As is well known, certain gestures constitute entire cultural archives, feeding valuable information into intra-cultural communication channels where it is kept and passed on. However, some of these gestures, mostly ritual ones, do not take part in active social rhetoric.
In terms of hierarchy, their “dialect” belongs to such a high register that there is no
social configuration that would allow its translation into a lower, everyday cultural dia-
lect, its pragmatic utilization. For their high status, these gestures pay with their abso-
lutely static condition. On the other hand, everyday gestures lacking cultural memory
cannot be kept in high codes, but enjoy almost unlimited dynamics. Mutual translatabil-
ity of high- and low-register cultural dialects is typical for a defined group of gestures
integrated in active traditional body practices and techniques, such as crossing oneself
in Western Christian cultures, or lifting one’s hat as a greeting. Such practices of ritual-
ity within everyday routine are culture’s beating heart. In the present entry, some of the
more distinctive practices of this type within Jewish culture will be presented, proceeding from the rare to the more common.
We shall start with the discussion of two gestural customs, each typical of its community. The first is the Torah-reading gestures of Yemenite Jews – a large community characterized by ancient, uniform roots (some maintain, going back to the destruction of the First Temple), by a great diaspora, and by considerable cultural heterogeneity today. The second is the handclapping of Breslov Hassids – a relatively young, consolidated and homogeneous community (its founder, Rabbi Nachman of Breslov, died in 1810). Naturally, the two communities and their customs were selected for analysis because the gestural dimension is especially essential in them. The intent behind addressing one originally Middle-Eastern and one Eastern-European community is to create a colorful picture, without any pretense of developing the subject fully or of reaching conclusions concerning the cultural uniformity of the customs at hand.
After that we shall turn to customs widespread amongst observant Jewish communities – putting on the phylacteries, kissing the mezuzah, writing sacred texts on parchment – and we shall see that gesture is their constitutive mechanism. The gestural essence of kissing the Holy Scriptures, the Western Wall, etc., which will be discussed further on, is clear enough; the explanation of its communicative and symbolic functioning, however, will require further theoretical effort. Finally, we shall examine the gesture constituting perhaps the most distinctive characteristic of an observant Jew – swaying during the service or, in other words, shokeling (from the Yiddish word for "to shake").
(i) Handclapping as preparation for the utterance of prayer: “To prepare and repair
the mouth”: “When a man awakens himself with his hands, [the wings] awaken,
the wings of the lung, where speech is formed. But still there is need to prepare
and repair the mouth, to accept speech inside it, and by striking one palm against
the other, by this the mouth is made” (Nachman 2008: 141:45).
(ii) Handclapping turns prayer into a coupling (in the mystical-symbolic sense of the
word, probably as unification of man with the Sacred, Holy Land etc.): “So there
will not be a difference between the prayer and coupling” (Nachman 2008: 141:46).
(iii) Handclapping makes prayer able to mitigate (in the original – “to sweeten”,
lehamtik) Heavenly sentences concerning the fate of the individual or the
community (Nachman 2008: 141:46).
(iv) Handclapping for achieving unity (Nachman 2008: 141:46).
(v) Handclapping as prophecy in prayer, as an aid in evoking a prophetic imagination
with which to represent the image of God (Mark 2003: 290).
(vi) Handclapping creates “air of the Land of Israel”: “By what will you be granted
that your prayer shall be in the air of the Land of Israel? (…) It is by means of
handclapping that prayer is in the air of the Land of Israel (…) and by handclap-
ping he thus lives in the air of the Land of Israel” (Nachman 2008: 141:44). “Hand-
clapping by the person who prays can arouse the imagination and create a kind of
enclave similar to the Land of Israel, which maintains a space of imagination and
prophecy that cloaks the praying person and helps him attain a prayer that is like
prophecy” (Mark 2003: 291).
This custom thus serves to symbolically originate an ideal praying person. A similar
mechanism can be seen in the Torah reading gestures of Yemenite Jews: Their symbolic
function is to produce an ideal reader/worshipper. To use Kenneth Burke’s rhetorical
terminology, handclapping creates an identification of the worshipper with the absolute
praying person, an identification of the prayer space with the space of the Land of
Israel, an identification of speech with prayer. However, the Breslov conception
of gesture contains another dimension: within the rhetorical identity there is also a latent non-identity (as Paul Ricoeur has written), which is typical of a meta-
phorical connection. Non-identity or the dismantlement of identity is the second, com-
plementary objective of the handclapping gesture; it is what leads the supplicant to the
sacral “madness” and ecstasy. Gesture here functions as a metaphorical link between
the ideal and the empirical (on the relation between gesture and metaphor see: Cienki
and Müller 2008). Here the gestures neither express, nor signify, nor symbolize, but
rather originate and maintain the ideal I-other figure, the figure of the perfect worship-
per. The functioning of a gesture as a metaphor can be presented with the help of the
Groupe µ model, as a combination of two synecdoches (Dubois et al. 1981: 106–110):
Hear, O Israel: The Lord our God, the Lord is one! You shall love the Lord your God with
all your heart, with all your soul, and with all your strength. And these words which I com-
mand you today shall be in your heart. You shall teach them diligently to your children,
and shall talk of them when you sit in your house, when you walk by the way, when you
lie down, and when you rise up. You shall bind them as a sign on your hand, and they
shall be as frontlets between your eyes. You shall write them on the doorposts of your
house and on your gates (Deuteronomy 6: 4–9).
One puts on the phylacteries at least once a day. The phylacteries are two leather boxes
(called “houses”) containing pieces of parchment with Torah verses written on them;
they are tied with leather strips, one to the arm and the other to the head. A mezuzah
is a box that contains a parchment with two texts concerned with the mezuzah com-
mandments. One kisses the mezuzah every time one crosses the threshold of nearly
every building or room: usually this involves touching the mezuzah with one’s fingers
and then kissing those fingers (see also next section). The performance of the com-
mandment consists in the gesture of binding to the body itself (as in the case of the phy-
lacteries) or by means of the body (as in the case of the mezuzah); the difference is not
significant in the matter at hand.
Writing a Torah scroll is also a binding gesture of sorts. It is only through the gesture
of writing the Torah scroll that a person can really “bind” the holy letters to his body.
This gesture, too, functions as a metaphorical operator: On the one hand it is a synec-
doche of writing (by entailment and causality), and on the other hand it is a synecdoche
of the body (by entailment, proximity, part/whole and specific/general):
In these acts, of writing a Torah scroll and putting on the phylacteries, the gestures are
similar to mudras. One meaning of the word mudra, according to a proposal put forth
by Otto Francke, is “script” or “the art of reading”. Fritz Hommel, too, hypothesizes
that the etymology of mudra goes back to the Babylonian word musaru, “script”, in
which the “s” was replaced by “z” when the word entered the Persian language:
musaru – muzra – mudra (Eliade 1969: 367–368). As in the Middle Ages, the ultimate
object of all these gestures is God (Schmitt 1992: 80–81). The essence and objective of
the binding gesture is reading by the body: The mind reconstructs the invisible written
text (which is enclosed inside the casings of the phylacteries and the mezuzahs) both at
the propositional (the text’s words) and the visual (the appearance of the text written
on the parchment) levels. This is the reason why the writing rules must be so strictly
adhered to: The actual written text must conform absolutely to the image of the prayer
text which is evoked in an observant Jew at the time of binding/touching. If the written
text does not fit the mental image, the speech act, the prayer, could be deemed void or
failed. The correct and successful performance of the binding gesture thus confirms the
prayer’s correct and successful performance; in other words it, again, creates the perfect
worshipper.
5. Kissing gestures
We shall now examine a number of acts which can ostensibly be defined as kissing ges-
tures: Kissing a mezuzah, kissing the phylacteries, kissing a Torah scroll, kissing the
Western Wall. These gestures are components of religious or quasi-religious, ritual or
quasi-ritual (as in the case of kissing the Western Wall) practices. We can point to a
body technique which underlies all these gestures and which constitutes their psycho-
cultural motive, but does not exhaust the gestures; this may be called the practice’s tech-
nical component. The technical component of the above-mentioned gestures would
seem at first glance to be the gesture of kissing. Indeed, kissing is an important and
frequently-encountered body technique in many cultures, which is apparently rooted
in the instinctive and innate act of suckling. But in order to define an instinctive act
as a body technique, it is necessary to describe the cultural development which it un-
dergoes in this or that culture. And in fact, a more careful analysis of the gestures in
question shows that their technical component is not the kissing. Note that in every
one of these gestures the act of kissing is accompanied or mediated by a hand move-
ment: the person touches the mezuzah with his hand and kisses the hand (1); he holds
the phylactery case and brings it to his mouth (2); in the case of a Torah scroll, tech-
nique no. 1 is used when the scroll is taken out of the Ark and returned to it; with a
printed Torah technique no. 2 is used; at the Western Wall both techniques are used on
non-ritual/canonical occasions: Either one touches the Wall with one’s hand which is
then kissed, as in the case of the mezuzah, or one touches the Wall and at the same
time also kisses it. It is thus the hand movement which provides the key to sorting the
given gestures according to their technical-cultural functions. Clearly, therefore, the
gesture of touching with the hand is the technical component of these gestures. In
our case we can define the kiss as the practice's core gesture. The core gesture contains the raw energy which feeds future gestures, but not the genetic information – the technical manual which determines their growth and their cultural role.
The body technique which is thus learned and stored in memory at a very young age
is a kiss which is accompanied/mediated by a hand movement and a touch. Every one of
the gestures in the group discussed here involves the hand/mouth touching the words
of the Torah. The gesture of touching the Western Wall is no exception. Most of the
customs associated with the Western Wall imitate synagogue-related behaviors in many
details. In fact, the Western Wall is the ultimate mizrah (a synagogue’s eastern wall) of
the Jewish liturgy. The Western Wall is treated with the same marks of respect as the
Ark containing the Torah scrolls in the synagogue. For example, one does not turn
one's back on it without first retreating a few steps. Such manifestations of trepidation and
respect are directed at the sacred center represented by the Holy Book, here repre-
sented by the Western Wall. Thus observant Jews implement towards the Western
Wall the same body technique which they have learned in childhood to use in the syn-
agogue, after this new application was approved by the rabbinical authorities. Touching/
kissing the words of the Torah has a double symbolic function, which is in fact one:
Writing and binding. The words of the “Hear O Israel” passage are themselves a speech
act, similar to a blessing or oath, an elocution in John Austin’s (1962) terminology: Say-
ing the words constitutes their performance, in other words, by pronouncing them they
are metaphorically bound to the heart simultaneously with their non-metaphorical
binding to the arm and the head in the ceremony of putting on the phylacteries. The
primary gesture here is the metaphorical one, while the physical gesture is its de-meta-
phorization, which takes place in the process of the realization of the ideal worshipper.
Kissing or touching the word ties it to the body, writes it on the body (in and by means
of the body) and turns the human body into a Holy Book, into an embodiment of holy
love. This is a gesture of love: The love of God for man as embodied in the Torah’s
words encounters and becomes one with man’s love for God as realized in the touch/
kiss/writing of these words.
6. Shokeling
Back-and-forth movements of the body during prayer have become one of the most
familiar attributes of an observant Jew, to a large measure due to the efforts of the
Jews themselves, who sought and found prayer gestures that differ from those used
by other faiths. The movement consists of a series of very light bows (with bending
at the waist), slight twists of the body left-right, and from time to time – minor bending
of knees before the bowing (everything according to specific phrases of the prayer). It
usually proceeds at a more-or-less free pace, except for some specific cases (especially
during the Eighteen Blessings, the central part of the three main daily prayers) in
which deeper bows are required. This is all that has remained of the bows, kneelings
and prostrations which characterized the ancient pre-exilic Jewish liturgy. The move-
ment that is supposed to express the heat of faith is usually quite restrained; only
rarely, especially in certain hassidic and mystical circles, does it attain a truly ecstatic
level. The same goes for the hand movements which accompany prayer, for example
striking the chest when reciting the tahanun (plea). The movements’ restrained nature
endows them with a rational and somewhat reserved character; yet it is also what
makes them more similar to everyday and rhetorical gesticulation. The shokeling
today constitutes a body technique, which is learned and imprinted on memory in
childhood and youth, through constant, thrice-daily practice. It can be assumed that
this technique is characteristic of males rather than of females, as, basically, the obli-
gation of prayer applies only to the former. Furthermore, one may observe that
women for the most part shokel more slowly and less vigorously than men (Heilman
1976: 218).
It is important to note that the movement is learnt in tandem with the typical
recital tune used in prayer or reading the Torah, and in a specific social and circum-
stantial frame, which may with considerable justice be defined as an epideictic rhetor-
ical situation (e.g. a situation of public teaching). This body technique has been
transferred from prayer to the lessons in yeshivas, where it is associated with the
behavior of teachers and students when teaching, learning, reading and speaking.
Nowadays one can see and hear rabbis, saintly or just plain observant Jews slightly
bending their torso back-and-forth and speaking with singsong intonation (imitating
prayer or Torah reading) when they give a speech, eulogy or sermon, explain their
position, or debate on occasions which are very far removed from prayer, on topics
which may have nothing whatever to do with faith or religion. Furthermore, their
audience can be seen to move in tempo with them; this can become a mass phenom-
enon when the speech or sermon is given by very revered (usually hassidic) rabbis and
holy men. And since for most people prayer involves reading a prayer book, and most
religious studies are based on reading books, the body technique of shokeling has pe-
netrated into the behavior accompanying the reading of books in general, even non-
religious books in circumstances which have no connection with the liturgy. Perhaps,
however, a distinction should be made between the yeshiva student, who “never seems
to stop shokeling, even when he stands in conversation with another”, and “the mod-
ern Jew" who shokels only during moments of most intense prayer (Heilman 1976:
218–219).
What are the symbolic and pragmatic functions of this technique? A well-known
opinion is that of the Rabbi in The Kuzari by Judah Halevi: in the past, when not everyone could have his own prayer book, people bent in turn towards the book to have a look at it. This assumption, an entirely imaginary one, can be understood in the context of the inter-confessional dispute described in The Kuzari: the Rabbi's response avoids drawing an analogy between the gesture in the Jewish custom and non-Jewish rituals, just as it avoids insulting other religions and representing Jews in a defensive, opportunistic light. However, this assumption is of particular interest for us, since it connects gesture with book reading. Even if it lacks any real basis, it reflects a deep cultural logic in which, on the one hand, the need "to touch" the sacred words conditions the performance of the gesture, and, on the other hand, the performance of the gesture conditions the possibility of reading. The desire directed towards God is embodied in the gesture of the desire to read the book, thus enabling the worshipper (and the congregation) to fulfill their duty and perform the ritual to best advantage.
The main symbolic meaning of shokeling during prayer is well-known; it expresses
the supplicant’s self-abasement and submission before God or the Divine Presence.
When prayer is perceived as a discourse or dialogue, the accompanying gestures rein-
force and illustrate its interactive nature, that most complex and mysterious aspect of
prayer. The shokeling, together with other movements, is meant to actualize the motto of "All my bones shall say" (Psalms 35:10), that is, to make the body an active participant in prayer (which can also be understood as an observance of the commandment "You shall love the Lord your God (…) with all your strength"). There are also other symbolic explanations based on various biblical verses, such as the comparison of the human soul to the candle of God (Proverbs 20:27) and the description of the trembling of the People of Israel when they were given the Torah at Mount Sinai (Exodus 20:15).
The gestures thus have the purpose of creating the (collective) image of the perfect,
ideal supplicant or, in the concepts of Jewish mysticism, the image of the ideal bride
(the congregation of Israel), one fit to approach the ideal, heavenly Groom. The shokeling in fact creates the stage of sanctity and fear of God, so that the interaction among people here and now symbolizes/stages/reconstructs the occasion of the revelation, which was the originating interaction between God and man. The shokeling, similarly to the handclapping in Rabbi Nachman's approach, makes the sanctity of the Land of Israel and the Temple present wherever it is performed, after (and since) the Temple was destroyed and the People of Israel exiled from their land. Now another Temple is built instead, in the body of every Jew at prayer. And just as the Temple was the social, and not just the religious, center of the people, so also the swaying has the purpose of bringing about the social and national unity of the congregation by means of the body. This is
the source of this gesture’s social “mission” in the profane sphere, which constitutes
its pragmatic function. The gesture thus turns into a body technique which comes
back into action on every social-epideictic, religious and quasi-religious, sermonizing
and teaching occasion, and unifies the participants by turning them into ideal pilgrims
to the Temple. After all, any Jewish congregation is traditionally called a “holy
congregation”.
To sum up, the Jewish traditions preserve numerous quasi-ritual gestural practices
(some of which turned into body techniques) that are charged with rich symbolic-
cultural meaning and integrated into everyday activities. As can be seen from the ana-
lysis of some of these practices, their main function is to shape an ideal worshipper. A
gesture functions as a metaphor, merging one’s real character with the ideal one that
fills a human being with sanctity (of the Holy Land or the Holy Scriptures). The ges-
tures thus serve as a means of spreading sanctity wherever they are performed.
When these gestures, by turning into bodily techniques, permeate non-religious spheres
of life, they act so as to produce the ideal cultural community. Since the present article
has not explored the subject in full detail and has not attempted to do so, it can only
be assumed, with a certain degree of confidence, that despite geographical and historical
differences between various Jewish communities, they share the same conception of ges-
ture and of its function in ritual and in everyday cultural (religious or quasi-religious)
communication.
7. References
Austin, John L. 1962. How to do Things with Words. Oxford: Clarendon.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John
Benjamins.
Dubois, Jacques, Francis Edeline, Jean-Marie Klinkenberg, Philippe Minguet, François Pire and Hadelin Trinon 1981. A General Rhetoric. Translated by Paul B. Burrell and Edgar M. Slotkin.
Baltimore: Johns Hopkins University Press.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton.
Ehrlich, Uri 1998. ‘All My Bones Shall Say’: The Non-verbal Language of Prayer. In Hebrew,
Jerusalem: Magnes Press.
Eliade, Mircea 1969. Yoga: Immortality and Freedom. Translated by Willard R. Trask. Princeton,
NJ: Princeton University Press.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. Harmondsworth: Penguin.
Heilman, Samuel C. 1976. Synagogue Life: A Study in Symbolic Interaction. Chicago: The Univer-
sity of Chicago Press.
Katsman, Roman 2007. Gestures accompanying Torah learning/recital among Yemenite Jews.
Gesture 7(1): 1–19.
Mark, Zvi 2003. Mysticism and Madness: The Religious Thought of Rabbi Nachman of Bratslav. In
Hebrew, Tel Aviv: Am Oved.
Nachman Ben Simcha, of Breslov 2008. Likutei Moharan. In Hebrew, Modiin: Tiferet ha-Nahal
Institute.
Schmitt, Jean-Claude 1992. The rationale of gestures in the West: A history from the 3rd to the 13th
centuries. In: Fernando Poyatos (ed.), Advances in Nonverbal Communication: Sociocultural,
Clinical, Esthetic, and Literary Perspectives, 77–95. Amsterdam: John Benjamins.
22. The body in rhetorical delivery and in theater: An overview of classical works

Abstract
This chapter analyzes views on deportment found in classical texts. The author argues that
while most of the extant texts focus on rhetorical delivery, they also offer insights into
theories of acting, and more general ideas about the meaning of gesture and deportment.
In addition to the standard texts of Aristotle and Quintilian, the chapter discusses fragmen-
tary and indirect testimonies to Theophrastus’ mirror theory of emotions, Stoic and Epicu-
rean views on body language, as well as Roman theorists’ attempts to map both voice and
gesture. The author concludes that a close reading of technical treatises and fragments
points to the existence of what one might term an ancient theory of human communication,
concerned with emotions, the genesis of language, and nuances of meaning.
use of the body as the art of oratory did of language. Consequently, in order to maxi-
mize the efficiency of the message he delivered, the orator had to hone both his verbal
and non-verbal skills to a high level of artistry – and artificiality. It is possible that by the
time of Quintilian the orator would have planned in advance every aspect of his per-
formance, from finding convincing proofs down to the right moment to touch his cheek,
much like the director and performer of a one-actor play (see Sonkowsky 1959:
272–273).
The earliest testimonies on the study of delivery suggest that a skilled use of hypoc-
risis was believed from the beginning to be extremely effective in public speaking. An
anecdote about Demosthenes (384–322 BCE), repeated (among others) by Cicero in
Brutus (142), Orator (56), and On the Orator (3.213) and by Plutarch in the Lives of
the Ten Orators (845a–b), bears witness to these cultures’ perception of bodily eloquence
as an integral element of the art of public speaking:
One day, when Demosthenes was walking home disheartened after he had been hissed out
of the assembly, (…) the actor Andronicus cheered him up very much by telling him that
his speeches were good, but his delivery needed work; Andronicus then recited from mem-
ory what Demosthenes had just said in the assembly. As a result, Demosthenes was con-
vinced to entrust himself to Andronicus [for training]. From then on, whenever someone
asked him what was the most important aspect of the art of rhetoric, Demosthenes replied,
“Delivery;” “And the second?” “Delivery;” “And the third?” “Delivery.” Plutarch in the
Lives of the Ten Orators (845a–b)
Some ancient theorists of rhetoric come close indeed to the radical statement that the
anecdote ascribes to Demosthenes; for example, Theophrastus (371–287 BCE) claimed
that delivery was the single most important element of the art, whereas Quintilian (35–
100 CE) went as far as to state that how one spoke was more important than what one
said (11.3). Others, however, disagreed; for example, Aristotle (384–322 BCE), De-
mosthenes’ exact contemporary, omitted the subject of delivery from his treatise on
Rhetoric, arguing that it was concerned mostly with manipulating the audience’s emo-
tions and, since it had been borrowed from actors, that it was altogether trivial. Cicero
(106–43 BCE), for his part, extolled the importance of delivery in On the Orator, en-
couraging his readers to learn from actors, yet warning them against acting as though
they were on stage (3.213–3.220). The art of rhetorical delivery, it follows, was consid-
ered both vital and entirely negligible, both similar to and different from the craft of the
hypocrites or actor. In order to disentangle these contradictions, I analyse in this essay
theories of delivery formulated in ancient treatises of rhetoric, from Aristotle’s dismis-
sive comments to Quintilian’s apology, identifying two main theoretical trends, Peripa-
tetic and Stoic, the latter having received hardly any attention in scholarly literature on
ancient delivery.
For most of the 20th century, classical scholarship on ancient rhetorical theory
showed an Aristotelian leaning: the most influential monographs mentioned deliv-
ery only briefly. The dismissive treatment of the Roman treatises on delivery in
Clarke’s classic Rhetoric at Rome ([1953] 1996) is typical of this trend. Clarke (1996:
35–36) reluctantly summarizes the account of delivery found in the Rhetoric for Herennius, adding that he hopes that the author's obsession with delivery was not represen-
tative of contemporary Roman thought (see Kennedy 1972, 1980, 1994). The mid 1990s,
however, brought a radical change in critics’ attitude towards body language. Two mono-
graphs on delivery in the Roman world, Gleason’s brilliant analysis of rhetorical perfor-
mance of masculinity (1995) and Aldrete’s positivist attempt to reconstruct rhetorical
gestures (1999), appeared simultaneously with two groundbreaking books on Homer, La-
teiner’s (1995) and Boegehold’s (1999). These were followed by Gunderson’s literary
and psychoanalytical reading of rhetorical declamations (2003) and Corbeill’s extensive
critical analysis of the functions of gesture in Roman culture (2004). (For a more de-
tailed discussion of recent work on rhetorical delivery in Rome, see Hall 2007: 234).
In these works, the reader will find a perceptive context-oriented analysis of non-
verbal communication in classical literature, rhetorical practice, and ritual as well
as theoretical reflections on the nature of embodied communication (see especially
Corbeill (2004: 109–136) and his use of Bourdieu’s theory of habitus). In the present
essay, I steer away both from an empirical study of the orator’s deportment and its his-
torical conditions and from modern theory, and instead focus on the ancient writers’ at-
tempts to theorize the role of the human body in communication. This, as will be
apparent shortly, entails a discussion of both theatre and public speaking.
[Figure: the divisions of delivery. 1. Voice: 1.1 Volume (loud, quiet, medium); 1.2 Pitch (high, low, medium); 1.3 Rhythm. 2. Deportment]
…and the third topic, concerning delivery, has a great potential, but has not yet received
attention.
Delivery influenced the art of tragedy and rhapsodic recitation late, for at first the poets
acted in their tragedies themselves. Furthermore, it is clear that this aspect is present in
rhetorical as well as poetic art, just as some people, including Glaucon of Teos, have
demonstrated.
Now, delivery is manifest in voice: how to use it [i.e., its volume] according to each emotion, such
as, loud or quiet or in between, and how to use the tones [of voice], such as high and low
and in between, and how to use rhythms for each emotion. For there are three matters that
people study: volume, pitch, and rhythm. And these people take prizes in nearly all com-
petitions, and just as actors now have a greater success than poets there [i.e., in recitations
of epic and tragedy], so also [it happens] in political competitions because of the corruption
of the citizens [that people skilled in delivery prevail]. But there is no treatise about these
things, given that even the study of diction began only recently.
In fact, carefully considered, this matter [of delivery] is rather trivial. However, since [not
only delivery but] the whole study of rhetoric pertains to influencing opinions [rather than
attaining true knowledge], we must deal with it, not because it is right, but because it is
necessary. Seeing that the just thing to do in a speech is to seek nothing beyond [facts]
and to arouse neither sorrow nor joy, it is just to compete by means of facts themselves.
Consequently, all things outside the demonstration itself are superfluous.
Nevertheless, as we have said, delivery has a great potential because of the corruption of
our audiences. This necessity [to discuss delivery] is, however, only a small part of the
whole teaching of diction. For it does make a certain difference in clarifying things whether
we say things in this way or another, but not such a great difference, since all these things
[linked to delivery] merely pertain to putting on a show for the audience. This is why no
one teaches geometry in this way. Now, when this thing [i.e. delivery] comes [into fashion
among orators] it will do the same [to their art] as it did to the art of acting; in fact, certain
writers have attempted to say something about this, such as Thrasymachos in his Laments.
(Aristotle Rhetoric: 1403b3–1404a7)
Aristotle’s refusal to discuss delivery combines his outline of the rise of delivery, a brief
account of previous research and its scope, and the rationale for his omission of this
topic; I will discuss them in that order. The philosopher’s agenda is quite clear from
his quasi-historical account. In associating the increase of interest in delivery with the
fashion for poetic texts (both epic and dramatic) to be recited by people other than
their authors, Aristotle seems to be contrasting the performers’ studied delivery with
the authors’ presumably spontaneous rendition of their own work. Delivery, he implies,
is an essentially mimetic art that originally developed as a skill of actors imitating emo-
tions that they did not experience. As such, delivery risks skewing public debates, offer-
ing an unfair advantage to speakers willing and able to manipulate their audience’s
emotions and awareness would constitute the principles of the art. It seems, however,
plausible that the science in question pertains to emotions, and that the orator was ex-
pected to produce non-verbal signs to which the audience would respond with apposite
emotions (see Achard 2000: 2). Plutarch’s brief mention that Theophrastus explained
that voice could “both express and elicit sorrow, pleasure, and passion” confirms the
assumption that the audience was expected to empathize with the speaker (Table
Talk 623a–b). What the Peripatetic school proposed after Aristotle was, then, in all
likelihood, a psychological, audience-oriented theory of voice and deportment, which as-
sumed that the audience would mirror the emotions demonstrated by the speaker. This
rule of mirrored emotions would have implied that the speaker’s body is a dangerous
instrument capable of manipulating the listener’s response.
gestures in order to point with their finger at things that are present” (Lucretius On the
Nature of Things: 5.1028).
Despite their scarcity, these testimonies to Greek thought on delivery allow us to risk
a tentative historical outline. The earliest theories proposed by Sophists drew upon the
art of reciting dramatic and epic poetry and focused on the voice and its ability to
express emotion. Aristotle, despite his misgivings, laid the foundations for further study of rhetorical delivery focused on the emotional associations of both voice and deportment; Theophrastus followed in his footsteps, analyzing audiences’ emotional responses to
non-verbal signals. Next to this approach focused on public performance, we find tan-
talizing traces of broader theoretical thought on the role of the body in communication;
the Stoics expressed interest in the connection between thought, gesture, and speech,
while the Epicureans apparently reflected on the role of gesture in the development
of language. Although our sources for the broader theory are drastically limited, it is
hard not to notice the contrast between the Aristotelian argument that hypocrisis
should be dismissed as a mimetic activity that exploited emotions, and the Stoic belief
that bodily expression was natural and encompassed all activities of the mind. These
theories would have entailed rather different constructs of the speaker’s body. The Peripatetic theory assumes that the performer/speaker needs to learn how to use his body
and project efficacious signs of emotions. The Stoic and Epicurean theories, as far as we
can conjecture, held that the body itself “knew” what to do. These concepts would, in
turn, have fostered different attitudes towards delivery, the first leading to mistrust of the body’s histrionic abilities, the second projecting enthusiasm for the allegedly universal and natural language of the body. These two attitudes compete and at
times converge in the writings of Roman rhetoricians.
conclusions” (3.21–3.22). The face is hardly mentioned and postures are treated as a
topic subordinate to vocal registers. In Rhetoric for Herennius 3.26, the author offers
his only specific advice on the subject of facial expression: “it is thus proper for the
face to express modesty and earnestness.”
In this system, the orator’s autonomy is severely limited by the network of rules and
subsidiary rules that he must follow when using his body in the service of his text.
This mechanical approach coincides with a fear that the voice of the speaker who
would follow prescriptions unthinkingly might become too similar to that of an actor:
“it will be proper to use the calmest and most restrained voice possible [speaking]
with full throat, in such a way, however, that we should not move from the conventions
of rhetoric into tragic [delivery]” (3.24). The advice on deportment (gestus) contains
similar words of caution: “Deportment should be neither elegant nor clumsy, so that
we do not look like actors (histriones) or seasonal workers.”
The wealth of technical detail is, it seems, proportional to the degree of anxiety about
the status of a speaker who would follow this advice to the letter. Cicero’s writings on
delivery, composed a little later than the Rhetoric for Herennius, are less concerned with
detail and instead express faith in the speaker’s judgment.
history of great orators. Most of his systematic comments on the art of rhetoric are ex-
pressed in works entitled On the Orator (55 BCE) and Orator (46 BCE). In both of
these works, Cicero concurs with the view ascribed to Demosthenes that delivery is the
dominant factor in oratory (Orator 56, On the Orator 3.213). Although he does so with
a touch of embarrassment, Cicero perceives rhetorical delivery as being akin to “the friv-
olous skills of stage actors” (On the Orator, 3.213). (On the connection between acting and
oratory in Roman theory, see Dutsch 2002, 2007; Fantham 2002; Graf 1994).
But Cicero, in addition to being an experienced speaker and keen critic of acting,
was also an erudite scholar, well versed in Greek philosophical writings. Thus his theory combines
a fascination with the potential of skilled delivery with strong Peripatetic influences
and an underlying conviction that human bodies naturally transmit information. When in-
dicating, at the beginning of On the Orator (Cicero 1.18), that delivery (actio) is the art of
controlling (moderari) the movement of the whole body (motus corporis), hand gesture
(gestus), facial expression (vultus), as well as the strength and modulation of voice, Cicero
is probably drawing on Peripatetic sources. The source for his statement in On Invention
that delivery consists in “adjusting voice and body according to the status of the topic and
speech” is probably similar (Cicero 1.9). Like Peripatetic theorists, Cicero also discusses
voice in the greatest detail and views delivery as primarily concerned with emotions:
“every emotion (motus animi) has received from nature, as it were, its own facial expres-
sion, its own sound and its own gesture” (On the Orator 3.216). These remarks do not
form, however, an all-encompassing system comparable to the one found in the Rhetoric
for Herennius. The closest Cicero comes to a systematic description of the use of voice
is in Crassus’ lecture, in which the speaker states that one needs to begin with the most
natural tone, then gradually raise the pitch (On the Orator 3.227).
What we have not seen in the earlier writings is Cicero’s perception of the act of im-
plementing these principles as a manifestation of the speaker’s mastery over his body.
This exercise in self-control comes into focus in the discussion of delivery in The Orator:
The orator will use movement in such a way that there will be nothing superfluous; when
gesticulating he will remain straight and tall; his pacing back and forth should be controlled
(moderata) and never for a long time or far. There should be no effeminacy in his bending
of the neck, no twirling of fingers, no counting of the rhythm with fingertips. He will instead
be in control of himself (se ipse moderans) [as manifest in] a manly posture of his entire
torso and both his sides, extending his arm in debate and dropping it in calmer moods.
(Cicero 59)
The key word in this passage is moderari, “to manage, control.” Cicero represents the
orator both as the puppeteer and the puppet, his mind controlling the entire body and
forcing it to broadcast, simultaneously through several channels, the strength, stability,
and self-control expected of an elite Roman man (on Roman masculinity, see Gleason
1995 and Gunderson 2003). As long as this elite man uses his body as a cipher for his
status, as he would have been socialized to do, he will not need the detailed instructions
of the kind offered in Rhetoric for Herennius.
has come down to us from antiquity. Although he is aware that pronuntiatio referred
originally to voice, Quintilian follows Cicero’s usage (see On the Orator 3.59.222 and
Orator 17.55) in applying this term to both voice and deportment. Thus defined, deliv-
ery of a speech is, according to Quintilian, more important than its content (see Dutsch
2002). (See Fig. 22.3) “[Delivery] has an extraordinarily powerful effect in oratory: since
the effect that a speech has on all listeners depends on what they hear – the quality of
the speeches that we compose in our minds is not as important as the way in which we
deliver them.”
1. Voice (vox)
   Part of speech and voice quality: Introduction (quiet); Narration (colloquial); Proof (colloquial/witty); Argument (sharper, more emphatic); Digression (gentle/sweet/relaxed); Epilogue: a) recapitulation (even, short, clear-cut clauses), b) asking mercy (sweet and melancholic).
   Mood and voice quality: Joy (joyful); Encouragement (strong); Hostility (rather slow); Compliments (gentle and submissive).
2. Face (vultus)
   Eyes (no expressions described): joy, sadness, intensity, indifference, pride, anger, threat, flattery, submission.
   Eyebrows: anger (contracted), sadness (dropped), joy (raised), consent (dropped), refusal (raised), derision (not specified), contempt (not specified), loathing (not specified).
   Blush/pallor.
   Nose (no meaning assigned): twitching, wrinkling.
   Lip movement (no meaning assigned): smacking, curling.
3. Deportment (habitus corporis)
   Head: nods (no movements described); consent, refusal, affirmation, modesty, hesitation, surprise, indignation.
   Neck: straight (reliability), bent (subservience).
   Hands: stiff (arrogance); [emotions] aversion, fear, joy, sorrow, confession, hesitation, regret; [actions] demand, promise, summon, dismiss, threaten, supplicate, question, negate; [deictic] measure, quantity, number, time, deictic adverbs.
Unlike the author of Rhetoric for Herennius and (to a lesser degree) Cicero, Quintilian
does not privilege voice in his account. On the contrary, he offers a far more extensive
account of gesture and deportment than he does of voice. He writes with contagious
enthusiasm about the expressive potential of every small section of the human body
(eyebrows, fingers, mouth, neck). Like the Peripatetics and Cicero, he subscribes to
the opinion that voice is directly connected to the human soul and has the power to
convey – and provoke – every emotion. He also assumes that the audience’s emotional
response is of paramount importance to the speaker and compares the work of the ora-
tor to that of an actor. Quintilian’s discussion of voice contains, in addition to the old
Peripatetic prescription that the tone of voice must befit circumstances, advice on the
For, indeed, not only hands, but also nods of the head can convey our will; the mute use it
instead of language and we can be moved by pantomime dancing without a sound. It is also
possible to understand the disposition of individuals from their face and gait. Furthermore,
although animals have no language, one can sense their anger, joy, and affection in their
eyes and in certain other signs. (Quintilian 11.3: 66)
In this excerpt, Quintilian departs from his earlier (Peripatetic) definition of rhetorical
delivery (as concerned chiefly with emotions) and draws attention to the body’s other
possibilities, such as expressing consent or storytelling. He also alludes to sign language
and seems to assume that non-verbal communication is to some degree used by every
living creature.
The rhetorician seems to be making no distinction between voluntary communica-
tion and involuntary signs that can be read by others who can interpret a person’s dis-
position from their face and gait. Instead, he suggests that the gestures which are
endowed with meaning and “come out naturally along with voice” must be distin-
guished from the “imitative” gestures used by pantomime actors (11.3.88). The most
complex code among his different facets of natural gestus is that of hand gestures. Quin-
tilian writes that these almost surpass the spoken word in their intricacy:
As for hands, without which delivery is lame and weak, I can hardly tell you of how many
movements they are capable, for they almost surpass the multiplicity of words. For,
whereas other parts of the body merely help the speaker, the hands so to say, speak them-
selves. Don’t we use them to ask, promise, call, dismiss, threaten, supplicate; don’t we
express revulsion, fear, questions, and negations? Don’t we show joy, sadness, doubt, con-
fession, regret, measure, quantity, number, and time? Don’t the hands have the power to
encourage and forbid, approve, admire, and show shame? Do they not, in indicating people and places, fulfill the function of adverbs and pronouns? (Quintilian 11.3.85–11.3.87)
the rudimentary distinction between verbs and nouns, classifying all other words as
“conjunctions.” The Stoics later added several new parts of speech, including the parti-
ciple and the adverb (1.4.18–1.4.20). Quintilian’s references to adverbs and pronouns in
his comments on the language of gesture, relying as they do on Stoic theory of language,
thus probably reflect a Stoic theory of gesture. The fact that both references to Stoic
sources and comparisons between gesture and parts of speech are absent from Cicero’s
comments on delivery and from The Rhetoric for Herennius (see Achard 2000: 12),
would confirm the thesis that Quintilian’s comparison is indeed derived from Stoic
thought.
Quintilian is also intensely preoccupied with the possibility that the speaker’s body
might look (as well as sound) “unnatural” (another Stoic motif): the head must be carried “naturally,” since carried too high it expresses arrogance and too low, a slavish disposition
(11.3.68–11.3.69; see Varwig 1976). The eyes need careful attention, as they are natu-
rally expressive of one’s emotions. Although Quintilian thinks that those (probably
Stoic theorists) who argue that the best delivery owes absolutely nothing to art go
too far, his insistence on natural delivery (11.3.10) is quite remarkable and has no par-
allel in the Peripatetic sources. The problem of authenticity arises, for example, when
Quintilian considers whether the emotions displayed by the orator are “true” or “arti-
ficial” and concludes that they are positioned somewhere between truth and fiction. He
advises the future orator that “it is best to feel the affect and imagine pictures of things
and be moved as though by genuine emotions” (11.3.61–11.3.62).
Ultimately, all these precautions serve three purposes: to win the audience’s
approval by displaying manly self-control, to persuade them by the intensity of one’s
conviction, and to stir their emotions. The Roman rhetorician thus arrives at a workable
compromise between the linguistic and psychological approaches, suggesting both that
the body has an innate capacity for communicating diverse concepts in several different
ways and that the speaker must take into special consideration his audience’s likely
emotional response to his own body language. In addition to his focus on nature, Quin-
tilian is also aware of the cultural dimensions of gesture. He often compares his recom-
mendations for a Roman orator with what is deemed acceptable in other contexts, i.e.
among “Greek” or “Eastern” orators, among Roman orators in earlier times, or among
actors on the tragic and comic stage or in pantomime performances.
8. Conclusion
Because delivery required the orator to use his body to represent willfully – rather than
express spontaneously – his thoughts, this particular aspect of the orator’s work was in
the eyes of ancient theorists reminiscent of the task of a dramatic performer. Ancient
writers were acutely aware of the similarities between these two stylized bodily idioms.
The notion that a man speaking to his fellow citizens on important public or legal mat-
ters had to resort to methods akin to acting displeased certain rigorous thinkers, includ-
ing Aristotle, but the awareness of the rules governing the use of voice, posture, and
facial expression in oratory sparked a larger debate on the function of the body in
human communication. By the time of Quintilian, the theory of delivery was a sophis-
ticated discipline that combined a psychology of emotions with a semantic theory that
held that nonverbal communication encompassed several modes (posture, face, gesture,
and voice). Through these modes a speaker was believed to convey information
that was usually linked with spoken utterances. Sometimes, however, especially in the
case of hand gesture, the body was believed to speak a language of its own. Governed
by its own grammar, this language had multiple registers (everyday interaction, pub-
lic speaking, pantomime, sign language) and regional dialects (Rome versus the
“East”).
In sum, while the ancient texts we have at our disposal are focused on technical de-
tails useful for public speaking, they also bear witness to a much broader intellectual
discourse that encompassed not only emotions, but also the genesis of language, and
the very nature of human communication.
9. References
Achard, Guy 2000. Pathos et passions dans l’Ad Herennium et le De inuentione. In: Andreas
Haltenhoff and Fritz-Heiner Mutschler (eds.), Hortus Litterarum Antiquarum: Festschrift für
Hans Armin Gärtner zum 70. Geburtstag, 1–17. Heidelberg: Carl Winter.
Aldrete, Gregory S. 1999. Gestures and Acclamations in Ancient Rome. Baltimore: Johns Hopkins
University Press.
Aristotle 1992. The Art of Rhetoric. Translated and edited by Hugh Lawson-Tancred. New York:
Penguin Classics.
Boegehold, Alan L. 1999. When a Gesture Was Expected: A Selection of Examples from Archaic
and Classical Greek Literature. Princeton, NJ: Princeton University Press.
Caplan, Harry 1954. [Cicero]: Ad Herennium de ratione dicendi. Cambridge, MA: Harvard Uni-
versity Press.
Cicero, Marcus Tullius 1949. On Invention. Translated by H. M. Hubbell. Cambridge, MA: Loeb
Classical Library.
Cicero, Marcus Tullius 2001. On the Ideal Orator. Translated by James M. May and Jakob Wisse.
New York: Oxford University Press.
Cicero, Marcus Tullius 2004. Cicero’s Brutus or History of Famous Orators; also His Orator, or
Accomplished Speaker. Translated by E. Jones. Whitefish, MT: Kessinger.
Clarke, Martin Lowther 1996. Rhetoric at Rome: A Historical Survey. Revised edition with intro-
duction by D.H. Berry. London: Routledge. First published [1953].
Corbeill, Anthony 2004. Nature Embodied. Gesture in Ancient Rome. Princeton, NJ: Princeton
University Press.
Diogenes Laertius 1853. The Lives and Opinions of Eminent Philosophers. Translated by Charles
Duke Yonge. London: Henry G. Bohn.
Dutsch, Dorota 2002. Towards a grammar of gesture: A comparison between the types of hand
movements of the orator and the actor in Quintilian’s Institutio Oratoria 11.3.85–184. Gesture
2(2): 259–281.
Dutsch, Dorota 2007. The language of gesture in the illuminated manuscripts of Terence. Gesture
2: 39–70.
Edwards, Michael 2007. Alcidamas. In: Ian Worthington (ed.), A Companion to Greek Rhetoric,
47–57. Malden, MA: Blackwell.
Fantham, Elaine R. 1982. Quintilian on performance. Phoenix 36: 243–263.
Fantham, Elaine R. 2002. Orator and/et actor. In: Patricia E. Easterling and Edith Hall (eds.),
Greek and Roman Actors: Aspects of an Ancient Profession, 362–376. Cambridge: Cambridge
University Press.
Fortenbaugh, William W. 1985. Theophrastus on delivery. In: William W. Fortenbaugh, Pamela M.
Hubby and Anthony A. Long (eds.), Theophrastus of Eresus: On His Life and Work, 269–288.
New Brunswick, NJ: Transaction.
Fortenbaugh, William W. 1986. Aristotle’s Platonic attitude toward delivery. Philosophy and
Rhetoric 19: 242–254.
Fortenbaugh, William W. 2007. Aristotle’s art of rhetoric. In: Ian Worthington (ed.), A Companion
to Greek Rhetoric, 107–123. Malden, MA: Blackwell.
Gleason, Maud W. 1995. Making Men: Sophists and Self-Presentation in Ancient Rome. Princeton,
NJ: Princeton University Press.
Graf, Fritz 1994. Gestures and conventions: The gestures of Roman actors and orators. In: Jan
Bremmer and Herman Roodenburg (eds.), A Cultural History of Gesture, 36–58. Cambridge:
Polity Press.
Gunderson, Erik 2003. Declamation, Paternity, and Roman Identity: Authority and the Rhetorical
Self. Cambridge: Cambridge University Press.
Hall, Jon 2007. Oratorical delivery and the emotions: Theory and practice. In: William Dominik
and Jon Hall (eds.), A Companion to Roman Rhetoric, 218–234. Malden, MA: Blackwell.
Harrigan, Jinni A., Robert Rosenthal and Klaus R. Scherer (eds.) 2005. The New Handbook of
Methods in Nonverbal Behavior Research. Oxford: Oxford University Press.
Juslin, Patrik N. and Klaus R. Scherer 2005. Vocal expression of affect. In: Jinni A. Harrigan, Ro-
bert Rosenthal and Klaus R. Scherer (eds.), The New Handbook of Methods in Nonverbal
Behavior Research, 65–135. Oxford: Oxford University Press.
Kennedy, George A. 1972. The Art of Rhetoric in the Roman World, 300 B.C. – A.D. 300. Prince-
ton, NJ: Princeton University Press.
Kennedy, George A. 1980. Classical Rhetoric and Its Christian and Secular Tradition from Ancient
to Modern Times. Chapel Hill: University of North Carolina Press.
Kennedy, George A. 1994. A New History of Classical Rhetoric. Princeton, NJ: Princeton Univer-
sity Press.
Kennedy, George A. 2005. Invention and Method: Two Rhetorical Treatises from the Hermogenic
Corpus. The Greek Text. Edited by Hugo Rabe, translated with Introduction and notes by
George A. Kennedy. Atlanta: Society of Biblical Literature.
Lateiner, Donald 1995. Sardonic Smile: Nonverbal Behavior in Homeric Epic. Ann Arbor: Univer-
sity of Michigan Press.
Long, Anthony A. and David N. Sedley 1987. The Hellenistic Philosophers, Volume 2: Greek and
Latin Texts with Notes and Bibliography. Cambridge: Cambridge University Press.
Lucretius 2011. On the Nature of Things. Translated by Frank O. Copley. New York: Norton.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
O’Sullivan, Neil 1992. Alcidamas, Aristophanes and the Beginnings of Greek Stylistic Theory.
(Hermes Einzelschriften 60.) Stuttgart, Germany: Franz Steiner.
Plutarch 1969. Moralia, Volume 8: Table Talk, Books 1–6. Translated by P.A. Clement and H.B.
Hoffleit. Cambridge, MA: Loeb Classical Library.
Plutarch 2010. Plutarch’s Lives, Volume 5 (Lives of the Ten Orators). Charleston, SC: Nabu Press.
Quintilian 1856. Education of an Orator. Translated by John Selby Watson. London: Henry G.
Bohn.
Rabe, Hugo 1913. Hermogenes Opera. Leipzig: Teubner. Reprinted Stuttgart: Teubner [1969].
Sifakis, Gregory Michael 2002. Looking for the actor’s art in Aristotle. In: Patricia E. Easterling
and Edith Hall (eds.), Greek and Roman Actors: Aspects of an Ancient Profession, 148–164.
Cambridge: Cambridge University Press.
Sonkowsky, Robert P. 1959. An aspect of delivery in ancient rhetorical theory. Transactions and
Proceedings of the American Philological Association 90: 256–274.
Tranquillus, C. Suetonius 2010. The Lives of the Twelve Caesars: Grammarians and Rhetoricians.
Whitefish, MT: Kessinger.
Unknown 1994. Rhetorica ad Herennium. Translated by Theodor Nüsslein. Mannheim, Germany:
Artemis and Winkler.
Varwig, Freyr Roland 1976. Der Rhetorische Naturbegriff bei Quintilian. Heidelberg: Carl Winter.
Abstract
Prior to the invention of book printing, Western culture had no efficient storage medium
that served to unburden human memory. Instead of writing, a mnemotechnics based on the visual perception of bodily movements took over the functions of orienting, identify-
ing, and stabilizing the social order in the medieval period. In medieval instructions on
ecclesiastic and secular norms of behavior, the categorization of the most wide-spread
and relevant bodily practices was ordered according to the various functions of the
body parts involved, such as neck, back, and knee muscles, arms, hands, lips, and facial
muscles. Naturally, such a system of signifiers (like the prostration, the genuflection with
inclined body, the genuflection with erect body, bowing to the waist, bowing to the chest,
the foot kiss, the knee kiss, the shoulder kiss, the hand kiss, etc.), which will be recon-
structed here on the basis of various source corpora, could not encompass all the motion
sequences that took place in the context of different interactions. Nevertheless, it has
proven useful to treat socially meaningful actions as communicative acts within the
contexts of 1) religion, 2) law, 3) ceremonial and 4) etiquette.
surveying, is the primary precondition for all cognition, control, and cultural semiosis.
In the context of pre-literate culture, the hierarchy of modes of cognition was no differ-
ent from that of other ages. In the Middle Ages, too, human cognitive processes were
guided primarily by the eye. Instead of writing, however, a mnemotechnics based on
the visual perception of bodily movements took over the functions of orienting, identi-
fying, and stabilizing the social order in the medieval period.
Secondly, Ong’s notion of “orality” narrows the context of communication insofar as
it refers only to the exchange of information using language between a speaker and a
recipient. Such a narrowed approach completely excludes from consideration the posi-
tion of the body toward the addressee, for example, as well as the role of eye contact (or
the lack thereof), openness of arms and legs, shoulder position, in other words, all those
behavioral variables which always co-determine the course and character of communica-
tive acts. In order to understand the specificity of oral culture in the Middle Ages, we
must thus comprehend the structural and semantic relationship of verbal and non-verbal
signs to one another and their meaning in the context of pre-literate communication.
[Figure: the basic body techniques, i.e. inclinationes (ad renes), genuflexiones (ad genua), and prostrationes (ad talos)]
The basic body techniques listed above can be subdivided into more complex cate-
gories. For instance, the term proskynesis was generally used to refer to the combination
of the genuflection with a kiss on the feet, knees, or hand, or, somewhat less frequently,
to a combination of bowing and kissing the corner of a garment or a hand. In regard to
the nature of the relationship between signifier and signified, we can further distinguish
between iconic vs. indexical and symmetrical vs. asymmetrical bodily signs.
In this manner, courtly society created a system of gestures that, so to speak, did not
function autonomously but referred to the canonical gestures of Christian iconography.
For instance, one did not fully remove one’s hat but only had to raise it briefly or only
touch the brim slightly, as can be observed in the “chapeau” gesture that derives from
courtly etiquette and was preserved by the educated middle class. Accordingly, in acts
of courtly reverence of the seventeenth century, one could slightly bend one’s leg to
refer to something that icons expressed through full genuflection.
centuries. Most often, this process occurred not under pressure from below but, on the
contrary, by the initiative of the upper classes. The following resolution by Kaiser
Joseph II of January 10, 1787 is indicative of this trend. In it, it was decreed
[…] that henceforth, […] the kiss on the hand which men and women offer to the highest of
lords of the highest archducal house, just like genuflections in reverence and the kneeling
before anyone and in all cases is to be ceased entirely, […] likewise no one, […] who wishes to request or otherwise petition anything, shall kneel any longer, for this is no fitting action
between men, but is to be reserved for God alone. (Schreiner 1990: 128)
Body-related decrees from the eighteenth century illustrate the dissolution of tradi-
tional, visually oriented modes of perception in which social hierarchies were projected
onto the vertical body axis and depicted by analogy. In the sphere of power, a spatial
symbolism of “left” and “right” (the political expressions left wing and right wing derive from the seating order of the French National Assembly of 1789, where supporters of the revolution sat to the left of the president and its opponents to the right) became established by the eighteenth century, while the
symbolism of “above” and “below” in self-representations of powerful political figures
increasingly faded into the background. The new politics, aimed at concealing inequalities, surfaced in the rhetoric of the “universally human”. It aimed to block the observation
of differences in status and rank on the level of the body.
3. Areas of application
The physical behavior that determined the course of most everyday interactions in the
Middle Ages, such as prayer, conflict resolution, inviting and receiving, greeting and
taking leave, was subject to conscious societal standardization. With regard to medieval
bodily practices, we will examine four sets of norms: religious, legal, ceremonial, and
etiquette norms.
3.1. Religion
The common conception of the Middle Ages as an age of superstition is the result of
mistaken interpretations of bodily worship practices, which continue to be viewed as
demanding an exceptional degree of physical exertion. In fact, the use of the body in
medieval religious rites was subject to a strict system of usage rules (adoratio). Aside
from worship practices, the signified adoratio also encompassed other forms of tran-
scendental communication, such as those between sacred rulers (i.e. emperors) and
their subjects, between artists and patrons, but also between lover and beloved. Eccle-
siastic rituals, however, drew on the reverse of the adoratio in the form of expressions of
humility.
In the context of physical worship rituals, the signifiers were strictly divided into
prostrations (prostrationes), genuflections (genuflexiones), and bows (inclinationes). In
the context of antiquity as well as in the medieval tradition, the prostration was most
often understood as an extended genuflection. This is particularly true of the position
known as genuflexio proclivis, in which the head of the kneeling subject touches the
ground. Humbert de Romans (early thirteenth century) treats the prostratio as identical
to the genuflexio proclivis. Petrus Cantor (1197) also describes the prostration under the
title genuflexiones (see Schmitt 1992: 286). We can infer that the lying body position was
viewed as an amplification of the genuflection in liturgical ceremony. In my classifica-
tion, the term “prostration” only refers to the prostrate position. It is distinct from
the genuflection with its various semantic connotations (see Humbert, early thirteenth
century: 167).
Even though the classifications of prayer postures conducted in the Middle Ages
were based on the knowledge of rituals inherited from antiquity, significant differences
from the latter occurred as well. As a pose whose function was restricted to sacral com-
munication, the prostration symbolized primarily an exaggerated fear of God. As the
religious prostratio venia, this pose was widely used in the Middle Ages. With regard
to the suitability of various body positions for prayer, Petrus Cantor (1197) deemed
“most excellent,” “most serious,” and “most useful” those types of prayer in which
“chest, stomach, arms, knees, upper thighs and toes touch the earth” (“Inter omnes
autem modos orandi ist est quasi melior et fere utilior: iacere in solo, ita quod os et pec-
tus et venter et brachia et genua nec non et crura atque digiti contingant terram” – see
Petrus Cantor 1197: 233; see also Trexler 1987: 190–233; Schmitt 1992: 290). In the
beginning of the thirteenth century, Humbert de Romans also included the full prostra-
tion in his classification of different prayer types. However, he condemned the habit of
“certain laymen” of extending their arms in the form of the cross of Christ and kissing the
ground. The rejection may be explained by the association of kissing the ground with
pagan practices of earth worship. The prayer instructions of Humbert de Romans
and Petrus Cantor show that the liturgical forms of prayer in Catholic prayer practices
of the twelfth and thirteenth centuries were treated analogously to rhetorical figurae.
Similarly to rhetoric and the art of theatre, the impression of true versus false (e.g.
feigned) holiness (ostentatio sanctitatis) depended on the assumption of a certain
body posture (eloquentia corporis). Lying prostrate was interpreted as an extreme
expression of devoutness in Catholic worship (albeit in a positive sense).
Aside from their concern with technique, twelfth-century prayer manuals consis-
tently distinguish between the space of private and the space of public prayer (Trexler
1987: 43). In this regard, Petrus Cantor refers to certain contemporaries who were
“ashamed to pray lying” in public (“[…] dicit […] verecundor orare […] in terra
prostrates”, Petrus Cantor 1197: 838–839). Moreover, the religious semantics of the
prostrate position is played off against the use of this technique in court ritual:
individuals who are accustomed to falling prostrate before a tyrant ought instead to
prostrate themselves before God (“Itaque cum prosternas te ante pede alicuius tiranni
pro modica pena fugienda, potius teneris te proicere ante deum […]”, Petrus Cantor
1197: 799–802; see also Trexler 1987: 47). Trexler notes: “the opposition of knights to
kneeling and especially prostration is today and was even then cited as a cultural char-
acteristic of the medieval West” (Trexler 1987: 47). The symbolism of prayer techniques was used to overcome the conflicting ethical attitudes of the armed nobility on the one hand and of the monastic orders on the other.
Beginning in the late Middle Ages, genuflection with folded hands gained promi-
nence as a competing and more strictly regimented posture during public prayer. In
the West, the symbolically generalizing semantics of worship (adoratio) generally
applied to genuflexio, and not only to the prostrate position (Bolkestein 1929: 31). In
the context of worship, the genuflection was a well-documented phenomenon through-
out all of Europe since antiquity. In addition to the prostration, it can be considered
the basic element of the genuflecting adoratio technique. It must be noted, however,
that Greek “Gods of state,” in contrast to later depictions of Christ, were generally
not worshipped by genuflection (Bolkestein 1929: 29). The motif of kneeling that ap-
pears on ancient Greek vases is associated primarily with female figures. Kneeling
girls essentially rested on their heels without inclining their upper body forward (Walter
1910: 244).
The genuflection only gradually established itself in the context of ancient Roman
courtly ceremony. Only in the third century did the Christian Church recognize the gen-
uflection before the Roman emperor as a valid ceremony, which in turn led the imperial
state to legitimize genuflection in Christian prayer practices (Alföldi 1935: 77). From
then on, the practice of genuflection was preserved in particular by those who had at-
tacked it most vigorously during confrontations with Roman emperors: the Christian
clergy. This was achieved by means of a reinterpretation of a New Testament passage
according to which the Three Holy Kings had worshipped the Infant Jesus on their
knees as both God and King. Bishops and priests in particular came to feel included
in both state and church hierarchies by the adoratio practice. The history of medieval
rule saw many periods during which the pope genuflected before the emperor, or,
much more frequently, the emperor genuflected before the pope. Thus Pope Leo III
performed the adoratio before Charles the Great (et post laudes ab apostolico more
antiquorum principium adoratus est – Ann. regni Franc. a. 801). It remains unclear
whether the adoratio in this context must be interpreted as full prostration, genuflec-
tion, or bowing. In view of the Roman adoratio practices known to us, the use of the
prostration (prostratio venia) by the pope before the emperor is rather unlikely. Bowing
was a common greeting ritual among free citizens and was thus less appropriate to the
context of the adoratio. Genuflection thus appears as the most likely option. Since the
coronation of Charles the Great, the proskynesis of the pope before the emperor had
not been repeated (see Folz 1964: 175; Kelly 1986: 98). In the late Middle Ages, it
was the emperor, however, who was expected to kiss the pope’s feet during the corona-
tion ceremony. The year 1209 even saw the introduction of a double foot kiss whenever
the pope himself performed the coronation (Wirth 1963: 176; Hoffmann 1990: 44–46).
Prayer and acts of penitence involving repeated prostrations (metanoeae) became
almost completely obsolete in Western Catholicism over the course of the twentieth
century. By contrast, they continue to be practiced to this day on Mount Athos and
among the Old Believers of the Russian Orthodox Church (Baumgartner 1967: 225). In
prayer instructions for Russian Old Believers edited in the late nineteenth and early
twentieth century, for instance, prostrations are prescribed for the beginning and end
of worship services. Eastern Orthodox instructions for prayer correspondingly give
the following suggestions: “When you are to make prostrations do not beat your head
against the floor of the church or home. Simply bend your knees and lower your head,
but do not strike the ground with it. Move both of your hands away from your heart
and properly place them on the floor. Do not extend them like two axes […]” (Syn
cerkovnyj 1894: 25–27; Robson 1995: 49ff.).
3.2. Law
The medieval act of jurisdiction comprised both acoustic and gestural aspects of the
speech act and was interpreted as a physical act. Since antiquity, the proper connection
of physical and verbal expressions in the context of legal eloquence was determined by
the rhetorical teaching of actio/pronuntiatio. Instructions on the use of gestures in the
context of medieval law can be inferred from the Sachsenspiegel, for instance. The illus-
trations in this thirteenth-century manuscript served to depict instructive patterns of
behavior within the courtroom. Thus, for instance, one illustration depicts the Gaugraf,
who invites the Bauermeister (or head of the township, indicated by his straw hat) to
speak through a polite hand gesture (his right hand describes an offering movement
in the direction of a group of four men). Three Landsassen stand behind the Bauerme-
ister, their hands lowered with palms facing inward in a gesture of reverence and patient
expectation. A fourth Landsasse, not quite conscious of his behavior, expresses a neg-
ative position (see Die Sprache der Hände: 10). Illustrations such as these served to
familiarize plaintiffs with the nature of legal procedures.
Although late Roman rhetoric in its most widespread form generally permitted and
approved of the use of hand gestures in the pronuntiatio, it also made allowances for the
alternative to the speaking hand. In this context, it was noted that the toga of the
ancient Greeks did not feature a pad. Judges and lawyers of the time, whose arms
were kept inside the garment, had to resort to another form of gestures. While the cloth-
ing of the “old orators” concealed both hands, in later garments one hand remained
free for gesticulation. In Quintilian’s Institutionis Oratoriae Libri XII (see there 11, 3:
84–124ff.) restrictions on gestures concern primarily the right but not the left hand
(manus sinistra), most likely under the assumption that right-handers held the toga
with their left hand (Maier-Eichhorn 1989: 114ff.).
The tradition of medieval rhetoric preserved the “hand-out-” and “hand-in-” posi-
tions as placeholders for two contradictory meanings, each of which could claim a cer-
tain validity. Whereas the first position was viewed as the embodiment of action, the
second signified a demonstrative renunciation of action and was thus a symbol of
sober-minded self-restraint (Fleckner 1997: 124ff.).
The tendency of medieval judges to conceal their hands in their garments can be
gleaned from the works of numerous Baroque authors. In his instruction on the use
of rhetorical gestures (Chironomia 1644), for instance, John Bulwer indicated that
the hand that was concealed and kept inside the garments was an expression of “mod-
esty” and “frugal pronunciation.” This interest in early Christian ethics of modesty
found its expression in the increased significance of Hellenistic iconography. The statue
of the Athenian orator Aeschines (c. 389–314 BC), who is depicted with his hand resting inside his garment, was considered an authoritative example. In his speech against Timarchos, Aeschines invoked the ideal of the public orators of past times, who, like Solon, had addressed the people with their hands tucked into the folds of their garments. “They
were too modest,” Aeschines stressed, “to remove their hand from their garments
during their speeches” (Aeschines, Against Timarchos: 25).
Not just the act of jurisdiction but every speech act was interpreted as a bodily act in
the context of oral culture. The application of legal sanctions rested on a complex system of relationships through which the harm inflicted on the individual body was calibrated to the degree of damage done. High treason was thus punishable by beheading and
theft by cutting off a hand. Even the invalidation of a speech act had a physical dimen-
sion. It occurred on a non-verbal level, by means of forced exile, legal censorship of
speech, and not least through the removal of one’s tongue.
Up until the late eighteenth century, legal sources did not distinguish between bodily and speech acts in their treatment of cases that led to the tearing or cutting out of the
tongue (see Keller [1921] 198; Quanter 1901: 175–178; Schuhmann 1964: 39). Thus, the
“illocutionary” power and the “perlocutionary” injury caused by “profane” statements
were suppressed directly on the level of the body. Because the word of God was held to foster community, the existence of the community was considered threatened
by the presence of blasphemy. The increasing standardization of word and sentence
semantics within the context of Franciscan and Dominican education helped develop
the preconditions for formalizing legal codes, including laws about speech.
The intensification of the punishment for blasphemy (gotzschwür) is highly characteristic of the late Middle Ages and the eve of the Reformation (see Schuhmann
1964: 37). Norm violations due to abusive expressions or curse words directed against
God and the saints, or deeds or gestures directed toward holy objects, were punishable
by pillory or removal of the tongue, depending on the corresponding legal measure.
Furthermore, the cutting out of the tongue was frequently used in the case of other
“foul” expressions, such as perjury, and occasionally because of false allegations, libel,
and extortion. The tongue was either removed from the throat in its entirety or, as in
the sixteenth century in Memmingen, shortened by the width of two thumbs: “Im Jahr
1368 in vigilia Magni wart Heinrich der Schwertfurb […] gestellt in den branger und dar-
nach wart im die zung uz dem hals geschnitten und die stat ewiglich verboten […]” (“In
the year 1368 in vigilia Magni Heinrich der Schwertfurb was placed onto the pillory,
whereafter his tongue was cut from his throat, and he was forever banished from the
town.” Buff 1877: 166).
The unification of punitive measures demonstrated, for instance, by the Carolina
(1532), or Peinliche Halsgerichtsordnung, of Emperor Charles V, encompassed forms
of punishment for “criminal” acts of speech that were now increasingly treated in the
context of national law: “Abschneidung der zungen Offentlich in branger [Pranger]
oder halßeisen gestellt, die zungen abgeschnitten, und darzu biß auf kundtlich erlau-
bung der oberhandt [Obrigkeit] auß dem landt verwisen werden soll.” (“Cutting of tongues: [they are] to be publicly placed on the pillory or into the iron collar, their tongues cut out, and until public permission by the authorities expelled from the land.” Cited after Carolina § 198; see Quanter 1901: 176; Hensel 1979: 74).
The transformation of the legal order of Modern Europe was reflected in the increas-
ing value ascribed to conversational rhetoric (Linke 1996: 129). The process by which
social life became increasingly codified created the background conditions that trans-
formed verbal media into the only possible means of communication and conflict reso-
lution. Whereas the stability of the medieval social order largely rested on acquired
body techniques, modernity put greater emphasis on verbal skill, which increasingly
dominated society as a technique of information exchange (Fauser 1991: 123). The
art of conversation, differentiated from both law and ceremony, became a trans-societal
value over the course of the eighteenth century and was even accorded the role of the
primary medium in social communication. Christian Thomasius encapsulated this trans-
formation in a single sentence: “The basis of all societies is conversation” (Thomasius
1710: 108).
Close attention to the historical relationship between conversational practices and
punishment by removing the tongue, with a particular stress on the interrelation
between rhetorical and brute-force mechanisms of blocking speech, shows that the social rhetoric of early modern Europe integrated into its practices methods of communication aimed at cancelling out speech acts. These methods cancelled out speech acts primarily on the “rhetic” level, by drawing on rhetorical patterns
such as the following: “Let’s change the subject,” “Let’s not talk about that now”
(Freidhof 1992: 22b).
Early modern conversation manuals propagated polite expressions that, for instance,
(i) contained information about the receptivity of the listener (“I don’t want to hear
anything about that”); or
(ii) limited the amount of information communicated (“Enough of it,” “Let’s drop the
subject”); or
(iii) led the conversation in a new direction (“Let’s talk about something else
instead”); or
(iv) imputed to the speaker a pragmatic, or “illocutionary,” objective not intended by
his conversational move (“You must be joking,” “You can’t be serious,” “What a
question”).
One dialogue in a bilingual conversation manual reads as follows: “Sir, you’ve put on
considerable weight since your marriage” – “Surely you are joking!” The form of
such dialogues reflects the actual everyday demand for non-violent prohibitions on
speech and corrections of speech intentions in frequently recurring conversational
situations. In line with Radtke’s observation, we must note that most conversation man-
uals before the early seventeenth century were rather rudimentary. Only in the begin-
ning of the seventeenth century did notable foreign language grammar books begin to
be published more frequently. Both the didactic literature used in foreign-language teaching and the basic grammars remained rather plain in design in the seventeenth century.
These foundations were essentially adopted in the eighteenth century and the basic
approaches were retained (Radtke 1994: 48–56).
3.3. Ceremony
Ceremony served to symbolically depict and thus stabilize social hierarchies. On the
level of signifiers, the ceremonial normativization of behavior focused on bodily ( proxe-
mic) distance and asymmetric body techniques such as the bow and the genuflection.
so that their face remained turned toward the emperor (De Ceremoniis I: 32; II: 56;
Vogt II: 160).
Alföldi’s 1935 [1970] studies on the genesis of court ceremony in the West have
shown without doubt that rituals such as observing a ceremonial distance to the ruler
and performing asymmetrical gestures of obeisance (such as genuflection) slowly
emerged within the context of late Roman imperial ceremony. It is symptomatic that
the introduction of proskynesis at the court of Diocletian (284–305 AD) occurred
against the background of a far-reaching codification of administrative and everyday
life. Aside from the submissive genuflection, such standardizing measures included
the introduction of a uniform trade tax, the subdivision of sovereign territory, the intro-
duction of price regulation, the standardization of coinage, and finally the far-reaching
bureaucratization of the administrative apparatus. “Principate” was replaced by “Dom-
inate” and henceforth the Emperor expected to be honored as Dominus rather than as
the Princeps among equals.
It was as late as the third century AD that genuflection gained entrance into the sys-
tem of government of late antiquity, and later that of the Middle Ages. Diocletian, who
ruled from 284 to 305 AD, may have been the first emperor to firmly incorporate it as
part of court ritual. Quite symptomatically, one of the most important preconditions for
this was the decline of the greeting rituals exchanged between the emperor and his se-
nators, which involved a mutual kiss and emphasized equality. As early as under Tiberius (14–37 AD), the daily early-morning receptions were replaced by the simultaneous arrival of all council members, apparently for administrative reasons. During
this transition, the “day-to-day kisses” (cotidiana oscula) were also abolished as the
epitome of salutatio, which had the effect of increasingly subjecting greetings to the
control of state authorities (Alföldi 1935: 41).
Henceforth the genuflection (genuflexio) became the epitome of imperial ceremony.
Emperor and king kneeled during the unction, vassals kneeled during commendation
and obeisance, and knights kneeled while being knighted (see Amira 1943). Aside
from the common distinction between falling on one knee (simplex) and on both
knees (duplex), the question as to whether one should kneel with erect or lowered
upper body continued to lead to disagreements. In the first version (genuflexio recta),
the upper body was aligned orthogonally to the axis of the knee. In the second variant
(genuflexio proclivis) the upper body of the kneeling subject was inclined forward toward
the ground. The resulting inclined angle was often so large that the head touched the
ground, as was the case in the Chinese kau-tau and most likely in the Russian “bowing
down” (čelobitie) (see Sittl 1890: 160). Western rules for the knave and page service,
for instance, prescribed kneeling on one knee before every lord, “whoever he may be”:
“Hold up youre hede, and knele but on oone kne // To youre sovereyne or lorde, whedir
he be” (The Babees Book V. 62ff., cited after Mahr 1911: 18). In contrast to Western
courts, at the Moscow court of the seventeenth century it was customary for the head
to touch the ground during the adoration of the Tsar. (According to reports of the
time, Muscovite diplomats during a reception on October 10 1654 “each time bowed with
their foreheads down to the ground”, “sich idesmalhs mit der stirn gantz auff die erden gen-
eigt.” This form of reverence was accordingly titled “reverence in the Muscovite manner”,
“die reverentz auff die moscowitische arth”).
While the genuflection became a part of papal and imperial ritual in the early Middle
Ages, use of the foot kiss generally remained restricted to acts of homage before the
pope (Schreiner 1990: 118). Due to the resistance of the West European knighthood
and West European kings, the foot kiss was rarely used as a ritual of paying obeisance
to the ruler in the West Roman Empire. In the few extant references to it, this gesture
only appears as an act of submission: thus the knights of Milan kissed the feet of
Barbarossa in 1182. In the East Roman Empire, however, the foot kiss was practiced
regularly during acts of obeisance to the emperor and the patriarch. It is likely that
the Roman popes adopted the foot kiss based on Byzantine court ritual, since, begin-
ning in the seventh century, the deacons who read the gospels during the papal mass
approached the throne after reading in order to kiss the pope’s feet. In the context
of papal elections, too, both ecclesiastic and secular laymen kissed the feet of the
newly elected head of the church.
According to Schreiner, the earliest documents that prove beyond a doubt that even
high-ranking secular persons in the West kissed the Pope’s feet date to the ninth cen-
tury. However, the foot kiss was not promoted to an obligatory part of imperial crown-
ing ceremonial until the beginning of the twelfth century. For instance, during the
procession leading up to the coronation on February 12th 1111, Heinrich V kissed
the feet of Paschalis II, who was waiting before the gates of St. Peter. Papal demands
for the foot kiss ritual were often justified by references to biblical role models, but pri-
marily through Christological associations. (See Ps 3, 11: Request to kiss God’s feet;
Lk 7, 38; 45: Mary Magdalene kissing the feet of Christ; Mt 28, 9: Proskynesis of the
women before the risen Christ; Apg 10, 25: the Centurion Cornelius kissing the feet
of St. Peter, Schreiner 1989: 1063.)
Aside from the foot kiss, a number of other foot-centered acts of obeisance hovered
near the boundary of the allowed and the reasonable. These, for instance, included the
washing of feet and the so-called “strator service” performed at the stirrups. In the
twelfth and thirteenth centuries, a rivalry arose between the papal and princely powers in
the West over the symbolic territory of the corresponding rituals of obeisance. As
a consequence, since the Middle Ages foot-related obeisance gestures corresponded
far more closely than the genuflection to a “border of degradation,” the transgression
of which could cause irreversible damage to the self-perception of those involved.
Such gestures presumably presupposed a well-calculated mutual understanding between
the environment and those affected and were mandated primarily in the context of
national law and rituals of legal ratification.
In all likelihood, the manner of execution of the foot kiss, the washing of the feet,
and the strator service was never completely spontaneous. Most likely, the “script”
was always agreed upon in advance by both communicating parties (Ostrogorsky
1935: 200). But even a provisional agreement could hardly ensure the full predictability
of the ritual in cases where the subject in the lower position used the opportunity to diverge
into a ritual of evasion. For example, King Frederick Barbarossa’s refusal to provide
Pope Hadrian IV his services as stableman (1155) was almost certainly a diplomatic
trap for which the pope was not prepared. He responded to the king’s unwillingness
to hold the stirrups by refusing him the kiss of peace. The damaged frame of communi-
cation had to be repaired by means of a corrective exchange, until the King finally as-
sented to perform the strator service in the form of a simple ceremony. The ritual of
kissing the feet, which consisted of a ceremonial approach on horseback, followed by
dismounting, kneeling, and kissing the feet, therefore generally presupposed the “script-safe”
behavior of the involved parties. Apparently, decisions about details of staging were less
the responsibility of the emperor or the pope than of the masters of ceremony, who,
among other things, were to ensure an illusion of spontaneity and veracity. Such is the
impression conjured by the description of a foot kiss ritual in 1177, performed by the
newly crowned King Frederick I Barbarossa before Pope Alexander III:
Frederick approached the Pope, removed his red robe, threw himself onto the ground before
Alexander with arms stretched apart, kissed his feet [“like the first of the apostles” – deos-
culatus eius tanquam primi apostolorum pedibus] and then his knees. Touched to tears, the
pope rose slightly, helped the emperor stand up, embraced his head with both hands, gave
him the kiss of peace [pacis osculum] and bade him sit by his side. (Althoff 1997: 229–258)
As this stylized eyewitness report shows, the aim of the foot kiss was less to indicate real
positions of secular and ecclesiastical power than to elicit assertions about the legiti-
macy of a ruler’s sovereignty from the audience. Ultimately, this approach served to
reveal the channels of communication between primary and secondary social elites.
Correspondingly, the foot kiss offered to the pope by the emperor was justified in var-
ious ways by contemporary opinion: on the one hand, it was viewed as a simple cere-
mony, on the other, as one of the acts of reverence befitting the first apostles, and
thirdly, as an act of obeisance due only to God. Explanations for the rationale behind
Barbarossa’s gesture vary from source to source. According to Bosso, the emperor per-
formed the foot kiss “according to the model of the first of the apostles,” i.e. Saint Peter.
According to Romuald, the gesture was directed exclusively at God (Deum in Alexan-
dro venerans). According to the annals of Pisa, Barbarossa offered due reverence to his
spiritual father (li fece debita reverenzia come a Padre spirituale) (see Hack 1999: 654).
Based on the thesis that in most documented cases the function of the foot kiss ritual
was ostentatious and directed toward the medieval urban public, we can assume that
the termination of this ritual (with the last imperial coronation by a pope, that of Charles V
in Bologna, 1530) was sanctioned less by its participants than by the audience.
The intensive reformatory polemics against the kissing of the pope’s feet coincided
with a tendency to elevate the middle-class body and to reduce the physical strain of obei-
sance rituals. In his Tractatus de Ecclesia, Jan Hus characterized the foot kiss demanded
by the popes as an expression of anti-Christian conceit. The heresies of which Hus was
accused during the Council of Constance (1414–1418) also included his notion that the
pope merited neither the title sanctissimus nor the foot kiss. In the year 1468, Emperor
Frederick III refused to kiss the feet of Pope Paul II because otherwise he “might not
have preserved the majesty of the empire.” Furthermore, Martin Luther’s 1520 treatise
“to the Christian nobility of the German nation” condemned the “Fußkussen des
Bapsts” (“kissing of the pope’s feet”) as “Endchristlich exempel” (“an anti-Christian
example”) (see Hus 1413: 98; Schreiner 1990: 123–124ff.). The tendency toward obscur-
ing ostentatious gestures, which also manifested itself in secret state policies and the
clothing habits of the Jesuits, suggested that the faith of both the Catholic Church
and secular power in iconic depictions of canonical devotional gestures had lessened
significantly toward the mid-sixteenth century.
3.4. Etiquette
In the Middle Ages, etiquette was applied in situations where the boundaries of differ-
ent social systems had been crossed. This was the case with migrants, such as couriers
and pilgrims, who travelled from land to land. The necessity of integrating such border
crossers into a foreign system of relationships led to a series of conversion mechanisms that
enabled a partial equalization of insiders and outsiders.
In the following, I suggest a distinction among three types of equalization: equalization
in the sense of leveling, equalization in the sense of ritual inclusion, and equalization in
the sense of identical means of exchange.
The first aspect, leveling, refers to a symbolic inversion of social hierarchies of
power. Most medieval leveling rituals, such as the washing of feet, occurred primarily
in monasteries, but spread from there into secular space in the form of Christological
rituals. A specific example of leveling in governmental greeting rituals is the exemption
of subjects from genuflection, i.e. by raising the supplicants from the ground (allevatio
supplicis). The increasing tendency toward social leveling in public bodily practices was
characterized by the desemanticization of greeting and parting phrases that stress sta-
tus. This is the case in the German servus (Latin ‘the slave’) and the Italian ciao,
which derives from schiavo (likewise ‘the slave’) (see Prati 1936: 240ff.).
The second aspect of the notion of equality encompasses the semantics of inclusion.
In most traditional cultures, the latter corresponds to a symbolic restoration of primor-
dial kinship relations, such as by ritual “fraternization.” In such cases, “equality” is
established by force of belonging to the same extended family or community of Chris-
tian believers. This process is exemplified by the fraternal kiss of peace, the so-called
osculum fidei et pacis, which was characteristic of medieval peace agreements. In socie-
ties where communicative accessibility was reserved for a few houses of nobility related
through marriage, the symbols of “primordial” relationship generally had legal status.
They therefore guaranteed the preservation of social boundaries (Voss 1987: 134–
136). The treatment of the excommunicated in the Middle Ages shows that the so-
called “brotherly kiss” was also considered a sign of inclusion. In addition to the
kiss of peace, excommunicated subjects were to be refused company at the table, polite
address, greeting words, as well as conversation (“In quinque maxime vitandi sunt
excommunicati, videlicet in mensa, oratione, salutatione osculo pacis et in colloquio”
Summa de Confessione des Petrus von Poitiers, around 1200, cited after Fuhrmann
1993: 118).
The third aspect of equalization implies the identity of the rules of the game and the
means of exchange, and thus presupposes the notion of communication as an exchange.
One example of this phenomenon is the English “handshake.”
Ritualized inclusion into the family was among the preferred forms of equalization
in the Middle Ages. Only in modern Europe did the unification of means of exchange
become of greater importance. Documents prove that in the Middle Ages the kiss of
peace and the fraternal kiss (osculum pacis) were exchanged between strangers (see
Strätz 1999: 1590ff.). But outside of the church, too, it was typical for “community mem-
bers to exchange the fraternal kiss after communal prayer and before the beginning of
the Eucharistic part of the mass” (Schreiner 1990: 101ff.). According to popular belief,
the ritual of the fraternal kiss can be traced to a rule that the Apostle Paul imparted
to the faithful: “Greet one another with the holy kiss [Salutate invicem in osculo
sancto]” (This is nearly a fixed expression in the letters of St. Paul: Rom. 16, 16; I. Cor.
16, 20; II. Cor. 13, 12; I. Thess. 5, 26, here: invicem fratres omnes) (see
Voss 1987: 139). It is also very likely, however, that the legal-protocolary symbolic
power with which this kiss was vested, not only within the normativized ecclesiastic space
of communication in the Middle Ages, resulted from the nature of pre-modern socie-
ties. In the latter, social boundaries were determined not by formal organizations, but
primarily by sensorily perceptible interactions in the context of exchanging gifts. One
can even assume with great likelihood that medieval princely law served as an essential
precondition for the spread of this ecclesiastic gesture. In the context of the fraternal
cheek kiss, we can refer to the law of the “family of kings,” according to which the mem-
bers of the ruling family were required to address each other using the brotherly “you”
(“Du”) and to kiss. Dölger indicates “that the kiss between kings was a well-known
institution in the Middle Ages” (Dölger 1976: 34–69). In the meeting of kings in
Ruodlieb, which largely reflects eleventh century ceremonial, according to Voss, the
use of the kiss as a greeting between rulers is also entirely taken for granted (Voss
1987: 142). A typical example of later exchanges was the meeting between Heinrich
VI and Philipp II in Milan (1191), where the German emperor received the French
ruler in osculo pacis (Voss 1987: 139). This form of greeting by kiss is also documented
in the context of East- and West-Franconian encounters, such as in the meetings
between Otto II and Lothar (in 980), who kissed simultaneously as family members
and as rulers (osculum sibi dederunt).
The reception by kiss and embrace became a firm component of knightly and diplo-
matic encounters in the late thirteenth century. In contemporary literature, the great
significance of “brotherliness” is expressed in greeting scenes of a highly emotional
nature. We must distinguish, however, between the greeting kiss exchanged while stand-
ing and the expressions of obeisance that took place between sovereign and vassal dur-
ing feudal transactions. Mentions of physical contact between vassal and sovereign
disappear from the records around the end of the sixteenth century. According to
Old French law, the vassal kneeled and placed his hands into the hands of the sovereign
(immixtio manuum), upon which the latter raised the vassal from his knees and kissed
him on the mouth (see Chénon 1924: 137). On the verbal level, ritualized formulas
affirmed the semantics of mutual trust during feudal transactions (see “Vers 1260, Le
Livre de jostice et de plet décrit le rite de l’hommage en ces termes: Cil qui requiert
doit joindre les mains et dire: Sire, ge deviens vostre home, etc […] Et li sires doit re-
spondre: Et ge vos recef à home, que ge foi vous porterai, comme à mon home, et vos en
bese en nom de foi” cited after Chénon 1924: 138). Such encounters involved not only
the cheek kiss, but kisses on eyes and lips as well. “Plus de c[ent] fois li baise // et la
bouche et le nez” (cited after Schultz 1889: 521). See also the same document in Heck-
endorn (1970: 3). For instance, after Hartmann von Aue’s Gawein and Iwein recognized
one another, they kissed each other on eyes, cheeks, and mouth a thousand times. See: “Si
underkusten tausentstunt // Ougen wangen unde munt” (Hartmann von Aue, Iwein,
V. 7503/4). On this, see also Heckendorn (1970: 3ff). See also the following commentary
by Schreiner: “The writers of Middle High German lyric and epic poetry profited
from the fact that ‘munt’ and ‘stunt’ rhymed. Apparently there was no shortage of
lovers who kissed the eyes, cheeks, and mouth of their beloved ‘tusent-stunt’, ‘dusent
stund’ or ‘tausend stunt, êtlicher stunt, an der stunt, bî der stunt’ and ‘zô der stunt’ ”
(Schreiner 1990: 104; see also Jones 1966: 195–210). In Wolfram von Eschenbach’s
Parzival, Gahmuret kisses his knights in greeting: “Dô kuste er die getriuwen […]”
(Wolfram von Eschenbach Parzival II: 99; V: 7).
Aside from male greeting rituals, medieval literature contains numerous scenes in
which the maid-of-honor responds to the greeting of the knight with a kiss on the
cheek. In the Middle Ages, this gesture was presumably not entirely free from the pos-
sible erotic connotations of courtly love (see the scene depicting the greeting of the
cousin of Gahmuret, during which the queen lovingly kisses the knight: “von der küne-
ginne rîch // si kuste den degen minneclîch”) (Wolfram von Eschenbach Parzival I: 47;
V: 28; 29; 30; 48; V: 1; 2). It must be added, however, that non-erotic symbols, such as
collective feasts and the giving of alms, could also occur in a ritualized relationship to
the female kiss. According to the notions of pre-modern economics, the female greeting
kiss was thus associated less with eroticism and more with ritualized acceptance into an
extended family.
In this function, the lady’s kiss was incorporated into the sign system of medieval ars
donandi: it symbolized the wealth and military might of the feudal lord and implicitly
referred to his reproductive ability. Most often, the lord invited his male guests to be
greeted with a kiss by his wife. Accordingly, medieval poetry tended to raise the female
kiss to the equivalent of payment for knightly virtues. In the context of the dominant
symbolic order of medieval society, the knight’s ascent along the social ladder was usu-
ally depicted symbolically by shortening the ceremonial distance vis-à-vis the sovereign.
In Wolfram von Eschenbach’s Parzival, Gahmuret, victorious against Razalic, requests
that the latter approach and kiss his wife: “gêt näher, mîn hêr Razalic: // ir sult küssen
mîn wîp” (Wolfram von Eschenbach Parzival I: 46; V: 1, 2; Heckendorn 1970: 3ff). In
the work of an anonymous Rhenish poet (around 1180), the queen steps out of the
women’s chambers (the bower) in order to approach the knights of King Rother and
to kiss them (“die vrouwe alsô lossam // kuste den hêren // do schiet her danne mit
êren // ûz van der kemenâte […] die konigîn ginc umbe // unde kuste besunder // alle
Rôtheres man” König Rother around 1180). And when Parzival meets his teacher’s
daughter, he is likewise permitted to approach in a rather symbolic manner, whereby
“he kissed her on the mouth” (“kuste er sie an den munt”) (Wolfram von Eschenbach
Parzival III: 176; V: 9; Heckendorn 1970: 3ff).
In the modern era, the integration of feudal economic units into the relationship
between absolutist courts and the dissolution of knightly familial ties led to a reevalua-
tion of fraternal greeting rights during court rituals. A tendency toward the semantic
relativization of the fraternal kiss in the ecclesiastic context developed at the same
time as this gesture was gradually displaced from the secular stage. The significance
of the kissing ritual, in effect considered inscrutable, was increasingly undermined
by questions about its meaning, i.e. by questions about the extent to which a holy
kiss (osculum sanctum) could be distinguished from a sinful kiss between lovers
(Hieronymus Epistolae 22: 16; see also Fuhrmann 1993: 117ff.; Sittl 1890: 79). The
kiss of peace was increasingly abandoned in the interest of preserving one’s image.
The early twelfth century saw the propagation of a rule according to which the kiss
of peace could be exchanged only within a given house of nobility. Such restrictions
on public exchanges of kisses on the mouth may also have been indirectly related to
the appearance of the so-called “kissing plaque” (osculatorium or instrumentum
pacis). It was now no longer necessary to exchange kisses; “symbolic mediacy was
replaced by a consecrated device” (Schreiner 1990: 101). This was a small plate made
of metal or marble that featured the image of Christ, which was to be kissed. The
osculatorium began to fulfill a symbolic function in church ritual by representing the
lips of the community members that were to be kissed.
Aside from this process of displacement, devotional literature of the early modern
period reveals an increasing tendency to differentiate among various meanings of the
kiss. For instance, Johannes von Paltz (died 1511) distinguishes between three forms
of the osculum corporale: a kiss that can be recommended (osculum commendabile),
a kiss that can be excused (osculum excusabile), and a kiss that must be treated with
disdain (osculum detestabile). The greeting kiss (osculum receptionis) belongs to the
recommended gestures, insofar as it appears justified by Old Testament examples.
The osculum cognationis is an excusable kiss, which takes place between mother and
son. Contemptible kisses include those which express affectation (simulatio), treachery
(dolositas), and sensual desire (libido) (Johannes von Paltz around 1500: 155–157; cited
after Schreiner 1990: 110).
In the early modern period, the handshake replaced the kiss of peace as a sign of egal-
itarian treatment. The increasing popularity of the handshake for initiating communica-
tion must be viewed in the context of the general expansion of early modern
communicative intercourse, where encounters with strangers and the process of initiat-
ing communication were increasingly normativized and formalized. This process of
normativization was apparently set off by the fact that individual persons – merchants
and ambassadors, migratory knights and religious missionaries – were increasingly lifted
from the context of primordial and ceremonial power hierarchies. Prior to a tourna-
ment, knights who did not know each other greeted one another by lifting their visor
and showing an unprotected right hand regardless of status and rank. The primary
focus of the event was on comparing their performance. Quite symptomatically, one
symbol of honor consisted in exchanging positions as well as weapons with the oppo-
nent. In a similar way, the use of the handshake as a greeting among merchants served
to affirm the equality of communicative strategies outside their range of application
within power hierarchies (Mahr 1911: 41).
While extending the right hand in greeting was primarily the privilege of feudal lords
and their progeny in the early Middle Ages, sixteenth century diplomatic protocols indi-
cate that this gesture, at least in England, had already become a component of diplo-
matic greeting rituals. Already in the work of Saxo Grammaticus, we read,
according to Kolb, of the encounter between Frederick I and Waldemar in Lübeck
(1181), which included an embrace, a kiss, and a handshake [“Siquidem inprimis eum
amplexu atque osculo decentissime veneratus, mox dextra honorabiliter apprehensa”]
(Kolb 1988: 99). Apparently, shaking the right hand was part and parcel of greeting ri-
tuals in encounters between rulers both in the early and late Middle Ages. In this
regard, Voss (1987) takes issue with Wielers (1959), who argues that the use of the
term dextras dare in the context of meetings between rulers could only be traced to
the eighth century (Voss 1987: 139ff.).
The frequency with which diplomatic correspondence since the late sixteenth cen-
tury mentions the hand greeting in the context of diplomatic meetings suggests that
emblematic gestures at the time became abstracted from their concrete semantics of
approach, affirmation, and peaceful intentions, and were increasingly formalized. As
a result of this formalization of diplomatic practice, offering one’s hand became a sim-
ple signal for initiating communication. In this form, it continues to be used regularly at
the beginning of conversations, regardless of the ranks of the interlocutors, and even in
cases where the conversation concerns a disagreement or conflict of interests.
4. References
Aeschines 1908. Aeschinis Orationes. Leipzig: Teubner. First published [4th century BC].
Alföldi, Andreas 1970. Die Monarchische Repräsentation im Römischen Kaiserreiche. Darmstadt,
Germany: Wissenschaftliche Buchgesellschaft. First published [1935].
Althoff, Gerd 1997. Spielregeln der Politik im Mittelalter. Kommunikation in Frieden und Fehde.
Darmstadt: Wissenschaftliche Buchgesellschaft.
Amira, Karl von 1905. Die Handgebärden in den Bilderhandschriften des Sachsenspiegels. Munich:
Akademie der Wissenschaft.
Bolkestein, Hendrik 1979. Theophrastos’ Charakter der Deisidaimonia als Religionsgeschichtliche
Urkunde. Religionsgeschichtliche Versuche und Vorarbeiten 21(2): 1–21. Giessen: Töpelmann.
First published [1929].
Buff, Adolf 1877. Verbrechen und Verbrecher zu Augsburg in der zweiten Hälfte des 14. Jahrhun-
derts. Zeitschrift des Historischen Vereins für Schwaben und Neuburg 4(2): 1–108.
Chénon, Emile 1924. Le rôle juridique de l’osculum dans l’ancien droit français. Mémoires de la
Société des Antiquitaires de France Serie 8, Volume 6, 124–155.
Dölger, Franz 1976. Die “Familie der Könige” im Mittelalter. In: Franz Dölger, Byzanz und die
Europäische Staatenwelt, 34–69. Darmstadt: Wissenschaftliche Buchgesellschaft.
Fauser, Markus 1991. Rhetorik und Umgang. Topik des Gesprächs im 18. Jahrhundert. In: Alain
Montandon (ed.), Über die Deutsche Höflichkeit: Entwicklung der Kommunikationsvorstellun-
gen in den Schriften über Umgangsformen in den Deutschsprachigen Ländern, 117–143. Char-
lottesville: University of Virginia Press.
Fleckner, Uwe 1997. Napoleons Hand in der Weste: Von der ethischen zur politischen Rhetorik
einer Geste. Daidalos 64: 122–129.
Folz, Robert 1964. Le Couronnement Impérial de Charlemagne: 25 Décembre 800. Paris: Gallimard.
Freidhof, Gerd 1992. Typen dialogischer Kohärenz und Illokutions-Blockade. Zeitschrift für Sla-
wistik 37(2): 215–230.
Fuhrmann, Horst 1993. Willkommen und Abschied. Über die Begrüßungs- und Abschiedsrituale
im Mittelalter. In: Wilfried Hartmann (ed.), Mittelalter. Annäherungen an eine Fremde Zeit,
111–139. Regensburg: Schriftenreihe der Universität Regensburg.
Hack, Achim Thomas 1999. Das Empfangszeremoniell bei Mittelalterlichen Papst-Kaiser-Treffen.
Vienna: Böhlau.
Heckendorn, Heinrich 1970. Wandel des Anstandes im Französischen und Deutschen Sprachgebiet.
Basel: Lang.
Hensel, Gerd 1979. Geschichte des Grauens. Deutscher Strafvollzug in 7 Jahrhunderten. Altendorf:
Lector-Verlag.
Herberstein, Sigmund 1855. Selbstbiographie Sigmunds Freiherrn von Herberstein. In: Theodor
von Karajan (ed.), Fontes Rerum Austriacarum, Abt. 1, Volume 1, 67–396. Vienna: Böhlau.
First published [1517].
Hoffmann, Thomas 1990. Knien vor dem Thron und Altar. Friedrich Nicolai und die Orthopädie
des aufrechten Ganges vor den Kirchenfürsten. In: Bernd J. Warneken (ed.), Der Aufrechte
Gang. Zur Symbolik einer Körperhaltung, 42–49. Tübingen: Tübinger Vereinigung für
Volkskunde.
Hus, Jan 1413. Tractatus de ecclesia. Edited by Samuel Harrison Thomson. Cambridge: Roger
Dymmok, 1956.
Jones, George Fenwick 1966. The kiss in Middle High German literature. Studia Neophilologica
38: 195–210.
Kämpfer, Engelbert 1827. Diarium Itineris ad Aulam Muscoviticam indeque Astracanum suscepti
Anno 1683. Edited by Friedrich Adelung. Saint Petersburg. First published [1683].
Keller, Albrecht 1921. Der Strafrichter in der Deutschen Geschichte. Bonn/Leipzig. Reprint
Hildesheim.
Kelly, John Norman Davidson 1986. The Oxford Dictionary of Popes. Oxford: Oxford University Press.
Treitinger, Otto 1969. Die Oströmische Kaiser- und Reichsidee nach ihrer Gestaltung im Höfischen
Zeremoniell. Vom Oströmischen Staats- und Reichsgedanken. Bad Homburg: Gentner.
Trexler, Richard 1987. The Christian at Prayer. An Illustrated Prayer Manual Attributed to Peter
the Chanter (d. 1197). Binghamton, NY: Medieval and Renaissance Texts and Studies.
Viller, Marcel, Ferdinand Cavallera, Joseph de Guibert, André Rayez and Charles Baumgartner
(eds.) 1967. Dictionnaire de Spiritualité Ascétique et Mystique. Doctrine et Histoire. Paris:
Beauchesne.
Voss, Ingrid 1987. Herrschertreffen im Frühen und Hohen Mittelalter. (Beihefte zum Archiv für
Kulturgeschichte 26.) Cologne: Böhlau.
Vogt, Albert (ed.) 1939–1967. De Ceremoniis, T. 1, Livre 1, chapitres 1–46(37). Paris: Les Belles Lettres.
Vogt, Albert (ed.) 1939–1967. De Ceremoniis, T. 1, Livre 1, chapitres 47(38)–92(83). Paris: Les
Belles Lettres.
Walter, Otto 1910. Kniende Adoranten auf attischen Reliefs. Jahreshefte des Österreichischen
Archäologischen Instituts in Wien 13: 229–244. Vienna: Rohrer.
Wielers, Margret 1959. Zwischenstaatliche Beziehungen im Frühen Mittelalter. Ph.D. dissertation.
Munich.
Wirth, Karl August 1963. Imperator Pedes Papae deosculator. Ein Beitrag zur Bildkunst des 16.
Jahrhunderts. In: Hans Martin Freiherr von Erffa and Elisabeth Herget (eds.), Festschrift für
Harald Keller zum 60. Geburtstag. Darmstadt: Roether.
Wolfram von Eschenbach 1952. Parzival. Edited by Karl Lachmann. Berlin: Walter de Gruyter.
First published [around 1200].
Abstract
The idea of gesture as a universal language, an inheritance from classical rhetoric, took
on new importance around the mid-16th century. Among the relevant factors were a
heightened emphasis on nonverbal media in the propaganda wars of the Reformation
and Counterreformation; associated reforms in the teaching of rhetoric that gave stron-
ger theoretical grounding to the affective use of voice and gesture; the rediscovery of
Aristotle’s Poetics (Halliwell 1986) and its catalyzing effect on contemporary rhetoric,
emphasizing affect and extending the reach of rhetorical theory to the nonverbal arts;
the addition of a dynamic physiognomics drawing on basic principles of psychology,
physiology and medicine, to the traditional physiognomics based on static anatomical structure.
1. Introductory
In the Renaissance, as in the Middle Ages, gesture and its depiction continued to serve
as a sort of universal language in religious painting, stained glass, sculpture, sacred
drama, and ritual (Barasch 1987; Baxandall 1972: 61, 1982: 234 note 34; Davidson
2001; Schmitt 1991), most effective amidst high illiteracy. In logic and grammar, sign
theory had been concerned mainly with arbitrary signs, but a theory of natural signs,
including signs of passions, had long been central to medicine (Wollock 1990: 17–22;
1997: 97–152; 2002: 243–249; 2012: 843). While gesture received pragmatic and usually
cursory treatment, if any, in late medieval rhetorical handbooks (Knox 1990: 102–109,
115–16; 1996: 379–387; Müller 1998: 42–45; Rehm 2002: 34), medieval logic and theol-
ogy had a rich tradition of sign theory, including natural, nonverbal signs, based on
Augustine’s De Doctrina Christiana Book II.1–8 (Burrow 2002: 1–4; Jackson 1969; Manetti
1993: 157–168, see 142–143) – still important to Bulwer in 1644 (Wollock 2011: 42–43).
Among classical works known in the Middle Ages, the ps.-Ciceronian Rhetorica ad
Herennium and mutilated texts of Cicero’s De Oratore and Quintilian’s Institutio Ora-
toria presented gesture as a universal language mirroring the movements of the soul
(Enders 1992, 2001). A complete Quintilian was discovered in 1416; Cicero’s De Oratore
and Orator, in 1421 (Rehm 2002: 33–36; G. Rossi 2003).
invented Aristotle’s Poetics” (Hathaway 1962: 6). Interpretation of this treatise fol-
lowed prevalent conceptions of classical rhetoric and the poetics of Horace (Vickers
1991), making new connections and giving new emphasis to old ones. Thus the Poetics
became the catalyst for a distinct group of philosophical problems touching on gesture.
Questions about opsis (i.e. “spectacle”, visual aspects of theatrical performance, includ-
ing actorial representation), hupokrisis (representation, acting), and psychagogia (liter-
ally “soul-leading”, or affective aspects of rhetoric) (Halliwell 1986: 337–343; Janko
1987: 153–154), were integrated with already-established debates on gesture as part
of hupokrisis (Latin actio) in rhetoric and ekphrasis (vivid visual description) in both
poetry and rhetoric.
Voice and gesture are voluntary, bodily actions expressing the emotions and inner
workings of the mind. In oratory, they comprise the art of delivery (performance), gen-
erally but not always used together. As an art, writes Aristotle (Rhetoric III.1), delivery
had been neglected in rhetoric; it developed in poetics as actors and rhapsodes replaced
the poets themselves as interpreters (Rhetoric 1403b33). Aristotle’s view of gesture is
decidedly critical. To the finest taste, delivery itself is vulgar (phortikos, Rhetoric III.1,
1404a1). At Poetics 61b27–29–62a12, he reports the opinion of some critics that
the use of gesture makes tragedy more vulgar than epic (Poetics 62a3–6; Davis 1992:
156; Janko 1987: 153). Although spectacle is a constituent part of tragedy and comedy,
it is the least artistic and the least connected with the poetic art: for the true ends
of tragedy can be achieved by simply reading the poetry aloud (Poetics 50b16–20,
62a11–13).
Renaissance commentators, e.g. Robortello (1548) and Castelvetro (1570), steeped
in the rhetorical tradition, disputed this view (Puttfarken 2005: 64, 65). Had not Cicero
and Quintilian praised the dignity and necessity of gesture (Dutsch 2002; Fögen 2009;
Hall 2004)? “Every motion of the mind, said Cicero in a famous passage (De Oratore
III), has its own face and voice and gesture”, and at a minimum, these expressions must
not be at odds with what is expressed. Did not Cicero hold the comedian Roscius in the
greatest admiration for his superlative mimicry (Duncan 2006: 174–175)? Had not Aris-
totle himself recommended that the poet write with the requirements of staging in
mind; that he plan a play around the key gestures (Poetics, 55a29–32; see Halliwell
1986: 89, Scott 1999)?
2.3.1. Jesuits
The Jesuits were the advance troops of the Counterreformation. As Jesuit rhetorician
Nicolas Caussin (1583–1651) explained in De Eloquentia Sacra et Humana (1626), soph-
istry is all splendid surface; but genuine oratory is an expression of deep and genuine
feeling cultivated by the speaker drawing on the natural sources of the passions and
conveying them to his audience. Thus in delivery, the expressive qualities of voice
and gesture must be genuine and perfectly suited to the content (Campbell 1993:
61–62). The Jesuits underscored this old teaching of Cicero and Quintilian by putting
heavy emphasis on the art of delivery (Rehm 2002: 36–37). In Book IX, Caussin treats
voice and articulation (including their defects), as well as gesture, in great detail. He
ends the treatment of gesture by referring readers to the Vacationes Autumnales
(1620) of the Jesuit Louis de Cressolles (1568–1634), a work entirely devoted to the
use of voice and gesture in oratory (Campbell 1993: 64–65; Fumaroli 1981). These
are only the two best-known of a whole library of Jesuit rhetorical texts that emphasize
delivery (Conley 1990: 152–157; Fumaroli 2002: 279–391; Knox 1990: 111–114).
Quintilian had sharply distinguished rhetorical gesture from actorial mimicry
(Müller 1998: 27–28, 31–42); but the Renaissance was moving toward a wider theory
of nonverbal expression (see Fumaroli 2002: 305, 317, 357–360; Müller 1998: 45–47l;
Percival 1999). Although there was practically no theoretical literature on acting as
such in the 16th or 17th century, Jesuit rhetorical delivery greatly influenced contempo-
rary acting (Golding 1986; Goodden 1986; Gros de Gasquet 2007; Niefanger 2011;
Zanlonghi 2002).
2.3.2. Ramists
Pierre de la Ramée (Petrus Ramus, 1515–1572), professor of rhetoric and philosophy at
Paris, an opponent of scholastic and Aristotelian dialectics, developed a method to reor-
ganize all fields of learning, principally through a reform of logic and rhetoric (Conley
1990: 127–133, 140–143). The Ramists were influential in Protestant countries and, for a
relatively short time (1575–1625), in their universities. Ramus and his associate Omer
Talon (Audomarus Talaeus, 1510–1562) reduced rhetoric to elocution (style, i.e. tropes
and figures of speech) and action (voice and gesture), discarding invention and dispo-
sition – on the grounds that these properly belonged to logic – as well as memory (incor-
porated under disposition). While elocution got overwhelming attention, a few, like
Schonerus (Lazarus Schöner, 1543–1607), were interested in action. Johannes Althusius
(1563–1638) went further: reasoning that elocution applied as much to written as to oral
composition, he removed action from rhetoric to ethics, treating it under “civil conver-
sation” (Althusius 1601: 103–119). Others, like Alsted and Keckermann, were Aristote-
lians who used Ramist techniques of organization. (On Ramist influence in the study
of gesture, see Knox 1990: 116–125.) Despite their great differences, both Jesuit and
Ramist methods had the effect of giving greater emphasis to the expressive-affective
aspects of rhetoric, including pronunciation and gesture (Koch 2008: 321, n.31). Of
course, Quintilian and Cicero were the main sources for both.
3. Giovanni Bonifacio
Giovanni Bonifacio (1547–1635), from a noble family at Rovigo, studied law and rhet-
oric at the University of Padua, the center of Poetics scholarship (Benzoni 1967, 1970).
One of his teachers was the important Aristotelian theorist Antonio Riccoboni (1541–
1599), with whom he kept in touch in later life (Frischer 1996: 81 n.38; Mazzoni 1998:
214, n.32). As a lawyer, Bonifacio was professionally concerned with the art of rhetoric;
but he was also involved with the theatre as playwright, critic, and actor, even writing a
discourse on tragedy (Padua, 1624). In 1616 he published L’arte de’ Cenni, a 600-page
compendium of gestures arranged by parts of the body from head to foot, dedicating it
to the Accademia Filarmonica of Verona, of which he was a member. He also belonged
to literary academies in Treviso, Venice and Padua.
A likely inspiration for Bonifacio was the famous and much reprinted Hieroglyphica
of Giovanni Pierio Valeriano Bolzani (1477–1558) (Valeriano 1556), which devotes one
section to the hieroglyphic interpretation of the parts of the body and their gestures
(Dorival 1971; Fumaroli 1981: 238; P. Rossi [1983] 2006: 78). Another important source
was certainly the Trattato dell’Arte della Pittura (1584) of the neoplatonic theorist Gian
Paolo Lomazzo (1538–1592), Book II of which is on “Actions and Gestures” (Aikema
1990: 104–105). Bonifacio emphasizes “that both poet and painter are capable of ren-
dering almost every emotion. The painter[’s] … task [was] to ‘portray the gestures
and movements, and thus the emotions of people’; his work should consequently be
understood by everyone without exception” (Aikema 1990: 105).
L’Arte de’ Cenni is not a handbook of rhetoric or acting technique or social behavior,
but a compendium for everyone interested in gesture – philosophers, critics, dramatists,
orators, poets, choreographers – and especially painters (Aikema 1990: 105; Gualandri
2001b: 395, 401; Popp 2007: 65–66; Puttfarken 2005: 6–67, 108–109). Bonifacio’s inten-
tion was to present a dynamic physiognomics, an art of discerning the inner state of the
soul from visible bodily signs, of great use for behaving with civility in all social situa-
tions. The art should also be of great value to merchants and explorers encountering
unknown peoples, since the affective message of a painting is as clear to Asians and
Africans as to Europeans (Knox 1996: 391–392, 395). Bonifacio finds gesture more sin-
cere than speech, of greater antiquity, dignity, naturalness, and universality. Though
truth may be simulated, he says (echoing Lomazzo), it is much easier to dissemble
with speech (Casella 1993: 335–340).
The arrangement, while systematic and compendious, is anatomical rather than func-
tional. This isolates the gestures from one another: there are no laws of combination, no
syntax. It is a lexicon without a grammar. As David Clement (1754) wrote, Bonifacio
does not so much teach us to speak by signs, as to know what ideas the ancients attached
to the parts of the body in their various movements. This lack of functional perspective
differentiates the work from both the rhetorical and the courtesy literature, which
describe gestures as instances of general rules of delivery or of social behavior (Casella
1993: 344).
4. Francis Bacon
Francis Bacon (1561–1626) discusses bodily expression in De Augmentis Scientiarum
IV.1 (the 1623 Latin translation and enlargement of The Advancement of Learning,
1605), and manual gesture at VI.1. These brief discussions would prove extremely influ-
ential, especially in the 18th century. Book IV is part of Bacon’s proposal for a new
science of humanity. At IV.1 he notes that while Aristotle in the Physiognomics pro-
vides a detailed treatment of the human body at rest, he ignores the body in motion,
i.e. gesture and expression. Yet this had long been discussed in the medical literature,
and the shift in physiognomics from an exclusive focus on structure to signs of transient
passions can already be seen in the opening chapters of Book II of Lomazzo’s Treatise
(English translation 1598), which discusses the concordance of body and soul on the
basis of the Galenic theory of the four humors (Gualandri 2001; Rykwert 1996: 92),
and in the De Humana Physiognomonia (1586) of Giambattista della Porta (1535–
1615), which devotes a chapter to the semiotics of gesture and expression, again
supported by humoral theory (Percival 1999: 16–17; Rykwert 1996: 40–45).
Bacon’s treatment of manual gesture near the beginning of Book VI (“The Doctrine
of Delivery”, covering logic, grammar and rhetoric) comes out of the logical tradition
(Aristotle, De Interp I.1) rather than the rhetorical (Wollock 2002: 232, 242–243).
Signs need not stand for words, they can stand directly for things, for “whatever may
be split into differences, sufficiently numerous for explaining the variety of notions, pro-
vided these differences are sensible, may be a means of conveying the thoughts from
man to man: for we find that nations of different languages, hold a commerce, in some
tolerable degree, by gestures” (see Knox 1990: 127, 130–132; Müller 1998: 52, 61).
To illustrate this, Bacon draws a parallel between gesture and hieroglyphic: while the
latter is permanent and the former transient, both resemble what they signify. This or-
dering of gesture with hieroglyphic recalls the renaissance fascination with emblemata
and imprese (inspired by Egyptian hieroglyphs, which were interpreted symbolically).
Bacon probably got the idea from Pierio’s Hieroglyphica, and possibly from another
work of Porta, the Ars Reminiscendi (1602), which connects hieroglyphics and gestures
with Ciceronian memory images, “those animated pictures which are recalled into the
imagination to represent a fact or a word” (P. Rossi 2006: 77; see 77–79, 107–109). In the
Jesuit ethnographic literature, Bacon had found Chinese ideographs described as a
“real character” which, rather than resembling what they signify, denote real relations.
Bacon believed that the perfection of such a system, symbolizing the verified “notes of
things”, could serve as a universal, philosophical language (Wollock 2002: 231–236).
Though it would require much labor to learn, the logical relations of the graphic sche-
mata would replace iconicity as an aid to memory (see his discussion of the ars memor-
ativa at DA V.5, the immediately preceding section; and Wollock 2002: 252, n.4). This
idea is complemented by the desideratum, immediately following, of a philosophical,
universal grammar based on a comprehensive study of all natural languages. British lan-
guage reformers would avidly pursue the real character (Wollock 2002: 239–240; 2011:
39–41).
In the scholarly literature on the real character, Bacon’s brief discussion of gesture is
usually seen as a mere introduction to his actual suggestion for a universal language and
advancement of science. The universality of gesture is that of primitive origins, related
to fable and ancient processes of imagination, memory and communication – what Vico
would later call the “poetic character”, the primal generator of both language and myth
(Bedani 1989: 35–43; Cantelli 1976: 49–63; Singer 1989; Wollock 2002: 232 at n.2, see
239–240). Others have argued, however, that the hieroglyphic principle is important
to Bacon’s own method (Stephens 1975, esp. 70–71,121–171; Altegoer 2000, esp. 82–83,
107–112).
5. John Bulwer
John Bulwer (1606–1656) was a London physician who wrote five books (1644a, b, 1648,
1649, 1650, 1653) and an unpublished manuscript (Wollock 1996: 16–19) on the body as
a means of communication. A self-styled Baconian, Bulwer was inspired by DA IV.1,
on the mutual influence of body and soul, a topic Aristotle had ignored; nor was it
Aristotle, as he further notes in the introduction to Chironomia (Bulwer 1644b:
24–26), who had developed “manual rhetorique” into an art, but rhetoricians them-
selves, along with painters and actors: the first that “collected these Rhetoricall motions
of the Hand into an Art (…) was surely Quintilian”. Bulwer also admired the Jesuits
who, inspired by Quintilian, expanded the role of gesture in oratory (Müller 1998: 46–47,
52, 53). He notes that Gerard Vossius (see Conley 1990: 159–162) reviews Quintilian’s
manual precepts; and disagrees with those – surely thinking of Althusius here – who rel-
egate gesture to ethics, because there is a distinction between actio moralis or civilis and
the action the Greeks call hupokrisis (i.e. delivery), “accom[m]odated to move the affec-
tions of the Auditours”, although such gestures “doe presuppose the Aethique precepts
and the lawes of civill conversation”. Talaeus (the first Ramist rhetorician) “prefers these
Canonicall gestures before the artifice of the Voyce”, but his commentator Claudius Minos
(Claude Mignault, 1536–1606) allows this only in communication between “Nations of
divers tongues”. Lazarus Schöner, a commentator on Talaeus, expresses a wish for
“Types and Chirograms, whereby this Art might be better illustrated then by words”,
which Bulwer has “here attempted to supply…” Indeed the Chironomia is the first
book to contain pictorial illustrations of the manual gestures of classical oratory.
Calling Bartholomaeus Keckermann (1572–1609, see Conley 1990: 157–159) “no
better than a precisi[a]n in Rhetorique” for giving precedence to the voice over gesture
even while admitting that “the Jesuites (known to be the greatest proficients in
Rhetorique of our times) instruct their disciples” in gesture as an art, Bulwer remarks
“how wonderfully they have improved and polished this kind of ancient Learning”,
which “appeares sufficiently by the Labours of three eminent in this facultie”, i.e.,
[Jean] Voel, Cressolles, and Caussin. Despite Conley’s observation (1990: 157) that Jesuit
rhetoric had little influence on English forensic oratory, the ceremonial style of worship
of the Church of England under Archbishop William Laud (1573–1645), which Bulwer
fully supported, emphasized the body and the senses in worship and helped define
what has often been called (originally by its detractors in the mid-19th century), the
“Laudian counterreformation” (see Parry 2006: 190–191; Wollock 2011: 68–72; 1996: 8–11).
While Chirologia/Chironomia, though limited to the hands, resembles Bonifacio’s
Cenni in many ways, there is no evidence that Bulwer knew the work or that it was
known in England at all. The similarities can be accounted for by English developments
that parallel the continental ones discussed above. In this connection, the influence
of Francis Junius the younger (1589–1677) has gone virtually unnoticed. Junius, of
Huguenot ancestry, born in Heidelberg, raised at Leiden, friend of Grotius and
brother-in-law of Gerard Vossius, was a paragon of Caroline intellectual culture. He
lived in England from 1620 to 1642 as librarian to the Earl of Arundel, and on the
fall of Charles I joined the earl in exile in the Netherlands. Junius’s De Pictura Veterum
(1637, revised English edition, The Painting of the Ancients 1638) is a classical source-
book on gesture and expression to facilitate the construction of a classical theory
of painting. Like Bulwer, Junius was also responding to the attacks of Puritan icono-
clasts on painting, theatre, and ceremonial worship (Wollock 2011: 68–72). He drew
on Sidney’s Arcadia as well as his Defense of Poesie, the first work in English to
show the influence of Aristotle’s Poetics (Dundas 2007). Junius’s work must have
been of great use to Bulwer, who writes in the preface to Chironomia:
I never met with any Rhetorician or other, that had picturd out one of these Rhetoricall
expressions of the Hands and fingers; or (…) any Philologer that could exactly satisfie
me in the ancient Rhetoricall postures of Quintilian. Franciscus Junius in his late Transla-
tion of his Pictura veterum, having given the best proofe of his skill in such Antiquities, by a
verball explanation thereof. (Bulwer 1644b: 26)
Like Lomazzo and Bonifacio, Junius wrote his De Pictura Veterum because, in Bulwer’s
words, the “Painter, or Carver, or Plastique” needs to know “the naturall and artificiall
properties of the Hand … for as the History [i.e. narrative] runnes and ascribes pas-
sions to the Hand, gestures and motions must come in with their accommodation”
(Bulwer 1644b: 58). Just as Bonifacio had been influenced by Lomazzo, so was Junius
(Hard 1951: 238–239, n.8); and, so it would seem, was Bulwer himself. A single garbled
reference to “Palomatius” [sic] “in proport.” (Bulwer 1644b: 100) seems second-hand,
but the rare word “motist” is more telling: this was Haydocke’s rendering of Lomazzo’s
motista (Oxford English Dictionary: “A person skilled in depicting or describing move-
ment”): the earliest recorded use in English. A dedicatory verse to Chirologia addresses
Bulwer as “motistarum clarissimo” (Bulwer 1644a: unnumb.); the phrase “cunning mo-
tist” occurs in William Diconson’s dedicatory verse to the same work (Bulwer 1644a:
unnumb.) and to Pathomyotomia (Bulwer 1649: end matter); once in the text of
Chirologia (Bulwer 1644a: 172), twice in Chironomia (Bulwer 1644b: 24, 58), and
twice in Philocophus (Bulwer 1648: dedication, unnumbered, 150). Haydocke/Lomazzo
was one of the first books to advocate appreciation of painting as part of the education
of an English gentleman (235–236), which links it with the Inns of Court culture in
which Bulwer flourished.
To De Augmentis IV.1, Bulwer added the perspective of DA VI.1, bringing in Bacon’s
gesture/hieroglyph comparison and frequently citing Pierio’s Hieroglyphica. But Bulwer’s
attachment to the renaissance tradition of gesture as a universal language did not extend
to the artificial “real character”. Gesture is already “an universall character of reason (…)
generally understood and knowne by all Nations, among the formall differences of their
Tongue” (Bulwer 1644a: 3). For “[t]his naturall Language of the Hand (…) had the hap-
pinesse to escape the curse at the confusion of Babel” (Bulwer 1644a: 7). The first to take
Bacon’s hint about hieroglyphic and gesture, seeing the latter as the more fundamental of
the two, Bulwer foreshadows Giambattista Vico, William Warburton, and the French phi-
losophers influenced by Warburton (Rosenfeld 2004: 37–43) in their quest for the origins
of language – though none of them knew Bulwer’s work.
As we have seen from Lomazzo, Porta and Bonifacio, Bulwer was not the first to dis-
cuss the function of the humors and passions in the body’s signification of the soul.
There are some basic psychophysical observations in his 1644 publications (e.g. his idea
that gesture precedes utterance in the passage from thought to speech, which agrees with
Cressolles; see Golding 1986: 151), but it is mainly in his next two books that Bulwer de-
velops psychophysiology as the universal basis of gesture and expression. In Philocophus
(1648) he investigates the sensorimotor aspects of speech, a tradition also based on
humoral physiology and highly compatible with that on gesture (Wollock 2002, 2012). Bulwer
was the only physician among the renaissance gesture writers, and the only one to
discuss psychophysiology in detail according to the teachings on voluntary motion of
Aristotle and Galen (see Wollock 1990: 11–22; 1997: 113–150; 2012: 844–851). Thus,
although Bulwer’s Chirologia/Chironomia, like Bonifacio’s Arte de’ Cenni, may be
called a lexicon without a grammar, Bulwer’s lexicon (glossing all the gestures as
verbs) is backed up by a psychophysiology (see analysis in Hübler 2001: 350–361).
Bulwer did not explore the connections of speech and gesture except from the stand-
point of rhetorical delivery (nor did anyone else at the time; see McNeill 2005: 13–15),
but his review of the interrelation of the senses in the general theory of action provides
a groundwork (Wollock 2012; see Müller 1998: 77–82, discussing McNeill). It is also
striking that he concludes his anatomy of the expressive muscles of the head with the
musculature of the speech organs (Bulwer 1649: 228–241), which links directly to the
more extensive discussion of articulatory phonetics in Philocophus (Bulwer 1648: 1–54).
Indeed, the psychophysiological processes common to both gesture as universal language
and gesture as transient iconic sign point to the substantial links between them; and to
have provided such a framework for further investigation is Bulwer’s most important,
though long unappreciated, contribution to language theory.
It is an interesting question how, or even whether, Bulwer connects with the
Baconian language reformers (Wollock 2011). Recalling Bacon’s scheme of advance-
ment from natural to artificial real character is Knox’s suggestion (1990: 129–136;
1996: 395–396) that investigators of artificial, universal language saw the universal
language of gesture as a precedent. However, the universality of gesture lies in its nat-
ural, iconic, imitative character, the universality of human expression, the human psy-
chophysiology that supports it, and its natural relationship to speech. The artificial “real
character” prescinds from the imagination, emotions, and any ratio of bodily communi-
cation. Thus, in opting for the artificial real character over more iconic forms of com-
munication, the artificial language researchers abandoned body as first principle of
expression, in favor of a supposedly independent rationality of the external world
(Wollock 2002: 242–243, 249–252).
Whether or not the natural basis of language belongs to language itself, a satisfactory
theory of language needs to take strict account of it. While mental imagery and affect
remained central to rhetoric and aesthetics, both mechanistic psychology and formalist
linguistics, in a trend not so much anti-visual as anti-sensory in general, excluded them.
And indeed this fits both with the puritan iconoclasm, or fear of iconicity (except for
utilitarian use), and with Cartesian hyper-abstraction. Thus, Bulwer’s lack of influ-
ence in the three centuries after his death is due to cultural and intellectual changes that
would make England unreceptive to his approach, as well as to the fact that his work,
available only in English, was virtually unknown abroad.
1991: 263–265). Courtesy books frequently discussed gesture, and its value for polite
conversation was stressed by Bonifacio, who (as Harsdörffer emphasizes) was himself
a member of the Accademia Filarmonica of Verona. In another work, the Poetischer
Trichter (1648–1653), Harsdörffer provides valuable insights into the use of gesture
on the stage; he also discusses the core problems concerning the relations of poetry with the
other arts, with which we began this article (Niefanger 2011).
Bonifacio was unknown in England, and Bulwer was virtually unknown on the con-
tinent. In the mid-1650s, however, Harsdörffer, or his associates, had access to a copy of
Chirologia/Chironomia. (See excerpts in the Teutsche Secretarius, I, 1656: 707–718.)
Harsdörffer’s enthusiasm for Bonifacio represents, in the German context, the
beginning of a transition to the culture of wit, taste, and polite conversation (Agazzi
2000: 33–37; Bannasch 2007; Bonfatti 1983; Hübler 2001: 171–202; Knox 1990: 383–
384; Locher 1991; Müller 1998: 53–55). From this perspective, not only is gesture
mute speech, but speech and voice are audible gesture (see Hübler 2007: 147–170;
Wollock 1979, 1982: 195–223).
7. Conclusions
Renaissance ideas on gesture foreshadow the 18th century, and to some extent even
Romanticism (see Vico, Herder). Important for us today is not so much the literal
question whether gesture is a universal language, as the fact that in this period gesture
called attention to linguistic processes that are certainly universal: psychophysiological
processes common to verbal and nonverbal thought that were often overlooked,
downplayed, or even denied in 20th-century linguistics.
8. References
Agazzi, Elena 2000. Il Corpo Conteso: Rito e Gestualità nella Germania del Settecento. Milan: Jaca
Book.
Aikema, Bernard 1990. Pietro della Vecchia and the Heritage of the Renaissance in Venice. Flor-
ence: Istituto universitario olandese di storia dell’arte.
Altegoer, Diana B. 2000. Reckoning Words: Baconian Science and the Construction of Truth in
English Renaissance Culture. Madison, NJ: Fairleigh Dickinson University Press; London: As-
sociated University Presses.
Bacon, Francis 1857–1870. De augmentis scientiarum. In: James Spedding, Robert Leslie Ellis and
Douglas Denon Heath (eds.), The Works of Francis Bacon 1: 431–837. London: Longman. First
published [1623].
Badt, Kurt 1959. Raphael’s Incendio del Borgo. Journal of the Warburg and Courtauld Institutes 22
(1/2): 35–59.
Bannasch, Bettina 2007. Zwischen Jakobsleiter und Eselsbrücke: Das “Bildende Bild” im Emblem-
und Kinderbilderbuch des 17. und 18. Jahrhunderts. Göttingen, Germany: Vandenhoeck u.
Ruprecht.
Barasch, Moshe 1987. Giotto and the Language of Gesture. Cambridge Studies in the History of
Art. New York: Cambridge University Press.
Barasch, Moshe 1997. Language of art: Some historical notes. In: Moshe Barasch. Language of
Art: Studies in Interpretation, 10–26. New York: New York University Press.
Battafarano, Italo Michele 1990. Harsdörffers “Frauenzimmer Gesprächspiele”: Frühneuzeitliche
Zeichen- und (Sinn)Bildsprachen in Italien und Deutschland. In: Volker Kapp (ed.), Die
Sprache der Zeichen und Bilder: Rhetorik und Nonverbale Kommunikation in der Frühen Neu-
zeit, 77–88. Marburg, Germany: Hitzeroth.
Baxandall, Michael 1972. Painting and Experience in Fifteenth-Century Italy: A Primer in the
Social History of Pictorial Style. Oxford: Clarendon Press.
Baxandall, Michael 1982. The Limewood Sculptors of Renaissance Germany. New Haven, CT:
Yale University Press.
Bedani, Gino 1989. The origins of language, “natural signification” and “onomathesia”. In: Vico
Revisited. Orthodoxy, Naturalism and Science in the Scienza Nuova, 35–51. Oxford: Berg.
Benzoni, Gino 1967. Giovanni Bonifacio (1547–1635), erudito uomo di legge e devoto. Studi
Veneziani 9: 247–312.
Benzoni, Gino 1970. Bonifacio, Giovanni. In: Dizionario Biografico degli Italiani, 12: 104–197.
Rome: Enciclopedia Italiana Treccani.
Bonfatti, Emilio 1983. Vorläufige Hinweise zu einem Handbuch der Gebärdensprache im
deutschen Barock: Giovanni Bonifacios “Arte de’ Cenni” (1616). In: Joseph P. Strelka and
J. Jungmair (eds.), Virtus et Fortunae: Zur Literatur zwischen 1400 und 1700. Festschrift H.-
G. Roloff, 393–405. Bern: Lang.
Bonifacio, Giovanni 1616. L’arte de’ Cenni, con la quale, Formandosi Favella Visibile, si Tratta
della Muta Eloquenza. Vicenza: Francesco Grossi.
Bremmer, Jan 1991. Walking, standing and sitting in ancient Greek culture. In: Jan Bremmer and
Herman Roodenburg (eds.), A Cultural History of Gesture, 15–35. Ithaca, NY: Cornell Univer-
sity Press.
Bulwer, John 1644a. Chirologia, or the Natvrall Langvage of the Hand. London: T. Harper.
Bulwer, John 1644b. Chironomia, or the Art of Manuall Rhetoricke. London: T. Harper.
Bulwer, John 1648. Philocophus, or the Deafe and Dumbe Man’s Friend. London: Humphrey Moseley.
Bulwer, John 1649. Pathomyotomia, or a Dissection of the Significant Muscles of the Affections of
the Mind. London: Humphrey Moseley.
Bulwer, John 1650. Anthropometamorphosis: Man Transform’d, or the Artificial Changeling. Lon-
don: J. Hardesty.
Bulwer, John 1653. Anthropometamorphosis. Second, enlarged edition. London: William Hunt.
Burrow, John Anthony 2002. Gestures and Looks in Medieval Narrative. Cambridge: Cambridge
University Press.
Campbell, Stephen F. 1993. Nicolas Caussin’s “Spirituality of Communication”: A meeting of
divine and human speech. Renaissance Quarterly 46(1): 44–70.
Cantelli, Gianfranco 1976. Myth and language in Vico. In: Giorgio Tagliacozzo and Donald Phillip
Verene (eds.), Giambattista Vico’s Science of Humanity, 47–63. Baltimore: Johns Hopkins University Press.
Casella, Paola 1993. Un dotto e curioso trattato del primo Seicento: L’arte de’Cenni di Giovanni
Bonifaccio. Studi Secenteschi 34: 331–407.
Conley, Thomas M. 1990. Rhetoric in the European Tradition. Chicago: University of Chicago Press.
Davidson, Clifford, ed. 2001. Gesture in Medieval Drama and Art. Early Drama, Art, and Music
Monograph. Kalamazoo, MI: Medieval Institute Publications.
Davis, Michael 1992. Aristotle’s Poetics: The Poetry of Philosophy. Savage, MD: Rowman and
Littlefield.
Dorival, Bernard 1971. Philippe de Champaigne et les Hiéroglyphiques de Pierius. Revue de l’Art
11: 31–41.
Duncan, Anne 2006. Performance and Identity in the Classical World. Cambridge: Cambridge Uni-
versity Press.
Dundas, Judith 2007. Sidney and Junius on Poetry and Painting: From the Margins to the Center.
Cranbury, NJ: Associated University Presses.
Dutsch, Dorota 2002. Towards a grammar of gesture. A comparison between the types of hand
movements of the orator and the actor in Quintilian’s Institutio Oratoria 11.3.85–184. Gesture
2(2): 259–281.
Enders, Jody 1992. Rhetoric and the Origins of Medieval Drama. Ithaca, NY: Cornell University
Press.
Enders, Jody 2001. Of miming and signing. In: Claude Davidson (ed.), Gesture in Medieval Drama
and Art, 1–25. Kalamazoo, MI: Medieval Institute Publications.
Fögen, Thorsten 2009. Sermo corporis: Ancient reflections on gestus, vultus and vox. In: Thorsten
Fögen and Mireille M. Lee (eds.), Bodies and Boundaries in Graeco-Roman Antiquity, 15–44.
Berlin/New York: Walter de Gruyter.
Frischer, Bernard 1996. Rezeptionsgeschichte and Interpretation: The Quarrel of Antonio Ricco-
boni and Niccolò Cologno about the structure of Horace’s Ars Poetica. In: Helmut Krasser and
Ernst A. Schmidt (eds.), Zeitgenosse Horaz: Der Dichter und seine Leser seit zwei Jahrtausen-
den, 68–116. Tübingen: Gunter Narr.
Fumaroli, Marc 1981. Le corps éloquent: Une somme d’actio et pronuntiatio rhetorica au XVII
siècle, les Vacationes autumnales du P. Louis de Cressoles (1620). Dix-Septième Siècle
33(132): 237–264.
Fumaroli, Marc 2002. L’âge de L’éloquence: Rhétorique et “Res Literaria” de la Renaissance au
Seuil de L’époque Classique. Geneva: Droz.
Golding, Alfred S. 1986. Nature as symbolic behavior: Cresol’s Autumn Vacations and early
Baroque acting technique. Renaissance and Reformation 10(1): 147–157.
Goodden, Angelica 1986. Actio and Persuasion: Dramatic Performance in Eighteenth-Century
France. Oxford: Clarendon Press; New York: Oxford University Press.
Graf, Fritz 1995. Ekphrasis: Die Entstehung der Gattung in der Antike. In: Gottfried Boehm and
Helmut Pfotenhauer (eds.), Beschreibungskunst – Kunstbeschreibung. Ekphrasis von der
Antike bis zur Gegenwart, 143–155. Munich: Fink.
Gros de Gasquet, Julia 2007. Rhétorique, théâtralité et corps actorial. XVIIe Siècle: Revue Trimes-
trielle (236): 501–519.
Gualandri, Francesca 2001a. Affetti, Passioni, Vizi e Virtù: La Retorica del Gesto nel Teatro Del
’600. Milan: Peri.
Gualandri, Francesca 2001b. Le geste scénique dans “Le nozze di Teti e Peleo”. In: Marie-Thérèse
Bouquet-Boyer (ed.), Les Noces de Pélée et de Thetis: Venise, Paris, 1654; actes du colloque de
Chambéry et de Turin, 3–7 novembre 1999, 391–405. Bern: Lang.
Hall, Jon 2004. Cicero and Quintilian on the oratorical use of hand gestures. Classical Quarterly
54(1): 143–160.
Halliwell, Stephen (ed.) 1986. Aristotle’s Poetics. Chicago: University of Chicago Press.
Hard, Frederick 1951. Some interrelations between the literary and the plastic arts in 16th and
17th century England. College Art Journal 10(3): 233–243.
Harsdörffer, Georg Philipp 1968–1969. Frauenzimmer Gesprächspiele. Edited by Irmgard Bou-
cher. Tübingen: Max Niemeyer. First published [1641–1649].
Harsdörffer, Georg Philipp 1656–1659. Der teutsche Sekretarius. Das ist aller Cantzeleyen/ Studir
und Schreibstuben Nützliches/fast Nothwendiges und zum Drittenmal Vermehrtes Titular- und
Formularbuch (2 Vol.). Nürnberg: Endter.
Hathaway, Baxter 1962. The Age of Criticism: The Late Renaissance in Italy. Ithaca, NY: Cornell
University Press.
Hübler, Axel 2001. Das Konzept “Körper” in den Sprach- und Kommunikationswissenschaften.
Tübingen: Francke.
Hübler, Axel 2007. The Nonverbal Shift in Early Modern English Conversation. Amsterdam: John
Benjamins.
Jackson, B. Darrell 1969. The theory of signs in St. Augustine’s De Doctrina Christiana. Revue des
Études Augustiniennes 15: 9–49.
Janko, Richard (ed.) 1987. Aristotle, Poetics: With the Tractatus Coislinianus, Reconstruction of
Poetics II, and the Fragments of the On Poets (Book 1). Indianapolis: Hackett.
Knox, Dilwyn 1990. Ideas on gesture and universal languages c. 1550–1650. In: John Henry and
Sarah Hutton (eds.), New Perspectives on Renaissance Thought: Essays in the History of
Science, Education and Philosophy in Memory of Charles B. Schmitt, 101–136. London:
Duckworth.
Knox, Dilwyn 1996. Giovanni Bonifacio’s L’arte de’ cenni and Renaissance ideas of Gesture. In:
Mirko Tavoni (ed.), Italia ed Europa nella Linguistica del Rinascimento. Atti del Convegno In-
ternazionale, Ferrara 20–24 March 1991, 2: 379–400. Modena: Panini.
Koch, Erec 2008. The Aesthetic Body: Passion, Sensibility, and Corporeality in Seventeenth-Cen-
tury France. Newark: University of Delaware Press.
Lauter, Paul (ed.) 1964. Theories of Comedy. Garden City, NY: Doubleday Anchor.
LeCoat, Gerard G. 1975. The Rhetoric of the Arts, 1550–1650. Bern: Herbert Lang.
Lievsay, John L. 1961. Stefano Guazzo and the English Renaissance, 1575–1675. Chapel Hill: Uni-
versity of North Carolina Press.
Locher, Elmar 1991. Harsdörffers Deutkunst. In: Italo Michele Battafarano (ed.), Georg Philipp
Harsdörffer. Ein Deutscher Dichter und Europäischer Gelehrter, 243–265. Bern: Lang.
Lühr, Berit 2002. The Language of Gestures in Some of El Greco’s Caravaggio Altarpieces. Ph.D.
dissertation, Department of History of Art, University of Warwick, UK.
Manetti, Giovanni 1993. Theories of the Sign in Classical Antiquity. Translated by C. Richardson.
Bloomington: Indiana University Press.
Mazzoni, Stefano 1998. L’Olimpico di Vicenza: un Teatro e la sua “Perpetua Memoria”. Florence:
Le Lettere.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Berlin:
Spitz.
Niefanger, Dirk 2011. Gebärde und Bühne. In: Stefan Keppler-Tasaki and Ursula Kocher (eds.),
Georg Philipp Harsdörffers Universalität: Beiträge zu einem Uomo Universale des Barock, 65–
82. Berlin: de Gruyter.
Parry, Graham 2006. The Arts of the Anglican Counter-Reformation: Glory, Land and Honour.
Woodbridge, UK: Boydell Press.
Percival, Melissa 1999. The Appearance of Character: Physiognomy and Facial Expression in Eigh-
teenth-Century France. Leeds, U.K: Maney & Son for the Modern Humanities Research
Association.
Popp, Jessica 2007. Sprechende Bilder – Verstummte Betrachter: zur Historienmalerei Domenichi-
nos (1581–1641). Cologne: Böhlau.
Preimesberger, Rudolf 1987. Tragische Motive in Raffaels Transfiguration. Zeitschrift für Kunst-
geschichte 50: 89–115.
Puttfarken, Thomas 2005. Titian and Tragic Painting: Aristotle’s Poetics and the Rise of the Modern
Artist. New Haven, CT: Yale University Press.
Rehm, Ulrich 2002. Stumme Sprache der Bilder. Berlin: Deutscher Kunstverlag.
Rosenfeld, Sophia A. 2004. A Revolution in Language: The Problem of Signs in Late Eighteenth-
Century France. Stanford, CA: Stanford University Press.
Rossi, Giovanni 2003. Rhetorical role models for 16th to 18th century lawyers. In: Olga Tellegen-Couperus (ed.), Quintilian and the Law, 81–94. Leuven, Belgium: Leuven University
Press.
Rossi, Paolo 2006. Logic and the Art of Memory: The Quest for a Universal Language. Translated
by Stephen Clucas, 2nd ed. London: Continuum. First published [1983].
Rykwert, Joseph 1996. The Dancing Column: On Order in Architecture. Cambridge: Massachu-
setts Institute of Technology Press.
Schmitt, Jean-Claude 1991. The rationale of gestures in the West; third to thirteenth centuries. In:
Jan Bremmer and Herman Roodenburg (eds.), A Cultural History of Gesture, 59–70. Ithaca,
NY: Cornell University Press.
Schröder, Volker 1992. “Le langage de la peinture est le langage des muets”: remarques sur un
motif de l’esthétique classique. In: René Démoris (ed.), Hommage à Elizabeth Sophie Chéron:
Texte et Peinture à l’Âge Classique, 95–110. Paris: Presses de la Sorbonne Nouvelle.
Schroedter, Stephanie 2004. Vom “Affect” zur “Action”. Würzburg: Königshausen & Neumann.
Scott, Gregory 1999. The poetics of performance. The necessity of spectacle, music, and dance in
Aristotelian tragedy. In: Salim Kemal and Ivan Gaskell (eds.), Performance and Authenticity in
the Arts, 15–48. Cambridge: Cambridge University Press.
Singer, Thomas C. 1989. Hieroglyphs, real characters, and the idea of natural language in English
seventeenth-century thought. Journal of the History of Ideas 50(1): 49–70.
Stephens, James 1975. Francis Bacon and the Style of Science. Chicago: University of Chicago Press.
Valeriano Bolzani, Giovanni Pierio 1556. Hieroglyphica, sive De sacris Aegyptiorum literis com-
mentarii. Basel: Michael Isengin.
Varriano, John L. 2006. Caravaggio: The Art of Realism. University Park, PA: Pennsylvania State University Press.
Vickers, Brian 1991. Rhetoric and poetics. In: Charles B. Schmitt and Quentin Skinner (eds.), The
Cambridge History of Renaissance Philosophy, 715–745. Cambridge: Cambridge University Press.
Wollock, Jeffrey 1979. An articulation disorder in seventeenth-century Germany. Journal of Com-
munication Disorders 12: 303–320.
Wollock, Jeffrey 1982. Views on the decline of apical r in Europe: historical survey. Folia Linguis-
tica Historica 3: 185–238.
Wollock, Jeffrey 1990. Communication disorder in renaissance Italy: An unreported case analysis
by Hieronymus Mercurialis (1530–1606). Journal of Communication Disorders 23: 1–30.
Wollock, Jeffrey 1996. John Bulwer’s (1606–1656) place in the history of the deaf. Historiographia
Linguistica 23(1/2): 1–46.
Wollock, Jeffrey 1997. The Noblest Animate Motion: Speech, Physiology and Medicine in Pre-Car-
tesian Linguistic Thought. Amsterdam: John Benjamins.
Wollock, Jeffrey 2002. John Bulwer (1606–1656) and the significance of gesture in 17th-century
theories of language and cognition. Gesture 2(2): 227–258.
Wollock, Jeffrey 2011. John Bulwer and the quest for a universal language, 1641–1644. Historio-
graphia Linguistica 37(1/2): 37–84.
Wollock, Jeffrey 2012. Psychological theory of John Bulwer. In: Robert W. Rieber (ed.), Encyclo-
pedia of the History of Psychological Theories, 2: 839–856. New York: Springer Science.
Zanlonghi, Giovanna 2002. Teatri di Formazione: Actio, Parola e Immagine nella Scena Gesuitica
del Sei-Settecento a Milano. Milan: Vita e Pensiero.
Abstract
Bodily forms of communication were a core topic in the 18th century debate about the
origin of language. At issue was whether God, nature, or man was responsible for creating
our language faculty. The debate called for new ideas about how language and thought
are related in order to advance learning. It brought the Enlightenment philosophers
into confrontation with discourse traditions stemming from the Bible, Aristotle, and Des-
cartes. Drawing upon the work of the British empiricist Locke, Condillac argued that
bodily sensations are transformed and create the mind and all the knowledge it contains.
He hypothesized that the language of action was a precursor to speech and gave rise to
the language faculty which, rather than the soul, distinguishes us from other animals.
His sensualism was highly influential and inspired the pioneering work of de l’Epée,
the inventor of the gestural method of educating the deaf in France, and his successor,
Sicard. By considering what role the body may have in forming the mind, Enlightenment
philosophy shifted the perspectives on what language is and how its communication
function is intimately related to cognition.
1. Introduction
Enlightenment philosophy dawned in France in the mid 18th century. It was character-
ized by a critical questioning of traditional beliefs and values underlying the political
and social structures of that time. Humanism and the Reformation had challenged
the authority of the Roman Catholic Church, as well as the political conventions that
depended on it, such as the divine right of kings. Science was heralded as a new savior
promising to deliver truth. Referring to Aristotle’s Organon [Works on Logic] (trans.
[1930] 1960, [1955] 1992, [1938] 1996), Francis Bacon (1561–1626) wrote Novum Orga-
num Scientiarum [New Instrument of Science] ([1620] 2011) announcing that the new
empirical method was to replace the old dialectic method of discovering truths. The
“light of reason” was to reveal the laws of nature through experimentation and obser-
vation to advance knowledge and, hence, improve physical well-being. As contempo-
rary European languages were replacing Latin as the medium of scholarly discourse,
Bacon saw a major obstacle to scientific inquiry in what he called idola fori ‘idols of
the market-place’, misconceptions arising from the common use of language, e.g. a
whale is a fish (see Trabant 2003: 122–131). Philosophers across Europe sought ways
to resolve these semantic problems, and their inquiries raised questions about the relation
between language and knowledge.
One approach was to try to eliminate idola fori by analyzing concepts and clarifying
definitions, thus reforming natural languages for philosophical discourse. Another was
to bypass words as the vehicle of philosophical knowledge by inventing a universal lan-
guage comprised of symbolic characters whose meanings would have a direct link with
reality and thus mirror the process of reasoning perfectly (see Harris and Taylor [1989]
1997: 110–125). Gesture became a topic in this discussion due to its mimetic nature. The
idea that it formed the very first language was advanced by Vico ([1744] 1990), Condil-
lac ([1746] 2001) and Rousseau ([1781] 1968). It was corroborated by the reports of tra-
velers, especially in the New World, where gesture proved to be a way of surmounting
the language barrier confronting Europeans in their encounters with Native Americans,
who were commonly viewed as primitive humans. The efficiency of gesture in establish-
ing communication implied that the original, natural, universal language of mankind
may have been gestural (see Kendon [2004] 2005: 22). The deaf were also commonly
equated with primitive man. Diderot (1713–1784) expressed the view that the gestural
communications of the deaf may give clues to the original structure of thought in Lettre
sur les Sourds et Muets ([1751] 1978) (see Fischer 1990: 35–58; Kendon 2005: 38). Their
signs attracted scholarly interest as the Abbé de l’Epée showed how they could be used
to teach French to the deaf and thus educate them.
Inquiry into the question of the origin of language unavoidably raised polemical issues
concerning religion and society, because it concerned man’s nature and his position
in the universe. At issue was whether God, nature, or man was responsible for our lan-
guage capacity. By using thought experiments to speculate about the historical origin of
language – the birth of the language faculty in our ancestors in the empirically inaccessible
past – philosophers aimed to gain insight into the perpetual origin of language, from which
each and every instance of language use springs, and thus to throw light on how we obtain
knowledge. The debate called for new ideas about how language and knowledge are
related in order to advance learning and cultivate a new image of man, which brought
the Enlightenment philosophers into confrontation with three discourse traditions. As a
result, philosophy underwent a process of emancipation to free itself from dogmas (see
Aarsleff 2001: xiii–xv; Trabant 1996: 44–49, 2001: 6–9, 2003: 15–20, 29–34, 131–136):
(i) The Bible: According to Genesis, God created the first man, Adam, a fully evolved
human being equipped with the language capacity. The primary function of lan-
guage was cognition, which Adam exercised in naming the animals; according to
myth, the lingua Adamica possibly contained divine knowledge due to its divine origin
(see Aarsleff 1982: 281–283; Willard 1989: 134–137). Communication is a second-
ary function of language that can lead us astray from God’s will; rhetorical lan-
guage, appealing to and expressing passions (needs, desires, feelings, emotions),
caused the loss of Paradise, and so the body was instrumental in committing the
Original Sin.
(ii) Aristotle’s linguistic conception in De Interpretatione [On Interpretation] (trans.
[1938] 1996): The mind naturally receives conceptual images of the world; these
are the same for all human beings and constitute thought. The sole function of lan-
guage is the communication of thought. Different languages are different ways of
materializing the same universal ideas; the link between words and ideas is arbi-
trary and conventional (see Trabant 1996: 47–49; 2003: 29–34).
(iii) Cartesian dualism in Discours de la Méthode ([1637] 1960): The mind is the soul
and it differentiates us from other animals and machines; it receives true, innate
ideas from God. The sole function of language is the communication of thought.
Natural movements which express passions are not part of language (Descartes
1960: 97).
2. Condillac
The Essai sur l’Origine des Connaissances Humaines ([1746] 1973) by Etienne Bonnot
de Condillac (1714–1780) caused intense debate in the second half of the 18th century
(see Aarsleff 1982: 146–209; Ricken 1989: 301–309). There, he argues that the langage
d’action ‘the language of action’ gave rise to the human language faculty, which distin-
guishes us from other animals. His argument draws upon the concept of actio ‘action,
delivery’ in the Greek and Roman rhetorical tradition, which had been gaining in
importance since the 16th century (see Aarsleff 2001: xviii–xxiii; Kendon 2005: 20–38).
His proposals were further developed in Grammaire ([1775] 1970a) and La Logique
([1780] 1970b).
Countering Cartesian dualism, he asserted that all animals have a soul (mind) equipped
with all the mental faculties up to and including imagination. But only humans have
access to the higher cognitive processes beyond this threshold. Memory enables man
“to gain mastery of his own imagination and to give it new exercise” (Condillac
2001: 40). This voluntary use of the imagination makes the vital difference: It gives
us the ability to draw analogies between the world around us and our reactions to it,
which guides the development of our semiotic capacity. He distinguished three types
of signs (see Condillac 2001: 36):
(i) Accidental signs are chance repetitions of perceptions that unintentionally evoke
the same ideas;
(ii) Natural signs are sounds and movements that express affective states and are
established by nature;
(iii) Instituted signs are man-made and established by convention.
Condillac illustrates the birth of the first signs with his well-known thought experiment of two children who, left to themselves after the deluge, begin to communicate through cries of passion accompanied by bodily movements:
They usually accompanied the cries with some movement, gesture, or action that made the
expression more striking. For example, he who suffered by not having an object his needs
demanded would not merely cry out; he made as if an effort to obtain it, moved his head,
his arms, and all parts of his body. Moved by this display, the other fixed the eyes on the
same object, and feeling his soul suffused with sentiments he was not yet able to account
for to himself, he suffered by seeing the other suffer so miserably. From this moment he
feels that he is eager to ease the other’s pain, and he acts on this impression to the extent
that it is within his ability. Thus by instinct alone these people asked for help and gave it.
(Condillac 2001: 114–115)
Like the Roman poet Lucretius (98–55 BC), who integrated gesture into a heathen
version of Genesis in De Rerum Natura V [On the Nature of Things] (trans. [1924]
1992), Condillac assumed that sympathy is an innate human attribute. Crucially, the
first sign was naturally created in the observer’s mind (see Aarsleff 2001: xxv): He
linked what his senses were telling him and then acted instinctively to help his compan-
ion. Thereafter the children communicated about the world using cries and movements.
The use of natural bimodal signs established new relationships – between them as they
interacted, and between them and the external world as they tackled survival problems
together – leading to enhanced cognition and the first language: the language of action.
This emotion-based form of interaction and reference generated higher mental facul-
ties. The intentional use of natural signs gave them control over their memory, which –
together with the imagination and contemplation – established the faculty of analogy.
The ability to imagine and thus create retrievable signs brought about a mental libera-
tion from immediate circumstances and drove the transition from nature to culture.
Hence our semiotic capacity, rather than a soul, as in Cartesian dualism, differentiates
us from other animals.
As the number of bimodal signs in usage increased, so did the capacity for gaining
knowledge. Condillac does not say why the voice took over this function, but bimodal
signs were gradually superseded by mono-modal ones as dance and speech evolved sep-
arately (Fig. 25.1). Dance is conceived as a “strong and noble” (Condillac 2001: 118)
way of communicating to compensate for the limitations of primordial speech. Bodily
attitudes and actions later became subjected to rules in the danse des gestes ‘dance of
gestures’ which conveyed semantic content, and which, in turn, gave rise to the danse
des pas ‘dance of steps’ to express “certain states of mind, especially joy” (Condillac
2001: 118), as exemplified in Italian pantomime. Condillac celebrates the various
branches of the Greek and Roman rhetorical tradition (Condillac 2001: 120–155) as fruit-
ful evolutionary developments in the natural history of our semiotic capacity (see Trabant
2003: 172). Since “gestures, dance, prosody, declamation, music, and poetry” are “closely
interrelated as a whole and to the language of action which is their underlying principle”
(Condillac 2001: 156), aesthetics preceded epistemology, and imagination preceded
reason (see Aarsleff 2001: xvi). He thus exonerates the passions from the charge of having infiltrated
linguistic communication to the detriment of humanity, as in the Biblical account of the Fall.
Fig. 25.1: Condillac’s typology of signs: accidental signs, natural signs, and instituted signs; the language of action (movement + cries) develops into dance and speech, with dance differentiating into the dance of gestures and the dance of steps.
Regarding the evolution of speech, he puts forward no clues as to how duality of pat-
terning could have emerged from cries of emotion. He addresses the problem of the
transition from unarticulated sounds to articulated speech mainly as a material (pho-
netic) rather than a structural (syntactic) or conceptual (semantic) one: The number
of words increased by chance, and young children learnt to pronounce more of them
while their vocal equipment was still flexible. This timely exercise extended their use
into adulthood and established speech as the dominant mode of communication. He
does not suggest what drove the transition – simply that “the use of articulated sounds
became so easy that they prevailed” (Condillac 2001: 116).
In the Essai (Condillac 2001), movement, gesture, and action only play a role in the
early stages of linguistic and cognitive development. In this respect, Condillac discusses
the case of a congenitally deaf-mute man from Chartres whose hearing was suddenly
restored at the age of 23. At first, he was reported to have been surprised at hearing
the sound of bells. Then, having silently listened to people’s conversations for some
months, and repeated in private the words he had heard, one day, he spontaneously
began to speak. Condillac concluded that, while still deaf, the young man was mentally
comparable to a “wild” (feral) child of nature, because deafness had isolated him from
speakers, to whom he could communicate only his essential needs by means of gesture,
and that without speech, his mind had remained in a primitive state (see Condillac 2001:
85–87). He was thus unable to climb the ladder from nature to culture in Condillac’s
model. However, Condillac modified his judgement of the mental capacities of the deaf
and upgraded his view of the semiotic potential of gesture in a later work, Grammaire
(1970a: 359–360), after visiting the Abbé de l’Epée’s school for the deaf in Paris and
witnessing his pupils’ abilities (see Fischer 1993: 431–437; 2011: 13–14). Epée’s analytic
method of using manual signs as the primary medium for educating the deaf convinced
Condillac that they can indeed grasp abstract concepts and thus achieve normal levels
of intellectual development. In Grammaire, he treats the language of action at length, primarily as
a gestural phenomenon, and defines gesture as encompassing movements
of the arms, head, and the whole body, as well as facial and ocular expressions (see
Condillac 1970a: 354–355). Nature is said to have determined the first signs and
paved the way for us to imagine new ones. Consequently, we could express all our
thoughts in gestures just as well as we do in words (see Condillac 1970a: 357). He dif-
ferentiates two languages of action, “one natural, whose signs are given by a conforma-
tion of the organs; and the other artificial, whose signs are given by analogy. The former
is necessarily very limited; the latter is quite able to render understandable all of
man’s thoughts” (Condillac 1970a: 359, as quoted in Seigel 1995: 106). By “natural”
Condillac means signs determined by biological constitution, hence all animals have
a language of action specific to their species. By “artificial” he means man-made
signs whose semantic content has been analyzed and whose forms are determined by
analogy, as Epée’s methodical signs were, thus conceding that the language use required
to develop the intellect is, in fact, modality independent (see Fischer 2011: 13–14).
Crucially, he views analysis and analogy as basic complementary principles underlying
language and knowledge acquisition, and which originated in the language of action
(see Condillac 1970a: 365; Fischer 1993: 431–433; Harris and Taylor 1997: 139–154).
According to Condillac, our ancestors learnt to analyze the contents of their minds by making the transition from holistic to
analytic gestures – from simultaneous to successive linguistic sequencing. Then the com-
munication channel began to switch between visual-kinesic and auditory-vocal modal-
ities. Speech was a by-product that became easier in the long run through habitual
usage.
Condillac discusses the transition from a natural language of gestural signs to an arti-
ficial language of speech in more depth in Logique (Condillac 1970b). There, he pro-
poses how holistic gestures gave rise to analytic signs in the mind of the viewer. Our
ancestors obey nature. They gesticulate without a plan. The viewer who “listens with
his eyes” (Condillac 1970b: 404, my translation) will not understand/hear (Condillac
uses the French verb entendre, which can have either meaning) what the signer is trying
to communicate if he does not “decompose” (analyze) what he sees. It is natural for him
to observe movements sequentially: He attends to the most striking movements first,
then to others, and thus mentally converts his holistic perception of an action into a lin-
ear sequence of distinct movements, each of which is coupled with a distinct idea. These
early humans realize that such decomposed actions are easier to understand than hol-
istic ones. An unconscious cognitive process then becomes conscious and deliberate. By
decomposing his gestures, the evolutionary signer decomposes his thought into its con-
stituent ideas to clarify it for himself. Others understand him because he understands
himself. Repetition reinforces the habit, and gestural language naturally evolves into
an analogical method for analyzing thought: “I say method because the succession of
movements will not be made arbitrarily and without rules. For since gesture is the effect
of one’s needs and circumstances, it is natural to decompose it in the order given by
those needs and circumstances. Although this order can and does vary, it can never
be arbitrary” (Condillac 1970b: 404–405, as quoted in Harris and Taylor 1997: 148).
In the course of this development, vocal sounds came to accompany the decomposed
movements and substituted for them. The first languages were highly intonated because of
the limited articulatory abilities of the first speakers. Sounds were intentionally varied
in tone to create semantic distinctions, as in tone languages like Chinese. As more
words were created, the less “singing” was required to make semantic distinctions
(see Condillac 2001: 120–122).
In Grammaire, Condillac (1970a) elaborates on how the first vocal lexicon could
have expanded to include elements other than cries (equivalent to interjections) and
onomatopoeic expressions. He suggests that names for things that do not make noises
were formed analogously through a synesthetic transfer from the visual-gestural to the
audio-oral modality (see Condillac 1970a: 366–367), an idea that recalls how Plato
related mimetic movement and sound symbolism (Cratylus, trans. [1926] 1996: 133–
147). It is a contentious issue whether duality of patterning could have evolved from
interjections and onomatopoeic expressions. In both cases, they are holophrastic sounds
which are typically outside the phonetic range of a language system and not syntacti-
cally integrated into sentential structures; furthermore, whereas interjections have no
semantic dimension, onomatopoeic expressions have no pragmatic dimension (Trabant
1998: 115–146).
Condillac reasons that, originally, instituted signs were based on the principle of
analogy, and chains of analogies expanded the lexicon over time, i.e. they were moti-
vated signs with links that formed metaphoric matrices. The older the language, the
more it preserved characteristic traces of the language of action and exercised the imag-
ination. These traces fade as a language evolves diachronically. Hence, Ancient Greek
conveyed more vivid imagery than the relatively abstract French of his day (see Con-
dillac 2001: 141). He argues that in reforming natural languages for philosophical
inquiry, the principles of analysis and analogy must be applied if clear and precise
ideas are to be conveyed.
For Condillac, each language has its own génie ‘genius’, which reflects the
character of the people who speak it (see Condillac 2001: 185–195; Trabant 2003: 175–
177). Génie is a consequence of the climate that the speakers live in and the type of gov-
ernment they have. It blossoms in the hands of great writers and withers in those of
mediocre ones. It is captured in the combination of ideas that words comprise, and
the connotations attached to them, as well as in the syntax that governs how words
may be combined to form text. For Condillac, language diversity reflected the many
possible ways of configuring thoughts, and all are equally natural (see Ricken 1989:
293). He considered the different grammatical structures of different languages to
offer different advantages. The inflectional forms of Latin declensions give a flexibility
to word order, which allows a wide margin of stylistic creativity that is wonderful for
composing poetry. Its génie gives full rein to the imagination. In contrast, the lack of
declensions in French imposes the word order Subject Verb Object to establish gram-
matical relations between the nouns in a sentence. This yields constructions that link
perceptions and thoughts well. Its génie gives it simplicity and clarity that makes it
ideal for abstract reasoning. Condillac considered analysis and imagination to be antag-
onistic cognitive processes: A language that favors the one does so at the expense of the
other. A good analytic language, like French, exercises the reasoning that underpins
good philosophy, whereas a language that exercises the imagination more, like Latin,
inspires good poetry. “The most perfect language lies in the middle, and the people
who speak it will be a nation of great men” (Condillac 2001: 192), because they would
not only produce good science, but they would also bear the hallmark of good culture
by creating good art.
2.7. Summary
In the Essai, Condillac (2001) tells “a communicative story to explain the genesis of a
cognitive device” (Trabant 2001: 5), arguing that subjective expression gave rise to
objective denotation. Reacting to instinctive pity resulted in semiogenesis. Gaining
mental control over natural signs led to the invention of conventional ones. Semiogen-
esis led to glottogenesis. In Grammaire (Condillac 1970a) and Logique (Condillac
1970b), he develops the idea that gestural language established the faculty of analogy
by giving holistic thought an analytic linear structure and led to speech via cross-
modal transfer. The analogical principle underlying the language of action provided a
blueprint for metaphoric matrices to build up the lexicon.
In sum, Condillac shook all three discourse traditions with regard to the origin of
language. His recognition, inspired by Locke, that language influences the way that peo-
ple think marked a break with the Aristotelian tradition. He re-embodied the human
mind that Cartesian rationalism had severed from the body by proposing that it is gener-
ated by transforming sensations. God is sidestepped rather than eliminated in his hypoth-
esis of how man made language, allowing him to make a significant contribution to the
advancement of philosophy outside the realm of theology. His epistemology and theory
of language origins came to be known as sensualist (or sensationalist) philosophy.
Descartes himself had observed that people born deaf and unable to hear or speak invent
their own signs to communicate their thoughts, and he considered sign language to be as valid a form of language as speech (Descartes
1960: 96–97; Trabant 2003: 135–136). But, generally, up to the 16th century, “the deaf
were almost universally relegated to the class of idiots and madmen” (Berthier
[1840] 2006: 164). This view changed radically in the 18th century thanks to the pioneer-
ing work of enlightened French teachers and their deaf pupils, who proved the falsity of
this verdict of the speaking majority.
3.1. De l’Epée
The abbé Charles-Michel de l’Epée (1712–1789) is known as the inventor of the ges-
tural method of educating the deaf. In the 1760s he founded, and privately funded,
the first public school for the deaf in the world, in Paris. It was then commonly believed
that without speech, intellectual development was just not possible (see Berthier 2006:
168). This is why teaching the deaf to speak was considered the first priority if they were
to be integrated into society. Epée polemicized against this practice. He argued that, in
order to teach the deaf to think, it was better to teach them written French first, by com-
municating in the modality they naturally use among themselves, just as he had been
taught Latin by means of his own native language (see Lane [1984] 2006: 49). He learnt
the sign language of his pupils but thought that it lacked rules (Lane 2006: 7). So he in-
vented a system of signes méthodiques ‘methodical signs’ for analyzing and representing
lexical and grammatical elements of French. Words were written on the blackboard or
on cards and first fingerspelled so that pupils could memorize them. Words were correlated with ideas by
means of gesture. A word with a concrete meaning, like “door”, was explicated deicti-
cally. A word with an abstract meaning was explicated gesturally by analyzing its con-
stituent ideas, each of which was shown by means of a “natural” (motivated) sign for a
concrete referent that the pupils already understood. Hence, the meanings of complex
metaphysical terms were “made visible” using “natural signs, or those made natural by
analysis” (Epée 1776: Part II, 38; as quoted in Fischer 1993: 435). As an example,
Fischer (1993: 433) gives Epée’s (1776) analysis of the complex idea of je crois
‘I believe’ into simple constituent ideas, each of which was expressed gesturally.
Here one recognizes Condillac’s underlying principles of analysis and analogy for
creating artificial signs in the language of action (see Fischer 1993: 431–437; 2011: 14),
which Condillac himself recognized. Four years after Epée had published his method-
ology in Institution des Sourds et Muets par la Voie des Signes Méthodiques (1776), Con-
dillac praised his method of instruction in Grammaire, stating that the ideas it conveyed
were “more exact and more precise” (Condillac 1970a: 359, my translation) than those
usually acquired with the help of hearing. The subtitle of Epée’s (1776) book, “A work
that contains the project of a universal language, by the intermediary of natural signs
subjected to a method” (my translation), indicates his belief in the superior efficiency
of gestural signs. Epée later quoted Condillac’s remark in the book’s second edition,
published in 1784 (Fischer 1993: 434), but his belief that gestural signs could be per-
fected to become the universal language sought by philosophers did not gain support.
However, his work thrived, and many schools throughout Europe were founded on
his method of using sign to educate the deaf, marking a breakthrough in recognizing
their status as human beings with normal intellectual capacities (see Lane 2006: 50).
3.2. Sicard
The abbé Roch-Ambroise Cucurron Sicard (1742–1822) succeeded Epée as the princi-
pal of the school for the deaf in Paris in 1789 and established it as the model for many
others that opened in Europe and America (Lane 2006: 9). Although admiring Epée as
a great pioneer, Sicard was critical of his method, which in his view made the deaf into
“automatic copyists of signed French into written French without any understanding of
what they were writing” (Lane 2006: 81), so they were unable to compose their own
sentences (see Sicard [1803] 2006: 94). To redress this deficit, Sicard extended Epée’s
method, explicitly applying Condillac’s sensualistic philosophy in a kind of metaphysical
experiment (see Fischer 1993: 437–444; 2011: 15–19).
Sicard considered the uneducated deaf to be “absolute savages” (Sicard 1808: 1: 15–16;
as quoted in Fischer 1993: 442), who required a method for learning French that corre-
sponded to their “primitive” state, so they could become human and hence part of soci-
ety (see Fischer 1993: 437–439). In his Cours d’Instruction d’un Sourd-Muet de
Naissance (2006), he describes how he educated his first and most famous pupil, Jean
Massieu (1772–1846), in a process that parallels Condillac’s evolutionary transition
from natural to artificial signs. He systematically applied the principle of analogy to
show that a drawing (motivated signifier) can be replaced by a word (conventional sig-
nifier) to represent something, and the principle of analysis to signify part-whole rela-
tions, for example, a “tree” is composed of “roots”, “trunk”, and “branches” (see Sicard
2006: 111), thus enabling Massieu to classify his growing knowledge. Gesture was used
to explicate simple concrete terms, then complex abstract ones. In contrast to Epée,
Sicard never mastered the sign language of his pupils (Lane 2006: 10), and he stressed
the importance of deaf pupils supplying their own manual signs to “replace spoken
language” (Sicard 2006: 97), as these were most suited to their primitive level of intel-
lectual development. Sicard’s approach is demonstrated in his Théorie des Signes
(1808), a dictionary in two volumes, inspired by an idea of Epée’s (Berthier 2006:
185). Volume one gives examples of how concrete terms are decomposed into their sim-
ple constituent ideas and explicated by enacting pantomimic scenes, for which deaf
pupils provided the sign sequences described by Sicard. Volume two treats abstract
terms and grammar.
In the words of Roch-Ambroise Auguste Bébian (1789–1834), a hearing person who
attended Sicard’s school and became fluent in the sign language of his deaf school
friends, Sicard may have completed and perfected Epée’s approach by giving the
deaf a way of obtaining a satisfactory grammatical translation of French:
But, one senses, the more profoundly these signs decompose the sentence – thus revealing
the structure of French – the further they get away from the language of the deaf, from
their intellectual capacities and style of thinking. That is why the deaf never make use
of these signs among themselves; they use them in taking word-for-word dictation, but to
explain the meaning of the text dictated, they go back to their familiar language. (Bébian
[1817] 2006: 148)
Bébian ended the practice of educating the deaf in a manual version of French rather
than their own sign language (Lane 2006: 127). It took the intelligence and persever-
ance of deaf people themselves to prove the true value of their native language. To
name but a few outstanding deaf pioneers: Pierre Desloges (1747–1799) wrote the
first book by a deaf person, in which he defended Epée’s method and discussed the
sign language of his Parisian deaf community (see Lane 2006: 28–29). Laurent Clerc
(1785–1869) was a school friend of Bébian’s who left France for America, where he
co-founded the first American school for the deaf in 1817 with Thomas Hopkins Gallaudet
(1787–1851) (see Lane 2006: 9–10, 127–128). And Ferdinand Berthier (1803–1886), a
pupil of Clerc’s, achieved a full professorship at the age of 26, was a prolific French
author, and was the first deaf person to receive the Legion of Honor, the highest decoration
in France (see Lane 2006: 161).
Despite the success of the manual approach to educating the deaf, the 19th century
saw a return to an emphasis on speech and lip reading, a fate that was sealed by the Con-
gress of Milan in 1880, which resolved that the deaf should first and foremost learn
to speak. Sign language was banished from classrooms worldwide until the mid 20th
century, when it attracted the attention of language scientists (see Lane 2006: 1–4).
3.3. Summary
Condillac’s sensualist philosophy inspired the attempts of Epée and Sicard to teach the
deaf French by translating written words into gestural signs and thus facilitate their
social integration. This led to recognizing the superiority of their own sign language
for educational purposes. However, the light thrown on the génie of their language in
the 18th century almost faded to extinction at the end of the 19th century but was
rekindled in the mid 20th century.
4. Conclusion
By giving the body a central role in forming the mind, Enlightenment philosophy
shifted the perspectives on what language is and how its communication function is in-
timately related to cognition. Considered as a vestige of our phylogenetic past, gesture
became a focus of interest in the debate on the origin of language. In Condillac’s treat-
ment of the language of action that underpinned his natural history of man, gesture was
constitutive in forming the first naturally motivated signs, and in creating blueprints for
man-made signs and hence language, which promoted creative thinking and intellectual
evolution. The analogical and analytic principles underlying his conception of the lan-
guage of action found application in Epée’s and Sicard’s experiments with French sign
language, proving that the deaf are indeed capable of reason. Despite the acclaimed
natural potential of gesture for conveying clear and precise ideas, the proposal that
sign language could be developed into a universal philosophical language was not
pursued.
5. References
Aarsleff, Hans 1982. From Locke to Saussure. Essays on the Study of Language and Intellectual
History. Minneapolis: University of Minnesota Press.
Aarsleff, Hans 2001. Introduction. In: Etienne Bonnot de Condillac, Essay on the Origin of Human
Knowledge, xi–xxxviii. Translated and edited by Hans Aarsleff. Cambridge: Cambridge Univer-
sity Press.
Aristotle 1960. Posterior Analytics. Edited and translated by Hugh Tredennick. Topica. Edited and
translated by E. S. Forster. The Loeb Classical Library. Cambridge, MA: Harvard University
Press. First published [1930].
Aristotle 1992. On Sophistical Refutations, On Coming-to-Be and Passing Away. Translated by
E. S. Forster; On the Cosmos. Translated by D. J. Furley. The Loeb Classical Library. Cambridge,
MA: Harvard University Press. First published [1955].
Aristotle 1996. On Interpretation. In: Categories. On interpretation. Edited and translated by
Harold P. Cooke, Prior Analytics. Edited and translated by Hugh Tredennick. The Loeb Classical
Library. Cambridge, MA: Harvard University Press. First published [1938].
Bacon, Francis 2011. The Works of Francis Bacon. A New Edition. Edited and translated by
Basil Montagu. 1825–1836. 16 Volumes. London: British Library, Historical Print Editions.
First published [1620].
Bébian, Roch-Ambroise Auguste 2006. Essai sur les sourds-muets et sur le langage naturel. In:
Harlan Lane (ed.), The Deaf Experience. Classics in Language and Education, 129–160. Wash-
ington, DC: Gallaudet University Press. First published [1817].
Berthier, Ferdinand 2006. Les sourds-muets avant et depuis l’Abbé de l’Epée. In: Harlan Lane (ed.),
The Deaf Experience. Classics in Language and Education, 163–203. Washington, DC: Gallau-
det University Press. First published [1840].
Condillac, Etienne Bonnot de 1973. Essai sur l’Origine des Connaissances Humaines, Ouvrage où
l’on Réduit à un Seul Principe tout ce qui Concerne l’Entendement. Edited by Charles Porset.
Auvers-sur-Oise, France: Galilée. First published [1746].
Condillac, Etienne Bonnot de 2001. Essay on the Origin of Human Knowledge. Translated and edi-
ted by Hans Aarsleff. Cambridge: Cambridge University Press. First published [1746].
Condillac, Etienne Bonnot de 1970a. Grammaire. In: Œuvres Complètes, Volume 5, 351–625. Reprint
of the Paris edition 1821–1822. Geneva: Slatkine Reprints. First published [1775].
Condillac, Etienne Bonnot de 1970b. La logique. In: Œuvres Complètes, Volume 15, 319–463. Reprint
of the Paris edition 1821–1822. Geneva: Slatkine Reprints. First published [1780].
De l’Epée, Charles-Michel 1776. Institution des Sourds et Muets par la Voie des Signes Méthodi-
ques. Ouvrage qui Contient le Projet d’une Langue Universelle, par l’Entremise des Signes Nat-
urels Assujettis à une Méthode. Paris: Nyon l’Aîné.
De l’Epée, Charles-Michel 2006. La véritable manière d’instruire les sourds et muets, confirmée
par une longue expérience. In: Harlan Lane (ed.), The Deaf Experience. Classics in Language
and Education, 51–72. Washington, DC: Gallaudet University Press. First published [1784].
Descartes, René 1960. Discours de la Méthode pour bien Conduire la Raison et Chercher la Vérité
dans les Sciences. Paris: Garnier. First published [1637].
Diderot, Denis 1978. Lettre sur les sourds et muets à l’usage de ceux qui entendent et qui parlent.
Edited and presented by Jacques Chouillet. In: Œuvres Complètes, Volume 4. Edited by Jean Fabre,
Herbert Dieckmann, Jacques Proust, and Jean Varloot. Paris: Hermann. First published [1751].
Fischer, Renate 1990. Sign language and French Enlightenment. Diderot’s “Lettre sur les sourds
et muets”. In: Siegmund Prillwitz and Tomas Vollhaber (eds.), Current Trends in European
Sign Language Research. Proceedings of the Third European Congress on Sign Language
Research, Hamburg, July 26–29, 1989, 35–58. Hamburg: Signum Press.
Fischer, Renate 1993. Language of action. In: Renate Fischer and Harlan Lane (eds.), Looking
Back. A Reader on the History of Deaf Communities and Their Sign Languages, 429–455. Ham-
burg: Signum Press.
Fischer, Renate 2011. Der gestische Sprachursprung – Szenarien um 1800. Das Zeichen 87: 12–22.
Harris, Roy and Talbot J. Taylor 1997. Landmarks in Linguistic Thought I. The Western Tradition
from Socrates to Saussure. 2nd edition. London: Routledge. First published [1989].
Kendon, Adam 2005. Gesture. Visible Action as Utterance. Reprint with corrections. Cambridge:
Cambridge University Press. First published [2004].
Lane, Harlan 2006. The Deaf Experience. Classics in Language and Education. Washington, DC:
Gallaudet University Press. First published [1984].
Locke, John 1996. An Essay Concerning Human Understanding. Abridged and edited by John W.
Yolton. London: Everyman. First published [1690].
Lucretius Carus, Titus 1992. De Rerum Natura V. Translated by W. H. D. Rouse and revised by
Martin Ferguson Smith. The Loeb Classical Library. Cambridge, MA: Harvard University
Press. First published [1924].
Plato 1996. Cratylus. In: Cratylus. Parmenides. Greater Hippias. Lesser Hippias. Translated by
H. N. Fowler. The Loeb Classical Library. Cambridge, MA: Harvard University Press. First
published [1926].
Ricken, Ulrich 1989. Condillac: Sensualistische Sprachursprungshypothese, geschichtliches
Menschen – und Gesellschaftsbild der Aufklärung. In: Joachim Gessinger and Wolfert von
Rahden (eds.), Theorien vom Ursprung der Sprache, 287–311. Berlin: De Gruyter Mouton.
Rousseau, Jean-Jacques 1968. Essai sur l’Origine des Langues où il est Parlé de la Mélodie et de
l’Imitation Musicale. Edited by Charles Porset. Paris: Nizet. First published [1781].
Seigel, Jules Paul 1995. The Enlightenment and the evolution of a language of signs in France and
England. In: Nancy S. Struever (ed.), Language and the History of Thought, 91–110. Rochester,
NY: Boydell and Brewer.
Sicard, Roch-Ambroise Cucurron 2006. Cours d’instruction d’un sourd-muet de naissance, pour
servir à l’éducation des sourds-muets, et qui peut être utile à celle de ceux qui entendent
et qui parlent. In: Harlan Lane (ed.), The Deaf Experience. Classics in Language and Educa-
tion, 83–126. Washington, DC: Gallaudet University Press. First published [1803].
Sicard, Roch-Ambroise Cucurron 1808. Théorie des Signes pour l’Instruction des Sourds-Muets,
dédié à S. M. l’Empereur et Roi. Suivi d’une notice sur l’enfance de Massieu. 2 volumes.
Paris: L’Imprimerie de l’Institution des Sourds-Muets.
Takesada 1982. Imagination et langage dans l’essai sur l’origine des connaissances humaines de
Condillac. In: Jean Sgard (ed.), Condillac et les Problèmes du Langage, 47–57. Geneva:
Slatkine.
Trabant, Jürgen 1996. Thunder, girls, and sheep, and other origins of language. In: Jürgen Trabant
(ed.), Origins of Language, 39–69. Budapest: Collegium Budapest.
Trabant, Jürgen 1998. Artikulationen. Historische Anthropologie der Sprache. Frankfurt am Main:
Suhrkamp.
Trabant, Jürgen 2001. Introduction: New perspectives on an old academic question. In: Jürgen
Trabant and Sean Ward (eds.), New Essays on the Origin of Language, 1–17. Berlin: De Gruy-
ter Mouton.
Trabant, Jürgen 2003. Mithridates im Paradies. Kleine Geschichte des Sprachdenkens. Munich:
C. H. Beck.
Vico, Giambattista 1990. Princìpi di Scienza Nuova d’Intorno alla Comune Natura delle Nazioni.
Third edition. Edited by Andrea Battistini. Milan: Mondadori. First published [1744].
Willard, Thomas 1989. Rosicrucian sign lore and the origin of language. In: Joachim Gessinger and
Wolfert von Rahden (eds.), Theorien vom Ursprung der Sprache, Volume 1: 131–157. Berlin:
De Gruyter Mouton.
Abstract
The chapter presents a concise overview of empirical research on body, language, and
communication in the 20th century. Starting from a discussion of Wundt’s semiotic clas-
sification of bodily behavior (Wundt 1921) and the work of Efron (1972) on gestures as
an expression of culture, the paper moves on to a discussion of empirical research on body,
language, and communication in the postwar period and the decline in interest from the
1950s to the 1970s due to the research paradigm of nonverbal communication. After briefly
discussing the work of Pike (1972) and Birdwhistell (1970), the paper then traces the
emergence of modern gesture research in the 1970s and 1980s and introduces the main
research paradigms established, in particular within gesture studies. The paper then dis-
cusses present-day gesture research within the fields of psycholinguistics and psychology,
interaction studies, linguistics and semiotics, cognitive linguistics, conversation analysis
and the ethnography of communication, as well as artificial intelligence,
showing that present research on body, language, and communication is a wanderer
between disciplines and is first and foremost characterized by its interdisciplinary nature.
1. Introduction
At the end of the 19th century, research on body, language, communication, and in par-
ticular gestures reached a respectable and central position in the sciences. Works of
scholars such as Mallery ([1881] 1972), Tylor ([1870] 1964), and de Jorio ([1832]
2000) contributed substantively to ongoing scientific interest and were at the heart of
central scientific questions.
However, this interest declined substantially after the influential work of Wilhelm
Wundt (1921) at the beginning of the 20th century. Topics of the 19th century, such
as the evolution of language, the nature of sign languages, and the universality and nat-
uralness of gestures were no longer at the center of scientific interest, and only a few works on
body, gestures, and language emerged in the early 20th century. Only at
the beginning of the 1970s did the interest in gestures rise again. Based on several
influential publications on gestures, language and the body, a diversified range of
research characterized by its interdisciplinary nature developed. Today, research on
body, language and communication wanders between different scientific fields such as
body and gestures (see Kendon 1982, 2004b). When gestures or bodily forms of
expression were of interest, it was only with respect to the evidence they might offer about
the underlying motivations, character, or personality of the individual (see Krout 1935; Wolff
1945), as well as about the expression of emotion (Allport and Vernon 1933; Bruner and
Tagiuri 1954).
With the rise of structural linguistics and its focus on abstract aspects
of the language system, gestures likewise had no appropriate place in the science
of language. Anthropologists such as Franz Boas and Edward Sapir accepted gestures
as part of a wide variety of communicative behavior. Sapir stated that "among the pri-
mary communicative processes of society may be mentioned: language, gestures, in its
widest sense” (Sapir 1949: 104). Furthermore, gestures may play a role in determin-
ing the meaning of utterances (Sapir 1949). However, in general, gestures were not con-
sidered part of linguistics proper. Bloomfield’s depreciation of gestures may be seen as
the standard view in linguistics at that time. “To some extent, individual gestures are
conventional and differ for different communities.” But “most gestures scarcely go
beyond an obvious pointing and picturing.” (Bloomfield [1933] 1984: 39)
unstudied because it was left without a theoretical framework into which it could be
really fitted.” It “fell between stools.” (Kendon 2004b: 72)
This situation gradually changes at the beginning of the 1970s, as research on the
body, language and especially gestures returns in a number of scientific fields (e.g.,
anthropology, linguistics, and psychology), leading to the development of new perspec-
tives and theoretical frameworks for the study of gestures (see Kendon 1982, 2004b for
a detailed discussion).
Pike (1967, 1972) and Birdwhistell (1970, 1974, 1975) are among the first to put for-
ward proposals for bringing together body and language in the study of communication
from a theoretical as well as a methodological viewpoint. Pike and Birdwhistell objected
to the view of communication being divided and analyzed into discrete and indepen-
dent channels (e.g., verbal and nonverbal). Rather, communication must be under-
stood as a shared and common system of behavior patterns, a “multichannel system”
(Birdwhistell 1979: 193). Starting from this assumption, Pike and Birdwhistell both
approach the question with structuralist methods, yet each with a different aim and scope.
Pike (1967, 1972) develops a theoretical framework and method with which verbal and
nonverbal activities can be conceived of as a structural whole (Hübler 2001: 220).
The activity of man constitutes a structural whole, in such a way that it can not be subdi-
vided into neat “parts” or “levels” or “compartments” with language in a behavioral com-
partment insulated in character, content, and organization from other behavior. Verbal and
nonverbal activity is a unified whole, and theory and methodology should be organized or
created to treat it as such. (Pike 1967: 26)
Birdwhistell (1970, 1974, 1975), on the other hand, develops his program of “kinesics”, a
science of communication by bodily behavior (gestures, facial expressions, etc.). In the
view of Birdwhistell, bodily behavior can be analyzed analogously to verbal utterances.
With the help of descriptive and structural linguistics, he examines bodily behavior
on the different levels of language (phonetics, phonology, morphology, and syntax),
and subdivides bodily expressions into distinctive hierarchical units (kines, kinemes,
kinemorphs, complex kinemorphs, and kinemorph constructions).
Another step that brings the analysis of body and speech further into the focus of
attention is the work of Condon and Ogston (1966, 1967). Based on microanalyses of
motion pictures, Condon and Ogston are able to show a “precise correlation between
changes of body motion and the articulated patterns of the speech stream.” (Condon
and Ogston 1967: 227) This correlation of body movements and speech encompasses
different hierarchical levels of speech, such that body movements may align with
“sub-phones”, phones, syllables, words, phrases, and higher-level units. This systematic
co-occurrence of body movements with speech seems to be based on the segmental
breath pulse (see Furuyama 2002; Tuite 1993), leading to the impression that “the
body dances in time with speech” (Condon and Ogston 1967: 225).
This tight interrelation of speech and body movements is picked up by Ken-
don (1972), who puts forward an examination of various types of bodily movement and shows
that “just as the flow of speech may be regarded as an hierarchically ordered set of
units, so we may see the patterns of body motion that are associated with it
as organized in a similar fashion, as if each unit of speech has its ‘equivalent’ in
body motion.” (Kendon 1972: 204) Contrary to Condon and Ogston, Kendon goes
beyond the mere correlation of units of speech with units of bodily expressions. Rather,
he shows that the larger the speech unit, the more body parts are involved, and con-
cludes: “it is as if the speech production process is manifested in two forms of activity
simultaneously: in the vocal organs and also in the bodily movement, particularly
movements of the hands and arms.” (Kendon 1972: 205) Speech and gesture are
“under the guidance of the same controlling mechanism” (Kendon 1972: 206), and
even more so “appear together as manifestations of the same process of utterance.”
(Kendon 1980: 208)
A similar view is expressed by McNeill (1979, 1985, 1986). Gestures “share
with speech a computational stage; they are, accordingly, parts of the same psycho-
logical structure” (McNeill 1985: 350). Moreover, gestures allow insights into
the mental representation on which speech is based. Gestures are a “channel of
observation onto the speaker’s mental representation.” (McNeill 1986: 108; see
also McNeill 1985)
Kendon's works and in particular McNeill's (1985) claim that gestures and speech
share a "computational stage" trigger a lively debate on the relation and production
of gesture and speech (Butterworth and Hadar 1989; Feyereisen 1987; McNeill 1989).
Spurred by this debate, publications on gestures increase sharply in the 1990s, and
the topic of gesture and speech being guided by one mental process becomes the
main paradigm of the newly emerging field of gesture research.
“window onto thinking” (McNeill and Duncan 2000: 143), a way of inspecting pro-
cesses of conceptualizations and thinking, as gesture displays “mental content, and
does so instantaneously, in real-time.” (McNeill and Duncan 2000: 143)
The orientation of gesture research toward questions of conceptual processes, produc-
tion, reception, and the relation between speech and gesture, as initiated by McNeill,
constitutes one of the main topics in modern gesture research (for an overview see
for example Duncan, Cassell, and Levy 2007). Studies address how gesture and speech
relate to each other on the level of production, i.e., regarding speech-gesture production
models (de Ruiter 2000, 2006, 2007; Kita 1990, 2000; Seyfeddinipur 2006), whether ges-
tures aid the speech production process (Alibali, Kita, and Young 2000; Beattie and
Rima 1994; Beattie and Shovelton 2002b; Esposito, McCullough, and Quek 2001;
Hadar, Dar, and Teitelman 2001; Hadar and Pinchas-Zamir 2004; Krauss and Hadar
1999; Krauss, Dushay, Chen, et al. 1995; Krauss, Chen, and Chawla 1996; Mayberry
and Jaques 2000; Seyfeddinipur 2006; Tuite 1993) or are communicative themselves
and fulfill an important interactive function (Alibali and Heath 2001; Bavelas 1994;
Bavelas and Chovil 2000; Bavelas, Chovil, Coates, et al. 1995; Bavelas, Kenwood, and
Phillips 2002; Beattie and Shovelton 1999, 2002a, 2007; Furuyama 2002; Gerwing and
Bavelas 2004; Gullberg and Holmquist 1999, 2006; Iverson and Goldin-Meadow 1998;
Kimbara 2007; Kuhlen and Seyfeddinipur 2007; Levy and Fowler 2000; Özyürek
2000, 2002; Tabensky 2001). Comparisons of the conceptualization and expression of
ideas and concepts in gesture and speech, particularly across different languages (Allen,
Özyürek, Kita, et al. 2007; Duncan 1996; Kita and Özyürek 2003; Özyürek, Kita, and
Allen 2001) as well as questions on the acquisition of gestures are also being addressed
in psychological studies on gestures (Butcher, Goldin-Meadow, and McNeill 2000;
Goldin-Meadow 1993; Guidetti and Nicoladis 2008; Gullberg 1998; McCafferty 2004;
Negueruela, Lantolf, Rehn Jordan, et al. 2004). The production and reception of ges-
tures from neurological perspectives (Duncan 2002; Kelly and Goldsmith 2004; Kita
and Lausberg 2007; Lausberg, Cruz, Kita, et al. 2003; Lausberg and Kita 2003; McNeill
1992, 2005) are investigated, as are structural relations between gestures and speech
(Furuyama, Takase, and Hayashi 2002; Kita, van Gijn, and van der Hulst 1998; Loehr
2006; McClave 1998 inter alia).
Central to the work of Calbris (1990, 2003a, 2003b, 2008) is the assumption that gestures
can be decomposed into different features, i.e., physical components such as straight,
curved, or circular movements, which recur with rather stable gestural meanings. Based
on this assumption, Calbris sets up semantic groups for particular movement types,
for instance, and suggests, even further, that it is possible to set up a gesture dictionary
based on recurring gestural forms and functions. For the first time, Calbris systemati-
cally addresses the question of contrasts in the physical forms of gestures and their
possible semiotic implications by relying on a large body of different gestures from dif-
ferent discourse types.
Apart from the idea of decomposing gestures into separate features that can be
understood as building blocks of gestural forms and meanings, Calbris’ discussion of
how the representation of objects or concrete actions can serve as metaphors for abstracts
concepts is equally interesting. Gestural representations of concrete objects or actions,
according to Calbris, are representations based on abstraction: the concrete action or
object being represented in the gestures solely functions as a symbol for the concept
to be represented. “Even in evoking a concrete situation, a gesture does not reproduce
the concrete action, but the idea abstracted from the concrete reality. The dimensions of
objects are mimed according to symbolic norms." (Calbris 1990: 115) (cf. Kendon
2004b; Mittelberg 2006; Müller 1998, 2010; Streeck 2009)
Questions of gestural representation and abstraction are also addressed by Müller in
her book Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich (1998).
Gestures, according to Müller, display all three of Bühler’s (1934) language categories
and thus, just as speech, have the functions of expressing, making an appeal and repre-
senting (Müller 1998: 10; this volume). Yet, what is particular to gestures, and therefore
differentiates them from other sign systems, is the ability to represent objects, events,
and the like – a function only found in speech and gesture. Müller, thus, singles out ges-
tures with dominantly representational function and, based on these, develops four
modes of representation, i.e., processes of gestural sign creation based on the handling
of objects in the world. In the case of acting, the hands represent the performance of an
action, such as opening a window. In the case of modeling, the hands recreate an
object, for instance a bowl, by modeling it three-dimensionally. In the case of
drawing, the hand recreates the bowl not plastically in three dimensions but in a
two-dimensional way. In the case of representing, the hand itself becomes the object
to be represented, such as a finger used for writing on a piece of paper (Müller 1998:
115–120). In a recent revision of her earlier proposal, Müller refines the four modes
into two, i.e., acting and representing (Müller, Ladewig, and Teßendorf in preparation;
see also Müller this volume).
Each of the gestural modes of representation is tied to a specific form of abstraction
from the object of perception (e.g., two-dimensional vs. three-dimensional). Gestures
are thus “concrete, i.e., figuratively transformed abstractions of perceived reality and
abstract ideas.” (Müller 1998: 124, translation JB) The gestural modes thereby show
a particular perception of the observed object, driving speakers to isolate particular
aspects of its form. The different gestural modes thus lead to differing gestural representations
of objects, and even to the particular use of hand shapes and movements (Müller 1998:
120). In the case of the gestural modes, “form features of the hands, particular hand
postures and movements of the hands constitute meaning.” (Müller 1998: 120, transla-
tion JB) In connection with the modes of representation, Müller therefore also ex-
presses first thoughts on other aspects of gestural structures, such as the relation of
gestural forms and possible meanings, which is taken up in later publications (Müller 2004;
see also the section below for further studies on this aspect).
Another major publication within the interactional paradigm on language, body
and communication is Kendon’s Gesture: Visible Action as Utterance (Kendon 2004b).
The book, a synopsis of Kendon’s numerous studies and work on gestures, addresses
major research questions and foci from the perspective of gestures and speech in interac-
tion and includes semiotic, linguistic, and cultural factors of spontaneous as well as
conventionalized gestures.
Gestures, as put forward by Kendon already in his papers of 1972 and 1980, are integral
parts of the utterance "and are partnered with speech as a part of the speaker's final
product.” (Kendon 2004b: 5) Starting from this assumption, Kendon examines the rela-
tion of speech and gesture in its diversity under the overarching theme of speech and
gesture jointly creating utterance ensembles. Kendon examines these utterance ensem-
bles in a variety of aspects. By studying the structural relation of speech and gesture, i.e.,
the position and timing of gestures and speech, Kendon points out the joint creation of
speech and gestures, leading to a mutual adaptation of the two modes to each other. Gestural
expressions can be modified to fit the structure of speech, and speech may be modified
to meet the requirement of gestures (Kendon 2004b: 135ff.; see also Seyfeddinipur
2006). Furthermore, the variety and differences in the range of semantic relations of
speech and gesture as well as the semantic interaction of both modes are foci of Ken-
don's work (Kendon 2004b: 158ff., 176ff.). Gestures fulfill a variety of functions, depend-
ing on the communicative context. Similar to Müller, Kendon therefore proposes a
functional classification of gestures including referential, pragmatic, and interactive
gestures (Kendon 2004b: 158ff.). Yet, a particular focus is placed on gestures with prag-
matic function (see also Kendon 1995, 2004a). Kendon aims at a thorough investigation
of the range of forms and functions of pragmatic gestures and thereby develops the idea
of so-called "gesture families".
Using this concept of the gesture family, Kendon investigates several families of ges-
tures, such as the grappolo or bunch, the ring, and gestures of the open hand, and is
able to show that functional differences in the different gestures correspond to
differences in the type and manner of their execution. Similar to Calbris (1990, 2003a,
2003b, 2008), Kendon is able to semantically group gestures based on differences in
their forms. He shows that features such as hand shapes and movement seemingly con-
trast with others in order to reveal differences in meaning. The idea that gestures are
made up of features used recurrently by speakers, features that partially carry and convey
meaning on their own, is one of the major tenets of Kendon's work.
In her study on the palm up open hand, Müller (2004) proposes a similar analysis yet
takes the conclusions about the relation of form, meaning, and function in gestures a bit
further. Based on the idea of the gesture family, Müller is able to show that the group of
gestures using the flat hand is structured around a formational and functional core
(hand shape and orientation of the palm), while different types of movement (e.g., ro-
tating, lateral) serve to differentiate meaning internally. The variations in form, ac-
cording to Müller, are thereby not idiosyncratic, but rather correspond to a
closed set of forms, as they only involve the feature "movement". Concluding, Müller
adds:
the fact that the incorporated features appear to be semiotically unrelated to the features
of hand shape and orientation suggests that indeed two independent form-function ele-
ments are joined in one gesture: the Palm Up Open Hand (hand shape and orientation)
with Rotating or Lateral Motion (motion pattern). This indeed points towards a rudimen-
tary morphology based on purely iconic principles in pragmatic or performative co-speech
gestures. (Müller 2004: 254)
in gestures a step further (see Birdwhistell 1970; Becker 2004, 2008; Kendon 2004a;
Müller 2004) and incorporates it into an overall multimodal theory of grammar. Fricke
argues for the semanticization and typification of gestures, the existence of recursive
structures in gestures as well as the applicability of constituent structures to the linear
progression of gestures. Moreover, gestures have the potential to be used in an attrib-
utive function with respect to the verbal utterance, as they limit the extension of the noun
referent (Fricke 2012) in the same way as attributes do in speech. Studies approaching gestures
from the perspective of a grammar of gesture and/or a multimodal grammar make up
a still new field in gesture studies.
in preparation), but also for interactive processes in the production and perception of
gestures (Ladewig and Teßendorf in preparation).
7. Conclusion
In the course of time, and in particular since the 1970s, the study of the body, language,
and communication has evolved into an independent area of research. The journal
GESTURE, founded by Adam Kendon and Cornelia Müller, has celebrated its 10th
anniversary, and the International Society for Gesture Studies (ISGS) will be hosting
its sixth conference. In the past years, a number of research centers focusing on gestures
have been established, such as the Berlin Gesture Center, the Nijmegen Gesture Center
at the Max Planck Institute for Psycholinguistics, and the Manchester Gesture Center.
The field is nowadays a wanderer between disciplines, and first and foremost character-
ized by its interdisciplinary nature. Studies on gesture and other bodily forms of behav-
ior are finding attentive ears in several disciplines and their respective journals. The
field's interdisciplinary approach has become one of its major strengths, turning it
into an innovative area of research.
8. References
Alibali, Martha W. and Dana C. Heath 2001. Effects of visibility between speaker and listener on
gesture production: Some gestures are meant to be seen. Journal of Memory and Language 44:
169–188.
Alibali, Martha W., Sotaro Kita and Amanda Young 2000. Gesture and the process of speech pro-
duction: We think, therefore we gesture. Language and Cognitive Processes 15(6): 593–613.
Allen, Shanley, Asli Özyürek, Sotaro Kita, Amanda Brown, Reyhan Furman and Tomoko Ishi-
zuka 2007. Language-specific and universal influences in children’s syntactic packaging of Man-
ner and Path: A comparison of English, Japanese, and Turkish. Cognition 102: 16–48.
Allport, Gordon W. and Philip E. Vernon 1933. Studies in Expressive Movement. New York:
MacMillan.
Arendsen, Jeroen, Andrea J. van Doorn and Huib de Ridder 2007. When and how well do people
see the onset of gestures? Gesture 7(3): 305–342.
Argyle, Michael 1989. Körpersprache und Kommunikation. Paderborn, Germany: Junfermann.
Bateson, Gregory 1968. Redundancy and coding. In: Thomas A. Sebeok (ed.), Animal Communication:
Techniques of Study and Results of Research, 614–626. Bloomington: Indiana University Press.
Bavelas, Janet Beavin 1994. Gestures as part of speech: Methodological implications. Research on
Language and Social Interaction 27(3): 201–221.
Bavelas, Janet Beavin and Nicole Chovil 2000. Visible acts of meaning. An integrated message model
of language in face-to-face dialogue. Journal of Language and Social Psychology 19(2): 163–193.
Bavelas, Janet Beavin, Nicole Chovil, Linda Coates and Lori Roe 1995. Gestures specialized for
dialogue. Personality and Social Psychology Bulletin 21(4): 394–405.
Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive
Gestures. Discourse Processes 15: 469–489.
Bavelas, Janet Beavin, Trudy Johnson Kenwood and Bruce Phillips 2002. An experimental study
of when and how speakers use gesture to communicate. Gesture 2(1): 1–17.
Beattie, Geoffrey and Aboudan Rima 1994. Gesture, pauses and speech. An experimental inves-
tigation of the effects of changing social context on their precise temporal relationships. Semi-
otica 99(3/4): 239–272.
Beattie, Geoffrey and Heather Shovelton 1999. Do iconic hand gestures really contribute anything
to the semantic information conveyed by speech? An experimental investigation. Semiotica
123(1/2): 1–30.
Beattie, Geoffrey and Heather Shovelton 2002a. Iconic hand gestures and the predictability of
words in context in spontaneous speech. British Journal of Psychology 91: 473–491.
Beattie, Geoffrey and Heather Shovelton 2002b. Lexical access in talk: A critical consideration of
transitional probability and word frequency as possible determinants of pauses in spontaneous
speech. Semiotica 141(1/4): 49–71.
Beattie, Geoffrey and Heather Shovelton 2007. The role of iconic gesture in semantic communi-
cation and its theoretical and practical implications. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language (vol. 1), 221–241.
Philadelphia: John Benjamins.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis. Institut für deutsche
und niederländische Philologie, Freie Universität Berlin.
Becker, Karin 2008. Four-feature-scheme of gesture: Form as the basis of description. Unpub-
lished manuscript.
Bergmann, Kirsten and Stefan Kopp 2006. Verbal or visual? How information is distributed across
speech and gesture in spatial dialog. In: David Schlangen and Raquel Fernandez (eds.), Pro-
ceedings of brandial 2006, the 10th Workshop on the Semantics and Pragmatics of Dialogue,
90–97. Potsdam.
Bergmann, Kirsten and Stefan Kopp 2007. Co-expressivity of speech and gesture: Lessons for
models of aligned speech and gesture production. Symposium at the AISB Annual Conven-
tion: Language, Speech and Gesture for Expressive Characters, 153–158.
Birdwhistell, Ray 1970. Kinesics and Context. Pennsylvania: University of Pennsylvania Press.
Birdwhistell, Ray 1974. The language of the body: The natural environment of words. In: A. Sil-
verstein (ed.), Human Communication, 203–220. Hillsdale, NJ: Lawrence Erlbaum.
Birdwhistell, Ray 1975. Background considerations of the study of the body as a medium of
“expression”. In: Jonathan Benthall and Ted Polhemus (eds.), The Body as a Medium of
Expression, 34–58. New York: Dutton.
Birdwhistell, Ray 1979. Kinesik. In: Klaus R. Scherer and Harald G. Wallbott (eds.), Nonverbale Kom-
munikation. Forschungsberichte zum Interaktionsverhalten, 102–202. Weinheim, Germany: Beltz.
Bloomfield, Leonard 1984. Language. New York: Holt. First published [1933].
Bohle, Ulrike 2007. Das Wort Ergreifen, das Wort Übergeben: Explorative Studie zur Rolle
Redebegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation. Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt (Oder).
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases. Semiotica 184(1/4): 53–91.
Bressem, Jana and Cornelia Müller volume 2. The family of AWAY gestures. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva Ladewig, David McNeill and Jana Bressem (eds.), Body – Language –
Communication. An International Handbook on Multimodality in Human Interaction (Handbooks
of Linguistics and Communication Science 38.2). Berlin and Boston: De Gruyter Mouton.
Brookes, Heather 2001. O clever ‘He’s streetwise’. When gestures become quotable. Gesture 1(2):
167–184.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Brookes, Heather 2005. What gestures do: Some communicative functions of quotable gestures in
conversations among Black urban South Africans. Journal of Pragmatics 32: 2044–2085.
Bruner, Jerome S. and Renato Tagiuri 1954. The perception of people. In: Gardner Lindzey (ed.),
Handbook of Social Psychology, Vol. 2, 634–654. Reading, MA: Addison-Wesley.
Bühler, Karl 1934. Sprachtheorie. Die Darstellungsfunktion der Sprache. Jena, Germany: Gustav
Fischer.
Butcher, Cynthia, Susan Goldin-Meadow and David McNeill 2000. Gesture and the transition
from one to two word speech: When hand and mouth come together. In: David McNeill
(ed.), Language and Gesture, 235–258. New York: Cambridge University Press.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96(1): 168–174.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003a. From cutting an object to a clear cut analysis. Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2003b. Multireferentiality of coverbal gestures. In: Monica Rector, Isabella
Poggi and Nadine Trigo (eds.), Gestures, Meaning and Use, 203–207. Porto: Universidade Fer-
nando Pessoa.
Calbris, Geneviève 2008. From left to right…: Coverbal gestures and their symbolic use of space.
In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 27–53. Amsterdam: John
Benjamins.
Cassell, Justine and Scott Prevost 1996. Distribution of semantic features across speech and ges-
ture by humans and computers. In: Proceedings of the Workshop on the Integration of Gesture
in Language and Speech. October 7–8, Wilmington.
Cassell, Justine, Matthew Stone, Brett Douville, Scott Prevost, Brett Achorn, Mark Steedman,
Norm Badler and Catherine Pelachaud 1994. Modeling the interaction between speech
and gesture. In: Ashwin Ram and Kurt Eiselt (eds.), Proceedings of the Sixteenth Annual Con-
ference of the Cognitive Science Society, 153–158. Atlanta, Georgia (USA): Lawrence Erlbaum.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphoric expres-
sions. In: Jean-Pierre Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford: Center for the Study of Language and Information Publication.
Cienki, Alan 1998b. Straight: An image schema and its transformations. Cognitive Linguistics 9:
107–149.
Cienki, Alan 2005. Image schemas and gestures. In: Beate Hampe and Joseph E. Grady (eds.), From
Perception to Meaning: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter
Mouton.
Cienki, Alan 2008. Why study metaphor and gesture? In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–25. Amsterdam: John Benjamins.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John Benjamins.
Condon, William S. and William D. Ogston 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143(4): 338–347.
Condon, William S. and William D. Ogston 1967. A segmentation of behavior. Journal of Psychia-
tric Research 5: 221–235.
de Jorio, Andrea 2000. Gesture in Naples and gesture in classical antiquity. A translation of La
mimica degli antichi investigata nel gestire napoletano (Fibreno, Naples [1832]) and with an
introduction and notes by Adam Kendon. Bloomington: Indiana University Press.
de Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.), Lan-
guage and Gesture, 284–311. Cambridge: Cambridge University Press.
de Ruiter, Jan Peter 2006. Can gesticulation help aphasic people speak, or rather, communicate?
Advances in Speech-Language Pathology 8(2): 124–127.
de Ruiter, Jan Peter 2007. Postcards from the mind: The relationship between speech, imagistic
gesture and thought. Gesture 7(1): 21–38.
Deppermann, Arnulf and Reinhold Schmitt 2007. Koordination. Zur Begründung eines neuen
Forschungsgegenstandes. In: Reinhold Schmitt (ed.), Koordination. Analysen zur Multimoda-
len Interaktion, 15–54. Tübingen, Germany: Narr.
Duncan, Susan D. 1996. Grammatical form and “thinking for speaking” in Mandarin Chinese and
English: An analysis based on speech-accompanying gestures. Ph.D. dissertation, Department
of Psychology, University of Chicago.
Duncan, Susan D. 2002. Gesture, verb aspect, and the nature of iconic imagery in natural dis-
course. Gesture 2(2): 183–206.
Duncan, Susan D., Justine Cassell, and Elena T. Levy (eds.) 2007. Gesture and the Dynamic
Dimension of Language: Essays in Honor of David McNeill. Philadelphia: John Benjamins.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1: 49–98.
Esposito, Anna, Karl E. McCullough and Francis Quek 2001. Disfluencies in gesture: Gestural
correlates to filled and unfilled speech pauses. Proceedings of the IEEE Workshop on Cues in
Communication, Kauai, Hawaii.
Feyereisen, Pierre 1987. Gestures and speech, interactions and separations: A reply to McNeill.
Psychological Review 94(4): 493–498.
Fornel, Michel de 1992. The return gesture: Some remarks on context, inference, and iconic ges-
ture. In: Peter Auer and Aldo di Luzio (eds.), The Contextualization of Language, 159–176.
Amsterdam: John Benjamins.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter Mouton.
Fricke, Ellen volume 1. Towards a unified grammar of gesture and speech: A multimodal
approach. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig and David McNeill
(eds.), Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1). Berlin:
De Gruyter Mouton.
Furuyama, Nobuhiro 2002. Prolegomena of a theory of between-person coordination of speech
and gesture. International Journal of Human-Computer Studies 57: 347–374.
Furuyama, Nobuhiro, Hiroki Takase and Koji Hayashi 2002. An ecological approach to intra- and
inter-personal coordination of speech, gesture, and breathing movements. Proceedings of the
First International Workshop on Man-Machine Symbiotic Systems, 169–199. Kyoto, Japan.
Gerwing, Jennifer and Janet Beavin Bavelas 2004. Linguistic influences on gesture’s form. Gesture
4: 157–194.
Gibbs, Raymond W., Jr. 1994. The Poetics of Mind: Figurative Thought, Language, and Under-
standing. New York: Cambridge University Press.
Goldin-Meadow, Susan 1993. When does gesture become language? A study of gesture used as a
primary communication system by deaf children of hearing parents. In: Kathleen Rita Gibson
and Tim Ingold (eds.), Tools, Language and Cognition in Human Evolution, 63–85. Cambridge:
Cambridge University Press.
Goodwin, Charles 2000a. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2000b. Practices of seeing: Visual analysis: An ethnomethodological approach. In:
Theo van Leeuwen and Carey Jewitt (eds.), Handbook of Visual Analysis, 157–182. London: Sage.
Goodwin, Charles 2003. Pointing as situated practice. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture and Cognition Meet, 217–241. Hillsdale, NJ: Erlbaum.
Guidetti, Michele and Elena Nicoladis 2008. Introduction to special issue: Gestures and commu-
nicative development. First Language 28(2): 107–115.
Gullberg, Marianne 1998. Gesture as a Communication Strategy in Second Language Discourse: A
Study of Learners of French and Swedish. Lund, Sweden: Lund University Press.
Gullberg, Marianne and Kenneth Holmquist 1999. Keeping an eye on gestures: Visual perception
of gestures in face-to-face communication. Pragmatics and Cognition 7(1): 35–63.
Gullberg, Marianne and Kenneth Holmquist 2006. What speakers do and what listeners look at.
Visual attention to gestures in human interaction live and on video. Pragmatics and Cognition
14(1): 53–82.
Hadar, Uri, Rivka Dar and Amit Teitelman 2001. Gesture during speech in first and second lan-
guage: Implications for lexical retrieval. Gesture 1(2): 151–165.
Hadar, Uri and Lian Pinchas-Zamir 2004. The semantic specificity of gesture: Implications
for gesture classification and function. Journal of Language and Social Psychology 23(2):
204–214.
Harling, Philip and Alistair Edwards 1997. Hand tension as a gesture segmentation cue. In: Philip
Harling and Alistair Edwards (eds.), Progress in Gestural Interaction. Proceedings of Gesture
Workshop ’96, 75–88. Berlin: Springer.
Hayashi, Makoto, Junko Mori and Tomoyo Takagi 2002. Contingent achievement of co-tellership
in a Japanese conversation: An analysis of talk, gaze, and gesture. In: Cecilia E. Ford, Barbara
A. Fox, and Sandra A. Thompson (eds.), The Language of Turn and Sequence, 81–122. Oxford:
Oxford University Press.
Heath, Christian 1992. Gesture’s discreet tasks: Multiple relevancies in visual conduct and in the
contextualization of language. In: Peter Auer and Aldo di Luzio (eds.), The Contextualization
of Language, 101–127. Amsterdam: John Benjamins.
Heath, Christian and Paul Luff 1992. Collaboration and control: Crisis management and multime-
dia technology in London Underground Line Control Rooms. Journal of Computer Supported
Cooperative Work 1(1–2): 69–94.
Hübler, Axel 2001. Das Konzept “Körper” in den Sprach- und Kommunikationswissenschaften.
Tübingen: Francke.
Hymes, Dell 1962. The ethnography of speaking. In: Thomas Gladwin and William C. Sturtevant
(eds.), Anthropology and Human Behavior, 13–53. Washington, DC: Anthropological Society
of Washington DC.
Iverson, Jana and Susan Goldin-Meadow 1998. The Nature and Functions of Gesture in Children’s Com-
munication: New Directions for Child and Adolescent Development. San Francisco: Jossey-Bass.
Johnson, Mark 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason.
Chicago: University of Chicago Press.
Kelly, Spencer and Leslie Goldsmith 2004. Gesture and right hemisphere involvement in evaluat-
ing lecture material. Gesture 4: 25–42.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Perga-
mon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–277. The Hague: Mouton.
Kendon, Adam 1982. The study of gestures: Some observations on its history. Recherches Semio-
tiques 2(1): 44–62.
Kendon, Adam 1990. Conducting Interaction. Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2004a. Contrasts in gesticulation: A Neapolitan and a British speaker compared.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday
Gestures, 173–193. Berlin: Weidler.
Kendon, Adam 2004b. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
Kettebekov, Sanshzar and Rajeev Sharma 2001. Toward natural gesture/speech control of a large
display. In: Roderick Little and Laurence Nigay (eds.), Engineering for Human-Computer
Interaction, 221–234. Heidelberg: Springer.
Kimbara, Irene 2007. Indexing locations in gesture: Recalled stimulus image and interspeaker
coordination as factors influencing gesture form. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language (vol. 1), 213–220.
Philadelphia: John Benjamins.
Kipp, Michael 2004. Gesture Generation by Imitation: From Human Behavior to Computer Char-
acter Animation. Boca Raton, FL: Dissertation.com.
Kipp, Michael, Michael Neff, Kerstin H. Kipp and Irene Albrecht 2007. Towards natural gesture
synthesis: Evaluating gesture units in a data-driven approach to gesture synthesis. Intelligent
Virtual Agents 7: 15–28.
Kita, Sotaro 1990. The Temporal Relationship between Gesture and Speech: A Study of Japanese-
English Bilinguals. Chicago: University of Chicago.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Kita, Sotaro and Hedda Lausberg 2007. Speech-gesture discoordination in split brain patients’
left-hand gestures: Evidence for right-hemispheric generation of co-speech gestures. Cortex
8: 131–139.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Kopp, Stefan, Kirsten Bergmann and Ipke Wachsmuth 2008. Multimodal communication from
multimodal thinking—towards an integrated model of speech and gesture production. Interna-
tional Journal of Semantic Computing 2(1): 115–136.
Kopp, Stefan, Paul Tepper and Justine Cassell 2004. Towards an integrated microplanning of lan-
guage and iconic gesture for multimodal output. In: International Conference on Multimodal
Interaction, October 13–15, 2004. State College, Pennsylvania.
Kopp, Stefan, Paul Tepper, Kimberley Ferriman, Kristina Striegnitz and Justine Cassell 2007.
Trading spaces: How humans and humanoids use speech and gesture to give directions. In:
Toyoaki Nishida (ed.), Conversational Informatics. An Engineering Approach, 133–160. Chi-
chester: John Wiley.
Kopp, Stefan and Ipke Wachsmuth 2004. Synthesizing multimodal utterances for conversational
agents. Computer Animation and Virtual Worlds 15(1): 39–52.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonverbal
communication: What do conversational hand gestures tell us? In: Mark Zanna (ed.),
Advances in Experimental Social Psychology, 389–450. San Diego: Academic Press.
Krauss, Robert, Robert Dushay, Yihsiu Chen and Francis Rauscher 1995. The communicative
value of conversational hand gestures. Journal of Experimental Social Psychology 31: 533–552.
Krauss, Robert M. and Uri Hadar 1999. The role of speech-related arm/hand gestures in word
retrieval. In: Lynn S. Messing and Ruth Campbell (eds.), Gesture, Speech and Sign, 93–116.
Oxford: Oxford University Press.
Krout, Maurice H. 1935. The social and psychological significance of gestures (a differential ana-
lysis). The Pedagogical Seminary and Journal of Genetic Psychology 47: 385–412.
Kuhlen, Anna and Mandana Seyfeddinipur 2007. From speaker to speaker: Repeated gestures
across speakers. Paper presented at the Conference Berlin Gesture Center Colloquium, August
29, 2007. Berlin.
Ladewig, Silva H. 2006. Die Kurbelgeste – konventionalisierte Markierung einer kommunikativen
Aktivität. MA thesis, Institut für deutsche und niederländische Philologie, Freie Universität
Berlin.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6.
http://cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation. Faculty of Social and Cultural Sciences, Euro-
pean University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium ‘hand’: Discover-
ing recurrent structures in gestures. Semiotica.
Ladewig, Silva H. and Sedinha Teßendorf in preparation. Collaborative metonymy – Co-construction
meaning and reference in gestures.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Latoschik, Marc Erich 2000. Multimodale Interaktion in Virtueller Realität am Beispiel der Virtuel-
len Konstruktion. Bielefeld, Germany: Technische Universität Bielefeld.
Lausberg, Hedda, Robyn F. Cruz, Sotaro Kita, Eran Zaidel and Alan Ptito 2003. Pantomime to
visual presentation of objects: Left hand dyspraxia in patients with complete callosotomy.
Brain 126: 343–360.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86: 57–69.
LeBaron, Curtis and Jürgen Streeck 2000. Gestures, knowledge, and the world. In: David McNeill
(ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Levy, Elena T. and Carol A. Fowler 2000. The role of gestures and other graded language forms in
the grounding of reference in perception. In: David McNeill (ed.), Language and Gesture,
215–234. Cambridge: Cambridge University Press.
Loehr, Dan 2006. Gesture and Intonation. Washington, DC: Georgetown University Press.
Mallery, Garrick 1972. Sign Language among North American Indians Compared with that among
Other Peoples and Deaf-Mutes. The Hague: Mouton. First published [1881].
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, University of Pennsylvania, Pennsylvania.
Matschnig, Monika 2007. Körpersprache: Verräterische Gesten und Wirkungsvolle Signale. Mu-
nich: Gräfe und Unzer.
Mayberry, Rachel I. and Joselynne Jaques 2000. Gesture production during stuttered speech. In:
David McNeill (ed.), Language and Gesture, 199–214. Cambridge: Cambridge University Press.
McCafferty, Stephen G. 2004. Space for cognition: Gesture and second language learning. Interna-
tional Journal of Applied Linguistics 14(1): 148–165.
McClave, Evelyn 1998. Pitch and manual gesture. Journal of Psycholinguistic Research 27(1): 69–89.
McNeill, David 1979. The Conceptual Basis of Language. Hillsdale, NJ: Erlbaum.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1986. Iconic gestures of children and adults. Semiotica 62: 107–128.
McNeill, David 1989. A straight path – to where? Reply to Butterworth and Hadar. Psychological
Review 96(1): 175–179.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David 2008. Unexpected metaphors. In: Alan Cienki and Cornelia Müller (eds.), Meta-
phor and Gesture, 185–199. Amsterdam: John Benjamins.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking-for speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for multi-
modal models of grammar. Ph.D. dissertation, Cornell University. Ann Arbor, MI: UMI.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyv Evans
and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and New Direc-
tions, 351–385. London: Equinox.
Mittelberg, Irene and Linda Waugh 2009. Multimodal figures of thought: A cognitive-semiotic
approach to metaphor and metonymy in co-speech gesture. In: Charles Forceville and Eduardo
Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter.
Mondada, Lorenza 2007. Interaktionsraum und Koordinierung. In: Reinhold Schmitt (ed.), Koor-
dination. Analysen zur Multimodalen Interaktion, 55–94. Tübingen: Narr.
Mondada, Lorenza and Reinhold Schmitt 2009. Situationseröffnungen. Zur Multimodalen Herstel-
lung Fokussierter Interaktion. Tübingen: Narr.
Müller, Cornelia 1994. Cómo se llama …? Kommunikative Funktionen des Gestikulierens in
Wortsuchen. In: Peter Paul König and Helmut Wiegers (eds.), Satz-Text-Dikurs, 71–80. Tübin-
gen, Germany: Niemeyer.
Özyürek, Asli 2000. The influence of addressee location on spatial language and representational
gestures of direction. In: David McNeill (ed.), Language and Gesture, 64–83. Cambridge: Cam-
bridge University Press.
Özyürek, Asli 2002. Do speakers design their cospeech gestures for their addressees? The effects of
addressee location on representational gestures. Journal of Memory and Language 46: 688–704.
Özyürek, Asli, Sotaro Kita and Shanley Allen 2001. Tomato man movies: Stimulus kit designed to
elicit manner, path and causal constructions in motion events with regard to speech and ges-
tures. Nijmegen, the Netherlands: Max Planck Institute for Psycholinguistics, Language and
Cognition group.
Panther, Klaus-Uwe and Linda Thornburg 2007. Metonymy. In: Dirk Geeraerts and Hubert Cuyckens
(eds.), The Oxford Handbook of Cognitive Linguistics, 236–264. Oxford: Oxford University Press.
Parrill, Fey 2008. Form, meaning and convention: An experimental examination of metaphoric
gestures. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 225–247. Amster-
dam: John Benjamins.
Pease, Allan and Barbara Pease 2003. Der tote Fisch in der Hand und andere Geheimnisse der Kör-
persprache. Berlin: Ullstein.
Pike, Kenneth 1967. Language in Relation to a Unified Theory of the Structure of Human Behav-
ior. The Hague: Mouton.
Pike, Kenneth 1972. Towards a theory of the structure of human behavior. In: Ruth M. Brend
(ed.), Kenneth L. Pike – Selected Writings, 106–116. The Hague/Paris: Mouton.
Quek, Francis, David McNeill, Robert Bryll, Susan Duncan, Xin-Feng Ma, Cemil Kirbas, Karl E.
McCullough and Rashid Ansari 2002. Multimodal human discourse: Gesture and speech. Asso-
ciation for Computing Machinery, Transactions on Computer-Human Interaction 9(3): 171–193.
Ruesch, Jürgen and Gregory Bateson 1951. Communication: The Social Matrix of Psychiatry. New
York: Norton.
Ruesch, Jürgen and Weldon Kees 1956. Nonverbal Communication: Notes on the Visual Percep-
tion of Human Relations. Berkeley: University of California Press.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1977. The preference for self-correction
in the organization of repair in conversation. Language 53: 361–382.
Sapir, Edward 1949. Selected Writings in Language, Culture and Personality. Edited by David
Mandelbaum. Berkeley: University of California Press.
Scheflen, Albert 1973. Communicational Structure. Bloomington: Indiana University Press.
Scherer, Klaus R. and Harald G. Wallbott 1984. Nonverbale Kommunikation. Forschungsberichte
zum Interaktionsverhalten. Weinheim: Beltz.
Schmitt, Reinhold 2005. Zur multimodalen Struktur von turn-taking. Gesprächsforschung 6: 17–61
(www.gespraechsforschung-ozs.de).
Schmitt, Reinhold (ed.) 2007a. Koordination. Analysen zur Multimodalen Interaktion. Tübingen:
Narr.
Schmitt, Reinhold 2007b. Von der Konversationsanalyse zur Analyse multimodaler Interaktion.
In: Heidrun Kämper and Ludwig M. Eichinger (eds.), Sprach-Perspektiven, 395–417. Tübingen:
Narr.
Schmitt, Reinhold and Arnulf Deppermann 2007. Monitoring und Koordination als Voraussetzun-
gen der multimodalen Konstitution von Interaktionsräumen. In: Reinhold Schmitt (ed.), Koor-
dination. Analysen zur Multimodalen Interaktion, 95–128. Tübingen: Narr.
Schönherr, Beatrix 1997. Syntax – Prosodie – nonverbale Kommunikation. Empirische Untersu-
chungen zur Interaktion Sprachlicher und Parasprachlicher Ausdrucksmittel im Gespräch. Tü-
bingen, Germany: Niemeyer.
Schönherr, Beatrix 2001. Paraphrasen in gesprochener Sprache und ihre Kontextualisierung durch
prosodische und nonverbale Signale. Zeitschrift für Germanistische Linguistik 29: 332–363.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the “Pistol
Hand”. In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Every-
day Gestures, 205–216. Berlin: Weidler.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting Speech and Gesture. Max Planck Institute
Series in Psycholinguistics, 39. Nijmegen, NL: Radboud Universiteit Nijmegen.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Berlin:
Akademische Verlagsgesellschaft.
Sparhawk, Carol 1978. Contrastive-Identificational features of Persian gesture. Semiotica 24(1/2):
49–86.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.
Streeck, Jürgen 1994. Gesture as communication II: The audience as co-author. Research on Lan-
guage and Social Interaction 27(3): 239–267.
Streeck, Jürgen 1995. On projection. In: Esther Goody (ed.), Social Intelligence and Interaction,
87–110. Cambridge: Cambridge University Press.
Streeck, Jürgen 1996. How to do things with things. Objets trouvés and symbolization. Human Stu-
dies 19: 365–384.
Streeck, Jürgen 2007. Geste und verstreichende Zeit. Innehalten und Bedeutungswandel der “bieten-
den Hand”. In: Heiko Hausendorf (ed.), Gespräch als Prozess, 157–180. Tübingen: Narr.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer and
Aldo di Luzio (eds.), The Contextualization of Language, 138–158. Amsterdam: John Benjamins.
Sweetser, Eve 1998. Regular Metaphoricity in Gesture: Bodily-Based Models of Speech Interaction.
London: Elsevier.
Sweetser, Eve and Fey Parrill 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Tabensky, Alexis 2001. Gesture and speech rephrasings in conversation. Gesture 1(2): 213–235.
Taub, Sarah F. 2001. Language from the Body. Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Tepper, Paul, Stefan Kopp and Justine Cassell 2004. Context in context: Generating language and
iconic gesture without a gestionary. AAMAS ’04 Workshop on Balanced Perception and Action
for Embodied Conversational Agents: 79–86.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de
Barrer”. MA thesis, Institut für deutsche und niederländische Philologie, Freie Universtität Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures–combining functional with cognitive
approaches. Unpublished manuscript.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1/2): 83–105.
Tylor, Edward B. 1964. Researches into the Early History of Mankind and the Development of Civ-
ilization. Edited and abridged with an introduction by Paul Bohannan. Chicago: University of
Chicago Press. First published [1870].
Wachsmuth, Ipke 2005. Multimodale Interaktion in Mensch-Maschine-Systemen. In: Christiane
Steffens, Manfred Thüring and Leon Urbas (eds.), Zustandserkennung und Systemgestaltung.
6. Berliner Werkstatt Mensch-Maschine-Systeme, 1–6. Düsseldorf: VDI.
Watzlawick, Paul, Janet Beavin Bavelas and Don D. Jackson 1969. Menschliche Kommunikation –
Formen, Störungen, Paradoxien. Bern: Huber.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. In: Lynn Messing (ed.), Proceed-
ings of WIGLS. The Workshop on the Integration of Gesture in Language and Speech. October
7–8, 1996, 79–95. Newark, Delaware: Applied Science and Engineering Laboratories.
Webb, Rebecca 1998. The lexicon and componentiality of American metaphoric gestures. In:
Christian Cave, Isabelle Guaitelle and Serge Santi (eds.), Oralité et Gestualité: Communication
Multimodale, Interaction, 387–391. Montreal: L’Harmattan.
Weinrich, Lotte 1992. Verbale und Nonverbale Strategien in Fernsehgesprächen: Eine Explorative
Studie. Tübingen: Niemeyer.
Williams, Robert F. 2004. Making meaning from a clock: Material artifacts and conceptual blend-
ing in time-telling instruction. Ph.D. dissertation, Department of Cognitive Science, University
of California, San Diego.
Wolff, Charlotte 1945. A Psychology of Gesture. London: Methuen.
Wundt, Wilhelm 1921. Die Sprache. Völkerpsychologie: Eine Untersuchung der Entwicklungsge-
setze von Sprache, Mythus und Sitte. Erster Band. Leipzig: Kröner.
Abstract
To talk about artistic dance as a language is a delicate issue. It is especially sensitive
because language obviously refers in the first place to communication, thereby evoking
the impression that dance is per se "understandable" in a global sense, since it uses not
words but bodily motion to express itself in front of an audience. Though protagonists
of Modern Dance, such as Mary Wigman, talk about the “language of dance” (Wigman
1963), the modes of transmitting contents or meaning are very different from the verbal
mode. Even the idea of meaning as a hermeneutic category has been partly neglected in
European contemporary dance since the 1990s.
While dance in the Baroque era and in classical ballet from the 19th century onwards
still holds on to movement as a codified language, choreographers of Modern
Dance in the 20th century are less focused on a vocabulary identifiable according to
aesthetic conventions and instead approach the modes of language in a more expressive
way. The following foray through dance from the Baroque until today delineates
developments of the notion of dance as a codified system across the different epochs.
significations of the body that went with it. Regarded as burlesque dance, this period
was merely a short interlude in the history of the performing arts, and afterwards
dance was again confined to its role as an interlude within operas (Franko
1993: 3–5). Art in the Baroque era served a highly social function. José Antonio Mar-
avall, whom Franko cites, characterizes art of that time as "the gesticulating submission
of the individual to the confines of the social order” (Franko 1993: 4). In France, “per-
mission” for those expressions was given by the king, namely Louis XIV, who estab-
lished the court ballet in order to legitimate “the monarch in his double status as
real and ideal body,” thus serving as the warrant of political power (Franko 1993: 3;
Lee 2002: 66). Along with this representational turn, Sarah R. Cohen highlights the
“[t]ransformation of aristocratic performance into an aesthetic product that a wider
public could appropriate" (Cohen 2000: 166). As a further effect, rules were established
that turned dance into an art form in need of professionals, thus – along with its re-
ception as art on stage – removing it from the realm of social amusement physically avail-
able to everybody (Jeschke 1996: 93). Under the reign of Louis XIV, dance
developed into a regulated system of steps and positions, assisted by Pierre Rameau’s
determination of the five foot positions in ballet, recorded in 1725 (Rameau 1967:
11–22), as well as the signification of designated steps like “balancé, jeté, pirouette
[and] entrechat” that still exist today (Lee 2002: 82).
Already a century before, dance was understood as a written text and as a represen-
tation of aristocratic power (Franko 1993: 36). Accordingly, stage dance produced a dis-
ciplined body executing strict codes of movement, fixed in written, choreo-graphed instructions in the sense of the written word, such as the famous Orchesography by
Thoinot Arbeau in 1589 (Lepecki 2006: 6–7). Consequently, André Lepecki formulates
the assumption that “the modern body revealed itself as a linguistic entity” (Lepecki
2006: 7). Exemplary of this phenomenon in early Baroque dance is the so-called geometrical dance, which focused on specific shapes and figures of the moving body that the audience was supposed to literally “read” as a text (Franko 1993:
15–16), supported by a flattened dimension of the event presented on stage: “Geomet-
rical dance […] acquired its name from geometrical and symbolic patterns that were de-
signed to be seen from above as if they were horizontal or flat on a page” (Franko 1993:
21). The dancers’ bodies even formed words executed through specific postures
(Jeschke 1996: 89): A prominent printed example is the Alfabeto Figurato by Giovanni
Battista Bracelli in 1624 (Fig. 27.1; Franko 1993: 17). Thus dance in the Baroque be-
comes a regulated model of reading as well as a written, choreo-graphed text itself,
at least since Raoul Auger Feuillet established a notational system for dance (Feuillet
1700, Jeschke 1996: 91).
From the middle of the 17th century, dance manuals appeared, especially in France,
Germany and Great Britain, that gave instructions and thereby codified the danced
movement on stage, generating “[…] a technology that creates a body disciplined to
move according to the commands of writing” (Lepecki 2006: 6). Thus, stage dance
was established as an art form following fixed rules and patterns according to positions
of the body itself, the steps as well as arms, hands and fingers. Apart from regulating
movement according to the social system, an important goal was to give dance the sta-
tus of an accepted art form: Claude François Ménéstrier claims that dance expresses
itself through movement in the same way that painting does so through its own specific
means (Ménéstrier 1682: 41). The system now established by various authors regulates
the body’s motion in every possible position.
Fig. 27.1: Giovanni Battista Bracelli, Alfabeto Figurato. From: Bizzarie di varie figure (1624), Bibliothèque nationale, Paris.
John Weaver for instance gives exact
directives about how to stand and to use gravity and balance in order to execute a cer-
tain set of movements that followed (Weaver 1721: 97). Kellom Tomlinson turns this
initial anatomical need into a social posture essential to literally marking one’s own
position and representational status in society, by giving very detailed advice on how to
stand, e.g., during a conversation:
[…] when we stand in Company, are when the weight rests much on one Foot, as the other,
the Feet being considerably seperated or open, the Knees [sic!] streight [sic!], the Hands
placed by the side in a genteel Fall or natural Bend of the Wrists, and being in an agreeable
fashion or Shape about the joint or bend of the Hip, with the Head gracefully turning to
the Right or Left, which completes a most Heroic Posture. (Tomlinson 1735: 4)
Wendy Hilton provides a detailed overview of the use of different body parts, e.g., the arms, according to the conventions of social dance, including Tomlinson’s instructions
(Hilton 1981: 133–171). Stephanie Schroedter shows how such positions and gestures changed meaning over the course of time: a slightly different shape of the arms or fingers, for example, was enough to expose a dancer as not being up to date with current social conventions (Schroedter 2008: 425).
Apart from the choreographed guidelines involving a stern code of bodily attitudes
necessary to communicate in social settings, the project of many dance authors was to
establish a vocabulary appropriate to express emotions in dance. Ménéstrier in partic-
ular reflects on the “affects of the soul” as a primary function of artistic dance (Ménéstrier 1682: 161). In addition to the movement, which is usually designated
as the steps, the “expressions” are the vehicles to transport emotions and passions
(Ménéstrier 1682: 158; Jeschke 1996: 95–96). The bodily attitudes involved in the pro-
duction of affects such as love, hate, grief or terror are more complicated than the sim-
ple execution of steps. These attitudes also bear a certain ambivalence: on the one hand,
Ménéstrier asks for preferably “natural” postures that should be able to trigger the
imagination of the audience, and on the other hand encloses them in a codified and
regulated system according to the laws of “rhetoric,” providing almost a catalogue
of appropriate gestures in relation to the expressed emotion (Jeschke 1996: 96–97;
Huschka 2006: 115; Fig. 27.2). Thus a literal language of affects and emotions becomes
an intrinsic part of dance as a movement system on stage, claiming the potential of making its contents clearly understandable to the spectators.
About 80 years later, Jean Georges Noverre strengthens the idea that dancers should
not only be accurate interpreters of formalized figures but also should support the
mode of emotional expression within the dance. In his Lettres sur la Danse (1760), he
postulates a “truthful expression” that should withdraw from the mere abstraction of danced figures and instead unify body, mind and soul, thus claiming an aesthetics
of authenticity avant la lettre (Huschka 2006: 107). However, this “naturalness” is again
a constructed one, produced by a necessary literacy that lets the whole dance “speak”
and links it again to language, expressed in Noverre’s statement “[w]ords become use-
less, everything will speak, each movement will be expressive, each attitude will depict a
particular situation, each gesture will reveal a thought, each glance will convey a new
sentiment; [sic!] everything will be captivating, because all will be a true and faithful
imitation of nature” (Noverre 1951: 52). With his proclaimed danse d’action (Noverre
1951: 52), he provides the basis for ballet as an autonomous art form that literally
can “speak for itself” through its movement and gestures. Still, ballet remains bound to the opera, but Noverre prepares the ground for ballet, some 100 years later, no longer being a mere appendix to opera or being “parked” in an interlude position.
Noverre promotes the change from the more abstract ballet de cour, serving social
functions, into the ballet en action, making pantomime a key element (Schroedter
2004: 411).
By gesture we present to the eyes all that we cannot express to the ears; it is a universal
interpreter that follows us to the very extremities of the globe […] Speech is the language
of reason: it convinces our minds; tones and gestures form a sentimental discourse that
moves the heart. Speech can only give utterance to our passions, by means of reflection
through their relative ideas. Voice and gesture convey them to those we address, in an
immediate and direct manner. In short, speech, or rather the words which compose it, is
an artificial institution, formed and agreed upon between men, for a more distinct recipro-
cal communication of their ideas; whilst gesture and the tone of voice are […] the dictio-
nary of simple nature; they are a language innate in us, and serve to exhibit all that
concerns our wants and the preservation of our existence; for which reason they are
rapid, expressive, and energetic. Such a language cannot but be an inexhaustable source
to an art whose object is to move the deepest sensations of the soul! (Blasis 1976: 113–114)
In favouring the pantomime (Blasis 1976: 114), he differentiates between two kinds of
gestures: “natural” and “artificial” ones (Blasis 1976: 114). The former are meant to
comprise all those expressions that are linked to the “sentiments,” the latter are related
to the “moral world” (Blasis 1976: 114). Akin to Noverre, Blasis opts for a system of
gestural conventions intended to transmit the expressed emotions via an arti-
ficial language presented on stage. However, he partly quits the field of “direct”
There is a difference of inspiration in the dance today. Dance [is] no longer performing its
function of communication. By communication is not meant to tell a story or to project an
idea, but to communicate experience by means of action and perceived by action. […] The
departure of the dance from classical and romantic delineations was not an end in itself,
but the means to an end. […] The old forms could not give voice to the more fully awa-
kened man. They had to undergo metamorphosis – in some cases destruction – to serve
as a medium for a time differently organized. (Graham 1979: 50)
Referring to the new motile expressions necessary for Modern Dance, Mary Wigman
distinguishes between “stage dance,” where pantomime is used to “interpret[] mean-
ingful action,” and the so-called “absolute dance […] independent of any literary-in-
terpretative content; it does not represent, it is; and its effect on the spectator who is
invited to experience the dancer’s experience is on a mental-motoric level, exciting
and moving” (Wigman 1963: 35–36). Wigman articulates the idea of a direct transmission of e/motion to the audience, a phenomenon John Martin describes as “[m]etakinesis” at the beginning of the 1930s (J. Martin 1969: 14): “emotional experience can express itself through movement directly” (J. Martin 1969: 18). However, Yvonne Hardt shows that emotion in dance is not a state of free-floating feelings but is achieved through a long-lasting process of bodily discipline in the rehearsal studio (Hardt 2006: 143).
Wigman’s idea of a “language of dance” thus accentuates the symbolic level of dance
movement (Wigman in Schikowski 1930), and contemporary dance critic Hans
Brandenburg even conceives of her work as presenting pure form in terms of a “phe-
nomenology of movement […] following no other form than the one emerging from
the movement itself ” (Brandenburg 1921: 201–202; see also Wigman 1963: 12;
Manning 1993: 18). However, Rudolf von Laban suggests a system of notation to record the new “languages” of Modern Dance. Since these lack an explicit vocabulary of the kind provided by ballet, he responds to the need to visualize the movement patterns of dance in an appropriate way, taking into consideration the levels of the body in space, the movement of the extremities and their motile qualities, called efforts (Laban 1966).
Far from the idea of an absolute dance, Doris Humphrey insists on dance acting in a
social field: “Just self-expression, provided that can be had at all, is certainly not accept-
able” (Humphrey 1959: 31). She distinguishes a repertory of gestures, among them the so-called “ritual gesture” and the “social gesture,” such as handshaking, embracing and so on (Humphrey 1959: 115–118). Those gestures are not necessarily meant to be pantomimic but serve as a foil for creating dance themes and motives by stylizing the movement, and should always be considered within their corporal history (Humphrey 1959: 115, 121–124). Humphrey also develops a set of “emotional gesture[s],” not mimicking certain feelings but exploring the shape the body takes under their influence. Grief, for instance, makes the body contract, wrapping the arms around the torso, etc. Those explorations serve as a starting point for the development of “patterned
emotions” in dance (Humphrey 1959: 118), and are partially reminiscent of the postures
Ménéstrier suggested three hundred years before.
Apart from that, it has to be recognized that in this historical period (often female)
dancers and choreographers are less interested in a literal mimesis of contents or
emotions, as they represent themselves not only as the authors (choreographers) of
their own dances but also by expressing themselves in writing, such as in accompanying self-reflective commentaries and manifestos (Brandstetter 1995b: 37–38).
5. References
Agamben, Giorgio 2004. Notes on gesture. In: Hemma Schmutz and Tanja Widman (eds.), That
Bodies Speak Has Been Known for a Long Time, 105–114. Cologne: Walther König.
Anderson, Jack 1992. Ballet and Modern Dance. A Concise History. Hightstown: Princeton Book
Company.
Banes, Sally 1987. Terpsichore in Sneakers. Post-Modern Dance. Hanover: Wesleyan University
Press.
Blasis, Carlo 1976. The Code of Terpsichore: A Practical and Historical Treatise on the Ballet, Dan-
cing and Pantomime; with A Complete Theory of the Art of Dancing: Intended As Well for the
Instruction of Amateurs as the Use of Professional Persons (1828), Facsimile of the James Bul-
lock edition, London, 1928. New York: Dance Horizons.
Brandenburg, Hans 1921. Der Moderne Tanz. Munich: Georg Müller.
Brandstetter, Gabriele 1995a. Tanz-Avantgarde und Bäder-Kultur – Grenzüberschreitungen
zwischen Freizeitwelt und Bewegungsbühne. In: Erika Fischer-Lichte (ed.), TheaterAvant-
garde, Wahrnehmung – Körper – Sprache, 123–155. Tübingen: A. Francke.
Brandstetter, Gabriele 1995b. Tanz-Lektüren. Körperbilder und Raumfiguren der Avantgarde.
Frankfurt am Main: Fischer.
Brandstetter, Gabriele 2004. “The code of Terpsichore.” Carlo Blasis’ Tanztheorie zwischen Ara-
beske und Mechanik. In: Gabriele Brandstetter and Gerhard Neumann (eds.), Romantische Wis-
senspoetik. Die Künste und die Wissenschaften um 1800, 49–72. Würzburg: Königshausen and
Neumann.
Cohen, Sarah R. 2000. Art, Dance and the Body in French Culture of the Ancien Régime. Cam-
bridge: Cambridge University Press.
Duncan, Isadora 2008. Der Tanz der Zukunft (1903). In: Magdalena Tzaneva (ed.), Isadora Dun-
cans Tanz der Zukunft. 10 Stimmen zum Werk von Isadora Duncan, 33–50. Berlin: LiDi
EuropEdition.
Falcone, Francesca 2007. Between tradition and innovation: Bournonville’s vocabulary and style.
In: Gunhild Oberzaucher-Schüller (ed.), Souvenirs de Taglioni. Bühnentanz in der Ersten
Hälfte des 19. Jahrhunderts, Vol. 2, 283–292. Munich: Kieser.
Feuillet, Raoul-Auger 1700. Chorégraphie ou l’Art de décrire la Dance par Caractères, Figures et
Signes Demonstratifs. Paris: Michel Brunet. Reprint 1979. Hildesheim: Georg Olms.
Foellmer, Susanne 2009. Am Rand der Körper. Inventuren des Unabgeschlossenen im Zeitgenös-
sischen Tanz. Bielefeld: transcript.
Franko, Mark 1993. Dance as Text. Ideologies of the Baroque Body. Cambridge: Cambridge Uni-
versity Press.
Graham, Martha 1979. Graham 1937. In: Jean Morrison Brown (ed.), The Vision of Modern
Dance, 49–53. New Jersey: Princeton Book Company.
Guest, Ivor 1966. The Romantic Ballet in Paris. London: Pitman and Sons.
Hardt, Yvonne 2006. Reading emotions. Lesarten des Emotionalen am Beispiel des modernen
Tanzes in den USA. In: Margit Bischof, Claudia Feest and Claudia Rosiny (eds.), e_motion,
139–155. Hamburg: Lit.
Hilton, Wendy 1981. Dance of Court and Theater. The French Noble Style 1690–1725. Princeton,
NJ: Princeton Book Company.
Hofmannsthal, Hugo von 2005. The Lord Chandos Letter and Other Writings. Selected and translated by Joel Rotenberg. New York: New York Review Books.
Humphrey, Doris 1959. The Art of Making Dances. New York: Rinehart.
Huschka, Sabine 2006. Der Tanz als Medium von Gefühlen. Eine historische Betrachtung. In:
Margit Bischof, Claudia Feest and Claudia Rosiny (eds.), e_motion, 107–122. Hamburg: Lit.
Husemann, Pirkko 2002. Ceci Est de la Danse. Choreographien von Meg Stuart, Xavier Le Roy
und Jérôme Bel. Norderstedt: Books on Demand.
Jeschke, Claudia 1996. Körperkonzepte des Barock – Inszenierungen des Körpers und durch den
Körper. In: Sibylle Dahms and Stephanie Schroedter (eds.), Tanz und Bewegung in der Bar-
ocken Oper, 85–105. Innsbruck: StudienVerlag.
Jeschke, Claudia, Isa Wortelkamp and Gabi Vettermann 2005. Arabesken. Modelle “fremder”
Körperlichkeit in Tanztheorie und – inszenierung. In: Claudia Jeschke and Helmut Zedelmaier
(eds.), Andere Körper – Fremde Bewegungen. Theatrale und Öffentliche Inszenierungen im
19. Jahrhundert, 169–210. Münster: Lit.
Laban, Rudolf 1966. Choreutics. London: Macdonald and Evans.
Lee, Carol 2002. Ballet in Western Culture. A History of Its Origins and Evolution. New York:
Routledge.
Lepecki, André 2006. Exhausting Dance. Performance and the Politics of Movement. New York:
Routledge.
Manning, Susan A. 1993. Ecstasy and the Demon. Feminism and Nationalism in the Dances of
Mary Wigman. Berkeley: University of California Press.
Martin, John 1969. The Modern Dance. New York: Dance Horizons. First published [1933].
Martin, Randy 1998. Critical Moves. Dance Studies in Theory and Politics. Durham, NC: Duke
University Press.
Ménéstrier, Claude François 1682. Des Ballets Anciens et Modernes selon les Règles du Théatre.
Paris: Rene Giugnard.
Noverre, Jean Georges 1951. Letter VII. In: Jean Georges Noverre (ed.), Letters on Dancing and
Ballet (Lettres sur la Danse et les Ballets, 1760), 49–55. Translated by Cyril W. Beaumont, 1930.
London: Beaumont.
Pappacena, Flavia 2007. Dance terminology and iconography in early nineteenth century. In: Gun-
hild Oberzaucher-Schüller (ed.), Souvenirs de Taglioni. Bühnentanz in der Ersten Hälfte des 19.
Jahrhunderts, Vol. 2, 95–112. Munich: Kieser.
Rainer, Yvonne 1974. Parts of Some Sextets. Some retrospective notes on a dance for 10 people
and 12 mattresses called “Parts of Some Sextets,” performed at the Wadsworth Atheneum,
Hartford, Connecticut and Judson Memorial Church, New York, March 1965 (1965). In:
Yvonne Rainer (ed.), Yvonne Rainer: Work 1961–73, 45–51. Halifax: Press of the Nova Scotia
College of Art and Design.
Rameau, Pierre 1967. Le Maître à Danser (1725). Facsimile of the Paris edition. New York:
Broude Brothers.
Schikowski, John 1930. Absoluter Tanz. In: Mainzer Anzeiger, 19.1.1930, Mary – Wigman –
Archive, Akademie der Künste Berlin, folder N – S.
Schlicher, Susanne 1987. TanzTheater. Traditionen und Freiheiten. Pina Bausch, Gerhard Boh-
ner, Reinhild Hoffmann, Hans Kresnik, Susanne Linke. Reinbek bei Hamburg: Rowohlt.
Schroedter, Stephanie 2004. Vom “Affect” zur “Action.” Quellenstudien zur Poetik der Tanzkunst
vom Späten Ballet de Cour zum Frühen Ballet en Action. Würzburg: Königshausen and
Neumann.
Schroedter, Stephanie 2008. The French art of dancing as described in the German dance instruc-
tion manuals of the early 18th century. In: Stephanie Schroedter, Marie Thérèse Mourey, and
Giles Bennett (eds.), Baroque Dance and the Transfer of Culture around 1700, 412–448. Hilde-
sheim, Germany: Georg Olms.
Schulze, Janine 1999. Dancing Bodies Dancing Gender. Tanz im 20. Jahrhundert aus der Perspek-
tive der Gender-Theorie. Dortmund: Edition Ebersbach.
Siegmund, Gerald 2006. Abwesenheit. Eine Performative Ästhetik des Tanzes. William Forsythe,
Jérôme Bel, Xavier Le Roy, Meg Stuart. Bielefeld: transcript.
Smith, Marian 2000. Ballet and Opera in the Age of “Giselle.” Princeton, NJ: Princeton University
Press.
St. Denis, Ruth 1979. The dance as life experience. In: Jean Morrison Brown (ed.), The Vision of
Modern Dance, 21–25. New Jersey: Princeton Book Company.
Tomlinson, Kellom 1735. The Art of Dancing Explained by Reading and Figures. London. Reprint
1970. Westmead: Gregg International.
Weaver, John 1721. Anatomical and Mechanical Lectures upon Dancing. London: J. Brotherton
and W. Meadows.
Wigman, Mary 1963. Die Sprache des Tanzes. Stuttgart: Battenberg.
Abstract
The article presents an overview of the aesthetic and anthropological reflections on the
“language” of the body on stage from classical times to the present. It discusses the his-
toriography of these two academic fields and their corresponding traditions by placing
them in conversation with one another. The article focuses on a few paradigmatic
concepts and methods that are often applied when artists and scholars think about the
relation of dance with regard to the body, language, and representation.
4. Historical development
Anthropology is a product both of the Enlightenment and of the increasing process of colonization since the 18th century, which provoked the study and comparison of people in different spaces and the investigation of long-term human development. Those who first
used the term were not particularly interested in dance, although Georg Forster in his
“Leitfaden zu einer künftigen Geschichte der Menschheit” clearly integrated the dance
of so-called “wilde” people into his humanistic vision that saw dance as mankind’s first
step progressing toward civilization (Forster [1789] 1974: 191).
By the 19th century, the focus was clearly directed toward gestures, which by then were
considered “undoubtedly, the very soul and support of the Ballet” (Blasis 1828: 111).
Blasis, whose Code of Terpsichore is considered the first and most comprehensive systematization of ballet steps and still informs ballet today, drew on the Greek distinction – widely circulated in publications – between phora, considered a gesture for the expression of emotions and actions, and schemata, considered a gesture that expresses the characteristics of a person or thing, for his own distinction between natural gestures (for emotions) and artificial ones (for abstract concepts) (Blasis 1828: 133; Lawler 1964; Ley 2003: 476). For 18th and 19th century writers, dancers needed to learn this lan-
guage of gestures in order to enhance expressivity (Blasis 1828; Cahusac 1754: 2–32;
Gallini 1762; Noverre 1726: 28). The belief that dance is a universal language that could
be studied with scientific means did not yet signify that it was also considered natural.
Paradigmatic in this respect are Susan Leigh Foster’s Reading Dancing (1986) and Janet Adshead’s Dance Analysis (1988). Published only two years apart, these two studies coincide with the academic establishment of dance studies in Anglophone academia. In their comparative approaches, both argued for a competence in reading dance that needs to be learned much as a language is. Foster, who began with the working methods of four contemporaneous choreographers, clearly demonstrated how “wrong” expectations of how a dance communicates might make the dance inaccessible to a viewer even within a Western concert dance context. Rather, she suggests that in order to understand dance one needs to understand its different modes of representation (Foster 1986). Adshead encouraged a cross-cultural perspective, applying her movement-oriented dance analysis to a comparison of modern dance, English folk dancing and Toganian dance (Adshead 1988).
However, dance studies largely kept aesthetic and anthropological investigations apart up to the mid-1990s, because the establishment of a more theoretically grounded dance scholarship coincided with, and took its inspiration from, so-called post-modern dance, which was bound to a highly modernist artistic discourse that opted for the independence of art. Influential choreographers such as Merce Cunningham, and the protagonists of the Judson Dance Theater such as Yvonne Rainer, Steve Paxton, and Trisha Brown, wanted to stage dance that presented movement for movement’s sake, avoiding classical narration as well as emotional expressivity (Banes 1987). By performing movement that was pedestrian, without organic organisation and flow, and performed in a casual manner, they not only questioned the boundaries of what defines dance but challenged the convention that dance should communicate or represent at all. They also destabilized the supposedly inevitable correlation between specific gestures or movements and their symbolic or emotional meanings. The focus was drawn to the context in which gestures and movements could appear and attain a symbolic meaning on stage at all (Hardt 2006). Despite such a broad view of how movement could potentially signify, aesthetic discourse on dance participated in defending and explaining post-modern dance’s rejection of earlier expressivity, and it clearly followed the modernist argumentation of the dancers that separated dance from representation (Banes 1987; Franko 1995).
social context and cultural exchanges. The focus is now also geared towards the inter-
dependence of social and artistic dance, as, for instance, Claudia Jeschke demonstrates
in case studies showing that both dance genres shared similar movement motives, dynamics and
preferences for body parts within their historical period (Jeschke 1999). These studies
indicate that the division with respect to the focus of locality between aesthetic and
anthropological perspective is slowly being eroded.
Provoked by the increasing process of globalization, the focus is now geared toward
how dance cultures have always travelled through different cultural contexts and how
they are reshaped and in this process are assigned new meanings. For instance, Ruth
St. Denis and her staging of Indian Dances, Mary Wigman’s interest in early Germanic
cultures, Nijinsky’s scandalous Le Sacre du Printemps, which marked the advent of modernism in ballet in 1913 by evoking pagan rituals, or Martha Graham’s appropriation of
Native American dance were now discussed in the context of colonisation, cultural bor-
rowings and hybrid dance cultures (Franko 1995; Jeschke 1999; Manning 1993). More
recent studies have also revealed African American influence on a wide range of dances
and American culture in general, including the neoclassic ballet of the founder of
American ballet, Balanchine (Dixon Gottschild 1998).
This cultural hybridity can also be studied both as dance migrates through different
contexts and as it is appropriated by different social classes within the same locality, as for instance in the development of social dances like Tango (Savigliano 1995) and
HipHop (Desmond 1997; Klein and Friedrich 2004) or the spread of “African dance”
classes all over the world. When these dance forms travel throughout the world and
especially into white middle class contexts, most often the movements of the hips, the
looseness of the legs, or the forward bend of the upper body are replaced by a focus
on figuration of steps and a more upward and confined movement of the body
(Desmond 1997). An exemplary case study is Marta Savigliano’s Tango and the Political
Economy of Passion (1995), which traces how Tango travelled from the working class
pubs in Argentina to the bourgeois night life in Europe at the beginning of the 20th
century and later back to Argentina, where the dance was (re)appropriated by a
wider Argentinean middle class. Savigliano demonstrates how Tango had very different
connotations and meanings for different groups of dancers: while it allowed the white
bourgeoisie in Europe to express what was considered suppressed and to indulge a fasci-
nation with the exotic, the (re)appropriation of Tango back into Argentina was linked
to what Savigliano calls auto-exotism and the need to define a cultural heritage. With
this, stable notions of culture have finally been put aside.
In general, how dance cultures come to stand in for a national movement has been
traced through a critique of the Folk as signifying tradition. Folk dance is now consid-
ered part of the cultural invention of tradition that accompanied the building of nation states in Europe (Baxmann and Cramer 2005; Buckland 2006). The
European folk dance traditions share a similar history, for instance, with the Indian Bharatha Natyam, considered a classical Indian dance but only revived and reinvented in the form in which it exists today in the process of Indian liberation from British colonization (Chatterjee 2004; Meduri 2004).
These studies have also challenged the notion that traditionally and ethnically marked dances always express and symbolize (one of the reasons they were not considered avant-garde). For instance, while in Kathak – a classical Indian
dance form – the hand gestures (mudras) have specific meaning potential, what they end up signifying not only depends on the context and the part of the dance in which they are performed; they can also be used in a highly decorative manner in the so-called nritta (non-narrative) sequences of the dance, which do not attempt to signify. In this sense, then, classical Indian dance parallels the structure of the ballet d’action, where sections that simply demonstrate physical virtuosity (the so-called white sections in ballet) alternate with more narrative sections (Katrak 2008). However, while the virtuosic sections in both dance forms may function outside a more classical understanding of communication in the sense of narration, they are highly implicated in the representational value of a dance that demonstrates physical ability and the skill of the performer, separating the dancer from the audience. This is an expression of a society that values a specific understanding of virtuosity (Foster 1986).
6. Exposition
The investigation of dance and dance practice has shifted between preferring either the performative or the representational mode and “nature” of dancing, and accordingly it does not allow one to postulate a specific or universal way in which dance can communicate. Rather, the different discourses on dance encourage a comparative and contextualized analysis and, unfortunately, for most of history they have only allowed for research into how discourse has framed the understanding of dance rather than into how audiences or participants might have experienced the practice of dance.
7. References
Adshead, Janet (ed.) 1988. Dance Analysis: Theory and Practice. London: Dance Books.
Banes, Sally 1987. Terpsichore in Sneakers. Post-Modern Dance. Middletown, CT: Wesleyan Uni-
versity Press.
Baxmann, Inge 2000. Mythos: Gemeinschaft. Körper- und Tanzkulturen in der Moderne. Munich:
Fink.
Baxmann, Inge and Franz Anton Cramer 2005. Deutungsräume: Bewegungswissen als Kulturelles
Archiv der Moderne. Munich: Kieser.
Blasis, Carlo 1828. The Code of Terpsichore. A Practical and Historical Treatise on the Ballet, Dan-
cing, and Pantomime with a Complete Theory of the Art of Dancing. London: Bulcock.
Boas, Franziska (ed.) 1944. The Function of Dance in Human Society. First Seminar. New York:
Dance Horizons.
Brandstetter, Gabriele 2005. The code of Terpsichore. The dance theory of Carlo Blasis: Mechan-
ics as the matrix of grace. Topoi 24(1): 67–79.
Brandstetter, Gabriele and Gabriele Klein (eds.) 2006. Methoden der Tanzwissenschaft. Modellanalysen am Beispiel von Pina Bauschs “Le Sacre du Printemps”. Bielefeld: transcript.
Brandstetter, Gabriele and Christoph Wulf (eds.) 2007. Tanz als Anthropologie. Berlin: Fink.
Bücher, Karl 1898. Arbeit und Rhythmus, 3rd edition. Leipzig: Teubner.
Buckland, Theresa Jill (ed.) 2006. Dancing from Past to Present. Nation, Culture, Identities. Mad-
ison: University of Wisconsin Press.
Cahusac, Louis de 1754. La Danse Ancienne et Moderne ou Traité Historique de la Danse. La Haye: Chez Jean Neaulme.
Chatterjee, Ananya 2004. Constructing a historical narrative for Odissi. In: Ann Albright and Ann
Dils (eds.), Rethinking Dance History. A Reader, 143–156. London: Routledge.
Cohen Bull, Cynthia J. 1997. Sense, meaning, and perception in three dance cultures. In: Jane Des-
mond (ed.), Meaning in Motion. New Cultural Studies of Dance, 269–287. Durham, NC: Duke
University Press.
Desmond, Jane (ed.) 1997. Meaning in Motion. New Cultural Studies of Dance. Durham, NC:
Duke University Press.
Dixon Gottschild, Brenda 1998. Digging the Africanist Present. London: Greenwood.
Durkheim, Emile 2001. The Elementary Forms of Religious Life. Translated by Carol Cosman.
Oxford: Oxford University Press. First published [1912].
Forster, George 1974. Leitfaden zu einer künftigen Geschichte der Menschheit. In: Siegfried
Scheibe (ed.), Georg Forsters Werke: Kleine Schriften zu Philosophie und Zeitgeschichte,
Volume 8. Berlin: Akademie Verlag. First published [1789].
Foster, Susan Leigh 1986. Reading Dancing. Berkeley: University of California Press.
Foster, Susan Leigh 1996. Choreography and Narrative: Ballet’s Staging of Story and Desire. Bloo-
mington: Indiana University Press.
Franko, Mark 1993. Dance as Text: Ideologies of the Baroque Body. Cambridge: Cambridge Uni-
versity Press.
Franko, Mark 1995. Dancing Modernism: Performing Politics. Bloomington: Indiana University
Press.
Gallini, Giovanni 1762. A Treatise on the Art of Dancing. London: R. Dodsley.
Hall, Stuart 1999. Kulturelle Identität und Globalisierung. In: Karl Hörning and Rainer Winter
(eds.), Widerspenstige Kulturen. Cultural Studies als Herausforderung, 393–439. Frankfurt am
Main: Suhrkamp.
Hardt, Yvonne 2004. Politische Körper: Ausdruckstanz, Choreographien des Protests und die Ar-
beiterkulturbewegung in der Weimarer Republik. Münster: Lit.
Hardt, Yvonne 2006. Reading emotions: Lesarten des Emotionalen am Beispiel des modernen Tanzes in den USA (1945–1965). In: Margrit Bischof, Claudia Feest and Claudia Rosiny (eds.), e-motions. Jahrbuch der Gesellschaft für Tanzforschung 16: 139–155. Münster: Lit.
Hutchinson Guest, Ann 2005. Labanotation. The System of Analyzing and Recording Movement,
4th edition. New York: Routledge.
Jeschke, Claudia 1983. Tanzschriften: Die Illustrierte Darstellung eines Phänomens von den Anfän-
gen bis zur Gegenwart. Bad Reichenhall: Comes Verlag.
Jeschke, Claudia 1999. Tanz als BewegungsText. Analysen zum Verhältnis von Tanztheater und
Gesellschaftstanz (1910–1965). Tübingen: Niemeyer.
Kaeppler, Adrienne L. 1996. Dance. In: David Levinson and Melvin Ember (eds.), Encyclopedia
of Cultural Anthropology, Volume 1, 309–313. New York: Routledge.
Kamper, Dietmar and Christoph Wulf (eds.) 1982. Die Wiederkehr des Körpers. Frankfurt am
Main: Suhrkamp.
Katrak, Ketu H. 2008. The gestures of Bharata Natyam: Migration into diasporic contemporary
Indian dance. In: Carrie Noland and Sally Ann Ness (eds.), Migrations of Gesture, 217–240.
Minneapolis: University of Minnesota Press.
Kealiinohomoku, Joann Wheeler 1969/70. An anthropologist looks at ballet as a form of ethnic
dance. Impulse 20: 24–33.
Klein, Gabriele and Malte Friedrich 2004. Is This Real? Die Kultur des HipHop. Frankfurt am
Main: Suhrkamp.
Laban, Rudolph von 1922. Die Welt des Tänzers. Fünf Gedankenreigen. Vienna: W. Seifert.
Langer, Susanne 1953. Feeling and Form. A Theory of Art Developed from Philosophy in a New Key.
New York: Charles Scribner’s Sons.
Lawler, Lillian 1964. The Dance of Ancient Greece. London: Adam and Charles Black.
Ley, Graham 2003. Modern visions of Greek tragic dance. Theatre Journal 55: 467–480.
Lorenz, Maren 2000. Leibhaftige Vergangenheit. Einführung in die Körpergeschichte. Tübingen:
edition discord.
Manning, Susan 1993. Ecstasy and the Demon: Feminism and Nationalism in the Dances of Mary
Wigman. Berkeley: University of California Press.
Martin, John 1965. The Modern Dance. New York: Dance Horizon. First published [1933].
Meduri, Avanthi 2004. Bharatha Natyam – what are you? In: Ann Cooper Albright and Ann Dils
(eds.), Moving History / Dancing Cultures. A Dance History Reader, 103–113. Middletown, CT:
Wesleyan University Press.
Nietzsche, Friedrich 1972. Die Geburt der Tragödie aus dem Geist der Musik. In: Giorgio Colli
and Mazzino Montinari (eds.), Werke. Kritische Gesamtausgabe, 17–152. New York: De Gruy-
ter. First published [1872].
Novack, Cynthia 1990. Sharing the Dance. Contact Dance: Contact Improvisation and American
Culture. Madison: University of Wisconsin Press.
Noverre, Jean Jacques 1727. Lettres sur la Danse et sur les Ballets, Précédées d’une Vie de l’Au-
teur, par André Levinson. Paris. (Letters on Dancing and Ballet. New York: Dance Horizons,
1803.)
Sachs, Curt 1937. World History of Dance. New York: Norton.
Savigliano, Marta 1995. Tango and the Political Economy of Passion. Boulder, CO: Westview
Press.
Schlesier, Renate 2007. Kulturelle Artefakte in Bewegung. Zur Geschichte der Anthropologie des
Tanzes. In: Gabriele Brandstetter and Christoph Wulf (eds.), Tanz als Anthropologie, 132–145.
Berlin: Fink.
Sheets-Johnstone, Maxine 1966. The Phenomenology of Dance. London: Dance Books.
Sklar, Deidre 2008. Remembering kinesthesia: An inquiry into embodied cultural knowledge. In:
Carrie Noland and Sally Ann Ness (eds.), Migrations of Gesture, 85–111. Minneapolis: Univer-
sity of Minnesota Press.
Spencer, Herbert 1862. First Principles. London: Watts.
Spencer, Paul (ed.) 1985. Society and the Dance. The Social Anthropology of Process and Perfor-
mance. Cambridge: Cambridge University Press.
Taylor, Diana 2003. The Archive and the Repertoire. Performing Cultural Memory in the Americas.
Durham, NC: Duke University Press.
Weickmann, Dorion 2002. Der dressierte Leib. Kulturgeschichte des Balletts (1580–1870). Frankfurt
am Main: Campus.
Wigman, Mary 1935. Deutsche Tanzkunst. Dresden: Reißner.
Williams, Drid 1991. Ten Lectures on Theories of the Dance. London: Scarecrow Press.
Youngerman, Suzanne 1974. Curt Sachs and his heritage: A critical review of world history of the
dance with a survey of recent studies that perpetuate his ideas. CORD News 6: 6–19.
Abstract
Mimesis, a term that is usually translated as “imitation,” has been central to the fields of
arts, aesthetics, and poetry. However, mimesis is not only a theoretical term. The mimetic
ability is fundamental to all realms of human action and understanding. Mimetic pro-
cesses may be described as the way humans interact with the world and how they recreate
the existing world through their own capacity for giving form. This paper presents an
overview of the history of scholarly reflections on the notion of mimesis from Aristotle
to the present. Three main periods are focused on: a) the period spanning from its first
formulations by Plato and Aristotle in antiquity to the early modern period, b) the period
in which mimesis is declared meaningless by rationalist and idealist philosophies, while
still playing an important social role, and c) the rehabilitation of the concept in the
20th century. The paper concludes by presenting new findings from neuroscience and
developmental psychology.
1. Introduction
In the history of philosophy, mimesis is usually understood as an aesthetic category and
translated as “imitation”. This understanding prevents one from seeing the wide field of
application and manifold meanings that mimesis has been given in the course of the
centuries. However, mimesis plays a decisive role in almost every area of human imag-
ination and action, thought and language. The mimetic ability is part of the conditio
humana and is central to human understanding. In the course of history, it unfolds its
semantic spectrum with terms such as expression and portrayal, as well as mimicry,
imitatio, representation, game and ritual.
Mimetic processes may be described as the way human beings interact with the
world in which they live. They take in the world through their senses, yet they do not
passively endure it, but respond to it with constructive measures: what they receive
from the world is formed by their own action. The mimetic ability has to be discovered
or constructed through and with the model. In mimetic acts, the subject recreates the
existing world through its own capacity for giving form. In this respect, mimesis is two-
fold: a creation of semblance and a social or artistic formation in another medium. To
the extent that mimesis in fact creates what it copies, it precedes the epistemological
distinction between truth and falsehood. As a productive activity, mimesis belongs
to the domain of practice. It is not a theoretical term; its aim is not knowledge but
world making (Goodman 1978). What is created and how this is achieved depends
on the competence of the world maker; therein lies its creative aspect.
The conceptual history of the term mimesis lacks clear delineations and exhibits a
certain resistance to theoretical appropriation. Insofar as the concept of mimesis is
linked to actions, productive activities and processes, the artificiality and acuity of sci-
entific definitions are not conducive to it. Mimetic processes combine experientially
acquired practico-technical skills with capacities for practical cognition and evaluation.
With the beginning of the predominance of rational thought and the postulate of the
singularity of the subject, the concept of mimesis loses its outstanding position in the
history of thought, which it had occupied prior to Cartesian rationalism and idealism.
The history of the concept of mimesis may be divided into three periods:
(i) the long period spanning from its first formulations by Plato and Aristotle to the
early modern period;
(ii) the period, in which mimesis is declared meaningless by rationalist and idealist phi-
losophies, while still playing an important social role;
(iii) the rehabilitation of the concept in the 20th century by authors such as Walter
Benjamin, Theodor W. Adorno, René Girard, and Jacques Derrida (the first impor-
tant work about mimesis in literature can be found in Auerbach 1953; see recent
studies on the concept of mimesis: Gebauer and Wulf 1995; Girard 1977; Girard
1986; Halliwell 2002; Kablitz 1998; Kardaun 1993; Koch 2010; Melberg 1995; Petersen
2000; Schönert 2004; Scholz 1998; Taussig 1993).
In more recent times, a rediscovery of mimesis may be observed in cultural studies and the social sciences as well as in the neurosciences.
for education, where it is mainly accomplished through the emulation of models (Plato
1991). A young person strives to become similar to the model. The representation of
bad models is potentially dangerous as it may spoil the youth. Poetic representations
stimulate the mimetic capacity and thus initiate transformations and alterations. This,
however, does not imply that they are easily subordinated to pedagogical and social ob-
jectives. They rather run the risk of developing in an uncontrolled and rampant way and
may thus lead to unwanted side effects. This is the reason why Plato calls for a regulation
of poetry and of the models it portrays.
In Plato’s work, the central significance of mimesis for art, poetry and music is
already intimated. Mimesis is attributed with the capacity of creating a world of illusion.
Imitation is conceived of as the ability to produce images, not things. The images’ defin-
ing feature is their resemblance to things and objects, in which the real and the imag-
inary intermingle. Insofar as images are defined through resemblance, they belong to
the world of appearances by visualizing something, which they themselves are not.
They thus occupy an intermediary space between being and nonbeing.
Plato’s work contains comprehensive, yet also contradictory accounts of mimesis and
a scathing criticism of its truth-value (see Havelock 1963). It can be assumed that Pla-
to’s heterogeneous conception is connected to the transition from oral to literate cul-
ture (see Ong 1982). From this perspective, Plato’s condemnation of mimesis may be
linked to his efforts to replace speech with conceptual discourse. The fraught combina-
tion of oral with literate culture, in which a transformation of language and thought an-
nounces itself, remains characteristic of Plato. With the dissemination of writing and
reading, a number of mimetic particularities of oral culture lose their significance.
Other mimetic capabilities, connected to writing and reading, come to the fore (see
Havelock 1982). It is likely that Aristotle’s allocation of mimesis to the domains of
poetry, art and music may only be adequately understood in the context of the
“literateness” of his philosophy.
In his Poetics, Aristotle (1987) points out that man is distinguished from other ani-
mals through his mimetic ability. Mimesis is innate and already shows itself in earliest
childhood. With its help, man is capable of learning even before he can develop other
forms of world appropriation. The specific form this ability might take depends on how
and by means of which contents it develops. Aristotle differentiates between two as-
pects of mimesis: in the first, drawing on Plato, he emphasises the significance of mime-
sis for the creation of images. He then develops his own conception of literary mimesis
(see Halliwell 1986). In his understanding, mimesis does not merely aim at the recre-
ation of a given state of affairs but also aspires towards transformation, particularly
towards beautification, improvement and a universalization of individual traits.
In poetry, mimesis represents the possible and the universal. The Poetics interprets it
as ‘fable’ or ‘plot’ (Ricoeur 1984: 52ff.). In tragedy, it is aimed at the dramatization and
embodiment of the speaking and acting human being and may be grasped as the capac-
ity for poetic representation, which expresses itself in linguistic and imaginary plot out-
lines. Mimesis creates fictive worlds, with no direct relation to reality. It is less through
the individual elements of the plot than through artistic organisation that the cathartic
effects of tragedy are produced (see Goldstein 1966). In contrast to Plato, who dreads
the negative consequences of models, Aristotle regards their mimetic emulation as a
chance to diminish their force. A confrontation with models, not their evasion, is thus
to be desired.
Plato’s and Aristotle’s elaborations develop the semantic horizon of mimesis to which all later discussions of this concept, up to the present day, refer.
and secular authorities. The call for mimesis thus leads to a formal obligation in literary
representation while enabling a subjective expression of the individual.
With the dissemination of the book and an increase of readers, texts acquire an
unprecedented importance. The question of how the reading of texts may be influenced
through commentaries and guidelines becomes highly relevant. Through the immediate
reference of one text to another, intertextuality becomes an issue (see Fumaroli 1980).
What does it mean for literary production that texts mimetically relate to other texts
and thus create new texts? What is the relation between textuality and the significant
literary practice of fragmentation, conspicuous in the work of Montaigne (1958)? The
systematic use of fragments certainly constitutes a modern principle of literary compo-
sition and is accounted for by the insufficiency of human knowledge. In Montaigne it
exists in correlation with self-referentiality, in which the “I” is always ultimately
referred back to itself (see Starobinski 1985). This self-referentiality constitutes the
central mimetic framework for the understanding of the modern subject.
symbolic order in its own right (Foucault 1970). The representations thus generated
lay claim to universal validity. They are applied in all symbolic domains with state-
supporting functions and are thus conflated into an all-encompassing fiction. The object
of these fictionalizing representations is the power of the king and of the state. Yet this
all-encompassing demand is met with a profound scepticism towards the representa-
tional function of language and painting, especially in the works of Racine, La Roche-
foucauld, Pascal and Madame Lafayette (see Lyons 1982; Stierle 1984; Todorov 1982).
Other thinkers, too, question the mimetic representation of reality, most notably Des-
cartes, who supplants representational thinking with the scientific method (see Ehrard
1970; Fischer-Lichte 1989: 70; Judowitz 1988).
the arbiter between inner life and the public (see Habermas 1991; Sennett 1977).
It extends its realm into the domain of emotion, yet subordinates emotional expression
to taste and verisimilitude. The Aristotelian categories are interpreted by the anthro-
pology of the emerging bourgeois society, one of the problems being that bourgeois
society is not yet fully formed. This leads to a special situation, in which theatrical
mimesis participates in actively shaping a society and its anthropology – with its central
conceptions of man, humanity, morality and free will.
Theatrical mimesis develops an impact far beyond the stage. It transforms the every-
day into a dramatic text, in which the individuals play their part (see Geyer 2007; Will-
ems 2009). Acting becomes a social category. For Diderot and Lessing, the two
spokesmen of bourgeois dramatic art, there exist ideal models in nature, which authors
and actors need to grasp (Lessing 1962). Both adhere to an idea of a universal natural
order given to mimesis, yet both are already on their way to a new conception (see Be-
laval 1950; Lacoue-Labarthe 1989). They recognize that semiotic systems and media
not only influence what is portrayed but also decisively participate in its creation (see
Genette 1969; Goodman 1968).
Towards the end of the 18th century, the interest of art theoreticians in the concept
of mimesis begins to wane. Contemporaneous with this decline of interest, mimetic pro-
cesses become increasingly important for the creation of the social world. Mimesis
slowly develops into an all-encompassing, yet theoretically barely recognized social
category.
Rituals refer to earlier rituals that have already been performed, without being mere
copies of the latter. Every performance of a ritual is a new performance that entails a modification of earlier ritualistic activities. In mimetic processes, a mimetic
relation with earlier ritualistic arrangements is established (Gebauer and Wulf 2003;
Wulf 2005). The dynamics of ritual push toward repetition as well as difference. Repe-
tition of ritualistic action never entails an exact reproduction of the past, but leads to
the creation of a new situation, in which the difference from prior realisations of the ritual constitutes a constructive element (Wulf et al. 2004).
Games, akin to plays and rituals, are also mimetically structured. Every performance
of a game is unique. At the same time it constitutes a mimetic repetition and an updating of the cultural knowledge contained in the numerous games and applied in their activities, for which games serve as mnemonic devices (Gebauer and Wulf 1998).
Through games, people enter special realms of experience and modify their own as
well as their society’s cultural knowledge. At the same time, by virtue of their partici-
pation, they are transformed into players: games create their players and, insofar as
they foster social practice and integration, they expand the players’ social knowledge
and their ability to act competently – yet only within a small fragment of the entirety
of social possibilities. Their actions are based on an “as-if ” and in this sense they are
both serious and playful. If this double meaning is lost, the game becomes serious
and may cause violence.
In games, the sense of ludic activity develops before the players notice it. It emerges
without entering consciousness. Depending on the kind of game – free play, institutio-
nalised games, competition, dramatics or gambling – the sense that emerges within the
game is always different. Roger Caillois (1961) describes these different game varieties
as agon (competition), alea (chance), mimicry (mask) and ilinx (intoxication) and sets
them in relation to the societies in which they are played. Despite their significant dif-
ferences, games are always characterised by a mimetic reference to the world outside
the game. The analysis of games therefore generates an insight into the social and
cultural reality of a society (see Gebauer et al. 2004).
2008). By now, there no longer seems to be any doubt concerning the biological foun-
dations of mimetic behaviour in humans, whereas there is no consensus about their relation to the formation of social mimetic processes.
6. References
Adorno, Theodor W. 1978. Minima Moralia. Translated by Edmund Jephcott. London: Verso.
Adorno, Theodor W. 1984. Aesthetic Theory. Translated by C. Lenhardt. London: Routledge.
Apostolidès, Jean-Marie 1985. Le Prince Sacrifié. Théâtre et Politique au Temps de Louis XIV.
Paris: Minuit.
Aristotle 1987. Poetics. Translated by Stephen Halliwell. Chapel Hill: University of North Carolina
Press.
Auerbach, Erich 1953. Mimesis. The Representation of Reality in Western Literature. Translated by
Willard R. Trask. Princeton, NJ: Princeton University Press.
Bastiaansen, Jojanneke A., Marc Thioux and Christian Keysers 2009. Evidence for mirror systems
in emotions. Philosophical Transactions of the Royal Society B 364(1528): 2392–2404.
Belaval, Yvon 1950. L’Esthétique sans Paradoxe de Diderot. Paris: Gallimard.
Benjamin, Walter 2006. Problems in the sociology of language. In: Howard Eiland and Michael W.
Jennings (eds.), Walter Benjamin: Selected Writings, Volume 3: 1935–1938. Cambridge, MA: Harvard
University Press.
Boileau-Despréaux, Nicolas 1683. Art of Poetry. Translated by Sir William Soames. London: Bent-
ley and Magnes.
Bourdieu, Pierre 1989. The historical genesis of a pure aesthetic. Journal of Aesthetics and Art
Criticism 46: 201–210.
Boyd, John D. 1968. The Function of Mimesis and Its Decline. Cambridge, MA: Harvard Univer-
sity Press.
Caillois, Roger 1961. Man, Play and Games. Translated by Meyer Barash. New York: Free Press of
Glencoe.
Derrida, Jacques 1981. The double session. In: Jacques Derrida, Dissemination, 187–237. Trans-
lated by Barbara Johnson. Chicago: University of Chicago Press.
Du Bellay, Joachim 1968. La Défence et illustration de la langue Française. In: Hubert Gillot (ed.),
La Querelle des Anciens et des Modernes en France. Geneva: Slatkine.
Ehrard, Jean 1970. L’Idée de Nature en France à l’Aube des Lumières. Paris: Flammarion.
Elias, Norbert 1978. The Civilizing Process. Translated by Edmund Jephcott. New York: Urizen
Books.
Elias, Norbert 1983. The Court Society. Translated by Edmund Jephcott. New York: Pantheon
Books.
Enticott, Peter G., Kate E. Hoy, Sally E. Herring, Patrick J. Johnston, Zafiris J. Daskalakis and
Paul B. Fitzgerald 2008. Mirror neuron activation is associated with facial emotion processing.
Neuropsychologia 46: 2851–2854.
Fischer-Lichte, Erika 1989. Semiotik des Theaters. Eine Einführung, Vol. 2. Tübingen: Narr.
Fittler, Doris M. 2005. Ein Kosmos der Ähnlichkeit: Frühe und Späte Mimesis bei Walter Benjamin.
Bielefeld: Aisthesis.
Flasch, Kurt 1965. Ars imitatur naturam. Platonischer Naturbegriff und mittelalterliche Philoso-
phie der Kunst. In: Kurt Flasch (ed.), Parusia. Festgabe für Johannes Hirschberger, 265–306.
Frankfurt am Main: Minerva.
Foucault, Michel 1970. The Order of Things: An Archaeology of the Human Sciences. New York:
Vintage Books.
Foucault, Michel 1979. Discipline and Punish: The Birth of the Prison. Translated by Alan Sher-
idan. New York: Vintage Books.
Koller, Hermann 1954. Die Mimesis in der Antike. Nachahmung, Darstellung, Ausdruck. Bern: A.
Francke.
Lacoue-Labarthe, Philippe 1989. Typography: Mimesis, Philosophy, Politics. Cambridge, MA:
Harvard University Press.
Lessing, Gotthold Ephraim 1962. Hamburg Dramaturgy. Translated by Victor Lange. New York:
Dover.
Lukács, Georg 1952. Balzac und der Französische Realismus. Berlin: Aufbau-Verlag.
Lyons, John D. 1982. Speaking in pictures, speaking of pictures. Problems of representation in
the seventeenth Century. In: John D. Lyons and Steven G. Nichols Jr. (eds.), Mimesis:
From Mirror to Method, Augustine to Descartes, 166–187. Hanover, NH: University of
New England Press.
Melberg, Arne 1995. Theories of Mimesis. Cambridge: Cambridge University Press.
Montaigne, Michel de 1958. The Complete Essays. Translated by Donald M. Frame. Stanford, CA:
Stanford University Press.
Nelson, Benjamin 1977. Die Anfänge der modernen Revolution in Wissenschaft und Philosophie.
Fiktionalismus, Probabilismus, Fideismus und katholisches Prophetentum. In: Benjamin Nel-
son (ed.), Der Ursprung der Moderne, 94–139. Frankfurt am Main: Suhrkamp.
Ong, Walter J. 1982. Orality and Literacy: The Technologizing of the Word. London: Routledge.
Petersen, Jürgen H. 2000. Mimesis – Imitatio – Nachahmung: Eine Geschichte der europäischen
Poetik. Munich: Fink.
Plato 1991. The Republic of Plato, 393c. Translated by Allan Bloom. New York: Basic Books.
Prendergast, Christopher 1986. The Order of Mimesis. Balzac, Stendhal, Nerval, Flaubert. Cam-
bridge: Cambridge University Press.
Putnam, Hilary 1981. Reason, Truth and History. Cambridge: Cambridge University Press.
Reiss, Timothy 1982. Power, poetry and the resemblance of nature. In: John D. Lyons and Steven
G. Nichols, Jr. (eds.), Mimesis: From Mirror to Method, Augustine to Descartes, 215–247. Han-
over, NH: University Press of New England.
Ricoeur, Paul 1984. Time and Narrative, Vol. 1. Chicago: University of Chicago Press.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain. Oxford: Oxford University
Press.
Scholz, Bernhard F. (ed.) 1998. Mimesis: Studien zur literarischen Repräsentation. Tübingen, Ger-
many: Francke.
Schönert, Jörg (ed.) 2004. Mimesis – Repräsentation – Imagination: Literaturtheoretische Positio-
nen von Aristoteles bis zum Ende des 18. Jahrhunderts. Berlin: de Gruyter.
Sennett, Richard 1977. The Fall of Public Man. New York: Alfred A. Knopf.
Starobinski, Jean 1985. Montaigne in Motion. Translated by Arthur Goldhammer. Chicago: Uni-
versity of Chicago Press.
Stierle, Karlheinz 1984. Das bequeme Verhältnis. Lessings “Laokoon” und die Entdeckung des
ästhetischen Mediums. In: Gunter Gebauer (ed.), Das Laokoon-Projekt. Pläne einer semio-
tischen Ästhetik, 23–58. Stuttgart, Germany: Metzler.
Szondi, Peter 1979. Die Theorie des Bürgerlichen Trauerspiels im 18. Jahrhundert. Edited by Gert
Mattenklott. Frankfurt am Main: Suhrkamp.
Taussig, Michael T. 1993. Mimesis and Alterity: A Particular History of the Senses. New York:
Routledge.
Todorov, Tzvetan 1982. Theories of the Symbol. Translated by Catherine Porter. Ithaca, NY: Cor-
nell University Press.
Tomasello, Michael 1999. The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Tomasello, Michael 2008. The Origins of Human Communication. Cambridge: Massachusetts
Institute of Technology Press.
Willems, Herbert (ed.) 2009. Theatralisierung der Gesellschaft. Soziologische Theorie und Zeit-
diagnose. Wiesbaden: Verlag für Sozialwissenschaften.
Wulf, Christoph 2005. Zur Genese des Sozialen. Mimesis, Performativität, Ritual. Bielefeld:
transcript.
Wulf, Christoph, Birgit Althans, Kathrin Audehm, Constanze Bausch, Benjamin Jörissen,
Michael Göhlich, Ruprecht Mattig, Anja Tervooren, Monika Wagner-Willi and Jörg
Zirfas 2004. Bildung im Ritual. Schule, Familie, Jugend, Medien. Wiesbaden: Verlag für
Sozialwissenschaften.
Zimbardo, Rose A. 1986. A Mirror to Nature. Transformations in Drama and Aesthetics 1660–1732.
Lexington: University Press of Kentucky.
Abstract
The Mirror System Hypothesis for the evolution of the language-ready brain proposes
that the ability to produce and recognize hand movements for practical goals may have
underwritten the evolution of brain systems supporting manual gesture, with further evo-
lution yielding protosign and protospeech. The chapter thus introduces mirror neurons
and mirror systems, stressing the role of direct and indirect pathways in supporting imita-
tion, and then traces the posited path from mirror neurons to language, placing language
within the framework of bodily communication. Ontogenetic ritualization in apes is
assessed in relation to human gestural communication, noting the role of human emo-
tional and intentional dispositions in supporting communication that goes beyond the
communicative repertoire of apes that lack human contact, pointing the way to deixis.
Before going further we introduce a cast of brain regions (in both macaque and
human) which will play roles in what follows:
Motor cortex is the region of cerebral cortex that can send detailed motor commands
to, for example, the muscles of the hands, thus supporting skilled movements.
Premotor cortex is the region of cerebral cortex just in front of motor cortex. It pro-
vides neural codes for actions and movements that can be sent to motor cortex to be
elaborated into instructions for the musculature.
Broca’s region is an area of the human brain which, in the left hemisphere, is
traditionally associated with speech production.
The basal ganglia form a subcortical region traditionally associated with the
sequencing of movement.
The insula is a region deep within the folds of cerebral cortex. It plays a role in
diverse functions related to emotion.
With this, let us turn to mirror neurons and see their role in action and communica-
tion, while also emphasizing the cooperation of many other brain regions in sup-
porting these functions: A mirror neuron in the brain of creature C for an action A is a
neuron that fires both
(i) when C executes A and actions similar to A, but not when C executes dissimilar
actions, and
(ii) when C observes another creature execute an action more-or-less similar to A (see the sketch below).
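To make criteria (i) and (ii) concrete, the following minimal Python sketch flags a recorded unit as a candidate mirror neuron only if it fires well above baseline both when the animal executes A (but not a dissimilar action) and when it observes an A-like action performed by another. The condition labels, rates, and threshold are invented for illustration; they are not an actual recording or analysis pipeline.

# Toy criterion for flagging a unit as a candidate "mirror neuron" from
# trial-averaged firing rates. All numbers and labels are hypothetical.
def is_mirror_neuron(rates, baseline=5.0, factor=2.0):
    """rates: mean firing rate (spikes/s) per condition."""
    fires = lambda r: r > factor * baseline
    return (fires(rates["execute_A"])
            and not fires(rates["execute_dissimilar"])
            and fires(rates["observe_similar_A"]))

# Example: a unit selective for a precision pinch, whether executed or observed.
unit = {"execute_A": 32.0, "execute_dissimilar": 6.0, "observe_similar_A": 21.0}
print(is_mirror_neuron(unit))  # True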
We need single-cell recording to tell whether or not a neuron has these properties. Mir-
ror neurons were first recorded in Parma from an area called F5 in the premotor cortex
of the brains of macaque monkeys, and were mirror neurons for particular kinds of
grasp, e.g., neurons that fired in association with a precision pinch but not a power
grasp or vice versa (Gallese et al. 1996). Subsequent modeling advanced the view
that mirror neurons gained their specificity through learning (Oztop and Arbib 2002),
a view amply confirmed by empirical data showing that some mirror neurons could
respond to tearing paper versus breaking peanuts, with some responsive to the sound
as well as the sight of the action (audiovisual neurons; Bonaiuto, Rosta, and Arbib
2007; Kohler et al. 2002), while other mirror neurons could be formed – but only
after extensive training of the monkey – that were responsive to tool use (Umiltà
et al. 2008). Moreover, mirror neurons have been found in macaques for oro-facial ac-
tions, both ingestive actions and some related communicative gestures like lip-smacks
and teeth chatters (Ferrari et al. 2003), and it has been suggested that the insula may
contain mirror neurons for the expression of disgust (Wicker et al. 2003; see Rizzolatti
and Craighero 2004 for an extensive review and references to the primary literature on
macaque mirror neurons and human mirror neuron systems).
In humans, our data come from brain imaging, not single neuron recording. We say a
brain region is a mirror system for a class A of actions if, relative to a suitable control
condition, it exhibits significant activation both for humans executing actions of class A
and humans observing other people execute actions of class A. Brain imaging has re-
vealed that the human brain has a mirror system for grasping – regions that are more
highly activated both when the subject performs a range of grasps and when the subject observes a range of grasps (Grafton et al. 1996; Rizzolatti et al. 1996). Following this early work
there has been an explosion of papers on the imaging of various human mirror systems,
but very few papers exploring mirror neurons in the macaque, and the latter primarily
from the Parma group and their colleagues.
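The region-level criterion can be illustrated in the same spirit: a region counts as a candidate mirror system for a class A only if, relative to a control condition, it is more active both during execution and during observation of A-actions. The simple threshold below stands in for the proper statistical contrasts used in the imaging studies cited above; the values are invented.

# Region-level analogue of the mirror-system criterion in the text.
def is_mirror_system(execute_A, observe_A, control, margin=0.1):
    """Each argument is a mean activation (arbitrary units) for one region."""
    return execute_A > control + margin and observe_A > control + margin

print(is_mirror_system(execute_A=1.8, observe_A=1.5, control=1.0))  # True
print(is_mirror_system(execute_A=1.8, observe_A=1.0, control=1.0))  # False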
The terms mirror system or mirror neuron system may give the false impression that observed activity must be due primarily to the firing of mirror neurons (though in some
cases it might be). Quite apart from other issues, this ignores the fact that brain regions
such as F5 in the macaque which contain mirror neurons also contain canonical neurons
(which fire as the monkey executes actions, but not during observation of actions of
others) and other neuron types as well, so that finer-grained analysis is needed to
tease apart the contribution of different classes of neurons in the brain activation
seen in diverse studies.
Complementing this concern is the fact that much discussion of action understanding
focuses on the brain regions thought to contain mirror neurons, excluding the role of
other brain regions. While it is true that mirror neuron activity correlates with observing
an action, it is probably false that such activation is sufficient for understanding the
movement. A possible analogy might be to observing a bodily gesture in a foreign
culture – one might be able to recognize much of the related movement of head, body,
arms and hands that constitute it, yet be unable to understand what it means within
the culture.
Rizzolatti and Sinigaglia (2008: 137) assert that numerous studies (e.g., Calvo-Merino
et al. 2005) “confirm the decisive role played by motor knowledge in understanding
the meaning of the actions of others.” The cited study used functional Magnetic Reso-
nance Imaging (fMRI) brain imaging to study experts in classical ballet, experts in ca-
poeira (a Brazilian dance style) and inexpert control subjects as they viewed videos of
ballet or capoeira actions. They found greater bilateral activations in various regions
including “the mirror system” when expert dancers viewed movements that they had
been trained to perform compared to movements they had not. Calvo-Merino et al.
(2005) assert that their results “show that this ‘mirror system’ integrates observed
actions of others with an individual’s personal motor repertoire, and suggest that
the human brain understands actions by motor simulation.” However, nothing in the
study shows that the effect of expertise is localized entirely within the mirror system.
Indeed, the integration they posit could well involve “indirect” influences from the
prefrontal cortex.
Indeed, Rizzolatti and Sinigaglia (2008: 137) go on to say that the role played by
motor knowledge does not preclude that these actions could be “understood with
other means”. Buccino et al. (2004) used fMRI to study subjects viewing a video, with-
out sound, in which individuals of different species (man, monkey, and dog) performed
ingestive (biting) or communicative (talking, lip smacking, barking) acts. In the case of
biting there was a clear overlap of the cortical areas that became active in watching
man, monkey and dog, with activation in areas linked to mirror systems. However,
although the sight of a man moving his lips as if he were talking induced strong “mirror
system” activation in a region that corresponds to Broca’s area, the activation was weak
when the subjects watched the monkey lip smacking, and disappeared completely when
they watched the dog barking. Buccino et al. (2004) conclude that actions belonging to
the motor repertoire of the observer (e.g., biting and speech reading) are mapped on
the observer’s motor system, whereas actions that do not belong to this repertoire
(e.g., barking) are recognized without such mapping. However, in view of the distribu-
ted nature of brain function, I would suggest that the understanding of all actions
involves general mechanisms which need not involve the mirror system strongly – but
that for actions which are in the observer’s repertoire these general mechanisms may be
complemented by activity in the mirror system which enriches that understanding by
access to a network of associations linked to the observer’s own performance of such
actions.
[Fig. 30.1 schematic: boxes labeled “F5 Mirror” (“Recognize Actions”) and “F5 Canonical” (“Specify Actions”), a direct pathway (e) between them, arrows for “Observe” and “Act”, and links to “Scene Interpretation & Action Planning” and “Action Understanding”.]
Fig. 30.1: The perceptuomotor coding for both observation and execution contained in the F5 region
for manual actions in the monkey is linked to “conceptual systems” for understanding and plan-
ning of such actions within the broader framework of scene interpretation. The interpretation and
planning systems themselves do not have the mirror property save through their linkage to the
actual mirror system.
Summarizing this discussion, Fig. 30.1 emphasizes (a) that F5 (and presumably any
human homologue labeled as a “mirror system”) contains non-mirror neurons (here
the canonical neurons are shown explicitly), and (b) that it functions only within a
broader context provided by other brain regions for understanding and planning of ac-
tions within a broader framework of scene interpretation. The direct pathway (e) from
mirror neurons to canonical neurons for the same action may yield “mirroring” of the
observed action, but is normally under inhibitory control. In some social circumstances,
a certain amount of mirroring is appropriate, but the total lack of inhibition exhibited in
echopraxia and echolalia (Roberts 1989) – compulsive repetition of observed actions or
heard words – is pathological.
– The direct route, supporting imitation of meaningless and intransitive gestures, con-
verts a visual representation of limb motion into a set of intermediate limb postures
or motions for subsequent execution (low-level imitation).
– The indirect route recognizes and then performs known actions (a minimal sketch of this dual-route dispatch follows below).
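The two routes can be caricatured as a dispatch rule: if the observed movement matches an entry in a praxicon of known actions, replay the stored motor program (indirect route); otherwise copy the observed trajectory as a sequence of intermediate postures (direct route, low-level imitation). The sketch below only illustrates that logic, not the Rothi, Ochipa, and Heilman (1991) model itself; the praxicon entries and posture labels are invented.

# Toy dual-route dispatcher: indirect route for recognized actions,
# direct route (low-level copying) for unrecognized movement.
PRAXICON = {
    "wave": ["raise_hand", "oscillate_wrist", "lower_hand"],
    "grasp_cup": ["reach", "preshape_grip", "close_fingers"],
}

def imitate(recognized_label, observed_postures):
    if recognized_label in PRAXICON:
        # Indirect route: recognize, then perform the stored, semantically linked action.
        return "indirect", PRAXICON[recognized_label]
    # Direct route: reproduce the seen postures without accessing the praxicon.
    return "direct", list(observed_postures)

print(imitate("wave", ["p1", "p2", "p3"]))  # ('indirect', ['raise_hand', ...])
print(imitate(None, ["p1", "p2", "p3"]))    # ('direct', ['p1', 'p2', 'p3'])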
[Fig. 30.2 schematic: labels include “Direct Route” (on both the language and praxis sides), “Object Recognition System”, “(Action)”, “Phonological Buffer”, and “???”.]
Fig. 30.2: A dual route imitation learning model balancing language and praxis. The “?” indicates
that the right-hand side should be augmented by an “action buffer”, as described later in this chap-
ter. The upright dashed oval emphasizes the bidirectional link between lexicon and semantics
(Arbib and Bonaiuto 2008; adapted from Rothi, Ochipa, and Heilman 1991).
In terms of observable motion, imitation of an action (acting upon an object) and pan-
tomime of that action (in the absence of an object) may appear the same. However, imi-
tation is the generic attempt to reproduce movements performed by another, whether
to master a skill or simply as part of a social interaction. By contrast, pantomime is per-
formed with the intention of getting the observer to think of a specific action or event.
It is essentially communicative in its nature. The imitator observes; the pantomimic
intends to be observed.
In a novel pantomime, the pantomimic is acting out, perhaps in reduced form, the
mental rehearsal of an action – but often in the absence of the objects with which
the action normally complies. The effort may thus be guided by imaginary affordances
(i.e., imagining those aspects of the object which could be involved in an action) and/or
motor memory (of the movements involved in the action). Even here there is a distinc-
tion. One may pantomime an action in one’s own repertoire using the same effectors
that one uses for the action itself, though perhaps with somewhat reduced and less
articulated movements, given the absence of real objects. Or one may pantomime
an action not within one’s repertoire. This might involve both a visual memory (per-
haps missing essential details) of an observed action, and a mapping of observed
effectors onto one’s own. One might pantomime the movement of the wings of a
butterfly by making one’s hands flutter by, while pantomiming the wings of a soaring
gull might employ the full length of the arms held relatively fixed as the top of the
body sways.
While much discussion of the mirror system stresses that it is part of the motor sys-
tem, it must be more strongly stressed that action and perception are interwoven. The
ability to recognize an action must rest on appropriate processing of visual or other sen-
sory input derived from observation of someone (whether self or other) performing the
action. For a specific example, consider the goal of writing a cursive lower case letter
“a”. One can use a long pole with a paint brush on the end to write on a wall in
what, despite little skill, is recognizably one’s own handwriting even if one has never
attempted the task before and thus the necessary actions are not within the motor rep-
ertoire. The “secret” is that we have a visual, effector-independent, representation of
the letter in terms of the relative size and position of marker strokes. Making the
strokes is highly automatized in the case of a pen or pencil but the visual recognition
can be invoked to control the novel end-effector of the pole-attached paintbrush,
using visual feedback to match this novel performance against our visual expectations.
Thus the generic goal of writing the “a” will invoke an automatized effector-specific
motor schema if the tool is a pen or pencil, but an “effector-independent” vision-
based representation can be used for feedback control of a specific, albeit unfamiliar,
effector – focusing on the trajectory and kinematics of the end-effector of the writing
action, whatever it may be. More generally, then, the ability to recognize the perfor-
mance of an action must precede acquisition of the action by imitation if indeed it is
novel. At first, the recognition provides feedback for visual control of the novel action.
With practice, “internal models” will be developed for the use of the novel effector, in-
creasing the speed and accuracy of the movement (Arbib et al. 2009). With increasing
skill, feedforward comes to dominate. The feedback becomes less and less exercised in
controlling movement but is still there to handle unexpected perturbations (Hoff and
Arbib 1993).
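The shift from feedback to feedforward control described in this example can be sketched as a one-dimensional caricature: an internal model gradually learns the command needed to reach each target, and visual feedback corrects whatever error remains. The learning rate, gain, and target values below are arbitrary illustrations, not the trajectory models of Hoff and Arbib (1993) or Arbib et al. (2009).

# Scalar caricature of practice shifting control from feedback to feedforward.
targets = [0.0, 0.2, 0.5, 0.9, 1.0]   # desired end-effector positions
gain_fb = 0.8                          # feedback correction gain
model = {}                             # learned feedforward command per target

for trial in range(5):
    pos, summed_error = 0.0, 0.0
    for t in targets:
        feedforward = model.get(t, 0.0)       # predicted command (zero before learning)
        pos += feedforward
        error = t - pos                       # visually sensed mismatch with the target
        pos += gain_fb * error                # feedback correction
        summed_error += abs(error)
        model[t] = feedforward + 0.5 * error  # refine the internal model
    print(f"trial {trial}: summed feedback correction = {summed_error:.3f}")
# The summed correction shrinks across trials: control becomes largely feedforward,
# with feedback retained to handle residual perturbations.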
form the indirect pathway. The direct pathway reminds us that we can also imitate cer-
tain novel behaviors that are not in the praxicon and cannot be linked initially to an
underlying semantics.
For Rothi, Ochipa, and Heilman (1991), the language system at the left of Fig. 30.2
simply serves as a model for their conceptual model of the praxis system at the right,
with semantics playing a bridging role (see Itti and Arbib 2006 for a discussion on
how to extend semantics from objects to “scenes” structured by actions). But in this
section, our challenge is to understand the relevance of mirror systems to an account
of how the system on the left evolved “atop” the system on the right. The portion of
human frontal cortex that serves as the “mirror system for grasping” is located in or
near Broca’s area, traditionally associated with speech production. (See Petrides and
Pandya 2009 for a more subtle analysis, compared with what was available ten years earlier, of the macaque homologues of Broca’s area, and the related connectivity in macaque and
human. These new findings have yet to be used to update the analyses reviewed in
the present article.) What could be the connection between these two characteriza-
tions? Inspired in part by findings that damage to Broca’s area could affect deaf
users of signed languages (Poizner, Klima, and Bellugi 1987), not just users of spoken
language, and by earlier arguments for a gestural origin of language, Giacomo Rizzo-
latti and I (Arbib and Rizzolatti 1997; Rizzolatti and Arbib 1998; Rizzolatti and
Arbib 1999) suggested that Broca’s area evolved atop (and was thus not restricted
to) a mirror system for grasping that had already been present in our common ancestor
with macaques, and served to support the parity property of language – that what is
intended by the speaker is (more or less) understood by the hearer, including the
case where “speaker” and “hearer” are using a signed language. This Mirror-System
Hypothesis provided a neural “missing link” for those theories that argue that commu-
nication based on manual gesture played a crucial role in human language evolution
(Armstrong, Stokoe, and Wilcox 1995; Armstrong and Wilcox 2007; Corballis 2002;
Hewes 1973). Many linguists see “generativity” as the hallmark of language, i.e., its abil-
ity to add new words to the lexicon and then to combine them hierarchically through
the constructions of the grammar to create utterances that are not only novel but
can also be understood by others. This contrasts with the fixed repertoire of monkey
vocalizations. However, we stress that such generativity is also present in the repertoire
of behavior sequences whereby almost any animal “makes its living.” It is strange,
then, that Rothi, Ochipa, and Heilman’s (1991) original figure includes a phonological
buffer for putting words together, but omits an “action buffer” for putting actions
together.
Clearly, the Mirror System Hypothesis does not say that having a mirror system is
equivalent to having language. Monkeys have mirror systems but do not have language,
and we expect that many species have mirror systems for varied socially relevant beha-
viors. Moreover, since monkeys have little capacity for imitation – much depends on
how imitation is defined (Visalberghi and Fragaszy 2001; Voelkl and Huber 2000) –
more than a macaque-like mirror system is needed for the purpose of imitation. Further, the ability to copy single actions is just the first step towards imitation, since
humans have what I call complex imitation (Arbib 2002) which combines a perceptual
skill – “parsing” a complex movement into more or less familiar pieces – with a motor
skill that exploits it, performing the corresponding composite of (variations on) familiar
actions. Crucial to this is the recognition of the subgoals to which various parts of the
behavior are directed (Wohlschläger, Gattis, and Bekkering 2003). I have argued
(Arbib 2002) that chimpanzees (and, presumably, the common ancestor of humans and chimpanzees) have simple imitation: they can imitate short novel sequences through repeated exposure, whereas only humans have complex imitation.
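Complex imitation, as characterized here, can be caricatured as greedy segmentation of an observed performance against a stock of familiar fragments, falling back on low-level copying for the unfamiliar residue. The fragment inventory, matching rule, and example stream below are invented for illustration only.

# Toy parse of an observed movement stream into familiar pieces.
KNOWN_FRAGMENTS = {
    ("reach", "grasp"): "pick_up",
    ("lift", "tilt"): "pour",
    ("reach",): "reach",
}

def parse_performance(observed):
    plan, i = [], 0
    while i < len(observed):
        for length in (2, 1):                        # prefer longer familiar chunks
            chunk = tuple(observed[i:i + length])
            if len(chunk) == length and chunk in KNOWN_FRAGMENTS:
                plan.append(KNOWN_FRAGMENTS[chunk])  # recognized subgoal
                i += length
                break
        else:
            plan.append("copy:" + observed[i])       # unfamiliar piece: low-level copy
            i += 1
    return plan

print(parse_performance(["reach", "grasp", "lift", "tilt", "wiggle"]))
# ['pick_up', 'pour', 'copy:wiggle']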
With this background, I can summarize one hypothetical sequence leading from
manual gesture to a language-ready brain (Arbib 2005, 2012) as follows:
My current view is that stages (iii)–(vi) and a rudimentary (pre-syntactic) form of (vii)
were present in pre-human hominids, but that the “explosive” development of (vii) that
we know as language depended on cultural evolution well after biological evolution had
formed modern Homo sapiens. This remains speculative, and one should note that bio-
logical evolution may have continued to reshape the human genome for the brain even
after the skeletal form of Homo sapiens was essentially stabilized, as it certainly has
done for skin pigmentation and other physical characteristics. However, the fact that
most people can master any language if raised appropriately, irrespective of their
genetic heritage, shows that these changes are not causal with respect to the structure
of language.
Work continues both to fill in the details of the Mirror System Hypothesis and to
develop a neurolinguistics which assesses mechanisms for language processing in the
human brain in relation to this evolutionary perspective. Itti and Arbib (2006) discuss
how to extend semantics from objects to “scenes” structured by actions (these ideas
have been developed in the treatment of semantic representations and template con-
struction grammar by Arbib and Lee 2008). Arbib and Bonaiuto (2008) suggest that
analysis of the action buffer (the missing link of Fig. 30.2) might employ a method
called Augmented Competitive Queuing (Bonaiuto and Arbib 2010). I have also suggested (briefly) how this might be related to a buffer for utterance formulation based on construction grammar. Details lie outside the scope of this article (but see Lee and Barrès 2013).
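Competitive queuing, the family of models to which Augmented Competitive Queuing belongs, reduces to a simple loop: every planned element holds an activation level, the most active element wins, is executed, and is then suppressed so that the next strongest can win. The sketch below shows only this basic scheme; the augmented model of Bonaiuto and Arbib (2010) additionally learns the desirability and executability of actions, which is not represented here.

# Basic competitive queuing: activation levels determine serial order.
def competitive_queuing(activations):
    """activations: dict mapping action -> activation level."""
    pending = dict(activations)
    order = []
    while pending:
        winner = max(pending, key=pending.get)  # strongest element wins the competition
        order.append(winner)
        del pending[winner]                     # suppress the winner so the next can fire
    return order

plan = {"reach": 0.9, "grasp": 0.7, "lift": 0.5, "place": 0.3}
print(competitive_queuing(plan))  # ['reach', 'grasp', 'lift', 'place']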
(iv) Subsequently, A anticipates B’s anticipation and produces the prefix in a ritualized
form (waiting for a response rather than completing X) in order to elicit Y.
For example, play hitting is an important part of the rough-and-tumble play of chimpan-
zees, and many individuals come to use a stylized arm-raise to indicate that they are
about to hit the other and thus initiate play (Goodall 1986). However, since there
are group-specific gestures – performed by the majority of individuals of one group,
but not observed in another – it would seem that even if ontogenetic ritualization
were involved in “first use” of a gesture, its spread throughout a group would involve
some form of simple imitation of gestures.
Pointing (using the index finger or extended hand) has only been observed in chim-
panzees interacting with their human experimenters (e.g., Leavens, Hopkins, and Bard
1996; Leavens, Hopkins, and Thomas 2004) as well as in human-raised or language-
trained apes (e.g., Gardner and Gardner 1969; Patterson 1978; Woodruff and Premack
1979), but never between conspecifics. Since both captive and wild chimpanzees share
the same gene pool, Leavens, Hopkins, and Bard (2005) argued that the occurrence
of pointing in captive apes is attributable to environmental influences on their com-
municative development. A related suggestion is that apes do not point for conspeci-
fics because the observing ape will not have the motive to help or inform others or
to share attention and information (Tomasello et al. 2005). One hypothesis that
I find attractive is this: A chimpanzee reaches through the bars to get a banana but
cannot reach it. However, a human does something another chimpanzee would not
do – recognizing the ape’s intention, the keeper gives him the banana. As a result,
the ape soon learns that a point (i.e., the failed reach) is enough to get the pointed-
to object without the exertion of trying to complete an unsuccessful reach. This is a
variation on ontogenetic ritualization that depends on the fact that humans provide
an environment which is far more responsive than that provided by conspecifics in
the wild:
(i) Individual A attempts to perform behavior X to achieve goal G, but fails – achieving
only a prefix Z;
(ii) Individual B infers goal G from this behavior and performs an action that achieves
the goal G for A;
(iii) In due course, A produces Z in a ritualized form to get B to perform an action that
achieves G for A (a toy simulation of this schema is sketched below).
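Steps (i)–(iii) can be rendered as a toy simulation in which a responsive partner completes the goal whenever it recognizes even the truncated prefix, so the full effortful reach is progressively abandoned. The responsiveness probability, effort scale, and update rule are invented; this illustrates the schema only and is not a model of chimpanzee learning.

# Toy simulation of ritualization with a responsive partner (e.g., a human keeper).
import random

random.seed(1)

def episode(effort, partner_responsiveness):
    """Return (goal_achieved, prefix_sufficed)."""
    prefix_recognized = random.random() < partner_responsiveness
    if prefix_recognized:
        return True, True           # B infers goal G from prefix Z and fulfils it
    return effort > 0.9, False      # otherwise only a near-complete reach could succeed

effort = 1.0                        # 1.0 = full reach, ~0.1 = ritualized point
for _ in range(20):
    achieved, prefix_sufficed = episode(effort, partner_responsiveness=0.8)
    if prefix_sufficed:
        effort = max(0.1, effort - 0.1)   # the cheap prefix worked; do less next time
print(f"effort after 20 episodes with a responsive partner: {effort:.1f}")
# With partner_responsiveness near 0 (conspecifics in the wild), effort stays at 1.0.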
Building on this, let me suggest three types of bodily communication – while admit-
ting that the range of articles in this Handbook, Body – Language – Communication, shows
that this is a very small sample:
Among the most challenging questions for the evolution of language is its ability to sup-
port “mental time travel” (Suddendorf and Corballis 1997, 2007), the ability to recall
the past and to plan and discuss possible outcomes for the future. But this takes us
beyond the scope of this article.
5. Putting it together
A partial view of what we have achieved in this article is provided by Fig. 30.3. The top
box combines the ability to recognize and perform small, meaningless actions (the
direct pathway) with the ability to recognize and perform familiar goal-directed actions
(the indirect pathway, including the input and output praxicons). A parallel set of mir-
ror systems would support the expression and recognition of emotions (the input and
output emoticons). However, we must go “beyond the mirror” (BtM) to sequence
actions, bringing in, for example, the basal ganglia. Given the data on monkeys and
apes, we must invoke other systems beyond the mirror to imitate actions. Apes seem
capable of simple imitation, but further evolution beyond the chimpanzee-human
common ancestor was needed to support complex imitation and the intentional use of pantomime for bodily communication.
[Fig. 30.3 schematic: labels include “Not a Flow of Data”, “Evolution From Praxis to Communication”, “Embodied Semantics”, “Schema networks”, “Perceive”, “Act”, and “Perceptuo-motor schema assemblage”.]
Fig. 30.3: A view of the language-ready human brain informed by the Mirror System Hypothesis
(loosely adapted from Arbib 2006). See text for details.
However, as stressed in Fig. 30.1, the recognition of an action is only a small part of
understanding the meaning of the action within the broader context of scene perception
(not to mention mental time travel) and action planning. The bottom box summarizes
the role of schema networks and assemblages of perceptual and motor schemas in
providing the embodied semantics (as discussed in the next paragraph) that supports
this broader understanding. The line linking the top and bottom boxes then indicates
both the separation and coordination between the planning of actions based on
recognition of current circumstances and goals, and the invocation of specific parameters
in adapting motor control to the specific objects with which interaction is taking place.
The theory of embodied semantics, however, needs to be treated with some caution.
For many authors (e.g., Feldman and Narayanan 2004; Gallese and Lakoff 2005) the
notion is that concepts are represented in the brain within the same sensory-motor cir-
cuitry on which the enactment of that concept relies. By contrast, Fig. 30.3 separates the
semantic pathway (at the bottom) from the basic sensorimotor pathways of the mirror
system (both mirror neurons and other neurons involved in the detailed specification
of actions). The suggestion is that knowing about objects and actions (and thus being
able to plan actions) must be distinguished from the detailed muscle control for
those actions. Nonetheless, the suggestion is that our semantics is grounded in our abil-
ity as embodied actors to interact with the world and the people around us. Then, as we
develop (both ontogenetically and phylogenetically), the concepts that can be referred
directly to physical and social interactions scaffold increasingly abstract concepts that
are, at best, tenuously related to the original sensorimotor processing (Arbib 2008).
Finally, the middle box shows the mirror system for words-as-articulatory-actions as
having evolved from the mirror system for actions (via pantomime, the coupling of ges-
ture and vocalization, and the conventionalization of communicative actions), rather
than being directly driven by it. Although not shown in the diagram, gestures remain
available to complement language (as in the co-speech gestures of McNeill 2005), but
the signs of signed language are the “words” of a fully-formed human language and
so, as part of a rich integrated system of syntax and semantics, are to be distinguished
from gestures more closely related to action. Duality of patterning then involves the
ability to put meaningless sounds together (the direct pathway; phonology) and the
ability to directly access words as appropriate sound assemblages (the input phonological
lexicon) and produce words as articulatory patterns (the output phonological lexicon).
For signed languages, the conventions of handshape, location and movement provide
the equivalent “phonology” (Stokoe 1960). We again need a linkage, now between the
middle box and the bottom box, to link words to their meanings. Again, we must go
beyond the mirror – applying various constructions (using syntax to build semantics)
to combine words into hierarchically structured utterances that express the richness of
our embodied semantics and the abstractions that grow from it, both working with
and moving beyond the capacities of non-linguistic bodily communication.
6. References
Arbib, Michael A. 2002. The mirror system, imitation, and the evolution of language. In: Kerstin
Dautenhahn and Chrystopher L. Nehaniv (eds.), Imitation in Animals and Artifacts. Complex
Adaptive Systems, 229–280. Cambridge: Massachusetts Institute of Technology Press.
Arbib, Michael A. 2005. From monkey-like action recognition to human language: An evolution-
ary framework for neurolinguistics (with commentaries and author’s response). Behavioral and
Brain Sciences 28: 105–167.
Arbib, Michael A. 2006. Aphasia, apraxia and the evolution of the language-ready brain. Apha-
siology 20: 1–30.
Arbib, Michael A. 2008. From grasp to language: Embodied concepts and the challenge of
abstraction. Journal of Physiology-Paris 102(1–3): 4–20.
Arbib, Michael A. 2010. Mirror system activity for action and language is embedded in the inte-
gration of dorsal and ventral pathways. Brain and Language 112: 12–24.
Arbib, Michael A. 2012. How the Brain Got Language: The Mirror System Hypothesis. New York:
Oxford University Press.
Arbib, Michael A. and James B. Bonaiuto 2008. From grasping to complex imitation: Modeling
mirror systems on the evolutionary path to language. Mind & Society 7: 43–64.
Arbib, Michael A., James B. Bonaiuto, Stéphane Jacobs and Scott H. Frey 2009. Tool use and the
distalization of the end-effector. Psychological Research 73: 441–462.
Arbib, Michael A. and JinYong Lee 2008. Describing visual scenes: Towards a neurolinguistics
based on construction grammar. Brain Research 1225: 146–162.
Arbib, Michael A., Katja Liebal and Simone Pika 2008. Primate vocalization, gesture, and the evo-
lution of human language. Current Anthropology 49(6): 1053–1076. (See http://www.journals.
uchicago.edu/doi/full/10.1086/593015#apa for access to supplementary materials and videos.)
Arbib, Michael A. and Giacomo Rizzolatti 1997. Neural expectations: a possible evolutionary
path from manual skills to language. Communication and Cognition 29: 393–424.
Armstrong, David F., William C. Stokoe and Sherman E. Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. Oxford:
Oxford University Press.
Bonaiuto, James B. and Michael A. Arbib 2010. Extending the mirror neuron system model, II:
What did I just do? A new role for mirror neurons. Biological Cybernetics 102: 341–359.
Bonaiuto, James B., Edina Rosta and Michael A. Arbib 2007. Extending the mirror neuron system
model, I: Audible actions and invisible grasps. Biological Cybernetics 96: 9–38.
Buccino, Giovanni, Fausta Lui, Nicola Canessa, Ilaria Patteri, Giovanna Lagravinese, Francesca
Benuzzi, Carlo A. Porro and Giacomo Rizzolatti 2004. Neural circuits involved in the recog-
nition of actions performed by nonconspecifics: An FMRI study. Journal of Cognitive Neu-
roscience 16(1): 114–126.
Calvo-Merino, Beatriz, Daniel E. Glaser, Julie Grèzes, Richard E. Passingham, and Patrick Hag-
gard 2005. Action observation and acquired motor skills: An FMRI study with expert dancers.
Cerebral Cortex 15(8): 1243–1249.
Corballis, Michael C. 2002. From Hand to Mouth, the Origins of Language. Princeton, NJ: Prince-
ton University Press.
Dapretto, Mirella, Mari S. Davies, Jennifer H. Pfeifer, Ashley A. Scott, Marian Sigman, Susan Y.
Bookheimer and Marco Iacoboni 2006. Understanding emotions in others: Mirror neuron dys-
function in children with autism spectrum disorders. Nature Neuroscience 9(1): 28–30.
Darwin, Charles 1872. The Expression of the Emotions in Man and Animals. Republished 1965. Chicago: University of Chicago Press.
De Renzi, Ennio 1989. Apraxia. In: Francois Boller and Jordan Grafman (eds.), Handbook of
Neuropsychology, Vol. 2, 245–263. Amsterdam: Elsevier.
Feldman, Jerome and Srinivas Narayanan 2004. Embodied meaning in a neural theory of lan-
guage. Brain & Language 89(2): 385–392.
Ferrari, Pier Francesco, Vittorio Gallese, Giacomo Rizzolatti and Leonardo Fogassi 2003. Mirror
neurons responding to the observation of ingestive and communicative mouth actions in the
monkey ventral premotor cortex. European Journal of Neuroscience 17(8): 1703–1714.
Gallese, Vittorio, Luciano Fadiga, Leonardo Fogassi and Giacomo Rizzolatti 1996. Action recog-
nition in the premotor cortex. Brain 119: 593–609.
Gallese, Vittorio and Alvin Goldman 1998. Mirror neurons and the simulation theory of mind-
reading. Trends in Cognitive Sciences 2: 493–501.
Gallese, Vittorio and George Lakoff 2005. The brain’s concepts: The role of the sensory-motor
system in conceptual knowledge. Cognitive Neuropsychology 22: 455–479.
Gardner, R. Allen and Beatriz T. Gardner 1969. Teaching sign language to a chimpanzee. Science
165: 664–672.
Gazzola, Valeria, Lisa Aziz-Zadeh and Christian Keysers 2006. Empathy and the somatotopic
auditory mirror system in humans. Current Biology 16(18): 1824–1829.
Goodall, Jane 1986. The Chimpanzees of Gombe: Patterns of Behavior. Cambridge, MA: Harvard
University Press.
Grafton, Scott T., Michael A. Arbib, Luciano Fadiga and Giacomo Rizzolatti 1996. Localization of
grasp representations in humans by positron emission tomography. 2. Observation compared
with imagination. Experimental Brain Research 112(1): 103–111.
Hewes, Gordon W. 1973. Primate communication and the gestural origin of language. Current
Anthropology 14(1–2): 5–24.
Hoff, Bruce and Michael A. Arbib 1993. Models of trajectory formation and temporal interaction
of reach and grasp. Journal of Motor Behavior 25(3): 175–192.
Itti, Laurent and Michael A. Arbib 2006. Attention and the minimal subscene. In: Michael A.
Arbib (ed.), Action to Language via the Mirror Neuron System, 289–346. Cambridge: Cam-
bridge University Press.
Kohler, Evelyn, Christian Keysers, M. Alessandra Umiltà, Leonardo Fogassi, Vittorio Gallese and
Giacomo Rizzolatti 2002. Hearing sounds, understanding actions: Action representation in
mirror neurons. Science 297(5582): 846–848.
Leavens, David A., William D. Hopkins and Kim A. Bard 1996. Indexical and referential pointing
in chimpanzees (Pan troglodytes). Journal of Comparative Psychology 110(4): 346–353.
Leavens, David A., William D. Hopkins and Kim A. Bard 2005. Understanding the point of point-
ing. Current Directions in Psychological Science 14(4): 185–189.
Leavens, David A., William D. Hopkins and Roger K. Thomas 2004. Referential communication
by chimpanzees (Pan troglodytes). Journal of Comparative Psychology 118(1): 48–57.
Lee, Jinyong and Victor Barrès 2013. From visual scenes to language and back via Template Con-
struction Grammar. Neuroinformatics.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Oberzaucher, Elisabeth and Karl Grammer 2008. Everything is movement: On the nature of em-
bodied communication. In: Ipke Wachsmuth, Manuela Lenzen and Guenther Knoblich (eds.),
Embodied Communication in Humans and Machines, 151–177. Oxford: Oxford University
Press.
Oztop, Erhan and Michael A. Arbib 2002. Schema design and implementation of the grasp-related
mirror neuron system. Biological Cybernetics 87(2): 116–140.
Patterson, Francine 1978. Conversations with a gorilla. National Geographic 134(4): 438–465.
Petrides, Michael and Deepak N. Pandya 2009. Distinct Parietal and Temporal Pathways to the
Homologues of Broca’s Area in the Monkey. Public Library of Science Biology 7(8): e1000170.
Pika, Simone, Katja Liebal and Michael Tomasello 2003. Gestural communication in young gor-
illas (Gorilla gorilla): Gestural repertoire, learning and use. American Journal of Primatology
60(3): 95–111.
Poizner, Howard, Edward S. Klima and Ursula Bellugi 1987. What the Hands Reveal about the
Brain. Cambridge: Massachusetts Institute of Technology Press.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neuros-
ciences 21: 188–194.
Rizzolatti, Giacomo and Michael A. Arbib 1999. From grasping to speech: Imitation might pro-
vide a missing link: Reply. Trends in Neurosciences 22(4): 152.
Rizzolatti, Giacomo and Laila Craighero 2004. The mirror-neuron system. Annual Review of Neu-
roscience 27: 169–192.
Rizzolatti, Giacomo, Luciano Fadiga, Massimo Matelli, Valentino Bettinardi, Eraldo Paulesu, Da-
niela Perani and Ferruccio Fazio 1996. Localization of grasp representations in humans by
PET: 1. Observation versus execution. Experimental Brain Research 111(2): 246–252.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain: How Our Minds Share Ac-
tions, Emotions, and Experience. Translated from the Italian by Frances Anderson. Oxford:
Oxford University Press.
Roberts, Jacqueline M. 1989. Echolalia and comprehension in autistic children. Journal of Autism
and Developmental Disorders 19(2): 271–281.
Rothi, Lesley J., Cynthia Ochipa and Kenneth M. Heilman 1991. A cognitive neuropsychological
model of limb praxis. Cognitive Neuropsychology 8: 443–458.
Seyfarth, Robert M. and Dorothy L. Cheney 1997. Some general features of vocal development in
nonhuman primates. In: Charles T. Snowdon and Martine Hausberger (eds.), Social Influences
on Vocal Development, 249–273. Cambridge: Cambridge University Press.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Visual Communication Sys-
tems of the American Deaf. Buffalo, NY: University of Buffalo.
Suddendorf, Thomas and Michael C. Corballis 1997. Mental time travel and the evolution of the
human mind. Genetic, Social and General Psychology Monographs 123(2): 133–167.
Suddendorf, Thomas and Michael C. Corballis 2007. The evolution of foresight: What is mental
time travel, and is it unique to humans? Behavioral and Brain Sciences 30: 299–351.
Tomasello, Michael and Josep Call 1997. Primate Cognition. New York: Oxford University Press.
Tomasello, Michael, Malinda Carpenter, Josep Call, Tanya Behne and Henrike Moll 2005. Under-
standing and sharing intentions: The origins of cultural cognition. Behavioral and Brain
Sciences 28: 1–17.
Umiltà, M. Alessandra, L. Escola, Irakli Intskirveli, Franck Grammont, Magali Rochat, Fausto
Caruana, Ahmad Jezzini, Vittorio Gallese and Giacomo Rizzolatti 2008. When pliers become
fingers in the monkey motor system. Proceedings of the National Academy of Sciences of the
USA 105(6): 2209–2213.
Visalberghi, Elisabetta and Dorothy M. Fragaszy 2001. Do monkeys ape? Ten years after. In: Ker-
stin Dautenhahn and Chrystopher L. Nehaniv (eds.), Imitation in Animals and Artifacts, 471–
500. Cambridge: Massachusetts Institute of Technology Press.
Voelkl, Bernhard and Ludwig Huber 2000. True imitation in marmosets. Animal Behaviour 60:
195–202.
Wicker, Bruno, Christian Keysers, Jane Plailly, Jean-Pierre Royet, Vittorio Gallese and Giacomo
Rizzolatti 2003. Both of us disgusted in my insula: The common neural basis of seeing and feel-
ing disgust. Neuron 40(3): 655–664.
Wohlschläger, Andreas, Merideth Gattis and Harold Bekkering 2003. Action generation and
action perception in imitation: an instance of the ideomotor principle. Philosophical Transac-
tions of the Royal Society of London 358: 501–515.
Woodruff, Guy and David Premack 1979. Intentional communication in the chimpanzee: The
development of deception. Cognition 7: 333–352.
Abstract
Although it is commonly assumed that language evolved from primate calls, evidence
now converges on the alternative view that it emerged from manual gestures. Signed
languages are recognized as true languages, and manual gestures accompany and add
meaning to regular speech. The mirror system in primates, involving neural circuits spe-
cialized for both the production and perception of manual grasping, may have formed the
platform from which a manual form of communication evolved. Great apes in captivity
cannot be taught to speak, but they have acquired manual systems that have at least some
language-like properties and even in the wild, ape gestures are more flexible and inten-
tional than are their vocalizations. In hominins, bipedalism probably led to even greater
flexibility of gestures. A dramatic increase in brain size during the Pleistocene may have
signalled the emergence of grammatical features, but language probably remained pri-
marily gestural, with added facial and vocal elements, until the emergence of Homo sa-
piens some 200,000 years ago, when speech emerged as the dominant mode. The practical
advantages of speech over gesture, including the freeing of the hands, help explain the
dramatic rise of sophisticated tools, bodily ornamentation, art, and perhaps music, in
our own species.
1. Introduction
Suppose we had no voice or tongue, and wanted to communicate with one another, should
we not, like the deaf and dumb, make signs with the hands and head and rest of the body?
(from Plato’s Cratylus, 360 BC)
Signed languages have been recognized for well over 2,000 years. Interest seemed to
pick up from the 17th century, though, when Francis Bacon ([1605] 1828) suggested
that the signed languages of the deaf and dumb were superior to spoken language,
since (or so he thought) they did not require grammar, the curse of Babel. The English
physician John Bulwer wrote in 1644 of “the natural language of the hand,” and was the
first to propose that the deaf might be educated through lip reading. Signed languages
were not studied formally, however, until the work of Abbé Charles-Michel de l’Épée
in the 1770s when he developed a sign language for use in a private school for the deaf
in Paris. This language was amalgamated with signs used by the deaf population in the
US to form the basis for American Sign Language (ASL).
The French philosopher Géraud de Cordemoy ([1668] 1972) called gestures “the first
of all languages,” noting that they were universal and understood everywhere. The idea
that gestures were an evolutionary precursor to spoken language was further developed
in the 18th century, although constrained by religious doctrine proclaiming language to
be a gift from God. The Italian philosopher Giambattista Vico ([1744] 1953) accepted
the Biblical story of the divine origin of speech, but proposed that after the Flood hu-
mans reverted to savagery, and language emerged afresh with the use of gestures and
physical objects, only later incorporating vocalizations. Condillac ([1746] 1971) also be-
lieved that language evolved from manual gestures but, as an ordained priest, he felt
compelled to disguise his theory in the form of a fable about two children isolated in
the desert after the Flood. They began by communicating in manual and bodily ges-
tures, which were eventually supplanted by vocalizations. Condillac’s compatriot
Jean-Jacques Rousseau (1781) later endorsed the gestural theory more openly.
In the 19th century, Charles Darwin referred to the role of gestures in his book
The Descent of Man: “I cannot doubt that language owes its origins to the imitation
and modification of various natural sounds, and man’s own instinctive cries, aided by
signs and gestures” (Darwin [1871] 1896: 86; emphasis added). Shortly afterwards,
the philosopher Friedrich Nietzsche, in his 1878 book Human, All Too Human, wrote
as follows:
Imitation of gesture is older than language, and goes on involuntarily even now, when the
language of gesture is universally suppressed, and the educated are taught to control their
muscles. The imitation of gesture is so strong that we cannot watch a face in movement
without the innervation of our own face (one can observe that feigned yawning will
evoke natural yawning in the man who observes it). The imitated gesture led the imitator
back to the sensation expressed by the gesture in the body or face of the one being imi-
tated. This is how we learned to understand one another; this is how the child still learns
to understand its mother (…) As soon as men understood each other in gesture, a symbol-
ism of gesture could evolve. I mean, one could agree on a language of tonal signs, in such
a way that at first both tone and gesture (which were joined by tone symbolically) were
produced, and later only the tone. (Nietzsche [1878] 1986: 219)
This extract also anticipates the later discovery of the mirror system, discussed below.
In 1900, Wilhelm Wundt wrote a two-volume work on speech, and argued that a uni-
versal sign language was the origin of all languages. He wrote, though, under the mis-
apprehension that all deaf communities use the same system of signing, and that signed
languages are useful only for basic communication, and cannot communicate abstract
ideas. We now know that signed languages vary widely from community to community,
and can have all of the communicative sophistication of speech (e.g., Emmorey 2002).
2. Modern developments
The gestural theory began to assume its modern shape with an article published in 1973
by the anthropologist Gordon W. Hewes. He too drew on evidence from signed lan-
guages, and also referred to contemporary work showing that great apes were unable
to learn to speak but could use manual gestures in a language-like way, with at least
moderate success. It was not until the 1990s, though, that the idea began to take
hold. Pinker and Bloom’s (1990) influential article on language evolution made no men-
tion of Hewes’ work, but was followed by an increasing number of publications that
picked up the gestural theme (e.g., Armstrong, Stokoe, and Wilcox 1995; Corballis
1992, 2002; Rizzolatti and Arbib 1998). Three books published since 2007 converge
on the gestural theory, but from very different perspectives, one based on signed lan-
guages (Armstrong and Wilcox 2007), one on gestural communication in great apes
(Tomasello 2008), and one on the mirror system in the primate brain (Rizzolatti and
Sinigaglia 2008). I discuss each in turn.
or vocal, is basically gestural rather than amodal, with manual gestures taking prece-
dence in human evolution (e.g., Armstrong and Wilcox 2007). In principle, at least, lan-
guage could once have existed in a form consisting entirely of manual and facial
gestures, comparable to present-day signed languages.
One difference between signed and spoken languages is that many of the signs are
iconic, representing the actual shapes of objects or sequences of events. In Italian
Sign Language some 50% of the hand signs and 67% of the bodily locations of signs
stem from iconic representations, in which there is a degree of spatiotemporal mapping
between the sign and its meaning (Pietrandrea 2002). In American Sign Language, too,
some signs are purely arbitrary, but many more are iconic (Emmorey 2002). For exam-
ple, the sign for “erase” resembles the action of erasing a blackboard, and the sign for
“play piano” mimics the action of actually playing a piano. The iconic nature of signed
languages may seem to contravene the “arbitrariness of the sign,” considered by Ferdi-
nand de Saussure (1916) to be one of the defining properties of language. But arbitrari-
ness is more a consequence of the language medium, and of expediency, than a
necessary property of language per se.
Speech, unlike manual signing, requires that the information be linearized, squeezed
into a sequence of sounds that are necessarily limited in terms of how they can capture
the physical nature of what they represent. The linguist Charles Hockett (1978) put it
this way:
In any case, not all signs are iconic, and they can also be iconic without being transpar-
ently so. Seemingly iconic signs often cannot be guessed by naïve observers (Pizzuto
and Volterra 2000). They also tend to become less iconic and more arbitrary over
time, in the interests of speed, efficiency, and the constraints of the communication
medium. This process is called conventionalization (Burling 1999). Once the principle
of conventionalization is established, there is no need to retain an iconic component,
or even to depend on visual signals. We are quick to learn arbitrary labels, whether
for objects, actions, emotions, or abstract concepts. Manual gesture may still be neces-
sary to establish links in the first place – the child can scarcely learn the meaning of the
word dog unless her attention is drawn to the animal itself – but there is otherwise no
reason why the labels themselves need not be based on patterns of sound. Of course
some concepts, such as the moo of a cow or miaow of a cat, depend on sound rather
than sight. Pinker (2007) notes a number of newly minted examples: oink, tinkle,
barf, conk, woofer, tweeter. But most spoken words bear no physical relation to what
they represent.
The symbols of signed languages are less constrained. The hands and arms can mimic
the shapes of real-world objects and actions, and to some extent lexical information can
be delivered in parallel instead of being forced into rigid temporal sequence. Even so,
conventionalization allows signs to be simplified and speeded up, to the point that many
of them lose most or all of their iconic aspect. The American Sign Language sign for
home was once a combination of the sign for eat, which is a bunched hand touching
the mouth, and the sign for sleep, which is a flat hand on the cheek. Now it consists
of two quick touches on the cheek with a bunched handshape, so the original iconic
components are all but lost (Frishberg 1975).
Two signed languages have emerged, apparently de novo, in recent times, and have
provided some insights into how language might have evolved in the first place. One
arose in Nicaragua, where deaf people were isolated from one another until the Sandi-
nista government in 1979 created the first schools for the deaf. Since then, the children
in these schools have invented their own sign language, which has developed into
the system now called Lenguaje de Signos Nicaragüense (LSN). In the course of
time, Lenguaje de Signos Nicaragüense has changed from a system of holistic signs
to a more combinatorial format. For example, the children were told a story of a cat
that swallowed a bowling ball, and then rolled down a steep street in a “waving, wob-
bling manner.” Asked to sign the motion, some children indicated the motion holisti-
cally, moving the hand downward in a waving motion. Others, however, segmented
the motion into two signs, one representing downward motion and the other represent-
ing the waving motion, and this version increased after the first cohort of children had
moved through the school (Senghas, Kita, and Özyürek 2004).
The other newly-minted sign language arose among the Al-Sayyid, a Bedouin com-
munity in the Negev Desert in Israel. Some 150 of the people in this community of
about 3,500 have inherited a condition leaving them profoundly deaf, and although
these are in the minority their sign language, Al-Sayyid Bedouin Sign Language
(ABSL), is widely used in the community, along with a spoken dialect of Arabic. Al-
Sayyid Bedouin Sign Language is now in only its third generation of signers. A feature
of Al-Sayyid Bedouin Sign Language is that it lacks the equivalent of phonemes or mor-
phemes. Each signed word is essentially a whole, not decomposable into parts (Aronoff,
Meir, Padden, et al. 2008). In this respect, it seems to defy what has been called duality
of patterning, or combinatorial structure at two levels, the phonological and the gram-
matical. Hockett (1960) listed duality of patterning as one of the “design features” of
language, so its absence in Al-Sayyid Bedouin Sign Language might be taken to
mean that Al-Sayyid Bedouin Sign Language is not a true language. Yet, Hockett him-
self recognized that the design features did not appear all at once, and that duality of
patterning was probably a latecomer. In the early stages of language development,
whole signs or sounds can be used to refer to distinct entities, such as objects or actions,
but as the required vocabulary grows individual signs or sounds become increasingly
indistinguishable. The solution is then to form words that are combinations of elements,
and duality of patterning is born (Hockett 1978). Al-Sayyid Bedouin Sign Language,
then, may be regarded as a language in the early stages of development. Aronoff con-
cludes that words are the primary elements of all languages, and do not acquire phonol-
ogy or morphology until they are thrust upon them; as he put it, “In the beginning was
the word” (Aronoff 2007: 803).
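The combinatorial pressure described here can be made concrete with a little arithmetic; the numbers are purely illustrative. An inventory of holistic signs grows only as fast as new globally distinct forms can be kept apart, whereas words assembled from a small set of meaningless elements multiply exponentially with the number of slots.

# Illustrative arithmetic for duality of patterning (all numbers are made up).
elements = 30        # distinguishable handshape/location/movement values, say
slots = 3            # elements combined per word

holistic_inventory = 30                 # roughly as many forms as can be told apart holistically
combinatorial_inventory = elements ** slots

print(holistic_inventory)        # 30
print(combinatorial_inventory)   # 27000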
Other signed languages, such as American Sign Language, do have the equivalent of
phonemes, sometimes called cheremes, but more often referred to simply as phonemes
(Emmorey 2002). It is reasonable to suppose that Al-Sayyid Bedouin Sign Language
will evolve duality of patterning in the course of time. The study of both Lenguaje
de Signos Nicaragüense and Al-Sayyid Bedouin Sign Language has therefore yielded
insights into the manner in which language develops structure, perhaps not so much
through some innately given universal grammar (Chomsky 1975) as through a gradual
and pragmatic process of grammaticalization (Heine and Kuteva 2007; Hopper and
Traugott 1993).
self-serving; for example, a chimp may point to a desirable object that is just out of
reach, with the aim of getting a watching human to help. True language requires the fur-
ther step of shared attention, so that the communicator knows not only what the recip-
ient knows or is attending to, but knows also that the recipient knows that the
communicator knows this. This kind of recursive mind-reading enables communication
beyond the making of simple requests to the sharing of knowledge, a distinctive property
of language.
Tomasello (2008) goes on to summarize work on pointing in human infants, suggest-
ing that shared attention and cooperative interchange emerge from about one year of
age. The one-year-old points to objects not only to request them, but also to indicate
things that an adult is already looking at, in the apparent understanding that attention
to the object is shared. Infants therefore seem already to understand sharing and shared
communication before language itself has appeared. In this, they have already moved
beyond the communicative capacity of the chimpanzee. Even so, Tomasello’s work re-
veals an essential continuity between the pointing behaviour of chimpanzees and that of
human infants.
It is also worth making the point that manual gestures provide a much more natural
means of communicating about events in the world than vocal sounds do. Pointing re-
fers naturally to objects and events, providing a link between specific gestures – or
words – and real-world entities. The hands and arms provide a natural vehicle for mim-
icry and description, since they are free to move in four-dimensional space (three of
space, one of time). Indeed the whole body can be used to pantomime events. Of
course, as we have seen, vocalizations can be used to mimic some sounds in onomato-
poeic fashion, but the natural world presents itself much more as a visuo-tactile entity
than as an acoustic one.
humans (Fazio, Cantagallo, Craighero, et al. 2009). They are not impaired in the encod-
ing of physical actions. Broca’s area, then, appears to be specialized for both the
production and the understanding of human action, including vocal action.
In nonhuman primates, though, the mirror system seems not to incorporate vocaliza-
tion (Rizzolatti and Sinigaglia 2008), although it is receptive to acoustic input. Kohler,
Keysers, Umiltà, et al. (2002) recorded neurons in area F5 of the monkey responding to
the sounds of manual actions, such as tearing paper or cracking peanuts. Significantly,
there was no response to monkey calls. This is consistent with evidence that vocaliza-
tions in nonhuman primates are under limbic rather than neocortical control (Ploog
2002), and are therefore not (yet) part of the mirror system. Even in the chimpanzee,
voluntary control of vocalization appears to be limited, at best (Goodall 1986).
Mirror neurons are part of a more general “mirror system” involving regions of the
brain other than F5. In monkeys, cells in the superior temporal sulcus (STS) also
respond to observed biological actions, including grasping (Perrett, Harries, Bevan,
et al. 1989), although few if any respond when the animal itself performs an action.
F5 and superior temporal sulcus connect to area PF in the inferior parietal lobule,
where there are also neurons, known as “PF mirror neurons,” that respond to both
the execution and perception of actions (Rizzolatti, Fogassi, and Gallese 2001). This ex-
tended mirror system in monkeys largely overlaps with the homologues of the wider
cortical circuits in humans that are involved in language, leading to the notion that
language evolved out of the mirror system (see Arbib 2005 for detailed speculation).
It reached a peak about 700,000 years ago, and remains about three times the size
expected in a great ape of the same body size (Wood and Collard 1999).
Communication was no doubt a critical element of the cognitive niche. Survival de-
pended on the ability to encode, and no doubt express, information as to “who did what to
whom, when, where, and why” (Pinker 2003: 27). The problem is that the number of
combinations of actions, actors, locations, time periods, implements, and so forth, that
define episodes becomes very large, and a system of holistic calls to describe those epi-
sodes rapidly taxes the perceptual and memory systems. Syntax may then have emerged
as a series of rules whereby episodic elements could be combined.
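To make the scale of that combinatorial problem concrete, the following minimal sketch in Python uses purely hypothetical element counts (the chapter gives no figures) to contrast the number of distinct holistic calls an episode-by-episode system would need with the vocabulary a combinatorial system would need.

```python
# Illustrative arithmetic only; the element counts below are hypothetical,
# not taken from the chapter.
actors = 20       # who
actions = 30      # did what
patients = 20     # to whom
places = 10       # where
times = 5         # when

# A holistic call system needs one distinct, memorized signal per episode:
holistic_calls = actors * actions * patients * places * times
print(holistic_calls)        # 600000 distinct calls to learn and discriminate

# A combinatorial (syntactic) system needs one signal per element,
# plus rules for combining them into descriptions of episodes:
combinatorial_vocab = actors + actions + patients + places + times
print(combinatorial_vocab)   # 85 signals
```

The gap only widens as further kinds of episodic elements are added, which is the pressure under which, on this account, syntax would have emerged.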
The expressiveness of modern signed languages tells us that manual gestures would
have been sufficient to carry the communicative burden. A gradual switch from hand to
mouth was probably driven by the increasing involvement of the hands in manufacture,
and perhaps in transporting belongings or booty. Manufacture of stone tools, considered
a conceptual advance beyond the opportunistic use of sticks or rocks as tools, appears in
the fossil record from some 2.5 million years ago (Semaw, Renne, Harris, et al. 1997).
From some 1.8 million years ago, Homo erectus began to migrate out of Africa into Asia
and later into Europe, and the Acheulian industry emerged, with large bifacial tools
and handaxes, marking a significant advance over the simple flaked tools of the earlier
industry.
With the hands increasingly involved in such activities, the burden of communica-
tion may have shifted to the face, which provides sufficient diversity of movement
and expression to act as a signaling device. Signed languages involve communicative
movements of the face as well as of the hands (Sutton-Spence and Boyes-Braem 2001),
and eye-movement recordings suggest that signers watching signed discourse focus
mostly on the face and mouth, and relatively little on the hands or upper body (Muir
and Richardson 2005). In American Sign Language, facial expressions and head move-
ments can turn an affirmative sentence into a negation, or a question, as well as providing
the visual equivalent of prosody in speech (Emmorey 2002).
The incorporation of vocalization into the gestural system may then have been a rel-
atively small step. One advantage is that voicing makes gestures of the tongue, normally
partly hidden from view, recoverable by the receiver through audition rather than
vision. But vocal communication has other advantages. Unlike signed language, it can
be carried on at night, or when the line of sight between sender and receiver is blocked.
Communication at night may have been critical to survival in a hunter-gatherer society.
The San, a modern hunter-gatherer society, are known to talk late at night, sometimes
all through the night, to resolve conflict and share knowledge (Konner 1982). Speech is
much less energy-consuming than manual gesture. Signing can impose quite severe
physical demands, whereas the physiological costs of speech are so low as to be nearly
unmeasurable (Russell, Cerny, and Stathopoulos 1998). Speech adds little to the energy
cost of breathing, which we must do anyway to sustain life.
But it was perhaps the freeing of the hands for other adaptive functions, such as car-
rying things, and the manufacture of tools, that was the most critical. Vocal language
allows people to use tools and at the same time explain verbally what they are doing,
leading perhaps to pedagogy (Corballis 2002). Indeed, this may explain the dramatic
rise of more sophisticated tools, bodily ornamentation, art, and perhaps music, in our
own species. This is sometimes linked to the Late Stone Age dating from about
50,000 years ago (Klein 2008), but more likely originated from around 75,000 years
ago in Africa (Jacobs, Roberts, Galbraith, et al. 2008), shortly before the human dis-
persal from Africa at around 55,000 to 60,000 years ago (Mellars 2006). These develop-
ments, associated with Homo sapiens, are often attributed to the emergence of language
itself, but I suggest that the critical innovation was not language, but speech.
It is unclear precisely how the neurophysiological change allowing vocalization to
become fully incorporated into the mirror system occurred. One possible lead comes
from the forkhead box transcription factor, FOXP2. A mutation of this gene has re-
sulted in a severe deficit in vocal articulation, and a functional magnetic resonance ima-
ging study revealed that members of a family affected by the mutation, unlike their
unaffected relatives, showed no activation in Broca’s area while covertly generating
verbs (Liégeois, Baldeweg, Connelly, et al. 2003). Although highly conserved in mam-
mals, the forkhead box transcription factor 2 gene underwent two mutations since the
split between hominin and chimpanzee lines. One estimate placed the more recent
of these at “not less than” 100,000 years ago (Enard, Przeworski, Fisher, et al. 2002).
Contrary evidence, though, comes from a report that the mutation is also present in the
DNA of a 45,000-year-old Neandertal fossil (Krause, Lalueza-Fox, Orlando, et al.
2007), suggesting that it goes back as much as 700,000 years ago to the common
human-Neandertal ancestor. But this is challenged, in turn, by Coop, Bullaughey, Luca,
et al. (2008), who used phylogenetic dating of the haplotype to re-estimate the time at
a mere 42,000 years ago. Further clarification of this issue, and of the precise role of fork-
head box transcription factor 2 in vocal articulation, may lead to a better understanding
of when and why we came to talk, and how this impacted on human culture.
4. Conclusion
Evidence from signed languages, ape communication, and the mirror system strongly
points to the idea that language originated in manual gestures. The shift to speech
was probably progressive, with increasing involvement of the face, and gradual incorpo-
ration of vocal elements. Even today, manual and facial gestures accompany speech
(McNeill 2005). It is not clear when and how speech eventually became dominant,
but the switch may have been restricted to Homo sapiens, and the consequent freeing
of the hands may explain the extraordinary flowering of manufacture and art in human
culture, and why we eventually prevailed over the Neandertals, who were driven to
extinction some 30,000 years ago (Corballis 2004).
5. References
Arbib, Michael A. 2005. From monkey-like action recognition to human language: An evolution-
ary framework for neurolinguistics. Behavioral and Brain Sciences 28: 105–168.
Armstrong, David F., William C. Stokoe and Sherman E. Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Armstrong, David F. and Sherman E. Wilcox 2007. The Gestural Origin of Language. Oxford:
Oxford University Press.
Aronoff, Mark 2007. In the beginning was the word. Language 83: 803–830.
Aronoff, Mark, Irit Meir, Carol A. Padden and Wendy Sandler 2008. The roots of linguistic orga-
nization in a new language. Interaction Studies 9: 133–153.
Bacon, Francis 1828. Of the Proficience and Advancement of Learning, Divine and Human.
London: Dove. First published [1605].
Binkofski, Ferdinand and Giovanni Buccino 2004. Motor functions of the Broca’s region. Brain
and Language 89: 362–389.
Broca, Paul 1861. Remarques sur le siège de la faculté de la parole articulée, suivies d’une obser-
vation d’aphémie (perte de parole). Bulletin de la Société d’Anatomie (Paris) 36: 330–357.
Browman, Catherine P. and Louis F. Goldstein 1995. Dynamics and articulatory phonology. In:
Tim van Gelder and Robert F. Port (eds.), Mind as Motion. Explorations in the Dynamics of
Cognition, 175–193. Cambridge: Massachusetts Institute of Technology Press.
Buccino, Giovanni, Ferdinand Binkofski, Gereon R. Fink, Luciano Fadiga, Leonardo Fogassi, Vit-
torio Gallese, Rüdiger J. Seitz, Karl Zilles, Giacomo Rizzolatti and Hans-Joachim Freund 2001.
Action observation activates premotor and parietal areas in a somatotopic manner: An fMRI
study. European Journal of Neuroscience 13: 400–404.
Bulwer, John 1644. Chirologia: On the Natural Language of the Hand. London.
Burling, Robbins 1999. Motivation, conventionalization, and arbitrariness in the origin of lan-
guage. In: Barbara J. King (ed.), The Origins of Language: What Nonhuman Primates Can
Tell Us, 307–350. Santa Fe, NM: School of American Research Press.
Burling, Robbins 2005. The Talking Ape. New York: Oxford University Press.
Chomsky, Noam 1975. Reflections on Language. New York: Pantheon.
Condillac, Étienne Bonnot de 1971. An Essay on the Origin of Human Knowledge. Translated by
T. Nugent. Gainesville, FL: Scholars Facsimiles and Reprints. First published [1746].
Coop, Graham, Kevin Bullaughey, Francesca Luca and Molly Przeworski 2008. The timing of
selection of the human FOXP2 gene. Molecular Biology and Evolution 25: 1257–1259.
Corballis, Michael C. 1992. On the evolution of language and generativity. Cognition 44: 197–226.
Corballis, Michael C. 2002. From Hand to Mouth: The Origins of Language. Princeton, NJ: Prin-
ceton University Press.
Corballis, Michael C. 2004. The origins of modernity: Was autonomous speech the critical factor?
Psychological Review 111: 543–552.
Cordemoy, Géraud de 1972. A Philosophical Discourse Concerning Speech. Delmar, New York:
Scholars’ Facsimiles and Reprints. First published [1668].
Darwin, Charles 1896. The Descent of Man and Selection in Relation to Sex. London: William
Clowes. First published [1871].
D’Ausilio, Alessandro, Friedemann Pulvermüller, Paola Salmas, Ilaria Bufalari, Chiara Beglio-
mini and Luciano Fadiga 2009. The motor somatotopy of speech perception. Current Biology
19: 381–385.
Egnor, S.E. Roian and Marc D. Hauser 2004. A paradox in the evolution of primate vocal learn-
ing. Trends in Neurosciences 27: 649–654.
Emmorey, Karen 2002. Language, Cognition, and Brain: Insights from Sign Language Research.
Hillsdale, NJ: Erlbaum.
Enard, Wolfgang, Molly Przeworski, Simon E. Fisher, Cecilia S.L. Lai, Victor Wiebe, Takashi Ki-
tano, Anthony P. Monaco and Svante Pääbo 2002. Molecular evolution of FOXP2, a gene in-
volved in speech and language. Nature 418: 869–871.
Fazio, Patrik, Anna Cantagallo, Laila Craighero, Alessandro D’Ausilio, Alice C. Roy, Thierry
Pozzo, Ferdinando Calzolari, Enrico Granieri and Luciano Fadiga 2009. Encoding of human
action in Broca’s area. Brain 132: 1980–1988.
Frishberg, Nancy 1975. Arbitrariness and iconicity in American Sign Language. Language 51: 696–
719.
Galantucci, Bruno, Carol A. Fowler and Michael T. Turvey 2006. The motor theory of speech per-
ception reviewed. Psychonomic Bulletin and Review 13: 361–377.
Gardner, R. Allen and Beatrix T. Gardner 1969. Teaching sign language to a chimpanzee. Science
165: 664–672.
Gentilucci, Maurizio, Francesca Benuzzi, Massimo Gangitano and Silvia Grimaldi 2001. Grasp
with hand and mouth: A kinematic study on healthy subjects. Journal of Neurophysiology
86: 1685–1699.
Gentilucci, Maurizio and Michael C. Corballis 2006. From manual gesture to speech: A gradual
transition. Neuroscience & Biobehavioral Reviews 30: 949–960.
Goodall, Jane 1986. The Chimpanzees of Gombe: Patterns of Behavior. Cambridge, MA: Harvard
University Press.
Grice, Paul 1975. Logic and conversation. In: Peter Cole and Jerry L. Morgan (eds.), Syntax and
Semantics, Vol. 3: Speech Acts, 43–58. New York: Academic Press.
Hanakawa, Takashi, Ilka Immisch, Keiichiro Toma, Michael A. Dimyan, Peter Van Gelderen and
Mark Hallett 2003. Functional properties of brain areas associated with motor execution and
imagery. Journal of Neurophysiology 89: 989–1002.
Heine, Bernd and Tania Kuteva 2007. The Genesis of Grammar. Oxford: Oxford University Press.
Hewes, Gordon W. 1973. Primate communication and the gestural origins of language. Current
Anthropology 14: 5–24.
Hewes, Gordon W. 1996. A history of the study of language origins and the gestural primacy
theory. In: Andrew Lock and Charles R. Peters (eds.), Handbook of Human Symbolic Evolu-
tion, 571–595. Oxford: Clarendon Press.
Hockett, Charles F. 1960. The origin of speech. Scientific American 203(3): 88–96.
Hockett, Charles F. 1978. In search of Jove’s brow. American Speech 53: 243–315.
Hopper, Paul J. and Elizabeth C. Traugott 1993. Grammaticalization. Cambridge: Cambridge Uni-
versity Press.
Jacobs, Zenobia, Richard G. Roberts, Rex F. Galbraith, Hilary J. Deacon, Rainer Grün, Alex
Mackay, Peter Mitchell, Ralf Vogelsang and Lyn Wadley 2008. Ages for the Middle Stone
Age of Southern Africa: Implications for human behavior and dispersal. Science 322: 733–735.
Klein, Richard G. 2008. Out of Africa and the evolution of human behavior. Evolutionary Anthro-
pology 17: 267–281.
Kohler, Evelyne, Christian Keysers, M. Alessandra Umiltà, Leonardo Fogassi, Vittorio Gallese
and Giacomo Rizzolatti 2002. Hearing sounds, understanding actions: Action representation
in mirror neurons. Science 297: 846–848.
Konner, Melvin 1982. The Tangled Wing: Biological Constraints on the Human Spirit. New York:
Harper.
Krause, Johannes, Carles Lalueza-Fox, Ludovic Orlando, Wolfgang Enard, Richard E. Green,
Hernán A. Burbano, Jean-Jacques Hublin, Catherine Hänni, Javier Fortea, Marco de la Ra-
silla, Jaume Bertranpetit, Antonio Rosas and Svante Pääbo 2007. The derived FOXP2 variant
of modern humans was shared with Neandertals. Current Biology 17: 1908–1912.
Liberman, Alvin M., Franklin S. Cooper, Donald P. Shankweiler and Michael Studdert-Kennedy
1967. Perception of the speech code. Psychological Review 74: 431–461.
Liebal, Katja, Josep Call and Michael Tomasello 2004. Use of gesture sequences in chimpanzees.
American Journal of Primatology 64: 377–396.
Liégeois, Frédérique, Torsten Baldeweg, Alan Connelly, David G. Gadian, Mortimer Mishkin and
Faraneh Vargha-Khadem 2003. Language fMRI abnormalities associated with FOXP2 gene
mutation. Nature Neuroscience 6: 1230–1237.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mellars, Paul 2006. Going east: New genetic and archaeological perspectives on the modern
human colonization of Eurasia. Science 313: 796–800.
Muir, Laura J. and John E.G. Richardson 2005. Perception of sign language and its application
to visual communications for deaf people. Journal of Deaf Studies & Deaf Education 10:
390–401.
Nietzsche, Friedrich W. 1986. Human, All Too Human: A Book for Free Spirits. Translated by R. J.
Hollingdale. Cambridge: Cambridge University Press. First published [1878].
Perrett, David I., Mark H. Harries, Rachel Bevan, S. Thomas, P.J. Benson, Amanda J. Mistlin,
Andrew J. Chitty, Jari K. Hietanen and J.E. Ortega 1989. Frameworks of analysis for the
neural representation of animate objects and actions. Journal of Experimental Biology 146:
87–113.
Petitto, Laura A., Siobhan Holowka, Lauren E. Sergio, Bronna Levy and David J. Ostry 2004.
Baby hands that move to the rhythm of language: Hearing babies acquiring sign language bab-
ble silently on the hands. Cognition 93: 43–73.
Petrides, Michael, Geneviève Cadoret and Scott Mackey 2005. Orofacial somatomotor responses
in the macaque monkey homologue of Broca’s area. Nature 435: 1325–1328.
Pietrandrea, Paola 2002. Iconicity and arbitrariness in Italian Sign Language. Sign Language Stu-
dies 2: 296–321.
Pika, Simone, Katja Liebal and Michael Tomasello 2003. Gestural communication in young gorillas
(Gorilla gorilla): Gestural repertoire, learning, and use. American Journal of Primatology 60: 95–111.
Pika, Simone, Katja Liebal and Michael Tomasello 2005. Gestural communication in subadult bo-
nobos (Pan paniscus): Repertoire and use. American Journal of Primatology 65: 39–61.
Pinker, Steven 2003. Language as an adaptation to the cognitive niche. In: Morten H. Christiansen
and Simon Kirby (eds.), Language Evolution, 16–37. Oxford: Oxford University Press.
Pinker, Steven 2007. The Stuff of Thought. London: Penguin Books.
Pinker, Steven and Paul Bloom 1990. Natural language and natural selection. Behavioral and
Brain Sciences 13: 707–784.
Pizzuto, Elena and Virginia Volterra 2000. Iconicity and transparency in sign languages: A cross-
linguistic cross-cultural view. In: Karen Emmorey and Harlan Lane (eds.), The Signs of Lan-
guage Revisited: An Anthology to Honor Ursula Bellugi and Edward Klima, 261–286. Mahwah,
NJ: Lawrence Erlbaum.
Plato 360 BCE. Cratylus. Translated by Benjamin Jowett. http://www.ac-nice.fr/philo/textes/
Plato-Works/16-Cratylus.htm
Ploog, Detlev 2002. Is the neural basis of vocalisation different in non-human primates and Homo
sapiens? In: Timothy J. Crow (ed.), The Speciation of Modern Homo Sapiens, 121–135. Oxford:
Oxford University Press.
Pollick, Amy S. and Frans B.M. de Waal 2007. Ape gestures and language evolution. Proceedings
of the National Academy of Sciences 104: 8184–8189.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neuro-
sciences 21: 188–194.
Rizzolatti, Giacomo, Rosolino Camarda, Leonardo Fogassi, Maurizio Gentilucci, Giuseppe Lup-
pino and Massimo Matelli 1988. Functional organization of inferior area 6 in the macaque mon-
key. II. Area F5 and the control of distal movements. Experimental Brain Research 71: 491–507.
Rizzolatti, Giacomo, Leonardo Fadiga, Vittorio Gallese and Leonardo Fogassi 1996. Premotor
cortex and the recognition of motor actions. Cognitive Brain Research 3: 131–141.
Rizzolatti, Giacomo, Leonardo Fogassi and Vittorio Gallese 2001. Neurophysiological mechanisms
underlying the understanding and imitation of action. Nature Reviews Neuroscience 2: 661–670.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain. Oxford: Oxford University
Press.
Rousseau, Jean-Jacques 1781. Essai sur l’Origine des Langues. Geneva. Published posthumously
by A. Belin, Paris, in 1817.
Russell, Bridget A., Frank J. Cerny and Elaine T. Stathopoulos 1998. Effects of varied vocal inten-
sity on ventilation and energy expenditure in women and men. Journal of Speech, Language,
and Hearing Research 41: 239–248.
Saussure, Ferdinand de 1916. Cours de Linguistique Générale. Edited by C. Bally and A. Seche-
haye. Lausanne: Payot.
Savage-Rumbaugh, Sue, Stuart G. Shanker and Talbot J. Taylor 1998. Apes, Language, and the
Human Mind. New York: Oxford University Press.
Semaw, Sileshi, Paul Renne, John W.K. Harris, Craig S. Feibel, Raymond L. Bernor, N. Fesseha
and Ken Mowbray 1997. 2.5-million-year-old stone tools from Gona, Ethiopia. Nature 385:
333–336.
Senghas, Ann, Sotaro Kita and Asli Özyürek 2004. Children creating core properties of language:
Evidence from an emerging sign language in Nicaragua. Science 305: 1780–1782.
Studdert-Kennedy, Michael 2005. How did language go discrete? In: Maggie Tallerman (ed.), Lan-
guage Origins: Perspectives on Evolution, 48–67. Oxford: Oxford University Press.
Sutton-Spence, Rachel and Penny Boyes-Braem (eds.) 2001. The Hands Are the Head of the
Mouth: The Mouth as Articulator in Sign Language. Hamburg: Signum-Verlag.
Tomasello, Michael 2008. The Origins of Human Communication. Cambridge: Massachusetts
Institute of Technology Press.
Tooby, John and Irven DeVore 1987. The reconstruction of hominid evolution through strategic
modeling. In: W. G. Kinzey (ed.), The Evolution of Human Behavior: Primate Models, 183–237.
Albany: State University of New York Press.
Vico, Giambattista 1953. La Scienza Nuova. Bari: Laterza. First published [1744].
Wood, Bernard and Mark Collard 1999. The human genus. Science 284: 65–71.
Wundt, Wilhelm 1900. Die Sprache. Leipzig: Engelmann.
32. The co-evolution of gesture and speech, and downstream consequences
Abstract
This article is a sketch of the “gesture-first” and the gesture and speech (“Mead’s Loop”)
theories of language evolution, plus how a capacity for syntax could have arisen during
linguistic encounters, and implications of the foregoing for current-day language onto-
genesis as it recapitulates steps in language phylogenesis, including both a gesture-first
and a separate gesture and speech stage.
“The actuation of language evolution lies within the interacting individuals, under the
pressure of a host of external ecological factors”. (Mufwene 2010)
1. Introduction
The origin of language, a prodigal topic, has returned to respectability after long exile.
Discoveries in linguistics, brain science, primate studies, children’s development and
elsewhere have inspired new interest after the infamous 19th Century ban (actually,
bans) on the topic. The topic can be approached from many angles. Most common
seems to be the comparative – differences and resemblances between humans and
other primates. A related approach is to consider the brain mechanisms underlying
communicative vocalizations and/or gesture. These have been recorded directly in
some primate species and can be compared to humans on some performance measures
thought to depend on similar brain mechanisms. Or a linguistic angle – the key features
of human language and whether anything can be said of how they came to be and
whether other animal species show plausible counterparts. Approaches are combined
in comparing human language to vocalizations, gestures and/or the instructed sign lan-
guage use of, say, orang-utans or chimps. I will take a third approach, gestures, which
also has its devotees, but I shall diverge from past approaches in crucial ways. This
approach is not inconsistent with the others, and in fact will draw on them at various
places.
My guiding idea and fundamental divergence is the following: Gestures are compo-
nents of speech, not accompaniments but actually integral parts of it. Much evidence
supports this idea, but its full implications have not always been recognized. The growth
point (GP) hypothesis is designed to explicate this integral linkage. A key insight is that
speech on the one hand and gesture on the other, when combined in a growth point,
bring semiotically opposite modes of representation together into a single package or
idea unit. The growth point unity of opposite semiotic modes, gesture and speech, cre-
ates a specific form of human cognition that animates language, giving it a dynamic
dimension intersecting the static one accessed by intuitions of linguistic form. This semi-
otic opposition, and the processes to resolve it, propel thought and speech forward. In
this chapter, I develop a sustained argument concerning the evolutionary implications
of the idea that gestures are integral parts of what evolved to make language possible
and how they did so. Fundamentally, what evolved are new forms of thinking and acting
which orchestrate vocal and manual movements in the brain areas in charge of such
movements, and these new orchestrations are based on the synthesis of speech and ges-
ture. The growth point explains this unit of thought and action, and the theme of this
chapter is to uncover some of the evolutionary selections that made this possible.
So what did it take in human evolution to have this system of communicative actions
in which global gesture imagery and “digital” encoded linguistic forms join in smoothly
performed actions? In its essentials the argument of these “notes” is that gesture and
speech are integrated as one dynamic system: this dynamic system is the outcome of
evolutionary selection. Accordingly, explanations of the origin of language can be
judged by whether or not they are able to predict it. The “gesture-first” hypothesis of
recent fame does not. I offer a different explanation, called Mead’s Loop, that lays
the neural groundwork for language and gesture to be integrated as we observe it.
explain how this system of united semiotic oppositions – global and synthetic imagery
and analytic and combinatoric language combined to make a single idea unit – evolved.
To leave it out is to misconstrue what evolution produced. This idea is my main meth-
odological approach – to test a theory of language origin for whether (or not) it explains
the dual semiotic system of imagery and conventional code. A popular current theory,
appearing over and over in a veritable avalanche of recent books, is what I dub “ges-
ture-first”. Despite this name, the primatologist, neuroscientist, developmental psychol-
ogist, anthropologist, sign-language linguist, regular linguist, computer scientist, etc.
proponents of gesture-first seemingly have not carefully analyzed gestures with speech
but rely instead (it appears) on folk culture portrayals. So, they do not recognize a key
point: that language is misconstrued if it is not seen as a duality of semiotic opposites.
In this article, we take up gesture-first and an alternative to be explained called
Mead’s Loop. Gesture-first states that the initial form of language was unspoken – it
was a gesture or a sign language. I show that gesture-first – to put it delicately – is
unlikely to be true, because it is unable to capture the connections of speech and ges-
ture that we, the living counter-examples, display: it “predicts” what did not evolve (that
gesture withered when speech arose) and does not predict what did evolve – that ges-
ture is an integral part of speaking. A theory that says what didn’t happen did, and what
did happen didn’t, cannot be generally true, to say the least. That so many have adopted
it I explain as due to the above-mentioned folk (and fabricated) beliefs about gestures.
The alternative is “Mead’s Loop”, after George Herbert Mead (1974), which does
account for the integrated system of speech-gesture. It also provides a framework for
a remarkably wide range of other phenomena that stem from it, as would be expected
from a founding event. According to Mead’s Loop, speech and gesture were naturally
selected jointly. The dual semiosis is intrinsic to the origin of language. Without gesture
speech would not have evolved, and without speech gesture would not have evolved either –
contrary to both “speech-first” and “gesture-first”. Mead’s Loop creates a “thought-
language-hand link” in the area of the brain that orchestrates complex actions – Broca’s
Area according to much evidence. In completing Mead’s Loop, mirror neurons under-
went a further evolution from the primate norm, a recalibration or “twist” in which they
came to respond, not only to the actions of others but to one’s own gestures, as if from
another. Thus they became capable of using the imagery in gesture to orchestrate vocal
and manual actions. When one makes a gesture, Mead’s Loop mirrors it and brings its
significance into the process of vocal and manual action orchestration, tying together
thought, speech and gesture, and moving the orchestration of actions in Broca’s Area
away from their original vegetative (vocal tract) and manipulative (manual) signifi-
cances. This explains how speech and gesture actions can be orchestrated jointly by sig-
nificances other than those of the actions themselves. And it shows how imagery and
linguistic encoding were united from the beginning. At the same time, Mead’s Loop im-
parted an apparent social reference of gestures – enabling the actions they orchestrate
to mesh with the emerging socio-cultural system of langue, the “social fact” of the lan-
guage system. Many scenarios can be imagined in which Mead’s Loop would have been
selected, but the one that seems most important and likely to have undergone natural
selection is adult-infant instructional interaction, with adults (mothers) the locus of the
natural selection. In this situation, Mead’s Loop gives the adult a sense of being an
instructor as opposed to being just a doer with an onlooker the chimpanzee way. As
many have observed, entire cultural practices of childrearing depend upon this sense.
3. “Gesture-first”
The popularity of this theory of the origin of language was sparked in part by a genuine
discovery, the discovery of mirror neurons – although it far antedates this discovery,
going back to at least the 18th C. (see Harris and Taylor 1989). It says the first step
of language was not speech, nor speech with gesture, but was gesture alone. Gestures
in this theory are regarded as mimicry of real world actions, hence the connection to
mirror neurons. In some versions, the gestures are said to have been structured as a
sign language. This view has inspired a surprising range of enthusiasts (see for instance
Arbib 2005; Armstrong and Wilcox 2007; Corballis 2002; Rizzolatti and Arbib 1998;
Tomasello 2008; Volterra, Caselli, Capirci, et al. 2005) and has taken on something
like the status of a default assumption about the place of gesture in the origin of language,
with only a few Doubting Thomases. We (the co-authors of McNeill, Duncan, Cole, et al. 2008)
are among the latter, along with Woll (2009) and Dessalles (2008) and possibly a few
others.
Gesture-first is correct in asserting that language could not have come into existence
without gesture: on this point, we agree. The error lies in the explanation for why that
was so: that gesture had to precede speech, hence “gesture-first”. But, we say, even if this did
take place (and we do not deny it), it could not have led to human language. Instead, it
would have landed at a different point on the Gesture Continuum, pantomime. The
Gesture Continuum (formerly “Kendon’s Continuum”, renamed at Kendon’s request):
As one goes from gesticulation to sign language, the relationship of gesture to speech
changes:
Gesticulation – Language-slotted gestures – Pantomime – Sign languages
Language-slotted gestures look like gesticulations but differ in the manner of integrating
with speech. They enter a grammatical slot, semiotically merge with speech, and acquire
syntagmatic values from it; gesticulation in contrast is gesture co-produced obligatorily
with speech but semiotically opposed to it. Pantomime is gesture without speech, often
in sequences and usually comprised of simulated actions. Sign languages are full linguis-
tic codes with their own static dimension. This terminology follows for the most part
Kendon (1980).
There is some evidence, described later, from children’s early ontogenesis that such a
phase evolved, but if so it was able to support at most a semi-language. A separate
and different evolution, Mead’s Loop, created the dynamic dimension that we now see.
ourselves today. We don’t deny that such a phase could have existed, but do claim that it
could not have been a true protolanguage; it has difficulty explaining how gestures com-
bine with speech to comprise language, as we observe it. Assuming gesture-first, a tran-
sition to later speech would not have led to speech-gesture units synchronized at
points where they are co-expressive of newsworthy content; it would instead have led to
mimicry/pantomime or perhaps a sign language, without speech-gesture units. We ourselves are living
contradictions of gesture-first.
We say that “gesture-first” incorrectly predicts that speech would have supplanted
gesture, and fails to predict that speech and gesture became a single system. It thus
meets, along with the growth point, the requirement of falsifiability and (unlike the
growth point) is falsified – twice in fact.
Gesture (…) helped to develop the power of forming sounds while at the same time help-
ing to lay the foundation of language proper. When men first expressed the idea of “teeth”,
“eat”, “bite”, it was by pointing to their teeth. If the interlocutor’s back was turned, a cry
for attention was necessary which would naturally assume the form of the clearest and most
open vowel. A sympathetic lingual gesture would then accompany the hand gesture which
later would be dropped as superfluous so that ADA or more emphatically ATA would
mean “teeth” or “tooth” and “bite” or “eat”, these different meanings being only gradually
differentiated. (Sweet 1888: 51, emphasis added)
(Thanks to Bencie Woll, pers. comm., for bringing this passage to my attention).
Rizzolatti and Arbib (1998) also see gesture as fading once speech has emerged:
“Manual gestures progressively lost their importance, whereas, by contrast, vocalization
acquired autonomy, until the relation between gestural and vocal communication
inverted and gesture became purely an accessory factor to sound communication” (Riz-
zolatti and Arbib 1998: 193). More recently, Stefanini, Caselli, and Volterra (2007: 219,
referring to Gentilucci and Dalla Volta 2007 and others) write of “the primitive mechanism that
might have been used to transfer a primitive arm gesture communicative system from
the arm to the mouth…” which again implies supplantation.
Most recently of all, Armstrong and Wilcox (2007), in The Gestural Origin of Lan-
guage, imply supplantation, as it were, by silence. They do not, as far as I have been able
to tell, consider how speech came into being at all after sign language (as they identify
gesture-first); but skipping it does not skirt the mystery of how speech supplanted ges-
ture and still ended up being integrated with it. And Tomasello (2008), thinking in terms
of primate and very young (one-year and less) human infant gestures, likewise puts
forth “gesture-first” with supplantation (referring to ontogenesis, but with the sugges-
tion that something similar took place in phylogenesis): “Infants’ iconic gestures
emerge on the heels of their first pointing … they are quickly replaced by conventional
language … because both iconic gestures and linguistic conventions represent symbolic
ways of indicating referents” (Tomasello 2008: 323). It is not that the transitions he
mentions do not occur, but that they are insufficient to lead to the incorporation of
gesture as we see it in current-day human speech.
3.1.2. Pantomime?
Michael Arbib (2005) advocates an origin role for pantomime. Pantomime is consid-
ered to follow directly from mirror neuron action. In Arbib’s proposal the initial com-
municative actions were symbolic replications of the actions of self, of others, and of entities,
and these pantomimes could later “scaffold” speech. Merlin Donald (1991) likewise
posited mimesis as an early stage in the evolution of human intelligence. There may
indeed have been pantomimes without vocalizations used for communication at the
dawn, in which case pantomime could have had its own evolution, landing at a different
place on the Gesture Continuum, as seen today in its own temporal relationship to
speech: substituting for rather than synchronizing with it.
The distinguishing mark of pantomime compared to gesticulation, the point on the
Gesture Continuum where speech and gesture combine, is that the latter, but not the
former, is integrated with speech. Gesticulation is an aspect of speaking. The speaker
constructs a gesture-speech combination. In pantomime this does not occur. There is
no co-construction with speech, no co-expressiveness, timing is different (if there is
speech at all), and no duality of semiotic modes. Pantomime, if it relates to speaking
at all, does so, as Susan Duncan (pers. comm.) points out, as a “gap filler” – appearing
where speech does not. Movement by itself offers no clue to whether a gesture is ges-
ticulation or pantomime; what matters is whether or not two modes of semiosis com-
bine to co-express one idea unit simultaneously. It is conceivable that pantomime is
something an ape-like brain would be capable of, and was a proto-system already in
place in the last common chimp (say)-human ancestor, some 8 million years back
(while it could have been a step on the way, it would not identify the events that led
to language specifically).
These Notes argue, as is plain by now, that imagery is not just “vital” but is language in
part, no less so than the segmented verbal atoms with which it combines to form growth
points.
Given this asymmetry, even though speech and gesture were jointly selected, it
would still work out that speech is the medium of linguistic segmentation. As a result,
natural selection for gesture-speech combinations would still have had speech handling
the linguistic semiotic. Sign languages – their existence as full linguistic systems –
impress many as a justification for positing gesture-first, but in fact, historically and
over the world, manual languages are found only when speech is unavailable, the dis-
crete semiotic then shifting to the hands. So it is not that gesture is incapable of carrying
a linguistic semiotic, it is that speech (to us visually disposed creatures) does not carry
the imagery semiotic.
The spacing is meant to show relative durations, not that signs were performed with
temporal gaps. What happens is that at the beginning of each phrase speech and
sign start out together but immediately fall out of synchrony, and then reset (there
is one reset in the example). The process of separation then occurs again. So, accord-
ing to this model, speech-gesture synchrony would be systematically interrupted at the
crossover. Yet synchrony of co-expressive speech and gesture evolved.
system. And there are ontogenetic reasons to suggest that pointing and character view-
point gestures, in the end, are effects of language, not steps toward it. What is crucial is
that there be developed the semiotic contrasts of the global-synthetic to the analytic-
combinatoric modes that can fuel thought and speech, and pantomime and pointing
possibly could not do this. However, once language with gesture was established, pan-
tomime and pointing could be absorbed by it, pantomime becoming integrated as “char-
acter viewpoint” or CVPT gestures, pointing as deixis, both of the concrete and the
“abstract” metaphoric kinds. In ontogenesis, pointing and pantomime precede, speak-
ing follows and absorbs. Both emerge well in advance of encoded speech and, for a
time, alternate with it, patterns that exhibit the initiation and then absorption of
these separate gesture modes. The integration of pantomime with language to make
CVPT seems to occur not before age 3 and possibly even later, and this is also the ear-
liest age at which there is any evidence of growth points. Pointing occurs throughout
this development but is so imprecisely timed with speech, even with adults, that it is
hard to say whether it does or does not participate in any sort of growth point process.
Abstract or metaphoric pointing to help organize discourse is definitely absent from
pre-growth point children. So there are various reasons to regard pantomime and point-
ing as products of their own evolutions which are, now, absorbed by a separately
evolved dynamic linguistic process.
4. Mead’s Loop
We now consider Mead’s Loop, an explanation of how the imagery-language dialectic
and growth point could plausibly have been naturally selected at the origin. Mead’s
Loop is a hypothesis concerning mirror neurons and how they were altered. According
to Mead’s Loop, what had to evolve was a new ability for mirror neurons to self-respond
to one’s own gestures. The idea is inspired by George Herbert Mead who proposed that
the significance of a gesture depends on the gesture arousing in oneself the same
response as it does in others (Mead 1974: 47). Mead wrote this much earlier, probably
in the 1920s. I first saw this description in a paper-bound typescript by Mead (with pen-
ciled annotations) on the shelves of the University of Chicago library, under the title
Philosophy of Gesture. The pamphlet has since vanished, presumably sequestered in
some special collection. Mead’s Loop treats one’s own gesture as a social stimulus,
and thus explains why gestures occur preferentially in a social context of some kind
(face-to-face, on the phone, but not speaking to a tape recorder, Cohen 1977). A social
reference is intrinsic to mirror neurons. The Mead’s Loop “twist” is that one’s own ges-
tures are treated as having the same social reference. In so doing, it opens the mirror
neuron response to the range of significances that gesture is able to carry and brings
an inherent “Other” orientation to gesture – both essential properties of what evolved.
Without Mead’s Loop, mirror neurons respond to the actions of others with the sig-
nificances they have as actions: with it, they take on a range of different meanings, the
meanings of gesture. So, according to Mead’s Loop, part of human evolution was that
mirror neurons came to respond to one’s own gestures, including their imagery, as if
from another person.
To form Mead’s Loop, speech and gesture had to evolve together. Gesture-first and
speech-first (another alternative) are equally inadequate to the job. At the motor level,
Mead’s Loop provides a way for significant imagery – carried by gesture – to orches-
trate speech motor control. Gestures in the early hominid line would, with Mead’s
Loop, co-opt the vocal action circuits. With Mead’s Loop, gesture gains the property of
chunking: a chunk of linguistic output is organized around significant imagery, rather
than, as with unmodified mirror neurons (shared by all primates), around the significance of
instrumental actions qua actions.
we still need to add something that allows us to understand how the actions of the mouth
and associated vocalisation came to be available, as it were, so that they could be recruited
into the referential gesture function. For this to be possible an elaborate and voluntary
control of the vocal system must already have been in place. In other words, a scenario
for the evolution of the human speech apparatus and its neuro-motor control systems is
also needed (Kendon 2009 ms., p. 29)
Mead’s Loop provides the connection. Through Mead’s Loop, gesture imagery enters
the part of the brain, Broca’s area, where complex actions are orchestrated, and pro-
vides a new basis of action, imagery, and does so in such a way that the neuro-motor
control system that Kendon mentions is a product rather than a precondition for the
recruitment of vocal tract orchestration by gesture function.
Mead’s Loop is a necessary but far from sufficient step toward language at the
dawn – the evolution of a brain link between thought, language and hand that enables
the orchestration of movements by significances other than those of the actions them-
selves. Other steps could have arisen after the Mead’s Loop step, and in fact would
require it. Some of these steps may have acquired biological adaptations, real genetic
changes; a few possibilities are suggested later.
Four steps comprise our Mead’s Loop hypothesis:
(i) Primates in general have mirror neurons but, by themselves, they mirror only the
intentional actions of others.
(ii) Mead’s Loop brings gestures into the same areas that have mirror neurons.
(iii) The gestures bring their own significances. They also have the quality of referring
to the Other – with them comes an intrinsic social presence.
(iv) These significances (including social references) then become the orchestrating
schemas of the once vegetative movements, making them into speech movements.
It is meant to explain:
(i) The synchronization of gesture with vocalization on the basis of shared meaning
(= the only way they synchronize)
(ii) The co-opting by meanings of brain circuits that orchestrate sequential actions and
(iii) The occurrence of gestures preferentially in a social context of some kind (face-to-
face, on the phone, but not speaking to a tape recorder – two modern challenges to
Mead’s Loop).
That Mead’s Loop treats imagery as a social stimulus explains point (iii). That mirror
neurons complete Mead’s Loop in a part of the brain where action sequences are orga-
nized means that the two kinds of sequential actions, speech and gesture, converge and that
meaning in imagery is the integrating component; this explains points (i) and (ii).
Hence, gesture and speech necessarily co-evolved in the Mead’s Loop hypothesis, and
this is the chief incompatibility with the gesture-first theory.
have existed with the last common human-chimp ancestor and would have provided
raw material for co-opting the motor area by imagery – or raw material from even fur-
ther back: Fogassi and Ferrari (2004) have identified neural mechanisms in monkeys for
associating gestures and meaningful sounds, which they suggest could be a pre-adaptation
for articulated speech.
It is the case that chimps produce gestures and adapt new gestures for communica-
tive purposes (see Call and Tomasello 2007). Mead’s Loop also could use these kinds of
manual signals as raw material. What made us human according to this model crucially
depended at one point on such gestures, which Mead’s Loop altered fundamentally. Ob-
servations by Amy Pollick (see for instance Pollick and de Waal 2007) of bonobos at the
San Diego Zoo and elsewhere reveal a use of pantomimic/iconic gestures to induce
movements by other bonobos (but not, so far as I know, ever humans). The gesture re-
plicates the motion the performer desires from the other bonobo. She does not replicate
the entire movement with all her limbs, but symbolizes it in a hand movement, forming
an iconic gesture.
The significance of the gestures in the illustrations was clear to both producer and
recipient, and was to get the second chimp to move. The gestures were totally inde-
pendent of vocalization (for there was none) but when we imagine such gestures in
the Mead’s Loop creature, vocalization would also be present (like the Hopkins
chimp).
(Figure: start of second swing, other bonobo moves; end of second swing, other bonobo moving.)
So this kind of gesture could have been raw material for the mirror neuron “twist”. That
it has not happened (at least, not yet) with the bonobo does not mean it couldn’t have
been available as raw material for the creature in which we are presuming language to
have originated.
5.4. Scenarios
Natural selection for Mead’s Loop could arise whenever sensing oneself as a social
object is advantageous – as for example when imparting information to infants,
where it gives the adult the sense of being an instructor as opposed to being just a
doer with an onlooker (the chimpanzee way). Entire cultural practices of childrearing
depend upon this sense (Tomasello 2008). Obviously, there would have been a range
of such instruction scenarios: passing on skills, planning courses of action, cultural
freed the hands for manipulative work and gesture, but it would have been only the
beginning. Even earlier there were presumably preadaptations such as an ability to
combine vocal and manual gestures, and the sorts of iconic/pantomimic gestures
we saw in bonobos, but not yet an ability to orchestrate movements of the vocal
articulators by gestures.
(ii) The period from 5 to 2 million years ago – Lucy et al. and the long reign of
Australopithecus – would have seen the emergence of precursors of language,
something an apelike brain would be capable of, such as the kind of protolanguage
Bickerton attributes to apes, very young children and aphasics (Bickerton 1990);
also, ritualized incipient actions become signs at this stage as described by Kendon
(1991).
(iii) Starting about 2 million years ago with the advent of Homo habilis and later Homo
erectus, there commenced the crucial selection of self-responsive mirror neurons
and the reconfiguring of areas 44 and 45, with a growing co-opting of actions by
language, this emergence being grounded in the appearance of a humanlike family
life with the host of other factors shaping this major change (including cultural in-
novations like the domestication of fire and cooking). Recent archeological find-
ings strongly suggest that hominids had control of fire, had hearths, and cooked
800 thousand years ago (Goren-Inbar, Alperson, Kislev, et al. 2004). Another cru-
cial factor would have been the physical immaturity of human infants at birth and
the resulting prolonged period of dependency giving time, for example, for cultural
exposure and growth points to emerge at leisure, an essential delay pegged to the
emergence of self-aware agency. Thus, the family as the locus for evolving the
thought-language-hand link seems plausible.
(iv) Along with this sociocultural revolution was the expansion of the forebrain from
2 million years ago, described by Deacon (1997), and a complete reconfiguring
of areas 44 and 45, including Mead’s Loop, into what we now call Broca’s area.
This development was an exclusively hominid phenomenon and was completed
with Homo sapiens about 200–100 thousand years ago (if it is not continuing;
see Donald 1991).
Considering the timeline, protolanguage and then language itself seem to have
emerged over five million years (far, therefore, from a big bang). Meaning-controlled
manual and vocal gestures, synthesized under meanings as growth points as we currently
know them, emerged over the last two million years. The entire process may
have been completed not more than 100 thousand years ago, a mere 5,000 human
generations, if it is not continuing.
a role in the creation of growth points. This is plausible, since growth points depend on
the differentiation of newsworthy content from context and require the simultaneous
presence of linguistic categorial content and imagery, both of which seem to be avail-
able in the right hemisphere. The frontal cortex also might play a role in constructing
fields of oppositions and psychological predicates, and supply these contrasts to the
right hemisphere, there to be embodied in growth points. The prefrontal cortex devel-
ops slowly in children (Huttenlocher and Dabholkar 1997), and in fact does not reach a
level of synaptic density matching the visual and auditory cortex until about the same
age that self-aware agency appears and with it, per hypothesis, growth points differen-
tiated from fields of oppositions. Everything (right hemisphere, left posterior hemi-
sphere, frontal cortex) converges on the left anterior hemisphere, specifically Broca’s
area, and the circuits specialized there for action orchestration. Broca’s area may
also be the location of two other aspects of the imagery-language dialectic. At least,
such would be convenient – or more than convenient: a study by Ben-Shachar, Hendler,
Kahn, et al. (2003) shows an fMRI response in Broca’s area for grammaticality judg-
ments. Grammaticality judgments, obviously, depend on linguistic intuitions. The
same study found verb complexity registering in the left posterior superior temporal
sulcus area. In the brain model, the left posterior area provides the categorial content
of growth points – the generation of further meanings via constructions and their
semantic frames, and intuitions of formal completeness to provide dialectic “stop
orders”. All of these areas are “language centers” of the brain in this model.
The brain model also explains why the language centers of the brain have classically
been regarded as limited to two important patches – Wernicke’s and Broca’s areas. If the
model is on the right track, contextual background information must be present to acti-
vate the broad spectrum of brain regions that the model describes. Single words and de-
contextualized stimuli of the kinds employed in most neuropsychological tests would not
engage the full system; they would be limited to activating only a small part of it.
7. Whence syntax?
The origin and natural selection of static dimension features has been a shining light on
the hill to some, a bugbear to others, ever since Lenneberg (1967) posited a biological
capacity for language, and Chomsky (1965) applied the idea to syntax. The emergence
sui generis of morphs (McNeill and Sowa 2011) and syntagmatic values (Gershkoff-
Stowe and Goldin-Meadow 2002) in gestures when speech is denied suggests, at a
broad level, an urge to divide and organize meaningful symbols syntactically. In this sec-
tion, we consider the role that Mead’s Loop could have played in laying a basis for this
push to syntax and what, given this role, it would consist of.
7.2. Shareability
Mead’s Loop explains one step at the origin of language. It enabled gestures to orches-
trate actions both manual and vocal with significances other than those of the actions them-
selves. Additionally, Mead’s Loop, where gesture assumes the guise of a social other,
plants in the brain the seed of what Freyd (1983) in her innovative paper called Share-
ability – constraints on information that arise because it must be shared; constraints
because:
It is easier for an individual to agree with another individual about the meaning of a new
“term” (or other shared concept) if that term can be described by: (a) some small set of
the much larger set of dimensions upon which things vary; and (b) some small set of dimen-
sional values (or binary values as on a specific feature dimension). Thus, terms are likely to
be defined by the presence of certain features. (Freyd 1983: 197, italics in original)
orchestration under control. The growth point underlies all three pulses; in fact it ex-
plains the three and their time alignment in terms of this one pulse, the growth point
itself.
I argue that the manual actions that are coordinated with speech that we see in modern
humans, are derived from forms of practical action. That is, they are “ritualized” versions
of grasping, reaching, holding, manipulating, and so forth. Given their intimate connection
with speaking, I make a speculative leap, and argue that speaking itself is derived from
practical actions. Acts of utterance are to be understood, ultimately, as derived versions
of action system ensembles mobilized to achieve practical consequences. The origin of lan-
guage, that is to say, derives by a process of “symbolic transformation” of forms of action
that were not communicatively specialized to begin with. Speech and associated gestur-
ing does not descend from communicative vocalizations and visible displays but from
manipulatory activities of a practical sort. (Kendon 2008a: introduction)
This action hypothesis also leaves an explanatory gap that Mead’s Loop fills. It cannot
explain speech and gesture as new kinds of action in the human domain – the “process
of ‘symbolic transformation’ ” is unexplained and seems to presuppose such a process.
As with MacNeilage, there is no incompatibility but the growth point and Mead’s Loop
complete the picture. Practical actions cease to be actions of that kind and become, some-
how, actions of a new kind. The human brain evolved a “symbolic transformation” with
which to orchestrate manual and vocal actions by significances other than those of the
actions themselves. The natural selection of Mead’s Loop explains how this happened.
Without Mead’s Loop, the significance of the action remains that of the original action.
It is not clear what “ritualization” as an explanation is meant to cover exactly, but if
it is streamlining the forms of actions and making them smaller, there is this gap that
Mead’s Loop fills. Alternatively, “ritualization” could be the new significance orches-
trating the action, and then it is another name for Mead’s Loop – the “speculative
leap” – explaining the selective mechanism. In any case, once Mead’s Loop was estab-
lished, manual actions entered into dialectics with the co-evolving linguistic encodings
and thus fundamentally changed their character from their instrumental (“practical”)
action originals.
[N]ew form-function mappings arise among children who functionally differentiate pre-
viously equivalent forms. The new mappings are then acquired by their age peers (who
are also children), and by subsequent generations of children who learn the language,
but not by adult contemporaries. As a result, language emergence is characterized by a
convergence on form within each age cohort, and a mismatch in form from one age cohort
to the cohort that follows. In this way, each age cohort, in sequence, transforms the lan-
guage environment for the next, enabling each new cohort of learners to develop further
than its predecessors. (Senghas 2003: 511)
The importance of the cohort, and the localization of each successive version of the sign
language within an age peer group, show the role of the encounter in the creation of
languages and in language drift generally. The successive changes are systematic, not ran-
dom, and show the importance of shareability – as the language changed over succes-
sive generations it developed the ever greater analysis of form-function mappings
that Senghas describes.
(i) Mapping meanings onto temporal orders seems primitive – the thought-language-
hand link provides a new way to orchestrate movements, and these movements
take place in time. Thus, a language in which meanings map onto time alone is
one candidate for early linguistic form. More derived are morphological complexes
that dissociate meaning from temporal order. Other discourse and social forces
then are free to orchestrate vocal and manual actions. It is perhaps no accident
that languages spoken in the least accessible areas – those taking the longest to
reach, such as the arctic – are elaborate and distant from the presumed mapping,
through action orchestrations, of meanings onto time.
(ii) Combinations of constructions likewise would be coordinated through temporal
loci – the embedded parts temporally isolated from the embedding parts, which
in turn are continuous in time except for other embeddings. This creates levels
but always holds the temporal sequences together. Thus, a language in which recur-
sion is handled by holding pieces in memory until an embedded piece is complete
is another candidate for an early form. Notice that both the recursion and the basic
mapping mechanisms assume that the creatures employing them have some way to
know when some form is “complete” or “incomplete” – that is, have standards of
form. So something like a system of langue was in place.
(iii) Similarly, the spatial zone of Indo-European is the area through which migrants
would have passed, hence is likely to be covered with relic features. Glacial condi-
tions would have blocked paths to the north but east-west extensions were open.
Shields (1982), citing earlier authorities, proposed that the earliest form of Indo-
European was isolating, like Chinese (also very early), which implies a time-
sequenced base for grammatical meaning – in terms of the hypothesis, orchestrations
without further superstructure.
A simple hypothesis would be that the Indo-European type is one of the least elabo-
rated. From geography, it would have been the initial area to be entered, hence
would have the greatest likelihood of retaining time-based orchestrations. More outly-
ing areas – the arctic, the southern ocean, the New World – would have developed more
adorned languages. This makes sense from the dispersion viewpoint, since these remote
regions would have been the last to be reached through migrations; the indigenous
peoples encountered along the way are long gone, but the impacts of those encounters
remain visible, and long-distance migrants would have absorbed the greatest number of them. Each
encounter creates pressure to stabilize the dynamic of language and thought and this
presumably tends to add embellishments. Perhaps there is some such correlation –
more remote, more departure from temporal sequencing. If this sounds too much
like English to be true, the reasons for the deduction are not English but considerations
of possible brain mechanisms. Are the languages with the greatest time depth in the orig-
inal eastward migration route also ones with the most temporally sequenced relations?
The Sapiran division of languages according to how they combine meanings into single
words – analytic or isolating (e.g., Chinese), synthetic (e.g., Latin), polysynthetic (e.g.,
Inuit) (Sapir 1921: 128) – may reflect degrees of adornment of the basic brain orchestration
plans. There are of course all kinds of possible complications of this simple pattern. Mul-
tiple waves of migration and changes from other sources can induce changes on top of
existing layers. Indo-European itself presumably was altered to become inflecting after
the first migrations. Recent DNA analyses (Science news item, 4 Sept. 2009, vol. 325,
p. 1189) suggest that the first farmers in Europe were migrants who replaced indigenous
hunter-gatherers and that modern European populations descended from these migrants –
further altering, per the hypothesis, the language plan as well. There must
also be other forces changing languages that might move them in random directions, fur-
ther camouflaging ancestral forms. But the main point is that many hypotheses concern-
ing the dispersal of language – the Tower of Babel story – can be traced to the effects of
migration encounters on the mechanisms of orchestration as described by Mead’s Loop.
to which they have been exposed corresponds possibly to a precursor. Steven Pinker
famously pooh-poohed the ape-language studies as “trained animal acts” (1994).
While the communicative miracle of Kanzi and others is not “human language”,
Pinker’s sound-bite masks the precursors of human language revealed in these studies.
Perhaps the most significant difference between trained-animal language and human
natural language stems ultimately from the lack of Mead’s Loop and, as a result, of growth points,
and hence from the absence of any unpacking process, despite the presence of sequences of
signs of some kind. When Washoe or Kanzi outputs strings of signs or button presses,
there is nothing corresponding to the unpacking of growth points that is an inherent
part of the human engagement of thought with language. This unpacking of growth
points seems a cognitive innovation on the evolutionary stage, a trait others do not pos-
sess. Unpacking gives rise to syntax and could be a pattern of human cognition more
broadly, beyond the sentence. Chimps can learn to read Arabic numbers and recite
them (in the form of button presses) but they show no ability to go farther, for example,
to use their knowledge to actually count. Similarly, chimps who have received language
from their experimenters do not enlarge their universe of knowledge with it. I see in this
non-expansion of cognition striking evidence that a language-engendered mode of cog-
nition is absent from the chimp’s mental world. The chimp brain is locked out of it,
for all its powers of alert attention and inference.
Beginning with the Gardners’ (1969, 1971) attempt to teach American Sign Lan-
guage (or something like it) to Washoe, the direction of thinking about the potential
of non-human primates for acquiring aspects of language entered a new phase. Whereas
earlier attempts with chimpanzees had emphasized speech and failed to elicit anything
like it, the Gardners reasoned that the problem was a peripheral limitation of the chim-
panzee vocal tract, and that the animals could master a different mode of language,
manual sign language. Washoe did indeed learn a number of signs, and spontaneously
began stringing them together to form complex references. However, the absence of
speech is significant, for it shows that in the chimp discovering new patterns of action
orchestration in language does not occur despite exposure to human speech. Nonethe-
less, sign sequences might be an alternate kind of action orchestration and, from a
Mead’s Loop vantage point, be important for what they do and do not exhibit.
Some years back (McNeill 1974), long before I perceived anything of the existence of
Mead’s Loop, I collected all the examples of Washoe’s sign sequences that I could glean
from various lists and examples the Gardners had cited, and concluded that she did
indeed produce sequences in certain regular ways (as well as producing numerous
individual signs). I found three orders (two of which the Gardners had identified as
well). After eliminating various redundancies and sequences due to mechanical con-
straints, they boiled down to one pattern that Washoe truly followed; this was
Addressee-Action-Non-addressee. I wrote of this sequence: “The chimpanzee may
therefore have imposed her own formula on the sentence structures she observed her
handlers using. Washoe’s formula does not capture what the handlers themselves en-
coded (agent, action, recipient), but instead emphasizes a novel relationship as far as
grammatical form is concerned, that of an interpersonal or social interaction
(addressee-non-addressee)” (McNeill 1974: 83).
The Addressee-Action-Non-Addressee order covers social reference – the same
area as Mead’s Loop – but is based directly on social interaction. Basically,
Addressee-Action-Non-Addressee is mimicry, usually of an exchange in which Washoe begs
or demands something from another. Washoe and her handlers could see the same
sequences of signs yet relate them to different underlying meanings, addressee rather
than agent, etc. So for Washoe, meaning packages orchestrating actions are those of
the interacting parties, not those of actional imagery. This points out both a similarity
and a crucial inter-species difference that could set one sort of stage for Mead’s
Loop. The same action sequences covered by non-Mead’s Loop and Mead’s Loop
minds could have offered the creature we are attempting to reconstruct a bridge. An
ancestral creature in the right family milieu, with this chimp-like capability, could
start to use begging in totally new ways, once selection pressures for Mead’s Loop
came to be felt in interacting with infants, as described earlier.
In keeping with an absence of Mead’s Loop, Washoe and other signing apes, as far
as I know, never perform gestures with their signs, as human signers do (Duncan 2005,
Liddell 2003, and Bechter 2009). This, despite the fact that non-signing apes gesture
and vocalize concurrently and show a right hand dominance when they do (Hopkins
and Cantero 2003). This vocalization and hand dominance is another precursor but
not yet language.
Kanzi, one of the celebrated bonobo subjects of Sue Savage-Rumbaugh’s language
learning experiments (Savage-Rumbaugh, Shanker, and Taylor 1998), was highly suc-
cessful at learning an artificial system of communication utilizing a keyboard with but-
tons corresponding to lexical words (each button with a distinctive visual sign). At first
glance, there is nothing in this performance that points to Mead’s Loop. The motions of
Kanzi’s hands (pressing buttons) are not re-orchestrated movements, as for us are
speech (originally vegetative) and gesture (originally manipulative); they remain press-
ing-down movements. But there are also, in the videos of Kanzi that I have seen, rapid
sequences of different key presses in which orchestrations of orders seem present under
communicative significances. Re-orchestration is not a full explanation of what Kanzi
may have achieved, but it does suggest that another precursor of Mead’s Loop could
have existed in our last common ancestor with the chimpanzee line, namely an ability
to orchestrate sequences of movements (the hands only) by meanings. Kanzi also shows
an ability to understand spoken English commands. However, this offers little clue to
where he stands regarding Mead’s Loop. Comprehension of speech is so multifaceted,
and language itself so varied a factor in it, that it is difficult to conclude anything
very definite from these feats regarding precursors. The potential role of multiple fac-
tors seems particularly germane to Kanzi’s spoken English comprehension. While the
language comprehension tests Savage-Rumbaugh, Shanker, and Taylor (1998) con-
ducted excluded extralinguistic cues that could have steered Kanzi to correct choices
(such as the human handler’s gaze), we nonetheless see in some cases, in the videos,
Kanzi reaching for the correct item in advance of hearing the critical word identifying
it, so other cues seemingly were available to him despite all precautions – we know nei-
ther what nor how many. Given the multifaceted character of speech comprehension,
this is perhaps not too surprising even with conscientious control efforts. However,
there is no indication whatsoever that Kanzi uses imagery as the orchestration medium,
so Mead’s Loop is remote from the chimp (here, a bonobo) on this point as well. The
general realization that nonhuman primates are unable to orchestrate speech sounds (as
has been documented now for many decades, and was indeed part of the original moti-
vation for using sign language and a keyboard) shows the other limit of the chimp brain
for language: it is not just that ape mouth anatomy differs (a higher larynx, for one
thing), it is that apes do not have the brain circuitry to orchestrate these mouth-part
movements around gesture imagery, nor the circuitry to use imagery to do the orches-
trating. So Mead’s Loop would have no equivalent in their brain functioning, even
though precursors may exist, as Kanzi’s fluent sequential button-pressing and Washoe’s
sign sequences suggest.
This limit on orchestration does not apply to Alex the parrot, who apparently was able
to orchestrate actions of the syrinx and possibly tongue to produce speech-like sounds
(Pepperberg 2008), but the last common ancestor of humans and parrots is so far back
that this achievement can shed no light on human evolution and Mead’s Loop, espe-
cially given its apparent total absence in many other creatures also descended on the
mammalian side (cows, for example). The similarities are perhaps due instead to
some kind of convergent evolution, possibly tied to mimicry tendencies, shared by
parrots and humans (but, complicating this explanation, equally shared by apes).
Adam Kendon in 1991, summarizing several chimpanzee studies, wrote as follows:
Evidently, then, chimpanzees in wild and semi-wild conditions refer each other to fea-
tures of the environment by means of a sort of eye and body pointing, they do sometimes
give evidence of partially acting out possible, rather than actual courses of action, they
are able to grasp the nature of the information their own behaviour provides to others
and to modify it accordingly if it suits their purposes, and in respect to some kinds of ges-
tural usage, as we have seen, they are able to employ them in new contexts with new
meanings [referring here to attested cases of infants lifting their arms overhead as a sig-
nal of non-aggression, based on a natural postural adjustment to being groomed]. They
are on the edge of developing a use of gesture that is representational. The studies in
which chimpanzees have been taught symbolic usage, whether of gestures or of keyboard
designs, not only confirm the cognitive capacities that these observational studies imply
but also show that chimpanzees can use behaviour representationally if shown how.
(Kendon 1991: 212)
“Chimpanzees, then, seem on the verge of developing a language, yet they have not
done so. What is missing? What holds them back?” (Kendon 1991: 212).
Kendon proposes an answer, that chimpanzee social life is full of “parallel actions”
but has little in the way of collaboration. This is in line with Mead’s Loop. Chimpanzees
have not experienced the pressure, as we imagine early humans did, to develop Mead’s
Loop and growth points, and the discreteness, repeatability, and portability foundations
of a syntax of action orchestration with which to stabilize them. And this is why we
stand alone.
“monster” + two vertical palms spread apart (=big) (Goldin-Meadow and Butcher 2003,
Table 3). There is some uncertainty over how universal any such single pattern is. The
review of early child language development of more than 50 languages from around the
world in Bowerman and Brown (2008) makes clear that there are various ways of en-
tering this domain of linguistic form, partially summarized by “bootstrapping”. This en-
compasses semantic and syntactic bootstrapping: these are regarded as rival claims
about infant predispositions for the forms of language, although in fact they seem
closely related, emphasizing one side or the other of the linguistic sign, signifier or sig-
nified. Bootstrapping from either is expected to lead to other areas of language. One
hypothesis holds that certain semantic signified patterns evolved – such as actor-
action-recipient, from Pinker (1989); the other that syntactic signifier patterns did –
such as subject-verb-object, from Gleitman (1990); both are said to provide entrée to
the rest of language. Of course, both or neither, as Slobin (2009) suggests in his review,
may have evolved – the picture is murky, to say the least. Different languages offer their own obsta-
cles and routes into their static forms, tailored to local features, and this
is not surprising if what is being developed at this stage are not only static structures but
action templates – how to organize motions of the vocal apparatus like those of
grownups.
“Dark age” transition. “Dark” because so little is known of gesture performance
once patterned speech starts, and also because it appears to be an age at which syntax
relates to thought not via inhabitance, as later, but as a kind of elaborated word learning
(constructions being like “words” with internal slots into which other words can be inserted;
see Goldberg 1995). Constructions can run off as denotative references and shared per-
formances with adults (Werner and Kaplan 1963) but not yet as modes of thinking for
speaking. Gestures then predictably appear to decline in frequency. I imagine they are
largely pantomimic, but this is not actually known. Looking at the linguistic record, co-
piously described in contrast, the child appears to be engaged in building up action tem-
plates. Roughly the first three years are devoted, for the child, to the fascinating exercise of
doing as adults do vocally. But it is not yet the cognition of integrated gesture-speech;
that is yet to come.
The “age of language-thought combinations” is the start of the growth point and the
effects of Mead’s Loop. This “age” commences not earlier than age 3 to 4, and is linked to
the development of the sense of self-aware agency to which the selection of Mead’s
Loop was tied, and which onsets in current-day ontogenesis at around age 4.
9. Conclusions
Three separate evolutions converge in one ontogenetic process – gesture-first, the origin of
syntax and the dynamic inhabitance of language the growth point provides. The new age
of speculation regarding the old age of language origin may have reached a level where
scientific hypotheses and falsifiability tests can be formulated. “Gesture-first” is such a
hypothesis. It is falsifiable in the Popper sense and is false as a whole, because it cannot
account for the current-day observed combination of speech and gesture, but it may reap-
pear in the earliest stages of current-day child language acquisition, which retraces the
steps of language origin: gesture-first, the origin of syntax (which now comes “too early”,
producing the “dark age” of constructions without presumed inhabitance), and the emer-
gence of the growth point and its inhabitance with the arrival of self-aware agency.
10. References
Arbib, Michael A. 2005. From monkey-like action recognition to human language: An evolution-
ary framework for neurolinguistics. Behavioral and Brain Sciences 28: 105–124.
Armstrong, David and Sherman Wilcox 2007. The Gestural Origin of Language. Oxford: Oxford
University Press.
Bechter, Frank Daniel 2009. Of deaf lives: Convert culture and the dialogic of ASL storytelling.
Unpublished Ph.D. Dissertation, University of Chicago.
Ben-Shachar, Michal, Talma Hendler, Itamar Kahn, Dafna Ben-Bashat and Yosef Grodzinsky
2003. The neural reality of syntactic transformations: Evidence from functional magnetic res-
onance imaging. Psychological Science 14: 433–440.
Bickerton, Derek 1990. Language and Species. Chicago: University of Chicago Press.
Bowerman, Melissa and Penelope Brown (eds.) 2008. Crosslinguistic Perspectives on Argument
Structure: Implications for Learnability. New York: Taylor and Francis.
Browman, Catherine P. and Louis Goldstein 1990. Tiers in articulatory phonology, with some im-
plications for casual speech. In: John Kingston and Mary E. Beckman (eds.), Papers in Labo-
ratory Phonology I: Between the Grammar and Physics of Speech, 341–376. Cambridge:
Cambridge University Press.
Butcher, Cynthia and Susan Goldin-Meadow 2000. Gesture and the transition from one- to two-
word speech: When hand and mouth come together. In: David McNeill (ed.), Language and
Gesture, 235–257. Cambridge: Cambridge University Press.
Call, Josep and Michael Tomasello 2007. The Gestural Communication of Apes and Monkeys. New
York: Lawrence Erlbaum.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Cohen, Akiba A. 1977. The communicative function of hand illustrators. Journal of Communica-
tion 27: 54–63.
Corballis, Michael C. 2002. From Hand to Mouth: The Origins of Language. Cambridge, MA: Har-
vard University Press.
Deacon, Terrence W. 1997. The Symbolic Species: The Co-evolution of Language and the Brain.
New York: Norton.
Dessalles, Jean-Louis 2008. From metonymy to syntax in the communication of events. Interaction
Studies 9(1): 51–65.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Duncan, Susan 2005. Gesture in signing: A case study in Taiwan Sign Language. Language and
Linguistics 6: 279–318.
Emmorey, Karen, Helsa B. Borinstein and Robin Thompson 2005. Bimodal bilingualism: Code-
blending between spoken English and American Sign Language. In: James Cohen, Kara T.
McAlister, Kellie Rolstad and Jeff MacSwan (eds.), Proceedings of the 4th International Sym-
posium on Bilingualism, 663–673. Somerville, MA: Cascadilla Press.
Evans, Patrick D., Sandra L. Gilbert, Nitzan Mekel-Bobrov, Eric J. Vallender, Jeffrey R. Anderson,
Leila M. Vaez-Azizi, Sarah A. Tishkoff, Richard R. Hudson and Bruce T. Lahn 2005. Mi-
crocephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science
309: 1717–1720.
Feyereisen, Pierre volume 2. Gesture and the neuropsychology of language. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body-
Language-Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter
Mouton.
Fogassi, Leonardo and Pier Francesco Ferrari 2004. Mirror neurons, gestures and language evolu-
tion. Interaction Studies 5: 345–363.
Freyd, Jennifer J. 1983. Shareability: The social psychology of epistemology. Cognitive Science 7:
191–210.
Gardner, Beatrix T. and R. Allen Gardner 1971. Two-way communication with an infant chimpan-
zee. In: Allan Martin Schrier and Fred Stolnitz (eds.), Behavior of Nonhuman Primates, vol-
ume 4: 117–184. New York: Academic Press.
Gardner, R. Allen and Beatrix T. Gardner 1969. Teaching sign language to a chimpanzee. Science
165: 664–672.
Gentilucci, Maurizio and Riccardo Dalla Volta 2007. The motor system and the relationship
between speech and gesture. Gesture 7: 159–177.
Gershkoff-Stowe, Lisa and Susan Goldin-Meadow 2002. Is there a natural order for expressing
semantic relations? Cognitive Psychology 45(3): 375–412.
Gleitman, Lila 1990. The structural sources of verb meanings. Language Acquisition 1: 3–55.
Goldberg, Adele 1995. Constructions: A Construction Approach to Argument Structure. Chicago:
University of Chicago Press.
Goldin-Meadow, Susan 2003. The Resilience of Language: What Gesture Creation in Deaf Children
Can Tell Us about How All Children Learn Language. New York: Taylor and Francis.
Goldin-Meadow, Susan and Cynthia Butcher 2003. Pointing toward two-word speech in young
children. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 85–
107. Mahwah, NJ: Erlbaum.
Goldin-Meadow, Susan, David McNeill and Jenny Singleton 1996. Silence is liberating: Removing
the handcuffs on grammatical expression in the manual modality. Psychological Review 103:
34–55.
Goren-Inbar, Naama, Nira Alperson, Mordechai E. Kislev, Orit Simchoni, Yoel Melamed, Adi
Ben-Nun and Ella Werker 2004. Evidence of hominid control of fire at Gesher Benot Ya’aqov,
Israel. Science 304: 725–727.
Harris, Roy and Talbot J. Taylor 1989. Landmarks in Linguistic Thought: The Western Tradition
from Socrates to Saussure. New York: Routledge.
Hopkins, William D. and Monica Cantero 2003. From hand to mouth in the evolution of language:
The influence of vocal behavior on lateralized hand use in manual gestures by chimpanzees
(Pan troglodytes). Developmental Science 6: 55–61.
Hrdy, Sarah Blaffer 2009. Mothers and Others: The Evolutionary Origins of Mutual Understand-
ing. Cambridge, MA: Harvard University Press.
Hurley, Susan 1998. Consciousness in Action. Cambridge, MA: Harvard University Press.
Huttenlocher, Peter R. and Arun S. Dabholkar 1997. Developmental anatomy of prefrontal cor-
tex. In: Norman Krasnegor, G. Reid Lyon and Patricia S. Goldman-Rakic (eds.), Development
of the Prefrontal Cortex: Evolution, Neurobiology, and Behavior, 69–83. Baltimore: Brookes.
Kegl, Judy 1994. The Nicaraguan Sign Language Project: An overview. Signpost 7: 24–31.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1988. Sign Languages of Aboriginal Australia: Cultural, Semiotic and Communi-
cative Perspectives. Cambridge: Cambridge University Press.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man 26: 199–221.
Kendon, Adam 2008a. Review of Call and Tomasello (eds.) The gestural communication of apes
and monkeys. Gesture 8: 375–385.
Kendon, Adam 2008b. Some reflections of the relationship between ‘gesture’ and ‘sign’. Gesture 8:
348–366.
Kendon, Adam 2011. Some modern considerations for thinking about language evolution: A dis-
cussion of “The Evolution of Language by Tecumseh Fitch”. Public Journal of Semiotics 3(1):
79–108.
Kinzler, Katherine D. and Jocelyn B. Dautel 2012. Children’s essentialist reasoning about lan-
guage and race. Developmental Science 15(1): 131–138.
Konopka, Genevieve, Jamee M. Bomar, Kellen Winden, Giovanni Coppola, Zophonias O. Jonsson,
Fuying Gao, Sophia Peng, Todd M. Preuss, James A. Wohlschlegel and Daniel H. Geschwind
2009. Human-specific transcriptional regulation of CNS development genes by FOXP2. Nature
462: 213–218.
Lenneberg, Eric H. 1967. Biological Foundations of Language. New York: Wiley.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacNeilage, Peter F. 2008. The Origin of Speech. Oxford: Oxford University Press.
MacNeilage, Peter F. and Barbara L. Davis 2005. The frame/content theory of evolution of speech:
A comparison with a gestural-origins alternative. Interaction Studies 6: 173–199.
Maestripieri, Dario 2007. Macachiavellian Intelligence: How Rhesus Macaques and Humans Have
Conquered the World. Chicago: University of Chicago Press.
McNeill, David 1974. Sentence structure in chimpanzee communication. In: Kevin Connolly and
Jerome Bruner (eds.), The Growth of Competence, 75–94. New York: Academic Press.
McNeill, David this volume. Gesture as a window onto mind and brain, and the relationship to
linguistic relativity and ontogenesis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.1.) Berlin: De Gruyter Mouton.
McNeill, David, Susan D. Duncan, Jonathan Cole, Shaun Gallagher and Bennett Bertenthal 2008.
Growth points from the very beginning. Interaction Studies (special issue on proto-language,
edited by Derek Bickerton and Michael Arbib) 9(1): 117–132.
McNeill, David and Claudia Sowa 2011. Birth of a morph. In: Gale Stam and Mika Ishino (eds.), In-
tegrating Gestures: The Interdisciplinary Nature of Gesture, 27–47. Amsterdam: John Benjamins.
Mead, George Herbert 1974. Mind, Self, and Society from the Standpoint of a Social Behaviorist
(C. W. Morris ed. and introduction). Chicago: University of Chicago Press.
Mufwene, Salikoko S. 2010. ‘Protolanguage’ and the evolution of linguistic diversity. In: Feng Shi
and Zhongwei Shen (eds.), The Joy of Research: A Festschrift for William S.-Y. Wang, 283–310.
Shanghai: Shanghai Jiaoyu Chubanshe [Shanghai Education Press].
Pepperberg, Irene M. 2008. Alex and Me: How a Scientist and a Parrot Discovered a Hidden World
of Animal Intelligence–and Formed a Deep Bond in the Process. New York: Collins.
Pinker, Steven 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cam-
bridge: Massachusetts Institute of Technology Press.
Pinker, Steven 1994. The Language Instinct. New York: Harper Perennial.
Pollick, Amy S. and Frans B. M. de Waal 2007. Ape gestures and language evolution. Proceedings
of the National Academy of Sciences 104: 8184–8189.
Rizzolatti, Giacomo and Michael A. Arbib 1998. Language within our grasp. Trends in Neuros-
ciences 21: 188–194.
Sahin, Ned T., Steven Pinker, Sydney S. Cash, Donald Schomer and Erik Halgren 2009. Sequential
processing of lexical, grammatical and phonological information within Broca’s Area. Science
326: 445–449.
Sapir, Edward 1921. Language: An Introduction to the Study of Speech. New York: Harcourt,
Brace and World.
Saussure, Ferdinand de 1916. Cours de Linguistique Générale. Paris: Payot.
Savage-Rumbaugh, Sue, Stuart G. Shanker and Talbot J. Taylor 1998. Apes, Language, and the
Human Mind. New York: Oxford University Press.
Science Magazine 2008. Report on a conference on primate behavior and human universals. Issue
of 25 Jan. 318: 404–405.
Senghas, Ann 2003. Intergenerational influence and ontogenetic development in the emergence of
spatial grammar in Nicaraguan Sign Language. Cognitive Development 18: 511–531.
Senghas, Ann and Marie Coppola 2001. Children creating language: How Nicaraguan Sign Lan-
guage acquired a spatial grammar. Psychological Science 12: 323–328.
Shields, Kenneth, Jr. 1982. Indo-European Noun Inflection: A Developmental History. University
Park: Pennsylvania State University Press.
Slobin, Dan I. 2009. Review of M. Bowerman and P. Brown (eds.) “Crosslinguistic perspectives on
argument structure: Implications for learnability.” Journal of Child Language 36: 697–704.
Smith, Zadie 2009. Speaking in tongues. New York Review of Books, February 26.
http://www.nybooks.com/articles/archives/2009/feb/26/speaking-in-tongues-2/?pagination=false&printpage=true.
Stefanini, Silvia, Maria Cristina Caselli and Virginia Volterra 2007. Spoken and gestural production
in a naming task by young children with Down syndrome. Brain and Language 101: 208–221.
Sweet, Henry 1888. A History of English Sounds from the Earliest Period. Oxford: Clarendon Press.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge: Massachusetts Institute
of Technology Press.
Volterra, Virginia, Maria Cristina Caselli, Olga Capirci and Elena Pizzuto 2005. Gesture and the
emergence and development of language. In: Michael Tomasello and Dan I. Slobin (eds.),
Beyond Nature-Nurture: Essays in Honor of Elizabeth Bates, 3–40. Mahwah, NJ: Erlbaum.
Werner, Heinz and Bernard Kaplan 1963. Symbol Formation. New York: John Wiley. [Reprinted
in 1984 by Erlbaum].
Woll, Bencie 2009. Do mouths sign? Do hands speak? Echo phonology as a window on language
genesis. In: Rudolf Botha and Henriette de Swart (eds.), Language Evolution: The View from
Restricted Linguistic Systems, 203–224. Utrecht, the Netherlands: Lot.
Wrangham, Richard W. 2001. Out of the pan, into the fire: How our ancestors’ evolution depended
on what they ate. In: Frans de Waal (ed.), Tree of Origin: What Primate Behavior Can Tell Us
about Human Social Evolution, 119–143. Cambridge, MA: Harvard University Press.
Abstract
This chapter examines empirical studies and theory related to the hypothesis that embo-
died simulations of sensorimotor imagery play a crucial role in the construction of mean-
ing during conversation. We review some of the substantial body of work implicating
sensorimotor simulations in language comprehension and also some more recent work
suggesting that simulations are similarly involved in the production of gestures. Further-
more, we consider two less standard lines of research that are argued to be relevant to the
The result of interest is that people walked significantly slower after reading words
referring to the elderly than in other conditions. Thus, reading words with certain mean-
ings shaped people’s subsequent walking behavior in ways consistent with the multimodal
simulation processes required to understand those previously read words. As Zwaan
(2004: 38) argued, “comprehension is not the manipulation of abstract, arbitrary and
amodal symbols, a language of thought. Rather, comprehension is the generation of
vicarious experiences making up the comprehender’s experiential repertoire.”
Participants were presented with sentences that implicitly referred to the orientation of
various objects (e.g. “Put the pencil in the cup” implies a vertical orientation of the pen-
cil). After each sentence, a picture was presented, and participants judged
whether the pictured object had appeared in the previous sentence. For objects that
had been mentioned in the sentence, the picture’s orientation varied as to whether
or not it matched the orientation implied by the sentence (e.g., a pencil was presented
in either a vertical or horizontal orientation). Overall, participants responded faster to
pictures that matched the implied orientation than to pictures that mismatched it.
This empirical finding suggests that people form analogue representations of ob-
jects during ordinary sentence comprehension, which is consistent with the simulation
view of linguistic processing.
Studies also suggest that sensorimotor simulations during language comprehension
can be shaped by emotion. Earlier brain imaging experiments showed that observing
(perception) and imitating (action) emotional facial expressions activate the same neu-
ral areas of emotion, as well as relevant motor parts of the mirror neuron system (Carr,
Iacoboni, Dubeau, et al. 2003). Generating facial expressions also primes people’s rec-
ognition of others’ facially conveyed emotions (Niedenthal, Brauer, Halberstadt, et al.
2001). These findings on simulation and emotion recognition have been extended to
language comprehension. For instance, one set of studies evoked positive or negative
emotions in participants by having them either hold a pen with their teeth (i.e., produ-
cing a smile) or grip a pen with their lips (i.e., producing a frown) (Havas, Glen-
berg, and Rinck 2007). While either smiling or frowning, participants made
speeded judgments as to whether different sentences were pleasant (e.g., “You
and your lover embrace after a long separation”) or unpleasant (e.g., “The police car
rapidly pulls up behind you, siren blaring”). Response times to make these judgments
showed that people made pleasant judgments faster when smiling than when frowning
and made unpleasant judgments faster when frowning than when smiling. Subsequent
experiments revealed that these effects could not be replicated when people made
speeded lexical decisions to isolated words that were either pleasant (e.g., “embrace”)
or unpleasant (e.g., “police”), suggesting that emotional simulation during language
comprehension operates most strongly at the level of full phrases or sentences. Overall,
sensorimotor simulations appear to be critical parts of language understanding as
people create meaningful and emotional interpretations of linguistic expressions.
One concern with the psycholinguistic research on word and sentence processing is
that simulations may be important in understanding concrete actions and objects, but
not necessarily abstract ideas, such as “justice” or “democracy.” Yet there is much
work in cognitive linguistics showing that people understand at least some abstract con-
cepts in embodied metaphorical terms (Gibbs 2006a; Lakoff and Johnson 1999). More
specifically, abstract ideas, such as “justice,” are structured in terms of metaphorical
mappings where the source domains are deeply rooted in recurring aspects of embodied
experiences (i.e., achieving justice is achieving physical balance between two entities).
Many abstract concepts are presumably structured via embodied metaphors (e.g.,
time, causation, spatial orientation, political and mathematical ideas, emotions, the
self, concepts about cognition, morality) across many spoken and signed languages
(see Gibbs 2008). Systematic analysis of conventional expressions, novel extensions,
patterns of polysemy, semantic change, and gesture all illustrate how abstract ideas
are grounded in embodied source domains. Thus, the metaphorical expression “John
ran through his lecture” is motivated by the embodied metaphor of mental achievement
is physical motion toward a goal (a submetaphor derived from change is motion). These
cognitive linguistic findings provide relevant evidence showing that abstract concepts
are partly structured through embodied simulation processes (Gibbs 2006b).
A second response to the concern about simulation and abstract concepts comes
from different psycholinguistic research demonstrating that embodied conceptual me-
taphors motivate people’s use and understanding of metaphoric language related to var-
ious abstract concepts (Gibbs 2006a, 2006b). These experimental studies indicate that
people’s recurring embodied experiences often play a role in how people tacitly
make sense of many metaphoric words and expressions. In fact, some of these studies
extend the work on simulative understanding of non-metaphorical language to proces-
sing of metaphorical speech. Gibbs, Gould, and Andric (2006) demonstrated how peo-
ple’s mental imagery for metaphorical phrases, such as “tear apart the argument,”
exhibits significant embodied qualities of the actions referred to by these phrases (e.g.,
people conceive of the “argument” as a physical object that when torn apart no longer
persists). Wilson and Gibbs (2007) showed that people’s speeded comprehension of
metaphorical phrases like “grasp the concept” is facilitated when they first make, or
imagine making, a relevant movement, in this case a grasping movement. Bodily processes appear to enhance
the construction of simulation activities to speed up metaphor processing, an idea that
is completely contrary to the traditional notion that bodily processes and physical
meanings are to be ignored or rejected in understanding verbal metaphors. Further-
more, hearing fictive motion expressions implying metaphorical motion, such as
“The road goes through the desert,” influences people’s subsequent eye-movement
patterns while looking at a scene depicting the sentence (Richardson and Matlock
2007). This suggests that the simulations used to understand the sentence involve a par-
ticular representation of the road’s fictive motion, which interacts with people’s eye
movements.
Experimental findings like these emphasize that people may be creating partial, but
not necessarily complete, sensorimotor simulations of speakers’ metaphorical messages
that involve moment-by-moment “what must it be like” processes, such as grasping, that
make use of ongoing tactile-kinesthetic experiences (Gibbs 2006b). These simulation
processes operate even when people encounter language that is abstract, or refers to
actions that are physically impossible to perform, such as “grasping a concept,” because
people can metaphorically conceive of a “concept” as an object that can be grasped.
One implication of this work is that people do not just access passively encoded con-
ceptual metaphors from long-term memory during online metaphor understanding,
but perform online simulations of what these actions may be like to create detailed
understandings of speakers’ metaphorical messages (Gibbs 2006b).
Gesture is not the true language of man which suits the dignity of his nature. Gesture,
instead of addressing the mind, addresses the imagination and the senses (…) Thus, for
us it is an absolute necessity to prohibit that language and to replace it with living speech,
the only instrument of human thought (…) Oral speech is the sole power that can rekindle
the light God breathed into man when, giving him a soul in a corporeal body, he gave him
also a means of understanding, of conceiving, and of expressing himself (…) mimic signs
(…) enhance and glorify fantasy and all the faculties of the imagination (Lane 1984:
391; from Wilcox 2004: 120–121).
The empirical research reviewed above on the active involvement of the sensorimotor
system in online language understanding reveals that Tarra was mistaken not so much
about the nature of gesture as about the nature of language. Language, through
the imaginative processes of simulation, is deeply grounded in movement and the
senses, which form the basis for the construction of linguistic meaning. Yet, ironically,
as evidence has accumulated for this view on language, empirical research on gesture
has been slow to adopt a distinctly simulation perspective, despite its intuitive appeal.
One reason for this imbalance may be found in the asymmetry between different ap-
proaches to studying meaning in language compared to gesture. While language re-
searchers interested in meaning have tended to focus on comprehension, gesture
researchers have tended to pay more attention to production, which does not so easily
lend itself to the traditional dependent measures used to examine the online processes
involved in meaningful communication. Thus, when gesture comprehension has been mea-
sured, this has largely been by way of offline, targeted assessments of information
uptake.
The intuitive basis for a simulation theory of gesture stems from its clear correspon-
dence to sensorimotor imagery, a point that has also received substantial empirical
attention over the years (McNeill 1992, 2005). One line of research has documented
a positive correlation between spatial processing and gesture production. For example,
Lavergne and Kimura (1987) found that people gesture more frequently when conver-
sing about spatial topics compared to neutral or verbal topics. Another study asked par-
ticipants to describe animated action cartoons, and found that speech-accompanying
gestures were nearly five times as likely to occur with phrases containing spatial prepo-
sitions as with those without spatial content (Rauscher, Krauss, and Chen 1996). A study
by Hostetter and Alibali (2007) investigated the influence of individual differences in
spatial skills on gesture production, with the finding that people with strong spatial skills
and weak verbal skills gestured most frequently. Finally, neuropsychological research
shows that stroke patients who suffered visuospatial deficits gestured less than matched
controls (Hadar, Burstein, Krauss, et al. 1998).
Other studies have revealed a positive relationship between gesture and imagery
more broadly. In one experiment, Rimé and colleagues had participants engage in a
50-minute conversation while seated in an armchair that restrained the movement of
their head, arms, hands, legs, and feet (Rimé, Schiaratura, Hupet, et al. 1984). Analysis
of their dialogue revealed that while people’s movement was restricted, the content of
their speech showed a significant decrease in the amount of vivid imagery compared to
freely moving controls. Another study asked participants to describe a cartoon after
either watching it (one condition) or reading it in narrative form (a second condition)
(Hostetter and Hopkins 2002). People gestured more frequently after watching the cartoon,
presumably because of the richer imagery in the non-verbal presentation. Last, Beattie
and Shovelton (2002) investigated the properties of verbal clauses that were likely to
occur with iconic gestures. Participants narrated cartoons, and afterwards, the clauses
in these narrations were rated by other subjects for their imageability. Clauses
associated with gestures were rated as more highly imageable.
These various studies demonstrate that gesture is highly correlated with communica-
tion about imagery-laden topics that span a range of motor, visual, and spatial imagery.
Indeed, these findings are not at all surprising when one considers the highly imagistic
nature of gestures themselves, which are inherently visible, spatially oriented motor ac-
tions. Yet the studies also raise the question of how sensorimotor imagery might relate
to gesture within the momentary processes involved in communicative interaction. One
recent proposal – the Gesture as Simulated Action hypothesis – offers a potential
answer, arguing for the idea that gestures arise from the simulation of motor imagery
(Hostetter and Alibali 2008). According to this view, simulations of action-related
thoughts lead to the activation of neural premotor action states; this activation then has the
potential to spread to motor areas. This spreading activation comes to be realized as
the overt action of representative gesture.
Hostetter and Alibali’s Gesture as Simulated Action hypothesis is clearly related to
our claim about the importance of sensorimotor simulation in speaking, gesturing, and
understanding. Yet the Gesture as Simulated Action hypothesis may have some limita-
tions. The first is not critical, but concerns the scope of the gestural behaviors the Ges-
ture as Simulated Action hypothesis actively attempts to explain. Explicitly the theory
concerns “representative gestures,” which are said to include iconic and deictic gestures.
However, deictics receive minimal consideration, and more broadly, the theory does not
address language-related gestures, including spoken prosody, manual beat gestures, or
iconic “vocal” gestures (see below), nor the conventionalized gestures of speech and
sign. Given the tight temporal coordination and semantic coherence between spoken
language and gesture (McNeill 1992, 2005; Perlman in press), we emphasize the impor-
tance of developing a comprehensive theory of gesture that aims to account for the full range
of these gestural behaviors. This point gains weight as researchers increasingly observe
a graded rather than categorical distinction between modality-independent notions of gesture and
language, one that rests largely on degree of conventionalization.
A second limitation of the Gesture as Simulated Action account lies in its exclusive
emphasis on simulated action as the foundation of gesture. The theory argues that simu-
lated motor imagery is the critical conceptual ingredient for gesture production, and
that gestures related to non-motor, perceptual imagery might arise, but only through
co-activation with afforded actions or by simulated perceptual actions like eye move-
ments or tactile exploration. Though one cannot deny that action is an essential element
of many, if not all, gestures (though consider a gestural depiction of stillness and expand
from there), the fact that gestures are composed of substance and context is likewise
essential to the simulation process. To fully appreciate the simulative processes involved
in the production of gesture, it is critical to attend to the full process of how the body is
imaginatively and often metaphorically used to constitute an endless variety of simu-
lated entities and qualities. As McNeill (1992) observed, “The hand can represent a
character’s hand, the character as a whole, a ball, a streetcar, or anything else (…) In
other words, the gesture is capable of expressing the full range of meanings that arise
from the speaker” (McNeill 1992: 105). Moreover, as we describe below, the cross-
modal sorts of representations involved in common vocal gestures demonstrate a com-
plexity of simulation that is just not adequately captured by the simplifying notion of
“simulated action.”
This capacity is easily demonstrated even within a conceptually simple context. For
example, consider a study by Lausberg and Kita (2003), which investigated hand pref-
erence for observer-viewpoint iconic gestures. Participants watched and described ani-
mations of two shapes interacting with each other, with one shape positioned on the left
side and the other on the right. In one condition, participants verbally described the
event as they also produced many speech-accompanying gestures, and in a second con-
dition, they depicted the event silently. Lausberg and Kita were interested in whether
the verbal condition would elicit, through left hemispheric activation associated with
speech, more right-handed gestures. In fact, the findings showed only a minimal effect
induced by speech; instead, by far the most important factor influencing hand
choice was whether the represented block was positioned on the left or the right side
of the animation. Not surprisingly, when the represented block was on the left, speakers
tended to use their left hand, and when the block was on the right, speakers tended to
use their right hand. In addition, participants also produced numerous bimanual ges-
tures in which the hands worked together to embody the spatial relationship between
the blocks. This empirical finding provides a nice and simple illustration of the essential
role played by substance and context, in addition to action, in how simulative processes
give rise to gesture.
These difficulties with the Gesture as Simulated Action hypothesis do not dampen
our enthusiasm for the general idea that gesture is both produced and understood in
terms of ongoing sensorimotor simulations. By providing a detailed formulation of
their hypothesis, Hostetter and Alibali open the way for direct, empirical testing of a
simulation-based theory of gesture, helping to incorporate gesture into an important
theory for the comprehension of meaning in language. Notably, by linking the sensor-
imotor simulations of language comprehension to gesture production, the hypothesis
brings together the meaning-making processes involved in language and gesture, as
well as comprehension and production.
The value of their proposal can be seen in one recent study that examined the degree
of detail incorporated into the production of a gesture, and in turn, how much of this
detail is conveyed to the listener (Cook and Tanenhaus 2009). Participants solved the
Tower of Hanoi problem, either with a stack of weights or on a computer, and after-
wards described the solution to a listener. (In this problem, a stack of disks is arranged
bottom up from largest to smallest on the leftmost of three pegs. The goal is to move all
of the disks to the rightmost peg, moving only one disk at a time and without ever pla-
cing a larger disk on top of a smaller one.) Analysis of the trajectory of speakers’ ges-
tures revealed that these were finely tuned to the actual trajectory involved in solving
the problem, with differences reflecting the differently afforded constraints of the real-
weight versus computer version of the task. Moreover, listeners were sensitive to this
information, which transferred into matching trajectories when listeners later per-
formed the computer version of the problem themselves. Cook and Tanenhaus (2009)
interpret these analog differences in gesture production and comprehension as evidence
for the activation of perceptual-motor information that is involved in the actual perfor-
mance of the task. More generally, they suggest that this finding is consistent with a
sensorimotor simulation account of gesture production and comprehension.
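For readers who want the constraint of the task made fully explicit, the sketch below generates the standard recursive sequence of legal moves for the puzzle as described in the parenthetical above. It is offered only as an illustration of the task structure, assuming the textbook recursive solution; the function and peg names are ours and do not reproduce Cook and Tanenhaus’s (2009) materials or analysis.

```python
# A minimal sketch of the Tower of Hanoi puzzle described above, using the
# standard recursive solution. Names are illustrative only; nothing here is
# drawn from Cook and Tanenhaus (2009).

def hanoi_moves(n, source="left", target="right", spare="middle"):
    """Return the list of (disk, from_peg, to_peg) moves that transfers a
    stack of n disks from source to target, moving one disk at a time and
    never placing a larger disk on top of a smaller one."""
    if n == 0:
        return []
    moves = hanoi_moves(n - 1, source, spare, target)   # clear the way
    moves.append((n, source, target))                   # move the largest disk
    moves += hanoi_moves(n - 1, spare, target, source)  # restack on top of it
    return moves

if __name__ == "__main__":
    for disk, src, dst in hanoi_moves(3):
        print(f"move disk {disk} from {src} to {dst}")  # 7 moves for 3 disks
```

The point of the sketch is simply that the legal move sequence, and hence the hand trajectory it affords, is tightly constrained by the rules, which is what allowed Cook and Tanenhaus to compare speakers’ gesture trajectories against the trajectories the task itself required.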
Cook and Tanenhaus provide a compelling interpretation of their results, and we
anticipate much future empirical research focused in this direction. Yet, returning to
the current formulation of the Gesture as Simulated Action hypothesis, we also
reiterate the need for a more comprehensive account that emphasizes the full embodied
and contextualized complexity of these simulation processes and the meaningful actions
that result. Given evidence for the multimodal nature of language and gesture, it is cru-
cial to understand not just the action aspect of gesture, but how the body is incorporated
into the construction of meaning during these activities. To illustrate this point, we next
describe some recent research and examples related to the production of gestures
within the vocal modality.
5. Vocal gesture
When people talk, they commonly pattern their voice in a variety of ways to iconically
depict an aspect of their subject matter. Many times these iconic correspondences are
created within the domain of sound, with the sounds of our voice imitating the sounds
of our environment. One familiar example occurs when quoting another person’s speech: we
often imitate certain characteristics of that person’s voice, such as their emotional state,
accent, and tonal quality (Clark and Gerrig 1990). Nonhuman animals can also be
“quoted,” such as when imitating the high-pitched barks of a yapping dog (perhaps
in combination with a one-handed gesture of the dog’s mouth opening and closing).
And we are similarly inclined to imitate the sounds of inanimate subjects too, an ability
that often proves useful when taking a malfunctioning car to the mechanic. It seems that
people produce and comprehend these sorts of iconic vocalizations so naturally that
we hardly even notice as they are seamlessly woven into our speech.
Close observation reveals further that iconic vocalizations go well beyond simple
pantomimic imitations of other sounds. Indeed, spoken words and phrases often take
on prosodic patterns that reflect aspects of their so-called semantic meaning, often ex-
tending these iconic correspondences across modalities through processes of abstrac-
tion and metaphor. For example, one might describe “a looong snake,” expressing
the snake’s physical length and size by iconically accenting the adjective with extended
duration and low pitch. Or in contrast, the phrase “a quicklittlebug” might be uttered
with a fast tempo and high pitch to convey the bug’s rapid movement and size. Building
from such observations, scholars have recently begun to argue that these co-speech vo-
calizations are, in fact, the same qualitative sort of behavior as manual gestures (Em-
morey 1999; Liddell 2003; McNeill 2005; Okrent 2002; Perlman in press). Below we
describe some of the empirical research on vocal gesture and discuss its implications
for an integrative, simulative account of gesture and language.
Bolinger (1983, 1986) was one of the first to recognize an iconic quality to spoken
prosody and to point to the close relationship between intonation and gesture. He
saw intonation as “part of a gestural complex whose primitive and still surviving func-
tion is the signaling of emotion” (Bolinger 1986: 195). Ohala (1994) too observes a link
between intonation and the more primitive emotional vocalizations shared with other
mammals, which is expressed in what he calls the “frequency code.” According to
Ohala, high-frequency vocalizations signal apparent smallness and, by extension, a nonthreatening, submissive, or subordinate attitude, while low-frequency vocalizations signal
apparent largeness and thus threat, dominance, and self-confidence. Although both
scholars stress the expansion of this iconic intonational system through processes
of ritualization and metaphor, recent research nevertheless indicates that this view of
intonation and gesture is too narrow. By focusing on the iconic expression of
emotion, Bolinger and Ohala neglect intonation’s incorporation into more imagistic,
representational gestures.
Recent empirical research has begun to document the prevalent use of representa-
tive vocal gesture within various experimental and more naturalistic contexts. An
early series of studies investigated people’s production of vocal gestures, or what the
authors called “analog acoustic expressions” (Shintel, Nusbaum, and Okrent 2006).
One group of participants described the movement of an animated dot moving up or
down with the phrases, “It is going up” or “It is going down.” Another group of parti-
cipants simply read these same sentences as they were presented on a computer screen.
Analysis of people’s speech revealed that the final words of these phrases, “up” and
“down,” were spoken with a higher or lower fundamental frequency, respectively, both when spoken
to describe the dot and when simply read.
A second study in this series had participants describe animated dots as they moved
at either a fast or slow rate to the left or right. Participants were instructed to use the
phrases “It is going left” or “It is going right” to describe the dot, without explicit men-
tion of its speed. Participants nevertheless spoke the phrase with an overall shorter
duration for fast-moving dots and longer duration for the slow-moving dots. Moreover,
when these descriptions were replayed for listeners to guess whether the utterance had
been spoken in description of a fast- or slow-moving dot, their accuracy was significantly
correlated with phrase duration, indicating sensitivity to the prosodic information.
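For readers who wish to quantify such effects in their own recordings, the logic of the duration analysis can be sketched in a few lines of Python. The measurement values, and the choice to relate duration to the proportion of “slow” judgments, are purely illustrative and are not taken from Shintel, Nusbaum, and Okrent's (2006) materials.

from statistics import mean

def pearson_r(xs, ys):
    # Plain Pearson correlation coefficient, computed without external libraries.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-utterance measurements: (phrase duration in seconds,
# proportion of listeners judging the described dot to have moved slowly).
trials = [(0.92, 0.35), (1.10, 0.48), (1.35, 0.62), (1.58, 0.71), (1.74, 0.80)]
durations, slow_judgments = zip(*trials)
print(f"r = {pearson_r(durations, slow_judgments):.2f}")  # longer phrases go with more 'slow' judgments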
A different series of experiments examined whether modulations in speech rate contribute to a listener’s mental representation of a described event as in motion or still
(Shintel and Nusbaum 2007). Building on an experiment by Zwaan, Stanfield, and Yaxley
(2002), participants listened to sentences (e.g., “The horse is brown”) spoken at a fast
or slow rate and then indicated whether a pictured object had been mentioned in the
sentence. Critically, the picture presented the object either in a stationary position or
in motion (e.g., a horse standing still or running), with the idea that fast-spoken sen-
tences would contribute a sense of movement in the listener’s representation and facil-
itate responses to in-motion pictures. This prediction was confirmed: participants
were faster in responding to compatible trials (fast rate with an in-motion picture, or slow rate
with a still picture). According to Shintel and Nusbaum, this finding suggests that speech
rate (an “analog acoustic expression”) can contribute to a listener’s analog perceptual
representation of a described event. Analogically conveyed motion information can
influence listeners’ representations about described objects, even when information is
conveyed exclusively in the prosodic properties and not the propositional content of
the sentence.
The regular use of vocal gesture has also been demonstrated in a more naturalistic
setting. Perlman (in press) investigated the spontaneous use of iconic speech rate by
having participants watch and describe a series of short video clips showing fast- or
slow-paced events. For each description that made explicit mention of speed, speech
rate measurements were made for the full utterance, as well as for speed-related adver-
bial phrases. The analysis showed that speakers generally spoke faster or slower in their
full descriptions of fast or slow events, respectively, and additionally, they spoke
adverbial phrases about “fast” events faster than adverbials about “slow” ones.
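The measurement itself is easy to make concrete. The following Python sketch computes speech rate, in syllables per second, for a full description and for a speed-related adverbial phrase within it; the example words, timestamps, and syllable counts are invented for illustration and do not come from Perlman's corpus.

# Hypothetical time-aligned annotation of one description of a "fast" clip.
# Each entry: (word, onset in seconds, offset in seconds, syllable count).
utterance = [
    ("the",     0.00, 0.12, 1),
    ("cheetah", 0.12, 0.55, 2),
    ("sprints", 0.55, 0.98, 1),
    ("really",  0.98, 1.20, 2),
    ("quickly", 1.20, 1.48, 2),
]
adverbial = utterance[3:]  # the speed-related adverbial phrase "really quickly"

def speech_rate(words):
    # Syllables per second over the span covered by the annotated words.
    syllables = sum(count for _, _, _, count in words)
    duration = words[-1][2] - words[0][1]
    return syllables / duration

print(f"full utterance:   {speech_rate(utterance):.2f} syllables/s")
print(f"adverbial phrase: {speech_rate(adverbial):.2f} syllables/s")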
Perlman suggests that these two separate speech rate effects may arise as the man-
ifestation of two different sorts of simulation-related processes. The more general shift
in speech rate is suggested to arise from a background simulation of the event, reflecting
speakers’ imaginative engagement with the tempo of the action as they proceed through
their description, scanning and profiling specific details to highlight different aspects of
the message. On the other hand, the adverbial-phrase-specific effect is qualitatively
comparable to the more commonly observed manual gestures that are produced concur-
rently with speech. The vocal gesture emerges precisely as the speaker is conceptualiz-
ing and communicating about speed as the profiled aspect of the message. We propose
here that this simulative focus is the motivating force that drives both the conventional
articulatory gestures of the adverbial phrase and simultaneously, the iconic increase in
the rate with which the words are spoken. These conventional and iconic forms of ges-
ture are dynamically integrated together as they are simultaneously activated by the
focused concept of speed as it is contextualized within the simulation of the event.
Yet, ongoing work in this paradigm indicates that vocal gesture is only part of the
story, and that the notion of multimodal iconic gesture is probably more apt (Perlman
in press). In a current study, subjects come into the lab in pairs and take turns watching
and describing to each other short video clips of various animals engaging in different
activities. Preliminary analysis of audio and video shows that vocal gestures, including
the spontaneous rhythms and intonational patterns of speech, are often performed in
precise temporal and semantic coordination with iconic manual gestures. Importantly,
in many instances, gestures may not be arbitrarily synchronized with speech, but rather,
both gesture and speech are performed in iconic synchrony with an ongoing simulation
of the event being described.
To illustrate this synchrony, consider the following excerpt from the description of a
video clip of a large fish floating around in an aquarium. The fish drifts up to the surface
of the water and then suddenly gulps down a bug.
(1) it was this big fish [kind of hanging out, he was floating slowly up to the top and he ate some…thing…]
(The numerals (1) and (2) index the two iconic strokes, manual and vocal, described below.)
Preparation: raises and holds right hand with thumb and fingers pinched together as a fish
and its mouth.
Manual iconic (1): right hand rises slowly upward and pauses
Manual iconic (2): on “ate” the right hand thrusts forward, thumb and fingers spread-
ing open and closing like the mouth of the attacking fish. On “some” the fingers and hand
retract back and are held to the end of the utterance.
Vocal iconic (1): speech is slowed down and low in intensity reflecting the fish’s man-
ner of floating to the surface. The slowing is most marked in extended duration of the
vowel /o/ in the adverb “slowly.” Pitch steadily rises, peaking on the word “top.”
Vocal iconic (2): speech suddenly increases in tempo and intensity in synchrony with
the stressed syllable of “ate.”
This person’s description demonstrates the multimodal nature of gesture as a see-
mingly single unified expression incorporates iconic manual and vocal gestural elements
with the conventional articulatory gestures of speech. It is additionally interesting to
note that the sudden nature of the fish’s attack, although it is quite apparent in the cor-
responding manual and vocal gestures, is not entirely discernible from just the semantics
of the words. A clue, however, is provided by the grammar of the utterance, in which the
temporally extended “float,” expressed by the progressive aspect, is contrasted to the
punctuated “eat,” expressed in the past tense. Thus, corresponding elements of the fish’s
manner of motion, the temporal contour of the motion, and even perhaps its upward
direction are all manifested synchronously within iconic manual gesture, iconic prosody,
lexical semantics, and even higher-order syntactical structures.
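Such coordination can be made explicit with time-aligned annotation tiers of the kind used in multimodal annotation tools. The Python sketch below is purely illustrative: the tier names, segment boundaries, and timestamps are invented approximations for excerpt (1), not measurements from the original recording.

# Hypothetical time-aligned tiers for excerpt (1); all times (in seconds) are invented.
tiers = {
    "speech": [
        (0.0, 2.1, "it was this big fish kind of hanging out"),
        (2.1, 4.3, "he was floating slowly up to the top"),
        (4.3, 5.4, "and he ate some...thing"),
    ],
    "manual_gesture": [
        (1.8, 2.1, "preparation: right hand held with thumb and fingers pinched"),
        (2.1, 4.3, "iconic (1): hand rises slowly upward, then pauses"),
        (4.3, 5.0, "iconic (2): hand thrusts forward, opening and closing on 'ate'"),
    ],
    "vocal_gesture": [
        (2.1, 4.3, "iconic (1): slowed tempo, low intensity, lengthened /o/, rising pitch"),
        (4.3, 4.6, "iconic (2): sudden increase in tempo and intensity on 'ate'"),
    ],
}

def active_at(tiers, t):
    # Return every annotation active at time t, across all tiers, so that
    # co-occurring manual and vocal iconics can be read off directly.
    return {name: [label for onset, offset, label in spans if onset <= t < offset]
            for name, spans in tiers.items()}

print(active_at(tiers, 4.4))  # manual and vocal iconic (2) co-occur with "ate"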
Unlike manual gestures (with the exception of when they are integrated with sign-
ing), vocal gestures are, in a sense, parasitic on the phonological form of an utterance.
That is to say, these gestures must be interwoven within the phonological material pro-
vided by the forms of spoken words. How is it that particular segments, during the
online moments of speech production, are modified to accentuate certain iconic quali-
ties? One possibility is that phonological aspects of a word form maintain some latent
potential to take on a quality of iconicity, which may, in some instances become acti-
vated in relation to the contextualized dynamics of an utterance. This idea borrows
from Wilcox’s (2004) theory of cognitive iconicity and Müller’s (2008) notion of activation
as it plays out in the triadic structure of metaphor.
Cognitive iconicity adopts Langacker’s (1987) claim that semantic and phonological
poles (i.e., semantic meaning and phonological form) each reside within semantic space,
which is itself a subset of the full expanse of conceptual space. As Wilcox puts it:
“The phonological pole reflects our conceptualizations of pronunciations, which
range from the specific pronunciation of actual words in all their contextual richness
to more schematic conceptions, such as a common phonological shape shared by all
verbs, or a subset of verbs, in a particular language” (Wilcox 2004: 122). Wilcox explains
that cognitive iconicity is not an objective similarity relation between a form and its sig-
nified referent, but rather a constructed relation between two structures in a multidi-
mensional conceptual space. He also notes that metaphor can act as a “worm hole”
through this space, functioning to shorten the distance between the phonological and
semantic poles.
A similar schema exists in the triadic structure of metaphor, which involves two
meaning structures of source and target and a relation between them. Considering a tra-
ditional view distinguishing dead and living metaphors, Müller (2008) proposes a dyna-
mical alternative in which the metaphorical relation between two concepts does not
need to be fully active or fully opaque, but instead can be activated to a greater or lesser
degree with the dynamics of each instantiated usage. Applying this framework to the
triadic structure of cognitive iconicity, it follows that the iconic relation between the
semantic and phonological poles may lie dormant and become more or less activated
dependent on the dynamics of usage. When activated, these iconic relations become
accentuated and take form as vocal gesture.
Consider for example, the saliently extended duration of /o/ in the pronunciation of
the word “slooowly” in the above example (1). In this case, the duration of the /o/ was
particularly activated, observable through its exaggeration. However, for comparison,
consider how one might articulate the word “slowly” in the phrase, “a ssloowly sslither-
inng sssnake.” In this case the /o/ is still articulated with some extended duration, but in
addition, the frication of the /s/ is extended too, in part because of the alliteration, but
surely also in part owing to the onomatopoeic hiss that we associate with snakes. Thus one can
see how iconic relations between a phonological form and an aspect of its meaning
might become differentially activated within each dynamical usage.
Vocal gestures seen in examples like (1) above offer a special window into the full
extent to which speech and gesture are integrated together as they manifest from the
same simulative processes. Furthermore, these gestures imply that the embodied repre-
sentations that arise during these simulative processes are profoundly multimodal. Var-
ious concepts, such as those related to speed, manner of movement, size, and verticality,
are spontaneously embodied in the movements of our hands and also in the movements
of our vocal tract, and indeed, in the right context, in body parts ranging across much of
our anatomy. The frequent and casual use of iconic vocal gestures in particular suggests
that humans have a special knack for conceiving of their experience in terms of iconic
movements of the vocal tract, often through cross-modal abstractions and metaphor. As
McNeill (1992: 12) puts it, “Gestures are like thoughts themselves,” pointing to the idea
that the embodiment of thoughts through gesture is an essential aspect of the very
nature of how humans think. Vocal gestures, too, may be thoughts, with conventional
linguistic gestures reflecting conventionalized aspects of embodied thought patterns.
[A] considerable proportion of all desires is naturally expressed by slight initiation of the
actions which are desired. Thus, one chimpanzee who wishes to be accompanied by
another, gives the latter a nudge, or pulls his hand, looking at him and making the move-
ments of “walking” in the direction desired. One who wishes to receive bananas from
another initiates the movement of snatching or grasping, accompanied by intensely plead-
ing glances and pouts. (…) In all cases their mimetic actions are characteristic enough to be
distinctly understood by their comrades. (Köhler 1925: 307–308)
In each of these circumstances, it appears that the gesturing chimp desires an interactive
outcome and partially enacts a gesture that, if it were carried out to instrumental com-
pletion, would function to bring the outcome into being. Indeed, this ability to imagine
the performance of an instrumental act upon a social interactant, but then to partially
inhibit that act’s performance, appears crucial in the origin of these possible precursors
of representative gesture. And as Köhler notes, comprehension of such partial actions
comes naturally, facilitated by their mimetic resemblance and presumably their
contextual relevance to an afforded instrumental action.
More recent studies have also reported the use of spontaneously produced iconic
gestures by great apes, often occurring in contexts of tactful social engagements in
which one ape is trying to influence the movement and position of an interlocutor (Sav-
age-Rumbaugh, Wilkerson, and Bakeman 1977; Tanner and Byrne 1996). For example,
Savage-Rumbaugh and colleagues documented the gestures used by bonobo chimpan-
zees as they coordinated copulatory positions with one another (promiscuous sex is a
common behavior of bonobos). They observed that many of the gestures that immedi-
ately preceded copulatory bouts bore an iconic quality and could generally be placed
into three categories: positioning motions (actual physical, gentle movements to
move the recipient’s body or limbs), touch plus iconic hand motions (limb or body
part is lightly touched and then movement is indicated by an iconic hand motion),
and iconic hand motions (movement is simply indicated via an iconic hand motion, without
touch). Interestingly, this ordering of increasing abstraction correlated negatively
with the gestures’ frequency of occurrence, suggesting that those gestures closest to
instrumental action were the easiest to perform. One could reason that the more ab-
stractly iconic gestures, further removed from the immediate context of instrumental
action, would place a greater load on the imagination.
Another study by Tanner and Byrne (1996) observed the use of iconic gestures
between two captive gorillas at the San Francisco Zoo. In this case, a 13-year-old
adult male Kubie was recorded using a variety of iconic tactile and visual gestures to
encourage play and direct interactive movement with Zura, a 7-year-old female. Of par-
ticular interest here, Tanner and Byrne note how special conditions in the zoo enclosure
made coercion ineffective and propose that these conditions were a motivating force for
the production of the gestures. For instance, a door to an indoor pen was opened wide
enough to allow Zura to fit through but not her larger companion, permitting Zura to
escape if Kubie was too forceful. Additionally, a second, older silverback male was part
of the troop, which meant that Kubie had to be especially charming so as to engage
Zura without drawing the other silverback’s attention. Thus, again, we find that iconic
gestures arise in contexts closely connected with instrumental action, in which one ape
desires to bring about a certain outcome with an interactant, but must exercise restraint
for purposes of social tact.
Evidence also shows that, with rich human social interaction and enculturation, apes
develop a markedly expanded capacity to produce more abstracted iconic gestures
(Tanner, Patterson, and Byrne 2006). In many of these cases, the expansion of the imag-
ination and its distinct role within particular gestures is obvious. For example, consider
some of the iconic gestures produced by the language-trained bonobos Kanzi and Mu-
lika in which they “made twisting motions toward containers when they needed help in
opening twist-top lids” and “hitting motions toward nuts they wanted others to crack
for them” (Savage-Rumbaugh, McDonald, Sevcik, et al. 1986: 218). Both of these gestures dif-
fer from those described above in how they incorporate the imagined physical manip-
ulation of objects that are not available to immediate tactile experience. Notably, though,
the simulated objects are available to visual perception, and as Kanzi and Mulika perform
these gestures, it is apparent from the preposition “toward” that their visual
attention is clearly drawn towards these objects. One might wonder whether they could
produce such sophisticated iconic gestures with the same degree of facility if the objects
were located outside of their perceptual purview. (See Köhler 1925 for interesting ex-
amples from the domain of problem solving in which direct perceptual access to the ele-
ments involved in a solution is necessary for a chimp to conceive of the solution.)
Finally, a simulation-based account of gesture has implications for more domain-general
cognitive processes, the evolution of which is typically assumed to be a necessary precondition
for the use of iconic gestures. A traditional view of iconic gestures assumes that their production and comprehension depend on highly developed cognitive abilities related to
imitation and theory of mind. For example, Tomasello (2008: 203) reasons,
To use an iconic gesture one must first be able to enact actions in simulated form [a more
deliberate notion of simulation than we intend in our usage], outside their normal instrumen-
tal context – which would seem to require skills of imitation, if not pretense. But even more
importantly, to comprehend an iconic action as a communicative gesture, one must first
understand to some degree the Gricean communicative intention; otherwise the recipient
will suppose that the communicator is simply acting bizarrely, trying to run like an antelope
or to dig a hole for real when the context is clearly not appropriate. (Tomasello 2008: 203)
This line of reasoning, in combination with empirical research, has led researchers, such
as Tomasello and his colleagues, to dismiss on a priori grounds the possibility that the
great apes use iconic gestures (Call and Tomasello 2007; Pika 2007; Tomasello 2008).
They argue that the great apes have only minimal abilities to imitate and to share
communicative intentions, and thus they simply cannot use iconic gestures.
However, from the perspective of the simulation hypothesis, the use of iconic ges-
tures, although spontaneous, creative, and socially-minded, does not require an ability
for deliberate imitation and pantomime or hard Gricean social-cognitive reasoning.
As we have seen above, reports of iconic gestures by the great apes describe them as
used, not “outside their normal instrumental context,” but directly within it (Tomasello
2008: 203). Moreover, these iconic gestures are comprehended without necessarily requiring a
reflective understanding of the gesturer’s “communicative intention,” but more directly
through an activated sense of the full action within a context that is rife with expecta-
tion of exactly that sort of action. We suggest that these gestures appear to reflect an
emerging ability to perform increasingly imaginative, sensorimotor simulations and to
modulate their iconic motor activations towards communicative expression. According
to the simulation hypothesis, this capacity is interwoven at the core of the cognitive
skills leading to the origin of our own, dramatically more sophisticated system of
socially tuned simulations, which are foundational to the motivation of our gesture
and language.
7. Conclusion
Our main hypothesis is that experiential simulations of sensorimotor imagery are fun-
damental to the conceptual processes that underlie the use of gesture and language and
the construction of meaning during conversational interaction. According to this view,
one’s ability to interpret meaning during conversation resides largely in the ability to
simulate the thoughts and ideas of one’s interlocutor through the expressive movements
of their speech and gesture. These simulative processes are also involved in the produc-
tion of language and gesture. Articulatory movements, whether of the vocal tract, the
hands, or potentially any other part of the body, are produced by the activations that
arise during a sensorimotor simulation. Indeed, these bodily activations are, in the
sense of Vygotsky’s (1986) notion of a “material carrier,” an essential aspect of the
thought itself. An important implication of this idea is that our embodied, sensorimotor
experience plays a crucial role in the formation of the concepts and the meanings we
construct and express during the online moments of conversation. Critically, these con-
cepts and meanings appear to be interwoven across modalities, and often involve the
creation of schematic and metaphorical cross-modal correspondences.
Although we do not, as yet, have a firm idea of all the constraints on simulation
processes, or of the extent to which they create simplified, as opposed to complex, meanings, we suggest that these processes include aspects of full-bodied experiences, and are
critical to understanding the minds of others. People are likely to be quite flexible in the
level of details they create during sensorimotor simulation, depending on their imme-
diate motivations and goals, the social context, the linguistic material to be understood,
and the task. The process of constructing sensorimotor simulations is constrained in the same
way as other fundamental cognitive operations in the pursuit of meaning. People
will create simulations rich enough to enable them to infer sufficiently relevant
meanings and impressions, while also trying to minimize the cognitive effort needed to
produce meaningful effects. In some cases, the meanings, emotions, and impressions one
infers when understanding a speaker’s utterance will be relatively crude, primarily
because this set of products will be “good enough” for the purposes at hand. At
other times, people may engage in more elaborate, even highly strategic simulation pro-
cesses as they tease out numerous meanings and impressions from an utterance in con-
text, such as when reading novel metaphors in poetry. Interestingly, these more
elaborate instances of communication often seem to be ones that invoke more richly
iconic expression and interpretation.
Finally, there is one important challenge that must be addressed to achieve a more
unified understanding of how sensorimotor simulations figure into the production and
comprehension of language and gesture. This challenge rests in the need to resolve
the qualitative distinctions that are often assumed to distinguish linguistic communica-
tion from gesture and other so-called paralinguistic forms of expression. On the surface,
language appears to be a completely different sort of behavior from gesture, and lan-
guage scholars have long believed that its use depends on specialized cognitive pro-
cesses. More specifically, language is typically characterized as a conventional system
of discrete, arbitrary forms that are strung together by the phonological and syntactic
rules that comprise duality of patterning (e.g., Hockett 1960; Jackendoff 1994; Pinker
1994). In contrast, gesture appears to be almost diametrically opposite. Gestures are idio-
syncratic, iconic, and analog in form; they lack syntactic combinatorial rules and cannot
be analyzed into anything resembling phonological components (McNeill 1992). Ges-
tures seem naturally molded to our thoughts and it is intuitive how they might manifest
directly from the bodily activations of sensorimotor simulations. Language, on the other
hand, is generally thought to be a symbolic code for thought and thus to involve pro-
cesses of encoding/decoding or often “mapping” (e.g., Glenberg and Kaschak 2002)
between thought and linguistic form.
Though on the surface these distinct properties may appear to reflect differences of
quality, a new framework has emerged that instead considers them as a set of continua
(McNeill 1992, 2005). The properties of language and gesture occupy opposing proto-
typical ends of these continua and the intermediate properties of various other forms
of communication, such as pantomimes and emblems, are positioned in between. Build-
ing on this perspective, evidence suggests that the most critical difference between lan-
guage and gesture may relate more to conventionality than to any particular formal
differences intrinsic to language per se. Under this view, the formal properties asso-
ciated with language – duality of patterning, arbitrary symbolism, and categorical
form – may simply be emergent properties of functional constraints on a conventional
communication system.
Crucial to this idea are various empirical observations demonstrating how a func-
tional need to establish conventional communication appears to lead quite naturally
to the properties characteristically associated with language. For example, deaf children
raised without access to a system of sign language naturally create their own linguistic
homesign system (Goldin-Meadow and Feldman 1977). Over a child’s development,
initially idiosyncratic iconic gestures become conventionalized into more discrete
word-like forms, which the child combines together by simple syntactic rules. In fact,
this pattern is so robust that it can be induced within just a few minutes in a laboratory
setting. McNeill (1992) describes a study in which speakers told fairytales to a partner,
but were permitted to use only manual gestures and no words. McNeill notes how
“Within 15 or 20 minutes a system has emerged that includes segmentation, composi-
tionality, a lexicon, a syntax with paradigmatic oppositions, arbitrariness, distinctiveness,
standards of well-formedness, and a high degree of fluency” (McNeill 1992: 66).
Other evidence for this natural pathway between language and gesture comes from
studies documenting the residual iconicity found in sign languages. One study examin-
ing 1,944 signs in Italian Sign Language found that fifty percent of handshape occur-
rences and sixty-seven percent of body locations had clear iconic motivations
(Pietrandrea 2002). Indeed, sign language scholars have traced some of the historical
routes by which conventional signs originate from iconic gesture (Wilcox 2004). Fur-
thermore, the fact that a substantial amount of this iconicity persists even in mature lan-
guages suggests that it continues to play an active role in their ongoing development.
Although it is less transparent, there is also evidence to suggest that vocally-based ico-
nicity similarly persists to some degree in spoken languages (Hinton, Nichols, and
Ohala 1994).
Perhaps the most crucial difference between gesture and language may turn out to be mostly
a matter of convention. (This is not to say, of course, that humans do not have a
special knack for acquiring the conventional; clearly, conventional actions and behav-
ioral routines of all sorts abound in human culture.) As we have seen, this quality of
convention is fluid and, importantly, bidirectional. Gestures can become more linguistic
under certain functional constraints, and likewise, in certain contexts, linguistic forms
can become more like gesture. Such is the case with poetry, both spoken and signed,
and more mundanely, it is the case with prosodic vocal gestures. Given this slippery dis-
tinction, it seems less plausible to view language in the standard way, as a
symbolic, arbitrary code into which thoughts are encoded, transmitted, and then decoded. A more parsimonious account might consider instead that language is pro-
duced and understood in the same way as gesture, with the only significant difference
being that linguistic gestures arise from the conventionalized aspects of simulated sen-
sorimotor imagery. Moreover, it takes only casual observation to witness the completely
ordinary way in which language and gesture are seamlessly integrated with pantomimes
and emblems, as well as music, dance and everything in between. Underlying it all,
we argue, is the ability to interactively engage our minds and bodies in imaginative
simulations of sensorimotor imagery.
8. References
Barsalou, Lawrence W. 2008. Grounded cognition. Annual Review of Psychology 59: 617–645.
Beattie, Geoffrey and Heather Shovelton 2002. What properties of talk are associated with the
generation of spontaneous iconic hand gestures? British Journal of Social Psychology 41:
403–417.
Bolinger, Dwight 1983. Where does intonation belong? Journal of Semantics 2(2): 101–120.
Bolinger, Dwight 1986. Intonation and Its Parts: Melody in Spoken English. Palo Alto, CA: Stan-
ford University Press.
Browman, Catherine and Louis Goldstein 1995. Dynamics and articulatory phonology. In: Timo-
thy van Gelder and Robert F. Port (eds.), Mind as Motion, 175–193. Cambridge: Massachusetts
Institute of Technology Press.
Call, Josep and Michael Tomasello 2007. The Gestural Communication of Apes and Monkeys.
London: Lawrence Erlbaum.
Carr, Laurie, Marco Iacoboni, Marie-Charlotte Dubeau, John C. Mazziotta and Gian Luigi Lenzi
2003. Neural mechanisms of empathy in humans: A relay from neural systems for imitation to
limbic areas. Proceedings of the National Academy of Sciences USA 100: 5497–5502.
Chartrand, Tanya and John Bargh 1999. The chameleon effect: The perception-behavior link and
social interaction. Journal of Personality and Social Psychology 76(6): 893–910.
Clark, Herbert H. and Richard J. Gerrig 1990. Quotations as demonstrations. Language 66(4):
764–805.
Cook, Susan Wagner and Michael K. Tanenhaus 2009. Embodied communication: Speakers’ gestures affect listeners’ actions. Cognition 113(1): 98–104.
Crawford, Meredith P. 1937. The cooperative solving of problems by young chimpanzees. Compar-
ative Psychology Monograph 14(2): 1–88.
Emmorey, Karen 1999. Do signers gesture? In: Lynn Messing and Ruth Campbell (eds.), Gesture,
Speech, and Sign, 133–159. New York: Oxford University Press.
Fowler, Carol A. 1987. Perceivers as realists; talkers too. Journal of Memory and Language 26(5):
574–587.
Galantucci, Bruno, Carol A. Fowler and M. T. Turvey 2006. The motor theory of speech percep-
tion reviewed. Psychonomic Bulletin and Review 13(3): 361–377.
Gibbs, Raymond W., Jr. 2006a. Embodiment and Cognitive Science. New York: Cambridge University Press.
Gibbs, Raymond W., Jr. 2006b. Metaphor interpretation as embodied simulation. Mind and Language 21(3): 434–458.
Gibbs, Raymond W., Jr. (ed.) 2008. Cambridge Handbook of Metaphor and Thought. New York:
Cambridge University Press.
Gibbs, Raymond W., Jr., Jessica J. Gould and Michael Andric 2006. Imagining metaphorical ac-
tions: Embodied simulations make the impossible plausible. Imagination, Cognition, and Per-
sonality 25(3): 221–238.
Gibson, James Jerome 1979. The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Glenberg, Arthur M. and Michael P. Kaschak 2002. Grounding language in action. Psychonomic
Bulletin and Review 9(3): 558–565.
Goldin-Meadow, Susan and Heidi Feldman 1977. The development of language-like communica-
tion without a language model. Science 197(4301): 401–403.
Hadar, Uri, Aaron Burstein, Michael R. Krauss and Nachum Soroker 1998. Ideational gestures
and speech: A neurolinguistic investigation. Language and Cognitive Processes 13: 59–76.
Havas, David A., Arthur M. Glenberg and Mike Rinck 2007. Emotion simulation during language
comprehension. Psychonomic Bulletin and Review 14(3): 436–441.
Hinton, Leanne, Johanna Nichols and John J. Ohala (eds.) 1994. Sound Symbolism. Cambridge: Cambridge University Press.
Hockett, Charles F. 1960. The origin of speech. Scientific American 203: 89–97.
Hostetter, Autumn B. and Martha W. Alibali 2007. Raise your hand if you’re spatial: Relations
between verbal and spatial skills and representational gesture production. Gesture 7(1): 73–95.
Hostetter, Autumn B. and Martha W. Alibali 2008. Visible embodiment: Gestures as simulated
action. Psychonomic Bulletin and Review 15(3): 495–514.
Hostetter, Autumn B. and William D. Hopkins 2002. The effect of thought structure on the pro-
duction of lexical movements. Brain and Language 82(1): 22–29.
Jackendoff, Ray 1994. Patterns in the Mind: Language and Human Nature. New York: Basic Books.
Kendon, Adam 1991. Some considerations for a theory of language origins. Man 26(2): 602–619.
Klatzky, Roberta L., James W. Pellegrino, Brian P. McCloskey and Sally Doherty 1989. Can you
squeeze a tomato? The role of motor representations in semantic sensibility judgments. Journal
of Memory and Language 28: 56–77.
Köhler, Wolfgang 1925. The Mentality of Apes. London: Routledge and Kegan Paul.
Lakoff, George and Mark Johnson 1999. Philosophy in the Flesh. The Embodied Mind and Its
Challenge to Western Thought. New York: Basic Books.
Lane, Harlan 1984. Where the Mind Hears: A History of the Deaf. New York: Random House.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar, Vol. 1: Theoretical Foundations.
Stanford, CA: Stanford University Press.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86(1): 57–69.
Lavergne, Joanne and Doreen Kimura 1987. Hand movement asymmetry during speech: No effect
of speaking topic. Neuropsychologia 25: 689–693.
Liddell, Scott K. 2003. Sources of meaning in ASL classifier predicates. In: Karen Emmorey (ed.), Per-
spectives on Classifier Constructions in Sign Languages, 199–219. Mahwah, NJ: Lawrence Erlbaum.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 2008. Metaphors Dead and Alive, Sleeping and Waking. Chicago: University of
Chicago Press.
Niedenthal, Paula M., Markus Brauer, Jamin B. Halberstadt and Åse H. Innes-Ker 2001. When
did her smile drop? Contrast effects in the influence of emotional state on the detection of
change in emotional expression. Cognition and Emotion 15(6): 853–864.
Noë, Alva and J. Kevin O’Regan 2002. On the brain-basis of visual consciousness: A sensorimotor
account. In: Alva Noë and Evan Thompson (eds.), Vision and Mind, 367–398. Cambridge: Mas-
sachusetts Institute of Technology Press.
Ohala, John J. 1994. The frequency code underlies the sound-symbolic use of voice pitch. In:
Leanne Hinton, Johanna Nichols and John J. Ohala (eds.), Sound Symbolism, 325–347. Cam-
bridge: Cambridge University Press.
Okrent, Arika 2002. A modality-free notion of gesture and how it can help us with the morpheme
vs. gesture in question in sign language linguistics. In: Richard P. Meier, Kearsy Kormier and
David Quinto-Pozos (eds.), Modality and Structure in Signed and Spoken Language, 175–198.
Cambridge: Cambridge University Press.
Perlman, Marcus in press. Talking fast: The use of speech rate as iconic gesture. In: Fey Parrill,
Vera Tobin and Mark Turner (eds.), Meaning, Form, and Body. Stanford, CA: Center for the
Study of Language and Information Publications.
Pietrandrea, Paola 2002. Iconicity and arbitrariness in Italian Sign Language. Sign Language Stu-
dies 2(3): 296–321.
Pika, Simone 2007. Gestures in subadult gorillas (Gorilla gorilla). In: Josep Call and Michael Tomasello
(eds.), The Gestural Communication of Apes and Monkeys, 41–67. New York: Lawrence Erlbaum.
Pinker, Steven 1994. The Language Instinct. New York: William Morrow.
Rauscher, Frances H., Robert M. Krauss and Yihsiu Chen 1996. Gesture, speech and lexical
access: The role of lexical movements in speech production. Psychological Science 7(4): 226–231.
Richardson, Daniel and Teenie Matlock 2007. The integration of figurative language and static de-
pictions: An eye movement study of fictive motion. Cognition 102: 129–138.
Rimé, Bernard, Loris Schiaratura, Michel Hupet and Anne Ghysselinckx 1984. Effects of relative immobilization on the speaker’s nonverbal behavior and on the dialogue imagery level.
Motivation and Emotion 8(4): 311–325.
Savage-Rumbaugh, E. Sue, Kelly McDonald, Rose A. Sevcik, William D. Hopkins and Elizabeth
Rubert 1986. Spontaneous symbol acquisition and communicative use by pygmy chimpanzees
(Pan paniscus). Journal of Experimental Psychology: General 115(3): 211–235.
Savage-Rumbaugh, E. Sue, Beverly J. Wilkerson and Roger Bakeman 1977. Spontaneous gestural
communication among conspecifics in the pygmy chimpanzee (Pan paniscus). In: Geoffrey H.
Bourne (ed.), Progress in Ape Research, 97–116. New York: Academic Press.
Shintel, Hadas and Howard C. Nusbaum 2007. The sound of motion in spoken language: Visual
information conveyed by acoustic properties of speech. Cognition 105(3): 681–690.
Shintel, Hadas, Howard C. Nusbaum and Arika Okrent 2006. Analog acoustic expression in
speech communication. Journal of Memory and Language 55: 167–177.
Stanfield, Robert A. and Rolf A. Zwaan 2001. The effect of implied orientation derived from ver-
bal context on picture recognition. Psychological Science 12(2): 153–156.
Tanner, Joanna E. and Richard W. Byrne 1996. Representation of action through iconic gesture in
a captive lowland gorilla. Current Anthropology 37(1): 162–173.
Tanner, Joanna E., Francine G. Patterson and Richard W. Byrne 2006. The development of spon-
taneous gestures in zoo-living gorillas and sign-taught gorillas: From action and location to
object representation. Journal of Developmental Processes 1: 69–102.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge: Massachusetts Institute
of Technology Press.
Vallée-Tourangeau, Frédéric, Susan H. Anthony and Neville G. Austin 1998. Strategies for gener-
ating multiple instances of common and ad hoc categories. Memory 6(5): 555–592.
Vygotsky, Lev S. 1986. Thought and Language. Edited and translated by Eugenia Hanfmann and
Gertrude Vakar, revised and edited by Alex Kozulin. Cambridge: Massachusetts Institute of
Technology Press.
Wilcox, Sherman 2004. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed lan-
guages. Cognitive Linguistics 15(2): 119–147.
Wilson, Nicole L. and Raymond W. Gibbs 2007. Real and imagined body movement primes met-
aphor comprehension. Cognitive Science 31: 721–731.
Wu, Ling and Lawrence Barsalou 2001. Grounding concepts in perceptual simulation: I. Evi-
dence from property generation. Unpublished manuscript.
Zwaan, Rolf A. 2004. The immersed experiencer: Toward an embodied theory of language com-
prehension. In: Brian H. Ross (ed.), The Psychology of Learning and Motivation, 35–62. New
York: Academic Press.
Zwaan, Rolf A., Robert A. Stanfield and Richard H. Yaxley 2002. Language comprehenders men-
tally represent the shape of objects. Psychological Science 13(2): 168–171.
Abstract
The notion of “embodiment” is popular and useful for the cognitive sciences, but it needs
to be made more precise. This chapter describes four different levels of embodiment,
from the perspective of a cognitive semiotics, focusing on meaning. On the most basic
level, the biological body, meaning is co-extensional with life. The lived body, however,
concerns the experiencing self. The significational body engages in sign use and the ex-
tended body concerns the normative level involved in shared sign systems. Communication
thus has a different nature on each of these levels of embodiment. On the level of the bio-
logical body, communication consists of reactions to bodily states, such as cries. For the ex-
periencing self, communication involves intentional movements. The third level of
embodiment brings in communicative actions such as gestures (including vocal ones) in
which a sign relationship is intentionally communicated. The fourth level involves system-
atic use of external representations, as in signed or spoken (and written) languages. This
framework provides a coherent, cognitive semiotic approach for using, understanding,
and researching the notions of embodiment and communication.
1. Introduction
After being dominated for a long time by formal and computational approaches, there
is currently a growing consensus in the cognitive sciences that “the body shapes the
mind” (Gallagher 2005), as well as human communication. However, behind the
term “embodiment” lie many different concepts (see Krois, Rosengren, Steidele, et al.
2007; Ziemke, Zlatev and Frank 2007). Focusing on the embodiment of meaning, this
chapter outlines four different kinds of embodiment: biological, phenomenological,
significational (sign-based) and extended. Since the notion of communication, with con-
siderably older ancestry, is still more ambiguous, I propose a general definition and
four corresponding levels, illustrating these with examples from the comparative and
developmental literatures.
One conclusion is that a purely biological view of embodiment can only account for
the lowest level of communication. By complementing the biological perspective of the
body with a phenomenological one (Husserl [1952] 1989; Merleau-Ponty 1962), focus-
ing on “the lived body” (Leib), we can accommodate crucial dimensions of communi-
cation such as agency and intention. Furthermore, a phenomenological semiotics can
provide a sign concept that is not “Cartesian” or “solipsist”, but rather grounded in
the acts of the lived body and, further, in the roles of symbolic artifacts and external representations (Donald 1991; Sinha 2009; Sonesson 2007a, 2007b, 2009). The proposal is meant to counteract a danger inherent in focusing too much on the “highest”
level of meaning and communication: a devaluing of the foundational role of the
real, living and lived human body in bringing forth a human, and humane, world.
Radical embodiment … [is] radically altering the subject matter and theoretical framework
of cognitive science. (Clark 1999: 22)
Despite initial optimism (see Varela, Thompson and Rosch 1991), however, a widely
accepted, coherent interdisciplinary theoretical framework for the study of human
meaning, communication and thinking based on the notion of embodiment has been
lacking (see Zlatev 2007). One important reason behind this is the ambiguity of the
term “embodiment” itself. The cognitive psychologist Wilson (2002) writes of “six
views of embodied cognition” and the cognitive scientist Ziemke (2003) of “six differ-
ent notions of embodiment”, with the two sets only partially overlapping. Rohrer (2007:
348) states: “By my latest count the term “embodiment” can be used in at least twelve
different important senses with respect to our cognition”, which are different from the
“senses” highlighted by Wilson and Ziemke. Such non-convergence is hardly surprising,
since for a consistent classification one first needs to ask: what (X) is it that is (claimed
to be) embodied in what (Y)? One can find at least the following terms substituting for
the variables in this schema:
Related to the problem of ambiguity is that of overextension: Are all aspects of the
(human) mind embodied, and if so, are they all embodied in the same way? Some strong
statements have been made to this effect:
Image schematic structure is the basis for our understanding of all aspects of our percep-
tion and motor activities. […] Conceptual Metaphor Theory proposes that all abstract con-
ceptualization works via conceptual metaphor, conceptual metonymy, and a few other
principles of imaginative extension. (Johnson and Rohrer 2007: 33, 38, my emphasis)
It is not surprising that such claims have been met with skepticism (Haser 2005; Sinha
2009; Zlatev 2007).
To rephrase, the meaning of a given phenomenon (p), for a given subject (S), will be determined by the type of world (W) in which the phenomenon appears and the value of the phenomenon for the subject. If either p is not within W or its value for S is nil, p will be
meaningless for S. Depending on the nature of these three factors – the subject, the world, and the value system – four levels of meaning
can be defined, summarized in Tab. 34.1, and explained in the remainder of this section.
Tab. 34.1: Summary of the four levels of meaning of The Semiotic Hierarchy (Zlatev 2009)
Level | Subject | World | Value system
1 | Organism | Umwelt | Biological
2 | Minimal self | Natural Lebenswelt | Phenomenal
3 | Enculturated self | Cultural Lebenswelt | Significational (Sign-based)
4 | Linguistic self | Universe of discourse | Normative
i.e. the biological directedness of the organism-subject toward phenomena which it “ex-
periences” (due to its intrinsic value system) as meaningful, even if non-phenomenally,
is a plausible ground for the emergence of consciousness (as primitive sentience) in
evolution (see Popper 1962, 1992; Thompson 2007; Zlatev 2003, 2009).
On the level of phenomenal value/meaning, there is not only a biologically meaning-
ful Umwelt, but a phenomenal Lebenswelt in which the subject is immersed. The subject
S is here a “minimal self ” (Gallagher 2005), with (at least) affective and perceptual con-
sciousness, which is intentional (i.e. directed) towards whatever is perceived. The other
sense of “intention”, connected with agency and volition, is related to having a body image,
unifying (at least) haptic, proprioceptive and visual experience of one’s own body (Gallagher 2005), giving higher animals (i.e. at least mammals) and human infants a “sense
of self” (Stern [1985] 2000) and making them capable of acting purposefully on their surroundings.
The role of the biological and lived bodies of the previous two levels appears to be
fundamental for (the emergence of) signification (sign use) in two ways: through (whole-body) imitation (Piaget 1962), and through the most basic forms of sign use in ontogeny and
possibly in evolution, iconic and deictic gestures, both being aspects of bodily mimesis
(Donald 1991, 2001; Zlatev 2007, 2008a, 2008c).
(ii) Communication is the process that links discontinuous parts of the living world to
one another. (Ruesch 1957: 462)
(iii) The means of sending military messages, orders, etc. as by telephone, telegraph, radio,
couriers. (The American College Dictionary. New York: Random House. 1964: 224)
(iv) Those situations in which a source transmits a message to a receiver with conscious
intent to affect the latter’s behaviors. (Miller 1966: 92)
(v) It is the process that makes common to two or several what was the monopoly of
one. (Gode 1959: 5)
(vi) Communication is the verbal interchange of a thought or idea. (Hoben 1954: 77)
(vii) Communication is the transmission of information. (Berelson and Steiner 1964:
254)
The first dimension is generality, with (ii) being (arguably) much too general, and
(iii) much too concrete. The second dimension is intentionality, with (iv) requiring “con-
scious intent”, and (v) not. The third concerns whether definitions presume success of
communication, or not. The definition in (vi) does so (along with the over-restrictive
requirement that interchange be “verbal”), while (vii) focuses on the “transmission”,
but does not require that the message is successfully received or understood.
Littlejohn (1999: 7) provides an informative division of “theories of communication”
into five major types: (a) structural-functional theories deriving from system theory,
semiotics, linguistics and discourse studies, (b) cognitive and behavioral theories from
the cognitive and biological sciences, (c) interactionist theories from ethnomethodology
and related forms of social studies, (d) interpretive theories from hermeneutics and phe-
nomenology which “celebrate subjectivism or the preeminence of individual experi-
ence” (Littlejohn 1999: 15) and (e) critical theories, often based on Marxism, which
“focus on issues of inequality and oppression” (Littlejohn 1999: 15). Most importantly,
Littlejohn points out the different theories’ strengths and weaknesses, and shows that
these camps focus on complementary aspects of communication: (a) and (c) on the
social-cultural dimension, with (a) zeroing in on structures and (c) on processes.
Both (b) and (d) focus on the individual, but from different perspectives: (b) from
the third-person perspective of “objective” observation, and (d) from the first-person
perspective typical of the humanities. Finally, while (a–d) all focus on understanding
(and possibly explaining) communication, (e) attempts further to use such knowledge
in order to change communicative practices and structures for the better. It is hard
not to agree with Littlejohn’s conclusion: “These genres are more than theory types.
They also embody philosophical commitments and values and reflect the kind of
work that different theorists believe is important” (Littlejohn 1999: 16).
Given this diversity, it may appear naïve to propose a “unified theory of communi-
cation”. Still it is possible to apply the synthetic theory of meaning outlined in the pre-
vious section in order to distinguish levels of communication, corresponding to the
levels of meaning and embodiment. The ambitions of the emerging paradigm of cogni-
tive semiotics involve precisely the combination of the social-cultural and the individual
approaches, the scientific “third-person” and the experiential “first-person” perspec-
tives, as expressed on the home site of the journal with the same name:
The first of its kind, Cognitive Semiotics is a multidisciplinary journal devoted to high qual-
ity research, integrating methods and theories developed in the disciplines of cognitive
science with methods and theories developed in semiotics and the humanities, with the
ultimate aim of providing new insights into the realm of human signification and its
manifestation in cultural practices. (www.cognitivesemiotics.com)
With respect to generality (viii) is clearly less abstract than (ii) by requiring that the
communicating entities be subjects, rather than “parts of the living world” such as neu-
rons or hormones. At the same time it is clearly general, rather than domain-specific
like (iii). With respect to “intentionality” it is non-committed, allowing subdivision
into intended and non-intended transmission of meanings – in two different ways to
be explicated below. As for “success”, it uses the (often criticized) notion of transmission, as in (vii), but unlike it, it does not focus solely on the “sender”, but on both parties
(“between”). Also unlike (vi), it does not concern only verbal meaning. At the same
time, it does not require in all cases that the sender’s meaning (rather than “information”) be identical with that of the receiver, as in (vii), thus allowing for individual
interpretation, and collective negotiation.
Tab. 34.2: Levels of communication, corresponding to the four levels of meaning of the Semiotic Hierarchy, and the levels of embodiment (see Section 2), with categories of communicative signals from the different communicative modalities, or “channels”
Level | Subject | Embodiment | Bodily-Visible/Haptic | Vocal-Audible | Material-Visible/Audible
1 | Organism | Biological | Bodily reactions | Cries | Traces
2 | Minimal self | Phenomenological | Intention movements, attention getters | Directed calls | Marks
3 | Enculturated self | Significational | Gesture, pantomime | “Vocal gestures” | Early picture comprehension
4 | Linguistic self | Normative/Extended | Signed language | Spoken language | Writing, external representations
of conscious experience and purposive action can hardly be in doubt (see Beshkar 2008;
Zlatev 2009). Furthermore, the signals discussed here are produced with the purpose of
influencing the behavior of conspecifics, thus amounting to the definition given in (4).
However, while being both “intentional” (i.e. volitional) and “communicative” this
does not amount to a strong notion of intentional communication (Grice 1989), which
requires a higher-level intention: not only to influence the behavior of the receiver,
but that the receiver understands the sender’s intended meaning, a form of third-level
mentality: “I wish that you understand that I mean X in producing Y” (Zlatev
2008a). This implies at least some mastery of signification, which would bring us to
the next level of the hierarchy (Section 3.2.3).
While all species of great apes – chimpanzees and bonobos (Savage-Rumbaugh 1998), gorillas (Patterson 1980) and orangutans (Miles 1990) – have been shown capable of signification given special tutoring and human enculturation, their spontaneous communicative signals are not true signs and do not amount to “intentional communication” as defined above (see Deacon 1997; Tomasello 2008), though that claim has
been contested (e.g. Savage-Rumbaugh 1998).
In the bodily-visual modality, there has been considerable recent interest in ape ges-
tures (Call and Tomasello 2007; Pika 2008), and considerable individual and intra-
species group variation, implying learning, has been shown. Leavens and colleagues
(e.g. Leavens, Hopkins and Bard 2008) have documented the widespread presence of
spontaneous “pointing” in captive apes of all species, mostly to human receivers, but
also among themselves in some special conditions. However, Tomasello (2008) and
Pika (2008) argue that such gestures are qualitatively different from those of children
in their second year of life. To put it simply in the terminology of this chapter, while children’s deictic and iconic gestures involve sign use, those of the apes do not; rather, they can be
categorized as either intention movements (IMs), attention getters (AGs) or a combina-
tion of both. IMs arise from so-called ontogenetic ritualization: e.g. a pulling of the
other’s body in a desired direction becomes toned down to a gentle tug with time,
since the receiver has learned to respond adequately to the initial part of the sender’s
action, allowing it to become “stylized”. Apes also demonstrably understand that the
other needs to attend to such IMs for them to be efficient communicative signals, and
hence when the receiver is facing another direction, the sender will usually produce
AGs – either in the bodily-haptic channel (touching, patting) or in the vocal-audible
one (calling) in order to gain the receiver’s attention prior to producing IMs. Tomasello
(2008) argues that ape pointing, which in non-enculturated individuals is always to
desired objects and most often food, arises precisely in this way.
Vocal calls as AGs have already been mentioned as an example of communicative
signals on this level within the vocal-audible modality. Unlike IMs and non-vocal
AGs, however, most ape calls do not seem to be learned (“socially transmitted”), but
species-general, “innate” signals. Hence Tomasello (2008) argues that ape bodily-visible
communicative signals, and not calls, were the likely stepping stone for the evolution of
language: an argument for the “gesture-first” position within the prolonged debate with
“speech-first” theorists (see Johansson 2005). This is plausible, but ape (and dolphin)
vocal signals are not to be easily dismissed. In the case of the most studied non-
human species in primatology, chimpanzees, calls have been shown to be of two
types: “broadcast” and “proximal”. The first, such as the “food-cry”, are high-pitched,
not addressed to anyone in particular, apparently involuntary (Deacon 1997), and
while their communicative function is often agonistic (pro-social) rather than (only)
antagonistic, they seem to be closer to the cries of level 1. The second type of call is low-pitched, directed to particular individuals, voluntarily produced and intended
to have a particular effect, e.g. consoling a distressed relative. It has also been shown
that the two types lead to different brain-activation patterns: more localized to the
right-hemisphere for the directed, proximal calls (Taglialatela, Russell, Schaeffer,
et al. 2009). Therefore, it seems that Tomasello (2008) underestimates the complexity of ape vocal abilities by treating them basically as a level 1 phenomenon (cries): non-voluntary and unlearned.
However, what was stated at the beginning of this subsection, that both bodily and vocal spontaneous animal signals are not communicative signs, remains unchallenged. The calls signaling different types of predators (leopard, eagle, snake) produced by vervet monkeys, which received much attention at the beginning of the 1990s, are now nearly unanimously agreed to be “broadcast” signals, serving their communicative functions without being either learned or intentional (in the sense of being either “voluntary” or “directed” to another) (see Cheney and Seyfarth 2005). Surprisingly, an interesting case for possible spontaneous sign use by non-humans, made by Savage-Rumbaugh (1998), has not received much attention in the literature. It concerns the third type of channel:
the external-visible one. During troop migrations, wild bonobos have been observed to
break and leave branches at path crossings, possibly signaling to other members of the
troop following them the direction that they had taken. The suggestion is that given the
ecological context (dense vegetation preventing visual contact, predators that would
be informed in the case of vocal signaling), this was a strategy consciously chosen by
some individual troop members and then socially transmitted. If this interpre-
tation were to be confirmed with more cases and better documentation, the branches
would be almost literally “pointers” (i.e. deictic signs) fulfilling the conditions for signi-
fication and intentional communication given earlier. Unsurprisingly, Tomasello (1999)
is skeptical, since breaking tree branches and dragging them is part of the behavioral
repertoire of the species, and is used for a variety of “display” functions. Still, this is
a good example, if only as a thought experiment of sorts, alerting us that communication
“in the wild” could take forms and modalities that are not readily apparent to us. The
classification in Tab. 34.2 takes an intermediary position (between Tomasello and Savage-Rumbaugh), calling the branches along the path “marks”: it is at least possible that the natural branch-breaking behavior may become “ritualized” in a particular troop, so that
some members voluntarily leave them along the way so as to influence the behavior of
those following in a (literally) desired direction, though without involving intentional
communication proper, i.e. involving a higher-level intention for their intended meaning
to be understood.
the present disagree on whether they are truly “symbolic” (i.e. signs as here defined)
rather than “indices” (Piaget 1962) that could be understood as level 2 communicative
signals, i.e. associations between a vocalization and a desired object or event. It is only with the “vocabulary spurt” around the middle of the second year, and clearly by 20 months, that it becomes generally uncontested that the child has made the entry into language.
Considering the period between 9 and 18 months, on the other hand, there is strong
evidence that the (typically developing) child has become a sign user, foremost in the
bodily-visible channel (Acredolo and Goodwyn 1990; Bates, Camaioni and Volterra
1975; Bates 1979; Blake, Osborne, Cabral, et al. 2003; Carpenter, Nagell and Tomasello
1998; Liszkowski, Carpenter, Henning, et al. 2004; Piaget 1962). This period can be subdivided into two stages. The child’s first bodily communicative signals are dyadic (e.g. raising the hands to express the wish to be picked up) and, when triadic, function as requests
for objects. Thus they resemble the “gestures” of the great apes discussed in the previ-
ous sub-section. In a recent review, Pika asks the pertinent question “Gestures of apes
and pre-linguistic human children: Similar or different?” and concludes that there are
both similarities and differences: “Many human gestures are … used to direct the atten-
tion and mental states of others to outside entities… Apes also gesture… but use these
communicative means mainly as effective procedures in dyadic interactions to request
action from others” (Pika 2008: 131–132). While the ontogenetic progression needs to
be more carefully studied, especially in a cross-cultural perspective, it seems that the
gestures that are specific for human children and which “direct the attention of others
to some third entity, simply for the sake of sharing interest in it or commenting on it”
(Pika 2008: 131) appear clearly from about 13–14 months. It is in part a terminological
issue, but Tab. 34.2 reserves the term gesture for those (human-specific) bodily expressions which (a) “stand” for an actual or imagined object, action or event, and (b) in which this sign relationship is intentionally communicated. It is possible
to have (a) without (b), as in “private” symbolic play or reenactment, but (b) clearly
requires (a). Since the relationship between expression and meaning is not (yet) con-
ventional, there are two ways in which the meaning can be “transmitted”: on the
basis of resemblance between R(epresentamen) and O(bject) (see Section 2.2.3),
i.e. through iconic gestures, and through declarative (as opposed to imperative) point-
ing. Children’s iconic gestures are performed from a “character viewpoint” rather
than an “observer viewpoint” (McNeill 2005) or a “first-person perspective” rather
than a “third-person perspective” (Zlatev and Andrén 2009). Thus, there is no
clear distinction between iconic gestures and “pantomime” before the emergence
of language.
What about the other two channels on this level? While this is controversial, I would
suggest that prior to the vocabulary spurt around the middle of the second year, the
child’s first “words” serve a subordinate role to gestural communication, as a supple-
ment to the multimodal communicative signal (Clark 1996). Their conventional refer-
ential function is not yet clear to the child and they serve as “vocal gestures” – a role
more sophisticated than that of directed calls (level 2), but not yet part of a linguistic
system (level 4). This is consistent with the growing acceptance of the view of “gesture
as the cradle of speech” (see Acredolo and Goodwyn 1990; Iverson and Goldin-
Meadow 1998; Lock and Zukow-Goldring 2010).
4. Conclusions
About a decade ago, assessing the status quo within linguistic and philosophical seman-
tics, cognitive science and semiotics, I made the following pessimistic pronouncement:
Our conception of meaning has become increasingly fragmented, along with much else in
the increasing “postmodernization” of our worldview. The trenches run deep between dif-
ferent kinds of meaning theories: mentalist, behaviorist, (neural) reductionist, (social) con-
structivist, functionalist, formalist, computationalist, deflationist… And they are so deep
that a rational debate between the different camps seems impossible. The concept is
treated not only differently but incommensurably within the different disciplines. (Zlatev
2003: 253)
This served as the motivation for attempting to formulate “an outline of a unified bio-
cultural theory of meaning”, giving a foundational place to life (rather than machines),
and proposing various hierarchies of meaning in evolution and development, which in
a broadly continuous framework could also accommodate qualitative changes. Such
ambitions may have been somewhat premature, but since then several impressive attempts at integrational theories of meaning have been proposed (Brier 2008; Emmeche 2007; Sonesson 2007b; Stjernfelt 2007), as has a rapprochement between phenomen-
ology and cognitive science (Gallagher and Schmicking 2010; Gallagher and Zahavi
2008; Thompson 2007). The appearance of the journal Cognitive Semiotics can be
seen as a reflection of the same need to counter the fragmentation described in the
quotation above. The Semiotic Hierarchy (Zlatev 2009) and its extension to the notion
of embodiment and communication explored in this chapter are in line with these
developments.
What is common to all these approaches is an effort to assert “the primacy of the
body”, but without falling into any form of biological reductionism in which the body
(with focus on the brain) is treated as a kind of physical object, a sophisticated machine.
Another common motivation is a desire to point out that “higher levels” of meaning,
communication and intersubjectivity presuppose lower ones: evolutionarily, develop-
mentally, but also “synchronically”. Meaning and communication are rooted in the bio-
logical, lived and significational bodies interacting with their respective “worlds” (see
Tab. 34.1). This is important since neglecting the body in theorizing leads to distorted accounts involving, at one extreme, beliefs in innate “language organs” and, at the other extreme, claims that “everything is a text”. Worse still would be a cultural devaluing of the living and lived body in an over-technological society and “globalized” world. This could potentially lead to the experience of a vacuum of meaning and a breakdown in communication.
5. References
Acredolo, Linda and Susan Goodwyn 1990. Sign language among hearing infants: The spontane-
ous development of symbolic gestures. In: Virginia Volterra and Carol J. Erting (eds.), From
Gesture to Language in Hearing and Deaf Children, 68–78. Berlin: Springer.
Almeida, Olinda G., António Miranda, Pedro Frade, Peter C. Hubbard, Eduardo N. Barata and
Adelino V. M. Canário 2005. Urine as a social signal in the Mozambique Tilapia (Oreochromis
mossambicus). Chemical Senses 30 (suppl 1): i309–i310.
Bates, Elizabeth 1979. The Emergence of Symbols. Cognition and Communication in Infancy. New
York: Academic Press.
Bates, Elizabeth, Luigia Camaioni and Virginia Volterra 1975. The acquisition of performatives
prior to speech. Merrill-Palmer Quarterly 21: 205–226.
Berelson, Bernard and Gary A. Steiner 1964. Human Behavior. New York: Harcourt, Brace and
World.
Beshkar, Majid 2008. Animal consciousness. Journal of Consciousness Studies 15(3): 5–34.
Blake, Joanna, Patricia Osborne, Marlene Cabral and Pamela Gluck 2003. The development of
communicative gestures in Japanese infants. First Language 23(1): 3–20.
Brier, Søren 2008. Cybersemiotics: Why Information Is Not Enough. Toronto: University of
Toronto Press.
Call, Josep and Michael Tomasello 2007. The Gestural Communication of Apes and Monkeys.
Mahwah, NJ: Lawrence Erlbaum.
Carpenter, Malinda, Katherine Nagell and Michael Tomasello 1998. Social cognition, joint atten-
tion, and communicative competence from 9 to 15 months. Monographs of the Society for
Research in Child Development 63(4). Boston, MA: Blackwell.
Cheney, Dorothy L. and Robert M. Seyfarth 2005. Constraints and preadaptations in the earliest
stages of language evolution. Linguistic Review 22: 135–159.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of Tech-
nology Press.
Clark, Andy 1999. An embodied cognitive science? Trends in Cognitive Sciences 3(9): 345–351.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Dance, Frank E. 1970. The “concept” of communication. Journal of Communication 20:
201–220.
Dance, Frank E. and Charles E. Larson 1976. The Foundations of Human Communication: A The-
oretical Approach. New York: Holt, Rinehart & Winston.
Deacon, Terrence 1997. The Symbolic Species: The Co-evolution of Language and the Brain. New
York: Norton.
DeLoache, Judy S. 2004. Becoming symbol-minded. Trends in Cognitive Sciences 8: 66–70.
Derrida, Jacques 1976. Of Grammatology. Baltimore: Johns Hopkins University Press.
Donald, Merlin 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and
Cognition. Cambridge, MA: Harvard University Press.
Donald, Merlin 2001. A Mind so Rare: The Evolution of Human Consciousness. New York:
Norton.
Edelman, Gerald M. 1992. Bright Air, Brilliant Fire: On the Matter of the Mind. New York: Basic
Books.
Emmeche, Claus 2007. On the biosemiotics of embodiment and our human cyborg nature. In: Tom
Ziemke, Jordan Zlatev and Roslyn M. Frank (eds.), Body, Language, Mind: Volume 1,
Embodiment, 411–430. Berlin: De Gruyter Mouton.
Fodor, Jerry A. 1981. Representations. Cambridge: Massachusetts Institute of Technology Press.
Gallagher, Shaun 2005. How the Body Shapes the Mind. Oxford: Oxford University Press.
Gallagher, Shaun and Daniel Schmicking 2010. Handbook of Phenomenology and Cognitive
Sciences. Dordrecht, the Netherlands: Springer.
Gallagher, Shaun and Dan Zahavi 2008. The Phenomenological Mind: An Introduction to Philos-
ophy of Mind and Cognitive Science. London: Routledge.
Gallese, Vittorio and George Lakoff 2005. The brain’s concepts: The role of the sensori-motor sys-
tem in conceptual knowledge. Cognitive Neuropsychology 22: 445–479.
Gode, Alexander 1959. What is communication. Journal of Communication 9: 3–20.
Grice, Paul 1989. Meaning. In: Paul Grice (ed.), Studies in the Way of Words, 213–223. Cambridge,
MA: Harvard University Press.
Haser, Verena 2005. Metaphor, Metonymy and Experientialist Philosophy. Berlin: De Gruyter
Mouton.
Nelson, Katherine and Lea K. Shaw 2002. Developing a socially shared symbolic system. In: James
P. Byrnes and Eric Amsel (eds.), Language, Literacy and Cognitive Development, 27–57. Hills-
dale, NJ: Lawrence Erlbaum.
Patterson, Francine 1980. Innovative use of language in a gorilla: A case study. In: Katherine Nelson
(ed.), Children’s Language. Vol. 2, 497–561. New York: Gardner Press.
Paukner, Annika, Stephen J. Suomi, Elisabetta Visalberghi and Pier F. Ferrari 2009. Capu-
chin monkeys display affiliation toward humans who imitate them. Science 325(5942):
880–883.
Peirce, Charles S. 1958. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Harvard Uni-
versity Press.
Persson, Tomas 2008. Pictorial Primates: A Search for Iconic Abilities in Great Apes. Lund,
Sweden: Lund University Cognitive Studies, 136.
Piaget, Jean 1962. Play, Dreams and Imitation in Childhood. New York: Norton. First published
[1945].
Pika, Simone 2008. Gestures of apes and pre-linguistic human children: Similar or different? First
Language 28(2): 116–140.
Pinker, Steven 1994. The Language Instinct. New York: William Morrow.
Popper, Karl 1962. Objective Knowledge. Oxford: Oxford University Press.
Popper, Karl 1992. In Search of a Better World: Lectures and Essays from Thirty Years. London:
Routledge.
Ruesch, Jurgen 1957. Technology and social communication. In: Lee Thayer (ed.), Communica-
tion: Theory and Research, 452–481. Springfield, IL: Thomas.
Savage-Rumbaugh, Sue 1998. Scientific schizophrenia with regard to the language act. In: Jonas
Langer and Melanie Killen (eds.), Piaget, Evolution and Development, 145–169. Mahwah,
NJ: Lawrence Erlbaum.
Searle, John 1995. The Construction of Social Reality. London: Allen Lane.
Senghas, Ann, Sotaro Kita and Asli Özyürek 2004. Children creating core properties of language:
Evidence from an emerging sign language in Nicaragua. Science 305: 1779–1782.
Sinha, Chris 2004. The evolution of language: From signals to symbols to system. In: Kimbrough
D. Oller and Ulrike Griebel (eds.), Evolution of Communication Systems: A Comparative
Approach, 217–235. Cambridge: Massachusetts Institute of Technology Press.
Sinha, Chris 2009. Objects in a storied world: Materiality, normativity, narrativity. Journal of Con-
sciousness Studies 16(6–8): 167–190.
Sinha, Chris and Kristine Jensen de López 2000. Language, culture and the embodiment of spatial
cognition. Cognitive Linguistics 11(1–2): 17–41.
Sonesson, Göran 1989. Pictorial Concepts. Lund, Sweden: Lund University Press.
Sonesson, Göran 2007a. The extensions of man revisited. From primary to tertiary embodiment. In: John M. Krois, Mats Rosengren, Angela Steidele and Dirk Westerkamp (eds.), Embodiment in Cognition and Culture, 27–56. Amsterdam: John Benjamins.
Sonesson, Göran 2007b. From the meaning of embodiment to the embodiment of meaning: A study in phenomenological semiotics. In: Tom Ziemke, Jordan Zlatev and Roslyn M. Frank (eds.), Body, Language and Mind, Vol 1: Embodiment, 85–127. Berlin: De Gruyter Mouton.
Sonesson, Göran 2009. The view from Husserl’s lectern: Considerations on the role of phenomenology in cognitive semiotics. Cybernetics and Human Knowing 16(3–4): 107–148.
Stern, Daniel N. 2000. The Interpersonal World of the Infant: A View from Psychoanalysis and De-
velopmental Psychology. New York: Basic Books. First published [1985].
Stjernfelt, Frederik 2007. Diagrammatology. Dordrecht, the Netherlands: Springer.
Taglialatela, Jared P., Jamie L. Russell, Jennifer A. Schaeffer and William D. Hopkins 2009. Visua-
lizing vocal perception in the chimpanzee brain. Cerebral Cortex 19(5): 1151–1157.
Thompson, Evan 2007. Mind in Life: Biology, Phenomenology and the Sciences of Mind. Cam-
bridge, MA: Harvard University Press.
Tomasello, Michael 1999. The Cultural Origins of Human Cognition. Cambridge, MA: Harvard
University Press.
Tomasello, Michael 2003. Constructing a Language. A Usage-Based Theory of Language Acquisi-
tion. Cambridge, MA: Harvard University Press.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge: Massachusetts Institute
of Technology Press.
Varela, Francisco, Evan Thompson and Eleanor Rosch 1991. The Embodied Mind. Cambridge:
Massachusetts Institute of Technology Press.
von Uexküll, Jakob 1982. The theory of meaning. Semiotica 42(1): 25–82. First published [1940].
Vygotsky, Lev S. 1962. Thought and Language. Cambridge: Massachusetts Institute of Technology
Press.
Wallace, Brendan, Alastair Ross, John B. Davies and Tony Anderson 2007. The Mind, the Body
and the World. Psychology after Cognitivism? Exeter, UK: Imprint Academic.
Wilson, Margaret 2002. Six views of embodied cognition. Psychonomic Bulletin and Review 12(4):
625–636.
Wittgenstein, Ludwig 1953. Philosophical Investigations. Oxford: Basil Blackwell.
Zahavi, Dan 2003. Husserl’s Phenomenology. Stanford, CA: Stanford University Press.
Ziemke, Tom, Jordan Zlatev and Roslyn M. Frank 2007. Body, Language and Mind. Vol 1:
Embodiment. Berlin: De Gruyter Mouton.
Ziemke, Tom 2003. What’s that thing called embodiment? In: Richard Alterman and David Kirsh
(eds.), Proceedings of the 25th Annual Meeting of the Cognitive Science Society, 1305–1310.
Mahwah, NJ: Lawrence Erlbaum.
Zlatev, Jordan 1997. Situated Embodiment: Studies in the Emergence of Spatial Meaning. Stock-
holm: Gotab.
Zlatev, Jordan 2003. Meaning = Life + (Culture): An outline of a unified biocultural theory of
meaning. Evolution of Communication 4(2): 253–296.
Zlatev, Jordan 2005. What’s in a schema? Bodily mimesis and the grounding of language. In: Beate
Hampe (ed.), From Perception to Meaning: Image Schemas in Cognitive Linguistics, 313–343.
Berlin: De Gruyter Mouton.
Zlatev, Jordan 2007. Embodiment, language and mimesis. In: Tom Ziemke, Jordan Zlatev and Ro-
slyn M. Frank (eds.), Body, Language and Mind. Vol 1: Embodiment, 297–337. Berlin: De
Gruyter Mouton.
Zlatev, Jordan 2008a. The co-evolution of intersubjectivity and bodily mimesis. In: Jordan Zlatev,
Timothy P. Racine, Chris Sinha and Esa Itkonen (eds.), The Shared Mind: Perspectives on In-
tersubjectivity, 215–244. Amsterdam: John Benjamins.
Zlatev, Jordan 2008b. The dependence of language on consciousness. Journal of Consciousness
Studies 15: 34–62.
Zlatev, Jordan 2008c. From proto-mimesis to language: Evidence from primatology and social
neuroscience. Journal of Physiology – Paris 102: 137–152.
Zlatev, Jordan 2009. The Semiotic Hierarchy: Life, Consciousness, Signs, Language. Cognitive
Semiotics 4: 169–200.
Zlatev, Jordan 2010. Phenomenology and cognitive linguistics. In: Shaun Gallagher and Daniel
Schmicking (eds.), Handbook on Phenomenology and Cognitive Science, 415–446. Dordrecht,
the Netherlands: Springer.
Zlatev, Jordan and Mats Andrén 2009. Stages and transitions in children’s semiotic development.
In: Jordan Zlatev, Mats Andrén, Marlene Johansson Falck and Carita Lundmark (eds.), Studies
in Language and Cognition, 380–401. Newcastle: Cambridge Scholars.
Zlatev, Jordan, Timothy P. Racine, Chris Sinha and Esa Itkonen 2008. The Shared Mind: Perspec-
tives on Intersubjectivity. Amsterdam: John Benjamins.
35. Body and speech as expression of inner states
Abstract
This article provides a sketch of the theoretical framework of German Expression
Psychology (GEP) and discusses the forms and functions of bodily and verbal types
of communication that express inner states. Starting with a brief historical overview, we
discuss general concepts of the German Expression Psychology framework, in particular
with respect to the definition of expression, the relationship between expression and its
subject, and the perception of expression. Within each of these areas special attention
is given to the face, body and voice as indicators of inner states. Following this general
overview of German Expression Psychology, we focus on the contribution of three
selected authors, namely, Philipp Lersch, Paul Leyhausen and Egon Brunswik, who
have been particularly influential in the field of German Expression Psychology. For
Lersch, we consider the co-existential relationship between affect and expression, the de-
tailed anatomical description of expressions, as well as the analysis of dynamic aspects of
expressions. Leyhausen added an ethological perspective on expressions and perceptions.
Here, we focus on the developmental aspects of expression and impression formation,
and differentiate between phylogenetic and ontogenetic aspects of expression. Brunswik’s
Lens Model allows a separation between distal indicators on the part of the sender and
proximal percepts on the part of the observer. Here, we discuss how such a model can
be used to describe and analyze nonverbal communication on both the encoding and de-
coding side. Drawing on the presentation of all three authors, we outline the general
relevance of German Expression Psychology for current research, specifically with
respect to the definition and function of expressions and perceptions, and existing
approaches to the study of verbal and nonverbal behavior.
1. Introduction
Since Darwin (1872), the concept of expression has played a central role in the psycho-
logical understanding of emotions. As in other domains, the field of expression research
has undergone several changes and transformations. Interestingly, these conceptual and
methodological shifts have been linked to different research traditions in Europe and
the USA. German Expression Psychology as reviewed in this chapter encompasses
all research on nonverbal behavior that has been performed by German-speaking
expression psychologists (Asendorpf and Wallbott 1982). In this research vein, topics
such as facial expression, gestures and body movements were active areas of inquiry
and part of the curriculum in psychology at German universities until the sixties (see
Scherer and Wallbott 1990). However, just like the study of emotions, which had lost its appeal in past decades of psychological research, expression psychology soon disappeared from the field. The approach used by many traditional German expression psychologists such as Lersch, Kirchhoff, Klages and
Leonhard was criticized for being speculative and unscientific in nature. Many empirical
psychologists rejected this type of philosophical and phenomenological analysis. As a
result, German Expression Psychology disappeared at the end of the sixties and was re-
placed in the seventies by a new research domain called “nonverbal communication
research.” This new line of research, which was influenced primarily by American psy-
chology, was driven by a research model based on systematic and objective measure-
ment. With its development, a new generation of psychologists working on expressive
behavior emerged in Germany and Europe, almost independently of the classic field
of “expression psychology” (Asendorpf and Wallbott 1982).
The present paper aims to review important concepts and ideas of German Expres-
sion Psychology which no longer receive much attention but have contributed to our
understanding of nonverbal communication. A complete overview of German Expression Psychology and its contributions to each channel of communication (face, body,
voice) can be found in a series of special issues on German Expression Psychology
by Asendorpf, Wallbott and Helfrich published in the Journal of Nonverbal Behavior
in 1982 and 1986. In the present paper, therefore, an exhaustive description of German
Expression Psychology will not be provided. Instead, following a short summary of Ger-
man Expression Psychology based on the articles just referred to, we will focus on the
contribution of three selected authors, namely Lersch, Leyhausen and Brunswik. Each
of these authors has added his own definition and perspective to the field of expression
psychology. Drawing on the description of the work of all three authors, we will out-
line the extent to which German Expression Psychology is still relevant for current
research on nonverbal behavior.
For body movements, gait and gestures, measurement techniques ranged from subjec-
tive descriptions and judgments to more objective approaches (Wallbott 1982). How-
ever, even in the case of complex and demanding techniques (such as time-way
graphs, light paths, and stereo photography), there was a tendency to interpret the cap-
tured data in subjective and evaluative terms (Scherer and Wallbott 1990). Similarly, for
voice and speech, a distinction was drawn between a genetic, acoustic or auditory level
and an impressionistic or phenomenological one. Most research interests, however,
focused on the phenomenological approach with its holistically perceived qualities of
spoken messages. In contrast, electro-acoustical methods of recording on the auditory
level were treated with caution and considered useful only in combination with a
holistic approach (Helfrich and Wallbott 1986).
Clearly, the focus of German Expression Psychology was not on the abstraction of
single variables and their objective measurement but rather on the supposed meaning
of expressions. Although an expressive act could have more than one meaning, the rela-
tionship between the expression and its subject was regarded rather as constant and in-
variant. In many cases, verbal and nonverbal phenomena were seen as indicative of
individual characteristics such as affective states and personality attributes (Asendorpf
and Wallbott 1982), German Expression Psychology thereby being diagnostic in its
approach. This was particularly pronounced in the case of research on gait and gestures.
Here, the primary interest was not in the expression of emotion, but more in the expres-
sion of character or personality itself (Wallbott 1982). In addition to the functional
value of body movements, it was thought that valuable information about the person’s
character could be extracted. In consequence, research interests were mainly concerned
with behavior characteristics and typological approaches to personality such as those
suggested by Kretschmer: for example, pyknic, athletic and leptosomic types (see Kretsch-
mer 1940). A similar approach applied to research on voice and speech. Although
objective views existed that treated vocal features merely as indicators of speech con-
tent, most studies adopted a subjective person-related view. That is, an association
was made between expressive features and the underlying personality traits of the
speaker. Research attempts consequently focused on the identification and interpretation of different speech types and vocal characteristics in relation to personality, or on attributing certain traits simply by listening to the voice (Helfrich and Wallbott 1986).
The basis for such associative links between expression and personality was mainly semantic analogy. That is, the person-specific characteristics were inferred from a
semantic description of the expression. In most cases, the meaning of the expression
was derived by analogy from the function of the respective action (Asendorpf and Wall-
bott 1982). For example, eyelid droop has the function of diminishing the visual input
from outside. By analogy, it follows that a person with lowered eyelids should have a
passive and unfocused attitude towards the world (Lersch 1961). Unfortunately, opera-
tionalizations for “passive” and “unfocused” were seldom given. Moreover, such infer-
ence-by-analogy syllogisms were not subject to empirical research that included
standardized tests. Such analogical-metaphorical conclusions, close to the concepts of
naı̈ve psychology, were responsible for much of the later criticism of German Expres-
sion Psychology as being unscientific and subjective in nature (Asendorpf and Wallbott
1982).
Within German Expression Psychology, concepts of expression and perception were
highly interlinked. That is, no clear distinction was made between expression and
impression. Rather, both concepts were seen as a gestalt – a holistic perceptual pattern.
As with expression, conclusions were often drawn by analogy and not directly traceable
(Asendorpf and Wallbott 1982). However, more explicit theories also existed that
tried to determine the effects of specific behavioral cues on impression formation.
This was particularly the case for research on the face. Here, the main focus of German
Expression Psychology was on the perception and imitation of facial expressions
(Asendorpf 1982). Specifically, the role of certain facial areas and components in evok-
ing behavioral responses was of research interest. Such component studies consisted of
the systematic variation of specific facial characteristics using schematized stimuli or
drawings. Other experiments explored the relative contribution of static (physiog-
nomic) and dynamic (pathognomic) cues, or facial and verbal cues in impression forma-
tion (Asendorpf 1982). A crucial concept was that of imitation in which the feelings
experienced by the perceiver (due to the mimicry of the sender’s expression) were
thought to lead to the attribution of feelings to the sender. In order to describe this
expression-impression process, constructs such as expression, perception and attribu-
tion were important. Although the focus was mainly on the perceived cues and their
imitation, this approach was not far from Brunswik’s Lens Model (see Asendorpf
and Wallbott 1982; Brunswik 1956). Here also a distinction was made between externa-
lization, perception and inference. In the Lens Model, a more fine-grained analysis was
conducted with respect to the type of eliciting emotion, the cues expressed and the im-
pressions formed. The majority of studies in German Expression Psychology failed to
go to such a detailed and objective level of description and in consequence experimental
designs were often unrepresentative with respect to their methodology.
3. Selected authors
In the following, we will focus on the concepts and theories of three authors influential
in the field of German Expression Psychology, namely Philipp Lersch, Paul Leyhausen, and Egon Brunswik.
elements in Leyhausen’s arguments was the notion that expression and impression de-
veloped, to a certain degree, independently of one another and that their relationship
was then shaped by the evolutionary pressures that social communication put on the
respective systems (e.g., Leyhausen 1967). Specifically, he argued that the origin of
expression was phylogenetically much earlier than the development and the genetic
transmission of perceptual mechanisms that would be able to respond to expression.
This notion offers explanations for certain (mis-)communication processes that are rel-
evant for any current researcher interested in the communication of affect and intent.
Leyhausen saw the possibility of predicting future behavior and adapting/reacting
accordingly as the most important function of the communication process.
Leyhausen was unusual in assuming that expression refers to behavioral manifestations of action tendencies that do not correspond to the currently predominant behavior but reflect competing processes. Thus, the expression was
considered to be the blend of two or more conflicting behaviors or behavioral tenden-
cies. Take the example of the arching of a cat’s back in an aggressive context. Darwin
interpreted this expression as an attempt by the cat to increase its apparent size, following the principle of
antithesis (appear big when aggressive and small when submissive; see Darwin 1872: 56
and Figure 8: 58). However, following Lorenz’s analysis of dog behavior, Leyhausen
(1956, cited in 1967) demonstrated that the cat’s arched back is more likely to be the
consequence of a co-activation of two behavioral tendencies – flight and fight in this
context – and only appears as a gesture of threat due to subjective human interpreta-
tion. Nonetheless, sometimes behaviors can be observed that do not seem to serve a
purpose under conditions of competing action tendencies. Leyhausen argued that
when it was impossible to follow any of these action tendencies, displacement activities
(Übersprungshandlungen) could occur. A chicken in a fight or flight situation might
start pecking at the ground as if there were food. Humans might scratch their heads
when in behavioral conflict. Such behaviors would have been difficult for Darwin to
explain using his three principles guiding the origins of expressions.
When addressing the issue of the impression, Darwin believed that experiences dur-
ing one’s lifetime could somehow be transmitted to the next generation, not unlike the
mechanisms that Jean Baptiste Lamarck proposed (see also Cornelius 1996). This left
various possibilities open concerning the development of impression. Leyhausen in
turn benefited from knowledge of genetic mechanisms that was unavailable in Darwin’s time. He was able to reject the Lamarckian transmission of experience and in
turn focus on an evolutionary analysis, considering how impression mechanisms
might have evolved and what the features of such mechanisms would be. This included
the complex relationship of genotype and phenotype, and properties such as the stability or the variance of features in a given population in a given environment. A central fea-
ture of the genetically transmitted capacity for impression relates to the concept of the
innate releasing mechanism (IRM). Based on studies of several species, Leyhausen ar-
gued that different individual aspects capable of triggering an innate releasing mecha-
nism follow an additive logic, rather than pattern (Gestalt) principles. This is an
important assumption because it opens doors for the strategic control of an interaction
partner’s innate releasing mechanism to achieve certain (social) goals (see also Fridlund
1994). Interestingly, in his analysis of communication in humans, Leyhausen considered
aspects including fashion or stylized behaviors, such as mannerisms of speech or gesture,
as well as behavioral traces, such as writing.
Leyhausen strongly emphasized the need for comparative research, citing the long
time period that must have been required for impression processes to evolve. He re-
jected views that focus only on humans, and the fallacies that follow from them, such as the con-
fusion of homologies and analogies in the analysis of morphological structures or
behaviors. In analyzing the communication process Leyhausen cast a wide net, consid-
ering behaviors of species as simple as protozoa (paramecia), ticks, or fish, as precursors
to flexible impression and recognition mechanisms in humans. Of particular interest was
his distinction – in higher organisms only – of “angeborenes Ausdruckserfassen,” or Ein-
drucksfähigkeit on the one hand and “erworbenes Ausdruckserfassen,” or Ausdrucksver-
stehen on the other. He referred to an innate grasp of expression as the capacity for
impression and the acquired grasp of expression as understanding of expression. The
use of the German term erfassen, which can literally be translated as ‘grasping,’ proves
to be a particularly felicitous choice of words. Contemporary advances concerning the
role of a mirror system in the pre-motor cortex (e.g., Rizzolatti and Craighero 2004) that
may play a considerable role in the communication of affect and intention (see also
Kappas and Descôteaux 2003) could indeed have more to do with grasping than think-
ing. In other words, there may be a more physical, action-oriented sense of the expres-
sion of other individuals rather than a mental construct such as matching a schema.
Such notions go back to Lipps’ theory of empathy and feature strongly in expression
psychology (e.g., Lersch 1940 and Rothacker 1941, cited in Leyhausen 1967).
preserved. He criticized classical methods of experimental design for applying the de-
mands of representativeness to the number of subjects but not to the number of objects
(stimuli). Furthermore, he questioned the feasibility of experimental designs that select
and isolate one or a few independent variables that are varied systematically whilst
extraneous variables are held constant. In his view, the concept of representativeness
is fundamental to generalization.
Brunswik’s methodological principles were closely entwined with his theoretical out-
look. Brunswik introduced the term ecological validity to indicate the degree of corre-
lation between a proximal (e.g., retinal) cue and the distal (e.g., object) variable to
which it is related. For example, in a perceptual task, ecological validity refers to the
objectively measured correlation between, say, vertical position and size of an object
(larger objects tend to be higher up in the visual field) over a series of situations. Or,
in the domain of emotional expressions, one may compare the ecological validity of
the cue “smiling” with the cue “reported happiness” as indicators of a person-object’s
emotional state. An important aspect of Brunswik’s approach was his conviction that
the environment to which an organism must adapt cannot be perfectly predicted
from the proximal cues. A particular distal stimulus does not always imply specific prox-
imal effects and vice versa. Proximal cues are only probabilistic indicators of a distal
variable. The proximal cues are themselves interrelated, thus introducing redundancy
into the environment.
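To make this operational definition concrete, ecological validity can be written as a simple correlation computed over a sample of situations; the notation below is an illustrative sketch and not Brunswik’s own:

$$
v_i = \operatorname{corr}(D, c_i),
$$

where $D$ is the distal variable (e.g., an object’s actual size, or a sender’s emotional state), $c_i$ is the $i$-th proximal cue (e.g., vertical position in the visual field, or amount of smiling), and the correlation is computed across the series of situations sampled.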
Scherer (1978, 2003) has suggested modeling the relationship between emotions and
expressive cues in a Brunswikian perspective. In this approach, the relationship
between the underlying emotion and the expressive cues is probabilistic and many
redundant cues are potentially available for both the expression and communication
of emotion. Brunswik would probably question the current dominance of “modern” dis-
crete emotion theories as represented by Ekman, Izard, and their respective collabora-
tors (e.g., Ekman 1992; Izard 1991). Discrete emotion theorists claim that there are only
a limited number of fundamental or basic emotions and that for each of these there ex-
ists a prototypical, innate, and universal expression pattern. In this basic emotion view,
the association is deterministic and only allows minor variations (e.g. a motor program
might be carried out only partially). In the Brunswikian probabilistic view, several
expressive cues will be related independently to the emotional reaction, each with a de-
fined probability. On different occasions, under different circumstances, different com-
binations of cues might be used to express the same emotion. The extended lens model
integrates the study of the production (expression) and of the perception (impression)
of emotional expressions.
In Scherer’s model (see Scherer 2003, Figure 1: 229), the internal states – in this case,
the emotions – are exteriorized in the form of distal indicators corresponding, in the
case of vocal communication, to the acoustic characteristics of the voice. The notion
of externalization covers both intentional communication of internal states and involun-
tary behavioral and physiological reactions. At the operational level, the internal states
are represented by criterion values and the distal indicators by indicator values. These
distal indicators are represented proximally by percepts which are the result of percep-
tual processing on the part of the observer. These percepts can be evaluated by percep-
tual judgments expressed as scores on psychophysical scales or dimensions. The
correlations between indicator values and perceptual judgments are designated by
the term representation coefficient, and indicate the degree of precision of the projection
of distal indicators in the perceptual space of the individual. The attribution of a state is
the result of inferential processes based on perception of the distal indicators. These at-
tributions can be evaluated by obtaining new judgments on psychological dimensions
from observers. The correlations between perceptual judgments and attributions are re-
presented in the model by utilization coefficients which provide a measure of the utili-
zation (or the weighting) of each index that is perceived when a state is inferred. The
accuracy of these attributions in relation to the objectively observed state of the indi-
vidual is defined at an operational level by the correlation between criterion values
and attributions (coefficients of accuracy).
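As a rough illustration of how these operational definitions can be computed, the sketch below pairs hypothetical criterion values, distal indicator values, perceptual judgments, and attributions for a small set of vocal stimuli and derives the coefficients described above as Pearson correlations. The variable names and toy data are invented for illustration; the sketch is not taken from Scherer’s or Brunswik’s own materials.

```python
import numpy as np

def pearson(x, y):
    # Pearson correlation between two equally long 1-D arrays.
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical data for 8 vocal stimuli (one value per stimulus); all numbers are invented.
criterion_values = np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.1, 0.6, 0.3])      # sender's measured emotional state
indicator_values = np.array([210, 150, 190, 160, 205, 140, 180, 155])      # a distal indicator, e.g. mean F0 in Hz
perceptual_judgments = np.array([6.5, 3.0, 5.5, 3.5, 6.0, 2.5, 5.0, 3.0])  # listeners' ratings of the proximal percept
attributions = np.array([0.8, 0.3, 0.6, 0.4, 0.9, 0.2, 0.5, 0.4])          # listeners' inferred (attributed) state

# Ecological validity: how well the distal indicator tracks the criterion on the sender side.
ecological_validity = pearson(criterion_values, indicator_values)

# Representation coefficient: how precisely the distal indicator is projected into the percept.
representation_coefficient = pearson(indicator_values, perceptual_judgments)

# Utilization coefficient: how strongly the percept is weighted when a state is inferred.
utilization_coefficient = pearson(perceptual_judgments, attributions)

# Coefficient of accuracy: how well the attributed state matches the actual criterion.
accuracy_coefficient = pearson(criterion_values, attributions)

print(ecological_validity, representation_coefficient,
      utilization_coefficient, accuracy_coefficient)
```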
It is important to keep in mind that an emotion may be reliably attributed to a facial
or a vocal display despite this emotion not being present in the sender. Conversely,
when receivers cannot reliably attribute emotions to senders, this does not demonstrate
that senders do not express emotions. Such an imperfect relationship between the en-
coding process and the attribution process has been demonstrated by Hess and Kleck
(1994) using dynamic facial expressions, either posed or elicited by affectively evocative
materials. Subjects were able to accurately report the cues they employed in assessing
the perceived spontaneity or deliberateness of expressions. However, these cues were
not always valid determinants of posed and spontaneous expressions. In fact, partici-
pants were relatively poor at identifying expressions of the two types, and this low dis-
crimination accuracy was found to be a result of the consistent use of these invalid cues.
Reynolds and Gifford (2001) reported a similar observation for the expression and per-
ception of intelligence in nonverbal displays. They showed that intelligence, as measured
by a test, was correlated with a few nonverbal cues that could be observed and measured
in video recordings. Participants who were requested to rate intelligence on the basis of
those videos were relatively consistent in using another set of cues to make inferences
about intelligence. However, their ratings were not influenced by the cues that were
related to the intelligence scores, and consequently they did not make accurate inferences.
person’s expression can be found in much recent research on facial dynamics (e.g.,
Ambadar, Schooler, and Cohn 2005; Bassili 1978; Wehrle, Kaiser, Schmidt, et al.
2000). In particular, the distinction between angular-stiff and round-fluent movement
has relevance for today’s study of fake and genuine expressions (see Ekman and Friesen
1982a; Krumhuber and Kappas 2005; Krumhuber, Manstead, and Kappas 2007).
Leyhausen argued for clearly separating the study and analysis of expression from that of impression. In this respect, his approach was not far from Brunswik’s
concept (e.g., Brunswik 1956; see also Kappas, Hess, and Scherer 1991). However,
the ethological approach Leyhausen represents has pointed to a certain asymmetry
between the origins and, in consequence, the plasticity of expression and impression
processes in humans. The discovery of a mirror system in the brain (e.g., Rizzolatti
and Craighero 2004) and systematic studies on imitative behavior (e.g., in the shape
of the chameleon effect; Chartrand and Bargh 1999) are concordant with Leyhau-
sen’s analysis that the impressions gathered from bodily expressions are much less flex-
ible than the expressions themselves. Research by Todorov and his colleagues has
demonstrated how dramatically fast impressions are arrived at, with individuals showing great confidence after literally fractions of a second (Todorov, Mandisodza, Goren, et al.
2005; Willis and Todorov 2006). These findings suggest rapid automatic effects that are
consistent with the type of mechanisms postulated by Leyhausen. In other words, this
is not about what faces or bodily expressions really say, but what people think they
say. This can be attributed to our evolutionary past, where it was useful to act quickly
in response to certain signs. A closer analysis of Leyhausen’s predictions might be highly
productive for furthering our understanding of “old questions” about issues such as mis-
communication, misattribution, or insensitivity to deception based on expressive cues.
These questions are currently difficult to answer when the attempt is based on an analysis
that centers only on humans in the here and now and does not take phylogenetic trajec-
tories into account. Leyhausen’s work may also prove useful for current and developing
questions linked to computer-mediated communication, for instance what level of realism
is really needed for artificial entities, such as agents, avatars, or embodied robots.
Brunswik’s lens model was quickly taken up and gained support at the time of its appearance, particularly from Hammond (1955), who remains to this day a strong defender of this paradigm. Hammond’s initial application of the lens model was in
the area of judgment analysis (or decision making), using the paradigm to analyze
clinical (diagnostic) judgments by psychiatrists and psychologists. Holzworth (2001)
has presented a brief review of the research regarding analysis of judgments from a
Brunswikian perspective. In this review, the author observes that judgment analysis quickly became oriented towards problems of interpersonal
perception. In this area, Albright and Malloy (2001) go as far as to state: “All research
on interpersonal perception is Brunswikian, because the use of real people as targets of
judgment invokes the principle of representative covariation. Because Brunswik was
the first to conduct a theoretically based and comprehensive […] study of social percep-
tion, he was the originator of the interpersonal approach to social perception research”
(Albright and Malloy 2001: 330–331). Brunswik was certainly ahead of his time and was
criticized by gestalt psychologists (e.g. Kurt Lewin) as well as by behaviorists, experi-
mental psychologists and statisticians. However, in the present day, Brunswik’s emphasis
on the importance of the environment is reflected in the increasing development of
“ecological psychology,” as illustrated by the work of Barker (1968).
German Expression Psychology remained an active research area until the sixties
and devoted much interest to what is now termed nonverbal communication. Unfortu-
nately, its concepts and theories are often overlooked in today’s research. With the presentation of three selected authors within the field of German Expression Psychology, namely Lersch, Leyhausen and Brunswik, we have sought to demonstrate how rich and wide-ranging their ideas were. These ideas not only reject the notion of discrete prototypes of emotional expression but also highlight the complexities of impression formation and management. German Expression Psychology therefore has an important status in providing the framework for much later empirical work on nonverbal behavior. In this sense, it may be an ancestor of modern nonverbal communication research, but it also continues to contribute to today’s perspectives on bodily and verbal forms of behavior that express inner states.
5. References
Albright, Linda and Thomas E. Malloy 2001. Brunswik’s theoretical and methodological contribu-
tions to research in interpersonal perception. In: Kenneth R. Hammond and Thomas R. Stew-
art (eds.), The Essential Brunswik: Beginnings, Explications, Applications, 328–331. New York:
Oxford University Press.
Ambadar, Zara, Jonathan W. Schooler and Jeffrey F. Cohn 2005. Deciphering the enigmatic face:
The importance of facial dynamics in interpreting subtle facial expressions. Psychological
Science 16: 403–410.
Asendorpf, Jens 1982. Contributions of the German “Expression Psychology” to nonverbal com-
munication research, Part II: The face. Journal of Nonverbal Behavior 6: 199–219.
Asendorpf, Jens and Harald G. Wallbott 1982. Contributions of the German “Expression Psychol-
ogy” to nonverbal communication research, Part I: Theories and concepts. Journal of Nonver-
bal Behavior 6: 135–147.
Barker, Roger G. 1968. Ecological Psychology: Concepts and Methods for Studying the Environ-
ment of Human Behavior. Stanford, CA: Stanford University Press.
Bassili, John N. 1978. Facial motion in the perception of faces and of emotional expression. Journal
of Experimental Psychology: Human Perception and Performance 4: 373–379.
Brunswik, Egon 1952. The Conceptual Framework of Psychology. International Encyclopedia of
Unified Science, Volume 1, Number 10. Chicago: University of Chicago Press.
Brunswik, Egon 1956. Perception and the Representative Design of Psychological Experiments.
Berkeley: University of California Press.
Brunswik, Egon and Lotte Reiter 1937. Eindrucks-Charaktere schematisierter Gesichter.
Zeitschrift für Psychologie 142: 67–134.
Chartrand, Tanya L. and John A. Bargh 1999. The chameleon effect: The perception-behavior link
and social interaction. Journal of Personality and Social Psychology 76: 893–910.
Cornelius, Randolph R. 1996. The Science of Emotion. Upper Saddle River, NJ: Prentice Hall.
Darwin, Charles 1872. The Expression of the Emotions in Man and Animals. London: John Murray.
Dhami, Mandeep, Ralph Hertwig and Ulrich Hoffrage 2004. The role of representative design in
an ecological approach to cognition. Psychological Bulletin 130: 959–988.
Ekman, Paul 1979. Non-verbal and verbal rituals in interaction. In: Mario von Cranach, Klaus
Foppa, Wolfgang Lepenies and Detlev Ploog (eds.), Human Ethology: Claims and Limits of
a New Discipline, 169–202. Cambridge: Cambridge University Press.
Ekman, Paul 1982. Methods for measuring facial action. In: Klaus R. Scherer and Paul Ekman
(eds.), Handbook of Methods in Nonverbal Behavior Research, 45–90. Cambridge: Cambridge
University Press.
Ekman, Paul 1992. Facial expressions of emotion: New findings, new questions. Psychological
Science 3: 34–38.
Ekman, Paul and Wallace V. Friesen 1978. The Facial Action Coding System. Palo Alto, CA: Con-
sulting Psychologists Press.
Ekman, Paul and Wallace V. Friesen 1982a. Felt, false and miserable smiles. Journal of Nonverbal
Behavior 6: 238–252.
Ekman, Paul and Wallace V. Friesen 1982b. Measuring facial movement with the Facial Action
Coding System. In: Paul Ekman (ed.), Emotion in the Human Face. 2nd edition, 178–211. Cam-
bridge: Cambridge University Press; Paris: Editions de la Maison des Sciences de l’Homme.
Fridlund, Alan J. 1994. Human Facial Expression: An Evolutionary View. San Diego, CA: Aca-
demic Press.
Hammond, Kenneth R. 1955. Probabilistic functioning and the clinical method. Psychological
Review 62: 255–262.
Helfrich, Hede and Harald G. Wallbott 1986. Contributions of the German “Expression Psychol-
ogy” to nonverbal communication research, Part IV: The voice. Journal of Nonverbal Behavior
10: 187–204.
Hess, Ursula and Robert E. Kleck 1994. The cues decoders use in attempting to differentiate emo-
tion-elicited and posed facial expressions. European Journal of Social Psychology 24: 367–381.
Holzworth, R. James 2001. Judgment analysis. In: Kenneth R. Hammond and Thomas R. Stewart
(eds.), The Essential Brunswik: Beginnings, Explications, Applications, 324–327. New York:
Oxford University Press.
Izard, Carroll E. 1991. The Psychology of Emotions. New York: Plenum Press.
Kappas, Arvid and Jean Descôteaux 2003. Of butterflies and roaring thunder: Nonverbal commu-
nication in interaction and regulation of emotion. In: Pierre Philippot, Erik J. Coats and Robert
S. Feldman (eds.), Nonverbal Behavior in Clinical Settings, 45–74. New York: Oxford Univer-
sity Press.
Kappas, Arvid, Ursula Hess and Klaus R. Scherer 1991. Voice and emotion. In: Robert S. Feldman
and Bernard Rimé (eds.), Fundamentals of Nonverbal Behavior, 200–238. Cambridge: Cam-
bridge University Press.
Klages, Ludwig 1926. Grundlagen der Charakterkunde. Leipzig: J. A. Barth.
Kretschmer, Ernst 1940. Körperbau und Charakter. Berlin: J. Springer.
Krumhuber, Eva and Arvid Kappas 2005. Moving smiles: The role of dynamic components for the
perception of the genuineness of smiles. Journal of Nonverbal Behavior 29: 3–24.
Krumhuber, Eva, Antony S. R. Manstead and Arvid Kappas 2007. Temporal aspects of facial dis-
plays in person and expression perception. The effects of smile dynamics, head-tilt and gender.
Journal of Nonverbal Behavior 31: 39–56.
Lersch, Philipp 1940. Seele und Welt. Leipzig: J. A. Barth.
Lersch, Philipp 1957. Zur Theorie des mimischen Ausdrucks. Zeitschrift für Experimentelle und
Angewandte Psychologie 4: 409–419.
Lersch, Philipp 1961. Gesicht und Seele. Munich: E. Reinhardt.
Leyhausen, Paul 1956. Verhaltensstudien an Katzen. Zeitschrift für Tierpsychologie, Beiheft 2.
Leyhausen, Paul 1967. Biologie von Ausdruck und Eindruck. Psychologische Forschung 31:
113–227.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body-Language-Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Peters, Gustav 2000. Nachruf: Paul Leyhausen (1916–1998). Bonner Zoologische Beiträge 49: 179–189.
Reynolds, D’Arcy J. and Robert Gifford 2001. The sounds and sights of intelligence: A lens model
channel analysis. Personality and Social Psychology Bulletin 27: 187–200.
Rizzolatti, Giacomo and Laila Craighero 2004. The mirror-neuron system. Annual Review
of Neuroscience 27: 169–192.
Rothacker, Erich 1941. Die Schichten der Persönlichkeit (2. Auflage). Leipzig: J. A. Barth.
564 IV. Contemporary approaches
Scherer, Klaus R. 1978. Personality inference from voice quality: The loud voice of extroversion.
European Journal of Social Psychology 8: 467–487.
Scherer, Klaus R. 2003. Vocal communication of emotion: A review of research paradigms. Speech
Communication 40: 227–256.
Scherer, Klaus R. and Harald G. Wallbott 1990. Ausdruck von Emotionen. In: Klaus R. Scherer
(ed.), Enzyklopädie der Psychologie. Band C/IV/3 Psychologie der Emotion, 345–422. Göttingen:
Hogrefe.
Todorov, Alex, Anesu N. Mandisodza, Amir Goren and Crystal C. Hall 2005. Inferences of com-
petence from faces predict election outcomes. Science 308: 1623–1626.
Wallbott, Harald G. 1982. Contributions of the German “Expression Psychology” to nonverbal
communication research. Part III: Gait, gestures, and body movement. Journal of Nonverbal
Behavior 7: 20–32.
Waller, Bridget and Marcia Smith Pasqualini this volume. Analysing facial expression using Facial
Action Coding Systems (FACS). In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H.
Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body-Language-Communication: An
International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Wehrle, Thomas, Susanne Kaiser, Susanne Schmidt and Klaus R. Scherer 2000. Studying the dy-
namics of emotional expression using synthesized facial muscle movements. Journal of Person-
ality and Social Psychology 78: 105–119.
Willis, Janine and Alex Todorov 2006. First impressions: Making up your mind after 100 ms expo-
sure to a face. Psychological Science 17: 592–598.
Abstract
This chapter introduces the Fused Bodies (FB) approach to sense-making in social inter-
action. Using ethnographic observation and microanalyses of naturally occurring social
interaction as its empirical basis, Fused Bodies focuses on sense-making in face-to-face
encounters as the integrated whole of interactionally relevant, mutually oriented-to
body movements at any given time during the interaction. Fused Bodies emphasizes
that interactionally consequential body movements are more than just “behavior”; they
are acts of sense which matter to the participants. Hence, though heavily indebted to the
concepts and principles of analysis in ethnomethodology and conversation analysis,
Fused Bodies aims at going beyond these disciplines by 1) focusing on and emphasizing
the role of whole bodies in interaction and 2) explicitly considering the interactional
work done by co-participants as both action and cognition. The latter point is both
programmatic, in that Fused Bodies aims at empirically investigating “cognition” as a
member’s matter, and empirical, in that the sense of interactional actions and the
interactional machinery per se have been found to be significant matters to participants
which they invest interactional work in establishing, negotiating, repairing, and re-
establishing. Finally, Fused Bodies hypothesizes that body movements, including talk,
may in and of themselves index common sense knowledge and constitute interactional
cognitive work.
1. Introduction
Fused Bodies (FB) is a new approach to sense-making in social interaction. The term
sense is understood to cover all the kinds of meanings an action in social interaction
can have, i.e. referential, conceptual, reflective, etc. The term also covers our under-
standing of meaning as bodily grounded, and it highlights that meaning does not reside
in analysts’ models, schemas, or notations, but in the way the sensing body
acts, perceives the actions of others and itself, forms its actions and so on.
From the perspective of Fused Bodies, sense is the total, integrated unit at any given
moment in space and time when mutually oriented, interacting, physical and knowl-
edgeable bodies, in concert, systematize their movements. Movements, in turn, are understood as
any resource that interacting bodies make use of to achieve sense: language, gesture,
gaze, posture, manipulation of objects or other bodies, non-linguistic sounds, silence,
and more. In other words, the unit of sense-making does not have to be communication
in a traditional sense, involving language as the primary means of achieving sense. The
unit may also be completely non-linguistic. An analogy which highlights the Fused
Bodies focus on the coordinated work that whole bodies do (not just mouths, brains
and hands) is a couple of skilful tango dancers. Each dancer is constantly “tuned in”
to his or her partner, monitoring carefully the movements of the other, not just through
gazing but through the whole body, and every movement is done in precise coordination
with the movements of the other. In and through their coordination and timing, the
dancers become one whole, a gracefully moving unit which is two and one at the
same time. A tango requires mastery of particular “exotic” movements but it shares
with everyday social interaction the general human talent for performing interactional
coordination and timing with the body, to act as two in a graceful unit. The analogy
between a tango and everyday social interaction is of course to be understood at a cer-
tain level of abstraction. On the specific descriptive level, they are two different phe-
nomena of social interaction, two different kinds of social, interactional activities
governed by different specific rules and possibilities and “design” of actions. Social
interaction fuses bodies – in time and space and through sound, vision, coordination,
touch, smell, temperature, and more – and through this fusion social interaction
becomes possible and sociality becomes a fundamental fact of our lives.
Fused Bodies thus focuses on movements that are turned into social actions by
interacting human bodies. Further, its interests concern when and how these
movements are turned into social actions and thus how sense is achieved in interaction.
What is characteristic about the Fused Bodies approach to social interaction is the way
in which it understands social actions as being inherently social and cognitive, or rather
socio-cognitive. Social actions are the means in and through which a common under-
standing is worked at and established. They are not merely “events”, “things that hap-
pen” or “behavior” in a traditional behaviorist sense. Social actions carry out
understanding. They demonstrate the understanding that they “do”. Furthermore,
they do not have a life of their own. They are being constructed, carried out, done, re-
cognized and understood by sense-making human bodies. Hence, in contrast to other
names for approaches to social interaction and communication such as Conversation
Analysis, Discourse Analysis and Discursive Psychology, the name Fused Bodies
profiles the doers of the social action.
The Fused Bodies approach thus endorses a fundamental mutual dependency and
reflexivity between sociality and cognition: On the one hand, understanding is only pos-
sible in and through sociality or on the basis of sociality. Yet on the other hand, an event
can only be a social action – that is an event that is understandable as a social action,
done by a sense-making human being and understood by a sense-making human being –
if there is cognition. When human beings, in conversation-analytic jargon (see section 2 below),
produce an action, treat an action and understand an action “as” something, they
(in Fused Bodies jargon) cognize that action.
From Fused Bodies’ research interests in understanding follows an interest in knowl-
edge. It follows from its approach to sense-making that Fused Bodies describes and ana-
lyzes knowledge as a social achievement too. The Fused Bodies interest in knowledge
thus concerns how knowledge is achieved socially and how interacting human bodies’
actions bear upon social knowledge.
2. Theoretical background
2.1. Body-language
The study of language as disembodied linguistic structure (morphological, syntactical,
semantic, symbolic, pragmatic) has a long history that dates back to ancient Greece
and includes the most prominent proponents of the modern science of linguistics:
Chomsky (1957), Austin (1962), and Searle (1962) (for a discussion of Aristotle’s con-
ceptualization of the relation between the structure of language and the structure of
reality, see Edel (1996)). The Fused Bodies approach, however, follows a more recent
line of research which embeds language as an integral part of the embodied, interac-
tional making of sense (e.g. Goodwin 2000a, 2000b, 2003; Goodwin and Goodwin
1986; Kendon 1980, 1993, 1994, 2000, 2002, 2004; Laursen 2002; LeBaron and Streeck
2000; Schegloff 1984, 1998; Streeck 1993, 2002). In the Fused Bodies program, this en-
tails that “language” itself, its production and understanding, is a visible bodily doing on
a par with gaze, gesture, head movements and movements of parts of the face (e.g. eye
brows), body posture, object manipulation, and non-verbal sounds. All bodily doings, or
movements, as we shall call them, are in different ways potential resources for inter-
action. It should be emphasized that since movements cover a wide range of different
resources that contribute to sense-making in different ways, these types of movements
should not be understood or treated as the same. Hence, we recognize the distinct
symbolic nature of most verbal utterances as being different from, say, the way in which
a body leans towards another while whispering a secret. Yet both are part of the same
integrated whole of sense-making, both are also bodily movements, and neither the ut-
tering of words nor other movements of the body can a priori be said to be either sym-
bolic or not. The uttering of words may in some contexts be almost non-symbolic and the
movement of other body parts may take on a symbolic function. The local context and
not the type of movement itself decides whether it is symbolic or not or how it differs
from other body movements.
Fused Bodies’ research interests concern resources which amount to actions that are
recognized as actions for interaction by the participants in interaction. The approach
focuses on how sense is achieved by the participants of interaction in and through con-
certed action in time and space. The Fused Bodies program has a lot in common with
and inherits a lot from multi-modal analyses of interaction in the vein of Ethnometho-
dology (EM) and Conversation Analysis (CA) (Heritage 1984; Hutchby and Wooffitt
1998; Silverman 1998).
Studies in this area have shown how for instance gestures and headshakes are orga-
nized as activities, how they are organized in relation to speaking and how they contrib-
ute to the total meaning of the utterances and/or actions of which they are a part. Yet
other studies (Streeck 1993) reveal a coordination of gaze and gesture in the way speak-
ers turn their gaze at their gesture (iconic gesture) before uttering a specific central
word that is semantically coherent with the meaning of the gesture. And Bavelas,
Coates, and Johnson (2002) have studied how the speaker’s gaze coordinates the inter-
actional work between speaker and recipient. Goodwin (1981) shows how the speaker,
while speaking, orients himself towards receiving the recipient’s gaze (see also Kendon
2004), and investigates pointing (Goodwin 2000b, 2003) as a situated activity which en-
tails, among other “semiotic resources”, “a body visibly performing an act of pointing”
(Goodwin 2000b: 69, see also Goodwin and Goodwin 1986). Finally, Schegloff (1998)
shows how “body torque” can “impinge on the conduct of the participants and shape
the way they interactively produce talk” (Schegloff 1998: 536).
However, though the Fused Bodies program inherits most of the methodological and
conceptual agenda of Ethnomethodology and Conversation Analysis (see below) and
has in common with these studies the focus on the employment and integration of mul-
timodal resources in interactional sense-making, it attempts at the same time to over-
come the tendency that has existed in these research traditions to treat language or
rather “talk” as the primordial “carrier” of sense. Thus, the Fused Bodies program en-
dorses an approach that studies face-to-face interaction without a priori analytic divi-
sions of aspects or resources of sense-making. Instead, understanding or sense is
conceptualized as an interactionally achieved, social, integrated whole consisting of
concerted, recognizable and understandable actions that are composed of the bodily
movements that happen to be employed.
of one’s own field, the Fused Bodies program in principle aims at starting with a blank
slate. This means that its research agendas are not motivated by the ambition to either
verify, undermine or rethink specific, existing cognitive theories, concepts and terminol-
ogy. Instead it sets up a redefinition of cognition as such that allows for empirically dis-
covering new kinds of cognizing and new kinds of cognitive phenomena. But by
following this agenda, Fused Bodies studies may of course converge with existing or
previous cognitive studies that have different points of departure. And,
needless to say, the Fused Bodies program does still draw on the same language and
the same general labeling of things in the world, which means that Fused Bodies studies
may eventually focus on potential phenomena such as metaphor, mind, conceptualization,
or frames without being answerable to any specific cognitive theory.
An influential line of thinking which has inspired both cognitive and ethnomethodo-
logical approaches to sense-making and which the Fused Bodies approach is inspired by
is the phenomenological understanding of how things can have an existence to human
beings. In that understanding, things can have an existence in and through the processes
by which they are constructed. Within “traditional” phenomenology these processes are
features of the individuals’ perception of the world. Cognitive studies take the indivi-
duals’ perception of the world as a point of departure and study these processes typi-
cally as mental processes going on within the minds or brains of individuals. The
individual human being is thus often conceptualized as an enclosed psychological entity
who, by means of communication, can reveal his thought contents to his surroundings.
Ethnomethodology has adopted the phenomenological view on how things come into
being, that is: On how things are constructed as “being”. Ethnomethodological studies,
however, view this construction as fundamentally social and observable. Things come
into being through human beings’ concerted actions. Only through a joint effort do
things become a part of human beings’ social world. Ethnomethodological studies
thus adopt a praxeological and procedural approach to how social human beings in
and through concerted actions construct or orient towards things as being a part of
social reality. With this approach, ethnomethodological studies investigate how social
human beings recognize actions as being part of procedures of sense-making through
the (re)use, (re)construction and (re)establishment of an action (of being exactly that
action) in systematic ways. What is of interest to ethnomethodological studies is conse-
quently not “cognitive” or “neurological” “processing” per se but the systematic occur-
rence of recognizable methods and techniques by which human beings construct a social
order and make sense of their social world. Thus, Ethnomethodology disregards any
attempt to account for social behavior by reference to “mental” processes which may
generate such behavior. Instead, Ethnomethodology views social behavior as the
locus of sense-making in and of itself. It should be noted that human beings are not
thus viewed as mindless social agents (or “cultural dopes”; Garfinkel 1967: 68). On
the contrary, social behavior indexes knowledge. Hence, Ethnomethodology interests
concern social human beings’ knowledge and thus their use of categories and notions
of their social world. Hence, mental or neurobiological categories such as cognition,
mind, thinking and feeling have a place in the ethnomethodological analysis only in
so far as these categories are oriented to or constructed by the participants themselves
(in and through their actions) as features of their social world. The Fused Bodies pro-
gram shares with Ethnomethodology the perspective on phenomena that describes
these in terms of how they come to be a part of social human beings’ life world.
However, whereas ethnomethodological studies focus on the social structures as these are
achieved in and through the actions of members of society and as existing only as such,
the Fused Bodies program to an even greater extent emphasizes the integral and defin-
ing role of the doers of these structures. Thus, in the Fused Bodies program actions and
social structure are seen as the bodily movements that constitute them.
is on these as naturally occurring social situations). The social interactions are filmed
and then transcribed according to a set of transcription symbols which draw in part
on transcription conventions as developed by the mother of Conversation Analysis,
Gail Jefferson (Schenkein 1978; for a discussion of methodological and practical aspects
of using this style for transcribing verbal interaction, see Ten Have 1999).
Traditional Conversation Analysis data was typically audio-taped and, as mentioned
above, the analysis focused on the oral sounds produced by the participants. The Fused
Bodies approach with its interests in “bodies-in-interaction” naturally relies on video-
taped data. It must thus include in its transcriptions all potentially relevant body move-
ments (see Hougaard and Hougaard 2009 for examples). Data is analyzed on the basis
of both the original footage and the transcriptions.
every activity – even of the “individual” one. That is, Fused Bodies has it that all our
knowledge as individuals is not individual but acquired in and through social interaction
with and about the world we are a part of (Hougaard and Hougaard 2009). Thus, the
Fused Bodies framework proposes a conception of mind which is fundamentally social
(see also Coulter 1979 and Wittgenstein [1953] 2001). The perception of “pain” may be
individual when stepping on a sharp stone barefooted. The conception or theoretical or
practical knowledge of the neuro-physical sensation we know as “pain” is however
achieved in and through the social determination of it. Thus, cognition is social and
so is the human mind.
As a consequence, the radical Fused Bodies hypothesis has it that the social mind
precedes the Husserlian “subject”. This hypothesis is partly based on the concepts
and empirical foundation of the Fused Bodies program and partly on the unsustainabil-
ity of the Husserlian subject understood as a purely individual experience of being
(Husserl [1913] 1977; see also Eco 1997). The Fused Bodies program hypothesizes, as
opposed to traditional phenomenology, that the experience of “being” is dependent
on the co-existence of other “beings”. No child is born in isolation. Also, it follows
from this that the program does not leave room for a notion of cognition as an “auton-
omous” phenomenon that interacts autonomously with other “minded processes” such
as perception. Autonomous cognition is by definition non-human or pre-human. Con-
sider for instance a child who, though not born in isolation, is still “brought up” by
wolves. We may recognize that child as a “human being” because of its biological poten-
tial for being a member of human society, but we consider its way of understanding the
world non-human. Had the child displayed an understanding of the world that corre-
sponded to a “human” perspective, this would have been evidence of autonomous cog-
nition. We speculate that what is autonomous is our orientation towards other
“individual beings” – most of all “individuals like us”. As some of the empirical studies
that gave rise to the Fused Bodies program show, people orient towards people (see
section 5. below). Thus, human beings’ way of having a world, having knowledge and
being minded is based on our social orientation.
The Fused Bodies program presents a non-reductionist approach to cognition. The
cognitive sciences have produced many attempts to demarcate the “cognitive sphere”,
for instance the individual’s “inside” or “mind”, the individual brain or the individual
body and brain. As described above, the notion of cognition entertained in the Fused
Bodies approach is constrained by the documentary method of interpretation, i.e. what
the participants themselves (together) understand as the cognitive constitution of
their social world, whether it be in terms of talking about cognition (“memory”, “rec-
ognition” and the like) or in terms of building cognition into actions in interaction
(alluding to, or relying on, cognition). This notion of cognition, though clearly method-
ologically constrained, in principle allows for various different notions of cognition or
an all-encompassing notion of cognition as for instance involving any thinkable (and
unthinkable) bodily movement. In other words: The Fused Bodies research interests
concern how participants in interaction treat actions not just as actions but as actions
carried out by knowledgeable, mindable, feeling human beings and members of society.
Anything within this that is understood intersubjectively as cognition is of interest to
Fused Bodies.
It should be made clear that the Fused Bodies program distinguishes between actual
empirical analyses and findings in studies of social interaction and the hypotheses and
philosophy of human knowledge and the human mind that have been sketched above.
The latter are a changeable spin-off of the former, and the empirical studies of bodies
that fuse in social interaction to achieve intersubjectivity in a current activity remain
the core of the Fused Bodies program. Nonetheless, the Fused Bodies program can
indeed generate studies of sense-making in solitary settings where, using other methods
than those developed in connection with Conversation Analysis, the focus is on how the
processes and procedures of sense-making orient towards a social, normative constitution
of the sense-maker’s world.
5. Empirical evidence
The Fused Bodies program emerges from ethnographic and interactional studies of nat-
urally occurring (face-to-face) interaction in many different settings. Studies of how
participants – ordinary, natives, non-natives, (non)disabled, children, adults, parents,
businesspeople, teachers, pedagogues and more – make sense have indicated that
sense-making is neither an autonomous, individual, pre-planned cognitive endeavor
nor a behaviorist social machinery of talk. Sense-making is an accomplishment that
is worked at and arrived at by the participants in a social process. These studies have
also indicated that the social process and the arrived-at understanding of whatever is
the ongoing interactional business matter to the participants.
with these implications (in line 7) by first establishing the way the question (posed by
the non-native caller in line 5) is understood, and then answering the understood
question:
Excerpt:
The example above illustrates how the understanding of different actions doing dif-
ferent things matters to participant G. Instead of just treating the most likely under-
standing of D’s action (line 5) as “is Mister Heinrich in the house” by responding
“no” in line 6 or 7, G repairs “Heinrich” with “Mr. Heinrich” and repairs “at
home” with “in the house”, demonstrating that it matters what kind of action and
possible understanding is being dealt with subsequently. Furthermore, the construc-
tion of the relationship between the caller and the called and between the answerer
(G) and the called matters (“Heinrich” or “Mr. Heinrich” to whom?) as well. Notice
that the fact that “it matters” is done in interaction. It is not an “internal” matter. Further-
more, the understanding of “what matters” is collaboratively worked at and accom-
plished by both participants.
Thus, a lot seems to be at stake for the participants. This may of course be – and has
been – described in terms of the actions they have carried out. However, we hold that
such an approach does not grasp the nature of “what matters”. That meaning and sense-
making matter – what means something to people, what they care about – is not part
and parcel of what can simply be labeled “interactional machinery.” In order to capture
and describe participants’ – members of society’s – understanding of their world, the
Fused Bodies approach proposes a framework which does not conceptualize this under-
standing merely as “mechanics.” Hence the Fused Bodies approach is not in line with
traditional behaviorist thinking.
palsy and his pedagogue – show how meaningful interaction may proceed locally by
way of non-symbolic (non-signifying) body movements that are
Hence, the understanding that matters to the participants is produced by entire bodies
(see also gesture studies, section 2.1), and sometimes non-verbal movements alone
do the job of making meaningful interaction (see also Kendon 1992; Lerner and
Zimmerman 2003). Participants may monitor whole bodies as they interact and they
may, in principle, use any part of their body in understanding the ongoing interaction
(see also Streeck 1993, 2002; LeBaron and Streeck 2000). It is of course central to a
framework that proposes to also include (non-verbal) movements as social actions to
be able to distinguish between body movements that are being treated as movements
for social interaction and non-interactional body movements. In the studies mentioned
above, the analyses show how the participants in interaction deal with such movements
(or not). On the basis of these studies, Fused Bodies proposes to approach the question
as a “members’ problem”, i.e. as a problem that members have to and actually do deal
with in practical ways. Hougaard and Hougaard (2009) show how participants orient
towards bodies as being “interactionally set” or not. In their analysis of social interac-
tion between a person suffering from aphasia and her speech therapist, it is shown how
the aphasic person is sometimes oriented to as “falling out” of the interaction. In the
course of doing a word search, her gaze drops to the surface of a desk between her
and the therapist, and the otherwise very lively, gesturing woman sits completely still.
In response to this, the speech therapist also sits completely still, while still gazing
at his patient. Hence, the co-participants, by stopping all bodily movement momen-
tarily, bring the interaction to a halt. This is done non-verbally, in and through a coor-
dination of body movements and monitoring (by the therapist) of the other’s body. For
that moment, the bodies are not in an interactionally developing mode. Once the stillness
has been co-composed (with the therapist’s responding stillness), what follows does
not carry the interaction onward until the stillness is broken by the aphasic woman.
Hence, what bodies do during periods of interaction may or may not contribute to
the development of the interaction, and whether or not some body movement contri-
butes to the development of the interaction can be determined by whether it is oriented
to by the participants as doing so. If participants orient towards bodily movements as social
actions, then the understanding of these movements matters to them. The understanding
of these social actions is worked at by the co-participants and is thus a concern
to them.
Importantly, not only the contents of social actions matter. In an analysis of an au-
tistic child who is building a Lego tower together with his father, Brouwer et al. (in
prep.) found that the parent corrects the child if the child (Arthur, an invented name) in-
itiates a turn-at-building (putting a block on the tower) when it is the parent’s turn
(“Daddy’s turn–Arthur’s turn”). This example – though it may appear strange to
many parents at first glance – confirms what seems to have been found in the analysis
of fighting for turns-at-talk (Sacks, Schegloff, and Jefferson 1974), namely that the turn-
taking machinery matters. Considerable work can be seen to be put into negotiating the
6. References
Austin, John 1962. How to Do Things with Words: The William James Lectures Delivered at Har-
vard University in 1955. Edited by James O. Urmson. Oxford: Clarendon.
Bavelas, Janet B., Linda Coates and Trudy Johnson 2002. Listener responses as a collaborative
process: The role of gaze. Journal of Communication 52(3): 566–579.
Brouwer, Catherine E., Dennis Day, Ulrika Ferm, Anders Hougaard, Gitte Hougaard and Gunilla
Thunberg in preparation. Structures in interactions between children with disability and their
parents, or: Treating the actions of children with disabilities as sensible.
Brouwer, Catherine E., Gitte Rasmussen and Johannes Wagner 2004. Embedded corrections in
second language talk. In: Rod Gardner and Johannes Wagner (eds.), Second Language Conver-
sations, 75–92. London: Continuum.
Chomsky, Noam 1957. Syntactic Structures. The Hague: Mouton.
Coulter, Jeff 1979. The Social Construction of Mind. London: Macmillan.
Eco, Umberto 1997. Kant og Næbdyret. [Danish translation. Kant e l’ornitorinco. Milan: R.C.S.
Libri S.p.A] Copenhagen: Forum.
Edel, Abraham 1996. Aristotle and His Philosophy. New Brunswick, NJ: Transaction.
Edwards, Derek and Jonathan Potter 1992. Discursive Psychology. London: Sage.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 2000a. Gesture, aphasia, and interaction. In: David McNeill (ed.), Language
and Gesture, 84–98. Cambridge: Cambridge University Press.
Goodwin, Charles 2000b. Pointing and the collaborative construction of meaning in aphasia. Texas
Linguistic Forum 43: 67–76.
Goodwin, Charles 2003. Conversational frameworks for the accomplishment of meaning in apha-
sia. In: Charles Goodwin (ed.), Conversation and Brain Damage, 90–116. Oxford: Oxford Uni-
versity Press.
Goodwin, Charles and Marjorie Harness Goodwin 1986. Gesture and coparticipation in the activity of
searching for a word. Semiotica 62: 51–75.
Heritage, John 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Heritage, John 1987. Ethnomethodology. In: Anthony Giddens and Jonathan H. Turner (eds.),
Social Theory Today, 224–272. Cambridge: Polity Press.
Hougaard, Anders R. and Gitte Rasmussen Hougaard 2008. Do body movements display com-
mon sense knowledge? Paper presented at Language Culture and Mind Conference III,
Odense, Denmark.
Hougaard, Anders R. and Gitte Rasmussen Hougaard 2009. Fused Bodies: Sense-making as a
phenomenon of interacting, knowledgeable bodies. In: Hanna Pishwa (ed.), Social Cognition
and Language, 47–78. Berlin: De Gruyter Mouton.
Husserl, Edmund 1977. Ideen zu einer reinen Phänomenologie und Phänomenologischen Philoso-
phie. Erstes Buch: Allgemeine Einführung in die reine Phänomenologie. 1. Halbband: Text der
1.–3. Auflage. Reprint, edited by Karl Schuhmann, The Hague: Martinus Nijhoff. First pub-
lished Halle (Saale): Max Niemeyer [1913].
Hutchby, Ian and Robin Wooffitt 1998. Conversation Analysis. Cambridge: Polity Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
R. Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The Hague:
Mouton.
Kendon, Adam 1992. The negotiation of context in face-to-face interaction. In: Alessandro Dur-
anti and Charles Goodwin (eds.), Rethinking Context: Language as an Interactive Phenomenon,
323–334. Cambridge: Cambridge University Press.
Kendon, Adam 1993. Human gesture. In: Tim Ingold and Kathleen R. Gibson (eds.), Tools, Lan-
guage and Cognition in Human Evolution, 43–62. Cambridge: Cambridge University Press.
Kendon, Adam 1994. Do gestures communicate? A review. Research on Language and Social
Interaction 27(3): 175–200.
Kendon, Adam 2000. Language and gesture: Unity or duality? In: David McNeill (ed.), Language
and Gesture, 84–98. Cambridge: Cambridge University Press.
Kendon, Adam 2002. Some uses of the head shake. Gesture 2(2): 147–183.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Laursen, Lone 2002. Kodeskift, gestik og sproglig identitet på internationale flersprogede arbejds-
pladser [Code-switching, gesture and identity in international work places]. PhD dissertation,
University of Southern Denmark.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Lerner, Gene H. and Don H. Zimmerman 2003. Action and the appearance of action in the con-
duct of very young children. In: Phillip Glenn, Curtis D. LeBaron and Jenny Mandelbaum
(eds.), Studies in Language and Social Interaction: In Honor of Robert Hopper, 441–457.
Mahwah, NJ: Lawrence Erlbaum.
Maynard, Douglas W. and Steven E. Clayman 1991. The diversity of ethnomethodology. Annual
Review of Sociology 17: 385–418.
Psathas, George 1995. Conversation Analysis. The Study of Talk-in-Interaction. Thousand Oaks,
CA: Sage.
Rasmussen, Gitte 1998. The use of forms of address in intercultural business conversation. Revue
de Sémantique et Pragmatique 3: 57–72.
Rasmussen, Gitte 2000. Zur Bedeutung Kultureller Unterschiede in Interlingualen und Interkul-
turellen Gesprächen. Munich: Iudicium.
Sacks, Harvey 1995. The baby cried. The mommy picked it up. In: Gail Jefferson (ed.), Lectures
on Conversation, 236–242. Oxford: Blackwell. First published [1966].
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the
organisation of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Action: Studies in Conversation Analysis, 266–296. Cam-
bridge: Cambridge University Press.
Schegloff, Emanuel 1998. Body torque. Social Research 65(3): 535–596.
Schenkein, Jim 1978. Studies in the Organization of Conversational Interaction. New York: Aca-
demic Press.
Searle, John 1962. Meaning and speech acts. Philosophical Review 71(4): 423–432.
Silverman, David 1998. Harvey Sacks: Social Science and Conversation Analysis. Cambridge: Pol-
ity Press; New York: Oxford University Press.
Streeck, Jürgen 1993. Gesture as communication. Communication Monographs 60: 275–299.
Streeck, Jürgen 2002. A body and its gestures. Gesture 2(1): 19–44.
Ten Have, Paul 1999. Doing Conversation Analysis: A Practical Guide. Thousand Oaks, CA: Sage.
Wittgenstein, Ludwig 2001. Philosophische Untersuchungen/Philosophical Investigations. Oxford:
Blackwell. First published Oxford [1953].
Abstract
This chapter sketches some of the embodied features characterizing social interaction as a
locally situated and mutually intelligible achievement of the participants. It focuses on the
interactional dimension and relevance of gesture and other embodied conduct. After a
short presentation of the issues raised by the study of multimodal interaction, the chapter
focuses on the notion of “resource”, which covers both conventional forms – such as
grammar – and less standardized and more opportunistic means that are used by parti-
cipants to build the intersubjective accountability of their actions – such as gesture,
gaze, head movements, facial expressions, body posture, and body movements. In this
way, the notion of “multimodal resource” invites us to take into consideration the irremedi-
able indexicality of linguistic resources, as well as the systematic and methodic use of em-
bodied resources. The chapter describes some features of these resources, particularly
focusing on their temporal and sequential unfolding in an emergent and incremental
way. Issues such as the timed coordination between participants in interaction, the emer-
gent time of talk, and the synchronization of multiple simultaneous embodied actions
are discussed. Finally, the chapter deals with larger issues, going beyond gesture and
concerning the body as a whole: the way in which material objects can be considered in
this perspective, as well as the importance of considering not only actions involving static
bodies but also mobile activities.
1. Introduction
Social interaction is the primordial site of language use – its “home habitat” (Schegloff
1996) – as well as the context and activity in and through which children and adults
learn a new language and socialize. Therefore, interaction is the fundamental site for
observing how language and, more generally, communication works in both a situated
and a systematic way, for describing how social relations, and also emotions and cogni-
tive processes, develop and unfold in real time and in real settings, and for studying the
resources on which participants rely in order to communicate together.
In face-to-face interaction, participants not only speak together, but also gesticulate
and move their bodies in meaningful and coordinated ways. Gesture studies have shown
that gestures in conversation originate through the same process that produces words
(Kendon 1980; McNeill 1985). Made predominantly by speakers, but strongly oriented
to their partners (Schegloff 1984), gestures are finely synchronized with the structure of
discourse (Müller 1998) and of talk in interaction (Kendon 2004; Bohle 2001), display-
ing the “same improvisational quality as do words in conversation” (Bavelas et al. 1992:
470), and they are finely tuned with the conduct of the co-participants to whom they are
addressed. This has prompted gesture studies to investigate “interactive gestures”
(Bavelas et al. 1992) in dialogue – that is, gestures that do not refer to the topics at
hand but instead refer to the interlocutor, monitoring shared understanding and estab-
lishing common ground (Clark 1996), seeking agreement, and also maintaining conver-
sation and regulating the turns at talk. These gestures, which typically take the form of
a pointing towards the interlocutor or more complex hand shapes such as exposed
palm, offering open hand, etc., belong to the range of visibly-embodied resources
that participants mobilize in order to build a systematic order of social interaction.
Interest in how human interaction works, not only in ordinary social and everyday
settings but also in professional and institutional ones, has prompted the study of
video recordings of naturally occurring activities, aimed at understanding how partici-
pants smoothly achieve the finely-tuned complex coordination of their actions. On
the basis of naturalistic data, conversation analysis, inspired by Garfinkel’s ethnometho-
dology (1967), has focused on human interaction as endogenously and methodically or-
ganized. This means it is not considered as being governed by external norms and rules,
but as being locally achieved by the participants, based on micro-practices that are both
context-free and context-shaped (Heritage 1984) – such as practices for self-selecting,
for beginning a new turn, for recognizing transition-relevance points (Sacks, Schegloff,
and Jefferson 1974; Lerner 2003), or for repairing troubles (Schegloff, Jefferson, and
Sacks 1977). These practices are organized by the participants in a publicly accountable
way, that is, in a way which is intelligibly produced for, and interpreted by, them as
action unfolds in real time. This public accountability is built through the mobilization
of a range of resources, which includes language and gesture but which also integrates
other aspects of bodily conduct, such as body postures and movements. The booming
literature on multimodal resources (“multimodality” being conceived in this perspec-
tive as comprising language, gesture, gaze, head movements, facial expressions, body
2. Multimodal resources
The notion of “resource” covers both conventional forms – such as grammar – and less
standardized and more opportunistic means that are used by participants to build the
intersubjective accountability of their actions. Thus, the notion of “resource” invites
us to take into consideration the irremediable indexicality of linguistic resources, as
well as the systematic and methodic use of embodied resources. This prevents us from reifying
certain well-studied (mostly linguistic) resources and ignoring other less-studied ones,
or extracting the resources from the context in which they are situated. The use of a
resource is reflexive in an ethnomethodological sense (Garfinkel and Sacks 1970;
Heritage 1984): reflexivity here refers to the fact that action is tied to the circumstances of
its organization, both adjusting to those circumstances and transforming them. Reflex-
ivity makes what is being used – in a local, contingent, and opportunistic way – into an
organizationally relevant detail. The value and meaning of a resource is context-
dependent, being related both to the sequential organization of social interaction and
to the situated occasion of its use. In return, a resource also shapes the particular inter-
pretation of the context that is made adequate and relevant at that particular point.
Moreover, the contextually-specific use of a resource might also shape its form and
intelligibility as a resource that will be available in the future – thus prompting linguistic
and, more generally, semiotic change.
What Schegloff says about language can be generalized for other multimodal
resources:
The central prospect, then, is that grammar stands in a reflexive relationship to the orga-
nization of a spate of talk as a turn. On the one hand, the organizational contingencies of
talking in a turn […] shape grammar – both grammar as an abstract, formal organization
and the grammar of a particular utterance. On the other hand, the progressive grammatical
realization of a spate of talk on a particular occasion can shape the exigencies of the turn as
a unit of interactional participation on that occasion, and the grammatical properties of a
language may contribute to the organization of turns-at-talk in that language and of the
turn-taking device by which they are deployed. (Schegloff 1996: 56)
For linguistic resources, the notion of reflexivity means that grammatical constructions
and other linguistic forms are used by interlocutors by exploiting various available fea-
tures, but are also re-configured within their very use. Ultimately, linguistic resources
can be seen as being shaped by repeated configuring uses within interaction – the
repeated mobilization of forms in given sequential environments for the practical
purpose of achieving a given interactional action being a grounding force of
and body movements. Every one of these dimensions unfolds in time too, concurrently
with talk, constituting the sequential embodied organization of action. Therefore, the
embodied design of turns within interaction integrates not only gesture but also the
way in which the turns are formatted according to the presence or absence of gaze of
the co-participant on the speaker. So, Schegloff (1984) observes that the fact that ges-
ture prefigures the meaning that will be conveyed by speech enables the recipient to
build an understanding of the turn step by step, thus creating a projection space that
permits him to respond in a timely manner and act in a relevant and emergent way as the turn un-
folds. Goodwin (1981) shows that this progressivity can be delayed: the speaker pro-
duces re-starts when he notices that the co-participant is not gazing at him and, in
turn, these re-starts work both as a delaying device, suspending the progression of
the turn until the gaze is secured, and as an attention-getting device, requesting the
attention of the co-participant. Streeck (1993) finds a gesture-gaze pattern in his data
documenting descriptions in dialogue, which consists in the fact that “the speaker
looks at her own gesture before the word that carries the key information, returns
gaze to the listener once the keyword is uttered” (Streeck 1993: 234): in this way, ges-
tures become “objects of attention” which are offered for inspection to both the listener
and the speaker, the gaze marking the communicative relevance of the gesture. More
generally, gestures are coordinated with the state of attention of the participants:
they are closely coordinated with a monitoring of what the other participants do
(M.H. Goodwin 1980), they are sensitive to the participation framework in which the
speaker can either find or miss a recipient (Goodwin 1979), and their trajectory is orga-
nized after having secured the visibility of their target, through a preliminary rearran-
gement of the local environment. In this respect, studies of deictic reference
(Hindmarsh and Heath 2000a; Goodwin 2003; Mondada 2007) show that prior to the
use of referential expressions the speaker engages in intensive interactional work, in
order not only to get the relevant attention of the co-participants but also to (re)arrange
their entire bodies in such a way that the deictic action (achieved through linguistic and
gestural resources) lies within the focus of attention of the participants. As Goodwin
(2003) shows, pointing and talking are formatted together by taking into consideration
the surrounding space, the activity in which the participants are engaged, and the par-
ticipants’ mutual orientation. In another study, using the example of archaeologists ex-
cavating soil, Goodwin (2000) shows how participants actively constitute a visual field
which has to be scrutinized, parsed and understood together by the co-participants in
order to find out where the speaker is pointing. The archaeologists juxtapose language,
gesture, tools (such as trowels), and graphic fields (such as maps) on a domain of scru-
tiny, which surrounds them but is also delimited by the very act of referring
to it. In this sense, gestures are environmentally coupled (Goodwin 2007a), and not
used as a separate resource coming from the external world into a pre-existing context:
the domain of scrutiny is transformed and reorganized by the very action of pointing,
done within the current task. As Hindmarsh and Heath (2000a) show, these gestures,
and body movements amplifying them, are realized in a way that is recipient-designed,
that is they indicate and even display the referent for the co-participants, at the relevant
moment, when the referent is visible for them. Pointing gestures are “produced and
timed with respect of the activities of the co-participants, such that they are in a position
to be able to see the pointing gesture in the course of its production” (Hindmarsh and
Heath 2000a: 1868). Thus, the organization of the gesture and the body of the speaker
are adjusted to the recipient, in order to guide him in the material environment and
towards the referent. Since recipients display their understanding and grasp of the
action going on, speakers adjust to the production of these expressions, or to their
absence or delay.
This mutual orientation involves not only talk and gesture but also the entire body,
gazing at and bending towards the object (Hindmarsh and Heath 2000a) and, more
radically, actively rearranging the surrounding environment. Mondada (2007) shows
how speakers, prior to the utterance of the deictic, position their bodies within space,
reposition objects within space, and even restructure the environment. The deictic
and the pointing gesture are produced only after participants have organized the dispo-
sition of the spatial context. Thus, deictic words and gestures are not merely adapting to
a pre-existing and immutable context; they are part of an action which actively renews
and changes the context, rearranging the interactional space in the most appropriate
way for the pointing to take place. In these cases, the emergent and progressive tempor-
ality of talk and action is suspended, delayed, or postponed until the conditions for joint
attention or a common focus of attention are fulfilled. The emergent organization of
talk and action concerns not only gesture and gaze but also the moving body, the sur-
rounding space and the material environment. What emerges from these contributions
is the necessity to go beyond the study of single “modalities” coordinated with talk, and
to take into consideration the broader embodied and environmentally situated organi-
zation of activities (Streeck, Goodwin, and LeBaron 2011). Consequently, in what fol-
lows we sketch some fields that are currently being investigated and which open up the
variety of multimodal resources to be considered within social interaction: materiality,
spatiality and mobility.
Luff 1996) has been documented. Likewise, the reflexive constitution of artifacts and
inscriptions by the way in which they are locally mobilized within the work activity
(Hindmarsh and Heath 2000b; Suchman 2000; Mondada 2006) and during scientific ac-
tivities (Ochs, Gonzales, and Jacoby 1996; Roth and Lawless 2002) has been studied.
Goodwin’s study of the use of the Munsell chart by archaeologists (Goodwin 1999) de-
tails this complex web of multimodal resources in an exemplary way. He describes the
work of archaeologists excavating soil and examining its color, holding a coding form to
be filled in with soil descriptions and the Munsell chart, which allows the comparison between
different shades of color. The mobilization of the chart is done in an ordered way,
aligned with other movements such as the arrangement of the bodies, the participants’
gaze, their pointing gestures, and the holding of the trowel, as well as with the partici-
pants’ talk, which together make it possible to compare the colors of the Munsell chart
with the color of the soil sample. The participants agree, disagree, and negotiate the
final color to inscribe on the form. “Seeing” color with the Munsell chart is not the auto-
matic result of a procedure; it is a situated achievement needing the prior alignment of
the local action space and, thus, requiring time. In turn, the analysis of the use of the
Munsell chart is based on various video views of the archaeologists’ action, some of
them being quite narrowly framed close-ups, and the reproduction of various materials
(an example of a soil description form, various pictures of the Munsell chart, and the
Munsell book). In summary, attention to the objects permits the investigation not
only of the manual actions of the hands but also of the way in which those actions are an-
chored within the environment – not forgetting the common focus of attention these
manipulated objects create among the co-participants.
embodied actions and material environment, defining what he calls “contextual config-
urations”. If the analysis of talk has to take into consideration the embodied actions of
the participants, the study of gesture or body postures cannot be developed in isolation,
but has to describe the way in which the structure of the environment contributes to the
organization of the interaction. Drawing on these inspirations, Mondada (2007, 2009,
2011a) proposes that “interactional space” is constituted through the situated, mutually
adjusted and changing arrangements of the participants’ bodies within space. This pro-
duces a configuration relevant to the activity they are engaged in, their mutual attention
and their common focus of attention, the objects they manipulate and the way in which
they coordinate in joint action. This interactional space is constantly being established
and transformed within the activity (De Stefani and Mondada 2007; LeBaron and
Streeck 1997; Mondada 2009a, 2011a; De Stefani 2011; Hausendorf, Mondada, and
Schmitt 2012). The transformation of interactional space is achieved by the bodily ar-
rangements of the participants constituting mobile configurations and mobile forma-
tions. This is even more the case with interactional spaces constituted through and
within mobile activities such as walking, driving, biking, etc.
(i) they show the importance of the entire body, and not only of gesture or of the
upper parts of the body for the organization of social interaction,
(ii) they concern a mobile body, and not only a static one,
(iii) they are methodically organized within interaction (such as in walking together),
and
(iv) they are chronologically and sequentially finely-tuned with the organization of
talk, reflexively contributing to its intelligibility.
Early works on walking already describe “doing walking” as a methodic practice and a
concerted accomplishment (Ryave and Schenkein 1974: 265). Members achieve walking
together, being recognized both as a “vehicular unit” (Goffman 1971: 8) and as “withs”
(Goffman 1971: 19). In walking together, participants organize their concerted action
both within their group – by maintaining proximity and pace, speeding up and slowing
down, managing turns and stopping together (see Haddington, Mondada, and Nevile in
press; De Stefani 2011) – and with respect to other passers-by – while navigating within
a crowd, avoiding collisions, adjusting to the trajectory of others, and even making
accountable their interruptions of trajectory (Lee and Watson 1993; Watson 2005).
Two mobile units can also converge, for example when various couples or groups
meet and merge, thereby constituting one unique interactional space (Mondada
2009a). Conversely, people can also display that they are not “withs”, exhibiting civil
inattention and minimizing the effects of co-presence (Goffman 1971; Sudnow
1972). As noted by Ryave and Schenkein, the fact that these challenges are resolved
in unproblematic ways reveals “the nature of the work executed routinely by partic-
ipant walkers” (Ryave and Schenkein 1974: 267). Moreover, collective walking activ-
ities are organized by being oriented in a finely-tuned way to the organization of talk
and even to the details of the emergent construction of turns and sequences at talk:
Relieu (1999) shows how turn-design is sensitive to the spatial ecology encountered
by speakers talking and walking, and Mondada (2009a) shows how the first turn of
an encounter is finely designed with respect to the walking body of the co-participant.
Walking practices, such as walking away (Broth and Mondada in press) or run-
ning away (Deppermann, Schmitt, and Mondada 2010), orient to transition-relevance
points and to transitions from one activity to the other; conversely, they achieve the
accountability of these transitions and contribute towards the achievement of these
transitions in a publicly visible way with which all co-participants can align – or eventually
disalign.
7. Conclusion: Challenges
Multimodal interaction opens up an extremely rich field of investigation, expanding
prior knowledge of social interaction, which was initially based on audio recordings favor-
ing talk and on videos focusing on specific embodied resources, such as gesture. A wider
notion of the multimodal resources mobilized by participants for building their account-
able actions includes language, gesture, gaze, facial expressions, body posture, body
movements such as walking, and embodied manipulations of artifacts. This enlarged
vision opens up various challenges, both methodologically and theoretically. Method-
ologically, the study of relevant details concerning the entire body challenges the way
in which social action is documented, firstly through video recordings of naturally occur-
ring interactions in their ordinary social settings, and secondly through transcripts and
other forms of annotation. The documentation of participants engaged in mobile activ-
ities within complex settings, involving not only their bodies but also various material
and spatial environmental details, requires video recordings and video technologies
that are relevantly adjusted to the activities observed (see Mondada 2012). The repre-
sentation of complex conduct involving a variety of multimodal resources also chal-
lenges traditional, more linear, transcripts, and requires more and more sophisticated
annotation and alignment tools (like ELAN, ANVIL or CLAN). Analytically, this
rich documentation makes the reconstruction of the moment-by-moment temporality
of the emerging interaction complex. Multimodality is characterized by multiple tem-
poralities and multiple sequentialities operating at the same time. Multiple details
unfold both simultaneously and successively – and even multiple courses of action,
since simultaneous activities are often involved (as in body-torqued positions,
Schegloff 1998; but also in various parallel actions such as working-and-overhearing,
M. H. Goodwin 1996; talking-and-operating, Mondada 2011b; talking-and-eating, Mondada 2009b).
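To make the problem of aligning such multiple, simultaneously unfolding details concrete, the following is a minimal, tool-independent sketch in Python of a time-aligned, multi-tier annotation structure of the kind managed by tools such as ELAN, ANVIL or CLAN; the tier names, time values and annotation contents are invented for illustration only.

# Minimal, tool-independent sketch of time-aligned multimodal annotation tiers
# (talk, gaze, gesture); times are in seconds, and all values are invented.
from dataclasses import dataclass

@dataclass
class Annotation:
    start: float   # onset in seconds
    end: float     # offset in seconds
    tier: str      # e.g. "talk", "gaze", "gesture"
    value: str     # transcribed or coded content

def cooccurring(annotations, tier_a, tier_b):
    """Yield pairs of annotations from tier_a and tier_b that overlap in time."""
    for a in annotations:
        if a.tier != tier_a:
            continue
        for b in annotations:
            if b.tier == tier_b and a.start < b.end and b.start < a.end:
                yield a, b

data = [
    Annotation(0.0, 1.4, "talk", "let's cross here"),
    Annotation(0.3, 1.0, "gaze", "toward co-walker"),
    Annotation(0.5, 1.2, "gesture", "points ahead"),
]

for talk, gesture in cooccurring(data, "talk", "gesture"):
    print(talk.value, "overlaps with", gesture.value)

Dedicated annotation tools add media alignment, controlled vocabularies and hierarchically related tier types on top of such a basic time-interval structure.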
8. References
Ashcraft, Norman and Albert Scheflen 1976. People Space: The Making and Breaking of Human
Boundaries. New York: Anchor.
Auer, Peter 2009. Online Syntax. Thoughts on the temporality of spoken language. Language
Sciences 31: 1–13.
Auer, Peter, Elizabeth Couper-Kuhlen and Frank Müller 1999. Language in Time. The Rhythm
and Tempo of Spoken Interaction. Oxford: Oxford University Press.
Bavelas, Janet and Nicole Chovil 2000. Visible acts of meaning. An integrated message model of
language in face-to-face dialogue. Journal of Language and Social Psychology 19(2): 163–193.
Bavelas, Janet, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gesture. Dis-
course Processes 15: 469–489.
Bohle, Ulrike 2001. Das Wort Ergreifen – das Wort Übergeben. Berlin: Weidler.
Broth, Mathias and Lorenza Mondada in press. Walking away: The embodied achievement of
activity closings in mobile interaction. Journal of Pragmatics.
Clark, Herbert 1996. Using Language. Cambridge: Cambridge University Press.
Condon, William S. 1971. Speech and body motion synchrony of the speaker-hearer. In: David L.
Horton and James Jenkins (eds.), Perception of Language, 150–173. Columbus: Merrill.
Couper-Kuhlen, Elizabeth and Margret Selting (eds.) 1996. Prosody in Conversation: Interactional Studies. Cambridge: Cambridge University Press.
Deppermann, Arnulf, Reinhold Schmitt and Lorenza Mondada 2010. Agenda and emergence:
Contingent and planned activities in a meeting. Journal of Pragmatics 42: 1700–1712.
De Stefani, Elwys 2011. Ah Petta Ecco, io Prendo Questi che mi Piacciono’. Agire come Coppia al
Supermercato. Un Approccio Conversazionale e Multimodale allo Studio dei Processi Decisio-
nali. Roma: Aracne.
De Stefani, Elwys and Lorenza Mondada 2007. L’organizzazione multimodale e interazionale del-
l’orientamento spaziale in movimento. Bulletin Suisse de Linguistique Appliquée 85: 131–159.
DiMatteo, Robin, Jeffrey Robinson, John C. Heritage, Melissa Tabbarrah and Sarah Fox 2003.
Correspondence among patients’ self-reports, chart records, and audio/videotapes of medical
visits. Health Communication 15: 393–413.
Ford, Cecilia E. 2004. Contingency and units in interaction. Discourse Studies 6(1): 27–52.
Garfinkel, Harold 1967. Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.
Garfinkel, Harold and Harvey Sacks 1970. On formal structures of practical actions. In: John D.
McKinney and Edward A. Tiryakian (eds.), Theoretical Sociology, 337–366. New York: Appleton-Century-Crofts.
Goffman, Erving 1963. Behavior in Public Places: Notes on the Social Organization of Gatherings.
New York: Free Press.
Goffman, Erving 1964. The neglected situation. American Anthropologist 66(6): 133–136.
Goffman, Erving 1971. Relations in Public: Microstudies of the Public Order. New York: Harper
and Row.
Goodwin, Charles 1979. The interactive construction of a sentence in natural conversation. In: George
Psathas (ed.), Everyday Language: Studies in Ethnomethodology, 97–121. New York: Irvington.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 1999. Practices of color classification. Mind, Culture and Activity 7(1–2): 62–82.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Goodwin, Charles 2003. Pointing as situated practice. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture and Cognition Meet, 217–241. Hillsdale, NJ: Lawrence Erlbaum.
Goodwin, Charles 2007a. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell
and Elena Terry Levy (eds.), Gesture and the Dynamic Dimensions of Language, 195–212. Am-
sterdam: John Benjamins.
Goodwin, Charles 2007b. Participation, stance and affect in the organization of activities. Dis-
course and Society 18(1): 53–73.
Goodwin, Charles 2009. Things, bodies, and language. In: Bruce Fraser and Ken Turner (eds.),
Language in Life, and a Life in Language: Jacob Mey – A Festschrift, 106–109. Bingley, UK:
Emerald.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated for the production
of description sequences. Sociological Inquiry 50(3–4): 303–317.
Goodwin, Marjorie Harness 1996. Informings and announcements in their environments: Prosody
within a multi-activity work setting. In: Elizabeth Couper-Kuhlen and Margret Selting (eds.),
Prosody in Conversation: Interactional Studies, 436–461. Cambridge: Cambridge University
Press.
Haddington, Pentti, Tiina Keisanen and Maurice Nevile 2012. Meaning in Motion: Interaction
in Cars. Semiotica 192 (special issue).
Hakulinen, Auli and Margret Selting 2005. Syntax and Lexis in Conversation. Studies on the Use of
Linguistic Resources in Talk-in-Interaction. Amsterdam: John Benjamins.
Hausendorf, Heiko, Lorenza Mondada and Reinhold Schmitt 2012. Raum als Interaktive Ressource. Tübingen: Narr.
Heath, Christian 1982. Preserving the consultation: Medical record cards and professional con-
duct. Sociology of Health and Illness 4: 56–74.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heath, Christian and Paul Luff 1996. Documents and professional practices: “bad” organisa-
tional reasons for “good” clinical records. In: Mark S. Ackerman (ed.), Proceedings of the
1996 ACM conference on Computer supported cooperative work November 16–20, 1996,
354–363. New York, NY: ACM.
Heath, Christian and Paul Luff 2000. Technology in Action. Cambridge: Cambridge University
Press.
Heritage, John C. 1984. Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Hindmarsh, John and Christian Heath 2000a. Embodied reference: A study of deixis in workplace
interaction. Journal of Pragmatics 32: 1855–1878.
Hindmarsh, John and Christian Heath 2000b. Sharing the tools of the trade: The interactional
constitution of workplace objects. Journal of Contemporary Ethnography 29(5): 523–562.
Hopper, Paul 1987. Emergent grammar. Berkeley Linguistic Society 13: 139–157.
Kendon, Adam 1977. Studies in the Behavior of Face-to-Face Interaction. Lisse, the Netherlands:
Peter de Ridder Press.
Kendon, Adam 1980. Gesture and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–277. The Hague: Mouton.
Kendon, Adam 1990. Conducting Interaction. Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
LeBaron, Curtis D. and Jürgen Streeck 1997. Built space and the interactional framing of experi-
ence during a murder interrogation. Human Studies 20: 1–25.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Lee, John R. E. and D. Rod Watson 1993. Regards et attitudes des passants. Les arrangements de
visibilité de la locomotion. Annales de la Recherche Urbaine 57–58: 101–109.
Lerner, Gene H. 2003. Selecting next speaker: The context-sensitive operation of a context-free
organization. Language in Society 32: 177–201.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
Mondada, Lorenza 2006. Participants’ online analysis and multimodal practices: Projecting the
end of the turn and the closing of the sequence. Discourse Studies 8: 117–129.
Mondada, Lorenza 2007. Interaktionsraum und Koordinierung. In: Arnulf Depperman and Rein-
hold Schmitt (eds.), Koordination. Analysen zur Multimodalen Interaktion, 55–94. Tübingen,
Germany: Narr.
Mondada, Lorenza 2009a. Emergent focused interactions in public places: A systematic analysis of the
multimodal achievement of a common interactional space. Journal of Pragmatics 41: 1977–1997.
Mondada, Lorenza 2009b. The methodical organization of talking and eating: Assessments in din-
ner conversations. Food Quality and Preference 20: 558–571.
Mondada, Lorenza 2011a. The interactional production of multiple spatialities within a participa-
tory democracy meeting. Social Semiotics 21(2): 283–308.
Mondada, Lorenza 2011b. The organization of concurrent courses of action in surgical demonstra-
tions. In: Jürgen Streeck, Charles Goodwin and Curtis D. LeBaron (eds.), Embodied Interaction,
Language and Body in the Material World, 207–226. Cambridge: Cambridge University Press.
Mondada, Lorenza 2012. The Conversation Analytic Approach to Data Collection. In: Jack
Sidnell and Tanya Stivers (eds.), Handbook of Conversation Analysis, 32–56. Malden, MA:
Wiley-Blackwell.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Berlin Verlag.
Nevile, Maurice 2004. Beyond the Black Box: Talk-in-Interaction in the Airline Cockpit. Aldershot,
UK: Ashgate.
Ochs, Elinor, Patrick Gonzales and Sally Jacoby 1996. When I come down I’m in the domain state:
Grammar and graphic representation in the interpretive activity of physicists. In: Elinor Ochs,
Emanuel Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar, 328–369. Cam-
bridge: Cambridge University Press.
Ochs, Elinor, Emanuel A. Schegloff and Sandra A. Thompson (eds.) 1996. Interaction and Grammar. Cambridge: Cambridge University Press.
Relieu, Marc 1999. Parler en marchant. Pour une écologie dynamique des échanges de paroles.
Langage et Société 89: 37–68.
Robinson, Jeffrey David 1998. Getting down to business: Talk, gaze, and body orientation during
openings in doctor-patient consultation. Human Communication Research 25: 98–124.
Roth, Wolff-Michael and Daniel V. Lawless 2002. When up is down and down is up: Body orien-
tation, proximity, and gestures as resources. Language in Society 31: 1–28.
Ryave, A. Lincoln and James Schenkein 1974. Notes on the art of walking. In: Roy Turner (ed.),
Ethnomethodology, 265–274. Harmondsworth, UK: Penguin.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1984. On some gestures' relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Action, 266–296. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 1996. Turn organization: One intersection of grammar and interaction. In:
Elinor Ochs, Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar,
52–133. Cambridge: Cambridge University Press.
Schegloff, Emanuel A. 1998. Body torque. Social Research 65(3): 535–586.
Schegloff, Emanuel A., Gail Jefferson and Harvey Sacks 1977. The preference for self-correction
in the organization of repair in conversation. Language 53: 361–382.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60: 275–299.
Streeck, Jürgen 2009. Gesturecraft: The Manufacture of Understanding. Amsterdam: John
Benjamins.
Streeck, Jürgen, Charles Goodwin and Curtis D. LeBaron 2011. Embodied Interaction, Language
and Body in the Material World. Cambridge: Cambridge University Press.
Suchman, Lucy 2000. Making a case: “Knowledge” and “routine” work in document production.
In: Paul Luff, John Hindmarsh and Christian Heath (eds.), Workplace Studies. Recovering
Work Practice and Informing System Design, 29–45. Cambridge: Cambridge University Press.
Sudnow, David 1972. Temporal parameters of interpersonal observation. In: David Sudnow (ed.),
Studies in Social Interaction, 259–279. New York: Free Press.
Watson, Rod 2005. The visibility arrangements of public space: Conceptual resources and methodo-
logical issues in analysing pedestrian movements. Communication & Cognition 38(1–2): 201–227.
Abstract
Drawing on the theories and methodologies of Conversation Analysis, Interactional Lin-
guistics and research in Multimodality, this article explores some aspects of the relation-
ship between verbal, vocal and visual practices in social interaction. After giving an
outline of some results of previous research on the use and the relationship between ver-
bal, vocal and visual resources in social interaction in general, and after outlining the
goals and methodology of this paper, I will present a sample analysis of an extract
from a video-taped conversation. In this analysis, I will make explicit and demonstrate
how participants in interaction use verbal, vocal and visual cues in co-occurrence and
concurrence in order to organize their interaction and, in this particular case, make affec-
tivity interpretable for the recipient, who then is expected to respond, and in the example
shown responds affiliatively.
1. Introduction
For a long time, research on language and interaction, in the field of Conversation Ana-
lysis (CA) and neighboring approaches such as Interactional Linguistics (IL), has
Before going into more detail, it is necessary to clarify the terminology used in the title
and in the following paper.
I will conceive of verbal practices and aspects of “talk-and-other-conduct-in-interaction”
(Schegloff 2005) as comprising the use of resources from the domains of rhetoric, lexico-
semantics, syntax, and segmental phonetics and phonology.
Vocal practices encompass the use of resources from the domains of prosody and
voice-quality. In this sense, the conception of vocal is co-extensive with the Firthian
conception of prosody (see Firth 1957). Firth conceived of prosodies as all types of syn-
tagmatic relationships between syllables that are not determined by the structure of
words and utterances. These include syllable structures, stress/accentuation, tone, qual-
ity and quantity, and, where applicable, also phenomena like glottalization, aspiration,
nasalization, whisper etc. Following Firth, all suprasegmental phenomena that are
constituted by the interplay of pitch, loudness, duration and voice quality can be under-
stood as prosodic, as long as they are used – independently of the language’s segmental
structure – as communicative signals (see Couper-Kuhlen 2000: 2; Kelly and Local 1989;
Selting 2010b).
In face-to-face interaction, visual practices of the participants also play an important
role. Yet, there is a problem of terminology here. The term nonverbal in expressions
such as nonverbal communication or nonverbal activities has been criticized because
it implies an unwanted verbal bias of research. Yet the term visual practices or visual
aspects, which is often used as an alternative, is problematic too, since it also includes
some aspects of articulatory phonetics and prosody that we would not want to include
as meaningful interactional devices: for instance, lip movements characteristic of the
production of segmental sounds like high front versus high back vowels ([i] vs. [u]), bi-
labial or labio-dental plosives, nasals, or fricatives. At the same time, lip movements are
realized to produce particular voice qualities: spread lips for a smiling voice, or “pursed
lips” as a cue suggesting irony or the like. Do we count these as con-
comitant cues of the particular voice quality – or as visual cues in themselves? Where,
then, is the borderline between articulatory-phonetic and visual cues? On the one hand,
this problem demonstrates that the cues which scientific practice and terminology want to
precisely define and allocate are deployed in co-occurrence and concurrence in the reality
of multimodal interaction. The allocation of cues to different categories is an analytic
problem, not a practical one for the participants in interaction. The only relevant issue
for participants is: which of the multiplicity of cues is interactionally relevant? Neverthe-
less, on the other hand, in order to enhance scientific generalizations, we need to isolate,
classify and categorize the different kinds of cues deployed in multimodal interaction. In
the following, I will use the terms visual practices and visual resources, referring to re-
sources from the domains of gaze, facial expression, gesture, posture, object manipula-
tion, etc. as long as they are deployed for the signalling of interactional meaning.
Until quite recently, many students of primarily verbal discourse and interaction, when
confronted with the necessity of dealing with aspects of communication other than the verbal,
have been concerned with relationships between the verbal and non-verbal parts
of the messages or actions studied (see, e.g., Schönherr 1993, 1997). Only recently has
this changed, as Sidnell (2006: 379f.) notes: “current work on multimodality focuses on
questions of integration (or “reassembly” as Schegloff 2005 put it) by putting at the
forefront the question of how different modalities are integrated so as to form coherent
courses of action.”
When we look at video recordings of social interaction, we can see, as C. Goodwin
(2000) phrases it,
that the construction of action through talk within situated interaction is accomplished
through the temporally unfolding juxtaposition of quite different kinds of semiotic re-
sources, and that moreover through this process the human body is made publicly visible
as the site for a range of structurally different kinds of displays implicated in the constitu-
tion of the actions of the moment. (Goodwin 2000: 1490)
Stivers and Sidnell (2005), following Enfield (2005), distinguish between the vocal/aural
and visuospatial modalities, the vocal/aural modality encompassing spoken language in-
cluding prosody, the visuospatial modality including gesture, gaze, and body postures.
They point out that they do not want to rate the modalities differently with respect
to their relevance:
Yet, with respect to how precisely the concurrent modalities are understood as working
together by participants in interaction, Levinson (2006) points out that synchrony alone
cannot explain this:
To investigate multimodality, one needs to pay attention to the level of structured activ-
ities: those situated activity systems within which analysts and the coparticipants encounter
gestures, directed gaze, and talk working together in a coordinated and differentiated way.
This is a unit of interaction that is relatively discrete; has a beginning, middle, and end; and
provides a structure of opportunities for participation. (Sidnell 2006: 380)
This is thus the site where Conversation Analysis, Interactional Linguistics and
Multimodality can meet to fulfill their goals together.
– New turn-constructional units begin with the beginning of new syntactic units, e.g.
clauses, phrases, single-word units – and also with the beginning of new prosodic
units, often contextualized as new via prosodic breaks and pitch step ups or step
downs or faster syllables at the beginning of new units.
– There may be pre-positioned items, formattable syntactically as well as prosodically in
different ways between the poles of independent/exposed units and items integrated
into the turn-constructional unit thus begun, deployed in order to project and focus
the unit-to-come.
– Before pauses within or after syntactically possible complete units, level or slightly
rising pitch may be deployed in order to hold the turn and project continuation.
– After pauses in the middle of units, projected clauses can be continued – just as
projected contours and previous loudness can be continued after the pause.
– Syntactic constructions can be expanded after first and further possible completion
points – just as the prosody after possible completion points can be expanded, by
adding further words or phrases continuing the prior prosody. These may be formatted
in different ways between the poles of prosodically integrated items and independent/
exposed post-positioned units.
– At syntactically possible completion points, level or slightly rising pitch can be used in
order to hold the turn and project continuation across the turn-constructional unit
boundary, thus projecting another turn-constructional unit to come.
All this shows that both syntax and prosody can be deployed to construct turn-construc-
tional units as flexible entities that may be adapted to the local exigencies of the inter-
action (see Auer 1991, 1996; Couper-Kuhlen 2007; Schegloff 1996; Selting 1995a, 1995b,
1996, 2000, 2001, 2008).
Müller (2009) gives the following general characterization of the relation between
verbal and gestural parts of utterances in interaction:
Gestures are part and parcel of the utterance and contribute semantic, syntactic and
pragmatic information to the verbal part of the utterance whenever necessary. As a
visuo-spatial medium, gestures are well suited to giving spatial, relational, shape,
size and motion information, or enacting actions. Speech does what language is
equipped for, such as establishing reference to absent entities, actions or events or es-
tablishing complex relations. In addition, gestures are widely employed to turn verbally
implicit pragmatic and modal information into gesturally explicit information. (Müller
2009: 517)
Bohle (2007) investigated the role of gestures in the organisation of turn taking in Ger-
man conversations. She found a number of parallel properties between gesture and
prosody. In particular:
– Gesture phrases have, like intonation units/phrases, flexible beginnings and endings:
they can be expanded, interrupted and continued, and abandoned (see Bohle 2007:
274).
– Gestures which are prepositioned in relation to the verbal-vocal units that they are
co-expressive with, i.e. constructed before the verbal-vocal units have begun, may
be used to project the unit and its continuation (see Bohle 2007: 274).
– During pauses, or in response to competitive incomings (French and Local 1983),
gestures may be deployed to project more-to-come and/or to claim turn continuation;
they may thus be used as turn-holding devices, both locally within and more
globally across turn-constructional units (Bohle 2007: 275f.).
– In transition relevance spaces, no turn-holding devices are used. Rather, what we
find is pragmatic completion, syntactic completion, ending intonation, and
the returning of the hands and arms into a rest position. One or more of these devices
can be shifted to constitute slightly incongruous structures at the beginning or ending
of units, in order to exploit them for purposes such as projection of continuation or
the negotiation of turn taking (see Bohle 2007: 280).
Bohle (2007: 277) summarizes the parallel practices from prosody and gesture in a table
(see Tab. 38.1; my translation from German, MS):
Tab. 38.1: Parallel practices from prosody and gesture according to Bohle (2007: 277)
Function: Utterance completion
  Prosodic practice: Ending intonation
  Gestural practice: Withdrawal of hands into rest position

Function: Possible completion of a unit
  Prosodic practice: Completion of intonation contour
  Gestural practice: Completion of gesture phrase

Function: Continuation of an utterance or turn
  Prosodic practice: Continuing intonation
  Gestural practice: Stop of movement at the high point/climax or in the retraction phase

Function: Continuation of unit with an expansion
  Prosodic practices: Continuation of prior speech rhythm; continuation of prior tempo; continuation of prior loudness; continuation of prior intonation contour
  Gestural practices: Continuation of rhythm of movement; maintaining prior locus of movement; maintaining prior hand configuration; continuation of prior gesture phrase

Function: Turn continuation with new unit
  Prosodic practices: Change of speech rhythm; change of tempo; change of loudness; new intonation contour
  Gestural practices: Change of rhythm of movement; change of locus of movement; change of hand configuration; new gesture phrase

Function: Projection of turn continuation
  Prosodic practices: Semantic projection; discourse-pragmatic projection; activity-type specific projection
  Gestural practices: Pre-positioning of referential gesture; pre-positioning of pragmatic gesture; construction of cumulative gesture unit
In Bohle's view, gesture and speech are co-expressive, yet the construction of gestures
is independent of speech. In particular, the timing of gestures in relation to speech
may be deployed to suggest interactional meanings (see Bohle 2007: 274).
A few studies by C. and M. H. Goodwin have investigated the concurrence of pro-
sodic and visual resources for the construction of action sequences in special contexts
and settings. For example, M. H. Goodwin (1996) describes how, within the ecology of work
situations in an airport, informings and announcements issued from the Operations room
rely on different prosodic patterns in order to be tailored to their target audience and
the space that they inhabit (see M. H. Goodwin 1996: 436). Girls playing hopscotch are
shown to build actions that require the integrated use of both particular verbal and
prosodic formats within the semiotic field provided by the hopscotch grid (see M. H.
Goodwin and C. Goodwin 2000). A man suffering from severe aphasia is shown to suc-
cessfully communicate with his family by relying on prosodic and visual resources (see
M. H. Goodwin and C. Goodwin 2000; C. Goodwin 2010).
Most studies, however, in investigating the multimodality of the organization of
interaction with respect to, e.g., turn taking or storytelling, have concentrated on the
deployment of visual resources. In many studies, the use of these devices for the projec-
tion of imminent actions is emphasized, projection being conceived of as a prerequisite
for human interaction and cooperation.
At the heart of language and bodily conduct as public resources for the achievement of
socially coordinated participation in situated activities is the projectability of human con-
duct. Projectability allows participants to anticipate the future course of action being pro-
duced by another participant and produce a specific form of action that fits into the
unfolding structure of that other participant’s ongoing action (Hayashi 2005: 45f; see
also Streeck’s 2009 analysis of “forward-gesturing”).
According to Hayashi (2005: 47), turns at talk “are multimodal packages for the pro-
duction of action (and collaborative action) that make use of a range of different mod-
alities, e.g., grammatical structure, sequential organization, organization of gaze and
gesture, spatial-orientational frameworks, etc., in conjunction with each other”. In con-
clusion, Hayashi encourages a way to conceptualize a turn at talk as “a temporally un-
folding, interactively sustained domain of embodied action through which both the
speaker and recipients build in concert with one another relevant actions that contribute
to the further progression of the activity in progress” (Hayashi 2005: 47–48).
In a series of studies, Mondada (e.g., 2006, 2007) investigates the role of multimodal
resources in the organization of turn taking in interaction. With reference to a fragment
from a work meeting interaction in an architect’s office in Paris, in which three partici-
pants talk about and manipulate a map showing the castle which they want to transform
into a luxury hotel, Mondada (2006) shows that and how “emergent dynamics – such as
projections at the level of turn, sequence and action – are displayed and oriented to by
participants in a detailed and timed way” (Mondada 2006: 127). For a particular con-
text, professional work meetings of experts sitting at a table and discussing and devel-
oping a cartographic language for modelling agricultural land, using various drawings
and artefacts on the table, Mondada (2007) describes a multimodal practice of self-
selection: “the use of pointing gestures predicting turn completions and projecting the
emergence of possible next speakers” (Mondada 2007: 194).
Müller (2003) shows in a case study how gestures are related to storytelling: “Ges-
tures are intertwined with the verbal part of the utterance with regard to rhematic infor-
mation and communicative dynamism. They represent and/or highlight the rhematic
information verbally provided which receives a high communicative dynamism. And
in doing this they construct a visible account of the story line and of the narrative
peaks” (Müller 2003: 263). “The gestures (…) are part of the narrative structure of a
verbally and bodily described event. They create a visual display of the narrative struc-
ture and thus turn the story telling into a multi-modal event: something to listen to but
also something to watch (…) they are natural elements of an everyday rhetoric of telling
a story in conversation” (Müller 2003: 263).
With respect to recipients’ responses to storytelling, Stivers (2008) shows that vocal
and visual responses during storytelling convey different information: “whereas vocal
continuers simply align with the activity in progress, nods also claim access to the tell-
er’s stance toward the events (whether directly or indirectly)” (Stivers 2008: 31). That
means that whereas in mid-storytelling verbal recipiency tokens – acknowledgements
and continuers – are deployed to achieve more formal alignment, nodding may be
deployed to signal affiliation, i.e. “that the hearer displays support of and endorses the
teller’s conveyed stance” (Stivers 2008: 35).
In conclusion, although work like that discussed above has provided important insights
into the relation and concurrence of verbal and visual resources in interaction,
there is still a need to integrate the results of this research with those of research
on prosody in interaction.
(i) the verbal display: rhetorical, lexico-semantic, syntactic, and segmental phonetic-
phonological resources;
(ii) the vocal display: resources from the domains of prosody and voice quality;
(iii) visual resources from the domains of body posture and its changes, head move-
ments, gaze, hand movements and gestures, and the manipulation of objects.
I will present an extract from a face-to-face conversation. The data come from a corpus
of 8 everyday face-to-face conversations with 2 or 3 participants in their home environ-
ments in the area of Berlin-Potsdam, recorded by the project Emotive Involvement in
Conversational Storytelling within the Cluster of Excellence Languages of Emotion in
2008–2009 at Freie Universität Berlin. For these data, the project devised a recording
technique adopted from Peräkylä and Ruusuvuori (2006): we used three cameras
plus an extra audio flash recorder. The three cameras were set up to capture
the total situation as well as, separately, the faces and bodies of the participants facing
each other. For data analysis, all four recordings were synchronized and combined
into one film, allowing analysts to look at the same sequences from three different
perspectives as well as to have access to a high-quality audio-recording.
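As a rough illustration of how three synchronized camera recordings and a separate audio track might be combined into a single film for analysis, the following sketch calls ffmpeg from Python; the file names are hypothetical, the recordings are assumed to be already trimmed to a common starting point and to share the same frame height, and this is not a description of the procedure actually used by the project.

# Hypothetical sketch: placing three synchronized camera views side by side and
# adding the separate audio track with ffmpeg, called from Python. Assumes the
# recordings already start at the same moment and have equal frame heights.
import subprocess

cmd = [
    "ffmpeg",
    "-i", "cam_total.mp4",   # camera capturing the total situation
    "-i", "cam_left.mp4",    # camera on one participant
    "-i", "cam_right.mp4",   # camera on the other participant
    "-i", "audio.wav",       # extra flash-recorder audio
    "-filter_complex", "[0:v][1:v][2:v]hstack=inputs=3[v]",  # three views side by side
    "-map", "[v]", "-map", "3:a",   # combined video plus the separate audio track
    "-c:v", "libx264", "-c:a", "aac", "-shortest",
    "combined.mp4",
]
subprocess.run(cmd, check=True)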
The data have been transcribed according to a transcription system developed by a
group of German interactional linguists in 1998, revised in 2009 (Selting et al. 1998,
2009). This system is similar to the transcription system used in Conversation Analysis,
but it attempts a more linguistically systematic notation, especially with respect to prosody
in talk-in-interaction. The notation conventions can be found in the appendix.
For the demonstration of the relationship between verbal, vocal, and visual practices
in interaction I will present a sequence of interaction in which a storyteller and a recip-
ient enact a climax of storytelling. Here, as elsewhere in face-to-face interaction, all
kinds of resources are deployed to co-occur and work together in order to make the
respective activity recognizable for the recipients, who are then expected to respond
affiliatively.
5. Sample analysis
The following extract (from: LoE_VG_03_Parkausweis Gehbehinderte) shows the cli-
max of a complaint story told by Carina, and recipient Hajo’s responses. Carina has just
been telling how she came to be parking in a parking place for disabled people for a
very brief time and on coming back found a ticket on the windscreen of her car. In
response to this, Hajo provides a recipiency token in segment 10. With the following
segments Carina makes the climax of her complaint story recognizable to Hajo, who
then responds more strongly.
Carina’s story climax
In overlap with Hajo’s recipiency token, Carina in 11 produces the swear word
FUCK and then in 12 gives the sum she had to pay as a fine.
12 |<<whispery>SIEBzig euro.>
seventy euros
|((with raised eyebrows))
13 (-)
Fig. 38.1
Fig. 38.2
|<<dim>pArkplatz [(stEhn).>
parking place
|((nodding in synchrony with accented syllables,
|then gaze away from Hajo))
seventy euros for using a parking place for the disabled
Rhetorically and lexico-semantically, she does not add anything new, but only formu-
lates the egregious fine in a more elaborate form again. Syntactically, this is a non-finite
construction, mentioning only the bare fact, with the mentioning of the extreme sum of
the fine in a topicalized position, but it is longer than the first rendering. Prosodically,
the topicalized extreme sum is presented with an accented syllable rising to an extra-
high pitch peak and carrying some lengthening, thus signaling the focus of the unit
right from the beginning. The words in the rest of the unit carry a high number of addi-
tional secondary accents, namely five; these are not rhythmically organized but sepa-
rated by two brief pauses. Nevertheless, the accentuation is dense (see Selting 1994),
with only a few unaccented syllables between the accented ones, even though most of
the accents are not very strong. The unit ends in soft voice. Visually, Carina nods her
head in synchrony with the accented syllables; at first she continues to gaze at Hajo
and then directs her gaze away from him.
In this case it is not only the verbal, vocal and visual marking that displays the emotive
involvement, but also the fact that Carina repeats the egregious fact in more
or less the same words as before, thus drawing attention to it once again.
But in contrast to the first rendering as the climax, which seemed to re-enact her affect
in the storyworld, she now seems to comment on and evaluate the egregious fine for
Hajo in the here-and-now, and thus creates another opportunity for Hajo to respond.
Carina's in-situ evaluation of the complainable seems to be weaker and calmer than
her prior reconstructed rendering of it.
Again, this analysis can be warranted with reference to Hajo’s response at segment
16. Hajo provides ^HOLla. with a slow speech rate and a marked rising-falling pitch.
This can be seen in the acoustic analysis shown in Fig. 38.5, carried out with the software
programme PRAAT.
Just as Carina’s second formulation of her climax was longer than her first, so Hajo’s
second response cry is longer: it now has two syllables. And in comparison to his prior
response at 14, this second response cry is prosodically and visually less marked. As
Fig. 38.5, in comparison to Fig. 38.4, shows, the F0 peak is lower and the intensity is
lower, gradually rising and falling throughout the item. There is no pressed articulation
any longer, but a slow tempo. Nevertheless, it is still much more prominent with
respect to both pitch movement and intensity than his recipiency token hm_hm
from line 10 (shown in Fig. 38.3).

Fig. 38.3: PRAAT pitch (Hz) and intensity (dB) traces over time (s) of Hajo's recipiency token hm_hm (segment 10)
Fig. 38.4: PRAAT pitch (Hz) and intensity (dB) traces over time (s) of Hajo's first response cry (segment 14)
Fig. 38.5: PRAAT pitch (Hz) and intensity (dB) traces over time (s) of Hajo's second response cry ^HOLla (segment 16)

Hajo continues the visual marking of his first response:
he is gazing with his eyes wide open and with raised eyebrows, but does not add new
signals. This means: Just as Carina’s in-situ evaluation of the complainable was weaker
than her first re-enaction of it, so now Hajo’s second response is weaker than his first.
Nevertheless, it is a fully affiliative response to Carina’s complaint story.
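The kind of acoustic comparison reported here – F0 peak height and intensity – can in principle be reproduced with a short script; the following is a minimal sketch assuming the Python library parselmouth (a wrapper around Praat) and a hypothetical audio snippet of the response cry, not the analysis procedure actually used for Figs. 38.3–38.5.

# Minimal sketch (hypothetical file name): measuring the F0 peak and the
# intensity peak of a short response cry with parselmouth, a Python wrapper
# around Praat. Not the procedure used for Figs. 38.3-38.5.
import numpy as np
import parselmouth

snd = parselmouth.Sound("hajo_holla.wav")

pitch = snd.to_pitch()                      # pitch track in Hz
f0 = pitch.selected_array["frequency"]      # 0 marks unvoiced frames
voiced = f0[f0 > 0]
f0_peak = voiced.max() if voiced.size else float("nan")

intensity = snd.to_intensity()              # intensity track in dB
intensity_peak = intensity.values.max()

print(f"F0 peak: {f0_peak:.1f} Hz, intensity peak: {intensity_peak:.1f} dB")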
Analyzing the sequence, the form and succession of the two adjacency pairs by
Carina and the interaction between Carina and Hajo here suggest the following
interpretations:
Carina and Hajo thus display what M. H. Goodwin (1980) has called “mutual monitor-
ing” (see also C. Goodwin and M. H. Goodwin 1987). (On a different “epistemic ecology”
created through the two sequences in succession, see also C. Goodwin 2010; for
more detail on the sequential structure here, see Selting 2010a.)
The results of this analysis can be summarized as follows: In their presentation of
the climax of a complaint story – in which the storyteller uses general rhetorical resources
such as presenting the offender or the offense as acting or being unfair,
irrational, or offensive, and presenting the self as acting in a fair, rational, justified
manner – participants use verbal, vocal and visual cues to signal their emotive in-
volvement in telling and responding to the story. In particular, we saw the following
cues being deployed:
(a) Verbal cues:
Rhetorically and lexico-semantically: Reformulation of an action, extreme-case
formulations, swear words or expletives; sound objects that function as response cries.
Syntactically: short, dense “elliptical” constructions and clauses.
(b) Vocal cues:
Prosodically: prosodic marking cues such as extra-strong accents, extra-high pitch
peaks, lengthenings, dense accentuation, tempo changes, changes of pitch register.
Voice quality: whispery voice; pressed, tense voice.
(c) Visual cues:
Nodding in synchrony with accented syllables, gaze towards and away from the recipient, raised eyebrows, and wide-open eyes.
6. Conclusion
The extract has shown that participants in interaction deploy verbal, vocal and visual
cues and practices from several different domains in co-occurrence and concurrence
in order to make their actions recognizable and interpretable for the recipient,
thus enabling her or him to provide the appropriate recipient responses.
There is no one-to-one relationship between cues and meanings. Rather, bundles
of co-occurring cues suggest the interpretation both of actions and of concomitant
aspects such as affectivity or emotive involvement in general, while the specific
action or affect being displayed has to be interpreted within the sequential context.
In the case shown here, the relation between the verbal, vocal and visual cues was a sim-
ple one: all cues and practices were enacted as cues that support and enhance the actions
expressed via the other cues. In other words: all cues were deployed to comply with and
support each other. I have not dealt with any cases in which cues are deployed in a non-
congruent fashion, e.g., in order to suggest the interpretation of the message as exagger-
ated, fake, mock, ironic or the like. Neither have I dealt with any cases in which visual prac-
tices were used in place of speech or even as actions altogether independent of speech.
Acknowledgements
I am grateful to Elizabeth Couper-Kuhlen for comments on a previous version of this
paper. I also thank Maxi Kupetz for making the still frames and Yuko Sugita for carrying
out the PRAAT analyses presented in Figs. 38.3–38.5.
Appendix: Notation conventions

Pauses
(.) micropause
(-), (--), (---) brief, mid, longer pauses of ca. 0.25–0.75 secs.; up to ca. 1 sec.
(2.0) estimated pause, more than ca. 1 sec. duration
(2.85) measured pause (notation with two digits after the dot)
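As an illustration of how such pause notations can be operationalized, for instance when aligning transcripts with audio, the following hypothetical Python sketch maps the symbols listed above to rough duration estimates; the numeric values are approximations chosen for illustration and are not part of the convention.

# Hypothetical sketch: mapping the GAT pause notations listed above to rough
# duration estimates in seconds (approximations chosen for illustration only).
import re

PAUSE_ESTIMATES = {
    "(.)": 0.1,     # micropause
    "(-)": 0.25,    # brief pause
    "(--)": 0.5,    # mid pause
    "(---)": 0.75,  # longer pause (up to ca. 1 sec.)
}

def estimate_pause(token: str) -> float:
    """Return an approximate duration for a GAT pause token."""
    if token in PAUSE_ESTIMATES:
        return PAUSE_ESTIMATES[token]
    # Estimated or measured pauses such as (2.0) or (2.85) are taken at face value.
    match = re.fullmatch(r"\((\d+(?:\.\d+)?)\)", token)
    if match:
        return float(match.group(1))
    raise ValueError(f"not a GAT pause token: {token}")

print(estimate_pause("(--)"))    # 0.5
print(estimate_pause("(2.85)"))  # 2.85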
? rising to high
, rising to mid
- level
; falling to mid
. falling to low
SO falling
´SO rising
¯SO level
ˆSO rising-falling
ˇSO falling-rising
Rhythm
/xxx /xx x/xx rhythmically integrated talk: “/” is placed before a rhythmic beat
↑ to higher pitch
↓ to lower pitch
Laughter
Breathing
Other conventions
7. References
Auer, Peter 1991. Vom Ende deutscher Sätze. Zeitschrift für Germanistische Linguistik 19/2: 139–157.
Auer, Peter 1996. On the prosody and syntax of turn-continuations. In: Elizabeth Couper-Kuhlen
and Margret Selting (eds.), Prosody in Conversation. Interactional Studies, 57–100. Cambridge:
Cambridge University Press.
Bohle, Ulrike 2007. Das Wort Ergreifen – das Wort Übergeben. Explorative Studie zur Rolle Re-
debegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Couper-Kuhlen, Elizabeth 2000. Prosody. In: Jef Verschueren, Jan-Ola Östman, Jan Blommaert and
Chris Bulcaen (eds.), Handbook of Pragmatics, 1–19. Amsterdam: John Benjamins.
Couper-Kuhlen, Elizabeth 2007. Prosodische Prospektion und Retrospektion im Gespräch. In:
Heiko Hausendorf (ed.), Gespräch als Prozess. Linguistische Aspekte der Zeitlichkeit verbaler
Interaktion, 69–94. Tübingen: Narr.
Couper-Kuhlen, Elizabeth and Cecilia Ford (eds.) 2004. Sound Patterns in Interaction. Amsterdam:
John Benjamins.
Couper-Kuhlen, Elizabeth and Margret Selting (eds.) 1996. Prosody in Conversation. Interactional
Studies. Cambridge: Cambridge University Press.
Couper-Kuhlen, Elizabeth and Margret Selting (eds.) 2001. Studies in Interactional Linguistics. Am-
sterdam: John Benjamins.
Enfield, N. J. 2005. The body as a cognitive artifact in kinship representations: Hand gesture dia-
grams by speakers of Lao. Current Anthropology 46(1): 1–26.
Firth, John Rupert 1957. Papers in Linguistics, 1934–1951. Oxford: Oxford University Press.
French, Peter and John Local 1983. Turn-competitive incomings. Journal of Pragmatics 7: 701–715.
Goffman, Erving 1978. Response cries. Language 54: 787–815.
Goffman, Erving 1981. Forms of Talk. Oxford: Blackwell.
Goodwin, Charles 1996. Transparent vision. In: Elinor Ochs, Emanuel A. Schegloff and Sandra A.
Thompson (eds.), Interaction and Grammar, 370–404. Cambridge: Cambridge University Press.
Goodwin, Charles 2000. Action and embodiment within human interaction. Journal of Pragmatics
32: 1489–1522.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and
Elena Levy (eds.), Gesture and the Dynamic Dimensions of Language, 195–212. Amsterdam:
John Benjamins.
Goodwin, Charles 2010. Constructing meaning through prosody in aphasia. In: Dagmar Barth-
Weingarten, Elisabeth Reber and Margret Selting (eds.), Prosody in Interaction, 373–394.
Amsterdam: John Benjamins.
Goodwin, Charles and Marjorie Harness Goodwin 1987. Concurrent operations on talk: Notes on
the interactive organization of assessments. Papers in Pragmatics 1: 1–54.
Goodwin, Marjorie Harness 1980. Processes of mutual monitoring implicated in the production of
descriptive sequences. Sociological Inquiry 50: 303–317.
Goodwin, Marjorie Harness 1996. Informings and announcements in their environment: prosody
within a multi-activity work setting. In: Elizabeth Couper-Kuhlen and Margret Selting (eds.),
Prosody in Conversation. Interactional Studies, 436–461. Cambridge: Cambridge University
Press.
Goodwin, Marjorie Harness and Charles Goodwin 1986. Gesture and coparticipation in the activ-
ity of searching for a word. Semiotica 62: 51–75.
Goodwin, Marjorie Harness and Charles Goodwin 2000. Emotion within situated activity. In:
Alessandro Duranti (ed.), Linguistic Anthropology: A Reader, 239–257. Oxford: Blackwell.
Günthner, Susanne 2000. Vorwurfsaktivitäten in der Alltagsinteraktion. Tübingen: Niemeyer.
Hakulinen, Auli and Margret Selting (eds.) 2005. Syntax and Lexis in Conversation. Amsterdam:
John Benjamins.
Have, Paul ten 1999. Doing Conversation Analysis. London: Sage.
Hayashi, Makoto 2005. Joint turn construction through language and the body: Notes on embodi-
ment in coordinated participation in situated activities. Semiotica 156(1/4): 21–53.
Hutchby, Ian and Robin Wooffitt 1998. Conversation Analysis. Cambridge: Polity Press.
Kelly, John and John Local 1989. Doing Phonology. Observing, Recording, Interpreting. Manche-
ster: Manchester University Press.
Lerner, Gene 2003. Selecting next speaker: The context-sensitive operation of a context-free orga-
nization. Language in Society 32: 177–201.
Levinson, Stephen C. 2006. On the human “interactional engine”. In: N. J. Enfield and Stephen C.
Levinson (eds.), Roots of Human Sociality: Culture, Cognition and Interaction, 39–69. Oxford:
Berg.
Mondada, Lorenza 2006. Participants’ online analysis and multimodal practices: Projecting the
end of the turn and the closing of the sequence. Discourse Studies 8(1): 117–129.
Mondada, Lorenza 2007. Multimodal resources for turn-taking: Pointing and the emergence of
possible next speakers. Discourse Studies 9(2): 194–225.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in a conversation. In: Monica Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures. Mean-
ing and Use, 259–265. Porto: Universidade Fernando Pessoa Press.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguis-
tics Encyclopedia, 510–518. London: Routledge.
Peräkylä, Anssi and Johanna Ruusuvuori 2006. Facial expression in an assessment. In: Hubert
Knoblauch, Jürgen Raab, Hans-Georg Soeffner and Bernt Schnettler (eds.), Video Analysis:
Methodology and Methods, 127–142. Frankfurt am Main: Peter Lang.
Pomerantz, Anita 1986. Extreme case formulations: A way of legitimizing claims. Human Studies
9: 219–229.
Reber, Elisabeth 2008. Affectivity in talk-in-interaction: Sound objects in English. Ph.D. disserta-
tion. University of Potsdam.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the orga-
nization of turn-taking for conversation. Language 50: 696–735.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and
John Heritage (eds.), Structures of Social Action, 266–296. Cambridge: Cambridge University
Press.
Schegloff, Emanuel A. 1996. Turn organisation: One intersection of grammar and interaction. In:
Elinor Ochs, Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar,
52–133. Cambridge: Cambridge University Press.
Schegloff, Emanuel 1997. Whose text? Whose context? Discourse and Society 8: 165–187.
Schegloff, Emanuel A. 2005. On integrity in inquiry … of the investigated, not the investigator.
Discourse Studies 7(4–5): 455–480.
Schegloff, Emanuel A. 2007. Sequence Organization in Interaction. A Primer in Conversation Ana-
lysis. Cambridge: Cambridge University Press.
Schmitt, Reinhold 2007. Von der Konversationsanalyse zur Analyse multimodaler Interaktion. In:
Heidrun Kämper and Ludwig M. Eichinger (eds.), Sprach-Perspektiven. Germanistische Lin-
guistik und das Institut für Deutsche Sprache, 395–417. Tübingen: Narr.
Schönherr, Beatrix 1993. Prosodische und nonverbale Signale für Parenthesen. “Parasyntax” in
Fernsehdiskussionen. Deutsche Sprache 21: 223–243.
Schönherr, Beatrix 1997. Syntax – Prosodie – Nonverbale Kommunikation. Tübingen: Niemeyer.
Selting, Margret 1994. Emphatic speech style – with special focus on the prosodic signalling of
heightened emotive involvement in conversation. Journal of Pragmatics 22: 375–408.
Selting, Margret 1995a. Der “mögliche Satz” als interaktiv relevante syntaktische Kategorie. Lin-
guistische Berichte 158: 298–325.
Selting, Margret 1995b. Prosodie im Gespräch. Aspekte einer Interaktionalen Phonologie der
Konversation. Tübingen: Niemeyer.
Selting, Margret 1996. On the interplay of syntax and prosody in the constitution of turn-construc-
tional units and turns in conversation. Pragmatics 6(3): 357–388.
Selting, Margret 2000. The construction of units in conversational talk. Language in Society 29:
477–517.
Selting, Margret 2001. Fragments of units as deviant cases of unit-production in conversational
talk. In: Margret Selting and Elizabeth Couper-Kuhlen (eds.), Studies in Interactional Linguis-
tics, 229–258. Amsterdam: John Benjamins.
Selting, Margret 2008. Linguistic resources for the management of interaction. In: Gerd Antos,
Eija Ventola and Tilo Weber (eds.), Handbook of Interpersonal Communication, 217–253.
Volume 2. Berlin: De Gruyter.
Selting, Margret 2010a. Affectivity in conversational storytelling: An analysis of displays of anger
or indignation in complaint stories. Pragmatics 20(2): 229–277.
Selting, Margret 2010b. Prosody in interaction: State of the art. In: Dagmar Barth-Weingarten, Eli-
sabeth Reber and Margret Selting (eds.), Prosody in Interaction, 3–40. Amsterdam: John
Benjamins.
Selting, Margret, Peter Auer, Birgit Barden, Jörg Bergmann, Elizabeth Couper-Kuhlen,
Susanne Günthner, Uta Quasthoff, Christoph Meier, Peter Schlobinski and Susanne Uh-
mann 1998. Gesprächsanalytisches Transkriptionssystem (GAT). Linguistische Berichte
173: 91–122.
Selting, Margret, Peter Auer, Dagmar Barth-Weingarten, et al. 2009. Gesprächsanalytisches Trans-
kriptionssystem 2 (GAT 2). Gesprächsforschung – Online-Zeitschrift zur verbalen Interaktion
10(2009): 292–341 (www.gespraechsforschung-ozs.de).
Sidnell, Jack 2006. Coordinating gesture, talk, and gaze in reenactments. Research on Language
and Social Interaction 39(4): 377–409.
Stivers, Tanya 2008. Stance, alignment, and affiliation during storytelling: When nodding is a token
of affiliation. Research on Language and Social Interaction 41(1): 31–57.
Stivers, Tanya and Jack Sidnell 2005. Introduction: Multimodal interaction. Semiotica 156(1/4):
1–20.
Streeck, Jürgen 2009. Forward-Gesturing. Discourse Processes 46: 161–179.
Wilkinson, Sue and Celia Kitzinger 2006. Surprise as an interactional achievement: Reaction to-
kens in conversation. Social Psychology Quarterly 69(2): 150–182.
Abstract
The article considers whether verbal communication is fundamentally different from
nonverbal communication. By tracing back origins, message features, and features of
neurophysiological processing of verbal and nonverbal communication, the article iden-
tifies the nonverbal codes that form the nonverbal communication system and then more
broadly examines the communicative functions of nonverbal communication on a range
of different levels (e.g., structural, personal, interactional).
signals and in the communicative functions they perform (see Bavelas and Chovil 2000
for an integrated message model, and McNeill 2005 for growth point theory, both of
which detail the interdependence among verbal and nonverbal signals in forming mes-
sages). In what follows, we first disaggregate the parts to identify the nonverbal codes
that form the nonverbal communication system before reassembling them into the com-
municative functions that represent what nonverbal communication is meant to do.
Although our major goal in this chapter is to address what constitutes nonverbal com-
munication, we will allude periodically in this brief overview to the ways in which
nonverbal codes articulate with language to form a total meaningful utterance or
expression.
As a last prefatory clarification, the term gesture itself is open to a wide range of in-
terpretations, ranging from the full gamut of nonverbal communication codes to be dis-
cussed in this chapter to the narrower and more familiar interpretation referring to
displays performed by the limbs (usually hands or head). Although we will tend to
use the term gesture to refer to the latter, we recognize that gestures can be construed
as commensurate with all forms of nonverbal communication, including actions of the
head, face, eyes, hands, feet, trunk and voice; use of touch and distancing; physical
appearance; and manipulation of time, environments and artifacts. Thus, our usage is
one of semantic convenience rather than reflecting a philosophical distinction.
Two additional codes expressed through the body are the approach-avoidance codes.
At the approach end is haptics, which refers to communication through touch. At the
avoidance end is proxemics, which encompasses interpersonal distances, spatial
arrangements, and use of territory as forms of communication.
The two remaining codes, like proxemics, are part of what Hall (1959) called “the
silent language” or “the hidden dimension” because they refer to implicit messages
that are deeply imprinted in culture and yet operate most of the time outside conscious
awareness. Chronemics refers to the use and perception of time as a communication sys-
tem. Features such as lead time, wait time, punctuality and pacing are among the wide
range of chronemic messages that are possible. Finally, environment and artifacts con-
stitutes another nonverbal code related to place. It includes elements such as architec-
tural features, furniture arrangement, temperature, noise, and lighting. Many of these
codes also incorporate verbal cues. For example, “keep out” signs, like other territorial
barriers, regulate space and territory; t-shirts with slogans are appearance cues that send
a verbal message (Guerrero and Farinelli 2009).
These eight nonverbal codes rely on visual, olfactory, auditory, tactile, temporal, and
spatial channels for the generation, transmission, receipt and interpretation of mes-
sages. Although the codes are in principle separable, each with its own distinctive struc-
tural properties, the various codes are better understood as part of a highly
interdependent system of communication, complete with redundant and substitutable
forms of expression, all in service of specific communication functions. Although they
are also highly interrelated with the verbal stream, they are not mere handmaidens
to verbal communication. Instead, they can stand alone, performing important commu-
nication functions independent of any words being uttered. Thus, it becomes useful
to distinguish the ways in which verbal and nonverbal communication are alike or
different.
display emotion in particular ways even if they cannot see the emotional expressions of
others (Eibl-Eibesfeldt 1973; Galati, Scherer, and Ricci-Bitti 1997).
Second, nonverbal communication has phylogenetic primacy over verbal communi-
cation (Burgoon, Guerrero, and Floyd 2010). Nonverbal communication predated ver-
bal communication in the evolutionary history of the human species; before people
learned to speak and use language, they communicated through nonverbal means.
Third, nonverbal communication also has ontogenetic primacy over verbal commu-
nication, which means that children learn how to communicate nonverbally before
they learn how to communicate using language. These combined characteristics of non-
verbal communication result in it often being trusted over verbal communication when
the two conflict (Burgoon, Guerrero, and Floyd 2010).
Other features that help parse out the differences between nonverbal and verbal
communication include the extent to which a message is multimodal, spontaneous,
reflexive, and occurs in the here and now (Burgoon 1985; Guerrero and Farinelli
2009). Nonverbal messages tend to be more multimodal or multichanneled, which
means that people can display various nonverbal behaviors – such as smiling while lean-
ing forward and tossing one’s head back – all at the same time. In contrast, verbal mes-
sages tend to be unimodal, inasmuch as people can only say one word at a time.
Exceptions would be communicating dual verbal messages through means such as hold-
ing up a protest sign while chanting the same (or similar) words that appear on the sign.
Nonverbal messages also tend to be more spontaneous than verbal messages. Some non-
verbal behaviors, such as speaking with a nervous voice when giving a speech, are par-
ticularly hard to control. Verbal communication, on the other hand, requires at least a
minimal level of forethought for encoding to take place. Thus, although nonverbal com-
munication is often intentional and strategic, it generally tends to be more spontaneous
than verbal communication.
Verbal communication, however, tends to be characterized by more reflexivity and
displacement (Burgoon 1985). Reflexivity is the degree to which a code can reflect
upon itself. People can make statements such as “I’m sorry I said that,” or “Let me
re-word that” but it is difficult to use nonverbal behaviors to modify or direct others
to re-interpret one’s previous nonverbal behaviors. Displacement involves being able to
refer to things that are removed in time and space. Again, people can accomplish this
with words (e.g., “I was not myself yesterday”) but it is difficult, if not impossible, to
communicate the same sentiment with nonverbal communication.
4. Neurophysiological processing
The ways in which the human brain processes nonverbal versus verbal information also
differ to some extent. Scholars have made an important distinction between analogic
and digital messages. An analogic signal contains a continuous range of values, whereas
a digital signal contains a finite set of values. Analogic signals also tend to be processed
holistically, whereas digital signals involve processing discrete units of information. For
example, children often first learn the alphabet by singing it (analogic encoding and de-
coding) but it takes a while for them to understand that the alphabet is composed of
discrete units called letters (digital encoding). Whereas verbal communication tends
to be processed digitally, nonverbal signals may be processed analogically or digitally.
Recognition of words, symbols, and emblematic gestures takes place in the left side of
the brain, which is primarily responsible for tasks that involve using logic and analytical
reasoning and other forms of digital processing. On the other hand, voice recognition,
depth perception, and tasks related to emotions, space, pictures, and music tend to
create more activity in the right side of the brain, which is primarily responsible for ana-
logic processing (e.g., Hopkins 2007). Andersen (2008) argued that verbal communica-
tion tends to be digital and nonverbal communication, analogic: People understand
language by recognizing individual words, whereas nonverbal expressions are con-
tinuous and holistic. For example, people look at the whole face rather than dissecting
smaller movements of the eyes and mouth.
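To make the contrast concrete, a minimal sketch follows: an analogic signal may take any value in a continuous range, while a digital encoding collapses the same dimension onto a finite set of symbols. The thresholds and labels below are invented for illustration only.

```python
# Sketch: the analogic/digital contrast as continuous values vs. a finite symbol set.
# Thresholds and labels are invented placeholders for the illustration.

def analogic_smile(intensity: float) -> float:
    """An 'analogic' signal: any value in the continuous range 0.0-1.0."""
    return max(0.0, min(1.0, intensity))

def digital_smile(intensity: float) -> str:
    """A 'digital' encoding: the same dimension collapsed onto a finite set of symbols."""
    if intensity < 0.33:
        return "neutral"
    elif intensity < 0.66:
        return "smile"
    return "broad smile"

print(analogic_smile(0.47))   # 0.47 -- graded, read holistically
print(digital_smile(0.47))    # 'smile' -- one label from a finite, discrete set
```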
However, neurophysiological and discourse-oriented research both suggest that this
digital-analogic distinction is more complex than originally thought (Bavelas and
Chovil 2000; Efron 1990; MacLean 1990; McNeill 1992; Ploog 2003). First, findings sup-
porting this distinction are largely limited to initial perceptions that occur during the
decoding process. So while people may initially perceive many nonverbal behaviors
using right hemispheric processing, they may make sense of those perceptions using
both sides of their brain. Similarly, people may decode many nonverbal messages ana-
logically, but the encoding of those same messages may be a different story, especially
when a nonverbal message is sent with intent. Researchers have also proposed that the
brain is better understood as containing three parts: a primitive brain (located near the
brain stem) that controls instinctive behaviors such as screaming, defending one’s ter-
ritory, and protecting one’s child; a paleomammalian brain (composed of the limbic sys-
tem surrounding the primitive brain), which controls emotional expression and bodily
functions; and a neomammalian brain (part of the cerebral cortex), which controls
higher-order cognitive activity such as language processing. The tripartite brain approach
similarly implies more interdependence in verbal and nonverbal encoding and decoding
than early perspectives assumed, although clear neurophysiological differences in the
processing of verbal and nonverbal signals remain.
Other assumptions are that nonverbal communication is strategic, dynamic and iterative: nonverbal expressions are created intentionally to achieve various ends, and they do not remain static but are changeable, evolving as a function of feedback
between interlocutors (see Burgoon and Saine 1978, and Patterson 1983, for fuller
articulation of the assumptions of a functional perspective).
Nonverbal communication, in particular, has been connected to the following func-
tions: message production and processing, expression of emotions, identification and
self-presentation, interaction management, relational communication, social influence,
and deception.
people may be more likely to hide feelings of anger than people from the U.S. because
of different cultural rules related to politeness and the importance of group harmony.
Other scholars support a behavioral ecology perspective (Fridlund and Duchaine
1996), which suggests that rather than altering one’s innate emotional expression by ap-
plying display rules, people display emotions that are consistent with their social mo-
tives. So, people might inhibit expressions of anger because they want to maintain a
positive relationship with someone. According to the behavioral ecology perspective,
the resulting expression would not reflect that someone is hiding anger, but rather
that someone wishes to be perceived positively by a relational partner. This perspective
aligns with Buck’s (1997) concept of social signals that reflect intentionally formulated
displays rather than expressive read-outs of internal experiences.
Relatedly, a significant body of scholarship has examined the extent to which culture
moderates emotional encoding and decoding. Evidence from meta-analyses confirms
that people from different cultures decode facial expressions of basic emotions simi-
larly, although there is an in-group advantage, with people most likely to decode facial
expressions accurately when they are viewing someone from their own culture (Elfen-
bein and Ambady 2002, 2003). There is also cross-cultural similarity in how people
encode basic emotions, especially when emotional expressions are spontaneous. How-
ever, people from different cultures also exhibit nonverbal accents, which help people
identify their country of origin. For example, people can differentiate between Australians
and U.S. citizens based on their smiles (Marsh, Elfenbein, and Ambady 2003).
individual” or “to maintain or modify aspects of another individual that are proximal
to behavior, such as cognitions, emotions, and identities” (Dillard, Anderson, and
Knobloch 2002: 426). A long history of research links nonverbal behavior to social influ-
ence. In the area of compliance-gaining, a meta-analysis demonstrated that nonverbal
communication is just as important as verbal communication for getting people to com-
ply with requests, with acquiescence more likely if requestors engage in positive and
socially acceptable forms of touch, use eye contact, present a professional and well-
groomed appearance, and stand moderately close to the person whom they are trying
to persuade (Segrin 1993). The associations between these behaviors and compliance-
gaining, however, are mediated by at least two variables. First, the credibility of the re-
questor is vitally important. Individuals who are considered to be trustworthy, likeable,
composed, competent, and charismatic are more likely to be perceived as credible, and
therefore to be more persuasive (Burgoon, Birk, and Pfau 1990). Second, people are
more likely to comply with requests that they perceive to be legitimate compared to
those that are perceived to be unreasonable and unnecessary (e.g., Baron 1978).
The way that nonverbal cues are processed in persuasive contexts may also modify
the relationship between these cues and enduring social influence. Petty and Cacioppo’s
(1986) classic work on the elaboration likelihood model of persuasion suggests that non-
verbal cues can be processed one of two ways – through a central or a peripheral route.
Central route processing involves carefully considering the relevance and merit of the
information or behavior that is presented, whereas peripheral route processing involves
assigning meaning to a simple cue, such as appearance, without much (if any) appraisal
of that cue. Nonverbal cues can be processed either way. For example, a voter might
evaluate a political candidate’s facial expressions carefully to try to determine if the
candidate really cares about the issues she is discussing (central processing), or the
voter may simply note that the candidate smiled and infer that she is a friendly person
(peripheral processing). Long-term social influence is more likely when information is
processed centrally. Nonetheless, as Burgoon, Birk, and Pfau’s (1990) research suggests,
cues that are processed peripherally can lead to perceptions of credibility, which can,
in turn, affect the persuasion process. In the political realm, nonverbal cues such as
facial expressions of happiness (Masters et al. 1987), physical appearance cues (Lau
and Redlawsk 2001), and vocal pitch (Gregory and Gallagher 2002) act as heuristic
cues that impact credibility judgments and voter preferences.
5.7. Deception
Deception is included as a unique communication function because it reflects situations
in which communicators knowingly manage a message to “foster a false belief or con-
clusion by a receiver” (Buller and Burgoon 1996: 205). That is, they covertly violate the
Gricean maxims of cooperative discourse (Grice 1989; McCornack 1992). Although
considerable research has been conducted on nonverbal deception cues, virtually all
deception scholars note that no single cue serves as a reliable indicator of deception
(e.g., Vrij 2006). Cues related to anxiety and tension that often accompany deception
(cues such as heightened pitch and nonfluencies) have been identified across studies,
but the complexity of interaction makes detecting deception from such cues very diffi-
cult. A key element of the communication perspective on deception is the recognition
that deceivers manage their behavior and respond to the communicative actions of
others. Thus, rather than seeking to identify behaviors that inadvertently “leak” infor-
mation about deception, communication researchers have explored how deception oc-
curs in interactive contexts. Interpersonal deception theory (IDT; Buller and Burgoon
1996) provides a conceptual framework for explaining how the behavior of both deceiv-
ers and receivers contributes to the success (or failure) of deceivers and the impressions
formed by their interaction partners. IDT emphasizes the numerous communication
tasks liars must accomplish simultaneously during interaction, such as managing anxiety
and responding appropriately. IDT also notes that in face-to-face encounters, decei-
vers and receivers both influence the interaction. When people communicate with one
another, a number of interaction processes, such as synchrony and matching, come into
play (Burgoon et al. 1999). For instance, White and Burgoon (2001) found that although
deceivers were less nonverbally involved initially in interactions, they increased their
involvement over time. This adjustment was most notable when deceivers were inter-
acting with a partner who displayed increased nonverbal involvement. Thus, examina-
tion of deception as a communication function requires examination of the behavior of
all participants in the interaction.
6. Summary
A communication perspective views verbal and nonverbal communication as intimately
intertwined components of the human signaling system that has both biological and
social origins. Together, language and nonverbal codes form a goal-oriented system
for creating and sharing meaning among members of a social community. The nonver-
bal codes of kinesics, vocalics, physical appearance, proxemics, haptics, chronemics and
artifacts, separately or in combination with one another and with linguistic components,
accomplish such communication functions as producing and comprehending messages;
expressing and interpreting emotions; creating personal and social identities and
managing self-presentations; managing interactions; defining interpersonal relations;
influencing others; and perpetrating and detecting deception.
7. References
Afifi, Walid A. and Michelle L. Johnson 1999. The use and interpretation of tie signs in a public
setting: Relationship and sex differences. Journal of Social and Personal Relationships 16: 9–38.
Andersen, Peter A. 1985. Nonverbal immediacy in interpersonal communication. In: Aron Wolfe
Siegman and Stanley Feldstein (eds.), Multichannel Integrations of Nonverbal Behavior, 1–36.
Hillsdale, NJ: Lawrence Erlbaum.
Andersen, Peter A. 1991. When one cannot not communicate: A challenge to Motley’s traditional
communication patterns. Communication Studies 42(4): 309–325.
Andersen, Peter A. 1998. The cognitive valence theory of intimate communication. In: Mark T.
Palmer and George A. Barnett (eds.), Progress in Communication Sciences, Volume 14: Mutual
Influence in Interpersonal Communication Theory and Research in Cognition, Affect, and
Behavior, 39–72. Norwood, NJ: Ablex.
Andersen, Peter A. 2008. Nonverbal Communication: Forms and Functions, second edition. Long
Grove, IL: Waveland Press.
Baron, Robert A. 1978. Invasions of personal space and helping: Mediating effects of the invader’s
apparent need. Journal of Experimental Social Psychology 14: 304–312.
Bavelas, Janet Beavin 1990. Behaving and communicating: A reply to Motley. Western Journal of
Speech Communication 54: 593–602.
Bavelas, Janet Beavin, Alex Black, Charles R. Lemery and Jennifer Mullett 1986. “I show how you feel”: Motor mimicry as a communicative act. Journal of Personality and Social Psychology
50: 322–329.
Bavelas, Janet Beavin and Nicole Chovil 2000. Visible acts of meaning: An integrated message
model of language in face-to-face dialogue. Journal of Language and Social Psychology 19:
163–194.
Bavelas, Janet Beavin and Nicole Chovil 2006. Nonverbal and verbal communication: Hand ges-
tures and facial displays as part of language use in face-to-face dialogue. In: Valerie Lynn Man-
usov and Miles L. Patterson (eds.), The Sage Handbook of Nonverbal Communication, 97–115.
Thousand Oaks, CA: Sage.
Buck, Ross 1988. Nonverbal communication: Spontaneous and symbolic aspects. American Behav-
ioral Scientist 31: 341–354.
Buck, Ross 1997. From DNA to MTV: The spontaneous communication of emotional messages.
In: John O. Greene (ed.), Message Production: Advances in Communication Theory, 313–339.
Mahwah, NJ: Lawrence Erlbaum.
Buller, David B. and Judee K. Burgoon 1996. Interpersonal deception theory. Communication
Theory 6: 203–242.
Burgoon, Judee K. 1983. Nonverbal violations of expectations. In: John M. Wiemann and Randall
P. Harrison (eds.), Nonverbal Interaction: Volume 11. Sage Annual Reviews of Communication,
11–77. Beverly Hills, CA: Sage.
Burgoon, Judee K. 1985. The relationship of verbal and nonverbal codes. In: Brenda Dervin and
Melvin J. Voight (eds.), Progress in Communication Sciences, Volume 6, 263–298. Norwood, NJ:
Ablex.
Burgoon, Judee K. and Aaron E. Bacue 2003. Nonverbal communication skills. In: Brant Raney
Burleson and John O. Greene (eds.), Handbook of Communication and Social Interaction
Skills, 179–219. Mahwah, NJ: Lawrence Erlbaum.
Burgoon, Judee K., Thomas Birk and Michael Pfau 1990. Nonverbal behaviors, persuasion, and
credibility. Human Communication Research 17: 140–169.
Burgoon, Judee K., David B. Buller, Amy S. Ebesu, Patricia A. Rockwell and Cindy White 1996.
Testing interpersonal deception theory: Effects of suspicion on nonverbal behavior and rela-
tional messages. Communication Theory 6: 243–267.
Burgoon, Judee K., David B. Buller, Cindy H. White, Walid A. Afifi and Aileen L. S. Buslig 1999.
The role of conversation involvement in deceptive interactions. Personality and Social Psychol-
ogy Bulletin 25: 669–685.
Burgoon, Judee K. and Leesa Dillman 1995. Gender, immediacy and nonverbal communication.
In: Pamela J. Kalbfleisch and Michael J. Cody (eds.), Gender, Power, and Communication in
Human Relationships, 63–81. Hillsdale, NJ: Erlbaum.
Burgoon, Judee K. and Norah E. Dunbar 2006. Dominance, power and influence. In: Valerie Man-
usov and Miles Patterson (eds.), The SAGE Handbook of Nonverbal Communication, 279–298.
Thousand Oaks, CA: Sage.
Burgoon, Judee K., Kory Floyd and Laura K. Guerrero 2010. Nonverbal communication theories
of adaptation. In: Charles Berger, Michael E. Roloff and David R. Roskos-Ewoldsen (eds.),
The New Sage Handbook of Communication Science, 93–108. Thousand Oaks, CA: Sage.
Burgoon, Judee K., Laura K. Guerrero and Kory Floyd 2010. Nonverbal Communication. Boston:
Allyn and Bacon.
Burgoon, Judee K. and Jerold L. Hale 1984. The fundamental topoi of relational communication.
Communication Monographs 51: 193–214.
Burgoon, Judee K. and Jerold L. Hale 1988. Nonverbal expectancy violations: Model elaboration
and application to immediacy behaviors. Communication Monographs 55: 58–79.
Burgoon, Judee K. and Gregory D. Hoobler 2002. Nonverbal signals. In: Mark L. Knapp and John
Augustine Daly (eds.), Handbook of Interpersonal Communication, 240–299. Thousand Oaks,
CA: Sage.
Burgoon, Judee K. and Deborah A. Newton 1991. Applying a social meaning model to relational
messages of conversational involvement: Comparing participant and observer perspectives.
Southern Communication Journal 56: 96–113.
Burgoon, Judee K. and Thomas J. Saine 1978. The Unspoken Dialogue. Boston: Houghton-Mifflin.
Burgoon, Judee K., Lesa A. Stern and Leesa Dillman 1995. Interpersonal Adaptation: Dyadic
Interaction Patterns. New York: Cambridge University Press.
Cappella, Joseph N. 1987. Interpersonal communication: Definition and fundamental questions.
In: Charles R. Berger and Steven H. Chaffee (eds.), Handbook of Communication Science,
184–238. Newbury Park, CA: Sage.
Cappella, Joseph N. 1994. The management of conversational interaction in adults and infants. In:
Mark L. Knapp and Gerald R. Miller (eds.), Handbook of Interpersonal Communication, sec-
ond edition, 380–419. Thousand Oaks, CA: Sage.
Cappella, Joseph N. 2006. The interaction management function of nonverbal cues. In: Valerie
Lynn Manusov and Miles L. Patterson (eds.), The Sage Handbook of Nonverbal Communica-
tion, 361–379. Thousand Oaks, CA: Sage.
Cappella, Joseph N. and John O. Greene 1982. A discrepancy-arousal explanation of mutual influ-
ence in expressive behavior for adult-adult and infant-adult dyadic interaction. Communica-
tion Monographs 49: 89–114.
Coker, Deborah A. and Judee K. Burgoon 1987. The nature of conversational involvement and
nonverbal encoding patterns. Human Communication Research 13: 463–494.
DePaulo, Bella M. 1992. Nonverbal behavior and self-presentation. Psychological Bulletin 111: 203–240.
Dillard, James Price, Jason W. Anderson and Leanne K. Knobloch 2002. Interpersonal influence.
In: Mark L. Knapp and John A. Daly (eds.), Handbook of Interpersonal Communication, third
edition, 423–474. Thousand Oaks, CA: Sage.
Efron, Robert 1990. The Decline and Fall of Hemispheric Specialization. Hillsdale, NJ: Lawrence
Erlbaum.
Eibl-Eibesfeldt, Irenäus 1973. Expressive behaviour of the deaf and blind born. In: Mario von
Cranach and Ian Vine (eds.), Social Communication and Movement, 163–194. New York: Aca-
demic Press.
Ekman, Paul 1971. Universal and cultural differences in facial expressions of emotion. In: James K.
Cole (ed.), Nebraska Symposium on Motivation, 207–283. Lincoln: University of Nebraska Press.
Ekman, Paul, E. Richard Sorenson and Wallace V. Friesen 1969. Pan-cultural elements in facial
displays of emotion. Science 164: 86–88.
Elfenbein, Hillary Anger and Nalini Ambady 2002. On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin 128: 203–235.
Elfenbein, Hillary Anger and Nalini Ambady 2003. When familiarity breeds accuracy: Cultural expo-
sure and facial emotion recognition. Journal of Personality and Social Psychology 85: 276–290.
Fridlund, Alan J. and Bradley Duchaine 1996. Facial expressions of emotion and the delusion of
the hermetic self. In: Rom Harré and W. Gerrod Parrott (eds.), The Emotions: Social, Cultural,
and Biological Dimensions, 259–284. Thousand Oaks, CA: Sage.
Galati, Dario, Klaus R. Scherer and Pio E. Ricci-Bitti 1997. Voluntary facial expression of emo-
tion: Comparing congenitally blind with normally sighted encoders. Journal of Personality and
Social Psychology 73(6): 1363–1379.
Gallois, Cindy, Tania Ogay and Howard Giles 2005. Communication accommodation theory: A
look back and a look ahead. In: William B. Gudykunst (ed.), Theorizing about Intercultural
Communication, 121–148. Thousand Oaks, CA: Sage.
Giles, Howard and Tania Ogay 2006. Communication accommodation theory. In: Bryan Whalen
and Wendy Samter (eds.), Explaining Communication: Contemporary Theories and Exemplars,
293–310. Mahwah, NJ: Lawrence Erlbaum.
Goffman, Erving 1959. The Presentation of Self in Everyday Life. Garden City, NY: Anchor/Doubleday.
Grice, Paul 1989. Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Gregory, Stanford W. and Timothy J. Gallagher 2002. Spectral analysis of candidates’ nonverbal
vocal communication: Predicting U.S. presidential election outcomes. Social Psychology Quar-
terly 65: 8–315.
Guerrero, Laura K. and Lisa Farinelli 2009. Key characteristics of messages: The interplay of ver-
bal and nonverbal codes. In: William F. Eadie (ed.), 21st Century Communication: A Reference
Handbook, 239–248. Thousand Oaks, CA: Sage.
Guerrero, Laura K. and Kory Floyd 2006. Nonverbal Communication in Close Relationships. Mah-
wah, NJ: Lawrence Erlbaum.
Hall, Edward T. 1959. The Silent Language. Garden City, NY: Anchor/Doubleday.
Hall, Judith A. 2006. Women’s and men’s nonverbal communication: Similarities, differences,
stereotypes, and origins. In: Valerie Lynn Manusov and Miles L. Patterson (eds.), The Sage
Handbook of Nonverbal Communication, 201–218. Thousand Oaks, CA: Sage.
Hall, Judith A. and Mark L. Knapp 2010. Nonverbal Communication in Human Interaction, sixth
edition. Boston: Wadsworth.
Hopkins, William D. (ed.) 2007. The Evolution of Hemispheric Specialization in Primates. New
York: Academic Press.
Izard, Carroll Ellis 1977. Human Emotions. New York: Plenum.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonver-
bal communication: What do conversational hand gestures tell us? In: Mark P. Zanna (ed.),
Advances in Experimental Social Psychology, Volume 28, 389–450. New York: Academic
Press.
LaFrance, Marianne, Marvin A. Hecht and Elizabeth Levy Paluck 2003. The contingent smile: A
meta-analysis of sex differences in smiling. Psychological Bulletin 129: 305–334.
LaPlante, Debi and Nalini Ambady 2000. Multiple messages: Facial recognition advantage for
compound expressions. Journal of Nonverbal Behavior 24: 211–224.
Lau, Richard R. and David P. Redlawsk 2001. Advantages and disadvantages of cognitive heuris-
tics in political decision-making. American Journal of Political Science 45: 951–971.
MacLean, Paul D. 1990. The Triune Brain in Evolution: Role in Paleocerebral Functions. New
York: Plenum.
Manusov, Valerie Lynn 1992. Mimicry or synchrony: The effects of intentionality attributions for
nonverbal mirroring behavior. Communication Quarterly 40: 69–83.
Marsh, Abigail A., Hillary Anger Elfenbein and Nalini Ambady 2003. Nonverbal “accents”: Cul-
tural difference in facial expressions of emotion. Psychological Science 14: 373–376.
Masters, Roger D., Denis G. Sullivan, Alice Feola and Gregory J. McHugo 1987. Television cov-
erage of candidates’ display behavior during the 1984 Democratic primaries in the United
States. International Political Science Review 8: 121–130.
McCornack, Steven A. 1992. Information manipulation theory. Communication Monographs 59:
1–16.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mehrabian, Albert 1969. Significance of posture and position in the communication of attitude
and status relationships. Psychological Bulletin 71: 359–372.
Morsella, Ezequiel and Robert M. Krauss 2001. Movement facilitates speech production: A ges-
ture feedback model. Unpublished manuscript.
Morsella, Ezequiel and Robert M. Krauss 2004. The role of gestures in spatial working memory
and speech. American Journal of Psychology 117: 417–424.
Morsella, Ezequiel and Robert M. Krauss 2005. Can motor states influence semantic processing?
Evidence from an interference paradigm. In: Alexandra M. Columbus (ed.), Advances in Psy-
chology Research, Volume 36, 163–182. Hauppauge, NY: Nova.
Motley, Michael T. 1990. On whether one can(not) not communicate: An examination via tradi-
tional communication postulates. Western Journal of Speech Communication 54: 1–20.
Motley, Michael T. 1991. One may not communicate: A reply to Andersen. Communication Stu-
dies 42: 326–339.
Özyürek, Asli 2002. Do speakers design their co-speech gestures for their addressees? The effects of
addressee location on representational gestures. Journal of Memory and Language 46: 665–687.
Patterson, Miles L. 1976. An arousal model of interpersonal intimacy. Psychological Review 83:
235–245.
Patterson, Miles L. 1983. Nonverbal Behavior: A Functional Perspective. New York: Springer.
Patterson, Miles L. 2001. Toward a comprehensive model of nonverbal communication. In: Wil-
liam Peter Robinson and Howard Giles (eds.), The New Handbook of Language and Social
Psychology, 159–176. Chichester, UK: Wiley and Sons.
Petty, Richard E. and John T. Cacioppo 1986. The elaboration likelihood model of persuasion. In:
Leonard Berkowitz (ed.), Advances in Experimental Social Psychology, Volume 19, 123–205.
New York: Academic Press.
Philpott, Jeffrey S. 1983. The relative contribution to meaning of verbal and nonverbal channels of
communication: A meta-analysis. Unpublished master’s thesis, University of Nebraska.
Planalp, Sally, Victoria Leto DeFrancisco and Diane Rutherford 1996. Varieties of cues to emo-
tion in naturally occurring situations. Cognition and Emotion 10: 137–153.
Ploog, Detlev W. 2003. The place of the triune brain in psychiatry. Physiology and Behavior 70:
487–493.
Prager, Karen J. 2000. Intimacy in personal relationships. In: Clyde Hendrick and Susan S. Hen-
drick (eds.), Close Relationships: A Sourcebook, 229–242. Thousand Oaks, CA: Sage.
Prager, Karen J. and Linda J. Roberts 2004. Deep intimate connection: Self and intimacy in couple
relationships. In: Debra J. Mashek and Arthur P. Aron (eds.), Handbook of Closeness and Inti-
macy, 43–60. Mahwah, NJ: Lawrence Erlbaum.
Rimé, Bernard and Loris Schiaratura 1991. Gesture and speech. In: Robert S. Feldman and Ber-
nard Rimé (eds.), Fundamentals of Nonverbal Behavior, 239–281. Cambridge: Cambridge Uni-
versity Press.
Robinson, Jeffrey D. 1998. Getting down to business: Talk, gaze, and body orientation during
openings of doctor-patient consultations. Human Communication Research 25: 97–123.
Segrin, Chris 1993. The effects of nonverbal behavior on outcomes of compliance-gaining at-
tempts. Communication Studies 44: 169–187.
Shepard, Chris, Howard Giles and Beth A. Le Poire 2001. Communication accommodation
theory. In: William Peter Robinson and Howard Giles (eds.), The New Handbook of Language
and Social Psychology, 33–56. Chichester, UK: Wiley.
Swerts, Mark and Emiel Krahmer 2005. Audiovisual prosody and feeling of knowing. Journal of
Memory and Language 53: 81–94.
Tedeschi, James T. and Nancy M. Norman 1985. Social power, self-presentation, and the self. In:
Barry R. Schlenker (ed.), The Self and Social Life, 293–322. New York: McGraw-Hill.
Tomkins, Silvan S. 1963. Affect, Imagery, Consciousness: Volume 2. The Negative Affects. New
York: Springer.
Vrij, Aldert 2006. Nonverbal communication and deception. In: Valerie Lynn Manusov and Miles
L. Patterson (eds.), The Sage Handbook of Nonverbal Communication, 341–359. Thousand
Oaks, CA: Sage.
White, Cindy H. and Judee K. Burgoon 2001. Adaptation and communicative design: Patterns of
interaction in truthful and deceptive conversations. Human Communication Research 27: 9–37.
Abstract
The chapter presents a definition and analysis of multimodal communication according
to a model in terms of goals and beliefs. Communication is a process in which a Sender
has a goal (a conscious intention, an unconscious or tacit goal, a social end, or a biological func-
tion) to have some Addressee come to know some belief, and to achieve this goal pro-
duces, in some modality (words, prosody, gesture, touch, gaze, face), a signal
connected, in one’s and the Addressee’s mind, to some meaning (the belief to convey),
according to a communication system, i.e., a set of rules to put signals and meanings
in correspondence. The model presented argues that not only words and symbolic ges-
tures, but also gaze, touch and other communication systems form a lexicon, i.e., a sys-
tematic list of signal-meaning pairs, where each signal can be analyzed in terms of a
small set of parameters, much as words are in phonology. The chapter provides analyses
from the lexicons of symbolic gestures, gaze and touch, and presents the “score” of multi-
modal communication, a scheme to annotate the meanings conveyed simultaneously in
various modalities, while showing its potential for the analysis of sophisticated aspects
of communication in comic films, political discourse and piano performance.
1.1. Goals
The life of any natural or artificial, individual or collective system (a human, an octopus,
a robot, a machine, a group, an institution) is ruled by goals. A goal is a state, repre-
sented or not in a system, that regulates its corresponding action: when the perceived
state is different from the regulating state, the system performs some action to cancel
the difference (Miller, Galanter, and Pribram 1960). To achieve a goal, a system devises
and performs a plan by making use of internal resources (the system’s action capacities
and beliefs) and external resources (material resources in the environment and social
exchange with other systems). A plan is a set of actions aimed at hierarchically arranged
goals, where a goal (e.g., creating the necessary world conditions) may aim at a super-
ordinate goal, a supergoal, and all goals and supergoals aim at a final goal. For example,
if in order to eat I decide to cook spaghetti, I have to heat water, boil the spaghetti and make the sauce.
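A minimal sketch of this regulatory notion of goal, with invented states and actions: the system acts whenever the perceived state differs from the regulating state, and a plan unpacks a goal into the sub-goals and elementary actions that serve it.

```python
# Sketch of a goal as a regulating state (after Miller, Galanter, and Pribram 1960):
# compare the perceived state with the regulating state and act until they match.
# The states, actions and plan entries are invented placeholders.

def pursue(goal_state, perceive, act):
    while perceive() != goal_state:   # test: perceived state vs. regulating state
        act()                         # operate: act to cancel the difference

water_temp = 20
def perceive():
    return "boiling" if water_temp >= 100 else "not boiling"
def act():
    global water_temp
    water_temp += 20                  # keep heating the water

pursue("boiling", perceive, act)
print(water_temp)                     # 100: the sub-goal "heat water" is achieved

# Toy plan hierarchy: sub-goals serve a supergoal, which serves the final goal "eat".
plan = {"eat": ["cook spaghetti"],
        "cook spaghetti": ["heat water", "boil spaghetti", "make sauce"]}

def expand(goal):
    """Flatten a goal into the elementary actions that serve it."""
    subs = plan.get(goal)
    return [goal] if not subs else [a for s in subs for a in expand(s)]

print(expand("eat"))                  # ['heat water', 'boil spaghetti', 'make sauce']
```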
Given the definition of goal as simply a regulating state, several psychological no-
tions can be considered as goals: a person’s wishes and intentions, needs, drives, biolog-
ical functions of animals and functions of artefacts. Further, a goal does not necessarily
imply conscious volition, nor is it necessarily represented in the system it regulates:
internal vs. external goals are distinguished. An internal goal is one represented, either
consciously or not, in an individual, while an external goal is not represented in the sys-
tem but determines its features or actions, as do the functions of artefacts, social roles
and biological functions. The function of a chair of letting me sit down is not repre-
sented in the chair but in my mind: an internal goal for me but an external one for
it. The social role of a newspaper director is determined by the goal of that organization
that someone decides and is responsible for what is written in the newspaper. The goal
of flying away triggered in a bird by the outline of a predator, though not explicitly
represented in the bird, has the biological function of preventing predation.
The power of a system to achieve its goals depends on world conditions, the pres-
ence of resources and the system’s capacity to perform necessary actions. If world con-
ditions are not fulfilled, system A can fulfil them through a sub-plan, but if A cannot do
the right actions or does not possess the necessary resources, it may depend on another
system B and have the goal that B adopt A’s goal. B adopts A’s goal when B gets regu-
lated by A’s goal and helps A to achieve it. Different kinds of adoption exist: exploit-
ative (I host you in a hut in a cotton field so you work for me as a slave), social exchange
(I lend you my car so you lend me your pied-à-terre), cooperative (I help you to do your
home assignment so we can go to the movie together), altruistic (I dive into the sea to
save you), normative (I let you cross the street because the light is red). Through adop-
tion, systems provide resources to each other, enhancing the likelihood of adaptation.
Adoption may entail social influence. Social influence takes place when A causes B
to have a goal B did not have before, or to give up a previous goal. If A wants B to
adopt A’s goal, she can try to influence B, i.e., induce B to place his actions or resources
at A’s disposal: if A is out of salt, she can ask neighbour B to lend her some. We try to
influence others for both selfish and altruistic goals: I may influence you to do things
that are in your interest, like advising you to take an umbrella when it’s raining or telling
you of a job opening.
1.2. Beliefs
More than other animals, humans need to acquire, process and use beliefs in goal pur-
suit, in order to check pre-conditions for action, the likelihood of success and the respective
worth of alternative goals. Beliefs may be represented in a system in either a sensori-
motor or a propositional format, or both. Sensorimotor representations are mental
images or body movements: the visual image of a chair (its shape, back, number of
legs) or the muscular actions we perform to sit down. In the propositional format, the
chair is represented as an object, a piece of furniture, with the function of sitting down:
a set of propositions, each formed by a predicate and its arguments, with a predicate
being a property of a single argument or a relation among two or more arguments
(Parisi and Antinucci 1973; Castelfranchi and Parisi 1980). Beliefs may be assumed
with a different status depending on their context of assumption (real world, fantasy,
dreams…) and their degree of certainty (Miceli and Castelfranchi 1995; Castelfranchi
and Poggi 1998). But how are they acquired, organized and managed? Perception, sig-
nification and communication are three acquisition devices, while through memory and
inference, respectively, previous beliefs are stored and new ones are generated. In sen-
sation and perception, our first ways to acquire beliefs from the world, the steps from
perceiving to knowing are relatively few: the beliefs we come to assume strictly corre-
spond to the stimuli met by our senses. After selection and processing, perceived beliefs
get stored in long term memory, where they connect to each other through links of time,
space, class-example, set-subset, condition, cause-effect and means-goal, forming com-
plex networks ruled by the law of reciprocal compatibility: two contradictory beliefs
with the same degree of certainty, in a context of assumption of reality (not, for
instance, dreams or fantasy) cannot be accepted at the same time, and one or the
other must be rejected (Miceli and Castelfranchi 1995). Through inference, by connect-
ing two or more beliefs, we generate new ones. An inference is a new belief I draw from
old beliefs by applying some rule of inference. If I see that John tastes an orange and
leaves it (belief 1, acquired now through perception), and I know he loves oranges
(belief 2, a previous belief retrieved from memory), I may infer that this particular
orange is very sour (belief 3, drawn by inference). Signification can be viewed as the
crystallization of inference. If from a particular belief acquired from a perceivable stim-
ulus, whatever its context, I invariably draw the same inference, the stimulus becomes a
signal, and the inference becomes its meaning. If I see smoke, and I have often verified that
where there is smoke there is fire, I can say that “smoke means fire.” With time, the link
between the two beliefs comes to be stored in memory, making the inference no longer
necessary, until the first belief finally recalls the second; it stands for the second (Saussure
1916; Peirce 1935; Eco 1975).
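A minimal sketch of the step from inference to signification, with toy beliefs and rules: a new belief is drawn by applying an inference rule to stored beliefs, and once the same inference is drawn invariably, the link is stored directly as a signal–meaning pair.

```python
# Sketch: inference as deriving a new belief from old ones, and signification as a
# "crystallized" inference. Beliefs and rules are toy strings invented for the example.

previous_belief = "John loves oranges"                    # retrieved from memory
perceived_belief = "John tastes an orange and leaves it"  # acquired now by perception

inference_rules = {
    ("John loves oranges", "John tastes an orange and leaves it"):
        "this particular orange is very sour",
    ("where there is smoke there is fire", "there is smoke"): "there is fire",
}

def infer(old, new):
    """Draw a new belief from two old ones by applying a rule of inference."""
    return inference_rules.get((old, new))

print(infer(previous_belief, perceived_belief))  # 'this particular orange is very sour'

# When the same inference is drawn invariably, the link is stored once and for all:
# the stimulus has become a signal and the former inference is now its meaning.
lexicon = {"smoke": "fire"}
print(lexicon["smoke"])                           # no inference step is needed any more
```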
Communication is a way to get information from other people, by taking the stimuli
people produce as signals, and finding meaning in them. We often get information from
others simply thanks to our capacity to infer beliefs from what we perceive; but in
communication we acquire information from another person just because that person
has the goal of sending us a signal to convey information. The difference between infer-
ring beliefs from other people’s being or behaving and receiving beliefs because they
communicate them to us is like the difference between theft and gift: beliefs we infer
from people are robbed from them, while beliefs people communicate are a gift they
bestow upon us. In this, communication follows the law of “reciprocal altruism of
knowledge” (Castelfranchi and Poggi 1998) – a version in terms of goals and beliefs
of Grice’s cooperative principle – according to which, when one needs a belief, the
other is bound to provide it, and the other way around.
At the same time, though, communication is an act of influence, since with any com-
municative act we provide beliefs about a goal of ours, to request another to adopt it.
By a command we ask the other to do something for us, by a question, to let us know
something, by informing, to believe what we say.
Actually, this also opens a chance for deception. The function of communicating is to
induce goals in others; but humans decide which goals to pursue on the basis of the be-
liefs they have. So to influence others, one may need to provide information apt to trig-
ger the goals one wants, whether one believes that information to be true or not.
Castelfranchi and Poggi (1998) define as deceptive any act, morphological feature or
even any non-act (omission) of some system S whose goal is that some system A
does not come to have some belief B that S believes to be true, and that is relevant
to A’s goals. S can deceive by linguistic actions (like in lies) but also by non-verbal com-
municative actions (a firm handshake and a smile to a person she hates), by actions
which are not communicative per se (hiding her lover in her wardrobe) or by doing
nothing at all (simply omitting to tell her partner she has AIDS).
2. Communication
In terms of the notions above, a communicative process takes place when a sender S has
the goal G of causing some addressee A to come to have some beliefs through signifi-
cation (i.e., to come to believe some meaning M), and in order to achieve this goal, S produces a
signal s that S supposes is linked to the meaning M both in S’s and in A’s minds. The
signal s is produced in a productive modality PM, it is perceivable in a receptive
modality RM, and it is linked to the meaning M through a communication system CS.
The elements of this process are defined as follows (a schematic summary is sketched below):
(i) sender: a system who has the goal of conveying some belief to an addressee using
signification, and in order to achieve this goal produces a signal linked to some meaning;
(ii) goal of communicating: the sender’s goal of having an addressee believe some
belief;
(iii) addressee: the system in whom the sender has the goal of inducing beliefs;
(iv) signal: a perceivable stimulus (an action, object, part or aspect of an object, a
morphological feature or even a non-action like silence) that is linked to some
meaning. The sender believes that the addressee can perceive it and that in the
addressee’s mind that stimulus is linked to the same meaning as in the sender’s own mind;
(v) meaning: a belief or set of beliefs that corresponds to a signal;
(vi) productive modality: the body organ used by the sender to produce the signal;
(vii) receptive modality: the sensory organ through which the addressee may receive
the signal;
(viii) communication system: the system of rules to put signals and meanings in
correspondence.
gurgled by a shark, a communicative process has taken place all the same, even if
communication was not successful.
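The eight elements of the definition can be gathered into a single schematic record; the field names follow the list above, and the example values (taken from the wrist-touching gesture discussed later in the chapter) are illustrative.

```python
# Sketch: the eight elements of the communicative process as one record.
# Field names mirror the definition above; the example values are illustrative.
from dataclasses import dataclass

@dataclass
class CommunicativeProcess:
    sender: str                 # (i)    who has the goal of conveying a belief
    goal: str                   # (ii)   the goal of having the addressee believe something
    addressee: str              # (iii)  in whom the belief is to be induced
    signal: str                 # (iv)   the perceivable stimulus
    meaning: str                # (v)    the belief(s) the signal corresponds to
    productive_modality: str    # (vi)   body organ producing the signal
    receptive_modality: str     # (vii)  sensory organ receiving the signal
    communication_system: str   # (viii) rules pairing signals and meanings

example = CommunicativeProcess(
    sender="S", goal="have A believe it is late", addressee="A",
    signal="right index touching left wrist", meaning="hurry up, it is late",
    productive_modality="hand", receptive_modality="sight",
    communication_system="Italian symbolic gestures")
print(example.signal, "->", example.meaning)
```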
Two features characterize this definition. On one side, positing the necessity of a goal
of communicating allows one to distinguish when belief acquisition is simply due to the
receiver’s inference from when it is due to a sender’s goal; communication, on this view, does not include just any acquisition of information, such as information that leaks from the emitter against his will. On the other side, thanks to the notion of goal as a regulatory state, not necessarily a conscious intention, animal communication, unconscious communication and so-called non-verbal communication may also be counted as communication in their own right.
The goal of communicating may be internal or external, i.e., represented or not in the
individual’s mind. A conscious internal goal of communicating is represented and meta-
represented: we have the goal of having someone know, and we also believe we have
this goal. This is generally the case for all verbal communication, and the so-called “em-
blems” of other modalities: “symbolic” gestures, those that in a particular culture have a
codified shared meaning and a clear verbal paraphrase; “facial emblems,” like a grimace
conveying ignorance or a head nod expressing agreement. Unconscious signals, instead,
like a neurotic symptom, a tic or a compulsive behaviour, may have the goal of commu-
nicating some unconscious disturbance, but the Sender is not aware of this goal, nor
possibly of the disturbance itself. Finally, while talking, we make “beat” gestures (we
move our hands down in correspondence with stressed syllables) and raise our eye-
brows to emphasize what we are saying. These movements have the goal of communi-
cating that the part of the sentence or discourse we are presently uttering is important, but
this goal is tacit, not represented at a high level of awareness: the signals are performed
in an automatic way, without attentive control.
Cases of communication ruled by an external goal of communicating, not repre-
sented in the Sender’s mind, are the gasoline light, warning when you’re out of fuel
(function of an artefact), a seagull’s flight warning the flock of a predator (biological
communicative function), and uniforms, status symbols and regional accent, ruled by
the social ends of expressing one’s identity or group belonging.
(i) an action of an object (e.g., the gasoline LED lighting up), organism (a gesture, a cry,
a sentence) or group (the march of a crowd);
(ii) an object produced by an action (a book, a movie, a statue, a monument);
(iii) an object used during an action (black glasses worn at a funeral to hide crying eyes,
a cop’s helmet);
(iv) a part of an object (green light of the traffic light) or organism (a woman’s fleshy lips);
(v) an aspect of an object (fluorescence of car stops), organism (cute round face of a
baby) or group (density of a crowd);
(vi) a non-action (e.g., silence), if different from expected.
In brief, we may count as a signal any physical stimulus that in the sender’s and (as
assumed by the sender) in the addressee’s mind is linked to some meaning: a word,
a picture, a kiss, a slap, a strike, a resignation letter, a terrorist action…
Signals are produced and perceived in various modalities. In humans the receptive
modalities exploit their five senses: a perfume you wear is perceived through smell, a
tasty food conveying love or hospitality, through taste. But the majority of signals are
received through visual and auditory modalities, and produced by various body organs.
The hypothesis of this model is that not only head, face, hands, trunk, legs, but even sub-
parts of body organs, like eyes or nose, are repositories of specific communication
systems, whose signals may be listed and described.
2.3. Meaning
The meaning is the set of beliefs conveyed by a signal. The meanings a human may need
to provide to others for one’s adaptive goals, conveyed by signals of one or the other
communication system, are of three types (Poggi 2007): information on the world
(about concrete and abstract events, entities, properties, relations, times and places),
on the sender’s identity (sex, age, ethnicity, cultural roots and personality), and on
the sender’s mind (goals, beliefs and emotions concerning ongoing discourse).
(i) relation of a signal with signals in other modalities. The “beats” we make to scan
speech are non-autonomous: we can use them only while talking; “symbolic”
gestures, as well as signs of Sign Languages, are autonomous: they can be used
without speaking.
(ii) cognitive construction: whether and how a signal is represented in long term mem-
ory. Some signals are codified, i.e., the signal-meaning link is stably represented in
long term memory, like lexical items in a dictionary, and they form a “lexicon”. But
this property, generally acknowledged only for words and symbolic gestures, can
be credited also to signals in other modalities, like gaze, touch, head movements,
postures…
(iii) As opposed to a codified signal, a creative signal is one invented on the spot to
convey some meaning for which no corresponding signal is stored in memory:
for example a neologism or an iconic gesture. Suppose you want to convey the
meaning “cello” by a gesture. No ready-made gesture-meaning pair exists, so
you have to create one, and you will produce a hand movement that in some
way resembles or recalls a cello, to have the Addressee understand it. In this
case, what is at work is not a lexicon but a set of rules for the generation of
creative signals.
(iv) signal – meaning relationship: whether or not the meaning can be inferred from the
signal. A signal is motivated (Saussure 1916) if it has a relation of similarity with its
meaning (e.g., the iconic symbol fork-and-knife to mean “restaurant,” the ono-
matopoeic word “drip”), or one of compositionality (like in morphological deriva-
tives) or mechanical determinism (the nouns for “mama,” which in all cultures
exploit labial consonants; the arm raising of gestures of elation, induced by the
arousal of that emotion). When the signal is not linked to its meaning by these
or other devices that allow one to infer the meaning from the signal without
knowing it in advance, it is an arbitrary signal.
(v) correspondence between signal and communicative act. The unit of communication
in humans is the communicative act, which includes a performative (the Sender’s
specific communicative goal: asserting, asking, requesting) and a propositional con-
tent (what the Sender asserts, asks, requests). Articulated signals, like most words
and some symbolic gestures, convey only part of a communicative act, while holo-
phrastic signals, like interjections (Poggi 2008), many gaze items and other sym-
bolic gestures, convey a whole communicative act, the meaning of a whole
sentence. The Italian symbolic gesture hand with index and middle fingers up in
V shape moving back and forth before mouth means “smoke” or “cigarette”
(Fig. 40.1), while hand palm down bent down twice means “I ask you to come
here” (Fig. 40.2).
(i) “Phonology” is a set of parameters, each with a small number of possible values,
such that each item in the system is described by the combination of the specific
values it assumes for all parameters. Just as in the words town and down the parameter “voicing” assumes the value “non-voiced” for /t/ and “voiced” for
/d/, similarly in the Italian symbolic gestures for “mad” (index finger rotating on
temple) and “good, tasty!” (index rotating on cheek) the parameter “location”
assumes different values, “temple” and “cheek”;
(ii) Lexicon is the list of signal-meaning pairs stored in long-term memory.
(i) For any signal considered communicative, it is possible to represent its meaning in
terms of “cognitive units,” i.e. logical propositions, each formed by a predicate with
its arguments. E.g., the performative of imploration, conveyed by head sideward-
forward, gaze to the interlocutor, and the inner parts of the eyebrows raised (Fig. 40.3), contains the following cognitive units (see the sketch after this list):
a. Sender has the goal that addressee do an action a;
b. Sender believes the action a is in the interest of sender;
c. Sender believes that addressee has power over sender;
d. Sender believes that if addressee does not do the action a, then sender will
be sad.
(ii) Any signal, whether verbal or non-verbal, may be polysemic, that is, have two or
more different meanings, and one of them is triggered by context. But this does not
imply that meaning is freely floating, because the different meanings of one signal
all share a common semantic core (one or more cognitive units): e.g., raising eye-
brows may convey surprise, perplexity or emphasis, but all share a common core of
request for attention;
(iii) Signals, besides their literal meaning, often have an indirect meaning, that can be
inferred from the literal one, but in some cases is codified as another meaning of
that signal, to be also written in a lexicon. E.g., clapping hands has a literal meaning
of praise and an ironical one of blame.
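A schematic fragment of such a lexicon, combining “phonological” parameter values with meanings spelled out as cognitive units: the parameter names are simplified, the one-line glosses for “mad” and “good, tasty!” are assumptions added for illustration, and the units for imploration follow the list above.

```python
# Sketch of lexicon entries: each signal is described by a small set of parameter
# values and paired with a meaning spelled out as cognitive units
# (predicate-argument propositions). Parameter names are simplified.

gesture_lexicon = {
    "mad": {
        "parameters": {"handshape": "extended index", "movement": "rotation",
                       "location": "temple"},
        "cognitive_units": ["Sender believes x is mad"],          # assumed gloss
    },
    "good, tasty!": {
        "parameters": {"handshape": "extended index", "movement": "rotation",
                       "location": "cheek"},    # only the location value differs
        "cognitive_units": ["Sender believes x is tasty"],        # assumed gloss
    },
    "imploration": {
        "parameters": {"head": "sideward-forward", "gaze": "to interlocutor",
                       "eyebrows": "inner parts raised"},
        "cognitive_units": [
            "Sender has the goal that Addressee do action a",
            "Sender believes action a is in the interest of Sender",
            "Sender believes Addressee has power over Sender",
            "Sender believes that if Addressee does not do a, Sender will be sad",
        ],
    },
}

# Two items may share all parameter values but one, just as 'town' and 'down'
# differ only in voicing.
print(gesture_lexicon["mad"]["parameters"]["location"],
      "vs", gesture_lexicon["good, tasty!"]["parameters"]["location"])
```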
To find out the meanings of items in a body lexicon, six methods can be used.
(i) Speaker’s judgements (Chomsky 1965). For each item, wonder if it is acceptable in
a given context, if it is ambiguous, if it has synonyms or paraphrases in words or
other modalities.
(ii) Deductive method. Figure out what types of information one may need to convey
for one’s adaptive goals, and wonder if and how they are conveyed in a given
communication system.
(iii) Ethnosemantics. Collect and analyse words describing non-verbal items. The
semantic differences between gaze and words such as stare, glance, peek, wink, frown, glower, bloodshot eyes, make sheep’s eyes, and look down on someone help to specify what communicative action each communicative act of gaze conveys.
(iv) Observation. Analyze videorecordings of communicative interactions, collect uses
of each signal, find its meaning in each use and the core meaning shared by all uses.
(v) Empirical studies. Test the hypothesis about the meaning of a certain signal
through questionnaires or interviews.
(vi) Simulation. Simulate the items in embodied agents by representing their “phono-
logical” and semantic analysis, and assess their interpretation by users through
evaluation studies.
Within gestures informing about the sender’s identity, flat hand on one’s heart, often
used by politicians (Serenari 2003), claims the sender’s positive image: it points to one-
self as if saying “I” or “we,” but with a nuance of “I/we, the fair and noble person(s).”
Many Italian symbolic gestures convey information about the sender’s mind. Some
inform on the degree of certainty of the beliefs mentioned: shaking index finger
means “no,” i.e., “I do not assume this information as true”; opening flat hands with
palms up means perplexity, a mental state of uncertainty. Other gestures inform on
the source of the mentioned beliefs: snapping thumb and middle fingers means “I am
retrieving beliefs from long-term memory”; both hands bending index and middle
fingers with palms toward the hearer means “I am quoting another’s belief.”
Among gestures informing on the sender’s goals, some communicate performatives
(like “I apologize” or “Attention”). Others mention the relation of something to the
speaker’s goals: sliding the back of the hand under the chin (= “I couldn’t care less,” i.e., “there is
no goal of mine to which this is relevant”: Fig. 40.4); fist rapidly rotating on wrist
with thumb and index finger extended (= “nothing to do,” “no way to have this goal
achieved”). Some gestures express logical links among sentences in a discourse, fulfill-
ing a metadiscursive function: a fist slowly rotating on the wrist with thumb and index finger extended and curved sets a causal link between facts mentioned in a discourse. Raising one
hand has a conversational function of requesting turn.
Among gestures informing on the speaker’s emotions, “Churchill’s gesture,” index and
middle finger extended upward, expresses elation for an achievement; beating the right fist
over the upturned left palm expresses disappointment, while the “cheek screw” (Morris 1977), tip of
the extended index finger rotating like a screw on the cheek (Fig. 40.5), expresses physical
or quasi-physical pleasure, for a tasty food, a pretty girl or even an exciting book.
Various Italian gestures make use of rhetorical figures: metaphor, synecdoche, irony,
hyperbole. The finger bunch, palm down, beaten on breast (Fig. 40.6) literally means “mi
sta qua” (“it’s here, on my stomach, I can’t digest it”), but metaphorically counts as
“I can’t bear him/her”: what cannot be digested is not a food but a person.
Clapping hands, besides its literal use to approve or praise, may be used ironically to
express sarcastic praise, and hence strong disapproval or criticism. A hyperbolic gesture
is index finger skimming one’s cheek from cheek-bone down, iconically representing a
tear: the literal meaning is “I cry” but crying is a hyperbole, the intended meaning is
simply “I am sad.” Typically hyperbolic are some gestures of description, comment
or threat representing sexual organs or actions. Finally, some gestures use a synecdoche:
they represent some object or action to refer to something linked to it. Hand with
spread fingers covering one’s face (Fig. 40.7) mimics the bars of a jail to mean “jail”
(part-whole synecdoche) or “criminal” (one who often is in jail: container-content).
Right index touching left wrist (Fig. 40.8) means “what time is it?” or “hurry up, it’s
late.” This is a recursive synecdoche: from place (wrist) to object (watch), to its function (knowing the time), to the resulting action (hurrying).
(i) movements of the eyebrows (e.g., frowning means worry or concentration, eyebrow
raising, perplexity or surprise);
(ii) position, tension and movement of the eyelids (in hatred the upper eyelids are lowered and the lower eyelids raised, with tension; in boredom the upper eyelids are lowered but relaxed);
(iii) features of eyes: humidity (bright eyes of joy or enthusiasm), reddening (bloodshot
eyes of rage), pupil dilation (sexual arousal), focusing (staring out into space when
thoughtful), direction of iris with respect to direction of Speaker’s head and to
Interlocutor’s location (deictic use of eyes);
(iv) size of eye sockets (to express tiredness);
(v) duration of movements (a defiant gaze keeps eye contact longer).
Also in gaze, like in symbolic gestures, several codified items can be found. Within infor-
mation on the world, gaze conveys information about entities (its deictic use: gazing at
something or someone to refer to it) or about properties (squinting eyes = little, wide
open eyes = huge: an iconic use of gaze). As to the sender’s identity, eyelid shape reveals
ethnicity, bright eyes reveal aspects of personality. Within information on the sender’s be-
liefs, gaze can tell how certain we are of what we are saying (slight frown = “I am serious,
not kidding”; raised eyebrows without wide open eyes = “I am perplexed, not sure”), and
what is the source of what we are saying (eyes left-downward = “I am retrieving from
memory”). Further, gaze communicates the performative of our sentence (staring at inter-
locutor = request for attention, frowning = question, fixed stare = defiance), topic-comment
distinction (averting vs. directing gaze to interlocutor), turn-taking (gaze at speaker to take
the floor) and backchannel (frown = incomprehension or disagreement).
Recently the hypothesis has been made (Poggi and Roberto 2007; Poggi, D’Errico,
and Spagnolo 2010) that some values in the parameters of gaze are comparable to mor-
phemes, since by themselves they convey specific meanings. For example, wide open
eyelids imply activation and alert, while half open eyelids have a semantic nuance of
de-activation or relaxation, but raised lower eyelids convey effort, even when the general
meaning of the gaze item is one of de-activation.
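A minimal sketch of this morpheme-like compositionality, using only the parameter values mentioned above; the mapping table is a simplification introduced for the illustration.

```python
# Sketch: parameter values of gaze treated as morpheme-like units, each carrying
# its own meaning, composed into the meaning of a whole gaze item.
# The mapping is a simplification of the examples in the text.

gaze_morphemes = {
    ("eyelids", "wide open"): "activation, alert",
    ("eyelids", "half open"): "de-activation, relaxation",
    ("lower eyelids", "raised"): "effort",
    ("eyebrows", "raised"): "request for attention",
}

def gaze_meaning(values: dict) -> list:
    """Compose the meanings contributed by each parameter value."""
    return [gaze_morphemes[(p, v)] for p, v in values.items() if (p, v) in gaze_morphemes]

# Half-open eyelids with raised lower eyelids: overall de-activation, but with effort.
print(gaze_meaning({"eyelids": "half open", "lower eyelids": "raised"}))
```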
(i) touching part: the part of the sender’s body actively touching the addressee (hair,
forehead, head, eyelash, nose, cheek, beard, lips, teeth, tongue, shoulder, arm,
back, elbow, hand, fingers, nails, hip, genitals, glutei, thigh, knee, foot);
(ii) touched part: the part of the addressee’s body that is touched (hair, forehead, head,
eyebrows, eyelashes, eye, temple, nose, cheek, ear, beard, lips, tongue, neck, shoul-
der, arm, forearm, breast, trunk, stomach, back, elbow, hand, fingers, hip, genitals,
glutei, thigh, knee, calf, ankle, foot);
(iii) location or space that is touched: point, line or area;
(iv) movement, encompassing sub-parameters like path, pressure, duration, speed and
rhythm.
For touch, too, one can find a lexicon, and the meanings of its items account for the positive or negative influence that physical contact has on people’s relationships. Some acts of touch are communicative, and their meaning can be paraphrased in words. For many touch items it is possible to identify their origin in action (their degree zero of meaning, Posner and Serenari 2001) and the communicative inferences they elicit, that is, their indirect meaning, which is sometimes idiomatized, i.e., stored in memory as a further (sometimes the only) meaning of the item.
The meanings of touch may be analyzed in terms of the following criteria (Poggi 2007); a schematic sketch of such an entry follows the list:
(i) name or verbal description of the act of touch: e.g., kiss, slap, kick, drying the
other’s tears;
(ii) verbal paraphrase or verbal expression that may accompany the touch: “C’mon,
don’t cry” while drying the other’s tears, or “I love you” while caressing
someone;
(iii) literal meaning: drying the other’s tears conveys “I want to console you”; a caress,
“I want to give you serenity and pleasure”;
(iv) indirect idiomatic meaning: an indirect meaning of a caress may be “I want you to
be calm”;
(v) originary meaning: the primitive goal of the act from which the literal meaning
might have evolved (e.g., through ritualization). So, embracing might derive
from a desire to wrap and incorporate the other.
(vi) social goal: four types of social disposition of the toucher towards the touched person. An item of touch is “aggressive” when aimed at hurting or causing harm (e.g., a slap); “protective” when it offers help or affection (kissing, or giving one’s hand to another); “affiliative” when it asks for help or affection (a wife leaning on her husband’s arm); “friendly” when it offers help or affection without implying a difference in power (walking arm-in-arm).
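As a minimal illustration of how such an entry might be recorded (the field names and the example values are chosen for this sketch and are not taken from Poggi 2007), a touch item combining the signal parameters and the meaning criteria listed above could be represented as follows:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TouchItem:
    """One entry of a hypothetical touch lexicon (illustrative field names)."""
    name: str                        # (i) verbal description of the act
    verbal_paraphrase: str           # (ii) words that may accompany it
    literal_meaning: str             # (iii) literal meaning
    indirect_meaning: Optional[str]  # (iv) indirect idiomatic meaning, if any
    originary_meaning: str           # (v) primitive goal the act may derive from
    social_goal: str                 # (vi) aggressive / protective / affiliative / friendly
    touching_part: str               # signal parameter: part of the sender's body
    touched_part: str                # signal parameter: part of the addressee's body
    location: str                    # point, line or area
    movement: dict = field(default_factory=dict)  # path, pressure, duration, speed, rhythm

caress = TouchItem(
    name="caress",
    verbal_paraphrase="I love you",
    literal_meaning="I want to give you serenity and pleasure",
    indirect_meaning="I want you to be calm",
    originary_meaning="gentle protective contact",   # illustrative placeholder
    social_goal="protective",
    touching_part="hand",
    touched_part="cheek",
    location="area",
    movement={"path": "downward stroke", "pressure": "light", "rhythm": "slow"},
)
```

A real lexicon would of course refine these fields, in particular the movement sub-parameters.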
Polysemy also holds for touch items, since some acts of touch have different meanings when performed by actors who differ in relationship or status.
4. Multimodality
The communicating body may be viewed as an orchestra; its instruments are words,
prosody and intonation, gesture, head, gaze, face and posture, all playing simulta-
neously and making multimodal communicative acts in which the meaning intended
by the Sender is distributed across modalities. This raises two interesting issues:
(i) How are meanings distributed across modalities in the planning and production of
the multimodal message?
(ii) How can one disentangle the different meanings conveyed in different modalities?
To plan a multimodal message, the sender creates a stack of communicative systems, ordered according to criteria such as:
(i) Meaning: if some types of meanings are more easily or typically conveyed in a particular modality (e.g., facial expression is generally more apt than verbal items to convey emotions), that modality is searched first;
(ii) Physical and social context: in a noisy disco the gestural system may be on top of the stack;
(iii) Addressee’s characteristics: if talking to a child, iconic gestures may be better than verbal descriptions.
Once the stack is created, the sender searches for a suitable signal to convey the mean-
ing in the communicative system on top of it. If the signal found is not appropriate, the
search goes on in the next communicative system. If conveying a particular meaning is
particularly important, the sender may search signals for that meaning also in other
modalities and produce them simultaneously.
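The following toy sketch (in Python) illustrates the stack-based search just described; the communicative systems, their signal inventories and the ranking criteria are simplified placeholders, not the model itself.

```python
# A toy lexicon: for each communicative system, the meanings it can express.
LEXICON = {
    "face":    {"joy": "smile", "sadness": "inner parts of eyebrows raised"},
    "gesture": {"big": "hands held far apart", "stop": "palm forwards"},
    "words":   {"big": "'huge'", "stop": "'stop!'", "joy": "'I'm happy'"},
}

def rank_systems(meaning, noisy=False, addressee_is_child=False):
    """Order the communicative systems into a 'stack' (top of the stack first)."""
    stack = ["words", "gesture", "face"]
    if meaning in ("joy", "sadness"):      # (i) type of meaning: emotions -> face first
        stack = ["face", "gesture", "words"]
    if noisy:                              # (ii) physical context: gesture on top
        stack = ["gesture"] + [s for s in stack if s != "gesture"]
    if addressee_is_child:                 # (iii) addressee: prefer iconic gestures
        stack = ["gesture"] + [s for s in stack if s != "gesture"]
    return stack

def plan_signal(meaning, important=False, **context):
    """Search the stack top-down; if the meaning is particularly important,
    also collect suitable signals from the remaining modalities."""
    signals = []
    for system in rank_systems(meaning, **context):
        signal = LEXICON[system].get(meaning)
        if signal:
            signals.append((system, signal))
            if not important:
                break                      # the first suitable signal is enough
    return signals

print(plan_signal("big", addressee_is_child=True))   # a single iconic gesture
print(plan_signal("joy", important=True))            # face plus words, simultaneously
```

Here an important meaning (“joy”) is expressed both facially and verbally, whereas “big”, addressed to a child, is conveyed by a single iconic gesture.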
A body signal may bear one of the following relations to the words it accompanies:
(i) repetition, if a signal bears the same meaning as one or more words, e.g., the Speaker opens and drops his hands heavily to mean “very much, intensely” while saying “very much.”
(ii) addition, if it adds information: saying “a large balcony”, while depicting a trian-
gle with the base facing oneself.
(iii) substitution, if the speaker does not utter a word due to a slip, amnesia or inten-
tional reticence, but conveys that meaning by a body signal. E.g., a speaker says:
“In that case…”, then he suspends and makes the gesture of cutting air horizon-
tally, meaning: “I will not do that.”
(iv) contradiction, if the meaning of a signal contrasts with that of simultaneous words. While being interviewed about racism, a speaker may appear very tolerant in words but keep his distance from the interviewer in posture.
(v) independence, if a signal simultaneous to some word has no relation to it. E.g.,
while talking on the phone, I wave to someone entering my room: my words
and my wave are not part of the same communicative plan.
When the indirect meaning of a signal can be inferred from the literal one, the levels of meaning description, meaning type and function may include an additional line, so each signal may be interpreted and classified in different ways, based on whether the literal or the indirect layer of meaning is taken into account.
The “two-layer musical score” is particularly well suited to analyzing sophisticated cases of
multimodal communication, like the following fragment from the movie “Totò a colori”
(“Totò in colours,” director Steno 1952), with Totò (Antonio De Curtis), the most
famous Italian comic actor (see Fig. 40.9).
Totò has just tried to court a maiden, but on her refusal he offers an apology and a
justification. Here is the verbal text:
(1) “Hai ragione, hai ragione, scusa tanto! Mi sono lasciato trasportare dall’impeto,
dall’imbulso della carne! Di questa carnaccia maledetta! Mo’ mi faccio male!
Non so che mi farei! Mo’ me ceco l’occhi!”
(You’re right, you’re right, I apologize! I let myself be carried by the impetus, by
the impulse of flesh! By this bloody bad flesh! Now I’ll hurt myself! I don’t know
what I would like to do to myself! Now I’ll blind my own eyes!).
[Fig. 40.9 (excerpt): first line of the two-layer musical score, verbal modality, Signal Description: “Hai ragione, hai ragione, scusa tanto! Mi sono lasciato trasportare dall’impeto, dall’imbulso della carne!” – ‘You’re right, you’re right, I apologize! I let myself be carried by the impetus, by the imbulse of flesh!’]
Legend: v. = verbal modality; p-i. = prosody-intonation; g. = gesture; f. = face; b. = body; SD = Signal Description; ST = Signal Type; MD = Meaning Description; MT = Meaning Type; F = Function; IW = Information on the World; ISM = Information on the Sender’s Mind; lh. = left hand; rh. = right hand; I = literal layer; II = indirect layer.
While saying Hai ragione, hai ragione, scusa tanto! (‘You are right, you are right, I apol-
ogise a lot!’), Totò raises his open hands, palms forward, before his breast: a gesture that
conveys a literal meaning “I leave you alone” and an indirect idiomatic meaning
“I apologize,” the former providing information on the world with an additive function,
the latter giving information on the sender’s mind with a repetitive function. He also
lowers his gaze to express shame, showing he repents (indirect meaning), and pulls
inner parts of his eyebrows up expressing sadness, hence (indirect meaning) a request
for forgiveness. For both gaze items, the first (“ashamed,” “sad”) is an additive meaning
of emotion and the second (“I repent,” “I apologize”) a performative that repeats the
meaning of simultaneous words, with all four being information on the sender’s mind.
Since showing sorrow is semantically part of the act of apologizing, its indirect meaning is “I apologize,” a repetition with respect to the words scusa tanto (“I apologize”).
While saying “Mi sono lasciato trasportare dall’imbulso della carne!” (‘I let myself be
carried by the impetus, by the impulse of flesh!’, with the word imbulso uttered with a
regional accent), Totò closes his fists with strong muscular tension, and frowns. Both sig-
nals have a literal meaning of “effort” and “striving hard,” information on a physical
sensation of the sender with an additive function with respect to speech. But the indi-
rect meaning, drawn through the rhetorical figure of metonymy, is “impetus” (impetus
is something one has to strive hard to oppose): information on the world having a
repetitive function with respect to the word impeto.
As shown in Fig. 40.9, in the two-layer score the meaning, meaning type and function of
each communicative signal are interpreted and classified in different ways, depending
on whether the literal or the indirect meaning is represented.
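Purely to make the structure of such an annotation concrete, the following sketch records one entry of a hypothetical two-layer score in Python. The field names follow the legend of Fig. 40.9, but the representation itself is an illustrative assumption, not the ANVIL-based format of Magno Caldognetto et al. (2004).

```python
from dataclasses import dataclass

# The five possible relations of a signal to the concurrent words (see the list above).
FUNCTIONS = ("repetition", "addition", "substitution", "contradiction", "independence")

@dataclass
class MeaningLayer:
    meaning_description: str   # MD
    meaning_type: str          # MT: e.g. Information on the World / on the Sender's Mind
    function: str              # F: relation to the simultaneous words

@dataclass
class ScoreEntry:
    modality: str              # v., p-i., g., f. or b.
    signal_description: str    # SD
    signal_type: str           # ST
    literal: MeaningLayer      # layer I
    indirect: MeaningLayer     # layer II

# Toto's gesture while saying "scusa tanto", as analyzed in the text.
toto_gesture = ScoreEntry(
    modality="g.",
    signal_description="raises open hands, palms forward, before his breast",
    signal_type="symbolic gesture",
    literal=MeaningLayer("I leave you alone", "Information on the World", "addition"),
    indirect=MeaningLayer("I apologize", "Information on the Sender's Mind", "repetition"),
)
assert toto_gesture.indirect.function in FUNCTIONS
```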
The “musical score” and other annotation schemes of multimodal communication
have been devised to analyze several types of interaction – political speech and political
talk show, judicial debate, teacher-pupil interaction, oral examination, speech-therapy,
job interview, comic and dramatic fiction, the pianist’s performance, and orchestra and choir
conducting – and have proved particularly helpful in analyzing sophisticated aspects of
communication, like irony, comic recitation and so forth.
4.4. Irony
Irony in conversation typically takes advantage of multimodality. Like every rhetorical
figure, irony is a case of “recitation” (Vincent and Castelfranchi 1981; Castelfranchi
and Poggi 1998), that is, of revealed deception, where the intended meaning is different
from the apparent, but this difference is revealed: the sender communicates something
different from what s/he thinks, but wants the addressee to understand it is not what
s/he thinks. In irony, the intended meaning is opposite to the apparent, and to under-
stand it, the addressee must first be alerted to irony, i.e., understand that the real
meaning is different from its appearance, then understand the specific non-literal
meaning of the communicative act (Attardo et al. 2003). Sometimes the addressee
is alerted to irony simply because the literal meaning is implausible compared to con-
text and previous knowledge, but in other cases the sender may alert the addressee in
one of two ways:
(i) through meta-communication: another communicative act about the ironic one,
performed through a “dedicated” signal that specifically means “I am being ironic”:
a wink, a “blank face” – an apparently and unnaturally inexpressive face – or an
asymmetrical smile;
(ii) through para-communication: another communicative act besides the ironic one,
performed either in sequence in the same modality or simultaneously through
other modalities that utterly contradicts it (for instance, a bored face while uttering
an enthusiastic utterance).
Para-communication of the “irony alert” often occurs in the “Clean Hands” trial, a trial
of very high political importance in which the prosecutor Antonio Di Pietro uses irony with
the accused and witnesses to demonstrate that they are not credible (Poggi, Cavicchio, and
Magno Caldognetto 2008; Poggi 2010a).
Di Pietro is trying to demonstrate that Paolo Cirino Pomicino received 5 billion lire from Dr. Ferruzzi, an industrialist, for the political elections. Cirino Pomicino says that the day after the elections he received Ferruzzi at his home at 7.30 in the morning, and that he did so just because seven months before he had promised Sama, Ferruzzi’s mediator, that he would meet Ferruzzi. Di Pietro ironically remarks that it is quite strange that Cirino Pomicino received Ferruzzi at his home at that time in the morning, and also strange that this was only because, seven months before, he had committed himself to meeting Ferruzzi, and not because he had to thank Ferruzzi for granting 5 billion for the elections!
He does so to argue that Pomicino did know he was doing something illicit.
(2) Di Pietro says: “Il vero impegno che aveva preso questo signore era di ringraziare,
di sdebitarsi di un impegno che aveva preso col dottor Sama a giugno di sette mesi
prima”
(‘The true commitment of this gentleman was to thank, to pay off his debt of
something he had been committed to with Dr. Sama in June seven months
before’).
While uttering “a giugno” (in June), Di Pietro with both hands depicts an oblong shape
up in the air, and gazes up as if looking up in the sky: an iconic gesture and a deictic gaze
that together refer to something like a cloud, hence a metaphor of “vagueness.” This ges-
ture of vagueness contrasts with the straight idea of a commitment (“impegno”), para-
communicating a meaning that contrasts with the words used, and thus signaling irony.
In persuasive discourse, gesture and gaze convey types of information such as certainty, importance, evaluation, emotion, and the sender’s competence and benevolence.
These types of information contribute to persuasion in that they implement the three
Aristotelian strategies of logos (logical argumentation), ethos (the sender’s reliability)
and pathos (the appeal to the addressee’s emotions); so classifying gesture and gaze
items in terms of this typology, and comparing their frequency, allows one to distinguish
how persuasive different styles of political discourse are and how much they respec-
tively employ the pathos, ethos or logos strategy (Poggi 2005; Poggi and Pelachaud
2008; Poggi and Vincze 2008).
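As a toy sketch of the frequency comparison just described (the mapping from information types to strategies is a simplification of the typology, and the annotated items are invented), one could tabulate strategy profiles as follows:

```python
from collections import Counter

# Simplified mapping from type of information to Aristotelian strategy.
STRATEGY = {
    "certainty": "logos", "importance": "logos",
    "competence": "ethos", "benevolence": "ethos",
    "evaluation": "pathos", "emotion": "pathos",
}

def strategy_profile(items):
    """Relative frequency of each persuasive strategy in a list of annotated items."""
    counts = Counter(STRATEGY[info_type] for _, info_type in items)
    total = sum(counts.values())
    return {s: counts[s] / total for s in ("logos", "ethos", "pathos")}

# Invented annotations: (gesture or gaze item, type of information it conveys).
speaker_A = [("precise beats", "certainty"), ("raised index", "importance"),
             ("hand on heart", "benevolence")]
speaker_B = [("wide smile", "emotion"), ("frown of indignation", "evaluation"),
             ("open palms", "benevolence")]

print(strategy_profile(speaker_A))   # leans towards logos and ethos
print(strategy_profile(speaker_B))   # leans towards pathos
```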
6. References
Alibali, Martha, Sotaro Kita and Amanda J. Young 2000. Gesture and the process of speech
production: We think, therefore we gesture. Language and Cognitive Processes 15:
593–613.
Attardo, Salvatore, Jodi Eisterhold, Jennifer Hay and Isabella Poggi 2003. Multimodal markers of
irony and sarcasm. Humor. International Journal of Humor Research 16(2): 243–260.
Bernardis, Paolo, Elena Salillas and Nicoletta Caramelli 2008. Behavioural and neurophysiologi-
cal evidence of semantic interaction between iconic gestures and words. Cognitive Neuropsy-
chology 25(7–8): 1114–1128.
Castelfranchi, Cristiano and Domenico Parisi 1980. Linguaggio, Conoscenze e Scopi. Bologna: Il
Mulino.
Castelfranchi, Cristiano and Isabella Poggi 1998. Bugie, Finzioni, Sotterfugi. Per una Scienza del-
l’inganno. Roma: Carocci.
Chomsky, Noam 1965. Aspects of the Theory of Syntax. Cambridge: Massachusetts Institute of
Technology Press.
Conte, Rosaria and Cristiano Castelfranchi 1995. Cognitive and Social Action. London: University
College London Press.
Eco, Umberto 1975. Trattato di Semiotica Generale. Milan: Bompiani. English translation: A Theory of
Semiotics. Bloomington: Indiana University Press.
Ekman, Paul and Wallace V. Friesen 1982. Felt, false, and miserable smiles. Journal of Nonverbal
Behavior 6(4): 238–252.
Hartmann, Björn, Maurizio Mancini and Catherine Pelachaud 2002. Formational parameters and
adaptive prototype instantiation for MPEG-4 compliant gesture synthesis. Computer Anima-
tion 2002: 111–119.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard Uni-
versity Press.
Krauss, Robert M. 1998. Why do we gesture when we speak? Current Directions in Psychological
Science 7: 54–60.
Magno Caldognetto, Emanuela, Isabella Poggi, Piero Cosi, Federica Cavicchio and Giorgio Mer-
ola 2004. Multimodal score: An ANVIL based annotation scheme for multimodal audio-video
analysis. Workshop on Multimodal Corpora LREC 2004. Centro Cultural de Belem, Lisboa,
Portugal, 25th May 2004.
Mancini, Maurizio and Catherine Pelachaud 2009. Implementing distinctive behavior for con-
versational agents. In: Miguel Sales Dias, Sylvie Gibet, Marcelo M. Wanderley and Rafael
Bastos (eds.), Gesture-Based Human-Computer Interaction and Simulation, 163–174. Berlin/
Heidelberg: Springer.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Merola, Giorgio 2009. The effects of the gesture viewpoint on the students’ memory of words and stor-
ies. In: Miguel Sales Dias, Sylvie Gibet, Marcelo M. Wanderley and Rafael Bastos (eds.), Gesture-
Based Human-Computer Interaction and Simulation, 272–281. Berlin/Heidelberg: Springer.
Merola, Giorgio and Isabella Poggi 2004. Multimodality and gestures in the teacher’s communi-
cation. In: Antonio Camurri and Gualtiero Volpe (eds.), Gesture-Based Communication in
Human-Computer Interaction. Proceedings of the 5th Gesture Workshop, GW 2003, Genova,
Italy, April 2003, 101–111. Berlin: Springer.
Miceli, Maria and Cristiano Castelfranchi 1992. La Cognizione del Valore. Milan: Franco Angeli.
Miceli, Maria and Cristiano Castelfranchi 1995. Le Difese della Mente. Rome: Nuova Italia
Scientifica.
Miceli, Maria and Cristiano Castelfranchi 2000. The role of evaluation in cognition and social
interaction. In: Kerstin Dautenhahn (ed.), Human Cognition and Social Agent Technology, 225–
261. Amsterdam: John Benjamins.
Miller, George A., Eugene Galanter and Karl A. Pribram 1960. Plans and the Structure of Behav-
ior. New York: Holt, Rinehart and Winston.
Morris, Desmond 1977. Manwatching. London: Jonathan Cape.
Parisi, Domenico and Francesco Antinucci 1973. Elementi di Grammatica. Turin: Boringhieri.
English Transl. Fundamentals of Grammar. New York: Academic Press.
Parisi, Domenico and Cristiano Castelfranchi 1975. Discourse as a hierarchy of goals. Working
Papers, 54–55. Urbino: Centro Internazionale di Semiotica e Linguistica.
Peirce, Charles Sanders 1935. Collected Papers. Cambridge: Cambridge University Press.
Pelachaud, Catherine and Isabella Poggi 1998. Talking faces that communicate by eyes. In: Serge
Santi, Isabelle Guaitella, Christian Cavé and Gabrielle Konopczynski (eds.), Oralité et Gestua-
lité, Communication Multimodale, Interaction, 211–216. Paris: L’Harmattan.
Pelachaud, Catherine and Isabella Poggi (eds.) 2001. Multimodal Communication and Context in
Embodied Agents. Proceedings of the Workshop W7 at the 5th International Conference on
Autonomous Agents, Montreal, Canada, 29 May 2001.
Poggi, Isabella 2002a. The lexicon of the conductor’s face. In: Paul McKevitt, Seán O’ Nuallàin
and Conn Mulvihill (eds.), Language,Vision, and Music. Selected Papers from the 8th Interna-
tional Workshop on the Cognitive Science of Natural Language Processing, Galway, 1999, 271–
284. Amsterdam: John Benjamins.
Poggi, Isabella 2002b. Symbolic gestures. The case of the Italian gestionary. Gesture 2(1): 71–98.
Poggi, Isabella 2005. The goals of persuasion. Pragmatics and Cognition 13: 298–335.
Poggi, Isabella 2006a. Body and mind in the pianist’s performance. In: Mario Baroni, Anna Rita
Addessi, Roberto Caterina and Marco Costa (eds.), Proceedings of the ICMPC 9 (International
Conference on Music Perception and Cognition), Bologna, August, 22–26, 2006: 1044–1051.
Poggi, Isabella 2006b. Le Parole del Corpo. Introduzione alla Comunicazione Multimodale. Rome:
Carocci.
Poggi, Isabella 2007. Mind, Hands, Face and Body. A Goal and Belief View of Multimodal Com-
munication. Berlin: Weidler.
Poggi, Isabella (ed.) 2008. La Mente del Cuore. Le Emozioni nel Lavoro, nella Scuola, nella Vita.
Rome: Armando.
Poggi, Isabella 2011. Irony, humour and ridicule. Power, image and judicial rhetoric in an Italian
political trial. In: Robert Vion, Robert Giacomi and Claude Vargas (eds.), La corporalité du
langage : Multimodalité, discours et écriture, Hommage à Claire Maury-Rouan. Aix-en Provence:
Publications de L’Université de Provence.
Poggi, Isabella 2011. Music and leadership: The choir conductor’s multimodal communication.
In: Gale Stam and Mika Ishino (eds.), Integrating Gestures. The Interdisciplinary Nature of Ges-
tures, 341–353. Amsterdam: John Benjamins.
Poggi, Isabella, Federica Cavicchio and Emanuela Magno Caldognetto 2007. Irony in a judi-
cial debate: analyzing the subtleties of irony while testing the subtleties of an annotation
scheme. Language Resources and Evaluation 41(3–4): 215–232.
Poggi, Isabella and Francesca D’Errico 2009. Social signals and the action-cognition loop. The case
of overhelp and evaluation. Proceedings of the 1st IEEE International Workshop on Social
Signal Processing, Amsterdam, September 13, 2009.
Poggi, Isabella, Francesca D’Errico and Alessia Spagnolo 2010. The embodied morphemes of
gaze. In: Stefan Kopp and Ipke Wachsmuth (eds.), Gesture in Embodied Communication
and Human-Computer Interaction, GW 2009, LNAI 5934, 34–46. Berlin: Springer.
Poggi, Isabella and Emanuela Magno Caldognetto 1997. Mani che Parlano. Gesti e Psicologia della
Comunicazione. Padua, Italy: Unipress.
Poggi, Isabella, Emanuela Magno Caldognetto, Federica Cavicchio, Florida Nicolai and Silvia M.
Sottofattori 2007. Planning and generation of multimodal communicative acts. Poster pre-
sented at the III International ISGS Conference, Evanston (Ill.), 15–18 June 2007.
Poggi, Isabella and Catherine Pelachaud 2008. Persuasion and the expressivity of gestures in hu-
mans and machines. In: Ipke Wachsmuth, Manuela Lenzen and Gunther Knoblich (eds.), Em-
bodied Communication in Humans and Machines, 391–424. Oxford: Oxford University Press.
Poggi, Isabella, Catherine Pelachaud and Berardina De Carolis 2001. To display or not to display?
Towards the architecture of a reflexive agent. Proceedings of the 2nd Workshop on Attitude,
Personality and Emotions in User-adapted Interaction. User Modeling 2001, Sonthofen (Ger-
many), 13–17 July 2001.
Poggi, Isabella and Emanuela Roberto 2007. Towards the lexicon of gaze. An empirical study. In:
Andrzej Zuczkowski (ed.), Relations and Structures. Proceedings of the 15th International Sci-
entific Convention of the Society for Gestalt Theory and its Application, Macerata, May 24–27.
Poggi, Isabella and Laura Vincze 2008. The persuasive import of gesture and gaze. Proceeding on
the Workshop on Multimodal Corpora, LREC, Marrakech, 46–51.
Posner, Roland and Massimo Serenari 2001. Il grado zero della gestualità: dalla funzione pratica a
quella simbolica-alcuni esempi dal Dizionario berlinese dei gesti quotidiani. In: Emanuela
Magno Caldognetto and Piero Cosi (eds.), Atti delle 11e Giornate del Gruppo di Fonetica Sper-
imentale: Multimodalità e Multimedialità della Comunicazione. Padova, November 29–Decem-
ber 1, 2000, 81–88. Padua, Italy: Unipress.
Rector, Monica, Isabella Poggi and Nadine Trigo (eds.) 2003. Gestures. Meaning and Use. Porto:
Universidade Fernando Pessoa.
Rimé, Bernard and Loris Schiaratura 1991. Gesture and speech. In: Robert Feldman and Bernard
Rimé (eds.), Fundamentals of Nonverbal Behavior, 239–284. New York: Cambridge University
Press.
Saussure, Ferdinand de 1916. Cours de Linguistique Générale. Paris: Payot.
Serenari, Massimo 2003. Examples from the Berlin dictionary of everyday gestures. In: Monica
Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures. Meaning and Use, 15–32. Porto: Uni-
versidade Fernando Pessoa.
Stokoe, William C. 1978. Sign Language Structure: An Outline of the Communicative Systems of
the American Deaf. Silver Spring, MD: Linstock Press.
Vincent, Jocelyne and Cristiano Castelfranchi 1981. On the art of deception: How to lie while say-
ing the truth. In: Herman Parret, Marina Sbisà and Jef Verschueren (eds.), Possibilities and
Limitations of Pragmatics, 749–778. Amsterdam: John Benjamins.
Volterra, Virginia 1987. LIS. La Lingua Italiana dei Segni. Bologna: Il Mulino.
Abstract
This chapter presents an overview of nonverbal communication from a functional prag-
matic perspective. Starting from a short discussion of the notion of multimodality within
functional pragmatics, the chapter focuses on the means of nonverbal communication
and offers a systematics for nonverbal communication, basic analytic notions, its relation
to linguistic action, and its transcription.
1. Functional pragmatics
1.1. Form, function, and linguistic action
Functional pragmatics (FP) is a theory of linguistic action (Ehlich 2007a; Redder 2008;
Rehbein and Kameyama 2003; Thielmann 2013). Functional pragmatics conceives of
linguistic action as a combination of action purposes and linguistic forms that can
be used to achieve these purposes. The functional-pragmatic analysis of linguistic action
is thus based on a close nexus of function and form. Languages are seen as elaborated
systems for the realisation of the interactors’ purposes. Languages are resources for
the interactors’ activities. Linguistic interaction is the most important type of human
communication.
Nonverbal communication is realised above all by the use of the face and by the use of hands and fingers. Other parts of the body, though they may also be used for communicative purposes, seem to be less apt for this application, all the more since human beings hide parts of the body and, by contrast, disclose others. This causes further restrictions on the usability of the body and its parts in communication.
4.3. Repertoires of expression of the body and its parts and their
aptitude for communicative purposes
The double character of visual perceptibility of the body in communicative contexts is
determined by the specific primary functions of the body and its parts being a specific
potential and a specific restriction. Compared with the 360-degree perceptibility of the acoustic dimension, the visual dimension of human beings is characterised by a limited sight angle of slightly more than 180 degrees – everything that happens behind the body being invisible. Turning the body renders invisible everything that had been visible before, in order to make visible what had not been visible before. The basic front-rear
dichotomy that characterises the human body (as it characterises many other animals) is
a strong restriction for the use of the body as a means of communication. On the other
hand, the possibility of turning the body towards the other offers a communicative
potential that is largely used for nonverbal communication. Face-to-face communica-
tion genuinely is communication with the front of the body turned to the front of the
body of the other. The management of proximity and distance is of basic importance
in communication. It is part of the elementary cooperation without which communication
is impossible (cf. Ehlich 2007b).
The face, the eyes and the mouth are the essential means of expression that are used
for mimetic communication. Forehead and chin are more restricted with regard to their
usability for communicative purposes. Nose and ears have only a small repertoire of
movement that may be used for communicative purposes.
The movability of arms and hands extends the space that is occupied by the body as
res extensa into a virtual space that is limited by the maximal extension that arms or
hands can reach. Since arms and hands are integrated into the front-rear type of orien-
tation the visual domain can easily be transferred into the haptic dimension (touch, cf.
Dreischer 2001).
Other parts of the body (breast, belly, hips, lower extremities) offer only a few possibilities of expression, though they do have some. Their communicative use is strongly
restricted in different cultures.
From the right to the left, the salience of nonverbal phenomena in Fig. 41.1 decreases,
while, at the same rate, the difficulty of analysis increases. Neutral concomitant nonver-
bal communication thus presents the greatest difficulties to analysis; ostentatious non-
verbal communication is quite easy to deal with, but its analysis has only little to offer in
comparison.
A set of everyday linguistic terms, such as “winking” or “waving”, can be used for the analysis of the repertoires of expression and of their expressive units. These terms combine to form a semantic field of their own. Such everyday terms can be taken as a starting point of analysis. But they always carry the danger of misleading the analysis and of missing the systematic structure of the communicative units and their inner connexion. In many cases, everyday terms are applicable in everyday communication precisely because of their indeterminateness and lack of precision. Scientific analysis faces the task of aiming at precision with reference to the phenomena under investigation and their systematic interrelationship. A simple use or transposition of everyday terminology will not suffice to achieve adequate results. The history of nonverbal communication research is full of examples of quid pro quos and of ambivalences caused by the uncritical application of everyday terms to the analysis.
Expressive units are understandable and obligatory for the members of a communi-
cative group. Their use follows the communicative purposes of the communicative in-
teractors. Hence, expressive units are more than indications, indexes or telltale signs.
In communicative interaction, it is not a question of inferring intrapsychological states
of affairs of the interlocutor by “decoding” nonverbal communication units which might
disclose what otherwise would be hidden. On the contrary, what is at stake are acts of
understanding that are structured in a way similar to those of verbal communication.
However, because of the limitations of the movement potentials nonverbal communica-
tion cannot achieve the same degree of differentiation as verbal communication. Further-
more, the ties to face-to-face communication situations are difficult to cut for nonverbal
communication. This has drastic consequences for the extension of nonverbal communi-
cation as a communicative subsystem. The “double articulation” characteristic of verbal
languages (Martinet 1960) is not applicable to nonverbal communication. The individual expressive units are characterised by an inner complexity that is difficult to break down.
These analytical tasks necessitate a long-lasting research process that is structured in a hermeneutic way. This process demands extensive empirical work and, at the same time, reflection on this work’s theoretical foundations. It is not likely that the research process can be significantly shortened, for instance by recourse to alleged
universals. The research process will most probably yield fragmentary results for quite some time. These results may be produced with respect to the various analytical steps named above, such as descriptions of movement potentials or of single repertoires of expression or single expressive units, or they may refer to close investigation into the systematics or into parts of the systematics. Since the achievement of a consistent conceptualisation of nonverbal communication as a subsystem of purposeful human action is of fundamental importance, it will be essential that none of the above-mentioned aspects of analysis is isolated from the others or proclaimed the only relevant one.
As an example of a comprehensive analysis that demonstrates this methodological procedure in detail, the above-mentioned book by Ehlich and Rehbein (1982) gives an account of eye communication and of one specific expressive unit of this communicative sub-domain, a unit called “deliberative gaze avoidance” (deliberatives Wegblicken).
The functional pragmatics research programme with regard to nonverbal communica-
tion comprises the quantitative extension of similar analyses that deliver precise and de-
tailed descriptions of expressive units of nonverbal communication. One basic task of
these analyses is a critical reconstruction of the results that have been reached so far
in the rich literature on nonverbal communication by re-interpreting these results in
terms of the above mentioned categorical framework.
Research into factual communication, which has increased steadily during the last
40 years, aims at quick and comprehensive results. However, such results cannot be
achieved by short cuts and premature generalisations or universalisations. The func-
tional pragmatics research programme for nonverbal communication is aware of the
necessity of extensive efforts. Anticipations and preliminary categories will be indis-
pensable during the analytical process – as, for instance, when descriptions of nonverbal com-
munication have to be entered into transcriptions of spoken language. It will be
necessary, however, to keep in mind the preliminary and restricted character of such
categories. Such preliminary categories can consist of everyday terms or of descriptions
of movement repertoires or their parts at a time when the reconstruction of their pur-
pose has not yet been achieved. This is why this research process requires great meth-
odological care and attention. However, wherever a precise description has been
achieved, its use for further development of analysis will be evident.
Each of these procedure types is part of a functional nexus, a linguistic field (in the
sense of Bühler 1934). Though most of the procedures are integrated into linguistic
acts, when a linguistic action is performed, a small group of procedures can stand alone
and can sufficiently execute a communicative purpose without further integration into
higher order linguistic acts or actions. The isolated articulation of the deictic “here” is
one example. Its utterance is fully sufficient to fulfil the purpose of a deictic procedure,
namely to orientate the hearer in his/her immediate environment. Such self-sufficient pro-
cedures are primary candidates for the use of a presentational independent nonverbal
communicative unit, namely the gesture of pointing. In such a case, the hearer’s (or “view-
er’s”, as it were) orientation can be accomplished completely by means of the nonverbal
communication unit without any accompanying verbal utterance. Self-sufficiency is also
possible in the case of incitive procedures. “Waving” can function sufficiently to make
another person turn towards the interactor just in the same way as a loud “hello” would do.
It is much more difficult to identify the position of concomitant neutral nonverbal
communication units in the overall systematics of linguistic action. In the case of
these nonverbal communication units, it is possible that expressive procedural aspects
are specifically combined with operative procedures. Expressive procedures serve the
purpose of emotionally aligning the speakers with their communicative interactors. In
European languages the usual mode of expression is paralinguistic in nature, i.e. expres-
sive procedures occur in procedural combination with other procedures, esp. with
symbolic procedures. Such a function can also be realised by means of a nonverbal
communication unit that co-occurs with the verbal utterance.
Operative procedures are used to process the linguistic activities themselves. Assis-
tance in the structuring of a verbal utterance by means of gesture (cf. Müller 1998)
may serve one of the purposes of operative procedures. The acquisition of these
communicative skills obviously starts very early in childhood (cf. Leimbrink 2010).
The other types of nonverbal communication from Fig. 41.1 need specific analyses to
determine their functions in the framework of the systematics of linguistic actions.
The more precisely the expressive units can be reconstructed in the terms of linguistic action, the more reliable and usable transcription work will become for linguistic analysis.
As has been said above, nonverbal communication, seen in a functional pragmatic
perspective, is a highly complex field of analysis. Analytic knowledge with regard to
the expressive units that make up the nonverbal communication of a specific commu-
nication community still is very rudimentary. A continuous enrichment and enlarge-
ment of our knowledge of nonverbal communication, of its units and its structures is
indispensable to achieve progress. For the further development of functional pragmatics
theory and for concrete analytical work in its context this knowledge is an important
desideratum.
11. References
Berkemeier, Anne 2003. Wie Schüler(innen) ihr nonverbales Handeln beim Präsentieren und
Moderieren reflektieren. In: Otto Schober (ed.), Körpersprache im Deutschunterricht, 58–72.
Baltmannsweiler: Schneider Verlag.
Berkemeier, Anne 2006. Präsentieren und Moderieren im Deutschunterricht. Baltmannsweiler:
Schneider Verlag.
Bühler, Karl 1934. Sprachtheorie. Die Darstellungsfunktion der Sprache. Jena: Fischer. [Transla-
tion: Theory of language. The representational function of language. Translated by Donald
Fraser Goodwin, 1990. Amsterdam: John Benjamins].
Bührig, Kristin 2005. Gestik in einem inszenierten Fernsehinterview. In: Kristin Bührig and Frank
Sager (eds.), Nonverbale Kommunikation im Gespräch, 193–215. (Osnabrücker Beiträge zur
Sprachtheorie (OBST) 70.) Oldenburg: Redaktion OBST.
Darwin, Charles 1872. The Expression of the Emotions in Man and Animals. London: Murray.
Dreischer, Anita 2001. “Sie brauchen mich nicht immer zu streicheln…” Eine diskursanalytische
Untersuchung zu den Funktionen von Berührungen in medialen Gesprächen. (Arbeiten zur
Sprachanalyse 39.) Frankfurt am Main: Peter Lang.
Ehlich, Konrad 1993. HIAT – A transcription system for discourse data. In: Jane Edwards and
Martin D. Lampert (eds.), Talking Data: Transcription and Coding in Discourse Research,
123–148. Hillsdale, NJ: Lawrence Erlbaum.
Ehlich, Konrad 2007a. Funktional-pragmatische Kommunikationsanalyse – Ziele und Verfahren.
In: Konrad Ehlich, Sprache und sprachliches Handeln, Volume 1, 9–28. (Pragmatik und
Sprachtheorie.) Berlin: De Gruyter.
Ehlich, Konrad 2007b. Kooperation und sprachliches Handeln. In: Konrad Ehlich, Sprache und
sprachliches Handeln, Volume 1, 125–137. (Pragmatik und Sprachtheorie.) Berlin: De Gruyter.
Ehlich, Konrad and Jochen Rehbein 1977. Wissen, kommunikatives Handeln und die Schule. In:
Herma C. Goeppert (ed.), Sprachverhalten im Unterricht, 36–114. Munich: Fink/UTB.
Ehlich, Konrad and Jochen Rehbein 1981a. Die Wiedergabe intonatorischer, nonverbaler und ak-
tionaler Phänomene im Verfahren HIAT. In: Annemarie Lange-Seidl (ed.), Zeichenkonstitu-
tion, Volume 2, 174–186. Berlin: De Gruyter.
Ehlich, Konrad and Jochen Rehbein 1981b. Zur Notierung nonverbaler Kommunikation für dis-
kursanalytische Zwecke (Erweiterte halbinterpretative Arbeitstranskriptionen (HIAT 2)). In:
Peter Winkler (ed.), Methoden der Analyse von Face-to-Face-Situationen, 302–329. Stuttgart:
Metzler.
Ehlich, Konrad and Jochen Rehbein 1982. Augenkommunikation. Methodenreflexion und Beispiel-
analyse. (Linguistik Aktuell 2.) Amsterdam: John Benjamins.
Grasser, Barbara and Angelika Redder 2011. Schüler auf dem Weg zum Erklären – eine
funktional-pragmatische Fallanalyse. In: Petra Hüttis-Graff and Petra Wieler (eds.), Übergänge
zwischen Mündlichkeit und Schriftlichkeit im Vor- und Grundschulalter, 57–78. Freiburg:
Fillibach.
Hanna, Ortrun 2003. Wissensvermittlung durch Sprache und Bild. Sprachliche Strukturen in der in-
genieurwissenschaftlichen Hochschulkommunikation. (Arbeiten zur Sprachanalyse 42.) Frank-
furt am Main: Peter Lang.
Heilmann, Christa M. 2005. Der gestische Raum. In: Kristin Bührig and Frank Sager (eds.),
Nonverbale Kommunikation im Gespräch, 117–136. Oldenburg: Redaktion OBST.
Leimbrink, Kerstin 2010. Kommunikation von Anfang an. Die Entwicklung von Sprache in den
ersten Lebensmonaten. Tübingen: Stauffenburg.
Martinet, André 1960. Eléments de Linguistique Générale. Paris: Armand Colin.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorien – Sprachvergleich.
Berlin: Berliner Wissenschaftsverlag.
Redder, Angelika 2001. Aufbau und Gestaltung von Transkriptionssystemen. In: Klaus Brinker,
Gerd Antos and Wolfgang Heinemann (eds.), Text- und Gesprächslinguistik. Ein internatio-
nales Handbuch, 2. Halbband, 1038–1059. (Handbücher zur Sprach- und Kommunikationswis-
senschaft, 16.2.) Berlin: De Gruyter.
Redder, Angelika 2008. Functional Pragmatics. In: Gerd Antos and Eija Ventola (eds.), Interper-
sonal Communication, 133–178. (Handbook of Applied Linguistics 2.) Berlin: De Gruyter.
Rehbein, Jochen and Shinichi Kameyama 2003. Pragmatik. In: Ulrich Ammon, Norbert Dittmar,
Klaus Mattheier and Peter Trudgill (eds.), Sociolinguistics. An International Handbook of the
Science of Language and Society, 556–588. (Handbücher zur Sprach- und Kommunikationswis-
senschaft 3.1.) Berlin: De Gruyter.
Rehbein, Jochen, Thomas Schmidt, Bernd Meyer, Franziska Watzke and Annette Herkenrath
2004. Handbuch für das computergestützte Transkribieren nach HIAT. (Serie B 56.) Hamburg:
Arbeiten zur Mehrsprachigkeit.
Thielmann, Winfried 2013. Konrad Ehlich. In: Carol Chapelle (ed.), Encyclopedia of Applied
Linguistics. Oxford: Blackwell.
Wrobel, Ulrike 2007. Raum als kommunikative Ressource. Eine handlungstheoretische Analyse vi-
sueller Sprachen. (Arbeiten zur Sprachanalyse 47.) Frankfurt am Main: Peter Lang.
Abstract
My interest essentially lies in the specificity of co-speech gesture and in its mode of sym-
bolic functioning. I argue for viewing gesture as a symbolic system in its own right that
interfaces with thought and speech production.
Consider, for example, the gesture Palm Forwards, which can convey a meaning of self-protection (warding something off from oneself) originating from a link of contiguity between the gesture (the palm moves forwards) and its functional meaning (to protect oneself). The analogical link of function is a link of contiguity.
However, as a vertical surface with a rectangular shape turned away from the body,
Palm Forwards at face level can represent a notice, the relevant element in the action of
putting an announcement on a notice board. The virtual object thus represented can
evoke, via metonymy, the whole action schema and its motive: putting a notice on a
board to make its written content known to everyone. And, by further semantic derivation outside the domain of the written word, it can evoke the action of making something known to everyone as if one were displaying it on a notice board. This example shows us that the
semiotic process that produces representational gesture occurs in stages, in this case
using several links: resemblance of shape (rectangular flat palm and notice), temporal
contiguity (displaying information), and resemblance of motive (public announcement).
The analogical link of shape is a link of physical resemblance, and the contextual meaning
(public announcement) is derived from this physical resemblance.
In sum, the analogical link is the initial link of contiguity or of resemblance estab-
lished through analogy between a physical feature of the gesture and our physical expe-
rience of the world. On the basis of the analogical link, further links of contiguity or
resemblance may come into play to create the contextual meaning of a gesture. We
shall come back to the importance of identifying the analogical link of a gesture in
order to discover the meaning of a gesture in a given context.
A physical similarity between the gesture’s physical elements and the reflex action
(Palm Forwards) creates a physical metaphor (physical self-protection). This serves
to express an abstract metaphor via a transfer from the physical world (reflex of physical
self-protection) to an abstract notion (non-physical self-protective action).
A deep analysis of the kinesic sign highlights its motivated character. Researching
this motivation leads us to home in on the perceptual-motor experience of the body
in physical interaction with its environment. This return to the origins highlights the
non-conscious, physico-symbolic information conveyed by the gestural sign that thus
operates on several levels of consciousness; quite often, it is its deep motivation, its
non-conscious “symbolic action,” that is revealed to be of highest relevance on the
semantic level during speech production. Knowledge of this root meaning enables us
to explain the spontaneous choice of a particular kinesic expression at the expense
of another and, in this way, to gain a deeper understanding of the utterance.
2. Corpora
My research began by studying attitudes that French people express through co-
occurrent intonational and facial expressions, truly audio-visual nonverbal entities
(Calbris and Montredon 1980). It continued by concentrating solely on the visual mod-
ality, first of all on conventional gestural expressions that can be understood without a
context (emblems), and then on spontaneous gestural expressions that occur during
speech production (co-speech gestures).
Initially, 50 French (Calbris 1980: 245–347) and foreign (Calbris 1981: 125–156) sub-
jects were tested using an experimental film designed to study how 34 conventional
French co-occurrent manual and facial expressions are structured to convey meaning.
The results give indications of the relative pertinence of meaningful physical elements
and the cultural character of these expressions. Moreover, the foreign subjects inter-
preted them as signs that necessarily have a “motivated” origin, that is to say, there
seems to be a natural driving force that has led to their appearance as opposed to an
arbitrary pairing of forms and meanings established by convention.
An outcome of this initial research was the need to verify the motivation of the phys-
ical components of the gestures produced during spontaneous uses of language. This
gave rise to Corpus 1, a very varied collection of about a thousand samples of co-speech
French gestures ethnographically noted in 1981 in the field, for example, in trains and
cafes, as well as selected from media such as films, comedy sketch shows, and television
debates. The semiotic analysis of these gestures, classified according to their physical as-
pects by evaluating the hierarchical relevance of their physical components in view of
their corresponding contextual meanings, was the subject of a doctoral thesis (Calbris
1983), later condensed into a book (1990). A comparative analysis of the data showed
that one gesture can evoke several notions (semantic diversity) and that one notion can
be represented by several gestures (physical diversity). This being the case, how is the
presumably motivated character of gesture maintained? I sought to answer this ques-
tion by conducting a further comparative analysis of the data. This revealed the phenomenon of semantic derivation from either one single physico-semantic link (single motivation), or one link selected from several possible physico-semantic links (plural motivation) (Calbris 1987: 57–96).
To confirm the results obtained from observations noted in the field, in 1990 I estab-
lished Corpus 2, a database of audio-visual samples of French gestures: fragments of se-
quences, varying in length from a few seconds to one minute, selected from filmed
interviews with about 60 people, mostly intellectuals.
Corpus 3 is a series of six interviews with Lionel Jospin, the former French Prime
Minister, that were broadcast on French television between July 1997 and April 1998.
It was established to study how the two types of sign – gestural and verbal – interact
synergetically during utterance production.
A major contribution of my work resides in demonstrating the plural motivation of
gesture. This modifies our perception of the referential function of co-speech gesture
because one gesture can be bi-referential or even multi-referential. Thus, in a given
instance, by establishing several analogical (physico-semantic) links between its
physical aspect and its contextual meaning, one gesture can contain several gestural
signs.
3. Methodology
In order to progress towards a deep understanding of surface phenomena, my method
of analysis operates on several descending levels: from the examination of a co-speech
gesture in its contexts of use, to its motivation, and then to the physical origin of its
motivation. The stages of analysis are summarized as follows:
By researching physical diversity, on the one hand, and semantic diversity, on the other,
one may discover the range of physico-semantic links contained in the database.
The major class of gestures in space is subdivided according to the form of the move-
ment pathway: straight-line gestures are opposed to curved gestures, whose components
of secondary relevance differ completely from one another. In the case of straight-line
gestures, what matters is the directional axis of the movement performed by a body
part in a particular configuration, and in a specific orientation if the configuration
has a flat shape. Repetition and symmetry are also secondary components. In the
case of curved gestures, what matters is the form created by the movement as well as its
direction: progressive (clockwise) movement is opposed to regressive (counterclockwise)
movement.
Whereas there are essentially only three priority physical components (localiza-
tion, movement, and body part), the secondary components are numerous: repetition
of movement, type of repetition, and movement quality all have their importance, as
does laterality or the use of both hands, as well as the plane, whether it be the plane in
which the flat hand performing a straight-line movement is oriented or the plane in
which a curved movement is performed. Furthermore, certain physical elements of
the flat hand, such as the tip, the edge, or the surface of the flat palm, appear to be
relevant. The sub-components of a configuration, just like the sub-components of
a movement, can unite to constitute the relevant physical feature of the gesture in
question.
In practice, the procedure comprises five steps (a schematic sketch of steps (i)–(iii) follows the list):
(i) Code the representational gestures’ components: the coded description indicates,
always in the same order, the hand used (or both hands), its localization (only
for gestures at face level or higher), its configuration, its orientation, and then
its movement. For example, the right hand [R], closed in a Fist [R ¶] turned
inwards towards the speaker [R ¶b], moves forwards [R ¶b.f];
(ii) Create repertoires according to gestural components;
(iii) In each repertoire, determine the common semantic element corresponding to the
common physical element;
(iv) Deduce the potential analogical link(s) between physical and semantic elements;
(v) Validate the analogical link.
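A schematic sketch of steps (i)–(iii), with invented codes and glosses rather than Calbris’s actual notation, might look like this:

```python
from collections import defaultdict

# (i) Coded gestures: (hand, configuration, orientation, movement) plus the
#     contextual meaning observed; the tuples and glosses are invented examples.
gestures = [
    (("R", "flat hand", "palm down", "transverse"), "totality"),
    (("R", "flat hand", "palm down", "transverse"), "negation"),
    (("L", "flat hand", "palm down", "transverse"), "levelling"),
    (("R", "fist", "palm inwards", "forwards"), "effort"),
]

# (ii) Create repertoires according to a shared gestural component,
#      here the movement component (the last element of the code).
repertoires = defaultdict(list)
for code, meaning in gestures:
    repertoires[code[3]].append((code, meaning))

# (iii) For each repertoire, list the meanings so as to look for the common
#       semantic element corresponding to the common physical element.
for movement, items in repertoires.items():
    meanings = sorted({m for _, m in items})
    print(f"movement '{movement}': meanings {meanings}")
```

Steps (iv) and (v), deducing and validating the analogical link, remain interpretive and are not mechanized here.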
This method enables us to discover the diverse analogical links established between the
physical and the semantic levels and subsequently the combination of these links man-
ifested internally in the gestures. It brings to light the complex interplay of the symbolic
relations between gestures and the notions they express.
[Diagram: a gesture may be analyzed alternatively as a polysemous gesture or in terms of gesture variants.]
Semantic derivation proceeds from one and the same analogical link in the case of Palm Forwards, whereas the gesture performed with the Level Hand contains several analogical links.
[Fig. 42.2: Parallel attenuation, physical and semantic: Opposition to the outside (Calbris 1990: 119). Semantic values arranged in the figure: self-protection, stop, request to stop (“Time out!”), refusal of responsibility, objection, restriction, refusal, negation, negative insinuation.]
Level Hand can express the following alternative notions depending on its context of
use: “quantity” and as a value judgement “superlative”; “totality” and as a value judge-
ment “perfection”; “directness” on the temporal level (“immediately afterwards”), on
the logical level (“determinism,” “obligation,” “certainty”), or on the value-judgement
level (“frankness”); “stop-refusal” as in cases of “negation” (“nothing,” “never,” “no
more”), “refusal,” or “end”; “cutting”; and it can literally and figuratively express the
idea of a “flat surface,” a “second surface” that covers the first, the action of “laying
something out flat,” a “level,” or “making something level,” i.e. “equality.” It is a question
of “levelling – flattening – standardizing – equalizing.”
[Fig. 42.3: Plural motivation of the transverse movement of the Level Hand (Calbris 1990: 140). The movement makes relevant: the visual field swept from left to right, the fingertips that draw, the palm that resists, the edge that cuts, the flat palm that covers.]
The diverse contextual meanings of the same gesture make one or another of the com-
ponents relevant (movement or configuration), or even one of the physical traits of
the configuration (the palm, the fingertips, the edge of the hand, or the orientation
of the palm); most frequently, the analogical link resides in the movement of one of
these elements (the movement of the palm, of the fingertip(s), or of the edge of the
hand).
Let us now consider the relevant physical features upon which the different analogical
links in Fig. 42.3 are based:
(i) We know that other gesture variants expressing “totality” have a component in
common: the “transverse movement” of the hand or of the gaze, sweeping the
horizon, representing the whole visual field, “everything,” and “everywhere.”
(ii) The idea of “directness” that is common to the notions of “determinism,” “obliga-
tion,” “certainty” and “frankness” supposes the representation of a straight line
(linear trace made by a moving point); hence it is the movement of the fingertips
which becomes relevant.
(iii) The idea of “stop-refusal,” previously expressed by Palm Forwards opposing
something approaching from the outside, is signified by a horizontal movement
of the palm facing downwards. Could the palm stop an opposing force? The notion
of “the end” implies stopping a process that has come to an end. Would this be
represented by the palm stopping a progression originating from the ground?
(iv) To express the notion of “cutting,” it is the edge of the hand, or more exactly the
movement of the edge of the hand, which becomes relevant.
(v) Lastly, the analogy between the flat shape of the hand and a flat surface that it is
representing is obvious.
The analogical link applies to every domain and may be at the origin of semantic
derivation. For example, a direct link expresses “immediacy” on the temporal
level, and on the logical level, “immediate consequence” (“determinism,” “obliga-
tion,” “certainty”), whereas on the level of moral judgement, directness expresses
“frankness.”
4.2.2. Polysign
A gesture that simultaneously represents several notions is called a polysign. For exam-
ple, a raised fist that as co-speech gesture signifies how well a secret has been guarded
simultaneously represents “enclosure” by virtue of the fist configuration and “increas-
ing exclamation” by the upward movement. Each component (configuration, move-
ment) supporting an analogical link is a gestural sign. Hence the gesture (movement
of the configuration) is a polysign.
The complex gesture is a particular type of polysign. Here is an example from my
data: in order to simultaneously depict “mixing” and “approximation,” usually signified
by the two hands turning around each other in the first case, and by a rotational oscil-
lation of the concave palm facing downwards in the second, a screen writer expresses a
“kind of confusion” and a philosopher expresses an “approximate mixture” by perform-
ing the same synthesis, i.e. an alternating oscillation of the two concave palms, one
behind the other, as if they were interlocked (Fig. 42.4).
[Fig. 42.4 – Gesture: 1. two hands turning around each other; 2. rotational oscillation of the concave palm; synthesis of 1 and 2: alternating oscillation of the two concave palms, one behind the other.]
The multifaceted depiction requires one of the components to be modified so that the
analogical links necessary for synthetic representation can be cumulated. In other
words, an addition on the symbolic level requires a slight transformation on the physical
level.
We know that a gesture is a composite unit. Each of these components can itself be
decomposed. We have seen that a hand configuration, a flat hand, for example, is able to
contain several relevant physical elements: the fingertips, the palm, and the edge of
the hand. Similarly, a movement, for example, a curved line, is characterized by both
shape and direction. There are thus many elements which may convey meaning.
Since it is the analogical link, established by a gestural component or even a gestural
sub-component, that determines a meaning, several types of polysign may occur
(Tab. 42.2). Two analogical links established by two gestural components give rise to
a bi-referential gesture. Likewise, two analogical links within one component, for exam-
ple, a movement composed of two relevant sub-components, give rise to a bi-referential
movement. A polysign gesture may be multi-referential by having several analogical
links on the component and/or sub-component level. Tab. 42.2 shows just some of
the possible combinations of analogical links that, in sum, generate the referential
potential of polysign gestures.
Consequently, the different combinations of notions that these gestural components can
evoke are themselves multiple. If we consider Fig. 42.5,
(i) the “will,” on one hand, and the “progression,” on the other, represent together
the “will to advance”.
(ii) The “effort” and the “progression towards a goal” become allied to represent the
“effort towards a goal”.
(iii) The combination of the notion of “strength” and that of “temporal progression”
results in “strength and modernism”.
(iv) Finally, the representation of “strength” allied to a “progression” amounts to the
“strength to attack”.
Many combined and diverse meanings for one and the same gesture are possible.
In fact numerous contextual meanings can depend on a few physical elements that
support more than one analogical link, each of which may be subject to semantic
derivation.
When the analogical link is not obvious, one finds it by exploiting the interaction
between the phenomena of polysemy (one gesture represents different notions), on
the one hand, and variation (different gestures represent a notion), on the other. For
example, in France, the head shake expresses “negation”; but furthermore, as a co-
speech gesture, it can also express “totality” and/or “approximation.” How can we dis-
cover the analogical link inherent to each of the three contextual meanings of the head
shake? The answer lies in comparing the gesture variants of each notion. All those that
express “totality” are characterized by a transverse movement of the head, or of the
hand. What is the analogical relation between this common characteristic (transverse
movement in each gesture referring to “totality”) and the notion of “totality” if it is
not reference to the horizon, “everywhere,” concretely represented by the gaze sweep-
ing across the horizon (in a single or a repeated head movement), or by the palm cover-
ing it from one side to the other (using one hand or both hands in a symmetrical
movement)?
Fig. 42.6: Polysemy and gesture variants: Different notions nuanced by the polysemy of the
gesture (notions 1–4 crossed with gesture variants; variant a: 1. finished, 2. absolute, 3. total,
4. definitive)
Once the various elements of the symbolic Meccano system constituted by analogical
links have been identified, one can leisurely observe the constructions obtained by
the association of various elements. A polysemous gesture (meanings: 1, 2, 3, 4) can
in a given situation become a polysign: the totality is finished (1.a) and not unified;
the negation is absolute (2.a); the cut is total (3.a); and the stop is definitive (4.a).
For each notion, the polysemous gesture cumulates two analogical links, namely, the
link specific to the notion, and the link that, depicting “totality” by the transverse
movement, comes to nuance it.
Playing with the symbolic Meccano system produces curious results, because a kinesic
ensemble (facial expression and hand gesture), a kinesic unit (hand gesture), and a
kinesic sub-unit (gestural component) can each contain one or more analogical links!
The multichannel nature of the communication is highly evident in this example since
both the acoustic and the visual channels are used in a complementary way during
three-fifths of the utterance: one is assigned the task of diplomacy, and the other is
entrusted with the essential content. In other words, the idea of “unity,” present right
from the start, is gesturalized in the form of a globe. The gesture is repeated in order
to segment the rhythmic-semantic groups corresponding to the oratorical precautions
(three-fifths of the utterance). The multifunctionality of the gesture is equally evident
since the significance of the globe, sculpted by the referential gesture that segments and
complements the simultaneous utterance, is only confirmed at the end.
One sees the gesture giving the metaphorical image of the abstract notion often well
in advance of this being put into words. More than the imagination, it expresses the
metaphorical imagery that underlies the putting of thoughts into words. Gestural refer-
entiality is an indicator of ideation, of the spontaneity of the thought to be put into
words (see Calbris 2011: 295–312).
Now let us consider the relation between the gestural sign and thought. The study of
gestures of cutting, for example, extracted from fragments of conversation selected
from different corpora, demonstrates how gesture expresses the percept underlying
the concept (Calbris 2003: 19–46). Gesture appears as the product of a perceptual
abstraction from reality. It represents a preconceptual schema, an intermediary between
the concrete and the abstract, which allows it to evoke the one or the other equally well.
For example, the schema of cutting implicitly appears in numerous and varied notions:
“separation,” “cutting into elements,” “division into two halves,” “blockage,” “refusal,”
“elimination,” “negation,” “end,” “stoppage,” “decision,” “determination,” “measure-
ment,” “categorization,” “categorical character,” and “interruption.” An ideal, abstract,
and adaptable prototype is constructed on the basis of concrete acts. The gesture of cut-
ting represents the visual and proprioceptive operational schema and, through it, the
two extremes of the semantic continuum going from the concrete to the abstract:
from cutting a real object into pieces to the cognitive dissecting task of analysis.
Despite its motivated character and the physical concreteness of its form of expression,
the gestural sign operates at a certain level of abstraction.
6. To conclude
The methodological approach adopted here, that consists in studying the analogical link
by comparing the gesture variants of a notion, has enabled us to throw light upon the
aforementioned cases: it explains a gesture’s potential to change analogical links in
order to evoke different notions according to the context (polysemous gesture) or to
cumulate two analogical links in order to evoke two notions at the same time (polysign
gesture). The context comes to activate this and/or that analogical link which the ges-
ture can propose; it does not just develop a semantic derivation on the basis of an ana-
logical link. In short, gesture appears to be like a composite unit composed of physical
elements that are not only relevant but also potential conveyors of meaning that the
context comes to activate in a selective manner.
7. References
Calbris, Geneviève 1980. Etude des expressions mimiques conventionnelles françaises dans le
cadre d’une communication non verbale. Semiotica 29(3/4): 245–347.
Calbris, Geneviève 1981. Etude des expressions mimiques conventionnelles françaises dans le
cadre d’une communication non verbale testées sur des Hongrois. Semiotica 35(1/2): 125–156.
Calbris, Geneviève 1983. Contribution à une analyse sémiologique de la mimique faciale et ges-
tuelle française dans ses rapports avec la communication verbale, Volume 4. Thèse de doctorat
ès lettres, Paris III.
Calbris, Geneviève 1987. Geste et motivation. Semiotica 65(1/2): 57–96.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2003. From cutting an object to a clear cut analysis: Gesture as the representa-
tion of a preconceptual schema linking concrete actions to abstract notions. Gesture 3(1): 19–46.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam/Philadelphia: John
Benjamins.
Calbris, Geneviève and Jacques Montredon 1980. Oh là là! Expression intonative et mimique.
Paris: CLE International.
Mauss, Marcel 1950. Les techniques du corps. In: Marcel Mauss, Sociologie et Anthropologie,
36–85. Paris: Presses Universitaires de France. First published [1934].
43. Praxeology of gesture
Abstract
This chapter presents the outline of a praxeological approach to the study of gesture. It
argues that gestures should be analyzed for the cultural practices or methods by which
they are shaped and performed, rather than as a semiotic code or a mode of expression
of mental content. The praxeological approach can be traced to Mauss’ concept “techni-
ques du corps”, which emphasizes the cultural nature of bodily comportment. The chap-
ter discusses the “practice turn” in social research, shows the importance assigned to the
body in practice theory, and defines gesture practices as methods to achieve practical
understanding in complex, multi-modal activity contexts. It is argued that gesture prac-
tices can be investigated in terms of their impact on the ecology of the communicative sit-
uation and their relations to different components of these ecologies. It is suggested that
an adequate methodology for the rigorous study of communicative practices must be
micro-ethnographic, being attentive both to generic practices and organizations and
to the ethnographic particulars of the community and the situation. The chapter
concludes with a brief summary of some recent practice-oriented studies of gesture.
1. Introduction
To investigate gesture in praxeological fashion means to conceive of it, in the first place,
as skilled physical praxis, as embodied activity performed according to methods that are
shared within some community. Gestures are physical actions by which we “do things”
(Austin 1962) – although the things that gestures do include not only illocutions but also a
large, presumably unknown, number of other types of social actions, including directing
or attracting attention and showing how something ought to be done. By practices we
mean established, common things that get done by gestures, as well as the habitual, rou-
tinized methods by which gestures are made. Thus, we say, roughly, that there exists in
many societies an established, commonly understood practice – we may call it “point-
ing” – the point of which is to direct an other’s visual attention to some target, and that
there are distinct practices by which pointing gets done in, say, Lao society (Enfield
2009), some involving eyes and fingers, others eyes and lips, each with a distinct social
effect. Each of these practices, when enacted, may require the concurrent enactment of
other practices, some involving the eyes, some speech – pointing is in fact a “multi-
modal” practice (or an integrated “bundle of practices” – issues of hierarchy or logical
type – practices and meta-practices, etc. – will not concern us here). More often than
not, gesture practices are enacted along with speech, and gestures may provide imagery
that overlaps or complements the imagery provided in speech, but the two modalities
nevertheless constitute separate and rather incomparable resources for communicative
action. The tasks for praxeological research on gesture, then, are
and methodological questions are discussed. The chapter concludes with a description
of some recent research on gesture that converges with the praxeological perspective.
2. Praxeology
The practice turn in social theory (Schatzki, Knorr-Cetina, and von Savigny 2001) is
a relatively recent and yet pervasive movement that comprises both classical sources
(historical materialism, phenomenology, Vygotski, Wittgenstein) and contemporary
ones (ethnomethodology, embodied cognition, discourse and conversation analysis,
Bourdieu). Associated with a variety of labels including activity theory, communities
of practice, and distributed cognition, the praxeology of social life and social interaction
does not constitute a single methodology or school, but a shared conviction that “the
social is a field of embodied, materially interwoven practices centrally organized around
shared practical understandings” (Schatzki 2001: 12). Although taken from Wittgenstein
(1953), who was concerned with linguistic practices, the notion of practical understand-
ing is given a distinctly physical interpretation in contemporary praxeology. It refers,
following Wittgenstein, not only to the practical nature of intersubjectivity – that criteria
for understanding can never be specified in so many words, but only reveal themselves
in continued, successfully shared practice – but also to each human body’s tacit,
enactive understanding of the world.
Practical understanding […] exists only as embodied in the individual. An individual pos-
sesses practical understanding, however, only as a participant in social practices. Practical
understanding is […] a battery of bodily abilities that results from, and also makes pos-
sible, participation in practices. […] Because social orders rest upon practices that are
founded on embodied understanding, they are rooted directly in the human body.
(Schatzki 2001: 8)
That the human mind and its products, notably natural languages, reflect their ground-
ing in embodied experience and action has recently become commonplace in fields such
as cognitive linguistics (Johnson 1987) and cognitive neuroscience (Jeannerod 1997).
But while the human body and its fundamental experiences are often posited as univer-
sals in those fields, practice theorists insist on the culturally constituted character of the
body and its experiences in the world: it is the real body with its culturally specific sen-
sibilities and skills that our conceptual systems draw upon. The experiential schemata
that undergird these systems are not the products of our anatomical bodies, but of
our concrete dwelling in the world, that is, the habitual experiences and skills that living
bodies acquire in their daily immersion in a specific, culturally shaped world. “To speak
is to occupy the world, not only to represent it”, writes Hanks (1996: 236).
This emphasis on embodiment and practical understanding makes practice theory
attractive to gesture researchers, because relevant to gestural communication are not
only the skills that bodies acquire in interpersonal relations – where they might pick
up a communicative “code” – but also those acquired in the thing-world, as hands
reach for, grasp, hold, handle, move, hand over, and explore physical objects. The crit-
ical feature of the human hands, however, in so far as our manual gesturing is con-
cerned, is that they are skilled makers of things, organs of an organism which, with
unparalleled versatility, makes meaningful objects out of the raw materials of the
world. The transformation of nature into artifacts is, despite numerous, but invariably
Talk between co-workers, the lines they are drawing, measurement tools, and the ability to
see relevant events […] all mutually inform each other within a single coherent activity
[…]. [And] the ability of human beings to modify the world around them, to structure set-
tings for the activities that habitually occur within them, and to build tools, maps, slide
rules, and other representational artifacts is as central to human cognition as processes hid-
den inside the brain. The ability to build structures in the world that organize knowledge,
shape perception, and structure future action is one way that human cognition is shaped
through ongoing historical practices. (Goodwin 1994: 626–628)
3. Praxeology of gesture
The praxeology of gesture takes its cue from Marcel Mauss’ notion of techniques of the
body, by which the French social anthropologist meant “the ways in which from society
to society men know how to use their bodies” (Mauss [1935] 1973: 70). Mauss envi-
sioned a study of culture grounded in observations like the one that “walking and swim-
ming, […] are specific to determinate societies; […] the Polynesians do not swim as we
do, […] my generation did not swim as the present generation” (Mauss [1935] 1973: 70).
Techniques of the body travel by imitation and mediated representation: “American
walking fashions […] arrive[d] [in France] thanks to the cinema” (Mauss [1935] 1973: 72).
Mauss discerned the social in the seeming individuality of bodily comportment, claiming
to be able to “recognize a girl that has been raised in the convent” (Mauss [1935]
1973: 72): “The positions of the arms and hands while walking form a social idiosyncrasy,
they are not simply a product of some purely individual, almost completely psychical
arrangements and mechanisms” (Mauss [1935] 1973: 72, emphasis added).
Mauss (not Bourdieu!) coined the term habitus to refer to these “social idiosyncra-
sies” of bodily action, to translate the Aristotelian notion of hexis, “ ‘acquired ability’
and ‘faculty’ ” (Mauss [1935] 1973: 73). In contrast to Bourdieu, who focused his theoretical
imagination more or less exclusively on society’s inscription on the body, Mauss saw in
bodily habits “the techniques and work of collective and individual reason” (Mauss
[1935] 1973: 73, emphasis added), emphasizing the specific intelligence and sensibility
that is embodied and enacted in someone’s habitus. Techniques of the body are cultur-
ally specific, traditional solutions to recurrent practical, communicative and interaction
tasks. But their development in the individual, and as a consequence their manner of
use and coordination with other practices, are always also intensely personal affairs
(Polanyi 1958): as Thelen’s research has demonstrated, for example, there is no set pro-
gram for learning to walk; each child’s solution to the task of getting on the feet, keep-
ing balance, and beginning to walk, emerges through an idiosyncratic “soft assembly” of
component skills (Thelen and Smith 1994).
A conception of culture as acquired, distinctive techniques du corps also inspired the
classic photographic study of Balinese Character by Gregory Bateson and Margaret
Mead (1942). They set out to study “living persons moving, standing, sleeping, dancing,
and going into trance, [who] embody that abstraction which (after we have abstracted
it) we technically call culture” (Bateson and Mead 1942: xii).
Bateson and Mead observed, for example, how a Balinese left hand holds the paper
on which a Balinese right hand is writing and concluded from these and other observa-
tions that the Balinese specialize the left hand for “light surface touch”, while the right
hand is specialized for the more active tasks such as wielding instruments. These specia-
lizations, in turn, reveal “cultural idiosyncrasies” such as a taboo placed on the left hand
that prevents it from being used to hand over objects and to engage in many other ac-
tions. Thus, Bateson and Mead also studied how the right-hand preference is inscribed
upon infant bodies – as well as how the Balinese practice bending their fingers back-
ward, how ceremonially unclipped nails shape the ways in which fingers can hold
objects and people, and, as a consequence, how people experience these objects.
Human hands are enculturated hands. But the practice of gesturing also enculturates
them. As Noland writes in Agency and Embodiment, “the human body becomes a social
fact through the act of gesturing” (Noland 2009: 19), and “gesturing is the visible per-
formance of a sensorimotor body that renders that body at once culturally legible and
interoceptively available to itself ” (21).
Gesture is a double-faced performance, making meaning visible to others, while
being experienced through kinesthesia – “muscle sense” or proprioception (Gibson
1966) – by the self. Performing culturally legible gestures, we couple patterns of enac-
tive self-perception with typified motives, stances, and interactional moves – whatever is
“conveyed” by the gesture. Kinesthesia is essential to gesture, as it is to all modes of
skilled motor action: we perceive the things that we do with our hands – and compre-
hend (“gather together”) the things with which we do them – through “internal”, kines-
thetic perception. J.J. Gibson refers to this apprehending system as the haptic system
(Gibson 1966). It is as haptic systems that we explore, handle, and make sense of the
world at hand, as well as our own physical actions within it. But kinesthesia is important
also with respect to the cognitive roles of gestures: gestures provide their makers with
“felt” motor patterns in terms of which to structure ideational content.
The praxeological approach to gesture has no prior commitment concerning the re-
lations among gesture, speech/language, and cognitive processes – whether, for exam-
ple, gesture and speech are parts of or controlled by the same psychological system.
Ethnographic research suggests that these relations are manifold, shifting, and complex:
gesture and speech can anchor, elaborate, and contextualize each other in a variety of
ways, depending in part on timing and the use of foregrounding devices in and for
speech and gesture. What remains identical across gesture situations is the fundamental
materiality – or phenomenal unity – of gesture: that meaning is made by physical
actions of the arms and hands. Leroi-Gourhan ([1964] 1993) sees no fundamental differ-
ence between instrumental and expressive gestures, i.e. between practical and commu-
nicative motor acts: “both […] produce kinesthetic experience as part of the recursive
loop of correction and refinement over time” (Noland 2009: 15).
The praxeology of gesture is interested in the ways in which gestures participate in the
bringing about of shared practical understandings; it conceives gesture as a heteroge-
neous, open-ended family of practices geared towards achieving practical understandings,
i.e. understandings that, while they may support what is being said, cannot be formulated
in so many words. Many of the patterns of enactment and sensation that gesture offers
have counterparts in practical action and experience, and if this is the case, the gesture
projects the experiential pattern from everyday life onto the discursive material that is
in need of conceptualization. Understanding, thus, does not come from shared rules of
grammar or a shared lexicon, but rather from sufficiently shared practices (Hanks 1996).
4. Gesture as praxis
Gesture is a praxis, and gestures are practiced actions. Gestures belong to our “equip-
ment for dwelling” (Dreyfus 1991), for collaboratively inhabiting, sustaining, making
sense of, and re-making a common world. “Gestures originate in the tactile contact
that mindful human bodies have with the physical world. […] [They] ascend from ordi-
nary […] manipulations in the world of matter and things, and […] the knowledge that
the human hands acquire […] in these manipulations is […] brought to bear upon the
symbolic tasks of [gesture]” (LeBaron and Streeck 2000: 119).
Gestures are implicated in our activities and organize situations in a number of dif-
ferent ways. These ways can be systematized if we conceive of situations or contexts,
following Bateson (1972), as “micro-ecological systems”. The micro-ecology of commu-
nicative situations – and the conditions for practical understanding – are impacted and
reconfigured by hand gestures in a number of distinctly different ways. Streeck (2009b)
has proposed that we can minimally distinguish six modes of “ecological involvement”
of hand gestures, i.e. ecological components of the situation that are addressed by ges-
tures of the hands: the setting at hand, which a gesture can elaborate; the visible scene,
which gestures can structure; talked-about worlds beyond the here and now, which can
5. Ecologies of gesture
We can only give the most cursory outline here of some of the ways in which hand ges-
tures relate to and bear upon situated ecologies. First, relating to the world within reach
of the hands, we find indexical gesture actions by which objects are selected from their
context – as figures from grounds – including placing (Clark 2003), raising, tapping, and
tracing (Goodwin 1994). A seemingly unlimited, infinitely renewable set of methods of
tactility, hapticity, and manipulation serves to transform perceived reality into counter-
factual or potential versions: the scene at hand is annotated by schematic acts to adum-
brate what could be, what could be done, what should be – to transform the perceived
scene into a field of action. Hand gestures also emanate from “active touch” (Gibson
1962), i.e. information-gathering actions of the hands whose findings – an object’s tex-
ture or internal properties – can be broadcast by motions of the fingers and hands that
are amplified versions of those qualities of motion that correspond to the qualities of
the surface or object that is being touched (softness, coarseness, and so on). In this fashion,
tactile experience is made visible, attesting to the “interstitial”, modality-transcending
affordances of gestures of the hands. We can also include gestures of the hands that
are simply repetitions or augmentations of practical acts and thereby select the action
itself as cognitive object and highlight some of its features, often for instructional
specificity and granularity: simple acts of schematic holding or putting (setting down)
characterize classes of objects in the broadest terms, while schematic versions of tech-
nical acts visualize narrow object categories (for example, both hands in grip position,
moved a few inches up and down in alternation, appears to unequivocally evoke a steer-
ing wheel). When little constructional effort is expended because the depiction is not
salient or meant to be precise, very generic acts of transportation are often employed:
picking up, holding, setting down. Schematically depictive handlings build upon our
everyday experience with objects, abstracting schemata from them and using them in
representational fashion.
Commonly, depictive and conceptual gestures are co-classified as iconic gestures, and
there is no overt difference in the forms of these gestures; they use the same methods to
construct imagery that represents some phenomenon in some world. However, it makes
a fundamental difference whether an “iconic” gesture is used to depict a phenomenon
or the phenomenon that it evokes serves as a concrete form to structure abstract con-
tent. Moreover, speakers behave quite differently when they depict than when they
conceptualize by gesture. Only in the former case do they turn their attention to
their hands and thereby also direct the addressee’s attention to the gesture. This is
never done when the gesture has a conceptual function. Like linguistic metaphors –
which picture something concrete while referring to something abstract and involve com-
parison and analogy between domains – conceptual gestures provide concrete Gestalten
for ideations: they give form to notions (see Streeck 2009a: Chapter 7). Conceptual
gestures usually remain in the background, receiving no focused visual attention.
Conceptual gestures bear a relationship to manual acts by which humans seek to
make sense of tangible things: reaching for, grasping, taking, holding, manipulating,
and exploring. As our hands find the right aperture, posture, and constellation of forces
when they take hold of an object, they understand practically what kind of an object it
is, what its affordances are, what can be done with it. Our hands acquire a schema, a
haptic concept, of a (class of) thing. (Note that the very term schema originally denotes
a wrestler’s hold.) Practical understandings are embodied as habitualized prehensile
postures and acts. These habitualized schemata then become available to be projected
onto other experiential domains, in other words, for metaphorical usage. The metaphor-
ical nature of conceptual gesture practices is central to Calbris’ account of meaning in
gesture (Calbris 2011). Calbris assumes, with Leroi-Gourhan, that gestures are intrinsi-
cally related to practical action. By evoking practical actions, gestures simultaneously
evoke all the other components (or valences) associated with them: the actor, instru-
ment, object, manner of action, etc. Moreover, “not only does gesture intervene in
our physical interaction with the world but also in our pictorial […] and linguistic
[…] expressions […]. One finds it constantly present in this interactive loop of progressive
feedback between action and representation” (Calbris 2011: 1).
Gesture embodies the body’s real-world experience: “the living world [reverberates]
in the living body. Semiosis, conceptualization, and thought originate in the body, the
receptacle of real-world interactions” (Calbris 2011: 2). Gestures give form to concepts
not by providing a material substrate upon which meaning can be inscribed, but by
retaining and providing lived experiential structure to organize novel content.
Thus, gestural practices for depicting and making sense of the world are derived from
ways in which embodied human beings engage with the physical world; we understand
the gestures – and see what they are meant to evoke – because we are familiar with the
bodily engagements that they are predicated on. Gestural imagery evokes and presup-
poses the background of embodied everyday experience. It derives from our common
hold on things. In a praxeological account, conceptual gestures are conceptual actions,
i.e. physical actions that bring their own inherent experiential structures and meanings
to bear on the situation at hand. It is in this fashion, by providing lived, schematic expe-
rience for meaning-making, that gesture not so much supports and expresses thought,
but constitutes a form of thinking.
6. Methodology
Different strands of praxeological research have developed, and the granularity of
description and analysis of social practices varies between them. Research into “com-
munities of practice” aims for comprehensive accounts of those practices that define
the community, or they present more detailed investigations of key moments and the
practices deployed in them. Praxeological research has been conducted under monikers
such as “activity theory”, “complexity theory”, “actor-network theory”, “studies of
work/workplace studies”, “distributed cognition”, and others, but all praxeological
research includes some ethnographic component.
In France, anthropological film-makers in the tradition of Mauss and trained by
Jean Rouch have developed a method of praxeological research that uses cinemato-
graphic techniques (editing and montage) to unveil the structures of human activities
and the ‘knowledge-producing gestures’ (gestes de savoir; Comolli 1991) by which
techniques of the body and material practices are communicated to novices. Mauss
insisted that techniques of the body are acquired through training, in co-operative,
gesturally mediated, and sometimes transitory apprenticeship interactions (de France
1983). Praxeology, as these researchers understand it, reveals the chaîne opératoire
of an activity, the series of consecutive gestures that are made in the making of an
artifact, the playing of an instrument, the setting of a table, or in the course of a social
encounter. Praxeological analysis is built up “from a micro-analysis of the articulations
between the different moments of a process, examined step by step, and a macro-analysis
concerned with the articulations between vast ensembles of sequences of activities”
(de France 1983: 148). The ultimate goal appears to be
a kind of encyclopedia or grammar of everyday activities and skills, arrived at by the
micro-analysis of individual cases and the comparative analysis of collections of
phenomena.
Practices must be studied in situ, within the moment-by-moment progression of in-
teractions within which they are enacted. Micro-ethnography (Erickson and Schultz
1977; Streeck and Mehus 2004) is a research methodology that is geared both towards
the identification of generic methods and devices (practices habitualized in a community
or by an individual to deal with kinds of circumstance) and towards the Gestalt and
local significance of the single moment of activity and interaction. It is particularly
well-suited for praxeological research because it focuses on the minute details of the
embodied enactment of practices in which alone the skillfulness of social actors fully
reveals itself.
Gesture practices address issues of intersubjectivity; they are practices geared to
achieve practical understanding in the context of collaboration and talk. Intersubjectiv-
ity in interaction is predicated on sequentiality, i.e. the opportunity for next parties to
display their understanding of a current action in their response to it, and for the orig-
inator of the original action to monitor in the next party’s response whether the original
action has indeed been understood. Gestures of the hand do their work in relation to
and as components of conversational turns, actions, and action sequences. They address
distinctly different tasks at different positions within turns and sequences, and the pro-
jection that a gesture makes – what it conveys about the next moment – is a matter of
the exact moment when it is made (Streeck 2009b): a hand gesture made just before the
onset of a turn can specify the communicative act that is to be performed in that turn; a
gesture that occupies a gap between article and noun will be perceived as a (literal or
metaphorical) depiction; during turn-completion, a shrug can serve as an invitation to
the other to share or respond to the stance towards the just-completed utterance (its
content, its pragmatic upshot) that the speaker expresses by the shrug. There has
been a considerable amount of research on the step-by-step production of gestures,
i.e. the stages in which simple and complex gestural acts unfold, from the moment
the hands depart from their “rest positions” to the stroke, perhaps a post-stroke freeze,
and back to “home position” (Sacks and Schegloff 2002). Gesturers make use of the
possibility to manipulate the temporal characteristics of speech and gesture to achieve
distinct communicative effects. For example, a post-stroke hold during a negation ges-
ture marks the scope of the negation (Harrison 2010); or the freezing of a turn “hand-
over” gesture may indicate a waiting stance and thereby convey a request for response.
The movement arc of the single gesture – and more so the combined motions and holds
of extended gesture phrases – offer rich and varied opportunities for coupling gestures
and words in multimodal “packages”, and the importance of multimodal utterances
has been shown for various practice communities (Heath and Luff 2006) and instruc-
tional situations (Müller and Bohle 2007). Goodwin has called gesture an “intersti-
tial” medium, i.e. an embodied praxis that is particularly apt at stitching other
representations – for example, words and pictures – together (Goodwin 2007). The
fine-grained analysis of gesture practices, therefore, not only demands close attention
to the sequential unfolding of the gesture and the responses it receives, but also to the
ways in which other resources are recruited at the moment. Gesture is a visual modality,
and to perform salient communicative tasks, it needs to be seen. A speaker who seeks to
implicate a gesture in the moment’s understanding is thus tasked with getting the ad-
dressee’s visual attention. Conversely, listeners who do not wish to affiliate with the
turn’s project may do so by actively disattending the gesture that is a salient part of
it (Streeck 2008b).
In sum, the enactment and communicative effects of different gesture practices can
only reliably be investigated by combining the close, moment-by-moment analysis that
characterizes micro-analytic approaches such as context analysis and conversation
analysis with attention to the ethnographic particulars that make gesture relevant
and practical within a particular community and situation.
Analysis of the full gesture repertoire in a community of practice can also reveal a
community’s lived epistemology, as Sauer (2003) has demonstrated in her analysis of
the ways in which miners re-embody accidents or dangerous situations. They alternate
between “first-person” and “third-person” depictions of mining accidents, shifting back
and forth between rendering subjective experience and analyzing the situation from a
detached, “God’s eye” point of view. It is this duality of perspectives upon which
their survival often depends – the ability to subjectively sense danger and simulta-
neously analyze a situation in detached, objective terms (“at arm’s length”). Miners
refer to this as “pit sense”.
8. Conclusion
Because of the relative novelty of the practice approach to the analysis of social life and
communication and because many researchers are committed to longer-standing para-
digms such as cognitive psychology and conversation analysis, it is unlikely that prax-
eology will visibly dominate gesture studies and interaction research any time soon.
Its influence may remain more indirect, coming from the practice-orientation that is
already inherent in some of these methodologies, especially those that trace part of
their theoretical lineage either to the late Wittgenstein (1953) – the Wittgenstein of
practical understandings – or to phenomenologists such as Polanyi (1958, 1966) and
Merleau-Ponty (1962) and their interest in the enactive nature of embodiment and in
the embodied nature of human cognition. It is also possible, however, as Meyer
(2010) has suggested, that the approach outlined here will provide praxeological studies
of other domains (e.g., science-and-technology studies, studies of work) and the micro-
analysis of social interaction in general with a new model for naturalistic research,
because of its empirical precision, ecological validity, and because its practitioners
recognize the enculturated nature of the living, interacting, and gesturing body.
9. References
Austin, John 1962. How to Do Things with Words. Oxford: Oxford University Press.
Bateson, Gregory 1972. Steps to an Ecology of Mind. New York: Ballantine.
Bateson, Gregory and Margaret Mead 1942. Balinese Character. A Photographic Analysis. New
York: New York Academy of Sciences.
Bourdieu, Pierre 1977. Outline of a Theory of Practice. Cambridge: Cambridge University Press.
Bourdieu, Pierre 1990. The Logic of Practice. Stanford, CA: Stanford University Press.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Clark, Herbert 2003. Pointing and placing. In: Sotaro Kita (ed.), Pointing: Where Language, Cul-
ture, and Cognition Meet, 243–268. Mahwah, NJ: Lawrence Erlbaum.
Comolli, Annie 1991. Les Gestes du Savoir, second edition. Paris: Éditions Publidix.
De France, Claudine 1983. L’Analyse Praxéologique. Composition, Ordre et Articulation d’un
Procès. Paris: Éditions de la Maison des Sciences de l’Homme.
Dreyfus, Hubert L. 1991. Being-in-the-World. A Commentary on Heidegger’s “Being and Time”.
Cambridge: Massachusetts Institute of Technology Press.
Enfield, N. J. 2009. The Anatomy of Meaning. Cambridge: Cambridge University Press.
Erickson, Frederick and Jeffrey Schultz 1977. When is a context? Some issues in the analysis of
social competence. Quarterly Newsletter of the Institute for Comparative Human Development
1(2): 5–10.
Gibson, James J. 1962. Observations on active touch. Psychological Review 69: 477–491.
Gibson, James J. 1966. The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Goodman, Nelson 1978. Ways of Worldmaking. Indianapolis: Hackett.
Goodwin, Charles 1994. Professional vision. American Anthropologist 96(3): 606–633.
Goodwin, Charles 2003. Pointing as situated practice. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture, and Cognition Meet, 217–242. Mahwah, NJ: Lawrence Erlbaum.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan D. Duncan, Justine Cassell
and Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language: Essays in
Honor of David McNeill, 195–212. Philadelphia: John Benjamins.
Goodwin, Charles and Marjorie Harness Goodwin 1986. Gesture and coparticipation in the activ-
ity of searching for a word. Semiotica 62(1/2): 51–75.
Hanks, William F. 1996. Language and Communicative Practices. Boulder, CO: Westview Press.
Hanks, William F. 2005. Pierre Bourdieu and the practices of language. Annual Review of Anthro-
pology 34: 67–83.
Harrison, Simon 2010. Evidence for node and scope of negation in coverbal gestures. Gesture
10(1): 29–51.
Heath, Christian and Paul Luff 2006. Video analysis and organizational practice. In: Hubert Kno-
blauch, Bernt Schnettler, Jürgen Raab and Hans-Georg Soeffner (eds.), Video Analysis: Meth-
odology and Methods. Qualitative Audiovisual Analysis in Sociology, 35–50. Frankfurt am
Main: Peter Lang.
Holzkamp, Klaus 1978. Sinnliche Erkenntnis: Historischer Ursprung und Gesellschaftliche Funk-
tion der Wahrnehmung, 4th revised edition. Königstein: Athenäum.
Hutchins, Edwin 1991. The social organization of distributed cognition. In: Lauren B. Resnick,
John M. Levine and Stephanie D. Teasley (eds.), Perspectives on Socially Shared Cognition,
283–307. Washington, DC: American Psychological Association.
Hutchins, Edwin 1995. Cognition in the Wild. Cambridge: Massachusetts Institute of Technology
Press.
Hutchins, Edwin 2005. Material anchors for conceptual blends. Journal of Pragmatics 37: 1555–
1577.
Hutchins, Edwin and Saeko Nomura 2011. Collaborative construction of multimodal utterances.
In: Jürgen Streeck, Charles Goodwin and Curtis D. LeBaron (eds.), Embodied Interaction.
Language and Body in the Material World, 289–304. New York: Cambridge University Press.
Jeannerod, Marc 1997. The Cognitive Neuroscience of Action. Oxford: Blackwell.
Johnson, Mark 1987. The Body in the Mind. Chicago: University of Chicago Press.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23(3): 247–279.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan”. In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–138. Mahwah, NJ: Lawrence
Erlbaum.
LeBaron, Curtis D. and Jürgen Streeck 2000. Gestures, knowledge, and the world. In: David
McNeill (ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Leroi-Gourhan, André 1993. Gesture and Speech. Cambridge: Massachusetts Institute of Technol-
ogy Press. First published [1964].
Mauss, Marcel 1973. The techniques of the body. Economy and Society 2(1): 70–88. First published
[1935].
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. London: Routledge.
Meyer, Christian 2010. Gestenforschung als Praxeologie: Zu Jürgen Streecks mikroethnogra-
phischer Theorie der Gestik. Gesprächsforschung – Online-Zeitschrift zur Verbalen Interaktion
11: 208–230.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich.
Berlin: Arno Spitz.
Müller, Cornelia and Ulrike Bohle 2007. Das Fundament fokussierter Interaktion. Zur Vorberei-
tung und Herstellung von Interaktionsräumen durch körperliche Koordination. In: Reinhold
Schmitt (ed.), Koordination. Studien zur Multimodalen Interaktion, 129–166. Tübingen: Gunter
Narr.
Noland, Carrie 2009. Agency and Embodiment. Performing Gestures/Producing Culture. Cam-
bridge, MA: Harvard University Press.
Ochs, Elinor, Patrick Gonzales and Sally Jacoby 1996. “When I come down I’m in the domain
state”: Grammar and graphic representation in the interpretive activity of physicists. In: Elinor
Ochs, Emanuel A. Schegloff and Sandra A. Thompson (eds.), Interaction and Grammar,
328–369. Cambridge: Cambridge University Press.
Polanyi, Michael 1958. Personal Knowledge. Towards a Post-Critical Philosophy. New York:
Harper Torchbooks.
Polanyi, Michael 1966. The Tacit Dimension. Garden City, NY: Doubleday.
Rahaim, Matt 2008. Gesture and melody in Indian vocal music. Gesture 8(3): 325–347.
Ryle, Gilbert 1949. The Concept of Mind. London: Hutchinson’s University Library.
Sacks, Harvey and Emanuel A. Schegloff 2002. Home position. Gesture 2(2): 133–146.
Sauer, Beverly J. 2003. The Rhetoric of Risk. Technical Documentation in Hazardous Environ-
ments. Mahwah, NJ: Lawrence Erlbaum.
Schatzki, Theodore R. 2001. Introduction: Practice theory. In: Theodore R. Schatzki, Karin Knorr-
Cetina and Eike von Savigny (eds.), The Practice Turn in Contemporary Theory, 1–14. London:
Routledge.
Schatzki, Theodore R., Karin Knorr-Cetina and Eike von Savigny (eds.) 2001. The Practice Turn in
Contemporary Theory. London: Routledge.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Shape Descriptions. Berlin: Akade-
mische Verlagsgesellschaft.
Streeck, Jürgen 2002a. A body and its gestures. Gesture 2(1): 19–44.
Streeck, Jürgen 2002b. Grammars, words, and embodied meanings. On the evolution and uses of
so and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2008a. Depicting by gestures. Gesture 8(3): 285–301.
Streeck, Jürgen 2008b. Laborious intersubjectivity: Attentional struggle and embodied communi-
cation in an auto-shop. In: Ipke Wachsmuth, Manuela Lenzen and Günther Knoblich (eds.),
Embodied Communication in Humans and Machines, 201–228. Oxford: Oxford University
Press.
Streeck, Jürgen 2008c. Metaphor and gesture: A view from the microanalysis of interaction. In:
Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture, 259–264. Amsterdam: John
Benjamins.
Streeck, Jürgen 2009a. Forward-gesturing. Discourse Processes 45(3/4): 161–179.
Streeck, Jürgen 2009b. Gesturecraft. The Manu-Facture of Meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Siri Mehus 2004. Microethnography: The study of practices. In: Kristine L.
Fitch and Robert E. Sanders (eds.), Handbook of Language and Social Interaction, 381–405.
Mahwah, NJ: Lawrence Erlbaum.
Thelen, Esther and Linda Smith 1994. A Dynamic Systems Approach to the Development of Cog-
nition and Action. Cambridge: Massachusetts Institute of Technology Press.
Wakin, Daniel J. 2012. The maestro’s mojo. New York Times, April 6.
Wilkins, David 2003. Why pointing with the index finger is not a universal (in sociocultural and
semiotic terms). In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition
Meet, 171–216. Mahwah, NJ: Lawrence Erlbaum.
Wittgenstein, Ludwig 1953. Philosophical Investigations. Oxford: Blackwell.
44. A “Composite Utterances” approach to meaning
Abstract
This chapter argues for a composite utterances approach to research on body, language,
and communication. It argues that to understand meaning we need to begin with the utter-
ance or speech act as the unit of analysis. From this perspective, the primary task in inter-
preting others’ behaviour in communication is to infer what a person wants to say. In
order to solve this task, an interpreter is free to consult any and all available information,
regardless of the sensory modality in which that information is gathered (e.g., vision ver-
sus hearing), and regardless of the semiotic function of that information (e.g., iconic/
indexical, symbolic/conventional, or some combination of these). Having recognized
that another person has an intention to communicate, an interpreter takes the available
relevant information (e.g., vocalizations, facial expressions, hand movements, all in the
context of synchronic knowledge of linguistic and cultural systems, and other aspects
of common ground) and looks for a way in which those co-occurring signs may simulta-
neously point to a single overall message of the move that a person is making. This is
helped by the binding power of social cognition in an enchronic context (that is, the
sequential context of turn-by-turn conversation), in particular the assumption that people
are not merely saying things but making moves. The chapter focuses on co-speech hand
gestures, and also discusses implications of the composite utterances approach to research
on syntax, and on sign language.
1. Introduction
In human social behavior, people build communicative sequences move by move.
These moves are never semiotically simple. Their composite nature is widely varied
in kind: they may consist of a word combined with other words, a string of words com-
bined with an intonation contour, a diagram combined with a caption, an icon com-
bined with another icon, a spoken utterance combined with a hand gesture. By what
means does an interpreter take multiple signs and draw them together into unified,
meaningful packages? This chapter explores the question with special reference to
one of our most familiar types of move, the speech-with-gesture composite, a classical
locus of research on body, language, and communication (see other chapters of this
handbook relating to gesture, and many references therein). The central question is
this: How do gestures contribute to the meaning of an utterance? To answer this,
we need to situate research on gesture within broader questions of research on
meaning.
Fig. 44.1: Man (left of image) speaking of preferred angle of a drainage pipe under construction:
“Make it steep like this.”
The discussion in the context of Fig. 44.1 is about construction works under way in the
temple. The man on the left is reporting on a problem in the installation of drainage
pipes from a bathroom block. He says that the drainage pipes have been fixed at too
low an angle, and they should, instead, drop more sharply, to ensure good run-off. As
he says haj5 man2 san2 cang1 sii4 ‘Make it steep like this’, he thrusts his arm forward
and down, fixing his gaze on it, as shown in Fig. 44.1. The meanings of his words and his
gesture are tightly linked, through at least three devices:
(i) their tight spatiotemporal co-occurrence in place and time (both produced by the
same source),
(ii) the use of the explicit deictic expression “like this” (sending us on a search: “Like
what?”, and leading us to consult the gesture for an answer),
(iii) the use of eye gaze for directing attention.
A similar case is presented in Fig. 44.2, from a description of a type of traditional Lao
fish trap called the sòòn5 (see Enfield 2009: Chapter 5).
Fig. 44.2: Man describing the sòòn5, a traditional Lao fish trap: “As for the sòòn5, they make it
fluted at the mouth.”
Here, the composite utterance consists of:
(i) a string of words (itself a composite sign consisting of words and grammatical
constructions),
(ii) a two-handed gesture,
(iii) tight spatiotemporal co-occurrence of the words and gestures (from a single
source), and
(iv) eye gaze directed toward the hands, also helping to connect the composite utter-
ance’s multiple parts.
This is subtly different from Fig. 44.1 in that it does not involve an explicit deictic
element in the speech. Like the picture-with-caption examples mentioned above,
spatiotemporal co-placement in Fig. 44.2 is sufficient to signal semiotic unity. The
gesture, gaze and speech components of the utterance are taken together as a uni-
fied whole. As interpreters, we effortlessly integrate them as relating to one overall
idea.
A general theory of composite meaning takes Figs. 44.1 and 44.2, along with road
signs, paintings on gallery walls, and captioned photographs to be instances of a single
phenomenon: signs co-occurring with other signs, acquiring unified meaning through
being interpreted as co-relevant parts of a single whole. A general account for how
the meanings of multiple signs are unified in any one of these cases should apply to
them all, along with many other species of composite sign, including co-occurring
icons in street signs, grammatical unification of lexical items and constructions, and
speech-with-gesture composites.
In studying speech-with-gesture, there are two important desiderata for an account
of composite meaning. A first requirement is to provide a modality-independent
account of gesture (Okrent 2002). While we want to capture the intuition that co-speech
hand gesture (manual-visual) conveys meaning somehow differently to speech (vocal-
aural), this has to be articulated without reference to modality. We need to be able
to say what makes speech-accompanying hand movements “gestural” in such a way
that we can sensibly ask about the functional equivalent of co-speech gesture in other
kinds of composite utterances; for example, in sign language of the Deaf (all visual,
but not all “gesture”), or in speech heard over the phone (all vocal-aural, but not all
“language”).
A second desideratum for an account of meaning in speech-with-gesture composites
is to capture the notion of “holistic” meaning in hand gestures, the idea that a hand
gesture has the meaning it has only because of the role it plays in the meaning of an
utterance as a whole (Engle 1998; McNeill 1992, 2005). If we want to achieve analytic
generality, then a notion of holistic meaning is required not only for analyzing the
meaning of co-speech hand gesture, but more generally for analyzing linguistic and
other types of signs as well. This results from acknowledging that an interpreter’s
task begins with the recognition of a signer’s communicative intention (i.e., recognizing
that the signer has an informative intention). The subsequent quest to lock onto a target
informative intention can drive the understanding of the composite utterance’s parts,
and not necessarily the other way around.
2. Composite utterances
2.1. Contexts of hand gesture
One view of speech-with-gesture composites is that the relation between co-expressive
hand and word is a reciprocal one: “the gestural component and the spoken component
interact with one another to create a precise and vivid understanding” (Kendon 2004:
174, original emphasis; see Özyürek et al. 2007). By what mechanism does this reci-
procal interaction between hand and word unfold? Different approaches to analyzing
meanings of co-speech gestures find evidence of a gesture’s meaning in a range of
sources, including
These four sources, often combined, draw on different components of a single underly-
ing model of the communicative move and its sequential context, where the hand-
movement component of the composite utterance is contextualized from three angles:
A. what just happened, B. what else is happening now, C. what happens next. This
three-part sequential structure underlies a basic trajectory model recognized by many
students of human social behaviour. Schütz (1970), for example, speaks of actions
(at B) having “because motives” (at A) and “in-order-to motives” (at C; e.g., I’m picking
berries [B] because I’m hungry [A], in order to eat them [C]; see Sacks 1992; Schegloff
2007 among many others).
A conventional sign is found when people take a certain signifier to stand for a certain signified because
that is what members of their community normatively do (Saussure [1916] 1959; on
norms, see Brandom 1979; Kockelman 2006). This kind of sign allows for arbitrary re-
lations like /khæt/ referring to ‘cat’, by which the cause of my taking [khæt] to mean
‘cat’ is my experience with previous occasions of use of tokens of the signifier /khæt/.
Examples of conventional signs include words and grammatical constructions, idioms,
and “emblem” hand gestures such as the OK sign, V for Victory, or The Finger
(Brookes 2004; Ekman and Friesen 1969). Non-conventional signs, by contrast, are
found when people take certain signifiers to stand for certain signifieds not because
of previous experience with that particular form-meaning pair or from social conven-
tion, but where the standing-for relation between form and meaning comes about by
virtue of just that singular event of interpretation. Examples include representational
hand gestures (in the sense of Kita 2000), that is, where the gesture component of an
utterance is a token, analogue representation of its object.
The symbolic indexical is a hybrid of the two types of sign just described, having
properties of both. These include anything that comes under the rubric of deixis (Fill-
more 1997; Levinson 1983), that is, form-meaning mappings whose proper interpre-
tation depends partly on convention and partly on context (Bühler [1934] 1982;
Jakobson 1971; Silverstein 1976). Take for example him in Take a photo of him. Your
understanding of him will depend partly on your recognition of a conventional, context-
independent meaning of the English form him (third person, singular, male, accusative)
and partly on non-conventional facts unique to the speech event (e.g., whichever male
referent is most salient given our current joint attention and common ground). Sym-
bolic indexicals play a critical role in many types of composite utterance, since their
job is to glue things together, including words, gestures, and (imagined) things in the
world (see Part I of Enfield 2009, and studies of pointing in this handbook).
In the context of these three kinds of sign, it is important to be mindful of the dis-
tinction between type and token (Hutton 1990; Peirce 1955). All of the signs discussed
above occur as tokens, that is, as perceptible, contextualized, unique instances. But only
conventional signs (including conventional components of symbolic indexicals) neces-
sarily have both type and token identities. That is, when they occur as tokens, they
are tokens of types, or what Peirce called replicas. It is because of their abstract type
identity that conventional signs can be regarded as meaningful independent of context,
as having “sense” (Frege [1892] 1960), “timeless meaning” (Grice 1989) or “semantic
invariance” (Wierzbicka 1985, 1996). Conventional signs are pre-fabricated signs,
already signs by their very nature. By contrast, non-conventional signs (including
non-conventional components of symbolic indexicals) are tokens but not tokens of
types. They are singularities (Kockelman 2005). They become signs only when taken
as signs in context. This is the key to understanding the asymmetries we observe in com-
posite utterances such as speech-with-gesture ensembles. A hand gesture may be a con-
ventional sign (e.g., as “emblem”). Or it may be non-conventional, only becoming a sign
because of how it is used in that context (e.g., as “iconic” or “metaphoric”). Or it may
be a symbolic indexical (e.g., as pointing gesture, with conventionally recognizable
form, but dependent on token context for referential resolution). Hand gestures are
not at all unique in this regard: the linguistic component of an utterance may, similarly,
be conventional (e.g., words, grammar), non-conventional (e.g., voice quality, sound
stretches), or symbolic indexical (e.g., demonstratives like yay or this). Ditto for
I. Encoded
I.1. Lexical (open class, symbolic)
I.2. Grammatical (closed class, symbolic-indexical)
II. Enriched
II.1. Indexical resolution
II.1.1. Explicit (via symbolic indexicals, e.g., pointing or demonstratives)
II.1.2. Implicit (e.g., from physical situation)
II.2. Implicature
II.2.1. From code
II.2.2. From context
Fig. 44.3: Sources of composite meaning for interpretation of communicative moves. “Encoded” =
conventional sign components. “Enriched” = non-conventional token meanings drawing on
context.
In Fig. 44.3, “encoded meaning” encompasses both lexical and grammatical meaning.
Grammatical signs show greater indexicality because they signify context-specific ties
between two or more elements of a composite utterance (e.g., grammatical agreement,
case-marking, etc.) or between the speech event and a narrated event (Jakobson 1971;
e.g., through tense-marking, spatial deixis, etc.). “Indexical enrichment” refers to the
resolution of reference left open either explicitly (e.g., through symbolic indexicals
like this) or implicitly (e.g., by simple co-placement in space or time; thus, a “no smok-
ing” sign need not specify “no smoking here”). “Enrichment through implicature” re-
fers to Gricean token understandings, arising either through rational interpretation
based on knowledge of a restricted system of code (i.e., informativeness scales and
other mechanisms for Generalized Conversational Implicature; Levinson 2000), or
through rational interpretation based on cultural or personal common ground (e.g., Par-
ticularized Conversational Implicatures such as those based on a maxim of relevance;
Sperber and Wilson 1995).
Thus, composite utterances are interpreted through the recognition and bringing
together of these multiple signs under a pragmatic unity heuristic or co-relevance prin-
ciple, i.e., an interpreter’s steadfast presumption of pragmatic unity despite semiotic
complexity.
special diacritic marking (see Figs. 44.1 and 44.2, above). An example of the latter is
discussed in Enfield (2009, Chapter 3) where movements of the face and head can
serve as triggers for eye gaze to be interpreted as pointing, not merely as looking. In
yet other cases, interpreters can employ abductive, rational interpretation to detect
that an action is done with a communicative intention (Grice 1957; Peirce 1955). For
instance, if you open a jar I may be unlikely to take this to be communicative, but if
you carry out the same physical action without an actual jar in your hands, the lack
of conceivable practical aim is likely to act as a trigger for implicature (Gergely,
Bekkering, and Király 2002; Levinson 1983: 157).
Data of the kind presented throughout this handbook do not usually present special
difficulties for interpreters in detecting communicative intention or identifying which
signs to include when interpreting a composite utterance. Mostly, the mere fact of lan-
guage being used triggers a process of interpretation, and the gestures which accom-
pany speech are straightforwardly taken to be associated with what a speaker is
saying (Kendon 2004). Hand gestures are therefore available for inclusion in a unified
utterance interpretation, whether or not we take them to have been intended to
communicate.
Note the kinds of heuristics that are likely being used in solving the problem of sign
filtration. (On heuristics and bounded rationality in general see Gigerenzer, Hertwig,
and Pachur 2011 and references therein.) By a convention heuristic, if a form is recog-
nizable as a socially conventionalized type of sign, assume that it stands for its socially
conventional meaning. Symbols like words may thus be considered as pre-fabricated
semiotic processes: their very existence is due to their role in communication (unlike
iconic-indexical relations which may exist in the absence of interpretants). By an orien-
tation heuristic, if a signer is bodily oriented toward you, most obviously by body posi-
tion and eye gaze, assume they are addressing you. By a contextual association
heuristic, if two signs are contextually associated, assume they are part of one signifying
action. Triggers for contextual association are timing and other types of indexical prox-
imity (e.g., placing caption and picture together, placing word and gesture together). By
a unified utterance-meaning heuristic, assume that contextually associated signs point
to a unified, single, addressed utterance-meaning. And by an agency heuristic, if a signer
has greater control over a behaviour, assume (all things being equal) that this sign is
more likely to have been communicatively intended. Language scores higher than ges-
ture on a range of measures of agency (Kockelman 2007). For further elaboration on
the application of a heuristic model to the interpretation of speech-gesture composites,
see Enfield (2009: 223–227).
to stand for this object. These three types of ground are not exclusive, but co-occur. In
the example of a fingerprint on the murder weapon, the print is iconic and indexical. It
is iconic in that the print has qualities in common with the pattern on the killer’s actual
fingertip and in this way it is a sign that can be taken to stand for the fingertip. It is
indexical in that
(i) it was directly caused by the fingertip making an impression on the weapon (thus a
sign standing for an event of handling it), and
(ii) the fingertip of the killer is in contiguity with the whole killer (thus a sign standing
for the killer himself).
Standard taxonomies of gesture types (Kendon 2004; McNeill 1992; inter alia) are fully
explicable in terms of these types of semiotic ground, as shown in Fig. 44.4.
Deictic:
• semiotic function: indexical (in that the directional orientation of the gesture is determined by the
conceived location of a referent), and symbolic (in that the form of pointing can be locally conven-
tionalized); the hands are used to bring the referent and the attention of the addressee together;
- in concrete deixis, the referent is a physical entity in the speech situation, while in abstract deixis
the referent is a reference-assigned chunk of space with stable coordinates
- in pointing, the attention of the addressee is directed to the referent by some vector-projecting
articulator (such as the index finger or gaze).
- in placing, the referent is positioned for the attention of the addressee
(Nb.: Gaze plays an important role in deictic gestures; it projects its own attention-directing vector
which may (a) reinforce a deictic hand gesture by providing a second vector oriented towards
the same referent, and (b) assist in the management of attention-direction during production of
other gestures.)
Interacting:
• semiotic function: iconic (in that the hands imitate an action) and indexical (in that the shape of the
hands is not the shape of the referent, but is determined by the shape of the referent); the hands are
meant to look as if they were interacting with the referent;
- in mimetic enactment, the hands are moving as if they are doing something to or with the
referent
- in holding, the hands are shaped to look as if they are holding the referent
Modeling:
• semiotic function: iconic; the hands are meant to look as if they are the referent
- in analogic enactment, the hand’s movement imitates the movement of the referent
- in static modeling, the hand’s shape imitates the shape of the referent
Tracing:
• semiotic function: iconic (in that the gesture imitates drawing) and indexical (in that only part of the
referent is depicted, but the whole is referred to); the hands (more specifically, the fingers) are meant
to look as if they were tracing the shape of some salient feature of the referent, such as its outline.
Fig. 44.4: Sketch of some semiotic devices used in illustrative co-speech gestures (see Kendon
1988; Mandel 1977; Müller 1998).
structure (see Kockelman 2005: 240–241). This will entail teasing apart the large set of
distinct semiotic dimensions which hand movements incorporate (Talmy 2006; see de
Ruiter et al. 2003). For example, upon uttering a word, the human voice can simulta-
neously vary many distinct features of a speaker’s identity (sex, age, origin, state of
arousal, individual identity, etc.), along with pitch and loudness, among other things.
What makes pitch and loudness distinct semiotic dimensions is that pitch and loudness
can be varied independently of each other. But loudness is a single dimension, because
it is impossible to produce a word simultaneously at two different volumes. Hand move-
ments are well suited to iconic-indexical meaning thanks to their rich potential for shar-
ing perceptible qualities in common with physical objects and events. But they are not
at all confined to these types of meaning. As Wilkins writes, “[the] analog and supraseg-
mental or synthetic nature [of gestures] does not make them any less subject to conven-
tion, and does not deny them combinatorial constraints or rules of structural form”
(Wilkins 2006: 132). For example, in some communities, “the demonstration of the
length of something with two outstretched hands may require a flat hand for the length
of objects with volume (like a beam of wood) and the extended index fingers for the
length of essentially linear objects lacking significant volume (e.g., string or wire)”
(Wilkins 2006: 132). A similar example is a Lao speaker’s conventional way of talking
about sizes of fish, by using the hand or hands to encircle a cross-section of a tapering
tubular body part such as the forearm, calf, or thigh. This is taken as standing for the
actual size of a cross-section of the fish.
Another kind of conventionality in gestures concerns types of communicative prac-
tice like, say, tracing in mid air as a way of illustrating or diagramming (Enfield 2009:
Chapter 6; Kendon 1988; Mandel 1977). It may be argued that there are conventions
which allow interpreters to recognize that a person is doing an illustrative tracing ges-
ture, based presumably on formal distinctions in types of hand movement in combina-
tion with attention-directing eye gaze toward the gesture space. While the exact form
of a tracing gesture cannot be pre-specified, its general manner of execution may be
sufficient to signal that it is a tracing gesture.
Most important is the collaborative, public, socially strategic nature of the process of
constructing composite utterances (Goodwin 2000; Streeck 2009). These communica-
tive moves are not merely designed but designed for, and with, anticipated interpreters.
They are not merely indices of cognitive processes, they constitute cognitive processes.
They are distributed, publicized, and intersubjectively grounded. Each type of compos-
ite utterance discussed in this book is regulated by its producer’s aim not just to convey
some meaning but to bring about a desired understanding in a social other. So, like all
instruments of meaning, these composites are not bipolar form-meaning mappings or
mere word-to-world glue; they are premised on a triadic, cooperative activity consisting
of a speaker, an addressee, and what the speaker is trying to say.
efforts, our primary unit of analysis must be the utterance or move, the single increment
in a sequence of social interaction. Component signs will only make sense in terms of
how they contribute to the function of the move as a whole.
This chapter has focused on moves built from speech-with-gesture as a sample
domain for exploring the anatomy of meaning. But the analytic requirement to think
in terms of composite utterances is not unique to speech-with-gesture. Because all utter-
ances are composite in kind, our findings on speech-with-gesture should help us to
understand meaning more generally. This is because research on the comprehension
of speech-with-gesture is a sub-field of a more general pursuit: to learn how it is that
interpreters understand token contributions to situated sequences of social interaction
(see Goffman 1981; Goodwin 2000; Schegloff 1968; Streeck 2009).
How are multiple signs brought together in unified interpretations? The issue was
framed above in terms of semiotic function of a composite’s distinct components (see
Fig. 44.4). A broad distinction was made between conventional meaning and non-
conventional meaning, where these two may be joined by indexical mechanisms of
various kinds. Think of a painting hanging in a gallery: a title (words, conventional)
is taken to belong with an image (an arrangement of paint, non-conventional) via in-
dexical links (spatial co-placement on a gallery wall, putative source in a single cre-
ator and single act of creation). Speech-with-gesture composites can be analyzed in
the same way. When a man says Make it steep like this with eye gaze fixed on his arm
held at an angle (see Fig. 44.1), the conventional signs of his speech are joined to the
non-conventional sign of his arm gesture by means of indexical devices including tempo-
ral co-placement, source in a single producer, eye gaze, and the symbolic indexical
expression like this. In these “illustrative gesture” cases, hand movements constitute
the non-conventional “image” component of the utterance. By contrast, in cases of
“deictic gesture” or pointing, hand movement is what provides the indexical link
between words and an image or thing in the world, such as a person walking by, or
diagrams in ink or mid-air.
This semiotic framework permits systematic comparison of speech-with-gesture
moves to other species of composite utterance. An important case is sign language of
the Deaf. There is considerable controversy as to how, if at all, gesture and sign lan-
guage are to be compared (see Emmorey and Reilly 1995). The present account
makes it clear that the visible components of a sign language utterance cannot be com-
pared directly to the visible hand movements that accompany speech, nor to mere
speech alone (i.e., with visible hand movements subtracted), but may only be properly
compared to the entire speech-with-gesture composite (see Liddell 2003; Okrent 2002).
The unit of comparison in both cases must be the move. By the analysis advanced here,
different components of a move in sign language will have different semiotic functions,
in the sense just discussed: conventional signs with non-conventional signs, linked in-
dexically. Take the example of sign language “classifier constructions” or “depicting
verbs” (Liddell 2003: 261ff). In a typical construction of this kind, a single articulator
(the hand) will be the vehicle for both a conventional sign component (a conventiona-
lized hand shape such as the American Sign Language “vehicle classifier”) and a non-
conventional sign component (some path of movement, often relative to a contextually
established set of token spatial referents), where linking indexical mechanisms such
as spatio-temporal co-placement and source in a single creator are maximized through
instantiation in a single sign vehicle, i.e., one and the same hand.
Another domain in which a general composite utterance analysis should fit is in lin-
guistic research on syntax. Syntactic constructions, too, are made up of multiple signs,
where these are mostly the conventional signs of morphemes and constructions (though
note of course that many grammatical morphemes are symbolic indexicals). An increas-
ingly popular view of syntax takes lexical items (words, morphemes) and grammatical
configurations (constructions) to be instances of the same thing: linguistic signs (Croft
2001; Goldberg 1995; Langacker 1987). From this “construction grammar” viewpoint,
interpretation of speech-only utterances should be just as for speech-with-gesture. It
means dealing with multiple, simultaneously occurring signs (e.g., That guy may be
both noun phrase and sentential subject), and looking to determine an overall target
meaning for the communicative move that these signs are converging to signify. A dif-
ference is that while semantic relations within grammatical structures are often nar-
rowly determined by conventions like word order, speech-with-gesture composites
appear to involve simple co-occurrence of signs, with no special formal instruction
for interpreters as to how their meanings are to be unified. Because of this extreme
under-determination of semiotic relation between, say, a gesture and its accompanying
speech, many researchers conclude that there are no systematic combinatorics in
speech-with-gesture. But speech-with-gesture composites are merely a limiting case in
the range of ways that signs combine: all an interpreter knows is that these signs are
to be taken together, but there may be no conventionally coded constraints on how.
Such under-determination is not unique to gesture. In language, too, we find minimal
interpretive constraints on syntactic combinations within the clause, as documented
for example by Gil (2005) for the extreme forms of isolating grammar found in some
spoken languages. And beyond the clause level, such under-determined relations are
the standard fabric of textual cohesion (Halliday and Hasan 1976).
In sum, to understand the process of interpreting any type of composite utterance,
we should not begin with components like “noun”, “rising intonation”, or “pointing ges-
ture”. We begin instead with the notion of a whole utterance, a complete unit of social
action which always has multiple components, which is always embedded in a sequential
context (simultaneously an effect of something prior and a cause of something next),
and whose interpretation always draws on both conventional and non-conventional
signs, joined indexically as wholes.
Research on speech-with-gesture yields ample motivation to question the standard
focus in mainstream linguistics on competence and static representations of meaning
(as opposed to performance and dynamic processes of meaning; see McNeill 2005:
64ff, Wilkins 2006: 140–141). There is a need for due attention to meaning at a con-
text-situated token level, a stance preferred by many functionalist linguists, linguistic
anthropologists, conversational analysts, and some gesture researchers. Speech-with-
gesture composites quickly make this need apparent, because they force us to examine
singularities, i.e., semiotic structures that are tokens but not tokens-of-types. These sin-
gularities include non-conventional gestures as utterance components, as well as the
overall utterances themselves, each a unique combination of signs. This is why, for
instance, Kendon writes of speech-with-gesture composites that “it is only by studying
them as they appear within situations of interaction that we can understand how they
serve in communication” (Kendon 2004: 47–48; see also Hanks 1990, 1996, among
many others). Here is the key point: What Kendon writes is already true of speech
whether it is accompanied by gesture or not. Speech-with-gesture teaches us to treat
Acknowledgements
The text of this chapter is drawn from chapters 1 and 8 of The Anatomy of Meaning:
Speech, Gesture, and Composite Utterances (Cambridge University Press, 2009), with
revisions. I gratefully acknowledge Cambridge University Press for permission to
reproduce those sections here. I would also like to thank Cornelia Müller for her
encouragement, guidance, and patience.
6. References
Atkinson, J. Maxwell and John Heritage 1984. Structures of Social Action: Studies in Conversation
Analysis. Cambridge: Cambridge University Press.
Atlas, Jay D. 2005. Logic, Meaning, and Conversation: Semantical Underdeterminacy, Implicature,
and Their Interface. Oxford: Oxford University Press.
Austin, John L. 1962. How to Do Things with Words. Cambridge, MA: Harvard University Press.
Bates, Elizabeth, Luigia Camaioni and Virginia Volterra 1975. The acquisition of performatives
prior to speech. Merrill-Palmer Quarterly 21: 205–224.
Bates, Elizabeth, Barbara O’Connell and Cecilia M. Shore 1987. Language and communication in
infancy. In: Joy D. Osofsky (ed.), Handbook of Infant Development (2nd edition), 149–203. New
York: Wiley and Sons.
Brandom, Robert B. 1979. Freedom and constraint by norms. American Philosophical Quarterly
16(3): 187–196.
Brookes, Heather 2004. A repertoire of South African quotable gestures. Journal of Linguistic
Anthropology 14(2): 186–224.
Bühler, Karl 1982. The deictic field of language and deictic words. In: Robert J. Jarvella and Wolf-
gang Klein (eds.), Speech, Place, and Action, 9–30. Chichester: John Wiley and Sons. First
published [1934].
Chafe, Wallace 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Clark, Herbert H 1996. Using Language. Cambridge: Cambridge University Press.
Colapietro, Vincent M. 1989. Peirce’s Approach to the Self: A Semiotic Perspective on Human Sub-
jectivity. Albany: State University of New York Press.
Croft, William 2001. Radical Construction Grammar: Syntactic Theory in Typological Perspective.
Oxford: Oxford University Press.
Cruse, D. Alan. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Origins, usage,
and coding. Semiotica 1: 49–98.
Emmorey, Karen and Judy S. Reilly 1995. Language, Gesture, and Space. Hillsdale, NJ: Lawrence
Erlbaum.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Enfield, N. J. and Stephen C. Levinson 2006. Introduction: Human sociality as a new interdisciplin-
ary field. In: N. J. Enfield and Stephen C. Levinson (eds.),
Roots of Human Sociality: Culture, Cognition, and Interaction, 1–35. Oxford: Berg.
Engle, Randi A. 1998. Not channels but composite signals: Speech, gesture, diagrams and object
demonstrations are integrated in multimodal explanations. In: Morton A. Gernsbacher,
Sharon J. Derry (eds.), Proceedings of the Twentieth
Annual Conference of the Cognitive Science Society, 321–326. Mahwah, NJ: Lawrence
Erlbaum.
Fillmore, Charles J. 1997. Lectures on Deixis. Stanford, CA: Center for the Study of Language and
Information Publications.
Foley, William A. and Robert D. Van Valin Jr. 1984. Functional Syntax and Universal Grammar.
Cambridge: Cambridge University Press.
Frege, Gottlob 1960. On sense and reference. In: Peter Geach and Max Black (eds.), Translations
from the Philosophical Writings of Gottlob Frege, 56–78. Oxford: Blackwell. First published
[1892].
Frith, Chris D. and Uta Frith 2007. Social cognition in humans. Current Biology 17(16): R724–
R732.
Gergely, György, Harold Bekkering and Ildikó Király 2002. Developmental psychology: Rational
imitation in preverbal infants. Nature 415: 755.
Gigerenzer, Gerd, Ralph Hertwig and Thorsten Pachur (eds.) 2011. Heuristics: The Foundations of
Adaptive Behavior. New York: Oxford University Press.
Gil, David 2005. Isolating-monocategorial-associational language. In: Henri Cohen and Claire
Lefebvre (eds.), Categorization in Cognitive Science, 348–380. Amsterdam: Elsevier.
Goffman, Erving 1964. The neglected situation. American Anthropologist 66(6): 133–136.
Goffman, Erving 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Goldberg, Adele E. 1995. Constructions: A Construction Grammar Approach to Argument Struc-
ture. Chicago: University of Chicago Press.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Grice, H. Paul 1957. Meaning. Philosophical Review 67: 377–388.
Grice, H. Paul 1975. Logic and conversation. In: Peter Cole and Jerry L. Morgan (eds.), Speech
Acts, 41–58. New York: Academic Press.
Grice, H. Paul 1989. Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Halliday, Michael A. K. and Ruqaiya Hasan 1976. Cohesion in English. London: Longman.
Hanks, William F. 1990. Referential Practice: Language and Lived Space among the Maya. Chi-
cago: University of Chicago Press.
Hanks, William F. 1996. Language and Communicative Practices. Boulder, CO: Westview Press.
Horn, Laurence R. 1989. A Natural History of Negation. Chicago: University of Chicago Press.
Hutton, Christopher M. 1990. Abstraction and Instance: The Type-Token Relation in Linguistic
Theory. Oxford: Pergamon Press.
Jackendoff, Ray 1983. Semantics and Cognition. Cambridge, MA: MIT Press.
Jakobson, Roman 1971. Shifters, verbal categories, and the Russian verb. In: Roman Jakobson
(ed.), Selected Writings II: Word and Language, 130–147. The Hague: Mouton.
Kendon, Adam 1988. Sign Languages of Aboriginal Australia: Cultural, Semiotic and Communi-
cative Perspectives. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Kita, Sotaro 2003. Pointing: Where Language, Cognition, and Culture Meet. Mahwah, NJ: Lawrence
Erlbaum.
Kockelman, Paul 2005. The semiotic stance. Semiotica 157(1/4): 233–304.
Kockelman, Paul 2006. Residence in the world: Affordances, instruments, actions, roles, and iden-
tities. Semiotica 162(1/4): 19–71.
Kockelman, Paul 2007. Agency: The relation between meaning, power, and knowledge. Current
Anthropology 48(3): 375–401.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar: Volume I, Theoretical Prerequi-
sites. Stanford, CA: Stanford University Press.
Sperber, Dan and Deirdre Wilson 1995. Relevance: Communication and Cognition (2nd edition).
Oxford: Blackwell.
Streeck, Jürgen 2009. Gesturecraft: The Manu-Facture of Meaning. Amsterdam: John Benjamins.
Talmy, Leonard 2006. The representation of spatial structure in spoken and signed language. In:
Maya Hickmann and Robert Stéphane (eds.), Space in Languages: Linguistic Systems and
Cognitive Categories, 207–238. Amsterdam: John Benjamins.
Tomasello, Michael 2006. Why don’t apes point? In: N. J. Enfield and Stephen C. Levinson (eds.),
Roots of Human Sociality: Culture, Cognition, and Interaction, 506–524. London: Berg.
Tomasello, Michael, Malinda Carpenter, Josep Call, Tanya Behne and Henrike Moll 2005. Under-
standing and sharing intentions: The origins of cultural cognition. Behavioral and Brain
Sciences 28(5): 664–670.
Wierzbicka, Anna 1985. Lexicography and Conceptual Analysis. Ann Arbor, MI: Karoma.
Wierzbicka, Anna 1996. Semantics: Primes and Universals. Oxford: Oxford University Press.
Wilkins, David P. 2006. Review of Adam Kendon (2004). Gesture: Visible action as utterance.
Gesture 6(1): 119–144.
45. Towards a grammar of gestures: A form-based view

Abstract
Starting from Kendon’s (2004) notion of “features of manifest deliberate expressiveness” and their
particular movement characteristics: “Deliberate expressive movement was found to be
movement that had a sharp boundary of onset and offset and that was an excursion,
rather than resulting in any sustained change of position” (Kendon 2004: 12), the chapter
presents a form-based approach to gesture analysis, which regards gestures as motivated
signs and considers a close analysis of their form as the point of departure for reconstruct-
ing their meaning. Furthermore, by considering gestural meaning not only as visual
action but also as a form of dynamic embodied conceptualization, the approach takes
a cognitive and interactive perspective on the process of ad hoc meaning construction
in the flow of a discourse. By discussing principles of meaning creation (sign motivation
via semiotic and cognitive processes) and the simultaneous (variation of formational fea-
tures and gesture families) and linear structures (combinations within gesture units) of
gesture forms, the chapter explicates individual aspects of a “grammar” of gestures. It
is concluded that in gestures we can find the seeds of language or the embodied potential
of hand-movements for developing linguistic structures.
1. Introduction
Our form-based approach to gesture analysis starts from the premise that the articulation
of shapes, movements, positions, and orientations of hands, fingers, and arms is
meaningful. Moving the hands so that they stand out as a figure against a ground, form
a particular shape in a particular location, and move with a distinct motion pattern
requires effort: articulatory effort that uses muscular energy to work against
gravity (Müller 1998b). The approach assumes that what we see in gestures is an articulatory effort,
which can be recognized as a communicative one (Müller and Tag 2010). As Kendon
(2004) has shown in a little experiment, people observing a speaking person without
hearing what is being talked about can very reliably distinguish communicative move-
ments from non-communicative ones. Based solely on their particular characteristics of
form, those movements are attributed “intentionality” and “having something to do
with what is being said”: “These movements are seen as deliberate, conscious, governed
by an intention to say something or to communicate. These were the movements that
were considered to be, in one observer’s words, ‘part of what the man was trying to
say’” (Kendon 2004: 11). Kendon reports the following form characteristics of those
communicative hand movements: “Deliberate expressive movement was found to be
movement that had a sharp boundary of onset and offset and that was an excursion,
rather than resulting in any sustained change of position. For limb movements, deliber-
ately expressive movements were those in which the limb was lifted away from the body
and later returned to the same or a similar position from which it started” (Kendon
2004: 12). Kendon argues that observers are highly consistent and sure in differentiating
the body movements that show what he terms “features of manifest deliberate expres-
siveness” (Kendon 2004: 13–14, highlighting in the original) from those that are to be
disattended in Goffman’s terms (Goffman 1974, chapter 7; Kendon 2004: 12–13) and
that hearers, in fact, treat those gestures very much like speech: “Just as a hearer per-
ceives speech, whether comprehended or not, as ‘figure’ no matter what the ‘ground’
may be, and just as speech is always regarded as fully intentional and intentionally com-
municative, so it is suggested that if movements are made so that they have certain
dynamic characteristics they will be perceived as ‘figure’ against the ‘ground’ of other
movement, and such movements will be regarded as fully intentional and intentionally
communicative” (Kendon 2004: 13). Kendon thus proposes that features of form which are
recognized as deliberate are the core characteristics that distinguish gesture from
other body movements and actions: “If an action is an excursion, if it has well defined
boundaries of onset and offset, and if it has features which show that the movement is
not made solely under the influence of gravity, then it is likely to be perceived as ges-
tural” (Kendon 2004: 14). In short, Kendon bases his definition of gesture on those form
characteristics of actions “that have the features of manifest deliberate expressiveness”
(Kendon 2004: 15).
Our form-based approach takes up Kendon’s primacy of form in gesture analysis, but
it differs from Kendon in that it regards gestural meaning not only as visual action but
also as a form of embodied conceptualization. In doing so, it follows Langacker’s cog-
nitive linguistic tenet “that meaning resides in conceptualization (in the broadest sense
of this term)” (Langacker 1991: 1). When, for instance, a German speaker metaphorically
describes a political turn of events as a “setback” (politischer Rückschlag, literally “political
back-hit”) and performs a hitting gesture while doing so, his gesture reveals that
he perceives the second part of the metaphoric compound Schlag (‘stroke’) as a manual
action. The gesture embodies the source domain of a verbal metaphor and activates the
embodied sensory experience of the lexicalized meaning. We suggest that in such multi-
modal (or verbo-gestural) metaphoric expressions we see that metaphoricity is cogni-
tively and interactively activated (Müller 2003, 2008a, 2008b; Müller and Cienki 2009;
Müller and Tag 2010). In other words, this embodied conceptualization is an ad hoc cre-
ation of a speaker that documents his (dynamically changing) focus of attention, which
in turn is a product of the interactive needs of a given communicative situation. The
form-based approach thus advocates a cognitive take on the process of ad hoc meaning
construction in the flow of a discourse: “Meaning construction is an on-line mental
activity whereby speech participants create meanings in every communicative act on
the basis of underspecified linguistic units.” (Radden, Köpcke, Berg, et al. 2007: 3, high-
lighting in the original; for a cognitive linguistic view on gesture and language see also
Cienki this volume, 2012). We consider gestures to be a core partner in this interactive
process of meaning construction, but we do not regard co-verbal gestures as linguis-
tic units in the full-fledged sense. However, we do take the position that gestures may
take over functions of linguistic units either in collaboration or in exchange with vocal
linguistic units. While most gesture research addresses gestures accompanying
speech that specify, extend, or complement what is said (Calbris 1990, 2011; Enfield
2009; Kendon 2004; McNeill 1992, 2000; Mittelberg 2006; Streeck 2009), only scant
attention has been paid to gestures filling grammatical slots (e.g., Bohle 2007; Clark
and Gerrig 1990; Clark 1996; McNeill 2005, 2007; Müller and Tag 2010; Slama-Cazacu
1976; Wilcox 2002).
Ladewig (2012 volume 2) has shown that co-verbal gestures may replace spoken
units and become part of the syntactic structure of a given utterance. Investigating in-
terrupted spoken utterances that are completed by gestures, she found that gestures
replace nouns and verbs in most cases, taking over the functions of objects and predi-
cates. When a speaker, for instance, describes a person and says He had a […] and
molds an arched object in front of his stomach, this gesture is inserted in the syntactic
gap of a noun and describes what verbally could be expressed as a big belly. In another
case, when a speaker, talking about a baseball game, says It’s like a baseball player who
digs his feet in the sand before he […], the gesture occupies the syntactic position of a
verb and enacts the action of hitting a ball with a baseball bat. Studies on gestures ac-
companying speech could show that gestures may adopt the syntactic functions of ad-
jectives or adverbs, modifying the semantic information expressed in speech (Bressem
2012; Fricke 2012). An arced gesture accompanying the utterance there is such a small
bridge is cataphorically integrated into the verbal noun phrase and adds the informa-
tion regarding the shape of the bridge to the meaning expressed in speech. This phenom-
enon has been described as “multimodal attribution” (Fricke 2012). A flat hand
moved downwards twice with an accented movement, accompanying the utterance
“is being stamped”, specifies the manner of the stamping action and, as such, fulfills
the function of an adverb (Bressem 2012). Such observations furnish evidence for the
structural and functional integration of gestures into spoken language and lay the ground
for developing the framework of a “multimodal grammar” (Fricke 2012, this volume).
These empirical findings substantiate Kendon’s assumption of gestures and speech as
being “two sides of one process of utterance” (Kendon 1980). By complementing this
view with a perspective on gestural meaning construction as a cognitive process we
are indebted to David McNeill’s theory of gestures as visible forms of imagistic thought
(McNeill 1992).
McNeill thus suggests that gestures reveal thinking and this is in line with our assump-
tion of gestures as embodied conceptualization. This assumption is also in line with
Slobin’s concept of thinking for speaking (Slobin 1987) although it is somewhat broader
in scope. Speakers orient their thinking according to the expressive properties at hand:
a specific language with its individual grammatical and semantic/pragmatic structures
and a body which is apt for communication. So we must talk about “thinking and ges-
turing for speaking”, because the cognitive processes that are activated during speaking
must be adjusted to both the linguistic properties of a given language and the expressive
resources of the body (Cienki and Müller 2008). Taking a form-based approach to ges-
ture analysis means then that those expressive resources are described and documented
as resources.
How is it that hand movements can be meaningful? What are the techniques of the
body (to quote Marcel Mauss (1973)) that people employ to create gestures? If gestures
can represent other than themselves, if they can fulfill, in Bühler’s terms, a representa-
tional function (see Müller this volume, 2009), what are the modes of gestural represen-
tation? How are gestures used as instruments for depiction? And what are the
principles governing gestures’ potential to depict concrete as well as abstract actions,
entities, properties, to refer to space as well as to time? Is it just their contextual place-
ment that motivates their local meaning, so that in gesture analysis we need not care
much about a careful account of form properties of a given movement? Or are there
differences of form that are pertinent for meaning variations, for instance, when we
look at gestures relating to literal and metaphoric meaning? We know that form fea-
tures of gestures do play a role with regard to the question of systematic relations between
gestures, Kendon’s gesture families (Kendon 2004: 15). In fact, Kendon’s concept of
gesture family is based on the assumption that gesture forms are meaningful and a ges-
ture family emerges around a shared semantic core or semantic theme incorporated in a
particular aspect of form, as for instance: the open hand (e.g., a particular hand shape;
see Kendon 2004, chapter 13; Müller 2004). We believe that these aspects of gestural
form are the basis of gestural meaning – and we believe that these are precisely
Kendon’s “features of manifest deliberate expressiveness.” Providing an encompassing
documentation of those properties of form that characterize the hand(s) as a medium of
expression was the goal of two research enterprises carried out between 2000 and 2011:
“The Berlin Gesture Project” and the project “Towards a Grammar of Gesture: Evolu-
tion, Brain and Linguistic Structures” (ToGoG) (for further information see www.
togog.org and http://www.berlingesturecenter.de/berlingestureproject/fugestureproject.
html).
The interdisciplinary project “Towards a grammar of gesture” was funded within the
program “Key Topics of the Humanities” of the VolkswagenStiftung. It addressed three
key topics of the humanities: the multimodal nature of language, the evolution of gesture,
and neuropsychological foundations of gestures. It investigated these issues from the per-
spective of linguistics (Cornelia Müller), semiotics (Ellen Fricke), evolutionary psychol-
ogy (Katja Liebal), and neuropsychology (Hedda Lausberg). Notably, the term
“grammar of gesture” refers to the basic form properties of gestures and to their struc-
tures. It does not imply, however, that co-verbal gestures have anything like a grammat-
ical structure. The formulation “Towards a grammar of gestures” underlines two aspects:
first, co-verbal gestures show properties of form and meaning which are prerequisites of
language and which – in case the oral mode of expression is not available – may evolve
into a more or less full-fledged linguistic system such as a sign language or an alternate
sign language (see Bressem’s work (2012) on reduplication in gesture, speech, and
sign). Second, when used in conjunction with speech, co-verbal gestures may take over
grammatical functions, such as that of verbs, nouns, or attributes pointing towards a multi-
modal nature of grammar (Bressem 2012; Fricke 2012; Ladewig 2012). In the following,
we will substantiate in particular the first assumption with an overview of our findings
regarding the principles of meaning creation, the simultaneous and linear structures of
gesture forms, and introduce our distinction between singular and recurrent gestures.
For detailed substantiations of the second claim, see Bressem on repetitions
in gestures (Bressem 2012; Fricke 2012), Cienki (2012) on gestures as part of usage events
and a variably multimodal concept of spoken language grammar, Fricke on the basic prin-
ciples of a multimodal grammar (2012, this volume), Ladewig on gesture’s integration
into the syntactic and sequential structure of verbal utterances (Ladewig 2012) and Müller
(2010b, submitted) for an overview.
objects and movements in space. It is their material character or their specific mediality
which makes them particularly suitable to mime and depict actions, objects, movements
and spatial relations of all kinds. In the process of the formation of gestural signs, the
hands undergo transformational processes that rely on a basic set of techniques. In ear-
lier work, we distinguished four basic modes of representation – the hand acts, the hand
molds, the hand draws (or traces), and the hand represents. Note that the term repre-
sentation is used here in Bühler’s sense of Darstellung, which is probably better
translated as depiction (for more detail, see Müller 2009, this volume). Those four
modes of representation (or of gestural depiction) were a first attempt to provide an
answer to the question: what are the hands actually “doing” when they are transformed
into gesture? (see Müller 1998a, 1998b) The answer was: they are used as techniques for
the depiction of “the world” much like artists use a paint brush, a pencil or clay to shape
a sculpture, a vase, a cup or a bowl. This means that in the acting mode the hands are
used to mime or reenact actual manual activities. Examples are: grabbing, holding, giv-
ing, receiving, opening a window, turning off a radiator, or holding a steering-wheel. In
the molding mode, the hands create a transient sculpture, such as a frame or a bowl; in
the drawing (or tracing) mode, the hands outline the contour of objects or they trace the
path of movements in space. In all these cases gestural meaning is motivated through
the re-enacted action. This motivation differs in the representing mode. Here, the
hand embodies an object as a whole and becomes itself a kind of “manual sculpture”.
Examples are: the extended index finger representing a pen, or the open hand becoming
a piece of paper on which something is written down (Fig. 45.1). While in the
first three modes of representation the tactile and sensory-motor experience of actions
of the hands provides the derivational base for gestural meaning, in the fourth mode the
derivational base of the gesture may be a visual perception alone.
Fig. 45.1: The four basic techniques of gesture creation or the gestural modes of representation:
acting, molding, drawing (tracing), representing. Hand(s) reenact an everyday action (pulling a
gear-shift), they mold or outline the shape of objects (a picture frame), they trace paths, or they repre-
sent objects (in motion) (extended index represents pen, palm open represents paper).
Notably, research conducted on processes of gesture and sign creation and on the
nature of iconicity in gestures and signs has made similar distinctions: for gesture stu-
dies (Kendon 2004; McNeill 1992; Sowa 2005; Streeck 2008, 2009, this volume;
Wundt 1921), for sign linguistics (Battison 1974; Cohen, Namir, and Schlesinger 1977;
Kendon 1980, 1981, 1986; Mandel 1977; Taub 2001), and for semiotics (Andrén 2010).
(For a detailed discussion of those accounts see Müller, Fricke, Ladewig, et al. in
preparation).
We assume, however, that the derivation from a particular action or a specific hand-
ling of objects is not given in advance. On the contrary, speakers often use different ges-
tural techniques to create gestures which refer to the same object. So, you may
observe somebody describing a picture hanging on the wall by molding its shape, by
outlining it, or by representing it – and all this in the same sequence of talk. Yet by
using different techniques of depiction, different properties of the respective object
are foregrounded: molding highlights its corporality, tracing reduces it to lines, repre-
senting brings forward the qualities of objects as located and moved in space (for a de-
tailed account of such a gestural sequence, see Müller 2010a, and for its relation to
classifiers in signed languages see Müller 2009). Speakers use these devices for gesture
creation in a very fine-grained manner and in accordance with their communicative
goals. These techniques of representation (or mimetic devices) imply an orientation
of speakers towards specific facets of perceived and conceived “things and actions”
in the world. It is in this sense that gestures are subtle and variable conceptualizations
of a perceived reality – designed for and triggered by the purposes of communication at
a given moment in the flow of interaction.
Gestures can be regarded as mundane forms of creating illusions of reality, illusions
which are manifestations of “visual thinking” in Arnheim’s (1974) terms (Arnheim himself
includes gestures) and as a form of “thinking for speaking and gesturing” in Cienki and
Müller’s terms (Cienki and Müller 2008). According to Gombrich (1961) and Arnheim
(1974) the artist conceptualizes the perceived reality in terms of the available mimetic
devices or “modes of representation”. The mimetic technique has a direct relation to
the process of conceptualization of a perceived world. The psychology of art teaches
us that images are conceptualizations of perceived objects and that they are shaped
by different modes of representation. Images in that sense are artfully created illusions
of reality, creative abstractions shaped by their respective mode of representation. So
it makes a great difference whether a landscape is perceived for drawing or for painting
or for black and white photography (see Gombrich 1961). Gestures in this view are
“natural” and “artful” illusions of reality, created by speakers in the flow of discourse
and interaction, and they are probably the first mimetic devices appearing on the
stage of human evolution (Müller 1998b).
For Aristotle, mimesis is a fundamental anthropological trait (Aristotle Poetics
1965). Only humans possess a capacity for it, and it is considered the base for the devel-
opment of all arts. Aristotle distinguishes three aspects of mimesis: medium, objects,
and modes of mimesis, and we suggest that those three aspects characterize gestural
mimesis as well (see also Müller 2010a).
The medium of gestural mimesis is the body with its range of articulators: hands,
head, face, eyes, mouth, shoulders, trunk and even leg(s) and feet. Up until now, gesture
studies have focused mainly on the hands, but the hands are only the most articulate
of these articulators (along with the face) and often interact with the others. We charac-
terize the objects of gestural mimesis through their functions. Following Bühler, we dis-
tinguish three basic functions of gestures: representation, expression and appeal (see
Müller 1998b, 2009, this volume). The modes of mimesis concern precisely the modes
of gestural representation that we have introduced above: acting, molding, drawing
(or tracing), and representing.
When reconsidering those four different modes of representation from the point of
view of a more strictly praxeological perspective (as advocated by Streeck 2009, this
volume), one can argue, however, that there are actually only two fundamental techni-
ques: acting and representing. In the acting mode, the hands act as if performing an
actual action: they mime different kinds of actions. In the representing mode, the
hand acts as if it were an object itself: it mimes an object by becoming a “manual
sculpture” of that object. In the acting mode of mimesis, gestures are based on actions,
and put in terms of Peirceian semiotics, this means that here the representamen-object
relation is one in which the gestural form relates to an action. In the representing mode,
on the contrary, gestures are based on objects; the representamen-object relation is one
in which the gestural form relates to a “material object in the world” (Peirce 1960, for an
extended analysis in Peirceian terms see Müller, Fricke, Ladewig, et al. in preparation).
Neuro-cognitive research conducted in the ToGoG group supports this reduction to
two basic mimetic modes of representation (see Lausberg, Cruz,
Kita et al. 2003; Lausberg, Heekeren, Kazzer, et al. submitted). In split-brain as well
as in fMRI studies, Lausberg and colleagues found that the two modes (acting and re-
presenting) are processed in different parts of the brain and must therefore be consid-
ered different neuro-cognitive entities. Primatological studies carried out in our group
furthermore indicate that the “acting” mode of representation is clearly the most wide-
spread technique for creating gestures that non-human primates have at their disposal.
Both neuro-cognitive and primatological work indicate that the difference between
actual object use and gestured object use (i.e., acting with a real object versus acting as
if the hands were holding or showing an object) is not trivial at all. Rather, the two appear to be fun-
damentally distinct behavioral and cognitive processes. Acting as if holding, showing
or moving an object presupposes some kind of mental concept of that very object.
What we face here is no less than the difference between instrumental and symbolic
behavior, or the transition from action to gesture.
Notably, both ways of gesture formation – acting and representing – involve two cog-
nitive-semiotic processes: iconicity and indexicality. In the acting mode indexicality
(contiguity) is the primary cognitive-semiotic process, which mediates between the ges-
tural form and the perceived object in the world, while in the representing mode iconi-
city (similarity) is primary. This would also point towards indexicality being a distinct
and evolutionarily earlier cognitive process than iconicity. But why is indexicality the
basis of a type of gesture that has very widely been described as iconic (see
McNeill 1992)? Because the semiotic principle governing the motivation of the gestural
form is primarily a metonymic one – or more particularly a synecdochical one: a part of
the action stands for the action. This principle characterizes the reenactment of mun-
dane actions, the reenactment of touching surfaces (for the molding mode) and the re-
enactment of drawing traces in soft surfaces with an extended index (for the drawing
mode). In the case of the representing mode iconicity is primary, because the motiva-
tion of the form is primarily based on similarity of a particular hand-shape with an
object.
Müller (1998b, 2009), Müller, Fricke, Ladewig, et al. (in preparation), Mittelberg
(2006) and Mittelberg and Waugh (2009) have argued that one pertinent relationship
between a gesture’s form and its object or ground (in a Peirceian sense) is metonymy,
and both authors consider metonymy to be based on contiguity relations (in a Jakobsonian
sense), i.e. on indexicality (for more detail see Mittelberg’s cognitive-semiotic
approach to gesture, Mittelberg 2006, this volume). Müller (1998b) has described the
process of gesture creation in terms of different forms of metonymy, all of which are
characterized by a process of creative abstraction from a perceived or conceived object
or ground in the world. She has related these to processes of abstraction in modern
art, in particular to Kandinsky’s concept of “abstraction” and to the sculptural art of
Brancusi, underlining that it is here that the creative processes of conceptualization
described above come into play. We would therefore like to suggest that the processes of ab-
stractive conceptualization (see also Langer’s 1957 concept of “abstractive seeing”) are
mediated by the specific technique the artist as well as the gesturer applies: what is
metonymically abstracted depends on the technique of depiction (paint, pencil, or
photography; acting, molding, drawing, or representing). These techniques guide “the
thinking or conceiving for depiction” and give each artistic and each gestural depiction
its particular flavor.
We therefore suggest that there is no such thing as an “automatic” or “natural” rela-
tion between a perceived or remembered object or event in the world and its gestural
depiction. Speakers choose between different gestural modes: they may trace, mold, or
represent an object, or they may act with it, and each time they will highlight a different
dimension of the object. At their moment of creation, the modes of representation
or the mimetic devices structure the embodied abstractive conceptualizations that we
see in the gesturing hands.
Reconstructing the techniques of gesture creation is one core aspect of a form-based
view of gesture analysis. It offers a key to a differentiated understanding of gestural
meaning construction and is one element of a method for a linguistic analysis of gestural
meaning (see Bressem, Ladewig, and Müller this volume; Ladewig and Bressem forth-
coming; Müller, Ladewig and Bressem volume 2). This is crucial, since gestures are ico-
nic and indexical signs and considering the modes of representations helps to determine
the particular motivation of a given gestural movement. It provides a form-based,
descriptive, and rational ground to reconstructing the meaning of gestures in a partic-
ular context and it is an important contribution to intersubjectively accountable
descriptions of Kendon’s features of manifest deliberate expressiveness.
gestures’ forms was based on the four parameters of sign language, i.e. hand shape, ori-
entation of the hand, movement, and position in the gesture space (Battison 1974; Sto-
koe 1960; for a specification of those parameters for gestures see Bressem this volume).
In addition, we took into consideration facial expressions and the involvement of other
articulators, such as posture, arm and head movements, and gaze.
First of all, the analysis of the elicited gestures did not yield any kind of systematic
variation of form features between gestures depicting literal versus metaphoric mean-
ing. We would have expected gestures referring to abstract notions and actions to be
more “abstract” in the literal sense of the word. That is, we expected them to display fewer form features that contribute to the meaning of the gesture. This, however, was not what we found.
What we observed were not differences in the gestural forms with regard to abstract-
ness or concreteness of the action or object gesturally depicted, but different ways of
gestural depiction, irrespective of whether the depicted object was a concrete or an
abstract one. So the subjects either produced a pantomimic re-enactment of the target
event described in the story (i.e., gestures involving gaze, body movement, facial expres-
sion and specified hand shapes) or they used a non-pantomimic (hands-only) depiction,
a kind of prototypical more de-contextualized depiction of the scene. Put differently, in
the case of pantomimic re-enactments of a scenario, more form features of the hands
contributed to the meaning of the gestures and more body parts were involved. In
non-pantomimic depictions, the deployed gestures were semantically reduced, i.e., fewer form parameters of the hands contributed to the meaning of the gesture, and they mostly involved only the hands.
These results can be explained by drawing upon Müller’s theory of a dynamic focus of attention, which proposes that particular aspects of meaning are foregrounded by the participants of a conversation depending upon the flow of attention within a given interaction (Müller 2008a, b; Müller and Tag 2010). The results of our experiment indicate that participants differ in their ways of conceptualizing and experiencing the scenes depicted in the story, irrespective of whether the gesture depicts aspects of literal or metaphoric expressions. So no matter whether concrete or abstract objects and events in the world are depicted gesturally, there will be people who provide rich pantomimic enactments and others who produce fairly lax, loose hand gestures, which are semantically poorer.
We would like to suggest that when speakers use highly specified hand shapes together with posture and the face, it is likely that they fall back on full-body experiences of rich scenarios. When, on the contrary, semantically reduced gestures are used and no further body part is involved in the gestural performance, the gestures depict prototypical situations and their associated embodied experiences. For instance, some speakers will depict a sticking piece of chewing gum precisely where it got stuck (at the bottom of something else), while others will just perform a movement of the index finger and thumb illustrating only the nature and quality of stickiness as a property. The same is true for metaphoric usages of “sticky”: some subjects performed pantomimic enactments, others fairly abstract hands-only depictions of stickiness.
These findings can be interpreted along the lines of Fricke’s (2007, 2012) Peircean distinction between object-related and interpretant-related gestures. Accordingly, we suggest that our analysis of the formational features of the gestures and the types and number of articulators involved in gesturing uncovers whether subjects conceptualize a specific event in an object-related or an interpretant-related manner.
Examples of gesture families are Kendon’s two families of the open hand: the family of
the Open Hand Prone (“palm down”) and the family of the Open Hand Supine (“palm
up”) (Kendon 2004: chapter 13). Based on a context-of-use study of the Open Hand Prone, Kendon suggests that gestures with a palm downward orientation “share the
semantic theme of stopping or interrupting a line of action that is in progress” (Kendon
2004: 248–249). For the Open Hand Supine or “palm up” family of gestures Kendon
and Müller assume that the shared formational features, e.g. hand-shape (palm open)
and the orientation (upwards) come with the shared semantic themes of “offering” and
“receiving” (Kendon 2004: 264; Müller 2004). Both base their grouping of gestures into a family on the semantization of gestural forms, meaning that the semantic core or semantic theme is connected with one or several of the four form parameters.

In the family of Away Gestures, the process of removing objects is exploited to form gestures
of exclusion and negation, showing that negation rests upon bodily concepts, is struc-
tured by tactile experiences, and thus has a bodily dimension (Bressem, Müller, and
Fricke in preparation; Harrison 2009; Lapaire 2006).
To sum up, a form-based view also includes a close analysis of the motivation of
gestures and in this case it resulted in the discovery of a new type of gesture family.
Tab. 45.1: Recurrent gestures – description of form and meaning

Brushing away: The hand, with the palm oriented towards the speaker’s body, is moved outwards by a (rapid) twist of the wrist. Topics of talk are rejected by (rapidly) brushing them away from the speaker’s body.

Holding away: The (lax) flat hand(s), with the palm turned away from the speaker’s body, are moved upwards and held and/or moved. Topics of talk are rejected by holding and moving the palm away.

Throwing away: The hand is oriented vertically, the palm facing away from the speaker’s body, and the hand is flapped downwards by moving the wrist down. An imaginary topic of talk, sitting in the palm of the hand, is dismissed by throwing it away.
movement changes: The “slap” is executed with a lax flat hand, a palm oriented
upwards, and a movement anchored in the lower arm. Another systematic form-
function distinction was found in the group of visual gestures. It turned out that visual
gestures with an object are not about the object itself, but are used to regulate social
relationships. Visual gestures without an object, on the other hand, are used for a
range of different functions (Bressem, Liebal, Müller, et al. in preparation).
Even if these variations in form may seem very rudimentary at first sight, they nevertheless indicate that orang-utans are able to systematically vary the form and meaning of a recurrent gestural form. In doing so, the study not only provides insights into the structure of ape gestures but also suggests essential similarities between human and non-human primates in the use of gestures, regarding their internal structure as well as the development of gestures from instrumental actions.
To conclude: a close analysis of the formational features of gestures in non-human
primates revealed that apes seem to modify at least some of their gestures depending
on the context they are used in – a finding that might shed new light on discussions
of language evolution (see Arbib this volume; Corballis this volume; Kendon 2008;
McNeill this volume).
Gestures may combine with other gestures in several ways:

(i) One gesture might be repeated several times, resulting in the repetition of the
same gestural meaning (iteration) or in the creation of a new gestural meaning
(reduplication) (Bressem 2012, volume 2);
(ii) Several gestures depicting objects, actions, or events in a literal (McNeill’s iconics) or
metaphoric manner (McNeill’s metaphorics) may combine to describe an entire
scenario. A verbo-gestural account of an afternoon visit to the baroque garden
of the Versailles castle would be an example;
(iii) Several pragmatic and performative gestures (the Ring Gesture, the Palm Up
Open Hand (Kendon’s Palm Presenting), the Away Gestures) may combine.
They are typically found in argumentative discourse (political speeches or discussions are a good example of this type of gesture combination);
(iv) The three types might combine, with pragmatic and performative gestures often
located at the beginning or at the end of speaking turns (very often with metaprag-
matic functions) and depictive gestures often placed in the middle of gesture
sequences.
We will provide one characteristic example of the second type of combination. A student of art history uses a succession of six depictive gestures to describe his impressions and memories of an afternoon visit to the baroque garden of the palace of Versailles,
where the changing weather and light created spectacular effects (Fig. 45.2).
This sequence of depictive gestures begins with a departure of both hands from rest-position into the upper part of the gesture space. In this location the speaker molds what is supposed to indicate the blue sky (G1). Then, leaving his left hand up in gesture space (the left “end” of the sky), he uses his right hand “to wave in” wind blowing up in the sky (G2). Then, resuming both hands, still in the “sky” location, round shapes of clouds are molded (G3), coming in as a result of the blowing wind. In the same position, loose downward movements of the relaxed hands are used to depict falling rain (G4). Now, at the lower end of the raining movements, the effect of this rain is shown: “everything was wet”. His hands are now in a palm downward orientation and
perform a lateral sweep (Kendon’s PL) (G5) to show that “everything”, the entire ground, had been turned wet (for an analysis of the meaning of the PL, see Kendon 2004:
chapter 13). With the last gesture of this sequence the speaker comes to the point of his
little story, namely the effect that this changing weather had on the beauty of the famous
baroque garden: with the returning sun after the rain shower everything was sparkling.
Delicate finger movements of both hands (located between the ground and the sky
position) illustrate the shimmering and sparkling light reflexes on the wet landscape
(G6). The series of depictive gestures comes to an end, with a return to full rest-position.
What is remarkable about this long gesture unit is that all the gestures are performed
within the same gesture space. With his first gesture the speaker sets up a spatial
frame in the upper part of the gesture space (“the blue sky”) – and creates a kind of
stage for all the facets of the afternoon scenario that he will illustrate with the follow-
ing gestures: the wind, the clouds, the rain, all the wet and shimmering reflexes in the
garden. This is what we term a mimetic relation between different gestures (e.g., ges-
ture phrases) and we consider it one fundamental principle of gesture combination
(see Müller, Ladewig, and Bressem volume 2; Bressem, Ladewig, Müller, et al. in
preparation).
This pattern was found also with other types of gestures and in different contexts, indicating that this non-
mimetic use of gesture space is a systematic way of combining gestures and of creating
abstract structural relations between them (Tag and Müller 2010; see also Bressem
2012).
5. Conclusion
We believe that the form-based view proposed here substantiates Kendon’s characterization of gestures as “movement excursions” showing “features of manifest deliberate expressiveness”. To offer a systematic, form-based, and linguistic documentation of them is what we term “a grammar of gestures”. We emphatically avoid projecting linguistic categories onto co-verbal gestures. In doing so, our approach is in accordance with Enfield’s characterization of analyzing gestures as part of composite utterances:
When examining gesture, as when examining any other component of composite utter-
ances, we must carefully distinguish between token meaning (enriched, context-situated),
type meaning (raw, context-independent, pre-packaged), and sheer form (no necessary
meaning at all outside of a particular context in which it is taken to have meaning).
These distinctions may apply to signs in any modality. (Enfield this volume)
A related view of language as a dynamic, flexibly bounded category is formulated by Cienki:

The basic idea being proposed here is that the category of language (in general, and in
terms of any individual language) is not clearly bounded, but is a dynamic category with
a fuzzy, flexible boundary which can incorporate other forms of symbolic expression to
varying degrees, and differently in different contexts of use. (Cienki 2012: 155)
Our specific twist is that we take the form of the gestural movement as the point of departure for our analysis of the meaning of gestures. We regard gestures as motivated signs, and as such we consider a close analysis of their form the starting point for reconstructing their meaning.
In order to arrive at the local, indexical or more shared, and maybe even conventional, forms of gestural meaning, we examine how a particular gesture is deployed within its context-of-use (Kendon 1990, 2004) or, more specifically, within a particular linguistic and interactional context. In doing this, we arrive at intersubjectively accountable descriptions of gestural meaning (see Bressem, Ladewig, and Müller this volume; Müller
2010, submitted; Müller, Ladewig, and Bressem volume 2). This holds for ad hoc as well
as recurrent forms of meaning in gestures. It furthermore allows us to determine if – and
if so, how – gestures may be used to take over the functions of grammatical categories,
such as verb, noun, or adverb (Fricke 2012; Ladewig 2012, volume 2).
In a form-based view, meaning creation in gestures is considered to be fundamentally
rooted in human experience. In this respect, we are sympathetic to Streeck’s praxeological
approach to gesture (Streeck 2008, this volume). Gestures display embodied facets of
language as well as their embodied roots. In singular gestures (McNeill’s spontaneous
gestures), sensory (-motor) experiences are used to create ad hoc gestural meaning.
In recurrent gestures we observe processes of schematization, abstraction, and segmentation
of the gestural movement and its meaning. Here we can see how – based on those
Acknowledgments
The Berlin Gesture Project was located at the Freie Universität Berlin (2000–2006)
and headed by Cornelia Müller. Members of the group were: Karin Becker, Janina
Beins, Ulrike Bohle, Gudrun Borgschulte, Jana Bressem, Ellen Fricke, André Hatting,
Alexander Kästner, Silva Ladewig, Nadine Schmidt, Daniel Steckbauer, Susanne Tag,
Sedinha Teßendorf, and Juliane Wutta. The ToGoG project (2006–2011) was located
at the European University Viadrina, the Freie Universität Berlin, and the German Sport
University Köln and headed by Cornelia Müller (Linguistics), Ellen Fricke (Semiotics),
Hedda Lausberg (Neurology), and Katja Liebal (Primatology). Members of the group
were: Jana Bressem, Mary Copple, Firdaous Fatfouta-Hanka, Marlen Griessing, Julius
Hassemer, Ingo Helmich, Katharina Hogrefe, Henning Holle, Benjamin Marienfeld,
Robert Rein, Philipp Kazzer, Silva Ladewig, Christel Schneider, Nicole Stein, Susanne
Tag, Sedinha Teßendorf, Sebastian Tempelmann, and Ulrike Wrobel. Irene Mittelberg
was an associated member of the project.
We are grateful to the Volkswagen Foundation for supporting this research with a
grant for the interdisciplinary project “Towards a grammar of gesture: Evolution,
brain and linguistic structures” (www.togog.org).
We thank Karin Becker for the drawings and all the group members for their inspir-
ing contributions to this intellectual enterprise.
6. References
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. dissertation, Centre for Lan-
guages and Literature, Lund University.
Arbib, Michael this volume. Mirror systems and the neurocognitive substrates of bodily commu-
nication and language. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Aristotle 1965. Poetics. Oxford: Oxford University Press.
Arnheim, Rudolf 1974. Art and Visual Perception: A Psychology of the Creative Eye. Berkeley:
University of California Press.
Battison, Robin 1974. Phonological deletion in American sign language. Sign Language Studies 5:
1–19.
Bohle, Ulrike 2007. Das Wort ergreifen–das Wort übergeben: Explorative Studie zur Rolle redebe-
gleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, European University Viadrina, Frankfurt (Oder).
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin: De Gruyter Mouton.
Bressem, Jana volume 2. Repetitions in gesture. In: Cornelia Müller, Alan Cienki, Ellen Fricke,
Silva H. Ladewig, David McNeill, and Jana Bressem (eds.), Body – Language – Communication:
An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics
and Communication Science 38.2.) Berlin: De Gruyter Mouton.
Bressem, Jana, Cornelia Müller and Ellen Fricke in preparation. “No, not, none of that” – cases of
exclusion and negation in gesture.
Bressem, Jana and Cornelia Müller volume 2. A repertoire of recurrent gestures of German. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Mul-
timodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.2.) Berlin, New York: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases – articulatory features of
gestural movement? Semiotica 184(1/4): 53–91.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic Annotation System
for Gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin, New York: De Gruyter Mouton.
Bressem, Jana, Silva H. Ladewig, Cornelia Müller, Melissa Arnecke, Franziska Boll, Dorothea Böhme,
Lena Hotze, Benjamin Marienfeld, Nicole Stein in preparation. Linear structures in gestures.
Bressem, Jana, Katja Liebal, Cornelia Müller and Nicole Stein in preparation. Recurrent forms
and contexts: Families of gestures in non-human primates.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
Calbris, Geneviève 2011. Elements of Meaning in Gesture. Amsterdam: John Benjamins.
Cienki, Alan 2012. Usage events of spoken language and the symbolic units (may) abstract from
them. In: Kosecki, Krzysztof and Janusz Badio (eds.), Cognitive Processes in Language,
149–158. Frankfurt am Main: Peter Lang.
Cienki, Alan this volume. Cognitive Linguistics: Spoken language and gesture as expressions of
conceptualization. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Cienki, Alan and Cornelia Müller 2008. Metaphor, gesture and thought. In: Raymond W. Gibbs
(ed.), Cambridge Handbook of Metaphor and Thought, 483–501. Cambridge: Cambridge Uni-
versity Press.
Clark, Herbert H. 1996. Using Language, Volume 4. Cambridge: Cambridge University Press.
Clark, Herbert H. and Richard J. Gerrig 1990. Quotations as demonstrations. Language 66(4):
764–805.
Cohen, Einya, Lila Namir and I. M. Schlesinger 1977. A New Dictionary of Sign Language: Em-
ploying the Eshkol-Wachmann Movement Notation System. The Hague: Mouton.
Corballis, Michael this volume. Gesture as precursor to speech in evolution. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1) Berlin,
New York: De Gruyter Mouton.
Enfield, N. J. 2009. The Anatomy of Meaning: Speech, Gesture, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Enfield, N. J. this volume. A ‘Composite Utterances’ approach to meaning. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body – Language – Communication: An International Handbook on Multimodality in
Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin:
De Gruyter Mouton.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter Mouton.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech: A multimodal
approach. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Goffman, Erving 1974. Frame Analysis. Cambridge, MA: Harvard University Press.
Gombrich, Ernst H. 1961. Art and Illusion: A Study in the Psychology of Pictorial Representation.
London: Phaidon.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux 3.
Kendon, Adam 1972. Some relationships between body motion and speech: An analysis of an
example. In: Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communica-
tion, 177–210. New York: Elsevier.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
R. Key (ed.), Nonverbal Communication and Language, 207–227. The Hague: Mouton.
Kendon, Adam 1981. Introduction: Current issues in the study of ‘nonverbal communication’. In: Adam Kendon (ed.), Nonverbal Communication, Interaction, and Gesture: Selections from Semiotica (Approaches to Semiotics 41), 1–56. The Hague: Mouton.
Kendon, Adam 1986. Some reasons for studying gesture. Semiotica 62(1/2): 3–28.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam 2008. Some reflections on the relationship between “gesture” and “sign”. Gesture
8(3): 348–366.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture, and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste. Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, European University Viadrina, Frankfurt
(Oder).
Ladewig, Silva H. volume 2. Linear integration of gestures into speech. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill, and Jana Bressem (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Inter-
action. (Handbooks of Linguistics and Communication Science 38.2.) Berlin: De Gruyter
Mouton.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering structures in gestures based on the four parameters of sign language. Semiotica.
Langacker, Ronald W. 1991. Concept, Image, and Symbol: The Cognitive Basis of Grammar. Ber-
lin: De Gruyter Mouton.
Langer, Susanne K. 1957. Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and
Art. Vol. 17. Cambridge, MA: Harvard University Press.
Lapaire, Jean-Rémi 2006. Negation, reification and manipulation in a cognitive grammar of sub-
stance. In: Stephanie Bonnefille and Sebastian Salbayre (eds.), La Négation: formes, figures,
conceptualisation, 333–349. Tours: Presses universitaires François Rabelais.
Lausberg, Hedda, Robyn F. Cruz, Sotaro Kita, Eran Zaidel and Alan Ptito 2003. Pantomime to
visual presentation of objects: left hand dyspraxia in patients with complete callosotomy.
Brain 126: 343–360.
Lausberg, Hedda, Hauke Heekeren, Phillip Kazzer and Isabell Wartenburger submitted. Differen-
tial cortical mechanisms underlying pantomimed tool use and demonstrations with tool in
hand.
Liebal, Katja and Josep Call 2012. The origins of non-human primates’ manual gestures. Philoso-
phical Transactions of the Royal Society B: Biological Sciences 367(1585): 118–128.
Liebal, Katja, Simone Pika and Michael Tomasello 2006. Gestural communication in orang-utans
(Pongo pygmaeus). Gesture 6(1): 1–38.
Mandel, Mark 1977. Iconic devices in American Sign Language. In: Lynn A. Friedman (ed.), On the
Other Hand. New Perspectives on American Sign Language, 57–108. New York: Academic Press.
Mauss, Marcel 1973. The techniques of the body. Economy and Society 2(1): 70–88.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David (ed.) 2000. Language and Gesture. Cambridge: Cambridge University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David 2007. Gesture and thought. In: Anna Esposito, Maja Bratanić, Eric Keller and
Maria Marinaro (eds.), Fundamentals of Verbal and Nonverbal Communication and the Bio-
metric Issue, 20–33. Amsterdam: IOS Press.
McNeill, David this volume. The co-evolution of gesture and speech, and downstream conse-
quences. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence
for multimodal models of grammar. Ph.D. dissertation, Cornell University. Ann Arbor, MI: UMI.
Mittelberg, Irene this volume. The exbodied mind: Cognitive-semiotic principles as motivating
forces in gesture. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Mittelberg, Irene and Linda Waugh 2009. Multimodal figures of thought: A cognitive-semiotic
approach to metaphor and metonymy in co-speech gesture. In: Charles Forceville and Eduardo
Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998a. Beredte Hände. Theorie und Sprachvergleich redebegleitender Gesten.
In: Noll, Thomas and Caroline Schmauser (eds.), Körperbewegungen und ihre Bedeutung,
21–44. Berlin: Arno Spitz.
Müller, Cornelia 1998b. Redebegleitende Gesten: Kulturgeschichte, Theorie, Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in conversation. In: Monica Rector, Isabella Poggi and Trigo Nadine (eds.), Gestures: Meaning
and Use, 259–265. Porto: Universidade Fernando Pessoa.
Müller, Cornelia 2004. Forms and uses of the palm up open hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
234–256. Berlin: Weidler.
Müller, Cornelia 2005. Gestures in human and nonhuman primates: Why we need a comparative
view. Gesture 5(1–2): 259–283.
Müller, Cornelia 2008a. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View. Chi-
cago: Chicago University Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 249–275. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and Language. In: Kirsten Malmkjaer (ed.), Routledge’s Linguis-
tics Encyclopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Martin Vöhler and Christiane
Voss (eds.), Die Mimesis und ihre Künste, 149–187. Paderborn: Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia submitted. How gestures mean – The construction of meaning in gestures with speech.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia, Jana Bressem, Silva H. Ladewig and Susanne Tag 2009. Introduction to special
session “Towards a grammar of gesture”. Paper presented at the Gesture and Speech in Interaction (GESPIN) conference at Adam Mickiewicz University, Poznań, Poland.
Müller, Cornelia and Alan Cienki 2009. Words, gestures, and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Müller, Cornelia, Ellen Fricke, Silva H. Ladewig, Irene Mittelberg and Sedinha Teßendorf in
preparation. Gestural Modes of Representation – Revisited.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem volume 2. Methods of linguistic gesture ana-
lysis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Alan Cienki, Silva H. Ladewig, David
McNeill and Jana Bressem (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.2.) Berlin: De Gruyter Mouton.
Müller, Cornelia and Susanne Tag 2010. The dynamics of metaphor: Foregrounding and activating
metaphoricity in conversational interaction. Cognitive Semiotics 6: 85–120.
Peirce, Charles S. 1960. Collected Papers of Charles Sanders Peirce (1931–1958). Vol. I: Principles
of Philosophy, Vol. II: Elements of Logic. Cambridge, MA: Belknap Press of Harvard Univer-
sity Press.
Radden, Günter, Klaus-Michael Köpcke, Thomas Berg and Peter Siemund 2007. Introduction: The
construction of meaning in language. In: Günter Radden, Klaus-Michael Köpcke, Thomas Berg
and Peter Siemund (eds.), Aspects of Meaning Construction, 1–15. Amsterdam: John Benjamins.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting speech and gesture. Ph.D. dissertation, Uni-
versity of Nijmegen, Nijmegen, the Netherlands.
Slama-Cazacu, Tatiana 1976. Nonverbal components in message sequence: “Mixed syntax”. In:
William Charles McCormack and Stephen A. Wurm (eds.), Language and Man: Anthropolog-
ical Issues, 217–227. The Hague: Mouton.
Slobin, Dan 1987. From thought and language to thinking for speaking. In: John J. Gumperz and
Stephen C. Levinson (eds.), Rethinking Linguistic Relativity, 70–96. Cambridge: Cambridge
University Press.
Sowa, Timo 2005. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Ph.D.
dissertation. Berlin: Akademische Verlagsgesellschaft.
Stokoe, William C. 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Streeck, Jürgen 1994. ‘Speech handling’: The metaphorical representation of speech in gesture. A
cross-cultural study. Unpublished manuscript.
Streeck, Jürgen 2008. Depicting by gesture. Gesture 8(3): 285–301.
Streeck, Jürgen 2009. Gesturecraft: Manu-facturing Understanding. Amsterdam: John Benjamins.
Streeck, Jürgen this volume. Praxeology of gesture. In: Cornelia Müller, Alan Cienki, Ellen
Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Com-
munication: An International Handbook on Multimodality in Human Interaction.
(Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Tag, Susanne and Cornelia Müller 2010. Combining gestures: Mimetic and non-mimetic use of
gesture space. Paper presented at the 4th conference of the International Society for Gesture
Studies on 29.07.2010, Frankfurt (Oder).
Taub, Sarah F. 2001. Language from the Body: Iconicity and Metaphor in American Sign Lan-
guage. Cambridge: Cambridge University Press.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de
Barrer”. Unpublished MA thesis, Freie Universität Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript.
Trier, Jost 1973. Aufsätze und Vorträge zur Wortfeldtheorie. The Hague: Mouton.
Wilcox, Sherman 2002. The iconic mapping of space and time in signed languages. In: Liliana
Albertazzi (ed.), Unfolding Perceptual Continua, 255–281. Amsterdam: John Benjamins.
Wilcox, Sherman 2007. Routes from gesture to language. In: Elena Pizzuto, Paola Pietrandrea and Raffaele Simone (eds.), Verbal and Signed Languages: Comparing Structures, Constructs
and Methodologies, 107–131. Berlin: Walter de Gruyter.
Wundt, Wilhelm 1921. Völkerpsychologie: Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Erster Band. Die Sprache. Leipzig, Germany: Engelmann.
46. Towards a unified grammar of gesture and speech: A multimodal approach

Abstract
This chapter argues for a multimodal approach to grammar (Bressem 2012; Fricke 2008,
2012; Harrison 2008, 2009; Ladewig 2012) and offers a sketch of the theoretical founda-
tions according to Fricke (2012). Two main research traditions in linguistics are consid-
ered: generative grammar and linguistic structuralism and functionalism. The enterprise
of a multimodal grammar is substantiated by the analyses of typification and semantiza-
tion of gestures as potential syntactic constituents, by giving the rules of a generative
phrase structure grammar of co-speech gestures which displays recursion and self-
embedding, and by the grammatical analysis of multimodal attribution in German
noun phrases. If we conceive of multimodality as a global dimension of linguistic and
semiotic analysis which is generally applicable to language and other systems of signs,
then we have to broaden our perspective by also including grammars of single languages
and the human language faculty.
1. Introduction
Until recently, the idea that a multimodal approach to grammar is necessary was by no
means evident. Most grammarians have so far focused their grammatical analyses on written and spoken language without considering co-speech gestures. Yet the progress in
gesture studies offers a new perspective on the grammatical capacity of gestures accom-
panying speech (Bressem 2012; Fricke 2008, 2012; Harrison 2008, 2009; Ladewig 2012).
Human speech is not only composed of articulations of the mouth, primarily perceived
by ear, but also of visible articulations of other body parts affecting the eye (e.g., Ken-
don 2004; McNeill 1992, 2005). In this regard, the movements of the hands play a special
role: the sign languages of the deaf show that movements of the hands alone can func-
tion as articulators of fully established languages (e.g., Wundt [1900] 1904, 1973). If it is
the case that movements of the hand inherently have the potential for establishing a
grammar, what are the grammatical implications of all those hand movements that
accompany the speech of hearing people? Are single languages like German or English
partially multimodal? How far is the faculty of language (Hauser, Chomsky, and Fitch
2002) bound to a particular mode of manifestation?
These are basic questions that link the enterprise of a multimodal approach to gram-
mar to a long intermittent linguistic tradition of research affiliated above all with the
names Wilhelm Wundt, Karl Bühler, Louis Hjelmslev, and Kenneth Pike. Karl Bühler
([1934] 2011) justifies a necessary combination of gestural pointing and verbal deictics
with the argument that only in this way are speakers able to successfully refer to entities
in certain utterance situations. Wundt ([1900] 1904, 1973), using the example of sign lan-
guages, demonstrates for the first time that the faculty of language is manifest in the
visual-gestural mode. Hjelmslev ([1943] 1969), as well as Pike (1967), argues for a per-
spective of mode neutrality with respect to utterances as a whole because gestures can
potentially instantiate structures and functions of language and speech. Therefore, ac-
cording to this view, defining linguistic categories has to occur independently of any lin-
guistic “substance” (Hjelmslev 1969). In this research tradition, two basic presumptions
coexist in an unconnected way: first, the presumption that speech can manifest itself in
various media and second, the presumption that subcodes, which may differ medially
from the vocal language, can be integrated into a vocal matrix code. At first glance,
both principles, manifestation and integration, seem to be incompatible. However, a
deeper, more systematic approach facilitates a productive conceptual perspective:
The phenomenon of the multimodality of language is to be tackled on different levels
and from different linguistic perspectives. With respect to generative as well as struc-
tural-functional conceptions of grammar and language, the basic argumentation in
the following sections pursues the goal of substantiating that co-speech gestures belong
at least partially to the subject area of grammar.
One might object that co-speech gestures are not linguistic units with stable form-meaning relations which might be capable of entering into syntactic constituent structures as constituents. How can we address this objec-
tion? The claim that movements of the hand are categorically not capable of instantiat-
ing linguistic structure and functions and, in a narrower sense, not capable of building
morphemes is easily invalidated. Consider the sign languages of the deaf, which are
fully developed linguistic systems possessing both a manual syntax and a manual lex-
icon. Gestures of the hearing may also be meaningful, comparable to morphemes in
vocal utterances. These gestures with stable form-meaning relations are the so-called
“emblematic gestures” (Efron [1941] 1972; Ekman and Friesen 1969). In section 4
we introduce the concept of kinesthemes as submorphemic units, which allows for
modeling semiotic processes of typification and semantization and thereby provides
terminal constituents for gestural constituent structures in section 5 (see also vol-
ume 2). This concept supports the assumption of a “rudimentary morphology” (Müller
2004: 3) and substantiates the category of “recurrent gestures”, located between
idiosyncratic and emblematic gestures in Kendon’s continuum (e.g., Ladewig 2010,
2011, 2012; Müller 2008, 2010, submitted; Fricke, Bressem, and Müller volume 2).
With regard to co-speech gestures, gesture scholars have so far neglected grammar
and the syntactic dimension of analysis, and linguists have largely considered gesture
as “non-verbal” and as a phenomenon of language use only, excluding it from their
subject. The crucial questions of the following sections are:
Proving the possibility of typification for gestures is the prerequisite for the assumption
of syntactic constituents that enter syntactic constituent structures. Proving the possibil-
ity of their semantization is the prerequisite for assigning the syntactic relation of mod-
ification in multimodal attribution in verbal noun phrases. On the basis of Eisenberg’s
grammar (1999), we show that co-speech gestures can fulfill syntactic as well as semantic
attributive functions in German as a single language. This implies that they must be seen
as part of the subject area of German grammar. With regard to the faculty of language,
co-speech gestures can be assigned syntactic constituent structures that are recursive. Ac-
cording to the hypothesis that recursion is the defining criterion for the language faculty
in the narrow sense (Hauser, Chomsky, and Fitch 2002), recursive co-speech gestures then
have to be considered as an integral part of human language (see volume 2). Conse-
quently, analyzing the grammar of single vocal languages as well as modeling the
human language faculty require a multimodal approach (section 7).
On the one hand, one and the same code can manifest itself partially in two different “modalities”, for example, in single vocal and single sign
languages. On the other hand, gestures can be integrated structurally and/or function-
ally into the more dominant vocal matrix code of a single language. Therefore, we deal
first with processes of code manifestation and second with processes of code integra-
tion. At first glance, analyses of language-image relations deal primarily with the
principle of code integration while gesture-speech relations may be subject to both
principles.
(Figure: Multimodality in terms of code manifestation and code integration – the language faculty and the single language; sign language (visual), written language (visual), and spoken language (primarily auditory), with a simultaneous verbal (auditory) and gestural (visual) level that can be structurally and/or functionally integrated.)
When, in a further step, the code-related media concept and the biological media con-
cept are combined, different media within the linguistic sphere can be differentiated:
sign language, written language, and spoken language. A single sign language like
American Sign Language (ASL) or German Sign Language (DGS) on this level is
merely monomedial and monomodal because only one sensory modality is affected:
the visual. Barring hypertextual applications on the internet, English or German as a
written language is primarily visual and therefore primarily monomedial and mono-
modal as well. Visual and auditory material integrated into a text presupposes the written
language as code, but the reverse is not true. Written communication functions independently of other potentially integrable codes, which also seems to be the case with spoken language. When on the telephone, for example, people are restricted to the auditory modality and can nevertheless communicate without the visual modality. In such
communication situations, spoken language is monomedial and monomodal as well. Vis-
ible gestures accompanying speech presuppose an audible, vocal language, not the other
way around. Is it therefore sufficient to likewise classify the vocal language as monome-
dial and monomodal? There are strong arguments for not doing so. When comparing
communication by phone with face-to-face communication, it becomes clear that the
latter is the ontogenetically and phylogenetically primary form of communication.
The telephone is a rather young technological innovation, and its handling is learned
relatively late in a child’s development. With respect to a biological media concept,
face-to-face communication is primarily audiovisual and, according to Lyons (1977:
637–638), the “canonical situation of utterance”, serving as point of origin for all
other communication situations with its specific contextual constraints.
Although the term “spoken language” suggests that only the auditory sensory mo-
dality is affected, the visual sensory modality plays a role via gestures and other body
movements in face-to-face communication. Consequently, two media (biological media
concept) are involved, which fulfill the necessary condition both of multimodality
and of multimediality. Is spoken language then to be classified as multimedial or as mul-
timodal? What could be a reasonable differentiation between the two notions with
respect to our research intentions of developing a multimodal approach to grammar?
When considering characterizations of co-speech gestures (e.g., Kendon 2004; McNeill
1992, 2005), they have to be seen as body movements observed when someone is
speaking. Concerning their timing, they are closely related to the uttered speech with
which they also share semantic and pragmatic functions.
This points to a workable criterion for differentiating multimodality and multimedi-
ality, that is to say, the structural and functional integration into one and the same
matrix code or, alternatively, the manifestation of one code in two different media. If
two linguistic media are structurally and/or functionally integrated into the same
code at the same time, or if, conversely, one code manifests itself simultaneously in
two different media, then we can speak of multimodality. If two or more media are nei-
ther structurally nor functionally integrated into one code and if there is no manifesta-
tion of the same code in different media, then the phenomenon is defined as
multimedial. This differentiation underlies our definitions of linguistic multimediality
as well as linguistic multimodality in the broad and narrow sense. The following table (Tab. 46.1) summarizes the individual terms and their defining features.
What all of the terms listed in Tab. 46.1 share is that multimodality and multimediality
are only present when more than one medium is involved. This is the defining charac-
teristic as opposed to monomodality or monomediality. Multimediality can be distin-
guished from multimodality when there is no structural and/or functional integration
of the same primary code or the same code is not manifested in different media.
Within multimodality we distinguish between a broad and a narrow sense. Multimodal-
ity in the narrow sense occurs when the media involved in an expression belong to dif-
ferent sense modalities in terms of a biological media concept and are structurally and/
or functionally integrated in the same code or, alternatively, manifest the same code in
terms of a code-based media concept. Multimodality in the broad sense differs from
the narrow sense in that the media involved belong to different codes in terms of a
code-based media concept and the same sense modality in terms of a biological
media concept. For the concepts of code integration and code manifestation, there
are precedents in the speech theories of Pike (code integration) and Hjelmslev
(code manifestation). In our current analysis, we are fundamentally open to both
forms of multimodality.
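The defining features just described can be restated as a small decision procedure. The following sketch is only an illustration of these definitions under our own simplifying assumptions; the function and parameter names are invented for this purpose.

```python
def classify(media_count: int,
             different_sense_modalities: bool,
             different_codes: bool,
             same_code_integrated_or_manifested: bool) -> str:
    """Toy restatement of the distinctions described above (not the authors' table).

    media_count: number of media involved (biological media concept)
    different_sense_modalities: do the media address different senses (ear vs. eye)?
    different_codes: do the media belong to different codes (code-based media concept)?
    same_code_integrated_or_manifested: are the media structurally and/or functionally
        integrated into one code, or does one code manifest itself in both media?
    """
    if media_count < 2:
        return "monomodal/monomedial"
    if not same_code_integrated_or_manifested:
        return "multimedial"
    if different_sense_modalities:
        return "multimodal (narrow sense)"
    if different_codes:
        return "multimodal (broad sense)"
    return "multimedial"

# Face-to-face speech with co-speech gestures: two media, two sense modalities,
# gestures integrated into the vocal matrix code -> multimodal in the narrow sense.
print(classify(2, True, False, True))
```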
language. The delimitable, meaningful segments resulting from such processes are
called “phonesthemes” or “sub-morphemic” units (Bolinger [1968] 1975; Firth [1935]
1957; Zelinsky-Wibbelt 1983). They are defined as intersubjective sound/meaning cor-
relations based on diagrammatic iconicity according to Charles S. Peirce (1931–58).
Bolinger (1975) characterizes them as words clustering in groups, for example, the
words ending in -ump: bump, chump, clump, crump, flump, glump, grump, hump. Se-
mantically, most of them suggest “heaviness”. Bolinger’s crucial observation is that of
an “underlying iconic drive to make sound conform to sense” (Bolinger 1975: 218).
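As a toy illustration of this kind of form–meaning clustering, the short sketch below groups Bolinger's -ump words by their shared ending; the grouping criterion and the gloss are simplifications for illustration only.

```python
from collections import defaultdict

words = ["bump", "chump", "clump", "crump", "flump", "glump", "grump", "hump"]

# Group word forms by their final segment; a shared ending correlating with a
# shared semantic gloss ("heaviness") is the kind of correlation the phonestheme
# concept captures.
by_ending = defaultdict(list)
for w in words:
    by_ending[w[-3:]].append(w)

print(dict(by_ending))  # {'ump': ['bump', 'chump', 'clump', ...]}
```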
The integration of concepts like that of the phonestheme into grammars of the
spoken language is hindered in particular by the sharp separation between language use
and language system. This separation is valid for structural linguistics in the tradition of
Saussure as well as for generative linguistics in the Chomskyan tradition. In his book
System und Performanz (Stetter 2005), Christian Stetter makes an interesting proposal
to bridge the gap between language use and language system. In his approach, based on
Nelson Goodman’s work, linguistic types (language system) are understood as sets of
tokens (language use) which – rather than being identical – are only similar to each
other and do not share a common original as basis. This concept allows for intermediate
stages of conventionalization like phonesthemes – and kinesthemes.
Fricke defines kinesthemes analogously to phonesthemes as gestural tokens with in-
tersubjective semantic loading based on diagrammatic iconicity (Fricke 2008, 2010,
2012). Similarities of form correlate with similarities of meaning. Analyses of empirical
examples from route descriptions at Potsdam Square in Berlin show that kinesthemes
can be simple or complex. Complex kinesthemes can be compared to processes of mor-
phological contamination or blending in word formation of spoken languages (e.g.,
smog is a blend of smoke and fog, cf. Zelinsky-Wibbelt 1983) (Fricke 2008, 2012). The
following pointing gestures in figures 46.2 and 46.3 illustrate an analogous process of ges-
ture formation:
Fig. 46.2: Two types of pointing gestures in German: G-Form and PLOH (Fricke 2012: 110)
In German, we can observe two typified forms of pointing gestures: firstly, the so-called G-Form with an extended index finger and the palm oriented downwards, and secondly, the palm-lateral-open-hand gesture (PLOH) (Fricke 2007, 2008; see Kendon and Ver-
sante 2003 for Italian gestures). The G-form is semantically loaded with a meaning
which can be paraphrased as “pointing to an object”, whereas the meaning of the
palm-lateral-open-hand gesture is directive (“pointing in a direction”). Fig. 46.3 shows
an example of a gestural contamination that blends both types. It can be paraphrased
as “pointing to an object in a particular direction”.
This example shows that processes of formal typification and semantic loading on the
verbal and gestural level are both guided by the same principles. Phonesthemes and kin-
esthemes manifest the same general semiotic code of sign formation and complement
other types of meaning construction, for example, metonymies and metaphors (Cienki
2008; Cienki and Müller 2008; Mittelberg 2006, 2008; Müller 2008, 2010).
Whereas elements sequentialized through iteration are completely independent of one another, the same
is not true for recursive sequentialization: “Iteration involves repetition of an action or
object, where each repetition is entirely independent of those that come before and
after. Recursion involves the embedding of an action or object inside another of the
same type, each embedding being dependent in some way on the one it is embedded
inside” (Kinsella 2010: 180).
Let us look at the following examples of iteration (1) and recursion (2) (Kinsella
2010: 181):
(1) Iteration: Jack ate [NP1 the sandwiches NP1] and [NP2 the doughnut NP2] and
[NP3 the apple NP3].
(2) Recursion: [NP1 [NP2 [NP3 John’s NP3] mother’s NP2] neighbor NP1] bought the car.
At first glance, both examples appear on the surface to be chains of noun phrases. Look-
ing closer at the structure of each sentence, however, it becomes apparent that example
(1) has a flat structure in which the noun phrases are independent of each other. In
example (2), on the other hand, there exists a dependence between the noun phrases
that determines the relation of modification (Kinsella 2010: 181). This is also the reason
why in example (1) the order of the noun phrases could, in general, be changed whereas
in example (2) this is not the case (Kinsella 2010).
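The structural difference between the two examples can be made concrete in a small sketch: iteration yields a flat list whose elements can be reordered freely, whereas recursion yields a nested structure in which each layer is embedded in, and depends on, another element of the same type. The encoding below is our own illustration, not an analysis tool.

```python
# Iteration (ex. 1): a flat chain of independent NPs -- their order can be changed.
iterated_nps = ["the sandwiches", "the doughnut", "the apple"]

# Recursion (ex. 2): each NP is embedded inside another NP of the same type,
# so the layers depend on one another and cannot be reordered.
recursive_np = ("NP", "neighbor",
                ("NP", "mother's",
                 ("NP", "John's", None)))

def depth(np) -> int:
    """Depth of embedding: 1 for an NP with no embedded NP, one more per embedding."""
    if np is None:
        return 0
    _, _, embedded = np
    return 1 + depth(embedded)

print(depth(recursive_np))  # 3 levels of self-embedding
print(len(iterated_nps))    # 3 independent NPs on the same level
```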
Lobina (2011: 155) and Fitch (2010: 78) advise that it is necessary to maintain a strict
separation between recursive structures and fundamental recursive algorithms by which iterative structures, such as in example (1) above, can be created:
[…] many studies focus on the so-called self-embedded sentences (sentences inside other
sentences, such as I know that I know etc.) as a way to demonstrate the non-finiteness of
language, and given that self-embedding is sometimes used as a synonym for recursive struc-
tures (see infra), too close a connection is usually drawn between the presence of these
syntactic facts and the underlying algorithm of the language faculty. (Lobina 2011: 155–154)
Lobina’s distinction between structure and fundamental process also allows a different
perspective on Everett’s objections to the article by Hauser, Chomsky, and Fitch (2002)
in that the absence of self-embedding as a structure cannot be an argument against ac-
cepting recursion as a fundamental algorithm that, as a key element of the faculty of
language, might be present in all natural languages. He argues:
However, even if there were a language that did not exhibit self-embedding but allowed
for conjunction, you could run the same sort of argument and the non-finiteness conclusion
would still be licensed. These two aspects must be kept separate; one focuses on the sort of
expressions that languages manifest (or not), while the other is a point about the algorithm
that generates all natural language structures. (Lobina 2011: 156)
With regard to the analysis of speech-accompanying gestures, Fricke (2012) shows that
they alone – without reference to vocal utterances – can essentially form arbitrarily long
“flat” chains: On the one hand, gestures share the structural characteristics of iteration
that can be created through a recursive algorithm. On the other hand, however, ges-
tures also share the structural properties of a “deeper” self-embedding in that gestural
constituents can contain other gestural constituents of the same type. Based on
empirical analyses, Fricke proposes (2012: 176) the following phrase structure rules for
co-speech gestures (for more details see volume 2):
GU → { GP Retr
        GU (GU1 … GUn) Retr
        GP (GP1 … GPn) Retr
        GU (GU1 … GUn) GP (GP1 … GPn) (GUn+1 … GUz) Retr
        GP (GP1 … GPn) GU (GU1 … GUn) (GPn+1 … GPz) Retr }
GP → (Prep) SP
SP → S (S1 … Sn)
S → (Hold) s (Hold)
The starting point for this system of rules is the gesture unit. A primary gesture unit is
the highest unit of the constituent hierarchy. This fact is reflected by using the category
GU as the starting symbol, comparable to the category sentence (S) in generative gram-
mars of vocal languages. The property of self-embedding is indicated when to the left
and right of the arrow the same category symbol is present. There is, to date, no empir-
ical evidence for levels of embedding deeper than one (primary and secondary gesture
units). The braces show that the vertical listing of alternative symbol chains could each
serve as a “replacement” for gesture units. The individual symbols and the constituents
they represent can either be obligatory or optional. If they are optional, this is shown by
using parentheses.
According to Kendon (1972, 2004), gesture units (GU) are limited by positions of
relaxation and – in contrast to gesture phrases (GP) – obligatorily contain a phase of
retraction (Fricke 2012). A primary gesture unit is the highest constituent in the ges-
tural constituent structure, whereas secondary gesture units are dominated by a primary
gesture unit (Fricke 2012). Gesture units can be simple (GP + Retr) or complex. In prin-
ciple, complex gesture units consist of an arbitrary number of gesture units and/or ges-
ture phrases. Analyses of selected video sequences show that the embedding of
secondary gesture units is indicated by the degree of relaxation and the location of
the respective rest position. Primary gesture units show complete relaxation, whereas
in secondary gesture units the relaxation is only partial. The hierarchy level is indicated
by “gestural cohesion” (McNeill 2005): All coordinated gesture units show the same
degree of relaxation in their retraction and the same location of the rest position (Fricke
2012). Stroke phrases (SP), too, can be either simple or complex. They expand to one or
more strokes (S) that are ordered next to each other. Strokes (S) then expand to an
obligatory stroke nucleus (Kendon 2004: 112) which can be preceded or followed by
an optional hold. Whether it is a so-called pre- or post-stroke hold is not categorically
determined, but rather by its position in the constituent structure. The terminal constit-
uents are the gesture phases (e.g., Bressem and Ladewig 2011; Kendon 1980, 2004;
Kita, Van Gijn, and Van der Hulst 1998) stroke nucleus (s), hold (Hold), preparation
(Prep), and retraction (Retr).
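To make the rule system above concrete, the following sketch encodes a simplified version of it as a Python data structure and derives one minimal expansion. The encoding conventions ('?' for optional items, '*' for optional repetitions) and all names are our own simplifications, not part of the original rule format.

```python
# Simplified encoding of the phrase structure rules for co-speech gestures given above.
# Optional constituents are marked '?', optional repetitions '*'.
RULES = {
    "GU": [["GP", "Retr"],                                  # simple gesture unit
           ["GU", "GU*", "Retr"],                           # embedded gesture units
           ["GP", "GP*", "Retr"],                           # several gesture phrases
           ["GU", "GU*", "GP", "GP*", "GU*", "Retr"],
           ["GP", "GP*", "GU", "GU*", "GP*", "Retr"]],
    "GP": [["Prep?", "SP"]],
    "SP": [["S", "S*"]],
    "S":  [["Hold?", "s", "Hold?"]],
}

def expand(symbol: str, choice: int = 0) -> list:
    """Expand a non-terminal using one chosen alternative, dropping optional and
    repeatable material for brevity; terminals (s, Hold, Prep, Retr) are kept as-is."""
    if symbol not in RULES:
        return [symbol]
    out = []
    for item in RULES[symbol][choice]:
        if item.endswith("?") or item.endswith("*"):
            continue  # skip optional/repeatable items in this minimal expansion
        out.extend(expand(item))
    return out

# A simple gesture unit: a gesture phrase reducing to a stroke nucleus, plus retraction.
print(expand("GU"))  # ['s', 'Retr']
```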
What conclusions can we draw from this for language theory? If we consider the cur-
rent debate about recursion and language complexity started by Hauser, Chomsky, and
Fitch (2002), then the fact that co-speech gestures are recursive has the following
consequences:
If recursion is specific to the faculty of language in the narrow sense (FLN), then the
recursivity of co-speech gestures forces us to consider them an integral element of
language. That Hauser, Chomsky, and Fitch do not view the human language faculty as
fundamentally modality-specific, but rather regard the possibility of a change in modality
as a determining feature, can be seen in the following quote: “[…] only humans can lose one modality (e.g.,
hearing) and make up for this deficit by communicating with complete competence in a
different modality (e.g., signing)” (Hauser, Chomsky, and Fitch 2002: 1575). With this,
the authors find themselves not far from Hjelmslev’s postulate that the substances do
not in and of themselves define language and that one and the same form can be mani-
fest in different substances (Hjelmslev 1969). From Hauser, Chomsky, and Fitch's
acceptance of a compensatory function of gestures it is just a small step to accepting a
fundamentally multimodal constitution of language. If, however, the multimodality of
language is denied, then the recursivity of co-speech gestures entails that recursion
cannot be unique to the faculty of language in the narrow sense.
On the syntactic level, the adjectival attribute circular is an expansion of the nuclear
noun and a constituent of the respective noun phrase the circular table. On the semantic
level, attributes are modifications of the noun, which is the nucleus of the noun phrase
(Eisenberg 1999). In this case, modification can be understood most simply as the
intersection of the semantic extension of the adjective circular (all circular entities)
with the semantic extension of the noun table (all tables). The resulting set
is a set of tables with the characteristic “being circular”.
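To make the set-theoretic reading of modification concrete, here is a minimal Python sketch; the entity names are invented for illustration and do not come from the handbook:

    # Attributive modification modeled as intersection of extensions.
    circular_entities = {"round_table", "coin", "clock_face"}   # extension of "circular"
    tables = {"round_table", "desk", "kitchen_table"}           # extension of "table"

    # "the circular table": the adjective restricts the noun's extension.
    circular_tables = circular_entities & tables
    print(circular_tables)   # {'round_table'}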
Now consider the following examples, which are designed on the basis of an empirical
route description. They address the quality of shape of a given office tower, the so-called
“Sony Center” in Berlin, which is built in the shape of a semicircle. The speaker localizes
this tower, which resembles a bisected cylinder, on the right side of her gesture space.
Her mode of modeling evokes the impression of a vertical image with a sense of depth.
In each of the following four examples in figure 46.4 we deal with noun phrases ini-
tiated by a definite article and the noun tower as its nucleus. In (5) and (6) the noun
phrase on its verbal level is expanded with the attribute semicircular, which modifies
the nuclear noun semantically, whereas in (4) and (7) there are no attributive expan-
sions on the verbal level. The verbal utterances in the examples (6) and (7) are also ac-
companied by a speaker’s gesture of modeling a semicircular shape.
Fig. 46.4: The gesture modeling a semicircular shape in the examples (4) to (7)
What is the difference between examples (5) and (7)? Both utterances inform the
addressee about the shape of the object referred to. The difference is merely that in
example (5) the speaker refers to the shape of the object exclusively verbally, whereas
in example (7) this happens solely gesturally. This shows that the
attributive function of modifying the nuclear noun in a noun phrase can also be instan-
tiated solely by gesture. The resulting intersection of semantic extensions is the same
in both cases: A set of towers with the characteristic “being semicircular”. So certain
occurrences of co-speech gestures fall within the scope of Eisenberg’s concept of an
attribute mentioned above.
The German speaker on the left describes the façade of the Berlin State Library: She
uses the noun phrase sone gelb-goldenen Tafeln ‘such yellow golden tiles’ accompanied
by a gesture modeling a rectangular shape. On the verbal level, the adjective gelb-
golden expands the nuclear noun, modifying it at the same time by reducing its
extension to tiles with a specific characteristic of color. On the gestural level, the rect-
angular shape performed by the hands of the speaker fulfills an analogous function of
modifying the nuclear noun. The resulting intersection of both extensions is a set of tiles
with a specific characteristic of color (yellow golden) and a specific characteristic of
shape (rectangular). This division of labor is a very frequent pattern in multimodal
noun phrases: due to the particular medial capacity of both modes, speakers tend to
use gestures for referring to aspects of shape, whereas the use of verbal adjectives pro-
vides information with respect to color (Fricke 2008, 2012). The noun phrase of this
example shows a temporal overlap between the verbal adjective and the modifying
co-speech gesture.
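The division of labor just described can be added to the intersection sketch above: the verbal adjective and the modeling gesture each contribute one extension that restricts the nuclear noun. Again, the item names are purely illustrative:

    # Multimodal noun phrase: verbal adjective (color) and gesture (shape)
    # jointly restrict the extension of the nuclear noun "tiles".
    tiles = {"t1", "t2", "t3", "t4"}
    yellow_golden = {"t1", "t2"}    # contributed by the verbal adjective
    rectangular = {"t2", "t3"}      # contributed by the modeling gesture

    referents = tiles & yellow_golden & rectangular
    print(referents)   # {'t2'}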
The crucial question is whether or not co-speech gestures are capable of instantiating
independent syntactic constituents detached from the nuclear noun in a noun phrase.
Co-speech gestures can only be expected to adopt an attributive function within verbal
noun phrases on the syntactic level if this requirement is met. Research into this ques-
tion has so far not gone beyond the assumption that co-speech gestures can fill syntactic
gaps in linear verbal constituent structures. Considering temporal overlaps as given in
this example, the following alternative explanation with respect to the relation between
the rectangular gesture and the nuclear noun tiles could be offered: The rectangular
shape metonymically stands for the respective concept TILE which is associated with
the word form tiles (e.g., Lakoff and Johnson 1980; Mittelberg 2006; Mittelberg and
Waugh 2009). This explanation would also be in line with the assumption of a so-called
“lexical affiliate” according to Schegloff (1984).
It is worth observing at this point that in colloquial German the article son ‘such a’
provides a syntactic integration of modifying gestures within verbal noun phrases as re-
quired above. According to Hole and Klumpp (2000), the qualitative deictic son is a
fully grammaticalized article inflecting for case, gender and number, which is simulta-
neously used for definite type reference and indefinite token reference. They give con-
vincing evidence that in German the article son is not just an optional contraction of so
‘such’ and ein ‘a’. They emphasize that “son does not just narrow down the meaning of
the indefinite article, it introduces a whole new dimension, namely, that of a necessary
two-dimensional reference classification” (Hole and Klumpp 2000: 240). Exactly these
two dimensions of reference apply also to multimodal noun phrases with son as article.
In the following examples the speaker informs the addressee about the shape of the
table he wants to buy within the next few days. The underlying pattern is “I want to
buy such a [quality] table”. In contrast to noun phrases with definite articles, the speaker
refers in this case to an indefinite token of a definite type (Hole and Klumpp 2000: 234).
With respect to example (10) this means that the speaker wants to buy a specific kind of
table (definite type) that only looks like the table he is pointing at (indefinite token).
As we have seen, the German article son shows a very high degree of multimodal
integrability. As a fully grammaticalized article, it is governed by the nuclear noun (first
step); as a qualitative deictic, son cataphorically requires the description of a quality,
which can be instantiated either verbally or gesturally (second step). Both steps will
be explicated and complemented by a third step in the following section with respect
to Seiler’s continuum of determination within complex noun phrases.
(i) Specification (determination of reference): “The range of head nouns for which a
determiner D is potentially applicable increases with the potential distance of
that determiner from the head noun N”. (Seiler 1978: 308)
(ii) Characterization (determination of concept): “Determiners indicate properties im-
plied in the concept represented by the head noun. The degree of naturalness of
such an implication of Dni vs. Dnj decreases proportionally to the distance of
Dni vs. Dnj with regard to the head noun”. (Seiler 1978: 310)
In the following illustration (Fig. 46.7), Seiler’s continuum with both its domains is high-
lighted by two grey-shaded rectangles with a bilaterally oriented arrow between them.
Moving from the left side towards the right, the determination of reference declines and
the determination of concept increases, while moving from right to left determination of
concept declines and determination of reference increases. On the gestural level, the
deictic gesture is attached to the domain of specification and the iconic gesture to the
domain of characterization.
Because son as an article is, according to Eisenberg, governed by the nuclear noun
with respect to its gender (Fig. 46.7), and because in specific contexts the existence of a
gesture, either deictic or iconic, is a precondition for using son, son within a noun
phrase instantiates an additional turning point, namely, between linguistic
monomodality and linguistic multimodality. Son is the syntactic integration point on
the level of the linguistic system for gestures accompanying speech in noun phrases.
Gestures structurally integrated to such an extent can also be integrated functionally
as attributes in verbal noun phrases. Thus, because son in the noun phrase requires a
qualitative description, which can be instantiated gesturally as well, iconic gestures in
noun phrases are shown to constitute autonomous syntactic units detached from the
nuclear noun. Furthermore, they can establish syntactic relations with the nuclear noun.
If the gestural qualitative determination takes place through an iconic gesture then
there follows a categorial selection of the gestural modes of representation (Müller
1998) by the article son (Fricke 2012): in noun phrases with the article son, iconic
gestures primarily occur in the modes of representation “the hand models” and “the
hand draws”, with rare occurrences in the mode “the hand acts”. The mode “the
hand represents”, by contrast, does not appear once in a corpus of instructions for
reaching a destination around Potsdam Square. The nuanced manual depiction of par-
ticular characteristics of objects might be hindered because the whole hand represents
an object in this mode.
Fig. 46.7: The article son as turning point and syntactic integration point for co-speech gestures in
noun phrases (Fricke 2012: 228)
The apparent categorial selection of iconic gestures by son basically distinguishes a
qualitative determination through iconic gestures generated during speaking from a
qualitative determination through extralinguistic objects serving as demonstrata of
deictic gestures. These objects are given in a concrete situation and are
interpreted by the speaker and the addressee according to a specific quality by means
of deictic guidance of attention. Taken together, we are therefore dealing with a syntactic
integration of gestures into noun phrases emerging in three consecutive steps (Fricke 2012:
230): The first step is constituted by the government of son by the nuclear noun of the
noun phrase (dotted arrow), the second step consists of the cataphoric integration of a
gestural qualitative determination required by son (solid arrow). In the case of an iconic
gesture providing the qualitative determination, the third step accomplishes a categorial
selection with respect to the four gestural modes of representation “the hand models”,
“the hand draws”, “the hand acts”, and “the hand represents” (dashed arrow).
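The three-step integration can also be paraphrased as a small validation procedure. The following Python sketch is a toy formalization: the class and field names are invented for illustration and are not Fricke's notation, and step 3 hardens the corpus tendency (no occurrences of “the hand represents”) into a rule for the sake of the example:

    from dataclasses import dataclass
    from typing import Optional

    MODES = {"models", "draws", "acts", "represents"}   # Müller's modes of representation

    @dataclass
    class IconicGesture:
        mode: str                        # one of MODES

    @dataclass
    class SonNounPhrase:                 # hypothetical: a noun phrase with the article "son"
        noun_gender: str
        article_gender: str
        verbal_quality: Optional[str]    # e.g. an adjective, or None
        gesture: Optional[IconicGesture]

    def integrates(np: SonNounPhrase) -> bool:
        # Step 1 (government): "son" agrees in gender with the nuclear noun.
        if np.article_gender != np.noun_gender:
            return False
        # Step 2 (cataphoric integration): "son" requires a qualitative
        # determination, supplied either verbally or gesturally.
        if np.verbal_quality is None and np.gesture is None:
            return False
        # Step 3 (categorial selection): iconic gestures filling the slot occur in
        # "models"/"draws" (rarely "acts"); "represents" is unattested in the corpus.
        if np.gesture is not None and np.gesture.mode == "represents":
            return False
        return True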
8. References
Bolinger, Dwight L. 1975. Aspects of Language. New York: Harcourt Brace Jovanovich. First
published [1968].
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, European University Viadrina, Frankfurt (Oder).
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184: 53–91.
Bühler, Karl 2011. Theory of Language. The Representational Function of Language. Amsterdam:
Benjamins. First published [1934].
Cienki, Alan 2008. Why study metaphor and gesture? In: Alan Cienki and Cornelia Müller (eds.),
Metaphor and Gesture, 5–24. Amsterdam: Benjamins.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: Benjamins.
Corballis, Michael C. 2007. The uniqueness of human recursive thinking. American Scientist 95:
240–248.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton. First published [1941].
Ehlich, Konrad 1987. so – Überlegungen zum Verhältnis sprachlicher Formen und sprachlichen
Handelns, allgemein und an einem widerspenstigen Beispiel. In: Inger Rosengren (ed.),
Sprache und Pragmatik. Lunder Symposium 1986, 279–313. Stockholm: Almqvist and Wiksell.
Eisenberg, Peter 1999. Grundriß der deutschen Grammatik. Volume 2: Der Satz. Stuttgart:
Metzler.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior. Categories, ori-
gins, usage, and coding. Semiotica 1: 49–98.
Everett, Daniel L. 2005. Cultural constraints on grammar and cognition in Pirahã. Another look at
the design features of human language. Current Anthropology 46: 621–646.
Firth, John Rupert 1957. The use and distribution of certain English sounds. In: John Rupert Firth,
Papers in Linguistics 1934–1951, 34–46. London: Oxford University Press. First published
[1935].
Fitch, W. Tecumseh 2010. Three meanings of “recursion”: Key distinctions for biolinguistics. In:
Richard K. Larson, Viviane Déprez and Hiroko Yamakido (eds.), The Evolution of Human
Language, 73–90. Cambridge: Cambridge University Press.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: De Gruyter.
Fricke, Ellen 2008. Grundlagen einer multimodalen Grammatik: syntaktische Strukturen und
Funktionen. Habilitation thesis, European University Viadrina, Frankfurt (Oder).
Fricke, Ellen 2010. Phonaestheme, Kinaestheme und multimodale Grammatik. Sprache und
Literatur 41: 69–88.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter.
Fricke, Ellen, Jana Bressem and Cornelia Müller in preparation. Gesture families and gestural
fields. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication. An International Handbook
on Multimodality in Human Interaction (HSK 38.2). Berlin and Boston: De Gruyter Mouton.
Goodman, Nelson 1976. Languages of Art. An Approach to a Theory of Symbols. London: Oxford
University Press. First published [1968].
Harrison, Simon 2008. The expression of negation through grammar and gesture. In: Jordan Zla-
tev, Mats Andrén, Marlene Johansson Falck and Carita Lundmark (eds.), Studies in Language
and Cognition, 405–409. Newcastle upon Tyne: Cambridge Scholars Publishing.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Bordeaux 3.
Hauser, Marc D., Noam Chomsky and W. Tecumseh Fitch 2002. The faculty of language: What is
it, who has it, and how did it evolve? Science 298: 1569–1579.
Hjelmslev, Louis 1969. Prolegomena to a Theory of Language. Madison: University of Wisconsin
Press. First published [1943].
Hole, Daniel and Gerson Klumpp 2000. Definite type and indefinite token: The article son in col-
loquial German. Linguistische Berichte 182: 231–244.
Karlsson, Fred 2010. Syntactic recursion and iteration. In: Harry van der Hulst (ed.), Recursion
and Human Language, 43–67. Berlin: De Gruyter Mouton.
Kendon, Adam 1972. Some relationships between body motion and speech. An analysis of an
example. In: Aron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
177–210. New York: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary R.
Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The Hague: Mouton.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan”. In: Sotaro Kita (ed.),
Pointing: Where Language, Culture, and Cognition Meet, 109–137. Mahwah, NJ: Erlbaum.
Kinsella, Anna 2010. Was recursion the key step in the evolution of the human language faculty? In:
Harry van der Hulst (ed.), Recursion and Human Language, 179–191. Berlin: De Gruyter Mouton.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and co-
speech gestures, and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Krämer, Sybille 1998. Das Medium als Spur und Apparat. In: Sybille Krämer (ed.), Medien, Com-
puter, Realität: Wirklichkeitsvorstellungen und Neue Medien, 73–94. Frankfurt am Main:
Suhrkamp.
Kress, Gunther 2011. What is mode? In: Carey Jewitt (ed.), The Routledge Handbook of Multimo-
dal Analysis, 54–67. London: Routledge. First published [2009].
Kress, Gunther and Theo van Leeuwen 2006. Reading Images. The Grammar of Visual Design. Lon-
don: Routledge. First published [1996].
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern. Varianten einer rekurrenten Geste.
Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, European University Viadrina, Frankfurt
(Oder).
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: Chicago University Press.
Lobina, David J. 2011. “A running back” and forth: A review of recursion and human language.
Biolinguistics 5(1–2): 151–169. http://www.biolinguistics.eu/index.php/biolinguistics/article/
view/198.
Lobina, David J. and José Eugenio García-Albea 2009. Recursion and cognitive science: Data
structures and mechanisms. In: Niels Taatgen and Hedderik van Rijn (eds.), Proceedings of the
31st Annual Conference of the Cognitive Science Society, 1347–1352. Austin, Texas: Cognitive
Science Society.
Loehr, Daniel P. 2004. Gesture and intonation. Ph.D. thesis, Georgetown University, Washing-
ton, DC.
Lyons, John 1977. Semantics. Volume 2. Cambridge: Cambridge University Press.
McClave, Evelyn Z. 1991. Intonation and gestures. Ph.D. thesis, Georgetown University, Washing-
ton, DC.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92: 350–371.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: Chicago
University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Merten, Klaus 1999. Einführung in die Kommunikationswissenschaft. Volume 1/1: Grundlagen der
Kommunikationswissenschaft. Münster, Germany: LIT.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University, Ithaca, NY.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 115–154. Amsterdam: Benjamins.
Mittelberg, Irene and Linda R. Waugh 2009. Metonymy first, metaphor second: A cognitive-
semiotic approach to multimodal figures of speech in co-speech gesture. In: Charles Forceville
and Eduardo Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Ges-
tures. Proceedings of the Berlin Conference April 1998, 233–256. Berlin: Weidler.
Müller, Cornelia 2008. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View.
Chicago: University of Chicago Press.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41/1: 37–68.
Müller, Cornelia submitted. How gestures mean. The construal of meaning in gestures with speech.
Peirce, Charles Sanders 1931–58. Collected Papers. Charles Hartshorne and Paul Weiss (eds.)
Volumes 1–6; Arthur W. Burks (ed.) Volumes 7–8. Cambridge, MA: Harvard University Press.
Pike, Kenneth L. 1967. Language in Relation to a Unified Theory of the Structure of Human
Behavior. The Hague: Mouton.
Pinker, Steven and Ray Jackendoff 2005. The faculty of language: What’s special about it? Cog-
nition 95(2): 201–236.
Posner, Roland 1986. Zur Systematik der Beschreibung verbaler und non-verbaler Kommunika-
tion. In: Hans-Georg Bosshardt (ed.), Perspektiven auf Sprache: Interdisziplinäre Beiträge
zum Gedenken an Hans Hörmann, 267–313. Berlin: De Gruyter.
Posner, Roland 2004. Basic tasks of cultural semiotics. In: Gloria Withalm and Josef Wallmanns-
berger (eds.), Signs of Power – Power of Signs. Essays in Honor of Jeff Bernard, 56–89. Vienna:
INST.
Ruesch, Jurgen and Weldon Kees 1972. Nonverbal Communication: Notes on the Visual Perception
of Human Relations. Berkeley: University of California. First published [1956].
Sauerland, Uli and Andreas Trotzke 2011. Biolinguistic perspectives on recursion: Introduction to
the special issue. Biolinguistics 5(1–2): 1–9. http://www.biolinguistics.eu/index.php/biolinguistics/
article/view/201/210.
Schegloff, Emanuel A. 1984. On some gestures’ relation to talk. In: J. Maxwell Atkinson and John
Heritage (eds.), Structures of Social Action: Studies in Conversational Analysis, 266–296. Cam-
bridge: Cambridge University Press.
Scherer, Klaus R. and Harald G. Wallbott 1984. Nonverbale Kommunikation. Forschungsberichte
zum Interaktionsverhalten. Weinheim: Beltz.
Seiler, Hansjakob 1978. Determination: A functional dimension for interlanguage comparison. In:
Hansjakob Seiler (ed.), Language Universals, 301–328. Tübingen: Narr.
Stetter, Christian 2005. System und Performanz. Symboltheoretische Grundlagen von Medienthe-
orie und Sprachwissenschaft. Weilerswist: Velbrück Wissenschaft.
Stöckl, Hartmut 2004. In between modes: Language and image in printed media. In: Eija Ventola,
Cassily Charles and Martin Kaltenbacher (eds.), Perspectives on Multimodality, 9–30. Amster-
dam: Benjamins.
Streeck, Jürgen 2002. Grammars, words, and embodied meanings: On the uses and evolution of so
and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2009. Gesturecraft. The Manu-facture of Meaning. Amsterdam: Benjamins.
Stukenbrock, Anja 2010. Überlegungen zu einem multimodalen Verständnis der gesprochenen
Sprache am Beispiel deiktischer Verwendungsweisen des Ausdrucks so. InLiSt – Interaction
and Linguistic Structures 47. http://www.inlist.uni-bayreuth.de/issues/47/InLiSt47.pdf.
Watzlawick, Paul, Janet Beavin Bavelas and Don D. Jackson 1967. Pragmatics of Human Commu-
nication: A Study of Interactional Patterns, Pathologies, and Paradoxes. New York: Norton.
Wundt, Wilhelm 1904. Völkerpsychologie. Eine Untersuchung der Entwicklungsgesetze von
Sprache, Mythus und Sitte. Volume 1: Die Sprache. Leipzig: Engelmann. First published [1900].
Wundt, Wilhelm 1973. The Language of Gestures. The Hague: Mouton. First published [1900].
Zelinsky-Wibbelt, Cornelia 1983. Die semantische Belastung von submorphematischen Einheiten
im Englischen: Eine empirisch-strukturelle Untersuchung. Frankfurt am Main: Peter Lang.
is how the specific material properties of each medium may determine the cross-modal
distribution of semantic features and pragmatic functions (Mittelberg 2006). Studying
bodily semiotics from the angle proposed here aims to shed light on the “ex-bodiment”
(Mittelberg 2008: 148; 2010b: 376) of mental imagery, internalized conceptual struc-
tures, action patterns, and felt qualities of experience, whereby the human body func-
tions, as in processes of embodiment, as the living medium through which such
bidirectional mechanisms of abstraction and concretization are shaped.
Exbodiment entails the motivated semiotic structure inherent to communicative ges-
tures made with the hands and arms, as well as to expressive postures and full body
movements. Action routines and embodied image schemata are assumed to drive the
body’s intuitive expressions, as well as more consciously produced descriptions and
(re-)enactments of observed, lived or imagined experience, including physical and
social forces. This view also accounts for gestural signs exhibiting dimensions that
point beyond perceivable bodily semiotics by metonymically alluding to imaginary
physical objects, virtual movement traces or spatial projections that appear to be con-
tiguous to the gesturing hand(s). Gestural manifestations of basic geometric patterns,
image-schematic structures and metaphorical understandings of abstract ideas or pro-
cesses may be observed to emerge when speakers seem to outline or manipulate virtual
physical objects, relations or structures, while talking about emotions, inner mental
states or abstract knowledge domains (e.g., Cienki and Müller 2008; Mittelberg 2010b;
Müller 2008; Núñez 2008). Gestures are thus a means to express, reify and show to inter-
locutors both imagined and sensed dimensions of mental imagery. They may lend a per-
ceptible gestalt to concepts, ideas and memories, if only for a moment and if only in the
form of furtively drawn, invisible lines or demarcated chunks of space. Primary meta-
phors (Grady 1997), in particular, have been shown to emerge in the gestural modality,
even if the accompanying speech is non-figurative (Mittelberg 2006, 2008). Hence, one
of the guiding questions is how cognitively entrenched patterns of experience – arisen
from visual perception, navigation through space, tactile exploration, and other practices
of bodily interaction with the sensorial and social world – may motivate gestural sign
formation and interpretation, as well as structure the (interactive) use of gesture space.
The semiotic perspective taken here further places a focus on gesture interpretation,
in which one’s own habitual movements and actions as well as one’s personal semiotic
history may guide the understanding of multimodal semiotic acts performed by others.
As we may express “felt qualities of our experience, understanding, and thought”
(Johnson 2005: 31) through our own gestures and body postures, observing others
doing so enables us to see and feel what they are trying to convey (Mittelberg
2010a). Johnson (2007: 162) further points out that understanding other people’s actions
involves mental simulation of physical actions: “This deep and pre-reflective level of
engagement with others reveals our most profound bodily understanding of other peo-
ple, and it shows our intercorporeal social connectedness”. The recent discovery of mir-
ror neurons indicates that the sensorimotor areas that are active in the brain when a
person performs a goal-directed action are triggered when the person observes someone
else perform the action (e.g., Rizzolatti and Craighero 2004). These observations are also
crucial with regard to gestures and their role in language understanding (Skipper et al.
2009) as well as in respect to the human capacity to assume and multimodally express
multiple viewpoints on a given experience (Sweetser 2012: 12–16).
A gesture is – at least in many cases – a gesture, because the hands do not manipulate
a physical object or structure, but rather pretend to do so. This letting go of the material
world in imitative gestures turns a transitive manual action with an object or a tool in
hand into a more abstract communicative hand movement, from which objects or tools
may still be inferred. Indeed, gestures may reflect the speakers’ embodied knowledge of
the material world and its affordances in various ways. Gesturing hands may also seek
and establish contact with the physical environment and with what Hutchins (1995)
calls “material anchors” in cross-modally orchestrated processes of manufacturing
meaning (Streeck 2009; see also Enfield 2009; Goodwin 2007; Lebaron and Streeck
2000; Williams 2008). Such grounded activities of multimodal cognition and communi-
cation also fall into the scope of the exbodied mind, understood as always being indexi-
cally anchored in the concreteness of the human body, its physical habitat, and the
intersubjective dynamics of communication (e.g., Zlatev et al. 2008).
In the present framework, these phenomena are described with the help of the funda-
mental semiotic modes similarity and contiguity, as well as their subtypes, which provide
means to capture both fine distinctions and transient cases regarding the gestural forms
and functions of interest here. It will be argued that while the perceived similarity
between the bodily actions and gestures we observe in others and our own perceptual
experiences and physical routines may determine how we cognitively and physically
align with our interlocutors, similarity is only one pathway to understanding the inten-
tions and semantics of communicative behavior (for mimicry in gesture see, e.g., Holler
and Wilkin 2011). As will be discussed in detail below, contiguity relations between
the gesturing body and the material and social world also play a central role in sensing
and interpreting the meaning of coverbal gestures. In addition, contact, adjacency and
impact are different kinds of contiguity relations between entities in the physical world
or in semiotic structure that may be established, highlighted, or deleted in gestural sign
formation and interpretation.
For a brief preview of the kinds of distinct semiotic processes that will be discussed
in detail throughout this article, let us look at two gestural examples (taken from
Mittelberg 2010b). Both persons shown below are US-American linguistics professors
lecturing about grammatical categories. In the sequence where the gesture shown in
Fig. 47.1 occurs, the speaker introduces the notion of semantic roles: To account for
this… we use names of semantic roles that bounce around in linguistics… agent, patient,
recipient, goal, experiencer… those are semantic roles. On the mention of recipient she
produces a palm-up open hand with slightly bent fingers held near her body at hip
level. In this multimodal performance unit, the uttered word recipient refers to a specific
semantic role, i.e. an abstract grammatical function, which the teacher literally personi-
fies with her entire body: in that very moment, she is a recipient. Given this particular
combination of body posture, arm configuration and hand shape, we can see a similarity
relation between this unified corporeal image and a person holding something. It is left
unspecified whether her open hand is holding an imaginary object already received, or
whether it merely signals readiness to receive something. However, in her speech she
does not refer to any possible object, but solely to the role she is assuming. This bimodal
portrayal of an abstract function is afforded through a) iconicity between the semiotic
structure inherent to her body posture plus gesture and the mundane action of holding/
receiving something; b) a latent contiguity relation between the open hand and a potential
object; and c) metaphor (i.e. personification).
A closer look at the pragmatic functions of the two seemingly similar gestures above
reveals that their similarity mainly resides in the form features of the gestural articula-
tors: they are variants of the palm-up open hand (Müller 2004). Without considering the
speech content it would be impossible to establish which of the semiotic modes mixing
in each of these multimodal explanations predominantly contribute to their meaning.
Explaining the framework of emergent grammar, the speaker shown in Fig. 47.2 points
out that a priori … you cannot define a noun from a verb. When saying the word noun,
this palm-up open hand gesture, slightly extended toward the student audience, consti-
tutes a perceivable surface, i.e. a material support structure, on which the speaker
seems to present the abstract category noun reified as an imaginary tangible object.
In cognitive semantic terms, this gesture seems to manifest the image schemata SUP-
PORT, SURFACE, and OBJECT (Johnson 1987; Mandler 1996), as well as the primary met-
aphor IDEAS ARE OBJECTS or CATEGORIES ARE CONTAINERS (Lakoff and Johnson 1980).
The point here is that iconicity and metaphor do not suffice to account for the partic-
ular form and function of this hand configuration; there is no iconic relationship
between the shape of the manual articulator and a grammatical category. Rather,
an imputed contiguity relation between the open palm and the imaginary noun be-
comes significant: the simultaneously uttered word noun draws attention away from
the action to the imaginary entity which needs to be metonymically inferred from the
open hand.
In the sections below, first the different theoretical strands building the conceptual
foundation of the present approach will be sketched. After discussing processes of moti-
vation and abstraction in gestural sign formation, a set of cognitive-semiotic principles
will be defined and illustrated with gestural examples. Special attention will be paid to
metonymic modes and image-schematic structures that seem to feed into both literal
and metaphoric meaning construction in multimodal discourse. The overall goal of
this article is to show some of the ways in which gestures might attest to the semiotic
reality of embodied conceptual schemata and action patterns, by invoking aspects of
their visuo-spatial, material and multisensory origins.
Fricke 2007) and anthropology (e.g., Enfield 2009; Haviland 2000). Peirce’s widely cited
definition of the sign is also provided here, particularly to recall the terms Representa-
men (the material form the sign takes) and the interpretant, i.e., the response/association
the Representamen evokes in the mind of the sign receiver. Hence, without an
interpreting mind there is no sign, that is, no semiosis and no meaning.
A sign [in the form of a representamen] is something which stands to somebody for some-
thing in some respect or capacity. It addresses somebody, that is, creates in the mind of that
person an equivalent sign, or perhaps a more developed sign. That sign which it creates I
call the interpretant of the first sign. The sign stands for something, its object. (Peirce
1960: 135, § 2.228; italics in the original)
A strong point of this model is that it includes the receiver as a participant actively in-
volved in making meaning of the signs she or he perceives. So the notion of interpretant
is central for several reasons: it marks the moment when meaning emerges in interpre-
tative processes (some of which might be propelled by metaphoric associations, for
example, while others by metonymic); it accounts for different minds with different
semiotic experiences and habits; and it exhibits a potential for augmentation regarding
different degrees of semiotic density and ways to link up the intended object with
semantic structure in the conceptual system (Mittelberg 2006: 43).
While responding to the need of categorizing gestures for the purpose of analysis,
many scholars have come to realize that working with categories, even if seen as not
absolute, hardly does justice to the polysemous and multifunctional nature of gestural
forms (cf. Müller 1998). McNeill, for instance, moved away from his original taxonomy
(i.e. iconics, deictics, metaphorics, beats, and cohesives; McNeill 1992) in preference to
dimensions such as iconicity and metaphoricity (McNeill 2005). In light of the noted
multifunctionality of gestural signs, the present approach advocates, in alignment
with Peirce (1960) and Jakobson (1987), a hierarchical view, asserting that among the
different semiotic modes that may mix and interact in a given gestural sign, one needs
to establish, in conjunction with the concurrent speech and other contextual factors,
which one(s) actually determine(s) its specific form and local function.
Before laying out in detail the workings of the cognitive-semiotic principles of cen-
tral interest in this article, a few words need to be said about the corpus from which the
examples discussed below are taken, as well as about the empirical methods employed.
The corpus consists of naturalistic academic discourse and coverbal gestures produced
by four US-American linguists while teaching introductory linguistics courses. On the
basis of twenty-four hours of multimodal discourse, those segments were selected in
which referential gestures (cf. Müller 1998: 110–113) portray linguistic units of different
degrees of complexity, grammatical categories, and syntactic structures and operations.
Transcriptions included the speech of each segment, the course of each gestural move-
ment excursion according to its phases (Kendon 2004: 111), and the exact speech-
gesture synchrony. To record the kinesic features of the gestures, the most widely
used coding parameters were applied: hand presence, hand dominance, hand shape,
palm orientation, movement manner and trajectory, and the location in gesture
space. Opting for a data-driven typology of manual signs, the corpus was searched
for prominent hand shapes and movement patterns recurring across speakers and con-
texts. A set of schematic images of objects, actions and relations emerging from the data
then provided the basis for the analysis of cross-modal processes of meaning construc-
tion. For each gesture unit, the information conveyed in the concurrent speech seg-
ments was considered to determine the interaction of the different iconic, metonymic
and metaphoric modes (for more details regarding the methods, see Mittelberg 2007,
2008, 2010a).
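As an aid to picturing the annotation scheme, here is a minimal hypothetical Python record built from the coding parameters listed above; the field names and example values are illustrative assumptions, not the coding manual actually used in the study:

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class GestureAnnotation:
        # Kinesic form features (parameter names follow the list above;
        # value types are assumptions made for illustration).
        hands_present: Tuple[str, ...]   # e.g. ("right",) or ("left", "right")
        dominant_hand: str
        hand_shape: str                  # e.g. "palm-up open hand"
        palm_orientation: str            # e.g. "up"
        movement_manner: str             # e.g. "holding", "tracing"
        trajectory: str                  # e.g. "lateral outward"
        location: str                    # region of gesture space, e.g. "center"
        # Speech-gesture synchrony: concurrent speech and gesture phase (Kendon 2004).
        speech_segment: str
        phase: str                       # "preparation", "stroke", "hold", "retraction"

    example = GestureAnnotation(
        hands_present=("right",), dominant_hand="right",
        hand_shape="palm-up open hand", palm_orientation="up",
        movement_manner="holding", trajectory="static", location="center",
        speech_segment="recipient", phase="stroke")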
Since the relationship between iconicity and metaphor in gesture has already re-
ceived ample attention (e.g., Cienki and Müller 2008), this article focuses on the inter-
action between iconicity and indexicality on the one hand and between metonymic
and metaphoric modes on the other. Assigning considerable weight to metonymic
principles and the role they play in processes of perception, abstraction and inferencing,
the approach presented here aims to offer insights into the motivated parthood of
communicative movements of the human body.
3. Motivation in gesture
From the perspective on gesture taken here the issue of motivation is central. Taking
the gestural material as a starting point, the intention is to establish how the different
modalities share the semiotic work of creating form and meaning. The task is to identify
the forces that might have motivated the form features and pragmatic functions of a
given gesture or sequence of gestures. One complicating factor in gesture analysis is
the fact that the semiotic material we are looking at consists not only of observable
physical components – such as body posture, bodily motion as well as configurations,
movements and locations of hands and arms – but also of immaterial dimensions
such as virtual movement traces left in the air or imagined surfaces, objects or points
in space. Compared to static visuo-spatial modalities such as drawings or sculptures,
gestures typically evoke persons, objects, actions, places or relations in a rather fluid
and ephemeral way. As the articulators in the speaker’s mouth constantly form new
configurations to produce speech sounds, hands may also constantly change their articu-
latory shape as well as the manner and trajectory of movement when in gestural motion
(Bouvet 2001; Bressem this volume). What a potentially polysemous gestural form
stands for can only be determined by considering the simultaneously produced utter-
ances. Speech and its accompanying gestures have been shown to assume specific semi-
otic roles in processes of utterance formation (Kendon 2000: 53). Being “motor signs”
(Jakobson 1987: 474), gestures are prone to depict, or actually constitute, spatial and
dynamic dimensions of what the speaker is talking about, thus grounding information
(partly) conveyed through speech in a visuo-spatial context sharable by interlocutors
(e.g., Müller 2008; Sweetser 1998, 2007). Gestures may further regulate social interac-
tion, either explicitly or in the form of a sort of subtext unfolding in parallel to the
ongoing conversation (e.g., Bavelas et al. 1992; Müller 1998).
Within the field of linguistics, the arbitrary versus motivated nature of human lan-
guage has been a matter of great debate (Jakobson and Waugh 1979; Saussure 1986).
Drawing on Peirce’s notions of image iconicity and diagrammatic iconicity, Jakobson
(1966) not only demonstrated that iconicity is a constitutive factor at all levels of lin-
guistic structure (phonology, morphology, syntax and the lexicon; Waugh 1976, 1994),
he also devised different kinds of contiguity relations in language and other sign sys-
tems. As we will see below, Jakobson’s distinction between inner and outer contiguity
takes center stage in the present framework and serves as the basis for different
types of metonymy (Jakobson and Pomorska 1983). Iconicity and metonymy have also
been ascribed a constitutive role in the formation of signs in American Sign Language
(ASL) (Mandel 1977; P. Wilcox 2004; S. Wilcox 2004). Investigating the relationship of
iconicity and metaphor in American Sign Language, Taub (2001) suggested a set of
principles including image selection (based on similarity or contiguity), schematization
(through abstraction), and encoding (through conventionalization). Compared to highly
symbolic sign systems such as spoken and signed languages, spontaneous gestures do
not show the same degree of formalized conventionalization and grammaticalization.
Hence, some of the most interesting questions that arise here concern the ways in
which gestures exploit and create similarity and contiguity relations differently than
language and how these modes feed into processes of conventionalization.
In his observations on “the body as expression”, Merleau-Ponty (1962: 216)
succinctly states:
It is through my body that I understand other people, just as it is through my body that
I perceive “things”. The meaning of a gesture thus “understood” is not behind it, it is
intermingled with the structure of the world outlined by the gesture.
This kind of “structure of the world” (Merleau-Ponty 1962: 216) as profiled in a gesture
can be assumed to comprise different kinds of structure: physical, semiotic, and/or con-
ceptual. It may reflect the spatial structures and physical entities humans routinely per-
ceive and interact with in their daily lives and professional practices (Goodwin 2007;
Streeck 2009). When asking someone for a small bowl, for instance, we can iconically
illustrate the desired object by evoking the shape of a round container through holding
two cupped open hands closely together with palms facing upward. Such a hand config-
uration not only expresses the idea of a bowl (as the word “bowl” does), but it actually
constitutes for a moment a container of that sort. In linguistics courses, gestures are one
of many visual semiotic resources used to explain abstract categories and functions. For
example, teachers have been found to trace triangle-shaped figures in the air, thus imi-
tating conventional tree diagrams used in textbooks and on blackboards to visualize
hierarchic sentence structure (Mittelberg 2008). In view of gestures that lend a tangible
form to abstract ideas and structures, the question of motivation becomes more com-
plex and leads us into the realm of figurative thought and expression. We then need
to ask in what ways gestures portraying abstracta, beliefs, mental or emotional states
might be shaped by construal operations such as metaphors, metonymies, and framing
(e.g., Cienki 1998a; Cienki and Müller 2008; Evola 2010; Gibbs 1994, 2006; Harrison
2009; Mittelberg 2010a; Müller 1998, 2008; Sweetser 1998, 2012).
Indeed, as Merleau-Ponty pointed out, getting at the meaning of a gesture does not
seem to be simply a matter of reference. Gestures may in the very moment of expres-
sion actually “be” what they are taken to “be about”, and create new semiotic material
from scratch which then can take on a life of its own in the ongoing discourse and
may as such serve as a reference structure for subsequent multimodal explanations
(e.g., Fricke 2007; McNeill 2005). Gesture researchers have come to differentiate ges-
tures that carefully depict an existing and experienced place, object, or event, such as
a certain tool one has used or an animated cartoon one has watched, from those ges-
tures that seem to reflect a concept, a conceptual image/schema, or a vague idea (e.g.,
Andrén 2010; Cienki and Mittelberg 2013; McNeill 1992; Mittelberg 2006; Müller
47. The exbodied mind: Cognitive-semiotic principles as motivating forces in gesture 763
1998). In his elaborations of the “thinking hand”, Streeck (2009: 151) distinguishes
between two gestural modes: depicting (e.g., via an iconic gesture portraying a physical
object) and ceiving (i.e. via a gesture conceptualizing a thematic object). He attributes
the latter mode to a more self-absorbed way of finding a gestural image for an emerging
idea. Fricke (2007) speaks of interpretant gestures which, in multimodal instruction giving,
may reflect a general concept of an architectural structure, such as a gate, instead of de-
scribing the idiosyncratic shape of a particular passageway that does not represent a pro-
totypical member of the category.
Additional subjective factors influencing which elements of a given object or sce-
nario are attended to, and how the locally salient features are encoded cross-modally,
pertain to the viewpoint the sign producer adopts, e.g. character or observer viewpoint
(McNeill 1992). As a substantial body of recent research has shown, viewpoint is a uni-
versal and extremely flexible construal operation shaping expressions across modalities
in spoken and signed languages (e.g., Dancygier and Sweetser 2012). As Sweetser (2012:
1) points out, “cognitive perspective starts with bodily viewpoint within a real physical
Ground of experience”. Being existentially tied to the speaker’s body, as well as to its
material and socio-cultural context, gestures, whether they be predominantly iconic or
deictic, are inherently indexical (Mittelberg 2008). While gestural and corporeal signs
tend to reflect aspects of the gesturer’s own bodily disposition and stance, they may,
at the same time, reflect multiple, shifting viewpoints on a given scene, thus also
embodying the perspective of others (Sweetser 2012).
What we can draw from these observations for the concept of the exbodied mind is
that motivated gestural sign constitution reflects – besides the primordial role of move-
ment, space and material culture – the workings of the speakers’ cognitive filter, result-
ing in, for instance, viewpointed conceptual images and structures. In this way, they may
attest to the psychological and semiotic reality of dynamic multimodal processes of
conceptualization (e.g., Cienki 1998b; Ladewig 2011; Mittelberg 2008; Müller 2008;
Núñez 2008).
Actually, the portrayal of an object by gesture rarely involves more than some one isolated
quality or dimension, the large or small size of the thing, the hourglass shape of a woman,
the sharpness or indefiniteness of an outline. By the very nature of the medium of gesture,
the representation is highly abstract. What matters for our purpose is how common, how
satisfying and useful this sort of visual description is nevertheless. In fact, it is useful not in
spite of its spareness but because of it. (Arnheim 1969: 117)
Before teasing apart the distinct ways in which this useful metonymic “spareness” (Arn-
heim 1969: 117) of gestures may be brought about, let us first look more closely at what
they might be metonymic of by considering the notion of the Object in Peirce’s triadic
sign model (henceforth, the elements of the Peircean sign model will be capitalized).
We will then narrow in on his concept of the Ground, as principles of abstraction and
partial representation are implemented already at this very basic level of the semiotic
process.
Peirce’s understanding of what a semiotic Object can be is extremely wide and
ranges from existing to non-existing things: it encompasses both concrete and abstract
entities, including possibilities, goals, qualities, feelings, relations, concepts, mental
states, and ideas (e.g., Kockelman 2005). Anything can be an Object, as long as it is re-
presented by a sign (Shapiro 1983: 25). From a cognitive semantics perspective, the non-
physical Objects listed above remind us of common target domains of conceptual
metaphors. As a large body of research on multimodal metaphor has shown, metaphor-
ical understandings can be expressed in either speech or gesture, or simultaneously in
both modalities (e.g., Müller and Cienki 2009). Moreover, the nature and properties
of the Object determine, according to Peirce, the sign. So in the case of a gesturally ex-
pressed metaphor, the Object of the gesture can be said to be the source domain of
the underlying mapping. In this case, the “structure of the world”, to come back to
Merleau-Ponty’s (1962: 216) observations quoted above, that might determine the
form of a metaphoric gesture, would be conceptual structure underlying a metaphoric
projection.
We are now in a position to bring Peirce’s concept of the Ground of a sign carrier, i.e.
of the Representamen, into the picture, which accounts for the fact that sign vehicles do
not represent Objects with respect to all of their properties, but only with regard to
some salient or locally relevant qualities. These foregrounded features function as the
Ground of the Representamen. In Peirce’s own words (1960: 135, § 2.228; italics in
the original):
The sign stands for something, its object. It stands for that object, not in all respects, but in
reference to some sort of idea, which I sometimes called the ground of the representamen.
“Idea” is here to be understood in a sort of Platonic sense, very familiar in everyday talk;
I mean in that sense in which we say that one man catches another man’s idea.
analogous sensations in the mind”. While there might be a visual bias in the term icon, it
encompasses a multimodal understanding of iconicity and hence also includes those
“sensations” that cause something to feel, taste, look, smell, move, or sound like some-
thing else. This view again corresponds well with the multisensory basis of embodied
image schemas and metaphors assumed in cognitive semantics (cf. section 6 for details).
It also encompasses representational gestures that are motivated by mental imagery
and conceptual structures, that is, by a mental Object: the cupped hands mentioned
above evoking the kind of bowl one is looking for (image icon); hands tracing the rela-
tions between concepts making up a theoretical framework (diagram); or an open
cupped hand that represents an abstract category in the form of a small container (met-
aphor; cf. Fig. 47.5 below and Mittelberg 2008). In addition, iconic gestures may take
shape in different ways. Inspired by the tools, media and mimetic techniques visual ar-
tists deploy, Müller (1998) introduced four modes of representation in gesture: drawing
(e.g., tracing the outlines of a picture frame), molding (e.g., sculpting the form of a
crown), acting (e.g., pretending to open a window), and representing (e.g., a flat open
hand stands for a piece of paper). If one applied all four modes to the same Object,
each resulting gestural Representamen would establish a different kind of iconic
Ground and hence highlight different features of the Object. This means that each
portrayal would convey a different “idea” of the Object (Peirce 1960: 135; § 2.228).
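To illustrate the point that each mode of representation establishes a different iconic Ground of the same Object, consider the following hypothetical Python sketch; the Object (a window) and its property sets are invented solely to mimic the idea that each Representamen foregrounds some features and backgrounds others:

    # One Object, four modes of representation, each selecting a different Ground.
    window_properties = {"rectangular outline", "flat pane", "can be opened", "set in a wall"}

    grounds_by_mode = {
        "drawing":      {"rectangular outline"},              # tracing the outline
        "molding":      {"rectangular outline", "flat pane"}, # sculpting its form
        "acting":       {"can be opened"},                    # pretending to open it
        "representing": {"flat pane"},                        # the flat hand stands for the pane
    }

    for mode, ground in grounds_by_mode.items():
        backgrounded = window_properties - ground
        print(f"{mode}: foregrounds {sorted(ground)}; backgrounds {sorted(backgrounded)}")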
Pointing gestures are signs with a highly indexical Ground. In indexical signs, the
relation between sign and Object is based on contiguity, that is, on a factual (physical
or causal) connection between the two. According to Peirce (1960: 143; §2.228), “[a]n
Index is a sign which refers to the Object that it denotes by virtue of being really
affected by that object”. Indeed, the spatial orientation of highly context-sensitive
pointing gestures depends on the location of the Object they are directed toward,
and through the act of pointing the Object is established via a visual vector (cf. Fricke
2007; Haviland 2000; Kita 2003). Another example, not as strongly indexical though, is
the palm-up open hand gesture discussed above (cf. Fig. 47.2), since there is an indexical
relation between the open palm and the category noun it seems to be presenting to the
audience. Here we can see an interesting interaction between the speech content and
the gestural grounding mechanism: not the act of holding per se (which would presup-
pose an iconic Ground) is pertinent, but the entity (or the space) to be imagined as
being in contact with the open palm becomes the lieu of attention and thus the lieu
of meaning (cf. section 5.3. for details and more examples). In view of these observa-
tions, iconic and indexical grounding mechanisms may be regarded as two routes of
abstraction competing in corporeal/gestural sign creation and being tightly integrated
with the information provided in speech. As will be discussed next, Jakobson proposed
distinct types of contiguity relations and metonymic modes that seem to correlate with
these basic mechanisms of semiotic grounding.
provide the basis for Jakobson’s (1956) theory of metaphor and metonymy seen as two
opposite modes of association and signification that structure both linguistic and non-
linguistic signs. Like all semiotic modes, they are not mutually exclusive: signs tend
to exhibit varying degrees and different hierarchies of both (Jakobson 1966: 411). De-
riving his understanding of metonymy from Peirce’s notion of contiguity (and indexical-
ity), Jakobson emphasized the difference between synecdoche and other types of
metonymy:
One must – and this is most important – delimit and carefully consider the essential differ-
ence between the two aspects of contiguity: the exterior aspect (metonymy proper), and
the interior aspect (synecdoche, which is close to metonymy yet essentially different).
To show the hands of a shepherd in poetry or the cinema is not at all the same as showing
his hut or his herd, a fact that is often insufficiently taken into account. The operation of
synecdoche, with the part for the whole or the whole for the part, should be clearly distin-
guished from metonymic proximity. […] the difference between inner and outer contiguity
[…] marks the boundary between synecdoche and metonymy proper. (Jakobson and
Pomorska 1983: 134)
In the following subsections, we will first see how, when pragmatically operationalized
in a given sign process, these two contiguity relations may feed into corresponding me-
tonymic processes (i.e. internal and external metonymy). The analytical framework,
comprising different kinds of gestural icons and indices brought about through different
kinds of metonymic processes, will then be presented and illustrated with examples
from the multimodal corpus (Mittelberg 2006). Modes of interaction between metony-
mic and metaphoric processes will also be addressed (Mittelberg and Waugh 2009).
(Tab. 47.1 in section 5.4 provides a synopsis of the taxonomy of cognitive-semiotic
principles developed in sections 5.1–5.3 below.)
metaphoric mapping. In the present framework this corporeal sign is first and foremost
analyzed as an image icon, that is, a literal portrayal of the idea of a recipient simulta-
neously expressed in speech.
Another example of a gestural form with a strongly iconic Ground is given in
Fig. 47.3 below. This tracing gesture starts out with both hands joined in the center
of gesture space. Then the hands move laterally outward until both arms are fully ex-
tended, as if they were tracing a horizontal line or chain. Here, too, the speaker’s verbal
utterance – we think of a sentence as a string of words – disambiguates a potentially
polysemous schematic gestural image icon. So the focus is not on the body itself or
the action of tracing, but on the virtual line drawn in the air that results from the action
of tracing. Cross-modal processes of meaning construction also play a crucial function
here in that the trace is an image icon of the idea of a string that is simultaneously ex-
pressed in the speech modality. Via internal metonymy the imaginary line stands for an
entire sentence. Whether this schematic image reifies abstract conceptual structure or
whether it is a minimal icon of a graphical representation of a sequence of written
words, it is likely to result from cognitive processes of visual perception and analysis.
In contrast to the previously discussed gesture (Fig. 47.1), this gesture exhibits observer
viewpoint.
DIAGRAMMATIC ICON (INTERNAL RELATIONS; STRUCTURE). While the focus of the sentence
string of Fig. 47.3 was on its linear gestalt as a whole, the following gesture puts into
relief the inner structure of a word (cf. Fig. 47.4). In this well orchestrated multimodal
performance unit, two hands produce two individual gestural signs whose functional
relation turns out to be of particular interest. While explaining the basics of noun mor-
phology, the teacher complements the verbal part of his utterance as speakers of
English you know that … teacher consists of teach– and –er by making use of both
of his hands with the palms turned upwards and the fingers curled in. On the men-
tion of teach– he brings up his left hand, and immediately thereafter, on the mention
of –er, the right hand. He then keeps holding the two hands apart as depicted
in Fig. 47.4. This gesture can be interpreted in several ways. If we assume the left
hand to represent the morpheme teach– and the right hand the morpheme –er, we
can say that each sign itself entails a reification in that an abstract linguistic unit or a
speech sound is construed as a physical object through a metaphorical projection
(e.g., IDEAS ARE OBJECTS). If we assume the hands to be enclosing small imaginary
items, we can suppose an outer contiguity relation between the perceivable gestural ar-
ticulators and the metaphorically construed objects inside of them (cf. section 5.3). The
visually inaccessible contents would then be metonymically inferred from the percepti-
ble closed containers. In both readings, this composite gesture puts into relief the
boundary between the two elements, while accentuating the fact that the linguistic
units referred to in speech are connected on a conceptual level. As such, it constitutes
a gestural diagram: icons, “which represent the relations, mainly dyadic, (…) of the
parts of one thing by analogous relations in their own parts, are diagrams” (Peirce
1960: 157; § 2.277). The diagrammatic character of this cognitive-semiotic structure al-
lows us to identify contiguity relations between its constitutive parts: there is thus exter-
nal metonymy holding between individual signs building a structured whole (cf.
Tab. 47.1: to account for the hybrid status of the diagram, it is positioned closer to
the middle of the iconicity-indexicality continuum than image icon and metaphor
icon; cf. Mittelberg [2006: 117–132; 2008: 134–139] for diagrammatic iconicity in gesture;
cf. Waugh [1994] for iconicity in language).
METAPHOR ICON (PERSONIFICATION; REIFICATION; ETC.). Due to the metaphoricity char-
acterizing the meta-grammatical gestures analyzed here, the previously discussed exam-
ples of image icons could, in principle, also be analyzed as metaphor icons: the semantic
role recipient personified by the speaker’s bodily posture (cf. Fig. 47.1) and the sentence
conveyed as a string of words reified in the form of an imaginary line (cf. Fig. 47.3).
However, the present framework differentiates such gestural image icons of metaphoric
linguistic expressions from cases of metaphor iconicity in gesture that imply additional
semantic leaps in establishing similarity (Coulson 2001), that is, leaps not cued by met-
aphorical expressions in the speech modality. In the sequence of interest here, the
speaker explains the difference between main verbs and auxiliaries (Fig. 47.5). While
saying there is … what’s called the main verb, he directs his right hand toward the black-
board behind him, thus disambiguating and contextualizing the deictic existential
expression there is (cf. section 5.3). Immediately thereafter, while holding the deictic
gesture, the speaker makes a gesture with his left hand on the mention of the main
verb: the cupped palm-up open hand imitates the form of a small round container.
Showing a strongly iconic Ground, the formal features of the cupped hand are
motivated by internal metonymy in that they portray some of the essential structural
characteristics of a generic, small round container. This iconic form, however, does
not directly represent the idea of a main verb mentioned in speech.
Fig. 47.5: There is (index away from body) … the main verb (metaphor icon)
surfaces such as sand or other types of grounds (e.g., the famous example of animal foot-
prints). In a similar fashion, external metonymy accounts for the relation between ges-
tural movements and the emerging virtual traces they create in the air. Taking the
human body as the starting point, the following discussion of different types of outer con-
tiguity relations in gesture entails different degrees of “metonymic proximity” (Jakobson
and Pomorska 1983: 134) and an increasingly noticeable interaction with iconic modes.
INDEX AWAY FROM BODY (POINTING). Pointing is a highly coordinated and culturally-
shaped activity (e.g., Fricke 2007; Haviland 2000; Kendon 2004; Kita 2003; McNeill
1992). While they are not treated in depth here, pointing gestures are included in the
taxonomy as examples of signs with a highly indexical Ground. For example, the deictic
gesture shown above in Fig. 47.5 creates an invisible vector pointing away from the
speaker’s body and directing the audience’s attention to the word taught written on
the blackboard behind him. There is an outer contiguity relation between the tip of
the pointing finger and the target of the pointing action. Instances of deictic gestures
pointing to more distant Objects also belong to this group of indices.
BODY PART INDEX (LOCATIONS ON BODY). This kind of external metonymy is repre-
sented by gestures whose meaning derives partly from their contact with, or proximity
to, a particular body part of the speaker. The gesture depicted in Fig. 47.6 below is a
both-handed body part index co-occurring with the word knowledge in the verbal utter-
ance Grammar emerges from language use, not from knowledge becoming automatized.
It can be described as a hybrid of a) two simultaneously produced pointing gestures, tar-
geting each of the speaker’s temples, and b) a bimanual gesture consisting of two
cupped hands jointly constituting a metaphor icon of a container held next to the
head. In order to get to the site of knowledge, it takes two steps along an inferential
pathway, both of them afforded through external metonymy. First, there is an outer con-
tiguity relation between the hands and the head; then, there is another outer contiguity
relation between the outside of the head and its inside. In cognitive semantic terms: the
head is metaphorically understood as a CONTAINER which stands metonymically for its
content, i.e. knowledge (see Panther and Thornburg 2004 on metonymy and pragmatic
inferencing in language, and Dudis 2004 on body partitioning in American Sign
Language).
elements of the sign process (Jakobson and Pomorska 1983: 134). Compared to palm-up
open hand gestures, the gestures depicted below (cf. Figs. 47.8 and 47.9) employ two ar-
ticulators, e.g. two fingers or two hands, which help specify to a higher degree the size
and shape of the objects involved in the imitative actions. The person shown in Fig. 47.8
below is lecturing about sentence structure. While he explains the short sentence Diana fell, his right hand assumes the hand configuration shown, held relatively high up in gesture space, on the mention of the verb form fell. Between his thumb and index finger,
he seems to be holding the verb fell, conceptualized as a tangible object or space
extending between the articulators. If we only considered the visible gestural articu-
lators as the semiotic material of this gesture, it would seem impossible to establish
a similarity relationship between this particular hand configuration and a verb form
or speech sound fell (no falling event is iconically depicted, either). However, through
the pragmatic context and the simultaneously uttered word fell attention is drawn
to the imaginary entity between the two fingers. So it is not the iconic relationship
between the imitated action and the real action of holding something up in the air
that gets profiled here; rather, the cognitive-semiotic principle of relevance is the
outer contiguity relation (contact/adjacency) between the observable gestural articu-
lators and the imagined word form that becomes, due to the linguistic cue fell, opera-
tionalized through external metonymy (see Hassemer et al. 2011 for a detailed account
of the profiling of gesture form features).
The last example is a two-handed gesture combining indexical and iconic modes in a
relatively balanced fashion. In the sequence from which the image in Fig. 47.9 is
taken, the speaker talks about the functional difference between main verbs and aux-
iliaries. He explains that auxiliaries such as have, will, being, and been (…) must all
belong to some subcategory. Upon some subcategory he makes the gesture shown
above, consisting of two hands that seem to be holding an imaginary three-dimensional
volume whose geometry is comparatively well designated. While there is an iconic rela-
tionship (via internal metonymy) between the physical action of holding or placing an
object as such and this gestural action of pretending to do so, it is again the outer con-
tiguity relation between the hands and the adjacent imagined object that is profiled here
via external metonymy in conjunction with the linguistic cue some subcategory. This
association works effortlessly because the action of “holding” and the object being
“held” are part of the same experiential domain (cf. Dancygier and Sweetser 2005 on
frame metonymy). Moreover, the meaning of the term subcategory is reinforced by
the gesture’s comparatively low location in gesture space. Since the subcategory is liter-
ally placed underneath the location where the superordinate category it relates to was
produced only a few utterances earlier (i.e. the main verb; cf. Fig. 47.5), this gesture also
is an instance of metonymy of place.
HAND/TRACE INDEX (PATH; LINE FIGURE; ETC.). Communicative hand movements leave
invisible traces in gesture space. These traces become meaningful when attention is
drawn to, for instance, their execution in terms of the trajectory they project, the
shape contours they delineate, or the specific manner of movement they exhibit (e.g.,
through straight or wavy lines; see Bressem this volume). In such sign processes,
hand/trace indices highlight the outer contiguity relation between perceivable gestural
articulators, e.g. the index finger or the entire hand, and the imaginary lines or figures
they leave in the air or the visible traces they imprint on surfaces. Such figurations
constitute, however sketchy or minimal they may be, signs in their own right and
may as such be iconic, diagrammatic, and/or metaphoric icons of something else
(e.g., the line representing a string of words in Fig. 47.3; see Mittelberg 2010a for
additional movement patterns). Given their strong interaction with iconic principles,
this type of index is placed towards the middle of the continuum in Tab. 47.1. Outer
(tactile) contiguity relations between flat hands and the surfaces they pretend to be
exploring, as well as between bent hands and the volumes they seem to be touching
or creating (external to the hands), represent another indexical relation that may
engender iconic figurations: instantiations of HAND/PLANE INDEX are not exemplified
here, but listed in the table below (see Hassemer et al. 2011 for dimensions of gesture
form).
stemming from previous work (lower part of table; see Mittelberg 2006, 2010a, 2010b).
Horizontally, the table is structured via a continuum spanning from an iconic pole on
the left to an indexical pole on the right. Along this continuum different kinds of ges-
tural and corporeal icons and indices are positioned, depending on whether they exhibit
a predominantly iconic or indexical Ground. Gestural signs combining increased de-
grees of both iconicity and indexicality are placed towards the center of the continuum.
Neither the taxonomy nor the placement of the gestures is to be seen as static or abso-
lute. In a given sign process, the particular combination of pragmatic forces might
require a reordering of certain principles along the continuum. There also is room
for variation regarding additional gesture forms and functions not considered here
(e.g., pragmatic gestures, beats and other primarily indexical gestures that fulfill various
functions regarding affect, attention, interaction, and information management; e.g.,
Bavelas et al. 1992; Müller 1998).
Table 47.1 reads as follows. While IMAGE ICON and METAPHOR ICON are positioned clo-
ser to the left pole of the continuum, it is understood that gestural imagery may exhibit
varying degrees of abstraction and schematicity along the line. As pointed out earlier,
image icons tend to literally depict what is described in speech, while metaphor icons
imply additional cognitive leaps. Since the DIAGRAMMATIC ICON (INTERNAL RELATIONS;
STRUCTURE) entails both inner contiguity, regarding the sign-object relation, and outer
contiguity relations among its constitutive parts, it is placed closer toward the indexical
side of the continuum. The different gestural indices proposed here point to locations
where meaning is multimodally constructed, either directly on the speaker’s body or
in “metonymic proximity” to it (Jakobson and Pomorska 1983: 134).
Deictic gestures with a highly indexical Ground constitute the far right of the contin-
uum: namely, pointing gestures (INDEX AWAY FROM BODY) and pointers to specific body
parts or locations on the speaker’s body (BODY PART INDEX). Gestures with a slightly
muted indexical Ground may indicate the existence or location of a mentally construed
entity in the form of a virtual object, by providing a support structure in or on which it
can be presented, e.g. via a HAND/OBJECT INDEX (SUPPORT; CONTAINER). In addition, ges-
tures employing more than one articulator may demarcate chunks or extensions of
space between them that may get semantically charged in acts of multimodal mean-
ing-making, e.g. via a HAND/OBJECT INDEX 2-SIDED (BOUNDED SPACE). To differentiate
between physical objects and tools used to perform a certain transitive action on some-
thing else, e.g. pantomiming cutting a fruit (object) with a knife (tool), another outer
contiguity relation is added here. A HAND/TOOL INDEX (tool involved in action) incorpo-
rates iconic dimensions, derived from the particular hand shape or grip and the motor
routine typical for the performed action (Grandhi, Joue and Mittelberg 2011, 2012).
Finally, HAND/TRACE INDEX (PATH; LINE FIGURE) and HAND/PLANE INDEX (FLAT SURFACE;
VOLUME) combine indexicality and iconicity in the creation of, for instance, invisible
paths, line drawings, planes or volumes in gesture space (Hassemer et al. 2011 for a
detailed account of the geometry and dimensions of gesture form). There is outer con-
tiguity between fingers/hands that move through gesture space and the paths and fig-
ures they delineate, or the surfaces and volumes they seem to explore or actually
create in the process. Since such invisible figurations may be iconic of something
else, e.g. of cognitive or physical structures or actions, these signs are positioned closer
to the iconic pole than the other indices. Regardless of the predominant Ground of
the gestural signs, the metonymically inferred objects may come in different shapes
and sizes and show different degrees of iconic and geometric specification. In the ges-
tures representing grammatical categories and linguistic structure analyzed here, smal-
ler units such as morphemes fit into a closed hand; single words and categories were
held between index and thumb or rest on a palm-up open hand; and more complex
units such as sentences were represented as linear structures unfolding horizontally
in front of the speaker’s body (see Mittelberg 2008 and 2010a for additional examples).
Tab. 47.1: Synopsis of the taxonomy of cognitive-semiotic principles: the iconicity–indexicality continuum

THEORETICAL BASES (C.S. Peirce):
iconic pole: iconicity (image; diagram; metaphor); similarity
indexical pole: indexicality; contiguity

Gestural and corporeal sign processes (with speech):
towards the iconic pole: IMAGE ICON (posture/action; object = trace, figure, hand); METAPHOR ICON (personification; reification (trace, hand)); DIAGRAMMATIC ICON (internal relations; structure)
towards the indexical pole: INDEX AWAY FROM BODY (pointing); BODY PART INDEX (locations on body); HAND/OBJECT INDEX (surface; container); HAND/OBJECT INDEX 2-sided (bounded space); HAND/TOOL INDEX (tool involved in action); HAND/TRACE INDEX (path; line figure); HAND/PLANE INDEX (flat surface; volume)

Semiotic grounding: iconic Ground – indexical Ground (iconicity–indexicality continuum)
As metaphor is assumed to interact with all metonymic modes to varying degrees (Ja-
kobson 1956), it is placed on both sides of the continuum. In their investigations of how
indexical and iconic principles interact in the interpretation of metaphoric gestures,
Mittelberg and Waugh, in their (2009) article “Metonymy first, metaphor second”,
have suggested two distinct but intertwined mappings. From the perspective of the per-
son listening to and looking at such a multimodal performance, metonymy can be said
to lead into metaphor. First, gestural sign vehicles, i.e. hand shapes and movements,
may serve as visual reference points (Langacker 1993) triggering cognitive access to
concepts represented as chunks of demarcated space or invisible objects (e.g., P. Wilcox
2004). Via a metaphorical mapping, these reified entities stand for the abstract cate-
gories the person talks about (see Taub 2001 for metaphorical mappings in American
Sign Language). In some of the instances examined here, the concurrent speech is not
metaphorical in nature (e.g., teach-er; main verb). Yet, the body portrays, i.e. exbodies,
how the person conceptualizes and understands the abstracta.
Such instances of gestural manifestations of both mental imagery and physical ac-
tions seem to be cross-modally grounded in several interlaced ways: a) via the human
body indexically anchored in its momentary temporal, spatial and social context;
b) via the concurrent linguistic utterance carrying the information that determines
heavy items, keeping balanced while riding a bike, or being pushed through a narrow
hallway by a crowd of people, can be easily reenacted through imitative actions per-
formed with the full body or parts of it, such as the head, torso and/or hand gestures.
In the meta-grammatical gesture corpus, certain gestures portrayed the behavior of
grammatical categories and the dynamic nature of cognitive and syntactic operations
via movements exhibiting an increased level of energy (Mittelberg 2010a: 370). For
instance, the idea that the theory of emergent grammar views grammar and language
as merging domains (Johnson 1987: 126) was illustrated by a comparably forceful move-
ment. The bimanual gesture in question starts out with two hands held apart, palms fac-
ing each other. On the mention of it blurs the boundary between learning and doing, the
palms are suddenly pushed towards each other. In this vivid portrayal, a forcefully per-
formed bodily action seems to erase both physical and conceptual boundaries at the
same time. Generally speaking, gesture research promises to augment our understand-
ing of the bodily logic of force schemata, especially regarding the multimodal expres-
sion, i.e. exbodiment, of less tangible, yet crucial dimensions of meaning such as social
forces and attitudes, but also affective and intersubjective dimensions of human
communicative behavior (Johnson 2005, 2007).
7. Concluding remarks
In gesture studies, the “bodily basis of meaning, imagination and reason” (the subtitle
of Johnson’s 1987 book) indeed is the starting point for examining the physical and cul-
turally shaped forces that lend a certain degree of systematicity to less-consciously pro-
duced bodily signs. In the same vein, the intent of this article has been to demonstrate
how certain cognitive-semiotic principles – such as iconicity, indexicality, metaphor and
metonymy – interact in motivating physically grounded processes of gestural form cre-
ation and interpretation. Inspired by Peircean and Jakobsonian notions, an iconicity-
indexicality continuum was suggested, which allows gestural signs with predominantly iconic or indexical Grounds, as well as more transient cases, to be related to one another
(cf. the taxonomy of icons and indices presented in Tab. 47.1). The idea of the exbodied
mind was put forth to focus on how structures of embodied multisensory experience,
such as image schemata and force gestalts, may visibly manifest themselves, at least
to certain degrees, in the form of dynamic ephemeral gestural and corporeal signs
produced with speech.
The analyses presented above have shown once again that gesture analyses need to
carefully consider a host of contextual factors, particularly speech and neighboring ges-
tures (Müller 2010), not only to disambiguate potentially polysemous gestural forms,
but also to determine which parts and movements of the bodily articulators become
profiled and thus meaningful in a given moment (Hassemer et al. 2011). In these
multimodal performances, the speech content further proved to be instrumental in es-
tablishing whether the corporeal actions or hand configurations themselves are focused
upon, or whether the imaginary entities, spaces or lines immediately contiguous to, or
created by, the gesturing hands become the salient elements in such processes of cross-
modal meaning construction. In this and many other respects, gesture studies can no
doubt greatly benefit from systematic comparisons with the morphology and discourse
pragmatics of signed languages (see, e.g., Dancygier and Sweetser 2012; Dudis 2004;
Liddell 2003; Taub 2001; P. Wilcox 2004; S. Wilcox 2004).
Acknowledgements
The research presented in this article was supported by the Excellence Initiative of the
German Federal and State Governments. The author wishes to thank the editors, as
well as Vito Evola, Julius Hassemer and Gina Joue for valuable feedback and Yoriko
Dixon for the gesture drawings.
8. References
Andrén, Mats 2010. Children’s gestures from 18 to 30 months. Ph.D. dissertation, Centre for Lan-
guages and Literatures, Lund University.
Arnheim, Rudolf 1969. Visual Thinking. Berkeley: University of California Press.
Bavelas, Janet, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gestures. Dis-
course Processes 15: 469–489.
Bourdieu, Pierre 1990. The Logic of Practice. Stanford, CA: Stanford University Press.
Bouvet, Danielle 2001. La Dimension Corporelle de la Parole. Les Marques Posturo-Mimo-Ges-
tuelles de la Parole, leurs Aspects Métonymiques et Métaphoriques, et leur Rôle au Cours
d’un Récit. Paris: Peeters.
Bressem, Jana volume 1. A linguistic perspective on the notation of form features in gestures.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Calbris, Geneviève 2003. From cutting an object to a clear-cut analysis: Gesture as the repre-
sentation of a preconceptual schema linking concrete actions to abstract notions. Gesture
3: 19–46.
Cienki, Alan 1998a. Metaphoric gestures and some of their relations to verbal metaphoric expres-
sions. In: Jan-Peter Koenig (ed.), Discourse and Cognition: Bridging the Gap, 189–204. Stan-
ford, CA: CSLI Publications.
Cienki, Alan 1998b. Straight: An image schema and its metaphorical extensions. Cognitive Lin-
guistics 9: 109–147.
Cienki, Alan 2005. Image schemas and gesture. In: Beate Hampe (ed.), From Perception to Mean-
ing: Image Schemas in Cognitive Linguistics, 421–442. Berlin: De Gruyter Mouton.
Cienki, Alan and Irene Mittelberg 2013. Creativity in the forms and functions of gestures with
speech. In: Tony Veale, Kurt Feyaerts and Charles Forceville (eds.), The Agile Mind: Creativity
in Discourse and Art, 231–252. Berlin: De Gruyter Mouton.
Cienki, Alan and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John
Benjamins.
Coulson, Seana 2001. Semantic Leaps: Frame-Shifting and Conceptual Blending in Meaning Con-
struction. Cambridge: Cambridge University Press.
Damasio, Antonio R. 1994. Descartes’ Error: Emotion, Reason, and the Human Brain. New York:
Putnam and Sons.
Danaher, David S. 1998. Peirce’s semiotic and cognitive metaphor theory. Semiotica 119 (1/2):
171–207.
Dancygier, Barbara and Eve E. Sweetser 2005. Mental Spaces in Grammar: Conditional Construc-
tions. Cambridge: Cambridge University Press.
Dancygier, Barbara and Eve E. Sweetser 2012. Viewpoint in Language: A Multimodal Perspective.
Cambridge: Cambridge University Press.
Dirven, Rene and Ralf Pörings (eds.) 2002. Metaphor and Metonymy in Comparison and Contrast.
Berlin: De Gruyter Mouton.
Dudis, Paul 2004. Body partitioning and real-space blends. Cognitive Linguistics 15(2): 223–238.
Enfield, N. J. 2009. The Anatomy of Meaning. Speech, Gestures, and Composite Utterances. Cam-
bridge: Cambridge University Press.
Evola, Vito 2010. Multimodal cognitive semiotics of spiritual experiences: Beliefs and metaphors
in words, gestures, and drawings. In: Fey Parrill, Vera Tobin and Mark Turner (eds.), Form,
Meaning, and Body, 41–60. Stanford, CA: CSLI Publications.
Fillmore, Charles J. 1982. Frame semantics. In: Linguistic Society of Korea (ed.), Linguistics in the
Morning Calm, 111–137. Seoul: Hanshin.
Fricke, Ellen 2007. Origo, Geste und Raum – Lokaldeixis im Deutschen. Berlin: De Gruyter Mouton.
Gallese, Vittorio and George Lakoff 2005. The brain’s concepts: The role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology 22: 455–479.
Gibbs, Raymond W., Jr. 1994. The Poetics of Mind: Figurative Thought, Language, and Under-
standing. Cambridge: Cambridge University Press.
Gibbs, Raymond W., Jr. 2006. Embodiment and Cognitive Science. New York: Cambridge Univer-
sity Press.
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and
Elena T. Levy (eds.), Gesture and the Dynamic Dimensions of Language, 195–212. Amsterdam:
John Benjamins.
Grady, Joseph 1997. Foundations of meaning: Primary metaphors and primary scenes. Ph.D. dis-
sertation, University of California at Berkeley.
Grandhi, Sukeshini A., Gina Joue and Irene Mittelberg 2011. Understanding naturalness and in-
tuitiveness in gesture production: Insights for touchless gestural interfaces. Proceedings of the
ACM 2011 Conference on Human Factors in Computing Systems (CHI), Vancouver, BC.
Grandhi, Sukeshini A., Gina Joue and Irene Mittelberg 2012. To move or to remove? A human-
centric approach to understanding of gesture interpretation. Proceedings of the 10th ACM con-
ference on Designing Interactive Systems. Newcastle: ACM Press.
Hampe, Beate (ed.) 2005. From Perception to Meaning: Image Schemas in Cognitive Linguistics.
Berlin: De Gruyter Mouton.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Unpub-
lished Ph.D. dissertation, University of Bordeaux, France.
Hassemer, Julius, Gina Joue, Klaus Willmes and Irene Mittelberg 2011. Dimensions and mechan-
isms of form constitution: Towards a formal description of gestures. Proceedings of the GE-
SPIN 2011 Gesture in Interaction Conference. Bielefeld: Zentrum für interdisziplinäre
Forschung.
Haviland, John 2000. Pointing, gesture spaces, and mental maps. In: David McNeill (ed.), Lan-
guage and Gesture, 13–46. Cambridge: Cambridge University Press.
Hiraga, Masako K. 2005. Metaphor and Iconicity: A Cognitive Approach to Analysing Texts. Ba-
singstoke: Palgrave-MacMillan.
Holler, Judith and Katie Wilkin 2011. Co-speech gesture mimicry in the process of collaborative
referring during face-to-face dialogue. Journal of Nonverbal Behavior 35: 133–153.
Hutchins, Edwin 1995. Cognition in the Wild. Cambridge: Massachusetts Institute of Technology
Press.
Jakobson, Roman 1956. Two aspects of language and two types of aphasic disturbances. In: Linda
R. Waugh and Monique Monville-Burston (eds.), Roman Jakobson – On Language, 115–133. Cam-
bridge, MA: Belknap Press of Harvard University Press.
Jakobson, Roman 1960. Linguistics and poetics. In: Krystyna Pomorska and Stephen Rudy (eds.),
Roman Jakobson – Language in Literature, 62–94. Cambridge, MA: Belknap Press of Harvard
University Press.
Jakobson, Roman 1966. Quest for the essence of language. In: Linda R. Waugh and Monique
Monville-Burston (eds.), Roman Jakobson – On Language, 407–421. Cambridge, MA: Belknap
Press of Harvard University Press.
Jakobson, Roman 1987. On the relation of auditory and visual signs. In: Krystyna Pomorska and
Stephen Rudy (eds.), Language in Literature, 467–473. Cambridge, MA: Belknap Press of Har-
vard University Press.
Jakobson, Roman and Krystyna Pomorska 1983. Dialogues. Cambridge: Massachusetts Institute of
Technology Press.
Jakobson, Roman and Linda R. Waugh 2002. The Sound Shape of Language. Berlin: De Gruyter
Mouton. First published [1979].
Johnson, Mark 1987. The Body in the Mind. The Bodily Basis of Meaning, Imagination, and Rea-
son. Chicago: University of Chicago Press.
Johnson, Mark 1992. Philosophical implications of cognitive semantics. Cognitive Linguistics 3:
345–366.
Johnson, Mark 2005. The philosophical significance of image schemas. In: Beate Hampe (ed.),
From Perception to Meaning: Image Schemas in Cognitive Linguistics, 15–33. Berlin: De Gruy-
ter Mouton.
Johnson, Mark 2007. The Meaning of the Body: Aesthetics of Human Understanding. Chicago: University of Chicago Press.
Kendon, Adam 2000. Language and gesture: Unity or duality. In: David McNeill (ed.), Language
and Gesture: Window into Thought and Action, 47–63. Cambridge: Cambridge University
Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kita, Sotaro 2003. Pointing: Where Language, Culture, and Cognition Meet. Mahwah, NJ: Lawrence
Erlbaum.
Kockelman, Paul 2005. The semiotic stance. Semiotica 157(1/4): 233–304.
Krois, John M., Mats Rosengren, Angela Steidele and Dirk Westerkamp (eds.) 2007. Embodiment
in Cognition and Culture. Amsterdam: John Benjamins.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Lakoff, George and Mark Johnson 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, George and Mark Johnson 1999. Philosophy in the Flesh. The Embodied Mind and Its
Challenge to Western Thought. New York: Basic Books.
Langacker, Ronald W. 1993. Reference-point constructions. Cognitive Linguistics 4: 1–38.
Lebaron, Curtis and Jürgen Streeck 2000. Gestures, knowledge and the world. In: David McNeill
(ed.), Language and Gesture, 118–138. Cambridge: Cambridge University Press.
Liddell, Scott 2003. Grammar, Gesture and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
Mandel, Mark 1977. Iconic devices in American Sign Language. In: Lynn Friedman (ed.), On the
Other Hand. New Perspectives on American Sign Language, 57–107. New York: Academic
Press.
Mandler, Jean M. 1996. Preverbal representation and language. In: Paul Bloom, Mary A. Peter-
son, Lynn Nadel and Merrill F. Garrett (eds.), Language and Space, 365–384. Cambridge,
MA: Massachusetts Institute of Technology Press.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Merleau-Ponty, Maurice 1962. Phenomenology of Perception. New York: Humanities Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University. Ann Arbor, MI: UMI.
Mittelberg, Irene 2007. Methodology for multimodality: One way of working with speech and ges-
ture data. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson and Michael Spivey
(eds.), Methods in Cognitive Linguistics, 225–248. Amsterdam: John Benjamins.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Gesture,
115–154. Amsterdam: John Benjamins.
Mittelberg, Irene 2010a. Geometric and image-schematic patterns in gesture space. In: Vyvyan
Evans and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and
New Directions, 351–385. London: Equinox.
Mittelberg, Irene 2010b. Interne und externe Metonymie: Jakobsonsche Kontiguitätsbeziehungen
in redebegleitenden Gesten. Sprache und Literatur 41(1): 112–143.
Mittelberg, Irene and Linda R. Waugh 2009. Metonymy first, metaphor second: A cognitive-semi-
otic approach to multimodal figures of thought in co-speech gesture. In: Charles Forceville and
Eduardo Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Berlin Verlag.
Müller, Cornelia 2004. Forms and uses of the palm up open hand: A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday Gesture:
The Berlin Conference, 233–256. Berlin: Weidler Verlag.
Müller, Cornelia 2008. Metaphors: Dead and Alive, Sleeping and Waking. A Dynamic View. Chicago: University of Chicago Press.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. Sprache und Literatur 41: 37–68.
Müller, Cornelia and Alan Cienki 2009. Words, gestures and beyond: Forms of multimodal met-
aphor in the use of spoken language. In: Charles Forceville and Eduardo Urios-Aparisi (eds.),
Multimodal Metaphor, 297–328. Berlin: De Gruyter Mouton.
Núñez, Rafael 2008. A fresh look at the foundations of mathematics: Gesture and the psycholog-
ical reality of conceptual metaphor. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and
Gesture, 93–114. Amsterdam: John Benjamins.
Panther, Klaus-Uwe and Linda L. Thornburg 2004. The role of conceptual metonymy in meaning
construction. Metaphorik.de 06: 91–113.
Peirce, Charles Sanders 1955. Logic as semiotic: The theory of signs. In: Justus Buchler (ed.), Philosophical Writings of Peirce, 98–119. New York: Dover.
Peirce, Charles Sanders 1960. Collected Papers of Charles Sanders Peirce (1931–1958). Vol. I.:
Principles of Philosophy, Vol. II: Elements of Logic, edited by Charles Hartshorne and Paul
Weiss. Cambridge, MA: Belknap Press of Harvard University Press.
Peirsman, Yves and Dirk Geeraerts 2006. Metonymy as a prototypical category. Cognitive Lin-
guistics 17(3): 269–316.
Radden, Günther and Zoltan Kövecses (eds.) 1999. Metonymy in Language and Thought. Amster-
dam: John Benjamins.
Rizzolatti, Giacomo and Laila Craighero 2004. The mirror-neuron system. Annual Review of Neu-
roscience 27: 169–192.
Saussure, Ferdinand de 1986. Course in General Linguistics. 3rd edition. Translated by Roy Harris.
Chicago: Open Court.
Shapiro, Michael 1983. The Sense of Grammar: Language as Semeiotic. Bloomington: Indiana
University Press.
Skipper, Jeremy I., Susan Goldin-Meadow, Howard C. Nusbaum and Steven Small 2009. Gestures
orchestrate brain networks for language understanding. Current Biology 19: 661–667.
Sonesson, Göran 2007. The extensions of man revisited: From primary to tertiary embodiment. In:
John M. Krois, Mats Rosengren, Angela Steidele and Dirk Westerkamp (eds.), Embodiment in
Cognition and Culture, 27–53. Amsterdam: John Benjamins.
Streeck, Jürgen 2009. Gesturecraft: The Manu-facture of Meaning. Amsterdam: John Benjamins.
Sweetser, Eve E. 1990. From Etymology to Pragmatics: Metaphorical and Cultural Aspects of
Semantic Structure. Cambridge: Cambridge University Press.
Sweetser, Eve E. 1998. Regular metaphoricity in gesture: Bodily-based models of speech interac-
tion. Actes du 16e Congrès International des Linguistes (CD-ROM), Elsevier.
Sweetser, Eve E. 2007. Looking at space to study mental spaces: Co-speech gesture as a crucial
data source in cognitive linguistics. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana
Coulson and Michael Spivey (eds.), Methods in Cognitive Linguistics, 202–224. Amsterdam:
John Benjamins.
Sweetser, Eve E. 2012. Viewpoint and perspective in language and gesture. Introduction to Bar-
bara Dancygier and Eve Sweetser (eds.), Viewpoint in Language: A Multimodal Perspective,
1–22. Cambridge: Cambridge University Press.
Talmy, Leonard 1988. Force dynamics in language and cognition. Cognitive Science 12: 49–100.
Talmy, Leonard 2000. Toward a Cognitive Semantics. Volume 1 and 2. Cambridge: Massachusetts
Institute of Technology Press.
Taub, Sarah 2001. Language from the Body: Iconicity and Metaphor in American Sign Language.
Cambridge: Cambridge University Press.
Turner, Mark (ed.) 2006. The Artful Mind: Cognitive Science and the Riddle of Human Creativity.
New York: Oxford University Press.
Varela, Francisco, Evan Thompson and Eleanor Rosch 1992. The Embodied Mind: Human Cog-
nition and Experience. Cambridge: Massachusetts Institute of Technology Press.
Waugh, Linda R. 1976. Roman Jakobson’s Science of Language. Lisse: Peter de Ridder.
Waugh, Linda R. 1994. Degrees of iconicity in the lexicon. Journal of Pragmatics 22(1): 55–70.
Waugh, Linda R. and Monique Monville-Burston 1990. Introduction: The life, work and influ-
ence of Roman Jakobson. In: Linda R. Waugh and Monique Monville-Burston (eds.),
Roman Jakobson – On Language, 1–45. Cambridge, MA: Belknap Press of Harvard Univer-
sity Press.
Wilcox, Phyllis P. 2004. A cognitive key: Metonymic and metaphorical mappings in ASL. Cogni-
tive Linguistics 15(2): 197–222.
Wilcox, Sherman 2004. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed lan-
guages. Cognitive Linguistics 15(2): 119–147.
Williams, Robert F. 2008. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Zlatev, Jordan 2005. What’s in a schema? Bodily mimesis and the grounding of language. In: Beate
Hampe (ed.), From Perception to Meaning: Images Schemas in Cognitive Linguistics, 313–342.
Berlin: De Gruyter Mouton.
Zlatev, Jordan, Timothy P. Racine, Chris Sinha and Esa Itkonen (eds.) 2008. The Shared Mind:
Perspectives on Intersubjectivity. Amsterdam: John Benjamins.
48. Articulation as gesture: Gesture and the nature of language

Abstract
This chapter describes an approach to the study of spoken and signed languages, as well
as communicative gestures, as the production and perception of coordinated movements,
or articulatory gestures. Such an approach derives from usage-based theories of language
as performance, and proposes that dynamic system theory can serve as the framework for
understanding language and cognition as action. One implication is a new understanding
of the relation between signed languages and gesture as systems of articulatory gesture.
The articulation as gesture framework also has implications for neuroscience, supporting
the idea that cognition and perception must be intimately linked in an embodied mind,
and suggesting that thinking is itself the evolutionary internalization of movement.
1. Introduction
In order to communicate, animals must create perceptible signals. For human commu-
nication, the predominant way in which signals are produced is by moving parts of our
bodies. For speech, the means of production is predominantly restricted to the vocal
tract. For signed languages and gestural communication, much more of the body is
used, including the hands, face, and body postures.
As Neisser (1967: 156) observed:
To speak is to make finely controlled movements in certain parts of your body, with the
result that information about these movements is broadcast to the environment. For this
reason, the movements of speech are sometimes called articulatory gestures. A person
who perceives speech, then, is picking up information about a certain class of real, physical,
tangible (as we shall see) events that are occurring in someone’s mouth.
This view of speech has given rise to a radically new way of conceptualizing language,
specifically phonology, most succinctly explained by Browman and Goldstein (1985: 35):
Much linguistic phonetic research has attempted to characterize phonetic units in terms of
measurable physical parameters or features. Basic to these approaches is the view that
phonetic description consists of a linear sequence of static physical measures, either articu-
latory configurations or acoustic parameters. The course of movement from one such con-
figuration to another has been viewed as secondary. We have proposed an alternative
approach, one that characterizes phonetic structure as patterns of articulatory movement,
or gestures, rather than static configurations. While the traditional approaches have viewed
the continuous movement of vocal-tract articulators over time as “noise” that tends to
obscure the segment-like structure of speech, we have argued that setting out to character-
ize articulatory movement directly leads not to noise but to organized spatiotemporal
structures that can be used as the basis for phonological generalizations as well as accurate
physical description. In our view, then, a phonetic representation is a characterization of
how a physical system (e.g., a vocal tract) changes over time.
This view led to the development of a branch of linguistics called articulatory phonology
(Browman and Goldstein 1986, 1990, 1992).
2. Language as performance
Under this view, the basic units of speech are articulatory gestures, where gesture is de-
fined as a functional unit, an equivalence class of coordinated movements that achieve
some end (Studdert-Kennedy 1987: 77). These functionally-defined ends, or tasks, are
modeled in terms of task dynamics (Hawkins 1992). The vocal tract, for example, is not used solely for producing speech: the speech organs serve not only for talking but also for chewing, singing, eating, or biting. Each of these activities is a task. In modeling
speech, the task may be the formation of a constriction such as bilabial closure; this task
involves the coordinated action of several articulators, such as the lower lip, upper lip,
and jaw.
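To make the task-dynamic idea more concrete, the following minimal sketch (not drawn from the chapter or from the cited literature) models a single functional tract variable, lip aperture, as the critically damped mass-spring system commonly used in task-dynamic accounts of such constriction tasks; the function name simulate_closure and all parameter values are illustrative assumptions.

# Minimal illustrative sketch: a speech task such as bilabial closure modeled as a
# critically damped mass-spring system driving a functional "tract variable"
# (here, lip aperture) toward its target value. All parameter values are assumptions.

def simulate_closure(aperture0=10.0, target=0.0, k=200.0, m=1.0, dt=0.001, steps=400):
    """Integrate m*x'' + b*x' + k*(x - target) = 0 with critical damping (no overshoot)."""
    b = 2.0 * (m * k) ** 0.5          # critical damping coefficient
    x, v = aperture0, 0.0             # initial aperture and velocity
    trajectory = [x]
    for _ in range(steps):
        a = (-b * v - k * (x - target)) / m   # acceleration from the spring-damper law
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

if __name__ == "__main__":
    traj = simulate_closure()
    # The aperture shrinks smoothly toward the closure target over the simulated interval.
    print("start %.2f  mid %.2f  end %.2f" % (traj[0], traj[len(traj) // 2], traj[-1]))

Under these assumptions, the gesture is defined by the functional task of reaching the closure target, not by the trajectory of any single articulator.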
The articulatory phonology framework has significance for language and gesture that
goes far beyond describing speech tasks. Other articulators may be modeled in this way
as well. For example, the arm and hand can be used to reach for a cup, scratch your head, or produce a sign or gesture. Whether for speech, sign, gesture, or motor activ-
ities unrelated to communication, tasks require the coordinated action of multiple ar-
ticulators moving appropriately in time and space. These coordinated actions, called
coordinative structures, are not hardwired; rather, they are emergent structures in a
dynamically changing system.
Another significant aspect of this framework is that it unifies description over levels
that in other theories are seen as distinct. For example, formalist theories such as
Chomsky’s minimalist program assume that universal grammar “specifies certain lin-
guistic levels, each a symbolic system” (1995: 167). One such level is a computational
system that generates structural descriptions; these structural descriptions are in turn
seen as instructions that are fed into another level, the articulatory-perceptual
performance system, which specifies how the expression is to be articulated.
Rather than viewing the units of language – whether they are phonemes, syllables,
morphemes, words or formalist structural descriptions – as timeless and non-physical
(i.e., mental) units which must be passed to and implemented in a performance system,
the dynamic view defines language “in a unitary way across both abstract ‘planning’ and
concrete articulatory ‘production’ levels” (Kelso, Saltzman, and Tuller 1986: 31). Thus,
the distinction between competence and performance, which plays such a large theoret-
ical role in generative linguistics, is collapsed into a single system described not in the
machine vocabulary of mental programs and computational systems, but in terms of a
“fluid, organic system with certain thermodynamic properties” (Thelen and Smith 1994:
xix). As Thelen and Smith go on to observe, the distinction between competence and
performance does not make biological sense: “Abstract formal constraints are fine for
disembodied logical systems. But people are biological entities; they are embedded, liv-
ing process. If competence in the Chomskyan sense is part of our biology, then it must
also be embodied in living, real-time process” (Thelen and Smith 1994: 27). The artic-
ulation as gesture framework provides the theoretical basis for describing language as
a dynamic, real-time process.
The abstractionist solution removes all traces of language as a physical system and
instead views language as a formal system of abstract rules. This solution strips away
the performance of language by means of vocal tracts, hands, faces, and the anatomy
and musculature that controls these articulators. The question of how these non-
physical and timeless mental units are implemented in a physical production system
is left largely unanswered. Likewise, perceptual systems play no part in understanding
language from this perspective because the physical execution of language is removed
from consideration under the abstractionist solution.
The embodied solution claims that language, whether spoken or signed, and indeed
all communication, is made possible because we have physical bodies which we move to
produce signals. In this view, language is the performance of a physical system. Whether
spoken or signed, languages are systems of meaningful movement. While the percep-
tible signals are distinct – acoustic for speech and optical for signs – the distal events
are the same: spoken and signed languages are produced by dynamically organized
articulatory gestures.
Finally, just as the articulation as gesture framework offers a way to unify descrip-
tions across spoken and signed languages, it also extends to the description of the articu-
latory movements that constitute meaningful gestures of the type described by gesture
researchers (Cienki and Müller 2008; Kendon 2004; McNeill 2005). This unification of
three communication systems – speech, sign, and gesture – has significant implications
for the historical development of signed languages, for our understanding of the cog-
nitive and neural underpinnings of language and gesture, and for the evolution of
language.
It is doubtful that sign can engender thought. It is concrete. It is not truly connected with
feeling and thought. (…) It lacks precision. (…) Sign cannot convey number, gender, per-
son, time, nouns, verbs, adverbs, adjectives, he claims. (…) It does not allow [the teacher] to
raise the deaf-mute above his sensations. (…) Since signs strike the senses materially they
cannot elicit reasoning, reflection, generalization, and above all abstraction as powerfully
as can speech. (Lane 1984: 388)
As linguists began to study and understand signed languages as true human language,
the pendulum began to swing in the opposite direction. Signed languages were seen
as categorically distinct from gesture.
The articulation as gesture framework suggests that a third option is possible: non-
linguistic, everyday gestures may become incorporated into the linguistic system of
signed languages through the process known as grammaticalization (Bybee, Perkins,
and Pagliuca 1994; Heine, Claudi, and Hünnemeyer 1991; Heine and Kuteva 2007).
In a series of publications, Wilcox (2004, 2009; Wilcox and Wilcox 2010) has described
two routes by which this takes place.
In one route, a manual gesture enters a signed language as a lexical sign and then devel-
ops into a grammatical morpheme. For example, Janzen and Shaffer (2002) have suggested
that a gesture signaling departure that has been in common use in the Mediterranean
region for centuries was incorporated into French Sign Language as the lexical sign PAR-
TIR “to depart”. This sign appears as a lexical sign meaning “to leave” in American Sign
Language at the turn of the 20th century, but also as a grammatical sign meaning “future”.
Similarly, modal forms in American Sign Language and in Catalan Sign Language can be
traced to gestures that were incorporated into the language and then grammaticalized
from lexical to grammatical forms (Wilcox and Wilcox 1995; Wilcox 2004).
In the second route, the source is not the manual gesture itself; rather, it is the way
that the gesture is produced, its quality or manner of movement, as well as various
facial, mouth, and eye gestures that may accompany a manual gesture or sign. Upon
entering the linguistic system, these manner of movement and facial gestures follow
a developmental path from prosody/intonation to grammatical marker. For example,
gestural manner of movement is incorporated into Italian Sign Language (LIS) in the
expression of modal strength signifying various degrees of impossibility (Wilcox, Ros-
sini, and Antinoro Pizzuto 2010). Pizzuto (1987) observed that verb aspect is expressed
in Italian Sign Language through systematic alterations of the verb’s movement pattern;
for example, “suddenness” is expressed by means of a tense, fast, short movement,
while a verb produced with an elongated, elliptical, large and slow movement marks
an action that is repeated over and over in time or takes place repeatedly. Verb aspect
in American Sign Language is also marked by changes to the temporal dynamics of a
verb’s movement (Klima and Bellugi 1979).
The significance of this work is twofold. First, it suggests that gesture and language
may not be as distinct as previous linguistic theories suggested. A model that views lan-
guage and gesture as physical systems is able to account for this more unified perspec-
tive. Second, it suggests that the distinction between linguistic levels of analysis – phonetics (prosody/intonation) and grammar – is also better seen as a continuum. If
this is true, then the dynamic approach to language as a physical system applies not
just to phonetics and phonology but to grammar as well.
7. References
Armstrong, David F., William C. Stokoe and Sherman Wilcox 1995. Gesture and the Nature of
Language. Cambridge: Cambridge University Press.
Armstrong, David F. and Sherman Wilcox 2007. The Gestural Origin of Language. Oxford: Oxford
University Press.
Berthoz, Alain 2000. The Brain’s Sense of Movement. Cambridge, MA: Harvard University Press.
Browman, Catherine P. and Louis M. Goldstein 1985. Dynamic modeling of phonetic structure. In:
Victoria A. Fromkin (ed.), Phonetic Linguistics, 35–53. New York: Academic Press.
Browman, Catherine P. and Louis M. Goldstein 1986. Towards an articulatory phonology. Phonol-
ogy Yearbook 3: 219–252.
Browman, Catherine P. and Louis M. Goldstein 1990. Representation and reality: Physical systems
and phonological structure. Journal of Phonetics 18(3): 411–424.
Browman, Catherine P. and Louis M. Goldstein 1992. Articulatory phonology. Phonetica 49:
155–180.
Bybee, Joan, Revere Perkins and William Pagliuca 1994. The Evolution of Grammar: Tense,
Aspect, and Modality in the Languages of the World. Chicago: University of Chicago Press.
Chomsky, Noam 1995. The Minimalist Program. Cambridge: Massachusetts Institute of Technol-
ogy Press.
Cienki, Alan J. and Cornelia Müller (eds.) 2008. Metaphor and Gesture. Amsterdam: John
Benjamins.
Edelman, Gerald M. 1987. Neural Darwinism: The Theory of Neuronal Group Selection. New
York: Basic Books.
Edelman, Gerald M. 1989. The Remembered Present: A Biological Theory of Consciousness. New
York: Basic Books.
Hawkins, Sarah 1992. An introduction to task dynamics. In: Gerard J. Docherty and Robert D.
Ladd (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 9–25. Cambridge:
Cambridge University Press.
Heine, Bernd, Ulrike Claudi and Friederike Hünnemeyer 1991. Grammaticalization: A Concep-
tual Framework. Chicago: University of Chicago Press.
Heine, Bernd and Tania Kuteva 2007. The Genesis of Grammar: A Reconstruction. Oxford:
Oxford University Press.
Janzen, Terry and Barbara Shaffer 2002. Gesture as the substrate in the process of ASL gram-
maticization. In: Richard P. Meier, Kearsy Cormier and David Quinto-Pozos (eds.), Modality and
Structure in Signed and Spoken Languages, 199–223. Cambridge: Cambridge University
Press.
Kelso, J. A. Scott 1995. Dynamic Patterns: The Self-Organization of Brain and Behavior. Cam-
bridge: Massachusetts Institute of Technology Press.
Kelso, J. A. Scott, Elliot L Saltzman and Betty Tuller 1986. The dynamical perspective on speech
production: Data and theory. Journal of Phonetics 14: 29–59.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University
Press.
King, Barbara J. 2004. The Dynamic Dance: Nonvocal Communication in African Great Apes.
Cambridge, MA: Harvard University Press.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard Uni-
versity Press.
Lane, Harlan 1984. When the Mind Hears: A History of the Deaf. New York: Random House.
Llinás, Rodolfo 2001. I of the Vortex: From Neurons to Self. Cambridge: Massachusetts Institute of
Technology Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Neisser, Ulrich 1967. Cognitive Psychology. New York: Appleton-Century-Crofts.
Noë, Alva 2004. Action in Perception. Cambridge: Massachusetts Institute of Technology Press.
Pizzuto, Elena 1987. Aspetti morfosintattici. In: Virginia Volterra (ed.), La Lingua Italiana dei
Segni – La Comunicazione Visivo-Gestuale dei Sordi, 179–209. Bologna: Il Mulino.
Saltzman, Elliot and J.A. Scott Kelso 1987. Skilled actions: A task-dynamic approach. Psycholog-
ical Review 94(1): 84–106.
Smith, Linda B. 2005. Cognition as a dynamic system: Principles from embodiment. Developmen-
tal Review 25(3–4): 278–298.
Studdert-Kennedy, Michael 1987. The phoneme as a perceptuomotor structure. In: Alan Allport,
Donald G. Mackay, Wolfgang Prinz and Eckart Scheerer (eds.), Language Perception and Pro-
duction: Relationships between Listening, Speaking, Reading, and Writing, 67–84. London:
Academic Press.
Swinnen, Stephan P. and Nicole Wenderoth 2004. Two hands, one brain: Cognitive neuroscience of
bimanual skill. Trends in Cognitive Sciences 8(1): 18–25.
Thelen, Esther and Linda B. Smith 1994. A Dynamic Systems Approach to the Development of
Cognition and Action. Cambridge: Massachusetts Institute of Technology Press.
Tyrone, Martha E., Hosung Nam, Elliot Saltzman, Gaurav Mathur and Louis Goldstein 2010. Prosody and movement in American Sign Language: A task-dynamics approach. Proceedings of the 5th Speech Prosody International Conference, Chicago.
Wilcox, Sherman 1992. The Phonetics of Fingerspelling. Amsterdam: John Benjamins.
Wilcox, Sherman 2004. Gesture and language: Cross-linguistic and historical data from signed lan-
guages. Gesture 4(1): 43–75.
Wilcox, Sherman 2009. Symbol and symptom: Routes from gesture to signed language. Annual
Review of Cognitive Linguistics 7: 89–110.
Wilcox, Sherman, Paolo Rossini and Elena Antinoro Pizzuto 2010. Grammaticalization in sign
languages. In: Diane Brentari (ed.), Sign Languages, 332–354. Cambridge: Cambridge Univer-
sity Press.
Wilcox, Sherman and Phyllis Wilcox 1995. The gestural expression of modality in American Sign
Language. In: Joan Bybee and Suzanne Fleischman (eds.), Modality in Grammar and Dis-
course, 135–162. Amsterdam/Philadelphia: John Benjamins.
Wilcox, Sherman and Phyllis Wilcox 2010. The analysis of signed languages. In: Bernd Heine and
Heiko Narrog (eds.), The Oxford Handbook of Linguistic Analysis, 739–760. Oxford: Oxford
University Press.
49. How our gestures help us learn

Abstract
The gestures we produce when we talk reflect our thoughts. But the hand movements that
accompany speech can do more – they can change the way we think. Gesture can bring
about cognitive change through its effects on the learning environment – learners’
gestures signal to their communication partners that they are ready to learn a particular
task and their partners alter their input accordingly. Gesture can also bring about change
directly through its effects on learners themselves – learners’ gestures can bring
new implicit knowledge into their repertoires, bring action information into their mental
representations, and lighten their cognitive load. Whatever the mechanism, it is clear that
the way we move our hands when we speak is not mere handwaving – our hands can
change how we think and, as such, have the potential to be harnessed in teaching and
learning situations.
1. Introduction
People move their hands when they talk – they gesture. Even congenitally blind speak-
ers who have never seen anyone gesture move their hands when they speak (Iverson
and Goldin-Meadow 1998). Although the gestures that accompany speech might, at
times, appear to be meaningless movements, they are not mere handwaving. Gestures
are synchronized, both semantically and temporally, with the words they accompany
(Kendon 1980; McNeill 1992) and, in this sense, form an integrated system with speech.
The goal of this chapter is to demonstrate that the relation gesture holds to speech
has cognitive implications. Learners who are on the verge of making progress on a task
gesture differently from learners who are not as far along in their thinking – they pro-
duce gestures that convey information that is different from the information they
convey in their speech (Goldin-Meadow 2003). These so-called “gesture-speech mis-
matches” are a signal that the learner is ready to learn that particular task and, in
fact, learners who produce gesture-speech mismatches show greater gains from instruc-
tion on the task than children who produce only matches (Church and Goldin-Meadow
1986; Perry, Church, and Goldin-Meadow 1988; Pine, Lufkin, and Messer 2004). Ges-
tures, when evaluated in relation to the speech they accompany, thus reflect a learner’s
cognitive state.
There is, in addition, recent work suggesting that gesture can do more than reflect
the state of a learner’s knowledge – it can act as a catalyst for change and, as such,
play a causal role in learning. We begin with a look at the evidence suggesting that ges-
ture is associated with learning and reflects what learners know. We then turn to new
findings demonstrating that gesture can play an active role in learning, and we explore
the mechanisms by which gesture has its effects.
incorrect answer by producing gestures that convey the same information as his speech
(e.g., saying, “I added the 5, the 3, and the 6, and put 14 in the blank,” while pointing at
the 5, the 3, and the 6 on the left side of the equation, i.e., producing an add-to-equal-
sign strategy in both speech and gesture; Perry, Church, and Goldin-Meadow 1988;
Alibali and Goldin-Meadow 1993).
This phenomenon turns out to be a general one, found not only in 9- and 10-year-olds
learning math problems and in toddlers learning to produce two-word sentences, but also
in 5- to 8-year-olds learning to solve conservation problems (Church and Goldin-
Meadow 1986); in 5- to 6-year-olds learning to mentally rotate objects (Ehrlich, Levine,
and Goldin-Meadow 2006); in 5-year-olds learning to balance blocks on a beam (Pine,
Lufkin, and Messer 2004); and in adults learning how gears work (Perry and Elder 1997).
There is evidence that children’s gestures can elicit targeted input in tasks other than
language learning. Adults interacting with children solving math problems change the
input they give children as a function of the gestures that the children produce.
Goldin-Meadow and Singer (2003) asked teachers to interact individually with children
who could not yet solve the mathematical equivalence problems. They found that the
teachers gave different kinds of instruction to children who produced gesture-speech
mismatches than they gave to children who produced only gesture-speech matches.
In particular, the teachers offered a greater variety of problem-solving strategies in
speech to children who produced mismatches than to children who produced matches.
Teachers also produced more mismatches of their own – typically containing two cor-
rect strategies, one in speech and the other in gesture – when teaching children who pro-
duced mismatches than when teaching children who produced matches. Thus, teachers
do notice the gestures learners produce and they change their instruction accordingly.
Does the tailored input teachers give math learners promote learning? To find out,
Singer and Goldin-Meadow (2005) designed a math lesson based on the instruction that
teachers spontaneously gave children who produced mismatches. In particular, the les-
son included either one correct strategy (equalizer) or two correct strategies (equalizer
and add-subtract) in speech; in addition, the instruction either contained no gestures at
all, matching gestures, or mismatching gestures. There were thus six different training
groups. Interestingly, including more than one strategy in speech in the lesson turned
out to be an ineffective teaching strategy – children improved significantly more after
the lesson if they had been given one strategy in speech than if they had been given
two. But including mismatches in the lesson was very effective – children improved sig-
nificantly more after the lesson if their lesson included mismatching gestures than if it
included matching gestures or no gestures at all. The lesson that was most effective con-
tained the equalizer strategy in speech, combined with the add-subtract strategy in ges-
ture (e.g., “to solve this problem, you need to make one side equal to the other side,”
said while pointing at the three numbers on the left side of the equation and then pro-
ducing a “take away” gesture under the number on the right side). In other words, a
lesson containing two strategies was particularly effective, but only if the two strategies
were produced in different modalities. Including gesture in instruction has, in general,
been found to promote learning in a variety of tasks (Church, Ayman-Nolley, and
Mahootian 2004; Perry, Berch, and Singleton 1995; Ping and Goldin-Meadow 2008;
Valenzeno, Alibali, and Klatzky 2003).
Taken together, the findings suggest that the gestures learners produce convey mean-
ing that is accessible to their communication partners. The partners, in turn, alter the way
they respond to a learner as a function of that learner’s gestures. Learners then profit
from those responses, which they elicited through their gestures. Gesture can thus play
a causal role in learning indirectly through the effect it has on the learning environment.
Cook and Goldin-Meadow (2006) found that children were more likely to gesture during a lesson when their
teacher gestured. Importantly, those children who gestured during the lesson were
more likely to profit from the lesson than those who did not gesture. Gesturing can
help children get the most out of a lesson.
The children in the Cook and Goldin-Meadow (2006) study were not forced to ges-
ture – they chose to. Thus, the children who chose to gesture may have been more ready
to learn than the children who chose not to gesture. If so, the fact that they reproduced
the experimenter’s gestures may have been a reflection of that readiness to learn, rather
than a causal factor in the learning itself. To address this concern, gesture needs to be
manipulated more directly – all of the children in the gesture group must reproduce the
experimenter’s hand movements during the lesson.
Cook, Mitchell and Goldin-Meadow (2008) solved this problem by teaching children
words and hand movements prior to the math lesson, and then asking the children to
reproduce those words and/or gestures during the lesson itself. All of the children
were then given the same lesson in mathematical equivalence. The only difference
among the groups during the lesson was the children’s own behavior – the children
repeated the words and/or hand movements they were taught before and after each
problem they were given to solve. These self-produced behaviors turned out to make
a big difference, not in how well the children did at post-test (children in all three
groups made equal progress right after the lesson), but in how long they retained the
knowledge they had been taught – children who were told to produce gestures (with
or without speech) during the lesson performed significantly better than children who
were told to produce only speech on a follow-up test four weeks later. Thus, the chil-
dren’s own hand movements worked to cement what they had learned, suggesting
that gesture can play a role in knowledge change by making learning last.
The information that the children produced in gesture in the Cook, Mitchell and
Goldin-Meadow (2008) study (the equalizer strategy – the way to solve the problem
is to make one side of the equation equal to the other) was reinforced by the equalizer
information they heard in both speech and gesture during the lesson. Thus, their ges-
tures did not provide new information. To determine whether gesture can create new
ideas, Goldin-Meadow, Cook and Mitchell (2009) again taught children words and
hand movements to produce before the lesson began. But this time, the hand move-
ments instantiated a different strategy from the one conveyed in the words they were
taught. All of the children were taught to say the equalizer strategy in speech, “to
solve this problem, I need to make one side equal to the other side,” but some were
also taught to produce the grouping strategy in gesture (a V-hand under the 6+3 in
the problem 6+3+5=__+5, followed by a point at the blank; the correct answer can
be gotten by grouping and summing 6 and 3 in this problem). The children were re-
quired to produce the words or words+gestures they had been taught before and
after each problem they solved during the lesson, which did not include the grouping
strategy. Children who produced grouping in gesture learned more from the lesson
than children who did not. Since the teacher did not use the grouping strategy in either
gesture or speech, and the children only produced the grouping strategy in gesture and
not in speech, the strategy had to have come from the children’s own hands. Gesture
can thus introduce new knowledge into a child’s repertoire.
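To keep the strategies apart, a worked example may help. Applying the strategy labels used in this chapter to the problem cited above, 6 + 3 + 5 = __ + 5 (the worked arithmetic is mine):

\begin{align*}
\text{add-to-equal-sign (incorrect):}\quad & 6 + 3 + 5 = 14 \;\Rightarrow\; \text{blank} = 14\\
\text{equalizer:}\quad & \text{blank} + 5 = 6 + 3 + 5 \;\Rightarrow\; \text{blank} = 14 - 5 = 9\\
\text{add-subtract:}\quad & \text{blank} = (6 + 3 + 5) - 5 = 9\\
\text{grouping:}\quad & 5 \text{ appears on both sides} \;\Rightarrow\; \text{blank} = 6 + 3 = 9
\end{align*}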
Gesture can bring new knowledge into a child’s system if the child is told to pro-
duce particular hand movements. But learners are rarely told how to move their hands.
What would happen if children were told to move their hands but without instruction
about which movements to make? Broaders et al. (2007) addressed this question by
first asking children to solve six mathematical equivalence problems without any in-
structions about what to do with their hands. The children were then asked to solve a
second set of comparable problems but, this time, some children were told to move
their hands as they explained their solutions to the second set of problems; others
were told not to move their hands or given no instructions at all about what to do
with their hands. Children who were told to gesture on the second set of problems
added significantly more new strategies to their repertoires than children who were
told not to gesture and children given no instructions at all. Most of those strate-
gies were produced uniquely in gesture, not in speech, and, surprisingly, most were cor-
rect. The children who were told to gesture had been turned into mismatchers – they
produced information in gesture that was different from the information they produced
in speech. Were these created mismatchers also ready to learn? To find out, Broaders
et al. (2007) gave another group of children the same instructions to gesture or not
to gesture while solving a second set of mathematical equivalence problems, and
then gave all of the children a lesson in mathematical equivalence. Children told to ges-
ture again added more strategies to their repertoires after the second set of problems
than children told not to gesture. Moreover, children told to gesture showed more
improvement on the post-test than children told not to gesture, particularly if the chil-
dren had added strategies to their repertoires after being told to gesture. Being told to
gesture thus encouraged children to express new ideas that they had previously not
expressed, which, in turn, led to learning.
already had. But gesture might also be able to create new implicit knowledge, and thus
serve as a vehicle by which new information can be brought into the system. If gesture
serves only to activate implicit knowledge (as opposed to creating it), then asking lear-
ners to gesture (or having them observe gesture) should improve learning only for chil-
dren who already have implicit knowledge. However, if gesture can create new
knowledge, then gesture should also be effective for children who do not yet have
implicit knowledge.
To determine whether gesture affects learning by creating implicit knowledge or acti-
vating it, Cook and Goldin-Meadow (2009) reanalyzed data from previous studies, divid-
ing children into those who had implicit knowledge prior to the experimental
manipulation and those who did not. They used the gestures that children produced
prior to instruction, evaluated in relation to the accompanying speech, as a marker for
implicit knowledge. Children who produced at least some gestures that conveyed different
information from their speech (i.e., children who produced gesture-speech mismatches) on
a particular task were classified as “having implicit knowledge” with respect to that task.
Children whose gestures always conveyed the same information as their speech on a task
were classified as “not having implicit knowledge” with respect to that task.
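As a minimal sketch of how this classification criterion could be operationalized (the data structure and strategy labels below are hypothetical and are not the coding scheme of Cook and Goldin-Meadow 2009), each pre-instruction explanation is coded for the strategy expressed in speech and the strategy expressed in gesture, and a child counts as having implicit knowledge on the task if any explanation is a mismatch:

from typing import List, Optional, Tuple

# Each explanation is coded as (speech_strategy, gesture_strategy);
# None marks an explanation with no codable gesture.
Explanation = Tuple[str, Optional[str]]

def has_implicit_knowledge(explanations: List[Explanation]) -> bool:
    """A child is classified as 'having implicit knowledge' on a task if at
    least one pre-instruction explanation is a gesture-speech mismatch,
    i.e. the gesture conveys different information from the speech."""
    return any(gesture is not None and gesture != speech
               for speech, gesture in explanations)

# Hypothetical child: speech is always add-to-equal-sign, but one gesture
# conveys grouping, so the child is classified as a mismatcher.
child = [("add-to-equal-sign", "add-to-equal-sign"),
         ("add-to-equal-sign", "grouping")]
print(has_implicit_knowledge(child))  # True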
If gesture is merely activating implicit knowledge, as opposed to creating it, then ask-
ing learners to gesture (or having them observe gesture) should improve learning only
for children who already have implicit knowledge. However, if gesture can create new
knowledge, then gesture should also be effective for children who do not yet have
implicit knowledge. Cook and Goldin-Meadow (2009) found that gesture did, in fact,
lead to learning not only in children who had implicit knowledge, but also in children
who did not have implicit knowledge, suggesting that the gesture manipulations were
not merely activating implicit knowledge but were creating it.
In addition to pinning down the mechanism by which gesture affects learning, Cook
and Goldin-Meadow (2009) were able to explore whether having implicit knowledge
prepares children to profit from instruction. They found that instruction of any sort,
whether it contained gesture or not, led to improved learning on a task if children already
had implicit knowledge on that task. In contrast, for children who did not have implicit
knowledge prior to instruction, including gesture in instruction (either seeing other people’s gestures or producing one’s own gestures) was necessary in order for the children to
show improvement. In general, the analyses showed that gesture manipulations promote
learning in children who do not yet have implicit knowledge, suggesting that gesture can
indeed create implicit knowledge rather than merely activate it.
asked to explain how they solved the problem to a confederate. In the final step, adults
were asked to solve the Tower of Hanoi problem a second time (Tower of Hanoi 2), but
this time half of the adults used disks whose weights were switched so that the smallest
disk weighed the most and the largest the least (Switch condition); the other half used
disks whose weights remained correlated with their size (No-Switch condition).
Adults gestured when explaining how they solved Tower of Hanoi 1, often producing
action gestures; for example, one-handed or two-handed motions mimicking actions
used to move the disks (see Cook and Tanenhaus 2009; Garber and Goldin-Meadow
2002). Some of these gestures – in particular, one-handed gestures produced to describe
moving the smallest disk – were incompatible with the actions needed to solve Tower of
Hanoi 2 in the Switch condition (where the smallest disk was now the heaviest and re-
quired two hands to move), but were not incompatible with actions needed to solve
Tower of Hanoi 2 in the No-Switch condition (where the smallest disk continued to
be the lightest and could easily be moved one-handed).
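As a small sketch of the manipulation just described (the numerical sizes and weights below are invented; only the size–weight relation follows the description above), weight tracks size in the No-Switch condition, whereas in the Switch condition the weights are inverted so that the smallest disk becomes the heaviest and needs two hands to move:

# Illustrative disks: sizes 1 (smallest) to 3 (largest); weights are placeholder
# values chosen only to preserve the described ordering.
sizes = [1, 2, 3]

def disk_weights(condition: str) -> dict:
    ordered = [1.0, 2.0, 3.0]        # light ... heavy (hypothetical values)
    if condition == "no-switch":     # weight correlated with size
        return dict(zip(sizes, ordered))
    if condition == "switch":        # inverted: the smallest disk is the heaviest
        return dict(zip(sizes, reversed(ordered)))
    raise ValueError(condition)

def hands_needed(weight: float, two_hand_threshold: float = 2.5) -> int:
    # Hypothetical rule: disks above some weight require two hands to move.
    return 2 if weight > two_hand_threshold else 1

for condition in ("no-switch", "switch"):
    weights = disk_weights(condition)
    print(condition, {size: hands_needed(w) for size, w in weights.items()})
# no-switch: the smallest disk can be moved with one hand;
# switch: the smallest disk now requires two hands.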
The more incompatible gestures that adults in the Switch condition produced when
explaining how they solved Tower of Hanoi 1, the worse they performed on Tower of
Hanoi 2. No such relation between gesture and solution performance was found in the
No-Switch condition. Gesturing thus seemed to change adults’ mental representation of
the Tower of Hanoi task. After gesturing about the smallest disk with one hand, the
adults mentally represented this disk as a light object. For adults in the Switch condi-
tion, this representation was incompatible with the disk that the subjects eventually en-
countered when solving Tower of Hanoi 2 (the smallest disk was now too heavy to lift
with one hand). The relatively poor performance of adults in the Switch condition on
Tower of Hanoi 2 suggests that the mental representation created by gesture interfered
with subsequent performance on Tower of Hanoi 2.
There is, however, another possibility. The adults’ gestures could be reflecting their
representation of the smallest disk as a light object rather than creating it. But if gesture
changes thought by adding action information – rather than merely reflecting action
information already inherent in one’s mental representation of a problem – then perfor-
mance of adults in the Switch condition should not be impaired if those adults do not
gesture. Beilock and Goldin-Meadow (2009) asked a second group of adults to solve
Tower of Hanoi 1 and Tower of Hanoi 2, but they were not asked to do the explanation
task in between and, as a result, did not gesture. These adults performed equally well on
Tower of Hanoi 2 in both the Switch and No-Switch conditions. Switching the weights of
the disks interfered with performance only when subjects had previously produced
action gestures relevant to the task.
Gesturing thus adds action information to speakers’ mental representations – when
incompatible with subsequent actions, this information interferes with problem-solving.
When the information gesture adds to a speaker’s mental representations is compatible
with future actions, those actions will presumably be facilitated. Gesturing introduces
action into thought and, in this way, may be another instance of how the way in which
we move our bodies influences how we think (Barsalou 1999; Beilock and Holt 2007;
Glenberg and Robertson 1999).
coordination of two separate cognitive and motor systems. If so, gesturing might increase
speakers’ cognitive load. Alternatively, gesture and speech might form a single, integrated
system in which the two modalities work together to convey meaning. Under this view,
gesturing while speaking would reduce demands on the speaker’s cognitive resources (rel-
ative to speaking without gesture), and free up cognitive capacity to perform other tasks.
To distinguish these alternatives and to determine the impact of gesturing on a
speaker’s cognitive load, Goldin-Meadow et al. (2001; see also Wagner, Nusbaum
and Goldin-Meadow 2004) explored how gesturing on one task (explaining a math
problem) affected performance on a second task (remembering a list of words or let-
ters) carried out at the same time. If gesturing increases cognitive load, gesturing
while explaining the math problems should take away from the resources available
for remembering. Memory should then be worse when speakers gesture than when
they do not gesture. Alternatively, if gesturing reduces cognitive load, gesturing while
explaining the math problems should free up resources available for remembering.
Memory should then be better when speakers gesture than when they do not. Both
adults and children remembered significantly more items when they gestured during
their math explanations than when they did not gesture. Gesturing appeared to save
the speakers cognitive resources on the explanation task, permitting the speakers to
allocate more resources to the memory task.
Why does gesturing lighten cognitive load? Perhaps it is the motor aspects of gesture
that are responsible for the cognitive benefits associated with producing gesture. If so,
the meaning of the gesture should not affect its ability to lighten cognitive load.
Wagner, Nusbaum and Goldin-Meadow (2004) replicated the cognitive load effect on
adults asked to remember lists of letters or locations on a grid while explaining how
they solved a factoring problem. The adults remembered more letters or locations
when they gestured than when they did not gesture. But the types of gestures they pro-
duced mattered. In particular, gestures that conveyed different information from the ac-
companying speech (mismatching gesture) lightened load less than gestures that
conveyed the same information as the accompanying speech (matching gesture).
Thus, the effect gesture has on working memory cannot be a pure motor phenomenon –
it must stem instead from the coordination of motor activity and higher order concep-
tual processes. If the motor aspects of gesture were solely responsible for the cognitive
benefits associated with gesture production, mismatching gestures should be as effective
in promoting recall as matching gestures – after all, mismatching gestures are motor
behaviors that are physically comparable to matching gestures.
Gesturing on a task thus allows speakers to conserve cognitive resources. Learners
might then have more resources available to learn a new task if they gesture while
tackling the task than if they do not gesture.
To summarize, the gestures we produce when we talk not only reflect our thoughts
but also play a role in changing those thoughts. Gesture can bring about cognitive
change in at least two ways – by signalling to others that a learner is in a particular cog-
nitive state and, as a result, eliciting input that could lead to learning, and by changing
the learner’s cognitive state itself. There are a number of mechanisms through which
gesture could bring about cognitive change, including bringing new implicit knowledge
into the learner’s repertoire, bringing action information into the learner’s mental re-
presentations, and lightening the learner’s cognitive load. Whatever the mechanism,
it is clear that the way we move our hands when we speak is not mere handwaving –
our hands can change how we think and, as such, have the potential to be harnessed in
teaching and learning situations.
Acknowledgement
Supported by grants R01 HD47450 and P01 HD40605 from NIDCD.
5. References
Acredolo, Linda and Susan Goodwyn 1988. Symbolic gesturing in normal infants. Child Develop-
ment 59: 450–456.
Alibali, Martha Wagner, Miriam Bassok, Karen Olseth, Sharon Syc, and Susan Goldin-Meadow
1999. Illuminating mental representations through speech and gesture. Psychological Science
10: 327–333.
Alibali, Martha Wagner and Susan Goldin-Meadow 1993. Transitions in learning: What the hands
reveal about a child’s state of mind. Cognitive Psychology 25: 468–523.
Barsalou, Lawrence 1999. Perceptual symbol systems. Behavioral and Brain Sciences 22: 577–660.
Bates, Elizabeth 1976. Language and Context. New York: Academic Press.
Bates, Elizabeth, Laura Benigni, Inge Bretherton, Luigia Camaioni and Virginia Volterra 1979. The
Emergence of Symbols: Cognition and Communication in Infancy. New York: Academic Press.
Beilock, Sian L. and Susan Goldin-Meadow 2009. Gesture grounds thought in action, under review.
Beilock, Sian L. and Susan Goldin-Meadow 2010. Gesture changes thought by grounding it in
action. Psychological Science 21: 1605–1610.
Beilock, Sian L. and Lauren E. Holt 2007. Embodied preference judgments: Can likeability be
driven by the motor system? Psychological Science 18: 51–57.
Broaders, Sara C., Susan Wagner Cook, Zachary Mitchell and Susan Goldin-Meadow 2007. Mak-
ing children gesture brings out implicit knowledge and leads to learning. Journal of Experimen-
tal Psychology: General 136: 539–550.
Church, Ruth B., Saba Ayman-Nolley and Shahrzad Mahootian 2004. The effects of gestural
instruction on bilingual children. International Journal of Bilingual Education and Bilingualism
7(4): 303–319.
Church, Ruth B. and Susan Goldin-Meadow 1986. The mismatch between gesture and speech as
an index of transitional knowledge. Cognition 23(1): 43–71.
Cook, Susan Wagner and Susan Goldin-Meadow 2006. The role of gesture in learning: Do children use their hands to change their minds? Journal of Cognition and Development 7(2): 211–232.
Cook, Susan Wagner, Melissa Duff and Susan Goldin-Meadow 2013. Co-speech gesture is a vehicle
for non-declarative knowledge that can change a learner’s mind, under review.
Cook, Susan Wagner, Zachary Mitchell and Susan Goldin-Meadow 2008. Gesturing makes learn-
ing last. Cognition 106: 1047–1058.
Cook, Susan Wagner and Michael K. Tanenhaus 2009. Embodied communication: Speakers’ ges-
tures affect listeners’ actions. Cognition 113(1): 98–104.
Ehrlich, Stacy B., Susan C. Levine and Susan Goldin-Meadow 2006. The importance of gesture in
children’s spatial reasoning. Developmental Psychology 42: 1259–1268.
Garber, Philip and Susan Goldin-Meadow 2002. Gesture offers insight into problem-solving in
adults and children. Cognitive Science 26: 817–831.
Glenberg, Arthur M. and David A Robertson 1999. Indexical understanding of instructions. Dis-
course Processes 28: 1–26.
Goldin-Meadow, Susan 2003. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA:
Harvard University Press.
Goldin-Meadow, Susan, Martha W. Alibali and Ruth B. Church 1993. Transitions in concept acqui-
sition: Using the hand to read the mind. Psychological Review 100(2): 279–297.
Goldin-Meadow, Susan and Cindy Butcher 2003. Pointing toward two-word speech in young chil-
dren. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 85–107.
Mahwah, NJ: Lawrence Erlbaum.
Goldin-Meadow, Susan, Susan Wagner Cook and Zachary Mitchell 2009. Gesturing gives children
new ideas about math. Psychological Science 20(3): 267–272.
Goldin-Meadow, Susan, Whitney Goodrich, Eve Sauer and Jana M. Iverson 2007. Young children
use their hands to tell their mothers what to say. Developmental Science 10: 778–785.
Goldin-Meadow, Susan, Howard Nusbaum, Spencer D. Kelly and Susan Wagner 2001. Explaining
math: Gesturing lightens the load. Psychological Science 12: 516–522.
Goldin-Meadow, Susan and Melissa A. Singer 2003. From children’s hands to adults’ ears: Ges-
ture’s role in the learning process. Developmental Psychology 39: 509–520.
Iverson, Jana M., Olga Capirci, Virginia Volterra and Susan Goldin-Meadow 2008. Learning to
talk in a gesture-rich world: Early communication of Italian vs. American children. First Lan-
guage 28: 164–181.
Iverson, Jana M. and Susan Goldin-Meadow 1998. Why people gesture as they speak. Nature 396: 228.
Iverson, Jana M. and Susan Goldin-Meadow 2005. Gesture paves the way for language develop-
ment. Psychological Science 16: 368–371.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Relationship of Verbal and Nonverbal Communication, 207–228. The Hague:
Mouton.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
Newell, Allen and Herbert A. Simon 1972. Human Problem Solving. Englewood Cliffs, NJ: Pre-
ntice-Hall.
Özçalışkan, Şeyda and Susan Goldin-Meadow 2005. Gesture is at the cutting edge of early lan-
guage development. Cognition 96(3): B101–B113.
Perry, Michelle, Denise B. Berch and Jenny L. Singleton 1995. Constructing shared understanding:
The role of nonverbal input in learning contexts. Journal of Contemporary Legal Issues Spring
(6): 213–236.
Perry, Michelle, Ruth B. Church and Susan Goldin-Meadow 1988. Transitional knowledge in the
acquisition of concepts. Cognitive Development 3(4): 359–400.
Perry, Michelle and Anastasia D. Elder 1997. Knowledge in transition: Adults’ developing under-
standing of a principle of physical causality. Cognitive Development 12: 131–157.
Petitto, Laura Ann 1988 “Language” in the pre-linguistic child. In: Frank Kessel (ed.), The Devel-
opment of Language and Language Researchers: Essays in Honor of Roger Brown, 187–221.
Hillsdale, NJ: Lawrence Erlbaum.
Pine, Karen J., Nicola Lufkin and David Messer 2004. More gestures than answers: Children learn-
ing about balance. Developmental Psychology 40: 1059–1067.
Ping, Raedy and Susan Goldin-Meadow 2008. Hands in the air: Using ungrounded iconic gestures
to teach children conservation of quantity. Developmental Psychology 44: 1277–1287.
Rowe, Meredith and Susan Goldin-Meadow 2009a. Differences in early gesture explain SES dis-
parities in child vocabulary size at school entry. Science 323: 951–953.
Rowe, Meredith and Susan Goldin-Meadow 2009b. Early gesture selectively predicts later lan-
guage learning. Developmental Science 12: 182–187.
Singer, Melissa A. and Susan Goldin-Meadow 2005. Children learn when their teacher’s gestures
and speech differ. Psychological Science 16(2): 85–89.
Valenzeno, Laura, Martha W. Alibali and Roberta Klatzky 2003. Teachers’ gestures facilitate stu-
dents’ learning: A lesson in symmetry. Contemporary Educational Psychology 28: 187–204.
Wagner, Susan, Howard Nusbaum and Susan Goldin-Meadow 2004. Probing the mental represen-
tation of gesture: Is handwaving spatial? Journal of Memory and Language 50: 395–407.
50. Coverbal gestures: Between communication and speech production

Abstract
Some of the gestures that normally accompany continuous speech (coverbal gestures)
seem to be related to the content of speech and have a form that is related to this content
(iconic gestures). They have low semantic specificity, are physically complex, have systematic timing relations with the parts of speech to which they relate, and tend to occur in the neighborhood of speech dysfluencies in both normal and pathological
speech. The present chapter reviews the evidence concerning iconic gestures and suggests
that they reflect the facilitation of lexical processing by recourse to secondary, imagistic
information.
Other coverbal gestures may be more variable in their form and timing in relation to
speech. These include pointing in a particular direction (deictics), pantomiming an action (pantomimes), indicating measures (quantifiers) or performing a gesture with a specific, well-known meaning (emblems). These gestures have primarily communicative functions: they act like words or specify the meanings of the accompanying speech.
I present a model (after Hadar and Butterworth 1997) in which I suggest that the
function of most iconic gestures is to facilitate word retrieval. To enhance retrieval, the
cognitive system elicits imagistic information from two distinct stages in the speech pro-
duction system: pre-verbally, from the processes of conceptualization, and post-semanti-
cally, initiated in the lexicon. Imagistic information assists or facilitates lexical retrieval in
three ways: first, by defining the conceptual input to the semantic lexicon; second, by
maintaining a set of core features while reselecting a lexical entry and, third, by directly
activating phonological word-forms.
The model offers an account of a range of detailed gestural phenomena, including the
semantic specificity of gestures, their timing in relation to speech and aspects of their
selective breakdown in aphasia.
1. Introduction
People engaged in speaking normally perform a variety of movements which are not
strictly a part of the speaking act: they nod their heads, change their postures, gesture
with their arms and hands, etc. The question why speakers should do this has occupied
the attention of curious observers from the Greek philosophers to Wundt ([1900] 1973)
and Freud (1938b). More systematic and rigorous recent studies resulted in viewing
these speech-related body movements as representatives of diverse phenomena, both
behaviorally and functionally. The functions ascribed to gesture included motor (redu-
cing the great number of degrees of freedom with which the articulatory system has to
manage (Hadar 1989)); communicative (helping the speakers communicate their
intended messages more fully (McNeill 2005) or even communicate unintended mes-
sages about the cognitive states of the speaker (Scheflen 1973)); cognitive (helping com-
municators think out their intended messages (Goldin-Meadow 2002)) or ancillary to
speech production (facilitating word retrieval during continuous speech) (Butterworth
and Hadar 1989; Krauss and Hadar 1999). The present chapter elaborates on the latter
function.
In descriptive terms, movements involving the whole trunk (postural shifts) were dif-
ferentiated from those of the head only, and both were separated from arm and hand
gesture (Bull 1983). Within the latter class (termed gesture henceforth), movements
having a definite and accepted meaning independently of the accompanying speech
(such as the V for victory), called emblems, were differentiated from those that do
not have a conventional meaning. The latter, in turn, were subdivided into short, fast
movements, used primarily for emphasis and related to speech rhythm (beats), as
against wider and more complex movements which were said to have some ideational
content, though not usually interpretable without the accompanying speech. The latter,
in turn, could be subdivided into deictic, quantifying, pantomimic and iconic gestures
(Feyereisen and de Lannoy 1991; Rose 2006). The elaboration of the speech productive
functions centers around iconic gestures, or gestures that I consider iconic in principle, albeit iconic of a concept that is more remotely related to the words appearing in speech. This causes the gesture to appear metaphoric rather than iconic (Hadar and Butterworth
1997; McNeill 2005; Müller 2008).
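The descriptive taxonomy just outlined can be summarized as a rough decision procedure. The sketch below is only a gloss on the classification above (the predicate names and the order of the tests are mine, not an operational coding scheme):

def classify_coverbal_movement(body_part: str,
                               conventional_meaning: bool,
                               short_and_rhythmic: bool,
                               ideational_subtype: str = "iconic") -> str:
    """Gloss on the descriptive taxonomy: postural shifts and head movements
    are set apart from arm/hand gesture; within gesture, emblems carry a
    conventional meaning, beats are short emphatic movements tied to speech
    rhythm, and the remaining ideational gestures subdivide into deictic,
    quantifying, pantomimic and iconic."""
    if body_part == "trunk":
        return "postural shift"
    if body_part == "head":
        return "head movement"
    if conventional_meaning:
        return "emblem"
    if short_and_rhythmic:
        return "beat"
    assert ideational_subtype in {"deictic", "quantifying", "pantomimic", "iconic"}
    return ideational_subtype

print(classify_coverbal_movement("hand", conventional_meaning=False,
                                 short_and_rhythmic=False))  # iconic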
Iconic gestures were said to depict in their form or dynamics a meaning related to
that of the accompanying verbal utterance, as in the following examples (the word
to which the gesture is presumed to be related – its lexical affiliate (Schegloff 1984) –
appears underlined):
(2) Phrase: “The network of wires that hooks up the cable car”;
Accompanying gesture: Both hands rise together, fingers interlocking momen-
tarily at chest level (McNeill 1986).
Iconic gestures have been at the center of much attention in the study of coverbal
behavior, probably because they are thought to reflect some natural processes of sym-
bolic representation, where the term “natural” refers to representations that do not presuppose and do not require social conventions to acquire meaning (Kristeva 1978).
Instead, the form of the gesture suggests the meaning through ideational equivalence,
a homeomorphy, so that no process of literal decoding is required to derive the meaning
from the gesture. The iconic gesture is, in that sense, an imagistic proposition: it “shows”
the meaning rather than “denotes” it, to borrow Frege’s terminology (1952). Müller
(e.g., 2008) has consistently taken a much wider view of signification in iconic represen-
tation, where a whole range of meanings – both abstract and concrete – can be gener-
ated by gesture in imagistic modes. While I do not object to this wider use of the term,
I feel that a more constrained usage may be more suitable for a detailed processing
account of iconic phenomena.
The meaning of an iconic gesture is typically vague in itself. Whilst iconic gestures
often have recognizable physical features (see below), their meaning can seldom be de-
rived from their form with any degree of certainty (Feyereisen, Van de Wiele, and
Dubois 1988; Krauss, Morrel-Samuels, and Colasante 1991; Hadar and Pinchas-Zamir
2004). The shape and dynamics of an iconic gesture are not sufficient to derive its mean-
ing, which requires also the identification of that part of the verbal message to which the
gesture relates. Given that the accompanying speech thus appears necessary to interpret the gesture, the outstanding question is why speakers produce iconic ges-
tures at all: the gestures seem redundant to the communicative purposes of the speech
exchange.
kind of phenomena with which a comprehensive theory of gesture may eventually have
to grapple, but it would be premature to offer their analysis before some descriptive
rigor is achieved. Watzlawick, Beavin, and Jackson (1968) describe contradictory mes-
sages of affect and attitude. A typical example is one in which a mother says to her
daughter ‘come give mummy a hug’, whilst turning her upper trunk away from the
girl. Freud (1938b) gives other examples in which persons convey in their acts those
meanings which they try to conceal in their verbal message. Studies of ‘leakage’
(Ekman and Friesen 1969) present gestural phenomena that betray the falseness of
the verbal statement, but these clearly do not refer to iconic gestures and do not rely
on content analysis. Generally speaking, the ability to convey contradictory messages
appears to belong in the repertoire of communicative behavior: it occurs intentionally
in ironic and sarcastic speech or in stylistic paradoxical formulations.
suggested that because of their relative width, iconic gestures usually involve upper arm
movement of greater amplitude than 1 cm. Moreover, over large samples of gesture,
collected in similar test conditions, greater average amplitude of movement indicated
a greater proportion of iconic gesture. For example, in a single conversation, periods
of greater average amplitude of arm movement imply the occurrence of a greater pro-
portion of iconic gestures (Hadar 1991a). Thirdly, because of their relatively wide
amplitude, iconic gestures also tend to be of relatively long duration. Hadar (1991a)
suggested that most iconic gestures were of greater duration than 0.5 sec. Also, differ-
ences in relative proportion of iconic gestures in the total of recorded body movement
were reflected in comparable differences in average duration of movement.
Combined, the above characteristics provide a powerful heuristic for inferences
about the nature of coverbal movement. This becomes most useful when inferences
have to be made without the analysis of pragmatic and semantic relations as, for exam-
ple, in analyzing gestures of aphasics whose speech may be impossible to interpret. It
is then necessary to derive the category of a movement from its physical features
alone, according to the above characterization. Whilst various non-iconic movements
may have some of the above characteristics, analysis of all four can generate valid
categorization of coverbal movements (Hadar and Yadlin-Gedassy 1994).
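A minimal sketch of such a screening heuristic, using only the two characteristics that are quantified above (amplitude and duration; the thresholds are those reported from Hadar 1991a, and the function is meant as a screen, not a validated coder):

def likely_iconic(amplitude_cm: float, duration_s: float,
                  min_amplitude_cm: float = 1.0,
                  min_duration_s: float = 0.5) -> bool:
    """Iconic gestures tend to involve upper-arm movement of relatively large
    amplitude and relatively long duration; movements failing either test are
    unlikely to be iconic, while passing both is only suggestive, since some
    non-iconic movements share these properties."""
    return amplitude_cm > min_amplitude_cm and duration_s > min_duration_s

print(likely_iconic(amplitude_cm=12.0, duration_s=1.2))  # True: a candidate iconic gesture
print(likely_iconic(amplitude_cm=0.5, duration_s=0.2))   # False: e.g. a beat-like movement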
beats were said to coordinate with the suprasegmental features of stress and terminal
juncture (Hadar 1989). Here, an even lower locus of processing has been proposed –
articulatory programming.
McNeill (1992, 2005) holds a view of speech production which is very different from
the above. In his view, linguistic processing evolves from generic units, “growth points”,
containing the meaning of the whole idea-to-be-expressed in an embryonic form. In this
view, the eventual size of the verbal unit is irrelevant to understanding the gesture; what matters is only the analysis of temporal, pragmatic and semantic relations. Since the present view
differs markedly in its understanding of speech production, it is also different in its view
of iconic gesture, despite the similarity of our understanding of the shared processing of
coverbal gesture and speech.
5. Temporal unfolding
An underlying assumption, accepted by most researchers in the field, is that if there is
cognitive coordination between the verbal and gestural channels, then the related pro-
cesses must temporally overlap. Whilst the precise temporal unfolding of this overlap is
not clear, the current practices and the related conceptual assumptions coordinate ges-
ture with elements appearing in the co-occurring idea (Butterworth and Beattie 1978;
McNeill 1985). In this context the term “idea” refers to a small number of sentences
having a strong thematic link, which appear to be planned as a single unit (Butterworth
1975).
Searching for a temporal overlap is an accepted practice, but the supporting argu-
ments are circumstantial. Firstly, in those cases where pragmatic and semantic analysis
is sufficient for the coordination of gesture and speech, the gesture concurred with the
utterance of the idea within which occurred the related verbal unit. In examples 1–4
above, there was an actual overlap between the duration of the gesture and the produc-
tion of the related word. Secondly, if the information conveyed in the gestural and the
verbal channel is pragmatically and semantically related, as all available evidence
seems to indicate, it is hard to imagine a cognitive procedure that will separate them
in time. Such a procedure will necessarily require repeated processing of the same mate-
rial; it will be highly uneconomical in terms of its computational demands and will
offer no specifiable advantages in either production or perception. These considera-
tions are plausible, but leave open the issue of the precise timing of gesture in relation
to speech.
In the present restrictive definition of iconic gestures, most of them have a prepara-
tory phase during which the arm moves to a starting position at relatively low speeds.
Following this comes the iconic part of the gesture (its stroke) and it is this part of ges-
ture which will be referred to as iconic gesture henceforth. Iconic gestures usually start
before the related speech event (Butterworth and Beattie 1978; Kendon 1980; McNeill
1985; Morrel-Samuels and Krauss 1992) with a mean time lag of about 1.00 sec and with
a range of lags from 0 to 2.5 secs (Butterworth and Beattie 1978: Table 4), or to 3.8 secs
(Morrel-Samuels and Krauss 1992: Figure 1). However, Morrel-Samuels and Krauss
(1992: Figure 4) show that duration of gestures, ranging from 0.6 to 7.9 secs, is greater
than their lag in the overwhelming proportion of cases, implying a temporal overlap.
On average, gestures terminated about 1.5 secs after the initiation of their lexical
affiliates.
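The overlap argument can be stated in one line: if a gesture starts lag seconds before the onset of its lexical affiliate and lasts duration seconds, it overlaps the affiliate's onset exactly when duration exceeds lag. A small sketch using the kind of values reported above:

def overlaps_affiliate(lag_s: float, duration_s: float) -> bool:
    """The gesture starts lag_s seconds before the onset of its lexical
    affiliate and lasts duration_s seconds; it temporally overlaps that
    onset whenever its duration exceeds the lag."""
    return duration_s > lag_s

# A mean lag of about 1.0 s combined with termination about 1.5 s after
# affiliate onset implies a gesture duration of about 2.5 s.
print(overlaps_affiliate(lag_s=1.0, duration_s=2.5))  # True
print(overlaps_affiliate(lag_s=3.8, duration_s=0.6))  # False: an extreme, rare combination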
Dittman and Llewelyn (1969) have shown that, although some body movement occurs during flu-
ent speech, the probability of its occurrence increases markedly with the occurrence of
a pause, especially a hesitation pause, indicative of some breakdown in the process of
speech production. According to them, such pauses typically violate the correspon-
dence between speech flow and syntactic structure. Because no configurational descrip-
tion of movements was presented, the extent to which iconic gestures were implicated is
not clear. Hadar, Steiner, and Clifford-Rose (1984) showed that at least two kinds of
movement are involved, one which is relatively wide and probably iconic, which starts
during the pause, prior to the renewal of speech, and one which occurs after the pause
and usually coincides with the first primary stress of the following prosodic phrase. Con-
fusing the two kinds of movement resulted in ascribing to iconic gestures, as well as
beats, the motor function of dissipating redundant energy (Dittman 1972). Other
research shows that specifically iconic gestures tend to occur in the neighborhood of
hesitation pauses (Butterworth and Beattie 1978; Ragsdale and Silvia 1982).
Spontaneous speech of more than about 1 minute shows cycles of fluency comprising
a relatively hesitant “planning phase”, during which the conceptual content of speech is
determined, followed by a fluent “execution phase”, during which planned units are
produced and most lexical items are selected. Pauses in the fluent phase tend to be
brief and to serve lexical selection (Butterworth 1975, 1980; Henderson, Goldman-Eisler,
and Skarbek 1966). Iconic gestures tended to appear during the second phase when
most of the current pre-verbal processing has already been completed (Butterworth
and Beattie 1978: Table 1). During these fluent phases, gesture tended to start in a
pause just prior to a noun, a verb, an adverb or an adjective (Butterworth and Beattie
1978: Table 3), classes of words which tend to have lower frequencies in the language,
and also lower estimated transitional probabilities (Beattie and Butterworth 1979).
Now it is this latter measure, and not word frequency per se, that has been found to pre-
dict the occurrence of pauses in speech (Beattie and Butterworth 1979), and pre-
sumably the accessibility of the word to the speaker at that precise moment. It is,
therefore, a reasonable inference, though not one guaranteed by the available data,
that lexical retrieval difficulties are the source of both these dysfluencies and the
associated gestures.
On this evidence, those iconic gestures that are adjacent to their lexical affiliates are
generated following the conceptual planning of speech, but why should these gestures
be associated with hesitation? To investigate this, it is necessary to consider the possible
function of these gestures. There are two possibilities. First, gesture may serve a com-
municative function such as holding the floor while searching the lexicon for the target
word or conveying a part of the intended meaning in case lexical searches fail. If this
function is under voluntary control, then these gestures should vanish in the absence
of visual contact between interlocutors, as in intercom or telephone conversations.
This is clearly not the case (Moscovici 1967; Rime 1983; Williams 1977). So, this is
unlikely to be the whole story.
Alternatively, gesture may be the exteriorized manifestation of a speech productive
function. The semantic relatedness of gesture to its lexical affiliate implies that some
aspect of the semantics of the current utterance has been determined prior to both lex-
ical selection and gesture generation. According to most models of speech production
(Butterworth 1980; Garrett 1984; Kempen and Huijbers 1983; Levelt 1989; Levelt,
Roelofs, and Meyer 1999), lexical retrieval proceeds in two distinct stages. In the first
stage, an abstract lexical item is retrieved on the basis of a conceptual specification. This
stage is called the “semantic lexicon” by Butterworth (1980) and Howard and Franklin
(1988: Chapter 12) or “lemma retrieval” by others (Kempen and Huijbers 1983; Levelt
1989; Levelt, Roelofs, and Meyer 1999). The second stage uses information from the
first stage to retrieve the phonological word form or “lexeme” (Kempen and Huijbers
1983) from the “phonological lexicon”. In terms of this model, the semantic relatedness
of the gesture could depend on the semantic specification used to access the semantic
lexicon and a likely functional candidate is that of facilitating lexical retrieval. In addi-
tion to the association between gestures, pauses and words of low probability in context,
evidence in support may be obtained by looking at aphasics whose lexical retrieval is
impaired by focal brain damage. These aphasics, the lexical hypothesis predicts, should
show an increase in the incidence of those iconic gestures which are related to lexical
failure.
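As a minimal sketch of the two-stage retrieval architecture just described (the lexicons, feature labels and word forms below are invented for illustration and do not commit to any particular two-stage model): stage one maps a conceptual specification onto an abstract entry in the semantic lexicon (a lemma); stage two uses that entry to retrieve a phonological word form (a lexeme) from the phonological lexicon:

# Hypothetical mini-lexicons; feature labels and forms are illustrative only.
SEMANTIC_LEXICON = {
    "soar":    frozenset({"move", "change-altitude", "upward"}),
    "descend": frozenset({"move", "change-altitude", "downward"}),
}
PHONOLOGICAL_LEXICON = {"soar": "/sɔːr/", "descend": "/dɪˈsɛnd/"}

def retrieve_lemmas(conceptual_spec: set) -> list:
    """Stage 1 ('semantic lexicon' / lemma retrieval): return every abstract
    entry whose feature structure matches the conceptual specification."""
    return [lemma for lemma, features in SEMANTIC_LEXICON.items()
            if conceptual_spec <= features]

def retrieve_word_form(lemma: str) -> str:
    """Stage 2: retrieve the phonological word form ('lexeme') for the lemma."""
    return PHONOLOGICAL_LEXICON[lemma]

lemmas = retrieve_lemmas({"move", "change-altitude", "upward"})
print(lemmas, [retrieve_word_form(l) for l in lemmas])  # ['soar'] ['/sɔːr/']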
The published evidence here is problematic. Firstly, in most published studies, not all
the relevant information is available. Thus, in some much-quoted studies, such as those by Cicone et al. (1979) and Behrmann and Penn (1984), neither the timing of iconic gestures in relation to lexical affiliates, nor the association with hesitation phenomena, nor indeed the exact nature of the language deficits was presented for individual pa-
tients. Secondly, in those cases showing semantic difficulties, the determination of prag-
matic/semantic relations between gesture and speech is problematic because the
relevant word may not appear in the utterance. Thus, if a patient accompanies the
phrase “the building was tall” with a circular arm gesture, it is possible that the coordi-
nation between movement and speech is lost due to a gestural impairment (making a
circular gesture for the concept “tall”), or due to a linguistic impairment (choosing
the word “tall” for the concept “round”). There is no simple way of distinguishing
between the two. Therefore, pragmatic/semantic mis-matches between gesture and
speech in aphasic patients cannot be interpreted in a specific way, yet this is precisely the basis for claims made by Cicone et al. (1979), McNeill (1985) and others. Of
course, semantics of a gesture may still be ventured on the basis of its configurational
properties, and the rate of incidence of semantic/pragmatic mis-matches may also be
determined (as in Butterworth, Swallow, and Grimston 1981), but beyond that, inves-
tigation of gesture in aphasics showing semantic impairments must be limited to the
analysis of lexical composition and timing relations.
The hypothesis of two links between gesture and speech, one pre-verbal and one
lexical, predicts a number of phenomena in the behavior of aphasic patients. Firstly,
pragmatic/semantic mis-matches between gesture and speech will occur irrespective of whether the gesture originated early or late in the speech production
process. Deficits in word retrieval are sufficient to create these mis-matches. If gesture
does not occur in compensation for retrieval deficits, but only as a part of ideational
processing, there will be no increase in the incidence of gesture relative to speech rate.
This seems to be the case in groups of patients with fluent aphasia (Feyereisen 1983;
Hadar 1991b). This is clear in the case of KC, a jargon aphasic, who had an apparently
normal rate and ordinary timing of gestures (Butterworth, Swallow, and Grimston
1981). The authors noted that gestures probably occurred where neologistic jargon
words were produced as substitutes for words that KC was unable to retrieve. Sec-
ondly, word retrieval deficits originating in post-semantic processing will show
The general architecture of the model is presented in Fig. 50.1, where a two-stage
model of word retrieval is assumed. A number of authors have made broadly equivalent
proposals regarding speech production and I follow a version that is derived from Levelt
(1989) although, currently, the model is not specific enough to differentiate among the
various two-stage models of lexical retrieval. Conceptual (“message level”) processing
constructs or selects a set of semantic features {Fw, Fx, Fy, …} to be realized linguistically.
These features are envisaged to be conceptual and perceptual primitives of the sort pro-
posed by Miller and Johnson-Laird (1976), or by Bierwisch and Schreuder (1992). Lexical
entries are defined in terms of feature structures, and access to the entries entails matching
a subset of features, {Fx, Fy, …}, to one or more entries in the semantic lexicon.
In Fig. 50.1, the subset {Fx, Fy, …} may activate an image via two routes, the concep-
tual and the lexical. An image may, in turn, feed into the formulator process, and hence
into subsequent processes of word finding. The idea here is that the image will be trans-
lated back into semantic features that can then engage in conceptual processing. Now,
on occasion, the translation will not be identical to the original subset that evoked the
image: some features may be lost, or some accidental feature of the image may be in-
cluded, some previously unstressed feature may become salient, etc. For example, the
features designed to elicit the word table will evoke the visual image of, perhaps, some specific table known to the speaker. Context or intention may determine which
aspect of the image of the table is salient and activates a gesture: its squareness will
elicit a different gesture from its flatness, for example.
As he formulates the phrase “in order to make the glider [verb]”, he generates the image
of an upwards motion and gestures an uprising curve with his arm. The feature of upward
motion is fed back into the conceptualization and added to (1), to give (2):
This set of features accesses the entry for soar and he says “in order to make the glider
soar”. The gesture will start before the beginning of the phrase, during conceptualization,
and will continue through the word “soar”. Alternatively, the speaker could preserve the
notion of a change in altitude, while evoking the image of a downward motion. He would
then produce a descending arm gesture and say “in order to make the glider descend”.
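Continuing the glider example in the same illustrative terms (the feature labels are again hypothetical): a conceptual specification that underdetermines the choice between soar and descend is resolved once the feature translated back from the evoked image (upward or downward motion) is added to it, the gesture being the motor manifestation of the same imagistic activation:

SEMANTIC_LEXICON = {
    "soar":    {"move", "change-altitude", "upward"},
    "descend": {"move", "change-altitude", "downward"},
}

def matching_lemmas(conceptual_spec: set) -> list:
    return [lemma for lemma, features in SEMANTIC_LEXICON.items()
            if conceptual_spec <= features]

spec = {"move", "change-altitude"}           # underdetermined: two candidates
print(matching_lemmas(spec))                 # ['soar', 'descend']

# The image of an upward (or downward) motion is translated back into a
# feature and added to the specification, selecting a single entry.
print(matching_lemmas(spec | {"upward"}))    # ['soar']
print(matching_lemmas(spec | {"downward"}))  # ['descend']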
The image may also be used to refocus conceptual processing, so that, in effect, a dif-
ferent lexical entry is sought. Consider a case where a speaker is describing the way in
which her handbag was stolen. She wishes to emphasize that the bag, y, was taken force-
fully and rapidly by x, and has selected the set of features (3).
Fig. 50.1: A model of the relation between iconic gesture generation and speech production. [Figure: a gesture production stream (conceptual/propositional and spatial/imagistic–visual representations, dynamic shaping, spatial/dynamic gesture specifications, motor planning, motor lexicon and gesture program, motor execution, overt gesture) alongside a Levelt-style speech production stream (conceptualizer, preverbal message, formulator with grammatical and phonological encoders, lexicon of lemmas and word forms, phonetic plan, articulator, overt speech).]
If this exhausts the speaker’s vocabulary for this particular semantic domain, there will
be no entry that satisfies all the features in (3). The image evoked by (3) may allow the
critical or core sense, (5)
manifest in speech dysfluency, depending on the time course of the re-selection process,
in relation to speech rate. In both the conceptual and the lexical cases, facilitation is
mediated through imagistic representation (De Ruiter 2000; Kita and Lausberg
2008), gesture being the motor manifestation of imagistic activation, just as lip move-
ment may be a motor accompaniment of silent reading. Of course, even in these
cases, when all evidence points to a lexical retrieval function, gesture may still have
communicative utility as a visible element of a cognitive process. In fact, there is evi-
dence that iconic gestures often help disambiguate the verbal message itself, and not merely add a message of their own (Obermeier, Holle, and Gunter 2011). This, to my mind, renders the
lip-reading metaphor even more instructive.
9. References
Argyle, Michael 1975. Bodily Communication. London: Methuen.
Beattie, Geoffrey W. and Brian Butterworth 1979. Contextual probability and word frequency as
determinants of pauses and errors in spontaneous speech. Language and Speech 22: 201–211.
Beattie, Geoffrey and Jane Coughlan 1999. An experimental investigation of the role of iconic ges-
tures in lexical access using the tip-of-the-tongue phenomenon. British Journal of Psychology
90: 35–56.
Behrmann, Marlene and Claire Penn 1984. Nonverbal communication of aphasic patients. British
Journal of Disorders of Communication 19: 155–168.
Bierwisch, Manfred and Robert Schreuder 1992. From concepts to lexical items. Cognition 42: 23–60.
Boomer, Donald S. 1963. Speech disturbance and body movement in interviews. Journal of Ner-
vous and Mental Disease 136: 263–266.
Boomer, Donald S. and Allen P. Dittman 1964. Speech rate, filled pause and body movement in
interviews. Journal of Nervous and Mental Disease 139: 324–327.
Bull, Peter 1983. Body Movement and Interpersonal Communication. London: Wiley.
Butterworth, Brian 1975. Hesitation and semantic planning in speech. Journal of Psycholinguistic
Research 4: 75–87.
Butterworth, Brian 1980. Evidence from pauses. In: Brian Butterworth (ed.), Language Produc-
tion, 1:423–459. London: Academic Press.
Butterworth, Brian and Geoffrey Beattie 1978. Gesture and silence as indicators of planning in
speech. In: Robin Campbell and Philip T. Smith (eds.), Recent Advances in the Psychology
of Language: Formal and Experimental Approaches, 347–360. London: Plenum.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech and computational stages. Psychological
Review 96: 168–174.
Butterworth, Brian, J. Swallow and M. Grimston 1981. Gestures and lexical processes in jargon
aphasia. In: Jason W. Brown (ed.), Jargonaphasia, 113–124. New York: Academic Press.
Capirci, Olga and Virginia Volterra 2008. Gesture and speech: The emergence and development
of a strong and changing partnership. Gesture 8: 22–44.
Carlomagno, Sergio, Maria Pandolfi, Andrea Martini, Gabriella Di Iasi and Carla Cristilli 2005.
Coverbal gestures in Alzheimer’s type dementia. Cortex 41: 535–546.
Cicone, Michael, Wendy Wapner, Nancy Foldi, Edgar Zurif and Howard Gardner 1979. The rela-
tion between gesture and language in aphasic communication. Brain and Language 8: 324–349.
de Ruiter, Jan Peter 2000. The production of gesture and speech. In: David McNeill (ed.), Lan-
guage and Gesture, 284–311. Cambridge: Cambridge University Press.
Dittman, Allen T. 1972. The body movement – speech rhythm relationship as a cue to speech en-
coding. In: Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication,
135–152. New York: Pergamon Press.
Dittman, Allen T. and Lynn G. Llewelyn 1969. Body movement and speech rhythm in social con-
versation. Journal of Personality and Social Psychology 11: 98–106.
Efron, David 1972. Gesture, Race and Culture. The Hague: Mouton.
Ekman, Paul and Wallace V. Friesen 1969. Nonverbal leakage and clues to deception. Psychiatry 3:
88–105.
Feyereisen, Pierre 1983. Manual activity during speaking in aphasic subjects. International Journal
of Psychology 18: 545–556.
Feyereisen, Pierre and Jacques-Dominique de Lannoy 1991. Gestures and Speech: Psychological In-
vestigations. Cambridge: Cambridge University Press.
Feyereisen, Pierre, Michele Van de Wiele and Fabienne Dubois 1988. The meaning of gestures:
What can be understood without speech? European Journal of Cognitive Psychology 8: 3–25.
Frege, Gottlob 1952. On sense and reference. In: Peter Geach and Max Black (eds.), Translations
from the Philosophical Writings of Gottlob Frege. Oxford: Basil Blackwell.
Freud, Sigmund 1938a. The interpretation of dreams. In: Abraham Arden Brill (ed.), The Basic
Writings of Sigmund Freud, 181–468. New York: Modern Library.
Freud, Sigmund 1938b. Psychopathology of everyday life. In: Abraham Arden Brill (ed.), The
Basic Writings of Sigmund Freud, 35–178. New York: Modern Library.
Garrett, Merrill F. 1984. The organization of processing structure for language production: Applica-
tions to aphasic speech. In: David Caplan, André Roche Lecours and Allan Smith (eds.), Biolog-
ical Perspectives on Language, 172–193. Cambridge: Massachusetts Institute of Technology Press.
Gentilucci, Maurizio and Michael C. Corballis 2006. From manual gesture to speech: A gradual
transition. Neuroscience and Biobehavioral Reviews 30: 949–960.
Goldin-Meadow, Susan 2002. Hearing Gestures: How Our Hands Help Us Think. Cambridge, MA:
Harvard University Press.
Hadar, Uri 1989. Two types of gesture and their role in speech production. Journal of Language
and Social Psychology 8: 221–228.
Hadar, Uri 1991a. Body movement during speech: Period analysis of upper arm and head move-
ment. Human Movement Science 10: 419–446.
Hadar, Uri 1991b. Speech-related body movement in aphasia: Period analysis of upper arm and
head movement. Brain and Language 41: 339–366.
Hadar, Uri and Brian Butterworth 1997. Iconic gestures, imagery and word retrieval in speech.
Semiotica 115: 147–172.
Hadar, Uri and Lian Pinchas-Zamir 2004. The semantic specificity of gesture: Implications for ges-
ture classification and function. Journal of Language and Social Psychology 23: 204–214.
Hadar, Uri, Timothy J. Steiner and Frank Clifford Rose 1984. The relationship between head
movements and speech dysfluencies. Language and Speech 27: 333–342.
Hadar, Uri and S. Yadlin-Gedassy 1994. Conceptual and lexical aspects of gesture: Evidence from
aphasia. Journal of Neurolinguistics 8: 57–65.
Halpern, Lipa 1963. Observations on sensory aphasia and its restitution in a Hebrew polyglot. In:
Lipa Halpern (ed.), Problems in Dynamic Neurology, 156–173. Jerusalem: Hadassah.
Hanlon, Robert E., Jason W. Brown and Louis J. Gerstman 1990. Enhancement of naming in non-
fluent aphasia through gesture. Brain and Language 38: 298–314.
Harley, Trevor A. 1984. A critique of top-down serial models of speech production: Evidence from
non-plan-internal speech errors. Cognitive Science 8: 191–219.
Henderson, Alan, Frieda Goldman-Eisler and Andrew Skarbek 1966. Sequential temporal pat-
terns in spontaneous speech. Language and Speech 9: 207–216.
Hostetter, Autumn B. and Martha W. Alibali 2008. Visible embodiment: Gestures as simulated
action. Psychonomic Bulletin & Review 15: 495–514.
Howard, David and Sue Franklin 1988. Missing the Meaning. Cambridge: Massachusetts Institute
of Technology Press.
Kempen, Gerard and Pieter Huijbers 1983. The lexicalization process in sentence production and
naming: Indirect selection of words. Cognition 14: 185–209.
Kendon, Adam 1978. Differential perception and attentional frame in face-to-face interaction:
Two problems for investigation. Semiotica 24: 305–315.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–227. The
Hague: Mouton.
Kendon, Adam 1985. Some uses of gesture. In: Deborah Tannen and Murielle Saville-Troike
(eds.), Perspectives on Silence, 215–234. Norwood (NJ): Ablex.
Kendon, Adam 1986. Some reasons for studying gesture. Semiotica 62: 3–28.
Kimura, Doreen 1979. Neuromotor mechanisms in the evolution of human communication. In:
Horst D. Steklis and Michael J. Raleigh (eds.), Neurobiology of Nonverbal Communication
in Primates: An Evolutionary Perspective, 197–220. New York: Academic Press.
Kircher, Tilo, Benjamin Straube, Dirk Leube, Susanne Weis, Olga Sachs, Klaus Willmes, Kerstin
Konrad and Antonia Green 2009. Neural interaction of speech and gesture: Differential acti-
vations of metaphoric co-verbal gestures. Neuropsychologia 47: 169–179.
Kita, Sotaro and Hedda Lausberg 2008. Generation of co-speech gestures based on spatial imag-
ery from the right-hemisphere: Evidence from split-brain patients. Cortex 44: 131–139.
Krauss, Robert M. and Uri Hadar 1999. The role of speech-related arm/hand gestures in word
retrieval. In: Lynn S. Messing and Ruth Campbell (eds.), Gesture, Speech and Sign, 93–116.
Oxford: Oxford University Press.
Krauss, Robert M., Palmer Morrel-Samuels and Christina Colasante 1991. Do conversational
hand gestures communicate? Journal of Personality and Social Psychology 61: 743–754.
Kristeva, Julia 1978. Gesture: Practice or communication. In: Ted Polhemus (ed.), Social Aspects
of the Human Body. Harmondsworth, UK: Penguin.
LaBarre, Weston 1947. The cultural basis of emotions and gestures. Journal of Personality 16:
49–68.
Lausberg, Hedda, Eran Zaidel, Robyn F. Cruz and Alain Ptito 2007. Speech-independent produc-
tion of communicative gestures: Evidence from patients with complete callosal disconnection.
Neuropsychologia 35: 3092–3104.
Levelt, Willem J. M. 1989. Speaking: From Intention to Articulation. Cambridge: Massachusetts
Institute of Technology Press.
Levelt, Willem J. M., Ardi Roelofs and Antje S. Meyer 1999. A theory of lexical access in speech
production. Behavioral and Brain Sciences 22: 1–38.
Luzzatti, Claudio, Rossella Raggi, Giusy Zonca, Caterina Pistarini, Antonella Contardi and Gian
Domenico Pinna 2002. Verb–noun double dissociation in aphasic lexical impairments: The role
of word frequency and imageability. Brain and Language 81: 432–444.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92: 350–371.
McNeill, David 1986. Iconic gestures of children and adults. Semiotica 62: 107–128.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Miller, George A. and Philip N. Johnson-Laird 1976. Language and Perception. Cambridge, MA:
Belknap Press of Harvard University Press.
Morrel-Samuels, Palmer and Robert M. Krauss 1992. Word familiarity predicts the temporal asyn-
chrony of hand gestures and speech. Journal of Experimental Psychology: Learning Memory
and Cognition 18: 615–622.
Moscovici, Serge 1967. Communication processes and the properties of language. In: Leonard
Berkowitz (ed.), Advances in Experimental Social Psychology, Volume 3, 225–270. New
York: Academic Press.
Müller, Cornelia 2008. Metaphors Dead and Alive, Sleeping and Waking: A Dynamic View.
Chicago: University of Chicago Press.
Obermeier, Christina, Henning Holle and Thomas C. Gunter 2011. What iconic gesture fragments
reveal about gesture–speech integration: When synchrony is lost, memory can help. Journal of
Cognitive Neuroscience 23: 1648–1663.
Ragsdale, J. Donald and Catherine Fry Silvia 1982. Distribution of kinesic hesitation phenomena
in spontaneous speech. Language and Speech 25: 185–190.
51. The social interactive nature of gestures
Abstract
There is a rapidly increasing number of social experiments on gesture use, that is, experi-
ments with at least one condition in which both participants can interact freely. This
experimental evidence already shows that conversational hand gestures serve a variety of
social interactive functions in face-to-face dialogues. First, speakers in a dialogue gesture
at a higher rate than in a monologue – even when the speaker and addressee cannot see
each other (e.g., on the telephone). Moreover, the form and function of speakers’ gestures
change to fit specific social conditions such as dialogue versus monologue, the presence or
absence of mutual visibility, a shared or different visual perspective, and the presence or
absence of common ground. Gestures also serve several functions other than conveying
information about the topic of the dialogue: They contribute to maintaining the inter-
action process, and they provide the speaker with information about the gesturer’s state
of understanding. Altogether, these findings show how gestures communicate in social
interaction. They also demonstrate that well-designed and controlled experiments need
not reduce dialogue to the study of individuals but can study dialogue itself as an
indivisible unit.
1. Introduction
People use gestures primarily in social interaction, seldom when alone. What we mean
here by social interaction is a face-to-face dialogue, and gestures are the conversational
hand movements that people integrate with their words to convey meaning to each
other in a dialogue. This chapter addresses both how to study the social nature of ges-
tures experimentally and what such studies reveal about how the participants in a dia-
logue influence each other’s gestures. Why do speakers gesture when talking on the
phone? Why do interlocutors describe something to one person with clear, well-formed
gestures, but then use sketchy, poorly formed gestures when describing the same thing to
another person? Why do speakers gesture at some times and not others? It turns out
that the answer to each of these questions is social, as revealed by experimental studies
of gesturing in face-to-face dialogues.
Limiting this review to lab experiments requires some explanation. All gesture re-
searchers find inspiration in the myriad details of everyday conversations. To pursue
these compelling observations, it is necessary to videotape similar phenomena for care-
ful study, which leads to a methodological choice: The researcher could either find con-
versations that are occurring naturally (e.g., at a party, a family dinner, or a playground)
or elicit controlled experimental dialogues in the lab. Either context advances gesture
research in its own way, and each method has its problematic aspects. Experimental re-
searchers must achieve a balance between control and spontaneity. Some experimental-
ists believe that spontaneous dialogue inherently precludes experimental control, so
they replace one of the participants with a confederate or themselves. We consider
this “dialogicide” to be unnecessary, and our review will show that an increasing num-
ber of studies are achieving the desired experimental control without depriving the dia-
logue of its essential features. Indeed, one of our objectives here is to promote the use
of face-to-face dialogue when collecting experimental data for investigations of social
gesture use. After outlining the defining features of dialogue, we review the gestural
findings from studies that used videotapes of real dialogues in the lab.
2. What is dialogue?
Many scholars have proposed that face-to-face dialogue is the primary site of language
use (e.g., Bavelas 1990; Bavelas et al. 1997; Chafe 1994; Clark 1996; Fillmore 1981;
Goodwin 1981; Levinson 1983; Linell 1982). Clark’s (1996) outline of 10 essential fea-
tures of face-to-face dialogue (see Tab. 51.1) provides a practical checklist for research-
ers who want to ensure that the gestural phenomena they elicit in the lab arise in real
dialogues. Perhaps most germane is what is not a dialogue. Obviously, a speaker who is
alone in the lab, describing something to a camera, is not in a dialogue; there is no
addressee. But simply placing an addressee in front of the speaker is not sufficient.
Both participants must be able to formulate their own actions spontaneously, be self-
determined, and act as themselves.
The following 12-second excerpt of interaction (from Bavelas et al. 2008) illus-
trates a controlled experimental task in the laboratory that also fulfills all of Clark’s
criteria. The two participants, sitting across from each other, interacted spontaneously
within their assigned task. A female speaker was describing a drawing of an unusual
18th century dress (shown in Fig. 51.1) to a male addressee who would later have to
select a picture of this dress from an array of similar dresses. In this excerpt, the speaker
was describing the large design on the front of the very wide skirt. The transcript below
includes subscripts and underlining that indicate where each gesture occurred in rela-
tion to words, with the description of gestures and other actions in italics and square
brackets below. Fig. 51.1 shows still photos of most of the gestures.
Fig. 51.1: Screen shots of the gestures made during the speaker’s description of the large design on
the center of the skirt in the picture above. The video is a three-camera split, with a side view of
the speaker and addressee in the lower screen, a front view of the speaker in the upper screen, and
a head shot of the addressee superimposed on the upper screen (except when the speaker stands
and blocks that camera).
Addressee: Okay
The addressee asked two questions about the location of the design on the dress (“start-
ing where?” and “on her waist?”), each time suspending his own hands as if waiting for
the speaker to specify her description. The speaker worked hard to answer his ques-
tions, ultimately standing up to demonstrate the location of the design on her own
body. Meanwhile, he was simultaneously using his own body as a reference point for
his questions. At the end of the excerpt, the addressee’s feedback (“Okay”) signaled
to the speaker that they had established common ground about where the design was
located. At every moment, his questions and actions were influencing her gestures.
The fluidity, spontaneity, and responsiveness of her gestures to the addressee were
striking and, we propose, similar to the gestures she might use in typical, everyday social
interactions, such as describing something with unusual spatial characteristics to an
addressee.
Any addressee (such as a confederate or the experimenter) who must respond only
minimally violates the essential characteristics of dialogue and creates a different, unfa-
miliar kind of interaction. Without the precise reciprocity and collaboration inherent in
a real dialogue, we cannot be sure that the gestures produced in the laboratory have
anything in common with the gestures people use in everyday dialogues. There is evi-
dence to suggest that natural behaviors by an addressee are not limited to back chan-
nels. They are closely linked, in both timing and meaning, to what the speaker is saying
(Bavelas, Coates, and Johnson 2000), and a confederate or experimenter who is trying
to respond in a “neutral” or “standard” manner could have unintended effects (Beattie
and Aboudan 1994). When experimental researchers are knowledgeable about dialogue
and take steps to ensure its essential features in the laboratory, they can be more
confident that the conversation is a true dialogue.
Cohen 1977; Cohen and Harrison 1973; Emmorey and Casey 2001; Krauss et al. 1995)
or at the same rate (Bavelas et al. 1992, Experiment 2; Rimé 1982). Why would speak-
ers gesture to an addressee who cannot see them? Bavelas et al. (2008) pointed out a
confounding variable in this group of studies: These experiments manipulated whether
the participants could see each other, but not whether they were participating in a
dialogue.
Bavelas et al. (2008) hypothesized that participating in a dialogue might indepen-
dently elicit gesturing, regardless of whether the gestures were visible. These authors
tested their hypothesis by creating three conditions that disentangled visibility and
dialogue. Speakers described the 18th century dress in the example above to an
addressee in face-to-face dialogue (visibility plus dialogue), to an addressee on the tele-
phone (dialogue only), or alone in the room to a tape recorder (neither visibility nor
dialogue). The authors used linear regression to separate the effects of visibility and
dialogue on the rate of gesturing, checking first for an effect of visibility, then for an
additional, independent effect of dialogue. They found that restricting visibility did sup-
press gesturing, but participating in a dialogue significantly increased it. For example,
speakers gestured at a significantly higher rate in the telephone condition than in the
tape recorder condition, which differed only in whether the speakers were participating
in a dialogue. Interestingly, although speakers appeared to gesture at a higher rate in
the face-to-face condition compared to the telephone condition, this difference was
not significant, a finding that replicated two of the earlier visibility experiments. Bavelas
et al. (2008) noted that these three studies shared a methodological feature:
In Rimé (1982), Bavelas et al. (1992, Experiment 2), and the present experiment, the
speaker and addressee were both participants who could interact freely and sponta-
neously, when and as they wished. In contrast, the five experiments that found a signif-
icant effect of visibility were also the ones that constrained the addressee (who was
usually the experimenter or a confederate) to a limited repertoire of responses. (Bavelas
et al. 2008: 512)
Comparing the three dialogic studies to the other five (cited above) provided additional
evidence that being in a real dialogue increases gesturing, even if the participants can-
not see each other.
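The analytic logic of disentangling visibility from dialogue can be illustrated with a small sketch. The data and the dummy coding below are invented for demonstration; Bavelas et al. (2008) report their own regression analysis, so this is only a generic rendering of the approach of testing whether dialogue predicts gesture rate over and above visibility.

import pandas as pd
import statsmodels.formula.api as smf

# Invented gesture-rate data (e.g., gestures per 100 words) for the three
# conditions; the condition coding is an assumption for illustration only.
df = pd.DataFrame({
    "condition":    ["face_to_face"] * 3 + ["telephone"] * 3 + ["tape_recorder"] * 3,
    "gesture_rate": [12.1, 10.8, 11.5, 9.7, 10.2, 9.9, 4.1, 3.8, 4.5],
})
df["visibility"] = (df["condition"] == "face_to_face").astype(int)
df["dialogue"]   = (df["condition"] != "tape_recorder").astype(int)

# Does dialogue predict gesture rate over and above visibility?
model = smf.ols("gesture_rate ~ visibility + dialogue", data=df).fit()
print(model.params)   # separate estimates for the visibility and dialogue effects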
Beattie and Aboudan (1994) focused specifically on how sensitive gesturing was to
how closely the speaker’s context resembled a real dialogue. They asked speakers to
describe a cartoon narrative three times, once alone in a room (nonsocial/monologue),
once to a confederate addressee who was present but unresponsive (social/monologue),
and once to a confederate addressee who interacted freely (social/dialogue). There was
a stepwise pattern of results: Participants gestured least in the nonsocial setting, slightly
more in the social/monologue setting, and most in the social/dialogue setting. Strikingly,
the difference between having a nonresponsive addressee and no addressee at all was
not significant: Speakers talking to an unresponsive addressee did not gesture much
more than when there was no addressee at all. However, the difference between the
social/monologue and social/dialogue conditions was significant. Participants gestured
almost three times more when talking to a freely responding addressee than to an un-
responsive addressee. This effect of what Clark (1996) called extemporaneity has obvi-
ous implications for investigations of social gesture use. The authors had even broader
conclusions, asserting that “in future, those theorists who wish to use gesture as an
important window on the computational stages of the human mind might find it neces-
sary to pay more attention to the social contexts from which their data is extracted”
(Beattie and Aboudan 1994: 260).
both of these experiments. The analysis revealed that interactive gestures were signifi-
cantly less likely to be redundant with words than topic gestures were. Whereas the
modal topic gesture added no information to the phonemic clause it accompanied,
the modal interactive gesture was completely non-redundant with the words.
Together, these results confirmed that gestures with interactive functions responded
to the availability of a visible addressee, but what if the visible addressee was not
engaged in interaction with the speaker? The next experiment (Bavelas et al. 1995:
Study 1) addressed this question with two conditions that were both face-to-face
dyads. In one condition, the dyads retold a cartoon together in a full dialogue. In the
other condition, one participant retold the first half of the cartoon, then the other par-
ticipant retold the second half; they could not help each other, so they were in sequen-
tial monologues. The results showed that, even though there was a visible addressee in
both conditions, the dyads in the full dialogue condition made interactive gestures at a
significantly higher rate than those in the sequential monologues. Finally, they tested
whether addressees understood and responded to the various functions of the interac-
tive gestures. For example, when the speaker made a word-searching interactive ges-
ture, would the addressee provide a word, even though the speaker had not asked
for assistance verbally? One set of analysts identified the specific function of a large ran-
dom sample of the interactive gestures in the data, and another set of analysts classified
the addressee’s response. The predicted effect of interactive gestures on the addressees’
immediately subsequent behavior was statistically significant. Altogether, the series of
studies showed that this relatively small group of previously unnoticed gestures seems to
be an efficient way for interlocutors to manage the social requirement of including and
coordinating with each other, moment by moment, in their dialogue.
less likely to be redundant with the concurrent words. That is, the gestures were more
likely to contribute unique information, which was not being conveyed verbally. In tele-
phone dialogues and tape recorder monologues, the information in the gestures added
little information over and above what the immediately accompanying words conveyed.
The fourth, related finding was that speakers in the visibility condition accompanied sig-
nificantly more of their gestures with verbal deictic expressions (such as “here” or
“there”). These deictics drew attention to the gesture, which carried information
that was not in the words. Speakers in telephone dialogues and tape recorder monolo-
gues rarely marked their gestures with deictic expressions. All four of these effects sug-
gest that the speakers whose addressees could see them drew on their gestures as a
communicative resource, while the other speakers did not.
Kimbara (2006, 2008) demonstrated another effect of visibility on gestures, namely,
whether interlocutors who can see each other tend to use similar gestures for the same
events. Gestures depicting a particular referent can do so in a variety of ways. For exam-
ple, speakers can demonstrate someone running by moving their own arms as though
running, by wiggling two fingers to represent little running legs, or by tracing a path
in the air. In the 2006 study, Kimbara showed that interlocutors tended to encode ges-
tures about the same referent in the same way (e.g., they might both wiggle their fingers
to show a man running). However, this effect might have nothing to do with seeing each
other’s gestures. It could emerge solely from participants’ shared linguistic context and
subsequent convergence on linguistic encoding (i.e., they use similar words in a similar
context). Kimbara (2008) tested this possible alternative explanation by varying visibil-
ity. Ten dyads watched 10 short excerpts from cartoons. After each excerpt, the dyad
retold the excerpt together “in as much detail as possible so that a person who had
not seen the clips could understand what was being described” (Kimbara 2008: 126).
To test the effect of visibility, Kimbara alternated the retellings between two conditions:
the participants could see each other or they had a blind pulled down between them
so that they could hear but not see each other. Kimbara located all instances of co-
referential gesture pairs and analyzed them for convergence of the gestures’ form.
The results showed that when participants could see each other, their gesture forms
converged significantly more often than when they could not (66% vs. 30%). Thus
the similarity in form could not be attributed to shared linguistic context and verbal
encoding; it was an effect of visibility, of being able to see each other’s gestures.
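As a generic illustration of how such a visibility contrast can be tested, the sketch below tabulates co-referential gesture pairs (converged vs. not converged) by condition and applies a chi-square test. The counts are invented to mirror the reported 66% vs. 30% proportions and are not Kimbara’s (2008) data or analysis.

import scipy.stats as stats

# Invented counts of co-referential gesture pairs, chosen only to mirror the
# reported proportions (66% vs. 30% convergence); not the study's actual data.
#                 converged  not_converged
visible_counts = [66,        34]
screen_counts  = [30,        70]

chi2, p, dof, expected = stats.chi2_contingency([visible_counts, screen_counts])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")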
of three participants. One participant was randomly assigned to be the target partici-
pant. The experimenter seated all three out of each other’s sight and gave each of
them some toys (e.g., a “finger trap” or “whirligig”). The target participant had the
same toys as one of the participants (creating common ground) but a different set
from the other (and therefore no common ground). When they had all finished trying
out their set of toys, the experimenter told them which two had had the same set
and which one had a different set. The experimenter then asked the three participants
to briefly discuss what they had done with their toys, but to do so in assigned pairs
with the extra participant waiting out of earshot. In the first two dialogues, the target
participant talked to each of the other two participants in counterbalanced order,
which created a within-subjects comparison. Because the objects did not have well-
standardized names, the participants depicted what they had done using gestures.
When their addressee did not share common ground, the target participants’ initial
gestures were reliably judged to be more complex, precise, or informative than
when the addressee did share common ground. Common ground led to sketchier,
“sloppier” gestures, presumably because these were all that this addressee needed.
In addition, a qualitative analysis of the pairs without common ground showed that
their gradual accumulation of common ground over the course of the dialogue simi-
larly influenced the form of their gestures. Successive gestures referring to the same
object showed a given-new effect, that is, gestures for new information about a refer-
ent were sharp and clear, and later gestures for the same referent became more
schematic.
Holler and Stevens (2007) followed up these findings by including both speech and
gesture in their analysis so they could investigate how common ground influenced the
interplay between the two modalities. They focused on how participants’ expression of
size in speech and gesture was influenced by whether the participants shared common
ground. The authors used a referential communication task, specifically, several
“Where’s Wally” pictures that prominently featured large objects such as gigantic
knots in a pipe or an unusually large, dome-shaped roof on a house. Each speaker de-
scribed these pictures and the locations of these large objects either to an addressee
with whom the speaker had previously looked at the pictures, that is, who had seen
the same pictures (common ground condition) or to an addressee who had not seen
them (no common ground condition). Holler and Stevens (2007) found that how speak-
ers used words and gestures to refer to the large objects differed significantly according
to whether they shared common ground or not. Although speakers in both conditions
used verbal size markers (e.g., “big,” “huge,” “enormous”) equally, those in the no com-
mon ground condition often accompanied these words with gestures that were large en-
ough to depict the size accurately. Speakers in the common ground condition did not
accompany their verbal size markers with gestures as often; when they did, the ges-
tures were significantly smaller than the gestures produced in the no common ground
condition. In summary, speakers in the common ground condition expressed the size
of the objects mainly in their words, whereas speakers in the no common ground condition
also relied on their gestures to do so. Holler and Stevens (2007) pointed out that researchers should consider both
the linguistic and imagistic sides of utterances when deciding whether speakers
have become more elliptical. In other words, analyzing only speech or only gesture
is not sufficient.
4. Conclusion
The rich group of experiments reviewed here documents the important role that hand
gestures play in language as a social process. Moreover, these experiments illustrate the
variety of tasks and independent variables that can be used to elucidate gestures’ functions, as well as
an equally wide range of dependent measures for assessing their effects. The findings
include evidence that participants in a dialogue gesture at a higher rate than speakers
who are in a monologue or with a constrained addressee. In addition, participants in
face-to-face dialogues use interactive gestures that function specifically to manage inter-
active aspects of their conversations. Furthermore, participants can monitor each
other’s gestures to display and check their mutual understanding in order to complete
a task together. Finally, speakers adapt their gestures to a wide variety of social vari-
ables, including whether the participants can see each other, where they are sitting in
relation to each other, what they can see, and whether the addressee is familiar with
what the speaker is describing.
As noted at the outset, our primary theoretical assumption is that the fundamental
site of social interaction is face-to-face dialogue – unmediated, spontaneous, and recip-
rocal conversations between at least two interlocutors. Because face-to-face dialogue is
also the primary site of conversational gestures, the understanding of these gestures
must draw as much on social interactive processes as on factors attributed to individuals
(e.g., cognition, culture, personality, etc.). These two assumptions have strong method-
ological implications. None of the above findings could have arisen in experiments that
used an individual alone, with an experimenter, or with a confederate; they required a
real addressee. If, as Lockridge and Brennan (2002) have shown in another area, there
are different results for the same experimental procedure when using real versus confed-
erate addressees, then methods that do not include real social interaction in face-to-face
dialogue have questionable generalizability to the natural use of conversational gestures.
Fortunately, the number of truly social experiments on gestures has increased rapidly
in the past decade, showing that it is possible to do experimental investigations of ges-
tures in full face-to-face dialogues. The variety of published experiments reviewed here
have already contributed both substantive results and methodological exemplars. It is
therefore possible, fruitful, and necessary to leave the narrow constraints of reduction-
ism in order to continue to advance our knowledge of the social interactive nature of
gestures. The neuropsychologist Alexander Luria pointed out that the principle of
reduction to the smallest possible element is not a scientific necessity. Indeed,
there are grounds to suppose that it may be false. To study a phenomenon, or an event, and
to explain it, one has to preserve all of its basic features. (…) It can easily be seen that re-
ductionism may very soon conflict with this goal. One can reduce water (H2O) into H and
O, but – as is well known – H (hydrogen) burns and O (oxygen) is necessary for burning;
whereas water (H2O) has neither the first or second quality (…). In order not to lose the
basic features of water, one must split it into units (H2O) and not into elements (H and O).
(Luria 1987: 675, emphasis in original)
We propose that, in order to learn about the social interactive nature of gestures, the
indivisible unit of study must be a true dialogue. Attempting to learn about dialogue
from the study of individuals will lose the basic features of social interaction, just as
studying hydrogen and oxygen separately loses the basic features of water.
5. References
Alibali, Martha W., Dana C. Heath and Heather J. Myers 2001. Effects of visibility between
speaker and listener on gesture production: Some gestures are meant to be seen. Journal of
Memory and Language 44: 169–188.
Bangerter, Adrian 2004. Using pointing and describing to achieve joint focus of attention in dia-
logue. Psychological Science 15: 415–419.
Bavelas, Janet Beavin 1990. Nonverbal and social aspects of discourse in face-to-face interaction.
Text 10: 5–8.
Bavelas, Janet Beavin, Linda Coates and Trudy Johnson 2000. Listeners as co-narrators. Journal of
Personality and Social Psychology 79(6): 941–952.
Bavelas, Janet Beavin, Nicole Chovil, Linda Coates and Lori Roe 1995. Gestures specialized for
dialogue. Personality and Social Psychology Bulletin 21(4): 394–405.
Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive ges-
tures. Discourse Processes 15: 469–489.
Bavelas, Janet Beavin, Jennifer Gerwing, Chantelle Sutton and Danielle Prevost 2008. Gesturing
on the telephone: Independent effects of dialogue and visibility. Journal of Memory and Lan-
guage 58: 495–520.
Bavelas, Janet Beavin, Sarah Hutchinson, Christine Kenwood and Deborah Hunt Matheson 1997.
Using face-to-face dialogue as a standard for other communication systems. Canadian Journal
of Communication 22: 5–24.
Beattie, Geoffrey and Rima Aboudan 1994. Gestures, pauses and speech: An experimental inves-
tigation of the effects of changing social context on their precise temporal relationships. Semi-
otica 99: 239–272.
Chafe, Wallace L. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Con-
scious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Clark, Herbert H. and Meredyth A. Krych 2004. Speaking while monitoring addressees for under-
standing. Journal of Memory and Language 50: 62–81.
Cohen, Akiba A. 1977. The communicative functions of hand illustrators. Journal of Communica-
tion 27: 54–63.
Cohen, Akiba A. and Randall P. Harrison 1973. Intentionality in the use of hand illustrators
in face-to-face communication situations. Journal of Personality and Social Psychology 28:
276–279.
Emmorey, Karen and Shannon Casey 2001. Gesture, thought and spatial language? Gesture 1(1):
35–50.
Fillmore, Charles J. 1981. Pragmatics and the description of discourse. In: Peter Cole (ed.), Radical
Pragmatics, 143–166. New York: Academic Press.
Furuyama, Nobuhiro 2000. Gestural interaction between the instructor and the learner in origami
instruction. In: David McNeill (ed.), Language and Gesture, 99–117. Cambridge: Cambridge
University Press.
Gerwing, Jennifer and Janet Beavin Bavelas 2004. Linguistic influences on gesture’s form. Gesture
4(2): 157–195.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Holler, Judith and Rachel Stevens 2007. The effect of common ground on how speakers use ges-
ture and speech to represent size information. Journal of Language and Social Psychology
26(1): 4–27.
Kimbara, Irene 2006. On gestural mimicry. Gesture 6(1): 39–61.
Kimbara, Irene 2008. Gesture form convergence in joint description. Journal of Nonverbal Behav-
ior 32: 123–131.
Krauss, Robert M., Robert A. Dushay, Yihsiu Chen and Frances Rauscher 1995. The communica-
tive value of conversational hand gestures. Journal of Experimental Social Psychology 31(6):
533–552.
Levinson, Stephen C. 1983. Pragmatics. Cambridge: Cambridge University Press.
Linell, Per 1982. The Written Language Bias in Linguistics. University of Linköping, Sweden:
Department of Communication Studies.
Lockridge, Calion B. and Susan E. Brennan 2002. Addressees’ needs influence speakers’ early syn-
tactic choices. Psychonomic Bulletin and Review 3: 550–557.
Luria, Alexander R. 1987. Reductionism in psychology. In: Richard Langton Gregory (ed.), The
Oxford Companion to the Mind, 675–676. Oxford: Oxford University Press.
Özyürek, Asli 2000. The influence of addressee location on spatial language and representational
gestures of direction. In: David McNeill (ed.), Language and Gesture, 64–83. Cambridge: Cam-
bridge University Press.
Özyürek, Asli 2002. Do speakers design their cospeech gestures for their addressees? The effects
of addressee location on representational gestures. Journal of Memory and Language 46:
688–704.
Rimé, Bernard 1982. The elimination of visible behavior from social interactions: Effects on ver-
bal, nonverbal and interpersonal variables. European Journal of Social Psychology 12: 113–129.
52. Experimental methods in co-speech gesture research

Abstract
This chapter provides an introductory overview of some of the basic experimental
paradigms traditionally employed in the field of gesture studies to investigate both com-
prehension and production in adult and child populations. With respect to gesture pro-
duction, the chapter taps into paradigms used for exploring both intra-psychological
and inter-psychological functions of co-speech gestures. At the same time, the present
chapter aims to shed light on some of the core questions researchers have been addressing
in using the described paradigms, concluding with a reflection on some of the methodolog-
ical shortcomings and limitations of the respective paradigms and methods used.
1. Introduction
Co-speech gestures occur in all cultures (Kita 2009) and in a wide variety of conversa-
tional contexts. This includes more formal settings, such as doctor-patient/therapist-
client interaction (Duncan and Niederehe 1974; Heath 1989, 2002), teacher-pupil
interaction (Roth 2001), work contexts (Mondada 2007) and official gatherings (Streeck
1994), as well as more informal conversational contexts, for example in interactions
with acquaintances, friends and family (Efron 1941; Goodwin 1986; Kendon 1980,
1985; Müller 2003; Seyfeddinipur 2004; Streeck 1994). The kinds of gestures used in
these contexts and the functions they fulfil are manifold. Explorations of co-speech ges-
tures occurring in natural contexts have been the origin of gesture studies and they
remain a prime focus in the field of gesture studies.
The present chapter focuses instead on the experimental methods and paradigms that have been employed to find answers to some of the central ques-
tions in the field of co-speech gesture (mainly of a psychological nature). Before dis-
cussing these in detail, it is important to emphasise that one fundamental assumption
underlying this research is that the behaviour we elicit in the laboratory is represen-
tative of what we observe outside of it. Of course, one possibility is that, in any given
experiment, the chosen stimulus material influences the number and nature of ges-
tures used; therefore results have to be considered in their particular context. Largely,
however, the things participants are asked to talk about in gesture experiments tend
to also feature frequently in everyday talk, including spatial relations, actions, objects
and persons. One potentially critical issue which remains, though, is the common use
of cartoon pictures or videos. The semantics of a cartoon world are radically different
to the world we live in – literally anything can happen, even physical impossibilities. It
is therefore possible that speakers use gestures differently in talk about more mun-
dane events, especially if we consider that one use of gesture may be to channel
and influence addressees’ inferences; these, of course, could be crucially different
when trying to process talk about rather unpredictable cartoon worlds (cf. Holler
2003; Holler and Beattie 2003a). However, because, to the best of my knowledge,
no study to date has systematically investigated to what extent gestural behaviour
in- and outside the lab are the same or different, we currently have no reason to dis-
count experimental research on these grounds (but it is certainly an issue requiring
future research).
Further, participants in experimental settings do provide us with insight into
spontaneously produced gestural behaviour (sometimes, bodily behaviour may be
slightly inhibited initially, since research ethics require us to inform participants
when they are video-recorded, but warm-up conversations tend to get around
this problem). Moreover, we know that we observe at least some of the same phe-
nomena in experimental and non-experimental gesture data. For example, imagistic
gestural representations are common in everyday conversation (e.g., Kendon 1985),
and they occur frequently in laboratory-based communication, too (e.g., McNeill
1992); similar parallels can be claimed for interactive gestures (these involve the
addressee in the interaction and are often associated with handing over a turn or
keeping the floor) which have been observed both in the lab (Bavelas et al. 1995;
Bavelas et al. 1992) as well as in everyday talk (Duncan and Niederehe 1974; Kendon
2004; Streeck and Hartge 1992). Further examples include the so-called “return ges-
ture” (de Fornel 1992), in which one participant in a conversation repeats another’s
gesture, a phenomenon that has also been observed in experimental contexts (Holler 2003; Holler
and Wilkin 2011; Kimbara 2006, 2008; Parrill and Kimbara 2006). The parallels
mentioned here are but a few, and although they are not hard and fast evidence, they serve
to illustrate the point that gestural behaviour can be observed in experimental
settings which, at least in some important aspects, is like that occurring outside the
laboratory.
Of course, all this is altogether less of an issue if we assume that co-speech ges-
tures are largely independent of the interactive processes happening between the
people talking. The basic requirement here is that the experimental tasks participants
engage in appropriately model the cognitive demands encountered by participants in
communication. This brings us to the questions gesture researchers have been trying
to answer.
other things, to identify the lexical affiliates of gestures, the semantic interpretation and
categorisation of gestures, and the recollection of individual gestures. Beattie and Sho-
velton (1999a, 1999b, 2001) used what they called a “semantic feature approach”, which
involved quizzing participants about the kinds of information they had received regard-
ing a range of detailed semantic categories. This was done in various forms, using either
open-ended or forced-choice questionnaires which participants completed for each clip.
Feyereisen, van de Wiele, and Dubois (1988) used a similar method; however, they
filmed people delivering lectures rather than describing imagistic stimuli and then
played video clips of the gestures to decoders. Rather than measuring the amount
the gestures communicated, their focus was on how well the decoders could differenti-
ate gesture types (iconic and batonic). These judgements were made either with or
without speech to see whether access to the verbal message content would modulate
the perception (and communicativeness) of the gestures.
Krauss et al. (1995) also used video footage of speakers communicating, but they in-
cluded a condition in which participants exchanged information via an intercom (i.e.,
where addressees could not see the speakers’ gestures). They played these videos
back to a set of decoders to compare the communicativeness of gestures which ap-
peared to be produced for addressees and those which did not (measured in terms of
the accuracy of decoders’ stimulus selection based on the speakers’ descriptions).
Also, their study introduced a slightly different set of stimuli bearing more abstract
features, such as synthesized sounds, tea flavours and abstract shapes.
Studies by Rogers (1978) and Riseborough (1981), too, used the basic play-back
paradigm to test the communicativeness of gestures, but by introducing conditions in
which just the speaker’s face was visible or the face was blanked out they managed
to filter out the contribution of facial information accompanying gestural representa-
tions (thus contrasting with the studies above). Another variation is the presentation
of noise at different levels of intensity to determine the importance of gestures when
speech is more or less intelligible. A further important difference to the studies above
is that the video footage played back to decoders stemmed from spontaneous interac-
tions between two “naïve” interlocutors, rather than from interactions involving a
confederate (with the exception of Riseborough 1981, experiments 2 and 3). This is a
crucial point, as speakers in these experiments may have produced more natural ges-
tures than when talking to a confederate – I will come back to this issue in section 5.
The measure employed by Rogers (1978) bears similarity to the semantic feature
approach used by Beattie and Shovelton (1999a, 1999b, 2001), as individual questions
(with multiple choice answers) tapped different semantic aspects of the actions and
objects described by the participants in the stimulus videos (an approach based on
Fillmore 1971). Riseborough’s (1981) measure, in contrast, was based on participants’
guesses about the objects the gestures represented, their recall of gestures, and the
information they inserted into blank fields in a transcript of the original narrative.
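Scoring decoder responses by semantic category, as in the measures just described, can be sketched as follows. The answer key, category names and responses below are invented; the actual instruments used by Rogers (1978) and by Beattie and Shovelton (1999a, 1999b, 2001) differ in detail.

# Invented answer key and decoder responses, for illustration only.
ANSWER_KEY = {
    "clip_01": {"size": "large", "shape": "round", "direction": "upward"},
    "clip_02": {"size": "small", "shape": "flat",  "direction": "leftward"},
}

def score_decoder(responses):
    """Return per-category accuracy across all clips for one decoder."""
    totals, correct = {}, {}
    for clip, key in ANSWER_KEY.items():
        for category, true_value in key.items():
            totals[category] = totals.get(category, 0) + 1
            if responses.get(clip, {}).get(category) == true_value:
                correct[category] = correct.get(category, 0) + 1
    return {c: correct.get(c, 0) / totals[c] for c in totals}

decoder_responses = {
    "clip_01": {"size": "large", "shape": "square", "direction": "upward"},
    "clip_02": {"size": "small", "shape": "flat",   "direction": "rightward"},
}
print(score_decoder(decoder_responses))  # e.g. {'size': 1.0, 'shape': 0.5, 'direction': 0.5}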
Apart from adult to adult communication, it has also been tested whether adults
can glean information from children’s gestures, motivated by the idea that co-speech
gestures can reveal something about children’s cognitive development (Alibali
and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Broaders et al. 2007;
Goldin-Meadow 2000, 2003). In particular, gestures can reveal whether children are
at a so-called transitional stage (a period of time just before their implicit knowledge
is about to advance by a significant step). Because children may benefit a great
deal from instruction and input from the environment during such periods, it is an
important question whether adults (e.g., in the role of parents and teachers) are sensi-
tive to this information in the child’s gestural communication. A large number of exper-
imental studies has explored this issue, using a paradigm in which adults decode
information from children’s gestures, extracted from videos of spontaneous interactions
between the child and an experimenter (e.g., Alibali, Flevares, and Goldin-Meadow
1997; Goldin-Meadow and Sandhofer 1999; Goldin-Meadow, Wein, and Chang 1992).
During these interactions, children were asked to explain mathematical equations or
traditional Piagetian conservation problems, which children tend to grasp only at cer-
tain developmental stages. The adult decoders were presented with clips of either just
the speech, the speech accompanied by a “matching” gesture (the gesture represents
the same information as the speech) or a “mismatching” gesture (the gesture represents
different, supplementary information to that contained in speech). They were then
asked to check questionnaire answers relating to the video vignettes tapping the infor-
mation the children had provided, or to talk about the children’s explanations, with
their own speech and gestures subsequently being analysed for content to see what
information the adults had picked up.
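The match/mismatch distinction used in these studies amounts to a simple coding rule: a gesture is coded as matching when it represents the same information as the accompanying speech and as mismatching when it represents different, supplementary information. The sketch below is an invented, simplified rendering of that rule rather than the coding scheme of any particular study.

def classify_gesture(speech_info: set, gesture_info: set) -> str:
    """Invented, simplified rendering of the match/mismatch coding rule."""
    if not gesture_info:
        return "speech only"
    if gesture_info == speech_info:
        return "matching"      # gesture represents the same information as speech
    return "mismatching"       # gesture adds different, supplementary information

# Hypothetical conservation-task example: speech mentions the container's
# height, while the gesture also encodes its width.
print(classify_gesture({"height"}, {"height"}))           # matching
print(classify_gesture({"height"}, {"height", "width"}))  # mismatching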
Another question is whether children are also able to glean information from co-
speech gestures. Kelly and Church (1997) employed a paradigm very similar to that
used by Goldin-Meadow and colleagues (e.g., Goldin-Meadow, Wein, and Chang
1992), adapted to test the gesture comprehension of 7-year-olds. To do so, they employed
three measures, a recall task (children describing, in their own words, the responses given
by the children in the video vignettes), a questionnaire testing for the information the
children thought had been given in the videos, and a task requiring children to assess
whether they thought the children in the videos were just about ready to understand
the concepts they were explaining. Other studies have directly compared the decoding
abilities of children and adults using the same basic paradigm, combined with compre-
hension and memory measures appropriate for the different age groups (Church,
Kelly, and Lynch 2000; Kelly and Church 1998; Thompson and Massaro 1986). Also,
some studies have started to investigate the comprehension of gestures in very young children
(around 1 year of age); these have mainly focused on the understanding of intentionality
associated with gestures and have used quite different paradigms. For example, Gliga
and Csibra (2009) measured children’s looking times in response to objects appearing at
locations indicated by pointing gestures or at the opposite side to that indicated by the
gesture.
Paradigms testing the communicativeness of co-speech gestures have been widely
applied to children and healthy young adults; some studies have adapted these para-
digms also to other populations, such as older adults and aphasics (for examples, see
Cocks et al. 2009; Feyereisen, Seron, and de Macar 1981; Feyereisen and van der Linden
1997; Thompson 1995).
Apart from researching whether co-speech gestures communicate semantic infor-
mation, studies have also focused on the comprehension of the pragmatic aspects of
messages, in particular indirect requests. For example, Kelly et al. (1999) used a play-
back paradigm to present observers with clips of an actor expressing indirect
requests accompanied by gesture (or not), where the gesture provided additional infor-
mation relevant to interpreting the speaker’s communicative intent. They asked parti-
cipants to predict the response of the person who acted as addressee in the stimulus
video, thus measuring information uptake from the gestures and whether this infor-
mation was integrated with the interpretation of the speaker’s intended meaning.
Kelly (2001, experiment 1) investigated the role of gesture for pragmatic understand-
ing in children (3–5 year olds) using a similar paradigm. Children watched video-
recorded interactions in which one person uttered an indirect request accompanied
by gesture or not, with children being asked what the speaker in the video had
referred to.
In addition to experiments presenting video clips to participants acting as observers/
decoders, some studies have tested the communicativeness of gestures in live interac-
tions. Graham and Argyle (1975) asked individuals to describe abstract shapes (of
high and low verbal encodability) to a group of addressees present in the same room.
In one condition, describers were allowed to gesture freely, in the other they were
asked to fold their arms. Addressees then drew the shapes, followed by an evaluation
of the accuracy of their drawings in the two conditions to measure gestural communi-
cation. Holler, Shovelton, and Beattie (2009) asked an actor to provide a scripted car-
toon narrative (based on spontaneously produced narratives) to addressees, including
the production of gestures which accompanied the original narratives. After the narra-
tions, addressees answered questions about the stories which were then scored by the
experimenters for the information they contained according to individual semantic fea-
tures (some of which were only represented in the gestures). The communicativeness of
the gestures in the face-to-face condition was compared to video (gesture + speech and
gesture only), as well as to an audio only condition (speech without gesture). With
regard to children’s gestures, Goldin-Meadow and Sandhofer (1999) have shown that
adults can glean significant amounts of information from them when observing the chil-
dren communicate live with an experimenter, using the same paradigm as with their
video-based play-back conditions. Kelly (2001, experiment 2) tested the communicative
role of gesture in children’s pragmatic understanding live by engaging them in interac-
tion with the experimenter who uttered indirect requests (using just speech, gesture and
speech, or just gestures to make the request, such as by pointing at an object). The chil-
dren’s success at understanding was reflected in their response to the indirect requests.
Behne, Carpenter, and Tomasello (2005) and Gräfenhain et al. (2009) showed that ges-
tures are communicative in a live context even to very young children (14 months of
age). Their studies tested children’s interpretation of the communicative intent asso-
ciated with gestures produced by an adult. For example, in the task used by Gräfenhain
et al. (2009), one adult pointed towards one of two locations combined with either
averted gaze or gaze directed at another adult looking for a toy. Children who observed
this scene were then allowed to look for the toy themselves, with their choice of location
providing insight into their comprehension of gesture and gaze cues.
Studies testing the communicativeness of gestures in a live, face-to-face context
advance our knowledge of gesture considerably, as they eradicate some of the potential
limitations of studies using video play-back techniques. For example, in the case of the
latter, video clips of individual gestures are presented to decoders often without the
natural context in which they occur, thus isolating them from any other contextual
cues, and in some studies the clips were even played repeatedly. However, video
play-back paradigms do offer the advantage that gestures from spontaneous interac-
tions can be used as the stimuli whereas in most of the studies using a face-to-face con-
text reviewed here (with the exception of the studies by Graham and Argyle as well as
find out where in the brain gesture and speech are integrated and which neural net-
works are involved in their processing. For example, Willems, Özyürek, and Hagoort
(2007) used a paradigm in which they varied the difficulty of gesture-speech integration
(with matching versus mismatching gestures) to explore this issue. In another study,
they compared co-speech gestures with those that are less strongly tied to speech (pan-
tomimes) to see whether they activate different or overlapping brain areas (Willems,
Özyürek, and Hagoort 2009). Other researchers have focused on gesture-speech inte-
gration when gesture provides supplementary information which disambiguates speech
(Holle et al. 2008), and on the involvement of the human mirror system in co-speech
gesture processing (Skipper et al. 2007). Further, paradigms have manipulated the
degree of perceived communicative intent associated with gestures to investigate prag-
matic aspects of gesture-speech integration (e.g., by creating gesture-speech mismatches
produced by the same versus different persons (Kelly et al. 2007), or by varying the
speaker’s gaze direction (Holler, Kelly, Hagoort, and Özyürek 2012), or their body orientation towards either the participant or a third person
(Straube et al. 2010)).
In both ERP and fMRI studies exploring co-speech gesture processing, it is often
necessary to work with video stimuli of a highly controlled nature, as, otherwise, it is
difficult to attribute observed effects to the intended experimental manipulation. The
stimuli used in these studies tend to be video clips of individual gestures presented
on their own, accompanied by speech (words or sentences), or preceded by it. Due
to the strong need for careful control, the stimuli usually involve a trained actor carry-
ing out scripted hand movements. (Also, ERP and fMRI studies often incorporate addi-
tional tasks requiring participants to answer questions or make some other kind of
decision (e.g., using a push-button device). This results in additional datasets of reaction
times (RTs) and response accuracy, for example, which provide further insight into the
comprehension of co-speech gestures.)
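To make the kind of push-button data mentioned above more concrete, the following is a minimal sketch of how per-participant reaction times might be compared between matching and mismatching gesture-speech trials; the condition labels and values are hypothetical and not taken from any of the studies cited.

```python
# Hedged sketch: paired comparison of reaction times (RTs) between matching and
# mismatching gesture-speech trials. All values are hypothetical illustrations.
import numpy as np
from scipy import stats

# Mean RT in milliseconds per participant and condition (hypothetical data).
rt_match = np.array([512, 498, 530, 545, 501, 478, 520, 495])
rt_mismatch = np.array([548, 530, 561, 570, 533, 509, 558, 527])

t_value, p_value = stats.ttest_rel(rt_mismatch, rt_match)
print(f"Mean slowdown for mismatches: {np.mean(rt_mismatch - rt_match):.1f} ms")
print(f"t = {t_value:.2f}, p = {p_value:.4f}")
```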
The idea that co-speech gestures can provide us with a greater insight into speakers’
underlying mental representations has also become of great relevance in developmental
psychology. Here, particularly the work by Susan Goldin-Meadow and colleagues (e.g.,
Alibali and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Broaders et al.
2007; Goldin-Meadow, Alibali, and Church 1993; Goldin-Meadow 2003; Perry, Church,
and Goldin-Meadow 1988) (see also Pine, Lufkin, and Messer 2004) has shown that
children often externalise knowledge in co-speech gesture before they are ready to
communicate verbally about the same concepts, such as in their explanations of conser-
vation, maths and balance problems. In these kinds of studies, children are given prob-
lems of the aforementioned kind and are simply asked to provide their explanations of them. Both gesture and speech can then be analysed for the semantic information they
represent.
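As a rough illustration of the logic behind such gesture-speech comparisons, the sketch below contrasts the semantic features coded for each modality; the feature labels and the simple decision rule are illustrative assumptions, not the coding scheme used in the studies cited above.

```python
# Toy sketch: flagging an explanation as a gesture-speech "mismatch" when the
# gesture is coded as expressing information that is absent from the speech.
# Feature labels and the decision rule are illustrative, not an actual coding manual.
def classify_explanation(speech_features: set, gesture_features: set) -> str:
    if gesture_features - speech_features:
        return "mismatch: gesture conveys information not present in speech"
    return "match"

print(classify_explanation({"height"}, {"height", "width"}))  # mismatch
print(classify_explanation({"height"}, {"height"}))           # match
```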
Because co-speech gestures bear a very close relationship to speech, researchers
have been intrigued by the nature of this relationship, the role of gesture in the process of speaking and communicating, and the exact functions gestures fulfil in talk. Different
experimental paradigms have been used to test different hypotheses; these can be
broadly classed into those postulating cognitive functions (thus benefiting mainly the
speaker) and those postulating communicative functions (thus benefiting primarily
the addressee). However, although contrasted here and discussed separately, these
approaches are not necessarily mutually exclusive.
Goldin-Meadow et al. (2001) have argued that co-speech gestures may reduce a
speaker’s cognitive load and thus free up cognitive capacities. In their paradigm, parti-
cipants explained their solutions to a series of maths tasks, which they frequently accom-
panied with gestures, while trying to remember a sequence of letters. This was combined
with a standard memory test (tapping the letter sequences) to see whether those who
gestured more would perform better, assuming that gesturing enabled participants to
allocate more resources to the memory task.
Other researchers have claimed that gestures maintain representations in spatial
working memory, thus indirectly influencing speech production (Morsella and Krauss
2004; Wesp et al. 2001). Both Morsella and Krauss (2004) and Wesp et al.’s (2001) para-
digms required participants to describe stimulus objects either from memory (stimulus
absent) or while looking at them (stimulus present). A similar procedure was used by
de Ruiter (1998, experiment 3) to test the lexical retrieval theory against the theory
that gestures facilitate the encoding of imagery in speech.
Co-speech gestures have also been postulated to facilitate conceptual planning
during the speech production process. To investigate this hypothesis, researchers
have used paradigms which compare conditions under which conceptual planning is
easy versus difficult. For example, Alibali, Kita, and Young (2000) used traditional
Piagetian conservation tasks and asked children to either explain why they thought
two vessels held the same or different amounts of liquid, or describe how the two vessels looked different. Other studies have asked participants to describe to another person a range of shapes made up of lines connecting a number of dots; to create
a conceptually more difficult condition, they removed the lines, leaving just dot pat-
terns less suggestive of a particular shape (Hostetter, Alibali, and Kita 2007). Another
study created conditions where participants had to describe geometrical shapes with
or without distracting lines creating competing conceptualisations (Kita and Davies
2009). Melinger and Kita (2007) increased conceptual planning load by asking
the addressee could see them via a video link); the fourth condition was set up exactly
like the latter but participants were told that their communication (both audio and
visual) would be fed into a computer. This manipulation allowed the authors to com-
pare the previous three different communicative contexts as well as human-human
and human-machine communication.
Bavelas et al. (2008) introduced another important manipulation to tease apart
the influence of visibility and dialogue on gesture use. In addition to visibility and co-
presence, they varied dialogic interaction. This was done by comparing a face-to-face
and a screen condition, in which interactants were free to engage in dialogue, with a
tape recorder condition, in which participants believed they were recording their mes-
sage for another person who would listen to the recording later – thus, neither did these
speakers see their addressees, nor did they engage in dialogue with them. This work
builds on a series of other studies manipulating monologue and dialogue and investigat-
ing the role of the addressee’s involvement in gesture and language use (Bavelas et al.
1995; Bavelas, Coates, and Johnson 2000; Bavelas et al. 1988; Bavelas et al. 1986). In
addition to research on gesture use in dyadic interaction, manipulating visibility, co-
presence and dialogic interaction, researchers have investigated the influence of
another contextual factor on co-speech gestures – that of addressee location (Özyürek
2002). Here, speakers talked to either one or two addressees who were located directly
opposite or towards the side. Speakers’ use of gesture when representing spatial infor-
mation was compared between these conditions to provide insight into recipient design
in gesture use. Further, because this research involved multi-party interactions, it ex-
pands our knowledge from gesture use in dyadic interactions to that in triads.
Apart from the influence of the degree of interactivity and physical contextual fac-
tors (such as co-presence, visibility, number of addressees and their location), re-
searchers have investigated the influence of more cognitive, covert processes of
conversation. One variable that has been manipulated in this context is the common
ground between speakers and their addressees (i.e., the knowledge, beliefs and as-
sumptions mutually shared by participants in an interaction (Clark 1996)). Common
ground has been experimentally induced in a variety of ways. Gerwing and Bavelas
(2004) asked participants to play with either the same (common ground) or a differ-
ent (no common ground) set of toys and then asked one person to tell another about
their experiences with the toys. The gestures used to refer to the toys in the two
groups were compared for differences in their form (precision). Apart from creating
common ground based on shared action-based experiences, researchers have also
used paradigms to induce it visually by presenting stimuli to the speaker and the
addressee or just to the speaker (who then talks to an unknowing addressee). Holler
and Stevens (2007) used images showing particularly large entities amongst smaller
ones and focused their analysis on the effect of common ground on the encoding
of size information in both gesture and speech. Holler and Wilkin (2009) used a
similar method, but instead of pictures used a short video, allowing their analysis
to focus on a wider range of semantic features (relating to actions, objects and
persons, as well as their attributes). Parrill (2010) also used video stimuli to experi-
mentally manipulate common ground, but instead of a longer video (telling a
whole story) she used a short clip showing a single event and, similar to Holler and
Stevens (2007), focused her analysis on one semantic aspect of it (here, the ground ele-
ment). In addition, she combined this with the manipulation of “information salience”,
i.e., whether or not the ground element had been mentioned by an experimenter before the participant referred to it. Holler (2003) and Jacobs and Garnham (2007) ma-
nipulated common ground by asking participants to relay the same description of events
represented in cartoon pictures to the same addressee repeatedly (thus accumulating
common ground) in order to then compare the speakers’ gesture rate across the trials.
Jacobs and Garnham (2007) also used joint visual availability of the stimulus to induce
common ground, by providing both speaker and addressee with a view of the stimulus
while it was being described.
Other studies investigating the link between communicative intent and co-speech
gestures have manipulated verbal ambiguity to find out whether speakers would draw
on the gestural modality to clarify their speech for the interlocutor (Holler and Beattie
2003b). Further, Melinger and Levelt (2004) investigated whether co-speech gestures
encode what they defined as “necessary” information, and whether in such cases speakers
were less likely to also represent this information in speech.
Although not manipulating communicative intent directly, studies focusing on ges-
tural mimicry (Holler and Wilkin 2011; Kimbara 2006, 2008; Parrill and Kimbara
2006) provide insight into the collaborative use of co-speech gestures and add further to our knowledge of their communicative uses.
Finally, many of the paradigms reviewed above have been adapted (and in some
cases special paradigms have been newly created) to investigate gesture production in
populations other than children and the “healthy student adult”, such as in older adults
(Feyereisen and Havard 1999), split-brain patients (Kita and Lausberg 2008; Lausberg
and Kita 2002; Lausberg et al. 2003), aphasic patients (Cocks, Hird, and Kirsner 2007;
Hadar et al. 1998) and Alzheimer’s patients (Carlomagno et al. 2005; Glosser, Wiley,
and Barnoski 1998).
into a certain direction (e.g., the experimenter may, unconsciously, respond more enthu-
siastically or encouragingly (verbally or nonverbally) in cases where the participant
has displayed gestural behaviour in line with the experimental predictions (or sanction
behaviour going against them, such as with a lack of positive feedback)).
Of course, there may sometimes be good reasons as to why researchers want to use
confederates in their studies. In comprehension studies, it is important to isolate a single
manipulation or difference to test a particular hypothesis and obtain clear results. Espe-
cially in ERP and fMRI studies, a tightly controlled, carefully constructed stimulus (optimising the signal-to-noise ratio) is necessary to pick up any meaningful and unequivocally interpretable responses from the brain. Another reason is that confeder-
ates producing scripted behaviour allow researchers to examine recipients’ responses to
these behaviours – which may be useful when the natural occurrence of such behaviours
is rather rare (meaning that an unmanageably large number of hours of recordings and participants would be needed to obtain a large enough dataset), or when the social context in which it occurs creates too much noise for a clear analysis. Yet another reason for using confederates, at least in production studies, is the availability of resources, in-
cluding the size of the participant pool, financial means for compensating participants,
and the greater effort and difficulty associated with recruiting unacquainted participants
as pairs. Although, with regard to this latter reason, scientific rigour and validity should certainly weigh more heavily, much of the research using confederates in production studies was carried out quite a few years ago, when the strong influences social-interactional contexts can have on gesture use were not all that well known. Researchers nowadays
benefit from this awareness and where future studies need to employ confederates as
addressees or stimuli-actors for the above named (or other) reasons, one way of redu-
cing methodological limitations is to complement the analyses with a second, smaller
dataset using spontaneous interactions between “naïve” participants in the same respec-
tive context. This helps to demonstrate that similar behaviour occurs in a more natural
context. Another (or better, additional) option is to have the consistency and natural-
ness of the confederate’s behaviour established by a separate set of independent
observers.
Another controversial issue is the manipulation of the interaction between speaker
and addressee. In many of the production studies cited in this chapter, the participant
taking on the role of the addressee was asked not to interrupt the speaker with ques-
tions (while still delivering back-channel responses though). It appears that one of
the reasons researchers choose to limit the amount of dialogic exchange is the possi-
bility of experimental confounds. Studies often measure the influence of various cog-
nitive and social variables on gesture by focusing on gesture frequency or gesture
rate. However, we also know that verbal interaction itself (as compared to mono-
logue) influences gesture rate, independent of any additional manipulation (Bavelas
et al. 1995; Beattie and Aboudan 1994). Thus, when manipulating, for instance, common
ground or conceptual load, participants may interact more with their addressee in one of
the experimental conditions than in another (e.g., participants may feel more rapport
with the other participant when mutually sharing certain knowledge, or they may seek
more help or feedback from their addressee when finding communication conceptually
more difficult). In such a case, a higher gesture rate in one of the conditions could be
due to a difference in dialogic interaction per se as well as due to the experimental
manipulation.
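For illustration, gesture frequency is often normalised to a rate, for instance as gestures per 100 words of accompanying speech, so that conditions with different amounts of talk remain comparable; the sketch below shows such a normalisation with hypothetical counts.

```python
# Illustrative helper: gesture rate as gestures per 100 words, a common way of
# normalising gesture frequency across speakers who talk different amounts.
def gesture_rate_per_100_words(n_gestures: int, n_words: int) -> float:
    return 0.0 if n_words == 0 else 100.0 * n_gestures / n_words

# Hypothetical speakers: one from a dialogue condition, one from a restricted one.
print(round(gesture_rate_per_100_words(34, 410), 1))  # 8.3
print(round(gesture_rate_per_100_words(21, 395), 1))  # 5.3
```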
However, considering that other studies have shown that dialogical involvement of
the addressee impacts crucially on gestural behaviour (Bavelas et al. 1995; Beattie
and Aboudan 1994), studies restricting interaction may be fundamentally limited in the extent to which their findings can be generalised to dialogue. Because dialogue is one of the most common forms of everyday conversation, this is a serious potential lim-
itation. While researchers certainly need to be aware of this limitation (and take it into
account when drawing their conclusions), those studies based on restricted interactions
are certainly not without value. This is because everyday talk constitutes a continuum,
ranging from monologue to dialogue (Pickering and Garrod 2004). People talk in
monologue when delivering lectures, conference talks or other oral presentations,
and in conversation, individual speakers often take extended turns to tell stories and
anecdotes and jokes, describe how someone gets from A to B or what procedure to follow to achieve a certain goal, or talk about other complex matters whose explanation stretches over several sentences. During such extended turns it is not rare that addressees provide
mainly backchannel responses rather than take the floor. In other cases, some interlo-
cutors may simply be more dominant, vocal or extrovert and therefore talk consider-
ably more than others, possibly leaving no opportunity at all for turn contributions
from other participants for much of the conversation. Moreover, in almost all of the
gesture production studies reviewed here, one participant is assigned the role of the
speaker who has all the information (i.e., who has seen the stimuli) and who conveys it to their addressee. This sort of situation leads, for obvious reasons, mainly to conversa-
tions dominated by one individual with a limited number of turns between speakers,
even when these are completely free to interact. Considering the wide range of different
forms of talk, it is important that our research reflects this spectrum, thus capturing human communication as the multi-faceted phenomenon that it is. At the same time,
though, it is vital that researchers recognise the particular facet that individual datasets
and analyses are representative of and be wary of over-generalisation.
Alternatively, researchers may choose to explore free interaction as a default, and in
contexts where differences in dialogic interaction could confound results, undertake
steps to tackle these unwanted influences. For example, experimental groups could be
compared for the number of turns used/the number of questions asked, and so on. If
differences on these dimensions are found, statistical procedures that partial out the
respective influences could be employed. This would allow researchers to carry out
experimental studies in order to exert some degree of control over aspects such as con-
tent of talk (narration/description of set stimuli) but without compromising spontane-
ous social interaction and running the risk of unnecessary reductionism; only
approaches using a social unit of analysis offer the opportunity to capture those pro-
cesses that cannot be captured by simply “summing the parts” (cf. Bavelas 2005). Con-
sidering we still know relatively little about gesture as a social behaviour and its use in
dialogic interaction, experimental paradigms based on spontaneous, free interaction between non-confederates are certainly one main avenue researchers in this field need
to pursue.
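One way to implement the partialling-out suggested above is an analysis of covariance, with the number of turns (or questions) entered as a covariate alongside the experimental condition; the sketch below assumes hypothetical column names and data and is only meant to show the general form of such a model.

```python
# Sketch of "partialling out" dialogic interaction when comparing gesture rates:
# gesture rate is modelled as a function of experimental condition with the number
# of turns as a covariate. Data frame and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "condition": ["common_ground"] * 4 + ["no_common_ground"] * 4,
    "n_turns": [12, 15, 9, 14, 22, 25, 19, 24],
    "gesture_rate": [7.8, 8.4, 6.9, 8.1, 9.5, 10.2, 8.8, 9.9],
})

model = smf.ols("gesture_rate ~ C(condition) + n_turns", data=df).fit()
print(model.summary())
```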
6. Conclusion
In this chapter, I have tried to provide an overview of the range of basic paradigms (and
their variations) employed in experimental co-speech gesture research, combined with
some degree of critical reflection on aspects of these procedures. Due to the scope of
this article, this overview remains selective and limited in many ways, but I hope to
have been able to provide some starting point here, especially for scholars new to the
field of co-speech gesture research.
Ultimately, when choosing between different experimental methods and weighing up their pros and cons, it is the researcher’s exact aim that determines what is gained and lost by opting for a particular paradigm. With respect to the interpretation of
research findings, it is important to recognise that differences in research results may
in fact be rooted in differences between experimental paradigms used, even if these
may seem small (such as regarding the degree of dialogic interaction). Further, it is
important to be careful with the generalisation of findings and with distinguishing,
for example, whether results tell us something about gesture use in dialogue or in
more monologue-type contexts; or whether they tell us nothing about gesture use in
interaction at all but useful things about the gesture-speech relationship nevertheless
(i.e., something that Bavelas 2005 has referred to as studying the mind, or individuals’
thinking, as opposed to social interaction).
In trying to make a choice between different paradigms in the light of their method-
ological advantages and limitations, the most fruitful approach may still be one that combines less experimentally controlled methods with more tightly controlled ones (given there are good rea-
sons for employing the latter). This way, we might throw light on the phenomena we
aim to investigate from a range of different angles, capturing partly different aspects,
and obtaining the most comprehensive answers. In my view, different experimental methods and techniques complement one another, just as laboratory-based research on co-speech gesture complements observations of gesture in non-experimental contexts – both in terms of the methods used and the questions answered.
7. References
Alibali, Martha W., Lucia M. Flevares and Susan Goldin-Meadow 1997. Assessing knowledge con-
veyed in gesture: Do teachers have the upper hand? Journal of Educational Psychology 89:
183–193.
Alibali, Martha W. and Susan Goldin-Meadow 1993. Gesture speech mismatch and mechanisms of
learning: What the hands reveal about a child’s state of mind. Cognitive Psychology 25: 468–523.
Alibali, Martha W., Dana C. Heath and Heather J. Myers 2001. Effects of visibility between
speaker and listener on gesture production: Some gestures are meant to be seen. Journal of
Memory and Language 44: 169–188.
Alibali, Martha W., Sotaro Kita and Amanda J. Young 2000. Gesture and the process of speech
production: We think, therefore we gesture. Language & Cognitive Processes 15: 593–613.
Allen, Shanley, Asli Özyürek, Sotaro Kita, Amanda Brown, Reyhan Furman, Tomoko Ishizuka and
Mihoko Fujii 2007. Language-specific and universal influences in children’s syntactic packaging
of manner and path: A comparison of English, Japanese, and Turkish. Cognition 102: 16–48.
Argyle, Michael and Jean A. Graham 1975. A cross-cultural study of the communication of extra-
verbal meaning by gestures. International Journal of Psychology 10: 57–67.
Bavelas, Janet B. 2005. The two solitudes: Reconciling social psychology and language and social
interaction. In: Kristine L. Fitch and Robert E. Sanders (eds), Handbook of Language and
Social Interaction, 179–200. Mahwah, NJ: Lawrence Erlbaum.
Bavelas, Janet B., Alex Black, Nicole Chovil, Charles R. Lemery and Jennifer Mullett 1988. Form and function in motor mimicry: Topographic evidence that the primary function is communica-
tive. Human Communication Research 14: 275–299.
Bavelas, Janet B., Alex Black, Charles R. Lemery and Jennifer Mullett 1986. “I show how you feel”:
Motor mimicry as a communicative act. Journal of Personality and Social Psychology 50: 322–329.
Bavelas, Janet B. and Nicole Chovil 2000. Visible acts of meaning: An integrated message model
of language in face-to-face dialogue. Journal of Language and Social Psychology 19: 163–194.
Bavelas, Janet B., Nicole Chovil, Linda Coates and Lori Roe 1995. Gestures specialized for dia-
logue. Personality and Social Psychology Bulletin 21: 394–405.
Bavelas, Janet B., Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive gestures.
Discourse Processes 15: 469–489.
Bavelas, Janet B., Linda Coates and Trudy Johnson 2000. Listeners as co-narrators. Journal of Per-
sonality and Social Psychology 79: 941–952.
Bavelas, Janet B., Jennifer Gerwing, Chantelle Sutton and Danielle Prevost 2008. Gesturing on the
telephone: Independent effects of dialogue and visibility. Journal of Memory and Language 58:
495–520.
Beattie, Geoffrey and Rima Aboudan 1994. Gestures, pauses and speech: An experimental inves-
tigation of the effects of changing social context on their precise temporal relationship. Semi-
otica 99: 239–272.
Beattie, Geoffrey and Jane Coughlan 1999. An experimental investigation of the role of iconic gestures
in lexical access using the tip-of-the-tongue phenomenon. British Journal of Psychology 90: 35–56.
Beattie, Geoffrey and Heather Shovelton 1999a. Do iconic hand gestures really contribute any-
thing to the semantic information conveyed by speech? Semiotica 123: 1–30.
Beattie, Geoffrey and Heather Shovelton 1999b. Mapping the range of information contained in
the iconic hand gestures that accompany spontaneous speech. Journal of Language and Social
Psychology 18: 438–462.
Beattie, Geoffrey and Heather Shovelton 2001. An experimental investigation of the role of dif-
ferent types of iconic gesture in communication. Gesture 1(2): 129–149.
Behne, Tanya, Malinda Carpenter and Michael Tomasello 2005. One-year-olds comprehend the
communicative intentions behind gestures in a hiding game. Developmental Science 8: 492–499.
Broaders, Sara C., Susan Wagner Cook, Zachary Mitchell and Susan Goldin-Meadow 2007. Mak-
ing children gesture brings out implicit knowledge and leads to learning. Journal of Experimen-
tal Psychology: General 136: 539–550.
Carlomagno, Sergio, Maria Pandolfi, Andrea Marini, Gabriella Di Iasi and Carla Cristilli 2005.
Coverbal gestures in Alzheimer’s type dementia. Cortex 41: 535–546.
Church, Ruth Breckinridge and Susan Goldin-Meadow 1986. The mismatch between gesture and
speech as an index of transitional knowledge. Cognition 23: 43–71.
Church, Ruth Breckinridge, Spencer D. Kelly and Katherine Lynch 2000. Multi-modal processing
over development: The case of speech and gesture detection. Journal of Nonverbal Behavior
24: 151–174.
Clark, Herbert H. 1996. Using Language. Cambridge: Cambridge University Press.
Cocks, Naomi, Kathryn Hird and Kim Kirsner 2007. The relationship between right hemisphere
damage and gesture in spontaneous discourse. Aphasiology 21: 299–319.
Cocks, Naomi, Laetitia Sautin, Sotaro Kita, Gary Morgan and Sally Zlotowitz 2009. Gesture and
speech integration: An exploratory study of a man with aphasia. International Journal of Lan-
guage and Communication Disorders 44: 795–804.
Cohen, Akiba A. 1977. The communicative functions of hand illustrators. Journal of Communica-
tion 27: 54–63.
Cohen, Akiba A. and Randall P. Harrison 1973. Intentionality in the use of hand illustrators in face-
to-face communication situations. Journal of Personality and Social Psychology 28: 276–279.
Corballis, Michael 2003. From Hand to Mouth: The Origins of Language. Princeton, NJ: Princeton
University Press.
de Fornel, Michel 1992. The return gesture: Some remarks on context, inference, and iconic ges-
ture. In: Peter Auer and Aldo Di Luzio (eds.), The Contextualisation of Language, 159–176.
Amsterdam: John Benjamins.
de Ruiter, Jan-Peter 1998. Gesture and speech production. Ph.D. dissertation, MPI Series in Psy-
cholinguistics, Catholic University of Nijmegen, the Netherlands.
Duncan, Starkey and George Niederehe 1974. On signalling that it’s your turn to speak. Journal of
Experimental Social Psychology 10: 234–247.
Efron, David 1941. Gesture and Environment. New York: King’s Crown Press.
Feyereisen, Pierre and Isabelle Havard 1999. Mental imagery and production of hand gestures
while speaking in younger and older adults. Journal of Nonverbal Behavior 23: 153–171.
Feyereisen, Pierre, Xavier Seron and M. de Macar 1981. L’interpretation de differentes categories
de gestes chez des sujets aphasiques. Neuropsychologia 19: 515–521.
Feyereisen, Pierre and Martial van der Linden 1997. Immediate memory for different kinds of ges-
tures in younger and older adults. Cahiers de Psychologie Cognitive/Current Psychology of
Cognition 16: 519–533.
Feyereisen, Pierre, Michèle van de Wiele and Fabienne Dubois 1988. The meaning of gestures:
What can be understood without speech? Cahiers de Psychologie Cognitive 8: 3–25.
Fillmore, Charles 1971. Types of lexical information. In: Danny D. Steinberg and Leon A. Jako-
bovits (eds.), Semantics. An Interdisciplinary Reader in Philosophy, Linguistics and Psychol-
ogy, 370–392. New York: Cambridge University Press.
Frick-Horbury, Donna and Robert E. Guttentag 1998. The effects of restricting hand gesture pro-
duction on lexical retrieval and free recall. American Journal of Psychology 111: 43–62.
Gerwing, Jennifer and Janet B. Bavelas 2004. Linguistic influences on gesture’s form. Gesture 4(2):
157–195.
Gliga, Teodora and Gergely Csibra 2009. One-year-old infants appreciate the referential nature of
deictic gestures and words. Psychological Science 20: 347–353.
Glosser, Guila, Mary J. Wiley and Edward J. Barnoski 1998. Gestural communication in Alzhei-
mer’s disease. Journal of Clinical and Experimental Neuropsychology 20: 1–13.
Goldin-Meadow, Susan 2000. Beyond words: The importance of gestures to researchers and learn-
ers. Child Development 71: 231–239.
Goldin-Meadow, Susan 2003. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA:
Harvard University Press.
Goldin-Meadow, Susan, Howard Nusbaum, Spencer D. Kelly and Susan Wagner 2001. Explaining
math: Gesturing lightens the load. Psychological Science 12: 516–522.
Goldin-Meadow, Susan and Catherine Sandhofer 1999. Gestures convey substantive information
about a child’s thoughts to ordinary listeners. Developmental Science 2: 67–74.
Goldin-Meadow, Susan, Martha Wagner Alibali, and Ruth Breckinridge Church 1993. Transi-
tions in concept acquisition: Using the hand to read the mind. Psychological Review 100:
279–297.
Goldin-Meadow, Susan, Debra Wein and Cecilia Chang 1992. Assessing knowledge through ges-
ture: Using children’s hands to read their minds. Cognition and Instruction 9: 201–219.
Goodwin, Charles 1986. Gesture as a resource for the organization of mutual orientation. Semi-
otica 62: 29–49.
Gräfenhain, Maria, Tanya Behne, Malinda Carpenter and Michael Tomasello 2009. One-year-
olds’ understanding of nonverbal gestures directed to a third person. Cognitive Development
24: 23–33.
Graham, Jean A. and Michael Argyle 1975. A cross-cultural study of the communication of extra-
verbal meaning by gestures. International Journal of Psychology 10: 57–67.
Gullberg, Marianne 2003. Eye movements and gestures in human face-to-face interaction. In:
Jukka Hyönä, Ralph Radach and Heiner Deubel (eds.), The Mind’s Eye: Cognitive and
Applied Aspects of Eye Movement Research, 685–703. Oxford: Elsevier.
Gullberg, Marianne 2006. Handling discourse: Gestures, reference tracking, and communication
strategies in early L2. Language Learning 56: 155–196.
Gullberg, Marianne and Kenneth Holmqvist 1999. Keeping an eye on gestures: Visual perception
of gestures in face-to-face communication. Pragmatics & Cognition 7: 35–63.
Gullberg, Marianne and Kenneth Holmqvist 2006. What speakers do and what addressees look at:
Visual attention to gestures in human interaction live and on video. Pragmatics & Cognition 14:
53–82.
Gullberg, Marianne and Sotaro Kita 2009. Attention to speech-accompanying gestures: Eye
movements and information uptake. Journal of Nonverbal Behavior 33: 251–277.
Hadar, Uri, Dafna Wenkert-Olenik, R. Krauss and Nachum Soroker 1998. Gesture and the pro-
cessing of speech: Neuropsychological evidence. Brain and Language 62: 107–126.
Heath, Christian 1989. Pain talk: The expression of suffering in the medical consultation. Social
Psychology Quarterly 52: 113–125.
Heath, Christian 2002. Demonstrative suffering: The gestural (re)embodiment of symptoms. Jour-
nal of Communication 52: 597–616.
Holle, Henning and Thomas C. Gunter 2007. The role of iconic gestures in speech disambiguation:
ERP evidence. Journal of Cognitive Neuroscience 19: 1175–1192.
Holle, Henning, Thomas C. Gunter, Shirley-Ann Rüschemeyer, Andreas Hennenlotter and Marco Ia-
coboni 2008. Neural correlates of the processing of co-speech gestures. NeuroImage 39: 2010–2024.
Holler, Judith 2003. Semantic and pragmatic aspects of representational gestures: Towards a uni-
fied model of communication in talk. Ph.D. dissertation, Department of Psychology, University
of Manchester, Manchester (UK).
Holler, Judith and Geoffrey Beattie 2002. A micro-analytic investigation of how iconic gestures
and speech represent core semantic features in talk. Semiotica 142: 31–69.
Holler, Judith and Geoffrey Beattie 2003a. How iconic gestures and speech interact in the repre-
sentation of meaning: Are both aspects really integral to the process? Semiotica 146: 81–116.
Holler, Judith and Geoffrey Beattie 2003b. Pragmatic aspects of representational gestures: Do
speakers use them to clarify verbal ambiguity for the listener? Gesture 3(2): 127–154.
Holler, Judith, Spencer D. Kelly, Peter Hagoort and Asli Özyürek 2012. When gestures catch the eye:
The influence of gaze direction on co-speech gesture comprehension in triadic communication.
In: Naomi Miyake, David Peebles and Richard D. Cooper (eds.), Proceedings of the 34th Annual
Conference of the Cognitive Science Society, 467–472. Austin, TX: Cognitive Science Society.
Holler, Judith, Heather Shovelton and Geoffrey Beattie 2009. Do iconic hand gestures really con-
tribute to the communication of semantic information in a face-to-face context? Journal of
Nonverbal Behavior 33: 73–88.
Holler, Judith and Rachel Stevens 2007. The effect of common ground on how speakers use gesture
and speech to represent size information. Journal of Language and Social Psychology 26: 4–27.
Holler, Judith and Katie Wilkin 2009. Communicating common ground: How mutually shared
knowledge influences speech and gesture in a narrative task. Language and Cognitive Processes
24: 267–289.
Holler, Judith and Katie Wilkin 2011. Co-speech gesture mimicry in the process of collaborative
referring during face-to-face dialogue. Journal of Nonverbal Behavior 35: 133–153.
Hostetter, Autumn B., Martha W. Alibali and Sotaro Kita 2007. I see it in my hands’ eye: Represen-
tational gestures reflect conceptual demands. Language and Cognitive Processes 22: 313–336.
Jacobs, Naomi and Alan Garnham 2007. The role of conversational hand gestures in a narrative
task. Journal of Memory and Language 56: 291–303.
Kelly, Spencer D. 2001. Broadening the units of analysis in communication: speech and nonverbal
behaviours in pragmatic comprehension. Journal of Child Language 28: 325–349.
Kelly, Spencer D., Dale Barr, Ruth Breckinridge Church and Katheryn Lynch 1999. Offering a
hand to pragmatic understanding: The role of speech and gesture in comprehension and mem-
ory. Journal of Memory and Language 40: 577–592.
Kelly, Spencer D. and Ruth Breckinridge Church 1997. Can children detect conceptual information
conveyed through other children’s nonverbal behaviors? Cognition and Instruction 15: 107–134.
Kelly, Spencer D. and Ruth Breckinridge Church 1998. A comparison between children’s and
adults’ ability to detect conceptual information conveyed through representational gestures.
Child Development 69: 85–93.
Kelly, Spencer D., Peter Creigh and James Bartolotti 2010. Integrating speech and iconic gestures
in a stroop-like task: Evidence for automatic processing. Journal of Cognitive Neuroscience
22(4): 683–694.
Kelly, Spencer D., Jana M. Iverson, Joseph Terranova, Julia Niego, Michael Hopkins and Leslie
Goldsmith 2002. Putting language back in the body: Speech and gesture on three time frames.
Developmental Neuropsychology 22: 323–349.
Kelly, Spencer D., Corinne Kravitz and Michael Hopkins 2004. Neural correlates of bimodal
speech and gesture comprehension. Brain and Language 89: 253–260.
Kelly, Spencer D., Asli Özyürek and Eric Maris 2010. Two sides of the same coin: Speech and ges-
ture mutually interact to enhance comprehension. Psychological Science 21: 260–267.
Kelly, Spencer D., Sarah Ward, Peter Creigh and James Bartolotti 2007. An intentional stance
modulates the integration of gesture and speech during comprehension. Brain and Language
101: 222–233.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relationship of Verbal and Nonverbal Communication, 207–228. The
Hague: Mouton.
Kendon, Adam 1985. Some uses of gesture. In: Deborah Tannen and Muriel Saville-Troike (eds.),
Perspectives on Silence, 215–234. Norwood, NJ: Ablex.
Kendon, Adam 2000. Language and gesture: Unity or duality. In: David McNeill (ed.), Language
and Gesture, 47–63. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kimbara, Irene 2006. On gestural mimicry. Gesture 6(1): 39–61.
Kimbara, Irene 2008. Gesture form convergence in joint description. Journal of Nonverbal Behav-
ior 32: 123–131.
Kita, Sotaro 2009. Cross-cultural variation of speech-accompanying gesture: A review. Language
and Cognitive Processes 24: 145–167.
Kita, Sotaro and Thomas S. Davies 2009. Competing conceptual representations trigger co-speech
representational gestures. Language and Cognitive Processes 24: 761–775.
Kita, Sotaro and Hedda Lausberg 2008. Generation of co-speech gestures based on spatial imag-
ery from the right-hemisphere: Evidence from split-brain patients. Cortex 44: 131–139.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Krauss, Robert M., Yihsiu Chen and Rebecca Gottesman 2000. Lexical gestures and lexical access:
A process model. In: David McNeill (ed.), Language and Gesture, 261–283. New York: Cam-
bridge University Press.
Krauss, Robert M., Robert A. Dushay, Yihsiu Chen and Frances Rauscher 1995. The communicative
value of conversational hand gesture. Journal of Experimental Social Psychology 31: 533–552.
Krauss, Robert M., Palmer Morrel-Samuels and Christina Colasante 1991. Do conversational
hand gestures communicate? Journal of Personality and Social Psychology 61: 743–754.
Lausberg, Hedda and Sotaro Kita 2002. Dissociation of right and left hand gesture spaces in split-
brain patients. Cortex 38: 883–886.
Lausberg, Hedda, Sotaro Kita, Eran Zaidel and Alain Ptito 2003. Split-brain patients neglect left
personal space during right-handed gestures. Neuropsychologia 41: 1317–1329.
Lickiss, Karen P. and A. Rodney Wellens 1978. Effects of visual accessibility and hand restraint on
fluency of gesticulator and effectiveness of message. Perceptual and Motor Skills 46: 925–926.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92: 350–371.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
McNeill, David 2001. Analogic/Analytic representations and cross-linguistic differences in think-
ing for speaking. Cognitive Linguistics 11: 43–60.
McNeill, David and Susan Duncan 2000. Growth points in thinking-for-speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
Melinger, Alissa and Willem J. Levelt 2004. Gesture and the communicative intention of the
speaker. Gesture 4: 119–141.
Melinger, Alissa and Sotaro Kita 2007. Conceptualisation load triggers gesture production. Lan-
guage and Cognitive Processes 22: 473–500.
Mol, Lisette, Emiel Krahmer, Alfons Maes and Marc Swerts 2009. The communicative import of
gestures: Evidence from a comparative analysis of human-human and human-machine interac-
tions. Gesture 9(1): 97–126.
Mondada, Lorenza 2007. Multimodal resources for turn-taking: Pointing and the emergence of
possible next speakers. Discourse Studies 9: 194–225.
Morsella, Ezequiel and Robert M. Krauss 2004. The role of gestures in spatial working memory
and speech. American Journal of Psychology 117: 411–424.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in a conversation. In: Monica Rector, Isabella Poggi and Nadine Trigo (eds.), Gestures: Mean-
ing and Use, 259–265. Porto: Universidade Fernando Pessoa.
Özyürek, Asli 2002. Do speakers design their cospeech gestures for their addressees? The effects of
addressee location on representational gestures. Journal of Memory and Language 46: 688–704.
Özyürek, Asli, Sotaro Kita, Shanley Allen, Amanda Brown, Reyhan Furman and Tomoko Ishi-
zuka 2008. Development of cross-linguistic variation in speech and gesture: Motion events
in English and Turkish. Developmental Psychology 44: 1040–1054.
Özyürek, Asli, Sotaro Kita, Shanley Allen, Reyhan Furman and Amanda Brown 2005. How does
linguistic framing of events influence co-speech gestures? Insights from cross-linguistic varia-
tions and similarities. Gesture 5(1/2): 219–240.
Özyürek, Asli, Roel M. Willems, Sotaro Kita and Peter Hagoort 2007. On-line integration of
semantic information from speech and gesture: Insights from event-related brain potentials.
Journal of Cognitive Neuroscience 19: 605–616.
Parrill, Fey 2010. The hands are part of the package: Gesture, common ground, and information packag-
ing. In: John Newman and Sally Rice (eds.), Empirical and Experimental Methods in Cognitive/
Functional Research, 285–302. Stanford, CA: Center for the Study of Language and Information.
Parrill, Fey and Irene Kimbara 2006. Seeing and hearing double: The influence of mimicry in
speech and gesture and observers. Journal of Nonverbal Behavior 30: 157–166.
Perry, Michelle, Ruth Breckinridge Church and Susan Goldin-Meadow 1988. Transitional knowl-
edge in the acquisition of concepts. Cognitive Development 3: 359–400.
Pickering, Martin J. and Simon Garrod 2004. Toward a mechanistic psychology of dialogue.
Behavioral and Brain Sciences 27: 1–57.
Pine, Karen, Hannah Bird and Elizabeth Kirk 2007. The effects of prohibiting gestures on chil-
dren’s lexical retrieval ability. Developmental Science 10: 747–754.
Pine, Karen, Nicola Lufkin and David Messer 2004. More gestures than answers: Children learn-
ing about balance. Developmental Psychology 40: 1059–1067.
Rauscher, Frances H., Robert M. Krauss and Yihsiu Chen 1996. Gesture, speech, and lexical
access: The role of lexical movements in speech production. Psychological Science 7: 226–231.
Rimé, Bernard, Loris Schiaratura, Michel Hupet and Anne Ghysselinckx 1984. Effects of relative
immobilization on the speaker’s nonverbal behavior and on the dialogue imagery level. Moti-
vation and Emotion 8: 311–325.
Riseborough, Margaret G. 1981. Physiographic gestures as decoding facilitators: Three experiments
exploring a neglected facet of communication. Journal of Nonverbal Behavior 5: 172–183.
Rizzolatti, Giacomo and Michael Arbib 1998. Language within our grasp. Trends in Neurosciences
21: 188–194.
Rogers, William T. 1978. The contribution of kinesic illustrators toward the comprehension of ver-
bal behavior within utterances. Human Communication Research 5: 54–62.
Roth, Wolff-Michael 2001. Gestures: Their role in teaching and learning. Review of Educational
Research 71: 365–392.
Seyfeddinipur, Mandana 2004. Meta-discursive gestures from Iran: Some uses of the ‘Pistol Hand’.
In: Cornelia Müller and Roland Posner (eds.), The Semantics and Pragmatics of Everyday
Gestures. Proceedings of the Berlin Conference April 1998 (Körper Zeichen Kultur 9), 205–216.
Berlin: Weidler.
Skipper, Jeremy I., Susan Goldin-Meadow, Howard C. Nusbaum and Steven L. Small 2007.
Speech-associated gestures, Broca’s area, and the human mirror system. Brain and Language
101: 260–277.
Straube, Benjamin, Antonia Green, Andreas Jansen, Anjan Chatterjee and Tilo Kircher 2010.
Social cues, mentalizing and the neural processing of speech accompanied by gestures. Neurop-
sychologia 48: 382–393.
Streeck, Jürgen 1994. Gesture as communication II: The audience as co-author. Research on Lan-
guage and Social Interaction 27: 239–267.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer and
Aldo di Luzio (eds.), The Contextualisation of Language, 135–157. Amsterdam, NL: John Benjamins.
Thompson, Laura A. 1995. Encoding and memory for visible speech and gestures: A comparison
between young and older adults. Psychology & Aging 10: 215–228.
Thompson, Laura A. and Dominic W. Massaro 1986. Evaluation and integration of speech and
pointing gestures during referential understanding. Journal of Experimental Child Psychology
42: 144–168.
Tomasello, Michael 2008. Origins of Human Communication. Cambridge, MA: MIT Press.
Wesp, Richard, Jennifer Hesse, Donna Keutmann and Karen Wheaton 2001. Gestures maintain
spatial imagery. American Journal of Psychology 114: 591–600.
Willems, Roel M., Asli Özyürek and Peter Hagoort 2007. When language meets action: The neural
integration of gesture and speech. Cerebral Cortex 17: 2322–2333.
Willems, Roel M., Asli Özyürek and Peter Hagoort 2009. Differential roles for left inferior frontal
and superior temporal cortex in multimodal integration of action and language. NeuroImage
47: 1992–2004.
Wu, Ying Choon and Seana Coulson 2007. How iconic gestures enhance communication: An ERP
study. Brain and Language 101: 234–245.
Yan, Stephanie and Elena Nicoladis 2009. Finding le mot juste: Differences between bilingual and
monolingual children’s lexical access in comprehension and production. Bilingualism: Lan-
guage and Cognition 12: 323–335.
53. Documentation of gestures with motion capture

Abstract
For the scientific observation of non-verbal communication behavior, video recordings
are the state of the art. However, everyone who has conducted at least one video-based
study has probably experienced how difficult it is to get the setup right with respect to image resolution, illumination, perspective, occlusions, etc. Even more effort is needed for the annotation of the data: even short interaction sequences may consume weeks or even months of rigorous full-time annotation.
One way to overcome some of these issues is the use of motion capturing for assessing
(not only) communicative body movements. There are several competing tracking tech-
nologies available, each with its own benefits and drawbacks. The article provides an
overview of the basic types of tracking systems, presents representation formats and
tools for the analysis of motion data, provides pointers to some studies using motion cap-
ture and discusses best practices for study design. However, the article also stresses that
motion capturing still requires some expertise and is only starting to become mobile
and reasonably priced – arguments not to be neglected.
1. Introduction
Modern tracking technology for full body tracking, also referred to as motion capturing,
can be used as an alternative or a complementary method to video recordings (see also
Pfeiffer this volume, on documenting with data gloves for a discussion). It offers a high
precision of position and orientation information regarding specific points of reference,
which can be chosen by the experimenter. Motion tracking can be used to collect move-
ment trajectories, posture data, speed profiles and other performance indices. Disadvan-
tages are: the use of obtrusive technology that has to be attached to the body of the target, such as the reflective markers used by optical tracking systems; data which describe the postures and movements of a stick-like figure rather than those of a body with certain masses (motion capturing should therefore almost always be combined with video recordings); and, last but not least, the costs of the installation.
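As a small illustration of the kind of performance indices mentioned above, the following sketch derives a speed profile from a sequence of 3D marker positions; the sample values and the 100 Hz update rate are assumptions made for the example.

```python
# Minimal sketch: computing a speed profile from a tracked marker trajectory.
# Positions (in metres) and the 100 Hz sampling rate are hypothetical.
import numpy as np

positions = np.array([
    [0.10, 1.20, 0.50],
    [0.11, 1.21, 0.50],
    [0.13, 1.23, 0.51],
    [0.16, 1.26, 0.53],
])                       # one X/Y/Z sample per frame
dt = 1.0 / 100.0         # time between frames in seconds

velocity = np.diff(positions, axis=0) / dt      # per-axis velocity in m/s
speed = np.linalg.norm(velocity, axis=1)        # scalar speed profile
print(speed)
```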
The following text provides a short introduction to the state of the art of tracking technology that is available for use in laboratories. While there are some companies
that offer high-quality motion capturing services for the film and gaming industries,
making use of their services is probably beyond the scope of more modest research
projects. The following presentation is therefore focused on the technology that can
be found in research laboratories at present.
2. Tracking technologies
There are two general types of tracking technologies available: marker-based systems
and marker-less systems. Both types have their advantages, especially in the context
of authentic empirical research of natural human communication.
As the name suggests, marker-based tracking systems rely on the detection of spe-
cific markers. The systems differ in the types of markers they use: some are based on passive reflective markers or colored patches, while others use active infrared LEDs. Combinations of different types of markers are also used. The markers can be single objects, most frequently spheres, which allow their position to be tracked. As the position is spe-
cified in three coordinates (X/Y/Z), these markers provide three degrees of freedom
(3 DoF). Other markers, such as the ones presented in Fig. 53.1, have a more complex
3D structure. These markers also allow their orientation to be tracked and thus have six DoF (6 DoF: 3 DoF for the position and 3 DoF for the orientation). Markers
with 6 DoF can also be uniquely identified, which is not easily possible with sphere-like
3-DoF markers.
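To make the difference between 3-DoF and 6-DoF data concrete, the following sketch shows one possible in-memory representation of a single 6-DoF sample (position plus orientation); the field names are illustrative and not tied to any vendor’s format.

```python
# Sketch of a single 6-DoF tracking sample: three positional degrees of freedom
# (X/Y/Z) plus orientation, here stored as a unit quaternion. Field names are
# illustrative and not taken from any particular tracking system.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TargetSample:
    target_id: str                                   # 6-DoF targets can be uniquely identified
    timestamp: float                                 # seconds since start of recording
    position: Tuple[float, float, float]             # X, Y, Z in metres (3 DoF)
    orientation: Tuple[float, float, float, float]   # quaternion w, x, y, z (3 rotational DoF)

sample = TargetSample("left_hand", 0.016, (0.12, 1.05, 0.48), (1.0, 0.0, 0.0, 0.0))
print(sample)
```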
Fig. 53.1: Left: The optical tracking system from Advanced Realtime Tracking GmbH uses infra-
red cameras (upper left and upper right corner) to measure the position and orientation of specif-
ically designed targets. The targets use reflective spherical markers in unique 3D configurations,
which enables the system to uniquely identify each marker. The person at the center wears markers
at distinct positions relevant for a study on body movements related to verb productions. Right: The
picture on the right shows a screenshot of the software visualizing the tracking data recorded using
the setup on the left.
Systems with motion-capturing suits operate in the so-called outside-in mode: the
markers are attached to the object of interest and the motion of the markers is observed
by appropriate devices, such as high-speed infrared cameras. The markers have to be
carefully arranged on the target, so that all relevant movements are captured and the
structure of the body can be reconstructed from the data. Fig. 53.1 shows a participant
of one of our studies with a small set of markers attached to knees, shoulders, elbows
and head. The hands are tracked with the special finger-tracking system offered by
the Advanced Realtime Tracking GmbH (ART 1999). Often, the body is represented
by a skeletal stick figure. The movements of the markers are then translated into the
movements of the skeletal representation of the body (see Fig. 53.1, right). As the
markers are attached to the soft skin of the body and not to the bones themselves, a
mapping from the markers to the skeletal figure has to be defined, e.g. in an initial cal-
ibration procedure. This type of tracking is today’s standard for capturing human body movements. Prominent examples of such tracking systems are the systems
developed by Advanced Realtime Tracking GmbH (ART 1999), Vicon Motion Systems
(VMS 1984) and Motion Analysis (MA 1982).
The advantage of outside-in tracking is that the markers attached to the tracked target are very small, lightweight and cheap. If the full body is to be tracked, many markers need to be attached to get a good approximation of the body’s posture, and thus the size, weight and cost of the markers are important factors.
A major disadvantage is that the devices used to detect the markers and their movements need to be set up with a good view of the target object. Thus outside-in tracking faces a problem similar to that faced by standard video recordings. The resolution and,
e.g., the focal distances of the cameras are also restrictive factors constraining the opera-
tional area in which movement can be tracked, the interaction space of the tracking system.
A typical setup for the tracking of dyadic communication has an interaction space of 3 m x
3 m. Such a setup would already require eight or more cameras, which have to be placed
carefully around the interaction space and which have to be thoroughly calibrated to construct a common coordinate system as a frame of reference. An advantage of
the motion tracking approach in contrast to normal video recordings is that whereas the
video from each camera has to be annotated separately, the motion-capturing system pro-
vides one integrated data point for each marker. Thus, increasing the number of cameras
in an outside-in tracking system will only increase the accuracy and range of the system and will have no effect on the effort that has to be invested in the analysis of the data.
Inside-out tracking reverses the arrangement of tracking devices and markers: the
tracking devices are attached to the target and the markers are distributed over the
environment, e.g. on the ceiling. This kind of setup can reduce costs if only a few positions need to be tracked. It also enables tracking in interaction spaces which are too large to be covered with an outside-in tracking system. The tracking devices, however, are larger and heavier than the corresponding markers, and they are also more expensive and more fragile. This increases the obtrusiveness of the tracking technology experienced by the tracked participant and thus might negatively influence the perfor-
mance to be observed. Well-known examples of inside-out tracking are the GPS system used for navigation and the Wii Remote (Wii 2006) developed by Nintendo for their Wii gaming console. In the case of the Wii Remote, however, the rather large sensor has to be held in the hand during the recordings.
Magnetic tracking systems, such as the Ascension Flock of Birds (ATC 1986, see Fig. 53.2), are a special case of marker-based inside-out tracking. These systems have sen-
sor devices, which are attached to the target, but they do not use discrete markers in the
environment. Instead, they have an outside unit which produces a magnetic field that
covers the interaction space (3 m x 4.5 m) and in which the sensors can measure
their position and orientation. The advantage of these systems is that they have no
need for a clear line of sight to markers and provide high update rates of more than 100 Hz. Current systems, such as Ascension MotionStar, can operate wired or wirelessly
and provide up to 20 sensor positions per target. The accuracy of these systems depends
on the structure of the magnetic field and is typically better the closer the sensor is to the field-generating unit.
Fig. 53.2: Ascension’s Flock of Birds is the most prominent magnetic tracking solution. The black
box in the back generates a magnetic field in which the sensor presented in the front can deter-
mine its position and orientation.
Fig. 53.3: Example of a segmentation of four persons based on the depth image provided by a Mi-
crosoft Kinect using marker-less tracking. The basic depth image data is visualized as a grey-scale
image and brighter colors represent areas closer to the Kinect. The segmentation and detection of
persons, here overlaid using colored regions, was computed by OpenNI (ONI 2011). The person to
the left also had OpenNI’s skeleton tracking activated.
Marker-less outside-in tracking systems observe the movement of the body from a dis-
tance. Most of the available systems operate in the visual domain. For an overview of
purely vision-based techniques, see Wang and Singh (2003), Moeslund, Hilton, and Krüger
(2006), Poppe (2007) or Poppe (2010).
A recent prominent example of an outside-in tracking system is the Microsoft Kinect
(MSDN 2010), an interaction device based on a depth camera produced by PrimeSense
(PS 2005), which actually uses a kind of marker, a pattern of structured light which is
projected onto the target and whose distortions are measured to extract depth informa-
tion. The Kinect does not require any attachments to the target. As a first result, the
Kinect provides a depth image that is represented as a greyscale image where the indi-
vidual intensities encode the depth of the first object hit by the light (see Fig. 53.3). The
provided software frameworks, Microsoft Kinect SDK (MSDN 2010) for Windows or
OpenNI (ONI 2011) for Windows and other platforms, analyze the depth image and
extract skeletal information in a second step (see Fig. 53.3, person to the left). This skel-
eton model is still rather coarse, as can be seen in Fig. 53.3, and does not contain hands,
fingers or the orientation of the head. This technology is rather new, so more precise
versions of Kinect-like systems and better software frameworks for skeleton extraction
are to be expected.
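The following toy example illustrates the kind of data a depth camera delivers: a 2D array of distances that can be segmented with a simple near/far threshold. It is not the Microsoft Kinect SDK or the OpenNI API, merely a sketch of the underlying data with assumed values.

```python
# Toy illustration of depth-image data: a small array of distances in millimetres,
# segmented by a crude threshold. This is not the Kinect SDK or OpenNI, only a
# sketch of the kind of data such systems provide (values are hypothetical).
import numpy as np

depth_mm = np.array([
    [2400, 2380, 1210, 1195],
    [2395, 1225, 1200, 1190],
    [2410, 2405, 2390, 2385],
])

foreground = depth_mm < 1500          # everything closer than 1.5 m
print(foreground.astype(int))         # 1 marks pixels belonging to the near object
```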
There are several alternatives to the Biovision Hierarchy file format, such as the
Hierarchical Translation Rotation (HTR) and the hierarchy-less Global Translation
Rotation (GTR) formats used by the company Motion Analysis (MA 1982). A more
recent format is GMS (Luciani et al. 2006), which is more compact, as it uses a binary representation, and also more flexible than Biovision Hierarchy. It is, however, less widespread and has thus not yet gained support comparable to that of Biovision Hierarchy.
The National Institutes of Health defined a format mainly targeted at biomechanical research, called Coordinate 3D (C3D 1987). It is a binary format and supports a large variety of data well beyond the pure 3D position and orientation data described in the Biovision Hierarchy format. A Coordinate 3D file can include data from electromyography, force plates, patient information and analysis results such as gait timing, and it is extensible to support new kinds of data.
Other quite popular formats are the commercial Autodesk FBX format (FBX 2012), a binary format that can be accessed and manipulated using the FBX SDK, and the Collada format (COLLADA 2012), an open format based on the Extensible Markup Language (XML) (Bray et al. 2008). The gaming company Acclaim developed its own motion capturing system and defined two file formats, the Acclaim Skeleton File (ASF) and Acclaim Motion Capture data (AMC), to store the recorded data (Schafer 1994). These file formats have been adopted by Oxford Metrics (OMG 1984) for their Vicon system (VMS 1984). An advantage of the Acclaim Skeleton File / Acclaim Motion Capture files is that they are text-based American Standard Code for Information Interchange (ASCII) files and thus human readable.
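Because these files are plain ASCII, they can be inspected in any text editor and parsed with very little code. The following Python sketch (written against a simplified AMC-like layout; the exact header conventions are an assumption, so a real parser should follow the format specification) reads such a motion file into a list of frames, each mapping a segment name to its channel values.

from typing import Dict, List

Frame = Dict[str, List[float]]

def parse_amc_like(path: str) -> List[Frame]:
    """Parse a text-based motion file in a simplified AMC-like layout.

    Assumed layout: comment lines start with '#', keyword lines with ':',
    a bare integer opens a new frame, and every other line holds a segment
    name followed by its channel values.
    """
    frames: List[Frame] = []
    current: Frame = {}
    with open(path, encoding="ascii") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or line.startswith(":"):
                continue  # skip comments and header keywords
            if line.isdigit():        # a frame number starts a new frame
                current = {}
                frames.append(current)
            else:
                name, *values = line.split()
                current[name] = [float(v) for v in values]
    return frames

# Usage with a hypothetical file name:
# frames = parse_amc_like("recording.amc")
# print(len(frames), frames[0]["root"])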
Gestures are typically analyzed by annotating a visualization of the data recordings. See Section 4 of Pfeiffer (this volume) on the documentation of gestures with data gloves for a description of software tools that support the annotation of gesture recordings based on motion capture.
Other well-documented studies using motion capturing have been recorded by the Spontal project (Beskow et al. 2009) and by the POETICON project (Pastra et al. 2010). In Pfeiffer, Kranstedt, and Lücking (2006), we describe an approach in which we visualized the recorded data in an immersive virtual reality environment, which allowed us to evaluate a life-sized 3D multimodal visualization of all data recorded in our study on pointing gestures simultaneously in an integrated view. This helped us to assess the quality of the different data channels (motion data, video, audio, speech transcription, and gesture annotation) in one place and led to an improved quality of the resulting corpus.
6. Conclusion
Motion capturing technology paves the way for large corpora of quantitative data on
gestures at high spatial and temporal resolutions. The temporal and spatial precision
of today’s motion capturing systems is higher than anything that has been achieved based on the annotation of video recordings, and systems are still improving.
These advantages, however, come at a cost: tracking equipment is still expensive, and it requires some expertise to set up and use. Also, pure tracking data is not as easily evaluated as video recordings. With some experience, these investments are compensated for by a greatly reduced analysis time, as the manual annotation of the recordings is reduced to a minimum, e.g. to identifying and labeling the time intervals of interest.
The increased interest in game production and consumer electronics has also led to further developments in the field. Basic tracking systems are already available at four-digit prices, and further advances can be expected. Consumer tracking systems, such as the Microsoft Kinect, allow the tracking of simple skeleton models in a restricted interaction volume. Solutions that combine several such systems to extend precision and interaction volume are also available. In the near future, we can expect high-quality tracking systems to be found in nearly every household. As consumers are not too enthusiastic about attaching markers, the most successful systems will be based on unobtrusive marker-less tracking technology. And, maybe even more importantly, we will become attuned to being tracked by such systems through everyday exposure – maybe even more than we are used to being videotaped – and thus will be less affected by the use of motion capturing in experimental settings.
Motion capturing, however, is not a panacea for linguistic research, as the following last example underlines. Martell and Kroll (2007) used a machine-learning approach to identify gesture phases in a pre-annotated video-based corpus. Their annotation of the gestures is based on FORM. In a first study, they manually pre-annotated the positions of the end-effector (the hand) within a 5x5x5 grid and were able to train a Hidden Markov Model to detect gesture phases – but only with moderate results. In a second study, they compared the use of motion-capture data with the manual annotations for the same problem. They expected that the fidelity of the motion-capture data would lead to more successful classifications. However, the opposite was the case, and the manual annotations performed better than motion capture as a basis for the machine-learning process. They explained this by a smoothing of the data performed by the human annotators that did not happen with the raw motion-capture data. In addition, the annotation scheme based on the discrete 5x5x5 grid abstracted away from detailed trajectories, so that the annotations of many different gesture paths look the same. This coarser representation, a result of the pre-processing by the human annotators, may have been easier for the machine-learning algorithm to learn. The punch line is that one has to be careful when optimizing human involvement away from an analytical process, as we might not yet have penetrated the topic deeply enough to mold our knowledge into an algorithm and leave the machine alone to handle the data.
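To make the effect of such a discretization concrete, the following Python sketch (the grid boundaries and example positions are illustrative assumptions, not the values used by Martell and Kroll) maps continuous 3D hand positions onto the cells of a 5x5x5 grid; two clearly different hand trajectories collapse onto the same sequence of cells, which is exactly the kind of coarse, pre-smoothed representation discussed above.

from typing import Tuple

def to_grid_cell(pos: Tuple[float, float, float],
                 lower: Tuple[float, float, float] = (-0.5, 0.0, -0.5),
                 upper: Tuple[float, float, float] = (0.5, 2.0, 0.5),
                 cells: int = 5) -> Tuple[int, int, int]:
    """Map a continuous 3D position (in metres) to a cell of a cells x cells x cells grid."""
    cell = []
    for value, lo, hi in zip(pos, lower, upper):
        rel = (value - lo) / (hi - lo)                         # normalize to [0, 1]
        cell.append(min(cells - 1, max(0, int(rel * cells))))  # clamp and discretize
    return tuple(cell)

# A horizontal sweep of the hand ...
path_a = [(0.11, 1.00, -0.05), (0.20, 1.00, 0.00), (0.29, 1.00, 0.05)]
# ... and a vertical one: different movements, ...
path_b = [(0.20, 0.85, 0.00), (0.20, 1.00, 0.00), (0.20, 1.15, 0.00)]

# ... yet both yield the same coarse cell sequence after discretization.
print([to_grid_cell(p) for p in path_a])
print([to_grid_cell(p) for p in path_b])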
7. References
ART 1999. Advanced Realtime Tracking GmbH, online: http://www.ar-tracking.de (last access
January 2012).
ATC 1986. Ascension Technology Corporation, online: http://www.ascension-tech.com/ (last access
February 2012).
Beskow, Jonas, Jens Edlund, Kjell Elenius, Kahl Hellmer, David House and Sofia Strömbergsson
2009. Project presentation: Spontal – multimodal database of spontaneous dialog. In: Peter
Branderud and Hartmut Traunmüller (eds.), Proceedings of FONETIK 2009, 190–193. Stock-
holm: Stockholm University.
Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler and François Yergeau 2008. Extensible
Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation, 26 November 2008. W3C.
C3D 1987. The 3D Biomechanics Data Standard. Online: http://www.c3d.org/ (last access January 2012).
COLLADA 2012. Khronos Group, COLLAborative Design Activity, online: https://collada.org
(last access January 2012).
FBX 2012. Autodesk FBX, online: http://www.autodesk.com/fbx (last access January 2012).
Huenerfauth, Matt 2006. Generating American sign language classifier predicates for English-to-
ASL machine translation. Ph.D. dissertation, Department of Computer and Information
Sciences, University of Pennsylvania.
Huenerfauth, Matt and Pengfei Lu 2010. Eliciting spatial reference for a motion-capture corpus of
American Sign Language discourse. In: Philippe Dreuw, Eleni Efthimiou, Thomas Hanke,
Trevor Johnston, Gregorio Martı́nez Ruiz and Adam Schembri (eds.), 4th Workshop on the
Representation and Processing of Signed Languages (LREC 2010): http://www.sign-lang.uni-
hamburg.de/lrec2010/lrec_cslt_01.pdf. CD Content (Copyright by the European Language Re-
sources Association - ISBN 2-9517408-6-7). 121–124.
Kranstedt, Alfred, Andy Lücking, Thies Pfeiffer, Hannes Rieser and Ipke Wachsmuth 2006. Deic-
tic object reference in task-oriented dialogue. In: Gert Rickheit and Ipke Wachsmuth (eds.),
Situated Communication, 155–207. Berlin: De Gruyter Mouton.
Loeding, Barbara L., Sudeep Sarkar, Ayush Parashar and Arthur I. Karshmer 2004. Progress in
automated computer recognition of sign language. Computers Helping People with Special
Needs 3118: 1079–1087.
Lu, Pengfei and Matt Huenerfauth 2009. Accessible motion-capture glove calibration protocol for
recording sign language data from deaf subjects. In: Shari Trewin and Kathleen F. McCoy
(eds.), Proceedings of the 11th International ACM SIGACCESS Conference on Computers
and Accessibility, 83–90. ACM New York, NY: USA.
Lu, Pengfei and Matt Huenerfauth 2010. Collecting a motion-capture corpus of American Sign
Language for data-driven generation research. In: Melanie Fried-Oken, Kathleen F. McCoy
and Brian Roark (eds.), Proceedings of the NAACL HLT 2010 Workshop on Speech and Lan-
guage Processing for Assistive Technologies, 89–97, Association for Computational Linguistics
Stroudsburg, PA: USA.
Luciani, Annie, Matthieu Evrard, Damien Couroussé, Nicolas Castagné, Claude Cadoz and Jean-
Loup Florens 2006. A basic gesture and motion format for virtual reality multisensory applica-
tions. In: Proceedings of the 1st International Conference on Computer Graphics Theory and
Applications. Setubal. arXiv:1005.4564 [cs.HC].
MA 1982. Motion Analysis, online: http://www.motionanalysis.com/ (last access January 2012).
Martell, Craig and Joshua Kroll 2007. Corpus-based gesture analysis: An extension of the FORM dataset for the automatic detection of phases in a gesture. International Journal of Semantic
Computing 1: 521.
Moeslund, Thomas B., Adrian Hilton and Volker Krüger 2006. A survey of advances in vision-based
human motion capture and analysis. Computer Vision and Image Understanding 104: 90–126.
MSDN 2010. Microsoft Kinect SDK, online: http://www.microsoft.com/en-us/kinectforwindows/
(last access January 2012).
OMG 1984. Oxford Metrics Group, online: http://www.omg3d.com (last access January 2012),
Oxford, UK.
ONI 2011. OpenNI, online: http://www.openni.org/ (last access January 2012).
Pastra, Katerina, Christian Wallraven, Michael Schultze, Argyro Vataki and Kathrin Kaulard 2010.
The POETICON corpus: Capturing language use and sensorimotor experience in everyday inter-
action. In: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph
Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.), Proceedings of the
Seventh International Conference on Language Resources and Evaluation (LREC’10), European
Language Resources Association (ELRA). online: http://poeticoncorpus.kyb.mpg.de/ (last access
January 2012). European Language Resources Association (ELRA): Valletta, Malta. 3031–3036.
Pfeiffer, Thies 2011. Understanding Multimodal Deixis with Gaze and Gesture in Conversational
Interfaces. Aachen, Germany: Shaker.
Pfeiffer, Thies this volume. Documentation of gestures with data gloves. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Pfeiffer, Thies, Alfred Kranstedt and Andy Lücking 2006. Sprach-Gestik Experimente mit IADE,
dem Interactive Augmented Data Explorer. In: Stefan Müller and Gabriel Zachmann (eds.),
Dritter Workshop Virtuelle und Erweiterte Realität der GI-Fachgruppe VR/AR, 61–72. Aachen,
Germany: Shaker.
Poppe, Ronald 2007. Vision-based human motion analysis: An overview. Computer Vision and
Image Understanding 108: 4–18.
Poppe, Ronald 2010. A survey on vision-based human action recognition. Image and Vision Com-
puting 28: 976–990.
PS 2005. PrimeSense Ltd., online: http://www.primesense.com/ (last access January 2012).
Schafer, M. 1994. ASF Acclaim Skeleton File Format. online: http://mocap.co.nz/downloads/ASF_
spec_v1.html (last access January 2012).
VMS 1984. Vicon Motion Systems, online: http://www.vicon.com/ (last access January 2012), Oxford, UK.
Wang, Jessica JunLin and Sameer Singh 2003. Video analysis of human dynamics – a survey. Real-
Time Imaging 9: 321–346.
Wii 2006. Nintendo Wii Remote Gaming Controller, online: http://wii.com/ or http://en.wikipedia.
org/wiki/Wii_Remote (last access February 2012).
Abstract
Human hand gestures are very swift and difficult to observe from the (often) distant perspective of a scientific observer. Not uncommonly, fingers are occluded by other body parts or context objects, and the true hand posture is often revealed only to the addressee. In addition, as the hand has many degrees of freedom and the annotation has to cover positions and orientations in a 3D world – which is less accessible from the typical computer-desktop workplace of an annotator than, say, spoken language – the annotation of hand postures is quite expensive and complex.
Fortunately, research on virtual reality technology has brought about data gloves, which were originally meant as an interaction device allowing humans to manipulate entities in a virtual world. Since their release, however, many different applications have
been found. Data gloves are devices that track most of the joints of the human hand and
generate data-sets describing the posture of the hand several times a second. The article
reviews different types of data gloves, discusses representation formats and ways to
support annotation, and presents best practices for study design using data gloves as
recording devices.
1. Introduction
The human hands are unique and versatile parts of the body. They explore the environment via tactile and haptic feedback, they manipulate the environment and they communicate with interlocutors. The movements of the hands thus provide the receptive observer with information about the current sensory information state, the current actions or the communicative intent of a subject. This chapter is concerned with scientific observations of the communicative function of manual gestures.
If such observations are to be done systematically, e.g. as part of a scientific study, it
is essential that the movements and the postures taken by the hands are captured and
archived for later analysis. In this process, it is important to have ways of capturing
these data in an objective way. The live manual annotation of a gesturing interlocutor
or the depiction of a posture or movement in a sketch, for example, may be able to cap-
ture a qualitative impression of a certain behavior. The movement itself, however, is
then no longer accessible for more thorough investigation and for validation by a
third party. While such approaches to investigating, e.g., the gestural capabilities of humans are essential for gaining an overview of the field, a rigorous validation of such findings will require different methods for capturing the data.
As Bouissac (this volume) shows, depictions of static and dynamic gestures are found
even in prehistoric art. However, for a long time science lacked the appropriate meth-
ods for capturing dynamic manual gestures in the moment of their production, which
without doubt is one reason for the difficulties scientists experienced in studying the
communicative function of gestures. Written texts can be approached more easily
and written language provides a means to transcribe at least the verbal part of spoken
dialogues. Interactive dialogue is less accessible, because it is fast paced and typically
multimodal.
Technical progress first brought us film recordings: celluloid film was developed by
Hannibal Goodwin in 1887 and the first film camera by Louis Le Prince in 1888; the
first amateur film camera was the Birtac by Birt Acres in 1898. Video recordings fol-
lowed shortly afterwards, starting in 1935 with AEG-Telefunken. The first home-
video recorders were marketed in 1969 by Philips and Grundig; later came Sony’s
Videomovie in 1980 and the Video 8 systems in 1982. Today, the required technology
is ubiquitously available, both for spontaneous recordings in the field, which can be
taken by any smartphone, and for high-quality and high-speed recordings in the
laboratory.
At first glance, high-definition video camcorders seem to be ideal for capturing fine-
grained hand movements during gestures. There are, however, still several problems.
First of all, a video camera provides a single perspective on the scene. It may therefore
happen that parts of the captured movements are occluded. This is typically the case
for movements of the fingers, which may easily be occluded by the back of the same
hand or the other arm and hand. This problem can be solved to some extent by
using multiple video camcorders to capture the movements from several perspectives
simultaneously. However, this significantly increases the amount of captured data, introduces the challenge of synchronizing the different devices and ultimately has a severe impact on the processing time needed for the analysis of the recorded data. It is also
difficult to handle multiple perspectives within the same annotation tool (Hanke,
Storz, and Wagner 2010).
This leads us to the second major problem that comes with video recordings
(although it is not restricted to them): the requirement of manual annotation. Direct
or mediated observations of gestures via video recordings can only provide qualitative
data. If the results need to be quantified – and in most cases they will – the data has to
be annotated. The annotation of videos is a challenge in its own right, and a rigorous approach is required to ensure a high quality of the annotations. Typically, a codebook and annotation guidelines are defined, and the agreement of all participating annotators is controlled. This basically means hours of manual work, often stepping through the
recorded videos frame by frame. When working on explorative studies, I have known
it to happen that the codebook was updated iteratively during the annotation process
to reflect the latest findings. This, however, always meant that the annotations already
made had to be updated as well. This is a tedious process, but going over such iterations
can only improve the quality of the annotations.
The third problem is that not all features that might be relevant for targeting a cer-
tain research question can be accessed with the same precision by annotating videos.
Features that are easily accessible concern timing, such as onset and offset of a gesture,
the well-known gesture phases preparation, stroke and retraction (Kita, van Gijn, and
van der Hulst 1998; see also Pfeiffer this volume), and handedness. However, a posture
description of the shape of the hand at a certain point in time can only be coarse-
grained if made from a fixed perspective. Under such conditions, certain features,
such as the position of the hand or the tips of individual fingers, as well as the exact pos-
ture of the hand are less accessible. I have seen cases where the position of the hand was
measured with a ruler on the screen displaying the video recordings. This approach
might be the most practicable way if no other measurement is available – or if the
need for positional data only came up after the experiment was conducted. The overall
quality of the data gathered using such measurements, however, will in most cases be
low. The distance of the recorded subjects from the camera and therefore their per-
ceived size may vary; the gesture might be subject to perspective distortions (unless
it is done completely in a plane orthogonal to the viewing direction); the spatial reso-
lution of the video recordings may be lower than required or the technical characteris-
tics of the display devices may not have been considered (e.g. different displays have
different dot densities).
An alternative to video recordings for documenting the hand movements of a subject is offered by so-called data gloves (see Section 2 for an overview), which the subjects can slip on to track changes in their hand postures. Once worn, the data gloves stream descriptions
of the current hand posture (see Section 3 for representation formats) to a recording
device in near real-time. Such devices offer very rich data, up to 27 degrees of freedom
(including the translation) per hand (ElKoura and Singh 2003). Data gloves typically
provide the configuration of the fingers only. If the position and orientation of the
hand is needed, data gloves can be combined with another tracking technology (see
Pfeiffer this volume, on Motion Capturing or the discussion in Heloir, Neff, and Kipp
2010). Data gloves offer high spatial precision, which is in most cases independent
of perspective, and provide sufficient temporal precision. The rich data-set provided by
data gloves is machine readable and as such ready for further feature extraction and qual-
itative analysis (see Section 4), which can be done automatically or semi-automatically.
The time needed for analysis and evaluation will thus be significantly reduced. None-
theless, there are also some disadvantages of this technique, such as the obtrusiveness
of the technology, the increased complexity of the set-up, the required additional exper-
tise for analyzing the data and plainly the additional costs of such devices (see Section 5
for some best practices).
Fig. 54.1: Sketch of the VPL DataGlove by VPL Research. The fibre-optic cables for measuring
the flexion of the fingers are woven into the surface of the gloves.
to the back of the hand using fibre-optic cables running along the fingers. At one end of
the cable a light source emitted light into the fibre, which was measured using a photo-
sensor at the other end. In between, at each knuckle some light was released from the
fibre relative to the flexion at that particular knuckle by a precisely calibrated mecha-
nism. In this way, the software was able to calculate the flexion of the finger based on
the remaining light detected by the sensor. A comparison of these early systems has
been compiled by Eglowstein (1990).
The resolution of the DataGlove is about 5°, which is not sufficient if a high precision
is required (Shaw et al. 1992), but the body of the glove is soft and comfortable to wear.
This allows the user to make fine-grained hand movements without much hindrance.
There are, however, some drawbacks: the DataGloves need to be calibrated for
every user, as the position of the fibres may change. A new calibration may also be nec-
essary when the DataGlove is used for a longer period, due to warming of the material
or sweat. In addition, the fibres are quite sensitive.
A different technology is used by the CyberGlove developed by Virtex. Virtex
replaced the optical fibres by metal straps which alter their resistance when bent.
This device proved more robust and is more easily calibrated than the DataGlove.
The Dextrous Hand Master developed by Elizabeth Marcus at Exos offers much higher
precision and is able to detect the splay of the thumb. However, it is based on an exos-
keleton which is quite cumbersome to set up. The flexion is calculated by measuring the
Hall effect induced by magnets on the exoskeleton.
The success of the DataGlove also led to the development of the PowerGlove by
AGE, Mattel and VPL, a data glove for the Nintendo NES gaming console. In total, about one million units were sold by the end of 1991 (Rheingold 1992). The configuration of the fingers is measured by a polyester strip to which a conductive ink is applied. The resolution of the system, however, only allows the bending of a whole finger to be detected, not that of individual phalanxes. The position of the glove is
detected based on ultrasonic technology.
The CyberGlove series of data gloves by CyberGlove Systems (see Fig. 54.2) is representative of modern data gloves. Some of these devices come with a wireless option, force feedback
Fig. 54.2: The wired CyberGlove I data glove by CyberGlove Systems. The bimetal stripes for mea-
suring the flexion of the fingers have been worked into the garment (see picture on the right, exam-
ple areas are highlighted in red). The black crossbar with the grey spheres is not part of the data
glove but of a complementary optical tracking system (photography by Martin Brockhoff).
or tactile stimulation. While the options for providing feedback to the wearer are in
principle not required for most documentation needs, they could be relevant for
study design if, e.g., social contact or interaction with objects is of interest. The latest
CyberGlove III system is a mobile data glove, which can record data on hand postures
to a storage card without the need for an operator personal computer. When recording
hand gestures in the field, this could be an interesting option. The system, however,
does not provide absolute positions and orientations of the hand, though it can be
combined with an additional tracking system for that purpose.
Modern data gloves that operate using optical outside-in tracking are also available,
such as the Fingertracking system by the Advanced Realtime Tracking GmbH (ART
2011, see Fig. 54.3). The Advanced Realtime Tracking system consists of an active
hand target for tracking the position and orientation of the back of the hand. Wired
to the target are 3 or 5 thimbles, small cups for the fingers with attached IR-LEDs.
The main unit is synchronized with the external optical tracking system through infra-
red flashes. Different light patterns are triggered in this way, so that the individual fin-
gers can be identified by the tracking system. The system provides a high precision
measurement of the end position of each finger, as this is where the LED is attached.
The flexion of the fingers, however, has to be computed through a model of the hand,
which is initialized in a calibration procedure. Because the Fingertracking system relies
on optical tracking, it has problems with occlusions, similar to the video recording
approach. Typical systems therefore include several tracking cameras, which observe
the tracking volume from different perspectives. This reduces the likelihood of occlusion, but grasp-like gestures in particular will still be a problem. The advantage of the sys-
tem is the exact data regarding the absolute position of the finger tip. This makes the
device especially attractive as an interaction device in virtual reality, where a precise
measurement of the point of contact with a virtual object is critical.
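To illustrate how such absolute fingertip measurements relate to the tracked back-of-hand target, the following numpy sketch (the coordinate conventions are assumptions made for the example, not those of a particular vendor) expresses a fingertip position given in world coordinates in the local coordinate frame of the hand target, using the target's tracked position and orientation.

import numpy as np

def fingertip_in_hand_frame(tip_world: np.ndarray,
                            hand_position: np.ndarray,
                            hand_rotation: np.ndarray) -> np.ndarray:
    """Transform a fingertip position from world to hand-local coordinates.

    hand_rotation is the 3x3 matrix rotating hand-local axes into world axes,
    hand_position the world position of the back-of-hand target.
    """
    return hand_rotation.T @ (tip_world - hand_position)

# Hypothetical example: hand target at (1, 1, 1), rotated 90 degrees about the z-axis
theta = np.pi / 2
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
tip = np.array([1.0, 1.1, 1.0])  # fingertip 10 cm along the world y-axis
print(fingertip_in_hand_frame(tip, np.array([1.0, 1.0, 1.0]), rot_z))
# -> approximately [0.1, 0.0, 0.0]: 10 cm along the hand's local x-axis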
Fig. 54.3: The optical Fingertracking device by the Advanced Realtime Tracking GmbH. This
device uses active outside-in optical tracking to track markers on the hand and on the finger
tips; options for three or five fingers are available (photography by Martin Brockhoff).
Last but not least, I would like to briefly mention current approaches to low-cost hand
tracking. Wang and Popović (2009) presented ColorGlove, a tracking system which
should – in principle – work with a standard webcam or with the recordings of a
video camcorder. Their key idea is to use a glove colored in a specific pattern which
allows their tracking algorithms to identify postures stored in a database. While the
ColorGlove system shows a surprisingly good performance in interactive tasks, it cannot
match the precision and stability of the professional solutions. In interactive tasks, the
user receives real-time feedback of the recognition results and can adjust his or her
behavior appropriately. This adaptive behavior improves the performance of the system,
but adaptation is not desirable when documenting gesture use. Nevertheless, if, e.g., spe-
cific gestures are to be counted, the ColorGlove approach could be a viable low-cost solu-
tion. Other approaches try to deal with hand gestures recorded without modifications of
the hand. A comparison of purely vision-based implementations of finger tracking based
on the famous EyesWeb system has been compiled by Burns and Mazzarino (2006).
3. Representation formats
Once the question of which tracking device to use has been decided, it is important to
identify an appropriate representation format for storing the recorded data. Based on my own experience, I would recommend recording the data at a level that is as close to the device as possible, with few if any initial abstractions. Device-specific calibration procedures should, however, be followed. Only in a second step should the rel-
evant parameters be identified and a pre-processed version of the data suitable for
further analysis created. This ensures that the initial assumptions underlying the pre-
processing step can still be the target of discussion and refinement later on in the
analysis process. Nothing is more annoying than data that has gone to waste.
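As a minimal illustration of this recommendation (the file layout and field names are assumptions made for the example), the following Python sketch appends each incoming raw glove sample – a timestamp plus the unprocessed sensor vector – to a line-based log file before any calibration-dependent interpretation is applied, so that later pre-processing decisions can still be revised.

import json
import time
from typing import Sequence

def log_raw_sample(log_file, sensor_values: Sequence[float]) -> None:
    """Append one raw glove sample as a JSON line: timestamp plus raw values.

    No interpretation is applied at this stage; mapping sensor values to named
    joints is deferred to a later pre-processing step that can be re-run at will.
    """
    record = {"t": time.time(), "raw": list(sensor_values)}
    log_file.write(json.dumps(record) + "\n")

# Hypothetical recording loop with a stand-in 22-value sample per measurement
with open("glove_raw.jsonl", "w") as fh:
    for _ in range(3):
        sample = [0.0] * 22          # placeholder for one device measurement
        log_raw_sample(fh, sample)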
The raw data comes in different formats for the individual data glove solutions. The
CyberGlove systems, for example, are available in versions with 18 or 22 joint-angle
measurements (see Fig. 54.4). The raw data thus consists of an 18- or 22-dimensional vector of joint angles per measurement (with roughly 90 measurements per second). The raw data provided by the Advanced Realtime Tracking Fingertracking system contains only 10 joint angles directly – those between the phalanxes – but it is more detailed in other respects. It includes the absolute position and orientation of the back of the hand, as well as the positions of the finger tips relative to the back of the hand. In addition to the joint angles, the radius of the fingertip and the lengths of the phalanxes are provided.
Fig. 54.4: This image highlights the sensor positions on the CyberGlove system (22 joint-angle
measurements). The raw data format provides a single value for each sensor.
tracking glove by combining soft golf gloves with markers for the optical tracking system
to determine only the position and orientation of the index finger. As these custom gloves
looked like ordinary gloves and had no strings attached, the participants felt more
comfortable wearing them and we did not observe any further oddities during the study.
Other aspects that need to be considered are, for example, the different hand sizes of
the participants. Most data gloves are configurable in this respect, but none support the
large variations that can be found in the population. This can especially be an issue when
tracking children and adults or Asians and Europeans in the same study. In most cases
you would also try to avoid mixing data gloves from different companies in the same
study, so a device or a device series has to be found that supports all relevant hand sizes.
Special care also has to be given to the length of the interaction periods to be re-
corded with the devices. As with all tracking systems attached to the human body,
the devices’ relative position to the body may drift slightly over time. This is particularly
true for the hands, which will naturally move very often and fast and – at the same
time – will have many interactions where they touch or hit other objects or body
parts. Also, if a data glove is used together with a motion tracking system, the relative
position of the two tracking systems can change over time, for example when the
marker of an optical tracking system is not fixed to the glove itself. If the glove is closed,
sweat might also be an issue if the tasks are very long or require much effort. Depend-
ing on the technology used, some data gloves might be more affected by this than
others. It is thus important to include short phases for recalibration or error estimation
in the design, to ensure maximum accuracy.
6. Conclusion
Data gloves provide a great opportunity to capture dynamic hand gestures. They do not
suffer from problems with perspective or occlusion, and offer machine-readable data.
The provided data also has a higher precision than what can typically be obtained by
annotating hand postures based on video recordings.
A few drawbacks and caveats should, however, be borne in mind. There is only limited software support for analyzing the recorded data, so exploiting the full potential of data gloves for gesture analysis requires some expertise. Finally, wearing data gloves may make participants in experiments feel uncomfortable and lead them to behave differently than they would without the gloves.
7. References
ART GmbH 2011. Homepage of the Advanced Realtime Tracking GmbH, http://www.ar-tracking.
de, last access November 2011.
Barbič, Jernej, Alla Safonova, Jia-Yu Pan, Christos Faloutsos, Jessica K. Hodgins and Nancy S.
Pollard 2004. Segmenting motion capture data into distinct behaviors. In: Wolfgang Heidrich
and Ravin Balakrishna (eds.), Proceedings of Graphics Interface 2004, 185–194. Canadian
Human-Computer Communications Society School of Computer Science, University of Water-
loo, Waterloo, Ontario: Canada.
Bryson, Steve 1992. Virtual environments in scientific visualization. In: Compcon Spring’92. Thirty-
Seventh IEEE Computer Society International Conference, Digest of Papers, 460–461. IEEE.
Bouissac, Paul this volume. Prehistoric art: Hands in cave paintings and rock art. In: Cornelia Mül-
ler, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Nguyen, Quan and Michael Kipp 2010. Annotation of human gesture using 3D skeleton controls.
In: Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani,
Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.), Proceedings of 7th Interna-
tional Conference on Language Resources and Evaluation (LREC), 3037–3041. ELDA. Euro-
pean Language Resources Association (ELRA).
Ong, Sylvie C.W. and Surendra Ranganath 2005. Automatic sign language analysis: A survey and
the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intel-
ligence 27(6): 873–891.
Pfeiffer, Thies 2011. Understanding Multimodal Deixis with Gaze and Gesture in Conversational
Interfaces. Aachen, Germany: Shaker.
Pfeiffer, Thies this volume. Documentation of gestures with motion capture. In: Cornelia Müller, Alan
Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body –
Language – Communication: An International Handbook on Multimodality in Human Interac-
tion. (Handbooks of Linguistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Rheingold, Howard 1992. Virtuelle Welten – Reisen im Cyberspace. Hamburg: Rowohlt.
Shaw, Chris, Jiandong Liang, Mark Green and Yunqi Sun 1992. The decoupled simulation model
for virtual reality systems. In: Penny Bauersfeld, John Bennet and Gene Lynch (eds.), Proceed-
ings of the SIGCHI Conference on Human Factors in Computing Systems, 321–328. New York:
Association for Computing Machinery.
Sutton, Valerie 1981. SignWriting for Everyday Use. Newport Beach, CA: The Sutton Movement Writing Press.
Wang, Robert Y. and Jovan Popović 2009. Real-time hand-tracking with a color glove. ACM Transactions on Graphics 28(3): 63:1–63:8.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006. ELAN: A professional framework for multimodality research. In: Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation, 1556–1559. online: http://www.lrec-conf.org/proceedings/lrec2006/pdf/153_pdf.pdf. European Language Resources Association (ELRA).
Zimmermann, Thomas G., Jaron Lanier, Chuck Blanchard, Steve Bryson and Young Harvill 1986. A hand gesture interface device. ACM SIGCHI Bulletin (May 1986): 189–192.
Abstract
This contribution aims to provide an up-to-date picture of the state of the art on reliability
and validity in the field of quantitative observation of verbal and bodily processes of com-
munication, where the main source of error is the human observer who codes the
interactants’ behaviour by means of coding schemes. After a lexical clarification that allows readers to orient themselves more confidently in this field, the contribution a) introduces the basic concepts of intra-observer, inter-observer and observer reliability; b) reviews summary and point-by-point agreement coefficients, among which the most widely used is Cohen’s K; c) critically reviews K in the light of more recent advances; and d) provides brief hints on the validity of coding systems. Research examples, applications and indications for the correct use of a given coefficient accompany the contribution.
1. Introduction
Coding systems are typically used to classify. In social sciences, classification is particu-
larly used in behavioral observation (Bakeman 2000; Bakeman and Gottman 1997) or
content analysis (Bartholomew, Henderson, and Marcia 2000; Berelson 1954; Mehl
2005; Smith 2000). Given the emphasis of this volume on interaction, communication,
language and speech, the focus here is on behavioral observation. Behavioral coding systems may apply to gestures, language and other communication constructs, especially when they concern the development of the interaction in time, i.e., its sequential organization.
Initially, the ability of observational coding systems to grasp non-individual interactional processes and nonverbal aspects of behavior was recognized in the study of animals (e.g., Altmann 1965) or of infants, i.e. "nonverbal" humans (Tronick et al. 1978). Systematic observation has its roots in Stevens’ (1946) theory of measurement. It is a public and replicable activity whereby one or more trained observers assign predefined behavioral categories to interactional events (sequences of live or recorded behavior, sometimes transcribed) according to rules fixed in a coding manual (Bakeman 2000). This definition constrains the type of categories involved in coding systems to nominal or ordinal ones. If a measurement is made with quantitative variables, it is called rating, which by definition is opposed to coding. Coding systems are therefore sets of predefined behavioral categories representing conceptually meaningful, almost always theory-based distinctions that the investigator uses to answer research questions (Bakeman and Gnisci 2005; Gnisci, Bakeman, and Maricchiolo this volume). Developing and implementing a new category system is a time- and resource-consuming challenge. Alternatively, a system can be adopted from the literature or adapted to one’s own needs. A coding system is informed by a combination of theory, previous research findings, analysis of pilot data, reflections on preliminary coding efforts and observation of the material to be analyzed, as well as individual insight (Bakeman and Gottman 1997; Bartholomew, Henderson, and Marcia 2000). In this phase, categories are not defined a priori but empirically, that is, they emerge from the material to be analyzed (Smith 2000). Often, a coding system consists of a single set of mutually exclusive and exhaustive codes.
Fig. 55.1 shows a hierarchically organized coding system for hand gestures. Usually, two or more independent coders are compared in order to identify possible problems with the category system or with the coders themselves. Idiosyncrasies among coders are not the subject of the research; thus, to escape individual views on the observed data, a comparison among observers is necessary. This is done after each coder has been trained, conceptually and practically, in the categories used and in the studied phenomenon and context. Agreement among coders is then checked empirically. To make a long story short, the training phase is crucial and calls for many caveats (see Bakeman and Gottman 1986, 1997).
Fig. 55.1: Example of a hierarchically organized coding system for hand gestures, with categories such as iconic, metaphoric and deictic gestures (from Maricchiolo, Gnisci, and Bonaiuto 2012).
interaction. While reliability refers to the internal consistency of a coding system (that is, whether different measures of the same construct go together), validity refers to the fact that our coding system really reflects a process that is taking place "out there," in the "world." If two witnesses give the same evidence, they agree (reliability), but this does not necessarily mean that they tell the truth with respect to what really happened (validity).
Usually, reliability and validity are understood by referring to some basic concepts, such as precision, stability, and accuracy. Although precision is not a consensual criterion (see Suen 1988: note 1), it is used here to mean the degree of consistency with which the observer assigns events or objects to given categories. If, within the same coding session, a coder systematically assigns a behavior, such as a smile, to a given category, say A (where A represents positive behaviors), then this coding activity is precise. Note that a coder may be precise but not accurate when, for example, s/he always assigns a behavior such as a smile to a category such as B, where B represents negative behaviors: such an observer is systematically biased. Stability over time (or retest reliability) refers to the degree of correlation among different codings of the same interaction carried out at different points in time. It applies across coding sessions occurring at different times (e.g., after 6 months) and guarantees that an observer’s coding does not decay (Bakeman and Gottman 1997). Precision thus refers to the coherence of the coding within the same session, stability to the coherence between sessions. Both precision and stability pertain to reliability because they allow, although in different ways, for checking the internal consistency of the coding activity: Are different collections of the same thing, as grasped by coders observing the same interaction, consistent? Thus, reliability is a multidimensional concept.
Finally, accuracy refers to the degree of correspondence between the categories used and the "reality," that is, how closely what the observer coded corresponds to or reflects the behaviors or processes that occurred in the interaction. It has much to do with validity, even if it may also be involved in reliability. If, in reliably coding a marital interaction, a coding system detects strong aggression of a husband toward his wife, while actually no aggressive behavior or process is taking place between husband and wife, the category system is simply wrong: it is not accurate.
Another term is often used: calibration among observers, understood as the degree of agreement reached by two (or more) coders with one another. This matters because the data collected should not vary as a function of the observer (Bakeman and Gottman 1997). One reason to calibrate observers is essentially practical: if calibrated, two observers provide the same or at least similar codings when observing the same interaction, and thus may be used interchangeably. The psychometric literature offers little clarity about the role of calibration in relation to reliability and/or validity. However, when two coders are calibrated, their coding is of course reliable, but not necessarily valid: Two observers whose coding agrees could share a deviant worldview and, for whatever reason (e.g., an error in training), both could be convinced that certain observed behaviors fall into certain categories while both are wrong.
many potential sources of error. Reliability, by contrast, invoking a rich and complex psychometric tradition, ultimately addresses all possible sources of error, and it is not constrained to the case of two or more observers, thus appearing to be the more general term. However, many scholars agree that agreement can be regarded as a necessary but not sufficient condition for reliability (Bakeman and Gottman 1997). When overt agreement among observers is lacking, no reliability can be established; yet, if agreement is found, this does not necessarily translate into reliability.
Reliability is commonly defined as data consistency and refers to the degree to which
data are free from measurement errors: the less error, the more consistent the data
(Suen 1988). With reference to the distinction among precision, stability and accuracy, and considering the observer as a source of data, there are at least three types of reliability (Berk 1979; Martin and Bateson 1986; Bakeman and Gottman 1997): an observer can be reliable with respect to him/herself (intra-observer reliability), to another observer (inter-observer reliability), or to an ideal observer (a standard protocol or master) that, it is assumed, has coded perfectly (observer reliability). Each type of reliability is defined below, together with adequate reliability coefficients for coding schemes in different situations.
columns of the agreement matrix (see below) in place of the observer O2. If a good index of reliability is obtained, it does not tell us that the observers are calibrated but that the observer’s coding is to some extent accurate.
seems to enlarge the concept of reliability so as to soften the boundaries between relia-
bility and validity (for specific applications and calculations, see Bakeman and Gottman
1986; Berk 1979; Brennan 1983; Hartmann 1982; Suen 1988; for a procedure applicable
to sequential data see Bakeman and Gottman 1997: 78).
In any case, given that all intraclass correlation coefficients are computed from a matrix such as the one mentioned above, they will never capture the agreement of the observers point by point, that is, for each moment, event or interval into which the interaction is segmented.
                                O2
                    Adaptors  Conversational  Emblematic  Illustrators  Tot.
O1  Adaptors           42           0              0            1        43
    Conversational      3          21              2            0        26
    Emblematic          0           4             18            1        23
    Illustrators        0           0              0           31        31
    Tot.               45          25             20           33       123

Observed agreement: A_O = (sum of the diagonal cells x_ii) / N = 112/123 = .91
Observed disagreement: D = (sum of the off-diagonal cells x_ij, i ≠ j) / N = 11/123 = .09
Chance agreement: A_C = (sum of the products of the marginals x_i+ · x_+i) / N² = (45 × 43 + 26 × 25 + 23 × 20 + 31 × 33) / 123² = 4068/15129 = .27
K = (A_O – A_C) / (A_O – A_C + D) = (A_O – A_C) / (1 – A_C) = (.91 – .27) / (1 – .27) = .64/.73 = .88
Fig. 55.2: A hypothetical confusion matrix with four codes (for hand gestures) and two observers
(O1 and O2) and the calculations for the percentage of agreement and for Cohen’s K.
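The calculation shown in Fig. 55.2 can be reproduced in a few lines of code. The following Python sketch (added here purely as an illustration) computes the observed agreement, the chance agreement and Cohen's K from the confusion matrix of the figure; the same function applies to any square observer-by-observer matrix whose codes are listed in the same order for both observers.

import numpy as np

def cohens_kappa(confusion: np.ndarray) -> float:
    """Compute Cohen's K from an observer-by-observer confusion matrix."""
    n = confusion.sum()
    observed = np.trace(confusion) / n                                        # A_O
    chance = (confusion.sum(axis=1) * confusion.sum(axis=0)).sum() / n ** 2   # A_C
    return (observed - chance) / (1 - chance)

# Confusion matrix from Fig. 55.2 (rows: observer O1, columns: observer O2)
matrix = np.array([[42,  0,  0,  1],   # Adaptors
                   [ 3, 21,  2,  0],   # Conversational
                   [ 0,  4, 18,  1],   # Emblematic
                   [ 0,  0,  0, 31]])  # Illustrators
print(round(cohens_kappa(matrix), 2))  # -> 0.88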
than those for a self-report scale, probably because agreement among observers is considered harder evidence to reach. However, we will see that this (and other) rules of thumb for establishing good values of Cohen’s K have to be qualified.
(i) The systematic bias among observers, that is, the similarity or difference between the marginal frequencies of the two observers, for example, between the last row and the last column of the table represented in Fig. 55.2 (when the two are different, even if the two observers agree as much as possible, K cannot reach 1);
(ii) The number of codes of the category system (the more codes, the higher the K);
(iii) The prevalence of the categories, that is, how they are distributed in the universe of reference (the more equally they are distributed, the higher the value of K).
However, a simulation study shows that when a coding system comprises more than 5–7 codes, the increase in K is small and negligible and K no longer depends on prevalence; finally, particularly when there are fewer than 5 codes, even low values of K can predict high values of accuracy (Bakeman et al. 1997). This seems to demonstrate what has often been maintained, namely that K is a conservative index (Strijbos et al. 2006).
Fig. 55.3: Two graphic examples of the dependency of the validity of a coding scheme on its reliability: the first with high reliability, the second with low reliability.
Why waste time checking whether the coding system reflects something real if it is not even internally coherent, that is, if the observers saw two quite different things? Therefore, validity is considered only once reliability has been established.
It is impossible to report the whole discussion on the validity of coding systems in a few words. Only its traditional aspects are reported here, along with a few additional notes on basic aspects of coding systems. Content validity is established by demonstrating that the behaviors represented by the scheme’s categories are a representative sample
of the universe of behavior to be observed. Traditional approaches to validity require
demonstrating that the coding system correlates in sensible ways with different mea-
sures allegedly associated with it in the present (concurrent validity) or in the future
(predictive validity) and with other measures assumed to measure the same construct
(convergent validity); and does not correlate with other measures assumed to measure
different constructs (divergent validity).
Many advances in the field of language and communication could be achieved if researchers’ efforts were directed at concurrent and predictive criterion (or external) validity. This would render coding schemes much more effective and useful. Bonaiuto, Gnisci, and Maricchiolo (2002), for example, found that conversational and ideational gestures (SLG in Fig. 55.1) made by people involved in conversation correlate with the turns they send out (.76, p = .001) while adaptor gestures (SNG) do not (.12, p > .05): this seems to confirm the speech-related character of conversational and ideational gestures. Furthermore, particular behaviors may be seen in a different light (although caution is always welcome) when it is known that, in marital interaction for example, a particular facial display of contempt or disgust by the husband or the wife correlates significantly with variables linked to the future duration of the couple (Gottman 1994). Such evidence seems to strengthen the solidity of coding schemes and to connect the behavioral categories to evidence collected by different means.
Researchers do not use coding schemes as mere lists of behaviors but because they capture dimensions of interest, or constructs. A construct is a concept that cannot be observed directly but only through several behavioral indicators. In research on language and communication, these constructs often concern the interaction rather than the individual (e.g., processes such as reciprocity, accommodation, or dominance). The rationale of construct validity is that the construct the coding system is intended to measure will be correlated with the same construct measured by other methods (e.g., a self-report scale) and will not be correlated with unrelated constructs (Campbell and Fiske 1959; Cronbach and Meehl 1955). This comprehensive approach has been formalized in the multitrait-multimethod approach (MTMM; e.g., Eid 2005; John and Benet-Martínez 2000; Schmitt 2005).
From the second half of the last century, the concept of construct validity, originally intended as part of general validity, developed toward an integrated general conception that includes reliability and other forms of validity, such as criterion validity. Validity becomes the degree to which the adequacy and appropriateness of inferences and actions based on the results of applying a coding system are supported by empirical evidence and theoretical rationale (Messick 1995; Wiggins 1973). This new conception, sometimes called nomological validity, seeks to check whether the interpretations of the data collected with the coding schemes are consistent with a nomological network of other measurements, for example the same construct measured with different methods, compared with different constructs and correlated with appropriate criteria. In other words, this kind of validation comes very close to testing a complex, theoretically based causal model with many variables related to the construct itself. Originally, this was done through qualitative and tabular summaries of previous research, then through meta-analyses, and more recently through structural equation model testing (John and Benet-Martínez 2000; for the statistical technique, Corral-Verdugo 2002; Eid, Lischetzke, and Nussbeck 2005; for a critical voice see Borsboom, Mellenbergh, and van Heerden 2004).
5. References
Altmann, Stuart A. 1965. Sociobiology of rhesus monkeys. II. Stochastics of social communication.
Journal of Theoretical Biology 8: 490–522.
Bakeman, Roger 2000. Behavioral observation and coding. In: Harry T. Reis and Charles M. Judd
(eds.), Handbook of Research Methods in Social and Personality Psychology, 138–159. Cam-
bridge: Cambridge University Press.
Bakeman, Roger and Augusto Gnisci 2005. Sequential observational methods. In: Michael Eid
and Ed Diener (eds.), Handbook of Multimethod Measurement in Psychology, 127–140. Wash-
ington, DC: American Psychological Association.
Bakeman, Roger and John M. Gottman 1986. Observing Interaction. An Introduction to Sequential
Analysis. New York: Cambridge University Press.
Bakeman, Roger and John M. Gottman 1997. Observing Interaction. An Introduction to Sequential
Analysis. 2nd Edition. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 1995. Analyzing Interaction. Sequential Analysis with SDIS
and GSEQ. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 2011. Sequential Analysis and Observational Methods for the
Behavioral Sciences. New York: Cambridge University Press.
Bakeman, Roger, Vicenç Quera and Augusto Gnisci 2009. Observer agreement for timed-event
sequential data: A comparison of time-based and event-based algorithms. Behavior Research
Methods 41: 137–147.
Bakeman, Roger, Vicenç Quera, Duncan McArthur and Byron Robinson 1997. Detecting sequen-
tial patterns and determining their reliability with fallible observers. Psychological Methods 2:
357–370.
Bartholomew, Kim, Antonia J. Z. Henderson and James E. Marcia 2000. Coding semistructured
interviews in social psychological research. In: Harry T. Reis and Charles M. Judd (eds.), Hand-
book of Research Methods in Social and Personality Psychology, 286–312. Cambridge: Cam-
bridge University Press.
Berelson, Bernard 1954. Content analysis. In: Gardner Lindzey (ed.), Handbook of Social Psy-
chology, Volume 1, 488–522. Oxford: Addison-Wesley.
Berk, Ronald A. 1979. Generalizability of Behavioral Observation: A Clarification of Interobserver
Agreement and Interobserver Reliability. Cambridge: Cambridge University Press.
Bonaiuto, Marino, Augusto Gnisci and Fridanna Maricchiolo 2002. Proposta e verifica empirica di
una tassonomia dei gesti nell’interazione di piccolo gruppo. Giornale Italiano di Psicologia 29:
777–807.
Borsboom, Denny, Gideon J. Mellenbergh and Jaap van Heerden 2004. The concept of validity.
Psychological Review 111: 1061–1071.
Brennan, Robert L. 1983. Elements of Generalizability Theory. Iowa City: American College Test-
ing Program.
Bryington, April A., Darcy J. Palmer and Marley W. Watkins 2004. The estimation of interobser-
ver agreement in behavioral assessment. Journal of Early and Intensive Behavior Intervention
1: 115–119.
Campbell, Donald T. and Donald W. Fiske 1959. Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin 56: 81–105.
Cohen, Jacob A. 1960. A coefficient of agreement for nominal scales. Educational and Psycholog-
ical Measurement 20: 37–46.
Cone, John D. 1987. Behavioral assessment: Some things old, some things new, some things bor-
rowed? Behavioral Assessment 9: 1–4.
Corral-Verdugo, Victor 2002. Structural Equation Modeling. In: Robert B. Bechtel and Arza
Churchman (eds.), Handbook of Environmental Psychology, 256–270. New York: Wiley.
Cronbach, Lee J., Goldine C. Gleser, Harinder Nanda and Nageswari Rajaratnan 1972. The
Dependability of Behavioral Measurement: Theory of Generalizability for Scores and Profiles.
New York: Wiley.
Cronbach, Lee J. and Paul E. Meehl 1955. Construct validity in psychological tests. Psychological
Bulletin 52: 281–302.
Eid, Michael 2005. Methodological approaches for analyzing multimethod data. In: Michael Eid
and Ed Diener (eds.), Handbook of Multimethod Measurement in Psychology, 223–230. Wash-
ington, DC: American Psychological Association.
Eid, Michael, Tanja Lischetzke and Fridtjof W. Nussbeck 2005. Structural equation models for
multitrait-multimethod data. In: Michael Eid and Ed Diener (eds.), Handbook of Multimethod
Measurement in Psychology, 283–299. Washington, DC: American Psychological Association.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior. Semiotica 1: 49–98.
Fleiss, Joseph L. 1981. Statistical Methods for Rates and Proportions. New York: Wiley.
Gnisci, Augusto, Roger Bakeman and Fridanna Maricchiolo this volume. Sequential notation and
analysis. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin: De Gruyter Mouton.
Gnisci, Augusto, Roger Bakeman and Vicenç Quera 2008. Blending qualitative and quantitative anal-
yses in observing interaction. International Journal of Multiple Research Approaches 2: 15–30.
Gottman, John M. 1994. What Predicts Divorce? The Relationship between Marital Processes and
Marital Outcomes. Hillsdale, NJ: Lawrence Erlbaum.
Hartmann, Donald P. 1982. Assessing the dependability of observational data. In: Donald P. Hart-
mann (ed.), New Directions for the Methodology of Behavioral Sciences: Using Observers to
Study Behavior, 51–65. San Francisco: Jossey-Bass.
John, Oliver P. and Veronica Benet-Martínez 2000. Measurement: Reliability, construct validation,
and scale construction. In: Harry T. Reis and Charles M. Judd (eds.), Handbook of Research
Methods in Social and Personality Psychology, 339–369. Cambridge: Cambridge University Press.
Martin, Paul and Patrick Bateson 1986. Measuring Behavior: An Introductory Guide. Cambridge:
Cambridge University Press.
Maricchiolo, Fridanna, Augusto Gnisci and Marino Bonaiuto 2012. Coding hand gestures: A reli-
able taxonomy and a multi-media support. In: Anna Esposito, Antonietta M. Esposito, Ales-
sandro Vinciarelli, Rüdiger Hoffman and Vincent C. Müller (eds.), Cognitive Behavioural
Systems, 405–416. Berlin: Springer.
McNeill, David 1992. Hand and Mind. Chicago: University of Chicago Press.
Mehl, Matthias R. 2005. Quantitative text analysis. In: Michael Eid and Ed Diener (eds.), Hand-
book of Multimethod Measurement in Psychology, 141–156. Washington, DC: American Psy-
chological Association.
Messick, Samuel 1995. Validity of psychological assessment. American Psychologist 50: 741–749.
Pedon, Arrigo and Augusto Gnisci 2004. Metodologia della Ricerca Psicologica [Methodology of Psychological Research]. Bologna: il Mulino.
Quera, Vicenç 2008. RAP: A computer program for exploring similarities in behavior sequences
using random projections. Behavior Research Methods 40: 21–32.
Quera, Vicenç, Roger Bakeman and Augusto Gnisci 2007. Observer agreement for event se-
quences: Methods and software for sequence alignment and reliability estimates. Behaviour
Research Methods 39: 39–49.
Robinson, Byron F. and Roger Bakeman 1998. ComKappa: A Windows 95 program for calculating
kappa and related statistics. Behavior Research Methods, Instruments, and Computers 30: 731–732.
Schmitt, Manfred 2005. Conceptual, theoretical, and historical foundations of multimethod assess-
ment. In: Michael Eid and Ed Diener (eds.), Handbook of Multimethod Measurement in Psy-
chology, 9–25. Washington, DC: American Psychological Association.
Sim, Julius and Chris Wright 2005. The kappa statistic in reliability studies: Use, interpretation,
and sample size requirements. Physical Therapy 85: 257–268.
Smith, Charles P. 2000. Content analysis and narrative analysis. In: Harry T. Reis and Charles M.
Judd (eds.), Handbook of Research Methods in Social and Personality Psychology, 313–335.
Cambridge: Cambridge University Press.
Stevens, Stanley S. 1946. On the theory of scales of measurement. Science 103: 677–680.
Strijbos, Jan-Willem, Rob Martens, Frans Prins and Wim Jochems 2006. Content analysis: What
are they talking about? Computers and Education 46: 29–48.
Suen, Hoi K. 1988. Agreement, reliability, accuracy and validity: Toward a clarification. Behav-
ioral Assessment 10: 343–366.
Towstopiat, Olga 1984. A review of reliability procedures for measuring observer agreement. Con-
temporary Educational Psychology 9: 333–352.
Tronick, Ed, Heidelise Als, Lauren Adamson, Susan Wise and T. Barry Brazelton 1978. The in-
fant’s response to entrapment between contradictory messages in face-to-face communication.
Journal of American Academy of Child Psychiatry 17: 1–13.
Uebersax, John 1987. Diversity of decision-making models and the measurement of interrater
agreement. Psychological Bulletin 101: 140–146.
Watkins, Marley W. and Miriam E. Pacheco 2000. Interobserver agreement in behavioral research:
Importance and calculation. Journal of Behavioral Education 10: 205–212.
Wiggins, Jerry S. 1973. Personality and Prediction: Principles of Personality Assessment. Reading,
MA: Addison Wesley.
56. Sequential notation and analysis for bodily forms of communication
Abstract
Human interaction is a dynamic process that unfolds in time between two or more people
and that is typically characterized by mutual influence; participants in interaction coordi-
nate in time various aspects of their own behavior (e.g., speech and gestures) with other
people’s behavior (e.g., different turns of talk). Sequential analysis is a quantitative
approach that emphasizes measurement and utilizes traditional research instruments
for data collection and analysis in the attempt to grasp interactive processes without losing their dynamic character. The Sequential Data Interchange Standard (SDIS) allows users to code data based on time (state and timed-event data) and on events (interval, simple-event and multi-event data), using a basic common language built on a universal data grid with its own notation. Using this language allows users to analyze basic interactive processes (e.g., reciprocity, coordination, divergence, etc.) in a way that reflects their dynamic and sequential aspect. The contribution provides both worked research examples and brief hints about which kinds of sequential analysis can fruitfully be applied to which kinds of data.
1. Introduction
Sequentiality has been recognized as a basic attribute of interactive processes by many
scholars since at least the 1950s (Argyle 1967; Heyns and Lippitt 1954). The term
“sequential” is often used to characterize the dynamic aspect of interaction. It is no mystery that human interaction is essentially a dynamic process that unfolds in time between two or more people and that is typically characterized by mutual influ-
ence (Bakeman and Gottman 1997). Participants in such interaction have to coordinate
in time various aspects of their own behavior (e.g., speech and gestures) with other
people’s behaviour (e.g., different turns of talk).
Some qualitative approaches to interaction (e.g., Conversation Analysis or CA) put
sequence at the heart of their approach (Sacks 1972; Sacks, Schegloff, and Jefferson
1974). However, the approach to sequential analysis that we will follow here is quanti-
tative, that is, an approach that emphasizes measurement and utilizes traditional
research instruments for data collection and analysis in our attempts to understand in-
teractive processes (Bakeman 2000; Bakeman and Gnisci 2005; Bakeman and Gottman
1997; Bakeman and Quera 1995a; Bakeman and Robinson 1994; for criticism, see
Slugoski and Hilton 2001; for some answers to qualitative criticism, see Gnisci, Bakeman,
and Quera 2008). It consists of observational techniques based on systematic observa-
tion and statistical techniques of analysis that take sequencing into account (sequences,
co-occurrence, durations, intervals, parallel streams of behaviours, etc.). Its scope in-
cludes attempts to operationalize basic interactive processes (e.g., reciprocity, coordina-
tion, divergence, etc.) in a way that reflects their dynamic and sequential aspect (Gnisci
2005). To put the matter metaphorically, we are interested in an approach that lets us
reconstruct a film clip and not just a single picture.
Such detailed sequential analysis has been made feasible by the exceptional devel-
opment of video and audio recording technology during the last several decades, allow-
ing us not just to view but to review repeatedly behaviour of interest (Bull 2002).
Recording technology has evolved from film, to analogue tape, to digital computer
files, and from expensive and cumbersome to inexpensive and portable recording de-
vices (Bakeman and Gottman 1997). It is not an exaggeration to say that computer
and video technology is to the study of behaviour what the introduction of the micro-
scope was to the biological sciences (Bull 2002). It has permitted the fine-grained analysis
of interaction, called microanalysis (Bull 2002), and has led us to consider anew how we
should represent and analyse interaction (Müller this volume). This chapter presents a
notational scheme for behavioural sequences, one that, in particular, facilitates their
sequential analysis.
two for type of gestures (for witnesses, WP=propositional and WNP=non-propositional; for lawyers, LP and LNP have analogous meanings).
Fig. 56.2 shows in graphical form the outcome of coding, after observers have viewed
the video record. It is based on the beginning of an interchange reported and analyzed
extensively in Gnisci and Pontecorvo (2004), although here it has been simplified; thus
it should be considered simply as a hypothetical example. The examples of sequences
reported in subsequent paragraphs refer to the interaction portrayed in Fig. 56.2.
If, for example, only the lawyer's questions were of interest, a simple-event representation would be:
Q5 Q5 Q1 Q4 Q1 Q1 …
However, if question and answer exchanges were our research focus, we could repre-
sent the flow of behavior between lawyer and witness as:
Q5 A2 Q5 A2 Q1 A3 Q4 NR Q1 NR Q1 …
We can even include four coding systems (lawyer questions, witness answers, turn tak-
ing by L and by W) because they occur in sequence:
Q5 T3 A2 S2 Q5 T3 A2 S5 Q1 T1 A3 S1 Q4 T1 NR S5 Q1 T2 NR S2 Q1 …
Fig. 56.2: General representation of an examination between a lawyer (L) and a witness (W) in a criminal trial with annotation of the durations of the question and answer turns, the durations of the gestures, and the occurrence of turn taking (approximation: 1 second). [Time grid from second 51 to second 99, with rows for question codes (Q), answer codes (A, NR), turn-taking codes (S, T) and gesture codes (P, NP); only the caption is reproduced here.]
Co-occurrences can be handled by defining new codes as combinations of old codes (e.g., a new code X for the answer A1 co-occurring with gesture P, a new code Y for A2-P, etc.), but this can produce a proliferation of new codes that rapidly become awkward and unmanageable for subsequent data analysis.
In sum, simple-event data are useful under limited circumstances, for example, when
only a simple flow of behaviour is of interest. With these data it is possible to understand
how often events occur and how events are sequenced but not how long individual events
last or the proportion of time devoted by each participant to each kind of event. Their use is suggested in preliminary studies, when resources are limited, or when the research questions suit them. In such cases they can provide easy and effective answers.
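As a minimal illustration of what simple-event data afford, the following Python sketch tallies code frequencies and two-event sequences; the event list is a hypothetical fragment modelled on the example above, and the code is not part of any of the SDIS/GSEQ tools cited in this chapter.

```python
from collections import Counter

# Hypothetical simple-event sequence (lawyer questions and witness answers
# from the worked example; order is preserved, durations are not recorded).
events = "Q5 A2 Q5 A2 Q1 A3 Q4 NR Q1 NR Q1".split()

frequencies = Counter(events)               # how often each event occurs
bigrams = Counter(zip(events, events[1:]))  # how often one event follows another

print(frequencies)            # e.g. Counter({'Q1': 3, 'Q5': 2, ...})
print(bigrams[("Q5", "A2")])  # number of times A2 immediately follows Q5
```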
In our example, each event or interactional unit (i.e., the question-answer exchange) represents an event of variable duration (for example, the 1st exchange lasts 8 sec, the 2nd 37,
the 3rd 6). But if the actual duration is not of interest, representing data as multi-events
can be useful. Note, however, that this representation did not include gestures, which
occurred at varying points in time and were not necessarily a dimension of the
question-answer unit.
Although limited, multi-event data have advantages. For example, since multi-
dimensional contingency tables result, they lend themselves to log-linear analysis
(see below).
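To make the link with contingency tables concrete, here is a small sketch (an illustration with hypothetical codes patterned on the example, not the authors' own procedure) that cross-tabulates two dimensions of multi-event units; adding further dimensions yields the multidimensional tables mentioned above.

```python
import pandas as pd

# Hypothetical multi-event units: each question-answer exchange is one event
# described on two dimensions (lawyer question type, witness answer type).
exchanges = [("Q5", "A2"), ("Q5", "A2"), ("Q1", "A3"),
             ("Q4", "NR"), ("Q1", "NR"), ("Q1", "NR")]

df = pd.DataFrame(exchanges, columns=["question", "answer"])
table = pd.crosstab(df["question"], df["answer"])  # two-way contingency table
print(table)
```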
A timing device can signal to the observer when the interval is over, and the observer can then
proceed to code whatever happened within that interval. Therefore, the flow of behavior
is conceptualized as a sequence of intervals. If we assume intervals equal to 5 seconds,
and commas distinguish different intervals, our representation becomes:
If, instead, the coder assigns only the behaviors co-occurring at the end of the interval,
in effect point sampling, and starting from the 5th second, the representation is:
Of course, which strategy is best depends on the aims of the researcher and the research questions. In many cases, the best strategy may be to use a different data type because of the inherent limitations of interval data. Still, interval sequential data may be useful when the investigator is interested only in approximate estimates of how often events occur and of what proportion of time is devoted to each event. When used for point sampling, intervals provide a random sample of behaviors that cannot be regarded as truly sequential.
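The two interval strategies described above can be sketched as follows; the timed events are hypothetical, and the helper functions are illustrations rather than any standard software.

```python
# Hypothetical timed events: (code, onset_second, offset_second), offsets exclusive.
timed_events = [("Q5", 51, 53), ("A2", 55, 60), ("Q5", 60, 70), ("A2", 74, 98)]

def codes_in_interval(start, end):
    """All codes that occur at any point within [start, end)."""
    return [c for c, on, off in timed_events if on < end and off > start]

def code_at_second(t):
    """Point sampling: the codes active at second t (may be empty)."""
    return [c for c, on, off in timed_events if on <= t < off]

# Whole-interval coding with 5-second intervals starting at second 51 ...
intervals = [codes_in_interval(s, s + 5) for s in range(51, 101, 5)]
# ... versus point sampling at the end of each interval.
points = [code_at_second(s) for s in range(55, 101, 5)]
print(intervals)
print(points)
```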
If, instead, each code is paired with its duration in seconds (state sequences), the single-stream representation of our example becomes:
Q5=2 A2=5 Q5=10 A2=24 Q1=4 A3=2 Q4=9 NR=6 Q1=10 NR=15 Q1=8 …
In this case the representation only approximates the interaction: it eliminates the length of the inter-turn pauses and the occurrence of overlap between turns. However, when applied to mutually exclusive and exhaustive codes it is very effective, because it preserves the sequence and the durations of a flow of events, providing more information than simple-event sequences do.
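A minimal sketch, assuming the single-stream "code=duration" notation shown above, illustrates how frequency, total duration, and proportion of time per code can be recovered from state sequences.

```python
from collections import defaultdict

# Single-stream state sequence in "code=duration" notation (durations in seconds).
stream = "Q5=2 A2=5 Q5=10 A2=24 Q1=4 A3=2 Q4=9 NR=6 Q1=10 NR=15 Q1=8"

durations = defaultdict(int)
frequencies = defaultdict(int)
for token in stream.split():
    code, duration = token.split("=")
    durations[code] += int(duration)
    frequencies[code] += 1

total = sum(durations.values())
for code in durations:
    # code, frequency, total duration, proportion of observed time
    print(code, frequencies[code], durations[code], round(durations[code] / total, 2))
```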
More than one stream can be represented, which allows us to examine co-occurrences
between codes in two or more streams. The multiple stream state format allows for con-
siderable flexibility. Here is a four-stream example. The first stream represents the law-
yer’s questions, the second the witness’s answers, the third the witness’s gestures, and
the fourth the lawyer’s gestures. To make each set exhaustive, Q0=no lawyer question,
A0=no witness answer, WG0=no witness gesture, LG0=no lawyer gesture. The time
unit is one second:
Q5=2 Q0=8 Q5=10 Q0=26 Q1=4 Q0=2 Q4=9 Q0=3 Q1=10 Q0=17 Q1=8 … &
A0=4 A2=5 A0=14 A2=24 A0=3 A3=2 A0=9 NR=6 A0=8 NR=15 … &
WG0=4 WNP=2 WG0=17 WNP=2 WG0=25 WNP=2 WG0=9 WNP=2 WG0=12 WNP=2 … &
LG0=43 LP=4 LG0=15 LP=5 LG0=…
The symbol “&” indicates the end of a stream; thus the next stream overlaps in time with any previous streams. In sum, multiple-stream state sequences allow for the representation of many parallel streams of interaction and are therefore very useful, but, note, they
require that all codes within a stream be members of mutually exclusive and exhaustive
sets; momentary or frequency-only codes are not included.
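Co-occurrence across streams can be illustrated by expanding each stream onto a common one-second grid and counting the seconds in which codes from two streams overlap; the two-stream fragment below is hypothetical and simplified from the example above.

```python
from collections import Counter

def expand(stream):
    """Expand a 'code=duration' state stream into one code per second."""
    seconds = []
    for token in stream.split():
        code, duration = token.split("=")
        seconds.extend([code] * int(duration))
    return seconds

# Two of the four streams from the example (lawyer questions, witness gestures),
# covering the same 50 seconds of interaction.
questions = expand("Q5=2 Q0=8 Q5=10 Q0=26 Q1=4")
gestures = expand("WG0=4 WNP=2 WG0=17 WNP=2 WG0=25")

# Count the seconds in which each question code co-occurs with each gesture code.
co_occurrence = Counter(zip(questions, gestures))
print(co_occurrence)                 # seconds of overlap per (question, gesture) pair
print(co_occurrence[("Q0", "WNP")])  # seconds of WNP gesturing outside lawyer questions
```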
For timed-event sequences, the general format is code,onset-time–offset-time, or code,onset-time for momentary behaviours. Here, times are in seconds. A right parenthesis after the offset time indi-
cates an inclusive offset, for example, Q5,11–20) means the code Q5 occurred from sec-
ond 11 through second 20, for a duration of 10 seconds. (In SDIS, no right parenthesis
after the offset time indicates an exclusive offset, for example, Q5,11–21 means the code
Q5 occurred from second 11 up to but not through second 21, for a duration of
10 seconds.)
Information on event frequency, duration, proportion of time, and sequence is most
completely preserved with the timed-event data type, making this data notation the
most flexible and complete.
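A sketch of how timed-event records can be expanded onto the kind of one-second grid shown in Fig. 56.3 (the records are hypothetical and follow the exclusive-offset convention described above):

```python
# Hypothetical timed-event records: (code, onset, offset), exclusive offsets,
# as in the SDIS-style notation "Q5,11-21" described in the text.
records = [("Q5", 11, 21), ("A2", 23, 28), ("WNP", 25, 27)]

n_seconds = 30
grid = {code: [0] * n_seconds for code, _, _ in records}
for code, onset, offset in records:
    for second in range(onset, offset):   # offset is exclusive
        grid[code][second - 1] = 1        # seconds numbered from 1

for code, row in grid.items():
    duration = sum(row)
    print(code, duration, "seconds:", "".join("X" if x else "." for x in row))
```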
Fig. 56.3: Representational grid for Simple-Event, Multi-Event, Fixed-Interval, and Timed-Event/State sequential data for our example, from the beginning of the examination until the beginning of the third question-answer exchange (approximately 47 sec.). [Timed/state grid with one column per second (1–30; for brevity the representation stops at the 30th second) and one row per code (Q5, Q6, A1, A2, T3, S2, S5, WNP); only the caption is reproduced here.]
Investigators wanting to check specific sequential chains among behaviours may choose a chi-square strategy: the chi-square test, applied as an omnibus test to sequentially organized cross-tables, identifies significant associations among different behaviors, after which transitional probabilities can be computed and transitional matrices or diagrams built.
They could then compute adjusted residuals for the cells of the table to identify partic-
ular significant sequences (for details of this strategy, see Bakeman and Gottman 1997).
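The chi-square strategy can be illustrated with the following sketch, which builds a lag-1 transition table from a hypothetical event sequence, computes transitional probabilities, and derives adjusted residuals for the cells; it is a generic illustration, not the GSEQ procedure itself.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical event sequence (given code at lag 0, target code at lag 1).
events = "Q5 A2 Q5 A2 Q1 A3 Q4 NR Q1 NR Q1 A3 Q5 A2".split()
pairs = pd.DataFrame({"given": events[:-1], "target": events[1:]})

observed = pd.crosstab(pairs["given"], pairs["target"])       # transition counts
transitional_p = observed.div(observed.sum(axis=1), axis=0)   # row-wise probabilities

chi2, p, dof, expected = chi2_contingency(observed)           # omnibus test

# Adjusted residuals: (O - E) / sqrt(E * (1 - row/N) * (1 - col/N))
n = observed.values.sum()
row = observed.sum(axis=1).values[:, None] / n
col = observed.sum(axis=0).values[None, :] / n
adjusted = (observed.values - expected) / np.sqrt(expected * (1 - row) * (1 - col))

print(transitional_p.round(2))
print(pd.DataFrame(adjusted, index=observed.index, columns=observed.columns).round(2))
```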
To check particular hypotheses, investigators might want, instead, to obtain a particular sequential statistic for each subject (for example an index of association such as Yule's Q between behaviors of the mother and subsequent responses by the child for each interacting mother-child dyad) and export them to a participants-by-variables matrix, such as the one that SPSS expects. Classical inferential analyses can then be performed, such as ANOVA to identify differences between groups (e.g., between an autistic group and controls), or regression to check, for example, whether individual variables such as the intelligence of the mother or of the child are associated with communicative competence as indicated by sequential indexes (see Bakeman and Gottman 1997; Gnisci and Bakeman 2000). In sum, a dyadic score derived from sequential data can serve as a score to be analyzed with whatever statistical procedures are appropriate (for an application, see Gilstrap and Ceci 2005).
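For a single dyad, an index such as Yule's Q can be computed directly from the 2 x 2 table of lagged counts; the counts below are hypothetical.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 lagged table:
       a = target follows given, b = other follows given,
       c = target follows other, d = other follows other."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical lagged counts for one mother-child dyad:
# child response does (or does not) follow a given maternal behavior.
print(yules_q(a=12, b=4, c=6, d=18))  # ranges from -1 to +1; 0 = no association
```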
Finally, investigators may want to identify general patterns of transition involving
many behaviors and therefore use a lag-sequential log-linear analysis (Bakeman and
Quera 1995b). This strategy integrates sequential analysis into an established and well-supported statistical tradition, that of log-linear models (Bakeman and Robinson 1994), and resembles the Markov chain analyses (Castellan 1979; Chatfield 1973) of the earlier literature. The resulting multidimensional sequential tables, which report data on sequences of many behaviors, can be submitted to log-linear analysis in order to find the simplest yet most informative sequential model, the so-called best-fitting model. In this manner, we can identify stable patterns of interaction (for a worked example involving mother-child interaction, see Cohn and Tronick 1987). For example, using this strategy, Gnisci (2005; see also Gnisci and Bakeman 2007) demonstrated that in examinations during criminal trials the answers provided by witnesses and the way they managed turn taking depended not only on the questions posed by lawyers but also on the way lawyers took the floor when asking questions (see Fig. 56.4). What is interesting is that content or transitional aspects of the witnesses' turns depended on both content and transitional aspects of the lawyers' preceding turns (multimodality).
Fig. 56.4: Sequential log-linear analysis of examinations between lawyers and witnesses (from Gnisci 2005). [Path diagram; the recoverable fit statistics are ΔQ² = 5%**, ΔE = .46%; ΔQ² = 6%***, ΔE = .87%; ΔQ² = 36%***, ΔE = 4.74%; ΔQ² = 22%*, ΔE = 2.73%.]
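As a rough sketch of the log-linear strategy (a generic Poisson-regression formulation with hypothetical counts, not the ILOG/GSEQ procedures used in the studies cited), one can compare models with and without the given-by-target association on a lag-1 transition table; a significant reduction in deviance indicates that the sequential term belongs in the best-fitting model.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Hypothetical lag-1 transition counts (lawyer question type -> witness answer type).
data = pd.DataFrame({
    "given":  ["Q1", "Q1", "Q4", "Q4", "Q5", "Q5"],
    "target": ["A",  "NR", "A",  "NR", "A",  "NR"],
    "count":  [6,     10,   3,    7,    12,   2],
})

independence = smf.glm("count ~ given + target", data=data,
                       family=sm.families.Poisson()).fit()
saturated = smf.glm("count ~ given * target", data=data,
                    family=sm.families.Poisson()).fit()

# Likelihood-ratio test of the given x target interaction (sequential association).
lr = independence.deviance - saturated.deviance
df = independence.df_resid - saturated.df_resid
print(round(lr, 2), df, round(chi2.sf(lr, df), 4))
```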
Here we have indicated only some ways to proceed in analyzing sequential data. In
any case, any of a variety of statistics can be fruitfully applied to answer research ques-
tions on interaction.
4. Brief conclusions
Researchers of language and communication who take a quantitative approach may
strongly benefit from sequential notation and analysis. First, it provides a standard way to represent different types of sequential data (as defined by the Sequential Data Interchange Standard) which, second, can be fruitfully analyzed to capture dynamic aspects
of the interaction. The ability to operationalize these often elusive aspects of interaction
is a major advantage of the sequential approach: many different streams of behaviour
or overlapping events can be taken into consideration simultaneously, the effects of dif-
ferent behavior modalities can be detected, different notions of context can be operation-
alized, and many relationships between them, reflecting different processes (e.g.,
reciprocity, accommodation, synchronicity, suggestibility, etc.), can be fruitfully pictured.
5. References
Argyle, Michael 1967. The Psychology of Interpersonal Behavior. Harmondsworth: Penguin.
Bakeman, Roger 2000. Behavioral observation and coding. In: Harry T. Reis and Charles M. Judd
(eds.), Handbook of Research Methods in Social and Personality Psychology, 138–159. Cam-
bridge: Cambridge University Press.
Bakeman, Roger, Lauren B. Adamson, Melvin Konner and Ronald G. Barr 1990. !Kung infancy:
The social context of object exploration. Child Development 61: 794–809.
Bakeman, Roger, Deborah F. Deckner and Vicenç Quera 2004. Analysis of behavioral streams. In:
Douglas M. Teti (ed.), Handbook of Research Methods in Developmental Science, 394–420.
Oxford: Blackwell.
Bakeman, Roger and Augusto Gnisci 2005. Sequential observational methods. In: Michael Eid
and Ed Diener (eds.), Handbook of Multimethod Measurement in Psychology, 127–140. Wash-
ington, DC: APA.
Bakeman, Roger and John M. Gottman 1997. Observing Interaction. An Introduction to Sequential
Analysis, 2nd Edition. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 1995a. Analyzing Interaction. Sequential Analysis with SDIS
and GSEQ. New York: Cambridge University Press.
Bakeman, Roger and Vicenç Quera 1995b. Log-linear approaches to lag-sequential analysis when
consecutive codes may and cannot repeat. Psychological Bulletin 118: 272–284.
Bakeman, Roger and Byron F. Robinson 1994. Understanding Log-Linear Analysis with ILOG.
An Interactive Approach. Hillsdale, NJ: Lawrence Erlbaum.
Bull, Peter 2002. Communication under the Microscope. The Theory and the Practice of Microa-
nalysis. London: Routledge.
Castellan, N. John 1979. The analysis of behavior sequences. In: Robert B. Cairns (ed.), The Ana-
lysis of Social Interactions: Method, Issues, and Illustrations, 81–116. Hillsdale, NJ: Lawrence
Erlbaum.
Chatfield, Christopher 1973. Statistical inference regarding Markov chain models. Applied Statis-
tics 22: 7–20.
Cohn, Jeffrey F. and Edward Z. Tronick 1987. Mother-infant face-to-face interaction: The
sequence of dyad states at 3, 6, and 9 months. Developmental Psychology 23: 68–77.
Gilstrap, Livia L. and Stephen J. Ceci 2005. Reconceptualizing children's suggestibility: Bidirectional and temporal properties. Child Development 76(1): 40–53.
Gnisci, Augusto 2005. Sequential strategies of accommodation: A new method in courtroom. Brit-
ish Journal of Social Psychology 44(4): 621–643.
Gnisci, Augusto and Roger Bakeman 2000. L’osservazione e l’analisi sequenziale dell’interazione
[Sequential observation and analysis of interaction]. Rome: LED.
Gnisci, Augusto and Roger Bakeman 2007. Sequential accommodation of turn taking and turn length:
A study of courtroom interaction. Journal of Language and Social Psychology 26(3): 234–259.
Gnisci, Augusto, Roger Bakeman and Vicenç Quera 2008. Blending qualitative and quantitative anal-
yses in observing interaction. International Journal of Multiple Research Approaches 2(1): 15–30.
Gnisci, Augusto and Marino Bonaiuto 2003. Grilling politicians. A study on politicians’ answers to
questions comparing televised political interviews and legal examinations. Journal of Language
and Social Psychology 22(4): 384–413.
Gnisci, Augusto, Fridanna Maricchiolo and Marino Bonaiuto this volume. Reliability and validity
of coding systems. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Gnisci, Augusto and Clotilde Pontecorvo 2004. The organization of questions and answers in the
thematic phases of hostile examination: Turn-by-turn manipulation of meaning. Journal of
Pragmatics 36(5): 965–995.
Gottman, John M. 1979. Marital Interaction. Experimental Investigations. New York: Academic Press.
Gottman, John M. 1983. How children become friends. Monographs of the Society for Research in
Child Development 48(3): 1–86.
Heyns, Roger W. and Ronald Lippitt 1954. Systematic observational techniques. In: Gardner
Lindzey (ed.), Handbook of Social Psychology. I. Theory and Method. II. Special Fields and
Applications, Vol. 1, Chapter 10. Oxford: Addison-Wesley.
Müller, Cornelia this volume. Introduction. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva
H. Ladewig, David McNeill and Sedinha Teßendorf (eds.), Body – Language – Communica-
tion: An International Handbook on Multimodality in Human Interaction. (Handbooks of Lin-
guistics and Communication Science 38.1.) Berlin: De Gruyter Mouton.
Quera, Vicenç 2008. RAP: A computer program for exploring similarities in behavior sequences
using random projections. Behavior Research Methods 40: 21–32.
Reis, Harry T. and Charles M. Judd (eds.) 2000. Handbook of Research Methods in Social and Per-
sonality Psychology. Cambridge: Cambridge University Press.
Sacks, Harvey 1972. An initial investigation of the usability of conversational data for doing soci-
ology. In: David Sudnow (ed.), Studies in Social Interaction, 31–74. New York: Free Press.
Sacks, Harvey, Emanuel Schegloff and Gail Jefferson 1974. A simplest systematics for the organi-
zation of turn-taking for conversation. Language 50(4): 696–735.
Slugoski, Ben R. and Denis J. Hilton 2001. Conversation. In: W. Peter Robinson and Howard Giles
(eds.), The New Handbook of Language and Social Psychology, 193–220. New York: Wiley.
Stevens, Stanley S. 1946. On the theory of scales of measurement. Science 103: 677–680.
Taylor, Paul J. and Ian Donald 2007. Testing the relationship between local cue-response patterns
and the global structure of communication behaviour. British Journal of Social Psychology
46(2): 273–298.
Taylor, Paul J. and Sally Thomas 2008. Linguistic style matching and negotiation outcome. Nego-
tiation and Conflict Management Research 1(3): 263–281.
57. Decoding bodily forms of communication
Abstract
This contribution aims to provide brief descriptions of different methods for analysing
how people read, process and interpret bodily forms of communication. Research on de-
coding can be carried out using different methods according to research questions and
hypotheses as well as to background theoretical approaches. Main method features are:
experimental designs (laboratory, field experiments) and stimulus materials (e.g. photos,
videos). The first section of the contribution provides a brief description of the nonverbal
stimuli, which can be used in the experimental design with the aim of measuring receivers’
reactions (decoding). Such data can be recorded through direct and indirect measures de-
pending on the underling processes (deliberate/reflective or automatic/impulsive) mainly
involved in the perception of bodily communication. In the second section, both types of
procedure are defined, described and exemplified, by analysing reliability, validity and
applicability to body language. Comparisons about advantages and disadvantage of
explicit and implicit measures and the relations between the two kinds of measures are
treated in the final section of the contribution.
1. Introduction
In social interaction, people use the body, on the one hand, to communicate their states or traits to other people and, on the other hand, to perceive and identify their interlocutors' personality, intentions, attitudes, emotions, etc., whether in a "transparent" or in a tactical way.
Any form of communication requires a sender and a recipient, who have the tasks of producing and of interpreting the message, respectively. So, the sender is called upon to correctly encode the message and the recipient is called upon to decode it.
This is true also for bodily communication and irrespective of “honest” or “deceitful”
intentions. Whenever bodily movements are considered as a form of communication, it is necessary that the message conveyed by the sender's body be interpreted by the recipient. This process involves many other psychological concepts and abilities: the specific bodily action must be perceived, and the recipient has to attribute to the sender the intention to communicate something via that bodily action. Afterwards, bodily
actions are evaluated and elicit a reaction from the recipient, who is finally able to per-
form a behaviour in reaction or as an answer to the perceived communication. For
example, a "sender" feels happy and shows expressions of joy (smile, laughter, high-pitched voice, etc.), which, in turn, can be perceived and interpreted by the "receiver"
as showing that the sender is happy. These processes can happen in a way that may
vary in awareness, accuracy, truthfulness, and reliability, both from the sender and the
recipient/perceiver (DePaulo and Friedman 1998).
So, research questions on bodily communication mainly regard two principal bodily
processes: encoding (by the sender) and decoding (by the receiver).
A study by Gifford (1994) on bodily indexes of personality and on their perception may serve as an example to illustrate these methodological aspects. Gifford (1994) found significant relations linking the interpersonal characteristics of the observed subjects, their bodily behaviour, and observers' perceptions of those characteristics. Gif-
ford (1994), in this study, analysed both encoding and decoding processes of bodily
communication, as well as the relationship between their outcomes.
In his study, it is possible to recognize different methodologies and instruments for
the analysis of these two different processes. For the study of the encoding process, the adopted techniques were: video-recording of participants in conversation; coding, notation and analysis of their nonverbal signals through nonverbal scoring systems; and completion of a self-report questionnaire about their interpersonal traits. For analysing the decoding process, the stimuli used by Gifford were the videotaped conversations, shown without audio track to small groups of unacquainted peers, and the measurement tool was a questionnaire for the evaluation of the subjects' interpersonal traits.
This chapter presents an introduction to research methods on decoding, that is, pro-
cedures and techniques of analyzing how people read, process and interpret bodily
forms of communication. It will introduce the reader to basic elements for understand-
ing a large section of bodily communication research. After reading this chapter, the reader will also have a clearer understanding of the methods used in the studies on decoding bodily communication that are included in this handbook.
Within the decoding process, the “decoder”, the receiver of a bodily communica-
tion – through specific cognitive and attentive abilities, nonverbal sensitivity, motiva-
tions, past experiences, and stereotypes – gets, reads, decodes, interprets and
understands bodily signals of his/her interlocutors (the senders). The research questions
are: How does the receiver perceive bodily signals made by the sender? How does s/he
use and interpret such signals in order to know the sender better, to understand his/her
intentions, to behave and to interact with him/her? For example, are smiling people
perceived as more likeable, friendly or trustworthy? How is a speaker who uses many hand gestures judged in comparison with a speaker who uses few or none? Is s/he evaluated as more effective and/or likeable when s/he uses a certain type of gesture in contrast to when s/he uses another type? Are the recipients conscious
of their evaluations? Which of the receivers’ characteristics affect decoding processes?
What are bodily decoding skills? Which of the receivers’ behaviours are affected by
such evaluations? Are these processes automatic or controlled? When studying decoding
processes, researchers try to answer these or similar questions.
conditions. The aim was to investigate the effects of different types of hand gesture on
receivers' evaluations of the effectiveness and persuasiveness of the speaker and
the message. When gesturing with ideational and conversational gestures, the speaker
was evaluated as more competent and the message as more persuasive than with
self-adaptors.
In an experimental study on gesture processing, Kelly and Goldsmith (2004) showed participants lecture videos with or without gestures under different conditions of cognitive load (on the left or the right hemisphere), with the aim of evaluating the effect of gestures on lecture comprehension and evaluation and of understanding the role of the hemispheres in gesture processing. In this case, the researchers, besides manipulating body stimuli (presence/absence of gesture), also manipulated an individual variable (cognitive load on the right/left hemisphere) to verify the involvement of the right hemisphere in the processing of gestures during lecture evaluation.
Some stimuli can also be constructed by computer from real stimuli. For example,
Kramer and Ward (2010) used composite face images, combining facial photos from
a pool of 63 Caucasian women on the basis of their self-reported personality traits
and health conditions. The authors composed 14 computerised faces on the basis of
highest and lowest scores on seven measures: The Big Five traits (openness, conscien-
tiousness, extraversion, agreeableness, neuroticism) and physical and mental health.
The composed images were shown to the participants with the aim of verifying whether particular facial features are signals of personality and health. Outcomes showed that four of
the Big Five traits were accurately differentiated on the basis of the facial features
(conscientiousness was the exception), as was physical health.
Again, to identify which socially salient characteristics are conveyed by politicians' bodily movements and whether these can be used to predict voting behaviour,
Kramer, Arend, and Ward (2010) used film segments of a presidential debate
(Obama vs. McCain) converted into stick-figure displays as stimuli. In these displays,
10 landmarks (eyes, shoulders, elbows, wrists, tie knot, tie point) were manually
identified for each frame and an animation was produced. The study found that only
physical health, perceived only from body motion, predicted the participants’ voting
behaviour.
Besides pictures, films or other stimuli, ad hoc recorded simulated situations can often be used as stimuli for decoding research, in which the bodily signals of a confederate serve as the stimulus for the participants. Some people (the experimenter's confederates) are trained to perform particular bodily behaviours towards participants who are unaware of this. The aim is to measure behavioural and/or evaluative reactions to the confederate's bodily performances. In a study on the role of interpersonal behaviour in perpetuating stereotypes, Castelli et al. (2009), for example, trained a confederate to perform two specific bodily behaviours, namely touching her own face and crossing her legs, while responding (consistently or inconsistently with the stereotype) to experimental participants' questions about the elderly. The experimenters expected these bodily actions to be mimicked by the participants when the answers of the confederate were consistent with the stereotype of the elderly. The finding indicates that people who express a stereotype receive subtle bodily cues from the audience that can retroactively reinforce their behaviour and thus make the dismissal of the stereotype difficult to achieve. Maddux, Mullen,
and Galinsky (2008) manipulated bodily behaviours of experimental participants to
examine their effect on negotiation outcomes. The authors instructed some participants
to mimic the mannerisms of their negotiation partner in order to get a better outcome. Negotiators who mimicked the mannerisms of their opponents secured better individual outcomes, and their dyads as a whole also performed better when mimicking occurred than when it did not.
Other bodily features can be manipulated in experimental studies with confederates. In another study, for example, confederates were instructed to provide four different kinds of interruption, change-subject, same-subject, disagreement, and supportive interruptions, at two different rates (more or less frequently), while interviewing naïve participants (Gnisci et al. 2012). At the end of the interview, partici-
pants were asked to answer questions on the interview they had just gone through.
Results showed that the negative effects of change- and same-subject interruptions
were amplified when they were more frequent, as were the positive effects of supportive
interruptions. Contrary to expectations, disagreement interruptions were regarded as
positive. The results as a whole provide support for the amplification hypothesis,
whereby the frequency of interruptions during an interaction amplifies the positive or
negative effect of the type of the interruptions on the interruptee.
(i) direct measures: asking participants directly about their perception and evaluation
of bodily signals;
(ii) indirect measures: reaction and evaluation indirectly assessed without explicitly
asking the participants.
In this case, the respondent marks the number representing the intensity of the evalu-
ation s/he gives to the stimulus (1–7 point scale: 1 is nearer to the left adjective, 7 to the
right adjective; 4 indicates a neutral evaluation). This form tries to extract numeric
values quantitatively reflecting the psychological process (other evaluation) to be mea-
sured, in order to use these “numbers” for subsequent quantitative data analyses
(Henerson, Morris, and Fitz-Gibbon 1987) and to finally derive statistical conclusions.
It is also possible to use an ordinal scale response for measuring perception of social
influence among the members of a group during a work discussion: group participants
are asked to rank the other group members from the first to the last position, i.e., from greatest to least impact on the final group decision (Maricchiolo et al. 2011).
Compared to closed questions, open questions provide more information about the
participants’ evaluations, but they are harder to interpret and need coding work to yield
quantitative data. They are particularly useful in exploratory or preliminary studies
given the richness of details they provide.
Standardized tools for measuring the decoding of bodily signs exist. The Interper-
sonal Perception Task (IPT, Costanzo and Archer 1989), for example, is a videotape
about bodily communication and social perception. Viewers see 30 brief scenes, each
30 to 60 seconds long. After each scene, viewers are asked to guess or “decode” some-
thing about each of the Interpersonal Perception Task scenes. The 30 Interpersonal
Perception Task scenes contain a full range of spontaneous body behaviours in context
and different modes of interaction (e.g., telephone, face-to-face, face-to-camera). For
each scene, there is one objectively correct answer. The answers concern five common types of social judgments: intimacy, competition, deception, kinship, and status. The viewer can try to determine the correct answer by "reading" bodily behaviour: e.g., facial expression, tone of voice, gesture, touch, glance, hesitation, etc. Such a tool can be used to measure the ability and sensitivity to identify body cues of interpersonal perception (perceived as compared with actual cues). Studies on relia-
bility of the instrument (Archer and Costanzo 1988; Costanzo and Archer 1989) found
that the internal consistency rating (Kuder-Richardson Formula 20, KR-20) of the
instrument is .52 (The Kuder-Richardson Formula 20 is a measure of internal consis-
tency reliability for measures with dichotomous choices; Kuder and Richardson
1937). However, in spite of the low internal consistency, the Interpersonal Perception
Task seems to have fairly good validity as indicated by peer reports and measures of
convergent validity (Costanzo and Archer suggested that the internal consistency was
relatively low because the Interpersonal Perception Task sampled a diverse range of
scenes). Measures of accuracy and confidence of the viewers (Smith, Archer, and Cost-
anzo 1991) showed no significant correlation between the two: they depend on the mod-
ality of stimuli presentation (more confidence in Audiovisual, but more accuracy in
Visual only condition), on gender (women more accurate but less confident than
men), and on culture.
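The KR-20 coefficient mentioned above can be computed as KR-20 = (k/(k-1))(1 - Σ p_j q_j / s²), where k is the number of items, p_j the proportion of correct answers to item j, q_j = 1 - p_j, and s² the variance of the total scores; a minimal sketch with hypothetical accuracy data follows.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1 scores."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                              # number of items (scenes)
    p = responses.mean(axis=0)                          # proportion correct per item
    q = 1.0 - p
    total_variance = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_variance)

# Hypothetical 0/1 accuracy data: 5 viewers x 6 scenes.
scores = [[1, 0, 1, 1, 0, 1],
          [1, 1, 1, 0, 0, 1],
          [0, 0, 1, 0, 0, 0],
          [1, 1, 1, 1, 1, 1],
          [0, 1, 0, 0, 1, 0]]
print(round(kr20(scores), 2))
```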
Another example of a standardized tool for measuring sensitivity to bodily commu-
nication is the Profile of Nonverbal Sensitivity (PONS; Rosenthal et al. 1979). This
instrument is a standardized measure of accuracy in comprehending bodily communi-
cation and captures the individual difference in sensitivity at decoding posed emotional
states from bodily communication. The Profile of Nonverbal Sensitivity is a 45-min
16 mm sound film (or videotape) that comprises 220 two-second auditory or visual seg-
ments showing a single individual portraying various emotional states. The viewer has
to decide which of two behavioural alternative answers best describes the segment. The
220 segments represent scenarios from four affective quadrants (positive-dominant,
positive-submissive, negative-dominant, negative-submissive) crossed by 11 nonverbal
channels (e.g., face, body, tone of voice). The internal consistency of the Profile of Nonverbal Sensitivity was found to range from .86 to .92, and its median test-retest reliability is .69. Like the Interpersonal Perception Task, the Profile of Nonverbal Sensitivity discri-
minates individual sensitivity to bodily communication, with respect to age (adults are
more sensitive than children), to gender (women are more sensitive than men), to mental
health (psychopathic patients are less sensitive than healthy people), and to culture.
Direct methods asking for opinions, preferences and evaluations are easy to administer and, when adequately structured, they show high reliability and validity. Nevertheless, to provide adequate measures of the construct to be assessed, they require respondents to be able to accurately identify and report their opinions and perceptions and to be willing to express their actual thoughts, even in the case of socially undesirable answers.
Researchers know that these conditions often are not met, so that, above all when a socially relevant and sensitive issue is at stake, people may provide inaccurate answers. To clear this hurdle, several ways of indirectly assessing individuals' evaluations have been proposed. The next section aims to describe some of the most widespread tools for the indirect measurement of attitudes and evaluations in response to bodily forms of communication and interaction.
Source: elaborated from Blascovich (2000); Fogarty and Stern (1989); Fridlund and Cacioppo
(1986); Gullberg and Holmqvist (1999); Kelly, Kravitz, and Hopkins (2004); Wang and Minor
(2008).
Tab. 57.2: Sequential phases of the Implicit Association Test procedure. Subtracting the response latencies of phase 3 from the response latencies of phase 5 yields an index of preference, where negative values indicate a more positive evaluation of self-adaptors and positive values indicate a more positive evaluation of ideational gestures (the computational procedure is described in Greenwald, Nosek, and Banaji 2003).

Phase 1. Single categorisation of attributes: Positive (on the right of the screen) versus Negative (on the left).
Phase 2. Single categorisation of concepts: Ideational (right) versus Self-adaptors (left).
Phase 3. Combined categorisation of attributes and concepts: Positive and Ideational (right) versus Negative and Self-adaptors (left).
Phase 4. Single categorisation of concepts (reversed on the screen): Self-adaptors (right) versus Ideational (left).
Phase 5. Combined categorisation of attributes and concepts (reversed): Positive and Self-adaptors versus Negative and Ideational.
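A minimal sketch of the scoring logic summarized in the caption of Tab. 57.2, using hypothetical latencies; the improved D algorithm of Greenwald, Nosek, and Banaji (2003) additionally divides by a pooled standard deviation and applies error and latency filters, which are omitted here.

```python
import statistics

# Hypothetical response latencies in milliseconds for the two combined phases.
phase3 = [640, 700, 655, 690, 720, 675]   # Positive + Ideational pairing
phase5 = [820, 790, 845, 760, 810, 830]   # Positive + Self-adaptors pairing

preference = statistics.mean(phase5) - statistics.mean(phase3)
print(preference)   # positive: ideational gestures evaluated more positively;
                    # negative: self-adaptors evaluated more positively
```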
Other implicit measures have been proposed in order to provide an absolute index of
the measured construct, such as the Single Category Implicit Association Test (SCIAT;
Karpinski and Steinman 2006) or a tool flexible in establishing contextual characteris-
tics for the evaluative situation (Go-No go Association Task; GNAT; Nosek and Banaji
2001). These instruments and several others, together with their peculiarities and their differences and similarities, are described by De Houwer and Moors (2010; see also De
Houwer et al. 2009).
Also based on reaction time, but using a different task, is the Affective Priming Task (Fazio et al. 1995). It rests on the idea that exposure to an evaluated object (the prime) facilitates the judgment of a target attribute if that attribute has the same valence as the prime. In other words, if the prime is evaluated as positively as the subsequent target, the judgment of the target is faster; on the contrary, if a positive prime is followed by a negative attribute, the evaluation of the target (the negative attribute) is slower. For example, if a negative facial expression (evaluated negatively by an individual) is presented as a prime, it will automatically activate negative evaluations. If the presentation of this prime is followed by a negative target attribute (e.g. terrible), the respondent will indicate the connotation of the word faster than when the following word is a positive attribute (e.g. wonderful). So, the performance of the respondent, in terms of reaction time, in executing the task of indicating the valence of the attributes shows how the respondent evaluates the prime (positively or negatively).
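A sketch of how the priming logic translates into an analysis of reaction times (the trial data are hypothetical; congruent means that prime and target attribute share the same valence):

```python
import statistics

# Hypothetical trials: (prime_valence, target_valence, reaction_time_ms).
trials = [("negative", "negative", 540), ("negative", "positive", 655),
          ("positive", "positive", 530), ("positive", "negative", 640),
          ("negative", "negative", 560), ("positive", "positive", 525)]

congruent = [rt for p, t, rt in trials if p == t]
incongruent = [rt for p, t, rt in trials if p != t]

# Faster responses on congruent trials indicate that the prime (e.g. a facial
# expression) was evaluated with the same valence as the target attribute.
priming_effect = statistics.mean(incongruent) - statistics.mean(congruent)
print(priming_effect)
```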
Shariff and Tracy (2009), using three different tools for assessing implicit associa-
tions, tested whether the bodily expression of pride is automatically detected and eval-
uated as a signal of the status of a group member. Their findings indicate that bodily
cues of pride are automatically processed as indicating high status, suggesting an inno-
vative application of the indirect measures to the analysis of bodily communication.
The use of indirect measures and the analysis of automatic (implicit) associations have several important implications and considerable potential. When used together with explicit measures, they can inform us about the different cognitive processing of bodily cues and about the automatic perception of the sender by the recipient. These considerations assume an
even greater relevance considering that automatically activated evaluations influence
the deliberate and rational cognitive process leading to explicit evaluations and overt
behaviour (Strack and Deutsch 2004).
4. Conclusions
This chapter aimed to introduce the main research methods and techniques for analysing the decoding of bodily communication. Several tools, grouped into the two macro-categories of direct and indirect measures, have been described together with their scope, their advantages and their disadvantages. Obviously, direct and indirect measures have some overlapping characteristics and some specificities. These features make them more or less adequate and useful for different research fields and objects of study. In general, however, they should not be considered as mutually exclusive. On the contrary, they can be used jointly, in order to gain information on the different cognitive processes (automatic and reflective) involved in the decoding and evaluation of bodily signals. Indeed, even though studies on the automatic evaluation of bodily communication are at present not very frequent, it could be interesting to identify possible differences in the outcomes of automatic (associative) and deliberative (reflective) processing of bodily communicative signals.
The presence of advantages and disadvantages for both of these classes of measurement tools strengthens the idea that, when possible and useful, direct and indirect measures should be used together, both to tap into different cognitive processes, as argued above, and to analyse the relationship and the reciprocal influences between the assessed constructs (Gawronski and Bodenhausen 2006; Strack and Deutsch 2004).
5. References
Archer, Dane and Mark Costanzo 1988. The Interpersonal Perception Task (IPT) for Instructors
and Researchers. Berkeley: University of California Extension Media Center.
Blascovich, James J. 2000. Psychophysiological methods. In: Harry T. Reis and Charles M. Judd
(eds.), Handbook of Research Methods in Social Psychology, 117–137. Cambridge: Cambridge
University Press.
Borsboom, Denny, Gideon J. Mellenbergh and Jaap Van Heerden 2004. The concept of validity.
Psychological Review 111: 1061–1071.
Burgoon, Judee K. 1991. Relational message interpretations of touch, conversational distance, and
posture. Journal of Nonverbal Behavior 15: 233–259.
Castelli, Luigi, Giulia Pavan, Elisabetta Ferrari and Yoshihisa Kashima 2009. The stereotyper and
the chameleon: The effects of stereotype use on perceivers’ mimicry. Journal of Experimental
Social Psychology 45: 835–839.
Costanzo, Mark and Dane Archer 1989. Interpreting the expressive behavior of others: The Inter-
personal Perception Task. Journal of Nonverbal Behavior 13: 225–245.
Cronbach, Lee J. and Paul E. Meehl 1955. Construct validity in psychological tests. Psychological
Bulletin 52: 281–302.
De Houwer, Jan 2006. What are implicit measures and why are we using them? In: Reinout W.
Wiers and Alan W. Stacy (eds.), The Handbook of Implicit Cognition and Addiction, 11–28.
Thousand Oaks, CA: Sage.
De Houwer, Jan and Agnes Moors 2010. Implicit measures: Similarities and differences. In: Ber-
tram Gawronski and B. Keith Payne (eds.), Handbook of Implicit Social Cognition: Measure-
ment, Theory, and Applications, 176–193. New York: Guilford Press.
De Houwer, Jan, Sarah Teige-Mocigemba, Adriaan Spruyt and Agnes Moors 2009. Implicit mea-
sures: A normative analysis and review. Psychological Bulletin 135: 347–368.
DePaulo, Bella M. and Howard S. Friedman 1998. Nonverbal communication. In: Daniel T. Gil-
bert, Susan T. Fiske and Gardner Lindzey (eds.), The Handbook of Social Psychology, 4th edi-
tion, 3–39. New York: McGraw-Hill.
Dovidio, Jack F. and Russel H. Fazio 1992. New technologies for the direct and indirect assessment
of attitudes. In: Judith Tanur (ed.), Questions about Questions: Meaning, Memory, Expression,
and Social Interactions in Surveys, 204–237. New York: Russell Sage Foundation.
Ekman, Paul 1972. Universals and cultural differences in facial expressions of emotion. In: James K.
Cole (ed.), Nebraska Symposium on Motivation, 207–282. Lincoln: University of Nebraska Press.
Evans, Jonathan 2008. Dual-processing accounts of reasoning, judgment, and social cognition.
Annual Review of Psychology 59: 255–278.
Fazio, Russel H., Joni R. Jackson, Bridget C. Dunton and Carol J. Williams 1995. Variability in
automatic activation as an unobtrusive measure of racial attitudes: A bona fide pipeline? Jour-
nal of Personality and Social Psychology 69: 1013–1027.
Fogarty, Christine and John A. Stern 1989. Eye movements and blinks: Their relationship to
higher cognitive processes. International Journal of Psychophysiology 8: 35–42.
Fridlund, Albert J. and John T. Cacioppo 1986. Publication guidelines for human electromyo-
graphic research. Psychophysiology 23: 567–589.
Ganster, Dan C., Harry W. Hennessey and Fred Luthans 1983. Social desirability response effects:
Three alternative models. The Academy of Management Journal 26: 321–331.
Gawronski, Bertram and Galen V. Bodenhausen 2006. Associative and propositional processes in
evaluation: An integrative review of implicit and explicit change. Psychological Bulletin 132:
692–731.
Gifford, Robert 1986. SKANSIV: Seated Kinesic Activity Notation System, Version 4.1. (Available
from Robert Gifford, Department of Psychology, University of Victoria, Victoria, British
Columbia, Canada V8W 2Y2).
Gifford, Robert 1994. A lens framework for understanding the encoding and decoding of interpersonal
disposition in non verbal behaviour. Journal of Personality and Social Psychology 66: 398–412.
Gnisci, Augusto, Ida Sergi, Elvira De Luca and Vanessa Errico 2012. Does frequency of interrup-
tions amplify the effect of various types of interruptions? Experimental evidence. Journal of
Nonverbal Behavior 36(1): 39–57.
Greenwald, Antony G., Debbie E. McGhee and Jordan L. K. Schwartz 1998. Measuring individual
differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social
Psychology 74: 1469–1480.
Greenwald, Antony G., Brian A. Nosek and Mahzarin R. Banaji 2003. Understanding and using
the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and
Social Psychology 85: 197–216.
Guerrero, Laura K. and Tammy A. Miller 1998. Associations between nonverbal behaviors and
initial impressions of instructor competence and course context in videotaped distance educa-
tion courses. Communication Education 47: 30–42.
Gullberg, Marianne and Karl Holmqvist 1999. Keeping an eye on gestures: Visual perception of
gestures in face-to-face communication. Pragmatics and Cognition 7: 35–63.
Gullberg, Marianne and Sotaro Kita 2009. Attention to speech-accompanying gestures: Eye
movements and information uptake. Journal of Nonverbal Behavior 33: 251–277.
Henerson, Marlene E., Lynn L. Morris and Carol T. Fitz-Gibbon 1987. How to Measure Attitudes.
Newbury Park, CA: Sage.
Hofmann, Wilhelm, Bertram Gawronski, Tobias Gschwendner, Huy Le and Manfred Schmitt
2005. A meta-analysis on the correlation between the Implicit Association Test and explicit
self-report measures. Personality and Social Psychology Bulletin 31: 1369–1385.
Karpinski, Andrew and Ross B. Steinman 2006. The single category implicit association test as a
measure of implicit social cognition. Journal of Personality and Social Psychology 91: 16–32.
Kelly, Spencer D. and Linda J. Goldsmith 2004. Gesture and right hemisphere involvement in
evaluating lecture material. Gesture 4(1): 25–42.
Kelly, Spencer D., Corinne Kravitz and Michael Hopkins 2004. Neural correlates of bimodal
speech and gesture comprehension. Brain and Language 89: 253–260.
Kramer, Robin S. S., Isabel Arend and Robert Ward 2010. Perceived health from biological
motion predicts voting behaviour. Quarterly Journal of Experimental Psychology 63: 625–632.
Kramer, Robin S. S. and Robert Ward 2010. Internal facial features are signals of personality and
health. Quarterly Journal of Experimental Psychology 63: 2273–2287.
Kuder, G. Frederic and Marion W. Richardson 1937. The theory of the estimation of test reliabil-
ity. Psychometrika 2: 151–160.
Maddux, William W., Elizabeth Mullen and Adam D. Galinsky 2008. Chameleons bake bigger
pies and take bigger pieces: Strategic behavioral mimicry facilitates negotiation outcomes.
Journal of Experimental Social Psychology 44: 461–468.
Maricchiolo, Fridanna, Marino Bonaiuto, Augusto Gnisci and Gianluca Ficca 2008. Effects of
hand gestures on persuasion process. Presented at the 15th General Meeting of the European
Association of Experimental Social Psychology, Opatija, Croatia, 10–14 June 2008.
Maricchiolo, Fridanna, Augusto Gnisci, Marino Bonaiuto and Gianluca Ficca 2009. Effects of dif-
ferent types of hand gestures in persuasive speech on receivers' evaluations. Language and
Cognitive Processes 24: 239–266.
Maricchiolo, Fridanna, Stefano Livi, Marino Bonaiuto and Augusto Gnisci 2011. Hand gesture and
perceived influence in small group interaction. Spanish Journal of Psychology 14(2): 755–764.
Nosek, Brian A. 2005. Moderators of the relationship between implicit and explicit evaluation.
Journal of Experimental Psychology: General 134: 565–584.
Nosek, Brian A. and Mahzarin R. Banaji 2001. The go/no-go association task. Social Cognition 19:
161–176.
Rosenthal, Robert, Judith A. Hall, M. Robin DiMatteo, Peter L. Rogers and Dane Archer 1979.
Sensitivity to Nonverbal Communication: The PONS Test. Baltimore: Johns Hopkins Univer-
sity Press.
Schlenker, Barry R. 1981. Self-Presentation: A conceptualization and model. Paper presented at
the 89th Annual Convention of the American Psychological Association, Los Angeles, CA,
August 24–26, 1981.
Shariff, Azim and Jessica Tracy 2009. Knowing who’s boss: implicit perceptions of status from the
nonverbal expression of pride. Emotion 9: 631–639.
Smith, Eliot R. and Jamie DeCoster 2000. Dual process models in social and cognitive psychology:
Conceptual integration and links to underlying memory systems. Personality and Social Psy-
chology Review 4: 108–131.
Smith, Heather J., Dane Archer and Mark Costanzo 1991. "Just a hunch": Accuracy and awareness
in person perception. Journal of Nonverbal Behavior 15: 3–18.
Strack, Fritz and Roland Deutsch 2004. Reflective and impulsive determinants of social behavior.
Personality and Social Psychology Review 8: 220–247.
Wang, Yong J. and Michael S. Minor 2008. Validity, reliability, and applicability of psychophysio-
logical techniques in marketing research. Psychology and Marketing 25: 197–232.
Wiggins, Jerry S. 1979. A psychological taxonomy of trait-descriptive terms: The interpersonal
domain. Journal of Personality and Social Psychology 37: 395–412.
Abstract
The Facial Action Coding System (FACS) was developed in 1978 to provide a common
language with which to translate facial expression descriptions, observations and findings
between studies, research groups, populations and species. As the object of study is visual
and subject to perceptual specialization (faces), an objective, common language is essen-
tial. FACS is anatomically based, and aims to identify the minimal units of facial move-
ment related to the underlying muscle movement. Since the development of FACS, scientists have modified the system for use with babies and also with other species. These recent exten-
sions have injected much needed standardisation across research areas, and mean that
larger scale comparative and evolutionary analyses can now be conducted.
condition, than when they viewed the same expression misaligned (for example, a “sad”
upper face misaligned with a “sad” lower face). These findings suggest that expressions
are processed in terms of configural content, so that the shape and position of the
mouth are coded with respect to the shape and position of the eyes. This composite
effect is similar to that found in facial identity processing (Young, Hellawell, and
Hay 1987) and eye gaze processing (Jenkins and Langton 2003). When composite
facial expressions are presented, therefore, a new configuration is formed that interferes
with the processing of constituent parts. For example, Seyama and Nagayama (2002)
found that eyes are perceived as larger in happy faces than in surprised faces, based
on composite photographs created with eye size constant.
It is possible that ascribing emotion to facial expressions – as people routinely do –
further compounds difficulties in identifying, describing or comparing individual fea-
tures. Eastwood, Smilek and Merikle (2003) found that negative facial expressions
disrupt performance in experiments where participants are asked to count the features
of the face, suggesting that attention to faces can depend on perceived valence (Ohman,
Lundqvist, and Esteves 2001). For example, Waller et al. (2007) found that mor-
phological comparisons between human and chimpanzee facial expressions were
affected by the perceived emotion reflected in the chimpanzee expression. If the chim-
panzee expression (bared-teeth face) was considered to reflect angry emotion, similar-
ity to the human smile on specific physical parameters was underestimated. This finding
is consistent with Reisberg and Chambers’ (1991) argument that images are perceptu-
ally organised in terms of how the object is “understood”, and that initial perceptual
organisation restrains how shapes can be manipulated in imagery. Thus, if a face is pro-
cessed and retained as a prototypical emotional expression schema, features highly
salient of that expression (e.g. upturned mouth corners in a smile) may be retained
in preference to other features. Interpreting a face in emotional terms may also affect
perceived comparisons of images held in memory (mental images), which could affect
scientific discussions when scientists are not using direct observations.
In short, faces appear to be special in terms of how they are cognitively processed.
Therefore, scientific description of facial expressions may require extra measures to
ensure accuracy.
correspondence between facial movements and facial muscles. The Facial Action Cod-
ing System has been the most commonly used facial expression research tool in human
studies for over thirty years and has continued to be updated and refined to reflect new
research and technologies (Ekman, Friesen, and Hager 2002).
The Facial Action Coding System is an anatomically based system and aims to identify
the most basic components of facial expression – the minimal observable movements of
the face. As such, Facial Action Coding System is built largely on the pioneering work of
Duchenne de Boulogne. In The Mechanism of Human Facial Expression, Duchenne
(1862, reprinted in 1990) identified the anatomical basis of facial movements through sur-
face electrical stimulation of facial muscles. These physiological experiments were the
first to be published accompanied by photography, which is fitting for such visual subject
matter. Duchenne produced a set of images documenting the appearance of each individ-
ual muscle contraction, and discussed how each might be related to specific emotions.
More recently, the movement of each individual muscle has been documented using
intramuscular electrical stimulation (Waller et al. 2006). Intramuscular (as opposed to
surface) stimulation allows the muscle to be accessed directly, thus minimising activity
from surrounding muscles and displaying the contribution of that muscle alone. An
understanding of the contribution of each individual muscle is a vital and fundamental
basis of Facial Action Coding System.
The manual presents each Action Unit in turn. First, a schematic of the target muscle
superimposed on a face photograph is used to show how the muscles are anatomically
located (see Fig. 58.1A). A second figure shows the location and direction of muscle
action (see Fig. 58.1B), and the appearance changes (and minimal criteria for coding)
are then listed. The coder is taught how to distinguish between similar Action Units
(such as AU 14, dimpler; and AU 20, lip stretch) by identifying subtle differences,
and is also taught specific co-occurrence rules for movements which change when active
in combination with another. The Facial Action Coding System can also be used to
identify movements that occur only on one side of the face (unilateral), or on the
upper or lower lip. Intensity of an Action Unit can be recorded using additional
codes (a = trace evidence, e = maximum), but use of intensity codes is not essential.
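For readers who want to work with coded Action Units computationally, the following Python sketch illustrates one possible machine-readable convention for the codes just described. The string syntax (“L12C+25” for a left-sided AU 12 at intensity C combined with AU 25), the function name parse_facs_string and the field names are illustrative assumptions, not part of the FACS manual itself.

```python
import re

# Hypothetical plain-text convention (an assumption, not the official FACS
# manual syntax): Action Units joined by "+", an optional "L"/"R" prefix for
# unilateral actions, and an optional intensity letter A-E (A = trace,
# E = maximum), e.g. "L12C+25".
AU_PATTERN = re.compile(r"(?P<side>[LR]?)(?P<au>\d+)(?P<intensity>[A-Ea-e]?)")

def parse_facs_string(code: str):
    """Split a string such as 'L12C+25' into structured Action Unit records."""
    records = []
    for token in code.split("+"):
        match = AU_PATTERN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"Unrecognised Action Unit token: {token!r}")
        records.append({
            "au": int(match.group("au")),
            "side": match.group("side") or "bilateral",
            "intensity": match.group("intensity").upper() or None,
        })
    return records

# Left-sided AU 12 at intensity C combined with mouth opening (AU 25).
print(parse_facs_string("L12C+25"))
```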
The coder learns to produce the Action Unit by first attempting the movement while
looking in a mirror, and then feeling the underlying muscle movement. The coder
then practices coding still photographs and video clips of posed movements. Subtle ac-
tions can be difficult to detect in still photographs, and so the emphasis (and final test) is
on recognising Action Units through recordings of continuous movement.
reliability of coding: dividing the number of Action Unit scores on which two persons
agreed, by the sum of the number of Action Units scored by each person (Ekman and
Friesen 1978). The Agreement Index deals only with the presence or absence of specific
Action Units, and does not address the reliability of intensity ratings. Although this for-
mula is easy to apply and has been used extensively, it does not take the probability of
chance agreements into account. Therefore, some researchers have encouraged the use
of other statistics, such as Cohen’s Kappa coefficient. In addition, by treating all Action
Units under one umbrella, the Agreement Index ignores possible differences in relia-
bility coding for individual Action Units. This issue was addressed by Sayette and col-
leagues (Sayette et al. 2001), who assessed inter-rater reliability of individual Facial
Action Coding System Action Units from three separate studies designed to elicit
spontaneous facial expressions of emotion (rating the pleasantness of odors, present-
ing nicotine cues to smokers, and requiring participants to present a speech about
their physical appearance). Using coefficient kappa as their measure of interobserver
agreement, these researchers found that the reliability of Facial Action Coding Sys-
tem was good to excellent for 90% of the Action Units that were assessed. Intensity
ratings were considered to be good when a 3-point (rather than 5-point) rating scale
was used.
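As a worked illustration of the Agreement Index described above, the following Python sketch computes it for one coded event, counting each agreed Action Unit once per coder so that numerator and denominator are on the same scale. The function and the example data are hypothetical; as noted, the index considers only presence or absence and does not correct for chance agreement.

```python
def agreement_index(coder_a, coder_b):
    """Agreement Index for one coded event, following the description above:
    the agreed Action Unit scores (each agreed AU counted once per coder)
    divided by the total number of Action Units scored by the two coders.
    Only presence/absence is considered; chance agreement is not corrected."""
    a, b = set(coder_a), set(coder_b)
    total_scored = len(a) + len(b)
    if total_scored == 0:
        return 1.0  # neither coder scored any Action Unit
    return 2 * len(a & b) / total_scored

# Hypothetical example: coder A scores AUs 1, 2 and 12; coder B scores 1, 12 and 25.
print(agreement_index({1, 2, 12}, {1, 12, 25}))  # 2 * 2 / 6 = 0.67 (rounded)
```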
Facial Action Coding System can therefore also be used to code live events that cannot
be recorded, assuming the face can be clearly seen. Given the focus on emotion, the
Emotion Facial Action Coding System cannot be used for studies in which Action
Units are not associated with emotion. Importantly, given the a priori assumptions
about facial movements and emotion, it cannot be used to test whether additional
Action Units are associated with emotion.
database without Facial Action Coding System, the use of Facial Action Coding System
allows computer scientists to benefit from the large research base already available,
eliminating the need to collect a new data base and train detectors for each new
application.
Automated analysis is particularly useful for quantifying temporal dynamics of facial
movements, as well as changes in intensity of expression, both of which are especially
difficult for human coders. For example, Ambadar, Cohn, and Reed (2009) showed
that smiles that occurred when participants reported amusement differed from “embar-
rassed” and “polite” smiles with regard to variables such as velocity and offset. Thus,
automated processing has extended the types of research that are possible by greatly
expanding the ability to conduct temporal analyses with Facial Action Coding System
data, and to analyze Facial Action Coding System data together with gestures and
speech. The original goal of Facial Action Coding System was to identify all movements
that humans could reliably distinguish. With the refinement of automated systems, it is
possible that this repertoire will eventually expand to include movements that are not
differentially perceived at a conscious level.
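To make the idea of temporal dynamics concrete, here is a minimal Python sketch (using NumPy) of how onset velocity and offset duration might be estimated from a frame-by-frame Action Unit intensity trace produced by an automated coder. It is not Ambadar, Cohn, and Reed’s actual procedure; the function name, sampling rate and threshold are assumptions for illustration only.

```python
import numpy as np

def smile_dynamics(intensity, fps=30.0, threshold=0.1):
    """Estimate peak onset velocity and offset duration from a frame-by-frame
    AU 12 intensity trace. Purely illustrative; sampling rate and threshold
    are assumptions, not values from the cited study."""
    intensity = np.asarray(intensity, dtype=float)
    velocity = np.gradient(intensity) * fps             # intensity units per second
    apex = int(np.argmax(intensity))                     # frame of maximum intensity
    onset_velocity = velocity[: apex + 1].max() if apex > 0 else 0.0
    # Offset duration: time from the apex until intensity falls below threshold.
    below = np.nonzero(intensity[apex:] < threshold)[0]
    offset_duration = below[0] / fps if below.size else (len(intensity) - apex) / fps
    return onset_velocity, offset_duration

# Hypothetical 2-second trace: a fast rise over 20 frames, a slower decay over 40.
trace = np.concatenate([np.linspace(0, 1, 20), np.linspace(1, 0, 40)])
print(smile_dynamics(trace))
```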
One of the most active areas of research, due in large part to the continuing efforts of
Ekman and his colleagues, is the investigation of lying and deception. According to
Ekman, lying with the face can refer to showing an emotion you do not feel, or mask-
ing/blunting an emotion you do feel, either by showing a different emotion or a neutral
face. Through Facial Expression/Awareness/Compassion/Emotion (F.A.C.E.) training,
he teaches people ranging from security agents to business executives to read “micro-
expressions of emotion” that may reveal subtle emotions or deception. Ekman’s
research has spawned a popular American television show, “Lie To Me” (for which
he is a consultant), where the viewing audience can see mock-up photos of faces with
real Facial Action Coding System Action Units in a fictionalized drama.
Fig. 58.1: Training material from the Facial Action Coding System manual (FACS: Ekman,
Friesen, and Hager 2002), showing the anatomical arrangement of the upper face muscles (A),
and direction of muscle action (B).
System system can benefit nonhuman primate facial expression research, which are
similar to those which called for the human Facial Action Coding System in the first
place:
The first modification of Facial Action Coding System for use with another species was
ChimpFACS (Vick et al. 2007) for use with chimpanzees (Pan troglodytes, www.
chimfacs.com). This was followed by MaqFACS (Parr et al. 2010) for use with rhesus
macaques (Macaca mulatta, http://userwww.service.emory.edu/~lparr/MaqFACS.html),
and most recently, GibbonFACS for use with gibbons and siamangs.
Modified Facial Action Coding Systems have been developed to be explicitly
comparable with the human Facial Action Coding System. Thus, development has
tended to follow a clear step-by-step process. The facial muscles of each species were
initially investigated through dissections to document the size, structure and attach-
ments of muscles in comparison to humans (for chimpanzees, Burrows et al. 2006;
for rhesus macaques, Burrows, Waller, and Parr 2009; for gibbons, Burrows et al. 2011).
Fig. 58.2: The facial muscles of the four study species used for FACS development: humans (from
Waller, Cray, and Burrows 2008), chimpanzees (from Burrows et al. 2006), gibbons (from Burrows
et al. 2011) and rhesus macaques (from Waller et al. 2008, adapted from Huber 1931). All images
by Tim Smith.
Fig. 58.3: AU 10 in humans and chimpanzees at moderate intensity (10c) and extreme intensity
(10e) compared to neutral (Ekman et al. 2002; Vick et al. 2007). Human sequence also includes
mouth opening (AU 25) and chimpanzee sequence includes mouth opening (AU 25) and relaxed
lip (AU 16).
Units, and were associated with subtly different contexts, behavioural patterns and sex
differences. The study indicates a level of complexity and subtlety in chimpanzee
facial expression previously unknown. In addition to this specific study, similarity
between basic human and chimpanzee facial expressions has been documented
using Facial Action Coding System (Parr and Waller 2007), and ChimpFACS has been used to identify categories of chimpanzee facial expression through a bottom-up approach based on discriminant function analysis (Parr, Waller, and Heintz 2008). Finally, Seth Dobson has used a Facial Action Coding System-style approach to conduct comparative analyses of the relationship between facial movement/expression and socio-ecological variables across the primate order (Dobson 2009a, 2009b), which promises to be a very useful approach to understanding the evolution of facial expression in primates.
5. Conclusion
Facial Action Coding System provides a common language with which to translate
facial expression descriptions, observations and findings between studies, research
groups, populations and species. When the object of study is visual and subject to
perceptual specialisation, such a common language is essential. Facial Action Coding
System has allowed scientists to study facial expression with a level of detail and objec-
tivity previously unavailable, and the recent extension to primate species has injected
much needed standardisation within these related fields.
6. References
Ambadar, Zara, Jeffrey F. Cohn and Lawrence I. Reed 2009. All smiles are not created equal:
Morphology and timing of smiles perceived as amused, polite, and embarrassed/nervous. Jour-
nal of Nonverbal Behavior 33(1): 17–34.
Bartlett, Marian, Gwen Littlewort, Esra Vural, Kang Lee, Mujdat Cetin, Aytul Ercil and Javier
Movellan 2008. Data mining spontaneous facial behavior with automatic expression coding.
Verbal and nonverbal features of human-human and human-machine interaction. Lecture
Notes in Computer Science 5042: 1–20.
Brown, Donald E. 1991. Human Universals. Philadelphia: Temple University Press.
Burrows, Anne M., Rui Diogo, Bridget M. Waller, Christopher J. Bonar and Katja Liebal 2011. Evo-
lution of the muscles of facial expression in a monogamous ape: Evaluating the relative influ-
ences of ecological and phylogenetic factors in hylobatids. Anatomical Record 294: 645–663.
Burrows, Anne M., Bridget M. Waller and Lisa A. Parr 2009. Facial musculature in the rhesus
macaque (Macaca mulatta): Evolutionary and functional contexts with comparisons to chim-
panzees and humans. Journal of Anatomy 215: 320–334.
Burrows, Anne M., Bridget M. Waller, Lisa A. Parr and Christopher J. Bonar 2006. Muscles of
facial expression in the chimpanzee (Pan troglodytes): Descriptive, comparative, and phyloge-
netic contexts. Journal of Anatomy 208(2): 153–168.
Calder, Andrew J., Andrew W. Young, Jill Keane and Michael Dean 2000. Configural information
in facial expression perception. Journal of Experimental Psychology: Human Perception and
Performance 26(2): 527–551.
Cohn, Jeffrey F. and Takeo Kanade 2007. Use of automated facial image analysis for measurement
of emotion expression. In: James A. Coan and John B. Allen (eds.), The Handbook of Emotion
Elicitation and Assessment, 222–238. New York: Oxford University Press.
Dobson, Seth D. 2009a. Allometry of facial mobility in anthropoid primates: Implications for the
evolution of facial expression. American Journal of Physical Anthropology 138: 70–81.
Dobson, Seth D. 2009b. Socioecological correlates of facial mobility in nonhuman anthropoids.
American Journal of Physical Anthropology 139: 413–420.
Duchenne de Boulogne, Guillaume B. 1990. The Mechanism of Human Facial Expression. New
York: Cambridge University Press. First published [1862].
Eastwood, John D., Daniel Smilek and Philip M. Merikle 2003. Negative facial expression cap-
tures attention and disrupts performance. Perception and Psychophysics 65: 352–358.
Ekman, Paul, Richard J. Davidson and Wallace V. Friesen 1990. Duchenne’s smile: Emotional
expression and brain physiology II. Journal of Personality and Social Psychology 58: 342–353.
Ekman, Paul and Wallace V. Friesen 1978. Facial Action Coding System: A Technique for the Mea-
surement of Facial Movement. Palo Alto, CA: Consulting Psychologists Press.
Ekman, Paul, Wallace V. Friesen and Joseph C. Hager 2002. Facial Action Coding System. Salt
Lake City, UT: Research Nexus.
Ekman, Paul, Wallace V. Friesen and Silvan S. Tomkins 1971. Facial affect scoring technique: A
first validity study. Semiotica 3: 37–58.
Ekman, Paul, Robert W. Levenson and Wallace V. Friesen 1983. Autonomic nervous system activ-
ity distinguishes among emotions. Science 221: 1208–1210.
Ekman, Paul and Erika L. Rosenberg 2005. What the Face Reveals: Basic and Applied Studies of
Spontaneous Expression Using the Facial Action Coding System (FACS), 2nd edition. Oxford:
Oxford University Press.
Ekman, Paul, Richard Sorenson and Wallace V. Friesen 1969. Pan-cultural elements in facial dis-
plays of emotions. Science 164(3875): 86–88.
Faigin, Gary 1990. The Artist’s Complete Guide to Facial Expression. New York: Watson-Guptill.
Friesen, Wallace V. and Paul Ekman 1984. EMFACS-7; Emotional Facial Action Coding System.
Unpublished Manual.
Grunau, Ruth E., Tim Oberlander, Lisa Holsti and Michael F. Whitfield 1998. Bedside application
of the neonatal facial coding system in pain assessment of premature neonates. Pain 76: 277–286.
Hjortsjö, Carl-Herman 1970. Man’s Face and Mimic Language. Sweden: Nordens Boktryckeri, Lund.
Huber, Ernst 1931. Evolution of Facial Musculature and Expression. Baltimore: The Johns Hopkins University Press.
Izard, Carroll. E. 1979. The maximally discriminative facial movement coding system (MAX). Un-
published manuscript, available from Instructional Resource Center, University of Delaware.
Jenkins, Jenny and Stephen R.H. Langton 2003. Configural processing in the perception of eye-
gaze direction. Perception 32: 1181–1188.
Kanfer, Stefan 1997. Serious Business: The Art and Commerce of Animation in America from
Betty Boop to Toy Story. New York: Scribner.
Matsumoto, David and Bob Willingham 2009. Spontaneous facial expressions of emotion of con-
genitally and non-congenitally blind individuals. Journal of Personality and Social Psychology
96(1): 1–10.
Messinger, Daniel S., Mohammad H. Mahoor, Sy-Miin Chow and Jeffrey F. Cohn 2009. Auto-
mated measurement of smile dynamics in mother-infant interaction: A pilot study. Infancy
14(3): 285–305.
Ohman, Arne, Daniel Lundqvist and Francisco Esteves 2001. The face in the crowd revisited: A
threat advantage with schematic stimuli. Journal of Personality and Social Psychology 80: 381–
396.
Oster, Harriet 2006. Baby FACS: Facial action coding system for infants and young children. Un-
published monograph and coding manual, New York University.
Parr, Lisa A. and Bridget M. Waller 2007. The evolution of human emotion. In: Todd M. Preuss
and John H. Kaas (eds.), The Evolution of Primate Nervous Systems. Oxford: Academic
Press.
Parr, Lisa A., Bridget M. Waller, Anne M. Burrows, Katalin M. Gothard and Sarah-Jane Vick
2010. MaqFACS: A muscle-based facial movement coding system for the macaque monkey.
American Journal of Physical Anthropology 143: 625–630.
Parr, Lisa A., Bridget M. Waller and Matthew Heintz 2008. Facial expression categorization by
chimpanzees using standardized stimuli. Emotion 8(2): 216–231.
Parr, Lisa A., Bridget M. Waller and Sarah-Jane Vick 2007. New developments in understanding
emotional facial signals in chimpanzees. Current Directions in Psychological Science 16(3):
117–122.
Reisberg, Daniel and Deborah Chambers 1991. Neither pictures nor propositions: What can we learn from a mental image? Canadian Journal of Psychology 45: 336–352.
Sayette, Michael A., Jeffrey F. Cohn, Joan M. Wertz, Michael A. Perrott and Dominic J. Parrott
2001. A psychometric evaluation of the facial action coding system for assessing spontaneous
expression. Journal of Nonverbal Behavior 25: 167–186.
Seyama, Junichiro and Ruth Nagayama 2002. Perceived eye size is larger in happy faces than in
surprised faces. Perception 31: 1153–1155.
Vick, Sarah-Jane and Annika Paukner 2010. Variation and context of yawns in captive chimpan-
zees (Pan troglodytes). American Journal of Primatology 72: 262–269.
Vick, Sarah-Jane, Bridget M. Waller, Lisa A. Parr, Marcia C. Smith Pasqualini and Kim A. Bard
2007. A cross-species comparison of facial morphology and movement in humans and chim-
panzees using the facial action coding system (FACS). Journal of Nonverbal Behavior 31:
1–20.
Waller, Bridget M., Kim A. Bard, Sarah-Jane Vick and Marcia C. Smith Pasqualini 2007. Per-
ceived differences between chimpanzee and human facial expressions are related to emotional
interpretation. Journal of Comparative Psychology 121(4): 398–404.
Waller, Bridget M., James J. Cray and Anne M. Burrows 2008. Selection for universal facial emo-
tion. Emotion 8(3): 435–439.
Waller, Bridget M., Lisa A. Parr, Katalin M. Gothard, Anne M. Burrows and Andrew J. Fugle-
vand 2008. Mapping the contribution of single muscles to facial movements in the rhesus macaque. Physiology & Behavior 95(1–2): 93–100.
Waller, Bridget M., Sarah-Jane Vick, Lisa A. Parr, Kim A. Bard, Marcia C. Smith Pasqualini, Ka-
talin Gothard and Andrew J. Fuglevand 2006. Intramuscular stimulation of facial muscles in
humans and chimpanzees: Duchenne revisited. Emotion 6(3): 367–382.
Young, Andrew W., Deborah Hellawell and Dennis C. Hay 1987. Configurational information in
face perception. Perception 16: 747–759.
Abstract
Following a review of psychiatric research on motor psychopathology, the Movement
Psychodiagnostic Inventory (MPI) is described as a coding system for the microanalysis
of nonverbal behavior. The value of a highly refined movement coding method is illu-
strated with case examples, and research is discussed that indicates the potential of the
MPI for differentiating schizophrenia spectrum disorders, borderline personality disorder, and narcissistic personality disorder through a multidimensional scaling of the coded data. Finally, the clin-
ical potential of the analysis is illustrated with MPI coding of a patient during a therapy
session that identified a distinctive A B C A sequence of behaviors intricately related to
the therapist’s intervention.
1. Introduction
Anna, a woman in her mid-40s diagnosed with chronic undifferentiated schizophrenia,
had spent many years in a psychiatric hospital. One day, she was selected for an inten-
sive therapy unit staffed by psychiatry residents. A few weeks into her new therapy
regimen, Anna explained to her young psychiatrist why he could not help her. “It’s
my movement, doctor. One part goes this way and the other goes that way by itself.”
What Anna reported was visible. A foot would tap nervously in one rhythm while
she rubbed her hands in another or she might shift her position suddenly in a chaotic
sequence that was far more disorganized than what might be called “ungraceful” or
“impulsive.” Picking up on her implicit request for help with this, her psychiatrist re-
commended dance/movement therapy. She responded well to the sessions, trying
hard to move rhythmically with the music and improvise various steps and motions
in synchrony with the dance therapist in their concerted effort to help her organize
her movements. At one moment, she spontaneously jumped into the air and landed
in one perfectly coordinated motion, then laughed with delight as if relieved to be so
connected within herself.
emotions in normal people. The 19th century classic texts on dementia praecox or
schizophrenia contain descriptions of pathological behaviors such as fixed postures,
highly exaggerated facial expressions, and contorted movements (Kraepelin [1919] 1971).
Today, thanks to advances in treatment, a person diagnosed as schizophrenic may not
display the obvious exaggerations and bizarre mannerisms seen in Darwin’s time,
but motor symptoms and signs of severe psychopathology are still very important for
accurate diagnosis and treatment.
Most methods devoted exclusively to coding motor pathology in psychiatric patients,
such as the Abnormal Involuntary Movement Scale (National Institute of Mental
Health 1974) or the Neurological Rating Scale for Extrapyramidal Effects (Simpson
and Angus 1970), are designed for the assessment of the effects of neuroleptic medica-
tions. If the assessment instrument is devised for psychiatric research and diagnosis per
se, the coding of movement pathology tends to be just one part of many, and embedded
within a range of symptoms. For example, in the Positive and Negative Syndrome Scale
(PANSS) (Kay, Fiszbein, and Opler 1987: 276), “blunted affect” is rated on a 7-point
severity scale and is defined as “diminished emotional responsiveness as characterized
by a reduction in facial expression, modulation of feelings, and communicative ges-
tures.” Blunted affect is just one of 30 items in the Positive and Negative Syndrome
Scale. Other examples are severity ratings of “conceptual disorganization” and “delu-
sions.” In the Signs and Symptoms of Psychotic Illness (SSPI) rating scale (Liddle
et al. 2002), 4 of the 20 items are related to motor pathology (“overactivity,” “underac-
tivity,” “flat affect” and “inappropriate affect”).
Such instruments are major improvements over clinical impression in that they dem-
onstrate good observer reliability and great value for research on differential diagnosis, on change over time, and on discriminating medication effects from symptoms of the psychiatric
illness. However, there are ways to operationalize the movement coding that are far
more detailed and unambiguous. The Movement Psychodiagnostic Inventory (MPI)
(Davis 1970, 1997) focuses only on movement pathology in very refined movement
terms and is based on the premise that this will generate new insights and discoveries
about the nature of severe mental illness.
The Movement Psychodiagnostic Inventory, first developed by the author in the
1960s, has unusual roots. It was influenced by the dance/movement notation and analy-
sis methods of Rudolf Laban as applied to conversational behavior by his student, Irm-
gard Bartenieff. Observation methods originally developed for dance analysis are based
on extremely fine-grained descriptions of movement in its own terms. As such they are
complex, patterned, accurate, operationalized, and comprehensive. Laban and Barte-
nieff understood their value for the study of behavior, and Bartenieff pioneered the
application of what came to be called Laban Movement Analysis to the study of the
movement patterns of psychiatric patients (Bartenieff and Davis [1965] 1972). Drawing
on the Laban tradition is especially valuable because without immersion in the com-
plexity and richness of movement in its own terms, the observer will tend to look at
movement in limited ways.
3. Microanalysis of movement
As film and video made very fine levels of analysis feasible, the microanalysis of move-
ment behavior using slow motion and repeat viewing helped researchers to identify
nuances of initiation, coordination, and spatial changes easily missed in real time view.
Researchers such as linguists or anthropologists who study the nonverbal dimensions of
communication tend to code movement in terms of what body parts are moving in what
direction, and they often use common action terms like “leg cross”, “hand gesture”,
“head nod”, etc. Simple actions in vernacular terms are the way we commonly
view nonverbal behavior. But determining what to code and how in a study of psycho-
pathology is, in large part, a matter of supplementing measures of “what” people do
(e.g. number of head nods) with an examination of exactly “how” they do it (e.g. the
tempo, accent pattern, intensity of the nodding). In the 1960s, there was considerable
support for film and video studies of therapy sessions in the United States, and sup-
port for research on the communication patterns of schizophrenic patients drew re-
searchers from diverse disciplines: psychiatry, anthropology, ethology and linguistics.
Working as a research team, anthropologist Ray L. Birdwhistell (1970) and psychia-
trist Albert E. Scheflen (1973) showed through film microanalysis that psychiatric pa-
tients do not behave differently because they “do” different things than normal adults.
They are different because of how they do the fundamental business of sustaining a
conversation.
Decisions about what aspects of nonverbal behavior to attend to shape the results of
the microanalysis. For example, Scheflen (1973), interested in the regularity and orga-
nization of face-to-face interactions, analyzed distances between people, the relation-
ships of their sitting positions, the synchrony and echoing of actions between them,
and so on. In contrast, Ekman and Friesen (1978), interested in the study of transient
emotions, developed the Facial Affect Coding System (FACS) to microcode very subtle
changes in facial expression and conflicts between what one says and the affect that the
face may fleetingly express. Ellgring (1989), applying a simpler version of the Facial
Affect Coding System to the study of mood change and subjective well-being, found
a general decrease in facial activity in endogenous depressed individuals compared
with normal controls, but not in neurotic depressed patients.
Precision and accuracy become critical in a study of movement disorder. Consider
how differently a “sudden action” can be coded. Ekman (1985) describes “micromen-
tary expressions” or MMEs, changes in the face so sudden and brief they last barely
1/24th of a second and are reliably coded only by those trained to see them or the
few with a natural talent for perceiving them. Micromentary facial expressions
(MMEs) are sudden actions of the face. So are facial tics of people with Tourette’s syn-
drome. In the Movement Psychodiagnostic Inventory (Davis 1997), one form of disor-
ganization is described as a lightning-quick movement “out of the blue” that disrupts
the flow of the action, and this can occur in any part of the body, including the face. In
all three examples, sudden appears to mean less than 1/12 of a second. But each is a very
different action. The micromentary facial expressions are flashes of perceptible facial
expressions that trained observers can reliably identify as traces of specific affects
such as disgust, fear, sadness, surprise, etc. Ekman tracks micromentary facial expres-
sions as “leakage” of feelings that contradict what is being said or implied, i.e., related
to a truth that must not be expressed. He does not associate them with psychopathology
but with a context in which the person is in conflict. Presumably, anyone can display
micromentary facial expressions.
Tourette facial tics are, on close examination, sudden spasms of parts of the facial
musculature that do not necessarily constitute a facial expression with a categorical
name such as angry, sad, disgusted, and the like. Like micromentary facial expressions,
they disrupt the normal flow of the action and “break up” the face for a very brief
moment, but their form is different. They can be larger and slightly longer in duration
than micromentary facial expressions, and, although the tics may increase with stress, the
form of the tic itself does not seem particular to the context. Also, unlike the other
examples, the person with Tourette’s may work to transform the tic into conventional
behaviors or appear to consciously control the tic.
A “sudden, out of the blue” facial action as coded in the Movement Psychodiagnos-
tic Inventory appears to disrupt the continuity of the person’s movement, and is not
likely to be limited to the face or a particular set of facial muscles, but can occur in
other parts of the body and during very different types of actions. In other words, “sud-
den” actions as coded on the Movement Psychodiagnostic Inventory are more diffuse
and disorganizing in general than micromentary facial expressions or Tourette tics.
All three actions are in some way beyond one’s control, but there are perceptible dif-
ferences in their form and how they occur with other behaviors. These tiny, but visible
distinctions are so crucial that accurately coding “sudden facial action” can become an
exercise in differential diagnosis, distinguishing the normal person (who may be lying)
from the person with a neurological disorder from the person with severe psychopathol-
ogy. This comparison is presented to illustrate how critical it is to precisely define the
movement pathology in movement terms. Considering many aspects of nonverbal
behavior and honing the observations and coding to a high level of detail can make all
the difference.
10 phrases of gesticulation, no head movements while speaking and so on.) Actions can
be infrequent or restricted for many reasons that have little or nothing to do with psy-
chopathology. For example, averted gaze might be related to age or cultural conven-
tions for addressing an authority figure. However, not looking at the other during a
conversation may be related to a more severe diagnosis, especially when combined with
disordered patterns or other forms of restriction.
The Movement Psychodiagnostic Inventory part 2 is called Primary Categories and
deals with how the person moves, the qualitative aspects of nonverbal behavior. It is
composed of 10 categories, each with 3 to 12 items. While the focus is on serious
patterns of disturbance, these patterns may be very subtle or infrequently displayed.
As discussed earlier, with advances in treatment, motor symptoms of mental illness are
not necessarily obvious or exaggerated in the ways that they were before the 1950s.
Tab. 59.2 lists the Movement Psychodiagnostic Inventory Primary Categories and a
sample item from each. This is an inventory of patterns that have been identified by
the author and Irmgard Bartenieff in an initial project and later in the author’s research
on hospitalized psychiatric patients. It is an inventory based on film, video and live ob-
servations of over 100 patients (22 and 62 in formal studies, the rest in clinical studies).
Although there have been few additions since the 1980s, the list of patterns is likely to
expand with future research.
The first two categories, disorganization and immobility, have the most items, and
their prominence in the inventory may reflect the fact that the Movement Psychodiag-
nostic Inventory was developed primarily from observation of people diagnosed within
the schizophrenia spectrum. Disorganization in movement can occur in many ways,
some of which appear to be more serious than others. The patterns considered the
most serious are listed first and identified as such. (The criteria for “serious” – like
example, disorganization may occur only in the gestures accompanying speech, but not
in grooming behaviors, fidgeting (self-related subsystem) or handling objects (instru-
mental subsystem).
The decision about how much information to record depends to a degree on one’s
aims and resources. Although the complete coding is time-consuming and labor inten-
sive, it has definite advantages. For example, disagreements are easier to identify and
resolve through consensus when the coding is detailed.
In most of our research studies applying the Movement Psychodiagnostic Inventory,
we have used three observers who independently code the individual’s movements
without sound or specific diagnostic information. Observer reliability is usually assessed
with Cohen’s kappa which corrects for chance agreements on qualitative judgments.
Because videotapes of psychiatric sessions are so difficult to secure and the coding is so labor intensive, observers are asked to review the points of disagreement and to reach a decision by consensus, rather than throwing out data points on which there is disagreement or averaging codes, which makes little sense for qualitative judgments.
Although there are studies in which time-sampling of segments may be warranted, this
is not a good idea with the Movement Psychodiagnostic Inventory. Often, important
patterns are very rarely displayed, and limiting the sample increases the chance that
they will be missed.
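As a reminder of what the kappa statistic does, the following Python sketch computes Cohen’s kappa for two observers who each assign one categorical code per segment; the formula is (observed agreement − chance agreement) / (1 − chance agreement). The example codes are hypothetical and far simpler than a full Movement Psychodiagnostic Inventory coding sheet.

```python
def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers assigning one categorical code per segment:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    categories = set(codes_a) | set(codes_b)
    chance = sum((codes_a.count(c) / n) * (codes_b.count(c) / n) for c in categories)
    return 1.0 if chance == 1 else (observed - chance) / (1 - chance)

# Hypothetical example: six segments coded as P(resent) or A(bsent) by two observers.
print(cohens_kappa(list("PPAPAA"), list("PPAAAA")))  # about 0.67
```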
Originally, we predicted that serious forms of disorganization, perseveration and
immobility would be associated with schizophrenia. The reality appears more compli-
cated, as we found in a study of 19 psychiatric patients with schizophrenia spectrum dis-
orders and 33 patients diagnosed with borderline and personality disorders (Davis,
Cruz, and Berger 1995). There were univariate differences in the social behavior con-
ventions operationalized in the Action inventory, with schizophrenia spectrum patients
displaying greater disturbance in behaviors that serve orienting to the other, gesticula-
tion and head movements underlining speech, and so on. However, the presence of the formal patterns defined in Part 2 is not pathognomonic of schizophrenia in the way a simple symptom used for diagnosis would be. The distribution of formal motor signs was equal between
the two groups, but multidimensional scaling of the data showed that the way the symp-
toms were configured was different. Naive impressions that a person is disturbed appear
to be based in part on visible disorders in conventional social behaviors, while the diag-
nostic significance of the more subtle qualitative signs of motor disorder depends on how
they co-occur.
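The following Python sketch, which is illustrative only and not the analysis reported in Davis, Cruz, and Berger (1995), shows how a multidimensional scaling of co-occurrence patterns might be set up, assuming a binary patient-by-item matrix of coded MPI patterns and using scikit-learn’s MDS on Jaccard distances between items; the data here are randomly generated placeholders.

```python
import numpy as np
from sklearn.manifold import MDS

def jaccard_distance(x, y):
    """1 minus the proportion of shared positives among all positives."""
    union = np.logical_or(x, y).sum()
    return 0.0 if union == 0 else 1.0 - np.logical_and(x, y).sum() / union

# Hypothetical binary patient-by-item matrix: 52 patients, 10 coded patterns.
rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(52, 10))

n_items = codes.shape[1]
dist = np.array([[jaccard_distance(codes[:, i], codes[:, j])
                  for j in range(n_items)] for i in range(n_items)])

# Two-dimensional configuration of the items, based on how they co-occur.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dist)
print(coords.round(2))
```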
The Movement Psychodiagnostic Inventory distinguished between borderline per-
sonality disorder and narcissistic personality disorder groups as well (Davis, Cruz,
and Berger 1995). A small literature on movement characteristics of patients with per-
sonality disorder suggested that for this study six Movement Psychodiagnostic Inven-
tory primary categories should be used: disorganization, immobility, diffusion, low
spatial complexity, flaccidity and hyperkinesis. The borderline patients showed mark-
edly higher mean scores on disorganization and low spatial complexity and higher
scores on hyperkinesis and flaccidity than the narcissistic group, with somewhat lower
mean scores on immobility. To reiterate, while these initial studies support the validity
of the Movement Psychodiagnostic Inventory as an instrument for studying psycho-
pathology, simple univariate relationships – presence of x means diagnosis y – are
not supported. Analysis of the formal qualitative distinctions of Movement Psychodiag-
nostic Inventory Part 2 appears to reveal “deep structure” differences and supports a
more complicated model of the nature of severe psychopathology than one based on
univariate differences.
7. References
Bartenieff, Irmgard and Martha Davis 1972. Effort-shape analysis of movement. In: Martha
Davis (ed.), Research Approaches to Movement and Personality. New York: Arno Press.
First published [1965].
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Berger, Miriam Roskin and Robyn Flaum Cruz 1998. Movement characteristics of borderline and
narcissistic personality disorder patients. Poster presentation at the American Dance Therapy
Association Annual Conference, Albuquerque, New Mexico.
Darwin, Charles 1965. The Expression of Emotions in Man and Animals. Chicago: University of
Chicago Press. First published [1872].
Davis, Martha 1970. Movement characteristics of hospitalized psychiatric patients. Proceedings of
Fifth Annual Conference, American Dance Therapy Association, 25–45.
Davis, Martha 1985. Nonverbal behavior research and psychotherapy. In: George Stricker and
Robert H. Keisner (eds.), From Research to Clinical Practice, 89–112. New York: Plenum.
Davis, Martha 1997. Guide to movement analysis methods. Part 2: Movement psychodiagnostic
inventory. Unpublished guidebook available from author at [email protected].
Davis, Martha, Robyn Flaum Cruz and Miriam Roskin Berger 1995. Movement and psychodiag-
nosis: Schizophrenia spectrum and dramatic personality disorders. Presented at the Annual
Conference of the American Psychological Association.
Ekman, Paul 1985. Telling Lies. New York: W.W. Norton.
Ekman, Paul and Wallace V. Friesen 1978. The Facial Action Coding System. Palo Alto, CA: Con-
sulting Psychologists Press.
Ellgring, Heiner 1989. Nonverbal Communication in Depression. Cambridge: Cambridge Univer-
sity Press.
Kay, Stanley R., Abraham Fiszbein and Lewis A. Opler 1987. The positive and negative syndrome
scale (PANSS) for schizophrenia. Schizophrenia Bulletin 13: 261–276.
Kraepelin, Emil 1971. Dementia Praecox and Paraphrenia. Translated by R. M. Barclay. New
York: Krieger. First published [1919].
Liddle, Peter F., Elton T.C. Ngan, Gary Duffield, King Kho and Anthony J. Warren 2002. Signs and
symptoms of psychotic illness (SSPI): A rating scale. British Journal of Psychiatry 180: 45–50.
National Institute of Mental Health 1974. Abnormal involuntary movement scale (AIMS). Wash-
ington, DC: Alcohol, Drug Abuse, and Mental Health Administration, U.S. Department of
Health, Education and Welfare.
Scheflen, Albert E. 1973. Communicational Structure: Analysis of a Therapy Transaction. Bloo-
mington: Indiana University Press.
Simpson, G. M. and J. W. S. Angus 1970. A rating scale for extrapyramidal side effects. Acta Psy-
chiatrica Scandinavica 45: 11–19.
Abstract
Laban-based analysis, originally developed for dance, can also serve as an excellent tool
for observing body movement in human communication. With the six categories and
approximately 60 parameters of Laban/Bartenieff Movement Studies, one is able to
observe movement in its complexity. The observer may select all of the parameters or only the most salient ones, may choose between a quantitative and a qualitative analysis and between a macro and a micro level, and different methods of notation can be chosen. The notation has the advantage that it is quicker to write and can show movements which are happening simultaneously in close proximity and in a pictorial, non-linear way. For those who are conversant with the notation, it also enables communication about movement without language as a mediator, so it is truly intercultural, like music notation. Over the 90 years in which Laban-based analysis has been developed by many people besides Laban himself, it has come to meet the intricacy of human body movement through its complexity and systematic approach.
1. Introduction
Historically, Rudolf Laban (1879–1958) first used his analysis for dance, but he had
already developed it further to be used in industry and for theatre. In his book The Mastery of Movement, he not only describes his concepts for movement on stage, but
also for the “theatre of life” i.e. communication through movement. Laban comments:
“So movement evidently reveals many different things. It is the result of the striving
after an object deemed valuable, or of a state of mind. […] It can characterize momen-
tary mood and reactions as well as constant features of the mover” (Laban 1975: 2).
Since a movement can have different meanings, it is important to have well differentiated
concepts to be able firstly to describe movement on a more objective level, i.e. with the
least amount of interpretation possible, then secondly to look at it in relationship to the
context to find meaning.
Movement is in a “continuous dynamic flux which is interrupted only by short
pauses” (Laban 1975: 8). This makes it especially difficult to analyze in face-to-face
situations when the observer is also a participant in the communication. Laban devel-
oped a way to analyze movement, on a “macro” level, without the use of film, by way of
shorthand through his symbols and notation system. Today, analysts can also go onto a
“micro” level, with the help of video documentation. Through repeated observations
the many layers (i.e. categories) of movement can be found, while one brief observation
can only reveal the aspect (i.e. category) which is in the foreground.
Many students of Laban have developed his ideas further, so that today there are several different Laban-based analytical approaches. In the US, Labananalysis (LMA) was developed by Irmgard Bartenieff (1900–1981) and her students. Since
1988 the European Association of Laban/Bartenieff Movement Studies (EUROLAB),
based in Germany, has been actively translating and developing this material. In the fol-
lowing overview, I will focus on the approach which we use and teach in Europe called
Laban/Bartenieff Movement Studies (LMS). In a few instances I will briefly describe
other Laban-based approaches, which are important in respect to communication,
meaning making or notation.
– Body: What is moving? Which parts are moving? Which movements are done?
– Space: Where does the movement go?
– Effort: With which energetic quality?
– Shape: With which plastic modification?
– Relationship: How does the moving person set up a relationship to something or
somebody through movement?
– Phrasing: In which temporal order does the movement progress?
The peculiarity of each individual movement originates not only from the addition of
the different elements, but from their versatile combinations. Furthermore, each move-
ment is colored by which of the categories stands more in the foreground. There are a
number of aspects within each of these six categories (altogether over sixty), which can
be observed. Therefore, it is crucial that a selection of the observation parameters is
made according to how one wants to structure the observation process in the context
of the core question (see part 3).
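One hypothetical way to operationalize such a selection is to record, for each observed movement event, only the parameters chosen for the core question, keyed by the six categories. The Python sketch below is an invented illustration, not an official LMS coding sheet; the category names follow the list above, while the concrete parameter names and values are assumptions.

```python
# Hypothetical coding-sheet entry for a single movement event, keyed by the six
# LMS categories; only the parameters selected for the core question are filled in.
observation = {
    "body":         {"part": "right arm", "action": "gesture"},
    "space":        {"approach": "peripheral", "direction": "forward-high"},
    "effort":       {"time": "sudden"},
    "shape":        {"mode": "directional", "quality": "rising"},
    "relationship": {"addressing": "person B"},
    "phrasing":     {"structure": "exertion-recuperation"},
}

# Parameters left out of the selection are simply not recorded.
for category, parameters in observation.items():
    print(f"{category}: {parameters}")
```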
2.1. Body
Laban (1975) not only differentiated the body parts, but also distinguished different
body actions. Today we also look at body attitude i.e. the posture which is held in move-
ment as well as the body connections and organizational patterns. Since the first two are
most relevant for communication, these will be the focus here.
body actions – like locomotion through space or turning around. Seldom do we see peo-
ple jump. In all these body actions, the meaning is mainly expressed through the use of
effort, shape, space, relationship and phrasing. Still it may be important to note with
which part of the body this is being done.
Warren Lamb (1965), a student of Laban’s, in his many observations of people in
communication, discovered that there are also sometimes fleeting moments where
the gesture and the postural shift join together – he called these “posture-gesture-
mergers” or “integrated movements” (Moore 2005: 39). With extra training, these
can be observed and analyzed in relation to their effort or shape content.
Tab. 60.2: Body part symbols for the head and fingers
[Symbols not reproduced: Thumb (1st Finger), 2nd Finger, 3rd Finger, 4th Finger, 5th Finger]
In Laban’s notation, the body parts are separated through the joints, going from the
center out (Fig. 60.1). “C” for caput (Latin for “head”) can be further differentiated with certain additions to represent the face, mouth and eyes. Again, with other additions, the
limbs or each finger can be notated separately (Tab. 60.2). These symbols can be
added to the body action symbols in motif writing to clarify which body part is doing a
gesture.
2.2. Space
Laban differentiated between the general space around us and the personal movement
area, the kinesphere.
2.2.2. Kinesphere
The kinesphere is the area around the body which can be reached with the limbs when
standing (or sitting) in one place. Laban defined three approaches to the kinesphere:
central, peripheral or transverse. Central movements go to, away from or pass near
the body center; peripheral movements stay on the edge of the kinesphere; transverse
movements occur in between the center and the edge. Each person defines where the
edge of his kinesphere is through his movements. So it is possible to see peripheral
movements in near reach of the body.
The vertical, horizontal and sagittal dimensions are in the model of the octahedron
(Fig. 60.2). The diameters of the vertical, horizontal and sagittal planes are in the icosahe-
dron. The four diagonals are in the cube. With these three models (plus center) we
have 27 spatial reference points in total, which can give us an orientation in the kine-
sphere. These reference points can be connected: centrally – through dimensions, dia-
meters and diagonals, peripherally – on the edges of the models (Fig. 60.2), transversely
or a mixture – e.g. central and peripheral (Fig. 60.3).
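The 27 reference points have a simple combinatorial structure that can be made explicit in code. The sketch below is an idealized model, an assumption rather than EUROLAB’s formal definition: each point is treated as a combination of −1, 0 or +1 along the vertical, horizontal and sagittal dimensions, which yields the centre, the six dimensional endpoints of the octahedron, the twelve endpoints of the plane diameters of the icosahedron and the eight diagonal endpoints of the cube.

```python
from collections import Counter
from itertools import product

# Idealized model (an assumption, not EUROLAB's formal definition): each of the
# 27 reference points is a combination of -1, 0 or +1 on the vertical,
# horizontal and sagittal dimensions; (0, 0, 0) is the centre of the kinesphere.
points = list(product((-1, 0, 1), repeat=3))

def point_type(point):
    """Classify a point by how many dimensions it engages."""
    engaged = sum(1 for coordinate in point if coordinate != 0)
    return {0: "centre",
            1: "dimension endpoint (octahedron)",
            2: "plane diameter endpoint (icosahedron)",
            3: "diagonal endpoint (cube)"}[engaged]

print(len(points))                               # 27
print(Counter(point_type(p) for p in points))    # 1 + 6 + 12 + 8
```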
2.3. Shape
Laban’s first profession was visual art. Therefore it is not surprising that Laban initially saw movement as a development from one still shape to the next and described it as “a
series of shape transformations” (Laban 1920: 214).
In everyday activities, the still shapes can be observed in the whole body, but also in
body regions, such as the upper body which is pin-like (while sitting upright). It is pos-
sible to transfer these shape descriptions to shapes made by the hands or arms in gestures, e.g. a hand makes the rounded shape of a ball. If the hands start making pantomimic
movements – where the observer has to imagine the rest of the shape – then this be-
comes more what Hutchinson Guest (1983: 173) calls “design drawing”.
different aspects: shape-flow, which is self-oriented shape change; directional, which is goal-oriented shape change, either spoke-like or arch-like; and carving, which is three-dimensional shaping. Hackney again has designed symbols for these concepts (Tab. 60.5).
In active everyday communication with speech, one can observe lots of directional and
carving gestures, some of which may be supported by postural weight shifts. Shape-flow
will mostly happen in those “off” moments, when people self-reference, self-touch or look like they are “dreaming”.
In communication, the shape qualities are very important in action as well as in reaction,
since they can carry conscious or unconscious meaning. There are no “one-to-one” correlations, so they have to be described correctly and then interpreted in context. For example, a
sinking arm movement from high to low can be away from something which is high or
towards something which is low. This movement will be different when the whole body
is sinking or when the arm is sinking but the torso is rising. There will also be a difference
if the movement is purely sinking, sinking with spreading, or sinking with spreading and
advancing. Each differentiation can give the movement a slightly different meaning.
2.4. Effort
In order to describe the dynamic quality of a movement, one usually uses very imagistic,
subjective and interpretive modes of expression, for example “a person thrashed wildly
around himself”. Laban created a more objective and clearer structure for the characterization of energetic qualities in movement.
– Stable State: Weight & Space
– Mobile State: Time & Flow
– Rhythm State (Near): Weight & Time
– Remote State (Far): Flow & Space
– Action Drive: Weight, Space & Time
– Passion Drive: Weight, Flow & Time
– Spell Drive: Weight, Space & Flow
– Vision Drive: Flow, Space & Time
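The combinatorial logic behind these states and drives can be sketched in a few lines of Python: the four effort factors combine in pairs to form states and in threes to form drives. Only the names recoverable from the list above are filled in here; the remaining two-factor combinations are left unnamed.

```python
from itertools import combinations

# Sketch of the combinatorial structure behind the states and drives listed above:
# the four effort factors combine pairwise into states and in threes into drives.
FACTORS = ("Weight", "Space", "Time", "Flow")

NAMES = {
    frozenset({"Weight", "Space"}):         "Stable State",
    frozenset({"Time", "Flow"}):            "Mobile State",
    frozenset({"Weight", "Time"}):          "Rhythm State (Near)",
    frozenset({"Flow", "Space"}):           "Remote State (Far)",
    frozenset({"Weight", "Space", "Time"}): "Action Drive",
    frozenset({"Weight", "Flow", "Time"}):  "Passion Drive",
    frozenset({"Weight", "Space", "Flow"}): "Spell Drive",
    frozenset({"Flow", "Space", "Time"}):   "Vision Drive",
}

for size in (2, 3):
    for combo in combinations(FACTORS, size):
        name = NAMES.get(frozenset(combo), "(state not shown in the list above)")
        print(" & ".join(combo), "->", name)
```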
2.5. Relationship
Ann Hutchinson Guest, a student of Laban’s, has developed the relationship category.
Sometimes the kind of relationship established can be more important than the body,
spatial, effort or shape component of the movement which produced it.
[Symbols not reproduced: focus point, support]
and at the same time we are using all other categories. Here is an example: Person A is
supported by the chair, while she is near a table, her hands are touching a cup (which
is supported by the table), she is addressing person B with her face and aware of the
movement of a dog in the distance. Since Person B is standing in her back space, she
carves her torso into a screw-like twisted shape with direct space effort and sustained
time to address him!
2.6. Phrasing
Movement usually happens in phrases. A phrase has a beginning, middle and end. Gen-
erally, Laban observed that a movement phrase has an exertion and a recuperation phase. Bartenieff (Hackney 1998: 239–241) emphasized the moment of initiation
and also what we do as a preparation for the movement, since it will determine how the
movement is followed through.
[Phrasing symbols not reproduced: consecutive, congruent, overlapping]
Fig. 60.7: Structuring the observation process (based on Moore and Yamamoto 1988: 224); the diagram relates elements such as observer role/point of view, documentation/notation and movement parameters.
in relation to each other (Fig. 60.7). A Laban-based analysis will of course use the above
mentioned parameters and usually the below described forms of documentation and
interpretation.
One can start at any point on the star, but changing one aspect might mean that
others have to be changed. In a face-to-face situation for example, it could unfold in
the following way: the duration of the participatory observation is very short, the num-
ber of movement parameters observed is relatively few, and making sense of the observation is perfectly clear, since the observer is in the same situation. While
making the documentation afterwards, the observer comes up with the core question –
ah, that is why this was so interesting to me! Whereas in a research situation one would
have to structure the process very differently: the observer role would be a distant one
(possibly with video), the core question and the movement parameters would have
to be carefully defined with respect to anticipating how to make sense of the data
(possibly a correlation to an established framework). The duration should be set, but
probably will depend on when the observer feels he has enough data to answer his ques-
tion. The form of documentation probably depends on the training of the observer and
the core question to be answered – which could call for qualitative or quantitative
analysis.
3.2. Documentation
There are many ways we can document Laban-based analysis, e.g. through movement
itself, words, or coding sheets. Laban and his students (Albrecht Knust and Ann Hutchin-
son Guest in particular) developed his notation as a basic tool for documentation, with the
hope that its use would become as normal for movement and dance as music notation is for music.
[Motif writing symbols not reproduced; their meanings include: to the right, stillness, rising-enclosing, rising-spreading, release (the support), head forward, hand left-low, step backward]
3.2.1.3. Labanotation
Labanotation incorporates the exact positions of all the moving body parts in space and
their relationship to objects and other people (Fig. 60.10). It is a description on a micro
level, with the documentation of duration in the same direction as in motif writing. The
difference is that a staff represents the body from a center line into right and left sides.
Another difference is that gestures have to be referenced in their spatial directions from
the “point of attachment” of that limb to the body (Hutchinson Guest 2005: 199–217).
This means that an arm gesture forward in Labanotation is on the level of the shoulders,
whereas in motif writing it will be on the level of the pelvis (LMS uses the center of weight
as the reference, see space). Labanotation is used mostly for an exact reconstruction of
movement events. It is possible to add other categories (effort and shape) for yet a fuller
score, but this is rarely done, since the two Laban-based analyses are separate trainings.
For analysis of everyday movement, any of the above mentioned forms of documenta-
tion can be used. Coding sheets are the easiest way to train observers who have a basic knowledge of movement observation, since they can be used independently of the notation. Of the
three forms of notation, phrase writing is the easiest to learn and observe, since there
will only be one category to focus on. If one is notating movement which goes along
with a conversation, where the categories which are in the foreground can change within
microseconds, then motif writing would be the best choice. When there are only gestures
of the arms and hands, then it is easier to write a motif of the movement than to use the
complete body staff of Labanotation.
4. Meaning making/interpretation
The words which Laban used to describe the concepts are (unfortunately) already
loaded with meaning. In the Laban community, we debate what would most accurately
describe the phenomenon. Actually, there are many words which could work; it all de-
pends on the synthesis of the different elements within a particular context. The sym-
bols are more neutral in their meaning. This helps, when the goal is to document the
movements on a descriptive level.
When the observer wants to make meaning and starts to interpret, then he will do this
either with his own body knowledge and all his body prejudices, or he will have to find or
develop an interpretative framework (Moore and Yamamoto 1988). In either case, the
aim is to find interpretive statements which are supported by the descriptive data.
Our body prejudices become very clear when we are observing movements of other cultures. We might interpret them with our own knowledge and then possibly find out that this was not the intended meaning. But even within our own culture, we sometimes
find out that our initial interpretation was not correct. Still it is an important step in the
observation process to accept that we have first impressions loaded with our prejudices
and interpretations. We can then go through systematic and focused observation with
the Laban based concepts. In the end, we can judge if our first impressions were correct.
An example of an interpretive framework is the “Movement Pattern Analysis” for a decision-making process, which Warren Lamb developed to interpret the posture-gesture-merger patterns of effort and shape he observed in conversation (Moore 2005: 43). He correlates effort with assertion and shape with gaining a perspective in the three phases of decision making: attending, intending and committing.
5. Conclusions
Laban based movement observation presents certain difficulties. Some concepts do not become clear just from their association with the word. Movement experience and observation practice with a teacher are prerequisites for a reliable observation of the parameters. Training is also required to attain the necessary knowledge of the symbols and syntax for the notation. There are different trainings for different parts of Laban based analysis and notation systems. Laban’s notation takes the point of view of the mover, which is slightly awkward for the observer, since right and left are reversed from the way they are observed. But it forces the observer to try to see the world through the mover’s eyes!
Laban based movement observation also has various advantages. With the six cate-
gories of LMS and approximately 60 parameters, one is able to observe movement in its
complexity. The observer can choose to try to be as complete as possible or only
observe the salient parameters. He may choose a quantitative or a qualitative analysis –
combinations are also workable. It is possible to observe from a macro level to a micro
level, depending on the context and the possibility of repetition. Depending on what is
important to document, different methods of notation can be chosen. The symbols have
the advantage that they are quicker to write and can show combinations in one symbol.
Compared to written speech, the notation has the advantage that it can show move-
ments which are happening simultaneously in close proximity and in a pictorial, non-
linear way. The notation gives the scholar a basis not only to document movement,
but to also reflect on movement in a different way. For those who are conversant
with the notation, it also enables communication about movement without spoken or
written language as a mediator, so it is truly intercultural like music notation.
Over the 90 years in which Laban based analysis has been developed by many people besides Laban himself, it has come to match the intricacy of human body movement through its complexity and systematic approach. I hope to have shown here that Laban based analysis, originally developed for dance, can also serve as an excellent tool for observing body movement in human communication.
6. References
Bartenieff, Irmgard and Dori Lewis 1981. Body Movement: Coping with the Environment. New York: Gordon and Breach. First published [1980].
Hackney, Peggy 1998. Making Connections – Total Body Integration through Bartenieff Funda-
mentals. London: Gordon and Breach.
Hutchinson Guest, Ann 1983. Your Move: A New Approach to the Study of Movement and Dance.
London: Gordon and Breach.
Hutchinson Guest, Ann 2005. Labanotation – the System of Analyzing and Recording Movement.
4th edition. London: Routledge.
Kennedy, Antja 2010. Bewegtes Wissen – Laban/Bartenieff-Bewegungsstudien verstehen und erle-
ben. Berlin: Logos.
Laban, Rudolf 1920. Welt des Tänzers. Stuttgart, Germany: Walter Seifert.
Laban, Rudolf 1926. Choreographie. Jena, Germany: Eugen Dietrich.
Laban, Rudolf 1956. Laban’s Principles of Dance and Movement Notation. London: MacDonald and Evans.
Laban, Rudolf 1966. The Language of Movement – A Guidebook to Choreutics. Boston: Plays Publishers.
Laban, Rudolf 1975. The Mastery of Movement. 4th edition. Boston: Plays Publishers. First published [1950].
Lamb, Warren 1965. Posture and Gesture. London: Trinity.
Lewis, Penny and Susan Loman 1982. The Kestenberg Movement Profile – Its Past Applications
and Future Directions. Keene: Antioch New England Graduate School.
Moore, Carol-Lynne 2005. Movement and Making Decisions – the Body-Mind Connection in the
Workplace. New York: Dance and Movement Press.
Moore, Carol-Lynne and Kaoru Yamamoto 1988. Beyond Words – Movement Observation and
Analysis. New York: Gordon and Breach.
Preston-Dunlop, Valerie 1984. Points of Departure: The Dancer’s Space. London: Lime Tree Studios.
Abstract
The present chapter provides an overview of the Kestenberg Movement Profile (KMP), a
full body assessment instrument of dynamic movement and meaning. The Kestenberg
Movement Profile is an observational movement analysis tool that describes body move-
ment patterns across nine different categories. It is employed in clinical fields such as
dance/movement and creative-arts therapies, in developmental, clinical, social, and health
psychology, psychiatry, as well as in embodied cognition research. The theory links these
movement patterns to psychological needs, affect, temperament, learning styles, defense mechanisms, self- and other-related feelings, and simple and complex relations, interlacing movement, developmental, psychobiological, and clinical perspectives. Starting with a
history of the Kestenberg Movement Profile, the chapter provides summaries of the
method, clinical use and empirical research applications of the Kestenberg Movement
Profile. Psychometric qualities, related applications, and links to the cognitive sciences
and embodiment research are described. Kestenberg Movement Analysis has developed
clear propositions regarding how body movement patterns are related to meaning thereby
contributing to understanding the essence, function and psychological correlates of
dynamic movement.
In addition to the Kestenberg Movement Profile, Laban’s school of thought has other
derivatives, including Laban Movement Analysis (LMA), Movement Pattern Analysis
(MPA), the Action Profile® (AP), and the Movement Psychodiagnostic Inventory
(MPI). Some of the many important pioneers contributing to the Laban tradition are
Irmgard Bartenieff (1900–1981), Warren Lamb, Pamela Ramsden, Marion North, and
Martha Davis (Bartenieff and Lewis 1980; Davis 1997, 1981; Davis et al. 2007; Kesten-
berg 1975; Kestenberg and Sossin 1979; Kestenberg Amighi et al. 1999; Lamb 1965;
Lamb and Watson 1987; North 1972).
The Kestenberg Movement Profile has evolved during more than 40 years of observation (Kestenberg 1975; Kestenberg
et al. 1971; Kestenberg and Sossin 1979; Kestenberg Amighi et al. 1999; Loman and
Foley 1996; Loman and Sossin 2009; Sossin 2002, 2007). Kestenberg linked the domi-
nance of specific movement patterns with particular developmental phases and psycho-
logical functions. Movement observations complement Kestenberg’s (1975, 1976, 1980)
investigations of multiple aspects of development, including gender, pregnancy and
maternal feelings, trauma, and obsessive-compulsive disorder, with a distinct focus on
the primary prevention of emotional disorders.
Movement patterns in the womb have been considered from a Kestenberg Move-
ment Profile perspective (Kestenberg 1980, 1987; Loman 1994, 2007), describing prena-
tal attunement and continuities in rhythmicities from prenatal to postnatal stages.
Historically, profiles of infants and parents were compared with each other to yield
information about areas of interpersonal conflict and harmony.
Fig. 61.1: Overview of the Kestenberg Movement Profile. One of the nine diagrams, Shape-Flow Design, is not included, because so far it has not led to sufficient reliability between observers. The developmental sequence from top to bottom of all profiles indicates a progression from early to more mature movement patterns; within the single profiles, the single rows proceed from the first to the third year. (Diagram labels include, for example, running/drifting (u) vs. starting/stopping (us), low vs. high, and gradual vs. abrupt.)
Ten basic rhythm patterns are identified, each corresponding in pairs to five major developmental phases: oral, anal, urethral, inner-genital and outer-genital (Kestenberg 1975), each with an “indulging” or “libidinal” pole characterized by smooth reversals, and each with a “fighting” or “sadistic” pole characterized by sharp reversals.
Fig. 61.2: Tension flow rhythms overview (adapted from Kestenberg Amighi et al. 1999)
Whereas all other movement categories/clusters can be readily “unwed” from psychoanalytic theory, this applies less to rhythms. The ten basic rhythms and their corresponding abbreviations (Kestenberg Amighi et al. 1999) are sucking (o), snapping/biting (os),
twisting (a), strain/release (as), running/drifting (u), starting/stopping (us), swaying
(ig), surging/birthing (igs), jumping (og), and spurting/ramming (ogs). Tension-flow
notation and scoring are exemplified in Fig. 61.3. At the height of each developmental
phase, we expect to see a notable increase in the proportion of rhythms typical for that
phase. All body parts can show all rhythms, and all rhythmic patterns are evident (to
greater or lesser extents) at all phases. Frequency distributions appear to reflect consis-
tent individual differences. In addition to the ten basic rhythms, there is a great variety of
“mixed rhythms,” combinations of two or more rhythms. Combinations of mixed fight-
ing rhythms signal a potential for immediate aggression. Individual preferences for
specific rhythms indicate preferred methods of drive discharge.
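To keep the ten rhythms, their abbreviations, and their phase pairings straight, they can be collected in a small lookup structure. The following Python sketch is illustrative only (it is not part of the KMP itself); it simply encodes the list given above, with the “s”-suffixed member of each pair as the sharp-reversal (“fighting”) pole and the other as the smooth-reversal (“indulging”) pole.

# Illustrative lookup of the ten basic tension-flow rhythms listed above,
# grouped by developmental phase (not an official KMP artifact).
BASIC_RHYTHMS = {
    "oral":          {"indulging": ("o",  "sucking"),          "fighting": ("os",  "snapping/biting")},
    "anal":          {"indulging": ("a",  "twisting"),         "fighting": ("as",  "strain/release")},
    "urethral":      {"indulging": ("u",  "running/drifting"), "fighting": ("us",  "starting/stopping")},
    "inner-genital": {"indulging": ("ig", "swaying"),          "fighting": ("igs", "surging/birthing")},
    "outer-genital": {"indulging": ("og", "jumping"),          "fighting": ("ogs", "spurting/ramming")},
}

# Look up a rhythm name by its abbreviation, e.g. "us" -> "starting/stopping".
abbrev_to_name = {abbr: name
                  for poles in BASIC_RHYTHMS.values()
                  for abbr, name in poles.values()}
print(abbrev_to_name["us"])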
Fig. 61.3: Tension flow rhythms notation: an example from a 2½-month-old boy.
Precursors of effort are both body-oriented, in terms of bound and free tension-flow
alternations, and reality-oriented, in terms of space, weight and time; hence, they are
intermediary patterns, between tension-flow and effort.
2.1.4. Effort
Effort patterns are motor components of coping with external reality in terms of space,
weight and time (Laban and Lawrence 1947). In space, direct and indirect are distin-
guished; in weight, strength and lightness; and in time, acceleration and deceleration.
Direct, strength and acceleration are fighting effort elements, while indirect, light and
deceleration are more accommodating ways of dealing with space, weight and time.
Effort elements are developmentally linked (as per consonance) to specific precursors
of effort and, even further, to specific tension-flow attribute patterns. The individual’s
mature constellation of effort elements shows, in relation to the polarities identified
above, preferences in terms of attention, intention and decision-making.
2.2.2. Attunement
Tension-flow and shape-flow are fundamental in the experience and expression of
affect. Bound flow corresponds to inhibition and discontinuity whereas free-flow corre-
sponds to facilitation of impulses and continuity. Attunement in tension-flow (sharing of
feelings) appears to be a key manifestation of empathy between individuals, such as
caregiver and child (Kestenberg 1985a). Higher attunement is deduced from higher concordance of the tension-flow attribute diagrams between two individuals (Loman and Sossin 2009). This can be seen more directly in temporal coding that bears on interpersonal contingencies. Dyadic up-regulation and down-regulation are related
to such contingencies. Partial attunement rather than complete attunement is seen as
being helpful to the parent-infant relation, serving individuation.
The load factor (LF) is a statistic that applies to all categories of movement except
Tension Flow Rhythms, reflecting the complexity of movements in each subsystem by
indicating how many elements are, on average, included in an action. The range of
the load factor is between one (33% load factor) and three (99% load factor) elements
per action. Gestures and postures of the same cluster can have very distinct load factors.
The gain-expense ratio (GE) compares the number of movement elements (gain) per
subsystem to the number of movement flow factors (expense). The gain-expense ratio is
interpreted in relation to other subsystems, and indicates the relative degree of affective
control (non-flow movement patterns) vs. affective spontaneity (flow patterns) in each
domain. This affective component is further broken down into a ratio of free flow (ease) to bound flow (restraint), or a ratio of growing (comfort) to shrinking (discomfort), in System I or System II, respectively.
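As a purely illustrative sketch of the two statistics just described – this is a toy formulation for the reader, not an official KMP scoring routine – the load factor and gain-expense ratio could be computed as follows, assuming each scored action is recorded simply as the number of elements it contains and that flow factors are tallied separately.

# Toy illustration of the load factor (LF) and gain-expense ratio (GE) for one
# movement subsystem, under the simplifying assumptions stated above.

def load_factor(elements_per_action):
    """Mean number of elements per action (1-3), expressed as a proportion of
    the maximum of three (so one element is roughly a 33% load factor and
    three elements roughly a 99-100% load factor, as in the text)."""
    if not elements_per_action:
        raise ValueError("need at least one scored action")
    mean_elements = sum(elements_per_action) / len(elements_per_action)
    return mean_elements / 3

def gain_expense_ratio(n_movement_elements, n_flow_factors):
    """Movement elements (gain) divided by flow factors (expense). Higher
    values suggest relatively more affective control, lower values relatively
    more affective spontaneity (flow)."""
    if n_flow_factors == 0:
        return float("inf")
    return n_movement_elements / n_flow_factors

# Ten hypothetical scored actions in one subsystem, and twelve flow factors:
actions = [1, 2, 2, 3, 1, 2, 3, 2, 1, 2]
print(f"LF = {load_factor(actions):.0%}")
print(f"GE = {gain_expense_ratio(sum(actions), 12):.2f}")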
Work with the Kestenberg Movement Profile falls broadly into three areas:
(i) theory and method development (Kestenberg 1965a, 1965b, 1967, 1975, 1995;
Kestenberg and Borowitz 1990; Kestenberg and Sossin 1979),
(ii) applications to fields of practice (e.g., Birklein 2005; Hastie 2006; Kestenberg
Amighi 2007; La Barre 2001; Lewis 1990, 1999; Loman 1998; Loman and Foley
1996; Lotan and Yirmiya 2002; Loman and Sossin 2009; Sossin 1999), and
(iii) establishment of psychometric quality (e.g., Koch, Cruz, and Goodill 2002; Koch
2006a; Koch 2007a; Sossin 1987).
Kestenberg (1975) reported success in initial efforts to validate the Kestenberg Move-
ment Profile system using the external criterion of the diagnoses by Anna Freud in the
1960s. Kestenberg – blind to Anna Freud’s assessment – diagnosed the same children as Freud on the basis of the movement profile and then compared her diagnoses to psychoan-
alytic diagnoses employing the Diagnostic Profile. Such anecdotal reports laid a basis
for further validational work (e.g., Koch 2007a).
Whereas each study employing the Kestenberg Movement Profile offers further steps
toward validation (e.g., Birklein 2005; Birklein and Sossin 2006; Bridges 1989; Loman
1995, 2005), a more systematic approach to investigating the validity of the KMP has
been conducted by Koch (2007a, 2011) across a series of experimental studies. In this
work, single parameters from the Kestenberg Movement Profile have been selected in an
attempt to crystallize basic dimensions of movement and to validate them step by step.
However, information resulting from these experiments mainly concerns the validity of sin-
gle Kestenberg Movement Profile components, not combinations, sequences or complex in-
teractions of movement parameters. To start with a simple yet important set of movement
parameters, Koch first tested the basic dimensions of system I (tension-flow-effort system;
indulgent vs. fighting movement), and of system II (shape-flow-shaping system; open vs.
closed movement), and then the combination of both (interactions) experimentally.
In terms of economy, Bräuninger and Züger (2007) have suggested an abbreviated observational version, and Koch has created a questionnaire format with 113 items (Koch
1999) from the interpretive categories in Kestenberg Amighi et al. (1999). A German version of the questionnaire was highly consistent (N = 80; all alphas > .80), and the items had high discriminative power; two exceptions were removed. On the basis of this questionnaire, Koch and Müller (2007) developed the brief Kestenberg Movement Profile-based affect scale (Koch and Müller 2007; Fig. 61.4). This scale is suited for experimental measures of movement-related affect as well as for evaluation designs. It includes System I (items in standard type) and System II movement patterns (items in italics).
relaxed 1 2 3 4 5 6 7 tense
loaden, fighting 1 2 3 4 5 6 7 joyful, excited
(aimless) drifting 1 2 3 4 5 6 7 impatient, driven
comfortable 1 2 3 4 5 6 7 uncomfortable
indulging 1 2 3 4 5 6 7 distancing
holding on, retentive 1 2 3 4 5 6 7 playful, coy
yielding 1 2 3 4 5 6 7 fighting
letting go 1 2 3 4 5 6 7 nervous
open 1 2 3 4 5 6 7 closed
resenting 1 2 3 4 5 6 7 taking in
approaching, curious 1 2 3 4 5 6 7 avoiding, refrain from
inclined toward 1 2 3 4 5 6 7 disinclined
peaceful 1 2 3 4 5 6 7 aggressive
Fig. 61.4: English version of the brief Kestenberg Movement Profile-based affect scale (13 items, Koch and Müller 2007)
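The internal-consistency values mentioned above (all alphas > .80) refer to Cronbach’s alpha. The following sketch, using NumPy and entirely made-up ratings, only illustrates how such a coefficient is computed from an items-by-respondents matrix; it is not taken from the cited studies.

# Illustration of Cronbach's alpha for a set of questionnaire items.
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = respondents, columns = items (e.g. 1-7 ratings)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the sum scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Four hypothetical respondents rating three items on a 7-point scale:
print(round(cronbach_alpha([[3, 4, 3], [5, 5, 6], [2, 3, 2], [6, 6, 7]]), 2))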
Experimental studies have demonstrated effects of movement qualities (System I) as well as movement shapes (System II) on affect, attitudes, and cognition. Kinesthetic feedback from movement qualities (rhythms and strong versus light efforts) has been shown to operate online, i.e., in the situation (Suitner et al. 2011), as well as offline, i.e., from memory (Koch, Hentz, and Kasper 2011). Research on directional movement and meaning has drawn on Kestenberg Movement Profile dimensions in the context of spatial bias research (Koch, Glawe, and Holt 2011). The Kestenberg Movement Profile has been employed as a theory to derive hy-
potheses on gaze behavior among men and women in work teams (Koch et al. 2010),
resulting in the finding that women on average distribute their gaze in a more egalitar-
ian way across all team members (indirect gaze), whereas men on average gazed more
dyadically (direct gaze), paying attention to fewer team members.
Kestenberg Movement Profile research has also examined sequential movement pro-
cesses looking at verbal-nonverbal parallel processes, indications of defensive employ-
ments (Koch 2007b), and manners of maternal communications of depression in
mother-child interaction (Reale 2011). Such investigations are in greater agreement with systems models of change processes (Fogel et al. 2009), e.g. employing split-screen methods of microanalysis (Beebe et al. 2010) and applying the Kestenberg Movement Profile as a tool in time-series analyses. Systems-framed studies of transmission and sequential process suggest that combinatory movement patterns may be more robust factors than singular patterns.
5. Summary
The chapter provided a comprehensive overview of the Kestenberg Movement Profile as an observational movement analysis system and as an accompanying theoretical system pertaining to movement patterns and their meaning. It introduced its history, theory, and method, and highlighted its development, validation, and present use in clinical, developmental, and cognitive sciences contexts. Further information can be obtained on the Kestenberg
Movement Profile website (www.kestenbergmovementprofile.org) and from the authors.
6. References
Bartenieff, Irmgard and Dori Lewis 1980. Body Movement: Coping with the Environment. New
York: Gordon and Breach.
Beebe, Beatrice, Joseph Jaffe, Sara Markese, Karen Buck, Henian Chen, Patricia Cohen, Lorraine
Bahrick, Howard Andrews and Stanley Feldstein 2010. The origins of 12-month attachment: A
microanalysis of 4-month mother-infant interaction. Attachment and Human Development 12(1):
3–141.
Bender, Susanne 2007. Einführung in das Kestenberg Movement Profile (KMP). In: Sabine C.
Koch and Susanne Bender (eds.), Movement Analysis – Bewegungsanalyse: The Legacy of
Laban, Bartenieff, Lamb and Kestenberg, 53–64. Berlin: Logos.
Birklein, Silvia B. 2005. Nonverbal indices of stress in parent-child interaction. Ph.D. dissertation,
Department of Psychology, New School for Social Research. New York, NY.
Birklein, Silvia B. and K. Mark Sossin 2006. Nonverbal indices of stress in parent-child dyads: Impli-
cations for individual and interpersonal affect regulation and intergenerational transmission.
In: Sabine C. Koch and Iris Bräuninger (eds.), Advances in Dance/Movement Therapy, 128–141.
Berlin: Logos.
Bräuninger, Iris and Brigitte Züger 2007. Filmbasierte Bewegungsanalyse zur Behandlungsevalua-
tion von Tanz- und Bewegungstherapie. In: Sabine C. Koch and Susanne Bender (eds.), Move-
ment Analysis. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 213–223. Berlin: Logos.
Bridges, Laurel 1989. Measuring the effect of dance/movement therapy on the body image of in-
stitutionalized elderly using the Kestenberg Movement Profile and projective drawings. Un-
published master’s thesis, Antioch Graduate School, Keene, NH.
Davis, Martha 1981. Movement characteristics of hospitalized psychiatric patients. American Journal of Dance Therapy 4: 52–84.
Davis, Martha 1997. Guide to movement analysis methods, part 2: Movement Psychodiagnostic
Inventory. Available from the author at [email protected].
Davis, Martha, Hedda Lausberg, Robyn Flaum Cruz, Miriam Roskin Berger and Dianne Dulicai 2007.
The Movement Psychodiagnostic Inventory (MPI). In: Sabine Koch and Susanne Bender (eds.),
Movement Analysis. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 119–130. Berlin: Logos.
Eberhard-Kaechele, Marianne 2007. The regulation of interpersonal relationships by means of
shape flow: A psychoeducational intervention for traumatised individuals. In: Sabine Koch
and Susanne Bender (eds.), Movement Analysis. The Legacy of Laban, Bartenieff, Lamb and
Kestenberg, 65–86. Berlin: Logos.
Fogel, Alan, Andrea Garvey, Hui-Chin Hsu and Delisa West-Stroming 2009. Change Processes in
Relationships: A Relational-Historical Research Approach. 2nd edition. Cambridge: Cambridge
University Press.
Frank, Ruella and Francis La Barre 2011. Movement, Development, and Psychotherapeutic
Change. New York: Routledge.
Freud, Anna 1965. Normality and Pathology in Childhood: Assessment of Developments. London:
Karnac.
Hastie, Suzanne 2006. The Kestenberg Movement Profile. In: Stephanie L. Brooke (ed.), Creative
Arts Therapies Manual, 121–132. Springfield, IL: Charles C. Thomas.
Kestenberg, Judith S. 1946. Early fears and early defences: Selected problems. Nervous Child 5:
56–70.
Kestenberg, Judith S. 1954. The history of an “autistic child”: Clinical data and interpretation.
Journal of Child Psychiatry 2: 5–52.
Kestenberg, Judith S. 1965a. The role of movement patterns in development: 1. Rhythms of move-
ment. Psychoanalytic Quarterly 34: 1–36.
Kestenberg, Judith S. 1965b. The role of movement patterns in development: 2. Flow of tension
and effort. Psychoanalytic Quarterly 34: 517–563.
Kestenberg, Judith S. 1967. The role of movement patterns in development: 3. The control of
shape. Psychoanalytic Quarterly 36: 356–409.
Kestenberg, Judith S. 1975. Children and Parents: Psychoanalytic Studies in Development. New
York: Jason Aronson.
Kestenberg, Judith S. 1995. Sexuality, Body Movement and Rhythms of Development. Northvale,
NJ: Jason Aronson. First published [1975].
Kestenberg, Judith S. 1976. Regression and reintegration in pregnancy. Journal of the American
Psychoanalytic Association 24: 213–250.
Kestenberg, Judith S. 1977a. Prevention, infant therapy and the treatment of adults, 1: Toward understanding mutuality. International Journal of Psychoanalytic Psychotherapy 6: 338–367.
Kestenberg, Judith S. 1977b. Prevention, infant therapy and the treatment of adults, 2.: Mutual hold-
ing and holding oneself up. International Journal of Psychoanalytic Psychotherapy 6: 369–396.
Kestenberg, Judith S. 1980. The three faces of femininity. Psychoanalytic Review 67: 313–335.
Kestenberg, Judith S. 1985a. The role of movement patterns in diagnosis and prevention. In:
Donald A. Shaskan, William L. Roller and Paul Schilder (eds.), Mind Explorer, 97–160. New
York: Human Sciences Press.
Kestenberg, Judith 1985b. The flow of empathy and trust between mother and child. In: Elwyn J.
Anthony and George H. Pollock (eds.), Parental influences in health and disease, 137–163. Boston:
Little, Brown.
Kestenberg, Judith S. 1987. Imagining and remembering. Israeli Journal of Psychiatry and Related
Sciences 24: 229–241.
Kestenberg, Judith S. and Esther Borowitz 1990. On narcissism and masochism in the fetus and
the neonate. Pre- and Perinatal Psychology Journal 5: 87–94.
Kestenberg, Judith S. and Arnhilt Buelte 1977. Prevention, infant therapy and the treatment of
adults 1. Towards understanding mutuality, 2. Mutual holding and holding oneself up. Interna-
tional Journal of Psychoanalytic Psychotherapy 6: 39–396.
Kestenberg, Judith S., Marcus Hershey, Esther Robbins, Jay Berlowe and Arnhilt Buelte 1971.
Development of the young child as expressed through bodily movement. Journal of the Amer-
ican Psychoanalytic Association 19: 746–764.
Kestenberg, Judith S. and K. Mark Sossin 1979. The Role of Movement Patterns in Development,
Vol. 2. New York: Dance Notation Bureau Press.
Kestenberg Amighi, Janet 2007. Kestenberg Movement Profile perspectives on posited Native
American learning style preferences. In: Sabine Koch and Susanne Bender (eds.), Movement
Analysis. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 175–186. Berlin: Logos.
Kestenberg Amighi, Janet, Susan Loman, Penny Lewis and K. Mark Sossin 1999. The Meaning of
Movement: Development and Clinical Perspectives of the Kestenberg Movement Profile. New
York: Brunner-Routledge.
Koch, Sabine C. 1999. The Kestenberg Movement Profile. Reliability of Novice Raters. Stuttgart,
Germany: Ibidem.
Koch, Sabine C. 2006a. Gender at work: Differences in use of rhythms, efforts, and pre-efforts. In:
Sabine C. Koch and Iris Bräuninger (eds.), Advances in Dance/Movement Therapy. Theoretical
Perspectives and Empirical Findings, 116–127. Berlin: Logos.
Koch, Sabine C. 2006b. Interdisciplinary embodiment approaches. Implications for creative arts
therapies. In: Sabine C. Koch and Iris Bräuninger (eds.), Advances in Dance/Movement Ther-
apy. Theoretical Perspectives and Empirical Findings, 17–28. Berlin: Logos.
Koch, Sabine C. 2007a. Basic principles of movement analysis. Steps toward validation of the
KMP. In: Sabine C. Koch and Susanne Bender (eds.), Movement Analysis. The Legacy of
Laban, Bartenieff, Lamb and Kestenberg, 235–248. Berlin: Logos.
Koch, Sabine C. 2007b. Defences in movement. Video analysis of group communication patterns.
Body, Movement and Dance in Psychotherapy 2: 29–45.
Koch, Sabine C. 2011. Basic body rhythms and embodied intercorporality: From individual to inter-
personal movement feedback. In: Wolfgang Tschacher and Claudia Bergomi (eds.), The Implica-
tions of Embodiment: Cognition and Communication, 151–171. Exeter, UK: Imprint Academic.
Koch, Sabine C., Christina Baehne, Friederike Zimmermann, Lenelis Kruse and Joerg Zumbach
2010. Visual dominance and visual egalitarianism. Individual and group-level influences of sex
and status in group interactions. Journal of Nonverbal Behavior 34(3): 137–153.
Koch, Sabine C., Robyn Cruz and Sharon W. Goodill 2002. The Kestenberg Movement Profile
(KMP): Reliability of novice raters. American Journal of Dance Therapy 23(2): 71–88.
Koch, Sabine C. and Thomas Fuchs 2011. Embodied arts therapies. The Arts in Psychotherapy 38:
276–280.
Koch, Sabine C., Stefanie Glawe and Daniel Holt 2011. Up and down, front and back: Movement
and meaning in the vertical and sagittal axis. Social Psychology 42(3): 159–164.
Koch, Sabine C., Eva Hentz and Detlef Kasper 2011. The influence of movement qualities on
affect and memory. Unpublished manuscript.
Koch, Sabine C., Katharina Morlinghaus and Thomas Fuchs 2007. The joy dance. Effects of a sin-
gle dance intervention on patients with depression. The Arts in Psychotherapy 34: 340–349.
Koch, Sabine C. and Stephanie M. Müller 2007. The KMP-questionnaire and the brief KMP-based
affect scale. In: Sabine C. Koch and Susanne Bender (eds.), Movement Analysis – Bewegungsana-
lyse. The Legacy of Laban, Bartenieff, Lamb and Kestenberg, 195–202. Berlin: Logos.
Køppe, Simo, Susanne Harder and Mette Vaever 2008. Vitality affects. International Forum of Psy-
choanalysis 17(3): 160–179.
Laban, Rudolf von 1960. The Mastery of Movement. London: MacDonald and Evans.
Laban, Rudolf von and F. Lawrence 1974. Effort: Economy in Body Movement. Boston, MA:
Plays. First published [1947].
La Barre, Francis 2001. On Moving and Being Moved. Hillsdale, NJ: Analytic Press.
La Barre, Francis 2005. The kinetic transference and countertransference. Contemporary Psycho-
analysis 41: 249–279.
Lamb, Warren 1965. Posture and Gesture: An Introduction to the Study of Physical Behavior.
London: Duckworth.
Lamb, Warren and Elizabeth M. Watson 1987. Body Code: The Meaning in Movement. London:
Routledge and Kegan Paul. First published [1979].
Lewis, Penny 1990. The KMP in the psychotherapeutic process with borderline disorders. In:
Penny Lewis and Susan Loman (eds.), The Kestenberg Movement Profile: Its Past, Present Ap-
plications and Future Directions, 65–84. Keene, NH: Antioch New England Graduate School.
Lewis, Penny 1999. Healing early child abuse. The application of the KMP and its concepts. In:
Judith Kestenberg Amighi, Susan Loman, Penny Lewis and K. Mark Sossin (eds.), The Mean-
ing of Movement: Development and Clinical Perspectives of the Kestenberg Movement Profile,
235–248. New York: Brunner-Routledge.
Lewis, Penny and Susan Loman (eds.) 1990. The Kestenberg Movement Profile: Its Past, Present
Applications and Future Directions. Keene, NH: Antioch New England Graduate School.
Loman, Susan 1994. Attuning to the fetus and the young child: Approaches from dance/movement
therapy. Zero to Three: Bulletin of National Center for Clinical Infant Programs 15(1): 20–26.
Loman, Susan 1995. The case of Warren: A KMP approach to autism. In: Fran J. Levy (ed.), Dance
and Other Expressive Art Therapies, 213–224. New York: Routledge.
Loman, Susan 1998. Employing a developmental model of movement patterns in Dance/movement
therapy with young children and their families. American Journal of Dance Therapy 20: 101–115.
Loman, Susan 2005. Dance/Movement Therapy. In: Cathy Malchiodi (ed.), Expressive Therapies,
68–89. New York: Guilford Press.
Loman, Susan 2007. The KMP and pregnancy: Developing early empathy through notating fetal
movement. In: Sabine Koch and Susanne Bender (eds.), Movement Analysis. The Legacy of
Laban, Bartenieff, Lamb and Kestenberg, 187–194. Berlin: Logos.
Loman, Susan and F. Foley 1996. Models for understanding the nonverbal process in relationships.
The Arts in Psychotherapy 23: 341–350.
Loman, Susan and K. Mark Sossin 2009. Current clinical applications of the Kestenberg Move-
ment Profile. In: Sharon Chaiklin and Hilda Wengrower (eds.), Life Is Dance: The Art and
Science of DMT, 237–264. New York: Routledge.
Lotan, Nava and Nurit Yirmiya 2002. Body movement, presence of parents and the process of fall-
ing asleep in toddlers. International Journal of Behavioral Development 26: 81–88.
Merleau-Ponty, Maurice 1962. The Phenomenology of Perception. London: Routledge.
Moore, Carol-Lynne and Kaoru Yamamoto 2011. Beyond Words: Movement Observation and
Analysis. 2nd edition. New York: Routledge.
Niedenthal, Paula, Laurence W. Barsalou, Piotr Winkielman, Silvia Kraut-Gruber and Francois
Ric 2005. Embodiment in attitudes, social perception, and emotion. Personality and Social Psy-
chology Review 9: 184–211.
North, Marion 1972. Personality Assessment through Movement. London: Macdonald and Evans.
Ramsden, Pamela 2007. Moments of wholeness: How awareness of action profile® integrated
movement and related modes of thinking can enhance action. In: Sabine C. Koch and Susanne
Bender (eds.), Movement Analysis – Bewegungsanalyse. The Legacy of Laban, Bartenieff, Lamb
and Kestenberg 29–40. Berlin: Logos.
Reale, Amy E. 2011. Maternal facial shape flow patterns in mother-infant interaction correspon-
dent to maternal self-criticism and dependency: Application and utilization of the Kestenberg
Movement Profile (KMP) in a microanalysis of mother-infant interactions. Psy.D. Dissertation,
Department of Psychology, Pace University, New York.
Rizzolatti, Giacomo and Corrado Sinigaglia 2008. Mirrors in the Brain: How Our Minds Share Ac-
tions and Emotions. New York: Oxford University Press.
Schilder, Paul 1935. The Image and Appearance of the Human Body: Studies in the Constructive
Energies of the Psyche. London: Kegan Paul, French and Trubner.
Shai, Dana and Jay Belsky 2011. When words just won’t do: Introducing parental embodied men-
talizing. Child Development Perspectives 5(3): 173–180.
Shaw, Jocelyn, K. Mark Sossin and Stephen Salbod 2010. Shape-Flow in Embodied Parent-Child
Affect Regulation: Heightened Emotional Expression in Dyadic Interaction. World Association
for Infant Mental Health, 12th World Congress: Infancy in Times of Transition. July 1, 2010.
Leipzig, Germany.
Sossin, K. Mark 1987. Reliability of the Kestenberg Movement Profile. Movement Studies: Observer
Agreement 2: 23–28. New York: Laban/Bartenieff Institute of Movement Studies.
Sossin, K. Mark 1990. Metapsychological considerations of the psychologies incorporated in the
KMP System. In: Penny Lewis and Susan Loman (eds.), The Kestenberg Movement Profile: Its
Past, Present Applications and Future Directions. 101–113. Keene, NH: Antioch New England.
Sossin, K. Mark 1999. Interpretation of an adult profile: Observations in a parent-child setting. In:
Janet Kestenberg Amighi, Susan Loman, Penny Lewis and K. Mark. Sossin (eds.), The Mean-
ing of Movement: Developmental and Clinical Perspectives of the Kestenberg Movement Profile.
265–290. Amsterdam: Gordon and Breach.
Sossin, K. Mark 2002. Interactive movement patterns as ports of entry in infant-parent psychother-
apy. Journal of Infant, Child and Adolescent Psychotherapy 2: 97–131.
Sossin, K. Mark 2007. History and future of the Kestenberg Movement Profile. In: Sabine C. Koch
and Susanne Bender (eds.), Movement Analysis: Bewegungsanalyse, 103–118. Berlin: Logos.
Sossin, K. Mark and Silvia Birklein 2006. Nonverbal transmission of stress between parent and
young child: Considerations and psychotherapeutic implications of a study of affective move-
ment patterns. Journal of Infant, Child, and Adolescent Psychotherapy 5: 46–69.
Suitner, Caterina, Sabine C. Koch, Katharina Bachleitner and Anne Maass 2011. Dynamic
embodiment and its functional role: A body feedback perspective. In: Sabine C. Koch, Thomas
Fuchs, Michela Summa and Cornelia Müller (eds.), Body Memory, Metaphor and Movement,
155–170. Amsterdam: John Benjamins.
Winter, Deborah Du Nann, Carla Widell, Gail Truitt and Jane George-Falvy 1989. Empirical stu-
dies of posture-gesture mergers. Journal of Nonverbal Behavior 13(4): 207–223.
Abstract
This chapter gives a brief and selective overview of how to approach research on bodily
and linguistic aspects of communication by doing fieldwork. Here I do not address tech-
nological issues such as specific choices of video recording equipment or computer
programs for processing and analysis. Any such discussion would either be too pur-
pose-specific or would quickly become obsolete due to the fast-moving nature of the
technology. Our interest here is the general approach.
2. Fieldwork defined
What is fieldwork? One meaning of the term is “distant travel for research”. This simply
refers to the collecting of data somewhere away from the researcher’s usual place of
work, especially when this requires the researcher to travel and be absent from their
own home for some time. This is in line with a recent definition given by Majid: “Field-
work is the collection of primary data outside of the controlled environments of the lab-
oratory or library” (Majid 2012: 54). This notion of fieldwork does not distinguish
between qualitatively different modes of data collection. It may, for example, involve
a researcher from Germany traveling to a Namibian village to carry out a brief program
of controlled experiments on memory for body movements in relation to spatial cogni-
tion (see Haun and Rapold 2009). Or it may involve a researcher from the Netherlands
traveling to central Australia for an extended period of observation and video recording
of everyday interaction in Aboriginal communities, to understand how speech and
bodily behaviour are integrated in referring to space (see Wilkins 2003). If this “travel
for research” sense of fieldwork is taken to simply mean that the work is done outside of
the research laboratory, it may also refer to work that is done closer to home. We might
say, for instance, that we are doing fieldwork when we go downtown to take notes on
how people gesticulate in public places (see Efron 1941), or when we take a video
camera to a local mechanic shop to record workplace interaction in a rich artifactual
environment (see Goodwin 2000; Streeck 2009).
A second notion of fieldwork is something like “the recording of research-
independent events”. This refers to the gathering of data by recording events which
are taking place independently of the fact that the recordings are being made. This is
not to say that the act of recording an event has no impact on the nature of the
event. Of course people’s behaviour can be affected as a result of them knowing that
they are being observed. But the idea of fieldwork in this sense is that the events
that are being recorded would take place anyway, in roughly the same form, and in the same places and times, even if no research were being conducted on them at all. It
might involve recording people cooking, or having dinner, working in their fields or
workshops, staging performances or ritual activities. This is distinct from the collection
of data using experimental methods (whether one is in the field or not), where the to-
be-recorded events would not have happened at all were they not instigated by re-
searchers, motivated by research questions and methods.
There are at least two important reasons to do fieldwork in this second sense. First, it
can give the researcher access to phenomena that would otherwise be inaccessible, for
example because they are difficult or impossible to elicit experimentally. Second, and
relatedly, this kind of fieldwork provides a way to maximize ecological validity (though,
of course, at possible cost to experimental control). The second definition of fieldwork
almost entails the first, i.e., that the research be done “outside the lab” (unless of course
one is studying people’s communicative behaviour in research laboratories). But this is
not always the case. On the “recording research-independent events” definition of field-
work, we would have to include, for example, the recording of telephone conversations
or the gathering of data from web page commentaries.
In this chapter I want to discuss fieldwork in the sense of the overlap between the
two notions of the term discussed so far. Thus, we shall focus here on “travel away from the researcher’s home environment to record research-independent events”.
Many researchers of bodily behaviour have carried out fieldwork in more or less this
sense (see Enfield 2003; Haviland 2003; Kendon and Versante 2003; Kita and Essegbey
2001; Wilkins 2003, among many others). Seyfeddinipur (2012) provides a linguist-oriented overview of and guide to fieldwork on co-speech gesture. There are many useful references there, as well as detailed advice about the recording and annotation of data.
In what follows, I am going to restrict the discussion to this definition, which is narrower than what many people would include under fieldwork. In the rest of the chapter I want to discuss some practical points of relevance to carrying out fieldwork on bodily aspects of communication.
3. Equipment
For fieldwork on body, language, and communication, by far the most effective method
is to use video and sound recording technology. As Hanks (2009) points out, while it is
possible to collect data “on the fly” in the form of handwritten notes, this can only be
done if one has already built a sufficient background knowledge of the local cultural
and physical setting, and even then the data “is already a selective interpretation of
what the researcher perceives” (Hanks 2009: 19). Video recording of interaction
gives you the possibility of repeated inspection of the data. Also, with video-recorded
data you are able to provide others with direct evidence for your findings and analysis.
Use the highest quality equipment you possibly can. This will make a huge difference to the quality and longevity of your data. If you invest now in equipment that delivers the best quality possible, it will pay off in the long term. Any recording you make today will potentially be a source for your research for many years to come. Especially rele-
vant in research on gesture is your choice of equipment for video recording and digital
photography. When selecting equipment, you have to make many choices – what degree
of video resolution, what type of lens, which media format, etc. – and each of these
choices represents a potential weak link that may compromise any high-quality choice
you have made elsewhere. Even the best sound recording device cannot compensate for
the sound delivered through a poor quality microphone. Even the best camera cannot
compensate for the poor image delivered through a low quality lens. If the quality of
your recordings is compromised in any of these ways, you will be required to live
with the limitations of an inferior recording forever.
Quality is priority number one. Beyond this simple rule, it is not possible to give gen-
eral equipment recommendations. This is partly because your choice of equipment de-
pends on what you are trying to do, and partly because technology is changing so
quickly that a good choice today may be inferior or even obsolete tomorrow. In figuring
out what equipment to use, first you should specify exactly what your goals are – includ-
ing what kind of data you want to collect, and what you intend to do with the data after-
wards – then study the options, and consult as many colleagues as possible about their
experiences. Talk to them and pay attention to what they say. Our more experienced colleagues have paid for their lessons learnt, and we do well to benefit from their costly experience. Do not repeat others’ mistakes.
Even after the best preparation you should expect for things not to go the way you
have planned. In fieldwork, you must always remain flexible, and be willing to allow
your plans to change in an instant. You might be getting ready to make data recordings
with a certain goal in mind, yet somehow the circumstances change, and people start
doing something different from what they had led you to expect, or they become dis-
tracted from what you had hoped they would be doing. In such a case do not get fru-
strated or try to redirect things. Go with the flow. As in all empirical science,
serendipity is a source of new, unexpected insights. In the field, you are in a world
that belongs to other people. The dynamics of daily life can be like the changing cur-
rents of the surf: If a rip tide takes you, don’t fight it. There’s no point. It would only
weaken you and tire you out. Instead relax and see where the flow leads you. You
will soon be able to make your way back to safety.
The people you work with may have quite different sensibilities from you. For instance, while they may not care about being photographed
with their shirts off, they might be mortified if you were to publish a picture of them
exposing the soles of their feet. Be sensitive.
Let the camera roll even when nothing special is happening, and walk away. Come back
later and find out what you have captured.
Be prepared to catch something other than what you were after, or maybe even to
catch nothing at all. In one case, I recall, I set up a camera in a village household, and
soon after I left the scene, so did all the people who I had hoped to film. The result was
an hour of footage of an empty room. But the costs of these kinds of failures are neg-
ligible: a bit of time is lost, and a videocassette or small section of hard drive is filled up.
With digital media, now the norm, your only constraint is hard-disk space. You can eas-
ily ensure that you have ample space for many more recordings than you will actually
end up using.
in which case you should be careful to brace the camera and avoid camera shake as
far as possible.
Some final points concern the preparation for, and management of, your recordings.
When in the field, make sure you have your recording equipment fully ready at all
times. This means that whenever you have time, for example before you go to bed at
night, make all the necessary preparations in advance for your next set of recordings:
for example, fully charge all the necessary devices and batteries, pack your work bag
with all the things you need, such as extra blank videotapes, or formatted memory
cards, depending on the kind of equipment you are using. Your work bag is then
ready for you to grab at any moment. Check and re-check that everything is in working
order. And when you are actually making your recordings, you should not only check
and re-check, but re-re-check as well. Are your batteries charged? Are your lighting
and focus settings correct? Is your framing good? And particularly important is the
input of sound to your video recorder. An external microphone will usually be of better quality than the one built into your camera, and you will be able to place it closer to the action. However, it is easy to make errors with cable connections
and sound settings, and so it is crucial to check, re-check, and re-re-check that your
sound input during recording is working well. Use headphones to monitor the sound
input on the camera once you have begun any recording.
Whenever you make a recording, you should immediately note down the relevant metadata: the time and place of the recording, what activities are being recorded, who the people are, and any other possibly relevant information. You can easily and
quickly note these things at the time of recording, and if you don’t, you will find it dif-
ficult if not impossible to remember all the relevant details later. Lastly, backup your
data as soon as you can. And keep your backups in a different place from your original
data, especially when you are traveling.
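As a minimal sketch of the kind of metadata record meant here – the file format and field names are merely one possible convention, not a standard archiving schema – the details can be jotted into a small structured file immediately after each recording.

# One possible convention for a per-recording metadata note (illustrative only).
import json
from datetime import datetime

def save_metadata(path, place, activity, participants, notes=""):
    record = {
        "recorded_at": datetime.now().isoformat(timespec="minutes"),
        "place": place,
        "activity": activity,
        "participants": participants,   # names or anonymized codes
        "notes": notes,                 # anything else that may matter later
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    return record

# Hypothetical example:
save_metadata("rec_village_kitchen_01.json",
              place="village household, kitchen",
              activity="evening meal preparation",
              participants=["speaker A", "speaker B", "two children"])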
7. Closing remark
This chapter has offered some remarks concerning fieldwork on the body, language, and
communication. There are many significant and complex issues that I have not men-
tioned. If you are intending to do fieldwork, it is worth reading widely beforehand,
and drawing on others’ experiences as far as you can. But nothing is as valuable as
first-hand experience. If you want to do fieldwork, just do it. No matter how well-
prepared you are, you will make mistakes. Just make sure you learn from these mis-
takes. Don’t try to do everything in a single field trip. Take time in between, in order
to assess your experience and adjust your way of working. And above all, remain flex-
ible and good-humored. Rigidity and stress are both unhealthy and contagious: they
should be avoided at all costs.
Acknowledgements
Thank you to Nick Williams for helpful comments on an earlier draft, and to colleagues
in the Language and Cognition Department at the Max Planck Institute for Psycholin-
guistics in Nijmegen for their input during many discussions on the topics discussed here.
And thanks to Julija Baranova for expert assistance. This work is supported by the
European Research Council (Grant “Human Sociality and Systems of Language Use”).
8. References
Bowern, Claire 2008. Linguistic Fieldwork: A Practical Guide. New York: Palgrave Macmillan.
Crowley, Terry 2007. Field Linguistics: A Beginner’s Guide. Oxford: Oxford University Press.
Dixon, Robert Malcolm Ward 2010. Basic Linguistic Theory. Oxford: Oxford University Press.
Duranti, Alessandro (ed.) 1997. Linguistic Anthropology. Cambridge: Cambridge University Press.
Efron, David 1941. Gesture, Race, and Culture: A Tentative Study of Some of the Spatio-Temporal
and “Linguistic” Aspects of the Gestural Behavior of Eastern Jews and Southern Italians in New
York City, Living under Similar as Well as Different Environmental Conditions. The Hague:
Mouton.
Enfield, N. J. 2003. Demonstratives in space and interaction: Data from Lao speakers and implica-
tions for semantic analysis. Language 79(1): 82–117.
Gippert, Jost, Nikolaus P. Himmelmann and Ulrike Mosel (eds.) 2006. Essentials of Language
Documentation. Berlin: De Gruyter.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Hanks, William F. 2009. Fieldwork on deixis. Journal of Pragmatics 41: 10–24.
Haun, Daniel B. M. and Christian J. Rapold 2009. Variation in memory for body movements
across cultures. Current Biology 19(23): R1068–R1069.
Haviland, John 2003. How to point in Zinacantán. In: Sotaro Kita (ed.), Pointing: Where Lan-
guage, Culture, and Cognition Meet, 139–170. Mahwah, NJ: Lawrence Erlbaum.
Kendon, Adam and Laura Versante 2003. Pointing by hand in “Neapolitan”. In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 109–138. Mahwah, NJ: Lawrence Erlbaum.
Kita, Sotaro and James Essegbey 2001. Pointing left in Ghana: How a taboo on the use of the left
hand influences gestural practice. Gesture 1(1): 73–94.
Majid, Asifa 2012. A guide to stimulus-based elicitation for semantic categories. In: Nicholas Thie-
berger (ed.), The Oxford Handbook of Linguistic Fieldwork, 54–71. Oxford: Oxford University
Press.
Newman, Paul and Martha Ratliff (eds.) 2001. Linguistic Fieldwork. Cambridge: Cambridge Uni-
versity Press.
Payne, Thomas E. 1997. Describing Morphosyntax: A Guide for Field Linguists. Cambridge: Cam-
bridge University Press.
Sakel, Jeanette and Daniel L. Everett 2012. Linguistic Fieldwork: A Student Guide. Cambridge:
Cambridge University Press.
Seyfeddinipur, Mandana 2012. Reasons for documenting gestures and suggestions for how to go
about it. In: Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 147–
165. Oxford: Oxford University Press.
Streeck, Jürgen 2009. Gesturecraft: The Manu-Facture of Meaning. Amsterdam: John Benjamins.
Thieberger, Nicholas (ed.) 2012. The Oxford Handbook of Linguistic Fieldwork. Oxford: Oxford
University Press.
Vaux, Bert, Justin Cooper and Emily Tucker (eds.) 2007. Linguistic Field Methods. Eugene, OR:
Wipf and Stock.
Wilkins, David P. 2003. Why pointing with the index finger is not a universal (in socio-cultural and semiotic terms). In: Sotaro Kita (ed.), Pointing: Where Language, Culture, and Cognition Meet, 171–216. Mahwah, NJ: Lawrence Erlbaum.
Abstract
Film has been available since the end of the 19th century and indeed has been used by
socio-anthropological research since then. Nonetheless, this interest in film has not ex-
panded in a linear and cumulative way, and has not spread in the same way in all disci-
plines. The chapter focuses on the use of film for scientific research and retraces some
early examples of film research practices, beginning with Haddon and Regnault. It also
mentions the difficulties raised by this field, the controversies about the interpretation of
moving images and the way in which their strong realistic dimension has been both
exploited and criticized. For contemporary uses of video in the study of social interaction,
the Natural History of an Interview, a project initiated by Bateson in 1955, has been crucial: it used film to record an entire interaction in a continuous way, which was then transcribed in detail and exploited for interdisciplinary analyses. This project inaugurated a long series of studies interested in analyzing “naturally occurring social interactions”, which have focused particularly on everyday conversations, school settings and workplace activities, within visual anthropology, visual sociology, micro-ethnography, ethnomethodology, conversation analysis, linguistics and education studies.
1. Introduction
The potentialities of film were exploited by social and anthropological research as soon as the
first technological devices for creating moving images were available. Nonetheless, this
interest in film has not expanded in a linear way: even if visual anthropology and visual
sociology are now established fields, film, and video have not yet been widely adopted
within the various disciplines of the social and human sciences. This paradoxical situa-
tion shows that the availability of technology does not in itself guarantee its scientific
use. Its potentialities are used only when they converge with the conceptual, theoretical
and methodological aims of the disciplines.
Even if the uses of film and video in the social sciences are not following a linear
growth curve, they have always been fueled by an interest in human, cultural, and social
embodied (not only linguistic) conduct, grasped in its dynamic, processual, and tempo-
ral dimensions, as well as in its contextuality and situatedness. Interest in human actions,
relations, and interactions as they take place in diverse, everyday (or more specialized)
settings has prompted the use of film and video for documenting them. Likewise, an
interest in capturing the details of language, gesture, body, and movement, and in study-
ing them in rituals, in everyday life, and in professional contexts, has built a strong
motivation to use cinematic technologies.
In this short text, we give a range of examples of the uses of film and video as data
(and not as documentaries; that is, as empirical materials for analysis, and not as a way
of transmitting, circulating, and popularizing results, which would deserve another
study) from across the early period in the history of the social sciences, as well as within
contemporary research, emphasizing the new observable objects that these visual
technologies allow us to study.
The film of a movement is better to research than the simple viewing of a movement; it is
superior, even if the movement is slow. Film decomposes movement in a series of images
that one can examine at leisure while slowing the movement at will, while stopping it as
necessary. Thus, it eliminates the personal factor, whereas a movement, once it is finished,
cannot be recalled except by memory. (Regnault, as cited in Rony 1996: 47)
Regnault captures here some of the key features of film technology. Even if his account
of the posture of the Wolof woman is impregnated with social Darwinism and imperi-
alist determinism, he participated in the early recognition of the potentialities of this
new technology.
The first steps in visual anthropology are generally attributed to Alfred Cort Had-
don, a zoologist who became fascinated by cultural practices and who, during the Cam-
bridge Anthropological Expedition to Torres Straits in 1898, shot various films of
natives dancing, performing ceremonies, and living their everyday life. Again, dance
is a favored topic for film, which allows for the capture of its dynamic movements.
Haddon advised another zoologist-turned-ethnographer, W. Baldwin Spencer, to
take a Kinematograph and an Edison phonograph with him during a year-long expedi-
tion to Central Australia in 1901. Together with Frances James Gillen, they filmed Ar-
unta ceremonies and Kurnara rain ceremonies. They were very clear about the
limitations of the technology (“it is not a very easy matter to use [the cinematograph]
amongst savages. As they move about, you never know exactly where they will be, and
you are liable to go on grinding away at the handle, turning the film through at the rate
of perhaps fifty feet per minute, and securing nothing” (Spencer and Gillen 1912: 218,
quoted by Griffiths 1996: 30)). In the face of the difficulty of anticipating the use of
the surrounding space during the ceremony, they tried to pan the camera, and they
finally adopted multiple viewpoints on the event by changing camera set-ups for
every shot: this can be considered an early attempt to edit film and produce a multi-scope record.
Some years later, the Austrian doctor and anthropologist Rudolf Pöch organized two
big expeditions to Papua New Guinea (1904–1906) and the Kalahari desert (1907–
1909), bringing a camera and a phonograph with him. As with Regnault’s work,
Pöch’s work provides an example of the early use of film in anthropology, although it
is impregnated with racist ideology (Jordan 1992: 42).
These early ethnographers integrated film technology into their fieldwork practices.
Having a positivistic impetus, they saw film technology as a means to produce “objective”
evidence and knowledge.
But after them, the use of cameras and films dropped radically. Griffiths (1996) ex-
plains this change in terms of the changing requirements of the anthropological
community:
The shift from the evolutionism of nineteenth century anthropology to the cultural relati-
vism and structural functionalism of the twentieth century American and British anthro-
pology inspired a questioning of the objectivity of “scientific” recording techniques such
as anthropometry, a questioning that may explain why so few anthropologists expressed
an interest in moving pictures in the wake of Haddon’s and Spencer’s turn-of-the-century
fieldwork. (Griffiths 1996: 23; see also Banks and Morphy 1997)
It was in the 1930s that film was used again in two significant anthropological studies. The first was conducted by Franz Boas who, as late as 1930, when he was already seventy years of age, filmed a number of Kwakiutl dances on his last trip to a community he had studied for forty years (Ruby 1980). The filming was motivated by a strong interest in rhythm and in the body – which he intended to transcribe using Laban notation (Laban 1926) – as well as by the aim of preserving the traditional and endangered Kwakiutl culture. Again, film supported a vision of cultural practices involving not only language but also the body, including what Boas described as "motor habits" (Ruby 1980) – that is, mobility and gesture.
Later on, Boas' student, David Efron, was even more interested in gesture: he engaged in a comparative study of the gestural repertoires of different neighboring communities (Sicilian and Lithuanian Jewish immigrants on the Lower East Side of New York City), as well as of the effect of assimilation on the range of gestures used by their first-generation descendants (Efron 1941).
Efron is an important pioneer in gesture studies (Kendon 2004); he acknowledged
that his use of film techniques was heavily inspired by Boas (“the idea of using film
as a research device in the field of “motor habits” originated entirely with “Papa
Franz” himself who discussed with us at great length his ideas about photographs,
motion pictures and sketches as research tools,” Efron, quoted by Ruby 1980: 11).
The second crucial use of film in the 1930s was made by Mead and Bateson in Bali, where they stayed for three years (1936–1939) and shot about 22,000 feet of film.
Both were in contact with eminent precursors of visual anthropology, since Bateson
had been a student of Haddon and Mead of Boas. In their fieldwork, they collected
notes, transcripts, photographs, and films, all related through timed notes (Jacknis
1988: 163–164), already addressing the problem of how to synchronize these comple-
mentary materials. Moreover, anticipating later reflexive experiments, they showed
their films to the informants and gathered their comments about the recorded scenes.
Their way of treating recordings considers them as “data” and distances them from
the tradition of documentary films:
We tried to use the still and the moving picture cameras to get a record of Balinese beha-
viour, and this is a very different matter from the preparation of a “documentary” film or
photographs. We tried to shoot what happened normally and spontaneously, rather than to
decide upon the norms and then get the Balinese to go through these behaviours in suit-
able lighting. (Bateson and Mead 1942: 49)
Ten years later, Bateson played another important role in the use of film for social
research, when he built an interdisciplinary team at the Veterans Administration Hos-
pital of Palo Alto, studying communication in psychotherapy and in families with a
member affected by schizophrenia. Film recordings were used to observe how family
members interact, taking into consideration “non-verbal communication” (see Bateson
et al. 1956 on “double bind”). In 1955, the group worked together on the video record-
ing of a psychiatric interview between Bateson and one of his patients, Doris, which was then tran-
scribed by Hockett, Birdwhistell and McQuown (McQuown 1971). The title of the
project, Natural History of an Interview, significantly refers to an analysis of human
behavior that recognizes the importance of “spontaneous conversational materials”
in “a variety of contexts” (McQuown 1971: 9 and 11).
On the basis of this film, Birdwhistell eventually developed his famous analysis of the
cigarette scene and the discipline of kinesics.
With the Natural History of an Interview begins the contemporary phase of the use
of video in the social sciences: Influenced by Scheflen, the founder of context analysis
(see Scheflen 1972) who was part of the Palo Alto group, Kendon (1967, 1970) gave a
new vigor to the study of gesture. Some years afterwards, conversation analysis and
other praxeologically and pragmatically oriented approaches developed more and
more sophisticated ways of documenting social practices, and the use of video began
to spread in all the social and human sciences, not only in anthropology, where it
has been used since its invention, but also in linguistics, sociology, studies of work,
technology studies, and education.
Between 1896 and 1916, thirty-one articles using visual images were published in the American Journal of Sociology. But these types of data disappeared afterwards, losing out to other means of illustration, evidence, and proof, such as statistical reports
(Stasz 1979). Additionally, the fact that, from the beginning of cinematography, com-
mercial film widely exploited ethnographic subjects as a source of curiosity and exoti-
cism might have convinced the scientific community that the medium was too
popular for serious science (Griffiths 1996: 19). The relationship between films as
data for research, commercial cinema and documentary film has never been simple
(see Bateson and Mead 1942: 49).
This leads to a contemporary paradox: although film and then video technologies
have been available for more than a century, and although there are early examples
of research not only using but also advocating the use of film, first in anthropology,
and then also in sociology, the field appears today to have just started to burgeon.
Another paradox is that despite the massive use of images in contemporary society
and the spread of technological media like film, television, video, and computers,
there is still an absence of consolidated and standardized practices in video making
and video analysis in the social sciences.
Several of the more recent reasons for this late development can be pointed out. A general focus on language as the main manifestation of culture and soci-
ety and the privileged medium for accessing them, as well as the emphasis on writing as
a central practice in fieldwork, have competed with alternative approaches based on
visual instead of audible resources, contributing to their marginalization. In comparison
to the innumerable theoretical and methodological models available for analyzing ver-
bal language, methods for interpreting images have been less developed – giving the
impression that images are more superficial and less telling than language. In this con-
text, the use of film has been confronted by the skepticism affecting image as a means of
knowledge (Jay 1994), as well as by doubts about the objectivity, realism, and positivism
of images. These latter aspects have been criticized within discussions about the “crisis
of representation” (see Marcus and Fischer 1986), leading to an increased awareness of
the constructedness of any data social scientists produce and analyze, as well as of
the fact that images, too, are theoretically and ideologically loaded, and pervaded by
relationships of power, gender asymmetries, and ethnic discrimination.
In response to the crisis of representation and to the challenges constituted by the
use of video data in the social sciences, the latest developments in videographic
methods have insisted on the importance of doing fieldwork in order to prepare what
is to be shot by the camera, and of ethical concerns in securing informed consent, get-
ting authorizations, and protecting the privacy of the informants (Mohn 2002). More-
over, instead of considering that informants gazing at the camera reveal “bias”
undermining the “objectivity” of video recordings, analyses dealing with the orientation
towards the camera have turned it into a phenomenon to be studied, which documents
the situated conditions in which the video records are produced, within a reflexive
approach (Heath 1986: 176; Laurier and Philo 2006; Lomax and Casey 1998; Speer
and Hutchby 2003).
Video has been increasingly used as data and not just as an illustration, and is con-
sidered essential for overcoming the limitations of participant observation and of field
notes, and for making available details of embodied conduct that cannot be imagined by
introspection but can only be discovered and observed with adequate records. In turn,
the observation of these details fueled the development of analytical perspectives re-
cognizing the importance of visual features for the study of communication, language,
and practice. MacDougall speaks in this respect of “a shift from word-and-sentence-
based anthropological thought to image-and-sequence-based anthropological thought”
(MacDougall 1997: 292). Related to this new perspective, and fostered by gesture stu-
dies and multimodal analysis, videos as data have been increasingly enriched with an-
notations and transcriptions, as well as being stored and eventually made available in
archives and searchable data banks. In this context, sharing video records has been con-
sidered as a way to ensure intersubjective interpretations and assessments of analyses,
and to make data available for the public examination of the scientific community.
Today, video is used as data in a broad range of fields and approaches: visual anthropology (Banks and Ruby 2011), visual sociology (Knoblauch et al. 2006; Knoblauch
et al. 2008; Pink 2001), micro-ethnography, ethnomethodology, conversation analysis
(Sidnell and Stivers 2005; Schmitt 2007; Mondada and Schmitt 2010; Haddington,
Mondada, and Nevile 2013; Streeck, Goodwin, and LeBaron 2011), workplace studies
(Luff, Hindmarsh, and Heath 2000; Middleton and Engeström 1996), linguistics (De
Stefani 2007), gesture studies, the science of education, etc.
Everyday settings have been explored by interaction analysis, conversation analysis,
gesture studies, and linguistics in order to uncover the use of multimodal resources for
the situated organization of social, cultural, and linguistic practices. Collaborating with,
and being inspired by, the work of Birdwhistell in kinesics (see Birdwhistell 1970) and
Scheflen in context analysis, Kendon (1979, 1990, 2004) showed very early on the advan-
tages of using film and video for interaction analysis, with a multiple focus on gaze, ges-
ture, and spatial ecology. Emphasizing the role of these resources in the organization of
social interaction, Goodwin (1981, 1993, 2000) shows how video allows the capture of
the systematics of gaze and turn organization, the coupling between gesture and the
environment, and the intertwining of various semiotic fields in complex social actions,
from everyday conversation to highly specialized professional settings. This allows
the development of a vision of language in action which is less logocentric than in
traditional, grammatical, rather abstract, and disembodied accounts.
Educational settings were investigated with cameras very early on, with a tradition of
studies on classrooms, both in ethnomethodology (Mehan 1993; Spier 1973) and in
micro-ethnography. More particularly, the work of F. Erickson (see Erickson 1982,
2004, and 2006 on videography) has been influential on a large range of ethnographic
classroom studies. Classrooms have also been investigated by science of education pro-
jects interested in capturing the embodied, ecologically situated, and dynamic processes
of learning. The particularities of this setting – which can involve a large audience in
front of the teacher, as well as multiple working groups spread about in a room, working
on whiteboards and tables, with various material and textual artifacts – as well as the
features of the learning processes (involving long-term observation across the curriculum
and thus micro- as well as long-term longitudinal documentation), offer various chal-
lenges to video recordings as well as to data archiving and to the analysis of large
amounts of data (Aufschnaiter and Welzel 2001; Derry 2007; Goldman et al. 2007).
Another complex setting which has been extensively studied in the last decade is the
workplace. Characterized by complex, fragmented, and heterogeneous ecologies, by
collaborative work both face-to-face and at a distance, and by the use of artifacts and
technologies, the workplace presents a multiplicity of embodied, visual, material, and
spatial features which has prompted challenging reflections on the best way to video-
graph them. Complex video devices, often coordinating several cameras, dynamic screen
capture, audio recordings of telephone conversations, and the collection of other docu-
ments, have been used; in addition, the need for a strong ethnographic approach prior to and during the recordings has been advocated (Borseix 1997; Heath, Hindmarsh, and
Luff 2010; Mondada 2008). Some scholars have also added extra video recordings, doc-
umenting the confrontation between the participants and their (videorecorded) work
(Theureau 2010).
Of course, these topics in no way exhaust the richness of the uses of video as a tool in
the social sciences, but they give a picture of the diversity of settings and disciplines that
have been explored, as well as the multiplicity of issues that researchers have been able
to discuss.
5. Conclusion
The use of film and video in the social sciences is as old as the first cinematographic in-
ventions. From the beginning, film has been seen as an indispensable tool for the obser-
vation of dynamic action, movement, and embodied practices in their ordinary settings.
Nevertheless, the use of film and video has not developed linearly in the history of the
social sciences: This shows the importance not only of the availability of technologies,
but also of the compatibility of technologies with topical interests, methodologies, and
scientific ways of presenting results and evidence. From the 1960s onwards, there has
been an increasing focus on socio-cultural practices as they are observable in their nat-
ural context, without being orchestrated by the researchers. This praxeological turn
in the social sciences has meant an increasing use of video as a tool for analyzing
embodied details that are not imaginable but are only discoverable by fine-grained
observation.
6. References
Aufschnaiter, Stefan and Michaela Welzel 2001. Nutzung von Videodaten zur Untersuchung von
Lehr-Lern-Prozessen. Münster, Germany: Waxmann.
Banks, Marcus and Howard Morphy 1997. Rethinking Visual Anthropology. London: Yale Univer-
sity Press.
Banks, Marcus and Jay Ruby 2011. Made to Be Seen: Historical Perspectives on Visual Anthropol-
ogy. Chicago: University of Chicago Press.
Barnes, Donna B., Susan Taylor-Brown and Lori Weiner 1997. “I didn’t leave y’all on purpose”:
HIV-infected mothers’ videotaped legacies for their children. Qualitative Sociology 20: 7–32.
Bateson, Gregory, Don D. Jackson, Jay Haley and John Weakland 1956. Towards a theory of
schizophrenia. Behavioral Science 1: 251–264.
Bateson, Gregory and Margaret Mead 1942. The Balinese Character: A Photographic Analysis.
New York: New York Academy of Sciences.
Birdwhistell, Ray L. 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Borseix, Annie (ed.) 1997. Filmer le Travail. Champs Visuels, 6. Paris: L’Harmattan.
Braune, Christian Wilhelm and Otto Fischer 1895. Der Gang des Menschen. Teil 1. Sächsische Ge-
sellschaft der Wissenschaften: Leipzig, XXI.
Derry, Sharon J. (ed.) 2007. Guidelines for Video Research in Education: Recommendations from
an Expert Panel. Prepared for the National Science Foundation, Interagency Education
Research Initiative, and the Data Research and Development Center. Available at: http://
drdc.uchicago.edu/what/video-research.html.
De Stefani, Elwys (ed.) 2007. Regarder la Langue. Les Données Vidéo dans la Recherche Linguis-
tique. Numéro Spécial du Bulletin VALS–ASLA, 85. Neuchâtel: Bulletin suisse de linguistique
appliquée.
Efron, David 1941. Gesture and Environment. New York: King’s Crown Press.
Erickson, Frederick 1982. Audiovisual records as a primary data source. Sociological Methods and
Research 11: 213–232.
Erickson, Frederick 2004. Origins: A brief intellectual and technical history of the emergence of
multimodal discourse analysis. In: Philip Levine and Ron Scollon (eds.), Discourse and Tech-
nology: Multimodal Discourse Analysis, 196–207. Washington DC: Georgetown University
Press.
Erickson, Frederick 2006. Definition and analysis of data from videotape: Some research proce-
dures and their rationales. In: Judith L. Green, Gregory Camilli and Patricia B. Elmore
(eds.), Handbook of Complementary Methods in Educational Research, 3rd edition, 177–192.
Mahwah, NJ: Lawrence Erlbaum.
Goldman, Ricki, Roy Pea, Brigid Barron and Sharon J. Derry (eds.) 2007. Video Research in the
Learning Sciences. Mahwah, NJ: Lawrence Erlbaum.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Goodwin, Charles 1993. Recording interaction in natural settings. Pragmatics 3(2): 181–209.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Griffiths, Alison 1996. Knowledge and visuality in turn of the century anthropology: The early eth-
nographic cinema of Alfred Cort Haddon and Walter Baldwin Spencer. Visual Anthropology
Review 12(2): 18–43.
Haddington, Pentti, Lorenza Mondada and Maurice Nevile (eds.) 2013. Interaction and Mobility.
Language and the Body in Motion. Berlin: De Gruyter.
Harper, Douglas 2002. Talking about pictures: A case for photo elicitation. Visual Studies 17(1):
13–26.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heath, Christian, Jon Hindmarsh and Paul Luff 2010. Video in Qualitative Research. London: Sage.
Holliday, Ruth 2004. Filming the closet: The role of video diaries in researching sexualities. Amer-
ican Behavioral Scientist 47(12): 1597–1616.
Jacknis, Ira 1988. Margaret Mead and Gregory Bateson in Bali: Their use of photography and film.
Cultural Anthropology 3(2): 160–177.
Jay, Martin 1994. Downcast Eyes. The Denigration of Vision in Twentieth-Century French Thought.
Berkeley: University of California Press.
Jordan, Pierre-L. 1992. Ein Blick auf die Geschichte – Geschichte eines Blickes. Cinema-Cinema-
Kino, 23–74. Marseille, France: Musées de Marseille.
Kendon, Adam 1967. Some functions of gaze-direction in social interaction. Acta Psychologica 26:
22–63.
Kendon, Adam 1970. Movement coordination in social interaction. Acta Psychologica 29: 100–125.
Kendon, Adam 1979. Some methodological and theoretical aspects of the use of film in the study
of social interaction. In: Gerald P. Ginsburg (ed.), Emerging Strategies in Social Psychological
Research, 67–91. New York: Wiley.
Kendon, Adam 1990. Conducting Interaction: Patterns of Behavior in Focused Encounters. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Knoblauch, Hubert, Alejandro Baer, Eric Laurier, Sabina Petschke and Bernt Schnettler (eds.)
2008. Visual Methods. Forum: Qualitative Social Research, Vol 9, No 3, http://www.qualitative-
research.net/index.php/fqs/issue/view/11/showToc.
Knoblauch, Hubert, Bernt Schnettler, Jürgen Raab and Hans-Georg Soeffner (eds.) 2006. Video
Analysis: Methodology and Methods. Qualitative Audiovisual Data Analysis in Sociology.
Frankfurt am Main: Lang.
Laban, Rudolf 1926. Choreographie. Jena, Germany: Eugen Diederichs.
Laurier, Eric and Chris Philo 2006. Natural problems of naturalistic video data. In: Hubert Kno-
blauch, Jürgen Raab, Hans-Georg Soeffner and Bernt Schnettler (eds.), Video-Analysis: Method-
ology and Methods. Qualitative Audiovisual Data Analysis in Sociology, 183–192. Bern: Lang.
Lee, Raymond M. 2004. Recording technologies and the interview in sociology, 1920–2000. Soci-
ology 38(5): 869–889.
Lomax, Helen and Neil Casey 1998. Recording social life: Reflexivity and video methodology.
Sociological Research Online 3(2), http://www.socresonline.org.uk/3/2/1.html.
Luff, Paul, Jon Hindmarsh and Christian Heath (eds.) 2000. Workplace Studies. Recovering Work
Practice and Informing System Design. Cambridge: Cambridge University Press.
MacDougall, David 1997. The visual in anthropology. In: Marcus Banks and Howard
Morphy (eds.), Rethinking Visual Anthropology, 276–295. New Haven, CT: Yale University
Press.
Marcus, George E. and Michael M. J. Fischer 1986. Anthropology as Cultural Critique. Chicago:
University of Chicago Press.
Marey, Etienne-Jules 1896. L'étude des mouvements au moyen de la chronophotographie. Revue générale internationale 1: 200–218.
McQuown, Norman A. (ed.) 1971. The Natural History of an Interview. Chicago: Microfilm Col-
lection, Manuscripts on Cultural Anthropology, Joseph Regenstein Library, Department of
Photoduplication, University of Chicago.
Mead, Margaret 1995. Visual anthropology in a discipline of words. In: Paul Hockings (ed.), Prin-
ciples of Visual Anthropology, 3–10. New York: De Gruyter.
Mehan, Hugh 1993. Why I like to look: On the use of videotape as an instrument in educational
research. In: Michael Schratz (ed), Issues in Qualitative Research, 93–105. London: Falmer
Press.
Middleton, David and Yrjö Engeström (eds.) 1996. Cognition and Communication at Work. Cam-
bridge: Cambridge University Press.
Mohn, Elisabeth 2002. Filming Culture: Spielarten des Dokumentierens nach der Repräsentation-
skrise. Stuttgart, Germany: Lucius and Lucius.
Mondada, Lorenza 2003. Working with video: How surgeons produce video records of their ac-
tions. Visual Studies 18(1): 58–73.
Mondada, Lorenza 2006. Video recording as the reflexive preservation-configuration of phenom-
enal features for analysis. In: Hubert Knoblauch, Jürgen Raab, Hans-Georg Soeffner and
Bernt Schnettler (eds.), Video-Analysis: Methodology and Methods. Qualitative Audiovisual
Data Analysis in Sociology, 51–68. Bern: Lang.
Mondada, Lorenza 2008. Using video for a sequential and multimodal analysis of social interac-
tion: Videotaping institutional telephone calls. FQS (Forum: Qualitative Sozialforschung /
Forum: Qualitative Social Research) 39: 1–35.
Mondada, Lorenza and Reinhold Schmitt (eds.) 2010. Situationseröffnungen: Zur Multimodalen
Herstellung Fokussierter Interaktion. Tübingen, Germany: Narr.
Muybridge, Eadweard 1887. Animal Locomotion. Philadelphia: University of Pennsylvania.
Pink, Sarah 2001. More visualising, more methodologies: On video, reflexivity and qualitative
research. Sociological Review 49: 586–599.
Regnault, Felix and M. M. Lajard 1895. Poterie crue et origine du tour. Bulletin de la Société
d’anthropologie de Paris 4(6): 734–739.
Rony, Fatimah Tobing 1996. The Third Eye: Race, Cinema and Ethnographic Spectacle. Durham,
NC: Duke University Press.
Ruby, Jay 1980. Franz Boas and early camera study of behavior. Kinesics Report 3(1): 6–11.
Ruby, Jay 2000. Picturing Culture: Explorations on Film and Anthropology. Chicago: University of
Chicago Press.
Scheflen, Albert E. 1972. Body Language and Social Order: Communication as Behavioral Con-
trol. Englewood Cliffs, NJ: Prentice Hall.
Schmitt, Reinhold (ed.) 2007. Koordination. Analysen zur Multimodalen Interaktion. Tübingen,
Germany: Narr.
Sidnell, Jack and Tanya Stivers (eds.) 2005. Multimodal Interaction. Special Issue of Semiotica 156.
Speer, Susan A. and Ian Hutchby 2003. From ethics to analytics: Aspects of participants’ orienta-
tions to the presence relevance of recording devices. Sociology 37(2): 315–337.
Spencer, Walter Baldwin and Francis James Gillen 1912. Across Australia. London: Macmillan.
Spier, Matthew 1973. How to Observe Face-to-Face Communication. A Sociological Introduction.
Pacific Palisades, CA: Goodyear.
Stasz, Clarice 1979. The early history of visual sociology. In: Jon Wagner (ed.), Images of Informa-
tion: Still Photography in the Social Sciences, 119–136. London: Sage.
Streeck, Jürgen, Charles Goodwin and Curtis LeBaron (eds.) 2011. Embodied Interaction: Lan-
guage and Body in the Material World. Cambridge: Cambridge University Press.
Theureau, Jacques 2010. Les entretiens d’autoconfrontation et de remise en situation par les traces
matérielles et le programme de recherche ‘cours d’action’. Revue d’Anthropologie des Con-
naissances 4(2): 287–322.
Worth, Sol and John Adair 1972. Through Navajo Eyes: An Exploration in Film Communication
and Anthropology. Bloomington: Indiana University Press.
Abstract
The process of transcribing requires some fundamental content-related as well as layout-
related decisions. As has been discussed in a series of papers (e.g., Ochs 1979; Cook 1990;
Edwards and Lampert 1993), each of these decisions is influenced by transcribers’ pre-
conceptions and theoretical assumptions, by the research question and by methodological
considerations. Yet it has not been explored in all its implications that transcripts them-
selves lay the foundation for conceptualizations of the object of study and thus contribute
to the construction of theory.
Four arguments for this reflexive relation between transcription and theory will be pre-
sented: 1. Given their inherent selectivity, transcripts reduce possible analyses and inter-
pretations of the data presented, and they reduce the array of questions that could
possibly be asked. 2. Transcripts play a pivotal role in the hermeneutic research pro-
cess, serving as the basis for and documenting the outcome of analysis and interpretation
of the data. 3. Transcripts are texts – and as such, they become subject to writing and read-
ing conventions and to rules for interpreting texts rather than multimodal interaction.
4. The use of writing as a medium of representation raises issues of linguistic norms
and standards and it reinforces “monologistic” (Linell 1998) conceptualizations of
discourse.
1. Introduction
Transcripts are graphical representations of some communicative event that serve
to preserve the genuinely evanescent data collected as a basis for further analysis.
They are used in a wide variety of disciplines, such as linguistic anthropology (Duranti
1997), sociology (Atkinson and Heritage 1984), psychology (MacWhinney 1995) and
linguistics (Ehlich 1993). In all of these disciplines, new approaches have arisen that neither use experiments nor rely on introspection, but rather base their analysis on empirical data, analyzing discourse as a situated communicative event, that is, as context-bound or at least partially constituted by context-sensitive language use. Consequently, in all of
these approaches, context comes to play a crucial role in the explication of both the
form and the meaning of discourse (see Cook 1990: 1). To validate analysis, the represen-
tation of contextual information becomes crucial. In recent years, several transcription
systems have been developed to meet the need for representing language use in context.
Bergmann (1981) compared the impact of the introduction of audio-recording technology on discourse analysis to that of the microscope on biology. The same can be said
for the implementation and proliferation of video-technology that has enabled re-
searchers to record visible behaviour and to analyse visible bodily action as utterances
(as programmatically formulated by Kendon 2004). In recent years, the focus has
shifted from the analysis of discourse to the study of multimodal communication,
that is, communication with linguistic as well as bodily means. With the growing need
for the representational inclusion of bodily acts into transcripts, conventional transcrip-
tional systems have been extended and new notational systems have been developed for
the representation of bodily means of communication in coordination with speech. Yet,
to date no single standard transcription system exists, as researchers have created their own systems to best suit their particular interests and research questions.
During the process of transcribing, some fundamental content-related as well as layout-related decisions have to be made; a minimal sketch of how such decisions might be made explicit in a machine-readable transcript follows the list below. These concern
(i) the selection of what to transcribe, that is, which context information to include
and which to neglect,
(ii) the segmentation of the flow of observable behaviour into meaningful units,
(iii) the placement of text, that is, speakers’ turns, and contextual information, and
their relation to each other, and
(iv) the notational symbols (see Edwards 1993, 2001).
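To give these four decisions a concrete shape, the following minimal Python sketch shows one hypothetical way a transcript might be encoded; the class and field names are illustrative assumptions and do not reproduce any of the transcription systems discussed in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical illustration of the four decisions listed above:
# (i)   selection    -> which context layers get a field at all,
# (ii)  segmentation -> what counts as one unit (here: one stretch of talk with start/end times),
# (iii) placement    -> how units are ordered relative to each other (here: a flat, time-ordered list),
# (iv)  notation     -> which symbols appear inside the text field (here: "(0.5)" marks a pause).

@dataclass
class TranscriptUnit:
    speaker: str                   # decision (i): speaker identity is selected for inclusion
    start: float                   # decision (ii): unit boundaries, in seconds
    end: float
    text: str                      # verbal content, carrying the notational symbols (iv)
    gaze: Optional[str] = None     # decision (i): optional context layers; omitting them is also a decision
    gesture: Optional[str] = None

@dataclass
class Transcript:
    units: List[TranscriptUnit] = field(default_factory=list)  # decision (iii): linear, time-ordered placement

# Example: two context layers (gaze, gesture) are selected, all others deliberately left out.
t = Transcript([
    TranscriptUnit("A", 0.0, 1.8, "so what do we (0.5) do now", gaze="towards B"),
    TranscriptUnit("B", 1.9, 2.4, "no idea", gesture="shoulder shrug"),
])
print(len(t.units), "units by", {u.speaker for u in t.units})
```

Each field corresponds to one of the decisions: including or omitting the gaze field is a selection decision, the start and end times fix the segmentation, the order of the list fixes placement, and the strings in the text field carry the notational symbols.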
As has been discussed in a series of papers in recent years (e.g., Ochs 1979; Cook 1990;
the collection in Edwards and Lampert 1993), each of these decisions is influenced by
transcribers’ preconceptions and theoretical assumptions, by the research question and
by methodological considerations. The growing awareness of the impact of theory on
transcripts has led to the development of criteria for the quality of transcription systems
(see among others Ehlich 1993: 125; Selting et al. 1998; Deppermann 2001: 46; Edwards
2001; Selting et al. 2009). What has not yet been explored in all its implications is that
not only are transcripts influenced by theoretical assumptions, but that they themselves
lay the foundation for conceptualizations of the object of study and thus contribute to
the construction of theory.
Four arguments for this reflexive relation between transcription and theory will be presented:
(i) Given their inherent selectivity, transcripts reduce possible analyses and interpretations of the data presented, and they reduce the array of questions that could possibly be asked.
(ii) Transcripts play a pivotal role in the hermeneutic research process, serving as
the basis for and documenting the outcome of analysis and the interpretation of
the data.
(iii) Transcripts are texts – and, as such, they become subject to writing and reading
conventions and to rules for interpreting texts rather than to multimodal
interaction.
(iv) The use of writing as a medium of representation raises issues of linguistic norms
and standards, and it reinforces “monologistic” (Linell 1998) conceptualizations of
discourse. This will be shown for the transcription of verbal interaction (part I) and
for the notation of bodily communication (part II).
The problems posed for transcription by the introduction of context are of identity, quality
and quantity. The first problem is to find a means of distinguishing relevant features, the
second of devising a transcription system that is capable of expressing them, the third
that even if such a system could be devised it would make the presentation of data
(which in its actual production had occupied a short space or time) take pages of transcript
[…]. (Cook 1990: 4)
Each of these layers of context has its own limitations and thus its own criteria for selection. While transcription systems correspond in that they capture the text (more or less completely), they differ widely in which of the other contextual factors they include, in the degree of detail with which they notate them, in the
categories they build and in the reasons they give for including some context parameters
and for excluding others.
Only a few researchers claim attainment of theoretically neutral transcriptions with-
out any pre-selection. Among them, Lévi-Strauss (discussing the different methodolo-
gies of observation versus experimentation in his famous Structural Anthropology)
states: “On the observational level, the main – one could almost say the only – rule
is that all the facts should be carefully observed and described, without allowing any
theoretical preconception to decide whether some are more important than others”
(Lévi-Strauss 1963: 280). Comparably, in the early days of the naturalistic approach
to interaction, Scheflen proposed that “[w]e do not decide beforehand what is trivial,
what is redundant, or what alters the system. This is a result of the research” (Scheflen
1966: 270, original emphasis).
Although a complete account of all contextual features seems desirable in order to
build data collections and corpora and to share data and transcripts among scholars, this
is not feasible. Transcripts are inherently and unavoidably selective. Selection is guided
by intuition, theory, research interest and methodological assumptions. Most research-
ers acknowledge the inherently selective nature of transcripts; some, like Cook (1990: 15) and Ochs (1979: 44), explicitly encourage it, stating that the transcript should reflect
what is known about communication as well as the particular research interest, that is,
the hypotheses to be examined (see Ochs 1979: 44).
In fact, transcribing all that “is there” is impossible for logical as well as for practical
reasons: “The amount of co-text and the amount of context exist on different dimen-
sions and thus in directly inverse proportion (i.e., the more co-text that is presented
in a given space, the less of other types of context)” (Cook 1990: 15). Transcribing
“all” would exceed transcribers’ as well as readers’ working memory. Moreover, as
Cook states:
[…] if we present everything (assuming for a moment that we could) this would be a repro-
duction of the speech event itself (or more exactly the speech event as its representation in
the mind of the participants is represented in the mind of the observer), and thus exclude
the validity of the analysis, rendering it no different in kind – if considerably more difficult
in apprehension – from witnessing (as a participant) the speech event itself. (Cook 1990: 4,
original emphasis)
One way to cope with the problem of selection is to theoretically and/or methodolog-
ically exclude some contextual factor as being irrelevant for analysis. For example, in ethnomethodology, intentionality has been explicitly rejected as a basis for analysis: "[…]
meaningful events are entirely and exclusively events in a person’s behavioral environ-
ment [….] Hence, there is no reason to look under the skull since nothing of interest is
there but brains” (Garfinkel 1963: 190). Another, rather practical, way to reduce the
range of potentially relevant contextual features is to use data that lack some of
these contextual layers, such as telephone conversations where bodily communication
and the situation are not visible and thus are not available for participants in the
construction of meaning. Yet another solution is to make methodological use of the
context, as is done in Conversation Analysis and in Gumperz' (1982) approach to con-
textualization. In these approaches, context is not seen as a given external fact, but as
established by the very interaction itself. Consequently, the transcriptional system
developed by Gumperz “attempts to set down on paper all those perceptual cues that
past research and ongoing analyses show participants rely on for their online processing
of conversational management signs” (Gumperz and Berenz 1993: 92).
Still, most researchers design their transcription systems more narrowly around the topic under investigation; Bloom (1993) and Chafe (1993), for example, explicitly include those and only those variables needed for the current research interest. Others, such as Selting et al. (1998, 2009), propose a minimal transcript as a working transcript that can be complemented and refined according to the actual research question. Arguably, the question is not to represent each and every contextual feature, but to select
relevant contextual features and to account for the selection. Therefore we need
explicit criteria for selection that are guided by theory and by our general knowledge
in the field. It is on the basis of empirical studies that our knowledge about relevant
context features grows, which in turn enables researchers to design transcription sys-
tems that increasingly make use of what previous studies have shown to be relevant
(see, e.g., DuBois et al. 1993).
way relevant to the immediate prior turn. The expectation of the reader matches the expec-
tation of adult speakers (Grice, 1975), and by and large inferences based on contingency are
correct. (Ochs 1979: 46)
On the other hand, the column format exploits the horizontal dimension to represent
simultaneity. In this format, the amount of talk of each participant is highly conspicu-
ous. Thus, it is particularly well suited for the representation of interactions between
participants with asymmetric speaking rights such as is often the case in institutional
settings. It also allows for the representation of unconnected utterances by several
speakers, as is often observed in children’s interactions (see Ochs 1979). Furthermore,
reading conventions and routines lead to the attribution of initiative and activity to the
participant whose utterances are placed in the leftmost column (see Ochs 1979: 50f). To
counterbalance this, one may place the more dominant participant in the right column
(see Ochs 1979: 51).
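For concreteness, the hypothetical Python sketch below prints a short two-party exchange in such a column format; the fixed column width and the placement of speaker A in the left column are arbitrary, illustrative choices rather than a standard.

```python
# Hypothetical sketch: rendering turns in a two-column ("column format") layout.
# As discussed above, placing A in the left column is itself an interpretive choice,
# since reading conventions attribute initiative to the leftmost participant.

turns = [
    ("A", "so what do we do now"),
    ("B", "mhm"),
    ("A", "I mean we could just wait"),
    ("B", "yes (2.0) yes let's wait"),
]

COL = 32  # column width in characters (arbitrary)

print(f"{'A':<{COL}}B")
print("-" * (COL + 24))
for speaker, utterance in turns:
    left = utterance if speaker == "A" else ""
    right = utterance if speaker == "B" else ""
    print(f"{left:<{COL}}{right}")
```

Swapping the columns, or widening one of them, changes nothing in the data but visibly changes how much talk each participant appears to produce.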
In sum, transcribers rely on reading conventions in order to make their analysis
clearly visible, and, regardless of transcribers’ intentions and conventions, readers’
routines contribute to their interpretation of the data represented in one or the other
format, and thus reinforce (or perhaps undermine) interpretations made by the
transcriber.
of them have led to the discovery of ranges of orderliness; most of them are yet to be
explored” (Jefferson 2004: 23). Other researchers, such as Gumperz, object that this so-
called eye dialect “tends to trivialize participant’s utterances by conjuring up pejorative
stereotypes, while neither representing the phonetic level more precisely nor capturing
detail relevant to the analysis” (Gumperz and Berenz 1993: 96). Therefore, Gumperz
and Berenz (1993: 97) propose, whenever the use of a variety or varieties of colloquial
pronunciation is relevant to the research interest, to include both the standard ortho-
graphy and the popular spelling of the colloquialism in a regularized way and to
indicate the regularizations made in prefatory comments to the transcript.
Given that neither orthography nor literal transcription represents pronunciation
one-to-one, especially when the use (or non-use) of dialects or vernacular varieties
plays a central role in a given interaction, a phonetically more precise transcription,
e.g. using the International Phonetic Alphabet (IPA), would be recommended. Yet,
this gain in accuracy causes a loss in readability, since using a phonetic transcription
assumes as a prerequisite the professional training and competence of transcribers as
well as readers (Edwards 1993: 20; Edwards 2001: 330).
In any case, transcribers have to weigh the benefits of detailed transcription against
readability. Whereas stops, anacolutha, repetitions and so on can be accounted for by
interactive exigencies, their representation thwarts readers’ expectations of written
texts as containing correct and complete syntactic structures. What goes unnoticed
when listening to a spoken conversation is glaring when reading its written reproduction
in a transcript. Furthermore, even if discourse analysis and sociolinguistics do not con-
sider dialects as an impoverished standard (Duranti 1997: 139), their representation (or
in contrast, their regularization into standard orthography) may turn out to be quite
consequential, e.g. in police interrogations and in courtroom proceedings. The use of
a vernacular and its representation in transcripts is a social and political issue even
beyond legal and media settings (see Bucholtz 2000).
analyzes “any facial movement into anatomically based minimal action units” (Ekman
and Friesen 1982: 180). Yet, in agreement with Frey et al. (1981), Ekman and Friesen
explicitly “wanted to build the system free of any theoretical bias about the possible
meaning of facial behaviors” (Ekman and Friesen 1982: 180). The authors argue that
“the measurement must be made in noninferential terms that describe the facial behav-
ior, so that the inferences can be tested by evidence” (Ekman and Friesen 1982: 182).
Recently, Sager (2005) applied this approach to the coding of gesture.
In contrast, in one of the most intriguing attempts to apply structural linguistic
methods to bodily communication, Birdwhistell (1970) proposes functional
units called “kines”, which are defined as “the least perceptible units of body motion”
(Birdwhistell 1970: 166). Kendon (2004 and others) and McNeill (1992 and others) de-
veloped notational systems from their studies of cognitive and communicative functions
of speech-accompanying gestures, and in so doing classified them on the basis of semi-
otic, semantic and pragmatic analyses. In these transcription systems, bodily communi-
cation is represented by pictures (be it drawings or stills), with the bodily movements
being described in ordinary terms and formal parameters (such as the onset, apex,
stroke, hold, retraction/transition) being indicated by special symbols.
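As a rough illustration of what such phase notation amounts to on paper, the hypothetical Python sketch below aligns gesture-phase symbols with a stretch of speech; the symbols, spans, and helper function are invented for illustration and do not reproduce Kendon's or McNeill's actual conventions.

```python
# Hypothetical sketch: aligning gesture-phase symbols with a line of speech.
# The phase labels echo the terminology mentioned above (onset, stroke, hold, retraction),
# but the symbols themselves are illustrative assumptions only.

PHASE_SYMBOLS = {
    "onset": "~",       # preparation / onset of the movement
    "stroke": "*",      # the expressive peak of the gesture
    "hold": "-",        # post-stroke hold
    "retraction": ".",  # return to rest position
}

def phase_tier(speech: str, phases: list) -> str:
    """Return a line of phase symbols aligned character-by-character under the speech line.

    `phases` is a list of (start_char, end_char, phase_name) spans over the speech string.
    """
    tier = [" "] * len(speech)
    for start, end, name in phases:
        tier[start:end] = PHASE_SYMBOLS[name] * (end - start)
    return "".join(tier)

speech = "and then she just threw it away"
phases = [(0, 8, "onset"), (9, 23, "stroke"), (24, 31, "retraction")]

print(speech)
print(phase_tier(speech, phases))
```

Printed in a monospaced font, the second line shows at a glance which words the stroke co-occurs with.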
Coding systems share with the phonetic transcription of verbal utterances the constraint that they can be produced and read only by professionally trained people. Further-
more, rather positivist notations of the spatio-temporal parameters do not contribute to
readers’ understanding of the ongoing interaction. On the other hand, functional codes,
as well as descriptions of body movements in ordinary terms, run the risk of
turning out rather interpretive, confounding description with inference (Ekman and
Friesen 1982: 182) and thus foreclosing any alternative analysis and interpretation.
1993: 131). The score-format has been developed specifically for the transcription of
multiparty multimodal communication. Yet, as the description of bodily movements
often requires much more space than the transcription of speech, it disrupts the repre-
sentation of the text. Therefore, for longer descriptions of non-phonological phenom-
ena, Ehlich proposes to “mark the relevant point within the score area and add a full
description in the left margin, enclosed in other brackets” (Ehlich 1993: 135). What
is chosen out of practical considerations for readability sets the bodily means of
communication apart, and thus literally marginalizes it.
When bodily communication is investigated, the coordination of and temporal rela-
tionship between the several modes of communication have to be indicated. Regardless
of the specific theoretical background of the researchers, most transcription systems
using the line format segment the stream of speech into prosodic units, which are
seen as the fundamental unit of speech production (see Gumperz and Berenz 1993:
95) and which serve as the basis for formatting the transcript. Yet, despite being co-
expressive with speech (Kendon 1980; McNeill 1992), gesture phrases are not exactly
co-extensive with intonation phrases (Bohle 2007: 194–242). Nevertheless, even in
studies on speech-accompanying gesture, taking gesture phrases as the basic unit is
the exception (but see Kita 2000: 172).
Likewise, in transcripts using the score format, the verbal utterance serves as refer-
ence for the indication of temporal relations between several modes of communication
(e.g., Ehlich 1993: 136). Yet, (not only) during periods of time when no-one speaks,
some mode of bodily behaviour may be used as a reference to which the other elements
are related (see Heath 1986: 20). Alternatively, verbal utterances as well as bodily movements may all be indicated with reference to some external unit of measurement, be it the
frame number or the time code, without prioritizing some mode of communication as
the organizing point of reference (e.g., Frey et al. 1981).
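The hypothetical Python sketch below illustrates what such an externally anchored representation might look like: every event, verbal or bodily, carries its own time stamps, and temporal relations between modes are computed from the time line rather than read off the position of the verbal tier. The tier names and the overlap helper are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch of a time-code-anchored multimodal record: no tier serves as
# the "master" reference; speech, gesture, and gaze events all carry their own time stamps.

@dataclass
class Event:
    tier: str      # e.g. "speech-A", "gesture-A", "gaze-B" (illustrative tier names)
    start: float   # seconds from the start of the recording (a frame number would also do)
    end: float
    value: str

def overlaps(a: Event, b: Event) -> bool:
    """True if the two events share any stretch of time."""
    return a.start < b.end and b.start < a.end

record = [
    Event("speech-A", 12.30, 13.10, "no idea"),
    Event("gesture-A", 12.05, 13.40, "shoulder shrug"),
    Event("gaze-B", 11.90, 12.80, "towards A"),
]

# Temporal relations are derived from the time line, not from the layout of the transcript:
gesture = record[1]
for ev in record:
    if ev is not gesture and overlaps(ev, gesture):
        print(f"{ev.tier} ({ev.value!r}) overlaps the gesture")
```

Because no tier is privileged, stretches in which nobody speaks pose no representational problem: bodily events simply populate the time line on their own.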
Another consequence of the verbal utterance serving as reference for the notation of
bodily communication is that the verbal utterance is most often placed in the first line,
which may reverse the online perception of verbal-gestural utterances. At least iconic
gestures tend to precede their “lexical affiliate” and thus open a projection space
(Schegloff 1984), with the word entering a prepared scene (Streeck 1993). In a given
interaction, gestures are typically seen before their lexical affiliates are heard. However,
due to our reading conventions, when placed beneath or to the right of the text, information
about gesture is read after the text.
Writing "(especially alphabetic writing), […], is more adequate for the structural analysis of seg-
mentable sound sequences […] than for other forms of communication, especially ges-
tures” (Duranti 1997: 150). Bodily movements cannot be transcribed directly, they have
to be described. Descriptions in their turn are to be read like stage directions that do
not have to be pronounced but enacted. Consequently, due to their different represen-
tational modes – transcription versus description – information about body movements
is either graphically or spatially set apart in the transcripts in order not to be mistaken
for text.
Furthermore, by transforming non-talk into talk, verbal descriptions reproduce the
dominance of speech over other forms of human expression before giving us a chance
to assess how non-linguistic elements of the context participate in their own, unique
ways, in the constitution of the activity under examination (Duranti 1997: 144).
Readers should be aware that graphical distinction and/or spatial separation of
modes of communication are due to the writing system used for the representation
of discourse, not to ontological or functional differences of the modes themselves –
one may speculate that a transcript of multimodal communication using a logographic
writing system would minimize at least some of the issues discussed here.
4. Conclusion
Recordings and transcripts are used in order to objectify conversation – in the double
sense of transforming a genuinely evanescent event into an object that remains stable
and identical over time and in the sense of eliminating the inherent subjectivity of par-
ticipants’ as well as observers’ interpretations thereof. Yet, even rendered as a text, the
conversation is not the same over the course of time or for every observer. For observers no less than for participants, the interpretation of what is going on depends on individual interests and
expectations. As has been shown, any analyst may focus on different aspects of the com-
municative event; furthermore, each analysis is influenced and informed by previous
analyses by the same observer. Thus, as Franck (1989: 165) puts it, neither participants
nor observers may enter the same river twice. In consequence, Franck argues that this
specific kind of object of investigation calls for new methods of transmission. Challen-
ging the demand to write down processes and products of analysis, Franck (1989) pro-
poses that instead of attempting to meet the criteria set for scientific objects in
traditional approaches to language, e.g. in structural linguistics, one should conceive
of conversation analysis as a training to hear (Franck 1989: 166).
In a similar perspective, Nothdurft (2006) points out that transcribers attribute what
they hear to speakers, without reflecting on the fact that their own perception of a recorded con-
versation is in no way identical to what speakers actually did say. Furthermore, exactly
those procedures of treating the data are based upon an object of investigation that in
essential aspects fundamentally differs from the object of ordinary conversation. The
data undergo a process of transformation that, according to Nothdurft, ultimately leads analysts to look for phenomena in the wrong place when locating them in the tran-
script. Analysts systematically misconceive the status of their data. Records and tran-
scripts do not represent communicative reality. What enables a detailed microanalysis
is just what eliminates the genuine evanescent character of conversation. Thus, the
phenomena under investigation evaporate and turn into what Nothdurft calls conversa-
tional phantoms. Consequently, he pleads for taking the specific instance of conversation
not as object of investigation, but rather as an occasion prompting the analysts to reflect
upon their own experiences with and expectations towards communication.
From a psycholinguistic perspective, Kowal and O’Connell investigated the percep-
tual faculties that are necessary for transcribing: Transcribers have to refrain from those
perceptual routines that enable them to participate competently in a conversation, as
instead of concentrating on the meaning of utterances, they have to focus, for example,
on self-correcting practices that often interfere with understanding, and they have to
analyse changes in speed and loudness, intonation contours, simultaneity and succession
of turns that participants mostly notice intuitively and that often lie beyond conscious
perception (Kowal and O’Connell 2000a: 355). A corollary could be the consideration
and investigation of how reading routines and reading conventions influence lay readers’
as well as professional discourse analysts’ interpretation of transcripts.
Yet, only a few researchers will go as far as to refrain from recording and transcribing
at all. The aim of this paper has been to elucidate the inherent selectivity of transcripts
and how they reduce what could be investigated. Essentially, what has not been re-
corded is not there to be transcribed, and what has not been transcribed is not there
to be analysed and interpreted. As tertiary sources, transcripts become the basis for fur-
ther investigation and as such constitute the object of investigation. Thus, only a limited
set of questions can be answered on the basis of a given transcript. Since each new ques-
tion calls for a new set of data, records and transcripts, the sharing of transcripts among
researchers is a challenge. Meanwhile, discussion about practicality, standards and uni-
fication of symbols has begun and will continue with the aim of developing transcrip-
tional systems that reflect the current knowledge in the field and at the same time
allow for new research questions and for reanalysis and reinterpretation of transcripts
in publications.
Whereas selection and segmentation are regularly and often explicitly made on the
basis of theoretical assumptions, the impacts of layout decisions mostly go unnoticed.
There is no clear correlation between theoretical stances and a given layout format;
rather, practical considerations and the wish to display the focus and results of the ana-
lysis as clearly as possible determine the layout. Yet, as a base for and documentation of
analysis and interpretation, transcripts build a strong case for the researchers’ assump-
tions about (this specific instance of) communication and thus contribute to readers’
conceptualization of the phenomenon as well. Without access to the original data or
to the records, readers can only form their interpretations from the transcriber’s
account. New publication media enable researchers to provide readers at least with
audio- and video-files of the data.
For qualitative studies, the rendition of transcripts serves as evidence for the analysis.
In order to ensure intersubjective verifiability, the transcription rules are documented,
so that readers might check which information has (or has not) been transcribed, and
whether transcription is consistent and corresponds to the rules (see Steinke 2000:
325). While discussions of selectivity and its solutions can be found in general accounts
of transcription systems (such as Selting et al. 1998; Selting et al. 2009; and those col-
lected in Edwards 1993), until now these issues have rarely been discussed in individual empirical
studies. Since there is no way out of the inherently reflexive relation between transcripts
and theory, criteria for selection as well as for layout decisions should be explicated in
any empirical study. In recent years, criteria for the quality of transcripts such as prac-
ticality, readability and exchangeability of data, as well as relevance in relation to the
research question and avoidance of mere positivistic description on the one hand, and
evaluative interpretation on the other hand, have been proposed by Ehlich (1993: 125),
Selting et al. (1998), Deppermann (2001: 46) and Edwards (2001) among others. These
criteria have been developed bearing theoretical considerations as well as practical ex-
igencies for transcription decisions in mind. Discussion should be continued, focusing
on the contribution of transcripts to the conceptualization of communication.
5. References
Atkinson, Maxwell J. and John Heritage 1984. Transcript notation. In: Maxwell J. Atkinson and
John Heritage (eds.), Structures of Social Action. Studies in Conversation Analysis, ix–xvi. Cam-
bridge: Cambridge University Press.
Bergmann, Jörg 1981. Ethnomethodologische Konversationsanalyse. In: Peter Schröder and Hugo
Steger (eds.), Dialogforschung. Jahrbuch 1980 des Instituts für Deutsche Sprache, 9–51. Düssel-
dorf, Germany: Cornelsen.
Bergmann, Jörg 2000. Konversationsanalyse. In: Uwe Flick, Ernst von Kardoff and Ines Steinke
(eds.), Qualitative Forschung: ein Handbuch, 524–537. Reinbek bei Hamburg, Germany:
Rowohlt.
Birdwhistell, Ray 1970. Kinesics and Context. Essays on Body Motion Communication. Philadel-
phia: University of Pennsylvania Press.
Bloom, Lois 1993. Transcription and coding for child language research: The parts are more than
the whole. In: Jane A. Edwards and Martin Lampert (eds.), Talking Data. Transcription and
Coding in Discourse Research, 149–166. Hillsdale, NJ: Lawrence Erlbaum.
Bohle, Ulrike 2007. Das Wort Ergreifen – das Wort Übergeben. Explorative Studie zur Rolle Re-
debegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bucholtz, Mary 2000. The politics of transcription. Journal of Pragmatics 32: 1439–1465.
Chafe, Wallace L. 1993. Prosodic and functional units of language. In: Jane A. Edwards and Mar-
tin Lampert (eds.), Talking Data. Transcription and Coding in Discourse Research, 33–43. Hills-
dale, NJ: Lawrence Erlbaum.
Cook, Guy 1990. Transcribing infinity. Problems of context presentation. Journal of Pragmatics 14:
1–24.
Deppermann, Arnulf 2001. Transkription. In: Arnulf Deppermann (ed.), Gespräche Analysieren,
2nd edition, 39–48. Opladen, Germany: VS Verlag für Sozialwissenschaften.
Deppermann, Arnulf and Reinhold Schmitt 2007. Koordination. Zur Begründung eines neuen
Forschungsgegenstandes. In: Reinhold Schmitt (ed.), Koordination. Analysen zur Multimoda-
len Interaktion, 15–54. Tübingen, Germany: Narr.
DuBois, John W., Stephan Schuetze-Coburn, Susanna Cumming and Danae Paolino 1993. Outline
of discourse transcription. In: Jane A. Edwards and Martin Lampert (eds.), Talking Data. Tran-
scription and Coding in Discourse Research, 45–89. Hillsdale, NJ: Lawrence Erlbaum.
Duranti, Alessandro 1997. Transcription: From writing to digitized images. In: Alessandro Duranti
(ed.), Linguistic Anthropology, 122–161. Cambridge: Cambridge University Press.
Edwards, Jane A. 1993. Principles and contrasting systems of discourse transcription. In: Jane A.
Edwards and Martin Lampert (eds.), Talking Data. Transcription and Coding in Discourse
Research, 3–31. Hillsdale, NJ: Lawrence Erlbaum.
Edwards, Jane A. 2001. The transcription of discourse. In: Deborah Schiffrin, Deborah Tannen and
Heidi Hamilton (eds.), The Handbook of Discourse Analysis, 321–348. Malden, MA: Blackwell.
Edwards, Jane A. and Martin Lampert (eds.) 1993. Talking Data. Transcription and Coding in Dis-
course Research. Hillsdale, NJ: Lawrence Erlbaum.
Ehlich, Konrad 1993. HIAT: A transcription system for discourse data. In: Jane A. Edwards and
Martin Lampert (eds.), Talking Data. Transcription and Coding in Discourse Research, 123–
148. Hillsdale, NJ: Lawrence Erlbaum.
Ekman, Paul and Wallace Friesen 1982. Measuring facial movement with the facial action coding
system. In: Paul Ekman (ed.), Emotion in the Human Face, 2nd edition, 178–211. Cambridge:
Cambridge University Press.
Franck, Dorothea 1989. Zweimal in den gleichen Fluß steigen? Überlegungen zu einer reflexiven,
prozeßorientierten Gesprächsanalyse. Zeitschrift für Phonetik, Sprachwissenschaft und Kom-
munikationsforschung 42(2): 160–167.
Frey, Siegfried 1977. Zeitreihenanalyse sichtbaren Verhaltens. Bericht des 30. Kongreß Deutsche
Gesellschaft für Psychologie, 328–329. Göttingen.
Frey, Siegfried, Hans-Peter Hirsbrunner, Jonathan Pool and William Daw 1981. Das Berner System zur
Untersuchung nonverbaler Interaktion I: Die Erhebung des Rohdatenprotokolls. In: Peter Winkler
(ed.), Methoden der Analyse von Face-to-Face-Situationen, 203–236. Stuttgart, Germany: Metzler.
Garfinkel, Harold 1963. A conception of, and experiments with, ‘trust’ as a condition of stable,
concerted action. In: O.J. Harvey (ed.), Motivation and Social Interaction. Cognitive Determi-
nants, 187–238. New York: Ronald Press.
Grice, H. Paul 1975. Logic and conversation. In: Peter Cole and Jerry L. Morgan (eds.), Syntax and Semantics, Volume 3: Speech Acts, 41–58. New York: Academic Press.
Gumperz, John 1982. Contextualization conventions. In: John Gumperz (ed.) Discourse Strategies.
Studies in Interactional Sociolinguistics, 130–152. Cambridge: Cambridge University Press.
Gumperz, John and Norine Berenz 1993. Transcribing conversational exchanges. In: Jane A. Ed-
wards and Martin Lampert (eds.), Talking Data. Transcription and Coding in Discourse
Research, 91–121. Hillsdale, NJ: Lawrence Erlbaum.
Heath, Christian 1986. Body Movement and Speech in Medical Interaction. Cambridge: Cambridge
University Press.
Heidtmann, Daniela and Marie-Joan Föh 2007. Verbale Abstinenz als Form interaktiver Beteili-
gung. In: Reinhold Schmitt (ed.), Koordination. Analysen zur Multimodalen Interaktion, 263–
292. Tübingen, Germany: Narr.
Jefferson, Gail 2004. Glossary of transcript symbols with an introduction. In: Gene Lerner (ed.),
Conversation Analysis. Studies from the First Generation, 13–31. Amsterdam: John Benjamins.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Relationship of Verbal and Nonverbal Communication, 207–227. The Hague:
Mouton.
Kendon, Adam 1982. Organization of behavior in face-to-face interaction. In: Klaus Scherer and
Paul Ekman (eds.), Handbook of Methods in Nonverbal Behavior Research, 440–505. Cam-
bridge: Cambridge University Press.
Kendon, Adam 2000. Language and gesture: Unity or duality? In: David McNeill (ed.), Language
and Gesture, 47–63. Cambridge: Cambridge University Press.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture, 162–185. Cambridge: Cambridge University Press.
Kowal, Sabine and Daniel O’Connell 2000a. Psycholinguistische Aspekte der Transkription: Zur
Notation von Pausen in Gesprächstranskripten. Linguistische Berichte 183: 353–378.
Kowal, Sabine and Daniel O’Connell 2000b. Zur Transkription von Gesprächen. In: Uwe Flick,
Ernst von Kardoff and Ines Steinke (eds.), Qualitative Forschung. Ein Handbuch, 437–447. Re-
inbek bei Hamburg, Germany: Rowohlt Taschenbuch.
Lévi-Strauss, Claude 1963. Structural Anthropology. New York: Basic Books.
Linell, Per 1998. Approaching Dialogue. Talk, Interaction, and Contexts in Dialogical Perspectives.
Amsterdam: John Benjamins.
MacWhinney, Brian 1995. The CHILDES-Project – Tools for Analyzing Talk, 2nd edition. Hills-
dale, NJ: Lawrence Erlbaum.
McNeill, David 1985. So you think gestures are nonverbal? Psychological Review 92(3): 350–371.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
Nothdurft, Werner 2006. Gesprächsphantome. Deutsche Sprache 34: 32–43.
Ochs, Elinor 1979. Transcription as theory. In: Elinor Ochs and Bambi Schieffelin (eds.), Develop-
mental Pragmatics, 43–72. New York: Academic Press.
Sacks, Harvey and Emanuel Schegloff 1974. Two preferences in the organization of reference of
persons in conversation and their interaction. In: N.H. Avison and J.R. Wilson (eds.), Ethno-
methodology: Labelling theory and deviant behavior. London: Routledge and Kegan Paul.
Sager, Sven F. 2005. Ein System zur Beschreibung von Gestik. Osnabrücker Beiträge zur
Sprachtheorie (OBST) 70: 19–47.
Scheflen, Albert 1966. Natural history method in psychotherapy: communicational research. In:
Louis Gottschalk and Arthur Auerbach (eds.), Methods in Research in Psychotherapy, 263–
289. New York: Appleton-Century-Crofts.
Schegloff, Emanuel 1984. On some gestures’ relation to talk. In: Maxwell Atkinson and John Her-
itage (eds.), Structures of Social Action. Studies in Conversation Analysis, 266–296. Cambridge:
Cambridge University Press.
Schmitt, Reinhold 2005. Zur multimodalen Struktur von turn-taking. Gesprächsforschung –
Online Zeitschrift zur verbalen Interaktion 6: 17–61.
Schmitt, Reinhold (ed.) 2007. Koordination. Analysen zur Multimodalen Interaktion. Tübingen,
Germany: Narr.
Selting, Margret 2001. Probleme der Transkription verbalen und paraverbalen/prosodischen Ver-
haltens. In: Klaus Brinker, Gerd Antos, Wolfgang Heinemann and Sven F. Sager (eds.), Text-
und Gesprächslinguistik./Linguistics of Text and Conversation. Ein Internationales Handbuch
Zeitgenössischer Forschung. An International Handbook of Contemporary Research, 2. Halb-
band/Volume 2, 1059–1068. Berlin: Walter de Gruyter.
Selting, Margret, Peter Auer, Birgit Barden, Jörg Bergmann, Elizabeth Couper-Kuhlen, Susanne
Günthner, Christoph Meier, Uta Quasthoff, Peter Schlobinski and Susanne Uhmann 1998. Ge-
sprächsanalytisches Transkriptionssystem (GAT). Linguistische Berichte 173: 91–122.
Selting, Margret, Peter Auer, Dagmar Barth-Weingarten, Jörg Bergmann, Pia Bergmann, Karin
Birkner, Elizabeth Couper-Kuhlen, Arnulf Deppermann, Peter Gilles, Susanne Günthner,
Martin Hartung, Friederike Kern, Christine Mertzlufft, Christian Meyer, Miriam Morek,
Frank Oberzaucher, Jörg Peters, Uta Quasthoff, Wilfried Schütte, Anja Stukenbrock and
Susanne Uhmann 2009. Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Gesprächs-
forschung. Online-Zeitschrift zur Verbalen Interaktion 10: 353–402.
Steinke, Ines 2000. Gütekriterien qualitativer Forschung. In: Uwe Flick, Ernst von Kardoff and
Ines Steinke (eds.), Qualitative Forschung. Ein Handbuch, 319–331. Reinbek bei Hamburg,
Germany: Rowohlt Taschenbuch.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.
Abstract
The chapter sketches a set of analytic heuristics, developed at the University of Chicago, for observing and recording observations of how gesture and speech co-occur in natural interactive human discourse. After discussing theoretical as well as methodological issues concerning tools and data, the chapter presents the set of methods for the transcription, observation, and annotation of speech and gestures. Finally, the chapter outlines the analysis of gesture meanings and functions in discourse by discussing the meaning-driven essence and the reciprocal nature of transcribing gestures with speech.
1. Introduction
This is a sketch of some essentials of a method for observing and recording observations
of how gesture and speech co-occur in natural interactive human discourse. The method
is not a “coding system”. It is a set of analytic heuristics emerging from decades of work
by many scholars of GESTURE who have collaborated in research led by David McNeill in
the Psychology Department at the University of Chicago. (Note that “GESTURE” in spe-
cial font signals the problematic nature of this term. In this treatment of the bodily di-
mensions of natural language use, the everyday senses of this term are at once too
narrow and too broad. The notes as to what constitutes gesture in the observation and
analysis framework sketched here do not constitute a full or adequate definition of the
term.) Application of these heuristics to multimodal discourse data has the goal of build-
ing a base of descriptive facts that contribute to a theory of human language, rather than
to a theory of “nonverbal behavior”. “Growth Point” theory (McNeill 2005) derives from
these facts. This theory holds that intervals of language use issue from “idea units”
possessing both gradient-imagistic and discrete-categorial semiotic properties.
The notion of GESTURE relevant to this sketch is at the level of modes of semiosis:
gradient-imagistic versus discrete-categorial. Gradient semiosis in the visuo-spatio-
motoric modality, as manifested in gestures and other embodiments, is a focus of obser-
vation, annotation, and analysis. Gradient semiosis is also a focus in the verbal-vocal
modality where it manifests, for example, in speech prosodic contouring. Likewise,
the visuo-spatio-motoric modality is understood to support categorial semiosis as well
as gradient, for example, conventionalized gestures that have standards of form. The
analytic goal supported by the transcription approach sketched here is a systematic
study of how gradient and categorial modes of semiosis, irrespective of the modality
in which they are realized, mesh in real-time language production. In practice, for
the most part, this amounts to observing and annotating phases of gesture production
and speech prosodic contouring and how these synchronize with lexical, grammatical,
and discourse constituents in co-occurring speech. Examination of these temporal rela-
tionships promotes insights into the nature of combined gradient and categorial mean-
ing generation from moment-to-moment and across extended intervals of discourse
time.
An adequate description of the “McNeill method” for transcription and annotation
of multimodal discourse faces at least three challenges. First, there is no single method.
Each scholar whose work has helped shape the research paradigm has taken a distinc-
tive approach to transcribing gesture with speech, framed by a specific research agenda.
This sketch attempts to highlight certain commonalities among these approaches, in the
belief that these are largely driven by the observable facts of natural discourse, rather
than by affiliation with the research group. A second challenge has to do with defining
the range of empirically observable phenomena in natural language use to which the
method may be said to apply. Best practice continues to evolve in tandem with our
understanding of the target domain. This has meant that the number of phenomena
meeting criteria for the designation gesture has tended to increase.
The third challenge is misconceptions about this method, widespread in the research
community and resistant to change. A significant misconception is that the core activity
consists of parsing discrete gestural entities out of a stream of multimodal discourse and
labeling each as belonging to a single one of a set of mutually exclusive categories or
“gesture types” (e.g., iconic, metaphoric, beat, deictic, etc.). Something like this typo-
logical approach, which unfortunately confounds semiotic, morphological, production,
and social-interactional characteristics of gesturing in unsystematic ways, did character-
ize the research group’s early (1980s) efforts to gain traction in describing complex ges-
tural phenomena in narrative discourses. The Procedures Appendix of Hand and Mind
(McNeill 1992) reflected the early efforts. By the time of publication of that influential
volume, the group was coming to see gesturing as comprising phenomena with varying
temporal extents and consistencies of production, and multiple simultaneous and over-
lapping dimensions of semiosis and function in the discourse. Restrictive coding
schemes can have utility in narrow, targeted analyses. However, a tenet of the method
sketched here is that such schemes tend to be “non-neutral technologies” and, in apply-
ing them to natural discourses, the analyst must not become blind to the fluid multidi-
mensionality of every interval of gesture production, nor to the great variability of
gesturing across individuals. Any scheme for observation and annotation should be
employed in such a way that the annotator retains the ability to detect instances of ca-
tegorial blur that may have implications for descriptive adequacy and also remains open
to discovering the influence of additional, unlooked-for dimensions of patterning in a
dynamically-evolving natural discourse.
There are further misconceptions, including that the method does not treat socially-
constituted, conventionalized (“emblem” or “quotable”) gestures – it does; also, that it
concerns only what an individual speaker does with his or her hands – it does not. No
adequate empirical justification has been put forward for excluding semiotically valued
activity or attitudes of body parts other than the hands (head, feet, eyes, shoulders,
torso, whole body), or of the vocal apparatus (prosody) in discourse, nor of such embodiments when they are dyadically constituted. The method sketched here aims to meet the
challenge of encompassing all of these.
Keeping the elicitation stimulus constant across all groups augments the certainty with which gesture meanings are
inferred. It also increases the power of a variety of cross-speaker-group comparisons.
Thus, elicitation method, annotative practice, and theoretical framework interpenetrate.
The speech transcript typically includes:
(i) words,
(ii) word partials,
(iii) speech repairs,
(iv) breaths,
(v) speech pauses,
(vi) laughs,
(vii) and various other non-speech sounds.
General practice has been to avoid punctuation (comma, period, etc.) so as not to force
sentential notions on the variously constituted utterances of natural discourse. The
speech of different participants is transcribed on separate lines in a text file or on
separate tiers in an annotation interface.
(1) … he tries / clIMBing up the drAINpipe / uh / and he gets ALL the way Up there …
(2) … he / tr[[ies going up][the inSI][Ide of the drAINpipe #]] and twEEty bird runs …
(3) … the next thin][[g you know he comes swINGin][g thrOUgh on a rOpe]] and then
he jUst misses the wINdow [and he smASHes into the brick wAll beSIde it # …
Some annotators make an initial pass across an entire discourse annotating speech pro-
sodic features such as intervals of peak emphasis. In excerpts (2)–(3), small capitals
mark intervals of audible prosodic emphasis (higher pitch, longer syllable duration, in-
creased loudness). Where speech is transcribed in Praat (Boersma and Weenink 2012)
“TextGrids”, instrumental values for such features may be generated from the acoustic
signal and written to media- and transcript-synchronized tiers. In either case, the exer-
cise of identifying intervals of peak speech prosodic emphasis across a discourse has the
value of highlighting intonational structure as it pertains to contrastive discourse focus.
This can facilitate subsequent gesture annotation passes.
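By way of illustration, the following sketch uses the parselmouth Python interface to Praat to flag candidate intervals of prosodic emphasis from pitch and intensity alone (syllable duration is ignored); the file name and the percentile threshold are illustrative assumptions, not part of the practice described above.

```python
# Sketch: flag candidate intervals of prosodic emphasis from pitch and intensity.
# Assumes the parselmouth interface to Praat; "sample.wav" and the 80th-percentile
# criterion are illustrative choices, not prescribed by the method described above.
import numpy as np
import parselmouth

snd = parselmouth.Sound("sample.wav")

pitch = snd.to_pitch()
f0 = pitch.selected_array['frequency']      # Hz; 0 where unvoiced
f0_times = pitch.xs()

intensity = snd.to_intensity()
db = intensity.values[0]                    # dB
db_times = intensity.xs()

# Interpolate intensity onto the pitch time base so both cues align.
db_on_f0 = np.interp(f0_times, db_times, db)

voiced = f0 > 0
f0_thr = np.percentile(f0[voiced], 80)
db_thr = np.percentile(db_on_f0[voiced], 80)

# A frame counts as "emphatic" when both cues are high; contiguous frames
# are then merged into intervals that could be written to a TextGrid tier.
emphatic = voiced & (f0 >= f0_thr) & (db_on_f0 >= db_thr)

intervals, start = [], None
for t, e in zip(f0_times, emphatic):
    if e and start is None:
        start = t
    elif not e and start is not None:
        intervals.append((start, t))
        start = None
if start is not None:
    intervals.append((start, f0_times[-1]))

for s, e in intervals:
    print(f"candidate emphasis: {s:.2f}-{e:.2f} s")
```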
As concerns production properties of coverbal gestures in their temporal relation to
speech, some analysts choose to organize their efforts around the contrast between
movement phases and hold phases of bodily activity (Park-Doob 2010). A more widely
used heuristic centers on the notions of gesture “phrases” and “phases” (Kendon 1972,
1980) and the notion of gesture-speech production “pulse” (Tuite 1993). In either case,
the core activity is one of assessing the temporal and semantic associations between the
phases of bodily activity or stasis and intervals of spoken discourse.
A gesture phrase is idealized as the entire interval a hand or other body part is in
motion, starting from an unmarked position of rest, through performance of some
movement or position that the observer perceives as having semiotic value in the dis-
course, and then returning to rest. In (2) and (3), above, square brackets enclose inter-
vals of gestural movement that are observably meaningfully related both to what the
speaker is saying at the moment and to what she witnessed in a cartoon. In the first
bracketed phrase of, “tr[[ies going up][the insi][ide of the drainpipe #]]”, the speaker
lifts her hand from her lap while saying “tries”. Her hand shape is an extended index
finger pointing up. She moves it up in front of her torso. In each of the next two inter-
vals, the hand drops back down (without losing the pointing shape or returning to rest)
and then repeats the movement up. The outer brackets enclosing all three phrases re-
flect the annotator’s judgment that they share features making them some kind of unit.
Intervals with no accompanying gesture phrase, as in (1), are not annotated. Where
“music score” annotation interfaces are used, intervals of gesture phrase are marked not with typographic conventions on the speech text but on a tier synchronized to the raw audio-video and to the speech text.
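To make the tier-based alternative concrete, the following sketch (with invented time values and field names) shows how the gesture phrases of excerpt (2) might be stored as time-stamped records on their own tier rather than as typographic marks in the speech text.

```python
# Sketch: a tier-based record of the gesture phrases in excerpt (2).
# The time stamps and field names are hypothetical; an annotation tool would
# store equivalent information on a tier aligned with the audio-video media.
from dataclasses import dataclass
from typing import List

@dataclass
class GesturePhrase:
    start: float          # seconds into the media file
    end: float
    note: str = ""        # free-form observational note
    unit_id: str = ""     # groups phrases judged to form a larger unit

# The three bracketed phrases of "tr[[ies going up][the inSI][Ide of the drAINpipe #]]",
# grouped by a shared unit_id corresponding to the outer brackets.
phrases: List[GesturePhrase] = [
    GesturePhrase(12.40, 13.05, "index finger points up, hand moves up", unit_id="U1"),
    GesturePhrase(13.05, 13.60, "hand drops, repeats upward movement", unit_id="U1"),
    GesturePhrase(13.60, 14.35, "hand drops, repeats upward movement", unit_id="U1"),
]

for p in phrases:
    print(p.unit_id, f"{p.start:.2f}-{p.end:.2f} s", p.note)
```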
The internal phase structure of gesture phrases is annotated next. A gesture stroke
phase is generally taken to be the interval within a phrase when an interpretable mean-
ing manifests and/or the interval of apparent greatest gestural effort. The annotator at-
tempts to infer a gesture stroke’s meaning through a “sense-making” process that takes into account the accompanying speech, the current point in the discourse, and, where available, the content of the stimulus the speaker is describing.
In (2) and (3), above, bold font indicates stroke phases within the gesture phrases. In (2)
the annotator judged the intervals of upward movement synchronized with “up”,
“inside”, and “drainpipe” to convey meaning in relation to cartoon imagery, the current
point in the story line, and the accompanying speech. Note that strokes are not always
movement phases. “Hold strokes” are frequently seen. A held configuration of the
hands depicting a static object, for example, may be judged a stroke phase, in a given
discourse context.
In the idealized gesture phrase, a preparation phase is movement from rest to stroke
onset. A retraction is movement from stroke offset back to rest. In text-based
transcripts, with the annotation of gesture phrases and the stroke phases within them,
preparation and retraction phases are de facto also annotated. In (2), “insi][ide of the
drainpipe #]]”, the preparation phase of the second phrase synchronizes with “ide of
th” in speech; the retraction with “pipe #”.
Several phenomena complicate observation and annotation of gesture phases.
Among them, first, gesture phrases often do not begin from rest positions nor end
with retractions to rest positions. Second, gesture phrases sometimes “abort” before
their stroke phases are fully realized, a common occurrence in conjunction with self-
interrupted, disfluent speech. Third, “hold” phases often occur within the gesture
phrase. A pre-stroke hold is when a gesturing hand momentarily halts between prepa-
ration and stroke. A post-stroke hold is when the hand halts for a moment after the
stroke phase, before retracting or beginning the next gesture phrase. In (2) and (3),
above, underlining indicates gestural holds. In the stroke phase synchronized with
“smashes” in (3) the speaker thrusts her tensed, flat palm away from her body and
then stops it suddenly, holding for the duration of “into”. Post-stroke holds are analyzed
as perseveration of a gestured image to encompass further speech constituents belonging
to the core idea unit from which the gesture-speech utterance is generated. Pre-stroke
holds have a different analytic significance, being seen as evidence that stroke phases tar-
get just those speech constituents that are focal in the idea unit. That fully formed ges-
tures may halt in their progression so as to time with particular speech constituents is
evidence that the synchrony of gesture and speech is not random, making synchrony
assessment seem a plausible approach to exploring multimodal semiosis in language
use. In order to achieve the required degree of accuracy in these assessments, most an-
notators find it necessary to examine gestural movements and holds repeatedly at different
playback speeds. Observation of phase micro-timing is important not only if the planned
analyses are to draw on speech-gesture synchrony findings, but also because close scru-
tiny of phases of gestural movement often reveals new dimensions of patterning and
spurs reassessment of gesture phrase judgments made in the previous pass.
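As a simplified illustration of such synchrony assessments, the sketch below takes a hypothetical stroke phase and post-stroke hold (with invented time values) and computes which words they overlap and the offset between stroke onset and the onset of its target word.

```python
# Sketch: assessing stroke-word synchrony from time-stamped annotations.
# The stroke, hold, and word intervals are invented for illustration; in practice
# they would come from the gesture and speech tiers of the annotation file.
stroke = (21.84, 22.12)                # stroke phase synchronized with "smashes"
post_stroke_hold = (22.12, 22.28)      # hold persevering over following speech

words = [
    (21.60, 21.83, "he"),
    (21.83, 22.12, "smashes"),
    (22.12, 22.28, "into"),
    (22.28, 22.45, "the"),
]

def overlap(a, b):
    """Duration (in seconds) for which two (start, end) intervals co-occur."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

# Which word does the stroke target? Here: the word with the largest overlap.
target = max(words, key=lambda w: overlap(stroke, (w[0], w[1])))
onset_offset_ms = (stroke[0] - target[0]) * 1000
print(f"stroke overlaps '{target[2]}'; onset offset = {onset_offset_ms:+.0f} ms")

# The post-stroke hold perseveres over the following speech constituents:
held_over = [w for (s, e, w) in words if overlap(post_stroke_hold, (s, e)) > 0]
print("post-stroke hold spans:", " ".join(held_over))
```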
Such meaning indeterminacy, noted earlier, may reside at the level of the totality of information
there is to muster, within the universe defined by the collected audio-video sample, in
support of judgments about a phrase parse or the meaning of a stroke phase. It may al-
ternatively derive from the level of speaker thinking, which at moments in a discourse
may be indeterminate or ambiguous itself and thus manifests as gestures whose mean-
ings are difficult to infer.
The method for transcribing gestures with speech sketched here is one of hypothesis
formulation, testing, and revision of analytic judgments at every level (i.e., location of
phrase boundaries, identification of stroke phases, inferences of meaning and semiotic
dimensions in play, etc.). Annotation is a backward-adjusting process as the observer-
annotator works through a discourse sample or multiple related samples, amassing in-
sights that prompt (provisional) acceptance or rejection of hypotheses about particular
gesture-speech productions. As noted, gestures pattern on multiple levels simulta-
neously and are multifunctional. Therefore, multiple hypotheses that annotators may
generate about a given gesture’s meaning or function may all be supportable and should
all be recorded in the process of transcribing gesture with speech. Some hypotheses,
however, will truly be in competition with one another at a particular level of analysis.
For such, the hope is that evidence accumulating from iterative passes over the dis-
course as a whole will aggregate in support of one hypothesis over any other for
every interval of gesture-speech production in a discourse.
5. References
Boersma, Paul and David Weenink 2012. Praat: Doing phonetics by computer [Computer pro-
gram]. Version 5.3.23, http://www.praat.org, Accessed 07.08.2012.
Brugman, Hennie and Albert Russel 2004. Annotating multimedia/multi-modal resources with
ELAN. Max Planck Institute for Psycholinguistics, The Language Archive, Nijmegen, the
Netherlands, http://tla.mpi.nl/tools/tla-tools/elan.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron Wolfe Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–210. New York: Pergamon Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), The Relation between Verbal and Nonverbal Behavior, 207–227. The Hague:
Mouton.
Kipp, Michael 2012. Anvil: The video annotation research tool [Computer Program]. Version
5.1.3. http://www.anvil-software.de/download/index.html, Accessed 14.08.2012.
McNeill, David 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Park-Doob, Mischa A. 2010. Gesturing through time: Holds and intermodal timing in the stream
of speech. Unpublished Ph.D.dissertation, Department of Linguistics, University of California,
Berkeley.
Schmidt, Thomas, Susan Duncan, Oliver Ehmer, Jeffrey Hoyt, Michael Kipp, Dan Loehr, Magnus
Magnusson, Travis Rose and Han Sloetjes 2008. An exchange format for multimodal annotations.
In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios
Piperidis and Daniel Tapias (eds.), Proceedings of the 6th Language Resources and Evaluation
Conference, 359–365. Paris & Luxembourg: The European Language Resources Association.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1–2): 83–105.
Abstract
The chapter presents a concise overview of annotation tools for multimodal data. After
a short introduction to the topic, the chapter traces the history of the development of annotation tools before focusing on their recent advances and comparing the capabil-
ities and drawbacks of existing tools. The last section of the chapter concentrates
on a short description of some representative tools (ELAN, ANVIL, Transana, and
EXMARaLDA) and finishes with some concluding remarks on annotation tools and
their influence on the analytic perspective on gestures.
1. Introduction
Systematic analyses of multimodal communication are essential to scientific advance-
ment in the study of language, learning, and human interaction. At the core of these
human capabilities are multimodal acts of communication through which interlocutors
establish shared knowledge and bonds of empathy and make their thinking and emo-
tions manifest in multiple modalities (verbal/vocal, gestural, facial, bodily orientation
and movement). Communicative interactions are contextualized in time and place
with respect to relationship history of the interacting parties, evolution of the immedi-
ate discourse, type of discursive interaction (story telling, planning, free conversation,
instruction), and the local physical and interpersonal environment (e.g., arrangement
and number of interlocutors, presence of manipulable objects, texts, videos, photos, dia-
grams, and other artifacts). These multiple dimensions of communicative interactions
are complexly configured, subtle and nuanced, making comprehensive study of the
whole a significant methodological challenge for anthropology, psychology, and linguis-
tics, as well as computer science and engineering efforts concerned with modeling
human communicative interaction behavior. With respect to audio-video recording
and automated capture of biophysiological, motion, and other dimensions of human
communication, we have reached a stage of technological development where research-
ers are able to assemble large amounts of instrumentally captured data on a variety of
behavioral dimensions. Researchers now aspire to combine instrumentally-automated
analyses of dimensions of communicative interactions, such as prosodic features ex-
tracted from the speech acoustic signal, bodily movements detected by motion capture
from video, gaze shifts captured with eye tracking, breath kinematic data (e.g., Gullberg
and Kita 2009; McFarland 2001; McNeill et al. 2001; Wang and Levow 2011,
respectively) with human-annotated features such as discourse topic shifts, intonation
units, the meanings of gestures, syntactic constituents, backchanneling, interactional
synchrony (Kimbara 2006; Knight 2011; Loehr 2004; McNeill and Levy 1982;
McNeill et al. 2001; Parrill 2008, respectively), and many others. Any hope of bringing
such a variety of instrumentally- and human-assessed dimensions of communicative
interaction together so as to make patterning among them accessible to exploration
rests with continuing development of software interfaces for visualization and annota-
tion of complex multimodal data, coupled with database structures capable of
supporting useful queries.
A growing number of multimodal annotation tools are available to end users either free of charge or for a nominal fee. The proliferation of
such tools is promising in that it signals widespread engagement with the challenge
of systematizing the work of observing and analyzing multimodal communication
data, as well as with facilitating the time-consuming aspects of this work. However, it
also has problematic aspects. With so many tools available, with apparently overlapping
capabilities, it is difficult for researchers to know how to choose among them. In addi-
tion, many multimodal annotation tools have short lives, having been developed on
platforms that become obsolete or by student researchers who move on to other pro-
jects, leaving them unsupported. The likelihood of these eventualities can be difficult
to gauge prospectively.
A significant portion of the proliferative development is due to researchers develop-
ing their own tools “in-house”. Examples of this are TASX (Milde and Gut 2002) and
MacVisSTA (Rose, Quek, and Shi 2004), two full-featured tools for creating and mana-
ging corpora of linguistic and behavioral data based on audio and video sources, as well
as, in the case of MacVisSTA, data from motion capture. Though the developers of
these and other such packages typically have the goal of sharing with other researchers,
in practice, most in-house efforts remain confined to their research groups of origin and
tend to have relatively short life spans. Learnability and usability are often problem
areas with academic researcher-designed software, which typically does not come
with a comprehensive user’s manual. Not being intended for commercial release,
such packages are unlikely to have received extensive testing and so may not be very
robust. Further, researchers seeking an annotation tool appropriate for their work
may not be able to tell, without investing a lot of learning time, whether particular
packages have capabilities they require or will match their research style. Such factors
push researchers in the direction of developing their own tools, thus furthering the cycle
of proliferative development.
Surveys or systematic comparisons of various multimodal annotation tools avail-
able at particular times over the past fifteen years have appeared in print (e.g.,
Allwood et al. 2001; Bigbee, Loehr, and Harper 2001; Bird and Liberman 2000; Silver
and Patashnick 2011) and at several conference workshops, including those held at
International Society of Gesture Studies meetings (Rohlfing et al. 2006; Rohlfing
and Duncan 2007), and recently at a conference hosted by the Netherlands Associa-
tion for Qualitative Research (KWALON). Organizers of the “KWALON Experi-
ment” tasked five developers of tools for qualitative data analysis with analyzing
the same dataset and then writing reflective papers about the experience
that would enable potential users to get a sense of how the tools compare (Evers
et al. 2011). The results are gathered in a special issue of FQS Forum: Qualitative
Social Research (see Evers et al. 2011; also, Woods and Dempster 2011). In addition,
Silver and Lewins (2009) and Fielding and Silver (2012) are helpful concerning
factors researchers may consider in selecting tools to match their research goals
and style.
Despite the long and varied use of visual data, there seems to be a lack of cross-fertilization
of methods and tools between disciplines. In parallel to the development of CAQDAS
(Computer Assisted Qualitative Data Analysis) packages, tools derived from educational,
behavioural, linguistic, and psychological perspectives offer some quite distinct analytic pos-
sibilities for audiovisual data. These include ELAN […] and Observer. These packages,
however, tend not to be rooted in qualitative social research traditions, usually taking a
more quantitative approach to the analysis of audiovisual data.
Music-score annotation platforms such as ELAN and ANVIL support speech transcription and markup; however, many researchers who use such platforms seem to prefer to use the software Praat (Boersma and Weenink 2012), or other similar software designed specifically to facilitate speech transcription as well as phonetic analysis of speech, to create initial transcriptions of the speech in audio-video data samples. Praat
TextGrids and file formats supported by other speech transcription software are readily
importable into interfaces like ELAN and ANVIL.
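The conversion involved can be illustrated with a rough sketch that extracts interval boundaries and labels from a long-format Praat TextGrid using regular expressions and writes them out as tab-separated rows; the file names are placeholders, and a dedicated TextGrid-parsing library would be more robust in practice.

```python
# Sketch: flatten a long-format Praat TextGrid into tab-separated rows
# (tier name, start, end, label). File names are placeholders; a dedicated
# TextGrid-parsing library would handle edge cases more robustly.
import re

with open("sample.TextGrid", encoding="utf-8") as fh:
    grid = fh.read()

rows = []
# Split the file into tier blocks ("item [n]:"), then read each interval block.
tier_blocks = re.split(r'item \[\d+\]:', grid)[1:]
for block in tier_blocks:
    name_match = re.search(r'name = "(.*?)"', block)
    tier_name = name_match.group(1) if name_match else "unnamed"
    for m in re.finditer(
        r'intervals \[\d+\]:\s*xmin = ([\d.]+)\s*xmax = ([\d.]+)\s*text = "(.*?)"',
        block,
    ):
        start, end, label = m.group(1), m.group(2), m.group(3)
        if label.strip():                      # skip empty intervals
            rows.append((tier_name, start, end, label))

with open("sample_tiers.txt", "w", encoding="utf-8") as out:
    for tier_name, start, end, label in rows:
        out.write(f"{tier_name}\t{start}\t{end}\t{label}\n")

print(f"wrote {len(rows)} labelled intervals")
```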
Transana (Woods and Dempster 2011) and EXMARaLDA (Schmidt and Wörner
2009) are examples of what Silver and Patashnick (2011) would call a “CAQDAS”
package. Both have their roots in the analytic needs of linguists working within the
Conversation Analysis tradition. Both are easy for non-expert computer users to get
started with. Their user interfaces give an experience of accumulated transcription
and annotation data rather different from that provided by the “music score” interfaces,
although something similar to that framework is also configurable in these interfaces. In
Transana multiple transcripts, each a text describing events on one behavioral level
in an audio-video sample, are built in multiple passes over the data. An example is a
verbatim transcription of spoken utterances. Phrases of coverbal gesture would also
be entered as transcripts, intervals of which synchronize with the corresponding inter-
vals in the audio-video sample and with the corresponding transcribed utterance inter-
vals, permitting examination of temporal co-occurrences. The audio-video sample may
be played with multiple time-aligned transcripts synchronized in a visualization format
reminiscent of a speech transcript with interlinears, each capturing different analytic
layers. Intervals may be accessed for playback either from the audio-video media or
transcript intervals. Notes are easy to incorporate. Intervals with features in common
can be excerpted as clips and aggregated as collections. Transana supports a number
of simple but very useful data visualizations that assign different colors to events in dif-
ferent associated transcripts, permitting the user to visually scan for patterning across
lengths of discourse. Data visualizations useful for the human user are also a strong
point of EXMARaLDA which, in addition to making a variety of visualizations with
layering of annotated behavioral dimensions available in the computer interface, also
permits printing these out in a standard transcript format. Interoperability with other
tools has been a development priority for EXMARaLDA and it is particularly strong
in this regard.
4. Concluding thoughts
Ihde (1979) has argued that analytic tools unavoidably select, amplify and reduce as-
pects of experience in various ways. Mowshowitz (1976: 8) notes that, “tools insist on
being used in particular ways.” In these senses, tools such as those used for analysis
of multimodal communication are not “neutral”, therefore their use must inevitably
contribute to shaping the way we perceive and interpret the communication phenom-
ena we study. Silver and Patashnick (2011) caution researchers to be wary of this and
to document their analytic procedures so as to make them transparent and available
to critique. No multimodal annotation tool is without disadvantages in relation to
some analytic goals. Each enables or impedes different aspects of the observation
and analytic effort. There is no “best” tool for research in multimodal communication.
In fact, anecdotal report suggests that most researchers employ more than one such tool
to support different aspects of their work.
5. References
Allwood, Jens, Leif Grönqvist, Elisabeth Ahlsen and Magnus Gunnarssan 2001. Annotations
and tools for an activity-based spoken language corpus. In: Jan van Kuppevelt and Ronnie W.
Smith (eds.), Proceedings of the 2nd SIGdial Workshop of Discourse and Dialogue 16, 1–10.
Morristown, NJ: Association for Computational Linguistics.
Bigbee, Anthony, Dan Loehr and Lisa Harper 2001. Emerging requirements for multi-modal
annotation and analysis tools. In: Proceedings, Eurospeech 2001 Special Event: Existing and
Future Corpora – Acoustic, Linguistic, and Multi-modal Requirements. Aalborg, Denmark.
Bird, Steven and Mark Liberman 2000. A formal framework for linguistic annotation. Speech
Communication 33: 23–60.
Boersma, Paul and David Weenink 2012. Praat: doing phonetics by computer [Computer pro-
gram]. Version 5.3.23, retrieved 7 August 2012 from http://www.praat.org/
Condon, William S. and William D. Ogston, 1966. Sound film analysis of normal and pathological
behavior patterns. Journal of Nervous and Mental Disease 143: 338–347.
Efron, David 1972. Gesture, Race, and Culture. Berlin: De Gruyter Mouton. First published [1941].
Evers, Jeanine, Katja Mruck, Christina Silver, Baart Peeters, Silvana di Gregorio and Clare Tagg
2011. The KWALON Experiment: Discussions on Qualitative Data Analysis Software by Devel-
opers and Users. FQS Forum: Qualitative Social Research 12(1): Art. 40.
Fielding, Nigel and Christina Silver, 2012. Choosing an appropriate CAQDAS package. Retrieved
15.08.2012, from http://www.surrey.ac.uk/sociology/research/researchcentres/caqdas/support/
choosing/
Goodwin, Charles 2007. Environmentally coupled gestures. In: Susan Duncan, Justine Cassell and
Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language, 195–212. Amsterdam:
John Benjamins.
Gullberg, Marianne and Sotaro Kita 2009. Attention to speech-accompanying gestures: Eye
movements and information uptake. Journal of Nonverbal Behavior 33(4): 251–277.
Ihde, Don 1979. Technics and Praxis. (Boston Studies in the Philosophy of Science, Volume 24.)
Dordrecht: Reidel.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegmann
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–210. New York: Pergamon.
Kimbara, Irene 2006. On gestural mimicry. Gesture 6(1): 39–61.
Kipp, Michael 2012. Multimedia annotation, querying and analysis in ANVIL. In: Mark T.
Maybury (ed.), Multimedia Information Extraction, 351–368. Los Alamitos, CA: Wiley-IEEE
Computer Society Press.
Knight, Dawn 2011. Multimodality and Active Listenership: A Corpus Approach. London: Contin-
uum International.
Loehr, Dan 2004. Intonation and gesture. Unpublished doctoral dissertation, Department of Lin-
guistics, Georgetown University, Washington, DC.
Loehr, Dan and Lisa Harper 2003. Commonplace tools for studying commonplace interactions:
Practitioner’s notes on entry-level video-analysis. Visual Communication 2(2): 225–233.
McFarland, David H. 2001. Respiratory markers of conversational interaction. Journal of Speech
and Language Research 44(1): 128–143.
McNeill, David and Elena Levy 1982. Conceptual representation in language activity and gesture.
In: Robert J. Jarvella and Wolfgang Klein (eds.), Speech, Place, and Action, 271–295. New
York: John Wiley and Sons.
McNeill, David, Francis Quek, Karl E. McCullough, Susan Duncan, Nobuhiro Furuyama,
Robert Bryll, Xin-Feng Ma and Rashid Ansari 2001. Catchments, prosody, and discourse. Gesture
1: 9–33.
Milde, Jan-Torsten and Ulrike Gut 2002. The TASX-environment: An XML-based toolset for time
aligned speech corpora. In: Manuel González Rodríguez and Carmen Paz Suarez Araujo
(eds.), Proceedings of the 3rd International Conference on Language Resources and Evaluation
(LREC 2002), 1922–1927. Las Palmas de Gran Canaria, Spain: European Language Resources
Association (ELRA).
Mowshowitz, Abbe 1976. The Conquest of Will: Information Processing in Human Affairs. Read-
ing, MA: Addison-Wesley.
Parrill, Fey 2008. Subjects in the hands of speakers: An experimental study of synactic subject and
speech-gesture integration. Cognitive Linguistics 19(2): 283–299.
Rohlfing, Katharina, Dan Loehr, Susan Duncan, Amanda Brown, Amy Franklin, Irene Kimbara,
Jan-Torsten Milde, Fey Parrill, Travis Rose, Thomas Schmidt, Han Sloetjes, Alexandra Thies,
and Sandra Wellinghoff 2006. Comparison of multimodal annotation tools – workshop report.
In: Gesprächsforschung 7: 99–123.
Rohlfing, Katharina and Susan Duncan 2007. Workshop: Annotation tools. Held at Integrating
Gestures: The International Society for Gesture Studies 3rd International Conference, North-
western University, Evanston, Illinois.
Rose, Travis, Francis Quek and Yang Shi 2004. MacVisSTA: A system for multimodal analysis. In:
Rajeev Sharma, Trevor Darrell, Mary P. Harper, Gianni Lazzari, and Matthew Turk (eds.),
Proceedings of the 6th International Conference on Multimodal Interfaces, 259–264. New
York: Association for Computing Machinery.
Schmidt, Thomas, Susan Duncan, Oliver Ehmer, Jeffrey Hoyt, Michael Kipp, Dan Loehr, Magnus
Magnusson, Travis Rose and Han Sloetjes 2008. An exchange format for multimodal annota-
tions. In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk,
Stelios Piperidis, and Daniel Tapias (eds.), Proceedings, Sixth International Conference on Lan-
guage Resources and Evaluation, LREC 2008, 359–365. Marrakech, Morocco: European Lan-
guage Resources Association (ELRA).
Schmidt, Thomas and Kai Wörner 2009. EXMARaLDA – Creating, analysing and sharing spoken
language corpora for pragmatic research. Pragmatics 19: 565–582.
Silver, Christina and Ann Lewins 2009. Choosing a CAQDAS package. Working Paper #001, 6th
edition. CAQDAS Networking Project and Qualitative Innovations in CAQDAS Project
(QUIC). Retrieved 15.08.2012 from http://eprints.ncrm.ac.uk/791/1/2009ChoosingaCAQDAS
Package.pdf.
Silver, Christina and Jennifer Patashnick 2011. Finding fidelity: Advancing audiovisual analysis
using software. FQS Forum: Qualitative Social Research 12(1): Art. 37.
Wang, Siwei and Gina-Anne Levow 2011. Contrasting multi-lingual prosodic cues to predict
verbal feedback for rapport. In: Yuji Matsumoto and Rada Mihalcea (eds.), Proceedings of
the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers,
614–619. Portland, OR: Association for Computational Linguistics.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006.
Elan: A professional framework for multimodality research. In: Nicoletta Calzolari, Aldo
Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, and Daniel Tapias (eds.), Proceedings
of the 4th Language Resources and Evaluation Conference (LREC 2006), 1556–1559. Genoa,
Italy: European Language Resources Association (ELRA).
Woods, David K. and Paul G. Dempster 2011. Tales from the bleeding edge: The qualitative ana-
lysis of complex video data using Transana. FQS Forum: Qualitative Social Research 12(1),
Art. 17.
Abstract
The NEUROGES coding system is a research tool for the empirical analysis of the spon-
taneously displayed hand movement behaviour that accompanies interaction, thinking,
and emotional experience. NEUROGES is designed for diagnostic purposes as well as
for basic research, i.e., to further explore the anatomy of hand movement behaviour
and its relation to cognitive, emotional, and interactive processes. Fields of application
are interaction analysis including psychotherapy, psychodiagnostics, and experimental
and clinical neuropsychology. In a multi-stage evaluation process resulting in more
and more fine-grained units, the behaviour is segmented and classified according to
kinesic criteria. The objectivity, reliability, and validity of the NEUROGES categories
and values have been tested in several research studies, thus far including 263 participants,
healthy adults as well as patients with brain damage or mental disease. The analysis of the
group differences in hand movement behaviour as well as kinetographic and brain ima-
ging studies provide evidence that the NEUROGES categories and values are associated
with specific cognitive, emotional, and interactive processes.
NEUROGES focuses on the spontaneously displayed hand movement behaviour that accompanies interaction, thinking, and emotional experience. However, volitional hand movements such as tool
use can be analysed as well.
According to kinesic criteria the ongoing flow of spontaneous hand movements is
segmented into units and classified with values. For some values the association with
cognitive, emotional, and interactive processes is already empirically established and
thus, these can be used for diagnostic purposes. Fields of application are interaction
analysis including psychotherapy, psychodiagnostics, and experimental and clinical
neuropsychology. Furthermore, NEUROGES is designed for basic research, i.e., to fur-
ther explore the anatomy of hand movement behaviour and its relation to cognitive,
emotional, and interactive processes.
Fig. 67.1: Bi-directional link of hand movement behaviour and cognitive, emotional, and interac-
tive processes.
There is, in fact, ample empirical evidence that spontaneous hand movements are asso-
ciated with higher cognitive functions, such as language, spatial cognition, or praxis (e.g.,
Beattie and Shovelton 2006, 2009; Blonder et al. 1995; Butterworth and Hadar 1989;
Cohen and Otterbein 1992; de Ruiter 2000; De’Sperati and Stucchi 2000; Duffy and
Duffy 1989; Ehrlich, Levine and Goldin-Meadow 2006; Emmorey, Tversky and Taylor
2000; Foundas et al. 1995; Fricke 2007; Garber and Goldin-Meadow 2002; Goldin-
Meadow 2006; Haaland and Flaherty 1984; Hermsdörfer et al. 1996; Kita 2000; Kita
and Özyürek 2003; Krauss, Chen and Chawla 1996; Lausberg and Kita 2003; Lausberg
et al. 2007; Lavergne and Kimura 1987; Le May, David and Thomas 1988; Liepmann
1908; McNeill 1992, 2005; Müller, 1998; Ochipa, Rothi and Heilman 1994; Parsons
et al. 1998; Poizner et al. 1990; Sassenberg et al. 2011; Seyfeddinipur, Kita and Indefrey
2008; Sirigu et al. 1996; Wartenburger et al. 2010). Likewise, it has been demonstrated
that hand movements are related to emotional processes and that they may reflect
psychopathology (e.g., Berger 2000; Berry and Pennebaker 1993; Cruz 1995; Darwin
1890; Davis 1981, 1997; Ekman and Friesen 1969, 1974; Ellgring 1986; Freedman
1972; Freedman and Bucci 1981; Freedman and Hoffmann 1967; Gaebel 1992; Krout
1935; Lausberg 2011; Lausberg and Kryger 2011; Mahl 1968; Sainsbury 1954; Scheflen
1974; Ulrich 1977; Ulrich and Harms 1985; Wallbott 1989; Willke 1995). Furthermore,
hand movements serve to implicitly and explicitly regulate interactive processes (e.g.,
Birdwhistell 1952; Davis 1997; Dvoretska 2009; Kryger 2010; Lausberg 2011; Scheflen
1973, 1974) and to communicate information (e.g., Cohen and Otterbein 1992; Cook
and Goldin-Meadow 2006; Feyereisen 2006; Holle et al. 2010).
Thus, one rationale for the development of the NEUROGES system has been to
integrate the existing empirical knowledge on movement types for which the link to
specific cognitive, emotional, and interactive functions had already been established,
such as for self-touch and stress. These types are suited for diagnostic purposes.
However, as many aspects of the bi-directional link between hand movements and
higher cognitive and emotional functions remain to be explored, the NEUROGES sys-
tem has also been designed to suit exploratory research. First, the anatomy of hand
movement behaviour is examined, such as the duration and the sequencing of certain
types of movement behaviour. And second, the correlation between hand movement
behaviour units and cognitive, emotional, and interactive parameters are investigated
(see below). This procedure implies that hand movements are classified first by kinesic
features alone, i.e., independently from other functions such as speech.
Two Activation values are distinguished:
(i) movement,
(ii) rest position/posture
(Fig. 67.2, step 1). The Activation category provides a general impression of the subject’s level of arousal.
Five Structure values are distinguished:
(i) irregular,
(ii) repetitive,
(iii) phasic,
(iv) aborted,
(v) shift
(Fig. 67.2, step 2). If the Structure value changes within the given unit, the unit is segmented into subunits (this principle of subunit generation applies to all following coding steps). The Structure category reflects the level of complexity of the motor behaviour.
(Fig. 67.2, step 3). The Focus category reflects the subject’s focus of attention. It ranges
from internal (within body) to external (in space).
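The principle of subunit generation mentioned above can be pictured with the following sketch, which splits a unit wherever the value assigned at the next coding step changes; the time points and value labels are invented, and the sketch does not reproduce the NEUROGES coding procedure itself.

```python
# Sketch of the subunit-generation principle: a unit coded at one step is split
# into subunits wherever the value assigned at the next step changes. The time
# points and value labels below are invented for illustration only.
def segment_into_subunits(unit, samples):
    """unit: (start, end); samples: list of (time, value) observed within the unit."""
    subunits = []
    current_value, current_start = None, unit[0]
    for time, value in samples:
        if current_value is None:
            current_value = value
        elif value != current_value:
            subunits.append((current_start, time, current_value))
            current_start, current_value = time, value
    subunits.append((current_start, unit[1], current_value))
    return subunits

# A movement Activation unit from 3.0 s to 7.5 s whose Structure value changes over time:
activation_unit = (3.0, 7.5)
structure_samples = [
    (3.0, "irregular"), (3.5, "irregular"),
    (4.0, "phasic"), (5.0, "phasic"), (5.5, "phasic"),
    (6.0, "repetitive"), (7.0, "repetitive"),
]

for start, end, value in segment_into_subunits(activation_unit, structure_samples):
    print(f"Structure subunit: {start:.1f}-{end:.1f} s -> {value}")
```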
(Fig. 67.3, step 1). The Contact category is related to the coordination between the two
hemispheres. It ranges from simple (act as a unit) to complex (act apart).
(i) symmetrical,
(ii) right hand dominance,
(iii) left hand dominance,
(iv) asymmetrical,
Fig. 67.2: Module I coding steps. Step 1: When does the Activation begin and when does it end? Step 2: Where is the Structure? (only for movement Activation units). Step 3: Where is the Focus? (only for phasic, repetitive, and irregular Structure units).
(Fig. 67.3, step 2). The Formal Relation category is related to hemispheric dominance. It
ranges from no suppression (symmetrical) to bilateral suppression (asymmetrical).
Fig. 67.3: Coding steps for the relation between the hands. Step 1: How is the Contact between the hands if both move simultaneously? (temporal overlaps of right hand and left hand StructureFocus units; values: act as a unit, act on each other, act apart). Step 2: How is the Formal Relation between the hands? (only for phasic and repetitive Contact units).
The Function values are defined by criteria such as use of gesture space, path during main phase, hand orientation, hand shape, efforts, body involvement, gaze, cognitive perspective (meta criterion), frequency, and duration.
Eleven Function values are distinguished:
Fig. 67.4: Coding steps for Function and Type. Step 1: What is the Function? (only for unimanual and bimanual phasic, repetitive, and shift units). Step 2: What is the Type? (specific Types for each Function value, including rise, fall, baton, superimposed, neutral, You, imperative, transitive, intransitive, shape, size, route, position, manner, dynamics, external target, palming, fist, clenching, opening, closing).
(Fig. 67.4 step 1, top row). The Function category refers to the emotional, cognitive, in-
teractive, physical, and practical functions of hand movements. This definition implies
that also those hand movements that are displayed beyond the gesturer’s awareness
have a function. As an example, a seemingly purposeless on body movement may
serve psychodynamically for self-regulation.
For egocentric deictic gestures, the Type values specify the target. For egocentric direction
gestures, the Type values specify the agent who takes the direction. For pantomime ges-
tures, they register transitivity. For form presentation, spatial relation presentation, and
motion presentation, the Type values classify the physical aspects that are presented in
gesture. For emblems, instead of specific Type values, a list of commonly used emblem-
atic gestures is provided.
5.1. Reliability
Inter-rater reliability was used to assess the quality of the operationalization of the
NEUROGES values. As the NEUROGES coding procedure comprises segmentation
and classification, the raters’ agreement not only concerns the value that is chosen
for the unit but also the segmentation, i.e., if the raters agree on when a unit with a spe-
cific value starts and ends, when the next unit starts and ends, etc. Since, thus far, only statistical measures have been available that refer to categorial agreement, Holle and Rein (forthcoming) developed a novel algorithm (a modified Cohen’s kappa) that also takes into account the raters’ agreement concerning behaviour segmentation. This novel algorithm was applied in the experimental studies that were included in the meta-analysis.
With a few exceptions, the inter-rater agreement was moderate to substantial for all NEUROGES values. Given that not only the categorial but also the temporal agreement was considered, this level of inter-rater agreement indicates an
overall good objectivity of the NEUROGES values.
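The Holle and Rein algorithm is not reproduced here; as a rough illustration of the underlying idea (rewarding agreement on both category and timing), the sketch below discretizes two raters' segmentations into fixed frames and computes an ordinary Cohen's kappa over the frame labels. The example units and the 0.04-second frame size are arbitrary assumptions.

```python
# Rough illustration only: time-sliced Cohen's kappa over two raters' segmentations.
# This is NOT the modified kappa of Holle and Rein; it merely shows how temporal
# and categorial agreement can be assessed jointly. Units and frame size are invented.
from collections import Counter

def frame_labels(units, duration, frame=0.04):
    """Assign each frame the value of the unit covering its midpoint ('none' otherwise)."""
    n = int(round(duration / frame))
    labels = []
    for i in range(n):
        t = (i + 0.5) * frame
        label = "none"
        for start, end, value in units:
            if start <= t < end:
                label = value
                break
        labels.append(label)
    return labels

def cohens_kappa(a, b):
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

rater1 = [(0.0, 1.2, "phasic"), (1.2, 2.0, "repetitive"), (2.6, 3.0, "shift")]
rater2 = [(0.0, 1.1, "phasic"), (1.1, 2.1, "repetitive"), (2.5, 3.0, "shift")]

a = frame_labels(rater1, duration=3.0)
b = frame_labels(rater2, duration=3.0)
print(f"frame-wise Cohen's kappa: {cohens_kappa(a, b):.2f}")
```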
Furthermore, to examine intra-rater retest reliability, the same rater coded the same
videos with a time interval of 2 years. There was substantial agreement, further indicat-
ing a good operationalization of the NEUROGES values.
Kinematography was used to establish parallel-forms reliability. Movement units
that had been assessed by raters with the Module I Structure category criteria were ana-
lysed with the electromagnetic motion capture system Polhemus Liberty© (Colchester,
VT), which records the displacement and orientation (Rein forthcoming). The five
structure values phasic, repetitive, shift, aborted, and irregular (see section 4.1.2.) were
reliably distinguished by kinematography and matched the raters’ classifications.
Indirect evidence for the association between specific NEUROGES values and spe-
cific cognitive and emotional functions is provided by using different experimental set-
tings that challenge the participants differently with regard to these functions. As an
example, an intelligence test may require information retrieval or arithmetic abilities,
whereas the narration of funny animated cartoons without speech may induce joy
and require mental imagery. In the meta-analysis, the pattern of the Structure values
differed between the experiments. Experimental settings that induced a pressure to per-
form (prototype: intelligence test) were associated with more irregular and shift units. In
contrast, a high amount of phasic, repetitive, and aborted units and a low frequency of
irregular and shift units were found in experiments that animated the participants (pro-
totype: narration of animated cartoons). The findings indicate a dichotomy between
phasic, repetitive, and aborted units on one hand, and irregular and shift units on the
other with regard to the level of cognitive processing (non-conceptual vs. conceptual).
Likewise, for the Focus category a clear picture emerged of in space dominant and on
body dominant experiments. Experiments that elicited visual imagery were accompanied
by a high rate of in space units. In contrast, stress-inducing experiments were character-
ized by a high frequency of on body units. The finding is in line with the proposition that
the Focus in space offers the most options for expressive demonstrations whereas the
Focus on body serves self-regulation.
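As a small illustration of how such group comparisons can be tabulated once units have been coded, the sketch below computes the share of units per Focus value for two hypothetical experimental conditions; the coded units are invented.

```python
# Sketch: tabulate the share of coded units per value for different experiments.
# The coded units below are invented; real data would come from the annotation files.
from collections import Counter

coded_units = {
    "cartoon narration": ["in space", "in space", "on body", "in space", "in space"],
    "intelligence test": ["on body", "on body", "in space", "on body"],
}

for experiment, values in coded_units.items():
    counts = Counter(values)
    total = sum(counts.values())
    shares = ", ".join(f"{v}: {c / total:.0%}" for v, c in counts.most_common())
    print(f"{experiment}: {shares}")
```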
Acknowledgments
The development of the NEUROGES coding system was supported by the German
Research Association (DFG) grants LA 1249/1–3.
6. References
Beattie, Geoffrey and Heather Shovelton 2006. When size really matters: How a single semantic
feature is represented in the speech and gesture modalities. Gesture 6(1): 63–84.
Beattie, Geoffrey and Heather Shovelton 2009. An exploration of the other side of semantic com-
munication. How the spontaneous movements of the human hand add crucial meaning to nar-
rative. Semiotica 184(1/4): 33–51.
Berger, Miriam R. 2000. Movement patterns in borderline and narcissistic personality disorders.
Dissertation Abstracts International: Section B: The sciences and engineering 60(9/B): 4875.
Berry, Diane S. and James W. Pennebaker 1993. Nonverbal and verbal emotional expression and
health. Psychotherapy and Psychosomatics 59: 11–19.
Birdwhistell, Ray 1952. Introduction to Kinesics: An Annotation System for the Analysis of Body
Motion and Gesture. Washington, DC: Foreign Service Institute.
Blonder, Lee X., Allan F. Burns, Dawn Bowers, Robert W. Moore and Kenneth M. Heilman 1995.
Spontaneous gestures following right hemisphere infarct. Neuropsychologia 33: 203–213.
Bryjová, Janka, Han Sloetjes and Hedda Lausberg under review. The interactive NEUROGES training CD. In: Hedda Lausberg (ed.), NEUROGES – The Neuropsychological Gesture Cod-
ing System. Berlin: Peter Lang.
Butterworth, Brian and Uri Hadar 1989. Gesture, speech, and computational stages: A reply to
McNeill. Psychological Review 96: 168–174.
Cohen, Ronald L. and Nicola Otterbein 1992. The mnemonic effect of speech gestures: Pantomimic
and non-pantomimic gestures compared. European Journal of Cognitive Psychology 4: 113–139.
Cruz, Robin F. 1995. An empirical investigation of the movement psychodiagnostic inventory. Dis-
sertation Abstracts International: Section B: The Sciences and Engineering 57(2/B): 1495.
Darwin, Charles R. 1890. The Expression of the Emotions in Man and Animals. London: Penguin
Group.
Davis, Caroline 1997. Eating disorders and hyperactivity: A psychobiological perspective. Cana-
dian Journal of Psychiatry 42: 168–175.
Davis, Martha 1981. Movement characteristics of hospitalized psychiatric patients. American Jour-
nal of Dance Therapy 4(1): 52–71.
Davis, Martha 1991 rev. 1997. Guide to movement analysis methods. Behavioral Measurement Da-
tabase Services, P.O. Box 110287, Pittsburgh, PA.
Dell, Cecily 1979. A primer for movement description using effort-shape and supplementary con-
cepts. New York: Dance Notation Bureau.
Denissen, Jacobus J. A. 2005. Understanding and being understood: The impact of intelligence and
dispositional valuations on social relationships. Ph.D. dissertation, Humboldt-University Berlin.
De Ruiter, Jan-Peter 2000. The production of speech and gesture. In: David McNeill (ed.), Lan-
guage and Gesture, 284–311. Cambridge: Cambridge University Press.
De’Sperati, Claudio and Natale Stucchi 2000. Motor imagery and visual event recognition. Exper-
imental Brain Research 133: 273–278.
Duffy, Robert J. and Joseph R. Duffy 1989. An investigation of body part as object (BPO) re-
sponses in normal and brain-damaged adults. Brain and Cognition 10: 220–236.
Dvoretska, Daniela 2009. Kinetische Interaktion und Koordination. Diploma thesis, Department of
Psychology, Department of Mathematics and Natural Sciences II, Humboldt-University Berlin.
Dvoretska, Daniela, Jaap Denissen and Hedda Lausberg submitted. Intra-dyadic kinesic turn taking and mutual understanding.
Efron, David 1941. Gesture and Culture. The Hague: Mouton.
Ehrlich, Stacy B., Susan C. Levine and Susan Goldin-Meadow 2006. The importance of gesture in
children’s spatial reasoning. Developmental Psychology 42(6): 1259–1268.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behaviour: Categories, ori-
gins, usage, and coding. Semiotica 1: 49–99.
Ekman, Paul and Wallace V. Friesen 1974. Nonverbal behavior and psychopathology. In: Raymond J. Friedman and Martin M. Katz (eds.), The Psychology of Depression, 203–232.
New York: John Wiley and Sons.
Ellgring, Heiner 1986. Nonverbal expression of psychological states in psychiatric patients. Euro-
pean Archives of Psychiatry and Neurological Sciences 236: 31–34.
Emmorey, Karen, Barbara Tversky and Holly A. Taylor 2000. Using space to describe space: Per-
spective in speech, sign, and gesture. Spatial Cognition and Computation 2: 157–180.
Feyereisen, Pierre 2006. Further investigation on the mnemonic effect of gestures: Their meaning
matters. European Journal of Cognitive Psychology 18(2): 185–205.
Foundas, Anne L., Beth L. Macauley, Anastasia M. Raymer, Lynn M. Mahler, Kenneth M. Heil-
man and Leslie J. G. Rothi 1995. Gesture laterality in aphasic and apraxic stroke patients.
Brain and Cognition 29: 204–213.
Freedman, Norbert 1972. The analysis of movement behavior during the clinical interview. In:
Aron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 153–175.
New York: Pergamon.
Freedman, Norbert and Wilma Bucci 1981. On kinetic filtering in associative monologue. Semio-
tica 34(3/4): 225–249.
Freedman, Norbert and Stanley P. Hoffmann 1967. Kinetic behaviour in altered clinical states:
Approach to objective analysis of motor behaviour during clinical interviews. Perceptual and
Motor Skills 24: 527–539.
Fricke, Ellen 2007. Origo, Geste und Raum. Berlin: Walter de Gruyter.
Gaebel, Wolfgang 1992. Non-verbal behavioural dysfunction in schizophrenia. British Journal of
Psychiatry 161(suppl.18): 65–74.
Garber, Philip and Susan Goldin-Meadow 2002. Gesture offers insight into problem-solving in
adults and children. Cognitive Science 26: 817–831.
Goldin-Meadow, Susan 2006. Talking and thinking with our hands. Current Directions in Psycho-
logical Science 15(1): 34–39.
Haaland, Kathleen Y. and David Flaherty 1984. The different types of limb apraxia errors made
by patients with left vs. right hemisphere damage. Brain and Cognition 3: 370–384.
Helmich, Ingo, Robert Rein, Henning Holle, Christoph Schmitz and Hedda Lausberg 2011. Brain
oxygenation in gesture production. Differences between tool use demonstration, tool use panto-
mimes and body-part-as-object. XI International Conference on Cognitive Neuroscience,
ICON XI Mallorca – Spain, 25–29 Sept. 2011.
Hermsdörfer, Joachim, Norbert Mai, Josef Spatt, Christian Marquardt, Roland Veltkamp and Georg
Goldenberg 1996. Kinematic analysis of movement imitation in apraxia. Brain 119: 1575–1586.
Hogrefe, Katharina, Georg Goldenberg, Robert Rein, Harald Skomroch and Hedda Lausberg
under review. The impact of right and left hemisphere brain damage on gesture kinetics and
gesture location.
Holle, Henning, Jonas Obleser, Shirley-Ann Rueschemeyer and Thomas C. Gunter 2010. Integra-
tion of iconic gestures and speech in left superior temporal areas boosts speech comprehension
under adverse listening conditions. NeuroImage 49: 875–884.
Holle, Henning and Robert Rein forthcoming. Assessing interrater agreement of movement an-
notations. In: Hedda Lausberg (ed.), NEUROGES – The Neuropsychological Gesture Coding
System. Berlin: Peter Lang.
Kendon, Adam 2004. Gesture: Visible Action as Utterance. Cambridge: Cambridge University Press.
Kimura, Doreen 1973a. Manual activity during speaking – I. Right-handers. Neuropsychologia 11:
45–50.
Kimura, Doreen 1973b. Manual activity during speaking – II. Left-handers. Neuropsychologia 11: 51–55.
Kita, Sotaro 2000. How representational gestures help speaking. In: David McNeill (ed.), Lan-
guage and Gesture: Window into Thought and Action, 162–185. Cambridge: Cambridge Univer-
sity Press.
Kita, Sotaro and Asli Özyürek 2003. What does cross-linguistic variation in semantic coordination
of speech and gesture reveal? Evidence for an interface representation of spatial thinking and
speaking. Journal of Memory and Language 48: 16–32.
Krauss, Robert M., Yihsiu Chen and Purnima Chawla 1996. Nonverbal behavior and nonverbal
communication: What do conversational hand gestures tell us? In: Mark P. Zanna (ed.), Ad-
vances in Experimental Social Psychology 28, 389–450. Tampa, FL: Academic Press.
Krout, Maurice H. 1935. Autistic gestures. Psychological Monographs 46(4): i–126.
Kryger, Monika 2010. Bewegungsverhalten von Patient und Therapeut in als gut und schlecht er-
lebten Therapiesitzungen. Diploma thesis, Department of Neurology, Psychosomatic Medicine,
and Psychiatry; Institute of health promotion and clinical movement science, German Sport
University Cologne.
Laban, Rudolf 1988. The Mastery of Movement. Worcester: Billing & Sons.
Lausberg, Hedda 2011. Das Gespräch zwischen Arzt und Patientin: Die bewegungsanalytische
Perspektive. Balint Journal 12: 15–24.
Lausberg, Hedda to appear. NEUROGES – The Neuropsychological Gesture Coding System.
Berlin: Peter Lang.
Lausberg, Hedda, Henning Holle, Philipp Kazzer, Hauke Heekeren and Isabell Wartenburger
2010. Differential cortical mechanisms underlying tool use, pantomime, and body-part-as-
object use. Abstract book, 16th Annual Meeting of the Organization for Human Brain Mapping
Barcelona, June 2010.
Lausberg, Hedda and Sotaro Kita 2003. The content of the message influences the hand choice in
co-speech gestures and in gesturing without speaking. Brain and Language 86: 57–69.
Lausberg, Hedda and Monika Kryger 2011. Gestisches Verhalten als Indikator therapeutischer
Prozesse in der verbalen Psychotherapie: Zur Funktion der Selbstberührungen und zur Reprä-
sentation von Objektbeziehungen in gestischen Darstellungen. Psychotherapie-Wissenschaft
1(1): 41–55.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behaviour with the NEUROGES-
ELAN system. Behaviour Research Methods 41(3): 841–849.
Lausberg, Hedda, Eran Zaidel, Robyn F. Cruz and Alain Ptito 2007. Speech-independent produc-
tion of communicative gestures: Evidence from patients with complete callosal disconnection.
Neuropsychologia 45: 3092–3104.
Lavergne, Joanne and Doreen Kimura 1987. Hand movement asymmetry during speech: No effect
of speaking topic. Neuropsychologia 25: 689–693.
Le May, Amanda, Rachel David and Andrew P. Thomas 1988. The use of spontaneous gesture by
aphasic patients. Aphasiology 2(2): 137–145.
Liepmann, Hugo 1908. Drei Aufsätze aus dem Apraxiegebiet. Berlin: Karger.
Mahl, George F. 1968. Gestures and body movements in interviews. Research in Psychotherapy 3:
295–346.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Müller, Cornelia 1998. Redebegleitende Gesten. Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Ochipa, Cynthia, Leslie J. G. Rothi and Kenneth M. Heilman 1994. Conduction apraxia. Journal
of Neurology, Neurosurgery, and Psychiatry 57: 1241–1244.
Parsons, Lawrence M., John D. E. Gabrieli, Elizabeth A. Phelps and Michael S. Gazzaniga 1998.
Cerebrally lateralized mental representations of hand shape and movement. Journal of Neu-
roscience 18: 6539–6548.
Poizner, Howard, Linda Mack, Mieka Verfaellie, Leslie J. G. Rothi and Kenneth M. Heilman
1990. Three-dimensional computergraphic analysis of apraxia. Brain 113: 85–101.
Rein, Robert forthcoming. Using 3D kinematics for segmentation of hand movement behavior:
A pilot study and some further suggestions. In: Hedda Lausberg (ed.), NEUROGES – The
Neuropsychological Gesture Coding System. Berlin: Peter Lang.
Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson 1974. A simplest systematics for the or-
ganisation of turn taking for conversation. Language 50: 696–735.
Sainsbury, Peter 1954. A method of measuring spontaneous movements by time-sampling motion
picture. Journal of Mental Science 100a: 742–748.
Sassenberg, Uta, Manja Foth, Isabell Wartenburger and Elke van der Meer 2011. Show your hands –
are you really clever? Reasoning, gesture production, and intelligence. Linguistics 49(1): 105–134.
Sassenberg, Uta, Ingo Helmich and Hedda Lausberg 2010. Awareness of emotions: Movement be-
haviour as indicator of implicit emotional processes in participants with and without alexithymia. In: Haack, Wiese, Abraham, Chiarcos (eds.), Proceedings of KogWis: 10th Biannual
Meeting of the German Society for Cognitive Science, 169. Potsdam, Germany.
Scheflen, Albert E. 1973. Communicational Structure: Analysis of a Psychotherapy Transaction.
Bloomington: Indiana University Press.
Scheflen, Albert E. 1974. How Behaviour Means. New York: Anchor/Doubleday.
Seyfeddinipur, Mandana, Sotaro Kita and Peter Indefrey 2008. How speakers interrupt them-
selves in managing problems in speaking: Evidence for self-repairs. Cognition 108: 837–842.
Sirigu, Angela, Jean-Rene Duhamel, Laurent Cohen, Bernard Pillon, Bruno Dubois and Yves
Agid 1996. The mental representation of hand movements after parietal cortex damage.
Science 273: 1564–1568.
Skomroch, Harald, Robert Rein, Katharina Hogrefe, Georg Goldenberg and Hedda Lausberg
2010. Gesture production in the right and left hemispheres during narration of short movies.
Conference Proceedings, International Society for Gesture Studies Frankfurt/Oder, Germany,
25–30 July 2010.
Ulrich, Gerald 1977. Videoanalytische Methoden zur Erfassung averbaler Verhaltensparameter
bei depressiven Syndromen. Pharmakopsychiatrie 10: 176–182.
Ulrich, Gerald and K. Harms 1985. Video analysis of the non-verbal behaviour of depressed pa-
tients before and after treatment. Journal of Affective Disorders 9: 63–67.
Wagner Cook, Susan and Susan Goldin-Meadow 2006. The role of gesture in learning: Do children
use their hand to change their minds? Journal of Cognition and Development 7(2): 211–232.
Wallbott, Harald G. 1989. Movement quality changes in psychopathological disorders. In: B. Kirkaldy (ed.), Normalities and Abnormalities in Human Movement. Medicine and Sport Science 29: 128–146.
Wartenburger, Isabell, Esther Kühn, Uta Sassenberg, Manja Foth, Elizabeth A. Franz and Elke
van der Meer 2010. On the relationship between fluid intelligence, gesture production, and
brain structure. Intelligence 38: 193–201.
Willke, S. 1995. Die therapeutische Beziehung in psychoanalytisch orientierten Anamnesen und
Psychotherapien mit neurotischen, psychosomatischen und psychiatrischen Patienten. DFG-
Bericht Wi 1213/1–1.
Abstract
The chapter presents a concise overview of existing transcription systems for gestures, speech, prosody, postures, and gaze, aiming at a presentation of their theoretical backgrounds and methodological approaches. After a short introduction discussing the understanding of the term “transcription”, the article first focuses on transcription systems in modern gesture research and discusses systems from the fields of gesture research, nonverbal communication, conversation analysis, and artificial agents (e.g., Birdwhistell 1970; Sager 2001; Martell 2002; Bressem this volume; Gut et al. 2002; Kipp, Neff, and Albrecht 2007; Lausberg and Sloetjes 2009; HIAT 1, GAT). Afterwards, the paper presents well-known transcription systems for speech from the fields of linguistics, conversation analysis, and discourse analysis (e.g., IPA, HIAT, CHAT, DT, GAT). Apart from systems
for describing speech, the article also focuses on systems for the transcription of prosody.
In doing so, the paper discusses prosodic descriptions within the field of conversation
analysis, discourse analysis, and linguistics (HIAT2, GAT2, TSM, ToBI, PROLAB,
INTSINT, SAMPROSA). The last sections of the paper focus on the transcription of
body posture and gaze (e.g., Birdwhistell 1970; Ehlich and Rehbein 1982; Goodwin
1981; Schöps in preparation; Wallbott 1998).
1. Introduction
The term transcription goes back to the Latin word trānsscrībere, meaning ‘to overwrite’
or ‘to rewrite’ (Bußmann 1990: 187). In a linguistic understanding, transcription refers
to the notation of spoken language in written form. More specifically, it is understood as
the reproduction of communicative events using alphabetic resources and specific sym-
bols while capturing the characteristics and specifics of spoken language (Dittmar
2004). Transcription must be understood as a scientific working method directed to
the analytical needs of a scientist by freezing oral communication and making it acces-
sible to thorough inspection (Redder 2001: 1038; see also Bohle this volume for a gen-
eral discussion). However, today, transcription is not only restricted to linguistics.
Investigating interaction and communication is part of a number of scientific disciplines,
such as ethnography, sociology, psychology, neurology, and biology for instance, which
face comparable problems and obstacles in making communicative behavior analyz-
able. Accordingly, the term transcription is no longer restricted to the notation of spo-
ken language, but also includes the notation of bodily behavior, such as gesture, posture,
and gaze.
For each of the sections, the notational system includes a “basic notational logic” (Birdwhistell 1970: 258), which is combined with indicators to capture the differing variants producible by the eight sections of the body. The description is based on articulatory aspects, such as muscular tension as well as the joints of articulation. Altogether, the system offers 400 signs for the description of bodily motion. It includes descriptions for hand and finger activities and bi-manual gestures, yet only rarely includes the notation of movement.
The temporal structure of gestures is described according to the beginnings and endings
of movement. Assuming only a restricted number of possible movements for the pro-
duction of gestures, the direction of movement is described as being horizontal, vertical
or diagonal, for instance. Moreover, the system mentions differences in the quality of
movements and differentiates movements as slow, fast, or discontinuous (Sager 2001:
28–29). Signifikanzpunkte are described on the basis of two principles. The principle
center of rotation records the position of arms and hands through the centers of rotation
allowing the movement (shoulder, upper arm, elbow, wrist). The principle body levels
registers movements in the various centers of rotation relative to three body levels (ver-
tical axis, sagittal axis, transversal axis), allowing for different degrees of freedom for
movement (e.g., pronation or supination of the hand). Apart from the position and
movement of hands and arms, the system includes a description of hand shapes.
Seven communicatively relevant types of hand configurations, derivable from two
types of movements of the hand, are differentiated (e.g., cupped hand) (Sager 2001:
41–42).
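To make the two principles concrete, the following minimal Python sketch shows how a recorded Signifikanzpunkt might be represented. The field names, the bundling of direction and quality into the same record, and the value labels are my own illustrative assumptions drawn from the prose above, not Sager's (2001) notation itself.

```python
from dataclasses import dataclass

# Illustrative value sets taken from the description above; the labels are
# assumptions for this sketch, not Sager's (2001) own symbols.
CENTERS_OF_ROTATION = {"shoulder", "upper_arm", "elbow", "wrist"}
BODY_AXES = {"vertical", "sagittal", "transversal"}
DIRECTIONS = {"horizontal", "vertical", "diagonal"}
QUALITIES = {"slow", "fast", "discontinuous"}

@dataclass
class SignifikanzpunktRecord:
    """One recorded significant point of an arm/hand movement."""
    center_of_rotation: str   # principle "center of rotation"
    body_axis: str            # principle "body levels"
    direction: str
    quality: str
    hand_shape: str           # one of the seven hand configurations, e.g. "cupped hand"

    def __post_init__(self):
        assert self.center_of_rotation in CENTERS_OF_ROTATION
        assert self.body_axis in BODY_AXES
        assert self.direction in DIRECTIONS
        assert self.quality in QUALITIES

# Example: a fast, horizontal forearm movement around the elbow with a cupped hand.
record = SignifikanzpunktRecord("elbow", "transversal", "horizontal", "fast", "cupped hand")
```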
FORM pursues a strictly hierarchical and technical setup, owing to its orientation toward computational processing and its applicability in research on artificial agents. The system is designed for use with the annotation program ANVIL (Kipp 2004).
For the positions of the hand, the notational system draws on the gesture space introduced by McNeill, which divides the gesture space “into sectors using a system of concentric squares” (McNeill 1992: 86) (see Bressem this volume for the notational system).
McNeill’s system uses the written speech transcription as the basis for coding ges-
tures. Gestures are annotated into the speech transcription by inserting brackets for
the beginning and end of a gesture. The scheme includes the description of hand con-
figuration, orientation, position in gesture space, and movement. The gesture space con-
sists of a system of concentric squares dividing the space in front of the speaker into
three basic areas (center, periphery, and extreme periphery) (see McNeill 1992: 89).
Hand configurations are based on the labeling of hand shapes in American Sign Language (ASL), using the “ASL shape that the gesture mostly resembles” (McNeill 1992: 86). The orientation of the hand is coded according to the gesture space and palm orientation. Gestural movements, such as shape, direction, and trajectory, are accounted for in a descriptive fashion without providing strict guidelines.
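As an illustration of the scheme just described, here is a hedged Python sketch of a single bracketed gesture annotation. The class and field names are my own, and the concrete values (the transcript line, the ASL label, the sector, the movement description) are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class McNeillStyleAnnotation:
    """Hypothetical record for one bracketed gesture in a speech transcript.

    The attributes mirror the description above (hand configuration,
    orientation, position in gesture space, movement); the values are invented.
    """
    speech: str            # transcript line with [brackets] around the gesture
    hand_shape: str        # ASL shape the gesture most resembles, e.g. "5"
    palm_orientation: str  # e.g. "palm down"
    space_sector: str      # "center", "periphery", or "extreme periphery"
    movement: str          # free descriptive account of shape/direction/trajectory

example = McNeillStyleAnnotation(
    speech="and he [rolls down the street]",
    hand_shape="5",
    palm_orientation="palm down",
    space_sector="center",
    movement="circular trajectory moving rightward",
)
```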
Configurations of the hand are described with the taxonomic notation systems of
HamNoSys (Prillwitz et al. 1989) and FORM (Martell 2002). Movements are coded
for shape, direction, and modifiers, that is, size, speed, and number of repetitions. In addition, the symmetry of the hands is coded. For the combination of gestures into complex
units, the Conversational Gesture Transcription system distinguishes sequences of pre-
cedence and overlap (Gut et al. 2002: 6). The functional classification of gestures is
based on a four-part classification distinguishing various degrees of overlap between
gestural and verbal meaning.
2.2.3. Kipp, Neff, and Albrecht (2007): A transcription and coding scheme for the
automatic generation and animation of character-specific hand/arm gestures
Offering a scheme developed “for the specific purpose of automatically generating and
animating character-specific hand/arm gestures, but with potential general value”
(Kipp, Neff, and Albrecht 2007: 1), the scheme operates on the concept of a gesture lex-
icon made up of lexemes, that is “prototypes of recurring gesture patterns where certain
formational features remain constant over instances and need not be annotated for
every single occurrence.” (Kipp, Neff, and Albrecht 2007: 4)
The scheme is implemented and used within the ANVIL annotation tool (Kipp
2004) and consists of adding annotation elements to a track in which each element
is described with a pre-assigned set of attributes, which capture the most essential
parts of a gesture. The annotation scheme includes the spatial form of gestures in
which gesture phases, phrases, and units (Kendon 1980, 2004) are described along
with handedness, path of movement, position, as well as hand shape, and distance of
hands. Hand shapes are coded using a taxonomic classification of nine types of configurations. A gesture's membership in a lexical category is determined by the lexeme that defines the hand shape, palm orientation, and exact trajectory. Typical lexemes include:
raised index finger, cup (open hand), finger ring or progressive (circular movement)
(Kipp, Neff, and Albrecht 2007: 14). In a last step, the relation of speech and gesture
is captured.
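To illustrate the idea of pre-assigned attributes on an annotation track, the following Python sketch models one such element. The attribute names are assumptions mirroring the prose above, not the scheme's actual specification (ANVIL defines tracks and attributes in its own specification files); the example values are invented.

```python
from dataclasses import dataclass

@dataclass
class GestureTrackElement:
    """Sketch of one annotation element on an ANVIL-style gesture track.

    Attribute names are illustrative, following the description above.
    """
    start: float            # onset in seconds
    end: float              # offset in seconds
    phase: str              # e.g. "preparation", "stroke", "hold", "retraction"
    handedness: str         # e.g. "RH", "LH", "2H"
    lexeme: str             # e.g. "raised index finger", "cup", "progressive"
    hand_shape: str
    position: str
    path_of_movement: str
    hand_distance: str

stroke = GestureTrackElement(12.4, 13.1, "stroke", "RH", "progressive",
                             hand_shape="open hand", position="center",
                             path_of_movement="circular", hand_distance="close")
```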
Module (i) refers to the kinetic features of a hand movement, i.e., the execution of movement vs. no movement, the trajectory and dynamics of movements, the location of action, and whether or not there is contact with the body. For the characterization of the dynamic aspects of movements, NEUROGES uses Laban notation (Laban 1950). Module (ii) allows for the coding of the bimanual relation (for instance, in touch vs. separate, symmetrical vs. complementary, independent vs. dominance). Module (iii) brings in the functional aspects and determines the meaning of gestures based on a specific combination of kinetic features (hand shape, orientation, path of movement, effort, and others), which define the various gesture types.
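The following Python sketch gathers one value per module into a single record, using category labels mentioned in this chapter and the preceding one (e.g., phasic, in space, in touch). The record layout itself is an illustrative assumption, not the official NEUROGES-ELAN template.

```python
from dataclasses import dataclass

@dataclass
class NeurogesStyleAnnotation:
    """Sketch of one annotated unit across the three NEUROGES modules."""
    # Module (i): kinetic features
    movement_present: bool   # execution of movement vs. no movement
    structure: str           # e.g. "phasic", "repetitive", "irregular", "shift", "aborted"
    focus: str               # e.g. "in space", "on body"
    # Module (ii): bimanual relation
    bimanual_relation: str   # e.g. "in touch", "separate", "symmetrical", "independent"
    # Module (iii): function
    function: str            # gesture type derived from the combination of kinetic features

unit = NeurogesStyleAnnotation(True, "phasic", "in space", "separate",
                               "hypothetical label for a presentation gesture")
```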
(i) only partly to specific elements of the movement potential (e.g., raising the hand),
(ii) to the expressional quality including the movement potential, and
(iii) to the summary of complex movements and actions (e.g., waving).
The transcription furthermore includes a rough record of the on- and offsets as well as
the length of movements and only a rudimentary description of form or function. The
relevance of the gestural component and its transcription is thereby always dependent
on its relevance for the verbal communication (Ehlich and Rehbein 1979a: 315). The
verbal modality is the constitutive background for the transcription of gestures, so
that gestures are transcribed on commentary lines dependent on the verbal utterance.
kinesic behavior. Bodily behavior is noted in commentary lines dependent on the verbal
utterance and inclusive of a rudimentary coding of on- and offsets of movement se-
quences. Bodily behavior is, however, only of interest if it obviously influences the verbal and communicative orientations of the speakers and addressees.
The Gesprächsanalytisches Transkriptionssystem (GAT, Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009) also includes behavioral aspects such as proxemics, kinesics, gesture, and gaze in the transcription of face-to-face interaction only if they contribute to the “(un)ambiguousness of other predominantly verbal levels of activities” (Selting, Auer, Barth-Weingarten, et al. 2009: 26). Regarding ges-
tures, the Gesprächsanalytisches Transkriptionssystem (Selting, Auer, Barden, et al.
1998) lists deictic gestures, illustrators, and emblems and includes a rough description
of on- and offsets as well as apex, that is, peaks, of gestural movement sequences.
The description is behavior-oriented and tries to be as little interpretative as possible.
The Gesprächsanalytisches Transkriptionssystem offers differing degrees of detail in the transcription, as it distinguishes basic from fine-grained transcripts. Basic tran-
scripts usually include an interpretive characterization of the gestures within the line
containing the verbal transcription. Fine-grained transcripts list gestures in a separate
line under the simultaneously occurring verbal activity. For illustrative purposes and in
cases of special importance of the nonverbal activities, the Gesprächsanalytisches Trans-
kriptionssystem also mentions the inclusion of pictures in the transcript (Selting, Auer,
Barden, et al. 1998: 28). Its newest revision, the Gesprächsanalytisches Transkriptions-
system 2, mentions that new conventions for the transcription of visual components of
communication are being designed (Selting, Auer, Barth-Weingarten, et al. 2009: 356)
due to the growing interest and importance of visual aspects of communication within
the field of interaction analysis.
This section has presented notation and transcription systems for gestures, which
range from a focus on:
(i) form, to
(ii) form and function, to
(iii) rudimentary descriptions.
The presented systems primarily differ in whether a) gestures' form can and should be separated from possible meanings and functions (e.g., Birdwhistell 1970; Martell 2002; Bressem this volume) or b) a separation of form, meaning, and function is not considered useful for a transcription of gestures (e.g., Gut et al. 2002; McNeill 1992, 2005). These diverging foci go along with differing theoretical assumptions about whether gestures can be broken down into separate components that may combine with other features. Furthermore, the role of speech in the process of notation or transcription differs across the systems presented above. While the verbal utterance is of particular importance for some of the systems (e.g., Ehlich and Rehbein 1979b; Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009), others exclude the verbal modality partly or even completely from the notational process (e.g., Bressem this volume). A further difference in the presented systems is the integration of annotation
software. While especially recent systems use the advantages of annotation software for
the process of notation and transcription (e.g., Kipp, Neff, and Albrecht 2007; Lausberg
and Sloetjes 2009; Martell 2002), others rely on conventional and longstanding methods
of transcribing gesture with the use of word-processing documents. Yet the most important difference lies in how clearly each system is situated within a theoretical and methodological framework and how explicitly its implications are spelled out; for most systems, this is not done as articulately as would be necessary.
transcript as an observational datum. A rich inventory for the reproduction of turns and their sequential progression is thus characteristic of the system. The system uses the “eye dialect,” a standard orthography adapted to the phonetic realization of the expression. The system's inventory of signs is based on the Latin alphabet. The format of transcription is sequentially organized, and the turns of speakers are, analogous to their linear progression, ordered chronologically. The system represents simultaneous utterances of more than one speaker by using brackets at the point where the overlap occurs. The end of verbal units/turns is marked by standard orthography for interrogative sentences. The system also includes prosodic aspects of utterances, such as remarkable changes in pitch, changes in the intonation contour, lengthening, emphasis, changes in tempo, and pauses. Nonverbal events, such as gestures, facial expressions, breathing, and coughing, are represented in commentary lines in double parentheses (e.g., ((coughing))). In its newest revision (Jefferson 2002), the system also includes guidelines for a computer-aided transcription.
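For illustration, the Python snippet below holds a short, invented fragment using the conventions just described (a bracket at the onset of overlap, colons for lengthening, double parentheses for nonverbal events) and extracts the nonverbal events with a simple regular expression. The fragment and the regular expression are my own simplifications, not an actual Jefferson transcript or tool.

```python
import re

# Invented fragment using the conventions described above: a bracket marks the
# onset of overlap, colons mark lengthening, double parentheses enclose
# nonverbal events.
transcript = """\
A: so I went down [there on Tuesday
B:                [oh: really?
A: yes ((coughs)) and the place was clo:sed.
"""

# Pull out the nonverbal events noted in double parentheses.
nonverbal = re.findall(r"\(\((.*?)\)\)", transcript)
print(nonverbal)  # -> ['coughs']
```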
While all of the presented systems aim at a representation of spoken language, the
systems differ from each other in a range of aspects. The most obvious difference is the
diverging format of representation for spoken language. Systems may use notation scores,
that is, an endless line (Ehlich and Rehbein 1979b) or single lines for single speakers and
turns (e.g., Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009).
Furthermore, the systems differ in their basic unit of analysis: turn (e.g., Jefferson 1984;
Selting, Auer, Barden, et al. 1998; Selting, Auer, Barth-Weingarten, et al. 2009) vs. utter-
ance as a whole (MacWhinney 2000). Going along with this is a differentiation in the
segmentation of verbal units, varying from sounds (International Phonetic Association
2005) to intonation units, for instance (Du Bois 1991). In addition, the systems include
prosodic aspects as well as other forms of bodily behavior to varying degrees.
4. Transcription of prosody
Transcription systems for prosody generally capture two main types of phenomena: a) the division of utterances into prosodically marked chunks, units, or phrases, and b) the representation of prominence along with aspects such as pitch movement, reset, or rhythmic change. However, the size and type of prosodic units vary considerably across the different systems, thus resulting in different prosodic transcriptions.
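To make the two types of phenomena concrete, the following Python sketch represents one short phrase with labels in the style of the Tones and Break Indices (ToBI) system discussed below: a tones tier for prominence and boundary marking, and a break-index tier for chunking. The sentence and all label values are schematic assumptions, not an analysis of real data.

```python
# Schematic, invented example: one intonation phrase labeled with ToBI-style
# tones and break indices (0-4). Values are illustrative only.
phrase = {
    "words":  ["Marianna", "made", "the", "marmalade"],
    "tones":  ["H*",       None,   None,  "L+H* L-L%"],  # pitch accents + boundary tones
    "breaks": [1,          1,      1,     4],            # 4 = full intonation phrase boundary
}

# Prominent (accented) words are those carrying a pitch accent.
prominent = [w for w, t in zip(phrase["words"], phrase["tones"]) if t]
print(prominent)  # -> ['Marianna', 'marmalade']
```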
The preceding overview has shown that the systems not only vary in their theoretical
and methodological tradition, but also in their focus on a transcription of prosody. In
general, the proposed systems can be classified according to common and differing
parameters (Llisterri 1994). Regarding the representation of prosodic events, the sys-
tems can be classified into multi-tiered (the Tones and Break Indices, the International
Transcription System for Intonation) or one-tiered systems (e.g., the International Pho-
netic Alphabet, the Gesprächsanalytisches Transkriptionssystem 2, the HalbInter-
pretative ArbeitsTranskriptionen 2). The systems further differ in whether they use machine-readable symbols (e.g., the Speech Assessment Method Phonetic Alphabet, SAMSINT, or SAM Prosodic Transcription) or non-machine-readable symbols (e.g., the Gesprächsanalytisches Transkriptionssystem 2, the Tones and Break Indices, the Gesprächsanalytisches Transkriptionssystem, the HalbInterpretative ArbeitsTranskriptionen, PROLAB). In addition, the systems differ in whether they are theory-driven systems, that is, a) based on a conception of the phonetics-phonology interface, or b) data-
driven systems, i.e., defined by the needs and the practices which are known to be
relevant in order to explain the discursive or the interactional behavior of the speakers (Llis-
terri 1994).
head, arms, hands), which are described according to their movement abilities (up, down,
back for the shoulders for instance), Wallbott’s system allows for a rough anatomical
description of various body postures.
Although primarily developed for the notation of dance, aspects of the Laban nota-
tion (Laban 1950) are nowadays used in a range of transcription systems (e.g., Davis
1979; Greenbaum and Rosenfeld 1980; Kendon 2004; Lausberg and Sloetjes 2009).
This system includes a basic segmentation of the skeletal system, basic kinesiological
terms (e.g., rotations), spatial terms (e.g., straight vs. circular paths) and object relations
(e.g., touch), which allow for a detailed notation of body posture and bodily movement.
Recently, new transcription systems aiming at a combined description of the form and function of body postures have been proposed. Schöps (in preparation), for instance, presents a system including basic postures (standing, lying down, sitting) as well as body parts
and movement categories, along with different predicates for the transcription of the
used body configurations (e.g., spread for arms and legs). The Body Action and Posture
Coding system by Dael and Scherer (2012) approaches the transcription and coding of
body posture on an “anatomical level (different articulations of body parts), a form
level (direction and orientation of movement), and a functional level (communicative
and self-regulatory functions)” (Dael and Scherer 2012).
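As a minimal illustration of coding on these three levels, the Python sketch below splits one posture/action record into anatomical, form, and functional fields. The field names and example values are illustrative assumptions, not the published BAP category list.

```python
from dataclasses import dataclass

@dataclass
class ThreeLevelPostureRecord:
    """Sketch of a posture/action coding split across the three levels quoted
    above (anatomical, form, functional); values are invented."""
    anatomical: str   # articulating body part, e.g. "trunk", "head", "left arm"
    form: str         # direction/orientation of movement, e.g. "lean forward"
    functional: str   # communicative or self-regulatory function

example = ThreeLevelPostureRecord(anatomical="trunk", form="lean forward",
                                  functional="communicative (emphasis)")
```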
7. Conclusion
This overview of notation and transcription systems for speech and bodily behavior has shown that a range of proposals exists, all of which try to account for the reproduction of verbal and bodily behavior in written form. It became apparent that the individual
8. References
Argyle, Michael 1975. Bodily Communication. London: Methuen.
Argyle, Michael and Mark Cook 1976. Gaze and Mutual Gaze. Cambridge: Cambridge University
Press.
Auer, Peter, Elizabeth Couper-Kuhlen and Frank Müller 1999. Language in Time: The Rhythm
and Tempo of Spoken Interaction. New York: Oxford University Press.
Avanzi, Mathieu, Anne Lacheret-Dujour and Bernard Victorri 2008. ANALOR. A tool for semi-
automatic annotation of French prosodic structure. Paper presented at the Interspeech 2008,
Campinas, Brazil, May 6–9.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies
5: 1–19.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis, Department of Philos-
ophy and Humanities, Free University Berlin.
Beckman, Mary E. and Gayle Ayers Elam 1997. Guidelines for ToBI labelling. Retrieved
15.07.2011, from http://www.ling.ohio-state.edu/research/phonetics/E_ToBI/
Birdwhistell, Ray 1970. Kinesics and Context. Essays on Body, Motion, Communication. Philadel-
phia: University of Pennsylvania Press.
Bohle, Ulrike 2007. Das Wort Ergreifen – das Wort Übergeben: Explorative Studie zur Rolle Re-
debegleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bohle, Ulrike this volume. Approaching notation, coding, and analysis from a conversational ana-
lysis point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184(1/4): 53–91.
Brinker, Klaus and Sven F. Sager 1989. Linguistische Gesprächsanalyse. Berlin: Erich Schmidt.
Bußmann, Hadumod 1990. Lexikon der Sprachwissenschaft. 2nd revised edition. Stuttgart,
Germany: Alfred Kröner.
Campione, Estelle and Jean Véronis 2001. Semi-automatic tagging of intonation in French spoken
corpora. In: Paul Rayson, Andrew Wilson, Tony McEnery, Andrew Hardie and Shereen Khoja
(eds.), Proceedings of the Corpus Linguistics’ 2001 Conference, 90–99. Lancaster, U.K.: Lancas-
ter University, UCREL.
Couper-Kuhlen, Elizabeth 1993. English Speech Rhythm: Form and Function in Everyday Verbal
Interaction. Amsterdam: John Benjamins.
Crystal, David 1969. Prosodic Systems and Intonation in English. Cambridge: Cambridge Univer-
sity Press.
Dael, Nele and Klaus R. Scherer 2012. The Body Action and Posture coding system (BAP):
Development and reliability. Journal of Nonverbal Behavior, 36, 97–121.
Davis, Martha 1979. Laban analysis of nonverbal communication. In: Shirley Weitz (ed.), Nonver-
bal Communication: Readings with Commentary, 182–206. New York: Oxford University Press.
Dittmar, Norbert 2004. Transkription: Ein Leitfaden mit Aufgaben für Studenten, Forscher und
Laien. Heidelberg: VS Verlag für Sozialwissenschaften.
Du Bois, John W. 1991. Transcription design principles for spoken discourse research. Pragmatics
1(1): 71–106.
Du Bois, John W., Susanna Cumming, Stephan Schuetze-Coburn and Danae Paolino 1992. Dis-
course transcription. Santa Barbara Papers in Linguistics 4. University of California, Santa Bar-
bara, Department of Linguistics.
Ehlich, Konrad and Jochen Rehbein 1976. Halbinterpretative Arbeitstranskriptionen (HIAT 1).
Linguistische Berichte 25: 21–41.
Ehlich, Konrad and Jochen Rehbein 1979a. Erweiterte halbinterpretative Arbeitstranskriptionen
(HIAT2). Linguistische Berichte 59: 51–75.
Ehlich, Konrad and Jochen Rehbein 1979b. Zur Notierung nonverbaler Kommunikation für dis-
kursanalytische Zwecke (HIAT2). In: Peter Winkler (ed.), Methoden der Analyse von Face-to-
Face-Situationen, 302–329. Stuttgart: Metzler.
Ehlich, Konrad and Jochen Rehbein 1982. Augenkommunikation. Methodenreflexion und Beis-
pielanalyse. Amsterdam: John Benjamins.
Eibl-Eibesfeldt, Irenäus 1971. Transcultural patterns of ritualized contact behavior. In: Aristide H.
Esser (ed.), Behavior and Environment. The Use of Space by Animals and Men, 238–246. New
York: Plenum.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories,
origins, usage, and coding. Semiotica 1: 49–98.
Ekman, Paul and Wallace V. Friesen 1978. Facial Action Coding System (FACS): A Technique for
the Measurement of Facial Action. Palo Alto, CA: Consulting Psychologists Press.
Frey, Siegfried, Hans Peter Hirsbrunner, Jonathan Pool and William Daw 1981. Das Berner Sys-
tem zur Untersuchung nonverbaler Interaktion: I. Die Erhebung des Rohdatenprotokolls; II.
Die Auswertung von Zeitreihen visuell-auditiver Information. In: Peter Winkler (ed.), Metho-
den der Analyse von Face-to-Face-Situationen, 203–268. Stuttgart: Metzler.
Frey, Siegfried, Hans Peter Hirsbrunner and Ulrich Jorns 1982. Time-series notation: A coding
principle for the unified assessment of speech and movement in communication research. In:
Ernest W. B. Hess-Lüttich (ed.), Multimodal Communication: Vol. I Semiotic Problems of Its
Notation, 30–58. Tübingen: Narr.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin:
De Gruyter.
Garcia, Jesus, Ulrike Gut and Antonio Galves 2002. Vocale – a semi-automatic annotation tool for
prosodic research. Proceedings of Speech Prosody 2002.
Gibbon, Dafydd 1988. Intonation and discourse. In: Janos S. Petöfi (ed.), Text and Discourse Con-
stitution, 3–25. Berlin: De Gruyter.
Gibbon, Dafydd, Inge Mertins and Roger K. Moore 2000. Handbook of Multimodal and Spoken
Dialogue Systems: Resources, Terminology, and Product Evaluation. Norwell, Massachusetts,
USA: Kluwer Academic.
Goodwin, Charles 1981. Conversational Organization: Interaction between Speakers and Hearers.
New York: Academic Press.
Greenbaum, Paul E. and Howard Rosenfeld 1980. Varieties of touching in greetings: Sequential
structure and sex-related differences. Journal of Nonverbal Behavior 5(1): 13–25.
Grice, Martin, Stefan Baumann and Ralf Benzmüller 2005. German intonation in autosegmental-
metrical phonology. In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation
and Phrasing, 55–83. Oxford: Oxford University Press.
Gumperz, John and Norine Berenz 1993. Transcribing conversational exchanges. In: Jane A. Ed-
wards and Martin D. Lampert (eds.), Talking Data: Transcription and Coding in Discourse
Research, 91–121. Hillsdale, NJ: Lawrence Erlbaum.
Gussenhoven, Carlos, Toni Rietveld and Jacques Terken 2005. Transcription of Dutch intonation.
In: Sun-Ah Jun (ed.), Prosodic Typology: The Phonology of Intonation and Phrasing, 118–145.
Oxford: Oxford University Press.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Uni-
versität Bielefeld, ModeLex Tech. Rep 1.
Hager, Joseph C., Paul Ekman and Wallace V. Friesen 2002. Facial action coding system. Re-
trieved 15.07.2011, from http://face-and-emotion.com/dataface/facs/guide/InvGuideTOC.html
Hall, Edward T. 1963. A system for the notation of proxemic behavior. American Anthropologist
65(5): 1003–1026.
Hirsbrunner, Hans-Peter, Siegfried Frey and Robert Crawford 1987. Movement in human inter-
action: Description, parameter formation and analysis. In: Aron W. Siegman and Stanley Feld-
stein (eds.), Nonverbal Behavior and Communication, 99–140. Hillsdale, NJ: Lawrence
Erlbaum.
Hirst, Daniel 1991. Intonation models: Towards a third generation. In: Actes du XIIème Congrès International des Sciences Phonétiques, 305–310. Aix-en-Provence, France: Université de
Provence, Service des Publications.
Hirst, Daniel and Albert Di Cristo 1998. Intonation Systems: A Survey of Twenty Languages. Cam-
bridge: Cambridge University Press.
International Phonetic Association 2005. Handbook of the International Phonetic Association: A
Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University
Press.
Jefferson, Gail 1984. On stepwise transition from talk about a trouble to inappropriately next-
positioned matters. In: Maxwell J. Atkinson and John Heritage (eds.), Structures of Social
Action: Studies in Conversation Analysis, 191–222. Cambridge: Cambridge University Press.
Jefferson, Gail 2002. Is “no” an acknowledgment token? Comparing American and British uses of
(+)/(-) tokens. Journal of Pragmatics 34(10/11): 1345–1383.
Kallmeyer, Werner and Reinhold Schmitt 1996. Forcieren oder: Die verschärfte Gangart. Zur
Analyse von Kooperationsformen im Gespräch. In: Werner Kallmeyer (ed.), Gesprächsrhe-
torik: Rhetorische Verfahren im Gesprächsprozeß, 19–118. Tübingen: Narr.
Kendon, Adam 1967. Some functions of gaze-direction in social interaction. Acta Psychologica 26:
22–63.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Perga-
mon Press.
Kendon, Adam 1980. Gesture and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–288. The Hague: Mouton.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kendon, Adam and A. Ferber 1973. A description of some human greetings. In: R. P. Michael
and J. H. Crook (eds.), Comparative ecology and behavior of primates, 591–668. New York:
Academic Press.
Kipp, Michael 2004. Gesture Generation by Imitation: From Human Behavior to Computer Char-
acter Animation. Boca Raton, FL: Dissertation.com.
Kipp, Michael, Michael Neff and Irene Albrecht 2007. An annotation scheme for conversational
gestures: How to economically capture timing and form. Journal on Language Resources and
Evaluation – Special Issue on Multimodal Corpora 41(3/4): 325–339.
Klima, Edward and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard
University Press.
Knowles, Gerry, Briony Williams and L. Taylor 1996. A Corpus of Formal British English Speech.
London: Longman.
Kohler, Klaus J. 1991. A model of German intonation. Arbeitsberichte des Instituts für Phonetik
und digitale Sprachverarbeitung der Universität Kiel 25: 295–360.
Kohler, Klaus J. 1995. ToBIG and PROLAB: Two prosodic transcription systems for German
compared. Paper presented at the Conference ICPhS Stockholm, 13 August 1995.
Kohler, Klaus J., Matthias Pätzold and Adrian P. Simpson 1995. From Scenario to Segment: The
Controlled Elicitation, Transcription, Segmentation and Labelling of Spontaneous Speech.
Kiel, Germany: Institut für Phonetik und Digitale Sprachverarbeitung, IPDS, Universität Kiel.
Laban, Rudolph von 1950. The Mastery of Movement on the Stage. London: Macdonald and
Evans.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium ‘hand’: Discover-
ing recurrent structures in gestures. Semiotica.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behavior with the NEUROGES –
ELAN system. Behavioral Research Methods 41(3): 841–849.
Llisterri, Joaquim 1994. Prosody encoding survey. MULTEXT-LRE Project 62-050.
MacWhinney, Brian 2000. The CHILDES Project: Tools for Analyzing Talk, Volume 2: The Da-
tabase. Hillsdale, NJ: Lawrence Erlbaum.
Martell, Craig 2002. Form: An extensible, kinematically-based gesture annotation scheme. Paper
presented at International Conference on Language Resources and Evaluation. European
Language Resources Association.
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, Department of Computer and Information Sciences, University of Pennsylvania.
Martell, Craig and Joshua Kroll no date. Corpus-based gesture analysis: An extension of the
FORM dataset for the automatic detection of phases in a gesture. Unpublished manuscript.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, David and Susan D. Duncan 2000. Growth points in thinking-for-speaking. In: David
McNeill (ed.), Language and Gesture, 141–161. Cambridge: Cambridge University Press.
Mertens, Pier 2004. The prosogram: Semi-automatic transcription of prosody based on a tonal per-
ception model. Paper presented at Speech Prosody 2004, March 23–26, 2004, Nara, Japan.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische Perspektive. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a grammar of
gesture: evolution, brain, and linguistic structures. Berlin: Antrag im Rahmen der Förderinitia-
tive “Schlüsselthemen der Geisteswissenschaften Programm zur Förderung fachübergreifender
und internationaler Zusammenarbeit”.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of ges-
tures: A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig
and David McNeill (eds.), Body – Language – Communication: An International Handbook
on Multimodality in Human Interaction. Handbooks of Linguistics and Communication
Science (38.1). Berlin and Boston: De Gruyter Mouton.
O’Connor, Joseph Desmond and Gordon Frederick Arnold 1973. Intonation of Colloquial
English. London: Longman.
Pierrehumbert, Janet B. 1980. The Phonology and Phonetics of English Intonation. Boston: Mas-
sachusetts Institute of Technology Press.
Prillwitz, Siegmund, Regina Leven, Heiko Zienert, Thomas Hanke and Jan Henning 1989. Ham-
NoSys Version 2.0 Hamburger Notationssystem für Gebärdensprachen: Eine Einführung. Ham-
burg: Signum Verlag.
Redder, Angelika 2001. Aufbau und Gestaltung von Transkriptionssystemen. In: Klaus Brinker,
Gerd Antos, Wolfgang Heinemann and Sven F. Sager (eds.), Text und Gesprächslinguistik.
Ein Internationales Handbuch Zeitgenössischer Forschung, 1038–1059. (Handbücher zur
Sprach- und Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
Roach, Peter, Gerry Knowles, Tamas Varadi and Simon Arnfield 1993. Marsec: A machine-read-
able spoken English corpus. Journal of the International Phonetic Association 23(2): 47–54.
Sager, Sven F. 2001. Probleme der Transkription nonverbalen Verhaltens. In: Klaus Brinker, Gerd
Antos, Wolfgang Heinemann and Sven F. Sager (eds.), Text und Gesprächslinguistik. Ein In-
ternationales Handbuch Zeitgenössischer Forschung, 1069–1085. (Handbücher zur Sprach- und
Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
Sager, Sven F. and Kristin Bührig 2005. Nonverbale Kommunikation im Gespräch – Editorial. In: Kristin Bührig and Sven F. Sager (eds.), Osnabrücker Beiträge zur Sprachtheorie 70: Nonverbale Kommunikation im Gespräch, 5–17. Oldenburg: Redaktion Obst.
Scheflen, Albert 1965. The significance of posture in communication systems. Psychiatry 27: 316–331.
Scherer, Klaus R. 1970. Non-Verbale Kommunikation: Ansätze zur Beobachtung und Analyse der
Aussersprachlichen Aspekte von Interaktionsverhalten. Hamburg: Buske.
Scherer, Klaus R. 1979. Die Funktionen des nonverbalen Verhaltens im Gespräch. In: Klaus R.
Scherer and Harald G. Wallbott (eds.), Nonverbale Kommunikation: Forschungsberichte zum
Interaktionsverhalten, 25–32. Weinheim: Beltz.
Schmitt, Reinhold (ed.) 2007. Koordination: Analysen zur Multimodalen Interaktion. Tübingen:
Narr.
Schneider, Wolfgang 2001. Der Transkriptionseditor HIAT-DOS. Gesprächsforschung-Online
Zeitschrift zur Verbalen Interaktion 2: 29–33.
Schönherr, Beatrix 1997. Syntax – Prosodie – Nonverbale Kommunikation. Empirische Untersu-
chungen zur Interaktion Sprachlicher und Parasprachlicher Ausdrucksmittel im Gespräch. Tü-
bingen, Germany: Niemeyer.
Schöps, Doris in preparation. Körperhaltung als Zeichen am Beispiel des DEFA-Films. Disserta-
tion, Technische Universität Berlin.
Selting, Margret, Peter Auer, Birgit Barden, Jörg R. Bergmann, Elizabeth Couper-Kuhlen, Susanne Günthner, Christoph Meier, Uta Quasthoff, Peter Schlobinski and Susanne Uhmann
1998. Gesprächsanalytisches Transkriptionssystem (GAT). Linguistische Berichte 173: 91–122.
Selting, Margret, Peter Auer, Dagmar Barth-Weingarten, Jörg Bergmann, Pia Bergmann, Karin
Birkner, Elizabeth Couper-Kuhlen, Arnulf Deppermann, Peter Gilles, Susanne Günthner,
et al. 2009. Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Gesprächsforschung –
Online Zeitschrift zur Verbalen Interaktion 10: 353–402.
Silverman, Kim, Mary Beckman, John Pitrelli, Mori Ostendorf, Colin Wightman, Patti Price, Janet
Pierrehumbert and Julia Hirschberg 1992. ToBI: A standard for labeling English prosody.
Proceedings of ICSLP-1992, 867–870.
Stokoe, William 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Wallbott, Harald 1998. Ausdruck von Emotionen in Körperbewegungen und Körperhaltungen. In:
Caroline Schmauser and Thomas Noll (eds.), Körperbewegungen und ihre Bedeutung, 121–136.
Berlin: Arno Spitz.
Weinrich, Lotte 1992. Verbale und Nonverbale Strategien in Fernsehgesprächen: Eine Explorative
Studie. Tübingen: Niemeyer.
Wells, John, William Barry, Martin Grice, Adrian Fourcin and Dafydd Gibbon 1992. Standard
computer-compatible transcription. Technical Report No. SAM Stage Report Sen.3 SAM
UCL-037. London: University College London.
Zwitserlood, Ingeborg, Asli Özyürek and Pamela Perniss 2008. Annotation of sign and gesture
cross-linguistically. Paper presented at the 3rd Workshop on the Representation and Proces-
sing of Sign Languages, Marrakesh.
Abstract
This chapter presents a proposal for the description of gesture phases derived from articu-
latory characteristics observable in their execution. It is grounded in a linguistic approach
to gestures which stresses the separation of gestural forms and functions in the analytic
process. By presenting a context-independent and a context-sensitive description of ges-
ture phases this paper aims at three aspects: 1) to present articulatory features apparent
in the execution of gestures phases, 2) to characterize and define gesture phases according
to these features and independent of the their functional aspects, and 3) to allow for a
description of cases in which features are replaced due to the sequential embedding of
phases in particular linear successions. The article concludes with a focus on the practical
implementation of this approach in the annotation process.
1. Introduction
When investigating the tight coordination between speech and gesture, it is useful to
focus on the structure of both modalities. Gestures show simultaneous structures
(along the different form parameters: hand shape, orientation, movement, and position,
e.g., Bressem this volume; Ladewig and Bressem forthcoming; Müller, Bressem, and
Ladewig this volume; see also Battison 1974; Stokoe 1960) but they also exhibit a linear
structure that is hierarchically organized (Müller et al. 2005; Müller, Bressem, and
Ladewig this volume). Segments of gestural movement have been referred to as gesture
phases (see below). They were investigated with respect to their internal structure
(Fricke 2012; Kendon 1980, 2004), their coordination and correlation with units of speech
(Condon and Ogston 1967; Efron [1941] 1972; Fricke 2012; Karpiński, Jarmołowicz-
Nowikow, and Malisz 2009; Kendon 1972, 1980; Kita, van Gijn, and van der Hulst 1998;
Loehr 2006; McClave 1991; Nobe 2000; Quek et al. 2002; Seyfeddinipur 2006; Yassinik,
Renwick, and Shattuck-Hufnagel 2004), and their characteristics observable in their exe-
cution (e.g., Bressem and Ladewig 2011; Chafai, Pelachaud, and Pelé 2006; Harling and
Edwards 1997; Kahol, Tripathi, and Panchanathan 2006; Latoschik 2000; Martell and
Kroll 2007; Wilson, Bobick, and Cassell 1996).
parts – the preparation, the gesture proper, and the return” (Ott 1902: 21). Mosher also
remarked that “[t]he great majority of gestures with the hands consist of three parts,
which may be termed the preparation, the stroke, and the recovery.” (Mosher 1916:
10) Furthermore, although argued from a prescriptive perspective (see also Kendon
1980), both remark that gestures can be combined linearly into sequences or series of
gestures.
Kendon (1980) gives a first systematic account of the linear structure of gestures.
Based on the observation that “the pattern of movement that co-occurs with the speech
has a hierarchic organization which appears to match that of the speech units” (Kendon
1972: 190), he focused on the close relationship of gestures and speech and addressed
the phrasal structure of gestures. Kendon identified six different gesture phases: the
rest position, a “moment when the limb is in rest”, the preparation, a phase “in which
the limb moves away from its rest position to a position at which the stroke begins”,
the stroke, the meaningful part of a gesture “in which the limb shows a distinct peaking
of effort”, the hold, a moment in which “the hand is held still in the position it reached
at the end of the stroke”, and the retraction, “a phase in which the limb is either moved
back to its rest position or is readied for another stroke” or the partial retraction, a
“phase in which the hand does not return all the way to the position it was in” (Kendon
1980: 212). The linear combination of these phases can form higher-order structures: A
gesture phrase is composed of a preparation and a stroke. The whole excursion of the
hands from one rest position to the next is termed gesture unit (Kendon 1980). Within
this seminal paper, Kendon not only provides a vocabulary for the segmentation of gestural movement but also demonstrates that gestures show a linear structure and are hierar-
chically organized. Furthermore, his work is fundamental for a fine-grained analysis of
gesture-speech interaction and marks the beginning of the coding of gesture phases in
gesture studies.
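To make the hierarchy just described explicit, the following Python sketch models gesture phases combining into gesture phrases (a preparation plus a stroke) and gesture units (the whole excursion from one rest position to the next). The class and field names, and the option of appending further phases such as holds or retractions to a phrase, are my own simplifications for illustration.

```python
from dataclasses import dataclass, field
from typing import List

PHASES = {"rest position", "preparation", "stroke", "hold", "retraction", "partial retraction"}

@dataclass
class GesturePhase:
    label: str    # one of PHASES
    start: float  # seconds
    end: float

@dataclass
class GesturePhrase:
    """A preparation plus a stroke; further phases (e.g., a hold or retraction)
    may be appended in the annotation."""
    phases: List[GesturePhase] = field(default_factory=list)

    @property
    def stroke(self) -> GesturePhase:
        return next(p for p in self.phases if p.label == "stroke")

@dataclass
class GestureUnit:
    """The whole excursion of the hands from one rest position to the next."""
    phrases: List[GesturePhrase] = field(default_factory=list)

unit = GestureUnit([GesturePhrase([GesturePhase("preparation", 1.0, 1.4),
                                   GesturePhase("stroke", 1.4, 1.9),
                                   GesturePhase("retraction", 1.9, 2.3)])])
```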
Following contributions extend Kendon’s model by focusing on particular gesture
phases and by providing further characterizations. Kita, van Gijn, and van der Hulst
(1998) address a particular gesture phase, namely the hold. While Kendon solely
talks about a hold as a phase in which the hand is held still after the execution of a
stroke, Kita, van Gijn, and van der Hulst point out that a stroke can be preceded by a pre-stroke hold and followed by a post-stroke hold.
This differentiation is grounded in the functions these two types of holds fulfill with
respect to speech. Whereas a pre-stroke hold “is a period in which the gesture waits
for speech to establish cohesion so that the stroke co-occurs with the co-expressive por-
tion of speech” (Kita, van Gijn, and van der Hulst 1998: 26), a post-stroke hold is “a way
to temporally extend a single movement stroke so that the stroke and post-stroke-hold
together will synchronize with the co-expressive portion of speech” (Kita, van Gijn, and
van der Hulst 1998: 26). Moreover, they distinguish between independent holds, i.e.,
holds that can stand by themselves and be a “gestural expression” on their own and
dependent holds which adjoin and are “parasitic to the stroke” because “they arise
from the semiotic coordination or modification of the expression in the stroke” (Kita,
van Gijn, and van der Hulst 1998: 28). This functional specification of holds is currently
almost omnipresent in analyses focusing on gestures (see, for example, Gullberg and
Holmqvist 2006; Kendon 2004; Kettebekov and Sharma 2001; McCullough 2005;
McNeill 2005, 2012; Park-Doob 2010; Parrill 2000; Quek et al. 2002; Sowa 2006).
By and large, Kita, van Gijn, and van der Hulst (1998) remain within the framework
provided by Kendon (1980). Although they do focus on a few other characteristics such
as particular aspects of the gesture phase preparation, the basic setup of the gesture phases as introduced by Kendon remains the same. Two aspects of their endeavor, however, stand out in contrast to the one put forward by Kendon. Their proposal includes guidelines for the segmentation of gesture phases, i.e., the division of phases into smaller stretches of movement prior to an identification of particular gesture phases. Further-
more, they aimed at a “method of analyzing continuous production of gestures and
signs, which is based purely on [a] formal basis” (Kita, van Gijn, and van der Hulst
1998: 34) and thus address an important aspect regarding the coding of gesture phases,
namely its orientation toward and dependence on speech.
The hold is again in the focus of attention in the McNeill lab coding manual (Duncan
n.d.). Duncan distinguishes between full holds, i.e. “holds with no detectable move-
ment” (Duncan n.d.: 4), and “ ‘feature’ or ‘virtual holds’ ”, i.e., holds which show
some movement but are characterized by a maintenance of hand shape and position
in gesture space (Duncan n.d.: 4). Regarding the other gesture phases, she includes prep-
aration, stroke, hold, and retraction in her coding manual, but gives no further descrip-
tion. Only the stroke is characterized in more detail, namely as the phase which
“typically (but not always!) is the interval of apparent greatest gestural effort” (Duncan
n.d.: 3). Although McNeill (2005) partially complements Duncan's reflection on the phases, insofar as he provides a further specification of the various gesture phases
and includes the phases pre-stroke hold and post-stroke hold, new aspects regarding
the characterization of gesture phases are again only added with respect to holds.
With her frame-by-frame marking procedure Seyfeddinipur (2006) initiates a new
turn in the identification of gesture phases. This segmentation procedure examines ges-
tural movement sequences frame by frame and marks transitions from one gesture
phase to another based on the sharpness of the video image. Accordingly, three types of transitions in the execution of gestural movement sequences are distinguishable. These types of transitions provide the basis for the assignment of gestural movement phases to a specific type of gesture phase, as each gesture phase is allocated to particular types of transitions. On this basis, two types of movement phases are distinguished:
(i) dynamic phases which are characterized by the execution of movement such as in
preparation, stroke, and retraction, and
(ii) static phases which do not involve movement such as holds and rest positions.
Although refinements of Kendon’s (1980) description of gesture phases have been offered (e.g., Kita, van Gijn, and van der Hulst 1998), the basic model has remained the same; modifications were offered mainly with respect to technical aspects of the coding process. Previous contributions have demonstrated that
(i) gestural movements display a structure of their own, i.e., gestural movements are
characterized by the progression of specific phases,
(ii) these phases are hierarchically organized, and
(iii) they correspond closely to units at the speech level, thereby underlining the close
relationship between gesture and speech.
The approach presented in the following rests on a heuristic separation of gestural forms and functions in the analytic process. In this framework, gestures are first described with
respect to their form, i.e. independent of speech. For this purpose, the approach adopts
the four parameters of sign language for the characterization of gestures (see Battison
1974; Becker 2004; Ladewig and Bressem forthcoming; Sparhawk 1978; Stokoe 1960;
Webb 1996) and describes gestures regarding the configuration of the hand, the orientation
of the palm, the movement, and the position in gesture space. Only in a second step are ges-
tural forms evaluated with respect to their function and related to speech. Based on the sep-
arate description of form and function, analyses approaching gestures within this
framework are able to make statements about patterns and structures of gestures on the
level of form alone as well as on the level of form and function.
For the context-independent description of gesture phases, two sets of articulatory features are distinguished:
(i) distinctive features comprising the categories movement and tension, and
(ii) additional features subsuming the categories possible types of movement and flow of movement.
All features are based on physical characteristics observable in the execution of gesture
phases. However, the two sets of features show a different distribution across phases.
The set of distinctive features subsumes attributes that are visible in all gesture phases.
They make up a paradigmatic set of properties that are mutually exclusive and stand in
opposition to each other. Accordingly, a gesture phase carrying a particular feature
from the category movement such as “presence of movement” cannot carry another
feature from the same category such as “absence of movement”.
The set of additional features comprises properties that are not observable in all ges-
ture phases as they only apply to such phases that exhibit a particular distinctive fea-
ture, namely “presence of movement.” They can only selectively be used to identify
gesture phases, because they have a different status in the form-based characterization
of gesture phases. They will be used for a further description in order to enhance the
formal account of the gesture phases to be presented.
Both types of articulatory features are presented in the following.
3.1.1.1. Movement
A prominent feature observable in the execution of gesture phases is the feature
movement. As such, gesture phases can be distinguished according to the presence ([+movement]) or absence ([-movement]) of movement.
3.1.1.2. Tension
During the execution of gesture phases, changes between relaxation and exertion as
well as differences in the strengths of tenseness are observable. Tension may increase,
decrease, or remain stable. In order to account for differences in tension two different
kinds of muscular activities are taken into account: the volar and dorsal flexion of the
fingers.
Different qualities of tension are reflected in the configuration of the hand. As the
hand always shows some kind of a hand configuration due to residual muscle tension
(tonus), a default condition needs to be presumed. Accordingly, tension applies when
the hand’s configuration differs from that assumed in the default condition.
Accordingly, the category tension shows two features, namely [+tension] and [-tension].
The feature [+tension] applies when the fingers either stretch (dorsal flexion) or bend
towards the palm (volar flexion), for instance. Tension is absent when the hands are
relaxed in a default position. In this case the feature [-tension] applies.
Gesture phases showing the feature [+tension] can be sub-classified and marked by the features [+constant] and [-constant].
Movement phases exhibiting a stable configuration that differs from the ones taken in a
default condition are marked by the feature [+constant]. In these cases the tenseness
of the hand does not change throughout the execution of the phase. When [-constant]
applies, tenseness either increases or decreases. Accordingly, phases marked by this
feature can be further characterized in terms of an increase and decrease of tension:
(i) [+increase], and
(ii) [-increase].
In both of the above-mentioned cases, the beginning and the end of the phases
show different degrees of tenseness reflected in varying configurations. The feature
[+increase] applies to phases in which tenseness is built up. Accordingly, in the begin-
ning of such a phase the hand is in or is close to the default condition and lacks
tension whereas the endpoint of the movement phase shows tension. This results in
the formation of a configuration of the hand, which differs from the ones that can be
assumed in the default condition. In phases identified by a decrease of tension, the
beginning of the gesture phase is characterized by tension of the hand whereas the end-
point of the movement lacks the feature tension, i.e., the hand is in or close to the
default condition (Fig. 69.1). The decrease in hand tension corresponds with a deforma-
tion of the hand’s configuration approximating the default condition (see also Harling
and Edwards 1997; Martell 2005; for special cases see Bressem and Ladewig 2011: 69).
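To make the logic of the tension category explicit, the following minimal Python sketch derives the features [+/-tension], [+/-constant], and [+/-increase] from hypothetical numeric scores for the hand’s deviation from the default condition at the beginning and at the end of a movement phase. The scores, the threshold, and the function itself are illustrative assumptions, not part of the notation system.

```python
def tension_features(start_dev, end_dev, threshold=0.1):
    """Derive the tension-related features of a movement phase.

    start_dev / end_dev: hypothetical scores for how far the hand's
    configuration deviates from the relaxed default condition (tonus)
    at the beginning and at the end of the phase.
    threshold: hypothetical cut-off below which the hand counts as relaxed.
    """
    features = {"tension": start_dev > threshold or end_dev > threshold}
    if not features["tension"]:
        return features            # [-tension]: hand remains in the default condition
    # [+constant]: tenseness does not change during the phase
    features["constant"] = abs(end_dev - start_dev) <= threshold
    if not features["constant"]:
        # [-constant]: tension is either built up ([+increase]) or resolved ([-increase])
        features["increase"] = end_dev > start_dev
    return features

# Example: a preparation-like phase in which tension is built up.
print(tension_features(start_dev=0.05, end_dev=0.8))
# {'tension': True, 'constant': False, 'increase': True}
```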
The application of the distinctive features allows for a differentiation of movement
phases in terms of gesture phases (see section 3.1.2 and 5). Additional features pre-
sented in the following section can enhance this form-based account and offer a
more detailed characterization of the phases.
The category possible types of movement comprises the two features [+restricted] and [-restricted]. Gesture phases that are marked by the feature [+restricted] show only rather “straight” and “curved” movements. In cases in which the movement is anchored solely at the wrist, “bending” and “raising” of the hand as well as “rotation” can be observed. In phases in which the feature [-restricted] applies, the following six basic types are observable:
(i) “straight”,
(ii) “arced”,
(iii) “circle”,
(iv) “spiral”,
(v) “zigzag”, and
(vi) “s-line.”
The category flow of movement captures varying force and/or a change of velocity in the execution of movement. Two features are distinguished: [+variable] and [-variable].
In cases in which the feature [+variable] applies, the flow of movement can be described
as variable, i.e., it shows some degree of variation within one movement phase: It may
be accentuated, accelerated, or decelerated. If the flow of movement does not show any
variation the phase is marked by the feature [-variable] (see Tab. 69.1).
For the dynamic gesture phases, the additional features have the same status as the
distinctive features. They show a particular distribution across the dynamic phases and,
as such, differentiate these phases from each other. As is the case with the distinctive
features, the additional features are mutually exclusive of one another and stand in
opposition to each other, i.e., if a phase carries a particular feature from one category
it cannot carry another feature of the same category.
Tab. 69.1: Additional features of gesture phases – possible types of movement: [+restricted] vs. [-restricted]; flow of movement: [+variable] vs. [-variable]
Applying the sets of distinctive features as well as additional features to gesture phases
results in the following characterization of gesture phases.
3.1.3.1. Preparation
The gesture phase of a preparation is characterized by the execution of movement. The
tenseness of the hand increases during its performance as the hand shape is being
formed and a configuration that differs from the one taken in the default condition is
assumed. Accordingly, the preparation is marked by the features [+movement],
[+tension], [-constant], and [+increase]. The preparation phase can be further character-
ized through additional features. According to these, it is distinguished by a restricted
variation of movement types, as only “straight” and “curved” movements occur. If
the movement is anchored merely at the wrist, the hand is raised (dorsal flexion) or it is
rotated upwards (supination) in the majority of cases. The flow of movement does not
show any variation. Accordingly, the preparation carries the features [+restricted] and
[-variable] (see Tab. 69.2).
3.1.3.2. Retraction
The retraction phase shows a similar feature matrix as the preparation phase. Both
phases are marked by the presence of movement [+movement] and a varying tenseness
([+tension], [-constant]). However, during the execution of the retraction tenseness
decreases ([-increase]). As such, the hand’s configuration is modified insofar as it is
resolved and approximates a default condition.
Again the variation of movement types is restricted to “straight” and “curved.” If the
movement is anchored at the wrist only, the hand is either bent to pulse (volar flexion)
or rotated (pronation), in the majority of cases. The flow of movement does not vary.
According to these observations, the features [+restricted] and [-variable] apply.
3.1.3.3. Stroke
Similar to the preparation and the retraction, the stroke is characterized by execution of
movement. However, this is the only feature that all three gesture phases discussed so
far have in common. Throughout the execution of a stroke, the tenseness of the hand
remains stable, i.e., the configuration of the hand is maintained and differs from the
one taken in the default condition (see Bressem and Ladewig 2011 for further details).
Furthermore, the types of movement are not restricted at all. This means, that the six
basic movement types “straight”, “curved,” “circle”, “spiral”, “zigzag”, and “s-line”
may be realized by a stroke. Additionally, the flow of movement may vary during the
execution of the stroke. This means that the stroke is the only phase, which may be ac-
centuated and/or performed with a change in velocity. According to these observations,
the stroke is marked by the features [+movement], [+tension], [+constant], [-restricted],
and [+variable].
3.1.3.4. Hold
The hold is characterized by absence of movement. Throughout the execution of a hold,
the tenseness of the hand remains stable (see above) and the configuration differs from
the one taken in the default condition. In some cases, slight drifting movements are
observable in the execution of a stroke, which are not meaningful (see also Duncan
n.d.). However, these are not considered gestural movements but are evidence of mus-
cle contraction. Accordingly, the features [-movement], [+tension], and [+constant]
apply.
The gesture phases stroke and hold share one distinctive feature that is not realized
in the remaining phases and is, as such, essential for their identification: This is the
feature [+tension] that is sub-classified by the feature [+constant].
3.1.4. Summary
This context-independent perspective on gesture phases allows an explication of the ar-
ticulatory features that an analyst perceives and relies on when coding gesture phases
and when describing their forms and functions. These features are subsumed under
sets of distinctive and additional features, which show a particular distribution across
gesture phases. In this way, a feature matrix for each gesture phase was set up (see Tab. 69.2).
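Since the feature matrix itself (Tab. 69.2) is not reproduced here, the following sketch encodes the matrices described in sections 3.1.3.1 to 3.1.3.4 as Python dictionaries and shows how a (possibly partial) bundle of observed features could be matched against them; the entry for the rest position is inferred from the description of the default condition and is therefore an assumption.

```python
# Feature matrices from sections 3.1.3.1-3.1.3.4 (True = "+", False = "-",
# absent = not applicable). The rest position entry is inferred and thus assumed.
PHASE_FEATURES = {
    "preparation": {"movement": True,  "tension": True,  "constant": False,
                    "increase": True,  "restricted": True,  "variable": False},
    "retraction":  {"movement": True,  "tension": True,  "constant": False,
                    "increase": False, "restricted": True,  "variable": False},
    "stroke":      {"movement": True,  "tension": True,  "constant": True,
                    "restricted": False, "variable": True},
    "hold":        {"movement": False, "tension": True,  "constant": True},
    "rest position": {"movement": False, "tension": False},   # assumed
}

def matching_phases(observed):
    """Return all phases whose matrix is compatible with the observed
    (possibly partial) feature bundle."""
    return [phase for phase, matrix in PHASE_FEATURES.items()
            if all(matrix.get(f) == v for f, v in observed.items() if f in matrix)]

# Movement with constant tension and a variable flow of movement -> stroke.
print(matching_phases({"movement": True, "constant": True, "variable": True}))
```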
However, when observing gesture phases embedded within a series of further phases,
these characteristics may undergo changes. Some cases in which the phases’ character-
istics deviate from their usual feature sets will be presented in the following section
focusing on a context-sensitive description.
When gesture phases are embedded in a series of further phases, contextual changes may affect the articulatory features of two adjacent gesture phases. Two cases will be presented: firstly the omission of a preparation, and secondly a succession of strokes.
In successions of strokes, the preparation phases in between show changes in the category “tension”: the feature [+increase] is replaced by the feature [+constant]. This change in tension is reflected in the hand’s configuration. Whereas the hand’s configuration is usually being formed during the execution of a preparation, it is maintained in these specific linear successions. Hence, these preparation phases display the features assigned to the gesture phase stroke (see Tab. 69.2).
These observations thus raise the question whether the movement segments in
between the strokes can still be regarded as preparatory phases or whether they should
rather be considered as strokes. A closer examination of the strokes in these sequences
supports the first assumption: All strokes in these successions show a further movement
characteristic subsumed under the category “flow of movement”, namely an accentua-
tion. By accentuation of movement we understand that the end of the motion is stressed
such that the movement is carried out with more force. This rise in force leads to an
increase in the intensity at the end of the movement execution (see Bressem 2012).
Accordingly, in addition to the features [+movement, +tension, +constant], the strokes in such successions exhibit the feature [+variable]. Since the movement segments in between lack this accentuation, they can still be set apart from the strokes and retain their status as preparations.
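The accentuation criterion could, in principle, be approximated from tracked hand positions by testing whether the speed of the movement peaks towards its end. The following sketch illustrates this idea with hypothetical coordinate data; it is not the authors’ procedure, which rests on the analyst’s perception of force.

```python
import numpy as np

def is_accentuated(positions, fps=25.0, final_portion=0.33):
    """positions: array of shape (n_frames, 2 or 3) with hand coordinates for
    one movement phase. Returns True if the maximum speed falls into the
    last `final_portion` of the phase (a rough proxy for a stressed ending)."""
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    peak = int(np.argmax(speeds))
    return peak >= (1.0 - final_portion) * (len(speeds) - 1)

# Example: a movement that accelerates towards its end counts as accentuated.
t = np.linspace(0.0, 1.0, 20)
accelerating = np.stack([t ** 2, np.zeros_like(t)], axis=1)
print(is_accentuated(accelerating))   # True
```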
The above-mentioned changes can only be observed if at least two strokes sharing the same form parameters are executed. In other words, preparations alter their articulatory feature “tension” only if more than one stroke with the same hand shape, orientation of the palm, movement, and position in gesture space is carried out.
3.2.3. Summary
By taking a linguistic perspective on the study of gestures, the identification and
description of gesture phases has been reconsidered. A context-independent description of the phases preparation, stroke, hold, retraction, and rest position was proposed, based on their articulatory characteristics alone and leaving functional aspects aside.
Two sets of articulatory features were introduced. These are
(i) distinctive features subsuming the categories movement and tension, and
(ii) additional features comprising the categories possible types of movement and flow
of movement.
4. Discussion
Adopting a feature-based approach to the description of gesture phases offers an expli-
cation of gestural characteristics and contributes to the discussion of gesturalness
(Kendon 2004: 15), i.e., the “features that an action must have for it to be treated as a
gesture” (Kendon 2004: 12, see also Kendon 1996). However, analyzing gesture phases
as a bundle of features does not necessarily imply the assumption that the described
gestural units resemble or are akin to units of speech. The present approach to gesture
phases is inspired by phonology and proposes to treat gesture phases as separate units
of analysis that can be perceived and analyzed as such. Therefore, the approach offered
here provides the opportunity to describe patterns and structures on the level of the
phases themselves and offers new insights into the semiotic system “gesture”. It consti-
tutes a step further towards a “grammar of a gesture” (Bressem 2012; Fricke 2012; La-
dewig 2012; Ladewig and Bressem forthcoming; Müller, Bressem, and Ladewig this
volume) and lays further grounds for an understanding of the intertwining simultaneous
and linear structures of speech and gesture.
The linguistic framework described here serves as a theoretical building block from
which elements are taken and adopted as far as the semiotic structures of the medium
“gesture” allow for it. It aims at developing a consistent terminology for the description
of both modalities speech and gesture and intends to advance reflections upon a multi-
modal grammar (Fricke 2012). Notably, this is done with the utmost care, so as not to
lose sight of the particular properties of gestures.
As proposed above (see section 3), the annotation is done independent of speech,
meaning that the sound is turned off during this process. In this way, only the units
of gesture phases are taken into account, not their relation to or function with respect to speech.
The annotation is executed in ELAN, a software tool developed for the annotation of audio-visual data (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, http://www.lat-mpi.eu/tools/elan/, Wittenburg et al. 2006). ELAN makes it possible to watch videos at varying speeds and analyze them frame by frame, to set up individual tiers for the categories to be annotated, and to group annotations according to a time interval. Furthermore, annotations can be exported to programs such as Microsoft Word or Microsoft Excel.
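As an illustration of such further processing, the following sketch loads a tab-delimited ELAN export into pandas and summarizes the annotated gesture phases. The file name, tier name, and column layout are hypothetical and would have to be adapted to the actual export settings.

```python
import pandas as pd

# Hypothetical tab-delimited ELAN export with one row per annotation.
columns = ["tier", "start_ms", "end_ms", "duration_ms", "annotation"]
df = pd.read_csv("gesture_phases_export.txt", sep="\t", names=columns)

# Keep the (hypothetical) gesture-phase tier and count the annotated phases.
phases = df[df["tier"] == "gesture_phase"]
print(phases["annotation"].value_counts())

# Mean duration per phase type, in seconds.
print(phases.groupby("annotation")["duration_ms"].mean() / 1000)
```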
In order to decide whether a particular movement is a preparation or a stroke, for
instance, the gestural movement excursions need to be segmented. More precisely, on-
sets and offsets of movement phases need to be determined. Seyfeddinipur’s (2006)
frame-by-frame marking procedure aims exactly at this analytical step in the coding of gesture phases. Based on the sharpness of the video image, it distinguishes between
(i) dynamic phases which are characterized by the execution of movement such as in
preparation, stroke, and retraction, and
(ii) static phases which do not involve movement such as in holds and rest positions.
For example, a stretch of twelve video frames (frames 5 to 16) may be judged as: blurry, blurry, blurry, clear, clear, clear, clear, blurry, blurry, clear, blurry, blurry.
It needs to be stressed, however, that the offset of one movement phase can only be
exactly determined when the onset of the following movement phase is taken into
account. The onset of one movement phase thus retrospectively determines the offset
of the preceding. Accordingly, the precise ending of any gestural movement phase
can only be made out when
(i) the next video image is as clear as the preceding thus showing that it belongs to a
following static phase, or
(ii) the next video image is blurred, thus showing that it belongs to a following dynamic phase (see Fig. 69.1).
Furthermore, the sharpness of a video image should be determined with regard to the preceding and following video images, as this criterion depends on a comparison between adjacent frames rather than on an absolute judgment.
The determined dynamic and static movement phases provide the basis for the assign-
ment of gesture phases since each gesture phase is allocated to particular types of
transitions.
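The sharpness criterion could be approximated automatically, for instance with the variance of the Laplacian as a per-frame blur measure. The following sketch illustrates this idea with OpenCV; the file name and threshold are hypothetical, and the original procedure relies on the analyst’s visual judgment of each frame rather than on such a measure.

```python
import cv2

def frame_sharpness(video_path):
    """Return one sharpness value (variance of the Laplacian) per frame."""
    cap = cv2.VideoCapture(video_path)
    values = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        values.append(cv2.Laplacian(gray, cv2.CV_64F).var())
    cap.release()
    return values

def segment(sharpness, threshold=100.0):
    """Label each frame as belonging to a dynamic (blurred) or static (clear)
    movement phase and record the frame indices at which the label changes."""
    labels = ["dynamic" if s < threshold else "static" for s in sharpness]
    transitions = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, transitions

labels, transitions = segment(frame_sharpness("speaker01.mp4"))
print(transitions)   # frame indices at which a new movement phase begins
```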
In order to distinguish strokes from each other, it might in some cases be useful to pay
attention to a change in the realization of one or more parameters, i.e. movement, hand
shape, orientation of the palm, and position in gesture space. This analytical step is
based on the assumption that the meaning of a gesture is reflected in its form, such
that the modification of the form may entail changes in the meaning of a gesture (see
e.g., Ladewig 2010, 2011; Ladewig and Bressem forthcoming; Müller 2004). Accordingly,
the transition from two dynamic movement phases identified as strokes may not only be
visible in a frame showing a clear video image. Two strokes are also differentiated from
each other by a modification of one or more parameter realizations.
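The following minimal sketch illustrates this criterion: two adjacent dynamic phases, coded with hypothetical values for the four parameters, count as separate strokes if at least one parameter realization changes.

```python
PARAMETERS = ("hand_shape", "orientation", "movement", "position")

def changed_parameters(a, b):
    """Return the form parameters whose realization differs between two
    adjacent dynamic phases (hypothetical annotation dictionaries)."""
    return [p for p in PARAMETERS if a.get(p) != b.get(p)]

stroke_1 = {"hand_shape": "flat hand", "orientation": "palm lateral",
            "movement": "straight down", "position": "center-center"}
stroke_2 = {"hand_shape": "flat hand", "orientation": "palm down",
            "movement": "straight down", "position": "center-center"}

# At least one changed parameter -> the two phases count as separate strokes.
print(changed_parameters(stroke_1, stroke_2))   # ['orientation']
```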
If the analyst encounters phases which cannot be determined easily, s/he should consider special cases (see section 3.2 and Bressem and Ladewig 2011) or take the function of gesture phases into account, namely
(i) the function of a gesture phase with regard to surrounding gesture phases and
(ii) their communicative and interactive function.
Preparations and retractions fulfill functions with respect to following gesture phases.
The function of the preparatory phase is to bring the hand to a particular position at
which or from which a stroke can be performed. This function is reflected in the feature
[+restricted] since moving the hands to a position in space does not demand a variety of
movement patterns. Furthermore, during the preparatory phase the hand assumes
the configuration of the following stroke. Therefore, a preparation shows the features
[-constant] and [+increase] in most cases. So, if an analyst considers a particular move-
ment phase to be performed so that another phase can be executed, the movement
phase can be identified as a preparation.
The retraction constitutes, in most cases, a transition from a stroke to a rest position.
Accordingly, the formation assumed during the preparation and maintained while per-
forming the stroke is resolved and approximates a default condition. In some cases, the
hands are not moved fully back to a rest position but the path of the hands is interrupted.
This phase has been termed partial retraction (Kendon 1980: 212; Seyfeddinipur 2006).
However, speaking in terms of articulatory features and taking the phases themselves
into account, a partial retraction exhibits the same feature matrix as a “full” retraction.
The stroke is the phase that carries the meaning of a gesture. Thus, it forms the cen-
ter of a gestural unit, which has also been referred to as nucleus (Kendon 2004). Hence,
a stroke differs from other gesture phases insofar as it shows the widest range of pos-
sible types of movements and it varies in the flow of movement. As such it can be ac-
centuated, for instance, which is one reason why it has been termed the phase which “is
supposed to be more forceful compared to its neighboring phases” (Kita, van Gijn, and
van der Hulst 1998: 32).
A hold can occur independently but in most cases it prolongs the scope of the stroke
as a hold mostly precedes or follows it. Pre- and post-stroke holds together with the stroke, or an independent hold alone, belong to the nucleus of a gestural unit (Kendon 2004).
The most striking feature in the identification of a rest position is its form. During
the rest position the hands and arms are relaxed. A rest position serves as a reference
in order to decide whether a hand shows tension or not, which is why we also consider it
a default condition.
The three analytical steps presented above should not be understood as completely
independent steps in the annotation of gesture phases. Rather, the annotation process
must be conceived of as characterized by a back-and-forth between these three aspects (see Fig. 69.2). Particularly in complex linear sequences of gestures, the
continuous consideration of all three aspects enhances the coding process. The separa-
tion of this process into the determination of movement phases and their classification
based on form and function mainly serves the purpose of bringing to light the different
analytical steps necessary in the segmentation and coding of gesture phases.
Last but not least, we would like to point out that the method presented above is not
to be understood as an error-proof method but should be regarded as a companion to
other proposals and as a further step towards putting the coding of gesture phases on
objective grounds by taking the characteristics of the medium itself into account.
Acknowledgments
We are grateful to the Volkswagen Foundation for supporting this research with a grant
for the interdisciplinary project “Towards a grammar of gesture: evolution, brain and
linguistic structures” (www.togog.org).
6. References
Arendsen, Jeroen, Andrea J. van Doorn and Huib de Ridder 2007. When and how well do people
see the onset of gestures. Gesture 7(3): 305–342.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies
5: 1–19.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis, Department of Philos-
ophy and Humanities, Free University Berlin.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D.
dissertation, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt
(Oder).
Bressem, Jana this volume. A linguistic perspective on the notation of form features in gestures. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.1.)
Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases: Articulatory features of ges-
tural movement? Semiotica 184(1/4): 53–91.
Chafai, Nicolas Ech, Catherine Pelachaud and Danielle Pelé 2006. Analysis of gesture expressivity
modulations from cartoons animations. Workshop on “Multimodal Corpora”, International
Conference on Language Resources and Evaluation LREC, May 27th in Genoa, Italy.
Condon, William S. and William D. Ogston 1967. A segmentation of behavior. Journal for Psychi-
atric Research 5: 221–235.
Duncan, Susan D. n.d. Coding Manual. http://mcneilllab.uchicago.edu/pdfs/Coding_Manual.pdf,
accessed May 2006.
Efron, David 1972. Gesture, Race and Culture. Paris: Mouton. First published as Gesture and Environment, New York: King’s Crown Press [1941].
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1(1): 49–98.
Freedman, Norbert 1977. Hands, words and mind: On the structuralization of body movements
during discourse and the capacity for verbal representation. In: Norbert Freedman and
Stanley Grand (eds.), Communicative Structures and Psychic Structures, 109–132. New York:
Plenum.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2012. Grammatik Multimodal: Wie Wörter und Gesten Zusammenwirken. Berlin: De
Gruyter.
Gullberg, Marianne and Kenneth Holmquist 2006. What speakers do and what listeners look at.
Visual attention to gestures in human interaction live and on video. Pragmatics and Cognition
14(1): 53–82.
Harling, Philip and Alistair Edwards 1997. Hand tension as a gesture segmentation cue. In: Philip
Harling and Alistair Edwards (eds.), Progress in Gestural Interaction. Proceedings of Gesture
Workshop 1996, 75–88. Berlin: Springer.
Kahol, Kanav, Priyamvada Tripathi and Sethuraman Panchanathan 2006. Recognizing whole body
movements and gestures through activities in human anatomy. International Journal on Sys-
temics, Cybernetics and Informatics 3: 25–32.
Karpiński, Maciej, Ewa Jarmołowicz-Nowikow and Zofia Malisz 2009. Aspects of gestural and
prosodic structure of multimodal utterances in Polish task-oriented dialogues. In: Grazyna De-
menko, Krzysztof Jassem and Stanislaw Szpakowicz (eds.), Speech and Language Technology,
volume 11, 113–122. Poznań, Poland: Polish Phonetic Association.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron W. Siegman
and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Pergamon
Press.
Kendon, Adam 1980. Gesticulation and speech: Two aspects of the process of utterance. In:
Mary Ritchie Key (ed.), Nonverbal Communication and Language, 207–277. The Hague:
Mouton.
Kendon, Adam 1996. An agenda for gesture studies. The Semiotic Review of Books 7(3): 7–12.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University
Press.
Kettebekov, Sanshzar and Rajeev Sharma 2001. Toward natural gesture/speech control of a large
display. In: Roderick Little and Laurence Nigay (eds.), Engineering for Human-Computer
Interaction, 221–234. Heidelberg, Germany: Springer.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and
cospeech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture, and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
In: Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406.
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech. Ph.D. disserta-
tion, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discovering
structures in gestures on the basis of the four parameters of sign language. Semiotica.
Latoschik, Marc Erich 2000. Multimodale Interaktion in virtueller Realität am Beispiel der virtuel-
len Konstruktion. Bielefeld: Technische Universität Bielefeld.
Loehr, Dan 2006. Gesture and Intonation. Washington, DC: Georgetown University Press.
Martell, Craig 2005. FORM: An experiment in the annotation of the kinematics of gesture. Ph.D.
dissertation, Department of Computer and Information Sciences, University of Pennsylvania.
Martell, Craig and Joshua Kroll 2007. Corpus-based gesture analysis: An extension of the form
dataset for the automatic detection of phases in a gesture. International Journal of Semantic
Computing 1: 521.
McClave, Evelyn Z. 1991. Intonation and gesture. Ph.D. dissertation, Georgetown University,
Washington, DC.
McCullough, Karl-Erik 2005. Using Gestures during Speaking: Self-Generating Indexical Fields.
Chicago: ProQuest Information and Learning.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: Chicago
University Press.
McNeill, David 2005. Gesture and Thought. Chicago: Chicago University Press.
McNeill, David 2012. How Language Began: Gesture and Speech in Human Evolution. (Ap-
proaches to the Evolution of Language.) Cambridge: Cambridge University Press.
Menzerath, Paul and Antonio de Lacerda 1933. Koartikulation, Steuerung und Lautabgrenzung,
Volume 1. Berlin: Dümmler.
Mosher, Joseph A. 1916. The Essentials of Effective Gesture for Students of Public Speaking. New
York: Macmillan.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Gestik. Sprache und Literatur 41(1): 37–68.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Se-
dinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of gestures:
A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Gestures and speech from a lin-
guistic point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia, Hedda Lausberg, Ellen Fricke and Katja Liebal 2005. Towards a Grammar of
Gesture: Evolution, Brain, and Linguistic Structures. Berlin: Antrag im Rahmen der Förder-
initiative “Schlüsselthemen der Geisteswissenschaften Programm zur Förderung fachübergrei-
fender und internationaler Zusammenarbeit”.
Nobe, Shuichi 2000. Where do most spontaneous representational gestures actually occur with
respect to speech? In: David McNeill (ed.), Language and Gesture, 186–198. Cambridge: Cam-
bridge University Press.
Ott, Edward Amherst 1902. How to Gesture. New York: Hinds and Noble.
Park-Doob, Mischa Alan 2010. Gesturing through time: Holds and intermodal timing in the stream
of speech. Ph.D. dissertation, Department of Linguistics, University of Berkeley.
Parrill, Fey 2000. Hand to mouth: Linking spontaneous gesture and aspect. BA thesis, Department
of Linguistics, University of Berkeley.
Quek, Francis, David McNeill, Robert Bryll, Susan Duncan, Xin-Feng Ma, Cemil Kirbas, Karl E.
McCullough and Rashid Ansari 2002. Multimodal human discourse: Gesture and speech.
Association for Computing Machinery, Transactions on Computer-Human Interaction 9(3):
171–193.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting Speech and Gesture. (Max Planck Institute
Series in Psycholinguistics, 39.) Nijmegen: Max Planck Institute.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Berlin:
Akademische Verlagsgesellschaft.
Sparhawk, Carol 1978. Contrastive-Identificational features of Persian gesture. Semiotica 24(1/2):
49–86.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Communicative Systems of
the American Deaf. Studies in Linguistics, occasional paper, no. 8. Buffalo, NY: University
of Buffalo Press.
Trubetzkoy, Nikolaj S. 1958. Grundzüge der Phonologie. Göttingen: Vandenhoeck und Ruprecht.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. Ph.D. dissertation, University of
Rochester, Rochester, New York.
Wilson, Andrew D., Aaron F. Bobick and Justine Cassell 1996. Recovering the temporal structure
of natural gesture. In: Proceedings of the Second International Conference on Automatic Face
and Gesture Recognition, 66–71.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006.
ELAN: A professional framework for multimodality research. In: Proceedings of LREC
2006, Fifth International Conference on Language Resources and Evaluation.
Yasinnik, Yelena, Margaret Renwick and Stefanie Shattuck-Hufnagel 2004. The timing of speech
accompanying gestures with respect to prosody. In: Proceedings From Sound to Sense, 97–102.
http://velar.phonetics.cornell.edu/peggy/FA-Yasinnik-STS-MAC.pdf (29 November 2010).
70. A linguistic perspective on the notation of form features in gestures

Abstract

This chapter presents a notation system for gestures, which, by focusing solely on gestures’ physical appearance, directs the attention to the different facets of a gesture’s form and focuses on its detailed characterization.
1. Introduction
Notation, coding or annotation systems for gestures have been proposed by a range of
researchers (e.g., Birdwhistell 1970; Calbris 1990; Duncan n.d.; Gut et al. 2002; Kipp
2004; Lausberg and Sloetjes 2009; Martell 2005; McNeill 1992, 2005; Mittelberg 2006,
2010; Müller 1998; Sager 2001; Sager and Bührig 2005 inter alia; see also Bohle this vol-
ume). However, as these systems come from various disciplines and theoretical backgrounds and pursue differing analytical perspectives, a systematic linguistic method for the notation of gestures is still lacking. Systems differ greatly with respect to what is being described and in how much detail, as well as in the terminology and methodology applied. More-
over, they often remain implicit with regard to their respective research question or
analytic perspective (see Bressem this volume for an overview of existing notation
and transcription systems for gestures).
Typically, speech is omnipresent during notation and is made the standard of comparison that gestural descriptions have to meet. Aspects of gestures’ forms are thereby often selectively chosen depending on the accompanying utterance and the information contained
therein. Although the necessity to focus on gestures’ form is gaining more and more
ground in the respective research (e.g., Bergmann, Aksu, and Kopp 2011; Bressem
2012; Bressem and Ladewig 2011; Calbris 1990; Fricke 2007, 2012; Hassemer et al. 2011;
Holler and Beattie 2002; Kendon 2004; Ladewig 2010, 2011, 2012; Ladewig and Bressem
forthcoming; Lücking et al. 2010; Martell 2002; Mittelberg 2007; Müller 2004, 2010, 2011;
Sowa 2006; Teßendorf 2005; Webb 1996, 1998 inter alia), gestural forms alone have only
rarely been the focus of notation or annotation systems (e.g., Birdwhistell 1970; Sager
2001; Sager and Bührig 2005; Martell 2002). Against this background, it appears timely
to propose a framework for the annotation of gestures from a linguistic point of view.
The present chapter presents a notation system for gestures, which, by focusing solely
on gestures’ physical appearance, directs the attention to the different facets of a gesture’s
form, and focuses on its detailed characterization. The system is grounded in a linguistic-
semiotic approach to gestures, assuming a heuristic separation of form, meaning, and
function in the analytical process (see section 2 for further details). Accordingly, the
present system differs from other existing systems in three essential aspects:
The notation system only includes guidelines for the notation of gestures’ forms with
regard to their physical appearance. It addresses hand shapes, movement patterns,
orientations of the hand, and positions in gesture space. It does not include guidelines
for the segmentation and coding of gesture phases (see for example Bressem and
Ladewig 2011), a meaning analysis of gestural forms (e.g., Kendon 2004; Ladewig
2010, 2011; Müller 2004, 2010), a classification of gestures (e.g., McNeill 1992; Müller
1998), or other aspects of gestural coding and analysis. The notation system is consid-
ered as one module of a linguistic description and analysis of gestures, which can be
freely combined with other annotation systems or with other aspects of a linguistic ges-
ture analysis (for an overview of a linguistic method of gesture analysis see Müller,
Bressem, and Ladewig this volume; Müller, Ladewig, and Bressem this volume).
The notation system may be applied within a range of disciplines such as (cognitive)
linguistics and semiotics, anthropology, ethnography, primatology, psychology, and cogni-
tive science. It can be used for descriptive as well as experimental approaches to the ana-
lysis of gestures. Although it was developed within a linguistic context, it is not restricted
to linguistic research questions. On the contrary, it is designed to be a widely applicable
cross-disciplinary notation system for a description of gestures’ physical features. It is conceived as one component of a linguistic analysis of gestures that addresses
(i) form,
(ii) sequential structure,
(iii) context of use (local), and
(iv) distribution,
by which gestural meaning construction and the interplay of speech and gesture are analyzed.
The analysis of gestures’ form thereby rests upon the “four feature scheme” (Becker
2004, 2008), which grounds the description of gestures on the four parameters of sign
language (Battison 1974; Klima and Bellugi 1979; Stokoe 1960). Gestures are described
in the four parameters “hand shape”, “orientation”, “movement”, and “position”. Sim-
ilar to sign languages, for which each of the parameters can be distinctive in differen-
tiating one sign from another, a linguistic-semiotic approach to gestures assumes a
potential significance of all four parameters for the creation of gestural meaning. Ex-
cluding one of the parameters from the description might result in missing a possibly
meaningful realization. Here, the notation system presented in this chapter comes in.
It presents a system based on the four form parameters, which has been developed during
the course of an empirical study investigating recurrent gestural forms of German speakers
in naturally occurring conversations (Bressem 2007; Ladewig and Bressem forthcoming).
Features such as the distance of the arm from the body are captured only indirectly; a separate notation of these features is thus not included in the present system. Furthermore, the system was not designed as a nota-
tion system allowing for a real life reproduction of gestures in artificial agents for
instance (see for instance Martell 2002). Rather, it is designed to allow for a notation
of gestures that helps to uncover structures and patterns in gestures’ forms and functions, which in turn provide a sound basis for gestural meaning analyses.
(iii) A systematic characterization of gestural forms in all four parameters of sign
language
We suggest that it is essential to describe a given gesture with regard to all four
parameters formulated in Sign Linguistics (Battison 1974; Klima and Bellugi
1979; Stokoe 1960): hand shape, orientation, movement, and position. Therefore,
the notation system provides descriptive categories for the notation of hand
shapes, orientations of the palm, movement patterns, and positions in gesture
space. Note that the presentation of the notation conventions for the four para-
meters follows a particular logic by arranging them according to their prominence.
The system starts with the guidelines for the notation of hand shapes, as it assumes
hand shapes to be the most prominent form features of gestures. Similar to its use
in sign languages, it is assumed that the “perceptional identification of a relatively
stable hand shape in the flow of movement of single signs is much easier than
the identification and nomination of a movement, place of articulation or an
orientation of the hand.” (Wrobel 2007: 47, translation JB)
Based on the fact that the orientation of the hand is strongly connected to the
hand shape, the notation system places the parameter “orientation” in second
position in the notational logic. Contrary to other proposals of gesture notation
and coding, which argue for an inseparability of hand shape and orientation due
to their close connection and thus do not include separate conventions for these
two form aspects (e.g., Gut et al. 2002; Kendon 2004), the present system, based
on McNeill (1992), provides for a separate notation of orientations of the hand.
This assumption is grounded in a large body of research, showing that changes
in the orientation of the hand may go along with changes in gestural meaning
(e.g., Calbris 1990; Fricke 2012; Harrison 2009, 2010; Kendon 2004; Mittelberg
2006, 2010; Müller 2004; Sowa 2006 inter alia).
In third position, the system places the parameter “movement”, as it assumes
movement to be the other most prominent form feature apart from hand shape.
Sign language research for instance was able to show that the “perception of
sign movement appears to be crucially different from that of the static parameters,
such as hand shape and location (Poizner, Klima, and Bellugi 1987). Thus move-
ment appears to be central to sign production and perception […]” (Schembri
2001: 27). Also, gesture research has shown that the parameter “movement”
can be the core form feature in the establishment and differentiation of gestural
meaning (e.g., Calbris 1990; Harrison 2009, 2010; Ladewig 2010, 2011; Mittelberg
2006, 2010; Müller 2000, 2004; Teßendorf 2005 inter alia).
The notation of the parameter “position” is the last step in notating gestures’
forms. Although it is clear that the position in gesture space may be a central fac-
tor in distinguishing meaning and function of gestures (see Ladewig 2010, 2011a)
or that it can be exploited for the creation of larger gestural units (Bressem 2012;
Müller 1998, 2011), the parameter “position” appears to be a generally less central form feature and is therefore notated last.
For the notation of the parameter “hand shape”, four basic categories are distinguished:
(i) fist,
(ii) flat hand,
(iii) single fingers, and
(iv) combinations of fingers (see Fig. 70.1).
This distinction is based on the idea that the four categories show different prominent
areas, which determine the hand’s shape. With respect to the “flat hand”, for example,
the palm of the hand dominates the shape of the whole hand configuration. For the cat-
egory “combinations of fingers”, however, single fingers as well as combined fingers in
association with the palm determine the configuration of the hand as a whole.
Fig. 70.1: The four categories of hand shapes: 1. “fist”, 2. “flat hand”, 3. “single fingers”, 4. “combination of fingers”
Accordingly, the description of the hand shape rests upon the evaluation of the most prominent form feature of the hand, that is, upon the question whether the configuration as a whole is dominated by the fist, the palm, single fingers, or a combination of fingers.
Deciding on the particular category is therefore the first step in the depiction of the
parameter “hand shape”.
In addition, hand shapes involving both hands have to be distinguished. These are
either differentiated on the basis of a) the four categories, the number of digits involved, and the shape of the fingers (see below), or b) named individually, such as “hands interlocked”.
For the hand shapes assigned to the categories “single fingers” and “combinations of fingers”, as well as for hand shapes involving both hands, the hand configuration is further
specified by the involved number and shape of the digits. In order to differentiate the
fingers of the hand, they are numbered, starting from 1 (=thumb) to 5 (=little finger).
After identifying and numbering the digits, their particular form has to be specified.
Here, six different shapes are distinguished, i.e., the digit is
(i) stretched,
(ii) bent,
(iii) crooked,
(iv) flapped down,
(v) connected, or
(vi) touching (see Fig. 70.2 below).
Fig. 70.2: Shapes of the digits: stretched, bent, crooked, flapped down, connected, touching
These shapes correspond to differences in flexing the joints of the digits. Whereas in the
shape “stretched” no joint is flexed, the form “bent” shows a little flexing of the joint at
the fingertip as well as the middle knuckle joint. If the digits are “crooked”, the joints at
the fingertip, the middle knuckle joint, as well as the joint at the base of the digit are flexed, whereas the middle knuckle joint is flexed the most. If the digit is depicted as “flapped down”, the digit shows only a flexion of the joint at its base and is almost
at right angles to the palm.
The shapes “connected” and “touching” specify shapes of the fingers, in which 2 or
more fingers are in contact. The shape “connected” applies to configurations of the digits,
in which the fingers are “bent” and thus show a flexion of all three knuckle joints, but are
additionally connected at the very tip of the finger. For the shape “touching”, however,
the digits are “flapped down” and touch each other at the entire first limb of the digit (see
Fig. 70.3 for examples of hand shapes involving the combination of fingers).
Furthermore, the marker “spread” is assigned if, in cases of the category “combina-
tions of fingers”, the fingers are separated from each other. In these cases, the fingers
are spread apart, i.e., the space between them is enlarged by upholding an extra amount
of muscle tension in the whole configuration of the hand.
Fig. 70.3: Examples of hand shapes involving combinations of fingers, e.g., “1+2 connected”, “1+3 connected”, “1+2 crooked”, “1+2 bent”, “1–5 crooked”, “1–5 bent”, “1–5 spread bent”, “2–5 flapped down”, “2–5 flapped down, 1 stretched”, “2–5 bent”, “1–5 touching”, “1+5 connected”, “1+2 touching”
To sum up: the notation of the parameter “hand shape” involves three steps:
(i) Assigning the hand shape to one of the four categories or classifying it as a config-
uration involving both hands;
(ii) Numbering of each finger;
(iii) Specifying the shape of the digit.
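The three steps could be mirrored in a simple data structure. The following sketch is purely illustrative; the category names, finger numbers, and digit shapes follow the conventions described above, but the resulting label format is an assumption, not the official notation.

```python
CATEGORIES = {"fist", "flat hand", "single fingers", "combination of fingers"}
DIGIT_SHAPES = {"stretched", "bent", "crooked", "flapped down", "connected", "touching"}

def notate_hand_shape(category, digits=None, spread=False):
    """digits: mapping from finger number (1 = thumb ... 5 = little finger)
    to its shape, e.g. {1: 'connected', 2: 'connected'}."""
    assert category in CATEGORIES
    label = category
    if digits:
        parts = []
        for number, shape in sorted(digits.items()):
            assert 1 <= number <= 5 and shape in DIGIT_SHAPES
            parts.append(f"{number} {shape}")
        label += ": " + ", ".join(parts)
    if spread:
        label += ", spread"
    return label

# A ring-like configuration: thumb and index finger connected.
print(notate_hand_shape("combination of fingers", {1: "connected", 2: "connected"}))
```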
For the characterization of the palm’s orientation, four different basic angles are distinguished: “palm up”, “palm down”, “palm lateral”, and “palm vertical”.
In addition to these four, the marker “diagonal” (Bressem 2006) is used to further dif-
ferentiate the four basic angles and to mark an intermediate orientation between
them. While in the case of “palm lateral” the hand is parallel to the sagittal line of the body’s center, the marker “diagonal” indicates a 45-degree angle to the body’s center line or to the body of the speaker (see Figs. 70.4 and 70.5).
Figs. 70.4 and 70.5: Palm lateral orientations (PLdiTC, PLTC, PLdiAC) relative to the sagittal axis and the body of the speaker
The orientations “palm lateral” and “palm vertical”, as well as any orientation additionally tagged by the marker “diagonal”, are further differentiated with respect to the gesture space, for which four types are distinguished.
Additionally, if necessary, the orientation of the fingers such as “fingers down” will be
noted. The characterization of a hand’s orientation is therefore always a combination
of “orientation 1” and “orientation 2”, as in “palm vertical (1) away body (2)” for
example.
To sum up: the notation of the parameter “orientation” combines the basic angle, the marker “diagonal” (if applicable), the differentiation with respect to the gesture space (“orientation 2”), and, if necessary, the orientation of the fingers.
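The following sketch combines these components into a single orientation label. The wording of the labels is illustrative; it does not reproduce the abbreviations of the original system (e.g., PLTC), which are not decoded here.

```python
def notate_orientation(basic_angle, diagonal=False, direction="", fingers=""):
    """Combine basic angle, optional 'diagonal' marker, the differentiation
    with respect to the gesture space ('orientation 2'), and an optional
    finger orientation into one label."""
    parts = [basic_angle]
    if diagonal:
        parts.append("diagonal")
    if direction:
        parts.append(direction)      # e.g. "away body"
    if fingers:
        parts.append(fingers)        # e.g. "fingers down"
    return " ".join(parts)

print(notate_orientation("palm vertical", direction="away body"))
# palm vertical away body
```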
For the parameter “movement”, the system first distinguishes the articulator involved, i.e., whether the movement is executed by the arm or shoulder, by the wrist, or by the fingers. For movements executed by the wrist of the hand, the notation system distinguishes three possible types, i.e., “bending”, “raising”, and “rotation” (see for instance Prillwitz et al. 1989) (see Fig. 70.7).
Fig. 70.7: Types of movement for the wrist: bending to pulse, raising, bending to 1, bending to 5, rotation (figure taken from Prillwitz et al. 1989)
The third type of motion patterns, namely movements of single fingers, is depicted according to the basic movement types “straight”, “arced”, and “circle”. Additionally, for the depiction of movements executed by all fingers of a hand, “beating of fingers”, “flapping down”, “grabbing movement”, and “closing of fingers” are differentiated. The direction of a movement is specified along three axes:
(i) movements along the horizontal axis (right and left, regarded from the perspective
of the gesturer),
(ii) movements along the vertical axis (up and down), and
(iii) movements along the sagittal axis (away from body and towards body).
Fig. 70.8: Directions of movements along the vertical and horizontal axis (up, down, left, right)
Movements along the sagittal axis are directed either towards the body or away from the body of the speaker.
Circular and spiral motions are additionally specified as “clockwise” or “counter clockwise”; the characterization of a spiral motion can, for example, be “clockwise right”. The directions mentioned above are also used for the depiction of movements of single fingers.
For the characterization of the “bending” type of wrist movement, three directions are distinguished, i.e., “to pulse”, “to 1”, and “to 5”. The type “raising” needs no further
specification. The type “rotation” is depicted in the same fashion as circular as well as
spiral motions, i.e., according to “clockwise” or “counter clockwise” direction.
The terms introduced for the depiction of the character of movement can and often
need to be combined with one another. It is therefore possible to characterize a move-
ment as both “enlarged” and “accentuated”.
The aspect “quality of movement” specifically addresses the markedness of move-
ments. A movement is marked, if it stands out in relation to other movements because
of a particular saliency regarding one of these qualitative features. For instance, in an
“accentuated” movement, the endpoint of the motion is stressed, because the movement
is carried out with more force. This rise in force leads to an increase in the intensity at
the end of the movement execution. Similarly to the accent in the spoken language, in
which the accent is used to stress particular segments of speech such as syllables for
instance (see for example Pompino-Marschall 1995), an accentuation in gestures may
be used to stress a particular gestural segment of the motion pattern (Bressem and
Ladewig 2011; see also Bressem 2012 for a more detailed account).
The notation of the parameter “movement” involves four steps:
(i) Depict the basic type of movement, i.e., whether it is executed by the arm or shoulder, the wrist, or the fingers.
(ii) Characterize the shape of the movement accordingly.
(iii) Specify the direction of the movement.
(iv) If necessary, note the quality of the movement.
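A movement annotation covering these steps could be represented as a simple record, as in the following sketch; the value sets are taken from the preceding sections, while the record format and label are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class MovementAnnotation:
    articulator: str                 # "arm/shoulder", "wrist", or "fingers"
    shape: str                       # e.g. "straight", "arced", "circle", "rotation"
    direction: str = ""              # e.g. "down", "away body", "clockwise right"
    quality: list = field(default_factory=list)   # e.g. ["accentuated", "enlarged"]

    def label(self):
        parts = [self.articulator, self.shape, self.direction] + self.quality
        return " ".join(p for p in parts if p)

print(MovementAnnotation("arm/shoulder", "straight", "down", ["accentuated"]).label())
# arm/shoulder straight down accentuated
```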
For the notation of the parameter “position”, the gesture space is divided, following McNeill (1992), into the sectors center, periphery, and extreme periphery, with upper and lower areas.
This depiction of the gesture space is sufficient for a basic characterization of the hand’s position and can be used for a first description in recording the gestural forms. How-
ever, if one is interested in a more detailed account of movements and positions in
space, then Fricke’s (2005, 2007) three-dimensional model of the gesture space offers an appropriate extension. Starting from McNeill’s gesture space, Fricke assigns four dimensions to the gesture space which capture the hand’s distance from the speaker’s body along the sagittal axis. These dimensions can be assigned to capture either the forward or the backward distance from the speaker’s body. (If the hand’s backward distance from the body
needs to be described, the numbers −1, −2 and −3 are used.) Fricke’s three-dimensional model of gesture space may account not only for the hand’s distance from the speaker’s body but also for the use of interactive gesture space areas, and even for the reconstruction of movement trajectories in space.
The notation of the parameter “position” involves two steps:
(i) Depict the basic sector and define its further characteristics.
(ii) If necessary, use Fricke’s three-dimensional model for further differentiation.
See Fig. 70.11 for an example annotation using the notation scheme. A complete doc-
ument containing graphical representations of all notation conventions can be found at
www.janabressem.de/publications.
Fig. 70.11: Example showing notation of gestures using the notation scheme (taken from Bressem
2012)
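In the spirit of the example in Fig. 70.11 (not reproduced here), a complete annotation bundling all four parameters could be represented as follows; the concrete values are hypothetical and the record format is an assumption.

```python
from dataclasses import dataclass

@dataclass
class GestureAnnotation:
    hand_shape: str
    orientation: str
    movement: str
    position: str

example = GestureAnnotation(
    hand_shape="flat hand",
    orientation="palm lateral",
    movement="straight down",
    position="center-center",
)
print(example)
```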
The notation system was developed and first applied in an empirical study investigating whether German speakers share a repertoire of gestural forms which is used recurrently by different speakers. Starting from this research question, the study addressed the following three questions:
(i) Which hand shapes, orientations, positions, and movements do German speakers
use in naturally occurring conversations?
(ii) How are the realizations of the four parameters distributed?
(iii) Is it possible to detect frequent co-occurrences of parameter realizations?
Based on the notation system introduced above, the study documented altogether six recurrent hand shapes, of which the “flat hand” and the “lax flat hand” were used most frequently. Moreover, it was shown that both hand shapes frequently occurred with particular orientations, movements, and positions in gesture space. The “flat hand” and the “lax flat hand” were documented to be used most often with a palm lateral orientation (PLTC) and a straight movement downwards, positioned in the center of the gesture space (cc), showing that clusters, i.e., simultaneous occurrences of particular realizations of the four form parameters, frequently recur. Accordingly, the study was able to show that
(i) German speakers have standardized gestural forms at their disposal, which they use recurrently,
(ii) that the co-occurrence of hand shapes with other gestural forms such as orientation or movement is not random, and
(iii) that speakers seem to have at their disposal clusters which depend on particular hand shapes and their co-occurrence with other specific gestural forms (for a more detailed account of the study see Bressem 2007; Ladewig and Bressem forthcoming).
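Descriptive analyses of this kind, i.e., counting how often particular parameter realizations co-occur, can be carried out over a table of annotations. The following sketch illustrates this with pandas and a small set of invented annotation rows.

```python
import pandas as pd

annotations = pd.DataFrame([
    {"hand_shape": "flat hand", "orientation": "palm lateral", "movement": "straight down", "position": "cc"},
    {"hand_shape": "flat hand", "orientation": "palm lateral", "movement": "straight down", "position": "cc"},
    {"hand_shape": "fist",      "orientation": "palm down",    "movement": "arced",         "position": "periphery"},
])

clusters = (annotations
            .groupby(["hand_shape", "orientation", "movement", "position"])
            .size()
            .sort_values(ascending=False))
print(clusters)
```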
More recently, the notation system was applied in two studies examining simultaneous structures of gestures in human and non-human primates, and specifically the degree of structural complexity present in the gestures of non-human primates. Starting from a group of already identified and analyzed visual and tactile gestures of orangutans (Liebal, Pika, and Tomasello 2006), these two groups of gestures were reanalyzed based on the notation scheme presented in this paper (see Fig. 70.12). The study was able to document that differences in form
features correlate with changes in the contexts of use (Kendon 2004; Ladewig 2012; Müller 2004). Accordingly, the study not only revealed that apes modify their gestures depending on the goal they want to achieve, clearly replicating a structural pattern that we find in the variation of gestures in humans, but also that form variants of gestures may be grouped into gesture families, thereby showing striking similarities to what has been observed for gestures in humans.
Fig. 70.12: Notation of tactile gestures “slap” using the notation scheme (taken from Bressem
et al. in preparation)
The application of the notation scheme in the different studies has thus shown that
the form-based perspective of the notation is a suitable framework for the detection of
forms, structures, and patterns in gestures. The systematicity in the description of ges-
tures’ form along with the comparability of the terms allows for a (descriptive) statistical
analysis. Combined with annotation software such as ELAN (Wittenburg et al. 2006), the notation system also lends itself to further processing of the data in other formats, such as Excel spreadsheets or HTML.
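To give a concrete impression of such further processing: assuming the annotations have been exported from ELAN as tab-delimited text, with one row per annotation and columns for tier name, begin time, end time, and annotation value (the exact column layout depends on the export settings), a short script suffices to turn a notation tier into a frequency table. The file names and the tier label "Hand Shape" are illustrative.

import csv
from collections import Counter

# Illustrative tab-delimited export: tier name, begin time, end time, annotation value.
# Adjust the column indices to the layout produced by the actual export settings.
hand_shapes = Counter()
with open("gesture_annotations.txt", encoding="utf-8") as export:
    for row in csv.reader(export, delimiter="\t"):
        if len(row) >= 4 and row[0] == "Hand Shape":
            hand_shapes[row[3]] += 1

# Write a frequency table that can be opened in a spreadsheet program.
with open("hand_shape_counts.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["hand shape", "frequency"])
    writer.writerows(hand_shapes.most_common())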
5. Conclusion
The system presented in this paper sets the stage for a widely applicable notation system that is usable in a range of disciplines. As the terms introduced in the system are based solely on form characteristics of gestures, the notation system is open to all kinds of research foci from various approaches. Given its flexibility and expandability, the system may be adjusted to a broad range of specific research questions. Furthermore, the notation system may be used as one module in a transcription, coding, or annotation system for gestures (and speech) (see Bressem, Ladewig and Müller this volume). It can be applied within annotation software, but is equally usable for analyses carried out without such software.
Due to these characteristics, the notation system may be used in a range of disci-
plines interested in an analysis of gesture use. More importantly, however, it provides
the ground for a sound description of gesture forms, which is central to any account
of gesture irrespective of whether it focuses on the cognitive, semantic, interactive, or
other aspects of gestures.
6. References
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies
5: 1–19.
Becker, Karin 2004. Zur Morphologie redebegleitender Gesten. MA thesis, Department of Philos-
ophy and Humanities, Free University Berlin.
Becker, Karin 2008. Four-feature-scheme of gesture: Form as the basis of description. Unpub-
lished manuscript.
Bergmann, Kirsten, Volkan Aksu and Stefan Kopp 2011. The relation of speech and gestures:
Temporal synchrony follows semantic synchrony. Proceedings of the 2nd Gesture and Speech
in Interaction Conference (GeSpIn 2011). Bielefeld, Germany.
Birdwhistell, Ray 1970. Kinesics and Context. Philadelphia: University of Pennsylvania Press.
Bohle, Ulrike this volume. Approaching notation, coding, and analysis from a conversational ana-
lysis point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana 2006. Formen redebegleitender Gesten. Verteilung und Kombinatorik formbezoge-
ner Parameter. MA thesis, Department of Philosophy and Humanities, Free University Berlin.
Bressem, Jana 2007. Recurrent form features in coverbal gestures. http://www.janabressem.de/
Downloads/Bressem-recurrent form features.pdf (accessed 11 August 2010).
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt
(Oder).
Bressem, Jana this volume. Transcription systems for gestures, speech, prosody, postures, gaze. In:
Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Silva H. Ladewig 2011. Rethinking gesture phases. Semiotica 184(1/4): 53–91.
Bressem, Jana, Silva H. Ladewig and Cornelia Müller this volume. Linguistic Annotation System
for Gestures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill,
and Sedinha Teßendorf (eds.), Body – Language – Communication: An International Hand-
book on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication
Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana and Cornelia Müller volume 2. The family of AWAY gestures. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig and David McNeill (eds.), Body – Language –
Communication: An International Handbook on Multimodality in Human Interaction. (Hand-
books of Linguistics and Communication Science 38.2.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana, Katja Liebal, Cornelia Müller and Nicole Stein in preparation. Recurrent forms
and contexts: Families of gestures in non-human primates.
Calbris, Geneviève 1990. The Semiotics of French Gestures. Bloomington: Indiana University Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for
multimodal models of grammar. Ph.D. dissertation, Cornell University, Ithaca, NY. Ann
Arbor, MI: Cornell University: UMI.
Mittelberg, Irene 2007. Methodology for multimodality. One way of working with speech and ges-
ture data. In: Monica Gonzalez-Marquez, Irene Mittelberg, Seana Coulson and Michael J. Spi-
vey (eds), Methods in Cognitive Linguistics, 225–248. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyv Evans
and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and New Direc-
tions, 351–385. London: Equinox.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2000. Zeit als Raum. Eine kognitiv-semantische Mikroanalyse des sprachlichen
und gestischen Ausdrucks von Aktionsarten. In: Ernest W. B. Hess-Lüttich and H. Walter
Schmitz (eds.), Botschaften verstehen. Kommunikationstheorie und Zeichenpraxis. Festschrift
für Helmut Richter, 211–228. Frankfurt am Main: Peter Lang.
Müller, Cornelia 2004. Forms and uses of the Palm Up Open Hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
233–256. Berlin: Weidler.
Müller, Cornelia 2009. Gesture and language. In: Kirsten Malmkjaer (ed.), The Routledge Linguis-
tics Encyclopedia, 510–518. London: Routledge.
Müller, Cornelia 2010. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Literatur 41(1): 37–68.
Müller, Cornelia 2011. Reaction paper. Are 'deliberate' metaphors really deliberate? A question of human consciousness and action. Metaphor and the Social World 1: 61–66.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Se-
dinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia submitted. How gestures mean. The construal of meaning in gestures with speech.
Müller, Cornelia, Jana Bressem and Silva H. Ladewig this volume. Towards a grammar of gesture.
A form-based view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communi-
cation Science 38.1.) Berlin: De Gruyter Mouton.
Müller, Cornelia, Silva H. Ladewig and Jana Bressem this volume. Gestures and speech from a lin-
guistic point of view. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David
McNeill and Sedinha Teßendorf (eds.), Body – Language – Communication: An International
Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communica-
tion Science 38.1.) Berlin/Boston: De Gruyter Mouton.
Poizner, Howard, Edward S. Klima and Ursula Bellugi 1987. What the Hands Reveal about the
Brain. Cambridge: Massachusetts Institute of Technology Press.
Pompino-Marschall, Bernd 1995. Einführung in die Phonetik. Berlin: Walter de Gruyter.
Prillwitz, Siegmund, Regina Leven, Heiko Zienert, Thomas Hanke and Jan Henning 1989. Ham-
NoSys Version 2.0 Hamburger Notationssystem für Gebärdensprachen: Eine Einführung. Ham-
burg, Germany: Signum.
Sager, Svend F. 2001. Probleme der Transkription nonverbalen Verhaltens. In: Klaus Brinker,
Gerd Antos, Wolfgang Heinemann and Svend F. Sager (eds.), Text und Gesprächslinguistik.
Ein Internationales Handbuch Zeitgenössischer Forschung, 1069–1085. (Handbücher zur
Sprach- und Kommunikationswissenschaft 16.2.) Berlin: De Gruyter.
Sager, Svend F. and Kristin Bührig 2005. Nonverbale Kommunikation im Gespräch – Editorial. In:
Kristin Bührig and Svend F. Sager (eds.), Osnabrücker Beiträge zur Sprachtheorie 70: Nonver-
bale Kommunikation im Gespräch, 5–17.
Schembri, Adam 2001. Issues in the analysis of polycomponential verbs in Australian Sign Lan-
guage (Auslan). Unpublished doctoral dissertation. University of Sydney.
Sowa, Timo 2006. Understanding Coverbal Iconic Gestures in Shape Descriptions. Berlin: Akade-
mische Verlagsgesellschaft.
Stokoe, William C. 1960. Sign Language Structure: An Outline of the Visual Communication Sys-
tems of the American Deaf. Buffalo, NY: University of Buffalo Press.
Ternes, Elmar 1999. Einführung in die Phonologie. Darmstadt: Wissenschaftliche Buchgesellschaft.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de
Barrer”. MA thesis, Department of Philosophy and Humanities, Free University Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript.
Webb, Rebecca 1996. Linguistic features of metaphoric gestures. In: Lynn Messing (ed.), Pro-
ceedings of WIGLS. The Workshop on the Integration of Gesture in Language and Speech. Octo-
ber 7–8, 1996, 79–95. Newark in Delaware: Applied Science and Engineering Laboratories
Newark.
Webb, Rebecca 1998. The lexicon and componentiality of American metaphoric gestures. In: Christian Cave, Isabelle Guaitelle and Serge Santi (eds.), Oralité et Gestualité: Communication Multimodale, Interaction, 387–391. Montreal: L'Harmattan.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006.
ELAN: A professional framework for multimodality research. In: Proceedings of LREC
2006, Fifth International Conference on Language Resources and Evaluation.
Wrobel, Ulrike 2007. Raum als kommunikative Ressource. Eine handlungstheoretische Analyse vi-
sueller Sprachen. Frankfurt am Main: Peter Lang.
Abstract
This chapter outlines an annotation system for gestures grounded in a cognitive linguistic
approach to language use and provides guidelines for the annotation of gestures (gesture
units and phases, form and motivation of form), the annotation of speech as well as the
relation of gestures with speech on a range of levels of linguistic description (prosody,
semantics, syntax, and pragmatics). It addresses necessary aspects for a description of
gestures’ forms and for a reconstruction of their meanings and functions with and without
speech and explicates underlying theoretical and methodological assumptions.
1. Introduction
A wide variety of transcription and annotation systems are used in the field of gesture studies. Developed, for instance, within psychology, anthropology, cognitive linguistics, semiotics, or nonverbal communication research, these systems vary immensely in their theoretical and methodological perspectives and foci (see Bressem
this volume b). They range from a focus on conceptual questions and a primary interest
in meaning and function of co-verbal gestures (e.g., Duncan n.d.; McNeill 1992, 2005;
Sweetser and Parrill 2004) to a focus on the form of the gesture alone, disregarding the gesture's relation to the verbal utterance (e.g., Martell 2002; Sager 2001). Only a few transcription or coding systems address the structural properties of the medium "gesture" (Gut et al. 2002; Kipp, Neff and Albrecht 2007) (for an overview of annotation systems see Bressem this volume b). A systematic linguistic annotation system for gestures is still lacking. No existing scheme allows for an investigation of gestures' structures, meanings, and functions both on the level of gestures alone and in relation to speech, while also clearly addressing the theoretical and methodological assumptions associated with a linguistic perspective on gestures. The Linguistic Annotation System for Gestures (LASG) aims to fill this gap. It provides a perspective on the annotation of gestures grounded in a (cognitive) linguistic approach to language use and a form-based approach to gesture analysis (Müller, Bressem and Ladewig this volume; Müller, Ladewig and Bressem vol-
ume 2). In addition, it provides guidelines for the annotation of gestures on a range of
levels of linguistic description. The system addresses necessary aspects for a description
of gestural forms and a reconstruction of their meanings and functions with and without
speech. The underlying theoretical and methodological assumptions are spelled out ex-
plicitly. The Linguistic Annotation System thereby offers solid grounds for describing
and detecting a “grammar of gestures” (see Müller, Bressem and Ladewig this volume).
Moreover, it allows for an analysis of gestures from the perspective of a “multimodal
grammar” (Fricke 2012, this volume) and in particular for an examination of the integra-
tion of gestures into spoken utterances (see e.g., Bressem 2012; Fricke 2012, this volume;
Ladewig 2012).
The annotation system is grounded in a cognitive linguistic and form-based approach
to gestures, which assumes that speech and gesture are tightly linked and that language
is inherently multimodal (Fricke 2007, 2012, this volume; Mittelberg 2006; Müller 1998,
2007, 2008a; Müller et al. 2005). Gestures are assumed to have “a potential for lan-
guage” by fulfilling the same functions as language (Bühler 2011) and either “express
inner states and feelings, […] regulate the behavior of others, […] or represent objects
and events in the world.” (Müller 2009: 213, this volume). A linguistic perspective on ges-
tures follows two main aims: 1) a description of the structural and functional properties of
gestures, that is a “grammar of gesture” (e.g., Bressem 2012; Fricke 2012; Müller, Bressem
and Ladewig this volume; Ladewig 2012; Müller 2004, 2009, 2010b, submitted; Müller et al.
2005), and 2) an investigation of the relation of speech and gestures in conjunction from
the perspective of a “multimodal grammar” (Bressem 2012; Fricke 2012, this volume; La-
dewig 2012). Linguistic theory and in particular linguistic methods and concepts are under-
stood as “theoretical building blocks from which elements are selectively taken and
carefully adopted in the analysis of gestures" (Bressem and Ladewig 2011: 86–87; see Müller this volume for a detailed account of a form-based and linguistic approach to gestures).
A linguistic perspective on gestures supposes that gestures a) can be segmented and classified, b) show regularities and structures on the level of form and meaning, and c) have the potential for combinatorics and hierarchical structures. Gestural forms are assumed to be motivated form Gestalts, that is, meaningful wholes, in which, however, every aspect of a gesture's form is regarded as potentially meaningful. Form features may be singled out, and changes in form features may themselves be meaningful (Bressem and Ladewig 2011; Müller 2004, 2010, submitted; Fricke 2012; Ladewig and Bressem forthcoming). Moreover, gestural form features are not considered to be random. On the contrary, it is assumed, in particular with respect to performative or recurrent gestures, that form features recur across speakers and contexts whilst sharing stable meanings (Bressem and
Ladewig forthcoming; Calbris 1990; Fricke 2010; Harrison 2009; Kendon 1995, 2004;
Ladewig 2010, 2011; Mittelberg 2006; Müller 1998, 2004; Müller, Bressem and Ladewig
this volume; inter alia). This perspective on gestural forms results in a particular method-
ological approach, which gives form a prominent role in the process of description and
analysis (e.g., Bressem 2007, 2012, this volume; Ladewig 2007, 2010; Ladewig and Bressem
forthcoming; Müller 1998, 2004, 2010b; Müller, Bressem and Ladewig this volume).
2. Annotation of gestures
The Linguistic Annotation System for Gestures is embedded within a linguistic
approach to gesture, its methodological premises and in particular within the “Methods
of Gesture Analysis (MGA)” (Müller this volume; Müller, Bressem and Ladewig this
volume). The Methods of Gesture Analysis (see Müller, Ladewig and Bressem vol-
ume 2) offer a form-based method for systematically reconstructing the meaning of gestures. They allow for the reconstruction of fundamental properties of gestural meaning creation and determine basic principles of gestural meaning construction by distin-
guishing four main building blocks: 1) form, 2) sequential structure of gestures in rela-
tion to speech and other gestures, 3) local context of use, i.e., gestures’ relation to
syntactic, semantic, and pragmatic aspects of speech, and 4) distribution of gestures over different contexts of use. The Methods of Gesture Analysis assume that the meaning of a gesture emerges out of a fine-grained interaction of a gesture's form, its sequential position, and its embedding within a context of use (local and distributed). Thus, a gesture's meaning is first determined in a (largely) context-free analysis of its form, which grounds the later context-sensitive analysis of gestures.
The Linguistic Annotation System for Gestures represents particular aspects of the
Methods of Gesture Analysis. It takes up its first three building blocks, namely form,
sequential structure, and local context of use, and transforms them into a format applicable
in annotation software such as ELAN (Wittenburg et al. 2006) or Anvil (Kipp 2001).
Annotation within the Linguistic Annotation System for Gestures is thereby under-
stood as “any type of text (e.g., a transcription, a translation, coding, etc.) that you
enter on a tier. It is assigned to a selected time interval of the video/audio file (e.g.,
to the utterance of a speaker) or to an annotation on another tier (e.g., a translation
is assigned to an orthographic transcription).” (Elan Manual) Accordingly, no explicit
distinction is drawn between transcription (descriptive perspective with a direct relation
to the spoken and gestural utterance) and annotation (analytic perspective with a ref-
erence to units within the transcripts) but a rather broad understanding of annotation
covering both aspects is assumed (see Bird and Liberman 2001).
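This broad understanding of annotation can be mirrored in a simple data structure: an annotation is either aligned to a time interval of the media file or refers to an annotation on another (parent) tier. The following sketch is purely illustrative; the class and field names are not those of ELAN's or Anvil's file formats.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    tier: str                              # e.g. "Gesture Phases" or "Hand Shape"
    value: str                             # the text entered on the tier
    start_ms: Optional[int] = None         # set for time-aligned annotations ...
    end_ms: Optional[int] = None
    parent: Optional["Annotation"] = None  # ... or set for annotations referring to another tier

# A stroke aligned to the video, and a hand shape value assigned to that stroke.
stroke = Annotation(tier="Gesture Phases", value="stroke", start_ms=10240, end_ms=10680)
hand_shape = Annotation(tier="Hand Shape", value="flat hand", parent=stroke)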
The structure of the Linguistic Annotation System for Gestures is determined by the
focus on form aspects of gestures. It first provides for the description and motivation of
gestural forms (modes of representation, image schemas, motor patterns, and actions).
Afterwards it addresses gestures in relation to speech on a range of levels of linguistic
description for speech, that is prosody, syntax, semantics, and pragmatics. In doing so,
the Linguistic Annotation System for Gestures offers obligatory as well as optional as-
pects for each of the different levels of linguistic description and as such allows for a
broad or narrow annotation of gestures alone and in relation to speech (see Tab. 71.1
for an overview). The following overview of the Linguistic Annotation System for Ges-
tures presents the individual aspects for the annotation of gestures in their chronological
order.
Tab. 71.1: Overview of levels of annotation in the Linguistic Annotation System for Gestures (level of annotation, name of tier, obligatory/optional status, controlled vocabulary)

Annotation of gestures
  determining units: Gesture Unit, Gesture Phases (obligatory)
  annotation of form: Hand Shape, Orientation, Position, Movement Type, Movement Direction, Movement Quality (obligatory; controlled vocabulary)
  motivation of form: Mode of representation (MoR), Action, Motor pattern, Image schema (obligatory)

Annotation of speech
  annotation of speech (turn): Speech Turn, Speech Turn-translation, Speech Turn-Gesture Phases, Speech Turn-Gesture Phases translation (obligatory)
  annotation of speech (intonation unit): Intonation Unit, Intonation Unit-translation, Intonation Unit-Gesture Phases, Intonation Unit-Gesture Phases translation (obligatory)

Annotation of gestures in relation to speech
  prosody: Final pitch movement (obligatory); Accent (primary, secondary) (optional); controlled vocabulary
  syntax: Word Class (obligatory); Syntactic Function, Integration (optional); controlled vocabulary
  semantics: Temporal Relation (obligatory); Semantic Relation, Semantic Function (optional); controlled vocabulary
  pragmatics: Turn (obligatory); Speech Act, Pragmatic Function, Dynamic Pattern (optional); controlled vocabulary
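Read as a configuration, the table amounts to a small inventory of tiers carrying two flags each (obligatory/optional and controlled vocabulary). A fragmentary and purely illustrative encoding of this inventory, e.g. for setting up an annotation template, could look as follows.

# Illustrative excerpt of Tab. 71.1: tier name -> (level of annotation, obligatory?, controlled vocabulary?)
LASG_TIERS = {
    "Gesture Unit":       ("determining units",  True,  False),
    "Gesture Phases":     ("determining units",  True,  False),
    "Hand Shape":         ("annotation of form", True,  True),
    "Movement Direction": ("annotation of form", True,  True),
    "Word Class":         ("syntax",             True,  True),
    "Syntactic Function": ("syntax",             False, True),
    # ... the remaining tiers of Tab. 71.1 follow the same pattern
}

# For example, list the obligatory tiers when preparing a template:
obligatory_tiers = [tier for tier, (_, obligatory, _) in LASG_TIERS.items() if obligatory]
print(obligatory_tiers)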
movement sequences into gesture phases, the Linguistic Annotation System for Ges-
tures uses the “frame-by-frame marking procedure” (Seyfeddinipur 2006). By using
the sharpness of a video image, in which the execution of movement becomes apparent
in blurry and clear images, different types of transitions in the execution of gestural
movement sequences are distinguished. On the basis of different types of transitions,
gestural movement phases are assigned to specific types of gesture phases (see also
Ladewig and Bressem this volume).
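The blur cue exploited by the frame-by-frame marking procedure can also be approximated computationally. The following sketch, which illustrates the cue rather than Seyfeddinipur's procedure itself, uses OpenCV to compute a per-frame sharpness score (the variance of the Laplacian): dips in the score indicate fast, blurred movement, plateaus indicate still stretches such as holds. The video file name is hypothetical.

import cv2  # OpenCV

def sharpness_per_frame(path):
    """Variance of the Laplacian per frame: low values suggest motion blur, high values clear frames."""
    capture = cv2.VideoCapture(path)
    scores = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append(cv2.Laplacian(gray, cv2.CV_64F).var())
    capture.release()
    return scores

# Hypothetical usage: inspect the score curve to support manual phase segmentation.
scores = sharpness_per_frame("speaker01.mp4")
print(len(scores), min(scores), max(scores))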
With these two levels, namely gesture units and gesture phases, the Linguistic Anno-
tation System for Gestures sets up the basis for all annotations within the system.
Gesture units thereby serve as the broadest level of gesture segmentation. Gesture
phases on the other hand constitute the lowest level of gesture segmentation and more-
over make up the referring unit for all following annotations (see Fig. 71.2 for tier
dependency within the annotation system). Intermediate levels, such as gesture phrases,
are not included in the annotation system, as they serve no immediate function in
segmenting the gestural movement for the annotation process.
Apart from segmenting gestural movement sequences into units of different levels of complexity, the concentration on gesture units and gesture phases also serves a further
function. The allocation of gesture phases is not only a prerequisite for determining ges-
tural segments but also for specifying the gesture’s exact relation with units of the
speech stream. Thus it is a necessary step in detecting the meaning of a gesture. Furthermore, following Kendon (1972), it is assumed that gestures form larger units, which
match higher-level units on the level of spoken language.
The larger the speech unit, the more body parts there are involved in this movement. For
locutions, for instance, only the head and the gesticulating limb are involved. For locution
groups, there is a shift in the trunk as well. For very high level units, such as “discourse”
or “listening,” there is a major change in the speaker’s total bodily position (Kendon
1972: 205).
Gesture units thereby most likely correlate with locution clusters, the highest level
of speech within a discourse, which can be understood to be equivalent to a paragraph.
Thus annotating gesture units provides a first approach to the thematic organization
of the conversation and can be helpful in annotating the semantic and functional rela-
tion of gestures with speech, especially with respect to particular types of gestures (see sections 2.3.4 and 2.3.5).
particular aspects addressed. When focusing on gestures' relation with motion events, for instance, the level of detail in the description of the four form parameters may be varied. One may focus, for instance, on a close description of the parameter "movement", whereas the parameters "hand shape" and "orientation" may be annotated in a less detailed manner (see e.g., Müller 1998, 2000). However, the Linguistic Annotation
System for Gestures proposes that a rough annotation of all form aspects addressed
above is necessary for a sound linguistic analysis of gestures as it lays the basis for a
detection of structures and functions of gestures.
structures for the recruitment of gestural forms (Cienki 2005). In doing so, image sche-
matic structures offer valuable insight into the context-free (Müller 2010b) or inherent
meaning of gestures (Ladewig and Bressem forthcoming).
A further piece of the puzzle in understanding the meaning and function of gestures can be found in their basis in everyday actions. Gestures often constitute re-enactments of basic mundane actions, grounding the gestures' communicative actions in real-world ac-
tions. By modulating the motion patterns of everyday actions, gestures abstract from the
actions in the real world, making them recognizable as as-if-actions, as signs (Müller
submitted). In this process of derivation from (everyday) actions to gestural meaning,
gestures select and recombine perceptually salient aspects and distinctive elements of
the action (e.g., Calbris 2003; Streeck 1994, 2009; Ladewig 2010, 2011; Müller 2004;
Teßendorf 2008). By being metonymically linked with the action itself, gestures
evoke elements from an action chain, such as actor, action, instrument, or result
(Calbris 1990; Müller 1998; Müller and Haferland 1997; Teßendorf 2008), and use
them for different communicative purposes (Teßendorf 2009). Thus, the concrete bodily
basis of gestures becomes visible in their forms providing metonymic pathways to
everyday actions.
3. Annotation of speech
The second block of the Linguistic Annotation System for Gestures addresses the anno-
tation of speech. Here, the system offers two possible strands: Speech occurring within
the boundaries of a gesture unit can either be annotated based on the notion of turns
(Sacks, Schegloff and Jefferson 1974) or intonation units (Chafe 1994). In both cases,
speech is transliterated or transcribed following the conventions of the “GAT2” (for
the English version see Couper-Kuhlen and Barth-Weingarten 2011). Annotating
speech, either based on the notion of turns or intonation units, is considered obligatory
within the Linguistic Annotation System for Gestures. Additionally, the system offers the possibility of transliterating or transcribing speech in relation to further facets, such as speech in relation to the individual gesture phases or the translation of the spoken utterance into another language. It further gives the opportunity to annotate
prosodic aspects of the spoken utterance, such as final pitch movement or focus accents.
the syntax of speech. A close description of the linguistic context in which a gesture is
placed is fundamental for the reconstruction of gestural meanings and for determining
the structural and functional relevance of gestures in language use (e.g., Bressem 2012;
Fricke 2012; Ladewig 2012; Müller, Lausberg, Fricke et al. 2005). Here, the system pro-
vides for the annotation of three facets: 1) Annotation of word classes, 2) Annotation of
syntactic functions, 3) Integration of gestures into the verbal utterance. Of these three aspects, only the annotation of word classes is considered obligatory within the system. All remaining aspects are regarded as optional in the coding process and can be annotated if relevant for the particular research question. The annotation is done for the gesture phase "stroke".
in different ways, thereby showing "degrees of integrability" (Fricke 2012). Gestures may
be positionally integrated into the verbal utterance by a syntactic gap (Ladewig 2012) or
by temporal overlap (Bressem 2012; Fricke 2012). They may furthermore be integrated
cataphorically through the deictic such (a), the pronouns this, these, or the adverb here
(see Fricke 2007, 2012; Streeck 1988, 1990, 2002). The different degrees of integrability
need to be perceived as a continuum based on the type of integration (e.g., by being cat-
aphorically integrated or through occupation of syntactic gaps), the distribution of
information over the different modalities (i.e., redundancy, supplementation or substitu-
tion), and the occurrence of the modalities (i.e., temporal overlap or linear succession)
(Ladewig 2012).
Annotating the type and degree of integration is considered optional within the sys-
tem and needs to be incorporated only if necessary. The Linguistic Annotation System for Gestures, however, provides for the annotation of the type of integration, as it is of particular importance in analyzing gestures from a linguistic point of view and from the perspective of a multimodal grammar, and, in particular, in determining a gesture's function in creating a multimodal utterance (meaning).
(i) Pre- and post-positioning, that is the linear combination of speech and gesture, in
which gestures may either precede or follow the co-expressive speech segment.
(ii) Parallel, that is the simultaneous combination of speech and gestures, in which ges-
tures are executed in temporal overlap with the co-expressive speech segment.
(iii) Gesture alone, that is the linear combination of speech and gesture, in which ges-
tures have no direct spoken counterpart at the moment of being uttered but occur
in pauses, in syntactic gaps, or in larger speechless segments.
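Once the stroke and the co-expressive speech segment are available as time intervals (e.g., from the gesture phase and speech tiers), the temporal relation categories listed above can be assigned mechanically. A minimal sketch, with hypothetical interval values in milliseconds:

def temporal_relation(stroke, speech):
    """Classify the temporal relation of a stroke to its co-expressive speech segment.

    stroke and speech are (start_ms, end_ms) tuples; speech is None if the gesture
    has no spoken counterpart (pause, syntactic gap, speechless segment).
    """
    if speech is None:
        return "gesture alone"
    stroke_start, stroke_end = stroke
    speech_start, speech_end = speech
    if stroke_start < speech_end and speech_start < stroke_end:  # intervals overlap
        return "parallel"
    return "pre-positioned" if stroke_end <= speech_start else "post-positioned"

print(temporal_relation((1000, 1400), (1200, 1800)))  # parallel
print(temporal_relation((1000, 1400), None))          # gesture alone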
(i) Redundant: The gesture matches the semantic features or image schemas in speech
so that the features or image schemas may be identical or included among the set
of semantic features or image schemas expressed in speech.
(ii) Complementary/Supplementary: Speech and gesture do not match in the semantic
features or image schemas but the gesture contributes semantic features or image
schemas to speech “thus forming a subset of the meaning of the superordinate
modality, namely speech.” (Gut et al. 2002: 8)
(iii) Contrary: Speech and gesture do not match in the semantic features or image sche-
mas, but rather carry contrary features so that speech and gesture do not form an
overlapping set of features or image schemas.
(iv) Replacing: Gestures are used without speech.
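The four semantic relation categories can likewise be approximated by comparing the semantic features (or image schemas) coded for gesture and speech. The following is a rough sketch under the simplifying assumption that both have been reduced to sets of feature labels; the labels are invented.

def semantic_relation(gesture_features, speech_features):
    """Rough set-based approximation of the four semantic relation categories."""
    if not speech_features:                  # gesture used without speech
        return "replacing"
    if gesture_features <= speech_features:  # identical to or included in the speech features
        return "redundant"
    if gesture_features & speech_features:   # partial match, gesture adds features
        return "complementary/supplementary"
    return "contrary"                        # no overlapping set of features

print(semantic_relation({"UP"}, {"UP", "PATH"}))   # redundant
print(semantic_relation({"UP", "FAST"}, {"UP"}))   # complementary/supplementary
print(semantic_relation({"DOWN"}, {"UP"}))         # contrary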
semantic function with regard to the verbal utterance. By comparing the semantic features and/or image schemas communicated in speech and gestures, the system categorizes the gesture's function with respect to the semantics of the spoken utterance as one of the following: gestures may either illustrate and emphasize what has already been uttered verbally, modify the verbal meaning, or replace the verbal meaning (see e.g., Engle
2000; Freedman 1977; Fricke 2007, 2012; Gerwing and Allison 2009; Gut et al. 2002;
Kendon 1987; Kipp 2004; McNeill 1992, 2005; Müller 1998; Scherer 1979; Wilcox
2002). In doing so, gestures embody elements of the verbal meaning, mark salient infor-
mation, and highlight and foreground information in the flow of discourse (Alibali and
Kita 2010; Andrén 2010; Goodwin 2000; Ladewig 2012; Müller and Tag 2010).
Streeck and Hartge 1992) or even as a turn-holding device (Bohle 2007; Streeck and
Hartge 1992). At the beginning of the speaker's turn, they would indicate the wish to become the next speaker; at the end of the turn, they would complete the turn, for instance by filling a speech pause, or indicate the wish to maintain the right to the succeeding turn. Annotating the placement of gestures in relation to the turn-constructional component contributes to examining the means and functions of gestural meaning construction with respect to interactive aspects and furthermore sets the ground for evaluating their pragmatic functions. Accordingly, the Linguistic Annotation System for Gestures annotates the gesture's placement either as "beginning of turn", "end of turn", or "middle of turn".
into account in the first steps for annotation of the Linguistic Annotation System for
Gestures but it is gradually added throughout the annotation process. Gesture and
speech are taken into account as two separate articulatory modalities with gesture
being integrated with speech both structurally and functionally, while exhibiting modality-specific forms, structures, and patterns. The separation of speech and gesture in
the annotation process is thereby understood as a necessary heuristic procedure in
order to detect and describe the “gestures’ potential for language” (Müller 2009).
Fig. 71.1: Example of Linguistic Annotation System with speech annotation based on the notion
of turn
7. Conclusion
The present article has outlined a perspective on the annotation of gestures grounded in
a (cognitive) linguistic approach to language use and provided guidelines for the anno-
tation of gestures and speech on a range of levels. By explicating underlying theoretical
and methodological assumptions in describing gestures’ forms and reconstructing their
meanings and functions with and without speech, the annotation system complements
the list of existing annotation systems (e.g., Gut et al. 2002; Kipp, Neff and Albrecht
2007; Lausberg and Sloetjes 2009) by offering a systematic description at a range of le-
vels of linguistic description. Furthermore, by offering obligatory as well as optional aspects of annotation, the system provides the flexibility and expandability that allow it to be adjusted to a range of research questions, which can be addressed at different depths of description and analysis. The Linguistic Annotation System for Gestures is thus also applicable in a range of disciplines interested in a linguistic and form-based perspective on gestures and their relation to speech, such as (cognitive) linguistics, semiotics, and cognitive science.
Fig. 71.2: Tier dependency in the Linguistic Annotation System for Gestures
Acknowledgments
We are grateful to the Volkswagen Foundation for supporting this work with a grant for
the interdisciplinary project “Towards a grammar of gesture: evolution, brain and
linguistic structures” (www.togog.org).
8. References
Alibali, Martha W. and Sotaro Kita 2010. Gesture highlights perceptually present information for speakers. Gesture 10(1): 3–28.
Andrén, Mats 2010. Children’s Gestures from 18 to 30 Months. Centre for Languages and Litera-
ture, Centre for Cognitive Semiotics. Lund: Lund University.
Austin, John L. 1962. How to Do Things with Words. Oxford: Clarendon Press.
Battison, Robin 1974. Phonological deletion in American Sign Language. Sign Language Studies 5: 1–19.
Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie and Allan Wade 1992. Interactive ges-
tures. Discourse Processes 15: 469–489.
Beattie, Geoffrey and Heather Shovelton 1999. Do iconic hand gestures really contribute anything
to the semantic information conveyed by speech? An experimental investigation. Semiotica
123(1/2): 1–30.
Beattie, Geoffrey and Heather Shovelton 2007. The role of iconic gesture in semantic communi-
cation and its theoretical and practical implications. In: Susan D. Duncan, Justine Cassell and
Elena Tevy Levy (eds.), Gesture and the Dynamic Dimension of Language, 221–241. Philadel-
phia: John Benjamins.
Bergmann, Kirsten, V. Aksu and Stefan Kopp 2011. The relation of speech and gestures: Temporal
synchrony follows semantic synchrony. Paper presented at the 2nd Workshop on Gesture and
Speech in Interaction – GESPIN, 5-7 September. Bielefeld, Germany.
Bergmann, Kirsten and Stefan Kopp 2006. Verbal or visual? How information is distributed across
speech and gesture in spatial dialog. In: David Schlangen and Raquel Fernandez (eds.), Pro-
ceedings of brandial 2006, the 10th Workshop on the Semantics and Pragmatics of Dialogue,
90–97. Potsdam University Press, Germany.
Bird, Steven and Mark Liberman 2001. A formal framework for linguistic annotation. Speech
Communication 33(1/2): 23–60.
Birdwhistell, Ray 1970. Kinesics and Context. Essays on Body Motion Communication. Philadel-
phia: University of Pennsylvania Press.
Bohle, Ulrike 2007. Das Wort ergreifen – das Wort übergeben: Explorative Studie zur Rolle rede-
begleitender Gesten in der Organisation des Sprecherwechsels. Berlin: Weidler.
Bressem, Jana 2007. Recurrent form features in coverbal gestures. Unpublished manuscript.
Bressem, Jana 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Ph.D. disser-
tation, Faculty of Social and Cultural Sciences, European University Viadrina, Frankfurt (Oder).
Bressem, Jana this volume a. A linguistic perspective on the notation of form features in gestures.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Teßendorf (eds.), Body – Language – Communication: An International Handbook on Multi-
modality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Bressem, Jana this volume b. Transcription systems for gestures, speech, prosody, postures, gaze.
In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha
Ekman, Paul 1977. Facial expression. In: Aaron Siegman and S. Feldstein (eds.), Nonverbal Behav-
ior and Communication, 97–116. Hillsdale, NJ: Lawrence Erlbaum.
Ekman, Paul and Wallace V. Friesen 1969. The repertoire of nonverbal behavior: Categories, ori-
gins, usage and coding. Semiotica 1: 49–98.
Engle, Randi A. 2000. Toward a theory of multimodal communication combining speech, gestures,
diagrams, and demonstrations in instructional explanations. Ph.D. dissertation, School of Edu-
cation, Stanford University, Stanford, CA.
Freedman, Norbert 1977. Hands, words and mind. On the structuralization of body movements
during discourse and the capacity for verbal representation. In: Norbert Freedman and Stanley
Grand (eds.), Communicative structures. A psychoanalytic interpretation of communication,
219–235. New York: Plenum Press.
Freedman, Norbert 1972. The analysis of movement behaviour during the clinical interview. In:
Aron Wolfe Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 153–175.
New York: Pergamon Press.
Fricke, Ellen 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.
Fricke, Ellen 2010. Phonaestheme, Kinaestheme und multimodale Grammatik: Wie Artikula-
tionen zu Typen werden, die bedeuten können. In: Sprache und Literatur 41(1): 70–88.
Fricke, Ellen 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: De
Gruyter.
Fricke, Ellen this volume. Towards a unified grammar of gesture and speech. In: Cornelia Müller,
Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Sedinha Teßendorf (eds.),
Body – Language – Communication: An International Handbook on Multimodality in Human
Interaction. (Handbooks of Linguistics and Communication Science 38.1.) Berlin/Boston: De
Gruyter Mouton.
Gerwing, Jennifer and Meredith Allison 2009. The relationship between verbal and gestural con-
tributions in conversation: A comparison of three methods. Gesture 9(3): 312–336.
Goodwin, Charles 2000. Action and embodiment within situated human interaction. Journal of
Pragmatics 32: 1489–1522.
Gut, Ulrike, Karin Looks, Alexandra Thies and Dafydd Gibbon 2002. Cogest: Conversational ges-
ture transcription system version 1.0. Fakultät für Linguistik und Literaturwissenschaft, Uni-
versität Bielefeld, ModeLex Tech. Rep 1.
Harrison, Simon 2009. Grammar, gesture, and cognition: The case of negation in English. Ph.D.
dissertation, Université Michel de Montaigne, Bordeaux 3.
Johnson, Mark 1987. The Body in Mind. The Bodily Basis of Meaning, Imagination, and Reason.
Chicago: University of Chicago Press.
Johnson, Mark 1993. Conceptual metaphor and embodied structures of meaning: A reply to Ken-
nedy and Vervaeke. Philosophical Psychology 6(4): 413–422.
Johnson, Mark 2005. The philosophical significance of image schemas. In: Beate Hampe (ed.),
From Perception to Meaning: Image Schemas in Cognitive Linguistics, 15–33. Berlin: De Gruy-
ter Mouton.
Kacem, Chaouki 2012. Gestenverhalten an deutschen und tunesischen Schulen. Ph.D. dissertation,
Technische Universität Berlin.
Kappelhoff, Hermann and Cornelia Müller 2011. Embodied meaning construction. Multimodal
metaphor and expressive movement in speech, gesture, and in feature film. Metaphor and
the Social World 1: 121–153.
Kendon, Adam 1972. Some relationships between body motion and speech. In: Aron Siegman and Benjamin Pope (eds.), Studies in Dyadic Communication, 177–216. Elmsford, NY: Pergamon Press.
Kendon, Adam 1980. Gesture and speech: Two aspects of the process of utterance. In: Mary
Ritchie Key (ed.), Nonverbal Communication and Language, 207–288. The Hague: Mouton.
Kendon, Adam 1987. On gesture: Its complementary relationship with speech. In: Aron W. Siegman and Stanley Feldstein (eds.), Nonverbal Behavior and Communication, 65–97. Hillsdale,
NJ: Lawrence Erlbaum.
Kendon, Adam 1995. Gestures as illocutionary and discourse structure markers in Southern Ital-
ian conversation. Journal of Pragmatics 23: 247–279.
Kendon, Adam 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press.
Kipp, Michael 2001. Anvil – a generic annotation tool for multimodal dialogue. In: Proceedings
of the 7th European Conference on Speech Communication and Technology (Eurospeech),
1367–1370. Aalborg, Denmark.
Kipp, Michael, Michael Neff and Irene Albrecht 2007. An annotation scheme for conversational
gestures: How to economically capture timing and form. Journal on Language Resources and
Evaluation – Special Issue on Multimodal Corpora 41(3–4): 325–339.
Kita, Sotaro, Ingeborg van Gijn and Harry van der Hulst 1998. Movement phases in signs and
cospeech gestures and their transcription by human encoders. In: Ipke Wachsmuth and Martin
Fröhlich (eds.), Gesture and Sign Language in Human-Computer Interaction, 23–35. Berlin:
Springer.
Kolter, Astrid, Silva H. Ladewig, Michela Summa, Sabine Koch, Thomas Fuchs and Cornelia
Müller 2012. Body memory and emergence of metaphor in movement and speech. An
interdisciplinary case study. In: Sabine Koch, Thomas Fuchs, Michaela Summa and Cornelia
Müller (eds.), Body Memory, Metaphor, and Movement, 202–226. Amsterdam: John
Benjamins.
Kopp, Stefan, Paul Tepper and Justine Cassell 2004. Towards an integrated microplanning of
language and iconic gesture for multimodal output. Paper presented at the ICMI 04 October
13–15, State College, PA.
Ladd, D. Robert 1996. Intonational Phonology. Cambridge: Cambridge University Press.
Ladewig, Silva H. 2007. The family of the cyclic gesture and its variants – systematic variation of
form and contexts. http://www.silvaladewig.de/publications/papers/Ladewig-cyclic_gesture_pdf;
accessed January 2008.
Ladewig, Silva H. 2010. Beschreiben, suchen und auffordern – Varianten einer rekurrenten Geste.
In: Sprache und Literatur 41(1): 89–111.
Ladewig, Silva H. 2011. Putting the cyclic gesture on a cognitive basis. CogniTextes 6. http://
cognitextes.revues.org/406
Ladewig, Silva H. 2012. Syntactic and semantic integration of gestures into speech: Structural, cog-
nitive, and conceptual aspects. Ph.D. dissertation, Faculty of Social and Cultural Sciences, Euro-
pean University Viadrina, Frankfurt (Oder).
Ladewig, Silva H. and Jana Bressem this volume. The notation of gesture phases – a linguistic per-
spective. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and
Sedinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Ladewig, Silva H. and Jana Bressem forthcoming. New insights into the medium hand – Discover-
ing Structures in gestures based on the four parameters of sign language. Semiotica.
Langacker, Ronald 1987. Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites.
Stanford, CA: Stanford University Press.
Lascarides, Alex and Mathew Stone 2006. Formal semantics for iconic gesture. In: David Schlan-
gen and Raquel Fernandez (eds.), brandial’06 Proceedings, 64–71. Potsdam, Germany: Pots-
dam University Press.
Lausberg, Hedda and Han Sloetjes 2009. Coding gestural behavior with the NEUROGES–ELAN
system. Behavioral Research Methods 41(3): 841–849.
Loehr, Daniel 2004. Gesture and intonation. Ph.D. dissertation, Graduate School of Arts and
Sciences, Georgetown University, Washington, DC.
Loehr, Daniel 2007. Aspects of rhythm in gesture and speech. Gesture 7(2): 179–214.
Martell, Craig 2002. Form: An extensible, kinematically– based gesture annotation scheme. Paper
presented at the International Conference on Language Resources and Evaluation. European
Language Resources Association. 29–31 May. Las Palmas.
McClave, Evelyn Z. 1991. Intonation and gesture. Ph.D. dissertation, Georgetown University, Washington, DC.
McClave, Evelyn Z. 1994. Gestural beats: The rhythm hypothesis. Journal of Psycholinguistic
Research 23(1): 45–66.
McCullough, Karl Erik 2005. Using gestures during speaking: Self-generating indexical fields. Ph.D. dissertation, University of Chicago.
McNeill, David 1992. Hand and Mind. What Gestures Reveal about Thought. Chicago: University
of Chicago Press.
McNeill, David 2005. Gesture and Thought. Chicago: University of Chicago Press.
Mittelberg, Irene 2006. Metaphor and metonymy in language and gesture: Discourse evidence for multimodal models of grammar. Ph.D. dissertation, Cornell University.
Mittelberg, Irene 2008. Peircean semiotics meets conceptual metaphor: Iconic modes in gestural
representations of grammar. In: Alan Cienki and Cornelia Müller (eds.), Metaphor and Ges-
ture, 145–184. Amsterdam: John Benjamins.
Mittelberg, Irene 2010. Geometric and image-schematic patterns in gesture space. In: Vyv Evans
and Paul Chilton (eds.), Language, Cognition, and Space: The State of the Art and New Direc-
tions, 351–385. London: Equinox.
Mittelberg, Irene and Linda Waugh 2009. Multimodal figures of thought: A cognitive-semiotic
approach to metaphor and metonymy in co-speech gesture. In: Charles Forceville and Eduardo
Urios-Aparisi (eds.), Multimodal Metaphor, 329–356. Berlin: De Gruyter Mouton.
Müller, Cornelia 1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich. Ber-
lin: Arno Spitz.
Müller, Cornelia 2000. Zeit als Raum. Eine kognitiv-semantische Mikroanalyse des sprachlichen
und gestischen Ausdrucks von Aktionsarten. In: Ernest W. B. Hess-Lüttich and H. Walter
Schmitz (eds.), Botschaften verstehen. Kommunikationstheorie und Zeichenpraxis. Festschrift
für Helmut Richter, 211–228. Frankfurt am Main: Peter Lang.
Müller, Cornelia 2003. On the gestural creation of narrative structure: A case study of a story told
in conversation. In: Monica Rector, Isabella Poggi and Trigo Nadine (eds.), Gestures: Meaning
and Use, 259–265. Porto, Portugal: Universidade Fernando Pessoa.
Müller, Cornelia 2004. Forms and uses of the Palm Up Open Hand. A case of a gesture family? In:
Cornelia Müller and Roland Posner (eds.), Semantics and Pragmatics of Everyday Gestures,
233–256. Berlin: Weidler.
Müller, Cornelia 2007. A dynamic view on metaphor, gesture and thought. In: Susan Duncan, Jus-
tine Cassell and Elena T. Levy (eds.), Gesture and the Dynamic Dimension of Language. Essays
in Honor of David McNeill, 109–116. Amsterdam: John Benjamins.
Müller, Cornelia 2008a. Metaphors. Dead and Alive, Sleeping and Waking. A Dynamic View. Chi-
cago: University of Chicago Press.
Müller, Cornelia 2008b. What gestures reveal about the nature of metaphor. In: Alan Cienki and
Cornelia Müller (eds.), Metaphor and Gesture, 249–275. Amsterdam: John Benjamins.
Müller, Cornelia 2009. Gesture and language. In: K. Malmkjaer (ed.), Routledge’s Linguistics
Encyclopedia, 214–217. Abingdon: Routledge.
Müller, Cornelia 2010a. Mimesis und Gestik. In: Gertrud Koch, Martin Vöhler and Christiane
Voss (eds.), Die Mimesis und ihre Künste, 149–187. Paderborn: Fink.
Müller, Cornelia 2010b. Wie Gesten bedeuten. Eine kognitiv-linguistische und sequenzanalytische
Perspektive. In: Sprache und Literatur 41(1): 37–68.
Müller, Cornelia this volume. Gestures as a medium of expression: The linguistic potential of ges-
tures. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Se-
dinha Teßendorf (eds.), Body – Language – Communication: An International Handbook on
Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science
38.1.) Berlin/Boston: De Gruyter Mouton.
Müller, Cornelia volume 2. Gestural modes of representation as techniques of depiction. In: Cornelia Müller, Alan Cienki, Ellen Fricke, Silva H. Ladewig, David McNeill and Jana Bressem (eds.), Body – Language – Communication: An International Handbook on Multimodality in Human Interaction. (Handbooks of Linguistics and Communication Science 38.2.) Berlin/Boston: De Gruyter Mouton.
Searle, John R. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge: Cam-
bridge University Press.
Selting, Margret 2000. The construction of units in conversational talk. Language in Society 29(4):
477–517.
Seyfeddinipur, Mandana 2006. Disfluency: Interrupting speech and gesture. Ph.D. dissertation,
University Nijmegen, the Netherlands.
Slama-Cazacu, Tatiana 1976. Nonverbal components in message sequence: “Mixed syntax”. In:
William C. McCormack and Stephen A. Wurm (eds.), Language and Man: Anthropological Is-
sues, 217–227. The Hague: Mouton.
Sowa, Timo 2005. Understanding Coverbal Iconic Gestures in Object Shape Descriptions. Ph.D.
dissertation. Berlin: Akademische Verlagsgesellschaft.
Stokoe, William 1960. Sign Language Structure. Buffalo, NY: Buffalo University Press.
Streeck, Jürgen 1988. The significance of gesture: How it is established. IPrA Papers in Pragmatics
2(1/2): 60–83.
Streeck, Jürgen 1993. Gesture as communication I: Its coordination with gaze and speech. Com-
munication Monographs 60(4): 275–299.
Streeck, Jürgen 1994. Culture, meaning, and interpersonal communication. In: Mark L. Knapp
and Gerald R. Miller (eds.), Handbook of Interpersonal Communication, 286–319. London:
Sage Publications Ltd.
Streeck, Jürgen 2002. Grammars, words, and embodied meanings: on the uses and evolution of so
and like. Journal of Communication 52(3): 581–596.
Streeck, Jürgen 2005. Pragmatic aspects of gesture. In: Jacob L. Mey (ed.), Encyclopedia of Lan-
guage and Linguistics, vol. 5: Pragmatics, 71–76. Oxford: Elsevier.
Streeck, Jürgen 2009. Gesturecraft. The manu-facture of meaning. Amsterdam: John Benjamins.
Streeck, Jürgen and Ulrike Hartge 1992. Previews: Gestures at the transition place. In: Peter Auer and
Aldo di Luzio (eds.), The Contextualization of Language, 138–158. Amsterdam: John Benjamins.
Sweetser, Eve 1998. Regular Metaphoricity in Gesture: Bodily-Based Models of Speech Interaction.
Oxford: Elsevier.
Sweetser, Eve 2006. Negative spaces: Levels of negation and kinds of spaces. GRAAT 35: 313–332.
Sweetser, Eve and Fey Parrill 2004. What we mean by meaning: Conceptual integration in gesture
analysis and transcription. Gesture 4(2): 197–219.
Talmy, Leonard 1983. How language structures space. In: Herbert L. Pick and Linda P. Acredolo
(eds.), Spatial Orientation: Theory, Research, and Application, 225–282. New York: Plenum Press.
Teßendorf, Sedinha 2005. Pragmatische Funktionen spanischer Gesten am Beispiel des “Gestos de Bar-
rer”. Unpublished MA thesis, Department of Philosophy and Humanities, Free University Berlin.
Teßendorf, Sedinha 2008. Pragmatic and metaphoric gestures – combining functional with cogni-
tive approaches. Unpublished manuscript.
Teßendorf, Sedinha 2009. From everyday action to gestural performance: Metonymic motivations
of a pragmatic gesture. Paper presented at the Second Aflico Conference, Lille. 10–12 May.
Tuite, Kevin 1993. The production of gesture. Semiotica 93(1–2): 83–105.
Wilcox, Sherman 2002. The iconic mapping of space and time in signed languages. In: Liliana Al-
bertazzi (ed.), Unfolding Perceptual Continua, 255–281. Amsterdam: John Benjamins.
Williams, Robert F. 2008. Gesture as a conceptual mapping tool. In: Alan Cienki and Cornelia
Müller (eds.), Metaphor and Gesture, 55–92. Amsterdam: John Benjamins.
Wittenburg, Peter, Hennie Brugman, Albert Russel, Alex Klassmann and Han Sloetjes 2006. ELAN: A
professional framework for multimodality research. In: Proceedings of the Fifth International
Conference on Language Resources and Evaluation (LREC), 1556–1559.
Abstract
Sign languages are natural languages developed and used in all parts of the world where
there are deaf people. Although different from each other, these languages, which have been exposed to varying degrees of communitisation and institutionalisation, share a significant number of common structures. As in the study of any other language, the linguistic
study of visual-gestural languages, now a scientific field in its own right, faces the
unavoidable problem of their graphical representation. This paper provides an overview,
albeit non-exhaustive, of existing solutions to this challenge. We begin by reviewing the
main difficulties posed by the graphical representation of these languages, which have
no writing system of their own, taking into consideration their modal and structural char-
acteristics (section 2). We then present the major types of graphical representations that
have been developed for sign languages, noting their respective strengths and limitations
(sections 3 and 4). In section 5, we outline the problems that remain unsolved at this
point, if one aims to achieve transcription of a growing number of sign language corpora
that would meet research needs while respecting the original features particular to these
languages.
a writing system. The vast majority of spoken languages are also unwritten. The major difference, however, is that an existing writing system can, over time, be adapted for use in any spoken language. No such option exists for sign languages, which have no similar written tradition to build upon. In light of the writing processes used in spoken lan-
guages, sign languages pose very specific problems, related to their modality and its
structural consequences.
Crucially, sign languages exploit the availability of all manual and bodily articulators
(hands, head, shoulders, facial expressions, eye gaze), which can be used simultaneously
or in succession. This parametric multi-linearity is combined with a sophisticated use of
the space in front of the signer that is opened by these articulators; thus, most syntac-
tico-semantic relations in sign languages are spatialised (establishment of reference to
entities, time and space, pronominalisation and maintained reference). It is generally agreed that these characteristics underlie the difficulties raised by the graphical representation of
these languages: complex temporal relations between the articulators (co-articulation,
hold and overlap), appropriate use of space through pointing signs and continued use
of the areas thus activated, variability of gestural units through the modification of
location and/or orientation (e.g., Bergman et al. 2001; Johnston 1991; Miller [1994]
2001; Stokoe 1987). However, the descriptive approach proposed by the semiological
model (Cuxac 1996, 2000) points to an additional difficulty.
In addition to conventional lexical units that are widely recognized in the literature
(as “frozen signs” or even “signs” or “words”), this model considers as central another
type of unit, non-conventionalised but employing a limited number of structures
(termed transfer units). Although listed in the literature as “classifier constructions”
(see Emmorey 2003) or “productive signs” (Brennan 1990), these units are generally
analyzed as peripheral and non-linguistic. Yet, they represent 30–80% of sign language
discourse (Sallandre 2003; Antinoro Pizzuto et al. 2008). If we adopt the semiological perspective that considers units of this type the very heart of sign languages, their inclusion increases the difficulty of graphical representation, as they are based on a semiosis of continuity, given their "illustrative intent" of saying through showing. In sign lan-
guage discourse, these units are tightly intertwined with the conventional units
(“non-illustrative intent”), and the entire discourse alternates between both intents.
Significantly, the two moments in history which gave rise to an explicit linguistic
reflection on sign languages were accompanied by the development of graphical sys-
tems for their representation: Bébian (1825) and Stokoe (1960). The development of
the modern linguistic study of these languages has also been accompanied by a prolif-
eration of graphical systems (for a review, see Boutora 2005). Classification of these in-
ventions must take two variables into account. First, the goal of the representation system: transcription only (the graphic representation of produced data) or a system for written communication as well. Second, and more importantly, the semiological
features of the system used, in particular whether it employs specific symbols and its
own internal logic, or a pre-existing writing system (de facto, the written form of the
national spoken language). On this basis, we can distinguish two sets, which we term
“notation systems” and “annotation systems.” Notation systems are autonomous and
specific systems, sometimes intended for written communication, which share central
semiological features: They are mono-linear, and focused, at least in their design, on
the representation of lexical signs in terms of their visual form outside any discourse
context. In contrast, annotation systems, which are based on the written form of spoken
language, are intended to represent discourse, and used only in the context of linguistic
research.
2. Notation systems
The moment sign language was taken into consideration in the education of deaf chil-
dren, its graphical representation became an issue, particularly for the creation of dic-
tionaries for teachers and students. We will not specify the graphical means used in such
dictionaries from the late 18th century, referring the reader to the full review by Bonnal
(2005). Two methods (which may be combined) were used for these representations:
drawing (enhanced by symbols indicating movement, at times represented by a
sequence of drawings) and, the more dominant, descriptions of the signifier form of a
sign, written in spoken language.
The first independent notation system is Bébian’s (1825) Mimographie, developed
for the sign language used at the Institution de Paris, and intended for purely pedago-
gical purposes (e.g. Fischer 1995). Bébian’s aim was not to provide a written form of
signed discourse, but simply to represent the “mimicked signs.” His approach was remarkably ahead of its time: each sign is represented as a linear sequence indicating the relevant body part (“l’organe qui agit,” the organ that acts, represented through 86 characters), its movement (68 characters), its position (14) and, if needed, the facial expression (20).
The Mimographie system, although never implemented and not the only attempt at
notation in the 19th century (Piroux 1830), is foundational, and serves as the basis
for all modern notation systems, starting with Stokoe (1960).
For Stokoe (1960, [1965] 1976), the creation of a notation, rather than a transcription, was part of the demonstration of the linguistic status of American Sign Language: its characters, the cheremes, were intended to be equivalent to phonemes and thus to prove the existence of a double articulation. His analysis of signs is directly inspired by Bébian but diverges in several respects. Focusing only on manual aspects, Stokoe retains the handshape (vs. Bébian’s conformation, which included orientation as well), adds the parameter of location, and removes facial expression. Stokoe’s system is composed of 55 cheremes (19 handshapes, 12 locations, 24 movements), using symbols borrowed from the Latin alphabet and the numerical system together with some invented for the purpose, and is generally devoid of iconicity (see Fig. 72.1). This model is the direct
source of the vast majority of systems used over the next two decades, as linguistic
study of other sign languages developed, now focusing on transcription.
Fig. 72.1: Notation in Stokoe’s (1960) system: The sign [SNAKE] in American Sign Language
(Martin 2000).
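To make the parametric logic concrete, here is a minimal, hypothetical Python sketch of our own: it represents a sign as a bundle of formational parameters, with orientation added as a fourth parameter in the spirit of Battison (1973). The field names and the example labels are invented for illustration; they are not Stokoe's actual symbols.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ManualSign:
    """A sign reduced to Stokoe-style formational parameters.

    Stokoe's 1960 inventory comprised 19 handshapes (dez), 12 locations
    (tab) and 24 movements (sig); orientation was only added later,
    following Battison (1973). Field names here are ours.
    """
    handshape: str                      # stands in for one of the 19 dez symbols
    location: str                       # stands in for one of the 12 tab symbols
    movement: str                       # stands in for one of the 24 sig symbols
    orientation: Optional[str] = None   # post-Stokoe parameter

# A purely illustrative entry; the parameter values are invented.
example_sign = ManualSign(handshape="V", location="neutral space", movement="wiggle-forward")
```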
The differences between these Stokoe-derived systems typically stem from theoretical developments (generative and post-generative phonology), the addition of the “orientation” parameter (following Battison 1973), adaptation to different sign languages, and options for the parametric linear sequencing of symbols. However, each system is comprehensible only to the research team using it.
One noteworthy system based on Stokoe’s is HamNoSys, developed in Hamburg (Prillwitz et al. 1989). HamNoSys is intended to enable the phonetic transcription of any sign language; it therefore includes a considerable number of symbols (more than 200 basic symbols) and has gradually been enhanced for the notation of spatial cues and non-manual aspects (facial expression, body movements, prosody, eye gaze). Unlike Stokoe’s system, HamNoSys employs iconic symbols and shows strong internal systematicity (see Fig. 72.2 below for an illustration). Yet it faces a serious legibility problem, particularly for the recording of discourse. Nevertheless, thanks to its early digitization and compatibility with annotation software (see 3.2 below), it is integrated in large lexical databases such as Ilex (Hanke and Storz 2008) or the Auslan database (Johnston 2001), and is one of the notation systems most frequently used in sign language research.
Fig. 72.2: HamNoSys notation of sentences from a Goldilocks story (Bentele 1999), with the glosses “bears”, “Goldilocks”, “somewhere wandering”, “deep forest”, “somewhere wandering”.
Whatever their respective contributions, there is general agreement that these various systems share some limitations. The most notable is their virtual inability to represent discourse sequences and to capture their constituent principles. Their inherent mono-linearity prevents a readable representation of the spatio-temporal relations that are essential to sign language syntax. In addition, these systems were established solely on the analysis of decontextualised manual signs, abstracting away from their use in discourse (modifications of internal parameters, discursive framing of the conventional sign by non-manual components).
Although SignWriting (Sutton [1995] 1999) is, like its predecessors, an alphabetic type of notation, its significant innovation lies in its semiographic aspect, which adds the analogical to the digital (see Fig. 72.3). The system represents each gestural production as a multi-parameter composition and as a whole (each “graphic cell” brings together, analogically, the symbols of the various articulators, allowing us to see a body creating a space and a gaze), thus permitting a detailed reconstruction of spatial phenomena. SignWriting was designed to evolve through use and was quickly adopted by deaf signers. It is taught at various schools around the world and supported by numerous publications using the system (see http://www.signwriting.org/). Over the past decade, the system has been the object of detailed experiments led by the Italian deaf team directed by E. Antinoro Pizzuto (e.g. Pizzuto, Chiari, and Rossini 2008; Pizzuto et al. 2008), showing that a deaf signer can (as was previously unachievable) accurately reconstruct discourse rich in transfer units from a Lingua dei Segni Italiana (Italian Sign Language) text in SignWriting, whether written or transcribed (Di Renzo et al. 2006). However, limitations do remain. In the absence of spelling rules, the system often allows multiple representations of the same sign. There are also drawbacks to its analogical character, notably the absence of an explicit and economical marking of the spatial processes of anaphora, as well as problems of computational compatibility (see 2.2.2 below).
Fig. 72.3: SignWriting transcription of the beginning of a story in LIS (Di Renzo et al. 2009). The circled part corresponds to one gestural unit (our “graphic cell”). The space of the cell analogically represents the signing space.
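As a purely illustrative sketch, one can think of a graphic cell as a set of articulator symbols positioned analogically within a two-dimensional cell standing for the signing space. The following Python snippet is our own simplification of that idea, not SignWriting's actual glyph encoding; all names and positions are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PlacedSymbol:
    """One articulator symbol placed analogically inside the cell."""
    articulator: str   # e.g. "right hand", "gaze", "torso"
    symbol: str        # a label standing in for a SignWriting-like glyph
    x: float           # horizontal position within the cell (0.0-1.0)
    y: float           # vertical position within the cell (0.0-1.0)

@dataclass
class GraphicCell:
    """A whole gestural unit: the cell's geometry mirrors the signing space."""
    symbols: List[PlacedSymbol]

# Invented example: a two-handed unit with gaze directed to the upper left.
cell = GraphicCell(symbols=[
    PlacedSymbol("right hand", "flat-hand", 0.65, 0.40),
    PlacedSymbol("left hand", "flat-hand", 0.35, 0.40),
    PlacedSymbol("gaze", "gaze-arrow", 0.30, 0.15),
])
```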
The alternative to the various limitations of the specific notation systems has been, and remains, the written form of the spoken language. From the very beginning of linguistic research (e.g. Stokoe 1960; Klima and Bellugi 1979), sign language researchers have resorted to graphic representations based on the “gloss,” that is, the representation of isolated sign language signs (or sequences of signs) by written words (or sequences of words) of the national spoken language, taken to stand for the signs of the sign language studied. These gloss-based notations have recently been systematised in order to overcome the limitations of mono-linearity.
3. Annotation systems
3.1. Multi-linear annotation
Johnston (1991) proposes a system he calls the Interlinear System, a primary concern of
which is the representation of the signifier form via HamNoSys, alongside the essential
use of written English. The issue, at least initially (see Johnston 2001), was to ensure
access to the data. The Interlinear System is presented as multi-linear; however, the typ-
ical relation between the lines is simple superposition, showing the same phenomena at
different levels of analysis. The fields used are:
(i) a “phonetic” notation in HamNoSys (actually, a notation of the signs in the citation
form, complemented, if necessary, by the spatial specifications available in
HamNoSys),
(ii) facial expression,
(iii) an English gloss of conventional signs,
(iv) any signs of mouthing, and
(v) a translation in English.
(i) a parametric score transcription (right hand, left hand, two hands, body, face, eye
gaze and facial expression),
(ii) systematic recovery of each element of the score, here numbered and explained,
(iii) a literal gloss translation in written French,
(iv) a translation in standard French.
Parts (ii) and (iii) are original and are intended to assign each annotated element accurately to its meaning component, thus partially compensating for the loss of information about
signifier forms.
ELAN is programmed in Java, stores its annotations in XML, and runs on Mac OS as well as on Windows and Linux. It is part of a set of freeware tools available on the Max Planck Institute platform (Language Archiving Technology), where it is connected, in particular, to the ARBIL tool for the precise (and almost exhaustive) administration of metadata (IMDI metadata tools). The first step in the creation of an annotation grid is the definition of the template (e.g. tiers, types and stereotypes, controlled vocabularies), which establishes a hierarchy of information determined by the desired analysis. As shown in Fig. 72.4, the ELAN grid is organized in distinct parts:
(a) the video, which progresses in sync with the other elements, (b) the annotation
grid, and (c) additional textual and numerical elements. The video alignment indicator
(or cursor) is symbolized by a thin vertical line in the grid (b), thus keeping track of
the video reference. Once the annotation template is established, annotation can be
performed simultaneously by multiple annotators, thus promoting interaction between
users.
Fig. 72.4: Screen capture of annotation with ELAN (Garcia et al. 2011)
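To give a concrete sense of the template logic described above, the following minimal Python sketch builds a toy, EAF-style multi-tier structure (two tiers, one time-aligned annotation each). It is our own simplified illustration of the kind of time-aligned XML that ELAN manipulates, not a complete or valid .eaf file; tier names, glosses and times are invented.

```python
import xml.etree.ElementTree as ET

# Toy example: two tiers with one time-aligned annotation each.
# Tier names, glosses and times are invented for illustration only.
annotations = {
    "GlossRH": [("ts1", "ts2", "FOREST")],           # right-hand gloss tier
    "Gaze":    [("ts1", "ts2", "towards locus A")],  # eye-gaze tier
}
time_slots = {"ts1": 0, "ts2": 1200}  # milliseconds

doc = ET.Element("ANNOTATION_DOCUMENT")
time_order = ET.SubElement(doc, "TIME_ORDER")
for ts_id, value in time_slots.items():
    ET.SubElement(time_order, "TIME_SLOT",
                  TIME_SLOT_ID=ts_id, TIME_VALUE=str(value))

ann_counter = 0
for tier_id, anns in annotations.items():
    tier = ET.SubElement(doc, "TIER", TIER_ID=tier_id,
                         LINGUISTIC_TYPE_REF="default-lt")
    for start, end, value in anns:
        ann_counter += 1
        ann = ET.SubElement(tier, "ANNOTATION")
        alignable = ET.SubElement(ann, "ALIGNABLE_ANNOTATION",
                                  ANNOTATION_ID=f"a{ann_counter}",
                                  TIME_SLOT_REF1=start, TIME_SLOT_REF2=end)
        ET.SubElement(alignable, "ANNOTATION_VALUE").text = value

ET.SubElement(doc, "LINGUISTIC_TYPE",
              LINGUISTIC_TYPE_ID="default-lt", TIME_ALIGNABLE="true")

print(ET.tostring(doc, encoding="unicode"))
```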
The convenience and speed of multimedia tools enable the detailed annotation of large
corpora within reasonable timeframes, undoubtedly bringing a new era to sign language
research. The resulting annotations become, in effect, machine-readable, enabling new and innovative types of analysis (e.g. a wider range of statistical queries, lexicometry). The use of such software thus has a real heuristic dimension, enabling new types of observation. However, these are complex computerised tools that require regular updating and maintenance, which only large research institutes can provide.
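As an illustration of the kind of machine-readable query this makes possible, the hedged sketch below counts annotation values on one tier of an EAF-style XML string such as the toy document above. A real workflow would parse exported ELAN files or use a dedicated library; the function name and tier labels are ours.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def gloss_frequencies(eaf_xml: str, tier_id: str) -> Counter:
    """Count annotation values on one tier of an EAF-style XML string."""
    root = ET.fromstring(eaf_xml)
    counts: Counter = Counter()
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for value in tier.iter("ANNOTATION_VALUE"):
            if value.text:
                counts[value.text] += 1
    return counts

# Example use with the toy document from the previous sketch:
# print(gloss_frequencies(ET.tostring(doc, encoding="unicode"), "GlossRH"))
```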
transfer units are, at best, equivalent to a clause in spoken language (e.g. “the thin elon-
gated vertical shape moves slowly towards a fixed horizontal oval shape”). Whatever
the theoretical status granted to such units, no sign language linguist can now deny
their massive presence in discourse (e.g. Emmorey 2003; Liddell 2003). The problem
of their graphical representation and the inadequacy of spoken language glosses for
such purposes are thus unavoidable. Ultimately, for Pizzuto, Rossini, and Russo
(2006), Garcia (2006, 2010), and Antinoro Pizzuto and Garcia (in preparation), these
problems are epistemological. Pizzuto and Pietrandrea (2001) have stressed that the so-called “gloss” or “annotation” of sign language cannot claim the status this term has in spoken language research insofar as it lacks, as is now almost systematically the case, the initial level of a transcription of the signifier form. This is not a problem in the annotation of any spoken language, even one without its own writing system, since spoken languages can always be phonetically transcribed with the International Phonetic Alphabet. The video clip incorporated into the so-called annotation software (see section 3.2.2) is not a functional equivalent, all the less so given that the sign language signal consists of highly multi-linear signifiers.
The Berkeley Transcription System (BTS), proposed by Slobin and Hoiting’s team (Hoiting and Slobin 2002; Slobin et al. 2001), was explicitly developed to find alternatives to the problems of glossing sign language. It provides a graphical (mono-linear) symbolisation not at the level of the sign but at that of the morpheme, lexical as well as
grammatical. Originally designed for the transcription of American Sign Language and
Sign Language of the Netherlands (NGT), the Berkeley Transcription System, which is
connected to the CHILDES system (MacWhinney 2000), allows the transcription of
any sign language. However, its primary medium remains written English, although
supplemented by all available graphical resources (e.g. typographical variants, arrows,
brackets, figures), in rigorously standardised uses. Indeed, this system simply transfers
the limitations of glossing to the sub-lexical level. Another problem, aside from its
low readability, is the rigidity introduced by the labelling system provided, which is
based on specific theoretical assumptions that are not necessarily shared.
For Johnston (1991, 2001, 2008), the only real issue is a consistent use of glossing. This requires lemmatisation of the lexicon of the studied sign language, which is posited as essential to any “modern” (i.e. machine-readable) corpus, and requires building, prior to any annotation, a lexical database in which ID-glosses are systematically assigned to the lemmas, to be enhanced and revised later in the light of the analysis of large corpora (see the Auslan database). While such a lexical database undoubtedly pro-
vides internal glossing consistency, its constituents may be subject to debate (the criteria
for defining the lexical unit and predetermination of the lemma and its variants prior to
discourse analysis, see Konrad 2011). The underlying focus on conventional signs alone
cannot resolve the issue of the annotation of transfer units, let alone the problem of
their representation.
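A minimal sketch of the ID-gloss idea, assuming a hypothetical in-memory lexicon (not the Auslan database): every attested gloss variant of a lemma is mapped to one and the same identifier, so that annotations of different tokens, possibly by different annotators, remain consistent and searchable. All glosses below are invented.

```python
# Hypothetical ID-gloss lexicon: variant glosses point to a single lemma ID.
ID_GLOSS_LEXICON = {
    "house": "HOUSE",
    "home": "HOUSE",       # same lemma glossed differently by another annotator
    "dwelling": "HOUSE",
    "forest": "FOREST",
    "woods": "FOREST",
}

def id_gloss(raw_gloss: str) -> str:
    """Map a raw annotator gloss to its ID-gloss, or flag it for review."""
    return ID_GLOSS_LEXICON.get(raw_gloss.lower(), f"UNLEMMATISED({raw_gloss})")

# Example: two annotators' glosses resolve to the same lemma.
assert id_gloss("home") == id_gloss("House") == "HOUSE"
```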
4.2. Perspectives
On the technical level, we can expect much faster progress in the field of image analysis and automatic recognition, and in the development of tools for video inlay and segmentation (Braffort and Dalle 2008; Collet, Gonzalez, and Milachon 2010). However, two main
types of advances are necessary if we are to resolve the problems posed by the current
annotation practices of sign language corpora.
First, while the internal consistency of annotations is, naturally, a prerequisite for any
productive and significant automatic processing of corpora, lemmatisation, which is
equally necessary, must take into account non-conventional units and their components,
which form 30–80% of these corpora. Cuxac’s (2000) hypothesis of a morphemic
compositionality, many components of which are shared by both conventional units and
transfer units, seems to open a promising alternative route for the creation of lexical
databases that are more faithful to the structures of sign language (Garcia 2006, 2010).
At the same time, progress is needed in the development and/or improvement of nota-
tion systems that allow an analytical representation of the signifier form in discourse, which remains the only way to permit a rigorous elaboration of the annotations linked to the signal
itself. On this point, we believe that much can be expected from the continuation of ex-
periments on SignWriting noted above (see Bianchini et al. 2011) and from current efforts
to integrate it into annotation software (e.g. Antinoro Pizzuto and Garcia in preparation).
5. References
Antinoro Pizzuto, Elena, Isabella Chiari and Paolo Rossini 2008. The representation issue and its
multifaceted aspects in constructing sign language corpora: Questions, answers, further prob-
lems. In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik,
Stelios Piperidis and Daniel Tapias (eds.), Proceedings of LREC 2008, Sixth International Con-
ference on Language Resources and Evaluation, 150–158. Paris: European Language Resources
Association.
Antinoro Pizzuto, Elena and Brigitte Garcia in preparation. Annotation tools for sign language
(SL): The nodal problem of the graphical representation of forms. In: Terry Janzen and Sher-
man Wilcox (eds.), Cognitive Approaches to Signed Language Research. Berlin: De Gruyter
Mouton.
Antinoro Pizzuto, Elena, Paolo Rossini, Marie-Anne Sallandre and Erin Wilkinson 2008. Deixis,
anaphora and highly iconic structures: Cross-linguistic evidence on American (ASL), French
(LSF) and Italian (LIS) Signed Languages. In: Ronice Müller de Quadros (ed.), Proceedings
of TISLR9, Theoretical Issues in Sign Language Research Conference, 475–495. Petrópolis,
Rio de Janeiro Brazil: Editora Arara Azul.
Baker-Shenk, Charlotte 1983. Nonmanual behaviors in sign languages: Methodological concerns
and recent findings. In: William Stokoe and Virginia Volterra (eds.), Sign Language Research,
175–184. Burtonsville Maryland: Linstock Press.
Battison, Robbin 1973. Phonology in American Sign Language: 3-D and digitvision. Paper pre-
sented at the California Linguistic Association Conference, Stanford, CA.
Bébian, Auguste 1825. Mimographie, ou Essai d’Écriture Mimique, Propre à Régulariser le Lan-
gage des Sourds-Muets. Paris: Louis Colas.
Bentele, Susan 1999. HamNoSys. Sample of sentences from Goldilocks. http://www.signwriting.
org/forums/linguistics/ling007.html.
Bergman, Brita, Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.) 2001. Sign Tran-
scription and Database Storage of Sign Information, special issue of Sign Language and Lin-
guistics 4(1/2). Amsterdam: John Benjamins.
Bianchini, Claudia S., Gabriele Gianfreda, Alessio di Renzo, Tommaso Lucioli, Giulia Petitta,
Barbara Pennacchi, Luca Lamano and Paolo Rossini 2011. Ecrire une langue sans forme écrite:
Réflexions sur l’écriture et la transcription de la Langue des Signes Italienne (LIS). In: Gilles
Col and Sylvester N. Osu (eds.), Transcrire, écrire, formaliser, 1. Travaux Linguistiques Cer-
LiCO, 71–89. Rennes, France: Presses Universitaires de Rennes.
Bonnal, Françoise 2005. Sémiogenèse de la langue des signes française: étude critique des signes
attestés sur support papier depuis le XVIIIe siècle et nouvelles perspectives de dictionnaires.
Ph.D. dissertation, University of Toulouse Le Mirail, France.
Boutora, Leila 2005. Bibliographie sur les formes graphiques des langues des signes. Projet
RIAM-ANR LS Script, 25 pages. http://lpl-aix.fr/~fulltext/4786.pdf.
Boyes-Braem, Penny 2001. Sign language text transcription and analyses using ‘Microsoft Excel.’
In: Brita Bergman, Penny Boyes-Braem, Thomas Hanke and Elena Antinoro Pizzuto (eds.),
Sign Transcription and Database Storage of Sign Information, special issue of Sign Language
and Linguistics 4(1/2): 241–250. Amsterdam: John Benjamins.
Braffort, Annelies and Patrice Dalle 2008. Sign language applications: Preliminary modelling.
Universal Access in the Information Society, special issue 6(4): 393–404. Berlin: Springer.
Brennan, Mary 1990. Productive morphology in British Sign Language. In: Siegmund Prillwitz and
Tomas Vollhaber (eds.), Proceedings of the International Congress on Sign Language Research
and Application, Hamburg’90, 205–228. Hamburg: Signum.
Brugman, Hennie and Albert Russel 2004. Annotating multimedia/multi-modal resources with
ELAN. In: Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa and Ra-
quel Silva, with the collaboration of Carla Pereira, Filipa Carvalho, Milene Lopes, Mónica Cat-
arino and Sérgio Barros (eds.), Proceedings of LREC 2004, Fourth International Conference on
Language Resources and Evaluation, 2065–2068. Paris: European Language Resources
Association.
Chen Pichler, Deborah, Julie A. Hochgesang, Diane Lillo-Martin and Ronice Müller de Quadros
2010. Conventions for sign and speech transcription of child bimodal bilingual corpora in
ELAN. Langage Interaction Acquisition 1(1): 11–40. Amsterdam/Philadelphia: John Benjamins.
Collet, Christophe, Matilde Gonzalez and Fabien Milachon 2010. Distributed system architecture
for assisted annotation of video corpora. In: Nicoletta Calzolari, Khalid Choukri, Bente Mae-
gaard, Joseph Mariani, Jan Odjik, Stelios Piperidis and Daniel Tapias (eds.), Proceedings of
LREC 2008, Sixth International Conference on Language Resources and Evaluation, 49–52.
Paris: European Language Resources Association.
Crasborn, Onno and Han Sloetjes 2008. Enhanced ELAN functionality for sign language corpora.
In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Pi-
peridis and Daniel Tapias (eds.), Proceedings of LREC 2008, Sixth International Conference on
Language Resources and Evaluation, 39–43. Paris: European Language Resources Association.
Cuxac, Christian 1996. Fonctions et structures de l’iconicité dans les langues des signes. Thèse de
Doctorat d’Etat, Paris 5 University.
Cuxac, Christian 2000. La Langue des Signes Française (LSF). Les Voies de l’Iconicité. Faits de
Langue, Paris: Ophrys.
Di Renzo, Alessio, Luca Lamano, Tommaso Lucioli, Barbara Pennachi and Luca Ponzo 2006. Ital-
ian Sign Language: Can we write it and transcribe it with Sign Writing? In: Proceedings of
LREC 2006, Fifth International Conference on Language Resources and Evaluation, 11–16.
Genova, Italy. http://www.lrec-conf.org/proceedings/lrec2006/
Di Renzo, Alessio, Gabriele Gianfreda, Luca Lamano, Tommaso Lucioli, Barbara Pennachi, Paolo
Rossini, Claudia Bianchini, Giulia Petitta and Elena Antinoro Pizzuto 2009. Representation –
Analysis – Representation: novel approaches to the study of face-to-face and written narratives
in Italian Sign Language (LIS). Oral presentation at the Colloque International sur les Langues des Signes (CILS), Namur, 16–20 November 2009, Palais des Congrès de Namur, Belgium.
Emmorey, Karen (ed.) 2003. Perspectives on Classifier Constructions in Sign Languages. Mahwah,
NJ: Lawrence Erlbaum.
Fischer, Renate 1995. The notation of sign languages: Bébian’s Mimographie. In: Heleen F. Bos
and Gertrude M. Schermer (eds.), Sign Language Research 1994, Proceedings of the Fourth
European Congress on Sign Language Research, 285–302. (International Studies on Sign Lan-
guage and Communication of the Deaf 29.) Hamburg: Signum.
Fusellier-Souza, Ivani 2004. Sémiogenèse des Langues des Signes. Étude de langues des signes pri-
maires (LSP) pratiquées par des sourds brésiliens. Ph.D. dissertation, Paris 8 University.
Garcia, Brigitte 2000. Contribution à l’histoire des débuts de la recherche linguistique sur la Langue
des Signes Française (LSF); les travaux de Paul Jouison. Ph.D. dissertation, Paris 5 University.
Garcia, Brigitte 2006. The methodological, linguistic and semiological bases for the elaboration of
a written form of LSF (French Sign Language). In: Proceedings of LREC 2006, Fifth Interna-
tional Conference on Language Resources and Evaluation, 31–36. Genova, Italy. http://www.
lrec-conf.org/proceedings/lrec2006/
Garcia, Brigitte 2010. Sourds, surdité, langue(s) des signes et épistémologie des sciences du lan-
gage. Problématiques de la scripturisation et modélisation des bas niveaux en Langue des
Signes Française (LSF). Habilitation à Diriger des Recherches thesis, Paris 8 University.
Garcia, Brigitte, Marie-Anne Sallandre, Camille Schoder and Marie-Thérèse L’Huillier 2011. Ty-
pologie des pointages en Langue des Signes Française (LSF) et problématiques de leur anno-
tation. In: Proceedings of TALN Conference 2011, 107–119. Montpellier, France. http://
degels2011.limsi.fr/actes/themes.html
Hanke, Thomas and Siegmund Prillwitz 1995. SyncWRITER. Integrating video into the transcrip-
tion and analysis of sign language. In: Trude Schermer and Heleen Bos (eds.), Proceedings of
the Fourth European Congress on Sign Language Research, 303–312. (International Studies on
Sign Language and Communication of the Deaf 29.) Hamburg: Signum.
Hanke, Thomas and Jakob Storz 2008. Ilex – a database tool for integrating sign language corpus
linguistics and sign language lexicography. In: Nicoletta Calzolari, Khalid Choukri, Bente Mae-
gaard, Joseph Mariani, Jan Odjik, Stelios Piperidis and Daniel Tapias (eds.), Proceedings of
LREC 2008, Sixth International Conference on Language Resources and Evaluation, 64–67.
Paris: European Language Resources Association.
Hoiting, Nini and Dan Slobin 2002. Transcription as a tool for understanding: The Berkeley Tran-
scription System for sign language research (BTS). In: Gary Morgan and Bencie Woll (eds.),
Directions in Sign Language Acquisition, 55–75. Amsterdam: John Benjamins.
Johnston, Trevor 1991. Transcription and glossing of sign language texts: Examples from Auslan
(Australian Sign Language). International Journal of Sign Linguistics 2(1): 3–28.
Johnston, Trevor 2001. The lexical database of Auslan (Australian Sign Language). In: Ronnie Wilbur
(ed.), Sign Language and Linguistics 4(1/2): 145–169. Amsterdam/Philadelphia: John Benjamins.
Johnston, Trevor 2008. Corpus linguistics and signed languages: No lemmata, no corpus. In: Nico-
letta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odjik, Stelios Piperidis
and Daniel Tapias (eds.), Proceedings of LREC 2008, Sixth International Conference on Lan-
guage Resources and Evaluation, 82–87. Paris: European Language Resources Association.
Jouison, Paul 1990. Analysis and linear transcription of sign language discourse. In: Siegmund
Prillwitz and Tomas Vollhaber (eds.), Proceedings of the International Congress on Sign Lan-
guage Research and Application, Hamburg’90, 337–354. Hamburg: Signum.
Jouison, Paul 1995. Ecrits sur la Langue des Signes Française (LSF). Edition critique établie par
Brigitte Garcia. Paris: L’Harmattan.
Kipp, Michael 2001. Anvil - A Generic Annotation Tool for Multimodal Dialogue. In: Proceedings
of Eurospeech 2001, 1367–1370. Aalborg. http://www.lrec-conf.org/proceedings/lrec2002/pdf/289.
pdf
Kipp, Michael 2012. Multimedia annotation, querying and analysis in ANVIL. In: Mark T. May-
bury (ed.), Multimedia Information Extraction, Chapter 19. IEEE Computer Society Press.
Klima, Edward S. and Ursula Bellugi 1979. The Signs of Language. Cambridge, MA: Harvard
University Press.
Konrad, Reiner 2011. Die Lexikalische Struktur der Deutschen Gebärdensprache im Spiegel Em-
pirischer Fachgebärdenlexikographie. Zur Integration der Ikonizität in ein Korpusbasiertes Lex-
ikonmodell. Tübingen: Narr.
Liddell, Scott K. 2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge:
Cambridge University Press.
MacWhinney, Brian 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawr-
ence Erlbaum.
Martin, Joe 2000. A linguistic comparison. The notation systems for signed languages: Stokoe
Notation and Sutton SignWriting. http://www.signwriting.org/forums/linguistics/ling016.html.
Miller, Christopher 2001. Some reflections on the need for a common sign notation. In: Brita
Bergman, Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.), Sign Transcription
and Database Storage of Sign Information, special issue of Sign Language and Linguistics 4
(1/2): 11–28. Amsterdam: John Benjamins. First published in [1994].
Montaigne, Michel de 2009. Les Essais. Paris: Gallimard.
Neidle, Carol 2001. SignStream™: A database tool for research on visual-gestural language. In:
Brita Bergman, Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.), Sign Transcrip-
tion and Database Storage of Sign Information, special issue of Sign Language and Linguistics 4
(1/2): 203–214. Amsterdam: John Benjamins.
Piroux, Joseph 1830. Le Vocabulaire des Sourds-Muets (Partie Iconographique). Nancy, France:
Grimblot.
Pizzuto, Elena and Paola Pietrandrea 2001. The notation of signed texts: Open questions and in-
dications for further research. In: Brita Bergman, Penny Boyes-Braem, Thomas Hanke and
Elena Pizzuto (eds.), Sign Transcription and Database Storage of Sign Information, special
issue of Sign Language and Linguistics 4(1/2): 29–45. Amsterdam: John Benjamins.
Pizzuto, Elena, Paolo Rossini and Tomasso Russo 2006. Representing signed languages in written
form: Questions that need to be posed. In: Proceedings of LREC 2006, Fifth International Con-
ference on Language Resources and Evaluation, 1–6. Genova, Italy. http://www.lrec-conf.org/
proceedings/lrec2006/
Prillwitz, Siegmund, Regina Leven, Heiko Zienert, Thomas Hanke and Jan Henning 1989. Ham-
burg Notation System for Sign Languages. An Introductory Guide, HamNoSys Version 2.0.
Hamburg: Signum Press.
Saint-Augustin 2002. Le Maître. Traduction, présentation et notes de Bernard Jolibert, 2ème édi-
tion revue et corrigée. Paris: Klincksieck.
Sallandre, Marie-Anne 2003. Les unités du discours en Langue des Signes Française. Tentative de
catégorisation dans le cadre d’une grammaire de l’iconicité. Ph.D. dissertation, Paris 8 University.
Slobin, Dan I., Nini Hoiting, Michelle Anthony, Yael Biederman, Marlon Kuntze, Reyna Lindert,
Jennie Pyers, Helen Thumann and Amy Weinberg 2001. Sign language transcription at the
level of meaning components: The Berkeley Transcription System (BTS). In: Brita Bergman,
Penny Boyes-Braem, Thomas Hanke and Elena Pizzuto (eds.), Sign Transcription and Data-
base Storage of Sign Information, special issue of Sign Language and Linguistics 4(1/2): 63–
104. Amsterdam: John Benjamins.
Stokoe, William C. 1960. Sign language structure. Studies in Linguistics – Occasional Papers 8.
Buffalo, NY: Department of Anthropology and Linguistics, University of Buffalo. (Revised
edition Silver Spring, MD: Linstock Press [1978]).
Stokoe, William C. 1987. Sign writing systems. In: John V. van Cleve (ed.), Gallaudet Encyclopedia
of Deaf People and Deafness, Volume 3, 118–120. New York: McGraw-Hill.
Stokoe, William C., Dorothy Casterline and Carl Croneberg 1976. A Dictionary of American Sign
Language on Linguistic Principles. Washington, DC: Gallaudet College. First published [1965].
Sutton, Valerie 1999. Lessons in SignWriting. Textbook and Workbook. La Jolla, CA: Deaf Action
Committee for Sign Writing. First published [1995].