The Neural Architecture of Grammar

About this ebook

A comprehensive, neurally based theory of language function that draws on principles of neuroanatomy, cognitive psychology, cognitive neuropsychology, psycholinguistics, and parallel distributed processing.

Linguists have mapped the topography of language behavior in many languages in intricate detail. To understand how the brain supports language function, however, we must take into account the principles and regularities of neural function. Mechanisms of neurolinguistic function cannot be inferred solely from observations of normal and impaired language. In The Neural Architecture of Grammar, Stephen Nadeau develops a neurologically plausible theory of grammatic function. He brings together principles of neuroanatomy, neurophysiology, and parallel distributed processing and draws on literature on language function from cognitive psychology, cognitive neuropsychology, psycholinguistics, and functional imaging to develop a comprehensive neurally based theory of language function.

Nadeau reviews the aphasia literature, including cross-linguistic aphasia research, to test the model's ability to account for the findings of these empirical studies. Nadeau finds that the model readily accounts for a crucial finding in cross-linguistic studies—that the most powerful determinant of patterns of language breakdown in aphasia is the predisorder language spoken by the subject—and that it does so by conceptualizing grammatic function in terms of the statistical regularities of particular languages that are encoded in network connectivity. He shows that the model provides a surprisingly good account for many findings and offers solutions for a number of controversial problems. Moreover, aphasia studies provide the basis for elaborating the model in interesting and important ways.

Language: English
Publisher: The MIT Press
Release date: Feb 3, 2012
ISBN: 9780262300865

    Book preview

    The Neural Architecture of Grammar

    Stephen E. Nadeau

    The MIT Press

    Cambridge, Massachusetts

    London, England

    © 2012 Massachusetts Institute of Technology

    All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

    For information about special quantity discounts, please email [email protected] or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.

    Library of Congress Cataloging-in-Publication Data

    Nadeau, Stephen E.

    The neural architecture of grammar / Stephen E. Nadeau.

       p.  cm.

    Includes bibliographical references and index.

    ISBN 978-0-262-01702-2 (hardcover : alk. paper)

    ISBN 978-0-262-30086-5 (retail e-book)

    1. Neurolinguistics. 2. Language and languages—Grammars. I. Title.

    QP399.N33 2012

    612.8′2336—dc23

    2011028979

    10 9 8 7 6 5 4 3 2 1

    To Sue, Nicole, Hillary, and Leslie

    Contents

    Preface

    Acknowledgments

    1    Introduction

    2    A Parallel Distributed Processing Model of Language: Phonologic, Semantic, and Semantic–Phonologic (Lexical) Processing

    Concept Representations

    The Acoustic-Articulatory Motor Pattern Associator Network

    Lexicons

    The Representation of Knowledge in Auto-Associator and Pattern Associator Networks: Attractor Basins, Attractor Trenches, and Quasi-Regular Domains

    Semantic–Phonologic (Lexical) and Phonologic Impairment in Aphasias

    3    Grammar: The Model

    Semantic Contributions to Syntax

    Sequence: The Basis of Syntax

    Grammar: A Synthesis

    4    Disorders of Grammar in Aphasia

    Grammaticality Judgment and the Issue of Loss of Knowledge versus Loss of Access to Knowledge

    Syntax

    Deficits in Verb Production in Broca’s Aphasia—Potential Mechanisms

    Distributed Representations of Verb Semantics

    Syntax: Phrase Structure Rules

    Grammatic Morphology—Cross-Linguistic Aphasia Studies

    Grammatic Morphology—Special Cases

    The Competition Model and a Return to Syntax

    Disorders of Comprehension

    5    Conclusion

    Future Research Directions

    Glossary

    Notes

    References

    Index

    Preface

    Linguists have mapped the topography of language behavior in many languages in intricate detail. However, to understand how the brain supports language function, it is necessary to bring the principles and regularities of neural function to the table. Mechanisms of neurolinguistic function cannot be inferred solely from observations of normal and impaired language. Our understanding of principles and regularities of neural function relevant to language stems predominantly from two sources: (1) knowledge of neuroanatomy and the neural systems that are based upon this anatomy and (2) knowledge of the patterns of behavior of neural networks composed of large numbers of highly interconnected units and supporting population-based (distributed) representations.

    The core neuroanatomy underlying language was largely subsumed in the Wernicke–Lichtheim model (Lichtheim 1885), and our knowledge of it has scarcely advanced beyond the principles laid out in Norman Geschwind’s famous paper “Disconnexion Syndromes in Animals and Man” (Geschwind 1965). Functional imaging studies from the outset suggested a remarkable degree of bilateral engagement of the brain in language function, particularly in the perisylvian regions, but functional imaging is generally ill equipped to distinguish regions that are essential from those that are incidental. As it turns out, studies of aphasia, looked at from the right perspective, and with particular attention to the effect of lesions on both white and gray matter, provide us with fairly powerful evidence of the extent to which various components of language function are bilaterally represented, and even the extent to which this might vary from language to language. This aspect of aphasia studies, largely overlooked to date, will constitute one of the recurring themes of this book.

    The evolution of our understanding of neural systems over the past 20 years has revealed the importance of systems dedicated to keeping order in the house—that is, maintaining coherent patterns of activity in the brain’s 100 billion neurons such that all is not cacophony and adaptive behavior more or less consistently emerges. Key are systems underlying selective engagement of particular neural networks in particular ways (Nadeau and Crosson 1997), which support what Patricia Goldman-Rakic (1990) termed working memory. Working memory and its hippocampal complement, episodic memory (Shrager et al. 2008), play very substantial roles in language function, roles that we are beginning to understand and that will constitute another recurring theme of this book. Much less well understood are the mechanisms by which linguistic behavior is driven by systems underlying goal representations.

    Our understanding of the population dynamics of neural network systems owes almost entirely to the study of parallel distributed processing (PDP), which was thrust so dramatically onto the world’s scientific stage by David Rumelhart and Jay McClelland and their collaborators in their seminal two-volume text, published 25 years ago (McClelland, Rumelhart, and PDP Research Group 1986). The science of PDP has continued to evolve at a stunning pace ever since, enhancing our understanding of constraints governing neural network systems, demonstrating how powerful these systems can be, and demonstrating new ways in which neural networks might support particular functions. Before PDP was established as a field of research, behavioral neuroscientists could only hope that one day, the complex behaviors they were systematically studying could ultimately be related in a precise way to neural structure and function. In a flash, PDP research showed how this could be done, and, repeatedly, PDP simulations have shown the power of neural networks to account in a highly detailed way for behavior in normal and damaged brains.

    This book begins with the development of a comprehensive, neurally based, theoretical model of grammatic function, drawing on principles of neuroanatomy and neurophysiology and the PDP literature, together with cognitive psychological studies of normal language, functional imaging studies, and cognitive neuropsychological and psycholinguistic studies of subjects with language disorders due to stroke, Alzheimer’s disease, Parkinson’s disease, and frontotemporal lobar degeneration. The remarkably detailed understanding of semantic function that has emerged substantially from the 20-year effort of the Addenbrooke’s Hospital group (Hodges, Patterson, Lambon Ralph, Rogers, and their collaborators, most notably Jay McClelland) has proven to be particularly important. From this, we have a cogent theory of semantic instantiation and breakdown. Much of grammar (that not dependent on networks instantiating sequence knowledge) turns out to be semantics. Research on the neural foundation of semantics, coupled with the studies of many other investigators, has led to a conceptualization of verbs as the product of multicomponent frontal and postcentral distributed representations that engage and modify noun representations, even as they are engaged by noun representations. Equally important has been PDP work on the capacity for instantiation of sequence knowledge by certain neural networks, coupled with work by psycholinguists, most notably Thompson and her colleagues at Northwestern University, on sentence-level sequence breakdown in aphasia and patterns of reacquisition of this knowledge during speech–language therapy. Concept representations interfaced with sequence knowledge provide the intrinsic basis for the temporal dynamic of language.

    The second half of this book reviews the aphasia literature, most importantly including cross-linguistic studies, testing the model’s ability to account for the findings of empirical studies not considered in its development. The model fully accommodates the competition model of Bates and MacWhinney (1989), which provided an enormous advance in our understanding of the neural basis of grammatic function, and it extends the competition model. It accounts for perhaps the single most trenchant finding of cross-linguistic aphasia studies—that the most powerful determinant of patterns of language breakdown in aphasia is the premorbid language spoken by the subject (“You can’t take the Turkish out of the Turk”). It does so by accounting for grammatic knowledge in terms of the statistical regularities of particular languages that are encoded in network connectivity. Only the most redundantly encoded regularities, or those that have significant bihemispheric representation, survive focal brain damage. The model provides a surprisingly good account for a large number of findings and unprecedented resolution of a number of controversial problems, including whether grammatic dysfunction reflects loss of knowledge or loss of access to knowledge; relative sparing of grammaticality judgment; the problem of verb past tense formation, including some of the wrinkles that have appeared in the course of cross-linguistic studies; cross-linguistic differences in patterns of syntactic breakdown; and the impact of inflectional richness on the resilience of phrase structure rules and grammatic morphology in the face of brain lesions. To the extent that the proposed model did not fully account for observed patterns of language breakdown, aphasia studies have provided the basis for elaborating the model in ways that are interesting and important, even as these elaborations are entirely consistent with the general principles of the original model and in no way involve ad hoc extensions.

    Ultimately, however, the model represents but a new beginning. I have done my best, particularly in the concluding chapter, to delineate possible directions for further research.

    Acknowledgments

    My greatest debt is to my wife and daughters, who not only put up with me and tolerated the vast time and effort invested in this work but provided their usual loving support throughout the endeavor. This book would not have been written had not Kenneth Heilman inspired me to pursue a career in neurology and behavioral neurology and taught me the very powerful way of thinking about the brain that he developed so brilliantly and that so honored his mentor, Norman Geschwind. The substance of this book owes to the phenomenal work of the many scientists of language who, over the past 150 years, have brought our understanding of language and the brain from rudimentary beginnings to a point where the multitude of threads from various lines of investigation are beginning to come together in a quite marvelous way and give us the first glimmerings of a comprehensive understanding. I am very grateful to two anonymous reviewers, who made enormously helpful suggestions. I am indebted to John Richardson and George Kelvin for the illustrations. Last but not least, I am indebted to my editor, Ada Brunstein, whose enthusiasm and care for this book were so instrumental in making it a reality.

    1

    Introduction

    The goal of this book is to develop a neurologically plausible theory of grammatic function. Given all that we now know about neuroanatomy and principles of neural network function, it is no longer possible to propose such a theory without taking this knowledge into account. Linguists have been remarkably successful in mapping the regularities of grammatic performance in exquisite detail, and any neurological theory of language must be able to offer a logical and detailed explanation for these regularities, as well as for the errors of grammatic function that occur with slips of the tongue by normal subjects and in aphasia in subjects with brain injury.

    Linguists have also developed a sophisticated mathematics to account for grammatic phenomenology, normal and abnormal, in an orderly way. Unfortunately, they have not been successful in reconciling this mathematics in all its various derivations with principles of neurobiology. Linguistic theories of grammar (e.g., government and binding theory) are discrete, hierarchical, deterministic, explicitly rule defining, and essentially orthogonal to semantics. The brain supports distributed functions; these functions are hierarchical only in the Hughlings Jackson sense (higher systems supporting more complex functions inhibit lower systems, which may become released with damage to higher systems); brain order is chaotic rather than deterministic; rules are not defined but instead emerge from network behavior, constrained by network topography and within- and between-network connectivity, which is acquired through experience and reflects statistical regularities of experience; and grammar results from a process involving parallel constraint satisfaction in which semantic knowledge, semantic–phonologic/morphologic (lexical) knowledge, sequence knowledge, skills in concept manipulation, and recall of what has recently been said define the final product. However powerful and heuristically useful linguistic theories are, their failure to accommodate these cardinal principles of brain function constitutes a serious shortcoming.

    The chapter and verse of this book consists of the enormous body of work, principally from the fields of psycholinguistics and cognitive neuropsychology, that enables us to infer how the normal brain produces language from the aberrations in language function that follow brain damage. However, the underlying themes of the book derive from knowledge of neural systems, the well-established topography of cerebral association cortices supporting language, patterns of white matter connectivity between these cortices, the geographical and neurological impact of the vascular lesions and degenerative diseases that are the most common causes of aphasia, and the neural representation of knowledge. Because of the very serious methodological problems that have plagued functional imaging, it has made but a modest contribution to our understanding of the neural architecture of grammar (see, e.g., reviews by Crepaldi et al. 2011 and Démonet, Thierry, and Cardebat 2005), and references to works in this field will be limited.¹

    It has become clear that the fundamental unit of cortical function is the neural network (Buonomano and Merzenich 1998), and a paradigmatic leap achieved in the 1980s, which yielded the field of parallel distributed processing (PDP; McClelland, Rumelhart, and PDP Research Group 1986), now enables us to peer deeply into the computational principles of neural network function and directly link complex behaviors to neural network function. Thus, PDP constitutes another major pillar of this book.

    PDP models can incorporate a large variety of model-specific assumptions, including ones that are not neurologically plausible. However, in this book, I will argue from the simplest, most limited set of assumptions possible, all of which receive substantial if not overwhelming support from neurobiological research. These are as follows:

    • The fundamental unit of cerebral function is the neural network (Buonomano and Merzenich 1998).

    • A neural network is composed of a large number of relatively simple units, each of which is heavily connected with many if not all of the other units in the network, hence the term “connectionist model.”

    • The knowledge in a network (long-term memory) consists of the pattern of connection strengths between units, corresponding to synaptic strengths between neurons (an idea that likely originated with Hebb 1949).

    • The principal operational currency of a network is the activity of individual units (which corresponds to states of depolarization or firing rates of neurons). Activity in some units will naturally spread to other units, constrained by the pattern of connection strengths.

    • Activation levels of units and the flow of activation between units involve nonlinear functions.

    • A cognitive representation consists of a specific pattern of activity of the units in a network (which defines working memory). Because this pattern involves a substantial portion of the units, the representation is referred to as distributed or population encoded (Rolls and Treves 1998).

    • Networks are linked to each other by very large numbers of connections. I will presume that the pattern of the network linkages underlying language function reflects the known topography of cortical functions, as substantially defined by information-processing models, and that there is no intrinsic incompatibility between information-processing models and the PDP conceptualizations to be discussed.

    • Learning consists of altering connection strengths.

    • Links between networks enable transformation of a representation in one modality in one network into a different representation, corresponding to another modality, in the connected network.

    These simple properties enable a complex array of functions even as they strongly constrain the nature of processing that must occur. Much of this book will consist of elaboration of this statement in the domain of grammatic function. The surprisingly great functional capacity of networks defined by these principles has been repeatedly demonstrated in the ability of PDP simulations to account in precise quantitative fashion for a vast range of empirical phenomena in normal and brain-injured subjects. More generally, connectionist concepts are now deeply embedded in and receive enormous support from mainstream neuroscientific research (e.g., Rolls and Deco 2002; Rolls and Treves 1998).
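    To make these properties concrete, the following minimal sketch (mine, not the author’s; the unit count, learning rule, and all names are illustrative assumptions) shows a network in which long-term knowledge lives in a matrix of connection strengths, a representation is a distributed pattern of unit activity, activation spreads through nonlinear units, and learning alters connection strengths:

```python
# Minimal sketch of the listed assumptions (illustrative, not from the book).
import numpy as np

rng = np.random.default_rng(0)

class Network:
    def __init__(self, n_units):
        # Long-term memory: the pattern of connection strengths between
        # units (corresponding to synaptic strengths between neurons).
        self.W = rng.normal(scale=0.1, size=(n_units, n_units))
        np.fill_diagonal(self.W, 0.0)   # no self-connections
        # Working memory: the current activity of the units; the whole
        # activity vector is the distributed (population) representation.
        self.a = np.zeros(n_units)

    @staticmethod
    def sigmoid(x):
        # Nonlinear function governing unit activation levels.
        return 1.0 / (1.0 + np.exp(-x))

    def step(self):
        # Activity spreads between units, constrained by connection strengths.
        self.a = self.sigmoid(self.W @ self.a)

    def learn(self, rate=0.01):
        # Learning consists of altering connection strengths; here a
        # simple Hebbian rule (co-active units strengthen their link).
        self.W += rate * np.outer(self.a, self.a)
        np.fill_diagonal(self.W, 0.0)

net = Network(n_units=50)
net.a[:5] = 1.0            # impose a partial input pattern
for _ in range(10):
    net.step()             # let activation spread and the state evolve
net.learn()
```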

    Yosef Grodzinsky has been at once one of the scientists most knowledgeable about the aphasia literature and one of the stoutest defenders of a discrete, localized, domain-specific organ of grammar, presumably located in Broca’s area. In his 2000 paper, he stated, “It is important to note that a theory is best ‘refuted’ not by data … but rather by an alternative proposal” (Grodzinsky 2000, p. 56). In this book I seek to provide such an alternative proposal: a theoretical model of language function based in neuroanatomy and connectionist principles that also accommodates the accumulated empirical findings of psycholinguistic and cognitive neuropsychological studies. Before proceeding to the details, it is worth asking how the reader should judge the model presented here. MacWhinney and Bates (Bates and MacWhinney 1989, p. 36) have said it just right:

    [A model] itself cannot be disconfirmed by any single experiment.… [It] must instead be evaluated in terms of (1) its overall coherence, (2) its heuristic value in inspiring further research, and (3) its performance compared with competing accounts. Components of a model are tested one at a time, and if predictions fail, modifications are made and new concepts are introduced. Ideally, these new concepts should have an independent motivation (i.e., they should not be added merely to save the model), and they should lead to new discoveries in their own right. But the framework as a whole will be rejected only if (1) it loses coherence, weighted down by circularities and ad hoc assumptions, (2) it loses its heuristic value, and/or (3) a better account of the same phenomena comes along.

    One last apology: many of the conclusions in this book will be stated with a degree of confidence that is not warranted by the limited data available. To some extent this will be inadvertent. To a large extent, however, it reflects a deliberate strategy to minimize the muddiness that would be introduced by repeated qualifications and caveats. In any event, the conclusions should always be viewed as testable hypotheses rather than accepted facts. No systematic attempt has been made to contrast theories presented in this book with existing linguistic or cognitive neuropsychological theories for the simple reason that none of these theories has been comprehensively related to neural structure.

    I begin the discussion of the model with a brief review of phonology from a connectionist perspective. This review will serve to introduce the basic concepts: the organization and operating principles of the phonologic core of language, the neural basis for and the organization of distributed semantic representations, and the neural network basis of sequence knowledge. The intrinsic temporal dynamics of language derive from the interplay between distributed concept representations and different domains of sequence knowledge, hence the importance of a clear understanding of these core principles from the outset. The flow of concept representations corresponding to the flow of thought, of course, provides an external source of temporal dynamic.

    2

    A Parallel Distributed Processing Model of Language: Phonologic, Semantic, and Semantic–Phonologic (Lexical) Processing

    The Wernicke–Lichtheim (W-L) information-processing model of language function has played a dominant role in understanding aphasic syndromes (Lichtheim 1885) and has stood the test of time in defining the topographical relationship between the modular domains (acoustic representations, articulatory motor representations, and concept representations) underlying spoken language function. Unfortunately, the W-L information-processing model does not specify the characteristics of the representations within these domains and how they might be stored in the brain. It also does not address the means by which these domains might interact. I have proposed a PDP model that uses the same general topography as the W-L model (Nadeau 2001; Roth et al. 2006) but also specifies how representations are generated in the modular domains and how knowledge is represented in the links between these domains (see figure 2.1). Though not tested in simulations, this model is neurally plausible and provides a cogent explanation for a broad range of psycholinguistic phenomena in normal subjects and subjects with aphasia.

    Figure 2.1

    Proposed parallel distributed processing model of language. (Roth, H. L., S. E. Nadeau, A. L. Hollingsworth, A. M. Cimino-Knight, and K. M. Heilman. 2006. Naming Concepts: Evidence of Two Routes. Neurocase 12:61–70.) Connectivity within the substrate for concept representations defines semantic knowledge. Connectivity within the acoustic–articulatory motor pattern associator defines phonologic sequence knowledge. Connectivity between the substrate for concept representations and the acoustic–articulatory motor pattern associator defines lexical knowledge (see text for details).

    The PDP modification of the W-L model posits that the acoustic domain (akin to Wernicke’s area) contains large numbers of units located in auditory association cortices that represent acoustic features of phonemes.¹ The articulatory domain (analogous to Broca’s area) contains units located predominantly in dominant frontal operculum that represent discrete articulatory features of speech, as opposed to continuously variable motor programs (e.g., phonemic distinctive features). The semantic or conceptual domain contains an array of units distributed throughout unimodal and polymodal association cortices that represent semantic features of concepts. For example, the representation of the concept of house might correspond to activation of units representing features of houses such as visual attributes, construction materials, contents (physical and human), and so on (each feature in turn a distributed representation over more primitive features). Each unit within a given domain is connected to many, if not most, of the other units in that same domain (symbolized by the small looping arrow appended to each domain in figure 2.1). Knowledge within each domain is represented as connection strengths between the units. Thus, semantic knowledge is represented as the pattern of connection strengths throughout the association cortices supporting this knowledge. Within any domain, a representation corresponds to a specific pattern of activity of all the units, hence the term distributed representation (a synonym for a population encoded representation). Each unit within each of these domains is connected via interposed hidden units to many, if not most, of the units in the other domains. During learning of a language, the strengths of the connections between the units are gradually adjusted so that a pattern of activity involving the units in one domain elicits the correct pattern of activity in the units of another domain. The entire set of connections between any two domains forms a pattern associator network. Hidden units are units whose activity cannot be directly interpreted in behavioral terms. The hidden-unit regions, in conjunction with nonlinear unit properties, enable the systematic association of representations in two connected domains that may be arbitrarily related to one another (e.g., word sound and word meaning). The model employs left–right position in acoustic and articulatory motor representations as a surrogate for temporal order in precisely the same way as the reading model of Plaut et al. (1996). Thus, acoustic and articulatory motor representations would feature positions for each output phoneme or distinctive feature, ordered as they are in the phonologic word form. The use of left to right sequential order in lieu of temporal order is a device of convenience, but there is evidence of this temporal–geographic transform in the brain (Cheung et al. 2001). During any type of language processing, initiated by input to any domain of the network, there will be almost instantaneous engagement of all domains of the network. Thus, linguistic behavior is best viewed as the emergent product of the entire network.
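    As a rough sketch of the pattern associator architecture just described (the author notes the model has not been tested in simulations, so this is a generic illustration under assumed dimensions, not his implementation), two domains can be linked through a layer of hidden units and trained by error-driven weight adjustment so that a pattern in one domain elicits the corresponding pattern in the other:

```python
# Illustrative pattern associator linking an acoustic domain to an
# articulatory motor domain via hidden units (all sizes are assumptions).
import numpy as np

rng = np.random.default_rng(1)
N_ACOUSTIC, N_HIDDEN, N_ARTIC = 40, 30, 40

W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_ACOUSTIC))
W2 = rng.normal(scale=0.1, size=(N_ARTIC, N_HIDDEN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(acoustic):
    # Hidden units (no direct behavioral interpretation), together with
    # the nonlinearity, allow arbitrarily related patterns (e.g., word
    # sound and word meaning) to be systematically associated.
    hidden = sigmoid(W1 @ acoustic)
    return sigmoid(W2 @ hidden), hidden

def train_step(acoustic, target, lr=0.5):
    # Gradually adjust connection strengths so that a pattern of activity
    # in one domain elicits the correct pattern in the other (one
    # gradient step of error-driven learning).
    global W1, W2
    artic, hidden = forward(acoustic)
    err = (artic - target) * artic * (1.0 - artic)   # output-layer delta
    herr = (W2.T @ err) * hidden * (1.0 - hidden)    # hidden-layer delta
    W2 -= lr * np.outer(err, hidden)
    W1 -= lr * np.outer(herr, acoustic)

# A word form is a left-to-right vector of feature slots: position serves
# as a surrogate for temporal order, as in the text.
acoustic_word = rng.integers(0, 2, N_ACOUSTIC).astype(float)
artic_word = rng.integers(0, 2, N_ARTIC).astype(float)
for _ in range(500):
    train_step(acoustic_word, artic_word)
```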

    I will now focus on particular components of the network in order to provide a more detailed understanding of how they work and the nature of the knowledge they support.

    Concept Representations

    As I have noted, the Wernicke–Lichtheim information-processing model provides no insight into the nature of the representations in the various domains. The nature of concept representations (depicted in figure 2.1) can be best illustrated by a particularly illuminating model developed by David Rumelhart and his colleagues (Rumelhart et al. 1986). This “rooms in a house” model was composed of 40 feature units, each corresponding to an article typically found in particular rooms or an aspect of particular rooms. Each unit was connected with all the other units in the network—an attribute that defines the model as an auto-associator network. Auto-associator networks have the capacity for settling into a particular state that defines a representation. Connection strengths were defined by the likelihood that any two features might appear in conjunction in a typical house. When one or more units were clamped into the “on” state (as if the network had been shown these particular features or articles), activation spread throughout the model and the model eventually settled into a steady state that implicitly defined a particular room in a house. Thus, clamping “oven” ultimately resulted in activation of all the items one would expect to find in a kitchen and thereby implicitly defined, via a distributed or population encoded representation, the concept of a kitchen. No “kitchen” unit per se was turned on. Rather, “kitchen” was defined by the pattern of feature units that were activated. The network contained the knowledge, in the totality of its connections, that enabled this representation to be generated. The 40-unit model actually had the capability of generating distributed representations of a number of different rooms in a house (e.g., bathroom, bedroom, living room, study), subcomponents of rooms (e.g., easy chair and floor lamp, desk and desk chair, window and drapes), and blends of rooms that were not anticipated in the programming of the model (e.g., clamping both bed and sofa led to a distributed representation of a large, fancy bedroom).
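    The settling behavior of such an auto-associator can be conveyed with a toy reconstruction (mine, far simpler than Rumelhart’s 40-unit original; the feature set and weights are invented): every unit connects to every other, weights reflect feature co-occurrence, and clamping one feature lets the network relax into the distributed pattern for a room:

```python
# Toy "rooms in a house" auto-associator (invented features and weights).
import numpy as np

features = ["oven", "sink", "refrigerator", "toaster",
            "bed", "dresser", "sofa", "television"]
n = len(features)

# Co-occurrence-based connection strengths: positive within a room's
# feature cluster, negative across clusters.
W = np.full((n, n), -1.0)
for cluster in [(0, 1, 2, 3), (4, 5), (6, 7)]:  # kitchen, bedroom, living room
    for i in cluster:
        for j in cluster:
            W[i, j] = 1.0
np.fill_diagonal(W, 0.0)

def settle(clamped, steps=20):
    a = np.zeros(n)
    for i in clamped:
        a[i] = 1.0
    for _ in range(steps):
        a = 1.0 / (1.0 + np.exp(-(W @ a)))  # nonlinear activation update
        for i in clamped:
            a[i] = 1.0                      # clamped units stay on
    return a

# Clamping "oven" activates the other kitchen features: "kitchen" is
# implicitly defined by the resulting pattern; no kitchen unit exists.
state = settle(clamped=[features.index("oven")])
print({f: round(v, 2) for f, v in zip(features, state)})
```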
