1 - Teoría Cognitiva Del Aprendizaje Multimedia MAYER Ingles PDF
Cognitive Theory of
Multimedia Learning
Richard E. Mayer
University of California, Santa Barbara
to use words and pictures to improve human learning.

A fundamental hypothesis underlying research on multimedia learning is that multimedia instructional messages that are designed in light of how the human mind works are more likely to lead to meaningful learning than those that are not. For the past 15 years my colleagues and I at the University of California, Santa Barbara have been engaged in a sustained effort to construct an evidence-based theory of multimedia learning that can guide the design of effective multimedia instructional messages (Mayer, 2001, 2002, 2003a; Mayer & Moreno, 2003).

What is a multimedia instructional message? A multimedia instructional message is a communication containing words and pictures intended to foster learning. The communication can be delivered using any medium, including paper (i.e., book-based communications) or computers (i.e., computer-based communications). Words can include printed words (such as you are now reading) or spoken words (such as in a narration); pictures can include static graphics – such as illustrations or photos – or dynamic graphics – such as animation or video clips. This definition is broad enough to include textbook chapters, online lessons containing animation and narration, and interactive simulation games. For example, Figure 3.1 presents frames from a narrated animation on lightning formation, which we have studied in numerous experiments (Mayer, 2001).

Learning can be measured by tests of retention (i.e., remembering the presented information) and transfer (i.e., being able to use the information to solve new problems). Our focus is on transfer because we are mainly interested in how words and pictures can be used to promote understanding. In short, transfer tests can help tell us how well people understand what they have learned. We are particularly interested in the cognitive processes by which people construct meaningful learning outcomes from words and pictures.

What is the role of a theory of learning in multimedia design? Much of the work presented in this handbook is based on the premise that the design of multimedia instructional messages should be compatible with how people learn. In short, the design of multimedia instructional messages should be sensitive to what we know about how people process information. The cognitive theory of multimedia learning represents an attempt to help accomplish this goal by describing how people learn from words and pictures, based on consistent empirical research evidence (e.g., Mayer, 2001, 2002, 2003a; Mayer & Moreno, 2003) and on consensus principles in cognitive science (e.g., Bransford, Brown, & Cocking, 1999; Lambert & McCombs, 1998; Mayer, 2003b).

In building the cognitive theory of multimedia learning my colleagues and I were guided by four criteria: theoretical plausibility – the theory is consistent with cognitive science principles of learning; testability – the theory yields predictions that can be tested in scientific research; empirical plausibility – the theory is consistent with empirical research evidence on multimedia learning; and applicability – the theory is relevant to educational needs for improving the design of multimedia instructional messages. In this chapter, I describe the cognitive theory of multimedia learning, which is intended to meet these criteria. In particular, I summarize three underlying assumptions of the theory derived from cognitive science; describe three memory stores, five cognitive processes, and five forms of representation in the theory; and then provide examples and a conclusion.

Three Assumptions of the Cognitive Theory of Multimedia Learning

Decisions about how to design a multimedia message always reflect an underlying conception of how people learn – even when the underlying theory of learning is not stated. In short, the design of multimedia messages
is influenced by the designer’s conception of how the human mind works. For example, when a multimedia presentation consists of a screen overflowing with multicolored words and images – flashing and moving about – this reflects the designer’s conception of human learning. The designer’s underlying conception is that human learners possess a single-channel, unlimited capacity, and passive-processing system. First, by not taking advantage of auditory modes of presentation, this design is based on a single-channel assumption – all information enters

Dual-Channel Assumption

The dual-channel assumption is that humans possess separate information processing channels for visually represented material and auditorily represented material. The dual-channel assumption is incorporated into the cognitive theory of multimedia learning by proposing that the human information-processing system contains an auditory/verbal channel and a visual/pictorial channel. When information is presented to the eyes (such as illustrations, animations, video, or on-screen text), humans begin by processing that information
in the visual channel; when information is presented to the ears (such as narration or nonverbal sounds), humans begin by processing that information in the auditory channel. The concept of separate information processing channels has a long history in cognitive psychology and currently is most closely associated with Paivio’s dual-coding theory (Clark & Paivio, 1991; Paivio, 1986) and Baddeley’s model of working memory (Baddeley, 1986, 1999).

What is processed in each channel?

There are two ways of conceptualizing the differences between the two channels – one based on presentation modes and the other based on sensory modalities. The presentation-mode approach focuses on whether the presented stimulus is verbal (such as spoken or printed words) or nonverbal (such as pictures, video, animation, or background sounds). According to the presentation-mode approach, one channel processes verbal material and the other channel processes pictorial material and nonverbal sounds. This conceptualization is most consistent with Paivio’s (1986) distinction between verbal and nonverbal systems.

In contrast, the sensory-modality approach focuses on whether learners initially process the presented materials through their eyes (e.g., for pictures, video, animation, or printed words) or ears (e.g., for spoken words or background sounds). According to the sensory-modality approach, one channel processes visually represented material and the other channel processes auditorily represented material. This conceptualization is most consistent with Baddeley’s (1986, 1999) distinction between the visuo-spatial sketchpad and the phonological (or articulatory) loop.

Whereas the presentation-mode approach focuses on the format of the stimulus-as-presented (i.e., verbal or nonverbal), the sensory-modality approach focuses on the stimulus-as-represented in working memory (i.e., auditory or visual). The major difference concerning multimedia learning rests in the processing of printed words (i.e., on-screen text) and background sounds. On-screen text is initially processed in the verbal channel in the presentation-mode approach but in the visual channel in the sensory-modality approach. Background sounds, including nonverbal music, are initially processed in the nonverbal channel in the presentation-mode approach but in the auditory channel in the sensory-modality approach.

For purposes of the cognitive theory of multimedia learning, I have opted for a compromise in which I use the sensory-modality approach to distinguish between visually presented material (e.g., pictures, animations, video, and on-screen text) and auditorily presented material (e.g., narration and background sounds) as well as a presentation-mode approach to distinguish between the construction of pictorially based and verbally based models in working memory. However, additional research is
needed to clarify the nature of the differences between the two channels.

What is the relation between the channels?

Although information enters the human information system through one channel, learners may also be able to convert the representation for processing in the other channel. When learners are able to devote adequate cognitive resources to the task, it is possible for information originally presented to one channel to also be represented in the other channel. For example, on-screen text may initially be processed in the visual channel because it is presented to the eyes, but an experienced reader may be able to mentally convert images into sounds, which are processed through the auditory channel. Similarly, an illustration of an object or event such as a cloud rising above the freezing level may initially be processed in the visual channel, but the learner may also be able to mentally construct the corresponding verbal description in the auditory channel. Conversely, a narration describing some event such as “the cloud rises above the freezing level” may initially be processed in the auditory channel because it is presented to the ears, but the learner may also form a corresponding mental image that is processed in the visual channel. Cross-channel representations of the same stimulus play an important role in Paivio’s (1986) dual-coding theory.

Limited Capacity Assumption

The second assumption is that humans are limited in the amount of information that can be processed in each channel at one time. When an illustration or animation is presented, the learner is able to hold only a few images in working memory at any one time, reflecting portions of the presented material rather than an exact copy of the presented material. For example, if an illustration or animation of a tire pump is presented, the learner may be able to focus on building mental images of the handle going down, the inlet valve opening, and air moving into the cylinder. When a narration is presented, the learner is able to hold only a few words in working memory at any one time, reflecting portions of the presented text rather than a verbatim recording. For example, if the spoken text is “When the handle is pushed down, the piston moves down, the inlet valve opens, the outlet valve closes, and air enters the bottom of cylinder,” the learner may be able to hold the following verbal representations in auditory working memory: “handle goes down,” “inlet valve opens,” and “air enters cylinder.” The conception of limited capacity in consciousness has a long history in psychology, and some modern examples are Baddeley’s (1986, 1999) theory of working memory and Chandler and Sweller’s (1991; Sweller, 1999) cognitive load theory.

What are the limits on cognitive capacity?

If we assume that each channel has limited processing capacity, it is important to know just how much information can be processed in each channel. The classic way to measure someone’s cognitive capacity is to give a memory span test (Miller, 1956; Simon, 1980). For example, in a digit span test, I can read a list of digits at the rate of one digit per second (e.g., 8-7-5-3-9-6-4) and ask you to repeat them back in order. The longest list that you can recite without making an error is your memory span for digits (or digit span). Alternatively, I can show you a series of line drawings of simple objects at the rate of one per second (e.g., moon-pencil-comb-apple-chair-book-pig) and ask you to repeat them back in order. Again, the longest list you can recite without making an error is your memory span for pictures. Although there are individual differences, on average memory span is fairly small – approximately five to seven chunks.

With practice, of course, people can learn techniques for chunking the elements in the list, such as grouping the seven digits 8-7-5-3-9-6-4 into three chunks 875-39-64 (e.g., “eight seven five” pause “three nine” pause “six four”). In this way, the cognitive capacity remains the same (e.g., five to seven chunks) but more elements can be remembered within each chunk. Researchers have
developed more refined measures of verbal and visual working memory capacity, but continue to show that human processing capacity is severely limited (Miyake & Shah, 1999).

How are limited cognitive resources allocated?

The constraints on our processing capacity force us to make decisions about which pieces of incoming information to pay attention to, the degree to which we should build connections among the selected pieces of information, and the degree to which we should build connections between selected pieces of information and our existing knowledge. Metacognitive strategies are techniques for allocating, monitoring, coordinating, and adjusting these limited cognitive resources. These strategies are at the heart of what Baddeley (1986, 1999) calls the central executive – the system that controls the allocation of cognitive resources – and play a central role in modern theories of intelligence (Sternberg, 1990).

Active Processing Assumption

The third assumption is that humans actively engage in cognitive processing in order to construct a coherent mental representation of their experiences. These active cognitive processes include paying attention, organizing incoming information, and integrating incoming information with other knowledge. In short, humans are active processors who seek to make sense of multimedia presentations. This view of humans as active processors conflicts with a common view of humans as passive processors who seek to add as much information as possible to memory, that is, as tape recorders who file copies of their experiences in memory to be retrieved later.

What are the major ways that knowledge can be structured?

Active learning occurs when a learner applies cognitive processes to incoming material – processes that are intended to help the learner make sense of the material. The outcome of active cognitive processing is the construction of a coherent mental representation, so active learning can be viewed as a process of model building. A mental model (or knowledge structure) represents the key parts of the presented material and their relations. For example, in a multimedia presentation of how lightning storms develop, the learner may attempt to build a cause-and-effect system in which a change in one part of the system causes a change in another part. In a lesson comparing and contrasting two theories, construction of a mental model involves building a sort of matrix structure that compares the two theories along several dimensions.

If the outcome of active learning is the construction of a coherent mental representation, it is useful to explore some of the typical ways that knowledge can be structured. Some basic knowledge structures include process, comparison, generalization, enumeration, and classification (Chambliss & Calfee, 1998; Cook & Mayer, 1988). Process structures can be represented as cause-and-effect chains and consist of explanations of how some system works. An example is an explanation of how the human ear works. Comparison structures can be represented as matrices and consist of comparisons among two or more elements along several dimensions. An example is a comparison between how two competing theories of learning view the role of the learner, the role of the teacher, and useful types of instructional methods. Generalization structures can be represented as a branching tree and consist of a main idea with subordinate supporting details. An example is a chapter outline for a chapter explaining the major causes for the American Civil War. Enumeration structures can be represented as lists and consist of a collection of items. An example is the names of principles of multimedia learning listed in this handbook. Classification structures can be represented as hierarchies and consist of sets and subsets. An example is a biological classification system for sea animals.

Understanding a multimedia message often involves constructing one of these kinds
to be held as exact auditory images for a very brief time period in an auditory sensory memory (at the bottom). The arrow from pictures to eyes corresponds to a picture being registered in the eyes, the arrow from words to ears corresponds to spoken text being registered in the ears, and the arrow from words to eyes corresponds to printed text being registered in the eyes.

The central work of multimedia learning takes place in working memory so let’s focus there. Working memory is used for temporarily holding and manipulating knowledge in active consciousness. For example, in reading this sentence you may be able to actively concentrate on only some of the words at one time, or in looking at Figure 3.2 you may be able to hold the images of only some of the boxes and arrows in your mind at one time. This kind of processing – that is, processing that involves conscious awareness – takes place in working memory. The left side of working memory represents the raw material that comes into working memory – visual images of pictures and sound images of words – so it is based on the two sensory modalities that I call visual and auditory. In contrast, the right side of working memory represents the knowledge constructed in working memory – pictorial and verbal models and links between them – so it is based on the two representation modes that I call pictorial and verbal. I use the term pictorial model to include spatial representations. The arrow from sounds to images represents the mental conversion of a sound (such as the spoken word “cat”) into a visual image (such as an image of a cat) – that is, when you hear the word “cat” you might also form a mental image of a cat. The arrow from images to sounds represents the mental conversion of a visual image (e.g., a mental picture of a cat) into a sound (e.g., the sound of the word “cat”) – that is, you mentally hear the word “cat” when you see a picture of one. The major cognitive processing required for multimedia learning is represented by the arrows labeled selecting images, selecting words, organizing images, organizing words, and integrating, which are described in the next section.

Finally, the box on the right is labeled long-term memory and corresponds to the learner’s storehouse of knowledge. Unlike working memory, long-term memory can hold large amounts of knowledge over long periods of time, but to actively think about material in long-term memory it must be brought into working memory (as indicated by the arrow from long-term memory to working memory).

Five Processes in the Cognitive Theory of Multimedia Learning

For meaningful learning to occur in a multimedia environment, the learner must engage in five cognitive processes: (1) selecting relevant words for processing in verbal working memory, (2) selecting relevant images for processing in visual working memory, (3) organizing selected words into a verbal model, (4) organizing selected images into a pictorial model, and (5) integrating the verbal and pictorial representations with each other and with prior knowledge. Although I present these processes as a list, they do not necessarily occur in linear order, so a learner might move from process to process in many different ways. Successful multimedia learning requires that the learner coordinate and monitor these five processes.

Selecting Relevant Words

The first labeled step listed in Figure 3.2 involves a change in knowledge representation from the external presentation of spoken words (e.g., computer-generated narration) to a sensory representation of sounds to an internal working memory representation of word sounds (e.g., some of the words in the narration). The input for this step is a spoken verbal message – that is, the spoken words in the presented portion of the multimedia message. The output for this step is a word sound base (called sounds in Figure 3.2) – that is, a mental representation in the learner’s verbal working memory of selected words or phrases.
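
As a rough illustration of this step, the following Python sketch treats the selection of words as filtering one spoken segment down to a small word sound base. It is not part of the cognitive theory itself. The function name select_relevant_words, the relevant_terms set that stands in for the learner's judgment of relevance, and the default capacity of five items (loosely borrowed from the memory span figure of five to seven chunks) are all assumptions made for the example. The sample sentence comes from the lightning narration.

```python
import string

# Assumed capacity limit, for illustration only. The chapter reports a memory
# span of roughly five to seven chunks, not a fixed number of words.
ASSUMED_CAPACITY = 5


def select_relevant_words(narration, relevant_terms, capacity=ASSUMED_CAPACITY):
    """Return a word sound base: up to `capacity` relevant words, in spoken order."""
    selected = []
    for raw_word in narration.split():
        if len(selected) >= capacity:
            break  # verbal working memory is full, so later words are not held
        word = raw_word.strip(string.punctuation).lower()
        # relevant_terms is a hypothetical stand-in for the learner's judgment
        if word in relevant_terms and word not in selected:
            selected.append(word)
    return selected


narration = "Cool moist air moves over a warmer surface and becomes heated."
relevant_terms = {"cool", "air", "becomes", "heated"}
sounds = select_relevant_words(narration, relevant_terms)
print(sounds)
```

Lowering the capacity argument shortens the list, which is one simple way to picture why only part of the presented message can be selected at one time.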
The cognitive process mediating this change is called selecting relevant words and involves paying attention to some of the words that are presented in the multimedia message as they pass through auditory sensory memory. If the words are presented as speech, this process begins in the auditory channel (as indicated by the arrows from words to ears to sounds). However, if the words are presented as on-screen text or printed text, this process begins in the visual channel (as indicated by the arrow from words to eyes) and later may move to the auditory channel if the learner mentally articulates the printed words (as indicated by the arrow from images to sounds in the left portion of working memory). The need for selecting only part of the presented message occurs because of capacity limitations in each channel of the cognitive system. If the capacity were unlimited, there would be no need to focus attention on only part of the verbal message. Finally, the selection of words is not arbitrary. The learner must determine which words are most relevant – an activity that is consistent with the view of the learner as an active sense maker.

For example, in the lightning lesson, one segment of the multimedia presentation contains the words, “Cool moist air moves over a warmer surface and becomes heated,” the next segment contains the words, “Warmed moist air near the earth’s surface rises rapidly,” and the next segment has the words, “As the air in this updraft cools, water vapor condenses into water droplets and forms a cloud.” When a learner engages in the selection process, the result may be that some of the words are represented in verbal working memory – such as, “Cool air becomes heated, rises, forms a cloud.”

Selecting Relevant Images

The second step involves a change in knowledge representation from the external presentation of pictures (e.g., an animation segment or an illustration) to a sensory representation of unanalyzed visual images to an internal representation in working memory (e.g., a visual image of part of the animation or illustration). The input for this step is a pictorial portion of a multimedia message that is held briefly in visual sensory memory. The output for this step is a visual image base (called images in Figure 3.2) – a mental representation in the learner’s working memory of selected images.

The cognitive process underlying this change – selecting relevant images – involves paying attention to part of the animation or illustrations presented in the multimedia message. This process begins in the visual channel, but it is possible to convert part of it to the auditory channel (e.g., by mentally narrating an ongoing animation). The need to select only part of the presented pictorial material arises from the limited processing capacity of the cognitive system. It is not possible to process all parts of a complex illustration or animation so learners must focus on only part of the incoming pictorial material. Finally, the selection process for images – like the selection process for words – is not arbitrary because the learner must judge which images are most relevant for making sense out of the multimedia presentation.

In the lightning lesson, for example, one segment of the animation shows blue-colored arrows – representing cool air – moving over a heated land surface that contains a house and trees; another segment shows the arrows turning red and traveling upward above a tree; and a third segment shows the arrows changing into a cloud with lots of dots inside. In selecting relevant images, the learner may compress all this into images of a blue arrow pointing rightward, a red arrow pointing upward, and a cloud. Details such as the house and tree on the surface, the wavy form of the arrows, and the dots in the cloud are lost.

Organizing Selected Words

Once the learner has formed a word sound base from the incoming words of a segment of the multimedia message, the next step is to organize the words into a coherent representation – a knowledge structure that I call
a verbal model. The input for this step is the word sound base – the word sounds selected from the incoming verbal message. The output for this step is a verbal model – a coherent (or structured) representation in the learner’s working memory of the selected words or phrases.

The cognitive process involved in this change is organizing selected words in which the learner builds connections among pieces of verbal knowledge. This process is most likely to occur in the auditory channel and is subject to the same capacity limitations that affect the selection process. Learners do not have unlimited capacity to build all possible connections so they must focus on building a simple structure. The organizing process is not arbitrary, but rather reflects an effort at sense making – such as the construction of a cause-and-effect chain.

For example, in the lightning lesson, the learner may build causal connections between the selected verbal components: “First: cool air is heated; second: it rises; third: it forms a cloud.” In mentally building a causal chain, the learner is organizing the selected words.

Organizing Selected Images

The process for organizing images parallels that for organizing words. Once the learner has formed an image base from the incoming pictures of a segment of the multimedia message, the next step is to organize the images into a coherent representation – a knowledge structure that I call a pictorial model. The input for this step is the visual image base – the images selected from the incoming pictorial message. The output for this step is a pictorial model – a coherent (or structured) representation in the learner’s working memory of the selected images.

This change from images to pictorial model requires the application of a cognitive process that I call organizing selected images. In this process, the learner builds connections among pieces of pictorial knowledge. This process occurs in the visual channel, which is subject to the same capacity limitations that affect the selection process. Learners lack the capacity to build all possible connections among images in their working memory, but rather must focus on building a simple set of connections. As in the process of organizing words, the process of organizing images is not arbitrary. Rather, it reflects an effort to build a simple structure that makes sense to the learner – such as a cause-and-effect chain.

For example, in the lightning lesson, the learner may build causal connections between the selected images: The rightward-moving blue arrow turns into a rising red arrow, which turns into a cloud. In short, the learner builds causal links in which the first event leads to the second and so on.

Integrating Word-Based and Image-Based Representations

Perhaps the most crucial step in multimedia learning involves making connections between word-based and image-based representations. This step involves a change from having two separate representations – a pictorial model and a verbal model – to having an integrated representation in which corresponding elements and relations from one model are mapped onto the other. The input for this step is the pictorial model and the verbal model that the learner has constructed so far, and the output is an integrated model, which is based on connecting the two representations. In addition, the integrated model includes connections with prior knowledge.

I refer to this cognitive process as integrating words and images because it involves building connections between corresponding portions of the pictorial and verbal models as well as knowledge from long-term memory. This process occurs in visual and verbal working memory, and involves the coordination between them. This is an extremely demanding process that requires the efficient use of cognitive capacity. The process reflects the epitome of sense making because the learner must focus on the underlying structure of the visual and verbal representations. The learner can use prior knowledge to help coordinate the integration process, as indicated by the arrow from long-term memory to working memory.

Table 3.2. Five Cognitive Processes in the Cognitive Theory of Multimedia Learning

Process            Description
Selecting words    Learner pays attention to relevant words in a multimedia message to create sounds in working memory
Selecting images   Learner pays attention to relevant pictures in a multimedia message to create images in working memory
Organizing words   Learner builds connections among selected words to create a coherent verbal model in working memory
Organizing images  Learner builds connections among selected images to create a coherent pictorial model in working memory
Integrating        Learner builds connections between verbal and pictorial models and with prior knowledge
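
The organizing and integrating rows of Table 3.2 can be pictured with a small Python sketch. This is an illustrative toy, not a model proposed in the chapter. The functions organize and integrate, the representation of a model as a list of (cause, effect) links, and the prior_knowledge dictionary are hypothetical names and structures chosen for the example. The phrases and images are the ones used for the lightning lesson.

```python
def organize(selected_items):
    """Link selected items into a cause-and-effect chain of (cause, effect) pairs."""
    return list(zip(selected_items, selected_items[1:]))


def integrate(verbal_model, pictorial_model, prior_knowledge):
    """Map corresponding links of the two chains and attach any prior knowledge."""
    integrated = []
    for verbal_link, pictorial_link in zip(verbal_model, pictorial_model):
        integrated.append({
            "verbal": verbal_link,
            "pictorial": pictorial_link,
            # None when long-term memory offers nothing for this link
            "prior": prior_knowledge.get(verbal_link),
        })
    return integrated


# Selected sounds and images for the lightning lesson
sounds = ["cool air is heated", "it rises", "it forms a cloud"]
images = ["blue arrow", "red arrow", "cloud"]

verbal_model = organize(sounds)
pictorial_model = organize(images)

# Hypothetical long-term memory entry supporting the first transition
prior_knowledge = {("cool air is heated", "it rises"): "hot air rises"}

integrated_model = integrate(verbal_model, pictorial_model, prior_knowledge)
for link in integrated_model:
    print(link)
```

Each entry of the resulting list pairs one verbal link with the corresponding pictorial link, which is one way to read the phrase "corresponding elements and relations from one model are mapped onto the other."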

For example, in the lightning lesson, the learner must see the connection between the verbal chain – “First, cool air is heated; second, it rises; third, it forms a cloud” – and the pictorial chain – the blue arrow followed by the red arrow followed by the cloud shape. In addition, prior knowledge can be applied to the transition from the first to the second event by remembering that hot air rises.

The five cognitive processes in multimedia learning are summarized in Table 3.2.

Each of the five processes in multimedia learning is likely to occur many times throughout a multimedia presentation. The processes are applied segment by segment rather than to the entire message as a whole. For example, in processing the lightning lesson, learners do not first select all relevant words and images from the entire passage, then organize them into verbal and pictorial models of the entire passage, and then connect the completed models with one another at the very end. Rather, learners carry out this procedure on small segments: they select relevant words and images from the first sentence of the narration and the first few seconds of the animation; they organize and integrate them; and then this set of processes is repeated for the next segment, and so on.

Five Forms of Representation

As you can see in Figure 3.2, there are five forms of representation for words and pictures, reflecting their stage of processing. To the far left, we begin with words and pictures in the multimedia presentation, that is, the stimuli that are presented to the learner. In the case of the lightning message shown in Figure 3.1, the words are the spoken words presented through the computer’s speakers and the pictures are the frames of the animation presented on the computer’s screen. Second, as the presented words and pictures impinge on the learner’s ears and eyes, the next form of representation is acoustic representations (or sounds) and iconic representations (or images) in sensory memory. The sensory representations fade rapidly, unless the learner pays attention to them. Third, when the learner selects some of the words and images for further processing in working memory, the next form of representation is sounds and images in working memory. These are the building blocks for knowledge construction – including key phrases such as, “warmed air rises,” and key images such as red arrows moving upward. The fourth form of representation results from the learner’s construction of a verbal model and pictorial model in working memory. Here the learner has organized the material into coherent verbal and pictorial representations, and also has mentally integrated them. Finally, the fifth form of representation is knowledge in long-term memory, which the learner uses for guiding the process of knowledge construction in working memory. Sweller (1999, and chapter 2, this volume) refers to this knowledge as schemas. After new knowledge is constructed in working memory, it is stored
Table 3.3. Five Forms of Representation in the Cognitive Theory of Multimedia Learning
Type of knowledge Location Example
Words and pictures Multimedia presentation Sound waves from computer speaker:
“Warmed moist air. . . . ”
Acoustic and iconic Sensory memory Received sounds in learner’s ears:
representations “Warmed moist air. . . . ”
Sounds and images Working memory Selected sounds: “warmed air rises”
Verbal and pictorial Working memory Mental model of cloud formation
models
Prior knowledge Long-term memory Schema for differences in air pressure
in long-term memory as prior knowledge to the lightning photograph from the first en-
be used in supporting new learning. The five cyclopedia (i.e., a static picture) or the light-
forms of representation are summarized in ing animation from the second encyclope-
Table 3 .3 . dia (i.e., a dynamic picture). The second
event – represented by the “eyes” box under
“sensory memory” – is that the pictures im-
Examples of How Three Kinds of pinge on the eyes, resulting in a brief sensory
Presented Materials Are Processed image – that is for a brief time the student’s
eye beholds the photograph or the anima-
tion frames.
Let’s take a closer look at how three kinds of
These first two events happen without
presented materials are processed from start
much effort on the part of the learner, but
to finish according to the model of multime-
next, the active cognitive processing be-
dia learning summarized in Figure 3 .2: pic-
gins – the processing over which the learner
tures, spoken words, and printed words. For
has some conscious control. If the student
example, suppose that a student clicks on an
pays attention to the fleeting images com-
entry for lightning in a multimedia encyclo-
ing from the eyes, parts of the images will
pedia and is presented with a static picture
become represented in working memory.
of a lightning storm with a paragraph of on-
This attentional processing corresponds to
screen text about the number of injuries and
the arrow labeled “selecting images” and the
deaths caused by lightning each year. Simi-
resulting mental representation is labeled
larly, suppose the student then clicks on the
“images” under “working memory.” Once
entry for lightning in another multimedia en-
working memory is full of image pieces, the
cyclopedia and is presented with a short an-
next active cognitive processing involves or-
imation along with narration describing the
ganizing those pieces into a coherent struc-
steps in lightning formation. In these exam-
ture – a process indicated by the “organiz-
ples, the first presentation contains static pic-
ing images” arrow. The resulting knowledge
tures and printed words whereas the second
representation is a pictorial model, that is,
presentation contains dynamic pictures and
the student builds an organized visual rep-
spoken words.
resentation of the main parts of a lightning
bolt (from the first encyclopedia) or an orga-
Processing of Pictures
nized set of images representing the cause-
The top frame in Figure 3 .3 shows the path and-effect steps in lightning formation (from
for processing of pictures – indicated by the second encyclopedia).
thick arrows and darkened boxes. The first Finally, active cognitive processing is re-
event – represented by the “pictures” box quired to connect the new representation
under “multimedia presentation” at the left with other knowledge – a process indicated
side of Figure 3 .3 – is the presentation of by the “integrating” arrow. For example, the
student may use prior knowledge about electricity to help include moving positive and negative charges in the mental representation of the lightning bolt or may use prior knowledge of electricity to help explain why the negative and positive charges are attracted to one another. In addition, if the learners have also produced a verbal model, they may try to connect it to the pictorial model – such as looking for how a phrase in the text corresponds to a part of the image. This processing results in an integrated learning outcome indicated by the circle under “working memory.”

[Figure 3.3, top frame: Processing of Pictures. Flowchart columns: Multimedia Presentation, Sensory Memory, Working Memory, Long-Term Memory.]

Processing of Spoken Words

The middle frame in Figure 3.3 shows the path for processing of spoken words – indicated by thick arrows and darkened boxes. When the computer produces narration (as indicated by the “words” box under “multimedia presentation”) the sounds are picked up by the student’s ears (as indicated by the “ears” box under “sensory memory”). For example, when the computer says, “The negatively charged particles fall to the bottom of the cloud, and most of the positively charged particles rise to the top,” these words are picked up by the student’s ears and held temporarily in auditory sensory memory. Next, active cognitive processing can take place. If the student pays attention to the sounds coming into the ears (as indicated by the arrow labeled “selecting words”), some of the incoming sounds will be selected for inclusion in the word sound base (indicated by the “sounds” box under “working memory”). For example, the resulting collection of words in working memory might include “positive top, negative bottom.” The words in the word base are disorganized fragments, so the next step – indicated by the “organizing words” arrow – is to build them into a coherent mental structure – indicated by the “verbal model” box. In this process, the words change from being represented based on sound to being represented based on word meaning. The result could be a cause-effect chain for the steps in lightning formation. Lastly, the student may use prior knowledge to help explain the transition from one step to another and may connect words with pictures – such as connecting “positive top, negative bottom” with an image of positive particles in the top of a cloud and negative charges in the bottom. This process is labeled “integrating” and the resulting integrated learning outcome is indicated by the circle under “working memory.”

The presentation of printed text in multimedia messages creates an information-processing challenge for the dual-channel system portrayed in Figure 3.2. For example, consider the case of a student who must read text and view an illustration. The words are presented visually so they must initially be processed through the eyes – as indicated by the arrow from “words” to “eyes.” Then, the student may attend to some of the incoming words (as indicated by the “selecting images” arrow) and bring them into working memory as part of the images. Then, by mentally pronouncing the images of the printed words the student can get the words into the auditory/verbal channel – as indicated by the arrow from the images to the sounds. Once the words are represented in the auditory/verbal channel they are processed like the spoken words, as described previously. This path is presented in the bottom frame of Figure 3.3. As you can see, when verbal material must enter through the visual channel, the words must take a complex route through the system, and must also compete for attention with the illustration that the student is also processing through the visual channel. The consequences of this problem are addressed in chapters 9 and 11 on the modality principle.
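
As a rough illustration of the three routes just described, the following Python sketch lists the boxes that each kind of material passes through. The box names follow the flowchart labels quoted in the text; the `ROUTES` dictionary, the helper names `route` and `shares_entry_channel`, and the choice to represent a route as a list of strings are assumptions made only for this illustration and are not part of the theory's own statement.

```python
from typing import Dict, List

# Illustrative only: box labels follow the text; the data layout is an assumption.
ROUTES: Dict[str, List[str]] = {
    "pictures": [
        "pictures",                      # multimedia presentation
        "eyes",                          # sensory memory
        "images",                        # working memory
        "pictorial model",               # working memory
        "integrated learning outcome",
    ],
    "spoken words": [
        "words",                         # multimedia presentation
        "ears",                          # sensory memory
        "sounds",                        # working memory
        "verbal model",                  # working memory
        "integrated learning outcome",
    ],
    "printed words": [
        "words",                         # multimedia presentation
        "eyes",                          # sensory memory (visual entry)
        "images",                        # working memory
        "sounds",                        # reached by mentally pronouncing the words
        "verbal model",                  # working memory
        "integrated learning outcome",
    ],
}


def route(material: str) -> str:
    """Return the route for one kind of material as a readable string."""
    return " -> ".join(ROUTES[material])


def shares_entry_channel(a: str, b: str) -> bool:
    """True when two materials enter through the same sensory-memory box."""
    return ROUTES[a][1] == ROUTES[b][1]


if __name__ == "__main__":
    for material in ROUTES:
        print(f"{material}: {route(material)}")
```

In this sketch, printed words and pictures share the “eyes” entry point, which mirrors the competition for attention in the visual channel noted above, whereas spoken words enter through the “ears.” The printed-words route is also one step longer than the spoken-words route because of the extra conversion from images to sounds.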
“dual-coding model” (Mayer & Anderson, 1991, 1992) and “dual-processing model of multimedia learning” (Mayer & Moreno, 1998; Mayer, Moreno, Boire, & Vagge, 1999) – emphasized the dual-channels element. Yet other names – such as “generative theory” (Mayer, Steinhoff, Bower, & Mars, 1995) and “generative theory of multimedia learning” (Mayer, 1997; Plass, Chun, Mayer, & Leutner, 1998) – emphasized all three elements. The current name, “cognitive theory of multimedia learning,” was used in Mayer, Bove, Bryman, Mars, and Tapangco (1996), Moreno and Mayer (2000), and Mayer, Heiser, and Lonn (2001), and was selected for use in major reviews (Mayer, 2001, 2002, 2003a; Mayer & Moreno, 2003).

An early predecessor to the flowchart representation shown in Figure 3.2 in this chapter was a dual-coding model shown in Mayer and Sims (1994, Figure 1), which contained the same two channels and three of the same five cognitive processes, but lacked two of the cognitive processes and sensory memory. Mayer, Steinhoff, Bower, and Mars (1995, Figure 1) and Mayer (1997, Figure 3) presented an intermediate version that is almost identical to the flowchart shown in Figure 3.2 except that it lacked long-term memory and sensory memory. Finally, the current version of the flowchart appeared in Mayer, Heiser, and Lonn (2001), and was reproduced in subsequent reviews (Mayer, 2001, Figure 2; Mayer, 2002, Figure 7; Mayer, 2003a, Figure 2; Mayer & Moreno, 2003, Figure 1). Thus, the model has developed by adding components – both cognitive processes and mental representations – and clarifying their role. The result is the cognitive theory of multimedia learning that is represented in the flowchart in Figure 3.2 of this chapter.

Comparison With Related Theories

As can be seen in Figure 3.2, the cognitive theory of multimedia learning involves (a) two channels (i.e., visual and verbal), (b) limited processing capacity, (c) three kinds of memory stores, (d) five cognitive processes (selecting words, selecting images, organizing words, organizing images, and integrating), and (e) five kinds of representations (i.e., presented words and pictures; sounds and images in sensory memory; selected sounds and images in working memory; verbal and pictorial models in working memory; and knowledge in long-term memory). The theory incorporates elements from classic information-processing models, such as two channels from Paivio’s (1986) dual-coding theory, limited processing capacity from Baddeley’s (1986, 1999) model of working memory, and a flowchart representation of memory stores and cognitive processes from Atkinson and Shiffrin (1968).

Key components of the cognitive theory of multimedia learning are consistent with other multimedia instructional design theories such as Sweller’s (1999, 2003, chapter 2) cognitive load theory, and Schnotz and Bannert’s (2003; Schnotz, chapter 4) integrated model of text and picture comprehension.

First, consider Sweller’s (1999, 2003, chapter 2) cognitive load theory. Like the cognitive theory of multimedia learning, Sweller’s (1999) cognitive load theory acknowledges “separate channels for dealing with auditory and visual material” (p. 138) and emphasizes that “we can hold few elements in working memory” (p. 4). As in the cognitive theory of multimedia learning, the architecture of human information processing allows for several kinds of representations: elements in the presented material correspond to words and pictures in the multimedia presentation, elements in working memory correspond to verbal and pictorial models in working memory, and schemas in long-term memory correspond to knowledge in long-term memory. Cognitive load theory elaborates on the implications of limited working memory capacity for instructional design, and focuses on ways in which instruction imposes cognitive load on learners. However, it does not focus on the kinds of information processes involved in multimedia learning.

Second, consider Schnotz and Bannert’s integrated model of text and picture comprehension as summarized in Figure 3.2 of
Schnotz and Bannert (2003). Like the cognitive theory of multimedia learning, Schnotz and Bannert’s model emphasizes two channels, but unlike the cognitive theory of multimedia learning it does not emphasize limited capacity. All five cognitive processes are represented although with some differences in conceptualization: subsemantic processing corresponds to selecting words, perception corresponds to selecting images, semantic processing corresponds to organizing words, thematic selection corresponds to organizing images, and model construction/inspection corresponds to integrating. Four of the five representations are included although, again, with some differences in conceptualization: text and picture/diagram correspond to words and pictures in the multimedia presentation; text surface representation and visual image correspond to sounds and images in working memory; propositional representation and mental model correspond to verbal model and pictorial model; and conceptual organization corresponds to knowledge in long-term memory.

In summary, the cognitive theory of multimedia learning is compatible with and somewhat similar to other multimedia design theories. Sweller’s (1999, 2003, chapter 2) cognitive load theory offers further elaborations on the role of limited capacity in instructional design for multimedia learning, and Schnotz and Bannert’s (2003; Schnotz, chapter 4) integrated model of text and picture comprehension offers further elaborations on the nature of mental representations in multimedia learning.

Future Directions

Although we have made progress in creating a cognitive theory of multimedia learning, much remains to be done, particularly (a) in fleshing out the details of the mechanisms underlying the five cognitive processes and the five forms of representation, (b) in integrating the various theories of multimedia learning, and (c) in building a credible research base. First, more work is needed to understand and measure the basic constructs in theories of multimedia learning, such as determining how to measure cognitive load during learning, determining the optimal size of a chunk of presented information, or determining the way that a mental model is represented in the learner’s memory. Second, there is a need to find consensus among theorists, such as reconciliation among cognitive load theory (Sweller, chapter 2), the cognitive theory of multimedia learning (this chapter), the integrated model of text and picture comprehension (Schnotz, chapter 4), the four-component instructional design model (Merriënboer & Kester, chapter 5), and related theories. Third, we have a continuing need to generate testable predictions from theories of multimedia learning and to test these predictions in rigorous scientific experiments. The best way to ensure the usefulness of theories of multimedia learning is to have coherent research literature on which to base them.

Summary

In summary, multimedia learning takes place within the learner’s information system – a system that contains separate channels for visual and verbal processing, a system with serious limitations on the capacity of each channel, and a system that requires coordinated cognitive processing in each channel for active learning to occur. In particular, multimedia learning is a demanding process that requires selecting relevant words and images; organizing them into coherent verbal and pictorial representations; and integrating the verbal and pictorial representations with each other and with prior knowledge. In the process of multimedia learning, material is represented in five forms: as words and pictures in a multimedia presentation; acoustic and iconic representations in sensory memory; sounds and images in working memory; verbal and pictorial models in working memory; and knowledge in long-term memory. The theme of this chapter is that multimedia messages should be designed to facilitate multimedia learning processes. Multimedia messages that are designed in light of how the human mind works are more likely to lead to meaningful learning than those that are not.

Mayer, R. E., & Anderson, R. B. (1991). Animations need narrations: An experimental test of the dual-coding hypothesis. Journal of Educational Psychology, 83, 484–490.

Mayer, R. E., & Anderson, R. B. (1992). The instructive animation: Helping students build connections between words and pictures in multimedia learning. Journal of Educational Psychology, 84, 444–452.

Mayer, R. E., Bove, W., Bryman, A., Mars, R., & Tapangco, L. (1996). When less is more: Meaningful learning from visual and verbal summaries of science textbook lessons. Journal of Educational Psychology, 88, 64–73.

Mayer, R. E., & Gallini, J. K. (1990). When is an illustration worth ten thousand words? Journal of Educational Psychology, 82, 715–726.

Mayer, R. E., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning: When presenting more material results in less understanding. Journal of Educational Psychology, 93, 187–198.

Mayer, R. E., & Moreno, R. (1998). A split-attention effect in multimedia learning: Evidence for dual processing systems in working memory. Journal of Educational Psychology, 90, 312–320.

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38, 43–52.

Mayer, R. E., Moreno, R., Boire, M., & Vagge, S. (1999). Maximizing constructivist learning from multimedia communications by minimizing cognitive load. Journal of Educational Psychology, 91, 638–643.

Mayer, R. E., & Sims, V. K. (1994). For whom is a picture worth a thousand words? Extensions of a dual-coding theory of multimedia learning. Journal of Educational Psychology, 86, 389–401.

Mayer, R. E., Steinhoff, K., Bower, G., & Mars, R. (1995). A generative theory of textbook design: Using annotated illustrations to foster meaningful learning of science text. Educational Technology Research & Development, 43, 31–43.

Miller, G. A. (1956). The magic number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.

Miyake, A., & Shah, P. (Eds.). (1999). Models of working memory. New York: Cambridge University Press.

Moreno, R., & Mayer, R. E. (2000). A coherence effect in multimedia learning: The case for minimizing irrelevant sounds in the design of multimedia instructional messages. Journal of Educational Psychology, 92, 117–125.

Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.

Plass, J. L., Chun, D. M., Mayer, R. E., & Leutner, D. (1998). Supporting visual and verbal learning preferences in a second-language multimedia learning environment. Journal of Educational Psychology, 90, 25–36.

Schnotz, W., & Bannert, M. (2003). Construction and interference in learning from multiple representation. Learning and Instruction, 13, 141–156.

Simon, H. A. (1974). How big is a chunk? Science, 183, 482–488.

Sternberg, R. J. (1990). Metaphors of mind: Conceptions of the nature of intelligence. New York: Cambridge University Press.

Sweller, J. (1999). Instructional design in technical areas. Camberwell, Australia: ACER Press.

Sweller, J. (2003). Evolution of human cognitive architecture. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 43, pp. 215–216). San Diego, CA: Academic Press.

Wittrock, M. C. (1989). Generative processes of comprehension. Educational Psychologist, 24, 345–376.